Using Process.spawn as a replacement for Process.fork


Question

My development environment is a Windows machine running ruby 1.9.3p125 (RubyInstaller) and rails 3.2.8.

One issue that comes up, time and again, when using third-party gems, is the lack of fork() on Windows. This has recently hindered my ability to use pretty much any distributed test running gem (like [these](https://www.ruby-toolbox.com/categories/distributed_testing)), due to their dependence on fork.

Some older questions on StackOverflow have attempted to find a resolution to this same problem, but were either before the addition of Process.spawn into ruby, or were from people forced to use an older version of Ruby, for some other reason.

One of the proposed solutions is to use Cygwin to gain fork() support, which is simply out of the question for this - I think I would prefer to switch to Linux fully, before that.

Another proposed solution has been using the win32-process gem to gain fork() support. Fork support was removed from the most recent version (0.7.0), and the next oldest version (0.6.6), which does (sort of) support fork, does not seem to work, at least for running any of the distributed testing gems that I have tried (Spork, Parallel tests, Hydra, Specjour, practically all of them). Interestingly enough, the author of the gem alludes, in the readme, to Process.spawn being an acceptable workaround for Process.fork.

I have seen a lot of information either implying, or [stating outright](http://ujihisa.blogspot.com/2010/03/how-to-run-external-command.html), that spawn can be used as a replacement for fork, on Windows, with Ruby 1.9. I have spent a fair amount of time playing with this, basically trying to replace Process.fork with Process.spawn in several of the referenced gems, with no success. It seems to me that perhaps the behavior is similar, but not exactly the same. For example, it is unclear whether spawn actually copies the entire process in the same way that fork does, or simply creates a new process with the supplied arguments. It is also unclear whether the spawn method even accepts another ruby method as an argument, or only a system command. The docs seem to imply that it is only a command; a method seems to work (sort of), but I may be doing things incorrectly. I think that for some things, fork was just used to create a "cheap thread", in previous ruby versions that did not support threading. However, it seems that these distributed testing gems may legitimately rely on the full functionality of fork(), in order to maintain the project state, and to not load the whole ruby environment for every test. This is a bit outside of my normal programming duties and experience, so I may be making some incorrect assumptions.

So, my question is, can Process.spawn be used relatively simply to achieve the same outcome as Process.fork, in all cases? I am beginning to suspect not, but if so, could someone please post an example of how one would go about making the transformation?


Answer

EDIT : There is one common use case of fork() that can be replaced with spawn() -- the fork()--exec() combo. A lot of older (and modern) UNIX applications, when they want to spawn another process, will first fork, and then make an exec call (exec replaces the current process with another). This doesn't actually need fork(), which is why it can be replaced with spawn(). So, this:

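# fork returns nil in the child and the child's pid in the parent
unless fork
  exec("dir")  # replace the child process with the command
end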

can be replaced with:

Process.spawn("dir")

If any of the gems are using fork() like this, the fix is easy. Otherwise, it is almost impossible.
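
As a rough sketch of that replacement, the snippet below spawns a child, waits for it, and reads its exit status, which is what a fork/exec/wait sequence would normally do. The helper name `run_child` is made up for illustration, and the current Ruby interpreter (found through `RbConfig.ruby`) stands in for `dir` so the example is not tied to one operating system.

```ruby
require "rbconfig"

# Hypothetical helper: start a command in a new process, wait for it,
# and return its exit status. No fork is involved, so this also runs on Windows.
def run_child(*command)
  pid = Process.spawn(*command)
  _, status = Process.wait2(pid)  # wait2 returns [pid, Process::Status]
  status.exitstatus
end

# The current interpreter is used as a portable stand-in for "dir".
exit_code = run_child(RbConfig.ruby, "-e", "exit 3")
puts "child exited with status #{exit_code}"
```

Passing the command and its arguments as separate strings avoids going through a shell, which keeps quoting problems out of the picture.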


EDIT : The reason why win32-process' implementation of fork() doesn't work is that (as far as I can tell from the docs), it basically is spawn(), which isn't fork() at all.


No, I don't think it can be done. Process.spawn creates a new process with a default, blank state running native code. So while Process.spawn('dir') will start a new, blank process running dir, it won't clone any of the current process's state. Its only connection to your program is the parent-child relationship.
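
A small sketch of what "blank state" means in practice: a global set in the parent is simply not there in a spawned child, and anything the child needs has to be handed over explicitly. The helper `spawn_and_capture`, the global `$loaded_fixtures`, and the `FIXTURES` environment variable are all invented for this illustration.

```ruby
require "rbconfig"

# Hypothetical helper: run a Ruby snippet in a freshly spawned interpreter
# and return whatever it prints to stdout.
def spawn_and_capture(snippet, env = {})
  reader, writer = IO.pipe
  pid = Process.spawn(env, RbConfig.ruby, "-e", snippet, out: writer)
  writer.close              # the parent keeps only the read end
  output = reader.read
  reader.close
  Process.wait(pid)
  output.strip
end

$loaded_fixtures = [:users, :posts]  # state that exists only in the parent

# The child is a brand-new interpreter, so the global is unset (nil) there.
puts spawn_and_capture("p $loaded_fixtures")

# State has to be passed explicitly, here through the environment.
puts spawn_and_capture("puts ENV['FIXTURES']", "FIXTURES" => "users,posts")
```

Environment variables and command-line arguments only carry strings, so anything richer would need to be serialized first.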

fork() is a very low-level call. For example, on Linux, fork() basically does this: first, a new process is created with an exact copy of the register state. Then Linux maps all of the parent process's pages into the child as copy-on-write references, and clones some other process flags. All of these operations can only be done by the kernel, and the Windows kernel doesn't have the facilities to do that (and can't be patched to add them).

Technically, only native programs need the OS for fork()-like support. Any layer of code needs the cooperation of the layer beneath it to do something like fork(). So while native C code needs the cooperation of the kernel to fork, Ruby theoretically only needs the cooperation of the interpreter. However, the Ruby interpreter does not have a snapshot/restore feature, which would be necessary to implement a fork. Because of this, a normal Ruby fork is achieved by forking the interpreter itself, not the Ruby program.
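
For contrast, here is a sketch of what forking the interpreter gives you on a platform that has fork: the child starts with a copy of every object the parent had built. The method name `value_seen_by_forked_child` is invented for the example, and the `Process.respond_to?(:fork)` guard makes it return nil on Windows instead of raising.

```ruby
# Hypothetical helper: report what a forked child sees for a value that was
# built in the parent. Returns nil where fork is unavailable (e.g. Windows).
def value_seen_by_forked_child(value)
  return nil unless Process.respond_to?(:fork)

  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(value.inspect)  # the child has its own copy of value
    writer.close
    exit!(0)                     # leave without running the parent's at_exit hooks
  end
  writer.close
  seen = reader.read
  reader.close
  Process.wait(pid)
  seen
end

state = { loaded: [:users, :posts] }
puts value_seen_by_forked_child(state).inspect
```

Nothing was serialized or passed on a command line here; the child already had the hash because the whole interpreter was cloned.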

So if you could patch the Ruby interpreter to add stop/start and snapshot/restore features, you could do it. Otherwise? I don't think so.

So what are your options? This is what I can think of:

  • Patch the Ruby interpreter
  • Patch the code that uses fork() to maybe use threads or spawn
  • Get a UNIX (I suggest this one)
  • Use Cygwin
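
For the second option, when fork is only being used to run work concurrently, a thread-based version might look like the sketch below. `run_job` and `run_jobs_in_threads` are hypothetical names, and the squaring is a stand-in for whatever the forked block actually did.

```ruby
# Hypothetical stand-in for a unit of work that a gem might run in a forked child.
def run_job(n)
  n * n
end

# Thread-based variant: every job runs inside the same process, so it shares
# the already-loaded state instead of getting a private copy of it.
def run_jobs_in_threads(inputs)
  threads = inputs.map { |n| Thread.new { run_job(n) } }
  threads.map(&:value)  # value joins the thread and returns the block's result
end

p run_jobs_in_threads([1, 2, 3])
```

Unlike forked children, threads are not isolated from each other, so this only fits code that does not depend on each worker having its own private copy of the process.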

Edit 1: I wouldn't suggest using Cygwin's fork. It relies on special Cygwin process tables and has no copy-on-write, which makes it very inefficient. It also involves a lot of jumping back and forth and a lot of copying. Avoid it if possible. Also, because Windows provides no facilities to copy address spaces, forks are prone to fail, and do fail quite a lot of the time (see here).