Using Process.spawn as a replacement for Process.fork

Using Process.spawn as a replacement for Process.fork - ruby-on-rails

My development environment is a Windows machine running ruby 1.9.3p125 (RubyInstaller) and rails 3.2.8.
One issue that comes up, time and again, when using third-party gems, is the lack of fork() on Windows. This has recently hindered my ability to use pretty much any distributed test running gem (like these), due to their dependence on fork.
Some older questions on StackOverflow have attempted to find a resolution to this same problem, but were either before the addition of Process.spawn into ruby, or were from people forced to use an older version of Ruby, for some other reason.
One of the proposed solutions is to use Cygwin to gain fork() support, which is simply out of the question for this - I think I would prefer to switch to Linux fully, before that.
Another proposed solution has been using the win32-process gem to gain fork() support. Fork support was removed from the most recent version (0.7.0), and using the next oldest version (0.6.6), which does (sort-of) support fork does not seem to work, at least for running any of the distributed testing gems that I have tried (Spork, Parallel tests, Hydra, Specjour, practically all of them). Interestingly enough, the author of the gem alludes, in the readme, to Process.spawn being an acceptable workaround for Process.fork.
I have seen a lot of information either implying, or stating outright that spawn can be used as a replacement for fork, on Windows, with Ruby 1.9. I have spent a fair amount of time playing with this, basically trying to replace Process.fork with Process.spawn in several of the referenced gems, with no success. It seems to me that perhaps the behavior is similar, but not exactly the same. For example, it is unclear whether spawn actually copies the entire process in the same way the fork does, or simply creates a new process with the supplied arguments. It is also unclear as to whether the spawn method even accepts another ruby method as an argument, or only a system command. The docs seem to imply that it is only a command, but a method seems to work (sort-of), but I may be doing things incorrectly. I think that for some things, fork was just used to create a "cheap thread", in previous ruby versions that did not support threading. However, it seems that these distributed testing gems may legitimately rely on the full functionality of fork(), in order to maintain the project state, and to not load the whole ruby environment for every test. This is a bit outside of my normal programming duties and experience, so I may be making some incorrect assumptions.
So, my question is, can Process.spawn be used relatively simply to achieve the same outcome as Process.fork, in all cases? I am beginning to suspect not, but if so, could someone please post an example of how one would go about making the transformation?

EDIT: There is one common use case of fork() that can be replaced with spawn() -- the fork()--exec() combo. A lot of older (and modern) UNIX applications, when they want to spawn another process, will first fork, and then make an exec call (exec replaces the current process with another). This doesn't actually need fork(), which is why it can be replaced with spawn(). So, this:
if(!fork())
exec("dir")
end
can be replaced with:
Process.spawn("dir")
If any of the gems are using fork() like this, the fix is easy. Otherwise, it is almost impossible.
EDIT: The reason why win32-process' implementation of fork() doesn't work is that (as far as I can tell from the docs), it basically is spawn(), which isn't fork() at all.
No, I don't think it can be done. You see, Process.spawn creates a new process with the default blank state and native code. So, while I can do something like Process.spawn('dir') will start a new, blank process running dir, it won't clone any of the current process' state. It's only connection to your program is the parent - child connection.
You see, fork() is a very low level call. For example, on Linux, what fork() basically does is this: first, a new process is created with exactly cloned register state. Then, Linux does a copy-on-write reference to all of the parent process' pages. Linux then clones some other process flags. Obviously, all of these operations can only be done by the kernel, and the Windows kernel doesn't have the facilities to do that (and can't be patched to either).
Technically, only native programs need the OS for some sort of fork()-like support. Any layer of code needs the cooperation of the layer above it to do something like fork(). So while native C code needs the cooperation of the kernel to fork, Ruby theoretically only needs the cooperation of the interpreter to do a fork. However, the Ruby interpreter does not have a snapshot/restore feature, which would be necessarily to implement a fork. Because of this, normal Ruby fork is achieved by forking the interpreter itself, not the Ruby program.
So, while if you could patch the Ruby interpreter to add a stop/start and snapshot/restore feature, you could do it, but otherwise? I don't think so.
So what are your options? This is what I can think of:
Patch the Ruby interpreter
Patch the code that uses fork() to maybe use threads or spawn
Get a UNIX (I suggest this one)
Use Cygwin
Edit 1:
I wouldn't suggest using Cygwin's fork, as it involves special Cygwin process tables, there is no copy-on-write, which makes it very inefficient. Also, it involves a lot of jumping back and forth and a lot of copying. Avoid it if possible. Also, because Windows provides no facilities to copy address spaces, forks are very likely to fail, and will quite a lot of the time (see here).

Related

Run ruby code with eval and it's gems, rails

I'm using eval to run some code (that is in a database, there are no ruby files) but this code requires some gems. How would I go about running the code? Maybe there's a better way than eval?
To give a bit more context, I have a toggle button in the view that switches a boolean to true or false in the model. This is possible for every "piece of code".
When it gets switched to true, the code starts running in a thread that never stops, and when it's switched to false this kills the thread.
I'm just trying to make the thread run the code right now.
I'm pretty new to rails so maybe there's a much better way than doing it by hand like what I'm doing, but I've tried googling some typical threading stuff and it's used for sending mail or other such things. Not for something that never stops unless told to (ie toggle the button that switches the boolean to false).
Thanks in advance.

It sounds like you need the code to be prefixed with gem imports -- which probably requires a bundler environment to load those gems in the first place.
Since the code is stored in the database directly, you could prefix them with another column / constant that does all the gem loading for you.
If the gems are already installed via bundler on the server, try prefixing your code with bundle exec. To do this, you'll need to write the code to a temporary file location first.
To install the proper gems, that's part of your build/deploy processes.

To run a script in the rails environment, which sounds like what you need, you can use rails runner. E.g. something like:
rails runner lib/scripts/my_script_to_run.rb
This script would then get the required code from the database, and already have the correct (rails) context to be able to run the code.
Or maybe even more appropriate:
rails runner <your-ruby-code-here>
See documentation
An alternative is to look at background-jobs, which seem imho a better fit to this problem. Check out the guides for ActiveJob : http://edgeguides.rubyonrails.org/active_job_basics.html
The advantage of using background-jobs:
it is a proven way of working
some adapters include extra managerial tools
you do not have to manage threads yourself
there is a clean way to describe jobs in code and queue jobs
using existing concepts/solutions is also easier to explain to other developers
There is a variety of adapters available and some of the easiest just store the jobs in the current database. For a full list see http://edgeapi.rubyonrails.org/classes/ActiveJob/QueueAdapters.html, I would recommend starting with Que or DelayedJob. Your flow would change a little in that case: I would queue a job, and instead of looping, I would requeue as long as the toggle is not switched. But of course: I do not know your precise use-case so your approach could be equally valid or better.

Turning on dbg tracing on a remote node?

When running our Erlang application in our system tests, I sometimes want to turn on and capture a debug trace.
The Erlang node is started using a relx start script (called as _rel/bin/foo foreground), so I don't have any control over the startup options. The system test runner (written in Python) is capturing stdout from the node.
How do I connect to an Erlang node, using -remsh, turn on dbg-tracing, and have that output written to stdout on the original node? And how do I do this all in a Python-friendly way (though I'm happy to write an escript if that'll make it easier).
To complicate this further, the relx generated release doesn't include the runtime_tools library, so dbg: isn't actually available, so I'll also add this question.

There are quite few way you could do that. All depends on what you are familiar with, and what your use case is.
I would start from doing everything by hand. That way you have greatest control on that's going one, and how effects look like (if you are turning too much debugging or not enough). That's I'm most familiar with, and in the end you almost always will have to connect to remote shell and do something by hand (from my experience)
One feature of dbg that not too many people talk about i ability of saving/loading trace pasterns from files. I find those easiest way to store and share debugging information in between sessions; but lack of readability might be too big trade-off.
You don't have to use dbg if you don't want to interfere with your live system too much. You could use erlang:trace which is given by default, but you must be cautious about state you leave your VM in (dbg should turn off all tracing upon exit; with erlang:trace that's your responsibility)
If you debug session is part of python script, writng escript and calling it from python would be my way to go. You just have to remember that escripts are run in new VM, and -remsh will not allow you to just run your code on other VM. You will have to use rpc module for that.
Since you are using application is released you might look into logging. One might assume that there should already be some logging in place, quite possible lager which is somewhat standard in Erlang, and which have possibility to change logging level during runtime.
Personally I would try some mix of first and last option, and just experiment.

Seeing duplicate requests handled with Unicorn

Recently I upgraded our EC2 instances to c3.large in order to boost more unicorn workers to handle increased traffic to our site. (6xworkers per machine - 2 ec2 instances).
When I did this, I started seeing duplicate records being created! It looks like somehow, that maybe 2 unicorn workers are attempting to process the same request?
In my nginx logs I see ONE [POST] request to a particular rails controller - yet two records are being created in the database. Obviously I'm hitting some kind of race condition issue - but have no idea how to debug this. It doesn't happen all the time, but when it does it's frustrating as well. I also noticed that the two database records are within 5 seconds of each other. (And yes I disable the button when the user clicks the submit button).
Unicorn 4.8.2
Ruby 1.9.3
Rails 3.2.14
Any help would be great! Thanks!

Although it may not fix your problem, upgrade to Ruby 2.0.0. There are some blog mentioning that is a good idea to upgrade to Ruby 2.0.0 if you are using Unicorn.
For example: https://www.digitalocean.com/community/articles/how-to-optimize-unicorn-workers-in-a-ruby-on-rails-app
Specifically:
If you are using Ruby 1.9, you should seriously consider
switching to Ruby 2.0. To understand why, we need to understand a
little bit about forking.
Forking and Copy-on-Write (CoW)
When a child process is forked, it is
the exact same copy as the parent process. However, the actual
physical memory copied need not be made. Since they are exact copies,
both child and parent processes can share the same physical memory.
Only when a write is made-- then we copy the child process into
physical memory.
So how does this relate to Ruby 1.9/2.0 and Unicorn?
Recall the Unicorn uses forking. In theory, the operating system would
be able to take advantage of CoW. Unfortunately, Ruby 1.9 does not
make this possible. More accurately, the garbage collection
implementation of Ruby 1.9 does not make this possible. An extremely
simplified version is this — when the garbage collector of Ruby 1.9
kicks in, a write would have been made, thus rendering CoW useless.
Without going into too much detail, it suffices to say that the
garbage collector of Ruby 2.0 fixes this, and we can now exploit CoW.

What do I need to know about JRuby on Rails after developing RoR Apps?

I have done a few projects using Ruby on Rails. I am going to use JRuby on Rails and hosting it on GAE. In that case what are the stuff that I need to know while developing JRuby apps. I read that
JRuby has the same syntax
I can access Java libraries
JRuby does not have access to some gems/plugins
JRuby app would take some time to load for the first time, so I have to keep it alive by sending
request every 5 mins or so
I cannot use ActiveRecord and instead I must DataMapper
Please correct if I am wrong about any of the statements I have made and Is there anything else that I must know?. Do I need to start reading about JRuby from the scratch or I can go about as usual developing Ruby apps?

I use JRuby everyday.
True:
JRuby has the same syntax
JRuby does not have access to some gems/plugins
I can access Java libraries
Some gems/plugins have jruby-specific versions, some don't work at all. In general, I have found few problems and as the libraries and platforms have matured a lot of the problems have gone away (JRuby has become a lot better).
You can access Java, but in general why would you want to?
False:
JRuby app would take some time to load for the first time, so I have to keep it alive by sending request every 5 mins or so
I cannot use ActiveRecord and instead I must DataMapper
Although I guess it is possible to imagine a server setup where the initial startup/warmup cost of the JVM means you need to ping the server, there is nothing inherent in JRuby that makes this true. If you need to keep the server alive, you should look at your deployment environment. Something similar happens in shared-hosting with passenger where an app can go out of memory after a period of inactivity.
Also, we use ActiveRecord with no problems at all.

afaik, rails 3 is 100% compatible with jruby, so there should be no problem on that path.
like every new platform, you should make yourself comfortable with it by playing around with jruby. i recommend using RVM to do that.
as far as you questions go:
JRuby is just an other runtime like MRI or Rubinus
since JRuby is within the JVM using Java is very easy, but you can also use RJB from MRI
some gems are not compatible, when they use native c libraries, that do not run on JRuby
the JVM and your application container need startup time and some time to load your app, but that is all, there is no need for keep alive, that is wrong
you can use whatever you want, most gems are updated to be compatible with JRuby

#TobyHede mostly covered issues that you thought of you might have so I'll leave it at that.
As for other things to have in mind, it's simply a different interpreter and funny discrepancies will crop up that will take some adaptation.
some methods are implemented differently, such as sleep 10.seconds will throw exception (you have to sleep 10.seconds.to_i) and I remember getting NoMethodError on Symbol class when switching from MRI to JRuby (don't remember which method wasn't implemented), just have in mind slight variations will be there
you will experience hangs and exceptions in gems that otherwise worked for you (pry for example when listing more then one page)
some gems may work differently, pry (again) will exit if you press ctrl+c for example, pretty annoying
slightly slower load times of everything and no zeus
you'll get occasional java exception stack traces with no indication on which line of ruby code it happened
Timeout.timeout often will not work as expected when its wrapped around net code and stars align badly (this has mostly been fixed in jruby core, but it seems to still be an issue with gems that do their own netcode in pure java)
hidden problems with thread-safety in third party code How do you choose gems for a high throughput multithreaded Rails app? - stay away from EventMachine for example
threads will be awesome (due to nativeness and no gil) and fibers will suck (due to no coroutine support in JVM they're ordinary threads), this is why you often won't get a performance boost with celluloid when compared to MRI
you used to run your rails with MRI Ruby as processes in an OS, you knew how to track their PIDs, bloat, run times, kill them, monitor them etc, this part is not evident when you switch to JRuby because everything has turned to threads in a single process. Java world has very good tools to handle these issues, but its something you'll have to learn
killall -9 ruby doesn't do the trick with jruby when your console hangs (which it does more often then before), you have to ps -ef and then track the proper processes without killing your netbeans etc (minor, but annoying)
due to my last point, knowing Java and the JVM will help you get out of tight spots in certain situations (depending on what you intend to do this may be something you actually really need), choice of deployment server will increase or decrease this need (torquebox for example is a bit notorious for this, other deployment options might be simpler, see http://thenerdings.blogspot.com/2012/09/pulling-plug-on-torquebox-and-jruby-for.html)
...
Also, see what jruby team says about differences, https://github.com/jruby/jruby/wiki/DifferencesBetweenMriAndJruby
But yeah, otherwise its "just the same as MRI Ruby" :)

Why isn't there a viable mod_ruby for Apache yet?

As popular as Ruby and Rails are, it seems like this problem would already be solved. JRuby and mod_rails are all fine and dandy, but why isn't there an Apache mod for just straight Ruby?

There is Phusion Passenger, a robust Apache module that can run Rack applications with minimum configuration. It's becoming appealing to shared hosts, and turning any program into a Rack application is ridiculously easy:
A Rack application is an Ruby object
(not a class) that responds to call.
It takes exactly one argument, the
environment and returns an Array of
exactly three values: The status, the
headers, and the body.

The basic problem is this: for a long time, MRI was the only feasible Ruby Implementation. MRI has a number of problems that make it hard to embed it into another application (which is basically what mod_ruby does: it embeds MRI in Apache), especially a multi-threaded one (which Apache is). It is not particularly thread-safe and it has quite a bit of global state.
This global state means for example that if one Rails application modifies some class, then all other Rails applications that run on the same Apache server, will also see this modified class.
Another problem is that the MRI source code is not easily hackable. MRI is now more than 15 years old, and it's starting to show.
As a result of these problems, mod_ruby has never really properly worked, and at some point the maintainers simply gave up.
The C based PHP interpreter, on the other hand, was designed from day one to be run as mod_php inside Apache. Indeed, for the first couple of versions, there wasn't even a commandline version of the interpreter, mod_php was the only way to run PHP.
Phusion Passenger (aka mod_rack aka mod_rails) solves this problem by basically giving up and sidestepping the problem: they simply run a seperate copy of MRI in a seperate process for every application. It works great, and not only for Ruby. It supports WSGI (standard interface for Python Web Frameworks), Rack (standard interface for Ruby Web Frameworks) and direct support for Ruby on Rails.
My hopes are on mod_rubinius, which unfortunately doesn't exist yet. Rubinius was designed from the beginning to be thread-safe, embeddable, free of global state, not use the C stack and so on. It was designed to be able to run multiple Rubinius VMs inside one Rubinius process. This makes mod_rubinius infinitely easier to implement and maintain than mod_ruby. Unfortunately, of course, Rubinius is not released yet, and the real work on mod_rubinius cannot even begin until Rubinius is released. The good news is that mod_rubinius already has more manpower behind it than mod_ruby ever had, because it has paid developers working on it by a Rails hosting company that desperately wants to use it themselves.

It's perhaps worth double-clarifying mislav's point that mod_rails isn't actually limited to Rails code at all. The new name, mod_rack, is way better. Trivially small apps can be rackable -- their example being:
class HelloWorld
def call(env)
[200, {"Content-Type" => "text/plain"}, ["Hello world!"]]
end
end

There is one: mod_ruby, but it hasn't been maintained in about 2 years.

There is mod_rails and it can run Rack applications, what more can you need?

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart