Phusion Passenger process pool explained - ruby-on-rails

I am trying to understand exactly how requests to a rails application get processed with Phusion Passenger. I have read through the Passenger docs (found here: http://www.modrails.com/documentation/Architectural%20overview.html#_phusion_passenger_architecture) and I understand how they maintain copies of the rails framework and your application code in memory so that every request to the application doesn't get bogged down by spinning up another instance of your application. What I don't understand is how these separate application instances share the native ruby process on my linux machine. I have been doing some research and here is what I think is happening:
One request hits the web server which dispatches Passenger to fulfill the request on one of Passenger's idle worker processes. Another request comes in almost simultaneously and is handled by yet another idle Passenger worker process.
At this point there are two requests being executed which are being managed by two different Passenger worker processes. Passenger creates a green thread on Linux's native Ruby thread for each worker process. Each green thread is executed using context-switching so that blocking operations on one Passenger worker process do not prevent that other worker process from being executed.
Am I on the right track?
Thanks for your help!

The application instances don't "share the native Ruby process". An application instance is a Ruby process (or a Node.js process, or a Python process, depending on what language your app is written in), and is also the same as a "Passenger worker process". It also has got nothing to do with threading.

Related

Why rails 5 using puma instead of webrick for development purpose?

I tried to find out the difference between the Puma and Webrick, but didn't get it or satisfied with it.
So could any one please share information regarding it.
By default WEBrick is single threaded, single process. This means that if two requests come in at the same time, the second must wait for the first to finish.
The most efficient way to tackle slow I/O is multithreading. A worker process spawns several worker threads inside of it. Each request is handled by one of those threads, but when it pauses for I/O - like waiting on a db query - another thread starts its work. This rapid back & forth makes best use of your RAM limitations, and keeps your CPU busy.
So, multithreading is achieved using Puma and that is why it is used as a default App Server in Rails App.
This is a question for Ruby on Rails developers rather than broad audience, because I don't understand reasons any other that putting development environment closer to production where Puma is a solid choice.
To correct the current answer however, I must say that Webrick is, and always has been, a multi-threaded web server. It now ships with Ruby language (and also a rubygem is available). And it is definitely good enough to serve Rails applications for development or for lower-scale production environments.
On the other hand it is not as configurable as other web servers like Puma. Also it is based on the old-school new thread per request design. This can be a problem under heavy load which can lead to too many threads created. Modern web servers solve this by using thread pools, worker processes or combination of the two or other techniques. This includes Puma, however for development spawning a new thread per request is totally fine.
I have no hard feelings for any of the two, both are great Ruby web servers and in our project we actually use them both in production. Anyway, if you like using Webrick for RoR development, you indeed can still use it:
rails server webrick
Rails 6.1 Minor update:
rails server -u webrick [-p NNNN]

Apache+Phusion Passenger+Ruby on Rails synchronization

I am running Apache, Phusion Passenger, Ruby on Rails website. I have a critical section that I can only allow 1 execution at the same time. I noticed that phusion passenger uses fork() to spawn the processes for each request according to this documentation. Since they are individual process, I noticed that all my Mutex.synchronize do not work. I added some loggings inside the critical section, and I still noticed logs being printed interleaving.
I tried all these:
Mutex.synchronize
Thread.exclusive
Monitor.synchronize
None of them works.
How do you do synchronization in this case? Thanks a lot.
Mutex, Monitor and Thread.exclusive only work within a single process. For inter-process synchronization, use lock files. See File#flock.

RabbitMQ with EventMachine and Rails

we are currently planning a rails 3.2.2 application where we use RabbitMQ. We would like to run several kind of workers (and several instances of a worker) to process messages from different queues. The workers are written in ruby and are laying in the lib directory of the rails app.
Some of the workers needs the rails framework (active record, active model...) and some of them don't. The first worker should be called every minute to check if updates are available. The other workers should process the messages from their queues when messages (which are send by the first worker) are present and do some (time consuming) stuff with it.
So far, so good. My problem is, that I only have little experiences with messaging systems like RabbitMQ and no experiences with the rails interaction between them. So I'm wondering what the best practices are to get the two playing with each other. Here are my requirements again:
Rails 3.2.2 app
RabbitMQ
Several kind of workers
Several instances of one worker
Control the amount of workers out of rails
Workers are doing time consuming tasks, so they have to be async
Only a few workers needs the rails framework. The others are just ruby files with some dependencies like Net or File
I was looking for some solution and came up with two possibilities:
Using amqp with EventMachine in a new thread
Of course, I don't want my rails app to be blocked when a new worker is created. The worker should run in another thread and do its work asynchronously. And furthermore, it should not start a new instance of my rails application. It should only require the things the worker needs.
But in some articles they say that there are some issues with Passenger. And another fact that I don't like is, that we are using webbrick for development and we ought to include workarounds for that too. It would be possible to switch to another webserver like thin, but I don't have any experience with that either.
Using some kind of daemonizing
Maybe its possible to run workers as a daemon, but I don't know how much overhead this would come up with, or how I can control the amount of workers.
Hope someone can advise a good solution for that (and I hope I made myself clear ;)
It seems to me that AMQP is a big shot to kill your problem. Have you tried to use Resque? The backed Redis database has some neat features (like publish/subscribe and blocking list pop) which make it very interesting as a message queue, and Resque is very easy to use in any Rails app.
The workers are daemonized, and you decide which worker of your pool listens to which queue, so you can scale each type of job as needed.
Using EM reactor inside a request/response cycle is not recommended, because it may conflict with an existing event loop (for instance if your app is served by thin), in any case you have to configure it specifically for your web server, OTOS it may be interesting to have an evented queue consumer, if your jobs have blocking IO and are not processor-bound.
If you still want to do it with AMQP, see Starting the event loop and connecting in Web applications and configure for your web server accordingly. Or use bunny to push synchronously in the queue (and whichever job consumer you deam useflu, like workling for instance)
we are running slightly different -- but similar technology stack.
daemon kit is used for eventmachine side of the system... no rails, but shared models (mongomapper & mongodb). EM is pulling messages off the queues, and doing whatever logic is required (we have ruleby in the mix, but if-then-else works too).
mulesoft ESB is our outward-facing message receiver and sender that helps us deal with the HL7/MLLP world. But in v1 of the app, we used some java code in ActiveMQ to manage HL7 messages.
the rails app then just serves up stuff for the user to see -- again, using the shared models.

Rails keeps being rebooted in production Passenger

I'm running an application that kicks off a Rufus Scheduler process in an initializer. The application is running with Passenger in production and I've noticed a couple weird behaviors:
First, in order to restart the server and make sure the initializer gets run, you have to both touch tmp/restart.txt and load the app in a browser. At that point, the initializer fires. The horrible thing is that if you only do the touch, the processes scheduled by Rufus get reset and aren't rescheduled until you load the app in a browser.
This alone I can deal with. But this leads to the second problem: I'll notice that the scheduled process hasn't run, so I load a page and suddenly the log file is telling me that it's running the initializers as if I'd rebooted. So, at some point, Passenger is randomly rebooting as if I'd touched tmp/restart.txt and wiping out my scheduled processes.
I have an incredibly poor understanding of Passenger and Rails's integration, so I don't know whether this occasional rebooting is aberrant or all part of the architecture. Can anyone offer any wisdom on this situation?
What you describe is the way Passenger works. It spawns new instances of the application when traffic warrants them, and shuts them down after periods of inactivity to free resources.
You should read the Passenger documentation, particularly the Resource Control and Optimization section. There are settings which can prevent the application from being shut down by Passenger, if that is what you want.
Using the PassengerPoolIdleTime setting, you could keep at least one process running, but you'll almost certainly want Passenger to start up other instances of the app as necessary. This thread on the Rufus Scheduler Google Group mentions using lock files to prevent more than one process from starting the scheduler, that may be useful to you.

How does Phusion Passenger reuse threads and processes?

I am setting up an Apache2 webserver running multiple Ruby on Rails web applications with Phusion Passenger. I know that Passenger spawns Ruby processes for handling requests. I have the following questions:
If more than one request has to be handled at the same time, will Passenger spawn multiple processes or multiple (Ruby) threads? How do I configure it so it always spawns single-threaded processes?
If I have two Rails applications, imagine that a request for app A goes to process 1, then later request for app B arrives. Is it possible that process 1 will handle this request as well? When and how is this possible? In other words, is one process allowed to handle requests for multiple Rails applications?
I have the same Rails application exported in multiple URLs and multiple virtual hosts (such as http:// and https://). Will the same process be able to serve different virtual hosts? (The answer to this seems to be yes, I've set a global variable in answering a request to virtual host A, and I was able to retrieve the value in virtual host B.)
Generally speaking, Passenger spawns new processes by forking an ApplicationSpawner, which has the framework and application code pre-loaded into memory, or a FrameworkSpawner, which just has the framework code.
Passenger, as far as I know, doesn't deal in threads. Instead, as the load increases on an application, it will fork that Application's ApplicationSpawner and initialize another instance. When load decreases, one or more application instances are killed off.
If Passenger is configured in a certain way (I believe by choosing the "smart" spawn method), it will create a FrameworkSpawner, which loads the rails code, but no application code, which can then be forked to load and application using that version of Rails.
So to answer your questions:
It will serve them sequentially, then spawn additional processes if it decides the load is high enough.
No. One process can only belong to a single Rails Application.
I'm kind of sketchy on this one, but your experiment makes sense. Passenger should be smart enough to figure out that even though it's running from different places in the server config, you're talking about the same application. It's probably based on the application's filesystem path.
EDIT: I went and read up on this a bit. Turns out I was mostly right, but the technical details were a bit off. See the Passenger documentation
Yup, Burke is right. In case of the third question, Phusion Passenger recognizes applications by their application root path. So even if you have two virtual hosts, if they both point to the same DocumentRoot then Phusion Passenger will think that they're the same app.

Resources