Phusion Passenger version: 5.0.29
I've read the documentation about Passenger's OOB (Out-of-Band) feature and would like to use it to decide out-of-band whether a process should exit or not. If the process has to exit, then at the end of the OOB work the process raises SystemExit.
We've managed to get this working: the process exits, and Passenger later spins up a new process to handle new incoming requests. But we're seeing occasional 502s, with the following lines in the Passenger log:
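Roughly, the middleware side of our setup looks like this (a simplified sketch; the request threshold and the exit decision are placeholders of ours, while the !~Request-OOB-Work header and the :oob_work event come from Passenger's Ruby support code):

class OobExitCheck
  MAX_REQUESTS = 1000  # illustrative threshold, not a recommendation

  def initialize(app)
    @app = app
    @requests = 0
    @mutex = Mutex.new

    # Passenger runs this block between requests, while the process is idle.
    PhusionPassenger.on_event(:oob_work) do
      raise SystemExit if should_exit?
    end if defined?(PhusionPassenger)
  end

  def call(env)
    status, headers, body = @app.call(env)
    @mutex.synchronize do
      @requests += 1
      # Ask Passenger to schedule out-of-band work for this process.
      headers['!~Request-OOB-Work'] = 'true' if @requests >= MAX_REQUESTS
    end
    [status, headers, body]
  end

  private

  def should_exit?
    true  # placeholder for the real out-of-band decision
  end
end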
[ 2019-03-27 22:25:13.3855 31726/7f78b6c02700 age/Cor/Con/InternalUtils.cpp:112 ]: [Client 1-10] Sending
502 response: application did not send a complete response
[ 2019-03-27 22:25:13.3859 31726/7f78b6201700 age/Cor/CoreMain.cpp:819 ]: Checking whether to disconnect
long-running connections for process 10334, application agent
App 16402 stdout:
App 16441 stdout:
App 16464 stdout:
[ 2019-03-27 22:28:05.0320 31726/7f78ba9ff700 age/Cor/App/Poo/AnalyticsCollection.cpp:102 ]: Process (pid=16365, group=agent) no longer exists! Detaching it from the pool.
Is the above behavior due to a race condition between the request handler forwarding the request to the process and the process exiting? Is Passenger designed to handle this scenario? Is there a workaround or solution for this problem?
Thank you!
Looks like we need to run "passenger-config detach-process [PID]" for graceful termination.
Improved process termination mechanism
If you wish to terminate a specific application process -- maybe because it's misbehaving -- then you can do that simply by killing it with the kill command. Unfortunately this method has a few drawbacks:
Any requests which the process was handling will be forcefully aborted, causing error responses to be returned to the clients.
For a short amount of time, new requests may be routed to that process. These requests will receive error responses.
In Passenger 5 we've introduced a new, graceful mechanism for terminating application processes: passenger-config detach-process. This command removes the process from the load balancing list, drains all existing requests, then terminates it cleanly.
You can learn more about this tool by running:
passenger-config detach-process --help
Source: https://blog.phusion.nl/2015/03/04/whats-new-in-passenger-5-part-2-better-logging-better-restarting-better-websockets-and-more/
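Given that, one way to rewire the out-of-band handler from the question is to let detach-process do the draining instead of raising SystemExit. A sketch, assuming passenger-config is on the PATH of the application process and should_exit? is the decision helper from the middleware sketched above:

# Inside the application's Passenger hook; a sketch, not tested against 5.0.29.
PhusionPassenger.on_event(:oob_work) do
  if should_exit?
    # Removes this process from the load balancing list, drains its
    # in-flight requests, then terminates it cleanly.
    system("passenger-config", "detach-process", Process.pid.to_s)
  end
end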
Related
I've been running a Rails app in Elastic Beanstalk for quite a long time and haven't faced this issue until now. My server keeps restarting, with logs such as:
[4412] ! Terminating timed out worker: 8646
[4412] ! Out-of-sync worker list, no 8646 worker
[4412] ! Out-of-sync worker list, no 8646 worker
In the Elastic Beanstalk log, I am getting:
Environment health has transitioned from Ok to Warning. 50.0 % of the requests to the ELB are failing with HTTP 5xx. Insufficient request rate (3.0 requests/min) to determine application health (5 minutes ago). 1 out of 1 instances are impacted. See instance health for details.
My instance is a t2.medium (2 CPU cores). Is this because of heavy load on the server? Do I need to upgrade my instance? I am totally stuck.
FYI: I included Puma only by adding it to the Gemfile, since Rails 4+ can handle the rest.
Please share your opinion on this. Thanks.
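For what it's worth, I don't have a config/puma.rb yet; from what I've read, the relevant knobs would look something like this (values are illustrative guesses, not my actual settings):

# config/puma.rb -- illustrative sketch only; I currently rely on Puma's defaults
workers 2          # one worker per CPU core on a t2.medium is a common starting point
threads 1, 5       # min/max threads per worker
worker_timeout 60  # the "Terminating timed out worker" message comes from this check
preload_app!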
When I start the Phusion Passenger Standalone web server (version 5.0.2), I see the following error in the log (even though everything works fine otherwise):
ServerKit/Server.h:892 ]: [Client 1-1] Disconnecting client with error: client socket write error: Broken pipe (errno=32)
Any idea what might be causing it?
Note: I start the server with foreman start and I stop it with control-c.
Passenger author here. Actually, the issue maxd linked to has got nothing to do with it.
The "Disconnecting client with error: client socket write error: Broken pipe" is a harmless informational message. It's quite normal, but I forgot to give it a lower logging level. I will do that in the next release. You can safely ignore this message. Nothing bad is going on.
The application servers used by Ruby web applications that I know of have the concept of worker processes. For example, Unicorn sets this in the unicorn.rb configuration file, and for Mongrel it is called servers, usually set in your mongrel_cluster.yml file.
My two questions about it:
1) Does every worker/server act as a web server and spawn a process/thread/fiber each time it receives a request, or does it block a new request when another is already being handled?
2) Does this differ from application server to application server? (Like Unicorn, Mongrel, Thin, WEBrick...)
This is different from app server to app server.
Mongrel (at least as of a few years ago) would have several worker processes, and you would use something like Apache to load balance between the worker processes; each would listen on a different port. And each Mongrel worker had its own queue of requests, so if it was busy when Apache gave it a new request, the new request would go in the queue until that worker finished its current request. Occasionally, we would see problems where a very long request (generating a report) would have other requests pile up behind it, even if other Mongrel workers were much less busy.
Unicorn has a master process and just needs to listen on one port, or a unix socket, and uses only one request queue. That master process only assigns requests to worker processes as they become available, so the problem we had with Mongrel is much less of an issue. If one worker takes a really long time, it won't have requests backing up behind it specifically, it just won't be available to help with the master queue of requests until it finishes its report or whatever the big request is.
WEBrick shouldn't even be considered; it's designed to run as just one worker in development, reloading everything all the time.
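To make that concrete, a minimal unicorn.rb for the model described above might look like this (paths and numbers are illustrative):

# unicorn.rb -- minimal sketch of the master/worker model described above
worker_processes 4                      # forked workers sharing one listen socket
listen "/tmp/app.sock", :backlog => 64  # one queue; free workers pick requests off it
timeout 30                              # the master kills a worker stuck longer than this
preload_app true                        # load the app in the master, then fork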
Off the top of my head, so don't take this as "truth":
Ruby (MRI) servers:
Unicorn, Passenger and Mongrel all use 'workers', which are separate processes. All of these workers are started when you launch the master process, and they persist until the master process exits. If you have 10 workers and they are all handling requests, then request 11 will be blocked waiting for one of them to complete.
WEBrick only runs a single process as far as I know, so request 2 would be blocked until request 1 finishes.
Thin: I believe it uses evented I/O to handle HTTP, but it is still a single-process server.
JRuby servers:
Trinidad and TorqueBox are multi-threaded and run on the JVM.
See also Puma: multi-threaded, for use with JRuby or Rubinius.
I think GitHub best explains unicorn in their (old, but valid) blog post https://github.com/blog/517-unicorn.
I think it puts backlog requests in a queue.
At work we're running some high traffic sites in rails. We often get a problem with the following being spammed in the nginx error log:
2011/05/24 11:20:08 [error] 90248#0: *468577825 connect() to unix:/app_path/production/shared/system/unicorn.sock failed (61: Connection refused) while connecting to upstream
Our setup is nginx on the frontend server (load balancing), and unicorn on our 4 app servers. Each unicorn is running with 8 workers. The setup is very similar to the one GitHub uses.
Most of our content is cached, and when the request hits nginx it looks for the page in memcached and serves that it if can find it - otherwise the request goes to rails.
I can solve the above issue - SOMETIMES - by doing a pkill of the unicorn processes on the servers followed by a:
cap production unicorn:check (removing all the pid's)
cap production unicorn:start
Do you guys have any clue as to how I can debug this issue? We don't have any significantly high load on our database server when these problems occur.
Something killed your unicorn process on one of the servers, or it timed out. Or you have an old app server in your upstream app_server { } block that is no longer valid. Nginx will retry it from time to time. The default is to re-try another upstream if it gets a connection error, so hopefully your clients didn't notice anything.
I don't think this is an nginx issue for me; restarting nginx didn't help. It seems to be Gunicorn... A quick and dirty way to avoid this is to recycle the Gunicorn instances when the system is not being used, say at 1 AM, if that is an acceptable maintenance window. I run Gunicorn as a service that will come back up if killed, so a pkill script takes care of the recycle/respawn:
# Upstart job for the Gunicorn wrapper; respawn it whenever it dies
start on runlevel [2345]
stop on runlevel [06]
respawn
respawn limit 10 5   # give up if it respawns 10 times within 5 seconds
exec /var/web/proj/server.sh
I am starting to wonder if this is at all related to memory allocation. I have MongoDB running on the same system and it reserves all the memory for itself but it is supposed to yield if other applications require more memory.
Other things worth a try are getting rid of eventlet or other dependent modules when running Gunicorn. uWSGI can also be used as an alternative to Gunicorn.
My Ruby on Rails app is set up with the usual pack of Mongrels behind Apache. We've noticed that our Mongrel web server memory usage can grow quite large on certain operations, and we'd really like to be able to dynamically do a graceful restart of selected Mongrel processes at any time.
However, for reasons I won't go into here it can sometimes be very important that we don't interrupt a Mongrel while it is servicing a request, so I assume a simple process kill isn't the answer.
Ideally, I want to send the Mongrel a signal that says "finish whatever you're doing and then quit before accepting any more connections".
Is there a standard technique or best practice for this?
I've done a little more investigation into the Mongrel source, and it turns out that Mongrel installs a signal handler to catch a standard process kill (TERM) and do a graceful shutdown, so I don't need a special procedure after all.
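So a graceful stop is just a plain TERM, e.g. (the pid file path is illustrative, following the usual mongrel_cluster layout):

# Send TERM for a graceful Mongrel shutdown; the path is illustrative
pid = File.read("/var/www/apps/fooapp/current/tmp/pids/mongrel.8000.pid").to_i
Process.kill("TERM", pid)  # Mongrel traps TERM, drains in-flight requests, then exits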
You can see this working from the log output you get when killing a Mongrel while it's processing a request. For example:
** TERM signal received.
Thu Aug 28 00:52:35 +0000 2008: Reaping 2 threads for slow workers because of 'shutdown'
Waiting for 2 requests to finish, could take 60 seconds.
Thu Aug 28 00:52:41 +0000 2008: Reaping 2 threads for slow workers because of 'shutdown'
Waiting for 2 requests to finish, could take 60 seconds.
Thu Aug 28 00:52:43 +0000 2008 (13051) Rendering layoutfalsecontent_typetext/htmlactionindex within layouts/application
Look at using monit. You can dynamically restart mongrel based on memory or CPU usage. Here's a line from a config file that I wrote for a client of mine.
check process mongrel-8000 with pidfile /var/www/apps/fooapp/current/tmp/pids/mongrel.8000.pid
start program = "/usr/local/bin/mongrel_rails cluster::start --only 8000"
stop program = "/usr/local/bin/mongrel_rails cluster::stop --only 8000"
if totalmem is greater than 150.0 MB for 5 cycles then restart # eating up memory?
if cpu is greater than 50% for 8 cycles then alert # send an email to admin
if cpu is greater than 80% for 5 cycles then restart # hung process?
if loadavg(5min) greater than 10 for 3 cycles then restart # bad, bad, bad
if 3 restarts within 5 cycles then timeout # something is wrong, call the sys-admin
if failed host 192.168.106.53 port 8000 protocol http request /monit_stub
with timeout 10 seconds
then restart
group mongrel
You'd then repeat this configuration for all of your mongrel cluster instances. The monit_stub line is just an empty file that monit tries to download. If it can't, it tries to restart the instance as well.
Note: the resource monitoring seems not to work on OS X with the Darwin kernel.
A better question is how to keep your app from consuming so much memory that you have to reboot Mongrels from time to time.
www.modrails.com (Phusion Passenger) reduced our memory footprint significantly.
Boggy:
If you have one process running, it will gracefully shut down (service all the requests in its queue, which should only be 1 if you are using proper load balancing). The problem is that you can't start the new server until the old one dies, so your users will queue up in the load balancer.

What I've found successful is a 'cascade' or rolling restart of the mongrels. Instead of stopping them all and starting them all (thereby queuing requests until the one mongrel is done, stopped, restarted and accepting connections), you can stop then start each mongrel sequentially, blocking the call to restart the next mongrel until the previous one is back up (use a real HTTP check to a /status controller).

As your mongrels roll, only one at a time is down, and you are serving across two code bases - if you can't do this you should throw up a maintenance page for a minute. You should be able to automate this with Capistrano or whatever your deploy tool is.
So I have 3 tasks:
cap deploy - which does the traditional restart-all-at-the-same-time method, with a hook that puts up a maintenance page and then takes it down after an HTTP check.
cap deploy:rolling - which does this cascade across the machine (I pull from an iClassify to know how many mongrels are on the given machine) without a maintenance page; see the sketch after this list.
cap deploy:migrations - which does maintenance page + migrations, since it's usually a bad idea to run migrations 'live'.
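The rolling task boils down to something like this (a Capistrano 2-style sketch; the port list and the /status check URL are illustrative, since the real task pulls the mongrel list from iClassify):

# Sketch only: roll the mongrels one at a time, waiting for each to come back
namespace :mongrel do
  task :rolling_restart, :roles => :app do
    [8000, 8001, 8002].each do |port|
      run "/usr/local/bin/mongrel_rails cluster::stop --only #{port}"
      run "/usr/local/bin/mongrel_rails cluster::start --only #{port}"
      # Block until this instance answers a real HTTP check before moving on.
      run "until curl -sf http://localhost:#{port}/status >/dev/null; do sleep 1; done"
    end
  end
end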
Try using:
mongrel_cluster_ctl stop
You can also use:
mongrel_cluster_ctl restart
Got a question: what happens when /usr/local/bin/mongrel_rails cluster::start --only 8000 is triggered?
Are all of the requests being served by this particular process allowed to run to completion, or are they aborted?
I'm curious whether this whole start/restart thing can be done without affecting the end users...