We're doing zero downtime deploys for our Rails app using unicorn and the usual zero downtime deploy setup (we don't use the exact setup from the example, but ours is similar).
This used to work, but during our upgrade to Rails 3.2 we ran into a weird problem:
Old unicorn master gets USR2
New master is spawned
However, the old master never terminates and does not seem to react to QUIT at all.
The old master will still react to WINCH and shut down all workers, and it can be shut down with TERM - but graceful shutdown just doesn't work.
We do attempt to close the database connection from the master processes, but are not completely sure whether it still holds Redis connections (and whether that would cause a problem).
Just for the people who run into the same problem: Someone put this gist into our code, which cleverly trapped the QUIT signal... sigh. So it had nothing to do with the unicorn setup at all.
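For anyone wondering how a stray trap can do this: in Ruby, Signal.trap replaces whatever handler was installed before, so a QUIT trap added by application code can silently displace the one the master relies on. A minimal sketch of a trap that chains to the previous handler instead of replacing it follows. The helper name chain_trap is made up for illustration and is not part of unicorn or of the gist mentioned above.

```ruby
# Hypothetical helper: install a handler for `signal` that runs the given
# block and then hands the signal on to the handler that was there before.
def chain_trap(signal, &block)
  previous = nil
  previous = Signal.trap(signal) do |signo|
    block.call(signo)
    # Signal.trap returns a Proc when the old handler was a block, or a
    # String such as "DEFAULT" or "IGNORE" when no Ruby handler was set.
    previous.call(signo) if previous.respond_to?(:call)
  end
end

# Usage (assumed): log the signal, but keep the existing QUIT behaviour.
# chain_trap("QUIT") { warn "got QUIT" }
```

Note that if the previous handler was "DEFAULT", this sketch does not re-run the default action; it only forwards to handlers written in Ruby.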
Related
I have a rails 4.1 application with Sidekiq 3.3.2 in production mode with default config.
Everything works fine, but sometimes I see in the Sidekiq admin panel that a Sidekiq process is not listed under "busy" and does not appear in the list of processes.
It looks as if the Sidekiq process has failed.
There are no error entries in the log.
How can I make that stable?
This is a known problem. A heavily loaded Sidekiq worker sometimes disappears from the busy list but keeps working fine.
I know the issue is marked as fixed, but it seems to appear again and again. Just don't worry about it and use ps aux | grep sidekiq to check the running Sidekiq workers in this case.
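If you would rather do that check from a script, here is a rough sketch that filters ps output in Ruby. It assumes the input comes from something like ps -eo pid,args, and the helper name sidekiq_processes is made up for this example. It only matches on the word "sidekiq" in the command line, so treat it as a starting point rather than a health check.

```ruby
# Hypothetical helper: pick the Sidekiq processes out of `ps -eo pid,args`
# style output (one "PID COMMAND" pair per line).
def sidekiq_processes(ps_output)
  ps_output.each_line.each_with_object([]) do |line, found|
    pid, command = line.strip.split(/\s+/, 2)
    # Skip the header line and anything without a numeric PID.
    next unless /\A\d+\z/.match?(pid.to_s) && command
    next unless command.include?("sidekiq")
    # Skip the grep process itself if the output was piped through grep.
    next if command.start_with?("grep")
    found << { pid: pid.to_i, command: command }
  end
end

# Usage (assumed):
# sidekiq_processes(`ps -eo pid,args`).each { |p| puts p[:pid] }
```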
I have a Reporter worker for Resque in my Rails 3.2.8 application. I frequently add new reports for users, or fix bugs in existing reports.
Reports are deployed as Ruby modules whose methods are called by the Resque reporter worker.
Every time I deploy new code, I have to restart Resque. During that time, there are often one or more reports in progress that are then killed and left with a status of "Running". Is there a way to get Resque to reload the Ruby modules that it uses to run the reports?
Instead of reloading you could stop the resque workers with a kill -s QUIT. That will cause the workers to finish running their reports before shutting down.
More info on using signals with resque is here, https://github.com/defunkt/resque#signals.
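As a rough sketch of what a deploy script could do with that: send QUIT to each worker, then wait until the processes are gone before starting new ones. The helper names and the timeout values below are assumptions for illustration, and how you collect the worker PIDs (pid files, ps, god, and so on) is up to your setup.

```ruby
# Returns true while a process with this PID still exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # exists, but belongs to another user
end

# Hypothetical helper: send QUIT to every PID and wait up to `timeout`
# seconds for them to exit. Returns the PIDs that are still running.
def quit_and_wait(pids, timeout: 60, poll_interval: 0.5)
  pids.each do |pid|
    begin
      Process.kill("QUIT", pid)
    rescue Errno::ESRCH
      # already gone, nothing to do
    end
  end

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  remaining = pids.dup
  loop do
    remaining = remaining.select { |pid| process_alive?(pid) }
    break if remaining.empty?
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end
  remaining
end
```

If quit_and_wait returns a non-empty list, some reports were still running when the timeout expired, and the script can decide whether to wait longer or give up on the restart.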
On my production server, I'm using Ruby foreman to run multiple processes. I want my application to keep working even if one of the processes goes down. Is there a way to restart that process, or at least not to stop all the other processes when one goes down? I need the solution to be stable enough for production. Is that possible without Upstart? Thanks in advance.
You should not be using foreman itself for production - it is only intended as a development tool. Instead, you can use something like god with my foreman_god gem in production.
Alternatively, you can use foreman to export config files for other process monitoring systems, for example upstart.
You can monitor your foreman process with another program like http://mmonit.com/monit/. But you'll probably find that monitoring a process which itself monitors other processes is kind of strange.
With Unicorn, you can restart and reload a Rails app with kill -USR2 [master process], which doesn't kill the process immediately, but starts a new master process plus worker processes in the background. When the new master is ready, you can shut off the old master with kill -QUIT. This lets you restart your website without any visitors noticing a slowdown in request handling.
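To make the second step concrete, here is a minimal sketch of the "shut off the old master" part in Ruby. It assumes the old master's PID has been written to a file ending in .oldbin next to the normal pid file, which is what Unicorn does after USR2 as far as I know. The helper name quit_old_master and the path in the comment are made up for this example.

```ruby
# Hypothetical helper: send QUIT to the old Unicorn master whose PID is
# stored in `old_pid_file`. Returns true if the signal was sent.
def quit_old_master(old_pid_file)
  return false unless File.exist?(old_pid_file)

  pid = File.read(old_pid_file).to_i
  # Guard against an empty or garbled file: PID 0 would signal our whole
  # process group.
  return false unless pid > 0

  Process.kill("QUIT", pid)
  true
rescue Errno::ENOENT, Errno::ESRCH
  # The file or the process vanished in the meantime: nothing left to do.
  false
end

# Assumed wiring in unicorn.rb (the path is a placeholder):
# before_fork do |server, worker|
#   quit_old_master("/path/to/app/tmp/pids/unicorn.pid.oldbin")
# end
```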
But with Passenger, you restart the Rails app with touch tmp/restart.txt, which as far as I can tell, causes the Rails app to become unresponsive for the few seconds it takes to restart the Rails application.
Is there a way to use Passenger, but also have the Rails app restart seamlessly?
Rolling restarts are available in Phusion Passenger Enterprise.
This is the "licensed version" klochner talked about, but it wasn't released until August. Phusion Passenger Enterprise fully automates rolling restarts (Unicorn requires some manual scripting to make rolling restarts behave in a good way). It also includes a bunch of other useful features such as deployment error resistance, live IRB console, etc.
No. [now yes - see hongli's response]
You're asking for rolling restarts, where the new server processes are brought up before the old ones are killed. Passenger (the free version) won't drop requests, but they will get queued and delayed whenever you deploy.
Rolling restarts have supposedly already been implemented and are available in the licensed version, but not yet released for the free version. I've been unable to figure out how to get the licensed version.
Follow this google groups thread for more info:
https://groups.google.com/forum/#!msg/phusion-passenger/hNvU-ZE7_WY/gOF9XWmhHy0J
You could try running two standalone passenger processes and manually bring one down while the other stays up, but I don't think that's the answer you were looking for.
I'm currently moving my application from a Linode setup to EC2. Redis is currently installed on a remote instance, with various worker instances interacting with the queue. That's all going fantastic.
My problem is with the amount of time it takes for a worker to be 'instantiated', and with slow forking. Starting a worker will usually take between 30 seconds and a minute (from god.rb starting the worker rake task to the worker actively starting work on the queue). I could live with that, but I've not experienced such a wait time on my current Linode production box, so I believe it's one symptom of a bigger problem. The next issue is that jobs that took a second or less in my previous environment now seem to take about 5 to 10 times longer.
I'm assuming this must be some sort of issue with my Ubuntu install on EC2? One notable difference is that I'm running REE 1.8.7-2010.01 in my new setup, and REE 1.8.6 on the old Linode boxes.
Anyone else experienced these issues?
It turns out I had overestimated the CPU power of an EC2 small instance. Moved my workers to a large instance and all is well.