I have a running RoR site served by Unicorn. The Unicorn master process spawns 10 workers and manages them well, but the workers sometimes start spawning threads internally and never kill them... this leads to memory leaks and eventually takes the server down.
I worked around it with a cron script that restarts Unicorn every 10 minutes, but that's a really bad solution. Any ideas?
Unicorn (4.6.1) configuration files: https://gist.github.com/907th/4995323
Look into using Monit (http://mmonit.com/monit/) to monitor Unicorn and keep it in check. Watch Ryan Bates' wonderful video on the subject:* http://railscasts.com/episodes/375-monit
*requires a subscription but it's well worth the paltry $9 he's asking.
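A Monit rule for the memory problem described above might look like this (the paths, PID file location, and 300 MB threshold are all assumptions — adjust them to your deployment):

```
check process unicorn with pidfile /var/www/app/shared/pids/unicorn.pid
  start program = "/bin/sh -c 'cd /var/www/app/current && bundle exec unicorn -c config/unicorn.rb -E production -D'"
  stop program  = "/bin/sh -c 'kill -QUIT `cat /var/www/app/shared/pids/unicorn.pid`'"
  if totalmem > 300.0 MB for 3 cycles then restart
```

Requiring the limit to hold for several cycles avoids restarting on a momentary spike, and `kill -QUIT` gives Unicorn a chance to shut down gracefully instead of being killed outright.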
Related
Short version
Memory leak and R14 errors on Heroku with Rails + Unicorn when there are no requests at all. No problem locally, with WEBrick, or when the server is under load.
Long version
I have a Rails 4 application running on Heroku. When there is no load on the server, the memory consumption reported by Heroku goes up around 0.5 MB per minute. After a few hours, R14 errors start to appear in the logs, and a while later the process stops responding and Heroku kills it.
I can see the increasing memory usage in New Relic as well.
02:03:00 heroku[web.1]: source=web.1 sample#load_avg_1m=0.00 sample#load_avg_5m=0.00 sample#load_avg_15m=0.00
02:03:00 heroku[web.1]: source=web.1 sample#memory_total=533.19MB sample#memory_rss=511.88MB sample#memory_cache=0.00MB sample#memory_swap=21.30MB sample#memory_pgpgin=1034783pages sample#memory_pgpgout=903740pages
02:03:00 heroku[web.1]: Process running mem=533M(104.1%)
Environment:
Ruby 2.1.4 + Rails 4.1.7 + Unicorn + Heroku with 1 web dyno
What I tried so far
1. Decreased the number of Unicorn workers from the default 3 to 2. It did not help, just made the memory leak a little bit slower.
2. Disabled background workers (removed them from the Procfile). No effect.
3. Installed Unicorn Worker Killer. It did not help because there are no requests going on, so the worker killer never kicks in (is there a time-based worker killer?).
4. Used ApacheBench to stress-test the server. Memory usage was constant, around 190 MB/process according to New Relic. There was no increase. It seems the memory leak only happens when there is no server load.
5. Removed Unicorn and used the default WEBrick. Memory usage became constant, around 208 MB/instance according to New Relic!
6. Tried to reproduce the issue on my local machine by starting the server in production mode with RAILS_ENV=production foreman start. There was no memory leak in this case.
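As for the time-based worker killer asked about above: unicorn-worker-killer only checks limits when requests come in, but a crude time-based recycle can be sketched directly in the Unicorn config (a hypothetical sketch, not a feature of unicorn-worker-killer; the 30-minute interval is arbitrary):

```ruby
# config/unicorn.rb — hypothetical time-based worker recycling
after_fork do |server, worker|
  Thread.new do
    sleep 30 * 60                       # recycle this worker every 30 minutes
    Process.kill("QUIT", Process.pid)   # QUIT lets the worker finish its in-flight request
  end
end
```

When a worker exits this way, the master notices and forks a fresh replacement, so capacity recovers on its own.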
Based on point 5, I think the memory leak is related to Unicorn but somehow only happens on Heroku.
How can I track down where the problem is?
Thanks! Any help is appreciated.
I am able to get Sidekiq scheduler working locally. The last obstacle in my way is how to deploy this to a production app on passenger. Can someone point me in the right direction on how to run Sidekiq continuously on passenger.
Appreciate it.
Passenger is an Apache/nginx module for running Rails/Rack apps.
Sidekiq is a threaded background worker queue that is usually run with JRuby in production.
You do not run Sidekiq through Passenger.
Rather, just configure Passenger to run and serve your app as needed. Then you can start Sidekiq and have it poll Redis for work. It is highly recommended that you use either JRuby or Rubinius so you take full advantage of Sidekiq's threaded nature.
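Concretely, that usually means starting a Sidekiq process next to Passenger under whatever supervisor you use; the paths and flags below are examples (`-d` and `-L` daemonize and set the log file in Sidekiq versions of this era):

```
cd /var/www/app/current
RAILS_ENV=production bundle exec sidekiq -C config/sidekiq.yml -d -L log/sidekiq.log
```

A process monitor (Monit, God, an init script) should own this command so Sidekiq is restarted if it dies — Passenger knows nothing about it.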
For more details on deploying Sidekiq, refer to the wiki:
https://github.com/mperham/sidekiq/wiki/Deployment
For more details on configuring Passenger, refer to its docs (for either Apache or nginx):
https://www.phusionpassenger.com/support#documentation
Update: from the creator of Sidekiq there is a library called Girl Friday. This library adds an asynchronous job queue that runs inline with your Rails application (or other Rack app). This option can greatly simplify deployment and save money!
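The in-process idea behind Girl Friday can be illustrated with nothing but the Ruby standard library (a toy sketch of the concept, not Girl Friday's actual API): jobs are pushed onto a queue and drained by a background thread living inside the web process, so there is no separate worker deployment to manage.

```ruby
require "thread"

jobs      = Queue.new
processed = []

# Background worker thread inside the same process as the web app.
worker = Thread.new do
  while (msg = jobs.pop)          # pop blocks until a job (or the nil sentinel) arrives
    processed << "sent: #{msg}"   # stand-in for real work, e.g. delivering an email
  end
end

jobs << "signup-email"
jobs << "password-reset"
jobs << nil                       # sentinel: no more jobs, let the thread exit
worker.join

puts processed.inspect            # => ["sent: signup-email", "sent: password-reset"]
```

The trade-off is the same one Girl Friday makes: jobs live only in the process's memory, so a restart loses anything still queued — fine for cheap, tolerable-to-lose work, not for critical jobs.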
We're doing zero-downtime deploys for our Rails app using Unicorn and the usual zero-downtime deploy setup (we don't use the exact setup from the example, but ours is similar).
This used to work, but during our upgrade to Rails 3.2 we ran into a weird problem:
Old unicorn master gets USR2
New master is spawned
However, the old master never terminates and does not seem to react to QUIT at all.
The old master will still react to WINCH and shut down all its workers, and it can be shut down with TERM, but graceful shutdown just doesn't work.
We do attempt to close the database connection from the master process, but are not completely sure whether it still holds Redis connections (and whether that would cause a problem).
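For reference, the usual shape of that cleanup in the Unicorn config — the `$redis` global is an assumption about how an app might name its connection, so adapt it to yours:

```ruby
# config/unicorn.rb — sketch; names are assumptions
before_fork do |server, worker|
  # The master should hold no live connections across the fork.
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
  $redis.quit if defined?($redis) && $redis          # hypothetical Redis handle
end

after_fork do |server, worker|
  # Each worker re-establishes its own connections after forking.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
  $redis = Redis.new if defined?(Redis)              # hypothetical
end
```

Forked workers sharing a socket inherited from the master is a classic source of subtle bugs, which is why each worker opens fresh connections in `after_fork`.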
Just for the people who run into the same problem: someone had put a gist into our code that cleverly trapped the QUIT signal... sigh. So it had nothing to do with the Unicorn setup at all.
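The failure mode is easy to reproduce with plain Ruby, no Unicorn involved: any code that installs its own QUIT handler silently replaces the previous one, so the process acknowledges the signal but the original shutdown behavior never runs.

```ruby
# A stray Signal.trap("QUIT") — e.g. hidden in a vendored gist — replaces
# whatever handler the process (in production: Unicorn's master) installed.
handled = false
Signal.trap("QUIT") { handled = true }   # swallows QUIT; previous action is gone

Process.kill("QUIT", Process.pid)        # send ourselves the signal
sleep 0.1                                # give the handler a moment to run

puts handled                             # => true: QUIT was absorbed, process lives on
```

Note that `Signal.trap` returns the previous handler, so a well-behaved library would capture it and chain to it instead of discarding it.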
With Unicorn, you can restart and reload a Rails app with kill -USR2 [master process], which doesn't kill the process immediately, but starts a new master process + slave processes in the background. When the new master is ready, you can shut off the old master with kill -QUIT. This lets you restart your website without having any visitors notice a slowdown in request handling.
But with Passenger, you restart the Rails app with touch tmp/restart.txt, which as far as I can tell, causes the Rails app to become unresponsive for the few seconds it takes to restart the Rails application.
Is there a way to use Passenger, but also have the Rails app restart seamlessly?
Rolling restarts are available in Phusion Passenger Enterprise.
This is the "licensed version" klochner talked about, but it wasn't released until August. Phusion Passenger Enterprise fully automates rolling restarts (Unicorn requires some manual scripting to make rolling restarts behave in a good way). It also includes a bunch of other useful features such as deployment error resistance, live IRB console, etc.
No. [now yes - see hongli's response]
You're asking for rolling restarts, where the new server processes are brought up before the old ones are killed. Passenger (the free version) won't drop requests, but they will get queued and delayed whenever you deploy.
Rolling restarts have supposedly already been implemented and are available in the licensed version, but have not yet been released for the free version. I've been unable to figure out how to get the licensed version.
Follow this google groups thread for more info:
https://groups.google.com/forum/#!msg/phusion-passenger/hNvU-ZE7_WY/gOF9XWmhHy0J
You could try running two standalone passenger processes and manually bring one down while the other stays up, but I don't think that's the answer you were looking for.
I'm currently moving my application from a Linode setup to EC2. Redis is currently installed on a remote instance with various worker instances interacting with the queue. That's all going fantastically.
My problem is with the amount of time it takes for a worker to be instantiated, and with slow forking. Starting a worker usually takes between 30 seconds and a minute (from god.rb starting the worker rake task to the worker actively starting work on the queue). I could live with that, but I haven't experienced such a wait on my current Linode production box, so I believe it's a symptom of a bigger problem. The next issue is that jobs that took a second or less in my previous environment now seem to take about 5 to 10 times longer.
I'm assuming this must be some sort of issue with my Ubuntu install on EC2? One notable difference is that I'm running REE 1.8.7-2010.01 in my new setup, and REE 1.8.6 on the old Linode boxes.
Anyone else experienced these issues?
It turns out I had overestimated the CPU power of an EC2 small instance. Moved my workers to a large instance and all is well.