We have a Rails 4.2 application that runs alongisde a sidekiq process for long tasks.
Somehow, in a deploy a few weeks ago something went south (the capistrano deploy process didn't effectively stopped it, wasn't able to figure out why) and there was left an orphaned sidekiq process running that was competing with the current one for jobs on the redis queue. Because this process source became outdated it started giving random results on our application (depending on which process captured the job) and we got a very hard time until we figured this out..
How I can stop this from happenning ever again? I mean, I can ssh into the VPS after every deploy and run ps aux | grep sidekiq to check if there is more than one.. but it's not practical.
Use your init system to manage Sidekiq, not Capistrano.
https://github.com/mperham/sidekiq/wiki/Deployment#running-your-own-process
Related
I'm using an EC2 instance for hosting a rails application. I'm deploying with capistrano and I had already included sidekiq and it's working fine. However, sometimes on deploy, and sometimes sporadically, sidekiq stops running and I don't notice until some tasks that use sidekiq doesn't run.
I could do something on deploy to check that, but if it stops to work eventually after deploy, that would still be a problem.
I would like to know what is the best way, in that scenario, to check periodically if sidekiq is running, and if not to, run it.
I thought of doing a bash script for that, but apparently, when I run sidekiq from command line, it creates another process with a different pid of the one launched by sidekiq... so I think it could get messy.
Any help is appreciated. Thanks!
Learn and use systemd to manage the service.
https://github.com/mperham/sidekiq/wiki/Deployment#running-your-own-process
I have about 30 sidekiq jobs scheduled in the future (let's days 1 in a day for the next 30 days).
I use capistrano for deployment. So I have 5 release directories at anytime. Let's say:
/var/www/release1/ (recent)
/var/www/release2/
/var/www/release3/
/var/www/release4/
/var/www/release5/
Let's say after few days, I make a new release. Now, the previously scheduled jobs are still running from the old code. Is this expected? How can we fix this to ensure that it uses the latest release directory when it starts running rather than when it is scheduled?
I'd just like to contribute with an alternate answer for someone who might get into this situation by other reason.
It happened to me that there was a sidekiq zombie process running. So, even if I would stop sidekiq manually and restart it, I had another sidekiq process hanging running with old code. Therefore, it's a good idea to run unix htop command or ps aux | grep sidek and try to look for zombie processes.
This could be because sidekiq process didn't restart after a successful deployment.
Make sure your deployment process restarts sidekiq and make sure restart actually works, otherwise sidekiq processes are still holding on to old code.
https://github.com/mperham/sidekiq/wiki/Deployment
Situation: I am using Rails + Unicorn, deploying with Capistrano. Sometimes Rails app fails to start in production mode (though it is not the real production, but a staging env). This usually happens due to errors in deploy scripts or configuration (thus usually not detectable by tests). When this happens, unicorn master process kills the worker that failed and spawns a new one, which also fails and so on and so forth. During all that time unicorn consumes lots of CPU and pollutes logs with the same message.
Manual way (not good): Go to your home page to see if it works. Look at the htop. Tail the logs. Kill unicorn manually. Cons: easy to forget. Logs are polluted, CPU is loaded while you are reacting.
Another solution: Use unicorn's preload_app true. This will cause master process to fail fast. Cons: higher memory consumption in happy scenario.
Best practice: - ???
Is there any way to cleverly detect that unicorn master uselessly tries to spawn failing children and stop it?
You have something like "unicorn start" in your Capistrano script right? Make your Capistrano script ping Unicorn right after invoking that command. If Unicorn does not return an expected response within a timeout, then you know that something went wrong, and you can choose to rollback the deploy or performing some other action.
As for how to ping Unicorn, that depends. If you have Unicorn listening on a TCP socket then you can use curl. If you have Unicorn listening on a Unix domain socket then you have to write a little script that connects to it, like this:
require 'socket'
sock = UNIXSocket.new('/path-to-unicorn.sock')
sock.write("HEAD / HTTP/1.0\r\n")
sock.write("Host: www.foo.com\r\n")
sock.write("Connection: close\r\n")
sock.write("\r\n")
if sock.read !~ /something/
exit 1
end
But it sounds like Phusion Passenger Enterprise solves your problem beautifully. It has this feature called "deployment error resistance". When you deploy a new version and Phusion Passenger detects that it cannot spawn any processes for your new codebase, it will stop trying to spawn your new version and keep the processes for the old versions around indefinitely, until you manually give the signal that it's okay to spawn processes for the new version. In the mean time it will log all errors into the log file so that you can analyze the problem.
I would suggest brushing off your bash skills. The functionality you need is already in Unicorn as it leverages the Unix-y master/worker process.
You need a init.d script. Or at the very least godrb or monit. I recommend the init.d script route AND monitoring. Its more complex, but it can more easily be leveraged by your monitoring software and also gives you an automatic start on reboot.
The gist of it is:
Send the USR2 signal to the unicorn master process, this will fork the master process.
Then send the WINCH to the old master process that gets created, this will kill each worker.
Then you can send the old master process the QUIT signal.
Unicorn Signals
This will spin up a new master process running the new code and label the old one as (old). If it fails the old one should be returned to its prior state and you shouldn't suffer an outage, just a restart error. This is the beauty of unicorn. You can almost get instantaneous deploys of your code.
I'm using a lot of hedge words because I did this work on my apps over a year ago so there are a lot of cobwebs upstairs. Hope this helps!
This is by no mean a correct script. Its a good starting point though ... feel free to update the gist if you can improve upon it! :-)
Example Unicorn Control Script
So right now I have an implementation of delayed_job that works perfectly on my local development environment. In order to start the worker on my machine, I just run rake jobs:work and it works perfectly.
To get delayed_job to work on heroku, I've been using pretty much the same command: heroku run rake jobs:work. This solution works, without me having to pay anything for worker costs to Heroku, but I have to keep my command prompt window open or else the delayed_job worker stops when I close it. Is there a command to permanently keep this delayed_job worker working even when I close the command window? Or is there another better way to go about this?
I recommend the workless gem to run delayed jobs on heroku. I use this now - it works perfectly for me, zero hassle and at zero cost.
I have also used hirefireapp which gives a much finer degree of control on scaling workers. This costs, but costs less than a single heroku worker (over a month). I don't use this now, but have used it, and it worked very well.
Add
worker: rake jobs:work
to your Procfile.
EDIT:
Even if you run it from your console you 'buy' worker dyno but Heroku has per second biling. So you don't pay because you have 750h free, and month in worst case has 744h, so you have free 6h for your extra dynos, scheduler tasks and so on.
I haven't tried it personally yet, but you might find nohup useful. It allows your process to run even though you have closed your terminal window. Link: http://linux.101hacks.com/unix/nohup-command/
Using heroku console to get workers onto the jobs will only create create a temporary dyno for the job. To keep the jobs running without cli, you need to put the command into the Procfile as #Lucaksz suggested.
After deployment, you also need to scale the dyno formation, as heroku need to know how many dyno should be put onto the process type like this:
heroku ps:scale worker=1
More details can be read here https://devcenter.heroku.com/articles/scaling
I am looking to automate the starting/restarting of queues with Resque in my Ruby on Rails application. (running on JRuby)
I want to ensure the following criteria are met:
Workers are started after I deploy with capistrano
Workers are restarted if they die for whatever reason
Workers eating too much memory are stopped/restarted and can fire me an email alert
Are there tools that current provide this functionality or at least a subset of it? If there isn't anything that restarts the queue/worker, I would like to be notified at minimum so I can manually do it.
The easiest way to do it would be using a program such as God or Monit to get #2 and #3. For #1, you can just setup your Capistrano script to send a kill -INT to all the Resque workers, then the monitoring program will start them up again.
The advantaged to using kill -INT rather than manually stopping and starting the jobs in the Capistrano script is that your deploy won't have to wait for every worker to stop processing its job to start them back up. It also means if you have a long running job, you will quickly have whatever free workers were running on the new code as quickly as possible.
I'm not especially familiar with it, however I believe the god gem is used frequently for process management.