Is there a graceful way to restart Resque workers (say, after an event like a deploy) so that a worker processing a job from the queue isn't killed immediately by a signal like SIGTERM or SIGKILL? Instead, the worker should be allowed to finish the task it's working on and only be killed once it's free.
I am using God to monitor the Resque workers. I went through the God homepage but wasn't able to find any info on this, and seeing as it's just a gem for monitoring processes, I don't think it has a graceful way to do this.
Also, I am looking to restart workers automatically on deploy. I have looked at these two approaches (http://simonecarletti.com/blog/2011/02/how-to-restart-god-when-you-deploy-a-new-release/ and http://balazs.kutilovi.cz/2011/12/04/deploying-resque-scheduler-with-capistrano/). If there is a better way, that would be very helpful.
It turns out Resque has an option where, if you send the signal SIGQUIT, it will wait for the worker to finish its current job before quitting, and God has an option for setting a stop signal for a process. So you can define 'QUIT' as the stop_signal for the Resque workers, and when you stop God, it will send SIGQUIT to each worker.
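For reference, here is a minimal sketch of what that can look like in a God config; the watch name, rake path, and timeout are placeholders for your own setup:

God.watch do |w|
  w.name = "resque-worker"  # placeholder name
  w.start = "rake -f /path/to/app/Rakefile environment resque:work QUEUE=*"
  w.stop_signal = 'QUIT'    # Resque finishes the current job, then exits
  w.stop_timeout = 60       # how long God waits before escalating the signal
end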
Related
I guess I need a sanity check here: if I want to prevent any Sidekiq jobs from ending prematurely, Heroku should handle this for me, right?
When I want to push new changes to a production site, I put the application in maintenance mode with heroku maintenance:on. When I do this and run heroku ps, I can see both my web process and my worker (i.e. Sidekiq) are still up (which makes sense, because maintenance mode just prevents users from accessing the site).
If I then shut down the worker dyno with heroku ps:stop worker after the site is in maintenance mode, will this safely stop the Sidekiq workers before the dyno goes down? Also, from Sidekiq's documentation:
https://github.com/mperham/sidekiq/wiki/Deployment#heroku
It mentions a -t N switch, where N is a number of seconds, but also that Heroku has a hard limit of 30 seconds for a process to shut down on its own. Am I correct that if I stop the worker process with the heroku command, Sidekiq will give any currently running jobs N seconds to finish after it receives the SIGTERM signal?
If not, what additional steps do I need to take to make sure Sidekiq has safely shut down?
Sounds like you are fine. Heroku sends SIGTERM when you call ps:stop. Sending SIGTERM tells Sidekiq to shut down within N seconds. Your worker dyno should be safely down within 30 seconds.
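As a concrete example, the Sidekiq wiki linked above recommends keeping the shutdown timeout under Heroku's 30-second hard limit; a Procfile entry along these lines does that (the worker name is whatever your app already uses):

worker: bundle exec sidekiq -t 25

With -t 25, Sidekiq gets 25 seconds after SIGTERM to let running jobs finish; anything still running when the timeout expires is pushed back to Redis to be re-run later, which keeps the dyno inside Heroku's 30-second window before SIGKILL.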
I have a fleet of Backburner workers (Backburner::Workers::Simple). I seem to have hit an edge case where a worker occasionally can't get a DB connection, the job is reaped back by the server, and suddenly the worker goes on a tear, reserving jobs rapid-fire, all of which time out and eventually get buried because the worker never again successfully gets a DB connection.

Obviously, it would be ideal if I could fix the weirdness around DB connections and rapid-fire job reservation. That seems like a longer-term project, though, because I've looked and don't see anything obvious.

What I'd like in the meantime is for my error handler to log the error and then for the whole worker process to die. All of my workers are under process supervision, so this would be a very clean, simple way to get a fresh worker without the DB problem. I've tried adding ; Kernel.exit (and variations on that) to my on_error lambda, but it doesn't seem to make a difference. How do I make this happen?
If you need to kill the worker completely, no matter what, you can shell out from Ruby with system() to run a kill command.
So you just need to get the worker's PID and then kill it via a system call:
system("kill -QUIT #{worker.pid}")
Looking at how to get the worker's PID: from the Backburner repository, it seems you can get the current worker's PID from within the process with
Process.pid
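Putting those together, here's a minimal sketch of an on_error hook that logs and then forcibly kills the current worker process. Using Process.kill directly rather than shelling out with system() is equivalent, and the logging call is an assumption about your setup:

Backburner.configure do |config|
  config.on_error = lambda do |e|
    config.logger.error("Worker dying on: #{e.class}: #{e.message}") if config.logger
    # Kernel.exit can be swallowed by rescue blocks; SIGKILL to our own PID cannot.
    # Process supervision then starts a fresh worker with a clean DB connection.
    Process.kill("KILL", Process.pid)
  end
end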
Is there a way, using the Dask client, to restart a single worker or a provided list of workers? I need a way to bounce a worker after a task executes, to reset process state that may have been changed by the execution.
Client.restart() restarts the entire cluster, and so may end up killing any tasks running in parallel to the one that just completed.
I am looking to automate the starting/restarting of queues with Resque in my Ruby on Rails application (running on JRuby).
I want to ensure the following criteria are met:
1. Workers are started after I deploy with Capistrano.
2. Workers are restarted if they die for whatever reason.
3. Workers eating too much memory are stopped/restarted, and can fire me an email alert.
Are there tools that currently provide this functionality, or at least a subset of it? If there isn't anything that restarts the queue/worker, I would at minimum like to be notified so I can do it manually.
The easiest way to do it would be to use a program such as God or Monit to get #2 and #3. For #1, you can just set up your Capistrano script to send kill -INT to all the Resque workers; the monitoring program will then start them up again.
The advantage of using kill -INT, rather than manually stopping and starting the jobs in the Capistrano script, is that your deploy won't have to wait for every worker to finish processing its job before starting them back up. It also means that if you have a long-running job, whatever workers were free will be running on the new code as quickly as possible.
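A minimal sketch of that deploy hook in a Capistrano (v2-style) recipe; the role name and the pkill pattern are assumptions about how your workers are named, and God or Monit is expected to bring the workers back up:

namespace :resque do
  desc "Kill Resque workers; the process monitor restarts them on the new code"
  task :restart_workers, :roles => :worker do
    # '|| true' keeps the deploy going even if no workers are currently running
    run "pkill -INT -f resque || true"
  end
end

after "deploy:restart", "resque:restart_workers"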
I'm not especially familiar with it; however, I believe the god gem is frequently used for process management.
I use delayed_job as a daemon: https://github.com/tobi/delayed_job/wiki/Running-Delayed::Worker-as-a-daemon
I can't tell why, but sometimes I see jobs being done by several workers (different PIDs), and running stop doesn't stop anything. Is there a way to kill all daemons of this process/all the workers, or to kill a specific PID? (I'm on shared hosting, so kill/killall aren't available to me.)
Not having access to "kill" in this setup will quickly become a PITA, and it boggles my mind that you wouldn't have the ability to kill processes you yourself started.
For increased worker dependability, you might want to try the collectiveidea fork of delayed_job, and use the daemon-spawn gem rather than daemons. I've had better luck with that combination.
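If you still need to stop workers without shell access to kill, one workaround is a small Ruby script run from the app root; this sketch assumes the daemons gem wrote its pidfiles under tmp/pids (the default for script/delayed_job):

Dir.glob("tmp/pids/delayed_job*.pid").each do |pidfile|
  pid = File.read(pidfile).to_i
  begin
    Process.kill("TERM", pid)  # ask the worker daemon to shut down
    puts "Sent TERM to worker #{pid}"
  rescue Errno::ESRCH          # stale pidfile: the process is already gone
    File.delete(pidfile)
    puts "Removed stale pidfile for #{pid}"
  end
end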