Is there a way, using the Dask client, to restart a single worker or a provided list of workers? I need a way to bounce a worker after a task has executed, to reset any process state that the execution may have changed.
Client.restart() restarts the entire cluster, and so may end up killing tasks that are running in parallel to the one that just completed.
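For illustration, a minimal sketch of one possible direction, using Client.retire_workers to drain and shut down only the named workers while the rest of the cluster keeps running; whether replacement processes come back automatically depends on how the workers are launched, and the addresses below are placeholders:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")       # placeholder scheduler address

# Worker addresses can be listed with client.scheduler_info()["workers"].
workers_to_bounce = ["tcp://10.0.0.5:39001"]  # hypothetical worker address

# Gracefully retire only these workers: their data is moved to peers and the
# workers are shut down, while tasks running on other workers are untouched.
# Whether a fresh process replaces them depends on the deployment (nanny,
# cluster manager, etc.).
client.retire_workers(workers=workers_to_bounce)
```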
Related
In django-q we have recycle, which is "the number of tasks a worker will process before recycling". It is useful for releasing memory resources on a regular basis.
When I start dask-worker with --nprocs 2, I get two worker subprocesses.
I would like to recycle each of them after it completes one task, because a lot of resources need to be freed and removed from memory once a task finishes, and I have no way of freeing them automatically.
I tried a worker plugin with a simple sys.exit() inside release_key, but it doesn't get rid of the process.
How can I recycle worker subprocesses after a task is done?
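For reference, a minimal sketch of the kind of plugin being attempted, with the caveat that sys.exit() only raises SystemExit in the calling thread, which the worker machinery catches; terminating the whole process (for example with a signal) is what actually ends it, and whether a fresh process comes back depends on whether a nanny or another supervisor manages the worker. The plugin name and the choice of the transition hook are illustrative, not a documented recipe:

```python
import os
import signal

from dask.distributed import Client
from distributed.diagnostics.plugin import WorkerPlugin


class RecycleAfterTask(WorkerPlugin):
    """Illustrative plugin: end the worker process once a task finishes."""

    def setup(self, worker):
        self.worker = worker

    def transition(self, key, start, finish, **kwargs):
        # "executing" -> "memory" means a task has just finished running here.
        if start == "executing" and finish == "memory":
            # sys.exit() would only raise SystemExit in this thread; signal
            # the whole process instead so it really exits. If a nanny
            # supervises the worker, the expectation is that it starts a
            # fresh process, but verify that behaviour for your deployment.
            os.kill(os.getpid(), signal.SIGTERM)


client = Client("tcp://scheduler:8786")            # placeholder address
client.register_worker_plugin(RecycleAfterTask())
```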
When applying the worker lifetime option with restart, it looks like the worker still goes ahead with the restart even if it is currently running a job.
I applied the lifetime restart option at 60 seconds with one worker and ran a job that simply sleeps for twice that long. The restart still appears to take place even though the worker is running the job.
For a graceful restart, I had thought the worker would wait for a long-running task/job to finish and only restart itself once idle. That way, even a long-running task would not be interrupted by the auto-restart option.
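For concreteness, a sketch of the lifetime settings being described, assuming the configuration keys below match your dask version; they need to be set where the workers start, or passed as the equivalent dask-worker flags (--lifetime, --lifetime-stagger, --lifetime-restart):

```python
import dask

# Worker lifetime settings; set these in the environment where the workers
# are started (or use the matching dask-worker CLI flags).
dask.config.set({
    "distributed.worker.lifetime.duration": "60s",  # close after ~60 seconds
    "distributed.worker.lifetime.stagger": "0s",    # no jitter between workers
    "distributed.worker.lifetime.restart": True,    # start a fresh process afterwards
})
```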
I am considering replacing Celery with Dask. Currently we have a cluster where different jobs are submitted, each generating multiple tasks that run in parallel. Celery has a killer feature, the "revoke" command: I can kill all the tasks of a given job without interfering with the other jobs running at the same time. How can I do that with Dask? I only find references saying this is not possible, which would be a disaster for us: I don't want to be forced to shut down the entire cluster when a calculation goes rogue, killing the jobs of the other users.
You can cancel tasks using the Client.cancel method.
If they haven't started yet, they won't start; however, if they're already running in a thread somewhere, there isn't much that Python can do to stop them other than tearing down the process.
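A minimal usage sketch, with illustrative names: keep the futures of each job grouped so that one job can be "revoked" without touching the others.

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")   # placeholder scheduler address


def process(chunk):
    ...                                   # whatever one task of a job does
    return chunk


# Keep the futures of each submitted job together, e.g. keyed by job id.
jobs = {
    "job-a": client.map(process, range(0, 100)),
    "job-b": client.map(process, range(100, 200)),
}

# Cancel everything belonging to one job; the other jobs keep running.
# Pending tasks are dropped, but tasks already executing in a thread
# cannot be interrupted mid-flight, as noted above.
client.cancel(jobs["job-a"])
```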
I am currently working on moving my environment off Heroku, and part of my application runs a clock process that sets off a Sidekiq background job.
As I understand it, Sidekiq is composed of a client, which sends jobs off to be queued in Redis, and a server, which pulls requests off the queue and processes them. I am now trying to split my application into the following containers on Docker:
- Redis container
- Clock container (using the Clockwork gem)
- Worker container
- Web application container (Rails)
However, I am not sure how one is supposed to split up the Sidekiq server and client. Essentially, the clock container needs to run Sidekiq so that the client can send jobs to the Redis queue every so often, while the worker containers should also run Sidekiq (the server this time) so that they can process those jobs. I assume that splitting the responsibilities between different containers should be quite possible, since Heroku allows you to split this across various dynos.
One way I can imagine doing this would be to have the clock container pull from a non-existent queue, so that it never actually pulls any jobs, and have the worker pull from a queue that does exist. However, this doesn't seem like the optimal approach, since the clock container would still be checking for new jobs in that non-existent queue.
Any tips or guides on how I can start going about this?
The Sidekiq client just publishes jobs into Redis. A Sidekiq daemon process subscribes to Redis and starts worker threads as jobs are published.
So you can install the Redis gem on both containers (the clock container and the worker container), start the worker daemon only on the worker container, and provide a proper Redis config to both. You also have to make sure the worker source code is available on both servers/containers, since the Sidekiq client only stores the name of the worker class and the daemon then instantiates it through metaprogramming.
Alternatively, you can include a Sidekiq daemon process alongside every application that needs to process a worker job. Yes, Docker has the best practice of one process per container, but in my opinion this is not an all-or-nothing rule; in this case I see both processes as one unit. It's just a way of running some code in the background. You would then configure instances of the same application to work against the same Sidekiq queues, or you could even configure every physical node to run its own separate queue.
Is there a graceful way to restart Resque workers (say, after an event like a deploy) where a worker that is processing a job from the queue isn't killed immediately by a signal like SIGTERM or SIGKILL, but instead is allowed to finish the task it's doing and is killed only once it's free?
I am using God to monitor the Resque workers. I went through the God homepage but wasn't able to find any info on this; seeing as it's just a gem for monitoring processes, I don't think it has a graceful way to do this.
Also, I am looking to restart workers automatically on deploy. I have looked at these two methods (http://simonecarletti.com/blog/2011/02/how-to-restart-god-when-you-deploy-a-new-release/ and http://balazs.kutilovi.cz/2011/12/04/deploying-resque-scheduler-with-capistrano/). If there is a better way, it would be very helpful.
It turns out Resque has an option where, if you send the SIGQUIT signal, it will wait for the worker to finish its current job before quitting, and God has an option for setting a stop signal for a process. So you can define 'QUIT' as the stop_signal for the Resque workers, and when you stop God, it will send SIGQUIT to the worker.