How to restart dask worker subprocess after task is done? - dask

In django-q we have recycle, which is "the number of tasks a worker will process before recycling. Useful to release memory resources on a regular basis."
When I start dask-worker with --nprocs 2, I get two worker subprocesses.
I would like to recycle each of them after it completes one task, because a lot of resources need to be freed from memory once the task is done and I have no way of freeing them automatically.
I tried a worker plugin with a simple sys.exit() inside release_key, but it doesn't get rid of the process.
How can I recycle worker subprocesses after a task is done?
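
Not an answer from the thread, just a sketch of the plugin approach being attempted: a WorkerPlugin whose transition hook forces the subprocess to exit once a task reaches the "memory" (completed) state. It assumes the workers run under a nanny (the default for dask-worker), which respawns a worker process that dies, and it uses os._exit because sys.exit only raises SystemExit, which the worker machinery catches. If the process dies before the result reaches the scheduler, the task may be rescheduled, so treat this as a starting point rather than a drop-in solution.

import os

from distributed.diagnostics.plugin import WorkerPlugin


class RecycleAfterTask(WorkerPlugin):
    """Kill the worker subprocess after it finishes a task; the nanny restarts it."""

    def setup(self, worker):
        self.worker = worker

    def transition(self, key, start, finish, **kwargs):
        # "memory" means the task finished and its result is stored on this worker.
        if finish == "memory":
            # Schedule the hard exit on the event loop so the current transition
            # callback can complete; the nanny then spawns a fresh subprocess.
            self.worker.loop.add_callback(os._exit, 0)


# Registered from the client side, e.g.:
# client.register_worker_plugin(RecycleAfterTask())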

Related

Dask Workers lifetime option not waiting for job to finish

When applying the worker lifetime option with restart, it looks like the worker still goes ahead with the restart even while it is running a job.
I applied the lifetime restart option at 60 seconds with 1 worker and ran a job that simply sleeps for twice that amount of time. The restart still appears to take place even though the worker is running the job.
For a graceful restart, I thought the worker would wait for a long-running task to finish and only restart itself once idle. That way, even a long-running task would not be interrupted by the auto-restart option.
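
For reference, a minimal sketch of the setup described above, assuming LocalCluster forwards the lifetime keyword arguments to its workers (the dask-worker CLI equivalents are --lifetime, --lifetime-restart and --lifetime-stagger):

import time

from dask.distributed import Client, LocalCluster

cluster = LocalCluster(
    n_workers=1,
    threads_per_worker=1,
    lifetime="60s",         # each worker closes itself after roughly 60 seconds
    lifetime_restart=True,  # the nanny then starts a replacement worker
)
client = Client(cluster)

# A job that sleeps for twice the lifetime; per the report above, the restart
# can still interrupt it instead of waiting for it to finish.
future = client.submit(time.sleep, 120)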

Killing tasks spawned by a job

I am considering replacing Celery with Dask. Currently we have a cluster where different jobs are submitted, each one generating multiple tasks that run in parallel. Celery has a killer feature, the "revoke" command: I can kill all the tasks of a given job without interfering with the other jobs that are running at the same time. How can I do that with Dask? I only find references saying that this is not possible, but for us that would be a disaster: I don't want to be forced to shut down the entire cluster when a calculation goes rogue, thereby killing the jobs of the other users.
You can cancel tasks using the Client.cancel command.
If they haven't started yet then they won't start; however, if they're already running in a thread somewhere then there isn't much that Python can do to stop them other than tearing down the process.
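
A sketch of what per-job revocation could look like with the futures API (run_simulation is an illustrative placeholder): keep the futures of each job grouped, then pass only that job's futures to Client.cancel, leaving other jobs untouched.

from dask.distributed import Client


def run_simulation(i):
    ...  # placeholder for the real per-task work


client = Client("tcp://scheduler:8786")  # address is illustrative

# Keep the futures of each job together so a single job can be revoked.
job_a = [client.submit(run_simulation, i) for i in range(100)]
job_b = [client.submit(run_simulation, i + 1000) for i in range(100)]

# "Revoke" job A only: pending tasks never start and finished results are
# released; tasks already executing in a thread keep running to completion.
client.cancel(job_a)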

Dask restart worker(s) using client

Is there a way, using the Dask client, to restart a specific worker or a provided list of workers? I need a way to bounce a worker after a task is executed, to reset any process state that may have been changed by the execution.
Client.restart() restarts the entire cluster, and so may end up killing tasks running in parallel with the one that just completed.
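
Not from the thread: a client-side sketch of bouncing only the worker that ran a task. It assumes a recent distributed release that provides Client.restart_workers; who_has reports which worker holds the finished result, and the result is gathered first because restarting the worker drops whatever it has in memory.

from dask.distributed import Client


def task():
    ...  # placeholder for the state-mutating work


client = Client("tcp://scheduler:8786")  # address is illustrative

future = client.submit(task)
result = future.result()  # collect the result before bouncing the worker

addr = client.who_has([future])[future.key][0]  # worker holding the result
client.restart_workers([addr])                  # restart just that worker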

sidekiq memory usage reset

I have a Rails app which uses Sidekiq for background processing. To deploy this application I use Capistrano, an Ubuntu server, and Apache with Passenger. To start and restart Sidekiq I use the capistrano-sidekiq gem.
My problem is that while Sidekiq is running, the amount of memory (RAM) used by Sidekiq keeps growing. And when Sidekiq has finished all of its jobs (workers), it keeps holding a large amount of RAM instead of releasing it.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ubuntu 2035 67.6 45.4 3630724 1838232 ? Sl 10:03 133:59 sidekiq 3.5.0 my_app [0 of 25 busy]
How can I make Sidekiq release the memory it used after its workers have finished their work?
Sidekiq uses threads to execute jobs, and threads share the same memory as the parent process.
So if one job uses a lot of memory, the Sidekiq process's memory usage will grow and the memory won't be released by Ruby.
Resque uses a different technique: it executes every job in a separate process, so when the job is done the job's process exits and the memory is released.
One way to prevent your Sidekiq process from using too much memory is to use Resque's forking approach:
have your job's main method executed in a forked child process and wait until that child exits, for example:
class Job
  include Process

  def perform
    # Do the memory-heavy work in a forked child process; when it exits,
    # its memory is returned to the operating system.
    pid = fork do
      # your code
    end
    # Wait for the child so the job isn't considered finished prematurely.
    waitpid(pid)
  end
end

Sidekiq worker is leaking memory

I have a Sidekiq worker that runs a process (a git-tf clone of a big repository) using IO.popen and tracks its stdout to check the progress of the clone.
When I run the worker, I see that Sidekiq's memory usage keeps growing until the kernel OOM killer kills the process. The subprocess (a Java process) takes only 5% of the total memory.
How can I debug/check the memory leak in my code? Is the Sidekiq memory figure the total of my workers' memory plus the popen subprocess?
And does anyone have any idea how to fix it?
EDIT
This is the code of my worker -
https://gist.github.com/yosy/5227250
EDIT 2
I ran the code without Sidekiq and I have no memory leaks; this is something strange with Sidekiq and big repositories in TFS.
I didn't find the cause of the memory leak in Sidekiq, but I found a way to get away from Sidekiq.
I have modified git-tf to have a server command that accepts commands from a Redis queue; it removes a lot of complexity from my code.
The modified version of git-tf is here:
https://github.com/yosy/gittf
I will add documentation about the server command later, once I fix some bugs.
