Rails delayed job and Docker: adding more workers

I run my Rails app using Docker. Delayed jobs are processed by a single worker that runs in a separate container called worker; inside it, the worker runs with the command bundle exec rake jobs:work.
I have several types of jobs that I would like to move to a separate queue, and I'd like to create a separate worker for that queue. Or at least have two workers to process tasks.
I tried to run my worker container with env QUEUE=default_queue bundle exec rake jobs:work && env QUEUE=another_queue bundle exec rake jobs:work, but that does not work. It does not fail; it starts, but jobs aren't processed.
Is there any way to have separate workers in one container? And is that the right approach? Or should I create a separate container for each worker I would ever want to run?
Thanks in advance!

Running the command command1 && command2 results in command2 being executed only when command1 completes successfully. rake jobs:work never terminates, even when it has finished executing all the jobs in the queue, so the second command will never execute.
A single "&" is probably what you're looking for: command1 & command2.
This will run the commands independently, each in its own process.
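For instance, a minimal entrypoint sketch (the script itself and its queue names are assumptions based on the question) that keeps both workers alive in a single container:

#!/bin/bash
# hypothetical entrypoint: run one worker per queue inside the same container

# start one worker per queue in the background
env QUEUE=default_queue bundle exec rake jobs:work &
env QUEUE=another_queue bundle exec rake jobs:work &

# forward SIGTERM (e.g. from `docker stop`) to both workers
trap 'kill $(jobs -p)' SIGTERM

# keep the container's main process alive until both workers exit
wait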
You should use the delayed_job script in production, and it's a good idea to put workers for different queues into different containers, in case one of the queues contains jobs that use up a lot of resources.
This will start a delayed job worker for the default_queue:
bundle exec script/delayed_job --queue=default_queue start
For Rails 4, it is: bundle exec bin/delayed_job --queue=default_queue start
Check out this answer on the topic: https://stackoverflow.com/a/6814591/6006050
You can also start multiple workers in separate processes using the -n option. This will start 3 workers in separate processes, all picking jobs from the default_queue:
bundle exec script/delayed_job --queue=default_queue -n 3 start
Differences between rake jobs:work and the delayed_job script:
It appears that the only difference is that rake jobs:work processes jobs in the foreground, while the delayed_job script spawns a daemon that processes jobs in the background. You can use whichever is better suited to your use case.
Check this github issue: https://github.com/collectiveidea/delayed_job/issues/659
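Tying this back to the Docker setup in the question: since a container wants a foreground main process, one common arrangement (a sketch; the image name is a placeholder) is one container per queue, each running the foreground rake task with its queue selected via the QUEUES environment variable:

# one container per queue, each running a foreground worker
docker run -d --name worker_default -e QUEUES=default_queue myapp \
  bundle exec rake jobs:work
docker run -d --name worker_other -e QUEUES=another_queue myapp \
  bundle exec rake jobs:work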

Actually, I just came across this problem when scaling delayed_job on Docker.
See the following gist for a script that starts delayed jobs with arbitrary arguments, listens for SIGTERM, and performs a smooth shutdown of the started jobs on container shutdown. This way you can run as many processes and queues as you want.
https://gist.github.com/jklimke/3fea1e5e7dd7cd8003de7500508364df
#!/bin/bash
# The variable DELAYED_JOB_ARGS contains the arguments for delayed_job,
# e.g. for defining queues and worker pools.

# Function that is called when the Docker container should stop.
# It stops the delayed_job processes.
_term() {
  echo "Caught SIGTERM signal! Stopping delayed jobs!"
  # unbind traps
  trap - SIGTERM
  trap - TERM
  trap - SIGINT
  trap - INT
  # stop the delayed_job workers
  bundle exec "./bin/delayed_job ${DELAYED_JOB_ARGS} stop"
  exit
}

# register the handler for the relevant signals
trap _term SIGTERM
trap _term TERM
trap _term INT
trap _term SIGINT

echo "Starting delayed jobs ... with ARGs \"${DELAYED_JOB_ARGS}\""

# (re)start the delayed_job workers on script execution
bundle exec "./bin/delayed_job ${DELAYED_JOB_ARGS} restart"

echo "Finished starting delayed jobs... Waiting for SIGTERM / CTRL C"

# Sleep until a signal arrives. bash only runs a trap handler after the
# current foreground command returns, so sleep in the background and use
# the interruptible `wait` builtin rather than a plain foreground sleep.
while true; do
  sleep 86400 &
  wait $!
done
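To wire a script like this into the original setup, you'd make it the worker container's command and pass the worker/queue layout via DELAYED_JOB_ARGS. A hypothetical example (image name and script path are placeholders):

docker run -e DELAYED_JOB_ARGS="--pool=default_queue --pool=another_queue:2" \
  myapp ./bin/start_delayed_jobs.sh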

Related

concurrency in delayed_jobs

I have a ROR application and one delayed_job process started with rake jobs:work.
The application adds jobs to multiple queues.
Let's say we have queue1 and queue2.
My question is: will a task in queue1 and a task in queue2 be executed concurrently?
Currently in my application, after running rake jobs:work, only one worker process is spawned, which executes the queue1 task and then the queue2 task.
If I have to execute them in parallel, I have to run two rake jobs:work tasks.
Is this the correct behavior, or can they run concurrently in one rake jobs:work task?
Also, what is a worker in Delayed Job? Is "Delayed Job" used interchangeably with "worker"?
Thanks,
Priyanka
No, one worker cannot run two jobs concurrently; you need more than one worker process running for that.
In the example you are describing, you are starting a worker that runs in the foreground (rake jobs:work). What you could do instead is start background workers by running bin/delayed_job (script/delayed_job for earlier versions). That command has multiple options you can use to specify how you want delayed_job to behave.
One of the options is -n or --number_of_workers=workers. That means you can start two workers by running the following command:
bundle exec bin/delayed_job --number_of_workers=2 start
It is also possible to dedicate certain workers to only run jobs from a specific queue, or only high-priority jobs.
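For example (a sketch; the queue names are placeholders), you can either dedicate workers to a single queue with --queue, or size several queues at once with --pool:

# two background workers that only process jobs from queue1
bundle exec bin/delayed_job --queue=queue1 -n 2 start

# or size both queues in one command using pools
bundle exec bin/delayed_job --pool=queue1:2 --pool=queue2 start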

Ruby on Rails batch processing

I am working on a Rails app that runs regularly scheduled Sidekiq jobs, and I understand how queues and background jobs work. I'm working with a 3rd party that requests that I batch jobs to them so that each worker handles one job at a time, with 50 workers running in parallel.
I've been researching this for hours, but I'm unclear on how to do this and how to tell if it's actually working. Currently, my procfile looks like this:
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
worker: bundle exec sidekiq -C ./config/sidekiq.yml
Is it as simple as increasing the concurrency to -c 50 on the worker line? Or do I need to use ConnectionPool inside the worker class? The Rails docs say that find_each is "useful if you want multiple workers dealing with the same processing queue." If I run find_each inside the rake task and call the worker once for each item, will the jobs run in parallel? I read one article saying that concurrency and parallelism are often confused, so I am, in turn, a little confused about which direction to take.
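For what it's worth, Sidekiq's -c flag sets the number of threads per worker process, so a sketch of the Procfile change would look like the line below. Note this gives you up to 50 jobs in flight inside one process, which may or may not match what the 3rd party means by "50 workers in parallel":

worker: bundle exec sidekiq -c 50 -C ./config/sidekiq.yml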

How many rails instances does delayed job initialize if running multiple pools

I'm running Delayed Job with the pool option like:
bundle exec bin/delayed_job -m --pool=queue1 --pool=queue2 start
Will this spawn one or multiple Rails instances? (i.e., will it spawn one instance for all the pools, or will every pool get its own Rails instance?)
When testing locally, it seemed to spawn only one Rails instance for all the pools.
But I want to confirm this 100% (especially in production).
I tried using commands like these to see what the DJ processes were actually pointing to:
ps aux, lsof, pstree
Anyone know for sure how this works, or any easy way to find out? I started digging through the source code but figured someone prob knows a quicker way.
Thanks!
It should spawn multiple processes; I'm not sure why you're seeing only one.
From the readme:
Use the --pool option to specify a worker pool. You can use this option multiple times to start different numbers of workers for different queues.
The following command will start 1 worker for the tracking queue, 2 workers for the mailers and tasks queues, and 2 workers for any jobs:
RAILS_ENV=production script/delayed_job --pool=tracking --pool=mailers,tasks:2 --pool=*:2 start
Further details after discussion in comments
The question mentions "Rails instances", but "instance" is a generic term; the word you're looking for is process. The text quoted from DelayedJob's readme uses the word worker, short for worker process. In Rails, you usually refer to server processes simply as servers, and to worker processes as workers.
The Rails console, too, is just another process.
In Rails, all of these processes load the whole application, but they do different things.
Server processes wait for incoming HTTP requests and send back responses; worker processes periodically poll a queue (DelayedJob uses the DB) and execute jobs; the console process starts a REPL and waits for input.
They all have access to the same code (models, DB config, assets, view templates, etc.), but they have very different responsibilities.
I hope this makes things clearer.
After digging through the code, the short answer is:
Running something like this:
bundle exec bin/delayed_job -m --pool=queue1 --pool=queue2 start
will start ONE rails process/instance for ALL the pools/queues you specify.
Details below if you want more explanation:
In the Command class, this method loops through the pools and sets up the workers:
def setup_pools
  worker_index = 0
  @worker_pools.each do |queues, worker_count|
    options = @options.merge(:queues => queues)
    worker_count.times do
      process_name = "delayed_job.#{worker_index}"
      run_process(process_name, options)
      worker_index += 1
    end
  end
end
Which will run this for each queue:
def run(worker_name = nil, options = {})
  Dir.chdir(root)
  Delayed::Worker.after_fork
  Delayed::Worker.logger ||= Logger.new(File.join(@options[:log_dir], 'delayed_job.log'))
  worker = Delayed::Worker.new(options)
  worker.name_prefix = "#{worker_name} "
  worker.start
end
Each worker is daemonized, but no new Rails processes are started; each daemon just loops over its own pool/queue.
You can see this in the "start" method:
def start
  loop do
    self.class.lifecycle.run_callbacks(:loop, self) do
      @realtime = Benchmark.realtime do
        @result = work_off
      end
    end
  end
end
If you want to start a rails instance for each new queue you could use monit and do something like:
check process delayed_job_0
  with pidfile /var/www/apps/{app_name}/shared/pids/delayed_job.0.pid
  start program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job start -i 0"
  stop program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job stop -i 0"
  group delayed_job

check process delayed_job_1
  with pidfile /var/www/apps/{app_name}/shared/pids/delayed_job.1.pid
  start program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job start -i 1"
  stop program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job stop -i 1"
  group delayed_job

How can you add or remove workers from delayed_jobs?

On a similar note, how can you know how many workers are currently assigned?
If you are on your local machine, just run one of the following
# starts the worker
rake jobs:work
# kill it with Control + C on your keyboard
or
# starts the worker
script/delayed_job start
# kills it
script/delayed_job stop
Additionally, here are some commands to spawn multiple workers: https://github.com/collectiveidea/delayed_job/wiki/Delayed-job-command-details
If you want a list of currently running workers, you would do
script/delayed_job status
and this would return each process (which you'd then have to count to get the total).
If you are on Heroku, you can do heroku workers to get the number of current workers, and heroku workers 2 to start two workers or heroku workers 0 to kill all workers.
You can also use HireFireApp.com to manage all of your workers for you on Heroku.
Since you didn't specify what type of environment you are running DJ on, please let me know if these don't answer your question.
$ ps ax | grep delayed_job can show you the details directly.
The "ps" suggestion works better after first looking int RAILS_ROOT/tmp/pids for delayed_job*.pid files.
Together this tells you what DJ thinks it's running and what is actually running. If one of those files contains a process ID that you can't find in the ps output, that's an indicator that DJ has a worker that has died that it hasn't realized.
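A quick sketch of that cross-check (assuming the default RAILS_ROOT/tmp/pids location):

#!/bin/bash
# compare DJ's pidfiles against what is actually running
for f in tmp/pids/delayed_job*.pid; do
  pid=$(cat "$f")
  if ps -p "$pid" > /dev/null 2>&1; then
    echo "$f -> PID $pid is running"
  else
    echo "$f -> PID $pid is NOT running (stale pidfile, dead worker?)"
  fi
done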

Redis and Resque, workers not getting updated application code

I'm having a rough time here with Resque. Firstly, in development, when running rake resque:work QUEUE='*' to work the queue, it starts up fine and runs the perform method for my workers, which is fine. The problem is that the workers don't seem to run my new application code: say I update the perform method in a worker, then Ctrl+C out of that rake resque:work QUEUE='*' process and start it up again; queuing new jobs to be worked on doesn't result in the worker running the updated code.
So mainly my problem here is: how do I safely kill the resque:work task and restart my workers with the new application code?
Resque workers respond to a few different signals:
QUIT - Wait for child to finish processing then exit
TERM / INT - Immediately kill child then exit
USR1 - Immediately kill child but don't exit
USR2 - Don't start to process any new jobs
CONT - Start to process new jobs again after a USR2
If you want to gracefully shut down a Resque worker, use QUIT:
kill -s QUIT $(cat resque.pid)
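Putting that together, a minimal restart sketch (the PIDFILE path is an assumption; Resque only writes a pidfile when you ask it to via the PIDFILE env var):

# start a worker and record its pid
PIDFILE=tmp/pids/resque.pid QUEUE='*' rake resque:work &

# after updating application code: gracefully stop the old worker...
kill -s QUIT $(cat tmp/pids/resque.pid)

# ...then start a fresh worker, which boots with the new code
PIDFILE=tmp/pids/resque.pid QUEUE='*' rake resque:work &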
If you want to set up Resque restarts with Capistrano, use this gist.
