How many rails instances does delayed job initialize if running multiple pools - ruby-on-rails

I'm running Delayed Job with the pool option like:
bundle exec bin/delayed_job -m --pool=queue1 --pool=queue2 start
Will this spawn one OR multiple rails instances? (ie: will it spawn one instance for all the pools or will every pool get its own rails instance)?
When testing locally it seemed to only spawn one rails instance for all the pools.
But I want to confirm this 100% (esp on production).
I tried using commands like these to see what the DJ processes were actually pointing to:
ps aux, lsof, pstree
Anyone know for sure how this works, or any easy way to find out? I started digging through the source code but figured someone prob knows a quicker way.
Thanks!

It should spawn multiple processes, not sure why you're seeing only one.
From the readme:
Use the --pool option to specify a worker pool. You can use this option multiple times to start different numbers of workers for different queues.
The following command will start 1 worker for the tracking queue, 2 workers for the mailers and tasks queues, and 2 workers for any jobs:
RAILS_ENV=production script/delayed_job --pool=tracking --pool=mailers,tasks:2 --pool=*:2 start
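To make the pool syntax in that command concrete, here is a minimal sketch of how such values could be split into queue names and worker counts. It is an illustration only: `parse_pool` is a hypothetical helper, not delayed_job's own parser, and it assumes that `*` (or an empty name) means "any queue" and that a missing count means one worker, as the readme text describes.

```ruby
# Illustrative parser for --pool values. `parse_pool` is a hypothetical helper,
# not delayed_job's own method.
def parse_pool(spec)
  queues, count = spec.split(':', 2)
  # "*" or an empty name is treated as "any queue", shown here as an empty list.
  any_queue = queues.nil? || queues.empty? || queues == '*'
  queue_list = any_queue ? [] : queues.split(',')
  # A missing count defaults to a single worker.
  [queue_list, (count || 1).to_i]
end

# The three --pool values from the readme example.
pools = %w[tracking mailers,tasks:2 *:2].map { |spec| parse_pool(spec) }

pools.each do |queue_list, count|
  label = queue_list.empty? ? 'any queue' : queue_list.join(', ')
  puts "#{count} worker(s) for #{label}"
end
```

Summing the counts gives the number of worker processes to expect in ps output for that command.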
Further details after discussion in comments
The question mentions "Rails instances", but instance is a generic term. The word you're looking for is process. The text quoted from DelayedJob's readme uses the word worker, short for worker process. In Rails, you usually refer to server processes as just servers, and to worker processes as just workers.
The rails console, too, is just another process.
In Rails all these processes will load the whole application, but will do different things.
Server processes will wait for incoming HTTP requests and send back responses; worker processes will periodically poll a queue (DelayedJob uses the DB) and execute jobs; the console process will start a REPL and wait for input.
They will all have access to the same code (models, DB config, assets, view templates, etc.), but will have very different responsibilities.
I hope this makes things clearer.

After digging through the code the short answer is..
Running something like this:
bundle exec bin/delayed_job -m --pool=queue1 --pool=queue2 start
will start ONE rails process/instance for ALL the pools/queues you specify.
Details below if you want more explanation:
In the Command class:
This loops through the pools and sets up the workers:
def setup_pools
  worker_index = 0
  @worker_pools.each do |queues, worker_count|
    options = @options.merge(:queues => queues)
    worker_count.times do
      process_name = "delayed_job.#{worker_index}"
      run_process(process_name, options)
      worker_index += 1
    end
  end
end
Which will run this for each queue:
def run(worker_name = nil, options = {})
  Dir.chdir(root)
  Delayed::Worker.after_fork
  Delayed::Worker.logger ||= Logger.new(File.join(@options[:log_dir], 'delayed_job.log'))
  worker = Delayed::Worker.new(options)
  worker.name_prefix = "#{worker_name} "
  worker.start
Each worker is daemonized, but there aren't new rails processes started. It just loops through each pool/queue in its own daemon.
You can see this in the "start" method:
def start
  loop do
    self.class.lifecycle.run_callbacks(:loop, self) do
      @realtime = Benchmark.realtime do
        @result = work_off
      end
    end
  end
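To separate "loading the application" from "running a worker process", here is a minimal sketch, not delayed_job's actual code, of a parent that loads its code once and then forks one operating-system process per worker. `APP_LOADED_AT` and `fork_workers` are hypothetical names, and the sketch assumes a Unix-like system where `fork` is available. Whether your delayed_job version behaves this way is best confirmed by comparing the PIDs shown by ps with the pid files.

```ruby
# Stand-in for the expensive "load the application once" step.
APP_LOADED_AT = Time.now

# Forks `count` children and returns what each one reported about itself.
def fork_workers(count)
  count.times.map do |index|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      # The child inherits APP_LOADED_AT without loading anything again.
      writer.puts "delayed_job.#{index} pid=#{Process.pid} parent=#{Process.ppid}"
      writer.close
      exit!(0) # leave immediately, skipping at_exit handlers
    end
    writer.close
    report = reader.read.strip
    reader.close
    Process.wait(pid)
    { name: "delayed_job.#{index}", pid: pid, report: report }
  end
end

workers = fork_workers(2)
workers.each { |worker| puts worker[:report] }
```

Each child reports its own PID and the parent's PID, so two workers show up as two distinct processes even though the code was loaded only once.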
If you want to start a rails instance for each new queue you could use monit and do something like:
check process delayed_job_0
with pidfile /var/www/apps/{app_name}/shared/pids/delayed_job.0.pid
start program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job start -i 0"
stop program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job stop -i 0"
group delayed_job
check process delayed_job_1
with pidfile /var/www/apps/{app_name}/shared/pids/delayed_job.1.pid
start program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job start -i 1"
stop program = "/usr/bin/env RAILS_ENV=production /var/www/apps/{app_name}/current/bin/delayed_job stop -i 1"
group delayed_job

Related

Rails delayed job and docker: adding more workers

I run my Rails app using Docker. Delayed jobs are processed by a single worker that runs in a separate container called worker, started with the command bundle exec rake jobs:work.
I have several types of jobs that I would like to move to a separate queue, with a separate worker for that queue. Or at least I would like to have two workers processing tasks.
I tried to run my worker container with env QUEUE=default_queue bundle exec rake job:work && env QUEUE=another_queue bundle exec rake job:work but that does not work. It does not fail and it starts, but jobs aren't processed.
Is there any way to have separate workers in one container? And is that the correct approach? Or should I create a separate container for each worker I would ever want to run?
Thanx in advance!
Running the command command1 && command2 results in command2 being executed only when command1 completes. rake jobs:work never terminates, even when it has finished executing all the jobs in the queue, so the second command will never execute.
A single "&" is probably what you're looking for: command1 & command2.
This will run the commands independently in their own processes.
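The same idea can be sketched in Ruby: start both commands at once as separate processes, then wait for both. This is only an illustration under stated assumptions. `worker_command` is a hypothetical stand-in that sleeps briefly instead of running rake jobs:work, which would never exit.

```ruby
require 'rbconfig'

# Hypothetical stand-in for a long-running worker command.
def worker_command(seconds)
  [RbConfig.ruby, '-e', "sleep #{seconds}"]
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Like `command1 & command2`: both children start immediately, each in its own process.
pids = [worker_command(0.5), worker_command(0.5)].map { |cmd| Process.spawn(*cmd) }

# Wait for both children and collect their exit statuses.
statuses = pids.map { |pid| Process.wait2(pid).last }

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "pids=#{pids.inspect} elapsed=#{elapsed.round(2)}s"
```

Because the two children run concurrently, the elapsed time should be close to the longer of the two commands rather than their sum, which is what `&&` would give.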
You should use the delayed_job script on production, and it's a good idea to put workers of different queues into different containers in case one of the queues contains jobs that use up a lot of resources.
This will start a delayed job worker for the default_queue:
bundle exec script/delayed_job --queue=default_queue start
For Rails 4, it is: bundle exec bin/delayed_job --queue=default_queue start
Check out this answer on the topic: https://stackoverflow.com/a/6814591/6006050
You can also start multiple workers in separate processes using the -n option. This will start 3 workers in separate processes, all picking jobs from the default_queue:
bundle exec script/delayed_job --queue=default_queue -n 3 start
Differences between rake jobs:work and the delayed_job script:
It appears that the only difference is that rake jobs:work starts processing jobs in the foreground, while the delayed_job script creates a daemon which processes jobs in the background. You can use whichever is more suited to your use case.
Check this github issue: https://github.com/collectiveidea/delayed_job/issues/659
Actually, I just came across this problem when scaling delayed_job on Docker.
See this gist for a script that starts delayed jobs with arbitrary arguments, listens for SIGTERM, and shuts the started jobs down smoothly when the container stops. This way you can run as many processes and queues as you want.
https://gist.github.com/jklimke/3fea1e5e7dd7cd8003de7500508364df
#!/bin/bash
# Variable DELAYED_JOB_ARGS contains the arguments for delayed jobs for, e.g. defining queues and worker pools.
# function that is called when the docker container should stop. It stops the delayed job processes
_term() {
  echo "Caught SIGTERM signal! Stopping delayed jobs !"
  # unbind traps
  trap - SIGTERM
  trap - TERM
  trap - SIGINT
  trap - INT
  # end delayed jobs
  bundle exec "./bin/delayed_job ${DELAYED_JOB_ARGS} stop"
  exit
}
# register handler for selected signals
trap _term SIGTERM
trap _term TERM
trap _term INT
trap _term SIGINT
echo "Starting delayed jobs ... with ARGs \"${DELAYED_JOB_ARGS}\""
# restart delayed jobs on script execution
bundle exec "./bin/delayed_job ${DELAYED_JOB_ARGS} restart"
echo "Finished starting delayed jobs... Waiting for SIGTERM / CTRL C"
# sleep forever until exit
while true; do sleep 86400; done

Ruby on rails, delayed jobs with rails s

I want the workers to start processing a job automatically after a certain method runs. I start the application with the usual rails s and upload some files, so the create method is invoked. After create, the perform_analysis method is delayed, and a row is inserted into the delayed_jobs table. Normally I start the workers by typing "script/delayed_job start" on the command line, but I would like them to start automatically, without typing anything.
model:
after_create :perform_analysis
def perform_analysis
bla
end
handle_asynchronously :perform_analysis, :run_at => Proc.new { 5.minutes.from_now }
So, I run the application with rails s, log in to my web page, and upload some files. The jobs are delayed by 5 minutes, and then the worker should start working.
I have found this page that does almost what I want, but somehow the workers do not start at all, so the schedule.rb is not run. Is there some additional step that the page does not mention?
Is there any other way to do it?
I recommend you take a look at Foreman (http://ddollar.github.com/foreman/) and have your Procfile declare a worker process:
web: bundle exec rails s
worker: bundle exec rake jobs:work
This way, a single command foreman start will start both the server and worker. The output will be presented in the same window for both.

Resque: worker status is not right

Resque is currently showing me that I have a worker doing work on a queue. I shut that worker down in the middle of the queue (it's just for testing), but it is still showing as running. I've confirmed the process ID has been killed and bluepill is no longer monitoring it. I can't find any way in the UI to force-clear its working status.
What's the best way to update the status for the number of workers that are currently up? (I have 2, the web UI reports 3.)
You may have a lingering pid file. This file is independent of the process running; in other words, when you killed the process, it didn't delete the pid file.
If you're using a typical Rails and Resque setup, Resque will store the pid in the Rails ./tmp directory.
Some Resque start scripts specify the pid file in a different location, something like this:
PIDFILE=foo/bar/resque/pid bundle exec rake resque:work
Wherever the script puts the pid file, look there, then delete it, then restart.
Also on the command line, you can ask redis for the running workers:
redis-cli keys "*worker:*"
If there are workers that you don't expect, you can delete them with:
redis-cli del <keyname>
Try to restart the applications.
For future references: also have a look under https://github.com/resque/resque/issues/299

How do I clear stuck/stale Resque workers?

As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds.
I'm not sure why they won't clear or how to manually remove them.
I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.
None of these solutions worked for me, I would still see this in redis-web:
0 out of 10 Workers Working
Finally, this worked for me to clear all the workers:
Resque.workers.each {|w| w.unregister_worker}
In your console:
queue_name = "process_numbers"
Resque.redis.del "queue:#{queue_name}"
Otherwise you can try to fake them as being done to remove them, with:
Resque::Worker.working.each {|w| w.done_working}
EDIT
A lot of people have been upvoting this answer and I feel that it's important that people try hagope's solution which unregisters workers off a queue, whereas the above code deletes queues. If you're happy to fake them, then cool.
You probably have the resque gem installed, so you can open the console and get current workers
Resque.workers
It returns a list of workers
#=> [#<Worker infusion.local:40194-0:JAVA_DYNAMIC_QUEUES,index_migrator,converter,extractor>]
Pick a worker and call prune_dead_workers on it, for example the first one:
Resque.workers.first.prune_dead_workers
Adding to answer by hagope, I wanted to be able to only unregister workers that had been running for a certain amount of time. The code below will only unregister workers running for over 300 seconds (5 minutes).
Resque.workers.each {|w| w.unregister_worker if w.processing['run_at'] && Time.now - w.processing['run_at'].to_time > 300}
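The age check in that one-liner can be tried out without Redis. Below is a minimal sketch under stated assumptions: `FakeWorker` and `stale_workers` are hypothetical stand-ins, not part of Resque, and `run_at` is assumed to be an ISO 8601 string. Plain Ruby needs `Time.parse` here, because `String#to_time` comes from ActiveSupport.

```ruby
require 'time'

# Hypothetical stand-in for a Resque worker; only the `processing` hash matters here.
FakeWorker = Struct.new(:id, :processing)

# Returns the workers whose current job started more than `max_age` seconds ago.
def stale_workers(workers, max_age: 300, now: Time.now)
  workers.select do |worker|
    run_at = worker.processing['run_at']
    run_at && now - Time.parse(run_at) > max_age
  end
end

now = Time.utc(2020, 1, 1, 12, 0, 0)
workers = [
  FakeWorker.new('host:1:*', { 'run_at' => '2020-01-01T11:50:00Z' }), # started 10 minutes ago
  FakeWorker.new('host:2:*', { 'run_at' => '2020-01-01T11:58:00Z' }), # started 2 minutes ago
  FakeWorker.new('host:3:*', {})                                      # idle, no current job
]

puts stale_workers(workers, now: now).map(&:id).inspect
# prints ["host:1:*"]
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test the threshold before running the real command against production workers.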
I have an ongoing collection of Resque related Rake tasks that I have also added this to: https://gist.github.com/ewherrmann/8809350
Run this command wherever you ran the command to start the server
$ ps -e -o pid,command | grep [r]esque
you should see something like this:
92102 resque: Processing ProcessNumbers since 1253142769
Make note of the PID (process id) in my example it is 92102
Then you can quit the process in one of two ways, using the kill command:
Gracefully: kill -QUIT 92102
Forcefully: kill -TERM 92102
Let me know if you have any trouble.
I just did:
% rails c production
irb(main):001:0>Resque.workers
Got the list of workers.
irb(main):002:0>Resque.remove_worker(Resque.workers[n].id)
... where n is the zero based index of the unwanted worker.
I had a similar problem: Redis had saved a DB snapshot to disk that included invalid (non-running) workers, so they reappeared each time Redis/Resque was started.
Fix this using:
Resque::Worker.working.each {|w| w.done_working}
Resque.redis.save # Save the DB to disk without ANY workers
Make sure you restart Redis and your Resque workers.
Started working on https://github.com/shaiguitar/resque_stuck_queue/ recently. It's not a solution to how to fix stuck workers but it addresses the issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From README:
"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."
Been used in production and works pretty well for me thus far.
Here's how you can purge them from Redis by hostname. This happens to me when I decommission a server and workers do not exit gracefully.
Resque.workers.each { |w| w.unregister_worker if w.id.start_with?(hostname) }
I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered the root cause that was creating this issue was that I was using the gem redis-rb 3.3.0. Downgrading to redis-rb 3.2.2 prevented these workers from getting stuck in the first place.
I've cleared them out from redis-cli directly. Luckily redistogo.com allows access from environments outside heroku.
Get dead worker ID from the list. Mine was
55ba6f3b-9287-4f81-987a-4e8ae7f51210:2
Run this command in redis directly.
del "resque:worker:55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*"
You can monitor redis db to see what it's doing behind the scenes.
redis xxx.redistogo.com> MONITOR
OK
1380274567.540613 "MONITOR"
1380274568.345198 "incrby" "resque:stat:processed" "1"
1380274568.346898 "incrby" "resque:stat:processed:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*" "1"
1380274568.346920 "del" "resque:worker:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*"
1380274568.348803 "smembers" "resque:queues"
Second last line deletes the worker.
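Note that the trailing * in that key is not a wildcard, since DEL removes only exact key names. It appears to be part of the worker id itself. A minimal sketch, assuming ids of the form host:pid:queues and the default "resque" namespace; `parse_worker_id` and `worker_key` are hypothetical helpers, not Resque methods:

```ruby
# Splits an id of the assumed form "host:pid:queue1,queue2" into its parts.
def parse_worker_id(id)
  host, pid, queues = id.split(':', 3)
  { host: host, pid: Integer(pid), queues: queues.to_s.split(',') }
end

# Builds the Redis key name for a worker (assumes the "resque" namespace).
def worker_key(id, namespace: 'resque')
  "#{namespace}:worker:#{id}"
end

id = '55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*'
puts parse_worker_id(id).inspect
puts worker_key(id)
```

Here the worker listens on every queue, so its queue list is the literal string "*", and the key passed to del is the complete key name.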
In resque 2.0.0, here's one way that seems to work to remove only the apparently dead workers:
Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }
I am not an expert in what's going on, so it's possible there's a better way to do this or that this will have problems. I'm just trying to figure this out too.
This seems to remove workers that haven't sent a "heartbeat" in much longer than expected from the resque worker list.
If the phantom worker was in a "running" state, then a new entry in the "failed" job queue will be created corresponding to phantom job.
I had stuck/stale Resque workers here too, or should I say 'jobs', because the worker itself is still there and running fine; it's the forked process that is stuck.
I chose the brutal solution of killing, via a bash script, any forked process that has been "Processing" for more than 5 minutes. The worker then just spawns the next job in the queue, and everything keeps going.
Have a look at my script here: https://gist.github.com/jobwat/5712437
If you are using newer versions of Resque, you'll need to use the following command as the internal APIs have changed...
Resque::WorkerRegistry.working.each {|work| Resque::WorkerRegistry.remove(work.id)}
This avoids the problem as long as you have a resque version newer than 1.26.0:
resque: env QUEUE=foo TERM_CHILD=1 bundle exec rake resque:work
Keep in mind that it does not let the currently running job finish.
If you use Docker, you can also use these commands, where <id> is the worker id:
docker stop <id>
docker start <id>

How can you add or remove workers from delayed_jobs?

On a similar note, how can you know how many workers there are currently assigned?
If you are on your local machine, just run one of the following
# starts the worker
rake jobs:work
# kills it
Control + C on your keyboard
or
# starts the worker
script/delayed_job start
# kills it
script/delayed_job stop
Additionally, here are some commands to spawn multiple workers: https://github.com/collectiveidea/delayed_job/wiki/Delayed-job-command-details
If you want a list of currently running workers, you would do
script/delayed_job status
and this would return each process (which you'd then have to count to get the integer value)
If you are on Heroku, you can do heroku workers to get the number of current workers, and heroku workers 2 to start two workers or heroku workers 0 to kill all workers.
You can also use HireFireApp.com to manage all of your workers for you on Heroku.
Since you didn't specify what type of environment you are running DJ on, please let me know if these don't answer your question.
$ ps ax |grep delayed_job can show you any details directly
The "ps" suggestion works better after first looking in RAILS_ROOT/tmp/pids for delayed_job*.pid files.
Together this tells you what DJ thinks it's running and what is actually running. If one of those files contains a process ID that you can't find in the ps output, that's an indicator that DJ has a worker that has died that it hasn't realized.
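That comparison can be automated. Here is a minimal sketch, assuming pid files named delayed_job*.pid that each contain a single process ID; `process_alive?` and `worker_status` are hypothetical helpers, and the demo uses a temporary directory rather than a real RAILS_ROOT/tmp/pids.

```ruby
require 'tmpdir'

# True if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence, it sends nothing
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Maps each pid file name in `pid_dir` to :running or :stale.
def worker_status(pid_dir)
  Dir.glob(File.join(pid_dir, 'delayed_job*.pid')).sort.map do |path|
    pid = File.read(path).to_i
    alive = pid > 0 && process_alive?(pid)
    [File.basename(path), alive ? :running : :stale]
  end.to_h
end

# Demo: one pid file for this (running) process, one for a child that has already exited.
demo_status = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'delayed_job.0.pid'), Process.pid.to_s)
  dead_pid = fork { exit!(0) }
  Process.wait(dead_pid)
  File.write(File.join(dir, 'delayed_job.1.pid'), dead_pid.to_s)
  worker_status(dir)
end

puts demo_status.inspect
```

A :stale entry corresponds to the case described above: a pid file left behind by a worker that is no longer running.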
