How do I clear stuck/stale Resque workers? - ruby-on-rails

As you can see from the attached image, I've got a couple of workers that seem to be stuck. Those processes shouldn't take longer than a couple of seconds.
I'm not sure why they won't clear or how to manually remove them.
I'm on Heroku using Resque with Redis-to-Go and HireFire to automatically scale workers.

None of these solutions worked for me, I would still see this in redis-web:
0 out of 10 Workers Working
Finally, this worked for me to clear all the workers:
Resque.workers.each {|w| w.unregister_worker}

In your console:
queue_name = "process_numbers"
Resque.redis.del "queue:#{queue_name}"
Otherwise you can try to fake them as being done to remove them, with:
Resque::Worker.working.each {|w| w.done_working}
EDIT
A lot of people have been upvoting this answer and I feel that it's important that people try hagope's solution which unregisters workers off a queue, whereas the above code deletes queues. If you're happy to fake them, then cool.

You probably have the resque gem installed, so you can open the console and get current workers
Resque.workers
It returns a list of workers
#=> [#<Worker infusion.local:40194-0:JAVA_DYNAMIC_QUEUES,index_migrator,converter,extractor>]
pick the worker and prune_dead_workers, for example the first one
Resque.workers.first.prune_dead_workers

Adding to answer by hagope, I wanted to be able to only unregister workers that had been running for a certain amount of time. The code below will only unregister workers running for over 300 seconds (5 minutes).
Resque.workers.each {|w| w.unregister_worker if w.processing['run_at'] && Time.now - w.processing['run_at'].to_time > 300}
I have an ongoing collection of Resque related Rake tasks that I have also added this to: https://gist.github.com/ewherrmann/8809350

Run this command wherever you ran the command to start the server
$ ps -e -o pid,command | grep [r]esque
you should see something like this:
92102 resque: Processing ProcessNumbers since 1253142769
Make note of the PID (process id) in my example it is 92102
Then you can quit the process 1 of 2 ways.
Gracefully use QUIT 92102
Forcefully use TERM 92102
* I'm not sure of the syntax it's either QUIT 92102 or QUIT -92102
Let me know if you have any trouble.

I just did:
% rails c production
irb(main):001:0>Resque.workers
Got the list of workers.
irb(main):002:0>Resque.remove_worker(Resque.workers[n].id)
... where n is the zero based index of the unwanted worker.

I had a similar problem that Redis saved the DB to disk that included invalid (non running) workers. Each time Redis/resque was started they appeared.
Fix this using:
Resque::Worker.working.each {|w| w.done_working}
Resque.redis.save # Save the DB to disk without ANY workers
Make sure you restart Redis and your Resque workers.

Started working on https://github.com/shaiguitar/resque_stuck_queue/ recently. It's not a solution to how to fix stuck workers but it addresses the issue of resque hanging/being stuck, so I figured it could be helpful for people on this thread. From README:
"If resque doesn't run jobs within a certain timeframe, it will trigger a pre-defined handler of your choice. You can use this to send an email, pager duty, add more resque workers, restart resque, send you a txt...whatever suits you."
Been used in production and works pretty well for me thus far.

Here's how you can purge them from Redis by hostname. This happens to me when I decommission a server and workers do not exit gracefully.
Resque.workers.each { |w| w.unregister_worker if w.id.start_with?(hostname) }

I ran into this issue and started down the path of implementing a lot of the suggestions here. However, I discovered the root cause that was creating this issue was that I was using the gem redis-rb 3.3.0. Downgrading to redis-rb 3.2.2 prevented these workers from getting stuck in the first place.

I've cleared them out from redis-cli directly. Luckily redistogo.com allows access from environments outside heroku.
Get dead worker ID from the list. Mine was
55ba6f3b-9287-4f81-987a-4e8ae7f51210:2
Run this command in redis directly.
del "resque:worker:55ba6f3b-9287-4f81-987a-4e8ae7f51210:2:*"
You can monitor redis db to see what it's doing behind the scenes.
redis xxx.redistogo.com> MONITOR
OK
1380274567.540613 "MONITOR"
1380274568.345198 "incrby" "resque:stat:processed" "1"
1380274568.346898 "incrby" "resque:stat:processed:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*" "1"
1380274568.346920 "del" "resque:worker:c65c8e2b-555a-4a57-aaa6-477b27d6452d:2:*"
1380274568.348803 "smembers" "resque:queues"
Second last line deletes the worker.

In resque 2.0.0, here's one way that seems to work to remove only actually appearantly-dead workers in resque 2.0.0:
Resque::Worker.all_workers_with_expired_heartbeats.each { |w| w.unregister_worker }
I am not an expert in what's going, it's possible there's a better way to do this or that this will have problems. I'm just trying to figure this out too.
This seems to remove workers that haven't sent a "heartbeat" in much longer than expected from the resque worker list.
If the phantom worker was in a "running" state, then a new entry in the "failed" job queue will be created corresponding to phantom job.

I had stuck/stale resque workers here too, or should I say 'jobs', because the worker is actually still there and running fine, it's the forked process that is stuck.
I chose the brutal solution of killing the forked process "Processing" since more than 5min, via a bash script, then the worker just spawn the next in queue, and everything keeps on going
have a look at my script here: https://gist.github.com/jobwat/5712437

If you are using newer versions of Resque, you'll need to use the following command as the internal APIs have changed...
Resque::WorkerRegistry.working.each {|work| Resque::WorkerRegistry.remove(work.id)}

This avoids the problem as long as you have a resque version newer than 1.26.0:
resque: env QUEUE=foo TERM_CHILD=1 bundle exec rake resque:work
Keep in mind that it does not let the currently running job finish.

If you use Docker, you can also use this command:
<id> is the worker id.
docker stop <id>
docker start <id>

Related

Rails - Old cron job keeps running, can't delete it

So I'm using Rails and I have a few Sidekiq workers, but none are enabled. I'm using the sidekiq-cron gem, which requires you to put files in app/workers/, configure a sidekiq scheduler in config/sidekiq_schedule.yml, and also add a few lines in config/initializers/sidekiq.rb. However, I've commented everything out from sidekiq_schedule.yml and also commented the following lines out from sidekiq.rb:
# Sidekiq scheduler.
# schedule_file = 'config/sidekiq_schedule.yml'
# if File.exists?(schedule_file) && Sidekiq.server?
# Sidekiq::Cron::Job.load_from_hash! YAML.load_file(schedule_file)
# end
However, if I launch Sidekiq, every minute (which is the old schedule), I see this in the prompt:
2018-01-19T02:54:04.156Z 22197 TID-ovsidcme8 ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-8609429b89db2a91793509ea INFO: start
2018-01-19T02:54:04.164Z 22197 TID-ovsidcme8 ActiveJob::QueueAdapters::SidekiqAdapter::JobWrapper JID-8609429b89db2a91793509ea INFO: fail: 0.008 sec
and it fails because it's trying to launch code a job that's not supposed to be launching.
I've went to the rails console prompt (rails -c) and tried to find the job, but nothing's in there:
irb(main):001:0> Sidekiq::Cron::Job.all
=> []
so I'm not quite sure why it's constantly trying to launch a job. If I go to the rails interface on my application, I don't see anything in the queue, nothing being processed, busy, retries, enqueued, nothing.
Any suggestions would be greatly appreciated. I've been trying to hunt this down for like the last hour and have no success. I even removed ALL of the workers from the workers directory, and yet it's still trying to launch one of them.
Because you have already load jobs, I think that those jobs configuration are still in REDIS. Checking this assumption by opening a new terminal tab with redis-cli:
KEYS '*cron*'
If there are those keys on REDIS, clear them will fix your issue.
Since you mentioned a cron job in your title but not in the question, I'm assuming there's a cronjob running the background sidekiq task.
Try running crontab - l in Terminal to see all your cron jobs. If you see something like "* * * * *", that means there's a job that is running every minute.
Then, use crontab - r to clear your cron tab and delete all scheduled tasks.

Prevent sidekiq from executing queued up jobs when starting from command line?

When I start sidekiq in my development environment (Rails 3.2), I use the following command:
bundle exec sidekiq
When I do this, sidekiq will execute all jobs that have been queued up when it was not running. e.g. If I have created a bunch of new user accounts during testing, it will try and send welcome emails to all of the fake accounts (my emails are sent from a sidekiq job).
Is there a way to start sidekiq and tell it to delete all pending jobs? That way I can turn it back on without worrying that it will try and run a bunch of jobs that don't need to run (since this is my dev environment).
I have looked in documentation, but can't find an answer, hopefully it's something simple I overlooked...
redis-cli flushall && bundle exec sidekiq
I found a solution: Using the sidekiq monitoring UI that comes with sidekiq (https://github.com/mperham/sidekiq/wiki/Monitoring), I'm able to view all queues (even when sidekiq is not running). Deleting the queue will remove all of the jobs in it, which solves the problem.

Resque: worker status is not right

Resque is currently showing me that I have a worker doing work on a queue. That worker was shutdown by me in the middle of the queue (it's just for testing) and the worker is still showing as running. I've confirmed the process ID has been killed and bluepill is no longer monitoring it. I can't find anyway in the UI to force clear that it is working.
What's the best way to update the status for the # of workers that are currently up (I have 2, web UI reports 3).
You may have a lingering pid file. This file is independent of the process running; in other words, when you killed the process, it didn't delete the pid file.
If you're using a typical Rails and Resque setup, Resque will store the pid in the Rails ./tmp directory.
Some Resque start scripts specify the pid file in a different location, something like this:
PIDFILE=foo/bar/resque/pid bundle exec rake resque:work
Wherever the script puts the pid file, look there, then delete it, then restart.
Also on the command line, you can ask redis for the running workers:
redis-cli keys *worker:*
If there are workers that you don't expect, you can delete them with:
redis-cli del <keyname>
Try to restart the applications.
For future references: also have a look under https://github.com/resque/resque/issues/299

Rufus-scheduler only running once on production

I'm using rufus-scheduler to run a process every day from a rails server. For testing purposes, let's say every 5 minutes. My code looks like this:
in config/initializers/task_scheduler.rb
scheduler = Rufus::Scheduler::PlainScheduler.start_new
scheduler.every "10m", :first_in => '30s' do
# Do stuff
end
I've also tried the cron format:
scheduler.cron '50 * * * *' do
# stuff
end
for example, to get the process to run every hour at 50 minutes after the hour.
The infuriating part is that it works on my local machine. The process will run regularly and just work. It's only on my deployed-to-production app that the process will run once, and not repeat.
ps faux reveals that cron is running, passenger is handling the spin-up of the rails process, the site has been pinged again so it knows it should refresh, and production shows the changes in the code. The only thing that's different is that, without a warning or error, the scheduled task doesn't repeat.
Help!
You probably shouldn't run rufus-scheduler in the rails server itself, especially not with a multi-process framework like passenger. Instead, you should run it in a daemon process.
My theory on what's happening:
Passenger starts up a ruby server process and uses it to fork off other servers to handle requests. But since rufus-scheduler runs its jobs in a separate thread from the main thread, the rufus thread is only alive in the original ruby process (ruby's fork only duplicates the thread that does the forking). This might seem like a good thing because it prevents multiple schedulers from running, but... Passenger may kill ruby processes under certain conditions - and if it kills the original, the scheduler thread is gone.
Add the Below lines to your apache2 config /etc/apache2/apach2.conf and restart your apache server
RailsAppSpawnerIdleTime 0
PassengerMinInstances 1
Kelvin is right.
Passenger kills 'unnecessary' threads.
http://groups.google.com/group/rufus-ruby/search?group=rufus-ruby&q=passenger

Delayed Jobs on Rails 2: any better way to run workers?

I finally got the DelayedJobs plugin working for Rails 2, and it does indeed work fine...as long as I run:
rake jobs:work
Just like the readme says, to be fair.
BUT, this doesn't fit my requirements...what kind of background task requires you to have a shell open, and a command running? That'd be like having to say script/server to run my rails app, and never getting that -d option so it'll keep running even after I close my shell.
Is there ANY way to keep the workers getting processed in the backgroun, or in daemon mode, or whatever?
I had a ray of hope when I saw the
You can also run by writing a simple
#script/job_runner#, and invoking it
externally:
Line in the readme...but...that just does the exact same thing the rake task does, you just call it a different way.
What I want:
I want to start my rails app, then start whatever will process the workers, and have BOTH of them run invisibly in the background, without the need for me to babysit it and keep the shell that started it running.
(My server is something I SSH into, so I don't want to have that shell that SSHed into it running 24/7 (especially since I like to turn off my local computer now and again)).
Is there any way to acomplish this?
You can make any *nix command run on the background by appending an & to its end:
rake jobs:work &
Just make sure you exit the shell (or use the disown command) to detach the process from your login session... Otherwise, if your session disconnects, the processes you own will be killed with it.
Perhaps Beanstalkd and Stalker?
Beanstalk is a fast and easy way to queue background tasks. Stalker provides a nice wrapper interface for creating these jobs.
See the railscast on it for more information
Edit:
You could also run that rake task as a cronjob which would mean the server would run it periodically without you needing to be logged in
Use the collectiveidea fork of delayed_job... It's more actively developed and has support for running the jobs in a daemon without any extra messing about.
My capistrano script calls
RAILS_ENV=production script/delayed_job start

Resources