How to restart Resque workers automatically when Redis server restarts - ruby-on-rails

I have a Rails application that runs jobs on the background using the Resque adapter. I have noticed that once a couple of days my workers disappear (just stop), my jobs get stuck in the queue, and I have to restart the workers anew every time they stop.
I check using ps -e -o pid,command | grep [r]esque and launch workers in the background using
(RAILS_ENV=production PIDFILE=./resque.pid BACKGROUND=yes bundle exec rake resque:workers QUEUE='*' COUNT='12') 2>&1 | tee -a log/resque.log.
Then I stopped redis-server using /etc/init.d/redis-server stop and again checked the worker processes. They disappeared.
This gives a reason to think that worker processes stop maybe because the redis server restarting because of some reason.
Is there any Rails/Ruby way solution to this problem? What comes to my mind is writing a simple Ruby code that would watch the worker processes with the period, say, 5 seconds, and restart them if they stop.
UPDATE:
I don't want to use tools such as Monit, God, eye, and etc. They are not reliable. Then I will need to watch them too. Something like to install God to manage Resque workers, then install Monit to watch God, ...
UPDTAE
This is what I am using and it is really working. I manually stoped redis-server and then started it again. This script successfully launched the workers.
require 'logger'
module Watch
def self.workers_dead?
processes = `ps -e -o pid,command | grep [r]esque`
return true if processes.empty?
false
end
def self.check(time_interval)
logger = Logger.new('watch.log', 'daily')
logger.info("Starting watch")
while(true) do
if workers_dead?
logger.warn("Workers are dead")
restart_workers(logger)
end
sleep(time_interval)
end
end
def self.restart_workers(logger)
logger.info("Restarting workers...")
`cd /var/www/agts-api && (RAILS_ENV=production PIDFILE=./resque.pid BACKGROUND=yes rake resque:workers QUEUE='*' COUNT='12') 2>&1 | tee -a log/resque.log`
end
end
Process.daemon(true,true)
pid_file = File.dirname(__FILE__) + "#{__FILE__}.pid"
File.open(pid_file, 'w') { |f| f.write Process.pid }
Watch.check 10

You can use process monitoring tools such as monit, god, eye etc. These tools can check for resque PID and memory usage at time interval specified by you. You also have options to restart background processes if the memory limit exceeds your specified expectations. Personally, I use eye gem.

You could do it much simpler. Start Resque in foreground. When it exits, start it again. No pid files, no monitoring, no sleep.
require 'logger'
class Restarter
def initialize(cmd:, logger: Logger.new(STDOUT))
#cmd = cmd
#logger = logger
end
def start
loop do
#logger.info("Starting #{#cmd}")
system(#cmd)
#logger.warn("Process exited: #{#cmd}")
end
end
end
restarter = Restarter.new(
cmd: 'cd /var/www/agts-api && (RAILS_ENV=production rake resque:workers QUEUE='*' COUNT='12') 2>&1 | tee -a log/resque.log',
logger: Logger.new('watch.log', 'daily')
)
restarter.start

Related

Capistrano not restarting Sidekiq

I have Capistrano deploying my app to a Ubuntu remote server on a cloud host. It works except that Sidekiq does not get restarted. After a deploy new Sidekiq jobs are stuck in the queue until it does finally get restarted. I currently manually SSH into the machine and run sudo initctl stop/start workers which works. I am not super strong at all with Capistrano and me research so far has failed to find me a solution to this. I am hoping I am missing something obvious to someone more familiar than me. Here is the relevant portion of my /config/deploy.rb file:
namespace :deploy do
namespace :sidekiq do
task :quiet do
on roles(:app) do
puts capture("pgrep -f 'workers' | xargs kill -USR1")
end
end
task :restart do
on roles(:app) do
execute :sudo, :initctl, :stop, :workers
execute :sudo, :initctl, :start, :workers
end
end
end
after 'deploy:starting', 'sidekiq:quiet'
after 'deploy:reverted', 'sidekiq:restart'
after 'deploy:published', 'sidekiq:restart'
end
UPDATE
From my reply logs:
DEBUG [268bc235] Running /usr/bin/env kill -0 $( cat /home/ubuntu/staging/shared/tmp/pids/sidekiq-0.pid ) as ubuntu#159.203.8.242
DEBUG [268bc235] Command: cd /home/ubuntu/staging/releases/20160806065537 && ( export RBENV_ROOT="$HOME/.rbenv" RBENV_VERSION="2.2.3" ; /usr/bin/env kill -0 $( cat /home/ubuntu/staging/shared/tmp/pids/sidekiq-0.pid ) )
DEBUG [268bc235] Finished in 0.471 seconds with exit status 1 (failed).
I don't believe you need those configs in your deploy.rb if you have the capistrano-sidekiq gem installed and called in your Capfile.
Make sure you have require 'capistrano/sidekiq' in your Capfile or it won't know to call the default tasks.

How to kill sidekiq job in Rails 4 with ActiveJob

Let's say we have test job:
class TestJob < ActiveJob::Base
queue_as :default
def perform(video)
video.process
end
end
After it we run bundle exec sidekiq start
And run in new terminal window in rails console
Video.first.pending!; TestJob.perform_later(Video.first)
Job is running, ffmpeg is on background, everything is fine, but according to official sidekiq wiki docs I try:
require 'sidekiq/api'
Sidekiq::Queue.new
=> #<Sidekiq::Queue:0x00000006406978 #name="default", #rname="queue:default">
Sidekiq::Queue.new.each {|job| puts job}
=> nil
Sidekiq::Queue.new.size
=> 0
ss = Sidekiq::ScheduledSet.new
=> #<Sidekiq::ScheduledSet:0x000000064a4330 #_size=0, #name="schedule">
ss.size
=> 0
Why there is no jobs? The job is running successfully ( I see it also in first window where sidekiq starts, but I can't see and delete it in rails console )
I am using Ubuntu 14 if it helps
With best regards, Ruslan.
UPD:
It looks that
ps = Sidekiq::ProcessSet.new; ps.each(&:quiet!)
works
, but it doesn't stop my ffmpeg process this part of code internally in process:
cmd = "ffmpeg -i #{input_file.shellescape} #{options} -threads 0 -y #{self.path + outfile}"
pid = spawn(cmd, :out => output_file, :err => output_file)
Process.wait(pid)
How to stop it?
If a job is processing, it's not enqueued anymore.
I use sidekiq web UI and various queues for my jobs. If I realize that particular job is failing I can clear out the queue of subsequent jobs of the same class.

How can I execute a binary through ruby for a limited time?

I want to run a binary through ruby for a limited time. In my case its airodump-ng with the full command of:
airodump-ng -c <mychan> mon0 --essid "my_wlan" --write capture
For the ones who don't know airdump-ng for normal it starts and doesn't terminate itself. Its running forever if the user doesn't stop it by pressing Strg + C. This isn't a problem at the bash but executing it through ruby it's causing serious trouble. Is there a way to limit the time a binary is runned by the system method?
Try timeout library:
require 'timeout'
begin
Timeout.timeout(30) do
system('airodump-ng -c <mychan> mon0 --essid "my_wlan" --write capture')
end
rescue Timeout::Error
# timeout
end
You could use the Ruby spawn method. From the Ruby docs:
This method is similar to #system but it doesn’t wait for the command to finish.
Something like this:
# Start airodump
pid = spawn('airodump-ng -c <mychan> mon0 --essid "my_wlan" --write capture')
# Wait a little while...
sleep 60
# Then stop airodump (similar to pressing CTRL-C)
Process.kill("INT", pid)

Sidekiq jobs not persisting data on a rails app

I have the following setup, an ubuntu server with nginx - passenger, postgres db, redis and sidekiq.
i have set up an upstart job to keep an eye on the sidekiq process.
description "Sidekiq Background Worker"
# no "start on", we don't want to automatically start
stop on (stopping workers or runlevel [06])
# change to match your deployment user
setuid deploy
setgid deploy
respawn
respawn limit 3 30
# TERM and USR1 are sent by sidekiqctl when stopping sidekiq. Without declaring these as normal exit codes, it just respawns.
normal exit 0 TERM USR1
script
# this script runs in /bin/sh by default
# respawn as bash so we can source in rbenv
exec /bin/bash <<EOT
# use syslog for logging
exec &> /dev/kmsg
# pull in system rbenv
export HOME=/home/deploy
export RAILS_ENV=production
source /home/deploy/.rvm/scripts/rvm
cd /home/deploy/myapp/current
bundle exec sidekiq -c 25 -L log/sidekiq.log -P tmp/pids/sidekiq.pid -q default -q payment -e production
EOT
end script
This actually work i can start, stop and restart the process. i can see the process over at the sidekiq web http monitor. but when i run this workers workers they get processed. Here are both workers:
class EventFinishedWorker
include Sidekiq::Worker
def perform(event_id)
event = Event.find(event_id)
score_a = event.team_A_score
score_b = event.team_B_score
event.bets.each do |bet|
if (score_a == bet.team_a_score) && (score_b == bet.team_b_score)
bet.guessed = true
bet.save
end
end
end
end
and
class PayPlayersBetsWorker
include Sidekiq::Worker
sidekiq_options :queue => :payment
def perform(event_id)
event = Event.find(event_id)
bets = event.bets.where(guessed: true)
total_winning_bets = bets.count
unless total_winning_bets == 0
pot = event.pot
amount_to_pay = pot / total_winning_bets
unless bets.empty?
bets.each do |bet|
account = bet.user.account
user = User.find_by_id(bet.user.id)
bet.payed = true
bet.save
account.wincoins += amount_to_pay
account.accum_wincoins.increment(amount_to_pay)
account.save
user.set_score(user.current_score)
end
end
end
end
end
2014-05-27T18:48:34Z 5985 TID-7agy8 EventFinishedWorker JID-33ef5d7e7de51189d698c1e7 INFO: start
2014-05-27T18:48:34Z 5985 TID-5s4rg PayPlayersBetsWorker JID-e8473aa1bc59f0b958d23509 INFO: start
2014-05-27T18:48:34Z 5985 TID-5s4rg PayPlayersBetsWorker JID-e8473aa1bc59f0b958d23509 INFO: done: 0.07 sec
2014-05-27T18:48:35Z 5985 TID-7agy8 EventFinishedWorker JID-33ef5d7e7de51189d698c1e7 INFO: done: 0.112 sec
I get no errors running both workers processes, but ...
cheking my data, EventFinishedWorker, does its job. no problems whatsoever. the second job PayPlayersBetsWorker doesnt work. but when i go into the console and do something like.
worker = PayPlayersBetsWorker.new
worker.perform(1)
it works it runs the jobs flawlessly. i have detected also that if i run sidekiq from the console directly not using upstart it works too.
bundle exec sidekiq -P tmp/pids/sidekiq.pid -q default -q payment -e production
this works. the problem seems to be running sidekiq with upstart. help would be appreciated :D
It looks like you have a race condition between the two jobs. EFW isn't finished before PPBW queries, doesn't find anything and immediately exits.

Keep getting 504 Gateway Time-out after deploying on ec2 using rubber

I used the rubber gem to deploy my application on ec2.
I followed the instructions here: http://ramenlab.wordpress.com/2011/06/24/deploying-your-rails-app-to-aws-ec2-using-rubber/.
The process seems to finish successfully but when I try to use the app I keep getting 504 gateway time-out.
Why is this happening and how do I fix it?
Answer from Matthew Conway (reposted below): https://groups.google.com/forum/?fromgroups#!searchin/rubber-ec2/504/rubber-ec2/AtEoOf-T9M0/zgda0Fo1qeIJ
Note: Even with this code you need to do something like:
> cap deploy:update
> FILTER=app01,app02 cap deploy:restart
> FILTER=app03,app04 cap deploy:restart
I assume this is a rails application? The rails stack is notoriously slow to load up, so this delay in load time is probably what you are seeing. Passenger was supposed to make this better with the zero downtime feature v3, but they seemed to have reneged on that and are only going to be offering it as part of some undefined paid version at some undefined point n the future.
What I do is have multiple app server instances and restart them serially so that I can continue to serve traffic on one, while the others are restarting. Doesn't work with a single instance, but most production setups need multiple instances for redundancy/reliability anyway. This isn't currently part of rubber, but I have it deploy scripts setup for my app and will merge it in at some point - my config looks something like the below.
Matt
rubber-passenger.yml:
roles:
passenger:
rolling_restart_port: "#{passenger_listen_port}"
web_tools:
rolling_restart_port: "#{web_tools_port}"
deploy-apache.rb:
on :load do
rubber.serial_task self, :serial_restart, :roles => [:app, :apache] do
rsudo "service apache2 restart"
end
rubber.serial_task self, :serial_reload, :roles => [:app, :apache] do
# remove file checked by haproxy to take server out of pool, wait some
# secs for haproxy to realize it
maybe_sleep = " && sleep 5" if RUBBER_ENV == 'production'
rsudo "rm -f #{previous_release}/public/httpchk.txt #{current_release}/public/httpchk.txt#{maybe_sleep}"
rsudo "if ! ps ax | grep -v grep | grep -c apache2 &> /dev/null; then service apache2 start; else service apache2 reload; fi"
# Wait for passenger to startup before adding host back into haproxy pool
logger.info "Waiting for passenger to startup"
opts = get_host_options('rolling_restart_port') {|port| port.to_s}
rsudo "while ! curl -s -f http://localhost:$CAPISTRANO:VAR$/ &> /dev/null; do echo .; done", opts
# Touch the file so that haproxy adds this server back into the pool.
rsudo "touch #{current_path}/public/httpchk.txt#{maybe_sleep}"
end
end
after "deploy:restart", "rubber:apache:reload"
desc "Starts the apache web server"
task :start, :roles => :apache do
rsudo "service apache2 start"
opts = get_host_options('rolling_restart_port') {|port| port.to_s}
rsudo "while ! curl -s -f http://localhost:$CAPISTRANO:VAR$/ &> /dev/null; do echo .; done", opts
rsudo "touch #{current_path}/public/httpchk.txt"
end
I got the same error and solved the problem.
It was "haproxy" timeout. It is a load balancer installed by Rubber.
It is set to 30000ms, you should change it in rubber configuration file.
Good luck!

Resources