How to monitor delayed_job with monit - ruby-on-rails

Are there any examples on the web of how to monitor delayed_job with Monit?
Everything I can find uses God, but I refuse to use God since long running processes in Ruby generally suck. (The most current post in the God mailing list? God Memory Usage Grows Steadily.)
Update: delayed_job now comes with a sample monit config based on this question.

Here is how I got this working.
Use the collectiveidea fork of delayed_job. Besides being actively maintained, this version has a nice script/delayed_job daemon you can use with monit. Railscasts has a good episode about this version of delayed_job (ASCIICasts version). The script also has some other nice features, like the ability to run multiple workers, but I don't cover that here.
Install monit. I installed from source because Ubuntu's version is so ridiculously out of date. I followed these instructions to get the standard init.d scripts that come with the Ubuntu packages. I also needed to configure with ./configure --sysconfdir=/etc/monit so the standard Ubuntu configuration dir was picked up.
Write a monit script. Here's what I came up with:
check process delayed_job with pidfile /var/www/app/shared/pids/delayed_job.pid
start program = "/var/www/app/current/script/delayed_job -e production start"
stop program = "/var/www/app/current/script/delayed_job -e production stop"
I store this in my source control system and point monit at it with include /var/www/app/current/config/monit in the /etc/monit/monitrc file.
Configure monit. These instructions are laden with ads but otherwise OK.
Write a task for capistrano to stop and start delayed_job. monit start delayed_job and monit stop delayed_job are what you want to run. I also reload monit when deploying to pick up any config file changes.
Problems I ran into:
daemons gem must be installed for script/delayed_job to run.
You must pass the Rails environment to script/delayed_job with -e production (for example). This is documented in the README file but not in the script's help output.
I use Ruby Enterprise Edition, so I needed to get monit to start with that copy of Ruby. Because of the way sudo handles the PATH in Ubuntu, I ended up symlinking /usr/bin/ruby and /usr/bin/gem to the REE versions.
When debugging monit, I found it helps to stop the init.d version and run it from the command line, so you can see error messages. Otherwise it is very difficult to figure out why things are going wrong.
sudo /etc/init.d/monit stop
sudo monit start delayed_job
Hopefully this helps the next person who wants to monitor delayed_job with monit.

For what it's worth, you can always use /usr/bin/env with monit to set up the environment. This is especially important in the current version of delayed_job, 1.8.4, where the environment (-e) option is deprecated.
check process delayed_job with pidfile /var/app/shared/pids/delayed_job.pid
start program = "/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job start"
stop program = "/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job stop"
In some cases, you may also need to set the PATH with env, too.
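For example, a stanza that sets both variables might look like this (a sketch only; the PATH entries and app paths are illustrative and should match your own layout):

```
check process delayed_job with pidfile /var/app/shared/pids/delayed_job.pid
start program = "/usr/bin/env PATH=/usr/local/bin:/usr/bin:/bin RAILS_ENV=production /var/app/current/script/delayed_job start"
stop program = "/usr/bin/env PATH=/usr/local/bin:/usr/bin:/bin RAILS_ENV=production /var/app/current/script/delayed_job stop"
```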

I found it was easier to create an init script for delayed job. It is available here: http://gist.github.com/408929
or below:
#! /bin/sh
set_path="cd /home/rails/evatool_staging/current"
case "$1" in
start)
echo -n "Starting delayed_job: "
su - rails -c "$set_path; RAILS_ENV=staging script/delayed_job start" >> /var/log/delayed_job.log 2>&1
echo "done."
;;
stop)
echo -n "Stopping delayed_job: "
su - rails -c "$set_path; RAILS_ENV=staging script/delayed_job stop" >> /var/log/delayed_job.log 2>&1
echo "done."
;;
*)
N=/etc/init.d/delayed_job_staging
echo "Usage: $N {start|stop}" >&2
exit 1
;;
esac
exit 0
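The control flow of the script reduces to a simple case dispatch. Here is a standalone skeleton with the su/delayed_job invocations stubbed out as echoes, so the start/stop/usage paths can be exercised on any machine (the stubs are mine, not part of the original script):

```shell
#!/bin/sh
# Skeleton of the init script above: dispatch on $1; unknown arguments
# print a usage message and return non-zero, as init scripts should.
dispatch() {
  case "$1" in
    start) echo "Starting delayed_job" ;;   # real script: su - rails -c "..."
    stop)  echo "Stopping delayed_job" ;;   # real script: su - rails -c "..."
    *)     echo "Usage: {start|stop}" >&2; return 1 ;;
  esac
}
dispatch "${1:-start}"
```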
Then make sure that monit is set to start/restart the app, so in your monitrc file:
check process delayed_job with pidfile "/path_to_my_rails_app/shared/pids/delayed_job.pid"
start program = "/etc/init.d/delayed_job start"
stop program = "/etc/init.d/delayed_job stop"
and that works great!

I found a nice way to start delayed_job with cron on boot. I'm using whenever to control cron.
My schedule.rb:
# custom job type to control delayed_job
job_type :delayed_job, 'cd :path;RAILS_ENV=:environment script/delayed_job ":task"'
# delayed job start on boot
every :reboot do
delayed_job "start"
end
Note: I upgraded the whenever gem to version 0.5.0 to be able to use job_type.

I don't know about Monit, but I've written a couple of Munin plugins to monitor Queue Size and Average Job Run Time. The changes I made to delayed_job in that patch might also make it easier for you to write Monit plugins, in case you stick with that.

Thanks for the script.
One gotcha -- since monit by definition has a 'spartan path' of
/bin:/usr/bin:/sbin:/usr/sbin
... and for me ruby was installed / linked in /usr/local/bin, I had to thrash around for hours trying to figure out why monit was silently failing when trying to restart delayed_job (even with -v for monit verbose mode).
In the end I had to do this:
check process delayed_job with pidfile /var/www/app/shared/pids/delayed_job.pid
start program = "/usr/bin/env PATH=$PATH:/usr/local/bin /var/www/app/current/script/delayed_job -e production start"
stop program = "/usr/bin/env PATH=$PATH:/usr/local/bin /var/www/app/current/script/delayed_job -e production stop"
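You can reproduce monit's view of the world by clearing the environment and resolving commands under the spartan path only; if this lookup fails, monit's start program will fail the same silent way (where ruby lives is system-specific, hence the fallback message):

```shell
# Look up ruby the way monit would: empty environment, spartan PATH only.
# Prints the resolved path, or a diagnostic if nothing is found there.
env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin sh -c 'command -v ruby' \
  || echo "ruby not found on monit's default PATH"
```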

I had to combine the solutions on this page with another script made by toby to make it work with monit and starting with the right user.
So my delayed_job.monitrc looks like this:
check process delayed_job
with pidfile /var/app/shared/pids/delayed_job.pid
start program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job start' - rails"
stop program = "/bin/su -c '/usr/bin/env RAILS_ENV=production /var/app/current/script/delayed_job stop' - rails"

If your monit is running as root and you want to run delayed_job as my_user then do this:
/etc/init.d/delayed_job:
#!/bin/sh
# chmod 755 /etc/init.d/delayed_job
# chown root:root /etc/init.d/delayed_job
case "$1" in
start|stop|restart)
DJ_CMD=$1
;;
*)
echo "Usage: $0 {start|stop|restart}"
exit 1
esac
su -c "cd /var/www/my_app/current && /usr/bin/env bin/delayed_job $DJ_CMD" - my_user
/var/www/my_app/shared/monit/delayed_job.monitrc:
check process delayed_job with pidfile /var/www/my_app/shared/tmp/pids/delayed_job.pid
start program = "/etc/init.d/delayed_job start"
stop program = "/etc/init.d/delayed_job stop"
if 5 restarts within 5 cycles then timeout
/etc/monit/monitrc:
# add at bottom
include /var/www/my_app/shared/monit/*

Since I didn't want to run as root, I ended up creating a bash init script that monit used for starting and stopping (PROGNAME would be the absolute path to script/delayed_job):
start() {
echo "Starting $PROGNAME"
sudo -u $USER /usr/bin/env HOME=$HOME RAILS_ENV=$RAILS_ENV $PROGNAME start
}
stop() {
echo "Stopping $PROGNAME"
sudo -u $USER /usr/bin/env HOME=$HOME RAILS_ENV=$RAILS_ENV $PROGNAME stop
}

I have spent quite a bit of time on this topic. I was fed up with not having a good solution for it so I wrote the delayed_job_tracer plugin that specifically addresses monitoring of delayed_job and its jobs.
Here is an article I've written about it: http://modernagility.com/articles/5-monitoring-delayed_job-and-its-jobs
This plugin will monitor your delayed_job process and send you an e-mail if delayed_job crashes or if one of its jobs fails.

For Rails 3, you may need to set the HOME environment variable to make compass work properly. The config below works for me:
check process delayed_job
with pidfile /home/user/app/shared/pids/delayed_job.pid
start program = "/bin/sh -c 'cd /home/user/app/current; HOME=/home/user RAILS_ENV=production script/delayed_job start'"
stop program = "/bin/sh -c 'cd /home/user/app/current; HOME=/home/user RAILS_ENV=production script/delayed_job stop'"

I ran into an issue where, if the delayed job worker dies while it still has a job locked, that job will never be freed. I wrote a wrapper script around delayed_job that looks at the pid file and frees any jobs from the dead worker.
The script is for rubber/capistrano
roles/delayedjob/monit-delayedjob.conf:
<% #path = '/etc/monit/monit.d/monit-delayedjob.conf' %>
<% workers = 4 %>
<% workers.times do |i| %>
<% PIDFILE = "/mnt/custora-#{RUBBER_ENV}/shared/pids/delayed_job.#{i}.pid" %>
<%= "check process delayed_job.#{i} with pidfile #{PIDFILE}"%>
group delayed_job-<%= RUBBER_ENV %>
<%= " start program = \"/bin/bash /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper #{i} start\"" %>
<%= " stop program = \"/bin/bash /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper #{i} stop\"" %>
<% end %>
roles/delayedjob/delayed_job_wrapper:
#!/bin/bash
<% #path = "/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current/script/delayed_job_wrapper" %>
<%= "pid_file=/mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/shared/pids/delayed_job.$1.pid" %>
if [ -e $pid_file ]; then
pid=`cat $pid_file`
if [ $2 == "start" ]; then
ps -p "$pid" > /dev/null
if [ $? -eq 0 ]; then
echo "already running $pid"
exit
fi
rm $pid_file
fi
locked_by="delayed_job.$1 host:`hostname` pid:$pid"
<%=" /usr/bin/mysql -e \"update delayed_jobs set locked_at = null, locked_by = null where locked_by='$locked_by'\" -u#{rubber_env.db_user} -h#{rubber_instances.for_role('db', 'primary' => true).first.full_name} #{rubber_env.db_name} " %>
fi
<%= "cd /mnt/#{rubber_env.app_name}-#{RUBBER_ENV}/current" %>
. /etc/profile
<%= "RAILS_ENV=#{RUBBER_ENV} script/delayed_job -i $1 $2"%>
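As an aside, a compact way to express the wrapper's liveness test is kill -0, which probes for process existence without actually sending a signal. A standalone sketch (my own helper, not part of the rubber template):

```shell
#!/bin/sh
# Stale-pidfile test: succeeds only if the pidfile exists and names a
# live process. kill -0 delivers no signal; it just checks existence.
pid_alive() {
  pid=$(cat "$1" 2>/dev/null) || return 1   # no pidfile -> not alive
  kill -0 "$pid" 2>/dev/null
}

# Demo: this shell's own pid is alive; a nonsense pid is stale.
echo $$ > /tmp/dj_demo.pid
pid_alive /tmp/dj_demo.pid && echo "worker alive"
echo 999999 > /tmp/dj_demo.pid
pid_alive /tmp/dj_demo.pid || echo "stale pidfile"
rm -f /tmp/dj_demo.pid
```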

To see what is going on, run monit in foreground verbose mode: sudo monit -Iv
This setup uses rvm installed under user "www1" and group "www1".
in file /etc/monit/monitrc:
#delayed_job
check process delayed_job with pidfile /home/www1/your_app/current/tmp/pids/delayed_job.pid
start program "/bin/bash -c 'PATH=$PATH:/home/www1/.rvm/bin;source /home/www1/.rvm/scripts/rvm;cd /home/www1/your_app/current;RAILS_ENV=production bundle exec script/delayed_job start'" as uid www1 and gid www1
stop program "/bin/bash -c 'PATH=$PATH:/home/www1/.rvm/bin;source /home/www1/.rvm/scripts/rvm;cd /home/www1/your_app/current;RAILS_ENV=production bundle exec script/delayed_job stop'" as uid www1 and gid www1
if totalmem is greater than 200 MB for 2 cycles then alert

Related

Run Capistrano Command on Instance Startup

I am running a Rails 3.2.18 application on AWS. This application is deployed using Capistrano, including starting Resque workers for the application.
My problem is that AWS can occasionally restart instances with little to no warning, or an instance can be restarted from the AWS console. When this happens, Resque is not started as it is during our normal deployment process.
I have tried to create a shell script in /etc/init.d to start Resque on boot, but this script continues to prompt for a password, and I'm not sure what I'm missing. The essence of the start script is:
/bin/su -l deploy-user -c "cd /www/apps/deploy-user/current && bundle exec cap <environment> resque:scheduler:start resque:start"
Obviously the above command works as expected when run as the "deploy" user from the bash prompt, but when run via sudo /etc/init.d/resque start, it prompts for a password upon running the first Capistrano command.
Is there something glaring that I am missing? Or perhaps is there a better way to accomplish this?
You should run su with the -c parameter to specify commands, and enclose the whole command string in double quotes:
/bin/su -l deploy-user -c "cd /www/apps/deploy-user/current && bundle exec cap <environment> resque:scheduler:start resque:start"
Of course, you have other alternatives, like /etc/rc.local.
But if you're going to use an init.d script, I'd suggest creating it properly (at least start/stop, default runlevels...). Otherwise I'd go with /etc/rc.local or even with a cron job for the deploy-user:
@reboot cd /www/apps/deploy-user/current && bundle exec cap <environment> resque:scheduler:start resque:start

launching background process in capistrano task

capistrano task
namespace :service do
desc "start daemontools (svscan/supervise/svscanboot)"
task :start, :roles => :app do
sudo "svscanboot&"
end
end
Now this doesn't work: the svscanboot process simply doesn't run.
This helped me find sleep: https://github.com/defunkt/resque/issues/284
Other sources pointed me to nohup, redirection, and pty => true, so I tried all of these:
run "nohup svscanboot >/tmp/svscanboot.log 2>&1 &" # NO
run "(svscanboot&) && sleep 1" # NO
run "(nohup svscanboot&) && sleep 1" # YES!
Now, could anyone explain to me why I need the sleep statement and what difference nohup makes?
For the record all the above run equally well if run from user shell, problem is only in the context of capistrano.
Thanks.
Try forking the process as explained here: Spawn a background process in Ruby
You should be able to do something like this:
job1 = fork do
run "svscanboot"
end
Process.detach(job1)
As well, checkout this: Starting background tasks with Capistrano
My simple solution would be to make an svscanboot.sh file on the remote server with whatever code you want to run. In your case:
svscanboot >/tmp/svscanboot.log 2>&1
In cap rake task add this
run "sh +x somefile.sh &"
This works well for me.
I think nohup is what keeps the process alive after the session closes; it doesn't put the process in the background by itself.
Did you try
run "nohup svscanboot >/tmp/svscanboot.log 2>&1"
(without the ending & to send it to the background).
That should work and remain running when your current capistrano session is closed.
Try this
run "nohup svscanboot >/tmp/svscanboot.log 2>&1 & sleep 5", pty: false
I'd like to share my solution which also works when executing multiple commands. I tried many other variants found online, including the "sleep N" hack.
run("nohup sh -c 'cd #{release_path} && bundle exec rake task_namespace:task_name RAILS_ENV=production > ~/shared/log/<rakelog>.log &' > /dev/null 2>&1", :pty => true)

Rails 3 delayed_job start after rebooting

I have my rails site deployed under apache. The apache is run as a service. Now I have added delayed_job there and things work fine.
Now I want to start the workers together with apache, e.g., after rebooting the server, my site and workers are up and ready so I don't have to log in and type "sudo RAILS_ENV=production script/delayed_job -n 2 start".
Another issue is that whenever I want to start the delayed_job I have to use "sudo"...
Any idea how to avoid those 2 issues?
Thanks for your help.
Use the whenever gem and its 'every :reboot' functionality. In schedule.rb:
environment = ENV['RAILS_ENV'] || 'production'
every :reboot do
command "cd #{path} && #{environment_variable}=#{environment} bin/delayed_job --pool=queue1:2 --pool=queue2,queue3:1 restart"
end
Could you just create a shell script to execute the commands you need?
#!/bin/sh
# stop delayed job
# restart apache
apachectl restart
# start delayed job
sudo RAILS_ENV=production script/delayed_job -n 2 start
It sounds like you want to have delayed_job automatically start after apache starts when you boot up the hardware. If that's the case you need to write an init script in /etc/init.d or /etc/rc.d/init.d (depending on your system). This page gives a decent primer on this:
http://www.philchen.com/2007/06/04/quick-and-dirty-how-to-write-and-init-script

Using God to monitor Unicorn - Start exited with non-zero code = 1

I am working on a God script to monitor my Unicorns. I started with GitHub's example script and have been modifying it to match my server configuration. Once God is running, commands such as god stop unicorn and god restart unicorn work just fine.
However, god start unicorn results in WARN: unicorn start command exited with non-zero code = 1. The weird part is that if I copy the start script directly from the config file, it starts right up like a brand new mustang.
This is my start command:
/usr/local/bin/unicorn_rails -c /home/my-linux-user/my-rails-app/config/unicorn.rb -E production -D
I have declared all paths as absolute in the config file. Any ideas what might be preventing this script from working?
I haven't used unicorn as an app server, but I've used god for monitoring before.
If I remember rightly, when you start god and give it your config file, it automatically starts whatever you've told it to watch. Unicorn is probably already running, which is why it's throwing the error.
Check this by running god status once you've started god. If that's not the case, you can check the command's exit status on the command line:
/usr/local/bin/unicorn_rails -c /home/my-linux-user/my-rails-app/config/unicorn.rb -E production -D;
echo $?;
that echo will print the exit status of the last command. If it's zero, the last command reported no errors. Try starting unicorn twice in a row, I expect the second time it'll return 1, because it's already running.
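The "second start exits non-zero" pattern is easy to see with any command that refuses to run twice. Here mkdir on a fixed path stands in for the daemon (my own simulation, not unicorn itself): it is atomic and fails if the directory already exists, just as a daemon fails when its pidfile names a live process.

```shell
#!/bin/sh
# Simulate starting a daemon twice. mkdir errors on the second call,
# so $? is 0 the first time and non-zero the second, like unicorn
# refusing to start while an instance is already running.
LOCK=/tmp/unicorn_demo_lock.$$
start_once() { mkdir "$LOCK" 2>/dev/null; }

start_once; echo "first start:  $?"    # 0 = started
start_once; echo "second start: $?"    # non-zero = already running
rmdir "$LOCK"
```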
EDIT:
including the actual solution from comments, as this seems to be a popular response:
You can set an explicit user and group if your process requires to be run as a specific user.
God.watch do |w|
w.uid = 'root'
w.gid = 'root'
# remainder of config
end
My problem was that I never bundled as root. Here is what I did:
sudo bash
cd RAILS_ROOT
bundle
You get a warning telling you to never do this:
Don't run Bundler as root. Bundler can ask for sudo if it is needed,
and installing your bundle as root will break this application for all
non-root users on this machine.
But it was the only way I could get resque or unicorn to run with god. This was on an ec2 instance if that helps anyone.
Adding the log option helped me greatly in debugging:
God.watch do |w|
w.log = "#{RAILS_ROOT}/log/god.log"
# remainder of config
end
In the end, my bug turned out to be that the start_script in God was executed in the development environment. I fixed this by prepending RAILS_ENV to the start script.
start_script = "RAILS_ENV=#{ENV['RACK_ENV']} bundle exec sidekiq -P #{pid_file} -C #{config_file} -L #{log_file} -d"

Starting background tasks with Capistrano

For my Ruby on Rails app, I have to start a background job at the end of a Capistrano deployment. For this, I tried the following in deploy.rb:
run "nohup #{current_path}/script/runner -e production 'Scheduler.start' &", :pty => true
Sometimes this works, but most of the time it does not start the process (it is not listed in ps aux). And there are no error messages, and there is no nohup.out, neither in the home directory nor in the rails app directory.
I tried using trap('SIGHUP', 'IGNORE') in scheduler.rb instead of nohup, but the result is the same.
The only way to get it to work is to remove the ":pty => true" and do a manual Ctrl-C at the end of "cap deploy". But I don't like this...
Is there any other way to invoke this Scheduler.start, or to get some more error messages?
I'm using Rails 2.3.2, Capistrano 2.5.8, Ubuntu Hardy on the Server
With :pty => true, user shell start-up scripts (e.g. bashrc, etc.) are (usually) not loaded. My ruby program exited right after launching because of the lack of dependent environment variables.
Without :pty => true, as you described in the question, capistrano hangs there waiting for the process to exit. You'll need to redirect both stdout and stderr to make it return immediately.
run 'nohup ruby -e "sleep 5" &' # hangs for 5 seconds
run 'nohup ruby -e "sleep 5" > /dev/null &' # hangs for 5 seconds
run 'nohup ruby -e "sleep 5" > /dev/null 2>&1 &' # returns immediately. good.
If your background task still doesn't run. Try redirecting stdout and stderr to a log file so that you can investigate the output.
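The difference between the three forms can be demonstrated locally. Like the ssh channel capistrano runs commands over, command substitution waits for EOF on the command's stdout; a backgrounded child that inherits stdout keeps that pipe open, so the caller blocks until the child exits. With stdout and stderr redirected, nothing holds the pipe and control returns at once (a sketch; timings are approximate):

```shell
#!/bin/sh
# Command substitution reads stdout until EOF, mimicking how the ssh
# session stays open while any process still holds its output stream.
t0=$(date +%s)
out=$( (sleep 2 &) )                     # child inherits stdout: waits ~2s
t1=$(date +%s)
out=$( (sleep 2 > /dev/null 2>&1 &) )    # redirected: returns immediately
t2=$(date +%s)
echo "inherited: $((t1 - t0))s, redirected: $((t2 - t1))s"
```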
I'd like to share my solution which also works when executing multiple commands. I tried many other variants found online, including the "sleep N" hack.
run("nohup sh -c 'cd #{release_path} && bundle exec rake task_namespace:task_name RAILS_ENV=production > ~/shared/log/<rakelog>.log &' > /dev/null 2>&1", :pty => true)
This is a duplicate of my answer to "launching background process in capistrano task", but I want to make sure others and I can google for this solution.
Do you want your Scheduler job to run continually in the background and get restarted when you run Capistrano?
If so, then for that I use runit http://smarden.sunsite.dk/runit/ and DelayedJob http://github.com/Shopify/delayed_job/tree/master
Install runit in the mode that does not replace init.
Add your background job as a runit service and add the log monitor for it from runit.
Have Capistrano call sudo sv kill job_name to kill and restart the job.
My background job is an instance of the Rails plugin DelayedJob, which handles background Rails tasks. I kill it on every Capistrano deploy so it will restart with the updated code base.
This has proved to be very reliable.
HTH,
Larry
If the task scheduler has a -d switch, this will work. For example, passenger standalone has a -d option to start it as a daemonized process.
namespace :passenger_standalone do
task :start do
run "cd #{current_path} && passenger start -e #{rails_env} -d"
end
task :stop do
run "cd #{current_path} && RAILS_ENV=#{rails_env} passenger stop"
end
task :restart do
stop
start
end
end
