Kinda new to Rails, so please cope with me. What i'm doing now is background processing some Ruby code use Resque. To get the Rescque rake task started, I've been using (on heroku), I have a resque.rake file with that recommended code to attach into heroku's magical(or strange) threading architecture:
require "resque/tasks"
require 'resque_scheduler/tasks'
task "resque:setup" => :environment do
ENV['QUEUE'] = '*'
end
desc "Alias for resque:work (To run workers on Heroku)"
task "jobs:work" => "resque:work"
Since I need access to the Rails code, I reference :environment. If I set at least 1 worker dyno in the background on heroku, my Resque does great, gets cleared, everything is happy. Until i try to automate stuff...
So I wanted to evolve the code and automatically fill the queue with relevant tasks every minute or so. Do that (without using cron, because heroku is not adequate with cron), I declare an initializer named task_scheduler.rb that uses Rufus scheduler to run tasks:
scheduler = Rufus::Scheduler.start_new
scheduler.in '5s' do
autoprocessor_method
end
scheduler.every '1m' do
autoprocessor_method
end
Things appear to work awesome for a while....then the rake process just stops picking up from the queue unexplainably. The queue just gets larger and larger. Even if i have multiple worker dynos running, they all eventually get tired and stop processing the queue. I'm not sure what I am doing wrong, but I suspect the referencing of the Rails environment in my rake task is causing the task_scheduler.rb code to run again, causing duplicate scheduling. I'm wondering how to solve that problem if someone knows, and I'm also curious if that is the reason for the rake task to stop working.
Thank you
You should not be booting the scheduler in an initializer, you should have a daemon process running the scheduler and filling up your queue. It would be something like this ("script/scheduler"):
#!/usr/bin/env ruby
root = File.expand_path(File.join(File.dirname(__FILE__), '..'))
Dir.chdir(root)
require 'rubygems'
gem 'daemons'
require 'daemons'
options = {
:dir_mode => :normal,
:dir => File.join(root, 'log'),
:log_output => true,
:backtrace => true,
:multiple => false
}
Daemons.run_proc("scheduler", options) do
Dir.chdir(root)
require(File.join(root, 'config', 'environment'))
scheduler = Rufus::Scheduler.start_new
scheduler.in '5s' do
autoprocessor_method
end
scheduler.every '1m' do
autoprocessor_method
end
end
And you can call this script as a usual daemon from your app:
script/scheduler start
This is going to make sure you have only one process sending work for the resque workers instead of one for each mongrel that you're running.
First of all, if you are not running on Heroku, i would not recommend this approach. I'd look at Mauricio's answer, or consider using a classic cron job or using Whenever to schedule the cron job.
But if you are in the pain of running on heroku and trying to do this, here is how i got this to work.
I kept the same original Resque.rake code in place, as i pasted in the original question. In addition, i created another rake task that i attached to the jobs:work rake process, just like the first case:
desc "Scheduler processor"
task :scheduler => :environment do
autoprocess_method
scheduler = Rufus::Scheduler.start_new
scheduler.every '1m' do
twitter_autoprocess
end
end
desc "Alias for resque:work (To run workers on Heroku)"
task "jobs:work" => "scheduler"
Couple of notes:
This will be imperfect once you use more than one worker dyno because the scheduler will run in more than one spot. you can solve that by saving state somewhere, but its not as clean as I would like.
I found the original reason why the process would hang. It was this line of code:
scheduler.in '5s' do
autoprocessor_method
end
I'm not sure why, but when I removed that, it never hung again.
Related
I'm seeking help concerning the whenever gem. Here my case:
I have the task I generated and that works when I run it through command line as such rake dashboard_data_t:collect.
namespace :dashboard_data_t do
desc "TODO"
task collect: :environment do
#task...
end
end
I then followed the documentation provided here in such a way that my config/schedule.rb looks like so:
# config/schedule.rb
every :day, at: '10:43 am' do
rake "dashboard_data_t:collect"
end
Happily done with that, I thought it would go on and run itself alone, without me needing to do anything more. But I noticed it didn't. I thought it might come from my task so I created an other one, this time way more simple than the 1st. Its purpose was solely to experiment and find what was going wrong:
namespace :test_name do
desc "TODO"
task test_task: :environment do
sh('echo', 'test task runned successfully')
end
end
I then added the following to my config/schedule.rb:
# config/schedule.rb
every 1.minutes do
rake "test_name:test_task"
end
Once again, the task didn't execute (periodically), but was still working manually.
I noticed by running the crontab -e command that RAILS_ENV was set to production, I understood why my dashboard_data_t:collect task wasn't working, because it relied on the development db. So I did the following:
# config/schedule.rb
set :environment, 'development'
Unfortunately, this didn't change anything as both tasks still don't execute. Now I'm stuck here with no ideas whatsoever. Can anyone help me.
Cheers.
I have a rails app that I want to use sidekiq on.
Before I figure out how to setup sidekiq on the server etc., is it possible for me to create a rake task that will run the sidekiq workers for any pending jobs?
This isn't a production solution I know, but I just want to make sure everything else is working on the server and for the time being me running a rake task on the server is fine as it is more for QA'ing at this point.
Log into the server, cd to the app's dir and run 'bundle exec sidekiq' manually.
Referencing from
SortedEntry and ScheduledSet, I managed to come up with the following:
# ScheduledSet returns "pending" jobs
scheduled_jobs = Sidekiq::ScheduledSet.new
# puts scheduled_jobs.class
# => Sidekiq::ScheduledSet
scheduled_jobs.each do |scheduled_job|
# puts scheduled_job.class
# => Sidekiq::SortedEntry
job_class = scheduled_job.klass.constantize
job_args = scheduled_job.args
# you can also filter out only jobs in certain queue, i.e.:
# if scheduled_job.queue == 'mailers' ......
# run the job inline
job_class.new.perform(*job_args)
end
Then just wrap this into a rake task.
Once upon a time, I had one app - Cashyy. It used sidekiq. I deployed it and used this upstart script to manage sidekiq (start/restart).
I decide to deploy another app to the same server. The app (let's call it Giimee) also uses sidekiq.
And here is the issue. Sometimes I need to restart sidekiq for Cashyy, but not for Giimee. Now, as I understand I will need to hack something using index thing (in upstart script and managing sidekiqs: sudo restart sidekiq index=1) (if I understood it correctly).
BUT!
I have zero desire to dabble with these indexes (nightmare to support? Like you need to know how many apps are using sidekiq and be sure to assign unique index to each sidekiq. And to know assigned index if you want to restart specific sidekiq).
So here is the question: how can I isolate each sidekiq (so I would not need to maintain the index) and still get the stability and usability of upstart (starting the process, restarting, etc.)?
Or maybe I don't understand something and the thing with index is state of art?
You create two services:
cp sidekiq.conf /etc/init/cashyy.conf
cp sidekiq.conf /etc/init/glimee.conf
Edit each as necessary. sudo start cashyy, sudo stop glimee, etc. Now you'll have two completely separate Sidekiq processes running.
As an alternative to an upstart script, you can use Capistrano and Capistrano-Sidekiq to manage those Sidekiqs.
We have Sidekiq running on 3 machines and have had a good experience with these two libraries/tools.
Note: we currently use an older version of Capistrano (2.15.5)
In our architecture, the three machines are customized slightly on deploy. This led us to break up our capistrano deploy scripts by machine so that we could customize some classes, manage Sidekiq, etc. Our capistrano files are structured something like this:
- config/
- deploy.rb
- deploy/
- gandalf.rb
- gollum.rb
- legolas.rb
With capistrano-sidekiq, we are able to control, well, Sidekiq :) at any time (during a deploy or otherwise). We set up the Sidekiq aspects of our deploy scripts in the following way:
# config/deploy.rb
# global sidekiq settings
set :sidekiq_default_hooks, false
set :sidekiq_cmd, "#{fetch(:bundle_cmd, 'bundle')} exec sidekiq"
set :sidekiqctl_cmd, "#{fetch(:bundle_cmd, 'bundle')} exec sidekiqctl"
set :sidekiq_role, :app
set :sidekiq_pid, "#{current_path}/tmp/pids/sidekiq.pid"
set :sidekiq_env, fetch(:rack_env, fetch(:rails_env, fetch(:default_stage)))
set :sidekiq_log, File.join(shared_path, 'log', 'sidekiq.log')
# config/deploy/gandalf.rb
# Custom Sidekiq settings
set :sidekiq_timeout, 30
set :sidekiq_processes, 1
namespace :sidekiq do
# .. code omitted from methods and tasks for brevity
def for_each_process(&block)
end
desc 'Quiet sidekiq (stop accepting new work)'
task :quiet, :roles => lambda { fetch(:sidekiq_role) }, :on_no_matching_servers => :continue do
end
desc 'Stop sidekiq'
task :stop, :roles => lambda { fetch(:sidekiq_role) }, :on_no_matching_servers => :continue do
end
desc 'Start sidekiq'
task :start, :roles => lambda { fetch(:sidekiq_role) }, :on_no_matching_servers => :continue do
end
desc 'Restart sidekiq'
task :restart, :roles => lambda { fetch(:sidekiq_role) }, :on_no_matching_servers => :continue do
end
end
When I need to restart one of my Sidekiq instances, I can just go to my terminal and execute the following:
$ bundle exec cap gandalf sidekiq:restart
$ bundle exec cap gollum sidekiq:stop
It's made Sidekiq management quite painless for our team and thought it would be worth sharing in the event something similar could help you out.
We have to use delayed_job (or some other background-job processor) to run jobs in the background, but we're not allowed to change the boot scripts/boot-levels on the server. This means that the daemon is not guaranteed to remain available if the provider restarts the server (since the daemon would have been started by a capistrano recipe that is only run once per deployment).
Currently, the best way I can think of to ensure the delayed_job daemon is always running, is to add an initializer to our Rails application that checks if the daemon is running. If it's not running, then the initializer starts the daemon, otherwise, it just leaves it be.
The question, therefore, is how do we detect that the Delayed_Job daemon is running from inside a script? (We should be able to start up a daemon fairly easily, bit I don't know how to detect if one is already active).
Anyone have any ideas?
Regards,
Bernie
Based on the answer below, this is what I came up with. Just put it in config/initializers and you're all set:
#config/initializers/delayed_job.rb
DELAYED_JOB_PID_PATH = "#{Rails.root}/tmp/pids/delayed_job.pid"
def start_delayed_job
Thread.new do
`ruby script/delayed_job start`
end
end
def process_is_dead?
begin
pid = File.read(DELAYED_JOB_PID_PATH).strip
Process.kill(0, pid.to_i)
false
rescue
true
end
end
if !File.exist?(DELAYED_JOB_PID_PATH) && process_is_dead?
start_delayed_job
end
Some more cleanup ideas: The "begin" is not needed. You should rescue "no such process" in order not to fire new processes when something else goes wrong. Rescue "no such file or directory" as well to simplify the condition.
DELAYED_JOB_PID_PATH = "#{Rails.root}/tmp/pids/delayed_job.pid"
def start_delayed_job
Thread.new do
`ruby script/delayed_job start`
end
end
def daemon_is_running?
pid = File.read(DELAYED_JOB_PID_PATH).strip
Process.kill(0, pid.to_i)
true
rescue Errno::ENOENT, Errno::ESRCH # file or process not found
false
end
start_delayed_job unless daemon_is_running?
Keep in mind that this code won't work if you start more than one worker. And check out the "-m" argument of script/delayed_job which spawns a monitor process along with the daemon(s).
Check for the existence of the daemons PID file (File.exist? ...). If it's there then assume it's running else start it up.
Thank you for the solution provided in the question (and the answer that inspired it :-) ), it works for me, even with multiple workers (Rails 3.2.9, Ruby 1.9.3p327).
It worries me that I might forget to restart delayed_job after making some changes to lib for example, causing me to debug for hours before realizing that.
I added the following to my script/rails file in order to allow the code provided in the question to execute every time we start rails but not every time a worker starts:
puts "cleaning up delayed job pid..."
dj_pid_path = File.expand_path('../../tmp/pids/delayed_job.pid', __FILE__)
begin
File.delete(dj_pid_path)
rescue Errno::ENOENT # file does not exist
end
puts "delayed_job ready."
A little drawback that I'm facing with this though is that it also gets called with rails generate for example. I did not spend much time looking for a solution for that but suggestions are welcome :-)
Note that if you're using unicorn, you might want to add the same code to config/unicorn.rb before the before_fork call.
-- EDITED:
After playing around a little more with the solutions above, I ended up doing the following:
I created a file script/start_delayed_job.rb with the content:
puts "cleaning up delayed job pid..."
dj_pid_path = File.expand_path('../../tmp/pids/delayed_job.pid', __FILE__)
def kill_delayed(path)
begin
pid = File.read(path).strip
Process.kill(0, pid.to_i)
false
rescue
true
end
end
kill_delayed(dj_pid_path)
begin
File.delete(dj_pid_path)
rescue Errno::ENOENT # file does not exist
end
# spawn delayed
env = ARGV[1]
puts "spawing delayed job in the same env: #{env}"
# edited, next line has been replaced with the following on in order to ensure delayed job is running in the same environment as the one that spawned it
#Process.spawn("ruby script/delayed_job start")
system({ "RAILS_ENV" => env}, "ruby script/delayed_job start")
puts "delayed_job ready."
Now I can require this file anywhere I want, including 'script/rails' and 'config/unicorn.rb' by doing:
# in top of script/rails
START_DELAYED_PATH = File.expand_path('../start_delayed_job', __FILE__)
require "#{START_DELAYED_PATH}"
# in config/unicorn.rb, before before_fork, different expand_path
START_DELAYED_PATH = File.expand_path('../../script/start_delayed_job', __FILE__)
require "#{START_DELAYED_PATH}"
not great, but works
disclaimer: I say not great because this causes a periodic restart, which for many will not be desirable. And simply trying to start can cause problems because the implementation of DJ can lock up the queue if duplicate instances are created.
You could schedule cron tasks that run periodically to start the job(s) in question. Since DJ treats start commands as no-ops when the job is already running, it just works. This approach also takes care of the case where DJ dies for some reason other than a host restart.
# crontab example
0 * * * * /bin/bash -l -c 'cd /var/your-app/releases/20151207224034 && RAILS_ENV=production bundle exec script/delayed_job --queue=default -i=1 restart'
If you are using a gem like whenever this is pretty straightforward.
every 1.hour do
script "delayed_job --queue=default -i=1 restart"
script "delayed_job --queue=lowpri -i=2 restart"
end
I'm using delayed_job to run jobs, with new jobs being added every minute by a cronjob.
Currently I have an issue where the rake jobs:work task, currently started with 'nohup rake jobs:work &' manually, is randomly exiting.
While God seems to be a solution to some people, the extra memory overhead is rather annoying and I'd prefer a simpler solution that can be restarted by the deployment script (Capistrano).
Is there some bash/Ruby magic to make this happen, or am I destined to run a monitoring service on my server with some horrid hacks to allow the unprivelaged account the site deploys to the ability to restart it?
For me the daemons gem was unreliable with delayed_job. Could be a poorly written script (was using the one on collectiveidea's delayed_job github page), and not daemons fault, I'm not really sure. But for whatever reason, it would restart inconsistently on deployments.
I read somewhere this was due to it not waiting for the process to actually exit, so the pid files would get overwritten or something. But I didn't really bother to investigate. I switched to the daemons-spawn gem using these instructions and it seems to be much more reliable now.
The delayed_job docs suggest that you use a monitoring service to manage the rake worker job(s). I use runit--works well.
(You can install it in the mode where it does not replace init.)
Added:
Re: restart by Capistrano: yes, runit enables that. Just do a
sudo sv kill delayed_job
in your Capistrano recipe to kill the delayed_job worker. Runit will then restart it with your newly deployed code base.
I have implemented small rake task that restarts the jobs task over and over again:
desc "Start a delayed_job worker in a endless loop to prevent exits."
task :jobs => :environment do
while true
begin
Delayed::Worker.new(:min_priority => ENV['MIN_PRIORITY'],
:max_priority => ENV['MAX_PRIORITY'],
:quiet => false).start
rescue Exception => e
puts "Exception occured (#{e})"
end
puts "Task jobs:work exited, clearing queue and restarting"
sleep 1
Delayed::Job.delete_all
end
end
Apparently it did not work. So I ended with this simple solution:
for (( ;; )); do rake jobs:work --trace; done
get rid of delayed job and use either whenever or resque