Multiple delayed_jobs roles with Capistrano? - ruby-on-rails

I have a question that I am not finding much useful information for. I'm wondering if this is possible and, if so, how to best implement it.
We are building an app in Rails which has heavy data-processing in the background via DelayedJob (…it is working well for us.)
The app runs in AWS and we have a few different environments configured in Capistrano.
When we have heavy processing loads, our DelayedJob queues can back up--which is mostly fine. I do have one or two queues that I'd like to have a separate node tend to. Since it would be ignoring the 'clogged' queues, it would keep tending its one or two queues and they would stay current. For example, some individual jobs can take over an hour and I wouldn't want a forgotten-password-email delivery to be held up for 90 minutes until the next worker completes a task and checks for a priority job.
What I want is to have a separate EC2 instance that has one worker launched that tends to two different, explicit queues.
I can do this manually on my dev machine by launching one or two workers with the '--queues' option.
Here is my question: how can I define a new role in Capistrano and tell that role's nodes to start a different number of workers and tend to specific queues? Again, my normal delayed_job role is set to 3 workers and runs all queues.
Is this possible? Is there a better way?
Presently on Rails 3.2.13 with PostgreSQL 9.2 and the delayed_job gem.

Try this code. Place it in deploy.rb after requiring the default delayed_job recipes.
# This overrides default delayed_job tasks to support args per role
# If you want to use command line options, for example to start multiple workers,
# define a Capistrano variable delayed_job_args_per_role:
#
# set :delayed_job_args_per_role, {:worker_heavy => "-n 4",:worker_light => "-n 1" }
#
# Target server roles are taken from delayed_job_args_per_role keys.
namespace :delayed_job do
  def args_per_host(host)
    roles.each do |role|
      find_servers(:roles => role).each do |server|
        return args[role] if server.host == host
      end
    end
  end

  def args
    fetch(:delayed_job_args_per_role, {:app => ""})
  end

  def roles
    args.keys
  end

  desc "Start the delayed_job process"
  task :start, :roles => lambda { roles } do
    find_servers_for_task(current_task).each do |server|
      run "cd #{current_path};#{rails_env} script/delayed_job start #{args_per_host server.host}", :hosts => server.host
    end
  end

  desc "Restart the delayed_job process"
  task :restart, :roles => lambda { roles } do
    find_servers_for_task(current_task).each do |server|
      run "cd #{current_path};#{rails_env} script/delayed_job restart #{args_per_host server.host}", :hosts => server.host
    end
  end
end
P.S. I've tested it only with a single role in the hash, but multiple roles should work fine too.
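To make the per-role lookup concrete, here is a minimal plain-Ruby sketch of the same idea outside Capistrano. The role names, hostnames, queue names, and the helpers args_for_host and start_command are all hypothetical and only stand in for Capistrano's find_servers; the '-n' and '--queues' flags are the ones script/delayed_job accepts.

```ruby
# Hypothetical roles, hosts, and queues, for illustration only.
ARGS_PER_ROLE = {
  :delayed_job          => "-n 3",
  :delayed_job_priority => "-n 1 --queues=mailers,notifications"
}

# Stand-in for Capistrano's role-to-server mapping.
SERVERS_BY_ROLE = {
  :delayed_job          => ["worker1.example.com", "worker2.example.com"],
  :delayed_job_priority => ["priority1.example.com"]
}

# Return the args of the first role that lists the host, or "" if none does.
def args_for_host(host, args_per_role = ARGS_PER_ROLE, servers_by_role = SERVERS_BY_ROLE)
  args_per_role.each do |role, args|
    return args if servers_by_role.fetch(role, []).include?(host)
  end
  ""
end

# Build the command the overridden task would run on a given host.
def start_command(host, current_path = "/var/www/app/current", rails_env = "RAILS_ENV=production")
  "cd #{current_path};#{rails_env} script/delayed_job start #{args_for_host(host)}".strip
end

puts start_command("worker1.example.com")
puts start_command("priority1.example.com")
```

In deploy.rb the equivalent setting would be a :delayed_job_args_per_role hash keyed by your own role names, with the priority role's servers declared under that role.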

In Capistrano 3, using the official capistrano3-delayed-job gem, you can do this without modifying the Capistrano methods:
# If you have several servers handling Delayed Jobs and you want to configure
# different pools per server, you can define delayed_job_pools_per_server:
#
# set :delayed_job_pools_per_server, {
#   'server11-prod' => {
#     'default,emails' => 3,
#     'loud_notifications' => 1,
#     'silent_notifications' => 1,
#   },
#   'server12-prod' => {
#     'default' => 2
#   }
# }
# Server names (server11-prod, server12-prod) in :delayed_job_pools_per_server
# must match the hostnames on Delayed Job servers. You can verify it by running
# `hostname` on your servers.
# If you use :delayed_job_pools_per_server, :delayed_job_pools will be ignored.
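For context, script/delayed_job itself accepts one '--pool=queues:count' argument per pool. The sketch below shows how a per-server hash like the one above could map onto those arguments. The helper names pool_args and pool_args_for are hypothetical, and this is not the gem's actual implementation, only an illustration of the shape of the data.

```ruby
# Same structure as the :delayed_job_pools_per_server example above.
POOLS_PER_SERVER = {
  'server11-prod' => {
    'default,emails' => 3,
    'loud_notifications' => 1,
    'silent_notifications' => 1
  },
  'server12-prod' => {
    'default' => 2
  }
}

# Turn one server's pool hash into delayed_job --pool arguments.
def pool_args(pools)
  pools.map { |queues, workers| "--pool=#{queues}:#{workers}" }.join(" ")
end

# Look up a server by hostname; unknown hosts get no pool arguments.
def pool_args_for(hostname, pools_per_server = POOLS_PER_SERVER)
  pool_args(pools_per_server.fetch(hostname, {}))
end

puts pool_args_for('server11-prod')
puts pool_args_for('server12-prod')
```

For the original question, this would mean giving the dedicated instance a single pool that names only the one or two priority queues.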

Related

Schedule a one-time Resque job on heroku on application start in Rails

I am using Resque and Resque Schedule to start a job that has to be run immediately on the application start. Other scheduled jobs are loaded every 30 seconds.
This is the code for my config/initializers/redis.rb
require 'rake'
require 'resque'
require 'resque/server'
require 'resque_scheduler/tasks'
# This will make the tabs show up.
require 'resque_scheduler'
require 'resque_scheduler/server'
uri = URI.parse(ENV["REDISTOGO_URL"])
REDIS = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password)
Resque.redis = REDIS
Dir["#{Rails.root}/app/workers/*.rb"].each { |file| require file }
Resque.enqueue(AllMessageRetriever)
Resque.schedule = YAML.load_file(Rails.root.join('config', 'schedule.yml'))
When the application starts up, AllMessageRetriever gets run 2-3 times rather than only once. Do the initializers get called more than once? This happens both on Heroku and in my local environment.
Is it possible to set a delayed job in Resque-Scheduler which will only get executed once, immediately at startup?
Here is the code for AllMessageRetriever. Basically, it loops over a table, calls an external API to get data, and then updates the table. This entire task happens 2-3 times if I add the enqueue call in the initializer file:
require 'socialcast'

module AllMessageRetriever
  @queue = :message_queue

  def self.perform()
    Watchedgroup.all.each do |group|
      puts "Running group #{group.name}"
      continueLoading=true
      page=1
      per_page=500
      while(continueLoading == true)
        User.first.refresh_token_if_expired
        token = User.first.token
        puts "ContinueLoading: #{continueLoading}"
        @test = Socialcast.get_all_messages(group.name,token,page,per_page)
        messagesArray = ActiveSupport::JSON.decode(@test)["messages"]
        puts "Message Count: #{messagesArray.count}"
        if messagesArray.count == 0
          puts 'count is zero now'
          continueLoading = false
        else
          messagesArray.each do |message|
            if not Message.exists?(message["id"])
              Message.create_with_socialcast(message, group.id)
            else
              Message.update_with_socialcast(message)
            end
          end
        end
        page += 1
      end
      Resqueaudit.create({:watchedgroup_id => group.id,:timecompleted => DateTime.now})
    end
    # Do anything here, like access models, etc
    puts "Doing my job"
  end
end
Rake
Firstly, why are you trying to queue on init?
You'd be much better off delegating to a rake task which is called from an initializer.
This will remove the dependency on the initialization process, which should clear things up a lot. I wouldn't put this in an initializer itself, as it is better handled elsewhere (modularity).
Problem
I think this line is causing the issue:
Resque.enqueue(AllMessageRetriever)
Without seeing the contents of AllMessageRetriever, I'd surmise that your AllMessageRetriever (module or class?) is returning its results two or three times, causing Resque to add that data set to the queue two or three times.
I could be wrong, but it would make sense, and it would mean your issue is not with Resque or the initializers but with your AllMessageRetriever class.
It would be a big help if you showed it!

Resque workers fail immediately: undefined method `write' for nil:NilClass

I am using Resque (and resque-scheduler) in my Rails app to run a recurring job. This was working fine for me, until today. I made some code changes, which I thought were unrelated, but now every worker fails before the perform method is even entered (checked with a debug statement). The same worker method works fine when I run it in the rails console. It only fails via resque on the development localhost (Postgres DB).
The error shown in the resque console for the failed worker is:
Exception
NoMethodError
Error
undefined method `write' for nil:NilClass
There is no additional stack trace for the error. Any idea why this is failing?
Additional info:
lib/tasks/resque.rake
# Resque tasks
require 'resque/tasks'
require 'resque_scheduler/tasks'

namespace :resque do
  task :setup do
    require 'resque'
    require 'resque_scheduler'
    require 'resque/scheduler'
    # you probably already have this somewhere
    Resque.redis = 'localhost:6379'
    # If you want to be able to dynamically change the schedule,
    # uncomment this line. A dynamic schedule can be updated via the
    # Resque::Scheduler.set_schedule (and remove_schedule) methods.
    # When dynamic is set to true, the scheduler process looks for
    # schedule changes and applies them on the fly.
    # Note: This feature is only available in >=2.0.0.
    #Resque::Scheduler.dynamic = true
    # The schedule doesn't need to be stored in a YAML, it just needs to
    # be a hash. YAML is usually the easiest.
    Resque.schedule = YAML.load_file("#{Rails.root}/config/resque_schedule.yml")
    # If your schedule already has +queue+ set for each job, you don't
    # need to require your jobs. This can be an advantage since it's
    # less code that resque-scheduler needs to know about. But in a small
    # project, it's usually easier to just include you job classes here.
    # So, something like this:
    # require 'jobs'
  end
end

task "resque:setup" => :environment do
  #ENV['QUEUE'] = '*'
  Resque.before_fork = Proc.new { ActiveRecord::Base.establish_connection }
end
config/resque.yml
development: localhost:6379
test: localhost:6379:1
staging: redis1.se.github.com:6379
fi: localhost:6379
production: redis1.ae.github.com:6379
initializers/resque.rb
rails_root = Rails.root || File.dirname(__FILE__) + '/../..'
rails_env = Rails.env || 'development'
resque_config = YAML.load_file(rails_root.to_s + '/config/resque.yml')
Resque.redis = resque_config[rails_env]
# This will make the tabs show up.
require 'resque_scheduler'
require 'resque_scheduler/server'
config/resque_schedule.yml
populate_game_data:
  # you can use rufus-scheduler "every" syntax in place of cron if you prefer
  every: 1m
  # By default the job name (hash key) will be taken as worker class name.
  # If you want to have a different job name and class name, provide the 'class' option
  class: PopulateDataWorker
  queue: high
  args:
  description: "This job populates the game and data"
I should note that the above files were not changed between the working and non-working states.
We had the same issue this morning, and we pinned it down to a gem update by New Relic.
Version 3.5.6.46 of newrelic_rpm was yanked on rubygems, but it was somehow installed by bundle update.
They are still on the beta track for 3.5.6 and had some issues with Resque. See https://github.com/newrelic/rpm/commit/e81889c2bce97574ec682dafee12015e13ccb2e1
The fix was to add '~> 3.5.5.38' in our Gemfile for newrelic_rpm
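To see why that constraint keeps the yanked release out, here is a small check using RubyGems' own version classes (Gem::Requirement and Gem::Version). The helper name allowed? is hypothetical; the version strings are the ones mentioned above, plus one made-up patch version.

```ruby
require 'rubygems'

# In the Gemfile this would read:
#   gem 'newrelic_rpm', '~> 3.5.5.38'
NEWRELIC_REQUIREMENT = Gem::Requirement.new('~> 3.5.5.38')

# True if the given version string satisfies the pinned requirement.
def allowed?(version, requirement = NEWRELIC_REQUIREMENT)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?('3.5.5.38')  # the pinned version itself
puts allowed?('3.5.5.40')  # hypothetical later 3.5.5.x release
puts allowed?('3.5.6.46')  # the yanked release from the answer
```

With four version segments, '~>' allows only the last segment to grow, so anything in the 3.5.6 series falls outside the range.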

How to have long running threads in heroku to listen to multiple aws sqs queues inside rails?

I have two tasks named "sqs:listen_converted" and "sqs:listen_failed".
The first one listens for converted files and creates a record for each one in Postgres.
The second sets a flag on the file record saying that the conversion failed.
These tasks have to listen to an SQS queue forever and read the messages published to it.
In simplified form, their code is:
task "sqs:listen_converted" => :environment do
  queue = AWS::SQS::Queue.new(SQSADDR['converted'])
  queue.poll do |msg|
    begin
      ...
    end
  end
end

task "sqs:listen_failed" => :environment do
  queue = AWS::SQS::Queue.new(SQSADDR['failed'])
  queue.poll do |msg|
    begin
      ...
    end
  end
end
On Heroku I could have these two processes running by executing the command:
heroku ps:scale sqs_converted=1 sqs_failed=1
Unfortunately, doing it this way I would have to pay $36 for each one.
How can I have both running in the background on a single Heroku dyno?
Maybe by running detached rake processes via the heroku-api gem?
heroku = Heroku::API.new(:api_key => 'api_key')
heroku.post_ps('myapp', "rake sqs:listen_converted", { :attach => false })
heroku.post_ps('myapp', "rake sqs:listen_failed", { :attach => false })
Thanks!
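One possible shape for the single-dyno idea is a single process that runs each listener in its own thread. This is only a sketch under stated assumptions: FakeQueue is a hypothetical stand-in for AWS::SQS::Queue whose poll returns once its messages run out (the real one blocks forever), and listen_all and run_demo are made-up helper names.

```ruby
require 'thread'

# Hypothetical stand-in for AWS::SQS::Queue: yields each message, then returns
# so that this sketch terminates.
class FakeQueue
  def initialize(messages)
    @messages = messages
  end

  def poll
    @messages.each { |msg| yield msg }
  end
end

# Start one listener thread per queue inside the current process and wait for them.
def listen_all(queues, handlers)
  threads = queues.map do |name, queue|
    Thread.new do
      queue.poll do |msg|
        begin
          handlers.fetch(name).call(msg)
        rescue StandardError => e
          warn "#{name}: #{e.message}" # keep the listener alive on a bad message
        end
      end
    end
  end
  threads.each(&:join)
end

# Wire two fake queues to two handlers and collect what each handler saw.
def run_demo
  results = Queue.new # thread-safe collector
  queues = {
    'converted' => FakeQueue.new(%w[a.mp4 b.mp4]),
    'failed'    => FakeQueue.new(%w[c.mp4])
  }
  handlers = {
    'converted' => lambda { |msg| results << [:converted, msg] },
    'failed'    => lambda { |msg| results << [:failed, msg] }
  }
  listen_all(queues, handlers)
  Array.new(results.size) { results.pop }.sort
end

p run_demo
```

In the real tasks, the two poll loops would move into one rake task that starts both threads, so only one process type needs to be scaled. Each thread would do its own database work, so connection handling per thread is something to check.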

Resque worker not recognizing the Rails Mongoid model

I am using Resque in my application for delayed jobs, so that I can send emails and SMS messages to a large number of users asynchronously. The data is stored in MongoDB, with Mongoid as the ODM connecting Rails and Mongo.
My mongoid model looks like this
class Item
  include Mongoid::Document
  include Geo::LocationHelper

  field :name, :type => String
  field :desc, :type => String

  #resque queue name
  @queue = :item_notification

  #resque perform method
  def self.perform(item_id)
    @item = Item.find(item_id)
  end
end
I am able to add jobs to Resque; I have verified this using resque-web. Whenever I start a Resque worker with
QUEUE=item_notification rake resque:work
I get the error uninitialized constant Item. Since I am using Resque as a Rails gem and starting rake in the Rails root, I believed my Mongoid models would be loaded.
After a lot of digging, I found that we can explicitly ask rake to load the environment with
QUEUE=item_notification rake environment resque:work
but I still get the same error, uninitialized constant Item.
Can someone help me out?
and my
Actually, it's a problem in the dev environment. After adding this to the resque.rake task file,
# load the Rails app all the time
namespace :resque do
  puts "Loading Rails environment for Resque"
  task :setup => :environment
  ActiveRecord::Base.send(:descendants).each { |klass| klass.columns }
end
it works fine.
The code is taken from the Resque wiki on GitHub.

Primary servers in capistrano

I have a task in Capistrano in which I want a single line to run only if the server is marked as primary. Is there a variable or method that I can reference inside a task? 'primary?' or 'primary' doesn't seem to work.
I've also tried something akin to the following:
after "deploy", "task1"
after "deploy", "task2"
after "deploy", "task3"
task :task1, :roles => :app do
  *code*
end
task :task2, :roles => :app, :only => {:primary => true} do
  *code for just primary server*
end
task :task3, :roles => :app do
  *more code*
end
But even this doesn't seem to work (all three tasks get run on every server).
I've been working on this on and off for a few days and I'm having no luck with my searches. Thoughts?
I've solved the issue, but it wasn't pretty. The thing that I've found is that you need to use the 'primary => true' on a per-task basis. Looking at the code, it appears that capistrano generates a list of the servers that the task will be run on before the task is run.
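As an illustration of that per-task server list, here is a plain-Ruby simulation of filtering servers by role and by an :only condition. It is not Capistrano's code; the Server struct, the hostnames, and the servers_for helper are hypothetical. In Capistrano 2 the matching server would be declared with something like role :app, "app1.example.com", :primary => true.

```ruby
# Hypothetical server records: host, roles, and per-server options.
Server = Struct.new(:host, :roles, :options)

SERVERS = [
  Server.new("app1.example.com", [:app], { :primary => true }),
  Server.new("app2.example.com", [:app], {})
]

# Select servers that have one of the roles and match every :only condition.
def servers_for(servers, roles, only = {})
  servers.select do |server|
    (server.roles & Array(roles)).any? &&
      only.all? { |key, value| server.options[key] == value }
  end
end

# A task declared with :roles => :app would target both servers.
p servers_for(SERVERS, :app).map(&:host)
# Adding :only => {:primary => true} narrows the list to the marked server.
p servers_for(SERVERS, :app, :primary => true).map(&:host)
```

Because the list is computed once per task, a condition inside the task body cannot narrow it afterwards; the primary-only line has to live in its own task with its own :only option.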

Resources