Sidekiq worker best practices for recurring jobs - ruby-on-rails

I'm trying to determine the best/most efficient way to schedule background jobs with Sidekiq. I need these jobs to run periodically (e.g. every 15 minutes, every day, etc.) for each user.
The jobs are tied to several objects such as calendars, postings, blogs, etc. Each user can have 0-many of those objects.
I have considered two options:
1) Have a scheduler job that in turn schedules a background worker for each one of the objects above. The worker would look something like this:
class WorkerScheduler
  include Sidekiq::Worker

  def perform
    CalendarWorker.perform_async
    BlogWorker.perform_async
    # etc...
  end
end
and inside each worker, I would go through the process for each of the available records (which may require threading, as discussed here: How get best performance rails requests parallel sidekiq worker):
class CalendarWorker
  include Sidekiq::Worker

  def perform
    calendars = Calendar.all
    calendars.each do |calendar|
      # actions for each calendar
    end
    # reschedule worker
    CalendarWorker.perform_in(15.minutes)
  end
end
2) Every time a new record is created for calendars, postings, blogs, etc., schedule a background worker, and within that worker, reschedule it to perform again later as desired. E.g.:
class CalendarWorker
  include Sidekiq::Worker

  def perform(i)
    # ... complete all logic for Calendar.find(i) ...
    CalendarWorker.perform_in(15.minutes, i)
  end
end
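The kickoff for option 2 would be something like an after_create callback that enqueues the first run (a sketch; the callback name is made up):

class Calendar < ActiveRecord::Base
  # start the recurring worker as soon as the record exists
  after_create :schedule_worker

  def schedule_worker
    CalendarWorker.perform_async(id)
  end
end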
Is either of the above better than the other? I'm looking to make sure this is done in the most efficient way, and also that my worker dyno (Heroku) does not get overloaded. Right now, I've been scheduling a job per record, and memory seems to be maxing out the dyno at just under 500 MB with little to no traffic. Does the number of scheduled jobs have a big impact on memory usage?
Are there other potential ways to do this both for the job running itself and the scheduling?

Related

ruby on rails background application to run jobs automatically at a time dynamically defined by users?

I have a use case where a user schedules a 'command' from the web interface. The user also specifies the date and time the command needs to be triggered.
This is the sequence of steps:
1. The user schedules a command 'Restart Device' for May 31, 3pm.
2. This is saved in a database table called Command.
3. A background job needs to be triggered at the specified time to do something (make an API call, send an email, etc.).
4. Once the job is executed, it is removed or marked done, until a new command is issued.
There could be multiple users concurrently performing the above sequence of steps.
Is delayed_job a good choice for the above? I couldn't find an example of how to implement this using delayed_job.
EDIT: the reason I was looking at delayed_job is that eventually I would need to leverage an existing relational database.
I would advise using Sidekiq. With it you can use scheduled jobs to tell Sidekiq when to perform them.
Example:
MyWorker.perform_at(3.hours.from_now, 'mike', 1)
EDIT: worker example
# app/workers/restart_device_worker.rb
class RestartDeviceWorker
  include Sidekiq::Worker

  def perform(params)
    # Do the job
    # ...
    # update in DB
  end
end
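A hedged sketch of wiring this to the Command table from the question (the run_at column and the id argument are assumptions):

# assumes Command stores the user-chosen trigger time in a `run_at` column
class Command < ActiveRecord::Base
  after_create :schedule_worker

  def schedule_worker
    RestartDeviceWorker.perform_at(run_at, id)
  end
end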
see doc: https://blog.codeship.com/how-to-use-rails-active-job/
https://guides.rubyonrails.org/active_job_basics.html
If you are using Rails 5 then your best option is ActiveJob (a built-in feature).
Use ActiveJob
"Active Job – Make work happen later. Active Job is a framework for declaring jobs and making them run on a variety of queuing backends. These jobs can be everything from regularly scheduled clean-ups, to billing charges, to mailings. Anything that can be chopped up into small units of work and run in parallel, really."
Active Job has built-in adapters for multiple queuing backends (Sidekiq, Resque, Delayed Job and others). You just need to tell it which one to use.
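For example, pointing Active Job at Sidekiq is a one-line setting (a minimal sketch):

# config/application.rb
config.active_job.queue_adapter = :sidekiq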
Scenario: I want to delete a story after 24 hours (1 day). We create a job named "StoriesCleanupJob" and call it when the story is created, like below:
StoriesCleanupJob.set(wait: 1.day).perform_later(story)
It will run the job after 1 day.
class StoriesCleanupJob < ApplicationJob
  queue_as :default

  def perform(story)
    if story.destroy
      # put your own conditions here, like updating the status, whatever you want to perform
    end
  end
end

What Rails gem should I use to make recurring jobs with Resque?

I have a "Play" button in my app that checks a stock value from an API and creates a Position object that holds that value. This action creates a background job with Resque and Redis in the following way:
Controller - stock_controller.rb:
def start_tracking
  @stock = Stock.find(params[:id])
  Resque.enqueue(StockChecker, @stock.id)
  redirect_to :back
end
Worker:
class StockChecker
  @queue = :stock_checker_queue

  def self.perform(stock_id)
    stock = Stock.find_by(id: stock_id)
    stock.start_tracking_position
  end
end
Model - stock.rb:
def start_tracking_position
  # A Position instance that holds the stock value is created
end
I now want this to happen every 15 minutes for every Stock object. I looked at the scheduling section on the Ruby Toolbox website and am having a hard time deciding what fits my needs and how to start implementing it.
My concern is that my app will create tons of Position objects, so I need something that is simple, uses Resque, and can withstand this kind of object creation without overloading the app.
What gem should I use, and what is the simplest way to make my Resque job happen every 15 minutes when the start_tracking action happens on a Stock object?
I've found resque-scheduler to be useful: https://github.com/resque/resque-scheduler.
Configure the schedule.yml for a 15-minute interval, as sketched below.
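A hedged sketch of what that schedule entry might look like if loaded from an initializer instead of YAML (the entry name is an assumption, and StockChecker.perform would need to iterate all tracked stocks rather than take a single stock_id):

# config/initializers/resque_scheduler.rb
require 'resque-scheduler'

Resque.schedule = {
  'stock_checker' => {
    'every'       => '15m',
    'class'       => 'StockChecker',
    'queue'       => 'stock_checker_queue',
    'description' => 'Creates a Position for each tracked Stock'
  }
}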
The biggest issue I found was making sure it keeps running after releases etc. In the end I set up God to shut it down and restart it.
In terms of load, I'm not sure I follow. The scheduler just triggers events; the load is determined by the number of workers you have and how you decide to implement the creation. You can set the priority of the queues, and the workers per queue, but if you don't process jobs in a timely way you get a backlog; is that acceptable? Normally you would run the workers on a separate server, minimising the impact on the front end.

Rails and sucker_punch: Debounce x seconds before executing job to control rate of execution

In my Rails 3.2 project, I am using SuckerPunch to run an expensive background task when a model is created/updated.
Users can perform different types of interactions on this model. Most of the time these updates are pretty well spaced out; however, for some other actions like re-ordering, bulk updates, etc., those POST requests can come in very frequently, and that's when they overwhelm the server.
My question is: what would be the most elegant/smart strategy to start the background job when the first update happens, but wait, say, 10 seconds to make sure no more updates are coming in to that Model (the table, not a row), and then execute the job? So, effectively, throttling without queuing.
My sucker_punch worker looks something like this:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    # perform some expensive job
  end
end
It gets called from the Marker and Map models, and sometimes from controllers (for update_all cases), like so:
after_save :generate_static_map_html

def generate_static_map_html
  StaticMapWorker.new.async.perform(self.map, self.map.markers)
end
So, a pretty standard setup for running a background job. How do I make the job wait, or not get scheduled, until there have been no updates to my Model (or table) for x seconds?
If it helps, Map has_many Markers, so logic that triggers the job when any of a map's marker associations are updated would be fine too.
What you are looking for is delayed jobs, implemented through ActiveJob's perform_later. According to the edge guides, that isn't implemented in sucker_punch.
ActiveJob::QueueAdapters comparison
Fret not, however, because you can implement it yourself pretty simply. When your worker picks the job up from the queue, first do some math on the record's modified_at timestamp, comparing it to 10 seconds ago. If the model has been modified recently, simply add the job back to the queue and abort gracefully.
code!
As per the example about 2/5 of the way down the sucker_punch GitHub page, which explains how to add a job from within a worker:
class StaticMapWorker
  include SuckerPunch::Job
  workers 10

  def perform(map, markers)
    if Map.where(modified_at: 10.seconds.ago..Time.now).count > 0
      # still being updated; put the job back on the queue and abort
      StaticMapWorker.new.async.perform(map, markers)
    else
      # perform some expensive job
    end
  end
end
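One caveat with the above: re-enqueueing immediately spins in a tight loop while updates are still arriving. Under sucker_punch 1.x (Celluloid-based), the README shows a pattern for delayed execution that you could adapt so the re-check waits before running again (a sketch; the 10-second delay is an assumption):

class StaticMapWorker
  include SuckerPunch::Job

  # v1 README pattern: run perform after `sec` seconds via Celluloid's `after`
  def later(sec, map, markers)
    after(sec) { perform(map, markers) }
  end
end

The re-enqueue line then becomes StaticMapWorker.new.async.later(10, map, markers).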

How can I get notified when Sidekiq Enqueues jobs or has stopped processing jobs?

I am on Heroku, and I got an error because my Redis DB got too full. Then my Sidekiq processes stopped working. It was like that for a day until I realized it. Now I have 600+ jobs that I have tried to process, but they are just breaking everything now. How can I sound the alarms when Sidekiq can't process jobs or when the queue starts to fill up?
You could set a rake task on a schedule to check Sidekiq stats, and then take the appropriate action (like sending an email).
I've created my own module with helper methods for Sidekiq that serves a number of purposes, e.g. deleting jobs, checking queues, retrieving jobs by certain criteria, etc.: https://gist.github.com/blotto/10324119
For your purpose, grab the Sidekiq stats as such:
def sidekiq_stats
  stats = Sidekiq::Stats.new
  { processed: stats.processed,
    failed: stats.failed,
    enqueued: stats.enqueued,
    queues: stats.queues }
end
And then evaluate the enqueued value, set a tolerance on what you think is too high, and then let loose the hounds.
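A hedged sketch of wiring that into the scheduled rake task suggested above (the 500-job tolerance and AdminMailer are assumptions):

# lib/tasks/sidekiq_monitor.rake
namespace :sidekiq do
  desc 'Alert when the Sidekiq backlog grows too large'
  task check_backlog: :environment do
    enqueued = Sidekiq::Stats.new.enqueued
    # hypothetical mailer and tolerance; adjust to taste
    AdminMailer.sidekiq_backlog_alert(enqueued).deliver if enqueued > 500
  end
end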
If you're using Zabbix for monitoring, you could use the sidekiq_queue_zabbix template at https://github.com/hungntit/sidekiq_queue_zabbix. It supports showing graphs and sending an alert when the Sidekiq queue size rises above a specified limit.

Rails 3.1/rake - date-specific tasks without queues

I want to give my users the option to be sent a daily summary of their account statistics at a specific (user-given) time.
Let's say we have the following model:
class DailySummery < ActiveRecord::Base
  # attributes:
  # send_at
  # => 10:00 (hour)
  # last_sent_at
  # => Time of the last sent summary
end
Is there a best practice for how to send these account summaries via email at the specified time?
At the moment I have an infinite rake task running which permanently checks whether emails are available for sending, and I would like to put the daily-summary generation and sending into this rake task.
I had the thought that I could solve this with the following pseudo-code:
while true
  User.all.each do |u|
    u.generate_and_deliver_dailysummery if u.last_sent_at < Time.now - 24.hours
  end
  sleep 60
end
But I'm not sure if this has some hidden caveats...
Notice: I don't want to use queues like Resque or Redis or anything like that!
EDIT: Added sleep (have it already in my script)
EDIT: It's a time-critical service (notification of trade rates), so it should be as fast as possible. That's the background to why I don't want to use a queue- or job-based system. And I use Monit to manage this rake task, which works really well.
There are really only two main ways you can do delayed execution. You run the script when a user on your site hits a page, which is inefficient and not entirely accurate. Or you use some sort of background process, whether it's a cron job or resque/delayed_job/etc.
While your method of having a rake process run forever will work fine, it's inefficient because you're iterating over users 24/7, starting over as soon as you finish. Something like:
while true
  User.where("last_sent_at <= ? OR last_sent_at IS NULL", 24.hours.ago).each do |u|
    u.generate_and_deliver_dailysummery
  end
  sleep 3600
end
This would run once an hour and only pull users that need an email sent, which is a bit more efficient. The best practice, though, would be to use a cron job that runs your rake task.
Running a task periodically is what cron is for. The whenever gem (https://github.com/javan/whenever) makes it simple to configure cron definitions for your app.
As your app scales, you may find that the rake task takes too long to run and that the queue is useful on top of cron scheduling. You can use cron to control when deliveries are scheduled but have them actually executed by a worker pool.
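For instance, a minimal whenever schedule for the hourly approach might look like this (the rake task name is an assumption):

# config/schedule.rb
every :hour do
  rake "summaries:deliver"
end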
I see two possibilities for doing a task at a specific time.
Background process / worker / ...
It's what you have already done. I refactored your example, because there were two problems:
Check conditions directly in your database query; it's more efficient than loading potentially useless data.
Load users in batches. Imagine your database contains millions of users... I'm pretty sure you would be happy, but not Rails... not at all. :)
Besides your code, I see another problem: how are you going to manage this background job on your production server? If you don't want to use Resque or something else, you should consider managing it another way. There are Monit and God, which are both process monitors.
while true
  # Check the condition directly in the database
  users = User.where(['last_sent_at < ? OR last_sent_at IS NULL', 24.hours.ago])
  # Load in batches of 1000
  users.find_each(:batch_size => 1000) do |u|
    u.generate_and_deliver_dailysummery
  end
  sleep 60
end
Cron jobs / Scheduled tasks / ...
The second possibility is to schedule your task recurrently, for instance every hour or half-hour. Correct me if I'm wrong, but do your users really need to schedule the delivery at 10:39am? I think letting them choose the hour is enough.
Given that, a job fired every hour is better than an infinite task querying your database every single minute. Moreover, it's really easy to do, because you don't need to set anything up.
There is a good gem for managing cron tasks with Ruby syntax. More info here: Whenever
You can do that; you'll need to also check for the time you want to send at. So, starting with your pseudo-code and adding to it:
while true
  User.all.each do |u|
    if u.last_sent_at < Time.now - 24.hours && Time.now.hour >= u.send_at
      u.generate_and_deliver_dailysummery
      # the next 2 lines are only needed if generate_and_deliver_dailysummery
      # doesn't set last_sent_at already
      u.last_sent_at = Time.now
      u.save
    end
  end
  sleep 900
end
I've also added the sleep so you don't needlessly hammer your database. You might also want to look into limiting that loop to just the set of users you need to send to; a query similar to what Zachary suggests would be much more efficient than what you have.
If you don't want to use a queue, consider delayed_job (sort of a poor man's queue). It runs as a rake task, similar to what you are doing:
https://github.com/collectiveidea/delayed_job
http://railscasts.com/episodes/171-delayed-job
It stores all tasks in a jobs table. Usually, when you add a task, it is queued to run as soon as possible; however, you can override this to delay it until a specific time.
You could convert your DailySummary class to a DailySummaryJob, and once complete it could re-queue a new instance of itself for the next day's run, as sketched below.
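A hedged sketch of that self-requeueing pattern using delayed_job's custom-job API (the job body and the one-day interval are assumptions):

# plain-Ruby delayed_job payload; kick off the cycle with:
#   Delayed::Job.enqueue DailySummaryJob.new(user.id), run_at: first_send_time
class DailySummaryJob < Struct.new(:user_id)
  def perform
    User.find(user_id).generate_and_deliver_dailysummery
  end

  # delayed_job hook, called after a successful run: re-queue for tomorrow
  def success(job)
    Delayed::Job.enqueue DailySummaryJob.new(user_id), run_at: 1.day.from_now
  end
end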
How did you update the last_sent_at attribute?
If you use
last_sent_at += 24.hours
and initialize it with
last_sent_at = Time.now.at_beginning_of_day + send_at
it will all be OK.
Don't use last_sent_at = Time.now: there may be some delay before the job actually runs, and that would make the last_sent_at attribute drift more and more.
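In code, the update inside the loop might look like this sketch (assuming send_at stores the hour as an integer):

# anchor the schedule to the intended send time, not the actual run time
u.last_sent_at ||= Time.now.at_beginning_of_day + u.send_at.hours
u.last_sent_at += 24.hours
u.save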
