execute only one of many duplicate jobs with sidekiq? - ruby-on-rails

I have a background job that does a map/reduce job on MongoDB. When the user sends in more data to the document, it kicks of the background job that runs on the document. If the user sends in multiple requests, it will kick off multiple background jobs for the same document, but only one really needs to run. Is there a way I can prevent multiple duplicate instances? I was thinking of creating a queue for each document and making sure it is empty before I submit a new job. Or perhaps I can set a job id somehow that is the same as my document id, and check that none exists before submitting it?
Also, I just found a sidekiq-unique-jobs gem. But the documentation is non-existent. Does this do what I want?

My initial suggestion would be a mutex for this specific job. But as there's a chance that you may have multiple application servers working the sidekiq jobs, I would suggest something at the redis level.
For instance, use redis-semaphore within your sidekiq worker definition. An untested example:
def perform
s = Redis::Semaphore.new(:map_reduce_semaphore, connection: "localhost")
# verify that this sidekiq worker is the first to reach this semaphore.
unless s.locked?
# auto-unlocks in 90 seconds. set to what is reasonable for your worker.
s.lock(90)
your_map_reduce()
s.unlock
end
end
def your_map_reduce
# ...
end

https://github.com/krasnoukhov/sidekiq-middleware
UniqueJobs
Provides uniqueness for jobs.
Usage
Example worker:
class UniqueWorker
include Sidekiq::Worker
sidekiq_options({
# Should be set to true (enables uniqueness for async jobs)
# or :all (enables uniqueness for both async and scheduled jobs)
unique: :all,
# Unique expiration (optional, default is 30 minutes)
# For scheduled jobs calculates automatically based on schedule time and expiration period
expiration: 24 * 60 * 60
})
def perform
# Your code goes here
end
end

There also is https://github.com/mhenrixon/sidekiq-unique-jobs (SidekiqUniqueJobs).

You can do this, assuming you have all the jobs are getting added to Enqueued bucket.
class SidekiqUniqChecker
def self.perform_unique_async(action, model_name, id)
key = "#{action}:#{model_name}:#{id}"
queue = Sidekiq::Queue.new('elasticsearch')
queue.each { |q| return if q.args.join(':') == key }
Indexer.perform_async(action, model_name, id)
end
end
The above code is just a sample, but you may tweak it to your needs.
Source

Related

ruby on rails background application to run jobs automaically at a time dynamically defined by users?

I have a use case where user schedules a 'command' from the web interface. The user also specifies the date and time the command needs to be triggred.
This is sequence of steps:
1.User schedules a command 'Restart Device' at May 31, 3pm.
2.This is saved in a database table called Command.
3.Now there needs to be a background job that needs to be triggered at this specified time to do something (make an api call, send email etc.)
4.Once job is executed, It is removed or marked done, until a new command is issued.
There could be multpile users concurrently performing the above sequence of steps.
Is delayed_job a good choice for above? I couldnt find an example as how to implement above using delayed job.
EDIT: the reason I was looking at delayed_job is because eventually I would need to leverage existing relational database
I would advise to use Sidekiq. With it you can use scheduled jobs to tell sidekiq when to perform the jobs.
Example :
MyWorker.perform_at(3.hours.from_now, 'mike', 1)
EDIT : worker example
#app/workers/restart_device_worker.rb
class RestartDeviceWorker
include Sidekiq::Worker
def perform(params)
# Do the job
# ...
# update in DB
end
end
see doc: https://blog.codeship.com/how-to-use-rails-active-job/
https://guides.rubyonrails.org/active_job_basics.html
If you are using Rails 5 then you have best option of ActiveJob(inbuilt feature)
Use ActiveJob
"Active Job – Make work happen later. Active Job is a framework for declaring jobs and making them run on a variety of queuing backends. These jobs can be everything from regularly scheduled clean-ups, to billing charges, to mailings. Anything that can be chopped up into small units of work and run in parallel, really."
Active Job has built-in adapters for multiple queuing backends (Sidekiq, Resque, Delayed Job and others). You just need to tell them.
Scenario: I want to delete my story after 24 hours(1 day). Then we do create a job named "StoriesCleanupJob". Call this job at the time of the creation of story like below
StoriesCleanupJob.set(wait: 1.day).perform_later(story)
It will call the Job after 1 day.
class StoriesCleanupJob < ApplicationJob
queue_as :default
def perform(story)
if story.destroy
#put your own conditions like update the status and all, whatever you want to perform.
end
end
end

Limit the number of background workers per user in Rails

I have a Rails application where each user has a specific number of background workers.
Since users pay more to increase the number of workers available, I want to be able to add these workers dynamically.
I would like to use ActiveJob in combination with Sidekiq and I thought about the following solution:
when the user registers, I create a new queue in sidekiq with the id of the user.
I add a number of workers, dedicated to that specific queue, depending on how much the user is paying.
I have problems in implementing this solution with Sidekiq and I could not find documentation on how to add queues and workers dynamically.
If I were to do this, here's what I'd try first:
Wrap all limited jobs in a counter.
on start/dequeue job checks if this user has capacity to run it.
If yes, job runs. If not, it reschedules itself.
Something along these lines:
class MyWorker
def perform(user_id, *args)
user = User.find(user_id)
unless user.has_available_workers
# re-enqueue with the same args. Possibly, with a delay.
return
end
user.checkout_worker
# do work
ensure
user.release_worker
end
end

Save metadata to jobs on Sidekiq

Is it possible to save metadata to jobs when using Sidekiq?
For example, I want to execute a validation as a background job so that when it finishes, any errors encountered would be saved as metadata inside the job.
If this is possible, will I still be able to recover this metadata after the job is finished or dead?
Thanks in advance.
Not straight out of the box with Sidekiq, but I have accomplished this with sidekiq-status
For example, in your scenario, it would look something like this:
class ValidatorJob
include Sidekiq::Worker
include Sidekiq::Status::Worker
def perform(*args)
# Run validations
# after they are done, you can store any data with the store method
store attr1: 'failed'
end
end
Yes, Sidekiq provides middleware (client & server) and the possibility to add metadata to the job.
def call(worker_class, job, queue, redis_pool)
# return false/nil to stop the job from going to redis
return false if queue != 'default'
job['customer'] = Customer.current_id
yield
end
Check this link for the docs.
You can't recover any state of a job when it's finished.
Seems that in your case you will need to save or send the data to somewhere else (like the database) to read it later and take action.

is there a way to run a job at a set time later, without cron, say a scheduled queue?

I have a rails application where I want to run a job in the background, but I need to run the job 2 hours from the original event.
The use case might be something like this:
User posts a product listing.
Background job is queued to syndicate listing to 3rd party api's, but even after original request, the response could take a while and the 3rd party's solution is to poll them every 2 hours to see if we can get a success acknowledgement.
So is there a way to queue a job, so that a worker daemon knows to ignore it or only listen to it at the scheduled time?
I don't want to use cron because it will load up a whole application stack and may be executed twice on overlapping long running jobs.
Can a priority queue be used for this? What solutions are there to implement this?
try delayed job - https://github.com/collectiveidea/delayed_job
something along these lines?
class ProductCheckSyndicateResponseJob < Struct.new(:product_id)
def perform
product = Product.find(product_id)
if product.still_needs_syndicate_response
# do it ...
# still no response, check again in two hours
Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(product.id), :run_at => 2.hours.from_now)
else
# nothing to do ...
end
end
end
initialize job first time in controller or maybe before_create callback on model?
Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(#product.id), :run_at => 2.hours.from_now)
Use the Rufus Scheduler gem. It runs as a background thread, so you don't have to load the entire application stack again. Add it to your Gemfile, and then your code is as simple as:
# in an initializer,
SCHEDULER = Rufus::Scheduler.start_new
# then wherever you want in your Rails app,
SCHEDULER.in('2h') do
# whatever code you want to run in 2 hours
end
The github page has tons of more examples.

Rails 3.1/rake - datespecific tasks without queues

I want to give my users the option to send them a daily summary of their account statistics at a specific (user given) time ....
Lets say following model:
class DailySummery << ActiveRecord::Base
# attributes:
# send_at
# => 10:00 (hour)
# last_sent_at
# => Time of the last sent summary
end
Is there now a best practice how to send this account summaries via email to the specific time?
At the moment I have a infinite rake task running which checks permanently if emails are available for sending and i would like to put the dailysummary-generation and sending into this rake task.
I had a thought that I could solve this with following pseudo-code:
while true
User.all.each do |u|
u.generate_and_deliver_dailysummery if u.last_sent_at < Time.now - 24.hours
end
sleep 60
end
But I'm not sure if this has some hidden caveats...
Notice: I don't want to use queues like resq or redis or something like that!
EDIT: Added sleep (have it already in my script)
EDIT: It's a time critical service (notification of trade rates) so it should be as fast as possible. Thats the background why I don't want to use a queue or job based system. And I use Monit to manage this rake task, which works really fine.
There's only really two main ways you can do delayed execution. You run the script when an user on your site hits a page, which is inefficient and not entirely accurate. Or use some sort of background process, whether it's a cron job or resque/delayed job/etc.
While your method of having an rake process run forever will work fine, it's inefficient because you're iterating over users 24/7 as soon as it finishes, something like:
while true
User.where("last_sent_at <= ? OR last_sent_at = ?", 24.hours.ago, nil).each do |u|
u.generate_and_deliver_dailysummery
end
sleep 3600
end
Which would run once an hour and only pull users that needed an email sent is a bit more efficient. The best practice would be to use a cronjob though that runs your rake task though.
Running a task periodically is what cron is for. The whenever gem (https://github.com/javan/whenever) makes it simple to configure cron definitions for your app.
As your app scales, you may find that the rake task takes too long to run and that the queue is useful on top of cron scheduling. You can use cron to control when deliveries are scheduled but have them actually executed by a worker pool.
I see two possibilities to do a task at a specific time.
Background process / Worker / ...
It's what you already have done. I refactored your example, because there was two bad things.
Check conditions directly from your database, it's more efficient than loading potential useless data
Load users by batch. Imagine your database contains millions of users... I'm pretty sure you would be happy, but not Rails... not at all. :)
Beside your code I see another problem. How are you going to manage this background job on your production server? If you don't want to use Resque or something else, you should consider manage it another way. There is Monit and God which are both a process monitor.
while true
# Check the condition from your database
users = User.where(['last_sent_at < ? OR created_at IS NULL', 24.hours.ago])
# Load by batch of 1000
users.find_each(:batch_size => 1000) do |u|
u.generate_and_deliver_dailysummery
end
sleep 60
end
Cron jobs / Scheduled task / ...
The second possibility is to schedule your task recursively, for instance each hour or half-hour. Correct me if I'm wrong, but do your users really need to schedule the delivery at 10:39am? I think that let them choose the hour is enough.
Applying this, I think a job fired each hour is better than an infinite task querying your database every single minute. Moreover it's really easy to do, because you don't need to set up anything.
There is a good gem to manage cron task with the ruby syntax. More infos here : Whenever
You can do that, you'll need to also check for the time you want to send at. So starting with your pseudo code and adding to it:
while true
User.all.each do |u|
if u.last_sent_at < Time.now - 24.hours && Time.now.hour >= u.send_at
u.generate_and_deliver_dailysummery
# the next 2 lines are only needed if "generate_and_deliver_dailysummery" doesn't sent last_sent_at already
u.last_sent_at = Time.now
u.save
end
end
sleep 900
end
I've also added the sleep so you don't needlessly hammer your database. You might also want to look into limiting that loop to just the set of users you need to send to. A query similar what Zachary suggests would be much more efficient than what you have.
If you don't want to use a queue - consider delayed job (sort of a poor mans queue) - it does run as a rake task similar to what you are doing
https://github.com/collectiveidea/delayed_job
http://railscasts.com/episodes/171-delayed-job
it stores all tasks in a jobs table, usually when you add a task it queues it to run as soon as possible, however you can override this to delay it until a specific time
you could convert your DailySummary class to DailySummaryJob and once complete it could re-queue a new instance of itself for the next days run
How did you update the last_sent_at attribute?
if you use
last_sent_at += 24.hours
and initialized with last_sent_at = Time.now.at_beginning_of_day + send_at
it will be all ok .
don't use last_sent_at = Time.now . it is because there may be some delay when the job is actually done , this will make the last_sent_at attribute more and more "delayed".

Resources