I am implementing a video streaming interface for Azure's Media Services API in Rails. I need to continuously update each uploaded video while Media Services processes it (copy, encode); the status will eventually become either available or failed. To do this I decided to use Delayed::Job, but I am not sure of the best way to keep a job always running.
class UpdateAzureVideosJob < ApplicationJob
  queue_as :azure_media_service

  def perform
    to_update = AzureVideo.all.map { |v| v if v.state != 5 }.compact
    unless to_update.empty?
      to_update.each do |video|
        video.update
      end
    end
    sleep(5)
    Delayed::Job.enqueue self
  end

  def before(job)
    Delayed::Job.where("last_error is NULL AND queue = ? AND created_at < ?", job.queue, DateTime.now).delete_all
  end
end
The reason I delete previous jobs of the same queue is that calling enqueue inside perform adds an extra job, which then adds another extra job, and the queue of scheduled jobs gets dirty very quickly.
I am just experimenting and this is probably the closest workaround (although a bit silly) for my case. I haven't tried other alternatives, but any suggestions would be appreciated. Thank you.
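As an aside on the alternatives mentioned above: ActiveJob can schedule the next run itself, which removes the need for sleep(5) and for the cleanup in before. A minimal sketch, assuming the delayed_job adapter honours set(wait:) by storing a future run_at; the 5-second interval mirrors the code above:

class UpdateAzureVideosJob < ApplicationJob
  queue_as :azure_media_service

  def perform
    AzureVideo.all.reject { |v| v.state == 5 }.each(&:update)

    # Schedule the next run instead of sleeping and re-enqueueing by hand.
    self.class.set(wait: 5.seconds).perform_later
  end
end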
This is how I did it:
def after(job)
  Delayed::Job.transaction do
    Delayed::Job.create(handler: job.handler, queue: job.queue, run_at: job.run_at + 5.minutes)
    job.destroy
  end
end
It will re-schedule the job to run 5 minutes after the original has finished. I have been using it in production for the better part of a year without any issues. I also have the same logic in the #error(job) callback.
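For completeness, delayed_job's #error hook receives the job and the exception, so the same reschedule logic there might look like this (a sketch mirroring the after callback above):

def error(job, exception)
  Delayed::Job.transaction do
    # Re-create the job 5 minutes out even after a failure, then drop the failed row.
    Delayed::Job.create(handler: job.handler, queue: job.queue, run_at: job.run_at + 5.minutes)
    job.destroy
  end
end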
I am using Active Job and it works wonderfully well. As I was playing around, I noticed something and I'm looking for any improvements.
I have a job like:
class SomeJob < ApplicationJob
  queue_as :default

  def perform(param)
    # if condition then re-try after x minutes
    if condition
      self.class.set(wait: x.minutes).perform_later(param)
      return
    end

    # something else
  end
end
Upon some condition, I am trying to re-schedule the current job after a delay of x minutes with the same original parameters. The scheduling works great, but there is a nuance I observed at the database level that I would like to improve.
The issue is that a new job is created, i.e. a new row in the db table. Instead, I'd like it to work as the same job, just with some added delay (in other words, re-schedule the same current job with the same parameters).
I do realize that raising an error will probably do the trick as far as working on the same job is concerned. One nice thing about that is the attempts count gets incremented too. But I'd like to be able to just add a delay before the job runs again (the same job, without creating a new one).
How can I do this? Thanks.
Yes, you'll want to retry rather than enqueue a new job. Look at the retry customizations available via the class method retry_on.
Changing your code, it could look like:
class SomeJob < ApplicationJob
  queue_as :default

  retry_on RetrySomeJobException, wait: x.minutes

  def perform(param)
    raise RetrySomeJobException if condition

    # Do the work!
  end
end
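Since the question also mentions liking that the attempts count gets incremented, note that retry_on accepts an attempts: option too. A hedged variation of the same job (the exception, condition and x are the placeholders from above):

class SomeJob < ApplicationJob
  queue_as :default

  # Retry up to 10 times, waiting x minutes between attempts; after that ActiveJob gives up.
  retry_on RetrySomeJobException, wait: x.minutes, attempts: 10

  def perform(param)
    raise RetrySomeJobException if condition
    # Do the work!
  end
end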
I want to re-queue the job automatically after it's done, with an optional 10-second delay if nothing was processed.
I can see two approaches. The first is to put the logic inside the perform method. The second is to use the around_perform callback.
Which is the more elegant way to do that?
Here is an example code block with such logic:
# Process queue automatically
class ProcessingJob < ApplicationJob
  queue_as :processing_queue

  around_perform do |_job, block|
    if block.call.zero?
      # wait for 10 seconds if 0 items were handled
      self.class.set(wait: 10.seconds).perform_later
    else
      # don't rest, there is more work to do
      self.class.perform_later
    end
  end

  def perform
    # will return number of processed items
    ProcessingService.handle_next_batch
  end
end
Or should I put the around_perform logic into the perform method itself?
def perform
  # will return number of processed items
  processed_items_count = ProcessingService.handle_next_batch
  delay_for_next_job = processed_items_count.zero? ? 10 : 0
  next_job = self.class.set(wait: delay_for_next_job.seconds)
  next_job.perform_later
end
You can extract delay_for_next_job as a private method, etc., and apply refactoring as needed; see the sketch below.
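One way that refactor might look (a sketch; the names mirror the code above):

def perform
  processed_items_count = ProcessingService.handle_next_batch
  self.class.set(wait: delay_for_next_job(processed_items_count)).perform_later
end

private

# Extracted helper: rest for 10 seconds when nothing was processed.
def delay_for_next_job(processed_items_count)
  processed_items_count.zero? ? 10.seconds : 0.seconds
end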
Why use around_perform? You don't need the job instance here.
Depending on your needs, you can also check whether any jobs are currently pending using something like https://github.com/mhenrixon/sidekiq-unique-jobs (sorry, I'm not really familiar with the ActiveJob API).
I need to override Delayed::Worker.max_attempts for one specific job, which I want to retry a lot of times. Also, I don't want the next scheduled time to be determined exponentially (from the docs: 5 seconds + N ** 4, where N is the number of retries).
I don't want to override the Delayed::Worker settings and affect other jobs.
My job is already a custom job (I handle errors in a certain way), so that might be helpful. Any pointers on how to do this?
I figured it out by looking through the delayed_job source code; this is not documented anywhere in their docs.
Here's what I did:
class MyCustomJob < Struct.new(:param1, :param2)
  def perform
    # do something
  end

  # the time and attempts params are required by delayed_job
  def reschedule_at(time, attempts)
    30.seconds.from_now
  end

  def max_attempts
    50
  end
end
Then run it wherever you need to by using enqueue, like this:
Delayed::Job.enqueue(MyCustomJob.new(param1, param2))
Hope this helps someone in the future.
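As a side note, delayed_job passes the current time and the attempt count into reschedule_at, so a variation (not from the original answer) could use them for a linear backoff instead of a fixed 30 seconds:

# Wait one extra minute per attempt already made: 1 min, 2 min, 3 min, ...
def reschedule_at(current_time, attempts)
  current_time + (attempts * 60).seconds
end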
I have a background job that does a map/reduce job on MongoDB. When the user sends in more data to the document, it kicks off the background job that runs on the document. If the user sends in multiple requests, it will kick off multiple background jobs for the same document, but only one really needs to run. Is there a way I can prevent multiple duplicate instances? I was thinking of creating a queue for each document and making sure it is empty before I submit a new job. Or perhaps I can set a job id that is the same as my document id, and check that none exists before submitting?
Also, I just found the sidekiq-unique-jobs gem, but the documentation is non-existent. Does this do what I want?
My initial suggestion would be a mutex for this specific job. But since there's a chance you may have multiple application servers working the Sidekiq jobs, I would suggest something at the Redis level.
For instance, use redis-semaphore within your Sidekiq worker definition. An untested example:
def perform
  s = Redis::Semaphore.new(:map_reduce_semaphore, connection: "localhost")

  # verify that this sidekiq worker is the first to reach this semaphore.
  unless s.locked?
    # auto-unlocks in 90 seconds. set to what is reasonable for your worker.
    s.lock(90)
    your_map_reduce()
    s.unlock
  end
end

def your_map_reduce
  # ...
end
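One hedged refinement to the untested example above: wrapping the work in begin/ensure releases the lock even if the map/reduce raises, instead of waiting for the 90-second auto-unlock (the constructor arguments are copied from the example as-is):

def perform
  s = Redis::Semaphore.new(:map_reduce_semaphore, connection: "localhost")
  return if s.locked?

  s.lock(90) # the auto-unlock still acts as a safety net
  begin
    your_map_reduce
  ensure
    s.unlock
  end
end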
https://github.com/krasnoukhov/sidekiq-middleware
UniqueJobs
Provides uniqueness for jobs.
Usage
Example worker:
class UniqueWorker
  include Sidekiq::Worker

  sidekiq_options({
    # Should be set to true (enables uniqueness for async jobs)
    # or :all (enables uniqueness for both async and scheduled jobs)
    unique: :all,

    # Unique expiration (optional, default is 30 minutes)
    # For scheduled jobs calculates automatically based on schedule time and expiration period
    expiration: 24 * 60 * 60
  })

  def perform
    # Your code goes here
  end
end
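With those options, both immediate and scheduled enqueues of the worker are de-duplicated while the uniqueness key is live. A usage sketch using the standard Sidekiq client calls:

UniqueWorker.perform_async            # enqueued
UniqueWorker.perform_async            # skipped while the unique key from the first has not expired
UniqueWorker.perform_in(10.minutes)   # scheduled jobs are also covered, since unique is :all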
There is also https://github.com/mhenrixon/sidekiq-unique-jobs (SidekiqUniqueJobs).
You can do this, assuming all of your jobs are getting added to the Enqueued bucket.
class SidekiqUniqChecker
  def self.perform_unique_async(action, model_name, id)
    key = "#{action}:#{model_name}:#{id}"
    queue = Sidekiq::Queue.new('elasticsearch')
    queue.each { |q| return if q.args.join(':') == key }
    Indexer.perform_async(action, model_name, id)
  end
end
The above code is just a sample, but you may tweak it to your needs.
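For instance, instead of calling Indexer.perform_async directly, you would go through the checker so the 'elasticsearch' queue is scanned first (names are the ones from the sample above):

SidekiqUniqChecker.perform_unique_async('index', 'Article', article.id)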
Is there any way I can fetch the start and end time for every one of my jobs? I am using delayed_job.
Depending on your setup, you could include actions to record the start time and end time of your job within the job itself.
class SomeJob < Struct.new(:param1, :param2)
  def perform
    start_time = Time.now

    ## Do Something

    # assuming param1 is the id of the record being tracked
    SomeModel.find(param1).update(start_time: start_time, end_time: Time.now)
  end
end
This might be easier than forking the repository, and I am not crazy about the idea of keeping all of those jobs around; it would slow down the queue over time, depending on load.
delayed_job has no built-in ability to track the start time, duration or end time of a job. It also removes the table entry upon success by default.
You would have to fork the GitHub version and create a patch to track and record this information, or use an external method to track this data ( http://helderribeiro.net/?p=87 uses monit, which again relies on a forked version).
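If forking is unappealing, delayed_job does expose lifecycle hooks through its plugin API, which can at least log timings without patching the gem. A sketch only, not from the answers above; JobTimingPlugin and the log message are illustrative:

class JobTimingPlugin < Delayed::Plugin
  callbacks do |lifecycle|
    lifecycle.around(:invoke_job) do |job, *args, &block|
      started_at = Time.now
      begin
        block.call(job, *args)
      ensure
        # Log rather than persist, since the job row may be deleted on success.
        Rails.logger.info("delayed_job #{job.id} started #{started_at}, finished #{Time.now}")
      end
    end
  end
end

# Registered in an initializer:
# Delayed::Worker.plugins << JobTimingPlugin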