Is it possible to save metadata to jobs when using Sidekiq?
For example, I want to execute a validation as a background job so that when it finishes, any errors encountered would be saved as metadata inside the job.
If this is possible, will I still be able to recover this metadata after the job is finished or dead?
Thanks in advance.
Not straight out of the box with Sidekiq, but I have accomplished this with the sidekiq-status gem.
For example, in your scenario, it would look something like this:
class ValidatorJob
  include Sidekiq::Worker
  include Sidekiq::Status::Worker

  def perform(*args)
    # Run validations.
    # After they are done, you can store any data with the store method:
    store attr1: 'failed'
  end
end
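sidekiq-status keeps that data in Redis for a configurable expiration (30 minutes by default), so you can read it back even after the job has finished, as long as the entry hasn't expired. A sketch, assuming you kept the job ID returned by perform_async:

```ruby
# Enqueue the job and hold on to its ID.
job_id = ValidatorJob.perform_async(args)

# Later, even after the job has finished (but before the status
# entry expires in Redis), read the status and the stored metadata:
Sidekiq::Status::status(job_id)      # e.g. :queued, :working, :complete, or :failed
Sidekiq::Status::get(job_id, :attr1) # the value stored by the worker
```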
Yes, Sidekiq provides middleware (client & server) and the possibility to add metadata to the job.
def call(worker_class, job, queue, redis_pool)
  # return false/nil to stop the job from going to redis
  return false if queue != 'default'
  job['customer'] = Customer.current_id
  yield
end
Check this link for the docs.
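For completeness, the `call` method above belongs in a client middleware class that you register in the client middleware chain, typically in an initializer. A sketch following the pattern in the Sidekiq middleware docs (the class name `MetadataClientMiddleware` is made up here):

```ruby
# Hypothetical client middleware class wrapping the call method shown above.
class MetadataClientMiddleware
  def call(worker_class, job, queue, redis_pool)
    # Return false/nil to stop the job from going to Redis.
    return false if queue != 'default'
    job['customer'] = Customer.current_id
    yield
  end
end

# Register it so it runs every time a job is pushed to Redis.
Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add MetadataClientMiddleware
  end
end
```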
You can't recover any state of a job once it's finished.
It seems that in your case you will need to save or send the data somewhere else (such as the database) so you can read it later and act on it.
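For example, the validation job could write its findings to a regular ActiveRecord table instead of trying to keep them on the job itself. A rough sketch, where `Record` and `ValidationResult` are hypothetical models, not Sidekiq APIs:

```ruby
class ValidatorJob
  include Sidekiq::Worker

  def perform(record_id)
    record = Record.find(record_id) # hypothetical model being validated
    record.valid?                   # runs the validations, populating record.errors

    # Persist the outcome so it survives after the job is gone from Redis.
    ValidationResult.create!(
      record_id: record.id,
      errors_json: record.errors.to_json
    )
  end
end
```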
I'm using Rails 6 in my app with Sidekiq on board. I've got FetchAllProductsWorker like below:
module Imports
  class FetchAllProductsWorker
    include Sidekiq::Worker

    sidekiq_options queue: 'imports_fetch_all', retry: 0

    def perform
      (...)
    end
  end
end
I want to check when FetchAllProductsWorker last finished successfully and display this info in my front-end. This job will be fired sporadically, but the user must have feedback about when the last database sync (FetchAllProductsWorker is responsible for that) succeeded.
I want to have such info only for this one worker. I saw a lot of useful things inside the sidekiq API docs but none of them relate to the history of completed jobs.
You could use the Sidekiq Batches API, which provides an on_success callback, but that is mostly used for tracking batch work and is overkill for your problem. I suggest writing your code at the end of the perform method.
def perform
  (...) # Run the already implemented code
  (notify/log for success) # If it is successful, notify/log.
end
The simplified default lifecycle of a Sidekiq job looks like this:
– If there is an error, the job will be retried a couple of times (read about Retries in the Sidekiq docs). During that time you can see the failing job and the error in the Sidekiq Web UI, if configured.
– If the job finishes successfully, it is removed from Redis and there is no information about this specific job available to the application.
That means Sidekiq does not really support querying jobs that ran successfully in the past. If you need information about past jobs, you have to build this yourself. I basically see three options for monitoring Sidekiq jobs:
Write useful information to your application's log. Most logging tools support monitoring for specific messages, sending messages, or creating views for specific events. This might be enough if you just need this information for debugging reasons.
def perform
  Rails.logger.info("#{self.class.name} started")
  begin
    # job code
  rescue => exception
    Rails.logger.error("#{self.class.name} failed: #{exception.message}")
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    Rails.logger.info("#{self.class.name} finished successfully")
  end
end
If you are mostly interested in getting informed when there is a problem, then I suggest looking at a tool like Dead Man's Snitch. These tools work by having you ping their API as the last step of a job, which is only reached when there was no error. You then configure the tool to notify you if its API hasn't been pinged in the expected timeframe. For example, if you have a daily import job, Dead Man's Snitch would send you a message only if there wasn't a successful import job in the last 24 hours; if the job was successful, it will not spam you every single day.
require 'open-uri'

def perform
  # job code
  URI.open("https://nosnch.in/#{TOKEN}") # Kernel#open with a URL is deprecated
end
If you want your application's users to see job statuses on a dashboard in the application, then it makes sense to store that information in the database. You could, for example, create a JobStatus ActiveRecord model with columns like job_name, status, payload, and created_at, and then create records in that table whenever it feels useful. Once the data is in the database, you can present it to the user like any other model's data.
def perform
  begin
    # job code
  rescue => exception
    JobStatus.create(job_name: self.class.name, status: 'failed', payload: exception.to_json)
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    JobStatus.create(job_name: self.class.name, status: 'success')
  end
end
And, of course, you can combine all those techniques and tools for different use-cases: 1. for history and statistics, 2. for admins and people on-call, 3. for users of your application.
How can I update an Active Job parameter before retrying? I have a job that needs some persistent storage, so I store its data as an argument to the job (a hash), and the data is updated after each run. If the job fails, I want to retry with the updated data instead of the data that was used to schedule the job.
I am using sidekiq for scheduling my jobs btw.
Regards.
You need to rescue and create a new job with the modified parameter. Sidekiq does not allow you to modify a job from the Worker.
def perform(a)
  begin
    do_work
  rescue SomeError
    self.class.perform_async(a + 1)
  end
end
I am currently working on a Rails project. I was asked to save the progress of Sidekiq workers and store it, so the user who is using the application can see the progress. Now I am faced with this dilemma: is it better to just write out to a text file or save it in a database?
If it is a database, then how do I save it in a model object? I know we can store the progress of workers by just sending the info out to a log file.
class YourWorker
  include Sidekiq::Worker

  def perform
    logger.info { "Things are happening." }
    logger.debug { "Here's some info: #{hash.inspect}" }
  end
end
So if I want to save the progress of workers in a data model, then how?
Your thread title says that the data is unstructured, but your problem description indicates that the data should be structured. Which is it? Speed is not always the most important consideration, and it doesn't seem to be very important in your case. The most important consideration is how your data will be used. Will the way your data is used in the future change? Usually, a database with an appropriate model is the better answer because it allows flexibility for future requirements. It also allows other clients access to your data.
You can create a Job class and then update some attribute of the currently working job.
class Job < ActiveRecord::Base
  # assume that there is a 'status' attribute defined as 'text'
end
Then when you queue something to happen you create a new Job and pass the id of the Job to perform or perform_async.
job = Job.create!
YourWorker.perform_async job.id
Then in your worker, you'd receive the id of the job to be worked on, and then retrieve and update that record.
def perform(job_id)
  job = Job.find job_id
  job.status = "It's happening!"
  job.save
end
I have a rails application where I want to run a job in the background, but I need to run the job 2 hours from the original event.
The use case might be something like this:
User posts a product listing.
Background job is queued to syndicate the listing to 3rd-party APIs, but even after the original request, the response could take a while, and the 3rd party's solution is to poll them every 2 hours to see if we can get a success acknowledgement.
So is there a way to queue a job, so that a worker daemon knows to ignore it or only listen to it at the scheduled time?
I don't want to use cron because it will load up a whole application stack and may be executed twice on overlapping long running jobs.
Can a priority queue be used for this? What solutions are there to implement this?
Try Delayed Job: https://github.com/collectiveidea/delayed_job
Something along these lines?
class ProductCheckSyndicateResponseJob < Struct.new(:product_id)
  def perform
    product = Product.find(product_id)
    if product.still_needs_syndicate_response
      # do it ...
      # still no response, check again in two hours
      Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(product.id), :run_at => 2.hours.from_now)
    else
      # nothing to do ...
    end
  end
end
Initialize the job the first time in the controller, or maybe in a before_create callback on the model?
Delayed::Job.enqueue(ProductCheckSyndicateResponseJob.new(@product.id), :run_at => 2.hours.from_now)
Use the Rufus Scheduler gem. It runs as a background thread, so you don't have to load the entire application stack again. Add it to your Gemfile, and then your code is as simple as:
# in an initializer,
SCHEDULER = Rufus::Scheduler.start_new

# then wherever you want in your Rails app,
SCHEDULER.in('2h') do
  # whatever code you want to run in 2 hours
end
The GitHub page has tons more examples.
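Since the rest of this thread revolves around Sidekiq, it's worth noting that Sidekiq itself can also schedule a job for a later time with perform_in, so no extra gem is needed if you already run Sidekiq. A sketch with a made-up SyndicationPollJob:

```ruby
class SyndicationPollJob
  include Sidekiq::Worker

  def perform(product_id)
    # poll the 3rd-party API for the acknowledgement ...
    # if there's still no response, check again in two hours:
    self.class.perform_in(2 * 60 * 60, product_id)
  end
end

# kick off the first poll two hours after the listing is posted
SyndicationPollJob.perform_in(2 * 60 * 60, product.id)
```

perform_in takes a number of seconds (or an ActiveSupport duration like 2.hours in a Rails app).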
I have a background job that does a map/reduce job on MongoDB. When the user sends in more data to the document, it kicks off the background job that runs on the document. If the user sends in multiple requests, it will kick off multiple background jobs for the same document, but only one really needs to run. Is there a way I can prevent multiple duplicate instances? I was thinking of creating a queue for each document and making sure it is empty before I submit a new job. Or perhaps I can set a job id somehow that is the same as my document id, and check that none exists before submitting it?
Also, I just found the sidekiq-unique-jobs gem, but the documentation is non-existent. Does this do what I want?
My initial suggestion would be a mutex for this specific job. But as there's a chance that you may have multiple application servers working the sidekiq jobs, I would suggest something at the redis level.
For instance, use redis-semaphore within your sidekiq worker definition. An untested example:
def perform
  s = Redis::Semaphore.new(:map_reduce_semaphore, connection: "localhost")

  # verify that this sidekiq worker is the first to reach this semaphore.
  unless s.locked?
    # auto-unlocks in 90 seconds. set to what is reasonable for your worker.
    s.lock(90)
    your_map_reduce()
    s.unlock
  end
end

def your_map_reduce
  # ...
end
https://github.com/krasnoukhov/sidekiq-middleware
UniqueJobs
Provides uniqueness for jobs.
Usage
Example worker:
class UniqueWorker
  include Sidekiq::Worker

  sidekiq_options({
    # Should be set to true (enables uniqueness for async jobs)
    # or :all (enables uniqueness for both async and scheduled jobs)
    unique: :all,

    # Unique expiration (optional, default is 30 minutes).
    # For scheduled jobs it is calculated automatically based on the schedule time and expiration period.
    expiration: 24 * 60 * 60
  })

  def perform
    # Your code goes here
  end
end
There also is https://github.com/mhenrixon/sidekiq-unique-jobs (SidekiqUniqueJobs).
You can do this, assuming all the jobs are getting added to the Enqueued bucket.
class SidekiqUniqChecker
  def self.perform_unique_async(action, model_name, id)
    key = "#{action}:#{model_name}:#{id}"
    queue = Sidekiq::Queue.new('elasticsearch')
    queue.each { |job| return if job.args.join(':') == key }
    Indexer.perform_async(action, model_name, id)
  end
end
The above code is just a sample, but you may tweak it to your needs.
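The duplicate check itself is plain Ruby and can be pulled out and exercised without Redis; a sketch with a hypothetical helper that mirrors the comparison above:

```ruby
# Given the args arrays of already-enqueued jobs, decide whether a new
# (action, model_name, id) triple would duplicate one of them, using the
# same join(':') comparison as the snippet above.
def duplicate_job?(enqueued_args, action, model_name, id)
  key = "#{action}:#{model_name}:#{id}"
  enqueued_args.any? { |args| args.join(':') == key }
end

duplicate_job?([["index", "Product", 1]], "index", "Product", 1)  # => true
duplicate_job?([["index", "Product", 1]], "delete", "Product", 1) # => false
```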