I have to check whether the same job has already been added to the queue before enqueuing it again.
Currently I use the code below, but it's very inefficient because it loads all the jobs into memory. Is there a way to query the queue like a normal Active Record model?
Ideally, I would like something like
Sidekiq::Queue.where(job_class: "SyncJob", queue_name:"high", arguments: [1,2,3])
Current code:
def list_queued_jobs(job_class, queue_name, arguments)
found_jobs = []
queues = Sidekiq::Queue.all
queues.each do |queue|
queue.each do |job|
job.args.each do |arg|
if arg['job_class'].to_s == job_class.to_s && arg['queue_name'] == queue_name && ActiveJob::Arguments.deserialize(arg['arguments']) == arguments
found_jobs << job
end
end
end
end
return found_jobs
end
There is no out-of-the-box solution, but there is a simpler way to determine whether a job has already been queued (true/false):
def in_queues?(job_class, queue_name, arguments = [])
Sidekiq::Queue.new(queue_name).any? do |job|
job_class == job.klass && job.args == arguments
end
end
Keep in mind that sidekiq arguments are presented as an array. For example, if you call your job like:
SyncJob.perform_async([1, 2, 3])
then you check that it's been queued as follows:
in_queues?("SyncJob", "high", [[1, 2, 3]])
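To see why the extra pair of brackets is needed, here is a small sketch that runs without Sidekiq. QueuedJob and queued? are hypothetical stand-ins invented for this illustration; only the comparison mirrors in_queues? above.

```ruby
# Hypothetical stand-in for a Sidekiq job record; only klass and args matter here.
QueuedJob = Struct.new(:klass, :args)

# Same comparison as in_queues?, but over any enumerable of jobs.
def queued?(jobs, job_class, arguments = [])
  jobs.any? { |job| job.klass == job_class && job.args == arguments }
end

# perform_async([1, 2, 3]) passes ONE argument (an array), so args is [[1, 2, 3]].
jobs = [QueuedJob.new("SyncJob", [[1, 2, 3]])]

puts queued?(jobs, "SyncJob", [[1, 2, 3]]) # one array argument: matches
puts queued?(jobs, "SyncJob", [1, 2, 3])   # three separate arguments: no match
```

The same reasoning applies to a job enqueued as perform_async(1, 2, 3): its args would be [1, 2, 3], and that is what you would pass to the check.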
Checking for an existing job is known as "unique jobs". Sidekiq Enterprise provides this functionality, as do several 3rd party gems.
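Implementing and using the code you describe is a very bad idea: every enqueue scans the whole queue, and the check followed by the enqueue is not atomic, so two processes can still enqueue the same job at the same time.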
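For illustration, one common way to get uniqueness without scanning the queue is to claim a lock keyed by a digest of the job's class, queue and arguments. The sketch below is not how any particular gem is implemented; UniqueLockStore, job_digest and enqueue_once are hypothetical names, and the in-memory store is an assumption standing in for a store shared by all processes.

```ruby
require "digest"
require "json"

# In-memory stand-in for an atomic "set if absent" store (an assumption made
# for this sketch; a real deployment needs a store shared by all processes).
class UniqueLockStore
  def initialize
    @keys = {}
    @mutex = Mutex.new
  end

  # Returns true if the key was claimed, false if it was already held.
  def claim(key)
    @mutex.synchronize do
      next false if @keys.key?(key)
      @keys[key] = true
    end
  end

  # Frees the key so an identical job can be enqueued again.
  def release(key)
    @mutex.synchronize { @keys.delete(key) }
  end
end

# Stable digest of the job identity: class name, queue name and arguments.
def job_digest(job_class, queue_name, arguments)
  Digest::SHA256.hexdigest(JSON.generate([job_class.to_s, queue_name.to_s, arguments]))
end

# Yields (i.e. enqueues) only when no identical job currently holds the lock.
def enqueue_once(store, job_class, queue_name, arguments)
  return false unless store.claim(job_digest(job_class, queue_name, arguments))
  yield
  true
end
```

In this sketch the lock would have to be released (or expire) once the job has run; otherwise the same job could never be enqueued again.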
I am working on a Ruby on Rails application. We have many Sidekiq workers that can process multiple jobs at a time. Each job makes calls to the Shopify API, and the call limit set by Shopify is 2 calls per second. I want to synchronize the jobs so that only two of them can call the API in a given second.
The way I'm doing that right now, is like this:
# frozen_string_literal: true
class Synchronizer
attr_reader :shop_id, :queue_name, :limit, :wait_time
def initialize(shop_id:, queue_name:, limit: nil, wait_time: 1)
@shop_id = shop_id
@queue_name = queue_name.to_s
@limit = limit
@wait_time = wait_time
end
# This method should be called for each api call
def synchronize_api_call
raise "a block is required." unless block_given?
get_api_call
time_to_wait = calculate_time_to_wait
sleep(time_to_wait) unless Rails.env.test? || time_to_wait.zero?
yield
ensure
return_api_call
end
def set_api_calls
redis.del(api_calls_list)
redis.rpush(api_calls_list, calls_list)
end
private
def get_api_call
logger.log_message(synchronizer: 'Waiting for api call', color: :yellow)
@api_call_timestamp = redis.brpop(api_calls_list)[1].to_i
logger.log_message(synchronizer: 'Got api call.', color: :yellow)
end
def return_api_call
redis_timestamp = redis.time[0]
redis.rpush(api_calls_list, redis_timestamp)
ensure
redis.ltrim(api_calls_list, 0, limit - 1)
end
def last_call_timestamp
@api_call_timestamp
end
def calculate_time_to_wait
current_time = redis.time[0]
time_passed = current_time - last_call_timestamp.to_i
time_to_wait = wait_time - time_passed
time_to_wait > 0 ? time_to_wait : 0
end
def reset_api_calls
redis.multi do |r|
r.del(api_calls_list)
end
end
def calls_list
redis_timestamp = redis.time[0]
limit.times.map do |i|
redis_timestamp
end
end
def api_calls_list
@api_calls_list ||= "api-calls:shop:#{shop_id}:list"
end
def redis
Thread.current[:redis] ||= Redis.new(db: $redis_db_number)
end
end
The way I use it is like this:
synchronizer = Synchronizer.new(shop_id: shop_id, queue_name: 'shopify_queue', limit: 2, wait_time: 1)
# This is called once when the process starts, i.e. it's not called by the jobs themselves but by the app from which the process is kicked off.
synchronizer.set_api_calls # this populates the api_calls_list with 2 timestamps, which are used to know when the last API call was sent.
Then, when a job wants to make a call:
synchronizer.synchronize_api_call do
# make the call
end
The problem
The problem with this is that if, for some reason, a job fails to return the api_call it took to the api_calls_list, that job and all the other jobs get stuck forever, or until we notice and call set_api_calls again. The problem doesn't affect only that particular shop but the other shops as well, because the Sidekiq workers are shared between all the shops using our app. Sometimes we don't notice until a user contacts us, and we find that a process was stuck for many hours when it should have finished in a few minutes.
The Question
I recently realised that Redis may not be the best tool for shared locking. So I am asking: is there any other good tool for this job? If not in the Ruby world, I'd like to learn from others as well. I'm interested in the techniques as well as the tools, so every bit helps.
You may want to restructure your code and create a micro-service to process the API calls, which will use a local locking mechanism and force your workers to wait on the socket. It comes with the added complexity of maintaining the micro-service. But if you're in a hurry then Ent-Rate-Limiting looks cool too.
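If the API calls are funnelled through a single process as suggested, the "local locking mechanism" could be as small as a mutex-guarded sliding window. The following is a minimal sketch under that assumption; LocalRateLimiter and its clock and sleeper parameters are hypothetical names introduced here, not part of the original Synchronizer.

```ruby
# Allows at most `limit` calls per `period` seconds within one process.
class LocalRateLimiter
  def initialize(limit:, period:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @limit = limit
    @period = period
    @clock = clock       # injectable so the limiter can be tested without waiting
    @sleeper = sleeper
    @timestamps = []     # start times of the calls inside the current window
    @mutex = Mutex.new
  end

  # Blocks until a slot is free, then runs the block and returns its value.
  def synchronize_api_call
    wait_for_slot
    yield
  end

  private

  def wait_for_slot
    @mutex.synchronize do
      loop do
        now = @clock.call
        # Forget calls that have left the window.
        @timestamps.reject! { |t| now - t >= @period }
        if @timestamps.size < @limit
          @timestamps << now
          return
        end
        # Sleep until the oldest recorded call leaves the window.
        @sleeper.call(@period - (now - @timestamps.first))
      end
    end
  end
end

limiter = LocalRateLimiter.new(limit: 2, period: 1)
limiter.synchronize_api_call { puts "calling the API" }
```

Because slots expire with time instead of being checked out and returned, a caller that crashes in the middle of a call cannot leave the limiter stuck, which is the failure mode described in the question.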
Is there a way to get a list of all the jobs currently in the queue and running? Basically, I want to know whether a job of a given class is already there, and if so, I don't want to insert another one. I've seen other options, but I want to do it this way.
I can see here how to get the list of jobs in the queue.
queue = Sidekiq::Queue.new("mailer")
queue.each do |job|
job.klass # => 'MyWorker'
end
From what I understand, this will not include processing/running jobs. Is there any way to get them?
If you want to list all currently running jobs from the console, try this:
workers = Sidekiq::Workers.new
workers.each do |_process_id, _thread_id, work|
p work
end
Each work is a hash.
To list all queue data:
queues = Sidekiq::Queue.all
queues.each do |queue|
queue.each do |job|
p job.klass, job.args, job.jid
end
end
For a specific queue, change this to Sidekiq::Queue.new('queue_name').
Similarly, you can get all scheduled jobs using Sidekiq::ScheduledSet.new.
running jobs:
Sidekiq::Workers.new.each do |_process_id, _thread_id, work|
p work
end
queued jobs across all queues:
Sidekiq::Queue.all.each do |queue|
# p queue.name, queue.size
queue.each do |job|
p job.klass, job.args
end
end
Assuming you passed the Hash as the argument to Sidekiq when you enqueued.
args = {
"student_id": 1,
"student_name": "Michael Moore"
}
YourWorker.perform_in(1.second,args)
Then, anywhere in your application, you could retrieve it as follows:
ss = Sidekiq::ScheduledSet.new
student_id_list = ss.map{|job| job['args'].first["student_id"]}
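Note that the lookup uses the string key "student_id" even though the hash literal above creates symbol keys. Sidekiq stores job arguments as JSON, so the keys come back as strings. The sketch below simulates that round trip with the standard JSON library only; through_json is a hypothetical helper for this illustration, not a Sidekiq method.

```ruby
require "json"

# The `"student_id": 1` syntax creates SYMBOL keys in Ruby.
args = { "student_id": 1, "student_name": "Michael Moore" }
p args.keys # symbols, not strings

# Simulates what happens to job arguments when they are serialized as JSON.
def through_json(arguments)
  JSON.parse(JSON.generate(arguments))
end

stored = through_json([args])
p stored.first["student_id"] # the string key works after the round trip
p stored.first[:student_id]  # the symbol key no longer exists
```

This is why job code that reads hash arguments should use string keys.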
I am using this in ApplicationJob to check whether a job with the same name and arguments is already in the queue, and to prevent it from being queued twice.
app/jobs/application_job.rb
class ApplicationJob < ActiveJob::Base
# Check if there is the same job already queued
around_enqueue do |job, block|
existing_queued_jobs = list_queued_jobs(job.class, job.queue_name, job.arguments)
if existing_queued_jobs.size == 0
block.call # this will enqueue your job
else
puts "JOB not enqueue because already queued (#{job.class}, #{job.queue_name}, #{job.arguments})"
end
end
def list_queued_jobs(job_class, queue_name, arguments)
found_jobs = []
queues = Sidekiq::Queue.all
queues.each do |queue|
queue.each do |job|
job.args.each do |arg|
found_jobs << job if arg['job_class'].to_s == job_class.to_s && arg['queue_name'] == queue_name && arg['arguments'] == arguments
end
end
end
return found_jobs
end
end
I am using sidekiq in my rails app. Users of my app create reports that start a sidekiq job. However, sometimes users want to be able to cancel "processing" reports. Deleting the report is easy but I also need to be able to delete the sidekiq job as well.
So far I have been able to get a list of workers like so:
workers = Sidekiq::Workers.new
and each worker has args that include a report_id so I can identify which job belongs to which report. However, I'm not sure how to actually delete the job. It should be noted that I want to delete the job whether it is currently busy, or set in retry.
According to this Sidekiq documentation page to delete a job with a single id you need to iterate the queue and call .delete on it.
queue = Sidekiq::Queue.new("mailer")
queue.each do |job|
job.klass # => 'MyWorker'
job.args # => [1, 2, 3]
job.delete if job.jid == 'abcdef1234567890'
end
There is also a plugin called sidekiq-status that provides you the ability to cancel a single job
scheduled_job_id = MyJob.perform_in 3600
Sidekiq::Status.cancel scheduled_job_id #=> true
The simplest way I found to do this is:
job = Sidekiq::ScheduledSet.new.find_job([job_id])
where [job_id] is the JID that pertains to the report. Followed by:
job.delete
I found no need to iterate through the entire queue as described by other answers here.
I had the same problem, but the difference is that I needed to cancel a scheduled job, and my solution is:
Sidekiq::ScheduledSet.new.each do |job|
next unless [online_jid, offline_jid].include?(job.jid)
job.delete
end
If you want to cancel a scheduled job, I'm not sure about @KimiGao's answer, but this is what I adapted from Sidekiq's current API documentation:
jid = MyCustomWorker.perform_in(3600) # schedule the job; perform_async would enqueue it immediately instead
r = Sidekiq::ScheduledSet.new
jobs = r.select{|job| job.jid == jid }
jobs.each(&:delete)
Hope it helps.
You can delete sidekiq job filtering by worker class and args:
class UserReportsWorker
include Sidekiq::Worker
def perform(report_id)
# ...
end
end
jobs = Sidekiq::ScheduledSet.new.select do |retri|
retri.klass == "UserReportsWorker" && retri.args == [42]
end
jobs.each(&:delete)
I had the same problem.
I solved it by registering the job id when I initialize it and by creating another function cancel! to delete it.
Here is the code:
after_enqueue do |job|
sidekiq_job = nil
queue = Sidekiq::Queue.new
sidekiq_job = queue.detect do |j|
j.item['args'][0]['job_id'] == job.job_id
end
if sidekiq_job.nil?
scheduled = Sidekiq::ScheduledSet.new
sidekiq_job = scheduled.detect do |j|
j.item['args'][0]['job_id'] == job.job_id
end
end
if sidekiq_job.present?
booking = job.arguments.first
booking.close_comments_jid = sidekiq_job.jid
booking.save
end
end
def perform(booking)
# do something
end
def self.cancel!(booking)
queue = Sidekiq::Queue.new
sidekiq_job = queue.find_job(booking.close_comments_jid)
if sidekiq_job.nil?
scheduled = Sidekiq::ScheduledSet.new
sidekiq_job = scheduled.find_job(booking.close_comments_jid)
end
if sidekiq_job.nil?
# Report bug in my Bug Tracking tool
else
sidekiq_job.delete
end
end
There is simple way of deleting a job if you know the job_id:
job = Sidekiq::ScheduledSet.new.find_job(job_id)
# find_job returns nil when no scheduled job has this JID
if job
job.delete
else
Rails.logger.error "Job: (job_id: #{job_id}) not found while deleting jobs."
end
Or you can use the Sidekiq web UI on your Rails server.
For example, at http://localhost:3000/sidekiq you can stop or remove Sidekiq jobs.
Before that, you have to update routes.rb:
require 'sidekiq/web'
mount Sidekiq::Web => '/sidekiq'
I have some update triggers which push jobs onto the Sidekiq queue. So in some cases, there can be multiple jobs to process the same object.
There are a couple of uniqueness plugins ("Middleware", Unique Jobs), they're not documented much, but they seem to be more like throttlers to prevent repeat processing; what I want is a throttler that prevents repeat creating of the same jobs. That way, an object will always be processed in its freshest state. Is there a plugin or technique for this?
Update: I didn't have time to make a middleware, but I ended up with a related cleanup function to ensure queues are unique: https://gist.github.com/mahemoff/bf419c568c525f0af903
What about a simple client middleware?
module Sidekiq
class UniqueMiddleware
def call(worker_class, msg, queue_name, redis_pool)
if msg["unique"]
queue = Sidekiq::Queue.new(queue_name)
queue.each do |job|
if job.klass == msg['class'] && job.args == msg['args']
return false
end
end
end
yield
end
end
end
Just register it
Sidekiq.configure_client do |config|
config.client_middleware do |chain|
chain.add Sidekiq::UniqueMiddleware
end
end
Then in your job just set unique: true in sidekiq_options when needed
My suggestion is to search for previously scheduled jobs based on some selection criteria and delete them before scheduling a new one. This has been useful for me when I want a single scheduled job for a particular object and/or one of its methods.
Some example methods in this context:
def self.find_jobs_for_object_by_method(klass, method)
jobs = Sidekiq::ScheduledSet.new
jobs.select { |job|
job.klass == 'Sidekiq::Extensions::DelayedClass' &&
((job_klass, job_method, args) = YAML.load(job.args[0])) &&
job_klass == klass &&
job_method == method
}
end
##
# delete job(s) specific to a particular class,method,particular record
# will only remove djs on an object for that method
#
def self.delete_jobs_for_object_by_method(klass, method, id)
jobs = Sidekiq::ScheduledSet.new
jobs.select do |job|
job.klass == 'Sidekiq::Extensions::DelayedClass' &&
((job_klass, job_method, args) = YAML.load(job.args[0])) &&
job_klass == klass &&
job_method == method &&
args[0] == id
end.map(&:delete)
end
##
# delete job(s) specific to a particular class and particular record
# will remove any djs on that Object
#
def self.delete_jobs_for_object(klass, id)
jobs = Sidekiq::ScheduledSet.new
jobs.select do |job|
job.klass == 'Sidekiq::Extensions::DelayedClass' &&
((job_klass, job_method, args) = YAML.load(job.args[0])) &&
job_klass == klass &&
args[0] == id
end.map(&:delete)
end
Take a look at this: https://github.com/mhenrixon/sidekiq-unique-jobs
It's sidekiq with unique jobs added
Maybe you could use Queue Classic which enqueues jobs on a Postgres database (in a really open way), so it could be extended (open-source) to check for uniqueness before doing so.
I'm using Resque workers to process jobs in a queue. I have a large number of jobs (> 1M) in the queue, and some of them were added by mistake and need to be removed. Creating the queue with the jobs was not an easy task, so clearing the queue using resque-web and adding the correct jobs again is not an option for me.
Appreciate any advice. Thanks!
To remove a specific job from a queue you can use the destroy method. It's very easy to use.
For example, if you want to remove a job with class Post and id x from the queue named queue1,
you can do it like this:
Resque::Job.destroy('queue1', 'Post', 'x')
If you want to remove all the jobs of particular type from a queue you can use
Resque::Job.destroy(QueueName, ClassName)
You can find its documentation at
http://www.rubydoc.info/gems/resque/Resque%2FJob.destroy
In Resque's source (the Job class) there is a method that I guess is what you need :)
# Removes a job from a queue. Expects a string queue name, a
# string class name, and, optionally, args.
#
# Returns the number of jobs destroyed.
#
# If no args are provided, it will remove all jobs of the class
# provided.
#
# That is, for these two jobs:
#
# { 'class' => 'UpdateGraph', 'args' => ['defunkt'] }
# { 'class' => 'UpdateGraph', 'args' => ['mojombo'] }
#
# The following call will remove both:
#
# Resque::Job.destroy(queue, 'UpdateGraph')
#
# Whereas specifying args will only remove the 2nd job:
#
# Resque::Job.destroy(queue, 'UpdateGraph', 'mojombo')
#
# This method can be potentially very slow and memory intensive,
# depending on the size of your queue, as it loads all jobs into
# a Ruby array before processing.
def self.destroy(queue, klass, *args)
The above solutions work great if you know all of the arguments passed to the job. If you have a situation where you know some of the arguments passed to the job the following script will work:
queue_name = 'a_queue'
jobs = Resque.data_store.peek_in_queue(queue_name, 0, 500_000)
deleted_count = 0
jobs.each do |job|
decoded_job = Resque.decode(job)
if decoded_job['class'] == 'CoolJob' && decoded_job['args'].include?('a_job_argument')
Resque.data_store.remove_from_queue(queue_name, job)
deleted_count += 1
puts "Deleted!"
end
end
puts deleted_count
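Because the deletion is destructive, it can help to dry-run the matching rule on a few payloads first. The sketch below needs only the standard JSON library; matching_payloads is a hypothetical helper, and the sample payloads are made up, following the { 'class' => ..., 'args' => [...] } shape shown in the Resque source comment above.

```ruby
require "json"

# Returns the raw payloads whose class matches and whose args include `argument`.
def matching_payloads(raw_jobs, job_class, argument)
  raw_jobs.select do |raw|
    job = JSON.parse(raw)
    job["class"] == job_class && job["args"].include?(argument)
  end
end

# Made-up payloads in the same shape as queued Resque jobs.
raw_jobs = [
  JSON.generate({ "class" => "CoolJob", "args" => ["a_job_argument", 42] }),
  JSON.generate({ "class" => "CoolJob", "args" => ["something_else"] }),
  JSON.generate({ "class" => "OtherJob", "args" => ["a_job_argument"] })
]

to_delete = matching_payloads(raw_jobs, "CoolJob", "a_job_argument")
puts to_delete.size
```

Only the first payload matches here: the second has the right class but not the argument, and the third has the argument but a different class.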