Retry Sidekiq worker from within worker

In my app I am trying to perform two worker tasks sequentially.
First, a PDF is created with Wicked PDF; then, once the PDF exists, an email with the PDF attached is sent to two different recipients.
This is what is called in the controller:
PdfWorker.perform_async(@d.id)
MailingWorker.perform_in(1.minute, @d.id, @d.class.name.to_s)
First worker creates the PDF and second worker sends email.
Here is the second worker:
class MailingWorker
  include Sidekiq::Worker
  sidekiq_options retry: false

  def perform(d_id, model)
    @d = eval(model).find(d_id)
    @model = model
    if @d.pdf.present?
      ProfessionnelMailer.notification_d(@d).deliver
      ClientMailer.notification_d(@d).deliver
    else
      MailingWorker.perform_in(1.minute, @d.id, @model.to_s)
    end
  end
end
The if statement checks whether the PDF has been created. If it has, the two mails are sent; otherwise the same worker is scheduled again one minute later, just to give the Heroku server extra time to finish the PDF creation in case it takes longer or the queue is long.
But if the PDF has definitively failed to be generated, the above ends up in an infinite loop.
Is there a way to fix this?
One option I see is calling the second worker inside the PDF creation worker, though I don't really want to nest workers too deeply. Keeping them separate makes my controller clearer: I can see the sequence of actions. But any advice is welcome.
Another option is to use sidekiq_options retry: 5 and have each failed attempt count toward the total of 5 retries, instead of re-enqueueing the worker myself with MailingWorker.perform_in(1.minute, @d.id, @model.to_s), but I don't know how to do this. As per this thread https://github.com/mperham/sidekiq/issues/769 the way would be to raise an exception, but I am not sure how to do that. (I'm also not sure how long a retry waits before being processed with the exception approach; with the solution above I can control the time frame.)

If you do not want nested workers, then in MailingWorker, instead of enqueuing the job again, raise an exception if the PDF is not present.
Also, configure the worker's retry option, so that Sidekiq will push the job to the retry queue and run it again after some time. According to the documentation:
Sidekiq will retry failures with an exponential backoff using the formula (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1)) (i.e. 15, 16, 31, 96, 271, ... seconds + a random amount of time). It will perform 25 retries over approximately 21 days.
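To get a feel for those delays, you can evaluate the documented formula directly (illustrative snippet only):

5.times do |retry_count|
  delay = (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1))
  puts "retry ##{retry_count}: ~#{delay}s"
end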
The worker code will look more like this:
class MailingWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5

  def perform(d_id, model)
    @d = eval(model).find(d_id)
    @model = model
    if @d.pdf.present?
      ProfessionnelMailer.notification_d(@d).deliver
      ClientMailer.notification_d(@d).deliver
    else
      raise "PDF not present"
    end
  end
end
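If you also want to keep control over the delay between attempts (the questioner's concern), Sidekiq offers a per-worker sidekiq_retry_in hook that overrides the default backoff. A minimal sketch:

class MailingWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5

  # Wait a flat 60 seconds before each retry instead of the
  # exponential backoff, mirroring the original perform_in(1.minute).
  sidekiq_retry_in do |count|
    60
  end

  # perform as above ...
end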

I believe the "correct" and most asynchronous way to do this is to have two queues and two workers:
Queue 1: CreatePdfWorker
Queue 2: SendPdfWorker
When the CreatePdfWorker has generated the PDF, it enqueues the SendPdfWorker with the newly generated PDF and the recipients.
This way, each worker can work independently, pulling from its queue asynchronously, and you're not struggling against the design choices of Sidekiq.
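A minimal sketch of that handoff; generate_pdf! and the queue names here are illustrative, not from the question:

class CreatePdfWorker
  include Sidekiq::Worker
  sidekiq_options queue: :pdf

  def perform(d_id, model)
    d = model.constantize.find(d_id)
    d.generate_pdf!  # however the PDF actually gets built
    # Hand off to the mailing worker only once the PDF exists.
    SendPdfWorker.perform_async(d_id, model)
  end
end

class SendPdfWorker
  include Sidekiq::Worker
  sidekiq_options queue: :mailers, retry: 5

  def perform(d_id, model)
    d = model.constantize.find(d_id)
    ProfessionnelMailer.notification_d(d).deliver
    ClientMailer.notification_d(d).deliver
  end
end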

Related

How to make sidekiq_retries_exhausted available globally for all workers

I want to implement sidekiq_retries_exhausted in such a way that it triggers automatically when all retries have failed, and this behavior should apply to every worker in my codebase.
Currently I have to write sidekiq_retries_exhausted in each worker. Is there any way to have it trigger when all retries fail without writing the hook in every worker?
In short, I want to override the local sidekiq_retries_exhausted and make it globally available, like a kind of middleware, so that it fires automatically when a worker exhausts all its retries.
Currently it is implemented like this in every worker:
sidekiq_retries_exhausted do |msg, error|
  AlertOnRetriesExhaustedUtils.send_alert_on_retries_exhausted(msg, error)
end
Sidekiq v5.1 introduced a global death_handlers callback that fires when a job dies. From the documentation:
Death Notification
The sidekiq_retries_exhausted callback is specific to a Worker class. Starting in v5.1, Sidekiq can also fire a global callback when a job dies:
# this goes in your initializer
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, ex) do
    puts "Uh oh, #{job['class']} #{job["jid"]} just died with error #{ex.message}."
  end
end
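Applied to the helper from the question, the per-worker hook can be dropped and replaced by one global handler (assuming the helper is happy to receive the job hash and exception in place of the msg/error pair):

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.death_handlers << ->(job, ex) do
    # Fires for every worker whose retries are exhausted.
    AlertOnRetriesExhaustedUtils.send_alert_on_retries_exhausted(job, ex)
  end
end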

Unexpected sidekiq jobs get executed

I'm using sidekiq-cron to run some jobs. I have a parent job which only runs once, and that parent job starts 7 million child jobs. However, my Sidekiq dashboard says over 42 million jobs are enqueued. I checked those enqueued jobs; they are my child jobs. I'm trying to figure out why so many more jobs than expected are enqueued.
One thing I noticed in the Sidekiq log is that "Cron Jobs - add job with name: new_topic_post_job" shows up many times. new_topic_post_job is the name of the parent job in schedule.yml. The following lines also show up many times:
2019-04-18T17:01:22.558Z 12605 TID-osb3infd0 WARN: Processing recovered job from queue queue:low (queue:low_i-03933b94d1503fec0.nodemodo.com_4): "{\"retry\":false,\"queue\":\"low\",\"backtrace\":true,\"class\":\"WeeklyNewTopicPostCron\",\"args\":[],\"jid\":\"f37382211fcbd4b335ce6c85\",\"created_at\":1555606809.2025042,\"locale\":\"en\",\"enqueued_at\":1555606809.202564}"
2019-04-18T17:01:22.559Z 12605 TID-osb2wh8to WeeklyNewTopicPostCron JID-f37382211fcbd4b335ce6c85 INFO: start
WeeklyNewTopicPostCron is the name of the parent job class. Does this mean my parent job ran multiple times instead of only once? If so, what's the cause? I'm pretty sure the cron schedule is right; I set it to "0 17 * * 4", which means it runs only once a week. I also set retry to false for the parent job and 3 for the child jobs, so even if all the child jobs failed, we should still only have 21 million jobs. The following is my cron job setting in schedule.yml:
new_topic_post_job:
  cron: "0 17 * * 4"
  class: "WeeklyNewTopicPostCron"
  queue: low
and this is WeeklyNewTopicPostCron:
class WeeklyNewTopicPostCron
  include Sidekiq::Worker
  sidekiq_options queue: :low, retry: false, backtrace: true

  def perform
    processed_user_ids = Set.new
    TopicFollower.select("id, user_id").find_in_batches(batch_size: 1000000) do |topic_followers|
      new_user_ids = []
      topic_followers.map(&:user_id).each { |user_id| new_user_ids << user_id if processed_user_ids.add?(user_id) }
      batch_size = 1000
      offset = 0
      loop do
        batched_user_ids_for_redis = new_user_ids[offset, batch_size]
        Sidekiq::Client.push_bulk('class' => NewTopicPostSender,
          'args' => batched_user_ids_for_redis.map { |user_id| [user_id, 7] }) if batched_user_ids_for_redis.present?
        break if batched_user_ids_for_redis.size < batch_size
        offset += batch_size
      end
    end
  end
end
Most probably your parent Sidekiq job is causing the Sidekiq process to crash, which then results in a worker restart. On restart, Sidekiq probably tries to recover the interrupted job and starts processing it again from the beginning. Some details here:
https://github.com/mperham/sidekiq/wiki/Reliability#recovering-jobs
This probably happens multiple times before the parent job eventually finishes, hence the extremely high number of child jobs. You can easily verify this by checking the process id of the Sidekiq process while this job is being run; it will most probably keep changing after a while:
ps aux | grep sidekiq
It could be that you have some monit configuration that restarts Sidekiq when memory usage goes too high. Or it might be that this query is causing the process to crash:
TopicFollower.select("id, user_id").find_in_batches(batch_size: 1000000)
Try reducing the batch_size; 1 million feels like too high a number. But my best guess is that the Sidekiq process dies while processing the long-running parent job.
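As a side note, the manual offset loop can be replaced with each_slice, which also makes a smaller batch size easy to experiment with. A sketch of the same enqueueing logic (this simplifies the code, but is not by itself a fix for the duplicate enqueues):

def perform
  processed_user_ids = Set.new
  # A much smaller read batch keeps memory bounded.
  TopicFollower.select("id, user_id").find_in_batches(batch_size: 10_000) do |topic_followers|
    new_user_ids = topic_followers.map(&:user_id).select { |id| processed_user_ids.add?(id) }
    new_user_ids.each_slice(1000) do |batch|
      Sidekiq::Client.push_bulk(
        'class' => NewTopicPostSender,
        'args'  => batch.map { |user_id| [user_id, 7] }
      )
    end
  end
end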

Sidekiq and unqueueing in ruby

I am currently enabling queueing in one of my apps written in Ruby. I use Sidekiq and have defined a worker class as below:
class Worker
  include Sidekiq::Worker

  def perform(params)
    sleep 10
    @@logger.info("--------------------------")
    @@logger.info("Request received")
    @@logger.info("--------------------------")
  end
end
I am calling it from my main entry point, 'api.rb':
post '/receive/?' do
  @@logger.debug("/retrieve endpoint received")
  Worker.perform_async @params
end
This works fine: each time the sleep finishes, the next queued task starts.
In my case, I need to dequeue and start the next queued item only when I decide; it will be triggered by an external event.
In my 'api.rb', I have added:
post '/response/?' do
  next_task
end
The way the code works is that '/receive' can queue 10 requests. The first request triggers a specific action (a POST command sent to a server).
I expect the remote server to send me back a request through '/response' to tell me that the action is finished. When this response is received, I use the next_task API to remove the previous task, which was running and is now complete, and move to the next queued one.
Any idea how to create a custom trigger to dequeue and start the next job? Is there a signal that lets me stop Sidekiq from dequeuing until I send it?
Thanks
To delete a job in a Sidekiq queue, you would have to iterate the whole queue; it is not an idiomatic use of the queue.
I am afraid I don't understand exactly what you are trying to do. Just remember that you can store state outside of the Sidekiq queue; for example, you can have a model for the job:
post '/receive/?' do
  job = Job.create(@params)
  Worker.perform_in(10.seconds)
end
and then in the worker:
def perform
  job = Job.find_oldest_unexecuted
  if job
    job.execute!
  else
    # wait another 10 seconds
    Worker.perform_in(10.seconds)
  end
end
And in Job:
class Job < ActiveRecord::Base
  def self.find_oldest_unexecuted
    where(executed_at: nil).order(:id).first
  end

  def execute!
    # do what needs to be done here
    update_attribute(:executed_at, Time.zone.now)
  end
end
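To make the dequeue driven purely by the external event (rather than the 10-second polling), one variation is to drop the self-re-enqueue in the worker and have '/response' kick it instead. A sketch building on the hypothetical Job model above:

post '/response/?' do
  # The remote server says the previous task finished;
  # run the next unexecuted Job, if any.
  Worker.perform_async
end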

How to mark a sidekiq task/job for retry without raising an error?

I use a Sidekiq queue to process communications with an unreliable, 3rd party API. Since this API is often down for a couple minutes at a time and then back up again, Sidekiq has been handy. When a connection issue happens, an error is raised and Sidekiq throws the job back in the queue to be retried again later, after some time has passed.
I use NewRelic not only to help debug crashes, but also for monitoring. My problem is that the methodology above creates errors in NewRelic. If the 3rd party API is down for more than a couple of minutes, the error count accumulates enough to trigger notifications through NewRelic.
What I'd like to do is raise an error from my worker only once a certain number of retries have occurred for a job. I'm using sidekiq_retries_exhausted to do this. My problem is that I'm not quite sure how to put jobs back in the queue after they fail without raising an error.
Does Sidekiq provide any facilities to return a job to a queue, increment the number of retries for the job, and have it sit there until it's due to run again, as if an exception was raised in the worker class?
You raise a specific error and tell the error service to ignore errors of that type. For NewRelic:
https://docs.newrelic.com/docs/agents/ruby-agent/installation-configuration/ruby-agent-configuration#error_collector.ignore_errors
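Per the linked configuration page, that is a newrelic.yml setting; a sketch using the key named in the link (newer agent versions call this error_collector.ignore_classes):

# newrelic.yml
error_collector:
  ignore_errors: "RetryNotAnError"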
Here is what I did to keep intentional retry errors out of Airbrake:
class TaskWorker
  include Sidekiq::Worker

  class RetryNotAnError < RuntimeError
  end

  def perform(task_id)
    task = Task.find(task_id)
    task.do_cool_stuff
    if task.finished?
      @log.debug "Task #{task_id} was successful."
      return false
    else
      @log.debug "Task #{task_id} will try again later."
      raise RetryNotAnError, task_id
    end
  end
end
Tell Airbrake to ignore it:
Airbrake.configure do |config|
  config.ignore << 'RetryNotAnError'
end
It's good to make your exception name OBVIOUSLY not an error (e.g. RetryLaterNotAnError), as it will still show up in logs and such, and you don't want to freak people out when they see a bunch of them.
P.S. That said, I would really like to see Sidekiq provide an explicit, errorless retry mechanism.
If you're using Sidekiq Enterprise, one other option is to use the optional set of additional error types that are then treated as Sidekiq::Limiter::OverLimit violations.
For my purposes, I've used a new error class and added it to the list in the config. Here are the notes from the sidekiq-ent code (not in the public sidekiq repo) on how to modify your config file:
# An optional set of additional error types which would be
# treated as a rate limit violation, so the job would automatically
# be rescheduled as with Sidekiq::Limiter::OverLimit.
#
# Sidekiq::Limiter.errors << MyApp::TooMuch
# Sidekiq::Limiter.errors = [Foo::Error, MyApp::Limited]
Inside the specific job you can specify the max_retries, or it will default to 20:
sidekiq_options max_limiter_retries: 10
Inside the job, I'll rescue the "expected" intermittent error that I'd rather not ignore completely and then raise the error I've added to the list, something like this:
def perform(*args)
  # ... call the unreliable service here ...
rescue RestClient::RequestTimeout => e
  raise SidekiqSoftRetry.new(e.inspect)
end
Here's what that looks like in my initialization file. Mike Perham was kind enough to respond with the option to update the global retry limit:
class SidekiqSoftRetry < RuntimeError
end

Sidekiq::Limiter::DEFAULT_OPTIONS[:reschedule] = 10

Sidekiq::Limiter.configure do |config|
  config.errors.concat(
    [
      SidekiqSoftRetry,
    ]
  )
end

Rails threads testing db lock

Let's say we want to test that the database is being locked:
$transaction = Thread.new {
  Rails.logger.debug 'transaction process start'
  Inventory.transaction do
    inventory.lock!
    Thread.stop
    inventory.units_available = 99
    inventory.save
  end
}

$race_condition = Thread.new {
  Rails.logger.debug 'race_condition process start'
  config = ActiveRecord::Base.configurations[Rails.env].symbolize_keys
  config[:flags] = 65536 | 131072 | Mysql2::Client::FOUND_ROWS
  begin
    connection = Mysql2::Client.new(config)
    $transaction.run
    $transaction.join
  rescue NoMethodError
  ensure
    connection.close if connection
  end
}

Rails.logger.debug 'main process start'
$transaction.join
Rails.logger.debug 'main process after transaction.join'
sleep 0.1 while $transaction.status != 'sleep'
Rails.logger.debug 'main process after sleep'
$race_condition.join
Rails.logger.debug 'main process after race_condition.join'
In theory, I'd expect it to run the transaction thread, then wait (Thread.stop); the main process would then see that it's sleeping and start the race-condition thread (which would try to alter data in the locked table). The race-condition thread would then resume the transaction thread once it was done.
What's weird is the trace:
main process start
transaction process start
race_condition process start
Coming from Node.js, it seems like Ruby threads aren't exactly as user-friendly... though there has to be a way to get this done.
Is there an easier way to lock the database and then try to change it from a different thread?
Thread.new automatically starts the thread, but that does not mean it is executing. That depends on the operating system, Ruby vs. JRuby, how many cores you have, etc.
In your example the main thread runs until $transaction.join, and only then does your transaction thread start, just by chance. It runs until Thread.stop; then your $race_condition thread starts, because the other two are blocked (it might also have started before).
That explains your log.
You have two $transaction.join calls. Each waits until the thread exits, but a thread can only exit once. I don't know what happens then; maybe the second call waits forever.
For your test, you need some sort of explicit synchronization, so that your race thread writes exactly when the transaction thread is in the middle of the transaction. You can do this with Mutex, but better would be some sort of message passing. The following blog post may help:
http://www.engineyard.com/blog/2011/a-modern-guide-to-threads/
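For the message-passing flavor, Ruby's thread-safe Queue can serve as the explicit synchronization point. A minimal sketch of the intended interleaving, adapted to the question's variables (the table name and the 1-second lock-wait timeout are assumptions, not from the question):

ready = Queue.new
done  = Queue.new

transaction = Thread.new do
  Inventory.transaction do
    inventory.lock!
    ready << :locked   # tell the race thread the row lock is held
    done.pop           # keep the lock until the race thread has tried its write
    inventory.units_available = 99
    inventory.save
  end
end

race = Thread.new do
  ready.pop            # block until the lock is definitely held
  begin
    client = Mysql2::Client.new(config)
    client.query("SET innodb_lock_wait_timeout = 1")
    # This conflicting write should time out because the row is locked.
    client.query("UPDATE inventories SET units_available = 0 WHERE id = #{inventory.id}")
  rescue Mysql2::Error => e
    Rails.logger.debug "blocked as expected: #{e.message}"
  ensure
    client&.close
    done << :tried     # let the transaction thread finish and commit
  end
end

[transaction, race].each(&:join)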
To make any resource mutually exclusive, use the Mutex class and its synchronize method so the resource stays locked while one thread is using it. You have to do something like this:
semaphore = Mutex.new
and use it inside the Thread instances:
$transaction = Thread.new {
  semaphore.synchronize {
    # Do whatever you want with *your shared resource*
  }
}
This way the threads cannot touch the shared resource at the same time.
Hope this helps.
