More Advanced Control Over Delayed Job workers retry

So I'm using Delayed::Job workers (on Heroku) in an after_create callback after a user creates a certain model.
A common pattern, it turns out, is for users to create something and then immediately delete it (likely because they made a mistake).
When this happens the workers fire up, but by the time they query for the model it's already deleted. Because of the auto-retry feature, this ill-fated job will retry 25 times and can never succeed.
Is there any way I can catch certain errors and, when they occur, prevent that specific job from ever retrying again, while still letting other errors retry as usual?

Abstract the checks into the function you call with delayed_job. Make the relevant checks on whether your desired job can proceed or not, and either do the work or return successfully.

To expand on David's answer, instead of doing this:
def after_create
  self.send_later :spam_this_user
end
I'd do this:
# user.rb
def after_create
  Delayed::Job.enqueue SendWelcomeEmailJob.new(self.id)
end

# send_welcome_email_job.rb
class SendWelcomeEmailJob < Struct.new(:user_id)
  def perform
    user = User.find_by_id(self.user_id)
    return if user.nil? # user must have been deleted
    # do stuff with user
  end
end
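As a side note, recent versions of delayed_job also accept options on enqueue, so the same job object could be scheduled with a delay or a priority if you ever need that (values here are illustrative):
Delayed::Job.enqueue SendWelcomeEmailJob.new(self.id), run_at: 5.minutes.from_now, priority: 10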

Related

Sidekiq mailer job accesses the db before the model has been saved

Probably the title is not self-explanatory; the situation is this:
# user.points: 0
user.update!(points: 1000)
UserMailer.notify(user).deliver_later # user.points = 0 => Error !!!!
The user instance is updated, and after that the mailer is called with the user as a parameter, but in the email the changes are missing: user.points = 0 instead of 1000.
However, with a sleep 1 just after the update! the email is sent with the changes applied, so it seems that the email job is faster than the database write:
# user.points: 0
user.update!(points: 1000)
sleep 1
UserMailer.notify(user).deliver_later # user.points = 1000 => OK
What's the best approach to solve this, avoiding these two possible solutions?
One solution could be calling UserMailer.notify not with the user instance but with the user values.
Another solution could be sending the mail in an after_commit callback on the user.
So, is there another way to solve this, keeping the user instance as the parameter and avoiding the after_commit callback?
Thanks
Remember, Sidekiq runs a copy of your Rails app in a separate process, using Redis as the medium. When you call deliver_later, it does not actually 'pass' user to the mailer job. It spawns a thread that enqueues the job in Redis, passing a serialized hash of user properties, including the ID.
When the mailer job runs in the Sidekiq process, it loads a fresh copy of user from the database. If the transaction containing your update! in the main Rails app has not yet finished committing, Sidekiq gets the old record from the database. So, it's a race condition.
(update! already wraps an implicit transaction around itself if there isn't one, so wrapping it in your own transaction is redundant, and doesn't help the race condition since nested ActiveRecord transactions commit only when the outermost transaction commits.)
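To make that nesting point concrete, here is an illustrative shape of the race (not code from the question):
User.transaction do                      # outer transaction, e.g. opened by a callback
  user.update!(points: 1000)             # joins the outer transaction; nothing committed yet
  UserMailer.notify(user).deliver_later  # job can run before the outer COMMIT and read points = 0
end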
In a pinch, you could delay enqueuing the job with something hacky like .deliver_later(wait_until: 10.seconds.from_now), but your best bet is to put the mailer notification in an after_commit callback on your model.
class User < ApplicationRecord
  after_commit :send_points_mailer

  def send_points_mailer
    return unless previous_changes.include?(:points)
    UserMailer.notify(self).deliver_later
  end
end
A model's after_commit callbacks are guaranteed to run after the final transaction is committed, so, like nuking from orbit, it's the only way to be sure.
You didn't mention it, but I'm assuming you are using ActiveRecord? If so, you likely need to ensure that the database transaction is committed before your Sidekiq job is scheduled.
https://api.rubyonrails.org/v6.1.4/classes/ActiveRecord/Transactions/ClassMethods.html
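A minimal sketch of that idea, assuming you control the transaction boundaries: commit first, then enqueue, so the Sidekiq process is guaranteed to read the new row.
# user.points: 0
User.transaction do
  user.update!(points: 1000)
end
# The transaction has committed by this point, so the mailer job's
# fresh read of the user will see points = 1000.
UserMailer.notify(user).deliver_later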

Rails : ActiveJob : Retry same job with some delay

I am using Active Job and it works wonderfully well. As I was playing around, I noticed something and I'm looking for improvements.
I have a job like:
class SomeJob < ApplicationJob
  queue_as :default

  def perform(param)
    # if condition then re-try after x minutes
    if condition
      self.class.set(wait: x.minutes).perform_later(param)
      return
    end
    # something else
  end
end
Upon some condition, I am trying to re-schedule the current job with the same original parameters after a delay of x minutes. The scheduling works great, but I observed a nuance at the database level and would like an improvement.
The issue is that a new job is created: a new row in the db table. Instead, I'd like it to behave as the same job, just with some added delay (basically I want to re-schedule the same current job, with the same parameters, obviously).
I do realize that raising an error will probably do the trick as far as working on the same job is concerned. One nice thing about that is that the attempts count gets incremented too. But I'd like to be able to just add a delay before the job runs again (the same job, without creating a new one).
How can I do this? Thanks.
Yes, you'll want to retry rather than enqueue a new job. Look at the retry customizations available via the class method retry_on.
Changing your code, it could look like:
class SomeJob < ApplicationJob
  queue_as :default

  retry_on RetrySomeJobException, wait: x.minutes

  def perform(param)
    raise RetrySomeJobException if condition
    # Do the work!
  end
end
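For what it's worth, retry_on also accepts an attempt limit and a growing back-off, e.g. (values here are illustrative):
retry_on RetrySomeJobException, wait: :exponentially_longer, attempts: 5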

sidekiq perform_in(delay) from within the worker ignores the delay

Users in my app create Transactions, and these transactions (and the associated jobs created for changing the transaction state to ignored when users don't respond within a certain time) need to cancel themselves unless the user performs a pay action.
In one example, the method I am using makes the following calls with perform_async after a state changes to approved, and then cancels the job if the user responds in time:
class Transaction < ApplicationRecord
  # when approved
  def create_worker
    MyWorker.perform_async(self.id)
  end

  # if user responds in time, cancel the jobs and update the record to `paid` etc
  def cancel_worker
    jid = MyWorker.perform_async(self.id)
    MyWorker.cancel! jid
  end
end
As suggested here and here, I'm putting additional functionality about when to cancel inside the worker. It looks something like this:
class MyWorker
  include Sidekiq::Worker

  def perform(transaction_id)
    return if paid?
    transaction = Transaction.find transaction_id
    self.class.perform_in(1.minutes, transaction.ignore!)
  end

  def paid?
    Sidekiq.redis { |c| c.exists("paid-#{jid}") }
  end

  def self.cancel! jid
    Sidekiq.redis { |c| c.setex("paid-#{jid}", 86400, 1) }
  end
end
This code results in the following terminal output:
2018-12-16T01:40:50.645Z 30530 TID-oxm547nes MyWorker JID-6c97e448fe30998235dee95d INFO: start
Changing transaction 4 approved to ignored (event: ignore!)
2018-12-16T01:40:50.884Z 30530 TID-oxm547nes MyWorker JID-6c97e448fe30998235dee95d INFO: done: 0.239 sec
2018-12-16T01:41:56.122Z 30530 TID-oxm547oag MyWorker JID-b46bb3b002e00f480a04be16 INFO: start
2018-12-16T01:41:56.125Z 30530 TID-oxm547oag MyWorker JID-b46bb3b002e00f480a04be16 INFO: fail: 0.003 sec
2018-12-16T01:41:56.126Z 30530 TID-oxm547oag WARN: {"context":"Job raised exception","job":{"class":"MyWorker","args":[true],"retry":true,"queue":"default","jid":"b46bb3b002e00f480a04be16","created_at":1544924450.884224,"enqueued_at":1544924516.107598,"error_message":"Couldn't find Transaction with 'id'=true","error_class":"ActiveRecord::RecordNotFound","failed_at":1544924516.125679,"retry_count":0},"jobstr":"{\"class\":\"MyWorker\",\"args\":[true],\"retry\":true,\"queue\":\"default\",\"jid\":\"b46bb3b002e00f480a04be16\",\"created_at\":1544924450.884224,\"enqueued_at\":1544924516.107598}"}
So this creates two jobs: one with a jid of 6c97e448fe30998235dee95d which immediately sets the Transaction to ignored, and one with a jid of b46bb3b002e00f480a04be16 which blows right past the early return in the worker's perform method (because it doesn't use the same jid as the first job).
One reason I can surmise for why this does not work the way I intend is that the call to MyWorker.cancel! cannot get the jid of the worker I want to cancel without first creating a db migration to hold said jid.
Is creating a db migration to contain the jid for a worker the preferred method for making sure that jid is accessible between actions? And how is id=true getting in there? As the error above says: Couldn't find Transaction with 'id'=true
OK, let's go piece by piece.
This code:
self.class.perform_in(1.minute, transaction.ignore!)
is passing whatever the ignore! method returns (in this case, true) as the argument for the job, which causes the exception.
You should make sure to pass the right arguments:
self.class.perform_in(1.minute, transaction.tap(&:ignore!).id)
Every time you call MyWorker.perform_async (or any other performing class method) you are creating a new job, so it's not surprising that you are not getting the same jid.
You should, as suggested, store the initial jid in the transactions table and then retrieve it upon payment to cancel the job; otherwise the job id is lost. An alternative is to use the same Redis to store the paid flag, but keyed by the transaction instead: c.exists("paid-#{transaction.id}")
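A rough sketch of that transaction-keyed alternative (hypothetical; note that redis-rb 4.2+ provides exists?, while older clients only have exists returning a boolean):
class MyWorker
  include Sidekiq::Worker

  def perform(transaction_id)
    # Skip the whole job if the user paid in the meantime.
    return if self.class.paid?(transaction_id)
    Transaction.find(transaction_id).ignore!
  end

  def self.paid?(transaction_id)
    Sidekiq.redis { |c| c.exists?("paid-#{transaction_id}") }
  end

  # Called from the pay action; no jid bookkeeping required.
  def self.cancel!(transaction_id)
    Sidekiq.redis { |c| c.setex("paid-#{transaction_id}", 86_400, 1) }
  end
end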
Your code does not wait 1 minute to ignore the transaction, it just ignores the transaction right away and sets itself to execute again in 1 minute.
You probably want to call
jid = MyWorker.perform_in(1.minute, transaction.id)
directly from the create_worker method.
UPDATE
If, as I imagine, you are using some kind of persistent state machine, it's even easier to just "ignore unless complete" and forget about cancelling the job
class Transaction
  # I'm inventing a DSL here
  include SomeStateMachine

  state :accepted do
    event :ignore, to: :ignored
    event :confirm, to: :confirmed
  end
  state :ignored
  state :confirmed

  def create_worker
    # no need to track it
    MyWorker.perform_in(1.minute, id)
  end
end
class MyWorker
  include Sidekiq::Worker

  def perform(id)
    transaction = Transaction.find(id)
    transaction.ignore! if transaction.can_ignore?
  end
end
You can let your job run, and it will happily skip any non-ignorable transaction.

How to prevent parallel Sidekiq jobs from executing code in Rails

I have around 10 workers that perform a job that includes the following:
user = User.find_or_initialize_by(email: 'some-email@address.com')
if user.new_record?
  # ... some code here that does something taking around 5 seconds or so
elsif user.persisted?
  # ... some code here that does something taking around 5 seconds or so
end
user.save
The problem is that at certain times, two or more workers run this code at exactly the same time, and thus I later found out that two or more Users had the same email, when I should always end up with only unique emails.
It is not possible in my situation to create a DB unique index for email, as email uniqueness is conditional: some Users must have unique emails, some need not.
It is worth mentioning that my User model has uniqueness validations, but they still don't help me because, between .find_or_initialize_by and .save, there is code that depends on whether the user object is already created or not.
I tried pessimistic and optimistic locking, but neither helped me, or maybe I just didn't implement them properly... I'd welcome suggestions on that.
The only solution I can think of is to lock out the other threads (Sidekiq jobs) whenever these lines of code get executed, but I am not sure how to implement this, nor do I know if it is even an advisable approach.
I would appreciate any help.
EDIT
In my specific case, it is going to be hard to put the email parameter in the job, as this job is a little more complex than described above. The job is actually an export script, of which the code above is just one section. I also don't think it's possible to split the functionality above into a separate worker, as the whole job flow has to be serial: no parts should be processed in parallel / asynchronously. This job is just one of the jobs managed by another job, which is ultimately managed by the master job.
Pessimistic locking is what you want but only works on a record that exists - you can't use it with new_record? because there's nothing to lock in the DB yet.
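To illustrate the point (a sketch, not from the original answer): ActiveRecord's with_lock issues a SELECT ... FOR UPDATE, which presupposes an existing row to lock.
user = User.find_by(email: 'some-email@address.com')
# Only possible when the record already exists; a not-yet-saved
# record has no row for the database to lock.
user&.with_lock do
  # other workers block here until this transaction commits
end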
I managed to solve my problem with the following:
I found out that I can add a where clause to a Rails database unique index (a partial index), and thus I can set up the conditional uniqueness for the different types of Users at the database level, so that concurrent jobs now raise ActiveRecord::RecordNotUnique if the record has already been created.
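For reference, a partial unique index migration might look like this (the where condition is hypothetical; it should match whatever makes an email 'conditionally unique' in your schema):
class AddPartialUniqueIndexToUsersEmail < ActiveRecord::Migration[6.0]
  def change
    # Enforce uniqueness only for the subset of users that need it.
    add_index :users, :email, unique: true, where: "requires_unique_email = true"
  end
end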
That leaves the code between .find_or_initialize_by and .save, which depends on whether the User object is new or already persisted: only one concurrent job should ever get .new_record? == true, and the others should see .persisted? == true, since one job will always be first to create the record. None of that works on its own, though, because the unique-index check only fires at the .save line. So I moved .save ahead of those conditions, and wrapped it in a rescue for ActiveRecord::RecordNotUnique that re-enqueues the job, to make sure concurrent jobs don't conflict. The code now looks like this:
user = User.find_or_initialize_by(email: 'some-email@address.com')
# Capture this before saving; after a successful save it is always false.
is_new_record = user.new_record?

begin
  user.save
rescue ActiveRecord::RecordNotUnique
  # Another job created the same user first; re-enqueue and bail out.
  MyJob.perform_later(params_hash)
  return
end

if is_new_record
  # do something if not yet created
else
  # do something if already created
end
I would suggest a different architecture to bypass the problem.
How about a producer-worker model, where one master Sidekiq process gets a list of email addresses, and then spawns a worker Sidekiq process for each email? Sidekiq makes this easy with a dedicated queue for master and workers to communicate.
Doing so, the email address becomes an input parameter of the workers, so we know by construction that workers will not step on each other's data.
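A hypothetical sketch of that split (class names and the email source are invented for illustration):
class MasterJob
  include Sidekiq::Worker

  def perform
    # Assumption: the export knows up front which emails it must process.
    emails_to_process.each do |email|
      PerEmailWorker.perform_async(email)
    end
  end
end

class PerEmailWorker
  include Sidekiq::Worker

  # Exactly one job per email, so find_or_initialize_by never races
  # against another worker on the same address.
  def perform(email)
    user = User.find_or_initialize_by(email: email)
    if user.new_record?
      # ... slow work for a brand-new user
    else
      # ... slow work for an existing user
    end
    user.save
  end
end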

Requeue or evaluate delayed job from inside?

Is there a way to determine the status of a running delayed_job job from inside the job task itself? I have a job that interacts with a service that can be pretty flaky and for a certain class of connection failures I'd like to requeue the job and only raise an exception if a connection failure occurs again at the retry limit.
Pseudo-code to demonstrate what I want to be able to do:
def do_thing
  service.send_stuff(args)
rescue Exception1, Exception2
  if job.retries == JOBS_MAX
    raise
  else
    job.requeue
  end
end
I don't want to raise an exception on any failure because generally the job will be completed okay on a later retry and it is just making noise for me. I do want to know if it is never completed, though.
Define a custom job for DJ, setting a number for max_attempts and behavior for the error callback. This is untested, but it might look something like this:
class DoThingJob
  def max_attempts
    @max_attempts ||= 5
  end

  def error(job, exception)
    case exception
    when Exception1, Exception2
      # will be requeued automatically until max_attempts is reached
      # can add an extra log message here if desired
    else
      @max_attempts = job.attempts
      # this will cause DJ to fail the job and not try again
    end
  end
end
NOTE
I started writing this before @pdobb posted his answer. I'm posting it anyway because it provides some more detail about how you might handle the exceptions and requeue logic.
As you've said, if the Delayed Job runner gets to the end of the perform method, the run is considered successful and the job is removed from the queue. So you just have to stop it from getting to the end. There isn't a requeue -- even if there were, it'd be a new record with new attributes. So you may want to rethink whatever it is that causes the job to notify you about exceptions. You could, for example, add a condition upon which to notify you...
Potential Solutions
You can get the default JOBS_MAX (as you pseudo-coded it) with Delayed::Worker.max_attempts, or you can set your own per-job by defining a max_attempts method:
# Fail permanently after the 10th failure for this job
def max_attempts
  10
end
When this method is defined on your payload object, Delayed Job uses it in place of the global default.
You can also make use of callback hooks. Delayed Job will call back to your payload object via the error method if it is defined, so you may be able to use the error method to notify yourself of actual exceptions beyond a given attempt number. To do that...
Within the callback, the Delayed::Job object itself is passed as the first argument:
def error(job, exception)
  job.attempts # gives you the current attempt number
  # If job.attempts is greater than max_attempts then send an exception
  # notification, or whatever you want here...
end
So you can use the callbacks to start adding logic on when to notify yourself and when not to. I might even suggest making a base set of functionality that you can include into all of your payload objects to do these things... but that's up to you and your design.
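A hypothetical sketch of such a base module (names invented for illustration):
module JobErrorNotifications
  # Shared Delayed Job settings/callbacks for payload objects.
  def max_attempts
    10
  end

  def error(job, exception)
    # Mirror the check suggested above: stay quiet on intermediate
    # failures, notify only once the job is out of attempts.
    notify_exception(job, exception) if job.attempts >= max_attempts
  end

  private

  def notify_exception(job, exception)
    # plug in your notifier of choice here (logs, email, APM, ...)
  end
end

class DoThingJob
  include JobErrorNotifications

  def perform
    service.send_stuff(args)
  end
end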
