Run Rails code after an update to the database has committed, without after_commit - ruby-on-rails

I'm trying to battle some race conditions with my background task manager. Essentially, I have a Thing object (which already exists), assign it some properties, and then save it. After it is saved with the new properties, I queue it in Resque, passing in the ID.
thing = Thing.find(1)
puts thing.foo # outputs "old value"
thing.foo = "new value"
thing.save
ThingProcessor.queue_job(thing.id)
The background job will load the object from the database using Thing.find(thing_id).
The problem is that we've found Resque picks up the job and loads the Thing object from the ID so quickly that it sometimes loads a stale object. So within the job, calling thing.foo will still return "old value" maybe 1 in 100 times (not real data, but it does not happen often).
We know this is a race condition, because Rails returns from thing.save before the data has actually been committed to the database (PostgreSQL in this case).
Is there a way in Rails to only execute code AFTER a database action has committed? Essentially I want to make sure that by the time Resque loads the object, it is getting the freshest version. I know this can be achieved using an after_commit hook on the Thing model, but I don't want it there. I only need this to happen in this one specific context, not every time the model commits changes to the DB.

You can wrap the update in a transaction, as in the example below. Note that thing is loaded outside the block so it is still in scope when the job is queued after the transaction commits:
thing = Thing.find(1)
puts thing.foo # outputs "old value"
Thing.transaction do
  thing.foo = "new value"
  thing.save
end
# by this point the transaction has committed
ThingProcessor.queue_job(thing.id)
Update: there is a gem that adds an "after transaction" hook, which may solve your problem. Here is the link:
http://xtargets.com/2012/03/08/understanding-and-solving-race-conditions-with-ruby-rails-and-background-workers/

What about wrapping a begin/rescue around the transaction, so that the job is queued only upon success of the transaction?
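A minimal sketch of that suggestion, reusing the Thing and ThingProcessor names from the question (the choice of rescued error is an assumption):
thing = Thing.find(1)
begin
  Thing.transaction do
    thing.foo = "new value"
    thing.save!
  end
  # reached only if the transaction committed successfully
  ThingProcessor.queue_job(thing.id)
rescue ActiveRecord::RecordInvalid => e
  # the save failed and the transaction rolled back, so nothing was queued
  Rails.logger.warn("Thing #{thing.id} not queued: #{e.message}")
end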

I had a similar issue, where I needed to ensure that a transaction had committed before running a series of actions. I ended up using this gem:
https://github.com/Envek/after_commit_everywhere
It meant that I could do the following:
def finalize!
  Order.transaction do
    payment.charge!
    # ...
    # Ensure that we only send out items and perform after-actions once the order has definitely been completed
    after_commit { OrderAfterFinalizerJob.perform_later(self) }
  end
end

One gem that allows this is https://github.com/Ragnarson/after_commit_queue
It is a little different from the after_commit_everywhere gem in the other answer. The after_commit_everywhere callback seems to run regardless of whether the current model was actually saved.
So it might or might not be what you expect, depending on your use case.

Related

How to prevent parallel Sidekiq jobs from executing code in Rails

I have around 10 workers that perform a job that includes the following:
user = User.find_or_initialize_by(email: 'some-email#address.com')
if user.new_record?
  # ... some code here that does something taking around 5 seconds or so
elsif user.persisted?
  # ... some code here that does something taking around 5 seconds or so
end
user.save
The problem is that at certain times, two or more workers run this code at exactly the same time, and I later found out that two or more Users ended up with the same email, when I should only ever end up with unique emails.
It is not possible in my situation to create a DB unique index for email, as email uniqueness is conditional -- some Users must have a unique email, some do not.
It is worth mentioning that my User model has a uniqueness validation, but it still doesn't help, because between .find_or_initialize_by and .save there is code that depends on whether the user object has already been created or not.
I tried pessimistic and optimistic locking, but they didn't help me, or maybe I just didn't implement them properly... I'd welcome suggestions on that as well.
The only solution I can think of is to lock the other threads (Sidekiq jobs) out whenever these lines of code are being executed, but I am not sure how to implement this, nor do I know whether it is even an advisable approach.
I would appreciate any help.
EDIT
In my specific case, it's going to be hard to put the email parameter in the job, as this job is a little more complex than what was described above. The job is actually an export script, and the code above is just one section of it. I don't think it's possible to split the functionality above into a separate worker either... the whole job flow should be serial, and no parts should be processed in parallel / asynchronously. This job is just one of the jobs managed by another job, which is ultimately managed by the master job.
Pessimistic locking is what you want, but it only works on a record that already exists - you can't use it with new_record? because there's nothing in the DB to lock yet.
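For illustration, a minimal sketch of pessimistic locking on a record that already exists (the email placeholder is from the question; the work inside the block is assumed):
user = User.find_by(email: 'some-email#address.com')
if user
  user.with_lock do
    # the row is now locked (SELECT ... FOR UPDATE); other transactions trying
    # to lock it will block until this transaction commits
    # ... the ~5 second work from the question ...
    user.save!
  end
end
# if the user does not exist yet there is no row to lock, which is why this
# can't protect the new_record? branch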
I managed to solve my problem with the following:
I found out that I can add a where clause to a Rails DB unique index (a partial index), so I can now set up the uniqueness conditions for different types of Users at the database level, and any other concurrent job will now raise ActiveRecord::RecordNotUnique if the record has already been created.
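For reference, a partial unique index migration might look something like this (requires_unique_email is a made-up stand-in for whatever condition separates the two kinds of Users):
class AddPartialUniqueIndexOnUsersEmail < ActiveRecord::Migration[5.2]
  def change
    # only rows matching the where clause take part in the uniqueness check
    add_index :users, :email, unique: true, where: "requires_unique_email = true"
  end
end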
The only remaining problem is the code between .find_or_initialize_by and .save, since that code depends on the state of the User object: only one concurrent job should ever see .new_record? == true, and the other concurrent jobs should see .persisted? == true, because one job will always be first to create the record. But that doesn't work on its own, because it is only at the .save line that the DB unique index is actually checked. I solved this by moving .save before those conditions, and adding a rescue around .save that enqueues another copy of the same job if ActiveRecord::RecordNotUnique is raised, so that concurrent jobs don't conflict. The code now looks like this:
user = User.find_or_initialize_by(email: 'some-email#address.com')
# capture these before saving; after a successful save, new_record? is always false
is_new_record = user.new_record?
is_persisted = user.persisted?
begin
  user.save
rescue ActiveRecord::RecordNotUnique => exception
  # another concurrent job created the same user first; re-enqueue this job so it can retry
  MyJob.perform_later(params_hash)
end
if is_new_record
  # do something if not yet created
elsif is_persisted
  # do something if already created
end
I would suggest a different architecture to bypass the problem.
How about a producer-worker model, where one master Sidekiq process gets a list of email addresses, and then spawns a worker Sidekiq process for each email? Sidekiq makes this easy with a dedicated queue for master and workers to communicate.
Doing so, the email address becomes an input parameter of the workers, so we know by construction that workers will not stomp on each other's data.
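A minimal sketch of that layout; the class names and the fetch_emails_to_export helper are illustrative, not from the original post:
class ExportMasterJob
  include Sidekiq::Worker

  def perform
    emails = fetch_emails_to_export # assumed helper returning the list of addresses
    emails.each { |email| ExportUserJob.perform_async(email) }
  end
end

class ExportUserJob
  include Sidekiq::Worker

  def perform(email)
    user = User.find_or_initialize_by(email: email)
    # each worker only ever touches its own email address, so concurrent jobs
    # cannot create duplicates of the same user
    user.save
  end
end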

How to check if a Resque job has finished

I have a scenario where I need to run multiple record updates in the background (using Resque), and I want to give the user a visual indicator of how the task is going (e.g. started/running/finished).
One way of achieving this (which I can think of) is saving the current state into a table, then showing the state to the user on a simple page refresh.
Can anyone suggest a better way of doing it? I want to avoid creating a whole migration, model and controller for this.
Thanks
As I've commented, the resque-status gem could be useful for you. I am not sure if that counts as an answer, but since you said that you do not want to create a migration, model and controller for this, a gem might be the way to go.
From the job id you can get the status you are looking for, for example:
status = Resque::Plugins::Status::Hash.get(job_id)
status.working? #=> true
There is also a front-end called resque-web, check that out too.
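For context, a job using resque-status might be defined roughly like this (RecordUpdateJob and the ids option are illustrative; check the gem's README for the exact API):
class RecordUpdateJob
  include Resque::Plugins::Status

  def perform
    ids = options['ids']
    ids.each_with_index do |id, index|
      # ... update the record for this id ...
      at(index + 1, ids.size, "updated #{index + 1} of #{ids.size}")
    end
  end
end

job_id = RecordUpdateJob.create(ids: [1, 2, 3])
# this job_id is what you pass to Resque::Plugins::Status::Hash.get above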
You could use a Ruby global variable, e.g. $var_name = 'foo'. I am not sure about it, because globals are considered bad practice in Rails, but in this case I think they are reasonable, as long as the name is very unique.
It can be done like (in case of resque):
class UpdateJob
  @queue = :data

  def self.perform
    $my_job_name_is_running = true
    MyJobName.new.run
    $my_job_name_is_running = nil
  end
end
then you can access them from anywhere in the app:
while $my_job_name_is_running
  puts "job is running..." if $my_job_name_is_running
  sleep 3 # important to not overload your processor
end
Ruby global vars are not very popular. Check docs for more info https://ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/globalvars.html

Sidekiq method to store some data in a sqlite3 database, rather than redis?

I am currently working on a Rails project, and I was asked to save the progress of Sidekiq workers and store it so the user of the application can see it. Now I am faced with this dilemma: is it better to just write it out to a text file, or save it in a database?
If it is a database, then how do I save it in a model object? I know we can store the progress of workers by just sending the info to a log file:
class YourWorker
  include Sidekiq::Worker

  def perform
    logger.info { "Things are happening." }
    logger.debug { "Here's some info: #{hash.inspect}" }
  end
end
So if I want to save the progress of workers in a data model, how do I do that?
Your thread title says that the data is unstructured, but your problem description indicates that the data should be structured. Which is it? Speed is not always the most important consideration, and it doesn't seem to be very important in your case. The most important consideration is how your data will be used. Will the way your data is used in the future change? Usually, a database with an appropriate model is the better answer because it allows flexibility for future requirements. It also allows other clients access to your data.
You can create a Job class and then update some attribute of the currently working job.
class Job < ActiveRecord::Base
  # assume that there is a 'status' attribute defined as 'text'
end
Then when you queue something to happen you create a new Job and pass the id of the Job to perform or perform_async.
job = Job.create!
YourWorker.perform_async job.id
Then in your worker, you'd receive the id of the job to be worked on, and then retrieve and update that record.
def perform(job_id)
  job = Job.find(job_id)
  job.status = "It's happening!"
  job.save
end
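As a rough sketch, the migration backing that Job model could look like this (the status column comes from the comment above; the rest is assumed):
class CreateJobs < ActiveRecord::Migration[5.2]
  def change
    create_table :jobs do |t|
      t.text :status
      t.timestamps
    end
  end
end
A controller or view can then read job.status on each page refresh (or via polling) to show the progress to the user.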

Do Rails transaction blocks exit after all actions have been committed?

Related to "Run Rails code after an update to the database has committed, without after_commit", but I think it deserves its own question.
If I have code like this:
my_instance = MyModel.find(1)
MyModel.transaction do
  my_instance.foo = "bar"
  my_instance.save!
end
new_instance = MyModel.find(1)
puts new_instance.foo
Is this a guarantee that new_instance.foo will always output "bar" and not its previous value? I'm looking for a way to ensure that all the database actions in a previous statement are committed BEFORE executing my next statements. Rails has an after_commit hook for this, but I don't want that code executed every time... only in this specific context.
I can't find anything in the documentation on transactions that indicates whether transaction blocks are "blocking". If they are blocking, that satisfies my requirement. Unfortunately, I can't think of a practical way to test this behaviour to confirm my suspicions one way or the other.
Still researching this, but I think a transaction does block code execution until after the database confirms that it has written. Since save! is automatically wrapped in a transaction by Rails, the relevant code should run synchronously. The extra transaction block should be unnecessary.
I don't think Rails returns as soon as it hands the call off to the DB when the DB calls are within a transaction. The confusion I had was with after_save callbacks. after_save callbacks suffer from race conditions because they run inside the transaction that saves are automatically wrapped in, before the data is committed, so any code called by an after_save callback is not race-condition safe. Only after_commit callbacks are safe: within the transaction, Rails hands off to the DB and then executes after_save callbacks before the DB has finished committing.
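To make that distinction concrete, here is a minimal sketch (model and worker names are borrowed from the related question, purely for illustration):
class Thing < ActiveRecord::Base
  # runs inside the save transaction, BEFORE COMMIT: a worker that picks the
  # job up immediately may still read the old value from the database
  after_save :enqueue_from_after_save

  # runs only AFTER COMMIT: by the time the worker loads the record, the new
  # value is visible to other connections
  after_commit :enqueue_from_after_commit

  def enqueue_from_after_save
    ThingProcessor.queue_job(id)
  end

  def enqueue_from_after_commit
    ThingProcessor.queue_job(id)
  end
end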
Studying this for more insights:
https://github.com/rails/rails/blob/bfdd3c2182156fa2cb81ed4f048b065a2d6f1341/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
UPDATE
Changing my answer to "no". It doesn't appear that save! or save blocks execution. From these two resources, it looks like this is a common problem:
https://github.com/resque/resque/wiki/FAQ#how-do-you-make-a-resque-job-wait-for-an-activerecord-transaction-commit
https://blog.engineyard.com/2011/the-resque-way

Delayed Jobs on Rails 2: can't save in "perform" method/how to know when job is done?

I'm using the Delayed Jobs plugin for Rails 2, and every time I try to modify a model and save it in the "perform" method required by Delayed Jobs, it fails (no error messages or anything; it's just listed as a failure in the database).
I have the "perform" method in one of my Rails model files (Video), and I'm passing an instance of that model (@video, let's say) to Delayed::Job.enqueue.
Is it a known issue that you can't do database modifications while in the queue? Am I doing something wrong? (It only fails when it tries to save, not when I'm actually changing the attributes, and that sounds like a database modification issue.)
If this IS expected: how can I fix it? I'm trying to set a "done" attribute to true, so I know when the model is ready for the next step. Is there some standard way to figure out when a delayed job is done?
EDIT: I have confirmed that calling perform standalone (without Delayed Jobs) has no problems with saving (no errors or warnings, or anything). When I call it through Delayed Jobs it fails IMMEDIATELY (no time out) the second it gets to the save line.
EDIT: Wait, I think I see what is going on: my "perform" is triggered by an after_create callback... which is all well and good, until I try to SAVE. It looks like when I save, perform is called AGAIN (while already in perform), and that doesn't fly with Delayed Jobs (nor should it). For some reason I thought after_create would only get called once (not after every save). Wait, a simple test just showed that that IS the case. Hrrm... So why is perform called twice when I save, and once when I don't, in Delayed Jobs?
My code:
after_create :start_transcodes
def start_transcodes
  Delayed::Job.enqueue self
end

def perform
  puts "performing"
  self.flash_status = 100
  self.save!
  puts "done"
end
What I see:
performing
performing
2 jobs processed at 3.3406 j/s, 2 failed ...
I don't see it say "done" ever.
What I DO see in my rails log is:
"* [JOB] Video failed with NameError: undefined local variable or method `flush_deletes' for #<Paperclip::Attachment:0xb6e51da0> - 2 failed attempts
undefined local variable or method `flush_deletes' for #<Paperclip::Attachment:0xb6e51da0>"
I am using the paperclip plugin for this class, and I can call save all day (even in that perform method) and get no problems. I ALSO can call save (again, even in perform) all day and not see my after_create method called more than once -- UNLESS I'm using Delayed Job. (Might it be doing some sort of auto retry?)
I'm gonna look around my paperclip plugin, see what's going on...
If your save fails, it's got nothing to do with delayed_job (at least it shouldn't be, unless the save takes longer than MAX_RUN_TIME).
Try diagnosing the problem with the save by not using delayed_job.
Also take a look at the delayed_job.log file in your logs
Okay, I'm not sure EXACTLY what was happening, but I've made a skeleton "TranscodeJob" class in my lib directory. This class gets initialized with a reference to the video I want it to process, processes it, saves it, and plays nicely with Delayed Job.
Basically, it looks like passing my entire complicated Video object (complete with paperclip plugins) to Delayed Job was freaking things out, and passing a simple object with no more info than it needs makes things much easier.
Below is the code I used, which works just fine (and since it works fine, I can add my long-running code back little by little and confirm it continues to do so -- it worked fine before, it just hiccuped on saving):
class TranscodeJob
  def initialize(video_id)
    @video_id = video_id
  end

  # delayed_job's expected method
  def perform
    @video = Video.find(@video_id)
    @video.flash_status = 100
    @video.save!
  end
end
This code is STILL called from an after_create filter, and I'm not seeing it called twice, so it looks like I mistook Delayed Job's auto-retry for recursion, or something.
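For completeness, a hedged sketch of how that lightweight job might be enqueued from the model, using the classic delayed_job enqueue API (the Video class body here is assumed):
class Video < ActiveRecord::Base
  after_create :start_transcodes

  def start_transcodes
    # enqueue the small TranscodeJob instead of the full Video object
    Delayed::Job.enqueue TranscodeJob.new(id)
  end
end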
