Related to "Run rails code after an update to the database has committed, without after_commit", but I think it deserves its own question.
If I have code like this:
my_instance = MyModel.find(1)
MyModel.transaction do
  my_instance.foo = "bar"
  my_instance.save!
end
new_instance = MyModel.find(1)
puts new_instance.foo
Does this guarantee that new_instance.foo will always output "bar" and not its previous value? I'm looking for a way to ensure that all the database actions in a previous statement are committed BEFORE my next statements execute. Rails has an after_commit hook for this, but I don't want that code executed every time... only in this specific context.
I can't find anything in the documentation on Transactions that indicates whether transaction blocks are "blocking". If they are blocking, that will satisfy my requirement. Unfortunately, I can't think of a practical way to test this behavior to confirm my suspicions one way or the other.
Still researching this, but I think a transaction does block code execution until the database confirms that it has written. Since save! is automatically wrapped in a transaction by Rails, the relevant code should run synchronously, and the extra transaction block should be unnecessary.
I don't think Rails returns as soon as it hands the call off to the DB when the DB calls are inside a transaction. The confusion I had was with after_save callbacks. after_save callbacks suffer from race conditions because they are in fact part of the transaction that saves are automatically wrapped in, so any code called from an after_save callback runs before the commit and is not race-condition safe; only after_commit callbacks are safe. Within the transaction, Rails hands off to the DB and then executes after_save callbacks before the DB has finished committing.
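To make that ordering concrete, here is a minimal sketch reusing MyModel from the question (ApplicationRecord assumes Rails 5+; use ActiveRecord::Base on older versions):

class MyModel < ApplicationRecord
  # Fires inside the transaction, BEFORE the database commits:
  after_save   { Rails.logger.info "after_save: still inside the transaction" }
  # Fires only after the outermost transaction has committed:
  after_commit { Rails.logger.info "after_commit: COMMIT has happened" }
end

MyModel.transaction do
  my_instance = MyModel.find(1)
  my_instance.foo = "bar"
  my_instance.save!  # the after_save line logs here, pre-COMMIT
end                  # COMMIT happens here, then after_commit fires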
Studying this for more insights:
https://github.com/rails/rails/blob/bfdd3c2182156fa2cb81ed4f048b065a2d6f1341/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
UPDATE
Changing my answer to "no". It doesn't appear that save! or save blocks execution until the commit. Judging from these two resources, this looks like a common problem:
https://github.com/resque/resque/wiki/FAQ#how-do-you-make-a-resque-job-wait-for-an-activerecord-transaction-commit
https://blog.engineyard.com/2011/the-resque-way
Related
I have around 10 workers that performs a job that includes the following:
user = User.find_or_initialize_by(email: 'some-email@address.com')
if user.new_record?
  # ... some code here that does something taking around 5 seconds or so
elsif user.persisted?
  # ... some code here that does something taking around 5 seconds or so
end
user.save
The problem is that at certain times two or more workers run this code at exactly the same moment, and I later found out that two or more Users ended up with the same email, when I should only ever end up with unique emails.
It is not possible in my situation to create a plain DB unique index on email, as email uniqueness is conditional -- some Users must have a unique email and some need not.
It is worth mentioning that my User model has uniqueness validations, but they still don't help, because between .find_or_initialize_by and .save there is code that depends on whether the user object has already been created or not.
I tried pessimistic and optimistic locking, but neither helped me, or maybe I just didn't implement them properly... if you have suggestions on that front, I'm all ears.
The only solution I can think of is to lock the other threads (Sidekiq jobs) whenever these lines of code get executed, but I am not sure how to implement that, nor do I know whether it is even an advisable approach.
I would appreciate any help.
EDIT
In my specific case, it is going to be hard to pass the email in as a job parameter, as this job is a little more complex than what was described above. The job is actually an export script, and the code above is only one section of it. I also don't think it's possible to extract the functionality above into a separate worker... the whole job flow needs to be serial, with no parts processed in parallel / asynchronously. This job is just one of the jobs managed by another job, which is ultimately managed by the master job.
Pessimistic locking is what you want, but it only works on a record that already exists - you can't use it on a new_record? because there's nothing in the DB to lock yet.
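If the record already exists, a minimal sketch of that branch might look like the following (hedged: with_lock opens a transaction and reloads the row under SELECT ... FOR UPDATE, so concurrent workers serialize on it, but only once the row is in the DB):

user = User.find_by(email: 'some-email@address.com')
if user
  user.with_lock do
    # the ~5-second "already created" work goes here; other workers
    # trying to lock this row block until this transaction commits
    user.save!
  end
end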
I managed to solve my problem with the following:
I found out that I can add a WHERE clause to a database unique index - a partial index - and thus I can now set up conditional uniqueness for the different types of Users at the database level, so that concurrent jobs now raise ActiveRecord::RecordNotUnique if the email has already been created.
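For reference, such a partial unique index migration might look like the sketch below (PostgreSQL supports partial indexes; the requires_unique_email column in the where: clause is an assumption standing in for whatever condition distinguishes the Users that need unique emails):

class AddPartialUniqueIndexToUsersEmail < ActiveRecord::Migration[5.2] # adjust to your Rails version
  def change
    add_index :users, :email,
              unique: true,
              where: "requires_unique_email = TRUE",
              name: "index_users_on_email_partial_unique"
  end
end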
The remaining problem is the code between .find_or_initialize_by and .save, since it depends on the state of the User object: exactly one concurrent job should get .new_record? == true, while the other concurrent jobs should see .persisted? == true, because one job is always first to create the record. That doesn't work on its own, though, because the database uniqueness index is only checked at the .save line. I therefore solved it by moving .save ahead of those conditions (capturing new_record? / persisted? just before the save) and adding a rescue around .save that re-enqueues the job itself whenever ActiveRecord::RecordNotUnique is raised, so that async jobs don't conflict. The code now looks like the following.
user = User.find_or_initialize_by(email: 'some-email@address.com')
begin
  # Capture the state BEFORE saving: after a successful save the record
  # would always be persisted, so checking afterwards would lose this.
  is_new_record = user.new_record?
  is_persisted = user.persisted?
  user.save
rescue ActiveRecord::RecordNotUnique => exception
  # Another concurrent job created this user first; re-enqueue and retry.
  MyJob.perform_later(params_hash)
else
  if is_new_record
    # do something if not yet created
  elsif is_persisted
    # do something if already created
  end
end
I would suggest a different architecture to bypass the problem.
How about a producer-worker model, where one master Sidekiq process gets a list of email addresses, and then spawns a worker Sidekiq process for each email? Sidekiq makes this easy with a dedicated queue for master and workers to communicate.
Doing so, the email address becomes an input parameter of each worker, so we know by construction that workers will not stomp on each other's data.
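A rough sketch of that split (all job, queue, and method names here are assumptions, not from the original code):

class MasterJob
  include Sidekiq::Worker

  def perform
    # fetch_email_list is a stand-in for however the emails are obtained
    fetch_email_list.each do |email|
      UserWorker.perform_async(email) # one worker per email
    end
  end
end

class UserWorker
  include Sidekiq::Worker

  def perform(email)
    user = User.find_or_initialize_by(email: email)
    # ... the ~5-second work; no other worker is handed this email,
    # so by construction there is no race on it
    user.save
  end
end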
I'm trying to battle some race conditions with my background task manager. Essentially, I have a Thing object (which already exists), assign it some properties, and then save it. After it is saved with the new properties, I queue a job in Resque, passing in the ID.
thing = Thing.find(1)
puts thing.foo # outputs "old value"
thing.foo = "new value"
thing.save
ThingProcessor.queue_job(thing.id)
The background job will load the object from the database using Thing.find(thing_id).
The problem is that we've found Resque is so fast at picking up the job and loading the Thing object from the ID that it loads a stale object. So within the job, calling thing.foo will still return "old value" maybe 1 in 100 times (not real data, but it does not happen often).
We know this is a race condition, because Rails returns from thing.save before the data has actually been committed to the database (PostgreSQL in this case).
Is there a way in Rails to only execute code AFTER a database action has committed? Essentially I want to make sure that by the time Resque loads the object, it is getting the freshest object. I know this can be achieved with an after_commit hook on the Thing model, but I don't want it there. I only need this to happen in this one specific context, not every time the model commits changes to the DB.
You can wrap it in an explicit transaction and queue the job after the block, like the example below:
thing = Thing.find(1)
Thing.transaction do
  puts thing.foo # outputs "old value"
  thing.foo = "new value"
  thing.save
end
# by this point the transaction has committed
ThingProcessor.queue_job(thing.id)
Update: there is a gem that provides an after-transaction callback, with which you may solve your problem. Here is the link:
http://xtargets.com/2012/03/08/understanding-and-solving-race-conditions-with-ruby-rails-and-background-workers/
What about wrapping the transaction in a begin/rescue so that the job is queued up only upon success of the transaction?
I had a similar issue, whereby I needed to ensure that a transaction had committed before running a series of actions. I ended up using this gem:
https://github.com/Envek/after_commit_everywhere
It meant that I could do the following:
def finalize!
  Order.transaction do
    payment.charge!
    # ...
    # Ensure that we only send out items and perform after-actions when
    # the order has definitely been completed
    after_commit { OrderAfterFinalizerJob.perform_later(self) }
  end
end
One gem to allow that is https://github.com/Ragnarson/after_commit_queue
It is a little different from the other answer's after_commit_everywhere gem: the after_commit_everywhere call seems decoupled from whether the current model is saved or not.
So it might be what you expect or not expect, depending on your use case.
Here is my before_save method:
before_save :check_postal

def check_postal
  first_three = self.postal_code[0..2]
  first_three.downcase!
  postal = Postal.find_by_postal_code(first_three)
  if postal
    self.zone_id = postal.zone_id
  else
    PostalError.create(postal_code: self.postal_code)
    return false
  end
end
Everything runs fine when self.zone_id = postal.zone_id, but inside the else branch, PostalError.create(postal_code: self.postal_code) doesn't save the record to the database.
I know it has something to do with the return false statement, because when I remove it, it saves fine -- but then that defeats the purpose.
How can I get a new PostalError record to save while returning false to prevent the current object from saving?
You're exactly right: the problem is the before_save.
The entirety of the save process is wrapped in a transaction. If the save fails, whether because of a validation failure, an exception being raised, or something else, the transaction is rolled back. This undoes the creation of your PostalError record.
Normally this is a good thing - it means incomplete saves don't leave detritus around.
I can think of two ways to solve this. One is to not create the record there at all: use an after_rollback hook to create it once the danger has passed.
The other way is to create that record using a different database connection (since transactions are a per connection thing). An easy way to do that is to use a different thread:
Thread.new { PostalError.create(...) }.join
I stuck the join on there so that this waits for the thread to complete rather than adding a degree of concurrency to your app that you might not expect.
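For completeness, a hedged sketch of the after_rollback approach (the model name Shipment is an assumption, and this assumes after_rollback fires when the halted save rolls its transaction back):

class Shipment < ActiveRecord::Base
  before_save :check_postal
  after_rollback :record_postal_error

  def check_postal
    first_three = postal_code[0..2].downcase
    postal = Postal.find_by_postal_code(first_three)
    if postal
      self.zone_id = postal.zone_id
    else
      @bad_postal_code = postal_code
      false # halts the save (use `throw :abort` on Rails 5+)
    end
  end

  def record_postal_error
    # Runs after the save's transaction has rolled back, so this create
    # is no longer swallowed by the rollback.
    PostalError.create(postal_code: @bad_postal_code) if @bad_postal_code
  end
end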
I don't know, but I'm trying to guess at the solution:
else
  PostalError.create(postal_code: self.postal_code)
  self.zone_id = postal.zone_id
end
Is there any way for a before_save filter to halt the entire save without halting the transaction? What I'm trying to do is have a "sample" version of my model that the user can interact with and save, but whose changes are never actually persisted. The following will halt the transaction and (naturally) return false when I call #model.update_attributes:
before_save :ignore_changes_if_sample

def ignore_changes_if_sample
  if self.sample?
    return false
  end
end
Thanks!
That's precisely what's happening here. If you look at your SQL log, you should see BEGIN and then COMMIT, with nothing between them. The before_save is not halting the transaction; it's simply preventing the record from being saved by returning false.
To more generally answer your question, records that fail to persist do not halt transactions unless they also raise an exception. Exceptions trigger the ROLLBACK that prevents any part of the transaction from being committed. So even if you return false here, a larger, overarching transaction should continue just fine.
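A small illustrative sketch of that behavior (model and attribute names assumed):

ActiveRecord::Base.transaction do
  sample = Widget.find(sample_id)       # a record where sample? is true
  sample.update_attributes(name: "new") # => false; save halted, no exception
  other = Widget.find(other_id)         # a regular record
  other.update_attributes(name: "new")  # => true; saved normally
end                                     # COMMIT, not ROLLBACK: other's change persists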
You can read more about transactions and how Rails uses them in the ActiveRecord::Transactions documentation.
We have an asynchronous task that performs a potentially long-running calculation for an object. The result is then cached on the object. To prevent multiple tasks from repeating the same work, we added locking with an atomic SQL update:
UPDATE objects SET locked = 1 WHERE id = 1234 AND locked = 0
The locking is only for the asynchronous task. The object itself may still be updated by the user. If that happens, any unfinished task for an old version of the object should discard its results as they're likely out-of-date. This is also pretty easy to do with an atomic SQL update:
UPDATE objects SET results = '...' WHERE id = 1234 AND version = 1
If the object has been updated, its version won't match and so the results will be discarded.
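For reference, assuming an ActiveRecord model over the objects table (called MyObject here), both guarded updates can be written with update_all, which returns the number of rows affected, so 0 means this task lost the race:

# Acquire the task lock only if nobody else holds it:
got_lock = MyObject.where(id: 1234, locked: 0).update_all(locked: 1)
# got_lock == 0 means another task already holds the lock.

# Store results only if the object is still at the version we started from:
stored = MyObject.where(id: 1234, version: 1).update_all(results: '...')
# stored == 0 means the object changed underneath us; discard the results.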
These two atomic updates should handle any possible race conditions. The question is how to verify that in unit tests.
The first semaphore is easy to test, as it is simply a matter of setting up two different tests with the two possible scenarios: (1) where the object is locked and (2) where the object is not locked. (We don't need to test the atomicity of the SQL query as that should be the responsibility of the database vendor.)
How does one test the second semaphore? The object needs to be changed by a third party some time after the first semaphore but before the second. This would require a pause in execution so that the update may be reliably and consistently performed, but I know of no support for injecting breakpoints with RSpec. Is there a way to do this? Or is there some other technique I'm overlooking for simulating such race conditions?
You can borrow an idea from electronics manufacturing and put test hooks directly into the production code. Just as a circuit board can be manufactured with special places for test equipment to control and probe the circuit, we can do the same thing with the code.
Suppose we have some code inserting a row into the database:
class TestSubject
  def insert_unless_exists
    if !row_exists?
      insert_row
    end
  end
end
But this code is running on multiple computers. There's a race condition, then, since another process may insert the row between our existence check and our insert, causing a DuplicateKey exception. We want to test that our code handles the exception that results from that race condition. In order to do that, our test needs to insert the row after the call to row_exists? but before the call to insert_row. So let's add a test hook right there:
class TestSubject
  def insert_unless_exists
    if !row_exists?
      before_insert_row_hook
      insert_row
    end
  end

  def before_insert_row_hook
  end
end
When run in the wild, the hook does nothing except eat up a tiny bit of CPU time. But when the code is being tested for the race condition, the test monkey-patches before_insert_row_hook:
class TestSubject
  def before_insert_row_hook
    insert_row
  end
end
Isn't that sly? Like a parasitic wasp larva that has hijacked the body of an unsuspecting caterpillar, the test hijacked the code under test so that it will create the exact condition we need tested.
This idea is as simple as the XOR cursor, so I suspect many programmers have independently invented it. I have found it to be generally useful for testing code with race conditions. I hope it helps.
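For completeness, a hedged RSpec sketch of driving that hook from a test (this assumes insert_unless_exists is meant to rescue the adapter's duplicate-key exception, which DuplicateKey stands in for above):

it "handles a row inserted between the existence check and the insert" do
  record = TestSubject.new
  # Patch just this instance: the "other process" wins the race inside
  # the hook, so the outer insert_row hits the duplicate-key error.
  def record.before_insert_row_hook
    insert_row
  end
  expect { record.insert_unless_exists }.not_to raise_error
end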