Poking around in the rails code, I ran across with_scope.
From what I can tell, it takes the scope type and conditions, merges them into existing conditions for that scope type, yields to the block, then gets rid of the additional scope.
So my first thought is in a multithreaded environment (like jruby on rails), what happens if while thread 1 is executing the block, thread 2 decides to do a Model.find :all? It seems to me like a race condition waiting to happen.
Am I missing something?
So the trick in here is that if you trace deep enough, the scopes are getting set through Thread.current[method], which will execute method but only in the scope of the current thread. I didn't even know that was possible for ruby... guess you learn something new every day
Related
I am using OmniThread Parallel.foreach(). There are instances where the loop takes a long time or gets stuck.
I would like to know, is it possible to timeout each process in the Parallel.foreach() loop?
In short: Nope, there isn't.
Unless you program the timeout handling in your 'thread body' code (what gets called in the execute).
eg my database engine allows sending a CancelProcessing call to running queries from a different thread that runs the query, this would 'cleanly' end the running subthread.
'Dirty' end of the subthreads:
I added a FR to Omnithread's github site to add an (Dirty) Terminate method to the IOmniParallel interfaces (and alikes). Which has is drawback because killing subthreads will probably leave you with memory/resource leaks.
Meanwile you might use this dirty shutdown solution/workaround wich actually comes down fixing a similar problem (I had a deadlock in my parallel processed routine, so my parallel.Waitfor never returned true, and worse my IOmniParallelTask interface variable was never released causing the calling thread to block as well.
When you use ActiveRecord::Base.connection.execute(sql_string), should you call clear on the result in order to free memory?
At 19:09 in this podcast, the speaker (a Rails committer who has done a lot of work on Active Record) says that if we use ActiveRecord::Base.connection.execute, we should call clear on the result, or we should use the method ActiveRecord::Base.connection.execute_and_clear, which takes a block.
(He’s a bit unclear on the method names. The method for the MySQL adapter is free and the method for the Postgres adapter is clear. He also mentions release, but that method doesn't exist.)
My understanding is that he's saying we should change
result = ActiveRecord::Base.connection.execute(sql_string).to_a
process_result(result)
to
ActiveRecord::Base.connection.execute_and_clear(sql_string, "SCHEMA", []) do |result|
process_result(result)
end
or
result = ActiveRecord::Base.connection.execute(sql_string)
process_result(result)
result.clear
That podcast was the only place I've heard this claim, and I couldn't find any other information about it. The Rails app I'm working on uses execute without clear in a number of instances, and we don't know of any problems caused by it. Are there certain circumstances under which failing to call clear is more likely to cause memory problems?
It depends on the adapter. Keep in mind that Rails doesn't control the object that is returned by execute. If you're using PostgreSQL, you'll get back a PG::Result, and using the mysql2 adapter, you'll get back a Mysql2::Result.
For PG (documented here), you need to call clear unless autoclear? returns true or you'll get a memory leak. You may also want to call clear manually if you've got a large enough result set to ensure it doesn't cause memory issues before it gets cleaned up.
Mysql2 doesn't appear to expose its free through the Ruby API, and appears to always clean itself up during GC.
Related to Run rails code after an update to the database has commited, without after_commit, but I think deserving its own question.
If I have code like this:
my_instance = MyModel.find(1)
MyModel.transaction do
my_instance.foo = "bar"
my_instance.save!
end
new_instance = MyModel.find(1)
puts new_instance.foo
Is this a guarantee that new_instance.foo will always output "bar" and not its previous value? I'm looking for a way to ensure that all the database actions that occur in a previous statement are committed BEFORE executing my next statements. Rails has an after_commit hook for this, but I don't want this code executed every time... only in this specific context.
I can't find anything in the documentation on Transactions that would indicate if Transaction blocks are "blocking". If they are blocking, that will satisfy my requirement. Unfortunately, I can't think of a practical way to test this behavior to confirm my suspicions one way or another.
Still researching this, but I think a transaction does block code execution until after the database confirms that it has written. Since "save!" is automatically wrapped in a transaction by Rails, the relevant code should run synchronously. The extra transaction block should be unnecessary.
I don't think Rails returns as soon as it hands off the call to the DB when the DB calls are within a transaction. The confusion I had was with after_save callbacks. After_save callbacks suffer from race conditions because they are in fact part of the transaction that saves are automatically wrapped in, so any code called by an after_save callback is not race condition safe, it is not protected by the transaction. Only after_commit calls are safe. Within the transaction Rails will hand off to the DB and then execute after_save callbacks before the DB has finished committing.
Studying this for more insights:
https://github.com/rails/rails/blob/bfdd3c2182156fa2cb81ed4f048b065a2d6f1341/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb
UPDATE
Changing my answer to "no". It doesn't appear that save! or save blocks execution. From these two resources, looks like this is a common problem:
https://github.com/resque/resque/wiki/FAQ#how-do-you-make-a-resque-job-wait-for-an-activerecord-transaction-commit
https://blog.engineyard.com/2011/the-resque-way
I have the following code in a rails model:
foo = Food.find(...)
foo.with_lock do
if bar = foo.bars.find_by_stuff(stuff)
# do something with bar
else
bar = foo.bars.create!
# do something with bar
end
end
The goal is to make sure that a Bar of the type being created is not being created twice.
Testing with_lock works at the console confirms my expectations. However, in production, it seems that in either some or all cases the lock is not working as expected, and the redundant Bar is being attempted -- so, the with_lock doesn't (always?) result in the code waiting for its turn.
What could be happening here?
update
so sorry to everyone who was saying "locking foo won't help you"!! my example initially didin't have the bar lookup. this is fixed now.
You're confused about what with_lock does. From the fine manual:
with_lock(lock = true)
Wraps the passed block in a transaction, locking the object before yielding. You pass can the SQL locking clause as argument (see lock!).
If you check what with_lock does internally, you'll see that it is little more than a thin wrapper around lock!:
lock!(lock = true)
Obtain a row lock on this record. Reloads the record to obtain the requested lock.
So with_lock is simply doing a row lock and locking foo's row.
Don't bother with all this locking nonsense. The only sane way to handle this sort of situation is to use a unique constraint in the database, no one but the database can ensure uniqueness unless you want to do absurd things like locking whole tables; then just go ahead and blindly try your INSERT or UPDATE and trap and ignore the exception that will be raised when the unique constraint is violated.
The correct way to handle this situation is actually right in the Rails docs:
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/find_or_create_by
begin
CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
retry
end
("find_or_create_by" is not atomic, its actually a find and then a create. So replace that with your find and then create. The docs on this page describe this case exactly.)
Why don't you use a unique constraint? It's made for uniqueness
A reason why a lock wouldn't be working in a Rails app in query cache.
If you try to obtain an exclusive lock on the same row multiple times in a single request, query cached kicks in so subsequent locking queries never reach the DB itself.
The issue has been reported on Github.
We have an asynchronous task that performs a potentially long-running calculation for an object. The result is then cached on the object. To prevent multiple tasks from repeating the same work, we added locking with an atomic SQL update:
UPDATE objects SET locked = 1 WHERE id = 1234 AND locked = 0
The locking is only for the asynchronous task. The object itself may still be updated by the user. If that happens, any unfinished task for an old version of the object should discard its results as they're likely out-of-date. This is also pretty easy to do with an atomic SQL update:
UPDATE objects SET results = '...' WHERE id = 1234 AND version = 1
If the object has been updated, its version won't match and so the results will be discarded.
These two atomic updates should handle any possible race conditions. The question is how to verify that in unit tests.
The first semaphore is easy to test, as it is simply a matter of setting up two different tests with the two possible scenarios: (1) where the object is locked and (2) where the object is not locked. (We don't need to test the atomicity of the SQL query as that should be the responsibility of the database vendor.)
How does one test the second semaphore? The object needs to be changed by a third party some time after the first semaphore but before the second. This would require a pause in execution so that the update may be reliably and consistently performed, but I know of no support for injecting breakpoints with RSpec. Is there a way to do this? Or is there some other technique I'm overlooking for simulating such race conditions?
You can borrow an idea from electronics manufacturing and put test hooks directly into the production code. Just as a circuit board can be manufactured with special places for test equipment to control and probe the circuit, we can do the same thing with the code.
SUppose we have some code inserting a row into the database:
class TestSubject
def insert_unless_exists
if !row_exists?
insert_row
end
end
end
But this code is running on multiple computers. There's a race condition, then, since another processes may insert the row between our test and our insert, causing a DuplicateKey exception. We want to test that our code handles the exception that results from that race condition. In order to do that, our test needs to insert the row after the call to row_exists? but before the call to insert_row. So let's add a test hook right there:
class TestSubject
def insert_unless_exists
if !row_exists?
before_insert_row_hook
insert_row
end
end
def before_insert_row_hook
end
end
When run in the wild, the hook does nothing except eat up a tiny bit of CPU time. But when the code is being tested for the race condition, the test monkey-patches before_insert_row_hook:
class TestSubject
def before_insert_row_hook
insert_row
end
end
Isn't that sly? Like a parasitic wasp larva that has hijacked the body of an unsuspecting caterpillar, the test hijacked the code under test so that it will create the exact condition we need tested.
This idea is as simple as the XOR cursor, so I suspect many programmers have independently invented it. I have found it to be generally useful for testing code with race conditions. I hope it helps.