Why does a sleep between ActiveRecord#first calls change its behaviour?

I've inherited some code written in ruby 1.8.7 with Rails ~> 2.3.15 and it contains a test which looks something like this:
class Wibble < ActiveRecord::Base
  # Wibbles have integer primary keys and string names
end

def test
  create_two_wibbles
  w1 = Wibble.first
  sleep 2 # This sleep is necessary to
  w2 = Wibble.first
  w1.name = "Name for w1"
  w2.name = "Name for w2"
  w1.save
  w3 = Wibble.first
  assert(!w3.update_attributes(w2.attributes))
end
That comment next to the sleep line hasn't been cut off; it literally says "This sleep is necessary to". Without that sleep, this test fails - removing it changes the behaviour somehow, beyond making it run 2 seconds faster. I've been digging through the file's history in our version control system, and the commit messages were uninformative. I also cannot contact the original author of this test to figure out what they were trying to do.
From my understanding, we're pulling the same record out of the database twice, editing it in two different ways, saving the first, and asserting that we can't then save the second. I suspect this is a test to make sure our database locks the table correctly, but surely if this were to fail our Wibbles would be fine, and ActiveRecord would be at fault. Nobody in the office can figure out why this test may have been necessary, nor what difference the sleep statement might make. Any ideas?

This is likely caused by Active Record caching and memoization stopping the second find from actually going to the DB and getting fresh data.
In fact, try printing the object_id of each wibble; they might be the exact same memoized object. If that is the case, then the test kinda makes sense.
You could also run the same test in a controller action and look at the verbose SQL logs from Rails; I'd expect the second find call to tell you it just used cached data and did not actually run any SQL query.
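One quick way to check (a minimal sketch, assuming it's the Rails 2.3 query cache that's interfering; uncached is a standard ActiveRecord class method):
w1 = Wibble.first
w2 = Wibble.first
puts w1.object_id # if these two ids match, the second find
puts w2.object_id # never went back to the database

# Bypass the query cache to force two genuinely fresh reads
Wibble.uncached do
  w1 = Wibble.first
  w2 = Wibble.first
end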

Related

Records disappear during Rails tests when calling PORO (transaction issue?)

I have an existing application (Rails 6) with a set of tests (minitest). I've just converted my tests to use factory_bot instead of fixtures but I'm having a strange problem with records created and confirmed in the test and controller being unavailable in a PORO that does the actual work. This problem occurs inconsistently and never seems to happen when I run an individual test, only when tests are run in bulk (e.g. a whole file or directory). The more tests are run, the more likely the failure, it seems.
(NB I've never seen this code fail in actual use - it only happens during tests.)
Summary
Previously, when using fixtures, every test ran successfully both individually and when run all together with rails t. Now, with factory_bot, a few of my tests often (but not always) fail, all related to the use of the same object that is defined as a PORO.
Drilling down, I have found that there's an issue with records sometimes mysteriously going missing or being unavailable within the PORO during the test, even though they're confirmed as present in the test and in the controller that calls the PORO!
Details
In my application, I have a RichText object that receives some text and processes it, highlighting words in the text that match those stored in a Dictionary table. In my tests, I create several test Dictionary records, and expect the RichText object to perform appropriately when passed test data. And it does, when the individual test files are run (and always did when I used fixtures).
However, now, the records are created and available in the test and in the controller it calls, but then are not available within the RichText object created by the controller. With no Dictionary records available in the RichText object, the test naturally fails because no words are highlighted in the text. And, again, this only seems to happen when I run the tests as a group rather than running just a single test file (e.g. rails t test/objects/rich_text.rb passes, but rails t test/objects will fail within the same rich_text.rb test file).
It doesn't seem to matter whether I create the records with factory_bot#create or directly with Dictionary#create, which suggests it's nothing to do with factory_bot - but then why has this just started happening?
I do have parallelisation enabled in minitest but disabling it makes no difference - the tests still fail the same way.
Code
Example test code that runs and passes, up to the last assertion here, which sometimes fails as described above:
test 'can create new content' do
  create(:dictionary, word_root: 'word_1')
  create(:dictionary, word_root: 'word_2')
  create(:dictionary, word_root: 'word_3')
  assert_equal 3, Dictionary.all.count
  ...
  # This next line is the one that calls the relevant controller code below
  post '/api/v0/content', headers: @auth_headers, params: @new_content_params
  ...
  # This assertion passes, as it did above, even though the error's already happened after the post above
  assert_equal 3, Dictionary.all.count
  # This assertion checks the response from the above post and fails under certain circumstances, as described above
  assert_equal @new_content_output, response.body
  ...
end
I've added checks to the controller as below and, again, everything's fine through this code, which is called by the post line in the test above (i.e. the database records are present and correct just before the RichText object is called):
def create
  ...
  byebug unless Dictionary.all.count == 3
  rich_text = RichText::Basic.new(@organisation, new_version[:content])
  ...
end
However, the RichText object's initialize method immediately fails the same check for these records - but only if the test is being run in bulk rather than individually:
class RichText::Basic
  def initialize(organisation, text)
    byebug unless Dictionary.all.count == 3
    ...
  end
end
Rails 6.1.4, ruby 2.7.1
Having tried various things (like disabling transactions in the affected tests), I found that the root cause was a constant defined in the RichText class (a line I didn't include in the question!). It looks like there was a race condition or similar: the RichText class body, and therefore the constant, was sometimes evaluated before the test database was populated, leaving the constant holding an empty result.
Replacing the constant with a direct database call resolved the problem. It does mean slightly more database calls but, on the flip side, does mean it's slightly easier to update the Dictionary table. (This happens rarely - on the order of once a month - which is why I'd put it into a constant.)
From:
class RichText::Basic
  WORDS = Dictionary.standard

  def some_method
    WORDS.each do...
  end
end
to
class RichText::Basic
  def some_method
    Dictionary.standard.each do...
  end
end
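If the extra queries ever became a problem, a middle ground would be to defer the lookup to first use instead of class-load time. This is a hedged sketch, not the fix I applied; note that it still caches across tests once populated:
class RichText::Basic
  # Look the words up on first use and memoize them at the class level,
  # so nothing touches the database while the class is being loaded
  def self.words
    @words ||= Dictionary.standard
  end

  def some_method
    self.class.words.each do...
  end
end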

Mongoid identity_map and memory usage, memory leaks

When I execute the query
Mymodel.all.each do |model|
  # ..do something
end
it uses a lot of memory, and the amount of memory used keeps increasing until eventually it crashes. I found out that to fix it I need to disable the identity map, but when I add identity_map_enabled: false to my mongoid.yml file I get the error
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with respect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4, and Mymodel.all.count => 3202400.
How can I fix it, or does someone know another way to reduce the amount of memory used while executing the query .all.each ..?
Thank you very much for the help!!!!
I started with something just like you, looping through millions of records, and the memory just kept increasing.
Original code:
@portal.listings.each do |listing|
  listing.do_something
end
I've gone through many forum answers and I tried them out.
1st attempt: I tried the combination of WeakRef and GC.start, but no luck; it failed.
2nd attempt: Adding listing = nil to the first attempt also failed.
Success Attempt:
@start_date = 10.years.ago
@end_date = 1.day.ago
while @start_date < @end_date
  @portal.listings.where(created_at: @start_date..@start_date.next_month).each do |listing|
    listing.do_something
  end
  @start_date = @start_date.next_month
end
Conclusion
All the memory allocated for the records is never released during the query request. Therefore, working with a small number of records per request does the job, and memory stays in good shape since it is released after each request.
Your problem isn't the identity map; I don't think Mongoid 4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection, that means that Mongoid will try to create 3.2 million model instances before it tries to iterate. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call to the database and returns a number; it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time (see the sketch after this list).
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
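A minimal sketch of the paging option, assuming Mongoid 4's standard skip/limit criteria methods (the batch size is arbitrary):
batch_size = 1000
offset = 0
total = Mymodel.count
while offset < total
  # Only this slice of documents gets instantiated as models at a time
  Mymodel.skip(offset).limit(batch_size).each do |model|
    # ..do something
  end
  offset += batch_size
end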

Rails 4 Multithreaded App - ActiveRecord::ConnectionTimeoutError

I have a simple rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data-point under an associated model of the 1st. Let's call this middle model B and the data point model C. There's also a front end that lets users browse this data graphically/visually.
Thus the hierarchy is A has many -> B which has many -> C. I scrape a URL for each A which returns a few instances of B with new Cs that have data for the respective B.
While attempting to test/scale this app, I have encountered a problem where Rails will stop processing, hang for a while, and finally throw an "ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds". Obviously the 5 is just the default.
I can't understand why this is happening when 1) there are no DB calls being made explicitly, 2) the log doesn't show any under the hood DB calls happening when it does work 3) it works sometimes and not others.
What's going on with rails 4 AR and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create new instances of model C in memory, and save all the Cs in one transaction at the end.
Sometimes this works, other times it doesn't; I can't figure out what causes it to fail. However, once it fails it seems to fail more and more.
I eager load all the model A's and B's to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use resque and resque-scheduler to do this work, but I highly doubt they are the source of the problem, as it persists even if I just do "rails runner Class.do_work"
Any suggestions and or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
model.association.each do |a|
  # work here
end
Rails does some behind the scenes work that "uses" a DB connection. I put uses in quotes because in my case I think the result is actually returned from memory. I eager loaded the association and thus the DB is never actually hit.
Preliminary testing of wrapping my block in a
ActiveRecord::Base.connection_pool.with_connection do
  # work with the association here
end
seems to have resolved the issue.
I uncovered this by adding a backtrace to my thread's error message that was printing out.
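Put together, the threaded loop ends up shaped something like this (a hedged sketch; scrape_data and the model names stand in for the question's A/B/C hierarchy rather than my actual code):
threads = ModelA.includes(:model_bs).map do |a|
  Thread.new do
    # Check a connection out of the pool for the duration of this block
    # only, and return it to the pool when the block exits
    ActiveRecord::Base.connection_pool.with_connection do
      new_cs = scrape_data(a) # hypothetical scraper returning attribute hashes
      ModelC.transaction do
        new_cs.each { |attrs| ModelC.create!(attrs) }
      end
    end
  end
end
threads.each(&:join)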
-----For those using resque----
I also had to add a bit in my resque.rake file to get this fully working as intended.
task 'resque:setup' => :environment do
  Resque.after_fork do |job|
    ActiveRecord::Base.establish_connection
  end
end
If you are using
ActiveRecord::Base.transaction do
  ... code
end
to accomplish faster transactions in a thread, note that this holds the database locks it takes until the transaction commits. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, though it will lock your database while it runs.

Database lock not working as expected with Rails & Postgres

I have the following code in a rails model:
foo = Food.find(...)
foo.with_lock do
  if bar = foo.bars.find_by_stuff(stuff)
    # do something with bar
  else
    bar = foo.bars.create!
    # do something with bar
  end
end
The goal is to make sure that a Bar of the type being created is not being created twice.
Testing with_lock at the console confirms my expectations. However, in production, it seems that in either some or all cases the lock is not working as expected, and the redundant Bar creation is being attempted -- so the with_lock doesn't (always?) result in the code waiting for its turn.
What could be happening here?
update
So sorry to everyone who was saying "locking foo won't help you"!! My example initially didn't have the bar lookup; this is fixed now.
You're confused about what with_lock does. From the fine manual:
with_lock(lock = true)
Wraps the passed block in a transaction, locking the object before yielding. You can pass the SQL locking clause as argument (see lock!).
If you check what with_lock does internally, you'll see that it is little more than a thin wrapper around lock!:
lock!(lock = true)
Obtain a row lock on this record. Reloads the record to obtain the requested lock.
So with_lock is simply doing a row lock and locking foo's row.
Don't bother with all this locking nonsense. The only sane way to handle this sort of situation is to use a unique constraint in the database; no one but the database can ensure uniqueness, unless you want to do absurd things like locking whole tables. Then just go ahead and blindly try your INSERT or UPDATE, and trap and ignore the exception that will be raised when the unique constraint is violated.
The correct way to handle this situation is actually right in the Rails docs:
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/find_or_create_by
begin
  CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
  retry
end
("find_or_create_by" is not atomic, its actually a find and then a create. So replace that with your find and then create. The docs on this page describe this case exactly.)
Why don't you use a unique constraint? It's made for uniqueness
One reason why a lock wouldn't be working in a Rails app is the query cache.
If you try to obtain an exclusive lock on the same row multiple times in a single request, the query cache kicks in, so subsequent locking queries never reach the DB itself.
The issue has been reported on Github.
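If that is what you're hitting, one workaround is to bypass the query cache around the locking call (a minimal sketch; uncached is a standard ActiveRecord class method):
ActiveRecord::Base.uncached do
  foo.with_lock do
    # locking queries issued in here always reach the DB
  end
end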

Updating a lot of records frequently

I have a Rails 3 app that has several hundred records in a MySQL DB that need to be updated multiple times each hour. The actual updating is done through delayed_job, which is triggered in controller logic (checking if enough time has passed since the last update; only then does something happen).
Each update is slow; it can take up to a second in some cases (although it averages 3 - 5 updates/sec.).
Code looks like this:
class Thing < ActiveRecord::Base
  ...
  def self.scheduled_update
    Thing.all.each do |t|
      ...
      t.some_property = new_value
      t.save
    end
  end
end
I've observed that the execution stalls after 300 - 400 records, and then the delayed job just seems to hang and eventually times out (entries in delayed_job.log). After a while the next one starts, also fails, and so forth, so not all records get updated.
What is the proper way to do this?
How does Rails handle database connections when used like that? Could it be some timeout issue that is not detected/handled properly?
There must be a standard way to do this, but I couldn't find anything so far.
Any help is appreciated.
Another option is update_all.
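If the new value is the same for every row, update_all collapses the whole loop into a single UPDATE statement (a sketch assuming the question's Thing model and a uniform new_value):
# One SQL UPDATE for every row; no instantiation, callbacks or validations
Thing.update_all(:some_property => new_value)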
Rails is a bad choice for mass updates of records. See if you can create a SQL stored procedure or use some other approach that avoids Active Record.
Use object.save(validate: false) (Rails 3's replacement for save_with_validation(false)) if you are OK with skipping validations altogether.
When finding records, use :select => 'a,b,c,other_fields' to limit the fields you fetch to the ones you actually use ('a', 'b', 'c' and 'other_fields' in this example).
Use :include for eager loading when you are initially selecting and joining across multiple tables.
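Combining these suggestions, a batched version of scheduled_update might look like this (a hedged sketch; find_each and save(:validate => false) are standard Rails 3 APIs, and it assumes new_value differs per record so update_all doesn't apply):
def self.scheduled_update
  # Stream records in batches of 200 instead of loading them all at once,
  # fetching only the columns this method touches
  Thing.select('id, some_property').find_each(:batch_size => 200) do |t|
    t.some_property = new_value
    t.save(:validate => false) # skip validations, as suggested above
  end
end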
So I solved my problem.
There was some issue with the Rails version I was using (3.0.3); the timeout was caused by some bug, I suspect. Updating to a later version of the 3.0.x branch solved it and everything runs perfectly now.
