ActiveRecord models "losing" their find_by_* methods - ruby-on-rails

We have a relatively standard Ruby on Rails project, which has quite a few background jobs that run under Resque (with Redis as a backend.)
The issue is that very rarely -- perhaps once a month, maybe a little less -- we'll suddenly see floods of exceptions from Resque. The exceptions are all in the following vein:
undefined method `find_by_id` for User():Class
undefined method `find_by_name` for CustomerAccount():Class
undefined method `find_by_id` for Job():Class
It appears that suddenly, all ActiveRecord::Base models lose their find_by_* methods for the entire thread. Restarting the worker fixes the issue.
I know that generically, the answer must be "someone, somewhere -- probably in a gem -- is breaking method_missing somehow." Or perhaps, somehow the constants are getting reassigned to a different class. But before I begin a really thorough investigation, I wanted to check if anyone has run into this problem and solved it already.
This project is running Ruby 2.1.1p76, Rails 3.2.17, Resque 1.25.1.

Closing the loop on this ancient question: it turns out moonfly's comment was indeed on the mark, and in long-running workers a dropped database connection produces this (somewhat strange) error message.
Knowing the root cause, we were able to add a periodic connection refresh (to keep the connection alive when a worker sat idle for too long) and a detection mechanism that reconnects when a database connection has been dropped. So, thanks much @moonfly; if you'd like to turn your comment into an answer, I'm happy to award you much-delayed answer credit.
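For anyone who hits the same thing: this isn't our exact code, but the reconnect-on-fork part looks roughly like the sketch below (it uses Resque's after_fork hook; the periodic keep-alive for idle workers is separate and not shown).
# config/initializers/resque.rb -- minimal sketch, assuming Rails 3.2 / Resque 1.x
Resque.after_fork do |_job|
  # verify! pings the database and reconnects if the connection has gone away
  # (e.g. MySQL closed it after wait_timeout while the worker sat idle).
  ActiveRecord::Base.connection.verify!
end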

Related

How can I stop active record callbacks from throwing an error when they fail?

I have a Rails app that includes some after_save active record callbacks, which are responsible for updating a second cache database.
Recently I was having some load issues and the second database was slow to respond or refusing connections, and my users started getting 500 errors when they tried to save objects.
I think this was down to me misunderstanding how callbacks work: I assumed that by making these after_save rather than before_save, the user was "safe", and that if the callback failed it would do so gracefully and silently.
How might I refactor my code to stop this behaviour and avoid exposing my users to error messages when these callbacks fail?
I've looked at refactoring the callbacks to simply trigger an active job, like this:
# models/mymodel.rb
after_save :update_index

def update_index
  UpdateIndexJob.perform_later(self)
end
The active job includes the logic for actually updating the second cache database.
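Roughly, I'm imagining the job looking something like this (assuming Rails 4.2+ so Active Job is available; SecondaryCache is just a placeholder for however the cache database gets written to):
# app/jobs/update_index_job.rb
class UpdateIndexJob < ActiveJob::Base
  queue_as :default

  def perform(record)
    # Runs in the worker process, so a slow or unavailable cache database
    # can no longer turn into a 500 for the user.
    SecondaryCache.update_index(record) # placeholder for the real cache client
  end
end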
I'm not sure whether I would also need to implement sidekiq and redis for this to work.
I've also read about Rails observers, which apparently work similarly to callbacks but don't break the request-response cycle when they fail.
How can I stop active record callbacks from throwing an error when they fail?
That's a very open question, and I think you've already answered it yourself.
You can either:
Rewrite it in such a way that it won't fail (e.g. only invoke a background task).
Rescue the exception, as sketched below. (But then, is silently abandoning that code path actually a sensible decision?! Probably not.)
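If you do go the rescue route, a minimal sketch (just logging and moving on; SecondaryCache is again a placeholder for your cache client):
# models/mymodel.rb
after_save :update_index

def update_index
  SecondaryCache.update_index(self) # placeholder for the real cache client
rescue StandardError => e
  # Swallow the failure so it never bubbles up into the user's request;
  # leave a trace in the logs instead.
  Rails.logger.error("Index update failed for #{self.class}##{id}: #{e.message}")
end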
I've looked at refactoring the callbacks to simply trigger an active job [...] I'm not sure whether I would also need to implement sidekiq and redis for this to work.
That seems like a safe implementation. And yes, you will need to configure a backend in order to actually execute the jobs. See the documentation. Sidekiq, Resque, Sneakers, Sucker Punch, Queue Classic, Delayed Job and Que are just some of the available choices.
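For example, if you pick Sidekiq, wiring it up to Active Job is a one-line setting (Sidekiq itself additionally needs its gem and a running Redis):
# config/application.rb -- tell Active Job which queueing backend to use
config.active_job.queue_adapter = :sidekiq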
I've also read about rails observers ...
That would also prevent user errors, but what happens if it fails due to an intermittent timeout?? One of the core features of background jobs is that they retry on failure.
Therefore I would advise using a background job instead of an observer for this.
Now with all of that said, we've barely spoken about what problem you're trying to solve:
updating a second cache database
You've said very little about what the "second cache database" actually is, but I suspect it could be configured as a leader/follower database and live completely outside of your application.

Bizarre ActiveRecord issues - like generating invalid SQL

Recently we deployed a new version of our app, and since then we've been seeing some really weird issues with ActiveRecord. For example, here's a snippet of a query it generates hundreds of times per day, usually correctly:
`entries`.`style` AS t1_r25, `entries`.`pdf_visibility` AS , `entries`.`web_visibility` AS t1_r27
That's not a typo, t1_r26 is missing there although there's a space where it should be. But only that one time. That's not hand-written SQL, either, that's ActiveRecord writing the query and deciding on all the placeholder variables. It has similarly botched other queries leaving things blank that shouldn't be blank (shouldn't even be possible), but only once in a while. Most of the time it's fine.
We're also seeing a lot of instances where it complains about things like table_alias or reflection being an undefined variable or method on false:FalseClass. That's true...but the thing that is a FalseClass should have been an ActiveRecord model. We have no clue how any of this is happening, or how we could possibly have written a bug in our Rails code that would do most of this (especially the invalid query above).
We're on Rails 4.1.16 (we upgraded from 4.1.8 when this started happening) with Ruby 2.2.0 in Passenger 5.0.26 (going to 5.0.30 next). These errors are extremely sporadic and none of them make any sense. Out of thousands of requests per day, only a small handful of them (less than 10 across 5 servers) result in one of these weird errors, and we can't purposely reproduce any of them.
My entire team is stumped. We've spent hours poring over code changes and can't see anything that might cause any of this. We don't even know what we could possibly have written that would cause ActiveRecord to sometimes write a bad query in a way that we shouldn't be able to affect. We have no idea how to begin troubleshooting this kind of thing. Does anyone out there have a hint that might point us in some useful direction?
Update: Here's a new one it threw this morning. Note that LibraryItem is one of our pretty straightforward ActiveRecord models:
NoMethodError: undefined method `__callbacks' for #<LibraryItem:0x007f66cc5b82b0>
I...have no idea.
To close the loop for those who tried to help and for anyone who stumbles into this: We cured it by upgrading MRI. We'd been running on 2.2.0 for around a year, which was why we didn't immediately suspect it, and also because this started with a particular deployment. I was tipped off when we saw a couple of errors about an inability to allocate memory, and when MRI exploded in a hail of shrapnel on one server (by which I mean it segfaulted) and took Passenger down with it.
From there I started looking at MRI changelogs and noticed a ton of memory and GC related bug fixes between 2.2.0 and 2.2.5. Last night we upgraded to 2.2.5 with a deployment, and (fingers crossed) we haven't seen a single one of these weird issues yet. (Previously we were seeing 12-20 per day or more, depending on traffic).
So, why did it start happening following a deployment for us? I don't know for sure, but I have a guess: I'm thinking the size in bytes of our application in memory finally hit some critical mass at which it started triggering one or more of the MRI bugs that were fixed between 2.2.0 and 2.2.5. Best I can come up with.
Huge thanks to those who stepped in to try to assist!

Sucker Punch tests in Rails, using connection_pool block, result in connection timeout

Thanks in advance for your kind response.
At work we are using the sucker_punch gem in a Rails app to send emails and do other things asynchronously.
We implemented a couple of actors with no problems and even wrote some tests for them successfully, using the recommended configuration for testing (requiring sucker_punch/testing/inline in the specs and using truncation as the database-cleaning strategy).
Everything was working like a charm until the last actor we decided to implement. It's no different from the others, but now, when running the test suite, ActiveRecord::ConnectionTimeoutError is raised.
I've searched the internet for a solution but nothing came up. Most of the answers (like this one) suggest using the ActiveRecord::Base.connection_pool.with_connection method, passing a block to it. We were already doing that.
The only thing that I can think of is that we are handling errors on the actors, rescuing exceptions, like this:
def perform
  ActiveRecord::Base.connection_pool.with_connection do
    begin
      ... # do something
    rescue SomeException => e
      ... # handle exception
    end
  end
end
But looking at the source, this shouldn't be a problem, since with_connection has an ensure block that releases the connection.
I will be opening an issue on sucker punch and will be updating this question if I have some news.
The release in question can wait, but this also makes me wonder if we are having this same issue in production...
Cheers,
Aldana.
EDIT
The author of the gem told me that apparently there was nothing wrong with the code, and suggested increasing the pool size. I'm going to take this approach, and if the error persists we will change some parts of the code not to use Sucker Punch.
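For reference, the pool size is the pool setting in config/database.yml; something along these lines (10 is just an example value, the default is 5):
# config/database.yml (excerpt)
test:
  adapter: mysql2   # whatever adapter you already use
  pool: 10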

What are 'best practices' for dealing with transient database errors in a highly concurrent Rails system?

While researching a deadlock issue, I found the following post:
https://rails.lighthouseapp.com/projects/8994/tickets/6596
The gist of it is as follows:
the MySQL docs say:
Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
Therefore debugging transient deadlocks is an antipattern because MySQL says they are OK and unavoidable.
Therefore, Rails should offer us a way, because it:
makes the assumption that there is a "best" way to do things, and it's designed to encourage that way
but Rails doesn't offer us a way, so we are using a hacky DIY thing (sketched at the end of this question).
So if all of this is true, where is the Rails solution?
NOTE: This project is inactive, but seems simple enough to be a solution. Why does Rails not have something like this?
https://github.com/qertoip/transaction_retry
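For concreteness, the kind of DIY retry wrapper I mean looks roughly like this (the helper name, retry count and usage example are made up for illustration):
# Re-issue a transaction that MySQL rolled back because of a deadlock.
def retry_on_deadlock(attempts = 3)
  tries = 0
  begin
    ActiveRecord::Base.transaction { yield }
  rescue ActiveRecord::StatementInvalid => e
    # MySQL deadlocks surface as StatementInvalid ("Deadlock found when trying to get lock").
    raise unless e.message =~ /deadlock found/i
    tries += 1
    raise if tries >= attempts
    sleep(0.1 * tries) # brief backoff before re-issuing, as the MySQL docs suggest
    retry
  end
end

# retry_on_deadlock { order.update_attributes!(:status => "shipped") }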
The fix, for me, was a better index.
The update in question was in a query with a join, and existing indexes were not sufficient for MySQL to join and search efficiently.
Adding the appropriate index completely removed the deadlock issue even in tests with unreasonably concurrent loads.
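For illustration only, a migration of this general shape (table and column names are invented; the real ones depend on the join that was deadlocking):
# db/migrate/xxx_add_index_for_join_lookup.rb -- pre-Rails-5 migration style, names invented
class AddIndexForJoinLookup < ActiveRecord::Migration
  def change
    # A composite index covering the join key and the searched column lets
    # MySQL locate the target rows directly, so it locks far fewer rows and
    # the competing transactions stop deadlocking against each other.
    add_index :line_items, [:order_id, :state]
  end
end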

In Rails, do I need to worry about failed transactions if the same model gets updated by two different mongrels?

I have a fairly vanilla Rails app with low traffic at present, and everything seems to work OK.
However, I don't know much about the Rails internals, and I'm wondering what happens on a busy site if two requests come in at the same time and try to update the same model from (I assume) two separate Mongrel processes. Could this result in a failed-transaction exception or similar, or does Rails do any magic to serialize controller methods?
If an update could fail, what is the best practice to watch for and handle this type of situation?
For more background, my controller methods often update multiple models. I currently don't do anything special to create transactions and just rely on the default behaviors. Ideally I'd like the update to be retried rather than return an error (the updates are generally idempotent, i.e. doing them twice if necessary would be OK). My database is mysql.
AFAIK, MySQL will wait until the first transaction has been processed and then process the second one. #create, #update and #save are each wrapped in an SQL transaction, and I guess MySQL can handle those well.
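If you want the updates to several models to stand or fall together (and be safe to retry), wrap them in an explicit transaction yourself; a minimal sketch with made-up models:
# Either both rows commit or neither does; because the updates are idempotent,
# re-running the whole block after a failure is safe.
ActiveRecord::Base.transaction do
  account.update_attributes!(:balance => new_balance)
  activity_log.update_attributes!(:last_updated_at => Time.now)
end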
