Recently we deployed a new version of our app, and since then we've been seeing some really weird issues with ActiveRecord. For example, here's a snippet of a query it generates hundreds of times per day, usually correctly:
`entries`.`style` AS t1_r25, `entries`.`pdf_visibility` AS , `entries`.`web_visibility` AS t1_r27
That's not a typo, t1_r26 is missing there although there's a space where it should be. But only that one time. That's not hand-written SQL, either, that's ActiveRecord writing the query and deciding on all the placeholder variables. It has similarly botched other queries leaving things blank that shouldn't be blank (shouldn't even be possible), but only once in a while. Most of the time it's fine.
We're also seeing a lot of instances where it complains about things like table_alias or reflection being an undefined variable or method on false:FalseClass. That's true...but the thing that is a FalseClass should have been an ActiveRecord model. We have no clue how any of this is happening, or how we could possibly have written a bug in our Rails code that would do most of this (especially the invalid query above).
We're on Rails 4.1.16 (we upgraded from 4.1.8 when this started happening) with Ruby 2.2.0 in Passenger 5.0.26 (going to 5.0.30 next). These errors are extremely sporadic and none of them make any sense. Out of thousands of requests per day, only a small handful of them (less than 10 across 5 servers) result in one of these weird errors, and we can't purposely reproduce any of them.
My entire team is stumped. We've spent hours poring over code changes and can't see anything that might cause any of this. We don't even know what we could possibly have written that would cause ActiveRecord to sometimes write a bad query in a way that we shouldn't be able to affect. We have no idea how to begin troubleshooting this kind of thing. Does anyone out there have a hint that might point us in some useful direction?
Update: Here's a new one it threw this morning. Note that LibraryItem is one of our pretty straightforward ActiveRecord models:
NoMethodError: undefined method `__callbacks' for #<LibraryItem:0x007f66cc5b82b0>
I...have no idea.
To close the loop for those who tried to help and for anyone who stumbles into this: We cured it by upgrading MRI. We'd been running on 2.2.0 for around a year, which was why we didn't immediately suspect it, and also because this started with a particular deployment. I was tipped off when we saw a couple of errors about an inability to allocate memory, and when MRI exploded in a hail of shrapnel on one server (by which I mean it segfaulted) and took Passenger down with it.
From there I started looking at MRI changelogs and noticed a ton of memory and GC related bug fixes between 2.2.0 and 2.2.5. Last night we upgraded to 2.2.5 with a deployment, and (fingers crossed) we haven't seen a single one of these weird issues yet. (Previously we were seeing 12-20 per day or more, depending on traffic).
So, why did it start happening following a deployment for us? I don't know for sure, but I have a guess: I'm thinking the size in bytes of our application in memory finally hit some critical mass at which it started triggering one or more of the MRI bugs that were fixed between 2.2.0 and 2.2.5. Best I can come up with.
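Since the fix was an interpreter upgrade, it may be worth guarding against booting on the old patch level again. Here is a minimal sketch of such a check, assuming it lives somewhere like an initializer; the constant `MIN_RUBY` and the method `ruby_too_old?` are hypothetical names, not part of Rails or Passenger.

```ruby
# Hypothetical boot-time guard: warn when the running MRI is older than the
# release that made the errors go away for us (2.2.5).
MIN_RUBY = Gem::Version.new('2.2.5')

def ruby_too_old?(version = RUBY_VERSION, minimum = MIN_RUBY)
  # Gem::Version compares version segments numerically, not as strings.
  Gem::Version.new(version) < minimum
end

if ruby_too_old?
  warn "Running Ruby #{RUBY_VERSION}; expected at least #{MIN_RUBY}"
end
```

Comparing with `Gem::Version` rather than plain strings avoids surprises such as "2.2.10" sorting before "2.2.5".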
Huge thanks to those who stepped in to try to assist!
Related
I have a large web app whose load times I have recently been working hard to reduce. I have two controllers, Generator (some 20.000 items) and Product (some 1.500 items), that have been slow for a while, but I have improved them with indexes and smarter queries. In development the app's response time is about 500 ms.
From time to time I still get RequestTimeOut on the app and I need help troubleshooting this error. I understand what it means (a request has taken too much time) and I have installed the 'rack-timeout' gem and set it to 15 seconds (which works fine).
I have gone through the entire app (and especially the two slowest controllers, Generator and Product) in search of time to save. I have had some issues with caching that I am currently trying to fix (caching would help quite a bit).
It seems that these timeouts happen mostly when bots (Yandex.ru especially) spider through my site, going through one generator after another. The pages may not be very slow any more, but loading so many in a row produces a lot of requests.
Now I am out of ideas and need some help to know what to look at next and how to continue troubleshooting:
1. Is there anything other than response time that could cause this error, e.g. a memory leak? Or is it just a matter of lots of requests hitting slow controllers?
2. I haven't been able to test this on my development platform. Is there a way to benchmark and see how the app would handle requests like the ones from the bots? I seem to remember there was an "Apache-thing" one could use to simulate traffic like this.
3. Are there any other ways of looking at the problem or troubleshooting this issue from a high-level point of view? Any ideas and thoughts are welcome!
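On the benchmarking question: the "Apache-thing" is most likely ApacheBench (`ab`). A crawler-style run can also be roughly approximated in plain Ruby. The sketch below is only an illustration: `time_requests` and `slowest` are hypothetical helper names, and the paths and host in the usage comment are assumptions. The block decides how each path is fetched, so the timing logic can be tried without a running server.

```ruby
require 'benchmark'

# Visit each path one after another, the way a crawler would, and record
# how long the supplied block takes for each one.
def time_requests(paths)
  paths.map do |path|
    seconds = Benchmark.realtime { yield path }
    [path, seconds]
  end
end

# Return the slowest [path, seconds] pairs, slowest first.
def slowest(timings, limit = 5)
  timings.sort_by { |_path, seconds| -seconds }.first(limit)
end

# Example against a local server (assumed host, port and paths):
#
#   require 'net/http'
#   paths = (1..50).map { |id| "/generators/#{id}" }
#   timings = time_requests(paths) do |path|
#     Net::HTTP.get(URI("http://localhost:3000#{path}"))
#   end
#   slowest(timings).each { |path, s| puts format('%-20s %.3fs', path, s) }
```

Requests are issued sequentially here; a bot that opens several connections at once will put more pressure on the app than this shows.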
This is a long shot, but I was wondering if anyone has any idea:
I'm profiling my Rails app with the NewRelic RPM in development, and I'm seeing some really long view load times. The thing is, the next time I load the page, it's an entirely different set of views that's taking a long time to load.
Page Load #1
Page Load #2
I'm not doing anything too crazy. Rails 4, Ruby 2, partial caching with memcached (but I'm seeing the same errors even when caching is disabled.)
Any idea what's up with this? It's not just an error with the logs, as the application is indeed taking a while to render this page. Not an error specific to NewRelic's RPM either -- I see the same thing with rails_panel.
It turns out this is an issue with the Garbage Collector. (I knew there had to be a reasonable explanation for this.)
There's a great thread about this issue on the Discourse Meta forum. The tl;dr is to set your RUBY_GC_MALLOC_LIMIT to a higher value, to avoid garbage collections during renders.
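To check whether garbage collection really is what lands in the middle of a slow render, one option is to count GC runs around a block of code. This is a minimal sketch using `GC.count`, which is part of core Ruby; the helper name `gc_runs_during` is made up for illustration. Note that RUBY_GC_MALLOC_LIMIT is an environment variable and has to be set before the Ruby process starts.

```ruby
# Run the block and report how many garbage collections happened meanwhile.
# Returns [block_result, number_of_gc_runs].
def gc_runs_during
  before = GC.count
  result = yield
  [result, GC.count - before]
end

# Allocate a pile of short-lived strings, roughly what a large view does.
_, runs = gc_runs_during do
  200_000.times.map { |i| "row #{i}" }.join
end
puts "GC ran #{runs} time(s) during the block"
```

Wrapping a suspect partial in the same helper in development should show whether the slow renders line up with a nonzero GC count.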
I found copycopter a while ago and have now started using it. I am pretty happy with it, but I am having some weird issues.
My DB size is 8M (thousands of versions on day one) and it keeps adding blurbs. It started with a few hundred and over the weekend it went up to 1800+. I did not touch the app over the weekend.
I keep getting blurbs added to the DB nonstop even when I don't edit the application. Things like activerecord.model.x, then activerecord.model.x.other for EVERY MODEL in the app so this is definitely a duplication issue.
I really want to use copycopter but am stuck on this continued blurb adding issue/8M DB. Any help is appreciated. Thank you.
We have just discovered the same problem. It currently seems to add any form variable into the equation, even when there is no copycopter syntax in sight. Also, every time a helper has a label, it grabs it.
We narrowed it down to just the development ENV, so we think it may be one of our debugger tools. It seems to capture temp files that are written.
I'm running a rails 3.0 application on Heroku and using the New Relic addon/service.
I have been looking at the transaction traces feature (available in the pro version) to understand a little more about the performance characteristics of the application. However, a significant portion of time (30-50%) is "uninstrumented time". After making a few stabs by putting method_tracers in some places and going through the reasonably slow cycle to test whether I get more info, I'm feeling this is going nowhere fast.
It seems in the PHP new relic agent they have a great feature to get very detailed traces without needing to guess where to put method tracers: http://newrelic.com/docs/php/php-agent-faq#top100
Is there anything similar to this for ruby?
Note: I'm already using rpm_contrib to get some more info and have garbage collection stats enabled. Also, this is not about fixing a performance problem, just understanding how to better use the performance tools available and scratch a niggling itch about that uninstrumented time.
There isn't currently anything similar for Ruby. I'll mention it to the Ruby engineer when I get a chance. My guess is unless a lot of requests come in for it, it won't be at the top of the list for a while, though. In the meantime, you can use the method tracers to figure out the uninstrumented time.
Hope that helps.
Method tracers can work well, but if you have a lot of code in your controller, try a binary search using trace_execution_scoped, which records the time spent in a block of code:
http://newrelic.github.com/rpm/NewRelic/Agent/MethodTracer/InstanceMethods/TraceExecutionScoped.html#method-i-trace_execution_scoped
Add a couple calls to this, give each metric a sensible name like "Custom/MySlowControllerAction/block0" (first argument to trace_execution_scoped), and repeat.
The metrics you name will show up not just in Transaction Traces, but also in the Performance Breakdown for the controller action under the Web Transactions tab, so you'll see average time in that block of code across all requests, not just the slow ones.
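As a rough sketch of that binary search, the class below wraps two halves of a slow method in `trace_execution_scoped`. The class `SlowReport`, its methods, and the metric names are hypothetical. `FallbackTracer` is a stand-in written for this example only, so the snippet also runs where the New Relic agent is not loaded; in a real app the agent's `NewRelic::Agent::MethodTracer` would be picked up instead.

```ruby
# Stand-in used only when the New Relic agent is not loaded. It runs the
# block and records nothing.
module FallbackTracer
  def trace_execution_scoped(_metric_names, _options = {})
    yield
  end
end

TRACER = if defined?(NewRelic::Agent::MethodTracer)
           NewRelic::Agent::MethodTracer
         else
           FallbackTracer
         end

# Hypothetical stand-in for a slow controller action.
class SlowReport
  include TRACER

  def build
    rows = nil
    summary = nil
    # First half of the action gets its own metric...
    trace_execution_scoped('Custom/SlowReport/block0') { rows = load_rows }
    # ...and the second half gets another, so the slow half stands out.
    trace_execution_scoped('Custom/SlowReport/block1') { summary = summarize(rows) }
    summary
  end

  private

  def load_rows
    (1..1_000).to_a
  end

  def summarize(rows)
    rows.reduce(0) { |sum, n| sum + n }
  end
end
```

Once one block shows up as the slow one, split that block in two again and repeat until the uninstrumented time is accounted for.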
I have a long-running rake task that is gobbling up all my system memory over time. What is the quickest way to track down my issue and get to the bottom of it?
I'm using Rails 2.3.5, Ruby 1.8.7, Ubuntu on Slicehost and MySQL 5.
I have a Rails app that works fine. I have a nightly job that runs all night and does tons of work (some external calls to Twitter, Google, etc., and lots of DB calls using ActiveRecord). Over time that job grows in memory size to nearly 4 GB. I need to figure out why the rake task is not releasing memory.
I started looking into bleak_house, but it seems complex to set up and hasn't been updated in over a year. I can't get it to work locally, so I'm reluctant to try it in production.
thanks
Joel
Throwing out two ideas. First, if you're looping as part of this job, make sure you're not holding onto references to objects you don't need, as this will prevent them from being collected. If you're done, remove them from your array, or whatever. Also, put a periodic GC.start into your loop as a way to see if it's simply not getting around to GC-ing.
Second idea is that Ruby (before 2.2, so including your 1.8.7) does not GC symbols, so if your API clients are storing values as symbols you can end up with a huge and growing set of symbols that will never be re-used. Symbols are tiny, but tiny things can still add up.
And of course, don't load more objects than you need to. Use #find_each to load AR objects in batches if you have to iterate over lots of them.
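The three suggestions above can be sketched together in plain Ruby. This is an illustration only: `process_in_batches` is a hypothetical helper, `each_slice` stands in for ActiveRecord's `find_each` so the snippet runs without a database, and the keyword arguments need Ruby 2.0 or later (on 1.8.7 an options hash would be used instead).

```ruby
# Work through items in fixed-size batches so only one batch is referenced
# at a time, and force a GC run every few batches as a diagnostic.
def process_in_batches(items, batch_size: 100, gc_every: 10)
  processed = 0
  items.each_slice(batch_size).with_index(1) do |batch, batch_number|
    batch.each do |item|
      yield item
      processed += 1
    end
    # If memory stops growing with this in place, the job was probably just
    # outrunning the collector rather than leaking references.
    GC.start if (batch_number % gc_every).zero?
  end
  processed
end

# Watch the symbol table as well: steady growth here points at code that
# keeps creating new symbols.
symbols_before = Symbol.all_symbols.size
total = 0
count = process_in_batches((1..250).to_a, batch_size: 100, gc_every: 1) do |n|
  total += n
end
symbols_after = Symbol.all_symbols.size
puts "processed=#{count} new_symbols=#{symbols_after - symbols_before}"
```

If memory still climbs with the periodic `GC.start` in place, something is holding references, and the next step is to find out what.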