Rails Profiling Weirdness - Inconsistent Render Times Reported

This is a long shot, but I was wondering if anyone has any idea:
I'm profiling my Rails app with the NewRelic RPM in development, and I'm seeing some really long view load times. The thing is, the next time I load the page, it's an entirely different set of views that's taking a long time to load.
(Screenshots omitted: Page Load #1 and Page Load #2, each showing a different set of views dominating the render time.)
I'm not doing anything too crazy: Rails 4, Ruby 2, partial caching with memcached (but I see the same inconsistent times even with caching disabled).
Any idea what's up with this? It's not just a logging artifact; the application really is taking a while to render the page. It's not specific to NewRelic's RPM either -- I see the same thing with rails_panel.

It turns out this is an issue with the garbage collector. (I knew there had to be a reasonable explanation for this.)
There's a great thread about this issue on the Discourse Meta forum. The tl;dr is to set RUBY_GC_MALLOC_LIMIT to a higher value so that garbage collection doesn't kick in mid-render.
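If you want to confirm the diagnosis before tuning anything, a minimal sketch (development-only; the callback name is mine) is to count GC runs around each request:

```ruby
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # Development-only check: log how many GC runs happen during each request,
  # to confirm the slow renders line up with garbage collections.
  around_action :log_gc_runs if Rails.env.development?

  private

  def log_gc_runs
    runs_before = GC.stat[:count]   # total GC runs so far in this process
    yield
    Rails.logger.info "GC ran #{GC.stat[:count] - runs_before} time(s) during #{request.path}"
  end
end
```

If the counts line up with the slow pages, start the server with a raised limit, e.g. `RUBY_GC_MALLOC_LIMIT=90000000 rails server`; the exact value is a knob to experiment with, not a recommendation.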

Related

Bizarre ActiveRecord issues - like generating invalid SQL

Recently we deployed a new version of our app, and since then we've been seeing some really weird issues with ActiveRecord. For example, here's a snippet of a query it generates hundreds of times per day, usually correctly:
`entries`.`style` AS t1_r25, `entries`.`pdf_visibility` AS , `entries`.`web_visibility` AS t1_r27
That's not a typo: `t1_r26` is missing there, although there's a space where it should be. But only that one time. That's not hand-written SQL, either; that's ActiveRecord writing the query and deciding on all the placeholder variables. It has similarly botched other queries, leaving things blank that shouldn't be blank (shouldn't even be possible to be blank), but only once in a while. Most of the time it's fine.
We're also seeing a lot of instances where it complains about things like `table_alias` or `reflection` being an undefined variable or method on `false:FalseClass`. That's true...but the thing that is a FalseClass should have been an ActiveRecord model. We have no clue how any of this is happening, or how we could possibly have written a bug in our Rails code that would do most of this (especially the invalid query above).
We're on Rails 4.1.16 (we upgraded from 4.1.8 when this started happening) with Ruby 2.2.0 in Passenger 5.0.26 (going to 5.0.30 next). These errors are extremely sporadic and none of them make any sense. Out of thousands of requests per day, only a small handful of them (less than 10 across 5 servers) result in one of these weird errors, and we can't purposely reproduce any of them.
My entire team is stumped. We've spent hours poring over code changes and can't see anything that might cause any of this. We don't even know what we could possibly have written that would cause ActiveRecord to sometimes write a bad query in a way that we shouldn't be able to affect. We have no idea how to begin troubleshooting this kind of thing. Does anyone out there have a hint that might point us in some useful direction?
Update: Here's a new one it threw this morning. Note that LibraryItem is one of our pretty straightforward ActiveRecord models:
NoMethodError: undefined method `__callbacks' for #<LibraryItem:0x007f66cc5b82b0>
I...have no idea.
To close the loop for those who tried to help and for anyone who stumbles into this: we cured it by upgrading MRI. We'd been running on 2.2.0 for around a year, which is why we didn't immediately suspect it (that, and the fact that this started with a particular deployment). I was tipped off when we saw a couple of errors about an inability to allocate memory, and then MRI exploded in a hail of shrapnel on one server (by which I mean it segfaulted) and took Passenger down with it.
From there I started looking at MRI changelogs and noticed a ton of memory- and GC-related bug fixes between 2.2.0 and 2.2.5. Last night we upgraded to 2.2.5 with a deployment, and (fingers crossed) we haven't seen a single one of these weird issues yet. (Previously we were seeing 12-20 per day or more, depending on traffic.)
So, why did it start happening following a deployment for us? I don't know for sure, but I have a guess: I'm thinking the size in bytes of our application in memory finally hit some critical mass at which it started triggering one or more of the MRI bugs that were fixed between 2.2.0 and 2.2.5. Best I can come up with.
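One practical footnote for anyone who lands here: if you fix something like this by upgrading the interpreter, pin the version so a stray server can't silently keep running the old one. A minimal sketch (a `.ruby-version` file can carry the same pin for rbenv/rvm):

```ruby
# Gemfile
ruby '2.2.5'   # fail fast at boot if a server is still running the buggy 2.2.0
```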
Huge thanks to those who stepped in to try to assist!

Rails app: Troubleshooting frequent RequestTimeOut errors

I have a large web app whose load times I have recently been working hard to reduce. Two controllers, Generator (some 20,000 items) and Product (some 1,500 items), have been slow for a while, but I have worked on indexes and smarter queries. On my dev machine the app response time is now about 500 ms.
From time to time I still get RequestTimeOut errors and I need help troubleshooting them. I understand what the error means (a request has taken too much time), and I have installed the 'rack-timeout' gem and set it to 15 seconds (which works fine).
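For reference, the rack-timeout setup I mean is a one-line initializer. A minimal sketch (the class-level setter matches older versions of the gem; newer releases configure a `service_timeout:` option instead):

```ruby
# config/initializers/rack_timeout.rb
# Abort any request that runs longer than 15 seconds.
Rack::Timeout.timeout = 15
```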
I have gone through the entire app (especially the two slowest controllers, Generator and Product) in search of time to save. I have had some issues with caching that I am currently trying to fix (caching would help quite a bit).
It seems these timeouts happen mostly when bots (Yandex.ru especially) spider through my site, going through one generator after another. The pages may not be very slow any more, but loading so many of them one after another adds up to a lot of requests.
Now I am out of ideas and need some help to know what to try next and how to continue my troubleshooting:

1. Is there anything besides response time that can cause this error, e.g. a memory leak or something? Or is it just a matter of lots of requests hitting slow controllers?

2. I haven't been able to reproduce this on my development platform. Is there a way to benchmark how the app would handle requests like the ones from the bots? I seem to remember there was an "Apache-thing" one could use to simulate traffic like this (a rough Ruby stand-in is sketched below).

3. Are there any other ways of looking at the problem, or of troubleshooting this issue from a high-level point of view? Any ideas and thoughts are welcome!
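On question 2: the "Apache-thing" is probably ApacheBench (`ab`). A rough Ruby stand-in that walks many generator pages the way a crawler would could look like this (hypothetical sketch; the host and the route shape are assumptions):

```ruby
# bot_sim.rb -- crude crawler simulation: a few workers fetch generator
# pages one after another and print per-request timings.
require 'net/http'

base    = URI('http://localhost:3000')               # assumed dev server
paths   = (1..200).map { |id| "/generators/#{id}" }  # assumed route shape
workers = 4

threads = workers.times.map do |n|
  Thread.new do
    Net::HTTP.start(base.host, base.port) do |http|
      paths.each_with_index do |path, i|
        next unless i % workers == n                 # split URLs across workers
        started  = Time.now
        response = http.get(path)
        puts format('%-20s %s %5.2fs', path, response.code, Time.now - started)
      end
    end
  end
end
threads.each(&:join)
```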

Am I missing potential problems with custom page caching in Rails 3?

I use Rails to present automated hardware testing results; our tests are run mainly via TCL. Recently we implemented a "log4TCL", which is basically a translated version of log4j. The log files have upwards of 40,000 lines, each of which is written to the database as a logline record, and the view's load time is too long to be considered usable. I have tried using AJAX requests to speed things up, but the initial query/page load accounts for ~75% of the full page load.
My solution is page caching. I cannot use Rails' built-in page caching because each log report is a different instance of "log_viewer": the report is generated from a test_run_id parameter, and built-in page caching would only cache one instance of "log_viewer.html". What I need is "log_viewer_#{test_run_id}.html", so I have implemented a way of doing this myself. The reports age out after one week and are purged from the test_runs/log_viewer_cache directory to save disk space. If an older report is needed, loading the page regenerates it with a fresh age-out timer.
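For what it's worth, the mechanism looks roughly like this (a hypothetical sketch; the controller and model names and the cache path are illustrative, and the weekly purge lives in a separate cron/rake task):

```ruby
require 'fileutils'

class LogViewerController < ApplicationController
  CACHE_DIR = Rails.root.join('public', 'test_runs', 'log_viewer_cache')

  def show
    path = CACHE_DIR.join("log_viewer_#{params[:test_run_id].to_i}.html")
    if File.exist?(path)
      FileUtils.touch(path)             # restart the one-week age-out timer
      send_file path, type: 'text/html', disposition: 'inline'
    else
      @loglines = Logline.where(test_run_id: params[:test_run_id])
      html = render_to_string(:show, layout: true)
      FileUtils.mkdir_p(CACHE_DIR)
      File.write(path, html)            # next request is served from disk
      render text: html                 # Rails 3 idiom; `render html:` on 4.1+
    end
  end
end
```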
I have come to the conclusion that this is the way to go. My concern is that I have not found any other implementation like this anywhere, which leads me to believe that I have missed an inherent flaw in my design. Any input would be much appreciated.
EDIT: For clarification, it's the "dynamic" content of this report that takes too long to load. I need to cache multiple full-page instances, which is exactly what action/fragment caching is not concerned with.

Should I care about the DOM processing time in Newrelic Rails app?

I am using Newrelic on my Ruby on Rails app. In the "Browser page load time" section, I can see that a large portion of the loading time falls into "DOM processing" (about 5 s). I just want to know if this is normal. Should I be worrying about this and optimizing it more? There is a lot of JS code in my app and many DOM elements are created dynamically, which I think is why it takes the most time. But Firebug shows the load time as 6.18 s (onload: 5.16 s), which seems like pretty fast loading to me, and many of the JS files load at the bottom of the page.
Thanks
If you want to improve client load times, then yes, you should care :) If not, don't worry about it.
https://newrelic.com/docs/features/how-does-real-user-monitoring-work

How do I get a more detailed transaction trace with the Ruby New Relic agent

I'm running a rails 3.0 application on Heroku and using the New Relic addon/service.
I have been looking at the transaction traces feature (available in the pro version) to understand a little more about the performance characteristics of the application. However, a significant portion of time (30-50%) is "uninstrumented time". After making a few stabs by putting method_tracers in some places and going through the reasonably slow cycle to test whether I get more info, I'm feeling this is going nowhere fast.
It seems in the PHP new relic agent they have a great feature to get very detailed traces without needing to guess where to put method tracers: http://newrelic.com/docs/php/php-agent-faq#top100
Is there anything similar to this for ruby?
Note: I'm already using rpm_contrib to get some more info and have garbage collection stats enabled. Also, this is not about fixing a performance problem, just understanding how to better use the performance tools available and scratch a niggling itch about that uninstrumented time.
There isn't currently anything similar for Ruby. I'll mention it to the Ruby engineer when I get a chance. My guess is unless a lot of requests come in for it, it won't be at the top of the list for a while, though. In the meantime, you can use the method tracers to figure out the uninstrumented time.
Hope that helps.
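For example, wrapping a suspect method with a tracer looks like this (a minimal sketch; the class and metric names are illustrative):

```ruby
require 'new_relic/agent/method_tracer'

class ReportBuilder
  include ::NewRelic::Agent::MethodTracer

  def build_summary
    # ...slow code that currently shows up as uninstrumented time...
  end

  # Report time spent in build_summary as its own metric in traces.
  add_method_tracer :build_summary, 'Custom/ReportBuilder/build_summary'
end
```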
Method tracers can work well, but if you have a lot of code in your controller, try a binary search using trace_execution_scoped, which records the time spent in a block of code:
http://newrelic.github.com/rpm/NewRelic/Agent/MethodTracer/InstanceMethods/TraceExecutionScoped.html#method-i-trace_execution_scoped
Add a couple calls to this, give each metric a sensible name like "Custom/MySlowControllerAction/block0" (first argument to trace_execution_scoped), and repeat.
The metrics you name will show up not just in Transaction Traces, but also in the Performance Breakdown for the controller action under the Web Transactions tab, so you'll see average time in that block of code across all requests, not just the slow ones.
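A sketch of that binary-search approach (the controller and metric names are illustrative):

```ruby
class MySlowController < ApplicationController
  include ::NewRelic::Agent::MethodTracer

  def show
    trace_execution_scoped(['Custom/MySlowControllerAction/block0']) do
      # ...first half of the action's work...
    end

    trace_execution_scoped(['Custom/MySlowControllerAction/block1']) do
      # ...second half; whichever block dominates, split it again...
    end
  end
end
```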
