Rails Memory Error R14 on Heroku (blank screen with reload?)

Using: Rails 3.0.3. Host: Heroku. 2 dynos & 0 workers.
I'm a bit of a beginner with Rails and have just released my first project. Users are experiencing intermittent problems that, in their words, amount to "I get a blank screen with a message that the page needs to reload". Unfortunately I can't get a better description than that (I only have a one-way communication channel to the users).
I also get this error in the logs:
2011-11-09T19:00:12+00:00 heroku[web.1]: Process running mem=598M(116.8%)
2011-11-09T19:00:12+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
which seems pretty straightforward.
I have about 4,000 visitors a day and about 10,000 page views.
Edit: I also have New Relic and Exception Notifier installed. I get a lot of "execution expired" errors.
What I would like to know now is:
How can I find these intermittent errors (I have no timestamps)? What should I search for in the logs (what string)?
Do memory problems cause the web browser to crash and reload (or something similar)? Or is that more likely a JavaScript problem?
Most importantly: how can I test my application to see where it is most memory-intensive? I know my code isn't perfect, so I need to find the bad parts.
Once again, this is my first project, so the solutions might be easy, but please help me out.

Are you using ImageMagick (specifically RMagick)? People have reported issues with its memory management in the past: https://groups.google.com/group/dragonfly-users/browse_thread/thread/67f88d9a2e085b7a?pli=1&auth=DQAAAIUAAABUdJ8RK3XRKIAvXno2rkOsd8OzwcKqNX3T21NjURsvINiRoHH-S_786Si2mphcOdRDmfGrjir6hBMLwj4xv6LE89Dd62ng2xmCArP3lcZZbw7-wXCBNS5BiaSeDVy-z46gHUHiVC21vEMWOBKMYMn7kMnJZhWXr1EcfZqb1KQNaGhwal2KLCmYxThW99pWLtE

Install the New Relic Standard add-on - it will give you insight into your application and what's going on. The 'Dynos' tab will show you your application's memory utilisation. That sounds like awfully high memory usage for the level of traffic you're reporting, but it depends on your application. If you're seeing memory errors in the log, then performance will be suffering - see http://devcenter.heroku.com/articles/error-codes#r14__memory_quota_exceeded
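To answer the "what string should I search for" part: R14 errors are emitted by the Heroku platform itself, so a log search along these lines will surface them (a sketch; --num and --tail are standard heroku logs flags):

$ heroku logs --num 1500 | grep "R14"
$ heroku logs --tail | grep "Memory quota exceeded"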
Are you using any kind of error handling? You could install the Airbrake add-on so you get notifications of errors, or use the Exception Notifier gem, which will email you errors as they occur. Once these are in place you'll know what's occurring - whether it's in the application or, if no notifications arrive, down to outside factors like the visitor's internet connection.
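For reference, a minimal Exception Notifier setup on Rails 3 might look like the following (a sketch only - the app name and addresses are placeholders, and the exact options depend on the gem version you install):

# Gemfile
gem 'exception_notification', :require => 'exception_notifier'

# config/environments/production.rb
YourApp::Application.config.middleware.use ExceptionNotifier,
  :email_prefix         => "[YourApp Error] ",
  :sender_address       => %{"Notifier" <notifier@example.com>},
  :exception_recipients => %w{you@example.com}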

Related

How to debug an ajax request raising "Error R15 (Memory quota vastly exceeded)"

I have a Rails app on Heroku that is crashing with Error R15 (Memory quota vastly exceeded).
I've tracked this issue to pages that contain several asynchronous requests. The errors appear to coincide with the ajax requests that build remote datatables.
The problem is, I can't figure out why these errors are being raised.
I thought perhaps the database queries and controller actions behind the ajaxified datatables might be running slowly. But when I examine these in development using miniprofiler, the requests appear very efficient.
Then I thought perhaps the server was receiving multiple simultaneous requests and this was overloading the Heroku dyno. But I ramped the dynos up to a very high number and still see the error.
What would be a sensible way to start identifying and debugging what is causing this memory error? I've not had to solve an issue like this before.
Memory is allocated per dyno on Heroku, so if the problem is at the code level, adding more dynos will probably not solve it - it will just make each dyno exceed its memory limit individually, costing you a lot of money without fixing anything.
You're better off scaling vertically with Performance-L dynos, which gives each dyno up to 14GB of memory. You can then use the metrics to see how much memory is actually being used. If the app manages to use up all 14GB, you may have a memory leak in one of your dependencies.
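Beyond the platform metrics, a low-tech way to see which requests grow the process is a tiny Rack middleware that logs resident memory after each request. A sketch (not something the answer above prescribes; ps is available on Heroku's Linux dynos):

# Hypothetical middleware: logs the process RSS after every request.
class MemoryLogger
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    rss_mb = `ps -o rss= -p #{Process.pid}`.to_i / 1024  # ps reports RSS in KB
    Rails.logger.info("mem=#{rss_mb}MB path=#{env['PATH_INFO']}")
    [status, headers, body]
  end
end

# config/application.rb
config.middleware.use MemoryLogger

Sorting the resulting log lines by mem tends to float the leaky endpoints to the top.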

Strange TTFB (time to first byte) issue on Heroku

We're in the process of improving the performance of our Rails app hosted on Heroku (Rails 3.2.8 and Ruby 1.9.3). While doing so we've come across one alarming problem whose source has proved extremely difficult to track down. Let me quickly explain how we experience the problem and how we've tried to isolate it.
--
Since around June we've experienced weird lag in time to first byte all over the site. The problem is obvious when using the site (sometimes the application doesn't respond for 10-20 seconds), and it's also visible in waterfall analysis via webpagetest.org.
We're based in Denmark but get this result from any host.
To confirm the problem we ran a benchmark test where we sent 300 identical requests to a simple page and measured the response times.
If we send 300 requests to the front page, the median response time is below 1 second, which is fairly good. What scares us is that 60 requests take more than double that time, 40 of those take more than 4 seconds, and some take as much as 16 seconds.
None of these slow requests show up in New Relic, which we use for performance monitoring. No request queuing shows up and the results are the same no matter how high we scale our web processes.
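(For reference, this kind of measurement can be approximated with plain curl - this is not our actual benchmark script:)

$ for i in $(seq 1 300); do curl -s -o /dev/null -w "%{time_starttransfer}\n" http://your-app.herokuapp.com/; done | sort -n | tail -20
# prints the 20 slowest time-to-first-byte values, in seconds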
Still, we couldn't rule out the application code, so we tried another experiment where we responded to requests from a rack middleware instead.
By placing this middleware (TestMiddleware) at the beginning of the rack stack, we returned a response before the request even hit the application, ensuring that neither the remaining middleware nor the Rails app could cause the delay.
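The middleware itself was trivial; a reconstruction of the idea (not our exact code) looks like this:

class TestMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    # Answer the probe path immediately, before the rest of the stack runs
    if env['PATH_INFO'] == '/middleware-test'
      [200, { 'Content-Type' => 'text/plain' }, ['ok']]
    else
      @app.call(env)
    end
  end
end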
Middleware setup:
$ heroku run rake middleware
use Rack::Cache
use ActionDispatch::Static
use TestMiddleware
use Rack::Rewrite
use Rack::Lock
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use ActionDispatch::RemoteIp
use Rack::Sendfile
use ActionDispatch::Callbacks
use ActiveRecord::ConnectionAdapters::ConnectionManagement
use ActiveRecord::QueryCache
use ActionDispatch::Cookies
use ActionDispatch::Session::DalliStore
use ActionDispatch::Flash
use ActionDispatch::ParamsParser
use ActionDispatch::Head
use Rack::ConditionalGet
use Rack::ETag
use ActionDispatch::BestStandardsSupport
use NewRelic::Rack::BrowserMonitoring
use Rack::RailsExceptional
use OmniAuth::Builder
run AU::Application.routes
We then ran the same script to measure response times and got pretty much the same result. The median response time was around 130ms (obviously faster, since it doesn't hit the app), but 60 requests still took more than 400ms and 25 requests took more than 1 second. Again, some requests were as slow as 16 seconds.
One explanation could be slow hops on the network or the DNS setup, but the results of traceroute look perfectly OK.
This result was confirmed by running the response script against another Rails 3.2 / Ruby 1.9.3 application hosted on Heroku - no weird behavior at all there.
The DNS setup follows Heroku's recommendations.
--
We're confused to say the least. Could there be something fishy with Heroku's routing network?
Why the heck are we seeing this weird behavior? How do we get rid of it? And why can't we see it in New Relic?
It turned out to be a kind of request queuing. Sometimes a web server was busy, and since Heroku just routes incoming requests randomly to any dyno, a request could end up queued behind a dyno that was totally stuck due to e.g. database problems. The strange thing is that this was hardly noticeable in New Relic (it's a good idea to uncheck all other resources when viewing things in their charts - then the queuing suddenly appears).
EDIT 21/2 2013: It turned out that the reason it was hardly noticeable in New Relic was that it wasn't being measured! http://rapgenius.com/Lemon-money-trees-rap-genius-response-to-heroku-lyrics
We found this very frustrating and ended up leaving Heroku in favor of dedicated servers. That gave us 20 times better performance at 1/10 of the cost. Additionally, I must say that we were disappointed by Heroku, who at the time denied that the slowness was due to their infrastructure, even though we suspected it and raised it several times. We even got answers like this back:
Heroku 28/8 2012: "If you're not seeing request queueing or other slowness reported in New Relic, then this is likely not a server-side issue. Heroku's internal routing should take <1ms. None of our monitoring systems are indicating any routing problems currently."
Additionally, we spoke to New Relic, who also seemed unaware of the issue, even though they say they have a very close working relationship with Heroku.
Newrelic 29/8 2012: "It looks like whatever is causing this is happening before the Ruby agent's visibility starts. The queue time that the agent records is from the time the request enters a dyno, so the slow down is occurring before then."
The bottom line was that we spent hours and hours optimizing code that wasn't really the bottleneck, and ran at far too high a dyno scale in a desperate attempt to boost performance. The only thing we really got out of it was bigger bills from both Heroku and New Relic - NOT COOL. I'm glad we changed.
PS. At the time there was even a bug that caused New Relic Pro to be charged on ALL dynos, even though we had disabled monitoring on our background worker processes (following New Relic's own advice). It took a lot of time and many emails before the mistake was admitted by both parties.
PPS. If you are not aware of the ongoing discussion, here is the link: http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics
EDIT 26/2 2013
Heroku has just announced in their newsletter that New Relic has released an update that apparently should shed some light on the situation at Heroku.
EDIT 8/4 2013
Heroku has just released an FAQ on the topic.
traceroute is not a good measure of network problems; it's a tool that can find failures along the path, but it won't give you the full picture.
Try putting up a static webpage and hitting it by IP address from your webpage tester. If it is still slow, blame the network.
If for some reason it is fast, then you have a different issue.
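curl can take that measurement directly - a sketch, with a placeholder IP and path:

$ curl -s -o /dev/null -w "dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n" http://203.0.113.10/static.html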

Rails: Execution expired (and memory error)

Using: Rails 3.0.3 & Heroku & Exception Notifier & New Relic
I get a lot of "execution expired" errors throughout my website. I recently realized that one part of the site caused an infinite loop (and thus a memory error).
Question #1: When this infinite loop occurred, is it likely that it affected the entire website, making all other requests wait for it to stop/crash and thus hit "execution expired" (which I believe fires at 30 seconds on Heroku)?
Question #2: My website seems quite slow. Can you recommend a service I can use to pinpoint what is actually taking time? I have seen a graphical service with columns showing how long each part took to load (like image2 = 3 ms, this JavaScript = 3002 ms, and so on). How else can I troubleshoot or handle "execution expired" errors (references to good guides etc. are appreciated)?
1) likely depends on how many dynos you have, but if it's a common issue then they could potentially all be locked up simultaneously.
2) New Relic is excellent. It'll let you pinpoint slow actions, drill in and inspect queries, etc.
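One mitigation not mentioned above: the rack-timeout gem makes the app give up on a stuck request itself, well before Heroku's 30-second router timeout, so a single runaway request can't pin a dyno for the full window. A sketch (the configuration API varies between gem versions):

# Gemfile
gem 'rack-timeout'

# config/initializers/rack_timeout.rb
# 15s is an arbitrary choice; anything comfortably under Heroku's 30s helps.
Rack::Timeout.timeout = 15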

Diagnosing Rails 3 Heroku Slowness

I have a Rails 3 app that I am running on Heroku. The app is usually really fast but sometimes I'll get cases where the app seems to hang for upwards of 2 minutes before finally returning the requested page.
I have the New Relic addon installed and nothing stands out to me. The problem is sporadic and doesn't seem to be connected to a particular controller/action.
How would you suggest I go about pinpointing the cause of this problem?
http://github.com/kyledecot/skateparks-web
Always check the logs. When it happens, immediately go check your logs. Pretty sure all SQL queries are logged and timed, and you might want to add logging and timing to some of your own service calls.
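For the "add logging and timing" part, Ruby's standard-library Benchmark is enough. A sketch, where SomeService stands in for your own code:

require 'benchmark'

elapsed = Benchmark.realtime { SomeService.expensive_call }  # SomeService is hypothetical
Rails.logger.info("SomeService.expensive_call took #{(elapsed * 1000).round}ms")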
If you upgrade to the Pro level of New Relic, you can get detailed traces specifically of your slow transactions. Turn up your Transaction Trace threshold to a large number (1s is pretty big), and wait for traces to show up. You'll see a detailed breakdown of the performance of an individual request, including SQL queries.
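That threshold lives in newrelic.yml; the relevant excerpt looks roughly like this (1.0 means only requests slower than one second produce a trace; the default is apdex_f):

# newrelic.yml (excerpt)
transaction_tracer:
  enabled: true
  transaction_threshold: 1.0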
(Full disclosure: I work for New Relic.)

How do I debug random Timeout::Error: execution expired

We are using Rails 2.3.5 and have been experiencing seemingly random Timeout::Error: execution expired errors. The errors reported by Hoptoad are not consistently in any particular controller and show up everywhere from user sessions to account settings to some of our core functionality controllers.
The vast majority of requests do not time out, but there are enough that do to cause concern.
Is this normal? If so, what are some things to look at to decrease the occurrence? If not, has anyone run into this, and what are some common problems that can trigger an error like this?
It is normal for requests to time out if your server is running under heavy load. Look at whether the timeouts coincide with long-running SQL queries or some other activity that takes a lot of time. Often you can decrease timeouts by upgrading your hardware or by optimizing your code in general. If you can't upgrade your hardware, try optimizing your longest-running and most frequently accessed actions.
