I have a Rails 3 app that I am running on Heroku. The app is usually really fast but sometimes I'll get cases where the app seems to hang for upwards of 2 minutes before finally returning the requested page.
I have the New Relic addon installed, and nothing in it stands out to me. The slowdowns are sporadic and don't seem to be tied to any particular controller/action.
How would you suggest I go about pinpointing the cause of this problem?
http://github.com/kyledecot/skateparks-web
Always check the logs. When it happens, immediately go check your logs. Pretty sure all SQL queries are logged and timed, and you might want to add logging and timing to some of your own service calls.
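For your own code, Benchmark.realtime plus the Rails logger is enough to get started. This is only a minimal sketch; GeocodeLookup and fetch_nearby are hypothetical names standing in for whichever service call you suspect:

require 'benchmark'
require 'net/http'

class GeocodeLookup
  # Hypothetical service object; swap in the external call you want timed.
  def self.fetch_nearby(lat, lng)
    result = nil
    elapsed = Benchmark.realtime do
      result = Net::HTTP.get(URI.parse("http://example.com/nearby?lat=#{lat}&lng=#{lng}"))
    end
    Rails.logger.info("GeocodeLookup.fetch_nearby took #{(elapsed * 1000).round}ms")
    result
  end
end

Lines like that in the Heroku logs make it easy to see which call was in flight when a request hung.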
If you upgrade to the Pro level of New Relic, you can get detailed traces specifically of your slow transactions. Turn up your Transaction Trace threshold to a large number (1s is pretty big), and wait for traces to show up. You'll see a detailed breakdown of the performance of an individual request, including SQL queries.
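If you go that route, the threshold lives in config/newrelic.yml. A sketch of the relevant excerpt (double-check the option names against your agent version's docs):

common: &default_settings
  transaction_tracer:
    enabled: true
    transaction_threshold: 1.0   # seconds; only requests slower than this produce a trace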
(Full disclosure: I work for New Relic.)
I have a server on Heroku - 3 dynos, 2 processes each.
The server does 2 things:
It responds to requests from the browser (AJAX and some web pages), based on data stored in a PostgreSQL database
It exposes a REST API to update the data in the database. This API is called by another server, and the rate of calls is limited: the other server only calls mine through a queue with a single worker, so it never issues more than one request in parallel (I've verified that it doesn't).
When I look at New Relic, I see the following graph, which suggests that even though the other server issues at most one request at a time, those calls still load my server enough to create queueing peaks.
I'd expect that since the rate of calls from the other server is limited, my server wouldn't get overloaded, because a request only starts when the previous one has ended (I'm guessing the database might get overloaded if it receives an update request, returns, but continues processing after that).
What can explain this behaviour?
Where else can I look in order to understand what's going on?
Is there a way to avoid this behaviour?
There are a whole lot of directions this investigation could go, but from your screenshot and some inferences, I have two guesses.
A long query—You'd see this graph if your other server or a browser occasionally hits a slow query. If it's just a long read query and your DB isn't hitting its limits, it should only affect the process running the query, but if the query takes an exclusive lock, all dynos will have to wait on it. Since the spikes are so regular, first think of anything you have running on a schedule; if the cadence matches, you probably have your culprit. The next simple thing to do is run heroku pg:long-running-queries and heroku pg:seq-scans. The former shows queries that might need optimization, and the latter shows full table scans you can probably fix with a different query or a better index (there's a migration sketch after this list). You can find similar information in New Relic's Database tab, which has time and throughput graphs you can try to match against your queueing spikes. Finally, look at New Relic's Transactions tab.
There are various ways to sort - slowest average response time is probably going to help, but check out all the options and see if any transactions stand out.
Click on a suspicious transaction and look at the graph on the right. If you see spikes matching your queueing buildups, that could be it, but since it looks to be affecting your whole site, watch out for several transactions seeing correlated slowdowns.
Check out the transaction traces at the bottom. Something in there taking a long time to run is as close to a smoking gun as you'll get. This should correlate with pg:long-running-queries.
Look at the breakdown table between the graph and the transaction traces. Check for things that are taking a long time (e.g. a 2-second external request) or happening often (e.g. a partial that gets rendered 2,500 times per request). Those are places for caching or optimization.
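If pg:seq-scans does turn up a hot table, the fix is usually just an index. A hypothetical migration sketch - the table and column names are placeholders, not taken from your app:

class AddIndexToEventsOnAccountId < ActiveRecord::Migration
  def self.up
    # Index the column your slow WHERE clause filters on.
    add_index :events, :account_id
  end

  def self.down
    remove_index :events, :account_id
  end
end

After deploying, running EXPLAIN on the slow query from heroku pg:psql will confirm whether the planner actually uses the new index.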
Garbage collection—This is less likely, because Ruby GCs all the time and there's no reason it would show spikes on that regular a cadence, but if there's a regular request that allocates a ton of objects, both building the objects and cleaning them up will take time. It would only affect one dyno at a time, and it would correlate with a long or highly repetitive query in your New Relic investigation. You can see some stats about this in New Relic's Ruby VM tab.
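For a quick read on GC cost along a suspect code path, GC::Profiler (Ruby 1.9.2+) is enough for a rough check in a console; SomeHeavyReport here is a hypothetical stand-in for the suspect request's work:

GC::Profiler.enable
SomeHeavyReport.generate            # hypothetical stand-in for the suspect code path
puts GC::Profiler.result            # per-GC-run timing table
puts "GC runs so far: #{GC.count}"
GC::Profiler.disable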
Take a look at your dyno and DB memory usage too. Both are printed to the Heroku logs, and if you add Librato, they'll build some automatic graphs that are quite helpful. If your dyno is swapping, performance will suffer and you should either upgrade to a bigger dyno or run fewer processes per dyno. Processes will typically accumulate memory as they run and never quite release as much as you'd like, so tune it so that right before a restart, your dyno is just under its available RAM. Similarly for the DB, if you're hitting swap there, query performance will suffer and you should upgrade.
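If you're on Puma, keeping processes-per-dyno tunable from the environment makes that tuning painless. A sketch of config/puma.rb - the 2-worker/5-thread defaults are placeholders, and Unicorn's worker_processes setting plays the same role:

workers Integer(ENV['WEB_CONCURRENCY'] || 2)     # processes per dyno
threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count

preload_app!

on_worker_boot do
  # Each forked worker needs its own database connection.
  ActiveRecord::Base.establish_connection
end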
Other things it could be, but probably isn't in this case:
Sleeping dynos—Heroku puts a dyno to sleep if it hasn't served a request in a while, but only if you have just 1 dyno running. You have 3, so this isn't it.
Web Server Concurrency—If at any given moment, there are more requests than available processes, requests will be queued. The obvious fix is to increase the available dynos/processes, which will put more load on your DB and potentially move the issue there. Since some regular request is visible every time, I'm guessing request volume is low and this also isn't your problem.
Heroku Instability—Sometimes, for no obvious reason, Heroku starts queueing requests more than it should and doesn't report any issues at status.heroku.com. Restarting the dynos typically fixes that temporarily while Heroku gets their head back on straight.
My Rails app, according to my Heroku logs, is serving requests in an average of about 1700 to 2500 milliseconds (that's the entire round trip). I used New Relic to profile my app, and it seems that the majority of each request's time is spent not in my database but in the "Web Transaction" section of New Relic. The "Controller" category tends to be the slowest, followed by the "SQL - SELECT" segment in the "Database" category.
I'm not quite sure what could be causing the performance bottleneck in my controllers, nor do I think I can dig deeper into New Relic without paying for the premium version. I recently added indexes to the foreign keys of my application, although I don't think this made much of a difference in database response times.
I know this is not enough information to figure out what is causing these bottlenecks, but I do not even know where to start or what info to give. If people could tell me what info is needed to diagnose these issues, then that would be helpful to me.
New Relic for Ruby includes a free, standalone developer mode. When running in RAILS_ENV=development, the New Relic gem adds a route that will show you a detailed profile for each request. Go to http://localhost:3000/newrelic after you hit your app a few times.
The profile includes time for each SQL query, as well as for components of your code. You can use custom instrumentation to break down big chunks of code into smaller segments (or individual methods) that get timed separately. This feature is a lot like the transaction traces you get in the paid Pro version, one major difference being that you wouldn't want to run the free dev mode in production.
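The custom instrumentation is just a mixin plus a class-level call. A sketch, where GeoSearch#nearby_spots is a hypothetical method you suspect is slow (check the metric-name convention against the docs for your agent version):

require 'new_relic/agent/method_tracer'

class GeoSearch
  include ::NewRelic::Agent::MethodTracer

  def nearby_spots(lat, lng)
    # ... the slow work you want timed as its own segment ...
  end

  add_method_tracer :nearby_spots, 'Custom/GeoSearch/nearby_spots'
end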
(Full disclosure: I work for NR. Not many people know about the free dev mode, though, so I thought it was worth mentioning.)
You could potentially make JavaScript loading appear even faster with something like head.js, which will load your JS files asynchronously and in parallel.
Take a look at this slide show:
http://www.slideshare.net/drhenner/optimize-the-obvious-7636674
Might not be enough but it goes through some common faults.
Digging a little deeper, take a look at this video: http://windycityrails.org/videos2011/#2
It is longer but gives a lot of places to look.
On a different note: do you use a CDN?
Using: Rails 3.0.3 & Heroku & Exception Notifier & New Relic
I get a lot of "execution expired" errors throughout my website. I recently realized that one part of the site caused an infinite loop (and thus a memory error).
Question #1: Is it likely that, when this infinite loop occurred, it affected the entire website, making all other requests wait for it to stop/crash and thus hit "execution expired" (which I believe kicks in at 30 seconds on Heroku)?
Question #2: It also seems like my website is quite slow in general. Can you recommend a service I can use to pinpoint what is actually taking the time? I have seen graphical services before with columns showing how long each part took to load (like image2 = 3 ms, this JavaScript = 3002 ms, and so on). How else can I troubleshoot or handle "execution expired" errors (referrals to good guides etc. are appreciated)?
1) It likely depends on how many dynos you have, but if it's a common issue then they could potentially all be locked up simultaneously.
2) New Relic is excellent. It'll let you pinpoint slow actions, drill in and inspect queries, etc.
Using: Rails 3.0.3. Web host: Heroku.com. 2 dynos & 0 workers.
I am a bit of a beginner with Rails and just released my first project. Users are experiencing intermittent problems that, as they describe them, amount to "I get a blank screen with a message that the page needs to reload". Unfortunately I can't get it explained any better than that (I have only a one-way communication channel from the users).
I also get this error in the logs:
2011-11-09T19:00:12+00:00 heroku[web.1]: Process running mem=598M(116.8%)
2011-11-09T19:00:12+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
which seems pretty straightforward.
I have about 4,000 visitors a day and about 10,000 page views.
Edit: I also have New Relic and Exception Notifier installed. I get a lot of "execution expired" errors.
What I would like to know now is:
How can I find these intermittent errors (I have no timestamps)? What should I search for in the logs (what string)?
Do memory problems cause the web browser to crash and reload (or something similar)? Or is that related to JavaScript problems?
Most importantly: how can I test my application to see where it is the most memory intensive? I know my code is far from perfect, so I need to find the bad parts.
Once again, this is my first project so the solutions might be easy but please help me out.
Are you using ImageMagick (specifically RMagick)? People have reported issues with its memory management in the past: https://groups.google.com/group/dragonfly-users/browse_thread/thread/67f88d9a2e085b7a?pli=1&auth=DQAAAIUAAABUdJ8RK3XRKIAvXno2rkOsd8OzwcKqNX3T21NjURsvINiRoHH-S_786Si2mphcOdRDmfGrjir6hBMLwj4xv6LE89Dd62ng2xmCArP3lcZZbw7-wXCBNS5BiaSeDVy-z46gHUHiVC21vEMWOBKMYMn7kMnJZhWXr1EcfZqb1KQNaGhwal2KLCmYxThW99pWLtE
Install the New Relic Standard addon - that will give you insight into your application and what's going on. The 'Dynos' tab will show you the memory utilisation of your application. That sounds like awfully high memory utilisation for the level of traffic you're reporting, but it depends on your application - if you're seeing memory errors in the log, then performance will be suffering; see http://devcenter.heroku.com/articles/error-codes#r14__memory_quota_exceeded
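If you also want rough per-request memory numbers in your own logs, a small piece of Rack middleware is enough. This is a sketch of my own (the class name and approach aren't from any gem), and it assumes a Linux dyno where /proc/self/status is readable:

# Register in config/application.rb with: config.middleware.use "MemoryLogger"
class MemoryLogger
  def initialize(app)
    @app = app
  end

  def call(env)
    before = rss_kb
    status, headers, body = @app.call(env)
    Rails.logger.info("mem #{before}KB -> #{rss_kb}KB for #{env['PATH_INFO']}")
    [status, headers, body]
  end

  private

  def rss_kb
    # Resident set size of this process, in kilobytes.
    File.read('/proc/self/status')[/VmRSS:\s+(\d+)/, 1].to_i
  end
end

Requests whose after-minus-before delta is consistently large are the ones worth profiling further.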
Are you using any kind of error handling? You could install the Airbrake addon so you get notification of errors, or use the Exception Notifier gem, which will email you errors as they occur. Once you have these in place you'll know what's occurring - whether it's in the application, or, if you don't receive any notifications, whether it's down to outside factors like the visitor's internet connection.
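For the email route, the exception_notification gem on Rails 3 is wired up as middleware. A sketch for config/environments/production.rb - the addresses are placeholders, and the option names have changed between gem versions, so check the README of the version you install:

config.middleware.use ExceptionNotifier,
  :email_prefix         => "[MyApp] ",
  :sender_address       => %{"Notifier" <notifier@example.com>},
  :exception_recipients => %w{you@example.com}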
Website built on: Rails 3.0.3 & Heroku
Installed: Exception Notifier & New Relic
I am rewriting this question since my previous attempt was unclear and subjective, hope this works better.
I have a website where users can perform calculations. Once in a while I get reports from users, through my one-way communication channel, that "the website crashes and tells me I need to restart IE, but it still doesn't work", which is about as specific as the information gets.
I get no timestamps so I cannot look for it in the logs (Heroku only gives me 2000 lines of logs), I get no exception notifications, and I cannot make the error appear myself, so I would like your help with the following:
What would make a website crash in the way that it would tell the user to restart the browser? I have never even heard of that! What should I look for in the logs, if I can get timestamps for the errors?
Assuming it is a JavaScript problem (which seems likely): how could I troubleshoot this issue? What tools can I use? Firebug does not give me any errors.
Assuming it is an IE version issue: how can I test the application in a systematic manner (without installing/reinstalling different versions)? Are there any services that can test an application across different browsers?
It seems to work for most users/combinations. Do you have an older version of IE installed and can you reproduce this error? Site: www.countcalculate.com (try any calculation).
Probably related to a very intensive loop. For some reason IE thinks it's appropriate to block the UI thread while JavaScript is executing, so the whole thing will freeze up if your JavaScript breaks.
I can't reproduce the issue, so I'd suggest trying to get more detailed reports from your customers.
The problem was (apparently) limited to IE8 & XP users. That combination conflicted with a bug in jQuery 1.6.2, according to http://bugs.jquery.com/ticket/9981.
Downgrading to 1.6.1 solved the problem.