Time to First Byte for heroku server - ruby-on-rails

Our Ruby on Rails website on heroku shows a Waiting (Time to First Byte) in chrome inspector of 1000ms+ on most requests for all pages.
Heroku's logs and New Relic both show total response times of under 200ms however. (This includes request Queuing)
The heroku app has two dynos and doesn't go into idle.
What could account for 800ms on average of missing time?

I believe this discrepancy is due to the time it takes to leave the server vs. the time it takes to be received by the client.

I sent a support ticket to heroku support who were great enough to look into it show the latency was occurring before the request arrived at the heroku server. The application makes use of CloudFlare which was contributing to the latency. CloudFlare had a blog write up on this at https://blog.cloudflare.com/ttfb-time-to-first-byte-considered-meaningles/ which goes into full detail.

Related

Unexplained high response time on Heroku environment

I'm using a paid Heroku ($35/mo 2x web dyno, $50/mo silver postgresdb) plan for a small internal app with typically 1-2 concurrent users. Database snapshot size is less than 1MB.
App is Rails 4.1. In the last two weeks, there's been a significant performance drop in production env, where Chrome dev tools reports response times like 8s
Typical:
Total ​8.13s
Stalled ​3.643 ms
DNS Lookup ​2.637 ms
Initial connection 235.532 ms
SSL ​133.738 ms
Request sent ​0.546 ms
Waiting (TTFB) ​3.43 s
Content Download ​4.47 s
I'm using Nitrous dev environment and get sub-1s response on dev server with non-precompiled assets (with mirrored db).
I'm a novice programmer and am not clear on how to debug this. Why would I see 800% slower performance than the dev environment on an $85/mo+ Heroku plan? Given my current programming skill level, my app is probably poorly optimized (a few N+1 queries in there...) but how bad can it be when production has 1-2 concurrent users??
Sample from logs if it helps:
sample#current_transaction=16478 sample#db_size=18974904bytes sample#tables=28 sample#active-connections=6 sam
ple#waiting-connections=0 sample#index-cache-hit-rate=0.99929 sample#table-cache-hit-rate=0.99917 sample#load-avg-1m=0.365 sample#load-avg-5m=0.45 sample#load-avg-15m=0.445 sample#read-iops=
37.587 sample#write-iops=36.7 sample#memory-total=15405616kB sample#memory-free=1409236kB sample#memory-cached=12980840kB sample#memory-postgres=497784kB
Sample from server logs:
Completed 200 OK in 78ms (Views: 40.9ms | ActiveRecord: 26.6ms
I'm seeing similar numbers on dev server but the actual visual performance is night and day. The dev responds as you would expect - sub-1s response and render. The production server is 4-5s delay.
I don't think it's related to ISP as suggested because I've actually been traveling and seeing identical performance problems from USA and Europe.
Add NewRelic add-on to your app's ecosystem on Heroku and explore what's going on. You can do it from Heroku's dashboard. Choose free plan and you will be provided with 2 weeks trial period of full functionality. After trial period there will be some constraints but anyway it's enough for measuring performance of the small app.
Also you can add the Logentries add-on and you will get access to the app's log history via web interface.

Rails 4 on Heroku extremely slow with loading

I've got Heroku deployment with my Rails 4 app and it's proving to be extremely slow. I'm not sure if my location has a factor as I'm based in Australia
I've got NewRelic addon and below is the problem that I'm seeing.
Category Segment % Time Avg calls Avg Time (ms)
View layouts/users Template 98.4 1.0 16,800
Based on this breakdown, I see that layout users is the problem for the performance (which is nearly 16.8 seconds!).
Is there a good way to profile this to find out exactly what functions are causing this problem and what are the best way to fix those?
Also another important thing to note is that when I go to map report it shows End User of 19.5 seconds which takes up a lot of time.
When an app on Heroku has only one web dyno and that dyno doesn't receive any traffic in 1 hour, the dyno goes to sleep.
When someone accesses the app, the dyno manager will automatically wake up the web dyno to run the web process type. This causing delay for this first request.
Are you noticing similar behaviour?

How to profile inconsistent H12 timeouts on Heroku

My users are seeing occasional request timeouts on Heroku. Unfortunately I can not consistently reproduce them which makes them really hard to debug. There's plenty of opportunity to improve performance - e.g. by reducing the huge number of database queries per request and by adding more caching - but without profiling that's a shot in the dark.
According to our New Relic analytics, many requests take between 1 and 5 seconds on the server. I know that's too slow, but it nowhere near the 30 seconds needed for the timeout.
The error tab on New Relic shows me several different database queries where the timeout occurs, but these aren't particularly slow queries and it can be different queries for each crash. Also for the same URL it sometimes does and sometimes does not show a database query.
How do I find out what's going on in these particular cases? E.g. how do I see how much time it was spending in the database when the timeout occurred, as opposed to the time it spends in the database when there's no error?
One hypothesis I have is that the database gets locked in some cases; perhaps a combination of reading and writing.
You may have already seen it, but Heroku has a doc with some good background about request timeouts.
If your requests are taking a long time, and the processes servicing them are not being killed before the requests complete, then they should be generating transaction traces that will provide details about individual transactions that took too long.
If you're using Unicorn, it's possible that this is not happening because the requests are taking long enough that they're hitting up against Unicorn's timeout (after which the workers servicing those requests will be forcibly killed, not giving the New Relic agent enough time to report back in).
I'd recommend a two-step approach:
Configure the rack-timeout middleware to have a timeout below Heroku's 30s timeout. If this works, it will terminate requests taking longer than the timeout by raising a Timeout::Error, and such requests should generate transaction traces in New Relic.
If that yields nothing (which it might, because rack-timeout relies on Ruby's stdlib Timeout class, which has some limitations), you can try bumping the Unicorn request handling timeout up from its default of 60s (assuming you're using Unicorn). Be aware that long-running requests will tie up a Unicorn worker for a longer period in this case, which may further slow down your site, so use this as a last resort.
Two years late here. I have minimal experience with Ruby, but for Django the issue with Gunicorn is that it does not properly handle slow clients on Heroku because requests are not pre-buffered, meaning a server connection could be left waiting (blocking). This might be a helpful article to you, although it applies primarily to Gunicorn and Python.
You're pretty clearly hitting the issue with long running requests. Check out http://artsy.github.com/blog/2013/02/17/impact-of-heroku-routing-mesh-and-random-routing/ and upgrade to NewRelic RPM 3.5.7.59 - the wait time measuring will be accurately reported.

Strange TTFB (time to first byte) issue on Heroku

We're in the process of improving performance of the our rails app hosted at Heroku (rails 3.2.8 and ruby 1.9.3). During this we've come across one alarming problem for which the source seems to be extremely difficult to track. Let me quickly explain how we experience the problem and how we've tried to isolate it.
--
Since around June we've experienced weird lag behavior in Time to First Byte all over the site. The problems is obvious from using the site (sometimes the application doesn't respond for 10-20 seconds), and it's also present in waterfall analysis via webpagetest.org.
We're based in Denmark but get this result from any host.
To confirm the problem we've performed a benchmark test where we send 300 identical requests to a simple page and measured the response time.
If we send 300 requests to the front page the median response time is below 1 second, which is fairly good. What scares us is that 60 requests takes more that double that time and 40 of those takes more than 4 seconds. Some requests take as much as 16 seconds.
None of these slow requests show up in New Relic, which we use for performance monitoring. No request queuing shows up and the results are the same no matter how high we scale our web processes.
Still, we couldn't reject that the problem was caused by application code, so we tried another experiment where we responded to the request via rack middleware.
By placing this middleware (TestMiddleware) at the beginning of the rack stack, we returned a request before it even hit the application, ensuring that none of the following middleware or the rails app could cause the delay.
Middleware setup:
$ heroku run rake middleware
use Rack::Cache
use ActionDispatch::Static
use TestMiddleware
use Rack::Rewrite
use Rack::Lock
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use ActionDispatch::RemoteIp
use Rack::Sendfile
use ActionDispatch::Callbacks
use ActiveRecord::ConnectionAdapters::ConnectionManagement
use ActiveRecord::QueryCache
use ActionDispatch::Cookies
use ActionDispatch::Session::DalliStore
use ActionDispatch::Flash
use ActionDispatch::ParamsParser
use ActionDispatch::Head
use Rack::ConditionalGet
use Rack::ETag
use ActionDispatch::BestStandardsSupport
use NewRelic::Rack::BrowserMonitoring
use Rack::RailsExceptional
use OmniAuth::Builder
run AU::Application.routes
We then ran the same script to document response time and got pretty much the same result. The median response time was around 130ms (obviously faster because it doesn't hit the app. But still 60 requests took more than 400ms and 25 requests took more than 1 second. Again, with some requests as slow as 16 seconds.
One explanation could be related to slow hops on the network or DNS setup, but the results of traceroute looks perfectly OK.
This result was confirmed from running the response script on another rails 3.2 and ruby 1.9.3 application hosted on Heroku - no weird behavior at all.
The DNS setup follows Heroku's recommendations.
--
We're confused to say the least. Could there be something fishy with Heroku's routing network?
Why the heck are we seeing this weird behavior? How do we get rid of it? And why can't we see it in New Relic?
It Turned out that it was a kind of request queuing. Sometimes, that web server was busy, and since heroku just routs randomly incoming requests randomly to any dyno, then I could end up in a queue behind a dyno, which was totally stuck due to e.g. database problems. The strange thing is, that this was hardly noticeable in new relic (it's a good idea to uncheck all other resources when viewing thins in their charts, then the queuing suddenly appears)
EDIT 21/2 2013: It has turned out, that the reason why it wasn't hardly noticeable in Newrelic was, that it wasn't measured! http://rapgenius.com/Lemon-money-trees-rap-genius-response-to-heroku-lyrics
We find this very frustrating, and we ended up leaving Heroku in favor of dedicated servers. This gave us 20 times better performance at a 1/10 of the cost. Additionally I must say that we are disappointed by Heroku who at the time this happened, denied that the slowness was due to their infrastructure even though we suspected it and highlighted it several times. We even got answers like this back:
Heroku 28/8 2012: "If you're not seeing request queueing or other slowness reported in New Relic, then this is likely not a server-side issue. Heroku's internal routing should take <1ms. None of our monitoring systems are indicating any routing problems currently."
Additionally we spoke to Newrelic who also seemed unaware of the issue, even though they according to them selfs has a very close work relationship with Heroku.
Newrelic 29/8 2012: "It looks like whatever is causing this is happening before the Ruby agent's visibility starts. The queue time that the agent records is from the time the request enters a dyno, so the slow down is occurring before then."
The bottom-line was, that we ended up spending hours and hours on optimizing code that wasn't really the bottleneck. Additionally running with a too high dyno scale in a desperate try to boost our performance, but the only thing that we really got from this was bigger receipts from both Heroku and Newrelic - NOT COOL. I'm glad that we changed.
PS. At that time there even was a bug that caused newrelic pro to be charged on ALL dynos even though we, (according to Newrelics own advice), had disabled the monitoring on our background worker processes. It took a lot of time and many emails before the mistake was admitted by both parties.
PPS. If you are not aware of the current ongoing discussion, then here is the link http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics
EDIT 26/2 2013
Heroku has just announced in their newsletter, that Newrelic has released an update that apparently should cast some light on the situation at Heroku.
EDIT 8/4 2013
Heroku has just released an FAQ over the topic
traceroute is not a good measure of problems in the network, its a tool that can find failures along the network, but it will not show you the best view.
Try just putting up a static webpage and hit it with the ip address from your webpage tester. If it is still slow, blame the network.
If for some reason it is fast, then you have a different issue.

Heroku. Request taking 100ms, intermittently Times out

After performing load testing against an app hosted on Heroku, I am finding that the most DB intensive request takes 50-200ms depending upon load. It never gets slower, no matter the load. However, seemingly at random, the request will outright timeout (30s or more).
On Heroku, why might a relatively high performing query/request work perfectly 8 times out of 10 and outright timeout 2 times out of 10 as load increases?
If this is starting to seem like a question for Heroku itself, I'm looking to first answer the question of whether "bad code" could somehow cause this issue -- or if it is clearly a problem on their end.
A bit more info:
Multiple Dynos
Cedar Stack
Dedicated Heroku DB (16 connections, 1.7 GB RAM, 1 comp. unit)
Rails 3.0.7
Thanks in advance.
Since you have multiple dynos and a dedicated DB instance and are paying hundreds of dollars a month for their service, you should ask Heroku
Edit: I should have added that when you check your logs, you can look for a line that says "routing" That is the Heroku routing layer that takes HTTP request and sends them to your app. You can add those up to see how much time is being spent outside your app. Unfortunately I don't know how easy it is to get large volumes of those logs for a load test.

Resources