Why Heroku request times are slower / faster after deploy - ruby-on-rails

I'm using Heroku for a production Rails application.
I'm monitoring it with Scout APM and noticed that request times can be up to 4 times slower or faster after a deploy to production.
I took some screenshots this time, but this has happened multiple times; if I'm lucky, it will be fast after a deploy.
The deployment contains just a CSS update.
Heroku's stats also show slower response times:

I'd guess that the slower times are due to caches being cleared when you deploy and the server restarts. As each request gets cached again, you would expect it to speed back up.
I can't offer an explanation for the faster times, though.
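For example, if the app uses Rails' in-process :memory_store, every deploy (which restarts all dynos) starts with an empty cache, whereas an external store survives restarts. A minimal sketch of the difference; the environment variable, the model, and the expensive_lookup helper below are made up for illustration:

```ruby
# config/environments/production.rb
#
# An in-process store is emptied every time the dynos restart (i.e. on every
# deploy), so the first requests afterwards rebuild it from scratch:
#   config.cache_store = :memory_store
#
# An external store survives deploys; the env var here is a stand-in for
# whatever your Memcached/Redis add-on actually provides:
config.cache_store = :mem_cache_store, ENV.fetch("MEMCACHE_SERVERS", "localhost")

# Somewhere in the app -- the first call after a deploy pays the full cost of
# the block, later calls are served from the cache. Names are hypothetical.
def expensive_lookup(id)
  Rails.cache.fetch(["expensive_lookup", id], expires_in: 12.hours) do
    Product.includes(:variants).find(id)
  end
end
```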

Related

Strange postgres slow query every day around the same time on Heroku

I have a production app on Heroku with Postgres, serving a Rails API. Every day between 7-8am there will be one or more long-running requests (about 20s or more), and it only occurs then. According to the logs, the time is spent in the database.
There are no scheduled jobs
Backups are not scheduled anywhere close to that time
Traffic is at its lowest at that time
Memory is stable
There are other requests between the daily restart and the slow one, so the dynos are not "cold"
It is not always the same endpoint, and it doesn't occur any other time
I'm not sure if it means anything, but the timezone is Singapore (GMT+8), so 7-8am local is right before midnight UTC.
Has anyone else experienced this or has ideas to troubleshoot?
EDIT to add details:
You can see here (Scout APM) that it precedes the high throughput, basically before the start of the work day, so it's not due to load.
In fact there is pretty much no load at all. Nor is it the first request since the server restart (at 4am). The slow request at 7:46am here was repeated at night (same URL, same query string) and finished in under a second.
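One way to get more to go on when it happens again is to log any SQL statement that crosses a threshold, with a timestamp, so the 7-8am outlier can be matched to an exact query. A rough sketch using ActiveSupport::Notifications; the five-second threshold is an arbitrary assumption:

```ruby
# config/initializers/slow_sql_logger.rb
# Logs any SQL statement slower than the threshold, with a UTC timestamp, so a
# daily outlier can be tied to the exact query and the exact time it ran.
SLOW_SQL_THRESHOLD = 5.0 # seconds -- assumed; pick whatever separates outliers

ActiveSupport::Notifications.subscribe("sql.active_record") do |_name, started, finished, _id, payload|
  duration = finished - started
  if duration > SLOW_SQL_THRESHOLD
    Rails.logger.warn(
      "[slow sql] #{duration.round(2)}s at #{started.utc.iso8601} " \
      "name=#{payload[:name]} sql=#{payload[:sql]}"
    )
  end
end
```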

Randomly degrading heroku DB performance?

I am working on a fairly high-traffic Rails/Heroku/Postgres app (backend only). After running for hours, days, or sometimes weeks, the database will randomly start taking 120 seconds to perform queries that usually take 2-3 seconds, and it clears up as soon as the app is restarted and everyone is essentially "kicked off". What could cause a database to start taking a ridiculously long time to perform all queries? The database is not running out of memory, it is being vacuumed regularly, and it is not running out of connections. There are around 500 users at times, the dynos are autoscaling, and the web server is Passenger. However, this is probably something with PG, as it is happening at the query level.
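Given that a restart (which drops every connection) clears it up, one thing worth ruling out is a long-lived or idle-in-transaction session holding locks or blocking vacuum. A hedged sketch you could run from a Rails console; the 60-second cutoff is an assumption, and wait_event_type needs Postgres 9.6 or newer:

```ruby
# Lists backend sessions whose transaction has been open for over a minute.
# Long "idle in transaction" sessions hold locks and stop vacuum from
# reclaiming dead rows, which can slow every other query until they close.
rows = ActiveRecord::Base.connection.select_all(<<-SQL)
  SELECT pid, state, wait_event_type, xact_start, left(query, 80) AS query
  FROM pg_stat_activity
  WHERE xact_start IS NOT NULL
    AND now() - xact_start > interval '60 seconds'
  ORDER BY xact_start
SQL

rows.each { |row| puts row.inspect }
```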

ActiveRecord::QueryCache#call slow on Heroku with pg:backups

Lately we've had trouble with our Rails 4.2.7.1 app every night: we start seeing a bunch of really slow ActiveRecord::QueryCache#call calls even though our traffic is relatively low in the middle of the night:
We're running on Heroku using Puma, and the app is very job-heavy, for which we use Sidekiq. During the day it works fine, but every night we get these spikes of extremely slow response times via the API that seem to originate with ActiveRecord::QueryCache#call.
The only thing I can find in our app that might be causing this is that we have heroku pg:backups enabled, and on the night of the above image the backup began running at 3:06, which is the exact time you see the first ActiveRecord::QueryCache#call spike in the New Relic graph. The backup finished an hour later (around the biggest spike), but as you can see the spikes continued until around 5am.
Could this be caused by pg:backups? (Our database is about 19GB.) Or could it be something else entirely? Is there a good way to avoid this cache call or speed it up? I don't fully understand WHY it would be so slow or why it shows up in the transaction list at all. Any recommendations?
Funnily enough, we've been investigating this lately after seeing similar behaviour. There is a definite performance hit caused by pg:backups on large databases. Notice the big spike just after 1am, when the backup kicks in:
DB size is >100GB
It's not that surprising, and in fact Heroku do have documentation on this, which suggests that you should only use pg:backups for databases under 20GB.
For larger databases, creating a follower and taking the backup from that is preferable. Annoyingly, for high-availability databases, it doesn't appear that you can read from the standby.
I can't shed much light on ActiveRecord::QueryCache though, so the rest of this post is speculation, and maybe the starting point for further investigation. Happy to delete/amend if someone more knowledgeable can weigh in :-)
Heroku's docs do say that the backup process will evict well-cached data from non-Postgres caches, so this could represent your workers repopulating that cache many times over.
It may also be worth having a look at this. Could your workers be reusing connections and receiving dirty query caches?
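If you want to test the dirty-query-cache theory, one option is to clear ActiveRecord's query cache around every job. This is only a sketch under that assumption; the middleware class name is made up, and whether it helps depends on whether the query cache is even active for your workers:

```ruby
# config/initializers/sidekiq.rb
# Hypothetical middleware: clears ActiveRecord's query cache around every job,
# so a reused connection can never serve results cached by earlier work.
class ClearQueryCacheMiddleware
  def call(_worker, _job, _queue)
    ActiveRecord::Base.connection.clear_query_cache
    yield
  ensure
    ActiveRecord::Base.connection.clear_query_cache
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add ClearQueryCacheMiddleware
  end
end
```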

Heroku: request taking 100ms, intermittently times out

After performing load testing against an app hosted on Heroku, I am finding that the most DB-intensive request takes 50-200ms depending on load. It never gets slower, no matter the load. However, seemingly at random, the request will outright time out (30s or more).
On Heroku, why might a relatively high-performing query/request work perfectly 8 times out of 10 and outright time out 2 times out of 10 as load increases?
If this is starting to seem like a question for Heroku itself, I'm looking to first answer the question of whether "bad code" could somehow cause this issue -- or if it is clearly a problem on their end.
A bit more info:
Multiple Dynos
Cedar Stack
Dedicated Heroku DB (16 connections, 1.7 GB RAM, 1 comp. unit)
Rails 3.0.7
Thanks in advance.
Since you have multiple dynos and a dedicated DB instance and are paying hundreds of dollars a month for their service, you should ask Heroku.
Edit: I should have added that when you check your logs, you can look for lines that say "routing". That is the Heroku routing layer that takes HTTP requests and sends them to your app. You can add those up to see how much time is being spent outside your app. Unfortunately I don't know how easy it is to get large volumes of those logs for a load test.
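To add those up, you can sum the timing fields from the router lines. This sketch assumes the `heroku[router]` key-value log format (connect=, service=, and on older stacks wait=/queue=) and a log file you have already pulled down via a drain or `heroku logs`:

```ruby
# router_times.rb (hypothetical helper) -- sums the router timing fields so
# you can see how much of the total latency is spent outside the app.
totals = Hash.new(0.0)

ARGF.each_line do |line|
  next unless line.include?("heroku[router]")
  # Lines look roughly like: ... connect=1ms service=9ms status=200 ...
  line.scan(/(connect|service|wait|queue)=(\d+)ms/) do |field, ms|
    totals[field] += ms.to_i
  end
end

totals.each { |field, ms| puts format("%-8s %.1fs total", field, ms / 1000.0) }
```

Run it as `ruby router_times.rb router.log` against whatever slice of logs you captured during the load test.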

Response time increasing (worsening) over time with consistent load

Ok. I know I don't have a lot of information. That is, essentially, the reason for my question. I am building a game using Flash/Flex and Rails on the back-end. Communication between the two is via WebORB.
Here is what is happening. When I start the client, an operation calls the server every 60 seconds (not much, right?), which results in two database SELECTs, an UPDATE, and a response to the client.
This repeats every 60 seconds. I deployed a test version on Heroku, and New Relic RPM told me that response time degraded over time. One client, with one task every 60 seconds. Over several hours the response time drifted from 150ms to over 900ms.
I have been able to reproduce this in my development environment (my Macbook Pro) so it isn't a problem on Heroku's side.
I am not doing anything sophisticated (by design) in the server app. An action gets called, gets some data from the database, performs an AR update and then returns a response. No caching, etc.
Any thoughts? Anyone? I'd really appreciate it.
What does the development log say is slow for those requests? The view or the db? If it's the db, check how many records there are in the database and see how to optimize the queries. Maybe you need to index some fields.
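For example, if the two SELECTs filter on an unindexed column, each 60-second poll scans more and more rows as the table grows, which would match the slow drift you're seeing. A hedged migration sketch; the table and column names are invented, so substitute whatever your queries actually filter on:

```ruby
# db/migrate/xxxxxxxxxxxxxx_add_index_to_game_states.rb (hypothetical)
class AddIndexToGameStates < ActiveRecord::Migration
  # Without an index, each poll's SELECTs scan the whole table, so they get
  # slower as rows accumulate; an index keeps lookup time roughly flat.
  def self.up
    add_index :game_states, :player_id
  end

  def self.down
    remove_index :game_states, :player_id
  end
end
```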
Are you running locally in development or production mode? I've seen Rails apps' performance degrade faster (memory usage) over time in development mode. I'm not sure if one can run an app on Heroku in development mode, but if I were you I would check into that.
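For reference, the setting that differs most between the two modes is class caching; a short sketch of the Rails defaults (from config/environments/development.rb and production.rb):

```ruby
# config/environments/development.rb
# Classes and code are reloaded on every request, which is slower and lets
# memory use creep upward during a long local session.
config.cache_classes = false

# config/environments/production.rb
# Code is loaded once at boot; this is the mode your Heroku dynos run in.
config.cache_classes = true
```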
