I am working on the free and fair use of Heroku for a Rails Postgres application.
Regarding the pricing on Heroku Postgres, I can see a connection limit (set to 120 for the cheapest offer).
What does this connection limit mean?
I have seen it which may deal with the parameter max_connections but even that, I do not really get how it works.
If I have my rails server receiving an order and request my database for an update or simple select with always the same database Postgres user: does it count for one connection?
I do not think so but I would like to get more details on this limit.
Thank you very much.
max_connection defines the number of Postgres instances running simultaneously to serve queries. max_connections open up multiple connection for same postgres user. It can increase performance in some cases but this can lead to poor performance like these:
Disk contention: If you are using disk(not RAM) then more tables and indexes to be accessed at the same time, causing heavier seeking all over the disk
RAM: This leads more RAM usage.
Synchronization: You need to handle synchronization of multiple instances which will effect the overall throughput.
If you are user base increases, then you may face too many clients error due to limited number of connections. Then there is a concept of
The solution is to two fold:
pool connections: If Postgres is failing to handle the situation, you can use tools like pgbouncer, repmgr and PgPool to handle pool connection before going to Postgres.
Application optimization: Try to open as less connections as possible and close the connection as soon as work is done.
You can study further on this from following links:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
http://www.postgresql.org/docs/current/static/runtime-config-connection.html#GUC-MAX-CONNECTIONS
You can study particularly for Ruby here: https://devcenter.heroku.com/articles/concurrency-and-database-connections#maximum-database-connections
Related
I have a Rails application that I've been developing for quite some time. All this time I tested it locally and on a DEV server. On the DEV server, next to the deployed application, there is also a PG database. And there were no problems with connections. I think there is simply no connection limit, or it is too high - not so important.
Today I started doing deployment to the PROD server. It is similar in power to that for DEV, but BD is already in the DO Database. By the way, the servers themselves are also located in DigitalOcean.
The problem is that DO Database has a limit of 20 connections. And as far as I understand, exceeding this limit - the Rails application gives an error:
ActiveRecord::ConnectionNotEstablished (FATAL: remaining connection slots are reserved for non-replication superuser connections)
The most obvious option is to reduce the number of requests on page load. But this still did not solve the problem if, for example, the number of users increases.
Can you please tell me which way to look? Are there any solutions to the problem other than updating the DO Database power?
You might want to try PG Bouncer (never tried it though, so i can't really tell how it will impact the app).
I'm making a new web app (Rails 6) that I would like to use websockets with. This is the first time I've attempted to use ActionCable with Rails. Ideally to start I would toss the app up on Heroku, since it is generally a fast and convenient way for me to get started with an app project.
If my app is successful I wouldn't be surprised if it had 200 concurrent users. This doesn't seem like a lot to me. I assume many of them would have open other tabs, so my estimate would be roughly 500 websocket connections for 200 users. If my site were more successful, I could see this going to 500 active users so more than 1000 websocket connections.
After preliminary research, it is recommended to use Redis on Heroku for ActionCable. I was dismayed to see that a Redis plan (Redistogo) with 500 websocket connections would cost $75/month and 1000 connections $200/month. More than dismayed, really. Shocked. Am I missing something? Why is this so expensive? It seems like these plans are also linked to large amounts of storage space, something that (my understanding) ActionCable doesn't even need.
I've seen that it's also possible (in theory) to configure ActionCable to use Postgresql as its adapter, though looking online I didn't see any examples of anyone doing this successfully on Heroku. So my questions are these:
Is it possible to use ActionCable on Heroku without paying through the nose? I assume this would mean Postgresql adapter...
How many connections could such a setup handle, what are the limitations, bottlenecks, etc?
I wasn't able to find much information about ActionCable and Postgresql, except that the number of pools has to match. I assume the pool number is not the same as the websocket connection count?
Thanks for any info...
I’ve set a client up with Heroku for their Ruby on Rails application and have had a great deal of trouble over the years with their application not running well regardless of how much money we spend on additional resources, find their documentation highly confusing. I’ve never been able to understand their specific terminology and documentation. We are constantly getting "H12" errors and "R14" errors etc. The memory usage and dyno loads are constantly spiking. And yet this is a small to medium-sized business without a massive amount of traffic. Wondering if anybody out there who does understand the ins and outs of Heroku can look this configuration over and tell me if it makes sense:
DB_POOL: 10
MALLOC_ARENA_MAX: 2
RAILS_MAX_THREADS: 5
WEB_CONCURRENCY: 4
Ruby 2.7
Rails 6.0
Puma
8 2x web dynos
5 1x worker dynos
$50 Postgres standard 0 database
$15 Memcachier
$10 Rediscloud
...etc addons
Your WEB_CONCURRENCY is too high for your Standard-2x dynos. The recommended default is 2: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#recommended-default-puma-process-and-thread-configuration
This is likely contributing to your R14 errors as higher web concurrency means more memory usage. So you need to either lower your web concurrency (which may mean you also need to increase the # of dynos to compensate) or you need to use bigger dynos.
You already have MALLOC_ARENA_MAX=2 but not sure if you are using jemalloc. You might want to try that too.
Of course, you may also have other memory issues in your app - check out some tips here. I also recommend adding a monitoring tool like AppSignal as it's capable of tracking memory allocations per transaction.
For mitigating H12s:
Ensure you have installed something like the rack-timeout gem, which ensures that a long-running request is dropped at the dyno-level and thus avoids the H12 error (you get a Rack::TimeoutError exception instead). Set the timeout to 15s so that it is well under the 30s for H12 timeout.
Investigate your slow transactions. A monitoring tool is key here, i.e. New Relic (start with lowest-priced paid plan - free plan does not allow transaction tracing). Here is their blog post on how to trace transactions
When you've identified the problem - fix it!
if the bottleneck is external:
check for external API limits and throttling
add timeouts and make app resilient to slow external responses
if the bottleneck is due to the database:
optimize slow queries
check cache hit rates
check for the # of waiting connections and db locks -> if the number of waiting connections is consistently above 0 for X minutes, that indicates you have some long locks that you'll need to investigate. Waiting connections is easiest to track over time with Librato (free plan should do fine)
if the bottleneck is other app code:
add more custom instrumentation to get more insights, i.e. New Relic instructions
address app code issues
I want to stress the importance of monitoring tools to help diagnose issues and help determine optimal resource usage. Doing things like figuring out the correct concurrency configs, the correct size and # of dynos to run are virtually impossible without proper monitoring tools. Hopefully you have some already that are covered by your etc add-ons that are not listed, but if you do not, I'll summarize my recommendations and mention a couple other tips:
To get more metrics info, ensure you have enabled log-runtime-metrics
Also enable Ruby language metrics
Add a monitoring tool that can track Ruby memory allocations like AppSignal. Scout APM can do this too but I think their plans capable of this are more expensive (requires Scout Insights feature)
Add the lowest-paid version of New Relic. This is my go-to tool for transaction tracing. AppSignal can do this too if you don't want to pay for another tool, but I find it easier with New Relic.
Add Librato. It offers some great charts out of the box, including a set of Postgres charts in its own dashboard.
Set alerts in your monitoring apps to warn you about things like response times so you can look into them!
And of course, make all your changes in staging first AND load test them to see the impacts of your changes before attempting in production!
Update: I also just noticed that you said you are using Standard-0 Postgres, which means it has a 120 connection limit. So if you end up lowering your WEB_CONCURRENCY and increasing the # of dynos, watch out for your total connections to that database. Beyond just the fact that there is a limit, more connections also mean more overhead for your db anyway so if you are close to your connection limit, you are more likely to see db performance suffer. You may want to upgrade to another plan that has a higher connection limit or use pgbouncer as your connection pooler to avoid connection limits.
I'm running 7 sidekiq processes (currency set to 40) plus a passenger webserver, connecting to a postgres database. Rails pool setting is set to 100 and and postgres max_connections setting is also the default 100.
I just added a new job class where each job makes multiple postgres requests, and I started getting this error on many sidekiq jobs and sometimes on my webserver: PG::ConnectionBad: FATAL: remaining connection slots are reserved for non-replication superuser connections
I tried increasing postgres max_connections to 200, and the error still occurs. Then I tried reducing the activerecord pool setting to 25 (25 connections for each process = 200 total connections), figuring I might start getting DB connection timeout errors but at least it would stop the "no remaining connection slots" errors.
But I'm still getting the remaining connection slots are reserved error.
The smarter way to to deal with this issue might be to load the important postgres data that I keep reusing into redis, and then access it from redis - which obivously plays much more nicely and quickly with sidekiq. But even as I do that, I'd like to understand what's going on here with the postgres connections:
Am I likely leaking connections, and is that something I should be
managing inside the sidekiq jobs?
(see Releasing ActiveRecord connection before the end of a Sidekiq job)
Should I look into more obscure things like locking/contention issues
or threading issues with the PG driver?
(see https://github.com/mperham/sidekiq/issues/594. I think I'm using ActiveRecord pretty simply without much obscure or abnormal logic for a rails app...)
Or maybe I'm just not understanding how the ActiveRecord pool setting
and postgres max_connection settings work together...?
My situation may be too specific to help many others running into this error, but I'll share what I've found out in case it helps to point you in the right direction.
Am I likely leaking connections, and is that something I should be managing inside the sidekiq jobs?
No, not likely. Sidekiq's default middleware includes a hook to close connections even if a job fails. It took me a long time to understand what the heck that means, so if you're not sure what that means, tl;dr: Sidekiq won't leak connections if you're using it normally.
Should I look into more obscure things like locking/contention issues or threading issues with the PG driver?
Unless you're using a very obscure setup, its probably something more simple.
Or maybe I'm just not understanding how the ActiveRecord pool setting and postgres max_connection settings work together...?
Anyone can feel free to correct me if I'm wrong, but here's the guidelines I'm going on for pool settings, max_connections, and sidekiq processes:
Minimum DB pool size = sidekiq concurrency setting
Maximum DB pool size* = postgres max_connections / total sidekiq processes (+ leave a few connections for web processes)
*note that active record will only create a new connection when a new thread needs one, so if 95% of your threads don't use postgres at the same time, you should be able to get away with far fewer max_connections than if every thread is trying to check out a connection at the same time.
What fixed my problem:
On my Ubuntu machine, I had changed the vm.overcommit_memory setting to 1 as recommended by redis, so that it can spawn it's write to disk process without breaking the machine.
This is the right way to go, but leaves postgres vulnerable to being killed by OOM (out of memory) Killer if memory usage gets too high. Turns out that postgres will stop allowing new connections if it receives a kill signal from the OOM Killer.
Once I restarted postgres, sidekiq was able to connect again. The longer term solution is simply to work on memory leaks and make sure memory usage doesn't get too high. Also it's possible to configure the OOM killer to prioritize killing my sidekiqs before killing postgres.
My database is very cpu constrained, and I can't find the root cause of the issue. I currently have two applications servers each wit a Rails api connecting to PostgreSQL via the ruby-pg gem. Both application server also have sidekiq running background jobs, and I have a handful of support servers processing new posts from a national feed via sidekiq. If I were running out of memory, the solution would seemingly be straight forward. Any general ideas why I am CPU constrained?
Database Specs:
Rackspace 8GB Performance Tier cloud VM (8GB RAM, 8x Core CPU, SSD)
Debian 7 Wheezy Linux OS
PostgreSQL 9.1 with PostGIS extension
Possible Problems:
PostgreSQL 9.1 is bad at indexes
The database has nearly 10GB of indexes. I am going to upgrade my database to PostgreSQL version >= 9.2. In version 9.2, index only scans were introduced.
Too many connections
In the postgresql.conf, I have set max connection equal to '500'. Usually throughout the day, only 175 connections are utilized, but during peak times, sidekiq tasks will increase the current connections to 350. How many connections are recommended with an 8GB server instance?
Idol Connections
When I take a look at pg_stat_activity in the psql console, I see sidekiq is leaving a lot of IDLE connections. Could these connections result in CPU inflation? Does the fix exist in the api or in sidekiq?
Need a more powerful server
Maybe there is not a bug. I might need to simply increase the server instance. Again this would make more sense if I was memory bound. However, both app servers and 3 of the support sidekiq servers are 4gb performance tier instances. Essentially, servers that interact with the database have combined more than double the resources of the database. Should this even matter?
Additional questions:
What tools/techniques should I employ to troubleshoot the issue?
Any basic settings in the postgresql.conf related to cpu usage?
Are there any known issues related to rails, sidekiq, or the pg gem that could be a contributing factor? (I havent seen any open issues.)
Are there any general postgreSQL guideline for CPU usage?
Any other ideas thoughts that might help my search?
You are using massively too many concurrent connections. PostgreSQL will be wasting lots of its time on housekeeping and juggling concurrent queries. All the concurrent work will be fighting for CPU and buffer space, there'll be heavy contention on spinlocks, and it'll all generally be a mess.
On an 8 core machine, you should probably not have more than 20 actively working connections if you're mostly CPU constrained. If you're I/O limited, you can go higher, but 350 is just ridiculous.
If possible, put a PgBouncer in transaction pooling mode in front of your PostgreSQL instance, so queries get queued up and executed rapidly in series instead of slowly in parallel.
See number of database connections (Pg wiki).
Additionally, PostGIS can be very CPU-heavy. It sometimes needs to do very complex calculations. I suggest using the auto_explain module to record long running queries, and using pg_stat_statements / pg_stat_plans to record what's taking up resources. Examine these queries to see if they need improvement.
Your idle in transaction sessions must be dealt with, too. Depending on why they're idle and whether they have a transaction ID or not, they might be causing serious table bloat. They're also creating unnecessary signalling overhead within PostgreSQL, as it has to do more co-ordination with backends that're actively doing things. Finally, the number of open transactions its self increases the cost of some internal housekeeping operations.
So. Your DB will probably perform better if you reduce the connection counts, put a PgBouncer in transaction pooling mode in front, and fix those idle connections.
Most likely you are CPU constrained because your work needs a lot of CPU. :)
9.1 is not generally bad at indexes. There may be some specific issues, as all versions might, which exactly what they are might change from version to version.
Index-only-scans are mostly a benefit when you are IO constrained. I wouldn't hold out much hope for that being a magic bullet for you.
350 connections are certainly not helpful, but probably are not very harmful, either. But when they are harmful, it can be downright catastrophic. The correct value is more determined by the number of cores, not the amount of RAM. If it is easy to throttle down the sidekiq connections, do it even if you can't prove that it helps.
If the connections are just IDLE, not IDLE in transaction, then they probably aren't very harmful, but again there are a few cases where they can be. That is pretty much the same issue as the number of connections.
The connection you showed from top was idle in transaction. That status shouldn't be taking up much CPU, so that probably means it is rapidly cycling through statements and top just happens to catch it while it is between them. But you didn't say how many similar lines there were in top, if it is just that one it suggests your code is not running concurrently and 7 of you 8 CPUs are wasted.
Regarding the db server versus the other servers, if the database is fundamentally the limit, beating on it with a bigger hammer is not going to help. Often there is some flexibility about where computation is done. If you can get the app servers to do more computation that is currently done on the db and let the db focus on ACID issues, that would be good. But no one but you can know if that is possible or feasible.
My first stop would be to use pg_stat_statements to see what SQL statements are taking the most time. Maybe just adding an index to the slowest/most frequent query would make the problem magically go away.