I have a Rails app running on OSE, with 5 pods and 1 container per pod. The Rails app uses the Puma web server with its default thread settings (min: 0, max: 16). In my database.yml I've defined a connection pool of 10 (pool: 10).
I'd like to know what my maximum PG connection footprint would be.
My current theory is:
5 pods x 1 container x 16 threads x 10 connection pool = 800 possible PostgreSQL connections.
However, I'm questioning whether all 16 Puma threads share the same PG connection pool, in which case the formula would be:
5 pods x 1 container x 10 connection pool = 50 possible PostgreSQL connections.
(Of course, if this second calculation is correct, having 16 Puma threads would itself be a problem: at 1 connection per thread, the app could request 6 more connections than the pool offers.)
Can anyone point me to definitive documentation on the subject? Thanks!
If the connection pool lives within the process and doles out database connections across threads, with threads waiting when all connections are busy, then the second is correct; if not, the first. That is in fact how ActiveRecord's pool works: it is held per process and shared across that process's threads. Either way it can actually be worse, though: if you are using rolling deployments, there could be one additional pod active during a restart.
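For what it's worth, you can verify this from a Rails console inside one pod; a minimal sketch (ActiveRecord::ConnectionAdapters::ConnectionPool#stat requires Rails 5.1+, and the pod count is hard-coded here as an assumption):

    # Run in a Rails console inside one pod.
    pool = ActiveRecord::Base.connection_pool
    pool.size   # => 10, the pool: value from database.yml
    pool.stat   # => e.g. { size: 10, connections: 3, busy: 1, idle: 2, waiting: 0, ... }

    # The pool caps connections per process, so the worst case is:
    pods = 5    # assumption: one Rails process per pod
    puts "max PG connections: #{pods * pool.size}"   # => 50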
Have a look at using PgBouncer (https://pgbouncer.github.io/) in front of the PostgreSQL instance. My understanding is that it gives you the flexibility to manage a pool of database connections without needing to change anything in your application; the pooling is dealt with in PgBouncer instead.
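A minimal PgBouncer sketch of what that could look like (all hosts, names, and sizes below are made-up placeholders; see the PgBouncer docs for the full option list):

    ; pgbouncer.ini (sketch)
    [databases]
    myapp_production = host=10.0.0.5 port=5432 dbname=myapp_production

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction   ; share server connections between clients per transaction
    max_client_conn = 200     ; how many app connections PgBouncer will accept
    default_pool_size = 20    ; how many real connections it holds to PostgreSQL

The app's database.yml then points at PgBouncer's host and port (6432 above) instead of PostgreSQL directly, so the pods can together open many client connections while PostgreSQL only ever sees the small server-side pool.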
Related
Background:
On production we have a poorly understood error that occurs sporadically (and more frequently every week) and may take down our whole application at times, or sometimes just all of our background processes. Unfortunately, I am not certain what causes the issue; below is my working theory. Could you please validate its logic?
The error preceding the downtime (occurring a couple of hundred times in a matter of seconds) is the PostgreSQL error FATAL: sorry, too many clients already.
Working theory:
Various parts of an API can request connections with the database. In our Ruby on Rails application, for example, we have 12 Puma workers with 16 threads each (12 * 16 = 192 possible db connections). We also have 10 background workers, each allowed a single db connection. If we further account for a single SSH session with 1 database connection, the maximum number of db connections we would have to anticipate is 192 + 10 + 1 = 203 PostgreSQL connections, which is what max_connections in the postgresql.conf config file would need to allow.
Our max_connections, however, is still set to the PostgreSQL default of 100. My understanding is that this is problematic: when the application thinks more db connections are possible (looking at the application-side settings for Puma and our background workers), it allows new db connections to be initiated. But when those connections reach PostgreSQL, it checks them against its own maximum of 100 and rejects them.
If instead the number of "requestable" connections (in this case 203) were lower than or equal to PostgreSQL's max_connections, the application would use the pool timeout to queue a requested db connection until a connection becomes available.
This is desirable, since a momentary excess of connection requests could then be absorbed within the pool timeout. Thus the solution to our problem is to make the number of "requestable" database connections <= the number of possible database connections, and if that is still not enough, to raise max_connections above 100.
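If this theory holds, the application-side knobs would be pool and checkout_timeout in database.yml; a sketch (the values shown are placeholders, not recommendations):

    # config/database.yml (sketch)
    production:
      adapter: postgresql
      pool: 16              # per process: one connection per Puma thread
      checkout_timeout: 5   # seconds a thread waits for a free pooled connection (default 5)

With 12 Puma workers that still means 12 * 16 = 192 connections from Puma alone, so max_connections has to cover the total either way.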
Does this make sense...?
Any ideas or criticism would be very much appreciated!
Your app threads do not need to map 1:1 to database connections. You can use a connection pool for the database connections. See https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
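A minimal sketch of that behavior, using nothing but ActiveRecord (the thread and pool counts are arbitrary):

    # With pool: 5 and 20 threads, threads simply queue for a free
    # connection; one that waits longer than checkout_timeout raises
    # ActiveRecord::ConnectionTimeoutError.
    threads = 20.times.map do
      Thread.new do
        ActiveRecord::Base.connection_pool.with_connection do |conn|
          conn.execute("SELECT pg_sleep(0.1)")  # stand-in workload
        end
      end
    end
    threads.each(&:join)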
There is also lots of good info on this subject at https://devcenter.heroku.com/articles/concurrency-and-database-connections
I'm getting this error sporadically on my Prod server.
ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds
I see there is no high CPU usage on the DB, but this error still happens once or maybe twice a day.
Puma.rb:
threads 2, 100
workers 2
database.yml:
pool: 15
Ruby: ruby:2.3
Puma: puma (3.11.2)
DB size: db.m5.large
In your current configuration, each Puma worker has its own connection pool with 15 available database connections, and each worker is allowed to scale between 2 and 100 threads depending on the server load.
This means that when the load goes up, or there are more long-running requests, your server will create more than 15 threads. At that point your database connection pool is exhausted, and the new threads have to wait for other threads to return database connections. This might take a while, and after 5 seconds of waiting you will observe an ActiveRecord::ConnectionTimeoutError exception.
To solve this issue you have to make sure that the database connection pool is big enough for all your threads. At the same time, you have to make sure that the total number of connections across all pools (in the web workers, plus in Sidekiq and other tools such as database management tools or Rails consoles) stays below the maximum number of connections allowed by your database.
My advice is: First, figure out the maximum number of connections your database supports. You might find this information in your database's config or in the docs of your database provider. Then split that number across all workers and tools like Sidekiq. Once you know the maximum number of connections per worker, set the maximum number of threads to that number.
Example: Imagine your database supports 64 connections. Then you might want to run two servers with 2 Puma workers each and a Sidekiq instance with 4 workers per server, and you want a buffer to be able to connect a Rails console or a backup system to the database too.
2 servers with 2 Puma workers each (4 workers x 12 connections): 48
8 Sidekiq workers (1 connection each): 8
buffer (console, backups, etc.): 8
Total: 64
With those numbers, I would set the connection pool in Rails' database config to 12 and set the number of threads in the Puma config to 2, 12.
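Put into config form, that could look like the following sketch (it mirrors the example numbers above and is not a drop-in recommendation):

    # config/puma.rb (sketch)
    workers 2       # per server; 2 servers x 2 workers = 4 Puma processes
    threads 2, 12   # max threads per worker == pool size in database.yml
    preload_app!

    on_worker_boot do
      # re-establish the per-process pool after fork
      # (handled automatically on newer Rails; harmless otherwise)
      ActiveRecord::Base.establish_connection
    end

Together with pool: 12 in database.yml this yields 4 x 12 = 48 Puma connections, leaving 8 for Sidekiq and 8 as buffer.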
Read more about this topic at Concurrency and Database Connections in Ruby with ActiveRecord in the Heroku devcenter.
Assume I have the below setup on Heroku + Rails, with one web dyno and two worker dynos.
Below is what I believe to be true, and I'm hoping that someone can confirm these statements or point out an assumption that is incorrect.
I'm confident in most of this, but I'm a bit confused by the usage of client and server, "connection pool" referring to both DB and Redis connections, and "worker" referring to both puma and heroku dyno workers.
I wanted to be crystal clear, and I hope this can also serve as a consolidated guide for any other beginners having trouble with this.
Thanks!
How everything interacts
A web dyno (where the Rails application runs)
only interacts with the DB when it needs to query it to serve a page request
only interacts with Redis when it is pushing jobs onto the Sidekiq queue (stored in Redis). It is the Sidekiq client
A Worker dyno
only interacts with the DB if the Sidekiq job it's running needs to query the DB
only interacts with Redis to pull jobs from the Sidekiq queue (stored in Redis). It is the Sidekiq server
ActiveRecord Pool Size
An ActiveRecord pool size of 25 means that each dyno has 25 connections to work with. (This is what I'm most unsure of. Is it each dyno or each Puma/Sidekiq worker?)
For the web dyno, it can only run 10 things (threads) at once (2 Puma workers x 5 threads), so it will only consume a maximum of 10 connections. 25 is above and beyond what it needs.
For worker dynos, the Sidekiq concurrency of 15 means 15 Sidekiq jobs (threads) can run at a time. Again, 25 connections is beyond what it needs, but it's a nice buffer to have in case there are stale or dead connections that won't clear.
In total, my Postgres DB can expect 10 connections from the web dyno and 15 connections from each worker dyno, for a total of 40 connections maximum.
Redis Pool Size
The web dyno (Sidekiq client) will use the connection pool size specified in the Sidekiq.configure_client block. Generally ~3 is sufficient because the client isn't constantly adding jobs to the queue. (Is it 3 per dyno, or 3 per Puma worker?)
Each worker dyno (Sidekiq server) will use the connection pool size specified in the Sidekiq.configure_server block. By default it's Sidekiq concurrency + 2, so here 17 Redis connections will be taken up by each worker dyno.
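For context, both pools are set in the Sidekiq initializer; a sketch of what I mean (the sizes are the ones discussed above):

    # config/initializers/sidekiq.rb (sketch)
    Sidekiq.configure_client do |config|
      # used by the web dyno when pushing jobs
      config.redis = { url: ENV["REDIS_URL"], size: 3 }
    end

    Sidekiq.configure_server do |config|
      # used by the worker dynos; when :size is omitted, Sidekiq sizes
      # the pool from its concurrency setting plus a small overhead
      config.redis = { url: ENV["REDIS_URL"] }
    end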
I don't know Heroku + Rails but believe I can answer some of the more generic questions.
From the client's perspective, the setup/teardown of any connection is very expensive. The concept of connection pooling is to have a set of connections which are kept alive and can be used for some period of time. The JDK HttpUrlConnection does the same (assuming HTTP 1.1) so that - assuming you're going to the same server - the HTTP connection stays open, waiting for the next expected request. Same thing applies here - instead of closing a JDBC connection each time, the connection is maintained - assuming same server and authentication credentials - so the next request skips the unnecessary work and can immediately move forward in sending work to the database server.
There are many ways to maintain a client-side pool of connections: it may be part of the JDBC driver itself, or you might need to implement pooling using something like Apache Commons Pool, but whatever you do it's going to improve performance and reduce errors caused by network hiccups that could otherwise prevent your client from connecting to the server.
Server-side, most database servers are configured with a maximum of n possible connections that the server may accept. Each additional connection has a footprint, usually quite small, so based on the memory available you can figure out the maximum number of available connections.
In most cases, you're going to want to have a larger-than-expected number of connections available. For example, in Postgres, the configured max_connections applies to all connections to any database on that server. If you have development, test, and production all pointed at the same database server (with different databases, obviously), then connections used by test might prevent a production request from being fulfilled. Best not to be stingy.
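In Ruby, the same client-side idea is available generically via the connection_pool gem (the one Sidekiq uses internally); a minimal sketch with Redis as the pooled resource:

    require "connection_pool"
    require "redis"

    # 5 long-lived connections shared across threads; a thread that cannot
    # check one out within 3 seconds raises ConnectionPool::TimeoutError.
    REDIS_POOL = ConnectionPool.new(size: 5, timeout: 3) { Redis.new }

    REDIS_POOL.with do |redis|
      redis.set("hello", "world")
    end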
We are unable to scale the frequency of our crons as much as we would have liked, and the thing holding us back is a stream of database connection issues.
We have a primary server which has the master db, and 3 slaves. We run sidekiq on all our machines.
Our postgresql.conf has max_connections = 200.
The pool option is also set to pool: 200 in database.yml for all our Rails apps on our servers.
We are running 2 Sidekiq processes on each of our servers.
On the green machine, if we change our concurrency from 6 to 7, we start getting a stream of errors: Sidekiq - could not obtain a database connection within 5.042 seconds. Where am I messing up? :-(
Could it be something else inside our app? The numbers just don't add up.
Also, does the number of Active Record connections have any association with pg_stat_activity?
Thanks in advance
Just figured it out.
We're using replication, and in shards.yml we had not set the pool size :-(
It was picking up the default of 5.
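For anyone hitting the same thing: the fix is an explicit pool on every entry in shards.yml, since each shard gets its own per-process pool. A sketch (the layout below is illustrative only; the exact keys depend on your replication gem):

    # shards.yml (illustrative sketch)
    shards:
      slave1:
        adapter: postgresql
        host: slave1.internal     # hypothetical host
        database: myapp_production
        pool: 25                  # without this, ActiveRecord defaults to 5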
We have recently been having issues with Postgres running out of connection slots, and after a lot of debugging and shrugging of shoulders, we have pretty much tracked it down to the fact that we had misunderstood connection pools.
We use Rails, Postgres, Unicorn, and Delayed Job.
Are we correct to assume that the connection pool is process-specific, i.e. each process has its own pool of up to 10 connections (our connection pool limit) to the db?
And if there are no threads anywhere in the app, are we correct to assume that for the most part each process will use 1 connection, since no one ever needs a second one?
Based on these assumptions, we tracked it down to the number of processes:
Web server: 4 Unicorn processes = 4 connections
Delayed Job: 3 servers x 30 processes = 90 connections
That's 94 connections, and a couple of connections for rails consoles plus the odd rails runner or rake task would explain why we were hitting the limit so often, right? It has been happening particularly often this week, after I converted a Ruby script into a rails runner script.
We are planning to increase the max from 100 to 200 or 250 to relieve this, but is there a trivial way to implement inter-process connection pooling in Rails?
You probably want to take a look at PgBouncer. It's a purpose-built PostgreSQL connection pooler; there are some notes on the wiki too, and it's packaged for most Linux distros.