Rails connection pool size for WEBrick - ruby-on-rails

I am getting
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5 seconds (waited 5.000798 seconds). The max pool size is currently 1; consider increasing it.)
when I try to run WEBrick (rails server) with a pool size of 1; there are no problems with higher pool sizes.
What does WEBrick use the first connection for, and what is the best pool size for a single-threaded application? Is this a WEBrick-specific issue, or does it apply to other servers (like Unicorn) as well?
Rails version is 3.2.13
Update: just verified this with Unicorn; it works fine with a single connection.

If I recall correctly, Rails reserves a connection to the database when it boots up and uses the remainder of the connections available in the connection pool to handle requests. Even if you never touch ActiveRecord objects during the life of the request, Rails will still try to reserve one connection from the pool for each request, or will block until one is available up until the timeout limit.
The default pool size is 5 connections: 1 reserved for Rails + 4 available for requests.
Rails does this to maintain thread-safety in the application.
If your application is single-threaded and only processes one request at a time with no regard for concurrency, the number of connections in the pool should be set to 2 at an absolute minimum. I'd still recommend the default of 5 though so you have some breathing room in case you need to utilize more than one connection per request.
This is not specific to WEBrick. The connection pool limit affects the application in the same way regardless of which application server is running.
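
For reference, the pool size is set per environment in config/database.yml. A minimal sketch following the reasoning above (the database name is a placeholder):

production:
  adapter: postgresql
  database: myapp_production  # placeholder name
  pool: 5                     # 1 reserved by Rails + 4 available for requests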

Related

What happens when the possible amount of database connections is larger than the PostgreSQL allowed max_connections?

Background:
On production we have a poorly understood error that occurs sporadically (more frequently by the week) and may take down our whole application at times – or sometimes just all of our background processes. Unfortunately, I am not certain what causes the issue, below is my working theory – could you please validate its logic?
The error preceding the downtime (occurring a couple of hundred times in a matter of seconds) is the PostgreSQL error FATAL: sorry, too many clients already.
Working theory:
Various parts of an API can request connections to the database. In our Ruby on Rails application, for example, we have 12 Puma workers with 16 threads each (12 * 16 = 192 possible db connections). We also have 10 background workers, each allowed a single db connection. If we additionally account for a single SSH session with 1 database connection, the maximum number of db connections we would have to anticipate is 192 + 10 + 1 = 203 PostgreSQL connections, which would have to be covered by max_connections in the postgresql.conf config file.
Our max_connections however is still set to the PostgreSQL default of 100. My understanding is that this is problematic: when the application thinks more db connections are possible (looking at the application-side settings for Puma and our background workers), it allows new db connections to be made. But when those connections to PostgreSQL are initiated, PostgreSQL looks at its own configured maximum of 100 connections and rejects the connection.
If instead the number of "requestable" connections (in this case 203) were lower than or equal to the PostgreSQL max_connections, the application would use the pool timeout to queue the requested db connection until a db socket becomes available.
This is desirable, since a temporary excess of connection requests could then be resolved within the pool timeout. Thus the solution to our problem is to make the number of "requestable" database connections <= the number of possible database connections. If that is still not enough, I should increase max_connections beyond 100.
Does this make sense...?
Any ideas or criticism would be very much appreciated!
Your app threads do not need to map 1:1 to database connections. You can use a connection pool for the database connections. See https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
There is also lots of good info on this subject at https://devcenter.heroku.com/articles/concurrency-and-database-connections
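
As a sketch of how the numbers above can be kept in check, the usual approach (and the one the Heroku article linked above recommends) is to derive both the Puma thread count and the ActiveRecord pool size from a single environment variable, so each process never opens more connections than it has threads. WEB_CONCURRENCY and RAILS_MAX_THREADS are the conventional variable names; the worker and thread counts are the question's own figures:

# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY", 12))            # 12 Puma workers
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 16))  # 16 threads each
threads threads_count, threads_count

# config/database.yml then reuses the same variable, so that
# pool size == threads per process:
#
#   production:
#     pool: <%= ENV.fetch("RAILS_MAX_THREADS", 16) %>
#
# Total Postgres usage is then roughly workers * threads plus the
# background workers, and that sum must stay below max_connections.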

Maximising use of available database connections

I just upgraded our Heroku Postgres database plan. On the new plan we have a lot more connections and I'm trying to make sure we make full use of them at scale.
Say we configure our Puma server with 40 threads:
puma -t 40:40
...and I set the pool size to 60 (just for a bit of buffer). My understanding is that because I've preallocated 40 Puma threads, each one will reserve a connection, resulting in 40 active connections. However, if I check the active connections there are only 5.
Am I completely misunderstanding how this works?
I am far from an expert in Puma, so I'll just share my own knowledge.
First, if you set the number of threads to 40, then your Puma worker will have 40 threads. Be careful though: because of the GIL (or GVL), your Puma worker can have only a single thread executing Ruby code at any given moment. The 39 remaining threads just sit idle, UNLESS they are doing I/O (database access and the like).
The common wisdom is that beyond 5 threads you gain nothing from adding more. Maybe this can be pushed to 10 if your app is really I/O-bound, but I wouldn't go further.
The real concurrency is set by the number of Puma workers (if you boot Puma in clustered mode). If you set the number of Puma workers to 40, then your app can handle at least 40 users at a time.
But 40 workers require a huge Heroku dyno with quite a bit of RAM. And if you add 5 threads per Puma worker, then you need 200 DB connections!
What about the live DB connections
Given the above, it is very unlikely that a single worker with 40 threads will ever have all of them accessing the DB at the same time. This is probably why you see only 5 live DB connections (unless you have not redeployed your app since the change).
I have a small app and also see a varying number of live DB connections across time.
The buffer
Don't add a buffer. You are just reserving connections that your app can never use. The connection pool size should equal the maximum number of threads.
My question: why so many DB connections?
What was your goal in increasing the DB connections? More concurrency? If you have a small app with a small web dyno, there is no point in having a big database plan behind it.
If you want to scale your app, get a bigger web dyno and add more Puma workers while keeping the number of threads at 5.
When the number of workers multiplied by the number of threads exceeds the number of allowed database connections, it is time to upgrade the database.
Nota bene: Rails may use a few connections for its internals. So if you have a database allowing 20 connections and a Puma config with 3 workers and 5 threads, it is better to upgrade before adding a fourth worker.
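
To make that last bit of arithmetic concrete, a throwaway sketch (the internal-connection allowance is my assumption, not a documented figure):

# Will this process layout fit under max_connections?
puma_workers       = 3
threads_per_worker = 5
rails_internal     = 2    # rough allowance for Rails' own connections (assumption)
max_connections    = 20

needed = puma_workers * threads_per_worker + rails_internal
puts "need #{needed} of #{max_connections} connections"
# => need 17 of 20 connections
puts "room for a 4th worker? #{needed + threads_per_worker <= max_connections}"
# => room for a 4th worker? false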

Can I / Should I disable Ruby On Rails Database Connection Pooling when using PgBouncer?

Can I disable Ruby on Rails connection pooling completely?
And would this be okay considering PgBouncer already handles database connection pooling?
No.
PgBouncer advertises itself to clients as a Postgres server and then manages connections to the actual Postgres server. We don't need to get into any further detail than that for PgBouncer -- from the Rails side of things, the PgBouncer instance is the Postgres server, so let's explain why starting from there.
In Rails there are two primary factors to consider for concurrency: the number of inbound client requests that can be made to its web server and the size of the connection pool for the database.
If your connection pool size is 1 then your application is effectively single-threaded: every time an inbound client request is made that must make a database query, the request must check out a connection to the database from the pool. When the pool size is 1 and the number of concurrent inbound requests is greater than 1, every request after the first request must be put on hold while the first request completes its query and returns the connection to the pool. If the first request takes a long time to complete then subsequent requests may timeout while waiting for an available connection from the pool.
So from the Rails side, you want to have a connection pool with a size larger than 1 to allow for concurrency. If you set it to a size of 1 then it doesn't matter how you have PgBouncer configured; Rails will always make at most one connection to the database and all threads must share that single connection.
You can read more about connection pools in Rails in the ActiveRecord::ConnectionAdapters::ConnectionPool API documentation.
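
As a sketch of what this looks like in practice, the Rails side simply points its database.yml at the PgBouncer instance (host, port, and pool values below are illustrative; note that if PgBouncer runs in transaction-pooling mode, prepared statements should be disabled, since they assume a stable server session):

production:
  adapter: postgresql
  host: pgbouncer.internal    # hypothetical PgBouncer host; Rails sees it as Postgres
  port: 6432                  # PgBouncer's conventional listen port
  database: myapp_production  # placeholder name
  pool: 10                    # keep this > 1 so threads can query concurrently
  prepared_statements: false  # needed under PgBouncer transaction pooling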

(Heroku + Sidekiq) Is my understanding of how Connection Pooling works correct?

Assume I have the below setup on Heroku + Rails, with one web dyno and two worker dynos.
Below is what I believe to be true, and I'm hoping that someone can confirm these statements or point out an assumption that is incorrect.
I'm confident in most of this, but I'm a bit confused by the usage of client and server, by "connection pool" referring to both DB and Redis connections, and by "worker" referring to both Puma workers and Heroku worker dynos.
I wanted to be crystal clear, and I hope this can also serve as a consolidated guide for any other beginners having trouble with this.
Thanks!
How everything interacts
A web dyno (where the Rails application runs)
only interacts with the DB when it needs to query it to serve a page request
only interacts with Redis when it is pushing jobs onto the Sidekiq queue (stored in Redis). It is the Sidekiq client
A Worker dyno
only interacts with the DB if the Sidekiq job it's running needs to query the DB
only interacts with Redis to pull jobs from the Sidekiq queue (stored in Redis). It is the Sidekiq server
ActiveRecord Pool Size
An ActiveRecord pool size of 25 means that each dyno has 25 connections to work with. (This is what I'm most unsure of. Is it each dyno or each Puma/Sidekiq worker?)
For the web dyno, it can only run 10 things (threads) at once (2 Puma workers x 5 threads), so it will only consume a maximum of 10 connections. 25 is above and beyond what it needs.
For worker dynos, the Sidekiq concurrency of 15 means each worker dyno can run 15 jobs (threads) at a time. Again, 25 connections is beyond what it needs, but it's a nice buffer to have in case there are stale or dead connections that won't clear.
In total, my Postgres DB can expect 10 connections from the web dyno and 15 connections from each worker dyno, for a total of 40 connections maximum.
Redis Pool Size
The web dyno (Sidekiq client) will use the connection pool size specified in the Sidekiq.configure_client block. Generally ~3 is sufficient because the client isn't constantly adding jobs to the queue. (Is it 3 per dyno, or 3 per Puma worker?)
Each worker dyno (Sidekiq server) will use the connection pool size specified in the Sidekiq.configure_server block. By default it's Sidekiq concurrency + 2, so here 17 Redis connections will be taken up by each worker dyno.
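For concreteness, the setup described above would look something like this in an initializer (a sketch using Sidekiq's standard configure_client / configure_server hooks from this era; the size values are the ones assumed above):

# config/initializers/sidekiq.rb
Sidekiq.configure_client do |config|
  # Web dyno (client) side: a small Redis pool suffices for pushing jobs.
  config.redis = { url: ENV["REDIS_URL"], size: 3 }
end

Sidekiq.configure_server do |config|
  # Worker dyno (server) side: with no explicit size, the pool
  # defaults to concurrency + 2 (15 + 2 = 17 here).
  config.redis = { url: ENV["REDIS_URL"] }
end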
I don't know Heroku + Rails but believe I can answer some of the more generic questions.
From the client's perspective, the setup/teardown of any connection is very expensive. The concept of connection pooling is to keep a set of connections alive so they can be reused over some period of time. The JDK's HttpURLConnection does the same (assuming HTTP 1.1), so that, assuming you're going to the same server, the HTTP connection stays open, waiting for the next expected request. The same thing applies here: instead of closing a JDBC connection each time, the connection is maintained, assuming the same server and authentication credentials, so the next request skips the setup work and can immediately send work to the database server.
There are many ways to maintain a client-side pool of connections: it may be part of the JDBC driver itself, or you might need to implement pooling with something like Apache Commons Pool, but whatever you do it's going to improve performance and reduce errors that might be caused by network hiccups that could prevent your client from connecting to the server.
Server-side, most database providers are configured with a pool of n possible connections that the database server may accept. Usually each additional connection has a footprint - usually quite small - so based on the memory available you can figure out the maximum number of available connections.
In most cases, you're going to want to have a larger-than-expected number of connections available. For example, in Postgres, the configured connection limit applies to all connections to any database on that server. If you have development, test, and production all pointed at the same database server (obviously different databases), then connections used by test might prevent a production request from being fulfilled. Best not to be stingy.
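
In Ruby terms, the client-side pooling described above is what the connection_pool gem (which Sidekiq itself uses) provides. A minimal sketch, assuming a local Redis:

require "connection_pool"
require "redis"

# Keep 5 Redis connections alive and reuse them; a checkout waits
# up to 5 seconds for a free connection before raising an error.
REDIS_POOL = ConnectionPool.new(size: 5, timeout: 5) { Redis.new }

REDIS_POOL.with do |conn|
  conn.set("greeting", "hello")  # borrow a connection, then return it to the pool
end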

recommended pool size to set for mongo from a rails app (nginx, mongomapper)?

Our app runs on Rails 3.2.12, Mongo 2.2 (we will migrate to 2.4), MongoMapper, and uses nginx + Passenger.
If we're on a VPS with 5 GB of RAM, what's the best way to determine the optimal pool size for our application, and where do we set it?
With a web application server like Passenger, the max pool size for your MongoDB connection will be per passenger worker process since the web server is forking your entire application for each worker.
Additionally, if your app (your code) isn't spawning any threads and attempting to do work in parallel, then you can just leave your pool size at 1 since increasing it won't actually help you much in that situation.
If you are spawning threads to talk to the database, then just make sure pool size * number of Passenger workers doesn't exceed the total number of connections to the database that you want (e.g. a pool size of 5 across 10 Passenger workers = 50 connections to MongoDB). You probably want your thread pool size and connection pool size to be fairly close to each other.
Also, bear in mind that MongoDB has a hard limit of 20k connections to a single MongoDB instance.
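
With MongoMapper and the Ruby driver of that era (the mongo 1.x gem), the pool is set when the connection is built, typically in an initializer. A sketch (the file location, host, database name, and sizes are illustrative):

# config/initializers/mongo.rb (hypothetical location)
require "mongo"

MongoMapper.connection = Mongo::MongoClient.new(
  "localhost", 27017,
  pool_size:    5,  # connections kept alive per Passenger worker process
  pool_timeout: 5   # seconds to wait for a free connection before erroring
)
MongoMapper.database = "myapp_production"  # placeholder name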