Rails' database.yml file has a setting, pool: 5. I understand what a database connection pool is, but I'm being tripped up by a few subtleties:
A connection is used then returned to its pool. The next request can then use a connection from the pool rather than creating a new connection.
How is it determined which request gets which connection?
Suppose I have a concurrent connections limit of 5 and one of my web pages needs to make 10 queries to the database:
Is each query a separate connection, or are all 10 queries considered one connection?
In terms of queries, connections, or speed, what can be an example of a situation that would overwhelm that 5 concurrent connections limit?
And suppose that, in a different database, I set the database connection pool size to 5.
How are pool size and concurrent connections related, if at all?
In terms of queries, connections, or speed, what can be an example of a situation that would overwhelm this pool size?
1) ActiveRecord::Base loads a connection when required (lazily, on a request, or when its current one is closed/disconnected)
2) No, the same connection will be used to make multiple queries
3) No way to answer that without using diagnostic utilities which your db vendor supplied with your db
4) That is db vendor/adapter specific
5) Same answer as 3.
If you are experiencing a slowdown, the only way to solve it is to use diagnostic tools to find out where your bottleneck is occurring. 90% of the time, it's not your db or the connections to it (it's usually the indexing, n+1 queries, etc.).
If you are NOT experiencing any slowdown, then keep the defaults and move on. Premature optimization will lead to an over-engineered solution.
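To make points 1 and 2 concrete, here is a minimal sketch in plain Ruby of how a pool hands out connections. ToyPool, its size: and checkout_timeout: arguments, and the "conn-N" strings are all made up for illustration; this is not Active Record's implementation, only the same general idea. It shows that ten queries inside one request share a single connection, and that the pool is only overwhelmed when more threads want a connection at the same time than the pool holds, for longer than the checkout timeout.

```ruby
require "timeout"

# Toy pool for illustration only; NOT how Active Record is implemented.
class ToyPool
  class TimeoutError < StandardError; end

  def initialize(size:, checkout_timeout:)
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" } # stand-ins for real connections
    @checkout_timeout = checkout_timeout
  end

  # Borrow one connection for the whole block, then hand it back.
  def with_connection
    conn = checkout
    yield conn
  ensure
    @available << conn if conn
  end

  private

  # Wait for a free connection, giving up after checkout_timeout seconds.
  def checkout
    Timeout.timeout(@checkout_timeout) { @available.pop }
  rescue Timeout::Error
    raise TimeoutError, "no free connection within #{@checkout_timeout}s"
  end
end

pool = ToyPool.new(size: 5, checkout_timeout: 0.2)

# One "request" that runs 10 queries holds exactly one connection.
used = []
pool.with_connection { |conn| 10.times { used << conn } }

# Six slow requests at once against five connections: the sixth has to wait,
# and gives up because nobody returns a connection within the timeout.
results = 6.times.map do
  Thread.new do
    begin
      pool.with_connection { sleep 0.5; :ok }
    rescue ToyPool::TimeoutError
      :timed_out
    end
  end
end.map(&:value)

puts "distinct connections used by one request: #{used.uniq.size}"
puts "slow concurrent requests: #{results.tally}"
```

So the number of queries per page does not matter to the pool; what matters is how many threads hold a connection simultaneously and for how long.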
Related
Background:
On production we have a poorly understood error that occurs sporadically (more frequently by the week) and may take down our whole application at times – or sometimes just all of our background processes. Unfortunately, I am not certain what causes the issue; below is my working theory. Could you please validate its logic?
The error preceding the downtime (occurring a couple of hundred times in matters of seconds) is the PostgreSQL error FATAL: sorry, too many clients already.
Working theory:
Various parts of an API can request connections to the database. In our Ruby on Rails application, for example, we have 12 puma workers with 16 threads each (12 * 16 = 192 possible db connections). We also have 10 background workers, each allowed a single db connection. If we also account for a single SSH session with 1 database connection, the maximum number of db connections we would have to anticipate is 192 + 10 + 1 = 203 PostgreSQL connections, which is limited by max_connections in the postgresql.conf config file.
Our max_connections however is still set to the PostgreSQL default of 100. My understanding is that this is problematic: when the application thinks more db connections are possible (looking at the application side settings for puma and our background workers) it allows for new db connections to be made. But when those connections with PostgreSQL are initiated, PostgreSQL looks at its own set maximum of 100 connections and breaks the connection.
If instead the number of "requestable" connections (in this case 203) were lower than or equal to the PostgreSQL max_connections, the application would use the pool timeout to queue the requested db connection until a db socket becomes available.
This is desirable, since a temporary excess of connection requests could then be resolved within the pool timeout. Thus the solution to our problem is to make the "requestable" database connections <= possible database connections. If that is still not enough, I should increase the 100 possible connections.
Does this make sense...?
Any ideas or criticism would be very much appreciated!
Your app threads do not need to map 1-1 to database connections. You can use a connection pool for the database connections. See https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
There is also lots of good info on this subject at https://devcenter.heroku.com/articles/concurrency-and-database-connections
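The arithmetic in the question can be checked with a few lines of Ruby. This is only a sketch: the variable names are made up, the numbers are the ones quoted in the question, and it assumes the worst case where every puma thread holds its own connection at the same moment (i.e. each worker's pool is at least as large as its thread count).

```ruby
# Numbers taken from the question; variable names are illustrative only.
puma_workers       = 12
threads_per_worker = 16
background_workers = 10  # one db connection each
ssh_sessions       = 1   # one db connection
max_connections    = 100 # PostgreSQL default mentioned in the question

# Worst case: every thread and worker holds a connection at once.
worst_case = puma_workers * threads_per_worker + background_workers + ssh_sessions
shortfall  = worst_case - max_connections

puts "worst-case connections: #{worst_case}"
puts "max_connections:        #{max_connections}"
puts "over the limit by:      #{shortfall}" if shortfall > 0
```

Note that a Rails connection pool lives inside each process, so the pool size limits connections per puma worker, not across all workers; the total the database sees is still the sum over all processes.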
If I use pgbouncer, what are the reasons I should not just set Active Record's pool size to 99999, effectively disabling it, and leaving pgbouncer in charge of all pooling?
In my case, this is with Rails 5.2. pgbouncer uses transaction pooling.
I can think of some possible reasons:
If a runaway process somehow tries to open a high number of threads/connections, the AR pool would set a ceiling, preventing it from exhausting all connections.
Similarly, if AR doesn't close connections to pgbouncer correctly (e.g. if some code opens connections in threads without closing them), and AR's reaper does not run or does not run often enough, that code could exhaust all connections.
If Active Record itself has costly overhead per connection (does it?), perhaps it's preferable to wait and reuse connections instead of opening a higher number of connections, in situations where the same process tries to open a lot of connections.
Are those valid reasons? Are they the only reasons?
(I've seen Disabling Connection Pooling in Rails to use PgBouncer and think this is related but not quite the same question.)
What are the benefits to using an external connection pool?
I've heard that most other applications will open up a connection for each unit of work. In Rails, for example, I'd take that to mean that each request could open a new connection. I'm assuming a connection pool would make that possible.
The only benefit I can think of is that it allows you to have 1,000 frontend processes without having 1,000 postgres processes running.
Are there any other benefits?
Rails has connection pooling built in:
Simply use ActiveRecord::Base.connection as with Active Record 2.1 and earlier (pre-connection-pooling). Eventually, when you’re done with the connection(s) and wish it to be returned to the pool, you call ActiveRecord::Base.clear_active_connections!. This will be the default behavior for Active Record when used in conjunction with Action Pack’s request handling cycle.
Manually check out a connection from the pool with ActiveRecord::Base.connection_pool.checkout. You are responsible for returning this connection to the pool when finished by calling ActiveRecord::Base.connection_pool.checkin(connection).
Use ActiveRecord::Base.connection_pool.with_connection(&block), which obtains a connection, yields it as the sole argument to the block, and returns it to the pool after the block completes.
This has been available since version 2.2. You'll see a pool parameter in your database.yml for controlling it:
pool: number indicating size of connection pool (default 5)
I don't think there would be much point in layering another pooling system underneath it and it could even confuse AR's pooling if you tried it.
What is the advantage and disadvantage of connection timeout=0?
And what is the use of Connection Lifetime=0?
e.g.
(Database=TestDB;
port=3306;
Uid=usernameID;
Pwd=myPassword;
Server=192.168.10.1;
Pooling=false;
Connection Lifetime=0;
Connection Timeout=0)
and what is the use of Connection Pooling?
Connection Timeout is how long you wait for a connection to be established before you give up. Timeout=0 means you will keep waiting for the connection forever. Good, I guess, if you are connecting to a really slow server where it is normal for it to take 12 hours to respond :-). Generally a bad thing. You want to put some kind of reasonable timeout on a request, so that you can realize your target is down and move on with your life.
Connection Lifetime = how long a connection lives before it is killed and recreated. A lifetime of 0 means never kill and recreate. Normally that is not a bad thing, because killing and recreating a connection is slow. Through various bugs your connections may get stuck in an unstable state (like when dealing with weird 3-way transactions), but 99% of the time it is good to keep connection lifetime as infinite.
Connection pooling is a way to deal with the fact that creating a connection is very slow. So rather than make a new connection for every request, you keep a pool of, say, 10 premade connections. When you need one, you borrow one, use it, and return it. You can adjust the size of the pool to change how your app behaves. Bigger pool = more connections = more threads doing stuff at a time, but this could also overwhelm whatever you are connecting to.
In summary:
ConnectionTimeout=0 is bad, make it something reasonable like 30 seconds.
ConnectionLifetime=0 is okay
ConnectionPooling=disabled is bad, you will likely want to use it.
I know this is an old thread but I think it is important to point out an instance in which you may want to disable Connection Pooling or use Connection Lifetime.
In some environments (especially when using Oracle, or at least in my experience) the web application is designed so that it connects to the database using the user's credentials rather than a fixed connection string located in the server's configuration file. In this case, enabling connection pooling will cause the server to create a connection pool for each user accessing the website (see Pool Fragmentation). Depending on the scenario, this could be either good or bad.
However, connection pooling becomes a problem when the database server is configured to kill connections that exceed a maximum idle time, because the server may kill connections that still sit in the connection pool. In this scenario, Connection Lifetime may come in handy for throwing away these connections, since they have been closed by the server anyway.
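As a rough illustration of that last point, here is a minimal plain-Ruby sketch of a pool that throws away connections older than a configured lifetime. Everything in it is hypothetical (Conn, LifetimePool, the lifetime: and clock: arguments), it is not thread-safe, and it checks the age at checkout time, which is an assumption; real drivers may apply the lifetime at a different point, for example when the connection is returned to the pool. As in the connection string above, a lifetime of 0 is treated as "never expire".

```ruby
# Hypothetical connection record: an id plus the time it was opened.
Conn = Struct.new(:id, :created_at)

# Illustration only; single-threaded and not modelled on any real driver.
class LifetimePool
  def initialize(lifetime:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @lifetime = lifetime # seconds; 0 means "never expire"
    @clock    = clock    # injectable so the example can control time
    @idle     = []
    @next_id  = 0
  end

  # Reuse an idle connection unless it has outlived its lifetime.
  def checkout
    while (conn = @idle.pop)
      return conn unless expired?(conn)
      # Expired: drop it. A real driver would close the socket here.
    end
    open_connection
  end

  def checkin(conn)
    @idle.push(conn)
  end

  private

  def expired?(conn)
    @lifetime > 0 && @clock.call - conn.created_at >= @lifetime
  end

  def open_connection
    @next_id += 1
    Conn.new(@next_id, @clock.call)
  end
end

now  = 0.0
pool = LifetimePool.new(lifetime: 60, clock: -> { now })

first = pool.checkout   # opened at t = 0
pool.checkin(first)

now = 30.0
reused = pool.checkout  # 30s old, still within the 60s lifetime
pool.checkin(reused)

now = 90.0
fresh = pool.checkout   # 90s old connection is discarded, a new one is opened

puts "reused id: #{reused.id}, fresh id: #{fresh.id}"
```

If the lifetime is set shorter than the server's idle timeout, the pool stops handing out connections that the server has already killed.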
I've recently started using threads to run queries against a MySQL database. I use MyDAC to connect to the DB, and because TMyConnection does not allow simultaneous queries on a single connection, I create a new connection and a new query object for every thread that executes a query. As a result, at certain times the server may have several connections open per client. If we consider this scenario with several clients connecting to the database, I guess this would be a problem. Is there a better solution for using threads with queries?
Thanks in advance
Use a second tier where you can pool some connections (you can do this with DataSnap or RemObjects...). This way you can reuse connections across all of your users and keep the number of connections lower.
Have a look at Cary Jensen's article called
Using Semaphores in Delphi, Part 2: The Connection Pool
He goes into great detail about how to provide thread-safe access to a limited number of database connections.
Getting his code to work with MyDAC's TMyConnection is trivial.
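The idea behind a semaphore-guarded pool is language-independent, so here is a minimal sketch of it in Ruby rather than Delphi. It is not the code from the article: the names are made up, the "conn-N" strings stand in for real connection objects, and a thread-safe Queue of connections plays the role of the counting semaphore. Twenty worker threads share three connections, and the peak number in use at once is recorded.

```ruby
pool_size = 3
pool = Queue.new
pool_size.times { |i| pool << "conn-#{i}" } # placeholders for real connections

in_use = 0
peak   = 0
lock   = Mutex.new

threads = 20.times.map do
  Thread.new do
    conn = pool.pop # blocks while all connections are borrowed
    begin
      lock.synchronize do
        in_use += 1
        peak = [peak, in_use].max
      end
      sleep 0.01 # stand-in for running a query on conn
    ensure
      lock.synchronize { in_use -= 1 }
      pool << conn # always hand the connection back
    end
  end
end
threads.each(&:join)

puts "threads: #{threads.size}, peak connections in use: #{peak}"
```

However many threads are started, the server never sees more than pool_size connections from this client; threads simply wait their turn instead of opening a connection each.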