we are having problems with connection pooling our DB connections (Otherwise we encounter “Cannot obtain database connection”).
For some reason we need to define a connection per Sidekiq worker we have running which will limit us in the very near future in terms of scaling our pods above the number connections we have available to our database.
So, for example, if each of our sidekiq processes has max 100 workers and our max database connections is 3500, we can have max 35 sidekiq processes up.
Anyone have a similar environment and can share some insights on how to have sidekiq workers re-use the database connections?
Ideally, I think the number of connections should be approx. 10% of the number of running workers and then having the workers re-using the connections to the DB.
Related
I just upgrade our database plan on Heroku for Postgres. On the new plan we have a lot more connections and I'm trying to make sure we're making full use of them at scale.
Say we configured our Puma server with the 40 threads:
puma -t 40:40
...and I set the pool size to 60 (just for a bit of buffer). My understanding is that because I've preallocated 40 Puma threads, each one will reserve a connection, resulting in 40 active connections. However, if I check the active connections there are only 5.
Am I completely misunderstanding how this works?
I am far from an expert in Puma so I just share my own knowledge.
First if you set the number of threads to 40, then your Puma worker will have 40 threads. Though be careful, because of GIL (or GVL) your Puma worker can have only a single thread doing a Ruby task at once. The 39 remaining threads are just sitting idle. UNLESS they are doing I/O (access to database or such ).
Basically the common knowledge is that after 5 threads, you have no more gain from increasing the number of threads. Maybe this can be pushed to 10 if your app is really I/O oriented but I wouldn't go further..
The real concurrency is set by the number of Puma workers (if you boot Puma in clustered mode). If you set the number of Puma workers to 40 then your app can at least handle 40 users at a time.
But 40 workers requires a huge Heroku Dyno, with quite a bit of RAM. Also if you add 5 threads per Puma worker then you need 200 DB connections !
What about the live DB connections
Due to the above, it is very hard to have a single worker with 40 threads to have them all access the DB at the same time. This is probably why your live DB connections are only 5 (unless you have not redeployed your app after the change).
I have a small app and also see a varying number of live DB connections across time.
The buffer
Never do a buffer. You are just blocking connections that can't be accessed by your app. The thread pool should equates the max number of threads.
My question: why so many DB connections ?
What was your goal in increasing the DB connections ? More concurrency ? If you have a small app, with a small web dyno, there is no point to have a big database plan behind.
If you want to scale your app. Get a bigger web dyno. Add more Puma workers while sticking to a number of threads to 5.
When the number of workers multiplied by the number of threads exceeds the number of allowed database connections, then it is time to upgrade the database.
Nota Bene : Rails may use a few connections for its internals. So if you have a database with 20 connnections, a Puma config with 3 workers and 5 threads. It is better to upgrade before adding a fourth worker.
I'm getting this error sporadically on my Prod server.
ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds
I see there is not high CPU usage for DB, but still this error happened once a day maybe twice.
Puma.rb
threads 2, 100
workers 2
database.yml
pool: 15
Ruby
ruby:2.3
Puma
puma (3.11.2)
DB size
db.m5.large
In your current configuration, each puma worker has its own connection pool with 15 available database connections. And each worker is allowed to scale between 2 and 100 threads depending on the server load.
This means when the load goes up or there are more long-running requests then your server will run create more than 15 threads and at that point, your database connection pool will be empty and the new threads have to wait for other threads to return database connections. This might take a while and after 5 seconds of waiting you will observe an ActiveRecord::ConnectionTimeoutError exception.
To solve this issue you have to make sure that the database connection pool is big enough for all your threads. But at the same time, you have to make sure that the total number of all connections in all pools – in the workers plus in sidekiq and other tools (database management tools or Rails consoles) – is below the maximum number of connections available by your database.
My advice is: First, figure out the maximum number of connections in your database. You might find this information in your database's config or in the docs of your database provider. Then split that number overall workers and tools like Sidekiq. Once you know the max number of connections per worker set the max number of thread to that number.
Example: Imaging your database supports 64 connections. Then you might want to run two servers with 2 workers each, a Sidekiq instance with 4 workers and you want to have a buffer to be able to connect a Rails console or a backup system to the database too.
2 servers with 2 workers 48
8 sidekiq workers 8
buffer 8
With those numbers, I would set the connection pool in Rails' database config to 12 and would set the number of threads in the puma config to 2, 12
Read more about this topic at Concurrency and Database Connections in Ruby with ActiveRecord in the Heroku devcenter.
After reading this blog: https://manuelvanrijn.nl/blog/2012/11/13/sidekiq-on-heroku-with-redistogo-nano/
I noticed SideKiq had two settings that I originally thought referred to the same thing:
server pool size
concurrency
Concurrency is set in the sidekiq.yml and server pool size is set in the sidekiq.rb file in the initializer folder.
What is the difference between these two settings?
Concurrency - number of tasks that can be performed parallelly(threads) in sidekiq.
If concurrency is 2 and 10 jobs come to sidekiq, it will execute 2 jobs at a time. Once the first two jobs are completed then only it can execute the next jobs.
Server Pool Size - In order to fetch jobs from Redis sidekiq will have to connect to the Redis server. A Redis server will have a limit of max connections. Server pool size will limit the max number of connections a sidekiq thread can open to the Redis server. A single thread will at max need 3 Redis connection. So if the concurrency is 2 then the server pool size should be 6.
If we don't add server pool size, sidekiq will create new connections to Redis as needed(instead of reusing the existing open connections) this might exhaust the number of connections available in the Redis
Concurrency is the number of job execution threads and makes a big difference in how much RAM your process will consume during its life. Pool size is the number of Redis connections in the process. A Sidekiq process must have at least (concurrency + 2) connections available.
I strongly recommend you don't configure the pool size at all, let Sidekiq do it for you.
Here is my problem: Each night, I have to process around 50k Background Jobs, each taking an average of 60s. Those jobs are basically calling the Facebook, Instagram and Twitter APIs to collect users' posts and save them in my DB. The jobs are processed by sidekiq.
At first, my setup was:
:concurrency: 5 in sidekiq.yml
pool: 5 in my database.yml
RAILS_MAX_THREADS set to 5 in my Web Server (puma) configuration.
My understanding is:
my web server (rails s) will use max 5 threads hence max 5 connections to my DB, which is OK as the connection pool is set to 5.
my sidekiq process will use 5 threads (as the concurrency is set to 5), which is also OK as the connection pool is set to 5.
In order to process more jobs in the same time and reducing the global time to process all my jobs, I decided to increase the sidekiq concurrency to 25. In Production, I provisionned a Heroku Postgres Standard Database with a maximum connection of 120, to be sure I will be able to use Sidekiq concurrency.
Thus, now the setup is:
:concurrency: 25 in sidekiq.yml
pool: 25 in my database.yml
RAILS_MAX_THREADS set to 5 in my Web Server (puma) configuration.
I can see that 25 sidekiq workers are working but each Job is taking way more time (sometimes more than 40 minutes instead of 1 minute) !?
Actually, I've been doing some tests and realize that processing 50 of my Jobs with a sidekiq concurrency of 5, 10 or 25 result in the same duration. As if somehow, there was a bottleneck of 5 connections somewhere.
I have checked Sidekiq Documentation and some other posts on SO (sidekiq - Is concurrency > 50 stable?, Scaling sidekiq network archetecture: concurrency vs processes) but I haven't been able to solve my problem.
So I am wondering:
is my understanding of the rails database.yml connection pool and sidekiq concurrency right ?
What's the correct way to setup those parameters ?
Dropping this here in case someone else could use a quick, very general pointer:
Sometimes increasing the number of concurrent workers may not yield the expected results.
For instance, if there's a large discrepancy between the number of tasks and the number of cores, the scheduler will keep switching your tasks and there isn't really much to gain, the jobs will just take about the same or a bit more time.
Here's a link to a rather interesting read on how job scheduling works https://en.wikipedia.org/wiki/Scheduling_(computing)#Operating_system_process_scheduler_implementations
There are other aspects to consider as well, such as datastore access, are your workers using the same table(s)? Is it backed by a storage engine that locks the entire table, such as MyISAM? If that's the case, it won't matter if you have 100 workers running at the same time, and enough RAM and cores, they will all be waiting in line for whichever query is running to release the lock on the table they're all meant to be working with.
This can also happen with tables using engines such as InnoDB, which doesn't lock the entire table on write but you may have different workers accessing the same rows (InnoDB uses row-level locking) or simply some large indexes that don't lock but slow down the table.
Another issue I've encountered was related to Rails (which I'm assuming you're using) taking quite a toll on RAM in some cases, so you might want to look at your memory footprint as well.
My suggestion is to turn on logging and look at the data, where do your workers spend most time at? Is it something on the network layer (unlikely), is it waiting to get access to a core? Reading/writing from your data store? Is your machine swapping?
Assume I have the below setup on Heroku + Rails, with one web dyno and two worker dynos.
Below is what I believe to be true, and I'm hoping that someone can confirm these statements or point out an assumption that is incorrect.
I'm confident in most of this, but I'm a bit confused by the usage of client and server, "connection pool" referring to both DB and Redis connections, and "worker" referring to both puma and heroku dyno workers.
I wanted to be crystal clear, and I hope this can also serve as a consolidated guide for any other beginners having trouble with this
Thanks!
How everything interacts
A web dyno (where the Rails application runs)
only interacts with the DB when it needs to query it to serve a page request
only interacts with Redis when it is pushing jobs onto the Sidekiq queue (stored in Redis). It is the Sidekiq client
A Worker dyno
only interacts with the DB if the Sidekiq job it's running needs to query the DB
only interacts with Redis to pull jobs from the Sidekiq queue (stored in Redis). It is the Sidekiq server
ActiveRecord Pool Size
An ActiveRecord pool size of 25 means that each dyno has 25 connections to work with. (This is what I'm most unsure of. Is it each dyno or each Puma/Sidekiq worker?)
For the web dynos, it can only run 10 things (threads) at once (2 puma x 5 threads), so it will only consume a maximum of 10 threads. 25 is above and beyond what it needs.
For worker dynos, the Sidekiq concurrency of 15 means 15 Sidekiq processes can run at a time. Again, 25 connections is beyond what it needs, but it's a nice buffer to have in case there are stale or dead connections that won't clear.
In total, my Postgres DB can expect 10 connections from the web dyno and 15 connects from each worker dyno for a total of 40 connections maximum.
Redis Pool Size
The web dyno (Sidekiq client) will use the connection pool size specified in the Sidekiq.configure_client block. Generally ~3 is sufficient because the client isn't constantly adding jobs to the queue. (Is it 3 per dyno, or 3 per Puma worker?)
Each worker dyno (Sidekiq server) will use the connection pool size specified in the Sidekiq.configure_server block. By default it's sidekiq concurrency + 2, so here 17 redis connections will be taken up by each dyno
I don't know Heroku + Rails but believe I can answer some of the more generic questions.
From the client's perspective, the setup/teardown of any connection is very expensive. The concept of connection pooling is to have a set of connections which are kept alive and can be used for some period of time. The JDK HttpUrlConnection does the same (assuming HTTP 1.1) so that - assuming you're going to the same server - the HTTP connection stays open, waiting for the next expected request. Same thing applies here - instead of closing a JDBC connection each time, the connection is maintained - assuming same server and authentication credentials - so the next request skips the unnecessary work and can immediately move forward in sending work to the database server.
There are many ways to maintain a client-side pool of connections, it may be part of the JDBC driver itself, you might need to implement pooling using something like Apache Commons Pooling, but whatever you do it's going to increase your behavior and reduce errors that might be caused by network hiccups that could prevent your client from connecting to the server.
Server-side, most database providers are configured with a pool of n possible connections that the database server may accept. Usually each additional connection has a footprint - usually quite small - so based on the memory available you can figure out the maximum number of available connections.
In most cases, you're going to want to have larger-than-expected connections available. For example, in postgres, the configured connection pool size is for all connections to any database on that server. If you have development, test, and production all pointed at the same database server (obviously different databases), then connections used by test might prevent a production request from being fulfilled. Best not to be stingy.