Benefits of connection pooling with Redis and Unicorn - ruby-on-rails

Background: I have a Ruby/Rails + Nginx/Unicorn web app with connections to multiple Redis DBs (i.e. I am not using Redis.current and am instead using global variables for my different connections). I understand that I need to create a new connection in the after_fork block when a new Unicorn worker is created, as explained here and here.
My question is about the need for connection pooling. According to this SO thread, "In Unicorn each process establishes its own connection pool, so you if your db pool setting is 5 and you have 5 Unicorn workers then you can have up to 25 connections. However, since each unicorn worker can handle only one connection at a time, then unless your app uses threading internally each worker will only actually use one db connection... Having a pool size greater than 1 means each Unicorn worker has access to connections it can't use, but it won't actually open the connections, so that doesn't matter."
Since I am NOT using Sidekiq, do I even need to use connection pools for my Redis connections? Is there any benefit of a connection pool with a pool size of 1? Or should I simply use variables with single connections -- e.g. Redis.new(url: ENV["MY_CACHE"])?

The connection pool is only used when ActiveRecord talks to the SQL databases defined in your database.yml config file. It is not related to Redis at all, and the SO answer that you cite is not actually relevant for Redis.
So, unless you want to use some custom connection pool solution for Redis, you don't have to deal with it at all, as there is no pool for Redis in Rails by default. I guess a custom pool might be suitable if you had multiple threads in your application, which is not your case.
Update: Does building a connection pool make sense in your scenario? I doubt it. A connection pool is a way to reuse open connections (typically among multiple threads / requests). But you say that you:
use unicorn, the workers of which are separate, independent processes, not threads,
open a stable connection (or two) during after_fork, a connection which then stays open for as long as the unicorn worker lives,
do not use threads anywhere in your application (I'd double-check that this is true - it's not only Sidekiq; any gem that tends to do things in the background might use threads).
In such a scenario, pooling connections to Redis makes no sense to me, as there seems to be no code that would benefit from reusing the connection - it is open all the time anyway.
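For illustration, the single-connection-per-worker setup described above might look roughly like this in a Unicorn config file. This is only a sketch: the global variable names and the MY_SESSIONS environment variable are made up for the example (MY_CACHE comes from the question), and it assumes the redis gem is already loaded by the application.

```ruby
# config/unicorn.rb (fragment, not a complete configuration)
after_fork do |server, worker|
  # Runs inside every freshly forked worker process. Each worker opens its
  # own connections instead of sharing sockets inherited from the master.
  $cache_redis = Redis.new(url: ENV["MY_CACHE"])

  # Hypothetical second Redis DB; the variable and env names are assumptions.
  $session_redis = Redis.new(url: ENV["MY_SESSIONS"])
end
```

With one request handled at a time per worker, each of these plain connections is only ever used by one caller at once, which is why a pool adds nothing here.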

Related

(Heroku + Sidekiq) Is my understanding of how Connection Pooling works correct?

Assume I have the below setup on Heroku + Rails, with one web dyno and two worker dynos.
Below is what I believe to be true, and I'm hoping that someone can confirm these statements or point out an assumption that is incorrect.
I'm confident in most of this, but I'm a bit confused by the usage of client and server, "connection pool" referring to both DB and Redis connections, and "worker" referring to both puma and heroku dyno workers.
I wanted to be crystal clear, and I hope this can also serve as a consolidated guide for any other beginners having trouble with this
Thanks!
How everything interacts
A web dyno (where the Rails application runs)
only interacts with the DB when it needs to query it to serve a page request
only interacts with Redis when it is pushing jobs onto the Sidekiq queue (stored in Redis). It is the Sidekiq client
A Worker dyno
only interacts with the DB if the Sidekiq job it's running needs to query the DB
only interacts with Redis to pull jobs from the Sidekiq queue (stored in Redis). It is the Sidekiq server
ActiveRecord Pool Size
An ActiveRecord pool size of 25 means that each dyno has 25 connections to work with. (This is what I'm most unsure of. Is it each dyno or each Puma/Sidekiq worker?)
For the web dyno, it can only run 10 things (threads) at once (2 puma x 5 threads), so it will only consume a maximum of 10 connections. 25 is above and beyond what it needs.
For worker dynos, the Sidekiq concurrency of 15 means 15 Sidekiq threads (jobs) can run at a time. Again, 25 connections is beyond what it needs, but it's a nice buffer to have in case there are stale or dead connections that won't clear.
In total, my Postgres DB can expect 10 connections from the web dyno and 15 connections from each worker dyno for a total of 40 connections maximum.
Redis Pool Size
The web dyno (Sidekiq client) will use the connection pool size specified in the Sidekiq.configure_client block. Generally ~3 is sufficient because the client isn't constantly adding jobs to the queue. (Is it 3 per dyno, or 3 per Puma worker?)
Each worker dyno (Sidekiq server) will use the connection pool size specified in the Sidekiq.configure_server block. By default it's sidekiq concurrency + 2, so here 17 redis connections will be taken up by each dyno
I don't know Heroku + Rails but believe I can answer some of the more generic questions.
From the client's perspective, the setup/teardown of any connection is very expensive. The concept of connection pooling is to have a set of connections which are kept alive and can be used for some period of time. The JDK HttpURLConnection does the same (assuming HTTP 1.1) so that - assuming you're going to the same server - the HTTP connection stays open, waiting for the next expected request. Same thing applies here - instead of closing a JDBC connection each time, the connection is maintained - assuming same server and authentication credentials - so the next request skips the unnecessary work and can immediately move forward in sending work to the database server.
There are many ways to maintain a client-side pool of connections. It may be part of the JDBC driver itself, or you might need to implement pooling using something like Apache Commons Pooling, but whatever you do, it's going to improve your performance and reduce errors caused by network hiccups that could prevent your client from connecting to the server.
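The idea of a client-side pool can be sketched in a few lines of plain Ruby. This is a toy illustration, not how any particular driver implements it: FakeConnection and TinyPool are hypothetical names, and the "connection" does no real I/O.

```ruby
# FakeConnection is a hypothetical stand-in for a real database or Redis
# client. It counts how many times the expensive "connect" step runs.
class FakeConnection
  @opened = 0
  class << self
    attr_accessor :opened
  end

  def initialize
    self.class.opened += 1
  end

  def query(sql)
    "result of #{sql}"
  end
end

# A toy fixed-size pool built on Ruby's thread-safe Queue.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { @available << yield } # open every connection once, up front
  end

  # Borrow a connection, run the block, and always hand the connection back.
  def with
    conn = @available.pop # blocks while all connections are checked out
    yield conn
  ensure
    @available << conn if conn
  end
end

pool = TinyPool.new(2) { FakeConnection.new }

# Ten threads share the two connections; none of them opens a new one.
threads = Array.new(10) do |i|
  Thread.new { pool.with { |conn| conn.query("SELECT #{i}") } }
end
results = threads.map(&:value)

puts FakeConnection.opened
puts results.size
```

Ten pieces of work are served while the connect step runs only twice, which is the saving the paragraph above describes.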
Server-side, most database providers are configured with a pool of n possible connections that the database server may accept. Usually each additional connection has a footprint - usually quite small - so based on the memory available you can figure out the maximum number of available connections.
In most cases, you're going to want to have larger-than-expected connections available. For example, in postgres, the configured connection pool size is for all connections to any database on that server. If you have development, test, and production all pointed at the same database server (obviously different databases), then connections used by test might prevent a production request from being fulfilled. Best not to be stingy.
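The connection arithmetic from the question can be checked with a short calculation. This sketch rests on two assumptions: each process has its own pool, and each thread holds at most one connection at a time, so a process never uses more than the smaller of its pool size and its thread count. The ProcessGroup struct, the helper name, and the one-Sidekiq-process-per-dyno figure are illustrative.

```ruby
# Describes one kind of process in the deployment.
ProcessGroup = Struct.new(:name, :dynos, :processes_per_dyno, :threads_per_process)

# Upper bound on simultaneous database connections across all groups.
def max_connections(groups, pool_size)
  groups.sum do |g|
    g.dynos * g.processes_per_dyno * [pool_size, g.threads_per_process].min
  end
end

groups = [
  ProcessGroup.new("web", 1, 2, 5),      # 1 dyno x 2 Puma workers x 5 threads
  ProcessGroup.new("sidekiq", 2, 1, 15)  # 2 dynos x 1 Sidekiq process x 15 threads
]

total = max_connections(groups, 25)
puts total # 10 from web + 30 from Sidekiq
```

The total has to stay below the connection limit configured on the database server, together with whatever other environments point at the same server.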

Split postgres connections between web and worker dynos?

I'm using two hobby dev dynos, one web, one worker - with a hobby basic postgres. I think I used up my (20) postgres connections with the worker, I was wondering how many connections should be allocated for each?
This depends on how many 'processes' you're running on your web and worker dynos.
If you're running one web process, and one worker process, and each has 10 connections in the connection pool, you'd then be using your full allocation of connections (20). This would likely be the most optimal split.

Reason to use a global resource to connect to a redis-server

So, recently I moved all the session-related information in my app to the redis. Everything is running fine and now I am not facing the cookie-related issues (especially from IE).
In doing that, I read some blogs and all of them defined a redis-connector as a global variable in the config like
$redis = Redis.new(:host => 'localhost', :port => 6379)
Now there are a few things that bugging me:
Defining a global resource means that I have just a single connection to the redis. Will it create a bottleneck in my system when I have to serve multiple requests?
Also when multiple request arrives, will the Rails enqueue the requests for the redis as the connection is global resource, in case it is already in use?
Redis supports multiple instances. Wouldn't creating multiple instances boost the performance?
There are no standard connection pools included in the Redis gem. If we consider Rails as a single-threaded execution model, that doesn't sound too problematic.
It might be evil when used in multi-threaded environment (think of background jobs as an example). So connection pooling is a good idea in general.
You can implement it for Redis using connection_pool gem.
Sidekiq also uses this gem for connecting to Redis. It can be seen here and here. Also, sidekiq author is the same person as connection_pool author, https://github.com/mperham.
As to your questions:
Multiple requests still don't mean multi-threading, so this approach might work well before you use threads;
Rails is not going to play the role of connection pool for your database;
It will boost performance (and avoid certain errors) if used in multi-threaded environment.
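A minimal sketch of the connection_pool approach mentioned above, assuming the connection_pool and redis gems are installed. The file name, the REDIS_POOL constant, the pool size, the timeout, and the helper method are all illustrative choices, not recommendations.

```ruby
# config/initializers/redis.rb (hypothetical file name)
require "connection_pool"
require "redis"

# Up to 5 connections, created lazily; a caller waits at most 3 seconds
# for a free one before the pool raises a timeout error.
REDIS_POOL = ConnectionPool.new(size: 5, timeout: 3) do
  Redis.new(host: "localhost", port: 6379)
end

# Hypothetical helper: each caller checks a connection out for the duration
# of the block, so concurrent threads never share one socket.
def store_session(session_id, payload)
  REDIS_POOL.with do |redis|
    redis.set("session:#{session_id}", payload)
  end
end
```

In a single-threaded server this behaves the same as the global $redis variable; the difference only shows once several threads call into Redis at the same time.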
1) No, it's not a bottleneck; opening a new TCP connection to Redis for every query/request is what would hurt performance.
3) Yes, if you have more than one core/thread.
Simply measure the number of Redis connections to see that no new connection is created before each Rails request is processed. The connection is established on the Rails application server side (Unicorn, Puma, Passenger, etc.) while the application loads.
echo info | redis-cli | grep connected_clients
Try running the command above before starting your application and again while it is running locally.

How to properly use/plug Redis with Rails?

I have a Rails application that I want to connect to a Redis data structure server. I'm wondering how I should proceed. I'm using a global variable $redis, located in config/initializers/redis.rb, to make queries across the entire application.
I believe this approach is not suitable for an application with 80+ simultaneous connections, because it uses one single global variable to handle the Redis connection.
What should I do to overcome this problem? am I missing something about Rails internals?
Tutorial I'm following
http://jimneath.org/2011/03/24/using-redis-with-ruby-on-rails.html
This depends on the application server you will use. If you're using Unicorn, which is a popular choice, you should be fine.
Unicorn forks its workers and each one will establish its own database connection. And since each worker can only handle one request at a time, it will only need one connection at a time. Adding more connections won't increase performance; it will just open more (useless) connections.
ActiveRecord (which is the DB part of Rails) and DataMapper support connection pooling, which is a common solution to the problem you've mentioned. Connection pooling, however, only makes sense in a threaded environment.
On top of that, Redis is mainly single threaded (search for "Single threaded nature of Redis"), so there might be no advantage anyway. There was a request to add connection pooling but it got closed; you might get more information from there.
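The "one connection per forked worker" behaviour can be imitated with plain Ruby on a Unix-like system. This is a toy model, not Unicorn's actual code: FakeConnection is a hypothetical stand-in for a Redis client and the helper name is made up.

```ruby
# Stand-in for a real client; it only remembers which process opened it.
FakeConnection = Struct.new(:owner_pid)

# Forks worker_count children, lets each open its own "connection" after
# the fork, and reports who owns each connection.
def connection_owners(worker_count)
  reader, writer = IO.pipe

  pids = Array.new(worker_count) do
    fork do
      reader.close
      conn = FakeConnection.new(Process.pid) # opened in the child, after the fork
      writer.puts(conn.owner_pid)
      writer.close
      exit!(0) # leave the child immediately, skipping at_exit hooks
    end
  end

  writer.close
  pids.each { |pid| Process.wait(pid) }
  owners = reader.read.split.map(&:to_i)
  reader.close

  [pids.sort, owners.sort]
end

worker_pids, owner_pids = connection_owners(3)
puts worker_pids == owner_pids
```

Every worker ends up with exactly one connection of its own, so the total number of connections equals the number of workers regardless of how many requests arrive.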

Puma Cluster configuration on Heroku

I need some help with my configuration of Puma (Multi-Thread+Multi-Core Server) on my RoR4 Heroku app.
The Heroku docs on that are not quite up-to-date. I followed this one: Concurrency and Database Connections for the configuration, which does not mention the configuration for a Cluster, so I had to use both types together (threaded and multicore).
My current configuration:
./Procfile
web: bundle exec puma -p $PORT -C config/puma.rb
./config/puma.rb
environment production
threads 0,16
workers 4
preload_app!
on_worker_boot do
  ActiveRecord::Base.connection_pool.disconnect!
  ActiveSupport.on_load(:active_record) do
    config = Rails.application.config.database_configuration[Rails.env]
    config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    config['pool'] = ENV['DB_POOL'] || 5
    ActiveRecord::Base.establish_connection
  end
end
Questions:
a) Do I need the before_fork / after_fork configuration like in Unicorn, since the Cluster workers are forked?
b) How do I tune my thread count depending on my application - what would be the reason to drop it down? / In what cases would it make a difference? Isn't 0:16 already optimized?
c) The Heroku database allows 500 connections. What would be a good value for DB_POOL depending on thread, worker and dyno count? - Does every thread per worker per dyno require a sole DB connection when working parallely?
In general: How should my configuration look like for concurrency and performance?
a) Do I need the before_fork / after_fork configuration like in Unicorn, since the Cluster workers are forked?
Normally no, but since you're using preload_app, yes. Preloading the app gets an instance up and running and then forks the memory space for the workers; the result is that your initializers only get run once (possibly allocating db connections and such). In this instance, your on_worker_boot code is appropriate. If you're not using preload_app, then each worker boots itself, in which case using an initializer would be ideal for setting up the custom connection like you're doing. In fact, without preload_app, your on_worker_boot block would error out because at that point ActiveRecord and friends aren't even loaded.
b) How do I tune my thread count depending on my application - what would be the reason to drop it down? / In what cases would it make a difference? Isn't 0:16 already optimized?
On Heroku (and in my testing) you're best off matching your min and max threads, with max <= your DB_POOL setting. A lower min threads value lets your application spin down resources when not under load, which is normally great for freeing up resources on the server, but is likely less needed on Heroku; that dyno is already dedicated to serving web requests, so you may as well have the threads up and ready. Setting your max threads <= your DB_POOL environment variable isn't required, but without it you run the risk of consuming all the database connections in the pool; a thread then wants a connection but can't get one, and you get the old "ActiveRecord::ConnectionTimeoutError - could not obtain a database connection within 5 seconds." error. This depends on your application, though; you very well could have max > DB_POOL and be fine. I would say your DB_POOL should be at least the same as your min threads value, even though connections are not eagerly opened (5:5 threads won't open 5 connections if your app never hits the database).
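Put together, the advice in a) and b) might translate into a config along these lines. This is a sketch rather than a drop-in file: the WEB_CONCURRENCY and MAX_THREADS variable names and their defaults are assumptions, and it presumes the pool size in database.yml is set to at least the same thread count (for example from DB_POOL).

```ruby
# config/puma.rb (fragment)
max_threads = Integer(ENV.fetch("MAX_THREADS", 5)) # assumed variable name
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))   # assumed variable name
threads max_threads, max_threads                   # min == max, as suggested above
preload_app!

on_worker_boot do
  # Runs in each forked worker. Because the app was preloaded in the master,
  # every worker re-establishes its own ActiveRecord connection pool.
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end
```

Keeping the thread count and the pool size driven by the same number makes it harder for the two to drift apart when one of them is tuned later.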
c) The Heroku database allows 500 connections. What would be a good value for DB_POOL depending on thread, worker and dyno count? - Does every thread per worker per dyno require a sole DB connection when working parallely?
The Production Tier allows 500, to be clear :)
Every thread per worker per dyno could consume a connection, depending on whether they're all trying to access the database at the same time. Usually the connections are reused once they're done, but as I mentioned in b), if your threads outnumber your pool you can have a bad time. The connections will be reused, and all of this is handled by ActiveRecord, but sometimes not ideally. Sometimes connections go idle, or die, and that's why turning on the Reaper is suggested, to detect and reclaim dead connections.
You don't want fewer DB connections than threads. Remember that each separate process has its own connection pool, so if your DB supports 20 connections and you want to run 2 processes, the most threads you can run without risking timeouts is 10 threads each with a pool of 10 connections.
You want to leave a few connections for rails console sessions. Also be aware of background workers, and whether they are threaded.
If your workers are in a separate process (sidekiq), they will have their own pool. If your workers' threads are spawned from the web process (girl_friday or sucker_punch), you will want the DB_POOL to be larger than the max number of web threads, since they will be sharing a connection pool.
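The sizing rule in the last few paragraphs reduces to a one-line calculation. The helper name and the idea of passing the spare connections as a parameter are illustrative.

```ruby
# Largest per-process thread count (and matching pool size) that cannot
# exhaust the database's connection limit, keeping `reserved` connections
# free for things like rails console sessions.
def threads_per_process(db_limit, processes, reserved = 0)
  raise ArgumentError, "processes must be positive" unless processes.positive?

  (db_limit - reserved) / processes
end

puts threads_per_process(20, 2)    # the example above: 10 threads per process
puts threads_per_process(20, 2, 2) # keeping 2 connections spare: 9
```

If background threads are spawned inside the web process, they draw from the same pool, so the pool needs to cover the web threads plus those background threads.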

Resources