I have a Rails application that I want to connect to a Redis data structure server, and I'm wondering how I should proceed. I'm currently using a global variable $redis, defined in config/initializers/redis.rb, to make queries across the entire application.
I believe this approach is not suitable for an application with 80+ simultaneous connections, because it uses a single global variable to handle the Redis connection.
What should I do to overcome this problem? Am I missing something about Rails internals?
Tutorial I'm following
http://jimneath.org/2011/03/24/using-redis-with-ruby-on-rails.html
This depends on the application server you will use. If you're using Unicorn, which is a popular choice, you should be fine.
Unicorn forks its workers, and each one will establish its own database connection. Since each worker can only handle one request at a time, it will only need one connection at a time. Adding more connections won't increase performance; it will just open more (useless) connections.
ActiveRecord (the DB part of Rails) and DataMapper support connection pooling, which is a common solution to the problem you've mentioned. Connection pooling, however, only makes sense in a threaded environment.
On top of that, Redis is mainly single-threaded (search for "Single threaded nature of Redis"), so there might be no advantage anyway. There was a request to add connection pooling, but it got closed; you might get more information from there.
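To make the per-worker setup concrete, here is a minimal sketch, assuming Unicorn and the initializer path from the question; the host and port values are illustrative:
# config/initializers/redis.rb -- one connection per process
$redis = Redis.new(host: "localhost", port: 6379)

# config/unicorn.rb -- re-establish the connection in each forked
# worker so workers don't share the socket inherited from the master
after_fork do |server, worker|
  $redis = Redis.new(host: "localhost", port: 6379)
end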
Related
I'm working on adding pgBouncer to my existing Rails Heroku application. My Rails app uses Sidekiq, has one Heroku postgres DB, and one TimescaleDB. I'm a bit confused about all the layers of connection pooling between these different services. I have the following questions I'd love some help with:
Rails already provides connection pooling out of the box, right? If so, what's the benefit of adding pgBouncer? Will they conflict?
Sidekiq already provides connection pooling out of the box, right? If so, what's the benefit of adding pgBouncer? Will they conflict?
I'm trying to add pgBouncer via the Heroku Buildpack for pgBouncer. It seems that pgBouncer only gets added to the web dyno but not to Sidekiq. Why is that? Shouldn't the worker and web dynos both be using pgBouncer?
The docs say that I can use pgBouncer on multiple databases. However, pgBouncer fails when trying to add my TimescaleDB database URL to pgBouncer. I've checked the script and everything looks accurate but I get the following error: ActiveRecord::ConnectionNotEstablished: connection to server at "127.0.0.1", port 6000 failed: ERROR: pgbouncer cannot connect to server. What's the best way to console into the pgBouncer instance and see in more detail what's breaking?
Thanks so much for your help.
They serve different purposes. The ActiveRecord connection pool manages and limits connections at the thread level within a single process, while pgBouncer lets you pool connections from several applications (or several processes) to the same DB or several DBs.
Sidekiq maintains its own internal connection pool, while pgBouncer works across multiple processes in parallel and offers three pooling modes: session, transaction, and statement.
This doc can probably help you understand that part.
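As a hedged sketch of how the two layers coexist: Rails keeps its own per-process pool while pointing at the local pgBouncer port (6000 here, matching the error above). The pool size is illustrative, and prepared_statements is disabled because pgBouncer's transaction mode does not support prepared statements:
# config/database.yml (sketch)
production:
  adapter: postgresql
  host: 127.0.0.1
  port: 6000                 # pgBouncer listens here, not Postgres
  pool: 5                    # ActiveRecord's per-process pool still applies
  prepared_statements: false # needed for pgBouncer's transaction mode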
I think you can try the pgBouncer admin console to check what's going on, though I'm not sure it will work for troubleshooting as you expect.
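For example, assuming pgBouncer is listening on port 6000 as in the error above and your user is listed in its admin_users setting, you can connect to the special pgbouncer database and inspect its state:
psql -p 6000 -U postgres pgbouncer
# then, at the psql prompt:
SHOW POOLS;    -- pool usage per database/user
SHOW SERVERS;  -- backend connections and their state
SHOW CLIENTS;  -- currently attached client connections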
Background: I have a Ruby/Rails + Nginx/Unicorn web app with connections to multiple Redis DBs (i.e. I am not using Redis.current and am instead using global variables for my different connections). I understand that I need to create a new connection in the after_fork block when a new Unicorn worker is created, as explained here and here.
My question is about the need for connection pooling. According to this SO thread, "In Unicorn each process establishes its own connection pool, so if your db pool setting is 5 and you have 5 Unicorn workers then you can have up to 25 connections. However, since each Unicorn worker can handle only one connection at a time, unless your app uses threading internally each worker will only actually use one db connection... Having a pool size greater than 1 means each Unicorn worker has access to connections it can't use, but it won't actually open the connections, so that doesn't matter."
Since I am NOT using Sidekiq, do I even need to use connection pools for my Redis connections? Is there any benefit of a connection pool with a pool size of 1? Or should I simply use variables with single connections -- e.g. Redis.new(url: ENV["MY_CACHE"])?
The connection pool is only used when ActiveRecord talks to the SQL databases defined in your database.yml config file. It is not related to Redis at all, and the SO answer that you cite is actually not relevant for Redis.
So, unless you want to use some custom connection pool solution for Redis, you don't have to deal with it at all, as there is no pool for Redis in Rails by default. I guess a custom pool might be suitable if you had multiple threads in your application, which is not your case.
Update: Does building a connection pool make sense in your scenario? I doubt it. A connection pool is a way to reuse open connections (typically among multiple threads / requests). But you say that you:
use Unicorn, whose workers are separate, independent processes, not threads,
open a stable connection (or two) during after_fork, a connection which then stays open for as long as the Unicorn worker lives,
do not use threads in your application anywhere (I'd double-check this - it's not only Sidekiq; it might be any gem that tends to do things in the background).
In such a scenario, pooling connections to Redis makes no sense to me, as there seems to be no code that would benefit from reusing a connection - it is open all the time anyway.
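A minimal sketch of the setup described above, with multiple plain global connections re-established per worker; MY_CACHE comes from the question, while MY_SESSIONS is a made-up second example:
# config/unicorn.rb (sketch)
after_fork do |server, worker|
  # one plain connection per Redis DB -- no pool needed when each
  # worker is a single-threaded process
  $redis_cache    = Redis.new(url: ENV["MY_CACHE"])
  $redis_sessions = Redis.new(url: ENV["MY_SESSIONS"])
end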
So, recently I moved all the session-related information in my app to Redis. Everything is running fine, and I am no longer facing the cookie-related issues (especially with IE).
In doing that, I read some blogs, and all of them defined a Redis connector as a global variable in the config, like
$redis = Redis.new(:host => 'localhost', :port => 6379)
Now there are a few things that are bugging me:
Defining a global resource means that I have just a single connection to Redis. Will it create a bottleneck in my system when I have to serve multiple requests?
Also, when multiple requests arrive, will Rails queue the requests for Redis, since the connection is a global resource that may already be in use?
Redis supports multiple instances. Wouldn't creating multiple instances boost the performance?
There is no standard connection pool included in the Redis gem. If we treat Rails as a single-threaded execution model, that doesn't sound too problematic.
It might be evil when used in a multi-threaded environment (think of background jobs as an example). So connection pooling is a good idea in general.
You can implement it for Redis using the connection_pool gem.
Sidekiq also uses this gem for connecting to Redis, as can be seen here and here. Also, the Sidekiq author is the same person as the connection_pool author, https://github.com/mperham.
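A minimal sketch of what that looks like; the host, port, and pool size are illustrative:
require "connection_pool"
require "redis"

# Up to 5 connections, shared safely between threads
$redis = ConnectionPool.new(size: 5, timeout: 5) do
  Redis.new(host: "localhost", port: 6379)
end

# Each caller checks a connection out for the duration of the block
$redis.with do |conn|
  conn.set("foo", "bar")
end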
As to your questions:
Multiple requests still don't mean multi-threading, so this approach may work well until you start using threads;
Rails itself is not going to act as a connection pool for your database;
It will boost performance (and avoid certain errors) if used in a multi-threaded environment.
1) No, it's not a bottleneck; opening a new TCP connection to Redis for every query/request would actually cost you performance.
3) Yes, if you have more than one core/thread.
Simply measure the number of Redis connections to see that no new connection is instantiated for each Rails request. The connection is established on the application server's (Unicorn, Puma, Passenger, etc.) side during the application load process.
echo info | redis-cli | grep connected_clients
Try running the bash command before and while your application is running locally.
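To watch it continuously while you exercise the app (assuming redis-cli is talking to the default local instance):
# refresh the client count every second
watch -n 1 'redis-cli info clients | grep connected_clients'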
I'm experimenting with Rails 4 ActionController::Live and Server Sent Events. I'm using MRI 2.0.0 and Puma.
From what I can see, each connected client keeps an active connection to the server. I was wondering if it is possible to leverage SSEs without keeping all response streams running.
Puma manages multiple connections using threads, and I imagine there is a limit to the number of concurrent connections.
What if I want to support a real-world scenario with thousands of clients registering to my Rails app for SSE events?
Is there any example?
Also, I usually run Rails app servers behind an nginx reverse proxy. Would it require any particular setup?
The way that SSEs are built is by the client opening a connection to the server, which is then left open until the server has some data to send. This is part of the SSE spec, and not a thing specific to ActionController::Live. It's effectively the same as long-polling, but with the connection not being closed after the first bit of data is returned, and with the mechanism built into the browser.
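For reference, here is a minimal sketch of the server side with ActionController::Live; the event name and payload are made up for illustration:
class EventsController < ApplicationController
  include ActionController::Live

  def index
    response.headers["Content-Type"] = "text/event-stream"
    sse = SSE.new(response.stream, event: "tick")
    # the stream (and the thread serving it) stays occupied for as
    # long as we keep writing
    10.times do
      sse.write(time: Time.now)
      sleep 1
    end
  ensure
    sse.close
  end
end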
As such, the only way it can be implemented is by having multiple open client connections to the webserver which sit there indefinitely. As for what resources are required to deal with them, I'm not sure, as I've not yet tried to benchmark this, but it'll need enough server capacity for Puma to keep thousands of connections open if you have that many users with a page open.
The default limit for Puma is 16 concurrent connections. Several blog posts about setting up SSEs for Rails mention upping this to a larger value, but none that I've found suggest what this higher value should be. They do mention that the number of DB connections will need to match, as each Rails thread keeps one open. Sort of sounds like an expensive way to run things.
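If you do raise it, the knobs look roughly like this; the numbers are placeholders, not recommendations, and the ActiveRecord pool should be sized to match the thread count:
# config/puma.rb (sketch)
workers 2        # forked worker processes
threads 0, 64    # min, max threads per worker; each SSE stream
                 # occupies one thread for its whole lifetime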
"Run a benchmark" is the only answer really.
I can't comment as to reverse proxying as I've not tried it, but as SSEs are done over standard HTTP, I shouldn't think it'll need any special setup.
We all hear a lot about scaling issues in Rails.
I was just curious what the actual cost of handling an HTTP request is in the Rails framework. Meaning, what has to happen for each and every request that comes in? Is there class parsing? Configuration? Database connection establishment?
That actually depends a lot on which web server you're using, and which configuration you're using, not to mention the application design itself. Configuration and design issues involved include:
Whether you're using fastcgi, old-school cgi, or some other request handling mechanism (affects whether you're going to have to rerun all of the app initialization code per request or not)
Whether you're using memcache (or an alternate caching strategy) or not (affects cost of database requests)
Whether you're using additional load balancing techniques or not
Which session persistence strategy you're using (if needed)
Whether you're using "development" mode, which causes code files to be reloaded whenever they're changed (as I recall; maybe it's just per-request)
Like most web app frameworks, there are solutions for connection pooling, caching, and process management. There are a whole bunch of ways to manage database access; the usual, default ones are not necessarily the highest performance, but it's not rocket science to adjust that strategy.
Someone who has dug into the internals more deeply can probably speak in more excruciating detail, but most apps use either FastCGI on Apache or an alternate more rails-friendly web server, which means that you only have app setup once per process.
Until the release of Phusion Passenger (aka mod_rails) the "standard" for deployment was not FastCGI but using a cluster of Mongrel servers fronted by Apache and mod_proxy (or Nginx etc).
The main issue behind "Rails doesn't scale" is the fact that there are some quite complicated threading issues, which has meant that, with the current version of Ruby and the available serving mechanisms, Rails has not been threadsafe. As a result, multiple containers have been required to run a Rails app to support high levels of concurrent requests. Passenger makes some of this moot, as it handles all of this internally, and it can also be run on a custom build of Ruby (Ruby Enterprise Edition) that changes the way memory is handled.
On top of this, the upcoming versions of both Ruby and Rails are directly addressing the threading issue and should close this argument once and for all.
As far as I am concerned the whole claim is pretty bogus. "Scale" is an architectural concern.
Here's a good high level overview of the lifecycle of a Rails request. After going through this, you can choose specific sections to profile and optimize.