I have a Ruby on Rails application deployed with Heroku; it has one Standard-1X web dyno (512MB) and one Standard-1X worker dyno (512MB). I use Puma and Redis (with RedisToGo) and Sidekiq for background jobs.
I regularly check the metrics for traffic and memory usage, etc. in the Heroku dashboard. I'm a little confused because my app seems to be using a high amount of memory considering its activity.
Every day it really only gets a few visits from a couple users and a few visits from web crawlers. Despite this low amount of traffic, my web dyno memory usage is pretty high. On most days, the graph looks like this for the web dyno memory usage:
As you can see, it sits at around 256MB all day (the dip on the left is a daily dyno restart).
On the other hand, the worker dyno's average memory usage total is about 112 MB.
Is there an explanation for these relatively high memory usages? Or are these simply typical for a deployed application? I've looked at other answers on StackOverflow and it doesn't look like a memory leak.
In case it helps, here is my Procfile and my Puma and Redis initializers.
Procfile
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -c 5 -v
Puma.rb
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['RAILS_MAX_THREADS'] || 5)
threads threads_count, threads_count
preload_app!
rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'
on_worker_boot do
  ActiveRecord::Base.establish_connection
end
Redis.rb
uri = ENV["REDISTOGO_URL"] || "redis://localhost:6379/"
REDIS = Redis.new(:url => uri)
Thanks in advance for any tips.
I am using Nginx with Phusion Passenger with a single-threaded Rails application. Here's the catch: within that application, I am using multi-threaded Sidekiq to perform some background jobs. Typically in my database.yml, I would only need to set the pool value to 1. Here's an example:
default: &default
  adapter: mysql2
  encoding: utf8
  collation: utf8_unicode_ci
  pool: 1
  username: username
  password: password
  host: localhost
The reason is that, for each TCP socket connection opened, when an HTTP request comes in through that socket, Nginx takes the request and passes it to Passenger. Passenger detects that it's a Rails app and spawns a Rails instance, which generates the HTML response; that response is sent back to Nginx and then passed back to the client (browser). So for each Passenger instance, I only need one database connection with a single-threaded Rails app.
But in my sidekiq.yml, I have set concurrency to 5:
:concurrency: 5
This means for each passenger rack instance, I will have 5 concurrent threads handled by sidekiq plus the one connection for the main app, that is a total of 6 database connections for one passenger instance.
When I look at passenger-status, I notice that max_pool_size is set to 6:
----------- General information -----------
Max pool size : 6
So does that mean Passenger will never spawn more than 6 Rails instances concurrently? And if that's the case, is my math correct: 6 (instances) * 6 (database connections: 5 for Sidekiq and 1 for the main app) = 36 (total database connections my Rails app could use concurrently)?
Right now my mysql database is configured to handle 151 max concurrent connections.
SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 151 |
+-----------------+-------+
I just want to make sure my math is correct regarding passenger, rails and sidekiq.
First of all, your Sidekiq processes and your web server (in your case, Passenger) are separate. Passenger's thread pool size has no effect on your Sidekiq concurrency; instead, your Sidekiq configuration specifies a separate concurrency. So, we'll consider the two separately:
Passenger
The ActiveRecord database pool value is the number of database connections that each web process will use, in total across all of its threads. If your Passenger server is set up in multi-process mode, then the maximum number of connections from your web processes is db pool size * passenger pool size. On the other hand, if you set it up in multi-threaded mode (which I'd recommend if possible), your maximum is just the db pool size, multiplied by however many processes are running. A Puma server running two processes with up to fifteen threads each (and a pool to match), for example, would use at most 30 connections.
So, if you're using multi-threaded mode, a pool size of 1 is absolutely not sufficient -- you'll want at least as big a pool as you expect to have threads. In multi-process mode, 1 might work but I doubt it's really worth straying from the default of 5, until you encounter issues.
Sidekiq
Sidekiq always runs in multi-threaded mode (you can technically run multiple processes as well, but I'll assume you aren't). So, like above, you want your connection pool to be at least as big as the number of threads. This might mean that you technically need two different values for your db pool value depending on whether the Rails env is spinning up for Passenger, or for Sidekiq -- see this issue on the Sidekiq repo or this helpful Heroku guide for more information on how to address that.
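One way to get two different pool values is to read the pool size from an environment variable inside database.yml, which Rails evaluates as ERB before parsing it as YAML. The sketch below reproduces that rendering step using only the standard library. The variable name DB_POOL and the fallback of 5 are assumptions chosen for illustration, not something the question's setup defines.

```ruby
require "erb"
require "yaml"

# Hypothetical database.yml fragment: the pool size comes from DB_POOL
# (an assumed variable name) and falls back to 5 when it is unset.
DATABASE_YML = <<~CONFIG
  production:
    adapter: mysql2
    pool: <%= ENV.fetch("DB_POOL", 5) %>
CONFIG

# Render the ERB tags first, then parse the resulting YAML.
def render_database_config(template)
  YAML.safe_load(ERB.new(template).result)
end

config = render_database_config(DATABASE_YML)
puts "pool size: #{config.dig('production', 'pool')}"
```

With something like this in place, the Sidekiq process could be started with DB_POOL set to its concurrency while the web processes keep a smaller value.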
In summary
Don't forget that, aside from all the above, you may easily have multiple servers all running the same Rails app, but only one database with one connection limit. If you're running Passenger in multi-process mode with a max of 6 processes and set your db pool size to 5, then each web server node will use up to 30 connections. If a node is also running a Sidekiq process, add 5 to that. You will probably not need more than one Sidekiq server, so 4 web nodes * 30 connections + 1 Sidekiq process * 5 connections = 125 maximum connections, well within your MySQL connection limit.
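The arithmetic in this summary can be written as a small helper, which makes it easy to re-check the budget when a node or process count changes. This is only a sketch: the method and parameter names are made up for illustration, and the numbers are the ones used in the example above.

```ruby
# Upper bound on database connections for one deployment.
# All names here are illustrative; the numbers come from the example above.
def max_db_connections(web_nodes:, processes_per_node:, web_pool:,
                       sidekiq_processes:, sidekiq_pool:)
  web = web_nodes * processes_per_node * web_pool
  sidekiq = sidekiq_processes * sidekiq_pool
  web + sidekiq
end

total = max_db_connections(
  web_nodes: 4, processes_per_node: 6, web_pool: 5,
  sidekiq_processes: 1, sidekiq_pool: 5
)
mysql_limit = 151 # max_connections reported by MySQL in the question

puts "#{total} of #{mysql_limit} connections in the worst case"
# prints "125 of 151 connections in the worst case"
```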
I reviewed the Passenger documentation again, and while the answer above answers the question, I want to add a little more detail:
HTTP client via TCP sends a request to Nginx
Phusion Passenger loaded into Nginx checks if request should be handled by Passenger. If so, request is sent to Passenger Core.
Passenger core, using load balancing rules, determines which process a request should be forwarded to.
Passenger core also takes care of application spawning: if it determines that having more application processes is necessary or beneficial, then it will make that happen subject to user-configured limits: the core will never spawn more processes than a user-configured maximum.
Passenger core also has monitoring and statistics: passenger-memory-stats and passenger-status
Passenger core restarts an application process if it crashes.
UstRouter sits idle and does not consume resources if you did not configure it to send data to Union Station, a monitoring web service.
Watchdog monitors Passenger Core and UstRouter. If either of them crashes, it is restarted by the Watchdog.
passenger-memory-stats will verify the three aforementioned processes as well as the spawned rack apps:
------ Passenger processes ------
PID VMSize Private Name
---------------------------------
18355 419.1 MB ? Passenger watchdog
18358 1096.5 MB ? Passenger core
18363 427.2 MB ? Passenger ust-router
18700 818.9 MB 256.2 MB Passenger RubyApp: myapp_rack_rails
24783 686.9 MB 180.2 MB Passenger RubyApp: myapp_rack_rails
passenger-status shows that the max_pool_size is 6. That is, at most there will be 6 rack apps spawned by Passenger Core:
----------- General information -----------
Max pool size : 6
App groups : 2
Processes : 3
As stated in another answer, the ActiveRecord database pool value is the number of database connections that your web process will use, in total across all threads.
Since I am using the free Passenger server, which runs in multi-process mode, the maximum number of connections from my web processes is db pool size * passenger pool size. With a Passenger pool size of 6 and a db pool size of 1, that is 6 * 1 = 6 maximum database connections.
Sidekiq always runs in multi-threaded mode.
If someone wants to use sidekiq they must configure the number of threads they want to run on or use the default (25). If they are using a database (likely) then to not hit a connection timeout error they will need to have at least as many connections in their database pool as sidekiq threads. Currently they must configure these two values in two different places, database pool in database.yml for ActiveRecord, and sidekiq connections either via command line or the sidekiq yml file. This is a problem as it is difficult to remember when you are modifying one value that you need to modify both.
Each Sidekiq worker (thread) requires 1 connection to the database. Postgresql can have at most a few hundred connections. This is a bottleneck for scalability.
Since I need about 1 thousand workers and Postgresql isn't required (I can pass all the data that I need through Redis and remove the SQL) I am wondering if it's possible to start the Rails environment without connections to Postgresql.
How can I start Sidekiq workers without Postgresql?
Note that I still need Postgresql for the normal web app/backend so I cannot remove ActiveRecord altogether from the Rails app.
If a thread doesn't use the database, it won't take a connection. This assumption is false:
Each Sidekiq worker (thread) requires 1 connection to the database.
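To illustrate lazy checkout, here is a toy pool in plain Ruby. It is not ActiveRecord's implementation, and the class name LazyPool is made up; it only shows that when connections are handed out on demand, threads that never ask for one never hold one.

```ruby
# Toy pool that hands out a connection only when a thread asks for one.
# Not ActiveRecord's implementation, just an illustration of lazy checkout.
class LazyPool
  attr_reader :peak_in_use

  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << "conn-#{i}" }
    @lock = Mutex.new
    @in_use = 0
    @peak_in_use = 0
  end

  def with_connection
    conn = @free.pop # blocks until a connection is free
    @lock.synchronize do
      @in_use += 1
      @peak_in_use = [@peak_in_use, @in_use].max
    end
    begin
      yield conn
    ensure
      @lock.synchronize { @in_use -= 1 }
      @free << conn
    end
  end
end

pool = LazyPool.new(5)
threads = 20.times.map do |i|
  Thread.new do
    if i < 2
      pool.with_connection { |_conn| sleep 0.05 } # job that touches the database
    else
      sleep 0.05 # job that only talks to Redis, for example
    end
  end
end
threads.each(&:join)

puts "threads: #{threads.size}, peak connections in use: #{pool.peak_in_use}"
```

Twenty threads run, but the peak number of connections in use never exceeds the two threads that actually touched the pool.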
I have a custom Ruby on Rails setup using Puma as the web server (behind Nginx, via a socket).
The database I am connecting to is an RDS medium instance (so a limit of 296 connections). My Puma setup is threads 1:32 and 4 workers, with a connection pool of 128.
I have a high load of 300 requests/sec, and roughly every 1000 requests a longer calculation is made that takes about 3 seconds (getting all the events, making some calculations and updating them).
I am getting the error
ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds (waited 5.016 seconds)
But if I look at the RDS database, only 43 connections are open. Memory usage is about 2000 MB out of 7000 MB (both CPU cores are at 100%). Why do I get a connection timeout when not all of my connections are open? And, of course, is the Puma configuration OK?
Thanks for your help!
EDIT:
In my puma.rb I have:
on_worker_boot do
  ActiveRecord::Base.connection_pool.disconnect!
  ActiveSupport.on_load(:active_record) do
    config = Rails.application.config.database_configuration[Rails.env]
    config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
    config['pool'] = ENV['DB_POOL'] || 128
    ActiveRecord::Base.establish_connection
  end
end
As mentioned in the Rails configuration guide's section on database pooling, when all the DB connections are exhausted, ActiveRecord will wait for one to free up. I assume the number you increased is the HTTP connection limit, not the DB connection limit.
You could edit your database.yml and increase your connection limit to, say, 296, which is the limit of the RDS instance:
production:
  adapter: mysql2
  database: /path/to/sock
  pool: 296
  # username, password, etc
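One thing worth checking alongside this: the pool setting applies to each process separately, so with several Puma workers the worst case is the worker count multiplied by the pool size. The sketch below runs that check with the numbers from the question (4 workers, up to 32 threads each, a pool of 128, an RDS limit of 296). The method names are hypothetical, and it assumes each thread uses at most one connection at a time.

```ruby
# Worst-case connections a Puma cluster could open: every worker process
# has its own pool, so the per-process pool is multiplied by the worker count.
def puma_connection_ceiling(workers:, pool:)
  workers * pool
end

# Assuming a thread holds at most one connection at a time, a pool larger
# than the thread count per worker adds nothing.
def useful_pool_size(max_threads:, pool:)
  [max_threads, pool].min
end

rds_limit = 296
ceiling = puma_connection_ceiling(workers: 4, pool: 128)
usable = puma_connection_ceiling(
  workers: 4,
  pool: useful_pool_size(max_threads: 32, pool: 128)
)

puts "configured ceiling: #{ceiling} (RDS limit #{rds_limit})"
puts "connections the threads can actually use: #{usable}"
```

Under these assumptions the configured ceiling (512) is above the RDS limit, while the threads themselves can use at most 128 connections.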
I am using Sidekiq with Rails 3. Sidekiq runs 25 threads by default. I would like to increase the thread limit, which I have done by changing sidekiq.yml.
So, what is the relation between the pool value in database.yml and the number of Sidekiq threads? What is the maximum value of the MySQL pool? Does it depend on server memory?
sidekiq.yml
:verbose: true
:concurrency: 50
:pool: 50
:queues:
- [queue_primary, 7]
- [default, 5]
- [queue_secondary, 3]
database.yml
production:
  adapter: mysql2
  encoding: utf8
  reconnect: false
  database: db_name
  pool: 50
  username: root
  password: root
  socket: /var/run/mysqld/mysqld.sock
Each Sidekiq job executes in one of up to 50 threads with your configuration. Inside the job, any time an ActiveRecord model needs to access the database, it uses a database connection from the pool of available connections shared by all ActiveRecord models in this process. The connection pool lets a thread take a connection or blocks until a free connection is available.
If you have fewer connections available in your ActiveRecord database connection pool than running Sidekiq jobs/threads, jobs will block waiting for a connection and may time out (after about 5 seconds) and fail.
This is why it's important that you have as many available database connections as threads in your sidekiq worker process.
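The blocking and timeout behaviour can be reproduced with a toy pool in plain Ruby. This is a simplified stand-in, not ActiveRecord's pool: the class name TinyPool, the helper count_timeouts, and the short timings are all made up so the example runs quickly.

```ruby
require "timeout"

# Simplified stand-in for a connection pool with a checkout timeout.
class TinyPool
  class ConnectionTimeoutError < StandardError; end

  def initialize(size, checkout_timeout:)
    @free = Queue.new
    size.times { |i| @free << "conn-#{i}" }
    @checkout_timeout = checkout_timeout
  end

  def with_connection
    conn = checkout
    begin
      yield conn
    ensure
      @free << conn # always return the connection to the pool
    end
  end

  private

  # Wait for a free connection, giving up after the checkout timeout.
  def checkout
    Timeout.timeout(@checkout_timeout) { @free.pop }
  rescue Timeout::Error
    raise ConnectionTimeoutError,
          "could not obtain a connection within #{@checkout_timeout} seconds"
  end
end

# Run job_threads jobs that each hold a connection longer than the timeout,
# and count how many fail to get one.
def count_timeouts(pool_size:, job_threads:)
  pool = TinyPool.new(pool_size, checkout_timeout: 0.1)
  threads = job_threads.times.map do
    Thread.new do
      pool.with_connection { |_conn| sleep 0.3 }
      :ok
    rescue TinyPool::ConnectionTimeoutError
      :timeout
    end
  end
  threads.map(&:value).count(:timeout)
end

puts "pool 2, 3 threads: #{count_timeouts(pool_size: 2, job_threads: 3)} timed out"
puts "pool 3, 3 threads: #{count_timeouts(pool_size: 3, job_threads: 3)} timed out"
```

With a pool smaller than the number of job threads, the extra thread waits and then fails; with a pool that matches the thread count, every job gets a connection.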
Unicorn is a single-threaded, multi-process server - so you shouldn't need more than one connection for each Unicorn back-end worker process.
However, the database can only handle so many connections (depending on OS, hardware, and configuration limits) so you need to make sure that you are distributing your database connections where they are needed and not exceeding your maximum.
For example, if your database is limited to 1000 connections, you could only run 20 sidekiq processes with 50 threads each and nothing else.