Why are we out of database connections on Heroku? - ruby-on-rails

We have a Rails app on Heroku with Sidekiq and are running out of database connections.
ActiveRecord::ConnectionTimeoutError: could not obtain a database
connection within 5.000 seconds (waited 5.000 seconds)
Heroku stuff:
Database plan: Standard0 (120 connections)
Web dynos: 2 Standard-2X
Worker dynos: 1 Standard-2X
heroku config:
MAX_THREADS: 5
(DB_POOL not set)
(WEB_CONCURRENCY not set)
Procfile:
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq
database.yml:
...
production:
url: <%= ENV["DATABASE_URL"] %>
pool: <%= ENV["DB_POOL"] || ENV['MAX_THREADS'] || 5 %>
puma.rb:
# https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#adding-puma-to-your-application
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 2)
threads threads_count, threads_count
preload_app!
rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || 'development'
on_worker_boot do
# Worker specific setup for Rails 4.1+
# See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
ActiveRecord::Base.establish_connection
end
sidekiq.yml:
---
:concurrency: 25
:queues:
- [default]
We also have a couple of rake tasks that fire every 10 minutes, and they finish within a second or two.
The problem seems to happen when we do a lot of message processing in Sidekiq. We do something like:
1. get article headlines from a 3rd-party web service
2. insert each headline into the db inside a single transaction
3. create a message in Sidekiq for each headline (worker.perform_async)
4. each message is processed, hits an endpoint to get the body and updates the body (can take 0.5 - 3 seconds)
While step 4 is happening we see the connection issue.
My understanding is we are way, way, way below the connection limit with our configuration above, but did we do something incorrectly? Is something just consuming the pool? Any help would be great, thanks.
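For context, a minimal sketch of the kind of worker step 4 describes (the class, model, and attribute names here are hypothetical, not from the app): each of the 25 Sidekiq threads running a job like this checks a connection out of the pool and holds it for the whole job, including the 0.5-3 second HTTP call, which is exactly what drains a pool of 5.
require "net/http"

class BodyFetchWorker
  include Sidekiq::Worker

  def perform(headline_id)
    # The first ActiveRecord call checks a connection out of this thread's pool...
    headline = Headline.find(headline_id)
    # ...and it stays checked out while we wait on the slow HTTP endpoint.
    body = Net::HTTP.get(URI(headline.source_url))
    headline.update!(body: body)
  end
end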
Sources:
https://devcenter.heroku.com/articles/concurrency-and-database-connections
https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server
https://github.com/mperham/sidekiq/wiki/Advanced-Options

You are sharing 5 DB connections among 25 Sidekiq threads. Set DB_POOL to 25 or Sidekiq's concurrency to 5.
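One hedged way to do that, given that the database.yml above already reads ENV["DB_POOL"]: size the pool per process in the Procfile (the same pattern Heroku's concurrency article describes), so the Puma workers keep a small pool and only the Sidekiq process gets the big one. The values below are illustrative:
web: DB_POOL=$MAX_THREADS bundle exec puma -C config/puma.rb
worker: DB_POOL=25 bundle exec sidekiq
Worst case that is 2 web dynos x 2 Puma workers x 5 connections, plus 25 for the single Sidekiq process, roughly 45 of the 120 connections the Standard-0 plan allows.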

Related

Sidekiq getting ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool

I am getting an ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool exception for background jobs in Sidekiq
CONFIG
I have a PUMA web process and a SIDEKIQ process running on Heroku (2 hobby dynos) [A Rails app with background jobs]
In database.yml I have pool: 40 (in default and production)
In sidekiq.yml I have :concurrency: 7
In puma.rb I have max_threads_count = ENV.fetch("PUMA_MAX_THREADS") { 5 } and have set ENV["PUMA_MAX_THREADS"] = 5
I am using a Heroku pgsql hobby instance, which allows for 20 connections
EXPECTED BEHAVIOR
When the 7 Sidekiq workers are busy running jobs they should have enough available db connections.
Because:
Needed db connections:
5 for 5 PUMA threads
12: [7 + 5] for SIDEKIQ threads (7 workers + 5 for redis? - not sure about reasoning behind that one)
TOTAL NEEDED: 17 [12+5]
TOTAL AVAILABLE: 20
ACTUAL BEHAVIOR
When the 7 Sidekiq workers are busy running jobs, 2 jobs fail and raise the ConnectionTimeOutError (always 2 jobs, so actual max concurrency is 5)
STUFF I NOTICED (MIGHT HELP):
In the Sidekiq dashboard, Redis connections reach a maximum of 10 (never higher) [I guess 5 threads + 5]
In Heroku db, when enqueueing a lot of jobs, connections are always much lower than the 20 available (so no problem from the pgsql instance)
Any help or advice would be super appreciated :))
Thanks in advance!
UPDATE: Adding my database.yml and Procfile
default: &default
adapter: postgresql
encoding: unicode
pool: <%= ENV.fetch("DB_POOL") { 10 } %>
development:
<<: *default
database: tracker_app_development
test:
<<: *default
database: tracker_app_test
production:
url: <%= ENV['DATABASE_URL'] %>
pool: <%= ENV.fetch("DB_POOL") { 10 } %>
Procfile:
web: DB_POOL=$PUMA_MAX_THREADS bundle exec puma -C config/puma.rb
worker: DB_POOL=14 bundle exec sidekiq -C config/sidekiq.yml
release: rake db:migrate
This exception is being raised from the ActiveRecord::ConnectionAdapters::ConnectionPool::Queue class in Rails, specifically in the poll method of the class, which accepts a timeout period (defaults to 5s). This is how the error is being raised:
if elapsed >= timeout
msg = "could not obtain a connection from the pool within %0.3f seconds (waited %0.3f seconds); all pooled connections were in use" %
[timeout, elapsed]
raise ConnectionTimeoutError, msg
end
I think this is saying that if the time elapsed while trying to acquire a connection exceeds the timeout provided (default 5s), this exception is raised. It is happening because the pool size is 10, while your Sidekiq process is started with a pool of 14. Try increasing the pool size of your web dyno so it is at least as large as the pool size specified for your Sidekiq dyno. Hopefully that resolves the exception.
If this does not work, then you can try increasing the checkout_timeout from 5s to a longer duration like so:
default: &default
adapter: postgresql
encoding: unicode
pool: <%= ENV.fetch("DB_POOL") { 10 } %>
checkout_timeout: 10
development:
<<: *default
database: tracker_app_development
test:
<<: *default
database: tracker_app_test
production:
url: <%= ENV['DATABASE_URL'] %>
pool: <%= ENV.fetch("DB_POOL") { 10 } %>
This is what the API documentation for Rails has to say about ConnectionPools.
https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
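As a quick sanity check (a sketch, assuming Rails 5.1 or newer for ConnectionPool#stat), you can print the pool Rails actually built from a console or a one-off dyno; a mismatch with what you expect from database.yml points at a configuration problem:
# heroku run rails console   (or rails console locally with RAILS_ENV=production)
pool = ActiveRecord::Base.connection_pool
puts pool.size          # the configured maximum for this process
puts pool.stat.inspect  # e.g. {:size=>10, :connections=>3, :busy=>1, :dead=>0, :idle=>2, :waiting=>0, :checkout_timeout=>5}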
SOLUTION FOUND:
In my database.yml file, the production: key was indented one level when it should have been at the top level (zero indentation)...
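For reference, a correctly nested production block looks like this, with the key at column 0 and its settings indented beneath it (keys taken from the database.yml above):
production:
  url: <%= ENV['DATABASE_URL'] %>
  pool: <%= ENV.fetch("DB_POOL") { 10 } %>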

Puma::Server::UNPACK_TCP_STATE_FROM_TCP_INFO

I successfully installed a Rails app on my FreeBSD server, but when I test rails s -e production or rails s -e development I get Read: #<NameError: uninitialized constant Puma::Server::UNPACK_TCP_STATE_FROM_TCP_INFO> from the Puma server after sending a request.
Did I miss a step somewhere?
PS: I use Rails 6 with SQLite3.
config/puma.rb
# Puma can serve each request in a thread from an internal thread pool.
# The `threads` method setting takes two numbers: a minimum and maximum.
# Any libraries that use thread pools should be configured to match
# the maximum value specified for Puma. Default is set to 5 threads for minimum
# and maximum; this matches the default thread size of Active Record.
#
max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count }
threads min_threads_count, max_threads_count
# Specifies the `port` that Puma will listen on to receive requests; default is 3000.
#
port ENV.fetch("PORT") { 3000 }
# Specifies the `environment` that Puma will run in.
#
environment ENV.fetch("RAILS_ENV") { "development" }
# Specifies the `pidfile` that Puma will use.
pidfile ENV.fetch("PIDFILE") { "tmp/pids/server.pid" }
# Specifies the number of `workers` to boot in clustered mode.
# Workers are forked web server processes. If using threads and workers together
# the concurrency of the application would be max `threads` * `workers`.
# Workers do not work on JRuby or Windows (both of which do not support
# processes).
#
# workers ENV.fetch("WEB_CONCURRENCY") { 2 }
# Use the `preload_app!` method when specifying a `workers` number.
# This directive tells Puma to first boot the application and load code
# before forking the application. This takes advantage of Copy On Write
# process behavior so workers use less memory.
#
# preload_app!
# Allow puma to be restarted by `rails restart` command.
plugin :tmp_restart
I just made a monkey patch that seems to work.
(This problem happens on FreeBSD, not on Mac OS X.)
Place this content in an initializer file. For example: config/initializers/puma_missing_constant_monkey_patch.rb.
Rails.application.config.after_initialize do
if defined?(::Puma) && !Object.const_defined?('Puma::Server::UNPACK_TCP_STATE_FROM_TCP_INFO')
::Puma::Server::UNPACK_TCP_STATE_FROM_TCP_INFO = "C".freeze
end
end
It just defines the missing constant. I've got no clue whether it breaks something else; on the other hand, Puma is using a constant that isn't defined. The definition of this constant in Puma (lib/puma/server.rb) is conditional.

How to start MidiSmtpServer in Rails

I am trying to use MidiSmtpServer to receive email in a Heroku application, and have been using the code from one of the examples in the documentation. However, I don't know where to put that code for the SMTP server to start after Puma, or where to put it for it to start at all. Using on_worker_boot in puma.rb doesn't work.
puma.rb:
# Puma can serve each request in a thread from an internal thread pool.
# The `threads` method setting takes two numbers: a minimum and maximum.
# Any libraries that use thread pools should be configured to match
# the maximum value specified for Puma. Default is set to 5 threads for minimum
# and maximum; this matches the default thread size of Active Record.
#
max_threads_count = ENV.fetch("RAILS_MAX_THREADS") { 5 }
min_threads_count = ENV.fetch("RAILS_MIN_THREADS") { max_threads_count }
threads min_threads_count, max_threads_count
# Specifies the `port` that Puma will listen on to receive requests; default is 3000.
#
port ENV.fetch("PORT") { 3000 }
# Specifies the `environment` that Puma will run in.
#
environment ENV.fetch("RAILS_ENV") { "development" }
# Specifies the `pidfile` that Puma will use.
pidfile ENV.fetch("PIDFILE") { "tmp/pids/server.pid" }
# Specifies the number of `workers` to boot in clustered mode.
# Workers are forked web server processes. If using threads and workers together
# the concurrency of the application would be max `threads` * `workers`.
# Workers do not work on JRuby or Windows (both of which do not support
# processes).
#
# workers ENV.fetch("WEB_CONCURRENCY") { 2 }
require "midi-smtp-server"
require "mail"
on_worker_boot do
class MySmtpd < MidiSmtpServer::Smtpd
def on_message_data_event(ctx)
puts "[#{ctx[:envelope][:from]}] for recipient(s): [#{ctx[:envelope][:to]}]..."
# Just decode the message once to make sure it is readable
mail = Mail.read_from_string(ctx[:message][:data])
# handle incoming mail, just show the message source
puts mail.to_s
end
end
# try to gracefully shutdown on Ctrl-C
trap("INT") do
puts "Interrupted, exit now..."
exit 0
end
# Output for debug
puts "#{Time.now}: Starting MySmtpd..."
# Create a new server instance listening at localhost interfaces 127.0.0.1:2525
# and accepting a maximum of 4 simultaneous connections
server = MySmtpd.new(2525, "0.0.0.0", 4)
# setup exit code
at_exit do
# check to shutdown connection
if server # Output for debug
puts "#{Time.now}: Shutdown MySmtpd..." # stop all threads and connections gracefully
server.stop
end # Output for debug
puts "#{Time.now}: MySmtpd down!\n"
end
# Start the server
server.start
# Run on server forever
server.join
end
# Use the `preload_app!` method when specifying a `workers` number.
# This directive tells Puma to first boot the application and load code
# before forking the application. This takes advantage of Copy On Write
# process behavior so workers use less memory.
#
# preload_app!
# Allow puma to be restarted by `rails restart` command.
plugin :tmp_restart
Applications running on Heroku are containerized, and the Heroku router only forwards HTTP/HTTPS traffic to web dynos, so running an SMTP server in or alongside the web process is not possible.
Instead, look at services that provide inbound mail delivery. If you're using Rails 6, follow the documentation to set up Action Mailbox.
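A minimal sketch of the Rails 6 Action Mailbox setup the documentation describes (the mailbox names and the SendGrid ingress choice here are illustrative; the provider's inbound-parse webhook still has to be pointed at the app per the Rails guides):
# config/environments/production.rb
# config.action_mailbox.ingress = :sendgrid  # or :mailgun, :postmark, :mandrill, ...

# app/mailboxes/application_mailbox.rb
class ApplicationMailbox < ActionMailbox::Base
  routing /^support@/i => :support   # deliver matching recipients to SupportMailbox
end

# app/mailboxes/support_mailbox.rb
class SupportMailbox < ApplicationMailbox
  def process
    # `mail` is a Mail::Message parsed from the inbound email
    Rails.logger.info "Inbound mail from #{mail.from.first}: #{mail.subject}"
  end
end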

Optimising Sidekiq, Redis, Heroku and Rails

So I'm trying to process a CSV file via Sidekiq background job processing on a Heroku worker instance. While I can complete the process, I feel it could certainly be done quicker/more efficiently than I'm currently doing it. This question has two parts: first, are the database pools set up correctly, and second, how can I optimise the process?
Application environment:
Rails 4 application
Unicorn
Sidekiq
Redis-to-go (Mini plan, 50 connections max)
CarrierWave S3 implementation
Heroku Postgres (Standard Yanari, 60 connections max)
1 Heroku Web dyno
1 Heroku Worker dyno
NewRelic monitoring
config/unicorn.rb
worker_processes 3
timeout 15
preload_app true
before_fork do |server, worker|
Signal.trap 'TERM' do
puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
Process.kill 'QUIT', Process.pid
end
if defined?(ActiveRecord::Base)
ActiveRecord::Base.connection.disconnect!
end
end
after_fork do |server, worker|
Signal.trap 'TERM' do
puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
end
if defined?(ActiveRecord::Base)
config = ActiveRecord::Base.configurations[Rails.env] ||
Rails.application.config.database_configuration[Rails.env]
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
config['pool'] = ENV['DB_POOL'] || 2
ActiveRecord::Base.establish_connection(config)
end
end
config/sidekiq.yml
---
:concurrency: 5
staging:
:concurrency: 5
production:
:concurrency: 35
:queues:
- [default, 1]
- [imports, 10]
- [validators, 10]
- [send, 5]
- [clean_up_tasks, 30]
- [contact_generator, 20]
config/initializers/sidekiq.rb
ENV["REDISTOGO_URL"] ||= "redis://localhost:6379"
Sidekiq.configure_server do |config|
config.redis = { url: ENV["REDISTOGO_URL"] }
database_url = ENV['DATABASE_URL']
if database_url
ENV['DATABASE_URL'] = "#{database_url}?pool=50"
ActiveRecord::Base.establish_connection
end
end
Sidekiq.configure_client do |config|
config.redis = { url: ENV["REDISTOGO_URL"] }
end
The database connection pools are worked out as such:
I have 3 web processes (unicorn worker_processes), and to each of these I am allocating 2 ActiveRecord connections via the after_fork hook (config/unicorn.rb), for a maximum of 6 of my 60 available Postgres connections assigned to the web dyno. In the Sidekiq initialiser, I'm allocating 50 Postgres connections via the ?pool=50 param appended to ENV['DATABASE_URL'], as described (somewhere) in the docs. I'm keeping my Sidekiq concurrency value at 35 (sidekiq.yml) to ensure I stay under both the 50 Redis connection and 60 Postgres connection limits. This still needs more fine-grained tuning, but I'd rather get the data processing itself sorted before going any further with this.
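Roughly, and assuming one Postgres connection per busy thread, that tallies as:
web dyno: 3 unicorn workers x pool of 2 = up to 6 Postgres connections
worker dyno: 35 Sidekiq threads (pool capped at 50) = up to 35 Postgres connections
worst case: ~41 of the 60 Postgres connections, and roughly 35 plus a few housekeeping Redis connections of the 50 allowed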
Now, assuming the above is correct (and it wouldn't surprise me at all if it weren't) I'm handling the following scenario:
A user uploads a CSV file to be processed via their browser. This file can be anywhere between 50 rows and 10 million rows. The file is uploaded to S3 via the CarrierWave gem.
The user then configures a couple of settings for the import via the UI, the culmination of which adds a FileImporter job to the Sidekiq queue to start creating various models based on the rows.
The Import worker looks something like:
class FileImporter
include Sidekiq::Worker
sidekiq_options :queue => :imports
def perform(import_id)
import = Import.find_by_id import_id
CSV.foreach(open(import.csv_data), headers: true) do |row|
# import.csv_data is the S3 URL of the file
# here I do some validation against a prebuilt redis table
# to validate the row without making any activerecord calls
# (business logic validations rather than straight DB ones anyway)
unless invalid_record # invalid_record being the product of the previous validations
# queue another job to actually create the AR models for this row
ImportValidator.perform_async(import_id, row)
# increment some redis counters
end
end
end
This is slow. I've tried to limit the calls to ActiveRecord in the FileImporter worker, so I'm assuming the bottleneck is streaming the file from S3. It's not processing rows fast enough to build up a queue, so I'm never utilising all of my worker threads (usually somewhere between 15 and 20 of the 35 available threads are active). I've tried splitting this job up and feeding rows 100 at a time into an intermediary worker which then creates the ImportValidator jobs in a more parallel fashion, but that didn't fare much better.
So my question is, what's the best/most efficient method to accomplish a task like this?
It's possible you are at 100% CPU with 20 threads. You need another dyno.
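Separately from the dyno question (this is not something the answer above suggests), one hedged optimisation for the enqueue loop in FileImporter is pushing jobs in batches with Sidekiq::Client.push_bulk, which turns one Redis round-trip per row into one per slice. A sketch, reusing the names from the question and assuming the row data is JSON-serializable:
# inside FileImporter#perform, replacing the per-row perform_async calls
# (the per-row validation from the question is omitted here for brevity)
CSV.foreach(open(import.csv_data), headers: true).each_slice(1_000) do |batch|
  # one Redis round-trip enqueues the whole slice of ImportValidator jobs
  Sidekiq::Client.push_bulk(
    "class" => ImportValidator,
    "args"  => batch.map { |row| [import_id, row.to_h] }  # each element is one job's argument list
  )
end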

Possible to avoid ActiveRecord::ConnectionTimeoutError on Heroku?

On Heroku I have a Rails app running with a couple of web dynos as well as one worker dyno. I'm running thousands of worker tasks throughout the day on Sidekiq; however, an ActiveRecord::ConnectionTimeoutError is occasionally raised (approximately 50 times a day). I've set up my Unicorn server as follows:
worker_processes 4
timeout 30
preload_app true
before_fork do |server, worker|
# As suggested here: https://devcenter.heroku.com/articles/rails-unicorn
Signal.trap 'TERM' do
puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
Process.kill 'QUIT', Process.pid
end
if defined?(ActiveRecord::Base)
ActiveRecord::Base.connection.disconnect!
end
end
after_fork do |server,worker|
if defined?(ActiveRecord::Base)
config = Rails.application.config.database_configuration[Rails.env]
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
config['pool'] = ENV['DB_POOL'] || 10
ActiveRecord::Base.establish_connection(config)
end
Sidekiq.configure_client do |config|
config.redis = { :size => 1 }
end
Sidekiq.configure_server do |config|
config = Rails.application.config.database_configuration[Rails.env]
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
config['pool'] = ENV['DB_POOL'] || 10
ActiveRecord::Base.establish_connection(config)
end
end
On Heroku I've set the DB_POOL config variable to 2, as recommended by Heroku. Should these errors be happening at all? It seems odd that it would be impossible to avoid such errors, no? What would you suggest?
A Sidekiq server (the process running on your server that is actually performing the delayed tasks) will by default spin up 25 threads to process work off its queue. Each of these threads could be requesting a connection to your primary database through ActiveRecord if your tasks require it.
If you only have a connection pool of 5 connections, but you have 25 threads trying to connect, after 5 seconds the threads will just give up if they can't get an available connection from the pool and you'll get a connection time out error.
Setting the pool size for your Sidekiq server to something closer to your concurrency level (set with the -c flag when you start the process) will help alleviate this issue at the cost of opening many more connections to your database. If you are on Heroku and are using Postgres for example, some of their plans are limited to 20, whereas others have a connection limit of 500 (source).
If you are running a multi-process server environment like Unicorn, you also need to monitor the number of connections each forked process makes as well. If you have 4 unicorn processes, and a default connection pool size of 5, your unicorn environment at any given time could have 20 live connections. You can read more about that on Heroku's docs. Note also that the DB pool size doesn’t mean that each dyno will now have that many open connections, but only that if a new connection is needed it will be created until a maximum of that many have been created.
With that said, here is what I do.
# config/initializers/unicorn.rb
if ENV['RACK_ENV'] == 'development'
worker_processes 1
listen "#{ENV['BOXEN_SOCKET_DIR']}/rails_app"
timeout 120
else
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
timeout 29
end
# The timeout mechanism in Unicorn is an extreme solution that should be avoided whenever possible.
# It will help catch bugs in your application where and when your application forgets to use timeouts,
# but it is expensive as it kills and respawns a worker process.
# see http://unicorn.bogomips.org/Application_Timeouts.html
# Heroku recommends a timeout of 15 seconds. With a 15 second timeout, the master process will send a
# SIGKILL to the worker process if processing a request takes longer than 15 seconds. This will
# generate a H13 error code and you’ll see it in your logs. Note, this will not generate any stacktraces
# to assist in debugging. Using Rack::Timeout, we can get a stacktrace in the logs that can be used for
# future debugging, so we set that value to something less than this one
preload_app true # for new relic
before_fork do |server, worker|
Signal.trap 'TERM' do
puts 'Unicorn master intercepting TERM and sending myself QUIT instead'
Process.kill 'QUIT', Process.pid
end
if defined?(ActiveRecord::Base)
ActiveRecord::Base.connection.disconnect!
end
end
after_fork do |server, worker|
Signal.trap 'TERM' do
puts 'Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT'
end
Rails.logger.info("Done forking unicorn processes")
#https://devcenter.heroku.com/articles/concurrency-and-database-connections
if defined?(ActiveRecord::Base)
db_pool_size = if ENV["DB_POOL"]
ENV["DB_POOL"]
else
ENV["WEB_CONCURRENCY"] || 2
end
config = Rails.application.config.database_configuration[Rails.env]
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
config['pool'] = ENV['DB_POOL'] || 2
ActiveRecord::Base.establish_connection(config)
# Turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction
ActiveRecord::Base.connection.execute "update pg_settings set setting='off' where name = 'synchronous_commit';"
Rails.logger.info("Connection pool size for unicorn is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('#size')}")
end
end
And for sidekiq:
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
sidekiq_pool = ENV['SIDEKIQ_DB_POOL'] || 20
if defined?(ActiveRecord::Base)
Rails.logger.debug("Setting custom connection pool size of #{sidekiq_pool} for Sidekiq Server")
db_config = Rails.application.config.database_configuration[Rails.env]
db_config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
db_config['pool'] = sidekiq_pool
ActiveRecord::Base.establish_connection(db_config)
Rails.logger.info("Connection pool size for Sidekiq Server is now: #{ActiveRecord::Base.connection.pool.instance_variable_get('@size')}")
end
end
If all goes well, when you fire up your processes you'll see something like in your log:
Setting custom connection pool size of 10 for Sidekiq Server
Connection pool size for Sidekiq Server is now: 20
Done forking unicorn processes
(1.4ms) update pg_settings set setting='off' where name = 'synchronous_commit';
Connection pool size for unicorn is now: 2
Sources:
https://devcenter.heroku.com/articles/concurrency-and-database-connections#connection-pool
https://github.com/mperham/sidekiq/issues/503
https://github.com/mperham/sidekiq/wiki/Advanced-Options
For Sidekiq server config it is recommended to have a db_pool number the same as your concurrency, which I assume you have set to greater than 2.
Assuming that setting your db_pool in unicorn.rb is working (I've not had experience doing it this way), a potential solution is to set another environment variable to control the Sidekiq db_pool directly.
If you had a sidekiq concurrency of 20 then something like:
Config var - SIDEKIQ_DB_POOL = 20
Sidekiq.configure_server do |config|
config = Rails.application.config.database_configuration[Rails.env]
config['reaping_frequency'] = ENV['DB_REAP_FREQ'] || 10 # seconds
config['pool'] = ENV['SIDEKIQ_DB_POOL'] || 10
ActiveRecord::Base.establish_connection(config)
end
This ensures you have two separately sized pools: DB_POOL for your web workers and SIDEKIQ_DB_POOL for your background workers.
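Usage note on the sketch above: on Heroku both variables can be set in one step, e.g. heroku config:set DB_POOL=2 SIDEKIQ_DB_POOL=20, and the worst case is then roughly (web dynos x unicorn worker_processes x DB_POOL) plus (worker dynos x SIDEKIQ_DB_POOL) Postgres connections, which needs to stay under the database plan's connection limit.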
