Close Rails ActiveRecord Connection Pool - ruby-on-rails

I am using a second database with datasets within my API.
Every API request can run up to three queries against that database, so I split them across three threads. To keep this thread-safe I am using a connection pool.
But after the code has run, the ConnectionPool thread is not terminated. So every time a request is made, a new thread is left behind on the server until eventually there is no memory left.
Is there a way to close the connection pool thread? Or am I doing something wrong by creating a connection pool per request?
I set up the connection pool this way:
begin
full_db = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)
resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(full_db)
spec = resolver.spec(Rails.env.to_sym)
pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)
Then I run through the queries array and collect the result of each query:
returned_responses = []
threads = []
queries_array.each do |query|
threads << Thread.new do
pool.with_connection do |conn|
returned_responses << conn.execute(query).to_a
end
end
end
threads.map(&:join)
returned_responses
Finally I close the connections inside the connection pool:
ensure
pool.disconnect!
end

Since you want to make SQL queries directly without taking advantage of ActiveRecord as the ORM, but you do want to take advantage of ActiveRecord connection pooling, I suggest you create a new abstract class like ApplicationRecord:
# app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
self.abstract_class = true
connects_to database: {
writing: :full_datasets_database,
reading: :full_datasets_database
}
end
You'll need to configure the database full_datasets_database in database.yml so that connects_to is able to connect to it.
Then you'll be able to connect directly to that database and make direct SQL queries against it by referencing that class instead of ActiveRecord::Base:
FullDatasets.connection.execute(query)
The connection pooling will happen transparently with different pools:
FullDatasets.connection_pool.object_id
=> 22620
ActiveRecord::Base.connection_pool.object_id
=> 9000
You may have to do additional configuration, like dumping the schema to db/full_datasets_schema.rb, but any further troubleshooting or configuration you need is described in https://guides.rubyonrails.org/active_record_multiple_databases.html.
The short version of this explanation is that you should attempt to take advantage of ActiveRecord as much as possible so that your implementation is clean and straightforward while still allowing you to drop directly to raw SQL.

After some time, I ended up finding an answer. The general idea came from @anothermg, but I had to make some changes to get it working on my version of Rails (5.2).
I set up the database in config/full_datasets_database.yml.
I already had the following initializer:
#! config/initializers/db_full_datasets.rb
DB_FULL_DATASETS = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)[Rails.env]
I created the following model to create a connection to the new database:
#! app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
self.abstract_class = true
establish_connection DB_FULL_DATASETS
end
On the actual module I added the following code:
def parallel_queries(queries_array)
returned_responses = []
threads = []
conn = FullDatasets.connection_pool
queries_array.each do |query|
threads << Thread.new do
returned_responses << conn.with_connection { |c| c.execute(query).to_a }
end
end
threads.map(&:join)
returned_responses
end
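One caveat with appending to a shared array from several threads is that results arrive in completion order, not query order. Below is a minimal sketch of collecting them with Thread#value instead, which keeps them aligned with the input. FakePool and FakeConnection are hypothetical stand-ins that only mimic the with_connection / execute shape used above; they are not ActiveRecord classes.

```ruby
# Hypothetical stand-ins that only mimic the with_connection / execute shape.
class FakeConnection
  def execute(query)
    sleep(rand * 0.01) # simulate variable query latency
    ["result of #{query}"]
  end
end

class FakePool
  def with_connection
    yield FakeConnection.new
  end
end

def parallel_queries(pool, queries)
  threads = queries.map do |query|
    Thread.new { pool.with_connection { |c| c.execute(query).to_a } }
  end
  # Thread#value joins the thread and returns the value of its block,
  # so the results line up with the order of `queries`.
  threads.map(&:value)
end

results = parallel_queries(FakePool.new, ["q1", "q2", "q3"])
```

With the real pool, FullDatasets.connection_pool would presumably take the place of FakePool.new; that substitution is an assumption and is not tested here.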

Follow the official way of handling multiple databases in Rails:
https://guides.rubyonrails.org/active_record_multiple_databases.html
I can't give you an accurate answer as I do not have your source code to fully understand the whole context. If the setup that I sent above is not applicable to your use case, you might have missed some background clean up tasks. You can refer to this doc:
https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html

Related

Automatic role change for read/writes using ActiveRecord

The addition of multiple databases on Rails 6.1 is great and provides a fairly simple read-from-follower, write-to-leader setup if you use the included resolver.
That works if your app is fully RESTful and there are no mutations on the database during GETs and HEADs. On paper that's great, but there are some situations where alterations of the data can happen (consuming tokens, lazily cleaning data, setting flags, etc.).
ActiveRecord knows when a query is a write and raises an error if the target connection is a replica.
I want to automatically switch between leader and follower based on that condition. PGPool2 does that, but I'd like to avoid setting up another service if possible.
A simple solution I tried was to create a custom resolver and wrap the execution. But since ActiveRecord can detect a query that modifies data, I was wondering if there's any way to switch the connection before execution.
# frozen_string_literal: true
module Middleware
module DatabaseSelector
class AutoSwitchResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
def read(&block)
super(&block)
rescue ActiveRecord::ReadOnlyError
write(&block)
end
end
end
end
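One property of the rescue-and-retry approach is worth spelling out: when the read attempt fails, the whole block runs again from the top, so any side effect before the failing write happens twice. A minimal sketch in plain Ruby follows. ReadOnlyError and FallbackResolver are hypothetical stand-ins that mimic the read/write shape of the resolver, with no Rails involved.

```ruby
# Hypothetical stand-ins; they only mimic the resolver's read/write shape.
class ReadOnlyError < StandardError; end

class FallbackResolver
  attr_reader :log

  def initialize
    @log = []
  end

  def read(&block)
    run(:reading, &block)
  rescue ReadOnlyError
    write(&block) # the whole block runs again from the top
  end

  def write(&block)
    run(:writing, &block)
  end

  private

  # Records which role was used, then runs the block with that role.
  def run(role)
    @log << role
    yield role
  end
end

side_effects = 0
resolver = FallbackResolver.new
result = resolver.read do |role|
  side_effects += 1 # e.g. an email sent or an external API call
  raise ReadOnlyError, "write attempted on replica" if role == :reading
  :saved
end
```

Here the block is executed once per role, so the counter ends at 2. In a real request that would mean non-database side effects are repeated unless the block is idempotent.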
I also tried to hook into the ActiveRecord::ConnectionAdapters::PostgreSQLAdapter execute methods, as Marginalia does, to overwrite the target, without much luck.
EDIT:
I also tried the following. It seemed to work at first, but it ends up failing alongside the stock Rails resolver:
class ApplicationRecord < ActiveRecord::Base
  # ...
  around_save :ensure_write_database
  around_destroy :ensure_write_database

  def touch(*args)
    ensure_write_database { super(*args) }
  end

  def ensure_write_database
    ApplicationRecord.connected_to(role: :writing) { yield }
  end
end

Different database connections per thread, same model?

I would like to be able to connect to different databases in separate threads, and query the same model in each database. For instance, without threads, I can do something like:
# given 'db1' and 'db2' are rails environments with connection configurations
['db1', 'db2'].each do |db|
Post.establish_connection(db)
Post.where(title: "Foo")
end
Post.establish_connection(Rails.env)
This will loop over the two databases and look up the posts in each. I need to be able to do this in parallel using threads, like:
threads = ['db1', 'db2'].map do |db|
  Thread.new do
    Post.establish_connection(db)
    Post.where(title: "Foo")
  end
end
threads.each(&:join)
Post.establish_connection(Rails.env)
But quite clearly, establishing a new connection pool in each thread using the global Post class isn't threadsafe.
What I'd like to do is establish a new connection pool in each thread. I got this far:
threads = ['db1', 'db2'].map do |db|
  Thread.new do
    conf = ActiveRecord::ConnectionAdapters::ConnectionSpecification.new(Rails.configuration.database_configuration[db], "mysql2_connection")
    pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(conf)
    pool.with_connection do |con|
      # problem is, I have this con object but using the Post class still uses the thread's default connection.
      Post.where(title: "Foo")
    end
  end
end
threads.each(&:join)
There has to be a way for me to change the connection pool that ActiveRecord uses, on a thread by thread basis?
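The general shape of "pick a pool per thread" can be sketched in plain Ruby: build the pools once, then let each thread select one through a thread-local key. PoolRegistry and FakePool are hypothetical names for this illustration. This is not how ActiveRecord resolves connections internally, only a sketch of the routing idea.

```ruby
# Hypothetical stand-ins; they illustrate routing by a thread-local key only.
FakePool = Struct.new(:name)

class PoolRegistry
  def initialize(names)
    # Pools are built once, up front, and afterwards only read by threads.
    @pools = names.to_h { |name| [name, FakePool.new(name)] }.freeze
  end

  # Runs the block with `name` selected for the current thread only
  # (Thread.current[] is strictly fiber-local), restoring the previous
  # selection afterwards.
  def using(name)
    previous = Thread.current[:db_name]
    Thread.current[:db_name] = name
    yield current
  ensure
    Thread.current[:db_name] = previous
  end

  # The pool selected by the calling thread.
  def current
    @pools.fetch(Thread.current[:db_name])
  end
end

registry = PoolRegistry.new(%w[db1 db2])
seen = %w[db1 db2].map do |db|
  Thread.new { registry.using(db) { |pool| sleep 0.01; pool.name } }
end.map(&:value)
```

Because the selection lives in Thread.current rather than on the model class, two threads can use different pools at the same time without overwriting each other's choice.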

Rails: switch connection on each request but keep a connection pool

In our Rails application we need to use different databases depending on the subdomain of the request (different DB per country).
Right now we're doing something similar to what's recommended in this question. That is, calling ActiveRecord::Base.establish_connection on each request.
But it seems ActiveRecord::Base.establish_connection drops the current connection pool and establishes a new connection each time it's called.
I made this quick benchmark to see if there was any significant difference between calling establish_connection each time and having the connections already established:
require 'benchmark/ips'
$config = Rails.configuration.database_configuration[Rails.env]
$db1_config = $config.dup.update('database' => 'db1')
$db2_config = $config.dup.update('database' => 'db2')
# Method 1: call establish_connection on each "request".
Benchmark.ips do |r|
r.report('establish_connection:') do
# Simulate two requests, one for each DB.
ActiveRecord::Base.establish_connection($db1_config)
MyModel.count # A little query to force the DB connection to establish.
ActiveRecord::Base.establish_connection($db2_config)
MyModel.count
end
end
# Method 2: Have different subclasses of my models, one for each DB, and
# call establish_connection only once
class MyModelDb1 < MyModel
establish_connection($db1_config)
end
class MyModelDb2 < MyModel
establish_connection($db2_config)
end
Benchmark.ips do |r|
r.report('different models:') do
MyModelDb1.count
MyModelDb2.count
end
end
I ran this script with rails runner, pointing at a local MySQL with a couple of thousand records in each DB. The results seem to indicate that there is in fact a pretty big difference (an order of magnitude) between the two methods (BTW, I'm not sure whether the benchmark is valid or whether I screwed up and the results are misleading):
Calculating -------------------------------------
establish_connection: 8 i/100ms
-------------------------------------------------
establish_connection: 117.9 (±26.3%) i/s - 544 in 5.001575s
Calculating -------------------------------------
different models: 119 i/100ms
-------------------------------------------------
different models: 1299.4 (±22.1%) i/s - 6188 in 5.039483s
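As a quick check of the "order of magnitude" reading, the ratio can be computed directly from the reported iterations per second, along with a pessimistic bound that uses the reported error margins. Only the four figures from the output above are used; the variable names are made up for this sketch.

```ruby
# Figures copied from the benchmark output above (iterations per second).
establish_ips = 117.9
subclass_ips  = 1299.4

speedup = subclass_ips / establish_ips
puts format("speedup: %.1fx", speedup)

# Worst case within the reported error bars (±22.1% and ±26.3%):
# the fast method at its low end against the slow method at its high end.
pessimistic = (subclass_ips * (1 - 0.221)) / (establish_ips * (1 + 0.263))
puts format("pessimistic speedup: %.1fx", pessimistic)
```

The plain ratio comes out at about 11x, and even the pessimistic bound stays several times faster, so the gap is unlikely to be measurement noise alone.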
So, basically, I'd like to know if there's a way to maintain a connection pool for each subdomain and then re-use those connections instead of establishing a new connection on each request. Having a subclass of each model for each subdomain is not feasible, as there are many models; I just want to change the connection for all the models (in ActiveRecord::Base).
Well, I've been digging into this a bit more and managed to get something working.
After reading tenderlove's post about connection management in ActiveRecord, which explains how the class hierarchy gets unnecessarily coupled with connection management, I understood why what I'm trying to do is not as straightforward as one would expect.
What I ended up doing was subclassing ActiveRecord's ConnectionHandler and using that new connection handler at the top of my model hierarchy. Some fiddling with the ConnectionHandler code was needed to understand how it works internally, so of course this solution could be very tied to the Rails version I'm using (3.2). Something like:
# A model class that connects to a different DB depending on the subdomain
# we're in
class ModelBase < ActiveRecord::Base
self.abstract_class = true
self.connection_handler = CustomConnectionHandler.new
end
# ...
class CustomConnectionHandler < ActiveRecord::ConnectionAdapters::ConnectionHandler
def initialize
super
@pools_by_subdomain = {}
end
# Override the behaviour of ActiveRecord's ConnectionHandler to return a
# connection pool for the current domain.
def retrieve_connection_pool(klass)
# Get current subdomain somehow (Maybe store it in a class variable on
# each request or whatever)
subdomain = @@subdomain
@pools_by_subdomain[subdomain] ||= create_pool(subdomain)
end
private
def create_pool(subdomain)
conf = Rails.configuration.database_configuration[Rails.env].dup
# The name of the DB for that subdomain...
conf.update('database' => "db_#{subdomain}")
resolver = ActiveRecord::Base::ConnectionSpecification::Resolver.new(conf, nil)
# Call ConnectionHandler#establish_connection, which receives a key
# (in this case the subdomain) for the new connection pool
establish_connection(subdomain, resolver.spec)
end
end
This still needs some testing to check if there is in fact a performance gain, but my initial tests running on a local Unicorn server suggest there is.
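Two details of the handler above matter under a threaded server: a class variable holding the subdomain is shared by every thread, and `||=` on a shared Hash can build the same pool twice if two threads race. Below is a minimal sketch of the memoization part only, with the pool reduced to a plain Hash. SubdomainPools is a hypothetical name and nothing here touches ActiveRecord.

```ruby
# Hypothetical stand-in: lazy, thread-safe memoization of one "pool"
# per subdomain. The pool itself is reduced to a plain Hash.
class SubdomainPools
  def initialize(&builder)
    @builder = builder
    @pools = {}
    @lock = Mutex.new
  end

  # Returns the pool for `subdomain`, building it at most once even when
  # several threads ask for the same subdomain at the same time.
  def fetch(subdomain)
    @lock.synchronize { @pools[subdomain] ||= @builder.call(subdomain) }
  end
end

builds = [] # only appended to inside the lock, via the builder
pools = SubdomainPools.new do |subdomain|
  builds << subdomain
  { database: "db_#{subdomain}" }
end

threads = 8.times.map { |i| Thread.new { pools.fetch(i.even? ? "us" : "fr") } }
results = threads.map(&:value)
```

Passing the subdomain explicitly to fetch, instead of reading it from shared state, is the design choice that keeps concurrent requests for different subdomains from interfering with each other.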
As far as I know, Rails does not maintain its database pool between requests unless you use a multi-threaded environment like Sidekiq. But if you use Passenger or Unicorn on your production server, it will create a new database connection for each Rails instance.
So basically, using a database connection pool is useless in that setup, which means that creating a new database connection on each request should not be a concern.

Active Record and multi threading involving multiple dbs

I have a Rails app that is multi homed.
foo.mysite.com talks to the "foo" db.
bar.mysite.com talks to the "bar" db.
This is accomplished by calling:
ActiveRecord::Base.connection_handler.establish_connection("ActiveRecord::Base", foo_spec)
When requests come in for foo it uses the foo_spec, when requests come in for bar it uses the bar_spec.
Everything is happy and there is peace in the world.
However,
I also use sidekiq, it is heavily multi-threaded.
I was getting weird behavior in sidekiq. Often when I thought I was talking to the foo_db, ActiveRecord::Base.connection was pointed at bar_db.
I dug into the code and found:
def retrieve_connection_pool(klass)
pool = @class_to_pool[klass.name]
return pool if pool
return nil if ActiveRecord::Base == klass
retrieve_connection_pool klass.superclass
end
Turns out the internal design of AR only allows AR::Base to know about a single connection pool.
Is there any way to get thread 1 to talk to db1, and thread 2 to talk to a db2 at the same time, using ActiveRecord::Base.connection ?
I would recommend using Postgres and separate schemas rather than separate databases entirely; that way you can share pools.
Usage would look like: select * from foo.users, select * from bar.users
And you would pass the schema to your background worker as an argument.
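If the schema name travels as a job argument and ends up interpolated into SQL, it is worth validating it against a known list first. Here is a minimal sketch; ALLOWED_SCHEMAS, qualified_table and users_query are hypothetical names, and the foo/bar schemas are taken from the example above.

```ruby
# Hypothetical names for this sketch; foo and bar come from the example above.
ALLOWED_SCHEMAS = %w[foo bar].freeze

# Builds a schema-qualified, double-quoted table name, rejecting any schema
# that is not on the allow-list so the value is safe to interpolate into SQL.
def qualified_table(schema, table)
  raise ArgumentError, "unknown schema: #{schema.inspect}" unless ALLOWED_SCHEMAS.include?(schema)
  raise ArgumentError, "bad table name: #{table.inspect}" unless table.match?(/\A[a-z_][a-z0-9_]*\z/)

  %("#{schema}"."#{table}")
end

# A background job would receive the schema as a plain string argument.
def users_query(schema)
  "SELECT * FROM #{qualified_table(schema, 'users')}"
end

puts users_query("foo")
```

An unknown schema raises ArgumentError before any SQL is built, so a malformed job argument fails loudly instead of querying the wrong tenant.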

ActiveRecord - working within one connection

For example, suppose there is the following code in Rails 3.2.3:
def test_action
a = User.find_by_id(params[:user_id])
# some calculations.....
b = Reporst.find_by_name(params[:report_name])
# some calculations.....
c = Places.find_by_name(params[:place_name])
end
This code makes 3 requests to the database and opens 3 different connections. Most likely it's going to be quite a long action.
Is there any way to open only one connection and make all 3 requests within it? Or can I control which connection is used myself?
You would want to wrap the calls in a transaction:
Transactions are protective blocks where SQL statements are only
permanent if they can all succeed as one atomic action. The classic
example is a transfer between two accounts where you can only have a
deposit if the withdrawal succeeded and vice versa. Transactions
enforce the integrity of the database and guard the data against
program errors or database break-downs. So basically you should use
transaction blocks whenever you have a number of statements that must
be executed together or not at all.
def test_action
User.transaction do
a = User.find_by_id(params[:user_id])
# some calculations.....
b = Reporst.find_by_name(params[:report_name])
# some calculations.....
c = Places.find_by_name(params[:place_name])
end
end
Even though they invoke different models, the queries run inside a single transaction on one database connection. It is all or nothing, though: if one statement fails in the middle, the entire transaction is rolled back.
Though the transaction class method is called on some Active Record
class, the objects within the transaction block need not all be
instances of that class. This is because transactions are per-database
connection, not per-model.
You can take a look at the ActiveRecord::ConnectionAdapters::ConnectionPool documentation.
Also, AR doesn't open a connection for each model or query; it reuses the existing connection.
[7] pry(main)> [Advertiser.connection,Agent.connection,ActiveRecord::Base.connection].map(&:object_id)
=> [70224441876100, 70224441876100, 70224441876100]
