Rails: switch connection on each request but keep a connection pool - ruby-on-rails

In our Rails application we need to use different databases depending on the subdomain of the request (different DB per country).
Right now we're doing something similar to what's recommended in this question. That is, calling ActiveRecord::Base.establish_connection on each request.
But it seems ActiveRecord::Base.establish_connection drops the current connection pool and establishes a new connection each time it's called.
I made this quick benchmark to see if there was any significant difference between calling establish_connection each time and having the connections already established:
require 'benchmark/ips'

$config = Rails.configuration.database_configuration[Rails.env]
$db1_config = $config.dup.update('database' => 'db1')
$db2_config = $config.dup.update('database' => 'db2')

# Method 1: call establish_connection on each "request".
Benchmark.ips do |r|
  r.report('establish_connection:') do
    # Simulate two requests, one for each DB.
    ActiveRecord::Base.establish_connection($db1_config)
    MyModel.count # A small query to force the DB connection to be established.
    ActiveRecord::Base.establish_connection($db2_config)
    MyModel.count
  end
end
# Method 2: have different subclasses of my models, one for each DB, and
# call establish_connection only once.
class MyModelDb1 < MyModel
  establish_connection($db1_config)
end

class MyModelDb2 < MyModel
  establish_connection($db2_config)
end

Benchmark.ips do |r|
  r.report('different models:') do
    MyModelDb1.count
    MyModelDb2.count
  end
end
I ran this script with rails runner, pointing it at a local MySQL instance with a couple thousand records in the DBs, and the results seem to indicate that there in fact is a pretty big difference (of an order of magnitude) between the two methods (BTW, I'm not sure whether the benchmark is valid or I screwed something up, in which case the results would be misleading):
Calculating -------------------------------------
establish_connection: 8 i/100ms
-------------------------------------------------
establish_connection: 117.9 (±26.3%) i/s - 544 in 5.001575s
Calculating -------------------------------------
different models: 119 i/100ms
-------------------------------------------------
different models: 1299.4 (±22.1%) i/s - 6188 in 5.039483s
So, basically, I'd like to know if there's a way to maintain a connection pool for each subdomain and then re-use those connections instead of establishing a new connection on each request. Having a subclass of my models for each subdomain is not feasible, as there are many models; I just want to change the connection for all the models (in ActiveRecord::Base).

Well, I've been digging into this a bit more and managed to get something working.
After reading tenderlove's post about connection management in ActiveRecord, which explains how the class hierarchy gets unnecessarily coupled with the connection management, I understood why doing what I'm trying to do is not as straightforward as one would expect.
What I ended up doing was subclassing ActiveRecord's ConnectionHandler and using that new connection handler at the top of my model hierarchy (some fiddling with the ConnectionHandler code was needed to understand how it works internally, so this solution could be very tied to the Rails version I'm using (3.2)). Something like:
# A model class that connects to a different DB depending on the subdomain
# we're in.
class ModelBase < ActiveRecord::Base
  self.abstract_class = true
  self.connection_handler = CustomConnectionHandler.new
end

# ...

class CustomConnectionHandler < ActiveRecord::ConnectionAdapters::ConnectionHandler
  def initialize
    super
    @pools_by_subdomain = {}
  end

  # Override the behaviour of ActiveRecord's ConnectionHandler to return a
  # connection pool for the current subdomain.
  def retrieve_connection_pool(klass)
    # Get the current subdomain somehow (maybe store it in a class variable
    # on each request or whatever).
    subdomain = @@subdomain
    @pools_by_subdomain[subdomain] ||= create_pool(subdomain)
  end

  private

  def create_pool(subdomain)
    conf = Rails.configuration.database_configuration[Rails.env].dup
    # The name of the DB for that subdomain...
    conf.update('database' => "db_#{subdomain}")
    resolver = ActiveRecord::Base::ConnectionSpecification::Resolver.new(conf, nil)
    # Call ConnectionHandler#establish_connection, which receives a key
    # (in this case the subdomain) for the new connection pool.
    establish_connection(subdomain, resolver.spec)
  end
end
This still needs some testing to check if there is in fact a performance gain, but my initial tests running on a local Unicorn server suggest there is.
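The core idea, building one pool per subdomain lazily and memoizing it, can be sketched in plain Ruby, independent of Rails. `FakePool` and `PoolRegistry` are made-up stand-ins for ActiveRecord's ConnectionPool and the custom handler above; the Mutex guards lazy creation when two requests for a new subdomain race:

```ruby
require 'thread'

# Stand-in for ActiveRecord::ConnectionAdapters::ConnectionPool.
FakePool = Struct.new(:database)

class PoolRegistry
  def initialize
    @pools = {}
    @mutex = Mutex.new # guard lazy creation against concurrent requests
  end

  # Return the pool for a subdomain, creating it on first access only.
  def pool_for(subdomain)
    @mutex.synchronize do
      @pools[subdomain] ||= FakePool.new("db_#{subdomain}")
    end
  end
end

registry = PoolRegistry.new
uk = registry.pool_for('uk')
# Repeated lookups reuse the same pool object instead of reconnecting.
same = registry.pool_for('uk').equal?(uk)
```

The registry is the piece establish_connection lacks here: lookups after the first return the cached pool instead of tearing down and rebuilding connections.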

As far as I know, Rails does not maintain its database pool between requests, unless you use a multi-threaded environment like Sidekiq. If you use Passenger or Unicorn on your production server, it will create a new database connection for each Rails instance.
So basically using a database connection pool is useless, which means that creating a new database connection on each request should not be a concern.


Close Rails ActiveRecord Connection Pool

I am using a second database with datasets within my API.
Every API request can have up to 3 queries on that database, so I am splitting them into three threads. To keep it thread-safe I am using a connection pool.
But after the whole code is run, the ConnectionPool thread is not terminated. So basically every time a request is made, we get a new thread on the server, until eventually there is no memory left.
Is there a way to close the connection pool thread? Or am I doing something wrong by creating a connection pool per request?
I setup the Connection Pool this way:
begin
  full_db = YAML::load(ERB.new(File.read(Rails.root.join("config", "full_datasets_database.yml"))).result)
  resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(full_db)
  spec = resolver.spec(Rails.env.to_sym)
  pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)
Then I run through the queries array and collect the results of each query:
  returned_responses = []
  threads = []
  queries_array.each do |query|
    threads << Thread.new do
      pool.with_connection do |conn|
        returned_responses << conn.execute(query).to_a
      end
    end
  end
  threads.map(&:join)
  returned_responses
Finally I close the connections inside the connection pool:
ensure
  pool.disconnect!
end
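For intuition, the checkout/checkin discipline that `with_connection` enforces (and that `disconnect!` relies on) can be sketched with a tiny generic pool built on Ruby's thread-safe `Queue`. `TinyPool` and its string "connections" are invented for the demo and are not real DB handles:

```ruby
require 'thread'

class TinyPool
  def initialize(size)
    @available = Queue.new # thread-safe FIFO of idle "connections"
    size.times { |i| @available << "conn-#{i}" }
  end

  # Check a connection out, yield it, and always check it back in,
  # even if the block raises.
  def with_connection
    conn = @available.pop # blocks if every connection is in use
    yield conn
  ensure
    @available << conn
  end

  def idle_count
    @available.size
  end
end

pool = TinyPool.new(2)
results = Queue.new
threads = 4.times.map do |i|
  Thread.new { pool.with_connection { |c| results << "#{c}:#{i}" } }
end
threads.each(&:join)
```

Four jobs share two connections; the `ensure` is what guarantees every connection returns to the pool, which is why all of them are idle again after the joins.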
Since you want to make SQL queries directly without taking advantage of ActiveRecord as the ORM, but you do want to take advantage of ActiveRecord connection pooling, I suggest you create a new abstract class like ApplicationRecord:
# app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
self.abstract_class = true
connects_to database: {
writing: :full_datasets_database,
reading: :full_datasets_database
}
end
You'll need to configure the database full_datasets_database in database.yml so that connects_to is able to connect to it.
Then you'll be able to connect directly to that database and make direct SQL queries against it by referencing that class instead of ActiveRecord::Base:
FullDatasets.connection.execute(query)
The connection pooling will happen transparently with different pools:
FullDatasets.connection_pool.object_id
=> 22620
ActiveRecord::Base.connection_pool.object_id
=> 9000
You may have to do additional configuration, like dumping the schema to db/full_datasets_schema.rb, but any additional troubleshooting or configuration will be described in https://guides.rubyonrails.org/active_record_multiple_databases.html.
The short version of this explanation is that you should attempt to take advantage of ActiveRecord as much as possible so that your implementation is clean and straightforward while still allowing you to drop directly to raw SQL.
After some time spent, I ended up finding an answer. The generic idea came from @anothermg, but I had to make some changes in order for it to work in my version of Rails (5.2).
I setup the database in config/full_datasets_database.yml
I had the following initializer already:
#! config/initializers/db_full_datasets.rb
DB_FULL_DATASETS = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)[Rails.env]
I created the following model to create a connection to the new database:
#! app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
  self.abstract_class = true

  establish_connection DB_FULL_DATASETS
end
On the actual module I added the following code:
def parallel_queries(queries_array)
  returned_responses = []
  threads = []
  conn = FullDatasets.connection_pool
  queries_array.each do |query|
    threads << Thread.new do
      returned_responses << conn.with_connection { |c| c.execute(query).to_a }
    end
  end
  threads.map(&:join)
  returned_responses
end
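One caveat about the snippet above: pushing into a shared Array from several threads is safe on MRI thanks to the GVL, but is not guaranteed on other Ruby implementations. A `Queue`, which is thread-safe by contract, sidesteps the issue. Here is a plain-Ruby sketch where the hypothetical `run_query` lambda stands in for `conn.execute(query).to_a`:

```ruby
require 'thread'

# Placeholder for `conn.execute(query).to_a`: returns the query upcased.
run_query = ->(query) { query.upcase }

queries = ['select 1', 'select 2', 'select 3']
responses = Queue.new # Queue#<< is thread-safe by contract

threads = queries.map do |query|
  Thread.new { responses << run_query.call(query) }
end
threads.each(&:join)

# After the joins, drain the queue into an ordinary array
# (single-threaded at this point, so `empty?` is reliable).
returned_responses = []
returned_responses << responses.pop until responses.empty?
```

Note that the completion order of threads is not deterministic, so the responses may arrive in any order; if order matters, collect each result into a slot indexed by its query instead.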
Follow the official way of handling multiple databases in Rails:
https://guides.rubyonrails.org/active_record_multiple_databases.html
I can't give you an accurate answer, as I do not have your source code to fully understand the whole context. If the setup I described above is not applicable to your use case, you might have missed some background clean-up tasks. You can refer to this doc:
https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html

Automatic role change for read/writes using ActiveRecord

The addition of multiple databases in Rails 6.1 is great and provides a fairly simple way to read from a follower and write on the leader if you use the included resolver.
That works if your app is fully RESTful and there are no mutations on the database during GETs and HEADs. On paper that's great, but there are some situations where alterations of the data can happen (see consuming tokens, lazily cleaning data, setting flags, etc.).
ActiveRecord knows when a query is a write and raises an error if the target connection is a replica.
I want to automatically switch between leader and follower based on that condition. PGPool2 does that, but I'd like to avoid setting up another service if possible.
A simple solution I tried was to create a custom resolver and wrap the execution, but since ActiveRecord can detect a query that modifies the data, I was wondering if there's any way to switch the connection before execution.
# frozen_string_literal: true

module Middleware
  module DatabaseSelector
    class AutoSwitchResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
      def read(&block)
        super(&block)
      rescue ActiveRecord::ReadOnlyError
        write(&block)
      end
    end
  end
end
I also tried to hook up to ActiveRecord::ConnectionAdapters::PostgreSQLAdapter execute methods as Marginalia does to try to overwrite the target without much luck.
EDIT:
I also tried the following, and it seemed to work, but it ends up failing alongside the stock Rails resolver:
class ApplicationRecord < ActiveRecord::Base
  # ...
  around_save :ensure_write_database
  around_destroy :ensure_write_database

  def touch(*args)
    ensure_write_database { super(*args) }
  end

  def ensure_write_database
    ApplicationRecord.connected_to(role: :writing) { yield }
  end
end

ActiveRecord - working within one connection

For example, suppose there is the code in Rails 3.2.3
def test_action
  a = User.find_by_id(params[:user_id])
  # some calculations.....
  b = Reporst.find_by_name(params[:report_name])
  # some calculations.....
  c = Places.find_by_name(params[:place_name])
end
This code makes 3 requests to the database and opens 3 different connections. Most likely it's going to be a quite long action.
Is there any way to open only one connection and make the 3 requests within it? Or I'd like to control which connection is used myself.
You would want to bracket the calls with transaction:
Transactions are protective blocks where SQL statements are only
permanent if they can all succeed as one atomic action. The classic
example is a transfer between two accounts where you can only have a
deposit if the withdrawal succeeded and vice versa. Transactions
enforce the integrity of the database and guard the data against
program errors or database break-downs. So basically you should use
transaction blocks whenever you have a number of statements that must
be executed together or not at all.
def test_action
  User.transaction do
    a = User.find_by_id(params[:user_id])
    # some calculations.....
    b = Reporst.find_by_name(params[:report_name])
    # some calculations.....
    c = Places.find_by_name(params[:place_name])
  end
end
Even though they invoke different models, the queries are executed on the same connection within one transaction. It is all or nothing, though: if one fails in the middle, the entire block is rolled back.
Though the transaction class method is called on some Active Record
class, the objects within the transaction block need not all be
instances of that class. This is because transactions are per-database
connection, not per-model.
You can take a look at ActiveRecord::ConnectionAdapters::ConnectionPool documentation
Also, AR doesn't open a connection for each model/query; it reuses the existing connection.
[7] pry(main)> [Advertiser.connection,Agent.connection,ActiveRecord::Base.connection].map(&:object_id)
=> [70224441876100, 70224441876100, 70224441876100]

Rails 3.2 how to protect shared resource access across sessions

I have a shared resource that can only be used by one session at a time, how do I signal to other sessions that the resource is currently in use?
In Java or C I would use a mutex semaphore to coordinate between threads, how can I accomplish that in Rails? Do I define a new environment variable and use it to coordinate between sessions?
A little code snippet along with the answer would be very helpful.
Since your Rails instances can run in different processes when using Nginx or Apache (no shared memory like with threads), I guess the only solution is using file locks:
lock = File.new("/lock/file")
begin
  lock.flock(File::LOCK_EX)
  # do your logic here, or share information in your lock file
ensure
  lock.flock(File::LOCK_UN)
end
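The behaviour of `flock` can be checked in a single script: locks attach to each open file handle, so a second handle on the same file (as another process would hold) is refused a non-blocking exclusive lock while the first handle keeps it. The temp file below is just for the demo:

```ruby
require 'tempfile'

tmp = Tempfile.new('demo-lock') # temp file stands in for "/lock/file"

holder = File.open(tmp.path)
holder.flock(File::LOCK_EX) # take the exclusive lock

# A second handle cannot take the lock while it is held; with LOCK_NB,
# flock returns false instead of blocking.
contender = File.open(tmp.path)
busy = contender.flock(File::LOCK_EX | File::LOCK_NB)

holder.flock(File::LOCK_UN) # release
free = contender.flock(File::LOCK_EX | File::LOCK_NB) # 0 on success
```

Without `LOCK_NB`, the second `flock` call would simply block until the holder releases, which is usually what you want in the request path.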
I would consider using Redis for locking the resource.
https://redis.io/topics/distlock
https://github.com/antirez/redlock-rb
This has the advantage of working across multiple servers and not limiting the lock time to live to the lifetime of the current HTTP request.
Ruby has a Mutex class that might do what you want, though it won't work across processes. I apologize though that I don't know enough to give you an example code snippet. Here's what the documentation says: "Mutex implements a simple semaphore that can be used to coordinate access to shared data from multiple concurrent threads."
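For completeness, a minimal sketch of that Mutex in action, serializing increments of a shared counter across threads (the counter and thread counts here are arbitrary):

```ruby
require 'thread'

counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # synchronize ensures only one thread runs this
      # read-modify-write at a time.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)
```

On non-MRI rubies, the unsynchronized version of this loop can lose increments; with the Mutex the final count is always exact. But as noted, this only coordinates threads within one process, not separate Rails worker processes.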
You can do this with acts_as_lockable_by gem.
Imagine the shared resource is a Patient ActiveRecord class that can only be accessed by a single user (you can replace this with session_id) as follows:
class Patient < ApplicationRecord
  acts_as_lockable_by :id, ttl: 30.seconds
end
Then you can do this in your controller:
class PatientsController < ApplicationController
  def edit
    if patient.lock(current_user.id)
      # It will be locked for 30 seconds for the current user.
      # You will need to renew the lock by calling /patients/:id/renew_lock
    else
      # Could not lock the patient record, which means it is already
      # locked by another user.
    end
  end

  def renew_lock
    if patient.renew_lock(current_user.id)
      # Lock renewed; return 200.
    else
      # Could not renew the lock; it might already be released.
    end
  end

  private

  def patient
    @patient ||= Patient.find(params[:id])
  end
end
This is a solution that works with minimal code and across a cluster of RoR machines/servers, not just locally on one server (like file locks), as the gem uses Redis as a lock/semaphore broker. The lock, unlock and renew_lock methods are all atomic and thread-safe ;)

Ruby and Rails: Metaprogramming variables to become class methods

I'm creating a model called Configuration and I have the following code and I want to make it more dynamic by using metaprogramming.
In a table on the database for Configuration model I have the following data.
---------------------------------------------------------
variable_name (string) | value (text)
-----------------------+---------------------------------
company_name           | MyCompany
welcome_text           | Welcome to MyCompany's App!
email_order_text       | You've just created an account with MyCompany.
year_since             | 2012
---------------------------------------------------------
class Configuration < ActiveRecord::Base
  # nothing here yet
end
----------------------------------------------------------
Currently, the only way to access the company_name is to do the following in rails console:
configuration_company_name = Configuration.find_by_variable_name("company_name")
configuration_company_name.company_name
> "MyCompany"
I think this is an unacceptable way to do things. First, it will access the database every time someone checks the company's name. It would be better if I could load it when the app starts and never hit the database again, because it's in memory. How can I do something more dynamic, so I could access the value "MyCompany" like this?
Configuration.company_name
> "MyCompany"
The reason for doing this is to allow fast customization of the application.
class Configuration < ActiveRecord::Base
  # Loads all the configuration variables into an in-memory
  # hash during the first access.
  def self.[](name)
    @config ||= {}.tap { |h| Configuration.all.each { |c| h[c.variable_name] = c.value } }
    @config[name]
  end
end
Now you can access your configuration as:
Configuration["company_name"]
If you have a large number of configuration parameters, it might be beneficial to pre-load the cache by accessing a configuration parameter in an initializer file. If you have thousands of configuration variables, you might have to consider migrating the cache to memcached etc.
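The memoization pattern in that answer can be exercised without a database. In this sketch the hypothetical `SettingsCache` class replaces the Configuration model, `ROWS` plays the role of `Configuration.all`, and `db_hits` counts how often the "database" is actually read:

```ruby
class SettingsCache
  # Stand-in for the rows Configuration.all would return.
  ROWS = [
    { 'variable_name' => 'company_name', 'value' => 'MyCompany' },
    { 'variable_name' => 'year_since',   'value' => '2012' }
  ]

  @db_hits = 0 # class-level ivar counting "database" reads

  class << self
    attr_reader :db_hits

    # First access builds the in-memory hash; later accesses reuse it.
    def [](name)
      @config ||= {}.tap do |h|
        @db_hits += 1
        ROWS.each { |row| h[row['variable_name']] = row['value'] }
      end
      @config[name]
    end
  end
end
```

However many keys are looked up, the backing data is read exactly once, which is the whole point of the `@config ||=` memoization.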
If you want to access the configuration parameter as a class method:
class Configuration < ActiveRecord::Base
  klass = class << self; self; end
  Configuration.all.each { |c| klass.send(:define_method, c.variable_name) { c.value } }
end
Now you can access the parameter as follows:
Configuration.company_name
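The same `class << self` / `define_method` trick can be tried outside Rails; `SettingsMeta` and its `ROWS` hash are made up to mirror the table at the top of the question:

```ruby
class SettingsMeta
  # Stand-in for the Configuration rows in the database.
  ROWS = {
    'company_name' => 'MyCompany',
    'year_since'   => '2012'
  }

  # `class << self; self; end` yields the singleton class, so
  # define_method here creates class methods, one per row.
  singleton = class << self; self; end
  ROWS.each do |name, value|
    singleton.send(:define_method, name) { value }
  end
end
```

Each block captures its own `name`/`value` pair, so `SettingsMeta.company_name` returns "MyCompany" without any further lookup.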
One thing you are getting wrong here: it should never be Configuration.company_name; that's like accessing a class property instead of an object/instance property.
It should be an instance of the Configuration class. It would still be somewhat acceptable to use @KandagaBoggu's method in the other answer, but that still hits the database almost every time, or is served from Active Record's query cache. And AR's query cache only lives for the duration of a particular action (i.e. request). You may want to use something like Memcached so the objects survive longer.
We can move these constant values into a YAML file, load them into a variable when the server starts, and access it whenever needed.
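That YAML suggestion can be sketched end to end; the file contents and the `SETTINGS` constant are invented for the example, and a temp file stands in for something like config/settings.yml:

```ruby
require 'yaml'
require 'tempfile'

# Write a sample settings file; in a real app this would live under config/.
tmp = Tempfile.new(['settings', '.yml'])
tmp.write(<<~YAML)
  company_name: MyCompany
  year_since: 2012
YAML
tmp.close

# Load once at boot, then read from memory instead of the database.
SETTINGS = YAML.load_file(tmp.path)
```

In a Rails app the load would typically go in an initializer, so every subsequent `SETTINGS['company_name']` is a plain hash lookup with no I/O.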
