The addition of multiple databases in Rails 6.1 is great and, if you use the included resolver, provides a fairly simple read-from-replica / write-to-primary setup.
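For context, this is the stock wiring from the Rails guides (the two-second delay is the documented default):

# config/application.rb (or an environment file) -- standard Rails 6.1 setup
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session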
That works if your app is fully RESTful and there are no database mutations during GETs and HEADs. On paper that's great, but in practice there are situations where data does get altered (consuming tokens, lazily cleaning up data, setting flags, etc.).
ActiveRecord knows when a query is a write and raises an error if the target connection is a replica.
I want to switch automatically between leader and follower based on that condition. PGPool2 does that, but I'd like to avoid setting up another service if possible.
A simple solution I tried was to create a custom resolver and wrap the execution, but since ActiveRecord can already detect a query that modifies data, I was wondering whether there's any way to switch the connection before execution:
# frozen_string_literal: true

module Middleware
  module DatabaseSelector
    class AutoSwitchResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
      def read(&block)
        super(&block)
      rescue ActiveRecord::ReadOnlyError
        write(&block)
      end
    end
  end
end
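I wire it up by pointing the middleware at my class instead of the stock resolver:

# config/application.rb -- same selector setup as above, but using my resolver
config.active_record.database_resolver = Middleware::DatabaseSelector::AutoSwitchResolver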
I also tried hooking into the ActiveRecord::ConnectionAdapters::PostgreSQLAdapter execute methods, as Marginalia does, to try to override the target connection, without much luck.
EDIT:
I also tried the following, and it seemed to work, but it ends up failing when combined with the stock Rails resolver:
class ApplicationRecord < ActiveRecord::Base
  # ...
  around_save :ensure_write_database
  around_destroy :ensure_write_database

  def touch(*args)
    ensure_write_database { super(*args) }
  end

  def ensure_write_database
    ApplicationRecord.connected_to(role: :writing) { yield }
  end
end
Related
I am using a second database with datasets within my API.
Every API request can issue up to 3 queries against that database, so I am splitting them across three threads. To keep it thread-safe, I am using a connection pool.
But after the code has run, the connection pool's thread is not terminated, so every time a request is made a new thread is left on the server, until eventually there is no memory left.
Is there a way to close the connection pool's thread? Or am I wrong to create a connection pool per request?
I set up the connection pool this way:
begin
  full_db = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)
  resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(full_db)
  spec = resolver.spec(Rails.env.to_sym)
  pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)
Then I iterate over the queries array and collect the results of each query:
  returned_responses = []
  threads = [] # collects the worker threads so they can be joined below

  queries_array.each do |query|
    threads << Thread.new do
      pool.with_connection do |conn|
        returned_responses << conn.execute(query).to_a
      end
    end
  end

  threads.map(&:join)
  returned_responses
Finally I close the connections inside the connection pool:
ensure
  pool.disconnect!
end
Since you want to make SQL queries directly without taking advantage of ActiveRecord as the ORM, but you do want to take advantage of ActiveRecord connection pooling, I suggest you create a new abstract class like ApplicationRecord:
# app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :full_datasets_database,
    reading: :full_datasets_database
  }
end
You'll need to configure the database full_datasets_database in database.yml so that connects_to is able to connect to it.
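A minimal entry might look something like this (the adapter and database names here are placeholders; the key just has to match what connects_to references):

# config/database.yml (hypothetical entry)
production:
  primary:
    adapter: postgresql
    database: my_app_production
  full_datasets_database:
    adapter: postgresql
    database: full_datasets_production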
Then you'll be able to connect directly to that database and make direct SQL queries against it by referencing that class instead of ActiveRecord::Base:
FullDatasets.connection.execute(query)
The connection pooling will happen transparently with different pools:
FullDatasets.connection_pool.object_id
=> 22620
ActiveRecord::Base.connection_pool.object_id
=> 9000
You may have to do additional configuration, like dumping the schema to db/full_datasets_schema.rb, but any additional troubleshooting or configuration is described in https://guides.rubyonrails.org/active_record_multiple_databases.html.
The short version of this explanation is that you should attempt to take advantage of ActiveRecord as much as possible so that your implementation is clean and straightforward while still allowing you to drop directly to raw SQL.
After some time, I ended up finding an answer. The general idea came from #anothermg, but I had to make some changes for it to work in my version of Rails (5.2).
I set up the database in config/full_datasets_database.yml
I had the following initializer already:
#! config/initializers/db_full_datasets.rb
DB_FULL_DATASETS = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)[Rails.env]
I created the following model to create a connection to the new database:
#! app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
  self.abstract_class = true

  establish_connection DB_FULL_DATASETS
end
In the actual module I added the following code:
def parallel_queries(queries_array)
  returned_responses = []
  threads = []
  conn = FullDatasets.connection_pool

  queries_array.each do |query|
    threads << Thread.new do
      returned_responses << conn.with_connection { |c| c.execute(query).to_a }
    end
  end

  threads.map(&:join)
  returned_responses
end
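It then gets called with an array of raw SQL strings (these queries are just placeholders):

responses = parallel_queries([
  "SELECT COUNT(*) FROM datasets",
  "SELECT * FROM datasets LIMIT 10"
])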
Follow the official way of handling multiple databases in Rails:
https://guides.rubyonrails.org/active_record_multiple_databases.html
I can't give you an accurate answer, as I do not have your source code to fully understand the whole context. If the setup above is not applicable to your use case, you might have missed some background clean-up tasks. You can refer to this doc:
https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
One of our use cases involves publishing ActiveRecord models over DRb. It looks like when we do this we inadvertently leave connections checked out, and as a result we're getting ActiveRecord timeouts.
I think this is because of this comment in the active record code:
http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
Specifically
"Simply use ActiveRecord::Core#connection as with Active Record 2.1
and earlier (pre-connection-pooling). Eventually, when you're done
with the connection(s) and wish it to be returned to the pool, you
call ActiveRecord::Base.clear_active_connections!. This will be the
default behavior for Active Record when used in conjunction with
Action Pack's request handling cycle."
When we're accessing our models over DRb we're not going through the request cycle, so the connection never gets checked back in.
The same document suggests we need to check these connections back in manually - what I need is a way to hook into all methods on a published model and call "ActiveRecord::Base.clear_active_connections!" afterwards.
class Foo < ActiveRecord::Base
  # I need this method to be called after every method on this class!
  def close_connections
    ActiveRecord::Base.clear_active_connections!
  end
end
Closing the connections manually isn't really an option because there are tens of thousands of lines of code and I'd need to go and add "close the connection" after every single one!
You could add this snippet at the end of your class definition:
(instance_methods - Class.new.methods).each do |method|
  define_method "#{method}_with_close_connections" do |*args, &block|
    result = send("#{method}_without_close_connections", *args, &block)
    ActiveRecord::Base.clear_active_connections!
    result # preserve the original return value
  end

  alias_method_chain method, :close_connections
end
Highly non-recommended, however. You should probably find another solution.
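If you do go this route, a Module#prepend wrapper gives you the same after-hook with less monkey-patching; a rough sketch (untested, and it only wraps methods defined directly on the class):

module CloseConnectionsWrapper
  # Builds an anonymous module that wraps every instance method defined
  # directly on klass and clears active connections after each call.
  def self.for(klass)
    Module.new do
      klass.instance_methods(false).each do |name|
        define_method(name) do |*args, &block|
          begin
            super(*args, &block)
          ensure
            ActiveRecord::Base.clear_active_connections!
          end
        end
      end
    end
  end
end

class Foo < ActiveRecord::Base
  # ... your methods ...

  # prepend at the end of the class body, after the methods are defined
  prepend CloseConnectionsWrapper.for(self)
end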
One potential solution is to use Observers - http://api.rubyonrails.org/v3.2.13/classes/ActiveRecord/Observer.html
You will, however, need an observer for each one of your Models.
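Something like this, using the Rails 3.x observer API (the model name and callback here are only illustrative):

# app/models/foo_observer.rb
class FooObserver < ActiveRecord::Observer
  def after_save(record)
    ActiveRecord::Base.clear_active_connections!
  end
end

# config/application.rb
# config.active_record.observers = :foo_observer

Note that observers only fire on lifecycle callbacks (saves, destroys and so on), not on arbitrary reads, so this only covers part of the problem.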
Before going down this path though, I would thoroughly evaluate your implementation and find a better way of accessing the connection pool.
In our Rails application we need to use different databases depending on the subdomain of the request (different DB per country).
Right now we're doing something similar to what's recommended in this question. That is, calling ActiveRecord::Base.establish_connection on each request.
But it seems ActiveRecord::Base.establish_connection drops the current connection pool and establishes a new connection each time it's called.
I made this quick benchmark to see if there was any significant difference between calling establish_connection each time and having the connections already established:
require 'benchmark/ips'

$config = Rails.configuration.database_configuration[Rails.env]
$db1_config = $config.dup.update('database' => 'db1')
$db2_config = $config.dup.update('database' => 'db2')

# Method 1: call establish_connection on each "request".
Benchmark.ips do |r|
  r.report('establish_connection:') do
    # Simulate two requests, one for each DB.
    ActiveRecord::Base.establish_connection($db1_config)
    MyModel.count # A little query to force the DB connection to establish.

    ActiveRecord::Base.establish_connection($db2_config)
    MyModel.count
  end
end

# Method 2: have different subclasses of my models, one for each DB, and
# call establish_connection only once.
class MyModelDb1 < MyModel
  establish_connection($db1_config)
end

class MyModelDb2 < MyModel
  establish_connection($db2_config)
end

Benchmark.ips do |r|
  r.report('different models:') do
    MyModelDb1.count
    MyModelDb2.count
  end
end
I ran this script with rails runner, pointing to a local MySQL with a couple of thousand records in each DB, and the results seem to indicate that there is in fact a pretty big difference (an order of magnitude) between the two methods (BTW, I'm not sure whether the benchmark is valid or I screwed up and the results are misleading):
Calculating -------------------------------------
establish_connection: 8 i/100ms
-------------------------------------------------
establish_connection: 117.9 (±26.3%) i/s - 544 in 5.001575s
Calculating -------------------------------------
different models: 119 i/100ms
-------------------------------------------------
different models: 1299.4 (±22.1%) i/s - 6188 in 5.039483s
So, basically, I'd like to know whether there's a way to maintain a connection pool for each subdomain and re-use those connections instead of establishing a new connection on each request. Having a subclass of each of my models for each subdomain is not feasible, as there are many models; I just want to change the connection for all models (in ActiveRecord::Base).
Well, I've been digging into this a bit more and managed to get something working.
After reading tenderlove's post about connection management in ActiveRecord, which explains how the class hierarchy gets unnecessarily coupled with connection management, I understood why doing what I'm trying to do is not as straightforward as one would expect.
What I ended up doing was subclassing ActiveRecord's ConnectionHandler and using that new connection handler at the top of my model hierarchy (some fiddling with the ConnectionHandler code was needed to understand how it works internally, so this solution could be very tied to the Rails version I'm using, 3.2). Something like:
# A model class that connects to a different DB depending on the subdomain
# we're in
class ModelBase < ActiveRecord::Base
  self.abstract_class = true
  self.connection_handler = CustomConnectionHandler.new
end

# ...

class CustomConnectionHandler < ActiveRecord::ConnectionAdapters::ConnectionHandler
  def initialize
    super
    @pools_by_subdomain = {}
  end

  # Override the behaviour of ActiveRecord's ConnectionHandler to return a
  # connection pool for the current subdomain.
  def retrieve_connection_pool(klass)
    # Get the current subdomain somehow (maybe store it in a class variable
    # on each request or whatever)
    subdomain = @@subdomain
    @pools_by_subdomain[subdomain] ||= create_pool(subdomain)
  end

  private

  def create_pool(subdomain)
    conf = Rails.configuration.database_configuration[Rails.env].dup
    # The name of the DB for that subdomain...
    conf.update('database' => "db_#{subdomain}")
    resolver = ActiveRecord::Base::ConnectionSpecification::Resolver.new(conf, nil)
    # Call ConnectionHandler#establish_connection, which receives a key
    # (in this case the subdomain) for the new connection pool
    establish_connection(subdomain, resolver.spec)
  end
end
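The handler above assumes @@subdomain gets populated on every request; one hypothetical way to wire that up (assuming CustomConnectionHandler exposes a subdomain writer, e.g. via cattr_accessor :subdomain) is a Rails 3.2 controller filter:

class ApplicationController < ActionController::Base
  before_filter :set_connection_subdomain # before_action in newer Rails

  private

  # Hypothetical writer: stores the request's subdomain so that
  # retrieve_connection_pool can pick the right pool.
  def set_connection_subdomain
    CustomConnectionHandler.subdomain = request.subdomain
  end
end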
This still needs some testing to check if there is in fact a performance gain, but my initial tests running on a local Unicorn server suggest there is.
As far as I know, Rails does not maintain its database pool between requests, except in a multi-threaded environment like Sidekiq. But if you use Passenger or Unicorn on your production server, it will create a new database connection for each Rails instance.
So basically using a database connection pool is useless, which means that creating a new database connection on each request should not be a concern.
I have a shared resource that can only be used by one session at a time; how do I signal to other sessions that the resource is currently in use?
In Java or C I would use a mutex/semaphore to coordinate between threads; how can I accomplish that in Rails? Do I define a new environment variable and use it to coordinate between sessions?
A little code snippet along with the answer would be very helpful.
Since your Rails instances can run in different processes when using Nginx or Apache (no shared memory, unlike threads), I guess the only solution is using file locks:
lock = File.new("/lock/file")
begin
  lock.flock(File::LOCK_EX)
  # do your logic here, or share information in your lock file
ensure
  lock.flock(File::LOCK_UN)
end
I would consider using Redis for locking the resource.
https://redis.io/topics/distlock
https://github.com/antirez/redlock-rb
This has the advantage of working across multiple servers and of not tying the lock's time-to-live to the lifetime of the current HTTP request.
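A rough sketch with the redlock-rb gem (the Redis URL, resource key and TTL are placeholders):

require 'redlock'

lock_manager = Redlock::Client.new(["redis://localhost:6379"])

# Try to hold "shared_resource" for up to 10 seconds
lock_manager.lock("shared_resource", 10_000) do |locked|
  if locked
    # we own the lock; use the resource here
  else
    # someone else owns it; back off or retry
  end
end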
Ruby has a Mutex class that might do what you want, though it won't work across processes. Here's what the documentation says: "Mutex implements a simple semaphore that can be used to coordinate access to shared data from multiple concurrent threads."
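In its simplest form that looks like the sketch below; note it only coordinates threads inside a single process, not separate app server workers:

require 'thread'

SEMAPHORE = Mutex.new

SEMAPHORE.synchronize do
  # only one thread at a time gets here
end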
You can do this with the acts_as_lockable_by gem.
Imagine the shared resource is a Patient ActiveRecord class that can only be accessed by a single user (you can replace this with session_id) as follows:
class Patient < ApplicationRecord
  acts_as_lockable_by :id, ttl: 30.seconds
end
Then you can do this in your controller:
class PatientsController < ApplicationController
  def edit
    if patient.lock(current_user.id)
      # It will be locked for 30 seconds for the current user
      # You will need to renew the lock by calling /patients/:id/renew_lock
    else
      # Could not lock the patient record which means it is already locked by another user
    end
  end

  def renew_lock
    if patient.renew_lock(current_user.id)
      # lock renewed, return 200
    else
      # could not renew the lock, it might be already released
    end
  end

  private

  def patient
    @patient ||= Patient.find(params[:id])
  end
end
This solution works with minimal code and across a cluster of RoR machines/servers, not just locally on one server (as file locks do), since the gem uses Redis as the lock/semaphore broker. The lock, unlock and renew_lock methods are all atomic and thread-safe ;)
When my system requires two classes or modules of the same name, what can I do to specify which I mean?
I'm using Rails (new to it), and one of my models is named "Thread". When I try to refer to the class "Thread" in thread_controller.rb, the system returns some other constant of the same name.
<thread.rb>

class Thread < ActiveRecord::Base
  def self.some_class_method
  end
end
<thread_controller.rb>

class ThreadController < ApplicationController
  def index
    require '../models/thread.rb'
    @threads = Thread.find :all
  end
end
When I try Thread.find(), I get an error saying that Thread has no method named find. When I access Thread.methods, I don't find my some_class_method method among them.
Any help? (And don't bother posting "just name your model something else." It's not helpful to point out obvious compromises.)
You could put your app into its own namespace.
<my_app/thread.rb>

module MyApp
  class Thread
  end
end
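You can then disambiguate explicitly wherever both constants are in scope:

MyApp::Thread  # your class
::Thread       # Ruby's built-in Thread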
No really, name your model something else.
Thread is an existing core class in Ruby, and overriding that constant is only going to get you into trouble. I compromised for my application and called it Topic instead.
If you absolutely must overwrite an existing constant, you can do something like this:
# use Object to make sure Thread is overwritten globally
# use `send` because `remove_const` is a private method of Object
# Can use OldThread to access already existing Thread
OldThread = Object.send(:remove_const, :Thread)

# define whatever you want here
class MyNewThread
  ...
end

# Now Thread is the same as MyNewThread
Object.send(:const_set, :Thread, MyNewThread)
Obviously anything that relied on the pre-existing Thread would be busted unless you did some kind of monkey-patching.
Just because this kind of thing can be done, doesn't mean it should be. But in certain circumstances it can be handy, for example in tests you can override a remote data source with your own 'dumb' object.