Use other database connection and execute query - ruby-on-rails

In our app, we need to switch to read replica database and read from it for some read-only APIs.
We decided to use the around_action filter for that:
Switch DB to read_replica before the action
Yield
Switching back to master.
We decided to use establish_connection for switching, which did the job but later we noticed that it's not thread-safe i.e it causes our other threads to face "#<ActiveRecord::ConnectionNotEstablished: No connection pool with 'primary' found.>" issue. So this solution would have worked in the case of single-threaded servers.
Later we tried to create a new connection pool, as below which is thread-safe:
databases = Rails.configuration.database_configuration
resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(databases)
spec = resolver.spec(:read_replica)
pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)
pool.with_connection { |conn|
execute SQL query here.
}
The only problem with the above approach is, we can only execute queries using execute method like conn.execute(sql_query) any AR ORM query we execute inside this with_connection block run on the original DB and not read_replica.
Seems like ActiveRecord do have its default connection and it's using it when we run AR ORM queries.
Not sure how can we execute the AR ORM query inside the with_connection block as User.where(id: 1..10).
Please note:
I am aware that we can do this natively in rails 6, need to skip that for now.
I am also aware of the Octopus gem, again need to skip on that.
Appreciate any help, Thanks.

Related

Rails DB Connection Pool Hydration

I'm working on a Rails 7 app with some pretty tight response time SLAs. I am well within SLA during normal runtime. Where I fall painfully short is first request. I've added an initializer that will load up ActiveRecord and make sure all of my DB models are loaded. It hydrates some various memory caches, etc. This took me pretty far. My first response time was reduced about 60%. However, I've been trying to figure out a couple things are are still slowing down first response time.
First API request does a check to see if I need to do a rails migration. I've not figured out how to move this check to init.
First API request appears to be be using a fresh DB Pool.. not the one that was used in init phase. I've tried fully hydrating the pool to spare the API from creating them when Rails kicks on, but I've not figured it out.
In an initializer I can do something like this:
connections = []
ActiveRecord::Base.connection.pool.size.times do
connections << ActiveRecord::Base.connection.pool.checkout
end
connections.each { ActiveRecord::Base.connection.pool.checkin(_1) }
According to my PG logs, this opens up the connections and Rails does all of this typing queries, setting session properties, etc. However, when I go to fire off my first API call, my pool is empty.
In the end what ended up being the general issue was I needed to be hydrating the pool with the correct connections. on_worker_boot is because this is running behind puma.
on_worker_boot do
ActiveRecord::Base.connected_to(role: :reading) do
# spin up db connections
connections = []
(ActiveRecord::Base.connection.pool.size - 1).times do
connections << ActiveRecord::Base.connection.pool.checkout
end
connections.each { |x| ActiveRecord::Base.connection.pool.checkin(x) }
end
end

Rails 4 Multithreaded App - ActiveRecord::ConnectionTimeoutError

I have a simple rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data-point under an associated model of the 1st. Let's call this middle model B and the data point model C. There's also a front end that let's users browse this data graphically/visually.
Thus the hierarchy is A has many -> B which has many -> C. I scrape a URL for each A which returns a few instances of B with new Cs that have data for the respective B.
While attempting to test/scale this app I have encountered a problem where rails will stop processing, hang for a while, and finally throw a "ActiveRecord::ConnectionTimeoutError could not obtain a database connection within 5.000 seconds" Obviously the 5 is just the default.
I can't understand why this is happening when 1) there are no DB calls being made explicitly, 2) the log doesn't show any under the hood DB calls happening when it does work 3) it works sometimes and not others.
What's going on with rails 4 AR and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create in memory new instances of model C, save all the C's in one transaction at the end.
Sometimes this works, other times it doesn't, i can't figure out what causes it to fail. However, once it fails it seems to fail more and more.
I eager load all the model A's and B's to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use resque and resque scheduler to do this work but I highly doubt they are the source of the problem as it persists even if I just do "rails runner Class.do_work"
Any suggestions and or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
model.association.each do |a|
#work here
end
Rails does some behind the scenes work that "uses" a DB connection. I put uses in quotes because in my case I think the result is actually returned from memory. I eager loaded the association and thus the DB is never actually hit.
Preliminary testing of wrapping my block in a
ActiveRecord::Base.connection_pool.with_connection do
#something me doing?
end
seems to have resolved the issue.
I uncovered this by adding a backtrace to my thread's error message that was printing out.
-----For those using resque----
I also had to add a bit in my resque.rake file to get this fully working as intended.
task 'resque:setup' => :environment do
Resque.after_fork do |job|
ActiveRecord::Base.establish_connection
end
end
If you are you using
ActiveRecord::Base.transaction do
... code
end
to accomplish faster transactions in a thread, note that this locks the database. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, though it will lock your database

Active Record and multi threading involving multiple dbs

I have a Rails app that is multi homed.
foo.mysite.com talks to the "foo" db.
bar.mysite.com talks to the "bar" db.
This is accomplished by calling:
ActiveRecord::Base.connection_handler.establish_connection("ActiveRecord::Base", foo_spec)
When requests come in for foo it uses the foo_spec, when requests come in for bar it uses the bar_spec.
Everything is happy and there is peace in the world.
However,
I also use sidekiq, it is heavily multi-threaded.
I was getting weird behavior in sidekiq. Often when I thought I was talking to the foo_db, ActiveRecord::Base.connection was pointed at bar_db.
I dug into the code and found:
def retrieve_connection_pool(klass)
pool = #class_to_pool[klass.name]
return pool if pool
return nil if ActiveRecord::Base == klass
retrieve_connection_pool klass.superclass
end
Turns out the internal design of AR only allows AR::Base to know about a single connection pool.
Is there any way to get thread 1 to talk to db1, and thread 2 to talk to a db2 at the same time, using ActiveRecord::Base.connection ?
I would recommend using Postgres and separate schemas rather than separate databases entirely; that was you can share pools.
Usage would look like: select * from foo.users, select * from bar.users
And you would pass the schema to your background worker as an argument.

Postgresql schemas and Delayed::Job

I'm setting up a multi tenant Rails application with Postgresql schemas.
How can I scope the db search path for Delayed::Job?
This would work:
initializers/dj_config.rb:
Delayed::Job.class_eval do
connection.schema_search_path = ["#{current_tenant}", "public"].join(",")
end
...but I need a way to pass in the current tenant, which seems hard since the DJ worker is a different process than the one where "current_tenant" is set. Any ideas?
I had a similar problem and ended up creating custom job classes with perform methods that set the current_tenant. I simply passed the current_tenant into the constructor:
Delayed::Job.enqueue CustomJob.new(current_tenant)

When should I create Solr connection in a Rails app

I'm accessing Solr in a Ruby on Rails application by using rsolr (not Sunspot). I create the local solr object that I use to send requests like this:
solr = RSolr.connect(:url => "http://localhost:8983/solr")
as far as I understand, this is not really a connection but just an object that will issue requests on demand, so it shouldn't be expensive to keep it initialized and it should never disconnect. According to that, it should be ok to have one global solr object, create it at start time and forget about it. Right? But maybe it's not thread safe?
When should I create the solr connection?
All that the RSolr.connect method really does is sanitize and save the options that you're using. You can see that method here. It's passed a new connection object (which, notably, doesn't have an initialize method, so it's not doing anything when created) and the options that you pass to RSolr.connect.
So yes, you're right -- no harm at all in connecting once and leaving it connected forever hanging around in a variable somewhere. (For example, I memoize the result of RSolr.connect in my Solr/Rails app.)

Resources