I have a Rails app that is multi-homed.
foo.mysite.com talks to the "foo" db.
bar.mysite.com talks to the "bar" db.
This is accomplished by calling:
ActiveRecord::Base.connection_handler.establish_connection("ActiveRecord::Base", foo_spec)
When requests come in for foo it uses foo_spec; when requests come in for bar it uses bar_spec.
Everything is happy and there is peace in the world.
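A rough sketch of how that per-request switching can look (the controller filter and the FOO_SPEC/BAR_SPEC constants are assumptions about the setup, not shown above):
class ApplicationController < ActionController::Base
  before_action :connect_to_tenant_db

  private

  # Pick the connection spec based on the request's subdomain.
  def connect_to_tenant_db
    spec = request.subdomain == "foo" ? FOO_SPEC : BAR_SPEC
    ActiveRecord::Base.connection_handler.establish_connection("ActiveRecord::Base", spec)
  end
end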
However,
I also use Sidekiq, which is heavily multi-threaded.
I was getting weird behavior in Sidekiq: often when I thought I was talking to the foo db, ActiveRecord::Base.connection was pointed at the bar db.
I dug into the code and found:
def retrieve_connection_pool(klass)
  pool = @class_to_pool[klass.name]
  return pool if pool
  return nil if ActiveRecord::Base == klass
  retrieve_connection_pool klass.superclass
end
Turns out the internal design of AR only allows AR::Base to know about a single connection pool.
Is there any way to get thread 1 to talk to db1 and thread 2 to talk to db2 at the same time, using ActiveRecord::Base.connection?
I would recommend using Postgres and separate schemas rather than separate databases entirely; that way you can share pools.
Usage would look like: select * from foo.users, select * from bar.users
And you would pass the schema to your background worker as an argument.
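A minimal sketch of what that could look like in a Sidekiq worker (the worker name and job arguments are assumptions; the key point is scoping the Postgres search_path per job):
class TenantScopedWorker
  include Sidekiq::Worker

  # Enqueued as e.g. TenantScopedWorker.perform_async("foo", some_id)
  def perform(schema, *args)
    original = ActiveRecord::Base.connection.schema_search_path
    ActiveRecord::Base.connection.schema_search_path = "#{schema},public"
    # ... tenant-scoped work here ...
  ensure
    ActiveRecord::Base.connection.schema_search_path = original
  end
end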
Related
I am using a second database with datasets within my API.
Every API request can have up to 3 queries on that database, so I am splitting them across three threads. To keep it thread-safe I am using a connection pool.
But after the code has run, the connection pool's thread is not terminated. So every request leaves another thread on the server until eventually there is no memory left.
Is there a way to close the connection pool thread? Or am I doing something wrong by creating a connection pool per request?
I set up the connection pool this way:
begin
  full_db = YAML::load(ERB.new(File.read(Rails.root.join("config", "full_datasets_database.yml"))).result)
  resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(full_db)
  spec = resolver.spec(Rails.env.to_sym)
  pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)
Then I run through the queries array and collect the results of each query:
  returned_responses = []
  threads = []
  queries_array.each do |query|
    threads << Thread.new do
      pool.with_connection do |conn|
        returned_responses << conn.execute(query).to_a
      end
    end
  end
  threads.map(&:join)
  returned_responses
Finally I close the connections inside the connection pool:
ensure
  pool.disconnect!
end
Since you want to make SQL queries directly without taking advantage of ActiveRecord as the ORM, but you do want to take advantage of ActiveRecord connection pooling, I suggest you create a new abstract class like ApplicationRecord:
# app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: {
    writing: :full_datasets_database,
    reading: :full_datasets_database
  }
end
You'll need to configure the database full_datasets_database in database.yml so that connects_to is able to connect to it.
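For reference, a minimal sketch of what that entry in config/database.yml could look like (the database names, adapter, and migrations path are assumptions):
# config/database.yml
production:
  primary:
    adapter: postgresql
    database: my_app_production
  full_datasets_database:
    adapter: postgresql
    database: full_datasets_production
    migrations_paths: db/full_datasets_migrate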
Then you'll be able to connect directly to that database and make direct SQL queries against it by referencing that class instead of ActiveRecord::Base:
FullDatasets.connection.execute(query)
The connection pooling will happen transparently with different pools:
FullDatasets.connection_pool.object_id
=> 22620
ActiveRecord::Base.connection_pool.object_id
=> 9000
You may have to do additional configuration, like dumping the schema to db/full_datasets_schema.rb, but any additional troubleshooting or configuration you have to do will be described in https://guides.rubyonrails.org/active_record_multiple_databases.html.
The short version of this explanation is that you should attempt to take advantage of ActiveRecord as much as possible so that your implementation is clean and straightforward while still allowing you to drop directly to raw SQL.
After some time spent, I ended up finding an answer. The generic idea came from @anothermg, but I had to make some changes to get it working in my version of Rails (5.2).
I set up the database in config/full_datasets_database.yml
I had the following initializer already:
#! config/initializers/db_full_datasets.rb
DB_FULL_DATASETS = YAML::load(ERB.new(File.read(Rails.root.join("config","full_datasets_database.yml"))).result)[Rails.env]
I created the following model to create a connection to the new database:
#! app/models/full_datasets.rb
class FullDatasets < ActiveRecord::Base
  self.abstract_class = true
  establish_connection DB_FULL_DATASETS
end
On the actual module I added the following code:
def parallel_queries(queries_array)
  returned_responses = []
  threads = []
  pool = FullDatasets.connection_pool
  queries_array.each do |query|
    threads << Thread.new do
      returned_responses << pool.with_connection { |c| c.execute(query).to_a }
    end
  end
  threads.map(&:join)
  returned_responses
end
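Example usage (the query strings here are just placeholders):
responses = parallel_queries([
  "SELECT * FROM datasets WHERE kind = 'a'",
  "SELECT * FROM datasets WHERE kind = 'b'"
])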
Follow the official way of handling multiple databases in Rails:
https://guides.rubyonrails.org/active_record_multiple_databases.html
I can't give you a more precise answer as I do not have your source code to fully understand the whole context. If the setup above is not applicable to your use case, you might have missed some background clean-up tasks. You can refer to this doc:
https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html
I'm working on a Rails 7 app with some pretty tight response-time SLAs. I am well within SLA during normal runtime. Where I fall painfully short is the first request. I've added an initializer that loads up ActiveRecord and makes sure all of my DB models are loaded. It hydrates various in-memory caches, etc. This took me pretty far: my first response time was reduced by about 60%. However, I've been trying to figure out a couple of things that are still slowing down the first response time.
The first API request does a check to see if I need to do a Rails migration. I've not figured out how to move this check to init.
The first API request appears to be using a fresh DB pool, not the one that was used in the init phase. I've tried fully hydrating the pool to spare the API from creating connections when Rails kicks on, but I've not figured it out.
In an initializer I can do something like this:
connections = []
ActiveRecord::Base.connection.pool.size.times do
  connections << ActiveRecord::Base.connection.pool.checkout
end
connections.each { ActiveRecord::Base.connection.pool.checkin(_1) }
According to my PG logs, this opens up the connections and Rails does all of its setup (type queries, setting session properties, etc.). However, when I go to fire off my first API call, my pool is empty.
In the end, the general issue was that I needed to hydrate the pool with the correct connections. The on_worker_boot hook is used because this is running behind Puma.
on_worker_boot do
  ActiveRecord::Base.connected_to(role: :reading) do
    # spin up db connections
    connections = []
    (ActiveRecord::Base.connection.pool.size - 1).times do
      connections << ActiveRecord::Base.connection.pool.checkout
    end
    connections.each { |x| ActiveRecord::Base.connection.pool.checkin(x) }
  end
end
In our app, we need to switch to read replica database and read from it for some read-only APIs.
We decided to use the around_action filter for that (a rough sketch follows below):
1. Switch the DB to the read_replica before the action.
2. Yield.
3. Switch back to master.
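Roughly, with assumed controller and configuration names (the switching itself used establish_connection, as described next):
class ApiController < ApplicationController
  around_action :use_read_replica, only: [:index, :show]

  private

  def use_read_replica
    ActiveRecord::Base.establish_connection(:read_replica)  # switch to the replica
    yield                                                   # run the read-only action
  ensure
    ActiveRecord::Base.establish_connection(:primary)       # switch back to master
  end
end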
We decided to use establish_connection for switching, which did the job, but later we noticed that it's not thread-safe, i.e. it causes our other threads to hit "#<ActiveRecord::ConnectionNotEstablished: No connection pool with 'primary' found.>". So this solution would only have worked on single-threaded servers.
Later we tried to create a new connection pool, as below, which is thread-safe:
databases = Rails.configuration.database_configuration
resolver = ActiveRecord::ConnectionAdapters::ConnectionSpecification::Resolver.new(databases)
spec = resolver.spec(:read_replica)
pool = ActiveRecord::ConnectionAdapters::ConnectionPool.new(spec)

pool.with_connection { |conn|
  # execute raw SQL here, e.g. conn.execute(sql_query)
}
The only problem with the above approach is that we can only execute queries using the execute method, like conn.execute(sql_query); any AR ORM query we run inside this with_connection block runs against the original DB and not the read_replica.
It seems like ActiveRecord has its own default connection and uses it when we run AR ORM queries.
Not sure how we can execute an AR ORM query such as User.where(id: 1..10) inside the with_connection block.
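To illustrate the behaviour described above (model and table names are just examples):
pool.with_connection { |conn|
  conn.execute("SELECT * FROM users")  # runs against read_replica via this pool
  User.where(id: 1..10).to_a           # still goes through ActiveRecord::Base's default pool (primary)
}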
Please note:
I am aware that we can do this natively in Rails 6; I need to skip that for now.
I am also aware of the Octopus gem, again need to skip on that.
Appreciate any help, Thanks.
I have a simple Rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data point under a model associated with the first. Let's call this middle model B and the data-point model C. There's also a front end that lets users browse this data graphically/visually.
Thus the hierarchy is: A has many B, which has many C. I scrape a URL for each A, which returns a few instances of B with new Cs that hold data for the respective B.
While attempting to test/scale this app I have encountered a problem where Rails will stop processing, hang for a while, and finally throw "ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds". Obviously the 5 seconds is just the default.
I can't understand why this is happening when 1) there are no DB calls being made explicitly, 2) the log doesn't show any under-the-hood DB calls happening when it does work, and 3) it works sometimes and not others.
What's going on with Rails 4 AR and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create new instances of model C in memory, and save all the Cs in one transaction at the end (see the sketch after these notes).
Sometimes this works, other times it doesn't; I can't figure out what causes it to fail. However, once it fails it seems to fail more and more.
I eager load all the model As and Bs to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use Resque and resque-scheduler to do this work, but I highly doubt they are the source of the problem as it persists even if I just do "rails runner Class.do_work".
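A rough sketch of that algorithm (the scraping and record-building helpers are hypothetical placeholders, and the association name is assumed):
threads = A.includes(:bs).map do |a|
  Thread.new do
    json = fetch_remote_json(a)            # hypothetical HTTP/scraping helper
    new_cs = build_cs_from_json(a, json)   # build C records in memory, not yet saved
    C.transaction { new_cs.each(&:save!) } # save all Cs for this A in one transaction
  end
end
threads.each(&:join)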
Any suggestions and/or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
model.association.each do |a|
  # work here
end
Rails does some behind-the-scenes work that "uses" a DB connection. I put uses in quotes because in my case I think the result is actually returned from memory: I eager loaded the association, so the DB is never actually hit.
Preliminary testing of wrapping my block in a
ActiveRecord::Base.connection_pool.with_connection do
  # the association loop from above goes here
end
seems to have resolved the issue.
I uncovered this by adding a backtrace to my thread's error message that was printing out.
-----For those using resque----
I also had to add a bit in my resque.rake file to get this fully working as intended.
task 'resque:setup' => :environment do
  Resque.after_fork do |job|
    ActiveRecord::Base.establish_connection
  end
end
If you are using
ActiveRecord::Base.transaction do
... code
end
to accomplish faster transactions in a thread, note that this holds database locks for the duration of the transaction. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, but it will hold locks on your database while it runs.
I'm setting up a multi-tenant Rails application with PostgreSQL schemas.
How can I scope the db search path for Delayed::Job?
This would work:
initializers/dj_config.rb:
Delayed::Job.class_eval do
  connection.schema_search_path = ["#{current_tenant}", "public"].join(",")
end
...but I need a way to pass in the current tenant, which seems hard since the DJ worker is a different process than the one where "current_tenant" is set. Any ideas?
I had a similar problem and ended up creating custom job classes with perform methods that set the current_tenant. I simply passed the current_tenant into the constructor:
Delayed::Job.enqueue CustomJob.new(current_tenant)
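A minimal sketch of such a custom job class (the class name and instance variable are assumptions):
class CustomJob
  def initialize(tenant)
    @tenant = tenant
  end

  # Delayed::Job calls #perform in the worker process, so the search path is
  # scoped there using the tenant captured at enqueue time.
  def perform
    ActiveRecord::Base.connection.schema_search_path = "#{@tenant},public"
    # ... tenant-scoped work ...
  end
end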