Rails DB Connection Pool Hydration - ruby-on-rails

I'm working on a Rails 7 app with some pretty tight response-time SLAs. I'm well within SLA during normal runtime; where I fall painfully short is the first request. I've added an initializer that loads ActiveRecord and makes sure all of my DB models are loaded. It also hydrates various in-memory caches, etc. This took me pretty far: my first-request response time dropped by about 60%. However, I'm still trying to figure out a couple of things that are slowing down the first response:
The first API request checks whether a Rails migration needs to be run. I haven't figured out how to move this check to init.
The first API request appears to be using a fresh DB pool, not the one that was used in the init phase. I've tried fully hydrating the pool to spare the API from creating connections when Rails kicks in, but I haven't figured it out.
In an initializer I can do something like this:
    connections = []
    ActiveRecord::Base.connection.pool.size.times do
      connections << ActiveRecord::Base.connection.pool.checkout
    end
    connections.each { ActiveRecord::Base.connection.pool.checkin(_1) }
According to my PG logs, this opens the connections, and Rails runs all of its type-loading queries, sets session properties, etc. However, when I fire off my first API call, my pool is empty.

In the end, the general issue was that I needed to hydrate the pool with the correct connections: the ones for the reading role. I use on_worker_boot because this is running behind Puma, where each worker needs its own warm pool.
    on_worker_boot do
      ActiveRecord::Base.connected_to(role: :reading) do
        # spin up db connections
        connections = []
        # size - 1: calling .connection already leased one connection to this thread
        (ActiveRecord::Base.connection.pool.size - 1).times do
          connections << ActiveRecord::Base.connection.pool.checkout
        end
        connections.each { |x| ActiveRecord::Base.connection.pool.checkin(x) }
      end
    end

Related

Understanding race_condition_ttl in Rails

I am trying to understand the race_condition_ttl directive in Rails when using Rails.cache.fetch.
I have a controller action that looks like this:
    def foo
      @foo = Rails.cache.fetch("foo-testing", expires_in: 30.seconds, race_condition_ttl: 60.seconds) do
        Time.now.to_s
      end
      @foo # this gets used in a view down the line...
    end
Based on what I'm reading in the Rails docs, this value should expire after 30 seconds, but the stale value is allowed to be served for another 60 seconds. However, I can't figure out how to reproduce conditions that will show me this behavior working. Here is how I'm trying to test it.
    threads = 100.times.map do
      Thread.new { RestClient.get("http://myenvironment/foo") }
    end
    threads.map { |t| t.join.value }.uniq
I have my Rails app running on a VM behind a standard nginx/unicorn setup. I am trying to spawn 100 threads hitting the site simultaneously to simulate the "dog pile effect". However, when I run my test code, all the threads report the same value back. What I would expect to see is that one thread gets the fresh value, while at least one other thread gets served some stale content.
Any pointers are welcome! Thanks so much.
You are setting race_condition_ttl to 60 seconds, which means your threads will only start getting the new value after this time expires, on top of the initial 30 seconds.
Your test doesn't look like it would take 1.5 minutes to run, which would be required for the threads to start getting the new value. From the Rails cache docs:
Yes, this process is extending the time for a stale value by another few seconds. Because of extended life of the previous cache, other processes will continue to use slightly stale data for a just a bit longer.
The text implies that race_condition_ttl should be small, which makes sense both for its purpose and for your test.
UPDATE
Also note that the life of stale cache is extended only if it expired recently. Otherwise a new value is generated and :race_condition_ttl does not play any role.
Without reading the source, it is not particularly clear how Rails decides that its server is getting hammered, or what exactly "recently" means in the quote above. It does seem clear, though, that the first process (of many) waiting to access the cache gets to set the new value while extending the life of the previous one. The presence of waiting processes might be the condition Rails looks for. In any case, the expected behaviour should be observed after both the initial timeout and the ttl expire and the cache starts serving the updated value. The delay between the initial timeout and the moment the new value starts showing up should be similar to the ttl. Of course, the precondition is that the server is being hammered around the moment the initial timeout expires.
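To make the mechanism concrete, here is a deliberately simplified, single-process model of race_condition_ttl. It is not Rails' implementation: TinyCache, its fake clock (now) and the read helper are made-up names for illustration. It follows my understanding of what ActiveSupport's cache store does on an expired read: if the entry expired no more than race_condition_ttl seconds ago, the first caller pushes the stale entry's expiry forward by race_condition_ttl and then recomputes, so other readers keep getting the stale value meanwhile; if it expired longer ago, the entry is simply dropped.

```ruby
# Simplified model of race_condition_ttl. NOT Rails' code; names are made up.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  attr_accessor :now # fake clock, in seconds

  def initialize
    @now = 0
    @data = {}
  end

  def fetch(key, expires_in:, race_condition_ttl: 0)
    entry = @data[key]
    if entry && entry.expires_at <= @now
      if race_condition_ttl > 0 && @now - entry.expires_at <= race_condition_ttl
        # Expired recently: keep serving the stale entry to other readers
        # for race_condition_ttl more seconds while this caller recomputes.
        entry.expires_at = @now + race_condition_ttl
      else
        # Expired long ago: just drop it.
        @data.delete(key)
      end
      entry = nil
    end
    return entry.value if entry

    value = yield
    @data[key] = Entry.new(value, @now + expires_in)
    value
  end

  # What another reader would get right now, without recomputing.
  def read(key)
    entry = @data[key]
    entry.value if entry && entry.expires_at > @now
  end
end

cache = TinyCache.new
cache.fetch("foo", expires_in: 30, race_condition_ttl: 10) { "v1" }

cache.now = 31 # 1 second after expiry
seen_meanwhile = nil
fresh = cache.fetch("foo", expires_in: 30, race_condition_ttl: 10) do
  seen_meanwhile = cache.read("foo") # a concurrent reader during the recompute
  "v2"
end
p seen_meanwhile    # prints "v1" (stale value still served)
p fresh             # prints "v2"
p cache.read("foo") # prints "v2"
```

In this model the stale value is only visible while the first caller's block is still running. With a block as fast as Time.now.to_s that window is tiny, so a load test has to hit the key right around the expiry moment to observe it.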

Rails 4 Multithreaded App - ActiveRecord::ConnectionTimeoutError

I have a simple Rails app that scrapes JSON from a remote URL for each instance of a model (let's call it A). The app then creates a new data point under a model associated with the first one. Let's call this middle model B and the data-point model C. There's also a front end that lets users browse this data graphically.
Thus the hierarchy is: A has many B, which has many C. I scrape a URL for each A, which returns a few instances of B with new Cs that hold data for the respective B.
While attempting to test/scale this app, I have run into a problem where Rails stops processing, hangs for a while, and finally throws "ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds". The 5 seconds is just the default timeout.
I can't understand why this is happening when 1) no DB calls are being made explicitly, 2) the log doesn't show any under-the-hood DB calls happening when it does work, and 3) it works sometimes and not others.
What's going on with Rails 4 ActiveRecord and the connection pool?!
A couple of notes:
The general algorithm is to spawn a thread for each model A, scrape the data, create new instances of model C in memory, and save all the Cs in one transaction at the end.
Sometimes this works, other times it doesn't, and I can't figure out what causes it to fail. However, once it fails, it seems to fail more and more.
I eager load all the model A's and B's to begin with.
I use a transaction at the end to insert all the newly created C instances.
I currently use resque and resque-scheduler to do this work, but I highly doubt they are the source of the problem, as it persists even if I just run "rails runner Class.do_work".
Any suggestions and or thoughts greatly appreciated!
I believe I have found the cause of this problem. When you loop through an association via
    model.association.each do |a|
      # work here
    end
Rails does some behind-the-scenes work that "uses" a DB connection. I put "uses" in quotes because in my case I think the result is actually returned from memory: I eager loaded the association, so the DB is never actually hit.
Preliminary testing of wrapping my block in
    ActiveRecord::Base.connection_pool.with_connection do
      # work here
    end
seems to have resolved the issue.
I uncovered this by adding a backtrace to the error message my thread was printing.
-----For those using resque----
I also had to add a bit in my resque.rake file to get this fully working as intended.
    task 'resque:setup' => :environment do
      Resque.after_fork do |job|
        ActiveRecord::Base.establish_connection
      end
    end
If you are using
    ActiveRecord::Base.transaction do
      # ... code
    end
to make bulk writes faster in a thread, note that this locks the database. I had an app that did this for a hugely expensive process, in a thread, and it would lock the DB for over 5 seconds. It is faster, but it will lock your database.
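The answers above come down to one rule: a thread that touches ActiveRecord gets a connection leased to it and keeps it until it is returned to the pool. The toy pool below is a plain-Ruby model of that rule, not ActiveRecord's ConnectionPool; TinyPool, run_jobs and the 1-second timeout are made up for illustration. Ten threads share five connections: the run that never checks connections back in ends with five timeouts, while the run that uses a with_connection-style block has none.

```ruby
# Toy connection pool: a model of the leasing rule, NOT ActiveRecord's code.
class TinyPool
  class TimeoutError < StandardError; end

  def initialize(size, timeout: 1.0)
    @available = Array.new(size) { |i| "conn#{i}" }
    @timeout = timeout
    @lock = Mutex.new
    @cond = ConditionVariable.new
  end

  # Waits up to @timeout seconds for a free connection, then gives up.
  def checkout
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout
    @lock.synchronize do
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise TimeoutError, "could not obtain a connection within #{@timeout}s" if remaining <= 0
        @cond.wait(@lock, remaining)
      end
      @available.pop
    end
  end

  def checkin(conn)
    @lock.synchronize do
      @available.push(conn)
      @cond.signal # wake one waiting thread
    end
  end

  # Same idea as with_connection: always hand the connection back.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end
end

# Runs `jobs` threads against the pool and counts the timeouts.
def run_jobs(pool, jobs, release:)
  errors = 0
  counter_lock = Mutex.new
  threads = Array.new(jobs) do
    Thread.new do
      begin
        if release
          pool.with_connection { sleep 0.01 } # returned when the block ends
        else
          pool.checkout                       # leaked: never checked back in
          sleep 0.01
        end
      rescue TinyPool::TimeoutError
        counter_lock.synchronize { errors += 1 }
      end
    end
  end
  threads.each(&:join)
  errors
end

puts run_jobs(TinyPool.new(5), 10, release: false) # prints 5
puts run_jobs(TinyPool.new(5), 10, release: true)  # prints 0
```

Eager loading does not change this picture: what matters is whether each worker thread gives its connection back, not whether the DB is actually queried.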

Connection pool issue with ActiveRecord objects in rufus-scheduler

I'm using rufus-scheduler to run a number of frequent jobs that perform various tasks with ActiveRecord objects. If there is any sort of network or PostgreSQL hiccup, then even after recovery all the threads throw the following error until the process is restarted:
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5 seconds (waited 5.000122687 seconds). The max pool size is currently 5; consider increasing it.
The error can easily be reproduced by restarting Postgres. I've tried increasing the pool size (up to 15), but no luck there.
That leads me to believe the connections are just in a stale state, which I thought would be fixed with the call to clear_stale_cached_connections!.
Is there a more reliable pattern to do this?
The block that is passed is a simple ActiveRecord select-and-update call, and the error happens no matter what the AR object is.
The rufus job:
    scheduler.every '5s' do
      db do
        DataFeed.update # standard AR select/update
      end
    end
wrapper:
    def db(&block)
      begin
        ActiveRecord::Base.connection_pool.clear_stale_cached_connections!
        #ActiveRecord::Base.establish_connection # this didn't help either way
        yield block
      rescue Exception => e
        raise e
      ensure
        ActiveRecord::Base.connection.close if ActiveRecord::Base.connection
        ActiveRecord::Base.clear_active_connections!
      end
    end
Rufus scheduler starts a new thread for every job.
ActiveRecord on the other hand cannot share connections between threads, so it needs to assign a connection to a specific thread.
When your thread doesn't have a connection yet, it will get one from the pool.
(If all connections in the pool are in use, it will wait until one is returned by another thread, eventually timing out and throwing ConnectionTimeoutError.)
It is your responsibility to return the connection to the pool when you are done with it. In a Rails app this is done automatically, but if you are managing your own threads (as rufus does), you have to do it yourself.
Luckily, there is an API for this:
If you put your code inside a with_connection block, it will get a connection from the pool and release it when it is done:
    ActiveRecord::Base.connection_pool.with_connection do
      # your code here
    end
In your case:
    def db
      ActiveRecord::Base.connection_pool.with_connection do
        yield
      end
    end
should do the trick.
http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html#method-i-with_connection
The reason may be that you have many threads using all the connections: if the DataFeed.update method takes more than 5 seconds, runs of your block can overlap.
Try:
    scheduler.every("5s", :allow_overlapping => false) do
      # ...
    end
Also, try releasing the connection instead of closing it:
    ActiveRecord::Base.connection_pool.release_connection
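The no-overlap idea can also be expressed without depending on the scheduler (the option name for it has varied between rufus-scheduler versions, so check the docs for the version you use). The sketch below is plain Ruby, not a rufus-scheduler API: NonOverlapping is a made-up name, and it uses Mutex#try_lock so that a tick arriving while the previous run is still going is skipped instead of piling up behind it.

```ruby
# Hypothetical helper (not part of rufus-scheduler): skip a tick instead of
# letting a slow job overlap with the next run of itself.
class NonOverlapping
  def initialize
    @running = Mutex.new
  end

  # Runs the block unless a previous run is still in progress.
  # Returns the block's result, or :skipped if a run was already active.
  def run
    return :skipped unless @running.try_lock

    begin
      yield
    ensure
      @running.unlock
    end
  end
end

guard = NonOverlapping.new
started = Queue.new
finish = Queue.new

# Simulates a slow DataFeed.update that is still running at the next tick.
slow = Thread.new do
  guard.run do
    started << true
    finish.pop
    :done
  end
end

started.pop             # wait until the slow run holds the guard
p guard.run { :second } # prints :skipped
finish << true          # let the slow run complete
p slow.value            # prints :done
p guard.run { :third }  # prints :third
```

Inside a rufus job this could wrap the with_connection-based helper from the first answer, e.g. guard.run { db { DataFeed.update } }, so that each run both avoids overlap and returns its connection.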
I don't really know about rufus-scheduler, but I got some ideas.
The first problem could be a bug in rufus-scheduler that prevents it from checking out database connections properly. If that's the case, the only solution is to clear stale connections manually, as you already do, and to inform the author of rufus-scheduler about your issue.
Another problem could be that your DataFeed operation takes a really long time, and because it is performed every 5 seconds, Rails runs out of database connections, but that's rather unlikely.

When should I create Solr connection in a Rails app

I'm accessing Solr in a Ruby on Rails application by using rsolr (not Sunspot). I create the local solr object that I use to send requests like this:
    solr = RSolr.connect(:url => "http://localhost:8983/solr")
As far as I understand, this is not really a connection but just an object that issues requests on demand, so it shouldn't be expensive to keep it initialized, and it should never disconnect. If so, it should be OK to have one global solr object, create it at startup, and forget about it. Right? But maybe it's not thread-safe?
When should I create the solr connection?
All that the RSolr.connect method really does is sanitize and save the options you're using, as you can see in the method's source. It's passed a new connection object (which, notably, doesn't have an initialize method, so it's not doing anything when created) and the options you pass to RSolr.connect.
So yes, you're right -- no harm at all in connecting once and leaving it connected forever hanging around in a variable somewhere. (For example, I memoize the result of RSolr.connect in my Solr/Rails app.)
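A minimal sketch of the "connect once and memoize" approach, assuming you want a single client per process. RSolr.connect is replaced here by a hypothetical stand-in (FakeSolrClient) so the snippet runs without the gem, and SolrClient is a made-up module name. The Mutex only guards the first initialization, so two threads racing on the first call don't build two clients; whether the client object itself is safe to share across threads depends on the HTTP connection it wraps, which this sketch does not settle.

```ruby
# Hypothetical stand-in for the object RSolr.connect returns.
FakeSolrClient = Struct.new(:url)

module SolrClient
  LOCK = Mutex.new

  # Builds the client once per process and returns the same object afterwards.
  def self.instance
    return @instance if @instance

    LOCK.synchronize do
      # In a real app: @instance ||= RSolr.connect(:url => "http://localhost:8983/solr")
      @instance ||= FakeSolrClient.new("http://localhost:8983/solr")
    end
  end
end

clients = Array.new(8) { Thread.new { SolrClient.instance } }.map(&:value)
p clients.map(&:object_id).uniq.size # prints 1
```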

why class variable of Application Controller in Rails is re-initialized in different request

I have a controller called McController that extends ApplicationController, and I set a class variable in McController called @@scheduler_map, like below:
    class McController < ApplicationController
      @@scheduler_map = {}

      def action
        ...
      end

      private

      def get_scheduler(host, port)
        scheduler = @@scheduler_map[host + "_" + port]
        unless scheduler
          scheduler = Scheduler.create(host, port)
          @@scheduler_map[host + "_" + port] = scheduler
        end
        scheduler
      end
    end
But I found that from the second request on, @@scheduler_map is always an empty hash. I'm running it in the development environment. Does anyone know the reason? Is it related to the environment?
Thank you in advance.
You answered your own question :-)
Yes, this is caused by the development environment (I tested it), or more precisely by the option "config.cache_classes = false" in config/environments/development.rb.
This flag causes all classes to be reloaded on every request.
This is done so that you don't have to restart the whole server when you make a small change to your controllers.
Keep in mind that what you are trying to do can cause HUGE memory leaks when later run in production with a lot of visits.
Every time a user visits your site, it will create a new entry in that hash that never gets cleaned up.
Imagine what will happen if 10,000 users have visited your site. Or 1,000,000?
All this data is kept in the system's memory, so it can take up a lot of space the longer the server is online.
Also, I'm not really sure this solution will work on a production server.
The server will typically run multiple processes (and threads) to handle many visitors at the same time.
Each process has its own copy of the classes, so the scheduler map for host xx might exist in process 1 but not in process 2.
Threads within a single process do share the class variable, so access to it should be synchronized.
If you give me some more information about what this scheduler is i might be able to give a suggestion for a different solution.
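If the goal is one Scheduler per host/port shared across requests, a sketch along these lines keeps the map out of the controller and guards it with a Mutex, since threads in one process share it. It assumes Scheduler.create(host, port) behaves as in the question; here Scheduler is a hypothetical stand-in so the snippet runs, and SchedulerRegistry is a made-up name. It is still per process, so each server process builds its own schedulers, and in development the registry would need to live in code that is not reloaded to survive between requests.

```ruby
# Hypothetical stand-in for the question's Scheduler class.
class Scheduler
  def self.create(host, port)
    new(host, port)
  end

  attr_reader :host, :port

  def initialize(host, port)
    @host = host
    @port = port
  end
end

# Process-wide registry: one Scheduler per "host_port" key.
module SchedulerRegistry
  @map = {}
  @lock = Mutex.new

  def self.fetch(host, port)
    key = "#{host}_#{port}" # interpolation also works when port is an Integer
    @lock.synchronize do
      @map[key] ||= Scheduler.create(host, port)
    end
  end

  def self.size
    @lock.synchronize { @map.size }
  end
end

a = SchedulerRegistry.fetch("db1", 8080)
b = SchedulerRegistry.fetch("db1", 8080)
p a.equal?(b)            # prints true
p SchedulerRegistry.size # prints 1
```

In the controller, get_scheduler would then just call SchedulerRegistry.fetch(host, port). Memory growth is bounded by the number of distinct host/port pairs rather than by the number of visits.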

Resources