I am using Ruby 2.7 with the Mongo 2.17 client. I am currently using Sidekiq with ActiveJob to perform millions of job executions, each doing a single transaction against AWS DocumentDB. While reading the Mongo client documentation I see they claim it is a bad idea to instantiate a client per request; you should instead have one and reuse it.
Currently the job, which runs millions of times, instantiates a client and closes it at the end. Many job threads execute per Sidekiq process, and I am running multiple Sidekiq processes:
jobs/my_job.rb
def perform(document)
  client = Mongo::Client.new(DOCUMENTDB_HOST, DOCUMENTDB_OPTIONS)
  client[:my_collection].insert_one(document)
  client.close
end
The documentation states:
The default configuration for a Mongo::Client works for most applications:
client = Mongo::Client.new(["localhost:27017"])
Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient and not what the client was designed for.
To support extremely high numbers of concurrent MongoDB operations within one process, increase max_pool_size:
client = Mongo::Client.new(["localhost:27017"], max_pool_size: 200)
Any number of threads are allowed to wait for connections to become available, and they can wait the default (1 second) or the wait_queue_timeout setting:
client = Mongo::Client.new(["localhost:27017"], wait_queue_timeout: 0.5)
When #close is called on a client by any thread, all connections are closed:
client.close
Note that when creating a client using the block syntax described above, the client is automatically closed after the block finishes executing.
My question is whether this statement also applies to isolated Sidekiq job executions, and if so, how could I reuse the Mongo client connection object across a Sidekiq process? I could think of having a global @@client in the Sidekiq initializer:
config/initializers/sidekiq.rb:
@@client = Mongo::Client.new(DOCUMENTDB_HOST, DOCUMENTDB_OPTIONS)
and then:
jobs/my_job.rb:
def perform(document)
  @@client[:my_collection].insert_one(document)
end
Note:
No significant errors are raised; the whole system just freezes, and I get the following exception thrown randomly after the system has been running correctly for several minutes:
OpenSSL::SSL::SSLError: SSL_connect SYSCALL returned=5 errno=0 state=SSLv3/TLS write client hello (for 10.0.0.123:27017
UPDATE:
I tried 'reusing' the connection client by creating a global variable with the connection object in an initializer:
config/initializers/mongodb_client.rb
$mongo_client = Mongo::Client.new(DOCUMENTDB_HOST, DOCUMENTDB_OPTIONS)
and then using it inside my ActiveJob class. So far it seems to work well, but I am unaware of side effects. I did start many Sidekiq processes and I am closely watching the logs for exceptions; so far, all good.
jobs/my_job.rb
def perform(document)
  $mongo_client[:activity_log].insert_one(document)
end
The Mongo::Client is thread-safe; just set :max_pool_size to match your Sidekiq concurrency so each job thread can check out a connection from the pool concurrently.
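As a minimal sketch of that setup, assuming DOCUMENTDB_HOST and DOCUMENTDB_OPTIONS are defined as in the question and a Sidekiq 6-style API (config.options[:concurrency] is the part to verify against your Sidekiq version):

config/initializers/sidekiq.rb
require "mongo"

Sidekiq.configure_server do |config|
  # One client per Sidekiq process; connection pool sized to the thread count.
  MONGO_CLIENT = Mongo::Client.new(
    DOCUMENTDB_HOST,
    DOCUMENTDB_OPTIONS.merge(max_pool_size: config.options[:concurrency])
  )
end

jobs/my_job.rb
def perform(document)
  MONGO_CLIENT[:my_collection].insert_one(document)
end

Each job thread then checks a connection out of the shared pool instead of paying TLS handshake and topology discovery costs on every job, which may also explain the random SSL handshake errors you saw.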
I'm using RoR version 4.2.3, and I understand I can set the isolation level of my transactions. However, where do I set the isolation level for all transactions, so that I only have to define it once and then not worry about it?
I'm using PostgreSQL as my database.
There does not seem to be a global isolation option, so you are left with four options:
1. Monkeypatch the existing transaction implementation so that it picks your desired isolation level. (Monkeypatching is not desirable.)
2. Use the correct isolation level explicitly throughout your application:
SomeModel.transaction(isolation: :read_committed)
3. Extend ActiveRecord and create your own custom transaction method.
4. As commented, you may be able to change the default isolation level in the database configuration. For Postgres that is the default_transaction_isolation setting.
Example code for option 3:
# lib/active_record_extension.rb
module ActiveRecordExtension
  extend ActiveSupport::Concern

  module ClassMethods
    def custom_transaction
      transaction(isolation: :read_committed) do
        yield
      end
    end
  end
end

ActiveRecord::Base.send(:include, ActiveRecordExtension)
Then in your initializers:
#config/initializers/activerecord_extension.rb
require "active_record_extension"
Afterwards you can use:
MyModel.custom_transaction do
  <...>
end
And in the future, this will allow you to change the isolation level in one place.
Rails doesn't support setting a global isolation level, but Postgres lets you set one for the session. You can hook into Rails' connection establishment to run a command every time a connection is made, though the techniques for this all rely on monkeypatching and may be questionable:
Run Raw SQL in Rails after connecting to Database
Can I hook into ActiveRecord connection establishment?
Then configure your isolation level with:
SET SESSION CHARACTERISTICS AS TRANSACTION transaction_mode
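For example, a minimal sketch of that hook, prepending onto the private configure_connection method of the Rails 4.2 Postgres adapter (an internal API, so treat this as fragile):

config/initializers/default_isolation_level.rb
require "active_record/connection_adapters/postgresql_adapter"

module DefaultIsolationLevel
  # Called by Rails once for each newly established connection.
  def configure_connection
    super
    execute("SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ COMMITTED")
  end
end

ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.prepend(DefaultIsolationLevel)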
Though this is interesting, I'd go with something more like Magnuss's answer for maintainability and sanity.
I have an initializer in my config directory, which is something like this:
ActiveSupport::Notifications.subscribe "handle_translation_event" do |name, start, finish, id, payload|
  puts "Called"
end
I have a Rails development server running (rails s), and in parallel I start a Rails console (rails c). Imagine that in the Rails console I write:
ActiveSupport::Notifications.instrument("handle_translation_event")
I cannot see this reflected in the server logs. Is it possible to trigger an event from the console, and have it affect the server? My guess is no.
ActiveSupport::Notifications uses the ActiveSupport::Notifications::Fanout class as its notifier to notify the subscribers.
This notifier stores the subscribers in simple instance variables, so notifications never leave the process.
You can create your own implementation (e.g. a database-backed solution) and install it by setting the notifier attribute of ActiveSupport::Notifications:
ActiveSupport::Notifications.notifier = my_implementation
I could imagine a simple Redis-backed implementation, because Redis has a publish/subscribe feature. But if you don't have Redis as a dependency, you can also do it in SQL.
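As a rough sketch of the Redis idea (assuming the redis gem and a running Redis server; the channel name and payload shape are made up for illustration), one process publishes and the other re-broadcasts locally:

require "redis"
require "json"

# Publisher side (e.g. the rails console):
Redis.new.publish("notifications", { name: "handle_translation_event" }.to_json)

# Subscriber side (e.g. the server), listening in a background thread:
Thread.new do
  Redis.new.subscribe("notifications") do |on|
    on.message do |_channel, message|
      event = JSON.parse(message)
      # Re-instrument locally so existing subscribers fire.
      ActiveSupport::Notifications.instrument(event["name"])
    end
  end
end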
In my Rails application I have a background process runner, a model named Worker, that checks for new tasks to run every 10 seconds. This check generates two SQL queries each time: one to look for new jobs, one to delete old completed ones.
The problem with this is that the main log file gets spammed with each of those queries.
Can I direct the SQL queries spawned by the Worker model into a separate log file, or at least silence them? Overriding Worker.logger does not work; it redirects only the messages that explicitly call logger.debug("something").
The simplest and most idiomatic solution:
logger.silence do
  do_something
end
See Logger#silence
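Applied to the Worker from the question, that could look like this sketch (Logger#silence comes from ActiveSupport and temporarily raises the log level, to ERROR by default, so the DEBUG-level SQL lines are dropped):

class Worker < ActiveRecord::Base
  def run
    # Suppress SQL logging for just these two queries.
    ActiveRecord::Base.logger.silence do
      run_outstanding_jobs
      remove_obsolete_jobs
    end
  end
end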
Queries are logged at the adapter level, as I demonstrated here:
How do I get the last SQL query performed by ActiveRecord in Ruby on Rails?
You can't change that behavior without tweaking the adapter with some really horrible hacks.
class Worker < ActiveRecord::Base
  def run
    # Temporarily raise the shared logger's level so DEBUG-level SQL lines are dropped.
    old_level, self.class.logger.level = self.class.logger.level, Logger::WARN
    run_outstanding_jobs
    remove_obsolete_jobs
  ensure
    self.class.logger.level = old_level
  end
end
This is a fairly familiar idiom. I've seen it many times, in different situations. Of course, if you didn't know that ActiveRecord::Base.logger can be changed like that, it would have been hard to guess.
One caveat of this solution: this changes the logger level for all of ActiveRecord, ActionController, ActionView, ActionMailer and ActiveResource. This is because there is a single Logger instance shared by all modules.
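To get the separate log file the question asks about, a variation of the same idiom can swap the logger itself rather than its level (a sketch; the swap is process-wide and not thread-safe, so it only suits a dedicated worker process):

class Worker < ActiveRecord::Base
  SQL_LOG = Logger.new(Rails.root.join("log", "worker_sql.log"))

  def run
    old_logger = ActiveRecord::Base.logger
    ActiveRecord::Base.logger = SQL_LOG
    run_outstanding_jobs
    remove_obsolete_jobs
  ensure
    ActiveRecord::Base.logger = old_logger
  end
end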
Running a rails site right now using SQLite3.
About once every 500 requests or so, I get a
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked:...
What's the way to fix this that would be minimally invasive to my code?
I'm using SQLite at the moment because you can store the DB in source control, which makes backing up natural, and you can push changes out very quickly. However, it's obviously not really set up for concurrent access. I'll migrate over to MySQL tomorrow morning.
You mentioned that this is a Rails site. Rails allows you to set the SQLite retry timeout in your database.yml config file:
production:
  adapter: sqlite3
  database: db/mysite_prod.sqlite3
  timeout: 10000
The timeout value is specified in milliseconds. Increasing it to 10 or 15 seconds should decrease the number of BusyExceptions you see in your log.
This is just a temporary solution, though. If your site needs true concurrency then you will have to migrate to another db engine.
By default, SQLite returns immediately with a busy error if the database is locked. You can ask it to wait and keep retrying for a while before giving up. This usually fixes the problem, unless you have thousands of threads accessing your db, in which case I agree SQLite would be inappropriate.
// Set SQLite to wait and retry for up to 100 ms if the database is locked.
sqlite3_busy_timeout(db, 100);
All of these things are true, but they don't answer the question, which is likely: why does my Rails app occasionally raise a SQLite3::BusyException in production?
@Shalmanese: what is the production hosting environment like? Is it on a shared host? Is the directory that contains the sqlite database on an NFS share? (Likely, on a shared host.)
This problem likely has to do with the phenomenon of file locking with NFS shares and SQLite's lack of concurrency.
If you have this issue but increasing the timeout does not change anything, you might have another concurrency issue with transactions. In summary:
1. Begin a transaction (acquires a SHARED lock).
2. Read some data from the DB (still using the SHARED lock).
3. Meanwhile, another process starts a transaction and writes data (acquiring the RESERVED lock).
4. Then you try to write, which means requesting the RESERVED lock yourself.
5. SQLite raises the SQLITE_BUSY exception immediately (independently of your timeout), because your previous reads may no longer be accurate by the time it could get the RESERVED lock.
One way to fix this is to patch the ActiveRecord sqlite adapter to acquire a RESERVED lock directly at the beginning of the transaction, by passing the :immediate option to the driver. This will decrease performance a bit, but at least all your transactions will honor your timeout and occur one after the other. Here is how to do this using prepend (Ruby 2.0+); put this in an initializer:
module SqliteTransactionFix
  def begin_db_transaction
    log('begin immediate transaction', nil) { @connection.transaction(:immediate) }
  end
end

module ActiveRecord
  module ConnectionAdapters
    class SQLiteAdapter < AbstractAdapter
      prepend SqliteTransactionFix
    end
  end
end
Read more here: https://rails.lighthouseapp.com/projects/8994/tickets/5941-sqlite3busyexceptions-are-raised-immediately-in-some-cases-despite-setting-sqlite3_busy_timeout
Just for the record: in one application with Rails 2.3.8 we found out that Rails was ignoring the "timeout" option Rifkin Habsburg suggested.
After some more investigation we found a possibly related bug in Rails dev: http://dev.rubyonrails.org/ticket/8811. And after some more investigation we found the solution (tested with Rails 2.3.8):
Edit this ActiveRecord file: activerecord-2.3.8/lib/active_record/connection_adapters/sqlite_adapter.rb
Replace this:
def begin_db_transaction #:nodoc:
  catch_schema_changes { @connection.transaction }
end
with
def begin_db_transaction #:nodoc:
  catch_schema_changes { @connection.transaction(:immediate) }
end
And that's all! We haven't noticed a performance drop, and now the app supports many more requests without breaking (it waits for the timeout). SQLite is nice!
bundle exec rake db:reset
It worked for me; it will reset the database and show any pending migrations.
SQLite can make other processes wait until the current one has finished.
I use the following to connect when I know I may have multiple processes trying to access the SQLite DB:
import sqlite3
conn = sqlite3.connect('filename', isolation_level='exclusive')
According to the Python sqlite3 documentation:
You can control which kind of BEGIN statements pysqlite implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
I had a similar problem with rake db:migrate. The issue was that the working directory was on an SMB share.
I fixed it by copying the folder over to my local machine.
Most answers are for Rails rather than raw Ruby, and the OP's question IS for Rails, which is fine. :)
So I just want to leave this solution over here in case any raw Ruby user has this problem and is not using a YAML configuration.
After instantiating the connection, you can set it like this:
require "sqlite3"

db = SQLite3::Database.new "#{path_to_your_db}/your_file.db"
# In ms: retry for up to 15 seconds before raising an exception.
# This can be any number you want. The default value is 0.
db.busy_timeout = 15000
-- Open the database
db = sqlite3.open("filename")
-- Ten attempts are made to proceed, if the database is locked
function my_busy_handler(attempts_made)
  if attempts_made < 10 then
    return true
  else
    return false
  end
end
-- Set the new busy handler
db:set_busy_handler(my_busy_handler)
-- Use the database
db:exec(...)
What table is being accessed when the lock is encountered?
Do you have long-running transactions?
Can you figure out which requests were still being processed when the lock was encountered?
Argh - the bane of my existence over the last week. SQLite3 locks the db file when any process writes to the database, i.e. any UPDATE/INSERT-type query (also SELECT COUNT(*) for some reason). However, it handles multiple reads just fine.
So, I finally got frustrated enough to write my own thread-locking code around the database calls. By ensuring that the application can only have one thread writing to the database at any point, I was able to scale to thousands of threads.
And yeah, it's slow as hell. But it's also fast enough and correct, which is a nice property to have.
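A minimal sketch of that single-writer idea (names are made up; the point is that one Mutex serializes all writes within the process):

require "sqlite3"

DB_WRITE_LOCK = Mutex.new
db = SQLite3::Database.new("my.db")

# All writers funnel through the one lock; reads can go direct.
def write_with_lock(db, sql, binds = [])
  DB_WRITE_LOCK.synchronize { db.execute(sql, binds) }
end

write_with_lock(db, "INSERT INTO logs (msg) VALUES (?)", ["hello"])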
I found a deadlock in the sqlite3 ruby extension and fixed it here; have a go with it and see if this fixes your problem:
https://github.com/dxj19831029/sqlite3-ruby
I opened a pull request, but there has been no response from them.
Anyway, some busy exceptions are expected, as described by SQLite itself.
Be aware of this condition: sqlite busy
The presence of a busy handler does not guarantee that it will be invoked when there is lock contention. If SQLite determines that invoking the busy handler could result in a deadlock, it will go ahead and return SQLITE_BUSY or SQLITE_IOERR_BLOCKED instead of invoking the busy handler. Consider a scenario where one process is holding a read lock that it is trying to promote to a reserved lock and a second process is holding a reserved lock that it is trying to promote to an exclusive lock. The first process cannot proceed because it is blocked by the second and the second process cannot proceed because it is blocked by the first. If both processes invoke the busy handlers, neither will make any progress. Therefore, SQLite returns SQLITE_BUSY for the first process, hoping that this will induce the first process to release its read lock and allow the second process to proceed.
If you hit this condition, the timeout no longer applies. To avoid it, don't put a SELECT inside BEGIN/COMMIT, or use an exclusive lock for BEGIN/COMMIT, as sketched below.
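A minimal sketch of the exclusive-lock suggestion with the sqlite3 gem (table and values are made up): taking the write lock up front means the busy timeout applies while waiting, instead of failing fast on lock promotion:

require "sqlite3"

db = SQLite3::Database.new("my.db")
db.busy_timeout = 5000 # wait up to 5 s for locks

# BEGIN EXCLUSIVE takes the write lock before any reads happen,
# so this transaction cannot hit the read-to-write promotion trap.
db.transaction(:exclusive) do
  row = db.get_first_row("SELECT balance FROM accounts WHERE id = ?", 1)
  db.execute("UPDATE accounts SET balance = ? WHERE id = ?", [row[0] - 10, 1])
end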
Hope this helps. :)
This is often a consequence of multiple processes accessing the same database, e.g. if the "allow only one instance" flag was not set in RubyMine.
Try running the following; it may help:
ActiveRecord::Base.connection.execute("BEGIN TRANSACTION; END;")
From: Ruby: SQLite3::BusyException: database is locked:
This may clear up any transaction holding up the system.
I believe this happens when a transaction times out. You really should be using a "real" database, something like Drizzle or MySQL. Any reason why you prefer SQLite over those two options?