Running a Rails site right now using SQLite3.
About once every 500 requests or so, I get a
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked:...
What's the way to fix this that would be minimally invasive to my code?
I'm using SQLite at the moment because you can store the DB in source control, which makes backing up natural, and you can push changes out very quickly. However, it's obviously not really set up for concurrent access. I'll migrate over to MySQL tomorrow morning.
You mentioned that this is a Rails site. Rails allows you to set the SQLite retry timeout in your database.yml config file:
production:
  adapter: sqlite3
  database: db/mysite_prod.sqlite3
  timeout: 10000
The timeout value is specified in milliseconds. Increasing it to 10 or 15 seconds should decrease the number of BusyExceptions you see in your log.
This is just a temporary solution, though. If your site needs true concurrency then you will have to migrate to another db engine.
By default, SQLite returns immediately with a "database is locked" busy error if the database is locked. You can ask it to wait and keep retrying for a while before giving up. This usually fixes the problem, unless you have thousands of threads accessing your db, in which case I agree SQLite would be inappropriate.
// set SQLite to wait and retry for up to 100ms if database locked
sqlite3_busy_timeout( db, 100 );
All of these things are true, but they don't answer the question, which is likely: why does my Rails app occasionally raise a SQLite3::BusyException in production?
@Shalmanese: what is the production hosting environment like? Is it on a shared host? Is the directory that contains the sqlite database on an NFS share? (Likely, on a shared host.)
This problem likely has to do with the phenomenon of file locking on NFS shares and SQLite's lack of concurrency.
If you have this issue but increasing the timeout does not change anything, you might have another concurrency issue with transactions. Here it is in summary:
Begin a transaction (acquires a SHARED lock)
Read some data from the DB (we are still using the SHARED lock)
Meanwhile, another process starts a transaction and writes data (acquiring the RESERVED lock).
Then you try to write; you are now trying to acquire the RESERVED lock
SQLite raises the SQLITE_BUSY exception immediately (independently of your timeout) because your previous reads may no longer be accurate by the time it can get the RESERVED lock.
One way to fix this is to patch the ActiveRecord sqlite3 adapter to acquire a RESERVED lock directly at the beginning of the transaction by passing the :immediate option to the driver. This will decrease performance a bit, but at least all your transactions will honor your timeout and occur one after another. Here is how to do this using prepend (Ruby 2.0+); put this in an initializer:
module SqliteTransactionFix
  def begin_db_transaction
    log('begin immediate transaction', nil) { @connection.transaction(:immediate) }
  end
end

module ActiveRecord
  module ConnectionAdapters
    class SQLiteAdapter < AbstractAdapter
      prepend SqliteTransactionFix
    end
  end
end
Read more here: https://rails.lighthouseapp.com/projects/8994/tickets/5941-sqlite3busyexceptions-are-raised-immediately-in-some-cases-despite-setting-sqlite3_busy_timeout
Just for the record. In one application with Rails 2.3.8 we found out that Rails was ignoring the "timeout" option Rifkin Habsburg suggested.
After some more investigation we found a possibly related bug in Rails dev: http://dev.rubyonrails.org/ticket/8811. And after some more investigation we found the solution (tested with Rails 2.3.8):
Edit this ActiveRecord file: activerecord-2.3.8/lib/active_record/connection_adapters/sqlite_adapter.rb
Replace this:
def begin_db_transaction #:nodoc:
  catch_schema_changes { @connection.transaction }
end
with
def begin_db_transaction #:nodoc:
  catch_schema_changes { @connection.transaction(:immediate) }
end
And that's all! We haven't noticed a performance drop, and now the app supports many more requests without breaking (it waits for the timeout). SQLite is nice!
bundle exec rake db:reset
It worked for me; it will reset the database and show any pending migrations.
SQLite can allow other processes to wait until the current one finishes.
I use this line to connect when I know I may have multiple processes trying to access the SQLite DB:
import sqlite3
conn = sqlite3.connect('filename', isolation_level='exclusive')
According to the Python Sqlite Documentation:
You can control which kind of BEGIN statements pysqlite implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
I had a similar problem with rake db:migrate. The issue was that the working directory was on an SMB share.
I fixed it by copying the folder over to my local machine.
Most answers are for Rails rather than raw Ruby, and the OP's question IS for Rails, which is fine. :)
So I just want to leave this solution over here in case any raw Ruby user has this problem and is not using a YAML configuration.
After instantiating the connection, you can set it like this:
require "sqlite3"

db = SQLite3::Database.new "#{path_to_your_db}/your_file.db"
db.busy_timeout = 15000 # in ms, meaning it will retry for 15 seconds before it raises an exception.
                        # This can be any number you want. The default value is 0.
Source: this link
-- Open the database
db = sqlite3.open("filename")

-- Ten attempts are made to proceed if the database is locked
function my_busy_handler(attempts_made)
  if attempts_made < 10 then
    return true
  else
    return false
  end
end

-- Set the new busy handler
db:set_busy_handler(my_busy_handler)

-- Use the database
db:exec(...)
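If you'd rather do the same thing from Ruby, here is a rough equivalent sketch using the sqlite3 gem's busy_handler, which keeps retrying while the block returns true and gives up (raising BusyException) once it returns false; the table name is made up for illustration:
require "sqlite3"

db = SQLite3::Database.new("filename.db")

# Retry while the handler returns true; give up once it returns false.
db.busy_handler do |attempts_made|
  attempts_made < 10
end

db.execute("SELECT * FROM some_table") # hypothetical table, just to exercise the handler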
What table is being accessed when the lock is encountered?
Do you have long-running transactions?
Can you figure out which requests were still being processed when the lock was encountered?
Argh - the bane of my existence over the last week. SQLite3 locks the db file when any process writes to the database, i.e. any UPDATE/INSERT type query (and also SELECT COUNT(*) for some reason). However, it handles multiple reads just fine.
So, I finally got frustrated enough to write my own thread-locking code around the database calls. By ensuring that the application can only have one thread writing to the database at any point, I was able to scale to thousands of threads.
And yeah, it's slow as hell. But it's also fast enough and correct, which is a nice property to have.
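The poster didn't share the locking code, but the idea is roughly this (a sketch, not their actual implementation): serialize all writes through a single Mutex while letting reads proceed freely.
# One global lock shared by every thread that writes to SQLite.
DB_WRITE_LOCK = Mutex.new

def with_db_write_lock(&block)
  DB_WRITE_LOCK.synchronize(&block)
end

# Usage: wrap only the writes; plain SELECTs can skip the lock.
with_db_write_lock do
  db.execute("INSERT INTO events (name) VALUES (?)", ["signup"]) # hypothetical table
end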
I found a deadlock in the sqlite3-ruby extension and fixed it here - have a go with it and see if it fixes your problem:
https://github.com/dxj19831029/sqlite3-ruby
I opened a pull request, but there has been no response from them.
Anyway, some busy exceptions are expected, as SQLite itself documents.
Be aware of this condition (from the SQLite documentation on the busy handler):
The presence of a busy handler does not guarantee that it will be invoked when there is lock contention. If SQLite determines that invoking the busy handler could result in a deadlock, it will go ahead and return SQLITE_BUSY or SQLITE_IOERR_BLOCKED instead of invoking the busy handler. Consider a scenario where one process is holding a read lock that it is trying to promote to a reserved lock and a second process is holding a reserved lock that it is trying to promote to an exclusive lock. The first process cannot proceed because it is blocked by the second and the second process cannot proceed because it is blocked by the first. If both processes invoke the busy handlers, neither will make any progress. Therefore, SQLite returns SQLITE_BUSY for the first process, hoping that this will induce the first process to release its read lock and allow the second process to proceed.
If you hit this condition, the timeout no longer applies. To avoid it, don't put a SELECT inside BEGIN/COMMIT, or use an exclusive lock for BEGIN/COMMIT (see the sketch below).
Hope this helps. :)
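For what it's worth, with the sqlite3 gem you can ask for the stronger lock up front when you open the transaction yourself. A minimal sketch (the table and columns are made up for illustration; :immediate/:exclusive are the transaction modes accepted by SQLite3::Database#transaction):
require "sqlite3"

db = SQLite3::Database.new("filename.db")
db.busy_timeout = 5000 # ms

# Taking the write lock at BEGIN time means the busy timeout applies up front,
# instead of a later write inside the transaction failing immediately with SQLITE_BUSY.
db.transaction(:immediate) do
  count = db.get_first_value("SELECT COUNT(*) FROM items") # hypothetical table
  db.execute("INSERT INTO items (name) VALUES (?)", ["widget #{count + 1}"])
end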
This is often a consequence of multiple processes accessing the same database, e.g. if the "allow only one instance" flag was not set in RubyMine.
Try running the following, it may help:
ActiveRecord::Base.connection.execute("BEGIN TRANSACTION; END;")
From: Ruby: SQLite3::BusyException: database is locked:
This may clear up any transaction that is holding up the system.
I believe this happens when a transaction times out. You really should be using a "real" database. Something like Drizzle, or MySQL. Any reason why you prefer SQLite over the two prior options?
Related
I am working on a Rails 5.x application, and I use Postgres as my database.
I often run rake db:migrate on my production servers. Sometimes the migration will add a new column to the database, and this causes some controller actions to crash with the following error:
ActiveRecord::PreparedStatementCacheExpired: ERROR: cached plan must not change result type
This is happening in a critical controller action that needs to have zero downtime, so I need to find a way to prevent this crash from ever happening.
Should I catch the ActiveRecord::PreparedStatementCacheExpired error and retry the save? Or should I add some locking to this particular controller action, so that I don't start serving any new requests while a database migration is running?
What would be the best way to prevent this crash from ever happening again?
I was able to fix this issue in some places by using this retry_on_expired_cache helper:
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  class << self
    # Retry automatically on ActiveRecord::PreparedStatementCacheExpired.
    # (Do not use this for transactions with side-effects unless it is acceptable
    # for these side-effects to occasionally happen twice.)
    def retry_on_expired_cache(*_args)
      retried ||= false
      yield
    rescue ActiveRecord::PreparedStatementCacheExpired
      raise if retried
      retried = true
      retry
    end
  end
end
I would use it like this:
MyModel.retry_on_expired_cache do
  @my_model.save
end
Unfortunately this was like playing "whack-a-mole", because this crash just kept happening all over my application during my rolling deploys (I'm not able to restart all the Rails processes at the same time.)
I finally learned that I can turn off prepared_statements to completely avoid this issue. (See this other question and answers on StackOverflow.)
I was worried about the performance penalty, but I found many reports from people who had set prepared_statements: false, and they hadn't noticed any problems. e.g. https://news.ycombinator.com/item?id=7264171
I created a file at config/initializers/disable_prepared_statements.rb:
db_configuration = ActiveRecord::Base.configurations[Rails.env]
db_configuration.merge!('prepared_statements' => false)
ActiveRecord::Base.establish_connection(db_configuration)
This allows me to continue setting the database configuration from the DATABASE_URL env variable, and 'prepared_statements' => false will be injected into the configuration.
This completely solves the ActiveRecord::PreparedStatementCacheExpired errors and makes it much easier to achieve high-availability for my service while still being able to modify the database.
Sporadically we get PG::UndefinedTable errors while using ActiveRecord. The association table name is somehow corrupted, and I quite often see Cancelled appended to the end of the table name.
E.g:
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "fooCancell" does not exist
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "Cancelled" does not exist
ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation "barC" does not exist
In the example above, I have obfuscated the table name by using foo and bar.
We see these errors when the Rails project is running inside Puma. Queue workers seem to be doing okay.
The tables in the error messages don't correspond to real tables or models. It looks like a case of memory corruption. Has anyone seen such issues? If so, how did you get around it?
puma.rb
on_worker_boot do
  ActiveRecord::Base.establish_connection
end
database.yml
production:
  url: <%= ENV["DATABASE_URL"] %>
  pool: <%= ENV['DB_CONNECTION_POOL_SIZE'] || 5 %>
  reaping_frequency: <%= ENV['DB_CONNECTION_REAPING_FREQUENCY'] || 10 %>
  prepared_statements: false
I'm hazarding a guess here, based on this possibly related error...
But you might be either:
calling fork within your application; OR
calling ActiveRecord routines (using database calls) before the server (Puma) forks its worker processes (during app initialization).
Either of these will break ActiveRecord's synchronization and cause multiple processes to share the database connection pool without synchronizing its use (resulting in interleaved and corrupt database commands).
If you are using fork, make sure to close all the ActiveRecord database connections and reinitialize the connection pool (there's a method that does it, but I don't remember it off the top of my head - probably ActiveRecord::Base.connection_pool.disconnect!).
Otherwise, before Puma forks (either during the initialization process or using Puma's fork hooks), close all the ActiveRecord database connections and reinitialize the connection pool in each worker.
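For Puma in clustered mode, the usual shape of that looks roughly like the following (a sketch using Puma's standard before_fork / on_worker_boot hooks, not taken from the question's config; the worker and thread counts are placeholders):
# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count

before_fork do
  # Drop the master process's connections so children don't inherit live PG sockets.
  ActiveRecord::Base.connection_pool.disconnect!
end

on_worker_boot do
  # Each worker opens its own, private connections.
  ActiveRecord::Base.establish_connection
end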
It looks like reaping_frequency may be the issue. I found a couple of claims that it may have a threading bug. I would try removing that option or setting it to nil and see if that works. The only other thing I can think of is if you are manually calling Thread.new and using ActiveRecord within it.
Here are the few claims against reaping:
http://omegadelta.net/2014/03/15/the-rails-grim-reaper/
https://github.com/mperham/sidekiq/issues/1936
Search for "DO fear the Reaper" here:
https://www.google.com/amp/s/bibwild.wordpress.com/2014/07/17/activerecord-concurrency-in-rails4-avoid-leaked-connections/amp/
I am developing a Rails app for network automation. Part of the app consists of logic to run operations; the other part is the operations themselves. An operation is simply a Ruby class that performs several commands on a network device (router, switch, etc.).
Right now, operations are simply part of the Rails app repo. But in order to make the development process more agile, I would like to decouple the app and the operations. I would have 2 repos - one for the app and one for the operations. App deploys would follow the standard procedure, but operations would sync every time something is pushed to master. And what is more important, I don't want to restart the app after an operations repo update.
So my question is:
How do I exclude several classes (or namespaces) from being cached in a production Rails app - I mean, every time I call such a class it should be re-read from its file on disk? What could be the potential dangers of doing so?
Some code example:
# Example operation - I would like to add or modify such classes without restarting the app
class FooOperation < BaseOperation
  def perform(host)
    conn = new_connection(host) # method from BaseOperation
    result = conn.execute("foo")
    if result =~ /Error/
      # retry, it's a known bug in device foo
      conn.execute("foo")
    else
      conn.exit
      return success # method from BaseOperation
    end
  end
end
# somewhere in the admin panel I would do:
o = Operation.create(name: "Foo", class_name: "Foo")
o.id # => 123 # for the next example
# Ruby worker which actually runs an operation
class OperationWorker
  def perform(operation_id, host)
    operation = Operation.find(operation_id)
    # here, every time I load this I want Ruby to search for the implementation on the filesystem, never cache
    klass = operation.class_name.constantize
    klass.new(host).perform
  end
end
I think you have quite a misunderstanding about how Ruby code loading and interpretation works!
The fact that Rails reloads classes at development time is kind of a "hack" to let you iterate on the code while the server has already loaded, parsed and executed parts of your application.
In order to do so, it has to implement quite some magic to unload your code and reload parts of it on change.
So if you want to have up-to-date code when executing an "operation", you are probably best off spawning a new process. That will guarantee that your new code is read and parsed properly when executed with a blank state.
Another thing you can do is use load instead of require, because load will actually re-read the source on subsequent calls. Keep in mind that subsequent calls to load just add to the code already in the Ruby VM, so you need to make sure that every change is compatible with the code that has already been loaded.
This could be circumvented by some clever instance_eval tricks, but I'm not sure that is what you want...
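A minimal sketch of the load-based approach, assuming each operation lives in its own file under a directory the operations repo syncs to (the path, file naming, and "#{class_name}Operation" convention are made up for illustration):
class OperationWorker
  OPERATIONS_PATH = Rails.root.join("operations") # assumed location of the synced operations repo

  def perform(operation_id, host)
    operation = Operation.find(operation_id)

    # Re-read the source on every run instead of relying on require/autoload caching.
    # Note: this redefines the class in place, so each change must stay compatible
    # with whatever is already loaded in this process.
    load OPERATIONS_PATH.join("#{operation.class_name.underscore}_operation.rb").to_s

    klass = "#{operation.class_name}Operation".constantize
    klass.new(host).perform
  end
end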
In my app I have several Builder classes that are responsible for taking data received from an external API request and building/saving resources to the database. I'm dealing with a large amount of data and have implemented the Parallel gem to speed this up by using multiple processes.
However, I'm finding that any test for a method that uses Parallel fails with the same error:
ActiveRecord::StatementInvalid:
PG::ConnectionBad: PQconsumeInput() server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Here is an example of the code being tested:
class AirportBuilder < Resource
  def build_from_collection
    Parallel.each(object_producer, in_processes: 24) do |params|
      instance = Airport.find_or_initialize_by(fsid: params[:fs])
      build!(instance, params)
    end
  end
end
I've done some searching on this but all the results in Google have to do with using multiple threads/processes to make the test suite run faster, which is a different problem.
Any ideas on how I can test this effectively without causing the PG error? I realize I may need to stub something out but am not quite sure what to stub and still have a meaningful test.
Thanks in advance to anyone who might be able to help!
Are you using more database connections than are configured for your test database? Maybe try setting the pool size equal to the needs of your script (which looks like 24)?
test:
  adapter: whatever
  host: whatever
  username: whatever
  password: whatever
  database: whatever
  pool: 24
Heads up that you may also want to do some math on the default ActiveRecord connection pool. Some good info in this Heroku dev center article.
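If the pool change alone doesn't help, one pattern commonly suggested for forked workers (a guess about the cause, not part of the answer above) is to make sure the parent process's PG connections aren't carried across the fork:
class AirportBuilder < Resource
  def build_from_collection
    # Drop the parent's connections so forked children don't share a live PG socket.
    ActiveRecord::Base.connection_pool.disconnect!

    Parallel.each(object_producer, in_processes: 24) do |params|
      # Each forked process lazily checks out its own fresh connection.
      instance = Airport.find_or_initialize_by(fsid: params[:fs])
      build!(instance, params)
    end

    # Re-establish a connection in the parent once the forked work is done.
    ActiveRecord::Base.establish_connection
  end
end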
When a new resource is created and it needs to do some lengthy processing before the resource is ready, how do I send that processing away into the background where it won't hold up the current request or other traffic to my web-app?
in my model:
class User < ActiveRecord::Base
  after_save :background_check

  protected

  def background_check
    # check through a list of 10000000000001 mil different
    # databases that takes approx one hour :)
    if check_for_record_in_www(self.username)
      # code that is run after the 1 hour process is finished.
      self.update_attribute(:has_record, true)
    end
  end
end
You should definitely check out the following Railscasts:
http://railscasts.com/episodes/127-rake-in-background
http://railscasts.com/episodes/128-starling-and-workling
http://railscasts.com/episodes/129-custom-daemon
http://railscasts.com/episodes/366-sidekiq
They explain how to run background processes in Rails in every possible way (with or without a queue ...)
I've just been experimenting with the delayed_job gem because it works with the Heroku hosting platform and it was ridiculously easy to set up!!
Add the gem to your Gemfile, bundle install, rails g delayed_job, rake db:migrate
Then start a queue handler with;
RAILS_ENV=production script/delayed_job start
Where you have a method call which is your lengthy process, i.e.
company.send_mail_to_all_users
you change it to:
company.delay.send_mail_to_all_users
Check the full docs on github: https://github.com/collectiveidea/delayed_job
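Applied to the original question's callback, that might look like this (a sketch; it assumes delayed_job is installed and that check_for_record_in_www and the has_record column exist as in the question):
class User < ActiveRecord::Base
  after_save :enqueue_background_check

  def background_check
    # The slow, hour-long lookup now runs inside the delayed_job worker process.
    # update_column skips callbacks, so saving the result doesn't re-enqueue the job.
    update_column(:has_record, true) if check_for_record_in_www(username)
  end

  private

  def enqueue_background_check
    delay.background_check # delayed_job's Object#delay proxy enqueues the method call
  end
end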
Start a separate process, which is probably most easily done with system, prepending a 'nohup' and appending an '&' to the end of the command you pass it. (Make sure the command is just one string argument, not a list of arguments.)
There are several reasons you want to do it this way, rather than, say, trying to use threads:
Ruby's threads can be a bit tricky when it comes to doing I/O; you have to take care that some things you do don't cause the entire process to block.
If you run a program with a different name, it's easily identifiable in ps, so you don't accidentally think it's a FastCGI back-end gone wild or something, and kill it.
Really, the process you start should be daemonized; see the Daemonize class for help.
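Something along these lines (a sketch; the script name is made up to match the question, and the argument is shell-escaped since the whole command is one string):
require "shellwords"

def background_check
  # One shell string (not an argument list) so nohup and & are interpreted by the shell;
  # the child process keeps running after this request returns.
  cmd = "nohup ruby script/check_for_record_in_www.rb #{Shellwords.escape(username)} > /dev/null 2>&1 &"
  system(cmd)
end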
You ideally want to use an existing background job server rather than writing your own. These will typically let you submit a job and get back a unique key; you can then use the key to periodically query the job server for the status of your job without blocking your webapp. Here is a nice roundup of the various options out there.
I like to use BackgrounDRb; it's nice because it allows you to communicate with it during long processes, so you can have status updates in your Rails app.
I think spawn is a great way to fork your process, do some processing in the background, and show the user just some confirmation that this processing was started.
What about:
def background_check
  exec("script/runner check_for_record_in_www.rb #{self.username}") if fork == nil
end
The program "check_for_record_in_www.rb" will then run in another process and will have access to ActiveRecord, being able to access the database.