Custom busy handler for Rails SQLite3 - ruby-on-rails

I have a Rails/SQLite3 setup which is getting exceptions
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked):
Now, I understand that SQLite3 cannot do parallel writes; I'd be happy for a write to block until the other write has finished (i.e. for writes to be serialised) rather than raising. Apparently this is something to do with the default SQLite3 busy handler, written in C, falling foul of Ruby's GIL. The fix, according to this post, is to install one's own "busy handler" on the SQLite3 connection, so I have this code in a Rails initializer:
# config/initializers/sqlite3.rb
if ActiveRecord::Base.connection.adapter_name == 'SQLite'
  if raw_connection = ActiveRecord::Base.connection.raw_connection
    puts 'installing busy handler'
    raw_connection.busy_handler do |count|
      puts 'QUACK'
    end
    puts 'done'
  else
    raise RuntimeError, 'no DB raw connection!'
  end
end
On starting Rails (Puma in the console) I get the expected
installing busy handler
done
but on running parallel requests I get the exception with no QUACK, i.e., it seems that my handler has not been called. Is this a misunderstanding of the scope of initializers? Is there a "correct" way to get this handler installed?
(Obviously the real handler will not just say QUACK, but I found that my "real" handler had no effect, so I have replaced it with a debugging version.)
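(For reference, the kind of thing the real handler does is sleep and ask SQLite to retry. This is only a sketch with arbitrary timings, relying on the sqlite3 gem's convention that a truthy return value from the block means "retry" and false/nil means "give up and raise SQLite3::BusyException".)
raw_connection.busy_handler do |count|
  if count < 20
    sleep 0.05 * (count + 1) # back off a little more on each retry
    true                     # truthy => retry the locked operation
  else
    false                    # false/nil => give up; SQLite3::BusyException is raised
  end
end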

Related

How to mark a sidekiq task/job for retry without raising an error?

I use a Sidekiq queue to process communications with an unreliable, third-party API. Since this API is often down for a couple of minutes at a time and then back up again, Sidekiq has been handy. When a connection issue happens, an error is raised and Sidekiq throws the job back in the queue to be retried again later, after some time has passed.
I use NewRelic not only to help debug crashes, but also for monitoring. My problem is that the methodology above creates errors in NewRelic. If the third-party API is down for more than a couple of minutes, the error count accumulates enough to trigger notifications from NewRelic.
What I'd like to do is only raise an error from my worker when a certain number of retries have occurred for a job. I'm using sidekiq_retries_exhausted to do this. My problem is that I'm not quite sure how to put jobs back in the queue after they have an error without raising an error.
Does Sidekiq provide any facilities to return a job to a queue, increment the number of retries for the job, and have it sit there until it's due to run again, as if an exception was raised in the worker class?
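(For context, a minimal sketch of the sidekiq_retries_exhausted hook referred to above; the worker and logging here are placeholders, not my real code.)
class ApiCallWorker
  include Sidekiq::Worker

  # Runs once Sidekiq has used up all retries for a job.
  sidekiq_retries_exhausted do |msg, ex|
    Rails.logger.error "Giving up on #{msg['class']} #{msg['jid']}: #{ex.message}"
  end

  def perform(payload_id)
    # talk to the unreliable third-party API here
  end
end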
You raise a specific error and tell the error service to ignore errors of that type. For NewRelic:
https://docs.newrelic.com/docs/agents/ruby-agent/installation-configuration/ruby-agent-configuration#error_collector.ignore_errors
Here is what I did to keep intentional retry errors out of Airbrake:
class TaskWorker
  include Sidekiq::Worker

  class RetryNotAnError < RuntimeError
  end

  def perform(task_id)
    task = Task.find(task_id)
    task.do_cool_stuff

    if task.finished?
      @log.debug "Task #{task_id} was successful."
      return false
    else
      @log.debug "Task #{task_id} will try again later."
      raise RetryNotAnError, task_id
    end
  end
end
Tell Airbrake to ignore it:
Airbrake.configure do |config|
  config.ignore << 'RetryNotAnError'
end
It's good to make your exception name OBVIOUSLY not an error (e.g. RetryLaterNotAnError), as it will still show up in logs and such, and you don't want to freak people out when they see a bunch of them.
P.S. That said, I would really like to see Sidekiq provide an explicit, errorless retry mechanism.
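(Until it does, one workaround, offered only as a sketch and not as part of the answer above, is to reschedule the job yourself with perform_in and return without raising; you then have to carry the attempt count in the job arguments yourself.)
class TaskWorker
  include Sidekiq::Worker

  MAX_ATTEMPTS = 10 # arbitrary cap for this sketch

  def perform(task_id, attempt = 0)
    task = Task.find(task_id)
    task.do_cool_stuff
    return if task.finished?

    if attempt < MAX_ATTEMPTS
      # Re-enqueue ourselves without raising, so no error is recorded.
      self.class.perform_in(30.seconds, task_id, attempt + 1)
    else
      raise "Task #{task_id} still unfinished after #{MAX_ATTEMPTS} attempts"
    end
  end
end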
If using Sidekiq Enterprise, one other option might be to utilize the optional set of additional error types that will then get treated as Sidekiq::Limiter::OverLimit violations.
For my purposes, I've used a new error class and then added it to the list in the config. Here are the notes from the sidekiq-ent code (not in the public sidekiq repo) on how to modify your config file:
# An optional set of additional error types which would be
# treated as a rate limit violation, so the job would automatically
# be rescheduled as with Sidekiq::Limiter::OverLimit.
#
# Sidekiq::Limiter.errors << MyApp::TooMuch
# Sidekiq::Limiter.errors = [Foo::Error, MyApp::Limited]
Inside the specific job you can specify the max_retries, or it will default to 20:
sidekiq_options max_limiter_retries: 10
Inside the job, I'll rescue the "expected" intermittent error that I'd rather not ignore completely and then raise the error I've added to the list, something like this:
rescue RestClient::RequestTimeout => e
  raise SidekiqSoftRetry.new(e.inspect)
end
Here's what that looks like in my initialization file; Mike Perham was kind enough to respond with the option to update the global retry limit.
class SidekiqSoftRetry < RuntimeError
end

Sidekiq::Limiter::DEFAULT_OPTIONS[:reschedule] = 10

Sidekiq::Limiter.configure do |config|
  config.errors.concat(
    [
      SidekiqSoftRetry,
    ]
  )
end

Threading error when using `ActiveRecord with_connection do` & ActionController::Live

Major edit: Since originally finding this issue I have whittled it down to the below. I think this is now a marginally more precise description of the problem. Comments on the OP may therefore not correlate entirely.
Edit: a lightly modified version has been posted in the rails and puma projects: https://github.com/rails/rails/issues/21209, https://github.com/puma/puma/issues/758
Edit: now reproduced with OS X and Rainbows
Summary: When using Puma and running long-running connections I am consistently receiving errors related to ActiveRecord connections crossing threads. This manifests itself in messages like message type 0x## arrived from server while idle and a locked (crashed) server.
The set up:
Ubuntu 15 / OSX Yosemite
PostgreSQL (9.4) / MySQL (mysqld 5.6.25-0ubuntu0.15.04.1)
Ruby - MRI 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux] / Rubinius rbx-2.5.8
Rails (4.2.3, 4.2.1)
Puma (2.12.2, 2.11)
pg (pg-0.18.2) / mysql2
Note, not all combinations of the above versions have been tried. First listed version is what I'm currently testing against.
rails new issue-test
Add a route get 'events' => 'streaming#events'
Add a controller streaming_controller.rb
Set up database stuff (pool: 2, but seen with different pool sizes)
Code:
class StreamingController < ApplicationController
  include ActionController::Live

  def events
    begin
      response.headers["Content-Type"] = "text/event-stream"
      sse = SSE.new(response.stream)
      sse.write({ :data => 'starting' }, { :event => :version_heartbeat })
      ActiveRecord::Base.connection_pool.release_connection
      while true do
        ActiveRecord::Base.connection_pool.with_connection do |conn|
          ActiveRecord::Base.connection.query_cache.clear
          logger.info 'START'
          conn.execute 'SELECT pg_sleep(3)'
          logger.info 'FINISH'
          sse.write({ :data => 'continuing' }, { :event => :version_heartbeat })
          sleep 0.5
        end
      end
    rescue IOError
    rescue ClientDisconnected
    ensure
      logger.info 'Ensuring event stream is closed'
      sse.close
    end
    render nothing: true
  end
end
Puma configuration:
workers 1
threads 2, 2
#...
bind "tcp://0.0.0.0:9292"
#...
activate_control_app

on_worker_boot do
  require "active_record"
  ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
  ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[rails_env])
end
Run the server puma -e production -C path/to/puma/config/production.rb
Test script:
#!/bin/bash
timeout 30 curl -vS http://0.0.0.0:9292/events &
timeout 5 curl -vS http://0.0.0.0:9292/events &
timeout 30 curl -vS http://0.0.0.0:9292/events
This reasonably consistently results in a complete lock of the application server (in PostgreSQL, see notes). The scary message comes from libpq:
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x54 arrived from server while idle
In the 'real-world' I have quite a few extra elements and the issue presents itself at random. My research indicates that this message comes from libpq and is subtext for 'communication problem, possibly using connection in different threads'. Finally, while writing this up, I had the server lock up without a single message in any log.
So, the question(s):
Is the pattern I'm following not legal in some way? What have I mis[sed|understood]?
What is the 'standard' for working with database connections here that should avoid these problems?
Can you see a way to reliably reproduce this?
or
What is the underlying issue here and how can I solve it?
MySQL
If running MySQL, the message is a bit different, and the application recovers (though I'm not sure if it is then in some undefined state):
F, [2015-07-30T14:12:07.078215 #15606] FATAL -- :
ActiveRecord::StatementInvalid (Mysql2::Error: This connection is in use by: #<Thread:0x007f563b2faa88@/home/dev/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.2.3/lib/action_controller/metal/live.rb:269 sleep>: SELECT `tasks`.* FROM `tasks` ORDER BY `tasks`.`id` ASC LIMIT 1):
Warning: read 'answer' as 'seems to make a difference'
I don't see the issue happen if I change the controller block to look like:
begin
  #...
  while true do
    t = Thread.new do #<<<<<<<<<<<<<<<<<
      ActiveRecord::Base.connection_pool.with_connection do |conn|
        #...
      end
    end
    t.join #<<<<<<<<<<<<<<<<<
  end
  #...
rescue IOError
  #...
But I don't know whether this has actually solved the problem or just made it extremely unlikely. Nor can I really fathom why this would make a difference.
Posting this as a solution in case it helps, but still digging on the issue.

How can I ensure an operation runs before Rails exits, without using `at_exit`?

I have an operation that I need to execute in my Rails application just before the app dies. Is there a hook I can utilize in Rails for this? Something similar to at_exit, I guess.
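(For reference, the plain-Ruby hook being alluded to is simply:)
at_exit do
  puts 'interpreter is exiting'
end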
Ruby itself supports two hooks, BEGIN and END, which are run at the start of a script and as the interpreter stops running it.
See "What does Ruby's BEGIN do?" for more information.
The BEGIN documentation says:
Designates, via code block, code to be executed unconditionally before sequential execution of the program begins. Sometimes used to simulate forward references to methods.
puts times_3(gets.to_i)

BEGIN {
  def times_3(n)
    n * 3
  end
}
The END documentation says:
Designates, via code block, code to be executed just prior to program termination.
END { puts "Bye!" }
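Applied to a Rails app, you could put an END block in an initializer. This is only a sketch (the file name is arbitrary), and it runs with the same timing caveats as at_exit:
# config/initializers/shutdown_hook.rb
END {
  Rails.logger.info "Rails process #{Process.pid} is shutting down"
}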
Okay, so I am making no guarantees as to impact because I have not tested this at all, but you could define your own hook, e.g.:
ObjectSpace.define_finalizer(YOUR_RAILS_APP::Application, proc {puts "exiting now"})
Note this will execute after at_exit, so the Rails application server output will look like:
Stopping ...
Exiting
exiting now
With the Tin Man's solution included:
ObjectSpace.define_finalizer(YOUR_RAILS_APP::Application, proc {puts "exiting now"})
END { puts "exiting again" }
Output is
Stopping ...
Exiting
exiting again
exiting now

What kind of exception is raised by ActiveRecord::lock!?

I am using lock! in my code and want to catch the exception thrown if lock! fails for some reason (e.g. it cannot get the lock). What kind of exceptions can lock! throw? I checked the Ruby docs but couldn't find the specific exception classes.
Thanks.
When in doubt, probe.
Consider the following pair of functions:
def long_hold
  ActiveRecord::Base.transaction do
    u = User.find(220)
    u.lock!
    sleep 100.seconds
    u.email = "foo@bar.com"
    u.save!
  end
end

def short_hold
  ActiveRecord::Base.transaction do
    u = User.find(220)
    u.lock!
    u.email = "foo@bar.com"
    u.save!
  end
end
In my setup (OSX 10.11, ruby 2.2.4, rails 4.2, postgres 9.5), running long_hold in one rails console and then running short_hold in a second console, I observe short_hold blocks until long_hold completes; moreover, instrumenting the code with puts, we see that while long_hold is sleeping, short_hold is waiting to acquire the lock.
Assuming no caveats about the independence of rails consoles, this suggests that no exceptions are thrown if a second process tries to lock a row that is already locked, but that process blocks until the first completes.
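If you want an exception rather than blocking, one option (assuming PostgreSQL) is to pass an explicit lock clause; the failure then surfaces as ActiveRecord::StatementInvalid wrapping the driver's lock-not-available error. A sketch:
def short_hold_nowait
  ActiveRecord::Base.transaction do
    u = User.find(220)
    # Raises immediately if another transaction already holds the row lock,
    # instead of waiting for it to be released.
    u.lock!("FOR UPDATE NOWAIT")
    u.email = "foo@bar.com"
    u.save!
  end
rescue ActiveRecord::StatementInvalid => e
  puts "could not acquire lock: #{e.message}"
end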
Here is the source for that locking call. It calls reload and its source looks like this:
# File lib/active_record/base.rb, line 2333
2333:       def reload(options = nil)
2334:         clear_aggregation_cache
2335:         clear_association_cache
2336:         @attributes.update(self.class.find(self.id, options).instance_variable_get('@attributes'))
2337:         @attributes_cache = {}
2338:         self
2339:       end
So when you call reload(:lock => lock), as the call to lock! does, it really is updating the attributes of that record.
There are a lot of different situations here. You could try to lock a record that doesn't exist, or lock one that has been locked elsewhere. What error are you interested in catching?
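For example (just a sketch, with an arbitrary id), those two situations surface quite differently:
begin
  user = User.find(123) # raises ActiveRecord::RecordNotFound if no such row
  user.lock!            # by default blocks until any competing lock is released
rescue ActiveRecord::RecordNotFound => e
  puts "no such record: #{e.message}"
rescue ActiveRecord::StatementInvalid => e
  puts "locking failed at the database level: #{e.message}"
end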

ActiveRecord::StatementInvalid when process receives SIGTERM?

In my Rails app, I have a script that updates some records in the database. When I send a SIGTERM to kill the script, it occasionally receives that signal while ActiveRecord is executing a query. This leads to an ActiveRecord::StatementInvalid exception being raised.
I'd like to catch StatementInvalid exceptions when they're the result of a SIGTERM and exit the script. How can I tell that a StatementInvalid is occurring because of a signal and not for some other reason?
If you trap the TERM signal, I believe you will avoid the exception. You can do this at the beginning of your script (or really anywhere for that matter, but you only need to do it once).
Signal.trap("TERM") do
Kernel.exit!
end
The reason you get the StatementInvalid error is that Ruby handles the signal by raising a SignalException at the point of current execution. ActiveRecord catches that exception and rethrows it as StatementInvalid. By setting a signal handler, Ruby will execute your handler instead of raising the exception.
See the Ruby Signal documentation for more information.
It sounds like this "script" is external to the Rails app (script/runner or similar?), so perhaps you can decouple the "signal handler" from the "worker"? E.g. can you fork a child process/thread/fiber/... to do the database update, and send the signal to the parent to indicate "stop now"? Of course the parent will then have to "signal" the child to stop using some appropriate mechanism (not SIGTERM ;-)).
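(A rough sketch of that decoupling idea, with a hypothetical model and batch logic: the parent traps TERM and tells the child to stop over a pipe, so no query is ever interrupted mid-flight.)
reader, writer = IO.pipe

child_pid = fork do
  writer.close
  loop do
    # One bounded unit of database work per iteration.
    Record.where(processed: false).limit(100).each(&:process!)
    # Stop cleanly if the parent has asked us to (non-blocking check).
    break if IO.select([reader], nil, nil, 0)
  end
end

reader.close
Signal.trap("TERM") { writer.write("stop\n") } # the classic self-pipe trick
Process.wait(child_pid)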
This is not an exact answer to the OP; however, you can control the exit point: the program will exit only after reaching the exit point defined by you.
time_to_die = false

# Prevent abrupt stopping of the daemon.
Signal.trap("TERM") { time_to_die = true }

def exit_gracefully
  # Cleaning up..
  $log.log "#{Time.now} TERM signal received. Exiting.."
  $db.close
  exit
end

loop do
  # ...
  exit_gracefully if time_to_die
  # ...
end
