How to prevent retrying for some Exception/Error on sidekiq - ruby-on-rails

I have a sidekiq worker which will request 3rd party api(Mailchimp) and got some response. Sometimes it will response an error message which the api gem will raise an error.
However, those Errors are normal and no need to retry. So I would like Sidekiq prevent retry when those Errors raised.
I have tried a simply rescue, but it won't prevent the sidekiq capture the error raised.
def preform(id)
UpdateMailchimpService.new.(id)
rescue
Mailchimp::ListInvalidBounceMemberError
end
Any way to do this? Thanks
Update
Finally found that my problem was caused by the broken of our deploy tool(deployment failed but not realised). Actually, the Sidekiq will ignore any rescued error/exception and they are not be retried and reported to Bugsnag.
In Bugsnag's documentation, it clearly said:
Bugsnag should be installed and configured, and any unhandled exceptions will be automatically detected and should appear in your Bugsnag dashboard.
This post on github didn't have an clear explanation so that's why I am confused by this question.

Your assumption/example is incorrect. Do the normal Ruby thing: rescue the error and ignore it.
def perform(id)
UpdateMailchimpService.new.(id)
rescue NormalError
# job will succeed normally and Sidekiq won't retry it.
end

Use retry: false advanced option:
class UpdateMailchimpWorker
include Sidekiq::Worker
sidekiq_options retry: false # ⇐ HERE
def perform(id)
UpdateMailchimpService.new.(id)
end
end

Related

Custom busy handler for Rails SQLite3

I have a Rails/SQLite3 setup which is getting exceptions
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked):
Now I understand that SQLite3 cannot do parallel writes, I'd be happy if they were blocked until the other write was finished, so done serially, rather than raising. This is something to do with default SQLite3 handler in C falling foul of Ruby's GIL, apparently. The fix, according to this post, is to install one's own "busy handler" for the SQLite3 connection, and I have this code in a Rails initializer
# config/initializers/sqlite3.rb
if ActiveRecord::Base.connection.adapter_name == 'SQLite' then
if raw_connection = ActiveRecord::Base.connection.raw_connection then
puts 'installing busy handler'
raw_connection.busy_handler do |count|
puts 'QUACK'
end
puts 'done'
else
raise RuntimeError 'no DB raw connection!'
end
end
On starting Rails (Puma in the console) I get the expected
installing busy handler
done
but on running parallel requests I get the exception with no QUACK, i.e., it seems that my handler has not been called. Is this a misunderstanding of the scope of initializers? Is there a "correct" way to get this handler installed?
(Obviously the real handler will not just say QUACK, but I find that my "real" handler has no effect so replace it by a debugging version.)

How to mark a sidekiq task/job for retry without raising an error?

I use a Sidekiq queue to process communications with an unreliable, 3rd party API. Since this API is often down for a couple minutes at a time and then back up again, Sidekiq has been handy. When a connection issue happens, an error is raised and Sidekiq throws the job back in the queue to be retried again later, after some time has passed.
I use NewRelic to not only help debug crashes, but also for monitoring. My problem is that this current methodology above creates errors in NewRelic. If the 3rd party API is down for more than a couple of minutes, the error count accumulates enough to cause notifications to send out through NewRelic.
What I'd like to do is only raise an error from my worker when a certain number of retries have occurred for a job. I'm using sidekiq_retries_exhausted to do this. My problem is that I'm not quite sure how to put jobs back in the queue after they have an error without raising an error.
Does Sidekiq provide any facilities to return a job to a queue, increment the number of retries for the job, and have it sit there until it's due to run again, as if an exception was raised in the worker class?
You raise a specific error and tell the error service to ignore errors of that type. For NewRelic:
https://docs.newrelic.com/docs/agents/ruby-agent/installation-configuration/ruby-agent-configuration#error_collector.ignore_errors
Here is what I did to keep intentional retry errors out of AirBrake:
class TaskWorker
include Sidekiq::Worker
class RetryNotAnError < RuntimeError
end
def perform task_id
task = Task.find(task_id)
task.do_cool_stuff
if task.finished?
#log.debug "Task #{task_id} was successful."
return false
else
#log.debug "Task #{task_id} will try again later."
raise RetryNotAnError, task_id
end
end
end
Tell Airbrake to ignore it:
Airbrake.configure do |config|
config.ignore << 'RetryNotAnError'
end
It's good to make your exception name OBVIOUSLY not an error (e.g. RetryLaterNotAnError), as it will still show up in logs and such, and you don't want to freak people out when they see a bunch of them.
ps. That said, I would really like to see Sidekiq to provide an explicit, errorless retry mechanism.
If using Sidekiq Enterprise, one other option might be to utilize the optional set of additional error types that will then get treated as Sidekiq::Limiter::OverLimit violations.
For my purposes, I've used a new error class and then added it to the list in the config. Here are the notes from the sidekiq-ent code (not in the public sidekiq repo) on how to modify your config file:
# An optional set of additional error types which would be
# treated as a rate limit violation, so the job would automatically
# be rescheduled as with Sidekiq::Limiter::OverLimit.
#
# Sidekiq::Limiter.errors << MyApp::TooMuch
# Sidekiq::Limiter.errors = [Foo::Error, MyApp::Limited]
Inside the specific job you can specify the max_retries, or it will default to 20:
sidekiq_options max_limiter_retries: 10
Inside the job, I'll rescue the "expected" intermittent error that I'd rather not ignore completely and then raise the error I've added to the list, something like this:
rescue RestClient::RequestTimeout => e
raise SidekiqSoftRetry.new(e.inspect)
end
Here's what that looks like in my initialization file-- and Mike Perham was kind enough to respond with the option to update the global retry limit.
class SidekiqSoftRetry < RuntimeError
end
Sidekiq::Limiter::DEFAULT_OPTIONS[:reschedule] = 10
Sidekiq::Limiter.configure do |config|
config.errors.concat(
[
SidekiqSoftRetry,
]
)
end

Threading error when using `ActiveRecord with_connection do` & ActionController::Live

Major edit: Since originally finding this issue I have whittled it down to the below. I think this is now a marginally more precise description of the problem. Comments on the OP may therefore not correlate entirely.
Edit lightly modified version posted in rails/puma projects: https://github.com/rails/rails/issues/21209, https://github.com/puma/puma/issues/758
Edit Now reproduced with OS X and Rainbows
Summary: When using Puma and running long-running connections I am consistently receiving errors related to ActiveRecord connections crossing threads. This manifests itself in message like message type 0x## arrived from server while idle and a locked (crashed) server.
The set up:
Ubuntu 15 / OSX Yosemite
PostgreSQL (9.4) / MySQL (mysqld 5.6.25-0ubuntu0.15.04.1)
Ruby - MRI 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux] / Rubinius rbx-2.5.8
Rails (4.2.3, 4.2.1)
Puma (2.12.2, 2.11)
pg (pg-0.18.2) / mysql2
Note, not all combinations of the above versions have been tried. First listed version is what I'm currently testing against.
rails new issue-test
Add a route get 'events' => 'streaming#events'
Add a controller streaming_controller.rb
Set up database stuff (pool: 2, but seen with different pool sizes)
Code:
class StreamingController < ApplicationController
include ActionController::Live
def events
begin
response.headers["Content-Type"] = "text/event-stream"
sse = SSE.new(response.stream)
sse.write( {:data => 'starting'} , {:event => :version_heartbeat})
ActiveRecord::Base.connection_pool.release_connection
while true do
ActiveRecord::Base.connection_pool.with_connection do |conn|
ActiveRecord::Base.connection.query_cache.clear
logger.info 'START'
conn.execute 'SELECT pg_sleep(3)'
logger.info 'FINISH'
sse.write( {:data => 'continuing'}, {:event => :version_heartbeat})
sleep 0.5
end
end
rescue IOError
rescue ClientDisconnected
ensure
logger.info 'Ensuring event stream is closed'
sse.close
end
render nothing: true
end
end
Puma configuration:
workers 1
threads 2, 2
#...
bind "tcp://0.0.0.0:9292"
#...
activate_control_app
on_worker_boot do
require "active_record"
ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[rails_env])
end
Run the server puma -e production -C path/to/puma/config/production.rb
Test script:
#!/bin/bash
timeout 30 curl -vS http://0.0.0.0/events &
timeout 5 curl -vS http://0.0.0.0/events &
timeout 30 curl -vS http://0.0.0.0/events
This reasonably consistently results in a complete lock of the application server (in PostgreSQL, see notes). The scary message comes from libpq:
message type 0x44 arrived from server while idle
message type 0x43 arrived from server while idle
message type 0x5a arrived from server while idle
message type 0x54 arrived from server while idle
In the 'real-world' I have quite a few extra elements and the issue presents itself at random. My research indicates that this message comes from libpq and is subtext for 'communication problem, possibly using connection in different threads'. Finally, while writing this up, I had the server lock up without a single message in any log.
So, the question(s):
Is the pattern I'm following not legal in some way? What have I mis[sed|understood]?
What is the 'standard' for working with database connections here that should avoid these problems?
Can you see a way to reliably reproduce this?
or
What is the underlying issue here and how can I solve it?
MySQL
If running MySQL, the message is a bit different, and the application recovers (though I'm not sure if it is then in some undefined state):
F, [2015-07-30T14:12:07.078215 #15606] FATAL -- :
ActiveRecord::StatementInvalid (Mysql2::Error: This connection is in use by: #<Thread:0x007f563b2faa88#/home/dev/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/actionpack-4.2.3/lib/action_controller/metal/live.rb:269 sleep>: SELECT `tasks`.* FROM `tasks` ORDER BY `tasks`.`id` ASC LIMIT 1):
Warning: read 'answer' as 'seems to make a difference'
I don't see the issue happen if I change the controller block to look like:
begin
#...
while true do
t = Thread.new do #<<<<<<<<<<<<<<<<<
ActiveRecord::Base.connection_pool.with_connection do |conn|
#...
end
end
t.join #<<<<<<<<<<<<<<<<<
end
#...
rescue IOError
#...
But I don't know whether this has actually solved the problem or just made it extremely unlikely. Nor can I really fathom why this would make a difference.
Posting this as a solution in case it helps, but still digging on the issue.

Delayed Job Error Hook

Is the Delayed Job error hook fired once when the job first has an error or it is fired each time the job has errors on the retries too. My code seems to be firing the hook once on the 1st error and it doesn't fire on the retries errors?
error hook fires after every failed attempt, while failure fires once after number of attempts is greater than max_attempts.
If the error is running only once, check for:
max_attempts is set to one. Try explicitly setting the max attempts:
def max_attempts
3
end
You have an exception in your error hook. Try adding a rescue clause:
def error
# your code
rescue => e
Rails.logger.error "houston we have a problem #{e.message}"
end

Couldn't find Post without ID sidekiq error

I can't figure out what's going on with sidekiq. I could've swore this worked yesterday, but I must have been dreaming.
Here's my worker class:
class TagPostWorker
include Sidekiq::Worker
sidekiq_options queue: "tag"
sidekiq_options retry: false
def perform(options = {})
current_user = User.find(options[:user_id])
end
end
I've tried running this command on my show method in the Post:
TagPostWorker.perform_async({:user_id => current_user.id})
But I get this error:
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 TagPostWorker JID-ae203958bb3bcee01c8f83ef INFO: start
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 TagPostWorker JID-ae203958bb3bcee01c8f83ef INFO: fail: 0.003 sec
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 WARN: {"retry"=>false, "queue"=>"tag", "class"=>"TagPostWorker", "args"=>[{"user_id"=>7}], "jid"=>"ae203958bb3bcee01c8f83ef", "enqueued_at"=>1376779545.9099338}
2013-08-17T22:45:45Z 4029 TID-ors6jfr54 WARN: Couldn't find Post without an ID
I don't understand how sidekiq could even be attempting to a Post since I'm not even calling it in the perform method. Any ideas what could be going on?
Any help is appreciated.
You may want to take a look at this blog post.
It sounds ridiculous, but Sidekiq is so fast that it can run your
worker before your model even finishes saving.
Also strange, I could've sworn I restarted Rails/sidekiq.
But I renamed the worker to TagWorker, restarted Rails/sidekiq and it started working again!

Resources