Is the Delayed Job error hook fired only once, when the job first raises an error, or is it fired each time the job fails on the retries too? My code seems to fire the hook on the first error but not on the errors raised during retries.
The error hook fires after every failed attempt, while the failure hook fires once, when the number of attempts reaches max_attempts and the job fails permanently.
If the error hook is running only once, check for the following:
max_attempts is set to one. Try explicitly setting the max attempts:
def max_attempts
  3
end
You have an exception in your error hook. Try adding a rescue clause:
def error(job, exception)
  # your code
rescue => e
  # Log the problem instead of letting the hook itself blow up.
  Rails.logger.error "Houston, we have a problem: #{e.message}"
end
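The difference between the two hooks can be sketched in plain Ruby. This is only an illustration: FlakyJob is a made-up custom job, and simulate_worker is a hypothetical stand-in for the Delayed Job worker loop (it is not the gem's code, and it retries immediately instead of rescheduling).

```ruby
# A made-up custom job in the shape Delayed Job expects: a perform method
# plus optional hooks. Nothing here requires the delayed_job gem.
class FlakyJob
  attr_reader :error_calls, :failure_calls

  def initialize
    @error_calls = 0
    @failure_calls = 0
  end

  def max_attempts
    3
  end

  def perform
    raise "external service unavailable"
  end

  # Expected to run after every failed attempt.
  def error(job, exception)
    @error_calls += 1
  rescue => e
    warn "error hook itself failed: #{e.message}"
  end

  # Expected to run once, when the job fails permanently.
  def failure(job)
    @failure_calls += 1
  end
end

# HYPOTHETICAL stand-in for the worker: retries immediately and passes nil
# where the real worker would pass the Delayed::Job record.
def simulate_worker(payload)
  attempts = 0
  loop do
    begin
      payload.perform
      return :ok
    rescue => e
      attempts += 1
      payload.error(nil, e)
      if attempts >= payload.max_attempts
        payload.failure(nil)
        return :failed
      end
    end
  end
end

job = FlakyJob.new
simulate_worker(job)
puts "error hook calls: #{job.error_calls}, failure hook calls: #{job.failure_calls}"
```

If the error counter stays at 1 in a real application, the two checks above (max_attempts and an exception inside the hook) are the first things to rule out.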
Related
We have a job to make an API call to an external service. We are facing an issue with error handling with it. Let's say our job looks like this:
class MakeApiRequestJob < ApplicationJob
  retry_on ApiClient::RequestThrottledError, wait: :exponentially_longer, attempts: 10

  def perform
    raise ApiClient::RequestThrottledError
  end
end
The sample code above produces an error log like this:
ERROR -- : [ActiveJob] [MakeApiRequestJob] [job id] Error performing MakeApiRequestJob (Job ID: [job id]) from Sidekiq(default) in 25573.04ms: ApiClient::RequestThrottledError (Your request was throttled.):
followed by:
INFO -- : Retrying MakeApiRequestJob in 3 seconds, due to a ApiClient::RequestThrottledError.
The problem is that the first ERROR line triggers an alert in our monitoring system. We already know that our requests can be throttled from time to time, and we are comfortable as long as the request is eventually accepted. We want to be alerted only when the job has failed permanently, that is, when all 10 attempts have failed.
Has anyone managed to customize ActiveJob::LogSubscriber so that it stops producing these error lines?
I have a Sidekiq worker that calls a third-party API (Mailchimp) and handles the response. Sometimes the API responds with an error message, which the API gem turns into a raised error.
However, those errors are normal and there is no need to retry. So I would like Sidekiq not to retry when those errors are raised.
I have tried a simple rescue, but it doesn't stop Sidekiq from capturing the raised error.
def perform(id)
  UpdateMailchimpService.new.(id)
rescue Mailchimp::ListInvalidBounceMemberError
  # expected error, nothing to do
end
Any way to do this? Thanks
Update
I finally found that my problem was caused by our broken deploy tool (a deployment had failed without us noticing). In fact, Sidekiq ignores any rescued error/exception: rescued errors are neither retried nor reported to Bugsnag.
Bugsnag's documentation clearly says:
Bugsnag should be installed and configured, and any unhandled exceptions will be automatically detected and should appear in your Bugsnag dashboard.
This post on GitHub didn't give a clear explanation, which is why I was confused about this.
Your assumption/example is incorrect. Do the normal Ruby thing: rescue the error and ignore it.
def perform(id)
  UpdateMailchimpService.new.(id)
rescue NormalError
  # job will succeed normally and Sidekiq won't retry it.
end
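To make the "rescue and ignore" behaviour concrete, here is a minimal plain-Ruby sketch. StubService and the error it raises are invented for illustration, and UpdateWorker is a plain class rather than a real Sidekiq worker. The point it shows: only the rescued error class is swallowed, so perform returns normally, while any other exception still propagates out of perform, which is what would make Sidekiq retry the job.

```ruby
# Hypothetical error class standing in for an "expected" API error.
class NormalError < StandardError; end

# Hypothetical stand-in for the real service object.
class StubService
  def call(id)
    raise NormalError, "member already bounced" if id == :expected
    raise ArgumentError, "bad id" if id == :bug

    "updated #{id}"
  end
end

# Plain class shaped like a worker; no Sidekiq required for this sketch.
class UpdateWorker
  def perform(id)
    StubService.new.(id)
  rescue NormalError
    # Swallowed: perform returns nil and the job counts as successful.
    nil
  end
end

worker = UpdateWorker.new
p worker.perform(42)        # normal result
p worker.perform(:expected) # expected error is rescued

begin
  worker.perform(:bug)
rescue ArgumentError => e
  puts "still raised: #{e.message}"
end
```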
Use the retry: false advanced option:
class UpdateMailchimpWorker
  include Sidekiq::Worker
  sidekiq_options retry: false # ⇐ HERE

  def perform(id)
    UpdateMailchimpService.new.(id)
  end
end
I use a Sidekiq queue to process communications with an unreliable, 3rd party API. Since this API is often down for a couple minutes at a time and then back up again, Sidekiq has been handy. When a connection issue happens, an error is raised and Sidekiq throws the job back in the queue to be retried again later, after some time has passed.
I use NewRelic to not only help debug crashes, but also for monitoring. My problem is that this current methodology above creates errors in NewRelic. If the 3rd party API is down for more than a couple of minutes, the error count accumulates enough to cause notifications to send out through NewRelic.
What I'd like to do is only raise an error from my worker when a certain number of retries have occurred for a job. I'm using sidekiq_retries_exhausted to do this. My problem is that I'm not quite sure how to put jobs back in the queue after they have an error without raising an error.
Does Sidekiq provide any facilities to return a job to a queue, increment the number of retries for the job, and have it sit there until it's due to run again, as if an exception was raised in the worker class?
You raise a specific error and tell the error service to ignore errors of that type. For NewRelic:
https://docs.newrelic.com/docs/agents/ruby-agent/installation-configuration/ruby-agent-configuration#error_collector.ignore_errors
Here is what I did to keep intentional retry errors out of Airbrake:
class TaskWorker
  include Sidekiq::Worker

  class RetryNotAnError < RuntimeError
  end

  def perform task_id
    task = Task.find(task_id)
    task.do_cool_stuff
    if task.finished?
      #log.debug "Task #{task_id} was successful."
      return false
    else
      #log.debug "Task #{task_id} will try again later."
      raise RetryNotAnError, task_id
    end
  end
end
Tell Airbrake to ignore it:
Airbrake.configure do |config|
  config.ignore << 'RetryNotAnError'
end
It's good to make your exception name OBVIOUSLY not an error (e.g. RetryLaterNotAnError), as it will still show up in logs and such, and you don't want to freak people out when they see a bunch of them.
P.S. That said, I would really like to see Sidekiq provide an explicit, errorless retry mechanism.
If using Sidekiq Enterprise, one other option might be to utilize the optional set of additional error types that will then get treated as Sidekiq::Limiter::OverLimit violations.
For my purposes, I've used a new error class and then added it to the list in the config. Here are the notes from the sidekiq-ent code (not in the public sidekiq repo) on how to modify your config file:
# An optional set of additional error types which would be
# treated as a rate limit violation, so the job would automatically
# be rescheduled as with Sidekiq::Limiter::OverLimit.
#
# Sidekiq::Limiter.errors << MyApp::TooMuch
# Sidekiq::Limiter.errors = [Foo::Error, MyApp::Limited]
Inside the specific job you can set max_limiter_retries; otherwise it defaults to 20:
sidekiq_options max_limiter_retries: 10
Inside the job, I'll rescue the "expected" intermittent error that I'd rather not ignore completely and then raise the error I've added to the list, something like this:
rescue RestClient::RequestTimeout => e
raise SidekiqSoftRetry.new(e.inspect)
end
Here's what that looks like in my initialization file. Mike Perham was also kind enough to respond with the option to update the global retry limit.
class SidekiqSoftRetry < RuntimeError
end

Sidekiq::Limiter::DEFAULT_OPTIONS[:reschedule] = 10
Sidekiq::Limiter.configure do |config|
  config.errors.concat(
    [
      SidekiqSoftRetry,
    ]
  )
end
I'd like to write a rake script that runs all the RSpec tests in my application. If any of the tests fail, I'd like to raise an exception in the task (later on I will catch this exception in the NewRelic alert system, which I already use for other tasks).
Is it possible?
First, you don't need to raise an exception to let New Relic know about it. You can notify New Relic by sending the error details directly through their API: https://docs.newrelic.com/docs/agents/ruby-agent/troubleshooting/sending-new-relic-handled-errors
NewRelic::Agent.notice_error(exception, options = { })
where exception can be an exception object (StandardError.new, for example) or a message.
Also, you can skip the exception business entirely and check the exit code of the rspec command-line tool. If the tests are green, it will be zero. If there are failures, it will be non-zero. Something like this:
if system('rspec spec') # returns true if the command was successful, false otherwise
  # if green
else
  # if red
end
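If an exception is still what you want, the exit-code check can be wrapped in a small helper. This is a sketch under assumptions: SpecsFailed and run_specs! are hypothetical names, and the command to run is whatever you pass in.

```ruby
require "rbconfig"

# Hypothetical error class raised when the spec command does not succeed.
class SpecsFailed < StandardError; end

# Runs the given command and raises unless it exits with status zero.
# Kernel#system returns true on success, false on a non-zero exit status,
# and nil if the command could not be executed at all.
def run_specs!(*command)
  ok = system(*command)
  raise SpecsFailed, "spec command failed: #{command.join(' ')}" unless ok

  true
end

# Demonstration with a trivial command instead of a real test suite.
begin
  run_specs!(RbConfig.ruby, "-e", "exit 1")
rescue SpecsFailed => e
  puts e.message
end
```

Inside a rake task, the body could simply call run_specs!("rspec", "spec"), so the task fails with SpecsFailed whenever the suite is red.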
The code below keeps looping, where I would expect find to wait for its default wait time of 2 seconds before throwing an exception and having the loop iterate.
user_general.synchronize(10) do
  tab_me.primary_action("Plus").click
  add_edit_item.find('.ready[data-id="pageAddEditItems"]')
end
In Capybara, only the outermost synchronize loop is rerun on failures. You can see this in the source code for #synchronize, which does the following:
if session.synchronized
  yield # if we are already in a synchronize loop just run the code
else
  ... # catch errors and retry until max wait time expires or success
end
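A simplified model of that guard shows why the nested find never gets its own wait time. MiniSession below is a hypothetical toy class, not Capybara's actual implementation: a nested synchronize call runs its block exactly once and lets the error escape, so the outer loop reruns the whole block until the outer wait time expires.

```ruby
# Toy model of a re-entrant retry guard (hypothetical, not Capybara source).
class MiniSession
  def initialize
    @synchronized = false
  end

  def synchronize(seconds)
    # Already inside a synchronize loop: run once, no retrying, no waiting.
    return yield if @synchronized

    deadline = monotonic_now + seconds
    @synchronized = true
    begin
      loop do
        begin
          return yield
        rescue StandardError
          # Give up once the outer wait time has expired.
          raise if monotonic_now >= deadline

          sleep 0.01
        end
      end
    ensure
      @synchronized = false
    end
  end

  private

  def monotonic_now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

session = MiniSession.new
outer_runs = 0
inner_runs = 0

begin
  session.synchronize(0.05) do
    outer_runs += 1
    # The inner wait of 10 seconds is never honoured because we are nested.
    session.synchronize(10) do
      inner_runs += 1
      raise "element not found"
    end
  end
rescue RuntimeError => e
  puts "gave up after #{outer_runs} outer runs: #{e.message}"
end
```

In the question's code the same thing happens: find is called inside an explicit synchronize block, so it fails immediately each time and the outer block is rerun for up to 10 seconds.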