Sidekiq - Only handle error after x retries? - ruby-on-rails

I'm using sidekiq to process thousands of jobs per hour - all of which ping an external API (Google). One out of X thousand requests will return an unexpected (or empty) result. As far as I can tell, this is unavoidable when dealing with an external API.
Currently, when I encounter such a response, I raise an Exception so that the retry logic automatically takes care of it on the next try. Something is only really wrong when the same job fails over and over. Exceptions are handled by Airbrake.
However, my Airbrake gets clogged up with these mini-outages that aren't really 'issues'. I'd like Airbrake to be notified only if the same job has already failed X times.
Is it possible to either:
Disable the automated Airbrake integration so that I can use sidekiq_retries_exhausted to report the error manually via Airbrake.notify?
Rescue the error somehow so that it doesn't notify Airbrake but still gets retried?
Do this in a different way that I'm not thinking of?
Here's my code outline:
class GoogleApiWorker
  include Sidekiq::Worker
  sidekiq_options queue: :critical, backtrace: 5

  def perform
    # Do stuff interacting with the google API
  rescue Exception => e
    if is_a_mini_google_outage? e
      # How do i make it so this harmless error DOES NOT get reported to Airbrake but still gets retried?
      raise e
    end
  end

  def is_a_mini_google_outage? e
    # check to see if this is a harmless outage
  end
end

As far as I know, Sidekiq has API classes for retries and jobs. You can look up your current job by its arguments (comparing them may not be effective) or by its jid (in which case you'd need to record the jid somewhere), check the number of retries, and then decide whether or not to notify Airbrake.
https://github.com/mperham/sidekiq/wiki/API
https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/api.rb
(I just don't give more info because I'm not able to)

If you are looking for a Sidekiq solution, see https://blog.eq8.eu/til/retry-active-job-sidekiq-when-exception.html
If you are more interested in configuring Airbrake so that you don't get these errors until a certain retry, check Airbrake::Sidekiq::RetryableJobsFilter:
https://github.com/airbrake/airbrake#airbrakesidekiqretryablejobsfilter
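To illustrate the filtering idea in plain Ruby, the sketch below decides whether a failure should be reported based on the job's retry count. RetryThresholdFilter, FakeNotice, and the threshold of 3 are hypothetical stand-ins so that the example runs without the airbrake or sidekiq gems, and it assumes the job payload carries a "retry_count" key that is missing until the job has been retried. With the real gem you would register Airbrake's own filter as described in the README linked above, rather than writing this yourself.

```ruby
# Hypothetical stand-in for an error notice; NOT Airbrake's real class.
FakeNotice = Struct.new(:params, :ignored) do
  def ignore!
    self.ignored = true
  end
end

# Hypothetical filter: suppress reports until a job has been retried
# at least `min_retries` times.
class RetryThresholdFilter
  def initialize(min_retries:)
    @min_retries = min_retries
  end

  def call(notice)
    job = notice.params[:job]
    return if job.nil? # not a background job: always report

    # Assumption: the payload has no "retry_count" before the first retry.
    retries_so_far = job.fetch("retry_count", 0)
    notice.ignore! if retries_so_far < @min_retries
  end
end

filter = RetryThresholdFilter.new(min_retries: 3)

first_failure = FakeNotice.new({ job: { "class" => "GoogleApiWorker" } }, false)
filter.call(first_failure)

fourth_failure = FakeNotice.new(
  { job: { "class" => "GoogleApiWorker", "retry_count" => 3 } }, false
)
filter.call(fourth_failure)

puts first_failure.ignored  # true: a one-off blip stays out of the error tracker
puts fourth_failure.ignored # false: repeated failures are reported
```

The job keeps raising and being retried as before; only the decision to report changes.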

Related

Rails 5 + ActiveJob + Sidekiq: Stop and log error after 10 retries

Trying to program a job that after 10 retries (from all exception types) will report a failure and die. Can't get it to work. Tried this answer and this one too. Neither worked.
The best solution would be to access retry_count from within the perform method.
I think what you're asking for is the sidekiq_retries_exhausted hook. It is called once the retries are used up and the job is about to move to the dead queue. Just set the retry option to 10 and implement that hook.
config.death_handlers might also be interesting.
See docs here: https://github.com/mperham/sidekiq/wiki/Error-Handling#configuration
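The behaviour described in this answer (retry a fixed number of times, then run a hook exactly once) can be simulated in plain Ruby. This is only a sketch of the semantics, not Sidekiq's implementation: run_with_retries and on_exhausted are hypothetical names, and there is no backoff or queue here.

```ruby
# Plain-Ruby simulation of "retry N times, then run an exhausted hook once".
# `run_with_retries` and `on_exhausted` are hypothetical names, not Sidekiq API.
def run_with_retries(max_retries:, on_exhausted:)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError => e
    retry if attempts <= max_retries # 1 initial try + max_retries retries
    on_exhausted.call(e, attempts)
    :dead
  end
end

reported = []
hook = ->(error, attempts) { reported << "#{error.message} after #{attempts} attempts" }

# A job that always fails: the hook fires exactly once, after the last retry.
result = run_with_retries(max_retries: 10, on_exhausted: hook) do |_attempt|
  raise "empty response from API"
end

# A job that recovers on its 3rd attempt: the hook never fires.
flaky = run_with_retries(max_retries: 10, on_exhausted: hook) do |attempt|
  raise "empty response from API" if attempt < 3

  :ok
end

puts result           # dead
puts flaky            # ok
puts reported.inspect # ["empty response from API after 11 attempts"]
```

Reporting to an error tracker from the hook, instead of on every raise, is what keeps transient failures quiet.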

Rails - How do I prevent Sidekiq from slowing down the server?

I have a newsletter that I send out to my customers (~10k emails) every morning, and sometimes this Sidekiq job takes so much CPU/memory that the website (a Rails app) stops responding and suffers outages.
When I look at the Sidekiq dashboard, I see there is some problem with the newsletter (probably an invalid email address, with Sidekiq repeatedly trying to send it again?) and it's stuck.
How do I prevent this behavior and stop Sidekiq from repeating the task (which I believe is the cause of the outages)?
Here's my code:
rake task:
namespace :mailer do
  desc "Carrier blast - morning"
  task :newsletter_morning => [:environment] do
    NewslettertJob.perform_later
  end
end
job definition:
class NewslettertJob < ApplicationJob
  def perform
    ...
    NewsletterMailer.morning_blast(data).deliver_now
  end
end
and NewsletterMailer:
class NewsletterMailer < ApplicationMailer
  def morning_blast(data)
    ...
    customers.each do |customer|
      yield customer, nil; next if customer.email.blank?
      begin
        Retryable.retryable( tries: 1, sleep: 30, on: [Net::OpenTimeout, Net::SMTPAuthenticationError, Net::SMTPServerBusy]) do
          send_email(customer.email).deliver
        end
        send_email(customer.email).deliver
      rescue Net::SMTPSyntaxError => e
        error_msg = "Newsletter sending failed on #{Time.now} with: #{e.message}. e.inspect: #{e.inspect}"
        logger.warn error_msg
        yield customer, nil
        next
      end
    end
  end
end
What I want to achieve is that the newsletter will be sent out every morning and if Rails/Sidekiq faces a problem, it will simply shut itself down, so the newsletter will not affect the "life" on the main website (its server).
Thank you in advance for any advice. I have been stuck on this issue for a while now.
If your machine only has one core, Sidekiq and puma will fight for CPU. Lower Sidekiq's concurrency so it uses less CPU, or get a machine with multiple cores, or move Sidekiq to a different machine.
If a Sidekiq process is using 100% of a core, lower the concurrency setting. The default in Sidekiq 6.0 is 10, which is a good default but if you are just delivering emails you could probably bump that to 20. You can run multiple Sidekiq processes if you wish to utilize multiple cores to process jobs faster.
I think ideally you should separate your background task servers from your web servers, so that background processing won't affect the performance of the web server. I work for a very high-traffic, high-load company, and that is the architecture we use.
There are explanations on how to stop retries in this answer: Disable automatic retry with ActiveJob, used with Sidekiq
Another thing: your emails are sent synchronously (.deliver). This makes your task one huge monolithic process covering many customers, with a large memory footprint. Instead, you could use deliver_later, so each customer gets its own little worker. This will also help alleviate CPU and memory usage. You could even create a worker that sends the email for a single customer, and use your monolithic job merely to dispatch those.
class NewslettertJob < ApplicationJob
  def perform
    ...
    customers.each do |customer|
      NewsletterMailer.morning_blast(customer, data).deliver_later if customer.email.present?
    end
  end
end
However, I think the silver bullet is separating your sidekiq server from your web server - having one server dedicated to background tasks. On your web server, you don't even start the sidekiq instances.

Acting on job failure with ActiveJob and DelayedJob

My Rails application is using ActiveJob + DelayedJob to execute some background jobs.
I am trying to figure out what is the way to define what happens on failure (not on error) - meaning, if DelayedJob has marked the job as failed, after the allowed 3 attempts, I want to perform some operation.
This is what I know so far:
DelayedJob has the aptly named failure hook.
This hook is not supported in ActiveJob
ActiveJob has a rescue_from method
The rescue_from method is probably not the right solution, since I do not want to do something on each exception, but rather only after 3 attempts (read: only after DelayedJob has deemed the job as failed).
ActiveJob has an after_perform hook, which I cannot utilize since (as far as I can see) it is not called when perform fails.
Any help is appreciated.
You may have already found a solution to this, but for people who still struggle with this issue: you can use ActiveJob's retry_on method with a block to run custom logic once the maximum number of attempts has been reached and the job has still failed.
class RemoteServiceJob < ApplicationJob
  retry_on(CustomAppException) do |job, error|
    ExceptionNotifier.caught(error)
  end

  def perform(*args)
    # Might raise CustomAppException
  end
end
You can find more info about Exception handling in ActiveJob in https://api.rubyonrails.org/v6.0.3.2/classes/ActiveJob/Exceptions/ClassMethods.html

Sidekiq Active Job database rollback on error

I'm noticing that when a Sidekiq / Active Job fails due to an error being thrown, any database changes that occurred during the job are rolled back. This seems to be an intentional feature to make jobs idempotent.
My problem is that the method run by the job can send emails to users and it uses database modifications to prevent re-sending emails. If the database change is rolled back, then the email will be resent whenever the job is retried.
Here's roughly what my job looks like:
class ProcessPaymentsJob < ApplicationJob
  queue_as :default

  def perform(*args)
    begin
      # This can send emails to users.
      PaymentProcessor.perform
    rescue StandardError => error
      puts 'PaymentsJob failed, ignoring'
      puts error
    end
  end
end
The job is scheduled to run periodically using sidekiq-scheduler. I'm using rails-api v5.
I've added a rescue to try to prevent the job from rolling back the database changes but it still happens.
It occurred to me that maybe this isn't a Sidekiq issue at all but a feature of Rails.
What's the best solution here to prevent spamming the user with emails?
It sounds like your background job is doing too much. If sending the email has no bearing on whether the job was successful or not you should break the job into two jobs: one to send the email and another to do the other bit of processing work.
Alternatively, you could use Sidekiq Batches and make the first job above dependent on the second executing successfully.
Happy Sidekiq’ing!
You could wrap the database changes in a transaction inside of the PaymentProcessor, rescue the database rollback, and only send the email if the transaction succeeds. Sort of like this:
# ../payment_processor.rb
def perform
  ActiveRecord::Base.transaction do
    # AllTheThings.save!
  end
rescue ActiveRecord::RecordInvalid => exception
  # if things fail to save, handle the exception however you like
else
  # if no exception is raised, send your email
end
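The rescue/else structure above relies on Ruby running the else branch only when the method body raised nothing. A small sketch without Rails shows this; SaveFailed, process_payment, and the outbox array are hypothetical stand-ins for ActiveRecord::RecordInvalid, the processor, and the mailer.

```ruby
# Hypothetical stand-in for ActiveRecord::RecordInvalid so this runs without Rails.
class SaveFailed < StandardError; end

# `save_step` stands in for the work done inside the transaction block.
def process_payment(save_step, outbox)
  save_step.call
rescue SaveFailed => e
  # The save failed: record it and do NOT send the email.
  outbox << "skipped (#{e.message})"
else
  # Reached only when no exception was raised above.
  outbox << "email sent"
end

outbox = []
process_payment(-> { :saved }, outbox)
process_payment(-> { raise SaveFailed, "validation failed" }, outbox)

puts outbox.inspect # ["email sent", "skipped (validation failed)"]
```

Because the email is sent only after the save has succeeded, a failed save followed by a retry does not produce a duplicate message.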

Handling Rack Timeout exceptions

I am using the Rack-timeout gem and a Rack::Timeout::RequestTimeoutException occurs. I did no configuration outside of putting this gem into my gemfile.
How do I handle these exceptions so that they don't stop the normal app's procedure but instead just log and let me know about them?
You can catch the exception in your application with
# in your app/controllers/application_controller.rb
rescue_from Rack::Timeout::RequestTimeoutException do |exception|
  # do something
end
But as it's an exception I don't believe it's possible to return execution to where it was interrupted.
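The point about not being able to resume can be seen with Ruby's standard Timeout module, used here only as a stand-in on the assumption that rack-timeout interrupts a request in a comparable way. The sleep call and the step names are illustrative.

```ruby
require "timeout"

log = []
steps_done = []

begin
  Timeout.timeout(0.1) do
    steps_done << :step_one
    sleep 2 # simulates a slow request; the timeout interrupts it here
    steps_done << :step_two
  end
rescue Timeout::Error => e
  # We can log the timeout, but the interrupted block is gone.
  log << "timed out: #{e.class}"
end

# Execution continues here, after the rescue, not back inside the block.
puts steps_done.inspect # [:step_one]
puts log.inspect        # ["timed out: Timeout::Error"]
```

The second step never runs, so rescuing the exception lets you log and respond gracefully but not finish the original work.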
However, rack-timeout also writes a log message every second, like this:
source=rack-timeout id=1123e70d486cbca9796077dc96279126 timeout=20000ms service=1018ms state=active
Perhaps you could increase the interval of these to, say 5 seconds, and change the timeout to something high like 120 seconds, that way it's unlikely to actually interrupt anything, but you will get log messages to tell you that something is running long.
The whole purpose of that gem is to raise exceptions after a timeout
"Abort requests that are taking too long; an exception is raised."
If that's not what you want to do, perhaps you shouldn't be using that particular gem? Random google hit https://github.com/moove-it/rack-slow-log
To change the timeout, run:
export RACK_TIMEOUT_SERVICE_TIMEOUT=30
