Sidekiq Active Job database rollback on error - ruby-on-rails

I'm noticing that when a Sidekiq / Active Job fails due to an error being thrown, any database changes that occurred during the job are rolled back. This seems to be an intentional feature to make jobs idempotent.
My problem is that the method run by the job can send emails to users and it uses database modifications to prevent re-sending emails. If the database change is rolled back, then the email will be resent whenever the job is retried.
Here's roughly what my job looks like:
class ProcessPaymentsJob < ApplicationJob
  queue_as :default

  def perform(*args)
    # This can send emails to users.
    PaymentProcessor.perform
  rescue StandardError => error
    puts 'PaymentsJob failed, ignoring'
    puts error
  end
end
The job is scheduled to run periodically using sidekiq-scheduler. I'm using rails-api v5.
I've added a rescue to try to prevent the job from rolling back the database changes but it still happens.
It occurred to me that maybe this isn't a Sidekiq issue at all but a feature of Rails.
What's the best solution here to prevent spamming the user with emails?

It sounds like your background job is doing too much. If sending the email has no bearing on whether the job was successful or not, you should break the job in two: one job to send the email and another to do the rest of the processing work, as sketched below.
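A rough sketch of that split (the job names and the split inside PaymentProcessor are made up for illustration):

class ProcessPaymentsJob < ApplicationJob
  queue_as :default

  def perform(*args)
    # database/processing work only; safe to retry on failure
    payment_ids = PaymentProcessor.perform
    # enqueue the email side separately, only after the processing
    # has succeeded, so a retry of this job can't resend mail
    payment_ids.each { |id| SendPaymentEmailJob.perform_later(id) }
  end
end

class SendPaymentEmailJob < ApplicationJob
  queue_as :default

  def perform(payment_id)
    # send the single email for this payment
  end
end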
Alternatively, you could use Sidekiq Batches (a Sidekiq Pro feature) and make the first job above dependent on the second executing successfully.
Happy Sidekiq’ing!

You could wrap the database changes in a transaction inside of the PaymentProcessor, rescue the database rollback, and only send the email if the transaction succeeds. Sort of like this:
# ../payment_processor.rb
def perform
  ActiveRecord::Base.transaction do
    # AllTheThings.save!
  end
rescue ActiveRecord::RecordInvalid => exception
  # if things fail to save, handle the exception however you like
else
  # if no exception is raised, send your email
end

Related

ActiveRecord exception caught after being eaten

I've got a GCP pubsub listener that does some work and then saves to ActiveRecord. I don't want to do that work if the DB connection is down, so I've added a pre-flight check: it checks the DB connection and, if that fails, eats the error and raises a RuntimeError.
The DB is flighty though, and to account for the scenario where the pre-flight succeeds but the DB connection dies while the work is being done, I have the caller rescuing ActiveRecord::ActiveRecordError and PG::Error, so we can log that the work was done but the receipt couldn't be persisted. It's more important that this work not be duplicated than that the receipt be persisted, so RuntimeError isn't caught (causing a retry), but the DB errors are. It looks like this (snipping significantly):
# Service
def process
  WorkReceipt.do_work
rescue ActiveRecord::ActiveRecordError, PG::Error
  Rails.logger.error("Work was done successfully, but not persisted")
end
# Model
class WorkReceipt < ActiveRecord::Base
  def self.do_work
    if !ActiveRecord::Base.connection.active?
      Rails.logger.error("DB connection is inactive. Reconnecting...")
      begin
        ActiveRecord::Base.connection.reconnect!
      rescue => e
        Rails.logger.error("Could not reestablish connection: #{e}")
        raise "Could not connect to database"
      end
    end

    # Lots of hard work

    self.create!(
      # Some args
    )
  end
end
Where things get weird is, while testing this, I brought down the DB and fired off 4 of these tasks. The first one handles correctly ("Could not reestablish connection: server closed the connection unexpectedly"), but then the other 3 get "DB connection is inactive. Reconnecting..." (good) followed by "Work was done successfully, but not persisted" (what?!). Even weirder is that the work has logging and side effects which I don't see happening. The pre-flight appears to correctly prevent the work from being done, but the database error is showing up in the outer rescue, preventing the retry and making me sad. There is no database access other than the create at the end.
What is going on here? Why does it seem like the database error is skipping past the inner rescue to be caught by the outer one?
Maybe I don't understand how Ruby works, but changing raise "Could not connect to database" to raise RuntimeError.new "Could not connect to database" fixes the problem. I was under the impression that providing a message to raise caused it to emit a RuntimeError without needing to be explicit about it, but here we are.
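For what it's worth, Kernel#raise with a bare string is documented to raise a RuntimeError, which a few lines of plain Ruby confirm:

begin
  raise "boom"
rescue => e
  puts e.class # => RuntimeError
end

So the explicit RuntimeError.new form shouldn't behave any differently in theory, even though it sidestepped the problem here.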

Rails, how to know if a particular request is still running

How can I detect if a particular request is still active?
For example I have this request uuid:
# my_controller.rb
def my_action
request.uuid # -> ABC1233
end
From another request, how can I know if the request with uuid ABC1233 is still working?
For the curious:
Following beanstalk directives I am running cron jobs using URL requests.
I don't want to start the next iteration if the previous one is still running. I can't just rely on a start/end flag updated by the request itself, because the request sometimes dies before it finishes.
Using normal cron tasks I was managing this properly using the PID of the process.
But I don't think I can use PID any more because processes in a web server can be reused among different requests.
I don't think Rails (or more correctly, Rack) has support for this, since (to the best of my knowledge) each Rails request doesn't know about any other requests. You may try to get access to all running threads (and even processes), but such an implementation (if even possible) seems ugly to me.
How about implementing it yourself?
class ApplicationController < ActionController::Base
  before_filter :register_request
  after_filter :unregister_request

  def register_request
    $redis.set(request.uuid, 1)
  end

  def unregister_request
    $redis.del(request.uuid)
  end
end
You'll still need to figure out what to do with exceptions, since after_filters are skipped when one is raised (perhaps move this whole code into a middleware: in the before phase it writes the uuid to redis, and in the after phase it removes the key; see the sketch below). There are a bunch of other ways to achieve this, I'm sure, and you can obviously substitute redis with your favorite persistence store of choice.
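A rough sketch of that middleware idea (assuming the same $redis global; the ensure clause covers the exception case that after_filter misses):

class RequestTracker
  def initialize(app)
    @app = app
  end

  def call(env)
    # ActionDispatch::RequestId sets this env key; request.uuid reads from it
    uuid = env["action_dispatch.request_id"]
    $redis.set(uuid, 1)
    @app.call(env)
  ensure
    $redis.del(uuid)
  end
end

You'd register it below ActionDispatch::RequestId so the uuid exists, e.g. with config.middleware.insert_after ActionDispatch::RequestId, RequestTracker.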
In the end I went back to my previous approach based on PIDs.
I implemented something like this:
# The Main Process
module MyProcess
  def self.run_forked
    Process.fork do
      run
    end
    Process.wait
  end

  def self.run
    RedisClient.set Process.pid # store the PID
    # ... my long process code is here
  end

  def self.still_alive?(pid)
    !!Process.kill(0, pid) rescue false
  end
end
# In one thread I can do:
MyProcess.run_forked

# In another thread I can do:
pid = RedisClient.get
MyProcess.still_alive?(pid) # -> true if the process is still running
I can call this code from a Rails request, and even if the request process is reused, the child process is not, so I can monitor the child's PID to see if the Ruby process is still running.

Sidekiq - Only handle error after x retries?

I'm using sidekiq to process thousands of jobs per hour - all of which ping an external API (Google). One out of X thousand requests will return an unexpected (or empty) result. As far as I can tell, this is unavoidable when dealing with an external API.
Currently, when I encounter such a response, I raise an exception so that the retry logic will automatically take care of it on the next try. Something is only really wrong when the same job fails over and over many times. Exceptions are handled by Airbrake.
However, my Airbrake gets clogged up with these mini-outages that aren't really 'issues'. I'd like Airbrake to only be notified of these issues if the same job has failed X times already.
Is it possible to either:
1. disable the automated Airbrake integration so that I can use sidekiq_retries_exhausted to report the error manually via Airbrake.notify,
2. rescue the error somehow so it doesn't notify Airbrake but keeps being retried, or
3. do this in a different way that I'm not thinking of?
Here's my code outline
class GoogleApiWorker
  include Sidekiq::Worker
  sidekiq_options queue: :critical, backtrace: 5

  def perform
    # Do stuff interacting with the google API
  rescue Exception => e
    if is_a_mini_google_outage?(e)
      # How do I make it so this harmless error DOES NOT get reported
      # to Airbrake but still gets retried?
      raise e
    end
  end

  def is_a_mini_google_outage?(e)
    # check to see if this is a harmless outage
  end
end
As far as I know, Sidekiq has API classes for retries and jobs: you can find your current job through its arguments (comparing them may not be effective) or its jid (in which case you'd need to record the jid somewhere), check the number of retries, and then notify Airbrake or not.
https://github.com/mperham/sidekiq/wiki/API
https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/api.rb
(I just don't give more info because I'm not able to)
If you're looking for a Sidekiq-side solution, see https://blog.eq8.eu/til/retry-active-job-sidekiq-when-exception.html
If you're more interested in configuring Airbrake so you don't get these errors until a certain retry count, check Airbrake::Sidekiq::RetryableJobsFilter:
https://github.com/airbrake/airbrake#airbrakesidekiqretryablejobsfilter
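For reference, a minimal sketch of the retries-exhausted route (this assumes the automatic Airbrake reporting is disabled so in-flight retries stay quiet; the block's arguments vary a little between Sidekiq versions):

class GoogleApiWorker
  include Sidekiq::Worker
  sidekiq_options queue: :critical, backtrace: 5

  # runs once, only after all retries are used up
  sidekiq_retries_exhausted do |msg, exception|
    Airbrake.notify(exception, job: msg['class'], args: msg['args'])
  end

  def perform
    # ...
  end
end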

Delayed Job just not working

I am building an application where at some point I need to sync a bunch of data from Facebook with my database, so I am (attempting) to use Delayed Job to push this into the background. Here is what part of my Delayed Job class looks like.
class FbSyncJob < Struct.new(:user_id)
  require 'RsvpHelper'

  def perform
    user = User.find(user_id)
    FbSyncJob.sync_user(user)
  end

  def self.sync_user(user)
    friends = HTTParty.get(
      "https://graph.facebook.com/me/friends?access_token=#{user.fb['token']}"
    )
    friends_list = friends["data"].map { |friend| friend["id"] }
    user.fb["friends"] = friends_list
    user.fb["sync"]["friends"] = Time.now
    user.save!
    FbSyncJob.friend_crawl(user)
  end
end
The RsvpHelper class lives in lib/RsvpHelper.rb. At some point in my application I call Delayed::Job.enqueue(FbSyncJob.new(user.id)) with a known valid user. The worker I set up even tells me that the job has been completed successfully:
1 jobs processed at 37.1777 j/s, 0 failed
However, when I check the user in the database, his friends list has not been updated. Am I doing something wrong? Thanks so much for the help; this has been driving me crazy.
Delayed::Job.enqueue will put a record in the delayed_jobs table, but you need to run a separate process to execute the job code (the perform method).
In development this is typically bundle exec rake jobs:work (note: you must restart this rake task whenever you make code changes; it will not autoload them).
see https://github.com/collectiveidea/delayed_job#running-jobs
I usually put the following into my Delayed Job configuration while in development. It never puts a record in the delayed_jobs table and runs all background code synchronously (in development and test), and by default Rails will reload changes to your code:
Delayed::Worker.delay_jobs = !(Rails.env.test? || Rails.env.development?)
https://github.com/collectiveidea/delayed_job#gory-details (see config/initializers/delayed_job_config.rb example section)

delayed_job and paperclip - Images aren't processed, but no error?

I'm having big issues trying to get delayed_job working with Amazon S3 and Paperclip. There are a few posts around about how to do it, but for whatever reason it's simply not working for me. I've removed a couple of things compared to how others are doing it - originally I had a save(validations => false) in regenerate_styles!, but that seemed to cause an infinite loop (due to the after_save callback), and it didn't seem to be necessary (since the URLs have been saved, just the images not uploaded). Here's the relevant code from my model file, submission.rb:
class Submission < ActiveRecord::Base
  has_attached_file :photo ...
  ...

  before_photo_post_process do |submission|
    if photo_changed?
      false
    end
  end

  after_save do |submission|
    if submission.photo_changed?
      Delayed::Job.enqueue ImageJob.new(submission.id)
    end
  end

  def regenerate_styles!
    puts "Processing photo"
    self.photo.reprocess!
  end

  def photo_changed?
    self.photo_file_size_changed? ||
      self.photo_file_name_changed? ||
      self.photo_content_type_changed? ||
      self.photo_updated_at_changed?
  end
end
And my little ImageJob class that sits at the bottom of the submission.rb file:
class ImageJob < Struct.new(:submission_id)
  def perform
    Submission.find(self.submission_id).regenerate_styles!
  end
end
As far as I can tell, the job itself gets created correctly (as I'm able to pull it out of the database via a query).
The problem arises when:
$ rake jobs:work
WARNING: Nokogiri was built against LibXML version 2.7.8, but has dynamically loaded 2.7.3
[Worker(host:Jarrod-Robins-MacBook.local pid:21738)] New Relic Ruby Agent Monitoring DJ worker host:MacBook.local pid:21738
[Worker(host:MacBook.local pid:21738)] Starting job worker
Processing photo
[Worker(host:MacBook.local pid:21738)] ImageJob completed after 9.5223
[Worker(host:MacBook.local pid:21738)] 1 jobs processed at 0.1045 j/s, 0 failed ...
The rake task then gets stuck and never exits, and the images themselves don't appear to have been reprocessed.
Any ideas?
EDIT: just another point; the same thing happens on heroku, not just locally.
Delayed Job captures a stack trace for all failed jobs. It's saved in the last_error column of the delayed_jobs table. Use a database GUI to see what's going on.
If you're using the collectiveidea fork with ActiveRecord as the backend, you can query the model as usual. To fetch an array of all stack traces, for example, do:
Delayed::Job.where('failed_at IS NOT NULL').map(&:last_error)
By default failed jobs are deleted after 25 failed attempts. It may be that there are no jobs anymore. Prevent deletion for debugging purposes by setting
Delayed::Worker.destroy_failed_jobs = false
in your config/initializers/delayed_job_config.rb
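For example, a minimal version of that initializer (file path per the delayed_job README) could be:

# config/initializers/delayed_job_config.rb
# keep failed jobs, and their last_error stack traces, around
# instead of deleting them once retries are exhausted
Delayed::Worker.destroy_failed_jobs = false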
