Requeue or evaluate delayed job from inside? - ruby-on-rails

Is there a way to determine the status of a running delayed_job job from inside the job task itself? I have a job that interacts with a service that can be pretty flaky and for a certain class of connection failures I'd like to requeue the job and only raise an exception if a connection failure occurs again at the retry limit.
Pseudo-code to demonstrate what I want to be able to do:
def do_thing
  service.send_stuff(args)
rescue Exception1, Exception2
  if job.retries == JOBS_MAX
    raise
  else
    job.requeue
  end
end
I don't want to raise an exception on any failure because generally the job will be completed okay on a later retry and it is just making noise for me. I do want to know if it is never completed, though.

Define a custom job for DJ, setting a number for max_attempts and behavior for the error callback. This is untested, but it might look something like this:
class DoThingJob
  def max_attempts
    @max_attempts ||= 5
  end

  def error(job, exception)
    case exception
    when Exception1, Exception2
      # will be requeued automatically until max_attempts is reached
      # can add an extra log message here if desired
    else
      @max_attempts = job.attempts
      # this will cause DJ to fail the job and not try again
    end
  end
end
NOTE: I started writing this before @pdobb posted his answer. I'm posting it anyway because it provides some more detail about how you might handle the exceptions and the requeue logic.

As you've said, if the Delayed Job runner gets to the end of perform then the run is considered successful and the job is removed from the queue. So you just have to stop it from getting to the end. There is no requeue -- and even if there were, it would be a new record with new attributes. So you may want to rethink whatever it is that is causing the job to notify you about exceptions. You could, for example, add a condition that decides when to notify you...
Potential Solutions
You can get the default JOBS_MAX (as you pseudo-coded it) with Delayed::Worker.max_attempts or you can set your own per-job by defining a method, e.g.: max_attempts.
# Fail permanently after the 10th failure for this job
def max_attempts
  10
end
You can also make use of callback hooks. Delayed Job will call back to your payload object via the error method, if it is defined. So you can use the error method to notify you of actual exceptions beyond a given attempt number. To do that...
Within the callback, the Delayed::Job object itself is passed as the first argument:
def error(job, exception)
  job.attempts # gives you the current attempt number
  # If job.attempts is greater than max_attempts then send an exception
  # notification, or whatever you want here...
end
So you can use the callbacks to start adding logic on when to notify yourself and when not to. I might even suggest making a base set of functionality that you can include into all of your payload objects to do these things... but that's up to you and your design.
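For example, a shared module along these lines (untested; a minimal sketch, where ErrorNotifier and NOTIFY_AFTER_ATTEMPTS are placeholders for whatever notification setup you actually use) could be included into each payload object:

# Sketch: shared retry/notification behavior for DJ payload objects.
# ErrorNotifier and NOTIFY_AFTER_ATTEMPTS are illustrative, not real APIs.
module JobErrorHandling
  NOTIFY_AFTER_ATTEMPTS = 5

  def max_attempts
    10
  end

  # Delayed Job calls this hook on each failed attempt.
  def error(job, exception)
    if job.attempts >= NOTIFY_AFTER_ATTEMPTS
      ErrorNotifier.notify(exception, job_id: job.id, attempts: job.attempts)
    end
  end
end

class DoThingJob
  include JobErrorHandling

  def perform
    service.send_stuff(args)
  end
end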

Related

Display info (date) about completed Sidekiq job

I'm using Rails 6 in my app with Sidekiq on board. I've got FetchAllProductsWorker like below:
module Imports
  class FetchAllProductsWorker
    include Sidekiq::Worker
    sidekiq_options queue: 'imports_fetch_all', retry: 0

    def perform
      (...)
    end
  end
end
I want to check when FetchAllProductsWorker last finished successfully and display this info in my front-end. This job will be fired sporadically, but the user must have feedback about when the last database sync (which FetchAllProductsWorker is responsible for) succeeded.
I want to have this info only for this one worker. I saw a lot of useful things in the Sidekiq API docs, but none of them relate to the history of completed jobs.
You could use the Sidekiq Batches API, which provides an on_success callback, but that is mostly meant for tracking batched work and is overkill for your problem. I suggest adding your success notification/logging at the end of the perform method.
def perform
  (...) # Run the already implemented code
  (notify/log for success) # If it is successful, notify/log.
end
The simplified default lifecycle of a Sidekiq job looks like this:
– If there is an error, the job will be retried a couple of times (read about Retries in the Sidekiq docs). During that time you can see the failing job and the error in the Sidekiq Web UI, if configured.
– If the job finishes successfully, it is removed from Redis and no information about this specific job remains available to the application.
That means Sidekiq does not really support querying for jobs that ran successfully in the past. If you need information about past jobs, you have to build that yourself. I basically see three options for monitoring Sidekiq jobs:
Write useful information to your application's log. Most logging tools support monitoring for specific messages, sending messages, or creating views for specific events. This might be enough if you just need this information for debugging reasons.
def perform
  Rails.logger.info("#{self.class.name} started")
  begin
    # job code
  rescue => exception
    Rails.logger.error("#{self.class.name} failed: #{exception.message}")
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    Rails.logger.info("#{self.class.name} finished successfully")
  end
end
If you are mostly interested in being informed when there is a problem, I suggest looking at a tool like Dead Man's Snitch. The way those tools work is that you ping their API as the last step of a job, which is only reached when there was no error. You then configure the tool to notify you if its API hasn't been pinged in the expected timeframe. For example, if you have a daily import job, Dead Man's Snitch would send you a message only if there wasn't a successful import job in the last 24 hours; if the job was successful, it will not spam you every single day.
require 'open-uri'

def perform
  # job code
  open("https://nosnch.in/#{TOKEN}")
end
If you want to allow your application's users to see job statuses on a dashboard in the application, then it makes sense to store that information in the database. You could, for example, create a JobStatus ActiveRecord model with columns like job_name, status, payload, and created_at, and then create records in that table whenever it feels useful. Once the data is in the database you can present it like every other model's data.
def perform
  begin
    # job code
  rescue => exception
    JobStatus.create(job_name: self.class.name, status: 'failed', payload: exception.to_json)
    raise # re-raise the exception to trigger Sidekiq's default retry behavior
  else
    JobStatus.create(job_name: self.class.name, status: 'success')
  end
end
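If you go down that road, the backing table could be as simple as this (a sketch; adjust the column types and the migration superclass version to your app):

# Hypothetical migration for the JobStatus model described above.
class CreateJobStatuses < ActiveRecord::Migration[6.0]
  def change
    create_table :job_statuses do |t|
      t.string :job_name, null: false
      t.string :status, null: false
      t.text :payload
      t.timestamps
    end
    add_index :job_statuses, [:job_name, :created_at]
  end
end

The front end could then show the last successful run with something like JobStatus.where(job_name: 'Imports::FetchAllProductsWorker', status: 'success').maximum(:created_at).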
And, of course, you can combine all of those techniques and tools for different use cases: 1. for history and statistics, 2. for admins and people on call, 3. for users of your application.

How to prevent parallel Sidekiq jobs from executing code in Rails

I have around 10 workers that perform a job that includes the following:
user = User.find_or_initialize_by(email: 'some-email@address.com')
if user.new_record?
  # ... some code here that does something taking around 5 seconds or so
elsif user.persisted?
  # ... some code here that does something taking around 5 seconds or so
end
user.save
The problem is that at certain times two or more workers run this code at exactly the same time, and I later found out that two or more Users ended up with the same email, when I should only ever end up with unique emails.
It is not possible in my situation to create a plain DB unique index for email, as email uniqueness is conditional -- some Users must have a unique email, some need not.
It is worth mentioning that my User model has uniqueness validations, but they still don't help because, between .find_or_initialize_by and .save, there is code that depends on whether the user object has already been created or not.
I tried pessimistic and optimistic locking, but neither helped me, or maybe I just didn't implement them properly... suggestions on that are welcome.
The only solution I can think of is to lock out the other threads (Sidekiq jobs) whenever these lines of code get executed, but I am not sure how to implement this, nor whether it is even an advisable approach.
I would appreciate any help.
EDIT
In my specific case, it is going to be hard to pass the email as a parameter to the job, as this job is a little more complex than described above. The job is actually an export script, and the code above is one section of it. I also don't think it's possible to separate that functionality into a separate worker... the whole job flow should be serial, and no parts should be processed in parallel / asynchronously. This job is just one of the jobs managed by another job, which ultimately is managed by the master job.
Pessimistic locking is what you want but only works on a record that exists - you can't use it with new_record? because there's nothing to lock in the DB yet.
I managed to solve my problem with the following:
I found out that I can actually add a where clause to a Rails DB unique index (a partial index), and thus I can now set up the conditional uniqueness for the different types of Users at the database level, so that concurrent jobs will raise ActiveRecord::RecordNotUnique if the record was already created.
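For reference, a partial unique index migration might look roughly like this (a sketch; the where condition is hypothetical and depends on how you distinguish the Users that need unique emails, and partial indexes are supported on PostgreSQL and SQLite but not MySQL):

# Hypothetical: only enforce uniqueness for users flagged as requiring it.
class AddPartialUniqueIndexToUsersEmail < ActiveRecord::Migration[5.2]
  def change
    add_index :users, :email, unique: true, where: 'email_must_be_unique = true'
  end
end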
The remaining problem was the code between .find_or_initialize_by and .save: only one of the concurrent jobs should ever see .new_record? == true, and the others should see .persisted? == true once the first job has created the record. None of that worked on its own, though, because the database uniqueness check only fires at the .save line. I therefore solved it by calling .save before those conditionals, and rescuing ActiveRecord::RecordNotUnique around .save to re-enqueue the job itself, making sure concurrent jobs don't conflict. The code now looks like this:
user = User.find_or_initialize_by(email: 'some-email@address.com')
# Capture these before saving, since they change once the record is saved.
is_new_record = user.new_record?
is_persisted = user.persisted?

begin
  user.save
rescue ActiveRecord::RecordNotUnique => exception
  MyJob.perform_later(params_hash) # re-enqueue this job and let the retry handle it
  return
end

if is_new_record
  # do something if not yet created
elsif is_persisted
  # do something if already created
end
I would suggest a different architecture to bypass the problem.
How about a producer-worker model, where one master Sidekiq process gets a list of email addresses, and then spawns a worker Sidekiq process for each email? Sidekiq makes this easy with a dedicated queue for master and workers to communicate.
That way the email address becomes an input parameter of the workers, so we know by construction that workers will not stomp on each other's data.
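A minimal sketch of that split (the class and method names are illustrative, not taken from the question's code):

# Master job: fetch the list of emails once, then fan out one job per email.
class MasterSyncJob
  include Sidekiq::Worker

  def perform
    fetch_email_list.each do |email| # fetch_email_list is hypothetical
      UserSyncJob.perform_async(email)
    end
  end
end

# Worker job: owns exactly one email, so no two workers touch the same record.
class UserSyncJob
  include Sidekiq::Worker

  def perform(email)
    user = User.find_or_initialize_by(email: email)
    # ... the slow per-user work from the question goes here
    user.save
  end
end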

Can ruby exceptions be handled asynchronously outside of a Thread::handle_interrupt block?

At first glance, I thought the new ruby 2.0 Thread.handle_interrupt was going to solve all my asynchronous interrupt problems, but unless I'm mistaken I can't get it to do what I want (my question is at the end and in the title).
From the documentation, I can see how I can avoid receiving interrupts in a certain block, deferring them to another block. Here's an example program:
duration = ARGV.shift.to_i

t = Thread.new do
  Thread.handle_interrupt(RuntimeError => :never) do
    5.times { putc '-'; sleep 1 }
    Thread.handle_interrupt(RuntimeError => :immediate) do
      begin
        5.times { putc '+'; sleep 1 }
      rescue
        puts "received #{$!}"
      end
    end
  end
end

sleep duration
puts "sending"
t.raise "Ka-boom!"

if t.join(20 + duration).nil?
  raise "thread failed to join"
end
When run with argument 2 it outputs something like this:
--sending-
--received Ka-boom!
That is, the main thread sends a RuntimeError to the other thread after two seconds, but that thread doesn't handle it until it gets into the inner Thread.handle_interrupt block.
Unfortunately, I don't see how this can help me if I don't know where my thread is getting created, because I can't wrap everything it does in a block. For example, in Rails, what would I wrap the Thread.handle_interrupt or begin...rescue...end blocks around? And wouldn't this differ depending on what webserver is running?
What I was hoping for is a way to register a handler, like the way Kernel.trap works. Namely, I'd like to specify handling code that's context-independent that will handle all exceptions of a certain type:
register_handler_for(SomeExceptionClass) do
  ... # handle the exception
end
What precipitated this question is the way the RabbitMQ gem, bunny, sends connection-level errors to the thread that opened the Bunny::Session, using Thread#raise. These exceptions can end up anywhere, and all I want to do is log them, flag that the connection is unavailable, and continue on my way.
Ideas?
Ruby provides for this with the Ruby Queue object (not to be confused with an AMQP queue). It would be nice if Bunny required you to create a Ruby Queue before opening a Bunny::Session and you passed in that Queue object, to which it would send connection-level errors instead of using Thread#raise to send them back to wherever. You could then simply provide your own thread to consume messages from the Queue.
It might be worth looking inside the RabbitMQ gem code to see if you could do this, or asking the maintainers of that gem about it.
In Rails this is not likely to work unless you can establish a server-wide thread to consume from the Ruby Queue, which of course would be web-server specific. I don't see how you could do this from within a short-lived object, e.g. the code for a Rails view, where threads are reused but Bunny doesn't know that (or care).
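For what it's worth, the consuming side of such a design would be simple. A minimal sketch (assuming errors were pushed onto the queue instead of raised, which is not what Bunny currently does):

require 'logger'

logger = Logger.new($stdout)
error_queue = Queue.new

# One long-lived thread drains the error queue and logs each entry,
# instead of having errors raised asynchronously into arbitrary threads.
Thread.new do
  loop do
    error = error_queue.pop # blocks until an error is pushed
    logger.warn("connection-level error: #{error.class}: #{error.message}")
    # flag the connection as unavailable here, then carry on
  end
end

# Whatever owns the connection would then push instead of calling raise:
# error_queue.push(exception)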
I'd like to raise (ha-ha!) a pragmatic workaround. Here be dragons. I'm assuming you're building an application and not a library to be redistributed; if not, then don't use this.
You can patch Thread#raise, specifically on your session thread instance.
module AsynchronousExceptions
  @exception_queue = Queue.new

  class << self
    attr_reader :exception_queue
  end

  def raise(*args)
    # We do this dance to capture an actual error instance, because
    # raise may be called with no arguments, a string, 3 arguments,
    # an error, or any object really. We want an actual error.
    # NOTE: This might need to be adjusted for proper stack traces.
    error = begin
      Kernel.raise(*args)
    rescue => error
      error
    end

    AsynchronousExceptions.exception_queue.push(error)
  end
end

session_thread = Thread.current
session_thread.singleton_class.prepend(AsynchronousExceptions)
Bear in mind that exception_queue is essentially a global. We also patch raise for everything on that thread, not just the reader loop. Luckily there are few legitimate reasons to call Thread#raise, so you might just get away with this safely.
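Something still has to drain that queue; a dedicated thread along these lines would do (a sketch):

# Consume the errors captured by the Thread#raise patch and just log them.
Thread.new do
  loop do
    error = AsynchronousExceptions.exception_queue.pop
    Rails.logger.error("asynchronous exception: #{error.class}: #{error.message}")
    # flag the connection as unavailable, etc.
  end
end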

What is the iOS (or RubyMotion) idiom for waiting on a block that executes asynchronously?

I have been pulling my hair out for weeks on this niggling problem, and I just can't find any info or tips on how or what to do, so I'm hoping someone here on the RubyMotion forums can help me out.
Apologies in advance if this is a little long, but it requires some setup to explain the issues properly. As background, I've got an app that uses a JSON/REST back-end implemented in a Rails app. This is pretty straightforward stuff. The back-end is working fine and, up to a point, so is the front end. I can make calls to populate model objects in the RubyMotion client and everything is great.
The one issue is that all of the http/json libs use async calls when processing requests. This is fine, and I understand why they are doing it, but there are a couple of situations where I need to be able to wait on a call because I need to do something with the returned results before proceeding to the next step.
Consider the example where a user wants to make a payment, and they have some saved payment information. Prior to presenting the user with a list of payment options, I want to make sure that I have the latest list at the client. So I need to make a request to a method on the user object that will grab the current list (or timeout). But I don't want to continue until I am sure that the list is either current, or the call to the back-end has failed. Basically, I want this call to block (without blocking the UI) until the results are returned.
Alternatives such as polling for changes or pushing changes from the back-end to the front are not appropriate for this scenario. I also considered simply pulling the data from the destination form (rather than pushing it into the form) but that doesn't work in this particular scenario because I want to do different things depending on whether the user has zero, one or multiple payment options saved. So I need to know in advance of pushing to the next controller, and to know in advance, I need to make a synchronous call.
My first attack was to create a shared instance (let's call it the SyncHelper) that I can use to store the returned result of the request, along with the "finish" state. It can provide a wait method that just spins using CFRunLoopRunInMode either until the request is finished, or until the request times out.
SyncHelper looks a bit like this (I've edited it to take some irrelevant stuff out):
class SyncHelper
  attr_accessor :finished, :result, :error

  def initialize
    reset
  end

  def reset
    @finished = false
    @result = nil
    @error = nil
  end

  def finished?
    @finished
  end

  def finish
    @finished = true
  end

  def finish_with_result(r)
    @result = r
    @finished = true
  end

  def error?
    !@error.nil?
  end

  def wait
    timeout = 0.0
    while !self.finished? && timeout < API_TIMEOUT
      CFRunLoopRunInMode(KCFRunLoopDefaultMode, API_TIMEOUT_TICK, false)
      timeout = timeout + API_TIMEOUT_TICK
    end
    if timeout >= API_TIMEOUT && !self.finished?
      @error = "error: timed out waiting for API: #{@error}" if !error?
    end
  end
end
Then I have a helper method like this, which allows me to make any call synchronous by providing the syncr instance to the invoked block.
def ApiHelper.make_sync(&block)
  syncr = ApiHelper::SyncHelper.new
  BubbleWrap::Reactor.schedule do
    block.call syncr
  end
  syncr.wait
  syncr.result
end
What I had hoped to do was use the async versions everywhere, but in the small number of cases where I needed to do something synchronously, I would simply wrap the call in a make_sync block like this:
# This happens async and I don't care
user.async_call(...)

result = ApiHelper.make_sync do |syncr|
  # This one is async by default, but I need to wait for completion
  user.other_async_call(...) do |result|
    syncr.finish_with_result(result)
  end
end

# Do something with result (after checking for errors, etc)
result.do_something(...)
Importantly, I want to be able to get the return value from the 'synchronised' call back into the invoking context, hence the 'result = ...' bit. If I can't do that, then the whole thing isn't much use to me anyway. By passing in syncr, I can call its finish_with_result to tell anyone listening that the async task has completed, and store the result there for consumption by the invoker.
The problem with my make_sync and SyncHelper implementations as they stand (apart from the obvious fact that I'm probably doing something profoundly stupid) is that the code inside the BubbleWrap::Reactor.schedule do ... end block doesn't get run until after the call to syncr.wait has timed out (note: not finished, because the block never gets the chance to run, and hence can't store a result). The wait is completely starving all other queued blocks of CPU, even though the call to CFRunLoopRunInMode is happening inside wait. I was under the impression that CFRunLoopRunInMode in this configuration would spin-wait but still allow other queued blocks to run, but it appears that I've got that wrong.
This strikes me as something that people would need to do from time to time, so I can't be the only person having trouble with this kind of problem.
Have I had too many crazy pills? Is there a standard iOS idiom for doing this that I'm just not understanding? Is there a better way to solve this kind of problem?
Any help would be much appreciated.
Thanks in advance,
M#
When you need to display the payment options, display a HUD (like MBProgressHUD) to block the user from using the UI, then start your network call. When the network call returns, dismiss the HUD in your success/failure blocks or delegate methods, and refresh your view with the data received.
If you don't like the HUD idea, you can display something appropriate in your UI instead, like a UILabel with "loading..." or a UIActivityIndicatorView.
If you need the data first thing, fetch it in viewDidAppear; if it happens on an action, move the transition to the next view (performSegueWithIdentifier or whatever) into your network success block/callback and make the network call when the action fires.
There should be examples of this in your networking library, or take a look at the usage sample code in MBProgressHUD itself: https://github.com/jdg/MBProgressHUD
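In RubyMotion that pattern could look roughly like this (a sketch; fetch_payment_options and present_options_controller stand in for your own async API call and navigation code):

# Block the UI with a HUD while the async call runs, then move on
# in the completion block.
def show_payment_options
  MBProgressHUD.showHUDAddedTo(view, animated: true)
  user.fetch_payment_options do |options, error|
    Dispatch::Queue.main.async do
      MBProgressHUD.hideHUDForView(view, animated: true)
      if error
        # show an alert / retry UI
      else
        present_options_controller(options)
      end
    end
  end
end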
Here's what I do to make multi-threaded synchronized asynchronous calls.
def make_sync2(&block)
  @semaphore ||= Dispatch::Semaphore.new(0)
  @semaphore2 ||= Dispatch::Semaphore.new(1)

  BubbleWrap::Reactor.schedule do
    result = block.call("Mateus")
    @semaphore2.wait # Wait for access to @result
    @result = result
    @semaphore.signal
  end

  @semaphore.wait # Wait for the async task to complete
  result = @result
  @semaphore2.signal
  result
end
As borrrden just said, I'd use a dispatch_semaphore:
def ApiHelper.make_sync(&block)
  @semaphore = Dispatch::Semaphore.new(0)
  BubbleWrap::Reactor.schedule do
    # do your stuff
    @result = block.call
    @semaphore.signal
  end
  @semaphore.wait
  @result
end
This is how I'd handle it in RubyMotion.
You can also use synced queues.
Dispatch::Queue.new('name').sync
Dispatch::Queue.main.sync
Take a look at more examples of usage: http://blog.arkency.com/2014/08/concurrent-patterns-in-rubymotion/
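For example (a minimal sketch), a private serial queue blocks the caller until the block has run, so the result is available right after the call; just don't call sync on the queue you are already running on, or you will deadlock:

# Run a block synchronously on a private serial queue.
result = nil
Dispatch::Queue.new('com.example.api').sync do
  result = compute_payment_options # hypothetical work
end
# result is guaranteed to be set at this point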

More Advanced Control Over Delayed Job workers retry

So I'm using Delayed::Job workers (on Heroku) as an after_create callback after a user creates a certain model.
A common use case, though, it turns out, is for users to create something, then immediately delete it (likely because they made a mistake or something).
When this occurs the workers are fired up, but by the time they query for the model at hand it's already deleted; because of the auto-retry feature, this ill-fated job will retry 25 times and never succeed.
Is there any way I can catch certain errors and, when they occur, prevent that specific job from ever retrying again, but if it's not that error, it will retry in the future?
Abstract the checks into the method you call with delayed_job: check whether your desired job can proceed or not, and either do the work or return successfully.
To expand on David's answer, instead of doing this:
def after_create
  self.send_later :spam_this_user
end
I'd do this:
# user.rb
def after_create
  Delayed::Job.enqueue SendWelcomeEmailJob.new(self.id)
end

# send_welcome_email_job.rb
class SendWelcomeEmailJob < Struct.new(:user_id)
  def perform
    user = User.find_by_id(self.user_id)
    return if user.nil? # user must have been deleted
    # do stuff with user
  end
end
