How to discard all changes executed on an async task when it raises an error in Sidekiq? - ruby-on-rails

I am using Sidekiq and Rails (6.0.3.7). I have a worker which executes an async task that creates a lot of data in my database, sequentially. So basically what happens is, for example:
First, it creates a User, then it creates a PostCategory, then it creates a Post, and then it creates 10 comments.
Sometimes this process fails midway, maybe when creating a PostCategory, or when creating a Post.
What I want to happen is that if the task fails at any given point, all the data that has already been created in said task is discarded. Another approach could be that the data is created only once I am sure the process has not failed. So basically it would have to "check create" everything before actually writing to the database.
An example of this would be that the User has been created, but for some reason it failed to create a PostCategory, and the async task failed. What I want to happen is that the created User is automatically deleted, or that it was never created in the first place, because the task failed.
Is there any approach or technique I could use to do this on my current worker without messing around too much with the actual code? Some "double check" method already implemented in Sidekiq? What do you recommend I should look into?
Thanks in advance for any help you can give me with this issue.

First of all, it's great that you design your tasks to be all-or-none. The main and preferable approach is to use database transactions, as that is exactly what they were designed for. Open the transaction before starting entity creation, and commit once all the checks are done.
Account.transaction do
  balance.save!
  account.save!
end
Note the bang methods (those with a trailing !): their intent is to raise exceptions, and an exception automatically rolls back the transaction, which is exactly what you need.
N.B. Try to make your task idempotent, which means it returns the same result regardless of how many times it is called with the same input. This could save you lots of time in the future.
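Applied to the worker described in the question, that could look roughly like the sketch below. This is only a sketch: the worker name, associations and attribute hashes are assumptions, not code from the original post. The point is that all the create! calls live inside one transaction, so if any of them raises, everything created before it is rolled back.
class CreatePostTreeWorker
  include Sidekiq::Worker

  def perform(user_attrs, category_attrs, post_attrs, comments_attrs)
    # If any of the bang methods below raises, the whole transaction is
    # rolled back and none of the records are persisted.
    ActiveRecord::Base.transaction do
      user     = User.create!(user_attrs)
      category = PostCategory.create!(category_attrs)
      post     = Post.create!(post_attrs.merge(user: user, post_category: category))

      comments_attrs.each do |comment_attrs|
        post.comments.create!(comment_attrs)
      end
    end
  end
end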

Related

How can I stop active record callbacks from throwing an error when they fail?

I have a Rails app that includes some after_save active record callbacks, which are responsible for updating a second cache database.
Recently I was having some load issues and the second database was slow to respond or refusing connections, and my users started getting 500 errors when they tried to save objects.
I think this was down to me misunderstanding how callbacks work - I assumed that by making these after_save rather than before_save, the user was "safe" and if the callback failed, that would be graceful and silent.
How might I refactor my code to stop this behaviour and avoid exposing my users to error messages when these callbacks fail?
I've looked at refactoring the callbacks to simply trigger an active job, like this:
# models/mymodel.rb
after_save :update_index

def update_index
  UpdateIndexJob.perform_later(self)
end
The active job includes the logic for actually updating the second cache database.
I'm not sure whether I would also need to implement sidekiq and redis for this to work.
I've also read about Rails observers, which apparently work similarly to callbacks but don't break the request-response cycle when they fail.
How can I stop active record callbacks from throwing an error when they fail?
That's a very open question, and I think you've already answered it yourself.
You can either:
Rewrite it in such a way that it won't fail (e.g. only invoke a background task).
Rescue the exception, as in the sketch below. (But then, is silently aborting that path of the code actually a sensible decision? Probably not.)
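As a rough illustration of that second option (just a sketch; update_second_cache is a made-up method name standing in for whatever talks to the cache database), you can swallow and log the error inside the callback so it never reaches the user:
after_save :update_index

def update_index
  update_second_cache # hypothetical method that writes to the cache database
rescue StandardError => e
  # The user's request still succeeds, but we keep a trace of the failure.
  Rails.logger.error("Index update failed: #{e.message}")
end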
I've looked at refactoring the callbacks to simply trigger an active job [...] I'm not sure whether I would also need to implement sidekiq and redis for this to work.
That seems like a safe implementation. And yes, you will need to configure a backend in order to actually execute the jobs. See the documentation. Sidekiq, Resque, Sneakers, Sucker Punch, Queue Classic, Delayed Job and Que are just some of the available choices.
I've also read about rails observers ...
That would also prevent user errors, but what happens if it fails due to an intermittent timeout? One of the core features of background jobs is that they retry on failure.
Therefore I would advise using a background job instead of an observer for this.
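A minimal sketch of such a job (the class names and the SecondCacheIndex call are assumptions, not part of the original question) could use Active Job's retry_on so transient failures against the cache database are retried instead of lost:
# app/jobs/update_index_job.rb
class UpdateIndexJob < ApplicationJob
  queue_as :default

  # Retry a few times if the second database times out or refuses connections.
  retry_on StandardError, wait: :exponentially_longer, attempts: 5

  def perform(record)
    # Hypothetical call that pushes the record into the second cache database.
    SecondCacheIndex.update(record)
  end
end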
Now with all of that said, we've barely spoken about what problem you're trying to solve:
updating a second cache database
You've said very little about what the "second cache database" actually is, but I suspect this could actually be configured as a leader/follower database, and live completely outside of your application.

Delayed Job executes with wrong data when I have a large number of jobs

I have read a lot and saw that Delayed Job doesn't actually work on the serialized data; it fetches the record again using the deserialized id.
This isn't the behavior I expected when I chose that gem, but I can deal with it.
The real problem is that I use DJ to fire some alerts based on some data in an after_save callback, and sometimes that fires the alert more times than it should. So basically, if I save a medical result three times for different reasons and the third save finalizes it, I will fire three alerts, because DJ runs three times against the finalized result.
Is there a way to enqueue a job in the same queue, for the same method, just once? I saw that the handler isn't exposed and handle_asynchronously doesn't accept a parameter to identify the process.
The best solution would have been to work directly on the serialized data, but executing the job only once would also be acceptable.
Thank you in advance!
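One way to make repeated enqueues harmless, in the spirit of the idempotency advice earlier on this page, is to guard the alert inside the job itself. This is only a hedged sketch, not an answer from the thread: the model, column and mailer names are assumptions.
class MedicalResultAlertJob < ApplicationJob
  def perform(result_id)
    # Reload the record at run time and bail out unless an alert is still due,
    # so enqueuing this job several times only ever sends one alert.
    result = MedicalResult.find(result_id)
    return unless result.finalized? && !result.alert_sent?

    AlertMailer.result_finalized(result).deliver_now
    result.update!(alert_sent: true)
  end
end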

How to reflect progress on before_update?

I have a project in which when I try to update some attribute, a long and exhausting before_update function runs. This function runs some scripts, and when they're finished successfully the attribute is changed.
The problem is that I want a way to reflect the current status of the currently running scripts (to display some sort of 2/5... 3/5... progress), but I can't figure out a solution. I tried saving the last-run command in the DB, but because the scripts run in a before_update scope, the commit only happens after all scripts are finished.
Is there any elegant solution to this kind of problem?
In general, you should avoid running expensive, cross-cutting code in callbacks. A time will come when you want to update one of those records without running that code, and then you'll start adding flags to determine when that callback should run, and all sorts of other nastiness. Also, if the record is being updated during a request, the expensive callback code will slow the whole request down, and potentially time out and/or block other visitors from accessing your application.
The way to architect this would be to create the record first (perhaps with a flag/state that tells the rest of your app that the update hasn't been "processed" yet - meaning that related code currently in your callback hasn't run yet). Then, you'd enqueue a background job that does whatever is in your callback. If you are using Sidekiq, you can use the sidekiq-status gem to update the job's status as it's running.
You'd then add a controller/action that checks up on the job's status and returns it in JSON, and some JS that pings that action every few seconds to check up on the status of the job and update your interface accordingly.
Even if you didn't want to update your users on the status of the job, a background job would probably still be in order here - especially if that code is very expensive, or involves third-party API calls. If not, it likely belongs in the controller, and you could run it all in a transaction. But if you need to update your users on the status of that work, a background job is the way to go.
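A rough sketch of that architecture, assuming the sidekiq-status gem mentioned above (the worker, controller and script names are placeholders invented for the example):
# app/workers/process_scripts_worker.rb
class ProcessScriptsWorker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker # adds the at/total progress helpers

  def perform(record_id)
    record  = MyModel.find(record_id)
    scripts = [:script_one, :script_two, :script_three, :script_four, :script_five]
    total scripts.size

    scripts.each_with_index do |script, index|
      record.send(script)                # placeholder for the real work
      at index + 1, "Finished #{script}" # progress becomes 1/5, 2/5, ...
    end

    record.update!(processed: true)
  end
end

# app/controllers/job_statuses_controller.rb
class JobStatusesController < ApplicationController
  # Polled by a bit of JS every few seconds; returns the job's progress as JSON.
  def show
    jid = params[:id]
    render json: {
      status: Sidekiq::Status.status(jid),
      at: Sidekiq::Status.at(jid),
      total: Sidekiq::Status.total(jid)
    }
  end
end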

Rails / ActiveRecord: avoid two threads updating model at the same time with locks

Let's say I need to be sure ModelName can't be updated at the same time by two different Rails threads; this can happen, for example, when a webhooks post to the application tries to modify it at the same time some other code is running.
Per Rails documentation, I think the solution would be to use model_name_instance.with_lock, which also begins a new transaction.
This works fine and prevents simultaneous UPDATES to the model, but it does not prevent other threads from reading that table row while the with_lock block is running.
I can prove that with_lock does not prevent other READS by doing this:
Open 2 rails consoles;
On console 1, type something like ModelName.last.with_lock { sleep 30 }
On console 2, type ModelName.last. You'll be able to read the model no problem.
On console 2, type ModelName.last.update_columns(updated_at: Time.now). You'll see it will wait for the 30-second lock to expire before it finishes.
This proves that the lock DOES NOT prevent reading, and as far as I could tell there's no way to lock the database row from being read.
This is problematic because if 2 threads are running the same method at the EXACT same time, and I must decide whether to run the with_lock block based on some previous checks on the model data, thread 2 could be reading stale data that will soon be updated by thread 1 once it finishes its with_lock block. Thread 2 CAN READ the model while the with_lock block is in progress in thread 1; it only can't UPDATE it because of the lock.
EDIT: I found the answer to this question, so you can stop reading here and go straight to it below :)
One idea that I had was to begin the with_lock block by issuing a harmless update to the model (like model_name_instance.update_columns(updated_at: Time.now), for instance), and then following it with a model_name_instance.reload to be sure that it gets the most updated data. So if two threads are running the same code at the same time, only one would be able to issue the first update, while the other would need to wait for the lock to be released. Once it is released, it would be followed by that model_name_instance.reload to be sure to get any updates performed by the other thread.
The problem is this solution seems way too hacky for my taste, and I'm not sure I should be reinventing the wheel here (I don't know if I'm missing any edge cases). How does one ensure that, when two threads run the exact same method at the exact same time, one thread waits for the other to finish before even reading the model?
Thanks Robert for the Optimistic Locking info, I could definitely see me going that route, but Optimistic locking works by raising an exception on the moment of writing to the database (SQL UPDATE), and I have a lot of complex business logic that I wouldn't even want to run with the stale data in the first place.
This is how I solved it, and it was simpler than what I imagined.
First of all, I learned that pessimistic locking DOES NOT prevent any other threads from reading that database row.
But I also learned that with_lock initiates the lock immediately, regardless of whether you are trying to make a write or not.
So if you start 2 rails consoles (simulating two different threads), you can test that:
If you type ModelName.last.with_lock { sleep 30 } on Console 1 and ModelName.last on Console 2, Console 2 can read that record immediately.
However, if you type ModelName.last.with_lock { sleep 30 } on Console 1 and ModelName.last.with_lock { p "I'm waiting" } on Console 2, Console 2 will wait for the lock held by Console 1, even though it's not issuing any write whatsoever.
So that's a way of 'locking the read': if you have a piece of code that you want to be sure won't be run simultaneously (not even for reads!), begin that method by opening a with_lock block and issue your model reads inside it, so that they'll wait for any other locks to be released first. If you issue your reads outside it, your reads will be performed even though some other piece of code in another thread has a lock on that table row.
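As a minimal sketch of that pattern (the method, model and column names are placeholders, not code from the thread), every code path that must not run concurrently opens with_lock first and does its reads inside the block:
def process_payment!(model_id)
  record = ModelName.find(model_id)

  record.with_lock do
    # with_lock reloads `record`, so this check sees the latest committed data,
    # and any other thread entering this block waits here until we commit.
    unless record.already_processed?
      record.update!(processed_at: Time.current)
    end
  end
end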
Some other nice things I learned:
As per the Rails documentation, with_lock will not only start a transaction with a lock, but it will also reload your model for you, so you can be sure that inside the block the instance is in its most up-to-date state, since with_lock issues a .reload on it.
There are some gems designed specifically to prevent the same piece of code from running at the same time in multiple threads (which I believe is the situation of pretty much every Rails app in production), regardless of the database lock. Take a look at redis-mutex, redis-semaphore and redis-lock.
There are many articles on the web (I could find at least 3) stating that Rails' with_lock will prevent a READ on the database row, while we can easily see with the tests above that this is not the case. Take care and always confirm information by testing it yourself! I tried to comment on them warning about this.
You were close, you want optimistic locking instead of pessimist locking: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html .
It won't prevent reading an object and submitting a form. But it can detect that the form was submitted when the user was seeing stale version of the object.
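For completeness, a minimal sketch of optimistic locking (assuming the table has the lock_version column that ActiveRecord::Locking::Optimistic relies on; the model and attribute names are placeholders):
# Migration (run once): add_column :model_names, :lock_version, :integer, default: 0, null: false
record_a = ModelName.find(1)
record_b = ModelName.find(1) # a second, concurrent copy of the same row

record_a.update!(name: "first writer wins")

# record_b still carries the old lock_version, so its write is rejected.
begin
  record_b.update!(name: "second writer loses")
rescue ActiveRecord::StaleObjectError
  # Reload and retry, or surface a conflict to the user.
end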

In Rails, do I need to worry about failed transactions if the same model gets updated by two different mongrels?

I have a fairly vanilla rails app with low traffic at present, and everything seems to work OK.
However, I don't know much about the rails internals and I'm wondering what happens in a busy site if two requests come in at the same time and try to update the same model, from (I assume) two separate mongrel processes. Could this result in a failed transaction exception or similar, or does rails do any magic to serialize controller methods?
If an update could fail, what is the best practice to watch for and handle this type of situation?
For more background, my controller methods often update multiple models. I currently don't do anything special to create transactions and just rely on the default behaviors. Ideally I'd like the update to be retried rather than return an error (the updates are generally idempotent, i.e. doing them twice if necessary would be OK). My database is mysql.
AFAIK, MySQL will wait until the first transaction is processed and then it will process the second one. #create, #update and #save wrap their work in an SQL transaction, and I guess MySQL can handle those well.
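If you do want the multi-model update to be atomic and retried on contention, a hedged sketch along these lines might work (the method name, models and number of attempts are choices made for the example, not something from the thread):
def update_both!(account, profile, attempts: 3)
  # Either both records are saved or neither is.
  ActiveRecord::Base.transaction do
    account.save!
    profile.save!
  end
rescue ActiveRecord::Deadlocked, ActiveRecord::LockWaitTimeout
  # Since the updates are idempotent, re-running the whole transaction is safe.
  attempts -= 1
  retry if attempts > 0
  raise
end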
