counter_cache getting out of sync in production

I assume there's some kind of race condition, although the code executes in the save transaction, so I'm not quite sure how these numbers could be getting out of sync.
Every counter cache on the site shows at least a few of these discrepancies.

Make a cron job to reset them.
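Rails ships with reset_counters, which recounts the association and rewrites the counter column, so the cron job can simply invoke a rake task. A minimal sketch of that task, where the model and association names (Post, :comments) are hypothetical placeholders for whichever counter caches are drifting:

# lib/tasks/counter_caches.rake
namespace :counter_caches do
  desc "Recalculate drifting counter caches"
  task reset: :environment do
    Post.find_each do |post|
      # reset_counters re-counts the association and writes the correct value
      Post.reset_counters(post.id, :comments)
    end
  end
end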

Related

How can I stop active record callbacks from throwing an error when they fail?

I have a Rails app that includes some after_save active record callbacks, which are responsible for updating a second cache database.
Recently I was having some load issues and the second database was slow to respond or refusing connections, and my users started getting 500 errors when they tried to save objects.
I think this was down to me misunderstanding how callbacks work - I assumed that by making these after_save rather than before_save, the user was "safe" and if the callback failed, that would be graceful and silent.
How might I refactor my code to stop this behaviour and avoid exposing my users to error messages when these callbacks fail?
I've looked at refactoring the callbacks to simply trigger an active job, like this:
# models/mymodel.rb
class MyModel < ActiveRecord::Base
  after_save :update_index

  def update_index
    UpdateIndexJob.perform_later(self)
  end
end
The active job includes the logic for actually updating the second cache database.
I'm not sure whether I would also need to implement sidekiq and redis for this to work.
I've also read about Rails observers, which apparently work similarly to callbacks but don't break the request-response cycle when they fail.
How can I stop active record callbacks from throwing an error when they fail?
That's a very open question, and I think you've already answered it yourself.
You can either:
Rewrite it in such a way that it won't fail (e.g. only invoke a background task).
Rescue the exception. (But then, is aborting that path of the code actually a sensible decision?! Probably not.)
I've looked at refactoring the callbacks to simply trigger an active job [...] I'm not sure whether I would also need to implement sidekiq and redis for this to work.
That seems like a safe implementation. And yes, you will need to configure a backend in order to actually execute the jobs. See the documentation. Sidekiq, Resque, Sneakers, Sucker Punch, Queue Classic, Delayed Job and Que are just some of the available choices.
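For example, if you go with Sidekiq (which does use Redis), the only Rails-side wiring is the adapter setting; this is a minimal sketch, and the other adapters are configured the same way:

# config/application.rb (or an environment file); assumes the sidekiq gem
# is in your Gemfile and a Redis server is reachable
config.active_job.queue_adapter = :sidekiq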
I've also read about rails observers ...
That would also prevent user-facing errors, but what happens if it fails due to an intermittent timeout? One of the core features of background jobs is that they retry on failure.
Therefore I would advise using a background job instead of an observer for this.
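To make that concrete, here is a hedged sketch of the job side. UpdateIndexJob is the name from the question; the error class and the cache-update call are placeholders for whatever your second database's client actually raises and exposes, and retry_on needs Rails 5.1 or later:

# app/jobs/update_index_job.rb
class UpdateIndexJob < ApplicationJob
  queue_as :default

  # Retry with exponential backoff instead of surfacing the error to the
  # user who triggered the save; adjust the error class to your client's.
  retry_on Timeout::Error, wait: :exponentially_longer, attempts: 5

  def perform(record)
    # Placeholder for the logic that updates the second cache database
    SecondaryCache.update_index_for(record)
  end
end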
Now with all of that said, we've barely spoken about what problem you're trying to solve:
updating a second cache database
You've said very little about what the "second cache database" actually is, but I suspect this could actually be configured as a leader/follower database and live completely outside of your application.

What can make a Sidekiq worker produce erroneous or even impossible method responses?

We have a worker which had a bug that caused erroneous responses from a method being called. The issue has since been fixed; however, when we restart the background workers we still seem to experience the issue.
We know the issue is resolved because, in the meantime, we have moved the logic to a rake task and it is now working fine. We suspect the issue relates to failed or unperformed jobs in the Sidekiq queue.
We tried to overcome this by clearing the Redis DB with the approach below:
Sidekiq.redis { |r| puts r.flushall }
Has anyone experienced a similar issue when using Sidekiq/Redis, and how did you overcome it?
I think flushall might just immediately retry all the failed, but still queued, jobs.
In most cases, if you fix a bug in the worker without changing the signature, you may be able to just let them retry (assuming the job itself is idempotent which is kinda recommended for this reason and others).
If however, the bug was in what you were passing into the async job call, then you'll have to remove all those entries because they will continue to fail every time they are retried which, by default, can go on for weeks.
I think what you want to do is clear them all... you can do it for that particular queue. You may have to be careful not to remove newly queued jobs, if that's an issue, perhaps by examining the job entries (a sketch of that follows the snippet below). If you just want to nuke them all:
queue = Sidekiq::Queue.new('your_queue')
queue.clear
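If the buggy jobs have already failed, they live in the retry set rather than the queue, and you can clear that too; or you can walk the queue and delete selectively. A sketch, where 'YourWorker' stands in for the worker class that had the bug:

# Drop everything that is waiting to be retried
Sidekiq::RetrySet.new.clear

# Or only delete jobs belonging to the worker that had the bug
Sidekiq::Queue.new('your_queue').each do |job|
  job.delete if job.klass == 'YourWorker'
end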

Rails / ActiveRecord: avoid two threads updating model at the same time with locks

Let's say I need to be sure ModelName can't be updated at the same time by two different Rails threads; this can happen, for example, when a webhook POST to the application tries to modify it while some other code is running.
Per Rails documentation, I think the solution would be to use model_name_instance.with_lock, which also begins a new transaction.
This works fine and prevents simultaneous UPDATES to the model, but it does not prevent other threads from reading that table row while the with_lock block is running.
I can prove that with_lock does not prevent other READS by doing this:
Open 2 rails consoles;
On console 1, type something like ModelName.last.with_lock { sleep 30 }
On console 2, type ModelName.last. You'll be able to read the model no problem.
On console 2, type ModelName.last.update_columns(updated_at: Time.now). You'll see it wait for the 30-second lock to be released before it finishes.
This proves that the lock DOES NOT prevent reading, and as far as I could tell there's no way to lock the database row from being read.
This is problematic because if 2 threads run the same method at the EXACT same time, and I decide whether to run the with_lock block based on some previous checks on the model data, thread 2 could be reading stale data that will soon be updated by thread 1 once it finishes the with_lock block it is already running: thread 2 CAN READ the model while the with_lock block is in progress in thread 1; it only can't UPDATE it because of the lock.
EDIT: I found the answer to this question, so you can stop reading here and go straight to it below :)
One idea I had was to begin the with_lock block by issuing a harmless update to the model (model_name_instance.update_columns(updated_at: Time.now), for instance) and then following it with a model_name_instance.reload to be sure it gets the most up-to-date data. That way, if two threads run the same code at the same time, only one can issue the first update, while the other has to wait for the lock to be released; once it is released, the reload picks up any updates performed by the other thread.
The problem is this solution seems way too hacky for my taste, and I'm not sure I should be reinventing the wheel here (I don't know if I'm missing any edge cases). How does one ensure that, when two threads run the exact same method at the exact same time, one thread waits for the other to finish before even reading the model?
Thanks Robert for the optimistic locking info; I could definitely see myself going that route, but optimistic locking works by raising an exception at the moment of writing to the database (the SQL UPDATE), and I have a lot of complex business logic that I wouldn't even want to run with stale data in the first place.
This is how I solved it, and it was simpler than what I imagined.
First of all, I learned that pessimistic locking DOES NOT prevent any other threads from reading that database row.
But I also learned that with_lock acquires the lock immediately, regardless of whether you are trying to make a write or not.
So if you start 2 rails consoles (simulating two different threads), you can test that:
If you type ModelName.last.with_lock { sleep 30 } on Console 1 and ModelName.last on Console 2, Console 2 can read that record immediately.
However, if you type ModelName.last.with_lock { sleep 30 } on Console 1 and ModelName.last.with_lock { p "I'm waiting" } on Console 2, Console 2 will wait for the lock held by Console 1, even though it's not issuing any write whatsoever.
So that's a way of 'locking the read': if you have a piece of code that you want to be sure won't run simultaneously (not even for reads!), begin that method by opening a with_lock block and issue your model reads inside it, so they wait for any other locks to be released first. If you issue your reads outside it, they will be performed even though some other piece of code in another thread holds a lock on that table row.
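In code, the pattern looks like the sketch below; the Order model, the status column, and the check are hypothetical, and what matters is that the read and the decision both happen inside the with_lock block:

order = Order.find(params[:id])

order.with_lock do
  # with_lock reloads the record, so this check sees committed data; a second
  # thread entering this block waits here until the first one's transaction ends
  if order.status == 'pending'
    order.update!(status: 'processed')
  end
end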
Some other nice things I learned:
As per the Rails documentation, with_lock will not only start a transaction with a lock, but will also reload your model for you, so you can be sure that inside the block your instance is in its most up-to-date state, since with_lock issues a .reload on it.
There are some gems designed specifically to prevent the same piece of code from running at the same time in multiple threads (which is the situation for the majority of Rails apps in production), regardless of the database lock. Take a look at redis-mutex, redis-semaphore and redis-lock.
There are many articles on the web (I could find at least 3) stating that Rails' with_lock will prevent a READ on the database row, while the tests above easily show that's not the case. Take care and always confirm information by testing it yourself! I tried to comment on them warning about this.
You were close; you want optimistic locking instead of pessimistic locking: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html
It won't prevent reading an object and submitting a form. But it can detect that the form was submitted while the user was seeing a stale version of the object.
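For completeness, a hedged sketch of how the optimistic route plays out: it requires an integer lock_version column on the table, and the model name and attribute below are hypothetical:

product = Product.find(params[:id])

begin
  product.update!(price: 10)
rescue ActiveRecord::StaleObjectError
  # Someone else saved the row since we loaded it; reload to pick up their
  # changes (and the new lock_version), then retry or report a conflict
  product.reload
  retry
end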

How can I test `perform_now` job in Rails

Recently on my current project, I noticed that in our integration tests many assert_enqueued_jobs 1 calls are commented out, so I asked another developer about it. He told me that we've changed many ActiveJob perform_later calls to perform_now for speed reasons, and that assert_enqueued_jobs can't catch those performed-now jobs.
I've also tried using assert_performed_jobs, but that didn't work out either.
Can anyone give me insight or suggestion to test it? or it just can't be testable?
Untestable? Never!
assert_enqueued_jobs is not actually testing the code; rather, it checks that something was enqueued to happen later. If something happens right away, why test that it was enqueued?
I would try to keep the queue and force jobs to be performed/cleared using the other ActiveJob::TestHelper methods. But that's just me; it makes little difference.
https://apidock.com/rails/v4.2.1/ActiveJob/TestHelper/assert_enqueued_jobs
Say your job's purpose is to send some email: just call perform_now on the job and check ActionMailer::Base.deliveries.count. At this point, the actual test will be very tailored to the job.
It could be creating a Notification, in which case you might want to assert that Notification.count has changed, and so on.
The main thing is that instead of checking that the job was enqueued and calling it a day, we're looking for the desired outcome of that job running.
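As a concrete illustration, here is a hedged Minitest-style sketch for the email case; MyMailerJob and the fixture name are placeholders for your own job and data:

test "performing the job sends an email" do
  assert_difference -> { ActionMailer::Base.deliveries.count }, 1 do
    MyMailerJob.perform_now(users(:alice))
  end
end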

How to deal with expensive fixture/factory_girl object creation in tests?

For all users in our system, we generate a private/public key pair, which often takes a second or two. That's not a deal breaker on the live site, but it makes running the tests extremely slow, and slow tests won't get run.
Our setup is Rails 3.1 with factory_girl and rspec.
I tried creating some (10 or so) ahead of time with a method to return a random one, but this seems to be problematic: perhaps they're getting cleared out of the database and are unavailable for subsequent tests... I'm not sure.
This might be useful: https://github.com/pcreux/rspec-set - any other ideas?
You can always make a fake key pair for your tests. Pregenerating them won't work, at least not if you store them in the DB, because the DB should get cleared for every test. I suppose you could store them in a YAML file or something and read them from there...
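One way to act on the "fake key pair" suggestion with factory_girl is to generate a single key pair once per test run and hand it to every factory-built user; the attribute names below are hypothetical, so adjust them to your schema:

# spec/factories/users.rb -- generate the expensive key pair once, reuse it
require 'openssl'

STATIC_TEST_KEY = OpenSSL::PKey::RSA.new(2048)

FactoryGirl.define do
  factory :user do
    sequence(:email) { |n| "user#{n}@example.com" }
    private_key { STATIC_TEST_KEY.to_pem }
    public_key  { STATIC_TEST_KEY.public_key.to_pem }
  end
end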
https://github.com/pcreux/rspec-set was good enough for what we needed, combined with an after(:all) block to clean up the droppings it leaves in the database.

Resources