Mutex locks in Ruby do not work with Redis? - ruby-on-rails

I have a requirement for batch imports. Files can contain thousands of records, and each record needs validation. The user wants to be notified of how many records were invalid. Originally I did this with Ruby's Mutex and Redis' Publish/Subscribe. Note that I have 20 concurrent threads processing each record via Sidekiq:
class Record < ActiveRecord::Base
  class << self
    # invalidated_records is SHARED memory for the Sidekiq worker threads
    attr_accessor :invalidated_records
    attr_accessor :semaphore
  end

  def self.batch_import
    self.semaphore = Mutex.new
    self.invalidated_records = []
    redis.subscribe_with_timeout(180, 'validation_update') do |on|
      on.message do |channel, message|
        if message.to_s =~ /\d+|import_.+/
          self.semaphore.synchronize {
            self.invalidated_records << message
          }
        elsif message == 'exit'
          redis.unsubscribe
        end
      end
    end
  end
end
Sidekiq would publish to the Record object:
Redis.current.publish 'validation_update', 'import_invalid_address'
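For context, a minimal sketch of the kind of Sidekiq worker implied above; the class name and perform signature are hypothetical, only the publish call comes from the question:
class RecordImportWorker
  include Sidekiq::Worker

  def perform(row_attributes)
    record = Record.new(row_attributes)
    if record.valid?
      record.save
    else
      # notify the subscriber sitting in Record.batch_import
      Redis.current.publish 'validation_update', 'import_invalid_address'
    end
  end
end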
The problem is that something weird happens: not all of the invalid imports end up in Record.invalidated_records. Many of them do, but not all. I thought it was because multiple threads updating the object concurrently were corrupting it, and that the Mutex lock would solve this problem. But even after adding the Mutex lock, not all invalids are populated in Record.invalidated_records.
Ultimately, I used Redis' atomic increment and decrement to track invalid imports, and that worked like a charm. But I am curious: what is the issue with Ruby's Mutex and multiple threads trying to update Record.invalidated_records?
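For reference, the atomic-counter approach looks roughly like this (a sketch assuming the redis-rb gem; the key name is made up):
# Reset the shared counter before the batch starts
Redis.current.set('import_invalid_count', 0)

# In each Sidekiq thread, when a record fails validation:
Redis.current.incr('import_invalid_count')

# After the batch finishes, read the total:
invalid_total = Redis.current.get('import_invalid_count').to_i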

I have not used Mutex, but I think what happens is that a thread sees the semaphore is locked and skips saving << message.
You need to use a ConditionVariable (https://apidock.com/ruby/ConditionVariable) to wait for the mutex to be unlocked and then save the data.
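For reference, the standard Mutex/ConditionVariable handshake from the linked docs looks roughly like this (a generic sketch, not tied to the question's code):
mutex = Mutex.new
cond  = ConditionVariable.new
ready = false

consumer = Thread.new do
  mutex.synchronize do
    cond.wait(mutex) until ready # releases the mutex while waiting
    # ... save the data here ...
  end
end

mutex.synchronize do
  ready = true
  cond.signal # wake the waiting consumer
end
consumer.join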

Related

Sidekiq mailer job accesses db before model has been saved

Probably the title is not self-explanatory; the situation is this:
# user.points: 0
user.update!(points: 1000)
UserMailer.notify(user).deliver_later # user.points = 0 => Error!!!!
The user instance is updated, and after that the mailer is called with the user as a parameter, but in the email the changes are missing: user.points is 0 instead of 1000.
But with a sleep 1 just after the update, the email is sent with the changes applied, so it seems the email job is faster than the write to the database.
# user.points: 0
user.update!(points: 1000)
sleep 1
UserMailer.notify(user).deliver_later # user.points = 1000 => OK
What's the best approach to solve this while avoiding these two possible solutions?
One solution could be calling UserMailer.notify not with the user instance but with the user's values.
Another solution could be sending the mail in an after_commit callback on the user.
So, is there another way to solve this, keeping the user instance as the parameter and avoiding the after_commit callback?
Thanks
Remember, Sidekiq runs a copy of your Rails app in a separate process, using Redis as the medium. When you call deliver_later, it does not actually 'pass' user to the mailer job. It spawns a thread that enqueues the job in Redis, passing a serialized hash of user properties, including the ID.
When the mailer job runs in the Sidekiq process, it loads a fresh copy of user from the database. If the transaction containing your update! in the main Rails app has not yet finished committing, Sidekiq gets the old record from the database. So, it's a race condition.
(update! already wraps an implicit transaction around itself if there isn't one, so wrapping it in your own transaction is redundant, and doesn't help the race condition since nested ActiveRecord transactions commit only when the outermost transaction commits.)
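To illustrate the nesting point with a sketch (by default the inner transaction creates no savepoint, and nothing is visible to other processes until the outer COMMIT):
ActiveRecord::Base.transaction do
  user.update!(points: 1000) # update!'s implicit transaction joins this outer one
  # the Sidekiq process cannot see points = 1000 yet
end
# only here, after the outer COMMIT, is the new value visible elsewhere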
In a pinch, you could delay enqueuing the job with something hacky like .deliver_later(wait_until: 10.seconds.from_now), but your best bet is to put the mailer notification in an after_commit callback on your model.
class User < ApplicationRecord
  after_commit :send_points_mailer

  def send_points_mailer
    return unless previous_changes.include?(:points)
    UserMailer.notify(self).deliver_later
  end
end
A model's after_commit callbacks are guaranteed to run after the final transaction is committed, so, like nuking from orbit, it's the only way to be sure.
You didn't mention it, but I'm assuming you are using ActiveRecord? If so, you likely need to ensure the database transaction is committed before your Sidekiq job is scheduled.
https://api.rubyonrails.org/v6.1.4/classes/ActiveRecord/Transactions/ClassMethods.html
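A sketch of that idea, assuming this code is not itself nested inside a caller's transaction (otherwise nothing commits until the outermost block ends):
User.transaction do
  user.update!(points: 1000)
end
# The COMMIT has happened by this point, so the Sidekiq process
# will load a fresh user with points = 1000
UserMailer.notify(user).deliver_later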

Ruby with_advisory_lock test with multiple threads fails intermittently

I'm using the with_advisory_lock gem to try to ensure that a record is created only once.
I have the following code, which sits in an operation class that I wrote to handle creating user subscriptions:
def create_subscription_for user
  subscription = UserSubscription.with_advisory_lock("lock_%d" % user.id) do
    UserSubscription.where({ user_id: user.id }).first_or_create
  end
  # do more stuff on that subscription
end
and the accompanying test:
threads = []
user = FactoryBot.create(:user)

rand(5..10).times do
  threads << Thread.new do
    subject.create_subscription_for(user)
  end
end

threads.each(&:join)

expect(UserSubscription.count).to eq(1)
What I expect to happen:
The first thread to get to the block acquires the lock and creates a record.
Any other thread that gets to the block while it's being held by another thread waits indefinitely until the lock is released (as per docs)
As soon as the lock is released by the first thread that created the record, another thread acquires the lock and now finds the record because it was already created by the first thread.
What actually happens:
The first thread to get to the block acquires the lock and creates a record.
Any other thread that gets to the block while it's held by another thread executes the code in the block anyway. As a result, the test sometimes fails with an ActiveRecord::RecordNotUnique error (I have a unique index on the table that allows only a single user_subscription with the same user_id).
What is weirder is that if I add a sleep for a few hundred milliseconds in my method just before the first_or_create call, the test never fails:
def create_subscription_for user
  subscription = UserSubscription.with_advisory_lock("lock_%d" % user.id) do
    sleep 0.2
    UserSubscription.where({ user_id: user.id }).first_or_create
  end
  # do more stuff on that subscription
end
My questions are: "Why is adding the sleep 0.2 making the tests always pass?" and "Where do I look to debug this?"
Thanks!
UPDATE: Tweaking the tests a little bit causes them to always fail:
threads = []
user = FactoryBot.create(:user)

rand(5..10).times do
  threads << Thread.new do
    sleep
    subject.create_subscription_for(user)
  end
end

until threads.all? { |t| t.status == 'sleep' }
  sleep 0.1
end

threads.each(&:wakeup)
threads.each(&:join)

expect(UserSubscription.count).to eq(1)
I have also wrapped first_or_create in a transaction, which makes the test pass and everything work as expected:
def create_subscription_for user
  subscription = UserSubscription.with_advisory_lock("lock_%d" % user.id) do
    UserSubscription.transaction do
      UserSubscription.where({ user_id: user.id }).first_or_create
    end
  end
  # do more stuff on that subscription
end
So why is wrapping first_or_create in a transaction necessary to make things work?
Are you turning off transactional tests for this test case? I'm working on something similar, and that proved to be important for actually simulating the concurrency.
See uses_transaction https://api.rubyonrails.org/classes/ActiveRecord/TestFixtures/ClassMethods.html
If transactions are not turned off, Rails will wrap the entire test in a transaction and this will cause all the threads to share one DB connection. Furthermore, in Postgres a session-level advisory lock can always be re-acquired within the same session. From the docs:
"If a session already holds a given advisory lock, additional requests by it will always succeed, even if other sessions are awaiting the lock; this statement is true regardless of whether the existing lock hold and new request are at session level or transaction level."
Based on that I'm suspecting that your lock is always able to be acquired and therefore the .first_or_create call is always executed which results in the intermittent RecordNotUnique exceptions.
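For example, with rspec-rails the global switch lives in rails_helper.rb (a sketch; the exact setting name varies across rspec-rails versions and can also be toggled per test group):
# rails_helper.rb
RSpec.configure do |config|
  # Run examples against the real DB instead of inside a rolled-back
  # transaction, so each thread can use its own connection and its
  # own Postgres session (and thus its own advisory locks)
  config.use_transactional_fixtures = false
end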

Rails 4 - threading error

I am trying to perform some calculations to populate some historic data in the database.
The database is SQL Server. The server is tomcat (using JRuby).
I am running the script file in a rails console pointed to the uat environment.
I am trying to use threads to speed up the execution. The idea being that each thread would take an object and run the calculations for it, and save the calculated values back to the database.
Problem: I keep getting this error:
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5.000 seconds (waited 5.000 seconds))
code:
require 'thread'

threads = []
items_to_calculate = Item.where("id < 11").to_a # testing only 10 items for now

for item in items_to_calculate
  threads << Thread.new(item) { |myitem|
    my_calculator = ItemsCalculator.new(myitem)
    to_save = my_calculator.calculate_details
    to_save.each do |dt|
      dt.save!
    end
  }
end

threads.each { |aThread| aThread.join }
You're probably spawning more threads than ActiveRecord's DB connection pool has connections. Ekkehard's answer is an excellent general description, so here's a simple example of how to limit your workers using Ruby's thread-safe Queue.
require 'thread'

queue = Queue.new
items.each { |i| queue << i } # Fill the queue

Array.new(5) do # Only 5 concurrent workers
  Thread.new do
    until queue.empty?
      item = queue.pop
      ActiveRecord::Base.connection_pool.with_connection do
        # Work
      end
    end
  end
end.each(&:join)
I chose 5 because that's the ConnectionPool's default, but you can certainly tune that to the max that still works, or populate another queue with the result to save later and run an arbitrary number of threads for the calculation.
The with_connection method grabs a connection, runs your block, then ensures the connection is released. It's necessary because of a bug in ActiveRecord where the connection doesn't always get released otherwise.
You are potentially starting a huge number of threads at the same time if you leave the testing stage.
Each of these threads will need a DB connection. Either Rails creates a new one for every thread (possibly opening a huge number of DB connections at once), or it does not, in which case you'll run into trouble because several threads are trying to use the same connection in parallel. The first case would explain the error message, because there is probably a hard limit on open DB connections in your DB server.
Creating threads like this is usually not advisable. You're usually better off creating a handful (a controlled/limited number) of worker threads and using a queue to distribute work between them.
In your case, you could have a set of worker threads to do the calculations and a second set of worker threads to write to the DB. I do not know enough about the details of your code to decide which is better for you. If the calculation is expensive and the DB work is not, you will probably have only one worker writing to the DB in a serial fashion. If your DB is a beast, highly optimized for parallel writing, and you need to write a lot of data, then you may want a (small) number of DB workers. A sketch of that layout follows.
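Here is a rough sketch of the calculation-workers-plus-single-DB-writer layout, reusing the question's ItemsCalculator; the queue names, worker count, and sentinel value are made up:
require 'thread'

work_queue = Queue.new
save_queue = Queue.new
Item.where("id < 11").to_a.each { |item| work_queue << item }

# A few calculation workers...
calculators = Array.new(4) do
  Thread.new do
    until work_queue.empty?
      item = work_queue.pop(true) rescue break
      save_queue << ItemsCalculator.new(item).calculate_details
    end
  end
end

# ...and a single serial DB writer, so only one connection is needed
writer = Thread.new do
  loop do
    batch = save_queue.pop
    break if batch == :done
    ActiveRecord::Base.connection_pool.with_connection do
      batch.each(&:save!)
    end
  end
end

calculators.each(&:join)
save_queue << :done # tell the writer no more work is coming
writer.join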

Rails: How to execute one task per user in parallel?

I have one simple rake task that executes some action per user. Something like this:
task users_actions: :environment do
  User.all.each { |u|
    # Some actions here
  }
end
The problem is that it doesn't start on the next user until it has finished the current one. What I want is to execute these in parallel. How can I do that? Is it even possible?
Thanks,
If there were a good library available, it would be better to use it rather than implementing everything from scratch. concurrent-ruby has all kinds of utility classes for writing concurrent code, but I'm not sure if they have something suitable for this use case; anyway, I'll show you how to do it from scratch.
First pull in the thread library:
require 'thread'
Make a thread-safe queue, and stick all the users on it:
queue = Queue.new
User.all.each { |user| queue << user }
Start some number of worker threads, and make them process items from the queue until all are done.
threads = 5.times.collect do
  Thread.new do
    while true
      user = queue.pop(true) rescue break
      # do something with the user
      # this had better be thread-safe, or you will live to regret it!
    end
  end
end
Then wait until all the threads finish:
threads.each(&:join)
Again, please make sure that the code which processes each user is thread-safe! The power of multi-threading is in your hands, don't abuse it!
NOTE: If your user-processing code is very CPU-intensive, you might consider running this on Rubinius or JRuby, so that more than one thread can run Ruby code at the same time.
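For completeness, the concurrent-ruby gem mentioned above does have a fit here: a FixedThreadPool replaces the hand-rolled queue and workers (a sketch; the pool size is arbitrary):
require 'concurrent'

pool = Concurrent::FixedThreadPool.new(5)

User.all.each do |user|
  pool.post do
    # do something with the user -- this still has to be thread-safe!
  end
end

pool.shutdown             # stop accepting new work
pool.wait_for_termination # block until all queued work has finished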

Connection pool issue with ActiveRecord objects in rufus-scheduler

I'm using rufus-scheduler to run a number of frequent jobs that do various tasks with ActiveRecord objects. If there is any sort of network or postgresql hiccup, even after recovery, all the threads will throw the following error until the process is restarted:
ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5 seconds (waited 5.000122687 seconds). The max pool size is currently 5; consider increasing it.)
The error can easily be reproduced by restarting postgres. I've tried playing with the pool size (up to 15), but no luck there.
That leads me to believe the connections are just in a stale state, which I thought would be fixed with the call to clear_stale_cached_connections!.
Is there a more reliable pattern to do this?
The block that is passed is a simple select-and-update ActiveRecord call, and it happens no matter what the AR object is.
The rufus job:
scheduler.every '5s' do
  db do
    DataFeed.update # standard AR select/update
  end
end
wrapper:
def db(&block)
  begin
    ActiveRecord::Base.connection_pool.clear_stale_cached_connections!
    # ActiveRecord::Base.establish_connection # this didn't help either way
    yield block
  rescue Exception => e
    raise e
  ensure
    ActiveRecord::Base.connection.close if ActiveRecord::Base.connection
    ActiveRecord::Base.clear_active_connections!
  end
end
Rufus scheduler starts a new thread for every job.
ActiveRecord on the other hand cannot share connections between threads, so it needs to assign a connection to a specific thread.
When your thread doesn't have a connection yet, it will get one from the pool.
(If all connections in the pool are in use, it will wait until one is returned from another thread, eventually timing out and throwing a ConnectionTimeoutError.)
It is your responsibility to return the connection to the pool when you are done with it. In a Rails app this is done automatically, but if you are managing your own threads (as rufus does), you have to do it yourself.
Luckily, there is an API for this:
If you put your code inside a with_connection block, it will get a connection from the pool and release it when it is done:
ActiveRecord::Base.connection_pool.with_connection do
  # your code here
end
In your case:
def db
  ActiveRecord::Base.connection_pool.with_connection do
    yield
  end
end
Should do the trick....
http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html#method-i-with_connection
The reason can be that you have many threads using up all the connections: if the DataFeed.update method takes more than 5 seconds, your scheduled blocks can overlap.
try
scheduler.every("5s", :allow_overlapping => false) do
  # ...
end
Also try releasing the connection instead of closing it:
ActiveRecord::Base.connection_pool.release_connection
I don't really know rufus-scheduler, but I have some ideas.
The first problem could be a bug in rufus-scheduler that does not check out database connections properly. If that's the case, the only solution is to clear stale connections manually, as you already do, and to inform the author of rufus-scheduler about your issue.
Another problem could be that your DataFeed operation takes a really long time and, because it is performed every 5 seconds, Rails runs out of database connections; but that's rather unlikely.
