validates_uniqueness_of failing on Heroku?

In my User model, I have:
validates_uniqueness_of :fb_uid (I'm using Facebook Connect).
However, at times, I'm getting duplicate rows upon user sign up. This is Very Bad.
The creation times of the two records are within 100ms of each other. I haven't been able to determine whether it happens in two separate requests or not (Heroku logging sucks: it only goes back so far, and this has only happened twice).
A few things:
Sometimes the request takes a while, because I query the FB API for name info, friends, and a picture.
I'm using bigint to store fb_uid (the backend is Postgres).
I haven't been able to replicate this in dev.
Any ideas would be extremely appreciated.
The signin function
def self.create_from_cookie(fb_cookie, remote_ip = nil)
  return nil unless fb_cookie
  return nil unless fb_hash = authenticate_cookie(fb_cookie)
  uid = fb_hash["uid"].join.to_i

  # Make user and set data
  fb_user = FacebookUser.new
  fb_user.fb_uid = uid
  fb_user.fb_authorized = true
  fb_user.email_confirmed = true
  fb_user.creation_ip = remote_ip
  fb_name_data, fb_friends_data, fb_photo_data, fb_photo_ext = fb_user.query_data(fb_hash)
  return nil unless fb_name_data
  fb_user.set_name(fb_name_data)
  fb_user.set_photo(fb_photo_data, fb_photo_ext)

  # Save user and friends to the db
  return nil unless fb_user.save
  fb_user.set_friends(fb_friends_data)
  return fb_user
end

I'm not terribly familiar with Facebook Connect, but is it possible to get two of the same uid if two separate users from two separate accounts post a request in very quick succession, before either request has completed? (Otherwise known as a race condition.) validates_uniqueness_of can still suffer from this sort of race condition; details can be found here:
http://apidock.com/rails/ActiveModel/Validations/ClassMethods/validates_uniqueness_of
Because this check is performed outside the database there is still a chance that duplicate values will be inserted in two parallel transactions. To guarantee against this you should create a unique index on the field. See add_index for more information.
You can really make sure this will never happen by adding a database constraint. Add this to a database migration and then run it:
add_index :users, :fb_uid, :unique => true
Now a user would get an error instead of being able to complete the request, which is usually preferable to writing invalid data into your database that you then have to debug and clean out manually.
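For example, a migration along these lines (a sketch; it assumes the conventional plural table name users) adds the constraint, and the save call can then trap the duplicate-key error and fall back to the row the other request created:

class AddUniqueIndexOnFbUid < ActiveRecord::Migration
  def self.up
    add_index :users, :fb_uid, :unique => true
  end

  def self.down
    remove_index :users, :fb_uid
  end
end

# In create_from_cookie, a concurrent duplicate insert now raises
# ActiveRecord::RecordNotUnique instead of silently creating a second row:
begin
  return nil unless fb_user.save
rescue ActiveRecord::RecordNotUnique
  return FacebookUser.find_by_fb_uid(uid) # the row the other request just created
end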

From Ruby on Rails v3.0.5 Module ActiveRecord::Validations::ClassMethods
http://s831.us/dK6mFQ
Concurrency and integrity
Using this [validates_uniqueness_of] validation method in conjunction with ActiveRecord::Base#save does not guarantee the absence of duplicate record insertions, because uniqueness checks on the application level are inherently prone to race conditions. For example, suppose that two users try to post a Comment at the same time, and a Comment's title must be unique. At the database level, the actions performed by these users could be interleaved in the following manner: ...

It seems like there is some sort of race condition in your code. To check this, I would first change the code so that the Facebook values are extracted first, and only then create the new user object.
Then I would highly suggest writing a test to check whether your function gets executed only once; it looks like it is being executed twice.
On top of that, there appears to be a race condition while waiting for the Facebook results.

Related

how to lock record in RoR

I'm developing a reservation system for stuff that you can rent.
I would like to prevent multiple users from reserving the same item.
I display a list, from which a user can click an item to check its details.
If one user has already opened the detail view, no other user should be able to open it at the same time.
I maintain a flag called is_lock to check whether the record is already locked, but I ran into issues when multiple users clicked the same item at the same time.
So I implemented a pessimistic lock, which reduced how often multiple users could open the same item, but it did not completely fix the issue; I am still seeing it.
begin
  Item.transaction do
    item = Item.lock.where(id: item_id, is_lock: false).first
    item.is_lock = true
    item.save!
  end
rescue Exception => e
  # Something went wrong.
end
Above is the code that I have implemented.
Please let me know if I am doing anything wrong.
EDIT:
I've tried the solution provided by @rmlockerd in the following way:
1. Run rails in 2 separate consoles.
2. From console-1, fetch the record with id: 100, taking the lock.
3. Try to fetch the same record from console-2.
But the test failed: I was able to fetch the same record from both consoles, even though the record was locked from console-1.
It might be misleading to judge from just the snippet you provided, but there does seem to be a possible race condition due to your .where predicate.
If User2 attempts to get a lock on the same item after User1 but before the first transaction commits, the .where will still return the original record with is_lock false. The default behaviour of .lock is to simply wait its turn for the lock. So User2 would block until the original transaction commits, then get the lock and proceed to set is_lock to true as well.
The good news is that when you get a lock, Rails reloads the record so you are getting the latest data. Checking is_lock after obtaining the lock should eliminate that race condition, like so:
Item.transaction do
  item = Item.lock.find_by(id: item_id, is_lock: false) # only 1, so where is unnecessary
  return if item.blank? || item.is_lock
  item.update!(is_lock: true)
end
# I have the lock; do stuff...
The .lock method also takes an optional 'locking clause' -- which varies based on the database you use -- that can be used to configure the locking behaviour. For example, if you use Postgres, you could do:
Item.transaction do
  item = Item.lock('FOR UPDATE SKIP LOCKED').find_by(id: item_id, is_lock: false)
  return if item.blank?
  item.update!(is_lock: true)
end
The SKIP LOCKED clause directs Postgres to automatically skip any record that is already locked. In the race condition described above, the second call to .lock would bail immediately and return nil, so a simple check of item presence would suffice. Check out the Postgres or MySQL documentation if you're interested in database-specific locking clauses.
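A related option (a sketch, assuming Postgres; MySQL 8.0+ supports the same clause) is NOWAIT, which makes the database raise an error immediately instead of blocking when the row is already locked:

begin
  Item.transaction do
    item = Item.lock('FOR UPDATE NOWAIT').find_by(id: item_id, is_lock: false)
    return if item.blank?
    item.update!(is_lock: true)
  end
rescue ActiveRecord::LockWaitTimeout
  # another transaction holds the row lock; recent Rails maps the database
  # error to this class -- verify the mapping on your Rails version
end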

How to prevent parallel Sidekiq jobs from executing code in Rails

I have around 10 workers that perform a job that includes the following:
user = User.find_or_initialize_by(email: 'some-email@address.com')
if user.new_record?
  # ... some code here that does something taking around 5 seconds or so
elsif user.persisted?
  # ... some code here that does something taking around 5 seconds or so
end
user.save
The problem is that at certain times, two or more workers run this code at the exact same time, and thus I later found two or more Users with the same email, when I should always end up with unique emails.
It is not possible in my situation to create a DB unique index on email, because email uniqueness is conditional -- some Users should have unique emails, some should not.
It is worth mentioning that my User model has uniqueness validations, but they still don't help, because between .find_or_initialize_by and .save there is code whose behaviour depends on whether the user object has already been created.
I tried pessimistic and optimistic locking, but neither helped me -- or maybe I just didn't implement them properly. I'd welcome suggestions on that.
The only solution I can think of is to block the other threads (Sidekiq jobs) while these lines of code are being executed, but I'm not sure how to implement this, nor whether it's even an advisable approach.
I would appreciate any help.
EDIT
In my specific case, it is going to be hard to put the email parameter in the job, as this job is a little more complex than what was described above. The job is actually an export script, and the code above is just one section of it. I don't think it's possible to separate that functionality into another worker either, as the whole job flow should be serial; no parts should be processed in parallel or asynchronously. This job is just one of the jobs managed by another job, which is ultimately managed by the master job.
Pessimistic locking is what you want but only works on a record that exists - you can't use it with new_record? because there's nothing to lock in the DB yet.
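To illustrate (a sketch): pessimistic locking issues a SELECT ... FOR UPDATE, which needs an existing row to lock.

# A persisted row can be locked:
user = User.find_by(email: 'someone@example.com')
user.with_lock do
  # this transaction holds a row lock until it commits
end

# A new record has no row behind it:
user = User.new(email: 'someone@example.com')
user.with_lock do
  # nothing to lock; depending on the Rails version this is a silent no-op
  # or an error, so it cannot serialize concurrent creates
end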
I managed to solve my problem with the following:
I found out that I can add a where clause to a Rails unique index (a partial index), and thus I can now enforce uniqueness conditions for the different types of Users at the database level; concurrent jobs will now raise an ActiveRecord::RecordNotUnique error if the record has already been created.
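For example (a sketch; the requires_unique_email column is hypothetical, standing in for whatever condition distinguishes the user types):

class AddPartialUniqueIndexOnUsersEmail < ActiveRecord::Migration[5.2]
  def change
    # Postgres partial index: uniqueness is enforced only for matching rows
    add_index :users, :email, unique: true, where: "requires_unique_email = true"
  end
end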
That leaves the code between .find_or_initialize_by and .save. It is timing-dependent: only one concurrent job should ever see .new_record? == true, and the other jobs should see .persisted? == true, since one job will always be the first to create the record. On its own that doesn't work, because the database uniqueness check only fires at the .save line. So I moved .save above those conditions, and wrapped it in a rescue block that re-enqueues the job whenever .save raises ActiveRecord::RecordNotUnique, so that concurrent jobs don't conflict. The code now looks like this (note that new_record? must be captured before the save, since it is always false afterwards):
user = User.find_or_initialize_by(email: 'some-email@address.com')
is_new_record = user.new_record?

begin
  user.save
rescue ActiveRecord::RecordNotUnique => exception
  # another job created this user first; re-enqueue and bail out
  MyJob.perform_later(params_hash)
  return
end

if is_new_record
  # do something if the record was just created
else
  # do something if it already existed
end
I would suggest a different architecture that bypasses the problem.
How about a producer-worker model, where one master Sidekiq job gets the list of email addresses and then enqueues a worker job for each email? Sidekiq makes this easy with a dedicated queue for the master and workers to communicate.
Doing so, the email address becomes an input parameter of the workers, so we know by construction that workers will not stomp on each other's data.
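A minimal sketch of that shape (the class names and fetch_email_list are hypothetical):

class MasterJob
  include Sidekiq::Worker

  def perform
    # each address goes to exactly one worker, so no two workers
    # can race on the same email
    fetch_email_list.each { |email| EmailWorker.perform_async(email) }
  end
end

class EmailWorker
  include Sidekiq::Worker

  def perform(email)
    user = User.find_or_initialize_by(email: email)
    # ... the slow per-user work from the question runs here,
    # serially for any given email
    user.save
  end
end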

Rails cross model validation

I have two tables, one for members and the other for employees. Both have an attribute called id_number; this attribute is not required and can be null.
Is it possible to run a validation to ensure the uniqueness of the id_number across both tables, so that adding an employee with the same id_number as a member (or vice versa) gives an error?
I am thinking of writing my own validation, but hitting the DB for each instance would be very slow, as some companies upload tens of thousands of employees at a time.
Yes, that's possible with your own validation. I think you have to hit the database; otherwise you could never check whether the value already exists.
def your_validation
  employee_ids = Employee.all.map(&:id_number)
  member_ids = Member.all.map(&:id_number)
  id = self.id_number
  if employee_ids.include?(id) || member_ids.include?(id)
    errors.add(:id_number, "is already taken")
  end
end
I think adding an index on id_number would also be good.
UPDATE: The above method could be changed to the following to improve performance:
def your_validation
  employee_ids = Employee.all.map(&:id_number)
  if employee_ids.include?(self.id_number)
    errors.add(:id_number, "is already taken")
  else
    member_ids = Member.all.map(&:id_number)
    if member_ids.include?(self.id_number)
      errors.add(:id_number, "is already taken")
    end
  end
end
The first one is cleaner, the second one should be faster. But check this with a lot of DB entries and a benchmark tool.
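A quick way to compare the two (a sketch using Ruby's standard Benchmark library; valid? triggers the custom validation):

require 'benchmark'

record = Employee.new(:id_number => 12345)
puts Benchmark.measure { 1_000.times { record.valid? } }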
I think you'll want something like this:
def your_validation
  if self.id_number.present?
    if Employee.exists?(:id_number => self.id_number) || Member.exists?(:id_number => self.id_number)
      errors.add(:id_number, "is already taken")
    end
  end
end
If you have indexes on the id_number columns, this check should run very quickly; it is the same check that validates_uniqueness_of would use within a single table. Solutions that involve fetching all the ids into Rails will start running into problems when the tables get large.
Another thing to note: if your app runs multiple web server instances at a time, these Rails-side checks can't 100% guarantee uniqueness, as they are subject to races between threads. The only way to ensure uniqueness in such situations is to use facilities built into your database, or to generate the id_numbers yourself from a source that precludes duplicates (such as a database sequence).
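For instance, a shared Postgres sequence (a sketch, assuming Postgres) hands out each number exactly once across both tables:

class CreateSharedIdNumberSequence < ActiveRecord::Migration
  def up
    execute "CREATE SEQUENCE shared_id_numbers"
  end

  def down
    execute "DROP SEQUENCE shared_id_numbers"
  end
end

# When creating an Employee or a Member, pull the next value from the
# sequence instead of accepting an arbitrary id_number:
record.id_number = ActiveRecord::Base.connection.select_value(
  "SELECT nextval('shared_id_numbers')"
)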

Database lock not working as expected with Rails & Postgres

I have the following code in a rails model:
foo = Food.find(...)
foo.with_lock do
  if bar = foo.bars.find_by_stuff(stuff)
    # do something with bar
  else
    bar = foo.bars.create!
    # do something with bar
  end
end
The goal is to make sure that a Bar of the type being created is not being created twice.
Testing with_lock at the console confirms my expectations. However, in production it seems that, in some or all cases, the lock is not working as expected and the redundant Bar creation is being attempted -- so the with_lock doesn't (always?) make the code wait its turn.
What could be happening here?
update
So sorry to everyone who was saying "locking foo won't help you"!! My example initially didn't have the bar lookup; this is fixed now.
You're confused about what with_lock does. From the fine manual:
with_lock(lock = true)
Wraps the passed block in a transaction, locking the object before yielding. You can pass the SQL locking clause as an argument (see lock!).
If you check what with_lock does internally, you'll see that it is little more than a thin wrapper around lock!:
lock!(lock = true)
Obtain a row lock on this record. Reloads the record to obtain the requested lock.
So with_lock is simply doing a row lock and locking foo's row.
Don't bother with all this locking nonsense. The only sane way to handle this sort of situation is with a unique constraint in the database; nothing but the database can ensure uniqueness (unless you want to do absurd things like locking whole tables). Then just go ahead and blindly try your INSERT or UPDATE, and trap and ignore the exception that will be raised when the unique constraint is violated.
The correct way to handle this situation is actually right in the Rails docs:
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/find_or_create_by
begin
  CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
  retry
end
("find_or_create_by" is not atomic, its actually a find and then a create. So replace that with your find and then create. The docs on this page describe this case exactly.)
Why don't you use a unique constraint? It's made for uniqueness.
One reason a lock might not work in a Rails app is the query cache.
If you try to obtain an exclusive lock on the same row multiple times in a single request, the query cache kicks in, so subsequent locking queries never reach the DB itself.
The issue has been reported on GitHub.
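If you hit that, one workaround (a sketch; worth verifying against your Rails version) is to run the locking read outside the query cache:

ActiveRecord::Base.uncached do
  foo.with_lock do
    # the locking SELECT is sent to the database even if the same
    # query already ran (and was cached) earlier in this request
  end
end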

Is this a race condition issue in Rails 3?

Basically I have a User model with certain attributes, say 'health', and a Battle model which records all the fights between Users. Users can fight one another, and some probability determines who wins. Both lose health after a fight.
So in the Battle controller's create action, I did:
@battle = Battle.attempt current_user.id, opponent.id
In the Battle model:
def self.attempt(current_user_id, opponent_id)
  battle = Battle.new({:user_id => current_user_id, :opponent_id => opponent_id})
  # all the math calculation here
  ...

  # Update health
  ...
  battle.User.health = new_health
  battle.User.save

  battle.save
  return battle
end
Back to the Battle controller, I did:
new_user_health = current_user.health
to get the new health value after the Battle. However, the value I got was the old health value (the health value from before the Battle).
Has anyone faced this kind of problem before?
UPDATE
I just added
current_user.reload
before the line
new_user_health = current_user.health
and that works. Problem solved. Thanks!
It appears that you are getting current_user, then updating battle.user, and then expecting current_user to automatically have the updated values. This type of thing is possible using Rails' Identity Map, but there are some caveats that you'll want to read up on first.
The problem is that even though the two objects are backed by the same data in the database, you have two objects in memory. To refresh the information, you can call current_user.reload.
As a side note, this wouldn't be classified a race condition because you aren't using more than one process to modify/read the data. In this example, you are reading the data, then updating the data on a different object in memory. A race condition could happen if you were using two threads to access the same information at the same time.
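To make the two-objects-one-row situation concrete (a sketch):

a = User.find(1)
b = User.find(1)        # same row, but a separate object in memory
b.update_attribute(:health, 90)

a.health                # still the value loaded when `a` was fetched
a.reload.health         # => 90, re-read from the database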
Also, you should use battle.user, not battle.User like Wayne mentioned in the comments.
