Rails cross model validation - ruby-on-rails

I have two tables, one for members and the other for employees. Both have an attribute called id_number; this attribute is not required and can be null.
Is it possible to run a validation to ensure the uniqueness of the id_number, so that adding an employee with the same id_number as a member (or vice versa) gives an error?
I am thinking of writing my own validation, but hitting the db for each instance will be very slow, as some companies upload tens of thousands of employees at a time.

Yes that's possible with your own validation. I think you have to hit the database, otherwise you never could check if it exists already.
def your_validation
  employee_ids = Employee.all.map(&:id_number)
  member_ids = Member.all.map(&:id_number)
  id = self.id_number
  if employee_ids.include?(id) || member_ids.include?(id)
    errors.add(:id_number, "is already taken")
  end
end
I think adding an index to your id_number will be good.
UPDATE: The above method could be changed to the following to improve performance:
def your_validation
  employee_ids = Employee.all.map(&:id_number)
  if employee_ids.include?(self.id_number)
    errors.add(:id_number, "is already taken")
  else
    member_ids = Member.all.map(&:id_number)
    if member_ids.include?(self.id_number)
      errors.add(:id_number, "is already taken")
    end
  end
end
The first one is cleaner, the second one should be faster. But check this out with a lot of db entries and a benchmark tool.

I think you'll want something like this:
def your_validation
  if self.id_number.present?
    if Employee.exists?(:id_number=>self.id_number) || Member.exists?(:id_number=>self.id_number)
      errors.add(:id_number, "is already taken")
    end
  end
end
If you have indices on the id_number columns, this check should run very quickly and is the same check that validates_uniqueness_of would use within a single table. Solutions that involve fetching all ids into Rails will start running into problems when the tables get large.
Another thing to note is that if your app runs multiple web server instances at a time these kinds of rails side checks can't 100% guarantee uniqueness as they are subject to races between threads. The only way to ensure uniqueness in such situations would be to use facilities built into your database or generate the id_numbers yourself from a source that precludes duplicates (such as a database sequence).
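For the bulk-upload case from the question, one option is to check a whole batch at once instead of one record at a time. The following is only a sketch in plain Ruby: the sets stand in for values that would come from the database (in a Rails app they might be fetched with one query per table, for example something like Employee.where(id_number: uploaded).pluck(:id_number), which is an assumption about your setup), and all sample values are made up.

```ruby
require "set"

# Stand-ins for id_numbers already stored in each table (hypothetical data).
existing_employee_ids = Set["A1", "A2"]
existing_member_ids   = Set["B1"]

# id_numbers from one upload; nil means "no id_number given".
uploaded = ["A2", "C1", nil, "B1", "C1", "C2"]

taken = existing_employee_ids | existing_member_ids
seen_in_upload = Set.new

# Blank values are skipped because id_number is optional.
rejected, accepted = uploaded.compact.partition do |id_number|
  # Set#add? returns nil when the value was already in the set,
  # which catches duplicates inside the upload itself.
  duplicate_in_upload = seen_in_upload.add?(id_number).nil?
  taken.include?(id_number) || duplicate_in_upload
end

puts "accepted: #{accepted.inspect}"
puts "rejected: #{rejected.inspect}"
```

This keeps the number of lookups independent of the batch size, but it is still an application-side check and subject to the same race described above.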

Related

How do I ensure correctness when using find_in_batches?

Currently my application has stat needs, and I set up a background job using rufus-scheduler that runs at 3:00 to batch process these records into a CacheStat table. It's just like any normal application's weekly/monthly stat needs.
I found that find_each (say, User.find_each to iterate over all users) invokes find_in_batches. I checked the Rails source code:
while records.any?
  records_size = records.size
  primary_key_offset = records.last.id
  yield records
  break if records_size < batch_size
  if primary_key_offset
    records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
  else
    raise "Primary key not included in the custom select clause"
  end
end
The implementation works by comparing the primary key. My concern is concurrency: while I am processing a batch, what if some records are inserted in between? Has anybody had this kind of problem?
Thinking about it, this implementation may not be a problem, because new records will always have a larger PK and will be found later, at the end.
So is this how this kind of requirement should be implemented? If I want to implement batch stat processing myself (without Rails), do I need to make sure there is an integer primary key and use that field for comparison (and preferably not other kinds of fields)?
(I am thinking about this because I'm in the middle of switching from MySQL to Mongo, so I may later need to implement this kind of functionality myself.)
If I understand correctly, you can ensure correctness here by enforcing transactional isolation, e.g.
User.transaction do
  User.find_each do |user|
    user
  end
end
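To make the primary-key batching idea from the question concrete, here is a toy version in plain Ruby. The array stands in for a table and each_in_batches is a hypothetical helper, not a Rails API. Unlike the Rails snippet quoted above, it keeps going until a batch comes back empty rather than stopping at the first short batch.

```ruby
# Toy model of primary-key batching over an in-memory "table".
def each_in_batches(table, batch_size)
  last_id = 0
  loop do
    # Equivalent of: WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = table.select { |row| row[:id] > last_id }
                 .sort_by { |row| row[:id] }
                 .first(batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]
  end
end

table = (1..5).map { |i| { id: i } }
seen = []
inserted = false

each_in_batches(table, 2) do |batch|
  seen.concat(batch.map { |row| row[:id] })
  # Simulate a concurrent insert while the first batch is being processed.
  unless inserted
    table << { id: 6 }
    inserted = true
  end
end

puts seen.inspect
```

Because the new row gets a larger id than anything already processed, a later batch picks it up. Rows that are updated after their batch has been processed are not revisited, and rows inserted after the last batch has been read are missed until the next run.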

Ruby on Rails - ActiveRecord::Relation count method is wrong?

I'm writing an application that allows users to send one another messages about an 'offer'.
I thought I'd save myself some work and use the Mailboxer gem.
I'm following a test driven development approach with RSpec. I'm writing a test that should ensure that only one Conversation is allowed per offer. An offer belongs_to two different users (the user that made the offer, and the user that received the offer).
Here is my failing test:
describe "after a message is sent to the same user twice" do
  before do
    2.times { sending_user.message_user_regarding_offer! offer, receiving_user, random_string }
  end
  specify { sending_user.mailbox.conversations.count.should == 1 }
end
So before the test runs a user sending_user sends a message to the receiving_user twice. The message_user_regarding_offer! looks like this:
def message_user_regarding_offer! offer, receiver, body
  conversation = offer.conversation
  if conversation.nil?
    self.send_message(receiver, body, offer.conversation_subject)
  else
    self.reply_to_conversation(conversation, body)
    # I put a binding.pry here to examine in console
  end
  offer.create_activity key: PublicActivityKeys.message_received, owner: self, recipient: receiver
end
On the first iteration in the test (when the first message is sent) the conversation variable is nil therefore a message is sent and a conversation is created between the two users.
On the second iteration the conversation created in the first iteration is returned and the user replies to that conversation, but a new conversation isn't created.
This all works, but the test fails and I cannot understand why!
When I place a pry binding in the code in the location specified above I can examine what is going on... now riddle me this:
self.mailbox.conversations[0] returns a Conversation instance
self.mailbox.conversations[1] returns nil
self.mailbox.conversations clearly shows a collection containing ONE object.
self.mailbox.conversations.count returns 2?!
What is going on there? the count method is incorrect and my test is failing...
What am I missing? Or is this a bug?!
EDIT
offer.conversation looks like this:
def conversation
  Conversation.where({subject: conversation_subject}).last
end
and offer.conversation_subject:
def conversation_subject
  "offer-#{self.id}"
end
EDIT 2 - Showing the first and second iteration in pry
Also...
Conversation.all.count returns 1!
and:
Conversation.all == self.mailbox.conversations returns true
and
Conversation.all.count == self.mailbox.conversations.count returns false
How can that be if the arrays are equal? I don't know what's going on here, blown hours on this now. Think it's a bug?!
EDIT 3
From the source of the Mailboxer gem...
def conversations(options = {})
  conv = Conversation.participant(@messageable)
  if options[:mailbox_type].present?
    case options[:mailbox_type]
    when 'inbox'
      conv = Conversation.inbox(@messageable)
    when 'sentbox'
      conv = Conversation.sentbox(@messageable)
    when 'trash'
      conv = Conversation.trash(@messageable)
    when 'not_trash'
      conv = Conversation.not_trash(@messageable)
    end
  end
  if (options.has_key?(:read) && options[:read]==false) || (options.has_key?(:unread) && options[:unread]==true)
    conv = conv.unread(@messageable)
  end
  conv
end
The reply_to_conversation code is available here -> http://rubydoc.info/gems/mailboxer/frames.
Just can't see what I'm doing wrong! Might rework my tests to get around this. Or ditch the gem and write my own.
See this: Rails 3: Difference between Relation.count and Relation.all.count
In short, Rails ignores the select columns (if there is more than one) when you apply count to the query. This is because SQL's COUNT accepts at most one column as a parameter.
From the Mailboxer code:
scope :participant, lambda {|participant|
  select('DISTINCT conversations.*').
  where('notifications.type'=> Message.name).
  order("conversations.updated_at DESC").
  joins(:receipts).merge(Receipt.recipient(participant))
}
self.mailbox.conversations.count ignores the select('DISTINCT conversations.*') and counts the join table with receipts, essentially counting number of receipts with duplicate conversations in it.
On the other hand, self.mailbox.conversations.all.count first gets the records applying the select, which gets unique conversations and then counts it.
self.mailbox.conversations.all == self.mailbox.conversations since both of them query the db with the select.
To solve your problem you can use sending_user.mailbox.conversations.all.count or sending_user.mailbox.conversations.group('conversations.id').length
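The difference between the two counts can be illustrated without a database. This is a toy sketch: the hashes below are made-up rows standing in for the result of joining conversations to receipts, where one conversation has two receipts for the same participant.

```ruby
# Hypothetical joined rows: one conversation (id 1) with two receipts.
joined_rows = [
  { conversation_id: 1, receipt_id: 10 },
  { conversation_id: 1, receipt_id: 11 }
]

# Roughly what a COUNT over the join sees once the DISTINCT select is dropped.
row_count = joined_rows.length

# Roughly what loading with DISTINCT and then counting in Ruby sees.
distinct_count = joined_rows.map { |row| row[:conversation_id] }.uniq.length

puts "count over join: #{row_count}"
puts "distinct conversations: #{distinct_count}"
```

The first number corresponds to what the failing test observed, the second to the single conversation that actually exists.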
I have tended to use the size method in my code. As per the ActiveRecord code, size will use a cached count if available and also returns the correct number when models have been created through relations and have not yet been saved.
# File activerecord/lib/active_record/relation.rb, line 228
def size
  loaded? ? @records.length : count
end
There is a blog on this here.
In Ruby, #length and #size are synonyms and both do the same thing: they tell you how many elements are in an array or hash. Technically #length is the method and #size is an alias to it.
In ActiveRecord, there are several ways to find out how many records are in an association, and there are some subtle differences in how they work.
post.comments.count - Determine the number of elements with an SQL COUNT query. You can also specify conditions to count only a subset of the associated elements (e.g. :conditions => {:author_name => "josh"}). If you set up a counter cache on the association, #count will return that cached value instead of executing a new query.
post.comments.length - This always loads the contents of the association into memory, then returns the number of elements loaded. Note that this won't force an update if the association had been previously loaded and then new comments were created through another way (e.g. Comment.create(...) instead of post.comments.create(...)).
post.comments.size - This works as a combination of the two previous options. If the collection has already been loaded, it will return its length just like calling #length. If it hasn't been loaded yet, it's like calling #count.
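The three behaviours described above can be mimicked with a small toy class. ToyAssociation and db_rows are hypothetical names for illustration only; this is not how ActiveRecord is implemented, just a model of the loaded/not-loaded distinction (counter caches are left out).

```ruby
class ToyAssociation
  def initialize(db_rows)
    @db_rows = db_rows   # plays the role of the database table
    @records = nil       # in-memory copy, filled on first load
  end

  def loaded?
    !@records.nil?
  end

  # Always asks the "database".
  def count
    @db_rows.length
  end

  # Loads the records once, then answers from memory.
  def length
    @records ||= @db_rows.dup
    @records.length
  end

  # Uses memory if already loaded, otherwise falls back to count.
  def size
    loaded? ? @records.length : count
  end
end

db_rows = [:comment_a]
comments = ToyAssociation.new(db_rows)

size_before_load = comments.size   # not loaded yet, behaves like count
comments.length                    # forces a load
db_rows << :comment_b              # a row created "another way"

results = [size_before_load, comments.count, comments.length, comments.size]
puts results.inspect
```

After the extra row is added behind the collection's back, count sees it while length and size keep answering from the stale in-memory copy.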
It is also worth mentioning to be careful if you are not creating models through associations, as the related model will not necessarily have those instances in its association proxy/collection.
# do this
mailbox.conversations.build(attrs)
# or this
mailbox.conversations << Conversation.new(attrs)
# or this
mailbox.conversations.create(attrs)
# or this
mailbox.conversations.create!(attrs)
# NOT this
Conversation.new(mailbox_id: some_id, ....)
I don't know if this explains what's going on, but the ActiveRecord count method queries the database for the number of records stored. The length of the Relation could be different, as discussed in http://archive.railsforum.com/viewtopic.php?id=6255, although in that example, the number of records in the database was less than the number of items in the Rails data structure.
Try
self.mailbox.conversations.reload; self.mailbox.conversations.count
or perhaps
self.mailbox.reload; self.mailbox.conversations.count
or, if neither of those work, just try reloading as many of the objects as possible to see if you can get it to work (self, mailbox, conversations, etc.).
My guess is that something is messed up between memory and the DB. This is definitely a really weird error though, might wanna put in an issue on Rails to see why this would be the case.
The result of mailbox.conversations is cached after the first call. To reload it write mailbox.conversations(true)

How to check a value from a DB table?

I want to create a simple check of a value from the database. Here is my code:
def check_user_name(name, email)
  db_name = Customers.find_by_name(name).to_s
  db_email = Customers.find_by_email(email).to_s
  if name == db_name && email == db_email
    return 'yes'
  else
    return 'no'
  end
end
But I always get the 'no' variant. Why?
Because you are calling to_s on your Customers model and not actually getting the name. The two fetch lines you have should be:
Customers.find_by_name(name).name.to_s # to_s probably not necessary if you know this field is a string
Customers.find_by_email(email).email
But, you're making two separate requests to the database. I don't know what the purpose of this is (as you could be selecting two different Customers) but you could do:
if Customers.where(name: name, email: email).exists?
  "yes"
else
  "no"
end
Since you are, however, selecting by name and email - I would highly recommend that you make sure those fields are indexed because large tables with those requests will bog the server and make that route rather slow (I would actually recommend that you pursue other routes that are more viable, but I wanted to mention this).
When you call Customers.find_by_name(name), you do not get the name of a customer. It actually returns an ActiveRecord object, so you need to read the name and email from that object, as below:
name = Customers.find_by_name(name).name
email = Customers.find_by_email(email).email
Now you will get the exact name and email of the matched data from the DB.
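The to_s pitfall is easy to reproduce outside Rails. In this sketch Customer is a plain Struct standing in for the model and CUSTOMERS is made-up data; the check matches name and email on the same record in one pass, similar in spirit to the where(...).exists? version above.

```ruby
# Plain-Ruby stand-in for the model (hypothetical data).
Customer = Struct.new(:name, :email)
CUSTOMERS = [Customer.new("Ann", "ann@example.com")]

def check_user_name(name, email)
  # Both fields must match on the same record.
  found = CUSTOMERS.any? { |c| c.name == name && c.email == email }
  found ? "yes" : "no"
end

record = CUSTOMERS.first
to_s_matches = (record.to_s == "Ann")   # string form of the object, not the name
name_matches = (record.name == "Ann")   # the attribute itself

puts to_s_matches
puts name_matches
puts check_user_name("Ann", "ann@example.com")
puts check_user_name("Ann", "bob@example.com")
```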

Is this a race condition issue in Rails 3?

Basically I have this User model which has certain attributes say 'health' and another Battle model which records all the fight between Users. Users can fight with one another and some probability will determine who wins. Both will lose health after a fight.
So in the Battle controller, 'CREATE' action I did,
@battle = Battle.attempt current_user.id, opponent.id
In the Battle model,
def self.attempt user_id, opponent_id
  battle = Battle.new({:user_id => user_id, :opponent_id => opponent_id})
  # all the math calculation here
  ...
  # Update Health
  ...
  battle.User.health = new_health
  battle.User.save
  battle.save
  return battle
end
Back to the Battle controller, I did ...
new_user_health = current_user.health
to get the new health value after the Battle. However the value I got is the old health value (the health value before the Battle).
Has anyone faced this kind of problem before?
UPDATE
I just add
current_user.reload
before the line
new_user_health = current_user.health
and that works. Problem solved. Thanks!
It appears that you are getting current_user, then updating battle.user and then expecting current_user to automatically have the updated values. This type of thing is possible using Rails' Identity Map but there are some caveats that you'll want to read up on first.
The problem is that even though the two objects are backed by the same data in the database, you have two objects in memory. To refresh the information, you can call current_user.reload.
As a side note, this wouldn't be classified as a race condition because you aren't using more than one process to modify/read the data. In this example, you are reading the data, then updating the data on a different object in memory. A race condition could happen if you were using two threads to access the same information at the same time.
Also, you should use battle.user, not battle.User like Wayne mentioned in the comments.
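The "two objects in memory, one row in the database" situation can be sketched in plain Ruby. STORE plays the database and ToyUser is a hypothetical stand-in for the model with find, save, and reload; none of this is Rails code.

```ruby
# Toy "database" with a single user row.
STORE = { 1 => { health: 100 } }

class ToyUser
  attr_accessor :health

  # Each call builds a fresh in-memory object from the stored row.
  def self.find(id)
    new(id, STORE[id][:health])
  end

  def initialize(id, health)
    @id = id
    @health = health
  end

  def save
    STORE[@id][:health] = @health
  end

  def reload
    @health = STORE[@id][:health]
    self
  end
end

current_user = ToyUser.find(1)   # first in-memory copy
battle_user  = ToyUser.find(1)   # second copy of the same row

battle_user.health = 80
battle_user.save

stale = current_user.health          # still the value read earlier
fresh = current_user.reload.health   # re-read from the store

puts "before reload: #{stale}, after reload: #{fresh}"
```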

validates_uniqueness_of failing on heroku?

In my User model, I have:
validates_uniqueness_of :fb_uid (I'm using facebook connect).
However, at times, I'm getting duplicate rows upon user sign up. This is Very Bad.
The creation time of the two records is within 100ms. I haven't been able to determine if it happens in two separate requests or not (heroku logging sucks and only goes back so far and it's only happened twice).
Two things:
Sometimes the request takes some time, because I query FB API for name info, friends, and picture.
I'm using bigint to store fb_uid (backend is postgres).
I haven't been able to replicate in dev.
Any ideas would be extremely appreciated.
The signin function
def self.create_from_cookie(fb_cookie, remote_ip = nil)
  return nil unless fb_cookie
  return nil unless fb_hash = authenticate_cookie(fb_cookie)
  uid = fb_hash["uid"].join.to_i
  #Make user and set data
  fb_user = FacebookUser.new
  fb_user.fb_uid = uid
  fb_user.fb_authorized = true
  fb_user.email_confirmed = true
  fb_user.creation_ip = remote_ip
  fb_name_data, fb_friends_data, fb_photo_data, fb_photo_ext = fb_user.query_data(fb_hash)
  return nil unless fb_name_data
  fb_user.set_name(fb_name_data)
  fb_user.set_photo(fb_photo_data, fb_photo_ext)
  #Save user and friends to the db
  return nil unless fb_user.save
  fb_user.set_friends(fb_friends_data)
  return fb_user
end
I'm not terribly familiar with facebook connect, but is it possible to get two of the same uuid if two separate users from two separate accounts post a request in very quick succession before either request has completed? (Otherwise known as a race condition) validates_uniqueness_of can still suffer from this sort of race condition, details can be found here:
http://apidock.com/rails/ActiveModel/Validations/ClassMethods/validates_uniqueness_of
Because this check is performed outside the database there is still a chance that duplicate values will be inserted in two parallel transactions. To guarantee against this you should create a unique index on the field. See add_index for more information.
You can really make sure this will never happen by adding a database constraint. Add this to a database migration and then run it:
add_index :users, :fb_uid, :unique => true
Now a user would get an error instead of being able to complete the request, which is usually preferable to generating illegal data in your database which you have to debug and clean out manually.
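The interleaving that defeats an application-level check, and the way a unique constraint stops it, can be sketched deterministically in plain Ruby. ToyTable and DuplicateKey are hypothetical names for illustration; in a real Rails app the database would raise the error on the second insert (surfaced as ActiveRecord::RecordNotUnique).

```ruby
require "set"

class DuplicateKey < StandardError; end

class ToyTable
  attr_reader :rows

  def initialize(unique:)
    @unique = unique
    @rows = []
    @index = Set.new
  end

  # Application-level check, like the SELECT behind a uniqueness validation.
  def taken?(uid)
    @rows.include?(uid)
  end

  # The insert itself; only the constraint can reject a duplicate here.
  def insert(uid)
    raise DuplicateKey, "duplicate #{uid}" if @unique && !@index.add?(uid)
    @rows << uid
  end
end

# Two requests interleaved: both run the check before either inserts.
def interleaved_signup(table, uid)
  ok_a = !table.taken?(uid)
  ok_b = !table.taken?(uid)
  table.insert(uid) if ok_a
  begin
    table.insert(uid) if ok_b
  rescue DuplicateKey
    :rejected
  end
end

plain = ToyTable.new(unique: false)
interleaved_signup(plain, 42)

guarded = ToyTable.new(unique: true)
second_result = interleaved_signup(guarded, 42)

puts "without constraint: #{plain.rows.inspect}"
puts "with constraint: #{guarded.rows.inspect} (second insert #{second_result})"
```

Both checks pass because neither insert has happened yet; only the table with the constraint ends up with a single row.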
From Ruby on Rails v3.0.5 Module ActiveRecord::Validations::ClassMethods
http://s831.us/dK6mFQ
Concurrency and integrity
Using this [validates_uniqueness_of] validation method in conjunction with ActiveRecord::Base#save does not guarantee the absence of duplicate record insertions, because uniqueness checks on the application level are inherently prone to race conditions. For example, suppose that two users try to post a Comment at the same time, and a Comment’s title must be unique. At the database-level, the actions performed by these users could be interleaved in the following manner: ...
It seems like there is some sort of race condition in your code. To check this, I would first change the code so that the Facebook values are extracted first, and only then create the new Facebook user object.
Then I would strongly suggest writing a test to check whether your function gets executed only once. It seems that it is executed twice.
On top of this, there seems to be a race condition while waiting for the Facebook results.
