I'm working on a Rails app (Ruby 1.9.2 / Rails 3.0.3) that keeps track of people and their memberships to different teams over time. I'm having trouble coming up with a scalable way to combine duplicate Person objects. By 'combine' I mean to delete all but one of the duplicate Person objects and update all references to point to the remaining copy of that Person. Here's some code:
Models:
Person.rb
class Person < ActiveRecord::Base
has_many :rostered_people, :dependent => :destroy
has_many :rosters, :through => :rostered_people
has_many :crews, :through => :rosters
def crew(year = Time.now.year)
all_rosters = RosteredPerson.find_all_by_person_id(id).collect {|t| t.roster_id}
r = Roster.find_by_id_and_year(all_rosters, year)
r and r.crew
end
end
Crew.rb
class Crew < ActiveRecord::Base
has_many :rosters
has_many :people, :through => :rosters
end
Roster.rb
class Roster < ActiveRecord::Base
has_many :rostered_people, :dependent => :destroy
has_many :people, :through => :rostered_people
belongs_to :crew
end
RosteredPerson.rb
class RosteredPerson < ActiveRecord::Base
belongs_to :roster
belongs_to :person
end
Person objects can be created with just a first and last name, but they have one truly unique field called iqcs_num (think of it like a social security number) which can be optionally stored on either the create or update actions.
So within the create and update actions, I would like to implement a check for duplicate Person objects, delete the duplicates, then update all of the crew and roster references to point to the remaining Person.
Would it be safe to use .update_all on each model? That seems kind of brute force, especially since I will probably add more models in the future that depend on Person and I don't want to have to remember to maintain the find_duplicate function.
Thanks for the help!
The 'scalable' way to deal with this is to make the de-duplication process part of the app's normal function - whenever you save a record, make sure it's not a duplicate. You can do this by adding a callback to the Person model. Perhaps something like this:
before_save :check_for_duplicate
def check_for_duplicate
  return true if iqcs_num.blank?
  existing = Person.find_by_iqcs_num(iqcs_num)
  return true if existing.nil? || existing.id == id
  # move associated objects to the existing record; crews is a nested
  # :through association and cannot be assigned, so re-point the join rows
  RosteredPerson.update_all({:person_id => existing.id}, {:person_id => id}) unless new_record?
  # update existing record
  existing.update_attributes(:name => name, :other_field => other_field)
  # delete this record
  destroy unless new_record?
  # return false, so that the save is halted and no other callbacks get triggered
  false
end
You'll want to make sure that you index the table you store Person objects in on the iqcs_num column, so that this lookup stays efficient as the number of records grows - it's going to be performed every time you update a Person record, after all.
I don't know that you can get out of keeping the callback up-to-date - it's entirely likely that different sorts of associated objects will have to be moved differently. On the other hand, it only exists in one place, and it's the same place you'd be adding the associations anyway - in the model.
Finally, to make sure your code is working, you'll probably want to add a validation on the Person model that prevents duplicates from existing. Something like:
validates :iqcs_num, :uniqueness => true, :allow_nil => true
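To make the merge step concrete without a database, here is a minimal in-memory sketch in plain Ruby. It is not ActiveRecord code: PersonRow, RosteredPersonRow and merge_duplicates are hypothetical names standing in for the tables and the callback logic, and keeping the lowest id as the survivor is an assumption.

```ruby
# In-memory stand-ins for the people and rostered_people tables (hypothetical).
PersonRow = Struct.new(:id, :name, :iqcs_num)
RosteredPersonRow = Struct.new(:person_id, :roster_id)

# Merges people sharing an iqcs_num into the lowest-id record and
# re-points join rows at the survivor. Returns the surviving people.
def merge_duplicates(people, rostered_people)
  survivors = []
  keeper_by_num = {}
  people.sort_by(&:id).each do |person|
    keeper = person.iqcs_num && keeper_by_num[person.iqcs_num]
    if keeper
      # Re-point every reference from the duplicate to the keeper.
      rostered_people.each do |rp|
        rp.person_id = keeper.id if rp.person_id == person.id
      end
    else
      keeper_by_num[person.iqcs_num] = person if person.iqcs_num
      survivors << person
    end
  end
  # Drop join rows that became identical after re-pointing.
  rostered_people.uniq! { |rp| [rp.person_id, rp.roster_id] }
  survivors
end

people = [
  PersonRow.new(1, "Ann Lee", "A1"),
  PersonRow.new(2, "Ann Lee", "A1"), # duplicate of id 1
  PersonRow.new(3, "Bob Ray", nil)   # no iqcs_num, never merged
]
joins = [
  RosteredPersonRow.new(1, 10),
  RosteredPersonRow.new(2, 11),
  RosteredPersonRow.new(2, 10)
]
remaining = merge_duplicates(people, joins)
puts remaining.map(&:id).inspect
puts joins.map(&:to_a).inspect
```

People without an iqcs_num are never merged, which matches the idea that only that field is truly unique.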
Related
I'm a Rails beginner, and I was coding a simple app to practice the language and other things.
In my app, I have three different scaffolds generated, one for People, one for House Activities and one last to link them together called Assignments. It's a many to many dependency situation.
So I was trying to calculate the total time a person would have to spend doing all the house activities assigned to them and store it inside the Person in an attribute called "time_allocated". So if I have two activities assigned to someone, it would return the sum of the duration of those activities.
After searching, I discovered that creating an attribute with three dependencies is not a good idea, but I don't know how to do it any other way.
These are the models and the things that I tried to do:
Person Model
class Person < ActiveRecord::Base
has_many :assignments, dependent: :destroy
has_many :house_activities, through: :assignments
extend FriendlyId
friendly_id :name, use: :slugged
end
House Activity Model
class HouseActivity < ActiveRecord::Base
has_many :assignments, dependent: :destroy
has_many :people, through: :assignments
extend FriendlyId
friendly_id :name, use: :slugged
end
Assignment Model
class Assignment < ActiveRecord::Base
belongs_to :person
belongs_to :house_activity
def self.time_allocation #fulltime
Assignment.all.each do |assignment|
if (assignment.person.time_allocation.present?)
assignment.person.time_allocation += assignment.house_activity.duration
else
assignment.person.time_allocation = assignment.house_activity.duration
end
end
end
end
If I understand correctly, you're trying to get the sum of the durations of all of a Person's house_activities. You can get this directly from the database using Rails' ActiveRecord::Calculations#sum method:
person = Person.find(123)
puts person.house_activities.sum(:duration)
# => 500
Of course, you could create a helper method for this as well:
class Person < ActiveRecord::Base
# ...
def total_activities_duration
house_activities.sum(:duration)
end
end
person = Person.find(123)
puts person.total_activities_duration
# => 500
I would advise against storing this sum in the database, because then you have to ensure its consistency (e.g. every time an Assignment is created, edited, or deleted, you have to ensure that the associated Person is updated with the new sum). You might think that calculating the sum anew every time will slow down your app, and it may at some time in the future when you have thousands of records, but there's no need to optimize this unless and until an actual performance problem arises.
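The consistency problem with a stored sum can be illustrated with a small plain-Ruby sketch. The Activity and Assignment structs and the total_duration helper are hypothetical stand-ins for the models, not Rails code.

```ruby
# Hypothetical in-memory stand-ins for HouseActivity and Assignment.
Activity = Struct.new(:name, :duration)
Assignment = Struct.new(:person, :activity)

# Computes the total on demand, the way house_activities.sum(:duration) would.
def total_duration(assignments, person)
  assignments
    .select { |a| a.person == person }
    .map { |a| a.activity.duration }
    .inject(0, :+)
end

dishes = Activity.new("Dishes", 20)
laundry = Activity.new("Laundry", 45)

assignments = [Assignment.new("alice", dishes)]
stored = total_duration(assignments, "alice") # a copy saved at this moment

assignments << Assignment.new("alice", laundry)
fresh = total_duration(assignments, "alice")  # recomputed when needed

puts "stored copy: #{stored}, recomputed: #{fresh}"
```

The stored copy is already out of date after a single new assignment, while the recomputed value cannot drift.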
I have three activerecord classes: Klass, Reservation and Certificate
A Klass can have many reservations, and each reservation may have one Certificate
The definitions are as follows...
class Klass < ActiveRecord::Base
has_many :reservations, dependent: :destroy, :autosave => true
has_many :certificates, through: :reservations
attr_accessible :name
def kill_certs
begin
p "In Kill_certs"
self.certificates.destroy_all
p "After Destroy"
rescue Exception => e
p "In RESCUE!"
p e.message
end
end
end
class Reservation < ActiveRecord::Base
belongs_to :klass
has_one :certificate, dependent: :destroy, autosave: true
attr_accessible :klass_id, :name
end
class Certificate < ActiveRecord::Base
belongs_to :reservation
attr_accessible :name
end
I would like to be able to delete/destroy all the certificates for a particular klass within the klass controller with a call to Klass#kill_certs (above)
However, I get an exception with the message:
"In RESCUE!"
"Cannot modify association 'Klass#certificates' because the source
reflection class 'Certificate' is associated to 'Reservation' via :has_one."
I've also tried changing the reservation class to "has_many :certificates", and then the error is...
"In RESCUE!"
"Cannot modify association 'Klass#certificates' because the source reflection
class 'Certificate' is associated to 'Reservation' via :has_many."
It's strange that I can do Klass.first.certificates from the console and the certs from the first class are retrieved, but I can't do Klass.first.certificates.delete_all without raising an error. Am I missing something?
Is this the only way to do it?
Klass.first.reservations.each do |res|
res.certificate.destroy
end
Thanks for any help.
The Rails docs have a clear explanation for this:
Deleting from associations
What gets deleted?
There is a potential pitfall here: has_and_belongs_to_many and
has_many :through associations have records in join tables, as well as
the associated records. So when we call one of these deletion methods,
what exactly should be deleted?
The answer is that it is assumed that deletion on an association is
about removing the link between the owner and the associated
object(s), rather than necessarily the associated objects themselves.
So with has_and_belongs_to_many and has_many :through, the join
records will be deleted, but the associated records won’t.
This makes sense if you think about it: if you were to call
post.tags.delete(Tag.find_by(name: 'food')) you would want the ‘food’
tag to be unlinked from the post, rather than for the tag itself to be
removed from the database.
However, there are examples where this strategy doesn’t make sense.
For example, suppose a person has many projects, and each project has
many tasks. If we deleted one of a person’s tasks, we would probably
not want the project to be deleted. In this scenario, the delete
method won’t actually work: it can only be used if the association on
the join model is a belongs_to. In other situations you are expected
to perform operations directly on either the associated records or the
:through association.
With a regular has_many there is no distinction between the
“associated records” and the “link”, so there is only one choice for
what gets deleted.
With has_and_belongs_to_many and has_many :through, if you want to
delete the associated records themselves, you can always do something
along the lines of person.tasks.each(&:destroy).
So you can do this:
self.certificates.each(&:destroy)
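The idea of operating directly on the associated records, rather than through the association, can be sketched in plain Ruby. The structs and destroy_certificates_for below are hypothetical in-memory stand-ins; in Rails a rough equivalent might be Certificate.where(:reservation_id => klass.reservation_ids).destroy_all, which is an assumption about your schema rather than something taken from the question.

```ruby
# Hypothetical in-memory stand-ins for the reservations and certificates tables.
Reservation = Struct.new(:id, :klass_id)
Certificate = Struct.new(:id, :reservation_id)

# Removes every certificate that belongs to a reservation of the given klass.
# Reservations themselves are left untouched.
def destroy_certificates_for(klass_id, reservations, certificates)
  reservation_ids = reservations.select { |r| r.klass_id == klass_id }.map(&:id)
  certificates.reject! { |c| reservation_ids.include?(c.reservation_id) }
  certificates
end

reservations = [Reservation.new(1, 7), Reservation.new(2, 7), Reservation.new(3, 8)]
certificates = [Certificate.new(100, 1), Certificate.new(101, 2), Certificate.new(102, 3)]

destroy_certificates_for(7, reservations, certificates)
puts certificates.map(&:id).inspect
```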
Relationships
class Promotion < ActiveRecord::Base
has_many :promotion_sweepstakes
has_many :sweepstakes,
:through => :promotion_sweepstakes
end
class PromotionSweepstake < ActiveRecord::Base
belongs_to :promotion
belongs_to :sweepstake
end
class Sweepstake < ActiveRecord::Base
# Not relevant in this question, but I included the class
end
So a Promotion has_many Sweepstake through the join table PromotionSweepstake. This is a legacy db schema, so the naming might seem a bit odd, and some self.table_name = and foreign_key settings are left out.
The nature of this app demands that at least one entry in the join table is present for a promotionId, because not having a sweepstake would break the app.
First question
How can I guarantee that there is always one entry in PromotionSweepstake for a Promotion? At least one Sweepstake.id has to be included upon creation, and once an entry in the join table is created there has to be a minimum of one for each Promotion/promotion_id.
Second question (other option)
If the previous suggestion would not be possible, which I doubt is true, there's another way the problem can be worked around. There's a sort of "default Sweepstake" with a certain id. If through a form all the sweepstake_ids would be removed (so that all entries for the Promotion in the join table would be deleted), can I create a new entry in PromotionSweepstake?
pseudo_code (sort of)
delete promotion_sweepstake with ids [1, 4, 5] where promotion_id = 1
if promotion with id=1 has no promotion_sweepstakes
add promotion_sweepstake with promotion_id 1 and sweepstake_id 100
end
Thank you for your help.
A presence validation should solve the problem in case of creation and modification of Promotions.
class Promotion < ActiveRecord::Base
has_many :promotion_sweepstakes
has_many :sweepstakes,
:through => :promotion_sweepstakes
validates :sweepstakes, :presence => true
end
In order to assure consistency when there's an attempt to delete or update a Sweepstake or a PromotionSweepstake you'd have to write your own validations for those two classes. They would have to check whether previously referenced Promotions are still valid, i.e. still have some Sweepstakes.
A simple solution would take advantage of validates :sweepstakes, :presence => true in Promotion. After updating referenced PromotionSweepstakes or Sweepstakes in a transaction, you would have to call Promotion#valid? on previously referenced Promotions. If they're not valid, you roll back the transaction, as the modification broke the consistency.
Alternatively you could use before_destroy in both PromotionSweepstake and Sweepstake in order to prevent changes violating your consistency requirements.
class PromotionSweepstake < ActiveRecord::Base
belongs_to :promotion
belongs_to :sweepstake
before_destroy :check_for_promotion_on_destroy
private
def check_for_promotion_on_destroy
raise 'deleting the last sweepstake' if promotion.sweepstakes.count == 1
end
end
class Sweepstake < ActiveRecord::Base
has_many :promotion_sweepstakes
has_many :promotions, :through => :promotion_sweepstakes
before_destroy :check_for_promotions_on_destroy
private
def check_for_promotions_on_destroy
promotions.each do |prom|
raise 'deleting the last sweepstake' if prom.sweepstakes.count == 1
end
end
end
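For the second question (falling back to a default Sweepstake), the pseudo-code from the question can be sketched as a small plain-Ruby function. The remove_sweepstakes name is hypothetical, the id 100 is taken from the pseudo-code, and the function works on arrays of ids rather than join records.

```ruby
# Default sweepstake id, taken from the pseudo-code in the question.
DEFAULT_SWEEPSTAKE_ID = 100

# Returns the sweepstake ids a promotion should keep after a removal.
# If the removal would leave none, the default sweepstake is used instead.
def remove_sweepstakes(current_ids, ids_to_remove, default_id = DEFAULT_SWEEPSTAKE_ID)
  remaining = current_ids - ids_to_remove
  remaining.empty? ? [default_id] : remaining
end

puts remove_sweepstakes([1, 4, 5], [1, 4, 5]).inspect
puts remove_sweepstakes([1, 4, 5], [4]).inspect
```

In a Rails app, the same check would probably live in a callback or in the controller action that handles the form, inside a transaction so the promotion is never observed with zero join rows.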
In my project, I have a self-referential association.
I have a User model:
class User < ActiveRecord::Base
has_many :relationships, :dependent => :destroy
has_many :peers, :through => :relationships
end
And a Relationship model:
class Relationship < ActiveRecord::Base
belongs_to :user
belongs_to :peer, :class_name => "User"
end
When two users are peers with one another, there are obviously two records in the database.
When one user opts to end a relationship, I'd like this to destroy both records - not just one side of the relationship.
Is there a better way to go about doing this rather than loading the relationship twice in the controller (once for each side of the relationship)?
There are a couple of ways this can be done.
The first is an after-delete trigger. This is a fairly controversial approach if you believe in the false promise of database agnosticism, but it works: in essence, you look at old.peer_id and old.user_id and then issue a delete with the roles reversed. If you want to go down this route, consult your database manual for how to implement a trigger.
The second way is an after_destroy callback, along these lines:
after_destroy do |record|
other = Relationship.find_by_user_id_and_peer_id(record.peer_id, record.user_id)
other.destroy if other
end
The third, and probably more drastic, measure is to rework the model so that both sides of the relationship are represented by a single record with a boolean accepted field, plus a constraint that treats (user_id, peer_id) and (peer_id, user_id) as the same pair. That way you won't have to worry about deleting both sides or about duplicate records.
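One way to picture the single-record approach is to normalize each pair so that the smaller id always comes first. The sketch below is plain Ruby, not ActiveRecord; PeerRegistry and its methods are hypothetical names, and sorting the ids is just one possible way to enforce the constraint.

```ruby
require "set"

# Stores each relationship once, regardless of which side created it.
class PeerRegistry
  def initialize
    @pairs = Set.new
  end

  def connect(user_id, peer_id)
    @pairs.add(normalize(user_id, peer_id))
    self
  end

  # Removing the pair ends the relationship for both users at once.
  def disconnect(user_id, peer_id)
    @pairs.delete(normalize(user_id, peer_id))
    self
  end

  def peers?(user_id, peer_id)
    @pairs.include?(normalize(user_id, peer_id))
  end

  def size
    @pairs.size
  end

  private

  # (1, 2) and (2, 1) map to the same key.
  def normalize(a, b)
    [a, b].sort
  end
end

registry = PeerRegistry.new
registry.connect(2, 1).connect(1, 2)
puts registry.size
registry.disconnect(1, 2)
puts registry.peers?(2, 1)
```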
Using ActiveRecord, I have an object, Client, that has zero or more Users (i.e. via a has_many association). Client also has a 'primary_contact' attribute that can be manually set, but always has to point to one of the associated users. I.e. primary_contact can only be blank if there are no associated users.
What's the best way to implement Client such that:
a) The first time a user is added to a client, primary_contact is set to point to that user?
b) The primary_contact is always guaranteed to be in the users association, unless all of the users are deleted? (This has two parts: when setting a new primary_contact or removing a user from the association)
In other words, how can I designate and reassign the title of "primary contact" to one of a given client's users? I've tinkered around with numerous filters and validations, but I just can't get it right. Any help would be appreciated.
UPDATE: Though I'm sure there are a myriad of solutions, I ended up having User inform Client when it is being deleted and then using a before_save call in Client to validate (and set, if necessary) its primary_contact. This call is triggered by User just before it is deleted. This doesn't catch all of the edge cases when updating associations, but it's good enough for what I need.
My solution is to do everything in the join model. I think this works correctly when the client transitions to or from zero associations, always guaranteeing that a primary contact is designated if there is any existing association. I'd be interested to hear anyone's feedback.
I'm new here, so cannot comment on François below. I can only edit my own entry. His solution presumes user to client is one to many, whereas my solution presumes many to many. I was thinking the user model represented an "agent" or "rep" perhaps, and would surely manage multiple clients. The question is ambiguous in this regard.
class User < ActiveRecord::Base
has_many :user_clients, :dependent => :destroy
has_many :clients, :through => :user_clients
end
class UserClient < ActiveRecord::Base
belongs_to :user
belongs_to :client
# user_client join table contains :primary column
after_create :init_primary
before_destroy :preserve_primary
def init_primary
# first association for a client is always primary
if self.client.user_clients.length == 1
self.primary = true
self.save
end
end
def preserve_primary
  return true unless self.primary
  # unless this is the last association, make someone else primary
  other = self.client.user_clients.detect { |uc| uc != self }
  # persist the change; assigning primary without saving would be lost
  other.update_attribute(:primary, true) if other
  true
end
end
class Client < ActiveRecord::Base
has_many :user_clients, :dependent => :destroy
has_many :users, :through => :user_clients
end
I would do this using a boolean attribute on users. #has_one can be used to find the first model that has this boolean set to true.
class Client < ActiveRecord::Base
has_many :users, :dependent => :destroy
has_one :primary_contact, :class_name => "User",
:conditions => {:primary_contact => true},
:dependent => :destroy
end
class User < ActiveRecord::Base
belongs_to :client
after_save :ensure_only_primary
before_create :ensure_at_least_one_primary
after_destroy :select_another_primary
private
# We always want one primary contact, so find another one when I'm being
# deleted
def select_another_primary
return unless primary_contact?
u = self.client.users.first
u.update_attribute(:primary_contact, true) if u
end
def ensure_at_least_one_primary
# count(:primary_contact) would count every non-NULL value, including false
return if self.client.users.exists?(:primary_contact => true)
self.primary_contact = true
end
# We want only 1 primary contact, so if I am the primary contact, all other
# ones have to be secondary
def ensure_only_primary
return unless primary_contact?
self.client.users.update_all(["primary_contact = ?", false], ["id <> ?", self.id])
end
end
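The invariant that all of these answers try to maintain can be stated in a few lines of plain Ruby. This is only an in-memory sketch, not ActiveRecord code: ClientContacts is a hypothetical class, users are represented by plain strings, and promoting the first remaining user is an assumption.

```ruby
# Keeps a primary contact designated whenever the client has any users.
class ClientContacts
  attr_reader :users, :primary_contact

  def initialize
    @users = []
    @primary_contact = nil
  end

  # The first user added becomes the primary contact automatically.
  def add_user(user)
    @users << user unless @users.include?(user)
    @primary_contact ||= user
    self
  end

  # Removing the primary contact promotes another user, if one is left.
  def remove_user(user)
    @users.delete(user)
    @primary_contact = @users.first if @primary_contact == user
    self
  end

  # A manually chosen primary contact must already be one of the users.
  def primary_contact=(user)
    unless @users.include?(user)
      raise ArgumentError, "primary contact must be one of the client's users"
    end
    @primary_contact = user
  end
end

contacts = ClientContacts.new
contacts.add_user("ann").add_user("bob")
contacts.primary_contact = "bob"
contacts.remove_user("bob")
puts contacts.primary_contact
```

Whichever callbacks you end up using in the models, they need to cover the same three transitions: first user added, primary user removed, and primary contact reassigned by hand.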