I have the app, where user can have one of several different profiles. Some of profile data are always the same (like first and last name, gender etc). Other fields may vary (for example, doctor can have license number and text about himself, while patient can have phone number etc).
I found approach, that fits pretty well, but still have some doubts. The point of my approach looks like this:
User model contains a lot of system-specific data, controlled by Devise and has_one :person
Person model contains common profile data and belongs_to :profile, :polymorphic => true
Doctor/Patient/Admin/etc contains more specific profile data and has_one :person, :as => :profile
With this approach i can simply check in Person model:
def doctor?
self.profile_type == 'Doctor'
end
But there is the few things doesn't give me a rest.
First one is performance. This approach requires a lot of additional joins. For example, for reading doctor's license number, first/last name and email at the same time it will generate 2 additional joins.
Second one is different ids for profile-specific model (i.e. Doctor) and for Person/User models. There will be situations, when user with ID=1 will have Patient relation with different ID, but it would be logical to have same ID for all this associated models.
Maybe you guys will see any more pitfalls in this approach? Is there any better solution for my situation?
You've got four basic patterns you can use here that might work.
Omnirecord
In this model you have all the possible fields in a single record and then use STI to differentiate between profile types. This is the simplest to implement but looks the most messy as few people will have all fields populated. Keep in mind that NULL string fields don't take up a lot of database space, typically one bit per column, so having a lot of them isn't a big deal.
Optional Joins
In this model you create a number of possible linkages to different profile types, like doctor_profile_id linking to a DoctorProfile, patient_profile_id linking to a PatientProfile, and so forth. Since each relationship is spelled out in a specific field, you can even enforce foreign key constraints if you prefer, and indexing is easy. This can come in handy when a single record requires multiple different profiles to be associated with it, as in the case of a patient that's also a doctor.
Polymorphic Join
In this model you link to a specific profile type and profile id using the :polymorphic option, much like you've suggested. Indexing is more complicated and foreign keys are impossible. You're also limited to having one and only one profile. These tend to work as a starting point but may prove to be trouble down the road when you get a doctor + patient requirement.
Key/Value Store
In this model you abandon all efforts to organize things into singular records and instead build an associated ProfileField and ProfileValue table. The ProfileField identifies what fields are available on what kinds of profiles, such as label, allowed values, data type and so forth, while the ProfileValue is used to store specific values for specific profiles.
class User < ActiveRecord::Base
has_many :profile_fields
end
class ProfileField < ActiveRecord::Base
has_many :profile_values
end
class ProfileValue < ActiveRecord::Base
belongs_to :user
belongs_to :profile_field
end
Since this one is wide open you can allow the site administrator to redefine what field are required, add new fields, and so on, without having to make a schema change.
Related
I need help designing complex user permissions within a Postgres database. In my Rails app, each user will be able to access a unique set of features. In other words, there are no pre-defined "roles" that determine which features a user can access.
In almost every controller/view, the app will check whether or not the current user has access to different features. Ideally, the app will provide ~100 different features and will support 500k+ users.
At the moment, I am considering three different options (but welcome alternatives!) and would like to know which option offers the best performance. Thank you in advance for any help/suggestions.
Option 1: Many-to-many relationship
By constructing a many-to-many relationship between the User table and a Feature table, the app could check whether a user has access to a given feature by querying the join table.
E.g., if there is a record in the join table that connects user1 and feature1, then user1 has access to feature1.
Option 2: Multiple columns
The app could represent each feature as a boolean column on the User table. This would avoid querying multiple tables to check permissions.
E.g., if user1.has_feature1 is true, then user1 has access to feature1.
Option 3: Array column
The app could store features as strings in a (GIN-indexed?) array column on the User table. Then, to check whether a user has access to a feature, it would search the array column for the given feature.
E.g., if user1.features.include? 'feature1' is true, then user1 has access to feature1.
Many-to-many relationships are the only viable option here. There is a reason why they call it a relational database.
Why?
Joins are actually not that expensive.
Multiple columns - The number of columns in your tables will be ludicris and it will be true developer hell. As each feature adds a migration the amount of churn in your codebase will be silly.
Array column - Using an array column may seem like an attractive alternative until you realize that its actually just a marginal improvement over stuffing things into a comma seperated string. you have no referential integrety and none of the code organization benefits that come from have having models that represent the entities in your application.
Oh and every time a feature is yanked you have to update every one of those 500k+ users. VS just using CASCADE.
class Feature
has_many :user_features
has_many :users, through: :user_features
end
class UserFeature
belongs_to :user
belongs_to :feature
end
class User
has_many :user_features
has_many :features, through: :user_features
def has_feature?(name)
features.exist?(name: name)
end
end
I'm building a diet analysis app in Rails 4.1. I have a model, FoodEntry, which at a simple level has a quantity value and references a Food and a Measure:
class FoodEntry < ActiveRecord::Base
belongs_to :food
belongs_to :measure
end
However I actually have two different types of measures, standard generic measures (cups, teaspoons, grams, etc.) and measures which are specific to a food (heads of broccoli, medium-sized bananas, large cans, etc.). Sounds like a case for a polymorphic association right? e.g.
class FoodEntry < ActiveRecord::Base
belongs_to :food
belongs_to :measure, polymorphic: true # Uses measure_id and measure_type columns
end
class StandardMeasure < ActiveRecord::Base
has_many :food_entries, as: :measure
end
class FoodMeasure < ActiveRecord::Base
has_many :food_entries, as: :measure
end
The thing is, the food-specific measures come from a legacy database dump. These records are uniquely identified by a combination of their food_id and description - they aren't supplied with a single-column primary key (description is not unique on its own because there are multiple foods with the same measure description but different numeric data). Because I'm importing to my Rails Postgres db, I'm able to add a surrogate primary key - the auto-incrementing integer id column that Rails expects. But I don't want to utilize this id as a reference in my FoodEntry model because it poses a pretty big challenge for keeping referential integrity intact when the (externally-supplied) data is updated and I have to reimport. Basically, those ids are completely subject to change, so I'd much rather reference the food_id and description directly.
Luckily it's not very difficult to do this in Rails by using a scope on the association:
class FoodEntry < ActiveRecord::Base
belongs_to :food
belongs_to :measure, ->(food_entry) { where(food_id: food_entry.food_id) }, primary_key: 'description', class_name: 'FoodMeasure'
# Or even: ->(food_entry) { food_entry.food.measures }, etc.
end
Which produces a perfectly acceptable query like this:
> FoodEntry.first.measure
FoodMeasure Load (15.6ms) SELECT "food_measures".* FROM "food_measures" WHERE "food_measures"."description" = $1 AND "food_measures"."food_id" = '123' LIMIT 1 [["description", "Broccoli head"]]
Note that this assumes that measure_id is a string column in this case (because description is a string).
In contrast the StandardMeasure data is under my control and doesn't reference Foods, and so it makes perfect sense to simply reference the id column in that case.
So the crux of my issue is this: I need a way for a FoodEntry to reference only one type of measure, as it would in the polymorphic association example I made above. However I don't know how I'd implement a polymorphic association with respect to my measure models because as it stands:
an associated FoodMeasure needs to be referenced through a scope, while a StandardMeasure doesn't.
an associated FoodMeasure needs to be referenced by a string, while a StandardMeasure is referenced by an integer (and the columns being referenced have different names).
How do I reconcile these issues?
Edit: I think I should explain why I don't want to use the autonumber id on FoodMeasures as my foreign key in FoodEntries. When the data set is updated, my plan was to:
Rename the current food_measures table to retired_food_measures (or whatever).
Import the new set of data into a new food_measures table (with a new set of autonumber ids).
Run a join between these two tables, then delete any common records in retired_food_measures, so it just has the retired records.
If I'm referencing those measures by food_id and description, that way I get the benefit that food entries automatically refer to the new records, and therefore any updated numeric data for a given measure. And I can instruct my application to go searching in the retired_food_measures table if a referenced measure can't be found in the new one.
This is why I think using the id column would make things more complicated, in order to receive the same benefits I'd have to ensure that every updated record received the same id as the old one, every new record received a new not-used-before id, and that any retired id is never used again.
There's also one other reason I don't want to do this: ordering. The records in the dump are ordered first by food_id, however the measures for any given food_id are in a non-alphabetical but nevertheless logical order I'd like to retain. The id column can serve this purpose elegantly (because ids are assigned in row order on import), but I lose this benefit the moment the ids start getting messed around with.
So yeah I'm sure I could implement solutions to these problems, but I'm not sure it would be worth the benefit?
it poses a pretty big challenge for keeping referential integrity
intact when the (externally-supplied) data is updated
This is an illusion. You have total control over the surrogates. You can do exactly the processing of external updates whether they are there or not.
This is just one of those times when you want your own new names for things, in this case Measures, of which FoodMeasures and StandardMeasures are subtypes. Have a measure_id in all three models/tables. You can find many idioms for simplifying subtype constraints, eg using type tags.
If you process external updates in such a way that it is convenient for such objects to also have such surrogates then you need to clearly separate such PutativeFoodMeasures and FoodMeasures as subtypes of some supertype PutativeOrProvenFoodMeasure and/or of PutativeOrProvenMeasure.
EDIT:
Your update helps. You have described what I did. It is not difficult to map old to new ids; join on old & new (food_id,description) and select old id (not a food_id!). You control ids; how can it matter to reuse ids compared to them not even existing otherwise? Ditto for sorting FoodMeasures; do as you would have. It is only when you mix them with StandardMeasures giving some result that you need order a mixture differently; but you would do that anyway whether or not a shared id existed. (Though "polymorphic:" may not be the best id sharing design.)
The Measures model offers measures; and when you know you have a FoodMeasure or StandardMeasure you can get at its subtype-particular parts.
I'm building a course application system. The high school, undergraduate and graduate students are all able to apply for this course. They have to fill out some application form.
However, their information forms are similar, but not exactly the same. Every student has name, phone number, email, address, etc. But only undergraduate students have to provide their GPA, and graduate students is required to tell which lab they are researching at. There are other subtle differences...
So how should I deal with this? Make a big table, but leave 'GPA' column of high school students NULL? Or use three separate tables?
Moreover, there are some relation between Student(or, in three tables case, HighSchoolStudent, UndergraduateStudent and GraduateStudent) and other models. For instance, Course has many Students, Student has many Questions, and so on.
You can use a combination of STI and Store feature achieve this.
Declare a base model for Student with a text column called settings.
class Student < ActiveRecord::Base
store :settings
# name, email, phone, address etc..
end
class HighSchoolStudent < Student
# declare HighSchoolStudent specific attributes
store_accessor :settings, :gpa
end
class UndergraduateStudent < Student
# declare UndergraduateStudent specific attributes
store_accessor :settings, :attr1
end
class GraduateStudent< Student
# declare GraduateStudent specific attributes
store_accessor :settings, :attr2
end
In the sample above, instances of HighSchoolStudent will have an attribute called gpa.
You could go with the option you thought of like leaving GPA null, and set up custom validations for the model so it only checks depending on the student type. Single table inheritance is also an option, where you specify the different class names to use in a column on the database table, and then you simply add these classes in the model directory. You can see some documentation on it here: http://api.rubyonrails.org/classes/ActiveRecord/Base.html
I haven't tried STI before, but given what you stated above, I'd probably go for that route, branch off my code and would see how it pans out.
I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.
A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.
Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.
I was thinking about different ways to do this, but my current stalemate is this:
To company table, add text field criteria, which contains an array of the criteria desired in order.
In the report table, have a company_id and columns criterion1, criterion2, etc.
I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?
Conclusion
I decided to go with the serialized option in my case, for these reasons:
My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
I want to avoid complicating my database structure for what I believe is a relatively simple need.
CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.
This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.
With conventional DBs you are slightly caught in the problem of how much to normalize your data:
A sort of "good" way (meaning very normalized) would look something like this:
class Company < AR::Base
has_many :reports
has_many :criteria
end
class Report < AR::Base
belongs_to :company
has_many :criteria_values
has_many :criteria, :through => :criteria_values
end
class Criteria < AR::Base # should be Criterion but whatever
belongs_to :company
has_many :criteria_values
# one attribute 'name' (or 'type' and you can mess with STI)
end
class CriteriaValues < AR::Base
belongs_to :report
belongs_to :criteria
# one attribute 'value'
end
This makes something very simple and fast in NoSQL a triple or quadruple join in SQL and you have many models that pretty much do nothing.
Another way is to denormalize:
class Company < AR::Base
has_many :reports
serialize :criteria
end
class Report < AR::Base
belongs_to :company
serialize :criteria_values
def criteria
self.company.criteria
end
# custom code here to validate that criteria_values correspond to criteria etc.
end
Related to that is the rather clever way of serializing at least the criteria (and maybe values if they were all boolean) is using bit fields. This basically gives you more or less easy migrations (hard to delete and modify, but easy to add) and search-ability without any overhead.
A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.
Variable columns (eg. crit1, crit2, etc.).
I'd strongly advise against it. You don't get much benefit (it's still not very searchable since you don't know in which column your info is) and it leads to maintainability nightmares. Imagine your db gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete no-issue is suddenly a migration that adds a completely useless field to millions of records.
Another problem is that a lot of the ActiveRecord magic doesn't work with this - you'll have to figure out what crit1 means by yourself - now if you wan't to add validations on these fields then that adds a lot of pointless work.
So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.
Well, when you say "To company table, add text field criteria, which contains an array of the criteria desired in order" that smells like the company table wants to be normalized: you might break out each criterion in one of 15 columns called "criterion1", ..., "criterion15" where any or all columns can default to null.
To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.
Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).
It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.
Create a criteria table that lists the criteria for each company (company 1 .. * criteria).
Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).
In the Rails ActiveRecord Associations guide, I'm confused over why the tables for has_one and has_many are identical:
Example tables for has_many:
customers(id,name)
orders(id,customer_id,order_date)
Example tables for has_one:
these tables will, at the database level, also allow a supplier to have many accounts, but we just want one account per supplier
suppliers(id,name)
accounts(id,supplier_id,account_number) #Foreign Key to supplier here??
Shouldn't the tables for has_one be like this instead:
suppliers(id,name,account_id) #Foreign Key to account here
accounts(id,account_number)
Now because the account_id is in the suppliers table, a supplier can never have more than one account.
Is the example in the Rails Guide incorrect?
Or, does Rails use the has_many kind of approach but restricts the many part from happening?
If you think about this way -- they are all the same:
1 customer can have many orders, so each order record points back to customer.
1 supplier can have one account, and it is a special case of "has many", so it equally works with account pointing back to supplier.
and it is the same case with many-to-many, with junction table pointing back to the individual records... (if a student can take many classes, and one class can have many students, then the enrollment table points back to the student and class records).
as to why account points back to supplier vs account points to supplier, that one I am not entirely sure whether we can have it either way, or one form is better than the other.
I believe it has to do with the constraints. With has_one rails will try to enforce that there is only one account per supplier. However, with a has_many, there will be no constraint enforced, so a supplier with has_many would be allowed to exist with multiple accounts.
It does take some getting used to when thinking about the relationships and their creation in rails. If you want to enforce foreign keys on the database side (since rails doesn't do this outside of the application layer), take a look at Mathew Higgins' foreigner
If I understand your question correctly, you believe there is a 1:1 relationship bi-directionally in a has_one/belongs_to relationship. That's not exactly true. You could have:
Class Account
belongs_to :supplier
belongs_to :wholesaler
belongs_to :shipper
# ...
end
account = supplier.account # Get supplier's account
wholesaler = Wholesaler.new
wholesaler.accounts << account # Tell wholesaler this is one of their suppliers
wholesaler.save
I'm not saying your app actually behaves this way, but you can see how a table -- no, let's say a model -- that "belongs to" another model is not precluded from belonging to any number of models. Right? So the relationship is really infinity:1.
I should add that has_one is really a degenerate case of has_many and just adds syntactic sugar of singularizing the association and a few other nits. Otherwise, it's pretty much the same thing and it's pretty much why they look alike.