I am working on a project that currently has tons of HABTM associations. Essentially, everything is related to everything else. I am considering setting up a single intermediate table/model that has two polymorphic fields. This way, if I add another model I can easily connect it to the remaining models. Is this a good idea? If not, why not? If it is, why don't all rails projects have this kind of intermediate table?
I see two other options. I could keep adding intermediate tables or I could add a table that contains one of each type. The former option is kind of a hassle and the latter option does not allow for self joins.
While a polymorphic join table sounds like it would make things easier, I think you will end up creating more headache for yourself than it's worth. Here are a few potential challenges/problems off the top of my head:
You will not be able to use ActiveRecord's has_and_belongs_to_many association or related helpers without a ton of hacking/monkeypatching which will immediately eclipse the time it would take to setup individual pairwise link tables.
Your join table will have two id columns, let's call them a_id and b_id. For any given pair of models you will have to ensure that the ids always end up in the same column.
Example: If you have two models called User and Role, you would have to ensure for that pair that the user_id is always stored in col a_id and the role_id is always stored in col b_id, otherwise you will not be able to index the table in any kind of meaningful way (and will run the risk of defining the same relationship twice).
If you ever want to use database enforcement of FOREIGN KEY constraints it is unlikely that this polymorphic link table scheme will be supported.
The universal link table will get n times larger than n separate link tables. It shouldn't matter much with good indexing but as your application and data grow this could become a headache and limit some of your options in regards to scaling. Give your DB a break.
Most or least importantly (I can't decide) you will be bucking the norm which means a lot fewer (if any) resources out there to help you when you run into trouble. Basically the Adam Sandler "they're all gonna laugh at you" rationale.
Last thought: Can you eliminate any of the link tables by using has_many :xxx, :through => :xxx relationships?
Thinking it all through, you could actually do this, but I wouldn't. Join tables grow fast enough as it is and i like to keep model relationships simple and easy to alter.
I'm used to working on very large systems / data sets though, so if you're going going to have much in each join then ok. I'd still do it separately for joins however and i really like my polymorphics.
I think it would be cleaner and more flexible if you were to use multiple join tables as opposed to one giant multipurpose join table.
Related
I have a Rails application that includes tables for surveys and (survey) questions.
A survey has_many questions
A question belongs_to a survey
The surveys are inherently teacher surveys. Now we are introducing student surveys, which are meaningfully different from teacher surveys, with different types of information that we need to store about them, such that they seem to each warrant their own table/model, so I'm thinking we want separate tables for teacher_surveys and student_surveys.
However, the questions are really pretty much the same. Includes things like the question text, type of question (text, checkbox, dropdown), etc. So it seems like questions should remain a single table.
How do I best model this data?
Should the questions table have a teacher_survey_id and a student_survey_id where each is optional but one of the two of them is required?
Should I have join tables for questions_teacher_surveys and questions_student_surveys?
Something else?
There is no easy answer to this question. Separating questions into student_question and teach_question tables does mean you have a slight bit of duplication and you can't query them as a homogenized collection if that happens to be important.
The code duplication can be very simply addressed by using inheritance or composition but there is an increased maintainence burdon / complexity cost here.
But it does come with the advantage that you get a guarentee of referential integrity from the non-nullable foreign key column without resorting to stuff like creating a database trigger or the lack of real foreign key constraints if you choose to use a polymorphic association.
An additional potential advantage is that you're querying smaller tables that can remain dense instead of sparse if the requirements diverge.
sounds like you need to make your surveys table polymorphic and add a type column as an enum with :student and :teacher as options, you can use the type column as a flag to control the different business logic. Once the survey table becomes polymorphic you probably wont need to do anything with your questions table. If you decide to go with this solution its also recommend to add a concern class named Surveyable to hold the needed associations and shared logic between your surveyed models (in your case students and teachers).
There are many answers to the question of implementing a simple self-join in Rails. My problem is a bit complex.
I have many models. Doctor, Clinic, Treatment etc. A doctor can have similar doctors. A clinic can have similar clinics and so on. I can implement separate self joins for each model but this will lead to repetition of code. Is there a way to have one table to self join different models? I think the solution lies with using polymorphic association either on the join table or as an intermediate table but I can't seem to come up with a working code.
Just say no to polymorphic associations. Yeah they let you cheat the relational database model and are good for those special cases where you have a "behavior" that needs to be shared with a ton of different classes. Like in libraries. In reality though there is nothing you can do with polymorphic assocations that can't be done (arguably better) with more columns / tables.
The biggest con is that polymorphic associations don't have any real foreign keys in the database. That means the database does not guarantee referential integrity. There are also tons of issues related to joining, eager loading and the fact that its a leaky abstraction like when it comes to for example ordering.
Code duplication should really be a secondary concern compared to a good solid database design or the single responsibility principle. You should also consider the fact that joins are not just plumbing. In many cases the join table is an entity in itself and describes the relations between entities. Like you might actually want to keep track of how long a doctor worked at a clinic. If you build your mega do everything polymorphic join table (and your god class join model) you will have to redesign it anyways when the requirements change.
I encountered against the typical case of a model (Address) and multiple models having this data (Company, Person). Let's work with these models as example, but can be generalized with any others.
Making the correct choice is important, as changing the database schema or making deep changes are not easy later.
Initially it will be one-to-one association, then we have these possibilities:
1) Use the classical normalization and put all the address data into each model requiring it. Then create an address subform for render partial into each one, and put the code behavior into a Module to be called from each model controller using it.
This could looks fine, but it has the problem that is hard to make changes, as any change in the Address data needs to be done on each model using it. Also hard to change it to a one-to-many association, if required.
2) Rails polymorphic feature. Address is a model itself, and belongs_to :addressable, polymorphic: true, then Company and Person have has_one/has_many :address/:addresses, as :addressable. Then add the polymorphic FK to Address table and it's done.
I like this solution, it is clear and with only 2 extra columns we have done it. The only contra I can think about is we rely in a framework feature, so if someday we migrate to another one, it should feature polymorphic too. Well this is not really true because we can always make the SQL manually searching by addressable_type = "type", but I think is not database design standard.
3) Join tables. One for each parent. So we have JT_Company_Address, JT_Person_Address, and we use the Rails has_one/has_many :through associations using those join tables.
I think using join tables is more database design standard, but add some extra tables.
4) Reverse the relationship, making Address the parent and each of the other a child. So each child would have a FK address_id.
I don't like much this one, as I will handle mainly companies and persons, and then look up their address if required, not the opposite, looking addresses and then loop up what it has in it. We can always use the bidiriectional association using the Rails inverse_of, but at database layer Address would be the parent one anycase.
My favorites at this moment are 2 and 3. I will probably use Rails and not move from it, so the polymorphic looks very easy and clean.
Then, what do you think, which one is the best one? Any advice is welcome.
Thanks.
I am making a proof of concept for an app, using Rails 4 and Postgresql. I am thinking on the best way to handle the relation between Products and SubProducts.
A Product have a name, a description... and a SubProduct could have multiple fields too.
A Product have many SubProduct, a SubProduct belongs to one Product.
I have some Products and SubProducts with hundreds of fields. So I think it is best to not use STI to avoid thousands of null value.
Also I am working with remote designers, I would like to keep it simple for them. So when they want to display the value of a field from a sub product, they would write something like #product.name (from Product table) or #product.whatever (field from SubProduct table).
My question is how to handle this ? For the moment, I was thinking to delete the Products table and to make multiple SELECT to db, one for each SubProducts table. But maybe there is a solution to keep the Products table ? Or maybe I can take advantage of table inheritance from Postgresql ?
Thank you :-)
Are the hundreds of fields all different for each subproduct? (As you mentioned, "sparse" attributes can lead to lots of nulls.)
This brings to mind an entity-attribute-value model, as described here:
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Here's a presentation with one organization's solution (key/value hstore):
https://wiki.postgresql.org/images/3/37/Eav-pgconfnyc2014.pdf
This can quickly get very complicated, and makes things like search much more challenging.
And if there are many variations, this also brings to mind a semi-structured, document-oriented or "NoSQL" design.
I'm using MS SQL Server 2008R2, but I believe this is database agnostic.
I'm redesigning some of my sql structure, and I'm looking for the best way to set up 1 to many relationships.
I have 3 tables, Companies, Suppliers and Utilities, any of these can have a 1 to many relationship with another table called VanInfo.
A van info record can either belong to a company, supplier or utility.
I originally had a company_id in the VanInfo table that pointed to the company table, but then when I added suppliers, they needed vaninfo records as well, so I added another column in VanInfo for supplier_id, and set a constraint that either supplier_id or company_id was set and the other was null.
Now I've added Utilities, and now they need access to the VanInfo table, and I'm realizing that this is not the optimum structure.
What would be the proper way of setting up these relationships? Or should I just continue adding foreign keys to the VanInfo table? or set up some sort of cross reference table.
The application isn't technically live yet, but I want to make sure that this is set up using the best possible practices.
UPDATE:
Thank you for all the quick responses.
I've read all the suggestions, checked out all the links. My main criteria is something that would be easy to modify and maintain as clients requirements always tend to change without a lot of notice. After studying, research and planning, I'm thinking it is best to go with a cross reference table of sorts named Organizations, and 1 to 1 relationships between Companies/Utilities/Suppliers and the Organizations table, allowing a clean relationship to the Vaninfo table. This is going to be easy to maintain and still properly model my business objects.
With your example I would always go for 'some sort of cross reference table' - adding columns to the VanInfo table smells.
Ultimately you'll have more joins in your SP's but I think the overhead is worth it.
When you design a database you should not think about where the primary/foreign key goes because those are concepts that doesn’t belong to the design stage. I know it sound weird but you should not think about tables as well ! (you could implement your E/R model using XML/Files/Whatever
Sticking to E/R relationship design you should just indentify your entity (in your case Company/supplier/utilities/vanInfo) and then think about what kind of relationship there is between them(if there are any). For example you said the company can have one or more VanInfo but the Van Info can belong only to one Company. We are talking about a one – to- many relationship as you have already guessed. At this point when you “convert” you design model (a one-to many relationship) to a Database table you will know where to put the keys/ foreign keys. In the case of a one-to-Many relationship the foreign key should go to the “Many” side. In this case the van info will have a foreign keys to company (so the vaninfo table will contain the company id) . You have to follow this way for all the others tables
Have a look at the link below:
https://homepages.westminster.org.uk/it_new/BTEC%20Development/Advanced/Advanced%20Data%20Handling/ERdiagrams/build.htm
Consider making Com, Sup and Util PKs a GUID, this should be enough to solve the problem. However this sutiation may be a good indicator of poor database design, but to propose a different solution one should know more broad database context, i.e. that you are trying to achive. To me this seems like a VanInfo should be just a separate entity for each of the tables (yes, exact duplicate like Com_VanInfo, Sup_VanInfo etc), unless VanInfo isn't shared between this entities (then relationships should be inverted, i.e. Com, Sup and Util should contain FK for VanInfo).
Your database basically need normalization and I think you're database should be on its fifth normal form where you have two tables linked by one table. Please see this article, this will help you:
http://en.wikipedia.org/wiki/Fifth_normal_form
You may also want to see this, database normalization:
http://en.wikipedia.org/wiki/Database_normalization