Database modeling best practice: Dealing with Associations to the same "base"


I tried to normalize my database model, but I'm clueless about how to do it in this case. Given the following model:
Customer has many Systems (has_many :systems)
Cluster has many Systems (has_and_belongs_to_many :systems)
I want to display all Systems of a Customer. That would be: @customer.systemobjects.each. That is already working.
Then I could add a System to a Cluster (which I mentioned is a "HABTM" association). In my Customer view I want to show only systems, that are not related to a cluster (also working with Cluster.includes(:systems).where(systems: { id: sysid }).present?).
Now my question: I want to display all Clusters (and the Systems of each Cluster) of a specific customer, too. But right now I only have the connection to the customer through systems. For me, it would be easier to add a reference to the customer in the cluster object as well (even though I would already have this information in the system).
Should I add this reference? Does it have anything to do with normalization at all? How would you assess this situation from a best-practice point of view, for databases in general and for Ruby on Rails specifically? And what would be the best way to go through each cluster of a customer when I only have it through systems (how could I do it in Rails)?

I think you'd prefer something like this:
class Customer < ApplicationRecord
  has_many :systems
  has_many :clusters, through: :systems # expects `cluster_id` on System, which is typical
  # ...other code
end
class System < ApplicationRecord
  belongs_to :customer
  belongs_to :cluster
  # ...other code
end
class Cluster < ApplicationRecord
  has_many :systems
  has_many :customers, through: :systems # expects `customer_id` on System, which is typical
  # ...other code
end
This results in three tables, as your model already implies, but uses the systems table as a "hinge" lookup table for the other two without implication that clusters belong to systems (which doesn't make sense IRL as I understand your problem statement).
I find has_many :through is often an easier and better choice than HABTM unless you truly have a mutual belonging relationship. See the Rails Guides (guide: "Active Record Associations") for more information on using :through. It's definitely worth getting to know that guide for the kind of questions you have (though to be fair, it can take a bit of experience to fully appreciate the various options and how they're helpful).
Now when you want to refer to clusters that a customer has systems within, you merely need to write something like this:
my_customer = Customer.find(some_id)
customer_clusters = my_customer.clusters
customer_systems = my_customer.systems
To find all the customers for a cluster (and so through clusters' systems), you'd write something like this:
target_cluster = Cluster.find(some_id)
cluster_customers = target_cluster.customers
If you want to produce a hierarchy of those (say, systems of a customer grouped under the clusters they belong to), it'd be something like this:
my_customer = Customer.find(some_id)
customer_systems = my_customer.systems.includes(:cluster) # eager-loads the associated cluster records, so iterating doesn't trigger a query per system
Then either iterate or use group_by on the resulting data in customer_systems, depending on how you intend to display or return the data.
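The group_by step can be sketched without a database. This is a minimal illustration using plain Ruby Structs as stand-ins for the loaded records; SystemRecord, the sample names, and the "(no cluster)" label are assumptions made up for the example, not part of the models above.

```ruby
# Plain-Ruby stand-ins for records already loaded with includes(:cluster).
Cluster = Struct.new(:id, :name)
SystemRecord = Struct.new(:id, :name, :cluster)

alpha = Cluster.new(1, "alpha")
beta  = Cluster.new(2, "beta")

customer_systems = [
  SystemRecord.new(1, "web-1", alpha),
  SystemRecord.new(2, "web-2", alpha),
  SystemRecord.new(3, "db-1", beta),
  SystemRecord.new(4, "standalone", nil) # a system that is in no cluster
]

# Group systems under their cluster; systems without one land under the nil key.
systems_by_cluster = customer_systems.group_by(&:cluster)

systems_by_cluster.each do |cluster, systems|
  label = cluster ? cluster.name : "(no cluster)"
  puts "#{label}: #{systems.map(&:name).join(', ')}"
end
```

The nil key doubles as the list of systems that are not related to any cluster, which is what the customer view in the question needs.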

Related

Rails: using STI to model client and partner records

I know STI is a debated topic within the Rails community (and probably others), which is the reason I'm trying to find a different solution to my problem before going down the STI route.
I'm building a system that has a contact management portion, which contains both client and partner records. The difference is that partners will have an associated partner_type and a few additional fields that client will not have.
This looks like a good case for STI. The records are of the same "category", meaning they all represent "people" but in different ways. They will all have the same core fields and have many email_addresses/phone_numbers.
But the biggest requirement that led me to STI instead of separate tables, is that I need to list all of the contacts together alphabetically. My employer doesn't want separate pages for client records and partner records. If I broke this into multiple tables, I would have to somehow query both tables and arrange them alphabetically, also while taking pagination into account (will have thousands of records for each type).
Is there another solution besides STI? I know many developers have run into problems with STI before, but I'm leaning towards this being a textbook case where STI may actually work.
class Contact < ApplicationRecord
has_many :email_addresses # probably use polymorphic
has_many :phone_numbers # probably use polymorphic
validates :first_name, :last_name, presence: true
end
class Client < Contact
end
class Partner < Contact
belongs_to :partner_type
validates :partner_type, presence: true
# some attributes only applicable to partner
validates :partner_unique_field1, :partner_unique_field2, presence: true
end

There are two design decisions you need to make in this case:
1. Should partners and clients share the same table?
   a. If "no", then you simply create separate tables and separate models.
   b. If "yes", then you have a second design question to answer (#2).
2. Should partners and clients share the same model class?
   a. If "yes", then you can use an enum to identify the different roles of partner and client and use that enum to drive your business logic.
   b. If "no", then you should implement STI.
I think there is a strong case to say "yes" to #1. It seems client and partner are both fundamentally the same thing. They are both people.
More importantly, they will contain most of the same information so sharing a table makes good sense.
So that leaves you with whether or not to use STI or an enum. The fundamental decision you need to make surrounds business logic associated with partners and clients.
If most of the business logic is shared, then it makes sense to use an enum. Let me give you an example. In one of my projects, I have a User model. All users can do basic things on the site. However, we also have school_admin users and class_admin users. Admins of course have greater access to portions of the site, but from a business logic perspective, there are only a couple of relations and a couple of methods that are unique to an admin and not shared by a user.
Since 95% of the business logic is shared between normal users and admins, I elected to keep them all in one class. I used an enum called role to distinguish users:
# in the User model
enum :role, [:graduate, :school_admin, :class_admin]
In the users table I have a column of type int called role. The enum opens up a bunch of helper methods, such as class_admin?, to make the business logic work.
Your case may be different. It seems clients and partners may have greater differences in business logic in your app. I don't know, but it sounds like there are some fundamental differences in their roles. You will have to decide how much business logic is shared between them and how much is different. If they are different enough, then STI makes sense.
Furthermore, you may want to go the STI route if you would like to take advantage of inheritance in methods. For example: you may have a contact_verified? method where partner.contact_verified? has different business logic (email and phone maybe) than client.contact_verified? (email only). A weak example maybe, but you get the idea. Of course, you could accomplish the same thing with a conditional inside contact_verified? when using the single model approach.
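To make the inheritance variant concrete, here is a minimal sketch in plain Ruby with no database involved. The email_verified and phone_verified attributes are hypothetical names invented for this example; in a real STI setup the classes would inherit from the ActiveRecord model instead.

```ruby
# Plain Ruby classes standing in for STI models.
class Contact
  attr_reader :email_verified, :phone_verified

  def initialize(email_verified:, phone_verified: false)
    @email_verified = email_verified
    @phone_verified = phone_verified
  end

  # Default rule, inherited by Client: a verified email is enough.
  def contact_verified?
    email_verified
  end
end

class Client < Contact
end

class Partner < Contact
  # Partners override the rule: both email and phone must be verified.
  def contact_verified?
    email_verified && phone_verified
  end
end

client  = Client.new(email_verified: true)
partner = Partner.new(email_verified: true)
puts client.contact_verified?
puts partner.contact_verified?
```

With a single model and an enum, the same rule would live in one contact_verified? method that branches on the role.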
You are correct that some in the Rails community tend to be down on STI, so do not make the decision to go the STI route lightly. However, I have used STI successfully in some apps with few STI-related problems.
It all depends on how much business logic is shared and if you want to take advantage of inheritance. The decision is ultimately up to you.

Tom Aranda gives a good framework for deciding on an approach (and it seems you should probably use one table). Your "biggest" requirement, however, could easily be solved in SQL with a UNION query even if you decided to use two tables.
SELECT * FROM (SELECT id, 'Client' as type, first_name, last_name FROM clients
UNION SELECT id, 'Partner' as type, first_name, last_name FROM partners) AS t1
ORDER BY last_name LIMIT 25;
You could go further and INNER JOIN the email addresses and phone numbers as well.

Ruby on Rails - Alternatives to STI?

I have many different models (close to 20) that share some common attributes but also differ to some degree in others. STI seems attractive at first, but I have no idea how the various models will evolve over time with rapid product development.
A good parallel to our application that comes to mind is Yelp. How would Yelp manage something in Rails? All of the postings have some common attributes like "address". Yet, they differ quite a lot on others. For example, you have a reservation option for restaurants and maybe not for others. Restaurants also have a ton of other attributes like "Alcohol allowed" that don't apply to others. Doing this with STI will get out of hand pretty quickly.
So what's the next best option? HStore with Postgres? I am not comfortable using HStore for anything but small things. HStore solves some problems while introducing others, like lack of data types, lack of referential integrity checks, etc. I'd like a solid relational database as the foundation to build upon. So in the Yelp case, a restaurant model is probably where I am going. I've taken a look at suggestions like the one here - http://mediumexposure.com/multiple-table-inheritance-active-record/, but I am not happy to do so much monkey patching to get something so common going.
So I am wondering what other alternatives exist (if any) or should I just bite the bullet, grind my teeth and copy those common attributes into the 20 models? I am thinking my problems would come from the migration files rather than the code itself. For example, if I setup my migrations to loop through tables and set those attributes on the tables, then would I have mitigated the extent of the problem with having different models?
Am I overlooking something critical that might cause a ton of problems down the road with separate models?

I see a few options here:
1. Bite the bullet and create your 20 different models with a lot of the same attributes. It's possible that these models will drift over time - adding new fields to one specific type - and with STI you'd end up with a 200-column table. Maybe you won't - the future is hard to see, especially with exploratory/agile software.
2. Store non-referential fields in a NoSQL (document) database. Use your relational database for the parts of the record that are relational (a user has many reviews and a review has one business), but keep the type-specific stuff in a NoSQL database. Keep an external_document_id in your Rails models and external_record_id / external_record_type in your NoSQL document schema so you can still query all bars that allow smoking using whatever NoSQL ORM you end up using.
3. Create an Attributes model. An attribute belongs_to :parent_object, polymorphic: true with a key and value field. With this approach you might have a base Business model and each business can has_many :attributes. Certain (non-relational?) attributes of the business (allows_smoking) are one Attribute record each. An Attribute's key could be a string or could be a numeral you have Ruby constants for. You're essentially using the Attribute entities to create a SQL version of option #2. It might be a good option, and I've used this myself for User or Profile models (although there are some performance hits to be aware of with this approach).
I'd really worry about having that many (independent) models for something that sounds subclass-ey. It's possible you might be able to DRY up common behavior/methods by using Concerns (syntactic sugar over the mixin concept, see an awesome SO answer on concerns in Rails 4). You still have your (initial) migration problem, of course.
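As a rough illustration of the mixin idea that Concerns build on, here is a minimal plain-Ruby sketch. The Addressable module, the Restaurant and Bar structs, and their fields are hypothetical names chosen for the example; a Rails Concern would wrap the same idea with ActiveSupport::Concern.

```ruby
# Shared behaviour lives in one module and is mixed into each model-like class.
module Addressable
  # Builds a display address from whichever parts are present.
  def full_address
    [street, city, postal_code].compact.join(", ")
  end
end

Restaurant = Struct.new(:name, :street, :city, :postal_code, :takes_reservations) do
  include Addressable
end

Bar = Struct.new(:name, :street, :city, :postal_code, :allows_smoking) do
  include Addressable
end

restaurant = Restaurant.new("Chez Example", "1 Main St", "Springfield", "12345", true)
bar = Bar.new("The Sample", "2 Oak Ave", "Shelbyville", nil, false)

puts restaurant.full_address
puts bar.full_address
```

The type-specific fields (takes_reservations, allows_smoking) stay on their own classes, while the common behaviour is written once.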

Adding another option here: Serialized LOB (272). ActiveRecord allows you to do this to an object using serialize:
class User < ActiveRecord::Base
serialize :preferences
end
user = User.create(preferences: { "background" => "black", "display" => "large" })
User.find(user.id).preferences # => { "background" => "black", "display" => "large" }
(Example code from ActiveRecord::Base docs.)
The important consequence to understand is that attributes stored in a Serialized LOB will not be indexable and certainly not searchable in any performant manner. If you later discover that a column needs to be available as an index, you'll most likely have to write a Ruby program to perform the transformation (though by default serialization is in YAML, so any YAML parser will suffice).
The advantage is that you don't have to make any technology changes to your stack in order to apply this pattern. Depending on the amount of data you have collected, migrating away from this pattern is easy to moderately difficult.
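The round trip behind a Serialized LOB can be sketched with Ruby's standard YAML library alone. This is only an illustration of what ends up in the text column, not ActiveRecord's internal implementation; the variable names are made up for the example.

```ruby
require "yaml"

preferences = { "background" => "black", "display" => "large" }

# Roughly what would be written to the text column.
stored = YAML.dump(preferences)

# Roughly what happens when the record is loaded again.
loaded = YAML.safe_load(stored)

# Filtering has to happen in Ruby after loading; the database only sees opaque text.
dark_background = loaded["background"] == "black"

puts stored
puts dark_background
```

This is why such a column cannot be indexed or searched efficiently: every row has to be loaded and parsed before its contents can be inspected.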

rails semi-complex STI with ancestry data model planning the routes and controllers

I'm trying to figure out the best way to manage my controller(s) and models for a particular use case.
I'm building a review system where a User may build a review of several distinct types with a Polymorphic Reviewable.
Country (has_many reviews & cities)
Subdivision/State (optional, sometimes it doesn't exist, also reviewable, has_many cities)
City (has places & reviews)
Borough (optional, also reviewable, ex: Brooklyn)
Neighborhood (optional & reviewable, ex: Williamsburg)
Place (belongs to city)
I'm also wondering about adding more complexity. I also want to include subdivisions occasionally, i.e. for the US I might add Texas, or for Germany, Bavaria, and have it be reviewable as well, but not every country has regions and even those that do might never be reviewed. So it's not at all strict. I would like it to be as simple and flexible as possible.
It'd be kind of nice if the user could just land on one form and select either a city or a country, and then drill down using data from, say, Foursquare to find a particular place in a city and make a review.
I'm really not sure which route I should take. For example, what happens if I have a Country and a City, and then I decide to add a Borough?
Could I give places tags (ie Williamsburg, Brooklyn) belong_to NY City and the tags belong to NY?
Tags are more flexible and optionally explain what areas they might be in, the tags belong to a city, but also have places and be reviewable?
So I'm looking for suggestions for anyone who's done something related.
Using Rails 3.2, and mongoid.

I've built something very similar and found two totally different ways that both worked well.
Way 1: Country » Subcountry » City » Neighborhood
The first way that worked for me is to do it with Country, Subcountry, City, Neighborhood. This maps well to major geocoding services and is sufficient for most simple uses. This can be STI (as in your example) or with multiple tables (how I did it).
In your example you wrote "Subdivision/State". My two cents is to avoid using those terms and instead use "Subcountry" because it's an ISO standard and you'll save yourself some confusion when another developer thinks a subdivision is a tiny neighborhood of houses, or when you have a non-U.S. country that doesn't use states, but instead uses provinces.
This is what I settled on after many experiments with trying model names like Region, District, Area, Zone, etc. and abandoning these as too vague or too specific. In your STI case it may be fine to use more names.
One surprise is that it's a big help to write associations that go multi-level, for example to say country.cities (skipping subcountry). This is because sometimes the intermediary model doesn't exist (i.e. there's no subcountry). In your STI, this may be trickier.
Also, you get a big speedup if you denormalize your tables, so for example my city table has a country field. This makes updating info a bit trickier but it's worth it. Your STI could implement an equivalent to this by using tags.
Way 2: Zones that are lists of lat/lng shapes with bounding boxes
The second way is to use an arbitrary Zone model and store latitude longitude shapes. This gives you enormous flexibility, and you can pre-calculate when shapes contain other shapes, or intersect them. So your "drill down" becomes "show me shapes immediately inside this one".
Postgres has some good geocoding helpers for this, and you can speed up lookups by using bounding boxes of min/max lat/lng. We also stored data like the expected center point of a Zone (which is where we would drop a pin on a map), and a radius (useful for calculating queries like "show me all the x items within y distance").
With this approach we were able to do interesting zones like "Broadway in New York" which isn't really a neighborhood so much as long street, and "The Amazon Basin" which is defined by the river, not a country.
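The bounding-box pre-check can be sketched in a few lines of plain Ruby. The Zone struct, its field names, and the coordinates are rough illustrative assumptions, not the schema described above; a box test like this is only a cheap first filter before any exact shape comparison, and it does not handle boxes that cross the antimeridian.

```ruby
# Hypothetical Zone holding only its min/max latitude and longitude.
Zone = Struct.new(:name, :min_lat, :max_lat, :min_lng, :max_lng, keyword_init: true) do
  # True when the point falls inside this zone's bounding box.
  def covers?(lat, lng)
    lat.between?(min_lat, max_lat) && lng.between?(min_lng, max_lng)
  end

  # True when the other zone's box lies entirely inside this zone's box.
  def contains_zone?(other)
    other.min_lat >= min_lat && other.max_lat <= max_lat &&
      other.min_lng >= min_lng && other.max_lng <= max_lng
  end
end

# Approximate boxes, for illustration only.
nyc = Zone.new(name: "New York City", min_lat: 40.49, max_lat: 40.92, min_lng: -74.26, max_lng: -73.70)
brooklyn = Zone.new(name: "Brooklyn", min_lat: 40.57, max_lat: 40.74, min_lng: -74.04, max_lng: -73.83)

puts nyc.contains_zone?(brooklyn)
puts brooklyn.covers?(40.85, -73.90)
```

A "drill down" then becomes a filter over zones whose boxes sit inside the current one, followed by a precise shape test on the few candidates that remain.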

STI Model with Ancestry and with Polymorphic Relation
I built something similar for previous projects, and went for STI with ancestry because it is very flexible and lets you model a tree of nodes. Not all intermediate nodes have to exist (as in your example of State/Subdivision/Subcountry).
For Mongoid there are at least two ancestry gems: mongoid-ancestry and mongestry (links below).
As an added benefit of using STI with ancestry, you can also model other location-related nodes, let's say restaurants or other places.
You can also add geo-location information lat/lon to all your nodes, so you can geo-tag them.
In the example below I just used one set of geo-location coordinates (center point) - but you could of course also add several geo-locations to model a bounding box.
You can arrange the nodes in any order you like, e.g. through this_node.children.create(...) .
When using MySQL with ancestry, you can pass in the type of the newly created node. There must be a similar way with mongoid-ancestry (haven't tried it).
In addition to the tree-structured nodes, you can use a polymorphic collection to model the Reviews, and also Tags (well, there's a gem for acts_as_taggable, so you don't have to model Tags yourself).
Compared to modeling every class with its own collection, this STI approach is much more flexible and keeps the schema simple. It's very easy to add a new type of node later.
This paradigm can be used with either Mongoid or SQL data stores.
# app/models/geo_node.rb
class GeoNode # this is the parent class; geo_nodes is the table name / collection name.
include Mongoid::Document
has_ancestry # either this
has_mongestry # or this
has_many :reviews, :as => :reviewable
field :lat, type: Float
field :lon, type: Float
field :name, type: String
field :desc, type: String
# ...
end
# app/models/geo_node/country.rb
class Country < GeoNode
end
# app/models/geo_node/subcountry.rb
class Subcountry < GeoNode
end
# app/models/geo_node/city.rb
class City < GeoNode
end
# app/models/review.rb
class Review
include Mongoid::Document
belongs_to :reviewable, :polymorphic => true
field :title
field :details
end
Check these links:
mongoid-ancestry gem https://github.com/skyeagle/mongoid-ancestry
mongestry gem https://github.com/DailyDeal/mongestry
mongoid-tree gem https://github.com/benedikt/mongoid-tree
Gist on Mongoid STI: https://gist.github.com/507721
ancestry gem (for MySQL)
A big thanks to Stefan Kroes for his awesome ancestry gem, and to Anton Orel for adapting it to Mongoid (mongoid-ancestry). ancestry is one of the most useful gems I've seen.
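To show why optional intermediate nodes are not a problem in such a tree, here is a minimal plain-Ruby sketch of the materialized-path idea, where each node stores the ids of its ancestors as a string such as "1/2". This illustrates the concept only; it is not the API of ancestry, mongoid-ancestry, or mongestry, and the GeoNodeRow struct and helper methods are hypothetical.

```ruby
# Each row keeps its ancestors' ids, root first, joined by "/" (nil for a root).
GeoNodeRow = Struct.new(:id, :type, :name, :ancestry) do
  def ancestor_ids
    ancestry.to_s.split("/").map(&:to_i)
  end

  # The ancestry value that direct children of this node carry.
  def child_ancestry
    [ancestry, id].compact.join("/")
  end
end

NODES = [
  GeoNodeRow.new(1, "Country", "USA", nil),
  GeoNodeRow.new(2, "Subcountry", "New York", "1"),
  GeoNodeRow.new(3, "City", "New York City", "1/2"),
  GeoNodeRow.new(4, "Country", "Singapore", nil),
  GeoNodeRow.new(5, "City", "Singapore", "4") # no subcountry level at all
].freeze

def children_of(node)
  NODES.select { |n| n.ancestry == node.child_ancestry }
end

def descendants_of(node)
  prefix = node.child_ancestry
  NODES.select { |n| n.ancestry == prefix || n.ancestry.to_s.start_with?("#{prefix}/") }
end

puts descendants_of(NODES[0]).map(&:name).inspect
puts children_of(NODES[3]).map(&:name).inspect
```

A city can hang directly under a country or under a subcountry; the tree does not care which levels exist.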

Sounds like a good candidate for nested routes/resources. In routes.rb, do something like:
resources :cities do
resources :reviews
end
resources :countries do
resources :reviews
end
resources :places do
resources :reviews
end
Which should produce something along the lines of this in rake routes:
city_reviews GET /cities/:city_id/reviews {:controller=>"reviews", :action=>"index"}
country_reviews GET /countries/:country_id/reviews {:controller=>"reviews", :action=>"index"}
place_reviews GET /places/:place_id/reviews {:controller=>"reviews", :action=>"index"}
...etc., etc.
In the controller action, you look up the reviewable record from the parent id in the params (:city_id, :country_id or :place_id), and only send back reviews that are attached to that reviewable object.
Also, see the nested resources section of the Rails Routing Guide, and this RailsCast on Polymorphic relationships, which has a quick section on routing, and getting everything to line up properly.
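The lookup step in a shared reviews controller can be sketched in plain Ruby. This assumes standard nested resources, where the parent id arrives in params as city_id, country_id or place_id; the REVIEWABLE_PARAMS table and the reviewable_reference helper are hypothetical names for this example.

```ruby
# Whitelist of parent-id params and the reviewable type each one implies.
REVIEWABLE_PARAMS = {
  "city_id" => "City",
  "country_id" => "Country",
  "place_id" => "Place"
}.freeze

# Returns [type_name, id] for the first recognised parent param, or nil if none is present.
def reviewable_reference(params)
  key = REVIEWABLE_PARAMS.keys.find { |k| params.key?(k) }
  key && [REVIEWABLE_PARAMS[key], Integer(params[key])]
end

p reviewable_reference("city_id" => "42")
p reviewable_reference("place_id" => "7", "format" => "json")
```

In a real controller the returned pair would be used to load the record and scope the reviews to it. Using a whitelist rather than building a class name from arbitrary params keeps unexpected input from selecting a model.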

I would probably keep my data model very unrestrictive, and handle any specifics related to what filters to display in the controller/view. Make a mapping table where you can map attributes (i.e. city, borough, state, country) polymorphically, also polymorphically to reviewable.
By assuming many-to-many, your schema is as flexible as it can be, and you can restrict which mappings to create using validations or filters in your models.
It's basically tagging, as you alluded to, though not with a tags model per se, but rather with a polymorphic association to different models that all act like tags.
Keep your DB schema clean and keep the business logic in ruby.

A database design for variable column names

I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.
A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.
Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.
I was thinking about different ways to do this, but my current stalemate is this:
To company table, add text field criteria, which contains an array of the criteria desired in order.
In the report table, have a company_id and columns criterion1, criterion2, etc.
I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?
Conclusion
I decided to go with the serialized option in my case, for these reasons:
My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
I want to avoid complicating my database structure for what I believe is a relatively simple need.
CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.

This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.
With conventional DBs you are slightly caught in the problem of how much to normalize your data:
A sort of "good" way (meaning very normalized) would look something like this:
class Company < ActiveRecord::Base
  has_many :reports
  has_many :criteria
end
class Report < ActiveRecord::Base
  belongs_to :company
  has_many :criteria_values
  has_many :criteria, :through => :criteria_values
end
class Criteria < ActiveRecord::Base # should be Criterion but whatever
  belongs_to :company
  has_many :criteria_values
  # one attribute 'name' (or 'type' and you can mess with STI)
end
class CriteriaValue < ActiveRecord::Base # singular, so has_many :criteria_values can find it
  belongs_to :report
  belongs_to :criteria
  # one attribute 'value'
end
This turns something that is very simple and fast in NoSQL into a triple or quadruple join in SQL, and you have many models that pretty much do nothing.
Another way is to denormalize:
class Company < ActiveRecord::Base
  has_many :reports
  serialize :criteria
end
class Report < ActiveRecord::Base
  belongs_to :company
  serialize :criteria_values
  def criteria
    company.criteria
  end
  # custom code here to validate that criteria_values correspond to criteria etc.
end
Related to that, a rather clever way of serializing at least the criteria (and maybe the values, if they were all boolean) is to use bit fields. This basically gives you more or less easy migrations (hard to delete and modify, but easy to add) and searchability without any overhead.
A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.
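The bit-field idea itself fits in a few lines of plain Ruby. This is a sketch of the concept only, not the Flag Shih Tzu API; the CRITERIA_BITS table and the two helper methods are hypothetical, and the criterion names are taken from the question's examples.

```ruby
# Each criterion gets a fixed bit position; new criteria are appended, never reordered.
CRITERIA_BITS = {
  performance: 1 << 0,
  speed: 1 << 1,
  cost_effectiveness: 1 << 2
}.freeze

# Packs a list of criterion names into one integer suitable for an integer column.
def encode_criteria(names)
  names.reduce(0) { |mask, name| mask | CRITERIA_BITS.fetch(name) }
end

# Unpacks the integer back into the criterion names it represents.
def decode_criteria(mask)
  CRITERIA_BITS.select { |_name, bit| (mask & bit) != 0 }.keys
end

mask = encode_criteria([:performance, :cost_effectiveness])
puts mask
puts decode_criteria(mask).inspect
```

Because the stored value is an ordinary integer, a bitwise AND in a SQL WHERE clause can test for a given criterion, which is where the searchability comes from.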
Variable columns (eg. crit1, crit2, etc.).
I'd strongly advise against it. You don't get much benefit (it's still not very searchable since you don't know in which column your info is) and it leads to maintainability nightmares. Imagine your db gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete no-issue is suddenly a migration that adds a completely useless field to millions of records.
Another problem is that a lot of the ActiveRecord magic doesn't work with this - you'll have to figure out what crit1 means by yourself - and if you want to add validations on these fields, that adds a lot of pointless work.
So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.
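For the serialized option, the custom check that criteria_values correspond to the company's criteria could look roughly like this. It is a plain-Ruby sketch that assumes the criteria are stored as an array of names and the values as a hash keyed by those names; the criteria_value_errors helper is a hypothetical name, and in a model it would be called from a validation.

```ruby
# Returns a list of human-readable problems; an empty list means the values match the criteria.
def criteria_value_errors(company_criteria, criteria_values)
  missing = company_criteria - criteria_values.keys
  unknown = criteria_values.keys - company_criteria

  errors = []
  errors << "missing values for: #{missing.join(', ')}" unless missing.empty?
  errors << "unknown criteria: #{unknown.join(', ')}" unless unknown.empty?
  errors
end

company_criteria = ["performance", "speed"]

p criteria_value_errors(company_criteria, "performance" => 4, "speed" => 5)
p criteria_value_errors(company_criteria, "performance" => 4, "cost" => 1)
```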

Well, when you say "To company table, add text field criteria, which contains an array of the criteria desired in order" that smells like the company table wants to be normalized: you might break out each criterion in one of 15 columns called "criterion1", ..., "criterion15" where any or all columns can default to null.
To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.
Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).
It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.

Create a criteria table that lists the criteria for each company (company 1 .. * criteria).
Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).

How to model the database for a backpack-like application

I would like to create an application that would use the same system as backpack (http://www.backpackit.com) to create different types of pages.
Basically, you can add different elements to a page, and reorder them. Some elements can contain other elements (like an image gallery, which contains images, or lists, etc.).
I'm not sure how to model that.
I'd like to be able to do something like:
page.elements
without having to retrieve all elements myself
class Page < ActiveRecord::Base
has_many :texts, :dependent => :destroy
has_many :titles, :dependent => :destroy
def elements
#elts = texts + titles + ...
#order elts...
end
end
So I was thinking about single table inheritance.
I could have a Containers table, and Notes, Galleries, Lists etc could inherit from Containers.
And then, I would have Elements that could be linked to various Containers using polymorphism.
How would you do that? Do you see any fundamental flaws in my approach?
Thanks!

First off, the design is not as efficient as it could be, but whether or not it is fundamentally flawed actually depends on your level of experience:
Case 1: You are relatively new to programming and trying to get started by reverse-engineering and implementing something you can see and understand (backpackit). If this is true then you cannot go wrong by diving in and using the ORM philosophy that database tables can be designed as if they were persisting classes. It will be inefficient, but you'll learn plenty by not having to worry about the database -- yet.
Case 2: You are a veteran programmer (at least one decent app actually being used by people who paid for it) and for some reason are still expressing database design questions in object-oriented terminology. Then you have a fundamental flaw only because there is a good chance you will experience success that will stress the system, at which point the fundamental inefficiency of "table inheritance" will bite you.
