I've read Rails guides and tried a few different things w/Active Record but haven't been able to figure out what the best way to do this is.
I need to set up a self-referential (users to users) relationship that is hierarchical. It usually would be no more than 5 levels high, but should be able to scale up infinitely.
I've tried creating a UserHierarchy model with a DB schema like this:
parent | child | level
However, managing this is a bit too difficult and too complicated to handle.
What's the best way in Rails to do a self-referential hierarchical relationship? I've checked out gems like ancestry, but the majority of them use class inheritance and don't work well for self-referential relationships. It's a many-to-many, self-referential hierarchy (in MySQL).
Ancestry is one of the gems that is esp. well suited for object tree structures (what you call self-referential hierarchical relationships). You should have a more detailed look at it.
Generally, there are about four common ways to store trees in a SQL database:
Simple parent pointers. You just add a new column called parent_id to your model holding the ID of the parent object. This allows easy inserts and is well suited for single-level hierarchies, but it is difficult to use for deeper hierarchies and is thus rarely used as the primary mechanism (although it is sometimes combined with other mechanisms).
Nested Sets. You define your trees as a structure of nested sets. This is typically implemented with a left and a right column which are populated with numbers to define the set. It allows efficient querying but is a bit tricky when inserting values. Especially with concurrent changes to a tree, it is sometimes prone to inconsistencies. This model is used e.g. by the awesome_nested_set gem.
Materialized Paths. This is the model used e.g. by ancestry. It stores the full parent path of all elements. This allows for efficient inserting and querying. Changing a tree is a bit more expensive.
Closure Trees. This mechanism stores, for each element, all of its ancestors (not just the direct parent) in a separate table. This is used e.g. by the closure_tree gem.
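To illustrate the closure-table idea from the last item, here is a minimal plain-Ruby sketch with no database and no gem. The Link struct, the ClosureTable class and the build_example_tree helper are made-up names for illustration, not closure_tree's actual schema or API. It assumes one row per (ancestor, descendant) pair plus a self-row at depth 0.

```ruby
# Plain-Ruby sketch of a closure table (hypothetical names, not the
# closure_tree gem's schema). One row per (ancestor, descendant) pair.
Link = Struct.new(:ancestor_id, :descendant_id, :depth)

class ClosureTable
  attr_reader :links

  def initialize
    @links = []
  end

  # Insert a node under parent_id (nil for a root): add the self-row, then
  # copy every ancestor row of the parent one level deeper.
  def add(id, parent_id = nil)
    @links << Link.new(id, id, 0)
    return unless parent_id

    @links.select { |l| l.descendant_id == parent_id }.each do |l|
      @links << Link.new(l.ancestor_id, id, l.depth + 1)
    end
  end

  # All ancestors, root first. In SQL this is one indexed lookup, no recursion.
  def ancestor_ids(id)
    @links.select { |l| l.descendant_id == id && l.depth > 0 }
          .sort_by { |l| -l.depth }
          .map(&:ancestor_id)
  end

  # The whole subtree below a node, again without recursion.
  def descendant_ids(id)
    @links.select { |l| l.ancestor_id == id && l.depth > 0 }.map(&:descendant_id)
  end
end

def build_example_tree
  tree = ClosureTable.new
  tree.add(1)    # root
  tree.add(2, 1) # child of 1
  tree.add(3, 2) # grandchild of 1
  tree.add(4, 1) # second child of 1
  tree
end

tree = build_example_tree
puts tree.descendant_ids(1).inspect
puts tree.ancestor_ids(3).inspect
```

The trade-off shows up in add: reads are simple filters, but every insert writes one row per ancestor, and moving a subtree means rewriting all rows that cross the moved edge.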
Generally, all these options allow you to store a tree of objects, i.e. a hierarchical structure of objects of the same class (an ActiveRecord model in this case).
Which one to use depends on which trade-offs are more important for your specific use case. Most importantly, you should figure out how often you change trees (e.g. moving sub-trees around or only adding leaves) and how you query the tree (e.g. do you only need direct children, do you need whole sub-trees, do you need to filter), and choose the appropriate solution based on that.
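As a rough illustration of the querying side of that trade-off, here is a plain-Ruby sketch of materialized paths. The Node struct, the "1/2" path format and the helper names are assumptions for illustration only, not the ancestry gem's internals.

```ruby
# Plain-Ruby sketch of materialized paths (hypothetical names and path format).
# Each node stores the IDs of all its ancestors as a string such as "1/2".
Node = Struct.new(:id, :name, :path) do
  # IDs of all ancestors, root first.
  def ancestor_ids
    path.to_s.split("/").map(&:to_i)
  end

  # The path that this node's direct children store.
  def child_path
    path.to_s.empty? ? id.to_s : "#{path}/#{id}"
  end
end

NODES = [
  Node.new(1, "ceo", nil),
  Node.new(2, "vp", "1"),
  Node.new(3, "manager", "1/2"),
  Node.new(4, "engineer", "1/2/3"),
  Node.new(5, "cfo", "1")
]

# Direct children: an exact match on the path column.
def children_of(nodes, node)
  nodes.select { |n| n.path == node.child_path }
end

# Whole subtree: in SQL this would be a single
# "path = '1/2' OR path LIKE '1/2/%'" condition.
def subtree_of(nodes, node)
  prefix = node.child_path
  nodes.select { |n| n.path == prefix || n.path.to_s.start_with?("#{prefix}/") }
end

vp = NODES.find { |n| n.id == 2 }
puts subtree_of(NODES, vp).map(&:name).inspect
puts children_of(NODES, NODES.first).map(&:name).inspect
```

Adding a leaf touches one row, while moving the "vp" subtree would mean rewriting the path of every node below it, which is the "a bit more expensive" part.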
We have to create a request system which will have roughly 10 different types of requests. All of these requests will belong to the 'accounting' aspect of our application. Therefore we've called them "Accounting requests".
All requests share maybe only a few columns and each has up to 20 columns individually.
We started to wonder if having separate tables for each request type would be practical in terms of speed when we start to have to do very complicated joins or queries, for example, fetching ALL requests types into a single table and then sorting it.
Maybe it would be easier to just use Single Table Inheritance since it will have a type column and we'd be using one table to store all 10 accounting request types.
What do you think regarding using STI for this many polymorphic associations and requirements?
Essentially, it would have models like so:
AccountingRequest
BillingRequest < AccountingRequest
CheckRequest < AccountingRequest
CancellationRequest < AccountingRequest
Each subclass has roughly 10+ fields.
Currently reading about Multiple Table Inheritance here. This seems like the solution that fits my requirements in this case. Not sure yet though.
STI is a good fit if your models all share the same attributes.
However, if your subclasses start having attributes specific to them and not applicable to others, then STI can result in a lot of null columns. In that case, I usually prefer to go with a polymorphic association.
This railscast episode is a great example of the difference between the two.
You can use STI in that situation. But STI requires all the columns to live in one single table, and that's not a good thing: the table will become very wide.
I think you should split it into two tables, as below:
Request: the request table is the polymorphic table, which stores the type of each request.
RequestItem: the request item table stores the up to 20 type-specific fields as rows and has a foreign key to the request table. Besides that foreign key, it has two columns, called key and value.
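A minimal plain-Ruby sketch of that two-table split might look like the following. The structs, the sample rows and the attributes_for helper are assumptions for illustration; a real implementation would use database tables and models.

```ruby
# Plain-Ruby sketch of the Request / RequestItem (key/value) split.
# All names and sample rows are hypothetical.
Request = Struct.new(:id, :kind)
RequestItem = Struct.new(:request_id, :key, :value)

REQUESTS = [Request.new(1, "billing"), Request.new(2, "check")]
ITEMS = [
  RequestItem.new(1, "invoice_number", "INV-7"),
  RequestItem.new(1, "amount", "120.00"),
  RequestItem.new(2, "payee", "ACME")
]

# Reassemble one request's type-specific fields from its key/value rows.
def attributes_for(request, items)
  items.select { |i| i.request_id == request.id }
       .to_h { |i| [i.key, i.value] }
end

puts attributes_for(REQUESTS.first, ITEMS).inspect
```

Note that every value ends up as a string here, so types, per-field constraints and sorting by a specific field have to be handled in application code.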
It sounds do-able.
When I've looked into this, I found that making extensive use of value objects helped to control the non-applicability of some attributes to some of the types.
In my case I had types of products, some of which would not have particular measurements for example. In those cases I used a Null Object to indicate "Not applicable" where appropriate.
Edit: I also found the composed_of syntax very convenient: https://apidock.com/rails/ActiveRecord/Aggregations/ClassMethods/composed_of
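For what it's worth, the Null Object idea can be sketched in plain Ruby like this. Weight, NotApplicable, Product and label_for are hypothetical names, and this does not use composed_of itself.

```ruby
# Sketch of a value object plus a Null Object for "not applicable".
# All class names here are made up for illustration.
Weight = Struct.new(:grams) do
  def to_s
    "#{grams} g"
  end

  def applicable?
    true
  end
end

# Responds to the same messages as Weight, so callers need no nil checks.
class NotApplicable
  def to_s
    "n/a"
  end

  def applicable?
    false
  end
end

class Product
  attr_reader :name

  def initialize(name, weight = nil)
    @name = name
    @weight = weight
  end

  # Fall back to the Null Object when this product type has no weight.
  def weight
    @weight || NotApplicable.new
  end
end

def label_for(product)
  "#{product.name}: #{product.weight}"
end

puts label_for(Product.new("Hammer", Weight.new(500)))
puts label_for(Product.new("E-book"))
```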
For now I'm using a bit of NoSQL for such cases. PostgreSQL's JSONB type lets you store a multilevel Ruby hash. It also provides rich functionality: DB-level constraints, indexes and query operators.
So common attributes are stored in the standard way and child-specific ones in JSONB. Then you can use whatever you need on top of this: STI, the Value Object pattern, serialization, or just scopes for each child. I prefer the last one: my models are thin, most of the constraints are at the DB level, and all business logic is in service classes.
Pros:
Avoiding ALTER TABLE on big tables when you need to add one more child type
Keeping my queries efficient
Preventing storing and selecting unnecessary columns
Serialization out of the box for JSON APIs
Cons:
Somewhat schemaless
Vendor lock-in
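To show the shape of the "common columns plus one JSON document" layout described above, here is a plain-Ruby stand-in. RequestRow, the field names and both helpers are hypothetical, and a Ruby string parsed with the json stdlib only approximates what a jsonb column and its operators do in PostgreSQL.

```ruby
require "json"

# Plain-Ruby stand-in for rows with regular columns plus one JSONB-style
# column. Names and data are assumptions for illustration.
RequestRow = Struct.new(:id, :type, :amount_cents, :details_json) do
  # Child-specific attributes, decoded on demand.
  def details
    JSON.parse(details_json || "{}")
  end
end

ROWS = [
  RequestRow.new(1, "BillingRequest", 12_000,
                 { "invoice" => { "number" => "INV-7", "net_days" => 30 } }.to_json),
  RequestRow.new(2, "CheckRequest", 4_500,
                 { "payee" => "ACME", "memo" => "refund" }.to_json),
  RequestRow.new(3, "BillingRequest", 900,
                 { "invoice" => { "number" => "INV-8", "net_days" => 15 } }.to_json)
]

# A "scope" per child type instead of a subclass per child type.
def billing_requests(rows)
  rows.select { |r| r.type == "BillingRequest" }
end

# Filter on a nested key; roughly what a jsonb path condition does in SQL.
def with_net_days_over(rows, days)
  billing_requests(rows).select { |r| r.details.dig("invoice", "net_days").to_i > days }
end

puts with_net_days_over(ROWS, 20).map(&:id).inspect
```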
I just started reading this guide: https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/CoreData/KeyConcepts.html#//apple_ref/doc/uid/TP40001075-CH30-SW1
And it basically has (in my opinion) two big contradictions:
I get them both, but basically, if I follow the first "implement a custom class to the entity from which classes representing subentities also inherit"-statement, then ALL my entities will be put in the same table. Which could cause performance issues, according to the NOTE.
How big of a performance hit would I run into if I create a "custom super entity"?
You can use the inheritance mechanism to get a default database structure. From your link:
If you have a number of entities that are similar, you can factor the common properties into a superentity, also known as a parent entity.
There is no contradiction. The documentation is just telling you what the database structure is going to be when you use a certain facility. (And it is the standard database table idiom for inheritance.) Using the entity inheritance mechanism automatically declares and implements default parent-child class inheritance functionality along with a parent table. Otherwise you do any parent-child class inheritance declaration and implementation by hand. Each comes with certain performance and other characteristics.
Design involves tradeoffs between costs and benefits over multiple dimensions. "Performance" itself involves multiple dimensions, and has no meaning outside of given application usage patterns. Other dimensions relevant here include complexity of both construction and maintenance.
If you query about entities as parents sufficiently frequently then it can be better to have all parent data in its own table. But if you sufficiently rarely ask for the parent data while querying about a given child type or if you sufficiently frequently need both child and parent data then it can be better to only have parent data in the child tables or table. But notice that each design performs worse at the other kind of query.
The first is talking about sub-entities. The second is talking about subclasses. These are 2 different hierarchies.
One use for sub-entities is if you have a table where you want to show cells displaying different entities. By making them sub-entities, you can fetch the parent entity and all sub-entities will be returned. This is how the Notes app shows the "All Notes" cell above folders: that cell is actually displaying the Account entity, and both Account and Folder are sub-entities of NoteContainer, which is what is fetched. This does mean all of the rows are in the same table. Personally I have not experienced any performance problems, but it is something to keep in mind when modifying the entities in other ways, for example indexes, relations or constraints.
I'm not familiar with this quirk of SQLite, but modeling base class/subclass relationships is usually done with different tables. There is one table that represents the base class and contains the attributes common to all derived classes (Vehicles), and a different table for each subclass that contains the attributes unique to that subclass (Cars, Trains, Airplanes).
Performance is no better or worse than any entity normalized across different tables.
I have an app that consists mainly of restaurant model instances. One of the essential attributes for these restaurants is labeling the cuisine it falls under. I'm currently at odds with myself in regards to designing this. On one hand I thought of creating a Cuisine model and creating either a HMT or HABTM association between Restaurants and Cuisines.
More recently I came across this post which shows how to create a pre-defined set of attributes. To take the answer one step further, I'm assuming (in my case) I'd add a string-based cuisine column to my restaurant model and set up a select box in my restaurant form that would save the selected value.
What I was wondering was what would be the most efficient way of doing this? The goal is to eventually be able to query restaurants based on what cuisine(s) they fall under. I wasn't sure if a model would be the best choice, since it would only serve as a join table, in a sense, with a name attribute. I wasn't sure if having this extra table for something so minute would be optimal.
On the other hand I didn't know if using YAML for this would be conducive since the values are essentially dummy strings with no tangible records on file like I'd have with a model instance. Can someone help me sort out this confusion?
There are many benefits of normalizing many-to-many relationships in the db. Here are some:
Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page.
You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs change.
Also, by normalizing you get the cleaner syntax and other infrastructure benefits from ActiveRecord, e.g.
cuisine.restaurants.where(city: 'Toledo')
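As a rough plain-Ruby picture of what the normalized layout buys you, here is a sketch with restaurants, cuisines and a join table held in arrays. The structs, sample data and helper are assumptions for illustration; with ActiveRecord the lookup would go through the association shown above instead.

```ruby
# Plain-Ruby sketch of a normalized many-to-many with a join table.
# All names and rows are hypothetical.
Restaurant = Struct.new(:id, :name, :city)
Cuisine = Struct.new(:id, :name)
RestaurantCuisine = Struct.new(:restaurant_id, :cuisine_id)

RESTAURANTS = [
  Restaurant.new(1, "Luigi's", "Toledo"),
  Restaurant.new(2, "Sakura", "Toledo"),
  Restaurant.new(3, "Fusion House", "Dayton")
]
CUISINES = [Cuisine.new(1, "Italian"), Cuisine.new(2, "Japanese")]
JOIN_ROWS = [
  RestaurantCuisine.new(1, 1),
  RestaurantCuisine.new(2, 2),
  RestaurantCuisine.new(3, 1), # one restaurant, two cuisines
  RestaurantCuisine.new(3, 2)
]

# Restaurants serving a cuisine, optionally narrowed to a city.
def restaurants_for(cuisine_name, city: nil)
  cuisine = CUISINES.find { |c| c.name == cuisine_name }
  return [] unless cuisine

  ids = JOIN_ROWS.select { |j| j.cuisine_id == cuisine.id }.map(&:restaurant_id)
  matches = RESTAURANTS.select { |r| ids.include?(r.id) }
  city ? matches.select { |r| r.city == city } : matches
end

puts restaurants_for("Italian", city: "Toledo").map(&:name).inspect
puts restaurants_for("Japanese").map(&:name).inspect
```

A single string column on the restaurant could not represent "Fusion House" falling under two cuisines without extra parsing.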
So this is probably a fairly easy question to answer but here goes anyway.
I want to have this view, say media_objects/ that shows a list of media objects. Easy enough, right? However, I want the list of media objects to be a collection of things that are subtypes of MediaObject, CDMediaObject, DVDMediaObject, for example. Each of these subtypes needs to be represented with a db table for specific set of metadata that is not entirely common across the subtypes.
My first pass at this was to create a model for each of the subtypes and alter MediaObject to be smart enough to join into those tables for its conceptual 'all' behavior. This seems straightforward enough, but I end up doing a lot of little things that feel not so rails-O-rific, so I wanted to ask for advice here.
I don't have any concrete code for this example yet, obviously, but if you have questions I'll gladly edit this question to provide that information...
thanks!
Creating a model for each sub-type is the way to go, but what you're talking about is multiple-table inheritance. Rails assumes single-table inheritance and provides really easy support for setting it up. Add a type column to your media_objects table, and add all the columns for each of the specific types of MediaObject to the table. Then make each of your models a sub-class of MediaObject:
class MediaObject < ActiveRecord::Base
end
class CDMediaObject < MediaObject
end
Rails will handle pulling the records out and instantiating the correct subclass, so that when you call MediaObject.find(:all) (MediaObject.all in current Rails versions) the results will contain a mixture of instances of the various subclasses of MediaObject.
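To make that mechanism concrete, here is a plain-Ruby imitation of how a type column can drive which subclass gets instantiated. This is not ActiveRecord code: the load_all helper, the MEDIA_ROWS data and the DVDMediaObject attributes are all made up for illustration.

```ruby
# Plain-Ruby imitation of single-table inheritance loading (hypothetical
# helper and data, not ActiveRecord's implementation).
class MediaObject
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def title
    attributes[:title]
  end

  # Build each row as the class named in its type column; rows without a
  # type fall back to the base class.
  def self.load_all(rows)
    rows.map { |row| Object.const_get(row[:type] || name).new(row) }
  end
end

class CDMediaObject < MediaObject
  def track_count
    attributes[:track_count]
  end
end

class DVDMediaObject < MediaObject
  def region
    attributes[:region]
  end
end

# One wide "table": every row carries every column, unused ones are nil.
MEDIA_ROWS = [
  { type: "CDMediaObject", title: "Kind of Blue", track_count: 5, region: nil },
  { type: "DVDMediaObject", title: "Metropolis", track_count: nil, region: 2 },
  { type: nil, title: "Unknown tape", track_count: nil, region: nil }
]

MediaObject.load_all(MEDIA_ROWS).each do |media|
  puts "#{media.class}: #{media.title}"
end
```

The nil entries in the rows are the unpopulated columns that STI trades for having everything in one table.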
Note this doesn't meet your requirement:
Each of these subtypes needs to be represented with a db table for specific set of metadata that is not entirely common across the subtypes.
Rails is all about convention-over-configuration, and it will make your life very easy if you write your application to its strengths rather than expecting Rails to adapt to your requirements. Yes, STI will waste space leaving some columns unpopulated for every record. Should you care? Probably not; database storage is cheap, and extra columns won't affect lookup performance if your important columns have indexes on them.
That said, you can set up something very close to multiple-table inheritance, but you probably shouldn't.
I know this question is pretty old but just putting down my thoughts, if somebody lands up here.
In case the DB is Postgres, I would suggest using STI along with an hstore column for storing attributes not common across the different objects. This avoids wasting space in the DB, yet the attributes can still be accessed for different operations.
I would say, it depends on your data: For example, if the differences between the specific media objects do not have to be searchable, you could use a single db table with a TEXT column, say "additional_attributes". With rails, you could then serialize arbitrary data into that column.
If you can't go with that, you could have a general table "media_objects" which "has one :dataset". Within the dataset, you could then store the specifics between CDMediaObject, DVDMediaObject, etc.
A completely different approach would be to go with MongoDB (instead of MySQL) which is a document store. Each document can have a completely different form. The entire document tree is also searchable.
I'm getting ready to start a small project that provides an opportunity to use single table inheritance. As I read through prior posts on STI on Stack Overflow, there seem to be some strong opinions on both sides of the argument.
My application is related to my horse racing hobby. A horse's connections are defined as its current jockey, trainer and owner. The jockey, trainer and owner could be modeled using three separate tables (models/classes) or as one class with several sub-classes through single table inheritance.
When faced with a decision like this, is there a checklist of questions that one can go through to determine which approach is preferable? I'm assuming that using STI would reduce the number of potential joins. What are the other practical considerations?
There are a few things you should think about:
Are the objects, conceptually, children of a single parent?
Don't use single table inheritance just because your classes share some attributes; make sure there is actually an OO inheritance relationship between each of them and an understandable parent class.
Do you need to do database queries on all objects together?
If you want to list the objects together or run aggregate queries on all of the data, you’ll probably want everything in the same database table for speed and simplicity.
Do the objects have similar data but different behavior?
If you have a larger number of model-specific columns, you should consider polymorphic associations instead.
The article linked goes in depth a bit more.