I am making a proof of concept for an app using Rails 4 and PostgreSQL, and I am thinking about the best way to handle the relation between Products and SubProducts.
A Product has a name, a description... and a SubProduct can have multiple fields too.
A Product has many SubProducts; a SubProduct belongs to one Product.
I have some Products and SubProducts with hundreds of fields, so I think it is best not to use STI, to avoid thousands of null values.
Also, I am working with remote designers and would like to keep it simple for them. So when they want to display the value of a field from a sub-product, they would write something like @product.name (from the Product table) or @product.whatever (a field from the SubProduct table).
My question is how to handle this? For the moment, I am thinking of deleting the Products table and making multiple SELECTs to the database, one for each SubProduct table. But maybe there is a solution that keeps the Products table? Or maybe I can take advantage of table inheritance in PostgreSQL?
Thank you :-)
Are the hundreds of fields all different for each subproduct? (As you mentioned, "sparse" attributes can lead to lots of nulls.)
This brings to mind an entity-attribute-value model, as described here:
https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Here's a presentation with one organization's solution (key/value hstore):
https://wiki.postgresql.org/images/3/37/Eav-pgconfnyc2014.pdf
This can quickly get very complicated, and makes things like search much more challenging.
And if there are many variations, this also brings to mind a semi-structured, document-oriented or "NoSQL" design.
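If you stay relational, one middle ground (along the lines of the hstore presentation above) is to keep the common columns as real columns and push the sparse, per-type fields into a PostgreSQL hstore column. A minimal sketch, assuming a single Product model; the column name (properties) and the attributes (color, weight) are made up for illustration:

# Migration: add a key/value column for the sparse attributes
class AddPropertiesToProducts < ActiveRecord::Migration
  def change
    enable_extension "hstore"   # PostgreSQL hstore extension
    add_column :products, :properties, :hstore
  end
end

# Model: expose chosen keys as plain attributes
class Product < ActiveRecord::Base
  store_accessor :properties, :color, :weight   # designers can write product.color
end

Values come back as strings, and searching inside the hstore is clumsier than searching real columns, which is exactly the trade-off mentioned above.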
I'm relatively new to the Rails framework and I'm not sure if the approach I am taking is the most efficient/effective way or if I am following Rails conventions well.
The basic issue I have is that my application will have a Company model and various preset Categories (not editable by the user). Each Company can be part of multiple Categories. My understanding, from other examples, is that I should set up the relationships as something like:
Company has_and_belongs_to_many Categories
Category has_and_belongs_to_many Companies
However, since there will not be that many categories (<10), and since they will not be changed, edited, added, or removed by users, I'm not sure I need to create a whole new table for categories and then join them to Companies. Is there a better way to do this in Rails that I'm missing? Thanks in advance!
Even though you may only have 10 categories or so, I would say it is still fine to keep them in their own table. Setting up the relationships gives you the programmatic power to retrieve companies related to a single category, and vice versa, when you need it, without having to construct queries yourself.
An example of the simplicity of keeping those in the database:
# Get all companies under a specific Category
@category = Category.find(1)
@companies = @category.companies
That's pretty simple if you ask me. And if you add another category to the table in the future, you won't need to write any new code to get it to work.
Another thing, I would check out using has_many :through instead of has_and_belongs_to_many (habtm), as habtm can cause unforeseen problems as your application gets bigger. Here is a great article that goes into that problem a lot deeper: Why You Don’t Need Has_and_belongs_to_many Relationships. Not saying you can't use it (if the shoe fits), but generally it's good to be aware of potential problems so you can make the right decision for you and your app.
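For reference, the has_many :through version of the above would look roughly like this; the join model name (Categorization) is just a guess, so pick whatever fits your domain:

class Company < ActiveRecord::Base
  has_many :categorizations
  has_many :categories, through: :categorizations
end

class Categorization < ActiveRecord::Base
  belongs_to :company
  belongs_to :category
end

class Category < ActiveRecord::Base
  has_many :categorizations
  has_many :companies, through: :categorizations
end

The query from the example above (@category.companies) still works the same way; you just gain a real model for the join row if you ever need to put data or validations on it.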
I'm building a Ruby on Rails app for a business and will be using an ActiveRecord database. My question really has to do with database architecture and the best way to organize all the different tables and models within my app. The app will have a database of orders for an e-commerce business that sells products through two different channels: a subscription service, where they pick the products and sell them for a fixed monthly fee, and a traditional e-commerce channel, where customers pay for their products directly. So essentially, while all of these would be classified under the Order model, there are two types of Orders: Subscription Order and Regular Order.
So initially I thought I would classify all this activity in my Orders Table and include a field 'Type' that would indicate whether it is a subscription order or a regular order. My issue is that there are a bunch of fields that I would need that would be specific to each type. For instance, transaction_id, batch_id and sub_id are all fields that would only be present if that order type was a subscription, and conversely would be absent if the order type was regular.
My question is, would it be in my best interest to just create two separate tables, one for subscription orders and one for regular orders? Or is there a way that fields could appear only conditionally, based on what the Type field is? I would hate to see so many nil values if, for instance, the order type was a regular order.
Sorry, this question isn't so much technical as it is about best practices and organization.
Thanks,
Sunny
What you've described is a pattern called Single Table Inheritance — aka, having one table store data for different types of objects with different behavior.
Generally, people will tell you not to do it, since it leads to a lot of empty fields in your database which will hurt performance long term. It also just looks gross.
You should probably instead store the data in separate tables. If you want to get fancy, you can try to implement Class Table Inheritance, in which there is a separate but connected table for each of the child classes. This isn't supported natively by ActiveRecord. This gem and this gem might be able to help you, but I've never used either, so I can't give you a firm recommendation.
I would keep all of my orders in one table. You could create a second table for "subscription order information" that would contain only the columns transaction_id, batch_id and sub_id, as well as a foreign key linking it back to the main orders table. You would still want to include an order type column in the main orders table, though, to make debugging a little easier.
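In Rails terms that could be wired up roughly like this (the model and column names are only illustrative):

class Order < ActiveRecord::Base
  has_one :subscription_detail   # only present for subscription orders
end

class SubscriptionDetail < ActiveRecord::Base
  belongs_to :order
  # columns: order_id, transaction_id, batch_id, sub_id
end

order.subscription_detail is simply nil for regular orders, so the main orders table stays free of empty columns.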
Assuming you're using Postgres, I might lean towards an hstore column for that.
Some reading:
http://www.devmynd.com/blog/2013-3-single-table-inheritance-hstore-lovely-combination
https://github.com/devmynd/hstore_accessor
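As a rough sketch of what that could look like (the column and key names are made up, and store_accessor is Rails' built-in store mechanism if you don't want to pull in the gem):

class AddSubscriptionDataToOrders < ActiveRecord::Migration
  def change
    enable_extension "hstore"
    add_column :orders, :subscription_data, :hstore
  end
end

class Order < ActiveRecord::Base
  store_accessor :subscription_data, :transaction_id, :batch_id, :sub_id
end

Regular orders just leave the hstore empty, so there are no extra nil columns on the orders table.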
Make an integer column called order_type.
In the model do:
SUBSCRIPTION = 0
ONLINE = 1
...
It'll query better than strings, and whenever you want to reference one you use Order::SUBSCRIPTION.
Make two or more other tables, each with a foreign key that references the ID of the corresponding row in orders.
Now you can keep all shared data in the orders table, for easy querying, and all unique data in the other tables so you don't have bloated models.
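Putting it together, a minimal sketch of that model; the scope names are just examples:

class Order < ActiveRecord::Base
  SUBSCRIPTION = 0
  ONLINE       = 1

  scope :subscriptions, -> { where(order_type: SUBSCRIPTION) }
  scope :online,        -> { where(order_type: ONLINE) }
end

# Usage:
# Order.create!(order_type: Order::SUBSCRIPTION)
# Order.subscriptions   # all subscription orders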
I have a model which has many of another model, but that other model only needs to have 10 or fewer records in it.
Let's say it has Bathroom, Kitchen, and LivingRoom, for argument's sake, and the records will probably never need to change.
What is the best way of making a model like this that doesn't use a database table?
This may not be best practice, but to solve the same problem I just specified a collection in my model, like this:
ROOM_TYPES = [ "Bathroom", "Living Room", "Kitchen" ]
Then in the view:
<%= f.select(:room_type, Project::ROOM_TYPES, {:prompt => '...'}) %>
(replace Project with your actual model name.)
Super-straightforward, almost no setup. I can see how it would be difficult to maintain though, since there's no way to add items without accessing the Rails code, but it does get the job done quickly.
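One small, optional addition if you go this route: validate against the same constant so bad values never reach the database (same Project model as above):

class Project < ActiveRecord::Base
  ROOM_TYPES = ["Bathroom", "Living Room", "Kitchen"]

  validates :room_type, inclusion: { in: ROOM_TYPES }
end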
Even though the collection of rows never changes, it's still useful to have this as a table in your database in order to leverage ActiveRecord relations. A less contrived example would be a database table that has a list of US states. This is highly unlikely to change, and would likely have only a couple of columns (state name and state abbreviation). However, by storing these in the database and supporting them with ActiveRecord, you preserve the ability to do handy things like searching for all users in a state using conventional rails semantics.
That being said, an alternative would be to simply create a class that you stick in your models directory that does not inherit from ActiveRecord, and then either populate it once from the database when the application loads, or simply populate it by hand.
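A bare-bones sketch of that non-ActiveRecord approach, populated by hand; the class name and values are invented for illustration:

class RoomType
  attr_reader :id, :name

  def initialize(id, name)
    @id, @name = id, name
  end

  ALL = [new(1, "Bathroom"), new(2, "Kitchen"), new(3, "LivingRoom")].freeze

  def self.all
    ALL
  end

  def self.find(id)
    ALL.find { |room_type| room_type.id == id }
  end
end

RoomType.all can then feed a select helper in the view, much like the constant-based answer above.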
A similar question was asked yesterday, and one of the answers proposes a mechanism for supporting something similar to what you want to do:
How to create a static class that represents what is in the database?
I am working on a project that currently has tons of HABTM associations. Essentially, everything is related to everything else. I am considering setting up a single intermediate table/model that has two polymorphic fields. This way, if I add another model I can easily connect it to the remaining models. Is this a good idea? If not, why not? If it is, why don't all rails projects have this kind of intermediate table?
I see two other options. I could keep adding intermediate tables or I could add a table that contains one of each type. The former option is kind of a hassle and the latter option does not allow for self joins.
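Roughly, the single intermediate model I have in mind would look like this (names are just placeholders):

class Connection < ActiveRecord::Base
  # columns: a_id, a_type, b_id, b_type
  belongs_to :a, polymorphic: true
  belongs_to :b, polymorphic: true
end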
While a polymorphic join table sounds like it would make things easier, I think you will end up creating more headache for yourself than it's worth. Here are a few potential challenges/problems off the top of my head:
You will not be able to use ActiveRecord's has_and_belongs_to_many association or related helpers without a ton of hacking/monkeypatching, which will immediately eclipse the time it would take to set up individual pairwise link tables.
Your join table will have two id columns, let's call them a_id and b_id. For any given pair of models you will have to ensure that the ids always end up in the same column.
Example: If you have two models called User and Role, you would have to ensure for that pair that the user_id is always stored in col a_id and the role_id is always stored in col b_id, otherwise you will not be able to index the table in any kind of meaningful way (and will run the risk of defining the same relationship twice).
If you ever want to use database enforcement of FOREIGN KEY constraints it is unlikely that this polymorphic link table scheme will be supported.
The universal link table will grow as large as all of the separate link tables combined. It shouldn't matter much with good indexing, but as your application and data grow this could become a headache and limit some of your options with regard to scaling. Give your DB a break.
Most or least importantly (I can't decide) you will be bucking the norm which means a lot fewer (if any) resources out there to help you when you run into trouble. Basically the Adam Sandler "they're all gonna laugh at you" rationale.
Last thought: Can you eliminate any of the link tables by using has_many :xxx, :through => :xxx relationships?
Thinking it all through, you could actually do this, but I wouldn't. Join tables grow fast enough as it is, and I like to keep model relationships simple and easy to alter.
I'm used to working on very large systems / data sets though, so if you're not going to have much in each join then it's OK. I'd still do it separately for joins, however, and I really like my polymorphics.
I think it would be cleaner and more flexible if you were to use multiple join tables as opposed to one giant multipurpose join table.
I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now, since they are all listings, I was thinking of a polymorphic approach: creating a parent LISTINGS table and a different child table for each of the categories (COMPUTERS, CARS, CAMERAS). Each child table would have a listing_id that links back to the LISTINGS table. So when a listing is fetched, it would pull a row from LISTINGS joined with the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single table inheritance but quickly brushed off the thought because the table would get too large too quickly. But then another dilemma came to mind: if the user does a global search on all the listings, that means I will have to query each child table separately. What happens if I have over 100 different categories? Wouldn't that be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine; no reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done, which worked pretty well, was to have a table that is used only for search and gets a copy of whatever fields need to be searched. Then you put triggers on those fields to update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross-reference model you can join all your listings for a given category during search. Then add a little dynamic SQL (because it's easier to understand), build up your query to include the field(s) you want to search against, and execute it.
That's it.
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes, but anything we would discuss can be understood by reading the following post:
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described, though I'm not sure the subclass tables should have their own IDs. Since a CAR is a LISTING, it makes sense that the values come from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).