Postgres HStore vs HABTM - ruby-on-rails

I am building an app that has a model that can be tagged with entries from another model, similar to the tagging function of Stack Overflow.
For example:
class Question < ActiveRecord::Base
  has_and_belongs_to_many :tags
end
class Tag < ActiveRecord::Base
  has_and_belongs_to_many :questions
end
I am debating between just setting up a has_and_belongs_to_many relationship with a join table, or adding the tags to a hash using Postgres' hstore feature.
Looking for anyone that has had a similar experience that can speak to performance differences, obstacles, or anything else that should persuade me one way or another.
EDIT:
I think I should also mention that this will be an API with an AngularJS frontend.

You are describing the topic of a great debate :) Normalization vs. denormalization. A many-to-many relationship lets you run queries such as "how many people use a certain tag" very simply. HStore is very nice as well, but you end up with thousands of copies of the same tags everywhere. I use both approaches in different projects, but the real problem comes when you decide one day to move to a different database: with hstore you are tied to PostgreSQL or have to rewrite your code.
If very high speed is important, you need to query in different ways, and you often want to load a user record in one fell swoop and show all of its tags, I normally do both. I create a many-to-many relationship, since tags are usually connected to more objects anyway: a user has many tags from the tags table, and tags are connected to, say, brands, which are connected to products, and so on.
Then I add a field with hstore or JSON objects to the user table. Every tag is added to it when the many-to-many relationship is created and removed from it when that relationship is destroyed.
To give you an example: in one of my projects I have companies (almost 10 million) that are interested in certain keywords and their ranking on Google. This table has millions of rows but is connected to only 2 million keywords, which are connected to search results. This way I can quickly query which result is searched for by how many people and who they are.
If a customer opens their keyword search page, I load their keywords from a text column containing JSON, which is faster than going through the table.
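The "do both" approach described above can be sketched in plain Ruby, with no Rails or database involved. This is only an illustration under stated assumptions: `TagStore`, `add_tag`, `remove_tag`, and `cached_tags` are hypothetical names, the in-memory set stands in for the join table, and the JSON strings stand in for the text column on the user table.

```ruby
require 'json'
require 'set'

# Hypothetical in-memory stand-in for a users/tags join table plus a
# denormalized JSON cache per user. Not ActiveRecord; a sketch only.
class TagStore
  def initialize
    @join_rows = Set.new   # [user_id, tag] pairs: the normalized source of truth
    @tags_cache = {}       # user_id => JSON string, like a text/json column
  end

  def add_tag(user_id, tag)
    @join_rows.add([user_id, tag])
    refresh_cache(user_id)
  end

  def remove_tag(user_id, tag)
    @join_rows.delete([user_id, tag])
    refresh_cache(user_id)
  end

  # Fast read path: one "column" read, no join.
  def cached_tags(user_id)
    JSON.parse(@tags_cache.fetch(user_id, '[]'))
  end

  # The kind of question the normalized side answers easily.
  def users_count_for(tag)
    @join_rows.count { |_user_id, t| t == tag }
  end

  private

  # Rebuild the cached JSON for one user from the join rows.
  def refresh_cache(user_id)
    tags = @join_rows.select { |uid, _tag| uid == user_id }.map(&:last).sort
    @tags_cache[user_id] = JSON.generate(tags)
  end
end

store = TagStore.new
store.add_tag(1, 'ruby')
store.add_tag(1, 'postgres')
store.add_tag(2, 'ruby')
store.remove_tag(1, 'postgres')
p store.cached_tags(1)
p store.users_count_for('ruby')
```

The cost of this design is that every write touches two places, so the cache refresh has to run wherever the join rows change.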

Related

Small number of set categories with many-to-many relation?

I'm relatively new to the Rails framework and I'm not sure if the approach I am taking is the most efficient/effective way or if I am following Rails conventions well.
The basic issue I have is that my application will have a Company model and various set Categories (not editable by the user). Each Company can be part of multiple Categories. My understanding, from other examples, is that I should set the relationships as something like:
Company has_and_belongs_to_many Categories
Category has_and_belongs_to_many Companies
However, since there will not be that many categories (<10), and since they will not change/be editable/be added/be removed by users, I'm not sure I need to create a whole new table for categories then join them onto Companies? Is there a better way to do this in Rails that I'm missing? Thanks in advance!
Even though you may only have 10 categories or so, I would say it is still fine to keep them in their own table. Setting up the relationships gives you the programmatic power to retrieve the companies related to a single category, and vice versa, without having to construct the queries yourself.
An example of the simplicity for keeping those in the database:
# Get all companies under a specific Category
@category = Category.find(1)
@companies = @category.companies
That's pretty simple if you ask me. And if you add another category to the table in the future, you won't need to write any new code to get it to work.
Another thing, I would check out using has_many :through instead of has_and_belongs_to_many (habtm), as habtm can cause unforeseen problems as your application gets bigger. Here is a great article that goes into that problem a lot deeper: Why You Don’t Need Has_and_belongs_to_many Relationships. Not saying you can't use it (if the shoe fits), but generally it's good to be aware of potential problems so you can make the right decision for you and your app.

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' ] } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
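For illustration, the step from that hash to the search URL can be sketched with Ruby's standard library alone. The helper name `search_url` and the `SEARCH_BASE` constant are hypothetical; a Rails url helper would normally do this job. Note that `URI.encode_www_form` percent-encodes the brackets in `bar[]`.

```ruby
require 'uri'

# Base address taken from the example URL in the question.
SEARCH_BASE = 'www.example.com/search'

# Hypothetical helper: turns the saved parameters hash into a search URL.
# An array value is emitted as one key=value pair per element.
def search_url(params)
  query = URI.encode_www_form('foo' => params[:foo], 'bar[]' => params[:bars])
  "#{SEARCH_BASE}?#{query}"
end

puts search_url({ foo: 'foovalue', bars: %w[barvalue1 barvalue2] })
```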
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other association, instead just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
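class Query < ActiveRecord::Base
  # Store the array of duck ids in a single serialized column
  serialize :duck_ids

  # Look up the Duck records referenced by the stored ids
  def ducks
    transaction do
      Duck.where(id: duck_ids)
    end
  end
end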
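What a serialized column amounts to can be approximated in plain Ruby. This is a sketch under assumptions, not how Rails implements `serialize`: `StoredQuery` and `DUCKS` are hypothetical names, JSON is chosen here as the text format, and a hash stands in for the ducks table.

```ruby
require 'json'

# Hypothetical in-memory "ducks" table: id => name.
DUCKS = { 1 => 'Mallard', 2 => 'Teal', 3 => 'Pintail' }.freeze

# Stand-in for a Query record whose duck ids live in a single text column.
class StoredQuery
  attr_reader :duck_ids_column # the raw string as it would sit in the database

  def initialize(duck_ids)
    self.duck_ids = duck_ids
  end

  # Writer: coerce to integers and serialize the array into text.
  def duck_ids=(ids)
    @duck_ids_column = JSON.generate(ids.map { |id| Integer(id) })
  end

  # Reader: deserialize on access.
  def duck_ids
    JSON.parse(@duck_ids_column)
  end

  # Counterpart of Duck.where(id: duck_ids): a second lookup by id.
  def ducks
    DUCKS.values_at(*duck_ids).compact
  end
end

q = StoredQuery.new([1, '3'])
p q.duck_ids_column
p q.ducks
```

Because the ids are opaque text to the database, any filtering on them has to happen after deserializing in Ruby.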
If your database has native array support, you can do something similar from within the DB.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

Better solution for mongoid many to many relationship

The Mongoid documentation told me that n-n relations should be used with caution.
I understand this, but I don't know how to solve my problem in a better way using pure Mongoid:
A course has many participants, and a participant can take part in many courses. So wouldn't it be faster to store the participants on the course model and search over all courses when all courses of a participant are needed?
Your model should be reflective of your use cases.
One way to do this would be to have one model for the courses, one for participants and a 3rd that maps students to courses (with a unique index on course & student to prevent duplicates). This way there is a single model referring to the other 2. This may or may not be ideal based on your access patterns.
I think this is probably a good use case for embedding documents. See the sample syntax on the front page for embeds_many and embedded_in: http://mongoid.org/en/mongoid/
The main downside here is that if you have participants in more than one course, you will have duplicate participants in each of those courses.
Make sure you put an index on the fields you plan to do your lookups for participants with.
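The two options above can be compared in a plain-Ruby sketch with no Mongoid involved. All names here (`Enrollments`, `enroll`, `courses_for`, `courses_embedding`) are hypothetical, a `Set` plays the role of the mapping model with its unique index, and arrays of hashes stand in for course documents with embedded participants.

```ruby
require 'set'

# Option 1: a mapping collection with a uniqueness guarantee.
# The Set of [course_id, participant_id] pairs cannot hold duplicates.
class Enrollments
  def initialize
    @pairs = Set.new
  end

  # Returns true if the pair was new, false if it already existed.
  def enroll(course_id, participant_id)
    !@pairs.add?([course_id, participant_id]).nil?
  end

  def courses_for(participant_id)
    @pairs.select { |_course_id, pid| pid == participant_id }.map(&:first).sort
  end
end

# Option 2: participants embedded in each course document.
# Finding one participant's courses means scanning every course.
def courses_embedding(courses, participant_name)
  courses
    .select { |course| course[:participants].any? { |person| person[:name] == participant_name } }
    .map { |course| course[:title] }
end

courses = [
  { title: 'Ruby 101', participants: [{ name: 'Ann' }] },
  { title: 'DB 201',   participants: [{ name: 'Bob' }, { name: 'Ann' }] }
]
p courses_embedding(courses, 'Ann')
```

In the embedded version 'Ann' is stored once per course, which is the duplication mentioned above.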

Blog architecture design

I'm teaching myself rails, and want to build a blog similar to tumblr. It will have a couple different post types, such as written text, photo posts, audio posts, and video posts.
My initial thought was to have different models for each type of post, since there will be different rules for each type of post. However, I'm still learning and don't know what I don't know, so maybe there is a better way to go about things (maybe only one model for posts, and a table for post types?).
Any feedback would be appreciated.
A good relational and object-oriented design would probably be to have one main post model holding the attributes and behaviors that all post types mostly share. This could even act as your "text" type post.
This could also simplify relationships with the posts (e.g. "user has many posts" vs. "user has many text posts and/or video posts and/or etc.").
Then have a sort of "attachments" join table, which determines the type of attachment (so you can have multiple attachments per post):
CREATE TABLE attachments (post_id, media_type, media_id);
Then have a table and model for each type for specific behaviors and handlers for the media types.
CREATE TABLE audios (id, transcription, storage);
CREATE TABLE videos (id, location, format, storage);
This will probably require some sort of polymorphic relationship, though, which could be a debatable DB design... you'll need views and triggers to query easily and maintain integrity... but Rails handles it quite well.
The post model would have
has_many :attachments
and attachments would have
belongs_to :post
belongs_to :media, :polymorphic => true
and each of the media model would have
has_one :attachment, :as => :media
then you can access your media via
post.attachments[0].media
You can skip the attachments table and merge the attributes with the posts table if you only need one type of media per post
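To make the polymorphic lookup concrete, here is a plain-Ruby sketch of what resolving `media_type` plus `media_id` amounts to. It is not Rails code: the `Struct` classes and the `MEDIA_TABLES` hash are hypothetical stand-ins for the models and their tables.

```ruby
# Hypothetical plain-Ruby stand-ins for the media models.
Audio = Struct.new(:id, :transcription)
Video = Struct.new(:id, :format)

# Stand-in for the audios and videos tables, keyed by type name, then id.
MEDIA_TABLES = {
  'Audio' => { 1 => Audio.new(1, 'hello world') },
  'Video' => { 1 => Video.new(1, 'mp4') }
}.freeze

# One row of the attachments table: post_id, media_type, media_id.
Attachment = Struct.new(:post_id, :media_type, :media_id) do
  # The type column picks the table, the id column picks the row.
  def media
    MEDIA_TABLES.fetch(media_type).fetch(media_id)
  end
end

attachments = [
  Attachment.new(10, 'Audio', 1),
  Attachment.new(10, 'Video', 1)
]

attachments.each { |a| puts "#{a.media_type}: #{a.media.inspect}" }
```

Since `media_id` can point into different tables, the database cannot enforce it with an ordinary foreign key, which is the integrity concern raised above.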
Sorry, I keep editing; I keep thinking of more things to say :)
Here's a couple of options that could work.
First, you could just make one Model with columns for text_content, video_link, photo_link, etc.
Then in your view, you could render the post's view to the user (probably using a partial) with a different look depending on which attributes have values.
A second option would be to make a smaller Post table that just had key information and use a series of 'has_one' relationships to the other items.
The only advantage I see to the second option is that your DB table would be smaller, since you don't have to represent the null cells over and over. Unless you're worried about some huge scaling issues, I'd go with the first option though.
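A minimal sketch of the first option, in plain Ruby: one post record with nullable media columns, and a function that chooses how to render it from whichever attribute is set. The `Post` struct, `partial_for`, and the partial names are all made up for illustration.

```ruby
# Hypothetical single-table post: unused media columns are simply nil.
Post = Struct.new(:title, :text_content, :video_link, :photo_link, keyword_init: true)

# Picks a partial name from whichever media attribute is present,
# falling back to the plain text layout.
def partial_for(post)
  if post.video_link
    'posts/video'
  elsif post.photo_link
    'posts/photo'
  else
    'posts/text'
  end
end

posts = [
  Post.new(title: 'Hello', text_content: 'First post'),
  Post.new(title: 'Clip', video_link: 'http://example.com/clip'),
  Post.new(title: 'Snap', photo_link: 'http://example.com/snap.jpg')
]

posts.each { |post| puts "#{post.title} -> #{partial_for(post)}" }
```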

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (e.g. Cars, Computers, Cameras) and each category of ads has its own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower, while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
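As a rough illustration of building the query dynamically, here is a Ruby sketch rather than a stored procedure. It joins a child table directly, following the schema in the question, and the `SEARCHABLE` whitelist and `build_search` helper are hypothetical. Table and column names come only from the whitelist, while user-supplied values go into bind parameters.

```ruby
# Hypothetical whitelist: category table => searchable columns in it.
SEARCHABLE = {
  'cars'      => %w[make model],
  'computers' => %w[cpu ram]
}.freeze

# Builds [sql, binds] for a search within one category.
# Filters on columns outside the whitelist are ignored.
def build_search(category, filters)
  columns = SEARCHABLE.fetch(category) { raise ArgumentError, "unknown category: #{category}" }
  used = filters.select { |col, _value| columns.include?(col) }
  sql = "SELECT l.* FROM listings l JOIN #{category} c ON c.listing_id = l.listing_id"
  sql += ' WHERE ' + used.keys.map { |col| "c.#{col} = ?" }.join(' AND ') unless used.empty?
  [sql, used.values]
end

sql, binds = build_search('cars', { 'make' => 'Audi', 'model' => 'A4' })
puts sql
p binds
```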
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes. But anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).
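The idea of a second, search-optimized copy of the data can be sketched in plain Ruby. The rows below follow the LISTINGS / CARS / COMPUTERS layout from the question but are invented sample data, and `build_search_table` and `global_search` are hypothetical helpers; in a database the copy would be kept up to date by triggers or at write time.

```ruby
# Invented sample rows in the shape of the question's tables.
LISTINGS  = [{ listing_id: 1, description: 'Family car' },
             { listing_id: 2, description: 'Gaming rig' }].freeze
CARS      = [{ listing_id: 1, make: 'Volvo', model: 'V70' }].freeze
COMPUTERS = [{ listing_id: 2, cpu: 'Ryzen 7', ram: '16GB' }].freeze

# Flattens each listing and its child rows into one searchable text blob.
def build_search_table(listings, *child_tables)
  listings.map do |listing|
    children = child_tables.flat_map do |table|
      table.select { |row| row[:listing_id] == listing[:listing_id] }
    end
    words = [listing[:description]] +
            children.flat_map { |row| row.reject { |key, _| key == :listing_id }.values }
    { listing_id: listing[:listing_id], text: words.join(' ').downcase }
  end
end

# General search: a single pass over one table, whatever the category.
def global_search(search_table, term)
  search_table.select { |row| row[:text].include?(term.downcase) }
              .map { |row| row[:listing_id] }
end

table = build_search_table(LISTINGS, CARS, COMPUTERS)
p global_search(table, 'volvo')
```

Category-specific filters such as RAM size would still go to the child tables, as described above.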

Resources