Rails eager loading relational table with only 1 piece of data - ruby-on-rails

As a simplification of my problem, let's say I have a songs table and an artists table.
The majority of the time I'm querying songs, and I always need the artist name attached. Would it make sense to add another column to songs called artist_name so that I don't need to do .includes(:artist) eager loading?
I think my current design is the typical one, but what is the most efficient one, or what is the best way to handle this situation in Rails?

Song.includes(:artist).find(2)

To answer my own question:
It seems database denormalization makes sense only in read-heavy applications, and one should profile extensively first.
Some options:
Add an includes to the default scope for songs
Denormalize artist name and use model hooks to ensure data integrity
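To make the second option more concrete, here is a minimal plain-Ruby sketch of keeping a denormalized artist_name in sync. It deliberately avoids ActiveRecord so it runs on its own; the Song and Artist classes, add_song, and the sample names are all stand-ins invented for illustration. In a real Rails model the same job would probably be done by a callback on Artist rather than a custom setter.

```ruby
# Plain-Ruby stand-ins for the two models; no ActiveRecord involved.
class Song
  attr_reader :title
  attr_accessor :artist_name # denormalized copy of Artist#name

  def initialize(title)
    @title = title
  end
end

class Artist
  attr_reader :name, :songs

  def initialize(name)
    @name = name
    @songs = []
  end

  # Attaching a song copies the current artist name onto it.
  def add_song(song)
    song.artist_name = name
    songs << song
    song
  end

  # Plays the role of a model hook: every rename is pushed to the copies.
  def name=(new_name)
    @name = new_name
    songs.each { |song| song.artist_name = new_name }
  end
end

artist = Artist.new("Old Name")
song = artist.add_song(Song.new("Some Song"))
artist.name = "New Name"
puts song.artist_name
```

The cost of this design is visible in name=: a single rename touches every song row, which is why it only pays off when reads vastly outnumber writes.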


Program design when combining external sources with local

I have a Rails app that is basically designed this way:
It has a Book model, that has an external_id (all saved Book records have an external_id). The external_id links to an external source about books that doesn't allow for the data to be stored. We use a Presenter to handle some of the differences in the Book model and the external library's class to smooth things over for the view.
We let users do things like "Favorite" their books, regardless of source, so we have a join table and model with a book_id and a user_id to record favorites.
However, in some of the queries, a list of results from the external source will be displayed to the user, even though we might have Book records with those external_ids. We want to be able to display information such as which of the user's friends have favorited that book.
It seems there are a couple of ways to handle this:
1) Always load the canonical Book record (if it exists) in the presenter based on the external_id, and override the Book#friends_who_favorited method to return false if no external_id was found
2) Overload the presenter to either call Book#friends_who_favorited or if not a Book record, create its own join query based on external_id (since we wouldn't know the book id yet).
3) Denormalize the database a little, and make sure that we always store the external_id everywhere -- Basically treat external_id like the primary key since every Book record has an external_id. Then the queries can be done more directly, not require a join query, and we wouldn't need multiple queries written. But, this ties us even more to that external source since now our database design will be based on external_id.
It seems like #1 might be the best way to do it, even though it would introduce an extra query to Book (Book.where(external_id: x).first), since #2 would require writing a whole set of additional queries to handle the external_id case. But, I'm open to suggestions as I'm not fully comfortable with any of these methods.
Based on the discussions, if I do that I might consider this solution:
Unify the identifier of all books into a single field used instead of ActiveRecord's default id. This is the current external_id field, though I would prefer to rename it without the underscore, say rid, for resource id.
Use a format for the rid of internal books that differs from that of external books.
For example, suppose external ids look like "abcde12345"; then you name an internal book's rid "int_123", based on its actual id, so all of them are guaranteed to be unique.
Use a model callback to set rid after creation. If the book is internal, copy its id and add the "int_" prefix. If it's external, save its external id to that field.
Now usage becomes simpler. For every action, use rid instead of the original id. When a user favorites a book, the association is made through the rid.
In the join table you can also keep the original id, so that if you change the implementation one day, the original ids are still available.
The join table will then have 4 fields: id, user_id, book_id (the original id), book_rid.
To display the users who liked a book, whether the book is external or not, you can now query the join table by rid.
Refactoring toward this solution should not be hard and does no harm:
Add a field rid in the join table.
Build a task to fill in the rid of all books. In practice this only affects internal books, which have a blank external_id at the moment.
Build a query to fill in the rid field in the join table.
Refactor the association to specify the key it uses, and any other related methods if needed.
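The rid scheme described above can be sketched in plain Ruby, without ActiveRecord, to show the two rules involved: how a rid is derived, and how favorites can be looked up by rid alone. The Book struct, the Favorites class, and the INTERNAL_PREFIX constant are illustrative names only; the "int_" prefix comes from the example in the text. In the Rails app the rid would be a column filled in by a callback, and Favorites would be the join table.

```ruby
# Prefix for locally created books, as in the "int_123" example.
INTERNAL_PREFIX = "int_"

Book = Struct.new(:id, :external_id) do
  # The external id wins when present; otherwise derive a rid from the local id.
  def rid
    if external_id.nil? || external_id.empty?
      "#{INTERNAL_PREFIX}#{id}"
    else
      external_id
    end
  end
end

# Favorites keyed by rid, standing in for the join table's book_rid column.
class Favorites
  def initialize
    @user_ids_by_rid = Hash.new { |hash, rid| hash[rid] = [] }
  end

  # Records a favorite once per user and rid.
  def add(user_id, rid)
    list = @user_ids_by_rid[rid]
    list << user_id unless list.include?(user_id)
  end

  # Works for any rid, whether or not a local Book record exists for it.
  def users_for(rid)
    @user_ids_by_rid.fetch(rid, [])
  end
end
```

Because users_for only needs the rid, a search result that came straight from the external source can be checked for favorites without first loading a Book record.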

Loading all the data but not from all the tables

I watched this Railscast http://railscasts.com/episodes/22-eager-loading but I'm still confused about the best way to write an efficient GET REST service for a scenario like this:
Let's say we have an Organization table and about twenty other tables related to it through belongs_to and has_many (so all those tables have an organization_id field).
Now I want to write GET and INDEX requests as a Rails REST service that, based on the organization id passed in the URL, reads those tables and fills the JSON, BUT NOT for ALL of those tables, only for a few of them: for example Patients, Orders and Visits, not all twenty.
So I still have trouble getting my head around how to write such a
.find( :all )
sort of query.
Can someone show an example so I can understand how to write this sort of query?
You can eager-load all of those associations in a single call:
@organization = Organization.includes(:patients, :orders, :visits).find(1)
Now when you do something like @organization.patients, it will read the patients from memory, since they were already fetched by the original call. Without includes, @organization.patients would trigger another database query. This is why it's called "eager loading": you are loading the patients of the organization before you actually reference them (eagerly), because you know you will need that data later.
You can use includes anytime, whether using all or not. Personally I find it to be more explicit and clear when I chain the includes method onto the model, instead of including it as some sort of hash option (as in the Railscast episode).

Rails Order by Contained Objects 2 Levels Deep?

How can I include ordering in an 'order' ActiveRelation call that's more than one level deep?
That is, I understand the answer when it's only one level deep (asked and answered at Rails order by associated data). However, I have a case where the data on which I want to sort is two levels deep.
Specifically, in my schema a SongbookEntry contains a Recording, which contains an Artist and a Song. I want to be able to sort SongbookEntry lists by song title.
I can go one level deep and sort Recordings by song title:
@recordings = Recording.includes(:song).order('songs.title')
...but don't know how to go two levels deep. In addition, it would be great if I could sort on the recording (that is, the song title and the artist name) -- is this possible without descending into SQL?
Thanks for any help,
If you model the association between SongbookEntry and Song as such:
class SongbookEntry < ActiveRecord::Base
  # ...
  has_one :song, through: :recording
end
you will be able to access @songbookentry.song and SongbookEntry.joins(:song) using your existing schema.
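Applying the same idea for Artist (has_one :artist, through: :recording), a possible query would be something like SongbookEntry.joins(:song, :artist).order('songs.title, artists.name').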
Note that this may not be the most efficient operation (multiple joins involved) even though it looks Rails-ish, so later on you may want to denormalize the tables as Ryan suggested, or find another way to model the data.
I would advise storing the artist name (and possibly the song title too) on the recording itself, so you don't have to "descend into SQL".
Try this
SongbookEntry.includes(:recording=>[:artist,:song]).order('songs.title, artists.name')
You can use joins in place of includes if you don't need the associated tables' fields in your views.

Model design -- What's the optimal way to do this?

I have an app I am writing in Rails 3.1, and I am wondering about the best way to model this.
Would it be best to create a model called "Movie" with a "title", then create a new model for each "movie asset" such as "poster, trailer, screener" etc. and relate it to "Movie" by associations? Or would it be best to create this as one model and do away with the associations of each asset to "Movie"?
My assumption is to just make it one model, as that removes all the overhead of the FKs and joins needed to retrieve the data related to the movie, but I am looking for opinions/suggestions. Thanks
There can be three types of attributes (columns) for movies:
1) Attributes that have exactly one value and are present in every movie, e.g. title, year, official trailer. Keep them in the movie table.
2) Attributes that have exactly one value but are present in only a few of the movies, e.g. total Academy Awards. Keep them in a separate table, and use a has_one + belongs_to association.
3) Attributes that have multiple values, e.g. trailers. Keep them in a separate table, and use a has_many + belongs_to association.
More suggestions:
For many key-value attributes, it is easier to use one json/yaml column using serialize instead of creating one column for each key.
Do not store images in DB, keep them in file-system or cloud storage.
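As a rough illustration of the single-column suggestion, the sketch below shows the round trip that a serialized column performs: a hash of optional attributes is turned into one JSON string on write and back into a hash on read. The helper names dump_extras and load_extras and the sample keys are made up for this example; in a Rails model, serialize would do this conversion for you.

```ruby
require "json"

# Turn a hash of optional attributes into the string stored in one text column.
def dump_extras(extras)
  JSON.generate(extras)
end

# Turn the stored string back into a hash when the record is read.
def load_extras(stored)
  stored.nil? || stored.empty? ? {} : JSON.parse(stored)
end

# Hypothetical optional attributes for one movie.
stored = dump_extras("academy_awards" => 3, "tagline" => "An example tagline")
puts load_extras(stored)["academy_awards"]
```

The trade-off is that values inside the serialized column are awkward to filter or sort on in SQL, so this fits attributes that are only displayed, not queried.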
You can create your models in whatever order you want, since you have to fill in both models to create an association (such as belongs_to and has_many), so I think it doesn't really matter!

Structure my database

I'm creating an app that will import products from several XML feeds. In the XML there is a category specified, like T-Shirt for instance. The problem is that different resellers specify the categories differently. For instance, what one reseller calls "T-Shirts" another may call "T-Shirt", a third "short sleeved shirts" and so on.
I want to somehow map these categories to the categories I have myself. So I need some tips on how I should structure my database.
The idea I have is to create a "raw_categories" table which contains the name of the resellers category and a "category_id" which has a belongs_to relationship to my own "categories" table. Then when I import I simply try to find a raw_category which has a matching name and if there is one, pick it, otherwise add a new one. This new one I can then manually relate to one of my own categories.
Do you understand what I mean, and is it a good approach? Is there a better/more efficient way?
If this is a good idea, how do I do it in Rails? Should I use something like this (I think I've seen something like this in the API doc):
# products model
has_one :category, :through => :raw_categories
I estimate that there will be about 40k to 100k products in the database.
Yes, that is the typical design. I would usually call your base table category, and call this alias table categoryAlias. I'm picky about verbiage, but raw_categories has nothing to do with "rawness", it's just the categories you want to use.
Considering the amount of data you will have in these categories, one practical suggestion I can offer is that whenever you create a category, you also create a categoryAlias row for it, with the same name.
This will make your import code easier as you will only need to query categoryAlias to determine if a category already exists, or there is an alias for it.
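The import lookup this enables can be sketched in plain Ruby with a hash standing in for the categoryAlias table. The CategoryAliases class and its method names are invented for illustration, and the strip/downcase normalization is an extra assumption that the answer does not mention. Unknown names are recorded as unmapped so they can be assigned to a category by hand later, as the question describes.

```ruby
class CategoryAliases
  def initialize
    @category_for = {} # normalized alias name => category name (nil = unmapped)
  end

  # Creating a category also registers an alias with the same name.
  def add_category(name)
    map(name, name)
  end

  # Points an alias (a reseller's name) at one of our own categories.
  def map(alias_name, category)
    @category_for[normalize(alias_name)] = category
  end

  # Import step: return the mapped category, recording unseen names as unmapped.
  def resolve(raw_name)
    key = normalize(raw_name)
    @category_for[key] = nil unless @category_for.key?(key)
    @category_for[key]
  end

  # Aliases still waiting to be assigned to a category by hand.
  def unmapped
    @category_for.select { |_, category| category.nil? }.keys
  end

  private

  # Assumption: treat names that differ only in case or padding as the same.
  def normalize(name)
    name.strip.downcase
  end
end
```

Since every category has an alias with its own name, resolve only ever consults one lookup structure, which is the simplification the answer is pointing at.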
