Can you temporarily disable mongoid relationships to roll your own queries - ruby-on-rails

I am trying to manually build a series of queries to get around Mongo's lack of joins and Mongoid's lack of eager loading. Suppose I have 2 classes:
class A
include Mongoid::Document
has_many :bs
...
class B
include Mongoid::Document
belongs_to :a
...
If I run a query on bs:
bs = B.where(...)
The result is a Mongoid::Criteria.
If I try to get the first b by calling bs.first, however, it immediately fires a Mongo query for the a association. This is exactly what I'm trying to avoid (if I have 1,000 b's, I'm trying to avoid 1,000 singleton a queries).
This is fine, but when I have complex relationships, I want to work around the lack of eager loading by manually specifying the models myself, collecting ids, and then only returning the core model, without the associations.
Is there anything that will let me do this? Something like:
bs = B.where(...).disable_automatic_association_queries
Does such a thing exist?

The method is .without. For example:
A.where(...).without(:bs)
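The manual approach described in the question (collect the foreign keys, fetch the parents in one batch, then match them up in memory) can be sketched in plain Ruby. FakeCollection is a hypothetical in-memory stand-in for a Mongo collection that only counts lookups; it is not Mongoid's API, and the hash layout is an assumption made for illustration.

```ruby
# Hypothetical in-memory stand-in for a collection; it only counts queries.
class FakeCollection
  attr_reader :query_count

  def initialize(docs)
    @docs = docs
    @query_count = 0
  end

  # One query for many ids, comparable to a single `$in` lookup on the id.
  def find_all(ids)
    @query_count += 1
    @docs.select { |doc| ids.include?(doc[:id]) }
  end
end

parents = FakeCollection.new([{ id: 1, name: "a1" }, { id: 2, name: "a2" }])
bs = [{ id: 10, a_id: 1 }, { id: 11, a_id: 2 }, { id: 12, a_id: 1 }]

# Collect the distinct parent ids, fetch them once, and index them by id.
a_ids   = bs.map { |b| b[:a_id] }.uniq
a_by_id = parents.find_all(a_ids).map { |a| [a[:id], a] }.to_h

# Resolve each b's parent from the in-memory index, with no further queries.
pairs = bs.map { |b| [b[:id], a_by_id[b[:a_id]][:name]] }
```

However many b's there are, the parent lookup stays at one query, which is the effect the question is after.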

Related

Query Optimization with ActiveRecord for each method

The query below is taking too much time, and I can't work out how to optimize it.
Code and Associations :
temp = []
platforms = current_user.company.advisory_platforms
platforms.each{ |x| temp << x.advisories.published.collect(&:id) }
class Advisory
has_many :advisory_platforms,:through =>:advisory_advisory_platforms
end
class AdvisoryPlatform
has_many :companies,:through => :company_advisory_platforms
has_many :company_advisory_platforms,:dependent => :destroy
has_many :advisory_advisory_platforms,:dependent => :destroy
has_many :advisories, :through => :advisory_advisory_platforms
end
There are three glaring performance issues in your example.
First, you are iterating the records using each which means that you are loading the entire record set into memory at once. If you must iterate records in this way you should always use find_each so it is done in batches.
Second, every iteration of your each loop is performing an additional SQL call to get its results. You want to limit SQL calls to the bare minimum.
Third, you are instantiating entire Rails models simply to collect a single value, which is very wasteful. Instantiating Rails models is expensive.
I'm going to solve these problems in two ways. First, construct an ActiveRecord relation that will access all the data you need in one query. Second, use pluck to grab the id you need without paying the model instantiation cost.
You didn't specify what published is doing so I am going to assume it is a scope on Advisory. You also left out some of the data model so I am going to have to make assumptions about your join models.
advisory_ids = AdvisoryAdvisoryPlatform
.where(advisory_platform_id: current_user.company.advisory_platforms)
.where(advisory_id: Advisory.published)
.pluck(:advisory_id)
If you pass a Relation object as the value of a field, ActiveRecord will convert it into a subquery.
So
where(advisory_id: Advisory.published)
is analogous to
WHERE advisory_id IN (SELECT id FROM advisories WHERE published = true)
(or whatever it is published is doing).
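To make the filtering concrete, here is a plain-Ruby sketch of what that single query computes, using arrays of hashes as hypothetical stand-ins for the tables. The column names and the idea that published is a boolean flag are assumptions, as noted above.

```ruby
require "set"

# Hypothetical rows standing in for the tables involved.
advisories = [
  { id: 1, published: true },
  { id: 2, published: false },
  { id: 3, published: true }
]
join_rows = [
  { advisory_platform_id: 7, advisory_id: 1 },
  { advisory_platform_id: 7, advisory_id: 2 },
  { advisory_platform_id: 8, advisory_id: 3 },
  { advisory_platform_id: 9, advisory_id: 3 }
]
company_platform_ids = Set[7, 8] # platforms belonging to the user's company

# Counterpart of the subquery: the ids of all published advisories.
published_ids = advisories.select { |a| a[:published] }.map { |a| a[:id] }.to_set

# Counterpart of the two where clauses followed by pluck(:advisory_id).
advisory_ids = join_rows
  .select { |row| company_platform_ids.include?(row[:advisory_platform_id]) }
  .select { |row| published_ids.include?(row[:advisory_id]) }
  .map { |row| row[:advisory_id] }
```

In the real query the database does all of this in one round trip and only the ids come back, so no models are instantiated.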

Can I have a one way HABTM relationship?

Say I have the model Item which has one Foo and many Bars.
Foo and Bar can be used as parameters when searching for Items and so Items can be searched like so:
www.example.com/search?foo=foovalue&bar[]=barvalue1&bar[]=barvalue2
I need to generate a Query object that is able to save these search parameters. I need the following relationships:
Query needs to access one Foo and many Bars.
One Foo can be accessed by many different Queries.
One Bar can be accessed by many different Queries.
Neither Bar nor Foo need to know anything about Query.
I have this relationship set up currently like so:
class Query < ActiveRecord::Base
belongs_to :foo
has_and_belongs_to_many :bars
...
end
Query also has a method which returns a hash like this: { foo: 'foovalue', bars: [ 'barvalue1', 'barvalue2' ] } which easily allows me to pass these values into a url helper and generate the search query.
This all works fine.
My question is whether this is the best way to set up this relationship. I haven't seen any other examples of one-way HABTM relationships so I think I may be doing something wrong here.
Is this an acceptable use of HABTM?
Functionally yes, but semantically no. Using HABTM in a "one-sided" fashion will achieve exactly what you want. The name HABTM does unfortunately insinuate a reciprocal relationship that isn't always the case. Similarly, belongs_to :foo makes little intuitive sense here.
Don't get caught up in the semantics of HABTM and the other associations; instead, just consider where your IDs need to sit in order to query the data appropriately and efficiently. Remember, efficiency considerations should above all account for your productivity.
I'll take the liberty to create a more concrete example than your foos and bars... say we have an engine that allows us to query whether certain ducks are present in a given pond, and we want to keep track of these queries.
Possibilities
You have three choices for storing the ducks in your Query records:
Join table
Native array of duck ids
Serialized array of duck ids
You've answered the join table use case yourself, and if it's true that "neither [Duck] nor [Pond] need to know anything about Query", using one-sided associations should cause you no problems. All you need to do is create a ducks_queries table and ActiveRecord will provide the rest. You could even opt to use has_many :through relationship if you need to do anything fancy.
At times arrays are more convenient than using join tables. You could store the data as a serialized integer array and add handlers for accessing the data similar to the following:
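class Query < ActiveRecord::Base
  # duck_ids is stored as a serialized array in a text column
  serialize :duck_ids
  # look up the ducks referenced by the stored ids
  def ducks
    transaction do
      Duck.where(id: duck_ids)
    end
  end
end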
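For anyone unsure what the serialized column is doing, here is a plain-Ruby sketch of the same idea without ActiveRecord. StoredQuery and its raw_duck_ids attribute are hypothetical names, and JSON is used here purely as a stand-in for whatever coder serialize is configured with (typically YAML).

```ruby
require "json"

# Hypothetical stand-in for a Query row whose duck_ids column is plain text.
class StoredQuery
  attr_reader :raw_duck_ids # what would be persisted in the text column

  def initialize(duck_ids)
    self.duck_ids = duck_ids
  end

  # Serialize on write, coercing every id to an integer first.
  def duck_ids=(ids)
    @raw_duck_ids = JSON.generate(ids.map { |id| Integer(id) })
  end

  # Deserialize on read.
  def duck_ids
    JSON.parse(@raw_duck_ids)
  end

  # Select the referenced ducks, as Duck.where(id: duck_ids) would.
  def ducks(all_ducks)
    all_ducks.select { |duck| duck_ids.include?(duck[:id]) }
  end
end

all_ducks = [
  { id: 1, name: "Mallard" },
  { id: 2, name: "Teal" },
  { id: 3, name: "Pintail" }
]
query = StoredQuery.new(["1", 3])
duck_names = query.ducks(all_ducks).map { |duck| duck[:name] }
```

The database only ever sees an opaque string, which is why you lose the ability to filter on individual duck ids in SQL.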
If you have native array support in your database, you can do the same from within your DB.
With Postgres' native array support, you could make a query as follows:
SELECT * FROM ducks WHERE id=ANY(
(SELECT duck_ids FROM queries WHERE id=1 LIMIT 1)::int[]
)
You can play with the above example on SQL Fiddle
Trade Offs
Join table:
Pros: Convention over configuration; You get all the Rails goodies (e.g. query.bars, query.bars=, query.bars.where()) out of the box
Cons: You've added complexity to your data layer (i.e. another table, more dense queries); makes little intuitive sense
Native array:
Pros: Semantically nice; you get all the DB's array-related goodies out of the box; potentially more performant
Cons: You'll have to roll your own Ruby/SQL or use an ActiveRecord extension such as postgres_ext; not DB agnostic; goodbye Rails goodies
Serialized array:
Pros: Semantically nice; DB agnostic
Cons: You'll have to roll your own Ruby; you'll lose the ability to make certain queries directly through your DB; serialization is icky; goodbye Rails goodies
At the end of the day, your use case makes all the difference. That aside, I'd say you should stick with your "one-sided" HABTM implementation: you'll lose a lot of Rails-given gifts otherwise.

Rails ActiveRecord helper find method not eager loading association

I have the following models: Game and Pick. There's a one to many association between Game and Pick. There's a third model called Player, a Player has many Picks.
There's a method in the Player class that finds a pick for a given game or creates a new one if it doesn't exist.
class Player < ActiveRecord::Base
has_many :picks
def pick_for_game(game)
game_id = game.instance_of?(Game) ? game.id : game
picks.find_or_initialize_by_game_id(game_id)
end
end
I want to eager load the games for each pick. However if I do
picks.find_or_initialize_by_game_id(game_id, :include => :game)
It first fetches the picks when this query is run (the method is run multiple times), then fetches the games as each pick is accessed. If I add a default_scope to the Pick class
class Pick < ActiveRecord::Base
belongs_to :game
belongs_to :player
default_scope :include => :game
end
It still generates 2 select statements for each pick, but now it loads the game right after the pick, but it still doesn't do a join like I'm expecting.
Pick Load (0.2ms) SELECT "picks".* FROM "picks" WHERE "picks"."game_id" = 1 AND ("picks".player_id = 1) LIMIT 1
Game Load (0.4ms) SELECT "games".* FROM "games" WHERE ("games"."id" = 1)
First, find doesn't support having include or join as a parameter. (As mipsy said, it doesn't make sense for find to support include as it would be the same number of queries as loading it later.)
Second, include eagerly loads the association, so something like
Person.includes(:company)
is roughly equivalent to doing:
Person.all.each { |person| Company.find(person.company_id) }
I say roughly equivalent to because the former has O(1) (really two) queries whereas the latter is O(n) queries, where n is the number of people.
A join, however, would be just one query, but the downside of a join is you can't always use the retrieved data to update the model. To do a join you would do:
Person.joins(:company)
You can read more on joining tables in the Rails Guide.
To sum up, joining isn't eagerly loading because it's not loading the association, it's loading both pieces of data together at once. I realize there's a weird fine line between the two, but eagerly loading is getting other data preemptively, but you wouldn't be getting that data later via a join, or you'd have already gotten it in your original query! Hope that makes sense.
This is the way it's meant to work, I think. Eager loading is primarily used to make iterations over large collections of models more efficient by fetching them all at once; it won't make any difference if you're just dealing with a single object.

A database design for variable column names

I have a situation that involves Companies, Projects, and Employees who write Reports on Projects.
A Company owns many projects, many reports, and many employees.
One report is written by one employee for one of the company's projects.
Companies each want different things in a report. Let's say one company wants to know about project performance and speed, while another wants to know about cost-effectiveness. There are 5-15 criteria, set differently by each company, which ALL apply to all of that company's project reports.
I was thinking about different ways to do this, but my current stalemate is this:
To company table, add text field criteria, which contains an array of the criteria desired in order.
In the report table, have a company_id and columns criterion1, criterion2, etc.
I am completely aware that this is typically considered horrible database design - inelegant and inflexible. So, I need your help! How can I build this better?
Conclusion
I decided to go with the serialized option in my case, for these reasons:
My requirements for the criteria are simple - no searching or sorting will be required of the reports once they are submitted by each employee.
I wanted to minimize database load - where these are going to be implemented, there is already a large page with overhead.
I want to avoid complicating my database structure for what I believe is a relatively simple need.
CouchDB and Mongo are not currently in my repertoire so I'll save them for a more needy day.
This would be a great opportunity to use NoSQL! Seems like the textbook use-case to me. So head over to CouchDB or Mongo and start hacking.
With conventional DBs you are slightly caught in the problem of how much to normalize your data:
A sort of "good" way (meaning very normalized) would look something like this:
class Company < AR::Base
has_many :reports
has_many :criteria
end
class Report < AR::Base
belongs_to :company
has_many :criteria_values
has_many :criteria, :through => :criteria_values
end
class Criteria < AR::Base # should be Criterion but whatever
belongs_to :company
has_many :criteria_values
# one attribute 'name' (or 'type' and you can mess with STI)
end
class CriteriaValue < AR::Base # singular, so has_many :criteria_values can find it
belongs_to :report
belongs_to :criteria
# one attribute 'value'
end
This makes something very simple and fast in NoSQL a triple or quadruple join in SQL and you have many models that pretty much do nothing.
Another way is to denormalize:
class Company < AR::Base
has_many :reports
serialize :criteria
end
class Report < AR::Base
belongs_to :company
serialize :criteria_values
def criteria
self.company.criteria
end
# custom code here to validate that criteria_values correspond to criteria etc.
end
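The "custom code here" comment could be filled in along these lines. This is only a sketch in plain Ruby, assuming the company's criteria are stored as an array of names and a report's criteria_values as a hash keyed by those names; criteria_errors is a hypothetical helper, not something Rails provides.

```ruby
# Hypothetical check: a report's values must cover exactly the company's criteria.
def criteria_errors(criteria, criteria_values)
  errors = []
  missing = criteria - criteria_values.keys
  unknown = criteria_values.keys - criteria
  errors << "missing values for: #{missing.join(', ')}" unless missing.empty?
  errors << "unknown criteria: #{unknown.join(', ')}" unless unknown.empty?
  errors
end

company_criteria = ["performance", "speed"]
ok_errors  = criteria_errors(company_criteria, { "performance" => 4, "speed" => 5 })
bad_errors = criteria_errors(company_criteria, { "performance" => 4, "cost" => 2 })
```

In a model you would presumably call something like this from a custom validation and add each message to the record's errors.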
Related to that, a rather clever way of serializing at least the criteria (and maybe the values, if they were all boolean) is to use bit fields. This gives you more or less easy migrations (hard to delete and modify, but easy to add) and searchability without any overhead.
A good plugin that implements this is Flag Shih Tzu which I've used on a few projects and could recommend.
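To show the bit field idea itself, here is a minimal plain-Ruby sketch. It is not Flag Shih Tzu's API; the CRITERIA list and the encode_criteria and decode_criteria helpers are hypothetical names made up for illustration.

```ruby
# Hypothetical criteria list; the position in the array is the bit position,
# so new criteria may only be appended, never reordered or removed.
CRITERIA = %w[performance speed cost_effectiveness].freeze

# Pack a list of criterion names into one integer.
def encode_criteria(names)
  names.reduce(0) do |bits, name|
    index = CRITERIA.index(name)
    raise ArgumentError, "unknown criterion: #{name}" if index.nil?
    bits | (1 << index)
  end
end

# Unpack the integer back into criterion names.
def decode_criteria(bits)
  CRITERIA.each_with_index.select { |_, i| bits[i] == 1 }.map(&:first)
end

mask = encode_criteria(%w[performance cost_effectiveness])
names = decode_criteria(mask)
wants_speed = (mask & (1 << CRITERIA.index("speed"))) != 0
```

Because the whole set lives in one integer column, checking for a criterion is a single bitwise AND, which is where the searchability comes from.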
Variable columns (eg. crit1, crit2, etc.).
I'd strongly advise against it. You don't get much benefit (it's still not very searchable since you don't know in which column your info is) and it leads to maintainability nightmares. Imagine your db gets to a few million records and suddenly someone needs 16 criteria. What could have been a complete no-issue is suddenly a migration that adds a completely useless field to millions of records.
Another problem is that a lot of the ActiveRecord magic doesn't work with this. You'll have to figure out what crit1 means by yourself, and if you want to add validations on these fields, that adds a lot of pointless work.
So to summarize: Have a look at Mongo or CouchDB and if that seems impractical, go ahead and save your stuff serialized. If you need to do complex validation and don't care too much about DB load then normalize away and take option 1.
Well, when you say "To company table, add text field criteria, which contains an array of the criteria desired in order" that smells like the company table wants to be normalized: you might break out each criterion in one of 15 columns called "criterion1", ..., "criterion15" where any or all columns can default to null.
To me, you are on the right track with your report table. Each row in that table might represent one report; and might have corresponding columns "criterion1",...,"criterion15", as you say, where each cell says how well the company did on that column's criterion. There will be multiple reports per company, so you'll need a date (or report-number or similar) column in the report table. Then the date plus the company id can be a composite key; and the company id can be a non-unique index. As can the report date/number/some-identifier. And don't forget a column for the reporting-employee id.
Any and every criterion column in the report table can be null, meaning (maybe) that the employee did not report on this criterion; or that this criterion (column) did not apply in this report (row).
It seems like that would work fine. I don't see that you ever need to do a join. It looks perfectly straightforward, at least to these naive and ignorant eyes.
Create a criteria table that lists the criteria for each company (company 1 .. * criteria).
Then, create a report_criteria table (report 1 .. * report_criteria) that lists the criteria for that specific report based on the criteria table (criteria 1 .. * report_criteria).

ActiveRecord has_n association

I was wondering what the best way is to model a relationship where an object is associated with exactly n objects of another class. I want to extend the has_one relationship to a specific value of n.
For example, a TopFiveMoviesList would belong to user and have exactly five movies. I would imagine that the underlying sql table would have fields like movie_id_1, movie_id_2, ... movie_id_5.
I know I could do a has_many relationship and limit the number of children at the model level, but I'd rather not have an intermediary table.
I think implementing this model through a join model is going to be your best bet here. It allows the List model to worry about List logic and the Movie model to worry about Movie logic. You can create a Nomination (name isn't the greatest, but you know what I mean) model to handle the relationship between movies and lists, and when there's a limit of 5, you could just limit the number of nominations you pull back.
There are a few reasons I think this approach is better.
First, assuming you want to be able to traverse the relationships both ways (movie.lists and list.movies), the 5 column approach is going to be much messier.
While it'd be so much better for ActiveRecord to support has n relationships, it doesn't, and so you'll be fighting the framework on that one. Also, the has n relationship seems a bit brittle to me in this situation. I haven't seen that kind of implementation pulled off in ActiveRecord, though I'd be really interested in seeing it happen. :)
My first instinct would be to use a join table, but if that's not desirable User.movie[1-5]_id columns would fit the bill. (I think movie1_id fits better with Rails convention than movie_id_1.)
Since you tagged this Rails and ActiveRecord, I'll add some completely untested and probably somewhat wrong model code to my answer. :)
class User < ActiveRecord::Base
TOP_N_MOVIES = 5
(1..TOP_N_MOVIES).each { |n| belongs_to "movie#{n}".to_sym, :class_name => "Movie" }
end
You could wrap that line in a macro-style method, but unless that's a common pattern for your application, doing that will probably just make your code that much harder to read with little DRY benefit.
You might also want to add validations to ensure that there are no duplicate movies on a user's list.
Associating your movie class back to your users is similar.
class Movie < ActiveRecord::Base
(1..User::TOP_N_MOVIES).each do |n|
has_many "users_list_as_top_#{n}".to_sym, :class_name => "User", :foreign_key => "movie#{n}_id"
end
def users_list_as_top_anything
ary = []
(1..User::TOP_N_MOVIES).each {|n| ary += self.send("users_list_as_top_#{n}") }
return ary
end
end
(Of course that users_list_as_top_anything would probably be better written out as explicit SQL. I'm lazy today.)
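The duplicate check suggested above could look something like this in plain Ruby. It is equally untested against a real model; top_list_errors is a hypothetical helper that takes the list of movie ids, however you end up storing them.

```ruby
TOP_N = 5 # mirrors TOP_N_MOVIES above

# Hypothetical check for a list of movie ids: exactly TOP_N entries, no repeats.
def top_list_errors(movie_ids)
  errors = []
  errors << "must contain exactly #{TOP_N} movies" unless movie_ids.size == TOP_N
  duplicates = movie_ids.select { |id| movie_ids.count(id) > 1 }.uniq
  errors << "duplicate movies: #{duplicates.join(', ')}" unless duplicates.empty?
  errors
end

valid_errors   = top_list_errors([1, 2, 3, 4, 5])
invalid_errors = top_list_errors([1, 2, 2, 4])
```

The same check works whether the ids come from five columns or from a join model, so it doesn't commit you to either design.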
I assume you mean "implement" rather than "model"? The modeling's pretty easy in UML, say, where you have a Person entity that is made up of 5 Movie entities.
But the difficulty comes when you say has_one, going to has_5. If it's a simple scalar value, has_one is perhaps a property on the parent entity. Has_5 is probably 2 entities related to one another through an "is made up of" relationship in UML.
The main question to answer is probably, "Can you guarantee that it will always be 'Top 5'?" If yes, model it with columns, as you mentioned. If no, model it with another entity.
Another question is perhaps, "How easy will it be to refactor?" If it's simple, heck, start with 5 columns and refactor to separate entities if it ever changes.
As usual, "best" is dependent on the business and technical environment.
