Rails query] difference between joins and includes - ruby-on-rails

#teachers = User.joins(:students).where("student_id IS NOT NULL")
The above works, the below doesn't.
#teachers = User.includes(:students).where("student_id IS NOT NULL")
As far as I understand, joins and includes should both bring the same result with different performance. According to this, you use includes to load associated records of the objects called by Model, where joins to simply add two tables together. Using includes can also prevent the N+1 queries.
First question: why does my second line of code not work?
Second question: should anyone always use includes in a case similar to above?

You use joins when you want to query against the joined model. This is doing an inner join between your tables.
Includes is when you want to eager load the associated model to the end result.
This allows you to call the association on any of the results without having to again do the db lookup.
You cannot query against a model that is loaded via includes. If you want to query against it you must use joins( you can do both! )

Related

Optimizing ActiveRecord query when there is if/else logic in select statement

I have a very janky query that I'm trying to optimize. I want to return an activerecord association of user records based on some membership packet logic.
ids = User.joins(:memberships).includes(:memberships).select do |user|
if user.memberships.count <= 1
user.membership_packet_sent_at.blank?
else
user.membership_packet_sent_at.blank? && user.current_membership.created_more_than_30_days_ago?
end
end.uniq.map(&:id)
User.where(id: ids)
The issue I am running into above is that I have to make two queries since I don't know how to use select to return an active record association instead of an array.
How do I return an activerecord association from select?
Is there a better way for me to organize my logic inside the select block?
Edit:
After some refactoring, I was able to move some of the select logic into methods:
```
ids = User.joins(:memberships)
.where(membership_packet_sent_at: nil)
.select(&:current_membership_packet_expired?)
.uniq
.map(&:id)
User.where(id: ids)
```
Still running into the issue of how to return an activerecord association instead
I don't think it is currently possible to directly return a Relation object after applying a select.
These are the main reasons that I could think about:
A Relation object is just a SQL builder and doesn't work with the actual data
The select method allows to execute custom processes that cannot be represented as a valid SQL query, so it won't be possible to chain the result of the select with another query like where
This answer also states a similar line of thought.

Optimizing has many record association query

I have this query that I've built using Enumerable#select. The purpose is to find records thave have no has many associated records or if it does have those records select only those with it's preview attribute set to true. The code below works perfectly for that use case. However, this query does not scale well. When I test against thousands of records it takes several hundred seconds to complete. How can this query be improved upon?
# User has many enrollments
# Enrollment belongs to user.
users_with_no_courses = User.includes(:enrollments).select {|user| user.enrollments.empty? || user.enrollments.where(preview: false).empty?}
So first, make sure enrollments.user_id has an index.
Second, you can speed this up by not loading all the enrollments, and doing your filtering in SQL:
User.where(<<-EOQ)
NOT EXISTS (SELECT 1
FROM enrollments e
WHERE e.user_id = users.id
AND NOT e.preview)
EOQ
By the way here I'm simplifying your two conditions into one: "no enrollments or no real enrollments" is the same as "no real enrollments".
If you want you can put this condition into a scope so it is more reusable.
Third, this is still going to be slow if you're instantiating thousands of User objects. So I would look into paginating if that makes sense, or find_each if this is an offline script. Or use raw SQL to avoid all the object instances.
Oh by the way: even though you are saying includes(:enrollments), this will still go back to the database, giving you an n+1 problem:
user.enrollments.where(preview: false)
That is because the where means ActiveRecord can't use the already-loaded association. You can avoid that by using select instead of where. But not loading the enrollments in the first place is even better.

ActiveRecord 4.2 includes and ordering by count

I'm writing a search for a project I'm working on. It is meant to be able to search the body of articles and produce a list of their authors, ordered by the number of matching articles and including the relevant articles only, not all of their articles.
I currently have the following query:
Author.includes(:articles).where('articles.body ilike ?', '%foo%').references(:articles)
The use of includes in this case makes it so that all the relevant articles (not all articles) are preloaded, that's exactly what I want. However, when it comes to ordering by the number of included articles, I'm not sure how to proceed.
I should note I want to do this in ActiveRecord because pagination will be applied after the query. Not after a Ruby solution.
I should note I'm using PostgreSQL 9.3.
Edit: using raw SQL
This seems to work on its own like so:
Author.includes(:articles).where('articles.body ilike ?', '%foo%').references(:articles).select('authors.*, (SELECT COUNT(0) FROM articles WHERE articles.author_id = authors.id) AS article_count').order('article_count DESC')
This works fine. However, if I add .limit(1) it breaks.
PG::UndefinedColumn: ERROR: column "article_count" does not exist
Any idea why adding limit breaks it? The query seems very different too
SELECT DISTINCT "authors"."id", article_count AS alias_0 FROM "authors" LEFT OUTER JOIN "articles" ON "articles"."author_id" = "authors"."id" WHERE (articles.body ilike '%microsoft%') ORDER BY article_count DESC LIMIT 1
I don't think there's an out of the box solution for this. You have to write raw sql to do this but you can combine it with existing ActiveRecord queries.
Author
.includes(:articles)
.select('authors.*, (SELECT COUNT(0) FROM articles WHERE articles.author_id = authors.id) AS article_count')
.order('article_count DESC')
So the only thing to explain here is the select part. The first part, authors.*, selects all fields under the authors table and this is the default. Since we want to also count the number of articles, we create a subquery and pass its result as one of the pseudo columns of authors (we called it article_count). The last part is to just call order using article_count.
This solution assumes a couple of things which you'll have to fine tune depending on your setup.
Author by convention in rails maps to an authors table. If it is an STI (inherits from a User class and is using users table), you'll need to change authors to users.
articles.author_id assumes that the foreign key is author_id (and essentially, an article is only written by a single author). Change to whatever the foreign key is.
So given that, you'll have an array of authors ordered by the number of articles they've written.

Loading all the data but not from all the tables

I watched this rails cast http://railscasts.com/episodes/22-eager-loading but still I have some confusions about what is the best way of writing an efficient GET REST service for a scenario like this:
Let's say we have an Organization table and there are like twenty other tables that there is a belongs_to and has_many relations between them. (so all those tables have a organization_id field).
Now I want to write a GET and INDEX request in form of a Rails REST service that based on the organization id being passed to the request in URL, it can go and read those tables and fill the JSON BUT NOT for ALL of those table, only for a few of them, for example let's say for a Patients, Orders and Visits table, not all of those twenty tables.
So still I have trouble with getting my head around how to write such a
.find( :all )
sort of query ?
Can someone show some example so I can understand how to do this sort of queries?
You can include all of those tables in one SQL query:
#organization = Organization.includes(:patients, :orders, :visits).find(1)
Now when you do something like:
#organization.patients
It will load the patients in-memory, since it already fetched them in the original query. Without includes, #organization.patients would trigger another database query. This is why it's called "eager loading", because you are loading the patients of the organization before you actually reference them (eagerly), because you know you will need that data later.
You can use includes anytime, whether using all or not. Personally I find it to be more explicit and clear when I chain the includes method onto the model, instead of including it as some sort of hash option (as in the Railscast episode).

Why does ActiveRecord's :include do two queries?

I am just learning ActiveRecord and SQL and I was under the impression that :include does one SQL query. So if I do:
Show.first :include => :artist
It will execute one query and that query is going to return first show and artist. But looking at the SQL generated, I see two queries:
[2013-01-08T09:38:00.455705 #1179] DEBUG -- : Show Load (0.5ms) SELECT `shows`.* FROM `shows` LIMIT 1
[2013-01-08T09:38:00.467123 #1179] DEBUG -- : Artist Load (0.5ms) SELECT `artists`.* FROM `artists` WHERE `artists`.`id` IN (2)
I saw one of the Railscast videos where the author was going over :include vs :join and I saw the output SQL on the console and it was a large SQL query, but it was only one query. I am just wondering if this is how it is supposed to be or am I missing something?
Active Record has two ways in which it loads association up front. :includes will trigger either of those, based on some heuristics.
One way is for there to be one query per association: you first load all the shows (1 query) then you load all artists (2nd query). If you were then including an association on artists that would be a 3rd query. All of these queries are simple queries, although it does mean that no advantage is gained in your specific case. Because the queries are separate, you can't do things like order the top level (shows) by the child associations and thing like that.
The second way is to load everything in one big joins based query. This always produces a single query, but its more complicated - 1 join per association included and the code to turn the result set back into ruby objects is more complicated too. There are some other corner cases: polymorphic belongs_to can't be handled and including multiple has_many at the same level will produce a very large result set).
Active Record will by default use the first strategy (preload), unless it thinks that your query conditions or order are referencing the associations, in which case it falls back to the second approach. You can force the strategy used by using preload or eager_load instead of :includes.
Using :includes is a solution to provide eager loading. It will load at most two queries in your example. If you were to change your query Show.all :include => :artist. This will also call just two queries.
Better explanation: Active Record Querying Eager Loading

Resources