I'm writing a search for a project I'm working on. It is meant to be able to search the body of articles and produce a list of their authors, ordered by the number of matching articles and including the relevant articles only, not all of their articles.
I currently have the following query:
Author.includes(:articles).where('articles.body ilike ?', '%foo%').references(:articles)
Using includes here preloads only the relevant articles (not all articles), which is exactly what I want. However, when it comes to ordering by the number of included articles, I'm not sure how to proceed.
I should note I want to do this in ActiveRecord, not in plain Ruby, because pagination will be applied after the query.
I should note I'm using PostgreSQL 9.3.
Edit: using raw SQL
This seems to work on its own like so:
Author.includes(:articles).where('articles.body ilike ?', '%foo%').references(:articles).select('authors.*, (SELECT COUNT(0) FROM articles WHERE articles.author_id = authors.id) AS article_count').order('article_count DESC')
This works fine. However, if I add .limit(1) it breaks.
PG::UndefinedColumn: ERROR: column "article_count" does not exist
Any idea why adding limit breaks it? The generated query looks very different too:
SELECT DISTINCT "authors"."id", article_count AS alias_0 FROM "authors" LEFT OUTER JOIN "articles" ON "articles"."author_id" = "authors"."id" WHERE (articles.body ilike '%microsoft%') ORDER BY article_count DESC LIMIT 1
I don't think there's an out-of-the-box solution for this. You have to write raw SQL, but you can combine it with existing ActiveRecord queries.
Author
.includes(:articles)
.select('authors.*, (SELECT COUNT(0) FROM articles WHERE articles.author_id = authors.id) AS article_count')
.order('article_count DESC')
So the only thing to explain here is the select part. The first part, authors.*, selects all fields under the authors table and this is the default. Since we want to also count the number of articles, we create a subquery and pass its result as one of the pseudo columns of authors (we called it article_count). The last part is to just call order using article_count.
This solution assumes a couple of things which you'll have to fine tune depending on your setup.
By convention in Rails, Author maps to an authors table. If it uses STI (it inherits from a User class and uses the users table), you'll need to change authors to users.
articles.author_id assumes that the foreign key is author_id (and essentially, an article is only written by a single author). Change to whatever the foreign key is.
So given that, you'll have an array of authors ordered by the number of articles they've written.
Related
Environment: Rails 3.2.22
Question:
Let's say I have the models Topic, Post, and User.
Post belongs to Topic
User has many Posts
I want to query Topic.all, but include only the posts associated with a given user.
I've tried includes and eager_load with a where condition on the user id, but only topics with a post that meets the condition are returned.
What I want is for all topics to be returned, including only the posts that match the user_id condition.
After playing around with ActiveRecord I figured out how to do the query. It requires the left join as pointed out by @pshoukry, but that answer is missing two items:
An AND condition in the join is required to include only posts for a specific user.
The ActiveRecord select method needs to be appended at the end to include the fields you want.
To include all fields:
Topic.joins(Topic.send(:sanitize_sql_array, ["LEFT JOIN posts ON posts.topic_id = topics.id AND posts.user_id = ?", user.id])).select('topics.*, posts.*')
Now for the caveat. For those using Postgres on Rails 3.2.*, there is a bug where columns from the joined table are ALL returned as strings, regardless of their declared data type. This issue is not present in Rails 4. There was an issue posted in the Rails GitHub repo, but I can't seem to locate it. Since 3.2 is no longer supported, they have no intention of fixing it.
Try using left join in your relation
Topic.joins("LEFT JOIN posts ON topics.id = posts.topic_id")
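To see why the user condition belongs in the ON clause rather than in a WHERE clause, here is a minimal plain-Ruby simulation of a left join. It is only a sketch: the TOPICS and POSTS arrays and all method names are hypothetical stand-ins for the tables, not ActiveRecord code.

```ruby
# Hypothetical in-memory rows standing in for the topics and posts tables.
TOPICS = [
  { id: 1, title: "Ruby" },
  { id: 2, title: "Rails" },
  { id: 3, title: "SQL" }
].freeze

POSTS = [
  { id: 10, topic_id: 1, user_id: 7 },
  { id: 11, topic_id: 1, user_id: 8 },
  { id: 12, topic_id: 2, user_id: 8 }
].freeze

# Simulates "topics LEFT JOIN posts ON <block condition>".
# Every topic yields at least one row; unmatched topics get a nil post.
def left_join(topics, posts)
  topics.flat_map do |topic|
    matches = posts.select { |post| yield(topic, post) }
    matches.empty? ? [[topic, nil]] : matches.map { |post| [topic, post] }
  end
end

# User condition inside ON: topics without a matching post are kept.
def user_condition_in_on(user_id)
  left_join(TOPICS, POSTS) do |topic, post|
    post[:topic_id] == topic[:id] && post[:user_id] == user_id
  end
end

# User condition in WHERE: it runs after the join, so rows whose post
# is nil or belongs to another user are filtered out.
def user_condition_in_where(user_id)
  joined = left_join(TOPICS, POSTS) { |topic, post| post[:topic_id] == topic[:id] }
  joined.select { |_topic, post| post && post[:user_id] == user_id }
end
```

With these sample rows, user_condition_in_on(7) still returns a row for every topic, while user_condition_in_where(7) only returns the topic that has a post by user 7, which matches the behaviour described in the question.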
I'm just starting with Rails, and I want to build a query with dynamic selection and dynamic sorting, like the following examples (in SQL):
select * from books join authors on books.author_id = authors.id
where books.title like '%something%'
order by authors.name, books.title
or
select * from books join authors on books.author_id = authors.id
where books.title like '%something%'
order by books.title, authors.name
Author has_many books, book belongs to author.
I code this with two nested loops. In the first case, Author (sorted by name) is read first then Book (sorted by title), in the second case, Book first then author.
I can then print together books fields and authors fields.
The loops must reflect the hierarchy of sort fields.
But many other fields exist, and dynamic selection/ordering may be any field(s).
Is there a way to write a single 'each' loop in which book fields and author fields are available together, as in the SQL examples above?
My problem is to get fields from several tables on one single line.
What would the 'find' request be?
Thanks for your help.
Your basic query would be something like:
@books = Book.joins(:author).where("books.title LIKE ?", "%#{something}%").order("authors.name ASC, books.title ASC")
As for controlling the sorting, you can break that into scopes that get conditionally added depending on your params.
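One way to make the sort order depend on params is sketched below. It is an assumption-laden example, not part of the original answer: the SORTABLE_COLUMNS whitelist, the order_clause helper, and the comma-separated sort parameter are all hypothetical names chosen for illustration.

```ruby
# Hypothetical whitelist mapping request values to SQL columns.
# Only columns listed here can ever reach the ORDER BY clause.
SORTABLE_COLUMNS = {
  "author" => "authors.name",
  "title"  => "books.title"
}.freeze

# Builds an ORDER BY fragment from a comma-separated list of sort keys,
# e.g. "author,title". Unknown keys are ignored; if nothing valid is
# left, the default column is used.
def order_clause(sort_param, default: "books.title")
  keys = sort_param.to_s.split(",").map(&:strip)
  columns = keys.map { |key| SORTABLE_COLUMNS[key] }.compact.uniq
  columns = [default] if columns.empty?
  columns.map { |column| "#{column} ASC" }.join(", ")
end
```

The resulting string could then be passed to the query, for example Book.joins(:author).order(order_clause(params[:sort])), and a single each loop over the books can read both book.title and book.author.name. Mapping through a whitelist keeps user input out of the SQL string.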
@teachers = User.joins(:students).where("student_id IS NOT NULL")
The above works, the below doesn't.
@teachers = User.includes(:students).where("student_id IS NOT NULL")
As far as I understand, joins and includes should both return the same result, just with different performance. According to this, you use includes to load the associated records of the objects returned by the model, whereas joins simply joins the two tables together. Using includes can also prevent N+1 queries.
First question: why does my second line of code not work?
Second question: should one always use includes in a case similar to the above?
You use joins when you want to query against the joined model. This is doing an inner join between your tables.
Includes is when you want to eager load the associated model to the end result.
This allows you to call the association on any of the results without having to again do the db lookup.
You cannot reliably query against a model that is loaded via includes alone, because by default it is fetched in a separate query. If you want to query against it you must use joins (you can do both!).
I am just learning ActiveRecord and SQL and I was under the impression that :include does one SQL query. So if I do:
Show.first :include => :artist
It will execute one query and that query is going to return first show and artist. But looking at the SQL generated, I see two queries:
[2013-01-08T09:38:00.455705 #1179] DEBUG -- : Show Load (0.5ms) SELECT `shows`.* FROM `shows` LIMIT 1
[2013-01-08T09:38:00.467123 #1179] DEBUG -- : Artist Load (0.5ms) SELECT `artists`.* FROM `artists` WHERE `artists`.`id` IN (2)
I saw one of the Railscast videos where the author was going over :include vs :join and I saw the output SQL on the console and it was a large SQL query, but it was only one query. I am just wondering if this is how it is supposed to be or am I missing something?
Active Record has two ways of loading associations up front. :includes will trigger either of them, based on some heuristics.
One way is for there to be one query per association: you first load all the shows (1 query), then you load all the artists (2nd query). If you were then including an association on artists, that would be a 3rd query. All of these are simple queries, although it does mean that no advantage is gained in your specific case. Because the queries are separate, you can't do things like order the top level (shows) by the child associations.
The second way is to load everything in one big join-based query. This always produces a single query, but it's more complicated: one join per included association, and the code to turn the result set back into Ruby objects is more complicated too. There are some other corner cases: a polymorphic belongs_to can't be handled, and including multiple has_many associations at the same level will produce a very large result set.
Active Record will by default use the first strategy (preload), unless it thinks that your query conditions or order are referencing the associations, in which case it falls back to the second approach. You can force the strategy used by using preload or eager_load instead of :includes.
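The difference in query counts between the strategies can be illustrated with a small plain-Ruby simulation. This is only a sketch of the idea, not Active Record's implementation: the SHOWS and ARTISTS arrays, the QueryCounter class, and the method names are all hypothetical.

```ruby
# Hypothetical in-memory "tables"; each counter.tick stands for one SQL query.
SHOWS = [
  { id: 1, artist_id: 2 },
  { id: 2, artist_id: 2 },
  { id: 3, artist_id: 5 }
].freeze

ARTISTS = [
  { id: 2, name: "First Artist" },
  { id: 5, name: "Second Artist" }
].freeze

# Counts how many simulated queries a loading strategy issues.
class QueryCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  def tick
    @count += 1
  end
end

# No eager loading: one query for the shows, then one per show (N+1).
def load_lazily(counter)
  counter.tick # SELECT shows.*
  SHOWS.map do |show|
    counter.tick # SELECT artists.* WHERE id = ?
    show.merge(artist: ARTISTS.find { |artist| artist[:id] == show[:artist_id] })
  end
end

# Preload-style: one query for the shows, one "WHERE id IN (...)" query
# for all referenced artists, then the records are stitched together in Ruby.
def load_with_separate_queries(counter)
  counter.tick # SELECT shows.*
  artist_ids = SHOWS.map { |show| show[:artist_id] }.uniq
  counter.tick # SELECT artists.* WHERE id IN (...)
  artists_by_id = ARTISTS
    .select { |artist| artist_ids.include?(artist[:id]) }
    .map { |artist| [artist[:id], artist] }
    .to_h
  SHOWS.map { |show| show.merge(artist: artists_by_id[show[:artist_id]]) }
end

# eager_load-style: a single joined query returns shows and artists together.
def load_with_join(counter)
  counter.tick # SELECT ... FROM shows LEFT OUTER JOIN artists ...
  SHOWS.map do |show|
    show.merge(artist: ARTISTS.find { |artist| artist[:id] == show[:artist_id] })
  end
end
```

All three methods return the same data; they differ only in how many simulated queries they issue, and the separate-query strategy stays at two queries however many shows are loaded.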
Using :includes is one way to get eager loading. It will run at most two queries in your example. If you were to change your query to Show.all :include => :artist, it would also run just two queries.
Better explanation: Active Record Querying Eager Loading
I am using Rails 3 and postgresql. I have the following genres: rock, ambience, alternative, house.
I also have two users registered. One has rock and the other house, as their genres. I need to return rock and house genre objects.
I found two ways to do this. One is using group:
Genre.group('genres.id, genres.name, genres.cached_slug, genres.created_at, genres.updated_at').joins(:user).all
And the other using DISTINCT:
Genre.select('DISTINCT(genres.name), genres.cached_slug').joins(:user)
Both return the same desired results. But which one is better performance-wise? Using group() looks messy, since I have to list all the fields in the genres table; otherwise I get errors such as:
ActiveRecord::StatementInvalid: PGError: ERROR: column "genres.id" must appear in the GROUP BY clause or be used in an aggregate function
: SELECT genres.id FROM "genres" INNER JOIN "users" ON "users"."genre_id" = "genres"."id" GROUP BY genres.name
A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs.
Since you're not using any aggregate functions, you should use the one that makes the most sense in your situation, which I believe is this one:
Genre.select('DISTINCT(genres.name), genres.cached_slug').joins(:user)
This will be more clear when trying to read your code later and remember what you did here and, as you pointed out, is much less messy.
Update
It depends on the version of PostgreSQL you are using. With versions before 8.4, GROUP BY is faster. With version 8.4 and later, they are the same.
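As a rough illustration of why the two constructs return the same rows, here is a plain-Ruby sketch. The JOINED_ROWS data and both helper methods are hypothetical. Note that in SQL, DISTINCT applies to the whole selected row, so the parentheses in DISTINCT(genres.name) do not limit it to that one column.

```ruby
# Hypothetical rows produced by joining genres to users (one row per user).
JOINED_ROWS = [
  { name: "rock",  cached_slug: "rock" },
  { name: "rock",  cached_slug: "rock" },
  { name: "house", cached_slug: "house" }
].freeze

# SELECT DISTINCT name, cached_slug: duplicates are removed across the
# whole selected row, not just the first column.
def select_distinct(rows, columns)
  rows.map { |row| row.values_at(*columns) }.uniq
end

# SELECT name, cached_slug ... GROUP BY name, cached_slug:
# one output row per distinct group key.
def select_group_by(rows, columns)
  rows.group_by { |row| row.values_at(*columns) }.keys
end
```

Without aggregate functions, both helpers reduce the joined rows to the same set of (name, cached_slug) pairs, which is why the choice comes down to readability.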