I have the following example query:
source = "(SELECT DISTINCT source.* FROM (SELECT * FROM items) AS source) AS items"
items = Item.select("items.*").from(source).includes([:images])
p items # [#<Item id: 1>, #<Item id:2>]
However running:
p items.count
Results in NoMethodError: undefined methodmap' for Arel::Nodes::SqlLiteral`
I appreciate the query is silly, however the non-simplifieid query is a bit too complicated to copy and this was the smallest crashing version I could create. Any ideas?
Can you call all on that object to essentially cast it to an Array?
Item.select("items.*").from(source).includes([:images]).all.count
Or perhaps in that case, size would be more appropriate. In any case, this will execute the query and load all the objects into memory, which may not be desirable.
It looks like the problem is with your includes([:images]). On a similar application, I can execute this from the console:
> Category.select('categories.*').from('(SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories').count
(0.5ms) SELECT COUNT(*) FROM (SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories
(Notice that the count overrides the SELECT clause, even though I explicitly specified items.*. But they're still equivalent queries.)
As soon as I add an includes scope, it fails:
> Category.select('categories.*').from('(SELECT DISTINCT source.* FROM (SELECT * FROM categories) AS source) AS categories').includes(:projects).count
NoMethodError: undefined method `left' for #<Arel::Nodes::SqlLiteral:0x131d35248>
I tried a few different means of acquiring the count, like select('COUNT(categories.*)'), but they all failed in various ways. ActiveRecord seems to be falling back on a basic LEFT OUTER JOIN to perform the eager loading, possibly because it thinks you're using some kind of condition or external table to perform the join, and this seems to confuse its normal methods of performing the count. See the end of the section on Eager Loading in the ActiveRecord::Associations docs.
My Suggestion
If the join doesn't affect the number of rows returned in the outer query, I'd say your best bet is to execute one query to get the count and one query to get the actual results. We have to do something similar in our application for paging: one query returns the current page of results, and one returns the total number of records matching the filter criteria.
The issue is Rails #24193 https://github.com/rails/rails/issues/24193 and has to do with from combined with eager loading. The workaround is to use the form: Item.select("items.*").from([Arel.sql(source)]).includes([:images])
Related
Currently I have a controller query which fetches products & product updates as follows:
products = Product.left_outer_joins(:productupdates).select("products.*, count(productupdates.*) as update_count, max(productupdates.old_price) as highest_price").group(:id)
products = products.paginate(:page => params[:page], :per_page => 20)
This query creates N+1 query but I can not put .include(:productsupdates) since I have a left out join as well.
If possible, can you please help me how to reduce N+1 queries?
EDIT------------------------------
As per Vishal's suggestion; I have changed the controller query as follows,
products = product.includes(:productupdates).select("products.*, count(productupdates.*) as productupdate_count, max(productupdates.old_price) as highest_price").group("productupdates.product_id")
products = products.paginate(:page => params[:page], :per_page => 20)
Unfortunately, I receive the following error:
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: missing FROM-clause entry for table "productupdates"
LINE 1: SELECT products.*, count(productupdates.*) as productupdate_count, m...
^
: SELECT products.*, count(productupdates.*) as productupdate_count, max(productupdates.old_price) as highest_price FROM "products" WHERE "products"."isopen" = $1 AND (products.year > 2009) AND ("products"."make" IS NOT NULL) GROUP BY productupdates.product_id LIMIT $2 OFFSET $3):
Please advise how this is causing N+1 and how you think this will solve the issue. The only way I can see an N+1 situation here is if you are then calling productupdates on each product later. If this is the case then this will not solve the issue. Please advise so others can formulate appropriate responses
For the time being I am going to assume that somewhere later in the code you are calling productupdates on the individual products. If this is the case then we can solve this without the aggregation as follows
#products = Product.eager_load(:productupdates)
Now when we loop the productupdates are already loaded so to get the count and the max we can do things like
#products.each do |p|
# COUNT
# (don't use the count method or it will execute a query )
p.productupdates.size
# MAX old_price
# older ruby versions use rails `try` instead
# e.g. p.productupdates.max_by(&:old_price).try(:old_price) || 0
p.productupdates.max_by(&:old_price)&.old_price || 0
end
Using these methods will not execute additional queries since the productupdates are already loaded
Side note: The reason includes did not work for you is that includes will use 2 queries to retrieve the data (sudo outer join) unless one of the following conditions is met:
The where clause uses a hash finder condition that references the association table (e.g. where(productupdates: {old_price: 12}))
You include the references method (e.g. Product.includes(:productupdates).references(:productupdates))
In both theses cases the table will be left joined. I chose to use eager load in this case as includes delegates to eager_load in the above cases anyway
You can directly do Product.includes(:productupdates) this will query the database with left outer join as well as it will overcome the N+1 query problem.
So instead of Product.left_outer_joins(:productupdates) in your query use Product.includes(:productupdates)
after firing this query in the console you can see that includes fires left outer join query on the table
I want to be able to limit the activerecord objects to 20 being returned, then perform a where() that returns a subset of the limited objects which I currently know only 10 will fulfil the second columns criteria.
e.g. of ideal behaviour:
o = Object.limit(20)
o.where(column: criteria).count
=> 10
But instead, activerecord still looks for 20 objects that fulfil the where() criteria, but looks outside of the original 20 objects that the limit() would have returned on its own.
How can I get the desired response?
One way to decrease the search space is to use a nested query. You should search the first N records rather than all records which match a specific condition. In SQL this would be done like this:
select * from (select * from table order by ORDERING_FIELD limit 20) where column = value;
The query above will only search for the condition in 20 rows from the table. Notice how I have added a ORDERING_FIELD, this is required because each query could give you a different order each time you run it.
To do something similar in Rails, you could try the following:
Object.where(id: Object.order(:id).limit(20).select(:id)).where(column: criteria)
This will execute a query similar to the following:
SELECT [objects].* FROM [objects] WHERE [objects].[id] IN (SELECT TOP (20) [objects].[id] FROM [objects] ORDER BY [objects].id ASC) AND [objects].[column] = criteria
So here's the lay of the land:
I have a Applicant model which has_many Lead records.
I need to group leads by applicant email, i.e. for each specific applicant email (there may be 2+ applicant records with the email) i need to get a combined list of leads.
I already have this working using an in-memory / N+1 solution
I want to do this in a single query, if possible. Right now I'm running one for each lead which is maxing out the CPU.
Here's my attempt right now:
Lead.
all.
select("leads.*, applicants.*").
joins(:applicant).
group("applicants.email").
having("count(*) > 1").
limit(1).
to_a
And the error:
Lead Load (1.2ms) SELECT leads.*, applicants.* FROM "leads" INNER
JOIN "applicants" ON "applicants"."id" = "leads"."applicant_id"
GROUP BY applicants.email HAVING count(*) > 1 LIMIT 1
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column
"leads.id" must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT leads.*, applicants.* FROM "leads" INNER JOIN
"appli...
This is a postgres specific issue. "the selected fields must appear in the GROUP BY clause".
must appear in the GROUP BY clause or be used in an aggregate function
You can try this
Lead.joins(:applicant)
.select('leads.*, applicants.email')
.group_by('applicants.email, leads.id, ...')
You will need to list all the fields in leads table in the group by clause (or all the fields that you are selecting).
I would just get all the records and do the grouping in memory. If you have a lot of records, I would paginate them or batch them.
group_by_email = Hash.new { |h, k| h[k] = [] }
Applicant.eager_load(:leads).each_batch(10_000) do |batch|
batch.each do |applicant|
group_by_email[:applicant.email] << applicant.leads
end
end
You need to use a .where rather than using Lead.all. The reason it is maxing out the CPU is you are trying to load every lead into memory at once. That said I guess I am still missing what you actually want back from the query so it would be tough for me to help you write the query. Can you give more info about your associations and the expected result of the query?
I have the following query returning duplicate titles, but :id is nil:
Movie.select(:title).group(:title).having("count(*) > 1")
[#<Movie:0x007f81f7111c20 id: nil, title: "Fargo">,
#<Movie:0x007f81f7111ab8 id: nil, title: "Children of Men">,
#<Movie:0x007f81f7111950 id: nil, title: "The Martian">,
#<Movie:0x007f81f71117e8 id: nil, title: "Gravity">]
I tried adding :id to the select and group but it returns an empty array. How can I return the whole movie record, not just the titles?
A SQL-y Way
First, let's just solve the problem in SQL, so that the Rails-specific syntax doesn't trick us.
This SO question is a pretty clear parallel: Finding duplicate values in a SQL Table
The answer from KM (second from the top, non-checkmarked, at the moment) meets your criteria of returning all duplicated records along with their IDs. I've modified KM's SQL to match your table...
SELECT
m.id, m.title
FROM
movies m
INNER JOIN (
SELECT
title, COUNT(*) AS CountOf
FROM
movies
GROUP BY
title
HAVING COUNT(*)>1
) dupes
ON
m.title=dupes.title
The portion inside the INNER JOIN ( ) is essentially what you've generated already. A grouped table of duplicated titles and counts. The trick is JOINing it to the unmodified movies table, which will exclude any movies that don't have matches in the query of dupes.
Why is this so hard to generate in Rails? The trickiest part is that, because we're JOINing movies to movies, we have to create table aliases (m and dupes in my query above).
Sadly, it Rails doesn't provide any clean ways of declaring these aliases. Some references:
Rails GitHub issues mentioning "join" and "alias". Misery.
SO Question: ActiveRecord query with alias'd table names
Fortunately, since we've got the SQL in-hand, we can use the .find_by_sql method...
Movie.find_by_sql("SELECT m.id, m.title FROM movies m INNER JOIN (SELECT title, COUNT(*) FROM movies GROUP BY title HAVING COUNT(*)>1) dupes ON m.first=.first")
Because we're calling Movie.find_by_sql, ActiveRecord assumes our hand-written SQL can be bundled into Movie objects. It doesn't massage or generate anything, which lets us do our aliases.
This approach has its shortcomings. It returns an array and not an ActiveRecord Relation, which means it can't be chained with other scopes. And, in the documentation for the find_by_sql method, we get extra discouragement...
This should be a last resort because using, for example, MySQL specific terms will lock you to using that particular database engine or require you to change your call if you switch engines.
A Rails-y Way
Really, what is the SQL doing above? It's getting a list of names that appear more than once. Then, it's matching that list against the original table. So, let's just do that using Rails.
titles_with_multiple = Movie.group(:title).having("count(title) > 1").count.keys
Movie.where(title: titles_with_multiple)
We call .keys because the first query returns an hash. The keys are our titles. The where() method can take an array, and we've handed it an array of titles. Winner.
You could argue one line of Ruby is more elegant than two. And if that one line of Ruby has an ungodly string of SQL embedded within it, how elegant is it really?
Hope this helps!
You can try to add id in your select:
Movie.select([:id, :title]).group(:title).having("count(title) > 1")
I have two models, Monkey and Session, where Monkey has_many Session. I have a scope for Monkey:
scope :with_session_counts, -> {
joins("LEFT OUTER JOIN `sessions` ON `sessions`.`monkey_id` = `monkeys`.`id`")
.group(:id)
.select("`monkeys`.*, COUNT(DISTINCT `sessions`.`id`) as session_count")
}
in order to grab the number of associated Sessions (even when 0).
Querying #monkeys = Monkey.with_session_counts works as expected. However, when I test in my view:
<% unless #monkeys.empty?%>
I get this error:
Mysql2::Error: Column 'id' in field list is ambiguous:
SELECT COUNT(*) AS count_all, id AS id FROM `monkeys`
LEFT OUTER JOIN `sessions` ON `sessions`.`monkey_id` = `monkeys`.`id`
GROUP BY `monkeys`.`id`
How would I convince Rails to prefix id with the table name in presence of the JOIN?
Or is there a better alternative for the OUTER JOIN?
This applies equally to calling #monkeys.count(:all). I'm using RoR 4.2.1.
Update:
I have a partial fix for my issue (specify group("monkeys.id") explicitly) I wonder whether this is a bug in the code that generates the SELECT clause for count(:all). Note that in both cases (group("monkeys.id") and group(:id)) the GROUP BY part is generated correctly (i.e. with monkeys.id), but in the latter case the SELECT only contains id AS id. The reason I say 'partial' is because it works in that it does not break a call to empty?, but a call to count(:all) returns a Hash {monkey_id => number_of_sessions} instead of the number of records.
Update 2:
I guess my real question is: How can I get the number of associated sessions for each monkey, so that for all intents and purposes I can work with the query result as with Monkey.all? I know about counter cache but would prefer not to use it.
I believe it is not a bug. Like you added on your update, you have to specify the table that the id column belongs to. In this case group('monkeys.id') would do it.
How would the code responsible for generating the statement know the table to use? Without the count worked fine because it adds points.* to the projection and that is the one used by group by. However, if you actually wanted to group by Sessions id, you would have to specify it anyway.