Complex :includes for Eager Loading - ruby-on-rails

I'm trying to create a pretty complex eager load; I’d like to modify the second statement, not the first. I need to create a join on the second statement that includes a column from another table.
Everything I try modifies the first statement and leaves the second as is. If there’s another way to accomplish the same task without N+1 queries, I'm open.
This:
Conversation.joins(:phones)
.where('phones.id' => 2)
.order('last_message_at DESC')
.includes(:messages)
Generates:
SELECT "conversations".* FROM "conversations"
INNER JOIN "conversations_phones"
ON "conversations_phones"."conversation_id" = "conversations"."id"
INNER JOIN "phones"
ON "phones"."id" = "conversations_phones"."phone_id"
WHERE "phones"."id" = 2 ORDER BY last_message_at DESC
SELECT "messages".* FROM "messages"
WHERE "messages"."conversation_id" IN (10, 11) ORDER BY created_at ASC
Makes sense, but not where I want to be.
I can write the needed second statement with something like this:
Message.joins(:message_tags)
.select('messages.*, message_tags.status as read')
.group('messages.id, message_tags.status')
.order('messages.id')
.where(:message_tags => { :user_id => current_user.id })
.where(:messages => { :conversation_id => [10, 11] })
Which correctly generates:
SELECT messages.*, message_tags.status as read FROM "messages"
INNER JOIN "message_tags" ON "message_tags"."message_id" = "messages"."id"
WHERE "message_tags"."user_id" = 2 AND "messages"."conversation_id" IN (10, 11)
GROUP BY messages.id, message_tags.status
ORDER BY messages.id
Basically, I want the more complex messages select to replace the simpler one so I can call @conversations.first.messages.first.read without creating a new query.

Sounds like you need to add some conditions to an association:
has_many :messages, -> { select('messages.*, message_tags.status as read') }
and then something like:
Conversation.joins(:phones)
.where('phones.id' => 2)
.order('last_message_at DESC')
.includes(:messages => :message_tags)
.where(:message_tags => { :user_id => current_user.id })

Related

Additive scope conditions for has_many :through

I want a user to be able to find all posts that have one or more tags. And I'd like the tags to be additive criteria, so for example you could search for posts that have just the 'News' tag, or you could search for posts that have both the 'News' and 'Science' tags.
Currently what I have, and it works, is a Post model, a Tag model, and a join model called Marking. Post has_many :tags, through: :markings. I get what I need by passing an array of Tag ids to a Post class method:
post.rb
def self.from_tag_id_array array
post_array = []
Marking.where(tag_id: array).group_by(&:post_id).each do |p_id,m_array|
post_array << p_id if m_array.map(&:tag_id).sort & array.sort == array.sort
end
where id: post_array
end
This seems like a clunky way to get there. Is there a way I can do this with a scope on an association or something of the like?
So the general rule of thumb with building these kinds of queries is to minimize work in "Ruby-land" and maximize work in "Database-land". In your solution above, you're fetching a set of markings with any tags in the set array, which presumably will be a very large set (all posts that have any of those tags). This is represented in a ruby array and processed (group_by is in Ruby-world, group is the equivalent in Database-land).
So aside from being hard-to-read, that solution is going to be slow for any large set of markings.
There are a couple ways to solve the problem without doing any heavy lifting in Ruby-world. One way is using subqueries, like this:
scope :with_tag_ids, ->(tag_ids) {
tag_ids.map { |tag_id|
joins(:markings).where(markings: { tag_id: tag_id })
}.reduce(all) { |scope, subquery| scope.where(id: subquery) }
}
This generates a query like this (again for tag_ids 5 and 8)
SELECT "posts".*
FROM "posts"
WHERE "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 5)
AND "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 8)
Note that since everything here is calculated directly in SQL, no arrays are generated or processed in Ruby. This will generally scale much better.
Alternatively, you can use COUNT and do it in a single query without subqueries:
scope :with_tag_ids, ->(tag_ids) {
joins(:markings).where(markings: { tag_id: tag_ids }).
group(:post_id).having('COUNT(posts.id) = ?', tag_ids.count)
}
Which generates SQL like this:
SELECT "posts".*
FROM "posts"
INNER JOIN "markings" ON "markings"."post_id" = "posts"."id"
WHERE "markings"."tag_id" IN (5, 8)
GROUP BY "post_id"
HAVING (COUNT(posts.id) = 2)
This assumes that you don't have multiple markings with the same pair of tag_id and post_id, which would throw off the count.
I would imagine that the last solution is probably the most efficient, but you should try different solutions and see what works best for your data.
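The duplicate-marking caveat above can be seen without a database. The following is a plain-Ruby sketch, using made-up rows that stand in for the markings table, which mimics HAVING COUNT(*) and HAVING COUNT(DISTINCT tag_id). The constant and method names are illustrative only and are not part of ActiveRecord.

```ruby
# Made-up join rows standing in for the markings table.
# Post 1 has tag 5 recorded twice and does not have tag 8 at all.
MARKINGS = [
  { post_id: 1, tag_id: 5 },
  { post_id: 1, tag_id: 5 },
  { post_id: 2, tag_id: 5 },
  { post_id: 2, tag_id: 8 }
].freeze

# Mimics: WHERE tag_id IN (...) GROUP BY post_id HAVING COUNT(*) = n
def posts_by_row_count(markings, tag_ids)
  markings.select { |m| tag_ids.include?(m[:tag_id]) }
          .group_by { |m| m[:post_id] }
          .select { |_post_id, rows| rows.size == tag_ids.size }
          .keys
end

# Mimics: WHERE tag_id IN (...) GROUP BY post_id HAVING COUNT(DISTINCT tag_id) = n
def posts_by_distinct_count(markings, tag_ids)
  markings.select { |m| tag_ids.include?(m[:tag_id]) }
          .group_by { |m| m[:post_id] }
          .select { |_post_id, rows| rows.map { |m| m[:tag_id] }.uniq.size == tag_ids.uniq.size }
          .keys
end

posts_by_row_count(MARKINGS, [5, 8])      # post 1 slips through because of its duplicate row
posts_by_distinct_count(MARKINGS, [5, 8]) # only posts that really have both tags
```

If duplicates are possible in your data, the analogous change to the scope would presumably be to count distinct tags, for example having('COUNT(DISTINCT markings.tag_id) = ?', tag_ids.uniq.size). A unique index on (post_id, tag_id) would avoid the problem altogether.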
See also: Query intersection with activerecord

Problem with named_scope causes error in will_paginate - how to include group_by in count?

[rails 2.3.12] named_scope:
named_scope :order_by_price, lambda {{:joins => :variants, :group => "products.id", :order => "MAX(price)"}}
console:
1.
> Product.order_by_price.size
=> 21
2.
> p = Product.order_by_price
> p.size
=> 4
sql queries:
1.
SELECT count(*) AS count_all FROM `products` INNER JOIN `variants` ON variants.product_id = products.id
2.
SELECT `products`.* FROM `products` INNER JOIN `variants` ON variants.product_id = products.id GROUP BY products.id ORDER BY MAX(price)
I use will_paginate for pagination. In this case total_entries value is 21 and number of pages is based on this, although there are only 4 products...
Any ideas how can I get this to work correctly?
EDIT
In general I have to include group_by when calling Product.count... how?
No answers, but I found a solution. Maybe it will be useful for someone else too. I just had to redefine count, selecting distinct product_id:
def self.count(*args)
super(args, {:select => "(products.id)", :distinct => true})
end

Rails eager loading seems to be querying wrong

I'm attempting to eager load in my Rails 3 app. I've narrowed it down to a very basic sample, and instead of generating the one query I'm expecting, it's generating 4.
First, here's a simple breakdown of my models.
class Profile < ActiveRecord::Base
belongs_to :gender
def to_param
self.name
end
end
class Gender < ActiveRecord::Base
has_many :profiles, :dependent => :nullify
end
I then have a ProfilesController#show action, where I'm querying for the model.
class ProfilesController < ApplicationController
before_filter :find_profile, :only => [:show]
def show
end
private
def find_profile
@profile = Profile.find_by_username(params[:id], :include => :gender)
raise ActiveRecord::RecordNotFound, "Page not found" unless @profile
end
end
When I look at the queries this generates, it shows the following:
SELECT `profiles`.* FROM `profiles` WHERE `profiles`.`username` = 'matt' LIMIT 1
SELECT `genders`.* FROM `genders` WHERE (`genders`.`id` = 1)
What I expected to see is a single query:
SELECT `profiles`.*, `genders`.* FROM `profiles` LEFT JOIN `genders` ON `profiles`.gender_id = `genders`.id WHERE `profiles`.`username` = 'matt' LIMIT 1
Anyone know what I'm doing wrong here? Everything I've found on eager loading makes it sound like this should work.
Edit: After trying joins, as recommended by sled, I'm still seeing the same results.
The code:
@profile = Profile.joins(:gender).where(:username => params[:id]).limit(1).first
The query:
SELECT `profiles`.* FROM `profiles` INNER JOIN `genders` ON `genders`.`id` = `profiles`.`gender_id` WHERE `profiles`.`username` = 'matt' LIMIT 1
Again, you can see no genders data is being retrieved, and so a second query to genders is being made.
I even tried adding a select, to no avail:
@profile = Profile.joins(:gender).select('profiles.*, genders.*').where(:username => params[:id]).limit(1).first
which correctly resulted in:
SELECT profiles.*, genders.* FROM `profiles` INNER JOIN `genders` ON `genders`.`id` = `profiles`.`gender_id` WHERE `profiles`.`username` = 'matt' LIMIT 1
...but it still performed a second query on genders later when accessing @profile.gender's attributes.
Edit 2: I also tried creating a scope that includes both select and joins in order to get all the fields I require, (similar to the custom left join method sled demonstrated). It looks like this:
class Profile < ActiveRecord::Base
# ...
ALL_ATTRIBUTES = [:photo, :city, :gender, :relationship_status, :physique, :children,
:diet, :drink, :smoke, :drug, :education, :income, :job, :politic, :religion, :zodiac]
scope :with_attributes,
select((ALL_ATTRIBUTES.collect { |a| "`#{reflect_on_association(a).table_name}`.*" } + ["`#{table_name}`.*"]).join(', ')).
joins(ALL_ATTRIBUTES.collect { |a|
assoc = reflect_on_association(a)
"LEFT JOIN `#{assoc.table_name}` ON `#{table_name}`.#{assoc.primary_key_name} = `#{assoc.table_name}`.#{assoc.active_record_primary_key}"
}.join(' '))
# ...
end
This generates the following query, which appears correct:
SELECT `photos`.*, `cities`.*, `profile_genders`.*, `profile_relationship_statuses`.*, `profile_physiques`.*, `profile_children`.*, `profile_diets`.*, `profile_drinks`.*, `profile_smokes`.*, `profile_drugs`.*, `profile_educations`.*, `profile_incomes`.*, `profile_jobs`.*, `profile_politics`.*, `profile_religions`.*, `profile_zodiacs`.*, `profiles`.* FROM `profiles` LEFT JOIN `photos` ON `profiles`.photo_id = `photos`.id LEFT JOIN `cities` ON `profiles`.city_id = `cities`.id LEFT JOIN `profile_genders` ON `profiles`.gender_id = `profile_genders`.id LEFT JOIN `profile_relationship_statuses` ON `profiles`.relationship_status_id = `profile_relationship_statuses`.id LEFT JOIN `profile_physiques` ON `profiles`.physique_id = `profile_physiques`.id LEFT JOIN `profile_children` ON `profiles`.children_id = `profile_children`.id LEFT JOIN `profile_diets` ON `profiles`.diet_id = `profile_diets`.id LEFT JOIN `profile_drinks` ON `profiles`.drink_id = `profile_drinks`.id LEFT JOIN `profile_smokes` ON `profiles`.smoke_id = `profile_smokes`.id LEFT JOIN `profile_drugs` ON `profiles`.drug_id = `profile_drugs`.id LEFT JOIN `profile_educations` ON `profiles`.education_id = `profile_educations`.id LEFT JOIN `profile_incomes` ON `profiles`.income_id = `profile_incomes`.id LEFT JOIN `profile_jobs` ON `profiles`.job_id = `profile_jobs`.id LEFT JOIN `profile_politics` ON `profiles`.politic_id = `profile_politics`.id LEFT JOIN `profile_religions` ON `profiles`.religion_id = `profile_religions`.id LEFT JOIN `profile_zodiacs` ON `profiles`.zodiac_id = `profile_zodiacs`.id WHERE `profiles`.`username` = 'matt' LIMIT 1
Unfortunately, it doesn't seem that calls to relationship attributes (e.g.: @profile.gender.name) are using the data that was returned in the original SELECT. Instead, I see a flood of queries following this first one:
Profile::Gender Load (0.2ms) SELECT `profile_genders`.* FROM `profile_genders` WHERE `profile_genders`.`id` = 1 LIMIT 1
Profile::Gender Load (0.4ms) SELECT `profile_genders`.* FROM `profile_genders` INNER JOIN `profile_attractions` ON `profile_genders`.id = `profile_attractions`.gender_id WHERE ((`profile_attractions`.profile_id = 2))
City Load (0.4ms) SELECT `cities`.* FROM `cities` WHERE `cities`.`id` = 1 LIMIT 1
Country Load (0.3ms) SELECT `countries`.* FROM `countries` WHERE `countries`.`id` = 228 ORDER BY FIELD(code, 'US') DESC, name ASC LIMIT 1
Profile Load (0.4ms) SELECT `profiles`.* FROM `profiles` WHERE `profiles`.`id` = 2 LIMIT 1
Profile::Language Load (0.4ms) SELECT `profile_languages`.* FROM `profile_languages` INNER JOIN `profile_profiles_languages` ON `profile_languages`.id = `profile_profiles_languages`.language_id WHERE ((`profile_profiles_languages`.profile_id = 2))
SQL (0.3ms) SELECT COUNT(*) FROM `profile_ethnicities` INNER JOIN `profile_profiles_ethnicities` ON `profile_ethnicities`.id = `profile_profiles_ethnicities`.ethnicity_id WHERE ((`profile_profiles_ethnicities`.profile_id = 2))
Profile::Religion Load (0.5ms) SELECT `profile_religions`.* FROM `profile_religions` WHERE `profile_religions`.`id` = 2 LIMIT 1
Profile::Politic Load (0.2ms) SELECT `profile_politics`.* FROM `profile_politics` WHERE `profile_politics`.`id` = 3 LIMIT 1
Your example is fine, and it will end up as two queries because that's how eager loading is implemented in Rails. This becomes handy when you have many associated records. You can read more about it here
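To illustrate why a separate second query still pays off once many records are involved, here is a toy model in plain Ruby. It is not how ActiveRecord is implemented internally: FakeTable and both helper methods are made up for this sketch, and "queries" are simply counted lookups against an in-memory array.

```ruby
# Tiny in-memory stand-in for a database table that counts lookups.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Stands in for: SELECT * FROM table WHERE id IN (...)
  def where_id_in(ids)
    @query_count += 1
    @rows.select { |row| ids.include?(row[:id]) }
  end
end

# Lazy loading: one lookup per profile (the N+1 pattern).
def lazy_gender_names(profiles, genders)
  profiles.map { |profile| genders.where_id_in([profile[:gender_id]]).first[:name] }
end

# Preloading: collect the foreign keys first, then do a single IN lookup.
def preloaded_gender_names(profiles, genders)
  ids = profiles.map { |profile| profile[:gender_id] }.uniq
  by_id = genders.where_id_in(ids).each_with_object({}) { |g, memo| memo[g[:id]] = g }
  profiles.map { |profile| by_id[profile[:gender_id]][:name] }
end
```

With three profiles, the lazy version performs three lookups while the preloaded version performs one, regardless of how many profiles are loaded. For a single profile, as in the question, the difference is just one extra query.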
What you probably want is a simple join:
@profile = Profile.joins(:gender).where(:username => params[:id])
Edit
If the profile consists of many pieces there are multiple approaches here:
Custom left joins - maybe there is a plugin out there which does the job otherwise I'd suggest to do something like:
class Profile < ActiveRecord::Base
# .... code .....
def self.with_dependencies
attr_joins = []
attr_selects = []
attr_selects << "`profiles`.*"
attr_selects << "`genders`.*"
attr_selects << "`colors`.*"
attr_joins << "LEFT JOIN `genders` ON `genders`.`id` = `profiles`.gender_id"
attr_joins << "LEFT JOIN `colors` ON `colors`.`id` = `profiles`.color_id"
prep_model = select(attr_selects.join(','))
attr_joins.each do |c_join|
prep_model = prep_model.joins(c_join)
end
return prep_model
end
end
Now you could do something like:
@profile = Profile.with_dependencies.where(:username => params[:id])
Another solution is to use :include => [:gender, :color]. It may issue a few more queries, but it's the cleaner "Rails way". If you run into performance issues you may want to rethink your DB schema, but do you really have such a heavy load?
A friend of mine wrote a nice little solution for these simple 1:n relations (like genders); it's called simple_enum.
After working with sled's suggestions, I finally came up with this solution. I'm sure it could be made cleaner with a plugin, but here's what I've got for now:
class Profile < ActiveRecord::Base
ALL_ATTRIBUTES = [:photo, :city, :gender, :relationship_status, :physique, :children,
:diet, :drink, :smoke, :drug, :education, :income, :job, :politic, :religion, :zodiac]
scope :with_attributes,
includes(ALL_ATTRIBUTES).
select((ALL_ATTRIBUTES.collect { |a| "`#{reflect_on_association(a).table_name}`.*" } + ["`#{table_name}`.*"]).join(', '))
end
The two main points are:
A call to includes, which passes the symbols of the relationships I want
A call to select that makes sure to retrieve all columns for the related tables. Note that I call reflect_on_association so that I don't have to hard-code the related tables' names, letting the Rails models do the work for me.
I can now call:
Profile.with_attributes.where(:username => params[:id]).limit(1).first
Going to mark sled's answer as correct since it's his help (answers + comments combined) that led me here, even though this is the code I'm ultimately using.

ActiveRecord Count to count rows returned by group by in Rails

I looked around and couldn't find any answers to this. All answers involved counts that did not use a GROUP BY.
Background:
I have a paginator that will take options for an ActiveRecord.find. It adds a :limit and :offset option and performs the query. What I also need to do is count the total number of records (less the limit), but sometimes the query contains a :group option and ActiveRecord.count tries to return all rows returned by the GROUP BY along with each of their counts. I'm doing this in Rails 2.3.5.
What I want is for ActiveRecord.count to return the number of rows returned by the GROUP BY.
Here is some sample code that demonstrates one instance of this (used for finding all tags and ordering them by the number of posts with that tag):
options = { :select => 'tags.*, COUNT(*) AS post_count',
:joins => 'INNER JOIN posts_tags', #Join table for 'posts' and 'tags'
:group => 'tags.id',
:order => 'post_count DESC' }
@count = Tag.count(options)
options = options.merge(:offset => (page - 1) * per_page, :limit => per_page)
@items = Tag.find(:all, options)
With the :select option, the Tag.count generates the following SQL:
SELECT count(tags.*, COUNT(*) AS post_count) AS count_tags_all_count_all_as_post_count, tags.id AS tags_id FROM `tags` INNER JOIN posts_tags GROUP BY tags.id ORDER BY COUNT(*) DESC
As you can see it merely wrapped a COUNT() around the 'tags.*, COUNT(*)', and MySQL complains about the COUNT within a COUNT.
Without the :select option, it generates this SQL:
SELECT count(*) AS count_all, tags.id AS tags_id FROM `tags` INNER JOIN posts_tags GROUP BY tags.id ORDER BY COUNT(*)
which returns the whole GROUP BY result set and not the number of rows.
Is there a way around this or will I have to hack up the paginator to account for queries with GROUP BYs (and how would I go about doing that)?
Seems like you'd need to handle the grouped queries separately. Doing a count without a group returns an integer, while counting with a group returns a hash:
Tag.count
SQL (0.2ms) SELECT COUNT(*) FROM "tags"
=> 37
Tag.count(:group=>"tags.id")
SQL (0.2ms) SELECT COUNT(*) AS count_all, tags.id AS tags_id FROM "tags"
GROUP BY tags.id
=> {1=>37}
If you're using Rails 4 or 5 you can do the following as well.
Tag.group(:id).count
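Since a grouped count comes back as a hash keyed by group, a paginator can normalise either shape into a row total. The helper below is a minimal sketch of that idea; total_rows is a made-up name, not a Rails method.

```ruby
# Normalises the result of a count call into a number of rows.
# An ungrouped count is an Integer; a grouped count is a Hash of group key => count,
# so the number of rows the GROUP BY produced is the number of keys.
def total_rows(count_result)
  count_result.is_a?(Hash) ? count_result.size : count_result
end

total_rows(37)          # ungrouped: the integer is already the total
total_rows({ 1 => 37 }) # grouped: one row per group key
```

Note that this still fetches one count per group from the database, so for very large result sets a DISTINCT-based or subquery-based count may be preferable.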
The workaround for my situation seems to be to replace the :group => 'tags.id' with :select => 'DISTINCT tags.id' in the options hash before executing the count.
count_options = options.clone
count_options.delete(:order)
if options[:group]
group_by = count_options[:group]
count_options.delete(:group)
count_options[:select] = "DISTINCT #{group_by}"
end
@item_count = @type.count(count_options)
Another (hacky) solution:
selection = Tag.where(...).group(...)
count = Tag.connection.select_value "select count(*) from (" + selection.to_sql + ") as x"
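The same idea can be written as a small helper that only builds the SQL string. This is a sketch: count_rows_sql and the grouped_rows alias are made-up names, and the inner SQL is assumed to come from a trusted source such as to_sql on a relation, never from user input.

```ruby
# Wraps any SELECT statement in a derived table and counts the rows it returns.
# A trailing semicolon is stripped so the statement can be nested safely.
def count_rows_sql(inner_sql, alias_name = 'grouped_rows')
  "SELECT COUNT(*) FROM (#{inner_sql.strip.chomp(';')}) AS #{alias_name}"
end

count_rows_sql('SELECT tags.id FROM tags GROUP BY tags.id')
```

The resulting string could then be passed to Tag.connection.select_value as in the line above.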
If I understand your question correctly, then it should work if you don't use Tag.count at all. Specifying 'COUNT(*) AS post_count' in your select hash should be enough. For example:
@tag = Tag.first(options)
@tag.post_count
As you can see, the post_count value from the query is accessible from the @tag instance. And if you want to get all tags, then perhaps something like this:
@tags = Tag.all(options)
@tags.each do |tag|
puts "Tag name: #{tag.name} posts: #{tag.post_count}"
end
Update:
count can be called with the attribute to count and also a :distinct parameter:
options = { :select => 'tags.*, COUNT(*) AS post_count',
:joins => 'INNER JOIN posts_tags', #Join table for 'posts' and 'tags'
:group => 'tags.id',
:order => 'post_count DESC',
:offset => (page - 1) * per_page,
:limit => per_page }
@count = Tag.count(:id, :distinct => true, :joins => options[:joins])
@items = Tag.find(:all, options)

named_scope + average is causing the table to be specified more than once in the sql query run on postgresql

I have a named scope like so...
named_scope :gender, lambda { |gender| { :joins => {:survey_session => :profile }, :conditions => { :survey_sessions => { :profiles => { :gender => gender } } } } }
and when I call it everything works fine.
I also have this average method I call...
Answer.average(:rating, :include => {:survey_session => :profile}, :group => "profiles.career")
which also works fine if I call it like that.
However if I were to call it like so...
Answer.gender('m').average(:rating, :include => {:survey_session => :profile}, :group => "profiles.career")
I get...
ActiveRecord::StatementInvalid: PGError: ERROR: table name "profiles" specified more than once
: SELECT avg("answers".rating) AS avg_rating, profiles.career AS profiles_career FROM "answers" LEFT OUTER JOIN "survey_sessions" survey_sessions_answers ON "survey_sessions_answers".id = "answers".survey_session_id LEFT OUTER JOIN "profiles" ON "profiles".id = "survey_sessions_answers".profile_id INNER JOIN "survey_sessions" ON "survey_sessions".id = "answers".survey_session_id INNER JOIN "profiles" ON "profiles".id = "survey_sessions".profile_id WHERE ("profiles"."gender" = E'm') GROUP BY profiles.career
Which is a little hard to read but says I'm including the table profiles twice.
If I just remove the :include from average it works, but that isn't really practical because average is actually called inside a method that is passed the scope. So sometimes gender or average might be called without the other, and if either were missing the profile include it wouldn't work.
So either I need to know how to fix this apparent bug in Rails, or figure out a way to know which scopes were applied to an ActiveRecord::NamedScope::Scope object, so that I can check whether they have been applied and, if not, add the include for average.
Looks like ActiveRecord is generating some bad SQL:
SELECT avg("answers".rating) AS avg_rating,
profiles.career AS profiles_career
FROM "answers"
LEFT OUTER JOIN "survey_sessions" survey_sessions_answers
ON "survey_sessions_answers".id = "answers".survey_session_id
LEFT OUTER JOIN "profiles"
ON "profiles".id = "survey_sessions_answers".profile_id
INNER JOIN "survey_sessions"
ON "survey_sessions".id = "answers".survey_session_id
INNER JOIN "profiles"
ON "profiles".id = "survey_sessions".profile_id
WHERE ("profiles"."gender" = E'm')
GROUP BY profiles.career
Presumably it's generated the left joins as part of getting the projected property, and the inner joins as part of getting the criteria: this wouldn't be invalid (just inefficient) if it assigned aliases to those tables, but it doesn't. Is there a way to specify an alias name from your app?
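To make the aliasing point concrete, here is a plain-Ruby diagnostic sketch that scans generated SQL for JOIN clauses and reports any name (the alias if one is given, otherwise the table name) that is used more than once. JOIN_PATTERN, duplicate_join_names, and FAILING_SQL are made-up names for this illustration; the regex only handles simple JOIN ... ON clauses like the ones above, and the SQL is a shortened copy of the failing statement.

```ruby
# Matches: JOIN "table" [alias] ON, capturing the table name and the optional alias.
JOIN_PATTERN = /JOIN\s+"?(\w+)"?(?:\s+(?!ON\b)"?(\w+)"?)?\s+ON\b/i

# Returns the names (alias if present, else table name) that appear
# more than once among the JOIN clauses of the given SQL string.
def duplicate_join_names(sql)
  names = sql.scan(JOIN_PATTERN).map { |table, table_alias| table_alias || table }
  names.select { |name| names.count(name) > 1 }.uniq
end

# Shortened copy of the failing statement from the question.
FAILING_SQL = <<~SQL
  SELECT avg("answers".rating) AS avg_rating
  FROM "answers"
  LEFT OUTER JOIN "survey_sessions" survey_sessions_answers
    ON "survey_sessions_answers".id = "answers".survey_session_id
  LEFT OUTER JOIN "profiles"
    ON "profiles".id = "survey_sessions_answers".profile_id
  INNER JOIN "survey_sessions"
    ON "survey_sessions".id = "answers".survey_session_id
  INNER JOIN "profiles"
    ON "profiles".id = "survey_sessions".profile_id
SQL

duplicate_join_names(FAILING_SQL)
```

Here survey_sessions is joined twice without conflict because the first join carries the alias survey_sessions_answers, while profiles is joined twice under the same name, which is what PostgreSQL rejects.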
