Mix :select with :include in find method (Rails 2) - ruby-on-rails

I have 2 models, User and UserProfile. A user has_one user_profile and a user_profile belongs_to user.
1) Find without select
This query works fine in the console and takes only 2 SQL queries.
>> User.find(:all, :limit => 10, :include => [ :user_profile ])
 
User Load (0.3ms) SELECT * FROM `users` LIMIT 10
UserProfile Load (0.3ms) SELECT `user_profiles`.* FROM `user_profiles`
WHERE (`user_profiles`.user_id IN (1,2,3,...))
2) Find with select on user model
I can select columns from the User model with:
>> User.find(:all, :select => '`users`.id, `users`.last_name',
:limit => 10, :include => [ :user_profile ])
 
User Load (0.3ms) SELECT `users`.id, `users`.last_name FROM `users` LIMIT 10
UserProfile Load (0.2ms) SELECT `user_profiles`.* FROM `user_profiles`
WHERE (`user_profiles`.user_id IN (17510,18087,17508,17288...))
Everything works fine. Note that I must include users.id in the selected user columns; otherwise the second query doesn't work (it returns NULL).
3) Find with select on user_profile model
But when I try to select columns from the UserProfile model, I get only 1 query, and it ignores my :select:
>> User.find(:all,
:select => '`users`.id, `users`.last_name, `user_profiles`.permalink',
:limit => 10, :include => [ :user_profile ])
 
User Load Including Associations (0.6ms) SELECT `users`.`id` AS t0_r0,
`users`.`login` AS t0_r1, ....
`user_profiles`.`id` AS t1_r0,
`user_profiles`.`birth_date` AS t1_r1,
LEFT OUTER JOIN `user_profiles` ON user_profiles.user_id = users.id LIMIT 10
As you can see, the query Rails generates contains fields from users and user_profiles that I didn't select.
4) Join method
Codeit proposed a solution using :joins:
user_details = User.find(:all,
:select => '`users`.id, `users`.last_name, `user_profiles`.permalink',
:limit => 10, :joins => [ :user_profile ]
)
 
User Load (0.2ms) SELECT `users`.id, `users`.last_name, `user_profiles`.permalink
FROM `users`
INNER JOIN `user_profiles` ON user_profiles.user_id = users.id
LIMIT 10
This solution generates the right SQL, but it doesn't link User and UserProfile: 10 additional queries are needed, whereas methods 1 and 2 need only 2 SQL queries.
user_details.map(&:user_profile).map(&:permalink)
UserProfile Load (0.3ms) SELECT * FROM `user_profiles` WHERE (`user_profiles`.user_id = 1) LIMIT 1
UserProfile Load (0.2ms) SELECT * FROM `user_profiles` WHERE (`user_profiles`.user_id = 2) LIMIT 1
... (10 times) ...
UserProfile Load (0.3ms) SELECT * FROM `user_profiles` WHERE (`user_profiles`.user_id = 10) LIMIT 1
Is there a syntax that gives the same results as the first 2 queries, but with a :select which selects only a few columns of my models?

Use join:
User.find(:all,
:select => '`users`.id, `users`.last_name, `user_profiles`.permalink',
:limit => 10, :joins => [ :user_profile ])
include is used for eager loading. It solves the N+1 queries problem when you access user_profile for a large number of users. If you want to select columns from the associated table, you need to use joins. If you reference columns of an included table in :select, they are simply ignored.
EDIT:
user_details = User.find(:all,
:select => '`users`.id, `users`.last_name, `user_profiles`.permalink',
:limit => 10, :joins => [ :user_profile ]
)
user_details.map(&:permalink)
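To make the difference between lazy loading and the two-query eager loading concrete, here is a minimal plain-Ruby simulation. It does not use ActiveRecord or a database: the FakeProfileTable class, its method names, and the sample rows are all hypothetical stand-ins, and the sketch only imitates the query pattern seen in the logs above.

```ruby
# In-memory stand-ins for the two tables (hypothetical data).
DEMO_USERS = [
  { :id => 1, :last_name => "Smith" },
  { :id => 2, :last_name => "Jones" },
  { :id => 3, :last_name => "Brown" }
]
DEMO_PROFILES = [
  { :user_id => 1, :permalink => "smith" },
  { :user_id => 2, :permalink => "jones" },
  { :user_id => 3, :permalink => "brown" }
]

# Hypothetical fake table that counts how many "queries" it receives.
class FakeProfileTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Imitates: SELECT * FROM user_profiles WHERE user_id = ? LIMIT 1
  def find_by_user_id(user_id)
    @query_count += 1
    @rows.find { |row| row[:user_id] == user_id }
  end

  # Imitates: SELECT * FROM user_profiles WHERE user_id IN (...)
  def find_all_by_user_ids(user_ids)
    @query_count += 1
    @rows.select { |row| user_ids.include?(row[:user_id]) }
  end
end

# Lazy loading: one profile query per user (the N in N+1).
def lazy_permalinks(users, table)
  users.map { |user| table.find_by_user_id(user[:id])[:permalink] }
end

# Eager loading: one IN query, then profiles are matched to users in memory.
def eager_permalinks(users, table)
  profiles = table.find_all_by_user_ids(users.map { |user| user[:id] })
  by_user_id = {}
  profiles.each { |profile| by_user_id[profile[:user_id]] = profile }
  users.map { |user| by_user_id[user[:id]][:permalink] }
end

lazy_table = FakeProfileTable.new(DEMO_PROFILES)
eager_table = FakeProfileTable.new(DEMO_PROFILES)
lazy_result = lazy_permalinks(DEMO_USERS, lazy_table)
eager_result = eager_permalinks(DEMO_USERS, eager_table)

puts "lazy:  #{lazy_table.query_count} profile queries"
puts "eager: #{eager_table.query_count} profile query"
```

Both approaches return the same permalinks; only the number of round trips differs, which is why :joins without :include falls back to one query per user as soon as the association itself is accessed.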

Related

Complex :includes for Eager Loading

I'm trying to create a pretty complex eager load; I’d like to modify the second statement, not the first. I need to create a join on the second statement that includes a column from another table.
Everything I try modifies the first statement and leaves the second as is. If there’s another way to accomplish the same task without N+1 queries, I'm open.
This:
Conversation.joins(:phones)
.where('phones.id' => 2)
.order('last_message_at DESC')
.includes(:messages)
Generates:
SELECT "conversations".* FROM "conversations"
INNER JOIN "conversations_phones"
ON "conversations_phones"."conversation_id" = "conversations"."id"
INNER JOIN "phones"
ON "phones"."id" = "conversations_phones"."phone_id"
WHERE "phones"."id" = 2 ORDER BY last_message_at DESC
SELECT "messages".* FROM "messages"
WHERE "messages"."conversation_id" IN (10, 11) ORDER BY created_at ASC
Makes sense, but not where I want to be.
I can write the needed second statement with something like this:
Message.joins(:message_tags)
.select('messages.*, message_tags.status as read')
.group('messages.id, message_tags.status')
.order('messages.id')
.where(:message_tags => { :user_id => current_user.id })
.where(:messages => { :conversation_id => [10, 11] })
Which correctly generates:
SELECT messages.*, message_tags.status as read FROM "messages"
INNER JOIN "message_tags" ON "message_tags"."message_id" = "messages"."id"
WHERE "message_tags"."user_id" = 2 AND "messages"."conversation_id" IN (10, 11)
GROUP BY messages.id, message_tags.status
ORDER BY messages.id
Basically, I want the more complex messages select to replace the simpler one so I can call @conversations.first.messages.first.read without triggering a new query.
Sounds like you need to add some conditions to an association:
has_many :messages, -> { select('messages.*, message_tags.status as read') }
and then something like:
Conversation.joins(:phones)
.where('phones.id' => 2)
.order('last_message_at DESC')
.includes(:messages => :message_tags)
.where(:message_tags => { :user_id => current_user.id })

How to use includes with 3 models w ActiveRecord?

Given the following model:
Room (id, title, suggested)
has_many :room_apps, :dependent => :destroy
RoomApp (room_id, app_id, appable_id, appable_type)
belongs_to :appable, :polymorphic => true
has_many :colors, :as => :appable
has_many :shirts, :as => :appable
Colors (room_id)
belongs_to :room
belongs_to :room_app
belongs_to :app
What I want to do is get all the suggested rooms. In my controller I have:
@suggested_rooms = Room.includes(:room_apps).find_all_by_suggested(true).first(5)
Problem here is the includes is not working and the db is being hit several times:
Processing by PagesController#splash as HTML
Room Load (0.6ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't' ORDER BY last_activity_at DESC
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND ("room_apps".room_id IN (5,4,3)) ORDER BY created_at DESC
RoomApp Load (5.9ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 6 AND ("room_apps".room_id = 5) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 5 LIMIT 1
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 5 AND ("room_apps".room_id = 4) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 4 LIMIT 1
RoomApp Load (0.4ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 4 AND ("room_apps".room_id = 3) ORDER BY created_at DESC LIMIT 1
Color Load (0.3ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 3 LIMIT 1
Is something setup incorrectly? I'd like to be able to get suggested rooms and use includes for room_apps with one hit versus currently where it's a hit for every room.
Ideas? Thanks
I think you'll either want to use the full Rails3 arel interface like so:
@suggested_rooms = Room.includes(:room_apps).where(:suggested => true).limit(5)
Or do this for Rails 2.3.x:
@suggested_rooms = Room.find_all_by_suggested(true, :include => :room_apps).first(5)
Did some digging around and I think I have an idea what's going on.
includes by default does not generate a single query. It generates one query per model being loaded, as the log below shows.
ruby-1.9.2-p180 :014 > Room.where(:suggested => true).includes(:room_apps => :colors)
Room Load (0.5ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't'
RoomApp Load (0.8ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."room_id" IN (1)
Color Load (0.5ms) SELECT "colors".* FROM "colors" WHERE "colors"."room_app_id" IN (1)
One exception to this is a where clause that references one of the included tables. In that case it will use a LEFT OUTER JOIN so that the where clause can be applied to that table.
If you want to INNER JOIN a bunch of models AND include them, you have to use both joins and includes with the given models. joins alone will only do the INNER JOIN across the relations; includes will pull in the fields and set up the returned models with their relations intact.
ruby-1.9.2-p180 :015 > Room.where(:suggested => true).joins(:room_apps => :colors)
Room Load (0.8ms) SELECT "rooms".*
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
ruby-1.9.2-p180 :016 > Room.where(:suggested => true).joins(:room_apps => :colors).includes(:room_apps => :colors)
SQL (0.6ms) SELECT "rooms"."id" AS t0_r0, "rooms"."suggested" AS t0_r1, "rooms"."created_at" AS t0_r2, "rooms"."updated_at" AS t0_r3, "room_apps"."id" AS t1_r0, "room_apps"."room_id" AS t1_r1, "room_apps"."created_at" AS t1_r2, "room_apps"."updated_at" AS t1_r3, "colors"."id" AS t2_r0, "colors"."room_id" AS t2_r1, "colors"."room_app_id" AS t2_r2, "colors"."created_at" AS t2_r3, "colors"."updated_at" AS t2_r4
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
The big convoluted SELECT part in the last query is ARel making sure that the fields from all of the models are unique and able to be differentiated when they need to be mapped back to the actual models.
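The aliasing idea can be sketched in plain Ruby. This is only an illustration of the t<table>_r<column> naming visible in the query above, not ActiveRecord's actual implementation; the helper names, the shortened column lists, and the sample row are hypothetical.

```ruby
# Hypothetical, shortened table layout; column names come from the query above.
ALIAS_DEMO_TABLES = [
  ["rooms",     ["id", "suggested"]],
  ["room_apps", ["id", "room_id"]],
  ["colors",    ["id", "room_app_id"]]
]

# Build { "t0_r0" => ["rooms", "id"], ... } from table index and column index.
def build_alias_map(tables)
  alias_map = {}
  tables.each_with_index do |(table_name, columns), table_index|
    columns.each_with_index do |column, column_index|
      alias_map["t#{table_index}_r#{column_index}"] = [table_name, column]
    end
  end
  alias_map
end

# Split one flat joined row back into one attribute hash per table.
def split_row(row, alias_map)
  result = Hash.new { |hash, key| hash[key] = {} }
  row.each do |column_alias, value|
    table_name, column = alias_map.fetch(column_alias)
    result[table_name][column] = value
  end
  result
end

alias_map = build_alias_map(ALIAS_DEMO_TABLES)

# A made-up joined row: three different "id" columns no longer collide.
joined_row = {
  "t0_r0" => 1,  "t0_r1" => true,
  "t1_r0" => 7,  "t1_r1" => 1,
  "t2_r0" => 42, "t2_r1" => 7
}
records = split_row(joined_row, alias_map)

puts records["rooms"].inspect
puts records["colors"].inspect
```

Without the aliases, the three id columns would share one name in the result set and could not be attributed to the right model.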
Whether you use includes alone or includes with joins is a matter of how much data you're bringing back, and how much speed difference there might be if you were not doing the INNER JOIN, which causes a great deal of duplicate data to be returned. I would imagine that if 'rooms' had something like a dozen fields and 'colors' had 1 field, but there were 100 colors that mapped to a single room, instead of pulling back 113 fields in total (1 room * 13 + 100 colors * 1) you would end up with 1400 fields ((13 + 1) * 100 colors). Not exactly a performance boost.
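The arithmetic can be checked with a few lines of Ruby. The variable names are made up for this sketch, the room is assumed to have 13 fields so that the totals match the figures above, and room_apps is left out just as it is in the estimate.

```ruby
room_fields = 13        # assumed: "something like a dozen" fields per room
color_fields = 1
rooms = 1
colors_per_room = 100

# includes alone: each table is fetched separately, so every row comes back once.
separate_queries = rooms * room_fields + rooms * colors_per_room * color_fields

# joins + includes: every joined row repeats the room's columns next to the color's.
single_join = rooms * colors_per_room * (room_fields + color_fields)

puts "separate queries: #{separate_queries} values"
puts "single join:      #{single_join} values"
```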
Though the downside of using includes alone is that if you do have a large number of colors per room, the IN(ids) will be huge, bit of a double edged sword.
Here's a quick test I did with various configurations using sqlite3.
I set up two sets of rooms, one with :suggested => true, the other with :suggested => false. The suggested rooms had a 1:1:2 ratio between rooms/room_apps/colors, the non-suggested rooms had a 1:1:10 ratio of the same, and there is a 10:1 ratio between suggested and not suggested.
# 100/10 rooms
# insert only
100 * 1/1/2: 8.1ms
10 * 1/1/10: 3.2ms
# insert + joins
100 * 1/1/2: 6.2ms
10 * 1/1/10: 3.1ms
# 1000/100 rooms
# insert only
1000 * 1/1/2: 76.8ms
100 * 1/1/10: 19.8ms
# insert + joins
1000 * 1/1/2: 54.5ms
100 * 1/1/10: 23.1ms
The absolute times are not meaningful in themselves: this was run via IRB on an Ubuntu guest on a WinXP host on a crappy HDD. Given that you've got a limit(5) in there, it probably isn't going to make a huge difference either way.

Problem with named_scope causes error in will_paginate - how to include group_by in count?

[rails 2.3.12] named_scope:
named_scope :order_by_price, lambda {{:joins => :variants, :group => "products.id", :order => "MAX(price)"}}
console:
1.
> Product.order_by_price.size
=> 21
2.
> p = Product.order_by_price
> p.size
=> 4
sql queries:
1.
SELECT count(*) AS count_all FROM `products` INNER JOIN `variants` ON variants.product_id = products.id
2.
SELECT `products`.* FROM `products` INNER JOIN `variants` ON variants.product_id = products.id GROUP BY products.id ORDER BY MAX(price)
I use will_paginate for pagination. In this case total_entries value is 21 and number of pages is based on this, although there are only 4 products...
Any ideas how can I get this to work correctly?
EDIT
In general I have to include group_by when calling Product.count... how?
No answers, but I found a solution. Maybe it will be useful for someone else too. I just had to redefine count, selecting distinct product_id:
def self.count(*args)
super(args, {:select => "(products.id)", :distinct => true})
end
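Why the two numbers differ can be sketched without a database. The snippet below is a plain-Ruby illustration under made-up data: the split of 21 variants across 4 products is hypothetical and only chosen to match the counts reported above.

```ruby
# Hypothetical data: 4 products with 21 variants in total.
variants_per_product = { 1 => 6, 2 => 5, 3 => 5, 4 => 5 }

# Rows produced by: products INNER JOIN variants ON variants.product_id = products.id
joined_rows = variants_per_product.flat_map do |product_id, variant_count|
  (1..variant_count).map { |n| { :product_id => product_id, :variant_no => n } }
end

# SELECT count(*) counts every joined row.
count_all = joined_rows.size

# GROUP BY products.id, or counting DISTINCT products.id, collapses rows per product.
count_distinct = joined_rows.map { |row| row[:product_id] }.uniq.size

puts "count(*): #{count_all}, distinct products: #{count_distinct}"
```

The join multiplies each product by its number of variants, so a count that drops the GROUP BY reports joined rows rather than products; counting distinct product ids restores the figure the paginator needs.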

ActiveRecord Count to count rows returned by group by in Rails

I looked around and couldn't find any answers to this. All answers involved counts that did not use a GROUP BY.
Background:
I have a paginator that will take options for an ActiveRecord.find. It adds a :limit and :offset option and performs the query. What I also need to do is count the total number of records (less the limit), but sometimes the query contains a :group option and ActiveRecord.count tries to return all rows returned by the GROUP BY along with each of their counts. I'm doing this in Rails 2.3.5.
What I want is for ActiveRecord.count to return the number of rows returned by the GROUP BY.
Here is some sample code that demonstrates one instance of this (used for finding all tags and ordering them by the number of posts with that tag):
options = { :select => 'tags.*, COUNT(*) AS post_count',
:joins => 'INNER JOIN posts_tags', #Join table for 'posts' and 'tags'
:group => 'tags.id',
:order => 'post_count DESC' }
@count = Tag.count(options)
options = options.merge(:offset => (page - 1) * per_page, :limit => per_page)
@items = Tag.find(:all, options)
With the :select option, the Tag.count generates the following SQL:
SELECT count(tags.*, COUNT(*) AS post_count) AS count_tags_all_count_all_as_post_count, tags.id AS tags_id FROM `tags` INNER JOIN posts_tags GROUP BY tags.id ORDER BY COUNT(*) DESC
As you can see it merely wrapped a COUNT() around the 'tags.*, COUNT(*)', and MySQL complains about the COUNT within a COUNT.
Without the :select option, it generates this SQL:
SELECT count(*) AS count_all, tags.id AS tags_id FROM `tags` INNER JOIN posts_tags GROUP BY tags.id ORDER BY COUNT(*)
which returns the whole GROUP BY result set and not the number of rows.
Is there a way around this or will I have to hack up the paginator to account for queries with GROUP BYs (and how would I go about doing that)?
Seems like you'd need to handle the grouped queries separately. Doing a count without a group returns an integer, while counting with a group returns a hash:
Tag.count
SQL (0.2ms) SELECT COUNT(*) FROM "tags"
=> 37
Tag.count(:group=>"tags.id")
SQL (0.2ms) SELECT COUNT(*) AS count_all, tags.id AS tags_id FROM "tags"
GROUP BY tags.id
=> {1=>37}
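One way a paginator could cope with both return types is to normalize the result before using it as a total. This is a minimal sketch; total_rows is a hypothetical helper name and the grouped sample hash is made up.

```ruby
# Returns the number of rows a paginator should treat as its total.
# count_result is whatever a count call returned: an Integer for a plain
# count, or a Hash of { group_value => count } for a grouped count.
def total_rows(count_result)
  count_result.is_a?(Hash) ? count_result.size : count_result
end

plain_count = 37                              # e.g. the result of Tag.count
grouped_count = { 1 => 12, 2 => 20, 3 => 5 }  # hypothetical grouped result

puts total_rows(plain_count)
puts total_rows(grouped_count)
```

Note that this still fetches one row per group from the database, so for very large result sets a DISTINCT count in SQL is likely cheaper.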
If you're using Rails 4 or 5 you can do the following as well.
Tag.group(:id).count
The workaround for my situation seems to be to replace the :group => 'tags.id' with :select => 'DISTINCT tags.id' in the options hash before executing the count.
count_options = options.clone
count_options.delete(:order)
if options[:group]
group_by = count_options[:group]
count_options.delete(:group)
count_options[:select] = "DISTINCT #{group_by}"
end
@item_count = @type.count(count_options)
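The same transformation can be written as a small function that leaves the original options untouched, which makes it easy to check in isolation. This is a sketch only; count_options_for is a hypothetical name, and the options hash is copied from the question.

```ruby
# Returns a copy of the finder options adapted for counting distinct groups.
# The input hash is not modified.
def count_options_for(options)
  count_options = options.dup
  count_options.delete(:order)
  group_by = count_options.delete(:group)
  count_options[:select] = "DISTINCT #{group_by}" if group_by
  count_options
end

options = {
  :select => "tags.*, COUNT(*) AS post_count",
  :joins  => "INNER JOIN posts_tags",
  :group  => "tags.id",
  :order  => "post_count DESC"
}
count_options = count_options_for(options)

puts count_options.inspect
```

The resulting hash could then be passed to count as in the line above, while the original options remain available for the find call.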
Another (hacky) solution:
selection = Tag.where(...).group(...)
count = Tag.connection.select_value "select count(*) from (" + selection.to_sql + ") as x"
If I understand your question correctly, then it should work if you don't use Tag.count at all. Specifying 'COUNT(*) AS post_count' in your select hash should be enough. For example:
@tag = Tag.first(options)
@tag.post_count
As you can see, the post_count value from the query is accessible from the @tag instance. And if you want to get all tags, then perhaps something like this:
@tags = Tag.all(options)
@tags.each do |tag|
puts "Tag name: #{tag.name} posts: #{tag.post_count}"
end
Update:
count can also be called with the column to count and a :distinct option:
options = { :select => 'tags.*, COUNT(*) AS post_count',
:joins => 'INNER JOIN posts_tags', #Join table for 'posts' and 'tags'
:group => 'tags.id',
:order => 'post_count DESC',
:offset => (page - 1) * per_page,
:limit => per_page }
@count = Tag.count(:id, :distinct => true, :joins => options[:joins])
@items = Tag.find(:all, options)

rails named_scope ignores eager loading

Two models (Rails 2.3.8):
User; username & disabled properties; User has_one :profile
Profile; full_name & hidden properties
I am trying to create a named_scope that excludes disabled users (disabled=1) and hidden profiles (hidden=1). The User model is usually used in conjunction with the Profile model, so I attempt to eager-load the Profile model (:include => :profile).
I created a named_scope on the User model called 'visible':
named_scope :visible, {
:joins => "INNER JOIN profiles ON users.id=profiles.user_id",
:conditions => ["users.disabled = ? AND profiles.hidden = ?", false, false]
}
I've noticed that when I use the named_scope in a query, the eager-loading instruction is ignored.
Variation 1 - User model only:
# UserController
@users = User.find(:all)
# User's Index view
<% for user in @users %>
<p><%= user.username %></p>
<% end %>
# generates a single query:
SELECT * FROM `users`
Variation 2 - use Profile model in view; lazy load Profile model
# UserController
@users = User.find(:all)
# User's Index view
<% for user in @users %>
<p><%= user.username %></p>
<p><%= user.profile.full_name %></p>
<% end %>
# generates multiple queries:
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 1) ORDER BY full_name ASC LIMIT 1
SHOW FIELDS FROM `profiles`
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 2) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 3) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 4) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 5) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 6) ORDER BY full_name ASC LIMIT 1
Variation 3 - eager load Profile model
# UserController
@users = User.find(:all, :include => :profile)
#view; no changes
# two queries
SELECT * FROM `users`
SELECT `profiles`.* FROM `profiles` WHERE (`profiles`.user_id IN (1,2,3,4,5,6))
Variation 4 - use named_scope, including eager-loading instruction
# UserController
@users = User.visible(:include => :profile)
#view; no changes
# generates multiple queries
SELECT `users`.* FROM `users` INNER JOIN profiles ON users.id=profiles.user_id WHERE (users.disabled = 0 AND profiles.hidden = 0)
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 1) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 2) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 3) ORDER BY full_name ASC LIMIT 1
SELECT * FROM `profiles` WHERE (`profiles`.user_id = 4) ORDER BY full_name ASC LIMIT 1
Variation 4 does return the correct number of records, but also appears to be ignoring the eager-loading instruction.
Is this an issue with cross-model named scopes? Perhaps I'm not using it correctly.
Is this sort of situation handled better by Rails 3?
From railsapi.com:
Eager loading of associations
[...] Since only one table is loaded
at a time, conditions or orders
cannot reference tables other than the
main one. If this is the case Active
Record falls back to the previously
used LEFT OUTER JOIN based strategy.
For example
Post.find(:all, :include => [ :author, :comments ],
:conditions => ['comments.approved = ?', true])
will result in a single SQL query with
joins along the lines of: LEFT OUTER
JOIN comments ON comments.post_id =
posts.id and LEFT OUTER JOIN authors
ON authors.id = posts.author_id.
I believe this answers your question: there's no eager loading in "variation #4" because you reference the profiles table in your named_scope.
I believe the following may give you what you are looking for:
@users = User.visible.scoped(:include => :profile)
This did the trick for me, but I'm not joining with other tables in the definition of my named scope.
Jim Benton provides an elegant way of adding this to ActiveRecord on his blog: http://autonomousmachine.com/posts/2009/10/28/add-a-scope-for-easier-eager-loading
