Given the following models:
Room (id, title, suggested)
  has_many :room_apps, :dependent => :destroy
RoomApp (room_id, app_id, appable_id, appable_type)
  belongs_to :appable, :polymorphic => true
  has_many :colors, :as => :appable
  has_many :shirts, :as => :appable
Color (room_id)
  belongs_to :room
  belongs_to :room_app
  belongs_to :app
What I want to do is get all the suggested rooms. In my controller I have:
@suggested_rooms = Room.includes(:room_apps).find_all_by_suggested(true).first(5)
The problem is that the includes is not working, and the database is being hit several times:
Processing by PagesController#splash as HTML
Room Load (0.6ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't' ORDER BY last_activity_at DESC
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND ("room_apps".room_id IN (5,4,3)) ORDER BY created_at DESC
RoomApp Load (5.9ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 6 AND ("room_apps".room_id = 5) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 5 LIMIT 1
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 5 AND ("room_apps".room_id = 4) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 4 LIMIT 1
RoomApp Load (0.4ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 4 AND ("room_apps".room_id = 3) ORDER BY created_at DESC LIMIT 1
Color Load (0.3ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 3 LIMIT 1
Is something set up incorrectly? I'd like to get the suggested rooms and use includes for room_apps with a single query, rather than the current situation where there is one query per room.
Ideas? Thanks
I think you'll want to use either the full Rails 3 ARel interface, like so:
@suggested_rooms = Room.includes(:room_apps).where(:suggested => true).limit(5)
Or, for Rails 2.3.x:
@suggested_rooms = Room.find_all_by_suggested(true, :include => :room_apps).first(5)
Did some digging around and I think I have an idea what's going on.
includes does not generate a single query by default. It generates one query per model involved: one for the base model plus one for each included association.
ruby-1.9.2-p180 :014 > Room.where(:suggested => true).includes(:room_apps => :colors)
Room Load (0.5ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't'
RoomApp Load (0.8ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."room_id" IN (1)
Color Load (0.5ms) SELECT "colors".* FROM "colors" WHERE "colors"."room_app_id" IN (1)
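To see why the query count tracks the number of models rather than the number of records, here is a plain-Ruby sketch with no ActiveRecord involved. TABLES, FakeDB and its where_in/all methods are hypothetical stand-ins for SQL queries; the sketch contrasts lazy per-record loading with the preload strategy that includes uses by default.

```ruby
# Hypothetical in-memory tables standing in for rooms, room_apps and colors.
TABLES = {
  rooms: [{ id: 1 }, { id: 2 }, { id: 3 }],
  room_apps: [{ id: 10, room_id: 1 }, { id: 11, room_id: 2 }, { id: 12, room_id: 3 }],
  colors: [{ id: 100, room_app_id: 10 }, { id: 101, room_app_id: 10 },
           { id: 102, room_app_id: 11 }, { id: 103, room_app_id: 12 }]
}.freeze

# Counts "queries" so the two loading strategies can be compared.
class FakeDB
  attr_reader :queries

  def initialize
    @queries = 0
  end

  # Simulates: SELECT * FROM table
  def all(table)
    @queries += 1
    TABLES[table]
  end

  # Simulates: SELECT * FROM table WHERE column IN (values)
  def where_in(table, column, values)
    @queries += 1
    TABLES[table].select { |row| values.include?(row[column]) }
  end
end

# N+1 style: one query for rooms, then one per room and one per room_app.
def lazy_load(db)
  db.all(:rooms).map do |room|
    apps = db.where_in(:room_apps, :room_id, [room[:id]])
    apps.map { |app| db.where_in(:colors, :room_app_id, [app[:id]]).size }
  end
end

# Preload style: one query per model, children grouped by foreign key in Ruby.
def preload(db)
  rooms = db.all(:rooms)
  apps = db.where_in(:room_apps, :room_id, rooms.map { |r| r[:id] })
  colors = db.where_in(:colors, :room_app_id, apps.map { |a| a[:id] })
  colors_by_app = colors.group_by { |c| c[:room_app_id] }
  apps_by_room = apps.group_by { |a| a[:room_id] }
  rooms.map do |room|
    apps_by_room.fetch(room[:id], []).map { |app| colors_by_app.fetch(app[:id], []).size }
  end
end
```

With three rooms the lazy version issues 7 queries (1 + 3 + 3) and the preload version issues 3. The preload count stays at 3 however many rooms there are, which is the same three-query shape as the log above.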
The exception is when a where clause references one of the included tables: in that case Rails uses a single query with a LEFT OUTER JOIN so that the condition can be applied to that table.
If you want to INNER JOIN a set of models AND include them, you have to use both joins and includes with the given models. joins alone only performs the INNER JOIN across the relations; includes pulls in the fields and sets up the returned models with their associations already loaded.
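To make the join difference concrete, here is a plain-Ruby sketch of the two join types over hypothetical rows (not ActiveRecord; the constants and method names are made up for illustration). An INNER JOIN, which is what joins produces, drops parents with no matching child, while a LEFT OUTER JOIN keeps them with nil child columns. So adding joins(:room_apps => :colors) also filters out rooms that have no room_apps or colors, which includes alone would not do.

```ruby
# Hypothetical rows: room 3 has no room_apps.
ROOMS = [{ id: 1 }, { id: 2 }, { id: 3 }].freeze
ROOM_APPS = [{ id: 10, room_id: 1 }, { id: 11, room_id: 1 }, { id: 12, room_id: 2 }].freeze

# INNER JOIN: one output row per matching (parent, child) pair; unmatched parents vanish.
def inner_join(parents, children, fk)
  parents.flat_map do |p|
    children.select { |c| c[fk] == p[:id] }.map { |c| [p[:id], c[:id]] }
  end
end

# LEFT OUTER JOIN: same pairs, but an unmatched parent still yields one row with nil.
def left_outer_join(parents, children, fk)
  parents.flat_map do |p|
    matches = children.select { |c| c[fk] == p[:id] }
    matches.empty? ? [[p[:id], nil]] : matches.map { |c| [p[:id], c[:id]] }
  end
end
```

Note that room 1 appears twice in both results, once per child row, which is where the duplicated parent data discussed further down comes from.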
ruby-1.9.2-p180 :015 > Room.where(:suggested => true).joins(:room_apps => :colors)
Room Load (0.8ms) SELECT "rooms".*
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
ruby-1.9.2-p180 :016 > Room.where(:suggested => true).joins(:room_apps => :colors).includes(:room_apps => :colors)
SQL (0.6ms) SELECT "rooms"."id" AS t0_r0, "rooms"."suggested" AS t0_r1, "rooms"."created_at" AS t0_r2, "rooms"."updated_at" AS t0_r3, "room_apps"."id" AS t1_r0, "room_apps"."room_id" AS t1_r1, "room_apps"."created_at" AS t1_r2, "room_apps"."updated_at" AS t1_r3, "colors"."id" AS t2_r0, "colors"."room_id" AS t2_r1, "colors"."room_app_id" AS t2_r2, "colors"."created_at" AS t2_r3, "colors"."updated_at" AS t2_r4
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
The big, convoluted SELECT list in the last query is ARel aliasing every column (t0_r0, t1_r0, and so on) so that the fields from all of the models are unique and can be told apart when they are mapped back to the actual models.
Whether you use includes alone or includes with joins depends on how much data you're bringing back, and on how much speed you would gain by skipping the INNER JOIN, since a join can return a great deal of duplicate data. For example, if 'rooms' had about a dozen (say 13) fields and 'colors' had 1 field, but 100 colors mapped to a single room, then instead of pulling back 113 fields in total (1 room * 13 + 100 colors * 1), you would end up with 1400 fields ((13 + 1) * 100 rows). Not exactly a performance boost.
The downside of using includes alone, though, is that the IN (ids) lists grow with the number of records being loaded, so with a large result set they can become huge; a bit of a double-edged sword.
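As a rough sketch of how such aliased rows can be mapped back (this is not ActiveRecord's actual implementation; COLUMNS and split_row are hypothetical), each tN_rM alias can be decoded into a table index N and a column index M, and one flat result row split into one attribute hash per model. The column lists below are taken from the query above.

```ruby
# Column lists per aliased table, in alias order:
# t0 = rooms, t1 = room_apps, t2 = colors.
COLUMNS = [
  %w[id suggested created_at updated_at],
  %w[id room_id created_at updated_at],
  %w[id room_id room_app_id created_at updated_at]
].freeze

ALIAS_PATTERN = /\At(\d+)_r(\d+)\z/

# Splits one flat result row (alias => value) into one attribute hash per table.
def split_row(row)
  row.each_with_object(Hash.new { |h, k| h[k] = {} }) do |(name, value), per_table|
    match = ALIAS_PATTERN.match(name)
    raise ArgumentError, "unexpected column #{name}" unless match

    table = match[1].to_i
    column = COLUMNS.fetch(table).fetch(match[2].to_i)
    per_table[table][column] = value
  end
end
```

For example, split_row({ "t0_r0" => 1, "t1_r0" => 10, "t1_r1" => 1 }) yields { 0 => { "id" => 1 }, 1 => { "id" => 10, "room_id" => 1 } }, so the two id columns no longer clash.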
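The field arithmetic above can be written out as two small helpers (the method names are hypothetical; they only count values transferred and ignore per-row overhead):

```ruby
# Values returned when parent and children come back in separate queries
# (what includes does by default): the parent row once, plus each child row.
def separate_query_fields(parent_cols, child_cols, children)
  parent_cols + child_cols * children
end

# Values returned by a single JOIN: every child row repeats all parent columns.
def joined_fields(parent_cols, child_cols, children)
  (parent_cols + child_cols) * children
end

puts separate_query_fields(13, 1, 100) # 113
puts joined_fields(13, 1, 100)         # 1400
```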
Here's a quick test I did with various configurations using sqlite3.
I set up two sets of rooms, one with :suggested => true and the other with :suggested => false. The suggested rooms had a 1:1:2 ratio of rooms/room_apps/colors, the non-suggested rooms had a 1:1:10 ratio, and there were 10 times as many suggested rooms as non-suggested ones:
# 100/10 rooms
# includes only
100 * 1/1/2: 8.1ms
10 * 1/1/10: 3.2ms
# includes + joins
100 * 1/1/2: 6.2ms
10 * 1/1/10: 3.1ms
# 1000/100 rooms
# includes only
1000 * 1/1/2: 76.8ms
100 * 1/1/10: 19.8ms
# includes + joins
1000 * 1/1/2: 54.5ms
100 * 1/1/10: 23.1ms
The absolute times are not meaningful in themselves; this was run via IRB on an Ubuntu guest on a WinXP host with a crappy HDD. Given that you've got a limit(5) in there, it probably isn't going to make a huge difference either way.
My Person records have scores, and I'd like an efficient way to query whether a given user is in the top X users.
# person.rb
class Person < ActiveRecord::Base
  scope :top_score, -> { order('score DESC') }
  scope :page_limit, -> { limit(10) }
  def self.in_top_score(id)
    top_score.page_limit.something_something_something?
  end
end
Previously I was doing:
user.id.in?(top_score.page_limit.pluck(:id))
but I'd prefer to move this check into the database to avoid the object serialization of hundreds or thousands of records.
Person.order('score DESC').select([:score, :id]).limit(1)
Person Load (0.5ms) SELECT score, id FROM `people` ORDER BY score DESC LIMIT 1
=> [#<Person id: "dxvrDy...", score: 35>]
Now, to check whether another user exists in that list:
Person.order('score DESC').select([:score, :id]).limit(1).exists?({id: "c_Tvr6..."})
Person Exists (0.3ms) SELECT 1 AS one FROM `people` WHERE `people`.`id` = 'c_Tvr6...' LIMIT 1
=> true
This returns true, but it should return false.
updated answer
Sorry, my original answer was incorrect. (The exists? query evidently uses LIMIT 1 and overwrites the LIMIT 10 from the page_limit scope, and evidently throws out the ORDER BY clause, too. Totally wrong! :-p)
What about this? It's a little bit less elegant, but I actually tested the answer this time :-p, and it seems to work as desired.
def self.in_top_score?(id)
  where(id: id).where(id: Person.top_score.page_limit).exists?
end
Here's an example usage from my testing (using Rails 4.2.6) and the SQL it generates (which uses a subquery):
pry(main)> Person.in_top_score?(56)
Person Exists (0.4ms) SELECT 1 AS one FROM "people" WHERE "people"."id" = $1 AND "people"."id" IN (SELECT "people"."id" FROM "people" ORDER BY "people"."score" DESC LIMIT 10) LIMIT 1 [["id", 56]]
=> false
In my testing, this does indeed have at least a bit of a performance boost compared to your original version.
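To see why the two versions disagree, here is a plain-Ruby model of both queries over a hypothetical in-memory list of people (PEOPLE and the method names are illustrative only). Because exists? replaces the LIMIT and drops the ORDER BY, the first attempt effectively asks whether the id exists at all; the subquery version first takes the top N ids by score and then checks membership.

```ruby
Person = Struct.new(:id, :score)

# Hypothetical people; ids 1..3 have the highest scores.
PEOPLE = [
  Person.new(1, 90), Person.new(2, 80), Person.new(3, 70),
  Person.new(4, 10), Person.new(5, 5)
].freeze

# Mirrors the first attempt: ORDER BY and the page limit are discarded,
# so this is true for any id present in the table.
def naive_in_top?(id, _limit)
  PEOPLE.any? { |p| p.id == id }
end

# Mirrors: WHERE id = ? AND id IN (SELECT id ... ORDER BY score DESC LIMIT n)
def in_top_score?(id, limit)
  top_ids = PEOPLE.sort_by { |p| -p.score }.first(limit).map(&:id)
  top_ids.include?(id)
end
```

With a limit of 3, person 4 is reported as "in the top" by the naive version but not by the subquery version.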
original answer
top_score.page_limit.exists?(user.id)
http://apidock.com/rails/ActiveRecord/FinderMethods/exists%3F
I'm preparing an API in Rails to serve an AngularJS app. This app will provide a dashboard for managing people in a database, so the main page for an individual person is pulling in a lot of information. Here's the Jbuilder file I'm using to format the information as JSON:
json.extract! @person, :id, :employee_id, :display_name
json.appointments @person.appointments, :id, :jobcode, :title
json.flags @person.flags, :id, :name
json.source_relationships @person.source_relationships, :id, :source_id, :target_id, :relationship_type_id
json.target_relationships @person.target_relationships, :id, :source_id, :target_id, :relationship_type_id
The JSON returned looks like this (from /api/v1/people/1685.json):
{
"id":1685,
"employee_id":"9995999",
"display_name":"John Doe",
"appointments": [
{"id":353,"jobcode":"TE556","title":"Developer"}
],
"flags":[
{"id":5,"name":"Unclassified"},
{"id":7,"name":"Full Time"}
],
"source_relationships":[
{"id":19,"source_id":1685,"target_id":1648,"relationship_type_id":9},
{"id":21,"source_id":1685,"target_id":1606,"relationship_type_id":9}
],
"target_relationships":[
{"id":1,"source_id":1648,"target_id":1685,"relationship_type_id":10}
]
}
And the console shows these queries:
Person Load (0.1ms) SELECT `people`.* FROM `people` WHERE `people`.`id` = 1685 LIMIT 1
Appointment Load (0.1ms) SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`person_id` = 1685
Flag Load (0.1ms) SELECT `flags`.* FROM `flags`
INNER JOIN `flags_people` ON `flags`.`id` = `flags_people`.`flag_id` WHERE `flags_people`.`person_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`source_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`target_id` = 1685
I like the way the JSON is formatted, but having to run 5 separate queries seems inefficient. I tried adding joins() or includes() to the Active Record query, which is currently just @person = Person.find(params[:id]), but that didn't seem to do what I wanted. How can I cleanly minimize the number of queries while still returning JSON in a similar format?
The method I was looking for is eager_load. Not sure how I haven't come across it in the past, but it combined all the table queries into one using LEFT OUTER JOIN.
@person = Person.eager_load(:flags, :appointments,
  :source_relationships, :target_relationships).find(params[:id])
Results in this single query:
SELECT DISTINCT `people`.`id` FROM `people`
LEFT OUTER JOIN `flags_people` ON `flags_people`.`person_id` = `people`.`id`
LEFT OUTER JOIN `flags` ON `flags`.`id` = `flags_people`.`flag_id`
LEFT OUTER JOIN `appointments` ON `appointments`.`person_id` = `people`.`id`
LEFT OUTER JOIN `relationships` ON `relationships`.`source_id` = `people`.`id`
LEFT OUTER JOIN `relationships` `target_relationships_people` ON `target_relationships_people`.`target_id` = `people`.`id`
WHERE `people`.`id` = 1685 LIMIT 1
I found the explanation in a blog post from Arkency.
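One consequence of putting several has_many associations into one LEFT OUTER JOIN query is that the joined rows multiply. A quick sketch of the row count, assuming each association is joined as an independent branch off people as in the SQL above (the sizes come from the JSON for person 1685; the method name is hypothetical):

```ruby
# A LEFT OUTER JOIN contributes at least one row even when there are no matches
# (the child columns are then NULL), hence the max with 1.
def joined_row_count(association_sizes)
  association_sizes.values.reduce(1) { |rows, size| rows * [size, 1].max }
end

sizes = { flags: 2, appointments: 1, source_relationships: 2, target_relationships: 1 }
puts joined_row_count(sizes) # 4
```

For this person that is only 4 rows, but the product grows quickly as the associations get larger, which is worth weighing against the extra round trips of separate queries.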
I have a small Rails app, and I'm trying to get some order statistics.
I have an Admin model and an Order model, with a one-to-many association.
class Admin < ActiveRecord::Base
  attr_accessible :name
  has_many :orders
end
class Order < ActiveRecord::Base
  attr_accessible :operation
  belongs_to :admin
end
And I'm trying to get specific orders using this query:
admins = Admin.where(...).includes(:orders).where('orders.operation = ?', 'new gifts!')
That works just as expected. But when I try to build JSON using map, like this:
admins.map {|a| [a.name, a.orders.pluck(:operation)]}
Rails loads the orders again with a new query for each admin, ignoring the already-loaded objects:
(5.6ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 26
(6.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 24
(2.9ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 30
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 29
(4.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 27
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 28
(5.1ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 25
When I use an each loop instead of map, it works as it should:
admins.each do |a|
  p a.orders.pluck(:operation)
end
This code doesn't load all orders, and prints only those loaded in the first query.
Is it possible to get the same result using map? What are the drawbacks of using loop instead of map?
pluck should always make a new query to the database. Not sure why you think it does not happen in an each loop. Maybe you did not see the log because it is interleaved with your prints?
There are two ways to avoid the additional queries:
Since orders are already loaded because you include them, you can do admins.map {|a| [a.name, a.orders.collect(&:operation)]}
Using joins (see @tihom's comment).
Edit: I just tested the each/map behavior, and it reloads every time, as expected.
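A toy model of the difference (FakeAssociation is a hypothetical stand-in, far simpler than ActiveRecord's real association proxy, and only mirrors the queries in the question's log): pluck issues a new query that, as the log shows, filters only by admin_id and so ignores the orders.operation condition, while collect/map walks the records that includes already loaded.

```ruby
# Minimal stand-in for an association whose records were eager loaded.
class FakeAssociation
  attr_reader :query_count

  def initialize(loaded_records, table_rows)
    @loaded_records = loaded_records # what includes + where filtered in
    @table_rows = table_rows         # every order row for this admin
    @query_count = 0
  end

  # Like pluck: issues a new query, so the earlier where filter is not applied.
  def pluck(attr)
    @query_count += 1
    @table_rows.map { |row| row[attr] }
  end

  # Like collect/map on a loaded association: no query, uses loaded records.
  def collect(&block)
    @loaded_records.map(&block)
  end
end

Order = Struct.new(:operation)
```

Calling collect(&:operation) on such an association returns only the filtered operations and leaves query_count at 0, while pluck(:operation) returns every operation and increments it.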
Well, I'm confused about Rails queries. For example:
Affiche belongs_to :place
Place has_many :affiches
We can do this now:
@affiches = Affiche.all( :joins => :place )
or
@affiches = Affiche.all( :include => :place )
and either way we get a lot of extra SELECTs if there are many affiches:
Place Load (0.2ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.3ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.8ms) SELECT "places".* FROM "places" WHERE "places"."id" = 444 LIMIT 1
Place Load (1.0ms) SELECT "places".* FROM "places" WHERE "places"."id" = 222 LIMIT 1
...and so on...
And (sic!) with :joins used every SELECT is doubled!
Technically, we could just write this:
@affiches = Affiche.all
and the result is exactly the same! (Because the associations are declared.) The only way I see to keep all the data in one query is to remove the associations and write a big string with "LEFT OUTER JOIN", but then there is still the problem of grouping the data into a multidimensional array, and the problem of clashing column names, such as id.
What is wrong here? Or what am I doing wrong?
UPDATE:
Well, I do get this line: Place Load (2.5ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20)), followed by a list of SELECTs, one per id. Strangely, I get these separate SELECTs when I do this inside the each block:
<%= link_to a.place.name, **a.place**( :id => a.place.friendly_id ) %>
The marked a.place is the spot that produces these extra queries.
UPDATE 2:
And let me do some math. In the console we have:
Affiche Load (1.8ms) SELECT affiches.*, places.name FROM "affiches" LEFT OUTER JOIN "places" ON "places"."id" = "affiches"."place_id" ORDER BY affiches.event_date DESC
versus
Affiche Load (1.2ms) SELECT "affiches".* FROM "affiches"
Place Load (2.9ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20))
That comes out to 1.8ms versus 4.1ms (1.2ms + 2.9ms), which is pretty confusing...
Something is really strange here, because the :include option is supposed to gather the place_id attribute from every affiche and then fetch all the places at once, using a select query like this:
select * from places where id in (3, 444, 222)
You can check that in the Rails console. Just start it and run this snippet:
ActiveRecord::Base.logger = Logger.new STDOUT
Affiche.all :include => :place
You might be accidentally fetching affiches somewhere in your code without including places, and then calling place on every affiche, which makes Rails perform a separate query for each one.
Rails version 3.0.3. I am new to Rails, but I've been in web development for a long time.
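A plain-Ruby sketch of what the preload step for :include => :place amounts to (the Affiche struct and place_preload_sql helper are hypothetical; Rails builds and quotes the real SQL itself): collect each affiche's place_id, drop duplicates, and issue one query for all of them.

```ruby
Affiche = Struct.new(:id, :place_id)

# Builds the single IN query a preload would issue. The ids are integers here,
# so this sketch does no quoting; it returns nil when there is nothing to load.
def place_preload_sql(affiches)
  ids = affiches.map(&:place_id).compact.uniq
  return nil if ids.empty?

  format('SELECT "places".* FROM "places" WHERE ("places"."id" IN (%s))', ids.join(','))
end
```

Four affiches pointing at places 3, 3, 444 and 222 produce one query with IN (3,444,222), the same shape as the Place Load line in the update above.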
I am using awesome nested set.
I have the tables "posts", "labels", and "labels_posts"
posts has_and_belongs_to_many labels
labels has_and_belongs_to_many posts
labels acts_as_nested_set
I have a label_id and I want to get all posts that are associated to that label and its children, all as a single ordered result set.
Let us say that I have Labels: "L1, L1.1, L1.1.1, L1.2, L2"
Given L1, and knowing that therefore I have L1, L1.1, L1.1.1, and L1.2, I would normally run the query:
select id, title
from posts
where exists (select * from labels_posts where labels_posts.post_id = posts.id and labels_posts.label_id IN ('L1', 'L1.1', 'L1.1.1', 'L1.2'))
order by created_at desc
This query would return all the posts associated with each of those labels.
So, what is the rails way to do this?
EDIT:
So, here is my controller
@label = Label.find(params[:label])
@posts = Post.all.select do |post|
  post.label_ids.include?(@label.self_and_descendants.map(&:id))
end
And here is the rails server output
Label Load (0.5ms) SELECT "labels".* FROM "labels" WHERE ("labels"."cached_slug" = 'caribbean') LIMIT 1
Post Load (0.6ms) SELECT "posts".* FROM "posts"
Label Load (0.2ms) SELECT "labels".id FROM "labels" INNER JOIN "labels_posts" ON "labels".id = "labels_posts".label_id WHERE ("labels_posts".post_id = 1 )
Label Load (0.8ms) SELECT "labels".* FROM "labels" WHERE ("labels"."lft" >= 1 AND "labels"."rgt" <= 8) ORDER BY "lft"
Label Load (0.1ms) SELECT "labels".id FROM "labels" INNER JOIN "labels_posts" ON "labels".id = "labels_posts".label_id WHERE ("labels_posts".post_id = 2 )
CACHE (0.0ms) SELECT "labels".* FROM "labels" WHERE ("labels"."lft" >= 1 AND "labels"."rgt" <= 8) ORDER BY "lft"
I am not sure the select method is the one that is needed.
EDIT ANSWER:
So here is the answer I arrived at
@label = Label.find(params[:label])
@posts = Post.order('posts.created_at desc').where('labels_posts.label_id IN (?)', @label.self_and_descendants.map(&:id)).includes(:labels)
Try:
Post.all(:include => :labels, :conditions => ["labels_posts.label_id IN (?)", ['L1', 'L1.1', 'L1.1.1', 'L1.2']], :order => "posts.created_at DESC")
Let's first collect the ids of the Label (label_id) and all of its descendants.
Then use Ruby's Array#select to pick the posts that are associated with any of those labels.
label_ids = Label.find(label_id).self_and_descendants.map(&:id)
posts = Post.all.select { |post| (post.label_ids & label_ids).any? }
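A plain-Ruby sketch of the two steps, assuming nested-set lft/rgt values consistent with the log above (the Label and Post structs and the numbering are hypothetical): a node's subtree is every label whose lft/rgt lie within its own, and a post matches when its label ids intersect that set. Calling include? with an array argument would test whether the whole array is a single element of label_ids, which is always false here; that is why the sketch uses the intersection (&).

```ruby
Label = Struct.new(:id, :lft, :rgt)
Post = Struct.new(:title, :label_ids)

# Nested-set numbering for L1 > (L1.1 > L1.1.1), L1.2 and a separate root L2.
LABELS = [
  Label.new("L1", 1, 8), Label.new("L1.1", 2, 5), Label.new("L1.1.1", 3, 4),
  Label.new("L1.2", 6, 7), Label.new("L2", 9, 10)
].freeze

# Same condition as the logged query: lft >= node.lft AND rgt <= node.rgt.
def self_and_descendant_ids(node)
  LABELS.select { |l| l.lft >= node.lft && l.rgt <= node.rgt }
        .sort_by(&:lft)
        .map(&:id)
end

# A post matches if any of its labels falls inside the subtree.
def posts_under(node, posts)
  ids = self_and_descendant_ids(node)
  posts.select { |post| (post.label_ids & ids).any? }
end
```

For L1 this yields L1, L1.1, L1.1.1 and L1.2, so a post tagged only with L1.1.1 is included while one tagged only with L2 is not.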