ActiveRecord :includes - how to use map with loaded associations?

I have a small rails app, and I'm trying to get some order statistics.
So I have an Admin model, and an Order model, with one-to-many association.
class Admin < ActiveRecord::Base
  attr_accessible :name
  has_many :orders
end
class Order < ActiveRecord::Base
  attr_accessible :operation
  belongs_to :admin
end
And I'm trying to get specific orders using this query:
admins = Admin.where(...).includes(:orders).where('orders.operation = ?', 'new gifts!')
That works just as expected. But when I try to build JSON using map like this:
admins.map {|a| [a.name, a.orders.pluck(:operation)]}
Rails loads the orders again with a new query per admin, ignoring the already loaded objects:
(5.6ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 26
(6.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 24
(2.9ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 30
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 29
(4.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 27
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 28
(5.1ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 25
When I use an each loop instead of map, it works as it should:
admins.each do |a|
p a.orders.pluck(:operation)
end
This code doesn't load all orders; it prints only those loaded in the first query.
Is it possible to get the same result using map? What are the drawbacks of using each instead of map?

pluck should always make a new query to the database. I'm not sure why you think that doesn't happen in an each loop. Maybe you did not see the log lines because they are interleaved with your printed output?
There are two ways to avoid the additional queries:
1. Since the orders are already loaded because you include them, you can do admins.map {|a| [a.name, a.orders.collect(&:operation)]}
2. Use joins (see @tihom's comment).
Edit: I just tested the each/map behavior and it reloads every time, as expected.
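As a rough illustration of the first option, the sketch below uses plain Structs as stand-ins for eager-loaded records (the structs and the sample data are assumptions for illustration, not ActiveRecord models). Once the orders are in memory, collect is ordinary Enumerable work and needs no database:

```ruby
# Plain Ruby stand-ins for eager-loaded records; no database is involved.
order_rec = Struct.new(:operation)
admin_rec = Struct.new(:name, :orders)

admins = [
  admin_rec.new("alice", [order_rec.new("new gifts!"), order_rec.new("new gifts!")]),
  admin_rec.new("bob",   [order_rec.new("new gifts!")])
]

# collect (an alias of map) walks each admin's in-memory array of orders.
result = admins.map { |a| [a.name, a.orders.collect(&:operation)] }

p result
```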

Related

Avoiding N+1 in model instance methods by using sort_by instead of order

My app is a CRM for teachers where a Teacher belongs_to an Account that has_many Students who HABTM PhoneNumbers through CallablePhoneNumbers (since, IRL siblings can share one phone number).
(Aside: As a possible complicating factor, PhoneNumbers is Polymorphic. Both Teachers and Students are "Callable"...)
My Issue: I'm trying to avoid N+1 in a students_list view. When viewing a list of 900 students and some metadata, the database hits are pretty terrifying.
app/models/student.rb
class Student < ActiveRecord::Base
...
has_many :phone_numbers, through: :callable_phone_numbers, as: :callable_phone_numbers
...
def last_messaged_at
self.phone_numbers.order(:last_received_message_at).last.try(:last_received_message_at)
# :last_received_message_at is a simple DateTime in the database
end
...
end
When I'm showing a list of students I want to show the last_messaged_at method as a status alongside the student, and I'm attempting to avoid N+1 via .includes()
app/controllers/dashes_controller.rb
class DashesController < ApplicationController
before_action :logged_in_teacher
def show
@teacher = Teacher.includes(account: [{students: [:phone_numbers, :grade_level, :student_groups]}, :grade_levels]).includes(:student_groups).find(@current_teacher.id)
end
end
Yes, there are a lot of other associations in there. I'm focusing this question exclusively on PhoneNumbers, though feedback about my use of .includes() would not be unwelcome, since it does look convoluted.
In the console, I can go...
pry(main)> t = Teacher.includes(account: [{students: [:phone_numbers, :grade_level, :student_groups]}, :grade_levels]).includes(:student_groups).find(3)
Teacher Load (2.3ms) SELECT "teachers".* FROM "teachers" WHERE "teachers"."id" = ? LIMIT 1 [["id", 3]]
Account Load (0.4ms) SELECT "accounts".* FROM "accounts" WHERE "accounts"."id" IN (3)
Student Load (8.2ms) SELECT "students".* FROM "students" WHERE "students"."account_id" IN (3)
CallablePhoneNumber Load (7.3ms) ... ETC
pry(main)> t.account.students.first.phone_numbers
=> [#<PhoneNumber:0x007fddcc59ac98
id: 15,
number: ... ETC
...to get phone_numbers without an additional PhoneNumber Load. However, when I...
pry(main)> t.account.students.first.last_messaged_at
PhoneNumber Load (0.4ms) SELECT "phone_numbers".* FROM "phone_numbers" INNER JOIN "callable_phone_numbers" ON "phone_numbers"."id" = "callable_phone_numbers"."phone_number_id" WHERE "callable_phone_numbers"."callable_id" = ? AND "callable_phone_numbers"."callable_type" = ? ORDER BY "phone_numbers"."last_received_message_at" DESC LIMIT 1 [["callable_id", 3], ["callable_type", "Student"]]
=> Thu, 06 Aug 2015 18:01:12 UTC +00:00
I'm unexpectedly forced to ping the database again, when I would've thought those PhoneNumbers were already in memory.
I felt like an instance method was most appropriate for this, but maybe it should be a helper that I pass the Collection of Phone Numbers to? Even if that's the case, it's still unclear to me why the instance method can't "see" the loaded PhoneNumbers.
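Please try sort_by if you have already eager loaded the associations. Calling order on the association builds a new SQL query, so the eager-loaded records are bypassed; sort_by works in Ruby on the records that are already in memory.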
self.phone_numbers.sort_by { |pn| pn.last_received_message_at || Time.now - 20.year }.last.try(:last_received_message_at)
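A minimal in-memory sketch of the same idea, with a Struct standing in for the eager-loaded PhoneNumber records (the Struct and the sample data are assumptions for illustration). It uses compact.max rather than sort_by with a fallback date, which is a variation on the line above and not the original answer's code:

```ruby
# Stand-in for eager-loaded phone numbers; no database is involved.
phone_number = Struct.new(:number, :last_received_message_at)

phone_numbers = [
  phone_number.new("555-0100", Time.utc(2015, 8, 6, 18, 1, 12)),
  phone_number.new("555-0101", nil), # never received a message
  phone_number.new("555-0102", Time.utc(2015, 7, 1, 9, 0, 0))
]

# Pick the latest timestamp in Ruby. compact drops the nil entries,
# and max returns nil when nothing is left.
last_messaged_at = phone_numbers.map(&:last_received_message_at).compact.max

puts last_messaged_at.inspect
```

Inside the model, the same expression would read from phone_numbers directly, on the assumption that the association was eager loaded by the caller.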

How can I minimize queries when serving JSON through a Rails API?

I'm preparing an API in rails to serve an AngularJS app. This app will provide a dashboard for managing people in a database, so the main page for an individual person is pulling in a lot of information. Here's the Jbuilder file I'm using to format the information as JSON:
json.extract! @person, :id, :employee_id, :display_name
json.appointments @person.appointments, :id, :jobcode, :title
json.flags @person.flags, :id, :name
json.source_relationships @person.source_relationships, :id, :source_id, :target_id, :relationship_type_id
json.target_relationships @person.target_relationships, :id, :source_id, :target_id, :relationship_type_id
The JSON returned looks like this (from /api/v1/people/1685.json):
{
"id":1685,
"employee_id":"9995999",
"display_name":"John Doe",
"appointments": [
{"id":353,"jobcode":"TE556","title":"Developer"}
],
"flags":[
{"id":5,"name":"Unclassified"},
{"id":7,"name":"Full Time"}
],
"source_relationships":[
{"id":19,"source_id":1685,"target_id":1648,"relationship_type_id":9},
{"id":21,"source_id":1685,"target_id":1606,"relationship_type_id":9}
],
"target_relationships":[
{"id":1,"source_id":1648,"target_id":1685,"relationship_type_id":10}
]
}
And the console shows these queries:
Person Load (0.1ms) SELECT `people`.* FROM `people` WHERE `people`.`id` = 1685 LIMIT 1
Appointment Load (0.1ms) SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`person_id` = 1685
Flag Load (0.1ms) SELECT `flags`.* FROM `flags`
INNER JOIN `flags_people` ON `flags`.`id` = `flags_people`.`flag_id` WHERE `flags_people`.`person_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`source_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`target_id` = 1685
I like the way the JSON is formatted, but the fact that it has to run 5 separate queries seems inefficient. I tried adding joins() or includes() to the Active Record query, which is currently just @person = Person.find(params[:id]), but that didn't seem to be what I wanted. How can I cleanly minimize the number of queries while still returning JSON in a similar format?
The method I was looking for is eager_load. Not sure how I haven't come across it in the past, but it combined all the table queries into one using LEFT OUTER JOIN.
@person = Person.eager_load(:flags, :appointments,
:source_relationships, :target_relationships).find(params[:id])
Results in this single query:
SELECT DISTINCT `people`.`id` FROM `people`
LEFT OUTER JOIN `flags_people` ON `flags_people`.`person_id` = `people`.`id`
LEFT OUTER JOIN `flags` ON `flags`.`id` = `flags_people`.`flag_id`
LEFT OUTER JOIN `appointments` ON `appointments`.`person_id` = `people`.`id`
LEFT OUTER JOIN `relationships` ON `relationships`.`source_id` = `people`.`id`
LEFT OUTER JOIN `relationships` `target_relationships_people` ON `target_relationships_people`.`target_id` = `people`.`id`
WHERE `people`.`id` = 1685 LIMIT 1
Found the explanation on this blog post from Arkency

Effectively query the number of friends?

I'm currently getting all friends of a certain user first and then taking the size of the array as the number of friends. My concern is that, since I'm retrieving all the friends' information from the database every time, this is potentially (or obviously) inefficient. So I'm wondering if there's a way to query the number of friends of a certain user efficiently, without fetching any other information?
If your relations are declared correctly, you should be able to do user.friends.count in order to generate a DB-level count.
See here in my console the SQL queries generated (Drug has_many :details and DrugDetail belongs_to :drug):
irb(main):003:0> Drug.first.details
Drug Load (0.7ms) SELECT "drugs".* FROM "drugs" LIMIT 1
DrugDetail Load (1.9ms) SELECT "drug_details".* FROM "drug_details" WHERE "drug_details"."drug_id" = 1771
=> []
irb(main):004:0> Drug.first.details.count
Drug Load (0.7ms) SELECT "drugs".* FROM "drugs" LIMIT 1
(0.6ms) SELECT COUNT(*) FROM "drug_details" WHERE "drug_details"."drug_id" = 1771
=> 0
irb(main):006:0> Drug.first.details.to_a.size
Drug Load (2.1ms) SELECT "drugs".* FROM "drugs" LIMIT 1
DrugDetail Load (0.5ms) SELECT "drug_details".* FROM "drug_details" WHERE "drug_details"."drug_id" = 1771
=> 0
In your case, if you have your relations like this:
User has_many :friends
Friend belongs_to :user
Then this should be executed at the DB-level and be faster than your first piece of code:
User.first.friends.count

active record relations – who needs it?

Well, I'm confused about Rails queries. For example:
Affiche belongs_to :place
Place has_many :affiches
We can do this now:
@affiches = Affiche.all( :joins => :place )
or
@affiches = Affiche.all( :include => :place )
and we will get a lot of extra SELECTs, if there are many affiches:
Place Load (0.2ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.3ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.8ms) SELECT "places".* FROM "places" WHERE "places"."id" = 444 LIMIT 1
Place Load (1.0ms) SELECT "places".* FROM "places" WHERE "places"."id" = 222 LIMIT 1
...and so on...
And (sic!) when :joins is used, every SELECT is doubled!
Technically we could just write this:
@affiches = Affiche.all( )
and the result is exactly the same! (Because we have the relations declared.) The way out, to keep all data in one query, is removing the relations and writing a big string with "LEFT OUTER JOIN", but there is still the problem of grouping the data into a multi-dimensional array and the problem of identical column names, such as id.
What is done wrong? Or what am I doing wrong?
UPDATE:
Well, I do have the line Place Load (2.5ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20)), followed by a list of SELECTs, one per id. Strangely, I get these separate SELECTs when I do this inside the each block:
<%= link_to a.place.name, **a.place**( :id => a.place.friendly_id ) %>
The marked a.place is the spot that produces these extra queries.
UPDATE 2:
And let me do some math. In console we have:
Affiche Load (1.8ms) SELECT affiches.*, places.name FROM "affiches" LEFT OUTER JOIN "places" ON "places"."id" = "affiches"."place_id" ORDER BY affiches.event_date DESC
<VS>
Affiche Load (1.2ms) SELECT "affiches".* FROM "affiches"
Place Load (2.9ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20))
That comes out to 1.8ms versus 4.1ms, which is pretty confusing...
Something is really strange here, because the :include option is intended to gather the place_id attribute from every affiche and then fetch all places at once with a select query like this:
select * from places where id in (3, 444, 222)
You can check that in rails console. Just start it and run that snippet:
ActiveRecord::Base.logger = Logger.new STDOUT
Affiche.all :include => :place
You might be accidentally fetching affiches somewhere in your code without actually including places, and then calling place on every affiche, making Rails perform a separate query for each one.
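The "gather the ids, then fetch once" idea can be sketched in plain Ruby. This only illustrates the principle and is not how Rails implements it internally; the Struct and the sample place ids (taken from the log lines in the question) are stand-ins:

```ruby
# Stand-in for loaded affiches; only place_id matters here.
affiche = Struct.new(:place_id)
affiches = [3, 3, 444, 222].map { |id| affiche.new(id) }

# Collect each distinct place_id once, then build a single IN query
# instead of one SELECT per affiche.
place_ids = affiches.map(&:place_id).uniq
sql = "select * from places where id in (#{place_ids.join(', ')})"

puts sql
```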

How to use includes with 3 models w ActiveRecord?

Given the following model:
Room (id, title, suggested)
has_many :room_apps, :dependent => :destroy
RoomApp (room_id, app_id, appable_id, appable_type)
belongs_to :appable, :polymorphic => true
has_many :colors, :as => :appable
has_many :shirts, :as => :appable
Colors (room_id)
belongs_to :room
belongs_to :room_app
belongs_to :app
What I want to do is get all the suggested rooms. In my controller I have:
@suggested_rooms = Room.includes(:room_apps).find_all_by_suggested(true).first(5)
Problem here is the includes is not working and the db is being hit several times:
Processing by PagesController#splash as HTML
Room Load (0.6ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't' ORDER BY last_activity_at DESC
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND ("room_apps".room_id IN (5,4,3)) ORDER BY created_at DESC
RoomApp Load (5.9ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 6 AND ("room_apps".room_id = 5) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 5 LIMIT 1
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 5 AND ("room_apps".room_id = 4) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 4 LIMIT 1
RoomApp Load (0.4ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 4 AND ("room_apps".room_id = 3) ORDER BY created_at DESC LIMIT 1
Color Load (0.3ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 3 LIMIT 1
Is something set up incorrectly? I'd like to be able to get suggested rooms and use includes for room_apps with one hit, versus the current behavior where there is a hit for every room.
Ideas? Thanks
I think you'll either want to use the full Rails3 arel interface like so:
@suggested_rooms = Room.includes(:room_apps).where(:suggested => true).limit(5)
Or do this for Rails 2.3.x:
@suggested_rooms = Room.find_all_by_suggested(true, :include=>:room_apps).first(5)
Did some digging around and I think I have an idea what's going on.
includes by default does not generate a single query. It generates one query per model involved: one for the base model plus one for each included association.
ruby-1.9.2-p180 :014 > Room.where(:suggested => true).includes(:room_apps => :colors)
Room Load (0.5ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't'
RoomApp Load (0.8ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."room_id" IN (1)
Color Load (0.5ms) SELECT "colors".* FROM "colors" WHERE "colors"."room_app_id" IN (1)
One exception to this is if you have a where clause that references one of the included tables; in that case it will use a LEFT OUTER JOIN so that the where clause can be applied to that table.
If you want to INNER JOIN a bunch of models AND include them, you have to use both joins and includes with the given models. joins alone will only do the INNER JOIN across the relations; includes will pull in the fields and set up the returned models with their relations intact.
ruby-1.9.2-p180 :015 > Room.where(:suggested => true).joins(:room_apps => :colors)
Room Load (0.8ms) SELECT "rooms".*
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
ruby-1.9.2-p180 :016 > Room.where(:suggested => true).joins(:room_apps => :colors).includes(:room_apps => :colors)
SQL (0.6ms) SELECT "rooms"."id" AS t0_r0, "rooms"."suggested" AS t0_r1, "rooms"."created_at" AS t0_r2, "rooms"."updated_at" AS t0_r3, "room_apps"."id" AS t1_r0, "room_apps"."room_id" AS t1_r1, "room_apps"."created_at" AS t1_r2, "room_apps"."updated_at" AS t1_r3, "colors"."id" AS t2_r0, "colors"."room_id" AS t2_r1, "colors"."room_app_id" AS t2_r2, "colors"."created_at" AS t2_r3, "colors"."updated_at" AS t2_r4
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
The big convoluted SELECT part in the last query is ARel making sure that the fields from all of the models are unique and able to be differentiated when they need to be mapped back to the actual models.
Whether you use includes alone or includes with joins is a matter of how much data you're bringing back, and how much faster things might be without the INNER JOIN, which causes a great deal of duplicate data to be returned. I would imagine that if 'rooms' had something like a dozen fields (say 13) and 'colors' had 1 field, and there were 100 colors that mapped to a single room, then instead of pulling back 113 fields in total (1 room * 13 + 100 colors * 1) you would end up with 1400 fields ((13 + 1) * 100 colors). Not exactly a performance boost.
The downside of using includes alone, though, is that if you do have a large number of colors per room, the IN(ids) list will be huge, so it's a bit of a double-edged sword.
Here's a quick test I did with various configurations using sqlite3.
I set up two sets of rooms, one with :suggested => true, the other with :suggested => false. The suggested rooms had a 1:1:2 ratio between rooms/room_apps/colors, the non-suggested rooms were set up with a 1:1:10 ratio of the same, and there is a 10:1 ratio between suggested and not suggested.
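The field counts in that estimate can be checked with a few lines of Ruby (the variable names are made up for illustration; the numbers are the ones from the paragraph above):

```ruby
room_fields     = 13   # fields on one room row
color_fields    = 1    # fields on one color row
colors_per_room = 100

# includes alone: the room is fetched once, each color once.
separate_queries = 1 * room_fields + colors_per_room * color_fields

# Single JOIN: every joined row repeats the room's fields next to one color.
single_join = (room_fields + color_fields) * colors_per_room

puts separate_queries
puts single_join
```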
# 100/10 rooms
# insert only
100 * 1/1/2: 8.1ms
10 * 1/1/10: 3.2ms
# insert + joins
100 * 1/1/2: 6.2ms
10 * 1/1/10: 3.1ms
# 1000/100 rooms
# insert only
1000 * 1/1/2: 76.8ms
100 * 1/1/10: 19.8ms
# insert + joins
1000 * 1/1/2: 54.5ms
100 * 1/1/10: 23.1ms
The times themselves are not meaningful; this was run via IRB on an Ubuntu guest on a WinXP host with a crappy HDD. Given that you've got a limit(5) in there, it probably isn't going to make a huge difference either way.
