I'm preparing an API in Rails to serve an AngularJS app. The app will provide a dashboard for managing people in a database, so the main page for an individual person pulls in a lot of information. Here's the Jbuilder file I'm using to format the information as JSON:
json.extract! @person, :id, :employee_id, :display_name
json.appointments @person.appointments, :id, :jobcode, :title
json.flags @person.flags, :id, :name
json.source_relationships @person.source_relationships, :id, :source_id, :target_id, :relationship_type_id
json.target_relationships @person.target_relationships, :id, :source_id, :target_id, :relationship_type_id
The JSON returned looks like this (from /api/v1/people/1685.json):
{
"id":1685,
"employee_id":"9995999",
"display_name":"John Doe",
"appointments": [
{"id":353,"jobcode":"TE556","title":"Developer"}
],
"flags":[
{"id":5,"name":"Unclassified"},
{"id":7,"name":"Full Time"}
],
"source_relationships":[
{"id":19,"source_id":1685,"target_id":1648,"relationship_type_id":9},
{"id":21,"source_id":1685,"target_id":1606,"relationship_type_id":9}
],
"target_relationships":[
{"id":1,"source_id":1648,"target_id":1685,"relationship_type_id":10}
]
}
And the console shows these queries:
Person Load (0.1ms) SELECT `people`.* FROM `people` WHERE `people`.`id` = 1685 LIMIT 1
Appointment Load (0.1ms) SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`person_id` = 1685
Flag Load (0.1ms) SELECT `flags`.* FROM `flags`
INNER JOIN `flags_people` ON `flags`.`id` = `flags_people`.`flag_id` WHERE `flags_people`.`person_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`source_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`target_id` = 1685
I like the way the JSON is formatted, but having to run 5 separate queries seems inefficient. I tried adding joins() or includes() to the Active Record query, which is currently just: @person = Person.find(params[:id]), but that didn't seem to do what I wanted. How can I cleanly minimize the number of queries while still returning JSON in a similar format?
The method I was looking for is eager_load. Not sure how I haven't come across it in the past, but it combined all the table queries into one using LEFT OUTER JOIN.
@person = Person.eager_load(:flags, :appointments,
  :source_relationships, :target_relationships).find(params[:id])
Results in this single query:
SELECT DISTINCT `people`.`id` FROM `people`
LEFT OUTER JOIN `flags_people` ON `flags_people`.`person_id` = `people`.`id`
LEFT OUTER JOIN `flags` ON `flags`.`id` = `flags_people`.`flag_id`
LEFT OUTER JOIN `appointments` ON `appointments`.`person_id` = `people`.`id`
LEFT OUTER JOIN `relationships` ON `relationships`.`source_id` = `people`.`id`
LEFT OUTER JOIN `relationships` `target_relationships_people` ON `target_relationships_people`.`target_id` = `people`.`id`
WHERE `people`.`id` = 1685 LIMIT 1
Found the explanation on this blog post from Arkency
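One thing worth keeping in mind with the single-query approach: joining several has_many associations at once multiplies rows. Here is a rough sketch of that arithmetic in plain Ruby (no database involved), using the association sizes from the JSON above. The helper names `joined_row_count` and `separate_row_count` are made up for illustration, and the sketch assumes the associations are independent of each other.

```ruby
# Association sizes for person 1685, taken from the JSON response above.
COUNTS = {
  appointments: 1,
  flags: 2,
  source_relationships: 2,
  target_relationships: 1
}.freeze

# Rows returned by one query that LEFT OUTER JOINs every association.
# An empty association still yields one row (filled with NULLs),
# so it contributes a factor of 1 rather than 0.
def joined_row_count(counts)
  counts.values.map { |n| [n, 1].max }.reduce(1, :*)
end

# Rows returned in total by the separate queries: the person plus
# one row per associated record.
def separate_row_count(counts)
  1 + counts.values.reduce(0, :+)
end

puts "separate queries: #{separate_row_count(COUNTS)} rows"
puts "single join:      #{joined_row_count(COUNTS)} rows"
```

For this person the join stays small, but each joined row carries the columns of every table, and the product grows quickly: ten flags and ten appointments would already give 100 rows for a single person.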
I have two models product and category.
I am able to make successful queries like Category.products etc.
Product.rb
belongs_to :category
Category.rb
has_many :products
Now I want to retrieve only those categories that have at least one existing product.
I tried it like this:
@categories = Category.where(Category.products.present?)
# returned error undefined method `products' also changing to product didn't work.
Going by your comment that you need categories with products whose with_operator property is true, you can write that query in "Rails style" using joins and merge:
@categories = Category.joins(:products).merge(Product.where(with_operator: true)).uniq
Which will generate the following SQL:
SELECT DISTINCT "categories".* FROM "categories" INNER JOIN "products" ON "products"."category_id" = "categories"."id" WHERE "products"."with_operator" = 't'
You could also use the Rails 4 syntax, as pointed out by @yukke:
Category.joins(:products).where(products: { with_operator: true }).uniq
All you need is an inner join. It will skip the categories that have no products. To add a condition on the joined table, you can use the Rails 4 where syntax:
@categories = Category.joins(:products).where(products: { with_operator: true }).uniq
It will produce the following SQL query:
SELECT DISTINCT "categories".*
FROM "categories" INNER JOIN "products" ON "products"."category_id" = "categories"."id"
WHERE "products"."with_operator" = 't'
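To see why the inner join needs the uniq (DISTINCT), here is a small plain-Ruby sketch that mimics the query on in-memory data. The rows and the helper names `inner_join` and `categories_with_operator` are hypothetical; this only imitates what the SQL does and is not how Active Record works internally.

```ruby
# Hypothetical in-memory rows standing in for the two tables.
CATEGORIES = [
  { id: 1, name: "Tools" },
  { id: 2, name: "Toys" },
  { id: 3, name: "Empty" }
].freeze

PRODUCTS = [
  { id: 10, category_id: 1, with_operator: true },
  { id: 11, category_id: 1, with_operator: true },
  { id: 12, category_id: 2, with_operator: false }
].freeze

# INNER JOIN: one output row per matching (category, product) pair.
# Categories without products produce no rows at all.
def inner_join(categories, products)
  categories.flat_map do |category|
    products
      .select { |product| product[:category_id] == category[:id] }
      .map { |product| { category: category, product: product } }
  end
end

def categories_with_operator(categories, products)
  inner_join(categories, products)
    .select { |row| row[:product][:with_operator] } # WHERE products.with_operator
    .map { |row| row[:category] }                   # SELECT categories.*
    .uniq                                           # DISTINCT
end

p inner_join(CATEGORIES, PRODUCTS).size
p categories_with_operator(CATEGORIES, PRODUCTS).map { |c| c[:name] }
```

Without the final uniq, "Tools" would come back twice, once for each matching product.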
I have a small rails app, and I'm trying to get some order statistics.
So I have an Admin model and an Order model, with a one-to-many association.
class Admin < ActiveRecord::Base
  attr_accessible :name
  has_many :orders
end
class Order < ActiveRecord::Base
  attr_accessible :operation
  belongs_to :admin
end
And I'm trying to get specific orders using this query:
admins = Admin.where(...).includes(:orders).where('orders.operation = ?', 'new gifts!')
That works just as expected. But when I try to build JSON using map like this:
admins.map {|a| [a.name, a.orders.pluck(:operation)]}
Rails loads the orders again using a new query for each admin, ignoring the already loaded objects.
(5.6ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 26
(6.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 24
(2.9ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 30
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 29
(4.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 27
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 28
(5.1ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 25
When I use an each loop instead of map, it works as it should:
admins.each do |a|
  p a.orders.pluck(:operation)
end
This code doesn't load all orders; it prints only those loaded in the first query.
Is it possible to get the same result using map? What are the drawbacks of using loop instead of map?
pluck should always make a new query to the database. I'm not sure why you think that doesn't happen in an each loop. Maybe you did not see the log entries because they are interleaved with your printed output?
There are two ways to avoid the additional queries:
Since the orders are already loaded because you include them, you can do admins.map {|a| [a.name, a.orders.collect(&:operation)]}
Use joins (see @tihom's comment).
Edit: I just tested the each/map behavior, and it reloads every time, as expected.
Well, I'm confused about Rails queries. For example:
Affiche belongs_to :place
Place has_many :affiches
We can do this now:
@affiches = Affiche.all( :joins => :place )
or
@affiches = Affiche.all( :include => :place )
and we will get a lot of extra SELECTs, if there are many affiches:
Place Load (0.2ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.3ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.8ms) SELECT "places".* FROM "places" WHERE "places"."id" = 444 LIMIT 1
Place Load (1.0ms) SELECT "places".* FROM "places" WHERE "places"."id" = 222 LIMIT 1
...and so on...
And (sic!) with :joins used, every SELECT is doubled!
Technically we could just write this:
@affiches = Affiche.all( )
and the result is exactly the same! (Because we have the relations declared.) The way out, to keep all the data in one query, is to remove the relations and write a big string with "LEFT OUTER JOIN", but then there is still the problem of grouping the data into a multi-dimensional array, and the problem of identical column names, such as id.
What is done wrong? Or what am I doing wrong?
UPDATE:
Well, I do have this line: Place Load (2.5ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20)) followed by a list of selects, one per id. Strange, but I get these separate selects when I'm doing this in the each scope:
<%= link_to a.place.name, **a.place**( :id => a.place.friendly_id ) %>
The marked a.place is the spot that produces these extra queries.
UPDATE 2:
And let me do some math. In console we have:
Affiche Load (1.8ms) SELECT affiches.*, places.name FROM "affiches" LEFT OUTER JOIN "places" ON "places"."id" = "affiches"."place_id" ORDER BY affiches.event_date DESC
<VS>
Affiche Load (1.2ms) SELECT "affiches".* FROM "affiches"
Place Load (2.9ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20))
That comes out to 1.8ms versus 4.1ms (1.2ms + 2.9ms), which is pretty confusing...
Something is really strange here, because the :include option is intended to gather the place_id attribute from every affiche and then fetch all places at once using a select query like this:
select * from places where id in (3, 444, 222)
You can check that in rails console. Just start it and run that snippet:
ActiveRecord::Base.logger = Logger.new STDOUT
Affiche.all :include => :place
You might be accidentally fetching affiches somewhere in your code without actually including places, and then calling place on every affiche, which makes Rails perform a separate query for each one.
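To make the intended behaviour concrete, here is a rough plain-Ruby sketch of what that kind of eager loading does conceptually: collect the foreign keys, fetch the parents in one batch, then attach them in memory. The Struct classes, the sample data and the helpers `find_places` and `preload_places` are stand-ins invented for this example; it is not Active Record's actual implementation.

```ruby
Place   = Struct.new(:id, :name)
Affiche = Struct.new(:id, :place_id, :place)

# Hypothetical data; PLACES_TABLE stands in for the places table.
PLACES_TABLE = {
  3   => Place.new(3, "Club"),
  444 => Place.new(444, "Theatre"),
  222 => Place.new(222, "Gallery")
}.freeze

QUERY_LOG = []

# Stand-in for: SELECT places.* FROM places WHERE id IN (...)
def find_places(ids)
  QUERY_LOG << "SELECT places.* FROM places WHERE id IN (#{ids.join(', ')})"
  PLACES_TABLE.values_at(*ids).compact
end

# Collect the foreign keys, fetch all parents in one batch,
# then assign each parent to its children without further queries.
def preload_places(affiches)
  ids = affiches.map(&:place_id).uniq
  places_by_id = find_places(ids).map { |place| [place.id, place] }.to_h
  affiches.each { |affiche| affiche.place = places_by_id[affiche.place_id] }
end

AFFICHES = [
  Affiche.new(1, 3), Affiche.new(2, 3), Affiche.new(3, 444), Affiche.new(4, 222)
]

preload_places(AFFICHES)
puts QUERY_LOG
puts AFFICHES.map { |affiche| affiche.place.name }.inspect
```

Four affiches cost a single lookup here. If a.place were resolved lazily for each affiche instead, the log would contain one entry per affiche, which is the pattern shown in the question.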
Given the following model:
Room (id, title, suggested)
has_many :room_apps, :dependent => :destroy
RoomApp (room_id, app_id, appable_id, appable_type)
belongs_to :appable, :polymorphic => true
has_many :colors, :as => :appable
has_many :shirts, :as => :appable
Colors (room_id)
belongs_to :room
belongs_to :room_app
belongs_to :app
What I want to do is get all the suggested rooms. In my controller I have:
@suggested_rooms = Room.includes(:room_apps).find_all_by_suggested(true).first(5)
The problem here is that the includes is not working and the DB is being hit several times:
Processing by PagesController#splash as HTML
Room Load (0.6ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't' ORDER BY last_activity_at DESC
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND ("room_apps".room_id IN (5,4,3)) ORDER BY created_at DESC
RoomApp Load (5.9ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 6 AND ("room_apps".room_id = 5) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 5 LIMIT 1
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 5 AND ("room_apps".room_id = 4) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 4 LIMIT 1
RoomApp Load (0.4ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 4 AND ("room_apps".room_id = 3) ORDER BY created_at DESC LIMIT 1
Color Load (0.3ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 3 LIMIT 1
Is something set up incorrectly? I'd like to be able to get the suggested rooms and use includes for room_apps with one hit, versus the current behavior, where there's a hit for every room.
Ideas? Thanks
I think you'll either want to use the full Rails 3 Arel interface like so:
@suggested_rooms = Room.includes(:room_apps).where(:suggested => true).limit(5)
Or do this for Rails 2.3.x:
@suggested_rooms = Room.find_all_by_suggested(true, :include=>:room_apps).first(5)
Did some digging around and I think I have an idea what's going on.
includes by default does not generate a single query. It generates one query per model involved: one for the base model plus one for each included association.
ruby-1.9.2-p180 :014 > Room.where(:suggested => true).includes(:room_apps => :colors)
Room Load (0.5ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't'
RoomApp Load (0.8ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."room_id" IN (1)
Color Load (0.5ms) SELECT "colors".* FROM "colors" WHERE "colors"."room_app_id" IN (1)
One exception to this is if you have a where clause that references the table of one of the included models; in that case it will use a LEFT OUTER JOIN so that the where clause can be applied to that table.
If you want to INNER JOIN a bunch of models AND include them, you have to use both joins and includes with the given models. joins alone will only do the INNER JOIN across the relations; includes will pull in the fields and set up the returned models with their relations intact.
ruby-1.9.2-p180 :015 > Room.where(:suggested => true).joins(:room_apps => :colors)
Room Load (0.8ms) SELECT "rooms".*
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
ruby-1.9.2-p180 :016 > Room.where(:suggested => true).joins(:room_apps => :colors).includes(:room_apps => :colors)
SQL (0.6ms) SELECT "rooms"."id" AS t0_r0, "rooms"."suggested" AS t0_r1, "rooms"."created_at" AS t0_r2, "rooms"."updated_at" AS t0_r3, "room_apps"."id" AS t1_r0, "room_apps"."room_id" AS t1_r1, "room_apps"."created_at" AS t1_r2, "room_apps"."updated_at" AS t1_r3, "colors"."id" AS t2_r0, "colors"."room_id" AS t2_r1, "colors"."room_app_id" AS t2_r2, "colors"."created_at" AS t2_r3, "colors"."updated_at" AS t2_r4
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
The big convoluted SELECT part in the last query is ARel making sure that the fields from all of the models are unique and able to be differentiated when they need to be mapped back to the actual models.
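A rough sketch of that aliasing idea in plain Ruby, in case it helps: every column gets a unique alias built from the table index and column index, and the same list is used to split a flat result row back into per-table attributes. The column lists, the sample row and the helper names are hypothetical, and this is only an illustration of the idea, not the actual ActiveRecord/ARel code.

```ruby
# Hypothetical column lists; the real ones come from the database schema.
TABLES = {
  "rooms"     => %w[id suggested],
  "room_apps" => %w[id room_id],
  "colors"    => %w[id room_app_id]
}.freeze

# Build [alias, table, column] triples such as ["t1_r0", "room_apps", "id"].
def column_aliases(tables)
  tables.each_with_index.flat_map do |(table, columns), t|
    columns.each_with_index.map { |column, r| ["t#{t}_r#{r}", table, column] }
  end
end

# SELECT list in which every column gets a unique alias, so the three
# "id" columns can no longer be confused with each other.
def select_list(tables)
  column_aliases(tables)
    .map { |name, table, column| %("#{table}"."#{column}" AS #{name}) }
    .join(", ")
end

# Split one flat result row back into one attribute hash per table.
def split_row(row, tables)
  column_aliases(tables).each_with_object({}) do |(name, table, column), result|
    (result[table] ||= {})[column] = row[name]
  end
end

ROW = { "t0_r0" => 1, "t0_r1" => true, "t1_r0" => 7, "t1_r1" => 1,
        "t2_r0" => 42, "t2_r1" => 7 }.freeze

puts select_list(TABLES)
p split_row(ROW, TABLES)
```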
Whether you use includes alone or includes with joins is a matter of how much data you're bringing back, and of how much speed you might lose to the INNER JOIN, which causes a great deal of duplicate data to be returned. I would imagine that if 'rooms' had something like a dozen fields (say 13) and 'colors' had 1 field, but there were 100 colors that mapped to a single room, then instead of pulling back 113 fields in total (1 room * 13 + 100 colors * 1) you would end up with 1400 fields ((13 + 1) * 100 colors). Not exactly a performance boost.
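The same arithmetic as a few lines of Ruby, for anyone who wants to plug in their own numbers. The constants are just the example figures from this paragraph, the method names are made up, and room_apps is ignored here exactly as it is in the estimate above.

```ruby
ROOM_FIELDS     = 13   # columns per room row (the "dozen or so")
COLOR_FIELDS    = 1    # columns per color row
COLORS_PER_ROOM = 100

# includes alone: each room is fetched once, colors come from a separate query.
def fields_with_separate_queries(rooms)
  rooms * (ROOM_FIELDS + COLORS_PER_ROOM * COLOR_FIELDS)
end

# includes + joins: every joined row repeats all of the room's columns.
def fields_with_single_join(rooms)
  rooms * COLORS_PER_ROOM * (ROOM_FIELDS + COLOR_FIELDS)
end

puts fields_with_separate_queries(1)  # 113
puts fields_with_single_join(1)       # 1400
```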
The downside of using includes alone, though, is that if you do have a large number of colors per room, the IN(ids) list will be huge. It's a bit of a double-edged sword.
Here's a quick test I did with various configurations using sqlite3.
I set up two sets of rooms, one with :suggested => true, the other with :suggested => false. The suggested rooms had a 1:1:2 ratio between rooms/room_apps/colors, the non-suggested rooms were set up with a 1:1:10 ratio of the same, and there is a 10:1 ratio between suggested and not suggested.
# 100/10 rooms
# insert only
100 * 1/1/2: 8.1ms
10 * 1/1/10: 3.2ms
# insert + joins
100 * 1/1/2: 6.2ms
10 * 1/1/10: 3.1ms
# 1000/100 rooms
# insert only
1000 * 1/1/2: 76.8ms
100 * 1/1/10: 19.8ms
# insert + joins
1000 * 1/1/2: 54.5ms
100 * 1/1/10: 23.1ms
The times themselves are not relevant; this is being run via IRB on an Ubuntu guest on a WinXP host on a crappy HDD. Given that you've got a limit(5) in there, it probably isn't going to make a huge difference either way.
I'm attempting to eager load in my Rails 3 app. I've narrowed it down to a very basic sample, and instead of generating the one query I'm expecting, it's generating 4.
First, here's a simple breakdown of my models.
class Profile < ActiveRecord::Base
  belongs_to :gender
  def to_param
    self.name
  end
end
class Gender < ActiveRecord::Base
  has_many :profiles, :dependent => :nullify
end
I then have a ProfilesController#show action, where I'm querying for the model.
class ProfilesController < ApplicationController
  before_filter :find_profile, :only => [:show]
  def show
  end
  private
  def find_profile
    @profile = Profile.find_by_username(params[:id], :include => :gender)
    raise ActiveRecord::RecordNotFound, "Page not found" unless @profile
  end
end
When I look at the queries this generates, it shows the following:
SELECT `profiles`.* FROM `profiles` WHERE `profiles`.`username` = 'matt' LIMIT 1
SELECT `genders`.* FROM `genders` WHERE (`genders`.`id` = 1)
What I expected to see is a single query:
SELECT `profiles`.*, `genders`.* FROM `profiles` LEFT JOIN `genders` ON `profiles`.gender_id = `genders`.id WHERE `profiles`.`username` = 'matt' LIMIT 1
Anyone know what I'm doing wrong here? Everything I've found on eager loading makes it sound like this should work.
Edit: After trying joins, as recommended by sled, I'm still seeing the same results.
The code:
@profile = Profile.joins(:gender).where(:username => params[:id]).limit(1).first
The query:
SELECT `profiles`.* FROM `profiles` INNER JOIN `genders` ON `genders`.`id` = `profiles`.`gender_id` WHERE `profiles`.`username` = 'matt' LIMIT 1
Again, you can see no genders data is being retrieved, and so a second query to genders is being made.
I even tried adding a select, to no avail:
@profile = Profile.joins(:gender).select('profiles.*, genders.*').where(:username => params[:id]).limit(1).first
which correctly resulted in:
SELECT profiles.*, genders.* FROM `profiles` INNER JOIN `genders` ON `genders`.`id` = `profiles`.`gender_id` WHERE `profiles`.`username` = 'matt' LIMIT 1
...but it still performed a second query on genders later when accessing @profile.gender's attributes.
Edit 2: I also tried creating a scope that includes both select and joins in order to get all the fields I require, (similar to the custom left join method sled demonstrated). It looks like this:
class Profile < ActiveRecord::Base
  # ...
  ALL_ATTRIBUTES = [:photo, :city, :gender, :relationship_status, :physique, :children,
                    :diet, :drink, :smoke, :drug, :education, :income, :job, :politic, :religion, :zodiac]
  scope :with_attributes,
    select((ALL_ATTRIBUTES.collect { |a| "`#{reflect_on_association(a).table_name}`.*" } + ["`#{table_name}`.*"]).join(', ')).
    joins(ALL_ATTRIBUTES.collect { |a|
      assoc = reflect_on_association(a)
      "LEFT JOIN `#{assoc.table_name}` ON `#{table_name}`.#{assoc.primary_key_name} = `#{assoc.table_name}`.#{assoc.active_record_primary_key}"
    }.join(' '))
  # ...
end
This generates the following query, which appears correct:
SELECT `photos`.*, `cities`.*, `profile_genders`.*, `profile_relationship_statuses`.*, `profile_physiques`.*, `profile_children`.*, `profile_diets`.*, `profile_drinks`.*, `profile_smokes`.*, `profile_drugs`.*, `profile_educations`.*, `profile_incomes`.*, `profile_jobs`.*, `profile_politics`.*, `profile_religions`.*, `profile_zodiacs`.*, `profiles`.* FROM `profiles` LEFT JOIN `photos` ON `profiles`.photo_id = `photos`.id LEFT JOIN `cities` ON `profiles`.city_id = `cities`.id LEFT JOIN `profile_genders` ON `profiles`.gender_id = `profile_genders`.id LEFT JOIN `profile_relationship_statuses` ON `profiles`.relationship_status_id = `profile_relationship_statuses`.id LEFT JOIN `profile_physiques` ON `profiles`.physique_id = `profile_physiques`.id LEFT JOIN `profile_children` ON `profiles`.children_id = `profile_children`.id LEFT JOIN `profile_diets` ON `profiles`.diet_id = `profile_diets`.id LEFT JOIN `profile_drinks` ON `profiles`.drink_id = `profile_drinks`.id LEFT JOIN `profile_smokes` ON `profiles`.smoke_id = `profile_smokes`.id LEFT JOIN `profile_drugs` ON `profiles`.drug_id = `profile_drugs`.id LEFT JOIN `profile_educations` ON `profiles`.education_id = `profile_educations`.id LEFT JOIN `profile_incomes` ON `profiles`.income_id = `profile_incomes`.id LEFT JOIN `profile_jobs` ON `profiles`.job_id = `profile_jobs`.id LEFT JOIN `profile_politics` ON `profiles`.politic_id = `profile_politics`.id LEFT JOIN `profile_religions` ON `profiles`.religion_id = `profile_religions`.id LEFT JOIN `profile_zodiacs` ON `profiles`.zodiac_id = `profile_zodiacs`.id WHERE `profiles`.`username` = 'matt' LIMIT 1
Unfortunately, it doesn't seem that calls to relationship attributes (e.g.: @profile.gender.name) are using the data that was returned in the original SELECT. Instead, I see a flood of queries following this first one:
Profile::Gender Load (0.2ms) SELECT `profile_genders`.* FROM `profile_genders` WHERE `profile_genders`.`id` = 1 LIMIT 1
Profile::Gender Load (0.4ms) SELECT `profile_genders`.* FROM `profile_genders` INNER JOIN `profile_attractions` ON `profile_genders`.id = `profile_attractions`.gender_id WHERE ((`profile_attractions`.profile_id = 2))
City Load (0.4ms) SELECT `cities`.* FROM `cities` WHERE `cities`.`id` = 1 LIMIT 1
Country Load (0.3ms) SELECT `countries`.* FROM `countries` WHERE `countries`.`id` = 228 ORDER BY FIELD(code, 'US') DESC, name ASC LIMIT 1
Profile Load (0.4ms) SELECT `profiles`.* FROM `profiles` WHERE `profiles`.`id` = 2 LIMIT 1
Profile::Language Load (0.4ms) SELECT `profile_languages`.* FROM `profile_languages` INNER JOIN `profile_profiles_languages` ON `profile_languages`.id = `profile_profiles_languages`.language_id WHERE ((`profile_profiles_languages`.profile_id = 2))
SQL (0.3ms) SELECT COUNT(*) FROM `profile_ethnicities` INNER JOIN `profile_profiles_ethnicities` ON `profile_ethnicities`.id = `profile_profiles_ethnicities`.ethnicity_id WHERE ((`profile_profiles_ethnicities`.profile_id = 2))
Profile::Religion Load (0.5ms) SELECT `profile_religions`.* FROM `profile_religions` WHERE `profile_religions`.`id` = 2 LIMIT 1
Profile::Politic Load (0.2ms) SELECT `profile_politics`.* FROM `profile_politics` WHERE `profile_politics`.`id` = 3 LIMIT 1
Your example is fine, and it will end up as two queries because that's how eager loading is implemented in Rails. It comes in handy if you have many associated records. You can read more about it here
What you probably want is a simple join:
@profile = Profile.joins(:gender).where(:username => params[:id])
Edit
If the profile consists of many pieces, there are multiple approaches:
Custom left joins. Maybe there is a plugin out there that does the job; otherwise I'd suggest doing something like:
class Profile < ActiveRecord::Base
  # .... code .....
  def self.with_dependencies
    attr_joins = []
    attr_selects = []
    attr_selects << "`profiles`.*"
    attr_selects << "`genders`.*"
    attr_selects << "`colors`.*"
    # The ON clause must reference the joined table by its real name (`genders`).
    attr_joins << "LEFT JOIN `genders` ON `genders`.`id` = `profiles`.gender_id"
    attr_joins << "LEFT JOIN `colors` ON `colors`.`id` = `profiles`.color_id"
    prep_model = select(attr_selects.join(','))
    attr_joins.each do |c_join|
      prep_model = prep_model.joins(c_join)
    end
    return prep_model
  end
end
Now you could do something like:
@profile = Profile.with_dependencies.where(:username => params[:id])
Another solution is to use :include => [:gender, :color]. It may take a few more queries, but it's the cleaner "Rails way". If you run into performance issues you may want to rethink your DB schema, but do you really have such a heavy load?
A friend of mine wrote a nice little solution for these simple 1:n relations (like genders); it's called simple_enum.
After working with sled's suggestions, I finally came up with this solution. I'm sure it could be made cleaner with a plugin, but here's what I've got for now:
class Profile < ActiveRecord::Base
  ALL_ATTRIBUTES = [:photo, :city, :gender, :relationship_status, :physique, :children,
                    :diet, :drink, :smoke, :drug, :education, :income, :job, :politic, :religion, :zodiac]
  scope :with_attributes,
    includes(ALL_ATTRIBUTES).
    select((ALL_ATTRIBUTES.collect { |a| "`#{reflect_on_association(a).table_name}`.*" } + ["`#{table_name}`.*"]).join(', '))
end
The two main points are:
A call to includes, which passes the symbols of the relationships I want
A call to select that makes sure to retrieve all columns for the related tables. Note that I call reflect_on_association so that I don't have to hard-code the related tables' names, letting the Rails models do the work for me.
I can now call:
Profile.with_attributes.where(:username => params[:id]).limit(1).first
Going to mark sled's answer as correct since it's his help (answers + comments combined) that led me here, even though this is the code I'm ultimately using.