active record relations – who needs it? - ruby-on-rails

Well, I'm confused about Rails queries. For example:
Affiche belongs_to :place
Place has_many :affiches
We can do this now:
@affiches = Affiche.all( :joins => :place )
or
@affiches = Affiche.all( :include => :place )
and we will get a lot of extra SELECTs, if there are many affiches:
Place Load (0.2ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.3ms) SELECT "places".* FROM "places" WHERE "places"."id" = 3 LIMIT 1
Place Load (0.8ms) SELECT "places".* FROM "places" WHERE "places"."id" = 444 LIMIT 1
Place Load (1.0ms) SELECT "places".* FROM "places" WHERE "places"."id" = 222 LIMIT 1
...and so on...
And, believe it or not, with :joins every SELECT is doubled!
Technically we could just write this:
@affiches = Affiche.all( )
and the result is exactly the same (because we have the relations declared). The way to keep all the data in one query is to remove the relations and write a big string with "LEFT OUTER JOIN", but then there is still the problem of grouping the data into a multi-dimensional array and the problem of identical column names, such as id.
What is done wrong? Or what am I doing wrong?
UPDATE:
Well, I do have that line, Place Load (2.5ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20)), and then a list of selects one id at a time. Strange, but I get these separate selects when I'm doing this in each scope:
<%= link_to a.place.name, **a.place**( :id => a.place.friendly_id ) %>
The marked a.place is the spot that produces these extra queries.
UPDATE 2:
And let me do some math. In console we have:
Affiche Load (1.8ms) SELECT affiches.*, places.name FROM "affiches" LEFT OUTER JOIN "places" ON "places"."id" = "affiches"."place_id" ORDER BY affiches.event_date DESC
<VS>
Affiche Load (1.2ms) SELECT "affiches".* FROM "affiches"
Place Load (2.9ms) SELECT "places".* FROM "places" WHERE ("places"."id" IN (3,444,222,57,663,32,154,20))
That comes out to 1.8ms versus 4.1ms, which is pretty confusing...

Something is really strange here, because the :include option is intended to gather the place_id attribute from every affiche and then fetch all places at once with a select query like this:
select * from places where id in (3, 444, 222)
You can check that in the Rails console. Just start it and run this snippet:
ActiveRecord::Base.logger = Logger.new STDOUT
Affiche.all :include => :place
You might be accidentally fetching affiches somewhere in your code without actually including places, and then calling place on every affiche, which makes Rails perform a separate query for each of them.
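To make the difference concrete, here is a minimal plain-Ruby sketch of the two loading strategies. It does not use ActiveRecord at all: FakePlaces, its find and where_in methods, and the place names are made up for illustration, and the query counter only models the number of SELECTs each strategy would issue.

```ruby
# Plain-Ruby model of the two loading strategies; no ActiveRecord involved.
# FakePlaces is a hypothetical stand-in for the places table.
class FakePlaces
  attr_reader :query_count

  def initialize(rows)
    @rows = rows # { id => name }
    @query_count = 0
  end

  # Models: SELECT * FROM places WHERE id = ? LIMIT 1
  def find(id)
    @query_count += 1
    @rows[id]
  end

  # Models: SELECT * FROM places WHERE id IN (...)
  def where_in(ids)
    @query_count += 1
    @rows.slice(*ids)
  end
end

PLACE_ROWS = { 3 => "Club", 444 => "Theatre", 222 => "Arena" }.freeze
AFFICHE_PLACE_IDS = [3, 3, 444, 222].freeze # place_id of each affiche

# Strategy 1: one lookup per affiche (the pattern in the question's log).
def names_one_by_one(db, place_ids)
  place_ids.map { |id| db.find(id) }
end

# Strategy 2: collect the distinct ids, fetch once, then look up in memory.
def names_batched(db, place_ids)
  places = db.where_in(place_ids.uniq)
  place_ids.map { |id| places[id] }
end

lazy_db = FakePlaces.new(PLACE_ROWS)
batch_db = FakePlaces.new(PLACE_ROWS)
names_one_by_one(lazy_db, AFFICHE_PLACE_IDS)
names_batched(batch_db, AFFICHE_PLACE_IDS)

puts "one by one: #{lazy_db.query_count} queries"
puts "batched:    #{batch_db.query_count} query"
```

Both strategies return the same names; only the number of round trips differs, and the one-by-one count grows with the number of affiches.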

Related

Count records after where clause

I have three models: Catalog, Upload and Product. A product belongs to a catalog, and an upload belongs to a product.
I need to count the number of uploads for all the products of a given catalog.
This is the way I've been doing it so far, which is incredibly slow for a large amount of uploads or products:
@products = Product.where(catalog_id: 123)
@uploads_count = Upload.where(product_id: @products.pluck(:id)).count
I'd like to avoid loading all the products just for a count.
Should I use raw SQL, or is there a better way to do this with ActiveRecord?
This should do it for you:
Upload.joins(:product).where(products: { catalog_id: 123 }).count
Using joins creates an INNER JOIN between the two tables, allowing you to query the products table as above.
Note the singular and plural uses of product - the joins should reflect the association (the upload belongs to one product), while the where clause always uses the table name, typically pluralised.
The SQL will look similar to:
SELECT COUNT(*) FROM "uploads"
INNER JOIN "products"
ON "products"."id" = "uploads"."product_id"
WHERE "products"."catalog_id" = 123
If you need to have more information on the catalog you can also include this, something like the following:
Upload.joins(product: :catalog).where(products: { catalogs: { whatever: 'you want to query' } }).count
Bear in mind, using joins is just for a query such as this. If you need to access attributes of the product or catalog, you should use another approach, such as includes, to preload the data and avoid N + 1 queries. There's a good read here if you're interested.
Another way to avoid selecting records is to use sub-query. This can be done the following way:
query = User.where(id: 1..100)
User.where(id: query.select(:id)).count
# [DEBUG] (10.5ms) SELECT COUNT(*) FROM "users" WHERE "users"."id" IN (SELECT "users"."id" FROM "users" WHERE ("users"."id" BETWEEN $1 AND $2)) [["id", 1], ["id", 100]]
# => 33
So, User.where(id: 1..100) prepares a query, that can be used as a sub-select. .select(:field) tells what field you are interested in.
Though for a basic count, SRack provides a good answer.

How can I minimize queries when serving JSON through a Rails API?

I'm preparing an API in rails to serve an AngularJS app. This app will provide a dashboard for managing people in a database, so the main page for an individual person is pulling in a lot of information. Here's the Jbuilder file I'm using to format the information as JSON:
json.extract! @person, :id, :employee_id, :display_name
json.appointments @person.appointments, :id, :jobcode, :title
json.flags @person.flags, :id, :name
json.source_relationships @person.source_relationships, :id, :source_id, :target_id, :relationship_type_id
json.target_relationships @person.target_relationships, :id, :source_id, :target_id, :relationship_type_id
The JSON returned looks like this (from /api/v1/people/1685.json):
{
"id":1685,
"employee_id":"9995999",
"display_name":"John Doe",
"appointments": [
{"id":353,"jobcode":"TE556","title":"Developer"}
],
"flags":[
{"id":5,"name":"Unclassified"},
{"id":7,"name":"Full Time"}
],
"source_relationships":[
{"id":19,"source_id":1685,"target_id":1648,"relationship_type_id":9},
{"id":21,"source_id":1685,"target_id":1606,"relationship_type_id":9}
],
"target_relationships":[
{"id":1,"source_id":1648,"target_id":1685,"relationship_type_id":10}
]
}
And the console shows these queries:
Person Load (0.1ms) SELECT `people`.* FROM `people` WHERE `people`.`id` = 1685 LIMIT 1
Appointment Load (0.1ms) SELECT `appointments`.* FROM `appointments` WHERE `appointments`.`person_id` = 1685
Flag Load (0.1ms) SELECT `flags`.* FROM `flags`
INNER JOIN `flags_people` ON `flags`.`id` = `flags_people`.`flag_id` WHERE `flags_people`.`person_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`source_id` = 1685
Relationship Load (0.1ms) SELECT `relationships`.* FROM `relationships` WHERE `relationships`.`target_id` = 1685
I like the way the JSON is formatted, but the fact that it has to run 5 separate queries seems inefficient. I tried adding joins() or includes() to the Active Record query, which is currently just @person = Person.find(params[:id]), but that didn't seem to do what I wanted. How can I cleanly minimize the number of queries while still returning JSON in a similar format?
The method I was looking for is eager_load. Not sure how I haven't come across it in the past, but it combined all the table queries into one using LEFT OUTER JOIN.
@person = Person.eager_load(:flags, :appointments,
  :source_relationships, :target_relationships).find(params[:id])
Results in this single query:
SELECT DISTINCT `people`.`id` FROM `people`
LEFT OUTER JOIN `flags_people` ON `flags_people`.`person_id` = `people`.`id`
LEFT OUTER JOIN `flags` ON `flags`.`id` = `flags_people`.`flag_id`
LEFT OUTER JOIN `appointments` ON `appointments`.`person_id` = `people`.`id`
LEFT OUTER JOIN `relationships` ON `relationships`.`source_id` = `people`.`id`
LEFT OUTER JOIN `relationships` `target_relationships_people` ON `target_relationships_people`.`target_id` = `people`.`id`
WHERE `people`.`id` = 1685 LIMIT 1
Found the explanation on this blog post from Arkency
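One thing worth keeping in mind with a single joined query: several LEFT OUTER JOINs on independent has_many associations return one row per combination of associated records, so the result set can get wide quickly. The sketch below is plain arithmetic in Ruby, not ActiveRecord code; the counts are read off the JSON in the question and the helper names are made up.

```ruby
# Association sizes for person 1685, read off the JSON shown in the question.
ASSOCIATION_COUNTS = {
  appointments: 1,
  flags: 2,
  source_relationships: 2,
  target_relationships: 1
}.freeze

# A LEFT OUTER JOIN keeps the parent row even when an association is empty,
# so an empty association contributes a factor of 1, not 0.
def joined_row_count(counts)
  counts.values.map { |n| [n, 1].max }.reduce(1, :*)
end

# Separate queries return one row per record, plus the person row itself.
def separate_row_count(counts)
  1 + counts.values.sum
end

puts "rows from one joined query:  #{joined_row_count(ASSOCIATION_COUNTS)}"
puts "rows from separate queries:  #{separate_row_count(ASSOCIATION_COUNTS)}"
```

For this person the joined query is smaller (4 rows against 7), but with ten records in each association it would be 10,000 rows against 41, so whether one query beats five depends on the data.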

ActiveRecord :includes - how to use map with loaded associations?

I have a small rails app, and I'm trying to get some order statistics.
So I have an Admin model, and an Order model, with one-to-many association.
class Admin < ActiveRecord::Base
  attr_accessible :name
  has_many :orders
end
class Order < ActiveRecord::Base
  attr_accessible :operation
  belongs_to :admin
end
And I'm trying to get specific orders using this query:
admins = Admin.where(...).includes(:orders).where('orders.operation = ?', 'new gifts!')
That works just as expected. But when I try to build JSON using map like this:
admins.map {|a| [a.name, a.orders.pluck(:operation)]}
Rails loads the orders again with a new query, ignoring the already loaded objects.
(5.6ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 26
(6.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 24
(2.9ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 30
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 29
(4.8ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 27
(3.3ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 28
(5.1ms) SELECT "orders"."operation" FROM "orders" WHERE "orders"."admin_id" = 25
When I try to use a loop instead of map, it works as it should:
admins.each do |a|
p a.orders.pluck(:operation)
end
This code doesn't load all orders, and prints only those loaded in the first query.
Is it possible to get the same result using map? What are the drawbacks of using loop instead of map?
pluck should always make a new query to the database. I'm not sure why you think that doesn't happen in an each loop. Maybe you missed the log lines because they are interleaved with your printed output?
There are two ways to avoid the additional queries:
Since the orders are already loaded because you include them, you can do admins.map {|a| [a.name, a.orders.collect(&:operation)]}
Use joins (see @tihom's comment).
Edit: I just tested the each/ map behavior and it reloads every time as expected.
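Here is a toy model of why the two calls behave differently, under the behavior described in this answer. It is plain Ruby and not ActiveRecord's implementation: FakeOrders and the sample rows are made up, pluck is modeled as always issuing a fresh SELECT, and map is modeled as reading only the records already in memory.

```ruby
# Toy model of a preloaded has_many association; not ActiveRecord's code.
class FakeOrders
  attr_reader :query_count

  def initialize(table_rows, preloaded_rows)
    @table_rows = table_rows    # every order stored for this admin
    @preloaded = preloaded_rows # subset already in memory (includes + where)
    @query_count = 0
  end

  # Models pluck as the answer describes it: always a fresh SELECT,
  # so the filter from the original query is not applied.
  def pluck(column)
    @query_count += 1
    @table_rows.map { |row| row[column] }
  end

  # Models Enumerable access on a loaded association: memory only.
  def map(&block)
    @preloaded.map(&block)
  end
end

ALL_ORDERS = [{ operation: "new gifts!" }, { operation: "refund" }].freeze

orders = FakeOrders.new(ALL_ORDERS, ALL_ORDERS.first(1))
p orders.map { |o| o[:operation] } # uses the preloaded rows, no query
p orders.pluck(:operation)         # hits the "database" again
puts "queries issued: #{orders.query_count}"
```

This also matches the log in the question, where the extra SELECTs carry no condition on operation: the fresh query does not know about the filter used when the admins were loaded.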

Effectively query the number of friends?

I'm currently getting all friends of a certain user first and then taking the size of the array as the number of friends. My concern is that, since I'm retrieving all of the friends' information from the database every time, this is potentially (or obviously) inefficient. So I'm wondering if there's a way to query the number of friends of a certain user efficiently, without getting any other information?
If your relations are declared correctly, you should be able to do user.friends.count in order to generate a DB-level count.
See here in my console the SQL queries generated (Drug has_many :details and DrugDetail belongs_to :drug):
irb(main):003:0> Drug.first.details
Drug Load (0.7ms) SELECT "drugs".* FROM "drugs" LIMIT 1
DrugDetail Load (1.9ms) SELECT "drug_details".* FROM "drug_details" WHERE "drug_details"."drug_id" = 1771
=> []
irb(main):004:0> Drug.first.details.count
Drug Load (0.7ms) SELECT "drugs".* FROM "drugs" LIMIT 1
(0.6ms) SELECT COUNT(*) FROM "drug_details" WHERE "drug_details"."drug_id" = 1771
=> 0
irb(main):006:0> Drug.first.details.to_a.size
Drug Load (2.1ms) SELECT "drugs".* FROM "drugs" LIMIT 1
DrugDetail Load (0.5ms) SELECT "drug_details".* FROM "drug_details" WHERE "drug_details"."drug_id" = 1771
=> 0
In your case, if you have your relations like this:
User has_many :friends
Friend belongs_to :user
Then this should be executed at the DB-level and be faster than your first piece of code:
User.first.friends.count

How to use includes with 3 models w ActiveRecord?

Given the following model:
Room (id, title, suggested)
has_many :room_apps, :dependent => :destroy
RoomApp (room_id, app_id, appable_id, appable_type)
belongs_to :appable, :polymorphic => true
has_many :colors, :as => :appable
has_many :shirts, :as => :appable
Colors (room_id)
belongs_to :room
belongs_to :room_app
belongs_to :app
What I want to do is get all the suggested rooms. In my controller I have:
@suggested_rooms = Room.includes(:room_apps).find_all_by_suggested(true).first(5)
The problem here is that the includes is not working and the db is being hit several times:
Processing by PagesController#splash as HTML
Room Load (0.6ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't' ORDER BY last_activity_at DESC
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND ("room_apps".room_id IN (5,4,3)) ORDER BY created_at DESC
RoomApp Load (5.9ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 6 AND ("room_apps".room_id = 5) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 5 LIMIT 1
RoomApp Load (0.6ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 5 AND ("room_apps".room_id = 4) ORDER BY created_at DESC LIMIT 1
Color Load (0.4ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 4 LIMIT 1
RoomApp Load (0.4ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."published" = 't' AND "room_apps"."id" = 4 AND ("room_apps".room_id = 3) ORDER BY created_at DESC LIMIT 1
Color Load (0.3ms) SELECT "colors".* FROM "colors" WHERE "colors"."id" = 3 LIMIT 1
Is something set up incorrectly? I'd like to be able to get the suggested rooms and use includes for room_apps with one hit, versus the current behavior where there's a hit for every room.
Ideas? Thanks
I think you'll either want to use the full Rails3 arel interface like so:
@suggested_rooms = Room.includes(:room_apps).where(:suggested => true).limit(5)
Or do this for Rails 2.3x:
@suggested_rooms = Room.find_all_by_suggested(true, :include=>:room_apps).first(5)
Did some digging around and I think I have an idea what's going on.
includes by default does not generate a single query. It generates one query for the base model plus one for each model being included.
ruby-1.9.2-p180 :014 > Room.where(:suggested => true).includes(:room_apps => :colors)
Room Load (0.5ms) SELECT "rooms".* FROM "rooms" WHERE "rooms"."suggested" = 't'
RoomApp Load (0.8ms) SELECT "room_apps".* FROM "room_apps" WHERE "room_apps"."room_id" IN (1)
Color Load (0.5ms) SELECT "colors".* FROM "colors" WHERE "colors"."room_app_id" IN (1)
One exception to this is a where clause that references one of the included models' tables. In that case it will use a LEFT OUTER JOIN so the where clause can be applied to that table.
If you want to INNER JOIN a bunch of models AND include them, you have to use both joins and includes with the given models. joins alone will only do the INNER JOIN across the relations; includes will pull in the fields and set up the returned models with their relations intact.
ruby-1.9.2-p180 :015 > Room.where(:suggested => true).joins(:room_apps => :colors)
Room Load (0.8ms) SELECT "rooms".*
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
ruby-1.9.2-p180 :016 > Room.where(:suggested => true).joins(:room_apps => :colors).includes(:room_apps => :colors)
SQL (0.6ms) SELECT "rooms"."id" AS t0_r0, "rooms"."suggested" AS t0_r1, "rooms"."created_at" AS t0_r2, "rooms"."updated_at" AS t0_r3, "room_apps"."id" AS t1_r0, "room_apps"."room_id" AS t1_r1, "room_apps"."created_at" AS t1_r2, "room_apps"."updated_at" AS t1_r3, "colors"."id" AS t2_r0, "colors"."room_id" AS t2_r1, "colors"."room_app_id" AS t2_r2, "colors"."created_at" AS t2_r3, "colors"."updated_at" AS t2_r4
FROM "rooms"
INNER JOIN "room_apps"
ON "room_apps"."room_id" = "rooms"."id"
INNER JOIN "colors"
ON "colors"."room_app_id" = "room_apps"."id"
WHERE "rooms"."suggested" = 't'
The big convoluted SELECT part in the last query is ARel making sure that the fields from all of the models are unique and able to be differentiated when they need to be mapped back to the actual models.
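As a rough illustration of that aliasing scheme, the sketch below builds the same tN_rM names from the column lists in the SELECT above and then splits a flat result row back into one hash per table. It is plain Ruby with made-up helper names (alias_map, select_list, split_row), not how ARel itself is implemented.

```ruby
# Column lists taken from the SELECT above; the helper names are made up.
TABLES = {
  "rooms"     => %w[id suggested created_at updated_at],
  "room_apps" => %w[id room_id created_at updated_at],
  "colors"    => %w[id room_id room_app_id created_at updated_at]
}.freeze

# Builds { "t0_r0" => ["rooms", "id"], ... }: table index, then column index.
def alias_map(tables)
  tables.each_with_index.flat_map do |(table, columns), t|
    columns.each_with_index.map { |column, c| ["t#{t}_r#{c}", [table, column]] }
  end.to_h
end

# Renders the SELECT list with quoted identifiers and unique aliases.
def select_list(tables)
  alias_map(tables).map { |name, (table, column)| %("#{table}"."#{column}" AS #{name}) }.join(", ")
end

# Splits one flat result row back into a hash per table.
def split_row(row, tables)
  alias_map(tables).each_with_object({}) do |(name, (table, column)), out|
    (out[table] ||= {})[column] = row[name]
  end
end

puts select_list(TABLES)

# Three columns all named "id" stay distinguishable through their aliases.
row = { "t0_r0" => 1, "t1_r0" => 7, "t2_r0" => 9 }
p split_row(row, TABLES).transform_values { |columns| columns["id"] }
```

This is also the answer to the "similar column names, such as id" worry with hand-written joins: each column gets an alias that encodes which table it came from.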
Whether you use includes alone or includes with joins is a matter of how much data you're bringing back, and how much speed you lose when the INNER JOIN returns a great deal of duplicate data. I would imagine that if 'rooms' had 13 fields and 'colors' had 1 field, and there were 100 colors that mapped to a single room, then instead of pulling back 113 fields in total (1 room * 13 + 100 colors * 1) you would end up with 1400 fields ((13 + 1) * 100 colors). Not exactly a performance boost.
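The 113 versus 1400 comparison can be reproduced with a few lines of Ruby. This is only the arithmetic from the paragraph above, with made-up constant and method names; it says nothing about actual query timings.

```ruby
ROOM_FIELDS = 13      # columns on a room row (the assumption used above)
COLOR_FIELDS = 1      # columns on a color row
COLORS_PER_ROOM = 100 # colors that map to a single room

# includes alone: the room row once, plus one narrow row per color.
def fields_with_separate_queries(room_fields, color_fields, colors)
  room_fields + color_fields * colors
end

# Single JOIN query: every result row repeats the room columns.
def fields_with_join(room_fields, color_fields, colors)
  (room_fields + color_fields) * colors
end

separate = fields_with_separate_queries(ROOM_FIELDS, COLOR_FIELDS, COLORS_PER_ROOM)
joined = fields_with_join(ROOM_FIELDS, COLOR_FIELDS, COLORS_PER_ROOM)

puts "separate queries: #{separate} fields"
puts "single join:      #{joined} fields"
```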
The downside of using includes alone is that if you do have a large number of colors per room, the IN (ids) list will be huge, so it is a bit of a double-edged sword.
Here's a quick test I did with various configurations using sqlite3.
I set up two sets of rooms, one with :suggested => true, the other with :suggested => false. The suggested rooms had a 1:1:2 ratio between rooms/room_apps/colors, the non-suggested rooms were set up with a 1:1:10 ratio of the same, and there is a 10:1 ratio between suggested and not suggested.
# 100/10 rooms
# insert only
100 * 1/1/2: 8.1ms
10 * 1/1/10: 3.2ms
# insert + joins
100 * 1/1/2: 6.2ms
10 * 1/1/10: 3.1ms
# 1000/100 rooms
# insert only
1000 * 1/1/2: 76.8ms
100 * 1/1/10: 19.8ms
# insert + joins
1000 * 1/1/2: 54.5ms
100 * 1/1/10: 23.1ms
The times themselves are not meaningful; this was run via IRB on an Ubuntu guest on a WinXP host with a crappy HDD. Given that you've got a limit(5) in there, it probably isn't going to make a huge difference either way.
