Rails 4: Joins vs Includes: Why different results with nested association? - ruby-on-rails

In Rails 4 app, I have two models:
Merchant has_many :offering_specials
OfferingSpecial belongs_to :merchant
I want to retrieve all merchants and their special offerings that are open (with the status_code: "OP")
I tried this:
@merchants = Merchant.joins(:offering_specials).where(offering_specials: { status_code: "OP" })
This is the query:
Merchant Load (0.4ms) SELECT `merchants`.* FROM `merchants` INNER JOIN `offering_specials` ON `offering_specials`.`merchant_id` = `merchants`.`id` WHERE `offering_specials`.`status_code` = 'OP'
But it retrieved all offering specials, both the open ("OP") and the pending ("PN").
However, using includes worked:
@merchants = Merchant.includes(:offering_specials).where(offering_specials: { status_code: "OP" })
This retrieved only the open offering specials. But look at the much slower query:
SQL (19.9ms) SELECT `merchants`.`id` AS t0_r0, `merchants`.`name` AS t0_r1, `merchants`.`slug` AS t0_r2, `merchants`.`url` AS t0_r3, `merchants`.`summary` AS t0_r4, `merchants`.`description` AS t0_r5, `merchants`.`active_for_display` AS t0_r6, `merchants`.`active_for_offerings_by_merchant` AS t0_r7, `merchants`.`active_for_offerings_by_legatocard` AS t0_r8, `merchants`.`credit_limit` AS t0_r9, `merchants`.`search_location_code` AS t0_r10, `merchants`.`image_file_name` AS t0_r11, `merchants`.`image_file_size` AS t0_r12, `merchants`.`image_content_type` AS t0_r13, `merchants`.`image_updated_at` AS t0_r14, `merchants`.`logo_file_name` AS t0_r15, `merchants`.`logo_file_size` AS t0_r16, `merchants`.`logo_content_type` AS t0_r17, `merchants`.`logo_updated_at` AS t0_r18, `merchants`.`created_at` AS t0_r19, `merchants`.`updated_at` AS t0_r20, `offering_specials`.`id` AS t1_r0, `offering_specials`.`special_number` AS t1_r1, `offering_specials`.`merchant_id` AS t1_r2, `offering_specials`.`merchant_user_id` AS t1_r3, `offering_specials`.`nonprofit_percentage` AS t1_r4, `offering_specials`.`discount_percentage` AS t1_r5, `offering_specials`.`start_at` AS t1_r6, `offering_specials`.`end_at` AS t1_r7, `offering_specials`.`closed_at` AS t1_r8, `offering_specials`.`max_dollar_amount_for_offering` AS t1_r9, `offering_specials`.`max_dollar_amount_per_buyer` AS t1_r10, `offering_specials`.`status_code` AS t1_r11, `offering_specials`.`created_at` AS t1_r12, `offering_specials`.`updated_at` AS t1_r13 FROM `merchants` LEFT OUTER JOIN `offering_specials` ON `offering_specials`.`merchant_id` = `merchants`.`id` WHERE `offering_specials`.`status_code` = 'OP'
How can I get this query to work with joins instead of includes?

Queries of this sort do not normally return associated records. You're requesting a list of Merchants, and that's what you get. When you subsequently request the associated OfferingSpecials of one of those Merchants, a new query is executed (which you should see in the logs), and you get all of them, because you did not specify otherwise. The code in your question does not include the place where you do this, but you must be doing it somewhere, in order to get the OfferingSpecials.
Using includes asks to eager-load the association, which means that it will be subject to the restrictions of the query, which is why you're seeing it work when you do it that way. It's slower because it's fetching those extra records for you now, instead of doing it separately later.
If you really do want to refactor this using .joins, you simply need to add the conditional to the line where you fetch the .offering_specials of the Merchant:
@merchants.each do |m|
  m.offering_specials.where(:status_code => 'OP')
end
However, you should read up on why eager loading exists before doing so - it is likely that either you are already getting better performance by doing one slower query vs. many fast ones, or that you will do so once the number of merchant records involved passes some threshold (which may or may not happen, depending on the nature of your app).
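To make the distinction concrete, here is a minimal plain-Ruby sketch of the behaviour described above. It is a model, not ActiveRecord: the Struct definitions, the sample data ("Acme", "Globex") and the three helper methods are all hypothetical names made up for illustration.

```ruby
# Plain-Ruby model of the queries discussed above; not ActiveRecord itself.
Merchant = Struct.new(:id, :name)
OfferingSpecial = Struct.new(:id, :merchant_id, :status_code)

MERCHANTS = [Merchant.new(1, "Acme"), Merchant.new(2, "Globex")]
SPECIALS = [
  OfferingSpecial.new(1, 1, "OP"),
  OfferingSpecial.new(2, 1, "PN"),
  OfferingSpecial.new(3, 2, "PN"),
]

# Models Merchant.joins(:offering_specials).where(... status_code: "OP"):
# the condition only decides WHICH merchants come back.
# (A real INNER JOIN can also return duplicate merchant rows; ignored here.)
def merchants_with_open_special
  MERCHANTS.select do |m|
    SPECIALS.any? { |s| s.merchant_id == m.id && s.status_code == "OP" }
  end
end

# Models a later merchant.offering_specials call: a fresh, unfiltered lookup.
def offering_specials_for(merchant)
  SPECIALS.select { |s| s.merchant_id == merchant.id }
end

# Models merchant.offering_specials.where(status_code: "OP").
def open_specials_for(merchant)
  offering_specials_for(merchant).select { |s| s.status_code == "OP" }
end

acme = merchants_with_open_special.first
all_codes = offering_specials_for(acme).map(&:status_code)
open_codes = open_specials_for(acme).map(&:status_code)
```

In this model the filtered merchant list contains only the merchant with an open special, yet the unfiltered association lookup still returns its pending special too, which matches what the question observed.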

I've wanted leaner queries with .includes(...) as well, and have now released this feature as a part of a data-related gem I maintain, The Brick.
It overrides ActiveRecord::Associations::JoinDependency.apply_column_aliases() so that, when you add a .select(...), the select acts as a filter that chooses which column aliases get built out.
With gem 'brick' loaded, in order to enable this selective behaviour, add the special column name :_brick_eager_load as the first entry in your .select(...), which turns on the filtering of columns while the aliases are being built out. Here's an example from your merchant offerings data set:
@merchants = Merchant.includes(:offering_specials)
                     .references(:offering_specials)
                     .where(offering_specials: { status_code: "OP" })
                     .select(:_brick_eager_load, # Turns on the filtering
                             :name, :slug,
                             'offering_specials.discount_percentage',
                             'offering_specials.start_at', 'offering_specials.end_at', 'offering_specials.closed_at')
Hope it can save you both query time and some RAM!

Related

Need assistance re-writing some Rails DB Queries that were originally written with a gem called Squeel, have the working SQL

I have a couple of really hard DB queries I need help to re-write in the correct way for Rails 6 Active Record. These are currently working in an app I am re-writing to the new version of Ruby on Rails (6.1.4.2).
It was originally written on Rails v3.2 with a Hell gem called squeel which uses its own DSL Language.
https://github.com/activerecord-hackery/squeel
I have been trying for days now and haven't been able to get it figured out. The first time I asked it I probably wasn't as clear as I needed to be. So this time I am going to put the query as it was written in squeel, and the SQL that the console from Heroku is spitting out and that's all. If anyone wants any additional information ask and I will HAPPILY post it. I want to keep it simple to start with as they are confusing enough.
WARNING: These seem to be EXTREMELY COMPLICATED.
ANY HELP would be VERY Appreciated! :)
Here is squeel DB Query 1:
Project.joins{vendor}.joins{certifications.outer}.where{
(projects.vendor_id.eq my{ vendor_id }) |
(vendors.parent_vendor_id.eq my{ vendor_id }) |
((certifications.cdti == true) & (certifications.published == true))
}.uniq
Here is the straight SQL from query 1 out of Rails v3.2:
SELECT DISTINCT "vendors".* FROM "vendors" INNER JOIN "projects" ON "projects"."vendor_id" = "vendors"."id"
INNER JOIN "certifications" ON "certifications"."project_id" = "projects"."id"
WHERE (("certifications"."cdti" = 't' AND "certifications"."published" = 't'))
ORDER BY "vendors"."parent_vendor_id", "vendors"."name"
Here is the squeel DB query 2:
Fleet.joins{vendor.projects.certifications}.
where{(certifications.cdti.eq true) & (certifications.published.eq true)}.
uniq.includes(:vendor).
order(:vendor_id, :name)
Here is the straight SQL from query 2 out of Rails v3.2:
(I hit enter in a few places so you could see the entire statement without having to scroll to the right.)
SELECT DISTINCT "fleets".* FROM "fleets" INNER JOIN "vendors" ON "vendors"."id" = "fleets"."vendor_id"
INNER JOIN "projects" ON "projects"."vendor_id" = "vendors"."id"
INNER JOIN "certifications" ON "certifications"."project_id" = "projects"."id"
WHERE (("certifications"."cdti" = 't' AND "certifications"."published" = 't'))
ORDER BY "fleets"."vendor_id", "fleets"."name"
Again if anyone wants to see or know anything else just let me know as I am trying my best to figure this out, but these seem so advanced I just don't think I know the correct syntax.
Thank You for your time,
Scott
Query 1 equivalent is:
Vendor.joins(projects: :certifications).where(certifications: { cdti: 't', published: 't' }).order(:parent_vendor_id, :name).distinct
Query 2:
Fleet.joins(vendor: { projects: :certifications }).where(certifications: { cdti: 't', published: 't' }).order(:vendor_id, :name).distinct

How to prevent SELECTing extra fields in a JOINed .includes()

I’m trying to implement parametrized grouping to a report. A simplified example of what I’m trying to achieve:
observation_query = Observation.includes(:reporter).order("reporters.name")
if params[:group_results]
observation_query = observation_query
.select("DATE(observations.created_at) AS created_at, AVG(value) AS value")
.group("DATE(observations.created_at)", :reporter_id)
end
observation_query.each do |observation|
puts "#{observation.reporter.name} #{observation.created_at}: #{observation.value}"
end
When grouping is not used, or if I remove the ordering, the results are as expected. But when both ordering and grouping are used, the query generated due to having to achieve the eager loading with JOINs is:
SELECT DATE(observations.updated_at) AS updated_at, AVG(value) AS value,
`observations`.`id` AS t0_r0,
`observations`.`value` AS t0_r1,
`observations`.`reporter_id` AS t0_r2,
...
`observations`.`created_at` AS t0_r6,
`observations`.`updated_at` AS t0_r7,
`reporters`.`id` AS t1_r0,
...
FROM `observations` INNER JOIN `reporters` ON `reporters`.`id` = `observations`.`user_id`
GROUP BY DATE(observations.created_at), `observations`.`reporter_id`
ORDER BY reporters.name
..which gives the MySQL error 'observations.id' isn't in GROUP BY. How do I prevent selection of columns which are not used for grouping?
I got it working with preload, which seems to work similarly to includes, with the difference that JOINs and SELECTs of the primary query are controlled manually.
observation_query = Observation.joins(:pulse).preload(:reporter).order("reporters.name")
if params[:group_results]
observation_query = observation_query
.select(:reporter_id, "DATE(observations.created_at) AS created_at, AVG(value) AS value")
.group("DATE(observations.created_at)", :reporter_id)
end
The thing about this solution is that table reporters is queried twice, first JOINed for ordering and then a second query that SELECTs the values for filling the associated records. Because the equivalent of reporters.name is indexed in my actual case, this is good enough, but the optimal solution would generate a single query, so I’m not marking this as the answer.
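For what it's worth, the two-step shape of the preload approach can be pictured in plain Ruby. This is only a sketch with made-up data and Struct names, not ActiveRecord: step 1 stands in for the grouped primary query, step 2 for the separate lookup that fills in the reporters. It also shows why reporter_id has to be among the selected columns: the second step needs it as a key.

```ruby
require "date"

# Plain-Ruby stand-ins for the two tables (hypothetical sample data).
Reporter = Struct.new(:id, :name)
Observation = Struct.new(:reporter_id, :created_at, :value)

reporters = [Reporter.new(1, "Zoe"), Reporter.new(2, "Adam")]
observations = [
  Observation.new(1, Date.new(2020, 1, 1), 10.0),
  Observation.new(1, Date.new(2020, 1, 1), 20.0),
  Observation.new(2, Date.new(2020, 1, 1), 4.0),
]

# Step 1: the grouped primary query returns only grouped/aggregated columns.
rows = observations
  .group_by { |o| [o.created_at, o.reporter_id] }
  .map do |(date, reporter_id), group|
    { reporter_id: reporter_id, created_at: date,
      value: group.sum(&:value) / group.size }
  end

# Step 2: the "preload" lookup, keyed by the reporter_id selected in step 1.
reporters_by_id = reporters.map { |r| [r.id, r] }.to_h

# Order by reporter name, as the ORDER BY in the real query would.
report = rows
  .sort_by { |row| reporters_by_id.fetch(row[:reporter_id]).name }
  .map do |row|
    name = reporters_by_id.fetch(row[:reporter_id]).name
    "#{name} #{row[:created_at]}: #{row[:value]}"
  end
```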

Comparing .references requirement on includes vs. eager_load

I know that when you utilize includes and you specify a where clause on the joined table, you should use .references
example:
# will error out or throw deprecation warning in logs
users = User.includes(:orders).where("Orders.cost < ?", 20)
In rails 4 or later, you will get an error like the following:
Mysql2::Error: Unknown column 'Orders.cost' in 'where clause': SELECT
customers.* FROM customers WHERE (Orders.cost < 100)
Or you will get a deprecation warning:
DEPRECATION WARNING: It looks like you are eager loading table(s) (one
of: users, addresses) that are referenced in a string SQL snippet. For
example:
Post.includes(:comments).where("comments.title = 'foo'") Currently,
Active Record recognizes the table in the string, and knows to JOIN
the comments table to the query, rather than loading comments in a
separate query. However, doing this without writing a full-blown SQL
parser is inherently flawed. Since we don't want to write an SQL
parser, we are removing this functionality. From now on, you must
explicitly tell Active Record when you are referencing a table from a
string:
Post.includes(:comments).where("comments.title =
'foo'").references(:comments)
If you don't rely on implicit join references you can disable the
feature entirely by setting
config.active_record.disable_implicit_join_references = true. (
SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email"
AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS
t0_r4, "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1,
"addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3,
"addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5,
"addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7
FROM "users" LEFT OUTER JOIN "addresses" ON "addresses"."user_id" =
"users"."id" WHERE (addresses.country = 'Poland')
so we do this:
# added .references(:orders)
users = User.includes(:orders).where("Orders.cost < ?", 20).references(:orders)
And it executes just fine:
SELECT "users"."id" AS t0_r0,
"users"."name" AS t0_r1,
"users"."created_at" AS t0_r2,
"users"."updated_at" AS t0_r3,
"orders"."id" AS t1_r0,
"orders"."cost" AS t1_r1,
"orders"."user_id" AS t1_r2,
"orders"."created_at" AS t1_r3,
"orders"."updated_at" AS t1_r4
FROM "users"
LEFT OUTER JOIN "orders"
ON "orders"."user_id" = "users"."id"
WHERE ( orders.cost < 20 )
I know that .includes is just a wrapper for two methods: eager_load and preload. I know that since my query above is doing a filter on a joined table (orders in this example), includes is smart and knows to pick the eager_load implementation over preload because preload cannot handle doing this query since preload does not join tables.
Here is where I am confused. Ok: So on that query above: under the hood includes will utilize the eager_load implementation. But notice how when I explicitly use eager_load for this same query (which is what includes is essentially doing): I do not need to use .references! It runs the query and loads the data just fine. No error and no deprecation warning:
# did not specify .references(:orders), and yet no error and no deprecation warning
users = User.eager_load(:orders).where("Orders.cost < ?", 20)
And it executes the exact same process with no problem:
SELECT "users"."id" AS t0_r0,
"users"."name" AS t0_r1,
"users"."created_at" AS t0_r2,
"users"."updated_at" AS t0_r3,
"orders"."id" AS t1_r0,
"orders"."cost" AS t1_r1,
"orders"."user_id" AS t1_r2,
"orders"."created_at" AS t1_r3,
"orders"."updated_at" AS t1_r4
FROM "users"
LEFT OUTER JOIN "orders"
ON "orders"."user_id" = "users"."id"
WHERE ( orders.cost < 20 )
That seems odd. Why does .references need to be specified for the includes version of the query, whereas .references does not need to be specified for the eager_load version of the query? What am I missing here?
It comes down to the problem they mention in the deprecation warning:
Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality.
In older versions, Rails tried to be helpful about selecting the query pattern to use, and includes would use the preload strategy if it could, but switch to the eager_load strategy when it looks like you're referencing something in a joined table. But without a full SQL parser figuring out what tables are actually referenced, it's like parsing XHTML with a Regex - you can get some things done, but Rails can't decide correctly in every case. Consider:
User.includes(:orders).where("Orders.cost < 20")
This is a nice, simple example, and Rails could tell that you need Orders joined. Now try this one:
User.includes(:orders).where("id IN (select user_id from Orders where Orders.cost < 20)")
This gives the same result, but the subquery rendered joining Orders unnecessary. It's a contrived example, and I don't know whether Rails would decide the second query needed to join or not, but the point is there are cases when the heuristic could make the wrong decision. In those cases, either Rails would perform an unnecessary join, burning memory and slowing the query down, or not perform a necessary join, causing an error.
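To illustrate the kind of failure meant here, below is a deliberately naive heuristic in plain Ruby. It is a hypothetical sketch, not Rails' actual implementation: guessed_tables and guess_needs_join? are invented names, and the rule ("any identifier followed by a dot is a table") is an assumption chosen to show the problem.

```ruby
# Hypothetical heuristic, NOT Rails' real code: treat every "identifier."
# prefix found in a SQL fragment as a referenced table name.
def guessed_tables(sql_fragment)
  sql_fragment.scan(/\b([A-Za-z_]\w*)\./).flatten.map(&:downcase).uniq
end

# Decide whether the association's table "looks" referenced in the fragment.
def guess_needs_join?(sql_fragment, association_table)
  guessed_tables(sql_fragment).include?(association_table)
end

simple   = "Orders.cost < 20"
subquery = "id IN (select user_id from Orders where Orders.cost < 20)"

simple_guess   = guess_needs_join?(simple, "orders")   # join really is needed
subquery_guess = guess_needs_join?(subquery, "orders") # join is NOT needed
```

This toy heuristic answers "join" for both fragments, even though the subquery version does not need the outer join at all. A smarter pattern would fix this case and break on another, which is the argument for asking the programmer instead.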
Rather than maintain a heuristic with a pretty bad failure case, the developers decided to just ask the programmer whether the join is needed. You're able to get it right more often than Rails can (hopefully), and when you get it wrong, it's clear what to change.
Instead of adding references you could switch to eager_load, but keeping includes and references separate allows the implementation flexibility in its query pattern. You could conceivably .includes(:orders, :addresses).references(:orders) and have addresses loaded in a second preload-style query because it's not needed during the join (though Rails actually just includes addresses in the join anyway). You don't need to specify references when you're using eager_load because eager_load always joins, where preload always does multiple queries. All references does is instruct includes to use the necessary eager_load strategy and specify which tables are needed.
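The rules in the last paragraph can be summed up as a small decision function. This is a simplified model of the description above, not Rails source code: loading_strategy and its keyword arguments are hypothetical, and it ignores other things that can trigger a join (for example hash conditions on the association's table).

```ruby
# Simplified model of the strategy choice described above (hypothetical helper).
def loading_strategy(includes: [], eager_load: [], preload: [], references: [])
  # eager_load always joins, so it never needs references.
  return :join if eager_load.any?
  # references tells includes that the joined table is needed in the query.
  return :join if includes.any? && references.any?
  # Otherwise includes behaves like preload: separate queries per association.
  return :separate_queries if includes.any? || preload.any?
  :none
end

plain_includes      = loading_strategy(includes: [:orders])
includes_references = loading_strategy(includes: [:orders], references: [:orders])
explicit_eager_load = loading_strategy(eager_load: [:orders])
```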

Filter parents by child attribute, but eager-load all children

That title is a bit obtuse, so here's an example. Suppose we have a Rails 3 app with models Ship, Pirate, and Parrot. A ship has_many pirates, and a pirate has_many parrots.
Ship.includes(pirates: :parrots).where('parrots.name LIKE ?', '%polly%')
This returns ships having at least one pirate with at least one parrot whose name is like "polly". I would also like it to eager-load all of the pirates and parrots for those ships... but in reality only the pirates with matching parrots are eager-loaded, and among those, only the matching parrots are eager-loaded. The generated SQL is something like this:
SELECT ships.id AS t0_r0, ships.name AS t0_r1, pirates.id AS t1_r0, pirates.name AS t1_r1, parrots.id AS t2_r0, parrots.name AS t2_r1 FROM ships LEFT OUTER JOIN pirates ON pirates.ship_id = ships.id LEFT OUTER JOIN parrots ON parrots.pirate_id = pirates.id WHERE (parrots.name LIKE '%polly%')
When doing Ship.includes(pirates: :parrots) without the condition, ActiveRecord generates a bundle of queries that is somewhat closer to what I want:
SELECT ships.* FROM ships
SELECT pirates.* FROM pirates WHERE pirates.ship_id IN (ship IDs from previous query)
SELECT parrots.* FROM parrots WHERE parrots.pirate_id IN (pirate IDs from previous query)
If I could somehow change that first query to use the SQL from the first example, it would do exactly what I want:
SELECT ships.* FROM ships LEFT OUTER JOIN pirates ON pirates.ship_id = ships.id LEFT OUTER JOIN parrots ON parrots.pirate_id = pirates.id WHERE (parrots.name LIKE '%polly%')
SELECT pirates.* FROM pirates WHERE pirates.ship_id IN (ship IDs from previous query)
SELECT parrots.* FROM parrots WHERE parrots.pirate_id IN (pirate IDs from previous query)
But I'm not aware of any way to get ActiveRecord to do this, or any way to do it myself and "manually" wire up the eager-loading (which is necessary in my situation to avoid an N+1 query explosion). Any ideas or advice would be appreciated.
Ship.joins(pirates: :parrots).where('parrots.name LIKE ?', '%polly%').preload(pirates: :parrots)
requires rails 3+
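The three-query shape that joins plus preload aims for can be sketched in plain Ruby. This is a model with made-up data, not ActiveRecord: the Structs, the sample names and the two helper methods are hypothetical. The first helper plays the role of the filtering join, the second the unfiltered "IN (ids)" lookups.

```ruby
# Plain-Ruby stand-ins for the three tables (hypothetical sample data).
Ship = Struct.new(:id, :name)
Pirate = Struct.new(:id, :ship_id, :name)
Parrot = Struct.new(:id, :pirate_id, :name)

SHIPS = [Ship.new(1, "Revenge"), Ship.new(2, "Pearl")]
PIRATES = [Pirate.new(1, 1, "Anne"), Pirate.new(2, 1, "Jack"), Pirate.new(3, 2, "Will")]
PARROTS = [Parrot.new(1, 1, "polly"), Parrot.new(2, 2, "cracker"), Parrot.new(3, 3, "kiwi")]

# Query 1: the join is used only to decide which ships qualify.
# (A real join can return a ship once per matching parrot; collapsed here.)
def ships_with_parrot_like(fragment)
  pirate_ids = PARROTS.select { |p| p.name.include?(fragment) }.map(&:pirate_id)
  ship_ids = PIRATES.select { |p| pirate_ids.include?(p.id) }.map(&:ship_id)
  SHIPS.select { |s| ship_ids.include?(s.id) }
end

# Queries 2 and 3: unfiltered lookups by parent ID, as preload would issue.
def preload_crew(ships)
  ship_ids = ships.map(&:id)
  pirates = PIRATES.select { |p| ship_ids.include?(p.ship_id) }
  pirate_ids = pirates.map(&:id)
  parrots = PARROTS.select { |p| pirate_ids.include?(p.pirate_id) }
  [pirates, parrots]
end

ships = ships_with_parrot_like("polly")
pirates, parrots = preload_crew(ships)
```

Only the ship with a matching parrot is returned, but all of its pirates and all of their parrots are loaded, including the ones that do not match the name filter.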
If INNER JOIN is what you're looking for, I think
Ship.includes(pirates: :parrots).where('parrots.name LIKE ?', '%polly%').joins(pirates: :parrots)
gets it done.

Is there any way to avoid excess ActiveRecord calls?

I have the following ActiveRecord call in a Rails controller ("filings#index"):
@filings = Filing.order("created_at DESC").limit(limit).offset(start).joins("LEFT OUTER JOIN companies ON companies.id=filings.company_id")
Each Filing belongs_to a Company. I would like to be able to access:
@filings.first.company
without having to make an additional SQL query, as that was the entire purpose of doing an OUTER JOIN in the first place. However, when I call @filings.first.company it performs an additional query:
SELECT "companies".* FROM "companies" WHERE "companies"."id" = 989 LIMIT 1
How can I avoid this second query from taking place? Shouldn't the information already have been stored as a result of the initial query?
You need to include the information from the database:
@filings = Filing.includes(:company).order("created_at DESC").offset(start).limit(limit)
hat tip to John Naegle and tharrison
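As a rough picture of why this helps, here is a plain-Ruby model that counts lookups. It is not ActiveRecord: FakeDb, its methods and the sample data are hypothetical. The lazy path stands in for calling .company on each filing without eager loading, and the batched path for what includes(:company) arranges up front.

```ruby
Company = Struct.new(:id, :name)
Filing = Struct.new(:id, :company_id)

# Hypothetical in-memory "database" that counts how many lookups it serves.
class FakeDb
  attr_reader :query_count

  def initialize(companies)
    @companies = companies
    @query_count = 0
  end

  # One lookup per call, like a lazy filing.company access.
  def find_company(id)
    @query_count += 1
    @companies.find { |c| c.id == id }
  end

  # One batched lookup, like SELECT ... WHERE id IN (...).
  def find_companies(ids)
    @query_count += 1
    @companies.select { |c| ids.include?(c.id) }
  end
end

companies = [Company.new(1, "A"), Company.new(2, "B")]
filings = [Filing.new(1, 1), Filing.new(2, 2), Filing.new(3, 1)]

# Lazy: one company lookup per filing.
lazy_db = FakeDb.new(companies)
lazy_names = filings.map { |f| lazy_db.find_company(f.company_id).name }

# Batched: one lookup for all companies, then in-memory matching.
eager_db = FakeDb.new(companies)
ids = filings.map(&:company_id).uniq
by_id = eager_db.find_companies(ids).map { |c| [c.id, c] }.to_h
eager_names = filings.map { |f| by_id.fetch(f.company_id).name }
```

Both paths produce the same names; the difference is that the lazy path's lookup count grows with the number of filings while the batched path stays at one.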
