ActiveRecord OUTER JOIN - ruby-on-rails

I've also posted this question in a Google Group.
My question is how to express this outer join 'the Rails way':
SELECT
`employees`.`id` AS t0_r0, `employees`.`name` AS t0_r1, `employees`.`last_seen` AS t0_r2, `employees`.`created_at` AS t0_r3, `employees`.`updated_at` AS t0_r4, `employees`.`punch_clock_id` AS t0_r5, `employees`.`account_id` AS t0_r6, `employees`.`born_at` AS t0_r7, `entrances`.`id` AS t1_r0, `entrances`.`employee_id` AS t1_r1, `entrances`.`clocked_at` AS t1_r2, `entrances`.`created_at` AS t1_r3, `entrances`.`updated_at` AS t1_r4, `entrances`.`entrance_type` AS t1_r5
FROM
`employees`
LEFT OUTER JOIN
`entrances` ON `entrances`.`employee_id` = `employees`.`id` AND (`entrances`.`clocked_at` BETWEEN '2015-01-01' AND '2015-01-31')
WHERE
`employees`.`account_id` = 2
My first take was:
current_user.account.employees.includes(:entrances).where('entrances.clocked_at' => @month_range)
but that only returns employees who have entrances in that range (unless I drop the 'where' part of the statement, in which case I do get all employees, along with all 250K entrances!)

If you want to include employees with no entrances, you'll have to explicitly cover that case in your where statement by adding entrances.id IS NULL.
current_user.account.employees.includes(:entrances).where('entrances.id IS NULL OR entrances.clocked_at = ?', @month_range)
That last part might look slightly different depending on what @month_range contains.
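One thing to watch for with the WHERE-based version: it is not quite the same query as the SQL in the question, because an employee whose only entrances fall outside the month is filtered out entirely instead of being returned with no entrances. The following is a plain-Ruby simulation of the two joins, not ActiveRecord code; the structs, names and sample rows are invented for illustration.

```ruby
require "date"

# Plain-Ruby stand-ins for the two tables; all rows are made up.
employee_row = Struct.new(:id, :name)
entrance_row = Struct.new(:id, :employee_id, :clocked_at)

employees = [
  employee_row.new(1, "Ann"), # has an entrance in January
  employee_row.new(2, "Bob"), # has no entrances at all
  employee_row.new(3, "Cid")  # has an entrance, but only in March
]
entrances = [
  entrance_row.new(10, 1, Date.new(2015, 1, 15)),
  entrance_row.new(11, 3, Date.new(2015, 3, 2))
]
month = Date.new(2015, 1, 1)..Date.new(2015, 1, 31)

# LEFT OUTER JOIN: every employee appears at least once. When no entrance
# satisfies the ON predicate (the block), the entrance side is nil (SQL NULL).
def left_outer_join(employees, entrances)
  employees.flat_map do |emp|
    matches = entrances.select { |ent| yield(emp, ent) }
    matches.empty? ? [[emp, nil]] : matches.map { |ent| [emp, ent] }
  end
end

# Variant A: the date range is part of the ON clause (SQL from the question).
on_rows = left_outer_join(employees, entrances) do |emp, ent|
  ent.employee_id == emp.id && month.cover?(ent.clocked_at)
end

# Variant B: join on the key only, then filter in WHERE with
# "entrances.id IS NULL OR entrances.clocked_at within the month".
key_rows = left_outer_join(employees, entrances) do |emp, ent|
  ent.employee_id == emp.id
end
where_rows = key_rows.select { |_emp, ent| ent.nil? || month.cover?(ent.clocked_at) }

names_on = on_rows.map { |emp, _ent| emp.name }.uniq
names_where = where_rows.map { |emp, _ent| emp.name }.uniq

puts "ON clause:    #{names_on.inspect}"    # prints ["Ann", "Bob", "Cid"]
puts "WHERE clause: #{names_where.inspect}" # prints ["Ann", "Bob"]
```

If Cid should still be listed, the date condition has to stay in the ON clause. One option is probably a hand-written joins("LEFT OUTER JOIN entrances ON ...") string, at the cost of no longer eager-loading the association.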

Related

PG::UndefinedTable: ERROR: missing FROM-clause entry for table "Teams"

I get the following error when trying to order a query by an associated table's "name" attribute.
Error
PG::UndefinedTable: ERROR: missing FROM-clause entry for table "team"
Query
: SELECT "team_seasons"."id" AS t0_r0, "team_seasons"."season_id" AS t0_r1, "team_seasons"."team_id" AS t0_r2, "team_seasons"."created_at" AS t0_r3, "team_seasons"."updated_at" AS t0_r4, "team_seasons"."points" AS t0_r5, "team_seasons"."goals_for" AS t0_r6, "team_seasons"."goals_against" AS t0_r7, "team_seasons"."games_played" AS t0_r8, "teams"."id" AS t1_r0, "teams"."name" AS t1_r1, "teams"."city" AS t1_r2, "teams"."created_at" AS t1_r3, "teams"."updated_at" AS t1_r4 FROM "team_seasons" LEFT OUTER JOIN "teams" ON "teams"."id" = "team_seasons"."team_id" WHERE "team_seasons"."season_id" = 1 ORDER BY team.name
SeasonsController.rb
class SeasonsController < ApplicationController
  load_and_authorize_resource
  def show
    @season = Season.find(params[:id])
    @teamseasons = TeamSeason.where(season: @season).includes(:team).order("team.name")
  end
end
Removing the order or ordering by an attribute on the teamseason record doesn't produce the error so I know it's something related to the ordering of the related record. I thought including it was all I needed to do but obviously it's not working. Any help is much appreciated.
Try changing team.name to teams.name (mind the s); the ORDER BY clause has to use the table name, not the association name:
TeamSeason.where(season: @season).includes(:team).order("teams.name")

For the sake of efficiency and optimization, should I use eager_load or includes?

Currently, my code reads like this:
current_user.association.includes(a: [:b, {c: :d}, {e: :f}]).to_a
When I run this, each included association appears to be loaded through its own SELECT query.
However, when I do current_user.association.eager_load(a: [:b, {c: :d}, {e: :f}]).to_a I see one huge SELECT call.
I ask because I haven't seen this raised before. I would assume that eager_load is more efficient because it makes fewer DB calls.
As I can't infer the query from your description (a: [:b, {c: :d}, {e: :f}]), I need to talk about includes for a little bit.
includes is a query method that behaves differently depending on the situation.
Here is some example code:
# model and reference
class Blog < ActiveRecord::Base
  has_many :posts
  # t.string "name"
  # t.string "author"
end
class Post < ActiveRecord::Base
  belongs_to :blog
  # t.string "title"
end
# seed
(1..3).each do |b_id|
  blog = Blog.create(name: "Blog #{b_id}", author: 'someone')
  (1..5).each { |p_id| blog.posts.create(title: "Post #{b_id}-#{p_id}") }
end
In one case, it fires two separate queries, just like preload.
> Blog.includes(:posts)
Blog Load (2.8ms) SELECT "blogs".* FROM "blogs"
Post Load (0.7ms) SELECT "posts".* FROM "posts" WHERE "posts"."blog_id" IN (1, 2, 3)
In another case, when querying on the referenced table, it fires only one LEFT OUTER JOIN query, just like eager_load.
> Blog.includes(:posts).where(posts: {title: 'Post 1-1'})
SQL (0.3ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id" WHERE "posts"."title" = ? [["title", "Post 1-1"]]
So I think what you're really asking about is the difference between includes and eager_load, which comes down to:
Should we use two separate queries or one LEFT OUTER JOIN query for the sake of efficiency and optimisation?
This confused me too. After some digging, I found an article by Fabio Akita that convinced me. Here are some quotes and an example:
For some situations, the monster outer join becomes slower than many smaller queries. The bottom line is: generally it seems better to split a monster join into smaller ones. This avoid the cartesian product overload problem.
The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.
Example for query data from Rails
> Blog.eager_load(:posts).map(&:name).count
SQL (0.9ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id"
=> 3
Example for SQL data returned from LEFT OUTER JOIN query
sqlite> SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id";
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|1|Post 1-1|2015-11-11 15:22:35.053689|2015-11-11 15:22:35.053689|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|2|Post 1-2|2015-11-11 15:22:35.058113|2015-11-11 15:22:35.058113|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|3|Post 1-3|2015-11-11 15:22:35.062776|2015-11-11 15:22:35.062776|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|4|Post 1-4|2015-11-11 15:22:35.065994|2015-11-11 15:22:35.065994|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|5|Post 1-5|2015-11-11 15:22:35.069632|2015-11-11 15:22:35.069632|1
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|6|Post 2-1|2015-11-11 15:22:35.078644|2015-11-11 15:22:35.078644|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|7|Post 2-2|2015-11-11 15:22:35.081845|2015-11-11 15:22:35.081845|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|8|Post 2-3|2015-11-11 15:22:35.084888|2015-11-11 15:22:35.084888|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|9|Post 2-4|2015-11-11 15:22:35.087778|2015-11-11 15:22:35.087778|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|10|Post 2-5|2015-11-11 15:22:35.090781|2015-11-11 15:22:35.090781|2
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|11|Post 3-1|2015-11-11 15:22:35.097479|2015-11-11 15:22:35.097479|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|12|Post 3-2|2015-11-11 15:22:35.103512|2015-11-11 15:22:35.103512|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|13|Post 3-3|2015-11-11 15:22:35.108775|2015-11-11 15:22:35.108775|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|14|Post 3-4|2015-11-11 15:22:35.112654|2015-11-11 15:22:35.112654|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|15|Post 3-5|2015-11-11 15:22:35.117601|2015-11-11 15:22:35.117601|3
Rails gave us the expected 3 blogs, but the SQL result set is much bigger: 15 rows, each repeating its blog's columns. That is the efficiency loss of the LEFT OUTER JOIN.
So my conclusion is, prefer includes over eager_load.
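To put rough numbers on that duplication, here is a plain-Ruby sketch (not ActiveRecord, and not a benchmark) that counts how many column values come back for one joined result versus two separate results. It mirrors the 3 blogs with 5 posts each from the seed above, but the hashes only carry a few illustrative columns, so the absolute counts are assumptions; only the ratio matters.

```ruby
# Made-up rows mirroring the seed data: 3 blogs with 5 posts each.
blogs = (1..3).map { |b| { id: b, name: "Blog #{b}", author: "someone" } }
posts = blogs.flat_map do |blog|
  (1..5).map { |p| { blog_id: blog[:id], title: "Post #{blog[:id]}-#{p}" } }
end

# One LEFT OUTER JOIN: every result row carries the blog columns plus the
# post columns, so each blog's columns are repeated once per post.
joined_rows = blogs.flat_map do |blog|
  matching = posts.select { |post| post[:blog_id] == blog[:id] }
  matching.empty? ? [blog] : matching.map { |post| blog.merge(post) }
end

# Count individual column values for both strategies.
joined_values = joined_rows.sum(&:size)                # single big result set
split_values = blogs.sum(&:size) + posts.sum(&:size)   # two small result sets
blog_name_copies = joined_rows.count { |row| row.key?(:name) }

puts "joined rows:          #{joined_rows.size}" # prints 15
puts "values, one join:     #{joined_values}"    # prints 75
puts "values, two queries:  #{split_values}"     # prints 39
puts "copies of blog names: #{blog_name_copies}" # prints 15 (only 3 are distinct)
```

The gap widens with every additional has_many association in the join, because the row count multiplies rather than adds.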
While researching this, I wrote a blog post about Preload, Eager_load, Includes, References, and Joins in Rails. Hope this can help.
Reference
Remove N+1 queries in your Ruby on Rails app
Rails :include vs. :joins
Preload, Eagerload, Includes and Joins
Rolling with Rails 2.1 - The First Full Tutorial - Part 2
So, as it turns out, at one point ActiveRecord actually attempted to get everything into one query, but then decided that wasn't such a good idea.
I explored this with my query above and 4000 records.
A quick analysis:
eager_load took 2,600 milliseconds.
includes took 72 milliseconds.
eager_load took 36 times as long as includes.

searching by email in ActiveRecord using named scope "<field> is ambiguous"

Using rails 3.2.6.
# 1 letter domain name in email without scope
> Member.where('UPPER(email) LIKE UPPER(?)' , "a@b.com")
Member Load (0.7ms) SELECT "members".* FROM "members" WHERE (UPPER(email) LIKE UPPER('a@b.com'))
=> []
# 2 letter domain name in email without scope
> Member.where('UPPER(email) LIKE UPPER(?)' , "a@bc.com")
Member Load (0.7ms) SELECT "members".* FROM "members" WHERE (UPPER(email) LIKE UPPER('a@bc.com'))
=> []
# 1 letter domain name in email with scope
> Member.with_households.where('UPPER(email) LIKE UPPER(?)' , "a@b.com")
Member Load (0.7ms) SELECT "members".* FROM "members" WHERE (UPPER(email) LIKE UPPER('a@b.com'))
=> []
# 2 letter domain name in email with scope
> Member.with_households.where('UPPER(email) LIKE UPPER(?)' , "a@bs.com")
SQL (0.6ms) SELECT "members"."id" AS t0_r0, "members"."last_name" AS t0_r1, "members"."first_name" AS t0_r2, "members"."household_id" AS t0_r3, "members"."created_at" AS t0_r4, "members"."updated_at" AS t0_r5, "members"."phone1" AS t0_r6, "members"."phone2" AS t0_r7, "members"."address1" AS t0_r8, "members"."address2" AS t0_r9, "members"."city" AS t0_r10, "members"."state" AS t0_r11, "members"."zip" AS t0_r12, "members"."notes" AS t0_r13, "members"."active" AS t0_r14, "members"."email" AS t0_r15, "households"."id" AS t1_r0, "households"."balance" AS t1_r1, "households"."created_at" AS t1_r2, "households"."updated_at" AS t1_r3, "households"."notes" AS t1_r4, "members_households"."id" AS t2_r0, "members_households"."last_name" AS t2_r1, "members_households"."first_name" AS t2_r2, "members_households"."household_id" AS t2_r3, "members_households"."created_at" AS t2_r4, "members_households"."updated_at" AS t2_r5, "members_households"."phone1" AS t2_r6, "members_households"."phone2" AS t2_r7, "members_households"."address1" AS t2_r8, "members_households"."address2" AS t2_r9, "members_households"."city" AS t2_r10, "members_households"."state" AS t2_r11, "members_households"."zip" AS t2_r12, "members_households"."notes" AS t2_r13, "members_households"."active" AS t2_r14, "members_households"."email" AS t2_r15 FROM "members" LEFT OUTER JOIN "households" ON "households"."id" = "members"."household_id" LEFT OUTER JOIN "members" "members_households" ON "members_households"."household_id" = "households"."id" WHERE (UPPER(email) LIKE UPPER('a@bs.com'))
ActiveRecord::StatementInvalid: PGError: ERROR: column reference "email" is ambiguous
LINE 1: ..."."household_id" = "households"."id" WHERE (UPPER(email) LIK...
Here's the error:
ActiveRecord::StatementInvalid: PGError: ERROR: column reference "email" is ambiguous
A Household has_many Members.
Here's the with_households scope definition:
scope :with_households, :include => [{:household => :members}]
The email domain length may be a red herring, but I couldn't reproduce the error otherwise. Why is Arel doing a bunch of joins in just this case?
Try qualifying the column with its table name:
Member.with_households.where('UPPER(members.email) LIKE UPPER(?)' , "a@bs.com")
ActiveRecord::StatementInvalid with "is ambiguous" is AR's way of saying: Mister, your SQL statement does not make sense because a field you are examining is found on several tables and I don't know which to use.
I'm guessing the with_households scope looks at another table which also has an email field. Or maybe you have a default_scope which does the same. Try specifying the table name everywhere, e.g. members.email and managers.email.
Member.with_households.where(Member.arel_table[:email].matches("a@bs.com"))

ActiveRecord returns model objects with un-initialized associations

I have a simple join query which in some cases returns ActiveRecord objects with uninitialized associations, and I'm trying to understand why. (My setup: Rails 2.3.8 with MySQL)
Here are my models:
class Member < ActiveRecord::Base
  has_many :twitter_status_relations
  # has some more unrelated associations
end
class TwitterStatus < ActiveRecord::Base
  has_many :twitter_status_relations
end
class TwitterStatusRelation < ActiveRecord::Base
  belongs_to :member
  belongs_to :twitter_status
end
And here is the query I perform:
result = TwitterStatusRelation.all(
  :joins => :twitter_status,
  :conditions => { :twitter_statuses => { :sent_at => 1.month.ago..DateTime.now } },
  :include => :member,
  :group => "twitter_status_relations.member_id"
)
Now, when I run it the first time in the app, it works fine:
print result[0].member, result[0].member.class.reflect_on_all_associations(:has_many)
#=> <Member...>, [<ActiveRecord::Reflection::AssociationReflection,...]
BUT, when I run it again and try accessing any association of the member, I get a nil exception. Print shows the following:
print result[0].member, result[0].member.class.reflect_on_all_associations(:has_many)
#=> <Member...>, [-- empty ---]
Looks like the member object doesn't have any associations, and so when I try to access any of it, I get an exception.
Do you have any idea why ActiveRecord wouldn't initialize associations of the returned objects in some cases? I would appreciate any half-idea because I'm stuck.
Here is the SQL the above query produces (Ran posted the question on my behalf). The SQL has more fields than the query, because I simplified the query when posting it, removing conditions that aren't relevant to the problem.
SELECT `twitter_status_relations`.`id` AS t0_r0,
twitter_status_relations.twitter_status_id AS t0_r1,
twitter_status_relations.source_twitter_identity_id AS t0_r2,
twitter_status_relations.relation_type AS t0_r3,
twitter_status_relations.relation_data AS t0_r4,
twitter_status_relations.linked_twitter_identity_id AS t0_r5,
twitter_status_relations.user_id AS t0_r6,
twitter_status_relations.linked_member_id AS t0_r7, members.id
AS t1_r0, members.user_id AS t1_r1, members.name AS t1_r2,
members.email AS t1_r3, members.member_rating AS t1_r4,
members.created_at AS t1_r5, members.updated_at AS t1_r6,
members.merged_with_member_id AS t1_r7, members.engage_rating
AS t1_r8, members.support_rating AS t1_r9,
members.user_engage_rating AS t1_r10,
members.user_support_rating AS t1_r11,
members.influence_rating AS t1_r12, members.twitter_username
AS t1_r13, members.lead_rating AS t1_r14,
members.follow_rating AS t1_r15, members.unfollow_rating AS
t1_r16, members.followers_count AS t1_r17, members.hidden AS
t1_r18 FROM twitter_status_relations LEFT OUTER JOIN members ON
members.id = twitter_status_relations.linked_member_id WHERE
(twitter_status_relations.user_id = 1 AND
twitter_status_relations.relation_type IN(
'mention','reply','received_dm','retweet','link','term','hashtag' )
AND twitter_status_relations.linked_member_id IN(
83995,128457,21421,138316,128455,97475,128453,436231,82236,441208,138564,138337,436223,436222,441093,21194,441088,441092,438998,442752,138331,138327,138325,444897,9277,12,509521,13,15,534511,7606,7447,200,7,4,17200,5,652302,1,5536,18770,652301,214082,150870,436228,81204,436225,662513,138608,138338
)) AND twitter_status_relations.id IN (8304, 26493, 113492, 113638,
1, 6, 41213, 113493, 20, 26173) GROUP BY
twitter_status_relations.linked_member_id ORDER BY
members.member_rating

Rails, scope, OR and joins

I have a scope:
includes(:countries).where("profiles.sector = :sector OR advices.sector = :sector", :sector => sector)
It produces the following SQL:
SELECT `profiles`.* FROM `profiles` INNER JOIN `advices` ON `advices`.`profile_id` = `profiles`.`id` WHERE (profiles.sector = 'Forestry_paper' OR advices.sector = 'Forestry_paper')
(yes I have country in my Profile and in my Country model)
Unfortunately, the OR seems to fail: it doesn't return a profile that has the proper sector but no related advice. Thoughts?
You are doing an INNER JOIN, so it requires that the profiles have a corresponding advice. Try the following instead:
Profile
  .joins("LEFT JOIN advices ON advices.profile_id = profiles.id")
  .where("profiles.sector = :sector OR advices.sector = :sector", :sector => sector)
This will also include profiles that have no advices.
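The effect of the join type on that OR condition can be seen in a small plain-Ruby simulation. This is not ActiveRecord code; the hashes, the join_advices helper and the sample sectors are hypothetical and exist only to show which profiles survive each kind of join.

```ruby
# Made-up rows: only profile 2 has an advice.
profiles = [
  { id: 1, sector: "Forestry_paper" }, # right sector, no advice
  { id: 2, sector: "Mining" },         # wrong sector, advice in the right sector
  { id: 3, sector: "Mining" }          # wrong sector, no advice
]
advices = [{ profile_id: 2, sector: "Forestry_paper" }]
sector = "Forestry_paper"

# Pair each profile with its advices. keep_unmatched: true mimics LEFT JOIN
# (profile kept with a nil advice); false mimics INNER JOIN (profile dropped).
def join_advices(profiles, advices, keep_unmatched:)
  profiles.flat_map do |profile|
    matches = advices.select { |advice| advice[:profile_id] == profile[:id] }
    if matches.empty?
      keep_unmatched ? [[profile, nil]] : []
    else
      matches.map { |advice| [profile, advice] }
    end
  end
end

# WHERE profiles.sector = :sector OR advices.sector = :sector
matches_sector = lambda do |row|
  profile, advice = row
  profile[:sector] == sector || (!advice.nil? && advice[:sector] == sector)
end

inner_ids = join_advices(profiles, advices, keep_unmatched: false)
            .select(&matches_sector).map { |profile, _advice| profile[:id] }.uniq
left_ids = join_advices(profiles, advices, keep_unmatched: true)
           .select(&matches_sector).map { |profile, _advice| profile[:id] }.uniq

puts "INNER JOIN: #{inner_ids.inspect}" # prints [2]
puts "LEFT JOIN:  #{left_ids.inspect}"  # prints [1, 2]
```

With the inner join, profile 1 never reaches the WHERE clause at all, so the first half of the OR has nothing to match against.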
You can do outer joins by specifying a where clause with a hash after the includes:
Post.includes(:comments).where(:comments=>{:user_id=>nil})
produces:
Post Load (0.5ms) SELECT "posts"."id" AS t0_r0, "posts"."created_at" AS t0_r1,
"posts"."updated_at" AS t0_r2, "comments"."id" AS t1_r0, "comments"."user_id"
AS t1_r1, "comments"."post_id" AS t1_r2, "comments"."content" AS t1_r3,
"comments"."created_at" AS t1_r4, "comments"."updated_at" AS t1_r5
FROM "posts" LEFT OUTER JOIN "comments" ON "comments"."post_id" = "posts"."id"
WHERE ("comments"."user_id" IS NULL)
Ryan Bigg wrote a helpful blog post about this.
EDIT
Be aware that this technique is more or less a side effect of the way Rails constructs the SQL for eager-loading associations. An explicit LEFT JOIN is more robust, as suggested in the accepted answer.
Check out http://metautonomo.us/projects/metawhere/ for more query goodness...
meta_where is now unmaintained: https://github.com/activerecord-hackery/meta_where
Rails 5 is introducing OR statements:
Rails 5: ActiveRecord OR query

Resources