Rails: How to get objects with at least one child? - ruby-on-rails

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one Child object (through a has_many :children relation). In plain SQL:
SELECT *
FROM parents
WHERE EXISTS (
SELECT 1
FROM children
WHERE parent_id = parents.id)
The closest I've come is
Parent.all.reject { |parent| parent.children.empty? }
(based on another answer), but it's really inefficient because it runs a separate query for each Parent.

Parent.joins(:children).uniq.all

uniq on a relation was deprecated in Rails 5.0 and removed in Rails 5.1, so distinct should be used instead.
Parent.joins(:children).distinct
This is a follow-up on Chris Bailey's answer. .all is removed as well from the original answer as it doesn't add anything.

The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT, which can be a slow query. For better performance, you can write the SQL using EXISTS:
Parent.where <<-SQL
EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
SQL
EXISTS can be much faster than a JOIN with DISTINCT. For example, here is a Post model which has comments and likes:
class Post < ApplicationRecord
has_many :comments
has_many :likes
end
class Comment < ApplicationRecord
belongs_to :post
end
class Like < ApplicationRecord
belongs_to :post
end
The database contains 100 posts, each with 50 comments and 50 likes, plus one post that has no comments or likes:
# Create posts with comments and likes
100.times do |i|
post = Post.create!(title: "Post #{i}")
50.times do |j|
post.comments.create!(content: "Comment #{j} for #{post.title}")
post.likes.create!(user_name: "User #{j} for #{post.title}")
end
end
# Create a post without comment and like
Post.create!(title: 'Hidden post')
If you want to get the posts which have at least one comment and at least one like, you might write this:
# NOTE: uniq method will be removed in Rails 5.1
Post.joins(:comments, :likes).distinct
The query above generates SQL like this:
SELECT DISTINCT "posts".*
FROM "posts"
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id"
INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"
But this SQL generates 250,000 rows (100 posts * 50 comments * 50 likes) and then filters out the duplicates, so it can be slow.
In this case you should write the query like this:
Post.where <<-SQL
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
SQL
This query generates SQL like this:
SELECT "posts".*
FROM "posts"
WHERE (
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
)
This query does not generate useless duplicate rows, so it can be faster.
Here is a benchmark:
             user     system      total        real
Uniq:    0.010000   0.000000   0.010000 (  0.074396)
Exists:  0.000000   0.000000   0.000000 (  0.003711)
In this run, EXISTS is about 20 times faster than DISTINCT (0.074396 s vs. 0.003711 s of real time).
I pushed the sample application to GitHub, so you can confirm the difference yourself:
https://github.com/JunichiIto/exists-query-sandbox
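The row counts quoted in this answer can be checked without a database. The following is a minimal plain-Ruby sketch, not Rails code: arrays of hashes stand in for the comments and likes tables, and the constant and method names are made up for illustration.

```ruby
require "set"

# Hypothetical in-memory stand-ins for the tables described above:
# posts 1..100 each have 50 comments and 50 likes, post 101 has neither.
POST_IDS = (1..101).to_a
COMMENTS = (1..100).flat_map { |post_id| Array.new(50) { { post_id: post_id } } }
LIKES    = (1..100).flat_map { |post_id| Array.new(50) { { post_id: post_id } } }

# Rows produced by INNER JOINing posts to comments and to likes, before
# DISTINCT collapses them: (comments per post) * (likes per post), summed.
def joined_row_count(post_ids, comments, likes)
  comments_per_post = comments.group_by { |c| c[:post_id] }
  likes_per_post    = likes.group_by { |l| l[:post_id] }
  post_ids.sum do |id|
    comments_per_post.fetch(id, []).size * likes_per_post.fetch(id, []).size
  end
end

# What the two EXISTS checks select: each post is tested once per table
# and is never multiplied.
def posts_with_comment_and_like(post_ids, comments, likes)
  commented = comments.map { |c| c[:post_id] }.to_set
  liked     = likes.map { |l| l[:post_id] }.to_set
  post_ids.select { |id| commented.include?(id) && liked.include?(id) }
end

puts joined_row_count(POST_IDS, COMMENTS, LIKES)
puts posts_with_comment_and_like(POST_IDS, COMMENTS, LIKES).size
```

The first number is the size of the intermediate result that DISTINCT has to deduplicate; the second is the number of posts either query finally returns.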

I have just modified this solution for your needs.
Parent.joins("LEFT JOIN children ON children.parent_id = parents.id").where("children.parent_id IS NOT NULL")

You just want an inner join with a distinct qualifier:
SELECT DISTINCT parents.*
FROM parents
JOIN children
ON children.parent_id = parents.id
This can be done in standard Active Record as
Parent.joins(:children).uniq
However, if you want the more complex result of finding all parents with no children,
you need an outer join:
Parent.joins("LEFT OUTER JOIN children ON children.parent_id = parents.id").
where(:children => { :id => nil })
which is a clumsy solution for many reasons. I recommend Ernie Miller's squeel library, which will allow you to write
Parent.joins{children.outer}.where{children.id == nil}

try including the children with #includes()
Parent.includes(:children).all.reject { |parent| parent.children.empty? }
This will make 2 queries:
SELECT * FROM parents;
SELECT * FROM children WHERE parent_id IN (5, 6, 8, ...);
[UPDATE]
The above solution is useful when you need to have the Child objects loaded.
But children.empty? can also use a counter cache to determine the number of children.
For this to work you need to add a new column to the parents table:
# a new migration
def up
change_table :parents do |t|
t.integer :children_count, :default => 0
end
Parent.reset_column_information
Parent.all.each do |p|
Parent.update_counters p.id, :children_count => p.children.length
end
end
def down
change_table :parents do |t|
t.remove :children_count
end
end
Now change your Child model:
class Child < ActiveRecord::Base
belongs_to :parent, :counter_cache => true
end
At this point you can use size and empty? without touching the children table:
Parent.all.reject { |parent| parent.children.empty? }
Note that length doesn't use the counter cache whereas size and empty? do.

Related

How to UNION tables and make results accessible in a Ruby view

I'm quite new to RoR and creating a student project for a course I'm taking. I'm wanting to construct a type of query we didn't cover in the course and which I know I could do in a snap in .NET and SQL. I'm having a heck of a time though getting it implemented the Ruby way.
What I'd like to do: Display a list on a user's page of all "posts" by that user's friends.
"Posts" are found in both a questions table and in a blurbs table that users contribute to. I'd like to UNION these two into a single recordset to sort by updated_at DESC.
The table column names are not the same however, and this is my sticking point since other successful answers I've seen have hinged on column names being the same between the two.
In SQL I'd write something like (emphasis on like):
SELECT b.Blurb AS 'UserPost', b.updated_at, u.username as 'Author'
FROM Blurbs b
INNER JOIN Users u ON b.User_ID = u.ID
WHERE u.ID IN
(SELECT f.friend_id FROM Friendships f WHERE f.User_ID = [current user])
UNION
SELECT q.Question, q.updated_at, u.username
FROM Questions q
INNER JOIN Users u ON q.User_ID = u.ID
WHERE u.ID IN
(SELECT f.friend_id FROM Friendships f WHERE f.User_ID = [current user])
ORDER BY updated_at DESC
The User model's (applicable) relationships are:
has_many :friendships
has_many :friends, through: :friendships
has_many :questions
has_many :blurbs
And the Question and Blurb models both have belongs_to :user
In the view I'd like to display the contents of the 'UserPost' column and the 'Author'. I'm sure this is possible, I'm just too new still to ActiveRecord and how statements are formed. Happy to have some input or review any relevant links that speak to this specifically!
Final Solution
Hopefully this will assist others in the future with Ruby UNION questions. Thanks to @Plamena's input, the final implementation ended up as:
def friend_posts
sql = "...the UNION statement seen above..."
ActiveRecord::Base.connection.select_all(ActiveRecord::Base.send("sanitize_sql_array",[sql, self.id, self.id] ) )
end
Active Record currently has no built-in UNION support, but you can use raw SQL:
sql = <<-SQL
-- your SQL query goes here
SELECT b.created_at ...
UNION(
SELECT q.created_at
....
)
SQL
posts = ActiveRecord::Base.connection.select_all(sql)
Then you can iterate the result:
posts.each do |post|
# post is a hash
p post['created_at']
end
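Because each row comes back as a hash keyed by the column names of the first SELECT, the view only has to know the aliases 'UserPost' and 'Author'. As a rough illustration of what the UNION computes, here is a plain-Ruby sketch that does the same merge in memory instead of in SQL. It is not the approach from the answer above: the sample rows, constants and the method body are assumptions made up for the example.

```ruby
# Hypothetical rows standing in for the blurbs and questions tables;
# updated_at is a plain integer here to keep the example short.
BLURBS    = [{ blurb: "Hello", updated_at: 3, author: "ann" }]
QUESTIONS = [{ question: "Why?", updated_at: 5, author: "bob" },
             { question: "How?", updated_at: 1, author: "ann" }]

# Map both row types onto the column names of the first SELECT, drop exact
# duplicates as UNION does, then sort newest first.
def friend_posts(blurbs, questions)
  from_blurbs = blurbs.map do |b|
    { "UserPost" => b[:blurb], "updated_at" => b[:updated_at], "Author" => b[:author] }
  end
  from_questions = questions.map do |q|
    { "UserPost" => q[:question], "updated_at" => q[:updated_at], "Author" => q[:author] }
  end
  (from_blurbs + from_questions).uniq.sort_by { |row| -row["updated_at"] }
end

friend_posts(BLURBS, QUESTIONS).each do |post|
  puts "#{post['Author']}: #{post['UserPost']}"
end
```

Doing the merge in the database is usually preferable for large tables, since sorting and deduplication then happen before the rows reach Ruby.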
The best way to do this is to just use the power of Rails.
If you want everything belonging to one of a user's friends:
current_user.friends.find(id_of_friend).questions
This would get all of the questions from a certain friend.
Now, it seems that you have writings in multiple places (this is hard to visualise without a model of how writings are connected to everything else). Can you provide this?
@blurbs = Blurb.includes(:user)
@blurbs.each do |blurb|
p blurb.blurb, blurb.user.username
end

grouping with a non-primary key in postgres / activerecord

I have a model Lap:
class Lap < ActiveRecord::Base
belongs_to :car
def self.by_carmodel(carmodel)
scoped = joins(:car_model).where(:car_models => {:name => carmodel})
scoped
end
def self.fastest_per_car
scoped = select("laps.id",:car_id, :time, :mph).group("laps.id", :car_id, :time, :mph).order("time").limit(1)
scoped
end
end
I want to only return the fastest lap for each car.
So I need to group the laps by Lap.car_id and then return only the fastest lap for that car, which would be determined by the column Lap.time.
Basically I would like to stack my methods in my controller:
@corvettes = Lap.by_carmodel("Corvette").fastest_per_car
Hopefully that makes sense...
When I run just Lap.fastest_per_car, everything is limited to 1 result, rather than 1 result per car.
Another thing I had to do was add "laps.id", as :id was showing up empty in my results. If I just use select(:id), it says the column is ambiguous.
I think a decent approach to this would be to add a where clause based on an efficient SQL syntax for returning the single fastest lap.
Something like this correlated subquery ...
select ...
from laps
where id = (select id
from laps laps_inner
where laps_inner.car_id = laps.car_id
order by time asc,
created_at desc
limit 1)
It's a little complex because of the need to tie-break on created_at.
The rails scope would just be:
where("laps.id = (select id
from laps laps_inner
where laps_inner.car_id = laps.car_id
order by time asc,
created_at desc
limit 1)")
An index on car_id would be pretty essential, and if that was a composite index on (car_id, time asc) then so much the better.
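To make the selection rule of that correlated subquery concrete (smallest time first, newest created_at as the tie-break, one row per car), here is a small plain-Ruby sketch of the same rule applied to an in-memory array. The sample data and the integer created_at values are assumptions for illustration only; the real scope runs in the database.

```ruby
# Hypothetical lap rows; :time is the lap time in seconds and :created_at
# is a plain integer standing in for a timestamp.
LAPS = [
  { id: 1, car_id: 10, time: 92.4, created_at: 1 },
  { id: 2, car_id: 10, time: 90.1, created_at: 2 },
  { id: 3, car_id: 10, time: 90.1, created_at: 3 },
  { id: 4, car_id: 20, time: 88.7, created_at: 4 }
]

# For each car keep the lap with the smallest time; on a tie prefer the
# most recently created lap (ORDER BY time ASC, created_at DESC LIMIT 1).
def fastest_per_car(laps)
  laps.group_by { |lap| lap[:car_id] }
      .map { |_car_id, car_laps| car_laps.min_by { |lap| [lap[:time], -lap[:created_at]] } }
end

p fastest_per_car(LAPS).map { |lap| lap[:id] }
```

Laps 2 and 3 tie on time, so the tie-break picks lap 3 for car 10; car 20 has only lap 4.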
You are using limit, which will return a single row, not one row per car. To return one value per lap you just have to join the table and group by a set of columns that identifies one lap (id is the simplest).
You can also write it in a more ActiveRecord-friendly way:
class Lap < ActiveRecord::Base
belongs_to :car
def self.by_carmodel(carmodel)
joins(:car_model).where(:car_models => {:name => carmodel})
end
def self.fastest_per_car
joins(:car_model)
.select("laps.*, MIN(car_models.time) AS min_time")
.group("laps.id")
.order("min_time ASC")
end
end
This is what I did, and it's working. If there is a better way to go about this, please post your answer.
In my model:
def self.fastest_per_car
select('DISTINCT ON (car_id) *').order('car_id, time ASC').sort_by(&:time)
end

Specifying conditions on eager loaded associations returns ActiveRecord::RecordNotFound

The problem is that when a Restaurant does not have any MenuItems that match the condition, ActiveRecord says it can't find the Restaurant. Here's the relevant code:
class Restaurant < ActiveRecord::Base
has_many :menu_items, dependent: :destroy
has_many :meals, through: :menu_items
def self.with_meals_of_the_week
includes({menu_items: :meal}).where(:'menu_items.date' => Time.now.beginning_of_week..Time.now.end_of_week)
end
end
And the sql code generated:
Restaurant Load (0.0ms) SELECT DISTINCT "restaurants".id FROM "restaurants"
LEFT OUTER JOIN "menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN "meals" ON "meals"."id" = "menu_items"."meal_id" WHERE
"restaurants"."id" = ? AND ("menu_items"."date" BETWEEN '2012-10-14 23:00:00.000000'
AND '2012-10-21 22:59:59.999999') LIMIT 1 [["id", "1"]]
However, according to this part of the Rails Guides, this shouldn't be happening:
Post.includes(:comments).where("comments.visible", true)
If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded.
The SQL generated is a correct translation of your query. But look at it
just at the SQL level (I shortened it a bit):
SELECT *
FROM
"restaurants"
LEFT OUTER JOIN
"menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN
"meals" ON "meals"."id" = "menu_items"."meal_id"
WHERE
"restaurants"."id" = ?
AND
("menu_items"."date" BETWEEN '2012-10-14' AND '2012-10-21')
The left outer joins do the work you expect them to do: restaurants
are combined with menu_items and meals; if there is no menu_item to
go with a restaurant, the restaurant is still kept in the result, with
all the missing pieces (menu_items.id, menu_items.date, ...) filled in with NULL.
Now look at the second part of the WHERE: the BETWEEN operator demands
that menu_items.date is not NULL! And this
is where you filter out all the restaurants without meals.
So we need to change the query in a way that makes NULL dates OK.
Going back to Ruby, you can write:
def self.with_meals_of_the_week
includes({menu_items: :meal})
.where('menu_items.date is NULL or menu_items.date between ? and ?',
Time.now.beginning_of_week,
Time.now.end_of_week
)
end
The resulting SQL is now
.... WHERE (menu_items.date is NULL or menu_items.date between '2012-10-21' and '2012-10-28')
and the restaurants without meals stay in.
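The effect of the two WHERE clauses can be reproduced without a database. This is a plain-Ruby sketch under stated assumptions: the three restaurants, their menu items and all method names are invented for the example, and nil plays the role of SQL NULL.

```ruby
require "date"

# Hypothetical data: restaurant 1 has a menu item this week, restaurant 2
# has none at all, restaurant 3 only has a menu item from an earlier week.
RESTAURANT_IDS = [1, 2, 3]
MENU_ITEMS = [
  { restaurant_id: 1, date: Date.new(2012, 10, 17) },
  { restaurant_id: 3, date: Date.new(2012, 10, 3) }
]
WEEK = Date.new(2012, 10, 15)..Date.new(2012, 10, 21)

# LEFT OUTER JOIN: every restaurant appears at least once; a restaurant
# without menu items gets a single row whose date is nil (SQL NULL).
def left_join(restaurant_ids, menu_items)
  restaurant_ids.flat_map do |id|
    items = menu_items.select { |m| m[:restaurant_id] == id }
    items = [{ date: nil }] if items.empty?
    items.map { |m| { restaurant_id: id, date: m[:date] } }
  end
end

# WHERE date BETWEEN ...: a NULL date never satisfies the comparison.
def strict_filter(rows, week)
  rows.select { |r| !r[:date].nil? && week.cover?(r[:date]) }
      .map { |r| r[:restaurant_id] }.uniq
end

# WHERE date IS NULL OR date BETWEEN ...
def null_tolerant_filter(rows, week)
  rows.select { |r| r[:date].nil? || week.cover?(r[:date]) }
      .map { |r| r[:restaurant_id] }.uniq
end

rows = left_join(RESTAURANT_IDS, MENU_ITEMS)
p strict_filter(rows, WEEK)
p null_tolerant_filter(rows, WEEK)
```

Note that restaurant 3 is dropped by both filters: its only joined row has a non-NULL date outside the week, so the IS NULL branch does not rescue it. If such restaurants must also be kept, the date condition has to go into the join's ON clause rather than into WHERE.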
As the Rails Guide says, all Posts are returned only if you do not combine "includes" with a "where" clause on the included table. Such a "where" clause makes Rails generate a single LEFT OUTER JOIN query whose WHERE filters on the joined table, so base records without matching rows are dropped and the DB may return nothing.
That behaviour of includes is very helpful when you need some objects (all of them, or a subset selected by conditions on the base model) together with their related models if any exist, and just the base models otherwise.
On the other hand, if you put conditions on the included tables, then in most cases you want only the objects that satisfy those conditions, which here means only the Restaurants that have menu_items.
So in your case, if you still want to use only 2 queries (and not N+1), I would probably do something like this:
class Restaurant < ActiveRecord::Base
has_many :menu_items, dependent: :destroy
has_many :meals, through: :menu_items
# per-instance list of this week's menu items, filled in below
attr_accessor :meals_of_the_week
def self.with_meals_of_the_week
restaurants = Restaurant.all.to_a
week = Time.now.beginning_of_week..Time.now.end_of_week
# one query for all matching menu items, grouped by restaurant
by_restaurant = MenuItem.includes(:meal).where(date: week, restaurant_id: restaurants.map(&:id)).group_by(&:restaurant_id)
restaurants.each { |r| r.meals_of_the_week = by_restaurant.fetch(r.id, []) }
restaurants
end
end
Update: Rails 4 raises a deprecation warning when a string condition refers to an included table without also calling references (for example .references(:menu_items)).
Sorry for possible typo.
I think there is some misunderstanding of this passage:
If there was no where condition, this would generate the normal set of two queries.
If, in the case of this includes query, there were no comments for any
posts, all the posts would still be loaded. By using joins (an INNER
JOIN), the join conditions must match, otherwise no records will be
returned.
[from guides]
I think this statement doesn't refer to the example Post.includes(:comments).where("comments.visible", true)
but to the one without a where clause, Post.includes(:comments).
So everything works correctly! This is the way LEFT OUTER JOIN works.
So... you wrote: "If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded." OK! But this is true ONLY when there is NO where clause! You missed the context of the phrase.

How on earth is this rails query working?

I have just optimised some Ruby code that was in a controller method, replacing it with a direct database query. The replacement appears to work and is much faster. Thing is, I've no idea how Rails managed to figure out the correct query to use!
The purpose of the query is to work out tag counts for Place models within a certain distance of a given latitude and longitude. The distance part is handled by the GeoKit plugin (which basically adds convenience methods to add the appropriate trigonometry calculations to the select), and the tagging part is done by the acts_as_taggable_on_steroids plugin, which uses a polymorphic association.
Below is the original code:
places = Place.find(:all, :origin=>latlng, :order=>'distance asc', :within=>distance, :limit=>200)
tag_counts = MyTag.tagcounts(places)
deep_tag_counts=Array.new()
tag_counts.each do |tag|
count=Place.find_tagged_with(tag.name,:origin=>latlng, :order=>'distance asc', :within=>distance, :limit=>200).size
deep_tag_counts<<{:name=>tag.name,:count=>count}
end
where the MyTag class implements this:
def MyTag.tagcounts(places)
alltags = places.collect {|p| p.tags}.flatten.sort_by(&:name)
lasttag=nil;
tagcount=0;
result=Array.new
alltags.each do |tag|
unless (lasttag==nil || lasttag.name==tag.name)
result << MyTag.new(lasttag,tagcount)
tagcount=0
end
tagcount=tagcount+1
lasttag=tag
end
unless lasttag==nil then
result << MyTag.new(lasttag,tagcount)
end
result
end
This was my (very ugly) first attempt as I originally found it difficult to come up with the right rails incantations to get this done in SQL. The new replacement is this single line:
deep_tag_counts=Place.find(:all,:select=>'name,count(*) as count',:origin=>latlng,:within=>distance,:joins=>:tags, :group=>:tag_id)
Which results in an SQL query like this:
SELECT name,count(*) as count, (ACOS(least(1,COS(0.897378837271255)*COS(-0.0153398733287034)*COS(RADIANS(places.lat))*COS(RADIANS(places.lng))+
COS(0.897378837271255)*SIN(-0.0153398733287034)*COS(RADIANS(places.lat))*SIN(RADIANS(places.lng))+
SIN(0.897378837271255)*SIN(RADIANS(places.lat))))*3963.19)
AS distance FROM `places` INNER JOIN `taggings` ON (`places`.`id` = `taggings`.`taggable_id` AND `taggings`.`taggable_type` = 'Place') INNER JOIN `tags` ON (`tags`.`id` = `taggings`.`tag_id`) WHERE (places.lat>50.693170735732 AND places.lat<52.1388692642679 AND places.lng>-2.03785525810908 AND places.lng<0.280035258109084 AND (ACOS(least(1,COS(0.897378837271255)*COS(-0.0153398733287034)*COS(RADIANS(places.lat))*COS(RADIANS(places.lng))+
COS(0.897378837271255)*SIN(-0.0153398733287034)*COS(RADIANS(places.lat))*SIN(RADIANS(places.lng))+
SIN(0.897378837271255)*SIN(RADIANS(places.lat))))*3963.19)
<= 50) GROUP BY tag_id
Ignoring the trig (which is from GeoKit, and results from the :within and :origin parameters), what I can't figure out about this is how on earth Rails was able to figure out from the instruction to join 'tags', that it had to involve 'taggings' in the JOIN (which it does, as there is no direct way to join the places and tags tables), and also that it had to use the polymorphic stuff.
In other words, how the heck did it (correctly) come up with this bit:
INNER JOIN `taggings` ON (`places`.`id` = `taggings`.`taggable_id` AND `taggings`.`taggable_type` = 'Place') INNER JOIN `tags` ON (`tags`.`id` = `taggings`.`tag_id`)
...given that I never mentioned the taggings table in the code! Digging into the taggable plugin, the only clue that Rails has seems to be this:
class Tag < ActiveRecord::Base
has_many :taggings, :dependent=>:destroy
...
end
Anybody able to give some insight into the magic going on under the hood here?
The acts_as_taggable_on_steroids plugin tells your Place model that it has_many Tags through Taggings. With this association specified, ActiveRecord knows that it needs to join taggings in order to get to the tags table. The same thing holds true for HABTM relationships. For example:
class Person < ActiveRecord::Base
has_and_belongs_to_many :tags
end
class Tag < ActiveRecord::Base
has_and_belongs_to_many :people
end
>> Person.first(:joins => :tags)
This produces the following SQL:
SELECT "people".*
FROM "people"
INNER JOIN "people_tags" ON "people_tags".person_id = "people".id
INNER JOIN "tags" ON "tags".id = "people_tags".tag_id
LIMIT 1
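To illustrate the idea, here is a toy plain-Ruby model of how a "through" declaration is enough to derive both JOIN clauses. This is emphatically not ActiveRecord's real implementation or data structure: the ASSOCIATIONS hash, its keys and the join_clauses method are all invented for the sketch.

```ruby
# Toy association metadata, loosely modelled on what has_many and
# has_many :through declarations record. Hypothetical structure.
ASSOCIATIONS = {
  places: {
    taggings: { table: "taggings", foreign_key: "taggable_id",
                type_column: "taggable_type", type_value: "Place" },
    tags:     { table: "tags", through: :taggings, source_key: "tag_id" }
  }
}.freeze

# Returns the JOIN clauses needed to reach `association` from `model`.
# A :through association first joins the intermediate table, then the target.
def join_clauses(model, association)
  meta = ASSOCIATIONS.fetch(model).fetch(association)
  if meta[:through]
    via = ASSOCIATIONS.fetch(model).fetch(meta[:through])
    join_clauses(model, meta[:through]) +
      ["INNER JOIN #{meta[:table]} ON #{meta[:table]}.id = #{via[:table]}.#{meta[:source_key]}"]
  else
    condition = "#{model}.id = #{meta[:table]}.#{meta[:foreign_key]}"
    if meta[:type_column]
      # polymorphic association: also restrict the type column
      condition += " AND #{meta[:table]}.#{meta[:type_column]} = '#{meta[:type_value]}'"
    end
    ["INNER JOIN #{meta[:table]} ON #{condition}"]
  end
end

puts join_clauses(:places, :tags)
```

Asking only for :tags yields the taggings join first, including the polymorphic type check, which mirrors the shape of the SQL in the question.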

Find all objects with no associated has_many objects

In my online store, an order is ready to ship if it is in the "authorized" state and doesn't already have any associated shipments. Right now I'm doing this:
class Order < ActiveRecord::Base
has_many :shipments, :dependent => :destroy
def self.ready_to_ship
unshipped_orders = Array.new
Order.all(:conditions => 'state = "authorized"', :include => :shipments).each do |o|
unshipped_orders << o if o.shipments.empty?
end
unshipped_orders
end
end
Is there a better way?
In Rails 3 using AREL
Order.includes('shipments').where(['orders.state = ?', 'authorized']).where('shipments.id IS NULL')
You can also query on the association using the normal find syntax:
Order.find(:all, :include => "shipments", :conditions => ["orders.state = ? AND shipments.id IS NULL", "authorized"])
One option is to put a shipment_count column on Order, where it will be automatically updated with the number of shipments you attach to it. Then you just query:
Order.all(:conditions => {:state => "authorized", :shipment_count => 0})
Alternatively, you can get your hands dirty with some SQL:
Order.find_by_sql("SELECT * FROM
(SELECT orders.*, count(shipments.id) AS shipment_count FROM orders
LEFT JOIN shipments ON orders.id = shipments.order_id
WHERE orders.state = 'authorized' GROUP BY orders.id)
AS authorized_orders WHERE shipment_count = 0")
Test that prior to using it, as SQL isn't exactly my bag, but I think it's close to right. I got it to work for similar arrangements of objects on my production DB, which is MySQL.
Note that if you don't have an index on orders.state I'd strongly advise adding one!
What the query does: the subquery grabs the shipment counts for all orders which are in the authorized state. The outer query filters that list down to only the ones which have shipment counts equal to zero.
There's probably another way you could do it, a little counterintuitively:
"SELECT DISTINCT orders.* FROM orders
LEFT JOIN shipments ON orders.id = shipments.order_id
WHERE orders.state = 'authorized' AND shipments.id IS NULL"
Grab all orders which are authorized and don't have an entry in the shipments table ;)
This is going to work just fine if you're using Rails 6.1 or newer:
Order.where(state: 'authorized').where.missing(:shipments)
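All of the variants in this thread compute the same thing: authorized orders for which no shipment row exists. A plain-Ruby sketch of that rule, using invented in-memory data and a hypothetical ready_to_ship method rather than any Rails API, looks like this:

```ruby
require "set"

# Hypothetical rows standing in for the orders and shipments tables.
ORDERS = [
  { id: 1, state: "authorized" },
  { id: 2, state: "authorized" },
  { id: 3, state: "pending" }
]
SHIPMENTS = [{ id: 100, order_id: 1 }]

# Keep authorized orders whose id never appears as shipments.order_id,
# which is what LEFT JOIN ... WHERE shipments.id IS NULL selects.
def ready_to_ship(orders, shipments)
  shipped_ids = shipments.map { |s| s[:order_id] }.to_set
  orders.select { |o| o[:state] == "authorized" && !shipped_ids.include?(o[:id]) }
end

p ready_to_ship(ORDERS, SHIPMENTS).map { |o| o[:id] }
```

Order 1 is excluded because it has a shipment and order 3 because it is not authorized, leaving only order 2.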
