I'm trying to join the first song of each playlist to an array of playlists and am having a pretty tough time finding an efficient solution.
I have the following models:
class Playlist < ActiveRecord::Base
belongs_to :user
has_many :playlist_songs
has_many :songs, :through => :playlist_songs
end
class PlaylistSong < ActiveRecord::Base
belongs_to :playlist
belongs_to :song
end
class Song < ActiveRecord::Base
has_many :playlist_songs
has_many :playlists, :through => :playlist_songs
end
I would like to get this:
playlist_name | song_name
----------------------------
chill | baby
fun | bffs
I'm having a pretty tough time finding an efficient way to do this through a join.
UPDATE
Shane Andrade has led me in the right direction, but I still can't get exactly what I want.
This is as far as I've been able to get:
playlists = Playlist.where('id in (1,2,3)')
playlists.joins(:playlist_songs)
.group('playlists.id')
.select('MIN(songs.id) as song_id, playlists.name as playlist_name')
This gives me:
playlist_name | song_id
---------------------------
chill | 1
This is close, but I need the first song(according to id)'s name.
Assuming you are on PostgreSQL:
Playlist.
select("DISTINCT ON(playlists.id) playlists.id,
songs.id song_id,
playlists.name,
songs.name first_song_name").
joins(:songs).
order("id, song_id").
map do |pl|
[pl.id, pl.name, pl.first_song_name]
end
I think this problem would be improved by having a stricter definition of "first". I'd suggest adding a position field on the PlaylistSong model. At that point you can simply do:
Playlist.joins(:songs).where(:playlist_songs => { :position => 1 })
What you are doing above with joins is what you would do if you wanted to find every playlist with a given name and a given song. In order to collect the playlist_name and first song_name from each playlist you can do this:
Playlist.includes(:songs).all.collect{|playlist| [playlist.name, playlist.songs.first.name]}
This will return an array in this form: [[playlist_name, first_song_name], [another_playlist_name, first_song_name]].
I think the best way to do this is to use an inner query to get the first item and then join on it.
Untested but this is the basic idea:
# generate the SQL query that selects the first song per playlist
inner_query = PlaylistSong.group('playlist_id').select('MIN(song_id) AS id, playlist_id').to_sql
@playlists = Playlist
.joins("INNER JOIN (#{inner_query}) AS first_songs ON first_songs.playlist_id = playlists.id")
.joins("INNER JOIN songs ON songs.id = first_songs.id")
Then rejoin back to the songs table since we need the song name. I'm not sure if Rails is smart enough to select the song fields on the last join. If not, you might need to include a select at the end that selects playlists.*, songs.* or something.
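The two steps of the inner-query idea (find the lowest song id per playlist, then join back for the names) can be sketched in plain Ruby on in-memory rows. This is only an illustration of the logic, not ActiveRecord code; the constants and the method names first_song_ids and first_song_report are made up for this sketch, and the sample data is hypothetical.

```ruby
# In-memory stand-ins for the three tables (hypothetical sample data).
PLAYLISTS = [{ id: 1, name: "chill" }, { id: 2, name: "fun" }].freeze
SONGS = [
  { id: 1, name: "baby" },
  { id: 2, name: "bffs" },
  { id: 3, name: "other" }
].freeze
PLAYLIST_SONGS = [
  { playlist_id: 1, song_id: 3 },
  { playlist_id: 1, song_id: 1 },
  { playlist_id: 2, song_id: 2 }
].freeze

# Step 1 (the inner query): lowest song_id per playlist_id,
# i.e. GROUP BY playlist_id with MIN(song_id).
def first_song_ids(playlist_songs)
  playlist_songs
    .group_by { |row| row[:playlist_id] }
    .transform_values { |rows| rows.map { |row| row[:song_id] }.min }
end

# Step 2 (the outer joins): attach playlist and song names.
# Playlists without songs are skipped, like an INNER JOIN would do.
def first_song_report(playlists, songs, playlist_songs)
  songs_by_id = songs.map { |song| [song[:id], song] }.to_h
  first_ids = first_song_ids(playlist_songs)
  playlists.map do |playlist|
    song = songs_by_id[first_ids[playlist[:id]]]
    [playlist[:name], song[:name]] if song
  end.compact
end
```

Calling first_song_report(PLAYLISTS, SONGS, PLAYLIST_SONGS) yields one [playlist_name, song_name] pair per playlist that has at least one song.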
Try:
PlaylistSong.includes(:song, :playlist).
find(PlaylistSong.group("playlist_id").pluck(:id)).each do |ps|
puts "Playlist: #{ps.playlist.name}, Song: #{ps.song.name}"
end
(0.3ms) SELECT id FROM `playlist_songs` GROUP BY playlist_id
PlaylistSong Load (0.2ms) SELECT `playlist_songs`.* FROM `playlist_songs` WHERE `playlist_songs`.`id` IN (1, 4, 7)
Song Load (0.2ms) SELECT `songs`.* FROM `songs` WHERE `songs`.`id` IN (1, 4, 7)
Playlist Load (0.2ms) SELECT `playlists`.* FROM `playlists` WHERE `playlists`.`id` IN (1, 2, 3)
Playlist: Dubstep, Song: Dubstep song 1
Playlist: Top Rated, Song: Top Rated song 1
Playlist: Last Played, Song: Last Played song 1
This solution has some benefits:
Limited to 4 select statements
Does not load all playlist_songs - aggregating on db side
Does not load all songs - filtering by id's on db side
Tested with MySQL.
This will not show empty playlists.
And there could be problems with some DBs when playlists count > 1000
Just fetch the song from the other side:
Song
.joins( :playlists )
.where( playlists: {id: [1,2,3]} )
.first
However, as @Dave S. suggested, the "first" song in a playlist is arbitrary unless you explicitly specify an order (a position column, or anything else), because SQL does not guarantee the order in which records are returned unless you explicitly ask for one.
EDIT
Sorry, I misread your question. I think that indeed a position column is necessary.
Song
.joins( :playlists )
.where( playlists: {id: [1,2,3]}, playlist_songs: {position: 1} )
If you do not want any position column at all, you can always try to group the songs by playlist id, but you'll have to select("songs.*, playlist_songs.*"), and the "first" song is still arbitrary. Another option is to use the RANK window function, but it is not supported by all RDBMSs (as far as I know, PostgreSQL and SQL Server support it).
You can create a has_one association which, in effect, will return the first song that is associated with the playlist:
class Playlist < ActiveRecord::Base
has_one :playlist_cover, class_name: 'Song', foreign_key: :playlist_id
end
Then just use this association.
Playlist.joins(:playlist_cover)
UPDATE: didn't see the join table.
You can use the :through option for has_one if you have a join table:
class Playlist < ActiveRecord::Base
has_one :playlist_song_cover, class_name: 'PlaylistSong'
has_one :playlist_cover, through: :playlist_song_cover, source: :song
end
Playlist.joins(:songs).group('playlists.name').minimum('songs.name').to_a
hope it works :)
got this :
Product.includes(:vendors).group('products.id').collect{|product| [product.title, product.vendors.first.name]}
Product Load (0.5ms) SELECT "products".* FROM "products" GROUP BY products.id
Brand Load (0.5ms) SELECT "brands".* FROM "brands" WHERE "brands"."product_id" IN (1, 2, 3)
Vendor Load (0.4ms) SELECT "vendors".* FROM "vendors" WHERE "vendors"."id" IN (2, 3, 1, 4)
=> [["Computer", "Dell"], ["Smartphone", "Apple"], ["Screen", "Apple"]]
2.0.0p0 :120 > Product.joins(:vendors).group('products.title').minimum('vendors.name').to_a
(0.6ms) SELECT MIN(vendors.name) AS minimum_vendors_name, products.title AS products_title FROM "products" INNER JOIN "brands" ON "brands"."product_id" = "products"."id" INNER JOIN "vendors" ON "vendors"."id" = "brands"."vendor_id" GROUP BY products.title
=> [["Computer", "Dell"], ["Screen", "Apple"], ["Smartphone", "Apple"]]
You could add an ActiveRecord scope to your models to optimize how the SQL queries work for you in the context of the app. Also, scopes are composable, which makes it easier to obtain what you're looking for.
For example, in your Song model, you may want a first_song scope:
class Song < ActiveRecord::Base
scope :first_song, -> { order("songs.id asc").limit(1) }
end
And then you can do something like this:
playlist.songs.first_song
Note, you may also need to add some scopes to your PlaylistSongs association model, or to your Playlist model.
You didn't say if you had timestamps in your database. If you do though, and your records on the join table PlaylistSongs are created when you add a song to a playlist, I think this may work:
first_song_ids = Playlist.joins(:playlist_songs).order('playlist_songs.created_at ASC').pluck(:song_id).uniq
playlist_ids = Playlist.joins(:playlist_songs).order('playlist_songs.created_at ASC').pluck(:playlist_id).uniq
playlist_names = Playlist.where(id: playlist_ids).pluck(:name)
song_names = Song.where(id: first_song_ids).pluck(:name)
I believe playlist_names and song_names are now mapped by their index in this way. As in: the first song name for playlist_names[0] is song_names[0], the first song name for playlist_names[1] is song_names[1], and so on. Note, however, that where(id: ...) does not promise to return rows in the order of the ids you pass, so check the alignment before relying on it. I'm sure you could combine them in a hash or an array very easily with built-in Ruby methods.
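Combining the two arrays could look roughly like this, assuming they really are index-aligned (which the queries above do not guarantee). The method name pair_names and the sample values are made up for this sketch.

```ruby
# Pair each playlist name with the song name at the same index.
# Assumes both arrays are index-aligned and of equal length.
def pair_names(playlist_names, song_names)
  playlist_names.zip(song_names)
end

pairs = pair_names(["chill", "fun"], ["baby", "bffs"])
lookup = pairs.to_h # playlist name => first song name
```

If the arrays differ in length, zip pads the missing song names with nil, which is a cheap way to spot a misalignment.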
I realize you were looking for an efficient way to do this, and you said in the comments you didn't want to use a block, and I am unsure if by efficient you meant an all-in-one query. I am just getting used to combining all these rails query methods and perhaps looking at what I have here, you can modify things to your needs and make them more efficient or condensed.
Hope this helps.
Related
I have 3 models, Shop, Client, Product.
A shop has many clients, and a shop has many products.
Then I have 2 extra models, one is ShopClient, that groups the shop_id and client_id. The second is ShopProduct, that groups the shop_id and product_id.
Now I have a controller that receives two params, the client_id and product_id. I want to select all the shops (in one instance variable @shops) filtered by client_id and product_id, without any shop being repeated. How can I do this?
I hope I was clear, thanks.
ps: I'm using Postgresql as database.
The query below should work for you.
class Shop
has_many :shop_clients
has_many :clients, through: :shop_clients
has_many :shop_products
has_many :products, through: :shop_products
end
class Client
end
class Product
end
class ShopClient
belongs_to :shop
belongs_to :client
end
class ShopProduct
belongs_to :shop
belongs_to :product
end
@shops = Shop.joins(:clients).where(clients: {id: params[:client_id]}).merge(Shop.joins(:products).where(products: {id: params[:product_id]}))
Just to riff on the answer provided by Prince Bansal. How about creating some class methods for those joins? Something like:
class Shop
has_many :shop_clients
has_many :clients, through: :shop_clients
has_many :shop_products
has_many :products, through: :shop_products
class << self
def with_clients(clients)
joins(:clients).where(clients: {id: clients})
end
def with_products(products)
joins(:products).where(products: {id: products})
end
end
end
Then you could do something like:
@shops = Shop.with_clients(params[:client_id]).with_products(params[:product_id])
By the way, I'm sure someone is going to say you should make those class methods into scopes. And you certainly can do that. I did it as class methods because that's what the Guide recommends:
Using a class method is the preferred way to accept arguments for scopes.
But, I realize some people strongly prefer the aesthetics of using scopes instead. So, whichever pleases you most.
I feel like the best way to solve this issue is to use sub-queries. I'll first collect all valid shop ids from ShopClient, followed by all valid shop ids from ShopProduct, then feed them into the where query on Shop. This will result in one SQL query.
shop_client_ids = ShopClient.where(client_id: params[:client_id]).select(:shop_id)
shop_product_ids = ShopProduct.where(product_id: params[:product_id]).select(:shop_id)
@shops = Shop.where(id: shop_client_ids).where(id: shop_product_ids)
#=> #<ActiveRecord::Relation [#<Shop id: 1, created_at: "2018-02-14 20:22:18", updated_at: "2018-02-14 20:22:18">]>
The above query results in the SQL query below. I didn't specify a limit; the LIMIT is added because the console calls #inspect on the relation to display it (see the note further down).
SELECT "shops".*
FROM "shops"
WHERE
"shops"."id" IN (
SELECT "shop_clients"."shop_id"
FROM "shop_clients"
WHERE "shop_clients"."client_id" = ?) AND
"shops"."id" IN (
SELECT "shop_products"."shop_id"
FROM "shop_products"
WHERE "shop_products"."product_id" = ?)
LIMIT ?
[["client_id", 1], ["product_id", 1], ["LIMIT", 11]]
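What the two IN sub-queries compute together is the intersection of two sets of shop ids. A plain-Ruby sketch of that logic on in-memory rows follows; the constants, the sample data, and the method name shop_ids_for are hypothetical and only illustrate the idea.

```ruby
# In-memory stand-ins for the two join tables (hypothetical sample data).
SHOP_CLIENT_ROWS = [
  { shop_id: 1, client_id: 1 },
  { shop_id: 2, client_id: 1 },
  { shop_id: 3, client_id: 2 }
].freeze
SHOP_PRODUCT_ROWS = [
  { shop_id: 1, product_id: 1 },
  { shop_id: 3, product_id: 1 },
  { shop_id: 1, product_id: 2 }
].freeze

# Shop ids that appear for the client AND for the product.
# Array#& returns the intersection without duplicates.
def shop_ids_for(client_id, product_id, shop_clients, shop_products)
  by_client = shop_clients.select { |row| row[:client_id] == client_id }
                          .map { |row| row[:shop_id] }
  by_product = shop_products.select { |row| row[:product_id] == product_id }
                            .map { |row| row[:shop_id] }
  (by_client & by_product).sort
end
```

With the sample rows, client 1 matches shops 1 and 2, product 1 matches shops 1 and 3, so only shop 1 satisfies both conditions.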
Combining the two sub-queries in one where doesn't result in a correct response:
@shops = Shop.where(id: [shop_client_ids, shop_product_ids])
#=> #<ActiveRecord::Relation []>
Produces the query:
SELECT "shops".* FROM "shops" WHERE "shops"."id" IN (NULL, NULL) LIMIT ? [["LIMIT", 11]]
note
Keep in mind that when you run the statements one by one in the console this will normally result in 3 queries. This is due to the fact that the return value uses the #inspect method to let you see the result. This method is overridden by Rails to execute the query and display the result.
You can simulate the behavior of the normal application by suffixing the statements with ;nil. This makes sure nil is returned and the #inspect method is not called on the where chain, thus not executing the query and keeping the chain in memory.
edit
If you want to clean up the controller you might want to move these sub-queries into model methods (inspired by jvillian's answer).
class Shop
# ...
def self.with_clients(*client_ids)
client_ids.flatten! # allows passing of multiple arguments or an array of arguments
where(id: ShopClient.where(client_id: client_ids).select(:shop_id))
end
# ...
end
Rails sub-query vs join
The advantage of a sub-query over a join is that a join might return the same record multiple times if you query on an attribute that is not unique. For example, say a product has an attribute product_type that is either 'physical' or 'digital'. If you want to select all shops selling a digital product, you must not forget to call distinct on the chain when you're using a join; otherwise the same shop may be returned multiple times.
However, if you have to query on multiple attributes of product and use multiple helpers in the model (where each helper calls joins(:products)), multiple sub-queries are likely slower (assuming you set has_many :products, through: :shop_products). This is because Rails reduces all joins on the same association to a single one. Example: Shop.joins(:products).joins(:products) (from multiple class methods) will still end up joining the products table a single time, whereas sub-queries will not be reduced.
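The duplicate rows a join can produce are easy to see on in-memory data. The sketch below mimics an inner join of shops and products filtered by product_type; the constants, the sample data, and the method name joined_shop_ids are made up for illustration.

```ruby
# In-memory stand-ins for shops and their products (hypothetical sample data).
DEMO_SHOPS = [{ id: 1, name: "A" }, { id: 2, name: "B" }].freeze
DEMO_PRODUCTS = [
  { shop_id: 1, product_type: "digital" },
  { shop_id: 1, product_type: "digital" },
  { shop_id: 2, product_type: "physical" }
].freeze

# Mimics an INNER JOIN with a WHERE on product_type:
# one result row per matching (shop, product) pair.
def joined_shop_ids(shops, products, product_type)
  shops.flat_map do |shop|
    products
      .select { |p| p[:shop_id] == shop[:id] && p[:product_type] == product_type }
      .map { shop[:id] }
  end
end

with_duplicates = joined_shop_ids(DEMO_SHOPS, DEMO_PRODUCTS, "digital")
deduplicated = with_duplicates.uniq # what calling distinct achieves
```

Shop 1 sells two digital products, so it shows up twice in with_duplicates and once in deduplicated.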
The SQL query below may work for you.
--
-- assuming
-- tables: shops, products, clients, shop_products, shop_clients
--
SELECT DISTINCT shops.* FROM shops
JOIN shop_products
ON shop_products.shop_id = shops.id
JOIN shop_clients
ON shop_clients.shop_id = shops.id
WHERE shop_clients.client_id = ? AND shop_products.product_id = ?
If you have difficulties creating an adequate ActiveRecord expression for this SQL query, let me know.
I'm having trouble ordering my records by their has_one association. I'm quite sure the solution is obvious, but I just can't get it.
class Migration
has_many :checks
has_one :latest_origin_check, -> { where(origin: true).order(at: :desc) }, class_name: 'Check'
end
class Check
belongs_to :migration
end
If I order by checks.status I always get different check ids. Shouldn't they be the same but with different order?
Or is the -> { } way to get the has_one association the problem?
Migration.all.includes(:latest_origin_check).order("checks.status DESC").each do |m| puts m.latest_origin_check.id end
So in one sentence: How do I order records through a custom has_one association?
I'm using Ruby 2.0.0, Rails 4.2 and PostgreSQL.
Update:
I wasn't specific enough. I've got two has_one relations on the checks relation.
Also very important: one Migration has far too many checks to include them all at once, so Migration.first.includes(:checks) would be very slow. We are talking about several thousand, and I only need the latest.
class Migration
has_many :checks
has_one :latest_origin_check, -> { where(origin: true).order(at: :desc) }, class_name: 'Check'
has_one :latest_target_check, -> { where(origin: false).order(at: :desc) }, class_name: 'Check'
end
class Check
belongs_to :migration
end
Now if I get the latest_origin_check, I get the correct Record. The query is the following.
pry(main)> Migration.last.latest_origin_check
Migration Load (1.1ms) SELECT "migrations".* FROM "migrations" ORDER BY "migrations"."id" DESC LIMIT 1
Check Load (0.9ms) SELECT "checks".* FROM "checks" WHERE "checks"."migration_id" = $1 AND "checks"."origin" = 't' ORDER BY "checks"."at" DESC LIMIT 1 [["migration_id", 59]]
How do I get the latest check of each migration and then sort the migrations by an attribute of the latest check?
I'm using ransack. Ransack seems to get it right when I order the records by "checks.at"
SELECT "migrations".* FROM "migrations" LEFT OUTER JOIN "checks" ON "checks"."migration_id" = "migrations"."id" AND "checks"."origin" = 't' WHERE (beginning between '2015-02-22 23:00:00.000000' and '2015-02-23 22:59:59.000000' or ending between '2015-02-22 23:00:00.000000' and '2015-02-23 22:59:59.000000') ORDER BY "checks"."at" ASC
But the same query returns wrong results when I order by status
SELECT "migrations".* FROM "migrations" LEFT OUTER JOIN "checks" ON "checks"."migration_id" = "migrations"."id" AND "checks"."origin" = 't' WHERE (beginning between '2015-02-22 23:00:00.000000' and '2015-02-23 22:59:59.000000' or ending between '2015-02-22 23:00:00.000000' and '2015-02-23 22:59:59.000000') ORDER BY "checks"."status" ASC
Check.status is a boolean, Check.at is a DateTime. A colleague suggested that the boolean is the problem. Do I need to convert the booleans to integers to make them sortable? How do I do that only for the :latest_origin_check? Something like this?
.order("(case when \"checks\".\"status\" then 2 when \"checks\".\"status\" is null then 0 else 1 end) DESC")
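The CASE expression above maps true to 2, false to 1 and NULL to 0 before sorting in descending order. The same mapping in plain Ruby, as a sketch on in-memory hashes (the method names status_rank and sort_by_status_desc and the sample rows are made up for illustration):

```ruby
# Same mapping as the SQL CASE: true => 2, NULL (nil) => 0, false => 1.
def status_rank(status)
  case status
  when true then 2
  when nil then 0
  else 1
  end
end

# Sort descending by rank: true first, then false, then nil.
def sort_by_status_desc(checks)
  checks.sort_by { |check| -status_rank(check[:status]) }
end

sample_checks = [
  { id: 1, status: false },
  { id: 2, status: nil },
  { id: 3, status: true }
]
sorted_ids = sort_by_status_desc(sample_checks).map { |check| check[:id] }
```

Ruby itself cannot compare true and false with <=>, which is why an explicit numeric rank is used here too.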
You already have a has_many relationship with Check on Migration. I think you are looking for a scope instead:
scope :latest_origin_check, -> { includes(:checks).where(checks: { origin: true }).order("checks.status DESC").limit(1) }
Drop the has_one :latest_origin_check line on Migration.
Migration.latest_origin_check
I think the line above should return your desired result set.
The problem is that when a Restaurant does not have any MenuItems that match the condition, ActiveRecord says it can't find the Restaurant. Here's the relevant code:
class Restaurant < ActiveRecord::Base
has_many :menu_items, dependent: :destroy
has_many :meals, through: :menu_items
def self.with_meals_of_the_week
includes({menu_items: :meal}).where(:'menu_items.date' => Time.now.beginning_of_week..Time.now.end_of_week)
end
end
And the sql code generated:
Restaurant Load (0.0ms) SELECT DISTINCT "restaurants".id FROM "restaurants"
LEFT OUTER JOIN "menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN "meals" ON "meals"."id" = "menu_items"."meal_id" WHERE
"restaurants"."id" = ? AND ("menu_items"."date" BETWEEN '2012-10-14 23:00:00.000000'
AND '2012-10-21 22:59:59.999999') LIMIT 1 [["id", "1"]]
However, according to this part of the Rails Guides, this shouldn't be happening:
Post.includes(:comments).where("comments.visible", true)
If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded.
The SQL generated is a correct translation of your query. But look at it,
just at the SQL level (I shortened it a bit):
SELECT *
FROM
"restaurants"
LEFT OUTER JOIN
"menu_items" ON "menu_items"."restaurant_id" = "restaurants"."id"
LEFT OUTER JOIN
"meals" ON "meals"."id" = "menu_items"."meal_id"
WHERE
"restaurants"."id" = ?
AND
("menu_items"."date" BETWEEN '2012-10-14' AND '2012-10-21')
The left outer joins do the work you expect them to do: restaurants
are combined with menu_items and meals; if there is no menu_item to
go with a restaurant, the restaurant is still kept in the result, with
all the missing pieces (menu_items.id, menu_items.date, ...) filled in with NULL.
Now look at the second part of the WHERE: the BETWEEN operator requires
that menu_items.date is not NULL, and this
is where you filter out all the restaurants without meals.
So we need to change the query in a way that makes NULL dates acceptable.
Going back to Ruby, you can write:
def self.with_meals_of_the_week
includes({menu_items: :meal})
.where('menu_items.date is NULL or menu_items.date between ? and ?',
Time.now.beginning_of_week,
Time.now.end_of_week
)
end
The resulting SQL is now
.... WHERE (menu_items.date is NULL or menu_items.date between '2012-10-21' and '2012-10-28')
and the restaurants without meals stay in.
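The effect of the two WHERE clauses can be sketched in plain Ruby on rows shaped like the result of the LEFT OUTER JOIN, with nil standing in for NULL. The constants, the sample data, and the method names strict_filter and null_tolerant_filter are made up for illustration.

```ruby
require "date"

DEMO_WEEK = (Date.new(2012, 10, 15)..Date.new(2012, 10, 21)).freeze

# Rows as they come out of the LEFT OUTER JOIN (hypothetical sample data).
# A restaurant without menu items gets a nil date, like SQL NULL.
DEMO_JOIN_ROWS = [
  { restaurant: "Alpha", menu_item_date: Date.new(2012, 10, 16) },
  { restaurant: "Beta",  menu_item_date: nil },
  { restaurant: "Gamma", menu_item_date: Date.new(2012, 9, 1) }
].freeze

# Like "date BETWEEN ? AND ?": a NULL date never matches.
def strict_filter(rows, range)
  rows.select { |row| !row[:menu_item_date].nil? && range.cover?(row[:menu_item_date]) }
end

# Like "date IS NULL OR date BETWEEN ? AND ?".
def null_tolerant_filter(rows, range)
  rows.select { |row| row[:menu_item_date].nil? || range.cover?(row[:menu_item_date]) }
end
```

Note what the sample also shows: "Gamma" has menu items, but none in the week, so it is dropped by both filters. Only restaurants with no menu items at all are rescued by the IS NULL branch.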
As the Rails Guide says, all Posts are returned only if you do not combine includes with a where clause on the included table. Adding such a where clause generates a single LEFT OUTER JOIN query whose WHERE condition applies to the joined table, so rows without a matching associated record are filtered out.
The behavior without where is very helpful when you need some objects (all of them, or some of them selected with a where on the base model) and want their related models if there are any, but are fine with just the list of base models if there are none.
On the other hand, if you put conditions on the included tables, then in most cases you want only the objects that satisfy those conditions; here that means only Restaurants which have menu_items.
So in your case, if you still want to use only 2 queries (and not N+1), I would probably do something like this:
class Restaurant < ActiveRecord::Base
has_many :menu_items, dependent: :destroy
has_many :meals, through: :menu_items
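# per-instance accessor, filled in by with_meals_of_the_week
attr_accessor :meals_of_the_week
def self.with_meals_of_the_week
restaurants = Restaurant.all.to_a
# group this week's menu items by restaurant id
meals_of_the_week = Hash.new { |hash, key| hash[key] = [] }
MenuItem.includes(:meal).where(date: Time.now.beginning_of_week..Time.now.end_of_week, restaurant_id: restaurants.map(&:id)).each do |menu_item|
meals_of_the_week[menu_item.restaurant_id] << menu_item
end
# restaurants without menu items this week get an empty array
restaurants.each { |r| r.meals_of_the_week = meals_of_the_week[r.id] }
restaurants
end
end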
Update: Rails 4 raises a deprecation warning when you reference an included table in a string condition without also calling references.
Sorry for possible typo.
I think there is some misunderstanding of this:
If there was no where condition, this would generate the normal set of two queries.
If, in the case of this includes query, there were no comments for any
posts, all the posts would still be loaded. By using joins (an INNER
JOIN), the join conditions must match, otherwise no records will be
returned.
[from guides]
I think this statement doesn't refer to the example Post.includes(:comments).where("comments.visible", true)
but refers to the one without a where statement, Post.includes(:comments).
So everything works correctly! This is the way LEFT OUTER JOIN works.
So... you wrote: "If, in the case of this includes query, there were no comments for any posts, all the posts would still be loaded." Ok! But this is true ONLY when there is NO where clause! You missed the context of the phrase.
In my app, the main objects are Accounts and Phones, with a typical has_many :through Contacts, eg:
Account:
has_many :contacts
has_many :phones, :through => :contacts
Phone:
has_many :contacts
has_many :accounts, :through => :contacts
Contact:
belongs_to :account
belongs_to :phone
Contact has the fields signup_status and name.
There is one Contact per unique Account/Phone pair.
For an account with id = 123, which has 5 contacts, each contact having one phone, is there a query that would yield all 5 rows and include all the account fields AND contact fields AND phone fields?
You can use eager loading of associations to get all the data you need in one ActiveRecord query
@account = Account.includes(:contacts, :phones).find(123)
which will actually translate into three SQL queries:
SELECT "accounts".* FROM "accounts" WHERE "accounts"."id" = $1 LIMIT 1 [["id", 123]]
SELECT "contacts".* FROM "contacts" WHERE "contacts"."account_id" IN (123)
SELECT "phones".* FROM "phones" WHERE "phones"."id" IN (<phone ids found in prev query>)
All of the records will be loaded into memory and become available through @account. To get the array of contacts and phones, just call @account.contacts and @account.phones, respectively. Note that these calls will not result in re-issued SQL queries, which is the beauty of eager loading.
ActiveRecord isn't quite smart enough to do all that with one SQL query by default. You can get pretty close, however, by using includes, which will avoid n+1 queries.
Account.includes(:contacts => :phone).where(:id => 123)
ActiveRecord will execute one query to load the Account records, one query to load all Contacts, and one query to load all Phones. See the documentation linked below for the reason behind this.
If you really want to get everything in one SQL query (which can have drawbacks), look at eager_load, which loads the associations through LEFT OUTER JOINs; the default multi-query strategy is described in ActiveRecord::Associations::Preloader (documentation).
Suppose I have two models for items and categories, in a many-to-many relation:
class Item < ActiveRecord::Base
has_and_belongs_to_many :categories
end
class Category < ActiveRecord::Base
has_and_belongs_to_many :items
end
Now I want to select only the categories which contain at least one item. What is the best way to do this?
I would like to echo @Delba's answer and expand on it because it's correct. What @huan son is suggesting with the count column is completely unnecessary if you have your indexes set up correctly.
I would add that you probably want to use .uniq: as it's a many-to-many relation, you only want DISTINCT categories to come back:
Category.joins(:items).uniq
Using the joins query will let you more easily work conditions into your count of items too, giving much more flexibility. For example you might not want to count items where enabled = false:
Category.joins(:items).where(:items => { :enabled => true }).uniq
This would generate the following SQL, using inner joins, which are typically fast when the join columns are indexed:
SELECT DISTINCT `categories`.* FROM `categories` INNER JOIN `categories_items` ON `categories_items`.`category_id` = `categories`.`id` INNER JOIN `items` ON `items`.`id` = `categories_items`.`item_id` WHERE `items`.`enabled` = 1
Good luck,
Stu
Category.joins(:items)
More details here: http://guides.rubyonrails.org/active_record_querying.html#joining-tables
Please note: what the other answers suggest is NOT performant!
The most performant solution:
It is better to work with a counter cache and save the items_count in the model!
scope :with_items, -> { where("items_count > 0") }
has_and_belongs_to_many :categories, :after_add=>:update_count, :after_remove=>:update_count
def update_count(category)
category.items_count = category.items.count
category.save
end
For a normal "belongs_to" relation you just write
belongs_to :parent, :counter_cache=>true
and in the parent model you have a field items_count (items is the pluralized has_many class name).
http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
In a has_and_belongs_to_many relation you have to write it on your own, as above.
scope :has_item, -> { where("#{table_name}.id IN (SELECT categories_items.category_id FROM categories_items)") }
This will return all categories which have an entry in the join table because, ostensibly, a category shouldn't have an entry there if it does not have an item. You could add an AND categories_items.item_id IS NOT NULL to the subselect condition just to be sure.
In case you're not aware, table_name is a method which returns the table name of ActiveRecord class calling it. In this case it would be "categories".
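The IN (SELECT category_id FROM categories_items) condition boils down to "keep the categories whose id appears in the join table". A plain-Ruby sketch of that logic on in-memory rows; the constants, the sample data, and the method name categories_with_items are made up for illustration.

```ruby
# In-memory stand-ins for categories and the join table (hypothetical sample data).
DEMO_CATEGORIES = [
  { id: 1, name: "Books" },
  { id: 2, name: "Games" },
  { id: 3, name: "Empty" }
].freeze
DEMO_CATEGORIES_ITEMS = [
  { category_id: 1, item_id: 10 },
  { category_id: 1, item_id: 11 },
  { category_id: 2, item_id: 10 }
].freeze

# Keep only categories whose id occurs in the join table.
# Each category is returned once, however many items it has.
def categories_with_items(categories, join_rows)
  used_ids = join_rows.map { |row| row[:category_id] }.uniq
  categories.select { |category| used_ids.include?(category[:id]) }
end

names = categories_with_items(DEMO_CATEGORIES, DEMO_CATEGORIES_ITEMS)
          .map { |category| category[:name] }
```

Unlike a plain join, this form never repeats a category, so no distinct is needed.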