Use merge method with scope while eager loading with includes in Rails

This is a follow-up question after watching a video at Code School on scopes, as well as a video by Chris Oliver on the merge method.
What I am trying to do is find only those authors which have at least one book that is available. Then after it filters for those authors, I want to eager load all of the books for those selected authors because I do not want to query the database each time I pull data out about those books. I tried a number of different scopes but none of them are giving me exactly what I need:
# app/models/book.rb
class Book < ActiveRecord::Base
  belongs_to :author
  scope :available, -> { where(availability: true) }
  scope :unavailable, -> { where(availability: false) }
end
# app/models/author.rb
class Author < ActiveRecord::Base
  has_many :books, dependent: :destroy
  scope :with_available_books, -> { joins(:books).merge(Book.available) }
  scope :with_available_books_uniq, -> { uniq.joins(:books).merge(Book.available) }
  scope :with_available_books_includes, -> { joins(:books).merge(Book.available).includes(:books) }
  scope :with_available_books_uniq_includes, -> { uniq.joins(:books).merge(Book.available).includes(:books) }
  def to_s
    self.name
  end
end
Here is a snapshot of what is in my database.
I have three authors:
Neil, and he has 10 associated books total, ALL are available
John, and he has 10 associated books total, ALL are unavailable
Mixture Author, and he has 10 books total, 5 are available, 5 are unavailable
I ran all the queries and outputted the results in HTML. Here is what I am getting:
# Duplicating the authors AND N + 1 problem with associated books
Author.with_available_books.size: 15
Books for author[0]: 10
Books for author[1]: 10
# Fixed the duplication but still N + 1 problem with the associated books
Author.with_available_books_uniq.size: 2
Books for author[0]: 10
Books for author[1]: 10
# Fixed the N + 1 problem but duplicating authors
Author.with_available_books_includes.size: 15
# Fixed the duplication and fixed the N + 1 problem
# BUT now it is filtering out the unavailable books!
# But I want all the Books for these authors!
Author.with_available_books_uniq_includes.size: 2
Books for author[0]: 10
Books for author[1]: 5
How do I grab ALL the books for the unduplicated authors? I want to filter the authors by an attribute of their associated objects (the availability attribute on the books), and I want to eager load those books.

Thanks TREMENDOUSLY to Chris Oliver himself for responding to my email query about this situation.
First grab the authors:
@authors_with_available_books = Author.with_available_books_uniq
This properly grabs Neil and Mixture Author, who both have available books:
2.2.1 :004 > @authors_with_available_books.size
=> 2
However, as can be seen below, the N + 1 problem is still present if we want to grab any info about those two authors' books:
2.2.1 :005 > @authors_with_available_books[0].books.size
(0.2ms) SELECT COUNT(*) FROM "books" WHERE "books"."author_id" = ? [["author_id", 1]]
=> 10
2.2.1 :006 > @authors_with_available_books[1].books.size
(0.2ms) SELECT COUNT(*) FROM "books" WHERE "books"."author_id" = ? [["author_id", 3]]
=> 10
Following Chris's advice, what we have to do is run a separate query on books, using the ids from @authors_with_available_books:
2.2.1 :007 > @books_by_those_authors = Book.where(author_id: @authors_with_available_books.map(&:id))
Book Load (0.4ms) SELECT "books".* FROM "books" WHERE "books"."author_id" IN (1, 3)
From the SQL we can see that it grabs only those books whose author_id is 1 or 3. It is limited to those authors because the first query told us that they are the only ones with available books.
Now I can do this:
@books_by_those_authors.size
=> 20
Which makes sense because Neil has ten total books and Mixture Author has ten total books, yielding a combined total of 20.
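Once the second query has run, the books still have to be matched back to their authors in memory. The following is a minimal sketch of that step in plain Ruby. AuthorRow and BookRow are hypothetical Struct stand-ins for the rows returned by the two queries, so the snippet runs without a database; with real models the same group_by call would presumably be applied to @books_by_those_authors.

```ruby
# Hypothetical stand-ins for the rows returned by the two queries above.
AuthorRow = Struct.new(:id, :name)
BookRow   = Struct.new(:id, :author_id, :availability)

# Result of the first query: authors with at least one available book.
authors = [AuthorRow.new(1, "Neil"), AuthorRow.new(3, "Mixture Author")]

# Result of the second query: every book whose author_id is 1 or 3.
books = []
(1..10).each  { |i| books << BookRow.new(i, 1, true) }     # Neil: all available
(11..20).each { |i| books << BookRow.new(i, 3, i <= 15) }  # Mixture Author: 5 of 10 available

# Index the books by author once, so per-author lookups need no further queries.
books_by_author = books.group_by(&:author_id)

book_counts = authors.map { |a| [a.name, books_by_author.fetch(a.id, []).size] }.to_h
```

Each author ends up with all ten of their books, available or not, and nothing is queried per author.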

Related

How to get the top 5 per enum of a model in rails?

Let's say I have a model named post, which has an enum named post_type which can either be
admin, public or user
#app/models/post.rb
class Post < ApplicationRecord
enum post_type: [ :admin, :public, :user ]
end
How can I select 5 last created posts from each category?
I can't think of any other solution than this:
PER_GROUP = 5
admin_posts = Post.admin.order(created_at: :desc).limit(PER_GROUP)
user_posts = Post.user.order(created_at: :desc).limit(PER_GROUP)
public_posts = Post.public.order(created_at: :desc).limit(PER_GROUP)
Is there any way I could fetch all the rows in the required manner with just a single query to the database?
STACK
RAILS : 6
PostgreSQL: 9.4
I am not sure how to translate this into Rails, but it is a straightforward Postgres query. Use the row_number window function in a sub-select, then keep only the rows with a row_number less than or equal to 5 in the outer select.
select *
  from (select post_txt
             , posted_type
             , row_number() over (partition by posted_type) rn
          from enum_table
       ) pt
 where rn <= 5
 order by posted_type;
One thing to look out for is sorting on an enum: the results come back in the order of the enum's definition, not in "natural" (alphanumeric in this case) order.
Thanks to @Belayer I was able to come up with a solution.
PER_GROUP = 5
sub_query = Post.select('*', 'row_number() over (partition by "posts"."post_type" ORDER BY posts.created_at DESC ) rn').to_sql
@posts = Post.from("(#{sub_query}) inner_query")
  .where('inner_query.rn <= ?', PER_GROUP)
  .order(:post_type, created_at: :desc)
  .group_by(&:post_type)
Since I am only loading 5 records for each of just a few different types, group_by will work just fine for me.
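To make the window-function logic concrete, here is a small in-memory sketch of what row_number() with PARTITION BY and ORDER BY computes. It is plain Ruby, not ActiveRecord. PostRow is a hypothetical stand-in for a row of the posts table, and created_at is an integer purely to keep the example short.

```ruby
# Hypothetical stand-in for rows of the posts table (created_at is an integer here).
PostRow = Struct.new(:id, :post_type, :created_at)

posts = [
  PostRow.new(1, "admin",  1), PostRow.new(2, "admin",  2), PostRow.new(3, "admin", 3),
  PostRow.new(4, "public", 1), PostRow.new(5, "public", 2),
  PostRow.new(6, "user",   5)
]
per_group = 2

# PARTITION BY post_type      -> group_by
# ORDER BY created_at DESC    -> sort_by with a negated key
# WHERE rn <= per_group       -> first(per_group)
latest_per_type = posts
  .group_by(&:post_type)
  .transform_values { |rows| rows.sort_by { |r| -r.created_at }.first(per_group) }

latest_ids = latest_per_type.transform_values { |rows| rows.map(&:id) }
```

The database version does the same work without loading every row into Ruby, which is the point of pushing it into the sub-select.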

Rails 4 Eager load limit subquery

Is there a way to avoid the n+1 problem when eager loading and also applying a limit to the subquery?
I want to avoid lots of sql queries like this:
Category.all.each do |category|
  category.posts.limit(10)
end
But I also want to only get 10 posts per category, so the standard eager loading, which gets all the posts, does not suffice:
Category.includes(:posts).all
What is the best way to solve this problem? Is N+1 the only way to limit the amount of posts per category?
From the Rails docs
If you eager load an association with a specified :limit option, it will be ignored, returning all the associated objects
So given the following model definition
class Category < ActiveRecord::Base
  has_many :posts
  has_many :included_posts, -> { limit 10 }, class_name: "Post"
end
Calling Category.find(1).included_posts would work as expected and apply the limit of 10 in the query. However, if you try to do Category.includes(:included_posts).all the limit option will be ignored. You can see why this is the case if you look at the SQL generated by an eager load
Category.includes(:posts).all
Category Load (0.2ms) SELECT "categories".* FROM "categories"
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."category_id" IN (1, 2, 3)
If you added the LIMIT clause to the posts query, it would return a total of 10 posts and not 10 posts per category as you might expect.
Getting back to your problem, I would eager load all posts and then limit the loaded collection using first(10)
categories = Category.includes(:posts).all
categories.first.posts.first(10)
Although you're loading more models into memory, this is bound to be more performant since you're only making 2 calls against the database vs. n+1. Cheers.
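The difference between one LIMIT on the eager-load query and a limit per category can be sketched in plain Ruby. CategoryPost is a hypothetical stand-in for the rows the IN (...) query returns, and the numbers (3 categories, 15 posts each) are made up for illustration.

```ruby
# Hypothetical stand-in for rows returned by the posts eager-load query.
CategoryPost = Struct.new(:id, :category_id)

# 3 categories with 15 posts each.
posts = (1..3).flat_map do |category_id|
  (1..15).map { |n| CategoryPost.new((category_id - 1) * 15 + n, category_id) }
end

# A single LIMIT 10 on the IN (...) query caps the whole result set.
global_limit = posts.first(10)

# Limiting after loading caps each category's collection separately.
per_category = posts.group_by(&:category_id).transform_values { |rows| rows.first(10) }

global_counts       = global_limit.group_by(&:category_id).transform_values(&:size)
per_category_counts = per_category.transform_values(&:size)
```

With the global limit only the first category gets any posts; limiting each loaded collection gives every category its ten.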

Get first association from a has many through association

I'm trying to join the first song of each playlist to an array of playlists and am having a pretty tough time finding an efficient solution.
I have the following models:
class Playlist < ActiveRecord::Base
  belongs_to :user
  has_many :playlist_songs
  has_many :songs, :through => :playlist_songs
end
class PlaylistSong < ActiveRecord::Base
  belongs_to :playlist
  belongs_to :song
end
class Song < ActiveRecord::Base
  has_many :playlist_songs
  has_many :playlists, :through => :playlist_songs
end
I would like to get this:
playlist_name | song_name
----------------------------
chill | baby
fun | bffs
I'm having a pretty tough time finding an efficient way to do this through a join.
UPDATE ****
Shane Andrade has led me in the right direction, but I still can't get exactly what I want.
This is as far as I've been able to get:
playlists = Playlist.where('id in (1,2,3)')
playlists.joins(:playlist_songs)
.group('playlists.id')
.select('MIN(songs.id) as song_id, playlists.name as playlist_name')
This gives me:
playlist_name | song_id
---------------------------
chill | 1
This is close, but I need the name of the first song (according to id).
Assuming you are on PostgreSQL:
Playlist.
  select("DISTINCT ON(playlists.id) playlists.id,
          songs.id song_id,
          playlists.name,
          songs.name first_song_name").
  joins(:songs).
  order("id, song_id").
  map do |pl|
    [pl.id, pl.name, pl.first_song_name]
  end
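For readers unfamiliar with DISTINCT ON, the following plain-Ruby sketch mimics what the query above asks the database to do: order the joined rows by playlist id and then song id, and keep the first row for each playlist. JoinedRow and its sample data are hypothetical stand-ins for the playlists-songs join result.

```ruby
# Hypothetical stand-in for one row of the playlists/songs join.
JoinedRow = Struct.new(:playlist_id, :playlist_name, :song_id, :song_name)

rows = [
  JoinedRow.new(2, "fun",   7, "bffs"),
  JoinedRow.new(1, "chill", 4, "later"),
  JoinedRow.new(1, "chill", 1, "baby"),
  JoinedRow.new(2, "fun",   9, "party")
]

# ORDER BY id, song_id          -> sort_by on [playlist_id, song_id]
# DISTINCT ON (playlists.id)    -> uniq keeps the first row seen per playlist_id
first_songs = rows
  .sort_by { |r| [r.playlist_id, r.song_id] }
  .uniq(&:playlist_id)
  .map { |r| [r.playlist_name, r.song_name] }
```

The ordering matters: without it, which row survives per playlist is not defined.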
I think this problem would be improved by having a stricter definition of "first". I'd suggest adding a position field on the PlaylistSong model. At that point you can simply do:
Playlist.joins(:songs).where(playlist_songs: { position: 1 })
What you are doing above with joins is what you would do if you wanted to find every playlist with a given name and a given song. In order to collect the playlist name and first song name from each playlist you can do this:
Playlist.includes(:songs).all.collect { |playlist| [playlist.name, playlist.songs.first.name] }
This will return an array of the form [[playlist_name, first_song_name], [another_playlist_name, first_song_name]].
I think the best way to do this is to use an inner query to get the first item and then join on it.
Untested but this is the basic idea:
# generate the SQL query that selects the first song per playlist
inner_query = PlaylistSong.group('playlist_id').select('MIN(song_id) as id, playlist_id').to_sql
@playlists = Playlist
  .joins("INNER JOIN (#{inner_query}) as first_songs ON first_songs.playlist_id = playlists.id")
  .joins("INNER JOIN songs on songs.id = first_songs.id")
Then rejoin back to the songs table since we need the song name. I'm not sure if rails is smart enough to select the song fields on the last join. If not you might need to include a select at the end that selects playlists.*, songs.* or something.
Try:
PlaylistSong.includes(:song, :playlist).
  find(PlaylistSong.group("playlist_id").pluck(:id)).each do |ps|
    puts "Playlist: #{ps.playlist.name}, Song: #{ps.song.name}"
  end
(0.3ms) SELECT id FROM `playlist_songs` GROUP BY playlist_id
PlaylistSong Load (0.2ms) SELECT `playlist_songs`.* FROM `playlist_songs` WHERE `playlist_songs`.`id` IN (1, 4, 7)
Song Load (0.2ms) SELECT `songs`.* FROM `songs` WHERE `songs`.`id` IN (1, 4, 7)
Playlist Load (0.2ms) SELECT `playlists`.* FROM `playlists` WHERE `playlists`.`id` IN (1, 2, 3)
Playlist: Dubstep, Song: Dubstep song 1
Playlist: Top Rated, Song: Top Rated song 1
Playlist: Last Played, Song: Last Played song 1
This solution has some benefits:
Limited to 4 select statements
Does not load all playlist_songs - aggregating on db side
Does not load all songs - filtering by id's on db side
Tested with MySQL.
This will not show empty playlists.
And there could be problems with some DBs when playlists count > 1000
Just fetch the song from the other side:
Song
  .joins(:playlists)
  .where(playlists: { id: [1, 2, 3] })
  .first
However, as @Dave S. suggested, the "first" song in a playlist is arbitrary unless you explicitly specify an order (a position column, or anything else), because SQL does not guarantee the order in which records are returned unless you explicitly ask for one.
EDIT
Sorry, I misread your question. I think that a position column is indeed necessary.
Song
  .joins(:playlists)
  .where(playlists: { id: [1, 2, 3] }, songs: { position: 1 })
If you do not want any position column at all, you can always try to group the songs by playlist id, but you'll have to select("songs.*, playlist_songs.*"), and the "first" song is still arbitrary. Another option is to use the RANK window function, but it is not supported by all RDBMSs (as far as I know, PostgreSQL and SQL Server support it).
You can create a has_one association which, in effect, returns the first song associated with the playlist:
class Playlist < ActiveRecord::Base
  has_one :playlist_cover, class_name: 'Song', foreign_key: :playlist_id
end
Then just use this association.
Playlist.joins(:playlist_cover)
UPDATE: I didn't see the join table.
You can use the :through option for has_one if you have a join table:
class Playlist < ActiveRecord::Base
  has_one :playlist_song_cover, class_name: 'PlaylistSong'
  has_one :playlist_cover, through: :playlist_song_cover, source: :song
end
Playlist.joins(:songs).group('playlists.name').minimum('songs.name').to_a
hope it works :)
got this :
Product.includes(:vendors).group('products.id').collect{|product| [product.title, product.vendors.first.name]}
Product Load (0.5ms) SELECT "products".* FROM "products" GROUP BY products.id
Brand Load (0.5ms) SELECT "brands".* FROM "brands" WHERE "brands"."product_id" IN (1, 2, 3)
Vendor Load (0.4ms) SELECT "vendors".* FROM "vendors" WHERE "vendors"."id" IN (2, 3, 1, 4)
=> [["Computer", "Dell"], ["Smartphone", "Apple"], ["Screen", "Apple"]]
2.0.0p0 :120 > Product.joins(:vendors).group('products.title').minimum('vendors.name').to_a
(0.6ms) SELECT MIN(vendors.name) AS minimum_vendors_name, products.title AS products_title FROM "products" INNER JOIN "brands" ON "brands"."product_id" = "products"."id" INNER JOIN "vendors" ON "vendors"."id" = "brands"."vendor_id" GROUP BY products.title
=> [["Computer", "Dell"], ["Screen", "Apple"], ["Smartphone", "Apple"]]
You could add activerecord scope to your models to optimize how the sql queries work for you in the context of the app. Also, scopes are composable, thus make it easier to obtain what you're looking for.
For example, in your Song model, you may want a first_song scope
class Song < ActiveRecord::Base
  scope :first_song, -> { order("id asc").limit(1) }
end
And then, for a single playlist, you can do something like this:
playlist.songs.first_song
Note, you may also need to add some scopes to your PlaylistSongs association model, or to your Playlist model.
You didn't say if you had timestamps in your database. If you do though, and your records on the join table PlaylistSongs are created when you add a song to a playlist, I think this may work:
first_song_ids = Playlist.joins(:playlist_songs).order('playlist_songs.created_at ASC').pluck(:song_id).uniq
playlist_ids = Playlist.joins(:playlist_songs).order('playlist_songs.created_at ASC').pluck(:playlist_id).uniq
playlist_names = Playlist.where(id: playlist_ids).pluck(:name)
song_names = Song.where(id: first_song_ids).pluck(:name)
I believe playlist_names and song_names are now mapped by their index in this way. As in: playlist_names[0] first song name is song_names[0], and playlist_names[1] first song name is song_names[1] and so on. I'm sure you could combine them in a hash or an array very easily with built in ruby methods.
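Matching two separately plucked arrays by index is fragile, because a query without an explicit ORDER BY does not guarantee row order. A possibly safer variation is to keep each playlist id together with its song id and pick the first pair per playlist in Ruby. The sketch below assumes the [playlist_id, song_id] pairs have already been fetched in created_at order (for instance with a two-column pluck); the ids and names are invented sample data, not taken from the question.

```ruby
# Hypothetical [playlist_id, song_id] pairs, already ordered by the join row's created_at.
ordered_pairs = [[1, 10], [2, 20], [1, 11], [3, 30], [2, 21]]

# Hypothetical id -> name lookups for the playlists and songs involved.
playlist_names = { 1 => "chill", 2 => "fun", 3 => "focus" }
song_names     = { 10 => "baby", 11 => "later", 20 => "bffs", 21 => "party", 30 => "deep" }

# Keep only the first song seen for each playlist.
first_song_id = ordered_pairs.each_with_object({}) do |(playlist_id, song_id), acc|
  acc[playlist_id] ||= song_id
end

# Translate ids to names through the lookups, so no positional matching is needed.
first_song_by_playlist = first_song_id.map do |playlist_id, song_id|
  [playlist_names[playlist_id], song_names[song_id]]
end.to_h
```

Because every name is looked up by id, the result does not depend on the order in which the name queries return their rows.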
I realize you were looking for an efficient way to do this, and you said in the comments you didn't want to use a block, and I am unsure if by efficient you meant an all-in-one query. I am just getting used to combining all these rails query methods and perhaps looking at what I have here, you can modify things to your needs and make them more efficient or condensed.
Hope this helps.

Rails 3 Limiting Included Objects

For example, I have a blog object, and that blog has many posts. I want to eager load, say, the first blog object and include, say, its first 10 posts. Currently I would do @blogs = Blog.limit(4) and then in the view use @blogs.posts.limit(10). I am pretty sure there is a better way to do this via an include, such as Blog.include(:posts).limit(:posts=>10). Is it just not possible to limit the number of included objects, or am I missing something basic here?
Looks like you can't apply a limit to :has_many when eager loading associations for multiple records.
Example:
class Blog < ActiveRecord::Base
  has_many :posts, :limit => 5
end
class Post < ActiveRecord::Base
  belongs_to :blog
end
This works fine for limiting the number of posts for a single blog:
ruby-1.9.2-p290 :010 > Blog.first.posts
Blog Load (0.5ms) SELECT `blogs`.* FROM `blogs` LIMIT 1
Post Load (0.6ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` = 1 LIMIT 5
However, if you try to load all blogs and eager load the posts with them:
ruby-1.9.2-p290 :011 > Blog.includes(:posts)
Blog Load (0.5ms) SELECT `blogs`.* FROM `blogs`
Post Load (1.1ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` IN (1, 2)
Note that there's no limit on the second query, and there couldn't be - it would limit the number of posts returned to 5 across all blogs, which is not at all what you want.
EDIT:
A look at the Rails docs confirms this. You always find these things the minute you've figured them out :)
If you eager load an association with a specified :limit option, it
will be ignored, returning all the associated objects
You need to limit the number of posts in your blog model like this:
class Blog < ActiveRecord::Base
  has_many :included_posts, :class_name => 'Post', :limit => 10
  has_many :posts
end
So then you can do:
$ Blog.first.included_posts.count
=> 10
$ Blog.first.posts.count
=> 999

How do I do a join in ActiveRecord after records have been returned?

I am using ActiveRecord in Rails 3 to pull data from two different tables in two different databases. These databases can not join on each other, but I have the need to do a simple join after-the-fact. I would like to preserve the relation so that I can chain it down the line.
Here is a simplified version of what I am doing:
browsers = Browser.all # <-- this is fairly small and can reside in memory
events = Event.where(:row_date=>Date.today).select(:name, :browser_id)
So as you can see, I want to join browsers in on the events relation, where browser_id should equal browsers.name. events is a relation and I can still add clauses to it down the line, so I don't want to run the query on the db just yet. How would I accomplish this?
Edit
For those that would like to see some code for the answer I accepted below, here is what I came up with:
class EventLog < ActiveRecord::Base
  belongs_to :browser
  def get_todays_events
    Event.where(:row_date=>Date.today).select(:name, :browser_id).includes(:browser)
  end
end
This lets me get the browser name for an event in the following manner:
get_todays_events.first.browser.name
I would accomplish this by using an :include. Attempting to do this in Ruby will cause you nothing but grief. You can chain onto an include just fine.
joins does create SQL joins as expected in the current Rails 5:
pry(main)> Customer.joins(:orders).limit(5)
Customer Load (0.2ms) SELECT `customers`.* FROM `customers` INNER JOIN `orders` ON `orders`.`customer_id` = `customers`.`id` LIMIT 5
=> [#<Customer:0x007fb869f11fe8
...
This can be much faster when you only need the parent records, because joins issues a single query and instantiates only those records, whereas includes issues an additional query for each included association (or one larger LEFT OUTER JOIN) and instantiates every associated record.
Here's an example where includes takes roughly 1,450x as long as joins:
pry(main)> benchmark do
Order.joins(:address, :payments, :customer, :packages).all.size
> 0.02456 seconds
pry(main)> benchmark do
[14] pry(main)* Order.includes(:address, :payments, :customer, :packages).all.map(&:zip).max
[14] pry(main)*end
=> 35.607257 seconds
