Ruby on Rails Active Record Query - ruby-on-rails

Employers and Jobs. Employers have many jobs. Jobs have a boolean field started.
I am trying to query and find a count for Employers that have more than one job that is started.
How do I do this?
Employer.first.jobs.where(started: true).count
Do I use a loop with a counter or is there a way I can do it with a query?
Thanks!

You can have condition on join
Employer.joins(:jobs).where(jobs: {started: true}).count

You could create a scope like this in your Employer model:
def self.with_started_job
joins(:jobs)
.where(jobs: { started: true })
.having('COUNT(jobs.id) > 0')
end
Then, to get the number of employers that have a started job, you can just use Employer.with_started_job.count.

What is missing is a group by clause. Use .group() then count. Something like Employer.select("employers.id,count(*)").joins(:jobs).where("jobs.started = 1").group("employers.id")
The query joins both tables, eliminates the records that are false, then it counts the total records for each employer.id when grouped together.

Time for exploratory programming!
Given I don't know SQL really well, my way of doing this may not be optimal. I've been unable to use two aggregations without using a subquery. So I split the task in two:
Fetch all employers who have more than one job started
Count the number of entries in the result set
All on the database level, of course! And without raw SQL, so using Arel here and there. Here's what I've come up with:
class Employer < ActiveRecord::Base
has_many :jobs
# I explored my possibilities using this method: fetches
# all the employers and number of jobs each has started.
# Looks best> Employer.with_started_job_counts.map(&:attributes)
# In the final method this one is not used, it's just handy.
def self.with_started_job_counts
jobs = Job.arel_table
joins(:jobs).select(arel_table[Arel.star],
jobs[:id].count.as('job_count'))
.merge(Job.where(started: true))
.group(:id)
end
# Alright. Now try to apply it. Seems to work alright.
def self.busy
jobs = Job.arel_table
joins(:jobs).merge(Job.where(started: true))
.group(:id)
.having(jobs[:id].count.gt 1)
end
# This is one of the tricks I've learned while fiddling
# with Arel. Counts any relations on the database level.
def self.generic_count(subquery)
from(subquery).count
end
# So now... we get this.
def self.busy_count
generic_count(busy)
end
# ...and it seems to get what we need!
end
Resulting SQL is...large. Not huge, but you may have to solve performance issues with it.
SELECT COUNT(*)
FROM (
SELECT "employers".*
FROM "employers"
INNER JOIN "jobs" ON "jobs"."employer_id" = "employers"."id"
WHERE "jobs"."started" = ?
GROUP BY "employers"."id"
HAVING COUNT("jobs"."id") > 1
) subquery [["started", "t"]]
...still, it does seem to get the result.

Related

Does splitting up an active record query over 2 methods hit the database twice?

I have a database query where I want to get an array of Users that are distinct for the set:
#range is a predefinded date range
#shift_list is a list of filtered shifts
def listing
Shift
.where(date: #range, shiftname: #shift_list)
.select(:user_id)
.distinct
.map { |id| User.find( id.user_id ) }
.sort
end
and I read somewhere that for readability, or isolating for testing, or code reuse, you could split this into seperate methods:
def listing
shiftlist
.select(:user_id)
.distinct
.map { |id| User.find( id.user_id ) }
.sort
end
def shift_list
Shift
.where(date: #range, shiftname: #shift_list)
end
So I rewrote this and some other code, and now the page takes 4 times as long to load.
My question is, does this type of method splitting cause the database to be hit twice? Or is it something that I did elsewhere?
And I'd love a suggestion to improve the efficiency of this code.
Further to the need to remove mapping from the code, this shift list is being created with the following code:
def _month_shift_list
Shift
.select(:shiftname)
.distinct
.where(date: #range)
.map {|x| x.shiftname }
end
My intention is to create an array of shiftnames as strings.
I am obviously missing some key understanding in database access, as this method is clearly creating part of the problem.
And I think I have found the solution to this with the following:
def month_shift_list
Shift.
.where(date: #range)
.pluck(:shiftname)
.uniq
end
Nope, the database will not be hit twice. The queries in both methods are lazy loaded. The issue you have with the slow page load times is because the map function now has to do multiple finds which translates to multiple SELECT from the DB. You can re-write your query to this:
def listing
User.
joins(:shift).
merge(Shift.where(date: #range, shiftname: #shift_list).
uniq.
sort
end
This has just one hit to the DB and will be much faster and should produce the same result as above.
The assumption here is that there is a has_one/has_many relationship on the User model for Shifts
class User < ActiveRecord::Base
has_one :shift
end
If you don't want to establish the has_one/has_many relationship on User, you can re-write it to:
def listing
User.
joins("INNER JOIN shifts on shifts.user_id = users.id").
merge(Shift.where(date: #range, shiftname: #shift_list).
uniq.
sort
end
ALTERNATIVE:
You can use 2 queries if you experience issues with using ActiveRecord#merge.
def listing
user_ids = Shift.where(date: #range, shiftname: #shift_list).uniq.pluck(:user_id).sort
User.find(user_ids)
end

Avoiding n+1 with where query active record

I'm looking at my server log and there's a lot of SELECT statements from a particular query. I'm aware of n+1 issues and this seems to be one. However I'm not sure what the solution would be for this query. Is this a single query or multiple for every user that is returned (I believe WHERE is a single query)?
User.where(profile_type: self.profile_type).where.not(id: black_list)
EDIT:
user.rb
def suggested_friends
black_list = self.friendships.map(&:friend_id)
black_list.push(self.id)
return User.where(profile_type: self.profile_type).where.not(id: black_list)
end
index.json.jbuilder
json.suggested_friends #user.suggested_friends do |friend|
json.set! :user_id, friend.id
json.(friend.profile, *friend.profile.attributes.keys)
end
the issue was leaving out the includes(:profile). Using the following query solved my problem
User.where(profile_type: self.profile_type).where.not(id: black_list).includes(:profile)
thank you for the comments to point me in the right direction
User.where(profile_type: self.profile_type).where.not(id: black_list)
Is only 1 query, even if you have multiple where clauses, rails is smart to make from it 1 sql query.
If you had
users = User.where(profile_type: self.profile_type)
users.each do |u|
u.get_all_comments
end
This is n+1 query, because you ask for all users, then for each user you make new request for its comments, this is N (queries for comments) and 1 query for all users

How to efficiently update associated collection in rails (eager loading)

I have a simple association like
class Slot < ActiveRecord::Base
has_many :media_items, dependent: :destroy
end
class MediaItem < ActiveRecord::Base
belongs_to :slot
end
The MediaItems are ordered per Slot and have a field called ordering.
And want to avoid n+1 querying but nothing I tried works. I had read several blogposts, railscasts etc but hmm.. they never operate on a single model and so on...
What I do is:
def update
#slot = Slot.find(params.require(:id))
media_items = #slot.media_items
par = params[:ordering_media]
# TODO: IMP remove n+1 query
par.each do |item|
item_id = item[:media_item_id]
item_order = item[:ordering]
media_items.find(item_id).update(ordering: item_order)
end
#slot.save
end
params[:ordering_media] is a json array with media_item_id and an integer for ordering
I tried things like
#slot = Slot.includes(:media_items).find(params.require(:id)) # still n+1
#slot = Slot.find(params.require(:id)).includes(:media_items) # not working at all b/c is a Slot already
media_items = #slot.media_items.to_a # looks good but then in the array of MediaItems it is difficult to retrieve the right instance in my loop
This seems like a common thing to do, so I think there is a simple approach to solve this. Would be great to learn about it.
First at all, at this line media_items.find(item_id).update(ordering: item_order) you don't have an n + 1 issue, you have a 2 * n issue. Because for each media_item you make 2 queries: one for find, one for update. To fix you can do this:
params[:ordering_media].each do |item|
MediaItem.update_all({ordering: item[:ordering]}, {id: item[:media_item_id]})
end
Here you have n queries. That is the best we can do, there's no way to update a column on n records with n distinct values, with less than n queries.
Now you can remove the lines #slot = Slot.find(params.require(:id)) and #slot.save, because #slot was not modified or used at the update action.
With this refactor, we see a problem: the action SlotsController#update don't update slot at all. A better place for this code could be MediaItemsController#sort or SortMediaItemsController#update (more RESTful).
At the last #slot = Slot.includes(:media_items).find(params.require(:id)) this is not n + 1 query, this is 2 SQL statements query, because you retrieve n media_items and 1 slot with only 2 db calls. Also it's the best option.
I hope it helps.

How to filter association_ids for an ActiveRecord model?

In a domain like this:
class User
has_many :posts
has_many :topics, :through => :posts
end
class Post
belongs_to :user
belongs_to :topic
end
class Topic
has_many :posts
end
I can read all the Topic ids through user.topic_ids but I can't see a way to apply filtering conditions to this method, since it returns an Array instead of a ActiveRecord::Relation.
The problem is, given a User and an existing set of Topics, marking the ones for which there is a post by the user. I am currently doing something like this:
def mark_topics_with_post(user, topics)
# only returns the ids of the topics for which this user has a post
topic_ids = user.topic_ids
topics.each {|t| t[:has_post]=topic_ids.include(t.id)}
end
But this loads all the topic ids regardless of the input set. Ideally, I'd like to do something like
def mark_topics_with_post(user, topics)
# only returns the topics where user has a post within the subset of interest
topic_ids = user.topic_ids.where(:id=>topics.map(&:id))
topics.each {|t| t[:has_post]=topic_ids.include(t.id)}
end
But the only thing I can do concretely is
def mark_topics_with_post(user, topics)
# needlessly create Post objects only to unwrap them later
topic_ids = user.posts.where(:topic_id=>topics.map(&:id)).select(:topic_id).map(&:topic_id)
topics.each {|t| t[:has_post]=topic_ids.include(t.id)}
end
Is there a better way?
Is it possible to have something like select_values on a association or scope?
FWIW, I'm on rails 3.0.x, but I'd be curious about 3.1 too.
Why am I doing this?
Basically, I have a result page for a semi-complex search (which happens based on the Topic data only), and I want to mark the results (Topics) as stuff on which the user has interacted (wrote a Post).
So yeah, there is another option which would be doing a join [Topic,Post] so that the results come out as marked or not from the search, but this would destroy my ability to cache the Topic query (the query, even without the join, is more expensive than fetching only the ids for the user)
Notice the approaches outlined above do work, they just feel suboptimal.
I think that your second solution is almost the optimal one (from the point of view of the queries involved), at least with respect to the one you'd like to use.
user.topic_ids generates the query:
SELECT `topics`.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1
if user.topic_ids.where(:id=>topics.map(&:id)) was possible it would have generated this:
SELECT topics.id FROM `topics`
INNER JOIN `posts` ON `topics`.`id` = `posts`.`topic_id`
WHERE `posts`.`user_id` = 1 AND `topics`.`id` IN (...)
this is exactly the same query that is generated doing: user.topics.select("topics.id").where(:id=>topics.map(&:id))
while user.posts.select(:topic_id).where(:topic_id=>topics.map(&:id)) generates the following query:
SELECT topic_id FROM `posts`
WHERE `posts`.`user_id` = 1 AND `posts`.`topic_id` IN (...)
which one of the two is more efficient depends on the data in the actual tables and indices defined (and which db is used).
If the topic ids list for the user is long and has topics repeated many times, it may make sense to group by topic id at the query level:
user.posts.select(:topic_id).group(:topic_id).where(:topic_id=>topics.map(&:id))
Suppose your Topic model has a column named id you can do something like this
Topic.select(:id).join(:posts).where("posts.user_id = ?", user_id)
This will run only one query against your database and will give you all the topics ids that have posts for a given user_id

Help converting Rails 2 Database logic to Rails 3.1/ PostgreSQL

How do I select a single random record for each user, but order the Array by the latest record pr. user.
If Foo uploads a new painting, I would like to select a single random record from foo. This way a user that uploads 10 paintings won't monopolize all the space on the front page, but still get a slot on the top of the page.
This is how I did it with Rails 2.x running on MySQL.
#paintings = Painting.all.reverse
first_paintings = []
#paintings.group_by(&:user_id).each do |user_id, paintings|
first_paintings << paintings[rand(paintings.size-1)]
end
#paintings = (first_paintings + (Painting.all - first_paintings).reverse).paginate(:per_page => 9, :page => params[:page])
The example above generates a lot of SQL query's and is properly badly optimized. How would you pull this off with Rails 3.1 running on PostgreSQL? I have 7000 records..
#paintings = Painting.all.reverse = #paintings = Painting.order("id desc")
If you really want to reverse the order of the the paintings result set I would set up a scope then just use that
Something like
class Painting < ActiveRecord::Base
scope :reversed, order("id desc")
end
Then you can use Painting.reversed anywhere you need it
You have definitely set up a belongs_to association in your Painting model, so I would do:
# painting.rb
default_scope order('id DESC')
# paintings_controller.rb
first_paintings = User.includes(:paintings).collect do |user|
user.paintings.sample
end
#paintings = (first_paintings + Painting.where('id NOT IN (?)', first_paintings)).paginate(:per_page => 9, :page => params[:page])
I think this solution results in the fewest SQL queries, and is very readable. Not tested, but I hope you got the idea.
You could use the dynamic finders:
Painting.order("id desc").find_by_user_id!(user.id)
This is assuming your Paintings table contains a user_id column or some other way to associate users to paintings which it appears you have covered since you're calling user_id in your initial code. This isn't random but using find_all_by_user_id would allow you to call .reverse on the array if you still wanted and find a random painting.

Resources