How to query a limited set of records with ActiveRecord - ruby-on-rails

This has been driving me crazy for the last couple of hours as I'm sure there must be a simple solution. Let's say I have the following models:
class Post < ActiveRecord::Base
  has_many :comments
end
class Comment < ActiveRecord::Base
  belongs_to :post
end
And the Comment model has an attribute called flagged. Assume the post has ten comments, and the first two and last two have been marked as flagged.
I want to get a count of how many of the first 5 comments of a post have been flagged. In this case I would want to return 2. So at first I tried:
post.comments.limit(5).where(comments: { flagged: true }).count
But this returns 4, which makes sense because the query finds the first 5 records where flagged is true. My question is: how can I do the count on only the limited result set? I tried:
first_five_comments = post.comments.limit(5)
first_five_comments.where(flagged: true).count
This also returns 4 as it's just chaining the relations together and executing the same query as above.
I know I could do this with a straight SQL statement, but it just seems like there should be a more Rails way to do it. Do I have to add a .all to the above statement and then do the count within the returned array? Obviously this doesn't work:
first_five_comments = post.comments.limit(5).all
first_five_comments.where(flagged: true).count
because I can't use "where" on an array. If I do have to do it like this, how would I search within the array to get the count?
Any help is appreciated!

You need to filter the array and then count its elements.
post.comments.limit(5).select{ |comment| comment.flagged? }.size
Or shorter:
post.comments.limit(5).select(&:flagged?).size
Note: when called with a block, select loads the records and filters them in Ruby, just like Array#select. It has nothing to do with the SQL SELECT statement.
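The difference between "limit, then filter" and "filter, then limit" can be tried without a database. The following is a minimal plain-Ruby sketch: the Comment struct and the flagged_among_first helper are hypothetical stand-ins for the model, not part of ActiveRecord, and the data only mirrors the scenario from the question (ten comments, the first two and last two flagged).

```ruby
# Stand-in for the Comment model (assumption: only the fields needed here).
Comment = Struct.new(:id, :flagged) do
  def flagged?
    flagged
  end
end

# Ten comments; ids 1, 2, 9 and 10 are flagged, as in the question.
COMMENTS = (1..10).map { |id| Comment.new(id, [1, 2, 9, 10].include?(id)) }

# Hypothetical helper: take the first `n` records, then filter in Ruby.
def flagged_among_first(comments, n)
  comments.first(n).count(&:flagged?)
end

puts flagged_among_first(COMMENTS, 5)            # prints 2

# Filtering first and limiting afterwards is what the chained query did.
puts COMMENTS.select(&:flagged?).first(5).size   # prints 4
```

With ActiveRecord, the first form would correspond to something like post.comments.limit(5).to_a.count(&:flagged?). Without an explicit order, the database does not guarantee which five rows count as the "first" ones.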

Related

Are .select and/or .where responsible for causing N+1 queries in Rails?

I have two methods here, distinct_question_ids and @correct_on_first_attempt. The goal is to show a user how many distinct multiple choice questions have been answered that are correct.
The second one will let me know how many of these distinct MCQs have been answered correctly on the first attempt. (A user can attempt a MCQ many times)
Now, when a user answers thousands of questions and has thousands of user answers, the page to show their performance is taking 30 seconds to a minute to load. And I believe it's due to the .select method, but I don't know how to replace .select without using .select, since it loops just like .each
Is there any method that doesn't cause N+1?
distinct_question_ids = @user.user_answers.includes(:multiple_choice_question).
  where(is_correct_answer: true).
  distinct.pluck(:multiple_choice_question_id)
@correct_on_first_attempt = distinct_question_ids.select { |qid|
  @user.user_answers.
    where(multiple_choice_question_id: qid).first.is_correct_answer
}.count
.pluck returns an Array of values, not an ActiveRecord::Relation.
So when you do distinct_question_ids.select you're not calling ActiveRecord's select, but Array's select. Within that select, you're issuing a fresh new query against @user for every id you just plucked -- including ones that get rejected in the select.
You could create a query named distinct_questions that returns a relation (no pluck!), and then build correct_on_first_attempt off of that, and I think you'll avoid the N+1 queries.
Something along these lines:
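class UserAnswer < ActiveRecord::Base
  scope :distinct_correct, -> { includes(:multiple_choice_question)
                                .where(is_correct_answer: true).distinct }
  # Sketch only: .first returns a single record, so as written this is
  # not a chainable scope and needs reworking before it can be counted.
  scope :first_attempt_correct, -> { distinct_correct
                                     .first.is_correct_answer }
end
class User < ActiveRecord::Base
  def good_guess_count
    # Inside the model, call the association directly rather than @user.
    @correct_on_first_attempt = user_answers.distinct_correct.first_attempt_correct.count
  end
end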
You'll need to ensure that .first is actually getting their first attempt, probably by sorting by id or created_at.
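To make the "first attempt" logic concrete, here is a minimal plain-Ruby sketch that works on answers already loaded in one query, so no query is issued per question. The Answer struct, its created_at field (plain integers here instead of timestamps) and the correct_on_first_attempt helper are assumptions for illustration, not part of the original schema.

```ruby
# Stand-in for a UserAnswer row (assumption: these three fields exist).
Answer = Struct.new(:question_id, :is_correct_answer, :created_at)

# Hypothetical helper: counts questions whose earliest attempt was correct.
def correct_on_first_attempt(answers)
  answers
    .group_by(&:question_id)   # question_id => all attempts for it
    .values
    .count { |attempts| attempts.min_by(&:created_at).is_correct_answer }
end

ANSWERS = [
  Answer.new(1, true,  1),   # question 1: right the first time
  Answer.new(2, false, 2),   # question 2: wrong first ...
  Answer.new(2, true,  3),   # ... then right on the retry
  Answer.new(3, false, 4)    # question 3: never answered correctly
]

puts correct_on_first_attempt(ANSWERS)  # prints 1
```

Grouping in Ruby still loads every answer row, so for very large histories a database-side approach is likely preferable.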
As an aside, if you track the attempt number explicitly in UserAnswer, you can really tighten this up:
class UserAnswer < ActiveRecord::Base
  scope :correct, -> { where(is_correct_answer: true) }
  scope :first_attempt, -> { where(attempt: 1) }
end
class User < ActiveRecord::Base
  def lucky_guess_count
    # Inside the model, call the association directly rather than @user.
    @correct_on_first_attempt = user_answers.includes(:multiple_choice_question)
                                            .correct.first_attempt.count
  end
end
If you don't have an attempt number in your schema, you could .order and .group to get something similar. But...it seems that some of your project requirements depend on that sequence number, so I'd recommend adding it if you don't have it already.
ps. For fighting N+1 queries, use gem bullet. It is on-point.

Problems with displaying the correct number of items in an has_and_belongs_to_many association

I have a model UseCases (about 6.000 rows) and EducationalObjectives (about 4.000 rows) associated with has_and_belongs_to_many(EducationalObjectivesUseCases with about 8.000 rows). Some of the EducationalObjectives belong to subjectA (about 4.500 rows in EducationalObjectivesUseCases) and some to subjectB (about 3.500 rows in EducationalObjectivesUseCases).
Now I want to display a list of all UseCases which are tied to the EducationalObjectives of subjectA. That should be about 3.500 rows, but I get about 4.500 rows (you've guessed it: the number of associations within EducationalObjectivesUseCases), since UseCases with several EducationalObjectives on subjectA are displayed once per association.
My thinking was that I can only tell through the HABTM association that I need the UseCases for subjectA, but I don't know how to avoid the duplicate entries.
class UseCase < ApplicationRecord
  has_and_belongs_to_many :educational_objectives
end
class EducationalObjective < ApplicationRecord
  has_and_belongs_to_many :use_cases
end
class EducationalObjectivesUseCase < ApplicationRecord
  belongs_to :educational_objective
  belongs_to :use_case
end
class UseCasesController < ApplicationController
  def index
    @use_cases = UseCase.all.
      order(:use_case).
      joins(:educational_objectives).
      where('educational_objectives.subject_id = ?',2)
  end
end
How do I get Rails to display only the used UseCases for subjectA once (only 3.500 rows)? Where is my mistake?
Thanks in advance!
The quickest way to solve this is to call #distinct on the where-chain. Since the select is automatically set to use_cases.* this will work and filter out duplicated records.
def index
  @use_cases = UseCase.joins(:educational_objectives)
                      .where(educational_objectives: {subject_id: 2})
                      .order(:use_case)
                      .distinct
end
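Why the join returns duplicates, and what distinct removes, can be sketched in plain Ruby. The join rows below are made-up sample data and joined_use_case_ids is a hypothetical helper; each pair stands for one row of the join table.

```ruby
# Join rows as [use_case_id, educational_objective_id] pairs (assumed sample data).
JOIN_ROWS = [[1, 10], [1, 11], [2, 10], [3, 12], [3, 13]]

# Hypothetical helper: use case ids as a plain join returns them (with repeats).
def joined_use_case_ids(rows)
  rows.map(&:first)
end

p joined_use_case_ids(JOIN_ROWS)       # prints [1, 1, 2, 3, 3]
p joined_use_case_ids(JOIN_ROWS).uniq  # prints [1, 2, 3]
```

A use case appears once per matching objective in the joined result; uniq here plays the role that DISTINCT plays in the SQL.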
Alternatively this can be solved using a sub-query.
def index
  educational_objectives = EducationalObjective.where(subject_id: 2)
  use_case_ids = EducationalObjectivesUseCase
                   .where(educational_objective_id: educational_objectives)
                   .select(:use_case_id)
  @use_cases = UseCase.where(id: use_case_ids).order(:use_case)
end
edit
The sub-query code will execute 1 SQL query, just like the code for the distinct version. When running it in the console, suffix each statement with ;nil to prevent the console from calling #inspect on the relation (which it does to show you the result). If you don't, the console triggers a query for each intermediate relation before the final one is built. It will still work, but it looks like there are multiple queries.

Ruby on Rails 4 count distinct with inner join

I have created a validation rule to limit the number of records a member can create.
class Engine < ActiveRecord::Base
  validates :engine_code, presence: true
  belongs_to :group
  delegate :member, to: :group
  validate :engines_within_limit, on: :create
  def engines_within_limit
    if self.member.engines(:reload).distinct.count(:engine_code) >= self.member.engine_limit
      errors.add(:engine, "Exceeded engine limit")
    end
  end
end
The above doesn't work, specifically this part,
self.member.engines(:reload).distinct.count(:engine_code)
The query it produces is
SELECT "engines".*
FROM "engines"
INNER JOIN "groups"
ON "engines"."group_id" = "groups"."id"
WHERE "groups"."member_id" = $1 [["member_id", 22]]
and returns the count 0 which is wrong
Whereas the following
Engine.distinct.count(:engine_code)
produces the query
SELECT DISTINCT COUNT(DISTINCT "engines"."engine_code")
FROM "engines"
and returns 3 which is correct
What am I doing wrong? It is the same query just with a join?
After a long chat, we found that the query below works:
self.member
.engines(:reload)
.count("DISTINCT engine_code")
AR:: means ActiveRecord:: below.
The reason for the "wrong" result in the question is that the collection association isn't used correctly. A collection association (e.g. has_many) for a record is not an AR::Relation; it's an AR::Associations::CollectionProxy. It's a subclass of AR::Relation, and some methods, e.g. distinct, are overridden.
self.member.engines(:reload).distinct.count(:engine_code) will cause this to happen:
self.member.engines(:reload) is an AR::Associations::CollectionProxy.
.distinct on that will first fire the db read, then do a .to_a on the result, and then do "its own" distinct, which is a uniq on the array of records based on the id of each record. The result is an array.
.count(:engine_code) does Array#count on the array, which returns 0 since no record in the array equals the symbol :engine_code.
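The Array#count behaviour described above can be checked in plain Ruby. In this minimal sketch the Engine struct and the distinct_engine_codes helper are hypothetical stand-ins, and the engine codes are made-up sample values.

```ruby
# Stand-in for an Engine record (assumption: only engine_code matters here).
Engine = Struct.new(:id, :engine_code)

ENGINES = [Engine.new(1, "V8"), Engine.new(2, "V8"), Engine.new(3, "V6")]

# Array#count(obj) counts the elements equal to obj, so a symbol matches nothing.
puts ENGINES.count(:engine_code)       # prints 0

# Hypothetical helper: counting distinct codes in memory needs map + uniq.
def distinct_engine_codes(engines)
  engines.map(&:engine_code).uniq.size
end

puts distinct_engine_codes(ENGINES)    # prints 2
```

Once the records are an array, count(:engine_code) no longer means "count this column"; it compares each element with the symbol.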
To get the correct result you should use the relation of the association proxy, .scope:
self.member.engines(:reload).scope.distinct.count(:engine_code)
I think the way Rails handles collection associations is a little confusing. Many of the "normal" methods for relations work as usual, e.g. this will work without using .scope:
self.member.engines(:reload).where('true').distinct.count(:engine_code)
that is because where isn't overridden by AR::Associations::CollectionProxy.
Perhaps it would be better to always have to use .scope when using the collection as a relation.

How to efficiently update associated collection in rails (eager loading)

I have a simple association like
class Slot < ActiveRecord::Base
  has_many :media_items, dependent: :destroy
end
class MediaItem < ActiveRecord::Base
  belongs_to :slot
end
The MediaItems are ordered per Slot and have a field called ordering.
I want to avoid N+1 queries, but nothing I have tried works. I have read several blog posts, Railscasts etc., but they never operate on a single model and so on...
What I do is:
def update
  @slot = Slot.find(params.require(:id))
  media_items = @slot.media_items
  par = params[:ordering_media]
  # TODO: IMP remove n+1 query
  par.each do |item|
    item_id = item[:media_item_id]
    item_order = item[:ordering]
    media_items.find(item_id).update(ordering: item_order)
  end
  @slot.save
end
params[:ordering_media] is a json array with media_item_id and an integer for ordering
I tried things like
@slot = Slot.includes(:media_items).find(params.require(:id)) # still n+1
@slot = Slot.find(params.require(:id)).includes(:media_items) # not working at all b/c is a Slot already
media_items = @slot.media_items.to_a # looks good but then in the array of MediaItems it is difficult to retrieve the right instance in my loop
This seems like a common thing to do, so I think there is a simple approach to solve this. Would be great to learn about it.
First of all, at this line media_items.find(item_id).update(ordering: item_order) you don't have an n + 1 issue, you have a 2 * n issue, because for each media_item you make 2 queries: one for find, one for update. To fix this you can do:
params[:ordering_media].each do |item|
  # On Rails 4+, update_all takes only the updates; conditions go in where.
  MediaItem.where(id: item[:media_item_id]).update_all(ordering: item[:ordering])
end
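Here you have n queries. Through the ActiveRecord query interface that is about the best we can do: updating one column on n records with n distinct values takes n update_all calls, unless you hand-write a single SQL UPDATE (for example with a CASE expression).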
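As a hedged aside: if raw SQL is acceptable, the n updates can in principle be folded into one statement with a CASE expression. The sketch below only builds the SQL string in plain Ruby. The build_ordering_update helper is hypothetical, the table and column names (media_items, ordering, id) are taken from the question, and the items are assumed to be hashes with symbol keys.

```ruby
# Hypothetical helper: builds one UPDATE statement for all items.
# Integer() raises on non-numeric input, so no untrusted text reaches the SQL.
def build_ordering_update(items)
  pairs = items.map { |item| [Integer(item[:media_item_id]), Integer(item[:ordering])] }
  return nil if pairs.empty?

  cases = pairs.map { |id, ordering| "WHEN #{id} THEN #{ordering}" }.join(" ")
  ids   = pairs.map(&:first).join(", ")
  "UPDATE media_items SET ordering = CASE id #{cases} END WHERE id IN (#{ids})"
end

ITEMS = [
  { media_item_id: 7, ordering: 2 },
  { media_item_id: 9, ordering: 1 }
]

puts build_ordering_update(ITEMS)
# UPDATE media_items SET ordering = CASE id WHEN 7 THEN 2 WHEN 9 THEN 1 END WHERE id IN (7, 9)
```

The resulting string could be passed to ActiveRecord::Base.connection.execute. Like update_all, this bypasses validations and callbacks, so whether it is worth the extra complexity depends on how many items are reordered at once.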
Now you can remove the lines @slot = Slot.find(params.require(:id)) and @slot.save, because @slot is neither modified nor used in the update action.
With this refactor, we see a problem: the action SlotsController#update doesn't update the slot at all. A better place for this code could be MediaItemsController#sort or SortMediaItemsController#update (more RESTful).
Finally, @slot = Slot.includes(:media_items).find(params.require(:id)) is not an n + 1 query; it is 2 SQL statements, because you retrieve n media_items and 1 slot with only 2 db calls. It is also the best option.
I hope it helps.

ActiveRecord query array intersection?

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select{|x| x.tags & Article::EXPERT_TAGS}.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!
Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread How to check if an array field is a part of another array in MongoDB?
Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope of improvement, since you need to get all the data into Ruby for processing.
However, there is one problem with your database query
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
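Once only the tag lists are loaded, the intersection count can be done in Ruby. This is a minimal sketch with made-up sample values: EXPERT_TAGS stands in for Article::EXPERT_TAGS, and count_with_any_tag is a hypothetical helper. Note that the block must test the intersection with any?, because an empty array is still truthy in Ruby.

```ruby
# Assumed sample values; the real list lives in Article::EXPERT_TAGS.
EXPERT_TAGS = ["Guest Writer", "Press Release"].freeze

# Hypothetical helper: counts tag lists sharing at least one tag with `wanted`.
def count_with_any_tag(tag_lists, wanted)
  tag_lists.count { |tags| (tags & wanted).any? }
end

TAG_LISTS = [
  ["Guest Writer", "News Article"],  # shares "Guest Writer"
  ["News Article"],                  # no overlap
  ["Press Release"]                  # shares "Press Release"
]

puts count_with_any_tag(TAG_LISTS, EXPERT_TAGS)             # prints 2

# Without any?, every list passes the filter, even with no overlap.
puts TAG_LISTS.select { |tags| tags & EXPERT_TAGS }.size    # prints 3
```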
I answered a question regarding general intersection like queries in ActiveRecord here.
Extracted below:
The following is a general approach I use for constructing intersection like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person
  def self.with_types(*types)
    where(service_type: types)
  end
end
class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end
class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people
  def self.with_cities(cities)
    where(city_id: cities)
  end
  # intersection like query
  # Note: `scoped` is the Rails 3 API; on Rails 4+ use `all` instead.
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.
Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.
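The effect of AND-ing the subqueries together can be sketched in plain Ruby as an intersection of id lists. The sample ids and the ids_in_all helper are hypothetical; each array stands for the ids one subquery would return.

```ruby
# Each array stands for the ids returned by one subquery (assumed sample data).
SUBQUERY_RESULTS = [
  [1, 2, 3, 4],  # people with service type 1
  [2, 4, 6],     # people with service type 2
  [4, 2, 9]      # people in the chosen city
]

# Hypothetical helper: ids present in every result set.
def ids_in_all(result_sets)
  result_sets.reduce(:&) || []
end

p ids_in_all(SUBQUERY_RESULTS)  # prints [2, 4]
```

Adding another subquery can only keep the matching set the same or shrink it, which is the same narrowing the chained where(id: subquery) calls perform in SQL.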

Resources