Does splitting up an active record query over 2 methods hit the database twice? - ruby-on-rails

I have a database query where I want to get an array of distinct Users for the set:
# @range is a predefined date range
# @shift_list is a list of filtered shifts
def listing
  Shift
    .where(date: @range, shiftname: @shift_list)
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
and I read somewhere that for readability, for isolated testing, or for code reuse, you could split this into separate methods:
def listing
  shift_list
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
def shift_list
  Shift
    .where(date: @range, shiftname: @shift_list)
end
So I rewrote this and some other code, and now the page takes 4 times as long to load.
My question is, does this type of method splitting cause the database to be hit twice? Or is it something that I did elsewhere?
And I'd love a suggestion to improve the efficiency of this code.
Further to the need to remove mapping from the code, this shift list is being created with the following code:
def _month_shift_list
  Shift
    .select(:shiftname)
    .distinct
    .where(date: @range)
    .map { |x| x.shiftname }
end
My intention is to create an array of shiftnames as strings.
I am obviously missing some key understanding in database access, as this method is clearly creating part of the problem.
And I think I have found the solution to this with the following:
def month_shift_list
  Shift
    .where(date: @range)
    .pluck(:shiftname)
    .uniq
end

Nope, the database will not be hit twice. The queries in both methods are lazily loaded, so splitting the chain across methods still produces a single query for the shifts. The slow page load comes from the map block: it calls User.find once per row, which translates to one SELECT per user. You can re-write your query to this:
def listing
  User.
    joins(:shift).
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
This has just one hit to the DB and will be much faster and should produce the same result as above.
The assumption here is that there is a has_one/has_many relationship on the User model for Shifts:
class User < ActiveRecord::Base
  has_one :shift
end
If you don't want to establish the has_one/has_many relationship on User, you can re-write it to:
def listing
  User.
    joins("INNER JOIN shifts ON shifts.user_id = users.id").
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
ALTERNATIVE:
You can use 2 queries if you experience issues with using ActiveRecord#merge.
def listing
  user_ids = Shift.where(date: @range, shiftname: @shift_list).distinct.pluck(:user_id).sort
  User.find(user_ids)
end
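To make the lazy-loading point concrete, here is a toy model in plain Ruby. FakeRelation, find_user, QUERY_LOG and SHIFTS are made-up stand-ins, not ActiveRecord; the sketch only assumes that a relation runs its query when it is enumerated and that each find costs one query.

```ruby
# Toy stand-in for a lazy relation. This is NOT ActiveRecord; it only mimics
# the "build now, run later" behaviour so the query count can be observed.
class FakeRelation
  include Enumerable

  def initialize(log, rows, filters = [])
    @log = log
    @rows = rows
    @filters = filters
  end

  # Chaining returns a new relation and runs nothing.
  def where(&block)
    FakeRelation.new(@log, @rows, @filters + [block])
  end

  # Enumerating is what finally "hits the database".
  def each(&block)
    @log << "SELECT shifts"
    @rows.select { |row| @filters.all? { |f| f.call(row) } }.each(&block)
  end
end

QUERY_LOG = []
SHIFTS = [
  { user_id: 1, shiftname: "early" },
  { user_id: 2, shiftname: "late" },
  { user_id: 1, shiftname: "late" }
].freeze

# Hypothetical equivalent of User.find: one logged query per call.
def find_user(id)
  QUERY_LOG << "SELECT users WHERE id = #{id}"
  { id: id }
end

def shift_list
  FakeRelation.new(QUERY_LOG, SHIFTS).where { |s| s[:shiftname] == "late" }
end

def listing
  shift_list.map { |s| s[:user_id] }.uniq.map { |id| find_user(id) }
end

relation = shift_list
queries_after_building = QUERY_LOG.size # building the relation ran nothing

users = listing
queries_after_listing = QUERY_LOG.size  # one shift query plus one per user
```

In this model, splitting the chain into shift_list adds no query at all; the extra queries come from the per-user lookup inside map, which is what the join-based versions avoid.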

Related

Rails code refactor in call method to handle map

I'm wondering whether someone could take a fresh look at the code below and suggest a refactor.
def call
  inq_proc_ids = InquiryProcess.all.includes(inquiry_field_responses: :inquiry_field).select do |process|
    process.inquiry_field_responses.select do |inquiry_field_responses|
      inquiry_field_responses.inquiry_field.name == 'company_name'
    end.last&.value&.start_with?(company_filter)
  end.map(&:id)
  InquiryProcess.where(id: inq_proc_ids)
end
I think I should leave only InquiryProcess.where(id: inq_proc_ids) in my call method, but I don't know how to handle all the .last&.value&.start_with?(company_filter) and .map(&:id) parts.
EDIT:
I tried to split it into new methods
def call
  InquiryProcess.where(id: inquiry_process_id)
end
private
attr_reader :company_filter, :inquiry_field_response
def inquiry_process_id
  InquiryProcess.all.includes(inquiry_field_responses: :inquiry_field).select do |process|
    process.inquiry_field_responses.select_company_name
  end.map(&:id)
end
def select_company_name
  select do |inquiry_field_responses|
    inquiry_field_responses.inquiry_field.name == 'company_name'
  end.last&.value&.start_with?(company_filter)
end
but I got an error:
NoMethodError (undefined method `select_company_name' for #<ActiveRecord::Associations::CollectionProxy []>):
The code you posted is hard to follow, and on top of that I remember we had a massive memory leak connected to ActiveRecord caching when using precalculated ids in a query.
That said, I'd try to utilise the above within a single sql query:
# value is read from the response in the question, so the LIKE condition
# targets InquiryFieldResponse rather than InquiryField
def call
  id_select = InquiryProcess
    .joins(inquiry_field_responses: :inquiry_field)
    .where(inquiry_fields: { name: 'company_name' })
    .where(InquiryFieldResponse.arel_table[:value].matches("#{company_filter}%"))
    .select(:id)
  InquiryProcess.where(id: id_select)
end
Note that id_select is not an array of ids but an ActiveRecord scope, so the above will translate to the following SQL:
SELECT "inquiry_processes".*
FROM "inquiry_processes"
WHERE "inquiry_processes"."id" IN (
  SELECT "inquiry_processes"."id"
  FROM "inquiry_processes"
  INNER JOIN ...
  WHERE ...
)
And to answer another question: why do we query the table by matching id against the result of a subquery on the same table? This is to avoid all sorts of painful issues when you deal with an active record relation that has a join in it, e.g. it would affect all further includes statements, as the preloaded association would only include records matching the relation's join conditions.
I really hope for you that this bit is quite well tested or you have someone who can verify validity of the behaviour.

Are .select and/or .where responsible for causing N+1 queries in rails?

I have two methods here, distinct_question_ids and @correct_on_first_attempt. The goal is to show a user how many distinct multiple choice questions they have answered correctly.
The second one will let me know how many of these distinct MCQs have been answered correctly on the first attempt. (A user can attempt a MCQ many times)
Now, when a user answers thousands of questions and has thousands of user answers, the page to show their performance is taking 30 seconds to a minute to load. And I believe it's due to the .select method, but I don't know how to replace .select without using .select, since it loops just like .each
Is there any method that doesn't cause N+1?
distinct_question_ids = @user.user_answers.includes(:multiple_choice_question).
  where(is_correct_answer: true).
  distinct.pluck(:multiple_choice_question_id)
@correct_on_first_attempt = distinct_question_ids.select { |qid|
  @user.user_answers.
    where(multiple_choice_question_id: qid).first.is_correct_answer
}.count
.pluck returns an Array of values, not an ActiveRecord::Relation.
So when you do distinct_question_ids.select you're not calling ActiveRecord's select, but Array's select. Within that select, you're issuing a fresh new query against @user for every id you just plucked, including the ones that get rejected in the select.
You could create a query named distinct_questions that returns a relation (no pluck!), and then build correct_on_first_attempt off of that, and I think you'll avoid the N+1 queries.
Something along these lines:
class UserAnswer < ActiveRecord::Base
  scope :distinct_correct, -> { includes(:multiple_choice_question)
                                  .where(is_correct_answer: true).distinct }
  # NOTE: .first returns a single record rather than a relation, so treat this
  # scope as pseudocode for "keep only questions whose first answer was correct".
  scope :first_attempt_correct, -> { distinct_correct
                                       .first.is_correct_answer }
end
class User < ActiveRecord::Base
  def good_guess_count
    @correct_on_first_attempt = user_answers.distinct_correct.first_attempt_correct.count
  end
end
You'll need to ensure that .first is actually getting their first attempt, probably by sorting by id or created_at.
As an aside, if you track the attempt number explicitly in UserAnswer, you can really tighten this up:
class UserAnswer < ActiveRecord::Base
  scope :correct, -> { where(is_correct_answer: true) }
  scope :first_attempt, -> { where(attempt: 1) }
end
class User < ActiveRecord::Base
  def lucky_guess_count
    @correct_on_first_attempt = user_answers.includes(:multiple_choice_question)
                                            .correct.first_attempt.count
  end
end
If you don't have an attempt number in your schema, you could .order and .group to get something similar. But...it seems that some of your project requirements depend on that sequence number, so I'd recommend adding it if you don't have it already.
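Without an attempt column, the order-and-group idea can also be approximated in plain Ruby once the answers have been loaded in a single query. The sketch below is only an illustration: Answer is a hypothetical stand-in for UserAnswer rows, and it assumes created_at reflects the order of attempts.

```ruby
# Plain-Ruby sketch: count questions whose earliest attempt was correct.
# Answer is a hypothetical stand-in for UserAnswer rows fetched in ONE query.
Answer = Struct.new(:question_id, :is_correct_answer, :created_at)

def correct_on_first_attempt(answers)
  answers
    .group_by(&:question_id) # one bucket of attempts per question
    .count { |_qid, attempts| attempts.min_by(&:created_at).is_correct_answer }
end

answers = [
  Answer.new(1, false, 1), # question 1: wrong first, right later
  Answer.new(1, true, 2),
  Answer.new(2, true, 1),  # question 2: right on the first try
  Answer.new(3, true, 5),  # question 3: right first, wrong later
  Answer.new(3, false, 9)
]

first_try_count = correct_on_first_attempt(answers)
```

This trades the per-question queries for one pass over an in-memory array, which is usually fine for a few thousand rows; for much larger histories the explicit attempt column is likely the better route.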
ps. For fighting N+1 queries, use gem bullet. It is on-point.

How to efficiently update associated collection in rails (eager loading)

I have a simple association like
class Slot < ActiveRecord::Base
  has_many :media_items, dependent: :destroy
end
class MediaItem < ActiveRecord::Base
  belongs_to :slot
end
The MediaItems are ordered per Slot and have a field called ordering.
I want to avoid n+1 querying, but nothing I tried works. I have read several blog posts, Railscasts, etc., but they never operate on a single model and so on...
What I do is:
def update
  @slot = Slot.find(params.require(:id))
  media_items = @slot.media_items
  par = params[:ordering_media]
  # TODO: IMP remove n+1 query
  par.each do |item|
    item_id = item[:media_item_id]
    item_order = item[:ordering]
    media_items.find(item_id).update(ordering: item_order)
  end
  @slot.save
end
params[:ordering_media] is a json array with media_item_id and an integer for ordering
I tried things like
@slot = Slot.includes(:media_items).find(params.require(:id)) # still n+1
@slot = Slot.find(params.require(:id)).includes(:media_items) # not working at all b/c is a Slot already
media_items = @slot.media_items.to_a # looks good but then in the array of MediaItems it is difficult to retrieve the right instance in my loop
This seems like a common thing to do, so I think there is a simple approach to solve this. Would be great to learn about it.
First of all, the line media_items.find(item_id).update(ordering: item_order) is not an n + 1 issue but a 2 * n issue, because for each media_item you make 2 queries: one for the find, one for the update. To fix it you can do this:
params[:ordering_media].each do |item|
  MediaItem.where(id: item[:media_item_id]).update_all(ordering: item[:ordering])
end
Here you have n queries. Short of hand-writing a single UPDATE with a CASE expression, that is the best we can do: one statement per record when n records get n distinct values.
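If n round trips ever become a bottleneck, one commonly used workaround is a single UPDATE with a CASE expression. The following is a rough sketch that only builds the SQL string; the helper name bulk_ordering_sql is made up, and the table and column names are taken from the question.

```ruby
# Builds ONE UPDATE statement that sets a different ordering per row.
# Table and column names follow the question; the helper name is hypothetical.
def bulk_ordering_sql(items)
  # Integer() raises on anything non-numeric, so no raw params reach the SQL.
  pairs = items.map { |item| [Integer(item[:media_item_id]), Integer(item[:ordering])] }
  return nil if pairs.empty?

  cases = pairs.map { |id, ordering| "WHEN #{id} THEN #{ordering}" }.join(" ")
  ids = pairs.map(&:first).join(", ")
  "UPDATE media_items SET ordering = CASE id #{cases} END WHERE id IN (#{ids})"
end

sql = bulk_ordering_sql([
  { media_item_id: 7, ordering: 2 },
  { media_item_id: "9", ordering: "1" }
])
```

In a Rails app the resulting string could be handed to MediaItem.connection.execute. Like update_all, this skips validations and callbacks, and the one-query-per-item loop stays the simpler option for short lists.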
Now you can remove the lines @slot = Slot.find(params.require(:id)) and @slot.save, because @slot is neither modified nor used in the update action.
With this refactor, we see a problem: the action SlotsController#update doesn't update the slot at all. A better place for this code could be MediaItemsController#sort or SortMediaItemsController#update (more RESTful).
Finally, @slot = Slot.includes(:media_items).find(params.require(:id)) is not an n + 1 query but a query of 2 SQL statements, because you retrieve n media_items and 1 slot with only 2 db calls. It's also the best option.
I hope it helps.

Rails best way to get previous and next active record object

I need to get the previous and next active record objects with Rails. I did it, but don't know if it's the right way to do that.
What I've got:
Controller:
@product = Product.friendly.find(params[:id])
order_list = Product.select(:id).all.map(&:id)
current_position = order_list.index(@product.id)
@previous_product = @collection.products.find(order_list[current_position - 1]) if order_list[current_position - 1]
@next_product = @collection.products.find(order_list[current_position + 1]) if order_list[current_position + 1]
@previous_product ||= Product.last
@next_product ||= Product.first
product_model.rb
default_scope -> {order(:product_sub_group_id => :asc, :id => :asc)}
So, the problem here is that I need to go to my database and get all these ids to know which is the previous and which is the next.
I tried to use the order_query gem, but it did not work for me, and I noted that it goes to the database and fetches all the records in that order; that's why I did the same but getting only the ids.
All the solutions that I found were for simple order queries: ordering by id or by something like a priority field.
Write these methods in your Product model:
class Product
  def next
    self.class.where("id > ?", id).first
  end
  def previous
    self.class.where("id < ?", id).last
  end
end
Now you can do in your controller:
@product = Product.friendly.find(params[:id])
@previous_product = @product.previous
@next_product = @product.next
Please try it, but it's not tested.
Thanks
I think it would be faster to do it with only two SQL requests that select only two rows (and not the entire table). Considering that your default order is sorted by id (otherwise, force the sorting by id):
@previous_product = Product.where('id < ?', params[:id]).last
@next_product = Product.where('id > ?', params[:id]).first
If the product is the last, then @next_product will be nil, and if it is the first, then @previous_product will be nil.
There's no easy out-of-the-box solution.
A slightly dirty but workable way is to carefully sort out which conditions identify the next and previous items. With id it's quite easy, since all ids are different, and Rails Guy's answer describes just that: for next, given a known id, pick the first entry with a larger id (if results are ordered by id, as per defaults). More than that, his answer hints at placing next and previous into the model class. Do so.
If there are multiple order criteria, things get complicated. Say we have a set of rows sorted first by a group parameter (which can have equal values on different rows) and then by id (which is guaranteed to be different everywhere). Results are ordered by group and then by id (both ascending), so there are two situations to cover when getting the next element. It is the first one from the list of elements that either:
have the same group and a larger id
have a larger group
The same goes for the previous element: you need the last one from the list of elements that either:
have the same group and a smaller id
have a smaller group
Those fetch all next and previous entries respectively. If you need only one, use Rails' first and last (as suggested by Rails Guy) or limit(1) (and be wary of the asc/desc ordering).
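The two-key rule above can be sketched in plain Ruby. Row is a made-up stand-in for a record ordered by (group, id), and the helper names are only illustrative; in a real app the same conditions would go into the WHERE clause of a query.

```ruby
# Plain-Ruby sketch of the two-key rule. Row stands in for a record ordered
# by (group, id); this is an in-memory illustration, not a database query.
Row = Struct.new(:group, :id)

def sort_key(row)
  [row.group, row.id]
end

# Next: same group with a larger id, or any larger group; take the first.
def next_row(rows, current)
  rows
    .select { |r| (r.group == current.group && r.id > current.id) || r.group > current.group }
    .min_by { |r| sort_key(r) }
end

# Previous: same group with a smaller id, or any smaller group; take the last.
def previous_row(rows, current)
  rows
    .select { |r| (r.group == current.group && r.id < current.id) || r.group < current.group }
    .max_by { |r| sort_key(r) }
end

rows = [Row.new(1, 4), Row.new(1, 9), Row.new(2, 3), Row.new(2, 7)]
current = rows[1]                        # group 1, id 9
following = next_row(rows, current)      # group 2, id 3
preceding = previous_row(rows, current)  # group 1, id 4
```

Note that the next row here has a smaller id than the current one, which is exactly the case a plain "id > ?" condition gets wrong when the default order starts with another column.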
This is what order_query does. Please try the latest version, I can help if it doesn't work for you:
class Product < ActiveRecord::Base
  order_query :my_order,
    [:product_sub_group_id, :asc],
    [:id, :asc]
  default_scope -> { my_order }
end
@product.my_order(@collection.products).next
@collection.products.my_order_at(@product).next
This runs one query loading only the next record. Read more on GitHub.

How do I calculate the most popular combination of order lines? (or any similar order/order lines db arrangement)

I'm using Ruby on Rails. I have a couple of models which fit the normal order/order lines arrangement, i.e.
class Order
  has_many :order_lines
end
class OrderLine
  belongs_to :order
  belongs_to :product
end
class Product
  has_many :order_lines
end
(greatly simplified from my real model!)
It's fairly straightforward to work out the most popular individual products via order line, but what magical ruby-fu could I use to calculate the most popular combination(s) of products ordered.
Cheers,
Graeme
My suggestion is to create an array a of Product.id numbers for each order and then do the equivalent of
h = Hash.new(0)
# for each a
h[a.sort.hash] += 1
You will naturally need to consider the scale of your operation and how much you are willing to approximate the results.
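As a rough illustration of that counting idea, here is a small sketch. It departs from the snippet above in one respect: it uses the sorted id array itself as the hash key rather than a.sort.hash, so the combination can be read back directly. The helper name and the sample orders are invented for the example.

```ruby
# Tally product combinations per order. The sorted id array itself is the
# hash key, so the winning combination can be read back without a lookup.
def combo_counts(orders)
  counts = Hash.new(0)
  orders.each { |product_ids| counts[product_ids.uniq.sort] += 1 }
  counts
end

# Hypothetical data: each inner array holds the product ids of one order.
orders = [[3, 1], [1, 3], [2], [1, 3, 3], [2, 5]]

counts = combo_counts(orders)
top_combo, top_count = counts.max_by { |_combo, n| n }
```

Calling uniq means an order with two of the same product counts as the same combination as an order with one; drop it if quantities should distinguish combinations.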
External Solution
Create a "Combination" model and index the table by the hash, then each order could increment a counter field. Another field would record exactly which combination that hash value referred to.
In-memory Solution
Look at the last 100 orders and recompute the order popularity in memory when you need it. Hash#sort will give you a sorted list of popularity hashes. You could either make a composite object that remembered what order combination was being counted, or just scan the original data looking for the hash value.
Thanks for the tip digitalross. I followed the external solution idea and did the following. It varies slightly from the suggestion as it keeps a record of individual order_combos, rather than storing a counter so it's possible to query by date as well e.g. most popular top 10 orders in the last week.
I created a method in my order which converts the list of order items to a comma separated string.
def to_s
  # use product ids (not order line ids) so identical baskets give identical strings
  order_lines.map(&:product_id).sort.join(",")
end
I then added a filter so the combo is created every time an order is placed.
after_save :create_order_combo
def create_order_combo
  oc = OrderCombo.create(:user => user, :combo => self.to_s)
end
And finally my OrderCombo class looks something like below. I've also included a cached version of the method.
class OrderCombo < ActiveRecord::Base
  belongs_to :user
  scope :by_user, lambda { |user| where(:user_id => user.id) }
  # returns [combo, count] pairs, most frequent first
  def self.top_n_orders_by_user(user, count = 10)
    OrderCombo.by_user(user).group(:combo).count.sort_by { |_combo, n| -n }.first(count)
  end
  def self.cached_top_orders_by_user(user, count = 10)
    Rails.cache.fetch("order_combo_#{user.id}_#{count}", :expires_in => 10.minutes) { OrderCombo.top_n_orders_by_user(user, count) }
  end
end
It's not perfect as it doesn't take into account increased popularity when someone orders more of one item in an order.
