Ruby - how to speed up looping through an ".each" array? - ruby-on-rails

I have these models, and the following lines in a method whose performance I am trying to improve.
class Location < ActiveRecord::Base
  belongs_to :company
end
class Company < ActiveRecord::Base
  has_many :locations
end
In the method:
locations_company = []
###
found_locations = Location.within(distance, origin: from_result.split(',')).order("distance ASC")
### 0.002659s
###
found_locations.each do |location|
  locations_company << location.company
end
### 45.972285s
###
companies = locations_company.uniq{|x| x.id}
### 0.033029s
The code works like this: first, grab all locations within the specified radius. Then take the company from each found row and save it into the prepared array. This is the problematic part: the each loop takes 45 seconds to process.
Finally, remove duplicates from this newly created array.
I'm still wondering whether there is a better approach to this situation, but I'm afraid I don't see it right now. So I'd like to ask you how I could speed up the .each loop that saves data into the array. Is there a better method in Ruby to grab some information from an object?
Thank you very much for your time. I have been immersed in this problem the whole day, but I still don't have a more efficient solution.

The best way would be to not loop. Your end goal appears to be to find all the companies in the specified area.
found_locations = Location.within(distance, origin: from_result.split(',')).order("distance ASC")
companies = Company.where(id: found_locations.pluck(:company_id).uniq)
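The difference can be seen without a database. Below is a plain-Ruby sketch in which `FakeDB` and the `Location`/`Company` structs are hypothetical stand-ins, and each `FakeDB` method call counts as one round trip. Looking up the company per location costs one query per location, while collecting the unique ids first costs a single query, as `Company.where(id: ...)` would.

```ruby
# Hypothetical in-memory stand-ins for the locations and companies tables.
Location = Struct.new(:id, :company_id)
Company  = Struct.new(:id, :name)

class FakeDB
  attr_reader :queries

  def initialize(companies)
    @companies = companies.to_h { |c| [c.id, c] }
    @queries = 0
  end

  # Like `location.company`: one query per call.
  def find_company(id)
    @queries += 1
    @companies.fetch(id)
  end

  # Like `Company.where(id: ids)`: one query for the whole id list.
  def companies_where_ids(ids)
    @queries += 1
    ids.map { |id| @companies.fetch(id) }
  end
end

companies = [Company.new(1, "Acme"), Company.new(2, "Globex")]
locations = [Location.new(10, 1), Location.new(11, 2), Location.new(12, 1)]

# Original approach: one company lookup per location, then uniq in Ruby.
db = FakeDB.new(companies)
slow = locations.map { |loc| db.find_company(loc.company_id) }.uniq(&:id)
slow_queries = db.queries # 3, one per location

# Collect the unique ids first, then do a single lookup.
db = FakeDB.new(companies)
fast = db.companies_where_ids(locations.map(&:company_id).uniq)
fast_queries = db.queries # 1
```

Both variants return the same companies; only the number of round trips differs, and with thousands of locations that difference dominates.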

The problem is not in the each, but in the fact that the query only begins executing when you start iterating over it. found_locations is not the result of the query; it is a query builder that will execute the query once it is needed (such as when you start iterating over the results).
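To illustrate this lazy evaluation, here is a hypothetical, minimal query builder in plain Ruby (a sketch, not ActiveRecord's actual implementation). Chaining `where` only records conditions; the rows are computed the first time the builder is iterated and then cached, much like a loaded relation. Building the chain across several helper methods behaves the same way.

```ruby
# Hypothetical minimal query builder mimicking ActiveRecord's laziness:
# chaining only records conditions; the "query" runs on first iteration.
class LazyQuery
  include Enumerable

  attr_reader :executions

  def initialize(rows, conditions = [])
    @rows = rows
    @conditions = conditions
    @executions = 0
    @loaded = nil
  end

  # Returns a new builder with one more condition; nothing runs here.
  def where(&condition)
    LazyQuery.new(@rows, @conditions + [condition])
  end

  # Runs the "query" once and caches the rows, like a loaded relation.
  def to_a
    @loaded ||= begin
      @executions += 1
      @rows.select { |row| @conditions.all? { |c| c.call(row) } }
    end
  end

  def each(&block)
    to_a.each(&block)
  end
end

rows = (1..10).to_a
query = LazyQuery.new(rows).where(&:even?) # nothing executed yet
before = query.executions                   # 0
query.each { |n| n }                        # first iteration runs the query
query.each { |n| n }                        # reuses the cached rows
after = query.executions                    # 1

# Splitting the chain across methods only passes the builder around;
# it still runs once, when the final builder is iterated.
def base_query(rows)
  LazyQuery.new(rows).where(&:even?)
end

def listing(rows)
  base_query(rows).where { |n| n > 4 }
end
```

This is why a timer wrapped around the line that builds the query reports almost nothing: the cost shows up wherever the results are first iterated.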

I believe the thing that takes all the time is not the each, but rather the query to the db.
The first line, although it builds the query, does not actually run it.
I believe that if you write the code as follows:
locations_company = []
found_locations = Location.within(distance, origin: from_result.split(',')).order("distance ASC")
### this line will take most of the time
found_locations = found_locations.to_a
###
###
found_locations.each do |location|
  locations_company << location.company_id
end
###
###
company_ids = locations_company.uniq
###
You'll see that the each will take a lot less time. You should look into optimizing the query.
As @AlexPeachey commented below, location.company also runs a query for each location in the list, since it is an association. You might want to eager load the company by adding:
found_locations = Location.includes(:company).within(distance, origin: from_result.split(',')).order("distance ASC")

Related

Does splitting up an active record query over 2 methods hit the database twice?

I have a database query where I want to get an array of Users that are distinct for the set:
# @range is a predefined date range
# @shift_list is a list of filtered shifts
def listing
  Shift
    .where(date: @range, shiftname: @shift_list)
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
I read somewhere that, for readability, isolation in tests, or code reuse, you could split this into separate methods:
def listing
  shift_list
    .select(:user_id)
    .distinct
    .map { |id| User.find(id.user_id) }
    .sort
end
def shift_list
  Shift
    .where(date: @range, shiftname: @shift_list)
end
So I rewrote this and some other code, and now the page takes 4 times as long to load.
My question is, does this type of method splitting cause the database to be hit twice? Or is it something that I did elsewhere?
And I'd love a suggestion to improve the efficiency of this code.
Further to the need to remove mapping from the code, this shift list is being created with the following code:
def _month_shift_list
  Shift
    .select(:shiftname)
    .distinct
    .where(date: @range)
    .map { |x| x.shiftname }
end
My intention is to create an array of shiftnames as strings.
I am obviously missing some key understanding of database access, as this method is clearly creating part of the problem.
And I think I have found the solution to this with the following:
def month_shift_list
  Shift
    .where(date: @range)
    .pluck(:shiftname)
    .uniq
end
Nope, the database will not be hit twice. The queries in both methods are lazily loaded. The issue with the slow page load times is that the map now has to do multiple finds, which translate into multiple SELECTs against the DB. You can rewrite your query like this:
def listing
  User.
    joins(:shift).
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
This hits the DB only once, so it should be much faster, and it should produce the same result as above.
The assumption here is that the User model has a has_one/has_many relationship with Shift:
class User < ActiveRecord::Base
  has_one :shift
end
If you don't want to establish the has_one/has_many relationship on User, you can re-write it to:
def listing
  User.
    joins("INNER JOIN shifts on shifts.user_id = users.id").
    merge(Shift.where(date: @range, shiftname: @shift_list)).
    distinct.
    sort
end
ALTERNATIVE:
You can use 2 queries if you experience issues with using ActiveRecord#merge.
def listing
  user_ids = Shift.where(date: @range, shiftname: @shift_list).distinct.pluck(:user_id).sort
  User.find(user_ids)
end

Ruby on Rails inefficient database query

My Rails app has the following conditions:
Each Style has many Bookings
Each Booking has a single warehouse value, and a single netbooked value
I need to update the warehouse_netbooked column of every Style with a hash containing the total netbooked sum for each warehouse across all of the style's bookings.
My current code works, but is way too slow (each iteration is taking ~0.5s, and there are thousands of styles):
def assign_warehouse_bookings
  warehouses = ["WH1","WH2","WH3"]
  Style.all.each do |s|
    style_warehouse_bookings = Hash.new
    warehouses.each do |wh|
      total_netbooked = s.bookings.where(warehouse: wh).sum(:netbooked)
      style_warehouse_bookings[wh] = total_netbooked
    end
    s.update(warehouse_netbooked: "#{style_warehouse_bookings}")
  end
end
Here is a small change to your code that avoids running so many queries, following @eric-duminil's advice:
warehouses = ["WH1", "WH2", "WH3"]
@styles = Style.includes(:bookings)
@styles.each do |s|
  style_warehouse_bookings = Hash.new
  warehouses.each do |wh|
    total_netbooked = s.bookings.map { |book| book.warehouse.eql?(wh) ? book.netbooked : 0 }.sum
    style_warehouse_bookings[wh] = total_netbooked
  end
  s.update(warehouse_netbooked: "#{style_warehouse_bookings}")
end
I hope it helps.
In your case, for every style you query the bookings three times, once per warehouse, to sum netbooked for each warehouse. This is highly inefficient.
A good rule is to first fetch the required data from the database, and then use Ruby to process it. It's important to fetch only the data you need. So in your case, you can fetch the styles together with all their related bookings, then iterate through each style's bookings to build the style_warehouse_bookings hash.
use find_each instead of each
use includes or preload to preload data.
Here is a simple example that should improve performance:
warehouses = ["WH1", "WH2", "WH3"]
# preload bookings with style; `preload` is used explicitly instead of `includes` to prevent cross join queries.
# distinct keeps the join from yielding the same style once per matching booking row.
styles = Style.joins(:bookings).where('bookings.warehouse' => warehouses).distinct.preload(:bookings)
# find_each fetches data in batches of 1000 records
styles.find_each do |s|
  style_warehouse_bookings = Hash.new
  warehouses.each do |wh|
    # Ruby's select and sum are used instead of ActiveRecord's where and sum
    total_netbooked = s.bookings.select { |booking| booking.warehouse == wh }.sum(&:netbooked)
    style_warehouse_bookings[wh] = total_netbooked
  end
  s.update(warehouse_netbooked: "#{style_warehouse_bookings}")
end
Read more about preload, includes and joins in the documentation on eager loading associations. Apart from that, I wrote an article on when to use preload, includes and joins here, which may help.
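Another option is to let the database do all the summing in one grouped query, for example `Booking.where(warehouse: warehouses).group(:style_id, :warehouse).sum(:netbooked)`, which in ActiveRecord returns a hash keyed by `[style_id, warehouse]` pairs (assuming the bookings foreign key is `style_id`). The plain-Ruby sketch below assumes that hash shape; `per_style_warehouse_totals` is a hypothetical helper that turns it into one hash per style, with 0 for warehouses that have no bookings.

```ruby
# Reshape {[style_id, warehouse] => total} into
# {style_id => {warehouse => total}}, filling in 0 for missing warehouses.
def per_style_warehouse_totals(grouped_sums, warehouses)
  result = {}
  grouped_sums.each do |(style_id, warehouse), total|
    next unless warehouses.include?(warehouse)
    result[style_id] ||= warehouses.to_h { |wh| [wh, 0] }
    result[style_id][warehouse] = total
  end
  result
end

warehouses = ["WH1", "WH2", "WH3"]
# Made-up example of what the grouped sum query could return.
grouped = { [1, "WH1"] => 5, [1, "WH3"] => 2, [2, "WH2"] => 7 }
totals = per_style_warehouse_totals(grouped, warehouses)
# totals[1] == { "WH1" => 5, "WH2" => 0, "WH3" => 2 }
```

Styles missing from `totals` had no bookings in those warehouses, so they would get all zeros. Each style still needs its own update, but the three sum queries per style go away.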
I think you want to do a batch update. If I am correct, check the following link:
Is there anything like batch update in Rails?
You can also wrap the work in a transaction to avoid too many commits.
For example:
def assign_warehouse_bookings
  Style.transaction do
    # <your remaining code goes here>
  end
end

How to efficiently update associated collection in rails (eager loading)

I have a simple association like
class Slot < ActiveRecord::Base
  has_many :media_items, dependent: :destroy
end
class MediaItem < ActiveRecord::Base
  belongs_to :slot
end
The MediaItems are ordered per Slot and have a field called ordering.
I want to avoid n+1 queries, but nothing I have tried works. I have read several blog posts, Railscasts, etc., but they never operate on a single model, and so on...
What I do is:
def update
  @slot = Slot.find(params.require(:id))
  media_items = @slot.media_items
  par = params[:ordering_media]
  # TODO: IMP remove n+1 query
  par.each do |item|
    item_id = item[:media_item_id]
    item_order = item[:ordering]
    media_items.find(item_id).update(ordering: item_order)
  end
  @slot.save
end
params[:ordering_media] is a JSON array whose items each have a media_item_id and an integer ordering.
I tried things like
@slot = Slot.includes(:media_items).find(params.require(:id)) # still n+1
@slot = Slot.find(params.require(:id)).includes(:media_items) # not working at all b/c it is a Slot already
media_items = @slot.media_items.to_a # looks good but then in the array of MediaItems it is difficult to retrieve the right instance in my loop
This seems like a common thing to do, so I think there is a simple approach to solve this. Would be great to learn about it.
First of all, in the line media_items.find(item_id).update(ordering: item_order) you don't have an n + 1 issue, you have a 2 * n issue, because for each media_item you make 2 queries: one for the find and one for the update. To fix it, you can do this:
params[:ordering_media].each do |item|
  # where(...).update_all(...) issues a single UPDATE, with no SELECT first
  MediaItem.where(id: item[:media_item_id]).update_all(ordering: item[:ordering])
end
Here you have n queries. With update_all, that is the best we can do: each call sets a single value, so n distinct values take n calls. Going below n queries means writing the SQL yourself, for example a single UPDATE with a CASE expression.
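For the CASE route, here is a hedged plain-Ruby sketch that builds such a statement. `build_ordering_update` is a hypothetical helper; the table and column names come from the question, and the items are assumed to be hashes with symbol keys like the ordering params. Values go through `Integer(..., 10)` so that only integers are interpolated into the SQL. In a Rails app the string could then be passed to `ActiveRecord::Base.connection.execute`.

```ruby
# Build a single UPDATE that sets a different `ordering` per media item id,
# using a CASE expression. Returns nil when there is nothing to update.
def build_ordering_update(items)
  # Integer(..., 10) raises ArgumentError for anything that is not a plain
  # base-10 integer, so request values cannot inject SQL here.
  pairs = items.map do |item|
    [Integer(item[:media_item_id].to_s, 10), Integer(item[:ordering].to_s, 10)]
  end
  return nil if pairs.empty?

  cases = pairs.map { |id, ordering| "WHEN #{id} THEN #{ordering}" }.join(" ")
  ids = pairs.map(&:first).join(", ")
  "UPDATE media_items SET ordering = CASE id #{cases} END WHERE id IN (#{ids})"
end

ordering_media = [
  { media_item_id: "7", ordering: "1" },
  { media_item_id: "3", ordering: 2 }
]
sql = build_ordering_update(ordering_media)
# => "UPDATE media_items SET ordering = CASE id WHEN 7 THEN 1 WHEN 3 THEN 2 END WHERE id IN (7, 3)"
```

The trade-off is that this bypasses ActiveRecord callbacks and validations, just as update_all does.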
Now you can remove the lines @slot = Slot.find(params.require(:id)) and @slot.save, because @slot is neither modified nor used in the update action.
With this refactor, we see a problem: the SlotsController#update action doesn't update the slot at all. A better place for this code could be MediaItemsController#sort or SortMediaItemsController#update (more RESTful).
Finally, @slot = Slot.includes(:media_items).find(params.require(:id)) is not an n + 1 query; it is 2 SQL statements, because you retrieve the slot and its n media_items with only 2 db calls. It is also the best of the options you tried.
I hope it helps.
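In the question, `@slot.media_items.to_a` looked promising, but it was hard to find the right instance inside the loop. Indexing the loaded items by id solves that lookup; ActiveSupport's `index_by(&:id)` does exactly this, and the sketch below uses plain `to_h` so it runs without Rails. `MediaItem` here is a hypothetical Struct standing in for the model. This removes the per-item `find` query, though each changed record still needs its own UPDATE.

```ruby
# Hypothetical stand-in for MediaItem records already loaded via
# @slot.media_items.to_a (no database involved here).
MediaItem = Struct.new(:id, :ordering)

items = [MediaItem.new(3, 1), MediaItem.new(7, 2)]
# Same result as ActiveSupport's items.index_by(&:id)
by_id = items.to_h { |item| [item.id, item] }

ordering_media = [
  { media_item_id: "7", ordering: "1" },
  { media_item_id: "3", ordering: "2" }
]

ordering_media.each do |param|
  # Hash lookup instead of a `find` query per item
  item = by_id.fetch(Integer(param[:media_item_id], 10))
  # In Rails this would be item.update(ordering: ...), still one UPDATE each
  item.ordering = Integer(param[:ordering], 10)
end
```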

Efficient ActiveRecord association conditions

Let's say you have an association in one of your models like this:
class User
  has_many :articles
end
Now assume you need to get 3 arrays: one for the articles written yesterday, one for the articles written in the last 7 days, and one for the articles written in the last 30 days.
Of course you might do this:
articles_yesterday = user.articles.where("posted_at >= ?", Date.yesterday)
articles_last7d = user.articles.where("posted_at >= ?", 7.days.ago.to_date)
articles_last30d = user.articles.where("posted_at >= ?", 30.days.ago.to_date)
However, this will run 3 separate database queries. More efficiently, you could do this:
articles_last30d = user.articles.where("posted_at >= ?", 30.days.ago.to_date)
articles_yesterday = articles_last30d.select { |article|
  article.posted_at >= Date.yesterday
}
articles_last7d = articles_last30d.select { |article|
  article.posted_at >= 7.days.ago.to_date
}
Now of course this is a contrived example and there is no guarantee that the array select will actually be faster than a database query, but let's just assume that it is.
My question is: Is there any way (e.g. some gem) to write this code in a way which eliminates this problem by making sure that you simply specify the association conditions, and the application itself will decide whether it needs to perform another database query or not?
ActiveRecord itself does not seem to cover this problem appropriately. You are forced to decide between querying the database every time or treating the association as an array.
There are a couple of ways to handle this:
You can create separate associations for each level that you want by giving each association definition a scope with its condition. Then you can simply eager load these associations for your User query, and you will be hitting the db 3x for the entire operation instead of 3x for each user.
class User
  # Rails 4+ takes a scope lambda; Date.yesterday is evaluated each time the association loads
  has_many :articles_yesterday, -> { where("posted_at >= ?", Date.yesterday) }, class_name: "Article"
  # other associations the same way
end
User.where(...).includes(:articles_yesterday, :articles_7days, :articles_30days)
You could do a group by.
What it comes down to is that you need to profile your code and determine what's going to be fastest for your app (or whether you should even bother with it at all).
You can avoid having to decide each time whether to query with something like the code below.
class User
  has_many :articles
  def articles_last30d
    @articles_last30d ||= articles.where("posted_at >= ?", 30.days.ago.to_date).to_a
  end
  def articles_last7d
    @articles_last7d ||= articles_last30d.select { |article| article.posted_at >= 7.days.ago.to_date }
  end
  def articles_yesterday
    @articles_yesterday ||= articles_last30d.select { |article| article.posted_at >= Date.yesterday }
  end
end
What it does:
Makes at most one query, and only if any of the three is used
Computes only the arrays that are used (plus the 30d one in any case), and each only once
It does not, however, avoid loading the whole 30d list even if you only use one of the shorter windows. Is that enough, or do you need something more?
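The memoization idea can be checked in plain Ruby. In this sketch `ArticleWindows` and its `loader` lambda are hypothetical stand-ins for the User model and its 30-day query, and articles are plain hashes; the counter shows that the 30-day "query" runs once even when both narrower windows are requested.

```ruby
require "date"

# Hypothetical stand-in for the memoized User methods: `loader` plays the
# role of the single 30-day query and receives the cut-off date.
class ArticleWindows
  attr_reader :load_count

  def initialize(today, loader)
    @today = today
    @loader = loader
    @load_count = 0
  end

  # Loaded at most once, then reused by the narrower windows.
  def articles_last30d
    @articles_last30d ||= begin
      @load_count += 1
      @loader.call(@today - 30)
    end
  end

  def articles_last7d
    @articles_last7d ||= articles_last30d.select { |a| a[:posted_at] >= @today - 7 }
  end

  def articles_yesterday
    @articles_yesterday ||= articles_last30d.select { |a| a[:posted_at] >= @today - 1 }
  end
end

today = Date.new(2024, 1, 31)
articles = [
  { title: "a", posted_at: Date.new(2024, 1, 30) },
  { title: "b", posted_at: Date.new(2024, 1, 26) },
  { title: "c", posted_at: Date.new(2024, 1, 5) }
]
loader = ->(since) { articles.select { |a| a[:posted_at] >= since } }

windows = ArticleWindows.new(today, loader)
yesterday_titles = windows.articles_yesterday.map { |a| a[:title] } # ["a"]
week_titles = windows.articles_last7d.map { |a| a[:title] }         # ["a", "b"]
loads = windows.load_count                                           # 1
```

The memoized arrays live as long as the object, which suits a per-request User instance but would go stale on a long-lived one.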

How do I calculate the most popular combination of a order lines? (or any similar order/order lines db arrangement)

I'm using Ruby on Rails. I have a couple of models which fit the normal order/order lines arrangement, i.e.
class Order
  has_many :order_lines
end
class OrderLine
  belongs_to :order
  belongs_to :product
end
class Product
  has_many :order_lines
end
(greatly simplified from my real model!)
It's fairly straightforward to work out the most popular individual products via order lines, but what magical Ruby-fu could I use to calculate the most popular combination(s) of products ordered?
Cheers,
Graeme
My suggestion is to create an array a of Product.id numbers for each order and then do the equivalent of
h = Hash.new(0)
# for each a
h[a.sort.hash] += 1
You will naturally need to consider the scale of your operation and how much you are willing to approximate the results.
External Solution
Create a "Combination" model and index the table by the hash, then each order could increment a counter field. Another field would record exactly which combination that hash value referred to.
In-memory Solution
Look at the last 100 orders and recompute the order popularity in memory when you need it. Sorting the counts hash by value (for example with Hash#sort_by) will give you a list ordered by popularity. You could either make a composite object that remembers which order combination is being counted, or just scan the original data looking for the hash value.
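A plain-Ruby sketch of the in-memory approach, assuming each order has already been reduced to an array of product ids (for example via `order.order_lines.map(&:product_id)`). It uses the sorted array itself as the hash key instead of `a.sort.hash`, which avoids having to map a hash value back to its combination and rules out two combinations sharing a hash value. `top_combinations` is a hypothetical helper name.

```ruby
# Count how often each combination of products appears across orders and
# return the n most frequent as [combination, count] pairs.
def top_combinations(orders, n)
  counts = Hash.new(0)
  orders.each do |product_ids|
    # Sorting makes [2, 1] and [1, 2] the same key.
    counts[product_ids.sort] += 1
  end
  # Most frequent first; the order among equal counts is not guaranteed.
  counts.sort_by { |_combo, count| -count }.first(n)
end

# Hypothetical data: each inner array is one order's product ids.
orders = [[1, 2], [2, 1], [3], [1, 2, 3], [3]]
top = top_combinations(orders, 2)
```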
Thanks for the tip, digitalross. I followed the external solution idea and did the following. It differs slightly from the suggestion: it keeps a record of individual order_combos rather than storing a counter, so it's possible to query by date as well, e.g. the top 10 most popular orders in the last week.
I created a method in my order which converts its list of ordered products to a comma-separated string.
def to_s
  order_lines.map(&:product_id).sort.join(",")
end
I then added a callback so the combo is created every time an order is placed.
after_save :create_order_combo
def create_order_combo
  OrderCombo.create(user: user, combo: to_s)
end
And finally, my OrderCombo class looks something like this. I've also included a cached version of the method.
class OrderCombo < ActiveRecord::Base
  belongs_to :user
  scope :by_user, ->(user) { where(user_id: user.id) }
  def self.top_n_orders_by_user(user, count = 10)
    by_user(user).group(:combo).count.sort_by { |_combo, n| -n }.first(count)
  end
  def self.cached_top_orders_by_user(user, count = 10)
    Rails.cache.fetch("order_combo_#{user.id}_#{count}", expires_in: 10.minutes) { top_n_orders_by_user(user, count) }
  end
end
It's not perfect as it doesn't take into account increased popularity when someone orders more of one item in an order.
