Generate array of daily avg values from db table (Rails)

Context:
Trying to generate an array with one element for each created_at day in a db table. Each element is the average of the points (integer) column across the records created on that day.
This will later be graphed to display the avg number of points on each day.
Result:
I've been successful in doing this, but it feels like an unnecessary amount of code to generate the desired result.
Code:
def daily_avg
# get all data for current user
records = current_user.rounds
# make array of long dates
long_date_array = records.pluck(:created_at)
# create array to store short dates
short_date_array = []
# remove time of day
long_date_array.each do |date|
short_date_array << date.strftime('%Y%m%d')
end
# remove duplicate dates
short_date_array.uniq!
# array of avg by date
array_of_avg_values = []
# iterate through each day
short_date_array.each do |date|
temp_array = []
# make array of records with this day
records.each do |record|
if date === record.created_at.strftime('%Y%m%d')
temp_array << record.audio_points
end
end
# calc avg by day and append to array_of_avg_values
array_of_avg_values << temp_array.inject(0.0) { |sum, el| sum + el } / temp_array.size
end
render json: array_of_avg_values
end
Question:
I think this is a common extraction problem needing to be solved by lots of applications, so I'm wondering if there's a known repeatable pattern for solving something like this?
Or a more optimal way to solve this?
(I'm barely a junior developer so any advice you can share would be appreciated!)

Yes, that's a lot of unnecessary stuff when you can just go down to SQL to do it (I'm assuming you have a class called Round in your app):
class Round
DAILY_AVERAGE_SELECT = "SELECT
DATE(rounds.created_at) AS day_date,
AVG(rounds.audio_points) AS audio_points
FROM rounds
WHERE rounds.user_id = ?
GROUP BY DATE(rounds.created_at)
"
def self.daily_average(user_id)
connection.select_all(sanitize_sql_array([DAILY_AVERAGE_SELECT, user_id]), "daily-average")
end
end
Doing this straight into the database will be faster (and also include less code) than doing it in ruby as you're doing now.

I advise you to do something like this:
grouped =
records.order(:created_at).group_by do |r|
r.created_at.strftime('%Y%m%d')
end
First, this issues an ordered SQL query that is already close to what you want, then groups the resulting records by their created_at value reduced to just the date.
points =
grouped.map do |(date, values)|
[ date, values.sum(&:audio_points).to_f / values.size ]
end.to_h
# => { "19700101" => 155.0, ... }
Then you map the grouped hash to [date, average] pairs and convert them back into a hash, which gives the average audio_points for each day.
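To try the group-then-average idea outside of Rails, here is a minimal plain-Ruby sketch. RoundRow and daily_averages are hypothetical names used only for illustration; in the real app the records would come from current_user.rounds.

```ruby
# Hypothetical stand-in for an ActiveRecord row (not part of the original post).
RoundRow = Struct.new(:created_at, :audio_points)

# Returns { "YYYYMMDD" => average audio_points } for the given records.
def daily_averages(records)
  records
    .group_by { |record| record.created_at.strftime('%Y%m%d') }
    .transform_values { |rows| rows.sum(&:audio_points).to_f / rows.size }
end

sample_rounds = [
  RoundRow.new(Time.utc(2020, 1, 1, 9), 10),
  RoundRow.new(Time.utc(2020, 1, 1, 17), 20),
  RoundRow.new(Time.utc(2020, 1, 2, 8), 7)
]

daily = daily_averages(sample_rounds)
```

This still loads every record into memory, so for large tables the SQL-based answers are likely the better fit.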

You can use group and calculations methods built in AR: http://guides.rubyonrails.org/active_record_querying.html#group
http://guides.rubyonrails.org/active_record_querying.html#calculations

Related

Rails Optimize query and loop through large entity

I have a method that outputs the following hash format for charting.
# Monthly (Jan - Dec)
{
"john": [1,2,3,4,5,6,7,8,9,10,11,12],
"mike": [1,2,3,4,5,6,7,8,9,10,11,12],
"rick": [1,2,3,4,5,6,7,8,9,10,11,12]
}
# the indices represent the month
# e.g [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
# Index
# 0 = Jan
# 1 = Feb
# 2 = Mar
...
The following method loops through all the store invoices within a given year for each sales rep name and generates the above outcome:
def chart_data
hash = Hash.new {|h,k| h[k] = [] }
(1..12).each do |month|
date_range = "1/#{month}/#{date.year}".to_date.all_month
all_reps.each do |name|
hash[name] << store.bw_invoices.where(sales_rep_name: name,
purchase_date: date_range).sum(:subtotal).to_f
end
end
return hash
end
When I run this method it takes 4 to 5 seconds to execute. I really need to optimize this query. I came up with two solutions that I think would help, but I would love to get some of your expertise:
move it to a background job
perform a SQL query to optimize (I need help with this if it is the better option)
Thank you so much for your time
Yes, you've found a problem that is very hard to solve efficiently without letting the database do the hard work.
Assuming your dataset is potentially too large to load a whole year of raw rows into Ruby objects, this approach, which uses a single PostgreSQL query, is probably the best option:
More SQL approach
def chart_data
result = Hash.new {|h,k| h[k] = [] }
total_lines = store.bw_invoices.select("sales_rep_name, to_char(purchase_date, 'mm') as month, sum(subtotal) as total")
.where(purchase_date: Date.today.all_year)
.group("sales_rep_name, to_char(purchase_date, 'mm')")
total_lines.each do |total_line|
result[total_line.sales_rep_name][total_line.month.to_i - 1] = total_line.total.to_f
end
result
end
Note that this solution will leave nil rather than 0 for months where a rep had no sales. And if their last month with sales was June then there will only be 6 items in the array.
We can avoid this either with more complex SQL left joining from a virtual table or by filling in the array gaps afterwards. However, depending on how you setup your charting this might make no practical difference anyway.
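As a rough illustration of the "fill in the gaps" option, the sketch below pre-fills all twelve months with 0.0 before applying the query totals. TotalLine and monthly_series are hypothetical names; the struct only mimics the columns selected by the query (rep name, two-digit month, total).

```ruby
# Hypothetical row shape mirroring the query's select list.
TotalLine = Struct.new(:sales_rep_name, :month, :total)

# Builds { rep => [jan, feb, ..., dec] } with 0.0 for months without sales.
def monthly_series(total_lines, rep_names = [])
  result = Hash.new { |hash, rep| hash[rep] = Array.new(12, 0.0) }
  rep_names.each { |rep| result[rep] } # touch each rep so reps without sales still appear
  total_lines.each do |line|
    result[line.sales_rep_name][line.month.to_i - 1] = line.total.to_f
  end
  result
end

lines = [
  TotalLine.new('john', '01', 100),
  TotalLine.new('john', '06', 50.5),
  TotalLine.new('mike', '12', 10)
]

monthly = monthly_series(lines, %w[john mike rick])
```

With this shape every rep ends up with exactly 12 entries, so a charting library never sees nil or a short array.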
More ruby approach
def chart_data
result = Hash.new {|h,k| h[k] = [] }
(1..12).each do |month|
date_range = "1/#{month}/#{Date.today.year}".to_date.all_month
rows = store.bw_invoices.select("sales_rep_name, SUM(subtotal) as total")
.where(purchase_date: date_range)
.group(:sales_rep_name)
all_reps.each do |rep_name|
row = rows.detect { |x| x.sales_rep_name == rep_name }
result[rep_name] << (row ? row.total : 0).to_f
end
end
result
end
This is more similar to your approach but takes the querying outside of the inner loop so we do 12 queries instead of 12 * number of reps. The detect used may become a little slow but only if there are thousands of reps. In which case you could sort both all_reps and the query output and implement your own kind of merge join but at that point you're getting into complexity you might as well let the database handle again.

Ruby/Rails how to iterate months over a DateTime range?

I am trying to build a graph from data in a Rails table: The amount of sold products per time-fragment.
Because the graph should be able to show the last hour(in 1-minute steps), the last day (in 1-hour steps), the last week (in 1-day steps), the last month (in 1-day steps), etc, I am trying to reduce the code duplication by iterating over a range of DateTime objects:
# To prevent code-duplication, iterate over different time ranges.
times = {
:hour=>{newer_than: 1.hour.ago, timestep: :minute},
:day=>{newer_than: 1.day.ago, timestep: :hour},
:week=>{newer_than: 1.week.ago, timestep: :day},
:month=>{newer_than: 1.month.ago, timestep: :day}
}
products = Product.all
# Create symbols `:beginning_of_minute`, `:beginning_of_hour`, etc. These are used to group products and timestamps by.
times.each do|name, t|
t[:beginning_of] = ("beginning_of_" << t[:timestep].to_s).to_sym
end
graphs = times.map do |name, t|
graphpoints = {}
seconds_in_a_day = 1.day.to_f
step_ratio = 1.send(t[:timestep]).to_f / seconds_in_a_day
time_enum = 1.send(t[:timestep]).ago.to_datetime.step(DateTime.now, step_ratio)
time_enum.each do |timestep|
graphpoints[timestep.send(t[:beginning_of]).to_datetime] = []
end
# Load all products that are visible in this graph size
visible_products = products.select {|p| p.created_at >= t[:newer_than]}
# Group them per graph point
grouped_products = visible_products.group_by {|item| item.created_at.send(t[:beginning_of]).to_datetime}
graphpoints.merge!(grouped_products)
{
points: graphpoints,
labels: graphpoints.keys
}
end
This code works great for all time intervals that have a constant size (hour, day, week). For months, however, it uses a step_ratio of 30 days: 1.month / 1.day == 30. Obviously, the number of days in a month is not constant. In my script, this means that a month might be skipped and therefore be missing from the graph.
How can this problem be solved? How can I iterate over months while accounting for their differing numbers of days?
If you need to pick the months out of a large range, just build a range between two Date objects and select the first day of each month:
(1.year.ago.to_date..DateTime.now.to_date).select{|date| date.day==1}.each do |date|
p date
end
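An alternative that avoids scanning every day in the range is to step by calendar month with Date#>> from the standard library. This is only a sketch; month_starts is a hypothetical helper name, and unlike the day == 1 filter it also includes the month in which the range begins.

```ruby
require 'date'

# Returns the first day of every calendar month touched by from..to.
def month_starts(from, to)
  months = []
  current = Date.new(from.year, from.month, 1)
  while current <= to
    months << current
    current = current >> 1 # Date#>> advances by one calendar month, whatever its length
  end
  months
end

starts = month_starts(Date.new(2015, 11, 20), Date.new(2016, 2, 3))
```

Because each step lands on the first of the next month, no month is skipped or visited twice regardless of how many days it has.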
Use the groupdate gem. For example (modified example from the docs):
visible_products = Product.where("created_at > ?", 1.week.ago).group_by_day(:created_at).count
# {
# 2013-07-29 00:00:00 UTC => 50,
# 2013-07-30 00:00:00 UTC => 100,
# 2013-08-02 00:00:00 UTC => 34
# }
Also, this will be much faster, because the grouping and counting are done by the database itself. There is no need to pass every record from the Product.all call to your Rails code, or to create an ActiveRecord object for each one (including irrelevant ones).

Rails: How To Deal With Many Subsets of Data From One Model?

Rails 3.0.3 application (stuck on a Dreamhost shared server).
I have a page that displays averages calculated from subsets of data from one model.
Right now, each average is calculated individually, like this:
From the view, I'm using the current_user helper provided by Devise authentication to call the average methods that are located in the user model, like so:
<%= current_user.seven_day_weight_average %>
<%= current_user.fourteen_day_weight_average %>
<%= current_user.thirty_day_weight_average %>
Here's the public methods and the averaging method in the user model:
def seven_day_weight_average
calculate_average_weight(7)
end
def fourteen_day_weight_average
calculate_average_weight(14)
end
def thirty_day_weight_average
calculate_average_weight(30)
end
. . .
private
def calculate_average_weight(number_days)
temp_weight = 0
weights_array = self.weights.find_all_by_entry_date(number_days.days.ago..Date.today)
unless weights_array.count.zero?
weights_array.each do |weight|
temp_weight += weight.converted_weight
end
return (temp_weight/weights_array.count).round(1).to_s
else
return '0.0'
end
end
This doesn't seem very efficient - the database is queried for every average calculated.
How can I calculate and make these averages available to the page with one database query?
You could cache an array of converted weights for the last 30 days (presuming 30 is the maximum days back), something like this:
def calculate_average_weight(number_days)
@converted_weights ||= weights.where("entry_date > ?", 30.days.ago).group_by(&:entry_date).sort_by do |date,weights|
date
end.collect do |date,weights|
weights.collect(&:converted_weight)
end
weights_during_period = @converted_weights[0..number_days-1].flatten
weights_during_period.sum / weights_during_period.length
end
Explanation:
Firstly, ||= gets or sets @converted_weights (i.e. it doesn't bother setting it unless it's nil or false). This ensures only one db hit. Next, we find all weights from the last 30 days and group them by date. Sorting the grouped hash by date returns an array of [date, weights] pairs. Then we collect the converted weights for each date, so we end up with: [weights on day 1], [weights on day 2], ....
Now, the calculation: we store values spanning the number of days from the array in weights_during_period. We flatten the values and calculate the average value.
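The same "load once, average many times" idea can be sketched in plain Ruby by filtering a single preloaded list on calendar dates. WeightEntry and window_averages are hypothetical names; in the app the entries would come from one weights query covering the last 30 days.

```ruby
require 'date'

# Hypothetical stand-in for a weight record.
WeightEntry = Struct.new(:entry_date, :converted_weight)

# Computes one average per window (in days) from a single preloaded list.
def window_averages(entries, today, windows = [7, 14, 30])
  windows.to_h do |days|
    cutoff = today - days
    values = entries.select { |entry| entry.entry_date > cutoff }.map(&:converted_weight)
    average = values.empty? ? 0.0 : (values.sum.to_f / values.size).round(1)
    [days, average]
  end
end

today = Date.new(2011, 3, 30)
entries = [
  WeightEntry.new(today - 1, 80.0),
  WeightEntry.new(today - 5, 82.0),
  WeightEntry.new(today - 10, 84.0),
  WeightEntry.new(today - 20, 90.0)
]

weight_averages = window_averages(entries, today)
```

Filtering by date rather than by array position keeps each window tied to calendar days even when some days have no entries.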

How can I speed up my Ruby/Rake task, which counts occurrences of dates among 300K date strings?

I have an array of 300K strings which represent dates:
date_array = [
"2007-03-25 14:24:29",
"2007-03-25 14:27:00",
...
]
I need to count occurrences of each date in this array (e.g., all date strings for "2011-03-25"). The exact time doesn't matter -- just the date. I know the range of dates within the file. So I have:
Date.parse('2007-03-23').upto Date.parse('2011-10-06') do |date_to_count|
count = 0
date_array.each do |date_string|
if Date.parse(date_string) >= date_to_count &&
Date.parse(date_string) <= date_to_count
count += 1
end
end
puts "#{date_to_count} occurred #{count} times."
end
Counting occurrences of just one date takes longer than 60 seconds on my machine. In what ways can I optimize the performance of this task?
Possibly useful notes: I'm using Ruby 1.9.2. This script is running in a Rake task with rake 0.9.2. The date_array is loaded from a CSV file. On each iteration, the count is saved as a record in my Rails project database.
Yes, you don't need to parse the dates at all if they are formatted the same. Knowing your data is one of the most powerful tools you can have.
If the datetime strings are all in the same format (yyyy-mm-dd HH:MM:SS) then you could do something like
date_array.group_by{|datetime| datetime[0..9]}
This will give you a hash with the date strings as the keys and the arrays of datetime strings as values
{
"2007-05-06" => [...],
"2007-05-07" => [...],
...
}
So you'd have to get the length of each array
date_array.group_by{|datetime| datetime[0..9]}.each do |date_string, date_array|
puts "#{date_string} occurred #{date_array.length} times."
end
Of course, that method wastes memory by building arrays of dates you don't need.
So how about this instead:
A more memory-efficient method
date_counts = {}
date_array.each do |date_string|
date = date_string[0..9]
date_counts[date] ||= 0 # initialize count if necessary
date_counts[date] += 1
end
You'll end up with a hash with the date strings as the keys and the counts as values
{
"2007-05-06" => 123,
"2007-05-07" => 456,
...
}
Putting everything together
date_counts = {}
date_array.each do |date_string|
date = date_string[0..9]
date_counts[date] ||= 0 # initialize count if necessary
date_counts[date] += 1
end
Date.parse('2007-03-23').upto Date.parse('2011-10-06') do |date_to_count|
puts "#{date_to_count} occurred #{date_counts[date_to_count.to_s].to_i} times."
end
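A slightly more compact variant of the counting loop, shown here only as a sketch on made-up sample strings, uses Hash.new(0) so that missing dates read as 0 and no explicit initialization is needed.

```ruby
# Sample input in the same "yyyy-mm-dd HH:MM:SS" format as the question.
sample_dates = [
  '2007-03-25 14:24:29',
  '2007-03-25 14:27:00',
  '2007-03-26 09:00:00'
]

# Hash.new(0) returns 0 for unseen keys, so += 1 works on the first occurrence.
date_counts = sample_dates.each_with_object(Hash.new(0)) do |date_string, counts|
  counts[date_string[0, 10]] += 1
end
```

On Ruby 2.7 or later, sample_dates.map { |s| s[0, 10] }.tally produces the same counts, although the resulting hash has no default value for missing dates.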
This is a really awful algorithm to use. You're scanning through the entire list for each date, and further, you're parsing the same date twice for no apparent reason. That means for N dates in the range and M dates in the list you're doing N*M*2 date parses.
What you really need is to use group_by and do it in one pass:
dates = date_array.group_by do |date_string|
Date.parse(date_string)
end
Then you can use this as a reference for your counts:
Date.parse('2007-03-23').upto Date.parse('2011-10-06') do |date_to_count|
puts "#{date_to_count} occurred #{dates[date_to_count] ? dates[date_to_count].length : 0} times."
end

Query sum speedup - Date series for charts

The following query runs fairly quickly, but the series processing that needs to take place afterwards is really slowing this method down. I could use some help in refactoring.
def self.sum_amount_chart_series(start_time)
orders_by_day = Widget.archived.not_void.
where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
group(pg_print_date_group).
select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")
# THIS IS WHAT IS SLOWING THE METHOD DOWN!
(start_time.to_date..Date.today).map do |date|
order = orders_by_day.detect { |order| order.print_date.to_date == date }
order && order.total_amount.to_f.round(2) || 0.0
end
end
def self.pg_print_date_group
"CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
end
I have benchmarked this method and the offending code is the series loop where it generates a series of dates and then maps out a new array with an amount for each date. This way I get a series back with amounts for every date, regardless if it has an amount or not.
When the query only returns a few dates, it runs fairly quickly. But set the start date back a year or two and it becomes impossibly slow. The real offender is the .detect method. It's very slow at scanning the array of activerecord objects.
Is there a faster method to generate this series?
orders_by_day is grouped by pg_print_date_group, so it should be a hash of date to objects. So why don't you just do:
(start_time.to_date..Date.today).map do |date|
order = orders_by_day[date.to_s(:db)]
order && order.total_amount.to_f.round(2) || 0.0
end
That should seriously reduce the Big O of your run. And if I'm misunderstanding and your orders_by_day isn't a hash, preprocess it into a hash and then run the map, you definitely don't want to detect for every date.
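If orders_by_day is a list of rows rather than a hash, the preprocessing step could look roughly like the sketch below. DayTotal and amount_series are hypothetical names, and the struct only imitates the print_date and total_amount columns of the query result.

```ruby
require 'date'

# Hypothetical stand-in for one grouped query row.
DayTotal = Struct.new(:print_date, :total_amount)

# Indexes the rows by date once, then looks each date up in constant time.
def amount_series(day_totals, from, to)
  totals_by_date = day_totals.to_h do |row|
    [row.print_date, row.total_amount.to_f.round(2)]
  end
  (from..to).map { |date| totals_by_date.fetch(date, 0.0) }
end

rows = [
  DayTotal.new(Date.new(2013, 1, 2), '12.5'),
  DayTotal.new(Date.new(2013, 1, 4), 3)
]

amounts = amount_series(rows, Date.new(2013, 1, 1), Date.new(2013, 1, 5))
```

Building the hash costs one pass over the rows, so the whole series takes time proportional to the number of rows plus the number of dates instead of their product.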
Since the primary offender in your code is the detect method that has to scan the array again and again, I suggest that you invert the order in which you create the series, so that you only scan the array once, and your code runs in O(n) time.
Try something along the lines of:
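series = []
next_date = start_time.to_date
# Assumes orders_by_day is sorted by print_date (e.g. via an ORDER BY on the query).
orders_by_day.each do |order|
  order_date = order.print_date.to_date
  # Fill the days before this order with zeros
  while next_date < order_date
    series << 0.0
    next_date = next_date.next
  end
  series << order.total_amount.to_f.round(2)
  next_date = next_date.next
end
# Fill the remaining days up to and including today
while next_date <= Date.today
  series << 0.0
  next_date = next_date.next
end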
Please note that my code is untested ;)

Resources