Group by unique values while summing / adding other values - ruby-on-rails

I have a data structure that looks like this:
arr = [
{
price: 2.0,
unit: "meter",
tariff_code: "4901.99",
amount: 200
},
{
price: 2.0,
unit: "meter",
tariff_code: "4901.99",
amount: 200
},
{
price: 14.0,
unit: "yards",
tariff_code: "6006.24",
amount: 500
},
{
price: 14.0,
unit: "yards",
tariff_code: "6006.24",
amount: 500
}
]
I need to group all of these by tariff_code, while summing the price and amounts that correspond with that tariff code. So my expected output should be:
[
{
price: 4.0,
unit: "meter",
tariff_code: "4901.99",
amount: 400
},
{
price: 28.0,
unit: "yards",
tariff_code: "6006.24",
amount: 1000
}
]
receipt_data[:order_items].group_by { |oi| oi[:tariff_code] }.values
The group_by statement used above will allow me to group by tariff_code but I'm unable to work out a way to sum the other values. I'm sure there is a slick one-liner way to accomplish this...

More verbose:
grouped_items = arr.group_by { |oi| oi[:tariff_code] }
result = grouped_items.map do |tariff_code, code_items|
price, amount = code_items.reduce([0, 0]) do |(price, amount), ci|
[price + ci[:price], amount + ci[:amount]]
end
{
price: price,
unit: code_items.first[:unit],
tariff_code: tariff_code,
amount: amount
}
end
#[
# {:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400}
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}
#]

Just to add to the fun, here is an answer that uses group_by, as @cary said, mostly copying Pavel's answer. It walks each group once per summed field, so it is not the most efficient option; use it only if the array is small. It also uses sum with a block, which is available in Rails and in plain Ruby 2.4 or later (on older Rubies it can be replaced by .map { |item| item[:price] }.reduce(:+)).
arr.group_by { |a| a[:tariff_code] }.map do |tariff_code, items|
{
price: items.sum { |item| item[:price] },
unit: items.first[:unit],
tariff_code: tariff_code,
amount: items.sum { |item| item[:amount] }
}
end
This would have been even smaller if it was an array of objects (ActiveRecord objects maybe) with methods instead of hashes.
arr.group_by(&:tariff_code).map do |tariff_code, items|
{
price: items.sum(&:price),
unit: items.first.unit,
tariff_code: tariff_code,
amount: items.sum(&:amount)
}
end
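As a rough illustration of the object-based variant above, here is a minimal sketch that uses a Struct in place of ActiveRecord objects. The Item struct and the variable names are assumptions made for this example only, and keyword_init requires Ruby 2.5 or later.

```ruby
# Item is a hypothetical stand-in for a model object with reader methods.
Item = Struct.new(:price, :unit, :tariff_code, :amount, keyword_init: true)

items = [
  Item.new(price: 2.0,  unit: "meter", tariff_code: "4901.99", amount: 200),
  Item.new(price: 2.0,  unit: "meter", tariff_code: "4901.99", amount: 200),
  Item.new(price: 14.0, unit: "yards", tariff_code: "6006.24", amount: 500),
  Item.new(price: 14.0, unit: "yards", tariff_code: "6006.24", amount: 500)
]

# Group the objects by tariff code, then sum the numeric attributes per group.
summary = items.group_by(&:tariff_code).map do |code, group|
  {
    price: group.sum(&:price),
    unit: group.first.unit,
    tariff_code: code,
    amount: group.sum(&:amount)
  }
end

p summary
```

Because the elements respond to methods rather than hash keys, the unit is read with group.first.unit instead of group.first[:unit].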

There are two standard ways of addressing problems of this kind. One, which I've taken, is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. The other way is to use Enumerable#group_by, which I expect someone will soon employ in another answer. I do not believe either approach is preferable in terms of efficiency or readability.
arr.each_with_object({}) do |g,h|
h.update(g[:tariff_code]=>g) do |_,o,n|
{ price: o[:price]+n[:price], unit: o[:unit], amount: o[:amount]+n[:amount] }
end
end.values
#=> [{:price=>4.0, :unit=>"meter", :amount=>400},
# {:price=>28.0, :unit=>"yards", :amount=>1000}]
Note that the receiver of values is seen to be:
{"4901.99"=>{:price=>4.0, :unit=>"meter", :amount=>400},
 "6006.24"=>{:price=>28.0, :unit=>"yards", :amount=>1000}}
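The merged hashes shown above no longer carry :tariff_code, because the block builds a fresh hash with only three keys. If that key is needed in the result, one possible variation (a sketch, not the original author's code) is to merge the summed fields into the existing hash:

```ruby
arr = [
  { price: 2.0,  unit: "meter", tariff_code: "4901.99", amount: 200 },
  { price: 2.0,  unit: "meter", tariff_code: "4901.99", amount: 200 },
  { price: 14.0, unit: "yards", tariff_code: "6006.24", amount: 500 },
  { price: 14.0, unit: "yards", tariff_code: "6006.24", amount: 500 }
]

totals = arr.each_with_object({}) do |item, acc|
  # The block runs only when the tariff code is already present in acc.
  acc.update(item[:tariff_code] => item) do |_code, old, new|
    # Hash#merge returns a new hash, so the input hashes are not mutated.
    old.merge(price: old[:price] + new[:price],
              amount: old[:amount] + new[:amount])
  end
end.values

p totals
```

All keys that are not summed (here :unit and :tariff_code) are carried over from the first hash seen for each tariff code.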

A simple approach, but it's easy to add new keys for summing and to change the grouping key. I'm not sure about efficiency, but a Benchmark of 500_000 runs of the arr.map loop below looks good:
#<Benchmark::Tms:0x00007fad0911b418 @label="", @real=1.480799000000843, @cstime=0.0, @cutime=0.0, @stime=0.0017340000000000133, @utime=1.4783359999999999, @total=1.48007>
summ_keys = %i[price amount]
grouping_key = :tariff_code
result = Hash.new { |h, k| h[k] = {} }
arr.map do |h|
cumulative = result[h[grouping_key]]
h.each do |k, v|
case k
when *summ_keys
cumulative[k] = (cumulative[k] || 0) + h[k]
else
cumulative[k] = v
end
end
end
p result.values
# [{:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400},
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}]

Related

How to sum repetitions of a value and add it in two values of a key in Ruby?

I'm trying to create a hash with one key for each file extension in a directory. For every key I would like to store two values: the number of times that extension occurs and the total size of all the files with that extension.
Something similar to this:
{".md" => {"ext_reps" => 6, "ext_size_sum" => 2350}, ".txt" => {"ext_reps" => 3, "ext_size_sum" => 1300}}
But I'm stuck on this step:
hash = Hash.new{|hsh,key| hsh[key] = {}}
ext_reps = 0
ext_size_sum = 0
Dir.glob("/home/computer/Desktop/**/*.*").each do |file|
hash[File.extname(file)].store "ext_reps", ext_reps
hash[File.extname(file)].store "ext_size_sum", ext_size_sum
end
p hash
With this result:
{".md" => {"ext_reps" => 0, "ext_size_sum" => 0}, ".txt" => {"ext_reps" => 0, "ext_size_sum" => 0}}
And I can't find a way to increment ext_reps and ext_size_sum.
Thanks
Suppose the file name extensions and file sizes are as follows.
files = [{ ext: 'a', size: 10 },
{ ext: 'b', size: 20 },
{ ext: 'a', size: 30 },
{ ext: 'c', size: 40 },
{ ext: 'b', size: 50 },
{ ext: 'a', size: 60 }]
You can use Enumerable#group_by and Hash#transform_values.
files.group_by { |h| h[:ext] }.
transform_values do |arr|
{ "ext_reps"=>arr.size, "ext_size_sum"=>arr.sum { |h| h[:size] } }
end
#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
# "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
# "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
Note that the first calculation is as follows.
files.group_by { |h| h[:ext] }
#=> {"a"=>[{:ext=>"a", :size=>10}, {:ext=>"a", :size=>30},
# {:ext=>"a", :size=>60}],
# "b"=>[{:ext=>"b", :size=>20}, {:ext=>"b", :size=>50}],
# "c"=>[{:ext=>"c", :size=>40}]}
Another way is to use the forms of Hash#update (aka Hash#merge!) and Hash#merge that employ a block to compute the values of keys that are present in both hashes being merged. (Ruby does not consult that block when a key-value pair with key k is being merged into the hash being built (h) and h does not have a key k.)
See the docs for an explanation of the three parameters of the block that returns the values of common keys of hashes being merged.
files.each_with_object({}) do |g,h|
h.update(g[:ext]=>{"ext_reps"=>1, "ext_size_sum"=>g[:size]}) do |_k,o,n|
o.merge(n) { |_kk, oo, nn| oo + nn }
end
end
#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
# "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
# "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
I've chosen names for the common keys of the "outer" and "inner" hashes (_k and _kk, respectively) that begin with an underscore to signal to the reader that they are not used in the block calculation. This is common practice.
Note that this approach avoids the creation of a temporary hash similar to that created by group_by and therefore tends to use less memory than the first approach.
Here is a solution inspired by the answers given by Cary Swoveland and BenFenner
hash = {}
Dir.glob("/home/computer/Desktop/**/*.*").each do |file|
(hash[File.extname(file)] ||= []) << File.size(file) # file is a path String, so file.size would be the path length
end
hash.transform_values! { |sizes| { "ext_reps" => sizes.count, "ext_size_sum" => sizes.sum } }
With each_with_object and nested Hash.new
files = [{ ext: 'a', size: 10 },
{ ext: 'b', size: 20 },
{ ext: 'a', size: 30 },
{ ext: 'c', size: 40 },
{ ext: 'b', size: 50 },
{ ext: 'a', size: 60 }]
files.each_with_object(Hash.new(Hash.new(0))) do |el, hash|
h = hash[el[:ext]]
hash[el[:ext]] =
{ "ext_reps" => h["ext_reps"] + 1, "ext_size_sum" => h["ext_size_sum"] + el[:size] }
end
#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
# "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
# "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
It's not the most "Ruby-like" solution, but going along with your provided example this is probably what you'd ultimately end up with as a solution. Your main problem was that you were never incrementing the ext_reps value, nor were you ever accumulating the ext_size_sum value.
hash = {}
Dir.glob('/home/computer/Desktop/**/*.*').each do |file|
file_extension = File.extname(file)
if hash[file_extension].nil?
# This is the first time this file extension has been seen, so initialize things for it.
hash[file_extension] = {}
hash[file_extension]['ext_reps'] = 0
hash[file_extension]['ext_size_sum'] = 0
end
# Increment/accumulate values.
hash[file_extension]['ext_reps'] += 1
hash[file_extension]['ext_size_sum'] += File.size(file) # file.size would be the length of the path string
end
This is pretty much a reiteration of Cary's and others' answers without temporary variables. (Which is more Ruby-like IMHO.)
Dir.glob("*.*")
.group_by { |f| File.extname(f) }
.transform_values do |files|
{
"count" => files.count,
"size" => files.sum { |f| File.size(f) }
}
end
=> {".app"=>{"count"=>1, "size"=>96},
".sh-builder"=>{"count"=>1, "size"=>192},
".sh-names"=>{"count"=>1, "size"=>288},
".json"=>{"count"=>2, "size"=>5362},
".rb"=>{"count"=>1, "size"=>132}}
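To try the group_by and transform_values approach without touching a real directory, the following sketch creates a few files in a temporary directory first. The method name extension_stats and the file names are made up for this example. Note that Dir.glob yields path strings, so sizes are read with File.size(path); calling size on the string itself would return the length of the path.

```ruby
require "tmpdir"

# Hypothetical helper: count files and total bytes per extension under dir.
def extension_stats(dir)
  Dir.glob(File.join(dir, "**", "*.*"))
     .select { |path| File.file?(path) }
     .group_by { |path| File.extname(path) }
     .transform_values do |paths|
       {
         "ext_reps" => paths.size,
         "ext_size_sum" => paths.sum { |path| File.size(path) }
       }
     end
end

# Build a throwaway directory with known file sizes, then inspect it.
stats = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "a.md"), "12345")
  File.write(File.join(dir, "b.md"), "123")
  File.write(File.join(dir, "c.txt"), "1")
  extension_stats(dir)
end

p stats
```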

build a new array of hash from multiple array of hashes

I have following three array of hashes.
customer_mapping = [
{:customer_id=>"a", :customer_order_id=>"g1"},
{:customer_id=>"b", :customer_order_id=>"g2"},
{:customer_id=>"c", :customer_order_id=>"g3"},
{:customer_id=>"d", :customer_order_id=>"g4"},
{:customer_id=>"e", :customer_order_id=>"g5"}
]
customer_with_products = [
{:customer_order_id=>"g1", :product_order_id=>"a1"},
{:customer_order_id=>"g2", :product_order_id=>"a2"},
{:customer_order_id=>"g3", :product_order_id=>"a3"},
{:customer_order_id=>"g4", :product_order_id=>"a4"},
{:customer_order_id=>"g5", :product_order_id=>"a5"}
]
product_mapping = [
{:product_id=>"j", :product_order_id=>"a1"},
{:product_id=>"k", :product_order_id=>"a2"},
{:product_id=>"l", :product_order_id=>"a3"}
]
What I want is a new array of hashes with only customer_id and product_id:
{:product_id=>"j", :customer_id=>"a"},
{:product_id=>"k", :customer_id=>"b"},
{:product_id=>"l", :customer_id=>"c"}
I tried to loop over product_mapping and select the customer_order_id that matches product_order_id in customer_with_products, and then thought of looping over customer_mapping, but I was not able to get the desired output from the first step.
How can I achieve this?
Using a helper that merges two arrays of hashes on a shared key:
def merge_by(a,b, key)
(a+b).group_by { |h| h[key] }
.each_value.map { |arr| arr.inject(:merge) }
end
merge_by(
merge_by(customer_mapping, customer_with_products, :customer_order_id),
product_mapping,
:product_order_id
).select { |h| h[:product_id] }.map { |h| h.slice(:product_id, :customer_id) }
#=>[{:product_id=>"j", :customer_id=>"a"},
# {:product_id=>"k", :customer_id=>"b"},
# {:product_id=>"l", :customer_id=>"c"}]
Definitely not the cleanest solution. If your initial arrays come from SQL queries, I think those queries could be modified to aggregate your data properly. The first merge_by call produces:
merge_by(customer_mapping, customer_with_products, :customer_order_id)
# => [{:customer_id=>"a", :customer_order_id=>"g1", :product_order_id=>"a1"},
# {:customer_id=>"b", :customer_order_id=>"g2", :product_order_id=>"a2"},
# {:customer_id=>"c", :customer_order_id=>"g3", :product_order_id=>"a3"},
# {:customer_id=>"d", :customer_order_id=>"g4", :product_order_id=>"a4"},
# {:customer_id=>"e", :customer_order_id=>"g5", :product_order_id=>"a5"}]
Then merge it similarly with your last array and clean up the result by selecting only the elements for which :product_id was found and slicing the wanted keys.
Alternatively, here is a much more readable solution, which depending on your array sizes might be slower because it keeps iterating over the arrays:
product_mapping.map do |hc|
b_match = customer_with_products.detect { |hb| hb[:product_order_id] == hc[:product_order_id] }
a_match = customer_mapping.detect { |ha| ha[:customer_order_id] == b_match[:customer_order_id] }
[hc, a_match, b_match].inject(:merge)
end.map { |h| h.slice(:product_id, :customer_id) }
Following your handling of the problem the solution would be the following:
result_hash_array = product_mapping.map do |product_mapping_entry|
customer_receipt = customer_with_products.find do |customer_with_products_entry|
product_mapping_entry[:product_order_id] == customer_with_products_entry[:product_order_id]
end
customer_id = customer_mapping.find do |customer_mapping_entry|
customer_receipt[:customer_order_id] == customer_mapping_entry[:customer_order_id]
end[:customer_id]
{product_id: product_mapping_entry[:product_id], customer_id: customer_id}
end
Output
result_hash_array
#=> [{:product_id=>"j", :customer_id=>"a"},
#    {:product_id=>"k", :customer_id=>"b"},
#    {:product_id=>"l", :customer_id=>"c"}]
Another option, starting from customer_mapping, as a one-liner (but quite wide). Note that customers without a matching product are kept, with a nil product_id:
customer_mapping.map { |e| {customer_id: e[:customer_id], product_id: (product_mapping.detect { |k| k[:product_order_id] == (customer_with_products.detect{ |h| h[:customer_order_id] == e[:customer_order_id] } || {} )[:product_order_id] } || {} )[:product_id] } }
#=> [{:customer_id=>"a", :product_id=>"j"},
# {:customer_id=>"b", :product_id=>"k"},
# {:customer_id=>"c", :product_id=>"l"},
# {:customer_id=>"d", :product_id=>nil},
# {:customer_id=>"e", :product_id=>nil}]
cust_order_id_to_cust_id =
customer_mapping.each_with_object({}) do |g,h|
h[g[:customer_order_id]] = g[:customer_id]
end
#=> {"g1"=>"a", "g2"=>"b", "g3"=>"c", "g4"=>"d", "g5"=>"e"}
prod_order_id_to_cust_order_id =
customer_with_products.each_with_object({}) do |g,h|
h[g[:product_order_id]] = g[:customer_order_id]
end
#=> {"a1"=>"g1", "a2"=>"g2", "a3"=>"g3", "a4"=>"g4", "a5"=>"g5"}
product_mapping.map do |h|
{ product_id: h[:product_id], customer_id:
cust_order_id_to_cust_id[prod_order_id_to_cust_order_id[h[:product_order_id]]] }
end
#=> [{:product_id=>"j", :customer_id=>"a"},
# {:product_id=>"k", :customer_id=>"b"},
# {:product_id=>"l", :customer_id=>"c"}]
This formulation is particularly easy to test. (It's so straightforward that no debugging was needed).
I would recommend a longer but more readable solution, one that you will still understand months from now. Use descriptive names instead of hiding things behind k, v for more complex lookups (maybe it's just my personal preference).
I would suggest something like:
result = product_mapping.map do |mapping|
customer_id = customer_mapping.find do |customer|
customer[:customer_order_id] == customer_with_products.find do |order|
order[:product_order_id] == mapping[:product_order_id]
end[:customer_order_id]
end[:customer_id]
{ product_id: mapping[:product_id], customer_id: customer_id }
end
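The nested find calls above raise NoMethodError when a product has no matching order, because nil[...] is called. A possible nil-safe variation, sketched here with lookup hashes, skips unmatched products instead. It assumes Ruby 2.7 or later for filter_map (to_h with a block needs 2.6), and the extra product "z" is added only to demonstrate the unmatched case.

```ruby
customer_mapping = [
  { customer_id: "a", customer_order_id: "g1" },
  { customer_id: "b", customer_order_id: "g2" },
  { customer_id: "c", customer_order_id: "g3" }
]
customer_with_products = [
  { customer_order_id: "g1", product_order_id: "a1" },
  { customer_order_id: "g2", product_order_id: "a2" },
  { customer_order_id: "g3", product_order_id: "a3" }
]
product_mapping = [
  { product_id: "j", product_order_id: "a1" },
  { product_id: "k", product_order_id: "a2" },
  { product_id: "l", product_order_id: "a3" },
  { product_id: "z", product_order_id: "missing" } # no matching order
]

# Build each lookup table once so every product needs two hash lookups.
customer_by_order = customer_mapping.to_h { |h| [h[:customer_order_id], h[:customer_id]] }
order_by_product_order = customer_with_products.to_h { |h| [h[:product_order_id], h[:customer_order_id]] }

pairs = product_mapping.filter_map do |h|
  customer_id = customer_by_order[order_by_product_order[h[:product_order_id]]]
  # filter_map drops the nil returned when no customer was found.
  { product_id: h[:product_id], customer_id: customer_id } if customer_id
end

p pairs
```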

Group by key in array and get max and average

I have an array that is structured as such:
{"status": "ok", "data": [{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"}, {"temp": 21, "wind": 321.0, "datetime": "20160815-0345"}]}
I'm looking to group by the datetime key (ignoring the time), find the max temp and the average wind.
I've tried something as follows, but I'm unsure how to do max_by and an average in the same map:
@data['data'].group_by { |d| d.values_at("datetime") }.map { |_, v| v.max_by { |h| h["temp"] } }
So, when you write "data": { ... } in a Ruby hash literal, the key data actually becomes a symbol, not a string, so you would need to do something like:
@data[:data].group_by { |data| data[:datetime].split('-')[0] }
in order to group by the :datetime key, ignoring the time portion (I assume, the time portion is just everything after the -). Then you end up with a hash looking like:
{"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
and to find the max :temp and average of the :wind you can do:
results = @data[:data].group_by { |data| data[:datetime].split('-')[0] }.map do |date, values|
[date, {
maximum_temp: values.max_by { |value| value[:temp] }[:temp],
average_wind: values.sum { |value| value[:wind] }.to_f / values.length
}]
end.to_h
# => {"20160815"=>{:maximum_temp=>22, :average_wind=>336.0}}
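Whether the keys are strings or symbols depends on how the data was produced. If it comes from a JSON document, a sketch like the following parses it with symbolize_names: true so that the symbol-key code above applies. The JSON string, including the third record for a second date, is invented for this example.

```ruby
require "json"

json = '{"status": "ok", "data": [' \
       '{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"},' \
       '{"temp": 21, "wind": 321.0, "datetime": "20160815-0345"},' \
       '{"temp": 18, "wind": 100.0, "datetime": "20160816-0100"}]}'

# symbolize_names: true yields :temp, :wind, :datetime instead of string keys.
parsed = JSON.parse(json, symbolize_names: true)

daily = parsed[:data]
  .group_by { |row| row[:datetime].split("-").first }
  .transform_values do |rows|
    {
      maximum_temp: rows.map { |row| row[:temp] }.max,
      # sum(0.0) forces a Float result even if the winds are Integers.
      average_wind: rows.sum(0.0) { |row| row[:wind] } / rows.size
    }
  end

p daily
```

Using transform_values avoids the intermediate array of pairs and the explicit to_h.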
The above method works well, but the code seems a bit complicated: it uses max_by and then accesses [:temp], then sum, and finally an explicit to_h. If you care about performance and readability, you could use a basic each like below (data here is the already-grouped hash):
data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
data.map do |k, v|
winds = []
temps = []
v.each do |item|
winds << item[:wind]
temps << item[:temp]
end
{k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f/winds.length}}
end
And the output is below (an array with one single-key hash per date, since map returns an array):
# => [{"20160815"=>{:max_temp=>22, :avg_wind=>336.0}}]
Below is a small benchmark comparing the each and max_by versions (Benchmark.ips comes from the benchmark-ips gem):
require 'benchmark/ips'
data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
def by_each(data)
data.map do |k, v|
winds = []
temps = []
v.each do |item|
winds << item[:wind]
temps << item[:temp]
end
{k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f/winds.length}}
end
end
def by_max(data)
data.map do |date, values|
[date, {
maximum_temp: values.max_by { |value| value[:temp] }[:temp],
average_wind: values.sum { |value| value[:wind] }.to_f / values.length
}]
end.to_h
end
Benchmark.ips do |x|
x.config(times: 10)
x.report 'BY_EACH' do
by_each(data)
end
x.report 'BY_MAX' do
by_max(data)
end
x.compare!
end
And the benchmark output is below:
Warming up --------------------------------------
BY_EACH 18.894k i/100ms
BY_MAX 13.793k i/100ms
Calculating -------------------------------------
BY_EACH 226.160k (± 5.3%) i/s - 1.134M in 5.025488s
BY_MAX 154.745k (± 5.8%) i/s - 772.408k in 5.006365s
Comparison:
BY_EACH: 226159.5 i/s
BY_MAX: 154744.8 i/s - 1.46x slower
Hence, you can see BY_MAX is 1.46 times slower than BY_EACH. Of course, you can use whichever approach best suits your understanding and usability.

Calculating avg. in deeply nested hash and then group by another field

I'm trying to work out the most efficient way to loop through some deeply nested data, find the average of the values and return a new hash with the data grouped by the date.
The raw data looks like this:
[
{
client_id: 2,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>17870.153846153848,
"44"=>15117.866666666667
}
}
},
{
client_id: 1,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>38113.846153846156,
"44"=>33032.0
}
}
},
{
client_id: 4,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>299960.0,
"44"=>334182.4
}
}
}
]
I have about 10,000,000 of these to loop through so I'm a little worried about performance.
The end result needs to look like this. The avg values need to be the average of the txbps values:
[
{
date: "2015-11-14",
avg: 178730.153846153848
},
{
date: "2015-11-15",
avg: 123987.192873978987
},
{
date: "2015-11-16",
avg: 126335.982123876283
}
]
I've tried this to start:
results.map { |val| val["txbps"].values.map { |a| a.values.sum } }
But that's giving me this:
[[5211174.189281798, 25998.222222222223], [435932.442835184, 56051.555555555555], [5718452.806735582, 321299.55555555556]]
And I just can't figure out how to get it done. I can't find any good references online either.
I also tried to group by the date first:
res.map { |date, values| values.map { |client| client["txbps"].map { |tx,a| { date: date, client_id: client[':'], tx: (a.values.inject(:+) / a.size).to_i } } } }.flatten
[
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>306539
},
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>25998
},
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>25643
},
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>56051
},
{
:date=>"2015-11-14",
:client_id=>"1",
:tx=>336379
},
{
:date=>"2015-11-14",
:client_id=>"1",
:tx=>321299
}
]
If possible, how can I do this in a single run?
---- EDIT ----
Got a little bit further:
res.map { |a,b|
{
date: a[:date], val: a["txbps"].values.map { |k,v|
k.values.sum / k.size
}.first
}
}.
group_by { |el| el[:date] }.map { |date,list|
{
key: date, val: list.map { |elem| elem[:val] }.reduce(:+) / list.size
}
}
But that's epic - is there a faster, simpler way??
Try #inject
Like .map, it's a way of converting an enumerable (list, hash, pretty much anything you can loop over in Ruby) into a different object. Compared to .map, it's a lot more flexible, which is super helpful. Sadly, this comes at the cost of the method being hard to wrap your head around. I think Drew Olson explains it best in his answer:
You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.
Examples:
To sum all the numbers in an array (with #inject), you can do this:
array = [5,10,7,8]
# |- Initial Value
array.inject(0) { |sum, n| sum + n } #=> 30
# |- You return the new value for the accumulator in this block.
To find the average of an array of numbers, you can find the sum and then divide. If you instead divide each num inside the inject block ({|sum, num| sum + (num / array.size)}), you perform one division per element rather than a single division at the end.
array = [5,10,7,8]
array.inject(0.0) { |sum, num| sum + num } / array.size #=> 7.5
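The accumulator does not have to be a number. As a small sketch of the same idea with a hash accumulator (the word list is invented for this example), the block must return the accumulator so that the next iteration receives it:

```ruby
words = %w[apple avocado banana blueberry cherry]

by_letter = words.inject({}) do |groups, word|
  # Create the list for this first letter on demand, then append the word.
  (groups[word[0]] ||= []) << word
  groups # the block's return value becomes the accumulator for the next word
end

p by_letter
```

Forgetting the final groups line is a common mistake: the block would then return the array produced by <<, and the next iteration would receive that array instead of the hash.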
Method
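If creating methods on classes is your style, you can define methods on the Array class (from John Feminella's answer). Note that Ruby 2.4 and later already define Array#sum; the version below overrides it and always returns a Float. Put this code somewhere before you need to find the sum or mean of an array: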
class Array
def sum
inject(0.0) { |result, el| result + el }
end
def mean
sum / size
end
end
And then
[5,10,7,8].sum #=> 30.0
[5,10,7,8].mean #=> 7.5
Gem
If you like putting code in black boxes, or really precious minerals, then you can use the average gem by fegoa89: gem install average. It also has support for the #mode and #median
[5,10,7,8].mean #=> 7.5
Solution:
Assuming your objects look like this:
data = [
{
date: "2015-11-14",
...
txbps: {...},
},
{
date: "2015-11-14",
...
txbps: {...},
},
...
]
This code does what you need, but it's somewhat complex.
class Array
def sum
inject(0.0) { |result, el| result + el }
end
def mean
sum / size
end
end
data = (data.inject({}) do |hash, item|
this = (item[:txbps].values.map {|i| i.values}).flatten # Get values of values of `txbps`
hash[item[:date]] = (hash[item[:date]] || []) + this # If a list already exists for this date, use it, otherwise create a new list, and add the info we created above.
hash # Return the hash for future use
end).map do |day, value|
{date: day, avg: value.mean} # Clean data
end
This collects the values into arrays grouped by date and then averages each array. For the three sample records in the question, data becomes:
#=> [{:date=>"2015-11-14", :avg=>123046.04444444446}]
Data Structure
I assume your input data is an array of hashes. For example:
arr = [
{
client_id: 2,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>17870.15,
"44"=>15117.86
}
}
},
{
client_id: 1,
date: "2015-11-15",
txbps: {
"22"=>{
"43"=>38113.84,
"44"=>33032.03,
}
}
},
{
client_id: 4,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>299960.0,
"44"=>334182.4
}
}
},
{
client_id: 3,
date: "2015-11-15",
txbps: {
"22"=>{
"43"=>17870.15,
"44"=>15117.86
}
}
}
]
Code
Based on my understanding of the problem, you can compute averages as follows:
def averages(arr)
h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |g,h|
g[:txbps].values.each { |f| h[g[:date]].concat(f.values) } }
h.merge(h) { |_,v| (v.reduce(:+)/(v.size.to_f)).round(2) }
end
Example
For arr above:
avgs = averages(arr)
#=> {"2015-11-14"=>166782.6, "2015-11-15"=>26033.47}
The value of the hash h in the first line of the method was:
{"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
Convert hash returned by averages to desired array of hashes
avgs is not in the form of the output desired. It's a simple matter to do the conversion, but you might consider leaving the hash output in this format. The conversion is simply:
avgs.map { |d,avg| { date: d, avg: avg } }
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
Explanation
Rather than explain in detail how the method works, I will instead give an alternative form of the method that does exactly the same thing, but in a more verbose and slightly less Ruby-like way. I've also included the conversion of the hash to an array of hashes at the end:
def averages(arr)
h = {}
arr.each do |g|
vals = g[:txbps].values
vals.each do |f|
date = g[:date]
h[date] = [] unless h.key?(date)
h[date].concat(f.values)
end
end
keys = h.keys
keys.each do |k|
val = h[k]
h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
end
h.map { |d,avg| { date: d, avg: avg } }
end
Now let me insert some puts statements to print out various intermediate values in the calculations, to help explain what's going on:
def averages(arr)
h = {}
arr.each do |g|
puts "g=#{g}"
vals = g[:txbps].values
puts "vals=#{vals}"
vals.each do |f|
puts " f=#{f}"
date = g[:date]
puts " date=#{date}"
h[date] = [] unless h.key?(date)
puts " before concat, h=#{h}"
h[date].concat(f.values)
puts " after concat, h=#{h}"
end
puts
end
puts "h=#{h}"
keys = h.keys
puts "keys=#{keys}"
keys.each do |k|
val = h[k]
puts " k=#{k}, val=#{val}"
puts " val.reduce(:+)=#{val.reduce(:+)}"
puts " val.size.to_f=#{val.size.to_f}"
h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
puts " h[#{k}]=#{h[k]}"
puts
end
h.map { |d,avg| { date: d, avg: avg } }
end
Execute averages once more:
averages(arr)
g={:client_id=>2, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-14
before concat, h={"2015-11-14"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86]}
g={:client_id=>1, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>38113.84, "44"=>33032.03}}}
vals=[{"43"=>38113.84, "44"=>33032.03}]
f={"43"=>38113.84, "44"=>33032.03}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>4, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>299960.0, "44"=>334182.4}}}
vals=[{"43"=>299960.0, "44"=>334182.4}]
f={"43"=>299960.0, "44"=>334182.4}
date=2015-11-14
before concat, h={"2015-11-14"=>[17870.15, 15117.86],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>3, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
keys=["2015-11-14", "2015-11-15"]
k=2015-11-14, val=[17870.15, 15117.86, 299960.0, 334182.4]
val.reduce(:+)=667130.41
val.size.to_f=4.0
h[2015-11-14]=166782.6
k=2015-11-15, val=[38113.84, 33032.03, 17870.15, 15117.86]
val.reduce(:+)=104133.87999999999
val.size.to_f=4.0
h[2015-11-15]=26033.47
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
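With around 10,000,000 records, holding every value in per-date arrays may use a lot of memory. One possible alternative, sketched below under the assumption that the records have the same shape as above, keeps only a running sum and count per date. The method name streaming_averages and the small sample data are made up for this example.

```ruby
# Hypothetical helper: one pass, constant memory per date.
def streaming_averages(records)
  totals = Hash.new { |hash, date| hash[date] = { sum: 0.0, count: 0 } }

  records.each do |record|
    bucket = totals[record[:date]]
    # txbps is a hash of hashes; visit every leaf value.
    record[:txbps].each_value do |inner|
      inner.each_value do |value|
        bucket[:sum] += value
        bucket[:count] += 1
      end
    end
  end

  # A date whose records contain no values would yield NaN here.
  totals.map { |date, t| { date: date, avg: (t[:sum] / t[:count]).round(2) } }
end

sample = [
  { client_id: 2, date: "2015-11-14", txbps: { "22" => { "43" => 10.0, "44" => 20.0 } } },
  { client_id: 1, date: "2015-11-15", txbps: { "22" => { "43" => 1.0,  "44" => 2.0 } } },
  { client_id: 4, date: "2015-11-14", txbps: { "22" => { "43" => 30.0, "44" => 40.0 } } }
]

result = streaming_averages(sample)
p result
```

Because records is only required to respond to each, the same method could in principle be fed from a lazy enumerator or a database cursor rather than a fully loaded array.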

How can you sort an array in Ruby starting at a specific letter, say letter f?

I have a text array.
text_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
text_array = text_array.sort would give us a sorted array.
However, I want a sorted array with f as the first letter for our order, and e as the last.
So the end result should be...
text_array = ["frank", "george", "harry", "isaac", "jordan", "alice", "bob", "carol", "dave", "eve"]
What would be the best way to accomplish this?
Try this:
result = (text_array.select{ |v| v =~ /^[f-z]/ }.sort + text_array.select{ |v| v =~ /^[a-e]/ }.sort).flatten
It's not the prettiest but it will get the job done.
Edit per comment. Making a more general piece of code:
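# sort_from is an illustrative name; term is the value at which to split, e.g. "f"
def sort_from(text_array, term)
before = []
after = []
text_array.sort.each do |t|
if t > term
after << t
else
before << t
end
end
after + before
end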
This code assumes that term is whatever you want to divide the array. And if an array value equals term, it will be at the end.
You can do that using a hash:
alpha = ('a'..'z').to_a
#=> ["a", "b", "c",..."x", "y", "z"]
reordered = alpha.rotate(5)
#=> ["f", "g",..."z", "a",...,"e"]
h = reordered.zip(alpha).to_h
# => {"f"=>"a", "g"=>"b",..., "z"=>"u", "a"=>"v",..., "e"=>"z"}
text_array.sort_by { |w| w.gsub(/./,h) }
#=> ["frank", "george", "harry", "isaac", "jordan",
# "alice", "bob", "carol", "dave", "eve"]
A variant of this is:
a_to_z = alpha.join
#=> "abcdefghijklmnopqrstuvwxyz"
f_to_e = reordered.join
#=> "fghijklmnopqrstuvwxyzabcde"
text_array.sort_by { |w| w.tr(f_to_e, a_to_z) }
#=> ["frank", "george", "harry", "isaac", "jordan",
# "alice", "bob", "carol", "dave", "eve"]
I think the easiest would be to rotate the sorted array:
sorted = text_array.sort
offset = sorted.find_index { |e| e.start_with?('f') }
sorted.rotate(offset) if offset
Combining Ryan K's answer and my previous answer, this is a one-liner you can use without any regex:
text_array = text_array.sort!.select { |x| x[0] >= "f" } + text_array.select { |x| x[0] < "f" }
If I got your question right, it looks like you want to create a sorted list that is biased by predefined patterns, i.e. you want to define specific text patterns that can completely change the sort order of the array elements.
Here is my proposal. You can get better code out of this, but it's what my tired brain came up with for now:
an_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
# Define your patterns with scores so that the sorting result can vary accordingly
# It's full fledged Regex so you can put any kind of regex you want.
patterns = {
/^f/ => 100,
/^e/ => -100,
/^g/ => 60,
/^j/ => 40
}
# Sort the array with our preferred sequence
sorted_array = an_array.sort do |left, right|
# Find score for the left string
left_score = patterns.find{ |p, s| left.match(p) }
left_score = left_score ? left_score.last : 0
# Find the score for the right string
right_score = patterns.find{ |p, s| right.match(p) }
right_score = right_score ? right_score.last : 0
# Create the comparison score to prepare the right order
# 1 means replace with right and -1 means replace with left
# and 0 means remain unchanged
score = if right_score > left_score
1
elsif left_score > right_score
-1
else
0
end
# For debugging purpose, I added few verbose data
puts "L:#{left_score}, R:#{right_score}: #{left}, #{right} => #{score}"
score
end
# Original array
puts an_array.join(', ')
# Biased array
puts sorted_array.join(', ')
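For the original question (sort alphabetically, but starting at a given letter), another possible approach is a sort_by with a two-part key: the first letter's distance from the start letter, wrapped around the alphabet, then the word itself. This is only a sketch; the method name sort_starting_at is made up here, and it assumes every word is non-empty and begins with an ASCII letter.

```ruby
# Hypothetical helper: alphabetical order that begins at start_letter and wraps around.
def sort_starting_at(words, start_letter)
  words.sort_by do |word|
    # Ruby's % always returns a non-negative result for a positive divisor,
    # so letters before start_letter wrap around to the end (a -> 21 for "f").
    [(word[0].downcase.ord - start_letter.ord) % 26, word]
  end
end

text_array = ["bob", "alice", "dave", "carol", "frank", "eve",
              "jordan", "isaac", "harry", "george"]

sorted = sort_starting_at(text_array, "f")
p sorted
```

Words that share a first letter are ordered by the second element of the key, which is the word itself.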
