Group by key in array and get max and average - ruby-on-rails

I have a data structure like this (an array of readings under the "data" key):
{"status": "ok", "data": [{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"}, {"temp": 21, "wind": 321.0, "datetime": "20160815-0345"}]}
I'm looking to group by the datetime key (ignoring the time), find the max temp and the average wind.
I've tried the following, but I'm unsure how to do max_by and an average in the same map:
@data['data'].group_by { |d| d.values_at("datetime") }.map { |_, v| v.max_by { |h| h["temp"] } }

So, when you write "data": { ... }, the key actually becomes the symbol :data, not the string "data", so you would need to do something like:
@data[:data].group_by { |data| data[:datetime].split('-')[0] }
in order to group by the :datetime key, ignoring the time portion (I assume the time portion is just everything after the -). Then you end up with a hash looking like:
{"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
Then, to find the max :temp and the average of the :wind, you can do:
results = @data[:data].group_by { |data| data[:datetime].split('-')[0] }.map do |date, values|
  [date, {
    maximum_temp: values.max_by { |value| value[:temp] }[:temp],
    average_wind: values.sum { |value| value[:wind] }.to_f / values.length
  }]
end.to_h
# => {"20160815"=>{:maximum_temp=>22, :average_wind=>336.0}}

The above method works very well, but the code seems a bit complicated, what with max_by followed by the [:temp] access, then sum and an explicit to_h. So, if you care about performance and readability, you could use the basic each like below:
data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
data.map do |k, v|
  winds = []
  temps = []
  v.each do |item|
    winds << item[:wind]
    temps << item[:temp]
  end
  { k => { max_temp: temps.max, avg_wind: winds.inject(:+).to_f / winds.length } }
end
And the output is below,
# => {"20160815"=>{:max_temp=>22, :avg_wind=>336.0}}
Below is a small benchmark comparing the each approach with the max_by one:
data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
def by_each(data)
  data.map do |k, v|
    winds = []
    temps = []
    v.each do |item|
      winds << item[:wind]
      temps << item[:temp]
    end
    { k => { max_temp: temps.max, avg_wind: winds.inject(:+).to_f / winds.length } }
  end
end
def by_max(data)
  data.map do |date, values|
    [date, {
      maximum_temp: values.max_by { |value| value[:temp] }[:temp],
      average_wind: values.sum { |value| value[:wind] }.to_f / values.length
    }]
  end.to_h
end
require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(times: 10)
  x.report 'BY_EACH' do
    by_each(data)
  end
  x.report 'BY_MAX' do
    by_max(data)
  end
  x.compare!
end
And the benchmark output looks like this:
Warming up --------------------------------------
BY_EACH 18.894k i/100ms
BY_MAX 13.793k i/100ms
Calculating -------------------------------------
BY_EACH 226.160k (± 5.3%) i/s - 1.134M in 5.025488s
BY_MAX 154.745k (± 5.8%) i/s - 772.408k in 5.006365s
Comparison:
BY_EACH: 226159.5 i/s
BY_MAX: 154744.8 i/s - 1.46x slower
Hence, you can see BY_MAX is about 1.46 times slower than BY_EACH. But of course you can use whichever approach best suits your understanding and use case.
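If you want to avoid the intermediate grouping hash entirely, the whole thing can also be done in a single pass with each_with_object. A sketch, assuming the same @data shape as above (transform_values needs Ruby 2.4+):

@data[:data].each_with_object({}) do |row, acc|
  date  = row[:datetime].split('-').first
  entry = acc[date] ||= { max_temp: row[:temp], wind_sum: 0.0, count: 0 }
  entry[:max_temp]  = row[:temp] if row[:temp] > entry[:max_temp]
  entry[:wind_sum] += row[:wind]
  entry[:count]    += 1
end.transform_values do |e|
  { max_temp: e[:max_temp], avg_wind: e[:wind_sum] / e[:count] }
end
#=> {"20160815"=>{:max_temp=>22, :avg_wind=>336.0}}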

Related

Group by unique values while summing / adding other values

I have a data structure that looks like this:
arr = [
  {
    price: 2.0,
    unit: "meter",
    tariff_code: "4901.99",
    amount: 200
  },
  {
    price: 2.0,
    unit: "meter",
    tariff_code: "4901.99",
    amount: 200
  },
  {
    price: 14.0,
    unit: "yards",
    tariff_code: "6006.24",
    amount: 500
  },
  {
    price: 14.0,
    unit: "yards",
    tariff_code: "6006.24",
    amount: 500
  }
]
I need to group all of these by tariff_code, while summing the price and amounts that correspond with that tariff code. So my expected output should be:
[
  {
    price: 4.0,
    unit: "meter",
    tariff_code: "4901.99",
    amount: 400
  },
  {
    price: 28.0,
    unit: "yards",
    tariff_code: "6006.24",
    amount: 1000
  }
]
receipt_data[:order_items].group_by { |oi| oi[:tariff_code] }.values
The group_by statement used above will allow me to group by tariff_code but I'm unable to work out a way to sum the other values. I'm sure there is a slick one-liner way to accomplish this...
More verbose:
grouped_items = arr.group_by { |oi| oi[:tariff_code] }
result = grouped_items.map do |tariff_code, code_items|
  price, amount = code_items.reduce([0, 0]) do |(price, amount), ci|
    [price + ci[:price], amount + ci[:amount]]
  end
  {
    price: price,
    unit: code_items.first[:unit],
    tariff_code: tariff_code,
    amount: amount
  }
end
#[
# {:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400}
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}
#]
Just to add to the fun, here is the answer using group_by, as @cary suggested, mostly copying Pavel's answer. This is very bad performance-wise; use it only if the array is small. Also, it uses sum with a block, which is built into Ruby 2.4+; on older Rubies it comes from Rails/ActiveSupport (and can be replaced by .map { |item| item[:price] }.reduce(:+) in pure Ruby).
arr.group_by { |a| a[:tariff_code] }.map do |tariff_code, items|
  {
    price: items.sum { |item| item[:price] },
    unit: items.first[:unit],
    tariff_code: tariff_code,
    amount: items.sum { |item| item[:amount] }
  }
end
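For Rubies older than 2.4, where Enumerable#sum is not available without ActiveSupport, the same answer can be sketched with map plus reduce:

arr.group_by { |a| a[:tariff_code] }.map do |tariff_code, items|
  {
    price: items.map { |item| item[:price] }.reduce(0, :+),
    unit: items.first[:unit],
    tariff_code: tariff_code,
    amount: items.map { |item| item[:amount] }.reduce(0, :+)
  }
end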
This would have been even smaller if it was an array of objects (ActiveRecord objects maybe) with methods instead of hashes.
arr.group_by(&:tariff_code).map do |tariff_code, items|
  {
    price: items.sum(&:price),
    unit: items.first.unit,
    tariff_code: tariff_code,
    amount: items.sum(&:amount)
  }
end
There are two standard ways of addressing problems of this kind. One, which I've taken, is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. The other way is to use Enumerable#group_by, which I expect someone will soon employ in another answer. I do not believe either approach is preferable in terms of efficiency or readability.
arr.each_with_object({}) do |g,h|
  h.update(g[:tariff_code]=>g) do |_,o,n|
    { price: o[:price]+n[:price], unit: o[:unit], amount: o[:amount]+n[:amount] }
  end
end.values
#=> [{:price=>4.0, :unit=>"meter", :amount=>400},
# {:price=>28.0, :unit=>"yards", :amount=>1000}]
Note that the receiver of values is seen to be:
{"4901.99"=>{:price=>4.0, :unit=>"meter", :amount=>400},
{"6006.24"=>{:price=>28.0, :unit=>"yards", :amount=>1000}}
A simple approach; it's easy to add new keys for summing and to change the group key. Not sure about efficiency, but a Benchmark of running the arr.map below 500_000 times looks good (note that arr.map is used here only for its side effects on result; arr.each would be more idiomatic):
#<Benchmark::Tms:0x00007fad0911b418 @label="", @real=1.480799000000843, @cstime=0.0, @cutime=0.0, @stime=0.0017340000000000133, @utime=1.4783359999999999, @total=1.48007>
summ_keys = %i[price amount]
grouping_key = :tariff_code

result = Hash.new { |h, k| h[k] = {} }
arr.map do |h|
  cumulative = result[h[grouping_key]]
  h.each do |k, v|
    case k
    when *summ_keys
      cumulative[k] = (cumulative[k] || 0) + h[k]
    else
      cumulative[k] = v
    end
  end
end
p result.values
# [{:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400},
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}]

Ruby: Passing down key/value after transforming objects in array

Given data:
data = [
{"id":14, "sort":1, "content":"9", foo: "2022"},
{"id":14, "sort":4, "content":"5", foo: "2022"},
{"id":14, "sort":2, "content":"1", foo: "2022"},
{"id":14, "sort":3, "content":"0", foo: "2022"},
{"id":15, "sort":4, "content":"4", foo: "2888"},
{"id":15, "sort":2, "content":"1", foo: "2888"},
{"id":15, "sort":1, "content":"3", foo: "2888"},
{"id":15, "sort":3, "content":"3", foo: "2888"},
{"id":16, "sort":1, "content":"8", foo: "3112"},
{"id":16, "sort":3, "content":"4", foo: "3112"},
{"id":16, "sort":2, "content":"4", foo: "3112"},
{"id":16, "sort":4, "content":"9", foo: "3112"}
]
I got the contents concatenated by their sort order and grouped by their ids with:
formatted = data.group_by { |d| d[:id]}.transform_values do |value_array|
value_array.sort_by { |b| b[:sort] }
.map { |c| c[:content] }.join
end
puts formatted
#=> {14=>"9105", 15=>"3134", 16=>"8449"}
I know that foo exists inside value_array, but I'm wondering how I can carry foo through into the formatted variable so I can map over it to get the desired output, or whether that's even possible.
Desired Output:
[
{"id":14, "concated_value":"9105", foo: "2022"},
{"id":15, "concated_value":"3134", foo: "2888"},
{"id":16, "concated_value":"8449", foo: "3112"}
]
Since :foo is unique per :id, you can do this as follows:
data.group_by { |h| h[:id] }.map do |_, sa|
  sa.map(&:dup).sort_by { |h| h.delete(:sort) }.reduce do |m, h|
    m.merge(h) { |key, old, new| key == :content ? old + new : old }
  end.tap { |h| h[:concated_value] = h.delete(:content) }
end
#=> [
# {"id":14, foo: "2022", "concated_value":"9105"},
# {"id":15, foo: "2888", "concated_value":"3134"},
# {"id":16, foo: "3112", "concated_value":"8449"}
# ]
First we group by id. group_by {|h| h[:id]}
Then we dup the hashes in the groups (so as not to destroy the original). map(&:dup)
Then we sort by sort and delete it at the same time. .sort_by {|h| h.delete(:sort) }
Then we merge the groups together and concatenate the content key only.
m.merge(h) {|key,old,new| key == :content ? old + new : old }
Then we just change the key for content to concated_value with tap {|h| h[:concated_value] = h.delete(:content) }, illustrated below.
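A minimal illustration of that tap step on its own:

{ content: "9105" }.tap { |h| h[:concated_value] = h.delete(:content) }
#=> {:concated_value=>"9105"}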
We can use the first value from value_array to get our :id and :foo values. (Hash#slice is built into Ruby 2.5+; on earlier versions it comes from ActiveSupport.)
formatted = data.group_by { |d| d[:id] }.values.map do |value_array|
  concated_value = value_array.sort_by { |b| b[:sort] }
                              .map { |c| c[:content] }.join
  value_array.first.slice(:id, :foo)
             .merge(concated_value: concated_value)
end
I think this is a good use case for reduce: after grouping, you first need to get rid of the ID in the [ID, VALUES] pairs that group_by produces, and return a reduced version of the VALUES part. This can all be done without ActiveSupport or other dependencies:
data
.group_by{ |d| d[:id] } # Get an array of [ID, [VALUES]]
.reduce([]) do |a, v| # Reduce it into a new empty array
# Append a new hash to the new array
a << {
id: v[1].first[:id], # Just take the ID of the first entry
foo: v[1].first[:foo], # Ditto for foo
concatenated: v[1]
.sort_by{ |s| s[:sort] } # now sort all hashes by its sort key
.collect{ |s| s[:content] } # collect the content
.join # and merge it into a string
}
end
Output:
[{:id=>14, :foo=>"2022", :concatenated=>"9105"},
{:id=>15, :foo=>"2888", :concatenated=>"3134"},
{:id=>16, :foo=>"3112", :concatenated=>"8449"}]
EDIT
I had a different approach in mind when I started writing the previous solution; reduce was not really necessary, since the size of the array does not change after group_by, so a map is sufficient.
But while rewriting the code, I realized that creating a new hash with all the keys and copying all the values from the first hash within VALUES was a bit too much work, so it is easier to just reject the unwanted keys:
keys_to_ignore = [:sort, :content]

data
  .group_by { |d| d[:id] }           # Get an array of [ID, [VALUES]]
  .map do |v|
    v[1]
      .first                         # Take the first hash from [VALUES]
      .merge(concatenated: v[1]      # Insert the concatenated values
        .sort_by { |s| s[:sort] }    # sort all hashes by their sort key
        .collect { |s| s[:content] } # collect the content
        .join)                       # and merge it into a string
      .reject { |k, _| keys_to_ignore.include?(k) }
  end
Output
[{:id=>14, :foo=>"2022", :concatenated=>"9105"},
{:id=>15, :foo=>"2888", :concatenated=>"3134"},
{:id=>16, :foo=>"3112", :concatenated=>"8449"}]
This will work even without Rails:
$irb> formatted = []
$irb> data.sort_by!{|a| a[:sort]}.map {|z| z[:id]}.uniq.each_with_index { |id, index| formatted << {id: id, concated_value: data.map{|c| (c[:id] == id ? c[:content] : nil)}.join, foo: data[index][:foo]}}
$irb> formatted
[{:id=>14, :concated_value=>"9105", :foo=>"2022"},
{:id=>15, :concated_value=>"3134", :foo=>"2888"},
{:id=>16, :concated_value=>"8449", :foo=>"3112"}]
data.sort_by { |h| h[:sort] }.
  each_with_object({}) do |g,h|
    h.update(g[:id] => { id: g[:id], concatenated_value: g[:content].to_s, foo: g[:foo] }) do |_,o,n|
      o.merge(concatenated_value: o[:concatenated_value] + n[:concatenated_value])
    end
  end.values
#=> [{:id=>14, :concatenated_value=>"9105", :foo=>"2022"},
# {:id=>15, :concatenated_value=>"3134", :foo=>"2888"},
# {:id=>16, :concatenated_value=>"8449", :foo=>"3112"}]
This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys (here the value of :id) that are present in both hashes being merged. See the doc for the description of the three block variables (here _, o and n).
Note the receiver of values (at the end) is the following.
{ 14=>{ :id=>14, :concatenated_value=>"9105", :foo=>"2022" },
15=>{ :id=>15, :concatenated_value=>"3134", :foo=>"2888" },
16=>{ :id=>16, :concatenated_value=>"8449", :foo=>"3112" } }

Rails deserialize format output

I'm facing a problem: I don't know how to arrange the serialized format in Rails.
I have models call MissionSet, QuestionSet, Group
The MissionSet query returns data shaped roughly as shown below (the screenshots from the original question are not available).
I want to transform it into question sets grouped per group; it's a real challenge for me because I'm not familiar with handling this format.
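For illustration, assume inp is an array of hashes like this (a hypothetical reconstruction that matches the keys the answer below reads):

inp = [
  { 'question_set_id' => 1, 'assignments' => { 10 => {}, 11 => {} } },
  { 'question_set_id' => 2, 'assignments' => { 10 => {} } }
]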
Here's something that will get you started:
x = {}
inp.each do |h|
  h['assignments'].each do |k, _|
    x[k] ||= []
    x[k] << h['question_set_id']
  end
end

out = x.map do |key, value|
  {
    group_id: key,
    question_sets: value.map { |v| { id: v } }
  }
end
puts out.inspect
This code will first group your questions by the ids in assignments and then format it as you wanted.
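With the hypothetical inp above, this prints:

[{:group_id=>10, :question_sets=>[{:id=>1}, {:id=>2}]}, {:group_id=>11, :question_sets=>[{:id=>1}]}]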

Calculating avg. in deeply nested hash and then group by another field

I'm trying to work out the most efficient way to loop through some deeply nested data, find the average of the values and return a new hash with the data grouped by the date.
The raw data looks like this:
[
  {
    client_id: 2,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>17870.153846153848,
        "44"=>15117.866666666667
      }
    }
  },
  {
    client_id: 1,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>38113.846153846156,
        "44"=>33032.0
      }
    }
  },
  {
    client_id: 4,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>299960.0,
        "44"=>334182.4
      }
    }
  }
]
I have about 10,000,000 of these to loop through so I'm a little worried about performance.
The end result needs to look like this, where the values are the averages of the txbps readings:
[
  {
    date: "2015-11-14",
    avg: 178730.153846153848
  },
  {
    date: "2015-11-15",
    avg: 123987.192873978987
  },
  {
    date: "2015-11-16",
    avg: 126335.982123876283
  }
]
I've tried this to start:
results.map { |val| val["txbps"].values.map { |a| a.values.sum } }
But that's giving me this:
[[5211174.189281798, 25998.222222222223], [435932.442835184, 56051.555555555555], [5718452.806735582, 321299.55555555556]]
And I just can't figure out how to get it done. I can't find any good references online either.
I also tried to group by the date first:
res.map { |date, values| values.map { |client| client["txbps"].map { |tx,a| { date: date, client_id: client[:client_id], tx: (a.values.inject(:+) / a.size).to_i } } } }.flatten
[
  { :date=>"2015-11-14", :client_id=>"2", :tx=>306539 },
  { :date=>"2015-11-14", :client_id=>"2", :tx=>25998 },
  { :date=>"2015-11-14", :client_id=>"2", :tx=>25643 },
  { :date=>"2015-11-14", :client_id=>"2", :tx=>56051 },
  { :date=>"2015-11-14", :client_id=>"1", :tx=>336379 },
  { :date=>"2015-11-14", :client_id=>"1", :tx=>321299 }
]
If possible, how can I do this in a single pass?
---- EDIT ----
Got a little bit further:
res.map { |a, b|
  {
    date: a[:date],
    val: a["txbps"].values.map { |k, v|
      k.values.sum / k.size
    }.first
  }
}.group_by { |el| el[:date] }.map { |date, list|
  {
    key: date,
    val: list.map { |elem| elem[:val] }.reduce(:+) / list.size
  }
}
But that's epic - is there a faster, simpler way??
Try #inject
Like .map, it's a way of converting an enumerable (list, hash, pretty much anything you can loop over in Ruby) into a different object. Compared to .map, it's a lot more flexible, which is super helpful. Sadly, this comes at the cost of the method being super hard to wrap your head around. I think Drew Olson explains it best in his answer.
You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.
Examples:
To sum all the numbers in an array (with #inject), you can do this:
array = [5,10,7,8]
array.inject(0) { |sum, n| sum + n } #=> 30
# 0 is the initial value of the accumulator;
# the block returns the new value of the accumulator on each iteration.
To find the average of an array of numbers, you can find the sum and then divide. If you instead divide each num inside the inject block ({ |sum, num| sum + (num / array.size) }), you do a division on every iteration rather than just once at the end.
array = [5,10,7,8]
array.inject(0.0) { |sum, num| sum + num } / array.size #=> 7.5
Method
If creating methods on classes is your style, you can define a method on the Array class (from John Feminella's answer). Put this code somewhere before you need to find the sum or mean of an array:
class Array
  def sum
    inject(0.0) { |result, el| result + el }
  end

  def mean
    sum / size
  end
end
And then
sum  = [5,10,7,8].sum  #=> 30.0
mean = [5,10,7,8].mean #=> 7.5
Gem
If you like putting code in black boxes, or really precious minerals, then you can use the average gem by fegoa89: gem install average. It also has support for the #mode and #median
[5,10,7,8].mean #=> 7.5
Solution:
Assuming your objects look like this:
data = [
  {
    date: "2015-11-14",
    ...
    txbps: {...},
  },
  {
    date: "2015-11-14",
    ...
    txbps: {...},
  },
  ...
]
This code does what you need, but it's somewhat complex.
class Array
  def sum
    inject(0.0) { |result, el| result + el }
  end

  def mean
    sum / size
  end
end
data = (data.inject({}) do |hash, item|
  # Get the values of the values of `txbps`
  this = item[:txbps].values.map { |i| i.values }.flatten
  # If a list already exists for this date, use it; otherwise create a new one, and add the info built above
  hash[item[:date]] = (hash[item[:date]] || []) + this
  hash # Return the hash for the next iteration
end).map do |day, value|
  { date: day, avg: value.mean } # Clean data
end
This merges your objects into arrays grouped by date and produces, for each date, a hash like:
{:date=>"2015-11-14", :avg=>123046.04444444446}
Data Structure
I assume your input data is an array of hashes. For example:
arr = [
  {
    client_id: 2,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>17870.15,
        "44"=>15117.86
      }
    }
  },
  {
    client_id: 1,
    date: "2015-11-15",
    txbps: {
      "22"=>{
        "43"=>38113.84,
        "44"=>33032.03
      }
    }
  },
  {
    client_id: 4,
    date: "2015-11-14",
    txbps: {
      "22"=>{
        "43"=>299960.0,
        "44"=>334182.4
      }
    }
  },
  {
    client_id: 3,
    date: "2015-11-15",
    txbps: {
      "22"=>{
        "43"=>17870.15,
        "44"=>15117.86
      }
    }
  }
]
Code
Based on my understanding of the problem, you can compute averages as follows:
def averages(arr)
  h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |g,h|
    g[:txbps].values.each { |f| h[g[:date]].concat(f.values) } }
  h.merge(h) { |_,v| (v.reduce(:+)/(v.size.to_f)).round(2) }
end
Example
For arr above:
avgs = averages(arr)
#=> {"2015-11-14"=>166782.6, "2015-11-15"=>26033.47}
The value of the hash h in the first line of the method was:
{"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
Convert hash returned by averages to desired array of hashes
avgs is not in the form of the output desired. It's a simple matter to do the conversion, but you might consider leaving the hash output in this format. The conversion is simply:
avgs.map { |d,avg| { date: d, avg: avg } }
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
Explanation
Rather than explain in detail how the method works, I will instead give an alternative form of the method that does exactly the same thing, but in a more verbose and slightly less Ruby-like way. I've also included the conversion of the hash to an array of hashes at the end:
def averages(arr)
  h = {}
  arr.each do |g|
    vals = g[:txbps].values
    vals.each do |f|
      date = g[:date]
      h[date] = [] unless h.key?(date)
      h[date].concat(f.values)
    end
  end
  keys = h.keys
  keys.each do |k|
    val = h[k]
    h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
  end
  h.map { |d,avg| { date: d, avg: avg } }
end
Now let me insert some puts statements to print out various intermediate values in the calculations, to help explain what's going on:
def averages(arr)
  h = {}
  arr.each do |g|
    puts "g=#{g}"
    vals = g[:txbps].values
    puts "vals=#{vals}"
    vals.each do |f|
      puts "  f=#{f}"
      date = g[:date]
      puts "  date=#{date}"
      h[date] = [] unless h.key?(date)
      puts "  before concat, h=#{h}"
      h[date].concat(f.values)
      puts "  after concat, h=#{h}"
    end
    puts
  end
  puts "h=#{h}"
  keys = h.keys
  puts "keys=#{keys}"
  keys.each do |k|
    val = h[k]
    puts "  k=#{k}, val=#{val}"
    puts "  val.reduce(:+)=#{val.reduce(:+)}"
    puts "  val.size.to_f=#{val.size.to_f}"
    h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
    puts "  h[#{k}]=#{h[k]}"
    puts
  end
  h.map { |d,avg| { date: d, avg: avg } }
end
Execute averages once more:
averages(arr)
g={:client_id=>2, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-14
before concat, h={"2015-11-14"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86]}
g={:client_id=>1, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>38113.84, "44"=>33032.03}}}
vals=[{"43"=>38113.84, "44"=>33032.03}]
f={"43"=>38113.84, "44"=>33032.03}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>4, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>299960.0, "44"=>334182.4}}}
vals=[{"43"=>299960.0, "44"=>334182.4}]
f={"43"=>299960.0, "44"=>334182.4}
date=2015-11-14
before concat, h={"2015-11-14"=>[17870.15, 15117.86],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>3, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
keys=["2015-11-14", "2015-11-15"]
k=2015-11-14, val=[17870.15, 15117.86, 299960.0, 334182.4]
val.reduce(:+)=667130.41
val.size.to_f=4.0
h[2015-11-14]=166782.6
k=2015-11-15, val=[38113.84, 33032.03, 17870.15, 15117.86]
val.reduce(:+)=104133.87999999999
val.size.to_f=4.0
h[2015-11-15]=26033.47
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]

Improve performance to find an array of ids from array of hashes in Ruby

Consider a array of hashes
a=[{'id'=>'1','imageUrl'=>'abc'},{'id'=>'2','imageUrl'=>'efg'},{'id'=>'3','imageUrl'=>'hij'}]
Consider an array of characters/numbers/ids
b=['1','2','5']
I would like to match ids of b with a. With all matches, I would like to replace the value of b with the corresponding hash.
In the above example, the values '1' and '2' are common between a and b, so I replace '1' and '2' in b with the corresponding hash values of a.
So the resultant b becomes
b=[[{"id"=>"1", "imageUrl"=>"abc"}], [{"id"=>"2", "imageUrl"=>"efg"}], []]
I wrote the following code:
b.each_with_index do |r, index|
  puts index
  k = a.select { |z| z["id"] == r }
  b[index] = k
end
Is there a better, sleeker solution? I am new to Ruby.
You can use the destructive version of map (Array#map!) together with Enumerable#select:
b.map! {|id| a.select {|h| h['id'] == id }}
# => [[{"id"=>"1", "imageUrl"=>"abc"}], [{"id"=>"2", "imageUrl"=>"efg"}], []]
This will improve speed:
#!/usr/bin/env ruby
require 'pp'
require 'benchmark'
a = []
5000.times { |c| a << { "id" => "#{c}", "imageUrl" => "test#{c}" } }
b1 = (1..2500).to_a.shuffle.map(&:to_s)
b2 = b1.dup

puts "method1"
puts Benchmark.measure { b1.map! { |id| a.select { |h| h['id'] == id } } }

puts "method2"
result = Benchmark.measure do
  ah = Hash.new([])
  a.each { |x| ah[x["id"]] = x }
  b2.map! { |be| ah[be] }
end
puts result
Results:
method1
2.820000 0.010000 2.830000 ( 2.827695)
method2
0.000000 0.000000 0.000000 ( 0.002607)
Updated benchmark - it uses 250000 elements in b instead of 2500 (method 1 is commented out to protect the innocent - it's too slow and I got bored waiting for it). The hash-based methods win because they do one constant-time hash lookup per element of b, whereas method 1 rescans all of a for every element:
#!/usr/bin/env ruby
require 'pp'
require 'benchmark'
a = []
5000.times { |c| a << { "id" => "#{c}", "imageUrl" => "test#{c}" } }
b1 = (1..250000).to_a.collect { |x| x % 2500 }.shuffle.map(&:to_s)
b2 = b1.dup
b3 = b1.dup

# puts "method1"
# puts Benchmark.measure { b1.map! { |id| a.select { |h| h['id'] == id } } }

puts "method2"
result = Benchmark.measure do
  ah = Hash.new([])
  a.each { |x| ah[x["id"]] = x }
  b2.map! { |be| ah[be] }
end
puts result

puts "method3"
result = Benchmark.measure do
  h = a.each_with_object({}) { |g,h| h.update(g['id'] => g) }
  b3.map! { |s| h.key?(s) ? [h[s]] : [] }
end
puts result
And the results are:
method2
0.050000 0.000000 0.050000 ( 0.045294)
method3
0.100000 0.010000 0.110000 ( 0.109646)
[Edit: after posting I noticed @Mircea had already posted the same solution. I'll leave mine for the mention of the values_at alternative.]
I assume the values of 'id' in a are unique.
First construct a look-up hash:
h = a.each_with_object({}) { |g,h| h.update(g['id']=>g) }
#=> {"1"=>{"id"=>"1", "imageUrl"=>"abc"},
# "2"=>{"id"=>"2", "imageUrl"=>"efg"},
# "3"=>{"id"=>"3", "imageUrl"=>"hij"}}
Then simply loop through b, constructing the desired array:
b.map { |s| h.key?(s) ? [h[s]] : [] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
# [{"id"=>"2", "imageUrl"=>"efg"}],
# []]
Alternatively,
arr = h.values_at(*b)
#=> [{"id"=>"1", "imageUrl"=>"abc"},
# {"id"=>"2", "imageUrl"=>"efg"},
# nil]
Then:
arr.map { |e| e.nil? ? [] : [e] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
# [{"id"=>"2", "imageUrl"=>"efg"}],
# []]
You might instead consider using arr for subsequent calculations, since all the arrays in your desired solution contain at most one element.
The use of a lookup hash is especially efficient when b is large relative to a.
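A variant of the same idea: give the lookup hash a default, so the key? check disappears. A sketch, again assuming the ids in a are unique:

h = a.each_with_object(Hash.new { [] }) { |g, memo| memo[g['id']] = [g] }
b.map { |s| h[s] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
#    [{"id"=>"2", "imageUrl"=>"efg"}],
#    []]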
