I'm trying to work out the most efficient way to loop through some deeply nested data, find the average of the values and return a new hash with the data grouped by the date.
The raw data looks like this:
[
{
client_id: 2,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>17870.153846153848,
"44"=>15117.866666666667
}
}
},
{
client_id: 1,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>38113.846153846156,
"44"=>33032.0
}
}
},
{
client_id: 4,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>299960.0,
"44"=>334182.4
}
}
}
]
I have about 10,000,000 of these to loop through so I'm a little worried about performance.
The end result needs to look like this. Each avg value needs to be the average of the txbps values for that date:
[
{
date: "2015-11-14",
avg: 178730.153846153848
},
{
date: "2015-11-15",
avg: 123987.192873978987
},
{
date: "2015-11-16",
avg: 126335.982123876283
}
]
I've tried this to start:
results.map { |val| val["txbps"].values.map { |a| a.values.sum } }
But that's giving me this:
[[5211174.189281798, 25998.222222222223], [435932.442835184, 56051.555555555555], [5718452.806735582, 321299.55555555556]]
And I just can't figure out how to get it done. I can't find any good references online either.
I also tried to group by the date first:
res.map { |date, values| values.map { |client| client["txbps"].map { |tx,a| { date: date, client_id: client[':'], tx: (a.values.inject(:+) / a.size).to_i } } } }.flatten
[
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>306539
},
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>25998
},
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>25643
},
{
:date=>"2015-11-14",
:client_id=>"2",
:tx=>56051
},
{
:date=>"2015-11-14",
:client_id=>"1",
:tx=>336379
},
{
:date=>"2015-11-14",
:client_id=>"1",
:tx=>321299
}
]
If possible, how can I do this in a single pass?
---- EDIT ----
Got a little bit further:
res.map { |a,b|
{
date: a[:date], val: a["txbps"].values.map { |k,v|
k.values.sum / k.size
}.first
}
}.
group_by { |el| el[:date] }.map { |date,list|
{
key: date, val: list.map { |elem| elem[:val] }.reduce(:+) / list.size
}
}
But that's epic - is there a faster, simpler way??
Try #inject
Like .map, it's a way of converting an enumerable (list, hash, pretty much anything you can loop over in Ruby) into a different object. Compared to .map, it's a lot more flexible, which is super helpful. Sadly, this comes at the cost of the method being hard to wrap your head around. I think Drew Olson explains it best in his answer.
You can think of the first block argument as an accumulator: the result of each run of the block is stored in the accumulator and then passed to the next execution of the block. In the case of the code shown above, you are defaulting the accumulator, result, to 0. Each run of the block adds the given number to the current total and then stores the result back into the accumulator. The next block call has this new value, adds to it, stores it again, and repeats.
Examples:
To sum all the numbers in an array (with #inject), you can do this:
array = [5,10,7,8]
# |- Initial Value
array.inject(0) { |sum, n| sum + n } #=> 30
# |- You return the new value for the accumulator in this block.
To find the average of an array of numbers, compute the sum first and divide once at the end. If you instead divide num inside the inject block ({ |sum, num| sum + (num / array.size) }), you perform one division per element rather than a single division overall.
array = [5,10,7,8]
array.inject(0.0) { |sum, num| sum + num } / array.size #=> 7.5
Method
If creating methods on classes is your style, you can define a method on the Array class (from John Feminella's answer). Put this code somewhere before you need to find the sum or mean of an array:
class Array
def sum
inject(0.0) { |result, el| result + el }
end
def mean
sum / size
end
end
And then
[5,10,7,8].sum #=> 30.0
[5,10,7,8].mean #=> 7.5
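On Ruby 2.4 or later, Array#sum is already built in, so reopening Array is not needed just to total the values. A minimal sketch, assuming a non-empty array of numbers (the variable names are only illustrative):

```ruby
numbers = [5, 10, 7, 8]

# Built-in Array#sum (Ruby 2.4+). Passing 0.0 as the initial value makes
# the total a Float, so the division below is not integer division.
total = numbers.sum(0.0)
mean  = total / numbers.size

puts total  # prints 30.0
puts mean   # prints 7.5
```

An empty array would give 0.0 / 0, which is NaN, so guard against that case if it can occur.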
Gem
If you like putting code in black boxes, or really precious minerals, then you can use the average gem by fegoa89: gem install average. It also supports #mode and #median.
[5,10,7,8].mean #=> 7.5
Solution:
Assuming your objects look like this:
data = [
{
date: "2015-11-14",
...
txbps: {...},
},
{
date: "2015-11-14",
...
txbps: {...},
},
...
]
This code does what you need, but it's somewhat complex.
class Array
def sum
inject(0.0) { |result, el| result + el }
end
def mean
sum / size
end
end
data = (data.inject({}) do |hash, item|
this = (item[:txbps].values.map {|i| i.values}).flatten # Get values of values of `txbps`
hash[item[:date]] = (hash[item[:date]] || []) + this # If a list already exists for this date, use it, otherwise create a new list, and add the info we created above.
hash # Return the hash for future use
end).map do |day, value|
{date: day, avg: value.mean} # Clean data
end
It groups the txbps values by date and returns an array with one hash per date. For the sample data in the question:
[{:date=>"2015-11-14", :avg=>123046.04444444446}]
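With around 10,000,000 records, collecting every value into a per-date array may use a lot of memory. The following is a sketch of a single-pass variant that keeps only a running sum and count per date. It assumes the same array-of-hashes shape as above; the sample values and the names totals and averages are made up for illustration.

```ruby
# Sample input in the shape assumed by this answer (values are made up).
data = [
  { client_id: 2, date: "2015-11-14", txbps: { "22" => { "43" => 10.0, "44" => 20.0 } } },
  { client_id: 1, date: "2015-11-14", txbps: { "22" => { "43" => 30.0 } } },
  { client_id: 4, date: "2015-11-15", txbps: { "22" => { "43" => 5.0, "44" => 15.0 } } }
]

# totals maps each date to [running_sum, count]; no per-date arrays are kept.
totals = Hash.new { |hash, date| hash[date] = [0.0, 0] }

data.each do |item|
  entry = totals[item[:date]]
  item[:txbps].each_value do |inner|
    inner.each_value do |value|
      entry[0] += value
      entry[1] += 1
    end
  end
end

# Convert the running totals into the requested array of hashes.
averages = totals.map { |date, (sum, count)| { date: date, avg: sum / count } }
p averages
```

Whether this is actually faster depends on the data and the Ruby version, so it is worth benchmarking against the array-based version before committing to it.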
Data Structure
I assume your input data is an array of hashes. For example:
arr = [
{
client_id: 2,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>17870.15,
"44"=>15117.86
}
}
},
{
client_id: 1,
date: "2015-11-15",
txbps: {
"22"=>{
"43"=>38113.84,
"44"=>33032.03,
}
}
},
{
client_id: 4,
date: "2015-11-14",
txbps: {
"22"=>{
"43"=>299960.0,
"44"=>334182.4
}
}
},
{
client_id: 3,
date: "2015-11-15",
txbps: {
"22"=>{
"43"=>17870.15,
"44"=>15117.86
}
}
}
]
Code
Based on my understanding of the problem, you can compute averages as follows:
def averages(arr)
h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |g,h|
g[:txbps].values.each { |f| h[g[:date]].concat(f.values) } }
h.merge(h) { |_,v| (v.reduce(:+)/(v.size.to_f)).round(2) }
end
Example
For arr above:
avgs = averages(arr)
#=> {"2015-11-14"=>166782.6, "2015-11-15"=>26033.47}
The value of the hash h in the first line of the method was:
{"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
Convert hash returned by averages to desired array of hashes
avgs is not in the form of the output desired. It's a simple matter to do the conversion, but you might consider leaving the hash output in this format. The conversion is simply:
avgs.map { |d,avg| { date: d, avg: avg } }
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
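On Ruby 2.4 or later, the h.merge(h) { ... } idiom can probably be replaced by Hash#transform_values, which some readers find easier to follow. This is a sketch of the same method under that assumption; the sample input is made up and the local name grouped is illustrative.

```ruby
def averages(arr)
  grouped = arr.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, h|
    g[:txbps].each_value { |f| h[g[:date]].concat(f.values) }
  end
  # Hash#transform_values (Ruby 2.4+) maps each array of values to its mean.
  grouped.transform_values { |v| (v.sum / v.size.to_f).round(2) }
end

arr = [
  { date: "2015-11-14", txbps: { "22" => { "43" => 1.0, "44" => 2.0 } } },
  { date: "2015-11-15", txbps: { "22" => { "43" => 4.0 } } },
  { date: "2015-11-14", txbps: { "22" => { "43" => 6.0 } } }
]

result = averages(arr).map { |d, avg| { date: d, avg: avg } }
p result
```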
Explanation
Rather than explain in detail how the method works, I will instead give an alternative form of the method that does exactly the same thing, but in a more verbose and slightly less Ruby-like way. I've also included the conversion of the hash to an array of hashes at the end:
def averages(arr)
h = {}
arr.each do |g|
vals = g[:txbps].values
vals.each do |f|
date = g[:date]
h[date] = [] unless h.key?(date)
h[date].concat(f.values)
end
end
keys = h.keys
keys.each do |k|
val = h[k]
h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
end
h.map { |d,avg| { date: d, avg: avg } }
end
Now let me insert some puts statements to print out various intermediate values in the calculations, to help explain what's going on:
def averages(arr)
h = {}
arr.each do |g|
puts "g=#{g}"
vals = g[:txbps].values
puts "vals=#{vals}"
vals.each do |f|
puts " f=#{f}"
date = g[:date]
puts " date=#{date}"
h[date] = [] unless h.key?(date)
puts " before concat, h=#{h}"
h[date].concat(f.values)
puts " after concat, h=#{h}"
end
puts
end
puts "h=#{h}"
keys = h.keys
puts "keys=#{keys}"
keys.each do |k|
val = h[k]
puts " k=#{k}, val=#{val}"
puts " val.reduce(:+)=#{val.reduce(:+)}"
puts " val.size.to_f=#{val.size.to_f}"
h[k] = (val.reduce(:+)/(val.size.to_f)).round(2)
puts " h[#{k}]=#{h[k]}"
puts
end
h.map { |d,avg| { date: d, avg: avg } }
end
Execute averages once more:
averages(arr)
g={:client_id=>2, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-14
before concat, h={"2015-11-14"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86]}
g={:client_id=>1, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>38113.84, "44"=>33032.03}}}
vals=[{"43"=>38113.84, "44"=>33032.03}]
f={"43"=>38113.84, "44"=>33032.03}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86], "2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>4, :date=>"2015-11-14", :txbps=>{"22"=>{"43"=>299960.0, "44"=>334182.4}}}
vals=[{"43"=>299960.0, "44"=>334182.4}]
f={"43"=>299960.0, "44"=>334182.4}
date=2015-11-14
before concat, h={"2015-11-14"=>[17870.15, 15117.86],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
g={:client_id=>3, :date=>"2015-11-15", :txbps=>{"22"=>{"43"=>17870.15, "44"=>15117.86}}}
vals=[{"43"=>17870.15, "44"=>15117.86}]
f={"43"=>17870.15, "44"=>15117.86}
date=2015-11-15
before concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03]}
after concat, h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
h={"2015-11-14"=>[17870.15, 15117.86, 299960.0, 334182.4],
"2015-11-15"=>[38113.84, 33032.03, 17870.15, 15117.86]}
keys=["2015-11-14", "2015-11-15"]
k=2015-11-14, val=[17870.15, 15117.86, 299960.0, 334182.4]
val.reduce(:+)=667130.41
val.size.to_f=4.0
h[2015-11-14]=166782.6
k=2015-11-15, val=[38113.84, 33032.03, 17870.15, 15117.86]
val.reduce(:+)=104133.87999999999
val.size.to_f=4.0
h[2015-11-15]=26033.47
#=> [{:date=>"2015-11-14", :avg=>166782.6},
# {:date=>"2015-11-15", :avg=>26033.47}]
Related
I have a data structure that looks like this:
arr = [
{
price: 2.0,
unit: "meter",
tariff_code: "4901.99",
amount: 200
},
{
price: 2.0,
unit: "meter",
tariff_code: "4901.99",
amount: 200
},
{
price: 14.0,
unit: "yards",
tariff_code: "6006.24",
amount: 500
},
{
price: 14.0,
unit: "yards",
tariff_code: "6006.24",
amount: 500
}
]
I need to group all of these by tariff_code, while summing the price and amounts that correspond with that tariff code. So my expected output should be:
[
{
price: 4.0,
unit: "meter",
tariff_code: "4901.99",
amount: 400
},
{
price: 28.0,
unit: "yards",
tariff_code: "6006.24",
amount: 1000
}
]
receipt_data[:order_items].group_by { |oi| oi[:tariff_code] }.values
The group_by statement used above will allow me to group by tariff_code but I'm unable to work out a way to sum the other values. I'm sure there is a slick one-liner way to accomplish this...
More verbose:
grouped_items = arr.group_by { |oi| oi[:tariff_code] }
result = grouped_items.map do |tariff_code, code_items|
price, amount = code_items.reduce([0, 0]) do |(price, amount), ci|
[price + ci[:price], amount + ci[:amount]]
end
{
price: price,
unit: code_items.first[:unit],
tariff_code: tariff_code,
amount: amount
}
end
# [
#   {:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400},
#   {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}
# ]
Just to add to the fun, here is an answer that uses group_by, as @cary said, mostly copying Pavel's answer. It performs poorly, so use it only if the array is small. It also uses sum with a block, which is in core Ruby from 2.4 onward; on older versions it is available only in Rails (and can be replaced by .map { |item| item[:price] }.reduce(:+) in pure Ruby).
arr.group_by { |a| a[:tariff_code] }.map do |tariff_code, items|
{
price: items.sum { |item| item[:price] },
unit: items.first[:unit],
tariff_code: tariff_code,
amount: items.sum { |item| item[:amount] }
}
end
This would have been even smaller if it was an array of objects (ActiveRecord objects maybe) with methods instead of hashes.
arr.group_by(&:tariff_code).map do |tariff_code, items|
{
price: items.sum(&:price),
unit: items.first.unit,
tariff_code: tariff_code,
amount: items.sum(&:amount)
}
end
There are two standard ways of addressing problems of this kind. One, which I've taken, is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. The other way is to use Enumerable#group_by, which I expect someone will soon employ in another answer. I do not believe either approach is preferable in terms of efficiency or readability.
arr.each_with_object({}) do |g,h|
h.update(g[:tariff_code]=>g) do |_,o,n|
{ price: o[:price]+n[:price], unit: o[:unit], amount: o[:amount]+n[:amount] }
end
end.values
#=> [{:price=>4.0, :unit=>"meter", :amount=>400},
# {:price=>28.0, :unit=>"yards", :amount=>1000}]
Note that the receiver of values is seen to be:
{"4901.99"=>{:price=>4.0, :unit=>"meter", :amount=>400},
 "6006.24"=>{:price=>28.0, :unit=>"yards", :amount=>1000}}
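The merged hashes above no longer contain :tariff_code, which the question's expected output includes. One possible adjustment, sketched here as an assumption rather than as part of the original answer, is to build the merged value from the existing entry with Hash#merge so the untouched keys are carried over:

```ruby
arr = [
  { price: 2.0,  unit: "meter", tariff_code: "4901.99", amount: 200 },
  { price: 2.0,  unit: "meter", tariff_code: "4901.99", amount: 200 },
  { price: 14.0, unit: "yards", tariff_code: "6006.24", amount: 500 },
  { price: 14.0, unit: "yards", tariff_code: "6006.24", amount: 500 }
]

result = arr.each_with_object({}) do |g, h|
  h.update(g[:tariff_code] => g) do |_, o, n|
    # Start from the existing entry so :unit and :tariff_code are kept,
    # then overwrite only the two summed fields. Hash#merge returns a new
    # hash, so the elements of arr are not modified.
    o.merge(price: o[:price] + n[:price], amount: o[:amount] + n[:amount])
  end
end.values

p result
```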
A simple approach, but it's easy to add new keys for summing and to change the grouping key. I'm not sure about efficiency, but a Benchmark of 500_000 runs of the arr.map loop below looks good:
#<Benchmark::Tms:0x00007fad0911b418 @label="", @real=1.480799000000843, @cstime=0.0, @cutime=0.0, @stime=0.0017340000000000133, @utime=1.4783359999999999, @total=1.48007>
summ_keys = %i[price amount]
grouping_key = :tariff_code
result = Hash.new { |h, k| h[k] = {} }
arr.map do |h|
cumulative = result[h[grouping_key]]
h.each do |k, v|
case k
when *summ_keys
cumulative[k] = (cumulative[k] || 0) + h[k]
else
cumulative[k] = v
end
end
end
p result.values
# [{:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400},
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}]
I have a data structure that looks like this:
{"status": "ok", "data": [{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"}, {"temp": 21, "wind": 321.0, "datetime": "20160815-0345"}]}
I'm looking to group by the datetime key (ignoring the time), find the max temp and the average wind.
I've tried something as follows, but unsure of how to do max_by and average in the same map:
@data['data'].group_by { |d| d.values_at("datetime") }.map { |_, v| v.max_by { |h| h["temp"] } }
When you write "data": { ... } in a Ruby hash literal, the key data actually becomes a symbol, not a string, so you would need to do something like:
@data[:data].group_by { |data| data[:datetime].split('-')[0] }
in order to group by the :datetime key, ignoring the time portion (I assume, the time portion is just everything after the -). Then you end up with a hash looking like:
{"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
and to find the max :temp and average of the :wind you can do:
results = @data[:data].group_by { |data| data[:datetime].split('-')[0] }.map do |date, values|
[date, {
maximum_temp: values.max_by { |value| value[:temp] }[:temp],
average_wind: values.sum { |value| value[:wind] }.to_f / values.length
}]
end.to_h
# => {"20160815"=>{:maximum_temp=>22, :average_wind=>336.0}}
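If the structure actually arrives as JSON text (an assumption; the question does not say where it comes from), the key type depends on how it is parsed. The sketch below uses JSON.parse with symbolize_names: true so that the symbol lookups used above apply; the variable names raw and data are illustrative.

```ruby
require 'json'

raw = '{"status": "ok", "data": [' \
      '{"temp": 22, "wind": 351.0, "datetime": "20160815-0330"}, ' \
      '{"temp": 21, "wind": 321.0, "datetime": "20160815-0345"}]}'

# symbolize_names: true yields symbol keys (data[:data], row[:temp], ...).
# Without it, JSON.parse returns string keys and the lookups would need
# to be data["data"], row["temp"], and so on.
data = JSON.parse(raw, symbolize_names: true)

results = data[:data].group_by { |row| row[:datetime].split('-').first }.map do |date, rows|
  [date, {
    maximum_temp: rows.map { |row| row[:temp] }.max,
    average_wind: rows.sum { |row| row[:wind] } / rows.length.to_f
  }]
end.to_h

p results
```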
The method above works well, but the code is a bit complicated: it uses max_by and then accesses [:temp], plus sum and an explicit to_h. If performance and readability matter to you, you could use a basic each instead, like below:
data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
data.map do |k, v|
winds = []
temps = []
v.each do |item|
winds << item[:wind]
temps << item[:temp]
end
{k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f/winds.length}}
end
The output is an array with one hash per date:
# => [{"20160815"=>{:max_temp=>22, :avg_wind=>336.0}}]
Below is a small benchmark comparing the each and max_by versions (it requires the benchmark-ips gem):
require 'benchmark/ips'
data = {"20160815"=>[{:temp=>22, :wind=>351.0, :datetime=>"20160815-0330"}, {:temp=>21, :wind=>321.0, :datetime=>"20160815-0345"}]}
def by_each(data)
data.map do |k, v|
winds = []
temps = []
v.each do |item|
winds << item[:wind]
temps << item[:temp]
end
{k => {max_temp: temps.max, avg_wind: winds.inject(:+).to_f/winds.length}}
end
end
def by_max(data)
data.map do |date, values|
[date, {
maximum_temp: values.max_by { |value| value[:temp] }[:temp],
average_wind: values.sum { |value| value[:wind] }.to_f / values.length
}]
end.to_h
end
Benchmark.ips do |x|
x.config(times: 10)
x.report 'BY_EACH' do
by_each(data)
end
x.report 'BY_MAX' do
by_max(data)
end
x.compare!
end
The benchmark output is shown below:
Warming up --------------------------------------
BY_EACH 18.894k i/100ms
BY_MAX 13.793k i/100ms
Calculating -------------------------------------
BY_EACH 226.160k (± 5.3%) i/s - 1.134M in 5.025488s
BY_MAX 154.745k (± 5.8%) i/s - 772.408k in 5.006365s
Comparison:
BY_EACH: 226159.5 i/s
BY_MAX: 154744.8 i/s - 1.46x slower
Hence, BY_MAX is 1.46 times slower than BY_EACH in this run. Of course, you can use whichever approach you find easier to understand and work with.
I want to dynamically create a Hash from an array of arrays without overwriting keys. Each array has strings that contain the nested keys that should be created. However, I am running into an issue where I am overwriting keys, so only the last key is there.
data = {}
values = [
["income:concessions", 0, "noi", "722300", "purpose", "refinancing"],
["fees:fee-one", "0" ,"income:gross-income", "900000", "expenses:admin", "7500"],
["fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
]
What it should look like:
{
"income" => {
"concessions" => 0,
"gross-income" => "900000"
},
"expenses" => {
"admin" => "7500",
"other" => "0"
},
"noi" => "722300",
"purpose" => "refinancing",
"fees" => {
"fee-one" => "0",
"fee-two" => "0"
},
"address" => {
"zip" => "10019"
}
}
This is the code that I currently have. How can I avoid overwriting keys when I merge?
values.each do |row|
Hash[*row].each do |key, value|
keys = key.split(':')
if !data.dig(*keys)
hh = keys.reverse.inject(value) { |a, n| { n => a } }
a = data.merge!(hh)
end
end
end
The code you've provided can be modified to merge hashes on conflict instead of overwriting:
values.each do |row|
Hash[*row].each do |key, value|
keys = key.split(':')
if !data.dig(*keys)
hh = keys.reverse.inject(value) { |a, n| { n => a } }
data.merge!(hh) { |_, old, new| old.merge(new) }
end
end
end
But this code only works for two levels of nesting.
By the way, I noticed the ruby-on-rails tag on the question. Rails (ActiveSupport) provides a deep_merge! method that fixes the problem:
values.each do |row|
Hash[*row].each do |key, value|
keys = key.split(':')
if !data.dig(*keys)
hh = keys.reverse.inject(value) { |a, n| { n => a } }
data.deep_merge!(hh)
end
end
end
values.flatten.each_slice(2).with_object({}) do |(f,v),h|
k,e = f.is_a?(String) ? f.split(':') : [f,nil]
h[k] = e.nil? ? v : (h[k] || {}).merge(e=>v)
end
#=> {"income"=>{"concessions"=>0, "gross-income"=>"900000"},
# "noi"=>"722300",
# "purpose"=>"refinancing",
# "fees"=>{"fee-one"=>"0", "fee-two"=>"0"},
# "expenses"=>{"admin"=>"7500", "other"=>"0"},
# "address"=>{"zip"=>"10019"}}
The steps are as follows.
values = [
["income:concessions", 0, "noi", "722300", "purpose", "refinancing"],
["fees:fee-one", "0" ,"income:gross-income", "900000", "expenses:admin", "7500"],
["fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
]
a = values.flatten
#=> ["income:concessions", 0, "noi", "722300", "purpose", "refinancing",
# "fees:fee-one", "0", "income:gross-income", "900000", "expenses:admin", "7500",
# "fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
enum1 = a.each_slice(2)
#=> #<Enumerator: ["income:concessions", 0, "noi", "722300",
# "purpose", "refinancing", "fees:fee-one", "0", "income:gross-income", "900000",
# "expenses:admin", "7500", "fees:fee-two", "0", "address:zip", "10019",
# "expenses:other","0"]:each_slice(2)>
We can see what values this enumerator will generate by converting it to an array.
enum1.to_a
#=> [["income:concessions", 0], ["noi", "722300"], ["purpose", "refinancing"],
# ["fees:fee-one", "0"], ["income:gross-income", "900000"],
# ["expenses:admin", "7500"], ["fees:fee-two", "0"],
# ["address:zip", "10019"], ["expenses:other", "0"]]
Continuing,
enum2 = enum1.with_object({})
#=> #<Enumerator: #<Enumerator:
# ["income:concessions", 0, "noi", "722300", "purpose", "refinancing",
# "fees:fee-one", "0", "income:gross-income", "900000", "expenses:admin", "7500",
# "fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
# :each_slice(2)>:with_object({})>
enum2.to_a
#=> [[["income:concessions", 0], {}], [["noi", "722300"], {}],
# [["purpose", "refinancing"], {}], [["fees:fee-one", "0"], {}],
# [["income:gross-income", "900000"], {}], [["expenses:admin", "7500"], {}],
# [["fees:fee-two", "0"], {}], [["address:zip", "10019"], {}],
# [["expenses:other", "0"], {}]]
enum2 can be thought of as a compound enumerator (though Ruby has no such concept). The hash being generated is initially empty, as shown, but will be filled in as additional elements are generated by enum2.
The first value is generated by enum2 and passed to the block, and the block values are assigned values by a process called array decomposition.
(f,v),h = enum2.next
#=> [["income:concessions", 0], {}]
f #=> "income:concessions"
v #=> 0
h #=> {}
We now perform the block calculation.
f.is_a?(String)
#=> true
k,e = f.is_a?(String) ? f.split(':') : [f,nil]
#=> ["income", "concessions"]
e.nil?
#=> false
h[k] = e.nil? ? v : (h[k] || {}).merge(e=>v)
#=> {"concessions"=>0}
h[k] equals nil if h does not have a key k. In that case (h[k] || {}) #=> {}. If h does have a key k (and h[k] is not nil), (h[k] || {}) #=> h[k].
A second value is now generated by enum2 and passed to the block.
(f,v),h = enum2.next
#=> [["noi", "722300"], {"income"=>{"concessions"=>0}}]
f #=> "noi"
v #=> "722300"
h #=> {"income"=>{"concessions"=>0}}
Notice that the hash, h, has been updated. Recall it will be returned by with_object after all elements of enum2 have been generated. We now perform the block calculation.
f.is_a?(String)
#=> true
k,e = f.is_a?(String) ? f.split(':') : [f,nil]
#=> ["noi"]
e #=> nil
e.nil?
#=> true
h[k] = e.nil? ? v : (h[k] || {}).merge(e=>v)
#=> "722300"
h #=> {"income"=>{"concessions"=>0}, "noi"=>"722300"}
The remaining calculations are similar.
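The walk-through above handles keys with at most one colon. For deeper keys such as "a:b:c", one possible generalisation is sketched below. It assumes every key is a String and that no path tries to descend into a key that already holds a non-hash value; the method name nest is hypothetical.

```ruby
def nest(values)
  values.flatten.each_slice(2).with_object({}) do |(path, value), result|
    # Split "a:b:c" into parent keys ["a", "b"] and the final key "c".
    *parents, leaf = path.split(':')
    # Walk down the parents, creating an empty hash wherever one is missing.
    node = parents.inject(result) { |hash, key| hash[key] ||= {} }
    node[leaf] = value
  end
end

values = [
  ["income:concessions", 0, "noi", "722300", "purpose", "refinancing"],
  ["fees:fee-one", "0", "income:gross-income", "900000", "expenses:admin", "7500"],
  ["fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
]

nested = nest(values)
deeper = nest([["a:b:c", 1, "a:b:d", 2]])
p nested
p deeper
```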
merge overwrites a duplicate key by default.
{ "income"=> { "concessions" => 0 } }.merge({ "income"=> { "gross-income" => "900000" } }) completely overwrites the original value of "income". What you want is a recursive merge, where instead of just merging the top level hash you're merging the nested values when there's duplication.
merge takes a block where you can specify what to do in the event of duplication. From the documentation:
merge!(other_hash){|key, oldval, newval| block} → hsh
Adds the contents of other_hash to hsh. If no block is specified, entries with duplicate keys are overwritten with the values from other_hash, otherwise the value of each duplicate key is determined by calling the block with the key, its value in hsh and its value in other_hash
Using this you can define a simple recursive_merge in one line
def recursive_merge!(hash, other)
hash.merge!(other) { |_key, old_val, new_val| recursive_merge!(old_val, new_val) }
end
values.each do |row|
Hash[*row].each do |key, value|
keys = key.split(':')
if !data.dig(*keys)
hh = keys.reverse.inject(value) { |a, n| { n => a } }
a = recursive_merge!(data, hh)
end
end
end
A few more lines will give you a more robust solution that overwrites duplicate keys that are not hashes and even takes a block, just like merge:
def recursive_merge!(hash, other, &block)
hash.merge!(other) do |_key, old_val, new_val|
if [old_val, new_val].all? { |v| v.is_a?(Hash) }
recursive_merge!(old_val, new_val, &block)
elsif block_given?
block.call(_key, old_val, new_val)
else
new_val
end
end
end
h1 = { a: true, b: { c: [1, 2, 3] } }
h2 = { a: false, b: { x: [3, 4, 5] } }
recursive_merge!(h1, h2) { |_k, o, _n| o } # => { a: true, b: { c: [1, 2, 3], x: [3, 4, 5] } }
Note: This method reproduces the results you would get from ActiveSupport's Hash#deep_merge if you're using Rails.
This is how I would handle this:
def new_h
Hash.new{|h,k| h[k] = new_h}
end
values.flatten.each_slice(2).each_with_object(new_h) do |(k,v),obj|
keys = k.is_a?(String) ? k.split(':') : [k]
if keys.count > 1
set_key = keys.pop
obj.merge!(keys.inject(new_h) {|memo,k1| memo[k1] = new_h})
.dig(*keys)
.merge!({set_key => v})
else
obj[k] = v
end
end
#=> {"income"=>{
"concessions"=>0,
"gross-income"=>"900000"},
"noi"=>"722300",
"purpose"=>"refinancing",
"fees"=>{
"fee-one"=>"0",
"fee-two"=>"0"},
"expenses"=>{
"admin"=>"7500",
"other"=>"0"},
"address"=>{
"zip"=>"10019"}
}
Explanation:
Define a method (new_h) for setting up a new Hash whose default at any level is another new_h (Hash.new{|h,k| h[k] = new_h})
First flatten the Array (values.flatten)
then group each 2 elements together as pseudo key-value pairs (.each_slice(2))
then iterate over the pairs using an accumulator where each new element added is defaulted to a Hash (.each_with_object(new_h) do |(k,v),obj|)
split the pseudo key on a colon (keys = k.is_a?(String) ? k.split(':') : [k])
if there is a split then create the parent key(s) (obj.merge!(keys.inject(new_h) {|memo,k1| memo[k1] = new_h}))
merge the last child key equal to the value (.dig(*keys).merge!({set_key => v}))
otherwise set the single key equal to the value (obj[k] = v)
This has infinite depth as long as the depth chain is not broken, say by [["income:concessions:other",12],["income:concessions", 0]]; in this case the latter value will take precedence. (Note: this applies to all the answers in one way or another; e.g. in the accepted answer the former wins, but a value is still lost due to the inaccurate data structure.)
I have a hash, say,
account = {
name: "XXX",
email: "xxx@yyy.com",
details: {
phone: "9999999999",
dob: "00-00-00",
address: "zzz"
}
}
Now I want to convert account to a hash like this:
account = {
name: "XXX",
email: "xxx@yyy.com",
phone: "9999999999",
dob: "00-00-00",
address: "zzz"
}
I'm a beginner and would like to know if there is any function to do it? (Other than merging the nested hash and then deleting it)
You could implement a generic flatten_hash method which works roughly like Array#flatten in that it can flatten Hashes of arbitrary depth.
def flatten_hash(hash, &block)
hash.dup.tap do |result|
hash.each_pair do |key, value|
next unless value.is_a?(Hash)
flattened = flatten_hash(result.delete(key), &block)
result.merge!(flattened, &block)
end
end
end
Here, we are still performing the delete / merge sequence, but it would be required in any such implementation anyway, even if hidden below further abstractions.
You can use this method as follows:
account = {
name: "XXX",
email: "xxx@yyy.com",
details: {
phone: "9999999999",
dob: "00-00-00",
address: "zzz"
}
}
flatten_hash(account)
# => {:name=>"XXX", :email=>"xxx@yyy.com", :phone=>"9999999999", :dob=>"00-00-00", :address=>"zzz"}
Note that with this method, any keys in lower-level hashes overwrite existing keys in upper-level hashes by default. You can however provide a block to resolve any merge conflicts. Please refer to the documentation of Hash#merge! to learn how to use this.
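As an illustration of the conflict block, here is a small sketch that repeats the method so it runs on its own. The sample hash is made up: it deliberately uses :name at both levels so that a conflict occurs.

```ruby
def flatten_hash(hash, &block)
  hash.dup.tap do |result|
    hash.each_pair do |key, value|
      next unless value.is_a?(Hash)
      flattened = flatten_hash(result.delete(key), &block)
      result.merge!(flattened, &block)
    end
  end
end

# Made-up input where :name appears at both levels.
account = { name: "outer", details: { name: "inner", phone: "9999999999" } }

# Default behaviour: the lower-level value wins.
lower_wins = flatten_hash(account)

# With a block, the block decides; here the upper-level (old) value is kept.
upper_wins = flatten_hash(account) { |_key, old_value, _new_value| old_value }

p lower_wins
p upper_wins
```

The block receives the same arguments as the block form of Hash#merge!: the key, the value already in the result, and the incoming value.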
This will do the trick:
account.map{|k,v| k==:details ? v : {k => v}}.reduce({}, :merge)
Case 1: Each value of account may be a hash whose values are not hashes
account.flat_map { |k,v| v.is_a?(Hash) ? v.to_a : [[k,v]] }.to_h
#=> {:name=>"XXX", :email=>"xxx@yyy.com", :phone=>"9999999999",
# :dob=>"00-00-00", :address=>"zzz"}
Case 2: account may have nested hashes
def doit(account)
recurse(account.to_a).to_h
end
def recurse(arr)
arr.each_with_object([]) { |(k,v),a|
a.concat(v.is_a?(Hash) ? recurse(v.to_a) : [[k,v]]) }
end
account = {
name: "XXX",
email: "xxx@yyy.com",
details: {
phone: "9999999999",
dob: { a: 1, b: { c: 2, e: { f: 3 } } },
address: "zzz"
}
}
doit account
#=> {:name=>"XXX", :email=>"xxx@yyy.com", :phone=>"9999999999", :a=>1,
# :c=>2, :f=>3, :address=>"zzz"}
Explanation for Case 1
The calculations progress as follows.
One way to think of Enumerable#flat_map, as it is used here, is that if, for some method g,
[a, b, c].map { |e| g(e) } #=> [f, g, h]
where a, b, c, f, g and h are all arrays, then
[a, b, c].flat_map { |e| g(e) } #=> [*f, *g, *h]
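A concrete instance of the difference, using made-up data where the block returns an array for each element:

```ruby
groups = [["a", "b"], ["c"], ["d", "e"]]

# map keeps one result array per element, so the result stays nested.
nested = groups.map { |group| group.map(&:upcase) }

# flat_map splats each result array into a single flat array.
flat = groups.flat_map { |group| group.map(&:upcase) }

p nested  # prints [["A", "B"], ["C"], ["D", "E"]]
p flat    # prints ["A", "B", "C", "D", "E"]
```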
Let's start by creating an enumerator to pass elements to the block.
enum = account.to_enum
#=> #<Enumerator: {:name=>"XXX", :email=>"xxx@yyy.com",
# :details=>{:phone=>"9999999999", :dob=>"00-00-00",
# :address=>"zzz"}}:each>
enum generates an element which is passed to the block and the block variables are set equal to those values.
k, v = enum.next
#=> [:name, "XXX"]
k #=> :name
v #=> "XXX"
v.is_a?(Hash)
#=> false
a = [[k,v]]
#=> [[:name, "XXX"]]
k, v = enum.next
#=> [:email, "xxx@yyy.com"]
v.is_a?(Hash)
#=> false
b = [[k,v]]
#=> [[:email, "xxx@yyy.com"]]
k,v = enum.next
#=> [:details, {:phone=>"9999999999", :dob=>"00-00-00", :address=>"zzz"}]
v.is_a?(Hash)
#=> true
c = v.to_a
#=> [[:phone, "9999999999"], [:dob, "00-00-00"], [:address, "zzz"]]
d = account.flat_map { |k,v| v.is_a?(Hash) ? v.to_a : [[k,v]] }
#=> [*a, *b, *c]
#=> [[:name, "XXX"], [:email, "xxx@yyy.com"], [:phone, "9999999999"],
# [:dob, "00-00-00"], [:address, "zzz"]]
d.to_h
#=> <the return value shown above>
I have a text array.
text_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
text_array = text_array.sort would give us a sorted array.
However, I want a sorted array with f as the first letter for our order, and e as the last.
So the end result should be...
text_array = ["frank", "george", "harry", "isaac", "jordan", "alice", "bob", "carol", "dave", "eve"]
What would be the best way to accomplish this?
Try this:
result = (text_array.select{ |v| v =~ /^[f-z]/ }.sort + text_array.select{ |v| v =~ /^[a-e]/ }.sort).flatten
It's not the prettiest but it will get the job done.
Edit per comment. Making a more general piece of code:
before = []
after = []
text_array.sort.each do |t|
if t > term
after << t
else
before << t
end
end
return (after + before).flatten
This code assumes that term is whatever you want to divide the array. And if an array value equals term, it will be at the end.
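The same split can probably be written more compactly with Enumerable#partition. This is a sketch under the same assumption about term; the method name sort_from is hypothetical.

```ruby
def sort_from(words, term)
  # partition returns [elements where the block is true, the rest],
  # so `before` holds everything up to and including term.
  before, after = words.sort.partition { |word| word <= term }
  after + before
end

text_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
result = sort_from(text_array, "f")
p result
```

As in the loop version, a value equal to term ends up in the trailing group.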
You can do that using a hash:
alpha = ('a'..'z').to_a
#=> ["a", "b", "c",..."x", "y", "z"]
reordered = alpha.rotate(5)
#=> ["f", "g",..."z", "a",...,"e"]
h = reordered.zip(alpha).to_h
# => {"f"=>"a", "g"=>"b",..., "z"=>"u", "a"=>"v",..., "e"=>"z"}
text_array.sort_by { |w| w.gsub(/./,h) }
#=> ["frank", "george", "harry", "isaac", "jordan",
# "alice", "bob", "carol", "dave", "eve"]
A variant of this is:
a_to_z = alpha.join
#=> "abcdefghijklmnopqrstuvwxyz"
f_to_e = reordered.join
#=> "fghijklmnopqrstuvwxyzabcde"
text_array.sort_by { |w| w.tr(f_to_e, a_to_z) }
#=> ["frank", "george", "harry", "isaac", "jordan",
# "alice", "bob", "carol", "dave", "eve"]
I think the easiest would be to rotate the sorted array:
sorted = text_array.sort
sorted.rotate(sorted.find_index { |e| e.start_with?('f') } || 0)
Combining Ryan K's answer and my previous answer, this is a one-liner you can use without any regex:
text_array = text_array.sort!.select {|x| x[0] >= "f"} + text_array.select {|x| x[0] < "f"}
If I got your question right, it looks like you want to create a sorted list biased by predefined patterns.
That is, you want to define specific text patterns that can completely change the sort order of the array elements.
Here is my proposal. You can probably get better code out of this, but it's what my tired brain came up with for now:
an_array = ["bob", "alice", "dave", "carol", "frank", "eve", "jordan", "isaac", "harry", "george"]
# Define your patterns with scores so that the sorting result can vary accordingly
# It's full fledged Regex so you can put any kind of regex you want.
patterns = {
/^f/ => 100,
/^e/ => -100,
/^g/ => 60,
/^j/ => 40
}
# Sort the array with our preferred sequence
sorted_array = an_array.sort do |left, right|
# Find score for the left string
left_score = patterns.find{ |p, s| left.match(p) }
left_score = left_score ? left_score.last : 0
# Find the score for the right string
right_score = patterns.find{ |p, s| right.match(p) }
right_score = right_score ? right_score.last : 0
# Create the comparison score to produce the right order
# 1 means right sorts before left, -1 means left sorts before right,
# and 0 means the two are treated as equal
score = if right_score > left_score
1
elsif left_score > right_score
-1
else
0
end
# For debugging purpose, I added few verbose data
puts "L#{left_score}, R:#{right_score}: #{left}, #{right} => #{score}"
score
end
# Original array
puts an_array.join(', ')
# Biased array
puts sorted_array.join(', ')