merging two arrays of hashes wisely - ruby-on-rails

I am trying to combine two arrays of hashes arr1 and arr2:
arr1 = [{"id"=>1, "a"=>1, "c"=>2}, {"id"=>2, "a"=>1}]
arr2 = [{"id"=>1, "a"=>10, "b"=>20}, {"id"=>3, "b"=>2}]
And I want the result to include all elements of both arrays, but elements that have the same value for the "id" key should be merged: if a key exists in both hashes, the value from arr2 wins; otherwise the value comes from whichever hash contains the key. So the combination of the example above would be:
combined = [
{"id"=>1, "a"=>10, "b"=>20, "c"=>2}, # "id"=>1 exists in both, so they are merged
{"id"=>2, "a"=>1},
{"id"=>3, "b"=>2}
]
The code below works, but I am new to Ruby and I am sure there is a better way to do this. Can you provide a more ruby-ic way?
combined = []

# Merge items that exist in both arrays and add them to combined
arr1.each do |a1|
  temp = arr2.select { |a2| a2["id"] == a1["id"] }[0]
  if temp.present?
    combined << temp.reverse_merge(a1)
  end
end

# Add items that exist in arr1 but not in arr2
arr1.each do |a1|
  if arr2.pluck("id").exclude? a1["id"]
    combined << a1
  end
end

# Add items that exist in arr2 but not in arr1
arr2.each do |a2|
  if arr1.pluck("id").exclude? a2["id"]
    combined << a2
  end
end

I assume that no two elements (hashes) of arr1, g and h, have the property that g["id"] == h["id"].
In this case one could write:
(arr1 + arr2).each_with_object(Hash.new { |h,k| h[k] = {} }) { |g,h|
  h[g["id"]].update(g) }.values
#=> [{"id"=>1, "a"=>10, "c"=>2, "b"=>20}, {"id"=>2, "a"=>1},
#    {"id"=>3, "b"=>2}]
Note that:
(arr1 + arr2).each_with_object(Hash.new { |h,k| h[k] = {} }) { |g,h|
  h[g["id"]].update(g) }
#=> {1=>{"id"=>1, "a"=>10, "c"=>2, "b"=>20}, 2=>{"id"=>2, "a"=>1},
#    3=>{"id"=>3, "b"=>2}}
If a hash is defined:
h = Hash.new { |h,k| h[k] = {} }
then, possibly after keys have been added to h, if h does not have a key k, h[k] = {} is executed and the empty hash is returned. See the form of Hash::new that takes a block. See also Hash#update (aka Hash#merge!).
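A minimal illustration of that default-block behaviour, using a throwaway hash:

```ruby
h = Hash.new { |hash, key| hash[key] = {} }
h["x"].update("a" => 1)            # h["x"] is auto-created as {} first
h["x"].update("a" => 10, "b" => 2) # later updates merge into the same hash
h #=> {"x"=>{"a"=>10, "b"=>2}}
```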
One may alternatively write:
(arr1 + arr2).each_with_object({}) { |g,h| (h[g["id"]] ||= {}).update(g) }.values
#=> [{"id"=>1, "a"=>10, "c"=>2, "b"=>20}, {"id"=>2, "a"=>1},
#    {"id"=>3, "b"=>2}]
Another way is to use Enumerable#group_by, where the grouping is on the value of the key "id":
(arr1 + arr2).group_by { |h| h["id"] }.values.map { |a| a.reduce(&:merge) }
#=> [{"id"=>1, "a"=>10, "c"=>2, "b"=>20}, {"id"=>2, "a"=>1}, {"id"=>3, "b"=>2}]

Related

Dynamically create hash from array of arrays

I want to dynamically create a Hash without overwriting keys from an array of arrays. Each array holds strings whose colon-separated parts describe the nested keys that should be created. However, I am running into the issue that I am overwriting keys, so only the last key is there.
data = {}
values = [
["income:concessions", 0, "noi", "722300", "purpose", "refinancing"],
["fees:fee-one", "0" ,"income:gross-income", "900000", "expenses:admin", "7500"],
["fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
]
What it should look like:
{
  "income" => {
    "concessions" => 0,
    "gross-income" => "900000"
  },
  "expenses" => {
    "admin" => "7500",
    "other" => "0"
  },
  "noi" => "722300",
  "purpose" => "refinancing",
  "fees" => {
    "fee-one" => 0,
    "fee-two" => 0
  },
  "address" => {
    "zip" => "10019"
  }
}
This is the code that I currently have; how can I avoid overwriting keys when I merge?
values.each do |row|
  Hash[*row].each do |key, value|
    keys = key.split(':')
    if !data.dig(*keys)
      hh = keys.reverse.inject(value) { |a, n| { n => a } }
      a = data.merge!(hh)
    end
  end
end
The code you've provided can be modified to merge hashes on conflict instead of overwriting:
values.each do |row|
  Hash[*row].each do |key, value|
    keys = key.split(':')
    if !data.dig(*keys)
      hh = keys.reverse.inject(value) { |a, n| { n => a } }
      data.merge!(hh) { |_, old, new| old.merge(new) }
    end
  end
end
But this code only works for two levels of nesting.
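To see the limitation, here is a small sketch with three levels of nesting (hypothetical data, not from the question). The merge block only fires at the top level; the level below is merged with a plain merge, which overwrites:

```ruby
a = { "x" => { "y" => { "p" => 1 } } }
b = { "x" => { "y" => { "q" => 2 } } }
merged = a.merge(b) { |_, old, new| old.merge(new) }
# the inner "y" hashes are combined with a blockless merge,
# so "p" => 1 is silently lost
merged #=> {"x"=>{"y"=>{"q"=>2}}}
```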
By the way, I noticed the ruby-on-rails tag on the question. ActiveSupport provides a deep_merge! method that can fix the problem:
values.each do |row|
  Hash[*row].each do |key, value|
    keys = key.split(':')
    if !data.dig(*keys)
      hh = keys.reverse.inject(value) { |a, n| { n => a } }
      data.deep_merge!(hh)
    end
  end
end
values.flatten.each_slice(2).with_object({}) do |(f,v),h|
  k,e = f.is_a?(String) ? f.split(':') : [f,nil]
  h[k] = e.nil? ? v : (h[k] || {}).merge(e=>v)
end
#=> {"income"=>{"concessions"=>0, "gross-income"=>"900000"},
# "noi"=>"722300",
# "purpose"=>"refinancing",
# "fees"=>{"fee-one"=>"0", "fee-two"=>"0"},
# "expenses"=>{"admin"=>"7500", "other"=>"0"},
# "address"=>{"zip"=>"10019"}}
The steps are as follows.
values = [
["income:concessions", 0, "noi", "722300", "purpose", "refinancing"],
["fees:fee-one", "0" ,"income:gross-income", "900000", "expenses:admin", "7500"],
["fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
]
a = values.flatten
#=> ["income:concessions", 0, "noi", "722300", "purpose", "refinancing",
# "fees:fee-one", "0", "income:gross-income", "900000", "expenses:admin", "7500",
# "fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
enum1 = a.each_slice(2)
#=> #<Enumerator: ["income:concessions", 0, "noi", "722300",
# "purpose", "refinancing", "fees:fee-one", "0", "income:gross-income", "900000",
# "expenses:admin", "7500", "fees:fee-two", "0", "address:zip", "10019",
# "expenses:other","0"]:each_slice(2)>
We can see what values this enumerator will generate by converting it to an array.
enum1.to_a
#=> [["income:concessions", 0], ["noi", "722300"], ["purpose", "refinancing"],
# ["fees:fee-one", "0"], ["income:gross-income", "900000"],
# ["expenses:admin", "7500"], ["fees:fee-two", "0"],
# ["address:zip", "10019"], ["expenses:other", "0"]]
Continuing,
enum2 = enum1.with_object({})
#=> #<Enumerator: #<Enumerator:
# ["income:concessions", 0, "noi", "722300", "purpose", "refinancing",
# "fees:fee-one", "0", "income:gross-income", "900000", "expenses:admin", "7500",
# "fees:fee-two", "0", "address:zip", "10019", "expenses:other", "0"]
# :each_slice(2)>:with_object({})>
enum2.to_a
#=> [[["income:concessions", 0], {}], [["noi", "722300"], {}],
# [["purpose", "refinancing"], {}], [["fees:fee-one", "0"], {}],
# [["income:gross-income", "900000"], {}], [["expenses:admin", "7500"], {}],
# [["fees:fee-two", "0"], {}], [["address:zip", "10019"], {}],
# [["expenses:other", "0"], {}]]
enum2 can be thought of as a compound enumerator (though Ruby has no such concept). The hash being generated is initially empty, as shown, but will be filled in as additional elements are generated by enum2.
The first value is generated by enum2 and passed to the block, and the block variables are assigned values by a process called array decomposition.
(f,v),h = enum2.next
#=> [["income:concessions", 0], {}]
f #=> "income:concessions"
v #=> 0
h #=> {}
We now perform the block calculation.
f.is_a?(String)
#=> true
k,e = f.is_a?(String) ? f.split(':') : [f,nil]
#=> ["income", "concessions"]
e.nil?
#=> false
h[k] = e.nil? ? v : (h[k] || {}).merge(e=>v)
#=> {"concessions"=>0}
h[k] equals nil if h does not have a key k. In that case, (h[k] || {}) #=> {}. If h does have a key k (and h[k] is not nil), (h[k] || {}) #=> h[k].
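Spelled out with a throwaway hash:

```ruby
h = {}
# key absent: the || falls back to an empty hash
(h["fees"] || {}).merge("fee-one" => "0")
#=> {"fee-one"=>"0"}

h["fees"] = { "fee-one" => "0" }
# key present: the existing sub-hash is merged into
(h["fees"] || {}).merge("fee-two" => "0")
#=> {"fee-one"=>"0", "fee-two"=>"0"}
```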
A second value is now generated by enum2 and passed to the block.
(f,v),h = enum2.next
#=> [["noi", "722300"], {"income"=>{"concessions"=>0}}]
f #=> "noi"
v #=> "722300"
h #=> {"income"=>{"concessions"=>0}}
Notice that the hash, h, has been updated. Recall it will be returned by the block after all elements of enum2 have been generated. We now perform the block calculation.
f.is_a?(String)
#=> true
k,e = f.is_a?(String) ? f.split(':') : [f,nil]
#=> ["noi"]
e #=> nil
e.nil?
#=> true
h[k] = e.nil? ? v : (h[k] || {}).merge(e=>v)
#=> "722300"
h #=> {"income"=>{"concessions"=>0}, "noi"=>"722300"}
The remaining calculations are similar.
merge overwrites a duplicate key by default.
{ "income" => { "concessions" => 0 } }.merge({ "income" => { "gross-income" => "900000" } }) completely overwrites the original value of "income". What you want is a recursive merge, where instead of just merging the top-level hash you merge the nested values when there's duplication.
merge takes a block where you can specify what to do in the event of duplication. From the documentation:
merge!(other_hash){|key, oldval, newval| block} → hsh
Adds the contents of other_hash to hsh. If no block is specified, entries with duplicate keys are overwritten with the values from other_hash; otherwise the value of each duplicate key is determined by calling the block with the key, its value in hsh, and its value in other_hash.
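For instance, the block lets you resolve a key collision however you like, here by summing:

```ruby
a = { "x" => 1, "y" => 2 }
b = { "y" => 20, "z" => 3 }
# the block runs only for "y", the one duplicate key
a.merge!(b) { |key, old, new| old + new }
a #=> {"x"=>1, "y"=>22, "z"=>3}
```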
Using this you can define a simple recursive_merge! in one line:
def recursive_merge!(hash, other)
  hash.merge!(other) { |_key, old_val, new_val| recursive_merge!(old_val, new_val) }
end
values.each do |row|
  Hash[*row].each do |key, value|
    keys = key.split(':')
    if !data.dig(*keys)
      hh = keys.reverse.inject(value) { |a, n| { n => a } }
      a = recursive_merge!(data, hh)
    end
  end
end
A few more lines will give you a more robust solution that overwrites duplicate keys that are not hashes, and can even take a block, just like merge:
def recursive_merge!(hash, other, &block)
  hash.merge!(other) do |_key, old_val, new_val|
    if [old_val, new_val].all? { |v| v.is_a?(Hash) }
      recursive_merge!(old_val, new_val, &block)
    elsif block_given?
      block.call(_key, old_val, new_val)
    else
      new_val
    end
  end
end
h1 = { a: true, b: { c: [1, 2, 3] } }
h2 = { a: false, b: { x: [3, 4, 5] } }
recursive_merge!(h1, h2) { |_k, o, _n| o } # => { a: true, b: { c: [1, 2, 3], x: [3, 4, 5] } }
Note: This method reproduces the results you would get from ActiveSupport's Hash#deep_merge if you're using Rails.
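As a quick sanity check of the no-block form (the method is repeated here so the snippet runs standalone); without a block it should match what deep_merge! would produce for this input:

```ruby
def recursive_merge!(hash, other, &block)
  hash.merge!(other) do |_key, old_val, new_val|
    if [old_val, new_val].all? { |v| v.is_a?(Hash) }
      recursive_merge!(old_val, new_val, &block)   # both sides hashes: recurse
    elsif block_given?
      block.call(_key, old_val, new_val)           # caller decides
    else
      new_val                                      # default: other wins
    end
  end
end

h1 = { "a" => { "b" => 1 }, "c" => 2 }
h2 = { "a" => { "d" => 3 }, "c" => 9 }
recursive_merge!(h1, h2)
#=> {"a"=>{"b"=>1, "d"=>3}, "c"=>9}
```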
This is how I would handle this:
def new_h
  Hash.new { |h,k| h[k] = new_h }
end

values.flatten.each_slice(2).each_with_object(new_h) do |(k,v),obj|
  keys = k.is_a?(String) ? k.split(':') : [k]
  if keys.count > 1
    set_key = keys.pop
    obj.merge!(keys.inject(new_h) { |memo,k1| memo[k1] = new_h })
       .dig(*keys)
       .merge!({set_key => v})
  else
    obj[k] = v
  end
end
#=> {"income"=>{"concessions"=>0, "gross-income"=>"900000"},
#    "noi"=>"722300",
#    "purpose"=>"refinancing",
#    "fees"=>{"fee-one"=>"0", "fee-two"=>"0"},
#    "expenses"=>{"admin"=>"7500", "other"=>"0"},
#    "address"=>{"zip"=>"10019"}}
Explanation:
Define a method (new_h) that sets up a new Hash whose values default to new_h at any level (Hash.new { |h,k| h[k] = new_h })
First flatten the Array (values.flatten)
then group each 2 elements together as pseudo key-value pairs (.each_slice(2))
then iterate over the pairs using an accumulator where each new element added defaults to a Hash (.each_with_object(new_h) do |(k,v),obj|)
split the pseudo-key on a colon (keys = k.is_a?(String) ? k.split(':') : [k])
if there is a split, create the parent key(s) (obj.merge!(keys.inject(new_h) { |memo,k1| memo[k1] = new_h }))
merge the last child key with the value (obj.dig(*keys).merge!({set_key => v}))
otherwise set the single key equal to the value (obj[k] = v)
This has infinite depth as long as the depth chain is not broken, say [["income:concessions:other",12],["income:concessions", 0]]; in this case the latter value will take precedence. (Note: this applies to all the answers in one way or another, e.g. in the accepted answer the former wins, but a value is still lost due to the inaccurate data structure.)

how to combine 2 arrays in rails such that there are no duplicates

Consider I have 2 arrays,
o = ["16", "16", "119"]
d = ["97", "119", "97"]
Output that is needed is like this:
{16=>[97, 119], 119=>[97]}
I tried using .zip but it didn't work. How do I do it?
You could chain group_by and with_index to group the elements in d by the corresponding element in o:
d.group_by.with_index { |_, i| o[i] }
#=> {"16"=>["97", "119"], "119"=>["97"]}
To get integers, you have to add some to_i calls:
d.map(&:to_i).group_by.with_index { |_, i| o[i].to_i }
#=> {16=>[97, 119], 119=>[97]}
First thing that comes to mind is this:
result = Hash.new { |h, k| h[k] = [] }
o.zip(d) { |a, b| result[a] << b }
result #=> {"16"=>["97", "119"], "119"=>["97"]}
There probably is a better way though, but this should get you thinking.
o.map(&:to_i).zip(d.map(&:to_i)).group_by(&:first).each_value{|a| a.map!(&:last)}
# => {16=>[97, 119], 119=>[97]}
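Broken into steps, that chain does the following:

```ruby
o = ["16", "16", "119"]
d = ["97", "119", "97"]

pairs = o.map(&:to_i).zip(d.map(&:to_i))
#=> [[16, 97], [16, 119], [119, 97]]

grouped = pairs.group_by(&:first)
#=> {16=>[[16, 97], [16, 119]], 119=>[[119, 97]]}

# strip the (now redundant) first element of each pair in place
grouped.each_value { |a| a.map!(&:last) }
#=> {16=>[97, 119], 119=>[97]}
```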

Get first element of nested array for hash key

Supposing I have an array that looks like:
[
[["str1"],["val1"],["val2"]],
[["str2"], ["val1"], ["val2"], ["val3"]]
]
Is there a way for me to get a Hash that looks like:
{
"str1" => [["val1"],["val2"]],
"str2" => [["val1"],["val2"],["val3"]]
}
a.map { |a| [a.first.first, a.drop(1)] }.to_h
# or
a.each_with_object({}) {|a, h| h[a.first.first] = a.drop(1) }
#=> {
# "str1"=>[["val1"], ["val2"]],
# "str2"=>[["val1"], ["val2"], ["val3"]]
# }
If you do not want to have each element in a separate array:
Hash[a.map(&:flatten).map { |a| [a.first, a.drop(1)] }]
#=> {"str1"=>["val1", "val2"], "str2"=>["val1", "val2", "val3"]}
If your array is arr, you could write the following.
Marshal.load(Marshal.dump(arr)).map { |a| [a.shift.first, a.map(&:first)] }.to_h
#=> {"str1"=>["val1", "val2"],
# "str2"=>["val1", "val2", "val3"]}
A slight variation would be:
Marshal.load(Marshal.dump(arr)).map { |a| [a.shift.first, a] }.to_h
#=> {"str1"=>["val1", "val2"],
# "str2"=>["val1", "val2", "val3"]}
Note: We're using the Marshal class to make an exact copy of arr to avoid modifying the original array. Object#dup or Object#clone won't work here.
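The difference is easy to demonstrate: dup copies only the outer array, so shift on a copied row mutates the row the original still references, while the Marshal round-trip produces a fully independent copy:

```ruby
arr = [[["str1"], ["val1"], ["val2"]]]

shallow = arr.dup
shallow[0].shift           # the inner row is shared with arr
arr #=> [[["val1"], ["val2"]]]  -- original mutated!

arr = [[["str1"], ["val1"], ["val2"]]]
deep = Marshal.load(Marshal.dump(arr))
deep[0].shift              # only the deep copy is touched
arr #=> [[["str1"], ["val1"], ["val2"]]]
```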
arr = [
[["str1"],["val1"],["val2"]],
[["str2"], ["val1"], ["val2"], ["val3"]]
]
arr.each_with_object({}) { |((k), *values), h| h[k]=values }
#=> {"str1"=>[["val1"], ["val2"]], "str2"=>[["val1"], ["val2"], ["val3"]]}
This illustrates how Ruby's use of parallel assignment for determining the values of block variables can be used to advantage.
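A stripped-down sketch of that decomposition, run on a single row:

```ruby
result = {}
[[["str1"], ["val1"], ["val2"]]].each do |(k), *values|
  # (k) unwraps the one-element array ["str1"] to "str1";
  # *values collects the remaining sub-arrays
  result[k] = values
end
result #=> {"str1"=>[["val1"], ["val2"]]}
```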

Improve performance to find an array of ids from array of hashes in Ruby

Consider a array of hashes
a=[{'id'=>'1','imageUrl'=>'abc'},{'id'=>'2','imageUrl'=>'efg'},{'id'=>'3','imageUrl'=>'hij'}]
Consider an array of characters/numbers/ids
b=['1','2','5']
I would like to match ids of b with a. With all matches, I would like to replace the value of b with the corresponding hash.
In the above example, the values '1' and '2' are common between a and b, so I replace '1' and '2' in b with the corresponding hash values of a.
So the resultant b becomes
b=[[{"id"=>"1", "imageUrl"=>"abc"}], [{"id"=>"2", "imageUrl"=>"efg"}], []]
I wrote the following code:
b.each_with_index do |r, index|
  puts index
  k = a.select { |z| z["id"] == r }
  b[index] = k
end
Is there a better solution, a sleeker one? I am new to Ruby.
You can use the destructive version of Enumerable#map, with Enumerable#select
b.map! {|id| a.select {|h| h['id'] == id }}
# => [[{"id"=>"1", "imageUrl"=>"abc"}], [{"id"=>"2", "imageUrl"=>"efg"}], []]
This will improve speed:
#!/usr/bin/env ruby
require 'pp'
require 'benchmark'

a = []
5000.times { |c| a << {"id" => "#{c}", "imageUrl" => "test#{c}"} }
b1 = (1..2500).to_a.shuffle.map(&:to_s)
b2 = b1.dup

puts "method1"
puts Benchmark.measure { b1.map! { |id| a.select { |h| h['id'] == id } } }

puts "method2"
result = Benchmark.measure do
  ah = Hash.new([])
  a.each { |x| ah[x["id"]] = x }
  b2.map! { |be| ah[be] }
end
puts result
Results:
method1
2.820000 0.010000 2.830000 ( 2.827695)
method2
0.000000 0.000000 0.000000 ( 0.002607)
Updated benchmark - it uses 250000 elements in b instead of 2500 (method 1 commented out to protect the innocent - it's too slow and I got bored waiting for it):
#!/usr/bin/env ruby
require 'pp'
require 'benchmark'

a = []
5000.times { |c| a << {"id" => "#{c}", "imageUrl" => "test#{c}"} }
b1 = (1..250000).to_a.collect { |x| x % 2500 }.shuffle.map(&:to_s)
b2 = b1.dup
b3 = b1.dup

# puts "method1"
# puts Benchmark.measure { b1.map! { |id| a.select { |h| h['id'] == id } } }

puts "method2"
result = Benchmark.measure do
  ah = Hash.new([])
  a.each { |x| ah[x["id"]] = x }
  b2.map! { |be| ah[be] }
end
puts result

puts "method3"
result = Benchmark.measure do
  h = a.each_with_object({}) { |g,h| h.update(g['id']=>g) }
  b3.map! { |s| h.key?(s) ? [h[s]] : [] }
end
puts result
And the results are:
method2
0.050000 0.000000 0.050000 ( 0.045294)
method3
0.100000 0.010000 0.110000 ( 0.109646)
[Edit: after posting I noticed #Mircea had already posted the same solution. I'll leave mine for the mention of the values_at alternative.]
I assume the values of "id" in a are unique.
First construct a look-up hash:
h = a.each_with_object({}) { |g,h| h.update(g['id']=>g) }
#=> {"1"=>{"id"=>"1", "imageUrl"=>"abc"},
# "2"=>{"id"=>"2", "imageUrl"=>"efg"},
# "3"=>{"id"=>"3", "imageUrl"=>"hij"}}
Then simply loop through b, constructing the desired array:
b.map { |s| h.key?(s) ? [h[s]] : [] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
# [{"id"=>"2", "imageUrl"=>"efg"}],
# []]
Alternatively,
arr = h.values_at(*b)
#=> [{"id"=>"1", "imageUrl"=>"abc"},
# {"id"=>"2", "imageUrl"=>"efg"},
# nil]
Then:
arr.map { |e| e.nil? ? [] : [e] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
# [{"id"=>"2", "imageUrl"=>"efg"}],
# []]
You might instead consider using arr for subsequent calculations, since all the arrays in your desired solution contain at most one element.
The use of a lookup hash is especially efficient when b is large relative to a.

Creating a new hash with default keys

I want to create a hash with an index that comes from an array.
ary = ["a", "b", "c"]
h = Hash.new(ary.each{|a| h[a] = 0})
My goal is to start with a hash like this:
h = {"a"=>0, "b"=>0, "c"=>0}
so that later when the hash has changed I can reset it with h.default
Unfortunately the way I'm setting up the hash is not working... any ideas?
You should instantiate your hash h first, and then fill it with the contents of the array:
h = {}
ary = ["a", "b", "c"]
ary.each{|a| h[a] = 0}
Use the default value feature for the hash
h = Hash.new(0)
h["a"] # => 0
In this approach, the key is not set.
h.key?("a") # => false
Other approach is to set the missing key when accessed.
h = Hash.new {|h, k| h[k] = 0}
h["a"] # => 0
h.key?("a") # => true
Even in this approach, operations like key? will fail if you haven't accessed the key before.
h.key?("b") # => false
h["b"] # => 0
h.key?("b") # => true
You can always resort to brute force, which has the fewest boundary conditions.
h = Hash.new.tap {|h| ["a", "b", "c"].each{|k| h[k] = 0}}
h.key?("b") # => true
h["b"] # => 0
You can do it like this where you expand a list into zero-initialized values:
list = %w[ a b c ]
hash = Hash[list.collect { |i| [ i, 0 ] }]
You can also make a Hash that simply has a default value of 0 for any given key:
hash = Hash.new { |h, k| h[k] = 0 }
Any new key referenced will be pre-initialized to the default value and this will avoid having to initialize the whole hash.
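For instance, this pattern is handy for tallying, since every counter starts at 0 the moment it is first touched:

```ruby
counts = Hash.new { |h, k| h[k] = 0 }
%w[a b a c a].each { |w| counts[w] += 1 }  # no pre-initialization needed
counts #=> {"a"=>3, "b"=>1, "c"=>1}
```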
This may not be the most efficient way, but I always appreciate one-liners that reveal a little more about Ruby's versatility:
h = Hash[['a', 'b', 'c'].collect { |v| [v, 0] }]
Or another one-liner that does the same thing:
h = ['a', 'b', 'c'].inject({}) {|h, v| h[v] = 0; h }
By the way, from a performance standpoint, the one-liners run about 80% of the speed of:
h = {}
ary = ['a','b','c']
ary.each { |a| h[a]=0 }
Rails 6 added index_with to the Enumerable module. This helps in creating a hash from an enumerable with default or fetched values.
ary = %w[a b c]
hash = ary.index_with(0) # => {"a"=>0, "b"=>0, "c"=>0}
Another option is to use the Enumerable#inject method, which I'm a fan of for its cleanliness. I haven't benchmarked it against the other options though.
h = ary.inject({}) {|hash, key| hash[key] = 0; hash}
Alternate way of having a hash with the keys actually added
Hash[[:a, :b, :c].zip([])] # => {:a=>nil, :b=>nil, :c=>nil}
Hash[[:a, :b, :c].zip(Array.new(3, 0))] # => {:a=>0, :b=>0, :c=>0}