Get unique properties from array of hashes in ruby - ruby-on-rails

Given an array of hashes, I want to create a method that returns a hash whose keys are the keys of those hashes and whose values are arrays of the unique values found for each key across the array.
For example, I'd like to take
[
{foo: 'bar', baz: 'bang'},
{foo: 'rab', baz: 'bang'},
{foo: 'bizz', baz: 'buzz'}
]
and return
{
foo: ['bar', 'rab', 'bizz'],
baz: ['bang', 'buzz']
}
I am currently accomplishing this using:
def my_fantastic_method(data)
  response_data = { foo: [], baz: []}
  data.each { |data|
    data.attributes.each { |key, value|
      response_data[key.to_sym] << value
    }
  }
  response_data.each { |key, value| response_data[key] = response_data[key].uniq }
  response_data
end
Is there a more elegant way of doing this? Thanks!

Your current approach is already pretty good; I don't see much room for improvement. I would write it like this:
require 'set' # needed on Ruby < 3.2; from 3.2 on, Set is available without a require

def my_fantastic_method(data_list)
  data_list.each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |data, result|
    data.attributes.each do |key, value|
      result[key.to_sym] << value
    end
  end
end
By giving the hash a default block, I have eliminated the need to explicitly declare foo: [], baz: [].
By using each_with_object, I have eliminated the need to declare a local variable and explicitly return it at the end.
By using Set, there is no need to call uniq on the final result. This requires less code, and is more performant. However, if you really want the final result to be a mapping to Arrays rather than Sets, then you would need to call to_a on each value at the end of the method.
I have used different variable names for data_list and data. Call these whatever you like, but it's typically considered bad practice to shadow outer variables.
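The answer's code calls `.attributes`, which suggests ActiveRecord objects rather than the plain hashes shown in the question. A minimal sketch, assuming plain Hash elements, drops that call and converts the Sets back to Arrays at the end with `transform_values` (Ruby 2.4+). The method name `unique_values_by_key` is hypothetical.

```ruby
require 'set'

# Sketch assuming plain Hash elements (no ActiveRecord #attributes call).
def unique_values_by_key(hashes)
  sets = hashes.each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |hash, result|
    hash.each { |key, value| result[key.to_sym] << value }
  end
  # Turn each Set back into an Array; Sets keep insertion order.
  sets.transform_values(&:to_a)
end

data = [
  { foo: 'bar',  baz: 'bang' },
  { foo: 'rab',  baz: 'bang' },
  { foo: 'bizz', baz: 'buzz' }
]

unique_values_by_key(data)
#=> { foo: ["bar", "rab", "bizz"], baz: ["bang", "buzz"] }
```

`transform_values` returns a new hash without the default block, so looking up a missing key afterwards returns nil instead of silently creating an empty Set.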

Here are a couple of one-liners. (I'm pretty sure @eiko was being facetious, but I'm proving him correct.)
This one reads well and is easy to follow (caveat: requires Ruby 2.4+ for transform_values):
array.flat_map(&:entries).group_by(&:first).transform_values{|v| v.map(&:last).uniq}
Here's another, using the block form of merge to specify an alternate merge method, which in this case is combining the values into a uniq array:
array.reduce{|h, el| h.merge(el){|k, old, new| ([old]+[new]).flatten.uniq}}
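One caveat with the `merge` one-liner: the block only runs for keys present in both hashes being merged, so a key whose value never collides stays a bare value instead of becoming an Array. A sketch with hypothetical input showing this, plus one way to normalize the result with `Kernel#Array`:

```ruby
# Hypothetical input in which some keys never collide during the merge.
array = [
  { foo: 'bar', baz: 'bang' },
  { foo: 'rab', qux: 'once' }
]

merged = array.reduce { |h, el| h.merge(el) { |_key, old, new| ([old] + [new]).flatten.uniq } }
# :baz and :qux never reach the merge block, so they are still bare Strings.
#=> { foo: ["bar", "rab"], baz: "bang", qux: "once" }

# Wrapping every value with Array() gives a uniform Hash of Arrays.
normalized = merged.transform_values { |v| Array(v) }
#=> { foo: ["bar", "rab"], baz: ["bang"], qux: ["once"] }
```

Note that `flatten` would also flatten values that are themselves Arrays, so this one-liner fits best when the original values are scalars.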

You already have a pretty good answer, but I felt golfy and so here is a shorter one:
def the_combiner(a)
  hash = {}
  a.map(&:to_a).flatten(1).each do |k, v|
    hash[k] ||= []
    hash[k].push(v) unless hash[k].include?(v) # skip duplicates, as the question asks
  end
  hash
end

Try this:
array.flat_map(&:entries)
  .group_by(&:first)
  .map { |k, v| [k, v.map(&:last).uniq] }
  .to_h
OR
a.inject({}) { |old_h, new_h|
  new_h.each_pair { |k, v| old_h[k] = (old_h[k] || []) | [v] } # Array#| keeps the values unique
  old_h
}

If, as in the example, all hashes have the same keys, you could do as follows.
arr = [{ foo: 'bar', baz: 'bang' },
{ foo: 'rab', baz: 'bang' },
{ foo: 'bizz', baz: 'buzz' }]
keys = arr.first.keys
keys.zip(arr.map { |h| h.values_at(*keys) }.transpose.map(&:uniq)).to_h
#=> {:foo=>["bar", "rab", "bizz"], :baz=>["bang", "buzz"]}
The steps are as follows.
keys = arr.first.keys
#=> [:foo, :baz]
a = arr.map { |h| h.values_at(*keys) }
#=> [["bar", "bang"], ["rab", "bang"], ["bizz", "buzz"]]
b = a.transpose
#=> [["bar", "rab", "bizz"], ["bang", "bang", "buzz"]]
c = b.map(&:uniq)
#=> [["bar", "rab", "bizz"], ["bang", "buzz"]]
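d = keys.zip(c)
#=> [[:foo, ["bar", "rab", "bizz"]], [:baz, ["bang", "buzz"]]]
d.to_h
#=> {:foo=>["bar", "rab", "bizz"], :baz=>["bang", "buzz"]}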
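If the hashes might not all share the same keys, one variant of the same transpose idea (a sketch, not part of the original answer) takes the union of all keys and drops the nils that `values_at` returns for missing keys:

```ruby
# Hypothetical input where the hashes do not all share the same keys.
arr = [{ foo: 'bar', baz: 'bang' }, { foo: 'rab' }, { baz: 'buzz' }]

keys = arr.flat_map(&:keys).uniq          # union of all keys, in first-seen order
#=> [:foo, :baz]
rows = arr.map { |h| h.values_at(*keys) } # nil wherever a hash lacks a key
#=> [["bar", "bang"], ["rab", nil], [nil, "buzz"]]
result = keys.zip(rows.transpose.map { |col| col.compact.uniq }).to_h
#=> { foo: ["bar", "rab"], baz: ["bang", "buzz"] }
```

The trade-off is that `compact` also discards genuine nil values, so this only fits data where nil never appears as a real value.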

Related

Flatten array of nested hashes

I have an array of nested hashes that looks something like this:
[{"month"=>1,
"percentiles"=>{"25"=>768.06, "50"=>1868.5, "75"=>3043.79, "90"=>4161.6},
"total_revenue"=>1308620.0,
"year"=>2017},
{"month"=>2,
"percentiles"=>{"25"=>922.63, "50"=>2074.31, "75"=>3048.87, "90"=>4018.6},
"total_revenue"=>1105860.0,
"year"=>2017}]
That I would like to flatten into this:
[{"month"=>1,
"25"=>768.06, "50"=>1868.5, "75"=>3043.79, "90"=>4161.6,
"total_revenue"=>1308620.0,
"year"=>2017},
{"month"=>2,
"25"=>922.63, "50"=>2074.31, "75"=>3048.87, "90"=>4018.6,
"total_revenue"=>1105860.0,
"year"=>2017}]
I have been looking at and testing different methods with no luck. Any ideas on how to accomplish this? The end goal is to mass update/insert these into a database, so if there is a better way to accomplish that, I would like to see a different approach.
If you don't mind modifying the array in-place then you could say:
array.each { |h| h.merge!(h.delete('percentiles')) }
If you're not sure that all the hashes have 'percentiles' keys then you could say:
# Explicitly check
array.each { |h| h.merge!(h.delete('percentiles')) if(h.has_key?('percentiles')) }
# Convert possible `nil`s to `{ }`
array.each { |h| h.merge!(h.delete('percentiles').to_h) }
# Filter before merging
array.select { |h| h.has_key?('percentiles') }.each { |h| h.merge!(h.delete('percentiles')) }
If you want to flatten all hash values then you can do things like this:
array.each do |h|
  h.keys.each do |k|
    if(h[k].is_a?(Hash))
      h.merge!(h.delete(k))
    end
  end
end
If you don't want to modify the hashes inside the array, then variations on these will work:
flat = array.map(&:dup).each { |h| h.merge!(h.delete('percentiles')) }
flat = array.map do |e|
  e.each_with_object({}) do |(k, v), h|
    if(v.is_a?(Hash))
      h.merge!(v)
    else
      h[k] = v
    end
  end
end
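Another non-mutating variation, sketched here as an assumption rather than taken from the answer, builds each new hash with `Hash#reject` and merges in the nested hash, falling back to `{}` when a row has no "percentiles" key:

```ruby
array = [
  { "month" => 1,
    "percentiles" => { "25" => 768.06, "50" => 1868.5, "75" => 3043.79, "90" => 4161.6 },
    "total_revenue" => 1308620.0, "year" => 2017 },
  # Hypothetical row without "percentiles", to show the fallback.
  { "month" => 2, "total_revenue" => 1105860.0, "year" => 2017 }
]

flat = array.map do |h|
  # Copy every pair except "percentiles", then merge its contents ({} if missing).
  h.reject { |k, _| k == "percentiles" }.merge(h.fetch("percentiles", {}))
end
```

The percentile keys end up after "year" rather than in the middle; key order rarely matters for a bulk insert, but it is worth knowing if you compare against the desired output by eye.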

Build a new array of hashes from multiple arrays of hashes

I have the following three arrays of hashes.
customer_mapping = [
{:customer_id=>"a", :customer_order_id=>"g1"},
{:customer_id=>"b", :customer_order_id=>"g2"},
{:customer_id=>"c", :customer_order_id=>"g3"},
{:customer_id=>"d", :customer_order_id=>"g4"},
{:customer_id=>"e", :customer_order_id=>"g5"}
]
customer_with_products = [
{:customer_order_id=>"g1", :product_order_id=>"a1"},
{:customer_order_id=>"g2", :product_order_id=>"a2"},
{:customer_order_id=>"g3", :product_order_id=>"a3"},
{:customer_order_id=>"g4", :product_order_id=>"a4"},
{:customer_order_id=>"g5", :product_order_id=>"a5"}
]
product_mapping = [
{:product_id=>"j", :product_order_id=>"a1"},
{:product_id=>"k", :product_order_id=>"a2"},
{:product_id=>"l", :product_order_id=>"a3"}
]
What I want is a new array of hashes with only customer_id and product_id:
[
{:product_id=>"j", :customer_id=>"a"},
{:product_id=>"k", :customer_id=>"b"},
{:product_id=>"l", :customer_id=>"c"}
]
I tried to loop over product_mapping and select the customer_order_id whose product_order_id matches in customer_with_products, and then thought of looping over customer_mapping, but I wasn't able to get the desired output from the first step.
How can I achieve this?
Using group_by and merge:
def merge_by(a,b, key)
(a+b).group_by { |h| h[key] }
.each_value.map { |arr| arr.inject(:merge) }
end
merge_by(
merge_by(customer_mapping, customer_with_products, :customer_order_id),
product_mapping,
:product_order_id
).select { |h| h[:product_id] }.map { |h| h.slice(:product_id, :customer_id) }
#=>[{:product_id=>"j", :customer_id=>"a"},
# {:product_id=>"k", :customer_id=>"b"},
# {:product_id=>"l", :customer_id=>"c"}]
This is definitely not the cleanest solution. If your initial arrays come from SQL queries, I think those queries could be modified to aggregate your data properly.
For reference, the first merge gives:
merge_by(customer_mapping, customer_with_products, :customer_order_id)
# => [{:customer_id=>"a", :customer_order_id=>"g1", :product_order_id=>"a1"},
# {:customer_id=>"b", :customer_order_id=>"g2", :product_order_id=>"a2"},
# {:customer_id=>"c", :customer_order_id=>"g3", :product_order_id=>"a3"},
# {:customer_id=>"d", :customer_order_id=>"g4", :product_order_id=>"a4"},
# {:customer_id=>"e", :customer_order_id=>"g5", :product_order_id=>"a5"}]
Then merge that result with your last array in the same way, keep only the elements for which a :product_id was found, and slice out the wanted keys.
Alternatively, here is a much more readable solution; depending on your array sizes it might be slower, since it scans the arrays again for every element:
product_mapping.map do |hc|
b_match = customer_with_products.detect { |hb| hb[:product_order_id] == hc[:product_order_id] }
a_match = customer_mapping.detect { |ha| ha[:customer_order_id] == b_match[:customer_order_id] }
[hc, a_match, b_match].inject(:merge)
end.map { |h| h.slice(:product_id, :customer_id) }
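In the question's data every product_order_id has a matching customer order, but if one did not, `b_match` would be nil and `b_match[:customer_order_id]` would raise NoMethodError. A nil-tolerant sketch of the same detect-based lookup (the extra product "z" is hypothetical):

```ruby
customer_mapping = [
  { customer_id: "a", customer_order_id: "g1" },
  { customer_id: "b", customer_order_id: "g2" }
]
customer_with_products = [
  { customer_order_id: "g1", product_order_id: "a1" },
  { customer_order_id: "g2", product_order_id: "a2" }
]
product_mapping = [
  { product_id: "j", product_order_id: "a1" },
  { product_id: "k", product_order_id: "a2" },
  { product_id: "z", product_order_id: "a9" } # hypothetical: no matching customer order
]

result = product_mapping.map do |hc|
  b_match = customer_with_products.detect { |hb| hb[:product_order_id] == hc[:product_order_id] }
  # Only look up the customer when a customer order was found.
  a_match = b_match && customer_mapping.detect { |ha| ha[:customer_order_id] == b_match[:customer_order_id] }
  { product_id: hc[:product_id], customer_id: a_match && a_match[:customer_id] }
end
#=> [{ product_id: "j", customer_id: "a" },
#    { product_id: "k", customer_id: "b" },
#    { product_id: "z", customer_id: nil }]
```

Whether unmatched products should appear with a nil customer_id or be filtered out with `select` is a choice for your use case.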
Following your approach to the problem, the solution would be:
result_hash_array = product_mapping.map do |product_mapping_entry|
customer_receipt = customer_with_products.find do |customer_with_products_entry|
product_mapping_entry[:product_order_id] == customer_with_products_entry[:product_order_id]
end
customer_id = customer_mapping.find do |customer_mapping_entry|
customer_receipt[:customer_order_id] == customer_mapping_entry[:customer_order_id]
end[:customer_id]
{product_id: product_mapping_entry[:product_id], customer_id: customer_id}
end
Output
result_hash_array => [{:product_id=>"j", :customer_id=>"a"},
{:product_id=>"k", :customer_id=>"b"},
{:product_id=>"l", :customer_id=>"c"}]
Another option, starting from customer_mapping, as a one-liner (albeit a wide one):
customer_mapping.map { |e| {customer_id: e[:customer_id], product_id: (product_mapping.detect { |k| k[:product_order_id] == (customer_with_products.detect{ |h| h[:customer_order_id] == e[:customer_order_id] } || {} )[:product_order_id] } || {} )[:product_id] } }
#=> [{:customer_id=>"a", :product_id=>"j"},
# {:customer_id=>"b", :product_id=>"k"},
# {:customer_id=>"c", :product_id=>"l"},
# {:customer_id=>"d", :product_id=>nil},
# {:customer_id=>"e", :product_id=>nil}]
cust_order_id_to_cust_id =
customer_mapping.each_with_object({}) do |g,h|
h[g[:customer_order_id]] = g[:customer_id]
end
#=> {"g1"=>"a", "g2"=>"b", "g3"=>"c", "g4"=>"d", "g5"=>"e"}
prod_order_id_to_cust_order_id =
customer_with_products.each_with_object({}) do |g,h|
h[g[:product_order_id]] = g[:customer_order_id]
end
#=> {"a1"=>"g1", "a2"=>"g2", "a3"=>"g3", "a4"=>"g4", "a5"=>"g5"}
product_mapping.map do |h|
{ product_id: h[:product_id], customer_id:
cust_order_id_to_cust_id[prod_order_id_to_cust_order_id[h[:product_order_id]]] }
end
#=> [{:product_id=>"j", :customer_id=>"a"},
# {:product_id=>"k", :customer_id=>"b"},
# {:product_id=>"l", :customer_id=>"c"}]
This formulation is particularly easy to test. (It's so straightforward that no debugging was needed).
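On Ruby 2.6+, the same two lookup tables can be built with `Array#to_h` and a block; this is a sketch of an equivalent formulation, using a subset of the question's data, and each table can still be checked on its own:

```ruby
customer_mapping = [
  { customer_id: "a", customer_order_id: "g1" },
  { customer_id: "b", customer_order_id: "g2" },
  { customer_id: "c", customer_order_id: "g3" }
]
customer_with_products = [
  { customer_order_id: "g1", product_order_id: "a1" },
  { customer_order_id: "g2", product_order_id: "a2" },
  { customer_order_id: "g3", product_order_id: "a3" }
]
product_mapping = [
  { product_id: "j", product_order_id: "a1" },
  { product_id: "k", product_order_id: "a2" },
  { product_id: "l", product_order_id: "a3" }
]

# Lookup tables built with Array#to_h and a block (Ruby 2.6+).
cust_order_id_to_cust_id = customer_mapping.to_h { |g| [g[:customer_order_id], g[:customer_id]] }
prod_order_id_to_cust_order_id = customer_with_products.to_h { |g| [g[:product_order_id], g[:customer_order_id]] }

result = product_mapping.map do |h|
  cust_order_id = prod_order_id_to_cust_order_id[h[:product_order_id]]
  { product_id: h[:product_id], customer_id: cust_order_id_to_cust_id[cust_order_id] }
end
```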
I would recommend a longer but more readable solution that you will still understand when you look at it a few months from now. Use descriptive names for block variables instead of hiding them behind k, v, especially for more complex lookups (maybe that's just my personal preference).
I would suggest something like:
result = product_mapping.map do |mapping|
  customer_id = customer_mapping.find do |customer|
    customer[:customer_order_id] == customer_with_products.find do |order|
      order[:product_order_id] == mapping[:product_order_id]
    end[:customer_order_id]
  end[:customer_id]
  { product_id: mapping[:product_id], customer_id: customer_id }
end

Ruby: Passing down key/value after transforming objects in array

Given data:
data = [
{"id":14, "sort":1, "content":"9", foo: "2022"},
{"id":14, "sort":4, "content":"5", foo: "2022"},
{"id":14, "sort":2, "content":"1", foo: "2022"},
{"id":14, "sort":3, "content":"0", foo: "2022"},
{"id":15, "sort":4, "content":"4", foo: "2888"},
{"id":15, "sort":2, "content":"1", foo: "2888"},
{"id":15, "sort":1, "content":"3", foo: "2888"},
{"id":15, "sort":3, "content":"3", foo: "2888"},
{"id":16, "sort":1, "content":"8", foo: "3112"},
{"id":16, "sort":3, "content":"4", foo: "3112"},
{"id":16, "sort":2, "content":"4", foo: "3112"},
{"id":16, "sort":4, "content":"9", foo: "3112"}
]
I got the contents concatenated by id, in sort order, with:
formatted = data.group_by { |d| d[:id]}.transform_values do |value_array|
value_array.sort_by { |b| b[:sort] }
.map { |c| c[:content] }.join
end
puts formatted
#=> {14=>"9105", 15=>"3134", 16=>"8449"}
I know that foo exists inside value_array, but I'm wondering how I can include foo in the formatted variable so that I can map over it to get the desired output, or whether that's possible at all.
Desired Output:
[
{"id":14, "concated_value":"9105", foo: "2022"},
{"id":15, "concated_value":"3134", foo: "2888"},
{"id":16, "concated_value":"8449", foo: "3112"}
]
Since :foo is unique per :id, you can do this as follows:
data.group_by {|h| h[:id]}.map do |_,sa|
sa.map(&:dup).sort_by {|h| h.delete(:sort) }.reduce do |m,h|
m.merge(h) {|key,old,new| key == :content ? old + new : old }
end.tap {|h| h[:concated_value] = h.delete(:content) }
end
#=> [
# {"id":14, foo: "2022", "concated_value":"9105"},
# {"id":15, foo: "2888", "concated_value":"3134"},
# {"id":16, foo: "3112", "concated_value":"8449"}
# ]
First we group by id: group_by {|h| h[:id]}
Then we dup the hashes in each group (so as not to destroy the original): map(&:dup)
Then we sort by :sort and delete that key at the same time: .sort_by {|h| h.delete(:sort) }
Then we merge each group's hashes together, concatenating only the :content values:
m.merge(h) {|key,old,new| key == :content ? old + new : old }
Finally, we rename the :content key to :concated_value: tap {|h| h[:concated_value] = h.delete(:content) }
We can use the first hash in value_array to get our :id and :foo values:
formatted = data.group_by { |d| d[:id]}.values.map do |value_array|
concated_value = value_array.sort_by { |b| b[:sort] }
.map { |c| c[:content] }.join
value_array.first.slice(:id, :foo)
.merge concated_value: concated_value
end
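If there are more per-id columns than :foo, copying them one by one with `slice` gets tedious. A sketch of the same idea that instead drops the per-row keys with `Hash#except` (core since Ruby 3.0; Active Support provides it on older Rubies):

```ruby
# The question's data with plain symbol keys ("id": 14 and id: 14 are the same Symbol key).
data = [
  { id: 14, sort: 1, content: "9", foo: "2022" },
  { id: 14, sort: 4, content: "5", foo: "2022" },
  { id: 14, sort: 2, content: "1", foo: "2022" },
  { id: 14, sort: 3, content: "0", foo: "2022" },
  { id: 15, sort: 4, content: "4", foo: "2888" },
  { id: 15, sort: 2, content: "1", foo: "2888" },
  { id: 15, sort: 1, content: "3", foo: "2888" },
  { id: 15, sort: 3, content: "3", foo: "2888" },
  { id: 16, sort: 1, content: "8", foo: "3112" },
  { id: 16, sort: 3, content: "4", foo: "3112" },
  { id: 16, sort: 2, content: "4", foo: "3112" },
  { id: 16, sort: 4, content: "9", foo: "3112" }
]

formatted = data.group_by { |d| d[:id] }.values.map do |value_array|
  concated_value = value_array.sort_by { |b| b[:sort] }.map { |c| c[:content] }.join
  # Keep every per-id column (here :id and :foo) and drop the per-row ones.
  value_array.first.except(:sort, :content).merge(concated_value: concated_value)
end
#=> [{ id: 14, foo: "2022", concated_value: "9105" },
#    { id: 15, foo: "2888", concated_value: "3134" },
#    { id: 16, foo: "3112", concated_value: "8449" }]
```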
I think this is a good use case for reduce: after grouping, you need to drop the ID from each [ID, VALUES] pair returned by group_by and return a reduced version of the VALUES part. This can all be done without ActiveSupport or other dependencies:
data
  .group_by{ |d| d[:id] }              # Get an array of [ID, [VALUES]]
  .reduce([]) do |a, v|                # Reduce it into a new empty array
    # Append a new hash to the new array
    a << {
      id: v[1].first[:id],             # Just take the ID of the first entry
      foo: v[1].first[:foo],           # Ditto for foo
      concatenated: v[1]
        .sort_by{ |s| s[:sort] }       # now sort all hashes by their sort key
        .collect{ |s| s[:content] }    # collect the content
        .join                          # and merge it into a string
    }
  end
Output:
[{:id=>14, :foo=>"2022", :concatenated=>"9105"},
{:id=>15, :foo=>"2888", :concatenated=>"3134"},
{:id=>16, :foo=>"3112", :concatenated=>"8449"}]
EDIT
I had another approach in mind when I started writing the previous solution: reduce is not really necessary, since the size of the array does not change after group_by, so map is sufficient.
But while rewriting the code, I realized that creating a new hash and copying each value over from the first hash in VALUES was more work than needed, so it's easier to just filter out the keys we don't want:
keys_to_ignore = [:sort, :content]
data
  .group_by{ |d| d[:id] }               # Get an array of [ID, [VALUES]]
  .map do |v|
    v[1]
      .first                            # Take the first hash from [VALUES]
      .merge({'concatenated': v[1]      # Insert the concatenated values
        .sort_by{ |s| s[:sort] }        # now sort all hashes by their sort key
        .collect{ |s| s[:content] }     # collect the content
        .join                           # and merge it into a string
      })
      .select { |k, _| !keys_to_ignore.include? k }
  end
Output
[{:id=>14, :foo=>"2022", :concatenated=>"9105"},
{:id=>15, :foo=>"2888", :concatenated=>"3134"},
{:id=>16, :foo=>"3112", :concatenated=>"8449"}]
This will work even without Rails (note that sort_by! sorts data in place):
formatted = []
data.sort_by! { |a| a[:sort] }.map { |z| z[:id] }.uniq.each do |id|
  formatted << { id: id,
                 concated_value: data.map { |c| c[:id] == id ? c[:content] : nil }.join,
                 foo: data.find { |c| c[:id] == id }[:foo] }
end
formatted
#=> [{:id=>14, :concated_value=>"9105", :foo=>"2022"},
#    {:id=>15, :concated_value=>"3134", :foo=>"2888"},
#    {:id=>16, :concated_value=>"8449", :foo=>"3112"}]
data.sort_by { |h| h[:sort] }.
  each_with_object({}) do |g,h|
    h.update(g[:id]=>{ id: g[:id], concatenated_value: g[:content].to_s, foo: g[:foo] }) { |_,o,n|
      o.merge(concatenated_value: o[:concatenated_value]+n[:concatenated_value]) }
  end.values
#=> [{:id=>14, :concatenated_value=>"9105", :foo=>"2022"},
# {:id=>15, :concatenated_value=>"3134", :foo=>"2888"},
# {:id=>16, :concatenated_value=>"8449", :foo=>"3112"}]
This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys (here the value of :id) that are present in both hashes being merged. See the doc for the description of the three block variables (here _, o and n).
Note the receiver of values (at the end) is the following.
{ 14=>{ :id=>14, :concatenated_value=>"9105", :foo=>"2022" },
15=>{ :id=>15, :concatenated_value=>"3134", :foo=>"2888" },
16=>{ :id=>16, :concatenated_value=>"8449", :foo=>"3112" } }
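To see the block form of `Hash#update` in isolation, here is a minimal sketch with hypothetical values: the block receives the key, the old value and the new value (the answer's `_`, `o` and `n`), and it runs only for keys present in both hashes.

```ruby
totals = { 14 => "91" }
calls = []

totals.update(14 => "05", 15 => "3") do |key, old, new|
  calls << key # record which keys reached the block
  old + new    # the block's return value becomes the merged value
end

totals #=> { 14 => "9105", 15 => "3" }
calls  #=> [14]
```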

Process nested hash to convert all values to strings

I have the following code which takes a hash and turns all the values in to strings.
def stringify_values obj
  @values ||= obj.clone
  obj.each do |k, v|
    if v.is_a?(Hash)
      @values[k] = stringify_values(v)
    else
      @values[k] = v.to_s
    end
  end
  return @values
end
So given the following hash:
{
post: {
id: 123,
text: 'foobar',
}
}
I get following YAML output
--- &1
:post: *1
:id: '123'
:text: 'foobar'
When I want this output
---
:post:
  :id: '123'
  :text: 'foobar'
It looks like the object has been flattened and then given a reference to itself, which causes "stack level too deep" errors in my specs.
How do I get the desired output?
Assuming the argument is always a Hash, a simpler implementation of stringify_values uses the Hash#deep_merge method added by the Active Support core extensions: we merge the hash with itself, so the block is called for every non-Hash value and we call to_s on it.
def stringify_values obj
  obj.deep_merge(obj) {|_,_,v| v.to_s}
end
Complete working sample:
require "yaml"
require "active_support/core_ext/hash"
def stringify_values obj
  obj.deep_merge(obj) {|_,_,v| v.to_s}
end
class Foo
  def to_s
    "I am Foo"
  end
end
h = {
  post: {
    id: 123,
    arr: [1,2,3],
    text: 'foobar',
    obj: { me: Foo.new}
  }
}
puts YAML.dump(stringify_values(h))
#=>
---
:post:
  :id: '123'
  :arr: "[1, 2, 3]"
  :text: foobar
  :obj:
    :me: I am Foo
I'm not sure what you expect when a value is an Array: Array#to_s turns the whole array into a single string. Whether that is desirable is up to you; tweak the solution if it isn't.
There are two issues. First: after the first call, @values always holds the object you cloned in that first call, so no matter what you do with obj you always get back that cloned @values object (that's because of the ||= operator). Second: if you remove ||= and write @values = obj.clone instead, it still returns an incorrect result (the deepest hash), because you overwrite the same instance variable on every recursive call.
require 'yaml'
def stringify_values(obj)
  temp = {}
  obj.each do |k, v|
    if v.is_a?(Hash)
      temp[k] = stringify_values(v)
    else
      temp[k] = v.to_s
    end
  end
  temp
end
hash = {
  post: {
    id: 123,
    text: 'foobar',
  }
}
puts stringify_values(hash).to_yaml
#=>
---
:post:
  :id: '123'
  :text: foobar
If you want a simple solution without need of ActiveSupport, you can do this in one line using each_with_object:
obj.each_with_object({}) { |(k,v),m| m[k] = v.to_s }
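If you want to modify obj in place, pass obj as the argument to each_with_object; the version above returns a new object. Note that this one-liner only handles one level: a nested Hash is converted to its to_s representation rather than being stringified recursively.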
If you need to stringify values in many places, I would go with monkeypatching the Hash class:
class Hash
  def stringify_values
    map { |k, v| [k, Hash === v ? v.stringify_values : v.to_s] }.to_h
  end
end
Now you will be able to:
require 'yaml'
{
  post: {
    id: 123,
    text: 'foobar'
  },
  arr: [1, 2, 3]
}.stringify_values.to_yaml
#⇒ ---
#  :post:
#    :id: '123'
#    :text: foobar
#  :arr: "[1, 2, 3]"
I do wonder, though, whether you really want Arrays to be turned into strings.
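If Arrays should keep their structure, a sketch that recurses into both Hashes and Arrays and stringifies only the leaves could look like this; `deep_stringify` is a hypothetical name and the code needs no Active Support (`transform_values` is core since Ruby 2.4):

```ruby
# Sketch: recurse into Hashes and Arrays, call to_s on everything else.
def deep_stringify(value)
  case value
  when Hash  then value.transform_values { |v| deep_stringify(v) }
  when Array then value.map { |v| deep_stringify(v) }
  else value.to_s
  end
end

deep_stringify({ post: { id: 123, text: 'foobar' }, arr: [1, 2, 3] })
#=> { post: { id: "123", text: "foobar" }, arr: ["1", "2", "3"] }
```

Because it builds new Hashes and Arrays at each level, the input is left unchanged and no object ends up referencing itself.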

How to refactor each function with map in Ruby?

I have a loop building a hash for use in a select field. The intention is to end up with a hash:
{ object.id => "object name", object.id => "object name" }
Using:
@hash = {}
loop_over.each do |ac|
  @hash[ac.name] = ac.id
end
I think that the map method is meant for this type of situation but just need some help understanding it and how it works. Is map the right method to refactor this each loop?
Data transformations like this are better suited to each_with_object:
@hash = loop_over.each_with_object({}) { |ac, h| h[ac.name] = ac.id }
If your brain is telling you to use map but you don't want an array as the result, then you usually want to use each_with_object. If you want to feed the block's return value back into itself, then you want inject but in cases like this, inject requires a funny looking and artificial ;h in the block:
@hash = loop_over.inject({}) { |h, ac| h[ac.name] = ac.id; h }
# -------------------- yuck -----------------------------^^^
The presence of the artificial return value is the signal that you want to use each_with_object instead.
Try:
Hash[loop_over.map { |ac| [ac[:name], ac[:id]] }]
Or, if you are running Ruby 2.1 or later:
loop_over.map { |ac| [ac[:name], ac[:id]] }.to_h
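On Ruby 2.6 and later, `Array#to_h` (and `Enumerable#to_h`) also accepts a block that returns each pair, so the separate `map` call can go. A sketch in which the `Account` Struct is a hypothetical stand-in for the records in `loop_over`:

```ruby
# Hypothetical stand-in for the records in loop_over.
Account = Struct.new(:id, :name)
loop_over = [Account.new(1, "Checking"), Account.new(2, "Savings")]

# to_h with a block (Ruby 2.6+): the block returns each [key, value] pair.
options = loop_over.to_h { |ac| [ac.name, ac.id] }
#=> { "Checking" => 1, "Savings" => 2 }
```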
@hash = Hash[loop_over.map { |ac| {ac.name => ac.id} }.map(&:flatten)]
Edit: a simpler solution, as suggested in a comment:
@hash = Hash[ loop_over.map { |ac| [ac.name, ac.id] } ]
You can simply do this by injecting a blank new Hash and performing your operation:
loop_over.inject({}){ |h, ac| h[ac.name] = ac.id; h }
Ruby FTW
No, map isn't the correct tool for this.
The general use-case of a map is to take in an array, perform an operation on each element, and spit out a (possibly) new array (not a hashmap) of the same length, with the individual element modifications.
Here's an example of a map
x = [1, 2, 3, 4].map do |i|
i+1 #transform each element by adding 1
end
p x # will print out [2, 3, 4, 5]
Your code:
@hash = {}
loop_over.each do |ac|
  @hash[ac.name] = ac.id
end
There is nothing wrong with this example. You are iterating over a list, and populating a hashmap exactly as you wished.
Ruby 2.1.0 introduced Array#to_h, a new way to build hashes:
h = { a: 1, b: 2, c: 3 }
h.map { |k, v| [k, v+1] }.to_h # => {:a=>2, :b=>3, :c=>4}
I would go for the inject version, but use update in the block to avoid the easy-to-miss (and therefore error-prone) ;h suffix:
@hash = loop_over.inject({}) { |h, ac| h.update(ac.name => ac.id) }
