Related
I am trying to sum array of array and get average at the same time. The original data is in the form of JSON. I have to parse my data to array of array in order to render the graph. The graph does not accept array of hash.
I first convert the output to JSON using the definition below.
ActiveSupport::JSON.decode(#output.first(10).to_json)
And the result of the above action is shown below.
output =
[{"name"=>"aaa", "job"=>"a", "pay"=> 2, ... },
{"name"=>"zzz", "job"=>"a", "pay"=> 4, ... },
{"name"=>"xxx", "job"=>"a", "pay"=> 6, ... },
{"name"=>"yyy", "job"=>"a", "pay"=> 8, ... },
{"name"=>"aaa", "job"=>"b", "pay"=> 2, ... },
{"name"=>"zzz", "job"=>"b", "pay"=> 4, ... },
{"name"=>"xxx", "job"=>"b", "pay"=> 6, ... },
{"name"=>"yyy", "job"=>"b", "pay"=> 10, ... },
]
Then I retrieved the job and pay by converting to array of array.
ActiveSupport::JSON.decode(output.to_json).each { |h|
a << [h['job'], h['pay']]
}
The result of the above operation is as below.
a = [["a", 2], ["a", 4], ["a", 6], ["a", 8],
["b", 2], ["b", 4], ["b", 6], ["b", 10]]
The code below will give me the sum of each element in the form of array of array.
a.inject({}) { |h,(job, data)| h[job] ||= 0; h[job] += data; h }.to_a
And the result is as below
[["a", 20], ["b", 22]]
However, I am trying to get the average of the array. The expected output is as below.
[["a", 5], ["b", 5.5]]
I can count how many elements in an array and divide the sum array by the count array. I was wondering if there is an easier and more efficient way to get the average.
output = [
{"name"=>"aaa", "job"=>"a", "pay"=> 2 },
{"name"=>"zzz", "job"=>"a", "pay"=> 4 },
{"name"=>"xxx", "job"=>"a", "pay"=> 6 },
{"name"=>"yyy", "job"=>"a", "pay"=> 8 },
{"name"=>"aaa", "job"=>"b", "pay"=> 2 },
{"name"=>"zzz", "job"=>"b", "pay"=> 4 },
{"name"=>"xxx", "job"=>"b", "pay"=> 6 },
{"name"=>"yyy", "job"=>"b", "pay"=> 10 },
]
output.group_by { |obj| obj['job'] }.map do |key, list|
[key, list.map { |obj| obj['pay'] }.reduce(:+) / list.size.to_f]
end
The group_by method will transform your list into a hash with the following structure:
{"a"=>[{"name"=>"aaa", "job"=>"a", "pay"=>2}, ...], "b"=>[{"name"=>"aaa", "job"=>"b", ...]}
After that, for each pair of that hash, we want to calculate the mean of its 'pay' values, and return a pair [key, mean]. We use a map for that, returning a pair with:
They key itself ("a" or "b").
The mean of the values. Note that the values list has the form of a list of hashes. To retrieve the values, we need to extract the last element of each pair; that's what list.map { |obj| obj['pay'] } is used for. Finally, calculate the mean by suming all elements with .reduce(:+) and dividing them by the list size as a float.
Not the most efficient solution, but it's practical.
Comparing the answer with #EricDuminil's, here's a benchmark with a list of size 8.000.000:
def Wikiti(output)
output.group_by { |obj| obj['job'] }.map do |key, list|
[key, list.map { |obj| obj['pay'] }.reduce(:+) / list.size.to_f]
end
end
def EricDuminil(output)
count_and_sum = output.each_with_object(Hash.new([0, 0])) do |hash, mem|
job = hash['job']
count, sum = mem[job]
mem[job] = count + 1, sum + hash['pay']
end
result = count_and_sum.map do |job, (count, sum)|
[job, sum / count.to_f]
end
end
require 'benchmark'
Benchmark.bm do |x|
x.report('Wikiti') { Wikiti(output) }
x.report('EricDuminil') { EricDuminil(output) }
end
user system total real
Wikiti 4.100000 0.020000 4.120000 ( 4.130373)
EricDuminil 4.250000 0.000000 4.250000 ( 4.272685)
This method should be reasonably efficient. It creates a temporary hash with job name as key and [count, sum] as value:
output = [{ 'name' => 'aaa', 'job' => 'a', 'pay' => 2 },
{ 'name' => 'zzz', 'job' => 'a', 'pay' => 4 },
{ 'name' => 'xxx', 'job' => 'a', 'pay' => 6 },
{ 'name' => 'yyy', 'job' => 'a', 'pay' => 8 },
{ 'name' => 'aaa', 'job' => 'b', 'pay' => 2 },
{ 'name' => 'zzz', 'job' => 'b', 'pay' => 4 },
{ 'name' => 'xxx', 'job' => 'b', 'pay' => 6 },
{ 'name' => 'yyy', 'job' => 'b', 'pay' => 10 }]
count_and_sum = output.each_with_object(Hash.new([0, 0])) do |hash, mem|
job = hash['job']
count, sum = mem[job]
mem[job] = count + 1, sum + hash['pay']
end
#=> {"a"=>[4, 20], "b"=>[4, 22]}
result = count_and_sum.map do |job, (count, sum)|
[job, sum / count.to_f]
end
#=> [["a", 5.0], ["b", 5.5]]
It requires 2 passes, but the created objects aren't big. In comparison, calling group_by on a huge array of hashes isn't very efficient.
How about this (Single pass iterative average calculation)
accumulator = Hash.new {|h,k| h[k] = Hash.new(0)}
a.each_with_object(accumulator) do |(k,v),obj|
obj[k][:count] += 1
obj[k][:sum] += v
obj[k][:average] = (obj[k][:sum] / obj[k][:count].to_f)
end
#=> {"a"=>{:count=>4, :sum=>20, :average=>5.0},
# "b"=>{:count=>4, :sum=>22, :average=>5.5}}
Obviously average is just recalculated on every iteration but since you asked for them at the same time this is probably as close as you are going to get.
Using your "output" instead looks like
output.each_with_object(accumulator) do |h,obj|
key = h['job']
obj[key][:count] += 1
obj[key][:sum] += h['pay']
obj[key][:average] = (obj[key][:sum] / obj[key][:count].to_f)
end
#=> {"a"=>{:count=>4, :sum=>20, :average=>5.0},
# "b"=>{:count=>4, :sum=>22, :average=>5.5}}
as Sara Tibbetts comment suggests, my first step would be to convert it like this
new_a = a.reduce({}){ |memo, item| memo[item[0]] ||= []; memo[item[0]] << item[1]; memo}
which puts it in this format
{a: [2, 4, 6, 8], b: [2, 4, 6, 20]}
you can then use slice to filter the keys you want
new_a.slice!(key1, key2, ...)
Then do another pass through to do get the final format
new_a.reduce([]) do |memo, (k,v)|
avg = v.inject{ |sum, el| sum + el }.to_f / v.size
memo << [k,avg]
memo
end
I elected to use Enumerable#each_with_object with the object being an array of two hashes, the first to compute totals, the second to count the number of numbers that are totalled. Each hash is defined Hash.new(0), zero being the default value. See Hash::new for a fuller explanation, In short, if a hash defined h = Hash.new(0) does not have a key k, h[k] returns 0. (h is not modified.) h[k] += 1 expands to h[k] = h[k] + 1. If h does not have a key k, h[k] on the right of the equality returns 0.1
output =
[{"name"=>"aaa", "job"=>"a", "pay"=> 2},
{"name"=>"zzz", "job"=>"a", "pay"=> 4},
{"name"=>"xxx", "job"=>"a", "pay"=> 6},
{"name"=>"yyy", "job"=>"a", "pay"=> 8},
{"name"=>"aaa", "job"=>"b", "pay"=> 2},
{"name"=>"zzz", "job"=>"b", "pay"=> 4},
{"name"=>"xxx", "job"=>"b", "pay"=> 6},
{"name"=>"yyy", "job"=>"b", "pay"=>10}
]
htot, hnbr = output.each_with_object([Hash.new(0), Hash.new(0)]) do |f,(g,h)|
s = f["job"]
g[s] += f["pay"]
h[s] += 1
end
htot.merge(hnbr) { |k,o,n| o.to_f/n }.to_a
#=> [["a", 5.0], ["b", 5.5]]
If .to_a at the end is dropped the the hash {"a"=>5.0, "b"=>5.5} is returned. The OP might find that more useful than the array.
I've used the form of Hash#merge that uses a block to determine the values of keys that are present in both hashes being merged.
Note that htot={"a"=>20, "b"=>22} and hnbr=>{"a"=>4, "b"=>4}.
1 If the reader is wondering why h[k] on the left of = doesn't return zero as well, it's a different method: Hash#[]= versus Hash#[]
This question already has answers here:
Why are exclamation marks used in Ruby methods?
(12 answers)
Closed 7 years ago.
a = [1,2,3]
a.uniq! # nil
a.uniq # [1,2,3]
Why a.uniq! is not [1,2,3] ?
Let me know the reason. Thank you!
You need to read the Ruby documentation.
The uniq method returns a new array by removing duplicate values in self. If no duplicates are found, the same array value is returned.
a = [ "a", "a", "b", "b", "c" ]
a.uniq # => ["a", "b", "c"]
b = [ "a", "b", "c" ]
b.uniq # => ["a", "b", "c"]
The uniq! method removes duplicate elements from self and returns nil if no changes are made (that is, no duplicates are found).
a = [ "a", "a", "b", "b", "c" ]
a.uniq! # => ["a", "b", "c"]
b = [ "a", "b", "c" ]
b.uniq! # => nil
most of the methods ending with bang (!) change the variable, while those without it just return the altered variable.
So, if you have something like this:
a = [1, 1, 2, 3]
a.uniq will return [1, 2, 3], but wont alter a, while a! will alter a to be equal to [1, 2, 3]
[1] pry(main)> a = [1,1,2,3]
=> [1, 1, 2, 3]
[2] pry(main)> a.uniq
=> [1, 2, 3]
[3] pry(main)> a
=> [1, 1, 2, 3]
[4] pry(main)> a.uniq!
=> [1, 2, 3]
[5] pry(main)> a
=> [1, 2, 3]
[6] pry(main)> a.uniq!
=> nil
[7] pry(main)> a
=> [1, 2, 3]
I'm writing a small CSV parser in Ruby. The CSV parser works fine, but I can't add my row to hash. What am I doing wrong?
Here's the parser:
require 'smarter_csv'
f = File.open('installs.csv')
hash = {}
csv = SmarterCSV.process(f, strip_chars_from_headers: /"|:/)
csv.each do |row|
coords = row[:location_1].lines.to_a[1..-1].join
row[:address] = coords
hash << row
end
p hash
This returns a undefined method '<<' for {}:Hash (NoMethodError) error. What's going on?
You can use merge! to insert Hash into another Hash, It behaves like Array <<
a = {'1' => 2}
b = {'2' => 3}
c = {}
c.merge!(a) # c = {'1' => 2}
c.merge!(b) # c = {'1' => 2, '2' => 3}
If you want an Array of Hashes, why do you don't use Array object instead of Hash
require 'smarter_csv'
f = File.open('installs.csv')
a = []
csv = SmarterCSV.process(f, strip_chars_from_headers: /"|:/)
csv.each do |row|
coords = row[:location_1].lines.to_a[1..-1].join
row[:address] = coords
a << row
end
p a # will result in array of rows, each row is hash
For Hash use merge or merge! with ! to persist the change to the object.
hash_one = {a: 1, b: 3, c: 2}
=> {a: 1, b: 3, c: 2}
hash_two = {d: 89, e: 34, f: 1}
=> {d: 89, e: 34, f: 1}
hash_two.merge!(hash_one)
=> {a: 1, b: 3, c: 2, d: 89, e: 34, f: 1}
Use << for Array object.
This question already has answers here:
How to determine if one array contains all elements of another array
(8 answers)
Closed 3 years ago.
Is there any method to check if array A contains all the elements of array B?
You can try this
a.sort.uniq == b.sort.uniq
or
(a-b).empty?
And if [1,2,2] != [1,2] in your case you can:
a.group_by{|i| i} == b.group_by{|i| i}
This should work for what you need:
(a & b) == b
You could use Ruby's Set class:
>> require 'set' #=> true
>> a = [*1..5] #=> [1, 2, 3, 4, 5]
>> b = [*1..3] #=> [1, 2, 3]
>> a.to_set.superset? b.to_set #=> true
For small arrays I usually do the same as what fl00r suggested:
>> (b-a).empty? #=> true
I prefer to do this via: (b - a).blank? # tells that b is contained in a
The simplest way is this:
(b-a).empty?
There's also the Set class (part of the standard library) which would allow you to just check to see if B is a subset of A, e.g.
>> a = [1,2,3,4,5] => [1, 2, 3, 4, 5]
>> b = [3,4,5] => [3, 4, 5]
>> require 'set' => true
>> set_a = a.to_set => #<Set: {1, 2, 3, 4, 5}>
>> set_b = b.to_set => #<Set: {3, 4, 5}>
>> set_b.subset? set_a => true
http://www.ruby-doc.org/stdlib/libdoc/set/rdoc/index.html
Ruby 2.6+
Ruby's introduced difference in 2.6 for exactly this purpose.
Very fast, very readable, as follows:
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3, 4, 5, 6]
a.difference(b).any?
# => false
a.difference(b.reverse).any?
# => false
a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 3]
a.difference(b).any?
# => true
Hope that helps someone!
You may want to check out the Set class in the Ruby Standard library. The proper_subset? method will probably do what you want.
How could I implement this? I think my solution is very dirty, and I would like to do it better. I think there is an easy way to do this in Ruby, but I can't remember. I want to use it with Rails, so if Rails provides something similar that's ok, too. usage should be like this:
fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
# odd_fruits should contain all elements with odd indices (index % 2 == 0)
odd_fruits = array_mod(fruits, :mod => 2, :offset => 0)
# even_fruits should contain all elements with even indices (index % 2 == 1)
even_fruits = array_mod(fruits, :mod => 2, :offset => 1)
puts odd_fruits
banana
kiwi
grapefruit
melon
puts even_fruits
strawberry
orange
lemon
******* EDIT *******
for those wo want to know, that is what i finally did:
in a rails project, i created a new file config/initializers/columnize.rb which looks like this:
class Array
def columnize args = { :columns => 1, :offset => 0 }
column = []
self.each_index do |i|
column << self[i] if i % args[:columns] == args[:offset]
end
column
end
end
Rails automatically loads these files immediately after Rails has been loaded. I also used the railsy way of supplying arguments to a method, because i think that serves the purpose of better readable code, and i'm a good-readable-code-fetishist :) I extended the core class "Array", and now i can do things like the following with every array in my project:
>> arr = [1,2,3,4,5,6,7,8]
=> [1, 2, 3, 4, 5, 6, 7, 8]
>> arr.columnize :columns => 2, :offset => 0
=> [1, 3, 5, 7]
>> arr.columnize :columns => 2, :offset => 1
=> [2, 4, 6, 8]
>> arr.columnize :columns => 3, :offset => 0
=> [1, 4, 7]
>> arr.columnize :columns => 3, :offset => 1
=> [2, 5, 8]
>> arr.columnize :columns => 3, :offset => 2
=> [3, 6]
I will now use it to display entries from the database in different columns in my views. What i like about it, is that i don't have to call any compact methods or stuff, because rails complains when you pass a nil object to a view. now it just works. I also thought about letting JS do all that for me, but i like it better this way, working with the 960 Grid system (http://960.gs)
fruits = ["a","b","c","d"]
even = []
x = 2
fruits.each_index{|index|
even << fruits[index] if index % x == 0
}
odds = fruits - even
p fruits
p even
p odds
["a", "b", "c", "d"]
["a", "c"]
["b", "d"]
def array_mod(arr, mod, offset = 0)
arr.shift(offset)
out_arr = []
arr.each_with_index do |val, idx|
out_arr << val if idx % mod == 0
end
out_arr
end
Usage:
>> fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
>> odd_fruits = array_mod(fruits, 2)
=> ["banana", "kiwi", "grapefruit", "melon"]
>> even_fruits = array_mod(fruits, 2, 1)
=> ["strawberry", "orange", "lemon"]
>> even_odder_fruits = array_mod(fruits, 3, 2)
=> ["kiwi", "lemon"]
The simplest method I can think of is this:
fruits = ["a","b","c","d"]
evens = fruits.select {|x| fruits.index(x) % 2 == 0}
odds = fruits - evens
You don't need to mess with select_with_index if the array can look up indices for you. I suppose the drawback to this method is if you have multiple entries in 'fruits' with the same value (the index method returns the index of the first matching entry only).
What you want is:
even_fruits = fruits.select_with_index { |v,i| i % 2 == 0) }
odd_fruits = fruits - even_fruits
Unfortunately Enumerable#select_with_index does not exist as standard, but several people have extended Enumerable with such a method.
http://snippets.dzone.com/posts/show/3746
http://webget.com/gems/webget_ruby_ramp/doc/Enumerable.html#M000058
Solution using just core capabilities:
(0...((fruits.size+1-offset)/mod)).map {|i| fruits[i*mod+offset]}
Rails provides an ActiveSupport extension to Array that provides an "in_groups_of" method. That's what I usually use for things like this. It allows you to do this:
to pull the even fruits (remember to compact to pull off nils at the end):
fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
fruits.in_groups_of(2).collect{|g| g[1]}.compact
=> ["strawberry", "orange", "lemon"]
to pull the odd fruits:
fruits.in_groups_of(2).collect{|g| g[0]}.compact
=> ["banana", "kiwi", "grapefruit", "melon"]
to get every third fruit, you could use:
fruits.in_groups_of(3).collect{|g| g[0]}.compact
=> ["banana", "orange", "melon"]
functional way
#fruits = [...]
even = []
odd = []
fruits.inject(true ){|_is_even, _el| even << _el if _is_even; !_is_even}
fruits.inject(false){|_is_odd, _el| odd << _el if _is_odd; !_is_odd }
Here's a solution using #enum_for, which allows you to specify a method to use "in place" of #each:
require 'enumerator'
mod = 2
[1, 2, 3, 4].enum_for(:each_with_index).select do |item, index|
index % mod == 0
end.map { |item, index| item }
# => [1, 2]