I have an array of hashes like so:
[{"testPARAM1"=>"testVAL1"}, {"testPARAM2"=>"testVAL2"}]
And I'm trying to map this onto a single hash like this:
{"testPARAM2"=>"testVAL2", "testPARAM1"=>"testVAL1"}
I have achieved it using
par={}
mitem["params"].each { |h| h.each {|k,v| par[k]=v} }
But I was wondering if it's possible to do this in a more idiomatic way (preferably without using a local variable).
How can I do this?
You could compose Enumerable#reduce and Hash#merge to accomplish what you want.
input = [{"testPARAM1"=>"testVAL1"}, {"testPARAM2"=>"testVAL2"}]
input.reduce({}, :merge)
is {"testPARAM2"=>"testVAL2", "testPARAM1"=>"testVAL1"}
Reducing an array is sort of like sticking a method call between each of its elements.
For example [1, 2, 3].reduce(0, :+) is like saying 0 + 1 + 2 + 3 and gives 6.
In our case we do something similar, but with the merge function, which merges two hashes.
[{:a => 1}, {:b => 2}, {:c => 3}].reduce({}, :merge)
is {}.merge({:a => 1}.merge({:b => 2}.merge({:c => 3})))
is {:a => 1, :b => 2, :c => 3}
How about:
h = [{"testPARAM1"=>"testVAL1"}, {"testPARAM2"=>"testVAL2"}]
r = h.inject(:merge)
Every answer so far advises using Enumerable#reduce (or inject, which is an alias) together with Hash#merge, but beware: while this solution is clean, concise and human-readable, it is hugely time-consuming and has a large memory footprint on large arrays.
I have compiled different solutions and benchmarked them.
Some options
a = [{'a' => {'x' => 1}}, {'b' => {'x' => 2}}]
# to_h
a.to_h { |h| [h.keys.first, h.values.first] }
# each_with_object
a.each_with_object({}) { |x, h| h.store(x.keys.first, x.values.first) }
# each_with_object (nested)
a.each_with_object({}) { |x, h| x.each { |k, v| h.store(k, v) } }
# map.with_object
a.map.with_object({}) { |x, h| h.store(x.keys.first, x.values.first) }
# map.with_object (nested)
a.map.with_object({}) { |x, h| x.each { |k, v| h.store(k, v) } }
# reduce + merge
a.reduce(:merge) # takes way too much time on large arrays because Hash#merge creates a new hash on each iteration
# reduce + merge!
a.reduce(:merge!) # will mutate the first hash in a, which is probably not what you expect
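To make that side effect concrete, here is a tiny demonstration with a hypothetical two-element input; reduce without an initial value uses the first element as the memo, so merge! writes straight into it:
pair = [{ 'a' => 1 }, { 'b' => 2 }]
merged = pair.reduce(:merge!)
merged     # => {"a"=>1, "b"=>2}
pair.first # => {"a"=>1, "b"=>2} -- the first hash in the array was mutated in place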
Benchmark script
It's important to use bmbm rather than bm here, to avoid differences that are merely due to the cost of memory allocation and garbage collection.
require 'benchmark'
a = (1..50_000).map { |x| { "a#{x}" => { 'x' => x } } }
Benchmark.bmbm do |x|
x.report('to_h:') { a.to_h { |h| [h.keys.first, h.values.first] } }
x.report('each_with_object:') { a.each_with_object({}) { |x, h| h.store(x.keys.first, x.values.first) } }
x.report('each_with_object (nested):') { a.each_with_object({}) { |x, h| x.each { |k, v| h.store(k, v) } } }
x.report('map.with_object:') { a.map.with_object({}) { |x, h| h.store(x.keys.first, x.values.first) } }
x.report('map.with_object (nested):') { a.map.with_object({}) { |x, h| x.each { |k, v| h.store(k, v) } } }
x.report('reduce + merge:') { a.reduce(:merge) }
x.report('reduce + merge!:') { a.reduce(:merge!) }
end
Note: I initially tested with a 1_000_000-item array, but since the cost of reduce + merge grows roughly quadratically, it would have taken far too long to finish.
Benchmark results
50k items array
Rehearsal --------------------------------------------------------------
to_h: 0.031464 0.004003 0.035467 ( 0.035644)
each_with_object: 0.018782 0.003025 0.021807 ( 0.021978)
each_with_object (nested): 0.018848 0.000000 0.018848 ( 0.018973)
map.with_object: 0.022634 0.000000 0.022634 ( 0.022777)
map.with_object (nested): 0.020958 0.000222 0.021180 ( 0.021325)
reduce + merge: 9.409533 0.222870 9.632403 ( 9.713789)
reduce + merge!: 0.008547 0.000000 0.008547 ( 0.008627)
----------------------------------------------------- total: 9.760886sec
user system total real
to_h: 0.019744 0.000000 0.019744 ( 0.019851)
each_with_object: 0.018324 0.000000 0.018324 ( 0.018395)
each_with_object (nested): 0.029053 0.000000 0.029053 ( 0.029251)
map.with_object: 0.021635 0.000000 0.021635 ( 0.021782)
map.with_object (nested): 0.028842 0.000005 0.028847 ( 0.029046)
reduce + merge: 17.331742 6.387505 23.719247 ( 23.925125)
reduce + merge!: 0.008255 0.000395 0.008650 ( 0.008681)
2M items array (excluding reduce + merge)
Rehearsal --------------------------------------------------------------
to_h: 2.036005 0.062571 2.098576 ( 2.116110)
each_with_object: 1.241308 0.023036 1.264344 ( 1.273338)
each_with_object (nested): 1.126841 0.039636 1.166477 ( 1.173382)
map.with_object: 2.208696 0.026286 2.234982 ( 2.252559)
map.with_object (nested): 1.238949 0.023128 1.262077 ( 1.270945)
reduce + merge!: 0.777382 0.013279 0.790661 ( 0.797180)
----------------------------------------------------- total: 8.817117sec
user system total real
to_h: 1.237030 0.000000 1.237030 ( 1.247476)
each_with_object: 1.361288 0.016369 1.377657 ( 1.388984)
each_with_object (nested): 1.765759 0.000000 1.765759 ( 1.776274)
map.with_object: 1.439949 0.029580 1.469529 ( 1.481832)
map.with_object (nested): 2.016688 0.019809 2.036497 ( 2.051029)
reduce + merge!: 0.788528 0.000000 0.788528 ( 0.794186)
Use #inject
hashes = [{"testPARAM1"=>"testVAL1"}, {"testPARAM2"=>"testVAL2"}]
merged = hashes.inject({}) { |aggregate, hash| aggregate.merge hash }
merged # => {"testPARAM1"=>"testVAL1", "testPARAM2"=>"testVAL2"}
Here you can use either inject or reduce from Enumerable; they are aliases of each other, so there is no performance difference between them.
sample = [{"testPARAM1"=>"testVAL1"}, {"testPARAM2"=>"testVAL2"}]
result1 = sample.reduce(:merge)
# {"testPARAM1"=>"testVAL1", "testPARAM2"=>"testVAL2"}
result2 = sample.inject(:merge)
# {"testPARAM1"=>"testVAL1", "testPARAM2"=>"testVAL2"}
Related
I am dealing with a large quantity of data and I'm worried about the efficiency of my operations at scale. After benchmarking, the average time to execute this piece of code is about 0.004 sec. The goal of this line of code is to find the difference between the two values at each array location.
In a previous operation, 111.111 was loaded into the arrays at locations which contained invalid data. Due to some weird time-domain issues I couldn't just remove the values, so I needed some distinguishable placeholder. I could probably use nil here instead.
Anyway, back to the explanation. This line of code checks that neither array has the 111.111 placeholder at the current location. If the values are valid, I perform the mathematical operation; otherwise I want to delete the values (or at least exclude them from the new array I'm writing to). I accomplished this by placing a nil in that location and then compacting the array afterwards.
The time of 0.004 sec for 4000 data points in each array isn't terrible, but this line of code is executed 25M times. I'm hoping someone might be able to offer some insight into how I might optimize it.
temp_row = row_1.zip(row_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
You are wasting CPU generating nils in the ternary statement and then using compact to remove them. Instead, use reject or select to find the elements that don't contain 111.111, then map over what remains.
Instead of:
row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]
temp_row = row_1.zip(row_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
temp_row # => [1, 2]
I'd start with:
temp_row = row_1.zip(row_2)
.reject{ |x,y| x == 111.111 || y == 111.111 }
.map{ |x,y| (x - y).abs }
temp_row # => [1, 2]
Or:
temp_row = row_1.zip(row_2)
.each_with_object([]) { |(x,y), ary|
ary << (x - y).abs unless (x == 111.111 || y == 111.111)
}
temp_row # => [1, 2]
Benchmarking with different array sizes shows some good things to know:
require 'benchmark'
DECIMAL_SHIFT = 100
DATA_ARRAY = (1 .. 1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map{ |i| i * 2 } + [111.111]).shuffle
Benchmark.bm(16) do |b|
b.report('ternary:') do
DECIMAL_SHIFT.times do
ROW_1.zip(ROW_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
end
end
b.report('reject:') do
DECIMAL_SHIFT.times do
ROW_1.zip(ROW_2).reject{ |x,y| x == 111.111 || y == 111.111 }.map{ |x,y| (x - y).abs }
end
end
b.report('each_with_object:') do
DECIMAL_SHIFT.times do
ROW_1.zip(ROW_2)
.each_with_object([]) { |(x,y), ary|
ary << (x - y).abs unless (x == 111.111 || y == 111.111)
}
end
end
end
# >> user system total real
# >> ternary: 0.240000 0.000000 0.240000 ( 0.244476)
# >> reject: 0.060000 0.000000 0.060000 ( 0.058842)
# >> each_with_object: 0.350000 0.000000 0.350000 ( 0.349363)
Adjust the size of DECIMAL_SHIFT and DATA_ARRAY, and the placement of 111.111, and see what happens; that will give you an idea of which expression works best for your data size and structure so you can fine-tune the code as necessary.
You can also try the parallel gem (https://github.com/grosser/parallel) and run the work across multiple threads.
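A rough sketch of that idea, assuming the parallel gem is installed (Parallel.map distributes the block over worker threads; the thread count here is illustrative, not tuned):
require 'parallel'

pairs = row_1.zip(row_2)
temp_row = Parallel.map(pairs, in_threads: 4) do |(x, y)|
  x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
Keep in mind that for pure-Ruby arithmetic the GVL limits how much threads can help, so in_processes may be worth trying as well.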
Basically I've got a hash and I would like to sum the current value with the previous one.
i.e., what I have:
hash = {:a=>5, :b=>10, :c=>15, :d=>3}
The result that I want
{:a=>5, :b=>15, :c=>30, :d=>33}
hash.inject(0) { |s, (k, v)| hash[k] = s + v }
# => 33
hash
# => {:a=>5, :b=>15, :c=>30, :d=>33}
If you want to preserve the original hash, you can use each_with_object instead:
hash.each_with_object({}) { |(k, v), h| h[k] = v + (h.values.last||0) }
# => {:a=>5, :b=>15, :c=>30, :d=>33}
The following will return a new hash instance:
hash.each_with_object({}) { |(key, val), new_hash| new_hash[key] = val + (new_hash.values.last||0) }
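Another non-mutating variant, sketched here, carries the running total in a local instead of reading new_hash.values.last (which rebuilds the values array on every step):
hash = {:a=>5, :b=>10, :c=>15, :d=>3}
sum = 0
hash.each_with_object({}) { |(k, v), h| h[k] = (sum += v) }
# => {:a=>5, :b=>15, :c=>30, :d=>33}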
I'm trying to create a simple regular expression that can extract numbers (7 to 14 digits long) that come after a keyword starting with the letter g plus some id, something like the following:
(g)(\d{1,6})\s+(\d{7,14}\s*)+
Let's assume:
m = (/(g)(\d{1,6})\s+(\d{7,14}\s*)+/i.match("g12 327638474 83873478 2387327683 44 437643673476"))
I get this result:
#<MatchData "g12 327638474 83873478 2387327683 " "g" "12" "2387327683 ">
But what I need in the final result is to include 327638474, 83873478 and 2387327683, and to exclude 44.
For now I'm only getting the last number, 2387327683, without the previous numbers.
Any help here? Cheers.
Instead of a regex, you can use something like this:
s = "g12 327638474 83873478 2387327683 44 437643673476"
s.split[1..-1].select { |x| (7..14).include?(x.size) }.map(&:to_i)
# => [327638474, 83873478, 2387327683, 437643673476]
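If you would still prefer a regex, String#scan with word boundaries pulls out every 7-to-14 digit run in one pass (note this skips the leading g-keyword check, which you could validate separately):
s = "g12 327638474 83873478 2387327683 44 437643673476"
s.scan(/\b\d{7,14}\b/).map(&:to_i)
# => [327638474, 83873478, 2387327683, 437643673476]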
Just as an FYI, here is a benchmark showing a slightly faster way of accomplishing what the selected answer does:
require 'ap'
require 'benchmark'
n = 100_000
s = "g12 327638474 83873478 2387327683 44 437643673476"
ap s.split[1..-1].select { |x| (7..14).include? x.size }.map(&:to_i)
ap s.split[1..-1].select { |x| 7 <= x.size && x.size <= 14 }.map(&:to_i)
Benchmark.bm(11) do |b|
b.report('include?' ) { n.times{ s.split[1..-1].select { |x| (7..14).include? x.size }.map(&:to_i) } }
b.report('conditional') { n.times{ s.split[1..-1].select { |x| 7 <= x.size && x.size <= 14 }.map(&:to_i) } }
end
ruby ~/Desktop/test.rb
[
[0] 327638474,
[1] 83873478,
[2] 2387327683,
[3] 437643673476
]
[
[0] 327638474,
[1] 83873478,
[2] 2387327683,
[3] 437643673476
]
user system total real
include? 1.010000 0.000000 1.010000 ( 1.011725)
conditional 0.830000 0.000000 0.830000 ( 0.825746)
For speed I'll use the conditional test. It's a tiny bit more verbose, but is still easily read.
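As a side note, the range test can also be written with Comparable#between?, which reads nicely, though it isn't measured in the benchmark above:
s.split[1..-1].select { |x| x.size.between?(7, 14) }.map(&:to_i)
# => [327638474, 83873478, 2387327683, 437643673476]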
a = {"a" => 100, "b" => 200, "c" => 300}
b = a.map{|k,v| v = v + 10}
This returns an array; I need to change the values of the hash in place (by reference).
I am expecting the following output:
{"a" => 110, "b" => 210, "c" => 310}
Thanks
Here's my non-mutating one-liner :P
Hash[original_hash.map { |k,v| [k, v+10] }]
Gotta love ruby one-liners :)
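Applied to the hash from the question, for illustration:
a = {"a" => 100, "b" => 200, "c" => 300}
Hash[a.map { |k, v| [k, v + 10] }]
# => {"a"=>110, "b"=>210, "c"=>310}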
Maybe you can do something like this:
a.keys.each { |key| a[key] += 10 }
a.each_pair { |k, v| a[k] += 10 }
Reality check:
require "benchmark"
include Benchmark
h0, h1, h2, h3, h4 = (0..4).map { Hash[(0..1000).map{ |i| [i,i] }] }
bm do |x|
x.report("0") { 1000.times { h0.each_key{ |k| h0[k] += 10 } } }
x.report("1") { 1000.times { h1.keys.each{ |k| h1[k] += 10 } } }
x.report("2") { 1000.times { Hash[h2.map { |k,v| [k, v+10] }] } }
x.report("3") { 1000.times { h3.inject({}){ |h,(k,v)| h[k] = v + 10; h } } }
x.report("4") { 1000.times { h4.inject({}){ |h,(k,v)| h.update( k => v + 10) } } }
end
user system total real
0 0.490000 0.000000 0.490000 ( 0.540795)
1 0.490000 0.010000 0.500000 ( 0.545050)
2 1.210000 0.010000 1.220000 ( 1.388739)
3 1.570000 0.010000 1.580000 ( 1.660317)
4 2.460000 0.010000 2.470000 ( 3.057287)
Imperative programming wins.
Dude, just change the map to each and you are good to go :)
I believe inject should be presented in every Ruby question :D
b = a.inject({}){ |h,(k,v)| h[k] = v + 10; h }
#=> {"a"=>110, "b"=>210, "c"=>310}
I have two arrays I need to merge, and using the union operator (|) is PAINFULLY slow. Are there any other ways to accomplish an array merge?
Also, the arrays are filled with objects, not strings.
An Example of the objects within the array
#<Article
id: 1,
xml_document_id: 1,
source: "<article><domain>events.waikato.ac</domain><excerpt...",
created_at: "2010-02-11 01:32:46",
updated_at: "2010-02-11 01:41:28"
>
Where source is a short piece of XML.
EDIT
Sorry! By 'merge' I mean I need to not insert duplicates.
A => [1, 2, 3, 4, 5]
B => [3, 4, 5, 6, 7]
A.magic_merge(B) #=> [1, 2, 3, 4, 5, 6, 7]
Understanding, of course, that the integers are actually Article objects, and that the union operator appears to take forever.
Here's a script which benchmarks two merge techniques: using the pipe operator (a1 | a2), and using concatenate-and-uniq ((a1 + a2).uniq). Two additional benchmarks give the time of concatenate and uniq individually.
require 'benchmark'
a1 = []; a2 = []
[a1, a2].each do |a|
1000000.times { a << rand(999999) }
end
puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }
puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }
puts "Concat only:"
puts Benchmark.measure { a1 + a2 }
puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }
On my machine (Ubuntu Karmic, Ruby 1.8.7), I get output like this:
Merge with pipe:
1.000000 0.030000 1.030000 ( 1.020562)
Merge with concat and uniq:
1.070000 0.000000 1.070000 ( 1.071448)
Concat only:
0.010000 0.000000 0.010000 ( 0.005888)
Uniq only:
0.980000 0.000000 0.980000 ( 0.981700)
Which shows that these two techniques are very similar in speed, and that uniq is the larger component of the operation. This makes sense intuitively: uniq has to examine and compare every element, whereas simple concatenation is just a cheap copy.
So, if you really want to speed this up, you need to look at how object comparison (the eql? and hash methods that uniq relies on) is implemented for the objects in your arrays. I believe that most of the time is being spent comparing objects to ensure there are no duplicates in the final array.
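For example, one hypothetical way to make that comparison cheap for Article records is to define those identity hooks in terms of the id (this sketch assumes id uniquely identifies an article):
class Article
  def eql?(other)
    other.is_a?(Article) && id == other.id
  end

  def hash
    id.hash
  end
end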
Do you need the items to be in a specific order within the arrays? If not, you may want to check whether using Sets makes it faster.
Update
Adding to another answerer's code:
require "set"
require "benchmark"
a1 = []; a2 = []
[a1, a2].each do |a|
1000000.times { a << rand(999999) }
end
s1, s2 = Set.new, Set.new
[s1, s2].each do |s|
1000000.times { s << rand(999999) }
end
puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }
puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }
puts "Concat only:"
puts Benchmark.measure { a1 + a2 }
puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }
puts "Using sets"
puts Benchmark.measure {s1 + s2}
puts "Starting with arrays, but using sets"
puts Benchmark.measure {s3, s4 = [a1, a2].map{|a| Set.new(a)} ; (s3 + s4)}
gives (for ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0])
Merge with pipe:
1.320000 0.040000 1.360000 ( 1.349563)
Merge with concat and uniq:
1.480000 0.030000 1.510000 ( 1.512295)
Concat only:
0.010000 0.000000 0.010000 ( 0.019812)
Uniq only:
1.460000 0.020000 1.480000 ( 1.486857)
Using sets
0.310000 0.010000 0.320000 ( 0.321982)
Starting with arrays, but using sets
2.340000 0.050000 2.390000 ( 2.384066)
Suggests that sets may or may not be faster, depending on your circumstances (lots of merges or not many merges).
Using the Array#concat method will likely be a lot faster, according to my initial benchmarks using Ruby 1.8.7:
require 'benchmark'
def reset_arrays!
  @array1 = []
  @array2 = []
  [@array1, @array2].each do |array|
    10000.times { array << ActiveSupport::SecureRandom.hex }
  end
end
reset_arrays! && puts(Benchmark.measure { @array1 | @array2 })
# => 0.030000 0.000000 0.030000 ( 0.026677)
reset_arrays! && puts(Benchmark.measure { @array1.concat(@array2) })
# => 0.000000 0.000000 0.000000 ( 0.000122)
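One caveat, since the question asks to avoid duplicates: concat joins in place but keeps duplicates, so you would still need a uniq pass afterwards (a sketch):
merged = @array1.concat(@array2).uniq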
Try this and see if this is any faster
a = [1,2,3,3,2]
b = [1,2,3,4,3,2,5,7]
(a+b).uniq