Improving an 'each' statement - ruby-on-rails

I am using Ruby on Rails 3 and would like to find a better way to write the following code:
query = {}
params.each { |k,v| query[k.singularize] = v }
How can I do that?

If you were actually finding that
params.each { |k,v| query[k.singularize] = v }
was taking too long, singularize would be taking up most of your time.
If most of the words were the same, I'd consider memoization.
Actually, if you had ten thousand parameters, I'd consider a code review!
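A minimal sketch of the memoization idea, assuming the same keys show up again and again. Here naive_singularize is a hypothetical stand-in for ActiveSupport's String#singularize so the snippet runs without Rails; the cache is just a Hash with a default block.

```ruby
# Hypothetical stand-in for ActiveSupport's String#singularize, so the
# sketch runs without Rails: it only strips one trailing "s".
naive_singularize = ->(word) { word.end_with?('s') ? word[0..-2] : word }

# The default block runs only on a cache miss and stores the result,
# so each distinct key is singularized once.
singular_cache = Hash.new { |cache, word| cache[word] = naive_singularize.call(word) }

params = { 'kittens' => 'cute', 'dinosaurs' => 'angry' }
query = params.each_with_object({}) { |(k, v), h| h[singular_cache[k]] = v }
```

In a Rails app the lambda call would presumably be replaced with word.singularize; whether the cache pays off depends on how often keys repeat.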

query = Hash[params.map{|k, v| [k.singularize, v]}]

As nzifnab wrote, my other code seems to be slow. The following one may be slightly faster than the original posted.
query = params.each_with_object({}){|(k, v), h| h[k.singularize] = v}

Well, I had an idea (the same idea as sawa) and wanted to know whether it was an improvement. Here are the benchmark results:
params = {'puppies' => 'cute',
'dinosaurs' => 'angry',
'kittens' => 'kill them all',
'wat' => 4}
Benchmark.bm do |x|
x.report(".each"){10000.times{query = {}; params.each{ |k,v| query[k.singularize] = v }}}
x.report("Hash"){10000.times{query = Hash[params.map{|k, v| [k.singularize, v]}]}}
end
And the result:
user system total real
.each 3.850000 0.390000 4.240000 ( 4.260567)
Hash 3.910000 0.400000 4.310000 ( 4.402304)
So very little difference, although Hash is the opposite of improvement, sadly - if performance was a concern for you.
I still tend to use the Hash[] format just because I like how .map works... but .map has to loop through every single item too so it's no different.
EDIT:
I went with the comment suggestion to do one really large hash instead of a tiny hash 10,000 times. Here are the results:
myhash = {}
20000.times do |i|
myhash[i.to_s * 2 + 's'] = i
end
Benchmark.bm do |x|
x.report(".each"){query = {}; myhash.each{|k,v| query[k.singularize] = v}}
x.report("Hash"){query = Hash[myhash.map{|k,v| [k.singularize, v]}]}
end
Results:
user system total real
.each 1.980000 0.110000 2.090000 ( 2.100811)
Hash 2.040000 0.140000 2.180000 ( 2.176588)
Edit 2: Credit goes to sawa for this third method:
Benchmark.bm do |x|
x.report(".each"){query = {}; myhash.each{|k,v| query[k.singularize] = v}}
x.report("Hash"){query = Hash[myhash.map{|k,v| [k.singularize, v]}]}
x.report("with_object"){query = myhash.each_with_object({}){|(k, v), h| h[k.singularize] = v}}
end
user system total real
.each 2.050000 0.110000 2.160000 ( 2.174315)
Hash 2.070000 0.110000 2.180000 ( 2.187600)
with_object 2.100000 0.110000 2.210000 ( 2.207763)
If you (or someone) can find a way to modify each value in-place I suspect this would be the fastest way to do it:
params.each{|arr| arr[0].singularize!}
But you can't do that because
singularize! is not defined, and
when you try to do this:
params.each{|arr| arr[0].gsub!('s', '')}
You get an error:
TypeError: can't modify frozen string
I would just stick with the original version :p
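The frozen-string error comes from how Hash stores String keys: it keeps its own frozen copy of the key. A small sketch of that behavior, plus one way to get renamed keys by building a new hash. It assumes Hash#transform_keys is available (Ruby 2.5+ or ActiveSupport), and chomp('s') is only a stand-in for singularize.

```ruby
key = String.new('kittens')   # an unfrozen string
hash = {}
hash[key] = 1

stored_key = hash.keys.first
key_is_frozen = stored_key.frozen?       # the hash holds a frozen copy...
same_object   = stored_key.equal?(key)   # ...not the original object

# Keys cannot be edited in place, so build a new hash with new keys.
renamed = hash.transform_keys { |k| k.chomp('s') }
```

This still allocates a new hash, so it is a readability option rather than the in-place update wished for above.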

I tried the following in Ruby YARV 1.9.1, without ActiveSupport (hence reverse rather than singularize)
require "benchmark"
myhash = {}
2000000.times do |i|
myhash[i.to_s * 2 + 's'] = i
end
Benchmark.bm do |x|
x.report(".each"){query = {}; myhash.each{|k,v| query[k.reverse] = v}}
x.report("Hash"){query = Hash[myhash.map{|k,v| [k.reverse, v]}]}
puts RUBY_VERSION
puts RUBY_ENGINE if defined?(RUBY_ENGINE)
end
gave me
user system total real
.each 6.350000 0.070000 6.420000 ( 6.415588)
Hash 5.710000 0.100000 5.810000 ( 5.795611)
1.9.1
ruby
So in my case, Hash was faster.
Considering the difference in speed between my benchmark and nzifnab's, I'd want to check that the bulk of the time wasn't spent on singularize.
Update:
Under 1.8.7:
user system total real
.each 11.640000 0.380000 12.020000 ( 12.019372)
Hash 15.010000 0.540000 15.550000 ( 15.552186)
1.8.7
So it's slower using Hash under 1.8?

Related

Skip key from hash loop

I am performing an operation on hash values, let's say:
hash = { a: true, b: false, c: nil }
I am executing an each loop on hash but I want to skip keys b and c. I don't want to delete these from hash.
I have tried:
hash = { a: true, b: false, c: nil}
hash.except(:c)
{ a: true, b: false, c: nil}
But it is not working. I am using ruby 2.4.2
Actually hash.except(:c) returns { a: true, b: false } as expected. Since you're using Rails it should work. The one subtlety worth noting is that:
hash.except([:b, :c])
won't work. You need to use
hash.except(:b, :c)
instead.
For a general solution, use the splat operator:
keys = [:b, :c]
hash.except(*keys)
If you simply need to skip it while looping over the hash pairs, I personally would steer clear of using except, and use the uglier next if key == whatever within the loop.
An equality check between symbols is cheap, about the lowest-overhead thing that can be done in Ruby, the basic equivalent of comparing two integers or booleans.
except on the other hand is not, especially as the size of the hash grows. You are creating a new cloned hash, minus the specified values, every time you call it. Even with a small hash, you are creating a new object needlessly.
I understand that many Ruby users are forever in pursuit of "one-liners" or the absolute shortest amount of code possible, and I am guilty of it myself, but it needs to be done with an awareness of what's going on beneath the surface, or you end up with less efficient code, not more.
So although not as "pretty", this would be more efficient:
hash.each do |k, v|
next if k == :b || k == :c
# Do stuff
end
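If the list of keys to skip grows beyond two or three, the chain of == checks gets unwieldy. A minimal sketch of the same next-based approach with the skipped keys held in a Set (the names skip and visited are illustrative, not from the question):

```ruby
require 'set'

hash = { a: true, b: false, c: nil }
skip = Set[:b, :c]   # keys to ignore; membership test is a hash lookup

visited = []
hash.each do |k, v|
  next if skip.include?(k)
  visited << k   # stand-in for the real work
end
```

The original hash is left untouched and no intermediate hash is built.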
EDIT
I was curious about the performance difference between what I described and the use of except, and the difference is significant.
First, I added the source for except, since I didn't have Rails installed. This is straight from the source code in activesupport/lib/active_support/core_ext/hash/except.rb.
class Hash
def except!(*keys)
keys.each { |key| delete(key) }
self
end
def except(*keys)
dup.except!(*keys)
end
end
Then I did some benchmarking. I figured 1,000,000 samples was enough for one run.
require 'benchmark'
hash = { a: true, b: false, c: nil }
count = 1_000_000
Benchmark.bm do |bm|
bm.report("except") do
count.times do
hash.except(:b, :c).each do |k, v|
# Do nothing
end
end
end
bm.report("next") do
count.times do
hash.each do |k, v|
next if k == :b || k == :c
# Do nothing
end
end
end
end
Running various times, including changing to bmbm to confirm the GC isn't skewing anything:
user system total real
except 1.282000 0.000000 1.282000 ( 1.276943)
next 0.250000 0.000000 0.250000 ( 0.246193)
On average, the use of next results in code that is over 5x faster. This difference grows even more the larger the hash becomes.
I'd probably just do something like this
my_hash = hash.reject { |k, _| %i[b c].include?(k) }
my_hash.each do |k, v|
# Do what you need to and be free from worry about those pesky :b and :c keys
end

Differences between literals and constructors? ([] vs Array.new and {} vs Hash.new)

I was curious to learn more about the differences between [] and Array.new, and between {} and Hash.new.
I ran some benchmarks on them, and it seems the shorthand forms are the winners:
require 'benchmark'
many = 500000
Benchmark.bm do |b|
b.report("[] \t") {many.times { [].object_id }}
b.report("Array.new \t") { many.times { Array.new.object_id }}
b.report("{} \t") {many.times { {}.object_id }}
b.report("Hash.new\t") { many.times { Hash.new.object_id }}
end
user system total real
[] 0.080000 0.000000 0.080000 ( 0.079287)
Array.new 0.180000 0.000000 0.180000 ( 0.177105)
{} 0.080000 0.000000 0.080000 ( 0.079467)
Hash.new 0.260000 0.000000 0.260000 ( 0.264796)
I personally like to use the shorthand forms [] and {}; the code looks cool and readable.
Are there any other differences between them? What happens behind the scenes that makes the literals so much faster, and are there any suggestions on when to use which?
I found this link but was looking to get more info.
cheers.
With Hash.new you can set the default value of the hash for unset keys. This is quite useful if you're doing statistics, because Hash.new(0) will let you increment keys without explicitly initializing them.
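A short sketch of the counting use case described above (the word list is made up for illustration):

```ruby
words = %w[apple pear apple fig pear apple]

counts = Hash.new(0)               # missing keys read as 0
words.each { |w| counts[w] += 1 }  # no explicit initialization needed

missing = counts['kiwi']           # reads the default, does not store a key
```

Reading a missing key returns the default without adding the key to the hash.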
So for half a million times they all ran "very quickly". I prefer literals in any language ([], {}, 0, "", etc.) but the literals can't do everything (see the documentation for the other constructor forms). Write clean code, and be consistent: there is no issue here :)
However, I suspect the literals avoid a constant lookup and a method call which results in them being faster, at least in that particular implementation .. (someone with more smarts than me could look at the intermediate AST generated to prove/disprove this.)
That is, what if Hash resolved to a custom class or Hash.new was replaced with a custom method? Can't do that with {}. In addition, the methods have to deal with additional arguments (or blocks) while the literals do not.
Robert already mentioned the default value of the Hash.new
You may also use complex 'default' values with the block variant of Hash.new:
x = Hash.new { |hash, key|
hash[key] = key * 2
}
p x #-> {}
p x[1] #-> 2
p x #-> {1=>2}
Array.new can also be used to get default values:
p Array.new(5, :a) #-> [:a, :a, :a, :a, :a]
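One thing to watch with the value forms: the default is a single object shared by every slot or key, which matters once the default is mutable. A small sketch contrasting the value form with the block form:

```ruby
shared = Array.new(2, [])        # both slots point at one array
shared[0] << :x

separate = Array.new(2) { [] }   # the block runs once per slot
separate[0] << :x

lists = Hash.new { |hash, key| hash[key] = [] }   # a fresh array per key
lists[:a] << 1
lists[:b] << 2
```

With immutable defaults such as 0 or :a the sharing is harmless, which is why Hash.new(0) works well for counting.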

How can I speed up this Rails code?

It's a vague question I know....but the performance on this block of code is horrible. It takes about 15secs from the original post to the action to rendering the page...
The purpose of this action is to retrieve all Occupations from a CV, all the skills from that CV and the occupations. They need to be organized in 2 arrays:
the first array contains all the Occupations (no duplicates) and has them ordered according to their score. For each duplicate entry found, the score is increased by 1
the second array contains ALL the skills from both the occupation array and the cv. Again no duplicates are allowed, but for every duplicate encountered the score of the existing entry is increased by one.
Below is the code block that performs this operation. It's relatively big compared to my other code snippets, but I hope it's understandable. I know working with the arrays like I do is confusing, but here is what each array location means:
position 0 : the actual skill/occupation object
position 1 : the score of the entry
position 2 : the location found in the db
position 3 : the location found in the cv
def categorize
@cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills])
@menu = :second
@language = Language.resolve(:code => :en, :name => :en)
@occupation_hashes = []
@skill_hashes = []
(@cv.desired_occupations + @cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if @cv.desired_occupations.include? occupation
section << 'Work experience' if @cv.past_occupations.include? occupation
unless (array = @occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
@occupation_hashes << [occupation, 1, section]
end
occupation.skills.each do |skill|
unless (array = @skill_hashes.assoc skill).blank?
label = occupation.concept.label(@language).value
array[1]+= 1
array[3] << label unless array[3].include? label
else
@skill_hashes << [skill, 1, [], [occupation.concept.label(@language).value]]
end
end
end
@cv.educational_skills.each do |skill|
unless (array = @skill_hashes.assoc skill).blank?
array[1]+= 1
array[3] << 'Education skills' unless array[3].include? 'Education skills'
else
@skill_hashes << [skill, 1, ['Education skills'], []]
end
end
# Sort the hashes
@occupation_hashes.sort! { |x,y| y[1] <=> x[1]}
@skill_hashes.sort! { |x,y| y[1] <=> x[1]}
@max = @skill_hashes.first[1]
@min = @skill_hashes.last[1]
end
I can post the additional models and migrations to make it clear what each class does, but I think the first few lines of the above script should be clear on the associations. I'm looking for a way to optimize the each-loops...
That's quite the block of code there. Generally if you're writing methods that serious you're going to have trouble maintaining it in the future. A technique that would help is breaking up that monolithic chunk of code and turning it into a helper class that does the processing in more logical stages, making it easier to fine-tune aspects of it.
For instance, an interface might be:
@categorizer = CvCategorizer.new(params[:cv_id])
This would encapsulate all of the above and save it into instance variables made accessible by being declared with attr_reader.
Using a utility class means you can break up the initialization into steps that are made more clear:
def initialize(cv_id)
# Call a wrapper method that loads the CV
@cv = self.load_cv(cv_id)
# Perform discrete steps to re-order the imported data
self.organize_occupations
self.organize_skills
end
It's really hard to say why this is slow by just looking at it, though I would pay very close attention to log/development.log to see what's going on in there. It could be the initial load is painfully slow but the rest of the method is fine.
You should do a bit of profiling in your code to see what is taking a large chunk of time. You can figure out how to work one of the profilers, or just sprinkle some simple puts or logger.info statements throughout your code with a timestamp. It is probably easiest to do this by using Benchmark. Note: you may need to require 'benchmark'... not sure if it is auto required in Rails or not.
For a single line, you can do something like this:
logger.info Benchmark.measure { @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) }
And for timing larger blocks of code:
# Assign the result first: with "logger.info Benchmark.measure do ... end"
# the do/end block would bind to logger.info instead of Benchmark.measure.
timing = Benchmark.measure do
(@cv.desired_occupations + @cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if @cv.desired_occupations.include? occupation
section << 'Work experience' if @cv.past_occupations.include? occupation
unless (array = @occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
@occupation_hashes << [occupation, 1, section]
end
end
end
logger.info timing
I'd just start with large blocks and then narrow it down. Not knowing how large of a dataset you are dealing with, it is hard to say what the problem zone is.
I'll also concur with others that you will be way better off to break this thing into smaller methods. This will also make it easier to test for performance, as you can do things like:
Benchmark.measure { 10000.times { foo.do_that_thing_that_might_be_slow }}
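To make the "start large, then narrow down" approach concrete, here is a minimal sketch of a labeled timing helper built on Benchmark.realtime. The helper name timed, the labels, and the sample work are all illustrative assumptions; in a Rails action the puts would presumably become logger.info.

```ruby
require 'benchmark'

timings = {}

# Runs the block, records its wall-clock time under the label,
# and passes the block's return value through.
timed = lambda do |label, &block|
  result = nil
  timings[label] = Benchmark.realtime { result = block.call }
  result
end

numbers = timed.call(:build) { (1..50_000).to_a }
total   = timed.call(:sum)   { numbers.sum }

timings.each { |label, seconds| puts format('%-6s %.6fs', label, seconds) }
```

Because the helper returns the block's value, it can be wrapped around existing statements without restructuring them.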

What's the most efficient way to deep copy an object in Ruby?

I know that serializing an object is (to my knowledge) the only way to effectively deep-copy an object (as long as it isn't stateful like IO and whatnot), but is one way particularly more efficient than another?
For example, since I'm using Rails, I could always use ActiveSupport::JSON, to_xml - and from what I can tell marshalling the object is one of the most accepted ways to do this. I'd expect that marshalling is probably the most efficient of these since it's a Ruby internal, but am I missing anything?
Edit: note that its implementation is something I already have covered - I don't want to replace existing shallow copy methods (like dup and clone), so I'll just end up likely adding Object::deep_copy, the result of which being whichever of the above methods (or any suggestions you have :) that has the least overhead.
I was wondering the same thing, so I benchmarked a few different techniques against each other. I was primarily concerned with Arrays and Hashes - I didn't test any complex objects. Perhaps unsurprisingly, a custom deep-clone implementation proved to be the fastest. If you are looking for quick and easy implementation, Marshal appears to be the way to go.
I also benchmarked an XML solution with Rails 3.0.7, not shown below. It was much, much slower, ~10 seconds for only 1000 iterations (the solutions below all ran 10,000 times for the benchmark).
Two notes regarding my JSON solution. First, I used the C variant, version 1.4.3. Second, it doesn't actually work 100%, as symbols will be converted to Strings.
This was all run with ruby 1.9.2p180.
#!/usr/bin/env ruby
require 'benchmark'
require 'yaml'
require 'json/ext'
require 'msgpack'
def dc1(value)
Marshal.load(Marshal.dump(value))
end
def dc2(value)
YAML.load(YAML.dump(value))
end
def dc3(value)
JSON.load(JSON.dump(value))
end
def dc4(value)
if value.is_a?(Hash)
result = value.clone
value.each{|k, v| result[k] = dc4(v)}
result
elsif value.is_a?(Array)
result = value.clone
result.clear
value.each{|v| result << dc4(v)}
result
else
value
end
end
def dc5(value)
MessagePack.unpack(value.to_msgpack)
end
value = {'a' => {:x => [1, [nil, 'b'], {'a' => 1}]}, 'b' => ['z']}
Benchmark.bm do |x|
iterations = 10000
x.report {iterations.times {dc1(value)}}
x.report {iterations.times {dc2(value)}}
x.report {iterations.times {dc3(value)}}
x.report {iterations.times {dc4(value)}}
x.report {iterations.times {dc5(value)}}
end
results in:
user system total real
0.230000 0.000000 0.230000 ( 0.239257) (Marshal)
3.240000 0.030000 3.270000 ( 3.262255) (YAML)
0.590000 0.010000 0.600000 ( 0.601693) (JSON)
0.060000 0.000000 0.060000 ( 0.067661) (Custom)
0.090000 0.010000 0.100000 ( 0.097705) (MessagePack)
I think you need to add an initialize_copy method to the class you are copying. Then put the logic for the deep copy in there. Then when you call clone it will fire that method. I haven't done it but that's my understanding.
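A minimal sketch of the initialize_copy approach, using a hypothetical Grid class. Ruby calls initialize_copy on the new object during dup and clone, passing the original. Note that this version only copies one extra level (each row); deeper structures would need a recursive copy.

```ruby
class Grid
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # dup and clone call this on the new copy, passing the original.
  def initialize_copy(original)
    super
    @rows = original.rows.map(&:dup)   # copy each row, not just the outer array
  end
end

a = Grid.new([[1, 2], [3, 4]])
b = a.dup
b.rows[0] << 99   # does not touch a.rows
```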
I think plan B would be just overriding the clone method:
class CopyMe
attr_accessor :var
def initialize var=''
@var = var
end
def clone deep= false
deep ? CopyMe.new(@var.clone) : CopyMe.new()
end
end
a = CopyMe.new("test")
puts "A: #{a.var}"
b = a.clone
puts "B: #{b.var}"
c = a.clone(true)
puts "C: #{c.var}"
Output
mike@sleepycat:~/projects$ ruby ~/Desktop/clone.rb
A: test
B:
C: test
I'm sure you could make that cooler with a little tinkering but for better or for worse that is probably how I would do it.
Probably the reason Ruby doesn't contain a deep clone has to do with the complexity of the problem. See the notes at the end.
To make a clone that will "deep copy" Hashes, Arrays, and elemental values, i.e., make a copy of each element in the original such that the copy has the same values but consists of new objects, you can use this:
class Object
def deepclone
case
when self.class==Hash
hash = {}
self.each { |k,v| hash[k] = v.deepclone }
hash
when self.class==Array
array = []
self.each { |v| array << v.deepclone }
array
else
if defined?(self.class.new)
self.class.new(self)
else
self
end
end
end
end
If you want to redefine the behavior of Ruby's clone method, you can name it just clone instead of deepclone (in 3 places), but I have no idea how redefining Ruby's clone behavior will affect Ruby libraries, or Ruby on Rails, so Caveat Emptor. Personally, I can't recommend doing that.
For example:
a = {'a'=>'x','b'=>'y'} => {"a"=>"x", "b"=>"y"}
b = a.deepclone => {"a"=>"x", "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 15227640 / 15209520
If you want your classes to deepclone properly, their new method (initialize) must be able to deepclone an object of that class in the standard way, i.e., if the first parameter is given, it's assumed to be an object to be deepcloned.
Suppose we want a class M, for example. The first parameter must be an optional object of class M. Here we have a second optional argument z to pre-set the value of z in the new object.
class M
attr_accessor :z
def initialize(m=nil, z=nil)
if m
# deepclone all the variables in m to the new object
@z = m.z.deepclone
else
# default all the variables in M
@z = z # default is nil if not specified
end
end
end
The z pre-set is ignored during cloning here, but your method may have a different behavior. Objects of this class would be created like this:
# a new 'plain vanilla' object of M
m=M.new => #<M:0x0000000213fd88 @z=nil>
# a new object of M with m.z pre-set to 'g'
m=M.new(nil,'g') => #<M:0x00000002134ca8 @z="g">
# a deepclone of m in which the strings are the same value, but different objects
n=m.deepclone => #<M:0x00000002131d00 @z="g">
puts "#{m.z.object_id} / #{n.z.object_id}" => 17409660 / 17403500
Where objects of class M are part of a hash:
a = {'a'=>M.new(nil,'g'),'b'=>'y'} => {"a"=>#<M:0x00000001f8bf78 @z="g">, "b"=>"y"}
b = a.deepclone => {"a"=>#<M:0x00000001766f28 @z="g">, "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 12303600 / 12269460
puts "#{a['b'].object_id} / #{b['b'].object_id}" => 16811400 / 17802280
Notes:
If deepclone tries to clone an object which doesn't clone itself in the standard way, it may fail.
If deepclone tries to clone an object which can clone itself in the standard way, and if it is a complex structure, it may (and probably will) make a shallow clone of itself.
deepclone doesn't deep copy the keys in the Hashes. The reason is that they are not usually treated as data, but if you change hash[k] to hash[k.deepclone] they will be deep copied as well.
Certain elemental values have no new method, such as Fixnum. These objects always have the same object ID, and are copied, not cloned.
Be careful because when you deep copy, two parts of your Hash or Array that contained the same object in the original will contain different objects in the deepclone.
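The last note can be seen in a small sketch. A plain recursive copy (naive_copy below, a simplified stand-in and not the deepclone method above) splits a shared object into two copies, whereas a Marshal round trip keeps the sharing inside the copied structure:

```ruby
shared   = [1, 2]
original = { left: shared, right: shared }

# Simplified recursive copy: every occurrence is copied separately.
naive_copy = lambda do |value|
  case value
  when Hash  then value.each_with_object({}) { |(k, v), h| h[k] = naive_copy.call(v) }
  when Array then value.map { |v| naive_copy.call(v) }
  else value
  end
end

by_recursion = naive_copy.call(original)
by_marshal   = Marshal.load(Marshal.dump(original))

split_apart  = !by_recursion[:left].equal?(by_recursion[:right])
still_shared = by_marshal[:left].equal?(by_marshal[:right])
```

Which behavior is wanted depends on whether the program relies on the two entries being the same object.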

Is #inject on hashes considered good style?

Inside the Rails code, people tend to use the Enumerable#inject method to create hashes, like this:
some_enum.inject({}) do |hash, element|
hash[element.foo] = element.bar
hash
end
While this appears to have become a common idiom, does anyone see an advantage over the "naive" version, which would go like:
hash = {}
some_enum.each { |element| hash[element.foo] = element.bar }
The only advantage I see for the first version is that you do it in a closed block and you don't (explicitly) initialize the hash. Otherwise it abuses a method unexpectedly, is harder to understand and harder to read. So why is it so popular?
As Aleksey points out, Hash#update() is slower than Hash#store(), but that got me thinking about the overall efficiency of #inject() vs a straight #each loop, so I benchmarked a few things:
require 'benchmark'
module HashInject
extend self
PAIRS = 1000.times.map {|i| [sprintf("s%05d",i).to_sym, i]}
def inject_store
PAIRS.inject({}) {|hash, (sym, val)| hash[sym] = val ; hash }
end
def inject_update
PAIRS.inject({}) {|hash, (sym, val)| hash.update(sym => val) }
end
def each_store
hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash
end
def each_update
hash = {}
PAIRS.each {|sym, val| hash.update(sym => val) }
hash
end
def each_with_object_store
PAIRS.each_with_object({}) {|pair, hash| hash[pair[0]] = pair[1]}
end
def each_with_object_update
PAIRS.each_with_object({}) {|pair, hash| hash.update(pair[0] => pair[1])}
end
def by_initialization
Hash[PAIRS]
end
def tap_store
{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
end
def tap_update
{}.tap {|hash| PAIRS.each {|sym, val| hash.update(sym => val)}}
end
N = 10000
Benchmark.bmbm do |x|
x.report("inject_store") { N.times { inject_store }}
x.report("inject_update") { N.times { inject_update }}
x.report("each_store") { N.times {each_store }}
x.report("each_update") { N.times {each_update }}
x.report("each_with_object_store") { N.times {each_with_object_store }}
x.report("each_with_object_update") { N.times {each_with_object_update }}
x.report("by_initialization") { N.times {by_initialization}}
x.report("tap_store") { N.times {tap_store }}
x.report("tap_update") { N.times {tap_update }}
end
end
And the results:
Rehearsal -----------------------------------------------------------
inject_store 10.510000 0.120000 10.630000 ( 10.659169)
inject_update 8.490000 0.190000 8.680000 ( 8.696176)
each_store 4.290000 0.110000 4.400000 ( 4.414936)
each_update 12.800000 0.340000 13.140000 ( 13.188187)
each_with_object_store 5.250000 0.110000 5.360000 ( 5.369417)
each_with_object_update 13.770000 0.340000 14.110000 ( 14.166009)
by_initialization 3.040000 0.110000 3.150000 ( 3.166201)
tap_store 4.470000 0.110000 4.580000 ( 4.594880)
tap_update 12.750000 0.340000 13.090000 ( 13.114379)
------------------------------------------------- total: 77.140000sec
user system total real
inject_store 10.540000 0.110000 10.650000 ( 10.674739)
inject_update 8.620000 0.190000 8.810000 ( 8.826045)
each_store 4.610000 0.110000 4.720000 ( 4.732155)
each_update 12.630000 0.330000 12.960000 ( 13.016104)
each_with_object_store 5.220000 0.110000 5.330000 ( 5.338678)
each_with_object_update 13.730000 0.340000 14.070000 ( 14.102297)
by_initialization 3.010000 0.100000 3.110000 ( 3.123804)
tap_store 4.430000 0.110000 4.540000 ( 4.552919)
tap_update 12.850000 0.330000 13.180000 ( 13.217637)
=> true
Enumerable#each is faster than Enumerable#inject, and Hash#store is faster than Hash#update. But the fastest of all is to pass an array in at initialization time:
Hash[PAIRS]
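For reference, a sketch of the same construction on newer Rubies, where Array#to_h (Ruby 2.1+) and its block form (Ruby 2.6+) are available. These were not part of the benchmark above, so no speed claim is made for them; the pairs here are a shortened, made-up sample.

```ruby
pairs = [[:s00000, 0], [:s00001, 1], [:s00002, 2]]

from_brackets = Hash[pairs]
from_to_h     = pairs.to_h                                # Ruby 2.1+
doubled       = pairs.to_h { |sym, val| [sym, val * 2] }  # block form, Ruby 2.6+
```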
If you're adding elements after the hash has been created, the winning version is exactly what the OP was suggesting:
hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash
But in that case, if you're a purist who wants a single lexical form, you can use #tap and #each and get the same speed:
{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
For those not familiar with tap, it creates a binding of the receiver (the new hash) inside the body, and finally returns the receiver (the same hash). If you know Lisp, think of it as Ruby's version of LET binding.
Since people have asked, here's the testing environment:
# Ruby version ruby 2.0.0p247 (2013-06-27) [x86_64-darwin12.4.0]
# OS Mac OS X 10.9.2
# Processor/RAM 2.6GHz Intel Core i7 / 8GB 1067 MHz DDR3
Beauty is in the eye of the beholder. Those with some functional programming background will probably prefer the inject-based method (as I do), because it has the same semantics as the fold higher-order function, which is a common way of calculating a single result from multiple inputs. If you understand inject, then you should understand that the function is being used as intended.
As one reason why this approach seems better (to my eyes), consider the lexical scope of the hash variable. In the inject-based method, hash only exists within the body of the block. In the each-based method, the hash variable inside the block needs to agree with some execution context defined outside the block. Want to define another hash in the same function? Using the inject method, it's possible to cut-and-paste the inject-based code and use it directly, and it almost certainly won't introduce bugs (ignoring whether one should use C&P during editing - people do). Using the each method, you need to C&P the code, and rename the hash variable to whatever name you wanted to use - the extra step means this is more prone to error.
inject (aka reduce) has a long and respected place in functional programming languages. If you're ready to take the plunge, and want to understand a lot of Matz's inspiration for Ruby, you should read the seminal Structure and Interpretation of Computer Programs, available online at http://mitpress.mit.edu/sicp/.
Some programmers find it stylistically cleaner to have everything in one lexical package. In your hash example, using inject means you don't have to create an empty hash in a separate statement. What's more, the inject statement returns the result directly -- you don't have to remember that it's in the hash variable. To make that really clear, consider:
[1, 2, 3, 5, 8].inject(:+)
vs
total = 0
[1, 2, 3, 5, 8].each {|x| total += x}
The first version returns the sum. The second version stores the sum in total, and as a programmer, you have to remember to use total rather than the value returned by the .each statement.
One tiny addendum (and purely idiomatic -- not about inject): your example might be better written:
some_enum.inject({}) {|hash, element| hash.update(element.foo => element.bar) }
...since hash.update() returns the hash itself, you don't need the extra hash statement at the end.
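A small sketch of why the return value matters: inject passes whatever the block returns into the next iteration, and hash[k] = v evaluates to v rather than to the hash.

```ruby
pairs = [[:a, 1], [:b, 2]]

# The block returns 1 after the first pair, so the second iteration
# tries to call []= on an Integer and fails.
broken_error = begin
  pairs.inject({}) { |hash, (k, v)| hash[k] = v }
  nil
rescue NoMethodError => e
  e
end

with_return = pairs.inject({}) { |hash, (k, v)| hash[k] = v; hash }
with_update = pairs.inject({}) { |hash, (k, v)| hash.update(k => v) }
```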
update
@Aleksey has shamed me into benchmarking the various combinations. See my benchmarking reply elsewhere here. Short form:
hash = {}
some_enum.each {|x| hash[x.foo] = x.bar}
hash
is the fastest, but can be recast slightly more elegantly -- and it's just as fast -- as:
{}.tap {|hash| some_enum.each {|x| hash[x.foo] = x.bar}}
I have just found in
Ruby inject with initial being a hash
a suggestion to use each_with_object instead of inject:
hash = some_enum.each_with_object({}) do |element, h|
h[element.foo] = element.bar
end
Seems natural to me.
Another way, using tap:
hash = {}.tap do |h|
some_enum.each do |element|
h[element.foo] = element.bar
end
end
If you are returning a hash, using merge can keep it cleaner so you don't have to return the hash afterward.
some_enum.inject({}){|h,e| h.merge(e.foo => e.bar) }
If your enum is a hash, you can get key and value nicely with the (k,v).
some_hash.inject({}){|h,(k,v)| h.merge(k => do_something(v)) }
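One caveat worth keeping in mind with this form: merge returns a new hash on every call, so the accumulator is copied at each step, while merge! (an alias of update) modifies the receiver and returns it. A minimal sketch of the difference:

```ruby
base = { a: 1 }

merged = base.merge(b: 2)    # new hash; base is untouched
same   = base.merge!(c: 3)   # mutates base and returns it (update is an alias)

pairs = [[:x, 1], [:y, 2]]
built = pairs.inject({}) { |h, (k, v)| h.merge!(k => v) }
```

For small enumerables the copying is unlikely to matter; for large ones the mutating form avoids building a fresh hash per element.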
I think it has to do with people not fully understanding when to use reduce. I agree with you, each is the way it should be

Resources