I am performing an operation on hash values, lets say :
hash = { a: true, b: false, c: nil }
I am executing an each loop on hash but I want to skip keys b and c. I don't want to delete these from hash.
I have tried:
hash = { a: true, b: false, c: nil}
hash.except(:c)
{ a: true, b: false, c: nil}
But it is not working. I am using ruby 2.4.2
Actually hash.except(:c) returns { a: true, b: false } as expected. Since you're using Rails it should work. The only subtle moment, that I want to take a note on is that:
hash.except([:b, :c])
won't work. You need to use
hash.except(:b, :c)
instead.
For general solution you need to use splat operator:
keys = [:b, :c]
hash.except(*keys)
If you simply need to skip it while looping over the hash pairs, I personally would steer clear of using except, and use the uglier next if key == whatever within the loop.
A equality check between symbols is cheap, about the the most low overhead thing that can be done in Ruby, the basic equivalent of comparing two integers or booleans.
except on the other hand is not, especially as the size of the hash grows. You are creating a new cloned hash, minus the specified values, every time you call it. Even with a small hash, you are creating a new object needlessly.
I understand that many Ruby users are forever in pursuit of the "one-liners" or absolute shortest amount of code possible, I am guilty of it myself, but it needs done mindfully of whats going on beneath the surface, or you are creating less efficient code, not more.
So although not as "pretty", this would be more efficient:
hash.each do |k, v|
next if k == :b || k == :c
# Do stuff
end
EDIT
I was curious of the performance difference between what I was stating, and the use of except, and the resulting differences are significant.
First, I added the source for except, I didn't have Rails installed. This is straight from the source code from activesupport/lib/active_support/core_ext/hash/except.rb.
class Hash
def except!(*keys)
keys.each { |key| delete(key) }
self
end
def except(*keys)
dup.except!(*keys)
end
end
Then I did some benchmarking. I figured a 1,000,000 samples was enough for one run.
require 'benchmark'
hash = { a: true, b: false, c: nil }
count = 1_000_000
Benchmark.bm do |bm|
bm.report("except") do
count.times do
hash.except(:b, :c).each do |k, v|
# Do nothing
end
end
end
bm.report("next") do
count.times do
hash.each do |k, v|
next if k == :b || k == :c
# Do nothing
end
end
end
end
Running various times, including changing to bmbm to confirm the GC isn't skewing anything:
user system total real
except 1.282000 0.000000 1.282000 ( 1.276943)
next 0.250000 0.000000 0.250000 ( 0.246193)
On average, the use of next resulting in over 5x faster code. This difference grows even more the larger the hash becomes.
I'd probably just do something like this
my_hash = hash.reject { |k, _| %i[a b].include?(k) }
my_hash.each do |k, v|
# Do what you need to and be free from worry about those pesky :a and :b keys
end
Related
I have a Hash where the majority of it is filled with a key with two values associated with the key. There is also another hash within this Hash which is where I've been stuck.
Lets say the hash looks like:
{'sports'=>['football', 'basketball'], 'season'=>['summer','fall'], 'data'=>[{'holiday'=>'Christmas', 'genre' => 'Comedy'}, {'holiday'=>'Thanksgiving', 'genre' => 'Action'}]}
The output should look like:
Sports
- football
- basketball
Season
- summer
- fall
Holiday
- Christmas
- Thanksgiving
Genre
- Comedy
- Action
So far I have a helper that gives me everything except the data section.
def output_list_from(hash)
return if hash.empty?
content_tag(:ul) do
hash.map do |key, values|
content_tag(:li, key.to_s.humanize) +
content_tag(:ul) do
# if values.is_a?(Hash)...
content_tag(:li, values.first) +
content_tag(:li, values.last)
end
end.join.html_safe
end.html_safe
end
This returns the output:
Sports
- football
- basketball
Season
- summer
- fall
Data
- {'holiday'=>'Christmas', 'genre' => 'Comedy'}
- {'holiday'=>'Thanksgiving', 'genre' => 'Action'}
Which of course makes sense...so I've tried to check in the loop if the value is a Hash, but the way it's set up has tricked me. I think it's be easier if I knew what the hash would look like everytime, but it would be a new hash each time. One time there could be a holiday within data and the other time there could be both holiday and genre.
Any advice would be appreciated.
You will need to create a hash with the correct format. Something like this:
hash = {'sports'=>['football', 'basketball'], 'season'=>['summer','fall'], 'data'=>[{'holiday'=>'Christmas', 'genre' => 'Comedy'}, {'holiday'=>'Thanksgiving', 'genre' => 'Action'}]}
formatted_data = hash.dup
data = formatted_data.delete('data')
if data
data.each do |item|
item.each do |k, v|
formatted_data[k] ||= []
formatted_data[k] << v
end
end
end
puts formatted_data
# => {"sports"=>["football", "basketball"], "season"=>["summer", "fall"],
# => "holiday"=>["Christmas", "Thanksgiving"], "genre"=>["Comedy", "Action"]}
content_tag(:ul) do
formatted_data.map do |key, values|
#... your code here...
end.join.html_safe
end.html_safe
Suppose your hash looked like this:
hash = { 'sports'=>['football', 'basketball'],
'season'=>['summer', 'fall'],
'data1' =>[{ 'holiday'=>'Christmas', 'genre'=>'Comedy'},
{ 'holiday'=>'Thanksgiving', 'genre'=>'Action' }],
'data2' =>[{ 'data3'=>[{ 'sports'=>'darts', 'genre'=>'Occult' }] }]
}
and you wanted a general solution that would work for any number of levels and does not depend on the names of the keys that will not be in the resulting hash (here 'data1', 'data2' and 'data3'). Here's one way you could do that, using recursion.
Code
def extract(h, new_hash = {})
h.each do |k,v|
[*v].each do |e|
case e
when Hash then extract(e, new_hash)
else new_hash.update({ k=>[e] }) { |_,ov,nv| ov << nv.first }
end
end
end
new_hash
end
Example
extract(hash)
#=> {"sports"=>["football", "basketball", "darts"],
# "season"=>["summer", "fall"],
# "holiday"=>["Christmas", "Thanksgiving"],
# "genre"=>["Comedy", "Action", "Occult"]}
Explanation
There are, I think, mainly two things in the code that may require clarification.
#1
The first is the rather lonely and odd-looking expression:
[*v]
If v is an array, this returns v. If v is a literal, the splat operator has no effect, so it returns [v]. In other words, it leaves arrays alone and converts literals to an array containing one element, itself. Ergo:
[*['football', 'basketball']] #=> ["football", "basketball"]
[*'Thanksgiving'] #=> ["Thanksgiving"]
This saves us the trouble of having three, rather than two, possibilities in the case statement. We simply convert literals to arrays of one element, allowing us to deal with just hashes and arrays.
#2
The second snippet that may be unfamiliar to some is this:
new_hash.update({ k=>[e] }) { |_,ov,nv| ov << nv.first }
This uses the form of the method Hash#update (a.k.a. merge!) that uses a block to resolve the values of keys that are present in both hashes being merged. As an example, at some stage of the calculations, new_hash will have a key-value pair:
'sports'=>['football', 'basketball']
and is to be updated with the hash1:
{ 'sports'=>['darts'] }
Since both of these hashes have the key 'sport', the block is called upon as arbiter:
{ |k,ov,nv| ov << nv.first }
#=> { |'sport', ['football', 'basketball'], ['darts']| ov << nv.first }
#=> { |'sport', ['football', 'basketball'], ['darts']|
['football', 'basketball'] << 'darts' }
#=> ['football', 'basketball'] << 'darts'
As I'm not using the key 'sport' in the block, I've replaced that block variable with a placeholder (_) to reduce opportunities for error and also to inform the reader that the key is not being used.
1 I sometimes use darts as example of a sport because it is one of the few in which one can be successful without being extremely physically fit.
I was curious to know more differences between [] and Array.new and {} and Hash.new
I ran same benchmarks on it and seems like the shorthands are winners
require 'benchmark'
many = 500000
Benchmark.bm do |b|
b.report("[] \t") {many.times { [].object_id }}
b.report("Array.new \t") { many.times { Array.new.object_id }}
b.report("{} \t") {many.times { {}.object_id }}
b.report("Hash.new\t") { many.times { Hash.new.object_id }}
end
user system total real
[] 0.080000 0.000000 0.080000 ( 0.079287)
Array.new 0.180000 0.000000 0.180000 ( 0.177105)
{} 0.080000 0.000000 0.080000 ( 0.079467)
Hash.new 0.260000 0.000000 0.260000 ( 0.264796)
I personally like to use the shorthand one's [] and {} , the code looks so cool and readable.
Any other pointer what is the difference between them? what happens behind scene that make it so better, and suggestions if any when to use which?
I found this link but was looking to get more info.
cheers.
With Hash.new you can set the default value of the hash for unset keys. This is quite useful if you're doing statistics, because Hash.new(0) will let you increment keys without explicitly initializing them.
So for half a million times they all ran "very quickly". I prefer literals in any language ([], {}, 0, "", etc.) but the literals can't do everything (see the documentation for the other constructor forms). Write clean code, and be consistent: there is no issue here :)
However, I suspect the literals avoid a constant lookup and a method call which results in them being faster, at least in that particular implementation .. (someone with more smarts than me could look at the intermediate AST generated to prove/disprove this.)
That is, what if Hash resolved to a custom class or Hash.new was replaced with a custom method? Can't do that with {}. In addition, the methods have to deal with additional arguments (or blocks) while the literals do not.
Robert already mentioned the default value of the Hash.new
You may also use compley 'default'-values with the block variant of Hash.new:
x = Hash.new { |hash, key|
hash[key] = key * 2
}
p x #-> {}
p x[1] #-> 2
p x #-> {1=>2}
Array.new can also be used to get default values:
p Array.new(5, :a) #-> [:a, :a, :a, :a, :a]
It's a vague question I know....but the performance on this block of code is horrible. It takes about 15secs from the original post to the action to rendering the page...
The purpose of this action is to retrieve all Occupations from a CV, all the skills from that CV and the occupations. They need to be organized in 2 arrays:
the first array contains all the Occupations (no duplicates) and has them ordered according their score. Fo each double entry found the score is increased by 1
the second array contains ALL the skills from both the occupation array and the cv. Again no doubles are allowed, but for every double encountered the score of the existing is increased by one.
Below is the code block that performs this operation. It's relatively big compared to my other code snippets, but i hope it's understandable. I know working with the arrays like i do is confusing, but here is what each array location means:
position 0 : the actuall skill/occupation object
position 1 : the score of the entry
position 2 : the location found in the db
position 3 : the location found in the cv
def categorize
#cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills])
#menu = :second
#language = Language.resolve(:code => :en, :name => :en)
#occupation_hashes = []
#skill_hashes = []
(#cv.desired_occupations + #cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if #cv.desired_occupations.include? occupation
section << 'Work experience' if #cv.past_occupations.include? occupation
unless (array = #occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
#occupation_hashes << [occupation, 1, section]
end
occupation.skills.each do |skill|
unless (array = #skill_hashes.assoc skill).blank?
label = occupation.concept.label(#language).value
array[1]+= 1
array[3] << label unless array[3].include? label
else
#skill_hashes << [skill, 1, [], [occupation.concept.label(#language).value]]
end
end
end
#cv.educational_skills.each do |skill|
unless (array = #skill_hashes.assoc skill).blank?
array[1]+= 1
array[3] << 'Education skills' unless array[3].include? 'Education skills'
else
#skill_hashes << [skill, 1, ['Education skills'], []]
end
end
# Sort the hashes
#occupation_hashes.sort! { |x,y| y[1] <=> x[1]}
#skill_hashes.sort! { |x,y| y[1] <=> x[1]}
#max = #skill_hashes.first[1]
#min = #skill_hashes.last[1] end
I can post the additional models and migrations to make it clear what each class does, but I think the first few lines of the above script should be clear on the associations. I'm looking for a way to optimize the each-loops...
That's quite the block of code there. Generally if you're writing methods that serious you're going to have trouble maintaining it in the future. A technique that would help is breaking up that monolithic chunk of code and turning it into a helper class that does the processing in more logical stages, making it easier to fine-tune aspects of it.
For instance, an interface might be:
#categorizer = CvCategorizer.new(params[:cv_id])
This would encapsulate all of the above and save it into instance variables made accessible by being declared with attr_reader.
Using a utility class means you can break up the initialization into steps that are made more clear:
def initialize(cv_id)
# Call a wrapper method that loads the CV
#cv = self.load_cv(cv_id)
# Perform discrete steps to re-order the imported data
self.organize_occupations
self.organize_skills
end
It's really hard to say why this is slow by just looking at it, though I would pay very close attention to log/development.log to see what's going on in there. It could be the initial load is painfully slow but the rest of the method is fine.
You should do a but of profiling in your code to see what is taking a large chunk of time. You can figure out how to work on of the profilers, or just sprinkle some simple puts or logger.info statements throughout your code with a timestamp. Probably easiest to do this by using Benchmark. Note: you may need to require 'benchmark'... not sure if it is auto required in Rails or not.
For a single line, you can do something like this:
logger.info Benchmark.measure { #cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) }
And for timing larger blocks of code:
logger.info Benchmark.measure do
(#cv.desired_occupations + #cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if #cv.desired_occupations.include? occupation
section << 'Work experience' if #cv.past_occupations.include? occupation
unless (array = #occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
#occupation_hashes << [occupation, 1, section]
end
end
end
I'd just start with large blocks and then narrow it down. Not knowing how large of a dataset you are dealing with, it is hard to say what the problem zone is.
I'll also concur with others that you will be way better off to break this thing into smaller methods. This will also make it easier to test for performance, as you can do things like:
Benchmark.measure { 10000.times { foo.do_that_thing_that_might_be_slow }}
I know that serializing an object is (to my knowledge) the only way to effectively deep-copy an object (as long as it isn't stateful like IO and whatnot), but is one way particularly more efficient than another?
For example, since I'm using Rails, I could always use ActiveSupport::JSON, to_xml - and from what I can tell marshalling the object is one of the most accepted ways to do this. I'd expect that marshalling is probably the most efficient of these since it's a Ruby internal, but am I missing anything?
Edit: note that its implementation is something I already have covered - I don't want to replace existing shallow copy methods (like dup and clone), so I'll just end up likely adding Object::deep_copy, the result of which being whichever of the above methods (or any suggestions you have :) that has the least overhead.
I was wondering the same thing, so I benchmarked a few different techniques against each other. I was primarily concerned with Arrays and Hashes - I didn't test any complex objects. Perhaps unsurprisingly, a custom deep-clone implementation proved to be the fastest. If you are looking for quick and easy implementation, Marshal appears to be the way to go.
I also benchmarked an XML solution with Rails 3.0.7, not shown below. It was much, much slower, ~10 seconds for only 1000 iterations (the solutions below all ran 10,000 times for the benchmark).
Two notes regarding my JSON solution. First, I used the C variant, version 1.4.3. Second, it doesn't actually work 100%, as symbols will be converted to Strings.
This was all run with ruby 1.9.2p180.
#!/usr/bin/env ruby
require 'benchmark'
require 'yaml'
require 'json/ext'
require 'msgpack'
def dc1(value)
Marshal.load(Marshal.dump(value))
end
def dc2(value)
YAML.load(YAML.dump(value))
end
def dc3(value)
JSON.load(JSON.dump(value))
end
def dc4(value)
if value.is_a?(Hash)
result = value.clone
value.each{|k, v| result[k] = dc4(v)}
result
elsif value.is_a?(Array)
result = value.clone
result.clear
value.each{|v| result << dc4(v)}
result
else
value
end
end
def dc5(value)
MessagePack.unpack(value.to_msgpack)
end
value = {'a' => {:x => [1, [nil, 'b'], {'a' => 1}]}, 'b' => ['z']}
Benchmark.bm do |x|
iterations = 10000
x.report {iterations.times {dc1(value)}}
x.report {iterations.times {dc2(value)}}
x.report {iterations.times {dc3(value)}}
x.report {iterations.times {dc4(value)}}
x.report {iterations.times {dc5(value)}}
end
results in:
user system total real
0.230000 0.000000 0.230000 ( 0.239257) (Marshal)
3.240000 0.030000 3.270000 ( 3.262255) (YAML)
0.590000 0.010000 0.600000 ( 0.601693) (JSON)
0.060000 0.000000 0.060000 ( 0.067661) (Custom)
0.090000 0.010000 0.100000 ( 0.097705) (MessagePack)
I think you need to add an initialize_copy method to the class you are copying. Then put the logic for the deep copy in there. Then when you call clone it will fire that method. I haven't done it but that's my understanding.
I think plan B would be just overriding the clone method:
class CopyMe
attr_accessor :var
def initialize var=''
#var = var
end
def clone deep= false
deep ? CopyMe.new(#var.clone) : CopyMe.new()
end
end
a = CopyMe.new("test")
puts "A: #{a.var}"
b = a.clone
puts "B: #{b.var}"
c = a.clone(true)
puts "C: #{c.var}"
Output
mike#sleepycat:~/projects$ ruby ~/Desktop/clone.rb
A: test
B:
C: test
I'm sure you could make that cooler with a little tinkering but for better or for worse that is probably how I would do it.
Probably the reason Ruby doesn't contain a deep clone has to do with the complexity of the problem. See the notes at the end.
To make a clone that will "deep copy," Hashes, Arrays, and elemental values, i.e., make a copy of each element in the original such that the copy will have the same values, but new objects, you can use this:
class Object
def deepclone
case
when self.class==Hash
hash = {}
self.each { |k,v| hash[k] = v.deepclone }
hash
when self.class==Array
array = []
self.each { |v| array << v.deepclone }
array
else
if defined?(self.class.new)
self.class.new(self)
else
self
end
end
end
end
If you want to redefine the behavior of Ruby's clone method , you can name it just clone instead of deepclone (in 3 places), but I have no idea how redefining Ruby's clone behavior will affect Ruby libraries, or Ruby on Rails, so Caveat Emptor. Personally, I can't recommend doing that.
For example:
a = {'a'=>'x','b'=>'y'} => {"a"=>"x", "b"=>"y"}
b = a.deepclone => {"a"=>"x", "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 15227640 / 15209520
If you want your classes to deepclone properly, their new method (initialize) must be able to deepclone an object of that class in the standard way, i.e., if the first parameter is given, it's assumed to be an object to be deepcloned.
Suppose we want a class M, for example. The first parameter must be an optional object of class M. Here we have a second optional argument z to pre-set the value of z in the new object.
class M
attr_accessor :z
def initialize(m=nil, z=nil)
if m
# deepclone all the variables in m to the new object
#z = m.z.deepclone
else
# default all the variables in M
#z = z # default is nil if not specified
end
end
end
The z pre-set is ignored during cloning here, but your method may have a different behavior. Objects of this class would be created like this:
# a new 'plain vanilla' object of M
m=M.new => #<M:0x0000000213fd88 #z=nil>
# a new object of M with m.z pre-set to 'g'
m=M.new(nil,'g') => #<M:0x00000002134ca8 #z="g">
# a deepclone of m in which the strings are the same value, but different objects
n=m.deepclone => #<M:0x00000002131d00 #z="g">
puts "#{m.z.object_id} / #{n.z.object_id}" => 17409660 / 17403500
Where objects of class M are part of an array:
a = {'a'=>M.new(nil,'g'),'b'=>'y'} => {"a"=>#<M:0x00000001f8bf78 #z="g">, "b"=>"y"}
b = a.deepclone => {"a"=>#<M:0x00000001766f28 #z="g">, "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 12303600 / 12269460
puts "#{a['b'].object_id} / #{b['b'].object_id}" => 16811400 / 17802280
Notes:
If deepclone tries to clone an object which doesn't clone itself in the standard way, it may fail.
If deepclone tries to clone an object which can clone itself in the standard way, and if it is a complex structure, it may (and probably will) make a shallow clone of itself.
deepclone doesn't deep copy the keys in the Hashes. The reason is that they are not usually treated as data, but if you change hash[k] to hash[k.deepclone] they will also be deep copied also.
Certain elemental values have no new method, such as Fixnum. These objects always have the same object ID, and are copied, not cloned.
Be careful because when you deep copy, two parts of your Hash or Array that contained the same object in the original will contain different objects in the deepclone.
Inside the Rails code, people tend to use the Enumerable#inject method to create hashes, like this:
somme_enum.inject({}) do |hash, element|
hash[element.foo] = element.bar
hash
end
While this appears to have become a common idiom, does anyone see an advantage over the "naive" version, which would go like:
hash = {}
some_enum.each { |element| hash[element.foo] = element.bar }
The only advantage I see for the first version is that you do it in a closed block and you don't (explicitly) initialize the hash. Otherwise it abuses a method unexpectedly, is harder to understand and harder to read. So why is it so popular?
As Aleksey points out, Hash#update() is slower than Hash#store(), but that got me thinking about the overall efficiency of #inject() vs a straight #each loop, so I benchmarked a few things:
require 'benchmark'
module HashInject
extend self
PAIRS = 1000.times.map {|i| [sprintf("s%05d",i).to_sym, i]}
def inject_store
PAIRS.inject({}) {|hash, sym, val| hash[sym] = val ; hash }
end
def inject_update
PAIRS.inject({}) {|hash, sym, val| hash.update(val => hash) }
end
def each_store
hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash
end
def each_update
hash = {}
PAIRS.each {|sym, val| hash.update(val => hash) }
hash
end
def each_with_object_store
PAIRS.each_with_object({}) {|pair, hash| hash[pair[0]] = pair[1]}
end
def each_with_object_update
PAIRS.each_with_object({}) {|pair, hash| hash.update(pair[0] => pair[1])}
end
def by_initialization
Hash[PAIRS]
end
def tap_store
{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
end
def tap_update
{}.tap {|hash| PAIRS.each {|sym, val| hash.update(sym => val)}}
end
N = 10000
Benchmark.bmbm do |x|
x.report("inject_store") { N.times { inject_store }}
x.report("inject_update") { N.times { inject_update }}
x.report("each_store") { N.times {each_store }}
x.report("each_update") { N.times {each_update }}
x.report("each_with_object_store") { N.times {each_with_object_store }}
x.report("each_with_object_update") { N.times {each_with_object_update }}
x.report("by_initialization") { N.times {by_initialization}}
x.report("tap_store") { N.times {tap_store }}
x.report("tap_update") { N.times {tap_update }}
end
end
And the results:
Rehearsal -----------------------------------------------------------
inject_store 10.510000 0.120000 10.630000 ( 10.659169)
inject_update 8.490000 0.190000 8.680000 ( 8.696176)
each_store 4.290000 0.110000 4.400000 ( 4.414936)
each_update 12.800000 0.340000 13.140000 ( 13.188187)
each_with_object_store 5.250000 0.110000 5.360000 ( 5.369417)
each_with_object_update 13.770000 0.340000 14.110000 ( 14.166009)
by_initialization 3.040000 0.110000 3.150000 ( 3.166201)
tap_store 4.470000 0.110000 4.580000 ( 4.594880)
tap_update 12.750000 0.340000 13.090000 ( 13.114379)
------------------------------------------------- total: 77.140000sec
user system total real
inject_store 10.540000 0.110000 10.650000 ( 10.674739)
inject_update 8.620000 0.190000 8.810000 ( 8.826045)
each_store 4.610000 0.110000 4.720000 ( 4.732155)
each_update 12.630000 0.330000 12.960000 ( 13.016104)
each_with_object_store 5.220000 0.110000 5.330000 ( 5.338678)
each_with_object_update 13.730000 0.340000 14.070000 ( 14.102297)
by_initialization 3.010000 0.100000 3.110000 ( 3.123804)
tap_store 4.430000 0.110000 4.540000 ( 4.552919)
tap_update 12.850000 0.330000 13.180000 ( 13.217637)
=> true
Enumerable#each is faster than Enumerable#inject, and Hash#store is faster than Hash#update. But the fastest of all is to pass an array in at initialization time:
Hash[PAIRS]
If you're adding elements after the hash has been created, the winning version is exactly what the OP was suggesting:
hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash
But in that case, if you're a purist who wants a single lexical form, you can use #tap and #each and get the same speed:
{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
For those not familiar with tap, it creates a binding of the receiver (the new hash) inside the body, and finally returns the receiver (the same hash). If you know Lisp, think of it as Ruby's version of LET binding.
Since people have asked, here's the testing environment:
# Ruby version ruby 2.0.0p247 (2013-06-27) [x86_64-darwin12.4.0]
# OS Mac OS X 10.9.2
# Processor/RAM 2.6GHz Intel Core i7 / 8GB 1067 MHz DDR3
Beauty is in the eye of the beholder. Those with some functional programming background will probably prefer the inject-based method (as I do), because it has the same semantics as the fold higher-order function, which is a common way of calculating a single result from multiple inputs. If you understand inject, then you should understand that the function is being used as intended.
As one reason why this approach seems better (to my eyes), consider the lexical scope of the hash variable. In the inject-based method, hash only exists within the body of the block. In the each-based method, the hash variable inside the block needs to agree with some execution context defined outside the block. Want to define another hash in the same function? Using the inject method, it's possible to cut-and-paste the inject-based code and use it directly, and it almost certainly won't introduce bugs (ignoring whether one should use C&P during editing - people do). Using the each method, you need to C&P the code, and rename the hash variable to whatever name you wanted to use - the extra step means this is more prone to error.
inject (aka reduce) has a long and respected place in functional programming languages. If you're ready to take the plunge, and want to understand a lot of Matz's inspiration for Ruby, you should read the seminal Structure and Interpretation of Computer Programs, available online at http://mitpress.mit.edu/sicp/.
Some programmers find it stylistically cleaner to have everything in one lexical package. In your hash example, using inject means you don't have to create an empty hash in a separate statement. What's more, the inject statement returns the result directly -- you don't have to remember that it's in the hash variable. To make that really clear, consider:
[1, 2, 3, 5, 8].inject(:+)
vs
total = 0
[1, 2, 3, 5, 8].each {|x| total += x}
The first version returns the sum. The second version stores the sum in total, and as a programmer, you have to remember to use total rather than the value returned by the .each statement.
One tiny addendum (and purely idomatic -- not about inject): your example might be better written:
some_enum.inject({}) {|hash, element| hash.update(element.foo => element.bar) }
...since hash.update() returns the hash itself, you don't need the extra hash statement at the end.
update
#Aleksey has shamed me into benchmarking the various combinations. See my benchmarking reply elsewhere here. Short form:
hash = {}
some_enum.each {|x| hash[x.foo] = x.bar}
hash
is the fastest, but can be recast slightly more elegantly -- and it's just as fast -- as:
{}.tap {|hash| some_enum.each {|x| hash[x.foo] = x.bar}}
I have just found in
Ruby inject with initial being a hash
a suggestion to use each_with_object instead of inject:
hash = some_enum.each_with_object({}) do |element, h|
h[element.foo] = element.bar
end
Seems natural to me.
Another way, using tap:
hash = {}.tap do |h|
some_enum.each do |element|
h[element.foo] = element.bar
end
end
If you are returning a hash, using merge can keep it cleaner so you don't have to return the hash afterward.
some_enum.inject({}){|h,e| h.merge(e.foo => e.bar) }
If your enum is a hash, you can get key and value nicely with the (k,v).
some_hash.inject({}){|h,(k,v)| h.merge(k => do_something(v)) }
I think it has to do with people not fully understanding when to use reduce. I agree with you, each is the way it should be