I am performing an operation on hash values, let's say:
hash = { a: true, b: false, c: nil }
I am executing an each loop on the hash, but I want to skip keys :b and :c. I don't want to delete them from the hash.
I have tried:
hash = { a: true, b: false, c: nil}
hash.except(:c)
{ a: true, b: false, c: nil}
But it is not working. I am using Ruby 2.4.2.
Actually, hash.except(:c) returns { a: true, b: false } as expected. Since you're using Rails, it should work. The only subtle point I want to note is that:
hash.except([:b, :c])
won't work. You need to use
hash.except(:b, :c)
instead.
For a general solution you need to use the splat operator:
keys = [:b, :c]
hash.except(*keys)
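For reference, a quick check of what that returns on the sample hash (this assumes Hash#except is available, via ActiveSupport or natively on Ruby 3.0+):
keys = [:b, :c]
hash = { a: true, b: false, c: nil }
hash.except(*keys)  #=> { a: true }
hash                #=> { a: true, b: false, c: nil } (the original hash is untouched)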
If you simply need to skip it while looping over the hash pairs, I personally would steer clear of using except, and use the uglier next if key == whatever within the loop.
An equality check between symbols is cheap, about the lowest-overhead thing that can be done in Ruby, the basic equivalent of comparing two integers or booleans.
except, on the other hand, is not, especially as the size of the hash grows. You are creating a new cloned hash, minus the specified keys, every time you call it. Even with a small hash, you are creating a new object needlessly.
I understand that many Ruby users are forever in pursuit of "one-liners" or the absolute shortest amount of code possible, and I am guilty of it myself, but it needs to be done mindful of what's going on beneath the surface, or you end up writing less efficient code, not more.
So although not as "pretty", this would be more efficient:
hash.each do |k, v|
next if k == :b || k == :c
# Do stuff
end
EDIT
I was curious about the performance difference between what I was stating and the use of except, and the resulting differences are significant.
First, since I didn't have Rails installed, I added the source for except myself. This is straight from activesupport/lib/active_support/core_ext/hash/except.rb.
class Hash
def except!(*keys)
keys.each { |key| delete(key) }
self
end
def except(*keys)
dup.except!(*keys)
end
end
Then I did some benchmarking. I figured 1,000,000 iterations were enough for one run.
require 'benchmark'
hash = { a: true, b: false, c: nil }
count = 1_000_000
Benchmark.bm do |bm|
bm.report("except") do
count.times do
hash.except(:b, :c).each do |k, v|
# Do nothing
end
end
end
bm.report("next") do
count.times do
hash.each do |k, v|
next if k == :b || k == :c
# Do nothing
end
end
end
end
Running various times, including changing to bmbm to confirm the GC isn't skewing anything:
user system total real
except 1.282000 0.000000 1.282000 ( 1.276943)
next 0.250000 0.000000 0.250000 ( 0.246193)
On average, the use of next resulted in over 5x faster code. This difference grows even larger as the hash gets bigger.
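As a rough sketch of how one might verify that claim, the same benchmark can be rerun with a larger hash. This is my own variation, not part of the original benchmark; the key names are made up, and Hash#except is assumed to come from the ActiveSupport extension shown above (or natively from Ruby 3.0+):
require 'benchmark'

big_hash = (1..1_000).map { |i| [:"key_#{i}", i] }.to_h
count    = 10_000

Benchmark.bm do |bm|
  bm.report("except") do
    count.times do
      big_hash.except(:key_1, :key_2).each { |k, v| } # do nothing
    end
  end
  bm.report("next") do
    count.times do
      big_hash.each do |k, v|
        next if k == :key_1 || k == :key_2
        # do nothing
      end
    end
  end
end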
I'd probably just do something like this
my_hash = hash.reject { |k, _| %i[b c].include?(k) }
my_hash.each do |k, v|
# Do what you need to and be free from worry about those pesky :b and :c keys
end
I have a Hash where most keys map to an array of two values. There are also hashes nested within this Hash, which is where I've been stuck.
Let's say the hash looks like:
{'sports'=>['football', 'basketball'], 'season'=>['summer','fall'], 'data'=>[{'holiday'=>'Christmas', 'genre' => 'Comedy'}, {'holiday'=>'Thanksgiving', 'genre' => 'Action'}]}
The output should look like:
Sports
- football
- basketball
Season
- summer
- fall
Holiday
- Christmas
- Thanksgiving
Genre
- Comedy
- Action
So far I have a helper that gives me everything except the data section.
def output_list_from(hash)
return if hash.empty?
content_tag(:ul) do
hash.map do |key, values|
content_tag(:li, key.to_s.humanize) +
content_tag(:ul) do
# if values.is_a?(Hash)...
content_tag(:li, values.first) +
content_tag(:li, values.last)
end
end.join.html_safe
end.html_safe
end
This returns the output:
Sports
- football
- basketball
Season
- summer
- fall
Data
- {'holiday'=>'Christmas', 'genre' => 'Comedy'}
- {'holiday'=>'Thanksgiving', 'genre' => 'Action'}
Which of course makes sense... So I've tried to check in the loop whether the value is a Hash, but the way it's set up has tricked me. I think it would be easier if I knew what the hash would look like every time, but it will be a new hash each time. One time there could be only holiday within data, and another time there could be both holiday and genre.
Any advice would be appreciated.
You will need to create a hash with the correct format. Something like this:
hash = {'sports'=>['football', 'basketball'], 'season'=>['summer','fall'], 'data'=>[{'holiday'=>'Christmas', 'genre' => 'Comedy'}, {'holiday'=>'Thanksgiving', 'genre' => 'Action'}]}
formatted_data = hash.dup
data = formatted_data.delete('data')
if data
data.each do |item|
item.each do |k, v|
formatted_data[k] ||= []
formatted_data[k] << v
end
end
end
puts formatted_data
# => {"sports"=>["football", "basketball"], "season"=>["summer", "fall"],
# => "holiday"=>["Christmas", "Thanksgiving"], "genre"=>["Comedy", "Action"]}
content_tag(:ul) do
formatted_data.map do |key, values|
#... your code here...
end.join.html_safe
end.html_safe
Suppose your hash looked like this:
hash = { 'sports'=>['football', 'basketball'],
'season'=>['summer', 'fall'],
'data1' =>[{ 'holiday'=>'Christmas', 'genre'=>'Comedy'},
{ 'holiday'=>'Thanksgiving', 'genre'=>'Action' }],
'data2' =>[{ 'data3'=>[{ 'sports'=>'darts', 'genre'=>'Occult' }] }]
}
and you wanted a general solution that would work for any number of levels and does not depend on the names of the keys that will not be in the resulting hash (here 'data1', 'data2' and 'data3'). Here's one way you could do that, using recursion.
Code
def extract(h, new_hash = {})
h.each do |k,v|
[*v].each do |e|
case e
when Hash then extract(e, new_hash)
else new_hash.update({ k=>[e] }) { |_,ov,nv| ov << nv.first }
end
end
end
new_hash
end
Example
extract(hash)
#=> {"sports"=>["football", "basketball", "darts"],
# "season"=>["summer", "fall"],
# "holiday"=>["Christmas", "Thanksgiving"],
# "genre"=>["Comedy", "Action", "Occult"]}
Explanation
There are, I think, mainly two things in the code that may require clarification.
#1
The first is the rather lonely and odd-looking expression:
[*v]
If v is an array, this returns v. If v is a single (non-array) value, the splat has nothing to expand, so it returns [v]. In other words, it leaves arrays alone and wraps single values in a one-element array. Ergo:
[*['football', 'basketball']] #=> ["football", "basketball"]
[*'Thanksgiving'] #=> ["Thanksgiving"]
This saves us from having three, rather than two, possibilities in the case statement. We simply convert single values to one-element arrays, allowing us to deal with just hashes and arrays.
#2
The second snippet that may be unfamiliar to some is this:
new_hash.update({ k=>[e] }) { |_,ov,nv| ov << nv.first }
This uses the form of the method Hash#update (a.k.a. merge!) that uses a block to resolve the values of keys that are present in both hashes being merged. As an example, at some stage of the calculations, new_hash will have a key-value pair:
'sports'=>['football', 'basketball']
and is to be updated with the hash[1]:
{ 'sports'=>['darts'] }
Since both of these hashes have the key 'sports', the block is called upon as arbiter:
{ |k,ov,nv| ov << nv.first }
#=> { |'sports', ['football', 'basketball'], ['darts']| ov << nv.first }
#=> { |'sports', ['football', 'basketball'], ['darts']|
      ['football', 'basketball'] << 'darts' }
#=> ['football', 'basketball'] << 'darts'
As I'm not using the key 'sports' in the block, I've replaced that block variable with a placeholder (_) to reduce opportunities for error and also to inform the reader that the key is not being used.
[1] I sometimes use darts as an example of a sport because it is one of the few in which one can be successful without being extremely physically fit.
I want to test an iterator using rspec. It seems to me that the only possible yield matcher is yield_successive_args (according to https://www.relishapp.com/rspec/rspec-expectations/v/3-0/docs/built-in-matchers/yield-matchers). The other matchers are used only for single yielding.
But yield_successive_args fails if the yields come in a different order than specified.
Is there any method or nice workaround for testing an iterator that yields in any order?
Something like the following:
expect { |b| array.each(&b) }.to yield_multiple_args_in_any_order(1, 2, 3)
Here is the matcher I came up with for this problem; it's fairly simple and should work with a good degree of efficiency.
require 'set'
RSpec::Matchers.define :yield_in_any_order do |*values|
expected_yields = Set[*values]
actual_yields = Set[]
match do |blk|
blk[->(x){ actual_yields << x }] # ***
expected_yields == actual_yields # ***
end
failure_message do |actual|
"expected to receive #{surface_descriptions_in expected_yields} "\
"but #{surface_descriptions_in actual_yields} were yielded."
end
failure_message_when_negated do |actual|
"expected not to have all of "\
"#{surface_descriptions_in expected_yields} yielded."
end
def supports_block_expectations?
true
end
end
I've highlighted the lines containing most of the important logic with # ***. It's a pretty straightforward implementation.
Usage
Just put it in a file, under spec/support/matchers/, and make sure you require it from the specs that need it. Most of the time, people just add a line like this:
Dir[File.dirname(__FILE__) + "/support/**/*.rb"].each {|f| require f}
to their spec_helper.rb, but if you have a lot of support files and they aren't all needed everywhere, this can get to be a bit much, so you may want to require it only where it is used.
Then, in the specs themselves, the usage is like that of any other yielding matcher:
class Iterator
def custom_iterator
(1..10).to_a.shuffle.each { |x| yield x }
end
end
describe "Matcher" do
it "works" do
iter = Iterator.new
expect { |b| iter.custom_iterator(&b) }.to yield_in_any_order(*(1..10))
end
end
This can be solved in plain Ruby using a set intersection of arrays:
array1 = [3, 2, 4]
array2 = [4, 3, 2]
expect(array1).to eq(array1 & array2)
# for an enumerator:
enumerator = array1.each
expect(enumerator.to_a).to eq(enumerator.to_a & array2)
The intersection (&) will return items that are present in both collections, keeping the order of the first argument.
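A quick illustration of that behaviour:
[3, 2, 4] & [4, 3, 2]  #=> [3, 2, 4]  (every element found; order of the first array kept)
[3, 2, 4] & [4, 3]     #=> [3, 4]     (2 is missing from the second array, so the eq check above would fail)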
I have a class called Hsh which basically simulates a hash. It has an array of Couple objects (which hold fields named one and two; one is an int and two is the string name of that int).
I am supposed to be able to accept the following call:
h = x.inject({}) {|a, b| a[b.one] = b.two; a}
Where x is the Hsh object.
I am not sure how to implement the inject method within Hsh. Like, what would I write in:
def inject ????
??
??
end
All it's supposed to do is create a hash map.
You shouldn't really need to implement it: just implement Hsh#each and include Enumerable, and you'll get inject for free.
For your specific example something like this should work:
def inject accumulator
#I assume Hsh has some way to iterate over the elements
each do |elem|
accumulator = yield accumulator, elem
end
accumulator
end
But the real implementation of inject is a bit different (e.g. it works without an explicitly provided accumulator, takes a symbol instead of a block, etc.).
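For the curious, here is a minimal sketch (not Ruby's actual implementation) of how inject can support both an optional initial value and a symbol in place of a block. MyInject and my_inject are made-up names, and it assumes the including class provides #each:
module MyInject
  def my_inject(initial = nil, sym = nil, &block)
    if block.nil?
      # Called as my_inject(:+) rather than my_inject(0, :+)
      initial, sym = nil, initial if sym.nil?
      block = ->(memo, elem) { memo.send(sym, elem) }
    end
    memo = initial
    each do |elem|
      # With no initial value, the first element becomes the accumulator
      memo = memo.nil? ? elem : block.call(memo, elem)
    end
    memo
  end
end

class Array
  include MyInject
end

p [1, 2, 3, 4].my_inject(:+)                       #=> 10
p [1, 2, 3, 4].my_inject(10) { |sum, x| sum + x }  #=> 20
p %w[a b c].my_inject { |acc, s| acc + s }         #=> "abc"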
require 'ostruct'
class Hsh
include Enumerable
def initialize
@arr = (0..9).map{ |i| OpenStruct.new(:one => i, :two => "#{i}")}
end
def each(&block)
@arr.each(&block)
end
end
p Hsh.new.inject({}) {|a, b| a[b.one] = b.two; a}
#=> {5=>"5", 0=>"0", 6=>"6", 1=>"1", 7=>"7", 2=>"2", 8=>"8", 3=>"3", 9=>"9", 4=>"4"}
In this particular case Hsh is really just an array, so unless you use it for something else, such complex code doesn't make sense; it can be done much more easily:
p (0..9).map{ |i| OpenStruct.new(:one => i, :two => "#{i}")} \
.inject({}) {|a, b| a[b.one] = b.two; a}
#=> {5=>"5", 0=>"0", 6=>"6", 1=>"1", 7=>"7", 2=>"2", 8=>"8", 3=>"3", 9=>"9", 4=>"4"}
I have used OpenStruct instead of classes. See if this works for you:
require 'ostruct'
class Hsh
attr_accessor :arr
def initialize
obj = OpenStruct.new
obj.one = 1
obj.two = "two"
@arr = [obj]
end
def inject(hash)
@arr.each do |arr|
yield hash, arr
end
hash
end
end
x = Hsh.new
p x.inject({}) {|a, b| a[b.one] = b.two} #prints {1 => "two"}
Inside the Rails code, people tend to use the Enumerable#inject method to create hashes, like this:
some_enum.inject({}) do |hash, element|
hash[element.foo] = element.bar
hash
end
While this appears to have become a common idiom, does anyone see an advantage over the "naive" version, which would go like:
hash = {}
some_enum.each { |element| hash[element.foo] = element.bar }
The only advantage I see in the first version is that it happens in a single closed block and you don't (explicitly) initialize the hash. Otherwise it uses a method for something unexpected, and is harder to understand and harder to read. So why is it so popular?
As Aleksey points out, Hash#update() is slower than Hash#store(), but that got me thinking about the overall efficiency of #inject() vs a straight #each loop, so I benchmarked a few things:
require 'benchmark'
module HashInject
extend self
PAIRS = 1000.times.map {|i| [sprintf("s%05d",i).to_sym, i]}
def inject_store
PAIRS.inject({}) {|hash, (sym, val)| hash[sym] = val; hash }
end
def inject_update
PAIRS.inject({}) {|hash, (sym, val)| hash.update(sym => val) }
end
def each_store
hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash
end
def each_update
hash = {}
PAIRS.each {|sym, val| hash.update(sym => val) }
hash
end
def each_with_object_store
PAIRS.each_with_object({}) {|pair, hash| hash[pair[0]] = pair[1]}
end
def each_with_object_update
PAIRS.each_with_object({}) {|pair, hash| hash.update(pair[0] => pair[1])}
end
def by_initialization
Hash[PAIRS]
end
def tap_store
{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
end
def tap_update
{}.tap {|hash| PAIRS.each {|sym, val| hash.update(sym => val)}}
end
N = 10000
Benchmark.bmbm do |x|
x.report("inject_store") { N.times { inject_store }}
x.report("inject_update") { N.times { inject_update }}
x.report("each_store") { N.times {each_store }}
x.report("each_update") { N.times {each_update }}
x.report("each_with_object_store") { N.times {each_with_object_store }}
x.report("each_with_object_update") { N.times {each_with_object_update }}
x.report("by_initialization") { N.times {by_initialization}}
x.report("tap_store") { N.times {tap_store }}
x.report("tap_update") { N.times {tap_update }}
end
end
And the results:
Rehearsal -----------------------------------------------------------
inject_store 10.510000 0.120000 10.630000 ( 10.659169)
inject_update 8.490000 0.190000 8.680000 ( 8.696176)
each_store 4.290000 0.110000 4.400000 ( 4.414936)
each_update 12.800000 0.340000 13.140000 ( 13.188187)
each_with_object_store 5.250000 0.110000 5.360000 ( 5.369417)
each_with_object_update 13.770000 0.340000 14.110000 ( 14.166009)
by_initialization 3.040000 0.110000 3.150000 ( 3.166201)
tap_store 4.470000 0.110000 4.580000 ( 4.594880)
tap_update 12.750000 0.340000 13.090000 ( 13.114379)
------------------------------------------------- total: 77.140000sec
user system total real
inject_store 10.540000 0.110000 10.650000 ( 10.674739)
inject_update 8.620000 0.190000 8.810000 ( 8.826045)
each_store 4.610000 0.110000 4.720000 ( 4.732155)
each_update 12.630000 0.330000 12.960000 ( 13.016104)
each_with_object_store 5.220000 0.110000 5.330000 ( 5.338678)
each_with_object_update 13.730000 0.340000 14.070000 ( 14.102297)
by_initialization 3.010000 0.100000 3.110000 ( 3.123804)
tap_store 4.430000 0.110000 4.540000 ( 4.552919)
tap_update 12.850000 0.330000 13.180000 ( 13.217637)
=> true
Enumerable#each is faster than Enumerable#inject, and Hash#store is faster than Hash#update. But the fastest of all is to pass an array in at initialization time:
Hash[PAIRS]
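For illustration, with a small literal list of pairs (Array#to_h, added in Ruby 2.1, is an equivalent spelling):
pairs = [[:a, 1], [:b, 2]]
Hash[pairs]  #=> {:a=>1, :b=>2}
pairs.to_h   #=> {:a=>1, :b=>2}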
If you're adding elements after the hash has been created, the winning version is exactly what the OP was suggesting:
hash = {}
PAIRS.each {|sym, val| hash[sym] = val }
hash
But in that case, if you're a purist who wants a single lexical form, you can use #tap and #each and get the same speed:
{}.tap {|hash| PAIRS.each {|sym, val| hash[sym] = val}}
For those not familiar with tap: it yields the receiver (here, the new hash) to the block and then returns that same receiver. If you know Lisp, think of it as Ruby's version of a LET binding.
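A stripped-down illustration of just that behaviour:
{}.tap { |h| h[:answer] = 42 }  #=> {:answer=>42}  (the block's return value is ignored; the receiver comes back)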
Since people have asked, here's the testing environment:
# Ruby version ruby 2.0.0p247 (2013-06-27) [x86_64-darwin12.4.0]
# OS Mac OS X 10.9.2
# Processor/RAM 2.6GHz Intel Core i7 / 8GB 1067 MHz DDR3
Beauty is in the eye of the beholder. Those with some functional programming background will probably prefer the inject-based method (as I do), because it has the same semantics as the fold higher-order function, which is a common way of calculating a single result from multiple inputs. If you understand inject, then you should understand that the function is being used as intended.
As one reason why this approach seems better (to my eyes), consider the lexical scope of the hash variable. In the inject-based method, hash only exists within the body of the block. In the each-based method, the hash variable inside the block needs to agree with some execution context defined outside the block. Want to define another hash in the same function? Using the inject method, it's possible to cut-and-paste the inject-based code and use it directly, and it almost certainly won't introduce bugs (ignoring whether one should use C&P during editing - people do). Using the each method, you need to C&P the code, and rename the hash variable to whatever name you wanted to use - the extra step means this is more prone to error.
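To make the scoping point concrete, here is a small made-up sketch (Element, some_enum, and other_enum are invented for the example; foo and bar come from the question):
Element    = Struct.new(:foo, :bar)
some_enum  = [Element.new(:a, 1), Element.new(:b, 2)]
other_enum = [Element.new(:c, 3)]

# With inject, the accumulator exists only inside the block, so the same
# snippet can be pasted a second time without renaming anything.
first  = some_enum.inject({})  { |hash, e| hash[e.foo] = e.bar; hash }
second = other_enum.inject({}) { |hash, e| hash[e.foo] = e.bar; hash }

# With each, the accumulator lives in the enclosing scope, so the pasted copy
# needs a different variable name to avoid clobbering the first hash.
hash  = {}
some_enum.each  { |e| hash[e.foo] = e.bar }
hash2 = {}
other_enum.each { |e| hash2[e.foo] = e.bar }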
inject (aka reduce) has a long and respected place in functional programming languages. If you're ready to take the plunge, and want to understand a lot of Matz's inspiration for Ruby, you should read the seminal Structure and Interpretation of Computer Programs, available online at http://mitpress.mit.edu/sicp/.
Some programmers find it stylistically cleaner to have everything in one lexical package. In your hash example, using inject means you don't have to create an empty hash in a separate statement. What's more, the inject statement returns the result directly -- you don't have to remember that it's in the hash variable. To make that really clear, consider:
[1, 2, 3, 5, 8].inject(:+)
vs
total = 0
[1, 2, 3, 5, 8].each {|x| total += x}
The first version returns the sum. The second version stores the sum in total, and as a programmer, you have to remember to use total rather than the value returned by the .each statement.
One tiny addendum (and purely idiomatic -- not about inject): your example might be better written:
some_enum.inject({}) {|hash, element| hash.update(element.foo => element.bar) }
...since hash.update() returns the hash itself, you don't need the extra hash statement at the end.
update
@Aleksey has shamed me into benchmarking the various combinations. See my benchmarking reply elsewhere here. Short form:
hash = {}
some_enum.each {|x| hash[x.foo] = x.bar}
hash
is the fastest, but can be recast slightly more elegantly -- and it's just as fast -- as:
{}.tap {|hash| some_enum.each {|x| hash[x.foo] = x.bar}}
I have just found, in "Ruby inject with initial being a hash", a suggestion to use each_with_object instead of inject:
hash = some_enum.each_with_object({}) do |element, h|
h[element.foo] = element.bar
end
Seems natural to me.
Another way, using tap:
hash = {}.tap do |h|
some_enum.each do |element|
h[element.foo] = element.bar
end
end
If you are building and returning a hash, using merge keeps it cleaner, since merge returns the merged hash and you don't need to return the hash explicitly at the end of the block.
some_enum.inject({}){|h,e| h.merge(e.foo => e.bar) }
If your enum is a hash, you can destructure the key and value nicely with (k, v).
some_hash.inject({}){|h,(k,v)| h.merge(k => do_something(v)) }
I think it has to do with people not fully understanding when to use reduce. I agree with you: each is the way it should be done.