Accessing hash value from hash directly VS from variable - ruby-on-rails

I have a question:
Suppose we have a hash:
hash = {:key1 => "value1", :key2 => "value2"}
What is the more efficient way to access "value1" multiple times?
Is it directly:
hash[:key1]
or via a variable:
val = hash[:key1]

This is actually something you can check by yourself:
gem install benchmark-ips
require 'benchmark/ips'
hash = {key1: "value1", key2: "value2"}
Benchmark.ips do |x|
  x.report("assign") { foo = hash[:key1] }
  x.report("direct") { hash[:key1] }
end
and run it:
Warming up --------------------------------------
              assign     1.644M i/100ms
              direct     1.730M i/100ms
Calculating -------------------------------------
              assign     15.884M (± 4.1%) i/s - 80.534M in 5.078824s
              direct     16.811M (± 5.2%) i/s - 84.780M in 5.056902s
As expected, what you call direct (not assigning the value to a variable) is slightly faster, but not by much (~6%).
You can learn more about this benchmarking tool here: https://github.com/evanphx/benchmark-ips
You can use this tool however you want; just make sure you're measuring what you actually care about. For example, if you want to know whether a memoized variable is faster than a hash lookup, you can do this:
require 'benchmark/ips'
hash = {key1: "value1", key2: "value2"}
temp = hash[:key1]
Benchmark.ips do |x|
  x.report("temp") { temp }
  x.report("hash") { hash[:key1] }
end
Or if, as you mentioned in a comment, the values are huge, try this:
require 'benchmark/ips'
hash = {key1: "x" * 10_000, key2: "value2"}
temp = hash[:key1]
Benchmark.ips do |x|
  x.report("temp") { temp }
  x.report("hash") { hash[:key1] }
end
Or if you meant that the hash itself is huge (i.e. it has a lot of keys), just prepare your hash to resemble your real-world problem and run the benchmark again.
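For example, a minimal sketch of a large-hash benchmark (the key count and the probed key are arbitrary choices of mine):
require 'benchmark/ips'

# Build a hash with many keys to resemble a "huge hash" scenario.
big_hash = (1..100_000).each_with_object({}) { |i, h| h[:"key#{i}"] = "value#{i}" }
temp = big_hash[:key50000]

Benchmark.ips do |x|
  x.report("temp") { temp }
  x.report("hash") { big_hash[:key50000] }
end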

I think the question is: if you need to use a value stored in a hash multiple times (say, in a loop), is it faster to store the value in a variable first, or to access it directly from the hash each time it is needed?
Accessing a value in a hash is generally O(1), but in the worst case it can be O(n). In addition, at the very least, accessing the hash requires sending the key through the hashing function, so in general extracting the value once and then referring to the variable should be faster.
Here is a somewhat extreme case where the value is needed 1000 times. You can see from the results of the built-in benchmark module that accessing the variable is faster than accessing the hash. Since for this simple hash the access is almost certainly O(1), the difference is probably due to the fact that the hashing function must be called on the key before getting the value.
require 'benchmark'
hash = {:key1 => "value1", :key2 => "value2"}
val = nil
Benchmark.bmbm do |x|
  x.report { val = hash[:key1] }
end
puts "*****"
val = hash[:key1]
Benchmark.bmbm do |x|
  x.report { 1000.times { val } }
  x.report { 1000.times { hash[:key1] } }
end
Rehearsal ------------------------------------
   0.000006   0.000002   0.000008 (  0.000003)
--------------------------- total: 0.000008sec

       user     system      total        real
   0.000006   0.000000   0.000006 (  0.000004)
*****
Rehearsal ------------------------------------
   0.000051   0.000000   0.000051 (  0.000051)
   0.000063   0.000000   0.000063 (  0.000062)
--------------------------- total: 0.000114sec

       user     system      total        real
   0.000048   0.000011   0.000059 (  0.000054)
   0.000082   0.000000   0.000082 (  0.000077)
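In practice the pattern is simply to hoist the lookup out of the loop when the value is reused. A minimal sketch (process is a hypothetical stand-in for whatever work uses the value):
hash = {:key1 => "value1", :key2 => "value2"}

def process(v); v.length; end  # hypothetical stand-in for real work

# One hash lookup, then 1000 cheap variable reads:
val = hash[:key1]
1000.times { process(val) }

# ...instead of 1000 hash lookups:
1000.times { process(hash[:key1]) }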

Related

How can I count word frequency and append the results every time I run the script in Ruby

["one", "two", "three", "three"]
I want to open a file and write
{"one" => 1, "two" => 1, "three" => 2}
["one", "two"]
and in the next time open the same file and search for the each word if exsit append + 1 else create new word
{"one" => 2, "two" => 2, "three" => 2}
This should do:
words = ["one", "two", "three", "three"]
frequency_file = 'frequency.dat'
# Load the previous counts if the file exists.
if File.exist?(frequency_file)
  old_frequency = File.open(frequency_file) { |f| Marshal.load(f.read) }
else
  old_frequency = {}
end
old_frequency.default = 0
# Add this run's counts to the stored ones.
frequency = words.group_by { |name| name }.map { |name, list| [name, list.count + old_frequency[name]] }.to_h
File.open(frequency_file, 'w') { |f| f.write(Marshal.dump(frequency)) }
puts frequency.inspect
# => {"one"=>1, "two"=>1, "three"=>2}  (first run)
# => {"one"=>2, "two"=>2, "three"=>4}  (second run with the same input)
If you prefer a human-readable file:
require 'yaml'

words = ["one", "two", "three", "three"]
frequency_file = 'frequency.yml'
if File.exist?(frequency_file)
  old_frequency = YAML.load_file(frequency_file)
else
  old_frequency = {}
end
old_frequency.default = 0
frequency = words.group_by { |name| name }.map { |name, list| [name, list.count + old_frequency[name]] }.to_h
File.open(frequency_file, 'w') { |f| f.write frequency.to_yaml }
puts frequency.inspect
# => {"one"=>1, "two"=>1, "three"=>2}  (first run)
# => {"one"=>2, "two"=>2, "three"=>4}  (second run with the same input)
Here are some variations that'd do it:
ary = %w[a b a c a b]
ary.group_by { |v| v }.map { |k, v| [k, v.size] }.to_h    # => {"a"=>3, "b"=>2, "c"=>1}
ary.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }    # => {"a"=>3, "b"=>2, "c"=>1}
ary.uniq.map { |v| [v, ary.count(v)] }.to_h               # => {"a"=>3, "b"=>2, "c"=>1}
Since they're all about the same length, the deciding factor becomes which is the fastest.
require 'fruity'
ary = %w[a b a c a b] * 1000
compare do
  group_by         { ary.group_by { |v| v }.map { |k, v| [k, v.size] }.to_h }
  each_with_object { ary.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 } }
  uniq_map         { ary.uniq.map { |v| [v, ary.count(v)] }.to_h }
end
# >> Running each test 4 times. Test will take about 1 second.
# >> group_by is faster than uniq_map by 30.000000000000004% ± 10.0%
# >> uniq_map is faster than each_with_object by 19.999999999999996% ± 10.0%
How to persist the data and append to it is a separate question; how to do it depends on the size of the data you're checking and how fast you need the code to run. Databases are very capable of doing this sort of check extremely fast, as they have code optimized to search and count unique occurrences of records. Even SQLite should have no problem doing this. Using an ORM like Sequel or ActiveRecord makes it painless to talk to the DB, and to scale or port to a more capable database manager if needed.
Writing to a local file is OK if you occasionally need to update, or you don't have a big list of words, and you don't need to share the information with other pieces of code or with another machine.
Reading a file to recover the hash and then incrementing it assumes a word will never be deleted, only added. I've written a lot of document analysis code and that case hasn't occurred, so I'd recommend thinking about long-term use before settling on your particular path.
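As a rough sketch of the database route, using Sequel with an in-memory SQLite database (the table and column names are my own assumptions; Enumerable#tally needs Ruby 2.7+, and the sqlite3 gem must be installed):
require 'sequel'

DB = Sequel.sqlite  # in-memory; pass a filename to persist across runs

DB.create_table?(:word_counts) do
  String :word, primary_key: true
  Integer :count, null: false, default: 0
end

words = %w[one two three three]
words.tally.each do |word, n|
  # Bump the count if the word exists, otherwise insert it.
  updated = DB[:word_counts].where(word: word).update(count: Sequel[:count] + n)
  DB[:word_counts].insert(word: word, count: n) if updated == 0
end

p DB[:word_counts].to_hash(:word, :count)  # => {"one"=>1, "two"=>1, "three"=>2}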
Could you put the string representation of a hash (the first line of the file) in a separate (e.g., JSON) file? If so, consider something like the following.
First let's create a JSON file for the hash and a second file, the words of which are to be counted.
require 'json'
FName = "lucy"
JSON_Fname = "hash_counter.json"
File.write(JSON_Fname, JSON.generate({"one" => 1, "two" => 1, "three" => 2}))
#=> 27
File.write(FName, '["one", "two", "three", "three"]')
#=> 32
Now read the JSON file, parse the hash, and give h a default value of zero.¹
h = JSON.parse(File.read(JSON_Fname))
#=> {"one"=>1, "two"=>1, "three"=>2}
h.default = 0
(See Hash#default=). Then read the other file and update the hash.
File.read(FName).downcase.scan(/[[:alpha:]]+/).each { |w| h[w] += 1 }
h #=> {"one"=>2, "two"=>2, "three"=>4}
Lastly, write the hash h to the JSON file (as I did above).²
¹ Ruby expands h[w] += 1 to h[w] = h[w] + 1 before evaluating the expression. If h does not have a key w, Hash#[] returns the hash's default value, if it has one. Here h["cat"] #=> 0, since h has no key "cat" and the default has been set to zero. The expression therefore becomes h[w] = 0 + 1 #=> 1. Note that the method on the left of the equals sign is Hash#[]=, which is why the default value does not apply there.
² To be safe, write the new JSON string to a temporary file, then rename the temporary file to the JSON file name.
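A minimal sketch of that idea (safe_write_json is my own helper name; on POSIX filesystems the rename atomically replaces the target):
require 'json'

# Write to a sibling temp file first, then rename it over the target, so a
# crash mid-write can't leave a half-written JSON file behind.
def safe_write_json(path, hash)
  tmp_path = "#{path}.tmp"
  File.write(tmp_path, JSON.generate(hash))
  File.rename(tmp_path, path)
end

safe_write_json(JSON_Fname, h)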

Improve performance to find an array of ids from array of hashes in Ruby

Consider an array of hashes:
a=[{'id'=>'1','imageUrl'=>'abc'},{'id'=>'2','imageUrl'=>'efg'},{'id'=>'3','imageUrl'=>'hij'}]
Consider an array of characters/numbers/ids
b=['1','2','5']
I would like to match the ids in b with those in a, and replace each value of b with the corresponding hash.
In the above example, the values '1' and '2' are common between a and b, so I replace '1' and '2' in b with the corresponding hash values of a.
So the resultant b becomes
b=[[{"id"=>"1", "imageUrl"=>"abc"}], [{"id"=>"2", "imageUrl"=>"efg"}], []]
I wrote the following code:
b.each_with_index do |r, index|
  puts index
  k = a.select { |z| z["id"] == r }
  b[index] = k
end
Is there a better solution, a sleeker one? I am new to Ruby.
You can use the destructive version of Enumerable#map (Array#map!) together with Enumerable#select:
b.map! {|id| a.select {|h| h['id'] == id }}
# => [[{"id"=>"1", "imageUrl"=>"abc"}], [{"id"=>"2", "imageUrl"=>"efg"}], []]
This will improve speed, because building a lookup hash once replaces each linear scan of a with an O(1) hash access:
#!/usr/bin/env ruby
require 'pp'
require 'benchmark'
a = []
5000.times { |c| a << {"id" => "#{c}", "imageUrl" => "test#{c}"} }
b1 = (1..2500).to_a.shuffle.map(&:to_s)
b2 = b1.dup

puts "method1"
puts Benchmark.measure { b1.map! { |id| a.select { |h| h['id'] == id } } }

puts "method2"
result = Benchmark.measure do
  ah = Hash.new([])
  a.each { |x| ah[x["id"]] = x }
  # Note: unlike method1, this maps each id to the bare hash (or to [] on a
  # miss) rather than wrapping each match in an array.
  b2.map! { |be| ah[be] }
end
puts result
Results:
method1
2.820000 0.010000 2.830000 ( 2.827695)
method2
0.000000 0.000000 0.000000 ( 0.002607)
Updated benchmark - it uses 250,000 elements in b instead of 2,500 (method1 is commented out to protect the innocent - it's too slow and I got bored waiting for it):
#!/usr/bin/env ruby
require 'pp'
require 'benchmark'
a = []
5000.times { |c| a << {"id" => "#{c}", "imageUrl" => "test#{c}"} }
b1 = (1..250000).to_a.collect { |x| x % 2500 }.shuffle.map(&:to_s)
b2 = b1.dup
b3 = b1.dup

# puts "method1"
# puts Benchmark.measure { b1.map! { |id| a.select { |h| h['id'] == id } } }

puts "method2"
result = Benchmark.measure do
  ah = Hash.new([])
  a.each { |x| ah[x["id"]] = x }
  b2.map! { |be| ah[be] }
end
puts result

puts "method3"
result = Benchmark.measure do
  h = a.each_with_object({}) { |g, h| h.update(g['id'] => g) }
  b3.map! { |s| h.key?(s) ? [h[s]] : [] }
end
puts result
And the results are:
method2
0.050000 0.000000 0.050000 ( 0.045294)
method3
0.100000 0.010000 0.110000 ( 0.109646)
[Edit: after posting I noticed #Mircea had already posted the same solution. I'll leave mine for the mention of the values_at alternative.]
I assume the values of "id" in a are unique.
First construct a look-up hash:
h = a.each_with_object({}) { |g,h| h.update(g['id']=>g) }
#=> {"1"=>{"id"=>"1", "imageUrl"=>"abc"},
# "2"=>{"id"=>"2", "imageUrl"=>"efg"},
# "3"=>{"id"=>"3", "imageUrl"=>"hij"}}
Then simply loop through b, constructing the desired array:
b.map { |s| h.key?(s) ? [h[s]] : [] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
# [{"id"=>"2", "imageUrl"=>"efg"}],
# []]
Alternatively,
arr = h.values_at(*b)
#=> [{"id"=>"1", "imageUrl"=>"abc"},
# {"id"=>"2", "imageUrl"=>"efg"},
# nil]
Then:
arr.map { |e| e.nil? ? [] : [e] }
#=> [[{"id"=>"1", "imageUrl"=>"abc"}],
# [{"id"=>"2", "imageUrl"=>"efg"}],
# []]
You might instead consider using arr for subsequent calculations, since all the arrays in your desired solution contain at most one element.
The use of a lookup hash is especially efficient when b is large relative to a.

In ruby which is better, detect or index, to find an object in an array?

I have an array of objects.
I want to find an object in the array based on some property of the object.
I can do
array.detect {|x| x.name=="some name"}
or I could do
ind=array.index {|x| x.name=="some name"}
array[ind] unless ind.nil?
Is there any reason to choose one over the other?
If you aren't interested in finding the index value of the object you're searching for, I would suggest detect. It'll save you from having to do that nil check before accessing the array.
From a performance standpoint, I imagine it's relatively comparable, but that could help your decision too. Confirming it would require benchmarking, as Niels B. mentioned in his comment.
If you want to find an element in a collection, it's important to use collections made for fast retrieval. Arrays are not made for that, nor are they particularly convenient unless you are making a stack or a queue.
Here's some code to show ways to improve the storage/retrieval speed over what you can get using find, detect or other normal array-based methods:
require 'fruity'
require 'digest'
class Foo
  attr_reader :var1, :var2

  def initialize(var1, var2)
    @var1, @var2 = var1, var2
  end
end
START_INT = 1
START_CHAR = 'a'
END_INT = 10
END_CHAR = 'z'
START_MD5 = Digest::MD5.hexdigest(START_INT.to_s + START_CHAR)
END_MD5 = Digest::MD5.hexdigest(END_INT.to_s + END_CHAR)
ary = []
hsh = {}
hsh2 = {}
START_INT.upto(END_INT) do |i|
  (START_CHAR .. END_CHAR).each do |j|
    foo = Foo.new(i, j)
    ary << foo
    hsh[[i, j]] = foo
    hsh2[Digest::MD5.hexdigest(i.to_s + j)] = foo
  end
end
compare do
  array_find {
    ary.find { |a| (a.var1 == START_INT) && (a.var2 == START_CHAR) }
    ary.find { |a| (a.var1 == END_INT) && (a.var2 == END_CHAR) }
  }
  hash_access_with_array {
    hsh[[START_INT, START_CHAR]]
    hsh[[END_INT, END_CHAR]]
  }
  hash_access_with_digest {
    hsh2[START_MD5]
    hsh2[END_MD5]
  }
end
Which results in:
Running each test 16384 times. Test will take about 17 seconds.
hash_access_with_digest is faster than hash_access_with_array by 10x ± 1.0
hash_access_with_array is faster than array_find by 16x ± 1.0
There are three different tests. I'm looking for the first and last elements in the array ary, and for the corresponding objects in the hashes. Searching for both the first and last elements of the array gives an average time for that search; for comparison I'm looking up the same objects in the hashes.
If we had some advance knowledge of which array index the object is in, retrieving the object from the array would be faster, but that's the problem, and making another container to keep track of that information would be slower than using the hash.
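For example, a minimal sketch of that lookup-hash idea applied to the question's setup (Person is a hypothetical stand-in for the question's objects; the hash name is mine):
Person = Struct.new(:name)
array = [Person.new("some name"), Person.new("other name")]

# Build the lookup hash once, keyed by the attribute you search on.
by_name = array.each_with_object({}) { |obj, h| h[obj.name] = obj }

# Each retrieval is then a single O(1) hash access instead of a scan:
by_name["some name"]  # => #<struct Person name="some name">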
See for yourself!
require 'benchmark'
array = (1..1000000).to_a
Benchmark.bmbm do |x|
  x.report("#index for 1") {
    array.index(1)
  }
  x.report("#detect 1") {
    array.detect { |i| i == 1 }
  }
  x.report("#index for 500k") {
    array.index(500000)
  }
  x.report("#detect 500k") {
    array.detect { |i| i == 500000 }
  }
  x.report("#index for 1m") {
    array.index(1000000)
  }
  x.report("#detect 1m") {
    array.detect { |i| i == 1000000 }
  }
end
Put the code above in a file and execute it from the console with ruby <file>
Ignore the top block (that is the rehearsal); the bottom block should look something like this:
                     user     system      total        real
#index for 1     0.000005   0.000002   0.000007 (  0.000004)
#detect 1        0.000007   0.000002   0.000009 (  0.000006)
#index for 500k  0.003274   0.000049   0.003323 (  0.003388)
#detect 500k     0.029870   0.000200   0.030070 (  0.030872)
#index for 1m    0.005866   0.000009   0.005875 (  0.005880)
#detect 1m       0.059819   0.000520   0.060339 (  0.061340)
Running on my Mac with Ruby 2.5.0, the numbers suggest that #detect is an order of magnitude slower than #index for elements deep into the array. Note, though, that #index is called here with an argument (so the comparison happens in C), while #detect yields each element to a Ruby block; #index with a block would be a more like-for-like comparison.
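As a sketch, the extra reports for that like-for-like comparison would look like this (drop them into the Benchmark.bmbm block above; I haven't measured them here):
x.report("#index block 500k") {
  array.index { |i| i == 500000 }
}
x.report("#index block 1m") {
  array.index { |i| i == 1000000 }
}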

How do I convert an array of hashes into a sorted hash?

If I have an array of hashes, each with a day key:
[
  {:day=>4,:name=>'Jay'},
  {:day=>1,:name=>'Ben'},
  {:day=>4,:name=>'Jill'}
]
What is the best way to convert it to a hash with the sorted day values as the keys:
{
  1=>[{:day=>1,:name=>'Ben'}],
  4=>[{:day=>4,:name=>'Jay'},{:day=>4,:name=>'Jill'}]
}
I'm using Ruby 1.9.2 and Rails 3.1.1
Personally, I wouldn't bother "sorting" the keys (which amounts to ordering-by-entry-time in Ruby 1.9) until I actually needed to. Then you can use group_by:
arr = [{:day=>4,:name=>'Jay'}, {:day=>1,:name=>'Ben'}, {:day=>4,:name=>'Jill'}]
arr.group_by { |a| a[:day] }
=> {4=>[{:day=>4, :name=>"Jay"}, {:day=>4, :name=>"Jill"}],
    1=>[{:day=>1, :name=>"Ben"}]}
Instead, sort the keys when you actually need them.
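For instance, a minimal sketch of sorting at the point of use:
grouped = arr.group_by { |a| a[:day] }
grouped.keys.sort.each do |day|
  puts "#{day}: #{grouped[day].map { |h| h[:name] }.join(', ')}"
end
# 1: Ben
# 4: Jay, Jill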
Assuming your array is called list, here's one way using the reduce method:
list.reduce({}) { |hash, item|
  (hash[item[:day]] ||= []) << item; hash
}
Here's another using each, though you have to carry a holder variable around:
hash = {}
list.each { |item|
  (hash[item[:day]] ||= []) << item
}
Once you have the unsorted hash, say in a variable foo, you can sort it as:
Hash[foo.sort]
Simple answer:
data = [
  {:day=>4,:name=>'Jay'},
  {:day=>1,:name=>'Ben'},
  {:day=>4,:name=>'Jill'}
]

# expected solution
sol = {
  1=>[{:day=>1,:name=>'Ben'}],
  4=>[{:day=>4,:name=>'Jay'},{:day=>4,:name=>'Jill'}]
}

res = {}
data.each{|h|
  res[h[:day]] ||= []
  res[h[:day]] << h
}
p res
p res == sol            # check value
p res.keys == sol.keys  # check order
Problem with this solution: the hash is not sorted as requested. (The same problem applies to Anurag's solution.)
So you must modify the answer a bit:
res = {}
data.sort_by{|h| h[:day]}.each{|h|
  res[h[:day]] ||= []
  res[h[:day]] << h
}
p res
p res == sol            # check value
p res.keys == sol.keys  # check order
In Rails you can use OrderedHash:
ActiveSupport::OrderedHash[arr.group_by { |a| a[:day] }.sort_by(&:first)]
Update: In fact, in Ruby 1.9 hashes are ordered (by insertion), so using the ActiveSupport extension is not required:
Hash[arr.group_by { |a| a[:day] }.sort_by(&:first)]
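On Ruby 2.1+ the same thing can be written without the Hash[] constructor:
arr.group_by { |a| a[:day] }.sort_by(&:first).to_h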

There has got to be a cleaner way to do this

I have this code here and it works, but there has to be a better way. I need two arrays that look like this:
[
  {
    "Vector Arena - Auckland Central, New Zealand" => {
      "2010-10-10" => [
        "Enter Sandman",
        "Unforgiven",
        "And justice for all"
      ]
    }
  },
  {
    "Brisbane Entertainment Centre - Brisbane Qld, Austr..." => {
      "2010-10-11" => [
        "Enter Sandman"
      ]
    }
  }
]
one for the past and one for the upcoming. The problem I have is that I am repeating myself; though it works, I want to clean it up. Here is my data:
..
Try this:
h = Hash.new { |h1, k1| h1[k1] = Hash.new { |h2, k2| h2[k2] = [] } }
result, today = [h, h.dup], Date.today
Request.find_all_by_artist("Metallica",
  :select => "DISTINCT venue, showdate, LOWER(song) AS song"
).each do |req|
  idx = req.showdate < today ? 0 : 1
  result[idx][req.venue][req.showdate] << req.song.titlecase
end
Note 1
In the first line I am initializing a hash of hashes. The outer hash creates the inner hash when a non-existent key is accessed. An excerpt from the Ruby Hash documentation:
If this hash is subsequently accessed by a key that doesn't correspond to a hash
entry, the block will be called with the hash object and the key, and should
return the default value. It is the block's responsibility to store the value in
the hash if required.
The inner hash creates an empty array when a non-existent date is accessed.
E.g., construct a hash with dates as keys and content as values:
Without a default block:
h = {}
list.each do |data|
  h[data.date] = [] unless h[data.date]
  h[data.date] << data.content
end
With a default block:
h = Hash.new { |h, k| h[k] = [] }
list.each do |data|
  h[data.date] << data.content
end
The second line simply creates an array with two slots to hold the past and future data. Since both the past and the future store their data as a Hash of Hash of Array, I simply duplicate the value.
Second line can also be written as
result = [ h, h.dup]
today = Date.today
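A quick check that the dup really gives two independent autovivifying hashes (Hash#dup copies the default block):
h = Hash.new { |h1, k1| h1[k1] = Hash.new { |h2, k2| h2[k2] = [] } }
past, future = h, h.dup

past["venue A"]["2010-10-10"] << "Enter Sandman"
future["venue B"]["2010-10-11"] << "Unforgiven"

p past    # => {"venue A"=>{"2010-10-10"=>["Enter Sandman"]}}
p future  # => {"venue B"=>{"2010-10-11"=>["Unforgiven"]}}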
