Move data into 3 separate hashes inside a loop in Ruby - ruby-on-rails

It's only my second post and I'm still learning Ruby.
I'm trying to figure this out based on my Java knowledge but I can't seem to get it right.
What I need to do is:
I have a function that reads a file line by line and extracts different car features from each line, for example:
def convertListings2Catalogue (fileName)
f = File.open(fileName, "r")
f.each_line do |line|
km=line[/[0-9]+km/]
t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
trans = ....
end
end
Now for each line I need to store the extracted features into separate hashes that I can access later in my program.
The issues I'm facing:
1) I'm overwriting the features in the same hash
2) I can't access the hash outside my function
This is what's in my file:
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE
Now my function extracts the features of each car model, but I need to store these features in 3 different hashes with the same keys but different values.
listing = Hash.new(0)
listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }
I tried moving the data from one hash to another, I even tried storing the data in an array first and then moving it to the hash but I still can't figure out how to create separate hashes inside a loop.
Thanks

I don't fully understand the question but I thought it was important to suggest how you might deal with a more fundamental issue: extracting the desired information from each line of the file in an effective and Ruby-like manner. Once you have that information, in the form of an array of hashes, one hash per line, you can do with it what you want. Alternatively, you could loop through the lines in the file, constructing a hash for each line and performing any desired operations before going on to the next line.
Being new to Ruby you will undoubtedly find some of the code below difficult to understand. If you persevere, however, I think you will be able to understand all of it, and in the process learn a lot about Ruby. I've made some suggestions in the last section of my answer to help you decipher the code.
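Before the full code, here is a minimal sketch (not part of the solution below) of the two fixes the question is really after: build a fresh hash for every line instead of reusing one, and return all of them as an array so the caller can keep using them. The method name listings_from, the three fields it extracts and the throwaway file path are assumptions chosen for illustration.

```ruby
require 'tmpdir'

# Hypothetical, simplified extractor: a new hash literal is built for every
# line, and #map collects those hashes into an array that is returned to the
# caller, so nothing is overwritten and the data outlives the method call.
def listings_from(file_name)
  File.foreach(file_name).map do |line|
    {
      kilometers: line[/\d+km/i],
      type:       line[/sedan|coupe|hatchback|station|suv/i],
      drivetrain: line[/\b(?:fwd|rwd|awd)\b/i]
    }
  end
end

# Usage with a small throwaway file holding two simplified records.
path = File.join(Dir.tmpdir, 'listings_demo.txt')
File.write(path, "65101km,Sedan,Manual,FWD\ncoupe,1100km,auto,RWD\n")

listings = listings_from(path)
listings.each { |listing| p listing }
```

Each element of listings is an independent hash with the same keys, so listings[0], listings[1] and so on play the role of the "separate hashes" without any of them overwriting another.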
Code
words_by_key = {
type: %w| sedan coupe hatchback station suv |,
transmission: %w| auto manual steptronic |,
drivetrain: %w| fwd rwd awd |,
status: %w| used new |,
car_maker: %w| honda toyota mercedes bmw lexus |,
model: %w| camry clk crv |
}
#=> {:type=>["sedan", "coupe", "hatchback", "station", "suv"],
# :transmission=>["auto", "manual", "steptronic"],
# :drivetrain=>["fwd", "rwd", "awd"],
# :status=>["used", "new"],
# :car_maker=>["honda", "toyota", "mercedes", "bmw", "lexus"],
# :model=>["camry", "clk", "crv"]}
WORDS_TO_KEYS = words_by_key.each_with_object({}) { |(k,v),h| v.each { |s| h[s] = k } }
#=> {"sedan"=>:type, "coupe"=>:type, "hatchback"=>:type, "station"=>:type, "suv"=>:type,
# "auto"=>:transmission, "manual"=>:transmission, "steptronic"=>:transmission,
# "fwd"=>:drivetrain, "rwd"=>:drivetrain, "awd"=>:drivetrain,
# "used"=>:status, "new"=>:status,
# "honda"=>:car_maker, "toyota"=>:car_maker, "mercedes"=>:car_maker,
# "bmw"=>:car_maker, "lexus"=>:car_maker,
# "camry"=>:model, "clk"=>:model, "crv"=>:model}
module ExtractionMethods
def km(str)
str[/\A\d+(?=km\z)/]
end
def year(str)
str[/\A\d{4}\z/] # exactly four digits
end
def stock(str)
return nil if str.end_with?('km')
str[/\A\d+\p{Alpha}\p{Alnum}*\z/]
end
def trim(str)
str[/\A\p{Alpha}{2}\z/]
end
def fuel_consumption(str)
str.to_f if str[/\A\d+(?:\.\d+)?(?=l\/100km\z)/]
end
end
class K
include ExtractionMethods
def extract_hashes(fname)
File.foreach(fname).with_object([]) do |line, arr|
line = line.downcase
idx_left = line.index('{')
idx_right = line.index('}')
if idx_left && idx_right
g = { set_of_features: line[idx_left..idx_right] }
line[idx_left..idx_right] = ''
line.squeeze!(',')
else
g = {}
end
arr << line.split(',').each_with_object(g) do |word, h|
word.strip!
if WORDS_TO_KEYS.key?(word)
h[WORDS_TO_KEYS[word]] = word
else
ExtractionMethods.instance_methods.find do |m|
v = public_send(m, word)
(h[m] = v) unless v.nil?
v
end
end
end
end
end
end
Example
data =<<BITTER_END
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE
BITTER_END
FILE_NAME = 'temp'
File.write(FILE_NAME, data)
#=> 353 (characters written to file)
k = K.new
#=> #<K:0x00000001c257d348>
k.extract_hashes(FILE_NAME)
#=> [{:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry}",
# :km=>"65101", :type=>"sedan", :transmission=>"manual", :year=>"2010",
# :stock=>"18131a", :drivetrain=>"fwd", :status=>"used", :fuel_consumption=>5.5,
# :car_maker=>"toyota", :model=>"camry", :trim=>"se"},
# {:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry, power seats}",
# :type=>"coupe", :km=>"1100", :transmission=>"auto", :drivetrain=>"rwd",
# :model=>"clk", :trim=>"lx", :stock=>"18fo724a", :year=>"2017",
# :fuel_consumption=>6.0, :status=>"used"},
# {:set_of_features=>"{heated seats, heated mirrors, keyless entry}",
# :drivetrain=>"awd", :type=>"suv", :km=>"0", :transmission=>"auto",
# :status=>"new", :car_maker=>"honda", :model=>"crv", :fuel_consumption=>8.0,
# :stock=>"19bf723a", :year=>"2018", :trim=>"le"}]
Explanation
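Firstly, note that if the heredoc is indented, it needs to be un-indented before being executed: with <<BITTER_END, the terminator BITTER_END must start at the beginning of its line.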
You will see that the instance method K#extract_hashes uses IO#foreach to read the file line-by-line.1
The first step in processing each line of the file is to downcase it. You will then want to split the string on commas to form an array of words. There is a problem, however: you don't want to split on commas that are between a left and right brace ({ and }), which correspond to the key :set_of_features. I decided to deal with that by determining the indices of the two braces, creating a hash with the single key :set_of_features, deleting that substring from the line and, lastly, replacing the resulting pair of adjacent commas with a single comma:
idx_left = line.index('{')
idx_right = line.index('}')
if idx_left && idx_right
g = { set_of_features: line[idx_left..idx_right] }
line[idx_left..idx_right] = ''
line.squeeze!(',')
else
g = {}
end
See String for the documentation of the String methods used here and elsewhere.
We can now convert the resulting line to an array of words by splitting on the commas. If any capitalization is desired in the output, it should be done after the hashes have been constructed.
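To see these steps in isolation, here is a small standalone run on a shortened version of the second record (the trimmed line is an assumption, made for brevity). It mirrors the index, slice-assignment, squeeze! and split calls in extract_hashes:

```ruby
# A shortened record, downcased as in #extract_hashes (downcase returns a new,
# mutable String, so the in-place edits below are safe).
line = "Coupe,1100km,auto,RWD,{AC, Heated Seats},6L/100km,Used\n".downcase

idx_left  = line.index('{')
idx_right = line.index('}')
g = { set_of_features: line[idx_left..idx_right] }

line[idx_left..idx_right] = ''   # delete the braced substring in place
line.squeeze!(',')               # the ",," left behind becomes ","
words = line.split(',').map(&:strip)

p g      # the features, kept together as one string
p words  # the remaining comma-separated fields
```

One side effect worth knowing: squeeze!(',') collapses every run of commas in the line, so a genuinely empty field elsewhere in a record would disappear as well.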
We will build on the hash g just created ({ set_of_features: line[idx_left..idx_right] } when the line contains braces, {} otherwise). When complete, it will be appended to the array being returned.
Each element (word) in the array is then processed. If it is a key of the hash WORDS_TO_KEYS, we set
h[WORDS_TO_KEYS[word]] = word
and are finished with that word. If not, we call each of the instance methods m of the module ExtractionMethods on word, in turn, until one of them returns a value v that is not nil. When that happens, another key-value pair is added to the hash h:
h[m] = v
Notice that the name of each instance method in ExtractionMethods, which is a symbol (e.g., :km), is a key in the hash h. Having separate methods facilitates debugging and testing.
I could have written:
if (s = km(word))
s
elsif (s = year(word))
s
elsif (s = stock(word))
s
elsif (s = trim(word))
s
elsif (s = fuel_consumption(word))
s
end
but since all these methods take the same argument, word, we can instead use Object#public_send:
a = [:km, :year, :stock, :trim, :fuel_consumption]
a.find do |m|
v = public_send(m, word)
(h[m] = v) unless v.nil?
v
end
A final tweak is to put all the methods in the array a in a module ExtractionMethods and include that module in the class K. We can then replace a in the find expression above with ExtractionMethods.instance_methods. (See Module#instance_methods.)
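The same pattern can be tried on a cut-down module. The sketch below uses a hypothetical module Extractors with only two methods and a hypothetical class Demo; because the two patterns never match the same word, the result does not depend on the order in which instance_methods happens to list the methods.

```ruby
# A cut-down, hypothetical version of ExtractionMethods with two methods.
module Extractors
  def km(str)
    str[/\A\d+(?=km\z)/]
  end

  def year(str)
    str[/\A\d{4}\z/]
  end
end

class Demo
  include Extractors

  # Returns { method_name => extracted_value } for the first extractor that
  # recognises the word, or {} if none does.
  def classify(word)
    h = {}
    Extractors.instance_methods.find do |m|
      v = public_send(m, word)
      h[m] = v unless v.nil?
      v
    end
    h
  end
end

demo = Demo.new
p demo.classify('65101km')
p demo.classify('2010')
p demo.classify('sedan')
```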
Suppose now that the data are changed so that additional fields are added (e.g., for "colour" or "price"). Then the only modifications to the code required are changes to words_by_key and/or the addition of methods to ExtractionMethods.
Understanding the code
It may be helpful to run the code with puts statements inserted. For example,
idx_left = line.index('{')
idx_right = line.index('}')
puts "idx_left=#{idx_left}, idx_right=#{idx_right}"
Where code is chained it may be helpful to break it up with temporary variables and insert puts statements. For example, change
arr << line.split(',').each_with_object(g) do |word, h|
...
to
a = line.split(',')
puts "line.split(',')=#{a}"
enum = a.each_with_object(g)
puts "enum.to_a=#{enum.to_a}"
arr << enum.each do |word, h|
...
The second puts here is merely to see what elements the enumerator enum will generate and pass to the block.
Another way of doing that is to use the handy method Object#tap, which is inserted between two methods:
arr << line.split(',').tap { |a| puts "line.split(',')=#{a}"}.
each_with_object(g) do |word, h|
...
tap (great name, eh?), as used here, yields its receiver to the block (which displays it) and then returns that same receiver, so the rest of the chain is unaffected.
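A small sketch showing that tap passes its receiver through unchanged; to make the side effect checkable, it appends to a log array (an assumption, standing in for puts) instead of printing:

```ruby
log = []

words = "coupe,1100km,auto"
  .split(',')
  .tap { |a| log << "after split: #{a.inspect}" }  # side effect only
  .map(&:upcase)                                     # receives the split array

p words
p log
```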
Lastly, I've used the method Enumerable#each_with_object in a couple of places. It may seem complex but it's actually quite simple. For example,
arr << line.split(',').each_with_object(g) do |word, h|
...
end
is effectively equivalent to:
h = g
line.split(',').each do |word|
...
end
arr << h
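A quick way to convince yourself of that equivalence is to build the same hash both ways; the words and the values stored (word lengths) are arbitrary choices for the demonstration:

```ruby
words = %w[sedan manual fwd]

# Version 1: each_with_object passes the same hash to every iteration
# and returns that hash when the loop ends.
h1 = words.each_with_object({}) { |word, h| h[word] = word.length }

# Version 2: the equivalent explicit loop.
h2 = {}
words.each { |word| h2[word] = word.length }

p h1 == h2
p h1
```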
1 Many IO methods are typically invoked on File. This is acceptable because File.superclass #=> IO.

You could take advantage of the fact that your file object is enumerable. This lets you use the inject method, seeding it with an empty hash. collector in this case is the hash that gets passed along as the iteration continues. Be sure to return the value of collector (implicitly, by making collector the last line of the block), as inject uses that value as the collector for the next iteration. It's some pretty powerful stuff!
I think this is roughly what you're going for. I used model as the key in the hash, and set_of_features as your data.
def convertListings2Catalogue(fileName)
# File.foreach (unlike File.open without a block) closes the file when done
my_hash = File.foreach(fileName).inject({}) do |collector, line|
km=line[/[0-9]+km/]
t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
trans = line[(Regexp.union(/auto/i, /manual/i, /steptronic/i))]
dt = line[(Regexp.union(/fwd/i, /rwd/i, /awd/i))]
status = line[(Regexp.union(/used/i, /new/i))]
car_maker = line[(Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i))]
stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
trim = line.scan(/\b[a-zA-Z]{2}\b/).first
fuel = line.scan(/[\d.]+L\/\d*km/).first
set_of_features = line[/\{(.*?)\}/, 1] # text between the braces, as a String
model = line[(Regexp.union(/camry/i, /clk/i, /crv/i))]
collector[model] = set_of_features
collector
end
end
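The "return collector" rule is the main trap with inject, so here is a small sketch contrasting it with each_with_object on two made-up lines. It also uses String#[] with a capture-group index, which returns the captured text as a String, whereas scan(...).first returns a one-element array:

```ruby
lines = ["camry,{ac, heated seats}", "crv,{heated mirrors}"]

# inject: the block's return value becomes the next collector, so the
# block must end with the collector itself.
by_model = lines.inject({}) do |collector, line|
  model = line[/\A[^,]+/]
  collector[model] = line[/\{(.*?)\}/, 1]
  collector   # without this line the next iteration would receive a String
end

# each_with_object: the same hash is passed every time, so no explicit
# return is needed (note the reversed block parameters).
by_model2 = lines.each_with_object({}) do |line, collector|
  collector[line[/\A[^,]+/]] = line[/\{(.*?)\}/, 1]
end

p by_model
p by_model == by_model2
```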

Related

Link method argument to a variable

I have a program where a user is able to receive popular "vacation spots". All they have to do is enter the continent (which will bring them to that dictionary) and then enter a country/state (which is a key in a hash), and the program will find the corresponding value.
I have a required file (dict.rb) which is basically a hash module using arrays.
But the issue I have is fairly small. I assigned the user input to two variables, continent_select and country_select.
Here's the code:
require './dict.rb'
#create a new dictionary called northamerica
northamerica = Dict.new
Dict.set(northamerica, "new york", "New York City")
Dict.set(northamerica, "new jersey", "Belmar")
puts "Welcome to The Vacation Hub"
puts "What continent are you interested in?"
print '> '
continent_select = $stdin.gets.chomp.downcase
continent_select.gsub!(/\A"|"\Z/, '')
puts "Which state would you like to go to in #{continent_select}"
print '> '
country_select = $stdin.gets.chomp.downcase
#puts "You should go to #{Dict.get(northamerica, "#{country_select}")}"
#=> You should go to Belmar
puts "You should go to #{Dict.get(continent_select, "#{country_select}")}"
#=> error
Ignore the get and set methods; they're defined in the required dict.rb.
Anyway look carefully at the last few lines. The Dict.get method has two arguments. The first finds which dictionary to use. If I just put northamerica as an argument it works. But if I put continent_select instead (assuming the user enters 'northamerica') it doesn't work. I think the program is looking for a Dictionary named continent_select, rather than looking for the variable continent_select.
UPDATE
Here's the whole dict.rb for those who asked.
module Dict
#creates a new dictionary for the user
def Dict.new(num_buckets=256)
#initializes a Dict with given num of buckets
#creates aDict variable which is an empty array
#that will hold our values later
aDict = []
#loop through 0 to the number of buckets
(0...num_buckets).each do |i|
#keeps adding arrays to aDict using push method
aDict.push([])
end
return aDict
#returns [[],[],[]] => array of empty arrays ready to go.
end
def Dict.hash_key(aDict, key)
# Given a key this will create a number and then convert
# it to an index for the aDict's buckets.
return key.hash % aDict.length
#key.hash makes the key a number
# % aDict.length keeps the number between 0 and aDict.length - 1 (0 to 255 by default)
end
def Dict.get_bucket(aDict, key)
#given a key, find where the bucket would go
#sets the key to a number and it's put in bucket_id variable
bucket_id = Dict.hash_key(aDict, key)
#returns the bucket (an array of [key, value] pairs) at that index
return aDict[bucket_id]
end
def Dict.get_slot(aDict, key, default=nil)
#returns the index, key, and value of a slot found in a bucket
#finds the bucket that this key would be stored in
bucket = Dict.get_bucket(aDict, key)
bucket.each_with_index do |kv, i|
k, v = kv
if key == k
return i, k, v
#returns index key was found in, key, and value
end
end
return -1, key, default
end
def Dict.get(aDict, key, default=nil)
#Gets the value in a bucket for the given key, or the default
i, k, v = Dict.get_slot(aDict, key, default)
return v
end
def Dict.set(aDict, key, value)
#sets the key to the value, replacing any existing value
bucket = Dict.get_bucket(aDict, key)
i, k, v = Dict.get_slot(aDict, key)
if i >= 0
bucket[i] = [key, value]
else
bucket.push([key, value])
end
end
def Dict.delete(aDict, key)
#deletes the given key from the Dict
bucket = Dict.get_bucket(aDict, key)
(0...bucket.length).each do |i|
k, v = bucket[i]
if key == k
bucket.delete_at(i)
break
end
end
end
def Dict.list(aDict)
#prints out what's in the dict
aDict.each do |bucket|
if bucket
bucket.each {|k, v| puts k, v}
end
end
end
end
Now there's some weird stuff going on.
In the first case, which seems to be okay, you pass the correct arguments:
Dict.get(northamerica, "#{country_select}")
That is: Dict instance as the first argument, and a String as the second. But then in the second case:
Dict.get(continent_select, "#{country_select}")
You pass a String instead of the Dict that Dict.get expects, and this results in an error: Dict.get_bucket ends up returning a one-character String rather than an array, and String has no each_with_index method for Dict.get_slot to call.
As far as I understand your intention, you want the user input to become a variable name to be used as the first argument, but that doesn't happen automatically, and you end up passing just a string.
What you need to do is explicitly map a user input to a corresponding Dict object, and then use it. It can look like this:
# fetch a Dict object that corresponds to "northamerica" string from a hash
# NOTE: it will raise an exception if a user enters something that's not present
# in a hash, i.e. something other than "northamerica"
selected_continent_dict = { "northamerica" => northamerica }.fetch(continent_select)
puts "You should go to #{Dict.get(selected_continent_dict, country_select)}"
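If raising an exception on unknown input is too abrupt, Hash#fetch also accepts a block that runs when the key is missing. A minimal sketch, where plain Hashes stand in for the Dict objects from dict.rb (an assumption, to keep it self-contained) and the helper name suggestion_for is hypothetical:

```ruby
# Plain Hashes stand in for the Dict objects from dict.rb.
northamerica = { "new york" => "New York City", "new jersey" => "Belmar" }

# Map the text a user can type to the object it should select.
CONTINENTS = { "northamerica" => northamerica }

def suggestion_for(continent_select, state_select)
  continent = CONTINENTS.fetch(continent_select) do
    # runs only when the key is missing; return exits suggestion_for
    return "Sorry, #{continent_select.inspect} is not a continent I know about."
  end
  spot = continent[state_select]
  spot ? "You should go to #{spot}" : "Sorry, no suggestion for #{state_select}."
end

puts suggestion_for("northamerica", "new jersey")
puts suggestion_for("europe", "france")
```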
If you're not allowed to use Ruby hashes, you can easily get away with a case statement:
selected_continent_dict = case continent_select
when "northamerica"
northamerica
else
raise "Invalid continent"
end
puts "You should go to #{Dict.get(selected_continent_dict, country_select)}"
Hope this helps!
P.S. Two more pieces of advice, if you don't mind:
There's no real need for string interpolation in the second argument, and something like Dict.get(northamerica, country_select) could be a cleaner way.
Better variable naming could save you from headaches. E.g. if you renamed the (quite misleading) country_select to user_state_selection_string, it would remind you that it is a string, and of what it holds. The example is arbitrary, though. There's a wonderful book called "Code Complete" by Steve McConnell which covers this and other issues much better than I do.

Iterating through Hash to output an unordered list

I have a Hash where most of the entries map a key to an array of two values. There is also another hash within this Hash, which is where I've been stuck.
Let's say the hash looks like:
{'sports'=>['football', 'basketball'], 'season'=>['summer','fall'], 'data'=>[{'holiday'=>'Christmas', 'genre' => 'Comedy'}, {'holiday'=>'Thanksgiving', 'genre' => 'Action'}]}
The output should look like:
Sports
- football
- basketball
Season
- summer
- fall
Holiday
- Christmas
- Thanksgiving
Genre
- Comedy
- Action
So far I have a helper that gives me everything except the data section.
def output_list_from(hash)
return if hash.empty?
content_tag(:ul) do
hash.map do |key, values|
content_tag(:li, key.to_s.humanize) +
content_tag(:ul) do
# if values.is_a?(Hash)...
content_tag(:li, values.first) +
content_tag(:li, values.last)
end
end.join.html_safe
end.html_safe
end
This returns the output:
Sports
- football
- basketball
Season
- summer
- fall
Data
- {'holiday'=>'Christmas', 'genre' => 'Comedy'}
- {'holiday'=>'Thanksgiving', 'genre' => 'Action'}
Which of course makes sense... so I've tried to check in the loop whether the value is a Hash, but the way it's set up has tricked me. I think it'd be easier if I knew what the hash would look like every time, but it would be a new hash each time. One time there could be a holiday within data, and another time there could be both holiday and genre.
Any advice would be appreciated.
You will need to create a hash with the correct format. Something like this:
hash = {'sports'=>['football', 'basketball'], 'season'=>['summer','fall'], 'data'=>[{'holiday'=>'Christmas', 'genre' => 'Comedy'}, {'holiday'=>'Thanksgiving', 'genre' => 'Action'}]}
formatted_data = hash.dup
data = formatted_data.delete('data')
if data
data.each do |item|
item.each do |k, v|
formatted_data[k] ||= []
formatted_data[k] << v
end
end
end
puts formatted_data
# => {"sports"=>["football", "basketball"], "season"=>["summer", "fall"],
# => "holiday"=>["Christmas", "Thanksgiving"], "genre"=>["Comedy", "Action"]}
content_tag(:ul) do
formatted_data.map do |key, values|
#... your code here...
end.join.html_safe
end.html_safe
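The same reshaping can also be written as one chain of Enumerable calls. This is a sketch of an alternative, not the approach above: it turns each hash under 'data' into [key, value] pairs, groups the pairs by key and keeps only the values. It assumes Ruby 2.4 or later for transform_values, and note that merge would overwrite, not append to, a top-level key that also appears inside 'data'.

```ruby
hash = { 'sports' => ['football', 'basketball'],
         'season' => ['summer', 'fall'],
         'data'   => [{ 'holiday' => 'Christmas',    'genre' => 'Comedy' },
                      { 'holiday' => 'Thanksgiving', 'genre' => 'Action' }] }

# Everything except 'data' is kept as-is.
rest = hash.reject { |k, _| k == 'data' }

# [{...}, {...}] -> [[k, v], ...] -> { k => [[k, v], ...] } -> { k => [v, ...] }
grouped = hash.fetch('data', [])
              .flat_map(&:to_a)
              .group_by(&:first)
              .transform_values { |pairs| pairs.map(&:last) }

formatted_data = rest.merge(grouped)
p formatted_data
```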
Suppose your hash looked like this:
hash = { 'sports'=>['football', 'basketball'],
'season'=>['summer', 'fall'],
'data1' =>[{ 'holiday'=>'Christmas', 'genre'=>'Comedy'},
{ 'holiday'=>'Thanksgiving', 'genre'=>'Action' }],
'data2' =>[{ 'data3'=>[{ 'sports'=>'darts', 'genre'=>'Occult' }] }]
}
and you wanted a general solution that would work for any number of levels and does not depend on the names of the keys that will not be in the resulting hash (here 'data1', 'data2' and 'data3'). Here's one way you could do that, using recursion.
Code
def extract(h, new_hash = {})
h.each do |k,v|
[*v].each do |e|
case e
when Hash then extract(e, new_hash)
else new_hash.update({ k=>[e] }) { |_,ov,nv| ov << nv.first }
end
end
end
new_hash
end
Example
extract(hash)
#=> {"sports"=>["football", "basketball", "darts"],
# "season"=>["summer", "fall"],
# "holiday"=>["Christmas", "Thanksgiving"],
# "genre"=>["Comedy", "Action", "Occult"]}
Explanation
There are, I think, mainly two things in the code that may require clarification.
#1
The first is the rather lonely and odd-looking expression:
[*v]
If v is an array, this returns an array with the same elements. If v is an object such as a String that does not respond to to_a, the splat simply wraps it, so it returns [v]. In other words, it leaves arrays' contents alone and converts such values to an array containing one element, itself. Ergo:
[*['football', 'basketball']] #=> ["football", "basketball"]
[*'Thanksgiving'] #=> ["Thanksgiving"]
This saves us the trouble of having three, rather than two, possibilities in the case statement. We simply convert non-array values to arrays of one element, allowing us to deal with just hashes and arrays.
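Two edge cases of the splat are worth knowing, since the splat calls to_a on values that are not arrays: nil produces an empty array, and a Hash is turned into [key, value] pairs. For extract this would mean a nil value contributes nothing, and a value that is a bare Hash (rather than an array of hashes) would be stored as pairs instead of being recursed into. A quick check:

```ruby
spread   = [*['football', 'basketball']]  # elements are spread out
wrapped  = [*'Thanksgiving']              # a String has no #to_a, so it is wrapped
from_nil = [*nil]                         # nil.to_a is [], so nothing is produced

h = { 'holiday' => 'Christmas' }
from_hash = [*h]                          # Hash#to_a gives [key, value] pairs

p spread, wrapped, from_nil, from_hash
```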
#2
The second snippet that may be unfamiliar to some is this:
new_hash.update({ k=>[e] }) { |_,ov,nv| ov << nv.first }
This uses the form of the method Hash#update (a.k.a. merge!) that uses a block to resolve the values of keys that are present in both hashes being merged. As an example, at some stage of the calculations, new_hash will have a key-value pair:
'sports'=>['football', 'basketball']
and is to be updated with the hash1:
{ 'sports'=>['darts'] }
Since both of these hashes have the key 'sports', the block is called upon as arbiter:
{ |k,ov,nv| ov << nv.first }
#=> { |'sports', ['football', 'basketball'], ['darts']| ov << nv.first }
#=> { |'sports', ['football', 'basketball'], ['darts']| ['football', 'basketball'] << 'darts' }
#=> ['football', 'basketball'] << 'darts'
#=> ['football', 'basketball', 'darts']
As I'm not using the key 'sports' in the block, I've replaced that block variable with a placeholder (_) to reduce opportunities for error and also to inform the reader that the key is not being used.
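A short sketch contrasting merge without a block (the argument's value simply wins) with update plus a block, and showing that the block is only consulted for keys present in both hashes:

```ruby
new_hash = { 'sports' => ['football', 'basketball'] }

# Without a block, the value from the argument replaces the old one.
replaced = new_hash.merge({ 'sports' => ['darts'] })

# With a block, the block decides the value for a duplicate key.
new_hash.update({ 'sports' => ['darts'] }) { |_key, ov, nv| ov << nv.first }

# 'season' is a new key, so the block is not called at all.
new_hash.update({ 'season' => ['summer'] }) { |_key, ov, nv| ov << nv.first }

p replaced
p new_hash
```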
1 I sometimes use darts as example of a sport because it is one of the few in which one can be successful without being extremely physically fit.

How can I speed up this Rails code?

It's a vague question, I know... but the performance of this block of code is horrible. It takes about 15 seconds from posting to the action to rendering the page...
The purpose of this action is to retrieve all Occupations from a CV, all the skills from that CV and the occupations. They need to be organized in 2 arrays:
the first array contains all the Occupations (no duplicates), ordered by their score. For each duplicate entry found, the score is increased by 1
the second array contains ALL the skills from both the occupation array and the CV. Again, no duplicates are allowed, but for every duplicate encountered the score of the existing entry is increased by one.
Below is the code block that performs this operation. It's relatively big compared to my other code snippets, but I hope it's understandable. I know working with the arrays like I do is confusing, but here is what each array position means:
position 0 : the actual skill/occupation object
position 1 : the score of the entry
position 2 : the location found in the db
position 3 : the location found in the cv
def categorize
@cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills])
@menu = :second
@language = Language.resolve(:code => :en, :name => :en)
@occupation_hashes = []
@skill_hashes = []
(@cv.desired_occupations + @cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if @cv.desired_occupations.include? occupation
section << 'Work experience' if @cv.past_occupations.include? occupation
unless (array = @occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
@occupation_hashes << [occupation, 1, section]
end
occupation.skills.each do |skill|
unless (array = @skill_hashes.assoc skill).blank?
label = occupation.concept.label(@language).value
array[1]+= 1
array[3] << label unless array[3].include? label
else
@skill_hashes << [skill, 1, [], [occupation.concept.label(@language).value]]
end
end
end
@cv.educational_skills.each do |skill|
unless (array = @skill_hashes.assoc skill).blank?
array[1]+= 1
array[3] << 'Education skills' unless array[3].include? 'Education skills'
else
@skill_hashes << [skill, 1, ['Education skills'], []]
end
end
# Sort the hashes
@occupation_hashes.sort! { |x,y| y[1] <=> x[1]}
@skill_hashes.sort! { |x,y| y[1] <=> x[1]}
@max = @skill_hashes.first[1]
@min = @skill_hashes.last[1]
end
I can post the additional models and migrations to make it clear what each class does, but I think the first few lines of the above script should be clear on the associations. I'm looking for a way to optimize the each-loops...
That's quite the block of code there. Generally, if you're writing methods that long, you're going to have trouble maintaining them in the future. A technique that would help is breaking up that monolithic chunk of code and turning it into a helper class that does the processing in more logical stages, making it easier to fine-tune aspects of it.
For instance, an interface might be:
@categorizer = CvCategorizer.new(params[:cv_id])
This would encapsulate all of the above and save it into instance variables made accessible by being declared with attr_reader.
Using a utility class means you can break up the initialization into steps that are made more clear:
def initialize(cv_id)
# Call a wrapper method that loads the CV
@cv = self.load_cv(cv_id)
# Perform discrete steps to re-order the imported data
self.organize_occupations
self.organize_skills
end
It's really hard to say why this is slow by just looking at it, though I would pay very close attention to log/development.log to see what's going on in there. It could be the initial load is painfully slow but the rest of the method is fine.
You should do a bit of profiling in your code to see what is taking a large chunk of time. You can figure out how to work one of the profilers, or just sprinkle some simple puts or logger.info statements with timestamps throughout your code. Probably the easiest way to do this is with Benchmark. Note: you may need to require 'benchmark'... not sure if it is auto-required in Rails or not.
For a single line, you can do something like this:
logger.info Benchmark.measure { @cv = Cv.find(params[:cv_id], :include => [:desired_occupations, :past_occupations, :educational_skills]) }
And for timing larger blocks of code (the parentheses matter here: without them, the do...end block would be passed to logger.info instead of Benchmark.measure):
logger.info(Benchmark.measure do
(@cv.desired_occupations + @cv.past_occupations).each do |occupation|
section = []
section << 'Desired occupation' if @cv.desired_occupations.include? occupation
section << 'Work experience' if @cv.past_occupations.include? occupation
unless (array = @occupation_hashes.assoc(occupation)).blank?
array[1] += 1
array[2] = (array[2] & section).uniq
else
@occupation_hashes << [occupation, 1, section]
end
end
end)
I'd just start with large blocks and then narrow it down. Not knowing how large of a dataset you are dealing with, it is hard to say what the problem zone is.
I also concur with others that you will be much better off breaking this thing into smaller methods. This will also make it easier to test for performance, as you can do things like:
Benchmark.measure { 10000.times { foo.do_that_thing_that_might_be_slow }}
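Beyond profiling, one suspect worth measuring is the repeated Array#assoc lookups in the loops: assoc scans the array from the start on every call, so building the lists this way grows roughly quadratically with the number of entries, whereas a Hash keyed by the record looks entries up in roughly constant time. The sketch below is plain Ruby, with a hypothetical Skill struct standing in for the models; it says nothing about database time, which log/development.log or the Benchmark calls above would still need to rule out.

```ruby
require 'benchmark'

# Hypothetical stand-in for a skill record (the real app uses ActiveRecord).
Skill = Struct.new(:id)

skills = Array.new(2_000) { |i| Skill.new(i % 500) }  # 500 distinct, each 4 times

# Array of [skill, count] rows with #assoc, as in the question:
# every lookup scans the rows from the beginning.
def count_with_assoc(skills)
  rows = []
  skills.each do |skill|
    if (row = rows.assoc(skill))
      row[1] += 1
    else
      rows << [skill, 1]
    end
  end
  rows
end

# Hash keyed by the skill, with a default count of 0.
def count_with_hash(skills)
  counts = Hash.new(0)
  skills.each { |skill| counts[skill] += 1 }
  counts
end

assoc_rows = nil
hash_counts = nil
puts Benchmark.realtime { assoc_rows = count_with_assoc(skills) }
puts Benchmark.realtime { hash_counts = count_with_hash(skills) }

p assoc_rows.to_h == hash_counts  # same counts either way
```

Sorting by score afterwards is then a single call such as hash_counts.sort_by { |_, n| -n }.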

What's the most efficient way to deep copy an object in Ruby?

I know that serializing an object is (to my knowledge) the only way to effectively deep-copy an object (as long as it isn't stateful like IO and whatnot), but is one way particularly more efficient than another?
For example, since I'm using Rails, I could always use ActiveSupport::JSON, to_xml - and from what I can tell marshalling the object is one of the most accepted ways to do this. I'd expect that marshalling is probably the most efficient of these since it's a Ruby internal, but am I missing anything?
Edit: note that its implementation is something I already have covered. I don't want to replace existing shallow copy methods (like dup and clone), so I'll likely just end up adding Object#deep_copy, the result of which will be whichever of the above methods (or any suggestions you have :) has the least overhead.
I was wondering the same thing, so I benchmarked a few different techniques against each other. I was primarily concerned with Arrays and Hashes - I didn't test any complex objects. Perhaps unsurprisingly, a custom deep-clone implementation proved to be the fastest. If you are looking for quick and easy implementation, Marshal appears to be the way to go.
I also benchmarked an XML solution with Rails 3.0.7, not shown below. It was much, much slower, ~10 seconds for only 1000 iterations (the solutions below all ran 10,000 times for the benchmark).
Two notes regarding my JSON solution. First, I used the C variant, version 1.4.3. Second, it doesn't actually work 100%, as symbols will be converted to Strings.
This was all run with ruby 1.9.2p180.
#!/usr/bin/env ruby
require 'benchmark'
require 'yaml'
require 'json/ext'
require 'msgpack'
def dc1(value)
Marshal.load(Marshal.dump(value))
end
def dc2(value)
YAML.load(YAML.dump(value))
end
def dc3(value)
JSON.load(JSON.dump(value))
end
def dc4(value)
if value.is_a?(Hash)
result = value.clone
value.each{|k, v| result[k] = dc4(v)}
result
elsif value.is_a?(Array)
result = value.clone
result.clear
value.each{|v| result << dc4(v)}
result
else
value # non-container values (e.g. strings) are returned as-is, not copied
end
end
def dc5(value)
MessagePack.unpack(value.to_msgpack)
end
value = {'a' => {:x => [1, [nil, 'b'], {'a' => 1}]}, 'b' => ['z']}
Benchmark.bm do |x|
iterations = 10000
x.report {iterations.times {dc1(value)}}
x.report {iterations.times {dc2(value)}}
x.report {iterations.times {dc3(value)}}
x.report {iterations.times {dc4(value)}}
x.report {iterations.times {dc5(value)}}
end
results in:
user system total real
0.230000 0.000000 0.230000 ( 0.239257) (Marshal)
3.240000 0.030000 3.270000 ( 3.262255) (YAML)
0.590000 0.010000 0.600000 ( 0.601693) (JSON)
0.060000 0.000000 0.060000 ( 0.067661) (Custom)
0.090000 0.010000 0.100000 ( 0.097705) (MessagePack)
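One caveat when reading these numbers: dc4 returns non-container values such as strings unchanged, so it copies less than Marshal does, which may account for part of its speed advantage. A quick check (dc4 is repeated here so the snippet runs on its own):

```ruby
def dc4(value)
  if value.is_a?(Hash)
    result = value.clone
    value.each { |k, v| result[k] = dc4(v) }
    result
  elsif value.is_a?(Array)
    result = value.clone
    result.clear
    value.each { |v| result << dc4(v) }
    result
  else
    value  # strings and other leaves are NOT copied
  end
end

original = { 'a' => ['b'] }
custom   = dc4(original)
marshal  = Marshal.load(Marshal.dump(original))

p custom['a'].equal?(original['a'])         # new Array
p custom['a'][0].equal?(original['a'][0])   # same String object as the original
p marshal['a'][0].equal?(original['a'][0])  # Marshal made a new String
```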
I think you need to add an initialize_copy method to the class you are copying. Then put the logic for the deep copy in there. Then when you call clone it will fire that method. I haven't done it but that's my understanding.
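A minimal sketch of that idea, using a CopyMe class like the one below. Ruby's dup and clone both call initialize_copy on the new object, passing the original, after the instance variables have been shallow-copied; here the "deep" part is simply giving the copy its own String, which is an assumption about what deep should mean for this class:

```ruby
class CopyMe
  attr_accessor :var

  def initialize(var = '')
    @var = var
  end

  # Called by both #dup and #clone on the new copy, with the original as
  # the argument; at this point @var still refers to the original's String.
  def initialize_copy(source)
    super
    @var = source.var.dup  # give the copy its own String
  end
end

a = CopyMe.new('test')
b = a.clone
b.var << ' changed'

puts a.var
puts b.var
```

With this in place, the original's a.var still reads test after the copy's string has been modified.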
I think plan B would be just overriding the clone method:
class CopyMe
attr_accessor :var
def initialize(var = '')
@var = var
end
def clone(deep = false)
deep ? CopyMe.new(@var.clone) : CopyMe.new
end
end
a = CopyMe.new("test")
puts "A: #{a.var}"
b = a.clone
puts "B: #{b.var}"
c = a.clone(true)
puts "C: #{c.var}"
Output
mike@sleepycat:~/projects$ ruby ~/Desktop/clone.rb
A: test
B:
C: test
I'm sure you could make that cooler with a little tinkering but for better or for worse that is probably how I would do it.
Probably the reason Ruby doesn't contain a deep clone has to do with the complexity of the problem. See the notes at the end.
To make a clone that will "deep copy," Hashes, Arrays, and elemental values, i.e., make a copy of each element in the original such that the copy will have the same values, but new objects, you can use this:
class Object
def deepclone
case
when self.class==Hash
hash = {}
self.each { |k,v| hash[k] = v.deepclone }
hash
when self.class==Array
array = []
self.each { |v| array << v.deepclone }
array
else
if defined?(self.class.new)
self.class.new(self)
else
self
end
end
end
end
If you want to redefine the behavior of Ruby's clone method, you can name it just clone instead of deepclone (in 3 places), but I have no idea how redefining Ruby's clone behavior will affect Ruby libraries, or Ruby on Rails, so Caveat Emptor. Personally, I can't recommend doing that.
For example:
a = {'a'=>'x','b'=>'y'} => {"a"=>"x", "b"=>"y"}
b = a.deepclone => {"a"=>"x", "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 15227640 / 15209520
If you want your classes to deepclone properly, their new method (initialize) must be able to deepclone an object of that class in the standard way, i.e., if the first parameter is given, it's assumed to be an object to be deepcloned.
Suppose we want a class M, for example. The first parameter must be an optional object of class M. Here we have a second optional argument z to pre-set the value of z in the new object.
class M
attr_accessor :z
def initialize(m=nil, z=nil)
if m
# deepclone all the variables in m to the new object
@z = m.z.deepclone
else
# default all the variables in M
@z = z # default is nil if not specified
end
end
end
The z pre-set is ignored during cloning here, but your method may have a different behavior. Objects of this class would be created like this:
# a new 'plain vanilla' object of M
m=M.new => #<M:0x0000000213fd88 @z=nil>
# a new object of M with m.z pre-set to 'g'
m=M.new(nil,'g') => #<M:0x00000002134ca8 @z="g">
# a deepclone of m in which the strings are the same value, but different objects
n=m.deepclone => #<M:0x00000002131d00 @z="g">
puts "#{m.z.object_id} / #{n.z.object_id}" => 17409660 / 17403500
Where objects of class M are part of a hash:
a = {'a'=>M.new(nil,'g'),'b'=>'y'} => {"a"=>#<M:0x00000001f8bf78 @z="g">, "b"=>"y"}
b = a.deepclone => {"a"=>#<M:0x00000001766f28 @z="g">, "b"=>"y"}
puts "#{a['a'].object_id} / #{b['a'].object_id}" => 12303600 / 12269460
puts "#{a['b'].object_id} / #{b['b'].object_id}" => 16811400 / 17802280
Notes:
If deepclone tries to clone an object which doesn't clone itself in the standard way, it may fail.
If deepclone tries to clone an object that can clone itself in the standard way but is a complex structure, the result may (and probably will) be only a shallow clone.
deepclone doesn't deep copy the keys in the Hashes. The reason is that they are not usually treated as data, but if you change hash[k] to hash[k.deepclone], they will be deep copied as well.
Certain elemental values, such as Fixnum (folded into Integer since Ruby 2.4), have no new method. These objects always have the same object ID, so deepclone returns them as-is rather than cloning them.
Be careful because when you deep copy, two parts of your Hash or Array that contained the same object in the original will contain different objects in the deepclone.
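The last note is easy to see in practice. This sketch repeats the deepclone method from above so it runs on its own, then deep-copies an array whose two slots hold the same String object (the variable names are illustrative):

```ruby
# deepclone, repeated from the answer above.
class Object
  def deepclone
    case
    when self.class == Hash
      hash = {}
      self.each { |k, v| hash[k] = v.deepclone }
      hash
    when self.class == Array
      array = []
      self.each { |v| array << v.deepclone }
      array
    else
      if defined?(self.class.new)
        self.class.new(self)
      else
        self
      end
    end
  end
end

shared = "Heated Seats"
original = [shared, shared] # both slots reference one String object
copy = original.deepclone

puts original[0].equal?(original[1]) # => true
puts copy[0].equal?(copy[1])         # => false: the sharing is lost
puts copy == original                # => true: the values still match
```

If shared references should survive the copy, deepclone would have to remember the objects it has already copied (for example in a Hash keyed by object_id); the version above does not.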

In Ruby, how to write a method to display any object's instance variable names and its values

Given any object in Ruby (on Rails), how can I write a method that displays that object's instance variable names and their values, like this:
@x: 1
@y: 2
@link_to_point: #<Point:0x10031b298 @y=20, @x=38>
(Update: inspect would do, except that for a large object it is hard to pick the variables out of 200 lines of output, as in Rails when you call request.inspect or self.inspect in the ActionView object.)
I also want to be able to append <br> to the end of each instance variable's value so they print nicely on a web page.
The difficulty now seems to be that not every instance variable has an accessor, so it can't be read with obj.send(var_name)
(where var_name has the "@" removed, so "@x" becomes "x").
Update: I suppose that, using recursion, it could print a more advanced version:
#<Point:0x10031b462>
@x: 1
@y: 2
@link_to_point: #<Point:0x10031b298>
@x=38
@y=20
I would probably write it like this:
class Object
def all_variables(root=true)
vars = {}
self.instance_variables.each do |var|
ivar = self.instance_variable_get(var)
vars[var] = [ivar, ivar.all_variables(false)]
end
root ? [self, vars] : vars
end
end
def string_variables(vars, lb="\n", indent="\t", current_indent="")
out = "#{vars[0].inspect}#{lb}"
current_indent += indent
out += vars[1].map do |var, ivar|
ivstr = string_variables(ivar, lb, indent, current_indent)
"#{current_indent}#{var}: #{ivstr}"
end.join
return out
end
def inspect_variables(obj, lb="\n", indent="\t", current_indent="")
string_variables(obj.all_variables, lb, indent, current_indent)
end
The Object#all_variables method produces an array containing (0) the given object and (1) a hash mapping instance variable names to arrays containing (0) the instance variable's value and (1) a hash mapping…. Thus, it gives you a nice recursive structure. The string_variables function turns that structure into an indented string; inspect_variables is just a convenience wrapper. Thus, print inspect_variables(foo) gives you a newline-separated version, and print inspect_variables(foo, "<br />\n") gives you the version with HTML line breaks. If you want to specify the indent, you can do that too: print inspect_variables(foo, "\n", "|---") produces a (useless) faux-tree format instead of tab-based indenting.
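For example, with a hypothetical Point class like the one in the question, the HTML variant could be used like this. The three helpers are repeated unchanged so the block runs on its own:

```ruby
# The three helpers below are copied from the answer above.
class Object
  def all_variables(root=true)
    vars = {}
    self.instance_variables.each do |var|
      ivar = self.instance_variable_get(var)
      vars[var] = [ivar, ivar.all_variables(false)]
    end
    root ? [self, vars] : vars
  end
end

def string_variables(vars, lb="\n", indent="\t", current_indent="")
  out = "#{vars[0].inspect}#{lb}"
  current_indent += indent
  out += vars[1].map do |var, ivar|
    ivstr = string_variables(ivar, lb, indent, current_indent)
    "#{current_indent}#{var}: #{ivstr}"
  end.join
  return out
end

def inspect_variables(obj, lb="\n", indent="\t", current_indent="")
  string_variables(obj.all_variables, lb, indent, current_indent)
end

# Hypothetical class used only for this demonstration.
class Point
  def initialize(x, y, link_to_point = nil)
    @x = x
    @y = y
    @link_to_point = link_to_point
  end
end

point = Point.new(1, 2, Point.new(38, 20))
html = inspect_variables(point, "<br />\n")
print html
```

Each nested variable lands on its own line, indented by one extra tab per level and ending in <br />; the first line is the root object's own inspect output.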
There ought to be a sensible way to write an each_variable function to which you provide a callback (which wouldn't have to allocate the intermediate storage); I'll edit this answer to include it if I think of something. Edit 1: I thought of something.
Here's another way to write it, which I think is slightly nicer:
class Object
def each_variable(name=nil, depth=0, parent=nil, &block)
yield name, self, depth, parent
self.instance_variables.each do |var|
self.instance_variable_get(var).each_variable(var, depth+1, self, &block)
end
end
end
def inspect_variables(obj, nl="\n", indent="\t", sep=': ')
out = ''
obj.each_variable do |name, var, depth, _parent|
out += [indent*depth, name, name ? sep : '', var.inspect, nl].join
end
return out
end
The Object#each_variable method takes a number of optional arguments, which are not designed to be specified by the user; instead, they are used by the recursion to maintain state. The given block is passed (a) the name of the instance variable, or nil if the variable is the root of the recursion; (b) the variable; (c) the depth to which the recursion has descended; and (d) the parent of the current variable, or nil if said variable is the root of the recursion. The recursion is depth-first. The inspect_variables function uses this to build up a string. The obj argument is the object to iterate through; nl is the line separator; indent is the indentation to be applied at each level; and sep separates the name and the value.
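To show what the block receives, this sketch (again with a hypothetical Point class, and each_variable repeated from above) records the depth and name at each step of the depth-first walk:

```ruby
# Object#each_variable, copied from the answer above.
class Object
  def each_variable(name=nil, depth=0, parent=nil, &block)
    yield name, self, depth, parent
    self.instance_variables.each do |var|
      self.instance_variable_get(var).each_variable(var, depth+1, self, &block)
    end
  end
end

# Hypothetical class used only for this demonstration.
class Point
  def initialize(x, y, link_to_point = nil)
    @x = x
    @y = y
    @link_to_point = link_to_point
  end
end

point = Point.new(1, 2, Point.new(38, 20))

# Collect [depth, name] pairs; the root is reported with a nil name at depth 0.
visited = []
point.each_variable do |name, _var, depth, _parent|
  visited << [depth, name]
end

p visited
# => [[0, nil], [1, :@x], [1, :@y], [1, :@link_to_point], [2, :@x], [2, :@y], [2, :@link_to_point]]
```

Like the original, this does not guard against cycles: an object graph that refers back to itself will recurse until Ruby raises SystemStackError.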
Edit 2: This doesn't really add anything to the answer to your question, but just to prove that we haven't lost anything in the reimplementation, here's a reimplementation of all_variables in terms of each_variable.
def all_variables(obj)
cur_depth = 0
root = [obj, {}]
tree = root
parents = []
prev = root
obj.each_variable do |name, var, depth, _parent|
next unless name
case depth <=> cur_depth
when -1 # We've gone back up
tree = parents.pop(cur_depth - depth)[0]
when +1 # We've gone down
parents << tree
tree = prev
else # We're at the same level
# Do nothing
end
cur_depth = depth
prev = tree[1][name] = [var, {}]
end
return root
end
I feel like it ought to be shorter, but that may not be possible: without the recursion, we have to maintain the stack explicitly (in parents). Still, the reimplementation works, which shows that each_variable can do everything all_variables does (and I think it's a little nicer).
I see; Antal must be giving the advanced version here.
The short version, then, is probably:
def p_each(obj)
obj.instance_variables.each do |v|
puts "#{v}: #{obj.instance_variable_get(v)}\n"
end
nil
end
or to return it as a string:
def sp_each(obj)
s = ""
obj.instance_variables.each do |v|
s += "#{v}: #{obj.instance_variable_get(v)}\n"
end
s
end
or shorter:
def sp_each(obj)
obj.instance_variables.map {|v| "#{v}: #{obj.instance_variable_get(v)}\n"}.join
end
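Since the question asks for <br> after each value, one option (an assumption, not part of the answer above) is to post-process the string from sp_each. Point here is a hypothetical two-variable class:

```ruby
# sp_each is the one-line version from above.
def sp_each(obj)
  obj.instance_variables.map {|v| "#{v}: #{obj.instance_variable_get(v)}\n"}.join
end

# Hypothetical class used only for this demonstration.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end

text = sp_each(Point.new(1, 2))
html = text.gsub("\n", "<br />\n") # HTML line breaks, as the question asked for

puts html
# @x: 1<br />
# @y: 2<br />
```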
This is a quick adaptation of a simple JSON emitter I wrote for another question:
class Object
def inspect!(indent=0)
return inspect if instance_variables.empty?
"#<#{self.class}:0x#{object_id.to_s(16)}\n#{' ' * indent+=1}#{
instance_variables.map {|var|
"#{var}: #{instance_variable_get(var).inspect!(indent)}"
}.join("\n#{' ' * indent}")
}\n#{' ' * indent-=1}>"
end
end
class Array
def inspect!(indent=0)
return '[]' if empty?
"[\n#{' ' * indent+=1}#{
map {|el| el.inspect!(indent) }.join(",\n#{' ' * indent}")
}\n#{' ' * indent-=1}]"
end
end
class Hash
def inspect!(indent=0)
return '{}' if empty?
"{\n#{' ' * indent+=1}#{
map {|k, v|
"#{k.inspect!(indent)} => #{v.inspect!(indent)}"
}.join(",\n#{' ' * indent}")
}\n#{' ' * indent-=1}}"
end
end
That's all the magic, really. Now we only need some simple defaults for some types where a full-on inspect doesn't really make sense (nil, false, true, numbers, etc.):
module InspectBang
def inspect!(indent=0)
inspect
end
end
[Numeric, Symbol, NilClass, TrueClass, FalseClass, String].each do |klass|
klass.send :include, InspectBang
end
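A quick way to try the emitter, assuming a hypothetical Point class like the one in the question. The methods are repeated so the block runs on its own; the only change is explicit parentheses around the indent updates (e.g. `' ' * (indent += 1)`) to make the order of evaluation obvious:

```ruby
# Copied from the answer above, with parentheses around the indent updates.
class Object
  def inspect!(indent=0)
    return inspect if instance_variables.empty?
    "#<#{self.class}:0x#{object_id.to_s(16)}\n#{' ' * (indent += 1)}#{
      instance_variables.map {|var|
        "#{var}: #{instance_variable_get(var).inspect!(indent)}"
      }.join("\n#{' ' * indent}")
    }\n#{' ' * (indent -= 1)}>"
  end
end

class Array
  def inspect!(indent=0)
    return '[]' if empty?
    "[\n#{' ' * (indent += 1)}#{
      map {|el| el.inspect!(indent) }.join(",\n#{' ' * indent}")
    }\n#{' ' * (indent -= 1)}]"
  end
end

class Hash
  def inspect!(indent=0)
    return '{}' if empty?
    "{\n#{' ' * (indent += 1)}#{
      map {|k, v|
        "#{k.inspect!(indent)} => #{v.inspect!(indent)}"
      }.join(",\n#{' ' * indent}")
    }\n#{' ' * (indent -= 1)}}"
  end
end

# Plain inspect is enough for these leaf types.
module InspectBang
  def inspect!(indent=0)
    inspect
  end
end

[Numeric, Symbol, NilClass, TrueClass, FalseClass, String].each do |klass|
  klass.send :include, InspectBang
end

# Hypothetical class used only for this demonstration.
class Point
  def initialize(x, y, link_to_point = nil)
    @x = x
    @y = y
    @link_to_point = link_to_point
  end
end

out = Point.new(1, 2, Point.new(38, 20)).inspect!
puts out
```

Each level of nesting adds one space of indentation, and each object's closing `>` lines up with the variables of its parent.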
Like this?
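require 'date'
# Get the instance variables of an object
d = Date.new
# instance_variables returns Symbols (Ruby 1.9+), so interpolate rather than using +
d.instance_variables.each { |i| puts "#{i}<br />" }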
Ruby Documentation on instance_variables.
The concept is commonly called "introspection", (to look into oneself).
