It's only my second post and I'm still learning ruby.
I'm trying to figure this out based on my Java knowledge but I can't seem to get it right.
What I need to do is:
I have a function that reads a file line by line and extracts different car features from each line, for example:
def convertListings2Catalogue (fileName)
  f = File.open(fileName, "r")
  f.each_line do |line|
    km = line[/[0-9]+km/]
    t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
    trans = ....
  end
end
Now for each line I need to store the extracted features into separate
hashes that I can access later in my program.
The issues I'm facing:
1) I'm overwriting the features in the same hash
2) Can't access the hash outside my function
This is what's in my file:
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC,
Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated
Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors,
Keyless Entry},19BF723A,2018,LE
Now my function extracts the features of each car model, but I need to store these features in 3 different hashes with the same keys but different values.
listing = Hash.new(0)
listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }
I tried moving the data from one hash to another, I even tried storing the data in an array first and then moving it to the hash but I still can't figure out how to create separate hashes inside a loop.
Thanks
I don't fully understand the question but I thought it was important to suggest how you might deal with a more fundamental issue: extracting the desired information from each line of the file in an effective and Ruby-like manner. Once you have that information, in the form of an array of hashes, one hash per line, you can do with it what you want. Alternatively, you could loop through the lines in the file, constructing a hash for each line and performing any desired operations before going on to the next line.
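As a minimal sketch of the "array of hashes, one hash per line" idea: the method below builds a fresh hash for every line and returns the whole collection, so nothing is overwritten and the caller can use the result outside the method. The name listings_from and the two extracted fields are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: builds one new hash per line and returns them all.
def listings_from(lines)
  lines.map do |line|
    {
      kilometers: line[/\d+(?=km)/i],                          # digits directly before "km"
      type:       line[/sedan|coupe|hatchback|station|suv/i]   # first body type found
    }
  end
end

sample = [
  "65101km,Sedan,Manual,2010",
  "coupe,1100km,auto,RWD"
]

listings = listings_from(sample)
listings.each { |h| p h }
```

Because the method only needs something that responds to map, the enumerator returned by File.foreach(file_name) could probably be passed in place of the array.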
Being new to Ruby you will undoubtedly find some of the code below difficult to understand. If you persevere, however, I think you will be able to understand all of it, and in the process learn a lot about Ruby. I've made some suggestions in the last section of my answer to help you decipher the code.
Code
words_by_key = {
type: %w| sedan coupe hatchback station suv |,
transmission: %w| auto manual steptronic |,
drivetrain: %w| fwd rwd awd |,
status: %w| used new |,
car_maker: %w| honda toyota mercedes bmw lexus |,
model: %w| camry clk crv |
}
#=> {:type=>["sedan", "coupe", "hatchback", "station", "suv"],
# :transmission=>["auto", "manual", "steptronic"],
# :drivetrain=>["fwd", "rwd", "awd"],
# :status=>["used", "new"],
# :car_maker=>["honda", "toyota", "mercedes", "bmw", "lexus"],
# :model=>["camry", "clk", "crv"]}
WORDS_TO_KEYS = words_by_key.each_with_object({}) { |(k,v),h| v.each { |s| h[s] = k } }
#=> {"sedan"=>:type, "coupe"=>:type, "hatchback"=>:type, "station"=>:type, "suv"=>:type,
# "auto"=>:transmission, "manual"=>:transmission, "steptronic"=>:transmission,
# "fwd"=>:drivetrain, "rwd"=>:drivetrain, "awd"=>:drivetrain,
# "used"=>:status, "new"=>:status,
# "honda"=>:car_maker, "toyota"=>:car_maker, "mercedes"=>:car_maker,
# "bmw"=>:car_maker, "lexus"=>:car_maker,
# "camry"=>:model, "clk"=>:model, "crv"=>:model}
module ExtractionMethods
def km(str)
str[/\A\d+(?=km\z)/]
end
def year(str)
str[/\A\d{4}\z/]
end
def stock(str)
return nil if str.end_with?('km')
str[/\A\d+\p{Alpha}\p{Alnum}*\z/]
end
def trim(str)
str[/\A\p{Alpha}{2}\z/]
end
def fuel_consumption(str)
str.to_f if str[/\A\d+(?:\.\d+)?(?=l\/100km\z)/]
end
end
class K
include ExtractionMethods
def extract_hashes(fname)
File.foreach(fname).with_object([]) do |line, arr|
line = line.downcase
idx_left = line.index('{')
idx_right = line.index('}')
if idx_left && idx_right
g = { set_of_features: line[idx_left..idx_right] }
line[idx_left..idx_right] = ''
line.squeeze!(',')
else
g = {}
end
arr << line.split(',').each_with_object(g) do |word, h|
word.strip!
if WORDS_TO_KEYS.key?(word)
h[WORDS_TO_KEYS[word]] = word
else
ExtractionMethods.instance_methods.find do |m|
v = public_send(m, word)
(h[m] = v) unless v.nil?
v
end
end
end
end
end
end
Example
data =<<BITTER_END
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE
BITTER_END
FILE_NAME = 'temp'
File.write(FILE_NAME, data)
#=> 353 (characters written to file)
k = K.new
#=> #<K:0x00000001c257d348>
k.extract_hashes(FILE_NAME)
#=> [{:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry}",
# :km=>"65101", :type=>"sedan", :transmission=>"manual", :year=>"2010",
# :stock=>"18131a", :drivetrain=>"fwd", :status=>"used", :fuel_consumption=>5.5,
# :car_maker=>"toyota", :model=>"camry", :trim=>"se"},
# {:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry, power seats}",
# :type=>"coupe", :km=>"1100", :transmission=>"auto", :drivetrain=>"rwd",
# :model=>"clk", :trim=>"lx", :stock=>"18fo724a", :year=>"2017",
# :fuel_consumption=>6.0, :status=>"used"},
# {:set_of_features=>"{heated seats, heated mirrors, keyless entry}",
# :drivetrain=>"awd", :type=>"suv", :km=>"0", :transmission=>"auto",
# :status=>"new", :car_maker=>"honda", :model=>"crv", :fuel_consumption=>8.0,
# :stock=>"19bf723a", :year=>"2018", :trim=>"le"}]
Explanation
Firstly, note that the HEREDOC needs to be un-indented before being executed.
You will see that the instance method K#extract_hashes uses the class method IO::foreach to read the file line-by-line.1
The first step in processing each line of the file is to downcase it. You will then want to split the string on commas to form an array of words. There is a problem, however, in that you don't want to split on commas that are between a left and right brace ({ and }), which corresponds to the key :set_of_features. I decided to deal with that by determining the indices of the two braces, creating a hash with the single key :set_of_features, deleting that substring from the line and lastly replacing the resulting pair of adjacent commas with a single comma:
idx_left = line.index('{')
idx_right = line.index('}')
if idx_left && idx_right
g = { set_of_features: line[idx_left..idx_right] }
line[idx_left..idx_right] = ''
line.squeeze!(',')
else
g = {}
end
See String for the documentation of the String methods used here and elsewhere.
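To make the brace handling concrete, here is a small worked example on a single made-up line (the sample string is an assumption for illustration, not taken from the file):

```ruby
# .dup gives a mutable copy, since the steps below modify the string in place.
line = "awd,suv,0km,{heated seats, heated mirrors},19bf723a\n".dup

idx_left  = line.index('{')
idx_right = line.index('}')

features = line[idx_left..idx_right]   # the braces and everything between them
line[idx_left..idx_right] = ''         # cut that substring out of the line
line.squeeze!(',')                     # collapse the ",," left behind into ","

words = line.split(',').map(&:strip)
p features
p words
```

After these steps the commas inside the braces can no longer interfere with the split.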
We can now convert the resulting line to an array of words by splitting on the commas. If any capitalization is desired in the output this should be done after the hashes have been constructed.
We will build on the hash { set_of_features: line[idx_left..idx_right] } just created. When complete, it will be appended to the array being returned.
Each element (word) in the array is then processed. If it is a key of the hash WORDS_TO_KEYS we set
h[WORDS_TO_KEYS[word]] = word
and are finished with that word. If not, we call each of the instance methods m in the module ExtractionMethods with word as its argument until one returns a value v that is not nil. When that is found another key-value pair is added to the hash h:
h[m] = v
Notice that the name of each instance method in ExtractionMethods, which is a symbol (e.g., :km), is a key in the hash h. Having separate methods facilitates debugging and testing.
I could have written:
if (s = km(word))
  s
elsif (s = year(word))
  s
elsif (s = stock(word))
  s
elsif (s = trim(word))
  s
elsif (s = fuel_consumption(word))
  s
end
but since all these methods take the same argument, word, we can instead use Object#public_send:
a = [:km, :year, :stock, :trim, :fuel_consumption]
a.find do |m|
v = public_send(m, word)
(h[m] = v) unless v.nil?
v
end
A final tweak is to put all the methods in the array a in a module ExtractionMethods and include that module in the class K. We can then replace a in the find expression above with ExtractionMethods.instance_methods. (See Module#instance_methods.)
Suppose now that the data are changed so that additional fields are added (e.g., for "colour" or "price"). Then the only modifications to the code required are changes to words_by_key and/or the addition of methods to ExtractionMethods.
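As a rough sketch of that kind of extension, the cut-down module below gains a price method and the dispatch code picks it up without any other change. MiniExtraction, MiniParser, classify and the "$" followed by digits price format are all hypothetical names and assumptions made for this illustration.

```ruby
# Hypothetical cut-down version of the extraction module with one new method.
module MiniExtraction
  def km(str)
    str[/\A\d+(?=km\z)/]
  end

  # Assumed format for illustration: a dollar sign followed by digits.
  def price(str)
    str[/\A\$\d+\z/]
  end
end

class MiniParser
  include MiniExtraction

  # Tries every method of the module on the word and keeps the non-nil results.
  def classify(word)
    MiniExtraction.instance_methods.each_with_object({}) do |m, h|
      v = public_send(m, word)
      h[m] = v unless v.nil?
    end
  end
end

parser = MiniParser.new
p parser.classify("1100km")
p parser.classify("$25000")
p parser.classify("sedan")
```

The dispatch loop never names price explicitly; adding the method to the module is enough for it to be tried.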
Understanding the code
It may be helpful to run the code with puts statements inserted. For example,
idx_left = line.index('{')
idx_right = line.index('}')
puts "idx_left=#{idx_left}, idx_right=#{idx_right}"
Where code is chained it may be helpful to break it up with temporary variables and insert puts statements. For example, change
arr << line.split(',').each_with_object(g) do |word, h|
...
to
a = line.split(',')
puts "line.split(',')=#{a}"
enum = a.each_with_object(g)
puts "enum.to_a=#{enum.to_a}"
arr << enum.each do |word, h|
...
The second puts here is merely to see what elements the enumerator enum will generate and pass to the block.
Another way of doing that is to use the handy method Object#tap, which is inserted between two methods:
arr << line.split(',').tap { |a| puts "line.split(',')=#{a}"}.
each_with_object(g) do |word, h|
...
tap (great name, eh?), as used here, simply returns its receiver after displaying its value.
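A tiny standalone illustration of tap in the middle of a chain (the input string is made up):

```ruby
words = "awd, suv ,0km"
  .split(',')
  .tap { |a| puts "after split: #{a.inspect}" }   # peek at the intermediate array
  .map(&:strip)
# prints: after split: ["awd", " suv ", "0km"]

p words
```

The chain produces the same result with or without the tap line, which is what makes it convenient for temporary debugging.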
Lastly, I've used the method Enumerable#each_with_object in a couple of places. It may seem complex but it's actually quite simple. For example,
arr << line.split(',').each_with_object(g) do |word, h|
...
end
is effectively equivalent to:
h = g
line.split(',').each do |word|
  ...
end
arr << h
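The equivalence can be checked on a toy example. The words and the seed hash here are invented; dup is used so that both versions start from the same contents.

```ruby
words = %w[fwd used]
seed  = { type: "sedan" }

# With each_with_object the block receives each word plus the object,
# and the object itself is the return value.
h1 = words.each_with_object(seed.dup) { |word, h| h[word.to_sym] = true }

# The same thing written out by hand.
h2 = seed.dup
words.each { |word| h2[word.to_sym] = true }

p h1
p h1 == h2
```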
1 Many IO methods are typically invoked on File. This is acceptable because File.superclass #=> IO.
You could leverage the fact that your file instance is an enumerable. This allows you to leverage the inject method, and you can seed that with an empty hash. collector in this case is the hash that gets passed along as the iteration continues. Be sure to (implicitly, by having collector be the last line of the block) return the value of collector as the inject method will use this to feed into the next iteration. It's some pretty powerful stuff!
I think this is roughly what you're going for. I used model as the key in the hash, and set_of_features as your data.
def convertListings2Catalogue (fileName)
f = File.open(fileName, "r")
my_hash = f.inject({}) do |collector, line|
km=line[/[0-9]+km/]
t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
trans = line[(Regexp.union(/auto/i, /manual/i, /steptronic/i))]
dt = line[(Regexp.union(/fwd/i, /rwd/i, /awd/i))]
status = line[(Regexp.union(/used/i, /new/i))]
car_maker = line[(Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i))]
stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
trim = line.scan(/\b[a-zA-Z]{2}\b/).first
fuel = line.scan(/[\d.]+L\/\d*km/).first
set_of_features = line.scan(/\{(.*?)\}/).first
model = line[(Regexp.union(/camry/i, /clk/i, /crv/i))]
collector[model] = set_of_features
collector
end
end
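Here is a trimmed-down sketch of the same inject pattern that runs without a file. The two sample lines are shortened versions of the data, and line[regex, 1] is used so that the captured text comes back as a plain string.

```ruby
lines = [
  "65101km,Sedan,Toyota,camry,{AC, Heated Seats}",
  "AWD,SUV,Honda,CRV,{Heated Mirrors, Keyless Entry}"
]

catalogue = lines.inject({}) do |collector, line|
  model    = line[/camry|clk|crv/i]
  features = line[/\{(.*?)\}/, 1]      # text between the braces, without the braces
  collector[model] = features
  collector                            # the block's value feeds the next iteration
end

p catalogue
```

If the final collector line were left out, the next iteration would receive the value of the assignment instead of the hash.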
I'm working with the LinkedIn API to get companies' details. They are sending an XML response, so I simply converted the XML to a hash using the .to_hash method
This is a sample hash I'm getting: http://pastebin.com/1bXtHZ2F
Some companies have more than one location and more than one set of contact information. I want to parse this data and get details such as phone number, city, postal_code etc.
The structure of the response is not consistent. Sometimes location field itself is missing or the postal_code is available only at the fourth location.
I tried two ways:
1.
def phone(locations)
(locations && locations["values"][0]["contactInfo"]["phone1"]) || nil
end
This does not work if the phone number is not available in the first element of the array.
2.
def phone(locations)
if locations["locations"]["total"].to_i == 1
locations["locations"]["location"]["contact_info"]["phone1"]
else
locations["locations"]["location"].each do |p|
if (!p["contact_info"]["phone1"].nil? || !p['contact_info'].nil?)
return p["contact_info"]["phone1"]
break
end
end
end
end
This is not working if the "location" hash itself is missing from the response. I need a solution where I can search with the keys "city", "phone" and "postal_code" and update if it is present. If it returns an array, parse the array and get the non-empty data.
I've also read this StackOverflow answer.
I see this as a question about code confidence. That is, I'm betting you can figure out how to guess your way through all the possible conditions... but that will create a mess of unconfident code. Confident code states what it wants and it gets it and moves on. (Note: I get all of my inspiration on this topic from this wonderful book: http://www.confidentruby.com/ by Avdi Grimm).
That said, I'd recommend the following.
Install the naught gem: https://github.com/avdi/naught
In your code, utilize the Maybe conversion function (read through the gem documentation for info) to confidently arrive at your values:
At the top of your class or controller:
NullObject = Naught.build
include NullObject::Conversions
In your method:
def phone(locations)
  return {} if locations["locations"].blank?
  # inject yields the accumulator first, then the current element
  Maybe(locations["locations"])["location"].to_a.inject({}) do |acc, location|
    contact_info = Maybe(location["contact_info"])
    acc[location] = {}
    acc[location][:city] = contact_info["city1"].to_s
    acc[location][:phone] = contact_info["phone1"].to_i
    acc[location][:postal_code] = contact_info["postal_code1"].to_s
    acc
  end
end
I'm not sure exactly what you're trying to accomplish, but the above may be a start. It simply assumes that all of the keys exist. Whether they do or not, the values get converted to an object (an array, a string or an integer) and are ultimately collected into a hash (called acc -- short for "accumulator" -- inside the loop above) to be returned.
If any of the above needs clarification let me know and we can chat.
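For comparison, here is a sketch of the same "ask for it and move on" idea without the gem, using Hash#dig (Ruby 2.3 or later). This is a different technique from the Maybe conversion above. The helper name first_contact_value and the locations / location / contact_info nesting are assumptions based on the code in the question; the real response may be shaped differently.

```ruby
# Hypothetical helper: returns the first non-empty value for `key` found in any
# location's contact_info, or nil when nothing is available.
def first_contact_value(company, key)
  locations = company.dig("locations", "location")
  # A single location arrives as a Hash, several as an Array; treat both alike.
  locations = [locations] if locations.is_a?(Hash)
  Array(locations).each do |location|
    value = location.dig("contact_info", key)
    return value unless value.nil? || value.to_s.empty?
  end
  nil
end

company = {
  "locations" => {
    "total" => "2",
    "location" => [
      { "contact_info" => { "phone1" => nil } },
      { "contact_info" => { "phone1" => "555-0100" } }
    ]
  }
}

p first_contact_value(company, "phone1")
p first_contact_value({}, "phone1")
```

A missing "locations" or "contact_info" key simply produces nil rather than an exception.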
OK, this code works through the hash and isn't concerned about node names (other than the specific nodes it's searching for).
The find_and_get_values method takes two arguments: the object to search and an array of nodes to find. It will only return a result if all nodes in the array are siblings under the same parent node (so "city" and "postal_code" must be under the same parent, otherwise neither is returned).
The data returned is a simple hash.
The get_values method takes one argument (the company hash) and calls find_and_get_values twice, once for %w(city postal_code) and once for %w(phone1) and merges the hash results into one hash.
def get_values(company)
answer = {}
answer.merge!(find_and_get_values(company["locations"], %w(city postal_code)))
answer.merge!(find_and_get_values(company["locations"], ["phone1"]))
answer
end
def find_and_get_values(source, match_keys)
return {} if source.nil?
if source.kind_of?(Array)
source.each do |sub_source|
result = find_and_get_values(sub_source, match_keys)
return result unless result.empty?
end
else
result = {}
if source.kind_of?(Hash)
match_keys.each do |key|
result[key] = source[key] unless source[key].nil?
end
return result if result.count == match_keys.count
source.each do |sub_source|
result = find_and_get_values(sub_source, match_keys)
return result unless result.empty?
end
end
end
return {}
end
p get_values(company)
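To try the idea on data, here is a condensed, self-contained restatement of the same recursive sibling search, run against a made-up company hash. The method name find_siblings, the nesting and the values are invented for illustration, and Hash#slice assumes Ruby 2.5 or later.

```ruby
# Condensed sketch: returns the first hash that contains every key in
# match_keys (with non-nil values), reduced to just those keys.
def find_siblings(source, match_keys)
  case source
  when Hash
    found = source.slice(*match_keys).compact
    return found if found.size == match_keys.size
    find_siblings(source.values, match_keys)   # otherwise search the children
  when Array
    source.each do |item|
      result = find_siblings(item, match_keys)
      return result unless result.empty?
    end
    {}
  else
    {}
  end
end

company = {
  "locations" => {
    "location" => [
      { "address" => { "city" => "Pune" } },   # no postal_code here
      { "address" => { "city" => "Chennai", "postal_code" => "600001" },
        "contact_info" => { "phone1" => "555-0100" } }
    ]
  }
}

answer = find_siblings(company["locations"], %w[city postal_code])
           .merge(find_siblings(company["locations"], ["phone1"]))
p answer
```

The first location is skipped for the city lookup because its city has no postal_code sibling.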
Given any object in Ruby (on Rails), how can I write a method so that it will display that object's instance variable names and its values, like this:
@x: 1
@y: 2
@link_to_point: #<Point:0x10031b298 @y=20, @x=38>
(Update: inspect will do, except that for a large object it is difficult to pick out the variables from the 200 lines of output, as in Rails when you call request.inspect or self.inspect in the ActionView object.)
I also want to be able to append <br> to the end of each instance variable's value so as to print them out nicely on a webpage.
The difficulty now seems to be that not every instance variable has an accessor, so it can't be called with obj.send(var_name)
(the var_name has the "@" removed, so "@x" becomes "x")
Update: I suppose that, using recursion, it can print out a more advanced version:
#<Point:0x10031b462>
@x: 1
@y: 2
@link_to_point: #<Point:0x10031b298>
@x=38
@y=20
I would probably write it like this:
class Object
def all_variables(root=true)
vars = {}
self.instance_variables.each do |var|
ivar = self.instance_variable_get(var)
vars[var] = [ivar, ivar.all_variables(false)]
end
root ? [self, vars] : vars
end
end
def string_variables(vars, lb="\n", indent="\t", current_indent="")
out = "#{vars[0].inspect}#{lb}"
current_indent += indent
out += vars[1].map do |var, ivar|
ivstr = string_variables(ivar, lb, indent, current_indent)
"#{current_indent}#{var}: #{ivstr}"
end.join
return out
end
def inspect_variables(obj, lb="\n", indent="\t", current_indent="")
string_variables(obj.all_variables, lb, indent, current_indent)
end
The Object#all_variables method produces an array containing (0) the given object and (1) a hash mapping instance variable names to arrays containing (0) the instance variable and (1) a hash mapping…. Thus, it gives you a nice recursive structure. The string_variables function prints out that hash nicely; inspect_variables is just a convenience wrapper. Thus, print inspect_variables(foo) gives you a newline-separated option, and print inspect_variables(foo, "<br />\n") gives you the version with HTML line breaks. If you want to specify the indent, you can do that too: print inspect_variables(foo, "\n", "|---") produces a (useless) faux-tree format instead of tab-based indenting.
There ought to be a sensible way to write an each_variable function to which you provide a callback (which wouldn't have to allocate the intermediate storage); I'll edit this answer to include it if I think of something. Edit 1: I thought of something.
Here's another way to write it, which I think is slightly nicer:
class Object
def each_variable(name=nil, depth=0, parent=nil, &block)
yield name, self, depth, parent
self.instance_variables.each do |var|
self.instance_variable_get(var).each_variable(var, depth+1, self, &block)
end
end
end
def inspect_variables(obj, nl="\n", indent="\t", sep=': ')
out = ''
obj.each_variable do |name, var, depth, _parent|
out += [indent*depth, name, name ? sep : '', var.inspect, nl].join
end
return out
end
The Object#each_variable method takes a number of optional arguments, which are not designed to be specified by the user; instead, they are used by the recursion to maintain state. The given block is passed (a) the name of the instance variable, or nil if the variable is the root of the recursion; (b) the variable; (c) the depth to which the recursion has descended; and (d), the parent of the current variable, or nil if said variable is the root of the recursion. The recursion is depth-first. The inspect_variables function uses this to build up a string. The obj argument is the object to iterate through; nl is the line separator; indent is the indentation to be applied at each level; and sep separates the name and the value.
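One caveat with any recursive walk over instance variables is that objects which refer back to each other would make it recurse forever. The sketch below is a hypothetical standalone variant, not the method above: dump_ivars adds nothing to Object and keeps a set of visited object ids to stop at cycles. The Point class is made up to match the question's example.

```ruby
require 'set'

# Hypothetical standalone variant: a set of visited object ids stops the
# recursion when objects refer back to each other.
def dump_ivars(obj, nl = "\n", indent = "\t", depth = 0, seen = Set.new)
  return "" if seen.include?(obj.object_id)
  seen << obj.object_id
  obj.instance_variables.map do |name|
    value = obj.instance_variable_get(name)
    "#{indent * depth}#{name}: #{value.inspect}#{nl}" +
      dump_ivars(value, nl, indent, depth + 1, seen)
  end.join
end

class Point
  def initialize(x, y, link = nil)
    @x, @y, @link_to_point = x, y, link
  end
end

a = Point.new(1, 2)
b = Point.new(38, 20, a)
a.instance_variable_set(:@link_to_point, b)   # deliberately create a cycle

print dump_ivars(a, "<br />\n")
```

Passing "<br />\n" as the line separator gives the HTML-friendly output asked for in the question.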
Edit 2: This doesn't really add anything to the answer to your question, but: just to prove that we haven't lost anything in the reimplementation, here's a reimplementation of all_variables in terms of each_variable.
def all_variables(obj)
cur_depth = 0
root = [obj, {}]
tree = root
parents = []
prev = root
obj.each_variable do |name, var, depth, _parent|
next unless name
case depth <=> cur_depth
when -1 # We've gone back up
tree = parents.pop(cur_depth - depth)[0]
when +1 # We've gone down
parents << tree
tree = prev
else # We're at the same level
# Do nothing
end
cur_depth = depth
prev = tree[1][name] = [var, {}]
end
return root
end
I feel like it ought to be shorter, but that may not be possible; because we don't have the recursion now, we have to maintain the stack explicitly (in parents). But it is possible, so the each_variable method works just as well (and I think it's a little nicer).
I see... Antal must be giving the advanced version here...
the short version then probably is:
def p_each(obj)
obj.instance_variables.each do |v|
puts "#{v}: #{obj.instance_variable_get(v)}\n"
end
nil
end
or to return it as a string:
def sp_each(obj)
s = ""
obj.instance_variables.each do |v|
s += "#{v}: #{obj.instance_variable_get(v)}\n"
end
s
end
or shorter:
def sp_each(obj)
obj.instance_variables.map {|v| "#{v}: #{obj.instance_variable_get(v)}\n"}.join
end
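Since the question also asks for <br> after each value, the string version could take the separator as a parameter. This is a small sketch: the sep parameter and the Point class are additions for illustration, and inspect is used so that strings and nested objects print unambiguously.

```ruby
def sp_each(obj, sep = "\n")
  obj.instance_variables.map do |v|
    "#{v}: #{obj.instance_variable_get(v).inspect}#{sep}"
  end.join
end

class Point
  def initialize(x, y)
    @x = x
    @y = y
  end
end

puts sp_each(Point.new(1, 2), "<br>\n")
# prints:
# @x: 1<br>
# @y: 2<br>
```

instance_variable_get reads the value directly, so no accessor methods are needed on the object.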
This is a quick adaptation of a simple JSON emitter I wrote for another question:
class Object
def inspect!(indent=0)
return inspect if instance_variables.empty?
"#<#{self.class}:0x#{object_id.to_s(16)}\n#{' ' * indent+=1}#{
instance_variables.map {|var|
"#{var}: #{instance_variable_get(var).inspect!(indent)}"
}.join("\n#{' ' * indent}")
}\n#{' ' * indent-=1}>"
end
end
class Array
def inspect!(indent=0)
return '[]' if empty?
"[\n#{' ' * indent+=1}#{
map {|el| el.inspect!(indent) }.join(",\n#{' ' * indent}")
}\n#{' ' * indent-=1}]"
end
end
class Hash
def inspect!(indent=0)
return '{}' if empty?
"{\n#{' ' * indent+=1}#{
map {|k, v|
"#{k.inspect!(indent)} => #{v.inspect!(indent)}"
}.join(",\n#{' ' * indent}")
}\n#{' ' * indent-=1}}"
end
end
That's all the magic, really. Now we only need some simple defaults for some types where a full-on inspect doesn't really make sense (nil, false, true, numbers, etc.):
module InspectBang
def inspect!(indent=0)
inspect
end
end
[Numeric, Symbol, NilClass, TrueClass, FalseClass, String].each do |klass|
klass.send :include, InspectBang
end
Like this?
# Get the instance variables of an object
require 'date'
d = Date.new
# instance_variables returns symbols, so interpolate rather than using +
d.instance_variables.each { |i| puts "#{i}<br />" }
Ruby Documentation on instance_variables.
The concept is commonly called "introspection", (to look into oneself).