Parsing XML sheet with Rails Nokogiri XPath for attributes and elements - ruby-on-rails

I'm trying to use Nokogiri in a Rails 4.2.0 environment to parse a data sheet of classes. I want each course parsed, with its @catalog_nbr and @subject attributes stored, along with the first instructor listed. The code below simply yields empty arrays. I believe the problem has to do with how I'm using the .each method, but I can't figure it out!
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("//course").each do
num = doc.xpath("./@catalog_nbr").text
subject = doc.xpath("./@subject").text
instructor = doc.xpath("./sections/section/meeting/instructors/instructor")[1].text
Course.create(:subject => subject, :number => num, :instructor => instructor)
end

Try this.
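After selecting the doc, traverse each course node in the document. Call each one row, and run the XPath queries against row rather than against the whole doc.
Next, assign default values if they are blank. Note that text returns an empty string, not nil, when nothing matches, so a bare || never fires; in Rails, presence turns a blank string into nil first. at_xpath returns the first matching instructor (or nil), whereas [1] would pick the second one.
doc.xpath("//course").each do |row|
num = row.xpath("./@catalog_nbr").text.presence || "N/A"
subject = row.xpath("./@subject").text.presence || "N/A"
instructor = row.at_xpath("./sections/section/meeting/instructors/instructor").try(:text).presence || "N/A"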
Course.create(:subject => subject, :number => num, :instructor => instructor)
end
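One caveat with an `|| "N/A"` fallback is worth spelling out: Nokogiri's text returns an empty string rather than nil when the XPath matches nothing, and in Ruby an empty string is truthy, so a bare || only helps when the value is actually nil. The sketch below uses plain strings in place of Nokogiri results; default_if_blank is a hypothetical helper made up for illustration, not part of Nokogiri or Rails.

```ruby
# Hypothetical helper: fall back when the value is nil or only whitespace.
def default_if_blank(value, fallback)
  (value.nil? || value.strip.empty?) ? fallback : value
end

missing = ""                  # what the text of an unmatched XPath looks like
naive   = missing || "N/A"    # "" is truthy, so the fallback never fires
safe    = default_if_blank(missing, "N/A")

p naive
p safe
p default_if_blank("1110", "N/A")
```

In a Rails app, ActiveSupport's presence covers the same case: value.presence || "N/A".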

Here's a working solution. Note that the XML file you have linked to always has both catalog numbers and subjects for each course, so there's no need for any || "N/A" there (but perhaps it's nice to be safe):
require 'nokogiri'
require 'open-uri'
doc = Nokogiri.XML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("/courses/course").each do |course|
num = course["catalog_nbr"] || "N/A" # in case it doesn't exist
subj = course["subject"] || "N/A" # in case it doesn't exist
inst = (course.at("sections/section/meeting/instructors/instructor/text()") || "N/A").to_s
data = { subject:subj, number:num, instructor:inst }
p data
end
#=> {:subject=>"CS", :number=>"1110", :instructor=>"Van Loan,C (cfv3)"}
#=> {:subject=>"CS", :number=>"1112", :instructor=>"Fan,K (kdf4)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1132", :instructor=>"Fan,K (kdf4)"}
#=> etc.

Related

Needing to find matches in my controller from an array! Ruby on Rails

I'm stuck. I was able to get the record into @set1 and now need to compare @set1 with @set2 and see how many matches there are. I can get @array1 to work correctly if @array2 holds static numbers, but not when I build it dynamically.
I need a way to compare these two arrays and am at a loss!
def show
@set1 = Set1.find(params[:id])
@set2 = Set2.where(:date => @set1.date)
@array1 = [Set1.find(params[:id]).let1, Set1.find(params[:id]).let2]
@array2 = [Winnings.where(:date => @set1.date).let1, Winnings.where(:date => @set1.date).let2]
@intersection = @array1 & @array2
end
I think part of the problem here is that you can create new objects with the same attributes that nonetheless do not compare as equal, and the intersection operator & relies on those comparisons.
Example:
class Thing
attr_reader :first,:last
def initialize(first,last)
@first = first
@last = last
end
end
thing1 = Thing.new("John","Smith")
thing2 = Thing.new("John","Smith")
thing1 == thing2
# => false
[thing1] & [thing2]
# => []
You might consider mapping each array to some identifying value (maybe id) and finding the intersection of those arrays. That is to say:
@set1 = Set1.find(params[:id])
@set2 = Set2.where(:date => @set1.date)
@array1 = [Set1.find(params[:id]).let1, Set1.find(params[:id]).let2]
@array2 = [Winnings.where(:date => @set1.date).let1, Winnings.where(:date => @set1.date).let2]
@array1.map{|obj| obj.id} & @array2.map{|obj| obj.id}
# => an array of unique object ids that are in both @array1 and @array2
Or, if you want the objects themselves...
(@array1.map{|obj| obj.id} & @array2.map{|obj| obj.id}).map{ |id| Set.find(id) }
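Another option, if you control the class, is to teach it what equality means. Array#& compares elements using hash and eql?, so defining both (plus == for consistency) lets the intersection work on the objects directly. This is a minimal sketch using a plain Ruby class modelled on the Thing example; it is an illustration, not the asker's actual model.

```ruby
class Thing
  attr_reader :first, :last

  def initialize(first, last)
    @first = first
    @last = last
  end

  # Two Things are equal when both attributes match.
  def ==(other)
    other.is_a?(Thing) && first == other.first && last == other.last
  end
  alias eql? ==

  # Equal objects must produce equal hash values for Array#& to match them.
  def hash
    [first, last].hash
  end
end

thing1 = Thing.new("John", "Smith")
thing2 = Thing.new("John", "Smith")
thing3 = Thing.new("Jane", "Doe")

common = [thing1, thing3] & [thing2]
puts common.length
```

ActiveRecord models already compare by class and id, so this matters mostly for plain Ruby objects or unsaved records.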

Take array and convert to a hash Ruby

I am trying this for the first time and am not sure I have achieved quite what I want. I am pulling in data via a screen scrape as arrays and want to put them into a hash.
I have a model with columns :home_team and :away_team and would like to post the data captured via the screen scrape to these.
I was hoping someone could quickly run this in an .rb file
require 'open-uri'
require 'nokogiri'
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
doc = Nokogiri::HTML(open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map {|team| team.text.strip}
away_team = doc.css(".team-away.teams").map {|team| team.text.strip}
team_clean = Hash[:home_team => home_team, :away_team => away_team]
puts team_clean.inspect
and advise whether this is actually a hash, as it seems to be an array: I can't see the hash name in the output. I would have expected something like this
{"team_clean"=>[{:home_team => "Man Utd", "Chelsea", "Liverpool"},
{:away_team => "Swansea", "Cardiff"}]}
Any help appreciated.
You actually get a Hash back. But it looks different from the one you expected. You expect a Hash inside a Hash.
Some examples to clarify:
hash = {}
hash.class
=> Hash
hash = { home_team: [], away_team: [] }
hash.class
=> Hash
hash[:home_team].class
=> Array
hash = { hash: { home_team: [], away_team: [] } }
hash.class
=> Hash
hash[:hash].class
=> Hash
hash[:hash][:home_team].class
=> Array
The "hash name", as you call it, is never output. A Hash is basically an Array with a different kind of index. To clarify this a bit:
hash = { 0 => "A", 1 => "B" }
array = ["A", "B"]
hash[0]
=> "A"
array[0]
=> "A"
hash[1]
=> "B"
array[1]
=> "B"
Basically, with a Hash you additionally define how and where to find the values by specifying each key explicitly, while an Array always stores its values under a numerical index.
Here is the solution:
team_clean = Hash[:team_clean => [Hash[:home_team => home_team,:away_team => away_team]]]
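If the end goal is one model row per fixture with :home_team and :away_team columns, a different shape may be more convenient: one hash per fixture instead of one hash holding two parallel arrays. The sketch below pairs the two arrays with zip. The team names are placeholder sample data standing in for the scraped values, and it assumes both arrays have the same length and order.

```ruby
# Placeholder data standing in for the scraped home_team / away_team arrays.
home_team = ["Man Utd", "Chelsea", "Liverpool"]
away_team = ["Swansea", "Cardiff", "Everton"]

# Pair the nth home team with the nth away team, one hash per fixture.
fixtures = home_team.zip(away_team).map do |home, away|
  { home_team: home, away_team: away }
end

fixtures.each { |fixture| p fixture }
```

Each element of fixtures then has exactly the keys the model expects, so it could be passed to something like a create call one fixture at a time.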

Array only saves the last value in ruby on rails

I have a loop which outputs information I grabbed from a website. To display the information in a readable fashion, I insert it into an array that will be displayed on my view page. However, the array does not store all the values retrieved and instead only saves the last value appended to it. In the end, only the last value inserted into the array is displayed.
My controller file...
def home
scrape()
end
private
def scrape
require 'rubygems'
require 'nokogiri'
require 'open-uri'
time = Time.new
month = I18n.t("date.abbr_month_names")[time.month]
day = time.day
@strings = []
#United States
cities = [
"sfbay", "losangeles", "athensga", "phoenix", "santabarbara", "denver",
"panamacity", "miami", "austin", "bakersfield", "keys", "newyork"
]
cities.map do |city|
#Search Terms
search_terms = ["mechanic", "car", "tech"]
search_terms.map do |term|
escaped_term = CGI.escape(term)
url = "http://#{city}.craigslist.org/search/jjj?query=#{escaped_term}&catAbb=jjj&srchType=A"
doc = Nokogiri::HTML(open(url))
doc.css(".row").map do |row|
date = row.css(".itemdate").text
a_tag = row.css("a")[0]
text = a_tag.text
link = a_tag[:href]
@strings == []
if date = "#{month} #{day}"
@strings << "#{date} #{text} #{link}"
end
end
end
end
end
In the view home.html.erb file...
<%= raw(@strings.join('<br />')) %>
So when I go to the home page, only the last value inserted into the array is displayed. What is wrong and how do I fix it?
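For one thing, the line @strings == [] inside the loop looks like it was meant to reset the array for every row of every city. It doesn't actually do that, because == is a comparison rather than an assignment, so at the moment the line is a no-op; either way, it should be removed.
For another, if date = "#{month} #{day}" assigns to date instead of comparing, so the condition is always true. Use == there.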
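Both points can be seen in a small offline sketch. It replaces the scraped rows with an array of hashes (made-up sample data, so no network access or Nokogiri is needed), initialises the array once before the loop, and compares with == instead of assigning.

```ruby
# Made-up rows standing in for the scraped date / text / link values.
rows = [
  { date: "Mar 3", text: "Mechanic wanted", link: "/jjj/1.html" },
  { date: "Mar 2", text: "Car detailer",    link: "/jjj/2.html" },
  { date: "Mar 3", text: "Auto tech",       link: "/jjj/3.html" }
]

today = "Mar 3"   # stands in for "#{month} #{day}"
strings = []      # created once, before the loop

rows.each do |row|
  # == compares; a single = would overwrite the date and always be truthy.
  if row[:date] == today
    strings << "#{row[:date]} #{row[:text]} #{row[:link]}"
  end
end

puts strings
```

Since the loops are run only for their side effect on the array, each expresses the intent more clearly than map.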

How can I change data collection from hash to array?

Now I'm fetching data from another url...
Here is my code:
require 'rubygems'
require 'nokogiri'
html = page.body
doc = Nokogiri::HTML(html)
doc.encoding = 'utf-8'
rows = doc.search('//table[@id = "MainContent_GridView1"]//tr')
@details = rows.collect do |row|
detail = {}
[
[:car, 'td[1]/text()'],
[:article, 'td[2]/text()'],
[:group, 'td[3]/text()'],
[:price, 'td[4]/text()'],
].each do |name, xpath|
detail[name] = row.at_xpath(xpath).to_s.strip
end
detail
end
@details
I tried to do it with an array instead of a hash, but I get a lot of errors.
Are there any ideas?
I need it for another method.
I also pass this data (the resulting hash) to another variable here:
oem_art = []
@constr_num.each do |o|
as_oem = get_from_as_oem(o.ARL_SEARCH_NUMBER)
if as_oem.present?
oem_art << as_oem
end
end
@oem_art = oem_art.to_a.uniq
Do you just want to change a hash into an array? If so, just use the to_a method on your hash.
hash = {:a => "something", :b => "something else"}
array = hash.to_a
array.inspect #=> [[:a, "something"], [:b, "something else"]]
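If what you want is one plain array of values per table row rather than key/value pairs, values_at may be closer to the goal than to_a. The following is a rough sketch: the details array is made-up sample data in the same shape as the hashes built in the question, and COLUMNS is a name introduced here for illustration.

```ruby
# Made-up sample in the shape of the hashes collected from the table rows.
details = [
  { car: "Audi A4", article: "A-100", group: "Brakes",  price: "12.50" },
  { car: "BMW 320", article: "B-200", group: "Filters", price: "8.90" }
]

# Fixed column order, so every row array lines up the same way.
COLUMNS = [:car, :article, :group, :price]

rows = details.map { |detail| detail.values_at(*COLUMNS) }

rows.each { |row| p row }
```

Unlike values, values_at does not depend on the insertion order of the keys, which keeps the columns aligned even if a hash was built in a different order.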
It looks like you're looking for something like the question "hash['key'] to hash.key in Ruby".
The Hash class doesn't support .key notation by default. OpenStruct creates an object from the Hash so that you can use dot notation to access the properties. Overall, it's basically just syntactic sugar with some overhead.
Suggested code (from the linked answer):
>> require 'ostruct'
=> []
>> foo = {'bar'=>'baz'}
=> {"bar"=>"baz"}
>> foo_obj = OpenStruct.new foo
=> #<OpenStruct bar="baz">
>> foo_obj.bar
=> "baz"
So in your example, you could do:
# Initialised somewhere
require 'ostruct'
DETAIL_INDICES = {
:car => 1,
:article => 2,
:group => 3,
:price => 4,
}
# ** SNIP **
@details = rows.map do |row|
DETAIL_INDICES.inject({}) do |h,(k,v)|
h.merge(k => row.at_xpath("td[#{v}]/text()").to_s.strip)
end
end.collect { |hash| OpenStruct.new hash }
@details.each do |item|
puts item.car
end
Of course, if performance is a concern you can merge the map and collect calls into one (the two methods are aliases). I kept them separate only to distinguish the two steps; I usually use map alone for consistency, so feel free to choose for yourself :)
EDIT -- Additional code from your edit simplified
# Collect the looked-up values themselves (as the original loop did), drop blanks, then de-duplicate.
@oem_art = @constr_num.map do |item|
get_from_as_oem(item.ARL_SEARCH_NUMBER)
end.select(&:present?).uniq
puts @oem_art

Ruby way to loop and check subsequent values against each other

I have an array that contains dates and values. An example of how it might look:
[
{'1/1/2010' => 'aa'},
{'1/1/2010' => 'bb'},
{'1/2/2010' => 'cc'},
{'1/2/2010' => 'dd'},
{'1/3/2010' => 'ee'}
]
Notice that some of the dates repeat. I'm trying to output this in a table format and I only want to show unique dates. So I loop through it with the following code to get my desired output.
prev_date = nil
@reading_schedule.reading_plans.each do |plan|
use_date = nil
if plan.assigned_date != prev_date
use_date = plan.assigned_date
end
prev_date = plan.assigned_date
plan.assigned_date = use_date
end
The resulting table will then look something like this
1/1/2010 aa
bb
1/2/2010 cc
dd
1/3/2010 ee
This works fine, but I am new to Ruby and was wondering if there is a better way to do this.
Enumerable#group_by is a good starting point:
require 'pp'
asdf = [
{'1/1/2010' => 'aa'},
{'1/1/2010' => 'bb'},
{'1/2/2010' => 'cc'},
{'1/2/2010' => 'dd'},
{'1/3/2010' => 'ee'}
]
pp asdf.group_by { |n| n.keys.first }.map{ |a,b| { a => b.map { |c| c.to_a.last.last } } }
# >> [{"1/1/2010"=>["aa", "bb"]}, {"1/2/2010"=>["cc", "dd"]}, {"1/3/2010"=>["ee"]}]
Which should be a data structure you can bend to your will.
I don't know that it's better, but you could group the values by date using (e.g.) Enumerable#reduce (requires Ruby >= 1.8.7; before that, you have Enumerable#inject).
arr.reduce({}) { |memo, obj|
obj.each_pair { |key, value|
memo[key] = [] if ! memo.has_key?(key);
memo[key] << value
}
memo
}.sort
=> [["1/1/2010", ["aa", "bb"]], ["1/2/2010", ["cc", "dd"]], ["1/3/2010", ["ee"]]]
You could also use Array#each to similar effect.
This is totally a job for a hash.
Create a hash, using each date as a key and an empty array as its value.
Then accumulate the values from the original array into the array stored under each date.
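A minimal sketch of that hash-based approach, using the sample data from the question. The variable names (readings, by_date, lines) are made up here, and the table layout is only one possible rendering, assuming a plain-text table where repeated dates are blanked out.

```ruby
readings = [
  { '1/1/2010' => 'aa' },
  { '1/1/2010' => 'bb' },
  { '1/2/2010' => 'cc' },
  { '1/2/2010' => 'dd' },
  { '1/3/2010' => 'ee' }
]

# The block gives every new date its own empty array on first access.
by_date = Hash.new { |hash, key| hash[key] = [] }

readings.each do |entry|
  entry.each { |date, value| by_date[date] << value }
end

# Print the date only on the first line of each group; pad the rest.
lines = by_date.flat_map do |date, values|
  values.each_with_index.map do |value, index|
    label = index.zero? ? date : ' ' * date.length
    "#{label} #{value}"
  end
end

puts lines
```

Ruby hashes preserve insertion order (since 1.9), so the dates come out in the order they first appeared in the input.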
