Nokogiri collects each line as one line - ruby-on-rails

I run an affiliate website, and I have code that consumes XML datafeeds. I have written a method that scrapes and stores the data from an XML feed, and it works perfectly for the roughly 60 feeds we have. Each row in the feed is created as a @product.
Now enter Linkshare: the code that works for the other 60 feeds doesn't work for Linkshare feeds. Instead of @products.count being a few hundred, it tries to load ALL products into ONE instance of @products, so I get:
Soul Cal Deluxe Heart Camisole Top - WomensOnly Vest Top - WomensSoul
Cal Deluxe Racer Back Vest Top - WomensSoul Cal Deluxe Racer Back Vest
Top - WomensSoul Cal Deluxe Racer Back Vest Top - WomensSoul Cal
Deluxe Racer Back Vest Top - WomensCrafted Grunge Jumper - WomensMiso
Pearl & Fireball Ring Stack - WomensSoul Cal Deluxe Checked Shacket -
MensSoul Cal Deluxe Hines Stripe Shirt - MensMiso Embellished Waist
Dress - WomensGlamorous Belted Dress - WomensMiso Front Tie Shorts -
WomensMiso Tie Front Shorts - WomensSoul Cal Deluxe Stretch Skinny
Trousers - WomensSoul Cal Deluxe Stretch Skinny Trousers - WomensSoul
Cal Deluxe Stretch Skinny Trousers - WomensVero Moda Bernice Dress -
WomensMiso Dipped Hem Maxi Dress - WomensSoul Cal Deluxe Belted Chinos
- WomensSoul Cal Deluxe Stretch Skinny Trousers - WomensMiso Jewel Bodycon Dress - WomensMiso Evening Bandeau Dress - WomensMiso Aztec
Tube Skirt - WomensCrafted Embellished Neck Dress - WomensMiso Spot
Dress - WomensMiso Spot Dress - Womens"}]
(Note: I took just the end of the console log, but you can see what I mean.)
I've run lots of tests, and this has been a problem for us for a while.
Has anyone seen anything like this?
Could it be doctypes or Hpricot?
Any suggestions?
Example feed:
<?xml version="1.0" encoding="UTF-8"?><merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="merchandiser.xsd">
<header><merchantId>36503</merchantId><merchantName>Republic</merchantName><createdOn>2012-01-12/20:21:17</createdOn></header><product product_id="73794" name="Soul Cal Deluxe Borg DLX Zip Through Hoody - Mens" sku_number="73794" manufacturer_name="Soul Cal Deluxe" part_number="73794"><category><primary>Men > Sweats & Hoodies</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.73794&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F73794</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/73794/73794_lg1.jpg</productImage><buy></buy></URL><description><short>Blue, Hoodies, Hand wash only</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">34.99</sale><retail>34.99</retail></price><brand>Soul Cal Deluxe</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.73794&type=15&subid=0</pixel></product>
<product product_id="51118" name="Jack & Jones Dale Jack Jeans - Mens" sku_number="51118" manufacturer_name="Jack&Jones" part_number="51118"><category><primary>Men > Jeans</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.51118&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F51118</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/51118/51118_lg1.jpg</productImage><buy></buy></URL><description><short>Blue, Straight</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">40.00</sale><retail>64.99</retail></price><brand>Jack&Jones</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.51118&type=15&subid=0</pixel></product>
<product product_id="51128" name="Diesel Straight Leg Larkee Jeans - Mens" sku_number="51128" manufacturer_name="Diesel" part_number="51128"><category><primary>Men > Jeans</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.51128&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F51128</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/51128/51128_lg1.jpg</productImage><buy></buy></URL><description><short>Blue, Straight</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">120.00</sale><retail>120.00</retail></price><brand>Diesel</brand><shipping><cost currency="GBP"><amount>0.00</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.51128&type=15&subid=0</pixel></product>
<product product_id="68226" name="Miso Fox Scarf - Womens" sku_number="68226" manufacturer_name="Miso" part_number="68226"><category><primary>Women > Accessories</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.68226&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F68226</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/68226/68226_lg1.jpg</productImage><buy></buy></URL><description><short>Stone, Scarves, Machine washable</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">5.00</sale><retail>19.99</retail></price><brand>Miso</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.68226&type=15&subid=0</pixel></product>
<product product_id="67968" name="Levis Pleat Trucker Jacket - Womens" sku_number="67968" manufacturer_name="Levis" part_number="67968"><category><primary>Women > Coats & Jackets</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.67968&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F67968</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/67968/67968_lg1.jpg</productImage><buy></buy></URL><description><short>Blue, Jackets</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">30.00</sale><retail>84.99</retail></price><brand>Levis</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.67968&type=15&subid=0</pixel></product>
<product product_id="81217" name="G-Star Dean Army Tapered Jeans - Womens" sku_number="81217" manufacturer_name="G-Star" part_number="81217"><category><primary>Women > Jeans</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.81217&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F81217</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/81217/81217_lg1.jpg</productImage><buy></buy></URL><description><short>Black, Straight</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">115.00</sale><retail>115.00</retail></price><brand>G-Star</brand><shipping><cost currency="GBP"><amount>0.00</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.81217&type=15&subid=0</pixel></product>
<product product_id="58377" name="Police 883 Embro Hat - Mens" sku_number="58377" manufacturer_name="Police 883" part_number="58377"><category><primary>Men > Accessories</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.58377&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F58377</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/58377/58377_lg1.jpg</productImage><buy></buy></URL><description><short>Black, Hats & Beanies</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">17.99</sale><retail>17.99</retail></price><brand>Police 883</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.58377&type=15&subid=0</pixel></product>
<product product_id="58306" name="Diesel Kyle Beanie - Mens" sku_number="58306" manufacturer_name="Diesel" part_number="58306"><category><primary>Men > Accessories</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.58306&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F58306</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/58306/58306_lg1.jpg</productImage><buy></buy></URL><description><short>Black, Hats & Beanies</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">19.99</sale><retail>19.99</retail></price><brand>Diesel</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.58306&type=15&subid=0</pixel></product>
<product product_id="86865" name="Miso Fair Isle Cardigan - Womens" sku_number="86865" manufacturer_name="Miso" part_number="86865"><category><primary>Women > Knitwear</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.86865&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F86865</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/86865/86865_lg1.jpg</productImage><buy></buy></URL><description><short>Stone, Cardigans, Hand wash only</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">49.99</sale><retail>49.99</retail></price><brand>Miso</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.86865&type=15&subid=0</pixel></product>
<product product_id="52947" name="White Label Chinos - Mens" sku_number="52947" manufacturer_name="White Label" part_number="52947"><category><primary>Men > trs</primary><secondary></secondary></category><URL><product>http://click.linksynergy.com/link?id=g*0hRDNOv4M&offerid=215450.52947&type=15&murl=http%3A%2F%2Fwww.republic.co.uk%2Finvt%2F52947</product><productImage>http://www.republic.co.uk/content/ebiz/republic/invt/52947/52947_lg1.jpg</productImage><buy></buy></URL><description><short>Stone, Chinos, Machine washable</short><long></long></description><discount currency="GBP"><amount></amount><type>amount</type></discount><price currency="GBP"><sale begin_date="" end_date="">20.00</sale><retail>20.00</retail></price><brand>White Label</brand><shipping><cost currency="GBP"><amount>3.95</amount><currency>GBP</currency></cost><information></information><availability></availability></shipping><keywords></keywords><upc></upc><m1></m1><pixel>http://ad.linksynergy.com/fs-bin/show?id=g*0hRDNOv4M&bids=215450.52947&type=15&subid=0</pixel></product>

One problem is that the XML is malformed. The <merchandiser> tag is not terminated. Nokogiri will tell you as much if you check the errors method after parsing the document:
doc.errors
[
[0] #<Nokogiri::XML::SyntaxError:0x10109b918
attr_reader :code = 77,
attr_reader :column = 1,
attr_reader :domain = 1,
attr_reader :file = nil,
attr_reader :int1 = 1,
attr_reader :level = 3,
attr_reader :line = 12,
attr_reader :str1 = "merchandiser",
attr_reader :str2 = nil,
attr_reader :str3 = nil
>
]
Adding a closing </merchandiser> tag at the end of the file cleared that up:
doc = Nokogiri::XML(File.read('./test.xml'))
#<Nokogiri::XML::Document:0x1009ae038
@node_cache = [],
attr_accessor :errors = [],
attr_reader :decorators = nil
>
After fixing that I parsed for product:
doc.search('product').size
=> 20
There are <product> collisions:
<product>
<URL>
<product />
</URL>
</product>
which could confuse your code. They confused my search above: it reported 20 occurrences because it also counted the <product> elements nested inside <URL>.

Related

Checking if hash value has a text

I have a hash:
universityname = e.university
topuniversities = CSV.read('lib/assets/topuniversities.csv',{encoding: "UTF-8", headers:true, header_converters: :symbol, converters: :all})
hashed_topuniversities = topuniversities.map {|d| d.to_hash}
hashed_topuniversities.any? {|rank, name| name.split(' ').include?(universityname) }.each do |s|
if s[:universityrank] <= 10
new_score += 10
elsif s[:universityrank] >= 11 && s[:universityrank] <= 25
new_score += 5
elsif s[:universityrank] >= 26 && s[:universityrank] <= 50
new_score += 3
elsif s[:universityrank] >= 51 && s[:universityrank] <= 100
new_score += 2
end
Basically, this looks at a hash and checks whether a hash value contains the university name given as input.
For example, the user input can be "Oxford University" while the hash stores it as "Oxford". Currently the user has to type the name exactly as it is stored in the hash to be assigned a score, but I want "oxford university" to select the hash value "Oxford" and then go through.
Everything else works fine, but the .include? does not work correctly: I still need to type the exact word.
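hashed_topuniversities = topuniversities.map(&:to_hash)
# Lowercase words of the user's input, e.g. "oxford university" -> ["oxford", "university"]
input_words = universityname.downcase.split(' ')
# Each row is a hash keyed by CSV header; :universityname is an assumed column name.
univ = hashed_topuniversities.detect do |row|
  name_words = row[:universityname].to_s.downcase.split(' ')
  name_words.any? && (name_words - input_words).empty?
end
# univ is nil when nothing matches, in which case the score is unchanged.
new_score += case univ && univ[:universityrank]
             when -Float::INFINITY..10 then 10
             when 11..25 then 5
             when 26..50 then 3
             when 51..100 then 2
             else 0
             end
Besides some changes to make the code more idiomatic Ruby, the main changes are that downcase is called on both the stored university name and the user input, so they are compared case-insensitively, and that the words of the stored name are looked up among the words of the input, so "oxford university" matches the stored "Oxford".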
I don't think your approach will work (in real-life, anyway). "University of Oxford" is an easy one--just look for the presence of the word, "Oxford". What about "University of Kansas"? Would you merely try to match "Kansas"? What about "Kansas State University"?
Also, some universities are customarily referred to by well-known acronyms or shortened names, such as "LSE", "UCLA", "USC", "SUNY", "LSU", "RPI", "Penn State", "Georgia Tech", "Berkeley" and "Cal Tech". You also need to think about punctuation and "little words" (e.g., "at", "the", "of") in university names (e.g., "University of California, Los Angeles").
For any serious application, I think you need to construct a list of all commonly-used names for each university and then require an exact match between those names and the given university name (after punctuation and little words have been removed). You can do that by modifying the hash hashed_top_universities, perhaps like this:
hashed_top_universities
#=> { "University of California at Berkeley" =>
# { rank: 1, names: ["university california", "berkeley", "cal"] },
# "University of California at Los Angeles" =>
# { rank: 2, names: ["ucla"] },
# "University of Oxford" =>
# { rank: 3, names: ["oxford", "oxford university"] }
# }
Names of some universities contain non-ASCII characters, which is a further complication (that I will not address).
Here's how you might code it.
Given a university name, the first step is to construct a hash (reverse_hash) that maps university names to ranks. The names consist of the elements of the value of the key :names in the inner hashes in hashed_top_universities, together with the complete university names that comprise the keys in that hash, after they have been downcased and punctuation and "little words" have been removed.
PUNCTUATION = ",."
EXCLUSIONS = %w| of for the at u |
SCORE = { 1=>10, 3=>7, 25=>5, 50=>3, 100=>2, Float::INFINITY=>0 }
# Defined first because building reverse_hash below calls it.
def simplify(str)
  str.downcase.delete(PUNCTUATION).
    gsub(/\b#{Regexp.union(EXCLUSIONS)}\b/,'').
    squeeze(' ').strip
end
reverse_hash = hashed_top_universities.each_with_object({}) { |(k,v),h|
  (v[:names] + [simplify(k)]).each { |name| h[name] = v[:rank] } }
  #=> {"university california"=>1, "berkeley"=>1, "cal"=>1,
  #    "university california berkeley"=>1,
  #    "ucla"=>2, "university california los angeles"=>2,
  #    "oxford"=>3, "oxford university"=>3, "university oxford"=>3}
def score(name, reverse_hash)
  rank = reverse_hash[simplify(name)]
  return 0 if rank.nil? # unknown university name
  SCORE.find { |k,_| rank <= k }.last
end
Let's try it.
score("University of California at Berkeley", reverse_hash)
#=> 10
score("Cal", reverse_hash)
#=> 10
score("UCLA", reverse_hash)
#=> 7
score("Oxford", reverse_hash)
#=> 7

Rails Hpricot gem: Unable to get the link content from xml

I am using the Hpricot gem to parse XML. I am able to get the title and pubdate, but it does not work for the link. Here is the code snippet:
items = doc.search("//item").first(6)
items.each do |item|
  feed = {}
  feed[:title] = item.search("//title").text
  feed[:link] = item.search("//link").text
  feed[:published_date] = item.search("//pubdate").text
  feeds << feed
end
The resultant hpricot elements are as follows:
#<Hpricot::Elements[{elem <item> "\n\t\t" {elem <title> "openagent.com.au" </title>} "\n\t\t" {emptyelem <link>} "http://blog.iproperty.com.au/2016/03/22/openagent-com-au/" {bogusetag </link>} "\n\t\t" {elem <comments> "http://blog.iproperty.com.au/2016/03/22/openagent-com-au/#comments" </comments>} "\n\t\t" {elem <pubdate> "Mon, 21 Mar 2016 22:43:28 +0000" </pubDate>} "\n\t\t"
I have pasted only the initial part because it is the only part that matters. Can anyone tell me what the solution is?
items = doc.search("//item").first(6)
items.each do |item|
  feed = {}
  feed[:title] = item.search("//title").text
  feed[:link] = item.search("//link").innerHTML
  feed[:published_date] = item.search("//pubdate").text
  feeds << feed
end
To get the link, use innerHTML instead of text.

Rails - decimal not displaying in view

I have the following object -
Reckoner Load (0.2ms) SELECT "reckoners".* FROM "reckoners" WHERE
"reckoners"."id" = ? LIMIT 1 [["id", 2]]
=> #<Reckoner id: 2, group: "test FTEProrated", description: "try new calc",
leave: 28, abscence: #<BigDecimal:7fab221cf438,'0.1E2',9(27)>,
created_at: "2014-10-06 15:56:10", updated_at: "2014-10-06 16:14:32",
FTEProRated: #<BigDecimal:7fab221ca1e0,'0.0',9(27)>>
If I've done my calculation correctly, FTEProRated should be about 0.8 - but it's being displayed as 0.0. Where am I going wrong?
The calculation is carried out in the controller before the object is saved.
In update:
pro_rate_this @reckoner
...
def pro_rate_this(reckoner)
  reckoner.FTEProRated = (260-reckoner.leave)/260
end
When I change the calculation to test what's being saved, it all seems okay. For example, this:
def pro_rate_this(reckoner)
  reckoner.FTEProRated = (260-reckoner.leave)
end
gives a value of 232 when reckoner.leave == 28. It's only when the value becomes < 1 that it seems to be a problem, though that could be a coincidence.
It's the math: in Ruby, dividing one integer by another gives an integer, so (260-28)/260 is 232/260, which truncates to 0. Make sure you're dividing by a float if you want your answer to be a float.
def pro_rate_this(reckoner)
  reckoner.FTEProRated = (260-reckoner.leave)/260.0
end
See the denominator there? Just add that .0 to the 260 and report back.
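The difference is easy to check in plain Ruby, outside Rails. This is a small sketch using the leave value of 28 from the record above; the variable names are only illustrative.

```ruby
leave = 28 # days of leave, as in the Reckoner record above

truncated = (260 - leave) / 260     # Integer / Integer discards the fraction
as_float  = (260 - leave) / 260.0   # a Float operand gives a Float result
via_fdiv  = (260 - leave).fdiv(260) # Integer#fdiv always divides as floats

puts truncated         # 0
puts as_float.round(2) # 0.89
puts via_fdiv.round(2) # 0.89
```

Integer#fdiv is handy when both operands come from integer columns and you would rather not write a literal with .0.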

Order alphabetically and group by first letter

I currently have the following code:
- @alpha = Glossary.find(:all, :order =>"title ASC").group_by{|u| u.title[0]}
- @glossary = Glossary.find(:all, :order =>"title ASC")
- @alpha.each do|a|
  %h1= a[0]
  - @glossary.each do |g|
    %p display stuff
This displays all of the glossary terms under each letter rather than only the ones that begin with that letter. I've tried a few things, but I'm not sure how to select the right ones.
You should be able to do everything with your @alpha instance variable, since you're using group_by:
- @alpha = Glossary.find(:all, :order =>"title ASC").group_by{|u| u.title[0]}
- @alpha.each do |alpha, glossary_array|
  %h1= alpha
  - glossary_array.each do |item|
    %p= item
You're close. I think you just want to do
- @alpha = Glossary.order("title ASC").group_by{|u| u.title[0]}
- @alpha.each do |letter, items|
  %h1= letter
  - items.each do |item|
    %p= item
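To see why iterating over the grouped hash is enough, it helps to look at what group_by returns. Here is a minimal sketch in plain Ruby; Term is a hypothetical stand-in for the Glossary model, and the upcase/downcase calls are an addition so that "Apple" and "avocado" land under the same letter.

```ruby
# Hypothetical stand-in for a Glossary record with a title attribute.
Term = Struct.new(:title)

terms = %w[banana Apple cherry avocado Blueberry].map { |t| Term.new(t) }

# Sort first, then group: the hash keeps insertion order, so letters come out sorted.
grouped = terms
  .sort_by { |term| term.title.downcase }
  .group_by { |term| term.title[0].upcase }

# Each pair is [letter, array of terms starting with that letter].
grouped.each do |letter, items|
  puts letter
  items.each { |item| puts "  #{item.title}" }
end
```

Each key maps to only the records that start with that letter, which is why the inner loop no longer needs a second query.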

Rails + Bootstrap 3: Have two columns in a row, but have them collapse one column after another

This is an odd problem I'm having. I have a list of names, split up into two arrays at the middle. So names A-M are in name1 and names N-Z are in name2. Now I'm going through each array and putting them in a row with two columns so the names look like this:
Aaron Neil
Arthur Nick
etc. But when I collapse the page down, it looks like:
Aaron
Neil
Arthur
Nick
and I want it to look like:
Aaron
Arthur
Neil
Nick
Here's the haml:
- a = true
- index = 0
- while(a)
  - first = name1[index]
  - second = name2[index]
  - if first != nil || second != nil
    .row
      .col-md-6
        - if first
          = first
      .col-md-6
        - if second
          = second
    - index += 1
  - else
    - a = false
I understand why this is happening, but I'm not exactly sure how else to approach this. Does anyone have any insight?
You are creating a new row, with its own pair of columns, for each pair of names, so when the columns stack on a narrow screen the names alternate.
Instead, create a single row and run each loop inside its column, like this:
- name1 = %w[Aaron Arthur]
- name2 = %w[Neil Nick]
.row
  .col-md-6
    - name1.each do |x|
      %p= x
  .col-md-6
    - name2.each do |x|
      %p= x
