I have a Rails application in which I am scraping data from the internet. I have a snippet of code that reports syntax errors, which prevents it from running.
I have tried to sort it out but cannot find what is wrong. Where am I going wrong?
The snippet is shown below:
def reuters
  ticker_sym = 'FB.O'
  reuters_home_url = "http://in.reuters.com"
  reuters_base_url = "http://in.reuters.com/finance/stocks/"
  board_members = Nokogiri::HTML(open(reuters_base_url + 'companyOfficers?symbol=' + ticker_sym.to_s ))
  members = []
  table = board_members.css('.column1 tbody.dataSmall').first
  table_desc = board_members.css('.column1 tbody.dataSmall')[1]
  table.css('tr').each_with_index do |row,index|
    next if index == 0
    members << {
      name: row.css('td[1] h2 a').text.strip,
      title: row.css('td[4]').text.strip,
      position_held: row.css('td[3]').text.strip,
      age: row.css('td[2]').text.strip,
      member_link: URI.join(reuters_home_url,row.css('td[1] h2 a').attr("href")).to_s
      table_desc.css('tr').each_with_index do |col,index2|
        next if index2 == 0
        members << {
          description: col.css('td[2]').text.strip
        }
      end
    }
  end
end
I have attached a screenshot of my Rails application error page below:
[Screenshot: Rails error page]
Add a } before table_desc.css('tr').each_with_index do |col, index2| and remove the } after the inner end, like this:
def reuters
  ticker_sym = 'FB.O'
  reuters_home_url = "http://in.reuters.com"
  reuters_base_url = "http://in.reuters.com/finance/stocks/"
  board_members = Nokogiri::HTML(open(reuters_base_url + 'companyOfficers?symbol=' + ticker_sym.to_s))
  members = []
  table = board_members.css('.column1 tbody.dataSmall').first
  table_desc = board_members.css('.column1 tbody.dataSmall')[1]
  table.css('tr').each_with_index do |row, index|
    next if index == 0
    members << {
      name: row.css('td[1] h2 a').text.strip,
      title: row.css('td[4]').text.strip,
      position_held: row.css('td[3]').text.strip,
      age: row.css('td[2]').text.strip,
      member_link: URI.join(reuters_home_url, row.css('td[1] h2 a').attr("href")).to_s
    }
    table_desc.css('tr').each_with_index do |col, index2|
      next if index2 == 0
      members << {
        description: col.css('td[2]').text.strip
      }
    end
  end
end
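Two side notes on this snippet, not part of the brace fix itself: open on a plain URL string needs open-uri to be loaded (Nokogiri is usually pulled in by the Gemfile, but the require is shown for completeness), and building member_link will raise if a row has no profile link. A minimal, hedged sketch of both points, assuming the rest of the method stays as above:

require 'open-uri'
require 'nokogiri'

# Inside the row loop: at_css returns the first matching node or nil,
# so a row without a link yields a nil member_link instead of an exception.
link_node = row.at_css('td[1] h2 a')
member_link = link_node ? URI.join(reuters_home_url, link_node['href']).to_s : nil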
I am working on this exercise in PiL4 (Programming in Lua, fourth edition).
Exercise 15.5:
The approach of avoiding constructors when saving tables with cycles is too radical. It is
possible to save the table in a more pleasant format using constructors for the simple case, and to use
assignments later only to fix sharing and loops. Reimplement the function save (Figure 15.3, “Saving
tables with cycles”) using this approach. Add to it all the goodies that you have implemented in the previous
exercises (indentation, record syntax, and list syntax).
I have tried this with the code below, but it seems not to work on the nested table with a string key.
local function basicSerialize(o)
  -- number or string
  return string.format("%q", o)
end

local function save(name, value, saved, indentation, isArray)
  indentation = indentation or 0
  saved = saved or {}
  local t = type(value)
  local space = string.rep(" ", indentation + 2)
  local space2 = string.rep(" ", indentation + 4)
  if not isArray then io.write(name, " = ") end
  if t == "number" or t == "string" or t == "boolean" or t == "nil" then
    io.write(basicSerialize(value), "\n")
  elseif t == "table" then
    if saved[value] then
      io.write(saved[value], "\n")
    else
      if #value > 0 then
        if indentation > 0 then io.write(space) end
        io.write("{\n")
      end
      local indexes = {}
      for i = 1, #value do
        if type(value[i]) ~= "table" then
          io.write(space2)
          io.write(basicSerialize(value[i]))
        else
          local fname = string.format("%s[%s]", name, i)
          save(fname, value[i], saved, indentation + 2, true)
        end
        io.write(",\n")
        indexes[i] = true
      end
      if #value > 0 then
        if indentation > 0 then io.write(space) end
        io.write("}\n")
      else
        io.write("{}\n")
      end
      saved[value] = name
      for k, v in pairs(value) do
        if not indexes[k] then
          k = basicSerialize(k)
          local fname = string.format("%s[%s]", name, k)
          save(fname, v, saved, indentation + 2)
          io.write("\n")
        end
      end
    end
  else
    error("cannot save a " .. t)
  end
end

local a = { 1, 2, 3, {"one", "Two"}, 5, {4, b = 4, 5, 6}, a = "ddd" }
local b = { k = a[4] }
local t = {}
save("a", a, t)
save("b", b, t)
print()
And I got the wrong output:
a = {
1,
2,
3,
{
"one",
"Two",
}
,
5,
{
4,
5,
6,
}
a[6]["b"] = 4
,
}
a["a"] = "ddd"
b = {}
b["k"] = a[4]
How can I make the line a[6]["b"] = 4 appear outside of the table constructor?
I'm completely new to Ruby on Rails, so I think I might be missing something obvious. I'm currently working on a web app that scrapes auction websites. The bones of the app were created by someone else. I'm now trying to add new website scrapes, but they don't seem to be working.
I have read through some of the Nokogiri documentation, checked that the scraped information is indeed not being written to the database (the seeded URLs that are being targeted have been written, when I check via the Rails console), and used the Chrome extension CSS Selector Tester to check that I am targeting the correct CSS selectors. The record ids are correct when I check via the Rails console.
I have put what I think are the important sections of code below, but I might be missing something that I don't realise is important.
The websites I'm having issues with are Lot-art.com and Lot-Tissimo.com.
Any help will be much appreciated.
Seeded URLs
Source.create(name: "Auction.fr", query_template: "https://www.auction.fr/_en/lot/search/?contexte=futures&tri=date_debut%20ASC&query={query}&page={page}")
Source.create(name: "Invaluable.co.uk", query_template: "https://www.invaluable.co.uk/search/api/search-results?keyword={query}&size=1000")
Source.create(name: "Interencheres.com", query_template: "http://www.interencheres.com/en/recherche/lot?search%5Bkeyword%5D={query}&page={page}")
Source.create(name: "Gazette-drouot.com", query_template: "http://catalogue.gazette-drouot.com/html/g/recherche.jsp?numPage={page}&filterDate=1&query={query}&npp=100")
Source.create(name: "Lot-art.com", query_template: "http://www.lot-art.com/auction-search/?form_id=lot_search_form&page=1&mq=&q={query}&ord=recent")
Source.create(name: "Lot-tissimo.com", query_template: "https://lot-tissimo.com/en/cmd=s&lwr=&ww={query}&xw=&srt=SN&wg=EUR&page={page}")
Scheduler code
require 'rufus-scheduler'
require 'nokogiri'
require 'mechanize'
require 'open-uri'
require "net/https"

s = Rufus::Scheduler.singleton

s.interval '1m' do
  setting = Setting.find(1)
  agent = Mechanize.new
  agent.user_agent_alias = 'Windows Chrome'
  agent.cookie_jar.load(File.join(Rails.root, 'tmp/cookies.yaml'))

  List.all.each do |list|
    number_of_new_items = 0

    list.actions.each do |action|
      url = action.source.query_template.gsub('{query}', action.list.query)

      case action.source.id
      when 1 # Auction.fr
        20.downto(1) do |page|
          doc = Nokogiri::HTML(open(url.gsub('{page}', page.to_s)))
          doc.css("div.list-products > ul > li").reverse.each do |item_data|
            price = 0
            if item_data.at_css("h3.h4.adjucation.ft-blue") && /Selling price : ([\d\s]+) €/.match(item_data.at_css("h3.h4.adjucation.ft-blue").text)
              price = /Selling price : ([\d\s]+) €/.match(item_data.at_css("h3.h4.adjucation.ft-blue").text)[1].gsub(" ", "")
            end
            item = action.items.new(
              title: item_data.at_css("h2").text.strip,
              url: item_data.at_css("h2 a")["href"],
              picture: item_data.at_css("div.image-wrap.lazy div.image img")["src"],
              price: price,
              currency: "€"
            )
            ActiveRecord::Base.logger.silence do # This disables log writes
              if item.save
                number_of_new_items = number_of_new_items + 1
              end
            end
          end
        end
      when 97 # Lot-Tissimo.com
        5.downto(1) do |page|
          doc = Nokogiri::HTML(open(url.gsub('{page}', page.to_s)))
          doc.css("#inhalt > .objektliste").reverse.each do |item_data|
            # price = 0
            # if item_data.at_css("h3.h4.adjucation.ft-blue") && /Selling price : ([\d\s]+) €/.match(item_data.at_css("h3.h4.adjucation.ft-blue").text)
            #   price = /Selling price : ([\d\s]+) €/.match(item_data.at_css("h3.h4.adjucation.ft-blue").text)[1].gsub(" ", "")
            # end
            item = action.items.new(
              title: item_data.at_css("div.objli-desc").text.strip,
              url: item_data.at_css("td.objektliste-foto a")["href"],
              picture: item_data.at_css("td.objektliste-foto a#lot_link img")["src"],
              price: price,
              currency: "€"
            )
            ActiveRecord::Base.logger.silence do # This disables log writes
              if item.save
                number_of_new_items = number_of_new_items + 1
              end
            end
          end
        end
      when 2 # Invaluable.co.uk
        doc = JSON.parse(open(url).read)
        doc["itemViewList"].reverse.each do |item_data|
          puts item_data["itemView"]["photos"]
          item = action.items.new(
            title: item_data["itemView"]["title"],
            url: "https://www.invaluable.co.uk/buy-now/" + item_data["itemView"]["title"].parameterize + "-" + item_data["itemView"]["ref"],
            picture: item_data["itemView"]["photos"] != nil ? item_data["itemView"]["photos"].first["_links"]["medium"]["href"] : nil,
            price: item_data["itemView"]["price"],
            currency: item_data["itemView"]["currencySymbol"]
          )
          ActiveRecord::Base.logger.silence do # This disables log writes
            if item.save
              number_of_new_items = number_of_new_items + 1
            end
          end
        end
      when 3 # Interencheres.com
        # doc = Nokogiri::HTML(open(url))
        5.downto(1) do |page|
          doc = Nokogiri::HTML(open(url.gsub('{page}', page.to_s)))
          doc.css("div#lots_0 div.ligne_vente").reverse.each do |item_data|
            price = 0
            item = action.items.new(
              title: item_data.at_css("div.ph_vente div.des_vente p a").text.strip,
              url: "http://www.interencheres.com" + item_data.at_css("div.ph_vente div.des_vente p a")["href"],
              picture: item_data.at_css("div.ph_vente div.gd_ph_vente img")["src"],
              price: price,
              currency: "€"
            )
            ActiveRecord::Base.logger.silence do # This disables log writes
              if item.save
                number_of_new_items = number_of_new_items + 1
              end
            end
          end
        end
      when 4 # Gazette-drouot.com
        5.downto(1) do |page|
          # doc = Nokogiri::HTML(open(url.gsub('{page}', page.to_s)))
          doc = agent.get(url.gsub('{page}', page.to_s))
          # doc = agent.get(url)
          doc.css("div#recherche_resultats div.lot_recherche").reverse.each do |item_data|
            price = 0
            picture = item_data.at_css("img.image_thumb_recherche") ? item_data.at_css("img.image_thumb_recherche")["src"] : nil
            item = action.items.new(
              title: item_data.at_css("#des_recherche").text.strip.truncate(140),
              url: "http://catalogue.gazette-drouot.com/html/g/" + item_data.at_css("a.lien_under")["href"],
              picture: picture,
              price: price,
              currency: "€"
            )
            ActiveRecord::Base.logger.silence do # This disables log writes
              if item.save
                number_of_new_items = number_of_new_items + 1
              end
            end
          end
        end
      when 69 # Lot-art.com
        doc = agent.get(url)
        doc.css("div.lot_list_holder").reverse.each do |item_data|
          price = 0
          item = action.items.new(
            title: item_data.at_css("div.lot_list_body a")[0].text.strip.truncate(140),
            url: item_data.at_css("div.lot_list_body")["href"],
            picture: item_data.at_css("a.lot_list_thumb img")["src"],
            price: price,
            currency: "€"
          )
          ActiveRecord::Base.logger.silence do # This disables log writes
            if item.save
              number_of_new_items = number_of_new_items + 1
            end
          end
        end
      end
    end

    if number_of_new_items > 0 && setting.notifications_per_hour > setting.notifications_this_hour && setting.pushover_app_token.present? && setting.pushover_user_key.present?
      url = URI.parse("https://api.pushover.net/1/messages.json")
      req = Net::HTTP::Post.new(url.path)
      req.set_form_data({
        :token => setting.pushover_app_token,
        :user => setting.pushover_user_key,
        :message => "#{number_of_new_items} new items on #{list.name}!",
        :url_title => "Check the list",
        :url => "http://spottheauction.com/lists/#{list.id}"
      })
      res = Net::HTTP.new(url.host, url.port)
      res.use_ssl = true
      res.verify_mode = OpenSSL::SSL::VERIFY_PEER
      res.start { |http| http.request(req) }
    end
  end

  agent.cookie_jar.save(File.join(Rails.root, 'tmp/cookies.yaml'))
end

s.cron '0 * * * *' do
  setting = Setting.find(1)
  setting.notifications_this_hour = 0
  setting.save
end
new just initializes an instance but doesn't save it. Do you actually call save somewhere?
You have two options:
Call save on the item:
item = action.items.new(
# ...
)
item.save
Or use create instead of new:
item = action.items.create(
# ...
)
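Since the scheduler above does call item.save inside the logger.silence block, it can also help to log why a save returned false; validation failures are otherwise silent. A minimal sketch, assuming the same item variable as in the scheduler:

if item.save
  number_of_new_items += 1
else
  # full_messages lists the failed validations, e.g. "Title can't be blank"
  Rails.logger.warn("Item not saved: #{item.errors.full_messages.join(', ')}")
end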
In case someone else comes across this: I got the scraping of Lot-art.com to work. It turned out my CSS selector wasn't specific enough for Nokogiri to pull the correct data.
I am still having issues with Lot-tissimo, although that appears to be caused by something else, since other scrapers, such as Scrapinghub's Portia spiders, also have trouble with it.
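For anyone debugging a similar problem, it is worth confirming in a console how many nodes a selector actually matches before wiring it into the scheduler. A small sketch using the Lot-art selectors from the code above; the query URL is just an illustrative stand-in for the seeded template:

require 'open-uri'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://www.lot-art.com/auction-search/?form_id=lot_search_form&page=1&q=chair&ord=recent"))
puts doc.css("div.lot_list_holder").size          # 0 means the selector matches nothing
first = doc.at_css("div.lot_list_holder div.lot_list_body a")
puts first ? first.text.strip : "no match"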
I'm going blind here. I can't understand why these two strings are not equal. When I puts them to the terminal they are both of class String, and when I compare the printed output they ARE equal. But somehow in my code they are not, and I can't figure out why.
Here is my Ruby code:
def prep_for_duplicate_webhook
  @redis_cart = Redis.new
  cart_stamp_saved = @redis_cart.get("cart_stamp_saved")

  if cart_stamp_saved.nil?
    cart_stamp_saved = {}
    cart_stamp_saved[:token] = cart_params['token']
    cart_stamp_saved[:updated_at] = cart_params['updated_at']
    @redis_cart.set("cart_stamp_saved", cart_stamp_saved.to_json)
  end

  @cart_stamp_incoming = {}
  @cart_stamp_incoming["token"] = cart_params['token']
  @cart_stamp_incoming["updated_at"] = cart_params['updated_at']
end

def duplicate_webhook?
  prep_for_duplicate_webhook
  @cart_stamp_saved = redis_cart.get("cart_stamp_saved")
  cart_stamp_saved == cart_stamp_incoming.to_json
end
And the hashes I'm comparing are these two:
cart_stamp_saved = {"token"=>"4a093432ba5c430dd545b16c0e89f187",
"updated_at"=>"2017-02-17T15:27:22.923Z"}
cart_stamp_incoming= {"token"=>"4a093432ba5c430dd545b16c0e89f187",
"updated_at"=>"2017-02-17T15:27:22.923Z"}
If I just copy and paste the above into a new file and then do this, the response is true:
pp cart_stamp_saved == cart_stamp_incoming.to_json
What am I missing?
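One way to make this comparison less fragile, wherever the mismatch comes from, is to parse both sides back into hashes and compare those rather than comparing JSON strings: Hash#== ignores key order, and a JSON round trip turns symbol keys into string keys. A minimal sketch, assuming the same Redis key as above:

require 'json'

saved    = JSON.parse(@redis_cart.get("cart_stamp_saved") || "{}")
incoming = JSON.parse(@cart_stamp_incoming.to_json)
saved == incoming   # true when token and updated_at match, regardless of key order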
I have a piece of code I have written in Ruby 1.8.7.
The variable emotes holds a list of emoticons separated by spaces. However, when I apply the split function, I get an error.
lines = []
keywords = ""
emotes = ""

Dir["/home/pnpninja/Downloads/aclImdb/train/formatted neg/*"].each do |reviewfile|
  sum_emote = 0
  sum_keyword = 0
  lines = File.foreach(reviewfile.to_s).first(2)
  lines[0].gsub!("\n", '')
  keywords = lines[0].split(" ")
  emotes = lines[1].split(" ")
  keywords.each { |keyword| sum_keyword = sum_keyword + keywordhash[keyword] }
  emotes.each { |emote| sum_emote = sum_emote + emotehash[emote] }
  senti = ""
  if sum_emote + sum_keyword >= 0
    senti = "Positive"
  else
    senti = "Negative"
  end
  vv = reviewfile.gsub('formatted neg', 'analysed neg')
  fin = File.open(vv.to_s, 'a')
  fin << "Emote Weight = #{sum_emote}\n"
  fin << "Keyword Weight = #{sum_keyword}\n"
  fin << "Sentiment : #{senti}"
  fin.close
end
The error I get is
NoMethodError: private method `split' called for nil:NilClass
at line
emotes = lines[1].split(" ")
The second line in each file may or may not be empty.
The error is telling you that lines[1] is nil and that you can't call split on nil (in Ruby 1.8, split exists as a private Kernel method, which is why the message says "private method").
Rewrite your code to ensure there are no nil objects, or make sure nothing is done when the object in question is nil:
unless lines[1].nil?
  emotes = lines[1].split(" ")
  emotes.each { |emote| sum_emote = sum_emote + emotehash[emote] }
end
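An equivalent, more compact guard that also works on Ruby 1.8.7 is to substitute an empty string before splitting, so the each loop simply becomes a no-op when the second line is missing; a small sketch:

# (lines[1] || "") falls back to "" when the file has no second line,
# so split returns an empty array and nothing is summed.
emotes = (lines[1] || "").split(" ")
emotes.each { |emote| sum_emote = sum_emote + emotehash[emote] }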
What I want to do is something like this:
searchid = 4
while searchid != -1
  @a += A.find(searchid)
  @b = B.find(searchid)
  searchid = @b.parentid
end
The problem is the line
@a += A.find(searchid)
The error is something like
NoMethodError: undefined method `+' for #<A:0x173f9a0>
So, how do you combine multiple 'find' requests?
You have to initialize @a = [] as an array before the +=.
searchid = 4
@a = []
while searchid != -1
  @a += A.find(searchid)
  @b = B.find(searchid)
  searchid = @b.parentid
end
You can combine them like:
searchid = 4
@a = []
while searchid != -1
  @a += A.find(searchid)
  @a += B.find(searchid)
  searchid = @a.last.parentid
end
Got it to work (with help). I did the following:
@a = []
with
@a << A.find_by_something(something)
and it seems to have worked.
I'm also using @a.compact! to get rid of the nil entries.
Thanks for all the help :)
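Pulling the working pieces together, here is a consolidated sketch of the loop as described above, under the hypothetical assumption that A and B are ActiveRecord models looked up by id and that B has a parentid column:

@a = []
searchid = 4
while searchid != -1
  @a << A.find_by_id(searchid)   # find_by_id returns nil instead of raising when the id is missing
  b = B.find_by_id(searchid)
  break if b.nil?
  searchid = b.parentid
end
@a.compact!                      # drop the nil entries left by missing ids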