I'm trying to read in the HTML of a certain website.
Trying @something = open("http://www.google.com/") fails with the following error:
Errno::ENOENT in testController#show
No such file or directory - http://www.google.com/
Going to http://www.google.com/, I obviously see the site. What am I doing wrong?
Thanks!
You need to require 'open-uri' first to be able to open() remote paths.
See the docs for more info.
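For example, a minimal sketch of what the action could look like once open-uri is loaded (the @something variable name is just taken from the question):
require 'open-uri'

# open-uri extends Kernel#open so it also accepts http(s) URLs, not just file paths
@something = open("http://www.google.com/").read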
You should use a utility like Nokogiri to parse the returned content like so:
(From the Nokogiri site's front page: http://nokogiri.org/)
require 'nokogiri'
require 'open-uri'
# Get a Nokogiri::HTML::Document for the page we're interested in...
doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove'))
# Do funky things with it using Nokogiri::XML::Node methods...
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end
will print to the screen:
Some Link
Related
This is my Scraper Controller
class ScraperController < ApplicationController
  def getinformation
    require 'open-uri'
    require 'nokogiri'
    @information = Nokogiri::HTML(open('https://ibotta.com/rebates'))
  end
end
And this is a web page showing the information that I'm getting from Nokogiri:
https://rails-tutorial2-chriscma.c9users.io/scraper/getinformation
I'm not getting any of the product names, and I'm not sure why.
The page you're trying to scrape is generated dynamically by executing JavaScript, so downloading the raw HTML and parsing it with Nokogiri won't give you the product names. It looks like the offers on the page are loaded from https://ibotta.com/web_v1/offers.json, but this isn't accessible directly. Therefore, I think you'll need to use something that can execute JavaScript, like Selenium / PhantomJS / headless Chrome / Watir, in order to load the page.
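A minimal sketch of that approach, using selenium-webdriver to drive headless Chrome and then handing the rendered HTML to Nokogiri (the .offer-name selector is only a placeholder, since I don't know the real markup of the offers):
require 'selenium-webdriver'
require 'nokogiri'

# Start a headless Chrome session so the page's JavaScript actually runs
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless')
driver = Selenium::WebDriver.for(:chrome, options: options)

driver.get('https://ibotta.com/rebates')
# Parse the fully rendered page with Nokogiri
doc = Nokogiri::HTML(driver.page_source)
driver.quit

# '.offer-name' is a made-up selector; inspect the page to find the real one
doc.css('.offer-name').each { |node| puts node.text }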
So I have this code in my application which is used to get an XML list of current rates from the web and save them for future use in the app.
def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today.to_s}.xml"
  Hash[Hash.from_xml(if File.exists? today_path
    File.read today_path
  else
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end)["Envelope"]["Cube"]["Cube"]["Cube"].map &:values]
end
This was written about half a year ago.
As of today, it no longer works. I get this error:
NameError in FormController#converter
uninitialized constant FormController::Net
What has gone wrong?
It looks like the net/http library is not being required. It may have been required somewhere else in your application and that line was deleted, or a gem that loaded the library may have been removed, which would explain why it used to work. Try adding the following line at the top of your file, before the class definition, and see if it works again.
require "net/http"
It looks like you need to do require 'net/http'. Add that line in your form_controller.rb file and try running that method again to check if it works.
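A minimal sketch of where that require would go, assuming the get_rates method from the question lives in this controller (the body of converter is only a guess):
require 'net/http'

class FormController < ApplicationController
  def converter
    # Net::HTTP now resolves because net/http is required at the top of the file
    @rates = get_rates
  end

  # ... get_rates as shown in the question ...
end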
I search for links via CSS from page = agent.get('http://www.print-index.ru/default.aspx?p=81&gr=198'), and after that I have a lot of links in the page variable, but I don't know how to use them or how to click on them via Mechanize. I found this method on Stack Overflow:
page = agent.get "http://google.com"
node = page.search ".//p[@class='posted']"
Mechanize::Page::Link.new(node, agent, page).click
but it works for only one link, so how can I use this method for many links?
If I should post additional information, please say so.
If your goal is simply to make it to the next page and then scrape some info off of it, then all you really care about are:
Page content (For scraping your data)
The URL to the next page you need to visit
The way you get to the page content could be done by using Mechanize OR something else, like OpenURI (which is part of Ruby's standard library). As a side note, Mechanize uses Nokogiri behind the scenes; when you start to dig into elements on the parsed page, you will see they come back as Nokogiri-related objects.
Anyways, if this were my project I'd probably go the route of using OpenURI to get at the page's content and then Nokogiri to search it. I like the idea of using a Ruby standard library instead of requiring an additional dependency.
Here is an example using OpenURI:
require 'nokogiri'
require 'open-uri'
printing_page = Nokogiri::HTML(open("http://www.print-index.ru/default.aspx?p=81&gr=198"))
# ...
# Your code to scrape whatever you want from the Printing Page goes here
# ...
# Find the next page to visit. Example: You want to visit the "About the project" page next
about_project_link_in_navbar_menu = printing_page.css('a.graymenu')[4] # This is an overly simple finder; Nokogiri can do XPath searches too.
about_project_link_in_navbar_menu_url = "http://www.print-index.ru#{about_project_link_in_navbar_menu.attributes["href"].value}" # Build the absolute URL for the page
about_project_page = Nokogiri::HTML(open(about_project_link_in_navbar_menu_url)) # Get the About page's content
# ....
# Do something...
# ....
Here's an example using Mechanize to get the page content (they are very similar):
require 'mechanize'
agent = Mechanize.new
printing_page = agent.get("http://www.print-index.ru/default.aspx?p=81&gr=198")
# ...
# Your code to scrape whatever you want from the Printing Page goes here
# ...
# Find the next page to visit. Example: You want to visit the "About the project" page next
about_project_link_in_navbar_menu = printing_page.search('a.graymenu')[4] # This is an overly simple finder; Nokogiri can do XPath searches too.
about_project_link_in_navbar_menu_url = "http://www.print-index.ru#{about_project_link_in_navbar_menu.attributes["href"].value}" # Build the absolute URL for the page
about_project_page = agent.get(about_project_link_in_navbar_menu_url)
# ....
# Do something...
# ....
P.S. I used Google to translate Russian to English, so if the variable names are incorrect, I'm sorry! :X
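If you do want to follow every matching link rather than picking one out by index, a minimal sketch with Mechanize (still assuming the a.graymenu selector from the examples above) could iterate over the matches and click each one:
require 'mechanize'

agent = Mechanize.new
printing_page = agent.get("http://www.print-index.ru/default.aspx?p=81&gr=198")

# Turn every matching node into a Mechanize link and follow it
printing_page.search('a.graymenu').each do |node|
  linked_page = Mechanize::Page::Link.new(node, agent, printing_page).click
  # Scrape whatever you need from linked_page here, e.g. its title
  puts linked_page.title
end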
I'm trying to fetch an image from Twitter:
open("http://api.twitter.com/1/users/profile_image/barackobama.png?size=bigger")
But I get:
RuntimeError: redirection forbidden: http://... -> https://...
There is an open issue, and it seems that I can use an extension to open-uri, but I don't know how it works. For example, if I place it in lib/ or if I paste the module into the console, it still doesn't work. Any ideas?
I think the proper place to put such a patch is in a file inside config/initializers, e.g. config/initializers/open_uri_allow_unsafe_redirects_patch.rb. You have to require 'open-uri' before reopening the OpenURI module:
require 'open-uri'
module OpenURI
# the rest of the file here...
end
Then you have to call open passing the option allow_unsafe_redirects set to true:
open('http://api.twitter.com/1/users/profile_image/barackobama.png?size=bigger',
     allow_unsafe_redirects: true)
You can find more information about initializer files in the Ruby on Rails guides.
I've just started using Ruby on Rails and so far it's working nicely. I'm now trying to implement a gem but it's not working, and I am hoping it's just a beginner mistake - something which I've not yet grasped!!
I've followed tutorials and got my hello world example - also managed to git push this to my Heroku account. I started to follow the tutorial here: http://railscasts.com/episodes/190-screen-scraping-with-nokogiri and got the following code working in Terminal (on mac)
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=batman&Find.x=0&Find.y=0&Find=Find"
doc = Nokogiri::HTML(open(url))
puts doc.at_css("title").text
So that worked nicely. I can see the title in the terminal. However, when I try to put this code in my controller, it cannot find the Nokogiri gem. The code in my controller is:
class HomeController < ApplicationController
  def index
    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=batman&Find.x=0&Find.y=0&Find=Find"
    doc = Nokogiri::HTML(open(url))
    @mattVar = doc.at_css("title").text
  end
end
So then in my HTML, I have
<h1>Hello!</h1>
<p><%= @mattVar %></p>
Which should output the title as it does in the terminal window. However, I get an error saying no such file to load -- nokogiri
Any ideas where I'm going wrong?
If I do gem list I can see the nokogiri gem there.
UPDATE
I closed the local Rails server and restarted it. It's now working. Not sure if it was because of adding the gem to the Gemfile in my app root or running 'bundle install' (as the answers below suggested). Thanks for the help.
Firstly, try changing your controller to be something like:
class HomeController < ApplicationController
  require 'nokogiri'
  require 'open-uri'

  def index
    url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=batman&Find.x=0&Find.y=0&Find=Find"
    doc = Nokogiri::HTML(open(url))
    @mattVar = doc.at_css("title").text
  end
end
I'm guessing you are also using Bundler; check that you have included this gem in your Gemfile and run bundle install.
You can find the documentation for this at http://gembundler.com/
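A minimal sketch of the Gemfile entry (the rest of the Gemfile is omitted, and the exact gems you need will differ):
# Gemfile
source 'https://rubygems.org'

gem 'rails'
gem 'nokogiri' # then run bundle install and restart the Rails server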
You should put your various require statements outside of your index method. It won't slow you down much to have them in there, but you're asking Ruby to ensure that they are loaded every time the method is called. Better to just require them once at the top of your code. This won't cause your problem, however.
I suspect that you are running your server under a different environment than the one IRB is running in. Try removing the offending require statements for the moment and changing your HTML template to:
<h1>Environment</h1>
<ul>
  <li>Ruby: <%= RUBY_VERSION %></li>
  <li>Gems: <%= `gem list` %></li>
</ul>