Using Nokogiri to scrape data: "undefined method `text'" - ruby-on-rails

I am using Nokogiri in my Rails application to scrape information from a website but am getting:
NoMethodError: "undefined method `text' for nil:NilClass".
This is sample code:
require 'nokogiri'
require 'open-uri'
url = "https://btc-e.com/exchange/btc_usd/"
doclink = Nokogiri::HTML(URI.open(url)) # URI.open: open-uri no longer patches Kernel#open on Ruby 3+
doclink.at_css(".orderStats:nth-child(1) strong").text
I am trying to pull in the "Last Price" listed in the URL. I used the "SelectorGadget" Chrome Add-in to find the CSS description. I also tried using .orderStats strong but got the same no method error. How do I fix this?

The page you are referring to uses JavaScript to populate itself. Since Nokogiri doesn't execute JS, the page Nokogiri fetches is pretty useless:
<html>
<head><title>loading</title></head>
<body>
<p>Please wait...</p>
<script>/* POPULATES THE PAGE */</script>
</body>
</html>
One solution would be to use a scraper that executes JS, e.g. Capybara+PhantomJS. Here's an article that describes how: http://www.chrisle.me/2012/12/scraping-html5-sites-using-capybara-phantomjs/. Google for more info.

Related

How to parse a Google search page to get result statistics and AdWords count using Nokogiri

I am trying to scrape a Google search page to learn scraping, using code like this:
doc = Nokogiri::HTML(URI.open("https://www.google.com/search?q=cardiovascular+diesese"))
I want to get the result statistics text on every search page, but I can't find the position of the content in the parsed HTML. I can inspect the page in the browser and see it's in a <div id="result-stats">. I tried this to find it:
doc.at_css('[id="result-stats"]').text
Your use of CSS is awkward. Consider this:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<div id="result-stats">foo</div>
</body>
</html>
EOT
doc.at_css('[id="result-stats"]').text # => "foo"
doc.at('#result-stats').text # => "foo"
CSS uses # for id, so '[id="result-stats"]' is unnecessarily verbose.
Nokogiri is smart enough to know to use CSS when it looks at the selector; in many years of using it I've only fooled it once and was forced to use the CSS- or XPath-specific versions of the generic search or at methods. By using the generic methods you can change the selector between CSS and XPath without changing the method being called. "Using 'at', 'search' and their siblings" talks about this.
In addition, just for fun, Nokogiri should have all the jQuery extensions to CSS as those were on the v2.0 roadmap for Nokogiri.
You need to use Selenium WebDriver to get dynamic content. Nokogiri alone cannot parse it.
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :firefox
driver.get "https://www.google.com/search?q=cardiovascular+diesese"
doc = Nokogiri::HTML driver.page_source
doc.at_css('[id="result-stats"]').text

Google analytics with Rails 5 and ActiveAdmin

I've followed these steps and can successfully see traffic going to my site. The problem is that I can see the traffic of every controller/method EXCEPT those within the admin namespace, because the script in app/views/layouts/application.html.erb (shown below) is not executed on any ActiveAdmin page.
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-XXXXXXX-XX', 'herokuapp.com');
ga('send', 'pageview');
</script>
I know Devise can tell me how many times a user has signed in to my site, but I would like to see more statistics.
How could I solve this? I don't want to put a render partial with that code in every ActiveAdmin controller/method just for this task. Is there an equivalent of app/views/layouts/application.html.erb for ActiveAdmin, that is, a way to put a script tag in every ActiveAdmin view?
You can add custom scripts to ActiveAdmin's layout by simply editing app/assets/javascripts/active_admin.js. This file should have been created when you first set up ActiveAdmin.
Create a new javascript file to hold the Google Analytics script, and simply include the new script in active_admin.js.
It should look something like this:
// app/assets/javascripts/google_analytics.js
// Google analytics script goes here.
// app/assets/javascripts/active_admin.js
//= require active_admin/base
//= require ./google_analytics

Search through raw code for <script> tags in Ruby on Rails

I am using HTTParty gem to get raw html but I am only interested in the JSON section of the code. That part of the code is nested inside a
<script type="text/javascript">.
From within my Rails app, how can I select only the section I am interested in?
response = HTTParty.get('https://www.instagram.com/crossfitwanderlust_bali/')
Use the Nokogiri gem. With it, you can parse the HTML content easily and select the <script> element you need.

Chartkick for Newsletters

Is it possible to use Chartkick without rails? I would like to write a ruby script that generates a HTML newsletter which can be send by mail. The output should be a HTML file.
How do I integrate Chartkick into my project and supply it with the data needed?
Yes, you can use Chartkick without Rails. Here is a basic example; for more on ERB, see http://www.stuartellis.name/articles/erb/
Inside irb, type this:
require 'erb'
require 'chartkick'
include Chartkick::Helper

# Instance variable so the ERB template can see it at the top level
@data = [
  ["Washington", "1789-04-29", "1797-03-03"],
  ["Adams", "1797-03-03", "1801-03-03"],
  ["Jefferson", "1801-03-03", "1809-03-03"]
]
template = "<%= timeline @data %>"
renderer = ERB.new(template)
puts renderer.result()
This gives you the HTML and JS you need, but you have to include the JS libraries manually:
<script src="https://www.google.com/jsapi"></script>
<script src="chartkick.js"></script>
chartkick.js can be downloaded from the Chartkick project.
On a side note: you don't even need to use Ruby, since chartkick.js is just a JavaScript library.
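Following the side note, here is a sketch that skips the chartkick gem and uses only Ruby's standard library to write an HTML file that calls chartkick.js directly. The output file name newsletter.html, the element id chart-1, and the inline height are assumptions for illustration.

```ruby
require 'erb'
require 'json'

data = [
  ["Washington", "1789-04-29", "1797-03-03"],
  ["Adams", "1797-03-03", "1801-03-03"],
  ["Jefferson", "1801-03-03", "1809-03-03"]
]

# The template embeds the data as JSON and hands it to chartkick.js.
template = ERB.new(<<~HTML)
  <html>
  <head>
  <script src="https://www.google.com/jsapi"></script>
  <script src="chartkick.js"></script>
  </head>
  <body>
  <div id="chart-1" style="height: 300px;"></div>
  <script>new Chartkick.Timeline("chart-1", <%= data.to_json %>);</script>
  </body>
  </html>
HTML

# result_with_hash exposes `data` to the template as a local variable.
html = template.result_with_hash(data: data)

# Hypothetical output path for the generated page.
File.write("newsletter.html", html)
```

The generated page still depends on chartkick.js and the Google loader being reachable from wherever the file is opened.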

Using javascript global window variables and integration testing

There's this nifty Stack Overflow post on passing variables to JavaScript. It echoes this Railscast episode. The technique works like a charm for configuring a jQuery datepicker, but causes all my JavaScript integration tests to fail.
Here is the code in application.html.erb:
<script type="text/javascript">
<%-# commented line -%>
window.sDateFormatLocal = "<%= t 'date.formats.js_date' %>"
</script>
This is a datepicker initialization that uses it
$("input.datepicker").datepicker({
  dateFormat: sDateFormatLocal,
  onClose: function(value, ui) {
    console.log("regular old datepicker");
  }
});
It appears to work very well. The only problem is that all my integration tests with 'js: true' now fail. Here are the errors I get:
Capybara::Poltergeist::JavascriptError:
One or more errors were raised in the Javascript code on the page:
ReferenceError: Can't find variable: sDateFormatLocal
When I run in browser (Chrome, Firefox) there are no errors or warnings in the console.
For completeness, a typical spec looks like this:
describe "The root" do
it "should load the page and have js working.", :js => true do
visit root_path
page.should have_content "Hello world"
end
end
Is there a setting I am missing to allow variables like this in poltergeist?
If your datepicker initialization is not inside a jQuery document-ready callback or a similar hook such as window.onload, you'll have trouble.
By default, application.js is loaded in the head, and the JS code in your html.erb comes later. The browser loads the HTML and assets in parallel, and the assets will very likely finish loading first. If you execute the function right away instead of waiting for document ready, the variable is still undefined.
If you missed that, put the code in a ready block.
A better practice for exporting a server variable is to put it in the head before application.js rather than in the HTML body, so the variable is always defined by the time application.js runs.
Just to follow up on this, in case someone is debugging a similar problem: it turned out that not every view was using the same template. For instance, the sign-in screen has a different head. That caused the variable not to load in certain circumstances, such as the ones where my tests were failing, but not in others. Bottom line: when you reproduce a failing test by hand, follow exactly the path the test takes, which in my case meant passing through the sign-in screen.
