Get content from an href page when crawling data in Rails? - ruby-on-rails

I want to crawl data from a website. The page looks like this:
HTML:
<div>
<ul>
<li>Place1</li>
<li>Place2</li>
</ul>
</div>
Inside "http://.../place1":
<div>
<p>Place 1</p>
<img src="...">
</div>
How can I crawl the data behind each href using the Nokogiri gem, that is, the data on the page that opens when you click the link?
In my research I have only found how to crawl data on a single page, not how to crawl the page an href points to. Thanks.

To crawl the data behind an href, you have to make a new request to that URL and parse the response.
require 'nokogiri'
require 'open-uri'

href = 'http://.../place1'
# On Ruby 3.0+, open-uri only handles URLs through URI.open, not bare open
doc = Nokogiri::HTML(URI.open(href))

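You can collect all links with the .css method and then crawl each one like this:
require 'nokogiri'
require 'open-uri'

# doc is the parsed listing page; <a> tags without an href are skipped
links = doc.css('a').map { |link| link['href'] }.compact
links.each do |link|
  page = Nokogiri::HTML(URI.open(link)) # one new request per (absolute) link
  # extract what you need from page here
end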
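The loop above requests each href exactly as written, which only works when the page uses absolute URLs; many sites link with relative paths such as /place1. Below is a minimal sketch, using only Ruby's standard uri library, of turning the raw href values into absolute http(s) URLs before requesting them. The helper name absolute_links, the base URL and the sample hrefs are hypothetical; each returned URL can then be fetched and parsed with Nokogiri, one request at a time.

```ruby
require 'uri'

# Hypothetical helper (not part of Nokogiri): turns raw href values, such as
# those from doc.css('a').map { |link| link['href'] }, into absolute
# http(s) URLs that can each be fetched with a new request.
def absolute_links(base_url, hrefs)
  base = URI.parse(base_url)
  hrefs.compact.filter_map do |href|
    href = href.strip
    next nil if href.empty?

    begin
      uri = URI.join(base, href) # resolves "place1" or "/place1" against the base
    rescue URI::Error
      next nil # skip hrefs that are not valid URIs
    end

    next nil unless uri.is_a?(URI::HTTP) # keep only http/https (URI::HTTPS is a subclass)
    uri.fragment = nil                   # "#photos" still points at the same page
    uri.to_s
  end.uniq
end

base  = 'https://example.com/places/' # assumed URL of the listing page
hrefs = ['place1', '/place2', 'https://example.com/places/place1#photos',
         'mailto:someone@example.com', nil]
p absolute_links(base, hrefs)
```

Dropping fragments and calling uniq keeps the crawler from requesting the same page twice when several links point at different anchors on it.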

Related

Rails: How to read a list of images from a URL

I fetch a URL like this:
images = URI.open("http://example.com").read
which returns:
<center>
<font size=-1>
<img src=example.com/show?1><br>1 image<p>
<img src=example.com/show?2><br>2 image<p>
<img src=example.com/show?3><br>3 image<p>
</font>
I want to capture each of these on the backend and send them to the front end.
So far I have been sending the resulting HTML directly to the front end, where it was displayed. Now I want to capture the images on the backend and send each one to the UI. How can I do this?
I recommend Nokogiri for this. You can then do something like:
require 'nokogiri'
require 'open-uri'

html_string = URI.open("http://example.com").read
doc = Nokogiri::HTML(html_string)
image_tags = doc.css('img')                   # the <img> nodes themselves
image_sources = image_tags.map { |i| i['src'] } # just their src attributes
Hope this helps.
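To send each image to the UI separately, one option is to return the collected sources as JSON and let the front end render them one by one. A minimal sketch with Ruby's standard json library; the payload shape (an images array of position/src objects) is a hypothetical choice, and the sample sources are copied from the question. In a Rails controller, render json: payload would serialize the same hash.

```ruby
require 'json'

# Stand-in for the array built with .css('img').map { |i| i['src'] }
image_sources = ['example.com/show?1', 'example.com/show?2', 'example.com/show?3']

# Hypothetical payload shape: one entry per image, in page order,
# so the front end can place each image independently.
payload = {
  images: image_sources.each_with_index.map { |src, index| { position: index + 1, src: src } }
}

json = JSON.generate(payload)
puts json
```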

Display all Ghost tags

I would like to add all my tags to a footer section. {{tags}} seems to work only within a post, not elsewhere in the default template.
How can I display a full list of all tags created?
I managed to get the result using the Ghost API; however, I would be surprised if that were the only way.
You should use the get helper:
{{#get "tags" limit="all"}}
<ul class="tags">
{{#foreach tags}}
<li>
{{name}}
</li>
{{/foreach}}
</ul>
{{/get}}
You can read more about the get helper in the Ghost docs.

Can't select <article> selector with Nokogiri

Here is the HTML source I am trying to scrape:
<section class="articles">
<article role="article">
</article>
<article role="article">
</article>
I am trying to scrape the href with this:
require 'open-uri'
require 'nokogiri'
url = "http://www.vg.no/sport/langrenn/"
doc = Nokogiri::HTML(URI.open(url))
doc.css(".articles article").each do |i|
location = i.at_css("a")[:href]
puts location
end
I have tried many other things, but this seems like it should work. I have been able to scrape content using other selectors on this page, just nothing inside the <article></article> tags, which contain everything I need.

Using a Rails data collection in .js.erb

I have a books index page which shows all books using the familiar structure:
@books.each do |book|
....
end
On the same page I want to show a Google Chart, using the same @books data.
The chart is triggered by a div:
<div id="chart_div"></div>
And there is a chart.js.erb file:
if ($("#chart_div").length > 0){
google.load('visualization', '1.1', {packages: ['corechart']});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['Title', 'Author'],
<%- @books.each do |book| %>
<%= "['#{book.title}','#{book.author}']," %>
<%- end %>
]);
var chart = new google.visualization.AreaChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
}
But this returns the error undefined method `each' for nil:NilClass, so @books is not available in the .js.erb file. How can I make the books available in the JavaScript? And are there better options for using this data in the chart?
JavaScript assets are not compiled in the context of a controller action, so you cannot use instance variables assigned in the controller. You have two options:
Load the data from the server via AJAX in your chart.js file.
Store the @books data in your DOM during page rendering, e.g. as JSON, and then read it from chart.js.
The second option is a little easier to explain.
Add something like this to the page with the chart:
<script id="chart_data" type="application/json" charset="utf-8">
<%= raw @books.map { |book| [h(book.title), h(book.author)] }.unshift(['Title', 'Author']).to_json %>
</script>
Then, in the drawChart function in chart.js, load the data with:
var data_array = JSON.parse($("#chart_data").html());
var data = google.visualization.arrayToDataTable(data_array);
You can also drop the .erb extension from chart.js, since you no longer need to compile the JavaScript with ERB.
I hope this helps a bit.
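The JSON embedded in the script tag above can also be prepared in plain Ruby, which makes the row shape easy to check. A minimal sketch using only the standard json library; the Book struct stands in for the ActiveRecord model and assumes just title and author attributes. Rails' own to_json already escapes <, > and & by default, but stdlib JSON.generate does not, so the sketch escapes "</" by hand to keep a title from closing the script tag early.

```ruby
require 'json'

# Struct stands in for the Book model; only title and author are assumed.
Book = Struct.new(:title, :author)

books = [
  Book.new('Dune', 'Frank Herbert'),
  Book.new('</script><b>x</b>', 'Tricky Title') # a title that would break naive embedding
]

# Header row first, then one [title, author] row per book:
# the array-of-arrays shape google.visualization.arrayToDataTable expects.
rows = [['Title', 'Author']] + books.map { |b| [b.title, b.author] }

# "\/" is a valid JSON escape for "/", so replacing "</" with "<\/" keeps the
# data identical after JSON.parse. The block form avoids gsub's special
# handling of backslashes in replacement strings.
chart_json = JSON.generate(rows).gsub('</') { '<\/' }

puts chart_json
```

On the JavaScript side, JSON.parse turns "<\/" back into "</", so the chart sees the original titles.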

Nokogiri: how to find a div by id and see what text it contains?

I just started using Nokogiri this morning and I'm wondering how to perform a simple task: I just need to search a webpage for a div like this:
<div id="verify" style="display:none"> site_verification_string </div>
I want my code to look something like this:
require 'nokogiri'
require 'open-uri'
url = h(@user.first_url)
doc = Nokogiri::HTML(URI.open(url))
if #SEARCH_FOR_DIV#.text == site_verification_string
@user.save
end
So the main question is: how do I search for that div using Nokogiri?
Any help is appreciated.
require 'nokogiri'

html = <<-HTML
<html>
<body>
<div id="verify" style="display: none;">foobar</div>
</body>
</html>
HTML
doc = Nokogiri::HTML html
puts 'verified!' if doc.at_css('[id="verify"]').text.eql? 'foobar'
A simple way to get an element by its ID is .at_css("element#id").
For example, to find a div with the id "verify":
require 'nokogiri'
require 'open-uri'
html = Nokogiri::HTML(URI.open("http://example.com"))
puts html.at_css("div#verify")
This returns the div node; puts prints its HTML, including everything it contains. Call .text on it to get only its text.
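The div in the question has spaces around the token (" site_verification_string "), so comparing .text directly with the expected string would fail, and at_css returns nil when the page has no such div. A minimal sketch of a comparison that handles both cases; the helper name verification_matches? is hypothetical, and its first argument stands for doc.at_css('div#verify')&.text.

```ruby
# Hypothetical helper: found_text stands for doc.at_css('div#verify')&.text,
# which is a String, or nil when the page has no div#verify.
def verification_matches?(found_text, expected)
  return false if found_text.nil? # div missing from the page
  found_text.strip == expected    # ignore whitespace around the token
end

puts verification_matches?(' site_verification_string ', 'site_verification_string')
puts verification_matches?(nil, 'site_verification_string')
```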
