I have a div that looks like the following, and I am trying to scrape the data using itemprop, but I can't seem to get it to work.
<div class="information">
<h1 itemprop="title">Some title here</h1>
<span itemprop="addressLocality">St. Inigoes</span>,
<span itemprop="addressRegion">MD</span>
<span itemprop="addressCountry">US</span>
</div>
Without itemprop I can get the data using data.css('.information').css('h1').try(:text), but when I try data.css('meta[@itemprop="title"]') the result is null.
So my question is: how can I scrape the data of all the span and h1 elements using itemprop?
You should be able to scrape it with XPath attribute predicates, like this:
title = data.at("//h1[@itemprop = 'title']").children.text
addressLocality = data.at("//span[@itemprop = 'addressLocality']").children.text
addressRegion = data.at("//span[@itemprop = 'addressRegion']").children.text
addressCountry = data.at("//span[@itemprop = 'addressCountry']").children.text
I have a URL that I read like this:
images = open("example.com").read
which returns
<center>
<font size=-1>
<img src=example.com/show?1><br>1 image<p>
<img src=example.com/show?2><br>2 image<p>
<img src=example.com/show?3><br>3 image<p>
</font>
I want to capture each of these images on the backend and send them to the front end.
So far I have been sending the resulting HTML directly to the front end, where it was displayed. Now I want to capture it on the backend and then send each image to the UI. How can I do this?
I would recommend Nokogiri for this. You can then do something like this:
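require 'nokogiri'
require 'open-uri'

# Kernel#open no longer fetches URLs on Ruby 3+, so use URI.open with a full URL.
html_string = URI.open("http://example.com").read
nokogiri_html_string = Nokogiri::HTML(html_string)
image_tags = nokogiri_html_string.css('img')
image_sources = image_tags.map { |i| i['src'] }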
Hope this will help.
<div>
<div class="col-md-4">
<h3 class="textStrong">Latest Tweets</h3>
<a class="twitter-timeline" href="https://twitter.com/RFUK">Tweets by RFUK </a></div>
</div>
<div class="col-md-4"></div>
<div class="col-md-4">
<h2>News Feeds</h2>
@{
var news = new List<Piranha.Entities.Post>();
using (var db = new Piranha.DataContext()) {
news = db.Posts
.Include(p => p.CreatedBy)
.Where(p => p.Template.Name == "News Post Types")
.OrderByDescending(p => p.Published)
.Take(4).ToList();
}
}
@foreach (var post in news) {
<div class="post">
<h2>@post.Title</h2>
<p class="meta">Published @post.Published.Value.ToString("yyyy-MM-dd") by @post.CreatedBy.Firstname</p>
<p>@post.Excerpt</p>
<img src="@post.Attachments">
</div>
I'm working with posts, and the code above works really well. However, I also want to display the attached image with each post. How can I do that? I tried this:
<img src="@post.Attachments">
It doesn't appear to work. Any suggestions on what I need to do?
As @andreasnico pointed out, Attachments is a collection of referenced media asset IDs. If you want to display the first attachment (assuming you know it's an image), you could do it like this:
@foreach (var post in news) {
    <div class="post">
        <h2>@post.Title</h2>
        <p class="meta">Published @post.Published.Value.ToString("yyyy-MM-dd") by @post.CreatedBy.Firstname</p>
        <p>@post.Excerpt</p>
        @if (post.Attachments.Count > 0) {
            <img src="@UI.Content(post.Attachments[0])">
        }
    </div>
}
This would get the content URL for the first attachment and use it as the source to the image. Note that you can also scale & crop images for use in lists like this with:
<img src="@UI.Content(post.Attachments[0], 300, 100)">
This would scale & crop the image to be 300px wide & 100px high. You can read more about this here: http://piranhacms.org/docs/api-reference/ui-helper
Also, if the page displaying the post list is controlled by the CMS and has a page type, I'd suggest you look into adding either a PostRegion or a PostModelRegion to that page. These regions automatically load a collection of posts into the page model, and you can specify the amount, the sort order and some other settings. This makes it easier to reuse the page type while, for example, changing which type of post is displayed for different page instances.
Regards
Håkan
I'm trying to get the src value from a block of HTML. I specifically want to do this with at_css rather than XPath.
So far all I'm getting is either nil or a blank string.
This is the HTML:
<div class="" id="imageProductContainer">
<a id="idLinkProductMainImage" href='URL'>
<img id="productMainImage" src="SRC.jpg" alt="alt" title="A Title" align="left" class="product_image_productpage_main selectorgadget_selected">
</a>
</div>
The code I have is:
item = page.doc.at_css("#productMainImage img").text.strip unless page.doc.at_css("#productMainImage img").nil?
puts item #prints blank
item = item["src"]
puts item #prints blank
Where page.doc is the Nokogiri HTML element.
If you need the src attribute, you can do it like this:
page.doc.at_css('#idLinkProductMainImage img').attr('src')
Also, I believe the problem is the way you are selecting the img tag. You are looking for an img tag inside #productMainImage, but that ID belongs to the image itself, so nothing is found.
If you use the link's ID, #idLinkProductMainImage, then there is an img tag inside it to find.
I'm trying to make a searchable list of posts in a Ruby on Rails application that I made. I have AngularJS working in the application. All of the posts are available in Rails as @posts. How would I make AngularJS filter over that?
Here is the relevant view:
<h1>Posts</h1>
<div ng-app>
<div ng-controller = "PostCtrl">
<input placeholder = "Search Post Titles", ng-model="searchText">
<div ng-repeat="post in posts | filter:searchText">
<h2>{{post.title}}</h2>
<p>{{post.text}}</p>
</div>
</div>
</div>
I'm not sure how to fill the Angular array posts with the objects in @posts.
It appears that your code works as it is. Here is a plunker that seems to recreate your code.
If you would like to filter inside the post object, you can use this syntax:
ng-repeat="post in posts | filter:{ text: searchText }"
The above will only search the values of the text property of post.
Building on Davin Tryon's answer, you can try this.
First, use ng-init="init()" on the controller element:
<div ng-controller = "PostCtrl" ng-init="init( <%= @posts.to_json %> )">
Then in your controller you can do:
$scope.init = (posts) ->
  $scope.posts = angular.fromJson(posts)
Then posts will be a JavaScript object available on your scope, just as if you had queried it using an Angular resource.
If you want to include the associations of posts (comments, for example), check the Rails docs for to_json (or as_json) and the :include option.
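To illustrate what ends up inside the ng-init attribute, here is a plain-Ruby sketch that runs without Rails. The posts array is a hypothetical stand-in for the records in @posts, and ERB::Util.html_escape is used on the assumption that it approximates the escaping Rails applies to <%= %> output by default.

```ruby
require 'json'
require 'erb'

# Hypothetical stand-in for the records held in @posts.
posts = [
  { 'title' => 'First post', 'text' => 'Hello "world"' },
  { 'title' => 'Second post', 'text' => 'More text' }
]

json = posts.to_json

# Escape the JSON so its double quotes cannot terminate the HTML attribute.
escaped = ERB::Util.html_escape(json)

attribute = %(ng-init="init(#{escaped})")
puts attribute
```

The browser decodes the escaped quotes back when it reads the attribute value, so the expression passed to init is the original JSON literal.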
I just started using Nokogiri this morning and I'm wondering how to perform a simple task: I just need to search a webpage for a div like this:
<div id="verify" style="display:none"> site_verification_string </div>
I want my code to look something like this:
require 'nokogiri'
require 'open-uri'
url = h(@user.first_url)
doc = Nokogiri::HTML(open(url))
if #SEARCH_FOR_DIV#.text == site_verification_string
  @user.save
end
So the main question is, how do I search for that div using nokogiri?
Any help is appreciated.
html = <<-HTML
<html>
<body>
<div id="verify" style="display: none;">foobar</div>
</body>
</html>
HTML
doc = Nokogiri::HTML html
puts 'verified!' if doc.at_css('[id="verify"]').text.eql? 'foobar'
For a simple way to get an element by its ID you can use .at_css("element#id")
For example, to find a div with the id "verify":
require 'nokogiri'
require 'open-uri'
html = Nokogiri::HTML(URI.open("http://example.com"))
puts html.at_css("div#verify")
This will get you the div and all the elements it contains.