I have the following RSS feed. When I parse it with Feedzirra, SimpleRSS, and RSS::Parser in turn, I hit the same problem with each: for the item nodes in this feed, only the title and description are shown, and the url and id nodes are skipped.
<rss xmlns:content="![CDATA[http://purl.org/rss/1.0/modules/content/]]" version="2.0">
<parsererror style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
<h3>This page contains the following errors:</h3>
<div style="font-family:monospace;font-size:12px">
error on line 1 at column 63: xmlns:content: '![CDATA[http://purl.org/rss/1.0/modules/content/]]' is not a valid URI
</div>
<h3>
Below is a rendering of the page up to the first error.
</h3>
</parsererror>
<channel>
<title>
<![CDATA[ Leonardo Hotels ]]>
</title>
<link>
<![CDATA[ http://www.leonardo-hotels.mobi ]]>
</link>
<description>
<![CDATA[ All Leonardo Hotels ]]>
</description>
<copyright>
<![CDATA[ Copyright 2013, Silvertravel.co.il ]]>
</copyright>
<ttl>20</ttl>
<lastBuildDate>
<![CDATA[ Thu, 21 Nov 2013 11:34:38 EST ]]>
</lastBuildDate>
<item>
<content:lang>
<![CDATA[ eng ]]>
</content:lang>
<content:id>
<![CDATA[ 16 ]]>
</content:id>
<title>
<![CDATA[ Leonardo Suite Hotel Tel Aviv-Bat Yam ]]>
</title>
<description>
<![CDATA[
Located directly at the beach, the Leonardo Suite Hotel Tel Aviv-Bat Yam offers 108 ]]>
</description>
<content:url>
<![CDATA[
http://www.leonardo-hotels.mobi/octopus/Upload/Images/Resorts/batyam deluxe livingroom 255.jpg
]]>
</content:url>
</item>
.
.
.
<item> ...</item>
<item> ...</item>
For more insight into this problem, here is the code for each parser:
For RSS::Parser
feed_url = 'http://www.leonardo-hotels.mobi/rss.aspx?lang=eng'
url, timeout = feed_url.strip, 60
uri = URI.parse(URI.encode(url))
http = Net::HTTP.new(uri.host, uri.port)
http.open_timeout, http.read_timeout = timeout, timeout
http.request_get(uri.request_uri) do |response|
  data = RSS::Parser.parse(response.read_body, false, false)
  puts data.channel.item.inspect
  return data.channel.items
end
For Simple RSS
url = 'http://www.leonardo-hotels.mobi/rss.aspx?lang=eng'
rss = SimpleRSS.parse open(url)
puts rss.channel.items.first
For Feedzirra
url = 'http://www.leonardo-hotels.mobi/rss.aspx?lang=eng'
rss = Feedzirra::Feed.fetch_and_parse(url)
puts rss.entries.first.inspect
The problem is that none of them shows the data from all of the item node's children.
I found that all three solutions mentioned above actually work perfectly; the problem was in the provided RSS feed.
How did I find out that my code above works?
I took the feed URL below as a test sample and ran my code against it. Everything worked. That is how I learned that the client's RSS feed did not follow the standard.
http://feeds.feedburner.com/railscasts
Note:
I haven't removed the question because it contains useful information for anyone who needs help with this specific case.
For Feedzira installation help
How I installed Feedzira
I'm creating a Ruby on Rails application and using Nokogiri to parse an XML file. I'm trying to parse the XML file into mutable strings which I can manipulate to create other content.
Here's a sample XML I'm using
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<title type="html">
<![CDATA[ First Post! ]]>
</title>
<content type="html">
<![CDATA[
<p>I’m very excited to have finally got my site up and running along with this blog!</p>]]>
</content>
</entry>
</feed>
This is what I've done so far relating to my problem
In my controller -
def index
  @blog_title, @blog_post = parse_xml
end
private
def parse_xml
  @xml_doc = Nokogiri::XML(open("atom.xml"))
  titles = @xml_doc.css("entry title")
  post = @xml_doc.css("content")
  return titles, post
end
In my view -
<% for i in 1..@blog_title.length %>
<li><%= @blog_title[i-1] %></li>
<li><%= @blog_post[i-1] %></li>
<% end %>
A sample output from the view (it returns a Nokogiri Element) -
<title type="html"><![CDATA[First Post!]]></title>
So ideally, I'd like to make all the Nokogiri::Element inside the Nokogiri::Document a string or make the entire array a String array.
I've tried iterating through each element and calling .to_s but it doesn't seem to work.
I've also tried calling Ruby::String methods such as slice and that doesn't work (for obvious reasons).
The end result I'm trying to get at (using the sample output on my view) is to return only the following and none of the rest.
First Post!
Can anyone help me? If I'm not clear enough or if someone needs to see more work, please feel free to ask!
For your case you should simply use .text to extract the content of tags. Something like titles.text would work.
You're dealing with RSS/Atom feeds which can contain multiple title tags. You need to iterate over all title nodes and extract their content separately, in a way that lets you keep track of their order and what article they're attached to:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<title type="html">
<![CDATA[ First Post! ]]>
</title>
<content type="html">
<![CDATA[
<p>I’m very excited to have finally got my site up and running along with this blog!</p>]]>
</content>
</entry>
</feed>
EOT
doc.search('title').map(&:text)
# => ["\n First Post! \n "]
This returns an array of the text inside the title nodes. From there you can easily clean up each string, manipulate them, reuse them, whatever.
doc.search('title').map{ |s| s.text.strip }
# => ["First Post!"]
search returns a NodeSet, which is akin to an array of title nodes found in the document. If you don't iterate over them you'll get a concatenated string containing all their text, which is usually NOT what you want:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<foo>
<title>this</title>
<title>is</title>
<title>what</title>
<title>you'd</title>
<title>get</title>
</foo>
EOT
doc.search('title').text
# => "thisiswhatyou'dget"
versus:
doc.search('title').map(&:text)
# => ["this", "is", "what", "you'd", "get"]
Trying to tear apart the first result is impossible unless you have prior knowledge of the document's structure which is usually not true. Iterating over the returned NodeSet will yield very usable results.
To maintain consistency with the various title tags in a feed, you need to loop over the entries, then extract the embedded titles which is a bit different than what your sample XML and code shows:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<title type="html">
<![CDATA[ First Post! ]]>
</title>
<content type="html">
<![CDATA[
<p>I’m very excited to have finally got my site up and running along with this blog!</p>]]>
</content>
</entry>
<entry>
<title type="html">
<![CDATA[ Second Post! ]]>
</title>
<content type="html">
<![CDATA[
<p>blah</p>]]>
</content>
</entry>
</feed>
EOT
titles = doc.search('entry').map { |entry|
entry.at('title').text.strip
}
titles # => ["First Post!", "Second Post!"]
Or perhaps more usable:
titles_and_content = doc.search('entry').map { |entry|
[
entry.at('title').text.strip,
entry.at('content').text.strip
]
}
titles_and_content
# => [["First Post!",
# "<p>I’m very excited to have finally got my site up and running along with this blog!</p>"],
# ["Second Post!", "<p>blah</p>"]]
which returns the title and the content for each entry. From this you can easily build up code to extract the links to the articles, date of publishing, refresh-rates, original site, everything you'd want to know about an individual article and its source, then store it in a database for later regurgitation when requested.
There are gems and scripts available for processing RDF, RSS and Atom feeds, however, years ago, when I had to write a huge aggregator for feeds, nothing was available that met my needs and I wrote one from scratch. I'd recommend trying to find one rather than reinvent that wheel, otherwise look through their source and learn from their experience. There are a number of things to do in code to be a good network-citizen that doesn't swamp the servers and get you banned.
See "How to avoid joining all text from Nodes when scraping" also.
I am trying to set up an integration using the SDK to create KB article records in CRM 2013, and so far I haven't been able to figure out a good way to build the article XML. We want to use SharePoint as our document authoring tool and then send those documents over to CRM. From the research I've done so far, I know that in order to create a new KB article I need to link it to a template. I created a very basic template with one section as a test, then in a test app using the SDK I created a new KBArticle entity instance, set the required fields, and assigned the template to the new article. I tried building the XML for the ArticleXML attribute by starting with the StructureXML attribute of the template and filling in the content section with some test HTML content. I was able to create the KB article successfully and load it in CRM, but it doesn't look right yet. I also created a new KB article through the UI, retrieved it using the SDK, and examined its ArticleXML attribute to compare it with the one I'm trying to create programmatically.
Here's the basic structure of the ArticleXML for an article created in the UI:
<articledata>
<section id="0">
<content>
<![CDATA[<b>Article content located here</b>]]>
</content>
</section>
<section id="1">
<content>
<![CDATA[]]>
</content>
</section>
</articledata>
Now here is the StructureXML attribute value from the template I created:
<kbarticle>
<sections nextSectionId="1">
<section type="docprop" name="title"/>
<section type="docprop" name="number"/>
<section type="edit" id="0">
<![CDATA[Content]]>
<instructions>
<![CDATA[Place KB article content here]]>
</instructions>
</section>
</sections>
<stylesheet>
<article>
<style name="background-color" value="#ffffff"/>
<style name="font-family" value="verdana"/>
<style name="font-size" value="10pt"/>
</article>
<title>
<style name="font-family" value="verdana"/>
<style name="font-size" value="16pt"/>
</title>
<number>
<style name="color" value="#666666"/>
<style name="font-size" value="9pt"/>
</number>
<heading>
<style name="font-size" value="10pt"/>
<style name="font-weight" value="bold"/>
<style name="color" value="#000066"/>
<style name="border-bottom" value="1px solid #999999"/>
</heading>
</stylesheet>
</kbarticle>
That template XML is what I tried to assign to the new article, but it doesn't look right when the article is viewed: the template content appears alongside the content I added, so it is basically duplicated.
I also saw that the template has a FormatXML attribute, which contains XSL to transform the XML. I tried using it, but it produces HTML output that isn't what I want either. I'm struggling with how to get from the template to the ArticleXML that I need in order to create the new KB article. Any help with this is much appreciated!
I am fairly new to augmented reality implementations, and have been handed a project which used Junaio sometime back in 2012. I need to re-implement everything from scratch, and I think I've been fairly successful so far: I can get my POIs to show up on a map and in the list. However, the 3D models and the thumbnail/icon images for those POIs do not show up in the app when I load my channel. Here is an example of the XML being used for one of the POIs. Is there something I'm missing?
<object id="3">
<title>
<![CDATA[ Post Office by Esri ]]>
</title>
<thumbnail>
<![CDATA[
<path_to_image>/USPS_logo.jpg
]]>
</thumbnail>
<icon>
<![CDATA[
<path_to_image>/USPS_logo.jpg
]]>
</icon>
<location>
<lat>34.057962</lat>
<lon>-117.194614</lon>
<alt>0</alt>
</location>
<popup>
<description>
<![CDATA[ USPS ]]>
</description>
<buttons>
<button id="url" name="Website">
<![CDATA[ http://www.usps.com/ ]]>
</button>
</buttons>
</popup>
<assets3d>
<model>
<![CDATA[
<path_to_model_zip>/Redlands_NewYorkSt_PO.zip
]]>
</model>
<transform>
<translation>
<x>0</x>
<y>0</y>
<z>0</z>
</translation>
<rotation type="eulerdeg">
<x>0</x>
<y>0</y>
<z>0</z>
</rotation>
<scale>
<x>1000</x>
<y>1000</y>
<z>1000</z>
</scale>
</transform>
</assets3d>
</object>
The best place to find support for that is the metaio helpdesk. Maybe this helps:
http://helpdesk.metaio.com/questions/17632/gps-tracking-scale-over-distance
I'm using Rails and the Nokogiri parser. My XML is as below and I'm trying to get the 'Biology: 08:00' text into my view.
<rss version="2.0">
<channel>
<item>
<title>Biology: 08:00</title>
<description>Start time of Biology</description>
<pubDate>Tue, 13 Oct 2009 UT</pubDate>
</item>
</channel>
</rss>
I can find the node with the text 'biology' using the code below
@content = doc.xpath('//title[contains(text(),"Biology")]')
When I move it into my view it strangely ends up as the title of my .html.erb page. I can't seem to get it into the body with
<body>
<%= @content %>
</body>
Does anyone know what's going on?
You're getting the whole node, and the node is a <title> tag.
You want:
@content = doc.xpath('//title[contains(text(),"Biology")]/text()')
to get the text content of the node.
I'm migrating my app to Delphi 2009. My app must still use a lot of AnsiString. During migration, I find myself always converting:
abc := def;
into:
abc := string(def);
or
abc := TDeviceAnsiString(def);
I know I should be able to do this with templates, but I find templates, although powerful, are not so easy to get working. Here's what I've been trying:
<?xml version="1.0" encoding="utf-8" ?>
<codetemplate xmlns="http://schemas.borland.com/Delphi/2005/codetemplates"
version="1.0.0">
<template name="das" invoke="auto">
<point name="expr">
<script language="Delphi">
InvokeCodeCompletion;
</script>
<hint>
MP: TDeviceAnsiString
</hint>
<text>
True
</text>
</point>
<description>
MP: TDeviceAnsiString
</description>
<author>
Mike
</author>
<code language="Delphi" context="any" delimiter="|"><![CDATA[TDeviceAnsiString(|selected|)|end|]]>
</code>
</template>
</codetemplate>
It doesn't appear in the Surround menu and it doesn't activate when I want it to. I'd like to be able to type
abc := das[tab]def;
or select the "def" and use "surround" to get:
abc := TDeviceAnsiString(def);
Thank you for your help!
This should do it:
<?xml version="1.0" encoding="utf-8" ?>
<codetemplate xmlns="http://schemas.borland.com/Delphi/2005/codetemplates"
version="1.0.0">
<template name="das" surround="true" invoke="auto">
<description>
MP: TDeviceAnsiString
</description>
<author>
Mike rev François
</author>
<code language="Delphi" delimiter="|"><![CDATA[TDeviceAnsiString(|end||selected|)]]>
</code>
</template>
</codetemplate>
Added: You can find more info on the Delphi Wiki with the LiveTemplates Technical Infos