iOS RSSFeed, How to fetch a feed automatically from a website - ios

I am working on a news-based application in which I want to fetch a site's feed dynamically, given just the website's name.
For example, if I want to fetch the feed from CNN.com or BBCNEWS.com, I should only have to type the website name, like "BBC.com", into the textbox, instead of its full RSS URL such as
http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml.
I know how to fetch a feed from a static link, but I want to do it dynamically.
I have searched a lot regarding this but haven't found an answer. I have seen this done in the Feedly application.
So, if anybody knows how, please help me with this issue.

RSS comes with a mechanism called Auto-Discovery which links RSS feeds to an HTML page.
It relies on the use of a <link> element in the <head> section of any HTML page.
The <link> tag includes 4 important attributes:
rel should include alternate, which tells the application that the linked document contains an alternate view of the current document/page. You can also use the feed value, though in our experience this is much less frequent. Using both is probably a safe bet.
type indicates the MIME type of this alternate representation. RSS uses application/rss+xml while Atom uses application/atom+xml.
title is a human-readable description of the document. It's good to re-use the page's title. Do not add "RSS", as it's meaningless for people :)
href is the most important attribute: it's the URL (relative or absolute) of the feed.
Here’s, for example, the discovery for this page's very RSS feed:
<link rel="alternate" type="application/atom+xml" title="Feed for question 'iOS RSSFeed, How to fech feed automatic from website'" href="/feeds/question/32946522">
It's a great example!

In the HTML of the site, you'll find a snippet like this:
<link rel='alternate' type='application/rss+xml' title='RSS' href='http://feeds.feedburner.com/martini'>
That's where the RSS URL comes from.
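Putting the two answers together: fetch the page's HTML and collect every <link rel="alternate"> whose type is a feed MIME type. Here is a minimal Python sketch of that discovery step; it parses a hard-coded HTML snippet instead of performing a real download, and the BBC-style base URL and feed path are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedDiscoveryParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate"> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        # rel may carry "alternate", "feed", or both
        if ("alternate" in rels or "feed" in rels) and a.get("type", "").lower() in FEED_TYPES:
            # href can be relative, so resolve it against the page URL
            self.feeds.append(urljoin(self.base_url, a.get("href", "")))

page = """<html><head>
<link rel="alternate" type="application/rss+xml" title="Front Page" href="/feeds/front_page/rss.xml">
</head><body></body></html>"""

parser = FeedDiscoveryParser("http://newsrss.bbc.co.uk/")
parser.feed(page)
print(parser.feeds)  # → ['http://newsrss.bbc.co.uk/feeds/front_page/rss.xml']
```

In a real application you would download the page the user typed ("BBC.com") first, then feed that HTML to the parser; some sites advertise several feeds, so showing the user the list (with each link's title) is a reasonable UI.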


Google SDTT appending "#__sid=md3" to URL for mainEntityOfPage

Why is this happening?
HTML shows:
<meta content='http://www.costumingdiary.com/2015/05/freddie-mercury-robe-francaise.html' itemprop='mainEntityOfPage' itemscope='itemscope'/>
Structured Data Testing Tool output shows:
http://www.costumingdiary.com/2015/05/freddie-mercury-robe-francaise.html#__sid=md3
Update: It looks like it has to do with my breadcrumb list. But still, why is it happening, and is it wrong?
If the URL you want to provide is unique, you can use the itemid attribute.
I was confronted with mainEntityOfPage by the tool after its latest update, and following Google's example I used the following code:
<meta itemscope itemprop="mainEntityOfPage" itemType="https://schema.org/WebPage" itemid="https://blog.hompus.nl/2015/12/04/json-on-a-diet-how-to-shrink-your-dtos-part-2-skip-empty-collections/" />
And this shows up correctly in the Structured Data Testing Tool results for my blog.
I don’t know where the fragment #__sid=md3 is coming from, but as the SDTT had some quirks with BreadcrumbList in the past, it might also be a side effect of this.
But note that if you want to provide a URL as value for the mainEntityOfPage property, you must use a link element instead of a meta element:
<link itemprop="mainEntityOfPage" href="http://www.costumingdiary.com/2015/05/freddie-mercury-robe-francaise.html" />
(See examples for Microdata markup that creates an item value, instead of a URL value, for mainEntityOfPage.)

Inconsistent results trying to parse og:image tag from a webpage manually and programmatically

I first manually browse to the below URL:
Mounting injuries won't stop Germany's path to World Cup
Then if view the page source and look for og:image meta tags I find the following:
<meta property="og:image" content="http://l.yimg.com/bt/api/res/1.2/JjwtkhIEdT9nKxLp8p0LFQ--/YXBwaWQ9eW5ld3M7cT04NTt3PTYwMA--/http://media.zenfs.com/en_us/News/Reuters/2013-10-08T122032Z_1_CBRE9970YAZ00_RTROPTP_2_SOCCER-WORLD.JPG"/>
However, if I try to parse the same url programmatically, I get a generic Yahoo stock icon. Here is the code that I am using:
string url = "http://sports.yahoo.com/news/mounting-injuries-wont-stop-germanys-path-world-cup-122032650--sow.html";
WebClient wc = new WebClient();
var doc = new HtmlAgilityPack.HtmlDocument();
string newsPageSource = wc.DownloadString(url);
doc.LoadHtml(newsPageSource);
...
(I have removed the rest for brevity.)
If I debug here and inspect the newsPageSource string that contains the content of the target web page and look for og:image tag, its contents are different:
<meta property="og:image" content="http://l.yimg.com/bt/api/res/1.2/81I5U991YW6EEaB2Cjd58g--/YXBwaWQ9eW5ld3M7cT04NTt3PTYwMA--/http://l.yimg.com/os/mit/media/m/social/images/social_default_logo-1481777.png"/>
So I'm not sure what is going on here. I guess that when browsing manually, the original URL redirects to some other internal URL, while the code just grabs the first "snapshot" of the page source, without waiting any longer or following redirects. Can anyone shed light here? Or better yet, how would I extract the real image (2013-10-08T122032Z_1_CBRE9970YAZ00_RTROPTP_2_SOCCER-WORLD.JPG) in this case instead of getting a Yahoo stock icon (social_default_logo-1481777.png)?
Somehow Facebook and Google+ are smart enough to extract the correct image when I paste the same link.
Thanks,
Archil
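One plausible explanation (an assumption, not something confirmed by Yahoo) is that the server varies its markup based on request headers such as User-Agent, so a bare WebClient request gets a generic placeholder page. Here is a sketch in Python of the two pieces involved: sending a browser-like User-Agent, and scraping og:image out of the returned HTML. The network call itself is commented out, and a hard-coded snippet stands in for the response.

```python
from html.parser import HTMLParser
from urllib.request import Request

class OgImageParser(HTMLParser):
    """Collects the content of every <meta property="og:image"> tag."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:image":
            self.images.append(a.get("content"))

# Sending a browser-like User-Agent often changes what a server returns;
# the header value below is just an example string.
req = Request(
    "http://sports.yahoo.com/news/mounting-injuries-wont-stop-germanys-"
    "path-world-cup-122032650--sow.html",
    headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
)
# page = urllib.request.urlopen(req).read().decode("utf-8")  # real fetch, skipped here

# Hard-coded stand-in for the HTML the server would return to a browser:
page = ('<meta property="og:image" content="http://media.zenfs.com/en_us/News/'
        'Reuters/2013-10-08T122032Z_1_CBRE9970YAZ00_RTROPTP_2_SOCCER-WORLD.JPG"/>')
parser = OgImageParser()
parser.feed(page)
print(parser.images[0])
```

In the C# code above, the equivalent would be setting `wc.Headers["User-Agent"]` before calling `DownloadString`. If the page builds its meta tags with JavaScript instead, no plain HTTP fetch will see them, and you would need a headless browser.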

Print web page with original look

I want to achieve print functionality such that the user can print the web form and use it as a paper form for the same purpose. Of course I do not need all of the web page's header and footer to be printed, just the content of a div which takes up most of the page. I played around with print media CSS and managed to make the print result look almost like the original page. But then I tried to print it in another browser (Chrome) and it is all messed up (before that I tried Mozilla).
For the web form I use the CSS framework Twitter Bootstrap, and I had to override its CSS (in the print media) for almost every element individually to get the print result looking normal.
My question is: is there some way (framework/plugin) to print just what you see on the page, maybe as an image or something?
Any other suggestions are welcome.
Thanks.
If you are familiar with PHP, you can try the PHP class files of TCPDF or those of FPDF.
There is also dompdf, which renders HTML to PDF, but this will include more than just the information of one div.
And for further info, here is a post on Stack Overflow where users discuss which one they think is best.

Understanding og:url

I am working through the Facebook tutorial for iOS and am having trouble when I get to the final part, Publish Open Graph Story. I have gone through and set everything up as best I understand. When I try to test using the Object Debugger I get "Missing Required Property: The 'og:url' property is required, but not present." Can someone help me and explain this tag and how it should be set?
Thanks for the help.
Have a look at ogp.me; they define og:url as:
og:url - The canonical URL of your object that will be used as its permanent ID in the graph, e.g., "http://www.imdb.com/title/tt0117500/".
Basically, as Jeff Sherlock of Facebook explains in this post: https://stackoverflow.com/a/7831012/228741
when you give the URL of your action (the one containing the meta tags), Facebook ignores everything else on that page (doesn't render it). Instead, it renders whatever you have given in og:url.
What I usually do is have og:url point to the same page, with the same parameters, so Facebook renders the same page for me. If you want Facebook to render some other page, you give that link in og:url.
This is set as a meta tag in the <head> section.
Example :
<meta property="og:url" content="your url">
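Since og:url should point back at the same page (parameters included), it can help to build the tag from the page's own URL rather than typing it by hand. A small Python sketch of that idea; the example.com URL and query parameters are hypothetical.

```python
from urllib.parse import urlencode

def og_url_meta(page_url, params=None):
    """Render an og:url meta tag that points back at the page itself,
    so Facebook scrapes the same page the visitor saw."""
    url = page_url + ("?" + urlencode(params) if params else "")
    return '<meta property="og:url" content="%s">' % url

# Hypothetical page URL and query parameters:
tag = og_url_meta("http://example.com/recipe.php", {"id": "42"})
print(tag)  # → <meta property="og:url" content="http://example.com/recipe.php?id=42">
```

Whatever server-side language you use, the point is the same: emit the canonical URL of the current object, and the debugger error goes away once the tag is present in the page's <head>.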

How to manipulate DOM with Ruby on Rails

As the title says, I have some DOM manipulation tasks. For example, I want to:
- find all H1 element which have blue color.
- find all text which have size 12px.
- etc..
How can I do it with Rails?
Thank you.. :)
Update
I have been doing some research on extracting web page content, based on this paper: http://www.springerlink.com/index/A65708XMUR9KN9EA.pdf
In summary, the steps are:
get the web url which I want to be extracted (single web page)
grab some elements from the web page based on some visual rules (Ex: grab all H1 which have blue color)
process the elements with my algorithm
save the result into my database.
-sorry for my bad english-
If what you're trying to do is manipulate HTML documents inside a rails application, you should take a look at Nokogiri.
It uses XPath (or CSS selectors) to search through the document. With the following, you would find every link inside an h1 with the "blue" CSS class:
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(URI.open('http://www.stackoverflow.com'))
doc.xpath('//h1[@class="blue"]/a').each do |link|
  puts link.content
end
If, on the other hand, what you are trying to do is manipulate the DOM of the current page in the browser, you should take a look at JavaScript and jQuery. Rails can't do that.
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
To reliably sort out what color an arbitrary element on a webpage is, you would need to reverse engineer a browser (to accurately take into account stylesheets, markup hacks, broken tags, images, etc).
A far easier approach would be to embed an existing browser engine such as Gecko into a custom application of your own making.
As your spider browses pages, it would pass them to your embedded instance of Gecko, where you could use getComputedStyle to pull out what color an individual element happens to be.
You originally mentioned wanting to use Ruby on Rails for this project; Rails is a framework for writing presentational web applications and is really a bad fit for a project like this.
As a starting point, I'd recommend you check out RubyGnome, and in particular RubyGnome's Gtk::MozEmbed functionality.
