I'm using the feedzirra gem (https://github.com/pauldix/feedzirra) to parse and display RSS data, but I'm having trouble displaying different feeds separately. The code in my view displays FeedEntry.all, whereas I'd like to display several different feeds under different headers.
I'm having trouble wrapping my head around how to do this. Put another way: how can you separate the feeds? Is there any way to separate them by title and display them under that title?
E.g., if I were to pull data from YouTube's RSS feed (hypothetically) and Spotify's as well, how could I display YouTube's entries under a YouTube header, and likewise for Spotify?
Here's the gist of it: https://gist.github.com/anonymous/5801535
Have you looked at backending your feedzirra output with a tool like data_active to convert it into an ActiveRecord structure? I am just beginning to build a similar process so I don't yet have code to show. I recognize there are some duplications here so it may be that feedzirra does the fetch and data_active does the parse.
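For the grouping the question asks about, one possible approach is to group entries by the feed they came from and render one header per group. This is only a sketch: it assumes each stored entry records its source feed in a `feed_title` field, which is a hypothetical column that the FeedEntry model in the gist may not have.

```ruby
# Hypothetical stand-in for a FeedEntry record; `feed_title` is an assumed
# column that stores the name of the feed each entry was fetched from.
Entry = Struct.new(:feed_title, :title, :url)

# Group entries by their source feed so each group can be rendered
# under its own header.
def entries_by_feed(entries)
  entries.group_by(&:feed_title)
end

entries = [
  Entry.new("YouTube", "Video A", "http://example.com/a"),
  Entry.new("Spotify", "Track B", "http://example.com/b"),
  Entry.new("YouTube", "Video C", "http://example.com/c")
]

# Print one header per feed, followed by that feed's entry titles.
entries_by_feed(entries).each do |feed_title, feed_entries|
  puts feed_title
  feed_entries.each { |entry| puts "  #{entry.title}" }
end
```

In a Rails view the same hash could be iterated with one heading per key and a list of entries beneath it.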
I tried using Feedjira to assist with content analysis of news feeds, but it appears that RSS feeds now only link to content rather than including it, as I found out in "Feedjira not adding content and author". I plan to use Feedjira to get the URL for the article, and then use Nokogiri to scrape the article and pick out the relevant parts.
The problem is that each media outlet formats its pages differently. I need to know the best way for Nokogiri to take the URL from the database (supplied by Feedjira) and, depending on the associated feed title (also in the database, from the Feedjira sync), scrape the page in a specific way and save the result to a separate table in the database. Does anyone have any suggestions?
I don't know your specific use case, but I'm also doing content analysis using news feeds.
You might take a look at Readability, which provides a generic content scraper.
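If a generic scraper is not enough, one way to handle per-outlet differences is a lookup table keyed by feed title. The sketch below is an assumption-laden illustration: the feed titles, the CSS selectors and the helper names are all made up, and the document object is only assumed to behave like a Nokogiri document (respond to `css` and return nodes that respond to `text`).

```ruby
# Hypothetical mapping from feed title to the CSS selector that wraps the
# article body on that outlet's pages. The titles and selectors are made up.
ARTICLE_SELECTORS = {
  "Example Times" => "div.article-body p",
  "Sample Post"   => "section#story p"
}.freeze

# Fallback for feeds without a hand-written rule (also an assumption).
DEFAULT_SELECTOR = "article p"

# Pick the selector for a feed, falling back to the default.
def selector_for(feed_title)
  ARTICLE_SELECTORS.fetch(feed_title, DEFAULT_SELECTOR)
end

# `doc` is expected to respond to `css(selector)` and return nodes that
# respond to `text`, as a Nokogiri::HTML document does.
def extract_article_text(doc, feed_title)
  doc.css(selector_for(feed_title)).map(&:text).join("\n")
end
```

With Nokogiri loaded, a call could look like extract_article_text(Nokogiri::HTML(html), feed_title), and the returned text could then be saved to the separate table.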
The problem you've encountered is that every feed generator does it a bit differently, just as with HTML generators. You can assume certain fields are going to be in place in an RDF, RSS or ATOM feed, however the author of the feed could use optional tags that you could find very useful, so you have to write code to look for them.
I wrote several feed aggregators in the past, including one that handled well over 1000 feeds daily. By sniffing out the feed type (ATOM vs. RSS vs. RDF), I could make sensible checks for the fields that were interesting in that format and extract the data if it was available.
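A rough sketch of what sniffing the feed type could look like, assuming it is enough to inspect the name of the root element. This is not the aggregator code described above; it is a regex-based guess rather than a full XML parse, and the function name is hypothetical.

```ruby
# Return :atom, :rss, :rdf or :unknown based on the document's root element.
# This is a rough regex-based sniff, not a full XML parse.
def sniff_feed_type(xml)
  # Drop comments so that markup inside them is not mistaken for the root.
  without_comments = xml.gsub(/<!--.*?-->/m, "")
  # First tag that is not a declaration (<?...), doctype (<!...) or close tag.
  root = without_comments[/<(?![?!\/])([A-Za-z_][\w:.-]*)/, 1]
  return :unknown if root.nil?

  # Ignore any namespace prefix, e.g. "rdf:RDF" becomes "rdf".
  case root.split(":").last.downcase
  when "feed" then :atom
  when "rss"  then :rss
  when "rdf"  then :rdf
  else :unknown
  end
end
```

Once the type is known, format-specific code can look for the optional tags that matter for that format.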
Pre-canned parsers get it wrong too often, either grabbing data you don't want and making a mess of the output, or skipping data you do want and leaving gaps in the output, so be prepared to write code if you want it done correctly.
You'll probably want to take advantage of a backing database too, to keep track of what you looked at last and when you're supposed to look at it again; that's part of being a good network citizen. You'll also want to keep track of whether a feed was down the last n times you looked so you can trim out dead sites.
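The bookkeeping for dead feeds could be sketched as follows. An in-memory Struct stands in for a database row here; the field names and the threshold of five failures are assumptions chosen for illustration.

```ruby
# In-memory stand-in for a database row that tracks one feed's health.
# All field names and the default threshold are assumptions.
FeedStatus = Struct.new(:url, :last_checked_at, :consecutive_failures) do
  # A successful fetch resets the failure streak.
  def record_success(now)
    self.last_checked_at = now
    self.consecutive_failures = 0
  end

  # A failed fetch extends the failure streak.
  def record_failure(now)
    self.last_checked_at = now
    self.consecutive_failures += 1
  end

  # A feed that failed the last `limit` checks in a row is a trim candidate.
  def dead?(limit = 5)
    consecutive_failures >= limit
  end
end
```

A periodic cleanup job could then drop or disable every feed for which dead? returns true.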
I have been programming with Ruby and Rails for the last 6 months, and I am fairly comfortable with it now, but I have never had to deal with RSS feeds in a Rails app before. For my current project, I have a display page which shows different market indices with their open, high, low and volume values. Now I would like to bring in RSS feeds from Yahoo Finance and show them on the same page. I want to fetch RSS feeds from this link, http://finance.yahoo.com/rssindex, and then show them on the display page in my Rails app. While searching for a solution to this problem, I came across this: Generating RSS feed in Rails 3
But I think that is for generating RSS feeds for one's own app, and I believe my need is different. I want to fetch RSS feeds and show them on my display page. For fetching RSS feeds I found a Ruby gem called feedzirra, which seems to be great. However, I am still a bit confused about how I would actually display the news feeds in my Rails app. Do I need to create a model, view and controller for that? Do I need to save the news feeds in my database in order to display them? Or is there a way of just fetching them with feedzirra and displaying them on my display page? I am just not sure how to move forward from here. What is the best Rails way of solving this problem?
Ryan Bates has a railscast on one way to handle the problem:
http://railscasts.com/episodes/168-feed-parsing
You should definitely have a model for this data. The goal should be to get the model handling the fetching and parsing of data.
If you aren't interested in saving information to your own database (as is shown in the railscast episode), beware that you will incur a bit of a performance penalty for making a request to Yahoo on every page load.
Here's a simple example of a method you could add to your model.
def self.lookup_stock_titles(stock_name)
  # Fetch the headline feed for the given ticker and collect each entry's title.
  url = "http://feeds.finance.yahoo.com/rss/2.0/headline?s=#{stock_name}&region=US&lang=en-US"
  Feedzirra::Feed.fetch_and_parse(url).entries.map(&:title)
end
This would give you an array of titles from the rss feed which you could integrate into a controller or view however you like.
<% ModelName.lookup_stock_titles("ABC").each do |title| %>
<h2><%= title %></h2>
<% end %>
Also make sure that Yahoo allows you to use their data the way you want: saving it to a database, the number of requests per hour, what kind of credit you need to give them, etc.
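If you skip the database, a small time-based cache is one way to soften the per-page-load penalty mentioned above. The class below is a hypothetical minimal sketch, not part of feedzirra or Rails, and it only caches within a single process.

```ruby
# Minimal in-process cache: the block is only run when the stored value
# is missing or older than ttl_seconds.
class TimedCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}
  end

  # Return the cached value for key if it is still fresh; otherwise call
  # the block, remember its result together with the time, and return it.
  def fetch(key, now = Time.now)
    value, stored_at = @store[key]
    return value if stored_at && (now - stored_at) < @ttl

    fresh = yield
    @store[key] = [fresh, now]
    fresh
  end
end
```

The feed lookup would go inside the block passed to fetch, keyed by the stock name. In a Rails app, Rails.cache.fetch with an expires_in option plays the same role and is shared according to the configured cache store.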
I am working on a web application and want to use SimplePie for parsing a lot of RSS feeds.
What I want to know is whether SimplePie automatically removes all the dead links while parsing RSS feeds, so that the final output of the RSS feed doesn't contain any invalid links.
No, it does not. In order to do so, you'd have to send a HEAD request for every link in the feed. This is too expensive for SimplePie to do, so you'll need to find code to do that on your own. Try this example on SO.
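The HEAD-request idea can be sketched as below. Note that this thread is about SimplePie, which is a PHP library; the sketch is in Ruby purely to illustrate the technique, and the function names and five-second timeouts are assumptions.

```ruby
require "net/http"
require "uri"

# Send a HEAD request and treat 2xx and 3xx responses as "alive".
# Network errors and timeouts count as dead. Some servers reject HEAD
# even though GET works, so this can produce false negatives.
def link_alive?(url)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: 5, read_timeout: 5) do |http|
    response = http.head(uri.request_uri)
    response.code.to_i.between?(200, 399)
  end
rescue StandardError
  false
end

# Keep only the links the checker reports as alive. The checker is
# injectable so the filtering can be exercised without network access.
def live_links(urls, checker = method(:link_alive?))
  urls.select { |url| checker.call(url) }
end
```

Because every link costs one request, results are worth caching, and checks for a large feed are better run in a background job than during page rendering.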
I have over 400 feeds that are from YouTube channels that I want to combine into just one feed that can be filtered by date, so it only displays from x date to y date.
Example of what the feed looks like: http://gdata.youtube.com/feeds/api/users/google/uploads
I have already tried several web services, but none of them worked. I have tried Yahoo Pipes, but I'm not really sure how I would combine hundreds of feeds with it.
I need to do this because I have a WordPress plugin that can post content via feeds, but 400 feeds is too many and I only need the current content from them. Can anyone give me some suggestions on how to combine all these feeds and then filter them? Or possibly suggest an alternative?
Take a look at Yahoo Pipes - http://pipes.yahoo.com
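If you end up scripting it yourself instead, the combine-and-filter step is small once the feeds are parsed. The following is a sketch only: the entry shape and function name are hypothetical, and fetching and parsing the 400 feeds is left out.

```ruby
# Hypothetical entry shape; a real parser's entry objects would differ.
FeedItem = Struct.new(:title, :published)

# Flatten entries from many feeds, keep those published within
# [from, to] inclusive, and sort newest first.
def combine_and_filter(feeds, from, to)
  feeds.flatten
       .select { |item| item.published.between?(from, to) }
       .sort_by(&:published)
       .reverse
end
```

Here feeds is an array with one array of entries per source feed, and from and to are Time values; the result could then be written out as a single feed for the plugin to consume.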
I'm using an RSS library so I can parse Atom and RSS in Ruby and Rails and store the results in a model.
I've looked at the standard RSS library, but is there a library that will auto-detect that there is a new RSS feed so I can update my database?
What is the best practice for triggering the code that stores the new RSS feed?
Should I use threads to handle that problem? Is it going to be slow?
Thank you for your help.
OK, here's the deal.
If you want a really fast feed parser, go for Feedzirra. It does not work on Windows. http://github.com/pauldix/feedzirra
Autodiscovery?
-There's truffle-hog if you don't want to do GET redirects. http://github.com/pauldix/truffle-hog
-There's feedbag if you want to do GET redirects to find feeds from given URLs. This is slower though. http://github.com/damog/feedbag
Feedzirra is the best bet if you want to poll for new entries in your feed. If you want a non-polling solution to your problem, I would suggest going through the pubsubhubbub spec. While parsing your feeds, check whether they are pubsubhubbub enabled by looking for the link tag. If it points to pubsubhubbub.appspot.com or any other pubsub-enabled hub, subscribe to the feed by sending a subscription request to the hub. You can then define an endpoint in your app which will receive updated entry pings for your feed subscription from the hub. Just read the raw POST data and store it in your database. Stats are that 95% of Blogger blogs are pubsub enabled. That is a lot of data in your hands already. :)
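As a rough illustration of the subscription step, the sketch below builds the form-encoded body of a subscription request and shows the challenge echo a callback performs. The parameter names hub.mode, hub.topic, hub.callback and hub.challenge come from the PubSubHubbub spec; the helper names are hypothetical, and older versions of the spec require further parameters such as hub.verify, so check the version your hub implements.

```ruby
require "uri"

# Build the form-encoded body of a subscription request.
# The parameter names come from the PubSubHubbub spec; the helper name
# and any URLs passed in are the caller's own.
def subscription_body(topic_url, callback_url)
  URI.encode_www_form(
    "hub.mode"     => "subscribe",
    "hub.topic"    => topic_url,
    "hub.callback" => callback_url
  )
end

# The hub verifies a subscription by sending a GET with a hub.challenge
# parameter to the callback; the callback confirms by echoing it back
# as the response body.
def verification_response(params)
  params["hub.challenge"]
end
```

The body would be sent to the hub URL found in the feed's link tag with an HTTP POST, for example through Net::HTTP.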
If you are polling for changes, check the Last-Modified or ETag response header rather than parsing the entire feed again. That saves you from wasting resources. Feedzirra takes care of this for you.
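For reference, doing the same check by hand amounts to a conditional GET. The sketch below uses Ruby's Net::HTTP and is not Feedzirra's implementation; the helper names are hypothetical, and the etag and last_modified values are assumed to have been saved from the previous response.

```ruby
require "net/http"
require "uri"

# Build a GET request that asks the server to reply 304 Not Modified
# if the feed is unchanged since the values saved from the last fetch.
def conditional_get(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI.parse(url))
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified if last_modified
  request
end

# True when the response says the cached copy is still current.
def unchanged?(response)
  response.is_a?(Net::HTTPNotModified)
end
```

When unchanged? is true the body is empty and there is nothing to parse; otherwise the new ETag and Last-Modified values should be stored for the next poll.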
I am not sure what you mean by "auto-detect" a new feed.
Are you looking for code that can discover when someone creates a new feed on a site? Or, do you mean discover when an existing feed has a new article?
The first is tough because your code needs to know what site to look at, so it needs some sort of auto-discovery of sites with new feeds. Searching Google for "new rss feeds" doesn't return anything that looks useful, at least not on the first page. If you, or your users, know of a new site, then you can have an interface to add new sites to search. Then you grab the page at that URL, look for the RSS/Atom auto-discovery links, and go from there. Auto-discovery links can open a can of worms because duplicate content may be served in different formats (RDF, RSS and Atom), so you have to determine which to use, or there may be multiple feeds with alternate content listed.
If you mean you want to discover when an existing feed has new articles, then you have to keep track of the last time your code looked at the feed, and the last article that was seen, then retrieve the feed and see if any articles were not in your list of previously seen articles. Your code needs to be sensitive to the time-to-live information in a lot of feeds too. Hitting the feed every fifteen minutes when they update once a week is bad form. Most aggregation code can do those things already but you might need to configure a database and tell the code how to find it.
Generally, for this sort of task I set up a crontab entry on a production Linux or Unix system and fire off the job periodically, looking in the database for feeds whose last-run-time plus the stored time-to-live value is in the past.
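The selection the periodic job performs can be sketched like this, with an in-memory Struct standing in for the database rows. The field names are assumptions; in a real setup the same condition would be expressed as a database query.

```ruby
# Hypothetical row shape: last_run_at is a Time (or nil if never fetched),
# ttl_seconds is the feed's advertised or assumed refresh interval.
FeedRow = Struct.new(:url, :last_run_at, :ttl_seconds)

# Select feeds whose last run time plus time-to-live is in the past,
# along with feeds that have never been fetched.
def due_feeds(feeds, now = Time.now)
  feeds.select do |feed|
    feed.last_run_at.nil? || feed.last_run_at + feed.ttl_seconds <= now
  end
end
```

The crontab entry can then run as often as the shortest interval you care about, since feeds that are not yet due are simply skipped.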
Does that help any?
A very easy solution is to use dynamic attribute-based finders.
When you are filling your model with RSS feed data, instead of Model.create(...) use Model.find_or_create_by_column(value, :other_column => other_value).
You can use a date or the RSS message title as the unique value (whatever you want).
I think this is pretty easy. You can set up a cron task to fill your model once per hour, for example. Only new entries will be added.
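The effect of find-or-create can be illustrated without a database. The class below is a hypothetical in-memory stand-in, not ActiveRecord, and the guid key is an assumed unique value. Note that in Rails 4 and later the dynamic find_or_create_by_column form was replaced by find_or_create_by(column: value).

```ruby
# In-memory stand-in for a table with a unique column; with ActiveRecord
# the same effect comes from a find-or-create call on that column.
class EntryStore
  def initialize
    @rows = {}
  end

  # Return the existing row for guid, or store and return a new one.
  # Attributes passed for an already known guid are ignored.
  def find_or_create(guid, attributes = {})
    @rows[guid] ||= attributes.merge(guid: guid)
  end

  def size
    @rows.size
  end
end
```

Running the import repeatedly over the same feed then leaves one row per unique value, which is why an hourly cron task only adds what is new.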
There is no way to get an "event" when an RSS feed is updated without downloading the whole feed again.