Advanced Feed Parsing in Rails - ruby-on-rails

I am new to Rails and have been watching the RailsCasts videos.
I would like to know more about Feedzirra (RailsCasts episode 168), and feed parsing in particular.
For example, I need to parse feeds from the Telegraph and the Guardian.
I want to put all the sports news from both newspapers in one table, just the football news in another table, the cricket news in another, and so on.
How can I achieve that using Feedzirra?
How do I display only football news in one view and only cricket news in another view?
Also, I want users to know which website they are going to visit before they click the link.
Something like this:
Ryder Cup 2010: Graeme McDowell the perfect hero for Europe
5 min ago | Telegraph.co.uk
How do I display Telegraph.co.uk?
Looking forward to your help and support.
Thanks

There are many questions there, but I'll take this one:
"I just know how to put all feeds in table. I don't know how to keep feeds in different tables"
Create different models to suit your data model, based on what information you need to show rather than what is provided in the feed. (Use a separate table for each model if required, or Single Table Inheritance if possible.)
Write a wrapper class that will use FeedZirra (or any other parser for that matter) to read the parsed feeds and process them. These are generally kept in the lib folder.
Create a rake task which can be called to run this script OR if you are familiar with delayed_job, then create a job.
Schedule your rake task through cron or your job through delayed_job, so that you can periodically update your data.
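As a rough illustration of the wrapper-class step, here is a minimal sketch in plain Ruby. It assumes each parsed entry exposes a title, a URL and a published time, and that the sport can be guessed from the URL path. The `Entry` struct, the `SportsFeedSorter` class and the path fragments are all hypothetical names made up for this example, not part of Feedzirra. The `source_for` method also shows one way to get the "Telegraph.co.uk" label from the link itself.

```ruby
require "uri"

# Hypothetical stand-in for a parsed feed entry, used only for illustration.
Entry = Struct.new(:title, :url, :published)

# Hypothetical wrapper: sorts entries into per-sport buckets by URL path.
class SportsFeedSorter
  # Assumed mapping from a URL path fragment to a category name.
  CATEGORIES = { "/football/" => :football, "/cricket/" => :cricket }.freeze

  # Returns the category for an entry, or :other_sport when nothing matches.
  def category_for(entry)
    path = URI.parse(entry.url).path
    match = CATEGORIES.find { |fragment, _name| path.include?(fragment) }
    match ? match.last : :other_sport
  end

  # Returns the host to show next to the headline, without a leading "www.".
  def source_for(entry)
    URI.parse(entry.url).host.sub(/\Awww\./, "")
  end

  # Groups a list of entries by category.
  def sort(entries)
    entries.group_by { |entry| category_for(entry) }
  end
end
```

In a Rails app, each bucket returned by `sort` could then be saved through its own model (or one model with a category column), and a rake task could call the sorter after fetching the feeds.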

Related

Adding Feeds to Feedjira via Front-End

I'm looking to add feeds via the front end to be fetched by Feedjira. I've followed the basic Railscast tutorial and have one feed set up and working. What I can't wrap my head around is how to add a feed to Feedjira via the front end (e.g. a text box) and then let the cron job pick it up and parse it.
I have a user model and a posts model, with a default Feedjira set-up. I'm guessing I need a 'feeds' table linked to the user model that stores the URLs of the user's desired RSS feeds and then passes them to Feedjira to be parsed?
I'm pretty new to Rails/Ruby and would love some help or guidance on this.
Thanks
Once you have the URLs stored in a table, or in a field of your User model, you can parse them in your rake task.
In the task, iterate over the users, or the feeds, and do inside the loop what you already did with one feed.
Finally, let your cron job run the task at a regular interval.
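The loop described above might look roughly like the following sketch. It is plain Ruby, so it makes no claims about the Feedjira API: `fetcher` is an assumed callable that takes a URL and returns the parsed entries, and `FeedRefresher` is a hypothetical name. In a real app the URLs would come from the feeds table and the fetcher would wrap whatever call you already use for your single feed.

```ruby
# Hypothetical refresher: `fetcher` is any callable that takes a URL and
# returns a list of entries (in a real app it would wrap the parser call).
class FeedRefresher
  def initialize(fetcher)
    @fetcher = fetcher
  end

  # Fetches each URL in turn. A failing feed is recorded and skipped so that
  # one bad URL does not abort the whole run.
  def refresh(urls)
    results = { entries: {}, failed: [] }
    urls.each do |url|
      begin
        results[:entries][url] = @fetcher.call(url)
      rescue StandardError => e
        results[:failed] << [url, e.message]
      end
    end
    results
  end
end
```

Rescuing per feed is a design choice for user-supplied URLs, since some of them will inevitably be mistyped or go offline.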

What's the most efficient way to create an alert queue for a model with hundreds of millions of entries?

I am currently working on an application in Rails (though language/framework shouldn't matter for this question since it is more of a theoretical one). I'm working on wrapping my head around this problem:
Say I am tracking millions of blogs online and am plugged into their RSS feeds. My app pings these feeds every few minutes to see if there has been any new activity across any of these millions of blogs. If there is any new activity, I want to alert the users of my application who have signed up to receive alerts for those specific blogs.
Does it make sense to have a user_blog_alerts table (where a user can specify custom keywords to be alerted about) and continuously check this table against every new entry that comes in from my feed? And when there is a match, to add them to a queue (using Redis)?
What is the best, most efficient way to build and model this alerting system? Am I even thinking about this in the right way? Are there any good examples or tutorials on this when working with such large amounts of data?
I'm not sure what the right way to do this is, but the thought of continuously scanning a table over and over sounds exhausting (i.e. unscalable).
Off the top of my head: what if you created a LIST for every blog in Redis? The values would be the user IDs of those who want an alert. The key name would contain the blog ID (e.g. "user_blog_alerts:12345").
Then, when you get a new post for blog 12345, it's a simple lookup to see if that key exists. If it does, fire off alerts for each user in the list.
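To make the lookup concrete, here is a small sketch. So that it runs without a Redis server, it uses a hypothetical in-memory `FakeListStore` that imitates three list operations (RPUSH, LRANGE, EXISTS); a real client object could be passed in instead, assuming it responds to methods with the same names. `BlogAlerts` is likewise a made-up name.

```ruby
# In-memory stand-in for the Redis operations used below (RPUSH, LRANGE,
# EXISTS). It exists only so the sketch is runnable on its own.
class FakeListStore
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # Appends a value to the list at key and returns the new length.
  def rpush(key, value)
    @lists[key] << value
    @lists[key].size
  end

  # Returns the elements between start and stop (negative stop counts from the end).
  def lrange(key, start, stop)
    @lists.key?(key) ? (@lists[key][start..stop] || []) : []
  end

  def exists?(key)
    @lists.key?(key)
  end
end

# Hypothetical alert registry: one list of user IDs per blog.
class BlogAlerts
  def initialize(store)
    @store = store
  end

  def key_for(blog_id)
    "user_blog_alerts:#{blog_id}"
  end

  def subscribe(user_id, blog_id)
    @store.rpush(key_for(blog_id), user_id)
  end

  # Returns the user IDs to alert for a new post; empty when nobody subscribed.
  def recipients_for(blog_id)
    key = key_for(blog_id)
    return [] unless @store.exists?(key)
    @store.lrange(key, 0, -1)
  end
end
```

Two things to keep in mind with a real Redis: values come back as strings, and a list allows the same user ID to be pushed twice, so a Redis set may be worth considering if duplicate subscriptions are possible.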

What is a good approach to allow users subscribe to keywords

I have a rails application using Rails 4, PostgreSQL and hosted on Heroku.
The application revolves around the following models: User and Article.
A user can create articles. An article contains a title, description, location (latitude, longitude) and an image.
I would like to add a notification system that works as follows:
A user can set-up a list of keywords that they wish to subscribe to.
The user gets a notification if an article containing one of their keywords is added (in the title, but perhaps in description in time).
What is the best approach to implement this in a scalable way?
In its simplest form, I could create a model called Keyword that stores what keywords a user wants to be notified for.
Then in the create action for article, check to see if the title (or description) contains any of the saved keywords.
This sounds good but will probably fall over once a reasonable number of users have been added.
Obviously, a background task would do the trick, but it still sounds wrong to run a basic string-contains check directly against the database.
Perhaps I could tokenize the title and description into an index and use a background process to handle the heavy lifting? I heard Postgres has some built-in text search. Could this work?
Could I use a Heroku add-on like Solr or Redis to handle all this or is it overkill? (Not having to pay for an add-on is an advantage).
Perhaps someone has a better implementation for the same functionality.
I know I can implement it quickly; I just want to be sure the implementation is up to scratch.
Thanks,
Brian
I have faced a similar problem. The slowest part is the case-insensitive search. What I would suggest is the following approach: let TID be the id of the row in which you store the title; then create a table which has one row for every word in your title, in lowercase, with the corresponding TID. Then all you need is a join between those words and the keywords of the given user. You can speed up this query with hash indexes.
In my case, none of the Postgres text functions were usable because they all performed poorly.
PS we implemented a full text search over about 60000 documents, so your case might be a bit different.
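The word-table idea can be sketched in memory to show the matching logic. This is not the SQL join itself, only an illustration of the same lookup: titles are split into lowercase words, and each word is checked against an index from keyword to subscribed user IDs. `KeywordIndex` and its methods are hypothetical names, and the tokenizer is deliberately simple (ASCII letters, digits and apostrophes only).

```ruby
require "set"

# Hypothetical in-memory version of the word table: maps each lowercase
# keyword to the set of user IDs subscribed to it.
class KeywordIndex
  def initialize
    @subscribers = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Keywords are stored in lowercase so matching is case-insensitive.
  def subscribe(user_id, keyword)
    @subscribers[keyword.downcase] << user_id
  end

  # Splits a title into unique lowercase words.
  def tokenize(title)
    title.downcase.scan(/[a-z0-9']+/).uniq
  end

  # Returns the IDs of users with at least one keyword in the title.
  def users_to_notify(title)
    tokenize(title).each_with_object(Set.new) do |word, users|
      users.merge(@subscribers[word]) if @subscribers.key?(word)
    end
  end
end
```

Because the lowercasing happens once when a keyword or title is stored, the lookup itself is an exact match, which is what makes a hash index applicable in the database version.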

Building Django Activity Feed using Redis

How can one build an activity feed using Django & Redis?
Example: In the 'Home' section of my iOS app, I would like to fill it with activities generated by users via JSON.
Bob liked Kyle's poem.
Bob started following Kyle.
Bob liked 6 poems ------>(all six poems aggregated together in the feed)
Bob commented on Kyle's poem: Beautiful piece!
How can I go about doing this? If the question is not clear, please let me know so that I can make it clearer for you and others who come across this post and may find it useful! Thank you
What you are doing requires:
aggregation logic (which you can write in Python, since your main framework is Django)
a task queue running in the background that executes this aggregation logic
denormalized and duplicated data in your Redis database, repeating data that is relational in your main database (for example, PostgreSQL)
You can break down your activity feed into its components, which are aggregated together in Redis but related to each other in your relational database.
Bob, Kyle, the poems and "Beautiful piece!" are objects (a user, a user, poem objects and a comment, respectively) which are stored in your relational database.
Your activity types are "following", "liked", "commented".
You can then write your Python logic to aggregate them into a single feed item which is stored in your Redis database. Each of these feed items is composed of objects and activity types (and a timestamp of when that activity happened).
That's the primary design consideration to get started.
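To illustrate the aggregation step, here is a minimal sketch. It is written in Ruby only to match the other examples on this page; the same grouping is straightforward to port to Python. The `Activity` struct and the `aggregate` method are hypothetical, and the pluralization (appending "s") is a simplification.

```ruby
# Hypothetical activity record: who did what to which kind of object.
Activity = Struct.new(:actor, :verb, :object_type, :target)

# Collapses activities that share an actor, verb and object type into one
# feed line; a single activity keeps its specific target.
def aggregate(activities)
  groups = activities.group_by { |a| [a.actor, a.verb, a.object_type] }
  groups.map do |(actor, verb, type), group|
    if group.size == 1
      "#{actor} #{verb} #{group.first.target}"
    else
      "#{actor} #{verb} #{group.size} #{type}s"
    end
  end
end
```

In the design described above, a background task would run this kind of grouping and write the resulting feed items, with their timestamps, into Redis.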
Here's a good example - https://github.com/SupermanScott/redis-activity-example
Stream-Framework is an open-source library made to build feeds and supports both Redis and Cassandra as storage backends.
You can check it out on GitHub.
Disclaimer: I am one of the authors of Stream-Framework

RoR: Use Feedzirra to pull different feeds and display as one

I can successfully pull different feeds using the Feedzirra gem and get feed updates. However, each feed that I'd like to pull has different content (e.g. the GitHub public feed, last.fm recently played, etc.).
What is the best way to go about combining all of these feeds into one? Right now I have different models for different types of feeds and some feeds use different timestamps than the others.
You could add multiple extra fields to hold each of the unique attributes in an uber-feed object, only filling in the ones that come from each particular feed at time of processing. (It's kind of like the NoSQL model in that way, though not quite, since you have to define the fields ahead of time, but you can add any arbitrary field as a data-holder.)
This is how you add a new field to all instances of a feed...
Feedzirra::Feed.add_common_feed_entry_element(:my_custom_field)
You'll find a little more discussion about this here...
https://groups.google.com/forum/?fromgroups#!msg/feedzirra/_h4y8_vwDGc/N8sjym6NouEJ
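One way to picture the uber-feed object is the sketch below, in plain Ruby rather than Feedzirra calls. It assumes each source delivers entries as hashes and names its timestamp field differently; the `FeedItem` struct, the `TIMESTAMP_FIELDS` mapping and the field names `published` and `played_at` are invented for illustration. Attributes unique to a feed are kept in an `extras` hash, and all timestamps are normalized to UTC so the combined feed can be sorted.

```ruby
require "time"

# Hypothetical unified item; `extras` holds whatever is unique to one feed.
FeedItem = Struct.new(:source, :title, :timestamp, :extras)

# Assumed per-source timestamp field names; real feeds may differ.
TIMESTAMP_FIELDS = { github: :published, lastfm: :played_at }.freeze

# Builds one FeedItem from a raw hash, normalizing the timestamp to a UTC Time.
def normalize(source, raw)
  field = TIMESTAMP_FIELDS.fetch(source)
  extras = raw.reject { |key, _value| [:title, field].include?(key) }
  FeedItem.new(source, raw[:title], Time.parse(raw[field]).utc, extras)
end

# Merges items from all sources into one list, newest first.
def merge(items)
  items.sort_by(&:timestamp).reverse
end
```

`TIMESTAMP_FIELDS.fetch` raises for an unknown source, which is intentional here: it is better to fail loudly than to merge entries with a missing timestamp.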
You are creating an activity feed -- here are several gems that you can research on how to create activity feeds: https://www.ruby-toolbox.com/categories/Rails_Activity_Feeds
