Using FeedJira to create RSS aggregator/reader - ruby-on-rails

I am trying to create my own rss reader app in ruby on rails. I want to be able to store various news stories in my database that I can pull from later to display each story with its headline, image, summary, etc. in a nice layout. I am working with the feedjira library and am also pretty new to RoR. I know that these two commands in the rails console fetch rss feeds and somehow parse them:
urls = %w[http://feedjira.com/blog/feed.xml https://github.com/feedjira/feedjira/feed.xml]
feeds = Feedjira::Feed.fetch_and_parse urls
While these two commands work on rss feeds, I was wondering how I could configure my database/model and then save the news entries I get from Feedjira into the db. I tried watching the railscast on this issue but it seemed a bit out of date. Any help on this issue would be immensely appreciated! Thanks in advance!

Here's one way:
Create a model such as this:
class Entry < ActiveRecord::Base
  attr_accessible :entry_id, :feed_id, :url, :title, :summary, :description, :published_at

  def self.update_from_feed(feed_name)
    feed = Feed.find_by_name(feed_name)
    feed_data = Feedjira::Feed.fetch_and_parse(feed.feed_url)
    add_entries(feed_data.entries, feed)
  end

  def self.add_entries(entries, feed)
    entries.each do |entry|
      break if exists? :entry_id => entry.id
      create!(
        :entry_id => entry.id,
        :feed_id => feed.id,
        :url => entry.url,
        :title => entry.title.sanitize,
        :summary => entry.summary.sanitize,
        :description => entry.content.sanitize,
        :published_at => entry.published
      )
    end
  end
  private_class_method :add_entries
end
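A matching migration might look something like this (the column types are assumptions based on the attributes above):

class CreateEntries < ActiveRecord::Migration
  def change
    create_table :entries do |t|
      t.string :entry_id
      t.integer :feed_id
      t.string :url
      t.string :title
      t.text :summary
      t.text :description
      t.datetime :published_at
      t.timestamps
    end
  end
end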
You can then call this from the cli / cron or whatever with, for example:
rails runner -e development 'Entry.update_from_feed("feedname")'
This runs the update_from_feed method in the context of your Rails app using a separate rails instance (a bit like rails console), but doesn't impact the running Rails instance.
In this example, there's a separate Feed model which has name and feed_url attributes, so there's a lookup of the URL based on the provided name.
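For reference, a minimal version of that Feed model might be (attribute names assumed from the lookup above):

class Feed < ActiveRecord::Base
  attr_accessible :name, :feed_url
end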
This code doesn't use Feedjira's ability to check for updates, so dupe checking is baked in.
(This GitHub issue says to avoid using the #update method.)
Note that the use of break assumes that new entries are always added to the top of the feed. If you don't trust the feed, then replace break if with next if so that every entry is checked. The url can be used as an alternative unique id.
Edit:
Here's a version of the update_from_feed method that takes advantage of Feedjira's ability to process multiple feeds:
def self.update_all_from_feed
  feed_urls = Feed.pluck :feed_url
  feeds = Feedjira::Feed.fetch_and_parse(feed_urls)
  feed_urls.each do |feed_url|
    feed = Feed.find_by_feed_url(feed_url)
    add_entries(feeds[feed_url].entries, feed)
  end
end
pluck returns all the rows of the specified column(s) (:feed_url in this case) as an array. Equally, you could change it to accept an array of names, from which it looks up the array of URLs to pass to Feedjira, as in the sketch below.
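For instance, a name-based variant could look like this (a sketch, assuming the same Feed model):

def self.update_from_feeds(feed_names)
  feed_urls = Feed.where(:name => feed_names).pluck(:feed_url)
  feeds = Feedjira::Feed.fetch_and_parse(feed_urls)
  feed_urls.each do |feed_url|
    feed = Feed.find_by_feed_url(feed_url)
    add_entries(feeds[feed_url].entries, feed)
  end
end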
Finally, if you wanted a self-looping method, you could include:
def self.update_all_periodically(frequency = 15.minutes)
  loop do
    update_all_from_feed
    sleep frequency.to_i
  end
end
Then this:
rails runner -e development 'Entry.update_all_periodically'
won't return until you break the process, and will update all feeds at the default frequency, or that specified as an optional argument.
If you wanted to run the updates asynchronously in your main Rails process, then a background worker such as Sidekiq, Resque or DelayedJob will do the... job. :)
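For instance, a minimal Sidekiq worker wrapping the batch update might look like this (a sketch; Sidekiq runs the job asynchronously, but scheduling it recurrently still needs cron or an extension such as sidekiq-scheduler):

class FeedUpdateWorker
  include Sidekiq::Worker

  def perform
    Entry.update_all_from_feed
  end
end

You'd then enqueue it with FeedUpdateWorker.perform_async.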

Scheduling the fetching and parsing of all these feeds can be incredibly hard and time consuming, which means you should absolutely not do it from inside the Rails app itself. At best, you should do it using an 'offline' script.
You could also simply rely on existing APIs like Superfeedr and its rack middleware.

Related

Rails 5. How to use an uploads controller to handle bulk uploads by dispatching each record

This is Rails 5 with Mongoid.
I have a MVC structure for load balancers. The model and the controller do all the validation and sanitising I want and need. However, I want to be able to send an array of balancer data structures to an upload controller, which will verify the upload is in a sane format, and then iterate over the array dispatching to the balancers controller to validate, insert, or upsert each entry as needed -- and accumulate statistics so that at the end of the iteration I can produce an HTML or JSON rendering of success count and detailed errors.
I'm pretty sure I need to frob the params structure as part of each iteration/dispatch, but it's unclear how, or whether I want to use the BalancersController's #new, #create, #update, #save, or what method in order to get the validation to work and the insert/upsert to be done properly.
I haven't yet been able to find anything that seems to describe this exact sort of bulk-upload scenario, so any/all help & guidance will be much appreciated.
Thanks!
(There are several other models/controllers I want to push through here; the balancer case is just an example.)
I've sort of got the collecting of the statistics summary down, but I'm having trouble dispatching to the balancers controller to actually update the database with anything other than nil.
[edited]
Here's a simple example model:
class Puppet::Dbhost
  include ::PerceptSys
  include Mongoid::Document
  include Mongoid::Timestamps

  validates(:pdbfqdn,
            :presence => true)
  field(:pdbfqdn,
        :type => String)
  field(:active,
        :type => Mongoid::Boolean)
  field(:phases,
        :type => Array)
  index(
    {
      :pdbfqdn => 1,
    },
    {
      :unique => true,
      :name => 'dbhosts_pdbfqdn',
    }
  )
end # class Puppet::Dbhost
The uploads controller has a route post('/uploads/puppet/dbhosts(.:format)', :id => 'puppet_dbhosts', :to => 'uploads#upload').
The document containing upload information looks like this:
{
  "puppet_dbhosts" : [
    {
      "pdbfqdn" : "some-fqdn",
      "active" : true
    }
  ]
}
The uploads#upload method vets the basic syntax of the document, and then is where I get stuck. I've tried
before_action(:set_upload_list, :only => [:upload])

def set_upload_list
  params.require(:puppet_dbhosts)
end

def upload
  reclist = params.delete(:puppet_dbhosts)
  reclist.each do |dbhrec|
    params[:puppet_dbhost] = dbhrec
    dbhost = Puppet::Dbhost.new
    dbhost.update
  end
end
In very broad strokes. The logic for validating a puppet_dbhost entry is embedded in the model and the controller; the uploads controller is meant to simply dispatch and delegate, and tabulate the results. There are several models accepting uploads in this manner, hence the separate controller.
Well, I feel sheeeepish. Obviously the correct path is to abstract the generic upload methods and logic into a separate module and mix it in. Routes can be adjusted. So no need for controllers to know about the internals of others for this scenario. Thanks for the hint!
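For what it's worth, a minimal sketch of that mixin approach (all names here are hypothetical):

# app/controllers/concerns/bulk_uploadable.rb
module BulkUploadable
  extend ActiveSupport::Concern

  # records: an array of attribute hashes, already vetted/permitted.
  # Delegates each record to the given model class and tallies results.
  def process_upload(model_class, records)
    stats = { :created => 0, :errors => [] }
    records.each do |attrs|
      record = model_class.new(attrs)
      if record.save
        stats[:created] += 1
      else
        stats[:errors] << record.errors.full_messages
      end
    end
    stats
  end
end

Each uploading controller can then include BulkUploadable and call process_upload with its own model class.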

Rails; Fetch records within initializer

I've been wondering whether it is common to fetch records within an initializer.
Here is an example of a service object that fetches records and generates a PDF receipt file.
The input is an invoice UUID, and the related records, such as the card detail and the invoice items, are fetched within the initializer.
class Pdf::GenerateReceipt
  include Service

  attr_reader :invoice, :items, :card_detail

  def initialize(invoice_uuid)
    @invoice ||= find_invoice!(invoice_uuid) # caching
    @items = invoice.invoice_items
    @card_detail = card_detail
  end

  .....

  def call
    return ReceiptGenerator.new(
      id: invoice.uuid, # required
      outline: outline, # required
      line_items: line_items, # required
      customer_info: customer_info
    )
  rescue => e
    [false, e]
  end

  .....

  def card_detail
    card_metadata = Account.find(user_detail[:id]).credit_cards.primary.last
    card_detail = {}
    card_detail[:number] = card_metadata.blurred_number
    card_detail[:brand] = card_metadata.brand
    card_detail
  end
end
Pdf::GenerateReceipt.('28ed7bb1-4a3f-4180-89a3-51cb3e621491') # => then generate pdf
The problem is that if the records are not found, this raises an error.
I could rescue within the initializer; however, that doesn't seem common.
How could I work around this in more ruby way?
This is mostly opinion and anecdotal, but I prefer to deal with casting my values as far up the chain as possible. So I would find the invoice before this object and pass it in as an argument, and the same with the card_detail.
If you do that in this class, it will limit the responsibility to coordinating those two objects, which is way easier to test but also adds another layer that you have to reason about in the future.
So here's how I would handle it: split this into 4 separate things (see the sketch after this list):
Invoice Finder thing
Card Finder thing
Pdf Generator that takes invoice and card as arguments
Finally, something to orchestrate the 3 actions above
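A rough sketch of that shape (all names below are hypothetical):

class InvoiceFinder
  def self.call(uuid)
    Invoice.find_by!(:uuid => uuid)
  end
end

class CardFinder
  def self.call(account)
    account.credit_cards.primary.last
  end
end

class ReceiptOrchestrator
  # Coordinates the three pieces: find the invoice, find the card,
  # then hand both to the PDF generator as plain arguments.
  def self.call(invoice_uuid)
    invoice = InvoiceFinder.call(invoice_uuid)
    card = CardFinder.call(invoice.account)
    Pdf::GenerateReceipt.new(invoice, card).call
  end
end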
Hope this helps.
Addition: Check out the book Confident Ruby by Avdi Grimm. It's really great at outlining how to handle this type of scenario.

Communicating between 2 Rails applications

I have 2 separate rails applications (app a, app b). Both of these apps maintain a customer list. I would like to run a rake task once a day and have app b pull in select customers from app a.
This is the way I have attempted to solve this. If I am going down the wrong road please let me know.
I am using JBuilder to generate the JSON
My issue is with how to have App B set an id in App A, so that the system knows the customer has already been transferred over.
I'm assuming I have to do a PUT request similar to what I have done to get the customers list, but I am having issues getting that to work.
App A
Customers Model
scope :for_export, :conditions => {:verified => true, :new_system_id => nil ...}
Customers Controller
skip_before_filter :verify_authenticity_token, :only => [:update]

def index
  @customers = Customer.for_export
end

def update
  @customer = Customer.find(params[:id])
  if @customer.update_attributes(params[:customer])
    render :text => 'success', :status => 200
  end
end
App B
rake task
task :import_customers => :environment do
  c = Curl::Easy.new("http://domain.com/customers.json")
  c.http_auth_types = :basic
  c.username = 'username'
  c.password = 'password'
  c.perform
  a = JSON.parse(c.body_str)
  a.each do |customer|
    customer = Customer.create(customer)
    # PUT request back to App A to update field
  end
end
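For the PUT back to App A, something along these lines should work with Curb (the URL, auth details and the new_system_id field are assumptions, and remote_id would be App A's id, captured from the JSON before the local create):

Curl::Easy.http_put("http://domain.com/customers/#{remote_id}",
                    "customer[new_system_id]=#{customer.id}") do |curl|
  curl.http_auth_types = :basic
  curl.username = 'username'
  curl.password = 'password'
end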
What I have is currently working; I'm just not sure if this is the correct method, and also how to initiate a PUT request to call the update method in the customer controller.
Thanks!
Ryan
I'm sorry that I'm not answering your question, but I am giving you an alternative. What you are trying to create sounds a lot like an ETL job. You may want to consider having a batch job move a copy of your customers table from app a over to app b periodically, and then have another batch job import that table into app b's database. I know, it's a little clunky, but it's a very popular and reliable pattern to solve your problem.
Also, if both apps are in the same data center, then you may want to create a read-only database view of app a's customer data and then have app b read that using SQL calls. It's a slightly cheaper and easier way to integrate the two apps than the option that I listed above.
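For instance, App B could read such a view over a second database connection. In this sketch, the app_a_readonly connection name, the view name and its columns are all assumptions:

# app_a_readonly points at App A's database in App B's database.yml.
class RemoteCustomer < ActiveRecord::Base
  establish_connection :app_a_readonly
  self.table_name = 'exportable_customers' # the read-only view in App A
end

RemoteCustomer.find_each do |remote|
  Customer.create(remote.attributes.except('id'))
end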
Good luck!

How to retrieve database data by disabling the Ruby on Rails cache system only for one case?

I am using Ruby on Rails v3.2.2 and I would like to retrieve database data by disabling the system cache in only one case. That is, in my view file I have something like the following:
<h1>Random articles 1</h1>
<%= Article.order('RAND()').limit(3).inspect %>
...
<h1>Random articles 2</h1>
<%= Article.order('RAND()').limit(3).inspect %>
When the view file is rendered it outputs the same data both under "Random articles 1" and "Random articles 2". It happens because the Ruby on Rails cache system (by "default"/"convention") tries to hit the database as little as possible for performance reasons.
How can I prevent this behavior (just for the case explained above) so as to output different data for finder methods in my view file?
There is an uncached method in ActiveRecord. Looks like you can use it like this:
articles = Article.uncached do
  Article.order('RAND()').limit(3).to_a # to_a forces the query to run inside the block
end
Note the to_a: without it the relation is lazy, so the query would actually execute (and be cached) outside the uncached block. You'd also be better off extracting that into a class method on your model.
See this article for more information
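For example, a minimal sketch of that extraction:

class Article < ActiveRecord::Base
  # to_a forces the query to run inside the uncached block;
  # otherwise the lazy relation would execute (and cache) later.
  def self.random_sample(count = 3)
    uncached { order('RAND()').limit(count).to_a }
  end
end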
I tried to reproduce your issue, but could not (Rails 3.2.2 too, using sqlite3 adapter, code below). But try this anyway:
Article.uncached do Article.order('RAND()').limit(3).inspect end
The following is how I tried to reproduce your issue in an empty Rails project; for me, it yielded the articles in a different order every time, though:
ActiveRecord::Migration.create_table :articles do |t| t.string :name end
class Article < ActiveRecord::Base; end
20.times do |i| Article.create :name => "Article#{i}" end
# sqlite doesn't have a RAND() function, emulate it
Article.connection.instance_variable_get(:@connection).define_function 'RAND' do rand end
p *Article.order('RAND()').limit(3)
Maybe you spot a mistake in how I tried to reproduce your issue.

Rails polling RSS every 6 hours

I am seeking advice on how to poll RSS with a given interval such as 6 hours.
The following code works: it reads and parses the feed and adds its entries to the database. It only adds new entries. Here is the class method:
def self.update_from_feed(feed_url)
  feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing") # example feed
  feed.entries.each do |entry|
    unless exists? :guid => entry.id
      create!(
        :name => entry.title,
        :url => entry.url,
        :published_at => entry.published,
        :guid => entry.id
      )
    end
  end
end
How do I run this entire class method, let's say, every 6 hours? I'm new to Ruby (and Rails), so any help would be appreciated, with an example. I want to avoid running an external cron job if possible; I want it to run every 6 hours from within the code, if that makes sense. Thanks
No, it doesn't make sense to not use cron for this. This is what cron is for, it's in the name of the program.
If you don't like the cron syntax, that's cool: there's a gem called whenever (https://github.com/javan/whenever) that gives you a nice Ruby syntax for generating a cron job, plus a command to install it.
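For example, a config/schedule.rb for whenever might look like this (assuming the method above lives on an Entry model):

every 6.hours do
  runner "Entry.update_from_feed('http://feeds.feedburner.com/PaulDixExplainsNothing')"
end

Running whenever --update-crontab then writes the corresponding crontab entry for you.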
However, for the love of god, do not try to invent a new way of doing this, unless you're adding some killer features. Use cron, move on.
