How to extract article content from a website/blog - ruby-on-rails

I'm trying to write a generic function for extracting article text from blog posts and websites.
A few simplified examples I'd like to be able to process:
Random website:
...
<div class="readAreaBox" id="readAreaBox">
<h1 itemprop="headline">title</h1>
<div class="chapter_update_time">time</div>
<div class="p" id="chapterContent">article text</div>
</div>
...
Wordpress:
<div id="main" class="site-main">
<div id="primary" class="site-content" role="main">
<div id="content" class="site-content" role="main">
<article id="post-1234" class="post-1234 post type-post">
<div class="entry-meta clear">..</div>
<h1 class="entry-title">title</h1>
<div class="entry-content clear">
article content
<div id="jp-post-flair" class="sharedaddy">sharing links</div>
</div>
</article>
</div>
</div>
</div>
Blogspot:
<div id="content">
...
<div class="main" id="main">
<div class="post hentry">
<h3 class="post-title">title</h3>
<div class="post-header">...</div>
<div class="post-body">article content</div>
<div class="post-footer">...</div>
</div>
</div>
</div>
What I came up with (doc is a Nokogiri::HTML::Document):
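def fetch_content
  html = ''
  # Try each selector and keep the shortest non-empty match, on the
  # assumption that the tightest container holds little besides the article.
  ['#content', '#main', 'article', '.post-body', '.entry-content', '#chapterContent'].each do |css|
    candidate = doc.css(css).to_html
    html = [html, candidate].select(&:present?).sort_by(&:length).first
  end
  self.content = html
end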
It works relatively well for the examples I tested, but it still leaves some sharing and navigation links in the output, and it won't work if a page uses more cryptic class names.
Is there a better way to do this?

There is a gem called pismo that implements a couple of algorithms that attempt to extract article content.
There is also a Java library, boilerpipe, which extracts the textual content of a web page and which you can call from JRuby.

Use rapar. It lets you write domain-specific parsers, for example for wordpress.com or blogspot.com.

You could also use a free article extraction API, for example:
diffbot.com
embed.ly
textracto.com
Some of them work quite well and, as far as I know, they are all easy to integrate with Ruby.

Related

'order' (Bootstrap 4 grid) does not work as expected in Safari on iOS

This code works perfectly on desktop on Windows, but when I test it in Safari on iOS the third column always collapses and jumps to a new line.
Here is the code:
<div class="container">
<div class="row">
<div class="col-4">
First, but unordered
</div>
<div class="col-4 order-sm-4">
Second in sm+, last in mobile
</div>
<div class="col-4 order-3">
Last in sm+, Second in mobile
</div>
</div>
</div>
Has anybody else experienced this? Is it a bug?
Thanks
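I think you wrote the order classes wrong. Bootstrap is mobile-first, so a breakpoint class such as col-sm-* or order-sm-* applies at sm and every larger breakpoint, while an unprefixed class such as order-3 applies at all sizes.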
This is probably what you want:
<div class="container">
<div class="row">
<div class="col-4">
First, but unordered
</div>
<div class="col-4 order-3 order-sm-1">
Second in sm+, last in mobile
</div>
<div class="col-4 order-2">
Last in sm+, Second in mobile
</div>
</div>
</div>
Tell me if there's any issue with this code!

Setting Data-Parent and HREF dynamically in a for-loop

Previously, to create accordion controls, I used this piece of code:
<div class="panel-group" id="accordionMessagesSetup">
<div class="panel panel-default">
<div class="panel-heading">
<h4 class="panel-title">
<a class="accordion-toggle" data-toggle="collapse" data-parent="#accordionMessagesSetup" href="#collapseMessagesSetup">
<span class="glyphicon glyphicon-chevron-up"></span>
Message Setup
</a>
</h4>
</div>
<div id="collapseMessagesSetup" class="panel-collapse collapse in">
<div>
<p style="background-color: red"> Someting ELSE in here</p>
<p style="background-color: red"> Someting ELSE2 in here</p>
</div>
</div>
</div>
</div>
or as seen here: Bootplay Live Demo
Now I still want to use my example, but on this page I have a for-each loop, so I need to create these at run time.
The attributes that need to contain variables for this to work are:
id="accordionMessagesSetup"
data-parent="#accordionMessagesSetup"
href="#collapseMessagesSetup"
id="collapseMessagesSetup"
How can I set those in a for-each loop over a model using Razor?
Assume the model has whatever property you need for this.
The biggest issue you are likely to run into is Razor parsing. When you use a Razor variable in the middle of a piece of text, Razor often cannot determine where the variable name ends. For example, if you were to do something like:
<div id="accordion@Model.IdMessageSetup">
Razor thinks it needs to look for a property on the model named IdMessageSetup, when you actually just wanted Id. The easiest way to fix this is to wrap the variable in parentheses:
<div id="accordion@(Model.Id)MessageSetup">
Now it's clear which part is the variable. As far as adding the # sign goes, I'm not really sure what the confusion is there. You just put it where it needs to go:
<a href="#collapse@(Model.Id)MessagesSetup">
Nothing special required.

How to override Materialize CSS in Ruby on Rails? [duplicate]

I've been looking through some posts on the internet about Materialize in Rails; however, this area seems to be very fuzzy. I am currently using the materialize-sass gem.
I didn't find many helpful posts, so I decided to ask here.
Here's the code for one of my pages, pages/discover.html.erb:
<%= stylesheet_link_tag "https://fonts.googleapis.com/icon?family=Material+Icons" %>
<div class="row">
<div class="col s12 m5">
<div class="card">
<div class="card-image b">
<img src="http://i.huffpost.com/gen/1456585/images/o-CRAIG-COBB-facebook.jpg">
<span class="card-title">Cregg Cobb Is Back At It Again, But Only This Time He Has Guns</span>
</div>
<div class="card-content">
<p>I am a very simple card. I am good at containing small bits of information.
I am convenient because I require little markup to use effectively.</p>
</div>
<div class="card-action">
<div class="chip">
<img src="http://i.huffpost.com/gen/1456585/images/o-CRAIG-COBB-facebook.jpg" id="cobb">
Like
</div>
</div>
</div>
</div>
</div>
I want to make the <img> darker by using filter: brightness(50%);
How do I go about doing this? Where do I begin?
Bonus: If you know where I can add the jQuery stuff that Materializecss.com mentions when discussing components and transitions, it would be greatly appreciated!
Since it's just a single override, you could add a new .css file to assets (or wherever you're storing static files) containing a rule marked !important:
img.darker {
filter: brightness(50%) !important;
}
Then import it at the top of your template:
<%= stylesheet_link_tag "/path/to/css.file" %>
And add the class to your image tag:
<img src="/my/img.png" class="darker">
Alternatively you could just style the image tags individually:
<img src="/my/img.png" style="filter:brightness(50%)">

Using an HTML Action Link to target a div

I am new to ASP.NET MVC and I am having trouble figuring out how to link a div to a page in HTML markup. This is the current link in pure HTML. I want to accomplish the same thing in Razor syntax:
<div class="col-md-2">
<a href="ambulance.html">
<div class="amb item">
<div class="tiletext">AMB</div>
<div class="tilesubtext">Ambulance</div>
</div>
</a>
</div>
I've been looking into action links, but if there is a better way to accomplish this, I'm open to it!
Possible duplicate. I'll add a bit of explanation that pertains to the question since it has to do with Razor:
What your backend developer needs is the Url.Action helper. This will let you route the link through the MVC framework.
So say:
<div class="col-md-2">
<a href="@Url.Action("Cars", "Ambulance")">
<div class="amb item">
<div class="tiletext">AMB</div>
<div class="tilesubtext">Ambulance</div>
</div>
</a>
</div>
ASP.NET MVC: generating action link with custom html in it
The Html.ActionLink method only creates a plain anchor tag for an action method, so you need the Url.Action method instead. The rest of the markup can stay as it is.
<div class="col-md-2">
<a href="@Url.Action("Index","Home")">
<div class="amb item">
<div class="tiletext">AMB</div>
<div class="tilesubtext">Ambulance</div>
</div>
</a>
</div>

Twitter Bootstrap isn't aligning properly

I would like to know whether this is a bug or a mistake of mine.
In my Rails project, Twitter Bootstrap isn't aligning the container at the center.
My code:
<div class="container">
<div class="span12" style="background-color:blue; color:white;">
<%= yield %> <!-- This renders the sample text "Some test." -->
</div>
</div>
Here's an image of the bug.
http://img543.imageshack.us/img543/3405/screenshot20120702at035.png
Thanks a lot everybody! :-)
<div class="container">
<div class="row">
<div class="span12">
</div>
</div>
</div>
