rails: get a teaser/excerpt for an article - ruby-on-rails

I have a page that lists news articles. To cut down on the page's length, I only want to display a teaser (the first 200 words / 600 letters of the article) followed by a "more..." link that, when clicked, expands the rest of the article with jQuery/JavaScript. I have all of that figured out, and I even found the following helper method on a paste page, which makes sure that the news article (a string) is not chopped off in the middle of a word:
def shorten(string, count = 30)
  if string.length >= count
    shortened = string[0, count]
    splitted = shortened.split(/\s/)
    words = splitted.length
    splitted[0, words - 1].join(' ') + ' ...'
  else
    string
  end
end
The problem is that the news article bodies I get from the DB are formatted HTML. So if I'm unlucky, the above helper will chop my article string right in the middle of an HTML tag and insert the "more..." string there, which will corrupt the HTML on the page.
Is there any way around this or is there a plugin out there that I can use to generate excerpts/teasers from an HTML string?

You can use a combination of Sanitize and Truncate.
truncate("And they found that many people were sleeping better.",
:omission => "... (continued)", :length => 15)
# => And they found... (continued)
I'm doing a similar task where I have blog posts and I just want to show a quick excerpt. So in my view I simply do:
sanitize(truncate(blog_post.body, length: 150))
That strips out the HTML tags, gives me the first 150 characters and is handled in the view so it's MVC friendly.
Good luck!
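A minimal plain-Ruby sketch of the same idea, stripping the tags first so that truncation can never land inside one. The method name plain_teaser and its keyword arguments are hypothetical, and the regex-based tag stripping is deliberately crude; in a Rails view, strip_tags followed by truncate with a separator would be the closer equivalent.

```ruby
# Hypothetical helper: strip tags (crudely), collapse whitespace,
# then cut at the last space that still leaves room for the omission.
def plain_teaser(html, length: 150, omission: '...')
  text = html.gsub(/<[^>]*>/, ' ').gsub(/\s+/, ' ').strip
  return text if text.length <= length

  room = length - omission.length
  stop = text.rindex(' ', room) || room
  text[0, stop] + omission
end

puts plain_teaser('<p>Hello <b>brave</b> new world of teasers</p>', length: 20)
```

Because the tags are removed before the cut, the result is plain text and needs no tag balancing afterwards.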

My answer here should work. The original question (err, asked by me) was about truncating Markdown, but I ended up converting the Markdown to HTML and then truncating that, so it should apply here too.
Of course, if your site gets much traffic, you should cache the excerpt (perhaps store it in the database when the post is created or updated). This would also mean you could allow the user to modify or enter their own excerpt.
Usage:
>> puts "<p><b>Something</p>".truncate_html(5, "...")
=> <p><b>Somet...</b></p>
..and the code (copied from the other answer):
require 'rexml/parsers/pullparser'
class String
  # Truncates to roughly len characters of text, closing any tags left open.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # Take at most new_len characters of this text node
        results << p_e[0][0, new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    # Append the caller-supplied omission marker, if any
    results << at_end if at_end
    # Close every tag that is still open, innermost first
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end
  private
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end

Thanks a lot for your answers!
However, in the meantime I stumbled upon the jQuery HTML Truncator plugin, which perfectly fits my purposes and shifts the truncation to the client-side. It doesn't get any easier :-)

You would have to write a more complex parser if you don't want to split in the middle of HTML elements. It would have to remember whether it is in the middle of a <> block and whether it is between two tags.
Even if you did that, you would still have problems: if someone put the whole article into a single HTML element, the parser couldn't split it anywhere, because the closing tag would be missing.
If at all possible, I would try not to put any tags into the articles, or keep it to tags that don't contain anything (no <div> and so on). That way you would only have to check whether you are in the middle of a tag, which is pretty simple:
def shorten(string, count = 30)
  if string.length >= count
    shortened = string[0, count]
    splitted = shortened.split(/\s/)
    words = splitted.length
    if splitted[words - 1].include?('<')
      # The last word contains an opening "<": drop one more word
      splitted[0, words - 2].join(' ') + ' ...'
    else
      splitted[0, words - 1].join(' ') + ' ...'
    end
  else
    string
  end
end

I would have sanitized the HTML and extracted the first sentence. Assuming you have an article model, with a 'body' attribute that contains the HTML:
# lib/core_ext/string.rb
class String
def first_sentence
self[/(\A[^.!?]+)/, 1]
end
end
# app/models/article.rb
def teaser
HTML::FullSanitizer.new.sanitize(body).first_sentence
end
This would convert "<b>This</b> is an <em>important</em> article! And here is the rest of the article." into "This is an important article".

I solved this using the following approach.
Install the 'sanitize' gem:
gem install sanitize
Then I used the following code, where body is text containing HTML tags:
<%= content_tag :div, Sanitize.clean(truncate(body, length: 200, separator: ' ', omission: "... #{ link_to '(continue)', '#' }"), Sanitize::Config::BASIC).html_safe %>
This gives an excerpt with valid HTML.
I hope it helps somebody.

There is now a gem named HTMLTruncator that takes care of this for you. I've used it to display post excerpts and the like, and it's very robust.

If you are using Active Text, I would suggest first converting the text using to_plain_text.
truncate(sanitize(career.content.body.to_plain_text), length: 150).squish

Related

How can you splice together a string with link_to variables that have indices relative to the string in Ruby on Rails?

I'm trying to take many posts with example text "you can find other #apple #orchard examples at www.google.com and www.bing.com #funfruit" and display the text to the user with URLs and #tags linking to their appropriate routes.
I have successfully done this with text that only contains any number of #tags, or a single URL, with the following code:
application_controller.rb
def splice_posts(posts, ptags, spliced)
# Build all posts as items in spliced, with each item a post_pieces array
posts.reverse.each do |post|
tag_indices = []
tag_links = []
# Get post URLs: [{:url=>"www.google.com", :indices=>[209, 223]}]
post_links = extract_urls_with_indices(post.text)
# Save each as rails style link with indices
# For each of the ptags associated with post
ptags.where(post_id:post.id).each do |ptag|
# Store hashtag's start/stop indices for splicing post
tag_indices.append([ptag.index_start, ptag.index_end])
# Store hashtag links for splicing post
tag_links.append(view_context.link_to '#' + ptag.hashtag, atag_path(Atag.find_by(id:ptag.atag_id).id),
:class => 'post_hashtag', :remote => true, :onclick => "location.href='#top'")
end
# Create and store post as post_pieces in spliced
# If there are no hashtags
if tag_indices.length == 0
# And no links
if post_links.length == 0
spliced.append([post.text, post.id])
# But links
else
spliced.append([post.text[0..post_links[0][:indices][0]-2],
view_context.link_to(post_links[0][:url], post_links[0][:url], target: "_blank"),
post.text[post_links[0][:indices][1]..-1], post.id])
end
# Elsif there is one hashtag
elsif tag_indices.length == 1
if post.text[0] == '#'
spliced.append([post.text[2..tag_indices[0][0]], tag_links[0],
post.text[tag_indices[0][1]..-1], post.id])
else
spliced.append([post.text[0..tag_indices[0][0]-2], tag_links[0],
post.text[tag_indices[0][1]..-1], post.id])
end
# Else there are multiple hashtags, splice them in and store
else
# Reset counter for number of tags in this post
@tag_count = 0
# If post begins with tag, no text before first tag
if tag_indices[0][0] == 0
post_pieces = []
# Else store text before first tag
else
post_pieces = [post.text[0..tag_indices[0][0]-2]]
end
# Build core of post_pieces, splicing together tags and text
tag_indices.each do |indice|
post_pieces.append(tag_links[@tag_count])
post_pieces.append(post.text[indice[1]..tag_indices[@tag_count+1][0]-2])
if @tag_count < (tag_indices.length-2)
@tag_count += 1
else
# Do nothing
end
end
# Knock off the junk at the end
post_pieces.pop
post_pieces.pop
# Finish compiling post_pieces and store it in spliced
post_pieces.append(tag_links[-1])
post_pieces.append(post.text[tag_indices[-1][1]..-1])
# Make last item in array post id for comment association purposes
post_pieces.append(post.id)
spliced.append(post_pieces)
end
end
end
The spliced posts are then easily served in the view piece by piece:
<% @posts_spliced.each do |post_pieces| %>
<%# Build post from pieces (text and hashtags), excluding last element which is post_id %>
<% post_pieces[0..-2].each do |piece| %>
<%= piece %>
<% end %>
<% end %>
The problem is that this implementation is convoluted to begin with, and trying to patch it with dozens of nested if/else statements to handle URLs seems like madness. I suspect that a more experienced software engineer or Rails developer could show me how to do this with a fraction of the code.
To clarify I have the following variables already available for each post (with examples) :
post = 'some text with #tags and www.urls.com potentially #multiple of each.com'
post_urls = [{:url=>"www.urls.com", :indices=>[25, 37]}, {:url=>"each.com", :indices=>[63, 71]}]
post_tags = [{:hashtag=>"tags", :indices=>[15, 20]}, {:hashtag=>"multiple", :indices=>[50, 59]}]
I'm thinking that a more practical implementation might use the indices more directly. Perhaps breaking the post into elements of an array is the wrong idea altogether, or perhaps there is an easier way. Before I spend a couple of hours working out the logic and writing the code for another possibly poor solution, I thought I should see if someone could enlighten me here.
Thanks so much!
Unless I'm missing something important, I think you've overcomplicated things.
First, you split the string by spaces.
string = "whatever string typed in by user"
split_string = string.split
Then you map the split-string-array according to your requirements and join the results.
# create show_hashtag(str) and show_link(str) helpers
split_string.map do |str|
if str.starts_with?('#')
show_hashtag(str)
elsif url_regexp.match(str) # you must define url_regexp
show_link(str)
else
str
end
end.join(' ')
You won't have to worry about positions of the text, tags, or links because map will take care of it for you.
Wrap all of that in a helper and in your view you could do the following:
<%= your_helper(string_typed_in_by_user).html_safe %>
Watch out for the user typing in HTML though!
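The split-and-map approach above can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions: URL_REGEXP, the /tags/... route, and the bodies of show_hashtag and show_link are hypothetical stand-ins (in Rails you would build the anchors with link_to), and every token that is not turned into a link is HTML-escaped to address the warning about user-typed HTML.

```ruby
require 'cgi'

# Hypothetical pattern: accepts tokens such as "www.google.com" or "http://x.org/path".
URL_REGEXP = %r{\A(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?\z}i

# Hypothetical helper: links a "#tag" token to an assumed /tags/<name> route.
def show_hashtag(token)
  tag = CGI.escapeHTML(token.delete_prefix('#'))
  %(<a class="post_hashtag" href="/tags/#{tag}">##{tag}</a>)
end

# Hypothetical helper: links a URL token, adding a scheme when it is missing.
def show_link(token)
  href = token.match?(%r{\Ahttps?://}i) ? token : "http://#{token}"
  %(<a href="#{CGI.escapeHTML(href)}" target="_blank">#{CGI.escapeHTML(token)}</a>)
end

# Splits on whitespace, converts each token, and joins the pieces again.
def linkify(text)
  text.split.map do |token|
    if token.start_with?('#') && token.length > 1
      show_hashtag(token)
    elsif URL_REGEXP.match?(token)
      show_link(token)
    else
      CGI.escapeHTML(token)
    end
  end.join(' ')
end

puts linkify('see #apple at www.google.com <b>')
```

Note that splitting on whitespace normalizes runs of spaces and newlines to single spaces, and a token with trailing punctuation (for example "www.google.com,") will not match this simple pattern.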

Ruby, limit the number of files returned with Find module per request

I have a controller that calls a find_photos method, passing it a query string (name of file)
class BrandingPhoto < ActiveRecord::Base
def self.find_photos(query)
require "find"
found_photos = []
Find.find("u_photos/photo_browse/photos/") do |img_path|
# break off just the filename from full path
img = img_path.split('/').last
if query.blank? || query.empty?
# if query is blank, they submitted the form with browse all- return all photos
found_photos << img
else
# otherwise see if the file includes their query and return it
found_photos << img if img.include?(query)
end
end
found_photos.empty? ? "no results found" : found_photos
end
end
This is just searching a directory full of photos- there is no table backing this.
Ideally what I would like is to be able to limit the number of results returned by find_photos to around 10-15, then fetch the next 10-15 results as needed.
I was thinking that the code to do this might involve looping through 10 times and grabbing those files- store the last filename in a variable or as a parameter, and then send that variable back to the method, telling it to continue the search from that filename.
This assumes that the files are looped through in the same order every time, and that there is no simpler way to accomplish this.
If there are any suggestions, I'd love to hear them/see some examples of how you'd accomplish this.
Thank you.
The first thing that comes to mind for this problem is to cut the array down after you come out of the loop. This wouldn't work well with a ton of files, though. A different solution might be to break out of the loop once the array is large enough, e.g. break if found_photos.length > 10 inside the loop.
It's not too hard to do what you want, but you need to consider how you'll handle entries that are added or removed in-between page loads, filenames with UTF-8 or Unicode characters, and embedded/parent directories.
This is old-school code for the basis for what you're talking about:
require 'erb'
require 'sinatra'
get '/list_photos' do
dir = params[ :dir ]
offset = params[ :offset ].to_i
num = params[ :num ].to_i
files = Dir.entries(dir).reject{ |fn| fn[/^\./] || File.directory?(File.join(dir, fn)) }
total_files = files.size
prev_a = next_a = ''
if (offset > 0)
prev_a = "<a href='/list_photos?dir=#{ dir }&num=#{ num }&offset=#{ [ 0, offset - num ].max }'><< Previous</a>"
end
if (offset + num < total_files)
next_a = "<a href='/list_photos?dir=#{ dir }&num=#{ num }&offset=#{ [ total_files, offset + num ].min }'>Next >></a>"
end
files_to_display = files[offset, num]
template = ERB.new <<EOF
<html>
<head></head>
<body>
<table>
<% files_to_display.each do |f| %>
<tr><td><%= f %></td></tr>
<% end %>
</table>
<%= prev_a %> | <%= total_files %> files | <%= next_a %>
</body>
</html>
EOF
content_type 'text/html'
template.result(binding)
end
It's a little Sinatra server, so save it as test.rb and run from the command-line using:
ruby test.rb
In a browser connect to the running Sinatra server using a URL like:
http://hostname:4567/list_photos?dir=/path/to/image/files&num=10&offset=0
I'm using Sinatra for convenience, but the guts of the routine is the basis for what you want. How to convert it into Rails terms is left as an exercise for the reader.
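For the original find_photos use case, the same offset/limit idea can be sketched without any web framework. The method name find_photos_page and its keyword arguments are hypothetical; sorting the filenames is an assumption made here so that pages come back in a stable order between requests.

```ruby
# Hypothetical helper: returns one page of matching filenames plus the total
# number of matches, so the caller can decide whether to show a "next" link.
def find_photos_page(dir, query: nil, offset: 0, limit: 10)
  names = Dir.children(dir)
             .select { |name| File.file?(File.join(dir, name)) }
             .sort
  # An empty or missing query means "browse all"
  names = names.select { |name| name.include?(query) } unless query.nil? || query.empty?
  { total: names.size, photos: names[offset, limit] || [] }
end
```

A caller would request the next page with offset + limit for as long as that value is smaller than the returned total. The whole directory is still listed on every call, so for very large directories a cached listing may be worth considering.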

Using lists in prawn

I'm using Prawn to create PDFs that contain a lot of data in table format, plus some lists. The problem with the lists is that I'm just using text as lists, because there is no semantic equivalent to the ul > li lists I use in the web frontend. So the lists aren't aligned: a list item that spans more than one line looks bad because the wrapped text doesn't line up with the list icon. How can I implement lists in Prawn that don't look like crap?
Prawn is a good PDF library, but the problem is that it has its own view system. There is Prawn-format, but it is not maintained anymore.
I suggest using WickedPDF, which allows you to include simple ERB code in your PDF.
Using Prawn: another dirty and ugly solution is a two-column table without borders, where the first column contains the list bullet and the second column the text:
table([ ["•", "First Element"],
["•", "Second Element"],
["•", "Third Element"] ])
I just had a similar problem and solved it within Prawn in a slightly different way than using a table:
["Item 1", "Item 2", "Item 3"].each do |list_item|
  # create a bounding box for the list-item label
  # float it so that the cursor doesn't move down
  float do
    bounding_box [15, cursor], :width => 10 do
      text "•"
    end
  end
  # create a bounding box for the list-item content
  bounding_box [25, cursor], :width => 600 do
    text list_item
  end
  # provide a space between list-items
  move_down(5)
end
This could obviously be extended (for example, you could do numbered lists with an each_with_index() rather than each()). It also allows for arbitrary content in the bounding box (which isn't allowed in tables).
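As a rough sketch of the numbered-list extension mentioned above, the labels can be computed in plain Ruby before anything is drawn. The helper name list_rows and its ordered: flag are hypothetical; each returned [label, text] pair could then be drawn with the two bounding boxes shown above, or used as one row of the two-column table from the earlier answer.

```ruby
# Hypothetical helper: pairs each item with its label, either a bullet
# or a running number, using each_with_index.
def list_rows(items, ordered: false)
  items.each_with_index.map do |item, index|
    label = ordered ? "#{index + 1}." : "\u2022"
    [label, item]
  end
end

list_rows(['First Element', 'Second Element'], ordered: true).each do |label, text|
  puts "#{label} #{text}"
end
```

Keeping the label computation separate from the drawing code means the same data works for both the table and the bounding-box layouts.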
An excellent solution that respects the cursor position as well as render like a true list with a small number of lines of code is:
items = ["first","second","third"]
def bullet_list(items)
start_new_page if cursor < 50
items.each do |item|
text_box "•", at: [13, cursor]
indent(30) do
text item
end
end
end
The start_new_page clause covers scenarios where the bullet line item may need to go onto the next page. This maintains keeping the bullet with the bullet content.
To create a bullet with Adobe's built in font, use \u2022.
\u2022 This will be the first bullet item
\u2022 blah blah blah
Prawn supports symbols (aka glyphs) with WinAnsi codes and these must be encoded as UTF-8. See this post for more details: https://groups.google.com/forum/#!topic/prawn-ruby/axynpwaqK1g
The Prawn manual has a complete list of the glyphs that are supported.
Just did this for a customer. For everybody who wants to render preformatted html containing ul / ol lists:
def render_html_text(text, pdf)
#render text (indented if inside ul)
indent = 0 #current indentation (absolute, e.g. n*indent_delta for level n)
indent_delta = 10 #indentation step per list level
states = [] #whether we have an ol or ul at level n
indices = [] #remembers at which index the ol list at level n, currently is
#while there is another list tag do
# => starting position of list tag is at i
# render everything that comes before the tag
# cut everything we have rendered from the whole text
#end
while (i = text.index /<\/?[ou]l>/) != nil do
part = text[0, i] # text[0..i-1] would return the whole string when i == 0
if indent == 0 #we're not in a list, but at the top level
pdf.text part, :inline_format => true
else
pdf.indent indent do
#render all the lis
part.gsub(/<\/li>/, '').split('<li>').each do |item|
next if item.blank? #split may return some ugly start and end blanks
item_text = if states.last == :ul
"• #{item}"
else # :ol
indices[indices.length-1] = indices.last + 1
"#{indices.last}. #{item}"
end
pdf.text item_text, :inline_format => true
end
end
end
is_closing = text[i+1] == '/' #closing tag?
if is_closing
indent -= indent_delta
i += '</ul>'.length
states.pop
indices.pop
else
pdf.move_down 10 if indent == 0
type_identifier = text[i+1] #<_u_l> or <_o_l>
states << if type_identifier == 'u'
:ul
elsif type_identifier == 'o'
:ol
else
raise "what means type identifier '#{type_identifier}'?"
end
indices << 0
indent += indent_delta
i += '<ul>'.length
end
text = text[i..text.length-1] #cut the text we just rendered
end
#render the last part
pdf.text text, :inline_format => true unless text.blank?
end
One workaround is to create a method similar to crm's answer. The difference is that it won't break when the text runs onto another page, and you can have multiple levels as well.
def bullet_item(level = 1, string)
indent (15 * level), 0 do
text "• " + string
end
end
Simply call this method like so:
bullet_item(1, "Text for bullet point 1")
bullet_item(2, "Sub point")
Feel free to refactor.
I think a better approach is to pre-process the HTML string using Nokogiri, leaving only the basic tags that Prawn can handle with the "inline_format" option, as in this code:
def self.render_html_text(instr)
# Replacing <p> tag
outstr = instr.gsub('<p>',"\n")
outstr.gsub!('</p>',"\n")
# Replacing <ul> & <li> tags
doc = Nokogiri::HTML(outstr)
doc.search('//ul').each do |ul|
content = Nokogiri::HTML(ul.inner_html).xpath('//li').map{|n| "• #{n.inner_html}\n"}.join
ul.replace(content)
end
#removing some <html><body> tags inserted by Nokogiri
doc.at_xpath('//body').inner_html
end

Rails - Building a helper that breaks long words/URLs with <WBR>

My app sends out emails. If there is a very long word or a long URL, it breaks the email viewing experience by not letting the iPhone zoom and tighten in.
Here's what I've come up with so far but it's not working, thoughts?
Helper
def html_format(string, max_width=12)
text = string.gsub("\n", '<br />').html_safe.strip
(text.length < max_width) ?
text :
text.scan(/.{1,#{max_width}}/).join("<wbr>")
return text
end
View
<%= html_format(@comment.content) %>
Here's a method I found online that seems to work well for splitting long strings with <wbr>:
def split_str(str, len = 10)
fragment = /.{#{len}}/
str.split(/(\s+)/).map! { |word|
(/\s/ === word) ? word : word.gsub(fragment, '\0<wbr />')
}.join
end
This post shows how to wrap long words with regular expressions.
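One likely reason the helper in the question has no effect is that the result of its ternary expression is never assigned, so the method returns the unmodified text. Below is a plain-Ruby sketch of a variant that returns the broken-up string. The name break_long_words is hypothetical, and escaping each chunk with CGI.escapeHTML is an assumption added here so that the result could reasonably be marked html_safe in a Rails view.

```ruby
require 'cgi'

# Hypothetical helper: inserts <wbr> every max_width characters inside each
# word, leaving whitespace runs untouched and escaping everything else.
def break_long_words(text, max_width = 12)
  text.split(/(\s+)/).map do |piece|
    next piece if piece.match?(/\A\s+\z/)

    piece.scan(/.{1,#{max_width}}/)
         .map { |chunk| CGI.escapeHTML(chunk) }
         .join('<wbr>')
  end.join
end

puts break_long_words('hi abcdefghijklmnop', 5)
```

Chunks are escaped after they are cut, so an entity such as &amp; is never split in the middle.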

Don't escape html in ruby on rails

Rails 3 seems to escape everything, including HTML. I have tried using raw(), but it still escapes the HTML. Is there a workaround? This is the helper I am using (/helpers/application_helper.rb):
module ApplicationHelper
def good_time(status = true)
res = ""
if status == true
res << "Status is true, with a long message attached..."
else
res << "Status is false, with another long message"
end
end
end
I am calling the helper in my view using this code:
<%= raw(good_time(true)) %>
You can use .html_safe like this:
def good_time(status = true)
if status
"Status is true, with a long message attached...".html_safe
else
"Status is false, with another long message".html_safe
end
end
<%= good_time(true) %>
I ran into this same thing and discovered a safer solution than using html_safe, especially once you introduce strings which are dynamic.
First, the updated code:
def good_time(long_message1, long_message2, status = true)
html = "".html_safe
html << "Status is #{status}, "
if status
html << long_message1
else
html << long_message2
end
html
end
<%= good_time(long_message1, long_message2, true) %>
This escapes long_message content if it is unsafe, but leaves it unescaped if it is safe.
This allows "long message for success & such." to display properly, but also escapes "malicious message <script>alert('foo')</script>".
The explanation boils down to this -- 'foo'.html_safe returns an ActiveSupport::SafeBuffer which acts like a String in every way except one: When you append a String to a SafeBuffer (by calling + or <<), that other String is HTML-escaped before it is appended to the SafeBuffer. When you append another SafeBuffer to a SafeBuffer, no escaping will occur. Rails is rendering all of your views under the hood using SafeBuffers, so the updated method above ends up providing Rails with a SafeBuffer that we've controlled to perform escaping on the long_message "as-needed" rather than "always".
Now, the credit for this answer goes entirely to Henning Koch, and is explained in far more detail at Everything you know about html_safe is wrong -- my recap above attempts only to provide the essence of the explanation in the event that this link ever dies.
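The append behaviour described above can be imitated with a small toy class. To be clear about the assumptions: ToySafeBuffer is a hypothetical stand-in written for illustration and is not ActiveSupport::SafeBuffer; it only models the single rule that plain strings are escaped on append while already-safe buffers are appended as-is.

```ruby
require 'cgi'

# Toy model only: escapes plain Strings on <<, trusts other ToySafeBuffers.
class ToySafeBuffer < String
  def <<(other)
    super(other.is_a?(ToySafeBuffer) ? other : CGI.escapeHTML(other.to_s))
  end
end

# Same shape as the helper above, reduced to one message for brevity.
def good_time(message, status = true)
  html = ToySafeBuffer.new('')
  html << "Status is #{status}, "
  html << message
  html
end

puts good_time("<script>alert('foo')</script>")
puts good_time(ToySafeBuffer.new('<b>all good</b>'))
```

The first call prints the script tag in escaped form, while the second keeps the bold markup intact, which mirrors the "as-needed" escaping the answer describes.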

Resources