Truncate Markdown? - ruby-on-rails

I have a Rails site where the content is written in Markdown. I wish to display a snippet of each post with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example:
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet"; if there is no marker it would take the first 250 words and append "...". For example:
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of as an EOF marker (one that can be ignored when displaying the full document).
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference):
Add a "truncate" option to the maruku parser, which will only parse the first x words, or up to the "excerpt" marker.
Write/find a parser-agnostic Markdown truncator.
Write/find an intelligent HTML truncating function.

Write/find an intelligent HTML truncating function
The following, from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications, will correctly truncate HTML and easily allows appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, at_end = "...")
=> <p><b>Someth...</b></p>
The modified code:
require 'rexml/parsers/pullparser'

class String
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        results << p_e[0][0..new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    results << at_end if at_end
    # Close any tags that were still open when the length ran out
    tags.reverse_each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end

Here's a solution that works for me with Textile:
1. Convert it to HTML.
2. Truncate it.
3. Remove any HTML tags that got cut in half with
html_string.gsub(/<[^>]*$/, "")
4. Then use Hpricot to clean it up and close unclosed tags:
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue. Put together, it looks something like the sketch below.
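A minimal sketch of that helper (assuming the RedCloth gem for the Textile conversion; the textile_excerpt name and the 250-character limit are illustrative):

require 'redcloth'
require 'hpricot'

def textile_excerpt(text, length = 250)
  html_string = RedCloth.new(text).to_html  # 1. convert to HTML
  html_string = html_string[0...length]     # 2. truncate
  html_string.gsub!(/<[^>]*$/, "")          # 3. remove any tag cut in half
  Hpricot(html_string).to_s                 # 4. Hpricot closes unclosed tags
end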

You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
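To also cover the fallback described in the question (first 250 words plus "..." when no marker is present), something along these lines would do it (a sketch; the excerpt name and word limit come from the question's description, not existing code):

def excerpt(markdown_string, word_limit = 250)
  if idx = (markdown_string =~ /^\^+$/)
    # Everything before the line of "^" characters
    markdown_string[0...idx]
  else
    words = markdown_string.split
    snippet = words.first(word_limit).join(' ')
    snippet << '...' if words.length > word_limit
    snippet
  end
end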

Rather than trying to truncate the text, why not have two input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown where, without having to rely on some sort of funky EOF marker.

I have to agree with the "two inputs" approach, and the content writer need not worry, since you can modify the background logic to combine the two inputs into one when showing the full content:
full_content = input1 + input2 # perhaps with some complementary HTML, for better formatting

Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use the strip_tags method if you are truncating Markdown-rendered content:
truncate(strip_tags(markdown(article.contents)), length: 50)
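A quick check against the example from the question (this assumes a markdown helper that renders Markdown to HTML; strip_tags and truncate are standard Rails helpers):

html = markdown("This is an [example](http://example.com)")
# => "<p>This is an <a href=\"http://example.com\">example</a></p>"
truncate(strip_tags(html), length: 15)
# => "This is an e..."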
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/

A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)

Related

Multiple `gsub` in multiple `each` loops getting overridden one by another

Trying to iterate over some phrases, and whenever I find a word, I need to replace it with a link.
phrases = ["hello world", "worldwide"]
words_to_link = ["world", "world"]
I am trying to get:
"hello <a href='world'>world</a><br />worldwide"
My code is:
phrases.each do |ph|
  words_to_link.each do |w|
    ph.gsub!(w, "<a href='#{w}'>#{w}</a>")
  end
end.join("<br />").html_safe
The output of this is:
"hello <a href='<a href='world'>world</a>'><a href='world'>world</a></a><br /><a href='<a href='world'>world</a>'><a href='world'>world</a></a>wide"
On the first run it finds all occurrences of world, but on the second, it goes inside the generated world and gsubs again.
Another problem is the proper regex to only match whole words by their boundaries; I thought it would be /\bword\b/, but that didn't work.
Any pointers?
I'm a little confused by your question, so may have got the wrong end of the stick here. However, here is an answer by my interpretation:
phrases = ["hello world", "worldwide"]
substitutions = { /\bworld\b/ => "world" }

phrases.each do |ph|
  substitutions.each do |pattern, replacement|
    ph.gsub!(pattern, "<a href='#{replacement}'>#{replacement}</a>")
  end
end

phrases.join("<br />").html_safe
You can use \b in a regex to mark a word boundary, to avoid altering the "worldwide" string. And (I think this is what you wanted?) you can define a mapping between the search and replace terms rather than looping through twice, to avoid the double replacement.
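With those inputs this produces exactly the output the question asked for:

phrases.join("<br />")
# => "hello <a href='world'>world</a><br />worldwide"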

Add space between elements using sanitize and truncate in Rails

Working with HTML content and trying to generate an excerpt for every post, I'm using the following code to sanitize and then truncate, but the issue is that the generated text has no spaces in it.
= sanitize(post.body, tags: []).truncate(155, separator: ' ...').html_safe
In the editor every word is a single p element, and there may be other HTML elements like images, video, etc. I only want to show the text. In the view, I want to add spaces between these words (the p elements).
How can I sanitize and truncate the content while keeping the formatting?
The problem is you don't have spaces around the tags, i.e. the text is something like:
include ActionView::Helpers::SanitizeHelper

content = "<p>this</p><p>is</p><p>great</p><p>and</p><p>for</p><p>some</p><p>reason</p>"
strip_tags(content)
# => "thisisgreatandforsomereason"

content = "<p>this </p><p>is </p><p>great </p><p>and </p><p>for </p><p>some </p><p>reason </p>"
strip_tags(content)
# => "this is great and for some reason"
Either add spaces around the <p> tags or use a gsub to create the spaces, for example:
content = "<p>this</p><p>is</p><p>great</p><p>and</p><p>for</p><p>some</p><p>reason</p>"
strip_tags(content.gsub('</p>', ' </p>'))  # note the space in the replacement
# => "this is great and for some reason"

Gemoji breaks Kramdown's HTML

Why does Kramdown's autolinking parser break when running it over a gemojified text field?
For [Test](http://google.com "Test") I'm getting a link whose href is the entire http://google.com "Test" string, instead of the expected output: a link to http://google.com with the title "Test".
Live app: http://runnable.com/VAL1VuMjrGFur2yx/forem-gemoji-kramdown (see the Test post)
application_helper.rb:
def add_emojify_and_kramdown(text)
  raw(Kramdown::Document.new(emojify(text)).to_html)
end
[...snip...]
def emojify(text)
  h(text).to_str.gsub(/:([a-z0-9\+\-_]+):/) do |match|
    if emoji = Emoji.find_by_alias($1)
      '![' + $1 + '](' + asset_path("emoji/#{emoji.image_filename}") + ')'
    else
      match
    end
  end
end
Some additional info:
raw(Kramdown::Document.new(text).to_html) returns the expected output, but without Gemoji
raw(emojify(text)) doesn't change anything seeing as how text contains no emojis
raw(emojify(Kramdown::Document.new(text).to_html)) returns the expected output, but as raw HTML
The first thing your emojify method does is h(text), which HTML-escapes the input, converting
[Test](http://google.com "Test")
into
[Test](http://google.com &quot;Test&quot;)
Kramdown then operates on this string, and since it no longer contains quote marks it assumes the whole contents of (...) is the URL, producing a link whose href contains the escaped title text rather than a separate title attribute.
To get it to work you just need to drop the call to h: text.gsub(.... You’ll likely need to think about how to manage your string safety if this is external data.
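For reference, here is the emojify method with the call to h dropped as suggested (only the first line changes; escaping or sanitizing untrusted input then becomes a separate concern):

def emojify(text)
  text.gsub(/:([a-z0-9\+\-_]+):/) do |match|
    if emoji = Emoji.find_by_alias($1)
      '![' + $1 + '](' + asset_path("emoji/#{emoji.image_filename}") + ')'
    else
      match
    end
  end
end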

Performance implications of using :coffescript filter inside HAML templates?

So HAML 4 includes a coffeescript filter, which allows us coffee-loving rails people to do neat things like this:
- word = "Awesome."
:coffeescript
  $ ->
    alert "No semicolons! #{word}"
My question: For the end user, is this slower than using the equivalent :javascript filter? Does using the coffeescript filter mean the coffeescript will be compiled to javascript on every page load (which would obviously be a performance disaster), or does this only happen once when the application is started?
It depends.
When Haml compiles a filter it checks to see if the filter text contains any interpolation (#{...}). If there isn’t any then it will be the same text to transform on each request, so the conversion is done once at compile time and the result included in the template.
If there is interpolation in the filter text, then the actual text to transform will vary on each request, so the Coffeescript will need to be compiled each time.
Here’s an example. First with no interpolation:
:coffeescript
  $ ->
    alert "No semicolons! Awesome"
This generates the code (use haml -d to see the generated Ruby code):
_hamlout.buffer << "<script>\n (function() {\n $(function() {\n return alert(\"No semicolons! Awesome\");\n });\n \n }).call(this);\n</script>\n";
This code simply adds a string to the buffer, so no Coffeescript is being recompiled.
Now with interpolation:
- word = "Awesome."
:coffeescript
  $ ->
    alert "No semicolons! #{word}"
This generates:
word = "Awesome."
_hamlout.buffer << "#{
find_and_preserve(Haml::Filters::Coffee.render_with_options(
"$ ->
alert \"No semicolons! #{word}\"\n", _hamlout.options))
}\n";
Here, since Haml needs to wait to see what the value of the interpolation is, the Coffeescript is recompiled each time.
You can avoid compiling the Coffeescript on each request by not having any interpolation inside your :coffeescript filters.
The :javascript filter behaves similarly, checking to see if there is any interpolation, but since the :javascript filter only outputs some text to the buffer when it runs, there is much less of a performance hit from using it. You could possibly combine :javascript and :coffeescript filters, putting interpolated data in :javascript and keeping your :coffeescript static:
- word = "Awesome"
:javascript
  var message = "No semicolons! #{word}";
:coffeescript
  alert message
matt's answer is clear on what is going on. I made a helper to add locals to :coffeescript filters from a hash, so you don't need to use global JavaScript variables. As a side note: on Linux the slowdown is really negligible; on Windows, however, the impact on performance is quite significant (easily more than 100ms per block to compile).
module HamlHelper
  def coffee_with_locals(locals = {}, &block)
    block_content = capture_haml do
      block.call
    end
    return block_content if locals.blank?

    javascript_locals = "\nvar "
    javascript_locals << locals.map{ |key, value| j(key.to_s) + ' = ' + value.to_json.gsub('</', '<\/') }.join(",\n ")
    javascript_locals << ";\n"

    content_node = Nokogiri::HTML::DocumentFragment.parse(block_content)
    content_node.search('script').each do |script_tag|
      # This will match the '(function() {' at the start of coffeescript's compiled code
      split_coffee = script_tag.content.partition(/\(\s*function\s*\(\s*\)\s*\{/)
      script_tag.content = split_coffee[0] + split_coffee[1] + javascript_locals + split_coffee[2]
    end
    content_node.to_s.html_safe
  end
end
It allows you to do the following:
= coffee_with_locals "test" => "hello ", :something => ["monde", "mundo", "world"], :signs => {:interogation => "?", :exclamation => "!"} do
  :coffeescript
    alert(test + something[2] + signs['exclamation'])
Since there is no interpolation, the code is compiled once, as normal.

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do something like this without writing my own parser to find each URL (anything from 'http:', 'https:', or 'ftp:' up to the first space after it) and wrap it in link tags?
Thank You!
Ruby 1.87, Rails 2.3.5
Make a helper:

def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  html_text = text.gsub(urls, '<a href="\0">\0</a>')
  html_text
end
On the view just call this function and you will get the expected output, like:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use regular expressions. If you have never heard of them, the Wikipedia entry on regular expressions should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you can use the array returned above to substitute the text links with hyperlinks:
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
  content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
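Putting the two halves together with URI.extract gives a compact variant of the same helper (a sketch; calling uniq first avoids replacing the same URL twice):

require 'uri'

def make_urls(text)
  URI.extract(text, ['http', 'https', 'ftp']).uniq.each do |url|
    text = text.gsub(url, "<a href='#{url}'>#{url}</a>")
  end
  text
end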
I hope this helps you.
