Add space between elements using sanitize and truncate in Rails - ruby-on-rails

Working on HTML content and trying to generate an excerpt for every post, I'm using the following code to sanitize and then truncate, but the issue is that the generated text has no spaces between the words.
= sanitize(post.body, tags: []).truncate(155, separator: ' ...').html_safe
In the editor, every word is a single p element. The content may contain any HTML element, such as images or video, but I only want to show the text.
In the view, I want to add spaces between these words (p elements).
How can I sanitize and truncate the content while keeping the formatting?

The problem is that you don't have spaces around the tags.
The text is something like:
include ActionView::Helpers::SanitizeHelper
> content = "<p>this</p><p>is</p><p>great</p><p>and</p><p>for</p><p>some</p><p>reason</p>"
> strip_tags(content)
> "thisisgreatandforsomereason"
> content = "<p>this </p><p>is </p><p>great </p><p>and </p><p>for </p><p>some </p><p>reason </p>"
> strip_tags(content)
> "this is great and for some reason"
Either add spaces around the <p> tags or use a regex to insert them, for example:
> content = "<p>this</p><p>is</p><p>great</p><p>and</p><p>for</p><p>some</p><p>reason</p>"
> strip_tags(content.gsub('</p>', ' </p>')) # Note the space in replaced content
> "this is great and for some reason"
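A rough plain-Ruby sketch of the same idea, for cases where the Rails helpers are not loaded. The helper name plain_excerpt and the BLOCK_CLOSE pattern are made up for illustration, and the regex tag stripping is only a crude stand-in for strip_tags, not a real sanitizer:

```ruby
# Closing tags of elements that should count as word boundaries (an assumed list).
BLOCK_CLOSE = %r{</(?:p|div|li|h[1-6])>}i

# Hypothetical helper: pad block-level closing tags with a space, drop all tags,
# collapse whitespace, then cut the text to at most `length` characters.
def plain_excerpt(html, length: 155, omission: ' ...')
  spaced = html.gsub(BLOCK_CLOSE) { |tag| " #{tag}" }
  text = spaced.gsub(/<[^>]*>/, '').split.join(' ')
  return text if text.length <= length

  text[0, length - omission.length].rstrip + omission
end

content = "<p>this</p><p>is</p><p>great</p><p>and</p><p>for</p><p>some</p><p>reason</p>"
puts plain_excerpt(content)
puts plain_excerpt(content, length: 16)
```

In a Rails view the same padding step could be applied before sanitize or strip_tags, as in the gsub example above.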

Related

Nokogiri: Get text which is not inside the <a> tag

Take a look at this example:
<li><a href="...">This is a website</a>, it belongs to John Sullivan</li>
I can get the content of the <li> tag by using:
nodeset = doc.css('li')
I also can get the text inside the <a> tag by using:
nodeset.each do |element|
  ahref = element.css('a') # <-- This is a website
  name = ahref.text.strip # <-- This is a website
end
But how do I get the rest of the text within the <li> tag but without the text from the <a> tag?
From this example, I would like to get
", it belongs to John Sullivan"
How can I do this?
This is straightforward using XPath and the text() node test. If you have extracted the lis into nodeset, you can get the text with:
nodeset.xpath('./text()')
Or you can get it directly from the whole doc:
doc.xpath('//li/text()')
This uses the text() node test as part of the XPath expression, not the text Ruby method. It extracts any text nodes that are direct children of the li node, so it doesn't include the contents of the a element.
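To see the text() node test in isolation, here is a small sketch that swaps Nokogiri for REXML (bundled with Ruby) so it runs without extra gems. The sample markup, including the href placeholder, is assumed for illustration; the XPath expressions are the same ones you would pass to Nokogiri's xpath method:

```ruby
require 'rexml/document'

# Sample markup; the href value is a placeholder, not taken from the question.
html = '<ul><li><a href="#">This is a website</a>, it belongs to John Sullivan</li></ul>'
doc = REXML::Document.new(html)

# text() selects only the text nodes that are direct children of each li,
# so the text inside the nested a element is skipped.
own_text = REXML::XPath.match(doc, '//li/text()').map(&:value)

# For comparison: the text nodes inside the link itself.
link_text = REXML::XPath.match(doc, '//li/a/text()').map(&:value)

puts own_text.inspect
puts link_text.inspect
```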
I found a cheap way to get the rest of the text:
ahref = element.css('a')
name = ahref.text.strip
suppl = element.text.strip.gsub(name, '')

How to show String new lines on gsp grails file?

I've stored a string in the database. When I save and retrieve the string, the result I'm getting is as follows:
This is my new object
Testing multiple lines
-- Test 1
-- Test 2
-- Test 3
That is what I get from a println command when I call the save and index methods.
But when I show it on screen, it's displayed like this:
This is my object Testing multiple lines -- Test 1 -- Test 2 -- Test 3
Already tried to show it like the following:
${adviceInstance.advice?.encodeAsHTML()}
But still the same thing.
Do I need to replace \n with <br> or something like that? Is there any easier way to show it properly?
Common problems have a variety of solutions.
1> Replace \n with <br>,
either in your controller/service or, if you like, in the gsp:
${adviceInstance.advice?.replace('\n','<br>')}
2> display the content in a read-only textarea
<g:textArea name="something" readonly="true">
${adviceInstance.advice}
</g:textArea>
3> Use the <pre> tag
<pre>
${adviceInstance.advice}
</pre>
4> Use css white-space http://www.w3schools.com/cssref/pr_text_white-space.asp:
<div class="space">
</div>
//css code:
.space {
    white-space: pre;
}
Also note that if you have a strict constraint on how such fields are stored (such as a maxSize), the value submitted via a form may contain additional hidden characters. I didn't delve into what they actually were; they may have been carriage returns (\r), as explained in the comments below. A good rule is to define a setter that trims the value each time it is received, i.e.:
class Advice {
    String advice
    static constraints = {
        advice(nullable: false, minSize: 1, maxSize: 255)
    }
    /*
     * In this scenario with a maxSize value, ensure you
     * set your own setter to trim any hidden \r
     * that may be posted back as part of the form request
     * by the end user. Trust me, I got to know the hard way.
     */
    void setAdvice(String adv) {
        // Null-safe call so a missing value does not throw
        advice = adv?.trim()
    }
}
${raw(adviceInstance.advice?.encodeAsHTML().replace("\n", "<br>"))}
This is how I solved the problem.
Firstly make sure the string contains \n to denote line break.
For example :
String test = "This is first line. \n This is second line";
Then in gsp page use:
${raw(test?.replace("\n", "<br>"))}
The output will be as:
This is first line.
This is second line.

Regular expression to determine each and every attribute of an anchor tag inside HTML content

I basically wanted the values of each and every attribute. The attributes may be optional and the href may contain HTTP or HTTPS.
A sample anchor tag inside content is:
<a class=\"direct_link\" rel=\"nofollow\" target=\"_blank\" href=\"http://google.com\">link text</a>
Sample HTML content is:
<p><br></p><h1>A beautiful <a class=\"f-link\" rel=\"nofollow\" target=\"_blank\" href=\"fake.com/abc.html\">jQuery</a>; a</h1><h3 class=\"text-light\">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's.</h3><p><br></p><p><br></p>
Don't use a regular expression to try to parse HTML. HTML can be expressed too many ways and still be valid, yet it will break your pattern and code.
The correct way to get the values for the parameters is to use a parser. Nokogiri is the de facto XML/HTML parser for Ruby:
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse(' <a class=\"direct_link\" rel=\"nofollow\" target=\"_blank\" href=\"http://google.com\">link text</a>')
That parses the document into a DOM and returns it.
link = doc.at('a')
at finds the first instance using the CSS 'a' selector. (If you want to iterate over them all you can use search, which returns a NodeSet, which is akin to an Array.)
At this point link is a Node, which we can consider to be like a pointer to the <a> tag.
link.to_h # => {"class"=>"\\\"direct_link\\\"", "rel"=>"\\\"nofollow\\\"", "target"=>"\\\"_blank\\\"", "href"=>"\\\"http://google.com\\\""}
That is the link's parameters and their values turned into a hash. Or, you can directly access the parameters, using keys, or their values:
link.values # => ["\\\"direct_link\\\"", "\\\"nofollow\\\"", "\\\"_blank\\\"", "\\\"http://google.com\\\""]
link.keys # => ["class", "rel", "target", "href"]
Or treat it like a hash and iterate over the key/value pairs:
link.each do |k, v|
  puts 'parameter: "%s" value: "%s"' % [k, v]
end
# >> parameter: "class" value: "\"direct_link\""
# >> parameter: "rel" value: "\"nofollow\""
# >> parameter: "target" value: "\"_blank\""
# >> parameter: "href" value: "\"http://google.com\""
The advantage of using a parser is that the HTML format can change and the parser is still able to figure it out, so your code won't care. The following format works just as well as the tag used above:
doc = Nokogiri::HTML::DocumentFragment.parse(' <a
class=\"direct_link\"
rel=\"nofollow\" target=\"_blank\"
href=\"http://google.com\">
link text
</a>')
Try doing that with a pattern.
Well, if you just want the stuff in the quotes, it would be this:
"([\w:\/.]+)\\"
Otherwise if you want the name before the quotes it would be this:
(\w+=\\"[\w:\/.]+\\")
This one matches tags without backslashes:
(\w+="[\w:\/.-]+")
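As a quick check of what the last pattern captures, here is a minimal Ruby sketch. It splits the pattern into two capture groups (name and value), which is an adaptation for illustration rather than the pattern exactly as posted, and it runs against the unescaped form of the sample tag:

```ruby
# Last pattern from the answer above, with name and value captured separately.
ATTRIBUTE = /(\w+)="([\w:\/.-]+)"/

tag = '<a class="direct_link" rel="nofollow" target="_blank" href="http://google.com">link text</a>'

# scan returns one [name, value] pair per match; to_h turns the pairs into a hash.
attributes = tag.scan(ATTRIBUTE).to_h

attributes.each { |name, value| puts "#{name}: #{value}" }
```

Note that the character class only allows word characters and `:`, `/`, `.`, `-`, so a value containing spaces or a query string (`?`, `=`, `&`) would be silently skipped, which is one reason the parser-based answer is more robust.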

Generate a link_to on the fly if a URL is found inside the contents of a db text field?

I have an automated report tool (corp intranet) where the admins have a few text area boxes to enter some text for different parts of the email body.
What I'd like to do is parse the contents of the text area and wrap any hyperlinks found with link tags (so when the report goes out there are links instead of text urls).
Is there a simple way to do something like this without figuring out a way of parsing the text to add link tags around a found URL (from ['http:', 'https:', 'ftp:'] to the first space after)?
Thank You!
Ruby 1.8.7, Rails 2.3.5
Make a helper:
def make_urls(text)
  urls = %r{(?:https?|ftp|mailto)://\S+}i
  # \0 is the whole match, so each URL is wrapped in a link to itself
  text.gsub(urls, '<a href="\0">\0</a>')
end
In the view, just call this function and you will get the expected output, like:
irb(main):001:0> string = 'here is a link: http://google.com'
=> "here is a link: http://google.com"
irb(main):002:0> urls = %r{(?:https?|ftp|mailto)://\S+}i
=> /(?:https?|ftp|mailto):\/\/\S+/i
irb(main):003:0> html = string.gsub urls, '<a href="\0">\0</a>'
=> "here is a link: <a href=\"http://google.com\">http://google.com</a>"
There are many ways to accomplish your goal. One way would be to use a regex. If you have never heard of regex, the Wikipedia entry on regular expressions should bring you up to speed.
For example:
content_string = "Blah ablal blabla lbal blah blaha http://www.google.com/ adsf dasd dadf dfasdf dadf sdfasdf dadf dfaksjdf kjdfasdf http://www.apple.com/ blah blah blah."
content_string.split(/\s+/).find_all { |u| u =~ /^https?:/ }
Which will return: ["http://www.google.com/", "http://www.apple.com/"]
Now, for the second half of the problem, you will use the array returned above to replace the text URLs with hyperlinks.
links = ["http://www.google.com/", "http://www.apple.com/"]
links.each do |l|
content_string.gsub!(l, "<a href='#{l}'>#{l}</a>")
end
content_string will now be updated to contain HTML hyperlinks for all http/https URLs.
As I mentioned earlier, there are numerous ways to tackle this problem - to find the URLs you could also do something like:
require 'uri'
URI.extract(content_string, ['http', 'https'])
I hope this helps you.
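Putting the pieces together, a minimal sketch of such a helper might look like the following. The name autolink and the URL_PATTERN constant are hypothetical, and the escaping step is an added assumption: since the text comes from a free-form text area, it is escaped before any markup is inserted. Doing the replacement in a single gsub pass also avoids wrapping the same URL twice when it appears more than once:

```ruby
require 'cgi'

# Same idea as the patterns above: a scheme followed by everything up to the next whitespace.
URL_PATTERN = %r{(?:https?|ftp)://\S+}i

# Hypothetical helper: escape the raw text, then wrap each URL in an anchor tag.
def autolink(text)
  escaped = CGI.escapeHTML(text)
  escaped.gsub(URL_PATTERN) { |url| %(<a href="#{url}">#{url}</a>) }
end

puts autolink('here is a link: http://google.com')
puts autolink('docs at http://example.com/?a=1&b=2 <draft>')
```

One known rough edge: punctuation directly after a URL (for example a trailing period) is captured as part of the link, because the pattern runs to the next whitespace.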

Truncate Markdown?

I have a Rails site, where the content is written in markdown. I wish to display a snippet of each, with a "Read more.." link.
How do I go about this? Simply truncating the raw text will not work, for example:
>> "This is an [example](http://example.com)"[0..25]
=> "This is an [example](http:"
Ideally I want to allow the author to (optionally) insert a marker to specify what to use as the "snippet", if not it would take 250 words, and append "..." - for example..
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
The marker could be thought of like an EOF marker (which can be ignored when displaying the full document)
I am using maruku for the Markdown processing (RedCloth is very biased towards Textile, BlueCloth is extremely buggy, and I wanted a native-Ruby parser which ruled out peg-markdown and RDiscount)
Alternatively (since the Markdown is translated to HTML anyway) truncating the HTML correctly would be an option - although it would be preferable to not markdown() the entire document, just to get the first few lines.
So, the options I can think of are (in order of preference)..
Add a "truncate" option to the maruku parser, which will only parse the first x words, or till the "excerpt" marker.
Write/find a parser-agnostic Markdown truncate'r
Write/find an intelligent HTML truncating function
Write/find an intelligent HTML truncating function
The following, from http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/, with some modifications, will correctly truncate HTML and easily allow appending a string before the closing tags.
>> puts "<p><b>Something</p>".truncate_html(5, "...")
=> <p><b>Somet...</b></p>
The modified code:
require 'rexml/parsers/pullparser'
class String
  # Truncates HTML to about `len` characters of text and closes any open tags.
  # `at_end`, if given, is appended before the closing tags.
  def truncate_html(len = 30, at_end = nil)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        tags.push p_e[0]
        results << "<#{tags.last}#{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # Exclusive range: take at most new_len characters of this text node
        results << p_e[0][0...new_len]
        new_len -= p_e[0].length
      else
        results << "<!-- #{p_e.inspect} -->"
      end
    end
    results << at_end if at_end
    # Close whatever tags are still open, innermost first
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end
  private
  # Serializes an attribute hash back into ' name="value" ...' form
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      ' ' + attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end
Here's a solution that works for me with Textile.
Convert it to HTML
Truncate it.
Remove any HTML tags that got cut in half with
html_string.gsub(/<[^>]*$/, "")
Then use Hpricot to clean it up and close unclosed tags
html_string = Hpricot( html_string ).to_s
I do this in a helper, and with caching there's no performance issue.
You could use a regular expression to find a line consisting of nothing but "^" characters:
markdown_string = <<-eos
This article is an example of something or other.
This segment will be used as the snippet on the index page.
^^^^^^^^^^^^^^^
This text will be visible once clicking the "Read more.." link
eos
preview = markdown_string[0...(markdown_string =~ /^\^+$/)]
puts preview
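A sketch of how the marker lookup could be combined with the 250-word fallback described in the question. The method name snippet and its max_words parameter are made up for illustration. It also handles the case where no marker is present, in which the =~ match above returns nil:

```ruby
# A line consisting of nothing but "^" characters marks the end of the snippet.
MARKER = /^\^+$/

# Hypothetical helper: use the author's marker when present,
# otherwise fall back to the first `max_words` words plus "...".
def snippet(markdown, max_words: 250)
  marker_at = markdown =~ MARKER
  return markdown[0...marker_at].rstrip if marker_at

  words = markdown.split
  return markdown.strip if words.length <= max_words

  words.first(max_words).join(' ') + '...'
end

article = "This segment will be used as the snippet.\n^^^^^^^^^^^^^^^\nThis text is only visible after Read more.\n"
puts snippet(article)
puts snippet("one two three four", max_words: 2)
```

The word-count fallback joins words with single spaces, so it flattens line breaks and can still cut through Markdown syntax such as a link. It is only a last resort for articles without a marker.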
Rather than trying to truncate the text, why not have two input boxes, one for the "opening blurb" and one for the main "guts"? That way your authors will know exactly what is being shown when, without having to rely on some sort of funky EOF marker.
I will have to agree with the "two inputs" approach, and the content writer would need not to worry, since you can modify the background logic to mix the two inputs in one when showing the full content.
full_content = input1 + input2 # perhaps with some complementary HTML for better formatting
Not sure if it applies to this case, but adding the solution below for the sake of completeness. You can use strip_tags method if you are truncating Markdown-rendered contents:
truncate(strip_tags(markdown(article.contents)), length: 50)
Sourced from:
http://devblog.boonecommunitynetwork.com/rails-and-markdown/
A simpler option that just works:
truncate(markdown(item.description), length: 100, escape: false)

Resources