rails truncate method adds special characters - ruby-on-rails

I have this html text:
<p> I'm a html text</p>
To show it on my web page, I first sanitize it and remove the tags:
sanitize(best_practice.milestone.description, :tags=>[])
It then shows OK; the special character is removed.
But if I decide to truncate the text like this:
sanitize(best_practice.milestone.description, :tags=>[]).truncate(30)
The special character is visible again on my web page. In fact, all the special characters become visible.
What can I do to stop truncate from making these special characters visible?

Combining the sanitize helpers with truncation can be tricky. There are many related helpers (h, CGI::escapeHTML, sanitize, strip_tags, html_safe, etc.), and sanitization and truncation do not work well together if the string is cut between an opening and a closing tag or right in the middle of a special HTML character.
The following statement seems to work:
sanitize(text, :tags=>[]).truncate(30, :separator => " ").html_safe
The trick is to pass a :separator option so that truncate cuts the text at a natural break.
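To see why cutting in the middle of a special character goes wrong, here is a small plain-Ruby sketch (no Rails needed). It uses CGI.escapeHTML from the standard library as a stand-in for the escaping done in the view. truncate_at_word is a hypothetical helper written for this example; it only approximates truncate with a :separator and is not the Rails implementation.

```ruby
require "cgi"

# Hypothetical stand-in for truncate(limit, :separator => " "):
# keep at most `limit` characters (omission included) and cut at the
# last space that fits, falling back to a hard cut if there is none.
def truncate_at_word(text, limit, omission = "...")
  return text if text.length <= limit

  room = [limit - omission.length, 0].max
  head = text[0, room]
  stop = head.rindex(" ") || room
  head[0, stop] + omission
end

# Escaping first and cutting afterwards can split an entity in two.
def escape_then_cut(text, limit)
  CGI.escapeHTML(text)[0, limit]
end

# Cutting the plain text first and escaping afterwards keeps entities whole.
def cut_then_escape(text, limit)
  CGI.escapeHTML(truncate_at_word(text, limit))
end

text = "Fish & chips <fresh>"
puts escape_then_cut(text, 8)   # Fish &am
puts cut_then_escape(text, 12)  # Fish &amp;...
```

The first result ends in a broken "&am", which a browser shows as literal characters. Truncating the plain text before escaping avoids that, at the cost of measuring the limit in plain-text characters rather than escaped ones.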

Related

Properly Escaping a String inside a View

I've read in multiple places that as of Rails 3 you no longer have to use html_escape "some string" to actually escape a string in a view and that simply writing <%= "some string" %> would escape the string by default. However, I cannot find this information in the docs. I read through the XSS section of the Rails guides, which states this:
https://guides.rubyonrails.org/security.html#cross-site-scripting-xss
As a second step, it is good practice to escape all output of the application, especially when re-displaying user input, which hasn't been input-filtered (as in the search form example earlier on). Use escapeHTML() (or its alias h()) method to replace the HTML input characters &, ", <, and > by their uninterpreted representations in HTML (&amp;, &quot;, &lt;, and &gt;).
Then I see several blogs that state that it is escaped by default. For example: https://www.netsparker.com/blog/web-security/preventing-xss-ruby-on-rails-web-applications/
https://dzone.com/articles/preventing-cross-site-scripting-vulnerabilities-wh
Found it:
https://guides.rubyonrails.org/3_0_release_notes.html
"7.4.3 Other Changes
You no longer need to call h(string) to escape HTML output, it is on by default in all view templates. If you want the unescaped string, call raw(string)."
escapeHTML() comes from CGI::escapeHTML, which is part of Ruby's standard library, and h() is an alias for html_escape in ERB::Util, so you still have a way to escape HTML even if you aren't using Rails. Rails does some automatic handling of HTML in ERB files for display, and that is probably what you are referring to with html_escape "some string" and <%= "some string" %>.
I think you may be confusing this with the cases where you still need html_escape explicitly, such as when displaying URLs and the like that are stored in the DB and you don't want the ERB processor to mangle them. I know that sometimes, particularly in .js.erb files, I need to escape some things by hand to get the result I was expecting.
This is different from sanitizing. The example in the guide seems to refer to something that you accept and then redisplay, like a search string. If someone puts <i>hello</i> into a search box, you would want to sanitize the input before passing it to the back end. If you are using some JavaScript to filter, you might also want to escape it, both for security reasons and so that it redisplays correctly in the search box after filtering.
Edit: I was not able to find the answer to your comment in the ri doc either. But I tried:
<%= "<b>hello</b>" %>
<%= h("<b>hello</b>") %>
And got the same result in the browser:
<b>hello</b>
<b>hello</b>
So if you are asking if it is true, then I would say yes.
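The same check can be reproduced outside Rails with the ERB and CGI libraries that ship with Ruby. Note that plain ERB does not escape by default, so this sketch calls h explicitly in the template; render_escaped is a made-up name for the example.

```ruby
require "erb"
require "cgi"

# Render a tiny template with plain ERB. Outside Rails nothing is escaped
# automatically, so the template calls ERB::Util.h itself.
def render_escaped(value)
  ERB.new("<p><%= ERB::Util.h(value) %></p>").result(binding)
end

puts render_escaped("<b>hello</b>")   # <p>&lt;b&gt;hello&lt;/b&gt;</p>
puts CGI.escapeHTML("<b>hello</b>")   # &lt;b&gt;hello&lt;/b&gt;
```

Both calls turn the angle brackets into &lt; and &gt;, which is what makes the browser show the tag as text instead of rendering bold.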

Remove whitespace from inside html tags

How can I remove whitespace from within html tags?
For example:
"\r\n\t This is a paragraph\r\n".strip
=> "This is a paragraph"
But what about when:
"<p>\r\n\t This is a paragraph\r\n</p>".strip
=> "<p>\r\n\t This is a paragraph\r\n</p>"
How can I get ruby to remove the whitespace from inside the tags (while retaining the p tags)?
In Rails, there is a method named squish. For example:
"<p>\r\n\t This is a paragraph\r\n</p>".squish => "<p> This is a paragraph </p>"
It returns the string after first removing all whitespace on both ends and then changing each remaining run of consecutive whitespace into a single space.
It leaves a space just inside each tag, though. If you want to get "<p>This is a paragraph</p>", I think you need a regex, which is a bit more complicated than this. ^_^
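A minimal regex sketch for that case, in plain Ruby. trim_inside_tags is a name I made up, and the approach is naive: it drops every run of whitespace that touches a tag boundary, including whitespace between adjacent elements or between an inline tag and the next word, so it only suits simple markup like the example.

```ruby
# Remove whitespace directly after ">" and directly before "<".
def trim_inside_tags(html)
  html.gsub(/>\s+/, ">").gsub(/\s+</, "<")
end

puts trim_inside_tags("<p>\r\n\t This is a paragraph\r\n</p>")
# prints <p>This is a paragraph</p>
```

For anything more involved than a single paragraph, an HTML parser is likely a safer choice than a regex.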
If you want to use it in a view, you can use the statement below:
<%= strip_tags("<p>\r\n\t This is a paragraph\r\n</p>").strip %>
If you want to use it in a controller, you can use this statement:
self.class.helpers.strip_tags("<p>\r\n\t This is a paragraph\r\n</p>").strip
The strip_tags method removes all HTML tags from the string.
I hope this helps.

Line breaks are not shown in heredoc

I have a heredoc string
html =<<EOF
<span>
Hello hello 123
</span>
<a>Link1</a>
<a>Link2Link2</a>
EOF
If I say puts html, it prints the html as it is, with newlines, which is fine. If I call p html, I get the html without line breaks.
However, what I really need to do is convert this html into a picture, and it should have line breaks. Here is how I do that:
kit = IMGKit.new html, quality: 30
# using Magick::Image ......
# some code which is not important....
img.write("my_gif.gif")
It's almost fine, except that the resulting html, as I've already said, doesn't have line breaks; it is rendered as a single line:
<span>Hello hello 123</span><a>Link1</a><a>Link2Link2</a>
Of course, if I add <br /> tags, it all works out. But for certain reasons I'm not able to do that; I want to avoid <br /> and still have line breaks.
I'm pretty sure this is not a problem with IMGKit or RMagick.
So how do I achieve that?
I agree it is not a problem with IMGKit - it is doing what it is supposed to do - render the HTML. There is also nothing wrong with the heredoc, and nothing magical you can do with Ruby's representation of the HTML such that literal whitespace (spaces, tabs, newlines) in HTML source become visible when rendered.
The most common rendering of source whitespace by HTML viewers is that any length of pure whitespace (whether spaces, tabs, newlines or any combination) is rendered as a single space -> <- in the view. Additionally, whitespace between one element end and another starting is often completely ignored (although the rendering of the elements themselves may cause layout/spacing effects in the view).
You could, however, do something like this:
kit = IMGKit.new html.gsub(/\n/,"<br/>"), quality: 30
and have line breaks rendered without adding <br/> to your heredoc.
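For reference, here is what that substitution produces for the heredoc from the question, as a plain-Ruby sketch without IMGKit. The method names are mine. Note that the trailing newline of the heredoc also becomes a <br/>, which chomp avoids if that matters.

```ruby
# The heredoc from the question, wrapped in a method for the example.
def html_source
  <<EOF
<span>
Hello hello 123
</span>
<a>Link1</a>
<a>Link2Link2</a>
EOF
end

# Turn every newline in the source into an explicit HTML line break.
def with_line_breaks(html)
  html.gsub(/\n/, "<br/>")
end

puts with_line_breaks(html_source)
# Dropping the final newline first avoids a trailing <br/>.
puts with_line_breaks(html_source.chomp)
```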

Prevent HTML escaping in a text area

I realized that if I put HTML code in a Rails text area, it is output as HTML.
For instance:
<b> Hello </b>
outputs as:
Hello (rendered in bold)
I thought Rails 3 text inputs automatically escape HTML, but whenever I output #variable.textarea, it still shows the bold text. Is it being selective about what HTML to input? And how do I make sure all HTML is always escaped when I output the content of my textarea?
Thanks!
If <b>hello</b> comes out as a bold hello, that means the HTML is not being escaped.
Since you don't want users to be able to use HTML in their inputs, you want the HTML to be escaped, so that <b>hello</b> is displayed as the literal text <b>hello</b> rather than in bold.
In a Rails 3 app, HTML is escaped automatically, but you can also escape it explicitly using the h method:
<%= h my_string %>

newline characters screwing up <pre> tags (Ruby on Rails)

I'm developing a blog, and something really annoying is happening with newline characters (\n). Everything works fine, except that if I make a post that contains pre tags, my newline characters screw up the indentation.
So if I have code that looks like this
<pre>
<code>
some code some code
more code more code
</code>
</pre>
For some reason the newline characters that are saved in the db field with the post are causing whatever is inside the pre tag to be indented by a tab or two.
I have no idea why it's doing it, but if I do something like
string.gsub!(/\n/, "<br />")
The indentation is removed, so I know it has to do with the \n. But then my problem is that there are way too many line breaks and the format is then way off.
So then I tried to capture everything inside the pre tags with a method that looks like this
def remove_newlines(string)
  regexp = /<pre>\s?(.*?)\s?<\/pre>/
  code = regexp.match(string)
  code[1].gsub!(/\n/, "<br />")
end
But I can't get that to work properly.
Does anyone know how I can get rid of this weird indentation problem, or have any pointers on this?
Thanks!
It sounds like your template engine is auto-indenting the contents of the <pre> tags. Browsers render the whitespace inside <pre> tags as it is (and so they should, according to specs). This means that the whitespace at the beginning of each line inside the <pre> added by the template engine in order to make the HTML source more readable is rendered in the actual page as well, unlike whitespace most other places in HTML source.
The solution therefore depends on your templating language.
If you are using HAML:
HAML FAQ: How do I stop Haml from indenting the contents of my pre and textarea tags?
Hope this helps.
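One template-independent way to sidestep this, sketched in plain Ruby: replace the newlines inside each <pre> block with the &#x000A; character reference before the post reaches the template, so an auto-indenting engine has no line starts inside the block to indent, while the browser still renders the line breaks. This is a hand-rolled illustration under that assumption, not Haml's own mechanism, and preserve_pre_newlines is a made-up name. Compared with the remove_newlines attempt in the question, it uses the m flag so that . matches across lines, and the block form of gsub so that the whole post comes back with the blocks rewritten, rather than only a copy of the captured group being changed.

```ruby
# Replace literal newlines inside <pre>...</pre> with a character reference.
def preserve_pre_newlines(html)
  html.gsub(%r{<pre>(.*?)</pre>}m) do
    "<pre>" + Regexp.last_match(1).gsub("\n", "&#x000A;") + "</pre>"
  end
end

post = "<pre>\n<code>\nsome code\n</code>\n</pre>\n<p>text</p>\n"
puts preserve_pre_newlines(post)
```

Newlines outside the <pre> block are left alone, so the rest of the post is unaffected.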
