Rails only escape certain sections of content - ruby-on-rails

I'm looking to turn all words preceded by a # (i.e. #stackoverflow) into a link that, when clicked, goes to the search page with the word as the query.
I tried this recently and the right HTML was returned, but because content is automatically escaped, it showed as:
This is some content something
My question is: is there any way to escape every part of the content except for these links (i.e. mark only the links as html_safe)?

If your tags are simple alphanumeric strings (i.e. nothing that needs to be HTML or URL encoded), then you could do something like this:
s = ERB::Util.html_escape(text_to_be_linkified).gsub(/#(\w+)/, '\1').html_safe
Then s.html_safe? will be true and <%= ... %> will pass the result through as-is. If you put this in a view helper, then you shouldn't need the ERB::Util. prefix on html_escape. If you do need to worry about URL or HTML encoding then you could modify the gsub replacement string appropriately.
For example:
> s = ERB::Util.html_escape('<pancakes & #things').gsub(/#(\w+)/, '\1').html_safe
> puts s.html_safe?
true
> puts s
<pancakes & things
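Here is a minimal plain-Ruby sketch of the same escape-first-then-linkify idea. The /search?q= path, SEARCH_PATH and linkify_hashtags are hypothetical names for illustration. One adjustment over the one-liner above: html_escape turns an apostrophe into "&#39;", which a bare /#(\w+)/ would then linkify as "#39", so the pattern uses a (?<!&) lookbehind to skip numeric entities. The tag is also URL-encoded for the href.

```ruby
require "erb"

# Hypothetical search path: substitute your app's real route helper.
SEARCH_PATH = "/search"

# Escape the whole text first, then turn each #tag into a link.
# (?<!&) skips numeric entities such as "&#39;" produced by html_escape.
# In a Rails view helper you would append .html_safe to the result.
def linkify_hashtags(text)
  escaped = ERB::Util.html_escape(text)
  escaped.gsub(/(?<!&)#(\w+)/) do
    tag = Regexp.last_match(1)
    href = "#{SEARCH_PATH}?q=#{ERB::Util.url_encode(tag)}"
    %(<a href="#{href}">##{tag}</a>)
  end
end

puts linkify_hashtags("<pancakes & #things")
# &lt;pancakes &amp; <a href="/search?q=things">#things</a>
```

Because the user's text is escaped before any markup is added, the only unescaped HTML in the result is the anchor tags built by the helper itself.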

Related

Rails 5 - How to strip tags from string in rails (NOT in/for html)

I need to strip tags from user input before saving it into the DB.
I'm well aware of the strip_tags method, but it also HTML-escapes the string, as do all the other recommended methods:
Rails::Html::FullSanitizer.new.sanitize '&'
=> "&amp;"
Rails::Html::WhiteListSanitizer.new.sanitize('&', tags: [])
=> "&amp;"
ActionController::Base.helpers.strip_tags "&"
=> "&amp;"
The string I want to sanitize must NOT be escaped: it's exported via an API, used in files, etc., so it's NOT only output as HTML. (Even in HTML, in cases like link_to ActionController::Base.helpers.strip_tags("&"), link_to escapes the string a second time, so the frontend shows a link to &amp;.)
As a monkey patch, I've wrapped strip_tags in CGI.unescapeHTML to get more or less the expected result, but I want to find a straightforward solution. I'm also wary of what else strip_tags might do, and there are too many moving parts for such a small piece of functionality: more things that can go wrong or break.
Real-world examples:
"JPMorgan Chase & Co" should stay "JPMorgan Chase & Co" after removing tags.
"test<script>alert('hacked!');</script>&test" should become "test&test" after stripping tags.
And the string:
"test &lt;script&gt;alert('hacked!')&lt;/script&gt;"
should still be
"test &lt;script&gt;alert('hacked!')&lt;/script&gt;"
after stripping HTML tags.
The alternative solutions that I've found or that were proposed decode the entities instead:
> Nokogiri::HTML("test &lt;script&gt;alert('hacked!')&lt;/script&gt;").text
=> "test <script>alert('hacked!')</script>"
> Loofah.fragment("test &lt;script&gt;alert('hacked!')&lt;/script&gt;").text(encode_special_chars: false)
=> "test <script>alert('hacked!')</script>"
So they're also a no-go.
You have to parse the HTML and extract the text elements. Use Nokogiri to do that.
Nokogiri::HTML("<div>Strip <i>this</i> & <b>this</b> & <u>this</u>!</div>").text
Nokogiri is already a Rails dependency, so there's no extra cost to using it.
You will get all the text, including the content of <script> tags.
Nokogiri::HTML(%q[test<script>alert('hacked!');</script>&test]).text
# testalert('hacked!');&test
You can strip the <script> tags.
doc = Nokogiri::HTML(%q[test<script>alert('hacked!');</script>&test])
doc.search('//script').remove
doc.text
# test&test
But once the tags are stripped out, the leftover script text is harmless, so removing it might not be worth the effort.
See the Nokogiri tutorials for more.

Properly Escaping a String inside a View

I've read in multiple places that as of Rails 3 you no longer have to use html_escape "some string" to escape a string in a view, and that simply writing <%= "some string" %> escapes the string by default. However, I cannot find this in the docs. I read through the XSS section of the Rails guides, which states:
https://guides.rubyonrails.org/security.html#cross-site-scripting-xss
As a second step, it is good practice to escape all output of the application, especially when re-displaying user input, which hasn't been input-filtered (as in the search form example earlier on). Use escapeHTML() (or its alias h()) method to replace the HTML input characters &, ", <, and > by their uninterpreted representations in HTML (&amp;, &quot;, &lt;, and &gt;).
Then I see several blogs that state that it is escaped by default. For example: https://www.netsparker.com/blog/web-security/preventing-xss-ruby-on-rails-web-applications/
https://dzone.com/articles/preventing-cross-site-scripting-vulnerabilities-wh
Found it:
https://guides.rubyonrails.org/3_0_release_notes.html
"7.4.3 Other Changes
You no longer need to call h(string) to escape HTML output, it is on by default in all view templates. If you want the unescaped string, call raw(string)."
escapeHTML() (or its alias h()) comes from CGI::escapeHTML, which is part of Ruby's standard library, so even if you aren't using Rails you still have a way to escape HTML. Rails may do some automagical handling of HTML in ERB files for display, and that is probably what you are referring to with html_escape "some string" and <%= "some string" %>.
I think you may be confusing this with html_escape, which you might need when displaying URLs and such that are stored in the DB and you don't want the ERB processor to mess them up. I know that sometimes, particularly in .js.erb files, I need to escape some things to get the result I was expecting. This is different from sanitizing.
It seems that in your example they are referring to something that you might accept and then redisplay, like a search string. If you put <i>hello</i> into a search box, you would want to sanitize the input before passing it to the back end; or, if you are using some JavaScript to filter, you might want to escape it both for security reasons and so that it re-displays correctly in the search box after you've filtered.
Edit: I was not able to find the answer to your comment in the ri doc either. But I tried:
<%= "<b>hello</b>" %>
<%= h("<b>hello</b>") %>
And got the same result in the browser:
<b>hello</b>
<b>hello</b>
So if you are asking if it is true, then I would say yes.
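To see what that automatic escaping amounts to, here is a stdlib-only sketch of roughly what <%= %> does in Rails 3+ to a String that is not marked html_safe. It is an approximation: Rails uses ActiveSupport's version of ERB::Util.html_escape, which additionally passes html_safe strings through untouched, and that part is not modelled here.

```ruby
require "cgi"
require "erb"

# Both stdlib helpers replace the HTML-special characters with entities.
text = %q(<b>hello</b> & 'bye')

puts CGI.escapeHTML(text)         # &lt;b&gt;hello&lt;/b&gt; &amp; &#39;bye&#39;
puts ERB::Util.html_escape(text)  # same result
puts ERB::Util.h(text)            # h is an alias of html_escape
```

The browser renders each of these as the literal text <b>hello</b> & 'bye', which matches the identical results observed above for <%= %> with and without h().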

How to Display Brackets Correctly in Code in User Comments?

When a user posts a comment on my site, I run it through a sanitized markdown formatter on the backend and then display it on the site.
However, this causes the less-than and greater-than signs (< and >) to come out as their HTML codes (&lt; and &gt;) inside the user's code examples (which get marked with <pre> and <code> tags). The brackets display correctly outside of code, but how do I fix it so they show up correctly inside code?
In short, I want what now shows up as:
if(a &lt; b)
To show up as:
if(a < b)
This is my code in the helper for marking down the user's comment:
def comment_markdown(text)
renderer = Redcarpet::Render::HTML.new()
markdown = Redcarpet::Markdown.new(renderer)
safe_text = sanitize text, tags: %w(b i code pre br p)
markdown.render(safe_text).html_safe
end
It's called in the view:
<%= comment_markdown comment.text %>
Rails already HTML-escapes text for display in views, so with your call to .html_safe in the comment_markdown method, it's getting escaped twice.
Simply remove your call to .html_safe:
def comment_markdown(text)
renderer = Redcarpet::Render::HTML.new()
markdown = Redcarpet::Markdown.new(renderer)
safe_text = sanitize text, tags: %w(b i code pre br p)
markdown.render(safe_text)
end
I think I'll just use Redcarpet's filter_html: true option to prevent security issues from iframes and the like. Then I don't need to sanitize the text, so text inside pre tags isn't escaped and it displays normally. I just need to see how to configure it so users can't use distracting things like headers.
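The follow-up above points at the mechanism: the text gets escaped once before Markdown rendering and once more inside the code block. This sketch reproduces that double escaping with CGI.escapeHTML standing in for both steps; that stand-in is an assumption, since the real steps are Rails' sanitize and Redcarpet's code-block rendering.

```ruby
require "cgi"

code = "if(a < b)"

# First escape: roughly what the sanitize step leaves behind.
once = CGI.escapeHTML(code)
# Second escape: the code-block renderer escapes the "&" of "&lt;" again.
twice = CGI.escapeHTML(once)

puts once   # if(a &lt; b)      the browser shows if(a < b)
puts twice  # if(a &amp;lt; b)  the browser shows if(a &lt; b)
```

Escaping exactly once, in whichever step owns the output, is what makes the brackets display correctly.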

Remove whitespace from inside html tags

How can I remove whitespace from within html tags?
For example:
"\r\n\t This is a paragraph\r\n".strip
=> "This is a paragraph"
But what about when:
"<p>\r\n\t This is a paragraph\r\n</p>".strip
=> "<p>\r\n\t This is a paragraph\r\n</p>"
How can I get ruby to remove the whitespace from inside the tags (while retaining the p tags)?
In Rails, there is a method named squish, for example:
"<p>\r\n\t This is a paragraph\r\n</p>".squish => "<p> This is a paragraph </p>"
Returns the string, first removing all whitespace on both ends of the string, and then changing remaining consecutive whitespace groups into one space each.
It leaves a space. If you want to get "<p>This is a paragraph</p>", I think you should use a regex; it's a bit more complicated than this. ^_^
If you want to use it in a view, you can use the statement below:
<%= strip_tags("<p>\r\n\t This is a paragraph\r\n</p>").strip %>
If you want to use it in a controller, you can use this statement:
self.class.helpers.strip_tags("<p>\r\n\t This is a paragraph\r\n</p>").strip
The strip_tags method removes all HTML tags from the string.
I hope this helps.
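For the regex route mentioned in the squish answer, which keeps the <p> tags, here is one possible sketch. squish_inside_tags is a hypothetical plain-Ruby helper, not a Rails method. It collapses whitespace the way squish does, then drops any whitespace that touches a tag boundary. Caveat: that also removes meaningful spaces between inline tags, so "<b>a</b> <i>b</i>" becomes "<b>a</b><i>b</i>".

```ruby
# Collapse whitespace runs to one space (like String#squish), then remove
# whitespace right after ">" and right before "<", and trim the ends.
def squish_inside_tags(html)
  html
    .gsub(/\s+/, " ")
    .gsub(/>\s+/, ">")
    .gsub(/\s+</, "<")
    .strip
end

puts squish_inside_tags("<p>\r\n\t This is a paragraph\r\n</p>")
# <p>This is a paragraph</p>
```

Regexes are fine for tidying trusted markup like this, but for arbitrary user HTML a parser such as Nokogiri is the safer tool.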

Rails 3.1 HAML escaping too much on an `:escaped` chunk, how to control it so that it only escapes ampersands?

I have a chunk of code provided by Wistia to embed videos into a page. This source is embeddable raw HTML, and it includes some ampersands directly. Of course, my W3C validator yells at me all day long; with these in the page I'm getting hundreds of errors like:
& did not start a character reference. (& probably should have been escaped as &amp;.)
My view is in HAML so I'm assuming that I needed to escape the sequence, which I happily did with:
:escaped
<object width="...
Upon doing this, the video no longer loads, because it has escaped the entire string into &lt;object width=" ... etc.
How would one properly escape such sequences programmatically, rather than manually altering the inserted string every time it's updated, in Rails 3.1 with HAML?
You'll probably want to put your HTML into its own partial, then render it into a string and do a String#gsub on it.
Put your Wistia HTML into a partial called something like app/views/shared/_wistia.html
Then create a helper that looks like:
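def embed_video(partial)
# In a helper, render returns the partial as a string (render_to_string is a controller method).
html = render(:partial => "shared/#{partial}")
# gsub returns an unsafe String, so mark it html_safe or HAML will escape the whole embed.
html.gsub('&', '&amp;').html_safe
end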
And in your HAML, just put = embed_video 'wistia' wherever you want the video to be inserted.
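One refinement worth considering: a plain gsub('&', '&amp;') also rewrites ampersands that already start an entity, so an existing "&amp;" would become "&amp;amp;". A sketch of a variant that escapes only bare ampersands follows; escape_bare_ampersands and BARE_AMPERSAND are hypothetical names, and the sample <param> markup is made up for illustration. In the helper above you would call it in place of the gsub and still finish with .html_safe.

```ruby
# Match "&" only when it does NOT begin a named (&amp;), decimal (&#38;)
# or hex (&#x26;) character reference.
BARE_AMPERSAND = /&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/

def escape_bare_ampersands(html)
  html.gsub(BARE_AMPERSAND, "&amp;")
end

snippet = %q(<param name="flashvars" value="a=1&b=2&amp;c=3"/>)
puts escape_bare_ampersands(snippet)
# <param name="flashvars" value="a=1&amp;b=2&amp;c=3"/>
```

The lookahead keeps already-valid references intact, so running the helper twice gives the same result as running it once.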

Resources