How good is the Rails sanitize() method? - ruby-on-rails

Can I use ActionView::Helpers::SanitizeHelper#sanitize on user-entered text that I plan on showing to other users? E.g., will it properly handle all cases described on this site?
Also, the documentation mentions:
Please note that sanitizing
user-provided text does not guarantee
that the resulting markup is valid
(conforming to a document type) or
even well-formed. The output may still
contain e.g. unescaped ’<’, ’>’, ’&’
characters and confuse browsers.
What's the best way to handle this? Pass the sanitized text through Hpricot before displaying?

Ryan Grove's Sanitize goes a lot farther than Rails 3 sanitize. It ensures the output HTML is well-formed and has three built-in whitelists:
Sanitize::Config::RESTRICTED
Allows only very simple inline formatting markup. No links, images, or block elements.
Sanitize::Config::BASIC
Allows a variety of markup including formatting tags, links, and lists. Images and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto protocols, and a attribute is added to all links to mitigate SEO spam.
Sanitize::Config::RELAXED Allows an even wider variety of markup than BASIC, including images and tables. Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images are limited to HTTP and HTTPS. In this mode, is not added to links.

Sanitize is certainly better than the "h" helper. Instead of escaping everything, it actually allows the html tags that you specify. And yes, it does prevent cross-site scripting because it removes javascript from the mix entirely.
In short, both will get the job done. Use "h" when you don't expect anything other than plaintext, and use sanitize when you want to allow some, or you believe people may try to enter it. Even if you disallow all tags with sanitize, it'll "pretty up" the code by removing them instead of escaping them as "h" does.
As for incomplete tags: You could run a validation on the model that passes html-containing fields through hpricot, but I think this is overkill in most applications.

The best course of action depends on two things:
Your rails version (2.x or 3.x)
Whether your users are supposed to enter any html at all on the input or not.
As a general rule, I don't allow my users to input html - instead I let them input textile.
On rails 3.x:
User input is sanitized by default. You don't have to do anything, unless you want your users to be able to send some html. In that case, keep reading.
This railscast deals with XSS attacks on rails 3.
On rails 2.x:
If you don't allow any html from your users, just protect your output with the h method, like this:
<%= h post.text %>
If you want your users to send some html: you can use rails' sanitize method or HTML::StathamSanitizer

Related

When do I need to encode with multiple codecs in Grails?

I'm not clear of when (or if) I should use multiple Grails encodeAsXXX calls.
This reference says you need to encodeAsURL and then encodeAsJavaScript: http://grailsrocks.com/blog/2013/4/19/can-i-pwn-your-grails-application
It also says you need to encodeAsURL and then encodeAsHTML, I don't understand why this is necessary in the case shown but not all the time?
Are there other cases I should me using multiple chained encoders?
If I'm rendering a URL to a HTML attribute should I encodeAsURL then encodeAsHTML?
If I'm rendering a URL to a JavaScript variable sent as part of a HTML document (via a SCRIPT element) should I encodeAsURL, encodeAsJavaScript then encodeAsHTML?
If I'm rendering a string to a JavaScript variable sent as part of a HTML document should I encodeAsJavaScript then encodeAsHTML?
The official docs - https://docs.grails.org/latest/guide/security.html - don't show any examples of multiple chained encoders.
I can't see how I can understand what to do here except by finding the source for all the encoders and looking at what they encode and what's valid on the receiving end - but I figure it shouldn't be that hard for a developer and there is probably something simple I'm missing or some instructions I haven't found.
FWIW, I think the encoders I'm talking about are these ones:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/JavaScriptUtils.html#javaScriptEscape-java.lang.String-
https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html#encode(java.lang.String,%20java.lang.String)
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html#htmlEscape-java.lang.String-
.
It is certainly important to always consider XSS but in reading your question I think you are overestimating what you need to do. As long as you're using Grails 2.3 or higher and your grails.views.default.codec is set to html which it will be by default, everything rendered in your GSP with ${} will be escaped properly for you.
It is only when you are intentionally bypassing the escaping, such as if you need to get sanitized user input back into valid JavaScript within your GSP for some reason, that you would need to use the encodeAsXXX methods or similar.
I would argue (and the article makes a mention of this as well) that this should raise a smell anyway, as you probably should have that JavaScript encapsulated in a different file or TagLib where the escaping is handled.
Bottom line, use the encoding methods only if you are overriding the default HTML encoding, otherwise ${} handles it for you.

spring security native xss filter

Why Spring Security doesn't provide any XSS filter to clean the form input values?
Accordingly to this ticket, such XSS filter is a low priority:
https://jira.spring.io/browse/SEC-2167?jql=text%20~%20%22xss%22
(although the ticket speaks only about URL querystring. Sanitizing POST params would be also required)
In my opinion it would be really useful that spring would provide such a filter instead of building your own. This filter it's a recurrent problem.
XSS is best handled at output stage via the use of encoding. That is, store everything in your database as is, and yes storing <script> is fine, however once output, encode correctly for the context it is output in. For HTML this would be <script>, however if your output context was plain text you would just output as is <script> (assuming the same character set encoding is used). Side note: Use parameterised queries or equivalent for storing in your database to avoid SQL injection, however the text stored should exactly match what was entered.
Microsoft attempts to block inputs that look like XSS via their request validation feature in ASP.NET. However, this isn't very effective and flaws are found quite often. Similar approaches from other frameworks are doomed to fail.
The reason that this is much better is that it makes things much more simple. Imagine if StackOverflow didn't allow HTML or script tags - the site would not be functional as a place for people to post code snippets.
You can use input validation as second line of defence. For example, if you are asking the user to enter their car registration you would only want to allow alphanumerics and space to be entered. However, for more complex fields it is often difficult to restrict input to a safe set as output context is unknown at this stage.
Say your language filtered < and > characters. However you were outputting user input into the following context.
<img src="foo.jpg" alt="USER-INPUT" />
An XSS attack is possible by entering " onmouseover="alert('xss') because it would be rendered as
<img src="foo.jpg" alt="" onmouseover="alert('xss')" />
Similar problems would ensue if you were outputting to JavaScript server-side. This is why it should be up to the developer to select the correct encoding type when using user controlled values.

Rails comments system with bb-code

In my rails 4 app i want to add comments to my articles, but i want to add functional as most forum-engines do (like SMF), and i need to add bb-code for it.
Are there any good gem for it? With rails 4 support? How then in controller i can translate [quote] to some div with some style?
Also how is it good to store html data in database?
For example if i use haml, and somebody post comment as
- current_user.id
or something similar to this, how to secure my app from "bad boys" ? Sure i can change comments system to something like: quote_parent_id, but if i have multiple quotes in one comment? so it is hard to realise, better is to store html, but to secure it somehow.
Could i do this? And how? Please give good ideas, tutorials, gem-links.
Look into https://github.com/veger/ruby-bbcode
Since it converts to HTML and does not excecute user input as Ruby code - you'll be fairly safe. However, I havent tried the gem and its possible it introduces some XSS vulnerabilities.
Have you considered Markdown as an option?
You should also look into https://github.com/asceth/bbcoder ( I should note I am the original author ).
In the controller, changing a string such as "[quote=user]My post of epic importance[/quote]" into a div etc is just doing:
# assume params[:comment] is the text you are converting
params[:comment].bbcode_to_html
As for storing html in a database, there is no right or wrong answer. If you want to allow users to edit their posts later then I would lean towards not storing the html version but storing their original bbcode version. This way when you allow them to edit you aren't having to convert html back to bbcode.
To make sure you aren't open to XSS and other attacks I recommend combining other gems like sanitize.
Sanitize.clean(text.to_s).bbcode_to_html
Some more notes:
Multiple tags and nested tags are parsed as they are seen without any additional steps required. So a comment or post with lots of bbcode tags, multiple quotes, b tags or anything else is dealt with by just calling bbcode_to_html on the variable/string.
If a user tries to use haml in their post it should appear as-is. haml shouldn't try to eval the string unless you specifically tell it to which I'm not even sure how to do that unless haml as a special filter or operator.

Any problems with using a period in URLs to delimiter data?

I have some easy to read URLs for finding data that belongs to a collection of record IDs that are using a comma as a delimiter.
Example:
http://www.example.com/find:1%2C2%2C3%2C4%2C5
I want to know if I change the delimiter from a comma to a period. Since periods are not a special character in a URL. That means it won't have to be encoded.
Example:
http://www.example.com/find:1.2.3.4.5
Are there any browsers (Firefox, Chrome, IE, etc) that will have a problem with that URL?
There are some related questions here on SO, but none that specific say it's a good or bad practice.
To me, that looks like a resource with an odd query string format.
If I understand correctly this would be equal to something like:
http://www.example.com/find?id=1&id=2&id=3&id=4&id=5
Since your filter is acting like a multi-select (IDs instead of search fields), that would be my guess at a standard equivalent.
Browsers should not have any issues with it, as long as the application's route mechanism handles it properly. And as long as you are not building that query-like thing with an HTML form (in which case you would need JS or some rewrites, ew!).
May I ask why not use a more standard URL and querystring? Perhaps something that includes element class (/reports/search?name=...), just to know what is being queried by find. Just curious, I knows sometimes standards don't apply.

Sanitize output in Rails

What is the best solution to sanitize output HTML in Rails (to avoid XSS attacks)?
I have two options: white_list plugin or sanitize method from Sanitize Helper http://api.rubyonrails.com/classes/ActionView/Helpers/SanitizeHelper.html . For me until today the white_list plugin worked better and in the past, Sanitize was very buggy, but as part of the Core, probably it will be under development and be supported for a while.
I recommend http://code.google.com/p/xssterminate/.
I think the h helper method will work here:
<%= h #user.profile %>
This will escape angle brackets and therefore neutralize any embedded JavaScript. Of course this will also eliminate any formatting your users might use.
If you want formatting, maybe look at markdown.
Personally I think it's not a small decision to accept any HTML entry in any web app. You can test for white/blacklisted tags as much as you like, but unless you're testing for correct nesting, someone could enter a series of closing tags, for example
</td></tr></span></div>
and really mess with your layout.
I'd usually give people something like Textile to enter their markup, since I'd rather spend my time working on business logic than HTML parsing.
Of course, if this text entry is more fundamental to your app (as for example it is for stackoverflow) then you probably should give more attention to hand-rolling your own.

Resources