Securely render arbitrary user-uploaded content (from a WYSIWYG editor) - ruby-on-rails

I have a site which allows an admin to edit a section of one page of the site with arbitrary HTML (via a WYSIWYG editor), and I want to figure out a way to serve this arbitrary HTML securely to other users.
The basic intent is to eliminate any possibility of XSS attacks (e.g., a user getting their cookie stolen). I've seen that the following HTML tags are unsafe to allow users to enter: iframe, frame, embed, style, video, object, etc. However, filtering out iframe or style tags is not feasible for my use case, because admins need to be able to embed YouTube videos and style the text.
I've also heard that Content Management Systems sometimes serve user-uploaded content from a separate domain (e.g., content.mysite.com) so that whatever code runs as a result of the user-uploaded content can't steal my site's cookie (e.g., for app.mysite.com) because of the same-origin policy. However, this seems like overkill for me since my site is not a CMS; there's just one part of one page (editable only by admins) which allows for arbitrary customization.
So, is there a way to go about this? Would embedding the arbitrary content in an iframe keep users safe? Thanks in advance!
Also of potential relevance: the framework I'm using is Ruby on Rails.

Protecting a client-side HTML editor against XSS is not straightforward. As you say, serving such content from a different domain may mitigate the risk, but it raises other questions (for example, how will you authenticate and authorize users on the other domain to prevent them from downloading any other user's content?).
Whitelist validation of tags and attributes is also usually not feasible. You could keep a whitelist of tags and of the attributes allowed on those tags, and remove everything else from the HTML. The problem is that the HTML editor will most likely want to use the style attribute, and styles are vulnerable to XSS, at least in older browsers. An editor usually also needs to be able to save a link (<a href="">), and that's vulnerable to XSS too, for example <a href="javascript: alert(1)">.
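The `javascript:` link problem is one place where a small allowlist check can help. A minimal Ruby sketch, assuming the value passed in is the attribute value after an HTML parser has already decoded entities; `safe_href?` and `ALLOWED_SCHEMES` are hypothetical names, and this covers only the URL scheme, not a complete sanitizer:

```ruby
# Schemes we are willing to emit in an href; everything else is rejected.
ALLOWED_SCHEMES = %w[http https mailto].freeze

# Returns true for relative URLs and for URLs whose scheme is allowlisted.
# Hypothetical helper: assumes +href+ is the already-decoded attribute value.
def safe_href?(href)
  # URL parsers strip leading control characters and spaces, and drop tabs and
  # newlines anywhere, so remove them before looking for a scheme. Removing
  # every space is stricter than necessary, which is the safe direction.
  cleaned = href.to_s.gsub(/[\x00-\x20]/, "")
  scheme = cleaned[/\A([a-z][a-z0-9+.\-]*):/i, 1]
  return true if scheme.nil? # no scheme: a relative URL or a fragment

  ALLOWED_SCHEMES.include?(scheme.downcase)
end
```

The check is deliberately an allowlist rather than a blocklist of `javascript:`, so schemes such as `data:` or `vbscript:` are rejected without being named.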
One approach you may take, and one that actually works, is Google Caja. It can sanitize HTML, JavaScript and CSS so that the result is safe to include in your page. It also has a client-side sanitizer written in JavaScript that can be used by itself, and that may provide enough protection. Another client-side sanitizer is DOMPurify. Caja can also be run as a server-side solution, where it is at its best, but that's somewhat harder to install and maintain. The client-side part of it and DOMPurify are slightly less secure because they run entirely on the client, but they can very well provide adequate protection in many situations. (Note that DOMPurify does not work in older browsers and does not sanitize CSS code, which is OK-ish for recent browsers.)
With these client-side solutions, you let the user save any HTML to the database, including any JavaScript a sufficiently smart user (attacker) manages to inject. When displaying the HTML content, you send it to the client in a way that is not itself vulnerable to XSS (for example, in an AJAX response with a non-HTML content type such as application/json), and then run it through the client-side sanitizer in the browser before inserting it into the page DOM (or giving it to the editor control itself).
If your editor allows for the right hooks, you can also do this for any preview functionality, sanitizing the HTML entered before switching to preview. This is important, because while arguably a smaller risk, it is still considered XSS if the user can inject Javascript in the editor and then run it in preview, without ever sending it to the server. This is called DOM XSS, and it is one reason you may want to do sanitization on the client.

Related

XSS Attack prevention in C#

I've a Web API project which is consumed by an MVC project. The MVC project has a fair amount of user inputs which are displayed as output on the web page.
Now, I want to protect my site from XSS attacks. I've read about Microsoft's AntiXss library, input validation, output filtering, etc. But my question is: how do I apply this to my project? Where do I put input validation, how do I filter my output, and how do I sanitize user data? Do I need to sanitize the data in the APIs as well, or just in MVC before I send it to the APIs, and if so, how? And where and how should I use the AntiXss library, in MVC or in Web API?
The answer depends on how exactly user input makes its way into the page DOM in the browser.
If the MVC application generates cshtml pages (with Razor), you need to implement output encoding there, in the cshtml files. Note that AntiXSS as a separate library is now deprecated; it's now in the System.Web.Security.AntiXss namespace by default. You need to encode all output according to the context it is written into (most importantly, you need to encode any input that's written into a JavaScript context, be it a script tag, an event attribute like onclick, the first character of an href for an a tag, etc.). For plain HTML output (text between tags) Razor already provides HTML encoding by default, so it's OK to just do <div>@myVar</div>.
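Encoding per output context is not specific to Razor. The sketch below shows the idea in plain Ruby rather than with the AntiXss API: the same value needs different encoding depending on whether it lands between tags or inside a script block. `html_text` and `js_string_literal` are hypothetical helper names, and only these two contexts are covered (attribute, URL and CSS contexts need their own rules):

```ruby
require "cgi"
require "json"

# HTML text context (between tags): escape characters that could start a tag
# or an entity.
def html_text(value)
  CGI.escapeHTML(value.to_s)
end

# JavaScript string context inside a <script> block: emit a JSON string
# literal, then additionally escape characters that could close the script
# element or act as line terminators in older JavaScript engines.
def js_string_literal(value)
  JSON.generate(value.to_s).gsub(/[<>&\u2028\u2029]/) do |ch|
    format('\u%04x', ch.ord)
  end
end

user_input = %q(</script><script>alert(1)</script>)
puts "<div>#{html_text(user_input)}</div>"
puts "<script>var name = #{js_string_literal(user_input)};</script>"
```

HTML-escaping alone would not be enough for the second line, because the browser looks for the closing script tag before any JavaScript parsing happens; that is why the angle brackets are turned into `\u` escapes there.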
If your frontend consumes something like a JSON API, then you probably have some kind of a client side template engine (Knockout, etc). In that case, it's reasonably safe to send data as received from the user back to the client with an application/json content type (that's actually very important). Then you have to carefully select binding methods to always bind user input as text and not as html to the page elements. This practically means things like using Knockout's text binding instead of html or using jQuery's .text() method instead of .html(), etc.
Please note that a full tutorial on XSS prevention would be way longer than an answer here, so this answer only highlights some high level things and the general way this should be done to prevent XSS.

Pros and Cons of Isotope templates (Rails)

Isotope lets you write templates in javascript. These templates can then be rendered by either the client (using plain-old javascript) or on the server (using Johnson).
The benefit is DRYer code. When updating the DOM on an ajax or web socket update, you don't have to write a new partial; just point it to the one you already wrote.
Has anyone used this?
Interesting, I would have to try it. I know not a lot of people do this, but I actually use HAML to template .js files. There is still the problem the author mentions, of each request being templated on the server and sending back HTML, but unless you are sending loads of kilobytes or you have a really, really high-load site, I don't think it's such a big deal.
Also, ideally you shouldn't even be sending HTML data back and forth, just JSON objects, which are rendered on the page by your ajax request. The only legitimate use I can see for this is if you have a heavily ajax-driven website, such as one where you load a page once, and then you just keep doing ajax requests for all interaction and use JavaScript to manipulate the view.
So it would help if you would clarify the final goal. Is this for some internal app where you control the user environment (you know for sure which browsers they will use, and that they will have fast enough computers to handle all this JavaScript)? Or is it going to be an app targeted towards the developing world, where people may not yet have the resources available to run all that fancy JavaScript?
All that said, it's an interesting concept, and I will try it out myself to see how well it works.
This uses Johnson, which last I checked did not work on Ruby 1.9. So that might hint at some of the immaturity of this particular solution. Eventually the community will come up with something that works really well.
One approach I have used is to make 2 completely separate templates, but try to make them as similar as possible so that it is easy to port changes from one to the other.
This seems like a bad idea. In an Ajax application, I believe that the server should be responsible for rendering all display text. This makes i18n easier, and concentrates everything in one place. The JavaScript should simply receive data from the server, with all display text already rendered, and put it in the appropriate DOM object.
In other words, I believe that in an Ajax application, the need for a JS template engine is itself a design smell.
The situation is different in exclusively client-side JS applications, of course.

On which browsers does the Rails 3 XSRF protection work?

There is a nice XSRF protection for the link_to method in Rails 3 that generates some custom HTML5 tags and a hash security key, and with a bunch of JavaScript it can send requests using the safer PUT/DELETE/POST methods instead of HTTP GET. That's very nice.
But I am unsure which browsers this works on. It definitely does not work when JavaScript is disabled, but does the browser need to support HTML5? AFAIK there are many browsers that implement some portions of HTML5, and as this technique needs only a custom HTML tag, it could work on older ones.
Is there any kind of document that describes this compatibility? I am interested in:
Chrome/Safari
Firefox
MSIE
Opera
Thanks
The links only contain that special HTML5-data if you want the link to be POST/PUT/DELETE. A regular link can only be a GET. JavaScript dependency is because of this, not because of the XSRF solution.
The custom HTML5 attributes (not tags) are just attributes that are named "data-...". Browsers did accept custom attributes before HTML5, but now there is a way to add custom attributes without jeopardizing your HTML5 validity.
So, for this list of browsers you provided: all working, down to IE6 (unless you disable JavaScript).
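For illustration, a non-GET link of this kind is an ordinary anchor carrying a `data-` attribute, which the accompanying JavaScript reads to build and submit a form. A rough plain-Ruby sketch of that kind of markup; `method_link` is a hypothetical helper, not the Rails `link_to` implementation, and the exact attributes Rails emits may differ by version:

```ruby
require "cgi"

# Builds an anchor whose intended HTTP method travels in a data- attribute.
# Hypothetical stand-in for what a helper like link_to produces.
def method_link(text, url, http_method)
  attrs = { "href" => url, "data-method" => http_method.to_s, "rel" => "nofollow" }
  attr_html = attrs.map { |name, value| %(#{name}="#{CGI.escapeHTML(value)}") }.join(" ")
  "<a #{attr_html}>#{CGI.escapeHTML(text)}</a>"
end

puts method_link("Delete post", "/posts/1", :delete)
# prints: <a href="/posts/1" data-method="delete" rel="nofollow">Delete post</a>
```

Without JavaScript the browser simply follows the href with a GET, which is why the dependency is on scripting rather than on HTML5 support.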

WMD editor sanitizing

I am trying to find ways to sanitize the input of the WMD editor.
Specifically, I am trying to make HTML tags available only inside the <code> tags that WMD generates. Is that possible?
My problem is that the following code is rendered as HTML, which is vulnerable to potential XSS attacks.
For example, <a onmouseover="alert(1)" href="#">read this!</a>
The above code renders normally both in preview mode and when saved to the database.
I notice that Stack Overflow doesn't seem to have this problem. The same code is just rendered as text.
I notice that the Stack Overflow team has shared their code in http://refactormycode.com/codes/333-sanitize-html. Do I really have to use C# in order to sanitize WMD to do this?
I ended up using HTML Purifier for this.
If you want to block bad scripts from WMD on the client side, take a look at my answer here:
Align the WMD editor's preview HTML with server-side HTML validation (e.g. no embedded JavaScript code).
It shows how to implement a client-side whitelist in the WMD editor to restrict WMD's preview pane HTML to known-safe HTML elements and known-safe HTML attributes. It does the validation after WMD generates its HTML, so even if there's a bug in the WMD editor's HTML generation which allows bad script to get through, the whitelist blocker will catch it. This code is based on StackOverflow.com's implementation of the same validation.
That said, you need server-side validation too (if you're using PHP, HTML Purifier is a good choice), because even if you fix the client, that doesn't prevent an attacker from simulating a browser and saving malicious markdown by POSTing it to your server. So client-side WMD previewer validation isn't strictly required, except to defend against an obscure case where an attacker manages to get compromised markdown onto the server and convinces a site moderator to edit the page. In that case, client-side previewer validation might prevent an attacker from taking over the entire site.
Also, doing client-side validation can be helpful because then you know that the same tags and HTML allowed by the client will also be allowed on the server. Make sure to sync the server-side whitelist with the client whitelist. StackOverflow's whitelist is here if you want an example.

Handling Rich Text in an MVC application

What are the best practices regarding working with rich text in a web application? I don't want to leave myself vulnerable to script attacks. Should the data be encoded going into the database and then decoded when displayed back to the user? Any advice on rich text editors that handle things like removing script tags or encoding the entered markup?
You should pick a whitelist of known tags and attributes, parse the user input as XML, and remove every tag or attribute that isn't in the whitelist.
EDIT: Note that if you allow hyperlinks or images, you have to validate the src and href attributes. I would recommend parsing the value using System.Uri, restricting the scheme to http, and perhaps the domain to your site (depending on what you want your users to be able to do).
Similar things have been done before; google them.
EDIT: For example, see this question
2nd EDIT:
You should not encode the data before putting it into the database. As long as you're using parameters (and if you aren't, you really should), the database will be completely unaffected by anything you put in it.
If your input sanitization is secure (see above), it won't make any difference if you encode it and decode it on the way, and if the sanitization isn't secure, encoding it won't help.
However, it probably is a good idea to run it through a standard XML parser, reject any input that doesn't parse, and use the formatted XML from the parser (as I mentioned above).
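The parse, whitelist and re-serialize approach described in this answer can be sketched as follows. This is a minimal illustration in Ruby using the REXML parser rather than the .NET XML classes; the `ALLOWED` table, `scrub!` and `sanitize_fragment` are hypothetical names, the tag list is an arbitrary example, and XML namespaces are not handled:

```ruby
require "rexml/document"

# Hypothetical allowlist: tag name => attribute names permitted on that tag.
# Links and images are left out on purpose, because their URLs need the
# separate src/href validation mentioned earlier.
ALLOWED = {
  "p" => [], "b" => [], "i" => [], "ul" => [], "li" => [],
  "abbr" => ["title"]
}.freeze

# Removes disallowed elements (together with their contents) and
# disallowed attributes, recursing into the elements that are kept.
def scrub!(element)
  element.elements.to_a.each do |child|
    allowed_attrs = ALLOWED[child.expanded_name]
    if allowed_attrs.nil?
      child.remove
    else
      child.attributes.keys.each do |name|
        child.delete_attribute(name) unless allowed_attrs.include?(name)
      end
      scrub!(child)
    end
  end
end

# Parses the fragment as XML, scrubs it, and returns the parser's own
# serialization. Raises REXML::ParseException for input that is not
# well-formed, which the caller should treat as a rejection.
def sanitize_fragment(markup)
  doc = REXML::Document.new("<root>#{markup}</root>")
  scrub!(doc.root)
  doc.root.children.map(&:to_s).join
end
```

Returning the parser's serialization, instead of the original string with pieces cut out, means the stored markup is exactly what the parser understood, which avoids mismatches between how the sanitizer and the browser read malformed input.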
3rd EDIT:
There are many rich text editors out there; for MVC, I think I'd recommend FCKEditor. It will escape input for you, but you must not rely on it exclusively, as an attacker can disable JavaScript or forge their own HTTP request. (You still need to validate the HTML on the server.) There are many rich editors for web forms (which, I assume, do server-side validation); there aren't any for MVC (yet).
The best option is to encode data that is sent to the user and not to encode it in the database. Also, as far as I know, ASP.NET prevents script attacks by validating input.
