WMD editor sanitizing - editor

I am trying to find ways to sanitize the input of the WMD editor.
Specifically, I am trying to make HTML tags only available in the <code>tags that WMD generates. Is that possible
My problem is that the following code is rendered as HTML which is vunerable to potential XSS attacks.
For example, <a onmouseover="alert(1)" href="#">read this!</a>
The above code renders normally both in preview mode and when saved to the database.
I notice that Stack Overflow doesn't seem to have this problem. The same code is just rendered as text.
I notice that the Stack Overflow team has shared their code in http://refactormycode.com/codes/333-sanitize-html. Do I really have to use C# in order to sanitize WMD to do this?

I ended up using HTML Purifier for this.

If you want to block bad scripts from WMD on the client side, take a look at my answer here:
Align the WMD editor's preview HTML with server-side HTML validation (e.g. no embedded JavaScript code).
It shows how to implement a client-side whitelist in the WMD editor to restrict WMD's preview pane HTML to known-safe HTML elements and known-safe HTML attributes. It does the validation after WMD geneates its HTML, so even if there's a bug in the WMD editor's HTML generation which allows bad script to get through, the whitelist blocker will catch it. This code is based on StackOverflow.com's implementation of the same validation.
That said, you also need server-side validation too (If you're using PHP, HTML Purifier is a good choice), because even if you fix the client, that doesn't prevent an attacker from simulating a browser and saving malicious markdown by POST-ing it to your server. So doing client-side WMD previewer validation isn't actually required, except to defend against an obscure case where an attacker manages to get compromised markdown onto the server, and convinces a site moderator to edit the page. In that case, client WMD previewer validation might prevent an attacker from taking over the entire site.
Also, doing client-side validation can be helpful because then you know that the same tags and HTML allowed by the client will also be allowed on the server. Make sure to sync the server-side whitelist with the client whitelist. StackOverflow's whitelist is here if you want an example.

Related

XSS Attack prevention in C#

I've a Web API project which is consumed by an MVC project. The MVC project has a fair amount of user inputs which are displayed as output on the web page.
Now, I want to protect my site from XSS attacks. I've read about Microsoft's AntiXss library, input validations, output filtering etc. But my question is, How do I apply this to my project. Where to put input validations, how to filter my output, how do i sanitize user data, do I need to sanitize the data in APIs also or just in MVC before I send it to the APIs, and if yes, then how, where to use AntiXss library, in MVC or in web API, and how etc.
The answer depends on how exactly user input makes its way into the page DOM in the browser.
If the MVC application generates cshtml pages (with Razor), you need to implement output encoding there, in cshtml files. Note that AntiXSS as a separate library is now deprecated, it's now in the System.Web.Security.AntiXss namespace by default. You need to encode all output according to the context that they get written into (most importantly, you need to encode any input that's written in a Javascript context, be it a script tag, an event attribute like onclick, the first character of a href for an a tag, etc). For plain html output (text between tags) Razor already provides html encoding by default, so it's ok to just do <div>#myVar</div>.
If your frontend consumes something like a JSON API, then you probably have some kind of a client side template engine (Knockout, etc). In that case, it's reasonably safe to send data as received from the user back to the client with an application/json content type (that's actually very important). Then you have to carefully select binding methods to always bind user input as text and not as html to the page elements. This practically means things like using Knockout's text binding instead of html or using jQuery's .text() method instead of .html(), etc.
Please note that a full tutorial on XSS prevention would be way longer than an answer here, so this answer only highlights some high level things and the general way this should be done to prevent XSS.

Securely render arbitrary user-uploaded content (from a WSYWIG editor)

I have a site which allows an admin to edit a section of one page of the site with arbitrary HTML (via a WSYWIG editor), and I want to figure out a way to serve this arbitrary HTML securely to other users.
The basic intent is to eliminate any possibility of XSS errors (i.e, a user getting their cookie stolen or something).I've seen that the following subset of HTML tags are unsafe to allow users to enter: iframe, frame, embed, style, video, object, etc.However, filtering out iFrames or style tags is not feasible for my use case because admins need to be able to upload youtube videos and style the text.
I've also heard that Content Management Systems sometimes serve user-uploaded content from a separate domain (e.g, content.mysite.com) so that whatever code may run as a result of the user-uploaded content can't steal my site's cookie (e.g, app.mysite.com) because of the same origin policy.However, this seems like kind of an overblown solution for me since my site is not a CMS,there's just one part of one page (editable only by admins) which allows for arbitrary customization.
So, is there a way to go about this?Would embedding the arbitrary content in an iframe keep users safe? Thanks in advance!
Also of potential relevance: the framework I'm using is Ruby on Rails.
An html editor on the client is not straightforward to protect against XSS at all. As you say, serving such content from a different domain may mitigate the risk, but gives way to other questions (like for example how will you authenticate and authorize users on the other domain to prevent downloading any user's content).
Also whitelist validation of tags and attributes is usually not feasible. You could have a whitelist of tags and attributes of those tags and anything else could be removed from html code. The problem with this is that the html editor will most likely want to use the style attribute, and styles are vulnerable to XSS, at least in older browsers. An editor usually needs to be able to save a link (<a href="">), and that's also vulnerable to XSS, for example <a href="javascript: alert(1)">.
One approach you may take and one that actually works is Google Caja. It can remove all Javascript from HTML, Javascript and CSS so that it's safe to include in your page. It also has a client-side sanitizer written in Javascript that can be used by itself, and may provide enough protection. Another client-side sanitizer is DOMPurify. Caja at its best is a server-side solution too, but that's somewhat harder to install and maintain. The client-side part of it, and also DOMPurify are slightly less secure as they are all on the client, but can very well provide adequate protection in many situations. (Note that DOMPurify does not work in older browsers and does not sanitize CSS code, which is OK-ish for recent browsers.)
With these client-side solutions, you let the user save any HTML with any injected Javascript to the database if the user (attacker) is smart enough, but when displaying the html content, you send it to the client in a way that's not vulnerable to XSS (like for example in an AJAX response with a text/javascript content type), and then run it through the client-side sanitizer in the browser before inserting into the page DOM (giving it to the editor control itself).
If your editor allows for the right hooks, you can also do this for any preview functionality, sanitizing the HTML entered before switching to preview. This is important, because while arguably a smaller risk, it is still considered XSS if the user can inject Javascript in the editor and then run it in preview, without ever sending it to the server. This is called DOM XSS, and it is one reason you may want to do sanitization on the client.

Allowing only certain HTML tags as user input

My site allows site-users to write blog-posts
class BlogPost
{
[AllowHtml]
public string Content;
}
The site is created using a MVC5 Internet application template and uses bootstrap 3 for it's CSS. So I decided to use http://jhollingworth.github.io/bootstrap-wysihtml5 to take care of all the JavaScript Part of a Rich Text Editor.
It works like a charm. But in order to make the POST happen, I had to add the [AllowHtml] attribute as in the code above. So now I'm scared of dangerous stuff that can get into the database and be in-turn displayed to all users.
I tried giving values like <script>alert("What's up?")</script> etc in the form and it seemed to be fine... the text was displayed exactly the same way (<script> became <script>. But this conversion seemed to be done by the javascript plugin I used.
So I used fiddler to compose a POST request with the same script tag and this time, the page actually executed the JavaScript code.
Is there any way I can figure out vulnerable input like <script> and even Link...?
Unfortunately, you have to sanitize the HTML yourself. See these on how people did it:
How to sanitize input from MCE in ASP.NET? - whitelist using Html Agility Pack
.NET HTML Sanitation for rich HTML Input - blacklist using Html Agility Pack
An alternative to accepting HTML is to accept markdown or BBCode instead. Both of them are widely used (markdown is used by stackoverflow!) and eliminate the need to sanitize the input. There are rich editors available too.
Edit
I found that Microsoft Web Protection Library can sanitize HTML input
through AntiXss.GetSafeHtml and AntiXss.GetSafeHtmlFragment.
Documentation is really poor though and seems like you can't configure which tags are valid.
I faced the same problem sanitizing wysihtml5 content on the server side. I was rather charmed by how wysihtml5 performed client side sanitation and implemented this using Html Agility Pack: HtmlRuleSanitizer on Github
Also available as NuGet package.
The reason for not using Microsoft's AntiXss is that it's not possible to enforce more detailed rules like what to do with tags. This results in tags being completely deleted when it for example would make sense to preserve the textual content. In addition I wanted to have a white listing approach on everything (CSS, tags and attributes).

Handling Rich Text in an MVC application

What are the best practices regarding working with rich text in a web application? I don't want to leave myself vulnerable to script attacks. Should the data be encoded going into the database and then decoded when displayed back to the user? Any advice on rich text editor's that handle things like removing script tags or encoding the entered markup?
You should pick a whitelist of known tags and attributes, parse the user input as XML, and remove every tag or attribute that isn't in the whitelist.
EDIT: Note that if you allow hyperlinks or images, you have to validate the src and href tags. I would recommend parsing it using System.Uri, restricting the scheme to http, and perhaps the domain to your site (depending what you want your users to be able to do).
Similar things have been done before; google them.
EDIT: For example, see this question
2nd EDIT:
You should not encode the data before putting it into the database. As long as you're using parameters (and if you aren't, you really should), the database will be completely unaffected by anything you put in it.
If your input sanitization is secure (see above), it won't make any difference if you encode it and decode it on the way, and if the sanitization isn't secure, encoding it won't help.
However, it probably is a good idea to run it through a standard XML parser, reject any input that doesn't parse, and use the formatted XML from the parser (as I mentioned above)
3rd EDIT:
There are many rich text editors out there; for MVC, I think I'd recommend FCKEditor. It will escape input for you, but you must not rely on it exclusively as an attacker can disable JavaScript or forge his own HTTP request. (You still need to validate the HTML on the server). There are many rich editors for web forms (which, I assume, do server-side validation); there aren't any for MVC (yet)
The best option is to encode data which is send to user and do not encode it in database.Also far as I know asp.net prevent script attacks by validating input.

Javascript Rich Text Editor and associated class to filter and clean the input?

I realise there are several rich text editors for jQuery but I cannot find any that have an associated class that does the filtering and cleaning required to accept the input into a database.
Does such a class exist?
I am particularly interested for a PHP library, but .NET would be interesting too.
If you would use FCKeditor it would allow to you get clean HTML or XHTML (editor.GetXHTML()) which you could then write into DB.
Actually it's not that much important what you write into DB, because usially you write original HTML (you can always strip from it saspicious tags if needed). In order to prevent XSS attacks it is essential to properly encode content before displaying it on the web-page. For .NET there is AntiXSS library for that at CodePlex.com
For PHP you may want to look at the following libraries:
Zend_Filter
HTML Sanitizer
PHP Input Filter
And also this article:
Avoiding XSS security attacks to sites that use HTML editors

Resources