HTML to RTF to HTML = HTML Without scripts? - asp.net-mvc

I'd like to implement a WYSIWYG HTML editor in my web application. I looked at Microsoft's AntiXSS library on CodePlex for cross-site scripting protection, but the feedback on it seems really bad.
The alternative I have in mind is to convert the editor's input to RTF and then back to HTML, and only then store it in the database so it can be served later. I understand that this is an incredibly inefficient method, but the question is whether it guarantees that no scripts survive. In other words, can this give me complete XSS protection?

Related

XSS Attack prevention in C#

I've a Web API project which is consumed by an MVC project. The MVC project has a fair amount of user inputs which are displayed as output on the web page.
Now I want to protect my site from XSS attacks. I've read about Microsoft's AntiXss library, input validation, output filtering, etc., but my question is how to apply this to my project. Where do I put input validation? How do I filter my output and sanitize user data? Do I need to sanitize the data in the APIs as well, or just in MVC before I send it to the APIs? And where should I use the AntiXss library, in MVC or in Web API, and how?
The answer depends on how exactly user input makes its way into the page DOM in the browser.
If the MVC application generates cshtml pages (with Razor), you need to implement output encoding there, in the cshtml files. Note that AntiXSS as a separate library is now deprecated; it now ships with the framework in the System.Web.Security.AntiXss namespace. You need to encode all output according to the context it is written into. Most importantly, you need to encode any input that is written into a JavaScript context, be it a script tag, an event attribute like onclick, the start of an href value on an a tag, etc. For plain HTML output (text between tags) Razor already applies HTML encoding by default, so it's OK to just write <div>@myVar</div>.
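To make "encode according to the context" concrete, here is a minimal sketch in JavaScript. It is not the AntiXss API; the function names encodeHtml and encodeJsString are made up for illustration, and a real project would normally rely on the framework's encoders instead.

```javascript
// Illustrative encoders only (hypothetical names, not a library API).

// For text between tags and for quoted attribute values.
function encodeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// For a value placed inside a quoted JavaScript string literal:
// everything except ASCII letters and digits becomes a \uXXXX escape.
function encodeJsString(value) {
  return String(value).replace(/[^A-Za-z0-9]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const userInput = `</script><script>alert("x")</script>`;

// HTML context: the markup is shown as text instead of being parsed.
const htmlContext = "<div>" + encodeHtml(userInput) + "</div>";

// JavaScript context: HTML encoding would be the wrong tool here.
const jsContext =
  "<script>var name = '" + encodeJsString(userInput) + "';</script>";
```

The point of having two functions is that the same input needs different treatment depending on where it lands; using the HTML encoder inside a script block, or the other way round, does not give reliable protection.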
If your frontend consumes something like a JSON API, then you probably have some kind of a client side template engine (Knockout, etc). In that case, it's reasonably safe to send data as received from the user back to the client with an application/json content type (that's actually very important). Then you have to carefully select binding methods to always bind user input as text and not as html to the page elements. This practically means things like using Knockout's text binding instead of html or using jQuery's .text() method instead of .html(), etc.
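As a small sketch of the "bind as text, not as HTML" rule, assuming plain DOM code rather than Knockout or jQuery (the function name showComment and the payload are hypothetical):

```javascript
// Bind user-supplied data as text, never as markup.
// `element` is expected to be a DOM element, e.g. document.getElementById("comment").
function showComment(element, comment) {
  // textContent inserts the string as a text node; the browser does not parse it as HTML.
  element.textContent = comment;
  // Avoid: element.innerHTML = comment;  // would parse attacker-controlled markup
}

// Data as it might arrive from a JSON API (made-up payload for illustration).
const payload = JSON.parse('{"comment": "<img src=x onerror=alert(1)>"}');

// In a browser:
//   showComment(document.getElementById("comment"), payload.comment);
```

The jQuery and Knockout equivalents are the .text() method and the text binding mentioned above; in each case the string stays data and is never handed to the HTML parser.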
Please note that a full tutorial on XSS prevention would be way longer than an answer here, so this answer only highlights some high level things and the general way this should be done to prevent XSS.

Allowing only certain HTML tags as user input

My site allows site-users to write blog-posts
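public class BlogPost
{
    // [AllowHtml] (System.Web.Mvc) turns off request validation for this property only.
    [AllowHtml]
    public string Content { get; set; }
}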
The site is created using an MVC5 Internet application template and uses Bootstrap 3 for its CSS. So I decided to use http://jhollingworth.github.io/bootstrap-wysihtml5 to take care of the JavaScript part of a rich text editor.
It works like a charm. But in order to make the POST happen, I had to add the [AllowHtml] attribute as in the code above. So now I'm scared of dangerous stuff that can get into the database and be in-turn displayed to all users.
I tried giving values like <script>alert("What's up?")</script> in the form and it seemed to be fine: the text was displayed exactly as typed (<script> became &lt;script&gt;). But this conversion seemed to be done by the JavaScript plugin I used.
So I used Fiddler to compose a POST request with the same script tag, and this time the page actually executed the JavaScript code.
Is there any way I can figure out vulnerable input like <script> and even Link...?
Unfortunately, you have to sanitize the HTML yourself. See these on how people did it:
How to sanitize input from MCE in ASP.NET? - whitelist using Html Agility Pack
.NET HTML Sanitation for rich HTML Input - blacklist using Html Agility Pack
An alternative to accepting HTML is to accept Markdown or BBCode instead. Both of them are widely used (Markdown is used by Stack Overflow!) and largely avoid the need to sanitize raw HTML input, provided the renderer does not pass inline HTML through unchecked. There are rich editors available too.
Edit
I found that the Microsoft Web Protection Library can sanitize HTML input through AntiXss.GetSafeHtml and AntiXss.GetSafeHtmlFragment.
The documentation is really poor, though, and it seems you can't configure which tags are valid.
I faced the same problem sanitizing wysihtml5 content on the server side. I was rather charmed by how wysihtml5 performs client-side sanitization, and implemented the same idea using Html Agility Pack: HtmlRuleSanitizer on GitHub
Also available as NuGet package.
The reason for not using Microsoft's AntiXss is that it's not possible to enforce more detailed rules, such as what to do with individual tags. As a result, tags are deleted completely even in cases where it would make sense to preserve their textual content. In addition, I wanted a whitelisting approach for everything (CSS, tags and attributes).
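To illustrate the whitelisting idea while keeping the text of rejected tags, here is a deliberately simple JavaScript sketch. It is not how HtmlRuleSanitizer or wysihtml5 work (both parse the HTML and apply per-tag rules); it swaps in a cruder technique: escape everything, then restore only exact, attribute-free tags from a list. The tag list and function names are assumptions chosen for the example.

```javascript
// Attribute-free tags to allow (an illustrative choice, not wysihtml5's rule set).
const ALLOWED_TAGS = ["b", "i", "em", "strong", "p", "ul", "ol", "li", "blockquote", "code", "pre"];

function encodeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape all markup, then turn back only exact allowlisted tags such as <b> or </p>.
// Anything with attributes, and any tag not on the list, stays escaped and is
// therefore displayed as text rather than interpreted.
function sanitizeByAllowlist(html) {
  const escaped = encodeHtml(html);
  const pattern = new RegExp("&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;", "gi");
  return escaped.replace(pattern, (match, slash, tag) => "<" + slash + tag.toLowerCase() + ">");
}

const clean = sanitizeByAllowlist("<p>Hi <b>there</b><script>alert(1)</script></p>");
// clean === "<p>Hi <b>there</b>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
```

This trades flexibility for simplicity: links, images and CSS are not supported at all, and unbalanced tags are not repaired, which is exactly the kind of detailed rule a parser-based sanitizer is needed for.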

Importance of XML, XHTML, HTML5, RAILS, PYLONS

Say I am creating a web project. I know where I would use HTML, CSS, JavaScript, and PHP (WAMP).
Now, where and why would I use XML, XHTML/HTML5 (wow, it says vector graphics are possible with HTML5?), and Rails or Pylons?
I'm sorry if this looks like a n00b question. I'm not asking how to learn, or what it is - just where and why in a web project would I - if I have to - use it.
You would use XHTML or HTML5 instead of HTML, because XHTML and HTML5 are specific versions of HTML. HTML5 is the newest one.
You would use Ruby or Python instead of PHP, because you prefer one of them over the other ones.
You would use XML when talking to a foreign web service like Twitter, because you need to serialize data in some way. You can also use JSON instead of XML.

WMD editor sanitizing

I am trying to find ways to sanitize the input of the WMD editor.
Specifically, I am trying to make HTML tags available only inside the <code> tags that WMD generates. Is that possible?
My problem is that the following code is rendered as HTML, which is vulnerable to potential XSS attacks.
For example, <a onmouseover="alert(1)" href="#">read this!</a>
The above code renders normally both in preview mode and when saved to the database.
I notice that Stack Overflow doesn't seem to have this problem. The same code is just rendered as text.
I notice that the Stack Overflow team has shared their code at http://refactormycode.com/codes/333-sanitize-html. Do I really have to use C# to sanitize WMD's output?
I ended up using HTML Purifier for this.
If you want to block bad scripts from WMD on the client side, take a look at my answer here:
Align the WMD editor's preview HTML with server-side HTML validation (e.g. no embedded JavaScript code).
It shows how to implement a client-side whitelist in the WMD editor to restrict WMD's preview pane HTML to known-safe HTML elements and known-safe HTML attributes. It does the validation after WMD generates its HTML, so even if there's a bug in the WMD editor's HTML generation which allows bad script to get through, the whitelist blocker will catch it. This code is based on Stack Overflow's implementation of the same validation.
That said, you also need server-side validation (if you're using PHP, HTML Purifier is a good choice), because even if you fix the client, that doesn't prevent an attacker from simulating a browser and saving malicious markdown by POST-ing it to your server. So client-side WMD previewer validation isn't actually required, except to defend against an obscure case where an attacker manages to get compromised markdown onto the server and convinces a site moderator to edit the page. In that case, client WMD previewer validation might prevent an attacker from taking over the entire site.
Also, doing client-side validation can be helpful because then you know that the same tags and HTML allowed by the client will also be allowed on the server. Make sure to sync the server-side whitelist with the client whitelist. Stack Overflow's whitelist is here if you want an example.
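One way to keep the two whitelists in sync is to define the patterns once and load the same definition in the preview code and on the server. The following JavaScript sketch assumes a regex-based filter in the spirit of the approach described above; the patterns are illustrative and are not Stack Overflow's actual whitelist, and the names TAG, SIMPLE, ANCHOR and filterTags are invented for the example.

```javascript
// A single whitelist definition, intended to be shared by client preview and server.
const TAG = /<[^>]*(>|$)/g; // every "<...>" run, or "<..." up to the end of input

// Attribute-free formatting tags, plus <br> and <hr>.
const SIMPLE = /^<\/?(b|blockquote|code|em|i|li|ol|p|pre|strong|ul)>$|^<[bh]r\s?\/?>$/i;

// Links: only an http(s) href and an optional title; no other attributes.
const ANCHOR = /^<a\shref="https?:\/\/[-a-z0-9+&@#\/%?=~_|!:,.;()]+"(\stitle="[^"<>]+")?\s?>$|^<\/a>$/i;

// Remove every tag that does not match one of the whitelist patterns exactly.
function filterTags(html) {
  return html.replace(TAG, (tag) => (SIMPLE.test(tag) || ANCHOR.test(tag) ? tag : ""));
}

// The example from the question: the tag carrying onmouseover is dropped.
const filtered = filterTags('<a onmouseover="alert(1)" href="#">read this!</a>');
// filtered === "read this!</a>"
```

Note that the closing tag survives without its opener, so a real implementation would also need a tag-balancing pass after filtering, and a stray "<" in ordinary text is treated as the start of a tag and removed.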

Javascript Rich Text Editor and associated class to filter and clean the input?

I realise there are several rich text editors for jQuery but I cannot find any that have an associated class that does the filtering and cleaning required to accept the input into a database.
Does such a class exist?
I am particularly interested for a PHP library, but .NET would be interesting too.
If you use FCKeditor, it lets you get clean HTML or XHTML (editor.GetXHTML()), which you can then write to the DB.
Actually, what you write to the DB is not that important, because usually you store the original HTML (you can always strip suspicious tags from it if needed). To prevent XSS attacks, it is essential to properly encode content before displaying it on the web page. For .NET there is the AntiXSS library for that at CodePlex.com
For PHP you may want to look at the following libraries:
Zend_Filter
HTML Sanitizer
PHP Input Filter
And also this article:
Avoiding XSS security attacks to sites that use HTML editors
