In Firefox extension we use parseFragment (documentation) to parse a string of HTML (received from 3rd party server) into a sanitized DocumentFragment as it required by Mozilla. The only problem, the parser removes all attributes we need, for example, class attribute.
Is it possible somehow to keep class attributes while parsing HTML with parseFragment?
P.S. I know that in Gecko 14.0 they replaced this function with another which supports sanitizing parameters. But what to do with Gecko < 14.0?
No, the whitelist is hardcoded and cannot be adjusted. However, the class attribute is in the whitelist and should be kept, you probably meant the style attribute? If you need a customized behavior you will have to use a different solution (like DOMParser which can parse HTML documents in Firefox 12).
As to older Firefox versions, you can parse XHTML data with DOMParser there. If you really have HTML then I am only aware of one way to parse it without immediately inserting it into a document (which might cause various security issues): range.createContextualFragment(). You need an HTML document for that, if you don't have one - a hidden <iframe> loading about:blank will do as well. Here is how it works:
// Get the HTML document
var doc = document.getElementById("dummyFrame").contentDocument;
// Parse data
var fragment = doc.createRange().createContextualFragment(htmlData);
// Sanitize it
sanitizeData(fragment);
Here sanitizing the data is your own responsibility. You probably want to base your sanitization on Mozilla's whitelist that I linked to above - remove all tags and attributes that are not on that list, also make sure to check the links. The style attribute is a special case: it used to be insecure but IMHO no longer is given than -moz-binding isn't supported on the web any more.
Related
I'm not clear of when (or if) I should use multiple Grails encodeAsXXX calls.
This reference says you need to encodeAsURL and then encodeAsJavaScript: http://grailsrocks.com/blog/2013/4/19/can-i-pwn-your-grails-application
It also says you need to encodeAsURL and then encodeAsHTML, I don't understand why this is necessary in the case shown but not all the time?
Are there other cases I should me using multiple chained encoders?
If I'm rendering a URL to a HTML attribute should I encodeAsURL then encodeAsHTML?
If I'm rendering a URL to a JavaScript variable sent as part of a HTML document (via a SCRIPT element) should I encodeAsURL, encodeAsJavaScript then encodeAsHTML?
If I'm rendering a string to a JavaScript variable sent as part of a HTML document should I encodeAsJavaScript then encodeAsHTML?
The official docs - https://docs.grails.org/latest/guide/security.html - don't show any examples of multiple chained encoders.
I can't see how I can understand what to do here except by finding the source for all the encoders and looking at what they encode and what's valid on the receiving end - but I figure it shouldn't be that hard for a developer and there is probably something simple I'm missing or some instructions I haven't found.
FWIW, I think the encoders I'm talking about are these ones:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/JavaScriptUtils.html#javaScriptEscape-java.lang.String-
https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html#encode(java.lang.String,%20java.lang.String)
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html#htmlEscape-java.lang.String-
.
It is certainly important to always consider XSS but in reading your question I think you are overestimating what you need to do. As long as you're using Grails 2.3 or higher and your grails.views.default.codec is set to html which it will be by default, everything rendered in your GSP with ${} will be escaped properly for you.
It is only when you are intentionally bypassing the escaping, such as if you need to get sanitized user input back into valid JavaScript within your GSP for some reason, that you would need to use the encodeAsXXX methods or similar.
I would argue (and the article makes a mention of this as well) that this should raise a smell anyway, as you probably should have that JavaScript encapsulated in a different file or TagLib where the escaping is handled.
Bottom line, use the encoding methods only if you are overriding the default HTML encoding, otherwise ${} handles it for you.
I'm rendering content using Backbone in Rails. Some of the json properties i'm getting from the models will be html attributes, some of them might be used inside the javascript and others will be inserted between html elements. All of these require different escaping mechanisms, how do people deal with this?
In our project we are using doT templates which (as most other) allow for interpolation with encoding ({{! ... }}). You could also try to encode all data and strip any possible javascripts server side when data is saved to be 100% sure you won't get anything malicious
Additionally if you are using jquery methods remember to use text method to insert data rather then html as text will automatically encode it.
And I really recommend the doT! It's super fast and we've managed to make it play really nicely with requirejs
I have in my browser.xul code,what I am tyring to is to fetch data from an html file and to insert it into my div element.
I am trying to use div.innerHTML but I am getting an exception:
Component returned failure code: 0x804e03f7
[nsIDOMNSHTMLElement.innerHTML]
I tried to parse the HTML using Components.interfaces.nsIScriptableUnescapeHTML and to append the parsed html into my div but my problem is that style(attribute and tag) and script isn`t parsed.
First a warning: if your HTML data comes from the web then you are trying to build a security hole into your extension. HTML code from the web should never be trusted (even when coming from your own web server and via HTTPS) and you should really use nsIScriptableUnescapeHTML. Styles should be part of your extension, using styles from the web isn't safe. For more information: https://developer.mozilla.org/En/Displaying_web_content_in_an_extension_without_security_issues
As to your problem, this error code is NS_ERROR_HTMLPARSER_STOPPARSING which seems to mean a parsing error. I guess that you are trying to feed it regular HTML code rather than XHTML (which would be XML-compliant). Either way, a better way to parse XHTML code would be DOMParser, this gives you a document that you can then insert into the right place.
If the point is really to parse HTML code (not XHTML) then you have two options. One is using an <iframe> element and displaying your data there. You can generate a data: URL from your HTML data:
frame.src = "data:text/html;charset=utf-8," + encodeURIComponent(htmlData);
If you don't want to display the data in a frame you will still need a frame (can be hidden) that has an HTML document loaded (can be about:blank). You then use Range.createContextualFragment() to parse your HTML string:
var range = frame.contentDocument.createRange();
range.selectNode(frame.contentDocument.documentElement);
var fragment = range.createContextualFragment(htmlData);
XML documents don't have innerHTML, and nsIScriptableUnescapeHTML is one way to get the html parsed but it's designed for uses where the HTML might not be safe; as you've found out it throws away the script nodes (and a few other things).
There are a couple of alternatives, however. You can use the responseXML property, although this may be suboptimal unless you're receiving XHTML content.
You could also use an iframe. It may seem old-fashioned, but an iframe's job is to take a url (the src property) and render the content it receives, which necessarily means parsing it and building a DOM. In general, when an extension running as chrome does this, it will have to take care not to give the remote content the same chrome privilages. Luckily that's easily managed; just put type="content" on the iframe. However, since you're looking to import the DOM into your XUL document wholesale, you must have already ensured that this remote content will always be safe. You're evidently using an HTTPS connection, and you've taken extra care to verify the identity of the server by making sure it sends the right certificate. You've also verified that the server hasn't been hacked and isn't delivering malicious content.
I'm writing a restful XML API for a university assignment, the spec requires no HTML frontend.
There doesn't seem to be any documentation (or guessable functionality) regarding how to change the default format? Whilst thus far I have created all templates as ...Success.xml.php it would be easier to just use the regular ones and set this globally; I really expected this functionality to be configurable from YAML.. yet I have found some hard coded references to the HTML format.
The main issue I'm encountering is that part of the assessment is returning a 404 in a certain way (not as a 404 :/), but importantly it must always return XML, and the default setup of a missing route is a HTML 404 not XML (so it only works when I use forward404 from an action running via a XML route.
So in summary, is there a way to do this / what class(es) do I have to override?
Try putting this in factories.yml
all:
request:
class: sfWebRequest
param:
default_format: xml
That will still need the template names changing though. It will just mean that urls that don't specify a format will revert to xml instead of html.
You can subclass sfPHPView and override the initialise method to affect this (copy paste the initialise method from sfView) - the lines like this need changing:
if ('html' != $format)
You then need to change the view class used ... try this:
http://mirmodynamics.com/post/2009/02/23/symfony%3A-use-your-own-View-class
My website is XHTML Transitional compliant except for one thing: the & (ampersand) in the URL are written as it is, instead of &
That is, all the URLs in my pages are usually like this:
Foo
But XHTML validator generates this error:
cannot generate system identifier for general entity "y"
... and it wants the URL to be written like this:
Foo
The problem is that Internet Explorer and Firefox don't handle the URL correctly and ignore the y parameter. How can I make this link work and validate correctly?
It seems to me that it is impossible to write XHTML pages if the browsers don't work with strict encoded XHTML URLs.
Do you want to see in action? See the difference between these two links (copy and paste them as they are):
http://stackoverflow.com/search?q=ff&sort=newest
and
http://stackoverflow.com/search?q=ff&sort=newest
I have just tried this. What you attempted to do is correct. In HTML if you are writing a link the & characters should be encoded as & You would only encode the & as %26 if you wanted a parameter value to contain an ampersand. I just wrote a simple HTML page that contained a link: Click me
and it worked fine: default2.aspx received the parameters intended and the source passed validation.
The encoding of & as & is required in HTML, not in the link. When the browser sees the & in the HTML source for a link it will interpret it as an ampersand and the link target will be as intended. If you paste a URL into your browser address bar it does not expect it to be HTML and does not try to interpret any HTML encoding that it may contain. This is why your example links that you suggest we should copy/paste into a browser don't work and why we wouldn't expect them to work.
If you post a bit more of your actual code we might be able to see what you have done wrong, but you appear to be heading the right direction by using & in your anchor tags.
It was my fault: the hyperlink control already encoded &, so my URL http://foo?x=1&y=2 was encoded to http://foo?x=1&y=2
Normally the & inside the URL is correctly handled by browsers, as you stated.
You could use & instead of & in your URL within your page.
That should allow it to be validated as strict XHTML...
Foo
Note, if used by an ASP.NET Request.QueryString function, the query string doesn't use XML encoding; it uses URL encoding:
/mypath/mypage?b=%26stuff
So you need to provide a function translating '&' into %26.
Note: in that case, Server.URLEncode(”neetu & geetu”), which would produce neetu+%26+geetu, is not what you want, since you need to translate & into %26, not just '&'. You must add a replace() call applied to URLEncode result, in order to replace '%26amp;' by '%26'.
To be even more thorough: use &, a numeric character reference.
Because & is a character entity reference:
Character entity references are defined in the markup language
definition. This means, for example, that for HTML only a specific
range of characters (defined by the HTML specification) can be
represented as character entity references (and that includes only a
small subset of the Unicode range).
That's coming from the wise people at W3C (read this for more).
Of course, this is not a very big deal, but the suggestion of W3C is that the numeric one will be valid and useable everywhere and always, while the named one is 'fine' for HTML but nothing more.
The problem is worse than you think - try it in Safari. & gets converted to & and the hash ends the URL.
The correct answer is to not output XHTML - there's no reason that justifies spending more time on development and alienating Mac users.