Does HTML Encoding have any cons? - asp.net-mvc

I am developing a project on the ASP.NET MVC framework. All files and charsets are UTF-8. I'm using model binding, and in some of my models the display property includes accented characters or single/double quotes.
As the Razor engine automatically encodes the output of helpers (e.g. DisplayNameFor), the accented characters and quotes are encoded.
I could use custom helpers to render without encoding, but I would like to know whether HTML encoding has any cons. I'm using UTF-8 and I want to render the text "Öger's tours" as it is. However, it is rendered as "&#214;ger&#39;s tours". I'm asking about this scenario.
(I've heard that search engine indexing performs better without encoded text. But I don't know why.)
Thank you.

The only characters that must be encoded as entities are <, which starts the opening and closing tags of HTML elements (&lt;); the & character, which otherwise starts an HTML entity (&amp;); and, within attributes enclosed in double quotes, " (&quot;), to prevent terminating the attribute prematurely. It is also a good idea to use the entity for > (&gt;) to avoid confusing parsers.
For everything else it is enough to declare the proper charset and actually save the HTML file in it. In particular, there is no need to encode ' outside attribute values enclosed in single quotes, nor umlauts, ligatures or other non-ASCII characters, if the HTML file's charset supports them.
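A minimal Python sketch of this point. Python's standard html module stands in for the Razor encoder here, which is an assumption made purely for illustration. It suggests that only the markup-significant characters need escaping, and that a numeric reference for Ö decodes to the same text as the raw character.

```python
import html

text = "Öger's tours"

# Escape only &, < and > (enough for element content in a UTF-8 page).
minimal = html.escape(text, quote=False)

# Encode every non-ASCII character as a numeric character reference,
# roughly what an aggressive encoder does.
aggressive = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(minimal)     # Öger's tours
print(aggressive)  # &#214;ger's tours

# Both forms decode to the same text, so they should render identically.
same_rendering = html.unescape(minimal) == html.unescape(aggressive) == text
print(same_rendering)

# Inside a double-quoted attribute, the quote characters are escaped too.
attr = html.escape('say "hi" & <go>', quote=True)
print(attr)  # say &quot;hi&quot; &amp; &lt;go&gt;
```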

I found a solution: use the AntiXSS library as the Razor encoderType. This answer describes it well: Special characters in html output
The default Razor encoder encodes accented characters, whereas the AntiXSS encoder does not, so accented characters are rendered as they are.

Related

Why do ampersands (&) need to be encoded in JSF? Is there a way around this?

I often have Javascript on my JSF XHTML pages that has && in it, which I end up having to encode as &amp;&amp;
For example, when I place the following in my JSF XHTML page file:
I am an & sign
I get the error:
The entity name must immediately follow the '&' in the entity reference
One way to fix this appears to be to change the '&' to &amp;, which I find less desirable than just writing '&'.
It also appears that for cases where I use the '&' in Javascript, I can wrap the Javascript in CDATA tags; when wrapped in CDATA tags, I can then write '&' without having to escape it as &amp;, which is a good workaround to be able to have more readable Javascript code on my page.
But what happens when I want to use the literal '&' elsewhere on the page when it is not within <script> tags and therefore cannot as easily wrap the code in CDATA tags? Must I always escape '&' as &amp; for these cases?
Note trying to use 's ability to escape values and do not seem to be able to fix the issue
Facelets is an XML-based view technology. Any characters which have special treatment by the XML parser need to be XML-escaped when the intent is to present them literally. That covers, among others, < and &. The < indicates the start of an XML tag like so <foo> and the & indicates the start of an XML entity like so &amp;. The < must be escaped as &lt; and the & as &amp;.
Not escaping them in Facelets would result in the following exception for <
javax.faces.view.facelets.FaceletException: Error Parsing /test.xhtml: Error Traced[line: 42] The content of elements must consist of well-formed character data or markup.
and the following one for &
javax.faces.view.facelets.FaceletException: Error Parsing /test.xhtml: Error Traced[line: 42] The entity name must immediately follow the '&' in the entity reference.
This is not specifically related to JavaScript, this applies to the entire view, including "plain text". Those characters just happen to be JavaScript operators as well. There's no way to go around this, that's just how XML is specified. In JavaScript, there's however one more way to avoid escaping or using CDATA blocks: just put that JS code in its own .js file which you load by <script> or <h:outputScript>.
In EL, there is also the && operator, which needs to be escaped as &amp;&amp; as well, but fortunately there's an alias for this operator, the and operator.
See also:
Mozilla Developer Network - Writing JavaScript for XHTML
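The well-formedness rule is not specific to Facelets; any XML parser behaves this way. As a rough illustration, here is a sketch using Python's xml.etree.ElementTree (an assumption chosen for convenience, not the parser JSF uses), including the CDATA workaround mentioned in the question.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw_ok = parses("<p>I am an & sign</p>")          # bare ampersand
escaped_ok = parses("<p>I am an &amp; sign</p>")  # escaped ampersand

# Inside a CDATA section, & and < are taken literally.
cdata = ET.fromstring("<script><![CDATA[if (a && b < c) run();]]></script>")

print(raw_ok, escaped_ok)  # False True
print(cdata.text)          # if (a && b < c) run();
```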
It's because & is a special character in XML: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
FYI, I tried to write the (c) character in my JSF page.
An error was raised when I wrote &copy; : "copy is referenced but not declared"
When I wrote &amp;copy; I got the raw string back.
I could display the special character using the Unicode notation: &#169;
This code worked for me:
<h:outputText value="&amp;copy;" escape="false" />
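The three behaviours described here can be reproduced with a generic XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in (an assumption; the Facelets parser may report errors with different wording).

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return the element text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

named = try_parse("<p>&copy;</p>")       # HTML-only name, undeclared in XML
numeric = try_parse("<p>&#169;</p>")     # numeric character reference
double = try_parse("<p>&amp;copy;</p>")  # escaped ampersand: literal text

print(named)    # None
print(numeric)  # ©
print(double)   # &copy;
```

The last case is why the escape="false" variant works: the XML parser hands the literal string &copy; to the component, and the browser then resolves it as an HTML entity.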

Rails 3 dealing with special characters

I want to give users the ability to fill in an input field with special characters (e.g. ¥ and others).
User input could be saved in an XML file and later fetched and rendered back into the form input.
What is the best practice for saving special symbols to XML (maybe using HTML entities or hexadecimal form)?
Thanks in advance.
I'd say if you save the file in utf-8 you will have no problems.
If some controller/view has problems with encoding you have to place this in the first line:
# encoding: utf-8
There's nothing special about them and you don't need to encode them. Let your XML library deal with that; XML has always supported Unicode, and what you call "special symbols" are just Unicode characters.
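To illustrate letting the XML library do the work, here is a small round-trip sketch in Python rather than Ruby (a deliberate substitution; the element name field and the sample input are hypothetical).

```python
import xml.etree.ElementTree as ET

user_input = "¥1000 & <more>"  # hypothetical form value

field = ET.Element("field")    # hypothetical element name
field.text = user_input

# Serialize as UTF-8 bytes; the library escapes & and < but leaves ¥ alone.
xml_bytes = ET.tostring(field, encoding="utf-8")

# Reading the document back yields the original string unchanged.
restored = ET.fromstring(xml_bytes).text

print(xml_bytes.decode("utf-8"))
print(restored == user_input)
```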

How to disable UTF character (punctuation) escaping when creating XML using default to_xml with Rails?

Given a rails models column that contains
"Something & Something Else" when outputting to_xml
Rails will escape the ampersand like so:
<MyElement>Something &amp; Something Else</MyElement>
Our client software is all UTF aware and it would be better if we can just leave the column content raw in our XML output.
There was an old solution that worked by setting $KCODE="UTF8" in an environment file, but this trick no longer works, and was always an All or Nothing solution.
Any recommendations on how to disable this on a case-by-case basis?
It does not matter if the client software is UTF-8-aware. An ampersand cannot be used unescaped in XML. If the software is supposed to also be XML-aware, then any content that includes ampersands is not allowed to be kept "raw".
This is nothing to do with Unicode (or "UTF"). Ampersands in XML must be escaped, otherwise it isn't XML, and no XML software will accept it. If you're saying you want the escaping disabled, then you're saying you don't want the output to be XML.
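A short sketch of why the escaping costs the client nothing, shown in Python instead of Rails (an assumption for illustration only): the escaped form is what travels in the document, and any XML-aware reader returns the raw value.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

value = "Something & Something Else"

# What a serializer must emit for the document to be XML at all.
element = "<MyElement>" + escape(value) + "</MyElement>"
print(element)  # <MyElement>Something &amp; Something Else</MyElement>

# An XML-aware client gets the original text back after parsing.
received = ET.fromstring(element).text
print(received)  # Something & Something Else
```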

Why is this query string invalid?

In my asp.net mvc page I create a link that renders as follows:
http://localhost:3035/Formula/OverView?colorId=349405&paintCode=744&name=BRILLANT%20SILVER&formulaId=570230
According to the W3C validator, this is not correct and it errors after the first ampersand. It complains about the & not being encoded and the entity &p not recognised etc.
AFAIK the & shouldn't be encoded because it is a separator for the key value pair.
For those who care: I send these parameters as a query string and not as "/" separated values because there is no decent way of passing on optional parameters that I know of.
To put all the bits together:
an anchor (<a>) tag's href attribute needs an encoded value
& encodes to &amp;
to encode an '&' when it is part of your parameter's value, use %26
Wouldn't encoding the ampersand into &amp; make it part of my parameter's value?
I need it to separate the second variable from the first
Indeed, by encoding my href value, I do get rid of the errors. What I'm wondering now however is what to do if for example my colorId would be "123&456", where the ampersand is part of the value.
Since the separator has to be encoded, what to do with encoded ampersands. Do they need to be encoded twice so to speak?
So to get the url:
www.mySite.com/search?query=123&456&page=1
What should my href value be?
Also, I think I'm about the first person in the world to care about this.. go check the www and count the pages that get their query string validated in the W3C validator..
Entities which are part of the attributes should be encoded, generally. Thus you need &amp; instead of just &
It works even if it doesn't validate because most browsers are very, very, very lenient in what to accept.
In addition, if you are outputting XHTML you have to encode every entity everywhere, not just inside the attributes.
All HTML attributes need to use character entities. The only place you don't need to change & into &amp; is within script blocks.
Whatever
Anywhere in an HTML document that you want an & to display directly next to something other than whitespace, you need to use the character entity &amp;. If it is part of an attribute, the &amp; will work as though it was an &. If the document is XHTML, you need to use character entities everywhere, even if you don't have something immediately next to the &. You can also use other character entities as part of attributes to treat them as though they were the actual characters.
If you want to use an ampersand as part of a URL in a way other than as a separator for parameters, you should use %26.
As an example...
Hello
Would send the user to http://localhost/Hello, with name=Bob and text=you & me "forever".
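The two layers (URL encoding for values, HTML encoding for the attribute) can be sketched in Python. This is only an illustration under assumptions: the parameter names come from the example above, and urlencode encodes spaces as + and quotes as %22, which may differ from the encoding the original link used.

```python
import html
from urllib.parse import urlencode

params = {"name": "Bob", "text": 'you & me "forever"'}

# Step 1: URL-encode each value; the & inside the value becomes %26,
# while the & that separates parameters stays a plain &.
url = "Hello?" + urlencode(params)

# Step 2: HTML-escape the whole URL for use inside a quoted attribute;
# the separator & becomes &amp;.
link = '<a href="' + html.escape(url, quote=True) + '">Hello</a>'

print(url)   # Hello?name=Bob&text=you+%26+me+%22forever%22
print(link)  # <a href="Hello?name=Bob&amp;text=you+%26+me+%22forever%22">Hello</a>

# The same recipe answers the "123&456" case from the question.
search = html.escape("search?" + urlencode({"query": "123&456", "page": 1}))
print(search)  # search?query=123%26456&amp;page=1
```

So nothing is encoded twice in the same way: the ampersand inside a value is URL-encoded once, and the separator ampersand is HTML-encoded once.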
This is a slightly confusing concept to some people, I've found. When you put &amp; in an HTML page, such as in <a href="abc?def=5&amp;ghi=10">, the URL is actually abc?def=5&ghi=10. The HTML parser converts the entity to an ampersand.
Think of it exactly the same way as how you need to escape quotes in a string:
// though you define your string like this:
myString = "this is \"something\" you know?"
// the string is ACTUALLY: this is "something" you know?
// when you look at the HTML, you see:
<a href="foo?bar=1&amp;baz=2">
// but the url is ACTUALLY: foo?bar=1&baz=2
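The same thing can be observed from the parser's side. The sketch below uses Python's html.parser (an assumption standing in for a browser's HTML parser; HrefCollector is a hypothetical helper name) to show which URL comes out of the markup.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

class HrefCollector(HTMLParser):
    """Collect the href values of <a> tags as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href")

collector = HrefCollector()
collector.feed('<a href="foo?bar=1&amp;baz=2">link</a>')
collector.close()

href = collector.hrefs[0]
query = parse_qs(urlsplit(href).query)

print(href)   # foo?bar=1&baz=2
print(query)  # {'bar': ['1'], 'baz': ['2']}
```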

Encoding of XHTML and & (ampersand)

My website is XHTML Transitional compliant except for one thing: the & (ampersand) characters in URLs are written as they are, instead of &amp;
That is, all the URLs in my pages are usually like this:
<a href="http://foo?x=1&y=2">Foo</a>
But XHTML validator generates this error:
cannot generate system identifier for general entity "y"
... and it wants the URL to be written like this:
<a href="http://foo?x=1&amp;y=2">Foo</a>
The problem is that Internet Explorer and Firefox don't handle the URL correctly and ignore the y parameter. How can I make this link work and validate correctly?
It seems to me that it is impossible to write XHTML pages if the browsers don't work with strict encoded XHTML URLs.
Do you want to see it in action? See the difference between these two links (copy and paste them as they are):
http://stackoverflow.com/search?q=ff&sort=newest
and
http://stackoverflow.com/search?q=ff&amp;sort=newest
I have just tried this. What you attempted to do is correct. In HTML, if you are writing a link, the & characters should be encoded as &amp;. You would only encode the & as %26 if you wanted a parameter value to contain an ampersand. I just wrote a simple HTML page that contained a link: Click me
and it worked fine: default2.aspx received the parameters intended and the source passed validation.
The encoding of & as &amp; is required in HTML, not in the link. When the browser sees the &amp; in the HTML source for a link it will interpret it as an ampersand and the link target will be as intended. If you paste a URL into your browser address bar it does not expect it to be HTML and does not try to interpret any HTML encoding that it may contain. This is why the example links that you suggest we copy and paste into a browser don't work, and why we wouldn't expect them to work.
If you post a bit more of your actual code we might be able to see what you have done wrong, but you appear to be heading in the right direction by using &amp; in your anchor tags.
It was my fault: the hyperlink control already encoded &, so my URL http://foo?x=1&amp;y=2 was encoded to http://foo?x=1&amp;amp;y=2
Normally the &amp; inside the URL is correctly handled by browsers, as you stated.
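The effect of this kind of double encoding can be sketched in Python. The helper query_names is hypothetical, and html.escape stands in for whatever encoding the hyperlink control applies.

```python
import html

url = "http://foo?x=1&y=2"

once = html.escape(url)    # what the markup should contain
twice = html.escape(once)  # escaped by hand, then again by a control

def query_names(markup_value):
    """Decode one level of HTML escaping, then list the parameter names."""
    decoded = html.unescape(markup_value)
    query = decoded.split("?", 1)[1]
    return [pair.split("=", 1)[0] for pair in query.split("&")]

print(once)                # http://foo?x=1&amp;y=2
print(twice)               # http://foo?x=1&amp;amp;y=2
print(query_names(once))   # ['x', 'y']
print(query_names(twice))  # ['x', 'amp;y']
```

The browser decodes only one level, so with the double-encoded link the server receives a parameter named amp;y and y appears to be missing.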
You could use &amp; instead of & in your URL within your page.
That should allow it to be validated as strict XHTML...
Foo
Note, if used by an ASP.NET Request.QueryString function, the query string doesn't use XML encoding; it uses URL encoding:
/mypath/mypage?b=%26stuff
So you need to provide a function translating '&' into %26.
Note: in that case, Server.URLEncode("neetu & geetu"), which would produce neetu+%26+geetu, is not what you want, since you need to translate &amp; into %26, not just '&'. You must add a replace() call applied to the URLEncode result, in order to replace '%26amp;' by '%26'.
To be even more thorough: use &#38;, a numeric character reference.
Because &amp; is a character entity reference:
Character entity references are defined in the markup language
definition. This means, for example, that for HTML only a specific
range of characters (defined by the HTML specification) can be
represented as character entity references (and that includes only a
small subset of the Unicode range).
That's coming from the wise people at W3C (read this for more).
Of course, this is not a very big deal, but the suggestion of W3C is that the numeric one will be valid and useable everywhere and always, while the named one is 'fine' for HTML but nothing more.
The problem is worse than you think - try it in Safari. &amp; gets converted to &#38; and the hash ends the URL.
The correct answer is to not output XHTML - there's no reason that justifies spending more time on development and alienating Mac users.
