Why is this query string invalid? - asp.net-mvc

In my asp.net mvc page I create a link that renders as follows:
http://localhost:3035/Formula/OverView?colorId=349405&paintCode=744&name=BRILLANT%20SILVER&formulaId=570230
According to the W3C validator, this is not correct and it errors after the first ampersand. It complains about the & not being encoded and the entity &p not recognised etc.
AFAIK the & shouldn't be encoded because it is a separator for the key value pair.
For those who care: I send these parameters as a query string and not as "/"-separated values because there is no decent way of passing optional parameters that I know of.
To put all the bits together:
an anchor (<a>) tag's href attribute needs an encoded value
& encodes to &amp;
to encode an '&' when it is part of your parameter's value, use %26
Wouldn't encoding the ampersand as &amp; make it part of my parameter's value?
I need it to separate the second variable from the first.
Indeed, by encoding my href value, I do get rid of the errors. What I'm wondering now however is what to do if for example my colorId would be "123&456", where the ampersand is part of the value.
Since the separator has to be encoded, what should I do with ampersands that are part of a value? Do they need to be encoded twice, so to speak?
So to get the url:
www.mySite.com/search?query=123&456&page=1
What should my href value be?
Also, I think I'm about the first person in the world to care about this.. go check the www and count the pages that get their query string validated in the W3C validator..

Characters such as & inside attribute values should generally be written as entities. Thus you need &amp; instead of just &.
It works even if it doesn't validate because most browsers are very, very, very lenient in what they accept.
In addition, if you are outputting XHTML you have to encode every & everywhere, not just inside attributes.

All HTML attributes need to use character entities. The only place you don't need to change & into &amp; is within script blocks.
Anywhere in an HTML document where you want an & to appear directly next to something other than whitespace, you need to use the character entity &amp;. Inside an attribute, &amp; works as though it were a plain &. If the document is XHTML, you need to use character entities everywhere, even when nothing immediately follows the &. You can also use other character entities in attributes, and they are treated as the actual characters.
If you want to use an ampersand as part of a URL in a way other than as a separator for parameters, you should use %26.
As an example, a link with the text "Hello" whose href is encoded this way
would send the user to http://localhost/Hello, with name=Bob and text=you & me "forever".
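To make the two layers concrete, here is a small sketch in Python, whose urllib.parse.urlencode and html.escape stand in for the ASP.NET helpers (the helper name build_href is hypothetical). Each value is percent-encoded first, so an & inside a value becomes %26; then the finished URL is HTML-escaped for the href attribute, so the separator becomes &amp;. The second call answers the question's query=123&456&page=1 case.

```python
from html import escape
from urllib.parse import urlencode


def build_href(base: str, params: dict[str, str]) -> str:
    """Hypothetical helper: build an HTML-safe href value for a link."""
    # Layer 1 (URL): percent-encode keys and values so '&', '=', '"' and
    # spaces inside a value cannot be read as query-string syntax.
    # urlencode uses '+' for spaces (form encoding).
    url = f"{base}?{urlencode(params)}"
    # Layer 2 (HTML): escape the finished URL for the attribute, turning
    # each separator '&' into '&amp;'.
    return escape(url, quote=True)


print(build_href("http://localhost/Hello", {"name": "Bob", "text": 'you & me "forever"'}))
# http://localhost/Hello?name=Bob&amp;text=you+%26+me+%22forever%22
print(build_href("www.mySite.com/search", {"query": "123&456", "page": "1"}))
# www.mySite.com/search?query=123%26456&amp;page=1
```

The ampersand inside the value is never "encoded twice" as &amp;amp;: it is percent-encoded once at the URL layer, and only the separators are touched by the HTML layer.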

This is a slightly confusing concept for some people, I've found. When you put &amp; in an HTML page, such as in <a href="abc?def=5&amp;ghi=10">, the URL is actually abc?def=5&ghi=10. The HTML parser converts the entity to an ampersand.
Think of it exactly the same way as escaping quotes in a string:
// though you define your string like this:
myString = "this is \"something\" you know?"
// the string is ACTUALLY: this is "something" you know?
// when you look at the HTML, you see:
<a href="foo?bar=1&amp;baz=2">
// but the url is ACTUALLY: foo?bar=1&baz=2
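The same decode-then-split order can be checked with Python's html.unescape and urllib.parse. This is a sketch: a real browser's HTML parser does much more, but the point is that entities are decoded before the query string is split into parameters.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# The text exactly as written inside href="..." in the HTML source.
attribute_text = "foo?bar=1&amp;baz=2"

# Step 1: the HTML parser decodes entities, giving the real URL.
actual_url = unescape(attribute_text)
print(actual_url)  # foo?bar=1&baz=2

# Step 2: only the decoded URL's query string is split into parameters.
print(parse_qs(urlsplit(actual_url).query))  # {'bar': ['1'], 'baz': ['2']}
```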

Related

Base 64 encoded querystring parameter getting characters replaced

I have a querystring parameter that is an encoded string that gets converted to Base64. That parameter is then embedded in a link within an email. When I click the link in the email, the querystring parameter has had all the + characters within it replaced by space characters. There are no other differences. Is there a method I can call to sanitise the string and effectively replace the spaces with pluses again? I'm currently doing a string replace, which is a bit of a hack. Something is causing the replacement but I'm not sure what. Has anyone come across anything like this before?
Example - querystring parameter value within URL of the browser:
yo3rZZbZyG4UCN+L3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl+3lA7qn+lOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw==
Value of string within controller action:
yo3rZZbZyG4UCN L3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl 3lA7qn lOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw==
You should URL encode the base64 string to the link, so it is:
yo3rZZbZyG4UCN%2BL3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl%2B3lA7qn%2BlOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw%3D%3D
HttpUtility.UrlEncode(base64str) in .NET, or encodeURIComponent(base64str) in javascript
you can use System.Web.HttpServerUtility.UrlTokenEncode (from http://brockallen.com/2014/10/17/base64url-encoding/#comments)
It is doing this because, in a form-encoded query string (application/x-www-form-urlencoded), a + sign is interpreted as an encoded space, so the server decodes each + back into a space. This is why it is getting mangled. You should URL encode your string before you pass it to the server.
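The effect is easy to reproduce with Python's urllib.parse, used here as a stand-in for the ASP.NET query-string parsing (the parameter name t is an assumption). The token is the Base64 value from the question. The last two lines show base64url, the idea the UrlTokenEncode answer relies on, on a small made-up input.

```python
import base64
from urllib.parse import parse_qs, quote

# The Base64 value from the question (contains '+' and '=' padding).
token = "yo3rZZbZyG4UCN+L3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl+3lA7qn+lOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw=="

# Sent unencoded, form decoding turns every '+' into a space.
received = parse_qs("t=" + token)["t"][0]
print(received == token.replace("+", " "))  # True

# Percent-encoded first (safe="" also escapes '/' and '='), it survives intact.
encoded = quote(token, safe="")
print(parse_qs("t=" + encoded)["t"][0] == token)  # True

# Alternative: base64url uses '-' and '_' instead of '+' and '/'.
print(base64.b64encode(bytes([0xFB, 0xFF])))          # b'+/8='
print(base64.urlsafe_b64encode(bytes([0xFB, 0xFF])))  # b'-_8='
```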

Difference between these three URL's?

Can somebody explain how it matters which characters are used to pass parameters in a URL, e.g.
1: www.domain.com/folder1/folder2/file.html?param=9?val=ty5?test
2: www.domain.com/folder1/folder2/file.html#param=93#val=t5y5?test=9
3: www.domain.com/folder1/folder2/file.html&param=9?val=ty5&test=90#poiu
Basically I want to know what these three characters (#, &, ?) do in the URL. I see them most of the time. Can I use something other than them, e.g.:
www.domain.com/folder1/folder2/file.html*param=9_val+ty5#test
? indicates the start of the query string
& separates key value pairs of the querystring
# indicates an anchor. Here's more on anchor links.
Note that all three of your urls are incorrect.
Valid url:
http://domain/path/file?name=value&name=value#anc
I notice you've edited your question with an additional question
can I use some thing other than that e-g:
www.domain.com/folder1/folder2/file.html*param=9_val+ty5#test
You can use whatever you like inside the query string or the anchor as long as it is URL encoded.
This Wikipedia article goes into the detail and gives some good examples.
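To see how a standard parser treats these characters, here is a sketch using Python's urllib.parse.urlsplit (the helper name describe is hypothetical). An http:// scheme is added as an assumption, because without one urlsplit would treat the host as part of the path. Note that the & before the first ? in the question's third URL is just part of the path.

```python
from urllib.parse import parse_qsl, urlsplit


def describe(url: str) -> tuple[str, list[tuple[str, str]], str]:
    """Hypothetical helper: (path, query pairs, fragment) as urlsplit sees them."""
    parts = urlsplit(url)  # splits off '#fragment' first, then '?query'
    return parts.path, parse_qsl(parts.query), parts.fragment


# Valid shape: '?' starts the query, '&' separates pairs, '#' starts the fragment.
print(describe("http://domain/path/file?name=value&name=value#anc"))
# ('/path/file', [('name', 'value'), ('name', 'value')], 'anc')

# The question's third URL: '&param=9' ends up in the path, not the query.
print(describe("http://www.domain.com/folder1/folder2/file.html&param=9?val=ty5&test=90#poiu"))
# ('/folder1/folder2/file.html&param=9', [('val', 'ty5'), ('test', '90')], 'poiu')
```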
A ? indicates the start of the query
A & separates the parameters in the query
A # identifies a fragment in the HTML resource to be rendered. It's often used to identify which part of the page the browser should scroll into view, e.g. a heading.
? represents that the URL contains QueryString values.
& is used to separate multiple querystring values. Example:
www.abc.com/page?id=abc&pwd=def
and # is new to me; this is the first time I've seen it.
1: ? is used to separate the back-end code from its arguments. Note that the .html file extension doesn't necessarily mean the back-end code is HTML.
2: # is used to link to anchors within the html page
3: & is used to separate arguments from other arguments. In this case, file.html is also an argument itself, while the back-end code is the "/", which can be anything, e.g. index.php, default.asp, index.do. It all depends on your URL rewrite.

Can we use & in url?

Can we use "&" in a url ? or should "and" be used?
Yes, you can use it plain in your URL path like this:
http://example.com/Alice&Bob
Only if you want to use it inside the query do you need to encode it as %26:
http://example.com/?arg=Alice%26Bob
Otherwise it would be interpreted as argument separator when interpreted as application/x-www-form-urlencoded.
See RFC 3986 for more details.
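A quick way to see the path-versus-query difference is Python's urllib.parse (a sketch; the URLs are the ones used in these answers). In the path, & is an ordinary character; in a query value, an unencoded & starts a new field.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit

# In the path, a literal '&' or an encoded %26 both mean an ampersand.
print(urlsplit("http://example.com/Alice&Bob").path)                  # /Alice&Bob
print(unquote(urlsplit("http://www.example.com/hello%26world").path))  # /hello&world

# In a query value, an unencoded '&' starts a new parameter.
bad = "http://example.com/?arg=Alice&Bob"
good = "http://example.com/?" + urlencode({"arg": "Alice&Bob"})
print(good)                            # http://example.com/?arg=Alice%26Bob
print(parse_qs(urlsplit(bad).query))   # {'arg': ['Alice']}  ('Bob' has no '=' and is dropped)
print(parse_qs(urlsplit(good).query))  # {'arg': ['Alice&Bob']}
```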
A URL is generally of the form
scheme://host/some/path/to/file?query1=value&query2=value
So it is not advisable to use it in a URL unless you want to use it for parameters. Otherwise you should percent-escape it as %26, e.g.
http://www.example.com/hello%26world
This results in the path being submitted as hello&world. There are other characters which must be escaped when used out of context in a URL. See here for a list.
Unless you're appending variables to the query string, encode it.
Encode '&' as &amp; (this answer is based on your use of tags).
If you are asking whether to use "&" or "and" when choosing the name of your URL, I would use "and".
EDIT: As mentioned in comments, "&amp; is an HTML character entity and not a URI character entity. By putting that into a URI you still have the ampersand character and additional extraneous characters." I started answering before fully understanding your question.

How in ASP.NET MVC to change Url.Encode character replacement strategy?

I'm using Url.Encode within a view and it's replacing spaces with + so instead of:
/production/cats-the-musical I'm getting .../cats+the+musical.
I'm sure this is an easy one, but where do you configure which characters are used for this?
I'll be doing this:
public static string EncodeForSEO(this UrlHelper helper, string unencodedUrl)
{
return helper.Encode(unencodedUrl.Replace(' ', '-'));
}
Until I get a better answer from you guys.
Edit: Thanks Guffa for pointing out my hasty coding.
I want to draw attention to Path versus Query String encoding differences
MVC allows / encourages us to write paths (routes) that can be easier to remember than query strings. e.g. /Products.aspx?id=1 could, in MVC, be /Products/View/1
Building on that, for SEO friendliness it also encourages adding other data that may or may not be necessary, like /Products/View/1/Coffee
If the name contains spaces, or a necessary parameter is a string containing spaces, and you are including it in the URL path, one of two things must happen, because a ' ' cannot be left in a URL path or query-string parameter without being encoded:
1. You UrlPathEncode() the string as it is, or
2. you first transform the spaces in the string yourself,
then call UrlPathEncode(), as you may have other characters requiring encoding.
Note: there is a big difference between Url Encoding (meant for query strings) and Url Path Encoding (meant for path portions of Urls)
cats the musical -> UrlEncode -> cats+the+musical
-- in a URL path, + is a literal plus sign, so this does not decode back to spaces
cats the musical -> UrlPathEncode -> cats%20the%20musical
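The same contrast shows up in Python's urllib.parse, used here as a rough analogue (quote_plus behaves like form/query encoding, quote like path encoding; they are not identical to the .NET methods in every detail):

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

title = "cats the musical"
print(quote_plus(title))  # cats+the+musical      (query-string / form style)
print(quote(title))       # cats%20the%20musical  (path style)

# Path decoding keeps '+' literal; only form decoding maps it back to a space.
print(unquote("cats+the+musical"))       # cats+the+musical
print(unquote_plus("cats+the+musical"))  # cats the musical
```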
If you're following along, going back to Web Forms vs MVC: /Products.aspx?name=Coffee+Beans would be rewritten as /Products/View/Coffee%20Beans
So that leaves us where the OP's question starts. Q: How do you get SEO- and human-friendly URLs? A: Use Guffa's code to replace the " " with "-" in your own code before UrlPathEncoding the rest.
In sites I've worked on, when we have a user-entered value used only for SEO (like a blog title or similar) we go a step further normalizing the string output by collapsing successive spaces into a single "-" e.g.
cats     the     musical, which would otherwise become cats-----the-----musical, becomes cats-the-musical
You can't change which characters the UrlEncode method uses. The use of "+" for spaces is defined in the standard for form-encoding URL query strings; using "-" instead would mean that the method changes the value rather than just encoding it. As the "-" character is not encoded, there would be no way to decode the string back to the original value.
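The normalizing step described above can be sketched in Python (seo_slug is a hypothetical helper, not part of ASP.NET; urllib.parse.quote stands in for UrlPathEncode): collapse each run of whitespace into one "-", then percent-encode whatever else is unsafe in a single path segment.

```python
import re
from urllib.parse import quote


def seo_slug(title: str) -> str:
    """Hypothetical helper: SEO-friendly path segment from a user-entered title."""
    # Replace each run of whitespace with a single '-' (after trimming the ends).
    slug = re.sub(r"\s+", "-", title.strip())
    # Percent-encode anything else unsafe; safe="" also escapes '/'.
    return quote(slug, safe="")


print(seo_slug("cats     the     musical"))  # cats-the-musical
print(seo_slug("rock & roll / live"))        # rock-%26-roll-%2F-live
```

As the answer below notes, this is a one-way transformation: the original title cannot be recovered from the slug, which is fine when the segment exists only for SEO.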
In your method, there is no need to check for the character before doing the replacement. If the Replace method doesn't find anything to replace, it just returns the original string reference.
public static string EncodeForSEO(this UrlHelper helper, string unencodedUrl) {
return helper.Encode(unencodedUrl.Replace(' ', '-'));
}

Encoding of XHTML and & (ampersand)

My website is XHTML Transitional compliant except for one thing: the & (ampersand) characters in URLs are written as is, instead of as &amp;
That is, all the URLs in my pages are usually like this:
<a href="http://foo?x=1&y=2">Foo</a>
But XHTML validator generates this error:
cannot generate system identifier for general entity "y"
... and it wants the URL to be written like this:
<a href="http://foo?x=1&amp;y=2">Foo</a>
The problem is that Internet Explorer and Firefox don't handle the URL correctly and ignore the y parameter. How can I make this link work and validate correctly?
It seems to me that it is impossible to write XHTML pages if the browsers don't work with strict encoded XHTML URLs.
Do you want to see it in action? See the difference between these two links (copy and paste them as they are):
http://stackoverflow.com/search?q=ff&sort=newest
and
http://stackoverflow.com/search?q=ff&amp;sort=newest
I have just tried this. What you attempted to do is correct. In HTML, if you are writing a link, the & characters should be encoded as &amp;. You would only encode the & as %26 if you wanted a parameter value to contain an ampersand. I just wrote a simple HTML page that contained such a link (with the text "Click me"),
and it worked fine: default2.aspx received the intended parameters and the source passed validation.
The encoding of & as &amp; is required by HTML, not by the URL itself. When the browser sees &amp; in the HTML source for a link, it interprets it as an ampersand and the link target is as intended. If you paste a URL into your browser's address bar, it is not treated as HTML, so any HTML encoding it contains is not interpreted. This is why the example links you suggest we copy/paste into a browser don't work, and why we wouldn't expect them to work.
If you post a bit more of your actual code we might be able to see what you have done wrong, but you appear to be heading in the right direction by using &amp; in your anchor tags.
It was my fault: the hyperlink control already encodes &, so my URL http://foo?x=1&amp;y=2 was encoded to http://foo?x=1&amp;amp;y=2
Normally the &amp; inside the URL is correctly handled by browsers, as you stated.
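The double-encoding mistake is easy to reproduce with Python's html.escape and html.unescape (a sketch standing in for what the hyperlink control and the browser do; the URL is the simplified one from this thread):

```python
from html import escape, unescape

url = "http://foo?x=1&y=2"
once = escape(url)    # what belongs in the href attribute
twice = escape(once)  # what a control produces if handed already-escaped text

print(once)             # http://foo?x=1&amp;y=2
print(twice)            # http://foo?x=1&amp;amp;y=2
print(unescape(once))   # http://foo?x=1&y=2      (the URL the browser requests)
print(unescape(twice))  # http://foo?x=1&amp;y=2  (parameter 'y' arrives as 'amp;y')
```

This matches the symptom in the question: after one round of HTML decoding, the doubly escaped link still contains &amp;, so the server sees a parameter named amp;y instead of y.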
You could use &amp; instead of & in your URL within your page.
That should allow it to be validated as strict XHTML...
Note, if used by an ASP.NET Request.QueryString function, the query string doesn't use XML encoding; it uses URL encoding:
/mypath/mypage?b=%26stuff
So you need to provide a function translating '&' into %26.
Note: in that case, Server.URLEncode("neetu & geetu"), which would produce neetu+%26+geetu, is not what you want, since you need to translate &amp; into %26, not just '&'. You must add a replace() call applied to the URLEncode result, in order to replace '%26amp;' with '%26'.
To be even more thorough: use &#38;, a numeric character reference.
Because &amp; is a character entity reference:
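A sketch of the same situation in Python, with urllib.parse.quote_plus standing in for Server.URLEncode. Instead of patching the URL-encoded result with replace(), it decodes the HTML entity first and then URL-encodes once; this is a swapped-in technique, not the one the answer describes.

```python
from html import unescape
from urllib.parse import parse_qs, quote_plus

print(quote_plus("neetu & geetu"))  # neetu+%26+geetu

# Text that was already HTML-encoded: a plain URL-encode keeps the entity.
already_html = "neetu &amp; geetu"
print(quote_plus(already_html))  # neetu+%26amp%3B+geetu

# Decoding the entity first gives the value you actually want to send.
value = quote_plus(unescape(already_html))
print(value)                           # neetu+%26+geetu
print(parse_qs("b=" + value)["b"][0])  # neetu & geetu
```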
Character entity references are defined in the markup language
definition. This means, for example, that for HTML only a specific
range of characters (defined by the HTML specification) can be
represented as character entity references (and that includes only a
small subset of the Unicode range).
That's coming from the wise people at W3C (read this for more).
Of course, this is not a very big deal, but the suggestion of W3C is that the numeric one will be valid and useable everywhere and always, while the named one is 'fine' for HTML but nothing more.
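The difference between named and numeric references can be illustrated with Python's standard library (a sketch; &nbsp; is used because &amp; happens to be one of XML's five predefined entities, so it would not show the problem). HTML knows both forms, while a plain XML parser accepts any numeric reference but only the predefined named ones.

```python
import xml.etree.ElementTree as ET
from html import unescape

# HTML decoding: the named and numeric forms are the same character.
print(unescape("&#38;") == unescape("&amp;") == "&")  # True

# Plain XML: numeric references always work...
print(ET.fromstring("<a>&#160;</a>").text == "\u00a0")  # True

# ...but HTML-only named entities such as &nbsp; are rejected.
try:
    ET.fromstring("<a>&nbsp;</a>")
except ET.ParseError as err:
    print("ParseError:", err)  # reports an undefined entity
```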
The problem is worse than you think - try it in Safari. &amp; gets converted to &#38; and the hash ends the URL.
The correct answer is to not output XHTML - there's no reason that justifies spending more time on development and alienating Mac users.
