How do I change the Url.Encode character replacement strategy in ASP.NET MVC? - asp.net-mvc

I'm using Url.Encode within a view and it's replacing spaces with + so instead of:
/production/cats-the-musical I'm getting .../cats+the+musical.
I'm sure this is an easy one, but where do you go to configure which characters are used for this?
I'll be doing this:
public static string EncodeForSEO(this UrlHelper helper, string unencodedUrl)
{
    // Replace spaces with dashes first, then let Url.Encode handle everything else.
    return helper.Encode(unencodedUrl.Replace(' ', '-'));
}
Until I get a better answer from you guys.
Edit: Thanks Guffa for pointing out my hasty coding.

I want to draw attention to the differences between path encoding and query string encoding.
MVC allows and encourages us to write paths (routes) that are easier to remember than query strings, e.g. /Products.aspx?id=1 could, in MVC, be /Products/View/1.
Building on that, it also encourages, for SEO friendliness, adding other data that may or may not be necessary, like /Products/View/1/Coffee.
If the name has space characters, or a necessary parameter is a string containing space characters, and you are including it in the URL path, one of two things must happen, because a ' ' cannot be left in a URL path or query string parameter without being encoded:
1. You UrlPathEncode() the string as it is, or
2. you first transform the spaces in the string, then call UrlPathEncode(), as other characters may still require encoding.
Note: there is a big difference between URL encoding (meant for query strings) and URL path encoding (meant for the path portion of a URL).
cats the musical -> UrlEncode -> cats+the+musical
-- this is not right for a URL path, where + is read as a literal plus sign rather than a space
cats the musical -> UrlPathEncode -> cats%20the%20musical
If you're following along and going back to Web Forms vs MVC: /Products.aspx?name=Coffee+Beans would be rewritten as /Products/View/Coffee%20Beans
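The same split between query-string encoding and path encoding exists outside .NET. As a rough illustration in Python (an analogy, not the ASP.NET API), quote_plus behaves like query-string encoding and quote like path encoding:

```python
from urllib.parse import quote, quote_plus

title = "cats the musical"

# Query-string (form) style: spaces become "+".
query_style = quote_plus(title)

# Path style: spaces become "%20".
path_style = quote(title, safe="")

print(query_style)  # cats+the+musical
print(path_style)   # cats%20the%20musical
```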
So that leaves us where OP's question starts. Q: How do you get SEO and human friendly URLs? A: Use @Guffa's code to replace the " " with "-" in your own code before UrlPathEncoding the rest.
In sites I've worked on, when we have a user-entered value used only for SEO (like a blog title or similar), we go a step further and normalize the string by collapsing successive spaces into a single "-", e.g.
"cats     the     musical", which would otherwise be cats-----the-----musical, becomes cats-the-musical
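A minimal sketch of that normalization, written in Python for illustration. The helper name encode_for_seo is hypothetical; it collapses each run of whitespace into one "-" and then path-encodes whatever is left:

```python
import re
from urllib.parse import quote


def encode_for_seo(title: str) -> str:
    """Hypothetical helper: collapse whitespace runs to one '-', then path-encode."""
    dashed = re.sub(r"\s+", "-", title.strip())
    # safe="" so that reserved characters such as "&" or "/" are encoded too.
    return quote(dashed, safe="")


print(encode_for_seo("cats     the     musical"))  # cats-the-musical
print(encode_for_seo("rock & roll"))               # rock-%26-roll
```

Note that the result cannot be decoded back to the original title, which is acceptable only because the slug is there for readability and not as an identifier.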

You can't change which characters the UrlEncode method uses. The use of "+" for spaces is defined in the standards for how a URL is encoded; using "-" instead would mean that the method changes the value rather than just encoding it. As the "-" character is not itself encoded, there would be no way to decode the string back to the original value.
In your method, there is no need to check for the character before doing the replacement. If the Replace method doesn't find anything to replace, it just returns the original string reference.
public static string EncodeForSEO(this UrlHelper helper, string unencodedUrl) {
    return helper.Encode(unencodedUrl.Replace(' ', '-'));
}

Related

Translate string with URL in web2py

A simple question - If I have a web page view in web2py with a string like
Here is a string to translate.
What is the approved way to apply the T() operator to it so that I can provide different language translations, but keep the same url?
Your translation string can include interpolated variables as described here. For example:
T('Here is a <a href="%(url)s">string</a> to translate',
  dict(url=URL('my_other_page', vars=vars)))
Note, if you intend to insert the above in a web2py view, the HTML markup will be escaped by default, so you would have to wrap it in XML() to prevent the escaping:
{{=XML(T('Here is a <a href="%(url)s">string</a> to translate',
         dict(url=URL('my_other_page', vars=vars))))}}
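To picture the interpolation step on its own: a named placeholder such as %(url)s in the string is filled from the dict. The sketch below uses plain Python % formatting and assumes T() substitutes the dict in the same way; the url value is a hypothetical stand-in for whatever URL('my_other_page', vars=vars) returns.

```python
# Hypothetical stand-in for the value returned by URL('my_other_page', vars=vars).
url = "/myapp/default/my_other_page?id=1"

template = 'Here is a <a href="%(url)s">string</a> to translate'

# The translator may reorder the words, but must keep %(url)s intact.
rendered = template % dict(url=url)
print(rendered)
```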
For me the simplest way is to first translate the string with a placeholder, like:
Here is a <a href="#URL#">string</a> to translate
and then substitute the placeholder #URL# with URL('my_other_page', vars=vars).
The only risk is the corruption of code or placeholder during translation.
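A sketch of the placeholder approach that also guards against the risk just mentioned. Everything here is hypothetical scaffolding: TRANSLATIONS stands in for web2py's language files, translate() for T(), and the URL is a made-up example.

```python
from html import escape

# Hypothetical translation table standing in for web2py's language files.
TRANSLATIONS = {
    'Here is a <a href="#URL#">string</a> to translate':
        'Voici une <a href="#URL#">chaîne</a> à traduire',
}


def translate(text: str) -> str:
    """Stand-in for T(): look the text up, fall back to the original."""
    return TRANSLATIONS.get(text, text)


def render(text: str, url: str) -> str:
    translated = translate(text)
    if "#URL#" not in translated:
        # Fail loudly if a translator corrupted or dropped the placeholder.
        raise ValueError("placeholder #URL# was lost during translation")
    # Escape the URL because it lands inside an HTML attribute.
    return translated.replace("#URL#", escape(url, quote=True))


rendered = render('Here is a <a href="#URL#">string</a> to translate',
                  "/myapp/default/my_other_page?a=1&b=2")
print(rendered)
```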

safe character to separate multiple urls

I am preparing a special string, in which keys and values are concatenated like below:
username=foo&age=24&email=foo@bar.com&homepage=http://foo.com
& is the separator for two key=value pairs
value is url encoded
I have a scenario where there are multiple home pages for a user.
I want to specify multiple urls for the homepage key
name=foo&age=24&email=foo@bar.com&homepage=url1<some_safe_url_separator_char>url2<some_safe_url_separator_char>url3
We have no control over, or idea of, what url1, url2, ... may contain.
What is a good choice of some_safe_url_separator_char?
In other words I am not looking for a safe character to be used IN a url, but a safe character to be used to SEPARATE two urls in a string
Well, you can use URL rewriting for this.
It produces a URL that is safer, as it hides the names of the parameters.
URL rewriting produces a URL separated by '/', which is tough for an external person to decode.
For reference, see: URL rewriting for beginners
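A different angle on the original question, not taken from the answer above: if each URL is percent-encoded on its own before joining, the encoded pieces contain only letters, digits, "-", ".", "_", "~" and "%", so any other character can serve as the separator. A minimal sketch in Python, with "," chosen arbitrarily and the helper names hypothetical:

```python
from urllib.parse import quote, unquote

# Assumption: "," is used here, but any character that quote(..., safe="")
# always escapes would work equally well.
SEPARATOR = ","


def join_urls(urls):
    """Percent-encode each URL fully, then join with the separator."""
    return SEPARATOR.join(quote(u, safe="") for u in urls)


def split_urls(value):
    """Reverse of join_urls: split on the separator, then decode each part."""
    return [unquote(part) for part in value.split(SEPARATOR)]


homepages = ["http://foo.com/a,b?x=1&y=2", "http://bar.org/#top"]
packed = join_urls(homepages)
print(packed)
print(split_urls(packed) == homepages)  # True
```

The packed value can then be URL-encoded once more as the value of the homepage key, like any other value in the string.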

Should I url encode a query string parameter that's a URL?

Just say I have the following url that has a query string parameter that's an url:
http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2
Should I url encode the next parameter? If I do, who's responsible for decoding it - the web browser, or my web app?
The reason I ask is I see lots of big sites that do things like the following
http://www.someSite.com?next=http://www.anotherSite.com/another/url
In the above, they don't bother encoding the next parameter because I'm guessing, they know it doesn't have any query string parameters itself. Is this ok to do if my next url doesn't include any query string parameters as well?
RFC 2396 sec. 2.2 says that you should URL-encode those symbols anywhere where they're not used for their explicit meanings; i.e. you should always form targetUrl + '?next=' + urlencode(nextURL).
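For instance, the concatenation described above could look like this in Python, using the URLs from the question:

```python
from urllib.parse import quote

target_url = "http://www.someSite.com"
next_url = "http://www.anotherSite.com?test=1&test=2"

# safe="" encodes ":", "/", "?", "=" and "&" inside the nested URL.
full_url = target_url + "?next=" + quote(next_url, safe="")
print(full_url)
# http://www.someSite.com?next=http%3A%2F%2Fwww.anotherSite.com%3Ftest%3D1%26test%3D2
```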
The web browser does not 'decode' those parameters at all; the browser doesn't know anything about the parameters but just passes along the string. A query string of the form http://www.example.com/path/to/query?param1=value&param2=value2 is GET-requested by the browser as:
GET /path/to/query?param1=value&param2=value2 HTTP/1.1
Host: www.example.com
(other headers follow)
On the backend, you'll need to parse the results. I think PHP's $_REQUEST array will have already done this for you; in other languages you'll want to split over the first ? character, then split over the & characters, then split over the first = character, then urldecode both the name and the value.
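The manual parsing steps can be sketched as follows in Python. The function name parse_query is hypothetical; in practice the standard library's urllib.parse.parse_qsl does the same job.

```python
from urllib.parse import unquote_plus


def parse_query(url):
    """Return a list of (name, value) pairs from the query part of url."""
    _, _, query = url.partition("?")      # split over the first "?"
    query = query.partition("#")[0]       # drop any fragment
    pairs = []
    for field in query.split("&"):        # split over the "&" characters
        if not field:
            continue
        name, _, value = field.partition("=")   # split over the first "="
        # Decode both the name and the value.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(parse_query("http://www.example.com/path/to/query?param1=value&param2=value2"))
# [('param1', 'value'), ('param2', 'value2')]
```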
According to RFC 3986:
The query component is indicated by the first question mark ("?")
character and terminated by a number sign ("#") character or by the
end of the URI.
So the following URI is valid:
http://www.example.com?next=http://www.example.com
The following excerpt from the RFC makes this clear:
... as query components are often used to carry identifying
information in the form of "key=value" pairs and one frequently used
value is a reference to another URI, it is sometimes better for
usability to avoid percent-encoding those characters.
It is worth noting that RFC 3986 makes RFC 2396 obsolete.
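That leniency covers characters like "/" and "?" inside the query, but "&" and "=" in the nested URL still collide with key=value parsing. A short Python check with the first URL from the question shows what a typical parser makes of it, with and without encoding:

```python
from urllib.parse import urlsplit, parse_qs, quote

nested = "http://www.anotherSite.com?test=1&test=2"

# Unencoded: the nested URL's "&test=2" is read as a parameter of the outer URL.
raw = "http://www.someSite.com?next=" + nested
raw_params = parse_qs(urlsplit(raw).query)
print(raw_params)
# {'next': ['http://www.anotherSite.com?test=1'], 'test': ['2']}

# Encoded: the nested URL survives intact as the value of "next".
encoded = "http://www.someSite.com?next=" + quote(nested, safe="")
encoded_params = parse_qs(urlsplit(encoded).query)
print(encoded_params)
# {'next': ['http://www.anotherSite.com?test=1&test=2']}
```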

Base 64 encoded querystring parameter getting characters replaced

I have a querystring parameter that is an encrypted string converted to Base64. That parameter is then embedded in a link within an email. When I click the link in the email, all the + characters within the querystring parameter have been replaced by space characters. There are no other differences. Is there a method I can call to sanitise the string and effectively replace the spaces with pluses again? I'm currently doing a string replace, which is a big fat hack. Something is causing the replacement, but I'm not sure what. Has anyone come across anything like this before?
Example - querystring parameter value within URL of the browser:
yo3rZZbZyG4UCN+L3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl+3lA7qn+lOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw==
Value of string within controller action:
yo3rZZbZyG4UCN L3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl 3lA7qn lOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw==
You should URL encode the base64 string before putting it in the link, so that it becomes:
yo3rZZbZyG4UCN%2BL3pcTYJXmWEggnkW1qcyJk2uBrVTtGUSKIlBcJ8e9TSx8BHjHJv0JhI8H6LbIqUl%2B3lA7qn%2BlOgpSi3rCGN4bm5moOWcCA449C1Z3zj7J1FkOXH2HMox4VUZ7x7fF65MRwuBBmw%3D%3D
Use HttpUtility.UrlEncode(base64str) in .NET, or encodeURIComponent(base64str) in JavaScript.
You can also use System.Web.HttpServerUtility.UrlTokenEncode (from http://brockallen.com/2014/10/17/base64url-encoding/#comments)
It is doing this because a + sign in a query string is decoded as a space (that is how form-style URL encoding represents spaces). This is why the value is getting mangled. You should URL encode your string before you put it in the link.
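The effect is easy to reproduce outside ASP.NET. The Python sketch below uses a tiny made-up payload whose Base64 form consists only of "+" characters, then shows the mangled value, the fix by URL encoding, and the URL-safe Base64 alphabet as an alternative:

```python
import base64
from urllib.parse import parse_qs, quote

payload = b"\xfb\xef\xbe"  # made-up bytes chosen so the Base64 text is all "+"

token = base64.b64encode(payload).decode("ascii")
print(token)  # ++++

# Unencoded in a query string: every "+" is decoded as a space.
broken = parse_qs("token=" + token)["token"][0]

# URL-encoded first: "+" travels as %2B and comes back unchanged.
fixed = parse_qs("token=" + quote(token, safe=""))["token"][0]

# Alternative: the URL-safe alphabet uses "-" and "_" instead of "+" and "/".
url_safe = base64.urlsafe_b64encode(payload).decode("ascii")

print(repr(broken), fixed, url_safe)
```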

Why is this query string invalid?

In my asp.net mvc page I create a link that renders as follows:
http://localhost:3035/Formula/OverView?colorId=349405&paintCode=744&name=BRILLANT%20SILVER&formulaId=570230
According to the W3C validator, this is not correct and it errors after the first ampersand. It complains about the & not being encoded and the entity &p not recognised etc.
AFAIK the & shouldn't be encoded because it is a separator for the key value pair.
For those who care: I send these parameters as a querystring and not as "/" separated values because there is no decent way of passing optional parameters that I know of.
To put all the bits together:
an anchor (<a>) tag's href attribute needs an encoded value
& encodes to &amp;
to encode an '&' when it is part of your parameter's value, use %26
Wouldn't encoding the ampersand into &amp; make it part of my parameter's value?
I need it to separate the second variable from the first
Indeed, by encoding my href value, I do get rid of the errors. What I'm wondering now, however, is what to do if, for example, my colorId were "123&456", where the ampersand is part of the value.
Since the separator has to be encoded, what do I do with ampersands inside values? Do they need to be encoded twice, so to speak?
So to get the url:
www.mySite.com/search?query=123&456&page=1
What should my href value be?
Also, I think I'm about the first person in the world to care about this.. go check the www and count the pages that get their query string validated in the W3C validator..
Entities which are part of the attributes should be encoded, generally. Thus you need &amp; instead of just &
It works even if it doesn't validate because most browsers are very, very, very lenient in what to accept.
In addition, if you are outputting XHTML you have to encode every entity everywhere, not just inside the attributes.
All HTML attributes need to use character entities. The only place you don't need to change & into &amp; is within script blocks.
Whatever
Anywhere in an HTML document that you want an & to display directly next to something other than whitespace, you need to use the character entity &amp;. If it is part of an attribute, the &amp; will work as though it was an &. If the document is XHTML, you need to use character entities everywhere, even if you don't have something immediately next to the &. You can also use other character entities as part of attributes to treat them as though they were the actual characters.
If you want to use an ampersand as part of a URL in a way other than as a separator for parameters, you should use %26.
As an example...
<a href="http://localhost/Hello?name=Bob&amp;text=you%20%26%20me%20&quot;forever&quot;">Hello</a>
Would send the user to http://localhost/Hello, with name=Bob and text=you & me "forever".
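The two layers (percent-encoding for the URL, then entity-encoding for the HTML attribute) can be sketched in Python as below. This only illustrates the order of operations; it is not how ASP.NET builds links.

```python
from html import escape, unescape
from urllib.parse import urlencode, quote, urlsplit, parse_qs

params = {"name": "Bob", "text": 'you & me "forever"'}

# Layer 1: percent-encode each value; the "&" inside a value becomes %26.
url = "http://localhost/Hello?" + urlencode(params, quote_via=quote)

# Layer 2: entity-encode the whole URL for the attribute; the separator "&" becomes "&amp;".
href = escape(url, quote=True)
link = '<a href="%s">Hello</a>' % href
print(link)

# A browser undoes layer 2 when parsing HTML; the server undoes layer 1.
recovered = parse_qs(urlsplit(unescape(href)).query)
print(recovered)
```

So an ampersand inside a value is not entity-encoded twice: it is percent-encoded once as %26, and only the separator ampersands are written as entities.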
This is a slightly confusing concept to some people, I've found. When you put &amp; in an HTML page, such as in <a href="abc?def=5&amp;ghi=10">, the URL is actually abc?def=5&ghi=10. The HTML parser converts the entity to an ampersand.
Think of it in exactly the same way as how you need to escape quotes in a string:
// though you define your string like this:
myString = "this is \"something\" you know?"
// the string is ACTUALLY: this is "something" you know?
// when you look at the HTML, you see:
<a href="foo?bar=1&amp;baz=2">
// but the url is ACTUALLY: foo?bar=1&baz=2
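To see an HTML parser perform that conversion, here is a small check using Python's standard html.parser module. The class name HrefCollector is just for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags, as seen after entity decoding."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


collector = HrefCollector()
collector.feed('<a href="foo?bar=1&amp;baz=2">link</a>')
print(collector.hrefs)  # ['foo?bar=1&baz=2']
```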
