Changing "/" to "%2f" in URL doesn't work

I have an Orchard site and have the following problem:
If I use the URL http://asiahotelct.com/tours/ct---chau-%C4%91oc---ha-tien-3n2%C4%91, it works. But when I change the / to %2f (like so: http://asiahotelct.com/tours%2fct---chau-%C4%91oc---ha-tien-3n2%C4%91), it no longer works.
Why can't / be replaced by %2f?

A URL is a complete address for a resource (such as a file) on the network. The URL syntax reserves a few characters for special purposes. In this case, "/" is the separator between the individual segments of the address.
When such a reserved character needs to appear as data inside a segment, it must be percent-encoded.
URL encoding converts characters into a format that can be transmitted
over the Internet.
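- W3Schools
So "/" is a separator, whereas "%2f" is an ordinary character that merely stands for a literal "/" inside one segment of the URL. The two are therefore not interchangeable: the encoded form asks for a single segment named "tours/ct---...", not for the segment "tours" followed by another segment.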
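The difference can be sketched in a few lines of Python. This is only an illustration of the generic URL rules, not of how Orchard routes requests; `path_segments` and the example.com URLs are hypothetical names made up for the sketch.

```python
from urllib.parse import quote, unquote, urlsplit

def path_segments(url):
    """Split the path on literal "/" first, then percent-decode each segment."""
    raw_path = urlsplit(url).path
    return [unquote(segment) for segment in raw_path.split("/") if segment]

# Hypothetical URLs that differ only in "/" versus "%2f"
plain = "http://example.com/tours/ct---chau-doc"
encoded = "http://example.com/tours%2fct---chau-doc"

print(path_segments(plain))    # two segments: "tours" and "ct---chau-doc"
print(path_segments(encoded))  # one segment that contains a literal slash

# Encoding "/" on purpose requires an empty `safe` set
print(quote("tours/ct", safe=""))
```

How a particular server or framework reacts to an encoded slash varies; some reject such requests outright, which would match the behaviour described in the question.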

Related

Is there a way to escape all the special characters in a url string parameter?

I need users to be able to pass a file path as a parameter of a GET URL (the file is not uploaded; only the local file path is used, for security reasons). It is tedious for them to change every backslash to "%5C" by hand. I was wondering if there is a way to force encoding of just part of the URL, for example something as simple as putting it in double quotes, which doesn't work:
http://example.com/"c:\user\somone\somefile.txt"/dosomething
I ended up using pattern matching of rest routes at the server level. Something like this:
/example.com/*path/dosomething
This matches any path, even one containing slashes or backslashes. Finally, I decode the URL to undo the escaping that the browser applies to characters such as spaces:
java.net.URLDecoder.decode(path, "UTF-8")
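If the link can be generated by a script or form instead of being typed by hand, the encoding step can be automated. The following Python sketch shows the round trip; the `/dosomething` URL shape comes from the question, while the variable names are made up for illustration.

```python
from urllib.parse import quote, unquote

# Raw Windows path as a user would type it (taken from the question)
local_path = r"c:\user\somone\somefile.txt"

# safe="" makes quote() escape every reserved character, including ":" and "\"
encoded = quote(local_path, safe="")
url = "http://example.com/" + encoded + "/dosomething"
print(url)

# Server side: decode the captured path parameter back to the original string
decoded = unquote(encoded)
print(decoded == local_path)
```

Note that Java's `URLDecoder.decode` follows form-encoding rules and also turns "+" into a space, so the closer Python counterpart would be `unquote_plus`; which one is appropriate depends on how the client encoded the value.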

Should I url encode a query string parameter that's a URL?

Just say I have the following url that has a query string parameter that's an url:
http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2
Should I url encode the next parameter? If I do, who's responsible for decoding it - the web browser, or my web app?
The reason I ask is I see lots of big sites that do things like the following
http://www.someSite.com?next=http://www.anotherSite.com/another/url
In the above, they don't bother encoding the next parameter because I'm guessing, they know it doesn't have any query string parameters itself. Is this ok to do if my next url doesn't include any query string parameters as well?
RFC 2396 sec. 2.2 says that you should URL-encode those symbols anywhere where they're not used for their explicit meanings; i.e. you should always form targetUrl + '?next=' + urlencode(nextURL).
The web browser does not 'decode' those parameters at all; the browser doesn't know anything about the parameters but just passes along the string. A query string of the form http://www.example.com/path/to/query?param1=value&param2=value2 is GET-requested by the browser as:
GET /path/to/query?param1=value&param2=value2 HTTP/1.1
Host: www.example.com
(other headers follow)
On the backend, you'll need to parse the results. I think PHP's $_REQUEST array will have already done this for you; in other languages you'll want to split over the first ? character, then split over the & characters, then split over the first = character, then urldecode both the name and the value.
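As a rough illustration of both halves, building the link with an encoded `next` value and decoding it again on the backend, here is a minimal Python sketch using the standard library. The site names are the ones from the question; the variable names are assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

next_url = "http://www.anotherSite.com?test=1&test=2"

# Build the link: urlencode() percent-encodes the reserved characters in the value
full = "http://www.someSite.com?" + urlencode({"next": next_url})
print(full)

# Backend: take the query component, split it into pairs, decode names and values
query = urlsplit(full).query
params = parse_qs(query)
print(params["next"][0])
```

The browser passes the encoded string along untouched; the decoding happens in the application (or in its framework, as `parse_qs` does here).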
According to RFC 3986:
The query component is indicated by the first question mark ("?")
character and terminated by a number sign ("#") character or by the
end of the URI.
So the following URI is valid:
http://www.example.com?next=http://www.example.com
The following excerpt from the RFC makes this clear:
... as query components are often used to carry identifying
information in the form of "key=value" pairs and one frequently used
value is a reference to another URI, it is sometimes better for
usability to avoid percent-encoding those characters.
It is worth noting that RFC 3986 makes RFC 2396 obsolete.
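Leaving the value unencoded is only safe while it contains no "&" (or "#") of its own. A small Python sketch, using the two example URLs from the question, shows where a typical parser would draw the boundaries:

```python
from urllib.parse import parse_qsl

# No query string inside the nested URL: it survives unencoded
simple = parse_qsl("next=http://www.anotherSite.com/another/url")
print(simple)

# Nested URL with its own parameters: the second one leaks into the outer query
nested = parse_qsl("next=http://www.anotherSite.com?test=1&test=2")
print(nested)
```

In the second case the outer application sees `test=2` as its own parameter and `next` loses it, which is the practical reason to encode the value whenever its content is not under your control.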

Percent sign in URL doesn't match rewrite rule (ASP.NET MVC project on IIS)

I have a URL in the form
/search/search-terms
which is described in the RewriteRules.config to match the regular expression
^search/(.*)$
The action rewrites it to
Search.aspx?value={R:1}
with appendQueryString set to true.
Everything is fine until I put an ampersand (&) in the search terms. I encode "term1&term2" to the URL string
/search/term1%26term2
but this URL is only matched up to term1. I'm not sure what causes this, whether the presence of the percent sign in the URL or a wrong encoding. What I know is that Request["value"] returns term1 only, so this is not a problem with my logic but with the URL rewrite itself.
How can I get the whole string, including the escaped ampersand? My logic then correctly decodes the escaped entity; I know this because it works in localhost, where no rewriting is performed.
It turns out that there is no solution: the percent sign breaks the rewrite rule matching. I had to encode and decode the query string myself, replacing the special characters that get URL-encoded to percent-sign tokens (such as space to %20 and ampersand to %26) with other words.
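The answer does not say which replacement words were used. One possible reversible scheme, shown here in Python purely as a sketch, is to percent-encode as usual and then swap "%" for another marker so that no percent sign ever reaches the rewrite rule. The functions `encode_term` and `decode_term` and the choice of "~" as the marker are assumptions, not the author's method, and whether "~" passes through a given IIS rewrite setup unchanged would need to be tested.

```python
from urllib.parse import quote, unquote

def encode_term(term):
    """Percent-encode, then use "~" in place of "%" (hypothetical scheme)."""
    percent_encoded = quote(term, safe="")
    # quote() may leave "~" untouched, so escape it explicitly before the swap
    percent_encoded = percent_encoded.replace("~", "%7E")
    return percent_encoded.replace("%", "~")

def decode_term(token):
    """Reverse encode_term(): restore "%" and percent-decode."""
    return unquote(token.replace("~", "%"))

token = encode_term("term1&term2")
print(token)               # contains no "%" and no "&"
print(decode_term(token))  # back to the original search terms
```

Because every "~" in the output is an escape marker, the mapping stays unambiguous even when the search terms themselves contain "~" or "%".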

How to transform encoded URL to readable texts?

This is about Bangla Unicode text, but the same problem can arise for any language written in a non-Latin script.
I host a Bangla blog with all its text and categories in Bangla (I prefer "Bangla" to "Bengali" because that is the actual name of the language).
So the category "বাংলা" has a URL like:
http://www.example.com/category/বাংলা
But whenever I copy the URL from the address bar and paste it into a chat panel or somewhere else, it turns into strange characters, for example:
http://www.example.com/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0*
(* This is just an example, not the exact encoding of the word "বাংলা".)
So in many cases I end up with encoded URLs like the one above, with no way to tell which Unicode text they represent. Recently one of my plugins has been logging 404 errors. In that log I found a URI like:
/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F%E0%A7%81%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0
I used Jetpack's Omnisearch to look for a match, but the result was empty. I can't even work out which category is causing the 404.
So here comes the question:
How can I transform the encoded URL to readable glyphs?
http://www.example.com/category/বাংলা
isn't a URL; URLs can only contain ASCII characters. This is an IRI.
http://www.example.com/category/%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE
is the URI representation of that IRI. They are otherwise equivalent. A browser may display the ‘pretty’ IRI version in the user interface, but put the URI version on the clipboard so that you can paste it into other tools that don't support IRI.
The 404 address you pasted translates to:
/category/স্নায়ুবিদ্য�
where the last character is a � because it is an invalid, truncated UTF-8 sequence. (This is probably why the request failed.) Someone may have mis-pasted a partial URI here.
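If you're using JavaScript, you can do:
decodeURIComponent(url);
This turns the percent-encoded sequences back into the original characters. Note that it throws a URIError if the input contains an invalid sequence, such as the truncated one above.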
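The same decoding can be done offline, for example when going through a 404 log. Below is a minimal Python sketch using the two strings quoted above; `is_complete` is a hypothetical helper for spotting truncated entries.

```python
from urllib.parse import unquote

full = "http://www.example.com/category/%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE"
readable = unquote(full)  # percent-escapes are decoded as UTF-8 by default
print(readable)

# The URI from the 404 log, cut off in the middle of a UTF-8 sequence
truncated = ("/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F%E0%A7%81"
             "%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0")
lenient = unquote(truncated)  # default errors="replace" yields U+FFFD for bad bytes
print(lenient)

def is_complete(encoded):
    """Return False when the escapes do not form valid UTF-8 (hypothetical helper)."""
    try:
        unquote(encoded, errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(is_complete(full), is_complete(truncated))
```

Decoding leniently is enough to read most of the category name, while the strict check flags log entries that were cut off before the request was made.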

Trouble constructing URL-encoded link

If I do this:
<a target="_blank" href="<%=Url.Encode(sitelink)%>"> LINK TO SITE</a>
I get the link encoded, but prepended with the current local domain: "http://localhost/http://...."
What's the proper way to do this?
The Url.Encode method is used to escape special characters for usage in the query part of a url - it's not meant to be applied to the entire url, because that will escape things like the :// at the beginning (which is why you get the local domain prepended, because it's no longer a full URL, instead getting interpreted as a relative url).
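The effect is not specific to ASP.NET and can be reproduced with any URL library. The Python sketch below is an analogy, not the `Url.Encode` implementation; `site_link` and the `/out?to=` redirect endpoint are hypothetical.

```python
from urllib.parse import quote, urlencode, urljoin, urlsplit

site_link = "http://example.com/page?id=1"  # hypothetical target link

# Encoding the whole URL also escapes "://", so no scheme is left to recognise
whole = quote(site_link, safe="")
print(urlsplit(whole).scheme == "")  # parsed as a relative reference

# A browser resolves a relative reference against the current page
resolved = urljoin("http://localhost/", whole)
print(resolved)

# Encoding is appropriate when the link is a value inside the query part
redirect = "http://localhost/out?" + urlencode({"to": site_link})
print(redirect)
```

For a plain outbound link the URL would normally go into `href` without URL-encoding (only HTML attribute escaping), and encoding would be reserved for values embedded in a query string, as in the last line.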
