Why do routes containing a % character display a blank page? - ruby-on-rails

In Rails, a route or URL that contains % displays a blank page.
e.g. www.domain.com/% displays a blank page.
I also checked some other websites, such as github.com/%, and they display a blank page too.

It is related to URL encoding. In short:
URL encoding converts characters into a format that can be transmitted
over the Internet.
Here is some basic terminology you might wish to know:
URL - Uniform Resource Locator
Web browsers request pages from web servers by using a URL.
The URL is the address of a web page, like https://stackoverflow.com/
URL Encoding (Percent Encoding)
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has
to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by
two hexadecimal digits. URLs cannot contain spaces. URL encoding
normally replaces a space with a plus (+) sign or with %20.
So when your route contains only %, the % is read as the start of a percent-encoded sequence, but the two hexadecimal digits that must follow it are missing. You will therefore get a blank page or a custom error page, depending on how your site is configured.
The browser sends the request for https://stackoverflow.com/% as-is, and no such resource exists.
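A minimal sketch of the rule described above, in Python: every % in a path must be followed by exactly two hexadecimal digits. The helper name is_well_formed is hypothetical and is not part of Rails or of any web server; real servers apply their own validation.

```python
import re

# A "%" that is NOT followed by two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")

def is_well_formed(path: str) -> bool:
    """Return True if every '%' in path starts a valid %XX escape (hypothetical helper)."""
    return _BAD_ESCAPE.search(path) is None

print(is_well_formed("/%"))      # False: nothing follows the %
print(is_well_formed("/%2"))     # False: only one hex digit
print(is_well_formed("/a%20b"))  # True: %20 is a complete escape
```

A literal percent sign that should survive decoding has to be sent as %25.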
How to handle it
Basically, if the URL is malformed when the request reaches the server, the server returns an HTTP 400 error code, which means:
10.4.1 400 Bad Request
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications.
So all we need to do is configure the server to redirect to a custom error page (in this case, the IIS web server of an ASP.NET website). An example can be found here

Related

Language specific characters in URL

Colleagues from work have created an API endpoint which uses language-specific characters in the URL. This API URL looks like
http://somedomain.com/someapi/somemethod/zażółć/gęślą/jaźń
Is this OK or is it a bad approach?
Technically, that's not a valid URL but web browsers and other clients finesse it. The script that characters are from is not an issue but structural characters like "/?#" could be. You'll have to consider what to do when they show up in data that you are "pasting" into your URLs.
An HTTP URL is:
an ASCII-encoded scheme (in this case the protocol "http")
a punycode-encoded, ASCII-encoded domain
a %-encoded, ASCII-encoded, server-defined sequence of octets for the path, optional query, and optional hash.
See RFC 3986
The assumption that everyone makes—quite reasonably because it is the predominant practice—is that the path, query, and hash are text. There is no text but encoded text. So, some character encoding is involved. Where %-encoding is needed outside of structural characters, browsers are going to assume UTF-8. If you don't want browsers to do the %-encoding, use valid URLs by doing it yourself with the character encoding that you are using.
As the world is standardizing on UTF-8 (where applicable), JavaScript has too, with the encodeURIComponent function. Clients using JavaScript in a web browser are likely to use this function, either directly or through some library.
Here is the UTF-8 encoded, %-encoded (and then, on the wire, ASCII-encoded) version of your URL that my browser created:
http://somedomain.com/someapi/somemethod/za%C5%BC%C3%B3%C5%82%C4%87/g%C4%99%C5%9Bl%C4%85/ja%C5%BA%C5%84
(You can see this yourself using your browser's dev tools [F12 key, network tab] or a packet sniffer [e.g., Wireshark or Fiddler]. What you gave as a URL is never seen on the wire.)
Your server application probably understands that just fine. In any case, it is your server's rules that the client complies with. If your API uses UTF-8 encoded, %-encoded URLs then just document that. (But phrase it in a way that doesn't confuse people who do that already without knowing.)
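As a rough illustration of what the browser does to the path, here is a sketch using Python's urllib.parse.quote, which encodes text as UTF-8 and then %-encodes it. Splitting the URL into a base and a path is an assumption made for the example; the domain and path are the ones from the question.

```python
from urllib.parse import quote, unquote

base = "http://somedomain.com"
path = "/someapi/somemethod/zażółć/gęślą/jaźń"

# quote() UTF-8 encodes each non-ASCII character, then %-encodes the bytes.
# "/" is left alone by default, so the path structure is preserved.
encoded = base + quote(path)
print(encoded)

# Decoding the %-escapes as UTF-8 gives the original text back.
print(unquote(quote(path)) == path)  # True
```

The printed URL matches the one shown above, which suggests a server that decodes paths as UTF-8 will see the original characters.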

Should I url encode a query string parameter that's a URL?

Say I have the following URL that has a query string parameter that's itself a URL:
http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2
Should I url encode the next parameter? If I do, who's responsible for decoding it - the web browser, or my web app?
The reason I ask is I see lots of big sites that do things like the following
http://www.someSite.com?next=http://www.anotherSite.com/another/url
In the above, they don't bother encoding the next parameter because I'm guessing, they know it doesn't have any query string parameters itself. Is this ok to do if my next url doesn't include any query string parameters as well?
RFC 2396 sec. 2.2 says that you should URL-encode those symbols anywhere where they're not used for their explicit meanings; i.e. you should always form targetUrl + '?next=' + urlencode(nextURL).
The web browser does not 'decode' those parameters at all; the browser doesn't know anything about the parameters and just passes the string along. A URL of the form http://www.example.com/path/to/query?param1=value&param2=value2 is GET-requested by the browser as:
GET /path/to/query?param1=value&param2=value2 HTTP/1.1
Host: www.example.com
(other headers follow)
On the backend, you'll need to parse the results. I think PHP's $_REQUEST array will have already done this for you; in other languages you'll want to split over the first ? character, then split over the & characters, then split over the first = character, then urldecode both the name and the value.
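The encode-then-parse steps above can be sketched in Python. The names build_url and parse_query are hypothetical helpers written for this example, and fragments (#...) are ignored for brevity; in practice a framework's own query parser would normally do this work.

```python
from urllib.parse import quote, unquote_plus

def build_url(target: str, next_url: str) -> str:
    # safe="" makes quote() encode ":", "/", "?", "&" and "=" as well.
    return target + "?next=" + quote(next_url, safe="")

def parse_query(url: str) -> list:
    # Split over the first "?", then over "&", then over the first "=".
    _, _, query = url.partition("?")
    pairs = []
    for field in query.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        # Decode both the name and the value.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

next_url = "http://www.anotherSite.com?test=1&test=2"
good = build_url("http://www.someSite.com", next_url)
bad = "http://www.someSite.com?next=" + next_url  # not encoded

print(parse_query(good))  # [('next', 'http://www.anotherSite.com?test=1&test=2')]
print(parse_query(bad))   # [('next', 'http://www.anotherSite.com?test=1'), ('test', '2')]
```

Without encoding, the second test parameter is read as belonging to the outer URL, which is the ambiguity the question is worried about.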
According to RFC 3986:
The query component is indicated by the first question mark ("?")
character and terminated by a number sign ("#") character or by the
end of the URI.
So the following URI is valid:
http://www.example.com?next=http://www.example.com
The following excerpt from the RFC makes this clear:
... as query components are often used to carry identifying
information in the form of "key=value" pairs and one frequently used
value is a reference to another URI, it is sometimes better for
usability to avoid percent-encoding those characters.
It is worth noting that RFC 3986 makes RFC 2396 obsolete.
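A quick way to check how such a URI is split is Python's urllib.parse.urlsplit. This is only a sketch of one parser's behaviour, not a statement about every server.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://www.example.com?next=http://www.example.com")

# The query runs from the first "?" to the end (or to "#"), so the
# unencoded ":" and "/" characters stay inside it.
print(parts.netloc)           # www.example.com
print(parts.query)            # next=http://www.example.com
print(parse_qs(parts.query))  # {'next': ['http://www.example.com']}
```

This only works because the nested URL contains no "&" or "#"; those characters would still need percent-encoding.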

URL encode, URL escape and URL rewrite: what are the differences?

It's kind of confusing to differentiate those three terms.
It'll be more understandable if you can explain with examples.
URL encoding and URL escaping are one and the same.
URL encoding is the process of transforming user input into a form that is fit for travel across the network; basically, spaces and special characters present in the URL are replaced with escape sequences.
URL rewriting changes the way you normally associate URLs with resources. Normally, test.com/aboutus makes us think that it will take us to the "about us" page. But internally, the server may take user 1 to /aboutus/page1.html, user 2 to /aboutus/page2.html, or any other resource. The URL exposed to the end user will be test.com/aboutus, but the resource being rendered can be different. Note that URL rewriting is performed by the server.
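Since the question asks for examples, here is a small Python sketch contrasting the two ideas. The REWRITE_RULES table and the rewrite function are hypothetical and only mimic what a server's rewrite module might do; they are not any real server's configuration.

```python
from urllib.parse import quote

# Encoding / escaping: the text is the same, only made safe for transport.
encoded = quote("about us?")
print(encoded)  # about%20us%3F

# Rewriting: the server maps the public URL to an internal resource.
# (Hypothetical rule table keyed by public path and user.)
REWRITE_RULES = {
    ("/aboutus", "user1"): "/aboutus/page1.html",
    ("/aboutus", "user2"): "/aboutus/page2.html",
}

def rewrite(path: str, user: str) -> str:
    # Fall back to the requested path when no rule applies.
    return REWRITE_RULES.get((path, user), path)

print(rewrite("/aboutus", "user1"))  # /aboutus/page1.html
print(rewrite("/aboutus", "user2"))  # /aboutus/page2.html
```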

Rails: Plus sign in GET-Request replaced by space

In Rails 3 (Ruby 1.9.2) I send a request
Started GET "/controller/action?path=/41_+"
But the parameter list looks like this:
{"path"=>"/41_ ",
"controller"=>"controller",
"action"=>"action"}
What's going wrong here? The -, * and . signs work fine; it's just the + that gets replaced by a space.
That's normal URL encoding; the plus sign is shorthand for a space:
Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.
And from the HTML5 standard:
The character is a U+0020 SPACE character
Replace the character with a single U+002B PLUS SIGN character (+).
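The same behaviour can be reproduced outside Rails. The sketch below uses Python's urllib.parse, which follows the same form-encoding convention; it is meant as an illustration of the rule, not of Rails internals.

```python
from urllib.parse import parse_qs, urlencode

# A bare "+" in a query string is decoded as a space.
print(parse_qs("path=/41_+"))  # {'path': ['/41_ ']}

# To send a real plus sign, the client must encode it as %2B.
query = urlencode({"path": "/41_+"})
print(query)                   # path=%2F41_%2B
print(parse_qs(query))         # {'path': ['/41_+']}
```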
For POST requests (in case that's how some of you stumbled upon this question, like me), one might encounter this problem because the data was encoded the wrong way on the client side. Encoding the data as application/x-www-form-urlencoded tells Rails to decode the data the way it decodes a URL, and hence to replace + signs with spaces, according to RFC 1738, as explained by @mu is too short.
The solution is to encode the data on the client side as multipart/form-data.
In PHP, using cURL, this is done by taking into consideration the following gotcha:
Passing an array to CURLOPT_POSTFIELDS will encode the data as
multipart/form-data, while passing a URL-encoded string will encode
the data as application/x-www-form-urlencoded. http://php.net/manual/en/function.curl-setopt.php
You might wonder why I was using PHP on the client side. That's because the client in my example was another web server, since I'm working on an API connection.

Browser URL encoding differs from Java's URLEncoder.encode

I have a URL like this:
product//excerpts?feature=picture+modes
When I access this URL from a browser, the backend receives the request as:
"product//excerpts?feature=picture+modes"
At the web server end, I am using + as a separator, so feature=picture+modes means there are 2 features: picture and modes.
I have created an automated script (Java) which goes to the URL and retrieves its content.
When I run this script, the backend receives the request as:
"product/B000NK6J6Q/excerpts/feature=picture%2Bmodes"
This is because inside my script (Java) I use URLEncoder.encode, which converts + to %2B, and I send this encoded URL to the server.
Why are the URL encoders provided by Java and those in browsers (FF/IE) different?
How do I make them the same? How do I decode them? (URLDecoder.decode turns '+' into a space.)
Also, is using '+' as a separator in line with conventions (and specifications)?
What you are seeing is actually correct. See Percent encoding on Wikipedia. The + character is a reserved character and therefore needs to be encoded as %2B. Furthermore, historically browsers use a different form of percent encoding for forms that are submitted with the MIME type application/x-www-form-urlencoded where spaces become + instead of %20.
If you want to use a + in the URL then your separator should be a space in the backend. If you want to use + in the backend, then you will have %2B in the URL.
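The two options can be sketched in Python, whose quote_plus and unquote_plus follow the same form-encoding rules as Java's URLEncoder and URLDecoder for these characters. Splitting the decoded value on a space is an assumption about how the backend would be written.

```python
from urllib.parse import quote_plus, unquote_plus

# A literal "+" in the data is encoded as %2B ...
print(quote_plus("picture+modes"))  # picture%2Bmodes
# ... while a space is encoded as "+".
print(quote_plus("picture modes"))  # picture+modes

# Option 1: "+" in the URL, so the backend separator is a space.
features = unquote_plus("picture+modes").split(" ")
print(features)                     # ['picture', 'modes']

# Option 2: "+" as the backend separator, so the URL carries %2B.
features = unquote_plus("picture%2Bmodes").split("+")
print(features)                     # ['picture', 'modes']
```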
