400 code error when URL contains % symbol? (NGINX) - url

How to prevent a server from returning an error 400 code error when the URL contains % symbol using NGINX server?
Nginx configuration for my website:
....
rewrite ^/download/(.+)$ /download.php?id=$1 last;
....
When I tried to get access to this URL:
http://mywebsite.net/download/some-string-100%-for-example
I got this error:
400 Bad Request
With this url :
http://mywebsite.net/download/some-string-%25-for-example
it's work fine !

It's because it needs to be URL encoded first.
This will explain:
http://www.w3schools.com/tags/ref_urlencode.asp
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
The URL interpreter is confused to see a % without hexadecimals after it.
Why would you think of solving by changing Nginx configuration???
It's impossible to solve from the server side. It's a problem from the client side.
https://headteacherofgreenfield.wordpress.com/2016/03/23/100-celebrations/
In that URL, the title is 100% Celebrations! but the permalink is autogenerated to 100-celebrations. It's because they know putting 100% will cause a URL encode problem.
If even Wordpress doesn't do it your way, then why should you do it?

Related

What format of url is this with the colon almost in the end - https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize

I am trying to consume the google text to speech api here : https://cloud.google.com/speech-to-text/docs/async-recognize#speech-async-recognize-gcs-protocol
and it has this url format below
https://google-speech-api-base-urlspeech:longrunningrecognize
What is this URL format with colon(:)in the end?
When I try to hit this URL, it gives me an error specifically while running test case on it .e. Invalid URI. Invalid Port?
But the official google documentation says this is a valid url? How to use this?
This format of URL is called gRPC Transcoding syntax. Your first URL is invlaid , because it's in the first path segment of a relative-path reference.
https://google-speech-api-base-urlspeech:longrunningrecognize
This url is invalid for usage, whereas the one below, https://speech.googleapis.com/v1/speech:longrunningrecognize was running fine.
Try changing your URL to something like
https://google-speech-api-base-url/speech:longrunningrecognize. It will work.
I looked at the documentation page you referenced and was unable to see/find a URL that looked like:
https://google-speech-api-base-urlspeech:longrunningrecognize
However, what I did find was a URL of the form:
https://speech.googleapis.com/v1/speech:longrunningrecognize
which looks perfectly valid.
The documentation for this REST request can be found here:
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/longrunningrecognize
Could you have made an error in your reading and comprehension?
Apparently the colon (:) is legal in the path part of a URL:
Are colons allowed in URLs?

Generating Oauth authorization token using base64 encoding

I am trying to follow the guide to generate Oauth authentication tokens for YAHOO DSP API.
Base64 encoding is a way of encoding binary data into text so that it can be easily transmitted across a network without error.
In this step, you will take the client ID and client secret that the YDN console generated for you and encode them using the base64 protocol. You can use an online encoding service like base64encode.org.
No matter which service you use, ensure that no spaces are appended to the CLIENT_ID and CLIENT_SECRET keys and separate the CLIENT_ID and CLIENT_SECRET with a colon, i.e. CLIENT_ID:CLIENT_SECRET.
The generated value will now be referenced as ENCODED(CLIENT_ID:CLIENT_SECRET) in this guide.
An example is given:
CLIENT_ID = dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
CLIENT_SECRET= a7e13ea3740b933496d88755ff341bfb824805a6
AUTHORIZATION = ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Using the recommended website I get the wrong AUTHORIZATION.
I have tried both encoding the whole thing at once ie. encode(CLIENT_ID:CLIENT_SECRET), and each element individually encode(CLIENT_ID):encode(CLIENT_SECRET).
Attempt encoding whole thing:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCTOiBhN2UxM2VhMzc0MGI5MzM0OTZkODg3NTVmZjM0MWJmYjgyNDgwNWE2
Attempt encoding each element:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCT:YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Expected result:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
The difference between 'each element' and the expected result is only a few characters corresponding to the end of client_ID and the colon.
B4oCT: should be BLS06.
Links to full documentation:
https://developer.yahoo.com/dsp/api/docs/authentication/tokens.html
https://developer.yahoo.com/dsp/api/docs/traffic/info/sandbox.html
Update:
The final character of Client_ID is '–' . This is some sort of non-standard character that is interpreted as two dashes i.e.'--' in utf-8 and windows 1258.
One different, TO NOTE is, that when you decrypt the expected output you will get your client id as
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA--
instead of
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
NOTE, there are two "-" at the end.
OAuth client auth token is always generated using Base64 encoding with following format
Base64_Encoding(CLIENT_ID:CLIENT_SECRET)
Most of the usage perform this Base64 encoding with encoding type as "UTF-8".
It looks like, Yahoo requires this token with different encoding. On "https://www.base64encode.org/" if you try to encode your "CLIENT_ID:CLIENT_SECRET" with "Windows-1254" as destination charset, you will receive the expected result. So, it looks like both encoding and decoding here is done keeping "Windows-1254" charset in place.

URL Encoded characters in IE 9

I have a problem hitting a URL and getting HTTP 404 error (No web page found). Some findings led me to conclude it is an encoding issue for one of the query parameters so I tried to apply urlEncoding i.e. HttpUtility.UrlEncode("<= 1 Week"). This produces "%3c%3d1+Week". But the problem persists. If no UrlEncode is done, the querystring will be "%3C=1%20Week"
Anyone knows why this is happening?

What URL encoding irregularities do browsers generally tolerate?

Web browsers tend to do their best to recover malformed URLs.
Let's start with a baseline google query.
http://www.google.com/search?q=myquery
Which results in my browser (recent-ish build of Chrome) requesting.
GET http://www.google.com/search?q=myquery HTTP/1.1
Fully expected behavior obviously.
Let's try putting an unescaped space into the mix.
http://www.google.com/search?q=my query
GET http://www.google.com/search?q=my%20query HTTP/1.1
What if we use the % character? Because it's not followed by a valid character code the browser should escape it to %25
http://www.google.com/search?q=i always give 100%
GET http://www.google.com/search?q=i%20always%20give%20100% HTTP/1.1
Chrome didn't escape the %!
Is space substitution the only URL transformation an average browser will/is expected to perform? Are there libraries for performing these kinds of URL "salvaging" transformations?

Maximum length of URL fragments (hash)

Is there a length limit for the fragment part of an URL (also known as the hash)?
The hash is client side only, so the rules for HTTP may not apply to it.
It depends on the browser.
I found that in safari, chrome, and Firefox, an URL with a long hash is legal, but if it is a param send to the server, the browser will display an 414 or 413 error.
for example:
an URL like http://www.stackoverflow.com/?abc#{hash value with 100 thousand characters} will be ok. and you can use location.hash to get the hash value in javascript but an URL like http://www.stackoverflow.com/?abc&{query with 100 thousand characters} will be illegal, if you paste this link in the address bar, a 413 error code will be given and the message is the client issued a request that was too long. If that is a link in a web page, in my computer, Nginx response the 414 error message.
I don't know the situation in IE.
So I think, the limitation of the length of URL is just for transmission or HTTP server, the browser will check it sometimes, but not every time, and it will always be allowed to be used as a hash.
There is definitely a length for the whole url.
Read
RFC2616 - Hypertext Transfer Protocol
Maximum URL length is 2,083 characters in Internet Explorer

Resources