What URL encoding irregularities do browsers generally tolerate?

What URL encoding irregularities do browsers generally tolerate? - url

Web browsers tend to do their best to recover malformed URLs.
Let's start with a baseline google query.
http://www.google.com/search?q=myquery
Which results in my browser (recent-ish build of Chrome) requesting.
GET http://www.google.com/search?q=myquery HTTP/1.1
Fully expected behavior obviously.
Let's try putting an unescaped space into the mix.
http://www.google.com/search?q=my query
GET http://www.google.com/search?q=my%20query HTTP/1.1
What if we use the % character? Because it's not followed by a valid character code the browser should escape it to %25
http://www.google.com/search?q=i always give 100%
GET http://www.google.com/search?q=i%20always%20give%20100% HTTP/1.1
Chrome didn't escape the %!
Is space substitution the only URL transformation an average browser will/is expected to perform? Are there libraries for performing these kinds of URL "salvaging" transformations?

Related

Generating Oauth authorization token using base64 encoding

I am trying to follow the guide to generate Oauth authentication tokens for YAHOO DSP API.
Base64 encoding is a way of encoding binary data into text so that it can be easily transmitted across a network without error.
In this step, you will take the client ID and client secret that the YDN console generated for you and encode them using the base64 protocol. You can use an online encoding service like base64encode.org.
No matter which service you use, ensure that no spaces are appended to the CLIENT_ID and CLIENT_SECRET keys and separate the CLIENT_ID and CLIENT_SECRET with a colon, i.e. CLIENT_ID:CLIENT_SECRET.
The generated value will now be referenced as ENCODED(CLIENT_ID:CLIENT_SECRET) in this guide.
An example is given:
CLIENT_ID = dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
CLIENT_SECRET= a7e13ea3740b933496d88755ff341bfb824805a6
AUTHORIZATION = ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Using the recommended website I get the wrong AUTHORIZATION.
I have tried both encoding the whole thing at once ie. encode(CLIENT_ID:CLIENT_SECRET), and each element individually encode(CLIENT_ID):encode(CLIENT_SECRET).
Attempt encoding whole thing:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCTOiBhN2UxM2VhMzc0MGI5MzM0OTZkODg3NTVmZjM0MWJmYjgyNDgwNWE2
Attempt encoding each element:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCT:YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Expected result:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
The difference between 'each element' and the expected result is only a few characters corresponding to the end of client_ID and the colon.
B4oCT: should be BLS06.
Links to full documentation:
https://developer.yahoo.com/dsp/api/docs/authentication/tokens.html
https://developer.yahoo.com/dsp/api/docs/traffic/info/sandbox.html
Update:
The final character of Client_ID is '–' . This is some sort of non-standard character that is interpreted as two dashes i.e.'--' in utf-8 and windows 1258.

One different, TO NOTE is, that when you decrypt the expected output you will get your client id as
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA--
instead of
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
NOTE, there are two "-" at the end.
OAuth client auth token is always generated using Base64 encoding with following format
Base64_Encoding(CLIENT_ID:CLIENT_SECRET)
Most of the usage perform this Base64 encoding with encoding type as "UTF-8".
It looks like, Yahoo requires this token with different encoding. On "https://www.base64encode.org/" if you try to encode your "CLIENT_ID:CLIENT_SECRET" with "Windows-1254" as destination charset, you will receive the expected result. So, it looks like both encoding and decoding here is done keeping "Windows-1254" charset in place.

400 code error when URL contains % symbol? (NGINX)

How to prevent a server from returning an error 400 code error when the URL contains % symbol using NGINX server?
Nginx configuration for my website:
....
rewrite ^/download/(.+)$ /download.php?id=$1 last;
....
When I tried to get access to this URL:
http://mywebsite.net/download/some-string-100%-for-example
I got this error:
400 Bad Request
With this url :
http://mywebsite.net/download/some-string-%25-for-example
it's work fine !

It's because it needs to be URL encoded first.
This will explain:
http://www.w3schools.com/tags/ref_urlencode.asp
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
The URL interpreter is confused to see a % without hexadecimals after it.
Why would you think of solving by changing Nginx configuration???
It's impossible to solve from the server side. It's a problem from the client side.
https://headteacherofgreenfield.wordpress.com/2016/03/23/100-celebrations/
In that URL, the title is 100% Celebrations! but the permalink is autogenerated to 100-celebrations. It's because they know putting 100% will cause a URL encode problem.
If even Wordpress doesn't do it your way, then why should you do it?

URL Encoded characters in IE 9

I have a problem hitting a URL and getting HTTP 404 error (No web page found). Some findings led me to conclude it is an encoding issue for one of the query parameters so I tried to apply urlEncoding i.e. HttpUtility.UrlEncode("<= 1 Week"). This produces "%3c%3d1+Week". But the problem persists. If no UrlEncode is done, the querystring will be "%3C=1%20Week"
Anyone knows why this is happening?

Twitter stream API - Erlang client

I'm very new in Erlang world and I'm trying to write a client for the Twitter Stream API. I'm using httpc:request to make a POST request and I constantly get 401 error, I'm obviously doing something wrong with how I'm sending the request... What I have looks like this:
fetch_data() ->
Method = post,
URL = "https://stream.twitter.com/1.1/statuses/filter.json",
Headers = "Authorization: OAuth oauth_consumer_key=\"XXX\", oauth_nonce=\"XXX\", oauth_signature=\"XXX%3D\", oauth_signature_method=\"HMAC-SHA1\", oauth_timestamp=\"XXX\", oauth_token=\"XXX-XXXXX\", oauth_version=\"1.0\"",
ContentType = "application/json",
Body = "{\"track\":\"keyword\"}",
HTTPOptions = [],
Options = [],
R = httpc:request(Method, {URL, Headers, ContentType, Body}, HTTPOptions, Options),
R.
At this point I'm confident there's no issue with the signature as the same signature works just fine when trying to access the API with curl. I'm guessing there's some issue with how I'm making the request.
The response I'm getting with the request made the way demonstrated above is:
{ok,{{"HTTP/1.1",401,"Unauthorized"},
[{"cache-control","must-revalidate,no-cache,no-store"},
{"connection","close"},
{"www-authenticate","Basic realm=\"Firehose\""},
{"content-length","1243"},
{"content-type","text/html"}],
"<html>\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>\n<title>Error 401 Unauthorized</title>\n</head>\n<body>\n<h2>HTTP ERROR: 401</h2>\n<p>Problem accessing '/1.1/statuses/filter.json'. Reason:\n<pre> Unauthorized</pre>\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n</body>\n</html>\n"}}
When trying with curl I'm using this:
curl --request 'POST' 'https://stream.twitter.com/1.1/statuses/filter.json' --data 'track=keyword' --header 'Authorization: OAuth oauth_consumer_key="XXX", oauth_nonce="XXX", oauth_signature="XXX%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="XXX", oauth_token="XXX-XXXX", oauth_version="1.0"' --verbose
and I'm getting the events just fine.
Any help on this would be greatly appreciated, new with Erlang and I've been pulling my hair out on this one for quite a while.

There are several issues with your code:
In Erlang you are encoding parameters as a JSON body while with curl, you are encoding them as form data (application/x-www-form-urlencoded). Twitter API expects the latter. In fact, you get a 401 because the OAuth signature does not match, as you included the track=keyword parameter in the computation while Twitter's server computes it without the JSON body, as it should per OAuth RFC.
You are using httpc with default options. This will not work with the streaming API as the stream never ends. You need to process results as they arrive. For this, you need to pass {sync, false} option to httpc. See also stream and receiver options.
Eventually, while httpc can work initially to access Twitter streaming API, it brings little value to the code you need to develop around it to stream from Twitter API. Depending on your needs you might want to replace it a simple client directly built on ssl, especially considering it can decode HTTP packets (what is left for you is the HTTP chunk encoding).
For example, if your keywords are rare, you might get a timeout from httpc. Besides, it might be easier to update the list of keywords or your code with no downtime without httpc.
A streaming client directly based on ssl could be implemented as a gen_server (or a simple process, if you do not follow OTP principles) or even better a gen_fsm to implement reconnection strategies. You could proceed as follows:
Connect using ssl:connect/3,4 specifying that you want the socket to decode the HTTP packets with {packet, http_bin} and you want the socket to be configured in passive mode {active, false}.
Send the HTTP request packet (preferably as an iolist, with binaries) with ssl:send/2,3. It shall spread on several lines separated with CRLF (\r\n), with first the query line (GET /1.1/statuses/filter.json?... HTTP/1.1) and then the headers including the OAuth headers. Make sure you include Host: stream.twitter.com as well. End with an empty line.
Receive the HTTP response. You can implement this with a loop (since the socket is in passive mode), calling ssl:recv/2,3 until you get http_eoh (end of headers). Note down whether the server will send you data chunked or not by looking at the Transfer-Encoding response header.
Configure the socket in active mode with ssl:setopts/2 and specify you want packets as raw and data in binary format. In fact, if data is chunked, you could continue to use the socket in passive mode. You could also get data line by line or get data as strings. This is a matter of taste: raw is the safest bet, line by line requires that you check the buffer size to prevent truncation of a long JSON-encoded tweet.
Receive data from Twitter as messages sent to your process, either with receive (simple process) or in handle_info handler (if you implemented this with a gen_server). If data is chunked, you shall first receive the chunk size, then the tweets and the end of the chunk eventually (cf RFC 2616). Be prepared to have tweets that spread on several chunks (i.e. maintain some kind of buffer). The best here is to do the minimum decoding in this process and send tweets to another process, possibly in binary format.
You should also handle errors and socket being closed by Twitter. Make sure you follow Twitter's guidelines for reconnection.

Maximum length of URL fragments (hash)

Is there a length limit for the fragment part of an URL (also known as the hash)?

The hash is client side only, so the rules for HTTP may not apply to it.

It depends on the browser.
I found that in safari, chrome, and Firefox, an URL with a long hash is legal, but if it is a param send to the server, the browser will display an 414 or 413 error.
for example:
an URL like http://www.stackoverflow.com/?abc#{hash value with 100 thousand characters} will be ok. and you can use location.hash to get the hash value in javascript but an URL like http://www.stackoverflow.com/?abc&{query with 100 thousand characters} will be illegal, if you paste this link in the address bar, a 413 error code will be given and the message is the client issued a request that was too long. If that is a link in a web page, in my computer, Nginx response the 414 error message.
I don't know the situation in IE.
So I think, the limitation of the length of URL is just for transmission or HTTP server, the browser will check it sometimes, but not every time, and it will always be allowed to be used as a hash.

There is definitely a length for the whole url.
Read
RFC2616 - Hypertext Transfer Protocol
Maximum URL length is 2,083 characters in Internet Explorer

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart