Difference between application/x-www-form-urlencoded vs URL Encoding - url

Based on this reference, the URL encoding for a space is %20 and + is %2B.
But based on these two posts (post1 and post2), application/x-www-form-urlencoded uses + for a space.
I am confused about the difference between URL encoding in general and application/x-www-form-urlencoded.
What's this difference and why?
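To make the two behaviours concrete, here is a minimal sketch using Python's standard urllib.parse module. It assumes quote() is a fair stand-in for generic percent-encoding and quote_plus() for application/x-www-form-urlencoded; the sample string is made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b+c"  # hypothetical sample value containing a space and a plus

# Generic percent-encoding: the space becomes %20, the plus becomes %2B.
generic = quote(text, safe="")

# Form encoding: the space becomes +, the plus still becomes %2B.
form = quote_plus(text)

print(generic)  # a%20b%2Bc
print(form)     # a+b%2Bc

# The decoders disagree about a literal "+": only the form decoder
# treats it as a space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```

Both encoders agree that a literal plus must be written as %2B; they differ only in how a space is written and in how an unescaped + is read back.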

Related

When is a character equivalent to its percent-encoded version in a URL?

(Context: I'm writing an HTML sanitiser, and want to normalize URLs as a defence-in-depth measure, making it impossible to use abnormally escaped URLs to bypass downstream blacklists (I'm not relying on blacklists myself) or mislead users.)
When given a URL, in what contexts can a character be changed to its percent-encoded version, or vice versa, without changing the meaning of the URL?
What I've been able to conclude so far:
In the path portion of a URL, / is not equivalent to its escaped form %2F
The separator ? between the path and query string is not equivalent to its escaped form %3F (presumably the same rule also applies to the fragment separator #)
For the special cases of . and .. within a hierarchical path, . is equivalent to %2E according to the specification
Some characters, such as ^, are illegal in URLs, and thus must only appear in encoded form – the decoded form is not equivalent because it can't be used at all
I don't have a second-hand source for this, but all the software I've tested agrees that percent-encoded domain names are equivalent to the corresponding decoded versions (e.g. ex%61mple.com is equivalent to example.com in the host part of a URL) – this makes sense because %, /, and illegal-in-URL characters are all illegal in domain names anyway, so escaping could not possibly be of use
% cannot be equivalent to its encoded form %25, otherwise there would be no way to escape the escape character
application/x-www-form-urlencoded is a commonly (although not universally) used format for URL query strings, and in that format, =, +, & are not equivalent to %3D, %2B, %26 respectively; thus these equivalences cannot hold in URL query strings
However, I'm finding it unclear what the correct action to take with real-world URLs is in other cases, especially as real-life URL parsing libraries tend not to match the specification exactly. In particular:
Should I be percent-decoding characters in the path portion of a URL that are URL-safe (other than %/?#) but have been unexpectedly encoded anyway? The most common software behaviour that I've seen for URLs like http://example.com/ind%65x.html is to treat them as distinct URLs from http://example.com/index.html (e.g. they appear differently in logs and don't compare as equal), but to actually handle the two "distinct" URLs the same way. I don't know whether this is an implementation detail, or whether it's some sort of compatibility workaround.
Should I be decoding any characters in query strings? If so, which?
Should I be decoding any characters in fragments? If so, which?
There seem to be competing standards on this subject, and real-world application behaviour might not match any of them, so I'm interested in knowing how far I can go with URL normalization without breaking real-world use cases. (It would also be helpful to know in which situations escaped characters might be technically different in meaning from the non-escaped versions, but in which escaping them would have no legitimate uses – a sanitiser could have an option to reject URLs that escaped these characters as being likely to be malicious.)
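The point about =, + and & in form-encoded query strings can be checked with a short sketch. It assumes Python's urllib.parse.parse_qsl as the form decoder; the parameter names are hypothetical.

```python
from urllib.parse import parse_qsl

# A raw "&" separates two parameters; an escaped one is part of the value.
raw_amp = parse_qsl("a=b&c=d")
escaped_amp = parse_qsl("a=b%26c=d")

# A raw "+" decodes to a space; an escaped one decodes to a literal plus.
raw_plus = parse_qsl("a=1+2")
escaped_plus = parse_qsl("a=1%2B2")

print(raw_amp)       # [('a', 'b'), ('c', 'd')]
print(escaped_amp)   # [('a', 'b&c=d')]
print(raw_plus)      # [('a', '1 2')]
print(escaped_plus)  # [('a', '1+2')]
```

A normalizer that decoded %26 or %2B inside a query string would therefore change what a form-decoding server sees.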
I hope this may provide some insight to your question:
We should only encode the individual components of the URL (for example, query parameters and fragments) that may contain unsafe symbols, excluding the domain name. Please note that the different components have different rules about which characters need to be encoded and which do not. Please read here [https://datatracker.ietf.org/doc/html/rfc3986].
In general, you can follow these rules:
These unreserved characters need not be encoded: ALPHA (uppercase and lowercase) / decimal digits / "-" / "." / "_" / "~"
In application/x-www-form-urlencoded data, the space character is converted into a plus sign "+" instead of being percent-encoded.
All other characters (unsafe characters, and reserved characters that are not being used for their reserved purposes) should be encoded. Below is a list of such characters (there may be a few more):
! * ' ( ) ; : @ & = + $ , / ? # [ ] % { } | \ ^
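As a sketch of how the unreserved set above could drive normalization, the function below decodes a percent-escape only when it stands for an unreserved character and uppercases the hex digits of every other escape, in the spirit of RFC 3986 sections 6.2.2.1 and 6.2.2.2. The name normalize_unreserved is hypothetical, and whether real servers treat the two spellings identically remains an assumption to verify.

```python
import re
import string

# Unreserved characters per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(component: str) -> str:
    """Decode %XX only for unreserved characters; uppercase other escapes."""

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # safe to decode: same meaning either way
        return "%" + match.group(1).upper()  # keep escaped, canonical case

    return _PCT_ESCAPE.sub(replace, component)


print(normalize_unreserved("/ind%65x.html"))  # /index.html
print(normalize_unreserved("/a%2fb%20c"))     # /a%2Fb%20c
```

Escapes for reserved characters, for %, and for bytes outside ASCII are left encoded, so the sketch never changes where a delimiter appears.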

Does URL encoding guarantee that all output characters are printable (visible)?

Does URL encoding guarantee that all characters produced by the encoding process are printable (visible), within its specification and scope? "Printable" here is defined as "visible on paper". Unfortunately, I could not find any documents online that mention anything similar.
URL encoding outputs only a very limited set of 7-bit ASCII characters, so the result is always printable.
All 8-bit codes, plus all of these: !"# $%&' ()*+ ,/:; <=>? @[\] ^`{| }~ are turned into something else.
Perhaps importantly, but confusingly: in form encoding, a single space is turned into +.
The goal of the encoding is to avoid parsing problems in URLs:
HTTP://example.com/blah.php?my_url=example.com?confusion reighn&x=(a+b)
The stuff after my_url= should have been encoded.
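A minimal sketch of encoding that nested value, assuming Python's urllib.parse.urlencode (which produces form encoding) is an acceptable encoder here. It also checks that every character of the result is printable ASCII.

```python
import string
from urllib.parse import parse_qs, urlencode, urlsplit

# The nested value from the example above, before any encoding.
inner = "example.com?confusion reighn&x=(a+b)"

url = "http://example.com/blah.php?" + urlencode({"my_url": inner})
print(url)
# http://example.com/blah.php?my_url=example.com%3Fconfusion+reighn%26x%3D%28a%2Bb%29

# The outer URL now has exactly one parameter, and it decodes back intact.
decoded = parse_qs(urlsplit(url).query)
assert decoded == {"my_url": [inner]}

# Every character of the encoded URL is a visible ASCII character.
VISIBLE = set(string.ascii_letters + string.digits + string.punctuation)
assert all(ch in VISIBLE for ch in url)
```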

Cross Platform Url Encoding for Query Strings

There are multiple classes and functions in different programming languages for encoding and decoding strings to be URL-friendly. For example:
In Java:
URLEncoder.encode(String, String)
In PHP:
urlencode ( string $str )
And so on.
My question is: if I URL-encode a string in Java, can I expect the URL decoders in other languages to decode it to the same original string?
I'm creating a service that needs to put a Base64 value in a query string, and I have no idea who I will be serving.
Please note that the query string seems to be my only option here. I can't use XML, JSON, or HTTP headers, since I need this to be in a URL that is redirected to.
I looked around and found some questions exactly like this one, but none of them had a proper answer.
I would appreciate any pointers or solutions.
EDIT:
For example, the PHP manual has this description:
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
That sounds like it does not follow the RFC.
It sounds like URL encoders can use different algorithms in different programming languages.
So one should check the encoding scheme of each function. For example, one such scheme is
application/x-www-form-urlencoded
Looking at Java's URLEncoder:
Translates a string into application/x-www-form-urlencoded format using a specific encoding scheme. This method uses the supplied encoding scheme to obtain the bytes for unsafe characters.
Also, looking at PHP's urlencode:
that is the same way as in application/x-www-form-urlencoded media type
So if you are looking for cross-platform URL encoding, you should tell your users which format your encoder uses.
This way, they can find the appropriate decoder, or otherwise implement their own.
After some investigation, it seems that application/x-www-form-urlencoded is the most popular format.
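For the Base64 case specifically, one way to sidestep the format question is to percent-encode every non-unreserved character, so that no literal + ever appears in the output. The sketch below uses Python's urllib.parse as a stand-in for "some other platform's decoder"; the payload bytes are made up to force +, / and = into the token.

```python
import base64
from urllib.parse import quote, unquote, unquote_plus

# Hypothetical payload, chosen so the Base64 text contains '+', '/' and '='.
payload = b"\xfb\xff\xfe\x00"
token = base64.b64encode(payload).decode("ascii")

# Escape everything that is not unreserved, including '+', '/' and '='.
encoded = quote(token, safe="")

# With no literal '+' left, the generic decoder and the form decoder agree.
assert unquote(encoded) == token
assert unquote_plus(encoded) == token
assert base64.b64decode(unquote_plus(encoded)) == payload

# Alternative: the URL-safe Base64 alphabet replaces '+' and '/' with
# '-' and '_', leaving only the '=' padding to escape.
urlsafe_token = base64.urlsafe_b64encode(payload).decode("ascii")
assert base64.urlsafe_b64decode(urlsafe_token) == payload
```

The design choice here is to avoid depending on which decoder the receiving side uses, rather than to pick one format and document it.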

What is the charset of URLs?

When someone types a URL in a browser to access a page, which charset is used for that URL? Is there a standard? Can I consider that UTF-8 is used everywhere? Which characters are accepted?
URLs may contain only a subset of ASCII, so all URLs are valid ASCII.
International domain names must be Punycode encoded. Non-ASCII characters in the path or query parts must be encoded, with Percent-encoding being the generally agreed-upon standard.
Percent-encoding only takes the raw bytes and encodes each byte as %xx. There's no generally followed standard on what encoding should be used to determine a byte representation. As such, it's basically impossible to assume any particular character set being used in the percent-encoded representation. If you're creating those links, then you're in full control over the used charset before percent-encoding; if you're not, you're mostly out of luck. Though you will most likely encounter UTF-8, this is not guaranteed.
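The dependence on the underlying charset can be seen in a small sketch with Python's urllib.parse, where the encoding argument selects how characters become bytes before percent-encoding. The sample character is chosen only for illustration.

```python
from urllib.parse import quote, unquote

char = "é"

# The same character yields different escapes under different charsets.
as_utf8 = quote(char, encoding="utf-8")
as_latin1 = quote(char, encoding="latin-1")
print(as_utf8)    # %C3%A9
print(as_latin1)  # %E9

# Decoding with the wrong charset does not recover the original text:
# the lone byte 0xE9 is invalid UTF-8 and becomes U+FFFD by default.
right = unquote(as_latin1, encoding="latin-1")
wrong = unquote(as_latin1, encoding="utf-8")
print(right == char)  # True
print(wrong == char)  # False
```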

AFNetworking: encoding URL strings containing '%' outputs %25

Using AFNetworking 2.0, when doing a GET request with NSDictionary parameters, one of my parameters contains a % in it. It seems like AFNetworking is putting 25 after the % when encoding the URL. Is there any way to stop this from happening?
% is used to mark URL-encoded characters. For example, %20 is a space, %3D is =, etc. You can read more about which characters get encoded, and why, here. The percent symbol is used to URL encode other characters, so it needs to be encoded. (Otherwise, the two subsequent characters would be interpreted incorrectly.)
So, encoding % as %25 is expected behavior. If your server isn't parsing this properly, then your server is not conforming to the standard outlined in the document I linked to above.
That said, if you really wish to override this behavior, you can do so by subclassing AFURLRequestSerialization, which contains all of the encoding logic. You can review the requestSerializer property on AFHTTPRequestOperationManager for more details.
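The encoding rule itself is not specific to AFNetworking. As a language-neutral illustration (Python's urllib.parse, not AFNetworking's API), the sketch below shows % becoming %25, and what happens when a value that is already percent-encoded is encoded a second time. The sample value is hypothetical.

```python
from urllib.parse import quote, unquote

value = "50% off"  # hypothetical raw parameter value

# A raw "%" must itself be escaped, so it becomes "%25".
encoded = quote(value, safe="")
print(encoded)  # 50%25%20off

# One decode recovers the original value.
assert unquote(encoded) == value

# Encoding an already-encoded string escapes its "%" signs again.
twice = quote(encoded, safe="")
print(twice)  # 50%2525%2520off
assert unquote(twice) == encoded
```

If the parameter handed to the library is already percent-encoded, the extra 25 is this double encoding at work; passing the raw value avoids it.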

Resources