AFNetworking: encoding URL strings containing '%' outputs %25 - ios

Using AFNetworking 2.0 - when doing a GET request with NSDictionary parameters - one of my parameters contains a % in it - it seems like AFNetworking is adding 25 after the % when encoding the URL - is there any way to stop this from happening?

% is used to mark URL-encoded characters. For example, %20 is a space, %3D is =, etc. You can read more about which characters get encoded, and why, here. The percent symbol is used to URL encode other characters, so it needs to be encoded. (Otherwise, the two subsequent characters would be interpreted incorrectly.)
So, encoding % as %25 is expected behavior. If your server isn't parsing this properly, then your server is not conforming to the standard outlined in the document I linked to above.
That said, if you really wish to override this behavior, you can do so by subclassing AFURLRequestSerialization, which contains all of the encoding logic. You can review the requestSerializer property on AFHTTPRequestOperationManager for more details.
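For illustration only (Java rather than Objective-C, with a made-up parameter value, and assuming Java 10+ for the Charset overload), here is a minimal sketch of what standard form-style encoding does to a value containing %:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentEncodingDemo {
    public static void main(String[] args) {
        // A hypothetical parameter value containing a literal percent sign.
        String value = "50% off";

        // Standard form-style encoding: '%' becomes "%25" and ' ' becomes '+'.
        String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
        System.out.println(encoded); // prints: 50%25+off
    }
}

A compliant server decodes "%25" back to a literal '%', so no information is lost.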

Related

When is a character equivalent to its percent-encoded version in a URL?

(Context: I'm writing an HTML sanitiser, and want to normalize URLs as a defence-in-depth measure, making it impossible to use abnormally escaped URLs to bypass downstream blacklists (I'm not relying on blacklists myself) or mislead users.)
When given a URL, in what contexts can a character be changed to its percent-encoded version, or vice versa, without changing the meaning of the URL?
What I've been able to conclude so far:
In the path portion of a URL, / is not equivalent to its escaped form %2F
The separator ? between the path and query string is not equivalent to its escaped form %3F (presumably the same rule also applies to the fragment separator #)
For the special cases of . and .. within a hierarchical path, . is equivalent to %2E according to the specification
Some characters, such as ^, are illegal in URLs, and thus must only appear in encoded form – the decoded form is not equivalent because it can't be used at all
I don't have a second-hand source for this, but all the software I've tested agrees that percent-encoded domain names are equivalent to the corresponding decoded versions (e.g. ex%61mple.com is equivalent to example.com in the host part of a URL) – this makes sense because %, /, and illegal-in-URL characters are all illegal in domain names anyway, so escaping could not possibly be of use
% cannot be equivalent to its encoded form %25, otherwise there would be no way to escape the escape character
application/x-www-form-urlencoded is a commonly (although not universally) used format for URL query strings, and in that format, =, +, & are not equivalent to %3D, %2B, %26 respectively; thus these equivalences cannot hold in URL query strings
However, I'm finding it unclear what the correct action to take with real-world URLs is in other cases, especially as real-life URL parsing libraries tend not to match the specification exactly. In particular:
Should I be percent-decoding characters in the path portion of a URL that are URL-safe (other than %/?#) but have been unexpectedly encoded anyway? The most common software behaviour that I've seen for URLs like http://example.com/ind%65x.html is to treat them as distinct URLs from http://example.com/index.html (e.g. they appear differently in logs and don't compare as equal), but to actually handle the two "distinct" URLs the same way. I don't know whether this is an implementation detail, or whether it's some sort of compatibility workaround.
Should I be decoding any characters in query strings? If so, which?
Should I be decoding any characters in fragments? If so, which?
There seem to be competing standards on this subject, and real-world application behaviour might not match any of them, so I'm interested in knowing how far I can go with URL normalization without breaking real-world use cases. (It would also be helpful to know in which situations escaped characters might be technically different in meaning from the non-escaped versions, but in which escaping them would have no legitimate uses – a sanitiser could have an option to reject URLs that escaped these characters as being likely to be malicious.)
I hope this may provide some insight into your question:
We should only encode the individual components of the URL (for example, query parameters and fragments), excluding the domain name, that may contain unsafe characters. Note that the different components have different rules for which characters need to be encoded and which do not. Please read here [https://datatracker.ietf.org/doc/html/rfc3986].
In general, you can follow the rules below:
These unreserved characters need not be encoded: ALPHA (uppercase and lowercase) / decimal digits / "-" / "." / "_" / "~"
The space character is converted into a plus sign "+" and should not trigger encoding.
All other characters (unsafe characters, and reserved characters when not used for their reserved purpose) should be encoded. Below is a list of such characters (it may include a few more):
! * ' ( ) ; : @ & = + $ , / ? # [ ] % { } | \ ^
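As a rough sketch of the point that each component has its own rules (the host, path, and query values here are made up): java.net.URI encodes each component according to that component's rules, while URLEncoder applies the stricter application/x-www-form-urlencoded rules used for query values.

import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ComponentEncodingDemo {
    public static void main(String[] args) throws Exception {
        // Per-component encoding: the space in the path and the '%' and space
        // in the query are quoted, but '=' stays legal inside the query.
        URI uri = new URI("https", "example.com", "/a path", "q=100% legal", null);
        System.out.println(uri.toASCIIString());
        // e.g. https://example.com/a%20path?q=100%25%20legal

        // Form-style encoding of a single query value: space becomes '+'.
        System.out.println(URLEncoder.encode("100% legal", StandardCharsets.UTF_8));
        // 100%25+legal
    }
}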

Does URL encoding guarantee that all outputted characters are printable (visible)?

Does URL encoding guarantee that all encoded characters (after the encoding process) are printable (visible), within its specification and scope? "Printable" here is defined as "visible on paper". Unfortunately, I could not find any documents mentioning anything similar online.
URL encoding uses a very limited set of characters (a subset of 7-bit ASCII), hence the output is always printable.
All 8-bit codes, plus all of these: !"#$%&'()*+,/:;<=>?@[\]^`{|}~ are turned into something else.
Importantly, and perhaps confusingly: a single space is turned into +.
The goal of the encoding is to avoid parsing problems in URLs:
HTTP://example.com/blah.php?my_url=example.com?confusion reighn&x=(a+b)
The stuff after my_url= should have been encoded.
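For example, a minimal Java sketch (the URL is the made-up one above) of encoding only the value assigned to my_url:

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryValueDemo {
    public static void main(String[] args) {
        String rawValue = "example.com?confusion reighn&x=(a+b)";

        // Form-style encoding keeps the value in printable ASCII and hides the
        // '?', '&', '+', '(', ')' and space that would otherwise confuse parsing.
        String encoded = URLEncoder.encode(rawValue, StandardCharsets.UTF_8);
        System.out.println("HTTP://example.com/blah.php?my_url=" + encoded);
        // HTTP://example.com/blah.php?my_url=example.com%3Fconfusion+reighn%26x%3D%28a%2Bb%29
    }
}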

Cross Platform Url Encoding for Query Strings

There are multiple classes and functions in different Programming Languages for encoding and decoding strings to be URL friendly. For example
in java
URLEncoder.encode(String, String)
in PHP
urlencode ( string $str )
And ...
My question is, if I URL-encode a string in Java, can I expect the URL decoders in other languages to decode it back to the same original string?
I'm creating a service that needs to encode a Base64 value in a query string, and I have no idea who it will be serving to.
Please consider that the only option I have here seems to be the query string. I can't use XML, JSON, or HTTP headers, since I need this to be in a URL used for a redirect.
I looked around and there were some questions exactly like this, but none of them had a proper answer.
I would appreciate any acknowledgement or solutions.
EDIT:
For example in PHP Manual there is this description:
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the » RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
That sounds like it does not follow the RFC.
It sounds like URL encoders can use various algorithms in different programming languages.
But one should look up the encoding scheme for every function. For example, one of them could be
application/x-www-form-urlencoded
Looking into Java's URLEncoder:
Translates a string into application/x-www-form-urlencoded format using a specific encoding scheme. This method uses the supplied encoding scheme to obtain the bytes for unsafe characters.
Also looking into PHP's
that is the same way as in application/x-www-form-urlencoded media type
So if you are looking for cross-platform URL encoding, you should tell your users what format your encoder uses.
This way, they can find the appropriate decoder, or otherwise implement their own.
After some investigation, it sounds like application/x-www-form-urlencoded is the most popular format.
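A minimal sketch of that approach, assuming a Java producer, the application/x-www-form-urlencoded format, and a made-up redirect URL and payload:

import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CrossPlatformDemo {
    public static void main(String[] args) {
        // Base64 output can contain '+', '/' and '=', all of which are
        // significant in query strings, so it must be URL-encoded.
        String base64 = Base64.getEncoder()
                .encodeToString("any binary data".getBytes(StandardCharsets.UTF_8));

        String encoded = URLEncoder.encode(base64, StandardCharsets.UTF_8);
        System.out.println("https://example.com/redirect?payload=" + encoded);

        // Any decoder for the same format (URLDecoder here, urldecode() in PHP)
        // restores the original value.
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(base64)); // true
    }
}

An alternative worth mentioning is java.util.Base64.getUrlEncoder(), which uses the URL-safe Base64 alphabet ('-' and '_' instead of '+' and '/'), so only the '=' padding, if any, still needs encoding.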

Google's URL encoding?

I have noticed that Google does not encode all special characters in the query part of the URL. For example:
Placing this string in Google's search: !@#$%^&*()
Yields this URL: https://www.google.com/#q=!%40%23%24%25^%26*()
Notice that the !, ^, *, ( , and ) are not encoded.
Some of the characters such as : or < are considered unsafe or reserved, yet Google doesn't encode them.
Can someone explain why Google does this, and if they have a reference document as to exactly what characters get encoded and which don't?
Thanks for any help!
As documented here:
Some characters are not safe to use in a URL without first being
encoded. Because a Google search request is made by using an HTTP URL,
the search request must follow URL conventions, including character
encoding, where necessary.
The HTTP URL syntax defines that only alphanumeric characters, the
special characters $-_.+!*'(), and the reserved characters ;/?:@=& can
be used as values within an HTTP URL request. Since reserved
characters are used by the search engine to decode the URL, and some
special characters are used to request search features, then all
non-alphanumeric characters used as a value to an input parameter must
be URL-encoded.
To URL-encode a string:
Replace space characters with a "+" character.
Replace each non-alphanumeric character by its hexadecimal ASCII value, in the format of a "%" character followed by two hexadecimal digits. (Such an ASCII value may be referred to as an escape code.)
Some input parameters require that the values passed to Google search are double-URL-encoded. This requirement means that you must apply the URL encoding to the string twice in succession to generate the final value.
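As a rough sketch of single versus double encoding (Java's URLEncoder is used here only for illustration; note that it leaves '*', '.', '-' and '_' unencoded, which is one reason real-world outputs differ):

import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    public static void main(String[] args) {
        String query = "!@#$%^&*()";

        // Single encoding: '%' becomes "%25", '@' becomes "%40", '*' stays literal.
        String once = URLEncoder.encode(query, StandardCharsets.UTF_8);
        System.out.println(once);  // %21%40%23%24%25%5E%26*%28%29

        // Double encoding: encode the already-encoded string, so every '%'
        // introduced by the first pass becomes "%25" again.
        String twice = URLEncoder.encode(once, StandardCharsets.UTF_8);
        System.out.println(twice); // %2521%2540%2523%2524%2525%255E%2526*%2528%2529
    }
}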

Why do we need to encode and decode URLs?

I have read the question Why do you need to encode URLs
but I am still confused:
Why doesn't the W3C just allow more characters in URLs, so that encoding could be avoided?
Why does decoding exist?
The URL representation of characters may differ from the characters you have in your code. In other words, there is a specific grammar that defines how URLs are assembled. Special characters that are used in forming a URL need to be encoded so that they do not cause unexpected results.
Now to answer your questions more specifically:
They may already allow some of the characters you are thinking of, but these characters (&, ?, for example) are given special meaning to function in a certain way. Therefore, they cannot be used in a different context. From the link to the question you posted, it also looks like in the example of the space character, it is not supported because of the problems it would introduce in its use.
Decoding is useful for recovering the string representation of the URL as it was before encoding, so it can be used for manipulation or other functions in the application.
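For example, a minimal round-trip sketch in Java (the value is made up):

import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeDecodeDemo {
    public static void main(String[] args) {
        // '?' and '&' have special meaning inside a URL, so a value containing
        // them must be encoded before it is placed into a query string ...
        String raw = "price?low&high";
        String encoded = URLEncoder.encode(raw, StandardCharsets.UTF_8);
        System.out.println(encoded);             // price%3Flow%26high

        // ... and decoded again by the receiving application to get back the
        // original string representation.
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(raw)); // true
    }
}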
