Should I url encode a query string parameter that's a URL? - url

Just say I have the following url that has a query string parameter that's an url:
http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2
Should I url encode the next parameter? If I do, who's responsible for decoding it - the web browser, or my web app?
The reason I ask is I see lots of big sites that do things like the following
http://www.someSite.com?next=http://www.anotherSite.com/another/url
In the above, they don't bother encoding the next parameter because I'm guessing, they know it doesn't have any query string parameters itself. Is this ok to do if my next url doesn't include any query string parameters as well?

RFC 2396 sec. 2.2 says that you should URL-encode those symbols anywhere where they're not used for their explicit meanings; i.e. you should always form targetUrl + '?next=' + urlencode(nextURL).
The web browser does not 'decode' those parameters at all; the browser doesn't know anything about the parameters but just passes along the string. A query string of the form http://www.example.com/path/to/query?param1=value&param2=value2 is GET-requested by the browser as:
GET /path/to/query?param1=value&param2=value2 HTTP/1.1
Host: www.example.com
(other headers follow)
On the backend, you'll need to parse the results. I think PHP's $_REQUEST array will have already done this for you; in other languages you'll want to split over the first ? character, then split over the & characters, then split over the first = character, then urldecode both the name and the value.

According to RFC 3986:
The query component is indicated by the first question mark ("?")
character and terminated by a number sign ("#") character or by the
end of the URI.
So the following URI is valid:
http://www.example.com?next=http://www.example.com
The following excerpt from the RFC makes this clear:
... as query components are often used to carry identifying
information in the form of "key=value" pairs and one frequently used
value is a reference to another URI, it is sometimes better for
usability to avoid percent-encoding those characters.
It is worth noting that RFC 3986 makes RFC 2396 obsolete.

Related

URL Query String without Question Mark?

I cannot find documentation anywhere regarding whether the following URL that has a query string is valid.
http://www.example.com/webapp&someKey=someValue
I know that ? starts a list of key-value pairs separated by &.
Is the ? required?
? appears to be required for the trailing part to be called query.
Query string is defined in RFC 3986. Section 3.3 Path says:
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
Section 3.4 defines query:
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
RFC 1738 for URL has a section for HTTP URL scheme. It says in section 3.3 that:
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
where and are as described in Section 3.1. If :
is omitted, the port defaults to 80. No user name or password is
allowed. is an HTTP selector, and is a query
string. The is optional, as is the and its
preceding "?". If neither nor is present, the "/"
may also be omitted.
Within the and components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
You can use tricks to take the URI as you mention and then split it as if it was a query string. Frameworks like Laravel, Django etc. allow you to handle routes in a query string like manner. There's more to it than what I say; I was just giving an example about Frameworks' handling of URIs.
Look at this example from Laravel documentation: https://laravel.com/docs/7.x/routing#required-parameters. It shows how Laravel takes a route like https://site/posts/1/comments/3 and handles the post id 1 and comment id 3 through a function.
Route::get('posts/{post}/comments/{comment}', function ($postId, $commentId) {
//
});
You can, perhaps, handle routes like http://site/webapp/somekey/somevalue.

Should a client handle an url ending with "?" but no parameters?

A url without parameters but with a question mark appended at the end is passed to the client to be parsed and used.
I've been told the client should be robust enough to handle such url and proceed. But shouldn't this be fixed server-side?
thanks
An empty query part is not an error, so it definitely needs to be accepted by the client. (Reference: RFC 3986 section 3.4 which shows the syntax of the query as 0 or more allowable characters.)
An empty query is different from an undefined query (i.e., the URI does not contain a ?). If the base URI contains a query component, merging a relative URI with an empty query will override the base URI's query, whereas merging a relative URI without a query will copy the base URI's query into the merged result.

Language specific characters in URL

Colleagues from work have created API endpoint which uses language specific characters in url. This api url looks like
http://somedomain.com/someapi/somemethod/zażółć/gęślą/jaźń
Is this OK or is it a bad approach?
Technically, that's not a valid URL but web browsers and other clients finesse it. The script that characters are from is not an issue but structural characters like "/?#" could be. You'll have to consider what to do when they show up in data that you are "pasting" into your URLs.
An HTTP URL is:
an ASCII-encoded scheme (in this case the protocol "http")
a punycode-encoded, ASCII-encoded domain
a %-encoded, ASCII-encoded, server-defined sequence of octets for the path, optional query, and optional hash.
See RFC 3986
The assumption that everyone makes—quite reasonably because it is the predominant practice—is that the path, query, and hash are text. There is no text but encoded text. So, some character encoding is involved. Where %-encoding is needed outside of structural characters, browsers are going to assume UTF-8. If you don't want browsers to do the %-encoding, use valid URLs by doing it yourself with the character encoding that you are using.
As the world is standardizing on UTF-8 (where applicable), the HTML DOM has also with the encodeURIComponent function. Clients using JavaScript in a web browser are likely to use this function, either directly or through some library.
UTF-8 encoded, %-encoded (and, then on the wire, ASCII-encoded) version of your URL that my browser created:
http://somedomain.com/someapi/somemethod/za%C5%BC%C3%B3%C5%82%C4%87/g%C4%99%C5%9Bl%C4%85/ja%C5%BA%C5%84
(You can see this yourself using your browser's dev tools [F12 key, network tab] or a packet sniffer [e.g., Wireshark or Fiddler]. What you gave as a URL is never seen on the wire.)
Your server application probably understands that just fine. In any case, it is your server's rules that the client complies with. If your API uses UTF-8 encoded, %-encoded URLs then just document that. (But phrase it in a way that doesn't confuse people who do that already without knowing.)

Rails: Plus sign in GET-Request replaced by space

In Rails 3 (Ruby 1.9.2) I send an request
Started GET "/controller/action?path=/41_+"
But the parameter list looks like this:
{"path"=>"/41_ ",
"controller"=>"controller",
"action"=>"action"}
Whats going wrong here? The -, * or . sign works fine, its just the +which will be replaced by a space.
That's normal URL encoding, the plus sign is a shorthand for a space:
Within the query string, the plus sign is reserved as shorthand notation for a space. Therefore, real plus signs must be encoded. This method was used to make query URIs easier to pass in systems which did not allow spaces.
And from the HTML5 standard:
The character is a U+0020 SPACE character
Replace the character with a single U+002B PLUS SIGN character (+).
For POST-requests, (in case that's how some of you stumbled upon this question, like me) one might encounter this problem because one has encoded the data in the wrong way on the client side. Encoding the data as application/x-www-form-urlencoded will tell rails to decode the data as it decodes a URL, and hence replace + signs with whitespace, according to the standard RFC1738 as explained by #mu is too short
The solution is to encode the data on the client side as multipart/form-data.
In PHP, using cURL, this is done by taking into consideration the following gotcha:
Passing an array to CURLOPT_POSTFIELDS will encode the data as
multipart/form-data, while passing a URL-encoded string will encode
the data as application/x-www-form-urlencoded. http://php.net/manual/en/function.curl-setopt.php
You might wonder why I was using PHP on the client side (that's because the client in my example was another webserver, since I'm working on an API connection.)

Is a url query parameter valid if it has no value?

Is a url like http://example.com/foo?bar valid?
I'm looking for a link to something official that says one way or the other. A simple yes/no answer or anecdotal evidence won't cut it.
Valid to the URI RFC
Likely acceptable to your server-side framework/code
The URI RFC doesn't mandate a format for the query string. Although it is recognized that the query string will often carry name-value pairs, it is not required to (e.g. it will often contain another URI).
3.4. Query
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). ...
... However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, ...
HTML establishes that a form submitted via HTTP GET should encode the form values as name-value pairs in the form "?key1=value1&key2=value2..." (properly encoded). Parsing of the query string is up to the server-side code (e.g. Java servlet engine).
You don't identify what server-side framework you use, if any, but it is possible that your server-side framework may assume the query string will always be in name-value pairs and it may choke on a query string that is not in that format (e.g. ?bar). If its your own custom code parsing the query string, you simply have to ensure you handle that query string format. If its a framework, you'll need to consult your documentation or simply test it to see how it is handled.
They're perfectly valid. You could consider them to be the equivalent of the big muscled guy standing silently behind the mob messenger. The guy doesn't have a name and doesn't speak, but his mere presence conveys information.
"The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs." http://www.w3.org/Protocols/rfc2616/rfc2616.html
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
So yes, anything is valid after a question mark. Your server may interpret differently, but anecdotally, you can see some languages treat that as a boolean value which is true if listed.
Yes, it is valid.
If one simply want to check if the parameter exists or not, this is one way to do so.
URI Spec
The only relevant part of the URI spec is to know everything between the first ? and the first # fits the spec's definition of a query. It can include any characters such as [:/.?]. This means that a query string such as ?bar, or ?ten+green+apples is valid.
Find the RFC 3986 here
HTML Spec
isindex is not meaningfully HTML5.
It's provided deprecated for use as the first element in a form only, and submits without a name.
If the entry's name is "isindex", its type is "text", and this is the first entry in the form data set, then append the value to result and skip the rest of the substeps for this entry, moving on to the next entry, if any, or the next step in the overall algorithm otherwise.
The isindex flag is for legacy use only. Forms in conforming HTML documents will not generate payloads that need to be decoded with this flag set.
The last time isindex was supported was HTML3. It's use in HTML5 is to provide easier backwards compatibility.
Support in libraries
Support in libraries for this format of URI varies however some libraries do provide legacy support to ease use of isindex.
Perl URI.pm (special support)
Some libraries like Perl's URI provide methods of parsing these kind of structures
$uri->query_keywords
$uri->query_keywords( $keywords, ... )
$uri->query_keywords( \#keywords )
Sets and returns query components that use the keywords separated by "+" format.
Node.js url (no special support)
As another far more frequent example, node.js takes the normal route and eases parsing as either
A string
or, an object of keys and values (using parseQueryString)
Most other URI-parsing APIs following something similar to this.
PHP parse_url, follows as similar implementation but only returns the string for the query. Parsing into an object of k=>v requires parse_string()
It is valid: see Wikipedia, RFC 1738 (3.3. HTTP), RFC 3986 (3. Syntax Components).
isindex deprecated magic name from HTML5
This deprecated feature allows a form submission to generate such an URL, providing further evidence that it is valid for HTML. E.g.:
<form action="#isindex" class="border" id="isindex" method="get">
<input type="text" name="isindex" value="bar"/>
<button type="submit">Submit</button>
</form>
generates an URL of type:
?bar
Standard: https://www.w3.org/TR/html5/forms.html#naming-form-controls:-the-name-attribute
isindex is however deprecated as mentioned at: https://stackoverflow.com/a/41689431/895245
As all other answers described, it's perfectly valid for checking, specially for boolean kind stuff
Here is a simple function to get the query string by name:
function getParameterByName(name, url) {
if (!url) {
url = window.location.href;
}
name = name.replace(/[\[\]]/g, "\\$&");
var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
results = regex.exec(url);
if (!results) return null;
if (!results[2]) return '';
return decodeURIComponent(results[2].replace(/\+/g, " "));
}
and now you want to check if the query string you are looking for exists or not, you may do a simple thing like:
var exampleQueryString = (getParameterByName('exampleQueryString') != null);
the exampleQueryString will be false if the function can't find the query string, otherwise will be true.
The correct resource to look for this is RFC6570. Please refer to section 3.2.9 where in examples empty parameter is presented as below.
Example Template Expansion
{&x,y,empty} &x=1024&y=768&empty=

Resources