Is a URL like http://example.com/foo?bar valid?
I'm looking for a link to something official that says one way or the other. A simple yes/no answer or anecdotal evidence won't cut it.
Valid to the URI RFC
Likely acceptable to your server-side framework/code
The URI RFC doesn't mandate a format for the query string. Although it is recognized that the query string will often carry name-value pairs, it is not required to (e.g. it will often contain another URI).
3.4. Query
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). ...
... However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, ...
HTML establishes that a form submitted via HTTP GET should encode the form values as name-value pairs in the form "?key1=value1&key2=value2..." (properly encoded). Parsing of the query string is up to the server-side code (e.g. Java servlet engine).
You don't identify what server-side framework you use, if any, but it is possible that your server-side framework assumes the query string will always consist of name-value pairs and chokes on a query string that is not in that format (e.g. ?bar). If it's your own custom code parsing the query string, you simply have to ensure you handle that format. If it's a framework, you'll need to consult its documentation or simply test how it handles that case.
They're perfectly valid. You could consider them to be the equivalent of the big muscled guy standing silently behind the mob messenger. The guy doesn't have a name and doesn't speak, but his mere presence conveys information.
"The "http" scheme is used to locate network resources via the HTTP protocol. This section defines the scheme-specific syntax and semantics for http URLs." http://www.w3.org/Protocols/rfc2616/rfc2616.html
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
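So yes, the grammar imposes no key=value structure on what follows the question mark (characters outside the allowed set still need percent-encoding). Your server may interpret it differently; anecdotally, some languages treat such a bare name as a boolean flag that is true when present.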
Yes, it is valid.
If you simply want to check whether the parameter exists or not, this is one way to do so.
URI Spec
The only relevant part of the URI spec is that everything between the first ? and the first # fits the spec's definition of a query. It can include characters such as ":", "/", "." and "?". This means that a query string such as ?bar or ?ten+green+apples is valid.
The specification in question is RFC 3986.
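RFC 3986, Appendix B, gives a regular expression for splitting a URI reference into its components. The JavaScript sketch below uses that expression to show that ?bar yields the query "bar" with no key=value structure required. splitUri is a hypothetical helper name, and the expression only splits the URI; it does not validate the characters inside each component.

```javascript
// Regular expression from RFC 3986, Appendix B (slashes escaped for a JS literal).
// It splits a URI reference into components; it does not validate them.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: returns the components named in RFC 3986, Section 3.
function splitUri(uri) {
  const m = URI_PARTS.exec(uri); // every group is optional, so this always matches
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7], // everything between the first "?" and the first "#"
    fragment: m[9],
  };
}

console.log(splitUri("http://example.com/foo?bar").query); // bar
console.log(splitUri("http://example.com/?ten+green+apples#x").query); // ten+green+apples
```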
HTML Spec
isindex is not meaningfully part of HTML5.
It is kept, deprecated, only for use as the first control in a form, and it submits its value without a name.
If the entry's name is "isindex", its type is "text", and this is the first entry in the form data set, then append the value to result and skip the rest of the substeps for this entry, moving on to the next entry, if any, or the next step in the overall algorithm otherwise.
The isindex flag is for legacy use only. Forms in conforming HTML documents will not generate payloads that need to be decoded with this flag set.
The isindex element is a legacy feature (it was already deprecated in HTML 4.01); its handling in HTML5 exists only to ease backwards compatibility.
Support in libraries
Support in libraries for this format of URI varies; however, some libraries do provide legacy support to ease use of isindex.
Perl URI.pm (special support)
Some libraries, like Perl's URI, provide methods for parsing these kinds of structures:
$uri->query_keywords
$uri->query_keywords( $keywords, ... )
$uri->query_keywords( \@keywords )
Sets and returns query components that use the keywords separated by "+" format.
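For comparison, the same "+"-separated keywords can be read in JavaScript. This is a hypothetical helper written for illustration, not a port of URI.pm; it assumes the WHATWG URL API (browsers, Node.js 10+) and, as an assumption of this sketch, drops empty keywords.

```javascript
// Hypothetical helper mirroring the idea behind Perl's query_keywords:
// treat the query as keywords separated by "+" rather than key=value pairs.
function queryKeywords(uriString) {
  const query = new URL(uriString).search.slice(1); // drop the leading "?"
  if (query === "") return [];
  // Assumption: empty keywords (e.g. from "a++b") are dropped.
  return query
    .split("+")
    .filter((k) => k !== "")
    .map((k) => decodeURIComponent(k));
}

console.log(queryKeywords("http://example.com/search?ten+green+apples"));
// [ 'ten', 'green', 'apples' ]
```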
Node.js url (no special support)
As another, far more common example, Node.js takes the usual route and parses the query as either
A string
or, an object of keys and values (using parseQueryString)
Most other URI-parsing APIs follow something similar to this.
PHP's parse_url follows a similar approach but only returns the query as a string. Parsing it into an array of k=>v pairs requires parse_str().
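To make the two Node.js modes concrete, here is a small sketch using the url.parse() API and its parseQueryString argument. Note that url.parse() is a legacy API in current Node.js, whose documentation recommends the WHATWG URL API instead.

```javascript
const url = require("node:url");

const u = "http://example.com/foo?bar";

// Default: the query is returned as a raw string.
console.log(url.parse(u).query); // bar

// With parseQueryString = true: the query is parsed into an object.
// A bare "bar" becomes a key whose value is the empty string.
console.log(url.parse(u, true).query);
```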
It is valid: see Wikipedia, RFC 1738 (3.3. HTTP), RFC 3986 (3. Syntax Components).
isindex deprecated magic name from HTML5
This deprecated feature allows a form submission to generate such a URL, providing further evidence that it is valid for HTML. E.g.:
<form action="#isindex" class="border" id="isindex" method="get">
<input type="text" name="isindex" value="bar"/>
<button type="submit">Submit</button>
</form>
generates a URL of the form:
?bar
Standard: https://www.w3.org/TR/html5/forms.html#naming-form-controls:-the-name-attribute
isindex is however deprecated as mentioned at: https://stackoverflow.com/a/41689431/895245
As the other answers describe, this is perfectly valid, especially for boolean-style flags.
Here is a simple function to get a query string parameter by name:
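function getParameterByName(name, url) {
    // Default to the current page's URL (browser only).
    if (!url) {
        url = window.location.href;
    }
    // Escape every regex metacharacter in the name, not only [ and ].
    name = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Match the name after ? or &, followed by "=value", &, #, or the end of the URL.
    var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
        results = regex.exec(url);
    if (!results) return null;   // parameter absent
    if (!results[2]) return '';  // present without a value, e.g. ?bar
    // Form encoding uses + for spaces; decode the rest with decodeURIComponent.
    return decodeURIComponent(results[2].replace(/\+/g, " "));
}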
Now, to check whether the parameter you are looking for exists, you can do something as simple as:
var exampleQueryString = (getParameterByName('exampleQueryString') != null);
exampleQueryString will be false if the function cannot find the parameter, and true otherwise (including for a bare ?exampleQueryString).
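Where the WHATWG URL API is available (modern browsers, Node.js 10+), URLSearchParams offers the same check without a hand-written regex: has() distinguishes "absent" from "present but empty", so ?bar works as a flag. hasFlag is a hypothetical name used for illustration.

```javascript
// Hypothetical helper: true if the query contains the parameter, with or without a value.
function hasFlag(name, href) {
  return new URL(href).searchParams.has(name);
}

console.log(hasFlag("bar", "http://example.com/foo?bar")); // true
console.log(hasFlag("bar", "http://example.com/foo?baz=1")); // false

// A bare ?bar has the empty string as its value; an absent name gives null.
console.log(JSON.stringify(new URL("http://example.com/foo?bar").searchParams.get("bar"))); // ""
```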
The resource to consult for this is RFC 6570 (URI Template). Section 3.2.9 (Form-Style Query Continuation) includes an example in which an empty parameter is expanded:
Example template {&x,y,empty} expands to &x=1024&y=768&empty= (with x = 1024, y = 768, and empty set to the empty string).
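A minimal JavaScript sketch of that expansion, assuming string-valued variables only (no lists, prefix modifiers or explode). expandFormContinuation and encodeUnreserved are hypothetical names; the behaviour follows the "&" operator rules: undefined variables are skipped, and an empty value still gets "=".

```javascript
// Percent-encode everything except unreserved characters (ALPHA / DIGIT / "-" / "." / "_" / "~").
// encodeURIComponent leaves ! ' ( ) * alone, so encode those explicitly.
function encodeUnreserved(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Hypothetical helper: expand "{&name1,name2,...}" against a variables object.
function expandFormContinuation(names, vars) {
  const parts = names
    .filter((n) => vars[n] !== undefined) // undefined variables are skipped
    .map((n) => n + "=" + encodeUnreserved(vars[n])); // empty value still gets "="
  return parts.length ? "&" + parts.join("&") : "";
}

console.log(expandFormContinuation(["x", "y", "empty"], { x: "1024", y: "768", empty: "" }));
// &x=1024&y=768&empty=
```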
Related
I cannot find documentation anywhere regarding whether the following URL that has a query string is valid.
http://www.example.com/webapp&someKey=someValue
I know that ? starts a list of key-value pairs separated by &.
Is the ? required?
The ? is required for the trailing part to count as a query.
The query is defined in RFC 3986. Section 3.3, Path, says:
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
Section 3.4 defines query:
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
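The consequence can be checked with the WHATWG URL parser (browsers, Node.js 10+), which on this point agrees with RFC 3986: without a ?, the &someKey=someValue part is simply more path.

```javascript
// Without a "?", "&someKey=someValue" is part of the path, and there is no query.
const noQuery = new URL("http://www.example.com/webapp&someKey=someValue");
console.log(noQuery.pathname); // /webapp&someKey=someValue
console.log(JSON.stringify(noQuery.search)); // ""

// With a "?", the same text becomes a query with one key-value pair.
const withQuery = new URL("http://www.example.com/webapp?someKey=someValue");
console.log(withQuery.pathname); // /webapp
console.log(withQuery.searchParams.get("someKey")); // someValue
```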
RFC 1738, which defines URLs, has a section on the HTTP URL scheme. Section 3.3 says:
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
You can use tricks to take the URI as you describe and then split it as if it were a query string. Frameworks like Laravel and Django let you handle routes in a query-string-like manner. There's more to it than this; it is just an example of how frameworks handle URIs.
Look at this example from Laravel documentation: https://laravel.com/docs/7.x/routing#required-parameters. It shows how Laravel takes a route like https://site/posts/1/comments/3 and handles the post id 1 and comment id 3 through a function.
Route::get('posts/{post}/comments/{comment}', function ($postId, $commentId) {
//
});
You can, perhaps, handle routes like http://site/webapp/somekey/somevalue.
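A hypothetical sketch of that idea in JavaScript, not how Laravel or Django actually implement routing: it assumes the first path segment is a fixed prefix and that the remaining segments alternate key, value (an unpaired trailing segment is ignored).

```javascript
// Hypothetical helper: read /prefix/key1/value1/key2/value2 style routes.
function pathPairs(href, prefix) {
  const segments = new URL(href).pathname.split("/").filter((s) => s !== "");
  if (segments[0] !== prefix) return null; // not our route
  const pairs = {};
  // Walk the remaining segments two at a time: key, then value.
  for (let i = 1; i + 1 < segments.length; i += 2) {
    pairs[decodeURIComponent(segments[i])] = decodeURIComponent(segments[i + 1]);
  }
  return pairs;
}

console.log(pathPairs("http://site/webapp/somekey/somevalue", "webapp"));
// { somekey: 'somevalue' }
```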
I'm reading the book HTTP: The Definitive Guide, which gives the general URL format:
<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
About the <params> part, it says:
The path component for HTTP URLs can be broken into path segments. Each segment can have its own params. For example:
http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true
In my opinion, path params could also be used to query resources, like query strings, so why are they so rarely seen?
I'm a Rails developer, and I haven't seen them used or specified in Rails. Does Rails not support them?
You ask several questions.
Why do we not see ;params=value much?
Because query parameters using ?, = and & are widely supported in PHP, .NET, Ruby, etc., with convenient helpers like $_GET[].
Parameters delimited by ; or , have no such helpers. You do encounter them in REST APIs, where the .htaccess rules or the controller extract the relevant parameters.
Does Ruby support params delimited with ;?
Once you obtain the current URL, you can get all parameters with a simple regex call. This is also why they are used in .htaccess files: they are easily regexed (is that a word?).
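As an illustration of the single-regex approach (in JavaScript rather than Ruby), the sketch below pulls every ;name=value pair out of a URL. matrixParams is a hypothetical name; it flattens the pairs from all segments into one list and does not percent-decode them.

```javascript
// Hypothetical helper: collect every ";name=value" pair in a URL or path.
function matrixParams(path) {
  const pairs = [];
  // A name runs up to "="; a value runs up to the next ";", "/", "?" or "#".
  for (const m of path.matchAll(/;([^;=\/?#]+)=([^;\/?#]*)/g)) {
    pairs.push([m[1], m[2]]);
  }
  return pairs;
}

console.log(matrixParams("http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true"));
// [ [ 'sale', 'false' ], [ 'graphics', 'true' ] ]
```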
Both parameter-passing structures are valid and can be used; the only clear reason one is used more often than the other is preference and support in the different languages.
With the Rails Way of adding hashes in the parameters of a URL like so:
http://api.example.com?person[first]=Jane&person[last]=Doe&person[email]=jane@doe.com
How do I format the API Blueprint doc to accommodate a list of available hashes?
Parameters
person[first] (required, string, Jane) ... First name
This is not legal when I execute the document.
Any ideas or tips are welcome!
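Per https://www.rfc-editor.org/rfc/rfc3986#section-3.2.2, the host's IP literal is the only place square brackets are allowed in the URI syntax, so you must percent-encode [ and ] everywhere else. As such, you need to do: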
Parameters
person%5Bfirst%5D (required, string, Jane) ...
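If you generate such names programmatically, JavaScript's encodeURIComponent performs this escaping. A small sketch using the parameters from the question (note that "@" in the e-mail value is percent-encoded too):

```javascript
// encodeURIComponent percent-encodes "[" and "]".
console.log(encodeURIComponent("person[first]")); // person%5Bfirst%5D

// Building the full query string from the question's parameters.
const params = { "person[first]": "Jane", "person[last]": "Doe", "person[email]": "jane@doe.com" };
const query = Object.entries(params)
  .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
  .join("&");
console.log("http://api.example.com/?" + query);
// http://api.example.com/?person%5Bfirst%5D=Jane&person%5Blast%5D=Doe&person%5Bemail%5D=jane%40doe.com
```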
If you template the URI in your blueprint, you must also escape the [] there as well.
FYI, there is a bug in the original documentation for code generation in Apiary.io (if you are using that) and the generated URIs at the moment that does not properly handle the escaping. You can turn on the Beta documentation, which does not have that issue.
Say I have the following URL with a query string parameter that is itself a URL:
http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2
Should I URL-encode the next parameter? If I do, who is responsible for decoding it: the web browser or my web app?
The reason I ask is I see lots of big sites that do things like the following
http://www.someSite.com?next=http://www.anotherSite.com/another/url
In the above, they don't bother encoding the next parameter because, I'm guessing, they know it has no query string parameters of its own. Is this OK if my next URL doesn't include any query string parameters either?
RFC 2396 sec. 2.2 says that you should URL-encode those symbols anywhere where they're not used for their explicit meanings; i.e. you should always form targetUrl + '?next=' + urlencode(nextURL).
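In JavaScript, encodeURIComponent plays the role of urlencode here. The sketch below (assuming a browser or Node.js with the WHATWG URL API) compares the encoded and unencoded forms of the URL from the question; decoding happens in whatever parses the query on the receiving side, represented here by URLSearchParams.

```javascript
const next = "http://www.anotherSite.com?test=1&test=2";

// Encoded: the whole nested URL survives as the value of "next".
const encoded = "http://www.someSite.com?next=" + encodeURIComponent(next);
console.log(encoded);
// http://www.someSite.com?next=http%3A%2F%2Fwww.anotherSite.com%3Ftest%3D1%26test%3D2
console.log(new URL(encoded).searchParams.get("next")); // http://www.anotherSite.com?test=1&test=2

// Unencoded: the "&" splits the nested URL, so "test=2" becomes a separate parameter.
const raw = new URL("http://www.someSite.com?next=" + next);
console.log(raw.searchParams.get("next")); // http://www.anotherSite.com?test=1
console.log(raw.searchParams.get("test")); // 2
```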
The web browser does not 'decode' those parameters at all; the browser knows nothing about the parameters and just passes the string along. A URL with a query string, such as http://www.example.com/path/to/query?param1=value&param2=value2, is GET-requested by the browser as:
GET /path/to/query?param1=value&param2=value2 HTTP/1.1
Host: www.example.com
(other headers follow)
On the backend, you'll need to parse the results. I think PHP's $_REQUEST array will have already done this for you; in other languages you'll want to split over the first ? character, then split over the & characters, then split over the first = character, then urldecode both the name and the value.
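The steps described above can be sketched in JavaScript as follows. parseQuery is a hypothetical name; treating "+" as a space follows HTML form encoding, and keeping the last value for a repeated name is an assumption of this sketch.

```javascript
// Split on the first "?", then on "&", then on the first "=",
// then URL-decode both the name and the value.
function parseQuery(requestTarget) {
  const result = {};
  const q = requestTarget.indexOf("?");
  if (q === -1) return result; // no query at all
  // Assumption: "+" means space (form encoding).
  const decode = (s) => decodeURIComponent(s.replace(/\+/g, " "));
  for (const pair of requestTarget.slice(q + 1).split("&")) {
    if (pair === "") continue; // skip empty pieces such as in "a=1&&b=2"
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    result[decode(name)] = decode(value); // assumption: a repeated name keeps the last value
  }
  return result;
}

console.log(parseQuery("/path/to/query?param1=value&param2=value2"));
// { param1: 'value', param2: 'value2' }
```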
According to RFC 3986:
The query component is indicated by the first question mark ("?")
character and terminated by a number sign ("#") character or by the
end of the URI.
So the following URI is valid:
http://www.example.com?next=http://www.example.com
The following excerpt from the RFC makes this clear:
... as query components are often used to carry identifying
information in the form of "key=value" pairs and one frequently used
value is a reference to another URI, it is sometimes better for
usability to avoid percent-encoding those characters.
It is worth noting that RFC 3986 makes RFC 2396 obsolete.
I'd like to test whether the URL that the user inputs into my form is "proper", e.g. the following are proper:
http://www.google.com
www.google.com
www.google.com/
but the following probably shouldn't be:
google
http://www.go?ogle?#%
I don't have a precise definition of "proper" in mind, but is there some standard out there that I can use?
In HTML5 you can use the input element with the type value url: http://www.w3.org/TR/html5/states-of-the-type-attribute.html#url-state-type-url. You'd need to check which browsers already implement validation for it, though. If it's important, you'd also need server-side validation, of course.
Here you can see what URLs are considered valid by HTML5: http://www.w3.org/TR/html5/urls.html#valid-url. It references RFC 3986 for URIs and RFC 3987 for IRIs.
You should probably have a look at RegEx for URL validation (see for example this question: PHP validation/regex for URL) or check if your library/programming-language/CMS has special functions for it.
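As one example of such a built-in function, JavaScript's URL constructor (browsers, Node.js 10+) implements the WHATWG URL parser and throws on input it cannot parse. The sketch below is hypothetical: isProperHttpUrl, the "assume http:// when the scheme is missing" rule and the "host must contain a dot" rule are assumptions chosen to match the examples in the question, not a standard. The "%" check reflects RFC 3986's rule that "%" must be followed by two hex digits.

```javascript
// Hypothetical validator tuned to the question's examples.
function isProperHttpUrl(input) {
  // RFC 3986: a "%" must start a pct-encoded triplet ("%" HEXDIG HEXDIG).
  if (/%(?![0-9A-Fa-f]{2})/.test(input)) return false;
  // Assumption: input without a scheme is treated as http.
  const candidate = /^[a-z][a-z0-9+.-]*:\/\//i.test(input) ? input : "http://" + input;
  let url;
  try {
    url = new URL(candidate); // throws a TypeError on unparseable input
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // Assumption: require a dotted host name, which rejects a bare "google".
  return url.hostname.includes(".");
}

for (const s of ["http://www.google.com", "www.google.com", "www.google.com/", "google", "http://www.go?ogle?#%"]) {
  console.log(s, isProperHttpUrl(s));
}
```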