Should the WWW-Authenticate realm parameter be encoded? - oauth

When building the WWW-Authenticate header value for OAuth 1.0/1.0a, the spec calls for percent-encoding the parameters. The spec and implementations are ambiguous on whether the realm parameter should be percent-encoded as well.
Section 5.4.1. Authorization Header of the spec reads:
The OAuth Protocol Parameters are sent in the Authorization header the
following way:
Parameter names and values are encoded per Parameter Encoding.
For each parameter, the name is immediately followed by an '=' character (ASCII code 61), a '"' character (ASCII code 34), the parameter value (MAY be empty), and another '"' character (ASCII code 34).
Parameters are separated by a comma character (ASCII code 44) and OPTIONAL linear whitespace per [RFC2617].
The OPTIONAL realm parameter is added and interpreted per [RFC2617], section 1.2.
For example:
Authorization: OAuth realm="http://sp.example.com/",
oauth_consumer_key="0685bd9184jfhq22",
oauth_token="ad180jjd733klru7",
oauth_signature_method="HMAC-SHA1",
oauth_signature="wOJIO9A2W5mFwDgiDvZbTSMK%2FPY%3D",
oauth_timestamp="137131200",
oauth_nonce="4572616e48616d6d65724c61686176",
oauth_version="1.0"
If these steps are meant to be taken in order, then it seems like only the OAuth specific parameters are meant to be url encoded.
If these steps are not meant to be taken in order, then maybe the realm parameter is included in step 1. However, the WWW-Authenticate header example in the OAuth1a spec shows the realm as realm="http://sp.example.com/" which is not percent-encoding the colon or the slashes.
To make matters more confusing, it seems this varies from implementation to implementation. Many OAuth implementations give no special treatment to the parameters and simply percent-encode all of them, but other OAuth implementations give special treatment to the realm parameter and exclude it from percent-encoding.
What is the correct behavior for adding the realm parameter to the WWW-Authenticate header?

The WWW-Authenticate header and the realm parameter, in particular, are defined by rfc2617 and rfc7235, which do not say anything about encoding. rfc7235 shows an example where the spaces in "Login to \"apps\"" are not percent-encoded.
rfc2617 and rfc7235 are the authority on the WWW-Authenticate header and realm parameter while the OAuth1a spec is only the authority on the additional OAuth specific parameters. Therefore the realm parameter should not be percent-encoded and section 5.4.1 of the OAuth1a spec should be interpreted to only be talking about OAuth Protocol Parameters with regards to percent-encoding.
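A minimal sketch of that interpretation in Python, assuming the realm is written as a plain RFC 2617/7235 quoted-string (only backslash and double quote escaped) while the OAuth protocol parameters are percent-encoded. The function names are hypothetical and the parameter order is arbitrary:

```python
from urllib.parse import quote


def oauth_encode(value: str) -> str:
    # OAuth parameter encoding: everything except ALPHA, DIGIT, '-', '.', '_', '~'
    # is percent-encoded.
    return quote(value, safe="~")


def build_oauth_header_value(realm, params):
    parts = []
    if realm is not None:
        # realm is NOT percent-encoded; it is a quoted-string, so only
        # backslash and double quote need escaping.
        escaped = realm.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'realm="{escaped}"')
    for name, value in sorted(params.items()):
        parts.append(f'{oauth_encode(name)}="{oauth_encode(value)}"')
    return "OAuth " + ", ".join(parts)


header = build_oauth_header_value(
    "http://sp.example.com/",
    {"oauth_signature": "wOJIO9A2W5mFwDgiDvZbTSMK/PY="},
)
# header == 'OAuth realm="http://sp.example.com/", oauth_signature="wOJIO9A2W5mFwDgiDvZbTSMK%2FPY%3D"'
```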

Related

URL Query String without Question Mark?

I cannot find documentation anywhere regarding whether the following URL that has a query string is valid.
http://www.example.com/webapp&someKey=someValue
I know that ? starts a list of key-value pairs separated by &.
Is the ? required?
? appears to be required for the trailing part to be called query.
Query string is defined in RFC 3986. Section 3.3 Path says:
The path component contains data, usually organized in hierarchical
form, that, along with data in the non-hierarchical query component
(Section 3.4), serves to identify a resource within the scope of the
URI's scheme and naming authority (if any). The path is terminated
by the first question mark ("?") or number sign ("#") character, or
by the end of the URI.
Section 3.4 defines query:
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
RFC 1738 for URL has a section for HTTP URL scheme. It says in section 3.3 that:
An HTTP URL takes the form:
http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
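You can check how a standard parser applies these definitions. A small sketch using Python's urllib.parse (the choice of Python here is mine, not something from the question):

```python
from urllib.parse import urlsplit

# No "?" in the URL, so everything after the host is path and the query is empty.
without_qmark = urlsplit("http://www.example.com/webapp&someKey=someValue")

# With "?", the part after it becomes the query component.
with_qmark = urlsplit("http://www.example.com/webapp?someKey=someValue")

print(without_qmark.path, repr(without_qmark.query))
print(with_qmark.path, repr(with_qmark.query))
```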
You can use tricks to take the URI as you mention and then split it as if it was a query string. Frameworks like Laravel, Django etc. allow you to handle routes in a query string like manner. There's more to it than what I say; I was just giving an example about Frameworks' handling of URIs.
Look at this example from Laravel documentation: https://laravel.com/docs/7.x/routing#required-parameters. It shows how Laravel takes a route like https://site/posts/1/comments/3 and handles the post id 1 and comment id 3 through a function.
Route::get('posts/{post}/comments/{comment}', function ($postId, $commentId) {
//
});
You can, perhaps, handle routes like http://site/webapp/somekey/somevalue.
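To illustrate the idea outside of Laravel, here is a rough Python sketch of matching a path against a pattern with {placeholders}. The match_route helper is hypothetical and much simpler than what a real framework does:

```python
import re


def match_route(pattern: str, path: str):
    """Return a dict of placeholder values, or None if the path does not match."""
    regex_parts = []
    for segment in pattern.strip("/").split("/"):
        placeholder = re.fullmatch(r"\{(\w+)\}", segment)
        if placeholder:
            # A placeholder matches exactly one non-empty path segment.
            regex_parts.append(f"(?P<{placeholder.group(1)}>[^/]+)")
        else:
            regex_parts.append(re.escape(segment))
    match = re.fullmatch("/".join(regex_parts), path.strip("/"))
    return match.groupdict() if match else None


print(match_route("posts/{post}/comments/{comment}", "/posts/1/comments/3"))
print(match_route("webapp/{key}/{value}", "/webapp/somekey/somevalue"))
```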

What are the legal and illegal characters in URL/Link?

What happens if there is an illegal character? Does the URL fix itself by encoding the illegal characters into something else?
As explained here
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=
Any other character needs to be encoded with the percent-encoding
(%hh). Each part of the URI has further restrictions about what
characters need to be represented by an percent-encoded word.
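As a rough illustration of percent-encoding, Python's urllib.parse.quote leaves the unreserved characters alone and encodes everything else as %hh over the UTF-8 bytes. The encode_path helper is hypothetical, and keeping "/" unencoded is an assumption that the input is a whole path rather than a single segment:

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    # "/" is kept as the segment separator; spaces and non-ASCII characters
    # are percent-encoded.
    return quote(path, safe="/")


encoded = encode_path("/docs/read me/é.txt")
print(encoded)           # /docs/read%20me/%C3%A9.txt
print(unquote(encoded))  # /docs/read me/é.txt
```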
Allowed characters
RFC 3986 defines which characters are allowed in which URI components.
RFCs for specific URI schemes might further restrict this.
If you are interested in HTTP/HTTPS URIs: they are defined in RFC 7230. AFAIK they don’t have further restrictions regarding allowed characters, so you could stick to the definitions in RFC 3986.
What happens if illegal characters are used?
Depends on many factors … could be anything from "nothing happens" to "doesn’t work anymore".
Does the URL fix itself by encoding the illegal characters into something else?
A URI can’t fix itself, it’s just a string.
Clients working with this URI (browser, server, email client, etc.) may try to fix a URI (or work with invalid URIs) according to their own rules.
URI vs. link
Also note that there’s a difference between a URI and linking to (or storing etc.) this URI in a document.
The host language (e.g., HTML) might have rules what to encode. This does not change the URI, only the way the URI is stored/specified in this document.
For example, the valid URI http://example.com/a&b would have to be linked like this in HTML documents:
<a href="http://example.com/a&amp;b">Link</a>
But the URI is still http://example.com/a&b, not http://example.com/a&amp;b.
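The same distinction shows up in code: escaping for the host language happens when the document is written and is undone when it is read, so the URI itself never changes. A small sketch with Python's html module:

```python
import html

uri = "http://example.com/a&b"

# Escape for embedding in an HTML attribute; this is the stored form, not the URI.
attribute_value = html.escape(uri)
link = f'<a href="{attribute_value}">Link</a>'

# An HTML parser reverses the escaping and recovers the original URI.
recovered = html.unescape(attribute_value)

print(link)
print(recovered)
```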

Should I url encode a query string parameter that's a URL?

Just say I have the following url that has a query string parameter that's an url:
http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2
Should I url encode the next parameter? If I do, who's responsible for decoding it - the web browser, or my web app?
The reason I ask is I see lots of big sites that do things like the following
http://www.someSite.com?next=http://www.anotherSite.com/another/url
In the above, they don't bother encoding the next parameter because I'm guessing, they know it doesn't have any query string parameters itself. Is this ok to do if my next url doesn't include any query string parameters as well?
RFC 2396 sec. 2.2 says that you should URL-encode those symbols anywhere where they're not used for their explicit meanings; i.e. you should always form targetUrl + '?next=' + urlencode(nextURL).
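In Python that concatenation could look like the sketch below. The with_next helper is hypothetical, and safe="" makes quote encode reserved characters such as ":", "/", "?", "&" and "=" too:

```python
from urllib.parse import quote


def with_next(target_url: str, next_url: str) -> str:
    # Encode the whole nested URL so its "?", "&" and "=" cannot be mistaken
    # for delimiters of the outer query string.
    return target_url + "?next=" + quote(next_url, safe="")


url = with_next("http://www.someSite.com", "http://www.anotherSite.com?test=1&test=2")
print(url)
# http://www.someSite.com?next=http%3A%2F%2Fwww.anotherSite.com%3Ftest%3D1%26test%3D2
```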
The web browser does not 'decode' those parameters at all; the browser doesn't know anything about the parameters but just passes along the string. A query string of the form http://www.example.com/path/to/query?param1=value&param2=value2 is GET-requested by the browser as:
GET /path/to/query?param1=value&param2=value2 HTTP/1.1
Host: www.example.com
(other headers follow)
On the backend, you'll need to parse the results. I think PHP's $_REQUEST array will have already done this for you; in other languages you'll want to split over the first ? character, then split over the & characters, then split over the first = character, then urldecode both the name and the value.
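The manual steps described above can be sketched in Python as follows. The parse_query helper is hypothetical and shown only to make the steps concrete; in practice urllib.parse.parse_qsl does this for you. It uses unquote_plus on the assumption that "+" stands for a space, as in HTML form submissions:

```python
from urllib.parse import unquote_plus


def parse_query(url: str):
    # Split over the first "?"; anything after "#" is the fragment, not the query.
    _, found, query = url.partition("?")
    if not found:
        return []
    query = query.split("#", 1)[0]
    pairs = []
    for field in query.split("&"):
        if not field:
            continue
        # Split over the first "=", then decode both the name and the value.
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


pairs = parse_query(
    "http://www.someSite.com?next=http%3A%2F%2Fwww.anotherSite.com%3Ftest%3D1%26test%3D2"
)
print(pairs)  # [('next', 'http://www.anotherSite.com?test=1&test=2')]
```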
According to RFC 3986:
The query component is indicated by the first question mark ("?")
character and terminated by a number sign ("#") character or by the
end of the URI.
So the following URI is valid:
http://www.example.com?next=http://www.example.com
The following excerpt from the RFC makes this clear:
... as query components are often used to carry identifying
information in the form of "key=value" pairs and one frequently used
value is a reference to another URI, it is sometimes better for
usability to avoid percent-encoding those characters.
It is worth noting that RFC 3986 makes RFC 2396 obsolete.
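Valid does not mean unambiguous, though. A quick check with Python's standard parser on the URL from the question shows what a typical backend would see when the nested URL carries its own "&" (a sketch, other parsers may behave differently):

```python
from urllib.parse import parse_qsl, urlsplit

url = "http://www.someSite.com?next=http://www.anotherSite.com?test=1&test=2"
pairs = parse_qsl(urlsplit(url).query)

# The second "test" parameter is attributed to the outer URL, not to "next".
print(pairs)
# [('next', 'http://www.anotherSite.com?test=1'), ('test', '2')]
```

So leaving the value unencoded works as long as the nested URL contains no "&" (or "#"), which matches the second example in the question.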

What kind of URI are allowed for OpenID?

I'm implementing a login system using OpenID.
The doc says:
Subject Identifier
An identifier for a set of attributes. It MUST be a URI. The subject identifier corresponds to the end-user identifier in the authentication portion of the messages. In other words, the subject of the identity attributes in the attribute exchange part of the message is the same as the end-user in the authentication part. The subject identifier is not included in the attribute exchange.
The definition of a URI is quite broad: it can be http://, but also gopher://.
I'm sure gopher is not a valid scheme here, but then, excluding http(s), what else is allowed as a subject identifier by the OpenID protocol?
You're quoting the wrong spec. The OpenID specification, section 7.2, says:
7.2. Normalization
The end user's input MUST be normalized into an Identifier, as follows:
1. If the user's input starts with the "xri://" prefix, it MUST be stripped off, so that XRIs are used in the canonical form.
2. If the first character of the resulting string is an XRI Global Context Symbol ("=", "@", "+", "$", "!") or "(", as defined in Section 2.2.1 of [XRI_Syntax_2.0], then the input SHOULD be treated as an XRI.
3. Otherwise, the input SHOULD be treated as an http URL; if it does not include a "http" or "https" scheme, the Identifier MUST be prefixed with the string "http://". If the URL contains a fragment part, it MUST be stripped off together with the fragment delimiter character "#". See Section 11.5.2 for more information.
4. URL Identifiers MUST then be further normalized by both following redirects when retrieving their content and finally applying the rules in Section 6 of [RFC3986] to the final destination URL. This final URL MUST be noted by the Relying Party as the Claimed Identifier and be used when requesting authentication.
From the third point we can infer that the identifier must be either an http(s) URL or an XRI.
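A rough Python sketch of the first three normalization rules, to show how the XRI/URL decision falls out. The normalize_identifier function is hypothetical; it does not follow redirects or apply RFC 3986 section 6 normalization (the fourth rule), and it assumes a lowercase scheme:

```python
XRI_FIRST_CHARS = ("=", "@", "+", "$", "!", "(")


def normalize_identifier(user_input: str):
    """Return ("XRI", value) or ("URL", value) for the end user's input."""
    value = user_input
    # Rule 1: strip the "xri://" prefix.
    if value.startswith("xri://"):
        value = value[len("xri://"):]
    # Rule 2: a global context symbol or "(" marks an XRI.
    if value.startswith(XRI_FIRST_CHARS):
        return ("XRI", value)
    # Rule 3: otherwise treat it as an http URL, add the scheme if missing,
    # and strip the fragment together with its "#" delimiter.
    if not value.startswith(("http://", "https://")):
        value = "http://" + value
    value = value.split("#", 1)[0]
    return ("URL", value)


print(normalize_identifier("example.com/user#me"))  # ('URL', 'http://example.com/user')
print(normalize_identifier("xri://=example"))       # ('XRI', '=example')
```

Note that an input such as gopher://example.com does not become a gopher identifier under these rules; it is simply prefixed with "http://".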

What are the characteristics of an OAuth token?

How many characters long can an oauth access token and oauth access secret be and what are the allowed characters? I need to store them in a database.
I am not sure there are any explicit limits. The spec doesn't have any.
That said, OAuth tokens are often passed as URL parameters and so have some of the same limitations, i.e. they need to be properly encoded, etc.
OAuth doesn't specify the format or content of a token. We simply use encrypted name-value pairs as the token. You can use any characters in a token, but it's much easier to handle if the token is URL-safe. We achieve this by encoding the ciphertext with a URL-safe Base64.
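A minimal sketch of the URL-safe Base64 step in Python. The encryption itself is out of scope here, so random bytes stand in for the ciphertext, and stripping the "=" padding is my own assumption rather than something stated above:

```python
import base64
import os


def encode_token(ciphertext: bytes) -> str:
    # URL-safe Base64 uses "-" and "_" instead of "+" and "/".
    return base64.urlsafe_b64encode(ciphertext).rstrip(b"=").decode("ascii")


def decode_token(token: str) -> bytes:
    # Restore the padding that was stripped before decoding.
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


ciphertext = os.urandom(32)  # placeholder for the encrypted name-value pairs
token = encode_token(ciphertext)
print(token, len(token))
```

With this scheme the stored token length depends on the ciphertext length, so size the database column from the largest payload you expect to encrypt.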
As most people already pointed out, the OAuth specification doesn't give you exact directions, but it does say...
cited from: https://datatracker.ietf.org/doc/html/draft-hammer-oauth-10#section-4.9
"Servers should be careful to assign shared-secrets which are long enough, and random enough, to resist such attacks for at least the length of time that the shared-secrets are valid."
"Of course, servers are urged to err on the side of caution, and use the longest secrets reasonable."
On the other hand, you should consider the maximum URL length of browsers:
see: http://www.boutell.com/newfaq/misc/urllength.html
If you read the spec, it says,
The authorization server issues the registered client a client
identifier - a unique string representing the registration
information provided by the client. The client identifier is not a
secret; it is exposed to the resource owner, and MUST NOT be used
alone for client authentication. The client identifier is unique to
the authorization server.
The client identifier string size is left undefined by this
specification. The client should avoid making assumptions about the
identifier size. The authorization server SHOULD document the size
of any identifier it issues.
Second, the access token should be sent as a header, not as a URL param:
Authorization: Bearer <token>
An OAuth token is conceptually an arbitrary-sized sequence of bytes, not characters. In URLs, it gets encoded using standard URL escaping mechanisms:
unreserved = ALPHA, DIGIT, '-', '.', '_', '~'
Everything not unreserved gets %-encoded.
I'm not sure whether you just talk about the oauth_token parameter that gets passed around. Usually, additional parameters need to be stored and transmitted as well, such as oauth_token_secret, oauth_signature, etc. Some of them have different data types, for example, oauth_timestamp is an integer representing seconds since 1970 (encoded in decimal ASCII digits).
Valid chars for OAuth token are limited by HTTP header value restrictions as OAuth token is frequently sent in HTTP header "Authorization".
Valid chars for HTTP headers are specified by https://www.rfc-editor.org/rfc/rfc7230#section-3.2.6. Alternatively you may check HTTP header validating code of some popular HTTP client libs, for example see Headers.checkNameAndValue() util of OkHttp framework: https://github.com/square/okhttp/blob/master/okhttp/src/main/java/okhttp3/Headers.java
And this is not all. I wouldn't include HTTP header separators (; and many others), whitespace characters (' ' and '\t'), or the double quote (") (see https://www.rfc-editor.org/rfc/rfc7230#section-3.2.6), as the OAuth token would then have to be escaped before being used in an HTTP header. Tokens are frequently used by humans in curl test requests, so good token generators don't add such characters. But before making any assumptions, you should check which characters the OAuth token generator your service works with can produce.
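If you want to enforce this on your side, one conservative option is to check tokens against the b64token syntax that RFC 6750 defines for Bearer credentials (letters, digits, "-", ".", "_", "~", "+", "/", followed by optional "=" padding). Choosing that grammar is an assumption on my part, and the helper name is hypothetical. A Python sketch:

```python
import re

# b64token from RFC 6750: 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" ) *"="
B64TOKEN = re.compile(r"[A-Za-z0-9\-._~+/]+=*")


def is_header_safe_token(token: str) -> bool:
    # True if the token can be placed after "Bearer " without any escaping.
    return B64TOKEN.fullmatch(token) is not None


print(is_header_safe_token("mF_9.B5f-4.1JqM"))  # True
print(is_header_safe_token("bad token"))        # False
```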
To be specific, even though the OAuth spec doesn't say anything: if you are using Java and MySQL and generate the tokens as UUIDs, as we generally do, a token is 16 bytes and can be stored as BINARY(16) in the database. I know these details as I have recently done development using OAuth.
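The sizes involved are easy to check. A short sketch in Python rather than Java, using the standard uuid module; which form you store is up to your schema:

```python
import uuid

token = uuid.uuid4()

raw = token.bytes    # 16 raw bytes, fits a BINARY(16) column
hex_form = token.hex # 32 hexadecimal characters
text = str(token)    # 36 characters including the four hyphens

print(len(raw), len(hex_form), len(text))  # 16 32 36
```

So "16" only holds for the raw binary form; the textual forms need a 32 or 36 character column.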

Resources