validate opaque access token by Azure AD - oauth-2.0

Does Azure AD issue opaque access tokens or only JWT tokens?
If yes how do you validate opaque access tokens in that case? because there is no introspection end point?

A JWT has readable content, as you can see for example on https://jwt.io/. Everyone can decode the token and read the information in it. The format is documented in RFC 7519.
An opaque token on the other hand has a format that is not intended to be read by you. Only the issuer knows the format.
Here's a quote from https://auth0.com/docs/tokens:
Opaque tokens: Tokens in a proprietary format that typically contain some identifier to information in a server’s persistent storage. To validate an opaque token, the recipient of the token needs to call the server that issued the token.
an opaque token is a simple string it is just a reference, hence, naturally, its format is entirely arbitrarily determined by the server that issues it (hence the term "proprietary format"). The token string is determined at the time of creation of the underlying (referred-to) content, i.e. when it is paired (associated) with the contents that this token (as the reference or foreign key) refers to
some JWT frameworks only the authentication token is a JWT, but as refresh token they use opaque tokens.
For more information refer this SO thread

Related

How do you guarantee safety when validating Keycloak (OIDC) tokens without making requests to the Keycloak API?

There is a lot of information on OAuth, OIDC and Keycloak, but the main thing every tutorial seems to gloss over is offline validation. The only information I found is in the Keycloak docs on RPT introspection:
No. Just like a regular access token issued by a Keycloak server, RPTs also use the JSON web token (JWT) specification as the default format. If you want to validate these tokens without a call to the remote introspection endpoint, you can decode the RPT and query for its validity locally. Once you decode the token, you can also use the permissions within the token to enforce authorization decisions.
If I wanted to verify a user's request with an authorization token, I would make a request to the Keycloak introspection (or userinfo?) API. I'm not completely sure, but I would guess that Keycloak then verifies the info encoded in the JWT with the Keycloak user database.
However, what if I don't want to make a Keycloak request on every API request? This could improve system performance by limiting the amount of HTTP requests. There are mentions of JWT signature validations and reading the scope and user information encoded in the JWT, but I don't see how this guarantees safety. Isn't it possible to just generate any old JWT, encode any information you want and basically spoof an authorization token? How would I check if the user mentioned in the JWT really exists in the Keycloak database?
I think I am missing some crucial part of this technology.
For example, in ASP.NET Core, an API that receives a JWT token will, at startup, download the public signing keys from the token provider, and by default, it will refresh the keys every 24 hours.
So, when the API receives a JWT token, it will do various checks to validate the token, including:
Validating the signature using the provider public signing key
Validate the audience claim (is the token intended for me?)
Validate the expiry date
The API should not need to query anything against the token provider (keycloak) to let the user in.
However, then we have authorization (What the user is allowed to do), which is a different thing, and that all depends on your requirements.
JWT is all about who the user is; if the token is valid, you can trust that information.
With JWT-tokens, your API can work almost offline from the token provider. It can even be 100% disconnected if you copy the public signing key manually into the API.

Should JWT access tokens contain PII?

JWT access tokens shouldn't contain personally identifiable information (PII) as I understand it. This is to keep them small but also if intercepted, reduce the exposure of the information contained.
The OIDC protocol asks for a user info endpoint to be implemented. It can be called using the access token and it will return a bunch of claims about the user. Effectively what the id token contains, but potentially even more information.
So even though the access token doesn't carry this PII itself, if intercepted it can certainly be used to expose all this information anyway. So the argument about PII in the access token doesn't really stand up.
Does this mean I should be fine including email in the access token, because the API might want it in addition to the sub claim?
There are several points to be addressed here:
Not all access tokens must allow access to the userinfo endpoint. First, your system must expose a userinfo endpoint. Secondly, the user must have consented to release information in the userinfo endpoint to the given client. So in case of some access tokens there will be no threat that a malicious party could access the userinfo endpoint. And sometimes the user can consent to only expose their username, so even if you gain access to userinfo you'll still not be able to read the email. (of course it depends on the implementation of the OIDC Provider)
In the majority of cases oauth access tokens are used as bearer tokens. That means that anyone who has the token can access any data which can be accessed with that token. If someone manages to steal that token they can do whatever the original client could. If it is a concern for you, you can use sender constrained tokens instead of bearer tokens (e.g. mTLS constrained tokens or implement DPoP). These tokens are tied to the client which originally requested them. An attacker would have to steal not only the access token, but also a certificate used to verify proof-of-possession. The implementation is a bit more tricky than with bearer tokens, but security is greatly improved.
I would avoid putting any PII in a JWT. JWT can be decoded just like that, and any information kept within can be read by anyone. Lt's say that someone manages to get hold of a JWT issued from your system, but it's expired. They will not be able to access the API, or userinfo, but they can still extract data from the JWT. It's much better to use opaque tokens as access tokens and exchange them in your gateway (something which is called a Phantom Token approach).
Interestingly enough I only recently gave a talk on that concrete subject - using JWTs as access tokens and the Phantom Token flow :) (you can view here if you're interested :) link)

OAuth2.0 what should be the content format of refresh token, before encryption?

I am developing an OAuth2.0 Server. What should be the format of refresh token and what encryption algo should be used for encryption?
OAuth 2.0 does not enforce any restrictions regarding token formats or encryption. Encryption is completely disregarded in the spec, as communication is supposed to be secured with TLS.
Also - don't implement yourself if you don't really have to. Choose an open source library or even a vendor product if your funding permits it.
That being said, take a look at the JWT RFC. That's the format most people use. You can also consider no format at all and just work with opaque strings, and then implement token introspection in your Authorization server.
As to encryption - anything goes. Most implementations out there support at the very least HS256, RS256 and ES256 for signing JWT tokens. In most real world scenarios you don't need to encrypt the token, a signature is enough.
After struggling for sometime, I figured it out. There are two approaches
Either put everything that you need to construct access token (apart from things that can be derived/calculated directly) in refresh token and encrypt it with global or tenant level symmetric key
Benefits:
a. No storage required
b. Better performance as No contact with DB is required. It is just decryption and creation of access token
Issues/shortcomings
a. No way to keep track of refresh tokens issued.
b. No way to invalidate the issued refresh tokens.
Storing attributes related to refresh token in DB and in response of OAuth Server, returning encrypted refresh token ID. Attributes that might be stored in DB are
ID || Expiry || Count of Token || Subject || Client ID || Custom Attributes (If required)
There can be other attributes as well which we implemented but this must give a bit of idea to people who want to implement their own Refresh Token flow

JWT (Json Web Token) Audience "aud" versus Client_Id - What's the difference?

I'm working on implementing OAuth 2.0 JWT access_token in my authentication server. But, I'm not clear on what the differences are between the JWT aud claim and the client_id HTTP header value. Are they the same? If not, can you explain the difference between the two?
My suspicion is that aud should refer to the resource server(s), and the client_id should refer to one of the client applications recognized by the authentication server (i.e. web app, or iOS app).
In my current case, my resource server is also my web app client.
As it turns out, my suspicions were right. The audience aud claim in a JWT is meant to refer to the Resource Servers that should accept the token.
As this post simply puts it:
The audience of a token is the intended recipient of the token.
The audience value is a string -- typically, the base address of the
resource being accessed, such as https://contoso.com.
The client_id in OAuth refers to the client application that will be requesting resources from the Resource Server.
The Client app (e.g. your iOS app) will request a JWT from your Authentication Server. In doing so, it passes it's client_id and client_secret along with any user credentials that may be required. The Authorization Server validates the client using the client_id and client_secret and returns a JWT.
The JWT will contain an aud claim that specifies which Resource Servers the JWT is valid for. If the aud contains www.myfunwebapp.com, but the client app tries to use the JWT on www.supersecretwebapp.com, then access will be denied because that Resource Server will see that the JWT was not meant for it.
The JWT aud (Audience) Claim
According to RFC 7519:
The "aud" (audience) claim identifies the recipients that the JWT is
intended for. Each principal intended to process the JWT MUST
identify itself with a value in the audience claim. If the principal
processing the claim does not identify itself with a value in the
"aud" claim when this claim is present, then the JWT MUST be
rejected. In the general case, the "aud" value is an array of case-
sensitive strings, each containing a StringOrURI value. In the
special case when the JWT has one audience, the "aud" value MAY be a
single case-sensitive string containing a StringOrURI value. The
interpretation of audience values is generally application specific.
Use of this claim is OPTIONAL.
The Audience (aud) claim as defined by the spec is generic, and is application specific. The intended use is to identify intended recipients of the token. What a recipient means is application specific. An audience value is either a list of strings, or it can be a single string if there is only one aud claim. The creator of the token does not enforce that aud is validated correctly, the responsibility is the recipient's to determine whether the token should be used.
Whatever the value is, when a recipient is validating the JWT and it wishes to validate that the token was intended to be used for its purposes, it MUST determine what value in aud identifies itself, and the token should only validate if the recipient's declared ID is present in the aud claim. It does not matter if this is a URL or some other application specific string. For example, if my system decides to identify itself in aud with the string: api3.app.com, then it should only accept the JWT if the aud claim contains api3.app.com in its list of audience values.
Of course, recipients may choose to disregard aud, so this is only useful if a recipient would like positive validation that the token was created for it specifically.
My interpretation based on the specification is that the aud claim is useful to create purpose-built JWTs that are only valid for certain purposes. For one system, this may mean you would like a token to be valid for some features but not for others. You could issue tokens that are restricted to only a certain "audience", while still using the same keys and validation algorithm.
Since in the typical case a JWT is generated by a trusted service, and used by other trusted systems (systems which do not want to use invalid tokens), these systems simply need to coordinate the values they will be using.
Of course, aud is completely optional and can be ignored if your use case doesn't warrant it. If you don't want to restrict tokens to being used by specific audiences, or none of your systems actually will validate the aud token, then it is useless.
Example: Access vs. Refresh Tokens
One contrived (yet simple) example I can think of is perhaps we want to use JWTs for access and refresh tokens without having to implement separate encryption keys and algorithms, but simply want to ensure that access tokens will not validate as refresh tokens, or vice-versa.
By using aud, we can specify a claim of refresh for refresh tokens and a claim of access for access tokens upon creating these tokens. When a request is made to get a new access token from a refresh token, we need to validate that the refresh token was a genuine refresh token. The aud validation as described above will tell us whether the token was actually a valid refresh token by looking specifically for a claim of refresh in aud.
OAuth Client ID vs. JWT aud Claim
The OAuth Client ID is completely unrelated, and has no direct correlation to JWT aud claims. From the perspective of OAuth, the tokens are opaque objects.
The application which accepts these tokens is responsible for parsing and validating the meaning of these tokens. I don't see much value in specifying OAuth Client ID within a JWT aud claim.
If you came here searching OpenID Connect (OIDC): OAuth 2.0 != OIDC
I recognize that this is tagged for oauth 2.0 and NOT OIDC, however there is frequently a conflation between the 2 standards since both standards can use JWTs and the aud claim. And one (OIDC) is basically an extension of the other (OAUTH 2.0). (I stumbled across this question looking for OIDC myself.)
OAuth 2.0 Access Tokens##
For OAuth 2.0 Access tokens, existing answers pretty well cover it. Additionally here is one relevant section from OAuth 2.0 Framework (RFC 6749)
For public clients using implicit flows, this specification does not
provide any method for the client to determine what client an access
token was issued to.
...
Authenticating resource owners to clients is out of scope for this
specification. Any specification that uses the authorization process
as a form of delegated end-user authentication to the client (e.g.,
third-party sign-in service) MUST NOT use the implicit flow without
additional security mechanisms that would enable the client to
determine if the access token was issued for its use (e.g., audience-
restricting the access token).
OIDC ID Tokens##
OIDC has ID Tokens in addition to Access tokens. The OIDC spec is explicit on the use of the aud claim in ID Tokens. (openid-connect-core-1.0)
aud
REQUIRED. Audience(s) that this ID Token is intended for. It MUST contain the OAuth 2.0 client_id of the Relying Party as an audience value. It MAY also contain identifiers for other audiences. In the general case, the aud value is an array of case sensitive strings. In the common special case when there is one audience, the aud value MAY be a single case sensitive string.
furthermore OIDC specifies the azp claim that is used in conjunction with aud when aud has more than one value.
azp
OPTIONAL. Authorized party - the party to which the ID Token was issued. If present, it MUST contain the OAuth 2.0 Client ID of this party. This Claim is only needed when the ID Token has a single audience value and that audience is different than the authorized party. It MAY be included even when the authorized party is the same as the sole audience. The azp value is a case sensitive string containing a StringOrURI value.
Though this is old, I think question is valid even today
My suspicion is that aud should refer to the resource server(s), and
the client_id should refer to one of the client applications
recognized by the authentication server
Yes, aud should refer to token consuming party. And client_id refers to token obtaining party.
In my current case, my resource server is also my web app client.
In the OP's scenario, web app and resource server both belongs to same party. So this means client and audience to be same. But there can be situations where this is not the case.
Think about a SPA which consume an OAuth protected resource. In this scenario SPA is the client. Protected resource is the audience of access token.
This second scenario is interesting. RFC8707 "Resource Indicators for OAuth 2.0" explains where you can define the intended audience in your authorization request using resource parameter. So the resulting token will restricted to the specified audience. Also, Azure OIDC use a similar approach where it allows resource registration and allow auth request to contain resource parameter to define access token intended audience. Such mechanisms allow OAuth adpotations to have a separation between client and token consuming (audience) party.

What is the OAuth 2.0 Bearer Token exactly?

According to RFC6750-The OAuth 2.0 Authorization Framework: Bearer Token Usage, the bearer token is:
A security token with the property that any party in possession of the token (a "bearer") can use the token in any way that any other party in possession of it can.
To me this definition is vague and I can't find any specification.
Suppose I am implementing an authorization provider, can I supply any kind of string for the bearer token?
Can it be a random string?
Does it have to be a base64 encoding of some attributes?
Should it be hashed?
And does the service provider need to query the authorization provider in order to validate this token?
Bearer Token
A security token with the property that any party in possession of
the token (a "bearer") can use the token in any way that any other
party in possession of it can. Using a bearer token does not
require a bearer to prove possession of cryptographic key material
(proof-of-possession).
The Bearer Token is created for you by the Authentication server. When a user authenticates your application (client) the authentication server then goes and generates for you a Token. Bearer Tokens are the predominant type of access token used with OAuth 2.0. A Bearer token basically says "Give the bearer of this token access".
The Bearer Token is normally some kind of opaque value created by the authentication server. It isn't random; it is created based upon the user giving you access and the client your application getting access.
In order to access an API for example you need to use an Access Token. Access tokens are short lived (around an hour). You use the bearer token to get a new Access token. To get an access token you send the Authentication server this bearer token along with your client id. This way the server knows that the application using the bearer token is the same application that the bearer token was created for. Example: I can't just take a bearer token created for your application and use it with my application it wont work because it wasn't generated for me.
Google Refresh token looks something like this: 1/mZ1edKKACtPAb7zGlwSzvs72PvhAbGmB8K1ZrGxpcNM
copied from comment: I don't think there are any restrictions on the bearer tokens you supply. Only thing I can think of is that its nice to allow more than one. For example a user can authenticate the application up to 30 times and the old bearer tokens will still work. oh and if one hasn't been used for say 6 months I would remove it from your system. It's your authentication server that will have to generate them and validate them so how it's formatted is up to you.
Update:
A Bearer Token is set in the Authorization header of every Inline Action HTTP Request. For example:
POST /rsvp?eventId=123 HTTP/1.1
Host: events-organizer.com
Authorization: Bearer AbCdEf123456
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/1.0 (KHTML, like Gecko; Gmail Actions)
rsvpStatus=YES
The string "AbCdEf123456" in the example above is the bearer authorization token. This is a cryptographic token produced by the authentication server. All bearer tokens sent with actions have the issue field, with the audience field specifying the sender domain as a URL of the form https://. For example, if the email is from noreply#example.com, the audience is https://example.com.
If using bearer tokens, verify that the request is coming from the authentication server and is intended for the the sender domain. If the token doesn't verify, the service should respond to the request with an HTTP response code 401 (Unauthorized).
Bearer Tokens are part of the OAuth V2 standard and widely adopted by many APIs.
As I read your question, I have tried without success to search on the Internet how Bearer tokens are encrypted or signed. I guess bearer tokens are not hashed (maybe partially, but not completely) because in that case, it will not be possible to decrypt it and retrieve users properties from it.
But your question seems to be trying to find answers on Bearer token functionality:
Suppose I am implementing an authorization provider, can I supply any
kind of string for the bearer token? Can it be a random string? Does
it has to be a base64 encoding of some attributes? Should it be
hashed?
So, I'll try to explain how Bearer tokens and Refresh tokens work:
When user requests to the server for a token sending user and password through SSL, the server returns two things: an Access token and a Refresh token.
An Access token is a Bearer token that you will have to add in all request headers to be authenticated as a concrete user.
Authorization: Bearer <access_token>
An Access token is an encrypted string with all User properties, Claims and Roles that you wish. (You can check that the size of a token increases if you add more roles or claims).
Once the Resource Server receives an access token, it will be able to decrypt it and read these user properties. This way, the user will be validated and granted along with all the application.
Access tokens have a short expiration (ie. 30 minutes).
If access tokens had a long expiration it would be a problem, because theoretically there is no possibility to revoke it. So imagine a user with a role="Admin" that changes to "User". If a user keeps the old token with role="Admin" he will be able to access till the token expiration with Admin rights.
That's why access tokens have a short expiration.
But, one issue comes in mind. If an access token has short expiration, we have to send every short period the user and password. Is this secure? No, it isn't. We should avoid it. That's when Refresh tokens appear to solve this problem.
Refresh tokens are stored in DB and will have long expiration (example: 1 month).
A user can get a new Access token (when it expires, every 30 minutes for example) using a refresh token, that the user had received in the first request for a token.
When an access token expires, the client must send a refresh token. If this refresh token exists in DB, the server will return to the client a new access token and another refresh token (and will replace the old refresh token by the new one).
In case a user Access token has been compromised, the refresh token of that user must be deleted from DB. This way the token will be valid only till the access token expires because when the hacker tries to get a new access token sending the refresh token, this action will be denied.
Bearer token is one or more repetition of alphabet, digit, "-" , "." , "_" , "~" , "+" , "/" followed by 0 or more "=".
RFC 6750 2.1. Authorization Request Header Field (Format is ABNF (Augmented BNF))
The syntax for Bearer credentials is as follows:
b64token = 1*( ALPHA / DIGIT /
"-" / "." / "_" / "~" / "+" / "/" ) *"="
credentials = "Bearer" 1*SP b64token
It looks like Base64 but according to Should the token in the header be base64 encoded?, it is not.
Digging a bit deeper in to "HTTP/1.1, part 7: Authentication"**,
however, I see that b64token is just an ABNF syntax definition
allowing for characters typically used in base64, base64url, etc.. So
the b64token doesn't define any encoding or decoding but rather just
defines what characters can be used in the part of the Authorization
header that will contain the access token.
This fully addresses the first 3 items in the OP question's list. So I'm extending this answer to address the 4th question, about whether the token must be validated, so #mon feel free to remove or edit:
The authorizer is responsible for accepting or rejecting the http request. If the authorizer says the token is valid, it's up to you to decide what this means:
Does the authorizer have a way of inspecting the URL, identifying the operation, and looking up some role-based access control database to see if it is allowed? If yes and the request comes through, the service can assume it is allowed, and does not need to verify.
Is the token an all-or-nothing, so if the token is correct, all operations are allowed? Then the service doesn't need to verify.
Does the token mean "this request is allowed, but here is the UUID for the role, you check whether the operation is allowed". Then it's up to the service to look up that role, and see if the operation is allowed.
References
RFC 5234 3.6. Variable Repetition: *Rule
RFC 2616 2.1 Augmented BNF
Please read the example in rfc6749 sec 7.1 first.
The bearer token is a type of access token, which does NOT require PoP(proof-of-possession) mechanism.
PoP means kind of multi-factor authentication to make access token more secure. ref
Proof-of-Possession refers to Cryptographic methods that mitigate the risk of Security Tokens being stolen and used by an attacker. In contrast to 'Bearer Tokens', where mere possession of the Security Token allows the attacker to use it, a PoP Security Token cannot be so easily used - the attacker MUST have both the token itself and access to some key associated with the token (which is why they are sometimes referred to 'Holder-of-Key' (HoK) tokens).
Maybe it's not the case, but I would say,
access token = payment methods
bearer token = cash
access token with PoP mechanism = credit card (signature or password will be verified, sometimes need to show your ID to match the name on the card)
BTW, there's a draft of "OAuth 2.0 Proof-of-Possession (PoP) Security Architecture" now.
A bearer token is like a currency note e.g 100$ bill . One can use the currency note without being asked any/many questions.
Bearer Token A security token with the property that any party in
possession of the token (a "bearer") can use the token in any way that
any other party in possession of it can. Using a bearer token does not
require a bearer to prove possession of cryptographic key material
(proof-of-possession).
The bearer token is a b64token string, with the requirement that if you have it, you can use it. There are no guarantees as to what the meaning of that string actually is in the specification beyond that. It is up to the implementation.
5.2. Threat Mitigation
This document does not specify the encoding or the contents of the
token; hence, detailed recommendations about the means of
guaranteeing token integrity protection are outside the scope of this
document. The token integrity protection MUST be sufficient to
prevent the token from being modified.
https://datatracker.ietf.org/doc/html/rfc6750#section-5.2
While the token could be random each time it is issued, the downside is the server side would need to keep track of the tokens data (e.g. expiration). A JSON Web Token (JWT) is often used as a bearer token, because the server can make decisions based on whats inside the token.
JWT:
https://jwt.io/

Resources