Should JWT access tokens contain PII?

As I understand it, JWT access tokens shouldn't contain personally identifiable information (PII). This keeps them small and also limits how much information is exposed if a token is intercepted.
The OIDC protocol asks for a user info endpoint to be implemented. It can be called using the access token and it will return a bunch of claims about the user. Effectively what the id token contains, but potentially even more information.
So even though the access token doesn't carry this PII itself, if intercepted it can certainly be used to expose all this information anyway. So the argument about PII in the access token doesn't really stand up.
Does this mean I should be fine including email in the access token, because the API might want it in addition to the sub claim?

There are several points to be addressed here:
Not all access tokens must allow access to the userinfo endpoint. First, your system must expose a userinfo endpoint. Secondly, the user must have consented to release information through the userinfo endpoint to the given client. So for some access tokens there is no threat that a malicious party could access the userinfo endpoint. And sometimes the user consents to expose only their username, so even if you gain access to userinfo you still won't be able to read the email (of course, this depends on the implementation of the OIDC Provider).
In the majority of cases oauth access tokens are used as bearer tokens. That means that anyone who has the token can access any data which can be accessed with that token. If someone manages to steal that token they can do whatever the original client could. If it is a concern for you, you can use sender constrained tokens instead of bearer tokens (e.g. mTLS constrained tokens or implement DPoP). These tokens are tied to the client which originally requested them. An attacker would have to steal not only the access token, but also a certificate used to verify proof-of-possession. The implementation is a bit more tricky than with bearer tokens, but security is greatly improved.
I would avoid putting any PII in a JWT. A JWT can be decoded just like that, without any key, and any information kept within can be read by anyone who holds it. Let's say that someone manages to get hold of a JWT issued by your system, but it has expired. They will not be able to access the API or userinfo, but they can still extract data from the JWT. It's much better to use opaque tokens as access tokens and exchange them in your gateway (this is called the Phantom Token approach).
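To make the third point concrete, here is a minimal sketch in Python (standard library only) showing that the claims of an expired JWT remain readable without the signing key. The key, the claim values and the helper names are made up for illustration; a real system would use a JWT library rather than these hand-rolled helpers.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(raw):
    # JWTs use base64url without padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    # Restore the padding that JWTs strip before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_jwt(claims, key):
    # Build an HS256-signed token: header.payload.signature
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(segments)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def read_claims_without_key(token):
    # No key and no validity check needed: the payload is only encoded.
    return json.loads(b64url_decode(token.split(".")[1]))


# Hypothetical token that expired an hour ago.
expired_token = make_jwt(
    {"sub": "user-123", "email": "alice@example.com", "exp": int(time.time()) - 3600},
    key=b"server-side-secret",
)
leaked = read_claims_without_key(expired_token)
print(leaked["email"])
```

The API would reject this token, but whoever holds it can still read the email address.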
Interestingly enough, I only recently gave a talk on exactly that subject: using JWTs as access tokens and the Phantom Token flow :)
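For the sender-constrained tokens mentioned in the second point, the mTLS variant binds the token to a certificate through a cnf claim whose x5t#S256 member is the base64url-encoded SHA-256 hash of the client's DER certificate. A rough Python sketch of the check a resource server could do is below. The certificate bytes are placeholders, the function names are made up, and DPoP is not covered here.

```python
import base64
import hashlib
import hmac


def cert_thumbprint(cert_der):
    # base64url(SHA-256(DER certificate)) without padding, as used by x5t#S256.
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def is_bound_to_presenter(claims, presented_cert_der):
    # Compare the thumbprint in the token with the certificate used on the TLS connection.
    expected = claims.get("cnf", {}).get("x5t#S256")
    if not expected:
        # Assumption: this API only accepts sender-constrained tokens.
        return False
    return hmac.compare_digest(expected, cert_thumbprint(presented_cert_der))


# Placeholder bytes standing in for real DER-encoded certificates.
client_cert = b"placeholder for the legitimate client's DER certificate"
attacker_cert = b"placeholder for some other certificate"

# Claims of a token that was issued to the legitimate client.
claims = {"sub": "user-123", "cnf": {"x5t#S256": cert_thumbprint(client_cert)}}

print(is_bound_to_presenter(claims, client_cert))
print(is_bound_to_presenter(claims, attacker_cert))
```

A stolen token alone is not enough here: the attacker would also need the private key belonging to the bound certificate to complete the TLS handshake.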

Related

Why doesn't Access Token have the identity of user?

I am struggling with the concept of access tokens, specifically that they do not carry the identity of the person making the request.
And I think I may be using access tokens incorrectly in my project.
The thing I don't understand is access tokens are often explained as a hotel card key to get into a room, but this analogy just doesn't make sense to me when it comes to the vast majority of requests.
For example, say I have a resource for your private messages and the scope read:privateMessages.
If you want to do some action like reading private messages and you send me an access token, I have to know who you are, because I can't send you back someone else's private messages, or all private messages.
The hotel key doesn't work in this case. With a hotel card that has the scopes read:gymroom write:gymroom, then yeah, you can access the room, but in any application I have worked on I really need to know who is making the request.
To go back to private messages: if you post to private messages, which maybe takes a Message and a Destination, I need to know who is sending the private message, and I need that from the access token...
The analogy here also breaks a bit as the sub of the access token (if access token is jwt) is usually the user id and I look that up or call the user info endpoint.
But isn't it simpler to just put the user principal name or email in the access token and save the backend from doing the lookup request on each user request?
There is no security benefit that I can see from removing the email / username from the token if the same token can be used to call the user info endpoint.
Can I get some help with understanding this concept better as I feel maybe I am doing things incorrectly or have some kind of large misunderstanding on this concept.
I do know there are questions on this on stack overflow but I haven't found anything specifically to clear this up.

The analogy here also breaks a bit as the sub of the access token (if access token is jwt) is usually the user id and I look that up or call the user info endpoint.
But isn't it simpler to just put the user principal name or email in the access token and save the backend from doing the lookup request on each user request?
Yes, you can. Speaking specifically about JWT access tokens, the primary intent behind them is stateless authentication, which means the token should contain sufficient information for the resource server to authenticate the request without a datastore lookup. Plain OAuth access tokens are merely random keys, and the resource server needs the authorization server to verify them. However, you can extend an OAuth authorization server with JWT support so that it issues JWT access tokens.
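As a rough illustration of that stateless check, here is a Python sketch (standard library only) in which the resource server verifies an HS256-signed JWT and then uses the sub and scope claims directly, with no lookup at the authorization server. The shared key, the scope name taken from the question and the in-memory inbox are assumptions for the example; production code would normally rely on a maintained JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time

KEY = b"demo-shared-secret"  # hypothetical key shared with the token issuer


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims, key):
    # Issuer side: build header.payload.signature with HS256.
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = b64url_encode(json.dumps(claims).encode("utf-8"))
    mac = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(mac)}"


def verify(token, key):
    # Resource server side: check algorithm, signature and expiry locally.
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return claims


def read_private_messages(token, key, inbox_by_user):
    claims = verify(token, key)
    if "read:privateMessages" not in claims.get("scope", "").split():
        raise PermissionError("missing scope")
    # The sub claim says whose messages to return.
    return inbox_by_user.get(claims["sub"], [])


inboxes = {"user-1": ["hello from bob"], "user-2": ["someone else's mail"]}
token = sign({"sub": "user-1", "scope": "read:privateMessages", "exp": int(time.time()) + 300}, KEY)
print(read_private_messages(token, KEY, inboxes))
```

Note that the sub claim alone is enough to pick the right inbox here; adding an email claim would only be needed if the API actually uses the email.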

The hotel card key is a good analogy for the access token because it deals with delegation. Whoever presents the hotel card key can get into the room. If needed, there can be identity information of the original user (Resource Owner) in the access token, but in that case it does not represent the "presenter's" identity, merely the "owner's" identity.
When you actually want to know who is entering the room (or, presenting the token), you'll need to revert to a different token and protocol such as the Identity Token in OpenID Connect.

I think, the hotel analogy helps to understand the concept of Bearer tokens. Anyone with a valid hotel card key can enter the room. This also means that if an attacker manages to steal a Bearer token (the hotel card key), the attacker can gain access to the API (the hotel room), as long as the token is valid.
In general, the API should use the claims from a JWT for authorization; then you can skip the call to the userinfo endpoint. Consequently, this means that the JWT should contain all the claims that the API needs for proper authorization. However, be careful when designing the token. If the access tokens are JWTs, then a malicious client (e.g. one controlled by an attacker) may still try to parse the JWT and access its claims even though the client is not intended to. Or an attacker may come across the token by other means and parse it. In such cases, the application leaks data to an unauthorized party (an attacker). Data leaks can imply a violation of data protection laws or other regulations and cause trouble depending on what data the token contained. Username and email may not be a big issue regarding compliance, but they can still cause security issues. For example, username and email may both be used by an attacker for personalized attacks (think of phishing emails).
Note, that it doesn't matter if the token is valid or not - the data is still there.
At Curity, we often recommend using the Phantom Token Pattern. It places an API Gateway in front of the API. The Authorization Server only issues by-reference tokens to the client (i.e. some id). When the client calls the API via the API Gateway, the latter exchanges the by-reference token for a by-value token, i.e. a JWT. In this way, no data will be leaked to the client and the API can still benefit from the JWT and its claims.
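A toy in-memory sketch of that flow in Python might look like the following. The class and function names are invented for the example, and to keep it short the gateway hands the API a plain claims dictionary where a real deployment would pass a signed JWT obtained from the Authorization Server.

```python
import secrets


class AuthorizationServer:
    def __init__(self):
        self._by_reference = {}  # opaque handle -> claims kept server-side

    def issue(self, claims):
        # The client only ever receives this random, meaningless handle.
        handle = secrets.token_urlsafe(32)
        self._by_reference[handle] = dict(claims)
        return handle

    def introspect(self, handle):
        # Returns the by-value claims for a known handle, else None.
        return self._by_reference.get(handle)


class Gateway:
    def __init__(self, authorization_server, api):
        self._authorization_server = authorization_server
        self._api = api

    def handle(self, opaque_token):
        # Swap the by-reference token for its claims before calling the API.
        claims = self._authorization_server.introspect(opaque_token)
        if claims is None:
            return 401, None
        return 200, self._api(claims)


def messages_api(claims):
    # The API sees the full claims, just as it would with a JWT.
    return f"messages for {claims['sub']} <{claims['email']}>"


auth_server = AuthorizationServer()
gateway = Gateway(auth_server, messages_api)

opaque = auth_server.issue({"sub": "user-123", "email": "alice@example.com"})
print(gateway.handle(opaque))
print(gateway.handle("made-up-token"))
```

The email never travels to the client: a leaked opaque token reveals nothing by itself and becomes useless once the Authorization Server forgets it.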

Why use openid connect ID token if the access token had all the claims and can be revoked?

I'm using the OAuth2 authorization code flow with the ASP.NET Core 2.2 AddJwtBearer. My token endpoint returns a JWT access token with all the claims needed for checking the user's permissions.
I can send this token as the bearer for any Web API call and the standard .net code can use those claims to check permissions eg [Authorize(Policy="somePolicy")].
One of the claims points at an internal session key that we can revoke.
So my question is why would I need an ID token or even a refresh token?
The claims and other details are in the access token so what would an ID token add to this?
Having to make a further call to a userinfo endpoint seems to be a waste if the info is in the Auth token?
If I can revoke the session that Auth token points at, surely I don't need a refresh token and can have longer life Auth tokens?
I've read lots of examples and comparisons, but most comparisons between plain OAuth2 and OAuth2 enhanced with OpenID Connect seem to use very basic OAuth2 without JWTs etc., and so are written in a way that exaggerates the differences.
So I'm unclear, when both are using the same authorization code flow and JWT tokens, what the real advantages of using the ID token are in my situation?

Given your context, it seems that OpenId Connect is not necessary for your situation. It really adds value when you are implementing single sign-on (SSO). In that case the Identity token can also be used on SSO logout.
Having additional claims about the identity in the access token is also a waste, because all of this information has to be sent on each call, even when you need it only once (an SPA may keep the information in memory). It's better to have some API endpoint expose the information when requested.
About the access token, you can't revoke it. You may be able to revoke authorization, but the access token remains valid until it expires. You want invalid access tokens to short-circuit as soon as possible in the pipeline, before policies are evaluated.
Please note that it's not a common scenario where the api can revoke access by using an internal session key. Most api's are 'session-less' and fully rely on the access token. Because that's the purpose of a JWT, being self-contained, not having to contact the authority to verify the token.
Perhaps you can use a long-lived access token because in your situation the authorization is determined at another level. But are you capable of detecting when the token is compromised? And where are you going to check it? In every api and client? Or would you rather let the authority take care of it (single responsibility)?
When implementing security you should look at the design, the responsibilities, where to do what. Let the authority, that issues the tokens, take care of authentication and client/resource authorization. The Api, being the resource where the business rules (policies) are implemented, can take care of (user) authorization.
The problem with a long-lived token is that when it falls into the wrong hands, it allows access until it expires or, in your case, until you detect that something is wrong. A short-lived token only ever allows access for a short time, which makes it hardly worthwhile for an attacker to obtain one.
With short-lived access tokens you'll have to use refresh tokens. The authority can then verify on each refresh call whether a new access token should be issued. The same caveat applies here: this only helps if you actually verify the request. Tokens in themselves are not safe. You'll have to add some level of security, e.g. checking the IP address. But letting the authority take care of it and using one-time-use refresh tokens already adds security.
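One-time-use refresh tokens can be sketched roughly like this in Python. This is an in-memory toy with invented names and a 5-minute lifetime picked for the example, not how any particular authority implements it.

```python
import secrets
import time


class TokenService:
    ACCESS_TTL_SECONDS = 300  # assumption: 5-minute access tokens

    def __init__(self):
        self._refresh_tokens = {}  # refresh token -> subject

    def _issue_pair(self, subject):
        access = {"sub": subject, "exp": time.time() + self.ACCESS_TTL_SECONDS}
        refresh = secrets.token_urlsafe(32)
        self._refresh_tokens[refresh] = subject
        return access, refresh

    def login(self, subject):
        return self._issue_pair(subject)

    def refresh(self, refresh_token):
        # pop() makes the refresh token single-use: a replay finds nothing.
        subject = self._refresh_tokens.pop(refresh_token, None)
        if subject is None:
            raise PermissionError("unknown or already used refresh token")
        # This is the point where the authority can decide whether to keep issuing tokens.
        return self._issue_pair(subject)


service = TokenService()
access_1, refresh_1 = service.login("user-123")
access_2, refresh_2 = service.refresh(refresh_1)

try:
    service.refresh(refresh_1)  # replaying the old refresh token
except PermissionError as error:
    print("rejected:", error)
```

Because every refresh goes through the authority, that is also the single place where a compromised session can be cut off.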
In my experience with OIDC/OAuth2, the access token is mainly used to grant client applications access to a resource (on behalf of a user), where scope claims define the accessible functionality and the sub claim identifies the user.
Authorization can be implemented on different levels and doesn't have to be part of the access token. In fact, permissions should not be part of the access token at all.
So your setup may be fine. But I wouldn't use long-lived access tokens, for the reasons already mentioned. Plus, they are not manageable: you can't update the access token when something changes in the flow, e.g. when a scope is added.

Why do you need authorization grant when you can just give the token out directly?

Watching this video, it details in OAuth2 that the client application first has to get the authorization grant from the Authorization server and then use that grant to get a token before being able to access the resource server. What purpose does the grant serve? Why not give the client the token right away after the user signs on with his/her username and password?

Because it is more secure, for some application types.
What you describe is the so-called authorization code flow. It is normally used for "classical" web applications, where only the backend needs to access the resource server. The exchange of the authorization code for an access token happens on the backend, and the access token never leaves it. The exchange can be done only once, and in addition the client id and secret (stored on the backend) are necessary.
Single-Page-Applications often use implicit-flow where access token is delivered to the frontend directly in the URL.
See more here:
IdentityServer Flows
EDIT: Q: "I still don't see how it is more secure given that you have to have the grant in order to get the token. Why need 2 things instead of just 1 thing to access the resource? If someone steals the token, they can access the resource anyway – stackjlei"
"Stealing" an access token works regardless of how your application acquires it. However, stealing an access token on the backend is much more difficult than on the frontend.
The authorization code is also delivered to the backend via the frontend, but the risk that someone intercepts and uses it is tiny:
It can be exchanged only once.
You need client-id and client-secret in order to exchange it. Client-secret is only available on the backend.
Normally, the authorization code will be exchanged by your backend for an access token immediately, so its lifetime is just several seconds. It does not matter if someone gets hold of a used authorization code afterwards.
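The properties listed above (single use, client secret required, short lifetime) can be sketched with a toy in-memory authorization server in Python. All names, the 60-second lifetime and the shape of the returned token are assumptions for illustration only.

```python
import hmac
import secrets
import time


class AuthorizationServer:
    CODE_TTL_SECONDS = 60  # assumption: codes are valid for one minute

    def __init__(self, clients):
        self._clients = clients  # client_id -> client_secret
        self._codes = {}         # code -> (client_id, subject, expires_at)

    def authorize(self, client_id, subject):
        # Front channel: after the user logs in, a code travels via the browser.
        code = secrets.token_urlsafe(16)
        self._codes[code] = (client_id, subject, time.time() + self.CODE_TTL_SECONDS)
        return code

    def exchange(self, code, client_id, client_secret):
        # Back channel: only a caller that knows the client secret gets a token.
        expected = self._clients.get(client_id)
        if expected is None or not hmac.compare_digest(expected.encode(), client_secret.encode()):
            raise PermissionError("client authentication failed")
        entry = self._codes.pop(code, None)  # pop() makes the code single-use
        if entry is None:
            raise PermissionError("unknown or already used code")
        issued_to, subject, expires_at = entry
        if issued_to != client_id or time.time() > expires_at:
            raise PermissionError("code expired or issued to another client")
        return {"access_token": secrets.token_urlsafe(32), "sub": subject}


server = AuthorizationServer({"web-app": "backend-only-secret"})
code = server.authorize("web-app", "user-123")
tokens = server.exchange(code, "web-app", "backend-only-secret")
print(tokens["sub"])

try:
    server.exchange(code, "web-app", "backend-only-secret")  # replay of a used code
except PermissionError as error:
    print("rejected:", error)
```

Someone who only sniffs the code from the browser has neither the client secret nor an unused code once the backend has done its exchange.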

In your scenario there could be two servers, an Authorization and a Resource one.
It could be only one as well, but let's imagine this scenario.
The purpose of the Authorization Server is to issue short-lived access tokens to known clients. The clients identify themselves via their ClientID and ClientSecret.
The Authorization Server ( AS ) holds the list of clients and their secrets and first checks to make sure the passed values match its list. If they do, it issues a short lived token.
Then the client can talk to the Resource Server ( RS ), while the token is valid. Once the token expires, a new one can be requested or the expired one can be refreshed if that is allowed by the Authorization Server.
The whole point here is security. Normally, the access tokens are passed in the Authorization header of the request, and that request needs to go over https to make sure that the data can't be stolen. If, somehow, someone gets hold of an access token, they can only use it until it expires, which is why the short life of the tokens is actually very important. That's why you don't issue one token which never expires.

OAuth has several grant types, and one of them doesn't require the 'grant' authorization step. Which one fits depends on who the user/application, the resource owner and the server API are.
With the authorization grant, you, as a user, don't send your password to the application. The application will only use the grant token to gain access to your resources.
I think this tutorial is a pretty good read if you want more details:
https://www.digitalocean.com/community/tutorials/an-introduction-to-oauth-2

Should I send the Secret with the Refresh Token in OAuth 2.0

I'm working on implementing an OAuth 2.0 server, and while reading the RFC 6749 specification I noticed that section 6 (page 47), "Refreshing an Access Token", explains that we just need to use the Refresh Token that we have to get a new Token.
But, for example, Google requires the User ID and the Secret in addition to the Refresh Token to do so.
This confuses me, because on one hand we have Google that is processing high volume of requests every day, and we have a specification written probably with a smaller scope in mind.
Is it good to send the Secret every hour with the Refresh Token?
Personally I believe no: because the User ID and Secret should be used only to go over the whole OAuth 2.0 process.
Basically
You use the token on each request to prove that you are who you are.
The Refresh Token gets used only once an hour (and is potentially changed at each refresh).
Secret and User ID go to the internet as rarely as possible. Only when option 1 and 2 get compromised.
I personally believe that sending the Secret with the Refresh token is less secure. But maybe I'm missing something.
If you have another point of view, please share it :)

I might be missing something, but what Google requires and what's also specified by OAuth2 is that when refreshing a token from a confidential client application the client must authenticate itself.
The most common type of credentials being used for confidential clients are a client identifier alongside a client secret. This information is issued to a client application and is unrelated to the end-user.
By requiring client authentication the authorization server can be sure the request comes from a specific client and adjust its response accordingly. For example, an authorization server can decide that certain permissions - scopes - can only be requested from confidential clients.
The argument around reducing the number of times the client secret needs to be sent over the wire is a non-issue. OAuth2 mandates that the communication happens over TLS and if you have issues with sending the secret then you would also have issues with sending bearer access tokens.
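For reference, a refresh request from a confidential client could be assembled roughly as in the Python sketch below: the refresh_token grant parameters go in the form body and the client credentials go in an HTTP Basic Authorization header. The function name and the credential values are placeholders, and nothing is sent over the network here.

```python
import base64
from urllib.parse import quote_plus, urlencode


def build_refresh_request(client_id, client_secret, refresh_token):
    # Client authentication: form-encode id and secret, join with ":", then base64.
    credentials = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials.encode("ascii")).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Grant parameters for refreshing an access token.
    body = urlencode({"grant_type": "refresh_token", "refresh_token": refresh_token})
    return headers, body


# Placeholder values; real ones are issued by the authorization server.
headers, body = build_refresh_request("my-client", "s3cret", "rt-123")
print(headers["Authorization"])
print(body)
```

Both the secret and the refresh token travel in the same TLS-protected request, which is why sending the secret does not add a new exposure.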
In conclusion, although sometimes doing things exactly according to spec without questioning the overall context might lead to vulnerabilities:
... some libraries treated tokens signed with the none algorithm as a valid token with a verified signature. The result? Anyone can create their own "signed" tokens with whatever payload they want, allowing arbitrary account access on some systems.
(source: Critical vulnerabilities in JSON Web Token libraries)
Some libraries treated the none algorithm according to spec, but ignored the usage context; if the developer passed a key to verify a signature it most likely did not want to treat unsigned tokens as valid.
However, passing the secret on the refresh token request is not one of these situations so don't worry about it.
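To illustrate the none algorithm issue quoted above, here is a small Python sketch contrasting a verifier that trusts the algorithm named in the token header with one that pins the algorithm it expects. Both functions are written for this example and do not reproduce the code of any actual library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def strict_verify(token, key):
    # The caller passed a key, so only the keyed algorithm is acceptable.
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("algorithm not allowed")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))


def naive_verify(token, key):
    # Flawed on purpose: lets the token itself choose how it is verified.
    header_b64, payload_b64, _signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") == "none":
        return json.loads(b64url_decode(payload_b64))  # accepted with no check at all
    return strict_verify(token, key)


# An attacker-made token: header says "none", the signature segment is empty.
forged = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode("utf-8")),
    b64url_encode(json.dumps({"sub": "admin"}).encode("utf-8")),
    "",
])

print(naive_verify(forged, b"server-key"))
try:
    strict_verify(forged, b"server-key")
except ValueError as error:
    print("rejected:", error)
```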

Why must we "change temporary credentials for token credentials" in OAuth?

Can't the server just "upgrade" the temporary credentials to token credentials and retain the same key and secret?
The client can then start making authenticated calls right away after receiving the callback from the server stating that the temporary credentials have been "upgraded".
Of course, if the temporary credentials have not been upgraded (i.e. the client doesn't wait for the callback), the authenticated call fails.
So the question is why make an extra call to the server after the callback to "exchange" temporary credentials for token credentials?

You could implement OAuth in that way, but as I understand it, separating Request Tokens from Access Tokens does provide an extra layer of security.
From the Beginner's Guide:
OAuth includes two kind of Tokens: Request Token and Access Token. Each Token has a very specific role in the OAuth delegation workflow. While mostly an artifact of how the OAuth specification evolved, the two-Token design offers some usability and security features which made it worthwhile to stay in the specification. OAuth operates on two channels: a front-channel which is used to engage the User and request authorization, and a back-channel used by the Consumer to directly interact with the Service Provider. By limiting the Access Token to the back-channel, the Token itself remains concealed from the User. This allows the Access Token to carry special meanings and to have a larger size than the front-channel Request Token which is exposed to the User when requesting authorization, and in some cases needs to be manually entered (mobile device or set-top box).
So, as I understand it, by limiting the Access Token to a channel directly between the consumer (your service) and the provider (the service you're gaining access to), you can obtain a secure Access Token (that is, one the attacker doesn't have) even if the user's machine or the user's network connection to your service is compromised. If the Request Token were simply upgraded, then anyone sniffing the user's network connection could easily obtain the Request/Access Token, which we'd prefer to keep secret since it can be used (with your consumer token, of course), potentially for a very long time, to access the user's data. A server-to-server connection is often more secure.
Also, as is pointed out above, this lets you have a much longer key in cases where the Request Token actually has to be typed out by the user (and so is probably very short).
