How is authorization code different from the access token? - oauth-2.0

In OAuth 2.0 flows the authorization server sends the authorization code to the redirect endpoint and then the webpage has to hit the server again to get a separate access token to query the protected API with.
Why do there have to be two tokens? Specifically could someone provide example(s) of security attacks/vulnerabilities that emerge without this design.
There is this post Facebook OAuth 2.0 "code" and "token" but it doesn't really fully explain the reasoning behind the design.

One (the authorization code) is exchanged in the frontchannel, the other (the access token) in the backchannel. The end goal is obtaining the access token. Since the frontchannel is inherently more insecure it makes sense to send a very short-lived one-time-usage-only temporary credential (i.e the authorization code) in the front channel that the web server can use to obtain the longer-lived repeatedly-usable access token in the backchannel. That backchannel call would also allow for the web server (or: Client) to authenticate itself to the Authorization Server to increase the assurance about dealing with the right party.

Related

OAuth authorization code flow security question (authorization code intercepted by a hacker)

Something I can't wrap my head around.
As I understand the authorization code flow is supposed to be more secured than the implicit flow, because the tokens are not directly sent to the client from the authorization server, but rather retrieved by your backend.
So the flow is basically:
Browser gets the authorization code (as a URL parameter of sort).
Sends it to a public backend endpoint.
The backend sends the code + client secret to the authorization server, retrieves the tokens and stores them in the client's cookie/local storage for further use.
In this flow all the tutorials describe the authorization code as useless to the hacker, why is that? Can't a hacker use Postman or some other client and access your (public) API directly, make it go through step 3 and thus retrieve the tokens just the same?
What am I missing here?
The code is used exactly once. In many scenarios that an attacker might get access to the code, it's already been exchanged for an access token and therefore useless.
The authorization_code is a one-time token.
Authorization Code aka auth code is used publicly so that the client can establish a secure back channel between him and the authorization server so that he can exchange it with the access token without the use of a browser.
The auth code is public and can be intercepted via a proxy since it appears in the query of the redirect_uri and is used via the browser (which is considered insecure). The access token depends on the auth_code (public) and the client_secret (private) for the exchange. Without the client_secret an attacker can get the access token with brute-forcing this way through.
Summary: even if the attacker knows the authcode he can do anything without the client_secret given to the client at registration (or dynamically) and assumed to be secured.

When would I use a Hybrid flow with response_type=code id_token token in OpenID Connect?

I have been reading about OpenId Connect and their flows that are implicit flow, authorization code flow and hybrid flow.
I know that for example, the implicit flow is kind of insecure and should be used just in public clients like SPA application.
Now I´m trying to understand the Hybrid Flow that can be used for non-public applications like .Net MVC applications where you have a backchannel communication and thus you can save a secret password.
Reading about the Hybrid flow I know that it has 3 different types of response_type that can be:
code id_token
code token
code id_token token
For me, the best response_type would be code id_token where I can get the code in the front channel and then send that code to the Identity Server Provider and get the access token through the backchannel.
I've been searching for information on a real-world application of response_type=code id_token token, or code token, but other than reading that in these flows the first token/s are issued by the authorization endpoint which is the front channel and the final tokens that are issued by exchanging the authorization code are issued at the token endpoint, which is the backchannel and thus inherently accepted as being more secure, I fail to understand what you would use this for. Any information would be gladly accepted.
Why hybrid flow? The oft-documented rationale is that your app can immediately have info about the user via the id_token while the access token acquisition is still in flight. Technically this is true but it's still rarely used in the wild.
One real-world example is a Financial-grade API (FAPI) profile developed by a working group under the umbrella of OpenID Foundation. It recommends hybrid flow for security reasons. It's worth noting that the channel split "feature" of the flow is not enough on its own to provide the desired security properties, more "cooperation" from other moving parts is needed. From FAPI implementer's draft part 2:
This profile describes security provisions for the server and client
that are appropriate for Financial-grade APIs by defining the measures
to mitigate:
attacks that leverage the weak binding of endpoints in [RFC6749] (e.g. malicious endpoint attacks, IdP mix-up attacks),
attacks that modify authorization requests and responses unprotected in [RFC6749] by leveraging OpenID Connect's Hybrid Flow that returns
an ID Token in the authorization response.
and details
8.3.3 Identity provider (IdP) mix-up attack
In this attack, the client has
registered multiple IdPs and one of them is a rogue IdP that returns
the same client_id that belongs to one of the honest IdPs. When a user
clicks on a malicious link or visits a compromised site, an
authorization request is sent to the rogue IdP. The rogue IdP then
redirects the client to the honest IdP that has the same client_id. If
the user is already logged on at the honest IdP, then the
authentication may be skipped and a code is generated and returned to
the client. Since the client was interacting with the rogue IdP, the
code is sent to the rogue IdP's token endpoint. At the point, the
attacker has a valid code that can be exchanged for an access token at
the honest IdP.
This is mitigated by the use of OpenID Connect Hybrid Flow in which
the honest IdP's issuer identifier is included as the value of iss.
The client then sends the code to the token endpoint that is
associated with the issuer identifier thus it will not get to the
attacker.
8.4.3. Authorization response parameter injection attack
This attack occurs when the victim and attacker use the same relying party client.
The attacker is somehow able to capture the authorization code and
state from the victim's authorization response and uses them in his
own authorization response.
This can be mitigated by using OpenID Connect Hybrid Flow where the
c_hash, at_hash, and s_hash can be used to verify the validity of the
authorization code, access token, and state parameters. The server can
verify that the state is the same as what was stored in the browser
session at the time of the authorization request.
For a more technical description of these two attacks and countermeasures, see Single Sign-On Security – An Evaluation of OpenID Connect
For a realllly detailed description, take a look at OIDC Security Analysis paper.
The hybrid flow allows the backend to continue to act on the user’s behalf in an offline way (when the user is no longer present sending requests by browser) or independently of the frontend... doing other stuff in parallel. It can use the back channel exchanged refresh token to continue to get a new access token and work indefinitely.

Why does authorization grant flow skip the authorization code just return an access token?

I'm learning about O Auth 2 from here
I was wondering in the step of "Authorization server redirects user agent to client with authorization code", why doesn't the server just give the access token instead? Why give an authorization code that then is used to get the access token? Why not just give the access token directly? Is it because there there is a different access token for each resource so that you need to go through O Auth again to access a different resource?
The authorization grant code can pass through unsecured or potentially risky environments such as basic HTTP connection (not HTTPS) or a browser. But it's worthless without a client secret. The client can be a backend application. If the OAuth2 server returned a token, it could get compromised.
There is another OAuth2 flow - the Implicit flow, which returns an access token right after the authentication, but it's designed mainly for JavaScript applications or other deployments where it's safe to use it.
If a malicious app gets hold of the client id of your app(which is easily available, for example one can inspect the source), then it can use that to retrieve the token without the use of the client secret. All the malicious app needs to do is to somehow either specify the redirect URI to itself or to tap into the registered redirect URI.
That is the reason for breaking the flow as such. Note, when the client secret is not to be used as in SPA (Single Page Apps) or Mobile Apps, then PKCE comes to the rescue.
There is a reason for breaking up the authorization flow so as to keep the resource owner's interaction with the authorization server isolated from the client's interactions with the authorization server. Therefore we need to have two interactions with the authorization server. One in which the resource owner authenticates with it's credentials to the authorization server. And another where the client sends in it's client secret to the authorization server.
Please also see PKCE that deals with SPA (SinglePageApp)/Mobile apps.

oAuth2.0 access token confusion

I am following this tutorial about OAuth2.0 https://developers.google.com/youtube/v3/guides/authentication
It looks quite clear how OAuth2.0 works. But I have a bit confusion at the access token part.
After obtaining an access token for a user, your application can use
that token to submit authorized API requests on that user's behalf.
The API supports two ways to specify an access token: Specify the
access token as the value of the access_token query parameter:
www.googleapis.com/youtube/v3/videos?access_token=ACCESS_TOKEN
if someone acquired this access token during the url transferring they can access this protected resource right?
How the server know if the request is coming from the client initially requested the access token?
UPDATE:
after reading this post Are HTTPS headers encrypted? my confusion is cleared. I thought query string is not encrypted during transmission in the network.
Generally I think the consensus is that OAuth 2.0 is a server side technology and all access tokens and communication should be transmitted using SSL as the bearer tokens need to be kept as secure as possible.
Also, you need to know that there are 2 types of flows in OAuth 2.0
i) Implicit grant flow - This is the flow where the user logs in to the service provider and his browser gets the access token. Say you have X.com and Log in via Facebook. Once the user keys in his FB credentials, the access token is sent to his browser.
ii) Authorization Code flow - In this flow (consider the above situation again), facebook will pass an authorization code to the user's browser. If anyone, somehow, intercepts the authorization code there is nothing he can do. An authorization code can be exchanged for an access when passed with valid client credentials. So, when the user logs in, his browser gets an authorization code which is passed to your server at X.com. from there you would hit the code-token exchange endpoint provided by FB and get the access token returned to your server!
Authorization code flow adds another layer of security, where the access token is visible only to the client + server and not to the user agent. And as you figured out yourself, the token is passed via HTTPS.

What is the difference between the OAuth Authorization Code and Implicit workflows? When to use each one?

OAuth 2.0 has multiple workflows. I have a few questions regarding the two.
Authorization code flow - User logs in from client app, authorization server returns an authorization code to the app. The app then exchanges the authorization code for access token.
Implicit grant flow - User logs in from client app, authorization server issues an access token to the client app directly.
What is the difference between the two approaches in terms of security? Which one is more secure and why?
I don't see a reason why an extra step (exchange authorization code for token) is added in one work flow when the server can directly issue an Access token.
Different websites say that Authorization code flow is used when client app can keep the credentials secure. Why?
The access_token is what you need to call a protected resource (an API). In the Authorization Code flow there are 2 steps to get it:
User must authenticate and returns a code to the API consumer (called the "Client").
The "client" of the API (usually your web server) exchanges the code obtained in #1 for an access_token, authenticating itself with a client_id and client_secret
It then can call the API with the access_token.
So, there's a double check: the user that owns the resources surfaced through an API and the client using the API (e.g. a web app). Both are validated for access to be granted. Notice the "authorization" nature of OAuth here: user grants access to his resource (through the code returned after authentication) to an app, the app get's an access_token, and calls on the user's behalf.
In the implicit flow, step 2 is omitted. So after user authentication, an access_token is returned directly, that you can use to access the resource. The API doesn't know who is calling that API. Anyone with the access_token can, whereas in the previous example only the web app would (it's internals not normally accessible to anyone).
The implicit flow is usually used in scenarios where storing client id and client secret is not recommended (a device for example, although many do it anyway). That's what the the disclaimer means. People have access to the client code and therefore could get the credentials and pretend to become resource clients. In the implicit flow all data is volatile and there's nothing stored in the app.
I'll add something here which I don't think is made clear in the above answers:
The Authorization-Code-Flow allows for the final access-token to never reach and never be stored on the machine with the browser/app. The temporary authorization-code is given to the machine with the browser/app, which is then sent to a server. The server can then exchange it with a full access token and have access to APIs etc. The user with the browser gets access to the API only through the server with the token.
Implicit flow can only involve two parties, and the final access token is stored on the client with the browser/app. If this browser/app is compromised so is their auth-token which could be dangerous.
tl;dr don't use implicit flow if you don't trust the users machine to hold tokens but you do trust your own servers.
The difference between both is that:
In Implicit flow,the token is returned directly via redirect URL with "#" sign and this used mostly in javascript clients or mobile applications that do not have server side at its own, and the client does not need to provide its secret in some implementations.
In Authorization code flow, code is returned with "?" to be readable by server side then server side is have to provide client secret this time to token url to get token as json object from authorization server. It is used in case you have application server that can handle this and store user token with his/her profile on his own system, and mostly used for common mobile applications.
so it is depends on the nature of your client application, which one more secure "Authorization code" as it is request the secret on client and the token can be sent between authorization server and client application on very secured connection, and the authorization provider can restrict some clients to use only "Authorization code" and disallow Implicit
Which one is more secure and why?
Both of them are secure, it depends in the environment you are using it.
I don't see a reason why an extra step (exchange authorization code
for token) is added in one work flow when the server can directly
issue an Access token.
It is simple. Your client is not secure. Let's see it in details.
Consider you are developing an application against Instagram API, so you register your APP with Instagram and define which API's you need. Instagram will provide you with client_id and client_secrect
On you web site you set up a link which says. "Come and Use My Application". Clicking on this your web application should make two calls to Instagram API.
First send a request to Instagram Authentication Server with below parameters.
1. `response_type` with the value `code`
2. `client_id` you have get from `Instagram`
3. `redirect_uri` this is a url on your server which do the second call
4. `scope` a space delimited list of scopes
5. `state` with a CSRF token.
You don't send client_secret, You could not trust the client (The user and or his browser which try to use you application). The client can see the url or java script and find your client_secrect easily. This is why you need another step.
You receive a code and state. The code here is temporary and is not saved any where.
Then you make a second call to Instagram API (from your server)
1. `grant_type` with the value of `authorization_code`
2. `client_id` with the client identifier
3. `client_secret` with the client secret
4. `redirect_uri` with the same redirect URI the user was redirect back to
5. `code` which we have already received.
As the call is made from our server we can safely use client_secret ( which shows who we are), with code which shows the user have granted out client_id to use the resource.
In response we will have access_token
The implicit grant is similar to the authorization code grant with two distinct differences.
It is intended to be used for user-agent-based clients (e.g. single page web apps) that can’t keep a client secret because all of the application code and storage is easily accessible.
Secondly instead of the authorization server returning an authorization code which is exchanged for an access token, the authorization server returns an access token.
Please find details here
http://oauth2.thephpleague.com/authorization-server/which-grant/
Let me summarize the points that I learned from above answers and add some of my own understandings.
Authorization Code Flow!!!
If you have a web application server that act as OAuth client
If you want to have long lived access
If you want to have offline access to data
when you are accountable for api calls that your app makes
If you do not want to leak your OAuth token
If you don't want you application to run through authorization flow every time it needs access to data. NOTE: The Implicit Grant flow does not entertain refresh token so if authorization server expires access tokens regularly, your application will need to run through the authorization flow whenever it needs access.
Implicit Grant Flow!!!
When you don't have Web Application Server to act as OAuth Client
If you don't need long lived access i.e only temporary access to data is required.
If you trust the browser where your app runs and there is limited concern that the access token will leak to untrusted users.
Implicit grant should not be used anymore, see the IETF current best practices for details. https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics-18#section-2.1.2
As an alternative use a flow with response type code; for clients without possibility to securely store client credentials the authorization code with PKCE flow should be your choice.
From practical perspective (What I understood), The main reason for having Authz code flow is :
Support for refresh tokens (long term access by apps on behalf of User), not supported in implicit: refer:https://www.rfc-editor.org/rfc/rfc6749#section-4.2
Support for consent page which is a place where Resource Owner can control what access to provide (Kind of permissions/authorization page that you see in google). Same is not there in implicit . See section : https://www.rfc-editor.org/rfc/rfc6749#section-4.1 , point (B)
"The authorization server authenticates the resource owner (via the user-agent) and establishes whether the resource owner grants or denies the client's access request"
Apart from that, Using refresh tokens, Apps can get long term access to user data.
There seem to be two key points, not discussed so far, which explain why the detour in the Authorization Code Grant Type adds security.
Short story: The Authorization Code Grant Type keeps sensitive information from the browser history, and the transmission of the token depends only on the HTTPS protection of the authorization server.
Longer version:
In the following, I'll stick with the OAuth 2 terminology defined in the RFC (it's a quick read): resource server, client, authorization server, resource owner.
Imagine you want some third-party app (= client) to access certain data of your Google account (= resource server). Let's just assume Google uses OAuth 2. You are the resource owner for the Google account, but right now you operate the third-party app.
First, the client opens a browser to send you to the secure URL of the Google authorization server. Then you approve the request for access, and the authorization server sends you back to the client's previously-given redirect URL, with the authorization code in the query string. Now for the two key points:
The URL of this redirect ends up in the browser history. So we don't want a long lived, directly usable access token here. The short lived authorization code is less dangerous in the history. Note that the Implicit Grant type does put the token in the history.
The security of this redirect depends on the HTTPS certificate of the client, not on Google's certificate. So we get the client's transmission security as an extra attack vector (For this to be unavoidable, the client needs to be non-JavaScript. Since otherwise we could transmit the authorization code via fragment URL, where the code would not go through the network. This may be the reason why Implicit Grant Type, which does use a fragment URL, used to be recommended for JavaScript clients, even though that's no longer so.)
With the Authorization Code Grant Type, the token is finally obtained by a call from the client to the authorization server, where transmission security only depends on the authorization server, not on the client.

Resources