Spring Security - Google OAuth 2.0 - UnknownHostException www.googleapis.com

I've implemented Google OAuth login based on this tutorial: https://www.callicoder.com/spring-boot-security-oauth2-social-login-part-1/
It works correctly when the app is run locally. However, after deploying it on GKE, I'm unable to log in - the flow fails with the following error:
error: [invalid_token_response] An error occurred while attempting to retrieve the OAuth 2.0 Access Token Response: I/O error on POST request for "https://www.googleapis.com/oauth2/v4/token": www.googleapis.com; nested exception is java.net.UnknownHostException: www.googleapis.com
Which comes from OAuth2AccessTokenResponseClient
As I said before, it works fine on localhost, and I'm unable to debug it.
The app is deployed with an Ingress using a static IP. I assigned that IP to my domain very recently, and the domain is registered in the Google API Console's authorised redirect URIs.

Google APIs use the OAuth 2.0 protocol for authentication and authorization. Google supports common OAuth 2.0 scenarios such as those for web server, installed, and client-side applications; please have a look at this link.
You can follow the steps below to obtain OAuth 2.0 access tokens.
Step 1: Generate a code verifier and challenge
Step 2: Send a request to Google's OAuth 2.0 server
Step 3: Google prompts user for consent
Step 4: Handle the OAuth 2.0 server response
Step 5: Exchange authorization code for refresh and access tokens
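Step 1 above (generating a code verifier and its challenge, per the PKCE extension in RFC 7636) can be sketched in Python. This is a generic illustration of the mechanism, not Spring Security's internal implementation:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code verifier and its S256 code challenge (RFC 7636).

    The challenge goes into the initial authorization request (Step 2);
    the verifier is sent later in the token exchange (Step 5).
    """
    # 32 random bytes -> 43-character base64url string, padding stripped
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge
```

Spring Security performs the equivalent exchange for you inside its OAuth2 client support; the sketch just makes the verifier/challenge relationship concrete.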

The problem was that the kube-dns pods didn't come up. I had set up a preemptible cluster and added a taint to its only node pool, which prevented kube-dns from starting:
Normal NotTriggerScaleUp 61s (x22798 over 2d18h) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate
Warning FailedScheduling 44s (x141 over 26h) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
After removing the taint, the hostname got resolved.

Related

cert-manager does not issue certificate after upgrading to AKS k8s 1.24.6

I have an automatic setup with scripts and helm to create a Kubernetes Cluster on MS Azure and to deploy my application to the cluster.
First of all: everything works fine when I create a cluster with Kubernetes 1.23.12, that means after a few minutes everything is installed and I can access my website and there is a certificate issued by letsencrypt.
But when I delete this cluster completely and reinstall it, changing only the Kubernetes version from 1.23.12 to 1.24.6, I don't get a certificate any more.
I see that the acme challenge is not working. I get the following error:
Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk': Get "http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk": dial tcp: lookup my.hostname.de on 10.0.0.10:53: no such host
After some time the error message changes to:
'Error accepting authorization: acme: authorization error for my.hostname.de:
400 urn:ietf:params:acme:error:connection: 20.79.77.156: Fetching http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk:
Timeout during connect (likely firewall problem)'
10.0.0.10 is the cluster IP of kube-dns in my Kubernetes cluster. When I look at "Services and Ingresses" in the Azure portal, I can see the ports 53/UDP;53/TCP for the cluster IP 10.0.0.10.
I can also see there that 20.79.77.156 is the external IP of the ingress-nginx-controller (ports: 80:32284/TCP;443:32380/TCP).
So I do not understand why the acme challenge cannot be performed successfully.
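The first hop of that failing self-check is plain DNS resolution against the cluster resolver (kube-dns at 10.0.0.10). A minimal sketch of that step, runnable from any pod with Python, can help separate a DNS problem from a cert-manager problem; the function name is mine, not part of cert-manager:

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the configured resolver (kube-dns/CoreDNS inside a
    cluster) can resolve the hostname - the step that fails with
    'no such host' in the cert-manager self-check above."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False
```

If this returns False for your public hostname when run inside the cluster, the problem lies in cluster DNS (or the upstream it forwards to), not in cert-manager itself.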
Here some information about the version numbers:
Azure Kubernetes 1.24.6
helm 3.11
cert-manager 1.11.0
ingress-nginx helm-chart: 4.4.2 -> controller-v1.5.1
I have tried to find the same error on the internet, but it is not common and the solutions do not seem to fit my problem.
Of course, I have read a lot about k8s 1.24.
It is not a dockershim problem, because I have tested the cluster with the Detector for Docker Socket (DDS) tool.
I have updated cert-manager and ingress-nginx to new versions (see above)
I have also tried it with Kubernetes 1.25.4 -> same error
I have found this on the cert-manager Website: "cert-manager expects that ServerSideApply is enabled in the cluster for all versions of Kubernetes from 1.24 and above."
I think I understand the difference between Server Side Apply and Client Side Apply, but I don't know whether and how I can enable it in my cluster, or whether this could be a solution to my problem.
Any help is appreciated. Thanks in advance!
I've solved this myself recently; try this for your ingress controller:
ingress-nginx:
  rbac:
    create: true
  controller:
    service:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
k8s 1.24+ uses a different endpoint for health probes.

Azure Application gateway fails with terminal provisioning state "Failed"

I am deploying an internal Azure Application Gateway V2. It succeeded a couple of times in other subscriptions (environments); however, it is now failing with a strange error and without much detail about the error.
The deployment fails 30 minutes after applying/creating.
There is a UDR, but it serves a different purpose and does not block or restrict the default internet route.
The deployment uses Terraform, and everything worked well in the other instances' deployments.
I tried to reproduce the same in my environment and got the same error like below.
"details": [
{
"code": "Conflict",
"message": "{\r\n \"status\": \"Failed\",\r\n \"error\": {\r\n \"code\": \"ResourceDeploymentFailure\",\r\n \"message\": \"![The resource operation completed with terminal provisioning state 'Failed](https://i.imgur.com/eipLRgp.png)'.\"\r\n }\r\n}"
}
]
This issue generally occurs when an unsupported route, typically a 0.0.0.0/0 route to a firewall advertised via BGP, affects the Application Gateway subnet.
Try to deploy with a default VNet and managed subnet configuration like below.
When I tried to deploy that way, the Azure Application Gateway deployment succeeded:
If your deployment fails after 30 minutes, you can use diagnostic logs to check the error messages in any logs pertaining to the unsuccessful operation.
Once you determine the cause of the issue, the diagnosis will guide you to the necessary steps to fix it (for example, resolving network issues), depending on the cause of the failure.
Found the issue and resolution:
I raised a Microsoft case to see the logs of the AppGW at the platform level.
Microsoft verified the logs and identified that the AppGW was not able to communicate with the Key Vault to read the SSL certificate (we use Key Vault to store the SSL cert for TLS encryption).
It turned out that subnet-to-subnet communication was blocked, so the AppGW was unable to reach the Key Vault in another subnet.
Resolution:
Allowed subnet-to-subnet communication between the subnets where the AppGW and Key Vault are present.
Conclusion:
It would have helped if Microsoft surfaced better logging information (error details) in the AppGW resource deployment and/or resource activity logs.

OIDC redirect loop issue

Env:
Istio: 1.14.1
Okta
OAuth2-proxy: v7.2.0
K8s: 1.21.11
I followed https://www.jetstack.io/blog/istio-oidc/ to set up OIDC auth for my application. I am running into a redirect loop issue and see this error in the oauth2-proxy pod logs:
[2022/07/06 17:20:56] [oauthproxy.go:862] No valid authentication in request. Initiating login.
Any help to troubleshoot this? Please let me know if you need more details about the setup.

JupyterHub Azure Gov OAuth posts to non-Gov AAD Authority endpoints, not the ones set in App Registration

Context:
I have the JupyterHub Helm chart deployed on AWS EKS, following the instructions here: https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/index.html
For SSO across all our apps, we use Azure Gov AD.
In the Azure App Registration, as required for apps authenticated on Azure Gov, the authority endpoints are set up correctly to hit https://login.microsoftonline.us/<tenant_id>/oauth2/token, as shown in the attached pic.
However, when I try SSO into JupyterHub, the connection in Auth0 hits https://login.microsoftonline.com instead, as shown in the attached logs from the pod running JupyterHub, resulting in a 500 status error.
What could be causing Auth0 connection to hit a different, wrong endpoint not specified in AAD App Registration?
Did anyone face similar issues trying to authenticate an app on Azure Gov?
Is this an error on Azure AD side, or how OAuthenticator is configured on JupyterHub?
AAD Endpoints
k logs hub-78c6c9ff4f-znbxp -n jupyterhub2 --follow
[E 2022-03-03 02:15:58.325 JupyterHub oauth2:389] Error fetching 400 POST https://login.microsoftonline.com/<tenant_id>/oauth2/token: {
"correlation_id": <id>,
"error": "invalid_request",
"error_codes": [
900432
],
"error_description": "AADSTS900432: Confidential Client is not supported in Cross Cloud request.\r\nTrace ID: 9898f82e-b503-4c47-8ae4-859d8d54b500\r\nCorrelation ID: a86756e7-8aaa-4eb9-876a-1db5d145889d\r\nTimestamp: 2022-03-03 02:15:58Z",
"timestamp": "2022-03-03 02:15:58Z",
"trace_id": "9898f82e-b503-4c47-8ae4-859d8d54b500"
}
I got this resolved - TL;DR: the AzureOAuthenticator from JupyterHub isn't built for Azure Gov apps. It defaults to the Auth0 endpoint that is only for non-Gov Azure apps.
So I had to create my own authenticator with the correct configuration from the Azure AD app registration.
Also, I was using my secret ID instead of the secret value.
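The core of the fix is pointing the authorize/token endpoints at the Azure Gov authority (login.microsoftonline.us) instead of the default commercial one. A minimal sketch of that idea; the function name and returned dict shape are hypothetical, not the actual OAuthenticator API:

```python
# Azure Government uses a separate AAD authority from commercial Azure
AZURE_GOV_AUTHORITY = "https://login.microsoftonline.us"

def azure_gov_endpoints(tenant_id: str) -> dict:
    """Build authorize/token endpoints for an Azure Gov tenant, replacing
    the login.microsoftonline.com defaults baked into AzureOAuthenticator."""
    base = f"{AZURE_GOV_AUTHORITY}/{tenant_id}/oauth2"
    return {
        "authorize_url": f"{base}/authorize",
        "token_url": f"{base}/token",
    }
```

In a custom authenticator, values like these would be assigned to the authorize/token URL settings (for example by subclassing the stock Azure authenticator and overriding its endpoint attributes).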

Error when trying to get token using Managed Service Identity in a multi-container azure web app service

We have the following scenario:
Current working setup
Web API project using a single Dockerfile
A release pipeline with an 'Azure App Service deploy' task.
Proposed new setup
Web API project using a multi-container Docker Compose file
A release pipeline with an 'Azure Web App for Containers' task.
Upon deploying the new setup we receive the below error message:
ERROR - multi-container unit was not started successfully
Unhandled exception. System.AggregateException: One or more errors occurred.
(Parameters: Connection String: XXX, Resource: https://vault.azure.net, Authority:
https://login.windows.net/xxxxx. Exception Message:
Tried to get token using Managed Service Identity.
Access token could not be acquired. Connection refused)
The exception is thrown because the app can't connect to Azure MSI (Managed Service Identity), which it does to obtain a token before connecting to Key Vault.
I have tried the following, based on some research and solutions others have found:
Connecting with "RunAs=App" (this seems to be what the parameter-less constructor does anyway)
Building up the connection string manually by pulling the "MSI_SECRET" environment variable from the machine. This is always blank.
Restarting MSI.
Upgrading and downgrading the AppAuthentication package.
MSI appears to be configured correctly as it works perfectly with our current working setup so we can rule that out.
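To make the failure mode concrete: on App Service, the local managed-identity endpoint is addressed via the MSI_ENDPOINT and MSI_SECRET environment variables (2017-09-01 API version). The sketch below only builds the request; when those variables are not injected into the container, as observed above with the blank MSI_SECRET, there is nothing to call:

```python
import os
import urllib.parse

def build_msi_token_request(resource: str):
    """Build the URL and headers for an App Service managed-identity token
    request. Returns None when the MSI environment variables are not
    injected into the container - the symptom described above."""
    endpoint = os.environ.get("MSI_ENDPOINT")
    secret = os.environ.get("MSI_SECRET")
    if not endpoint or not secret:
        return None
    query = urllib.parse.urlencode({
        "resource": resource,           # e.g. https://vault.azure.net
        "api-version": "2017-09-01",
    })
    return f"{endpoint}?{query}", {"secret": secret}
```

A GET to the returned URL with the returned headers would yield the access token on a working single-container App Service; in the multi-container case the environment variables are simply absent.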
It's worth noting that this is a system-assigned identity, not a user-assigned one.
The documentation that states which services support managed identities only mentions 'Azure Container Instances', not 'Azure Managed Container Instances', and that is for Linux/preview too, so it might not be supported:
Services that support managed identities for Azure resources
We've spent a considerable amount of time getting to this point with the configuration and deployment and it would be great if we could resolve this last issue.
Any help appreciated.
Unfortunately, there is currently no multi-container support for managed identities. The multi-container feature is in preview and so does not have all of its functionality working yet.
However, the documentation you linked to is also not clear about the supported scenarios, so I am working on getting it updated to clarify this. I can update this answer once that's done.