Error reading JDBC_PING table, Keycloak cluster - docker

Can somebody help me with this problem?
I have a Keycloak cluster of two Docker container instances backed by a PostgreSQL database, using JDBC_PING for cluster discovery. The problem is that when checking the logs of one of the instances I get the following errors:
Error reading JDBC_PING table (https://i.stack.imgur.com/vrsdp.png)
Rollback (https://i.stack.imgur.com/2z0MF.png)
Multiple threads active within it (https://i.stack.imgur.com/lemFD.png)
Everything is deployed on Azure Container Instances (ACI) behind an Application Gateway that manages traffic.
Can somebody point me in the right direction for debugging?
I don't know what to check.
Only one container throws this error.
Edit: It is not a problem with the Keycloak cluster itself: I disabled JDBC_PING and left only one instance, and the exceptions still appear, so I now think it is a connection timeout. It is really weird that it happens only on production; staging works fine. Still investigating :(
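In case it helps anyone debugging the same thing: JDBC_PING persists each member's address in a table in the shared database (JGROUPSPING by default), so a first check is whether that table exists and is readable with the same credentials Keycloak uses. A minimal sketch, assuming the default table name and a keycloak database/user (host and names are placeholders for your setup):
# Inspect the JDBC_PING discovery table with Keycloak's own credentials.
# JGROUPSPING is the JDBC_PING default; your configured table name may differ.
psql -h <db-host> -U keycloak -d keycloak -c "SELECT own_addr, cluster_name FROM jgroupsping;"
If this query itself stalls or times out intermittently, that supports the connection-timeout theory rather than a JGroups misconfiguration.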

Related

How to edit bad gateway page on traefik

I want to edit the Bad Gateway page from Traefik to issue a command like
docker restart redis
Does anyone have an idea on how to do this?
A bit of background:
I have a somewhat broken setup of Traefik v2.5 and Authelia on my development server, where I sometimes get a Bad Gateway error when accessing a page. Usually this is fixed by clearing all sessions from redis. I tried to locate the bug, but the error logs aren't helpful, and I don't have the time or skills to make the bug reproducible or find the broken configuration. So instead I always SSH into the machine and reset redis manually.
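Traefik's errors middleware can only route error responses to another service that renders a custom page; it has no hook for running commands. One workaround is a small watchdog next to Traefik that probes a page and restarts redis when it sees a 502. A rough sketch, assuming a URL and container name that match your setup:
#!/bin/sh
# Poll the site through Traefik and restart redis whenever a 502 comes back.
# The URL and container name are placeholders for your deployment.
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' https://myapp.example.dev/)
  [ "$code" = "502" ] && docker restart redis
  sleep 30
done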

Cannot access Keycloak account-console in Kubernetes (403)

I have found a strange behavior in Keycloak when deployed in Kubernetes that I can't wrap my head around.
Use-case:
login as admin:admin (created by default)
click on Manage account
(manage account dialog screenshot)
I have compared how the same image (quay.io/keycloak/keycloak:17.0.0) behaves when it runs on Docker versus in Kubernetes (K3S).
If I run it from Docker, the account console loads. In other terms, I get a success (204) for the request
GET /realms/master/protocol/openid-connect/login-status-iframe.html/init?client_id=account-console
From the same image deployed in Kubernetes, the same request fails with error 403. However, on this same application, I get a success (204) for the request
GET /realms/master/protocol/openid-connect/login-status-iframe.html/init?client_id=security-admin-console
Since I can call security-admin-console, this does not look like an issue with the Kubernetes Ingress gateway nor with anything related to routing.
I then suspected a Keycloak access-control configuration issue, but in both cases I use the default image without any change. I cross-checked to be sure: the admin user and the account-console client are configured exactly the same way in both the Docker and k8s deployments.
I am out of ideas about what the problem could be; do you have any suggestions?
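For anyone trying to reproduce this outside the browser, the two checks are plain GETs and can be compared directly against the ingress; a sketch with a placeholder hostname:
# Compare the two iframe init calls; 204 vs. 403 reproduces the issue.
curl -sk -o /dev/null -w '%{http_code}\n' "https://keycloak.example.com/realms/master/protocol/openid-connect/login-status-iframe.html/init?client_id=account-console"
curl -sk -o /dev/null -w '%{http_code}\n' "https://keycloak.example.com/realms/master/protocol/openid-connect/login-status-iframe.html/init?client_id=security-admin-console"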
Try setting ssl_required = NONE in the realm table of the Keycloak database for your realm (master).
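For reference, that column lives in the realm table, so with direct database access it would look something like this (an assumption about your setup, not part of the answer above; restart Keycloak afterwards so the cached realm settings are refreshed):
# Disable the SSL requirement for the master realm (valid values: ALL, EXTERNAL, NONE).
psql -h <db-host> -U keycloak -d keycloak -c "UPDATE realm SET ssl_required = 'NONE' WHERE name = 'master';"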
So we found that it was the nginx ingress controller causing a lot of issues. We were able to get it working with nginx via X-Forwarded-Proto etc., but it was a bit complicated and convoluted; moving to haproxy instead resolved this problem. Also, make sure you are interfacing with the ingress controller over HTTPS, or that may cause issues with Keycloak.
annotations:
  kubernetes.io/ingress.class: haproxy
  ...
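If you do stay on nginx, the usual missing piece is telling Keycloak that TLS is terminated at the proxy so it trusts the X-Forwarded-* headers. On the Quarkus distribution (17.0.0, as in the question) that is the proxy mode; a sketch of what that could look like, assuming a plain docker run (adjust to your deployment):
# Tell Keycloak that TLS terminates at the proxy/ingress.
docker run -e KC_PROXY=edge quay.io/keycloak/keycloak:17.0.0 start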

Composer instance freeze, metadata.google.internal authentication error

Our Composer instance dropped all its active workers in the middle of the day. Node memory and CPU utilization disappeared for 2 out of 3 nodes.
First errors were:
_mysql_exceptions.OperationalError: (2006, "Can't connect to MySQL server on 'airflow-sqlproxy-service.default.svc.cluster.local' (110))"
Restarting the Composer instance (with a dummy env variable change) does not help and gives the error below. Killing the failing GKE workers does not help either. Stackdriver has this:
ERROR: (gcloud.container.clusters.describe) You do not currently have an active account selected.)
And another error seems to point to an internal Google authentication service problem:
ERROR: (gcloud.container.clusters.get-credentials) There was a problem refreshing your current auth tokens: Unable to find the server at metadata.google.internal)
The Composer storage bucket seems to have 'Storage Legacy Bucket ...' permissions for some service accounts. Are there changes going on in the authentication backend, or what else could be the underlying cause of this sudden and weird freeze?
Versions are composer-1.8.2 and airflow-1.10.3.
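Both gcloud errors point at the GCE metadata server, which is what the nodes use to refresh auth tokens. You can confirm whether that is the failing piece by querying it directly from an affected node or pod; the metadata URL below is standard, everything else depends on your cluster:
# Ask the metadata server for a token for the default service account.
# A DNS failure or timeout here matches the "Unable to find the server" error above.
curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"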

Spring Boot Admin - Running in Docker Swarm weirdly

I am running multiple Spring-Boot servers all connected to a Spring Boot Admin instance. Everything is running in the same Docker Swarm.
Spring Boot Admin keeps reporting on these "fake" instances that pop up and die. They are up for 1 second and then become unresponsive. When I clear them, they come back. The details for that instance show this error:
Fetching live health status failed. This is the last known information.
Request failed with status code 502
Here's a screenshot: (instance details screenshot)
This is the same for all my APIs, and it is causing us to get an inaccurate health reading of our services. How can I get Admin to stop reporting on these non-existent containers?
I've looked in all my nodes and can't find any containers (running or stopped) that match the unresponsive containers that Admin is reporting.
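One thing worth ruling out: if Admin discovers clients through Swarm's service DNS, stale task entries can show up as exactly this kind of phantom instance. You can see which task IPs Swarm DNS is currently handing out with something like this (the network and service names are placeholders):
# List the task IPs Swarm DNS resolves for a service on the shared overlay network.
# Stale entries here would explain instances Admin sees but you cannot find.
docker run --rm --network my_overlay_net alpine nslookup tasks.my-api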

Gateway app cannot connect to microservices

We are using JHipster and Docker for our microservices architecture. We just deployed our application stack to Docker Swarm (docker-compose version 3) with only one node active, and we are having issues with the gateway app throwing Zuul timeouts when connecting to the backend microservices. We have a different environment where we are not using Swarm (docker-compose version 2) and it works great. In Swarm I was able to curl the backend microservices from the gateway app using containername:port but not containerIp:port. I am lost here, as I could not narrow down whether it is a Swarm issue or a JHipster issue. I even changed 'prefer-ip-address: false' in our app properties, but it is the same issue. Any leads on what the issue could be?
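Not a full answer, but one way to narrow it down between Swarm networking and JHipster/Eureka is to compare name-based and IP-based reachability from inside the gateway container; a sketch with placeholder names and the default JHipster actuator path:
# From the gateway container: resolve the service name, then hit health by name and by raw IP.
docker exec -it gateway sh -c 'nslookup my-microservice'
docker exec -it gateway curl -m 5 http://my-microservice:8081/management/health
docker exec -it gateway curl -m 5 http://<container-ip>:8081/management/health
If the name works but the raw container IP does not, that matches your symptom: Swarm routes the service VIP, while Eureka hands Zuul the registered instance address, which is where prefer-ip-address and hostname settings come into play.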
