I have encountered this issue many times and fixed it by recreating the problem app, but I still want to know how to debug it and what causes it.
For example:
I created a new Jenkins persistent application on my master. I can curl the endpoint IP address from the node/master, but I cannot curl the application's exposed IP/hostname. Usually I just remove the application and re-create it and the problem goes away, but this time I really want to know how to fix it without re-creating. I followed the debugging steps in https://docs.openshift.com/enterprise/3.1/admin_guide/sdn_troubleshooting.html and I'm pretty sure DNS is working. Let me know if I need to provide any more information here. Thanks.
Here is my svc description output:
# oc describe svc/jenkins
Name: jenkins
Namespace: developer
Labels: app=jenkins-persistent
template=jenkins-persistent-template
Selector: name=jenkins
Type: ClusterIP
IP: 172.30.78.168
Port: web 80/TCP
Endpoints: 10.128.0.97:8080
Session Affinity: None
No events.
# oc get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
jenkins 172.30.78.168 <none> 80/TCP 1h
jenkins-jnlp 172.30.87.38 <none> 50000/TCP 1h
# curl http://10.128.0.97:8080
<html><head><meta http-equiv='refresh' content='1;url=/securityRealm/commenceLogin?from=%2F'/><script>window.location.replace('/securityRealm/commenceLogin?from=%2F');</script></head><body style='background-color:white; color:white;'>
Authentication required
<!--
You are authenticated as: anonymous
Groups that you are in:
Permission you need to have (but didn't): hudson.model.Hudson.Read
... which is implied by: hudson.security.Permission.GenericRead
... which is implied by: hudson.model.Hudson.Administer
-->
</body></html>
# oc get oauthclient
NAME SECRET WWW-CHALLENGE REDIRECT URIS
cockpit-oauth-client user7IjHLvwuclbrHeVmi2pslHpSbmQuI3ePqjAHbLSS0aNBekio2aqDM3iBbx33Qwwp FALSE https://registry-console-default.com.cn,https://jenkins-developer.com.cn
# curl 172.30.78.168:8080
curl: (7) Failed connect to 172.30.78.168:8080; No route to host
# curl 172.30.78.168
<html><head><meta http-equiv='refresh' content='1;url=/securityRealm/commenceLogin?from=%2F'/><script>window.location.replace('/securityRealm/commenceLogin?from=%2F');</script></head><body style='background-color:white; color:white;'>
Authentication required
<!--
You are authenticated as: anonymous
Groups that you are in:
Permission you need to have (but didn't): hudson.model.Hudson.Read
... which is implied by: hudson.security.Permission.GenericRead
... which is implied by: hudson.model.Hudson.Administer
# curl jenkins-developer.com.cn
curl: (7) Failed connect to jenkins-developer.com.cn:80; Connection refused
# oc describe route
Name: jenkins
Namespace: developer
Created: 2 days ago
Labels: app=jenkins-persistent
template=jenkins-persistent-template
Annotations: <none>
Requested Host: jenkins-developer.paas.com.cn
exposed on router ose-router 2 days ago
Path: <none>
TLS Termination: <none>
Insecure Policy: <none>
Endpoint Port: web
Service: jenkins
Weight: 100 (100%)
Endpoints: 10.128.0.97:8080
The console ui in my OpenShift 4.5.x installation has mysteriously stopped working. Visiting the console URL now results in the message:
Application is not available
The application is currently not serving requests at this endpoint. It may not have been started or is still starting.
One usually sees this if a route exists but cannot find a corresponding service or pod, but in this case the route exists:
$ oc -n openshift-console get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
console console-openshift-console.apps.example.com console https reencrypt/Redirect None
downloads downloads-openshift-console.apps.example.com downloads http edge/Redirect None
The service exists:
$ oc -n openshift-console get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
console ClusterIP 172.30.36.70 <none> 443/TCP 57d
downloads ClusterIP 172.30.190.186 <none> 80/TCP 57d
And the pods exist and are healthy:
$ oc -n openshift-console get pods
NAME READY STATUS RESTARTS AGE
console-76c8d7d755-gtfm8 0/1 Running 1 4m12s
console-76c8d7d755-mvf6n 0/1 Running 1 4m12s
downloads-9656c996-mmqhk 1/1 Running 0 53d
downloads-9656c996-z2khj 1/1 Running 0 53d
Looking at the logs for the console pods, there appears to be a problem contacting the oauth service:
2021-01-04T22:05:48Z auth: error contacting auth provider (retrying in 10s): Get https://kubernetes.default.svc/.well-known/oauth-authorization-server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-04T22:05:58Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: EOF
2021-01-04T22:06:13Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-04T22:06:23Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: EOF
2021-01-04T22:06:38Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2021-01-04T22:06:53Z auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.example.com/oauth/token failed: Head https://oauth-openshift.apps.example.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
But the pods in the openshift-authentication namespace appear to be healthy and are not reporting any errors in the logs. Where should I be looking for the source of the problem?
The expected route and service exist in the openshift-authentication namespace:
$ oc -n openshift-authentication get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
oauth-openshift oauth-openshift.apps.example.com oauth-openshift 6443 passthrough/Redirect None
$ oc -n openshift-authentication get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
oauth-openshift ClusterIP 172.30.233.202 <none> 443/TCP 57d
$ oc -n openshift-authentication get route oauth-openshift -o json | jq .status
{
  "ingress": [
    {
      "conditions": [
        {
          "lastTransitionTime": "2020-11-08T19:48:08Z",
          "status": "True",
          "type": "Admitted"
        }
      ],
      "host": "oauth-openshift.apps.example.com",
      "routerCanonicalHostname": "apps.example.com",
      "routerName": "default",
      "wildcardPolicy": "None"
    }
  ]
}
It turned out to be an issue with the default ingress routers. There were no obvious errors, but I was able to resolve the problem by restarting the routers:
oc -n openshift-ingress get pod -o json |
jq -r '.items[].metadata.name' |
xargs oc -n openshift-ingress delete pod
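On 4.x you can also bounce them with a rollout restart instead of deleting the pods directly (this assumes the default router deployment is named router-default; adjust the name if yours differs):
oc -n openshift-ingress rollout restart deployment/router-default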
I had the same issue on OpenShift 3.11.
I just deleted the secret with the certificate; OpenShift creates a new secret, and now the console works:
oc delete secret console-serving-cert -n openshift-console
My cluster consists of 1 master and 2 worker nodes. I created a pod using a deployment YAML. The pod runs successfully on worker node 1, and I can ping the pod's IP from the worker nodes, but I can't ping the pod's IP from the master. I tried disabling firewalld and restarting Docker, without success. Please see my commands:
[root@k8s-master ~]# kubectl get pods -o wide | grep qldv
qldv-liberty-8499dfcf67-55njr 1/1 Running 0 6m42s 10.40.0.2 worker-node1 <none> <none>
[root@k8s-master ~]# ping 10.40.0.2
PING 10.40.0.2 (10.40.0.2) 56(84) bytes of data.
From 10.32.0.1 icmp_seq=1 Destination Host Unreachable
From 10.32.0.1 icmp_seq=2 Destination Host Unreachable
From 10.32.0.1 icmp_seq=3 Destination Host Unreachable
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 43d v1.15.0
worker-node1 Ready <none> 42d v1.15.0
worker-node2 Ready <none> 42d v1.15.0
[root@k8s-master ~]# kubectl describe pod qldv-liberty-8499dfcf67-55njr
Name: qldv-liberty-8499dfcf67-55njr
Namespace: default
Priority: 0
Node: worker-node1/192.168.142.130
Start Time: Sat, 17 Aug 2019 20:05:57 +0700
Labels: app=qldv-liberty
pod-template-hash=8499dfcf67
Annotations: <none>
Status: Running
IP: 10.40.0.2
Controlled By: ReplicaSet/qldv-liberty-8499dfcf67
Containers:
qldv-liberty:
Container ID: docker://03636fb62d4cca0e41f4ad9f5a94b50cf371089ab5a0813ed802d02f4ac4b07a
Image: qldv-liberty
Image ID: docker://sha256:bd0d7ce1c07da5b9d398131b17da7a6931a9b7ae0673d19a6ec0c409416afc69
Port: 9080/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 17 Aug 2019 20:06:23 +0700
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vtphv (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-vtphv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vtphv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 119s default-scheduler Successfully assigned default/qldv-liberty-8499dfcf67-55njr to worker-node1
Normal Pulled 96s kubelet, worker-node1 Container image "qldv-liberty" already present on machine
Normal Created 95s kubelet, worker-node1 Created container qldv-liberty
Normal Started 91s kubelet, worker-node1 Started container qldv-liberty
I have another app whose pod also runs on worker node 1, and I can ping that pod's IP from the master. I don't know why it doesn't work in the case above.
Please help me!
I doubt the cluster still exists by now, so I'll share some general troubleshooting tips instead:
Check the status of all control-plane components and nodes. Ensure the kube-proxy and network add-on (flannel/calico/weave/etc.) Pods exist on each node and are in the Ready state.
kubectl get deployments,daemonsets,pods,svc -A -o wide
There are several requirements for a Kubernetes cluster, and it's worth checking whether they are satisfied.
Some useful information can be found in the control-plane component logs using
kubectl logs kube-component-name-pod -n kube-system
or kubelet logs using
journalctl -u kubelet
It's better to use well-known images like nginx or mendhak/http-https-echo. They can be configured to listen on any desired port and provide detailed information about requests in their logs or in the HTTP reply, which helps rule out application/image-related issues.
Check connectivity to Pod IP and Service ClusterIP within the same node first.
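For example, a minimal known-good target could be set up like this (nginx serves on port 80 by default; the deployment name web is arbitrary, and the placeholder IPs come from kubectl output):
kubectl create deployment web --image=nginx
kubectl expose deployment web --port=80
kubectl get pods -o wide          # note the Pod IP
kubectl get svc web               # note the ClusterIP
# from the node running the pod, then from another node:
curl http://<pod-ip>:80
curl http://<cluster-ip>:80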
If the worker node OS doesn't have the necessary troubleshooting tools (e.g. container-optimized images or CoreOS), a Pod with an Ubuntu or Busybox image can be used instead. Creating a Deployment or DaemonSet helps schedule it on all nodes. Note that firewall or network issues can block kubectl exec connections to those pods.
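For instance, a throwaway Busybox shell could look like this (the Pod IP and port below are the ones from the question; busybox's wget is enough for a simple HTTP check):
kubectl run -it --rm nettest --image=busybox --restart=Never -- sh
# inside the pod:
wget -qO- http://10.40.0.2:9080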
If everything works fine within the same node, but a connection to the Pod can't be established from another node, it's worth checking the network add-on status and the nodes' firewall configuration. Native Linux firewall helpers can interfere with the iptables rules created by kube-proxy and block the connection.
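A quick way to look for that on each node (assuming firewalld or ufw and plain iptables; the exact commands differ per distro):
systemctl status firewalld        # or: ufw status
iptables -S FORWARD | head        # look for a default DROP policy that could block pod-to-pod traffic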
Clusters created in public clouds may require additional routing, peering, cloud firewall or security groups configuration to allow full IPIP connectivity between cluster nodes, especially if they are created in different VPCs.
The next thing worth checking is CoreDNS/kube-dns health. It should resolve cluster Service names like servicename.namespacename.svc.cluster.local to the correct IP addresses, whether queried via the DNS Pods' IP addresses or via the kube-dns Service (which usually has the IP address 10.96.0.10 in a default kubeadm cluster configuration).
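For example, a quick in-cluster lookup (busybox:1.28 is used here because nslookup in newer busybox images is known to be unreliable; 10.96.0.10 is the default kube-dns ClusterIP mentioned above):
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local 10.96.0.10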
Solutions for each of these problems can be found in other answers on Stack Exchange sites. The official documentation is another great source of information and also contains good examples.
I am trying to configure the Kubernetes plugin in Jenkins. Here are the details I am putting in:
Now, when I click on test connection, I get the following error:
Error testing connection https://xx.xx.xx.xx:8001: Failure executing: GET at: https://xx.xx.xx.xx:8001/api/v1/namespaces/default/pods. Message: Unauthorized! Configured service account doesn't have access. Service account may have been revoked. Unauthorized.
After doing some googling, I realized it might be because of role binding, so I created a role binding for my default service account:
# kubectl describe rolebinding jenkins
Name: jenkins
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: pod-reader
Subjects:
Kind Name Namespace
---- ---- ---------
ServiceAccount default default
Here is the pod-reader role:
# kubectl describe role pod-reader
Name: pod-reader
Labels: <none>
Annotations: <none>
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
pods [] [] [get watch list]
But I still get the same error. Is there anything else that needs to be done here? TIA.
I think it's not working because you didn't provide the certificate. This worked for me.
Figured it out: I was using the credentials as plain text. I changed that to a Kubernetes secret, and it worked.
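In case it helps anyone, one way to get a token to paste into such a credential is to read the default service account's token secret (this assumes an older cluster where token secrets are auto-created for service accounts, as in the question):
kubectl get secret $(kubectl get sa default -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d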
I have created a multi-node Kubernetes cluster, and my node details are:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
16-node-121 Ready <none> 32m v1.14.1 192.168.0.121 <none> Ubuntu 16.04.6 LTS 4.4.0-142-generic docker://18.9.2
master-16-120 Ready master 47m v1.14.1 192.168.0.120 <none> Ubuntu 16.04.6 LTS 4.4.0-142-generic docker://18.9.2
And I created a service and exposed it using the following command:
$ kubectl expose deployment hello-world --port=80 --target-port=8080
The service is created and exposed. My service details are:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-world ClusterIP 10.105.7.156 <none> 80/TCP 33m
I exposed my deployment with the following command:
kubectl expose deployment hello-world --port=80 --target-port=8080
service/hello-world exposed
Unfortunately, when I try to access my service using curl, I get a timeout error (shown below).
My service details are as follows:
master-16-120@master-16-120:~$ kubectl describe service hello-world
Name: hello-world
Namespace: default
Labels: run=hello-world
Annotations: <none>
Selector: run=hello-world
Type: ClusterIP
IP: 10.105.7.156
Port: <unset> 80/TCP
TargetPort: 8080/TCP
Endpoints: 192.168.1.2:8080
Session Affinity: None
Events: <none>
curl http://10.105.7.156:80
curl: (7) Failed to connect to 10.105.7.156 port 80: Connection timed out
Here I'm using Calico for my cluster network, which I installed with:
wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
My Pod networking specification is:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
At last I found the solution. Thanks to Daniel's comment, which helped me reach it.
I changed my Kubernetes pod network CIDR and the Calico configuration as follows:
--pod-network-cidr=10.10.0.0/16
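In rough terms, that means re-initializing kubeadm with the new CIDR and making Calico's pool match it (CALICO_IPV4POOL_CIDR in calico.yaml is the setting that has to agree with the kubeadm flag; this is only a sketch, and details may vary by Calico version):
sudo kubeadm reset
sudo kubeadm init --pod-network-cidr=10.10.0.0/16
# edit calico.yaml so CALICO_IPV4POOL_CIDR is "10.10.0.0/16", then:
kubectl apply -f rbac-kdd.yaml
kubectl apply -f calico.yaml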
I also configured /etc/hosts on the master (master-16-120):
192.168.0.120 master-16-120
192.168.0.121 16-node-121
And /etc/hosts on the node (16-node-121):
192.168.0.120 master-16-120
192.168.0.121 16-node-121
Now my kubernetes is ready to go.
I'm stepping through Kubernetes in Action to get more than just familiarity with Kubernetes.
I already had a Docker Hub account that I've been using for Docker-specific experiments.
As described in chapter 2 of the book, I built the toy "kubia" image, and I was able to push it to Docker Hub. I verified this again by logging into Docker Hub and seeing the image.
I'm doing this on Centos7.
I then run the following to create the replication controller and pod running my image:
kubectl run kubia --image=davidmichaelkarr/kubia --port=8080 --generator=run/v1
I waited a while for the statuses to change, but it never finishes downloading the image. When I describe the pod, I see something like this:
Normal Scheduled 24m default-scheduler Successfully assigned kubia-25th5 to minikube
Normal SuccessfulMountVolume 24m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-x5nl4"
Normal Pulling 22m (x4 over 24m) kubelet, minikube pulling image "davidmichaelkarr/kubia"
Warning Failed 22m (x4 over 24m) kubelet, minikube Failed to pull image "davidmichaelkarr/kubia": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
So I then constructed the following command:
curl -v -u 'davidmichaelkarr:**' 'https://registry-1.docker.io/v2/'
Which uses the same password I use for Docker Hub (they should be the same, right?).
This gives me the following:
* About to connect() to proxy *** port 8080 (#0)
* Trying **.**.**.**...
* Connected to *** (**.**.**.**) port 8080 (#0)
* Establish HTTP proxy tunnel to registry-1.docker.io:443
* Server auth using Basic with user 'davidmichaelkarr'
> CONNECT registry-1.docker.io:443 HTTP/1.1
> Host: registry-1.docker.io:443
> User-Agent: curl/7.29.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
<
* Proxy replied OK to CONNECT request
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=*.docker.io
* start date: Aug 02 00:00:00 2017 GMT
* expire date: Sep 02 12:00:00 2018 GMT
* common name: *.docker.io
* issuer: CN=Amazon,OU=Server CA 1B,O=Amazon,C=US
* Server auth using Basic with user 'davidmichaelkarr'
> GET /v2/ HTTP/1.1
> Authorization: Basic ***
> User-Agent: curl/7.29.0
> Host: registry-1.docker.io
> Accept: */*
>
< HTTP/1.1 401 Unauthorized
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< Www-Authenticate: Bearer realm="https://auth.docker.io/token",service="registry.docker.io"
< Date: Wed, 24 Jan 2018 18:34:39 GMT
< Content-Length: 87
< Strict-Transport-Security: max-age=31536000
<
{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":null}]}
* Connection #0 to host *** left intact
I don't understand why this is failing auth.
Update:
Based on the first answer and the info I got from this other question, I edited the description of the service account, adding the "imagePullSecrets" key, then I deleted the replicationcontroller again and recreated it. The result appeared to be identical.
This is the command I ran to create the secret:
kubectl create secret docker-registry regsecret --docker-server=registry-1.docker.io --docker-username=davidmichaelkarr --docker-password=** --docker-email=**
Then I obtained the yaml for the serviceaccount, added the key reference for the secret, then set that yaml as the settings for the serviceaccount.
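(For reference, the same change can be made with a one-liner, shown here only as an alternative to editing the YAML by hand:
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regsecret"}]}'
)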
These are the current settings for the service account:
$ kubectl get serviceaccount default -o yaml
apiVersion: v1
imagePullSecrets:
- name: regsecret
kind: ServiceAccount
metadata:
  creationTimestamp: 2018-01-24T00:05:01Z
  name: default
  namespace: default
  resourceVersion: "81492"
  selfLink: /api/v1/namespaces/default/serviceaccounts/default
  uid: 38e2882c-009a-11e8-bf43-080027ae527b
secrets:
- name: default-token-x5nl4
Here's the updated events list from the describe of the pod after doing this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m default-scheduler Successfully assigned kubia-f56th to minikube
Normal SuccessfulMountVolume 7m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-x5nl4"
Normal Pulling 5m (x4 over 7m) kubelet, minikube pulling image "davidmichaelkarr/kubia"
Warning Failed 5m (x4 over 7m) kubelet, minikube Failed to pull image "davidmichaelkarr/kubia": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal BackOff 4m (x6 over 7m) kubelet, minikube Back-off pulling image "davidmichaelkarr/kubia"
Warning FailedSync 2m (x18 over 7m) kubelet, minikube Error syncing pod
What else might I be doing wrong?
Update:
I think it's likely that all these issues with authentication are unrelated to the real issue. The key point is what I see in the pod description (breaking into multiple lines to make it easier to see):
Warning Failed 22m (x4 over 24m) kubelet,
minikube Failed to pull image "davidmichaelkarr/kubia": rpc error: code =
Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/:
net/http: request canceled while waiting for connection
(Client.Timeout exceeded while awaiting headers)
The last line seems like the most important piece of information at this point. It's not failing authentication, it's timing out the connection. In my experience, something like this is usually caused by issues getting through a firewall/proxy. We do have an internal proxy, and I have those environment variables set in my environment, but what about the "serviceaccount" that kubectl is using to make this connection? Do I have to somehow set a proxy configuration in the serviceaccount description?
You need to make sure the Docker daemon running in the Minikube VM uses your corporate proxy by starting minikube along these lines:
minikube start --docker-env http_proxy=http://proxy.corp.com:port --docker-env https_proxy=http://proxy.corp.com:port --docker-env no_proxy=192.168.99.0/24
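As far as I know these --docker-env values only take effect when the VM is provisioned, so if minikube is already running you may need to recreate it first:
minikube delete
minikube start --docker-env http_proxy=http://proxy.corp.com:port --docker-env https_proxy=http://proxy.corp.com:port --docker-env no_proxy=192.168.99.0/24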
I faced the same issue a couple of times.
Updating here; it might be useful for someone.
First, describe the pod (kubectl describe pod <pod_name>).
1. If you see access denied/repository does not exist errors like
Error response from daemon: pull access denied for test/nginx,
repository does not exist or may require 'docker login': denied:
requested access to the resource is denied
Solution:
If it's a local Kubernetes cluster, log in to the Docker registry first; if the cluster runs in a cloud, create a secret for the registry and add imagePullSecrets with that secret's name.
2. If you get a timeout error like
Error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while
awaiting headers)
Solution:
Check that the node has network connectivity and can reach the private/public registry.
For an AWS EKS cluster, you need to enable auto-assign public IP on the subnet where the EC2 instances are running.
To fetch images stored in registries that require credentials, you need to create a docker-registry secret and reference it from the Pod via imagePullSecrets.
kubectl create secret docker-registry regsecret --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
Then create the Pod, specifying the imagePullSecrets field:
apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
  - name: private-reg-container
    image: <your-private-image>
  imagePullSecrets:
  - name: regsecret
As mentioned in my comment on the original post, I had the same issue. The only thing of note is that the minikube VM had been up since creation. I restarted the underlying VM and image pulls started working.
This seems to be quite an old issue, but I had a similar issue and solved it by logging in to my Docker account.
You can try it by deleting the existing failed pods, running the "docker login" command (log in to your account), then retrying the pod creation, as sketched below.
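A minimal sequence for that (the pod name is whatever kubectl get pods shows for the failed pod):
docker login                      # on the host whose Docker daemon pulls the image (for minikube, inside the VM via 'minikube ssh')
kubectl delete pod <failed-pod-name>
# the controller recreates the pod and retries the image pull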