How do I diagnose a Kubernetes cluster that never becomes ready? - docker

I deployed an image to Kubernetes, but it never becomes ready, even after hours.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-b8dd974db-9jbsl 0/1 ImagePullBackOff 0 21m
All this happens with the Quickstart Hello app, as well as my own Docker image.
Attempts to attach fail.
$ kubectl attach -it myapp-b8dd974db-9jbsl
Unable to use a TTY - container myapp did not allocate one
If you don't see a command prompt, try pressing enter.
error: unable to upgrade connection: container
myapp not found in pod myapp-b8dd974db-9jbsl_default
Attempts to access it over HTTP fail.
In Stackdriver Logging I see messages like
skipping: failed to "StartContainer" for "myapp"
with ImagePullBackOff: "Back-off pulling image
\"gcr.io/myproject/myapp-image:1.0\""
and No such image
Yet I did deploy these images and the Cloud Console shows that the pods are "green."
And kubectl seems to tell me that the cluster is OK.
$ kubectl get service myapp
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myapp LoadBalancer 10.43.248.78 35.193.107.141 8222:31840/TCP 29m
How can I diagnose this?

You can use kubectl describe pod myapp-b8dd974db-9jbsl to get more information on your pod.
But from the status message ImagePullBackOff it is probably trying to download the Docker image and failing.
This can happen for several reasons; kubectl describe will give you more detail, but most likely either you don't have permission to pull from that Docker repository or the image/tag does not exist.
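A minimal first-pass check, assuming the pod and image names from the question (your project and tag will differ):
$ kubectl describe pod myapp-b8dd974db-9jbsl              # the Events section at the bottom names the exact pull error
$ gcloud container images list-tags gcr.io/myproject/myapp-image   # confirm the 1.0 tag actually exists in the registry
$ docker pull gcr.io/myproject/myapp-image:1.0            # run on a node (or after gcloud auth configure-docker) to test pull permissions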

Related

How To Stop a Stuck Pod in Kubernetes

Background
I am trying to learn to automate deployments with Jenkins on my laptop. I did not check the resource settings in the Helm chart when I deployed Jenkins, and I ended up over-provisioning the memory and CPU requests.
The pod was initializing for several minutes and then eventually ended up in the status CrashLoopBackOff.
Software and Versions
$ minikube start
😄 minikube v1.17.1 on Microsoft Windows 10 Enterprise 10.0.19042 Build 19042
...
...
🐳 Preparing Kubernetes v1.20.2 on Docker 20.10.2
...
Note that Docker was installed with Docker Desktop (via Visual Studio Code), with Windows 10 WSL Ubuntu 20.04 LTS enabled.
$ helm version
version.BuildInfo{Version:"v3.5.2", GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", GitTreeState:"dirty", GoVersion:"go1.15.7"}
Installation
$ helm repo add stable https://charts.jenkins.io
$ helm repo ls
NAME URL
stable https://charts.jenkins.io
$ kubectl create namespace devops-cicd
namespace/devops-cicd created
$ helm install jenkins stable/jenkins --namespace devops-cicd
$ kubectl get svc -n devops-cicd -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
jenkins ClusterIP 10.108.169.104 <none> 8080/TCP 7m1s app.kubernetes.io/component=jenkins-controller,app.kubernetes.io/instance=jenkins
jenkins-agent ClusterIP 10.103.213.213 <none> 50000/TCP 7m app.kubernetes.io/component=jenkins-controller,app.kubernetes.io/instance=jenkins
$ kubectl get pod -n devops-cicd --output wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
jenkins-0 1/2 Running 1 8m13s 172.17.0.10 minikube <none> <none>
The pod failed eventually, ending with the status CrashLoopBackOff.
Unfortunately, I forgot to extract the logs for the pod.
In full disclosure, I got it deployed successfully by pulling the chart to my local file system and halving the memory and CPU settings.
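(For reference, the same resource reduction can be done without pulling the chart, by overriding values at install time; a rough sketch, since the exact value keys depend on the chart version:)
$ helm show values stable/jenkins > jenkins-values.yaml    # inspect the default resource requests
$ # edit jenkins-values.yaml to lower the memory/cpu requests, then:
$ helm install jenkins stable/jenkins --namespace devops-cicd -f jenkins-values.yaml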
Questions
I fear that this over-provisioning situation could occur in the Production environment one day. So how does one stop a failed pod from respawning/restarting and undo/roll back the deployment?
I tried to set Deployment replicas=0 but it had no effect. Actually, the only resources I could see were a couple of Services, the Pod itself, a PersistentVolume and some secrets.
I had to delete the namespace to remove the pod. This is not ideal. So what is the best way to tackle this situation (i.e. just deal with the problematic pod)?
Drawing on the feedback, I have gathered and confirmed that the pod is managed by a StatefulSet. I am attempting to answer my own question in the hope that it is useful for newbies like me.
My question was how to stop a pod (from respawning).
So here I get the info on the StatefulSet:
$ kubectl get statefulsets -n devops-cicd -o wide
NAME READY AGE CONTAINERS IMAGES
jenkins 0/1 33s jenkins,config-reload jenkins/jenkins:2.303.1-jdk11,kiwigrid/k8s-sidecar:1.12.2
Then scale it down to zero replicas:
$ kubectl scale statefulset jenkins --replicas=0 -n devops-cicd
statefulset.apps/jenkins scaled
Result:
$ kubectl get statefulsets -n devops-cicd -o wide
NAME READY AGE CONTAINERS IMAGES
jenkins 0/0 6m35s jenkins,config-reload jenkins/jenkins:2.303.1-jdk11,kiwigrid/k8s-sidecar:1.12.2
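Since this pod comes from a Helm release, another way to undo the whole deployment (rather than scaling the StatefulSet to zero or deleting the namespace) would be to uninstall the release; a sketch using the release and namespace names from above:
$ helm uninstall jenkins -n devops-cicd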

Why is my Kubernetes deployment registering as unavailable even though it runs in Docker?

I have a Docker image I created that works in Docker like this (local Docker)...
docker run -p 4000:8080 jrg/hello-kerb
Now I am trying to run it as a Kubernetes pod. To do this I create the deployment...
kubectl create deployment hello-kerb --image=jrg/hello-kerb
Then I run kubectl get deployments, but the new deployment shows as unavailable...
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
hello-kerb 1 1 1 0 17s
I was using this site as the instructions. It shows that the status should be available...
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
hello-node 1 1 1 1 1m
What am I missing? Why is the deployment unavailable?
UPDATE
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-kerb-6f8f84b7d6-r7wk7 0/1 ImagePullBackOff 0 12s
If you are running a local image (from docker build) it is directly available to the Docker daemon and can be executed. If you are using a remote daemon, e.g. in a Kubernetes cluster, it will try to get the image from the default registry (usually Docker Hub), since the image is not available locally. I checked https://hub.docker.com/u/jrg/ and there seems to be no repository there, and therefore no jrg/hello-kerb.
So how can you solve this? When using minikube, you can build (and provide) the image using the docker daemon that is provided by minikube.
eval $(minikube docker-env)
docker build -t jrg/hello-kerb .
You could also provide the image in a registry that is reachable from the container runtime in your Kubernetes cluster, e.g. Docker Hub.
I solved this by using kubectl edit deployment hello-kerb, then finding "imagePullPolicy" (:/PullPolicy) and changing the value from "Always" to "Never". After saving this, when I run kubectl get pod it shows...
NAME READY STATUS RESTARTS AGE
hello-kerb-6f744b6cc5-x6dw6 1/1 Running 0 6m
And I can access it.
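Instead of editing the Deployment interactively, the same change can be applied non-interactively with a patch; a sketch, assuming the deployment created above and its single container:
$ kubectl patch deployment hello-kerb --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Never"}]'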

Kubernetes showing Image Pull Error while scheduling pods

I have configured a secret on Kubernetes, and inside the node I am able to pull an image with docker pull perfectly. But when kubectl tries to schedule a pod on the node it shows an image pull backoff error. Is there any setting that needs to be done while bootstrapping? I am using a community AMI on AWS for the Kubernetes node.
Try this:
kubectl describe pod <pod-name> - see the event log at the end. It should show a series of events, starting from the initial image pull through subsequent attempts, and it may continue to restart in order to achieve the desired state as per the deployment record.
In most scenarios, something within the container is erroring out, and the resulting restart is expected behavior from Kubernetes. To check the logs: kubectl logs <pod-name>
Try to keep the container running so you can peek inside it for more troubleshooting, using kubectl exec -it <pod-name> -- /bin/sh (if there is a single container) or kubectl exec -it <pod-name> -c <container-name> -- /bin/sh.
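If the secret mentioned in the question is a pull secret for a private registry, the pod (or its service account) also has to reference it, otherwise the kubelet pulls anonymously even though docker pull on the node works; a minimal sketch with a hypothetical secret name regcred:
$ kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regcred"}]}'   # regcred is a placeholder for your existing secret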

Kubernetes :: Restart terminated pod

I'm using Kubernetes to run jobs with a RestartPolicy of Never.
Sometimes I would like to be able to debug a failed/terminated pod. I'm trying to find a way to restart it with a sleep XXX command so I can connect (exec) to the container and inspect the same state.
In Docker this is doable using docker ps --all and then docker start X, but I didn't find anything similar with kubectl or client-go.
Thanks!
Not sure about client-go, as I have no experience there. But if I understood the question correctly, you can check the reason for the failure:
kubectl get pods (if you do not see your pod here add --all-namespaces)
NAME READY STATUS RESTARTS AGE
pi-c2x4r 0/1 Completed 0 19m
pi-test-c5hln 0/1 Error 0 16m
And then run:
kubectl describe pod pi-test-c5hln (name of your pod).
kubectl logs pi-test-c5hln
You can also find more information when you run:
kubectl describe job <job-name>
You can find useful information about Jobs and how to work with them (including cleanup, termination and patterns) in the Kubernetes Jobs documentation.
Not sure if it needs to be added, but terminating is an ongoing process, so you can only work with the pod after it goes from Terminating to another status (Error, Completed).
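For the original goal of re-running a finished pod with a sleep so it can be inspected, one workaround is to start a throwaway pod from the same image with its command overridden; a rough sketch with hypothetical names (the real image comes from your Job spec):
$ kubectl run pi-debug --image=<job-image> --restart=Never --command -- sleep 3600   # <job-image> is a placeholder
$ kubectl exec -it pi-debug -- /bin/sh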

Kubernetes pods are running but docker ps does not give any output

I have been trying to run a Tomcat container on port 5000 on a cluster using Kubernetes. But when I use kubectl create -f tmocat_pod.yaml, it creates the pod, yet docker ps does not give any output. Why is that?
Ideally, when a pod is running, it means it is running a container inside that pod, and that container is defined in the YAML file.
Why does docker ps not show any containers running?
I am following the below URLs:
http://containertutorials.com/get_started_kubernetes/k8s_example.html
https://blog.jetstack.io/blog/k8s-getting-started-part2/
How can I get it running and see Tomcat in the browser on port 5000?
The Docker containers should be running on the virtual machine. Since I only installed minikube on my local machine, I confirmed the following will get you what you want:
minikube ssh
...
docker ps
Just try the kubernetes equivalent of minikube ssh.
In Kubernetes, Docker containers are run in Pods, Pods run on Nodes, and Nodes run on your machine (minikube/GKE).
When you run kubectl create -f tmocat_pod.yaml you basically create a pod, and the Docker container runs inside that pod.
The node that holds this pod is basically a virtual instance; if you could SSH into that node, docker ps would work.
What you need is:
kubectl get pods <-- It is like docker ps, it shows you all the pods (think of it as docker containers) running
kubectl get nodes <-- view the host machines for your pods.
kubectl describe pods <pod-name> <-- view system logs for your pods.
kubectl logs <pod-name> <-- Will give you logs for the specific pod.
You can connect your terminal to the Docker daemon that is running inside your node/VM.
With this command in your terminal: eval $(minikube docker-env)
This only configures your current terminal window.
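To point the terminal back at the host's Docker daemon afterwards, minikube can print the matching unset commands; a short sketch:
$ eval $(minikube docker-env -u)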
Maybe you are not using Docker as the container runtime.
I faced the same issue, and I had forgotten that I switched to gVisor with runsc as the handler.
cat /etc/default/kubelet
KUBELET_EXTRA_ARGS="--container-runtime remote --container-runtime-endpoint unix:///run/containerd/containerd.sock"
If so, you need to use the runsc command instead of docker.
I'm not sure where you are running the docker ps command, but if you are trying to do that from your host machine and the k8s cluster is located elsewhere, i.e. your machine is not a node in the cluster, docker ps will not return anything since the containers are not tied to your docker host.
Assuming your pod is running, kubectl get pods will display all of your running pods. To check further details, you can use kubectl describe pod <yourpodname> to check the status of each container (in great detail). To get the pod names, you should be able to use tab-complete with the kubernetes cli. Also, if your pod contains multiple containers, you will need to give the container name as well, which you can use tab-complete for after you've selected your pod.
The output will look similar to:
kubectl describe pod comparison-api-dply-reborn-6ffb88b46b-s2mtx
Name: comparison-api-dply-reborn-6ffb88b46b-s2mtx
Namespace: default
Node: aks-nodepool1-99697518-0/10.240.0.5
Start Time: Fri, 20 Apr 2018 14:08:21 -0400
Labels: app=comparison-pod-reborn
pod-template-hash=2996446026
...
Status: Running
IP: *.*.*.*
Controlled By: ReplicaSet/comparison-api-dply-reborn-6ffb88b46b
Containers:
rabbit-mq:
...
Port: 5672/TCP
State: Running
...
If your containers and pods are already running, then you shouldn't need to troubleshoot them too much. To make them accessible from the public Internet, take a look at Services (https://kubernetes.io/docs/concepts/services-networking/service/), which give your API a fixed, easily reachable IP address.
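As an illustration of that, a Service could be created for the workload shown above; a sketch with assumed names and ports (the ReplicaSet above suggests a Deployment named comparison-api-dply-reborn, and the ports are placeholders for whatever the API container actually listens on):
$ kubectl expose deployment comparison-api-dply-reborn --type=LoadBalancer --port=80 --target-port=8080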
Have you tried a "docker ps -a" to see if the container is dead? If it is there you can see its logs with "docker logs " and maybe this gives you a hint.
If your pod is running successfully, and you are looking for the container on the node where the pod is scheduled, the issue could be that Kubernetes is using a different container runtime.
Example
root@renjith-laptop:/home/renjith/raspbery-k8s# kubectl exec -it nginx-8586cf59-h92ct bash
root@nginx-8586cf59-h92ct:/# exit
exit
root@renjith-laptop:/home/renjith/raspbery-k8s# kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-8586cf59-h92ct 1/1 Running 0 47s 10.20.0.3 renjith-laptop
root@renjith-laptop:/home/renjith/raspbery-k8s# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
root@renjith-laptop:/home/renjith/raspbery-k8s#
Here I am able to exec into the pod, and I am on the same node where the pod is scheduled, but docker ps doesn't show the container. In my case kubelet is using a different container runtime; one of the arguments to the kubelet service is --container-runtime-endpoint=unix:///var/run/cri-containerd.sock
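When kubelet is pointed at a CRI socket like that, crictl (aimed at the same endpoint) can list the containers that docker ps no longer sees; a small sketch using the socket path from the kubelet argument above:
$ crictl --runtime-endpoint unix:///var/run/cri-containerd.sock ps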
From the Kubernetes documentation, to list the container images running on your cluster:
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c
Then you get back something like:
2 registry.k8s.io/coredns/coredns:v1.9.3
1 registry.k8s.io/etcd:3.5.4-0
1 registry.k8s.io/kube-apiserver:v1.25.1
1 registry.k8s.io/kube-controller-manager:v1.25.1
3 registry.k8s.io/kube-proxy:v1.25.1
1 registry.k8s.io/kube-scheduler:v1.25.1
