Kubernetes Calico node 'XXXXXXXXXXX' already using IPv4 Address XXXXXXXXX, CrashLoopBackOff - docker

I used the AWS Kubernetes Quickstart to create a Kubernetes cluster in a VPC and private subnet: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud.pdf. It was running fine for a while. I have Calico installed on my Kubernetes cluster, with two nodes and a master. The Calico pods on the master are running fine; the ones on the nodes are in CrashLoopBackOff state:
NAME                                       READY   STATUS             RESTARTS   AGE
calico-etcd-ztwjj                          1/1     Running            1          55d
calico-kube-controllers-685755779f-ftm92   1/1     Running            2          55d
calico-node-gkjgl                          1/2     CrashLoopBackOff   270        22h
calico-node-jxkvx                          2/2     Running            4          55d
calico-node-mxhc5                          1/2     CrashLoopBackOff   9          25m
Describing one of the crashed pods:
ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name: calico-node-gkjgl
Namespace: kube-system
Node: ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time: Mon, 17 Sep 2018 16:56:41 +0000
Labels: controller-revision-hash=185957727
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.0.0.237
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
Image: quay.io/calico/node:v3.1.1
Image ID: docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 Sep 2018 15:14:44 +0000
Finished: Tue, 18 Sep 2018 15:14:44 +0000
Ready: False
Restart Count: 270
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: kubeadm,bgp
CALICO_DISABLE_FILE_LOGGING: true
CALICO_K8S_NODE_REF: (v1:spec.nodeName)
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_IPV4POOL_IPIP: Always
FELIX_IPV6SUPPORT: false
FELIX_IPINIPMTU: 1440
FELIX_LOGSEVERITYSCREEN: info
IP: autodetect
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
install-cni:
Container ID: docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
Image: quay.io/calico/cni:v3.1.1
Image ID: docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Running
Started: Mon, 17 Sep 2018 17:11:52 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 17 Sep 2018 16:56:43 +0000
Finished: Mon, 17 Sep 2018 17:10:53 +0000
Ready: True
Restart Count: 1
Environment:
CNI_CONF_NAME: 10-calico.conflist
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
calico-cni-plugin-token-b7sfl:
Type: Secret (a volume populated by a Secret)
SecretName: calico-cni-plugin-token-b7sfl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
:NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m (x6072 over 22h) kubelet, ip-10-0-0-237.us-east-2.compute.internal Back-off restarting failed container
The logs for the same pod:
ubuntu@ip-10-0-1-133:~$ kubectl logs calico-node-gkjgl -n kube-system -c calico-node
2018-09-18 15:14:44.605 [INFO][8] startup.go 251: Early log level set to info
2018-09-18 15:14:44.605 [INFO][8] startup.go 269: Using stored node name from /var/lib/calico/nodename
2018-09-18 15:14:44.605 [INFO][8] startup.go 279: Determined node name: ip-10-0-0-237.us-east-2.compute.internal
2018-09-18 15:14:44.609 [INFO][8] startup.go 101: Skipping datastore connection test
2018-09-18 15:14:44.610 [INFO][8] startup.go 352: Building new node resource Name="ip-10-0-0-237.us-east-2.compute.internal"
2018-09-18 15:14:44.610 [INFO][8] startup.go 367: Initialize BGP data
2018-09-18 15:14:44.614 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface ens3: 10.0.0.237/19
2018-09-18 15:14:44.614 [INFO][8] startup.go 432: Node IPv4 changed, will check for conflicts
2018-09-18 15:14:44.618 [WARNING][8] startup.go 861: Calico node 'ip-10-0-0-237' is already using the IPv4 address 10.0.0.237.
2018-09-18 15:14:44.618 [WARNING][8] startup.go 1058: Terminating
Calico node failed to start
So it seems like there is a conflict finding the node IP address, or Calico thinks the IP is already assigned to another node. Doing a quick search I found this thread: https://github.com/projectcalico/calico/issues/1628. I see that this should be resolved by setting IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I'm assuming would be "can-reach=10.0.0.237". This config is an environment variable set on the calico/node container. I have been attempting to shell into the container itself, but kubectl tells me the container is not found:
ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")
I'm suspecting this is due to Calico being unable to assign IPs. So I logged onto the host and attempted to shell into the container using Docker:
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
So I guess there is no shell to execute in the container, which explains why Kubernetes couldn't exec into it. I tried running commands externally to list the environment variables, but I haven't been able to find any; I could be running these commands wrong, however:
root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
Okay, so maybe I am going about this the wrong way. Should I attempt to change the Calico config files using Kubernetes and redeploy it? Where can I find these on my system? I haven't been able to find where to set the environment variables.

If you look at the Calico docs, IP_AUTODETECTION_METHOD already defaults to first-found.
My guess is that the IP address is not being released by the previous 'run' of Calico, or that it is simply a bug in the v3.1.1 version of Calico.
Try:
Delete your Calico pods that are in a CrashLoopBackOff state:
kubectl -n kube-system delete pod calico-node-gkjgl calico-node-mxhc5
Your pods will be re-created and hopefully initialize.
Upgrade Calico to v3.1.3 or later. Follow these docs. My guess is that Heptio's Calico installation is using the etcd datastore.
Try to understand how Heptio's AWS AMIs work and see if there are any issues with them. This might take some time, so you could contact their support as well.
Try a different method to install Kubernetes with Calico; this is well documented on https://kubernetes.io.
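If you still want to try an explicit IP_AUTODETECTION_METHOD, note that it is an environment variable on the calico-node DaemonSet, not something you set inside the running container, so you can patch it with kubectl and let the DaemonSet roll the pods. A hedged sketch (the can-reach target here is only an example destination, not a value taken from your setup):
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=can-reach=8.8.8.8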

For me, what worked was to remove left-over Docker networks on the nodes.
I had to list the current networks on each node (docker network list) and then remove the unneeded ones (docker network rm <networkName>).
After doing that, the Calico deployment pods were running fine.

Related

Why I get exec failed: container_linux.go:380 when I go inside Kubernetes pod?

I started learning about Kubernetes and I installed minikube and kubectl on Windows 7.
After that I created a pod with command:
kubectl run firstpod --image=nginx
And everything is fine.
Now I want to go inside the pod with this command: kubectl exec -it firstpod -- /bin/bash but it's not working and I have this error:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container
process caused: exec: "C:/Program Files/Git/usr/bin/bash.exe": stat C:/Program
Files/Git/usr/bin/bash.exe: no such file or directory: unknown
command terminated with exit code 126
How can I resolve this problem?
And another question is about this firstpod pod. With this command kubectl describe pod firstpod I can see information about the pod:
Name: firstpod
Namespace: default
Priority: 0
Node: minikube/192.168.99.100
Start Time: Mon, 08 Nov 2021 16:39:07 +0200
Labels: run=firstpod
Annotations: <none>
Status: Running
IP: 172.17.0.3
IPs:
IP: 172.17.0.3
Containers:
firstpod:
Container ID: docker://59f89dad2ddd6b93ac4aceb2cc0c9082f4ca42620962e4e692e3d6bcb47d4a9e
Image: nginx
Image ID: docker-pullable://nginx@sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 08 Nov 2021 16:39:14 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9b8mx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-9b8mx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned default/firstpod to minikube
Normal Pulling 32m kubelet Pulling image "nginx"
Normal Pulled 32m kubelet Successfully pulled image "nginx" in 3.677130128s
Normal Created 31m kubelet Created container firstpod
Normal Started 31m kubelet Started container firstpod
So I can see it is a Docker container ID and it is started, and there is the image, but if I do docker images or docker ps there is nothing. Where are these images and this container? Thank you!
One error for certain is Git Bash converting the path to a Windows path. You can disable that with a double slash:
kubectl exec -it firstpod -- //bin/bash
This command will only work if you have bash in the image. If you don't, you'll need to pick a different command to run, e.g. /bin/sh. Some images are distroless or based on scratch to explicitly not include things like shells, which will prevent you from running commands like this (intentionally, for security).
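Another way to sidestep the path mangling, assuming Git Bash's MSYS path conversion is indeed what rewrites /bin/bash (a hedged workaround, not the only fix), is to disable the conversion for that one command:
MSYS_NO_PATHCONV=1 kubectl exec -it firstpod -- /bin/bash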

Running a docker container which uses GPU from kubernetes fails to find the GPU

I want to run a docker container which uses GPU (it runs a cnn to detect objects on a video), and then run that container on Kubernetes.
I can run the container from docker alone without problems, but when I try to run the container from Kubernetes it fails to find the GPU.
I run it using this command:
kubectl exec -it namepod /bin/bash
This is the problem that I get:
kubectl exec -it tym-python-5bb7fcf76b-4c9z6 /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@tym-python-5bb7fcf76b-4c9z6:/opt# cd servicio/
root@tym-python-5bb7fcf76b-4c9z6:/opt/servicio# python3 TM_Servicev2.py
Try to load cfg: /opt/darknet/cfg/yolov4.cfg, weights: /opt/yolov4.weights, clear = 0
CUDA status Error: file: ./src/dark_cuda.c : () : line: 620 : build time: Jul 30 2021 - 14:05:34
CUDA Error: no CUDA-capable device is detected
python3: check_error: Unknown error -1979678822
root@tym-python-5bb7fcf76b-4c9z6:/opt/servicio#
EDIT.
I followed all the steps on the Nvidia docker 2 guide and downloaded the Nvidia plugin for Kubernetes.
However, when I deploy it, the pod stays as "Pending" and never actually starts. I don't get an error anymore, but it never starts.
The pod appears like this:
gpu-pod 0/1 Pending 0 3m19s
EDIT 2.
I ended up reinstalling everything and now my pod appears as Completed but not Running, like this:
default gpu-operator-test 0/1 Completed 0 62m
Answering Wiktor: when I run this command:
kubectl describe pod gpu-operator-test
I get:
Name: gpu-operator-test
Namespace: default
Priority: 0
Node: pdi-mc/192.168.0.15
Start Time: Mon, 09 Aug 2021 12:09:51 -0500
Labels: <none>
Annotations: cni.projectcalico.org/containerID: 968e49d27fb3d86ed7e70769953279271b675177e188d52d45d7c4926bcdfbb2
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Succeeded
IP: 192.168.10.81
IPs:
IP: 192.168.10.81
Containers:
cuda-vector-add:
Container ID: docker://d49545fad730b2ec3ea81a45a85a2fef323edc82e29339cd3603f122abde9cef
Image: nvidia/samples:vectoradd-cuda10.2
Image ID: docker-pullable://nvidia/samples@sha256:4593078cdb8e786d35566faa2b84da1123acea42f0d4099e84e2af0448724af1
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 09 Aug 2021 12:10:29 -0500
Finished: Mon, 09 Aug 2021 12:10:30 -0500
Ready: False
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9ktgq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-9ktgq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
I'm using this configuration file to create the pod
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvidia/samples:vectoradd-cuda10.2"
    resources:
      limits:
        nvidia.com/gpu: 1
Addressing two topics here:
The error you saw at the beginning:
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
This means that you tried to use a deprecated form of the kubectl exec command. The proper syntax is:
$ kubectl exec (POD | TYPE/NAME) [-c CONTAINER] [flags] -- COMMAND [args...]
See here for more details.
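For example, applied to the pod from the question, that would look like:
kubectl exec -it tym-python-5bb7fcf76b-4c9z6 -- /bin/bash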
According to the official docs, the gpu-operator-test pod should run to completion:
You can see that the pod's status is Succeeded and also:
State: Terminated
Reason: Completed
Exit Code: 0
Exit Code: 0 means that the specified container command completed successfully.
More details can be found in the official docs.
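To double-check that the sample actually exercised the GPU, you can look at the pod's logs; the CUDA vectoradd sample normally ends with a pass/fail line (a hedged note, as the exact wording may differ between versions):
kubectl logs gpu-operator-test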

httpd Docker image CrashLoopBackOff on Kubernetes

I have a simple docker image which is working fine locally.
It is basically the same as the example on apache's httpd page.
FROM httpd:2.4
COPY ./public-html/ /usr/local/apache2/htdocs/
As per the page example, I can build and run my image as follows:
$ docker build -t gcr.io/${PROJECT_ID}/hello-app:v1 .
$ docker run -dit --name my-running-app -p 8080:80 <img_id>
I then head over to http://localhost:8080, and everything seems to be working as it should.
However, when I try to create a deployment for my Google Cloud Kubernetes instance, my pod fails and gets to the state of CrashLoopBackOff. (This is after I have pushed the image to Google Cloud Registry, so that the deployment may grab the image from there.)
I think that this CrashLoopBackOff problem is happening because my container has no ENTRYPOINT; i.e., the pod spawns, no command is issued, and then it completes and crashes.
I have 2 questions then:
What command should I add to my Dockerfile to get the http server up and running on the pod (assuming my assessment of the problem is indeed correct)?
How is this running locally? Locally I simply run $ docker run -dit --name my-running-app -p 8080:80 <img_id>. I do not specify that the container should run httpd, yet it does? How is this happening?
Edit - additional information:
I deployed onto K8's by doing the following:
$ kubectl create deployment hello-app --image=gcr.io/${PROJECT_ID}/hello-app:v1
Kubectl logs:
$ kubectl logs <pod_name>
standard_init_linux.go:211: exec user process caused "exec format error"
kubectl describe:
$ kubectl describe pod hello-app-6b89cd98f6-gn65p
Name: <name>
Namespace: default
Priority: 0
Node: <my_node>
Start Time: Mon, 22 Mar 2021 12:32:51 +0200
Labels: app=hello-app
pod-template-hash=6b89cd98f6
Annotations: <none>
Status: Running
IP: 10.12.1.13
IPs:
IP: 10.12.1.13
Controlled By: <replica_set>
Containers:
hello-app:
Container ID: <cid>
Image: <img>
Image ID: <img_id>
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 22 Mar 2021 15:12:18 +0200
Finished: Mon, 22 Mar 2021 15:12:18 +0200
Ready: False
Restart Count: 36
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b8p9t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-b8p9t:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-b8p9t
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m9s (x741 over 164m) kubelet Back-off restarting failed container
A CrashLoopBackOff error means the pod keeps crashing and Kubernetes is backing off before restarting it. You have to determine what is causing the crash.
Overall, the cause of the problem may be that:
the parameters of the pod or container have been configured incorrectly
the application inside the container keeps crashing
an error occurred while deploying Kubernetes
You can run watch kubectl describe <pod-name> to check events as the pod is being created. But if the pod crashes after it starts up, you need to get the container logs: kubectl logs -f <your-pod-name>.
Read more: kubernetes-crashloopbackoff.
As @Krishna Chaurasia said, check the thread, which implies that the default command being run is not an executable; executable formats can differ between platforms. As @Sagar Velankar mentioned, use the --platform flag on the FROM line of the Dockerfile to specify linux/amd64 as the target architecture. See: dockerfile-from.
You can use docker buildx (docs.docker.com/docker-for-mac/multi-arch) to build and push multi-architecture images, and the kubelet will pull the image with the correct architecture.
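A minimal sketch of both approaches, assuming the cluster nodes are linux/amd64 (adjust the platform if not):
# Dockerfile: pin the platform of the base image
FROM --platform=linux/amd64 httpd:2.4
COPY ./public-html/ /usr/local/apache2/htdocs/
Or build and push a multi-architecture image with buildx:
$ docker buildx build --platform linux/amd64 -t gcr.io/${PROJECT_ID}/hello-app:v1 --push .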

Kind kubernetes cluster failed to pull docker images

I tried to use KinD as an alternative to Minikube to bootstrap a K8S cluster on my local machine.
The cluster is created successfully.
But when I tried to create some pods/deployments from images, it failed.
$ kubectl run nginx --image=nginx
$ kubectl run hello --image=hello-world
After some minutes, kubectl get pods shows a failed status.
$ kubectl get pods
NAME    READY   STATUS             RESTARTS   AGE
hello   0/1     ImagePullBackOff   0          11m
nginx   0/1     ImagePullBackOff   0          22m
I am afraid this is another Global Firewall problem in China.
kubectl describe pods/nginx
Name: nginx
Namespace: default
Priority: 0
Node: dev-control-plane/172.19.0.2
Start Time: Sun, 30 Aug 2020 19:46:06 +0800
Labels: run=nginx
Annotations: <none>
Status: Pending
IP: 10.244.0.5
IPs:
IP: 10.244.0.5
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-mgq96 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-mgq96:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-mgq96
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 56m default-scheduler Successfully assigned default/nginx to dev-control-plane
Normal BackOff 40m kubelet, dev-control-plane Back-off pulling image "nginx"
Warning Failed 40m kubelet, dev-control-plane Error: ImagePullBackOff
Warning Failed 13m (x3 over 40m) kubelet, dev-control-plane Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to copy: unexpected EOF
Warning Failed 13m (x3 over 40m) kubelet, dev-control-plane Error: ErrImagePull
Normal Pulling 13m (x4 over 56m) kubelet, dev-control-plane Pulling image "nginx"
When I entered the kindest/node container, there was no docker in it. I'm not sure how KinD works; originally I understood that it deploys a K8S cluster into a Docker container.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a644f8b61314 kindest/node:v1.19.0 "/usr/local/bin/entr…" About an hour ago Up About an hour 127.0.0.1:52301->6443/tcp dev-control-plane
$ docker exec -it a644f8b61314 /bin/bash
root@dev-control-plane:/# docker -v
bash: docker: command not found
After reading the Kind docs, I cannot find an option to set a repository mirror there like the one in Minikube.
BTW, I am using the latest Docker Desktop beta on a Windows 10.
First pull the image on your local system using docker pull nginx, and then use the command below to load that image into the kind cluster:
kind load docker-image nginx --name kind-cluster-name
Kind uses containerd instead of Docker as the runtime; that's why docker is not installed on the nodes.
Alternatively, you can use the crictl tool to pull and check images inside the kind node:
crictl pull nginx
crictl images
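As for the repository-mirror question: kind lets you patch the nodes' containerd configuration from the cluster config. A hedged sketch, with the mirror endpoint as a placeholder you would replace:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
- |-
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    endpoint = ["https://<your-registry-mirror>"]
Create the cluster with kind create cluster --config <config-file> and the patched containerd config is applied to every node.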
I ran into the same issue because I had exported http_proxy and https_proxy to a local proxy (127.0.0.1) before creating the cluster, and that address is unreachable from inside the cluster. After unsetting http(s)_proxy and recreating the cluster, everything ran fine.

Pod gets into status of CrashLoopBackOff and gets restarted repeatedly - Exit code is 0

I have a Docker container that is running fine when I run it using docker run. I am trying to put that container inside a pod, but I am facing issues. The first run of the pod shows status as "Completed", and then the pod keeps restarting with CrashLoopBackOff status. The exit code, however, is 0.
Here is the result of kubectl describe pod :
Name: messagingclientuiui-6bf95598db-5znfh
Namespace: mgmt
Node: db1mgr0deploy01/172.16.32.68
Start Time: Fri, 03 Aug 2018 09:46:20 -0400
Labels: app=messagingclientuiui
pod-template-hash=2695115486
Annotations: <none>
Status: Running
IP: 10.244.0.7
Controlled By: ReplicaSet/messagingclientuiui-6bf95598db
Containers:
messagingclientuiui:
Container ID: docker://a41db3bcb584582e9eacf26b02c7ef26f57c2d43b813f44e4fd1ba63347d3fc3
Image: 172.32.1.4/messagingclientuiui:667-I20180802-0202
Image ID: docker-pullable://172.32.1.4/messagingclientuiui@sha256:89a002448660e25492bed1956cfb8fff447569e80ac8b7f7e0fa4d44e8abee82
Port: 9087/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 03 Aug 2018 09:50:06 -0400
Finished: Fri, 03 Aug 2018 09:50:16 -0400
Ready: False
Restart Count: 5
Environment Variables from:
mesg-config ConfigMap Optional: false
Environment: <none>
Mounts:
/docker-mount from messuimount (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-2pthw (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
messuimount:
Type: HostPath (bare host directory volume)
Path: /mon/monitoring-messui/docker-mount
HostPathType:
default-token-2pthw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-2pthw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m default-scheduler Successfully assigned messagingclientuiui-6bf95598db-5znfh to db1mgr0deploy01
Normal SuccessfulMountVolume 4m kubelet, db1mgr0deploy01 MountVolume.SetUp succeeded for volume "messuimount"
Normal SuccessfulMountVolume 4m kubelet, db1mgr0deploy01 MountVolume.SetUp succeeded for volume "default-token-2pthw"
Normal Pulled 2m (x5 over 4m) kubelet, db1mgr0deploy01 Container image "172.32.1.4/messagingclientuiui:667-I20180802-0202" already present on machine
Normal Created 2m (x5 over 4m) kubelet, db1mgr0deploy01 Created container
Normal Started 2m (x5 over 4m) kubelet, db1mgr0deploy01 Started container
Warning BackOff 1m (x8 over 4m) kubelet, db1mgr0deploy01 Back-off restarting failed container
kubectl get pods
NAME                                   READY   STATUS             RESTARTS   AGE
messagingclientuiui-6bf95598db-5znfh   0/1     CrashLoopBackOff   9          23m
I am assuming we need a loop to keep the container running in this case. But I don't understand why it worked when run using Docker and not when it is inside a pod. Shouldn't it behave the same?
How do we generally debug a CrashLoopBackOff status, apart from running kubectl describe pod and kubectl logs?
The container terminates with exit code 0 if there isn't at least one process left running in it, i.e. its main process exits cleanly. To keep the container running, add these to the deployment configuration:
command: ["sh"]
stdin: true
Replace sh with bash or any other shell that the image may have.
Then you can drop inside the container with exec:
kubectl exec -it <pod-name> sh
Add -c <container-name> argument if the pod has more than one container.
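For context, a minimal sketch of where those fields sit in the Deployment's container spec (the name and image are taken from the describe output above; tty is optional but pairs well with stdin):
spec:
  template:
    spec:
      containers:
      - name: messagingclientuiui
        image: 172.32.1.4/messagingclientuiui:667-I20180802-0202
        command: ["sh"]
        stdin: true
        tty: true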
Are you sure you ran your software as docker run ... -d ... <command>, that it kept running, and that you use the exact same command in your pod? In some cases, if you compare things that run on Docker with -it and no -d, you might find yourself in a pinch, as they expect a terminal to communicate with the user and exit if a tty is not available (hint: a pod/container can be run with tty: true).
It is very unlikely that you have software that runs in a detached Docker container and does not in kube.
