I have a simple docker image which is working fine locally.
It is basically the same as the example on Apache's httpd page.
FROM httpd:2.4
COPY ./public-html/ /usr/local/apache2/htdocs/
As per the page example, I can build and run my image as follows:
$ docker build -t gcr.io/${PROJECT_ID}/hello-app:v1 .
$ docker run -dit --name my-running-app -p 8080:80 <img_id>
I then head over to http://localhost:8080, and everything seems to be working as it should.
However, when I try to create a deployment for my Google Cloud Kubernetes instance, my pod fails and ends up in the CrashLoopBackOff state. (This is after I have pushed the image to Google Container Registry, so that the deployment can pull the image from there.)
I think this CrashLoopBackOff problem is happening because my container has no ENTRYPOINT; i.e., the pod spawns, no command is issued, and then it completes and crashes.
I have 2 questions then:
What command should I add to my Dockerfile to get the HTTP server up and running on the pod (assuming my assessment of the problem is indeed correct)?
How is this working locally? Locally I simply run $ docker run -dit --name my-running-app -p 8080:80 <img_id>. I do not specify that the container should run httpd, yet it does. How is this happening?
Edit - additional information:
I deployed onto K8s by doing the following:
$ kubectl create deployment hello-app --image=gcr.io/${PROJECT_ID}/hello-app:v1
Kubectl logs:
$ kubectl logs <pod_name>
standard_init_linux.go:211: exec user process caused "exec format error"
kubectl describe:
$ kubectl describe pod hello-app-6b89cd98f6-gn65p
Name: <name>
Namespace: default
Priority: 0
Node: <my_node>
Start Time: Mon, 22 Mar 2021 12:32:51 +0200
Labels: app=hello-app
pod-template-hash=6b89cd98f6
Annotations: <none>
Status: Running
IP: 10.12.1.13
IPs:
IP: 10.12.1.13
Controlled By: <replica_set>
Containers:
hello-app:
Container ID: <cid>
Image: <img>
Image ID: <img_id>
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 22 Mar 2021 15:12:18 +0200
Finished: Mon, 22 Mar 2021 15:12:18 +0200
Ready: False
Restart Count: 36
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b8p9t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-b8p9t:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-b8p9t
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m9s (x741 over 164m) kubelet Back-off restarting failed container
A CrashLoopBackOff error means the pod keeps crashing and Kubernetes keeps backing off before restarting it again. You have to determine what is causing the crash.
Overall, the cause of the problem may be that:
the parameters of the pod or container have been configured incorrectly
the application inside the container keeps crashing
an error occurred while deploying to Kubernetes
You can run watch kubectl describe <pod-name> to check events as the pod is being created. But if the pod crashes after it starts up, you need to check the container logs with kubectl logs -f <your-pod-name>.
Read more: kubernetes-crashloopbackoff.
As @Krishna Chaurasia said, check the linked thread, which suggests that the default command being run is not a valid executable: executable formats can differ between platforms. As @Sagar Velankar mentioned, use the --platform flag on the FROM line of the Dockerfile to specify linux/amd64 as the target architecture. See: dockerfile-from.
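For example, a minimal sketch of the question's Dockerfile pinned to linux/amd64 (assuming the image was built on a machine with a different architecture, such as an Apple Silicon Mac, which would explain the exec format error):
FROM --platform=linux/amd64 httpd:2.4
COPY ./public-html/ /usr/local/apache2/htdocs/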
You can use docker buildx (docs.docker.com/docker-for-mac/multi-arch) to build and push multi-architecture images, and the kubelet will pull the image with the correct architecture.
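A sketch of the buildx approach (the platform list is an example; a builder may need to be created once with docker buildx create --use):
$ docker buildx create --use
$ docker buildx build --platform linux/amd64,linux/arm64 -t gcr.io/${PROJECT_ID}/hello-app:v1 --push .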
Related
I started learning about Kubernetes and I installed minikube and kubectl on Windows 7.
After that I created a pod with command:
kubectl run firstpod --image=nginx
And everything is fine:
(Screenshot showing the pod running: https://i.stack.imgur.com/xAcMP.jpg)
Now I want to go inside the pod with this command: kubectl exec -it firstpod -- /bin/bash but it's not working and I have this error:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container
process caused: exec: "C:/Program Files/Git/usr/bin/bash.exe": stat C:/Program
Files/Git/usr/bin/bash.exe: no such file or directory: unknown
command terminated with exit code 126
How can I resolve this problem?
And another question is about this firstpod pod. With this command kubectl describe pod firstpod I can see information about the pod:
Name: firstpod
Namespace: default
Priority: 0
Node: minikube/192.168.99.100
Start Time: Mon, 08 Nov 2021 16:39:07 +0200
Labels: run=firstpod
Annotations: <none>
Status: Running
IP: 172.17.0.3
IPs:
IP: 172.17.0.3
Containers:
firstpod:
Container ID: docker://59f89dad2ddd6b93ac4aceb2cc0c9082f4ca42620962e4e692e3d6bcb47d4a9e
Image: nginx
Image ID: docker-pullable://nginx@sha256:644a70516a26004c97d0d85c7fe1d0c3a67ea8ab7ddf4aff193d9f301670cf36
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 08 Nov 2021 16:39:14 +0200
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9b8mx (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-9b8mx:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned default/firstpod to minikube
Normal Pulling 32m kubelet Pulling image "nginx"
Normal Pulled 32m kubelet Successfully pulled image "nginx" in 3.677130128s
Normal Created 31m kubelet Created container firstpod
Normal Started 31m kubelet Started container firstpod
So I can see a Docker container ID and that it is started, and there is also the image, but if I run docker images or docker ps there is nothing. Where are these images and containers? Thank you!
One error for certain is Git Bash rewriting the path to a Windows path. You can disable that with a double slash:
kubectl exec -it firstpod -- //bin/bash
This command will only work if you have bash in the image. If you don't, you'll need to pick a different command to run, e.g. /bin/sh. Some images are distroless or based on scratch to explicitly not include things like shells, which will prevent you from running commands like this (intentionally, for security).
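A couple of hedged variants, assuming Git Bash on Windows: the double slash also works with sh, and setting MSYS_NO_PATHCONV=1 is another way to stop Git Bash from rewriting the path:
kubectl exec -it firstpod -- //bin/sh
MSYS_NO_PATHCONV=1 kubectl exec -it firstpod -- /bin/bash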
I want to run a Docker container which uses the GPU (it runs a CNN to detect objects in a video), and then run that container on Kubernetes.
I can run the container with Docker alone without problems, but when I try to run the container from Kubernetes it fails to find the GPU.
I run it using this command:
kubectl exec -it namepod /bin/bash
This is the problem that I get:
kubectl exec -it tym-python-5bb7fcf76b-4c9z6 /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@tym-python-5bb7fcf76b-4c9z6:/opt# cd servicio/
root@tym-python-5bb7fcf76b-4c9z6:/opt/servicio# python3 TM_Servicev2.py
Try to load cfg: /opt/darknet/cfg/yolov4.cfg, weights: /opt/yolov4.weights, clear = 0
CUDA status Error: file: ./src/dark_cuda.c : () : line: 620 : build time: Jul 30 2021 - 14:05:34
CUDA Error: no CUDA-capable device is detected
python3: check_error: Unknown error -1979678822
root@tym-python-5bb7fcf76b-4c9z6:/opt/servicio#
EDIT.
I followed all the steps on the Nvidia docker 2 guide and downloaded the Nvidia plugin for Kubernetes.
However, when I deploy to Kubernetes, the pod stays as "Pending" and never actually starts. I don't get an error anymore, but it never starts.
The pod appears like this:
gpu-pod 0/1 Pending 0 3m19s
EDIT 2.
I ended up reinstalling everything, and now my pod appears as Completed but not Running, like this:
default gpu-operator-test 0/1 Completed 0 62m
Answering Wiktor.
when I run this command:
kubectl describe pod gpu-operator-test
I get:
Name: gpu-operator-test
Namespace: default
Priority: 0
Node: pdi-mc/192.168.0.15
Start Time: Mon, 09 Aug 2021 12:09:51 -0500
Labels: <none>
Annotations: cni.projectcalico.org/containerID: 968e49d27fb3d86ed7e70769953279271b675177e188d52d45d7c4926bcdfbb2
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Succeeded
IP: 192.168.10.81
IPs:
IP: 192.168.10.81
Containers:
cuda-vector-add:
Container ID: docker://d49545fad730b2ec3ea81a45a85a2fef323edc82e29339cd3603f122abde9cef
Image: nvidia/samples:vectoradd-cuda10.2
Image ID: docker-pullable://nvidia/samples@sha256:4593078cdb8e786d35566faa2b84da1123acea42f0d4099e84e2af0448724af1
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 09 Aug 2021 12:10:29 -0500
Finished: Mon, 09 Aug 2021 12:10:30 -0500
Ready: False
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9ktgq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-9ktgq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
I'm using this configuration file to create the pod
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda10.2"
      resources:
        limits:
          nvidia.com/gpu: 1
Addressing two topics here:
The error you saw at the beginning:
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
This means that you used a deprecated form of the kubectl exec command. The proper syntax is:
$ kubectl exec (POD | TYPE/NAME) [-c CONTAINER] [flags] -- COMMAND [args...]
See here for more details.
According to the official docs, the gpu-operator-test pod should run to completion:
You can see that the pod's status is Succeeded and also:
State: Terminated
Reason: Completed
Exit Code: 0
Exit Code: 0 means that the specified container command completed successfully.
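To double-check, a small sketch (assuming the pod name from the question): inspecting the container's logs should show the CUDA vector-add sample reporting a successful run.
$ kubectl logs gpu-operator-test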
More details can be found in the official docs.
I am trying to create a pod using my own docker image on localhost.
This is the Dockerfile used to create the image:
FROM centos:8
RUN yum install -y gdb
RUN yum group install -y "Development Tools"
CMD ["/usr/bin/bash"]
The YAML file used to create the pod is this:
---
apiVersion: v1
kind: Pod
metadata:
  name: server
  labels:
    app: server
spec:
  containers:
    - name: server
      imagePullPolicy: Never
      image: localhost:5000/server
      ports:
        - containerPort: 80
root@node1:~/test/server# docker images | grep server
server latest 82c5228a553d 3 hours ago 948MB
localhost.localdomain:5000/server latest 82c5228a553d 3 hours ago 948MB
localhost:5000/server latest 82c5228a553d 3 hours ago 948MB
The image has been pushed to the localhost registry.
Following is the error I receive:
root@node1:~/test/server# kubectl get pods
NAME READY STATUS RESTARTS AGE
server 0/1 CrashLoopBackOff 5 5m18s
The output of describe pod :
root@node1:~/test/server# kubectl describe pod server
Name: server
Namespace: default
Priority: 0
Node: node1/10.0.2.15
Start Time: Mon, 07 Dec 2020 15:35:49 +0530
Labels: app=server
Annotations: cni.projectcalico.org/podIP: 10.233.90.192/32
cni.projectcalico.org/podIPs: 10.233.90.192/32
Status: Running
IP: 10.233.90.192
IPs:
IP: 10.233.90.192
Containers:
server:
Container ID: docker://c2982e677bf37ff11272f9ea3f68565e0120fb8ccfb1595393794746ee29b821
Image: localhost:5000/server
Image ID: docker-pullable://localhost.localdomain:5000/server@sha256:6bc8193296d46e1e6fa4cb849fa83cb49e5accc8b0c89a14d95928982ec9d8e9
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 07 Dec 2020 15:41:33 +0530
Finished: Mon, 07 Dec 2020 15:41:33 +0530
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tb7wb (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-tb7wb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tb7wb
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m default-scheduler Successfully assigned default/server to node1
Normal Pulled 4m34s (x5 over 5m59s) kubelet Container image "localhost:5000/server" already present on machine
Normal Created 4m34s (x5 over 5m59s) kubelet Created container server
Normal Started 4m34s (x5 over 5m59s) kubelet Started container server
Warning BackOff 56s (x25 over 5m58s) kubelet Back-off restarting failed container
I get no logs :
root@node1:~/test/server# kubectl logs -f server
root@node1:~/test/server#
I am unable to figure out whether the issue is with the container or with the YAML file used to create the pod. Any help would be appreciated.
Posting this as Community Wiki.
As pointed out by @David Maze in the comment section:
If docker run exits immediately, a Kubernetes Pod will always go into CrashLoopBackOff state. Your Dockerfile needs to COPY in or otherwise install an application and set its CMD to run it.
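For illustration, a minimal sketch of a Dockerfile that keeps the container alive (the long-running CMD here is just an assumption for interactive debugging, not the poster's actual application):
FROM centos:8
RUN yum install -y gdb
RUN yum group install -y "Development Tools"
# hypothetical long-running command so the container does not exit immediately
CMD ["sleep", "infinity"]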
The root cause can also be determined from the Exit Code. In the 3) Check the exit code article, you can find a few exit codes like 0, 1, 128 and 137 with descriptions.
3.1) Exit Code 0
This exit code implies that the specified container command completed 'successfully', but too often for Kubernetes to accept it as working.
In short, your container was created, all specified actions were executed, and as there was nothing else to do, it exited with Exit Code 0.
A CrashLoopBackOff error occurs when a pod startup fails repeatedly in Kubernetes.
Your image, based on centos with a few additional installations, did not leave any process running in the background, so it was categorized as Completed. As this happens so fast, Kubernetes restarted it and it fell into a loop.
$ kubectl run centos --image=centos
$ kubectl get po -w
NAME READY STATUS RESTARTS AGE
centos 0/1 CrashLoopBackOff 1 5s
centos 0/1 Completed 2 17s
centos 0/1 CrashLoopBackOff 2 31s
centos 0/1 Completed 3 46s
centos 0/1 CrashLoopBackOff 3 58s
centos 1/1 Running 4 88s
centos 0/1 Completed 4 89s
centos 0/1 CrashLoopBackOff 4 102s
$ kubectl describe po centos | grep 'Exit Code'
Exit Code: 0
But when you used sleep 3600 in your container, the sleep command was executing for an hour. After that time it would also exit with Exit Code 0.
Hope this clarifies it.
I'm taking a course that uses Kubernetes and am running into an error when I try to create a pod in Kubernetes.
I'm using Ubuntu, AMD64
I installed microk8s.kubectl following these instructions (https://ubuntu.com/kubernetes/install)
Here's my Dockerfile which runs correctly when I use only Docker.
FROM node:alpine
WORKDIR /app
COPY package.json ./
RUN npm install
COPY ./ ./
CMD ["npm", "start"]
Here's my posts.yaml file, verbatim from the course I'm taking:
apiVersion: v1
kind: Pod
metadata:
  name: posts
spec:
  containers:
    - name: posts
      image: emendoza1986/blog_posts:0.0.1
output from kubectl get pods
NAME READY STATUS RESTARTS AGE
posts 0/1 CrashLoopBackOff 6 10m
output from kubectl logs posts
/bin/sh: [npm,start]: not found
output from kubectl describe pod posts
Name: posts
Namespace: default
Priority: 0
Node: desktope/192.168.0.18
Start Time: Thu, 23 Jul 2020 10:58:40 -0700
Labels: <none>
Annotations: Status: Running
IP: 10.1.87.20
IPs:
IP: 10.1.87.20
Containers:
posts:
Container ID: containerd://acb403c53759670370959cfa2cc0939f53126aee889e1f6dc2e831bc4dc22c3c
Image: emendoza1986/blog_posts:0.0.1
Image ID: docker.io/emendoza1986/blog_posts@sha256:f69b30cf0382d4c273643ac11c505378854b966063974cc57d187718cc0b0fd5
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 127
Started: Thu, 23 Jul 2020 10:58:59 -0700
Finished: Thu, 23 Jul 2020 10:58:59 -0700
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-2fm2c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-2fm2c:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-2fm2c
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 48s default-scheduler Successfully assigned default/posts to desktope
Normal Pulled 29s (x3 over 47s) kubelet, desktope Container image "emendoza1986/blog_posts:0.0.1" already present on machine
Normal Created 29s (x3 over 47s) kubelet, desktope Created container posts
Normal Started 29s (x3 over 47s) kubelet, desktope Started container posts
Warning BackOff 12s (x4 over 45s) kubelet, desktope Back-off restarting failed container
output from microk8s.status
microk8s is running
addons:
dashboard: enabled
dns: enabled
metrics-server: enabled
cilium: disabled
fluentd: disabled
gpu: disabled
helm: disabled
helm3: disabled
host-access: disabled
ingress: disabled
istio: disabled
jaeger: disabled
knative: disabled
kubeflow: disabled
linkerd: disabled
metallb: disabled
prometheus: disabled
rbac: disabled
registry: disabled
storage: disabled
output from microk8s inspect
Inspecting Certificates
Inspecting services
Service snap.microk8s.daemon-cluster-agent is running
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-apiserver-kicker is running
Service snap.microk8s.daemon-proxy is running
Service snap.microk8s.daemon-kubelet is running
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-flanneld is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
Copy processes list to the final report tarball
Copy snap list to the final report tarball
Copy VM name (or none) to the final report tarball
Copy disk usage information to the final report tarball
Copy memory usage information to the final report tarball
Copy server uptime to the final report tarball
Copy current linux distribution to the final report tarball
Copy openSSL information to the final report tarball
Copy network configuration to the final report tarball
Inspecting kubernetes cluster
Inspect kubernetes cluster
Building the report tarball
Report tarball is at /var/snap/microk8s/1503/inspection-report-20200723_112646.tar.gz
I see the error coming from the log but I haven't been able to find a solution. Thank you for your help!
Thank you for the helpful comments. Originally I had my Dockerfile as
CMD ['npm', 'start'].
I had fixed it locally to
CMD ["npm", "start"]
but I didn't push the new version to Docker Hub.
Pushing the new version fixed the problem.
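For reference, a hedged sketch of the rebuild-and-push cycle (the 0.0.2 tag is just an example; bumping the tag helps avoid a stale cached image on the node):
docker build -t emendoza1986/blog_posts:0.0.2 .
docker push emendoza1986/blog_posts:0.0.2
# then point the pod's image: field at the new tag and re-apply
kubectl apply -f posts.yaml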
I am facing a weird issue with my pods. I am launching around 20 pods in my env, and every time some random 3-4 of them hang with Init:0/1 status. On checking the status of the pod, the init container shows Running status (it should terminate after its task is finished) and the app container shows the Waiting/PodInitializing stage. The same init container image and specs are used across all 20 pods, but this issue happens with different random pods every time. On terminating these stuck pods, they get stuck in the Terminating state. If I SSH onto the node on which such a pod is launched and run docker ps, it shows the init container in a running state, but running docker exec throws an error that the container doesn't exist. This init container pulls configs from a Consul server, and on checking the volume (found via docker inspect), I saw that it has pulled all the key-value pairs correctly and saved them in the defined file. I have checked resources on all the nodes and more than enough is available.
Below is a detailed example of one pod acting like this.
Kubectl version:
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Pods :
kubectl get pods -n dev1|grep -i session-service
session-service-app-75c9c8b5d9-dsmhp 0/1 Init:0/1 0 10h
session-service-app-75c9c8b5d9-vq98k 0/1 Terminating 0 11h
Pods Status :
kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1
Name: session-service-app-75c9c8b5d9-dsmhp
Namespace: dev1
Node: ip-192-168-44-18.ap-southeast-1.compute.internal/192.168.44.18
Start Time: Fri, 27 Apr 2018 18:14:43 +0530
Labels: app=session-service-app
pod-template-hash=3175746185
release=session-service-app
Status: Pending
IP: 100.96.4.240
Controlled By: ReplicaSet/session-service-app-75c9c8b5d9
Init Containers:
initpullconsulconfig:
Container ID: docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02
Image: shr-u-nexus-01.myops.de:8082/utils/app-init:1.0
Image ID: docker-pullable://shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd
Port: <none>
Args:
-consul-addr=consul-server.consul.svc.cluster.local:8500
State: Running
Started: Fri, 27 Apr 2018 18:14:44 +0530
Ready: False
Restart Count: 0
Environment:
CONSUL_TEMPLATE_VERSION: 0.19.4
POD: sand
SERVICE: session-service-app
ENV: dev1
Mounts:
/var/lib/app from shared-volume-sidecar (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Containers:
session-service-app:
Container ID:
Image: shr-u-nexus-01.myops.de:8082/sand-images/sessionservice-init:sitv12
Image ID:
Port: 8080/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/appenv from shared-volume-sidecar (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Conditions:
Type Status
Initialized False
Ready False
PodScheduled True
Volumes:
shared-volume-sidecar:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-bthkv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bthkv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Container Status on Node :
sudo docker ps|grep -i session
c658d5999563 shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd "/usr/bin/consul-t..." 10 hours ago Up 10 hours k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
da120abd3dbb gcr.io/google_containers/pause-amd64:3.0 "/pause" 10 hours ago Up 10 hours k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
f53d48c7d6ec shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd "/usr/bin/consul-t..." 10 hours ago Up 10 hours k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
c26415458d39 gcr.io/google_containers/pause-amd64:3.0 "/pause" 10 hours ago Up 10 hours k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
On running Docker exec (same result with kubectl exec) :
sudo docker exec -it c658d5999563 bash
rpc error: code = 2 desc = containerd: container not found
A Pod can be stuck in Init status for many reasons.
PodInitializing or Init status means that the Pod contains an init container that hasn't finished (init containers are specialized containers that run before the app containers in a Pod; they can contain utilities or setup scripts). A pod status of Init:0/1 means one init container has not finished; Init:N/M means the Pod has M init containers, and N have completed so far.
Gathering information
For these scenarios, the best approach is to gather information, as the root cause can be different in every PodInitializing issue.
kubectl describe pods pod-XXX: with this command you can get a lot of information about the pod, and you can check whether there are any meaningful events as well. Note down the init container name.
kubectl logs pod-XXX: this command prints the logs for a container in the pod or specified resource.
kubectl logs pod-XXX -c init-container-xxx: this is the most accurate, as it prints the logs of the init container itself. You can get the init container name by describing the pod and substitute it for "init-container-xxx", for example "copy-default-config" as below:
The output of kubectl logs pod-XXX -c init-container-xxx can give meaningful info about the issue; for reference:
In the referenced example, the root cause was that the init container couldn't download the plugins from Jenkins (timeout); from there you can check the connection config, proxy, and DNS, or just modify the YAML to deploy the container without the plugins.
Additional:
kubectl describe node node-XXX: describing the pod gives you the name of its node, which you can then inspect with this command.
kubectl get events: lists the cluster events.
journalctl -xeu kubelet | tail -n 10: kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for docker).
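As a small addition, a hedged sketch for listing a pod's init container names directly (handy for the -c flag above; pod-XXX is a placeholder):
kubectl get pod pod-XXX -o jsonpath='{.spec.initContainers[*].name}'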
Solutions
The solution depends on the information gathered once the root cause is found.
When you find a log with an insight of the root cause, you can investigate that specific root cause.
Some examples:
1 > There, this happened when the init container was deleted; it can be fixed by deleting the pod so that it gets recreated, or by redeploying it. Same scenario in 1.1.
2 > If you found "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is running /opt/kubernetes/bin/kube-restart.sh.
3 > There, an sh file was not found; the solution would be to modify the YAML file or remove the container if it is unnecessary.
4 > A FailedSync was found, and it was solved by restarting docker on the node.
In general you can modify the YAML (for example to avoid using an outdated URL), try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. However, the specific solution will depend on the specific root cause.
My problem was related to the ebs-csi-controller (AWS EKS 1.24)
The EBS add-on needs access to an IAM role, and in my case the role's trust relationship was broken. It uses OIDC, so I had to add my cluster's OIDC provider manually in the IAM identity providers section.
kubectl logs deployment/ebs-csi-controller -n kube-system -c ebs-plugin
helped diagnose this, as well as
https://aws.amazon.com/premiumsupport/knowledge-center/eks-troubleshoot-ebs-volume-mounts/
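For reference, a hedged sketch of associating the cluster's OIDC provider with IAM using eksctl (cluster name and region are placeholders; the same thing can be done manually in the IAM console as described above):
eksctl utils associate-iam-oidc-provider --cluster <cluster-name> --region <region> --approve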