Kubernetes pods hanging in Init state - docker

I am facing a weird issue with my pods. I am launching around 20 pods in my env and every time some random 3-4 pods out of them hang with Init:0/1 status. On checking the status of pod, Init container shows running status, which should terminate after task is finished, and app container shows Waiting/Pod Initializing stage. Same init container image and specs are being used in across all 20 pods but this issue is happening with some random pods every time. And on terminating these stuck pods, it stucks in Terminating state. If i ssh on node at which this pod is launched and run docker ps, it shows me init container in running state but on running docker exec it throws error that container doesn't exist. This init container is pulling configs from Consul Server and on checking volume (got from docker inspect), i found that it has pulled all the key-val pairs correctly and saved it in defined file name. I have checked resources on all the nodes and more than enough is available on all.
Below is detailed example of on the pod acting like this.
Kubectl Version :
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Pods :
kubectl get pods -n dev1|grep -i session-service
session-service-app-75c9c8b5d9-dsmhp 0/1 Init:0/1 0 10h
session-service-app-75c9c8b5d9-vq98k 0/1 Terminating 0 11h
Pods Status :
kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1
Name: session-service-app-75c9c8b5d9-dsmhp
Namespace: dev1
Node: ip-192-168-44-18.ap-southeast-1.compute.internal/192.168.44.18
Start Time: Fri, 27 Apr 2018 18:14:43 +0530
Labels: app=session-service-app
pod-template-hash=3175746185
release=session-service-app
Status: Pending
IP: 100.96.4.240
Controlled By: ReplicaSet/session-service-app-75c9c8b5d9
Init Containers:
initpullconsulconfig:
Container ID: docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02
Image: shr-u-nexus-01.myops.de:8082/utils/app-init:1.0
Image ID: docker-pullable://shr-u-nexus-01.myops.de:8082/utils/app-init#sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd
Port: <none>
Args:
-consul-addr=consul-server.consul.svc.cluster.local:8500
State: Running
Started: Fri, 27 Apr 2018 18:14:44 +0530
Ready: False
Restart Count: 0
Environment:
CONSUL_TEMPLATE_VERSION: 0.19.4
POD: sand
SERVICE: session-service-app
ENV: dev1
Mounts:
/var/lib/app from shared-volume-sidecar (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Containers:
session-service-app:
Container ID:
Image: shr-u-nexus-01.myops.de:8082/sand-images/sessionservice-init:sitv12
Image ID:
Port: 8080/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/appenv from shared-volume-sidecar (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Conditions:
Type Status
Initialized False
Ready False
PodScheduled True
Volumes:
shared-volume-sidecar:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-bthkv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bthkv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Container Status on Node :
sudo docker ps|grep -i session
c658d5999563 shr-u-nexus-01.myops.de:8082/utils/app-init#sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd "/usr/bin/consul-t..." 10 hours ago Up 10 hours k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
da120abd3dbb gcr.io/google_containers/pause-amd64:3.0 "/pause" 10 hours ago Up 10 hours k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
f53d48c7d6ec shr-u-nexus-01.myops.de:8082/utils/app-init#sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd "/usr/bin/consul-t..." 10 hours ago Up 10 hours k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
c26415458d39 gcr.io/google_containers/pause-amd64:3.0 "/pause" 10 hours ago Up 10 hours k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
On running Docker exec (same result with kubectl exec) :
sudo docker exec -it c658d5999563 bash
rpc error: code = 2 desc = containerd: container not found

A Pod can be stuck in Init status due to many reasons.
PodInitializing or Init Status means that the Pod contains an Init container that hasn't finalized (Init containers: specialized containers that run before app containers in a Pod, init containers can contain utilities or setup scripts). If the pods status is ´Init:0/1´ means one init container is not finalized; init:N/M means the Pod has M Init Containers, and N have completed so far.
Gathering information
For those scenario the best would be to gather information, as the root cause can be different in every PodInitializing issue.
kubectl describe pods pod-XXX with this command you can get many info of the pod, you can check if there's any meaningful event as well. Save the init container name
kubectl logs pod-XXX this command prints the logs for a container in a pod or specified resource.
kubectl logs pod-XXX -c init-container-xxx This is the most accurate as could print the logs of the init container. You can get the init container name describing the pod in order to replace "init-container-XXX" as for example to "copy-default-config" as below:
The output of kubectl logs pod-XXX -c init-container-xxx can thrown meaningful info of the issue, reference:
In the image above we can see that the root cause is that the init container can't download the plugins from jenkins (timeout), here now we can check connection config, proxy, dns; or just modify the yaml to deploy the container without the plugins.
Additional:
kubectl describe node node-XXX describing the pod will give you the name of its node, which you can also inspect with this command.
kubectl get events to list the cluster events.
journalctl -xeu kubelet | tail -n 10 kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for docker).
Solutions
The solutions depends on the information gathered, once the root cause is found.
When you find a log with an insight of the root cause, you can investigate that specific root cause.
Some examples:
1 > In there this happened when init container was deleted, can be fixed deleting the pod so it would be recreated, or redeploy it. Same scenario in 1.1.
2 > If you found "bad address 'kube-dns.kube-system'" the PVC may not be recycled correctly, solution provided in 2 is running /opt/kubernetes/bin/kube-restart.sh.
3 > There, a sh file was not found, the solution would be to modify the yaml file or remove the container if unnecessary.
4 > A FailedSync was found, and it was solved restarting docker on the node.
In general you can modify the yaml, for example to avoid using an outdated URL, try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. However the specific solution will depend on the specific root cause.

My problem was related to the ebs-csi-controller (AWS EKS 1.24)
The ebs addin needs access to a role, and in my case the role trust relationship was broken. It uses OIDC, so I had to add my cluster's OIDC provider manually into the IAM identity provider section
kubectl logs deployment/ebs-csi-controller -n kube-system -c ebs-plugin
helped diagnose this, as well as
https://aws.amazon.com/premiumsupport/knowledge-center/eks-troubleshoot-ebs-volume-mounts/

Related

Kubernetes GPU Pod error : validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"

When trying to create Pods that can use GPU, I get the error "exec: "nvidia-smi": executable file not found in $PATH" ".
To explain the error from the beginning, my main goal was to create JupyterHub enviroments that can use GPU. I installed Zero to JupyterHub for Kubernetes. I followed these steps to be able to use GPU. When I check my nodes GPUs seems schedulable by Kubernetes. So far everything seemed fine.
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME GPUs
arge-server 1
However, when I logged in to JupyetHub and tried to open the profile using GPU, I got an error: [Warning] 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. So, I checked the Pods and I found that they were all in the "Waiting: PodInitializing" state.
kubectl get pods -n gpu-operator-resources
NAME READY STATUS RESTARTS AGE
nvidia-dcgm-x5rqs 0/1 Init:0/1 2 6d20h
nvidia-device-plugin-daemonset-jhjhb 0/1 Init:0/1 0 6d20h
gpu-feature-discovery-pd4xv 0/1 Init:0/1 2 6d20h
nvidia-dcgm-exporter-7mjgt 0/1 Init:0/1 2 6d20h
nvidia-operator-validator-9xjmv 0/1 Init:Error 10 26m
After that, I took a closer look at the Pod nvidia-operator-validator-9xjmv, which was the beginning of the error, and I saw that the toolkit-validation container was throwing a CrashLoopBackOff error. Here is the relevant part of the log:
kubectl describe pod nvidia-operator-validator-9xjmv -n gpu-operator-resources
Name: nvidia-operator-validator-9xjmv
Namespace: gpu-operator-resources
.
.
.
Controlled By: DaemonSet/nvidia-operator-validator
Init Containers:
.
.
.
toolkit-validation:
Container ID: containerd://e7d004f0809cbefdae5407ea42eb659972ea7eefa5dd6e45e968cbf3ed22bf2e
Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2
Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator#sha256:a07fd1c74e3e469ac316d17cf79635173764fdab3b681dbc282027a23dbbe227
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
nvidia-validator
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 18 Nov 2021 12:55:00 +0300
Finished: Thu, 18 Nov 2021 12:55:00 +0300
Ready: False
Restart Count: 16
Environment:
WITH_WAIT: false
COMPONENT: toolkit
Mounts:
/run/nvidia/validations from run-nvidia-validations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hx7ls (ro)
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 58m default-scheduler Successfully assigned gpu-operator-resources/nvidia-operator-validator-9xjmv to arge-server
Normal Pulled 58m kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 58m kubelet Created container driver-validation
Normal Started 58m kubelet Started container driver-validation
Normal Pulled 56m (x5 over 58m) kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 56m (x5 over 58m) kubelet Created container toolkit-validation
Normal Started 56m (x5 over 58m) kubelet Started container toolkit-validation
Warning BackOff 3m7s (x255 over 58m) kubelet Back-off restarting failed container
Then, I looked at the logs of the container and I got the following error.
kubectl logs -n gpu-operator-resources -f nvidia-operator-validator-9xjmv -c toolkit-validation
time="2021-11-18T09:29:24Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
toolkit is not ready
For similar issues, it was suggested to delete the failed Pod and deployment. However, doing these did not fix my problem. Do you have any suggestions?
I have;
Ubuntu 20.04
Kubernetes v1.21.6
Docker 20.10.10
NVIDIA-SMI 470.82.01
CUDA 11.4
CPU: Intel Xeon E5-2683 v4 (32) # 2.097GHz
GPU: NVIDIA GeForce RTX 2080 Ti
Memory: 13815MiB / 48280MiB
Thanks in advance.
In case you're are still having the issue, we just had the same issue on our cluster, the "dirty" fix is to do that:
rm /run/nvidia/driver
ln -s / /run/nvidia/drive
kubectl delete pod -n gpu-operator nvidia-operator-validator-xxxxx
The reason is the init pod of the nvidia-operator-validator try to execute nvidia-smi within a chroot from /run/nvidia/driver .. which is a tmpfs (so doesn't persist accross reboot) and is not populated when performing a manual install of the drivers.
Do hope for a better fix from Nvidia.

httpd Docker image CrashLoopBackOff on Kubernetes

I have a simple docker image which is working fine locally.
It is basically the same as the example on apache's httpd page.
FROM httpd:2.4
COPY ./public-html/ /usr/local/apache2/htdocs/
As per the page example, I can build and run my image as follows:
$ docker build -t gcr.io/${PROJECT_ID}/hello-app:v1 .
$ docker run -dit --name my-running-app -p 8080:80 <img_id>
I then head over to http://localhost:8080 , and everything seems to be working as it should.
However, when I try to create a deployment for my Google Cloud Kubernetes instance, my pod fails and gets to the state of CrashLoopBackOff. (This is after I have pushed the image to Google Cloud Registry, so that the deployment may grab the image from there.)
I think that this CrashLoopBackOff problem is happening due to me not having an ENTRYPOINT to my container; ie, the pod spawns, no command is issued, and then it is completed and crashes.
I have 2 questions then:
What command should I add to my Dockerfile to get the http server up and running on the pod (assuming my assessment of the problem is indeed correct)?
How is this running locally? Locally I simply $ docker run -dit --name my-running-app -p 8080:80 <img_id>. I do not specify that the container should run httpd, yet it does? How is this happening?
Edit - additional information:
I deployed onto K8's by doing the following:
$ kubectl create deployment hello-app --image=gcr.io/${PROJECT_ID}/hello-app:v1
Kubectl logs:
$ kubectl logs <pod_name>
standard_init_linux.go:211: exec user process caused "exec format error"
kubectl describe:
$ kubectl describe pod hello-app-6b89cd98f6-gn65p
Name: <name>
Namespace: default
Priority: 0
Node: <my_node>
Start Time: Mon, 22 Mar 2021 12:32:51 +0200
Labels: app=hello-app
pod-template-hash=6b89cd98f6
Annotations: <none>
Status: Running
IP: 10.12.1.13
IPs:
IP: 10.12.1.13
Controlled By: <replica_set>
Containers:
hello-app:
Container ID: <cid>
Image: <img>
Image ID: <img_id>
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 22 Mar 2021 15:12:18 +0200
Finished: Mon, 22 Mar 2021 15:12:18 +0200
Ready: False
Restart Count: 36
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b8p9t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-b8p9t:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-b8p9t
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m9s (x741 over 164m) kubelet Back-off restarting failed container
CrashLoopBackOff error means the pod keeps crashing and kubernetes has given up on it. You have to determine what is causing the crash.
Overally the cause of problem may be that:
type of parameters of the pod or container have been configured incorrectly
the application inside the container keeps crashing
an error occurred while deploying Kubernetes
You can type watch kubectl describe <pod-name> to check events as the pod is being created. But if the pod crashes after it starts up, you need to get the container logs kubectl logs -f <your-pod-name>.
Read more: kubernetes-crashloopbackoff.
As #Krishna Chaurasia said check the thread which is implying that the default command being run is not an executable - executable formats could be different for different platforms. As #Sagar Velankar mentioned use in Docker file in FROM line --platform flag to specify linux/amd64 as the target architecture. See: dockerfile-from.
You can use docker buildx docs.docker.com/docker-for-mac/multi-arch to build and push multi architecture images and kubelet will pull the image with correct architecture.

Unable to install Jenkins on Minikube using Helm due to the permission on mac

I`vw tried to install jenkins on minikube according this article
https://www.jenkins.io/doc/book/installing/kubernetes/
When I type kubectl logs pod/jenkins-0 init -n jenkins
I get
disable Setup Wizard
/var/jenkins_config/apply_config.sh: 4: /var/jenkins_config/apply_config.sh: cannot create /var/jenkins_home/jenkins.install.UpgradeWizard.state: Permission denied
I almost sure that I have some problems with file system on mac.
I did not create serviceAccount from article because helm have not seen it and returns error.
Instead of it I changed in jenkins-values.yaml
serviceAccount:
create: true
name: jenkins
annotations: {}
Then I tried set next values to 0. It have no affect.
runAsUser: 1000
fsGroup: 1000
Addition info:
kubectl get all -n jenkins
NAME READY STATUS RESTARTS AGE
pod/jenkins-0 0/2 Init:CrashLoopBackOff 7 15m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jenkins ClusterIP 10.104.114.29 <none> 8080/TCP 15m
service/jenkins-agent ClusterIP 10.104.207.201 <none> 50000/TCP 15m
NAME READY AGE
statefulset.apps/jenkins 0/1 15m
Also tried to use different directories for volume live /Volumes/data and add 777 permissions to it.
There are a couple potentials in here, but there is a solution without switching to runAsUser 0 (which breaks security assessments).
The folder /data/jenkins-volume is created as root by default, with a 755 permission set so you can't create persistent data in this dir with the default jenkins build.
To fix this, enter minikube with $ minikube ssh and run: $ chown 1000:1000 /data/jenkins-volume
The other thing that could be biting you (after fixing the folder permissions) is SELinux policies, when you are running your Kubernetes on a RHEL based OS.
To fix this: $ chcon -R -t svirt_sandbox_file_t /data/jenkins-volume
It was resolved
I just set runAsUser to 0 everywhere.
runAsUser to 0 everywhere worked, but this not the ideal solution due to potential security issues. Good for dev environment but not for prod.

CrashLoopBackOff while deploying pod using image from private registry

I am trying to create a pod using my own docker image on localhost.
This is the dockerfile used to create the image :
FROM centos:8
RUN yum install -y gdb
RUN yum group install -y "Development Tools"
CMD ["/usr/bin/bash"]
The yaml file used to create the pod is this :
---
apiVersion: v1
kind: Pod
metadata:
name: server
labels:
app: server
spec:
containers:
- name: server
imagePullPolicy: Never
image: localhost:5000/server
ports:
- containerPort: 80
root#node1:~/test/server# docker images | grep server
server latest 82c5228a553d 3 hours ago 948MB
localhost.localdomain:5000/server latest 82c5228a553d 3 hours ago 948MB
localhost:5000/server latest 82c5228a553d 3 hours ago 948MB
The image has been pushed to localhost registry.
Following is the error I receive.
root#node1:~/test/server# kubectl get pods
NAME READY STATUS RESTARTS AGE
server 0/1 CrashLoopBackOff 5 5m18s
The output of describe pod :
root#node1:~/test/server# kubectl describe pod server
Name: server
Namespace: default
Priority: 0
Node: node1/10.0.2.15
Start Time: Mon, 07 Dec 2020 15:35:49 +0530
Labels: app=server
Annotations: cni.projectcalico.org/podIP: 10.233.90.192/32
cni.projectcalico.org/podIPs: 10.233.90.192/32
Status: Running
IP: 10.233.90.192
IPs:
IP: 10.233.90.192
Containers:
server:
Container ID: docker://c2982e677bf37ff11272f9ea3f68565e0120fb8ccfb1595393794746ee29b821
Image: localhost:5000/server
Image ID: docker-pullable://localhost.localdomain:5000/server#sha256:6bc8193296d46e1e6fa4cb849fa83cb49e5accc8b0c89a14d95928982ec9d8e9
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 07 Dec 2020 15:41:33 +0530
Finished: Mon, 07 Dec 2020 15:41:33 +0530
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tb7wb (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-tb7wb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tb7wb
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m default-scheduler Successfully assigned default/server to node1
Normal Pulled 4m34s (x5 over 5m59s) kubelet Container image "localhost:5000/server" already present on machine
Normal Created 4m34s (x5 over 5m59s) kubelet Created container server
Normal Started 4m34s (x5 over 5m59s) kubelet Started container server
Warning BackOff 56s (x25 over 5m58s) kubelet Back-off restarting failed container
I get no logs :
root#node1:~/test/server# kubectl logs -f server
root#node1:~/test/server#
I am unable to figure out whether the issue is with the container or yaml file for creating pod. Any help would be appreciated.
Posting this as Community Wiki.
As pointed by #David Maze in comment section.
If docker run exits immediately, a Kubernetes Pod will always go into CrashLoopBackOff state. Your Dockerfile needs to COPY in or otherwise install and application and set its CMD to run it.
Root cause can be also determined by Exit Code. In 3) Check the exit code article, you can find a few exit codes like 0, 1, 128, 137 with description.
3.1) Exit Code 0
This exit code implies that the specified container command completed ‘sucessfully’, but too often for Kubernetes to accept as working.
In short story, your container was created, all action mentioned was executed and as there was nothing else to do, it exit with Exit Code 0.
A CrashLoopBackOff error occurs when a pod startup fails repeatedly in Kubernetes.`
Your image based on centos with few additional installations did not have any process in backgroud left, so it was categorized as Completed. As this happen so fast, kubernetes restarted it and it fall in loop.
$ kubectl run centos --image=centos
$ kubectl get po -w
NAME READY STATUS RESTARTS AGE
centos 0/1 CrashLoopBackOff 1 5s
centos 0/1 Completed 2 17s
centos 0/1 CrashLoopBackOff 2 31s
centos 0/1 Completed 3 46s
centos 0/1 CrashLoopBackOff 3 58s
centos 1/1 Running 4 88s
centos 0/1 Completed 4 89s
centos 0/1 CrashLoopBackOff 4 102s
$ kubectl describe po centos | grep 'Exit Code'
Exit Code: 0
But when you have used sleep 3600, in your container, command sleep was executing for hour. After this time it would also exit with Exit Code 0.
Hope it clarified.

Where is kube-apiserver located

Base question: When I try to use kube-apiserver on my master node, I get command not found error. How I can install/configure kube-apiserver? Any link to example will help.
$ kube-apiserver --enable-admission-plugins DefaultStorageClass
-bash: kube-apiserver: command not found
Details: I am new to Kubernetes and Docker and was trying to create StatefulSet with volumeClaimTemplates. My problem is that the automatic PVs are not created and I get this message in the PVC log: "persistentvolume-controller waiting for a volume to be created". I am not sure if I need to define DefaultStorageClass and so needed kube-apiserver to define it.
Name: nfs
Namespace: default
StorageClass: example-nfs
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner=example.com/nfs
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 3m (x2401 over 10h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "example.com/nfs" or manually created by system administrator
Here is get pvc result:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs Pending example-nfs 10h
And get storageclass:
$ kubectl describe storageclass example-nfs
Name: example-nfs
IsDefaultClass: No
Annotations: <none>
Provisioner: example.com/nfs
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
How can I troubleshoot this issue (e.g. logs for why the storage was not created)?
You are asking two different questions here, one about kube-apiserver configuration, one about troubleshooting your StorageClass.
Here's an answer for your first question:
kube-apiserver is running as a Docker container on your master node. Therefore, the binary is within the container, not on your host system. It is started by the master's kubelet from a file located at /etc/kubernetes/manifests. kubelet is watching this directory and will start any Pod defined here as "static pods".
To configure kube-apiserver command line arguments you need to modify /etc/kubernetes/manifests/kube-apiserver.yaml on your master.
I'll refer to the question regarding the location of the api-server.
Basic answer (specific to the question title):
The kube apiserver is located on the master node (known as the control plane).
It can be executed:
1 ) Via the host's init system (like systemd).
2 ) As a pod (I'll explain below).
In both cases it will be located on the control plane (left side below):
If its running under systemD you can run: systemctl status api-server to see the path to the configuration (drop-in) file.
If it is running as pod you can view it under the kube-system namespace with all other control panel components (plus kube-proxy and maybe network solution like weave below):
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-lpdlc 1/1 Running 1 2d22h
coredns-f9fd979d6-vcs7g 1/1 Running 1 2d22h
etcd-my-master 1/1 Running 1 2d22h
kube-apiserver-my-master 1/1 Running 1 2d22h #<----Here
kube-controller-manager-my-master 1/1 Running 1 2d22h
kube-proxy-kh2lc 1/1 Running 1 2d22h
kube-scheduler-my-master 1/1 Running 1 2d22h
weave-net-59r5b 2/2 Running 3 2d22h
You can run:
kubectl describe pod/kube-apiserver-my-master -n kube-system
In order to get more details regarding the pod.
A bit more advanced answer:
(regarding the location of /etc/kubernetes/manifests)
Lets say we have no idea where to find the relevant path for the kube-api-server config file.
But we need to remember two important things:
1 ) The kube-api-server is running on the master node.
2 ) The Kubelet isn't running as pod and when the control plane components (plus kube-proxy) are executed as static pods - it is done by the Kubelet on the master node.
So we can start our journey for reaching the manifests path by investigating the Kubelet logs.
If the Kubelet is running for a long time it will be a very large file and we'll need to dump it somewhere and go to the begging - or if Kubelet was started 5 minutes ago we can run:
sudo journalctl -u kubelet --since -5m >> kubelet_5_minutes.log
And a quick search for "api-server" will bring us to the 2 lines below where the path of the manifests in mentioned:
my-master kubelet[71..]: 00:03:21 kubelet.go:261] Adding pod path: /etc/kubernetes/manifests
my-master kubelet[71..]: 00:03:21 kubelet.go:273] Watching apiserver
And also we can see that the Kubelet is trying to create the kube-apiserver pod under my-master node and inside the kube-system namespace:
my-master kubelet[71..]: 00:03:29.05 kubelet.go:1576] ..
Creating a mirror pod for "kube-apiserver-my-master_kube-system
To make the storage class "example-nfs" default, you need to run the below command:
kubectl patch storageclass example-nfs -p '{"metadata":
{"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

Resources