pod creation stuck in ContainerCreating state

pod creation stuck in ContainerCreating state - docker

I have created a k8s cluster with RHEL7 with kubernetes packages GitVersion:"v1.8.1". I'm trying to deploy wordpress on my custom cluster. But pod creation is always stuck in ContainerCreating state.
phani#k8s-master]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-766d75457d-zlvdn 0/1 ContainerCreating 0 11m
kube-system etcd-k8s-master 1/1 Running 0 1h
kube-system kube-apiserver-k8s-master 1/1 Running 0 1h
kube-system kube-controller-manager-k8s-master 1/1 Running 0 1h
kube-system kube-dns-545bc4bfd4-bb8js 3/3 Running 0 1h
kube-system kube-proxy-bf4zr 1/1 Running 0 1h
kube-system kube-proxy-d7zvg 1/1 Running 0 34m
kube-system kube-scheduler-k8s-master 1/1 Running 0 1h
kube-system weave-net-92zf9 2/2 Running 0 34m
kube-system weave-net-sh7qk 2/2 Running 0 1h
Docker Version:1.13.1
Pod status from descibe command
Normal Scheduled 18m default-scheduler Successfully assigned wordpress-766d75457d-zlvdn to worker1
Normal SuccessfulMountVolume 18m kubelet, worker1 MountVolume.SetUp succeeded for volume "default-token-tmpcm"
Warning DNSSearchForming 18m kubelet, worker1 Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local
Warning FailedCreatePodSandBox 14m kubelet, worker1 Failed create pod sandbox.
Warning FailedSync 25s (x8 over 14m) kubelet, worker1 Error syncing pod
Normal SandboxChanged 24s (x8 over 14m) kubelet, worker1 Pod sandbox changed, it will be killed and re-created.
from the kubelet log I observed below error on worker
error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
But kubelet is stable no problems seen on worker.
How do I solve this problem?
I checked the cni failure, I couldn't find anything.
~]# ls /opt/cni/bin
bridge cnitool dhcp flannel host-local ipvlan loopback macvlan noop ptp tuning weave-ipam weave-net weave-plugin-2.3.0
In journal logs below messages are repetitively appeared . seems like scheduler is trying to create the container all the time.
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421184 14339 remote_runtime.go:115] StopPodSandbox "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421212 14339 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304"}
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421247 14339 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421262 14339 pod_workers.go:182] Error syncing pod 7f1c6bf1-6af3-11e8-856b-fa163e3d1891 ("wordpress-766d75457d-spdrb_default(7f1c6bf1-6af3-11e8-856b-fa163e3d1891)"), skipping: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"

Failed create pod sandbox.
... is almost always a CNI failure; I would check on the node that all the weave containers are happy, and that /opt/cni/bin is present (or its weave equivalent)
You may have to check both the journalctl -u kubelet.service as well as the docker logs for any containers running to discover the full scope of the error on the node.

It's seem to working by removing the$KUBELET_NETWORK_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
I have removed $KUBELET_NETWORK_ARGS and restarted the worker node then pods got deployed successfully.

As Matthew said it's most likely a CNI failure.
First, find the node this pod is running on:
kubectl get po wordpress-766d75457d-zlvdn -o wide
Next in the node where the pod is located check /etc/cni/net.d if you have more than one .conf then you can delete one and restart the node.
source: https://github.com/kubernetes/kubeadm/issues/578.
note this is one of the solutions.

While hopefully it's no one else's problem, for me, this happened when part of my filesystem was full.
I had pods stuck in ContainerCreating only on one node in my cluster. I also had a bunch of pods which I expected to shutdown, but hadn't. Someone recommended running
sudo systemctl status kubelet -l
which showed me a bunch of lines like
Jun 18 23:19:56 worker01 kubelet[1718]: E0618 23:19:56.461378 1718 kuberuntime_manager.go:647] createPodSandbox for pod "REDACTED(2c681b9c-cf5b-11eb-9c79-52540077cc53)" failed: mkdir /var/log/pods/2c681b9c-cf5b-11eb-9c79-52540077cc53: no space left on device
I confirmed that I was out of space with
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 189G 0 189G 0% /dev
tmpfs 189G 0 189G 0% /sys/fs/cgroup
/dev/mapper/vg01-root 20G 7.0G 14G 35% /
/dev/mapper/vg01-tmp 4.0G 34M 4.0G 1% /tmp
/dev/mapper/vg01-home 4.0G 72M 4.0G 2% /home
/dev/mapper/vg01-varlog 10G 10G 20K 100% /var/log
/dev/mapper/vg01-varlogaudit 2.0G 68M 2.0G 4% /var/log/audit
I just had to clear out that dir (and did some manual cleanup on all the pending pods and pods that were stuck running).

Related

Kubernetes GPU Pod error : validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"

When trying to create Pods that can use GPU, I get the error "exec: "nvidia-smi": executable file not found in $PATH" ".
To explain the error from the beginning, my main goal was to create JupyterHub enviroments that can use GPU. I installed Zero to JupyterHub for Kubernetes. I followed these steps to be able to use GPU. When I check my nodes GPUs seems schedulable by Kubernetes. So far everything seemed fine.
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME GPUs
arge-server 1
However, when I logged in to JupyetHub and tried to open the profile using GPU, I got an error: [Warning] 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. So, I checked the Pods and I found that they were all in the "Waiting: PodInitializing" state.
kubectl get pods -n gpu-operator-resources
NAME READY STATUS RESTARTS AGE
nvidia-dcgm-x5rqs 0/1 Init:0/1 2 6d20h
nvidia-device-plugin-daemonset-jhjhb 0/1 Init:0/1 0 6d20h
gpu-feature-discovery-pd4xv 0/1 Init:0/1 2 6d20h
nvidia-dcgm-exporter-7mjgt 0/1 Init:0/1 2 6d20h
nvidia-operator-validator-9xjmv 0/1 Init:Error 10 26m
After that, I took a closer look at the Pod nvidia-operator-validator-9xjmv, which was the beginning of the error, and I saw that the toolkit-validation container was throwing a CrashLoopBackOff error. Here is the relevant part of the log:
kubectl describe pod nvidia-operator-validator-9xjmv -n gpu-operator-resources
Name: nvidia-operator-validator-9xjmv
Namespace: gpu-operator-resources
.
.
.
Controlled By: DaemonSet/nvidia-operator-validator
Init Containers:
.
.
.
toolkit-validation:
Container ID: containerd://e7d004f0809cbefdae5407ea42eb659972ea7eefa5dd6e45e968cbf3ed22bf2e
Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2
Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator#sha256:a07fd1c74e3e469ac316d17cf79635173764fdab3b681dbc282027a23dbbe227
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
nvidia-validator
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 18 Nov 2021 12:55:00 +0300
Finished: Thu, 18 Nov 2021 12:55:00 +0300
Ready: False
Restart Count: 16
Environment:
WITH_WAIT: false
COMPONENT: toolkit
Mounts:
/run/nvidia/validations from run-nvidia-validations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hx7ls (ro)
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 58m default-scheduler Successfully assigned gpu-operator-resources/nvidia-operator-validator-9xjmv to arge-server
Normal Pulled 58m kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 58m kubelet Created container driver-validation
Normal Started 58m kubelet Started container driver-validation
Normal Pulled 56m (x5 over 58m) kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 56m (x5 over 58m) kubelet Created container toolkit-validation
Normal Started 56m (x5 over 58m) kubelet Started container toolkit-validation
Warning BackOff 3m7s (x255 over 58m) kubelet Back-off restarting failed container
Then, I looked at the logs of the container and I got the following error.
kubectl logs -n gpu-operator-resources -f nvidia-operator-validator-9xjmv -c toolkit-validation
time="2021-11-18T09:29:24Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
toolkit is not ready
For similar issues, it was suggested to delete the failed Pod and deployment. However, doing these did not fix my problem. Do you have any suggestions?
I have;
Ubuntu 20.04
Kubernetes v1.21.6
Docker 20.10.10
NVIDIA-SMI 470.82.01
CUDA 11.4
CPU: Intel Xeon E5-2683 v4 (32) # 2.097GHz
GPU: NVIDIA GeForce RTX 2080 Ti
Memory: 13815MiB / 48280MiB
Thanks in advance.

In case you're are still having the issue, we just had the same issue on our cluster, the "dirty" fix is to do that:
rm /run/nvidia/driver
ln -s / /run/nvidia/drive
kubectl delete pod -n gpu-operator nvidia-operator-validator-xxxxx
The reason is the init pod of the nvidia-operator-validator try to execute nvidia-smi within a chroot from /run/nvidia/driver .. which is a tmpfs (so doesn't persist accross reboot) and is not populated when performing a manual install of the drivers.
Do hope for a better fix from Nvidia.

Setting up ELK stack on kubernetes using minikube

After installing this is what my pods look like
Running pods
NAME READY STATUS RESTARTS AGE
elk-elasticsearch-client-5ffc974f8-987zv 1/1 Running 0 21m
elk-elasticsearch-curator-1582107120-4f2wm 0/1 Completed 0 19m
elk-elasticsearch-data-0 0/1 Pending 0 21m
elk-elasticsearch-exporter-84ff9b656d-t8vw2 1/1 Running 0 21m
elk-elasticsearch-master-0 1/1 Running 0 21m
elk-elasticsearch-master-1 1/1 Running 0 20m
elk-filebeat-4sxn9 0/2 Init:CrashLoopBackOff 9 21m
elk-kibana-77b97d7c69-d4jzz 1/1 Running 0 21m
elk-logstash-0 0/2 Pending 0 21m
So filebeat refuses to start.
Getting the logs from this node I get
Exiting: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elk-elasticsearch-client.elk.svc:9200: Get http://elk-elasticsearch-client.elk.svc:9200: lookup elk-elasticsearch-client.elk.svc on 10.96.0.10:53: no such host]
Also when trying to access the kibana node (the only node i can call using http) I get that it is not ready.
get pv:
pvc-9b9b13d8-48d2-4a79-a10c-8d1278554c75 4Gi RWO Delete Bound default/data-elk-elasticsearch-master-0 standard 113m
pvc-d8b361d7-8e04-4300-a0f8-c79f7cea7e44 4Gi RWO Delete Bound default/data-elk-elasticsearch-master-1 standard 112m
I'm running minikube with the none vm-driver which it tells me, does not respect the memory or cpu-flag. But I don't get it complaining about resources
kubectl version 1.17
docker version i 19.03.5, build 633a0ea838
minikube version 1.6.2
The elk stack was installed using helm.
I have the following versions:
elasticsearch-1.32.2.tgz
elasticsearch-curator-2.1.3.tgz
elasticsearch-exporter-2.2.0.tgz
filebeat-4.0.0.tgz
kibana-3.2.6.tgz
logstash-2.4.0.tgz
Running on ubuntu 18.04

Tearing everything down and then installing the required components from other helm-charts solved the issues. It may be that the charts I was using were not intended to run locally on minikube.

Coredns in pending state in Kubernetes cluster

I am trying to configure a 2 node Kubernetes cluster. First I am trying to configure the master node of the cluster on a CentOS VM. I have initialized the cluster using 'kubeadm init --apiserver-advertise-address=172.16.100.6 --pod-network-cidr=10.244.0.0/16' and deployed the flannel network to the cluster. But when I do 'kubectl get nodes', I get the following output ----
[root#kubernetus ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetus NotReady master 57m v1.12.0
Following is the output of 'kubectl get pods --all-namespaces -o wide ' ----
[root#kubernetus ~]# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system coredns-576cbf47c7-9x59x 0/1 Pending 0 58m <none> <none> <none>
kube-system coredns-576cbf47c7-l52wc 0/1 Pending 0 58m <none> <none> <none>
kube-system etcd-kubernetus 1/1 Running 2 57m 172.16.100.6 kubernetus <none>
kube-system kube-apiserver-kubernetus 1/1 Running 2 57m 172.16.100.6 kubernetus <none>
kube-system kube-controller-manager-kubernetus 1/1 Running 1 57m 172.16.100.6 kubernetus <none>
kube-system kube-proxy-hr557 1/1 Running 1 58m 172.16.100.6 kubernetus <none>
kube-system kube-scheduler-kubernetus 1/1 Running 1 57m 172.16.100.6 kubernetus <none>
coredns is in a pending state for a very long time. I have removed docker and kubectl, kubeadm, kubelet a no of times & tried to recreate the cluster, but every time it shows the same output. Can anybody help me with this issue?

Try to install Pod network add-on (Base on this guide).
Run this line:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml

Unable to update cni config: No networks found in /etc/cni/net.d .....
Oct 02 19:21:32 kubernetus kubelet[19007]: E1002 19:21:32.886170 19007
kubelet.go:2167] Container runtime network not ready:
NetworkReady=false reason:NetworkPluginNotReady message:docker:
network plugin is not ready: cni config uninitialized
According to this error, you forgot to initialize a Kubernetes Pod network add-on. Looking at your settings, I suppose it should be Flannel.
Here is the instruction from the official Kubernetes documentation:
For flannel to work correctly, you must pass
--pod-network-cidr=10.244.0.0/16 to kubeadm init.
Set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running
sysctl net.bridge.bridge-nf-call-iptables=1 to pass bridged IPv4
traffic to iptables’ chains. This is a requirement for some CNI
plugins to work, for more information please see here.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
Note that flannel works on amd64, arm, arm64 and ppc64le, but until
flannel v0.11.0 is released you need to use the following manifest
that supports all the architectures:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/c5d10c8/Documentation/kube-flannel.yml
For more information, you can visit this link.

For the Kubernetes cluster to be available, the cluster should have a Container Networking Interface (CNI). A pod-network is required to be configured for the dns pod to be functional.
Install any of the CNI Providers like:
- Flannel
- Calico
- Canal
- WeaveNet, etc.,
Without this, the hosted Kubernetes cluster would have the master in the NotReady State.

Check if docker and kubernetes are using the same cgroup driver.
I faced the same issue (CentOS 7, kubernetes v1.14.1), and setting same cgroup driver (systemd) fixed it.

I installed kubernetes with 1 master + 1 work-node.
After I made kubeadm init ..., I faced two issues:
On the master node, the coredns were pending.
On the work-node, kubectl command didn't work out
On the work-node, I did the following and fixed the both issues:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/kubelet.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config**

For me, I've restarted the system and re-applied calico.yaml, coredns and calico pods started creating.

take this solution at least priority and try changing instance type (preferably higher cpu core/ram)
in my case i have changed linux instance t3.micro to t2.medium and its works

Kubelet failed to get cgroup stats for "/system.slice/docker.service"

Question
What the kubectl (1.8.3 on CentOS 7) error massage actually means and how to resolve.
Nov 19 22:32:24 master kubelet[4425]: E1119 22:32:24.269786 4425 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get con
Nov 19 22:32:24 master kubelet[4425]: E1119 22:32:24.269802 4425 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get conta
Research
Found the same error and followed the workaround by updating the service unit of kubelet as below but did not work.
kubelet fails to get cgroup stats for docker and kubelet services
/etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
Background
Setting up Kubernetes cluster by following Install kubeadm. The section in the document Installing Docker says about aligning the cgroup driver as below.
Note: Make sure that the cgroup driver used by kubelet is the same as the one used by Docker. To ensure compatability you can either update Docker, like so:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
But doing so caused docker service failed to start with:
unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag".
Nov 19 16:55:56 localhost.localdomain systemd1: docker.service: main process exited, code=exited, status=1/FAILURE.
Maser node is in ready with all system pods are running.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-master 1/1 Running 0 39m
kube-system kube-apiserver-master 1/1 Running 0 39m
kube-system kube-controller-manager-master 1/1 Running 0 39m
kube-system kube-dns-545bc4bfd4-mqqqk 3/3 Running 0 40m
kube-system kube-flannel-ds-fclcs 1/1 Running 2 13m
kube-system kube-flannel-ds-hqlnb 1/1 Running 0 39m
kube-system kube-proxy-t7z5w 1/1 Running 0 40m
kube-system kube-proxy-xdw42 1/1 Running 0 13m
kube-system kube-scheduler-master 1/1 Running 0 39m
Environment
Kubernetes 1.8.3 on CentOS with Flannel.
$ kubectl version -o json | python -m json.tool
{
"clientVersion": {
"buildDate": "2017-11-08T18:39:33Z",
"compiler": "gc",
"gitCommit": "f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd",
"gitTreeState": "clean",
"gitVersion": "v1.8.3",
"goVersion": "go1.8.3",
"major": "1",
"minor": "8",
"platform": "linux/amd64"
},
"serverVersion": {
"buildDate": "2017-11-08T18:27:48Z",
"compiler": "gc",
"gitCommit": "f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd",
"gitTreeState": "clean",
"gitVersion": "v1.8.3",
"goVersion": "go1.8.3",
"major": "1",
"minor": "8",
"platform": "linux/amd64"
}
}
$ kubectl describe node master
Name: master
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=master
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data={"VtepMAC":"86:b6:7a:d6:7b:b3"}
flannel.alpha.coreos.com/backend-type=vxlan
flannel.alpha.coreos.com/kube-subnet-manager=true
flannel.alpha.coreos.com/public-ip=10.0.2.15
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node-role.kubernetes.io/master:NoSchedule
CreationTimestamp: Sun, 19 Nov 2017 22:27:17 +1100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:32:24 +1100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.99.10
Hostname: master
Capacity:
cpu: 1
memory: 3881880Ki
pods: 110
Allocatable:
cpu: 1
memory: 3779480Ki
pods: 110
System Info:
Machine ID: ca0a351004604dd49e43f8a6258ddd77
System UUID: CA0A3510-0460-4DD4-9E43-F8A6258DDD77
Boot ID: e9060efa-42be-498d-8cb8-8b785b51b247
Kernel Version: 3.10.0-693.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.8.3
Kube-Proxy Version: v1.8.3
PodCIDR: 10.244.0.0/24
ExternalID: master
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-master 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-master 250m (25%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-master 200m (20%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-545bc4bfd4-mqqqk 260m (26%) 0 (0%) 110Mi (2%) 170Mi (4%)
kube-system kube-flannel-ds-hqlnb 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-t7z5w 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-master 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
810m (81%) 0 (0%) 110Mi (2%) 170Mi (4%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 38m kubelet, master Starting kubelet.
Normal NodeAllocatableEnforced 38m kubelet, master Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 37m (x8 over 38m) kubelet, master Node master status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 37m (x8 over 38m) kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 37m (x7 over 38m) kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal Starting 37m kube-proxy, master Starting kube-proxy.
Normal Starting 32m kubelet, master Starting kubelet.
Normal NodeAllocatableEnforced 32m kubelet, master Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 32m kubelet, master Node master status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 32m kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 32m kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal NodeNotReady 32m kubelet, master Node master status is now: NodeNotReady
Normal NodeReady 32m kubelet, master Node master status is now: NodeReady

the reason for this problem is that the nodes docker version diff the kubernetes need docker version .
You can directly uninstall docker, reinstall the specified version of docker on each nodes , next step restart docker, and node will be back online immediately.
And the docker-images and pods installed in this judge will not be affected because the physical folder is still there.
yum remove -y docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine
yum install -y docker-ce-18.09.7 docker-ce-cli-18.09.7 containerd.io
systemctl enable docker
systemctl start docker

I had exactly same issue, I've added parameters to ExecStart as mentioned above, but still getting same error. Then I've did kubeadm reset and systemctl daemon-reload and recreated cluster. This error seems to be gone. Testing now...

kubernetes installation and kube-dns: open /run/flannel/subnet.env: no such file or directory

Overview
kube-dns can't start (SetupNetworkError) after kubeadm init and network setup:
Error syncing pod, skipping: failed to "SetupNetwork" for
"kube-dns-654381707-w4mpg_kube-system" with SetupNetworkError:
"Failed to setup network for pod
\"kube-dns-654381707-w4mpg_kube-system(8ffe3172-a739-11e6-871f-000c2912631c)\"
using network plugins \"cni\": open /run/flannel/subnet.env:
no such file or directory; Skipping pod"
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:48:38Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:42:39Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Environment
VMWare Fusion for Mac
OS
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a)
Linux ubuntu-master 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
What is the problem
kube-system kube-dns-654381707-w4mpg 0/3 ContainerCreating 0 2m
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 3m 1 {default-scheduler } Normal Scheduled Successfully assigned kube-dns-654381707-w4mpg to ubuntu-master
2m 1s 177 {kubelet ubuntu-master} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "kube-dns-654381707-w4mpg_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-654381707-w4mpg_kube-system(8ffe3172-a739-11e6-871f-000c2912631c)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
What I expected to happen
kube-dns Running
How to reproduce it
root#ubuntu-master:~# kubeadm init
Running pre-flight checks
<master/tokens> generated token: "247a8e.b7c8c1a7685bf204"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2016-11-10 11:40:21 +0000 UTC Not After: 2026-11-08 11:40:21 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2016-11-10 11:40:21 +0000 UTC Not After: 2017-11-10 11:40:21 +0000 UTC
Alternate Names: [172.20.10.4 10.96.0.1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 14.053453 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 0.508561 seconds
<master/apiclient> attempting a test deployment
<master/apiclient> test deployment succeeded
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 1.503838 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns
Kubernetes master initialised successfully!
You can now join any number of machines by running the following on each node:
kubeadm join --token=247a8e.b7c8c1a7685bf204 172.20.10.4
root#ubuntu-master:~#
root#ubuntu-master:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-eo1ua 1/1 Running 0 47s
kube-system etcd-ubuntu-master 1/1 Running 3 51s
kube-system kube-apiserver-ubuntu-master 1/1 Running 0 49s
kube-system kube-controller-manager-ubuntu-master 1/1 Running 3 51s
kube-system kube-discovery-1150918428-qmu0b 1/1 Running 0 46s
kube-system kube-dns-654381707-mv47d 0/3 ContainerCreating 0 44s
kube-system kube-proxy-k0k9q 1/1 Running 0 44s
kube-system kube-scheduler-ubuntu-master 1/1 Running 3 51s
root#ubuntu-master:~#
root#ubuntu-master:~# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
root#ubuntu-master:~#
root#ubuntu-master:~#
root#ubuntu-master:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-eo1ua 1/1 Running 0 47s
kube-system etcd-ubuntu-master 1/1 Running 3 51s
kube-system kube-apiserver-ubuntu-master 1/1 Running 0 49s
kube-system kube-controller-manager-ubuntu-master 1/1 Running 3 51s
kube-system kube-discovery-1150918428-qmu0b 1/1 Running 0 46s
kube-system kube-dns-654381707-mv47d 0/3 ContainerCreating 0 44s
kube-system kube-proxy-k0k9q 1/1 Running 0 44s
kube-system kube-scheduler-ubuntu-master 1/1 Running 3 51s
kube-system weave-net-ja736 2/2 Running 0 1h

It looks like you have configured flannel before running kubeadm init. You can try to fix this by removing flannel (it may be sufficient to remove config file rm -f /etc/cni/net.d/*flannel*), but it's best to start fresh.

open below file location(if exists, either create) and paste below data
vim /run/flannel/subnet.env
FLANNEL_NETWORK=10.240.0.0/16
FLANNEL_SUBNET=10.240.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

pod creation stuck in ContainerCreating state - docker

It's seem to working by removing the$KUBELET_NETWORK_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf I have removed $KUBELET_NETWORK_ARGS and restarted the worker node then pods got deployed successfully.

Related

Kubernetes GPU Pod error : validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"

Setting up ELK stack on kubernetes using minikube

Coredns in pending state in Kubernetes cluster

Kubelet failed to get cgroup stats for "/system.slice/docker.service"

kubernetes installation and kube-dns: open /run/flannel/subnet.env: no such file or directory

Categories

Resources