Kubelet failed to get cgroup stats for "/system.slice/docker.service" - docker

Question
What the kubectl (1.8.3 on CentOS 7) error massage actually means and how to resolve.
Nov 19 22:32:24 master kubelet[4425]: E1119 22:32:24.269786 4425 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get con
Nov 19 22:32:24 master kubelet[4425]: E1119 22:32:24.269802 4425 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get conta
Research
Found the same error and followed the workaround by updating the service unit of kubelet as below but did not work.
kubelet fails to get cgroup stats for docker and kubelet services
/etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
Background
Setting up Kubernetes cluster by following Install kubeadm. The section in the document Installing Docker says about aligning the cgroup driver as below.
Note: Make sure that the cgroup driver used by kubelet is the same as the one used by Docker. To ensure compatability you can either update Docker, like so:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
But doing so caused docker service failed to start with:
unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag".
Nov 19 16:55:56 localhost.localdomain systemd1: docker.service: main process exited, code=exited, status=1/FAILURE.
Maser node is in ready with all system pods are running.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-master 1/1 Running 0 39m
kube-system kube-apiserver-master 1/1 Running 0 39m
kube-system kube-controller-manager-master 1/1 Running 0 39m
kube-system kube-dns-545bc4bfd4-mqqqk 3/3 Running 0 40m
kube-system kube-flannel-ds-fclcs 1/1 Running 2 13m
kube-system kube-flannel-ds-hqlnb 1/1 Running 0 39m
kube-system kube-proxy-t7z5w 1/1 Running 0 40m
kube-system kube-proxy-xdw42 1/1 Running 0 13m
kube-system kube-scheduler-master 1/1 Running 0 39m
Environment
Kubernetes 1.8.3 on CentOS with Flannel.
$ kubectl version -o json | python -m json.tool
{
"clientVersion": {
"buildDate": "2017-11-08T18:39:33Z",
"compiler": "gc",
"gitCommit": "f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd",
"gitTreeState": "clean",
"gitVersion": "v1.8.3",
"goVersion": "go1.8.3",
"major": "1",
"minor": "8",
"platform": "linux/amd64"
},
"serverVersion": {
"buildDate": "2017-11-08T18:27:48Z",
"compiler": "gc",
"gitCommit": "f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd",
"gitTreeState": "clean",
"gitVersion": "v1.8.3",
"goVersion": "go1.8.3",
"major": "1",
"minor": "8",
"platform": "linux/amd64"
}
}
$ kubectl describe node master
Name: master
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=master
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data={"VtepMAC":"86:b6:7a:d6:7b:b3"}
flannel.alpha.coreos.com/backend-type=vxlan
flannel.alpha.coreos.com/kube-subnet-manager=true
flannel.alpha.coreos.com/public-ip=10.0.2.15
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node-role.kubernetes.io/master:NoSchedule
CreationTimestamp: Sun, 19 Nov 2017 22:27:17 +1100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:32:24 +1100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.99.10
Hostname: master
Capacity:
cpu: 1
memory: 3881880Ki
pods: 110
Allocatable:
cpu: 1
memory: 3779480Ki
pods: 110
System Info:
Machine ID: ca0a351004604dd49e43f8a6258ddd77
System UUID: CA0A3510-0460-4DD4-9E43-F8A6258DDD77
Boot ID: e9060efa-42be-498d-8cb8-8b785b51b247
Kernel Version: 3.10.0-693.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.8.3
Kube-Proxy Version: v1.8.3
PodCIDR: 10.244.0.0/24
ExternalID: master
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-master 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-master 250m (25%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-master 200m (20%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-545bc4bfd4-mqqqk 260m (26%) 0 (0%) 110Mi (2%) 170Mi (4%)
kube-system kube-flannel-ds-hqlnb 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-t7z5w 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-master 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
810m (81%) 0 (0%) 110Mi (2%) 170Mi (4%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 38m kubelet, master Starting kubelet.
Normal NodeAllocatableEnforced 38m kubelet, master Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 37m (x8 over 38m) kubelet, master Node master status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 37m (x8 over 38m) kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 37m (x7 over 38m) kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal Starting 37m kube-proxy, master Starting kube-proxy.
Normal Starting 32m kubelet, master Starting kubelet.
Normal NodeAllocatableEnforced 32m kubelet, master Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 32m kubelet, master Node master status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 32m kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 32m kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal NodeNotReady 32m kubelet, master Node master status is now: NodeNotReady
Normal NodeReady 32m kubelet, master Node master status is now: NodeReady

the reason for this problem is that the nodes docker version diff the kubernetes need docker version .
You can directly uninstall docker, reinstall the specified version of docker on each nodes , next step restart docker, and node will be back online immediately.
And the docker-images and pods installed in this judge will not be affected because the physical folder is still there.
yum remove -y docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine
yum install -y docker-ce-18.09.7 docker-ce-cli-18.09.7 containerd.io
systemctl enable docker
systemctl start docker

I had exactly same issue, I've added parameters to ExecStart as mentioned above, but still getting same error. Then I've did kubeadm reset and systemctl daemon-reload and recreated cluster. This error seems to be gone. Testing now...

Related

How to fix kubernetes taint node.kubernetes.io/not-ready: NoSchedule

I am trying to run local development kubernetes cluster which runs in Docker Desktop context. But its just keeps having following taint: node.kubernetes.io/not-ready:NoSchedule.
Manally removing taints, ie kubectl taint nodes --all node.kubernetes.io/not-ready-, doesn't help, because it comes back right away
kubectl describe node, is:
Name: docker-desktop
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=docker-desktop
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 07 May 2021 11:00:31 +0100
Taints: node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: docker-desktop
AcquireTime: <unset>
RenewTime: Fri, 07 May 2021 16:14:19 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 11:00:31 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 11:00:31 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 11:00:31 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 16:11:05 +0100 KubeletNotReady PLEG is not healthy: pleg was last seen active 6m22.485400578s ago; threshold is 3m0s
Addresses:
InternalIP: 192.168.65.4
Hostname: docker-desktop
Capacity:
cpu: 5
ephemeral-storage: 61255492Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 18954344Ki
pods: 110
Allocatable:
cpu: 5
ephemeral-storage: 56453061334
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 18851944Ki
pods: 110
System Info:
Machine ID: f4da8f67-6e48-47f4-94f7-0a827259b845
System UUID: d07e4b6a-0000-0000-b65f-2398524d39c2
Boot ID: 431e1681-fdef-43db-9924-cb019ff53848
Kernel Version: 5.10.25-linuxkit
OS Image: Docker Desktop
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.6
Kubelet Version: v1.19.7
Kube-Proxy Version: v1.19.7
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1160m (23%) 1260m (25%)
memory 1301775360 (6%) 13288969216 (68%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeNotReady 86m (x2 over 90m) kubelet Node docker-desktop status is now: NodeNotReady
Normal NodeReady 85m (x3 over 5h13m) kubelet Node docker-desktop status is now: NodeReady
Normal Starting 61m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 61m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 61m (x8 over 61m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 61m (x7 over 61m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 61m (x8 over 61m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal Starting 60m kube-proxy Starting kube-proxy.
Normal NodeNotReady 55m kubelet Node docker-desktop status is now: NodeNotReady
Normal Starting 49m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 49m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientPID 49m (x7 over 49m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 49m (x8 over 49m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 49m (x8 over 49m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal Starting 48m kube-proxy Starting kube-proxy.
Normal NodeNotReady 41m kubelet Node docker-desktop status is now: NodeNotReady
Normal Starting 37m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 37m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientPID 37m (x7 over 37m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeHasNoDiskPressure 37m (x8 over 37m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientMemory 37m (x8 over 37m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal Starting 36m kube-proxy Starting kube-proxy.
Normal NodeAllocatableEnforced 21m kubelet Updated Node Allocatable limit across pods
Normal Starting 21m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 21m (x8 over 21m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 21m (x7 over 21m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeHasNoDiskPressure 21m (x8 over 21m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal Starting 21m kube-proxy Starting kube-proxy.
Normal NodeReady 6m16s (x2 over 14m) kubelet Node docker-desktop status is now: NodeReady
Normal NodeNotReady 3m16s (x3 over 15m) kubelet Node docker-desktop status is now: NodeNotReady
Allocated resources are quite significant, because the cluster is huge as well
CPU: 5GB
Memory: 18GB
SWAP: 1GB
Disk Image: 60GB
Machine: Mac Core i7, 32GB RAM, 512 GB SSD
I can see that the problem is with PLEG, but I need to understand what caused Pod Lifecycle Event Generator to result an error. Whether it's not sufficient allocated node resources or something else.
Any ideas?
In my case the problem was some super resource-hungry pods. Thus I had to downscale some deployments to be able to have a stable environment

1 out 5 fluentd is in ImagePullBackOff state

I have 1 master and 5 nodes k8s cluster. I am setting EFK with ref: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes#step-4-%E2%80%94-creating-the-fluentd-daemonset
While Creating the Fluentd DaemonSet, 1 out 5 fluentd is in ImagePullBackOff state :
kubectl get all -n kube-logging -o wide Tue Apr 21 03:49:26 2020
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES
SELECTOR
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-e
lasticsearch-1.1 app=fluentd
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-e
lasticsearch-1.1 app=fluentd
NAME READY STATUS RESTARTS AGE IP NODE
po/fluentd-82h6k 1/1 Running 1 1d 100.96.15.56 ip-172-20-52-52.us-west-1.compute.internal
po/fluentd-8ghjq 0/1 ImagePullBackOff 0 17h 100.96.10.170 ip-172-20-58-72.us-west-1.compute.internal
po/fluentd-fdmc8 1/1 Running 1 1d 100.96.3.73 ip-172-20-63-147.us-west-1.compute.internal
po/fluentd-g7755 1/1 Running 1 1d 100.96.2.22 ip-172-20-60-101.us-west-1.compute.internal
po/fluentd-gj8q8 1/1 Running 1 1d 100.96.16.17 ip-172-20-57-232.us-west-1.compute.internal
admin#ip-172-20-58-79:~$ kubectl describe po/fluentd-8ghjq -n kube-logging
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 12m (x4364 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Back-off pulling image "fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1"
Warning FailedSync 2m (x4612 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Error syncing pod
Kubelet logs on node which is failing to run Fulentd
admin#ip-172-20-58-72:~$ journalctl -u kubelet -f
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095334 755 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095369 755 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: W0421 03:53:53.095440 755 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Apr 21 03:53:54 ip-172-20-58-72 kubelet[755]: I0421 03:53:54.882213 755 server.go:779] GET /metrics/cadvisor: (50.308555ms) 200 [[Prometheus/2.12.0] 172.20.58.79:54492]
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: I0421 03:53:55.452951 755 kuberuntime_manager.go:500] Container {Name:fluentd Image:fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:FLUENT_ELASTICSEARCH_HOST Value:vpc-cog-01-es-dtpgkfi.ap-southeast-1.es.amazonaws.com ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_PORT Value:443 ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_SCHEME Value:https ValueFrom:nil} {Name:FLUENTD_SYSTEMD_CONF Value:disable ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:varlog ReadOnly:false MountPath:/var/log SubPath: MountPropagation:<nil>} {Name:varlibdockercontainers ReadOnly:true MountPath:/var/lib/docker/containers SubPath: MountPropagation:<nil>} {Name:fluentd-token-k8fnp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: E0421 03:53:55.455327 755 pod_workers.go:182] Error syncing pod aa65dd30-82f2-11ea-a005-0607d7cb72ed ("fluentd-8ghjq_kube-logging(aa65dd30-82f2-11ea-a005-0607d7cb72ed)"), skipping: failed to "StartContainer" for "fluentd" with ImagePullBackOff: "Back-off pulling image \"fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1\""
Kubelet logs on the node which is running Fulentd successfully
admin#ip-172-20-63-147:~$ journalctl -u kubelet -f
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874293 1272 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874336 1272 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: W0421 04:09:25.874453 1272 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available

Getting ImageInspectError when trying to run an OpenFaas function on Raspberry Pi 3B+

I'm trying to deploy a function with OpenFaas project and a kubernetes cluster running on 2 Raspberry Pi 3B+.
Unfortunately, the pod that should handle the function is going to ImageInspectError state...
I tried to run the function with Docker directly and which is contain in a Docker image, and everything is working fine.
I opened an issue on the OpenFaas github and the maintainer told me to ask directly the Kubernetes community to get some clues.
My first question is : What does ImageInspectError mean and where it comes from ?
And here is all the informations I have :
Expected Behaviour
Pod should run.
Current Behaviour
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-masternode 1/1 Running 1 1d
kube-system kube-apiserver-masternode 1/1 Running 1 1d
kube-system kube-controller-manager-masternode 1/1 Running 1 1d
kube-system kube-dns-7f9b64f644-x42sr 3/3 Running 3 1d
kube-system kube-proxy-wrp6f 1/1 Running 1 1d
kube-system kube-proxy-x6pvq 1/1 Running 1 1d
kube-system kube-scheduler-masternode 1/1 Running 1 1d
kube-system weave-net-4995q 2/2 Running 3 1d
kube-system weave-net-5g7pd 2/2 Running 3 1d
openfaas-fn figlet-7f556fcd87-wrtf4 1/1 Running 0 4h
openfaas-fn testfaceraspi-7f6fcb5897-rs4cq 0/1 ImageInspectError 0 2h
openfaas alertmanager-66b98dd4d4-kcsq4 1/1 Running 1 1d
openfaas faas-netesd-5b5d6d5648-mqftl 1/1 Running 1 1d
openfaas gateway-846f8b5686-724q8 1/1 Running 2 1d
openfaas nats-86955fb749-7vsbm 1/1 Running 1 1d
openfaas prometheus-6ffc57bb8f-fpk6r 1/1 Running 1 1d
openfaas queue-worker-567bcf4d47-ngsgv 1/1 Running 2 1d
The testfaceraspi doesn't run.
Logs from the pod :
$ kubectl logs testfaceraspi-7f6fcb5897-rs4cq -n openfaas-fn
Error from server (BadRequest): container "testfaceraspi" in pod "testfaceraspi-7f6fcb5897-rs4cq" is waiting to start: ImageInspectError
Pod describe :
$ kubectl describe pod -n openfaas-fn testfaceraspi-7f6fcb5897-rs4cq
Name: testfaceraspi-7f6fcb5897-rs4cq
Namespace: openfaas-fn
Node: workernode/10.192.79.198
Start Time: Thu, 12 Jul 2018 11:39:05 +0200
Labels: faas_function=testfaceraspi
pod-template-hash=3929761453
Annotations: prometheus.io.scrape=false
Status: Pending
IP: 10.40.0.16
Controlled By: ReplicaSet/testfaceraspi-7f6fcb5897
Containers:
testfaceraspi:
Container ID:
Image: gallouche/testfaceraspi
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImageInspectError
Ready: False
Restart Count: 0
Liveness: exec [cat /tmp/.lock] delay=3s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [cat /tmp/.lock] delay=3s timeout=1s period=10s #success=1 #failure=3
Environment:
fprocess: python3 index.py
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5qhnn (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-5qhnn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5qhnn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning DNSConfigForming 2m (x1019 over 3h) kubelet, workernode Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openfaas-fn.svc.cluster.local svc.cluster.local cluster.local heig-vd.ch einet.ad.eivd.ch web.ad.eivd.ch
And the event logs :
$ kubectl get events --sort-by=.metadata.creationTimestamp -n openfaas-fn
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
14m 1h 347 testfaceraspi-7f6fcb5897-rs4cq.1540db41e89d4c52 Pod Warning DNSConfigForming kubelet, workernode Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openfaas-fn.svc.cluster.local svc.cluster.local cluster.local heig-vd.ch einet.ad.eivd.ch web.ad.eivd.ch
4m 1h 75 figlet-7f556fcd87-wrtf4.1540db421002b49e Pod Warning DNSConfigForming kubelet, workernode Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openfaas-fn.svc.cluster.local svc.cluster.local cluster.local heig-vd.ch einet.ad.eivd.ch web.ad.eivd.ch
10m 10m 1 testfaceraspi-7f6fcb5897-d6z78.1540df9ed8b91865 Pod Normal Scheduled default-scheduler Successfully assigned testfaceraspi-7f6fcb5897-d6z78 to workernode
10m 10m 1 testfaceraspi-7f6fcb5897.1540df9ed6eee11f ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: testfaceraspi-7f6fcb5897-d6z78
10m 10m 1 testfaceraspi-7f6fcb5897-d6z78.1540df9eef3ef504 Pod Normal SuccessfulMountVolume kubelet, workernode MountVolume.SetUp succeeded for volume "default-token-5qhnn"
4m 10m 27 testfaceraspi-7f6fcb5897-d6z78.1540df9eef5445c0 Pod Warning DNSConfigForming kubelet, workernode Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openfaas-fn.svc.cluster.local svc.cluster.local cluster.local heig-vd.ch einet.ad.eivd.ch web.ad.eivd.ch
8m 9m 8 testfaceraspi-7f6fcb5897-d6z78.1540df9f670d0dad Pod spec.containers{testfaceraspi} Warning InspectFailed kubelet, workernode Failed to inspect image "gallouche/testfaceraspi": rpc error: code = Unknown desc = Error response from daemon: readlink /var/lib/docker/overlay2/l: invalid argument
9m 9m 7 testfaceraspi-7f6fcb5897-d6z78.1540df9f670fcf3e Pod spec.containers{testfaceraspi} Warning Failed kubelet, workernode Error: ImageInspectError
Steps to Reproduce (for bugs)
Deploy OpenFaas on a 2 node k8s cluster
Create function with faas new testfaceraspi --lang python3-armhf
Add the following code in the handler.py :
import json
def handle(req):
jsonl = json.loads(req)
return ("Found " + str(jsonl["nbFaces"]) + " faces in OpenFaas Function on raspi !")
Change gateway and image in the .yml
provider:
name: faas
gateway: http://127.0.0.1:31112
functions:
testfaceraspi:
lang: python3-armhf
handler: ./testfaceraspi
image: gallouche/testfaceraspi
Run faas build -f testfacepi.yml
Login in DockerHub with docker login
Run faas push -f testfacepi.yml
Run faas deploy -f testfacepi.yml
Your Environment
FaaS-CLI version ( Full output from: faas-cli version ):
Commit: 3995a8197f1df1ecdf524844477cffa04e4690ea
Version: 0.6.11
Docker version ( Full output from: docker version ):
Client:
Version: 18.04.0-ce
API version: 1.37
Go version: go1.9.4
Git commit: 3d479c0
Built: Tue Apr 10 18:25:24 2018
OS/Arch: linux/arm
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.04.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.4
Git commit: 3d479c0
Built: Tue Apr 10 18:21:25 2018
OS/Arch: linux/arm
Experimental: false
Operating System and version (e.g. Linux, Windows, MacOS):
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 9.4 (stretch)
Release: 9.4
Codename: stretch
Thanks by advance and tell me if you need any more informations.
Gallouche
I've seen this error because the docker version wasn't supported by Kubernetes. As of Kubernetes version 1.11, the supported versions are 1.11.2 to 1.13.1 and 17.03.x.
I couldn't test the solution with OpenFaaS.

pod creation stuck in ContainerCreating state

I have created a k8s cluster with RHEL7 with kubernetes packages GitVersion:"v1.8.1". I'm trying to deploy wordpress on my custom cluster. But pod creation is always stuck in ContainerCreating state.
phani#k8s-master]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-766d75457d-zlvdn 0/1 ContainerCreating 0 11m
kube-system etcd-k8s-master 1/1 Running 0 1h
kube-system kube-apiserver-k8s-master 1/1 Running 0 1h
kube-system kube-controller-manager-k8s-master 1/1 Running 0 1h
kube-system kube-dns-545bc4bfd4-bb8js 3/3 Running 0 1h
kube-system kube-proxy-bf4zr 1/1 Running 0 1h
kube-system kube-proxy-d7zvg 1/1 Running 0 34m
kube-system kube-scheduler-k8s-master 1/1 Running 0 1h
kube-system weave-net-92zf9 2/2 Running 0 34m
kube-system weave-net-sh7qk 2/2 Running 0 1h
Docker Version:1.13.1
Pod status from descibe command
Normal Scheduled 18m default-scheduler Successfully assigned wordpress-766d75457d-zlvdn to worker1
Normal SuccessfulMountVolume 18m kubelet, worker1 MountVolume.SetUp succeeded for volume "default-token-tmpcm"
Warning DNSSearchForming 18m kubelet, worker1 Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local
Warning FailedCreatePodSandBox 14m kubelet, worker1 Failed create pod sandbox.
Warning FailedSync 25s (x8 over 14m) kubelet, worker1 Error syncing pod
Normal SandboxChanged 24s (x8 over 14m) kubelet, worker1 Pod sandbox changed, it will be killed and re-created.
from the kubelet log I observed below error on worker
error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
But kubelet is stable no problems seen on worker.
How do I solve this problem?
I checked the cni failure, I couldn't find anything.
~]# ls /opt/cni/bin
bridge cnitool dhcp flannel host-local ipvlan loopback macvlan noop ptp tuning weave-ipam weave-net weave-plugin-2.3.0
In journal logs below messages are repetitively appeared . seems like scheduler is trying to create the container all the time.
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421184 14339 remote_runtime.go:115] StopPodSandbox "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421212 14339 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304"}
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421247 14339 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421262 14339 pod_workers.go:182] Error syncing pod 7f1c6bf1-6af3-11e8-856b-fa163e3d1891 ("wordpress-766d75457d-spdrb_default(7f1c6bf1-6af3-11e8-856b-fa163e3d1891)"), skipping: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Failed create pod sandbox.
... is almost always a CNI failure; I would check on the node that all the weave containers are happy, and that /opt/cni/bin is present (or its weave equivalent)
You may have to check both the journalctl -u kubelet.service as well as the docker logs for any containers running to discover the full scope of the error on the node.
It's seem to working by removing the$KUBELET_NETWORK_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
I have removed $KUBELET_NETWORK_ARGS and restarted the worker node then pods got deployed successfully.
As Matthew said it's most likely a CNI failure.
First, find the node this pod is running on:
kubectl get po wordpress-766d75457d-zlvdn -o wide
Next in the node where the pod is located check /etc/cni/net.d if you have more than one .conf then you can delete one and restart the node.
source: https://github.com/kubernetes/kubeadm/issues/578.
note this is one of the solutions.
While hopefully it's no one else's problem, for me, this happened when part of my filesystem was full.
I had pods stuck in ContainerCreating only on one node in my cluster. I also had a bunch of pods which I expected to shutdown, but hadn't. Someone recommended running
sudo systemctl status kubelet -l
which showed me a bunch of lines like
Jun 18 23:19:56 worker01 kubelet[1718]: E0618 23:19:56.461378 1718 kuberuntime_manager.go:647] createPodSandbox for pod "REDACTED(2c681b9c-cf5b-11eb-9c79-52540077cc53)" failed: mkdir /var/log/pods/2c681b9c-cf5b-11eb-9c79-52540077cc53: no space left on device
I confirmed that I was out of space with
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 189G 0 189G 0% /dev
tmpfs 189G 0 189G 0% /sys/fs/cgroup
/dev/mapper/vg01-root 20G 7.0G 14G 35% /
/dev/mapper/vg01-tmp 4.0G 34M 4.0G 1% /tmp
/dev/mapper/vg01-home 4.0G 72M 4.0G 2% /home
/dev/mapper/vg01-varlog 10G 10G 20K 100% /var/log
/dev/mapper/vg01-varlogaudit 2.0G 68M 2.0G 4% /var/log/audit
I just had to clear out that dir (and did some manual cleanup on all the pending pods and pods that were stuck running).

kubernetes installation and kube-dns: open /run/flannel/subnet.env: no such file or directory

Overview
kube-dns can't start (SetupNetworkError) after kubeadm init and network setup:
Error syncing pod, skipping: failed to "SetupNetwork" for
"kube-dns-654381707-w4mpg_kube-system" with SetupNetworkError:
"Failed to setup network for pod
\"kube-dns-654381707-w4mpg_kube-system(8ffe3172-a739-11e6-871f-000c2912631c)\"
using network plugins \"cni\": open /run/flannel/subnet.env:
no such file or directory; Skipping pod"
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:48:38Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:42:39Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Environment
VMWare Fusion for Mac
OS
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a)
Linux ubuntu-master 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
What is the problem
kube-system kube-dns-654381707-w4mpg 0/3 ContainerCreating 0 2m
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 3m 1 {default-scheduler } Normal Scheduled Successfully assigned kube-dns-654381707-w4mpg to ubuntu-master
2m 1s 177 {kubelet ubuntu-master} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "kube-dns-654381707-w4mpg_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-654381707-w4mpg_kube-system(8ffe3172-a739-11e6-871f-000c2912631c)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
What I expected to happen
kube-dns Running
How to reproduce it
root#ubuntu-master:~# kubeadm init
Running pre-flight checks
<master/tokens> generated token: "247a8e.b7c8c1a7685bf204"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2016-11-10 11:40:21 +0000 UTC Not After: 2026-11-08 11:40:21 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2016-11-10 11:40:21 +0000 UTC Not After: 2017-11-10 11:40:21 +0000 UTC
Alternate Names: [172.20.10.4 10.96.0.1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 14.053453 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 0.508561 seconds
<master/apiclient> attempting a test deployment
<master/apiclient> test deployment succeeded
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 1.503838 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns
Kubernetes master initialised successfully!
You can now join any number of machines by running the following on each node:
kubeadm join --token=247a8e.b7c8c1a7685bf204 172.20.10.4
root#ubuntu-master:~#
root#ubuntu-master:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-eo1ua 1/1 Running 0 47s
kube-system etcd-ubuntu-master 1/1 Running 3 51s
kube-system kube-apiserver-ubuntu-master 1/1 Running 0 49s
kube-system kube-controller-manager-ubuntu-master 1/1 Running 3 51s
kube-system kube-discovery-1150918428-qmu0b 1/1 Running 0 46s
kube-system kube-dns-654381707-mv47d 0/3 ContainerCreating 0 44s
kube-system kube-proxy-k0k9q 1/1 Running 0 44s
kube-system kube-scheduler-ubuntu-master 1/1 Running 3 51s
root#ubuntu-master:~#
root#ubuntu-master:~# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
root#ubuntu-master:~#
root#ubuntu-master:~#
root#ubuntu-master:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-eo1ua 1/1 Running 0 47s
kube-system etcd-ubuntu-master 1/1 Running 3 51s
kube-system kube-apiserver-ubuntu-master 1/1 Running 0 49s
kube-system kube-controller-manager-ubuntu-master 1/1 Running 3 51s
kube-system kube-discovery-1150918428-qmu0b 1/1 Running 0 46s
kube-system kube-dns-654381707-mv47d 0/3 ContainerCreating 0 44s
kube-system kube-proxy-k0k9q 1/1 Running 0 44s
kube-system kube-scheduler-ubuntu-master 1/1 Running 3 51s
kube-system weave-net-ja736 2/2 Running 0 1h
It looks like you have configured flannel before running kubeadm init. You can try to fix this by removing flannel (it may be sufficient to remove config file rm -f /etc/cni/net.d/*flannel*), but it's best to start fresh.
open below file location(if exists, either create) and paste below data
vim /run/flannel/subnet.env
FLANNEL_NETWORK=10.240.0.0/16
FLANNEL_SUBNET=10.240.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Resources