How to fix kubernetes taint node.kubernetes.io/not-ready: NoSchedule - docker

I am trying to run a local development Kubernetes cluster in the Docker Desktop context, but it keeps getting the following taint: node.kubernetes.io/not-ready:NoSchedule.
Manually removing the taint, i.e. kubectl taint nodes --all node.kubernetes.io/not-ready-, doesn't help, because it comes back right away.
The output of kubectl describe node is:
Name: docker-desktop
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=docker-desktop
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 07 May 2021 11:00:31 +0100
Taints: node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: docker-desktop
AcquireTime: <unset>
RenewTime: Fri, 07 May 2021 16:14:19 +0100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 11:00:31 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 11:00:31 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 11:00:31 +0100 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Fri, 07 May 2021 16:14:05 +0100 Fri, 07 May 2021 16:11:05 +0100 KubeletNotReady PLEG is not healthy: pleg was last seen active 6m22.485400578s ago; threshold is 3m0s
Addresses:
InternalIP: 192.168.65.4
Hostname: docker-desktop
Capacity:
cpu: 5
ephemeral-storage: 61255492Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 18954344Ki
pods: 110
Allocatable:
cpu: 5
ephemeral-storage: 56453061334
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 18851944Ki
pods: 110
System Info:
Machine ID: f4da8f67-6e48-47f4-94f7-0a827259b845
System UUID: d07e4b6a-0000-0000-b65f-2398524d39c2
Boot ID: 431e1681-fdef-43db-9924-cb019ff53848
Kernel Version: 5.10.25-linuxkit
OS Image: Docker Desktop
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.6
Kubelet Version: v1.19.7
Kube-Proxy Version: v1.19.7
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1160m (23%) 1260m (25%)
memory 1301775360 (6%) 13288969216 (68%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeNotReady 86m (x2 over 90m) kubelet Node docker-desktop status is now: NodeNotReady
Normal NodeReady 85m (x3 over 5h13m) kubelet Node docker-desktop status is now: NodeReady
Normal Starting 61m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 61m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 61m (x8 over 61m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 61m (x7 over 61m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 61m (x8 over 61m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal Starting 60m kube-proxy Starting kube-proxy.
Normal NodeNotReady 55m kubelet Node docker-desktop status is now: NodeNotReady
Normal Starting 49m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 49m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientPID 49m (x7 over 49m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory 49m (x8 over 49m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 49m (x8 over 49m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal Starting 48m kube-proxy Starting kube-proxy.
Normal NodeNotReady 41m kubelet Node docker-desktop status is now: NodeNotReady
Normal Starting 37m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 37m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientPID 37m (x7 over 37m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeHasNoDiskPressure 37m (x8 over 37m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientMemory 37m (x8 over 37m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal Starting 36m kube-proxy Starting kube-proxy.
Normal NodeAllocatableEnforced 21m kubelet Updated Node Allocatable limit across pods
Normal Starting 21m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 21m (x8 over 21m) kubelet Node docker-desktop status is now: NodeHasSufficientMemory
Normal NodeHasSufficientPID 21m (x7 over 21m) kubelet Node docker-desktop status is now: NodeHasSufficientPID
Normal NodeHasNoDiskPressure 21m (x8 over 21m) kubelet Node docker-desktop status is now: NodeHasNoDiskPressure
Normal Starting 21m kube-proxy Starting kube-proxy.
Normal NodeReady 6m16s (x2 over 14m) kubelet Node docker-desktop status is now: NodeReady
Normal NodeNotReady 3m16s (x3 over 15m) kubelet Node docker-desktop status is now: NodeNotReady
The allocated resources are quite significant, because the cluster is large as well (Docker Desktop settings):
CPUs: 5
Memory: 18GB
Swap: 1GB
Disk image size: 60GB
Machine: Mac Core i7, 32GB RAM, 512GB SSD
I can see that the problem is with PLEG, but I need to understand what causes the Pod Lifecycle Event Generator to report an error, whether it's insufficient allocated node resources or something else.
Any ideas?

In my case the problem was some extremely resource-hungry pods, so I had to downscale some deployments to get a stable environment.
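For anyone hitting the same thing, a minimal sketch of that approach (the deployment name is a placeholder; kubectl top needs metrics-server and a reasonably recent kubectl):
kubectl top pods --all-namespaces --sort-by=memory        # find the hungriest pods
kubectl scale deployment <deployment-name> --replicas=1   # downscale the offending deployment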

Related

1 out of 5 fluentd pods is in ImagePullBackOff state

I have a k8s cluster with 1 master and 5 nodes. I am setting up EFK following this reference: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes#step-4-%E2%80%94-creating-the-fluentd-daemonset
While creating the Fluentd DaemonSet, 1 out of 5 fluentd pods is in ImagePullBackOff state:
kubectl get all -n kube-logging -o wide Tue Apr 21 03:49:26 2020
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 app=fluentd
NAME READY STATUS RESTARTS AGE IP NODE
po/fluentd-82h6k 1/1 Running 1 1d 100.96.15.56 ip-172-20-52-52.us-west-1.compute.internal
po/fluentd-8ghjq 0/1 ImagePullBackOff 0 17h 100.96.10.170 ip-172-20-58-72.us-west-1.compute.internal
po/fluentd-fdmc8 1/1 Running 1 1d 100.96.3.73 ip-172-20-63-147.us-west-1.compute.internal
po/fluentd-g7755 1/1 Running 1 1d 100.96.2.22 ip-172-20-60-101.us-west-1.compute.internal
po/fluentd-gj8q8 1/1 Running 1 1d 100.96.16.17 ip-172-20-57-232.us-west-1.compute.internal
admin@ip-172-20-58-79:~$ kubectl describe po/fluentd-8ghjq -n kube-logging
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 12m (x4364 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Back-off pulling image "fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1"
Warning FailedSync 2m (x4612 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Error syncing pod
Kubelet logs on the node which is failing to run Fluentd:
admin@ip-172-20-58-72:~$ journalctl -u kubelet -f
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095334 755 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095369 755 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: W0421 03:53:53.095440 755 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Apr 21 03:53:54 ip-172-20-58-72 kubelet[755]: I0421 03:53:54.882213 755 server.go:779] GET /metrics/cadvisor: (50.308555ms) 200 [[Prometheus/2.12.0] 172.20.58.79:54492]
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: I0421 03:53:55.452951 755 kuberuntime_manager.go:500] Container {Name:fluentd Image:fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:FLUENT_ELASTICSEARCH_HOST Value:vpc-cog-01-es-dtpgkfi.ap-southeast-1.es.amazonaws.com ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_PORT Value:443 ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_SCHEME Value:https ValueFrom:nil} {Name:FLUENTD_SYSTEMD_CONF Value:disable ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:varlog ReadOnly:false MountPath:/var/log SubPath: MountPropagation:<nil>} {Name:varlibdockercontainers ReadOnly:true MountPath:/var/lib/docker/containers SubPath: MountPropagation:<nil>} {Name:fluentd-token-k8fnp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: E0421 03:53:55.455327 755 pod_workers.go:182] Error syncing pod aa65dd30-82f2-11ea-a005-0607d7cb72ed ("fluentd-8ghjq_kube-logging(aa65dd30-82f2-11ea-a005-0607d7cb72ed)"), skipping: failed to "StartContainer" for "fluentd" with ImagePullBackOff: "Back-off pulling image \"fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1\""
Kubelet logs on the node which is running Fluentd successfully:
admin@ip-172-20-63-147:~$ journalctl -u kubelet -f
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874293 1272 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874336 1272 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: W0421 04:09:25.874453 1272 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
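A common first step to narrow down an ImagePullBackOff is to try the pull manually on the affected node (assumes SSH access and the Docker runtime; the hostname and image are taken from the output above):
ssh admin@ip-172-20-58-72
docker pull fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1   # any error here points at registry/network/auth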

pod creation stuck in ContainerCreating state

I have created a k8s cluster on RHEL7 with Kubernetes packages GitVersion:"v1.8.1". I'm trying to deploy WordPress on my custom cluster, but pod creation is always stuck in ContainerCreating state.
[phani@k8s-master]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-766d75457d-zlvdn 0/1 ContainerCreating 0 11m
kube-system etcd-k8s-master 1/1 Running 0 1h
kube-system kube-apiserver-k8s-master 1/1 Running 0 1h
kube-system kube-controller-manager-k8s-master 1/1 Running 0 1h
kube-system kube-dns-545bc4bfd4-bb8js 3/3 Running 0 1h
kube-system kube-proxy-bf4zr 1/1 Running 0 1h
kube-system kube-proxy-d7zvg 1/1 Running 0 34m
kube-system kube-scheduler-k8s-master 1/1 Running 0 1h
kube-system weave-net-92zf9 2/2 Running 0 34m
kube-system weave-net-sh7qk 2/2 Running 0 1h
Docker version: 1.13.1
Pod status from the describe command:
Normal Scheduled 18m default-scheduler Successfully assigned wordpress-766d75457d-zlvdn to worker1
Normal SuccessfulMountVolume 18m kubelet, worker1 MountVolume.SetUp succeeded for volume "default-token-tmpcm"
Warning DNSSearchForming 18m kubelet, worker1 Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local
Warning FailedCreatePodSandBox 14m kubelet, worker1 Failed create pod sandbox.
Warning FailedSync 25s (x8 over 14m) kubelet, worker1 Error syncing pod
Normal SandboxChanged 24s (x8 over 14m) kubelet, worker1 Pod sandbox changed, it will be killed and re-created.
From the kubelet log I observed the below error on the worker:
error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
But the kubelet is stable, with no problems seen on the worker.
How do I solve this problem?
I checked for a CNI failure but couldn't find anything:
~]# ls /opt/cni/bin
bridge cnitool dhcp flannel host-local ipvlan loopback macvlan noop ptp tuning weave-ipam weave-net weave-plugin-2.3.0
The messages below appear repeatedly in the journal logs; it seems like the kubelet is trying to re-create the container all the time.
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421184 14339 remote_runtime.go:115] StopPodSandbox "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421212 14339 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304"}
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421247 14339 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421262 14339 pod_workers.go:182] Error syncing pod 7f1c6bf1-6af3-11e8-856b-fa163e3d1891 ("wordpress-766d75457d-spdrb_default(7f1c6bf1-6af3-11e8-856b-fa163e3d1891)"), skipping: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Failed create pod sandbox.
... is almost always a CNI failure; I would check on the node that all the weave containers are happy and that /opt/cni/bin is present (or its weave equivalent).
You may have to check both journalctl -u kubelet.service and the docker logs for any running containers to discover the full scope of the error on the node.
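For example (the container ID is a placeholder):
journalctl -u kubelet.service --since "1 hour ago"   # kubelet-side errors
docker ps -a                                         # find the weave / failing containers
docker logs <container-id>                           # runtime-side errors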
It seems to work after removing $KUBELET_NETWORK_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.
I removed $KUBELET_NETWORK_ARGS and restarted the worker node, and the pods got deployed successfully.
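A sketch of those steps, assuming the stock kubeadm drop-in path named above (you can just as well edit the file by hand; note this strips the CNI arguments the kubelet is started with):
sudo sed -i 's/\$KUBELET_NETWORK_ARGS//g' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet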
As Matthew said, it's most likely a CNI failure.
First, find the node this pod is running on:
kubectl get po wordpress-766d75457d-zlvdn -o wide
Next, on the node where the pod is located, check /etc/cni/net.d; if you have more than one .conf there, you can delete one and restart the node (a sketch follows below).
Source: https://github.com/kubernetes/kubeadm/issues/578.
Note this is only one of the possible solutions.
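A sketch of that check (the duplicate filename here is hypothetical):
ls /etc/cni/net.d
# if e.g. both 10-weave.conf and 10-flannel.conf show up, remove the unused one:
sudo rm /etc/cni/net.d/10-flannel.conf
sudo reboot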
While hopefully it's no one else's problem, for me this happened when part of my filesystem was full.
I had pods stuck in ContainerCreating only on one node in my cluster. I also had a bunch of pods which I expected to shut down, but which hadn't. Someone recommended running
sudo systemctl status kubelet -l
which showed me a bunch of lines like
Jun 18 23:19:56 worker01 kubelet[1718]: E0618 23:19:56.461378 1718 kuberuntime_manager.go:647] createPodSandbox for pod "REDACTED(2c681b9c-cf5b-11eb-9c79-52540077cc53)" failed: mkdir /var/log/pods/2c681b9c-cf5b-11eb-9c79-52540077cc53: no space left on device
I confirmed that I was out of space with
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 189G 0 189G 0% /dev
tmpfs 189G 0 189G 0% /sys/fs/cgroup
/dev/mapper/vg01-root 20G 7.0G 14G 35% /
/dev/mapper/vg01-tmp 4.0G 34M 4.0G 1% /tmp
/dev/mapper/vg01-home 4.0G 72M 4.0G 2% /home
/dev/mapper/vg01-varlog 10G 10G 20K 100% /var/log
/dev/mapper/vg01-varlogaudit 2.0G 68M 2.0G 4% /var/log/audit
I just had to clear out that directory (and did some manual cleanup of all the pending pods and the pods that were stuck running).
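If, as here, it is /var/log that filled up and the systemd journal is a large part of it, one way to reclaim space (the size cap is illustrative):
sudo journalctl --vacuum-size=500M   # shrink archived journal logs
sudo du -sh /var/log/pods/*          # check what the pod log directories still hold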

Kubelet failed to get cgroup stats for "/system.slice/docker.service"

Question
What the kubelet (1.8.3 on CentOS 7) error message below actually means and how to resolve it:
Nov 19 22:32:24 master kubelet[4425]: E1119 22:32:24.269786 4425 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get con
Nov 19 22:32:24 master kubelet[4425]: E1119 22:32:24.269802 4425 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get conta
Research
Found the same error reported and followed the workaround of updating the kubelet service unit as below, but it did not work:
kubelet fails to get cgroup stats for docker and kubelet services
/etc/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/
[Service]
ExecStart=/usr/bin/kubelet --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
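Note that unit-file edits like the one above only take effect after a reload and restart (standard systemd workflow):
sudo systemctl daemon-reload
sudo systemctl restart kubelet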
Background
Setting up a Kubernetes cluster by following Install kubeadm. The Installing Docker section of the document says the following about aligning the cgroup driver:
Note: Make sure that the cgroup driver used by kubelet is the same as the one used by Docker. To ensure compatibility you can either update Docker, like so:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
But doing so caused the docker service to fail to start with:
unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag".
Nov 19 16:55:56 localhost.localdomain systemd1: docker.service: main process exited, code=exited, status=1/FAILURE.
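That Docker error generally means the same option is set both as a dockerd command-line flag and in daemon.json; one way to see which flags the packaged service already passes (standard systemd command):
systemctl cat docker.service | grep ExecStart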
The master node is Ready, with all system pods running.
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-master 1/1 Running 0 39m
kube-system kube-apiserver-master 1/1 Running 0 39m
kube-system kube-controller-manager-master 1/1 Running 0 39m
kube-system kube-dns-545bc4bfd4-mqqqk 3/3 Running 0 40m
kube-system kube-flannel-ds-fclcs 1/1 Running 2 13m
kube-system kube-flannel-ds-hqlnb 1/1 Running 0 39m
kube-system kube-proxy-t7z5w 1/1 Running 0 40m
kube-system kube-proxy-xdw42 1/1 Running 0 13m
kube-system kube-scheduler-master 1/1 Running 0 39m
Environment
Kubernetes 1.8.3 on CentOS with Flannel.
$ kubectl version -o json | python -m json.tool
{
"clientVersion": {
"buildDate": "2017-11-08T18:39:33Z",
"compiler": "gc",
"gitCommit": "f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd",
"gitTreeState": "clean",
"gitVersion": "v1.8.3",
"goVersion": "go1.8.3",
"major": "1",
"minor": "8",
"platform": "linux/amd64"
},
"serverVersion": {
"buildDate": "2017-11-08T18:27:48Z",
"compiler": "gc",
"gitCommit": "f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd",
"gitTreeState": "clean",
"gitVersion": "v1.8.3",
"goVersion": "go1.8.3",
"major": "1",
"minor": "8",
"platform": "linux/amd64"
}
}
$ kubectl describe node master
Name: master
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=master
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data={"VtepMAC":"86:b6:7a:d6:7b:b3"}
flannel.alpha.coreos.com/backend-type=vxlan
flannel.alpha.coreos.com/kube-subnet-manager=true
flannel.alpha.coreos.com/public-ip=10.0.2.15
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: node-role.kubernetes.io/master:NoSchedule
CreationTimestamp: Sun, 19 Nov 2017 22:27:17 +1100
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:27:13 +1100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Sun, 19 Nov 2017 23:04:56 +1100 Sun, 19 Nov 2017 22:32:24 +1100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.99.10
Hostname: master
Capacity:
cpu: 1
memory: 3881880Ki
pods: 110
Allocatable:
cpu: 1
memory: 3779480Ki
pods: 110
System Info:
Machine ID: ca0a351004604dd49e43f8a6258ddd77
System UUID: CA0A3510-0460-4DD4-9E43-F8A6258DDD77
Boot ID: e9060efa-42be-498d-8cb8-8b785b51b247
Kernel Version: 3.10.0-693.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.8.3
Kube-Proxy Version: v1.8.3
PodCIDR: 10.244.0.0/24
ExternalID: master
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-master 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-master 250m (25%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-master 200m (20%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-545bc4bfd4-mqqqk 260m (26%) 0 (0%) 110Mi (2%) 170Mi (4%)
kube-system kube-flannel-ds-hqlnb 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-t7z5w 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-master 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
810m (81%) 0 (0%) 110Mi (2%) 170Mi (4%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 38m kubelet, master Starting kubelet.
Normal NodeAllocatableEnforced 38m kubelet, master Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 37m (x8 over 38m) kubelet, master Node master status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 37m (x8 over 38m) kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 37m (x7 over 38m) kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal Starting 37m kube-proxy, master Starting kube-proxy.
Normal Starting 32m kubelet, master Starting kubelet.
Normal NodeAllocatableEnforced 32m kubelet, master Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 32m kubelet, master Node master status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 32m kubelet, master Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 32m kubelet, master Node master status is now: NodeHasNoDiskPressure
Normal NodeNotReady 32m kubelet, master Node master status is now: NodeNotReady
Normal NodeReady 32m kubelet, master Node master status is now: NodeReady
The reason for this problem is that the node's Docker version differs from the Docker version Kubernetes requires.
You can uninstall Docker, reinstall the specified version on each node, then restart Docker, and the node will be back online immediately.
The Docker images and pods on the node will not be affected, because the physical folders are still there.
yum remove -y docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine
yum install -y docker-ce-18.09.7 docker-ce-cli-18.09.7 containerd.io
systemctl enable docker
systemctl start docker
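After the reinstall, the node's status can be confirmed from the master:
kubectl get nodes   # the node should report Ready again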
I had exactly the same issue. I added the parameters to ExecStart as mentioned above, but was still getting the same error. Then I did kubeadm reset and systemctl daemon-reload and recreated the cluster. The error seems to be gone. Testing now...

docker microservice apps restart over and over again in kubernetes

I am trying to run microservice applications with Kubernetes. I have rabbitmq, elasticsearch and the eureka discovery service running on Kubernetes. Other than that, I have three microservice applications. When I run two of them, it is fine; however, when I run the third one, they all begin restarting over and over again for no apparent reason.
One of my config files:
apiVersion: v1
kind: Service
metadata:
  name: hrm
  labels:
    app: suite
spec:
  type: NodePort
  ports:
  - port: 8086
    nodePort: 30001
  selector:
    app: suite
    tier: hrm-core
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hrm
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: suite
        tier: hrm-core
    spec:
      containers:
      - image: privaterepo/hrm-core
        name: hrm
        ports:
        - containerPort: 8086
      imagePullSecrets:
      - name: regsecret
Result from kubectl describe pod hrm:
State: Running
Started: Mon, 12 Jun 2017 12:08:28 +0300
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 12 Jun 2017 12:07:05 +0300
Ready: True
Restart Count: 5
18m 18m 1 kubelet, minikube Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "hrm" with CrashLoopBackOff: "Back-off 10s restarting failed container=hrm pod=hrm-3288407936-cwvgz_default(915fb55c-4f4a-11e7-9240-080027ccf1c3)"
kubectl get pods:
NAME READY STATUS RESTARTS AGE
discserv-189146465-s599x 1/1 Running 0 2d
esearch-3913228203-9sm72 1/1 Running 0 2d
hrm-3288407936-cwvgz 1/1 Running 6 46m
parabot-1262887100-6098j 1/1 Running 9 2d
rabbitmq-279796448-9qls3 1/1 Running 0 2d
suite-ui-1725964700-clvbd 1/1 Running 3 2d
kubectl version:
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"dirty", BuildDate:"2017-04-07T20:43:50Z", GoVersion:"go1.7.1", Compiler:"gc", Platform:"linux/amd64"}
minikube version:
minikube version: v0.18.0
When I look at the pod logs, there is no error; it seems like it starts without any problem. What could be the problem here?
Edit: output of kubectl get events:
19m 19m 1 discserv-189146465-lk3sm Pod Normal SandboxChanged kubelet, minikube Pod sandbox changed, it will be killed and re-created.
19m 19m 1 discserv-189146465-lk3sm Pod spec.containers{discserv} Normal Pulling kubelet, minikube pulling image "private repo"
19m 19m 1 discserv-189146465-lk3sm Pod spec.containers{discserv} Normal Pulled kubelet, minikube Successfully pulled image "private repo"
19m 19m 1 discserv-189146465-lk3sm Pod spec.containers{discserv} Normal Created kubelet, minikube Created container with id 1607af1a7d217a6c9c91c1061f6b2148dd830a525b4fb02e9c6d71e8932c9f67
19m 19m 1 discserv-189146465-lk3sm Pod spec.containers{discserv} Normal Started kubelet, minikube Started container with id 1607af1a7d217a6c9c91c1061f6b2148dd830a525b4fb02e9c6d71e8932c9f67
19m 19m 1 esearch-3913228203-6l3t7 Pod Normal SandboxChanged kubelet, minikube Pod sandbox changed, it will be killed and re-created.
19m 19m 1 esearch-3913228203-6l3t7 Pod spec.containers{esearch} Normal Pulled kubelet, minikube Container image "elasticsearch:2.4" already present on machine
19m 19m 1 esearch-3913228203-6l3t7 Pod spec.containers{esearch} Normal Created kubelet, minikube Created container with id db30f7190fec4643b0ee7f9e211fa92572ff24a7d934e312a97e0a08bb1ccd60
19m 19m 1 esearch-3913228203-6l3t7 Pod spec.containers{esearch} Normal Started kubelet, minikube Started container with id db30f7190fec4643b0ee7f9e211fa92572ff24a7d934e312a97e0a08bb1ccd60
18m 18m 1 hrm-3288407936-d2vhh Pod Normal Scheduled default-scheduler Successfully assigned hrm-3288407936-d2vhh to minikube
18m 18m 1 hrm-3288407936-d2vhh Pod spec.containers{hrm} Normal Pulling kubelet, minikube pulling image "private repo"
18m 18m 1 hrm-3288407936-d2vhh Pod spec.containers{hrm} Normal Pulled kubelet, minikube Successfully pulled image "private repo"
18m 18m 1 hrm-3288407936-d2vhh Pod spec.containers{hrm} Normal Created kubelet, minikube Created container with id 34d1f35fc68ed64e5415e9339405847d496e48ad60eb7b08e864ee0f5b87516e
18m 18m 1 hrm-3288407936-d2vhh Pod spec.containers{hrm} Normal Started kubelet, minikube Started container with id 34d1f35fc68ed64e5415e9339405847d496e48ad60eb7b08e864ee0f5b87516e
18m 18m 1 hrm-3288407936 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: hrm-3288407936-d2vhh
18m 18m 1 hrm Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set hrm-3288407936 to 1
19m 19m 1 minikube Node Normal RegisteredNode controllermanager Node minikube event: Registered Node minikube in NodeController
19m 19m 1 minikube Node Normal Starting kubelet, minikube Starting kubelet.
19m 19m 1 minikube Node Warning ImageGCFailed kubelet, minikube unable to find data for container /
19m 19m 1 minikube Node Normal NodeAllocatableEnforced kubelet, minikube Updated Node Allocatable limit across pods
19m 19m 1 minikube Node Normal NodeHasSufficientDisk kubelet, minikube Node minikube status is now: NodeHasSufficientDisk
19m 19m 1 minikube Node Normal NodeHasSufficientMemory kubelet, minikube Node minikube status is now: NodeHasSufficientMemory
19m 19m 1 minikube Node Normal NodeHasNoDiskPressure kubelet, minikube Node minikube status is now: NodeHasNoDiskPressure
19m 19m 1 minikube Node Warning Rebooted kubelet, minikube Node minikube has been rebooted, boot id: f66e28f9-62b3-4066-9e18-33b152fa1300
19m 19m 1 minikube Node Normal NodeNotReady kubelet, minikube Node minikube status is now: NodeNotReady
19m 19m 1 minikube Node Normal Starting kube-proxy, minikube Starting kube-proxy.
19m 19m 1 minikube Node Normal NodeReady kubelet, minikube Node minikube status is now: NodeReady
8m 8m 1 minikube Node Warning SystemOOM kubelet, minikube System OOM encountered
18m 18m 1 parabot-1262887100-r84kf Pod Normal Scheduled default-scheduler Successfully assigned parabot-1262887100-r84kf to minikube
8m 18m 2 parabot-1262887100-r84kf Pod spec.containers{parabot} Normal Pulling kubelet, minikube pulling image "private repo"
8m 18m 2 parabot-1262887100-r84kf Pod spec.containers{parabot} Normal Pulled kubelet, minikube Successfully pulled image "private repo"
18m 18m 1 parabot-1262887100-r84kf Pod spec.containers{parabot} Normal Created kubelet, minikube Created container with id ed8b5c19a2ad3729015f20707b6b4d4132f86bd8a3f8db1d8d79381200c63045
18m 18m 1 parabot-1262887100-r84kf Pod spec.containers{parabot} Normal Started kubelet, minikube Started container with id ed8b5c19a2ad3729015f20707b6b4d4132f86bd8a3f8db1d8d79381200c63045
8m 8m 1 parabot-1262887100-r84kf Pod spec.containers{parabot} Normal Created kubelet, minikube Created container with id 664931f24e482310e1f66dcb230c9a2a4d11aae8d4b3866bcbd084b19d3d7b2b
8m 8m 1 parabot-1262887100-r84kf Pod spec.containers{parabot} Normal Started kubelet, minikube Started container with id 664931f24e482310e1f66dcb230c9a2a4d11aae8d4b3866bcbd084b19d3d7b2b
18m 18m 1 parabot-1262887100 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: parabot-1262887100-r84kf
18m 18m 1 parabot Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set parabot-1262887100 to 1
19m 19m 1 rabbitmq-279796448-pcqqh Pod Normal SandboxChanged kubelet, minikube Pod sandbox changed, it will be killed and re-created.
19m 19m 1 rabbitmq-279796448-pcqqh Pod spec.containers{rabbitmq} Normal Pulling kubelet, minikube pulling image "rabbitmq"
19m 19m 1 rabbitmq-279796448-pcqqh Pod spec.containers{rabbitmq} Normal Pulled kubelet, minikube Successfully pulled image "rabbitmq"
19m 19m 1 rabbitmq-279796448-pcqqh Pod spec.containers{rabbitmq} Normal Created kubelet, minikube Created container with id 155e900afaa00952e4bb9a7a8b282d2c26004d187aa727201bab596465f0ea50
19m 19m 1 rabbitmq-279796448-pcqqh Pod spec.containers{rabbitmq} Normal Started kubelet, minikube Started container with id 155e900afaa00952e4bb9a7a8b282d2c26004d187aa727201bab596465f0ea50
19m 19m 1 suite-ui-1725964700-ssshn Pod Normal SandboxChanged kubelet, minikube Pod sandbox changed, it will be killed and re-created.
19m 19m 1 suite-ui-1725964700-ssshn Pod spec.containers{suite-ui} Normal Pulling kubelet, minikube pulling image "private repo"
19m 19m 1 suite-ui-1725964700-ssshn Pod spec.containers{suite-ui} Normal Pulled kubelet, minikube Successfully pulled image "private repo"
19m 19m 1 suite-ui-1725964700-ssshn Pod spec.containers{suite-ui} Normal Created kubelet, minikube Created container with id bcaa7d96e3b0e574cd48641a633eb36c5d938f5fad41d44db425dd02da63ba3a
19m 19m 1 suite-ui-1725964700-ssshn Pod spec.containers{suite-ui} Normal Started kubelet, minikube Started container with id bcaa7d96e3b0e574cd48641a633eb36c5d938f5fad41d44db425dd02da63ba3a
Check kubectl logs for any obvious errors. In this case, as suspected, it looks like an insufficient-resources problem (or a service with a resource leak); note the "System OOM encountered" event in the output above.
If possible, try increasing resources to see if it helps.
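Given the OOM event, one illustrative option on minikube is to restart the VM with more memory and CPU (the values here are placeholders):
minikube stop
minikube start --memory 8192 --cpus 4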

kubernetes installation and kube-dns: open /run/flannel/subnet.env: no such file or directory

Overview
kube-dns can't start (SetupNetworkError) after kubeadm init and network setup:
Error syncing pod, skipping: failed to "SetupNetwork" for
"kube-dns-654381707-w4mpg_kube-system" with SetupNetworkError:
"Failed to setup network for pod
\"kube-dns-654381707-w4mpg_kube-system(8ffe3172-a739-11e6-871f-000c2912631c)\"
using network plugins \"cni\": open /run/flannel/subnet.env:
no such file or directory; Skipping pod"
Kubernetes version
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:48:38Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.4", GitCommit:"3b417cc4ccd1b8f38ff9ec96bb50a81ca0ea9d56", GitTreeState:"clean", BuildDate:"2016-10-21T02:42:39Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Environment
VMWare Fusion for Mac
OS
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a)
Linux ubuntu-master 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
What is the problem
kube-system kube-dns-654381707-w4mpg 0/3 ContainerCreating 0 2m
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
3m 3m 1 {default-scheduler } Normal Scheduled Successfully assigned kube-dns-654381707-w4mpg to ubuntu-master
2m 1s 177 {kubelet ubuntu-master} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "kube-dns-654381707-w4mpg_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-654381707-w4mpg_kube-system(8ffe3172-a739-11e6-871f-000c2912631c)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
What I expected to happen
kube-dns Running
How to reproduce it
root@ubuntu-master:~# kubeadm init
Running pre-flight checks
<master/tokens> generated token: "247a8e.b7c8c1a7685bf204"
<master/pki> generated Certificate Authority key and certificate:
Issuer: CN=kubernetes | Subject: CN=kubernetes | CA: true
Not before: 2016-11-10 11:40:21 +0000 UTC Not After: 2026-11-08 11:40:21 +0000 UTC
Public: /etc/kubernetes/pki/ca-pub.pem
Private: /etc/kubernetes/pki/ca-key.pem
Cert: /etc/kubernetes/pki/ca.pem
<master/pki> generated API Server key and certificate:
Issuer: CN=kubernetes | Subject: CN=kube-apiserver | CA: false
Not before: 2016-11-10 11:40:21 +0000 UTC Not After: 2017-11-10 11:40:21 +0000 UTC
Alternate Names: [172.20.10.4 10.96.0.1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local]
Public: /etc/kubernetes/pki/apiserver-pub.pem
Private: /etc/kubernetes/pki/apiserver-key.pem
Cert: /etc/kubernetes/pki/apiserver.pem
<master/pki> generated Service Account Signing keys:
Public: /etc/kubernetes/pki/sa-pub.pem
Private: /etc/kubernetes/pki/sa-key.pem
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 14.053453 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 0.508561 seconds
<master/apiclient> attempting a test deployment
<master/apiclient> test deployment succeeded
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 1.503838 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns
Kubernetes master initialised successfully!
You can now join any number of machines by running the following on each node:
kubeadm join --token=247a8e.b7c8c1a7685bf204 172.20.10.4
root@ubuntu-master:~#
root@ubuntu-master:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-eo1ua 1/1 Running 0 47s
kube-system etcd-ubuntu-master 1/1 Running 3 51s
kube-system kube-apiserver-ubuntu-master 1/1 Running 0 49s
kube-system kube-controller-manager-ubuntu-master 1/1 Running 3 51s
kube-system kube-discovery-1150918428-qmu0b 1/1 Running 0 46s
kube-system kube-dns-654381707-mv47d 0/3 ContainerCreating 0 44s
kube-system kube-proxy-k0k9q 1/1 Running 0 44s
kube-system kube-scheduler-ubuntu-master 1/1 Running 3 51s
root@ubuntu-master:~#
root@ubuntu-master:~# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
root@ubuntu-master:~#
root@ubuntu-master:~#
root@ubuntu-master:~# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system dummy-2088944543-eo1ua 1/1 Running 0 47s
kube-system etcd-ubuntu-master 1/1 Running 3 51s
kube-system kube-apiserver-ubuntu-master 1/1 Running 0 49s
kube-system kube-controller-manager-ubuntu-master 1/1 Running 3 51s
kube-system kube-discovery-1150918428-qmu0b 1/1 Running 0 46s
kube-system kube-dns-654381707-mv47d 0/3 ContainerCreating 0 44s
kube-system kube-proxy-k0k9q 1/1 Running 0 44s
kube-system kube-scheduler-ubuntu-master 1/1 Running 3 51s
kube-system weave-net-ja736 2/2 Running 0 1h
It looks like you configured flannel before running kubeadm init. You can try to fix this by removing flannel (it may be sufficient to remove the config file: rm -f /etc/cni/net.d/*flannel*), but it's best to start fresh.
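A sketch of the "start fresh" route (kubeadm reset exists on later kubeadm versions; on the very old alpha in this question you may need to clean up manually, and the pod network add-on, weave-kube here, has to be re-applied afterwards):
sudo rm -f /etc/cni/net.d/*flannel*
sudo kubeadm reset
sudo kubeadm init
kubectl apply -f https://git.io/weave-kube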
Open the file location below (create the file if it does not exist) and paste in the following data:
vim /run/flannel/subnet.env
FLANNEL_NETWORK=10.240.0.0/16
FLANNEL_SUBNET=10.240.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
