Kubernetes CoreDNS in CrashLoopBackOff - docker

I understand that this question is asked dozen times, but nothing has helped me through internet searching.
My set up:
CentOS Linux release 7.5.1804 (Core)
Docker Version: 18.06.1-ce
Kubernetes: v1.12.3
Installed by official guide and this one:https://www.techrepublic.com/article/how-to-install-a-kubernetes-cluster-on-centos-7/
CoreDNS pods are in Error/CrashLoopBackOff state.
kube-system coredns-576cbf47c7-8phwt 0/1 CrashLoopBackOff 8 31m
kube-system coredns-576cbf47c7-rn2qc 0/1 CrashLoopBackOff 8 31m
My /etc/resolv.conf:
nameserver 8.8.8.8
Also tried with my local dns-resolver(router)
nameserver 10.10.10.1
Setup and init:
kubeadm init --apiserver-advertise-address=10.10.10.3 --pod-network-cidr=192.168.1.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
I tried to solve this with:
Editing the coredns: root#kub~]# kubectl edit cm coredns -n kube-system
and changing
proxy . /etc/resolv.conf
directly to
proxy . 10.10.10.1
or
proxy . 8.8.8.8
Also tried to:
kubectl -n kube-system get deployment coredns -o yaml | sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | kubectl apply -f -
And still nothing helps me.
Error from the logs:
plugin/loop: Seen "HINFO IN 7847735572277573283.2952120668710018229." more than twice, loop detected
The other thread - coredns pods have CrashLoopBackOff or Error state didnt help at all, becouse i havent hit any solutions that were described there. Nothing helped.

Even I have got such error and I successfully managed to work by below steps.
However, you missed 8.8.4.4
sudo nano /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
run following commands to restart daemon and docker service
sudo systemctl daemon-reload
sudo systemctl restart docker
If you are using kubeadm make sure you delete an entire cluster from master and provision cluster again.
kubectl drain <node_name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node_name>
kubeadm reset
Once You Provision the new cluster
kubectl get pods --all-namespaces
It Should give below expected Result
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-gldlr 2/2 Running 0 24s
kube-system coredns-86c58d9df4-lpnj6 1/1 Running 0 40s
kube-system coredns-86c58d9df4-xnb5r 1/1 Running 0 40s
kube-system kube-proxy-kkb7b 1/1 Running 0 40s
kube-system kube-scheduler-osboxes 1/1 Running 0 10s

$kubectl edit cm coredns -n kube-system
delete ‘loop’ ,save and exit
restart master node. It was work for me.

I faced the the same issue in my local k8s in Docker (KIND) setup. CoreDns pod gets crashloop backoff error.
Steps followed to make the pod into running state:
As Tim Chan said in this post and by referring the github issues link, I did the following
kubectl -n kube-system edit configmaps coredns -o yaml
modify the section
forward . /etc/resolv.conf with forward . 172.16.232.1 (mycase i set 8.8.8.8 for the timebeing)
Delete one of the Coredns Pods, or can wait for sometime - the pods will be in running state.

Usually happens when coredns can't talk to the kube-apiserver:
Check that your kubernetes service is in the default namespace:
$ kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 130d
Then (you might have to create a pod):
$ kubectl -n kube-system exec -it <any-pod-with-shell> sh
# ping kubernetes.default.svc.cluster.local
PING kubernetes.default.svc.cluster.local (10.96.0.1): 56 data bytes
Also, try hitting port 443 from the port:
# telnet kubernetes.default.svc.cluster.local 443 # or
# curl kubernetes.default.svc.cluster.local:443

I got the error is:
connect: no route to host","time":"2021-03-19T14:42:05Z"}
crashloopbackoff
in the log showed by kubectl -n kube-system logs coredns-d9fdb9c9f-864rz
The issue is mentioned in https://github.com/coredns/coredns/tree/master/plugin/loop#troubleshooting-loops-in-kubernetes-clusters
tldr;
Reason: /etc/resolv.conf got updated somehow. The original one is at /run/systemd/resolve/resolv.conf:
e.g:
nameserver 172.16.232.1
Quick fix, edit Corefile:
$ kubectl -n kube-system edit configmaps coredns -o yaml
to replace forward . /etc/resolv.conf with forward . 172.16.232.1
e.g:
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . 172.16.232.1 {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
creationTimestamp: "2021-03-18T15:58:07Z"
name: coredns
namespace: kube-system
resourceVersion: "49996"
uid: 428a03ff-82d0-4812-a3fa-e913c2911ebd
Done, after that, may need to restart the docker
sudo systemctl restart docker
Update: it could be fixed by just sudo systemctl restart docker

Related

Problem with minkube connecting to Node IP and getting timeout

Trying to connect to IP of a node and getting a timeout error
I have started minikube
minikube start
* minikube v1.27.0 on Microsoft Windows 10 Pro 10.0.19042 Build 19042
! Kubernetes 1.25.0 has a known issue with resolv.conf. minikube is using a workaround that should work for most use cases.
! For more information, see: https://github.com/kubernetes/kubernetes/issues/112135
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Preparing Kubernetes v1.25.0 on Docker 20.10.18 ...
* Verifying Kubernetes components...
- Using image gcr.io/k8s-minikube/storage-provisioner:v5
* Enabled addons: storage-provisioner, default-storageclass
! C:\Program Files\Docker\Docker\resources\bin\kubectl.exe is version 1.22.4, which may have incompatibilites with Kubernetes 1.25.0.
- Want kubectl v1.25.0? Try 'minikube kubectl -- get pods -A'
* Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Checked that the pods are up and running
PS C:\WINDOWS\system32> minikube kubectl -- get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default account-docker-kubernetes 1/1 Running 3 (2m59s ago) 3h43m
kube-system coredns-565d847f94-dh5qw 1/1 Running 4 (2m59s ago) 3h57m
kube-system etcd-minikube 1/1 Running 0 2m1s
kube-system kube-apiserver-minikube 1/1 Running 0 2m2s
kube-system kube-controller-manager-minikube 1/1 Running 4 (2m59s ago) 3h57m
kube-system kube-proxy-gs6pm 1/1 Running 4 (2m59s ago) 3h57m
kube-system kube-scheduler-minikube 1/1 Running 4 (2m59s ago) 3h57m
kube-system storage-provisioner 1/1 Running 6 (2m59s ago) 3h57m
Checked the service
PS C:\WINDOWS\system32> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
account-docker-kubernetes NodePort 10.105.105.236 <none> 8082:30163/TCP 3h44m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4h
Found the IP of the node
PS C:\WINDOWS\system32> minikube ip
172.25.177.1
so to connect to the service from outside Kubernetes its node ip plus service port
But http://172.25.177.1:30163/bank/health/ I get a timeout.
So i tried to connect to the Pod from inside first I get the pod IP.
PS C:\WINDOWS\system32> kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
account-docker-kubernetes 1/1 Running 3 (12m ago) 3h53m 172.18.0.2 minikube <none> <none>
Then exec in
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32> kubectl exec --stdin --tty account-docker-kubernetes -- /bin/bash
root#account-docker-kubernetes:/app#
Then ran the curl command on the pod returns 200
root#account-docker-kubernetes:/app# curl -v http://172.18.0.2:8082/bank/health/
* Trying 172.18.0.2:8082...
* TCP_NODELAY set
* Connected to 172.18.0.2 (172.18.0.2) port 8082 (#0)
> GET /bank/health/ HTTP/1.1
> Host: 172.18.0.2:8082
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
< Content-Length: 2
< Date: Sat, 01 Oct 2022 18:09:21 GMT
<
* Connection #0 to host 172.18.0.2 left intact
So I can connect from inside the pod but not from outside using the node IP.
I have also set up a firewall rule to allow connect to port 30163.

kubernetes 1.12.2 failed to load Kubelet config file /var/lib/kubelet/config.yaml

Environment:
Kubernetes 1.12.2
Docker 18.9.0
microk8s.kubectl
$ k get all
NAME READY STATUS
RESTARTS AGE
pod/mysql-0 1/1 Running 0 72s
pod/nginx-ingress-microk8s-controller-c2pgz 0/1 CrashLoopBackOff 129 22h
pod/web-0 1/1 Running 0 78s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 70m
service/mysql-service ClusterIP None <none> 3306/TCP 72s
service/nginx-service ClusterIP None <none> 80/TCP 78s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nginx-ingress-microk8s-controller 1 1 0 1 0 <none> 2d22h
NAME DESIRED CURRENT AGE
statefulset.apps/mysql 1 1 72s
statefulset.apps/web 1 1 78s
/var/log/syslog:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Error syncing pod f0ab0f74-e6f2-11e8-8410-482ae31e6a94 ("nginx-ingress-microk8s-controller-c2pgz_default(f0ab0f74-e6f2-11e8-8410-482ae31e6a94)"), skipping: failed to "StartContainer" for "nginx-ingress-microk8s" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=nginx-ingress-microk8s pod=nginx-ingress-microk8s-controller-c2pgz_default(f0ab0f74-e6f2-11e8-8410-482ae31e6a94)"
What is nginx-ingress-microk8s-controller-c2pgz? Who started it?
You mentioned in the comments that the reason is related to kubeadm init fails.
The /var/lib/kubelet/config.yaml config file is being populated only after:
A successful cluster initialization (kubeadmin init) in the master node.
In the worker node - after a successful joining to the cluster (kubeadm join).
So if the problem is with kubeadm init you should check the command's output (also great if you could paste it in the question).
Make sure you don't run kubeadm init with the --ignore-preflight-errors=all flag.
I'm not familiar with your specific error, but in order for the answer to be more helpful - I'll try to give some possible solutions:
Make sure all requirements for kubeadm are in place.
Check the firewall rules - make sure you don't block egress traffic and that port 6443 ingress rule is open for the worker node (relevant for the joining phase).
Make sure that the required ports are not occupied.
Try restarting Kubelet with systemctl restart kubelet and check latest logs with: sudo journalctl -u kubelet -n 100 --no-pager.
Check if Docker version can be updated to a newer stabler one.
Try running kubeadm reset and make sure you re-run kubeadm init with latest version or with the specific stable version by addding --kubernetes-version=X.Y.Z.
As per RtmY, it works only kubectl initilzation works correct
after doing following
kubeadm init --pod-network-cidr=192.168.0.0/16
it worked successfully.
As i have updated kubelet, I am not able to find /var/lib/kubelet/config.yaml
For that "systemctl status kubelet|journalctl -xe"
failed to load Kubelet config file /var/lib/kubelet/config.yaml
As per the below link, I have copied the config.yaml from other working worker nodes and its worked !!
https://github.com/kubernetes/kubernetes/issues/65863#issuecomment-403003592

Coredns in pending state in Kubernetes cluster

I am trying to configure a 2 node Kubernetes cluster. First I am trying to configure the master node of the cluster on a CentOS VM. I have initialized the cluster using 'kubeadm init --apiserver-advertise-address=172.16.100.6 --pod-network-cidr=10.244.0.0/16' and deployed the flannel network to the cluster. But when I do 'kubectl get nodes', I get the following output ----
[root#kubernetus ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetus NotReady master 57m v1.12.0
Following is the output of 'kubectl get pods --all-namespaces -o wide ' ----
[root#kubernetus ~]# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system coredns-576cbf47c7-9x59x 0/1 Pending 0 58m <none> <none> <none>
kube-system coredns-576cbf47c7-l52wc 0/1 Pending 0 58m <none> <none> <none>
kube-system etcd-kubernetus 1/1 Running 2 57m 172.16.100.6 kubernetus <none>
kube-system kube-apiserver-kubernetus 1/1 Running 2 57m 172.16.100.6 kubernetus <none>
kube-system kube-controller-manager-kubernetus 1/1 Running 1 57m 172.16.100.6 kubernetus <none>
kube-system kube-proxy-hr557 1/1 Running 1 58m 172.16.100.6 kubernetus <none>
kube-system kube-scheduler-kubernetus 1/1 Running 1 57m 172.16.100.6 kubernetus <none>
coredns is in a pending state for a very long time. I have removed docker and kubectl, kubeadm, kubelet a no of times & tried to recreate the cluster, but every time it shows the same output. Can anybody help me with this issue?
Try to install Pod network add-on (Base on this guide).
Run this line:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
Unable to update cni config: No networks found in /etc/cni/net.d .....
Oct 02 19:21:32 kubernetus kubelet[19007]: E1002 19:21:32.886170 19007
kubelet.go:2167] Container runtime network not ready:
NetworkReady=false reason:NetworkPluginNotReady message:docker:
network plugin is not ready: cni config uninitialized
According to this error, you forgot to initialize a Kubernetes Pod network add-on. Looking at your settings, I suppose it should be Flannel.
Here is the instruction from the official Kubernetes documentation:
For flannel to work correctly, you must pass
--pod-network-cidr=10.244.0.0/16 to kubeadm init.
Set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running
sysctl net.bridge.bridge-nf-call-iptables=1 to pass bridged IPv4
traffic to iptables’ chains. This is a requirement for some CNI
plugins to work, for more information please see here.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
Note that flannel works on amd64, arm, arm64 and ppc64le, but until
flannel v0.11.0 is released you need to use the following manifest
that supports all the architectures:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/c5d10c8/Documentation/kube-flannel.yml
For more information, you can visit this link.
For the Kubernetes cluster to be available, the cluster should have a Container Networking Interface (CNI). A pod-network is required to be configured for the dns pod to be functional.
Install any of the CNI Providers like:
- Flannel
- Calico
- Canal
- WeaveNet, etc.,
Without this, the hosted Kubernetes cluster would have the master in the NotReady State.
Check if docker and kubernetes are using the same cgroup driver.
I faced the same issue (CentOS 7, kubernetes v1.14.1), and setting same cgroup driver (systemd) fixed it.
I installed kubernetes with 1 master + 1 work-node.
After I made kubeadm init ..., I faced two issues:
On the master node, the coredns were pending.
On the work-node, kubectl command didn't work out
On the work-node, I did the following and fixed the both issues:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/kubelet.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config**
For me, I've restarted the system and re-applied calico.yaml, coredns and calico pods started creating.
take this solution at least priority and try changing instance type (preferably higher cpu core/ram)
in my case i have changed linux instance t3.micro to t2.medium and its works

Where is kube-apiserver located

Base question: When I try to use kube-apiserver on my master node, I get command not found error. How I can install/configure kube-apiserver? Any link to example will help.
$ kube-apiserver --enable-admission-plugins DefaultStorageClass
-bash: kube-apiserver: command not found
Details: I am new to Kubernetes and Docker and was trying to create StatefulSet with volumeClaimTemplates. My problem is that the automatic PVs are not created and I get this message in the PVC log: "persistentvolume-controller waiting for a volume to be created". I am not sure if I need to define DefaultStorageClass and so needed kube-apiserver to define it.
Name: nfs
Namespace: default
StorageClass: example-nfs
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner=example.com/nfs
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 3m (x2401 over 10h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "example.com/nfs" or manually created by system administrator
Here is get pvc result:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs Pending example-nfs 10h
And get storageclass:
$ kubectl describe storageclass example-nfs
Name: example-nfs
IsDefaultClass: No
Annotations: <none>
Provisioner: example.com/nfs
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
How can I troubleshoot this issue (e.g. logs for why the storage was not created)?
You are asking two different questions here, one about kube-apiserver configuration, one about troubleshooting your StorageClass.
Here's an answer for your first question:
kube-apiserver is running as a Docker container on your master node. Therefore, the binary is within the container, not on your host system. It is started by the master's kubelet from a file located at /etc/kubernetes/manifests. kubelet is watching this directory and will start any Pod defined here as "static pods".
To configure kube-apiserver command line arguments you need to modify /etc/kubernetes/manifests/kube-apiserver.yaml on your master.
I'll refer to the question regarding the location of the api-server.
Basic answer (specific to the question title):
The kube apiserver is located on the master node (known as the control plane).
It can be executed:
1 ) Via the host's init system (like systemd).
2 ) As a pod (I'll explain below).
In both cases it will be located on the control plane (left side below):
If its running under systemD you can run: systemctl status api-server to see the path to the configuration (drop-in) file.
If it is running as pod you can view it under the kube-system namespace with all other control panel components (plus kube-proxy and maybe network solution like weave below):
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-lpdlc 1/1 Running 1 2d22h
coredns-f9fd979d6-vcs7g 1/1 Running 1 2d22h
etcd-my-master 1/1 Running 1 2d22h
kube-apiserver-my-master 1/1 Running 1 2d22h #<----Here
kube-controller-manager-my-master 1/1 Running 1 2d22h
kube-proxy-kh2lc 1/1 Running 1 2d22h
kube-scheduler-my-master 1/1 Running 1 2d22h
weave-net-59r5b 2/2 Running 3 2d22h
You can run:
kubectl describe pod/kube-apiserver-my-master -n kube-system
In order to get more details regarding the pod.
A bit more advanced answer:
(regarding the location of /etc/kubernetes/manifests)
Lets say we have no idea where to find the relevant path for the kube-api-server config file.
But we need to remember two important things:
1 ) The kube-api-server is running on the master node.
2 ) The Kubelet isn't running as pod and when the control plane components (plus kube-proxy) are executed as static pods - it is done by the Kubelet on the master node.
So we can start our journey for reaching the manifests path by investigating the Kubelet logs.
If the Kubelet is running for a long time it will be a very large file and we'll need to dump it somewhere and go to the begging - or if Kubelet was started 5 minutes ago we can run:
sudo journalctl -u kubelet --since -5m >> kubelet_5_minutes.log
And a quick search for "api-server" will bring us to the 2 lines below where the path of the manifests in mentioned:
my-master kubelet[71..]: 00:03:21 kubelet.go:261] Adding pod path: /etc/kubernetes/manifests
my-master kubelet[71..]: 00:03:21 kubelet.go:273] Watching apiserver
And also we can see that the Kubelet is trying to create the kube-apiserver pod under my-master node and inside the kube-system namespace:
my-master kubelet[71..]: 00:03:29.05 kubelet.go:1576] ..
Creating a mirror pod for "kube-apiserver-my-master_kube-system
To make the storage class "example-nfs" default, you need to run the below command:
kubectl patch storageclass example-nfs -p '{"metadata":
{"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

kube-dns not working on kubernetes arm

I deploy a kubernetes cluster following the guide: https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/. It basically uses hypriotOS and kubernetes from the debian repository.
After the deployment, all the pods were running and no faults were shown. However, the dns server was not working properly on the worker node.
master
$ kubectl -n kube-system get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 34m
kubernetes-dashboard 10.103.97.112 <nodes> 80:30518/TCP 31m
# I installed the dnsutils to have the dig command
$ dig #10.96.0.10 || echo "FAIL"
# shows a valid response (note that we are not resolving anything)
worker
$ dig #10.96.0.10 || echo "FAIL"
....
FAIL
It turn out that the answer was in one of the comments from , but it was not clear that this was my issue.
As the author of the comment stated is due to the iptables policies from Docker versions > 1.13.
To solve it, execute the following on both nodes:
sudo iptables -A FORWARD -i cni0 -j ACCEPT
sudo iptables -A FORWARD -o cni0 -j ACCEPT

Resources