Kubernetes: Container not able to ping www.google.com

I have a Kubernetes cluster running on 4 Raspberry Pi devices, of which 1 acts as the master and the other 3 work as workers, i.e. w1, w2 and w3. I have started a DaemonSet deployment, so each worker is running a pod of 2 containers.
w2 is running a pod of 2 containers. If I exec into any container there and ping www.google.com from the container, I get a response. But if I do the same on w1 and w3, it says "temporary failure in name resolution". All the pods in kube-system are running. I am using Weave for networking. Below are all the pods for kube-system:
NAME READY STATUS RESTARTS AGE
etcd-master-pi 1/1 Running 1 23h
kube-apiserver-master-pi 1/1 Running 1 23h
kube-controller-manager-master-pi 1/1 Running 1 23h
kube-dns-7b6ff86f69-97vtl 3/3 Running 3 23h
kube-proxy-2tmgw 1/1 Running 0 14m
kube-proxy-9xfx9 1/1 Running 2 22h
kube-proxy-nfgwg 1/1 Running 1 23h
kube-proxy-xbdxl 1/1 Running 3 23h
kube-scheduler-master-pi 1/1 Running 1 23h
weave-net-7sh5n 2/2 Running 1 14m
weave-net-c7x8p 2/2 Running 3 23h
weave-net-mz4c4 2/2 Running 6 22h
weave-net-qtgmw 2/2 Running 10 23h
If I start the containers with a plain docker run command instead of through the Kubernetes deployment, I do not see this issue. I think this is because of kube-dns. How can I debug this issue?

You can start by checking whether DNS is working.
Run nslookup on kubernetes.default from inside the pod and check whether it works:
[root@metrics-master-2 /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Check the local dns configuration inside the pods:
[root@metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
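You can also sanity-check that this nameserver actually matches the kube-dns Service ClusterIP and that the Service has endpoints (an extra check, assuming the default kube-dns service in kube-system):
kubectl get svc kube-dns -n kube-system
kubectl get endpoints kube-dns -n kube-system
If the endpoints list is empty, the DNS pods are not being picked up by the Service and resolution will fail from any node that cannot reach a healthy DNS pod.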
Finally, check the kube-dns container logs while you run the ping command; they will give you the possible reasons why the name is not resolving:
kubectl logs kube-dns-86f4d74b45-7c4ng -c kubedns -n kube-system
Hope this helps.

This might not be applicable to your scenario, but I wanted to document the solution I found. My issues ended up being related to a flannel network overlay setup on our master nodes.
# kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
coredns-qwer 1/1 Running 0 4h54m
coredns-asdf 1/1 Running 0 4h54m
etcd-h1 1/1 Running 0 4h53m
etcd-h2 1/1 Running 0 4h48m
etcd-h3 1/1 Running 0 4h48m
kube-apiserver-h1 1/1 Running 0 4h53m
kube-apiserver-h2 1/1 Running 0 4h48m
kube-apiserver-h3 1/1 Running 0 4h48m
kube-controller-manager-h1 1/1 Running 2 4h53m
kube-controller-manager-h2 1/1 Running 0 4h48m
kube-controller-manager-h3 1/1 Running 0 4h48m
kube-flannel-ds-amd64-asdf 1/1 Running 0 4h48m
kube-flannel-ds-amd64-qwer 1/1 Running 1 4h48m
kube-flannel-ds-amd64-zxcv 1/1 Running 0 3h51m
kube-flannel-ds-amd64-wert 1/1 Running 0 4h54m
kube-flannel-ds-amd64-sdfg 1/1 Running 1 4h41m
kube-flannel-ds-amd64-xcvb 1/1 Running 1 4h42m
kube-proxy-qwer 1/1 Running 0 4h42m
kube-proxy-asdf 1/1 Running 0 4h54m
kube-proxy-zxcv 1/1 Running 0 4h48m
kube-proxy-wert 1/1 Running 0 4h41m
kube-proxy-sdfg 1/1 Running 0 4h48m
kube-proxy-xcvb 1/1 Running 0 4h42m
kube-scheduler-h1 1/1 Running 1 4h53m
kube-scheduler-h2 1/1 Running 1 4h48m
kube-scheduler-h3 1/1 Running 0 4h48m
tiller-deploy-asdf 1/1 Running 0 4h28m
If I exec'd into any container and pinged google.com from the container, I got a bad address response:
# ping google.com
ping: bad address 'google.com'
# ip route
default via 10.168.3.1 dev eth0
10.168.3.0/24 dev eth0 scope link src 10.168.3.22
10.244.0.0/16 via 10.168.3.1 dev eth0
The pod's ip route output differs from ip route run on the master node.
Altering my pod's deployment configuration to include hostNetwork: true allowed me to ping outside my container.
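For reference, this is roughly where that setting goes in the deployment's pod template (a minimal sketch; the container name and image are placeholders, and dnsPolicy: ClusterFirstWithHostNet is usually paired with hostNetwork so in-cluster DNS still resolves):
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS working while on the host network
      containers:
      - name: app        # placeholder container name
        image: busybox   # placeholder image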
ip route from the newly running pod:
# ip route
default via 172.25.10.1 dev ens192 metric 100
10.168.0.0/24 via 10.168.0.0 dev flannel.1 onlink
10.168.1.0/24 via 10.168.1.0 dev flannel.1 onlink
10.168.2.0/24 via 10.168.2.0 dev flannel.1 onlink
10.168.3.0/24 dev cni0 scope link src 10.168.3.1
10.168.4.0/24 via 10.168.4.0 dev flannel.1 onlink
10.168.5.0/24 via 10.168.5.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
172.25.10.0/23 dev ens192 scope link src 172.25.11.35 metric 100
192.168.122.0/24 dev virbr0 scope link src 192.168.122.1
# ping google.com
PING google.com (172.217.6.110): 56 data bytes
64 bytes from 172.217.6.110: seq=0 ttl=55 time=3.488 ms
Update 1
My associate and I found a number of different websites which advise against setting hostNetwork: true. We then found this issue and are currently investigating it as a possible solution, sans hostNetwork: true.
Usually you'd do this with the '--ip-masq' flag to flannel, which is 'false' by default and is defined as "setup IP masquerade rule for traffic destined outside of overlay network" - which sounds like what you want.
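For reference, the flag is typically passed to flanneld in the kube-flannel DaemonSet container args; an excerpt sketched from a common kube-flannel manifest (image tag and exact names vary by version):
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.11.0-amd64   # tag varies by release
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq            # masquerade traffic destined outside the overlay network
  - --kube-subnet-mgr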
Update 2
It turns out that our flannel network overlay was misconfigured. We needed to ensure that the net-conf.json Network value in the flannel ConfigMap matched our networking.podSubnet (kubeadm config view). Changing these networks to match alleviated our networking woes. We were then able to remove hostNetwork: true from our deployments.
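As an illustration of what "matching" means here (the 10.244.0.0/16 CIDR below is only an example; use whatever your cluster was initialized with):
# kube-flannel-cfg ConfigMap excerpt
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": { "Type": "vxlan" }
  }
# kubeadm config view excerpt - podSubnet must be the same CIDR
networking:
  podSubnet: 10.244.0.0/16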

Related

Problem with minikube connecting to Node IP and getting timeout

Trying to connect to the IP of a node and getting a timeout error.
I have started minikube:
minikube start
* minikube v1.27.0 on Microsoft Windows 10 Pro 10.0.19042 Build 19042
! Kubernetes 1.25.0 has a known issue with resolv.conf. minikube is using a workaround that should work for most use cases.
! For more information, see: https://github.com/kubernetes/kubernetes/issues/112135
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Preparing Kubernetes v1.25.0 on Docker 20.10.18 ...
* Verifying Kubernetes components...
- Using image gcr.io/k8s-minikube/storage-provisioner:v5
* Enabled addons: storage-provisioner, default-storageclass
! C:\Program Files\Docker\Docker\resources\bin\kubectl.exe is version 1.22.4, which may have incompatibilites with Kubernetes 1.25.0.
- Want kubectl v1.25.0? Try 'minikube kubectl -- get pods -A'
* Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Checked that the pods are up and running
PS C:\WINDOWS\system32> minikube kubectl -- get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default account-docker-kubernetes 1/1 Running 3 (2m59s ago) 3h43m
kube-system coredns-565d847f94-dh5qw 1/1 Running 4 (2m59s ago) 3h57m
kube-system etcd-minikube 1/1 Running 0 2m1s
kube-system kube-apiserver-minikube 1/1 Running 0 2m2s
kube-system kube-controller-manager-minikube 1/1 Running 4 (2m59s ago) 3h57m
kube-system kube-proxy-gs6pm 1/1 Running 4 (2m59s ago) 3h57m
kube-system kube-scheduler-minikube 1/1 Running 4 (2m59s ago) 3h57m
kube-system storage-provisioner 1/1 Running 6 (2m59s ago) 3h57m
Checked the service
PS C:\WINDOWS\system32> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
account-docker-kubernetes NodePort 10.105.105.236 <none> 8082:30163/TCP 3h44m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4h
Found the IP of the node
PS C:\WINDOWS\system32> minikube ip
172.25.177.1
So, to connect to the service from outside Kubernetes, it is the node IP plus the service NodePort.
But at http://172.25.177.1:30163/bank/health/ I get a timeout.
So I tried to connect to the pod from inside. First I get the pod IP:
PS C:\WINDOWS\system32> kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
account-docker-kubernetes 1/1 Running 3 (12m ago) 3h53m 172.18.0.2 minikube <none> <none>
Then exec in
PS C:\WINDOWS\system32>
PS C:\WINDOWS\system32> kubectl exec --stdin --tty account-docker-kubernetes -- /bin/bash
root@account-docker-kubernetes:/app#
Then I ran the curl command in the pod, which returns 200:
root@account-docker-kubernetes:/app# curl -v http://172.18.0.2:8082/bank/health/
* Trying 172.18.0.2:8082...
* TCP_NODELAY set
* Connected to 172.18.0.2 (172.18.0.2) port 8082 (#0)
> GET /bank/health/ HTTP/1.1
> Host: 172.18.0.2:8082
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
< Content-Length: 2
< Date: Sat, 01 Oct 2022 18:09:21 GMT
<
* Connection #0 to host 172.18.0.2 left intact
So I can connect from inside the pod but not from outside using the node IP.
I have also set up a firewall rule to allow connections to port 30163.
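One thing worth trying with the Hyper-V driver on Windows (not from the original post, just a common workaround) is letting minikube print a URL it knows is reachable and tunnel to the service if necessary:
minikube service account-docker-kubernetes --url
If that URL works but node-IP:30163 does not, the problem is most likely the Windows/Hyper-V network path to the node rather than the Service itself.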

How to connect from my local machine to an application that uses Kubernetes and is running in a Docker container?

I feel I have created an abomination. The goal of what I am doing is to run a Docker image, start the AWX web application, and be able to use AWX on my local machine. The issue with this is that AWX uses Kubernetes to run. I have created an image that is able to run Kubernetes and the AWX application inside a container. The final output after running my bash script in the container to start AWX looks like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
awx-operator-system awx-demo-586bd67d59-vj79v 4/4 Running 0 3m14s
awx-operator-system awx-demo-postgres-0 1/1 Running 0 4m11s
awx-operator-system awx-operator-controller-manager-5b4fdf998d-7tzgh 2/2 Running 0 5m4s
ingress-nginx ingress-nginx-admission-create-pfcqs 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-admission-patch-8rghp 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-controller-755dfbfc65-f7vm7 1/1 Running 0 5m33s
kube-system coredns-6d4b75cb6d-4lnvw 1/1 Running 0 5m33s
kube-system etcd-minikube 1/1 Running 0 5m46s
kube-system kube-apiserver-minikube 1/1 Running 0 5m45s
kube-system kube-controller-manager-minikube 1/1 Running 0 5m45s
kube-system kube-proxy-ddnh7 1/1 Running 0 5m34s
kube-system kube-scheduler-minikube 1/1 Running 0 5m45s
kube-system storage-provisioner 1/1 Running 1 (5m33s ago) 5m43s
go to http://192.168.49.2:30085 , the username is admin and the password is XL8aBJPy16ziBau84v63QJLNVw2JGmnb
So I believe that it is running and starting properly. The IP address 192.168.49.2 is the IP of one of the Kubernetes pods. I have been struggling to forward the traffic coming from this pod to my local machine. I have been trying to go from Kubernetes pod -> Docker localhost -> local machine localhost.
I have tried using kubectl proxy, host.docker.internal, curl, and a few others with no success. However, I might be using these in the wrong form.
I understand that Docker containers run in a very isolated environment, so is it possible to forward this information from the pod to my local machine?
Thanks for your time!
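One pattern that can work for this kind of nested setup (a sketch under assumptions: the Service name awx-demo-service and port 80 are the AWX operator defaults and may differ in your install) is to chain a kubectl port-forward inside the container with a published Docker port:
# inside the container: forward the AWX Service to all interfaces on port 30085
kubectl port-forward svc/awx-demo-service -n awx-operator-system --address 0.0.0.0 30085:80
# on the host: the container must have been started with the port published, e.g.
docker run -p 30085:30085 my-awx-in-docker-image   # image name is a placeholder
# the local machine can then browse http://localhost:30085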

Kubernetes dial tcp myIP:10250: connect: no route to host

I have a Kubernetes cluster with 1 master and 3 worker nodes.
calico v3.7.3, kubernetes v1.16.0, installed via kubespray https://github.com/kubernetes-sigs/kubespray
Before this, I deployed all the pods without any problems.
Now I can't start a few pods (Ceph):
kubectl get all --namespace=ceph
NAME READY STATUS RESTARTS AGE
pod/ceph-cephfs-test 0/1 Pending 0 162m
pod/ceph-mds-665d849f4f-fzzwb 0/1 Pending 0 162m
pod/ceph-mon-744f6dc9d6-jtbgk 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-744f6dc9d6-mqwgb 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-744f6dc9d6-zthpv 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-check-6f474c97f-gjr9f 1/1 Running 0 162m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ceph-mon ClusterIP None <none> 6789/TCP 162m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/ceph-osd 0 0 0 0 0 node-type=storage 162m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ceph-mds 0/1 1 0 162m
deployment.apps/ceph-mon 0/3 3 0 162m
deployment.apps/ceph-mon-check 1/1 1 1 162m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ceph-mds-665d849f4f 1 1 0 162m
replicaset.apps/ceph-mon-744f6dc9d6 3 3 0 162m
replicaset.apps/ceph-mon-check-6f474c97f 1 1 1 162m
But another namespace is OK:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6d57b44787-xlj89 1/1 Running 19 24d
calico-node-dwm47 1/1 Running 310 19d
calico-node-hhgzk 1/1 Running 15 24d
calico-node-tk4mp 1/1 Running 309 19d
calico-node-w7zvs 1/1 Running 312 19d
coredns-74c9d4d795-jrxjn 1/1 Running 0 2d23h
coredns-74c9d4d795-psf2v 1/1 Running 2 18d
dns-autoscaler-7d95989447-7kqsn 1/1 Running 10 24d
kube-apiserver-master 1/1 Running 4 24d
kube-controller-manager-master 1/1 Running 3 24d
kube-proxy-9bt8m 1/1 Running 2 19d
kube-proxy-cbrcl 1/1 Running 4 19d
kube-proxy-stj5g 1/1 Running 0 19d
kube-proxy-zql86 1/1 Running 0 19d
kube-scheduler-master 1/1 Running 3 24d
kubernetes-dashboard-7c547b4c64-6skc7 1/1 Running 591 24d
nginx-proxy-worker1 1/1 Running 2 19d
nginx-proxy-worker2 1/1 Running 0 19d
nginx-proxy-worker3 1/1 Running 0 19d
nodelocaldns-6t92x 1/1 Running 2 19d
nodelocaldns-kgm4t 1/1 Running 0 19d
nodelocaldns-xl8zg 1/1 Running 0 19d
nodelocaldns-xwlwk 1/1 Running 12 24d
tiller-deploy-8557598fbc-7f2w6 1/1 Running 0 131m
I use CentOS 7:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
The error log:
Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host
Maybe someone has come across this and can help me? I will provide any additional information.
Logs from the pending pods:
Warning FailedScheduling 98s (x125 over 3h1m) default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.
It seems that a firewall is blocking ingress traffic to port 10250 on the 10.2.67.203 node.
You can open it by running the commands below (I'm assuming firewalld is installed; otherwise run the equivalent commands for your firewall module):
sudo firewall-cmd --add-port=10250/tcp --permanent
sudo firewall-cmd --reload
sudo firewall-cmd --list-all # you should see that port `10250` is updated
tl;dr: It looks like your cluster itself is fairly broken and should be repaired before looking at Ceph specifically.
Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host
10250 is the port that the Kubernetes API server uses to connect to a node's Kubelet to retrieve the logs.
This error indicates that the Kubernetes API server is unable to reach the node. This has nothing to do with your containers, pods or even your CNI network. no route to host indicates that either:
The host is unavailable
A network segmentation has occurred
The Kubelet is unable to answer the API server
Before addressing issues with the Ceph pods I would investigate why the Kubelet isn't reachable from the API server.
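A quick way to confirm that from the control-plane host (a sketch; any HTTP response at all, even 401/403, means the port is reachable, while "no route to host" means it is not):
# run from the master node
nc -vz 10.2.67.203 10250
curl -k https://10.2.67.203:10250/ 2>&1 | head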
After you have solved the underlying network connectivity issues I would address the crash-looping Calico pods (You can see the logs of the previously executed containers by running kubectl logs -n kube-system calico-node-dwm47 -p).
Once you have both the underlying network and the pod network sorted I would address the issues with the Kubernetes Dashboard crash-looping, and finally, start to investigate why you are having issues deploying Ceph.

Is there a way to syslog from a container to the underlying k8s node?

I want to syslog from a container to the host node,
targeting fluentd (127.0.0.1:5140), which runs on the node - https://docs.fluentd.org/input/syslog
e.g. syslog from hello-server to the node (which hosts all of these namespaces).
I want to send syslog output from the hello-server container to fluentd running on the node (127.0.0.1:5140).
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default hello-server-7d8589854c-r4xfr 1/1 Running 0 21h
kube-system event-exporter-v0.2.4-5f7d5d7dd4-lgzg5 2/2 Running 0 6d6h
kube-system fluentd-gcp-scaler-7b895cbc89-bnb4z 1/1 Running 0 6d6h
kube-system fluentd-gcp-v3.2.0-4qcbs 2/2 Running 0 6d6h
kube-system fluentd-gcp-v3.2.0-jxnbn 2/2 Running 0 6d6h
kube-system fluentd-gcp-v3.2.0-k58x6 2/2 Running 0 6d6h
kube-system heapster-v1.6.0-beta.1-7778b45899-t8rz9 3/3 Running 0 6d6h
kube-system kube-dns-autoscaler-76fcd5f658-7hkgn 1/1 Running 0 6d6h
kube-system kube-dns-b46cc9485-279ws 4/4 Running 0 6d6h
kube-system kube-dns-b46cc9485-fbrm2 4/4 Running 0 6d6h
kube-system kube-proxy-gke-test-default-pool-040c0485-7zzj 1/1 Running 0 6d6h
kube-system kube-proxy-gke-test-default-pool-040c0485-ln02 1/1 Running 0 6d6h
kube-system kube-proxy-gke-test-default-pool-040c0485-w6kq 1/1 Running 0 6d6h
kube-system l7-default-backend-6f8697844f-bxn4z 1/1 Running 0 6d6h
kube-system metrics-server-v0.3.1-5b4d6d8d98-k7tz9 2/2 Running 0 6d6h
kube-system prometheus-to-sd-2g7jc 1/1 Running 0 6d6h
kube-system prometheus-to-sd-dck2n 1/1 Running 0 6d6h
kube-system prometheus-to-sd-hsc69 1/1 Running 0 6d6h
For some reason k8s does not allow us to use the built-in syslog driver (docker run --log-driver syslog).
Also, k8s does not allow me to connect to the underlying host using --network="host".
Has anyone tried anything similar? Maybe it would be easier to syslog remotely rather than trying to use the underlying syslog running on every node?
What you are actually looking at is the Stackdriver Logging Agent. According to the documentation at https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/#prerequisites:
If you’re using GKE and Stackdriver Logging is enabled in your cluster, you cannot change its configuration, because it’s managed and supported by GKE. However, you can disable the default integration and deploy your own.
The documentation then gives an example of running your own fluentd DaemonSet with a custom ConfigMap. You'd need to run your own fluentd so you can configure a syslog input per https://docs.fluentd.org/input/syslog.
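The syslog input stanza for that custom fluentd config is small; a minimal version per the fluentd docs looks roughly like this (port and tag are your choice):
<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag node.syslog
</source>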
Then, since fluentd is running as a DaemonSet, you would configure a Service to expose it to other pods and allow them to connect to it. If you are running the official upstream DaemonSet from https://github.com/fluent/fluentd-kubernetes-daemonset then a Service might look like:
apiVersion: v1
kind: Service
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    k8s-app: fluentd-logging
  ports:
  - protocol: UDP
    port: 5140
    targetPort: 5140
Then your applications can log to fluentd.kube-system:5140 (see using DNS at https://kubernetes.io/docs/concepts/services-networking/service/#dns).
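For a quick test from inside an application pod you could send a message with logger (assuming the image ships util-linux's logger; the Service above listens on UDP):
logger --server fluentd.kube-system --port 5140 --udp "hello from hello-server"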

Pod not responding properly

I have a local cluster (no cloud provider) made up of 3 VMs, the master and the nodes. I have created a volume backed by NFS so it can be reused if a pod dies and is rescheduled on another node, but I think some component is not working well. To create the cluster I used just this guide: kubernetes guide. After creating the cluster, this is the actual state:
master@master-VirtualBox:~/Documents/KubeT/nfs$ sudo kubectl get pod --all-namespaces
[sudo] password for master:
NAMESPACE NAME READY STATUS RESTARTS AGE
default mysqlnfs3 1/1 Running 0 27m
kube-system etcd-master-virtualbox 1/1 Running 0 46m
kube-system kube-apiserver-master-virtualbox 1/1 Running 0 46m
kube-system kube-controller-manager-master-virtualbox 1/1 Running 0 46m
kube-system kube-dns-86f4d74b45-f6hpf 3/3 Running 0 47m
kube-system kube-flannel-ds-nffv6 1/1 Running 0 38m
kube-system kube-flannel-ds-rqw9v 1/1 Running 0 39m
kube-system kube-flannel-ds-s5wzn 1/1 Running 0 44m
kube-system kube-proxy-6j7p8 1/1 Running 0 38m
kube-system kube-proxy-7pj8d 1/1 Running 0 39m
kube-system kube-proxy-jqshs 1/1 Running 0 47m
kube-system kube-scheduler-master-virtualbox 1/1 Running 0 46m
master@master-VirtualBox:~/Documents/KubeT/nfs$ sudo kubectl get node
NAME STATUS ROLES AGE VERSION
host1-virtualbox Ready <none> 39m v1.10.2
host2-virtualbox Ready <none> 40m v1.10.2
master-virtualbox Ready master 48m v1.10.2
and this is the pod:
master@master-VirtualBox:~/Documents/KubeT/nfs$ sudo kubectl get pod
NAME READY STATUS RESTARTS AGE
mysqlnfs3 1/1 Running 0 29m
It is scheduled on host2, and if I go into the shell of host2 and use docker exec, I can use the container just fine: the data is stored and retrieved. But when I try to use kubectl exec, it does not work:
master@master-VirtualBox:~/Documents/KubeT/nfs$ sudo kubectl exec -it -n default mysqlnfs3 -- /bin/bash
error: unable to upgrade connection: pod does not exist
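One place to start debugging this (not from the original post, just a common cause with VirtualBox clusters) is to check whether the node addresses recorded by the API server match IPs the master can actually reach, since kubectl exec goes master -> kubelet on the node:
kubectl get nodes -o wide
kubectl describe node host2-virtualbox | grep -A3 Addresses
If the InternalIP is the NAT address (e.g. 10.0.2.15 on every VM), the kubelet on each node needs --node-ip set to its host-only adapter address.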
