Kubernetes service requests are returning responses from the pod IP rather than the service IP; CoreDNS not working for istioctl install - Docker

Overview of my setup/problem
I'm running a K3s cluster inside a Docker container on a RHEL 7.9 box. This is all on an air-gapped network, so bear with me if you don't see copy-and-pasted examples below.
I'm trying to install Istio on the cluster, but the install hangs while setting up the ingress gateway deployment. It hangs there because the Ingress Gateway pod is unable to resolve the Istiod Kubernetes service.
What I've tried
I tested the image on an Ubuntu Vagrant box and the Istio install works fine there. I've also tested the install on a Windows 10 machine using Rancher Desktop and it works fine there as well. At one point it worked on the RHEL box, but my team did some security hardening in the meantime and naturally they have no idea which change broke my cluster, so I'm trying to narrow down the search.
I've determined that the issue is with CoreDNS in my K3s cluster. I used the dnsutils Docker image and ran an nslookup of kubernetes.default. I checked the logs of the CoreDNS pod, and it shows the lookup, but the response it sends back to nslookup comes from the IP of the CoreDNS pod rather than the kube-dns Kubernetes service. nslookup correctly sees that and says:
nslookup kubernetes.default
;; reply from unexpected source: 10.42.0.4#53, expected 10.43.0.2#53
;; reply from unexpected source: 10.42.0.4#53, expected 10.43.0.2#53
;; reply from unexpected source: 10.42.0.4#53, expected 10.43.0.2#53
;; connection timed out; no servers could be reached
Here 10.42.0.4 is the CoreDNS pod and 10.43.0.2 is the kube-dns Kubernetes Service in front of that pod.
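For reference, the test looked roughly like this (following the Kubernetes DNS-debugging docs; the image tag is an assumption, and on an air-gapped cluster the image has to be pre-loaded):
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 --restart=Never -- sleep infinity
kubectl exec -it dnsutils -- nslookup kubernetes.default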
The logs from the failing Istio Ingress Gateway pod say that it is failing to retrieve a certificate from the Istiod pod because the connection to the Istiod Kubernetes service is timing out, which makes sense considering I can't resolve kubernetes.default correctly either.
2021-05-27T10:28:07.342344Z warn ca ca request failed, starting attempt 1 in 91.589072ms
2021-05-27T10:28:07.434806Z warn ca ca request failed, starting attempt 2 in 203.792343ms
2021-05-27T10:28:07.639557Z warn ca ca request failed, starting attempt 3 in 364.729652ms
2021-05-27T10:28:08.005300Z warn ca ca request failed, starting attempt 4 in 830.723933ms
It then states that the request to the Istiod service timed out:
transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.96.0.10:53: read udp 10.244.153.113:41187->10.96.0.10:53: i/o timeout
Again, my setup is on an air-gapped network, so ignore the IP addresses in the example above; these were copied from other posts related to my issue.
Where to go from here?
I'm trying to figure out what could be causing this problem. DNS resolution should be out-of-the-box functionality for K3s, and it's not working correctly. As I stated before, it's not the Docker image I'm running K3s from, since I've gotten K3s and Istio to work on other machines.
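For completeness, the basic health checks I ran look like this (standard kubectl commands; output omitted since the box is air-gapped):
kubectl -n kube-system get svc,endpoints kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns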
Any suggestions on what to do next or advice on how to troubleshoot this would be greatly appreciated. Let me know if there is any other info I can provide to help. Thanks!

TLDR - bridge-nf-call-iptables and bridge-nf-call-ip6tables were disabled. They need to be enabled.
I found this using docker info, which listed a warning about bridge-nf-call-iptables and bridge-nf-call-ip6tables being disabled. There is a lot of discussion on the CoreDNS and K3s GitHub about issues caused by iptables, and our suspicions were correct.
This link was the solution for us.
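For reference, the fix boils down to something like this (a minimal sketch; it assumes the br_netfilter kernel module is available on the host):
# load the bridge netfilter module and enable the sysctls
modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
# persist across reboots
printf 'net.bridge.bridge-nf-call-iptables = 1\nnet.bridge.bridge-nf-call-ip6tables = 1\n' > /etc/sysctl.d/99-bridge-nf.conf
sysctl --system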

Related

I am trying to deploy the Microsoft Fluid Framework on an AWS EKS cluster, but the pods go into CrashLoopBackOff

When I get the logs for one of the pods with the CrashLoopBackOff status
kubectl logs alfred
it returns the following errors.
error: alfred service exiting due to error {"label":"winston","timestamp":"2021-11-08T07:02:02.324Z"}
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26) {
errno: 'ENOTFOUND',
code: 'ENOTFOUND',
syscall: 'getaddrinfo',
hostname: 'mongodb'
} {"label":"winston","timestamp":"2021-11-08T07:02:02.326Z"}
error: Client Manager Redis Error: getaddrinfo ENOTFOUND redis {"errno":"ENOTFOUND","code":"ENOTFOUND","syscall":"getaddrinfo","hostname":"redis","stack":"Error: getaddrinfo ENOTFOUND redis\n at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:66:26)","label":"winston","timestamp":"2021-11-08T07:02:02.368Z"}
I am new to Kubernetes and AWS EKS. I would be grateful for any help. Thanks!
If you look at the error, it's failing at getaddrinfo, the function that resolves a DNS name before connecting to an external service. It is trying to reach a Redis cluster, and it seems your EKS cluster doesn't have connectivity to it.
However, if you are running Redis as part of your EKS cluster, make sure to provide/update the Kubernetes service DNS name in the application code, or set it as an environment variable just before deployment.
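A minimal sketch of that approach (assuming the pod belongs to a Deployment also named alfred; the variable names here are hypothetical, so use whatever your configuration actually reads):
kubectl set env deployment/alfred MONGO_HOST=mongodb.default.svc.cluster.local REDIS_HOST=redis.default.svc.cluster.local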
It's Redis and MongoDB. As the error says, you are providing the hostnames redis and mongodb; they won't resolve to an IP address unless you have mapped them in the /etc/hosts file, which you have not.
Give the correct hostnames and the pods will come up. That is the root cause.
The errors above were generated because Mongo and Redis were not exposed by a Service. After I created service.yaml files for the instances, the errors went away. AWS EKS deploys containers in pods that are scattered across different nodes. In order for MongoDB to be reachable from other pods, you must expose a Service (a stable "front end") for the MongoDB deployment.
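A minimal sketch of what those service.yaml files amount to, assuming Deployments named mongodb and redis listening on their default ports (the same Services can be generated with kubectl expose):
kubectl expose deployment mongodb --name=mongodb --port=27017 --target-port=27017
kubectl expose deployment redis --name=redis --port=6379 --target-port=6379
After this, the hostnames mongodb and redis resolve through cluster DNS from any pod in the same namespace.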

Docker for Desktop Kubernetes Unable to connect to the server: dial tcp [::1]:6445

I am using Docker for Desktop on Windows 10 Professional with Hyper-V, and I am not using minikube. I have installed the Kubernetes cluster via Docker for Desktop, and it shows that Kubernetes is successfully installed and running.
When I run the following command:
kubectl config view
I get the following output:
apiVersion: v1
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://localhost:6445
  name: docker-for-desktop-cluster
contexts:
- context:
    cluster: docker-for-desktop-cluster
    user: docker-for-desktop
  name: docker-for-desktop
current-context: docker-for-desktop
kind: Config
preferences: {}
users:
- name: docker-for-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
However when I run the
kubectl cluster-info
I am getting the following error:
Unable to connect to the server: dial tcp [::1]:6445: connectex: No connection could be made because the target machine actively refused it.
It seems like there is some network issue, but I am not sure how to resolve it.
I know this is an old question, but the following helped me resolve a similar issue. The root cause was that I had previously installed minikube, and that was being used as my default context.
I was getting the following error:
Unable to connect to the server: dial tcp 192.168.1.8:8443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
In PowerShell, run the following command:
> kubectl config get-contexts
CURRENT   NAME                 CLUSTER          AUTHINFO         NAMESPACE
          docker-desktop       docker-desktop   docker-desktop
          docker-for-desktop   docker-desktop   docker-desktop
*         minikube             minikube         minikube
This lists all the contexts; check whether there are multiple. If you installed minikube in the past, it will show a * mark next to the currently selected default context. You can change that to point to the docker-desktop context as follows:
> kubectl config use-context docker-desktop
Run the get-contexts command again to verify that the * mark has moved.
Now, the following command should work:
> kubectl get pods
Posting a response to this very old question, as I was searching for a solution and later found a different cause for my problem, and the solution was simple.
The cause was that the config file was missing from the $HOME/.kube directory.
A simple restart of Docker Desktop restored the file with some defaults, and things were back to normal.
Side note: the issue started after I upgraded my Docker Desktop installation to the latest version (when I got the update-available popup). I should also mention that the cluster had stopped working and I had to manually remove Docker Desktop and reinstall the latest version (that was the story before this problem occurred).

minikube 0.30.0 DNS not working on CentOS 7 with Docker 18.06.1-ce and vm-driver=none

I am experimenting with minikube for learning purposes, on a CentOS 7 Linux machine with Docker 18.06.1-ce installed.
I installed minikube using
minikube start --vm-driver=none
I deployed a few applications, only to discover that they couldn't talk to each other using their hostnames.
I deleted minikube using
minikube delete
I re-installed minikube using
minikube start --vm-driver=none
I then followed the instructions under "Debugging DNS Resolution" (https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/), only to find out that the DNS system was not functional.
More precisely, I ran:
1. kubectl create -f https://k8s.io/examples/admin/dns/busybox.yaml
2. # kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
3. # kubectl exec busybox cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local contabo.host
options ndots:5
4. # kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-c4cffd6dc-dqtbt 1/1 Running 1 4m
kube-dns-86f4d74b45-tr8vc 2/3 Running 5 4m
Surprisingly, both kube-dns and CoreDNS are running. Should this be a concern?
I have looked everywhere for a solution without success; step 2 always returns the error above.
I can't believe that something so simple has become such a huge problem for me. Please assist.
Mine is working with coredns enabled and kube-dns disabled.
C02W84XMHTD5:ucp iahmad$ minikube addons list
- addon-manager: enabled
- coredns: enabled
- dashboard: enabled
- default-storageclass: enabled
- efk: disabled
- freshpod: disabled
- heapster: disabled
- ingress: disabled
- kube-dns: disabled
- metrics-server: disabled
- nvidia-driver-installer: disabled
- nvidia-gpu-device-plugin: disabled
- registry: disabled
- registry-creds: disabled
- storage-provisioner: enabled
You can disable kube-dns:
minikube addons disable kube-dns
Please note the output for the kube-dns pod below; it has only 2 of 3 containers running.
kube-dns-86f4d74b45-tr8vc 2/3 Running 5 4m
The last time I encountered this was when Docker's default FORWARD policy was DROP. Changing it to ACCEPT with the command below fixed the problem for me.
iptables -P FORWARD ACCEPT
It might be something else too; please check the pod logs.
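A quick way to check both points (generic commands, not minikube-specific):
iptables -S FORWARD | head -n 1
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
The first command prints "-P FORWARD DROP" if the policy is the culprit; the second fetches the DNS pod logs.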
After deleting /etc/kubernetes, /var/lib/kubelet, and /var/lib/kubeadm.yaml and restarting minikube, I can now successfully reproduce the DNS-resolution debugging steps (https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/).
I bet some stale settings had persisted across minikube start/stop iterations, leading to an inconsistent configuration.
It is also worth mentioning that DNS resolution was lost after restarting iptables.
I suspect this is related to iptables rules: some rule is being added by minikube, and as it gets lost as part of the iptables restart, the problem re-appears.
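One way to sanity-check that theory (a generic inspection, not minikube-specific) is to count the Kubernetes-managed rules before and after the iptables restart:
iptables-save | grep -c 'KUBE-'
If the count drops to zero after the restart, the kube-proxy rules were flushed.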
I managed to resolve the problem by re-installing minikube after deleting all state files under /etc and /var/lib, but forgot to post an update.
This can now be closed.

WildFly/JBoss v10 is not working in cluster mode with Docker Swarm

I have a web-based Java application running on WildFly/JBoss version 10, and I am using Docker (1.13.1-cs2) to deploy it. Now, for a high-availability (HA) scenario, I want my application to work in cluster mode, so I changed my WildFly configuration to cluster mode in standalone-full-ha.xml. After this change everything works perfectly, but only if I use the default Docker network and start the container on the Docker bridge network. However, my requirement is to run this container/application as a service under Docker Swarm. As soon as I start it as a service, WildFly/JBoss is not able to start in cluster mode and throws errors like this:
21:01:27,885 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (38 bytes): java.io.IOException: Operation not permitted, headers: NAKACK2: [HIGHEST_SEQNO, seqno=2631], TP: [cluster_name=ee]
21:01:28,826 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (4166 bytes): java.io.IOException: Operation not permitted, headers: FORK: ee:activemq-cluster, NAKACK2: [MSG, seqno=2632], TP: [cluster_name=ee]
21:01:29,886 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (38 bytes): java.io.IOException: Operation not permitted, headers: NAKACK2: [HIGHEST_SEQNO, seqno=2632], TP: [cluster_name=ee]
21:01:30,826 ERROR (TransferQueueBundler,ee,WEB-APP-NODE) JGRP000029: WEB-APP-NODE: failed sending message to cluster (4166 bytes): java.io.IOException: Operation not permitted, headers: FORK: ee:activemq-cluster, NAKACK2: [MSG, seqno=2633], TP: [cluster_name=ee]
Note: I am using the default Swarm ingress network for exposing ports and for communication.
As per my troubleshooting, the issue is related to the multicast address used by WildFly/JBoss version 10.
I have also tried the steps from "multicast address in docker",
but they still do not help in my case. Can anyone help me with this? It would be appreciated very much!
Thank you!
The overlay network from Docker Swarm does not currently support IP multicast.
You can fall back to TCP-based unicast for your cluster, but that leaves the challenge of knowing the IP addresses of all the other containers in the service.
Another way is to create a macvlan-based network, which does support multicast. Tutorial: http://collabnix.com/docker-17-06-swarm-mode-now-with-macvlan-support/
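As a rough sketch of that approach (the subnet, gateway, and parent interface are placeholders for your environment, not values from the tutorial): first create a config-only network on each node, then a swarm-scoped macvlan network on a manager:
docker network create --config-only --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=eth0 macvlan-config
docker network create -d macvlan --scope swarm --config-from macvlan-config wildfly-net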
With that variant I have the problem that, as soon as you connect such a network to a container, ingress (routing mesh) and access to the outside world via docker_gwbridge stop working (details: Docker Swarm container with MACVLAN network gets wrong gateway - no internet access).

Pod creation always stuck in ContainerCreating state

I am trying to create a pod in Kubernetes with the following simple command:
kubectl run example --image=nginx
It runs and assigns the pod to the minion correctly, but the status is always ContainerCreating due to the following error. I have not set up GCR or gcloud on my machine, so I am not sure why it is pulling from there.
1h 29m 14s {kubelet centos-minion1} Warning FailedSync Error syncing pod, skipping:
failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed
for gcr.io/google_containers/pause:2.0, this may be because there are no
credentials on this request. details: (unable to ping registry endpoint
https://gcr.io/v0/\nv2 ping attempt failed with error: Get https://gcr.io/v2/:
http: error connecting to proxy http://87.254.212.120:8080: dial tcp
87.254.212.120:8080: i/o timeout\n v1 ping attempt failed with error:
Get https://gcr.io/v1/_ping: http: error connecting to proxy
http://87.254.212.120:8080: dial tcp 87.254.212.120:8080: i/o timeout)
Kubernetes is trying to create a pause container for your pod; this container is used to create the pod's network namespace. See this question and its answers for more general information on the pause container.
To your specific error: Kubernetes tries to pull the pause container's image (which would be gcr.io/google_containers/pause:2.0, according to your error message) from the Google Container Registry (gcr.io). Your Docker engine apparently tries to connect to GCR using an HTTP proxy located at 87.254.212.120:8080, to which it cannot connect (i/o timeout).
To correct this error, either make sure that your HTTP proxy server is online and does not block HTTP requests to GCR, or (if you do have public Internet access) disable the proxy connection for your Docker engine (this is typically done via the http_proxy and https_proxy environment variables, which would have been set in /etc/sysconfig/docker or /etc/default/docker, depending on your Linux distribution).
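A rough sketch of the second option on a RHEL/CentOS-style host (the exact variable names depend on how Docker was packaged, so treat this as an illustration):
# comment out the proxy lines in /etc/sysconfig/docker (or /etc/default/docker on Debian/Ubuntu), e.g.:
#   http_proxy=http://87.254.212.120:8080
#   https_proxy=http://87.254.212.120:8080
# then restart the Docker engine so the change takes effect
sudo systemctl restart docker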
