Unable to wget/curl from a kubernetes Pod with https/443 - docker

I am not able to curl/wget any URL over https; they all give connection refused errors. Here's what I've observed so far:
When I curl any https URL from any pod, the domain gets resolved to a different IP address than the intended one. I verified this by running dig on the domain name and curling the same domain name; the two IPs were different.
For debugging purposes, I tried the same scenario from a kubelet docker container and it worked. But when I tried it from another app container, it failed.
Any idea what might be wrong? I am sure there is some issue with networking. Any more steps for debugging?
The cluster is set up with RKE on bare metal and uses Canal for networking.
The website I am trying to curl is updates.jenkins.io, and here's the nslookup output:
bash-4.4# nslookup updates.jenkins.io
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
updates.jenkins.io.domain.name canonical name = io.domain.name.
Name: io.domain.name
Address: 185.82.212.199
And nslookup from the node gives
root@n4:/home# nslookup updates.jenkins.io
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
updates.jenkins.io canonical name = mirrors.jenkins.io.
Name: mirrors.jenkins.io
Address: 52.202.51.185
As far as I can see, it is trying to connect to io.domain.name and not updates.jenkins.io.
On further inspection, all domains ending with .io are affected. Here's another one:
bash-4.4# nslookup test.io
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
test.io.domain.name canonical name = io.domain.name.
Name: io.domain.name
Address: 185.82.212.199

Well, there was some issue with /etc/resolv.conf: it was missing the correct nameserver entry. Once that was fixed and the system components were restarted, everything worked.
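For anyone debugging something similar, a quick way to see what a pod is actually resolving against is to check resolv.conf from a throwaway pod (a sketch; the pod name and busybox image are arbitrary choices):

kubectl run dns-debug --rm -it --restart=Never --image=busybox -- cat /etc/resolv.conf
# A healthy pod on this cluster should point at the cluster DNS service, e.g.:
#   nameserver 10.43.0.10
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5
# A stray search domain here (like the domain.name suffix visible in the lookups
# above) would explain why updates.jenkins.io turned into updates.jenkins.io.domain.name.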

Related

Docker containers in a custom bridge network and a dnsmasq service on host

TL;DR: How do I make containers use a dnsmasq instance running on the host machine as their DNS server?
Details of what I tried and where I am right now are below.
In my docker-compose.yml I set up a custom network:
networks:
  mycustomnet:
    driver: bridge
and all containers are on it:
services:
  mycontainer:
    networks:
      - mycustomnet
As per the docs and some answers here on SO, in a configuration like this docker will set up /etc/resolv.conf in the container to point to 127.0.0.11, which then forwards DNS requests to whatever the host's DNS resolver is set to. That's my understanding, and indeed it does seem to set that up correctly:
root@717f2c8ce87e:/# cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
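One way to cross-check what the embedded resolver at 127.0.0.11 has been given (a sketch; mycontainer_1 is the container name from above):

docker inspect -f '{{ .HostConfig.Dns }}' mycontainer_1   # any explicit --dns / compose "dns:" entries
cat /etc/resolv.conf                                       # on the host: the upstream list dockerd copies by default
# Note: if the host file only lists loopback addresses (e.g. 127.0.0.1 or ::1 for a
# local dnsmasq), dockerd typically filters them out and falls back to public DNS.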
My host is configured with a dnsmasq service that resolves our internal TLDs (say, .example) against a cloud provider (DO) DNS server, and everything else is resolved against Google and Cloudflare DNS. From /etc/dnsmasq.conf:
server=/example/173.245.58.51
server=/example/173.245.59.41
server=/example/198.41.222.173
server=8.8.8.8
server=8.8.4.4
server=1.1.1.1
server=1.0.0.1
On my host I can run nslookup test.example and it resolves correctly:
-> # nslookup test.example
Server: ::1
Address: ::1#53
Name: test.example
Address: 10.104.0.4
But if I do the same in the container, it doesn't resolve hostnames with internal TLDs:
-> # docker exec -it mycontainer_1 bash
root@717f2c8ce87e:/# nslookup test.example
Server: 127.0.0.11
Address: 127.0.0.11#53
** server can't find test.example: NXDOMAIN
But it does resolve addresses such as google.com or github.com:
root@717f2c8ce87e:/# nslookup google.com
Server: 127.0.0.11
Address: 127.0.0.11#53
Non-authoritative answer:
Name: google.com
Address: 172.217.194.100
Name: google.com
Address: 172.217.194.102
Name: google.com
Address: 172.217.194.101
Name: google.com
Address: 172.217.194.139
Name: google.com
Address: 172.217.194.113
Name: google.com
Address: 172.217.194.138
Name: google.com
Address: 2404:6800:4003:c04::8b
Name: google.com
Address: 2404:6800:4003:c04::71
Name: google.com
Address: 2404:6800:4003:c04::64
Name: google.com
Address: 2404:6800:4003:c04::66
This is quite confusing because I don't understand which DNS server it is actually using. Clearly not the host machine's dnsmasq, otherwise it would have resolved the .example domains too, I would think.
What am I missing? How do I resolve .example domains from within the container? And how can I check which DNS server my containers currently use?

docker-compose internal DNS server 127.0.0.11 connection refused

Suddenly, when I deployed some new containers with docker-compose, the internal hostname resolution stopped working.
When I tried to ping one container from another using the service name from the docker-compose.yaml file, I got ping: bad address 'myhostname'
I checked that /etc/resolv.conf was correct and it was using 127.0.0.11
When I tried to manually resolve my hostname with either nslookup myhostname. or nslookup myhostname.docker.internal, I got an error:
nslookup: write to '127.0.0.11': Connection refused
;; connection timed out; no servers could be reached
Okay, so the issue is that the docker DNS server has stopped working. All containers that were already started still function, but any new ones started have this issue.
I am running Docker version 19.03.6-ce, build 369ce74
I could of course just restart docker to see if it solves it, but I am also keen on understanding why this issue happened and how to avoid it in the future.
I have a lot of containers started on the server and a total of 25 docker networks currently.
Any ideas on what can be done to troubleshoot? Any known issues that could explain this?
The docker-compose.yaml file I use has worked before and no changes have been made to it.
Edit: No DNS names at all can be resolved. 127.0.0.11 refuses all connections. I can ping any external IP address, as well as the IPs of other containers on the same docker network. It is only the 127.0.0.11 DNS server that is not working; 127.0.0.11 still replies to ping from within the container.
Make sure you're using a custom bridge network, NOT the default one. As per the Docker docs (https://docs.docker.com/network/bridge/), the default bridge network does not provide automatic DNS resolution between containers:
Containers on the default bridge network can only access each other by IP addresses, unless you use the --link option, which is considered legacy. On a user-defined bridge network, containers can resolve each other by name or alias.
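A minimal way to see the difference (a sketch; network and container names are arbitrary):

docker network create mynet                               # user-defined bridge: embedded DNS enabled
docker run -d --name web --network mynet nginx
docker run --rm --network mynet busybox ping -c 1 web     # resolves "web" by container name
docker run --rm busybox ping -c 1 web                     # default bridge: the name does not resolve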
I had the same problem. I am using the pihole/pihole docker container as the sole DNS server on my network, and Docker containers on the same host as the pihole server could not resolve domain names.
I resolved the issue based on hmario's response to this forum post.
In brief, modify the pihole docker-compose.yml from:
---
version: '3.7'
services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 53:53/udp
      - 53:53/tcp
    volumes: [...]
to
---
version: '3.7'
services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 192.168.1.30:53:53/udp
      - 192.168.1.30:53:53/tcp
    volumes: [...]
where 192.168.1.30 is the IP address of the docker host.
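To confirm the new binding took effect, one option (a sketch, assuming ss is available on the host) is to check which addresses are listening on port 53 after recreating the container:

sudo ss -lunp | grep ':53 '    # UDP: should show 192.168.1.30:53 rather than 0.0.0.0:53
sudo ss -ltnp | grep ':53 '    # TCP listeners on port 53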
I'm having exactly the same problem. Based on the comment here, I could reproduce the setup without docker-compose, using only docker:
docker network create alpine_net
docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
Stopping docker (systemctl stop docker) and starting the daemon with debug output gives:
> dockerd --debug
[...]
[resolver] read from DNS server failed, read udp 172.19.0.2:40868->192.168.177.1:53: i/o timeout
[...]
where 192.168.177.1 is the local network IP of the host that docker runs on, and also where pi-hole is running as the DNS server and working for all of my other systems.
I played around with fixing the iptables configuration, but even switching it off completely and opening everything up did not help.
The solution I found, without fully understanding the root cause, was to move the DNS to another server. I installed dnsmasq on a second system with IP 192.168.177.2 that does nothing other than forward all DNS queries back to my pi-hole server on 192.168.177.1.
After starting docker on 192.168.177.1 again with DNS configured to use 192.168.177.2, everything worked again.
With this in one terminal:
dockerd --debug --dns 192.168.177.2
and the command from above in another terminal, it worked again:
> docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
search mydomain.local
nameserver 127.0.0.11
options ndots:0
PING www.google.com (172.217.23.4): 56 data bytes
64 bytes from 172.217.23.4: seq=0 ttl=118 time=8.201 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 8.201/8.201/8.201 ms
So moving the DNS server to another host and adding "dns" : ["192.168.177.2"] to my /etc/docker/daemon.json fixed it for me.
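For reference, a minimal /etc/docker/daemon.json with just that setting would look like this (followed by a restart of the docker daemon for it to take effect):

{
  "dns": ["192.168.177.2"]
}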
Maybe someone else can help explain the root cause behind the problem of running the DNS server on the same host as docker.
First, make sure your container is connected to a custom bridge network. I suppose that by default, in a custom network, DNS requests inside the container are sent to 127.0.0.11#53 and forwarded to the DNS server of the host machine.
Second, check iptables -L to see if there are docker-related rules. If there are not, that is probably because iptables was restarted/reset. You'll need to restart the docker daemon to re-add the rules and make DNS request forwarding work.
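A quick check for the second point might look like this (a sketch, assuming a systemd-based host):

sudo iptables -L -n | grep -i docker          # filter table: DOCKER / DOCKER-USER chains should exist
sudo iptables -t nat -L -n | grep -i docker   # nat table: DOCKER chains used for port mapping/masquerading
sudo systemctl restart docker                 # recreates the rules if a firewall reload wiped them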
I had the same problem; it turned out to be the host machine's hostname. I checked the hostnamectl result and it looked fine, but the problem was solved by a simple reboot. Before the reboot, the result of cat /etc/hosts was like this:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 localhost HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 localhost HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
and after the reboot, I got this result:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 hostnameIHaveSetuped HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 hostnameIHaveSetuped HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

docker dns failing with custom dns on host

I'm trying to set up pihole in a docker container (on a Raspberry Pi) and, as such, have my DNS on my IP: 192.160.170.10. The docker container runs the DNS and exposes its port 53, where the DNS is available.
When running nslookup google.com on the host, I get the correct output:
Server: 192.160.170.10
Address: 192.160.170.10#53
Non-authoritative answer:
Name: google.com
Address: 172.217.16.78
My resolv.conf also contains this address.
When running a docker container, however, I am unable to do this nslookup:
docker run busybox nslookup google.com
outputs:
;; connection timed out; no servers could be reached
Following this tutorial, I've tried specifying the DNS with the following command:
docker run --dns 192.160.170.10 busybox nslookup google.com
but this also does not solve the problem. I've also tried adding the DNS server to /etc/docker/daemon.json, which also did nothing.
The docker container's resolv.conf output is: nameserver 192.160.170.10
What is wrong with my configuration, and how can I further debug this DNS issue?
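One way to narrow this down further (a sketch; it queries the host DNS directly, bypassing the container's resolv.conf):

docker run --rm busybox nslookup google.com 192.160.170.10
# If this also times out, containers on the bridge network cannot reach the host's
# port 53 at all (a firewall/NAT issue) rather than it being a resolv.conf problem.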
Edit: output from docker run --rm --net=host busybox nslookup google.com:
Server: 192.160.170.10
Address: 192.160.170.10:53
Non-authoritative answer:
Name: google.com
Address: 172.217.16.78
*** Can't find google.com: No answer

kubeadm join times out on non-default NIC/IP

I am trying to configure a K8s cluster on-prem and the servers are running Fedora CoreOS using multiple NICs.
I am configuring the cluster to use a non-default NIC: a bond defined over 2 interfaces. All servers can reach each other over that interface and have HTTP and HTTPS connectivity to the internet.
kubeadm join hangs at:
I0513 13:24:55.516837 16428 token.go:215] [discovery] Failed to request cluster-info, will try again: Get https://${BOND_IP}:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
The relevant kubeadm init config looks like this:
[...]
localAPIEndpoint:
  advertiseAddress: ${BOND_IP}
  bindPort: 6443
nodeRegistration:
  kubeletExtraArgs:
    volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
    runtime-cgroups: "/systemd/system.slice"
    kubelet-cgroups: "/systemd/system.slice"
    node-ip: ${BOND_IP}
  criSocket: /var/run/dockershim.sock
  name: master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
[...]
The join config that I am using looks like this:
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    token: ${TOKEN}
    caCertHashes:
    - "${SHA}"
    apiServerEndpoint: "${BOND_IP}:6443"
nodeRegistration:
  kubeletExtraArgs:
    volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
    runtime-cgroups: "/systemd/system.slice"
    kubelet-cgroups: "/systemd/system.slice"
If I configure it using the default eth0, it works without issues.
This is not a connectivity issue. The port test works fine:
# nc -s ${BOND_IP_OF_NODE} -zv ${BOND_IP_OF_MASTER} 6443
Ncat: Version 7.80 ( https://nmap.org/ncat )
Ncat: Connected to ${BOND_IP_OF_MASTER}:6443.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
I suspect this happens because kubelet is listening on eth0; if so, can I change it to use a different NIC/IP?
LE: The eth0 connection has been cut off completely (cable out, interface down, connection down).
Now, when we init, if we set the kube-api advertise address to 0.0.0.0, it defaults to the bond, which is what we wanted initially:
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 0.0.0.0
result:
[certs] apiserver serving cert is signed for DNS names [emp-prod-nl-hilv-quortex19 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.0.0.1 ${BOND_IP}]
I have even added an ACCEPT rule for port 6443 in iptables and it still times out. All my Calico pods are up and running (all pods in the kube-system namespace, for that matter).
LLE:
I have tested Calico and Weave Net and both show the same issue. The api-server is up and can be reached from the master using curl, but it times out from the nodes.
LLLE:
On the premise that the kube-api is nothing but an HTTPS server, I tried two things from the node that cannot reach it when doing the kubeadm join:
Ran a python3 simple HTTP server on 6443 and WAS ABLE TO CONNECT from the node
Ran an nginx pod, exposed it over another port as a NodePort, and WAS ABLE TO CONNECT from the node
Yet the node just can't reach the api-server on 6443, or on any other port for that matter.
What am I doing wrong?
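For reference, the python3 test mentioned above was roughly this (a sketch; it assumes the port is free on the master at that moment and reuses the placeholder variables from the nc test):

# on the master
python3 -m http.server 6443 --bind ${BOND_IP_OF_MASTER}
# from the joining node
curl -v http://${BOND_IP_OF_MASTER}:6443/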
The cause:
The interface used was a bond of type ACTIVE-ACTIVE. This apparently made kubeadm try the other of the 2 bonded interfaces, which was not in the same subnet as the advertised server IP...
Using ACTIVE-PASSIVE did the trick and I was able to join the nodes.
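One way to check which mode a bond is actually running in (a sketch; bond0 is an assumed interface name):

grep -i 'bonding mode' /proc/net/bonding/bond0
# e.g. "Bonding Mode: IEEE 802.3ad Dynamic link aggregation"  -> active-active / LACP
#      "Bonding Mode: fault-tolerance (active-backup)"        -> active-passive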
LE: If anyone knows why kubeadm join does not support LACP with ACTIVE-ACTIVE bond setups on Fedora CoreOS, please advise here. Otherwise, if additional configuration is required, I would very much like to know what I have missed.

Can't resolve home dns from inside k8s pod

So I recently set up a single-node kubernetes cluster on my home network. I have a DNS server that runs on my router (DD-WRT, dnsmasq) and resolves a bunch of local domains for ease of use; server1.lan, for example, resolves to 192.168.1.11.
Server 1 was set up as my single-node kubernetes cluster. Excited about the possibilities of local DNS, I spun up my first deployment using a docker container called netshoot, which has a bunch of helpful network debugging tools bundled in. I exec'd into the container, ran a ping, and got the following:
bash-5.0# ping server1.lan
ping: server1.lan: Try again
It failed. Then I tried pinging Google's DNS (8.8.8.8) and that worked fine.
I then tried to resolve the kubernetes default domain, and it worked fine:
bash-5.0# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
The /etc/resolv.conf file looks fine from inside the pod:
bash-5.0# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
I then started tailing the coredns logs and saw some interesting output:
2019-11-13T03:01:23.014Z [ERROR] plugin/errors: 2 server1.lan. AAAA: read udp 192.168.156.140:37521->192.168.1.1:53: i/o timeout
2019-11-13T03:01:24.515Z [ERROR] plugin/errors: 2 server1.lan. A: read udp 192.168.156.140:41964->192.168.1.1:53: i/o timeout
2019-11-13T03:01:24.515Z [ERROR] plugin/errors: 2 server1.lan. AAAA: read udp 192.168.156.140:33455->192.168.1.1:53: i/o timeout
2019-11-13T03:01:25.015Z [ERROR] plugin/errors: 2 server1.lan. AAAA: read udp 192.168.156.140:48864->192.168.1.1:53: i/o timeout
2019-11-13T03:01:25.015Z [ERROR] plugin/errors: 2 server1.lan. A: read udp 192.168.156.140:35328->192.168.1.1:53: i/o timeout
It seems like kubernetes is trying to communicate with 192.168.1.1 from inside the cluster network and failing. I guess CoreDNS uses whatever is in the resolv.conf on the host, so here is what that looks like:
nameserver 192.168.1.1
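For what it's worth, the default CoreDNS Corefile forwards anything it cannot resolve inside the cluster to that same host resolv.conf; a quick way to confirm (a sketch):

kubectl -n kube-system get configmap coredns -o yaml | grep -A 2 forward
# expected to show something like: forward . /etc/resolv.conf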
I can resolve server1.lan from everywhere else on the network, except from these pods. My router IP is 192.168.1.1, and that is what responds to DNS queries.
Any help on this would be greatly appreciated; it seems like some kind of IP routing issue between the kubernetes network and my real home network, or that's my theory anyway. Thanks in advance.
So it turns out the issue was that when I initiated the cluster, I specified a pod CIDR that conflicted with IPs on my home network. My kubeadm command was this
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-cert-extra-sans=server1.lan
Since my home network overlapped with that CIDR, and since my DNS upstream was 192.168.1.1, kubernetes thought that address was on the pod network rather than on my home network, and failed to route the DNS resolution packets appropriately.
The solution was to recreate my cluster using the following command,
sudo kubeadm init --pod-network-cidr=10.200.0.0/16 --apiserver-cert-extra-sans=server1.lan
And when I applied my calico yaml file, I made sure to replace the default 192.168.0.0/16 CIDR with the new 10.200.0.0/16 CIDR.
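For anyone doing the same, the setting in question in the calico manifest is typically the CALICO_IPV4POOL_CIDR environment variable on the calico-node container; roughly (a sketch, exact layout varies between Calico releases):

- name: CALICO_IPV4POOL_CIDR
  value: "10.200.0.0/16"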
Hope this helps someone. Thanks.
