DNS Server in Docker Container - docker

I have the DNS server Unbound in a Docker container. This container has the following port mapping in the Docker daemon:
0.0.0.0:53->53/tcp, 0.0.0.0:53->53/udp
The docker host has the IP address 192.168.24.5 and a local DHCP server announces the host's IP as the local DNS server. This works fine all over my local network.
The host itself uses this DNS server through the IP 192.168.24.5. That's the address that is put to the host's /etc/resolv.conf. (I know it would not work with docker if there was 127.0.0.1 as the nameserver address.)
I have some other docker containers and they are supposed to use this DNS server as well. The point is, they don't.
What actually happens is this:
Within a random container I can ping the host's address as well as the address of the Unbound container. But when I use dig inside a container I get these results:
# dig @172.17.0.6 ...
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 22778
;; flags: qr rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
# dig @192.168.24.5 ...
;; reply from unexpected source: 172.17.0.1#53, expected 192.168.24.5#53
This looks like some internal DNS server intercepts the queries and tries to answer them. That would be fine if it used the host's DNS server to get an answer, but it doesn't. DNS doesn't work at all in the containers.
Am I doing something wrong, or is Docker doing something it should not?

The issue is the iptables UDP NAT for the DNS server: you query the host IP, but the reply comes back from the Docker bridge address instead.
There are at least two ways to fix this:
Use the DNS container's IP as the resolver for the other containers, if possible.
or
Run your DNS server container with --net=host and remove the port mapping altogether. Then using the host IP as the DNS server works as expected.
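For example (a minimal sketch; 172.17.0.6 is the Unbound container's IP from the question, and the image name below is a placeholder):
docker run --rm --dns 172.17.0.6 alpine nslookup google.com
or, for the host-network variant:
docker run -d --net=host --name unbound your-unbound-image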

Related

Dockerized Zabbix: Server Can't Connect to the Agents by IP

Problem:
I'm trying to configure a fully containerized Zabbix 6.0 monitoring system on Ubuntu 20.04 LTS using Zabbix's Docker-Compose repo found HERE.
The command I used to raise the Zabbix server and also a Zabbix Agent is:
docker-compose -f docker-compose_v3_ubuntu_pgsql_latest.yaml --profile all up -d
Although the Agent comes up in a broken state and shows a "red" status, when I change its IP address FROM 127.0.0.1 TO 172.16.239.6 (the default IP Docker-Compose assigns to it) the Zabbix Server can successfully connect and monitoring is established. HOWEVER: the Zabbix Server cannot connect to any other Dockerized Zabbix Agents on REMOTE hosts, which are raised with the docker run command:
docker run --add-host=zabbix-server:172.16.238.3 -p 10050:10050 -d --privileged --name DockerHost3-zabbix-agent -e ZBX_SERVER_HOST="zabbix-server" -e ZBX_PASSIVE_ALLOW="true" zabbix/zabbix-agent:ubuntu-6.0-latest
NOTE: I looked at other Stack Exchange sites to post this question, but Stack Overflow appeared to be the go-to site for these Docker/Zabbix issues, having over 30 such questions.
Troubleshooting:
Comparative Analysis:
Agent Configuration:
A comparative analysis of the working ("green") Agent on the same host as the Zabbix Server against the Agents on different hosts showing "red" statuses (not contactable by the Zabbix Server), using the following command, shows the configurations have parity:
docker exec -u root -it (ID of agent container returned from "docker ps") bash
And then execute:
grep -Ev ^'(#|$)' /etc/zabbix/zabbix_agentd.conf
Ports:
The correct ports were showing as open on the "red" Agents, just as they were on the "green" Agent running on the same host as the Zabbix Server, per the output of the command:
ss -luntu
NOTE: This command was issued from the HOST, not the Docker container for the Agent.
Firewalling:
Review of the iptables rules from the HOST (not container) using the following command didn't reveal anything of concern:
iptables -nvx -L --line-numbers
But to exclude firewalling, I nonetheless allowed everything in the iptables FORWARD chain on both the Zabbix Server and an Agent in a "red" status used for testing.
I also allowed everything on the MikroTik GW router connecting the Zabbix Server to the different physical hosts running the Zabbix Agents.
Routing:
The Zabbix server can ping remote Agent interfaces proving there's a route to the Agents.
AppArmor:
I also stopped AppArmor to exclude it as being causal:
sudo systemctl stop apparmor
sudo systemctl status apparmor
Summary:
So everything is wide open, the Zabbix Server can route to the Agents, and the config of the "red" Agents has parity with the config of the "green" Agent living on the same host as the Zabbix Server itself.
I've set up non-containerized Zabbix installations in production environments successfully, so I'm otherwise familiar with Zabbix.
Why can't the containerized Zabbix Server connect to the containerized Zabbix Agents on different hosts?
Short Answer:
There was NOTHING wrong with the Zabbix config; this was a Docker-induced problem.
docker logs <hostname of Zabbix server> revealed that there appeared to be NAT'ing happening on the Zabbix SERVER, and indeed there was.
Docker was modifying the iptables NAT table on the host running the Zabbix Server container, causing the source address of the Zabbix Server to present as the IP of the physical host itself, not the Docker-Compose-assigned IP address of 172.16.238.3.
Thus, the Agent was not expecting this address and refused the connection. My experience of Dockerized apps is that they are mostly good at modifying iptables to create the correct connectivity, but not in this particular case ;-).
I then reviewed the NAT table by executing the following command on the HOST (not container):
iptables -t nat -nvx -L --line-numbers
This revealed that Docker was being, erm, "helpful" and NAT'ing the Zabbix Server's traffic.
I deleted the offending rules by their rule number:
iptables -t nat -D <chain> <rule #>
After this, the Zabbix Server's IP address was presented correctly to the Agents, which accepted the connections, and their statuses turned "green".
The problem is reproducible if you execute:
docker-compose -f docker-compose_v3_ubuntu_pgsql_latest.yaml down
and then run the up command to raise the containers again: you'll see the offending iptables rules restored to the NAT table of the host running the Zabbix Server's container, breaking connectivity with the Agents.
Longer Answer:
Below are the steps required to identify and resolve the problem of the Zabbix Server NAT'ing its traffic out of the host's IP:
Identify if the HOST of the Zabbix Server container is NAT'ing:
We need to see how the IP of the Zabbix Server's container presents to the Agents, so we have to get the container ID for a Zabbix AGENT to review its logs:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b2fcf38d601f zabbix/zabbix-agent:ubuntu-6.0-latest "/usr/bin/tini -- /u…" 5 hours ago Up 5 hours 0.0.0.0:10050->10050/tcp, :::10050->10050/tcp DockerHost3-zabbix-agent
Next, supply the Agent's container ID to the docker logs command:
docker logs b2fcf38d601f
Then review the rejected IP address in the log output to determine whether it is NOT the Zabbix Server's IP:
81:20220328:000320.589 failed to accept an incoming connection: connection from "NAT'ed IP" rejected, allowed hosts: "zabbix-server"
The fact that you can see this error proves that there are no routing or connectivity issues: the connection is going through; it's just being rejected by the application, NOT the firewall.
If NAT'ing is proven, continue to the next step.
On Zabbix SERVER's Host:
The remediation happens on the Zabbix Server's host itself, not the Agents, which is good because we can fix the problem in one place versus many.
Execute the below command on the host running the Zabbix Server's container:
iptables -t nat -nvx -L --line-numbers
Output of command:
Chain POSTROUTING (policy ACCEPT 88551 packets, 6025269 bytes)
num pkts bytes target prot opt in out source destination
1 0 0 MASQUERADE all -- * !br-abeaa5aad213 192.168.24.128/28 0.0.0.0/0
2 73786 4427208 MASQUERADE all -- * !br-05094e8a67c0 172.16.238.0/24 0.0.0.0/0
Chain DOCKER (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 RETURN all -- br-abeaa5aad213 * 0.0.0.0/0 0.0.0.0/0
2 95 5700 RETURN all -- br-05094e8a67c0 * 0.0.0.0/0 0.0.0.0/0
We can see the counters are incrementing for rule #2 in both the "POSTROUTING" and "DOCKER" chains.
These rules are clearly matching and have effect.
Delete the offending rules on the HOST of the Zabbix Server container, which is NAT'ing its traffic to the Agents:
sudo iptables -t nat -D POSTROUTING 2
sudo iptables -t nat -D DOCKER 2
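To confirm, list the NAT table again; rule #2 should be gone from both chains and the counters should stop incrementing:
iptables -t nat -nvx -L --line-numbers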
Wait a few moments and the Agents should now go "green", assuming there are no other configuration or firewalling issues. If the Agents remain "red" after applying the fix, then please work through the troubleshooting steps I documented in the Question section.
Conclusion:
I've tested that restarting the Zabbix Server container does not recreate the deleted rules. But again, please note that a docker-compose down followed by a docker-compose up WILL recreate the deleted rules and break Agent connectivity.
Hope this saves other folks some wasted cycles. I'm both a Linux and a network engineer and this hurt my head, so it would be near impossible to resolve if you're not a dab hand with networking.

docker-compose internal DNS server 127.0.0.11 connection refused

Suddenly when I deployed some new containers with docker-compose the internal hostname resolution didn't work.
When I tried to ping one container from the other using the service name from the docker-compose.yaml file I got ping: bad address 'myhostname'
I checked that the /etc/resolv.conf was correct and it was using 127.0.0.11
When I tried to manually resolve my hostname with either nslookup myhostname. or nslookup myhostname.docker.internal I got an error:
nslookup: write to '127.0.0.11': Connection refused
;; connection timed out; no servers could be reached
Okay, so the issue is that the Docker DNS server has stopped working. All already-started containers still function, but any newly started ones have this issue.
I am running Docker version 19.03.6-ce, build 369ce74
I could of course just restart docker to see if it solves it, but I am also keen on understanding why this issue happened and how to avoid it in the future.
I have a lot of containers started on the server and a total of 25 docker networks currently.
Any ideas on what can be done to troubleshoot? Any known issues that could explain this?
The docker-compose.yaml file I use has worked before and no changes have been made to it.
Edit: No DNS names at all can be resolved. 127.0.0.11 refuses all connections. I can ping any external IP addresses, as well as the IP of other containers on the same docker network. It is only the 127.0.0.11 DNS server that is not working. 127.0.0.11 still replies to ping from within the container.
Make sure you're using a custom bridge network, NOT the default one. As per the Docker docs (https://docs.docker.com/network/bridge/), the default bridge network does not allow automatic DNS resolution:
Containers on the default bridge network can only access each other by IP addresses, unless you use the --link option, which is considered legacy. On a user-defined bridge network, containers can resolve each other by name or alias.
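A quick way to check this is to create a user-defined network and test name resolution on it (the names here are just examples):
docker network create test_net
docker run -d --network test_net --name web nginx
docker run --rm --network test_net alpine ping -c 1 web
If name resolution works there but not for your containers, the embedded DNS at 127.0.0.11 is fine and the problem is the network the containers are attached to.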
I have the same problem. I am using the pihole/pihole docker container as the sole dns server on my network. Docker containers on the same host as the pihole server could not resolve domain names.
I resolved the issue based on "hmario"'s response to this forum post.
In brief, modify the pihole docker-compose.yml from:
---
version: '3.7'
services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 53:53/udp
      - 53:53/tcp
    volumes: [...]
to
---
version: '3.7'
services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 192.168.1.30:53:53/udp
      - 192.168.1.30:53:53/tcp
    volumes: [...]
where 192.168.1.30 is the IP address of the Docker host.
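As a quick sanity check (a sketch; 192.168.1.30 is the example host IP above), you can then query the forwarder from another container on the same host:
docker run --rm alpine nslookup google.com 192.168.1.30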
I'm having exactly the same problem. According to the comment here I could reproduce the setting without docker-compose, only using docker:
docker network create alpine_net
docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
Stopping Docker (systemctl stop docker) and starting the daemon with debug output gives:
> dockerd --debug
[...]
[resolver] read from DNS server failed, read udp 172.19.0.2:40868->192.168.177.1:53: i/o timeout
[...]
where 192.168.177.1 is the local network IP of the host that Docker runs on, which is also where Pi-hole runs as the DNS server and works for all of my other systems.
I played around with fixing the iptables configuration, but even switching it off completely and opening everything up did not help.
The solution I found, without fully understanding the root cause, was to move the DNS to another server. I installed dnsmasq on a second system with IP 192.168.177.2 that does nothing other than forward all DNS queries back to my Pi-hole server on 192.168.177.1.
After starting Docker on 192.168.177.1 again with DNS configured to use 192.168.177.2, everything was working again.
With this in one terminal:
dockerd --debug --dns 192.168.177.2
and the command from above in another terminal, it worked again:
> docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
search mydomain.local
nameserver 127.0.0.11
options ndots:0
PING www.google.com (172.217.23.4): 56 data bytes
64 bytes from 172.217.23.4: seq=0 ttl=118 time=8.201 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 8.201/8.201/8.201 ms
So moving the DNS server to another host and adding "dns": ["192.168.177.2"] to my /etc/docker/daemon.json fixed it for me.
Maybe someone else can explain the root cause of the problem with running the DNS server on the same host as Docker.
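For reference, the /etc/docker/daemon.json change mentioned above looks roughly like this (the IP address is specific to my network, and the daemon needs a restart afterwards):
{
  "dns": ["192.168.177.2"]
}
systemctl restart docker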
First, make sure your container is connected to a custom bridge network. I suppose that by default, in a custom network, DNS requests inside the container are sent to 127.0.0.11#53 and forwarded to the DNS server of the host machine.
Second, check iptables -L to see if there are Docker-related rules. If there are not, that's probably because iptables was restarted/reset. You'll need to restart the Docker daemon to re-add the rules and make DNS request forwarding work.
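A rough sketch of that check and the recovery (assuming a systemd-based host):
iptables -t nat -L DOCKER -n
systemctl restart docker
If the first command reports that the DOCKER chain does not exist, the Docker rules have been flushed and the restart should recreate them.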
I had the same problem; in my case it was the host machine's hostname. I checked the hostnamectl result and it was OK, but the problem was solved with a simple reboot. Before the reboot, the result of cat /etc/hosts was like this:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 localhost HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 localhost HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
and after the reboot, I got this result:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 hostnameIHaveSetuped HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 hostnameIHaveSetuped HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6

Centos 7 minimal install can't talk to internet

Newbie trying to install/set up CentOS 7. I can ping other machines in the domain, but can't ping the gateway, google.com, etc. I get "destination host unreachable" for the gateway and "unknown host google.com" when pinging google.com.
Please advise.
/etc/sysconfig/network-scripts:
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp4s0
UUID=c39e3407-a566-4586-8fb9-fd4e3bfc4617
DEVICE=enp4s0
ONBOOT=yes
IPADDR="192.168.192.150"
GATEWAY="208.67.254.41"
DNS1="8.8.8.8"
DNS2="8.8.4.4"
/etc/resolv.conf
# Generated by NetworkManager
nameserver 8.8.8.8
nameserver 8.8.4.4
/etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=centos7
GATEWAY=208.67.254.41
Since it says "unknown host google.com", the machine is not able to reach the internet DNS server (8.8.8.8) to resolve Google's IP, and when you ping the gateway you get "destination host unreachable".
For one machine to connect to another, both should be on the same LAN; if the target is not on the LAN, there should be a machine within the LAN that acts as a gateway. In your case you have pointed the gateway at 208.67.254.41, which is obviously not on your LAN, so 208.67.254.41 must be reachable through some machine on the LAN. To set that up, use the route command,
which adds a routing entry to the machine's routing table:
route add -host <target IP> gw <LAN gateway IP> dev <interface>
In your case the command goes like:
route add -host 208.67.254.41 gw <IP of the LAN machine that can reach it> dev <interface>
e.g.: route add -host 192.168.12.45 gw 192.168.12.1 dev eth0
Comment out the IPv6 entries if IPv6 is not used.
Make sure IP forwarding is enabled on the gateway machine in /etc/sysctl.conf.
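A minimal sketch of enabling that on the gateway machine:
sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p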
Have you disabled Network Manager?
Command line:
service NetworkManager status
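On CentOS 7 (systemd) the equivalent commands would be:
systemctl status NetworkManager
systemctl stop NetworkManager
systemctl disable NetworkManager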

Erlang - Nodes don't recognize

I'm trying to use distributed programming in Erlang.
But I have a problem: I can't get two Erlang nodes to communicate.
I tried to set the same atom as the magic cookie on both, but it didn't work.
I tried the command net:ping(node), but the response was pang (it didn't recognize the other node), and I used nodes() to see if my first node sees the second one, but that didn't work either.
The first and second nodes are CentOS VMs in VMware, using a bridged connection in the network adapter.
I ran the ping command outside Erlang between the VMs and they recognize each other.
I start the first node, and the second node opens the process, but it can't find the pong node.
(pong@localhost)8> tut17:start_pong().
true
(ping@localhost)5> c(tut17).
{ok,tut17}
(ping@localhost)6> tut17:start_ping(pong@localhost).
<0.55.0>
Thank you!
A similar question here.
The distribution is provided by a daemon called Erlang Port Mapper Daemon. By default it listens on port 4369 so you need to make sure that that port is opened between the nodes. Additionally, each started Erlang VM opens an additional port to communicate with other VMs. You can see those ports with epmd -names:
g@someserv1:~ % epmd -names
epmd: up and running on port 4369 with data:
name hbd at port 22200
You can check if the port is opened by doing telnet to it, e.g.:
g@someserv1:~ % telnet 127.0.0.1 22200
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
^]
Connection closed by foreign host.
You can change the port to the one you want to check, e.g. 4369, and also the IP to the desired IP. Doing ping is not enough because it uses its own ICMP protocol, which is different from the TCP used by the Erlang distribution to communicate; e.g. ICMP may be allowed but TCP may be blocked.
Edit:
Please follow this guide Distributed Erlang to start an Erlang VM in distributed mode. Then you can use net_adm:ping/1 to connect to it from another node, e.g.:
(hbd@someserv1.somehost.com)17> net_adm:ping('hbd@someserv2.somehost.com').
pong
Only then will epmd -names show the started Erlang VM in the list.
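As a minimal sketch (host names and the cookie value are placeholders), start each VM in distributed mode with the same cookie and then ping from one to the other:
erl -name hbd@someserv1.somehost.com -setcookie mycookie
erl -name hbd2@someserv2.somehost.com -setcookie mycookie
(hbd@someserv1.somehost.com)1> net_adm:ping('hbd2@someserv2.somehost.com').
pong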
Edit2:
Assume that there are two hosts, A and B. Each one runs one Erlang VM. epmd -names run on each host shows, for example:
Host A:
epmd: up and running on port 4369 with data:
name servA at port 22200
Host B:
epmd: up and running on port 4369 with data:
name servB at port 22300
You need to be able to do:
On Host A:
telnet HostB 4369
telnet HostB 22300
On Host B:
telnet HostA 4369
telnet HostA 22200
where HostA and HostB are those hosts' IP addresses (e.g. HostA is the IP of Host A, HostB is the IP of Host B).
If the telnet works correctly then you should be able to do net_adm:ping/1 from one host to the other, e.g. on Host A you would ping the name of Host B. The name is what the command node(). returns.
You need to make sure you have a node name for your nodes, or they won't be available to connect with. E.g.:
erl -sname somenode@node1
If you're using separate hosts, then you need to make sure that the node names are resolvable to IP addresses somehow. An easy way to do this is using /etc/hosts.
# Append a similar line to the /etc/hosts file
10.10.10.10 node1
For more helpful answers, you should post what you see in your terminal when you try this.
EDIT
It looks like your shell is automatically picking "localhost" as the host part of the node name. You can't send messages to another host with the address "localhost". When specifying the name on the shell, try using the @ syntax to specify the host as well:
# On host 1:
erl -sname ping@host1
# On host 2
erl -sname pong@host2
Then edit the hosts file so host1 and host2 resolve to the right IPs.
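For example (the IP addresses are placeholders):
# Append to /etc/hosts on both machines
192.168.1.101 host1
192.168.1.102 host2
Then, assuming the cookies match, from the ping node:
(ping@host1)1> net_adm:ping('pong@host2').
pong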

glusterfs geo-replication - server with two interfaces - private IP advertised

I have been trying to set up geo-replication with GlusterFS servers. Everything worked as expected in my test environment and in my staging environment, but when I tried production I got stuck.
Let's say I have:
the GlusterFS server (master) on public IP 1.1.1.1
the GlusterFS slave on public IP 2.2.2.2, but this IP is on interface eth1
eth0 on the GlusterFS slave server is 192.168.0.1.
So when I run the command on 1.1.1.1 (firewall and SSH keys are set properly):
gluster volume geo-replication vol0 2.2.2.2::vol0 create push-pem
I get an error.
Unable to fetch slave volume details. Please check the slave cluster and slave volume.
geo-replication command failed
The error itself is not that important in this case; the problem is the slave IP address:
2015-03-16T11:41:08.101229+00:00 xxx kernel: TCP LOGDROP: IN= OUT=eth0 SRC=1.1.1.1 DST=192.168.0.1 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=24243 DF PROTO=TCP SPT=1015 DPT=24007 WINDOW=14600 RES=0x00 SYN URGP=0
As you can see in the firewall drop log above, port 24007 of the slave gluster daemon is advertised on the private IP of interface eth0 on the slave server, when it should be the public IP on eth1. So the master cannot connect and times out.
Is there a way to force the gluster server to advertise interface eth1, or to bind to it only?
I use CFEngine and Ansible to push configuration, so binding to an interface could be a better solution than binding to an IP, but whatever solution will do.
Thank you in advance.
I've encountered this issue but in a different context.
I was trying to geo-replicate two nodes which were both behind a NAT (AWS instances in different regions).
When the master connects to the slave via the public IP to check for volume compatibility/size and other details, it retrieves the hostname of the slave, which usually resolves to something that only has meaning in that remote region.
Then it uses that hostname to dial back to the slave when later setting up the session, which fails, as that hostname resolves to a private IP in a different region.
My workaround for the issue was to use hostnames when creating the volumes, probing for peers, and establishing geo-replication, and then to add an /etc/hosts entry on the master mapping the slave's hostname (which would otherwise resolve to its private IP) to its public IP.
This gets you to the point where you establish a session, but I haven't had any luck actually getting it to sync, as it uses the wrong IP somewhere along the way again.
Edit:
I've actually managed to get it running by adding /etc/hosts hacks on both sides.
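Concretely, the hack is just a static hostname-to-public-IP mapping on each side (the hostnames are placeholders; 1.1.1.1 and 2.2.2.2 are the public IPs from the question):
# /etc/hosts on the master (1.1.1.1)
2.2.2.2 slave-host
# /etc/hosts on the slave (2.2.2.2)
1.1.1.1 master-host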
GlusterFS has no notion of the network layer. Check your routes. If the next-hop for your geo-replication slave is on eth1, then gluster will open a port on that interface for the slave IP address.
Also make sure your firewall is configured to forward geo-replication traffic on this port.
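To see which interface and source address the kernel will use to reach the slave, you can ask it directly (2.2.2.2 being the slave's public IP from the question):
ip route get 2.2.2.2
If the output shows eth0 or the private source address, adjust the routing so the slave is reached via eth1.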
