TPROXY compatibility with Docker - docker

I'm trying to understand how TPROXY works in an effort to build a transparent proxy for Docker containers.
After lots of research I managed to create a network namespace, inject an veth interface into it and add TPROXY rules. The following script worked on a clean Ubuntu 18.04.3:
ip netns add ns0
ip link add br1 type bridge
ip link add veth0 type veth peer name veth1
ip link set veth0 master br1
ip link set veth1 netns ns0
ip addr add 192.168.3.1/24 dev br1
ip link set br1 up
ip link set veth0 up
ip netns exec ns0 ip addr add 192.168.3.2/24 dev veth1
ip netns exec ns0 ip link set veth1 up
ip netns exec ns0 ip route add default via 192.168.3.1
iptables -t mangle -A PREROUTING -i br1 -p tcp -j TPROXY --on-ip 127.0.0.1 --on-port 1234 --tproxy-mark 0x1/0x1
ip rule add fwmark 0x1 tab 30
ip route add local default dev lo tab 30
After that I launched a toy Python server from Cloudflare blog:
import socket
IP_TRANSPARENT = 19
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
s.bind(('127.0.0.1', 1234))
s.listen(32)
print("[+] Bound to tcp://127.0.0.1:1234")
while True:
c, (r_ip, r_port) = s.accept()
l_ip, l_port = c.getsockname()
print("[ ] Connection from tcp://%s:%d to tcp://%s:%d" % (r_ip, r_port, l_ip, l_port))
c.send(b"hello world\n")
c.close()
And finally by running ip netns exec ns0 curl 1.2.4.8 I was able to observe a connection from 192.168.3.2 to 1.2.4.8 and receive the "hello world" message.
The problem is that it seems to have compatibility issues with Docker. All worked well in a clean environment, but once I start Docker things start to go wrong. It seems like the TPROXY rule was no longer working. Running ip netns exec ns0 curl 192.168.3.1 gave "Connection reset" and running ip netns exec ns0 curl 1.2.4.8 timed out (both should have produced the "hello world" message). I tried restoring all iptables rules, deleting ip routes and rules generated by Docker and shutting down Docker, but none worked even if I didn't configure any networks or containers.
What is happening behind the scenes and how can I get TPROXY working normally?

I traced all processes created by Docker using strace -f dockerd, and looked for lines containing exec. Most commands are iptables commands, which I have already excluded, and the lines with modprobe looked interesting. I loaded these modules one by one and figured out that the module causing the trouble is br_netfilter.
The module enables filtering of bridged packets through iptables, ip6tables and arptables. The iptables part can be disabled by executing echo "0" | sudo tee /proc/sys/net/bridge/bridge-nf-call-iptables. After executing the command, the script worked again without impacting Docker containers.
I am still confused though. I haven't understood the consequences of such a setting. I enabled packet tracing, but it seems that the packets matched the exact same set of rules before and after enabling bridge-nf-call-iptables, but in the former case the first TCP SYN packet got delivered to the Python server, in the latter case the packet got dropped for unknown reasons.

Try running docker with -p 1234
"By default, when you create a container, it does not publish any of its ports to the outside world. To make a port available to services outside of Docker, or to Docker containers which are not connected to the container’s network, use the --publish or -p flag."
https://docs.docker.com/config/containers/container-networking/

Related

Dockerized Zabbix: Server Can't Connect to the Agents by IP

Problem:
I'm trying to config a fully containerized Zabbix version 6.0 monitoring system on Ubuntu 20.04 LTS using the Zabbix's Docker-Compose repo found HERE.
The command I used to raise the Zabbix server and also a Zabbix Agent is:
docker-compose -f docker-compose_v3_ubuntu_pgsql_latest.yaml --profile all up -d
Although the Agent rises in a broken state and shows a "red" status, when I change its' IP address FROM 127.0.0.1 TO 172.16.239.6 (default IP Docker-Compose assigns to it) the Zabbix Server can now successfully connect and monitoring is established. HOWEVER: the Zabbix Server cannot connect to any other Dockerized Zabbix Agents on REMOTE hosts which are raised with the docker run command:
docker run --add-host=zabbix-server:172.16.238.3 -p 10050:10050 -d --privileged --name DockerHost3-zabbix-agent -e ZBX_SERVER_HOST="zabbix-server" -e ZBX_PASSIVE_ALLOW="true" zabbix/zabbix-agent:ubuntu-6.0-latest
NOTE: I looked at other Stack groups to post this question, but Stackoverflow appeared to be the go-to group for these Docker/Zabbix issues having over 30 such questions.
Troubleshooting:
Comparative Analysis:
Agent Configuration:
Comparative analysis of the working ("green") Agent on the same host as the Zabbix Server with Agents on different hosts showing "red" statuses (not contactable by the Zabbix server) using the following command show the configurations have parity.
docker exec -u root -it (ID of agent container returned from "docker ps") bash
And then execute:
grep -Ev ^'(#|$)' /etc/zabbix/zabbix_agentd.conf
Ports:
The correct ports were showing as open on the "red" Agents as were open on the "green" agent running on the same host as the Zabbix Server from the output of the command:
ss -luntu
NOTE: This command was issued from the HOST, not the Docker container for the Agent.
Firewalling:
Review of the iptables rules from the HOST (not container) using the following command didn't reveal anything of concern:
iptables -nvx -L --line-numbers
But to exclude Firewalling, I nonetheless allowed everything in iptables in the FORWARD table on both the Zabbix server and an Agent in an "red" status used for testing.
I also allowed everything on the MikroTik GW router connecting the Zabbix Server to the different physical hosts running the Zabbix Agents.
Routing:
The Zabbix server can ping remote Agent interfaces proving there's a route to the Agents.
AppArmor:
I also stopped AppArmor to exclude it as being causal:
sudo systemctl stop apparmor
sudo systemctl status apparmor
Summary:
So everything is wide-open, the Zabbix Server can route to the Agents and the config of the "red" agents have parity with the config of the "green" Agent living on the same host at the Zabbix Server itself.
I've setup non-containerized Zabbix installation in production environments successfully so I'm otherwise familiar with Zabbix.
Why can't the containerized Zabbix Server connect to the containerized Zabbix Agents on different hosts?
Short Answer:
There was NOTHING wrong with the Zabbix config; this was a Docker-induced problem.
docker logs <hostname of Zabbix server> revealed that there appeared to be NAT'ing happening on the Zabbix SERVER, and indeed there was.
Docker was modifying iptables NAT table on the host running the Zabbix Server container causing the source address of the Zabbix Server to present as the IP of the physical host itself, not the Docker-Compose assigned IP address of 172.16.238.3.
Thus, the agent was not expecting this address and refused the connection. My experience of Dockerized apps is that they are mostly good at modifying IP tables to create the correct connectivity, but not in this particular case ;-).
I now reviewed the NAT table by executing the following command on the HOST (not container):
iptables -t nat -nvx -L --line-numbers
This revealed that Docker was being, erm "helpful" and NAT'ing the Zabbix server's traffic
I deleted the offending rules by their rule number:
iptables -t nat -D <chain> <rule #>
After which the Zabbix server's IP address was now presented correctly to the Agents who now accepted the connections and their statuses turned "green".
The problem is reproducible if you execute:
docker-compose -f docker-compose -f docker-compose_v3_ubuntu_pgsql_latest.yaml down
And then run the up command raising the containers again you'll see the offending iptables rule it restored to the NAT table of the host running the Zabbix Server's container breaking the connectivity with Agents.
Longer Answer:
Below are the steps required to identify and resolve the problem of the Zabbix server NAT'ing its' traffic out of the host's IP:
Identify If the HOST of the Zabbix Server container is NAT'ing:
We need to see how the IP of the Zabbix Server's container is presenting to the Agents, so we have to get the container ID for a Zabbix AGENT to review its' logs:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b2fcf38d601f zabbix/zabbix-agent:ubuntu-6.0-latest "/usr/bin/tini -- /u…" 5 hours ago Up 5 hours 0.0.0.0:10050->10050/tcp, :::10050->10050/tcp DockerHost3-zabbix-agent
Next, supply container ID for the Agent to the docker logs command:
docker logs b2fcf38d601f
Then Review the rejected IP address in the log output to determine if it's NOT the Zabbix Server's IP:
81:20220328:000320.589 failed to accept an incoming connection: connection from "NAT'ed IP" rejected, allowed hosts: "zabbix-server"
The fact that you can see this error proves that there is no routing or connectivity issues: the connection is going through, it's just being rejected by the application- NOT the firewall.
If NAT'ing proved, continue to next step
On Zabbix SERVER's Host:
The remediation happens on the Zabbix Server's Host itself, not the Agents. Which is good because we can fix the problem in one place versus many.
Execute below command on the Host running the Zabbix Server's container:
iptables -t nat -nvx -L --line-numbers
Output of command:
Chain POSTROUTING (policy ACCEPT 88551 packets, 6025269 bytes)
num pkts bytes target prot opt in out source destination
1 0 0 MASQUERADE all -- * !br-abeaa5aad213 192.168.24.128/28 0.0.0.0/0
2 73786 4427208 MASQUERADE all -- * !br-05094e8a67c0 172.16.238.0/24 0.0.0.0/0
Chain DOCKER (2 references)
num pkts bytes target prot opt in out source destination
1 0 0 RETURN all -- br-abeaa5aad213 * 0.0.0.0/0 0.0.0.0/0
2 95 5700 RETURN all -- br-05094e8a67c0 * 0.0.0.0/0 0.0.0.0/0
We can see the counters are incrementing for the "POSTROUTING" and "DOCKER" chains- both rule #2 in their respective chains.
These rules are clearly matching and have effect.
Delete the offending rules on the HOST of the Zabbix server container which is NATing its' traffic to the Agents:
sudo iptables -t nat -D POSTROUTING 2
sudo iptables -t nat -D DOCKER 2
Wait a few moments and the Agents should now go "green"- assuming there are no other configuration or firewalling issues. If the Agents remain "red" after applying the fix then please work through the troubleshooting steps I documented in the Question section.
Conclusion:
I've tested and restarting the Zabbix-server container does not recreate the deleted rules. But again, please note that a docker-compose down followed by a docker-compose up WILL recreate the deleted rules and break Agent connectivity.
Hope this saves other folks wasted cycles. I'm a both a Linux and network engineer and this hurt my head, so this would be near impossible to resolve if you're not a dab hand with networking.

Inside a container, how to resolve DNS on the host, on a specific port

I've an instance running a consul agent & docker. Consul agent can be used to resolve DNS queries on 0.0.0.0:8600. I'ld like to use this from inside a container.
A manual test works, running dig #172.17.0.1 -p 8600 rabbitmq.service.consul inside a container resolve properly.
A first solution is to run --network-mode host. It works. I'll do this until better. But I don't like it, security-wise.
Another idea, use docker's --dns and associated options. Even if I can script grabbing the IP, I can't get how to specify port=8600. Maybe in --dns-opts, but how ?
Along this line, writing the container's resolv.conf could do. But again, how to specify the port, I saw no hints in man resolv.conf, I believe it's not possible.
Last, I can set up a dnsmasq inside the container or in a sidecar container, along the line of this Q/A. But it's a bit heavy.
Anyone can help on this one ?
You can achieve this with the following configuration.
Configure each Consul container with a static IP address.
Use Docker's --dns option to provide these IPs as resolvers to other containers.
Create an iptables rule on the host system which redirects traffic destined to port 53 of the Consul server to port 8600.
For example:
$ sudo iptables --table nat --append PREROUTING --in-interface docker0 --proto udp \
--dst 1920.2.4 --dport 53 --jump DNAT --to-destination 192.0.2.4:8600
# Repeat for TCP
$ sudo iptables --table nat --append PREROUTING --in-interface docker0 --proto tcp \
--dst 192.0.2.4 --dport 53 --jump DNAT --to-destination 192.0.2.4:8600

Docker container hits iptables to proxy

I have two VPSs, first machine (proxy from now) is for proxy and second machine (dock from now) is docker host. I want to redirect all traffic generated inside a docker container itself over proxy, to not exposure dock machines public IP.
As connection between VPSs is over internet, no local connection, created a tunnel between them by ip tunnel as follows:
On proxy:
ip tunnel add tun10 mode ipip remote x.x.x.x local y.y.y.y dev eth0
ip addr add 192.168.10.1/24 peer 192.168.10.2 dev tun10
ip link set dev tun10 mtu 1492
ip link set dev tun10 up
On dock:
ip tunnel add tun10 mode ipip remote y.y.y.y local x.x.x.x dev eth0
ip addr add 192.168.10.2/24 peer 192.168.10.1 dev tun10
ip link set dev tun10 mtu 1492
ip link set dev tun10 up
PS: Do not know if ip tunnel can be used for production, it is another question, anyway planning to use libreswan or openvpn as a tunnel between VPSs.
After, added SNAT rules to iptables on both VPSs and some routing rules as follows:
On proxy:
iptables -t nat -A POSTROUTING -s 192.168.10.2/32 -j SNAT --to-source y.y.y.y
On dock:
iptables -t nat -A POSTROUTING -s 172.27.10.0/24 -j SNAT --to-source 192.168.10.2
ip route add default via 192.168.10.1 dev tun10 table rt2
ip rule add from 192.168.10.2 table rt2
And last but not least created a docker network with one test container attached to it as follows:
docker network create --attachable --opt com.docker.network.bridge.name=br-test --opt com.docker.network.bridge.enable_ip_masquerade=false --subnet=172.27.10.0/24 testnet
docker run --network testnet alpine:latest /bin/sh
Unfortunately all these ended with no success. So the question is how to debug that? Is it correct way? How would you do the redirection over proxy?
Some words about theory: Traffic coming from 172.27.10.0/24 subnet hits iptables SNAT rule, source IP changes to 192.168.10.2. By routing rule it routes over tun10 device, it is the tunnel. And hits another iptables SNAT rule that changes IP to y.y.y.y and finally goes to destination.

Can't ping docker IPv6 container

I ran docker daemon for using it with global IPv6 for containers:
docker daemon --ipv6 --fixed-cidr-v6="xxxx:xxxx:xxxx:xxxx::/64"
After it I ran docker container:
docker run -d --name my-container some-image
It successfully got Global IPv6 address( I checked by docker inspect my-container). But I can't to ping my container by this ip:
Destination unreachable: Address unreachable
But I can successfully ping docker0 bridge by it's IPv6 address.
Output of route -n -6 contains next lines:
Destination Next Hop Flag Met Ref Use If
xxxx:xxxx:xxxx:xxxx::/64 :: U 256 0 0 docker0
xxxx:xxxx:xxxx:xxxx::/64 :: U 1024 0 0 docker0
fe80::/64 :: U 256 0 0 docker0
docker0 interface has global IPv6 address:
inet6 addr: xxxx:xxxx:xxxx:xxxx::1/64 Scope:Global
xxxx:xxxx:xxxx:xxxx:: everywhere is the same, and it's global IPv6 address of my eth0 interface
Does docker required something additional configs for accessing my containers via IPv6?
Assuming IPv6 in your guest OS is properly configured probably you are pinging the container not from host OS, but outside and network discovery protocol is not configured. Other hosts does not know if your container is behind of your host. I'm doing this after start of container with IPv6 (in host OS) (in ExecStartPost clauses of Systemd .service file)
/usr/sbin/sysctl net.ipv6.conf.interface_name.proxy_ndp=1
/usr/bin/ip -6 neigh add proxy $(docker inspect --format {{.NetworkSettings.GlobalIPv6Address}} container_name) dev interface_name"
Beware of IPv6: docker developers say in replies to bug reports they do not have enough time to make IPv6 production-ready in version 1.10 and say nothing about 1.11.
Mb you use wrong ping command. For ipv6 is ping6.
$ ping6 2607:f0d0:1002:51::4

Docker 1.9.0 "bridge" versus a custom bridge network results in difference in hosts file and SSH_CLIENT env variable

Let me first explain what I'm trying to do, as there may be multiple ways to solve this. I have two containers in docker 1.9.0:
node001 (172.17.0.2) (sudo docker run --net=<<bridge or test>> --name=node001 -h node001 --privileged -t -i -v /sys/fs/cgroup:/sys/fs/cgroup <<image>>)
node002 (172.17.0.3) (,,)
When I launch them with --net=bridge I get the correct value for SSH_CLIENT when I ssh from one to the other:
[root#node001 ~]# ssh root#172.17.0.3
root#172.17.0.3's password:
[root#node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.17.0.3 56194 22
[root#node001 ~]# ping -c 1 node002
ping: unknown host node002
In docker 1.8.3 I could also use the hostnames I supply when I start them, in 1.8.3 that last ping statement works!
In docker 1.9.0 I don't see anything being added in /etc/hosts, and the ping statement fails. This is a problem for me. So I tried creating a custom network...
docker network create --driver bridge test
When I launch the two containers with --net=test I get a different value for SSH_CLIENT:
[root#node001 ~]# ssh root#172.18.0.3
root#172.18.0.3's password:
[root#node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.18.0.1 57388 22
[root#node001 ~]# ping -c 1 node002
PING node002 (172.18.0.3) 56(84) bytes of data.
64 bytes from node002 (172.18.0.3): icmp_seq=1 ttl=64 time=0.041 ms
Note that the ip address is not node001's, it seems to represent the docker host itself. The hosts file is correct though, containing:
172.18.0.2 node001
172.18.0.2 node001.test
172.18.0.3 node002
172.18.0.3 node002.test
My current workaround is using docker 1.8.3 with the default bridge network, but I want this to work with future docker versions.
Is there any way I can customize the test network to make it behave similarly to the default bridge network?
Alternatively:
Maybe make the default bridge network write out the /etc/hosts file in docker 1.9.0?
Any help or pointers towards different solutions will be greatly appreciated..
Edit: 21-01-2016
Apparently the problem is fixed in 1.9.1, with bridge in docker 1.8 and with a custom (--net=test) in 1.9.1, now the behaviour is correct:
[root#node001 tmp]# ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.5
[root#node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.18.0.3 52162 22
Retried in 1.9.0 to see if I wasn't crazy, and yeah there the problem occurs:
[root#node001 tmp]# ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
[root#node002 ~]# env|grep SSH_CLI
SSH_CLIENT=172.18.0.1 53734 22
So after remove/stop/start-ing the instances the IP-addresses were not exactly the same, but it can be easily seen that the ssh_client source ip is not correct in the last code block. Thanks #sourcejedi for making me re-check.
Firstly, I don't think it's possible to change any settings on the default network, i.e. to write /etc/hosts. You apparently can't delete the default networks, so you can't recreate them with different options.
Secondly
Docker is careful that its host-wide iptables rules fully expose containers to each other’s raw IP addresses, so connections from one container to another should always appear to be originating from the first container’s own IP address. docs.docker.com
I tried reproducing your issue with the random containers I've been playing with. Running wireshark on the bridge interface for the network, I didn't see my ping packets. From this I conclude my containers are indeed talking directly to each other; the host was not doing routing and NAT.
You need to check the routes on your client container ip route. Do you have a route for 172.18.0.2/16? If you only have a default route, it could try to send everything through the docker host. And it might get confused and do masquerading as if it was talking with the outside world.
This might happen if you're running some network configuration in your privileged container. I don't know what's happening if you're just booting it with bash though.

Resources