I have created a service in docker using docker swarm and stack, but somehow my application running in a service replica is not able to connect to other replicas.
Here is more detail. I have three nodes: one manager and two workers. On the manager node I run a registry, which is the source of the image that I deploy to the workers using docker stack.
Here is what my docker stack configuration file looks like:
version: "3.2"
services:
compute:
image: openmpi-docker.registry:443/openmpi
ports:
- "2222:22"
deploy:
replicas: 4
restart_policy:
condition: none
placement:
constraints: [node.role != manager]
networks:
- openmpi-network
networks:
openmpi-network:
driver: overlay
attachable: true
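For completeness, the stack is deployed with something along these lines (the compose file name and the stack name openmpi are assumptions; the stack name is inferred from the network name openmpi_openmpi-network mentioned further down):
docker stack deploy -c docker-compose.yml openmpi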
As you can guess, the image contains an OpenMPI distribution. It starts sshd and opens port 22.
I can connect to one of the services using the following command:
ssh -p 2222 -i ./user-ssh/docker_id <worker node>
Then I try to run a simple test. I figure out the IP addresses of the other replicas and try to connect to them over ssh:
for i in $(seq 3 6) ; do ssh 10.0.1.$i hostname ; done
This works as expected: I see 4 different hostnames.
So my next test involves mpi.
mpirun -H 10.0.1.3,10.0.1.4,10.0.1.5,10.0.1.6 hostname
This command is supposed to do exactly the same thing, except via mpirun rather than plain ssh. But somehow it fails.
So I want to see where exactly mpirun hangs using strace, but that is impossible without the SYS_PTRACE capability, and there is seemingly no way to set it if you use swarm. So I try to create another container, not in swarm, which shares the same overlay network:
docker run --network openmpi_openmpi-network --rm -it openmpi-docker.registry:443/openmpi bash
Here openmpi_openmpi-network is the name of the network created by "docker stack". Originally I planned to run the container with "--privileged", but it turned out I do not need that.
Then I switch from root to a normal user and run the very same mpi command with the very same IP addresses, and the command works. So I do not run into the same problem.
In other cases the solution to a similar problem is just to turn off the firewall, but I'm pretty sure there is none inside my containers: I built the image myself from the debian:9 image, and the only server I installed is sshd. Besides, that would not explain the difference between starting the container in swarm and starting it with docker run.
This makes me wonder. First, how can I debug this kind of problem? Second, with regard to networking, what is so different between starting the service in swarm and starting the container manually?
I would be grateful if you could help me answer these two questions.
Update
I tried to run mpirun with "--mca oob_base_verbose 100" and got the following result. The logs are relatively long, but the key difference is in the behaviour of the node that initiates the communication.
In a working run, at some point mpirun on the initiator node prints the following:
...
[3365816c2c1e:00033] [[39982,0],2] oob:tcp:init adding 10.0.1.4 to our list of V4 connections
[3365816c2c1e:00033] [[39982,0],2] TCP STARTUP
[3365816c2c1e:00033] [[39982,0],2] attempting to bind to IPv4 port 0
[3365816c2c1e:00033] [[39982,0],2] assigned IPv4 port 57161
[3365816c2c1e:00033] mca:oob:select: Adding component to end
[3365816c2c1e:00033] mca:oob:select: checking available component usock
[3365816c2c1e:00033] mca:oob:select: Querying component [usock]
[3365816c2c1e:00033] oob:usock: component_available called
[3365816c2c1e:00033] [[39982,0],2] USOCK STARTUP
[3365816c2c1e:00033] SUNPATH: /tmp/openmpi-sessions-1000#3365816c2c1e_0/39982/0/usock
[3365816c2c1e:00033] mca:oob:select: Inserting component
[3365816c2c1e:00033] mca:oob:select: Found 2 active transports
<Log output from other nodes>
[3365816c2c1e:00033] [[39982,0],2]: set_addr to uri 2620260352.0;usock;tcp://10.0.1.7,172.18.0.5:48695
[3365816c2c1e:00033] [[39982,0],2]:set_addr checking if peer [[39982,0],0] is reachable via component usock
[3365816c2c1e:00033] [[39982,0],2]: peer [[39982,0],0] is NOT reachable via component usock
[3365816c2c1e:00033] [[39982,0],2]:set_addr checking if peer [[39982,0],0] is reachable via component tcp
[3365816c2c1e:00033] [[39982,0],2] oob:tcp: ignoring address usock
[3365816c2c1e:00033] [[39982,0],2] oob:tcp: working peer [[39982,0],0] address tcp://10.0.1.7,172.18.0.5:48695
[3365816c2c1e:00033] [[39982,0],2] PASSING ADDR 10.0.1.7 TO MODULE
[3365816c2c1e:00033] [[39982,0],2]:tcp set addr for peer [[39982,0],0]
[3365816c2c1e:00033] [[39982,0],2] PASSING ADDR 172.18.0.5 TO MODULE
[3365816c2c1e:00033] [[39982,0],2]:tcp set addr for peer [[39982,0],0]
...
But in the non-working case the initiator node stops at the "Found 2 active transports" message and never prints anything else.
Update 2
I also figured out that the hostname test works if I add "--mca oob_tcp_if_include 10.0.1.0/24", so that the full command is:
mpirun -H 10.0.1.3,10.0.1.4,10.0.1.5,10.0.1.6 --mca oob_tcp_if_include 10.0.1.0/24 hostname
Still, I do not really understand why, or how to avoid having to specify the subnet.
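If the extra flag turns out to be necessary, one way to avoid typing it on every run is to set it as a default MCA parameter, either through an environment variable or an Open MPI parameter file baked into the image. A minimal sketch, assuming a stock Open MPI installation (the btl_tcp_if_include line is an extra assumption, in case the MPI traffic itself needs the same restriction):
# as environment variables, e.g. set in the image or the user's shell profile
export OMPI_MCA_oob_tcp_if_include=10.0.1.0/24
export OMPI_MCA_btl_tcp_if_include=10.0.1.0/24
# or as a per-user MCA parameter file
mkdir -p "$HOME/.openmpi"
echo "oob_tcp_if_include = 10.0.1.0/24" >> "$HOME/.openmpi/mca-params.conf"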
Related
Suddenly when I deployed some new containers with docker-compose the internal hostname resolution didn't work.
When I tried to ping one container from the other using the service name from the docker-compose.yaml file I got ping: bad address 'myhostname'
I checked that the /etc/resolv.conf was correct and it was using 127.0.0.11
When I tried to manually resolve my hostname with either nslookup myhostname. or nslookup myhostname.docker.internal I got an error:
nslookup: write to '127.0.0.11': Connection refused
;; connection timed out; no servers could be reached
Okay, so the issue is that the Docker DNS server has stopped working. All already-started containers still function, but any new ones started have this issue.
I am running Docker version 19.03.6-ce, build 369ce74
I could of course just restart docker to see if it solves it, but I am also keen on understanding why this issue happened and how to avoid it in the future.
I have a lot of containers started on the server and a total of 25 docker networks currently.
Any ideas on what can be done to troubleshoot? Any known issues that could explain this?
The docker-compose.yaml file I use has worked before, and no changes have been made to it.
Edit: No DNS names at all can be resolved. 127.0.0.11 refuses all connections. I can ping any external IP addresses, as well as the IP of other containers on the same docker network. It is only the 127.0.0.11 DNS server that is not working. 127.0.0.11 still replies to ping from within the container.
Make sure you're using a custom bridge network, NOT the default one. As per the Docker docs (https://docs.docker.com/network/bridge/), the default bridge network does not allow automatic DNS resolution:
Containers on the default bridge network can only access each other by IP addresses, unless you use the --link option, which is considered legacy. On a user-defined bridge network, containers can resolve each other by name or alias.
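A quick way to see the difference is to start two throwaway containers on a user-defined bridge and resolve one from the other by name (the names here are arbitrary; nginx and alpine are just convenient images):
docker network create testnet
docker run -d --name web --network testnet nginx
docker run --rm --network testnet alpine ping -c 1 web    # resolved by the embedded DNS at 127.0.0.11
docker run --rm alpine ping -c 1 web                      # default bridge: fails with "bad address 'web'"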
I have the same problem. I am using the pihole/pihole docker container as the sole dns server on my network. Docker containers on the same host as the pihole server could not resolve domain names.
I resolved the issue based on "hmario"'s response to this forum post.
In brief, modify the pihole docker-compose.yml from:
---
version: '3.7'

services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 53:53/udp
      - 53:53/tcp
    volumes: [...]
to
---
version: '3.7'

services:
  unbound:
    image: mvance/unbound-rpi:1.13.0
    hostname: unbound
    restart: unless-stopped
    ports:
      - 192.168.1.30:53:53/udp
      - 192.168.1.30:53:53/tcp
    volumes: [...]
Here 192.168.1.30 is the IP address of the Docker host.
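To double-check that the container is no longer grabbing port 53 on every interface of the host (including the Docker bridges), you can list what is bound to port 53, for example:
sudo ss -lnptu | grep ':53 '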
I'm having exactly the same problem. According to the comment here, I could reproduce the problem without docker-compose, using only docker:
docker network create alpine_net
docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
Stopping docker (systemctl stop docker) and running the daemon with debug output gives:
> dockerd --debug
[...]
[resolver] read from DNS server failed, read udp 172.19.0.2:40868->192.168.177.1:53: i/o timeout
[...]
where 192.168.177.1 is the local-network IP of the host that Docker runs on, which is also where Pi-hole runs as the DNS server, working fine for all of my other systems.
I played around with fixing the iptables configuration, but even switching it off completely and opening everything did not help.
The solution I found, without fully understanding the root cause, was to move the DNS to another server. I installed dnsmasq on a second system with IP 192.168.177.2 that does nothing but forward all DNS queries back to my Pi-hole server on 192.168.177.1.
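For reference, the forwarder only needs a couple of dnsmasq settings, roughly like this (the exact contents of my config are an assumption here):
# /etc/dnsmasq.conf on 192.168.177.2
no-resolv                    # do not take upstream servers from /etc/resolv.conf
server=192.168.177.1         # forward every query to the Pi-hole host
listen-address=192.168.177.2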
Starting Docker on 192.168.177.1 again with DNS configured to use 192.168.177.2, everything was working again:
with this in one terminal
dockerd --debug --dns 192.168.177.2
and the command from above in another, it worked again:
> docker run -it --network alpine_net alpine /bin/sh -c "cat /etc/resolv.conf; ping -c 4 www.google.com"
search mydomain.local
nameserver 127.0.0.11
options ndots:0
PING www.google.com (172.217.23.4): 56 data bytes
64 bytes from 172.217.23.4: seq=0 ttl=118 time=8.201 ms
--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 8.201/8.201/8.201 ms
So moving the DNS server to another host and adding "dns" : ["192.168.177.2"] to my /etc/docker/daemon.json fixed it for me.
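In other words, something like the following, followed by a daemon restart (note that this overwrites daemon.json, so merge the key in by hand if you already have other settings there):
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["192.168.177.2"]
}
EOF
sudo systemctl restart docker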
Maybe someone else can explain the root cause of the problem with running the DNS server on the same host as Docker.
First, make sure your container is connected to a custom bridge network. I suppose that by default, in a custom network, DNS requests inside the container are sent to 127.0.0.11#53 and forwarded to the DNS server of the host machine.
Second, check iptables -L to see whether there are Docker-related rules. If there are not, that is probably because iptables was restarted/reset. You will need to restart the Docker daemon to re-add the rules and make DNS request forwarding work.
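For example (exact chain names can differ between Docker versions):
sudo iptables -t nat -L DOCKER -n       # NAT rules for published ports
sudo iptables -t filter -L DOCKER -n    # forwarding rules for containers
sudo systemctl restart docker           # recreates the rules if they are gone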
I had the same problem; in my case it was caused by the host machine's hostname. I checked the hostnamectl result and it was OK, but the problem was solved by a simple reboot. Before the reboot, cat /etc/hosts gave this:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 localhost HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 localhost HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
and after the reboot, I got this result:
# The following lines are desirable for IPv4 capable hosts
127.0.0.1 hostnameIHaveSetuped HostnameSetupByISP
127.0.0.1 localhost.localdomain localhost
127.0.0.1 localhost4.localdomain4 localhost4
# The following lines are desirable for IPv6 capable hosts
::1 hostnameIHaveSetuped HostnameSetupByISP
::1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
I'm trying to understand how TPROXY works in an effort to build a transparent proxy for Docker containers.
After lots of research I managed to create a network namespace, inject a veth interface into it and add TPROXY rules. The following script worked on a clean Ubuntu 18.04.3:
ip netns add ns0
ip link add br1 type bridge
ip link add veth0 type veth peer name veth1
ip link set veth0 master br1
ip link set veth1 netns ns0
ip addr add 192.168.3.1/24 dev br1
ip link set br1 up
ip link set veth0 up
ip netns exec ns0 ip addr add 192.168.3.2/24 dev veth1
ip netns exec ns0 ip link set veth1 up
ip netns exec ns0 ip route add default via 192.168.3.1
iptables -t mangle -A PREROUTING -i br1 -p tcp -j TPROXY --on-ip 127.0.0.1 --on-port 1234 --tproxy-mark 0x1/0x1
ip rule add fwmark 0x1 tab 30
ip route add local default dev lo tab 30
After that I launched a toy Python server from Cloudflare blog:
import socket
IP_TRANSPARENT = 19
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
s.bind(('127.0.0.1', 1234))
s.listen(32)
print("[+] Bound to tcp://127.0.0.1:1234")
while True:
    c, (r_ip, r_port) = s.accept()
    l_ip, l_port = c.getsockname()
    print("[ ] Connection from tcp://%s:%d to tcp://%s:%d" % (r_ip, r_port, l_ip, l_port))
    c.send(b"hello world\n")
    c.close()
And finally by running ip netns exec ns0 curl 1.2.4.8 I was able to observe a connection from 192.168.3.2 to 1.2.4.8 and receive the "hello world" message.
The problem is that it seems to have compatibility issues with Docker. All worked well in a clean environment, but once I start Docker things start to go wrong. It seems like the TPROXY rule was no longer working. Running ip netns exec ns0 curl 192.168.3.1 gave "Connection reset" and running ip netns exec ns0 curl 1.2.4.8 timed out (both should have produced the "hello world" message). I tried restoring all iptables rules, deleting ip routes and rules generated by Docker and shutting down Docker, but none worked even if I didn't configure any networks or containers.
What is happening behind the scenes and how can I get TPROXY working normally?
I traced all processes created by Docker using strace -f dockerd, and looked for lines containing exec. Most commands are iptables commands, which I have already excluded, and the lines with modprobe looked interesting. I loaded these modules one by one and figured out that the module causing the trouble is br_netfilter.
The module enables filtering of bridged packets through iptables, ip6tables and arptables. The iptables part can be disabled by executing echo "0" | sudo tee /proc/sys/net/bridge/bridge-nf-call-iptables. After executing the command, the script worked again without impacting Docker containers.
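If you want that setting to survive a reboot, a sysctl drop-in is one option (a sketch; the file name is arbitrary, and note the key only exists while br_netfilter is loaded, so module load order can still matter):
echo 'net.bridge.bridge-nf-call-iptables = 0' | sudo tee /etc/sysctl.d/99-bridge-nf.conf
sudo sysctl --system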
I am still confused though. I haven't understood the consequences of such a setting. I enabled packet tracing, and it seems that the packets matched the exact same set of rules before and after enabling bridge-nf-call-iptables; yet in the former case the first TCP SYN packet got delivered to the Python server, while in the latter case the packet got dropped for unknown reasons.
Try running docker with -p 1234
"By default, when you create a container, it does not publish any of its ports to the outside world. To make a port available to services outside of Docker, or to Docker containers which are not connected to the container’s network, use the --publish or -p flag."
https://docs.docker.com/config/containers/container-networking/
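For example (the image name here is just a placeholder):
docker run --rm -p 1234:1234 <image>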
I have 2 VMs.
On the first I run:
docker swarm join-token manager
On the second I run the result from this command.
i.e.
docker swarm join --token SWMTKN-1-0wyjx6pp0go18oz9c62cda7d3v5fvrwwb444o33x56kxhzjda8-9uxcepj9pbhggtecds324a06u 192.168.65.3:2377
However, this outputs:
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.65.3:2377: connect: connection refused"
Any idea what's going wrong?
If it helps I'm spinning up these VMs using Vagrant.
Just add the port to the firewall on the master side:
firewall-cmd --add-port=2377/tcp --permanent
firewall-cmd --reload
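You can confirm the port is open with:
firewall-cmd --list-ports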
Then try docker swarm join again on the second VM (worker node).
I was facing a similar issue and spent a couple of hours figuring out the root cause, so I am sharing it for those who may hit similar issues.
Environment:
Oracle Cloud + AWS EC2 (2 +2)
OS: 20.04.2-Ubuntu
Docker version : 20.10.8
3 dynamic public IPs + 1 elastic IP
Issues
1. Created two instances on Oracle Cloud at the beginning.
2. Instance A (manager): docker swarm init --advertise-addr succeeded.
3. Instance B (worker): docker swarm join as a worker succeeded.
4. When I tried to promote B to manager, I encountered this error:
Unable to connect to remote host: No route to host
5. Mesh routing was not working properly.
Investigation
1. Suspected it was related to the network / firewall / security group / security list.
2. SSHed to server B (worker) and ran telnet (manager) 2377, getting the same error:
Unable to connect to remote host: No route to host
3. Logged into the Oracle console and added ingress rules under the security list for all of the relevant ports:
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
4. Tried again, but telnet still failed with the same error.
5. Checked the OS-level firewall and disabled it:
ufw disable
6. Tried again, but it still failed with the same result.
7. Suspected something was wrong with Oracle Cloud, so I decided to try AWS, installing the same versions of OS/Docker.
8. Added a security group allowing all of the relevant ports/protocols and disabled ufw.
9. Tested with AWS instances C (leader/master) + D (worker). It worked: I could also promote D to manager, and mesh routing worked as well.
10. Confirmed the issue was specific to Oracle Cloud.
11. Tried to join the Oracle instance (A) to C as a worker. It worked, but A still could not be promoted to manager.
12. Used journalctl -f to investigate the logs and confirmed there were socket timeouts from A/B (the Oracle instances) to the AWS instance (C).
13. Looked at A/B again and found iptables rules blocking the requests.
14. Removed all of the rules in iptables:
# remove the rules
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -F
Root Cause
It was caused by a firewall, either at the cloud level (security list / WAF / ACL / security group) or at the OS level (e.g. ufw/iptables rules).
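If the OS firewall is the culprit, opening the standard swarm ports explicitly (instead of disabling the firewall entirely) looks roughly like this with ufw; adjust accordingly for firewalld or cloud security rules:
sudo ufw allow 2377/tcp    # cluster management
sudo ufw allow 7946/tcp    # node-to-node communication
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp    # overlay (VXLAN) network traffic
sudo ufw reload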
I had already run firewall-cmd --add-port=2377/tcp --permanent and firewall-cmd --reload on the master side and was still getting the same error.
I ran telnet <master ip> 2377 on the worker node and then rebooted the master.
After that it worked fine.
It looks like your docker swarm manager leader is not listening on port 2377. You can check it by running this command on your swarm manager leader VM. If it is working fine, you will get output similar to this:
[root@host1]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
tilzootjbg7n92n4mnof0orf0 * host1 Ready Active Leader
Furthermore, you can check the listening ports on the leader swarm manager node. It should have TCP port 2377 (cluster management communications) and TCP/UDP port 7946 (communication among nodes) open.
[root@host1]# netstat -ntulp | grep dockerd
tcp6 0 0 :::2377 :::* LISTEN 2286/dockerd
tcp6 0 0 :::7946 :::* LISTEN 2286/dockerd
udp6 0 0 :::7946 :::* 2286/dockerd
On the second VM, where you are configuring the second swarm manager, you will have to make sure you have connectivity to port 2377 of the leader swarm manager. You can use tools like telnet, wget or nc to test the connectivity, as shown below:
[root@host2]# telnet <swarm manager leader ip> 2377
Trying 192.168.44.200...
Connected to 192.168.44.200.
In my case I was running on Linux and Windows. My Windows Docker private network was the same as my local network address, so the Docker daemon could not find the master on its own network using the address I gave it.
So I did:
1- go to Docker Desktop app
2- go to Settings
3- go to Resources
4- go to the Network section and change the Docker subnet address (it needs to be different from your local subnet address).
5- Then apply and restart.
6- run docker swarm join on the worker again.
Note: all these steps are performed on the node where the error appears. Make sure that ports 2377, 7946 and 4789 are open on the master (you can use iptables or ufw).
Hope it works for you.
I have an HTTP health check in my service, exposed on localhost:35000/health. At the moment it always returns 200 OK. The configuration for the health check is done programmatically via the HTTP API rather than with a service config, but in essence it is (see the curl sketch after this list):
set id: service-id
set name: health check
set notes: consul does a GET to '/health' every 30 seconds
set http: http://127.0.0.1:35000/health
set interval: 30s
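In curl terms, the registration is roughly equivalent to the call below against the agent's check-register endpoint (the field values mirror the settings above; the real payload differs in detail):
curl -X PUT http://127.0.0.1:8500/v1/agent/check/register \
  -d '{
        "ID": "service-id",
        "Name": "health check",
        "Notes": "consul does a GET to /health every 30 seconds",
        "HTTP": "http://127.0.0.1:35000/health",
        "Interval": "30s"
      }'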
When I run consul in dev mode (consul agent -dev -ui) on my host machine directly the health check passes without any problem. However, when I run consul in a docker container, the health check fails with:
2017/07/08 09:33:28 [WARN] agent: http request failed 'http://127.0.0.1:35000/health': Get http://127.0.0.1:35000/health: dial tcp 127.0.0.1:35000: getsockopt: connection refused
The docker container launches consul, as far as I am aware, in exactly the same state as the host version:
version: '2'

services:
  consul-dev:
    image: "consul:latest"
    container_name: "net-sci_discovery-service_consul-dev"
    hostname: "consul-dev"
    ports:
      - "8400:8400"
      - "8500:8500"
      - "8600:8600"
    volumes:
      - ./etc/consul.d:/etc/consul.d
    command: "agent -dev -ui -client=0.0.0.0 -config-dir=/etc/consul.d"
I'm guessing the problem is that Consul is doing the GET request against the container's loopback interface rather than what I intend, which is the loopback interface of the host. Is that a correct assumption? More importantly, what do I need to do to correct the problem?
So it transpires that there was a bug in some previous versions of macOS that prevented use of the docker0 network. Whilst the bug is fixed in newer versions, Docker support extends to older versions and so Docker for Mac doesn't currently support docker0. See this discussion for details.
The workaround is to create an alias to the loopback interface on the host machine, set the service to listen on either that alias or 0.0.0.0, and configure Consul to send the health check GET request to the alias.
To set the alias (choose a private IP address that's not being used for anything else; I chose a class A address but that's irrelevant):
sudo ifconfig lo0 alias 10.200.10.1/24
To remove the alias:
sudo ifconfig lo0 -alias 10.200.10.1
From the service definition above, the HTTP line should now read:
set http: http://10.200.10.1:35000/health
And the HTTP server listening for the health check requests also needs to listen on either 10.200.10.1 or 0.0.0.0. This latter option is suggested in the discussion, but I've only tried it with the alias.
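A quick sanity check is to hit the endpoint from inside the Consul container, so that you exercise exactly the path Consul uses (assuming wget is available in the image, which it is for the Alpine-based consul image):
docker exec net-sci_discovery-service_consul-dev wget -qO- http://10.200.10.1:35000/health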
I've updated the title of the question to more accurately reflect the problem, now I know the solution. Hope it helps somebody else too.
Let me first explain what I'm trying to do, as there may be multiple ways to solve this. I have two containers in docker 1.9.0:
node001 (172.17.0.2) (sudo docker run --net=<<bridge or test>> --name=node001 -h node001 --privileged -t -i -v /sys/fs/cgroup:/sys/fs/cgroup <<image>>)
node002 (172.17.0.3) (,,)
When I launch them with --net=bridge I get the correct value for SSH_CLIENT when I ssh from one to the other:
[root@node001 ~]# ssh root@172.17.0.3
root@172.17.0.3's password:
[root@node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.17.0.3 56194 22
[root@node001 ~]# ping -c 1 node002
ping: unknown host node002
In docker 1.8.3 I could also use the hostnames I supplied when I started them; in 1.8.3 that last ping statement works!
In docker 1.9.0 I don't see anything being added in /etc/hosts, and the ping statement fails. This is a problem for me. So I tried creating a custom network...
docker network create --driver bridge test
When I launch the two containers with --net=test I get a different value for SSH_CLIENT:
[root@node001 ~]# ssh root@172.18.0.3
root@172.18.0.3's password:
[root@node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.18.0.1 57388 22
[root@node001 ~]# ping -c 1 node002
PING node002 (172.18.0.3) 56(84) bytes of data.
64 bytes from node002 (172.18.0.3): icmp_seq=1 ttl=64 time=0.041 ms
Note that the IP address is not node001's; it seems to represent the Docker host itself. The hosts file is correct though, containing:
172.18.0.2 node001
172.18.0.2 node001.test
172.18.0.3 node002
172.18.0.3 node002.test
My current workaround is using docker 1.8.3 with the default bridge network, but I want this to work with future docker versions.
Is there any way I can customize the test network to make it behave similarly to the default bridge network?
Alternatively:
Maybe make the default bridge network write out the /etc/hosts file in docker 1.9.0?
Any help or pointers towards different solutions will be greatly appreciated.
Edit: 21-01-2016
Apparently the problem is fixed in 1.9.1. With bridge in docker 1.8 and with a custom network (--net=test) in 1.9.1, the behaviour is now correct:
[root@node001 tmp]# ip route
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.5
[root@node002 ~]# env | grep SSH_CLIENT
SSH_CLIENT=172.18.0.3 52162 22
Retried in 1.9.0 to see if I wasn't crazy, and yeah there the problem occurs:
[root@node001 tmp]# ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
[root@node002 ~]# env|grep SSH_CLI
SSH_CLIENT=172.18.0.1 53734 22
So after remove/stop/start-ing the instances the IP addresses were not exactly the same, but it can easily be seen that the SSH_CLIENT source IP is not correct in the last code block. Thanks @sourcejedi for making me re-check.
Firstly, I don't think it's possible to change any settings on the default network, i.e. to write /etc/hosts. You apparently can't delete the default networks, so you can't recreate them with different options.
Secondly
Docker is careful that its host-wide iptables rules fully expose containers to each other’s raw IP addresses, so connections from one container to another should always appear to be originating from the first container’s own IP address. docs.docker.com
I tried reproducing your issue with the random containers I've been playing with. Running wireshark on the bridge interface for the network, I didn't see my ping packets. From this I conclude my containers are indeed talking directly to each other; the host was not doing routing and NAT.
You need to check the routes in your client container with ip route. Do you have a route for 172.18.0.0/16? If you only have a default route, it could try to send everything through the Docker host. And it might get confused and do masquerading as if it were talking with the outside world.
This might happen if you're running some network configuration in your privileged container. I don't know what's happening if you're just booting it with bash though.