We've been experiencing a long-standing networking issue. In short, one container cannot ping (or ssh) another. Does anybody have an extra moment to think along with me?
Our setup:
Docker CE 18.06.03 (while trying to fix the issue, we've upgraded from 17.03, but it has not helped)
Swarm Classic (Standalone) 1.2.9
Consul as a Swarm backend, running with members on five nodes
Seven nodes in total, six of which host containers
Each container is connected to an overlay network when it is started
What we've tried so far:
This issue has largely stumped us. We've spent a lot of time on it and done much of the basic troubleshooting, and some more advanced troubleshooting (happy to elaborate). (But I don't expect that I've exhausted our options, so please don't hesitate to suggest anything you may think will work.)
It's inconsistent (happening to different images and on different nodes), intermittent, and long-standing (several months). We've made two changes. The first was a workaround for MAC address assignment (explained here: https://github.com/docker/libnetwork/pull/2380; the actual workaround: https://github.com/systemd/systemd/issues/3374#issuecomment-452718898), which improved the situation and removed the MAC address assignment errors from the logs. The second was an upgrade to get this fix (https://github.com/docker/libnetwork/pull/1935), which deals with IP reuse. This also reduced the problem (at the time, no containers could communicate). I've also run through some basic tests using the netshoot container (let me know if you want more info on that).
We have a workaround for a given container that is broken: we delete the Consul data for this container and then stop and restart it. From what I can tell, it does not seem to be an issue with the Consul data per se but instead comes from Docker/Swarm resetting several network configurations when the container is started (I can say more if this seems to trigger a thought for anybody reading). Then, the container can often ping other containers, but not always.
Specific question:
It seems like there's a window of time during which this can be worse. It's not necessarily tied to starting several containers at once, but there's a somewhat clear pattern: during some windows of time, containers do not get configured properly to communicate with each other. What troubleshooting steps come to mind for you?
The content below is the output from trying to ping one container (82afb0dccbcc) from two other containers. It fails at first, but then is successful.
The first time I try to ping the container, at 2019-12-10T23:57:52+00:00:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
82afb0dccbcc: user___92397089 crccheck/hello-world
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.

--- 82afb0dccbcc ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3033ms

PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=2 ttl=64 time=0.083 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=3 ttl=64 time=0.072 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=4 ttl=64 time=0.073 ms

--- 82afb0dccbcc ping statistics ---
4 packets transmitted, 3 received, 25% packet loss, time 3023ms
rtt min/avg/max/mdev = 0.072/0.076/0.083/0.005 ms
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In this first ping test, above, packet loss is 100% from the first container and 25% from the second.
A few minutes later (2019-12-10T23:57:52+00:00), however, 82afb0dccbcc can be successfully pinged from both containers:
82afb0dccbcc: user___92397089 crccheck/hello-world
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ping from ansible-provisioner:
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=2 ttl=64 time=0.073 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=3 ttl=64 time=0.077 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=4 ttl=64 time=0.087 ms

--- 82afb0dccbcc ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3063ms
rtt min/avg/max/mdev = 0.056/0.073/0.087/0.012 ms
ping from ansible_container:
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=2 ttl=64 time=0.055 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=3 ttl=64 time=0.060 ms
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=4 ttl=64 time=0.085 ms

--- 82afb0dccbcc ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3062ms
rtt min/avg/max/mdev = 0.055/0.063/0.085/0.015 ms
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
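Since the failures come and go, it may help to record the loss figures from repeated ping runs and line them up against timestamps. Below is a minimal Python sketch, not part of the original setup, that pulls the summary numbers out of captured ping output. The function name parse_ping_summaries is hypothetical, and the sketch assumes the output has the iputils or BusyBox summary format seen in the transcripts here.

```python
import re

# Matches the iputils summary ("3 received"), its variant with an
# "+N errors," field, and the BusyBox summary ("4 packets received").
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, "
    r"(?:\+\d+ errors, )?(\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summaries(output):
    """Return one (transmitted, received, loss_percent) tuple per ping run."""
    # Drop carriage returns, which appear when output is captured from a TTY.
    cleaned = output.replace("\r", "")
    return [
        (int(tx), int(rx), float(loss))
        for tx, rx, loss in SUMMARY_RE.findall(cleaned)
    ]
```

Called on the text of the first test above, this would yield one tuple per source container, which makes it easy to log loss per container over time and look for the bad windows.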
You need to create a network and connect both the containers to that network.
The Docker embedded DNS server enables name resolution for containers connected to a given network. This means that any connected container can ping another container on the same network by its container name.
From within container1, you can ping container2 by name. So it's important to explicitly name the containers; otherwise this will not work.
Create two containers:
docker run -d --name container1 -p 8001:80 test/apache-php
docker run -d --name container2 -p 8002:80 test/apache-php
Now create a network:
docker network create myNetwork
After that connect your containers to the network:
docker network connect myNetwork container1
docker network connect myNetwork container2
Check if your containers are part of the new network:
docker network inspect myNetwork
Now test the connection. You should be able to ping container2 from container1:
docker exec -ti container1 ping container2
I actually ran into this issue by chance, but in my case both containers were already on the same network, so I was puzzled that one container couldn't ping the other.
That lasted until I ran docker network inspect myNetwork and happened to notice that both containers had been assigned the SAME MAC address. I have no idea why or how that happened. Obviously that would prevent pinging, since on a LAN switches use MAC addresses to decide where to forward frames.
I had to stop and remove the container, then recreate it, to get a different MAC address.
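Spotting a duplicate MAC address by eye is easy to miss on a busy network. Here is a small Python sketch for checking it automatically. It assumes you pass it the JSON text printed by docker network inspect, which lists attached containers under a "Containers" key with "Name" and "MacAddress" fields. The function name find_duplicate_macs is hypothetical.

```python
import json
from collections import defaultdict


def find_duplicate_macs(inspect_output):
    """Map each MAC address shared by several containers to their names."""
    # `docker network inspect` prints a JSON list with one object per network.
    networks = json.loads(inspect_output)
    names_by_mac = defaultdict(list)
    for network in networks:
        # "Containers" can be missing or null when nothing is attached.
        for endpoint in (network.get("Containers") or {}).values():
            mac = endpoint.get("MacAddress", "").lower()
            if mac:
                names_by_mac[mac].append(endpoint.get("Name", "?"))
    return {
        mac: sorted(names)
        for mac, names in names_by_mac.items()
        if len(names) > 1
    }
```

An empty result means every attached container has its own MAC address on that network.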
If a web app is running in one of your containers and you want to call one of its endpoints from another container and use the response, you can follow the steps below.
First, establish inter-container communication using a Docker network:
1. docker network create dockerContainerCommunication
Now connect containers to network dockerContainerCommunication
2. docker network connect dockerContainerCommunication container1
3. docker network connect dockerContainerCommunication container2
Now start your containers (if not started)
4. docker start container1
5. docker start container2
Inspect your network. Here you can also find the IP addresses of the containers.
6. docker network inspect dockerContainerCommunication
Now attach to the container you want to call the web application from, then call the other container using curl and the IP address you found in step 6.
docker attach container1
OR
docker attach container2
and then run curl command
curl http://IP_ADDRESS:PORT_ON_WHICH_APP_IS_RUNNING/api/endpointPath
I hope it helps.
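If you would rather not copy the IP address by hand, the lookup can be scripted. This is only a sketch under assumptions: it takes the JSON text from docker network inspect, relies on the "Containers" entries carrying "Name" and "IPv4Address" (the latter with a prefix length such as /16), and uses a hypothetical helper name, endpoint_url. The port and path are whatever your app uses.

```python
import json


def endpoint_url(inspect_output, container_name, port, path):
    """Build an http:// URL for a container listed in network inspect output."""
    for network in json.loads(inspect_output):
        for endpoint in (network.get("Containers") or {}).values():
            if endpoint.get("Name") == container_name:
                # IPv4Address looks like "172.18.0.3/16"; keep the address only.
                ip = endpoint["IPv4Address"].split("/")[0]
                return f"http://{ip}:{port}/{path.lstrip('/')}"
    raise LookupError(f"{container_name} is not attached to this network")
```

The returned string can be passed to curl from inside the other container. On a user-defined network like this one, using the container name instead of the IP address should also work.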
If we create a container on the default bridge network, we are able to access the internet from within this container. (The example below is copied from Networking with Standalone containers.)
docker run -dit --name alpine1 alpine ash
docker attach alpine1
# ping -c 2 google.com
PING google.com (172.217.3.174): 56 data bytes
64 bytes from 172.217.3.174: seq=0 ttl=41 time=9.841 ms
64 bytes from 172.217.3.174: seq=1 ttl=41 time=9.897 ms
--- google.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 9.841/9.869/9.897 ms
Pinging google.com from within the container, we see a successful ping response.
This behaviour seems to contradict the statement from the Docker docs. Or am I not understanding something here?
Link to Docker docs
Based on @Turing85's comment, I checked whether IP forwarding is enabled on my Windows machine, using this link: How to enable IP forwarding in windows. It looks like it's not enabled.
I have a very weird problem:
I have a swarm cluster, and one of my services has the wrong IP:
$ docker service inspect nginx_backend | grep Addr
"Addr": "10.0.0.107/24"
From any container in the cluster:
/ # ping nginx_backend
PING nginx_backend (10.0.0.107): 56 data bytes
64 bytes from 10.0.0.107: seq=0 ttl=64 time=0.057 ms
64 bytes from 10.0.0.107: seq=1 ttl=64 time=0.061 ms
64 bytes from 10.0.0.107: seq=2 ttl=64 time=0.064 ms
64 bytes from 10.0.0.107: seq=3 ttl=64 time=0.083 ms
^C
--- nginx_backend ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.057/0.066/0.083 ms
But on the server that hosts the nginx_backend container:
root@backend:~# docker inspect nginx_backend.1.myzy10psfdl9r4jljrsz5zd5t | grep IPv4
"IPv4Address": "10.0.0.87"
When a service tries to connect by name, it gets a connection error, but if I manually add a record like 10.0.0.87 nginx_backend to /etc/hosts inside a container, the connection succeeds.
What did I do wrong?
Docker creates (by default) a Virtual IP (VIP) for each service. That's the 10.0.0.107. It then balances requests between the backend containers. In the second example (10.0.0.87) you're seeing the IP address of one of the containers. That's routable within Docker as well (which is why hitting the IP works). However the name (nginx_backend.1.myzy10psfdl9r4jljrsz5zd5t) is not DNS resolvable so that's why that fails.
You can find a list of the 'backing' containers for a service by doing a DNS lookup on tasks.nginx_backend.
Some more background here: https://docs.docker.com/network/overlay/
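To see the difference between the VIP and the task addresses from inside a container, you can resolve both names and compare. Here is a minimal Python sketch using the standard socket module. The function names are hypothetical, and the getaddrinfo parameter exists only so the lookup can be swapped out. It assumes it runs in a container attached to the same overlay network as the service.

```python
import socket


def resolve_ipv4(name, getaddrinfo=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to."""
    try:
        infos = getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # name does not resolve from here
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def service_addresses(service, getaddrinfo=socket.getaddrinfo):
    """Resolve the service name (the VIP) and tasks.<service> (the containers)."""
    return {
        "vip": resolve_ipv4(service, getaddrinfo),
        "tasks": resolve_ipv4("tasks." + service, getaddrinfo),
    }
```

For the service in the question, service_addresses("nginx_backend") would be expected to report 10.0.0.107 as the VIP and the container address or addresses under "tasks".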
I have two containers connected to the default bridge network:
» docker ps
CONTAINER ID   IMAGE                         COMMAND                  CREATED             STATUS                       PORTS                                                                  NAMES
3cc528ddbe7e   gitlab/gitlab-runner:latest   "/usr/bin/dumb-ini..."   25 minutes ago      Up 25 minutes                                                                                       gitlab-runner
3c01073065c7   gitlab/gitlab-ee:latest       "/assets/wrapper"        About an hour ago   Up About an hour (healthy)   0.0.0.0:45022->22/tcp, 0.0.0.0:45080->80/tcp, 0.0.0.0:45443->443/tcp   gitlab
I have found the corresponding IP addresses with docker inspect (is there a better method of obtaining them?), and I can ping from one container to the other by IP address:
» docker exec -it gitlab-runner bash
root@3cc528ddbe7e:/# ping 172.17.0.3
PING 172.17.0.3 (172.17.0.3) 56(84) bytes of data.
64 bytes from 172.17.0.3: icmp_seq=1 ttl=64 time=0.079 ms
64 bytes from 172.17.0.3: icmp_seq=2 ttl=64 time=0.063 ms
64 bytes from 172.17.0.3: icmp_seq=3 ttl=64 time=0.060 ms
^C
--- 172.17.0.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.060/0.067/0.079/0.010 ms
But I cannot ping by name:
root@3cc528ddbe7e:/# ping gitlab
ping: unknown host gitlab
Why is this? I thought docker provides DNS by container name.
I have two containers connected to the default bridge network...
I can ping from one container to the other, by IP address...
But I cannot ping by name...
This is the default behavior for the default bridge network.
From: Docker docs
Differences between user-defined bridges and the default bridge
User-defined bridges provide automatic DNS resolution between containers.
Containers on the default bridge network can only access each other by IP addresses, unless you use the --link option, which is considered legacy. On a user-defined bridge network, containers can resolve each other by name or alias.
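To check which case a container falls into, and to get its IP addresses without grepping, you can read the output of docker inspect for that container. The Python sketch below is an illustration under assumptions: it expects the JSON text from docker inspect, reads the NetworkSettings.Networks map, and treats the names bridge, host and none as Docker's built-in networks. The function names are hypothetical.

```python
import json

# Docker's built-in networks; name resolution between containers is only
# provided on user-defined networks.
BUILTIN_NETWORKS = {"bridge", "host", "none"}


def container_networks(inspect_output):
    """Return {network name: IP address} from `docker inspect <container>` output."""
    container = json.loads(inspect_output)[0]  # output is a one-element JSON list
    networks = container["NetworkSettings"]["Networks"]
    return {name: settings.get("IPAddress", "") for name, settings in networks.items()}


def resolvable_by_name(inspect_output):
    """True if the container is attached to at least one user-defined network."""
    return any(
        name not in BUILTIN_NETWORKS
        for name in container_networks(inspect_output)
    )
```

For the gitlab container above, which is only on the default bridge, resolvable_by_name would return False, matching the unknown host error.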
When I set up a network with docker network create test1 and then start a few containers, for example
docker run -d --net=test1 --name=t1 elasticsearch
docker run -d --net=test1 elasticsearch
docker run -d --net=test1 elasticsearch
I can't broadcast ping any of these containers with docker exec -ti t1 ping 255.255.255.255.
Any idea how I can change this?
This is currently tracked in issue 17814
UDP broadcasts don't work in multi-host network between hosts.
UDP broadcasts only work if both containers run on the same host.
Playing with icmp broadcast by pinging on 255.255.255.255, I receive replies only from the local host:
# ping -b 255.255.255.255
WARNING: pinging broadcast address
PING 255.255.255.255 (255.255.255.255) 56(84) bytes of data.
64 bytes from 172.18.0.1: icmp_req=1 ttl=64 time=0.601 ms
64 bytes from 172.18.0.1: icmp_req=2 ttl=64 time=0.424 ms
64 bytes from 172.18.0.1: icmp_req=3 ttl=64 time=0.420 ms
64 bytes from 172.18.0.1: icmp_req=4 ttl=64 time=0.427 ms
(I made sure /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts is set to 0 on both hosts.)
It also seems impossible to set a broadcast address on the interface connected to the shared network:
# ifconfig eth0 broadcast 10.0.0.255
SIOCSIFBRDADDR: Operation not permitted
SIOCSIFFLAGS: Operation not permitted
This ability to multicast in overlay driver is discussed in docker/libnetwork issue 552.
(help wanted)
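For reference, the ifconfig attempt above targets the directed broadcast address of the subnet, as opposed to the limited broadcast address 255.255.255.255 used in the ping test. The Python sketch below only shows how the two relate. It does not make broadcasts cross hosts, and the /24 prefix in the usage note is an assumption inferred from the 10.0.0.255 address. The function name is hypothetical.

```python
import ipaddress


def broadcast_targets(cidr):
    """Return the directed and limited broadcast addresses for a subnet."""
    # strict=False accepts an interface address such as 10.0.0.5/24.
    network = ipaddress.ip_network(cidr, strict=False)
    return {
        "directed": str(network.broadcast_address),
        "limited": "255.255.255.255",
    }
```

For example, broadcast_targets("10.0.0.5/24") gives 10.0.0.255 as the directed address.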
I keep getting an unable_to_contact_cluster_nodes error.
Has anyone seen this before and resolved it?
I am using rabbitmq-server 1.5.4, installed from the Ubuntu repositories. I have a hunch that this has something to do with ufw or some other network security measure, enabled by default in Ubuntu, that is preventing connections.
The machine is pingable (I made an entry in /etc/hosts file)
pgatram@mzl005:~$ ping mz005
PING mz005 (192.168.0.22) 56(84) bytes of data.
64 bytes from mz005 (192.168.0.22): icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from mz005 (192.168.0.22): icmp_seq=2 ttl=64 time=0.023 ms
^C
--- mz005 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.023/0.024/0.026/0.005 ms
I can't get the cluster to work:
pgatram@mzl005:~$ sudo rabbitmqctl cluster rabbit@mz005
Clustering node rabbit@mzl005 with [rabbit@mz005] ...
Error: {unable_to_contact_cluster_nodes,[rabbit@mz005]}
Almost certainly a firewall issue. You should be able to telnet to the other host on port 5672 (or whatever you specified in /etc/default/rabbitmq). If telnet can't connect then the port isn't open. As a sanity check, try telnet to localhost on port 5672.
If you can't telnet it'll be a firewall issue.
After that it's a case of opening the port and trying again.
Chris
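If telnet is not installed, the same reachability check can be done with a few lines of Python. This is a minimal sketch using only the standard socket module. The function name port_is_open is hypothetical, and port 5672 in the usage note is simply the port mentioned in the answer above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running port_is_open("mz005", 5672) from mzl005, and port_is_open("localhost", 5672) as the sanity check, mirrors the two telnet tests. A False result for the remote host combined with True for localhost would point toward a firewall.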