So I'm currently running the stable/consul chart from Helm on my local Kubernetes cluster (running on Docker).
$ helm install -n wet-fish --namespace consul stable/consul
This creates two services
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
wet-fish-consul ClusterIP None <none> 8500/TCP,8400/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 0s
wet-fish-consul-ui NodePort 10.110.229.223 <none> 8500:30276/TCP
So this means I can open localhost:30276 in a browser and see the Consul UI.
Now on my local machine I'm running:
$ consul agent -dev -config-dir=./consul.d -node=machine
$ consul join 127.0.0.1:30276
This just results in:
Error joining address '127.0.0.1:30276': Unexpected response code: 500 (1 error occurred:
* Failed to join 127.0.0.1: received invalid msgType (72), expected pushPullMsg (6) from=127.0.0.1:30276
)
Failed to join any nodes.
and
2020/01/17 15:17:35 [WARN] agent: (LAN) couldn't join: 0 Err: 1 error occurred:
* Failed to join 127.0.0.1: received invalid msgType (72), expected pushPullMsg (6) from=127.0.0.1:30276
2020/01/17 15:17:35 [ERR] http: Request PUT /v1/agent/join/127.0.0.1:30276, error: 1 error occurred:
* Failed to join 127.0.0.1: received invalid msgType (72), expected pushPullMsg (6) from=127.0.0.1:30276
from=127.0.0.1:59693
There must be a way to have a local consul agent running that can connect to the k8s consul server...
This is on a Mac, so the networking isn't as straightforward as on Linux...
There may be two problems here. The first is that consul agent -dev starts the agent in dev mode, which by default runs a server as well as a client. That might be part of the reason behind the error.
The other problem could be localhost: the server running in Kubernetes will attempt to health check local agents, so it needs to be able to reach the local agent. Even if you manage to join in the first step, it would probably fail health checks.
I agree that networking on Mac does not make things easy. One thing you will probably have to do is set the advertise address for the local (non-Kubernetes) agent. Docker for Mac has a host name, docker.for.mac.localhost, which resolves to an IP on the local machine that is routable from a container. When starting the local agent, if you set the advertise address to the IP value of that host name, the Kubernetes Consul server should be able to route to the locally running agent.
Potential fix:
1. Ensure the local agent is starting in client mode (configure it manually rather than using -dev).
2. Set the advertise address to an IP which is routable from Kubernetes, i.e. the IP behind docker.for.mac.localhost (see the sketch below).
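A minimal sketch of what starting the local agent could look like under those assumptions; <routable-ip> is whatever docker.for.mac.localhost resolves to from inside a container, and the join target is the NodePort from the question (untested, illustrative only):
# Start the local agent in client mode (no -dev) and advertise an address
# that is routable from the Kubernetes side
consul agent \
  -config-dir=./consul.d \
  -node=machine \
  -advertise=<routable-ip> \
  -retry-join=127.0.0.1:30276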
Give me a shout if that does not work for you. I have used a setup like this myself, and 9 times out of 10 it is networking between Docker and the local machine.
Kind regards,
Nic
We are running an environment of 6 Docker engines, each with 30 containers.
Two engines are running containers with nginx proxy. These two containers are the only way into the network.
It is now the second time that we are facing a major problem with a set of containers in this environment:
Both nginx containers cannot reach some of the containers on other machines. Only one physical engine has this problem; all others are fine. It started with timeouts to some machines, and now, after 24 hours, all containers on that machine have the problem.
Some more details:
Nginx is running on machine prod-3.
Second Nginx is running on machine prod-6.
Containers with problems are running on prod-7.
Neither nginx can reach the containers, but the containers can reach the nginx via "ping".
At the beginning, and again this morning, we could reach some of the containers and others not. It started with timeouts; now we cannot ping the containers in the overlay network at all. This time we are able to look at the traffic using tcpdump:
On the nginx container (10.10.0.37 on prod-3) we start a ping, and
as you can see, 100% of packets are lost:
root@e89c16296e76:/# ping ew-engine-evwx-intro
PING ew-engine-evwx-intro (10.10.0.177) 56(84) bytes of data.
--- ew-engine-evwx-intro ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7056ms
root@e89c16296e76:/#
On the target machine prod-7 (not inside the container) we can see that all ping packets are received (so the overlay network is routing correctly to prod-7):
wurzel@rv_____:~/eventworx-admin$ sudo tcpdump -i ens3 dst port 4789 | grep 10.10.0.177
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
IP 10.10.0.37.35270 > 10.10.0.177.http: Flags [S], seq 2637350294, win 28200, options [mss 1410,sackOK,TS val 1897214191 ecr 0,nop,wscale 7], length 0
IP 10.10.0.37.35270 > 10.10.0.177.http: Flags [S], seq 2637350294, win 28200, options [mss 1410,sackOK,TS val 1897214441 ecr 0,nop,wscale 7], length 0
IP 10.10.0.37.35326 > 10.10.0.177.http: Flags [S], seq 2595436822, win 28200, options [mss 1410,sackOK,TS val 1897214453 ecr 0,nop,wscale 7], length 0
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 1, length 64
IP 10.10.0.37.35326 > 10.10.0.177.http: Flags [S], seq 2595436822, win 28200, options [mss 1410,sackOK,TS val 1897214703 ecr 0,nop,wscale 7], length 0
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 2, length 64
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 3, length 64
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 4, length 64
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 5, length 64
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 6, length 64
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 7, length 64
IP 10.10.0.37 > 10.10.0.177: ICMP echo request, id 83, seq 8, length 64
^C304 packets captured
309 packets received by filter
0 packets dropped by kernel
wurzel@_______:~/eventworx-admin$
First, you can see that there is no ICMP answer (so the firewall is not responsible, and neither is AppArmor).
Inside the target container (evwx-intro = 10.10.0.177) nothing is received; the interface eth0 (10.10.0.0) is just silent:
root@ew-engine-evwx-intro:/home/XXXXX# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
root@ew-engine-evwx-intro:/home/XXXXX#
It's really strange.
Is there any other tool from Docker which can help us see what's going on?
We did not change anything in the firewall, and there were no automatic updates of the system (apart from maybe security updates).
The only activity was that some old containers were reactivated after a long period (maybe 1-2 months) of inactivity.
We are really lost. If you have experienced something comparable, it would be very helpful to understand the steps you took.
Many thanks for any help with this.
=============================================================
6 hours later
After trying nearly everything for a full day, we made a final attempt:
(1) stop all the containers
(2) stop docker service
(3) stop docker socket service
(4) restart machine
(5) start the containers
... now it looks good at the moment.
To conclude:
(1) We have no clue what was causing the problem. This is bad.
(2) We have learned that the overlay network is not the problem, because the traffic is reaching the target machine where the container lives.
(3) We are able to trace the network traffic until it reaches the target machine. Somehow it is not "entering" the container, because inside the container the network interface shows no activity at all.
We have no knowledge of the VXLAN virtual network used by Docker, so if anybody has a hint, could you help us with a link or a tool for it?
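For reference, a sketch of commands that can be used to peek into Docker's overlay (VXLAN) plumbing on a host; the network name and the namespace ID (1-xxxxxxxx) are placeholders:
docker network inspect <overlay-network> --format '{{.Id}}'
sudo ls /run/docker/netns/                                        # overlay namespaces show up here, e.g. 1-xxxxxxxx
sudo nsenter --net=/run/docker/netns/1-xxxxxxxx ip -d link show   # shows the vxlan device and the bridge
sudo nsenter --net=/run/docker/netns/1-xxxxxxxx bridge fdb show   # forwarding entries for peer containers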
Many many thanks in advance.
Andre
======================================================
4 days later...
Just had the same situation again after updating docker-ce 18.06 to 18.09.
We have two machines using docker-ce 18 in combination with Ubuntu 18.04, and I just updated docker-ce to 18.09 because of this problem (Docker containers could not resolve DNS on Ubuntu 18.04 ... the new systemd-resolved service).
I stopped everything, updated Docker, restarted the machine, and started everything again.
Problem: Same problem as described in this post. The ping was received by the target host operating system but not forwarded to the container.
Solution:
1. stop all containers and Docker
2. consul leave
3. clean up all entries in the Consul key store on the other machines (they were not deleted by the leave)
4. start Consul
5. restart all engines
6. restart the nginx containers ... gotcha, the network is working now (a rough sketch of this sequence follows below)
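A rough sketch of that sequence on one engine, assuming systemd units named docker and consul; the Consul KV prefix is also an assumption (libnetwork stores overlay data under docker/network/v1.0 when Consul is used as the cluster store):
# stop workloads and the engine
docker stop $(docker ps -q)
sudo systemctl stop docker
# take the local Consul agent out of the cluster
consul leave
# on the remaining machines, remove the stale network entries (prefix is an assumption)
consul kv delete -recurse docker/network/v1.0/
# bring Consul and the engine back up
sudo systemctl start consul
sudo systemctl start docker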
Once again the same problem hit us.
We have 7 servers (each running Docker as described above) and two nginx entry points.
It looks like some errors in the Consul key store are the real problem causing the Docker network to show the strange behaviour described above.
In our configuration, all 7 servers have their own local Consul instance which synchronises with the others. For the network setup, each Docker engine does a lookup against its local Consul key store.
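For context, a minimal sketch of how an engine is typically pointed at a local Consul KV store in this kind of (pre-swarm-mode) overlay setup; the addresses and the interface are assumptions (ens3 is the host interface from the tcpdump above):
# dockerd flags wiring the engine to the local Consul agent as its cluster store
dockerd \
  --cluster-store=consul://127.0.0.1:8500 \
  --cluster-advertise=ens3:2376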
Last week we noticed that, at the same time as the network reachability problem, the Consul clients also reported synchronisation problems (leader election problems, repeats, etc.).
The final solution was to stop the Docker engines and the Consul clients, delete the Consul database on some servers, join them to the others again, and then start the Docker engines.
It looks like the Consul service is a critical part of the network configuration...
In progress...
I faced the exact same issue with an overlay network in a Docker Swarm setup.
I've found that it's not an OS or Docker problem. The affected servers are using Intel X-series NICs; other servers with I-series NICs are working fine.
Do you use on-premise servers? Or any cloud provider?
We use OVH and it might be caused by some datacenter network misconfiguration.
I am trying to set up a Docker container that can successfully scan the MAC addresses of devices on the subnet using nmap. I've spent three days trying to figure out how to do it, but I have still failed.
For example:
The host IP: 10.19.201.123
The device IP: 10.19.201.101
I've set up a Docker container which can ping both 10.19.201.123 and 10.19.201.101 successfully. But when I use nmap to scan for the MAC address from the Docker container, I get the following:
~$sudo nmap -sP 10.19.201.101
Starting Nmap 7.01 ( https://nmap.org ) at 2018-05-29 08:57 UTC
Nmap scan report for 10.19.201.101
Host is up (0.00088s latency).
Nmap done: 1 IP address (1 host up) scanned in 0.39 seconds
However, if I use nmap to scan for the MAC address from the VM (10.19.201.100), I get:
~$sudo nmap -sP 10.19.201.101
Starting Nmap 7.01 ( https://nmap.org ) at 2018-05-29 17:16 CST
Nmap scan report for 10.19.201.101
Host is up (0.00020s latency).
MAC Address: 0F:01:H5:W3:0G:J5(ICP Electronics)
Nmap done: 1 IP address (1 host up) scanned in 0.32 seconds
PLEASE, can anyone help or give pointers on how to do this?
For anyone who is still struggling with this issue, I've figured out how to do it on Windows 10.
The solution is to make the container run on the same LAN as your local host, so nmap can scan the LAN devices successfully. Below is how to make your Docker container run on the host LAN.
Windows 10 HOME
Change the virtual box setting
Stop the VM first, as administrator: docker-machine stop default
Open Virtual Box
Select default VM and click Settings
Go to Network page, and enable new Network Adapter on Adapter 3
(DO NOT CHANGE Adapter 1 & 2)
Attach Adapter 3 as a Bridged Adapter to your physical network and click OK
Start the VM as administrator: docker-machine start default
Open the Docker Quickstart Terminal to run a container; the new container should now run on the LAN.
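If you prefer the command line, the same VirtualBox change can be sketched like this; the bridged adapter name is an assumption (list yours with VBoxManage list bridgedifs):
docker-machine stop default
VBoxManage modifyvm default --nic3 bridged --bridgeadapter3 "Intel(R) Ethernet Connection"
docker-machine start default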
Windows 10 PROFESSIONAL/ENTERPRISE
Create vSwitch with physical network adapter
Open Hyper-V Manager
From the Action menu, open Virtual Switch Manager
Create new virtual switch -> select Type: External
Assign your physical network adapter to the vSwitch
Check "Allow management operating system to share this network adapter" and apply change
Go to Control Panel\All Control Panel Items\Network Connections.
Check the vEthernet adapter you just created and make sure the IPv4 settings are correct (sometimes the DHCP setting will be empty and you need to set it again here)
Go back to Hyper-V Manager and go into the Settings page of MobyLinuxVM (ensure it is shut down; if it is not, quit Docker)
Add Hardware > Network Adapter, select the vSwitch you just created and apply change
Modify Docker source code
Find the MobyLinux creation file: MobyLinux.ps1
(normally it's located at: X:\Program Files\Docker\Docker\resources)
Edit the file, and find the function: function New-MobyLinuxVM
Find below line in the function:
$vmNetAdapter = $vm | Hyper-V\Get-VMNetworkAdapter
Update it to:
$vmNetAdapter = $vm | Hyper-V\Get-VMNetworkAdapter | Select-Object -First 1
Save the file as administrator
Restart Docker, and the container should now run on the LAN.
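As an aside, on Linux hosts (with no VirtualBox/Hyper-V in between) a macvlan network is one way to get the same effect of putting a container on the host LAN. A minimal sketch, where the subnet, gateway, parent interface and image name are assumptions to adapt to your environment:
docker network create -d macvlan \
  --subnet=10.19.201.0/24 \
  --gateway=10.19.201.1 \
  -o parent=eth0 lan_macvlan
# NET_RAW/NET_ADMIN let nmap send the raw ARP probes used for MAC discovery
docker run --rm -it --network lan_macvlan \
  --cap-add NET_RAW --cap-add NET_ADMIN \
  your-nmap-image nmap -sn 10.19.201.101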
We are using haproxy to switch between a local MQTT broker and a cloud broker based on availability (with preference to the local server). haproxy.cfg looks something like this:
global
    log 127.0.0.1 local1
    maxconn 1000
    daemon
    debug
    #quiet
    tune.bufsize 1024576
    stats socket /var/run/haproxy.sock mode 600 level admin

defaults
    log global
    mode tcp
    option tcplog
    retries 3
    option redispatch
    timeout connect 5000
    timeout client 50000
    timeout server 50000

# Listen to all MQTT requests (port 1883)
listen mqtt
    bind *:1883
    mode tcp
    balance first       # Connect to first available
    timeout client 3h
    timeout server 3h
    option clitcpka
    option srvtcpka
    # MQTT server 1 - local wifi
    server wifi_broker localserver.local:1883 init-addr libc,last,none check inter 3s rise 5 fall 2 maxconn 1000 on-marked-up shutdown-backup-sessions on-marked-down shutdown-sessions
    # MQTT server 2 - cloud
    server aws_iot xxxxx.amazonaws.com:8883 backup check backup ssl verify none ca-file ./root-CA.crt crt ./cert.pem inter 5s rise 3 fall 2

listen stats
    bind :9000
    mode http
    stats enable                        # Enable stats page
    stats hide-version                  # Hide HAProxy version
    stats realm Haproxy\ Statistics     # Title text for popup window
    stats uri /haproxy_stats            # Stats URI
Everything works fine if the local broker is available when haproxy starts up. However, if the wifi connection to the local machine is down when haproxy starts up, init-addr none still allows it to start using the backup server (aws_iot). The local server is marked as "Down for Maintenance" and no more health checks are performed. Even after the network is up and running, haproxy is unaware of it and does not switch back from the cloud server.
Is there any way to make it treat an unresolved domain name the same as a normal "down" condition?
One alternative I see right now is to have a script polling the domain name in the background and sending an "enable server" command to the haproxy control socket once it is up. This seems overly roundabout for something that should be really simple!
Update:
Running the command echo "enable server mqtt/wifi_server" | socat /var/run/haproxy.sock stdio doesn't switch the backends after the local connection is up and running. haproxy just never switches back to the local server with anything short of restarting it.
Update 2:
Changed init-addr none to init-addr libc,last,none
You are using "init-addr none", so the server will start without any valid IP address when it is in a down state. In addition, your current config makes HAProxy resolve hostnames only at startup, as mentioned here.
So to make HAProxy resolve localserver.local after startup, pick up the right IP, and send health checks, you need to configure a resolvers section in HAProxy.
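A minimal sketch of what that could look like; the nameserver address (192.168.1.1) is an assumption, so point it at whatever DNS actually serves localserver.local, and the timers are illustrative:
resolvers localdns
    nameserver dns1 192.168.1.1:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s
    hold other      10s
    hold refused    10s
    hold nx         10s
    hold timeout    10s

# and reference it on the server line in the mqtt section:
# server wifi_broker localserver.local:1883 resolvers localdns resolve-prefer ipv4 init-addr libc,last,none check inter 3s ...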
I am using a Dockerfile to hit our corporate Nexus (npm) server for 'npm install' commands. I am seeing:
* Hostname was NOT found in DNS cache
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 216.xxx.xxx.xxx...
* connect to 216.xxx.xxx.xxx port 443 failed: Connection refused
* Failed to connect to nexus.<something>.com port 443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to nexus.<something>.com port 443: Connection refused
I can resolve www.google.com. I can hit and use our corporate npm registry from my local box. It appears that only our internal DNS names are the problem, and only when attempting to access them from inside a Docker container. I've googled and have not been able to determine the changes I need to make to fix this problem.
Dockerfile (I've trimmed the irrelevant commands):
FROM node:6.3
RUN curl -k -v https://www.google.com
RUN curl -k -vv https://nexus.<something>.com/repository/npm-all/
The curl to google.com succeeds. The curl to our internal repo fails.
I am starting it with the command:
docker build .
Contents of /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.1.1
I am running Ubuntu 15.10.
Solution
@BMitch is correct. Modify the contents of /etc/resolv.conf by adding the DNS server addresses associated with your corporate network. In the case of Ubuntu 15+ (I am running GNOME 3) the config file will be overwritten by NetworkManager, so it cannot be hand edited. Make the changes via the NetworkManager GUI instead: open Network Settings, select the DNS tab, and add the servers.
The local DNS address is the issue you're facing. The container can't reach the host's localhost IP from inside the container. The solution is to pass the IP address of a DNS server in your docker run, or to update your /etc/resolv.conf to point to a non-loopback IP address.
From Docker's DNS documentation:
Filtering is necessary because all localhost addresses on the host are
unreachable from the container’s network. After this filtering, if
there are no more nameserver entries left in the container’s
/etc/resolv.conf file, the daemon adds public Google DNS nameservers
(8.8.8.8 and 8.8.4.4) to the container’s DNS configuration. If IPv6 is
enabled on the daemon, the public IPv6 Google DNS nameservers will
also be added (2001:4860:4860::8888 and 2001:4860:4860::8844).
Note: If you need access to a host’s localhost resolver, you must modify your DNS service on the host to listen on a non-localhost
address that is reachable from within the container.
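For example, a sketch of the two usual fixes; 10.0.0.2 stands in for your corporate DNS server and is an assumption:
# Per-container: pass the corporate DNS server explicitly
docker run --dns 10.0.0.2 node:6.3 \
  curl -k -vv https://nexus.<something>.com/repository/npm-all/

# Daemon-wide (also picked up by "docker build"): set it in /etc/docker/daemon.json
# and restart the Docker service
# { "dns": ["10.0.0.2", "8.8.8.8"] }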