I have two Docker hosts with identical Docker versions and configuration.
Since yesterday I have had a problem with overlay networks on one of the hosts.
What I did:
docker network create -d overlay --attachable test_network
docker run --rm -it --network="test_network" --name test.DNS.name bash
wget test.DNS.name
On one host the response is as expected (the name resolves; the connection is refused only because nothing listens on port 80):
bash-4.4# wget test.DNS.name
Connecting to test.DNS.name (10.0.1.6:80)
wget: can't connect to remote host (10.0.1.6): Connection refused
On the other host the response is:
bash-4.4# wget test.DNS.name
wget: bad address 'test.DNS.name'
I have no idea where this could come from.
Any idea is welcome.
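One way to narrow this down is to query Docker's embedded DNS server directly from inside a container attached to the network. The sketch below is only an illustration: check_overlay_dns is a made-up helper name, it assumes the image ships nslookup (the busybox one in the bash image should do), and it uses 127.0.0.11, the address Docker's embedded DNS listens on for user-defined networks.

```shell
#!/bin/sh
# check_overlay_dns CONTAINER NAME
# Hypothetical helper: asks Docker's embedded DNS (127.0.0.11) from inside
# CONTAINER whether NAME resolves, and prints a one-line verdict.
check_overlay_dns() {
  container="$1"
  name="$2"
  if docker exec "$container" nslookup "$name" 127.0.0.11 >/dev/null 2>&1; then
    echo "resolved: $name"
  else
    echo "NOT resolved: $name"
    return 1
  fi
}

# Example (run on each host while the test container is up, then compare):
#   check_overlay_dns test.DNS.name test.DNS.name
```

If the name resolves on one host but not on the other, the difference is in the DNS/overlay state of that daemon rather than in the container itself.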
Here is the output of docker info (identical on both machines):
Containers: 3
Running: 2
Paused: 0
Stopped: 1
Images: 156
Server Version: 18.03.0-ce
Storage Driver: devicemapper
Pool Name: vg01-docker--pool
Pool Blocksize: 524.3kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data Space Used: 22.89GB
Data Space Total: 42.95GB
Data Space Available: 20.06GB
Metadata Space Used: 6.922MB
Metadata Space Total: 5.583GB
Metadata Space Available: 5.576GB
Thin Pool Minimum Free Space: 4.295GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: m99xj3rhtk8ymabv1k78v4pur
Is Manager: true
ClusterID: of2g9dy3xbwj2jye6potfq5dq
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.16.21.145
Manager Addresses:
10.16.21.145:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfd04396dc68220d1cecbe686a6cc3aa5ce3667c
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-693.21.1.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.4 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 62.74GiB
Name: gtunxlvd04346
ID: QPYZ:ZYLJ:DNXO:VCRC:Y2CG:LOEN:WSKN:52X5:JPNX:FLSB:GPIR:FY3U
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Thanks, Constantin, for your suggestion. We restarted the host with the problem and now it is working as expected.
So this problem can be solved by restarting the host.
It would still be interesting to find the cause, but I think there is no relation between researching this issue and the chance of it happening again.
Related
My Setup: I have a single machine docker swarm "cluster".
Simply said, I have a stack deployment that is composed of two services A and B. Both services are connected (through an external overlay network) to another stack running a traefik proxy to expose those services to the public.
I can reach both services via their traefik routing from my browser.
What doesn't work though:
I cannot reach service A from within service B using A's public domain (via its traefik routing). I always get a connection timeout when attempting an HTTP call.
Is this regular, expected behavior that can be fixed with some option, or is my setup somehow broken? I read that endpoint_mode: dnsrr might help in some situations of this kind, but it didn't make a difference for me. I tried it on both services as well as on the traefik service.
I don't want to overwhelm you with all the configuration details of my machine and swarm deployments right here as that might be overkill if I just made a configuration mistake that's obvious from my problem description.
For the ambitious reader, here are some further details:
$ docker info
Client:
Debug Mode: false
Server:
Containers: 125
Running: 59
Paused: 0
Stopped: 66
Images: 328
Server Version: 19.03.8
Storage Driver: overlay2
Backing Filesystem: <unknown>
Supports d_type: true
Native Overlay Diff: true
Logging Driver: loki
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: juwtdqk60ufj3rityctog7ev0
Is Manager: true
ClusterID: 3mte2zcw4nfc1jq17dzwvtoi3
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 178.254.21.80
Manager Addresses:
178.254.21.80:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-96-generic
Operating System: Ubuntu 18.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.41GiB
Name: rv1324
ID: L7JD:POVR:EPMY:JGMF:3BFX:DQJA:B5KK:O3PG:YH44:TNLJ:YD3I:GROZ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
The swarm stack I'm trying to deploy can be found here
https://github.com/skuzzle/cmp/blob/0f8004b41f1a486fee7b6705c4bcbc39a2414412/swarm-stack/feature.yml
The connectivity problem is between the cmpauthorization and the cpmfrontend service. In order to finish an OAuth2 authorization process, the latter service needs to send a POST request to the authorization service's public domain.
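To narrow it down, it may help to compare two requests from inside a cpmfrontend container: one to the authorization service's internal name and one to its public domain. A rough sketch follows; the helper names, the stack prefix, the port and the domain in the usage comment are placeholders rather than values from the stack file, and it assumes curl is available in the image.

```shell
#!/bin/sh
# first_task_container SERVICE
# Prints the ID of one local container whose name contains SERVICE
# (swarm task containers are named <stack>_<service>.<slot>.<task-id>).
first_task_container() {
  docker ps --filter "name=$1" --format '{{.ID}}' | head -n 1
}

# probe_from CONTAINER URL
# Runs curl inside CONTAINER with a 5 second limit and reports whether the
# request completed.
probe_from() {
  container="$1"
  url="$2"
  if docker exec "$container" curl -sS -o /dev/null --max-time 5 "$url"; then
    echo "ok   $url"
  else
    echo "FAIL $url"
  fi
}

# Example (all names below are placeholders):
#   c="$(first_task_container mystack_cpmfrontend)"
#   probe_from "$c" http://cmpauthorization:8080/    # internal service name
#   probe_from "$c" https://auth.example.com/        # public domain via traefik
```

If the internal request works and only the public one times out, the traffic is probably getting lost on the way out of and back into the host, not inside the overlay network.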
I would like to test some Docker Swarm features, and for that I have a Windows PC and a MacBook Pro, both in my private network.
I installed Docker for Windows (Windows 10 Pro, using Linux containers) and Docker for Mac.
Then I started both of them and configured my router to allow the ports they need for TCP and UDP:
Port 2377 TCP for node communication
Port 7946 TCP/UDP for container network discovery.
Port 4789 UDP for the container ingress network.
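A quick way to check the ports listed above from the other machine is nc. The sketch below assumes a netcat build that supports -z, -u and -w; check_swarm_ports is a made-up name. A UDP "open" only means that no ICMP error came back, so treat those results as a hint rather than proof.

```shell
#!/bin/sh
# check_swarm_ports HOST
# Probes 2377/tcp, 7946/tcp, 7946/udp and 4789/udp on HOST and prints one
# line per probe.
check_swarm_ports() {
  host="$1"
  for probe in tcp/2377 tcp/7946 udp/7946 udp/4789; do
    proto="${probe%/*}"
    port="${probe#*/}"
    case "$proto" in
      udp) nc_args="-z -u -w 2" ;;
      *)   nc_args="-z -w 2" ;;
    esac
    # Word splitting of $nc_args is intended here.
    if nc $nc_args "$host" "$port" >/dev/null 2>&1; then
      echo "$proto/$port open"
    else
      echo "$proto/$port unreachable"
    fi
  done
}

# Example: check_swarm_ports 192.168.x.x   (the manager's LAN address)
```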
I also deactivated the firewall on both my PC and my Mac.
Then I ran docker swarm init on my MacBook, which gave me a join token.
On my Windows PC I entered that join command in the console and... it failed!
I got an error message that ends with "... connection refused".
So, can you give me some advice or links on how to properly connect two local machines via Docker Swarm? I would love to test it and use it for local development and testing of my apps. Thanks!
Docker Info from Mac
$ docker info
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 185
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: v3fhiinezmdbbn98l0s6bgqzo
Is Manager: true
ClusterID: o9mcdlgtq37t5r86ganupstez
Managers: 1
Nodes: 1
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 192.168.65.3
Manager Addresses:
192.168.65.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 4.095GiB
Name: linuxkit-025000000001
ID: 2D57:Q3QP:6UZ2:S6JV:WXLG:JN4H:TR6G:V3C3:P6ZP:2ENA:L7ES:OIJD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Docker Info from Windows
$ docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.09.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.125-linuxkit
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 3
Total Memory: 7.768GiB
Name: linuxkit-00155d674805
ID: S7LD:PA6I:QGZR:YFQH:BR62:JS5C:DZLS:C6O3:RZUL:7ZXE:PRI6:HPRD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 22
Goroutines: 46
System Time: 2019-04-11T13:28:11.3484452Z
EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
Docker swarm join command output
$ docker swarm join --token SWMTKN-1-5rp7ownwv3ob27vl52ogo8z6d3mbxasdfasdfsadfkrf8hqjk1b5-bi2p5u7i7blk5wepw389sba0w 192.168.x.x:2377
Error response from daemon: rpc error: code = Unavailable desc = all
SubConns are in TransientFailure, latest connection error:
connection error:
desc = "transport: Error while dialing dial tcp 192.168.x.x:2377:
connect: connection refused"
The problem is that neither Docker Desktop for Mac nor Docker Desktop for Windows (with Linux containers) runs Docker natively. Both use a virtual machine running Linux, and the actual Docker engine runs inside that VM.
If I'm correct, 192.168.65.3 is not the IP of your Mac but the IP of the VM within a virtual network on the Mac.
Based on this article https://docs.docker.com/docker-for-mac/docker-toolbox/ and this sentence: "Also note that Docker Desktop for Mac can’t route traffic to containers, so you can’t directly access an exposed port on a running container from the hosting machine." connecting Mac and Windows with Linux containers might not be easy.
For testing, I'd recommend either getting some cloud VMs or, on Windows, using the docker-machine command to spawn multiple Linux VMs on which you can set up a local swarm to test the features you wish.
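A minimal sketch of the docker-machine route, assuming the VirtualBox driver is installed (on Windows with Hyper-V a different driver would be needed). The function name and machine names are arbitrary and only serve as an illustration.

```shell
#!/bin/sh
# make_local_swarm MANAGER WORKER...
# Creates one VM per name with docker-machine, initialises a swarm on the
# first VM and joins the remaining ones as workers.
make_local_swarm() {
  manager="$1"
  shift
  docker-machine create --driver virtualbox "$manager" || return 1
  manager_ip="$(docker-machine ip "$manager")"
  docker-machine ssh "$manager" \
    "docker swarm init --advertise-addr $manager_ip" || return 1
  # -q prints only the token instead of the full join command
  token="$(docker-machine ssh "$manager" "docker swarm join-token -q worker")"
  for worker in "$@"; do
    docker-machine create --driver virtualbox "$worker" || return 1
    docker-machine ssh "$worker" \
      "docker swarm join --token $token $manager_ip:2377" || return 1
  done
  docker-machine ssh "$manager" "docker node ls"
}

# Example: make_local_swarm manager1 worker1 worker2
```

Because all VMs sit on the same virtual network, the join does not depend on the Docker Desktop VM address that caused the "connection refused" above.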
Steps to reproduce the issue:
1) I create a 5-node Docker cluster on AWS instances
2) Terminate the leader (primary manager) from the AWS console
3) docker node demote (primary-node) from the terminal
4) docker node rm (primary-node)
5) docker node promote (worker-node) / docker node update (worker-node) --role manager
After step 5, the status of the worker node is Down while its availability is Active.
The status of the worker node stays Down.
I tried
docker node update --availability active (worker-node)
But it doesn't help.
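One thing that may be worth checking in a scenario like this is whether enough managers are left. Swarm managers use Raft, which needs a majority of the managers to be reachable before the cluster state can change. A small arithmetic sketch of that rule (the function names are made up here):

```shell
#!/bin/sh
# Majority needed among N managers: floor(N/2) + 1
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of managers that may fail while a majority survives: floor((N-1)/2)
raft_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
  echo "managers=$n quorum=$(raft_quorum "$n") tolerates=$(raft_fault_tolerance "$n")"
done
```

With 2 managers the quorum is 2, so losing either one leaves no majority; with 3 managers one may fail.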
Output of docker version:
Docker version 18.03.1-ce, build 9ee9f40
Output of docker info:
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 18
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: pw2cgzi62tr1g5yn42gdue9sd
Is Manager: true
ClusterID: 4l9ngpfnlqov063np7efi9idw
Managers: 2
Nodes: 4
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.31.4.129
Manager Addresses:
172.31.4.129:2377
172.31.6.143:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-1060-aws
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 990.7MiB
Name: ip-172-31-4-129
ID: CBXR:P6YF:ICCQ:R7LT:XM4M:BBMD:N4FN:ZPRI:3VOC:FO54:Y7I6:6LHK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
The fix is as follows:
1) When the primary leader leaves the cluster, the swarm will choose a new leader.
2) Generate a manager token on the new primary leader.
3) SSH into the worker, run docker swarm leave, then docker swarm join with the manager token.
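The steps above can be sketched roughly as follows. This assumes SSH access to both machines; the function name, the SSH targets and the address in the example are placeholders.

```shell
#!/bin/sh
# rejoin_with_manager_token LEADER NODE LEADER_ADDR
#   LEADER       SSH target of the new leader (placeholder)
#   NODE         SSH target of the node that shows as Down (placeholder)
#   LEADER_ADDR  address of the new leader, e.g. 10.0.0.5:2377 (placeholder)
rejoin_with_manager_token() {
  leader="$1"
  node="$2"
  addr="$3"
  # Step 2: ask the new leader for a manager join token (-q prints only the token)
  token="$(ssh "$leader" docker swarm join-token -q manager)" || return 1
  # Step 3: leave the old swarm state, then join again as a manager.
  # A node that is currently a manager would need "docker swarm leave --force".
  ssh "$node" docker swarm leave || return 1
  ssh "$node" docker swarm join --token "$token" "$addr"
}

# Example: rejoin_with_manager_token ubuntu@new-leader ubuntu@worker-node 10.0.0.5:2377
```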
I have Docker set up with 1 manager and 1 worker. Both nodes are separate machines within the same network.
I initialized Docker Swarm on the manager node and connected another PC to the swarm using the docker swarm join-token worker command generated by the manager node.
docker info Manager Node
Containers: 16
Running: 5
Paused: 0
Stopped: 11
Images: 303
Server Version: 18.03.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 572
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: v5out80i284bavbkhrny82non
Is Manager: true
ClusterID: 2h6jhemo4ch03zzk9dm8hkn97
Managers: 1
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.0.0.1
Manager Addresses:
10.0.0.1:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.13.0-37-generic
Operating System: KDE neon Developer Edition
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.691GiB
Name: wannamit
ID: KR2B:Q2E6:GAPR:HY6X:PYZQ:KUMU:DXCE:7YKI:E5MM:RRHO:BBWG:GM6S
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: amithp
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
docker info Worker Node
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 4
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: vvusfvjpenc9ymsotj4bcs25c
Is Manager: false
Node Address: 192.168.86.38
Manager Addresses:
10.0.0.1:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-22-generic
Operating System: Ubuntu 18.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.093GiB
Name: ubuntu
ID: SKCQ:JZGO:VUHX:HZN5:JD4H:4KPM:5RXK:DWG2:A7E6:WU4T:VQ5N:YHQB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
A simple stack is deployed with 5 replicas.
version: "3.2"
services:
  webapp:
    image: amithp/pyapp:latest
    deploy:
      replicas: 5
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
    ports:
      - "28888:28888"
    networks:
      - frontend-network
  redis:
    image: redis
    command: redis-server --appendonly yes
    deploy:
      restart_policy:
        condition: on-failure
    networks:
      - frontend-network
networks:
  frontend-network:
    external:
      name: frontend-network
The deployment succeeds. Usually 2 replicas are deployed on the manager node, and the other 3 plus Redis on the worker node. The Docker image is a Flask app that shows the total view count and the IP of the container the response was served from.
Hello world!
Hostname: 351d83b03555
HostIP: 10.0.0.28
Visits: cannot connect to Redis, counter disabled
Now if I visit localhost:28888 from the manager node, the app cannot connect to Redis and only alternates between 2 different IPs. I cross-checked those IPs and they belong to containers on the manager node. I also identified the IPs of the containers on the worker node and tried to ping them from a container on the manager node; the host cannot be reached (no ping reply).
Am I doing something wrong or did I miss something?
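To see how the replicas are spread over the two nodes, the running tasks can be counted per node. A minimal sketch, assuming the service is called mystack_webapp (the real name depends on the stack name used at deploy time); tasks_per_node is a made-up helper name.

```shell
#!/bin/sh
# tasks_per_node SERVICE
# Counts the tasks of SERVICE whose desired state is "running", grouped by
# the node they are scheduled on. Run it on a manager node.
tasks_per_node() {
  docker service ps --filter desired-state=running \
    --format '{{.Node}}' "$1" | sort | uniq -c | awk '{print $2 ": " $1}'
}

# Example: tasks_per_node mystack_webapp
```

If the count shows tasks on both nodes but requests only ever reach the manager's containers, the traffic between the nodes is the first thing to look at.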
Command used to run the service:
docker service create -d \
-p 8080:8080 \
--mount type=bind,source=/etc/localtime,target=/etc/localtime \
--mount type=bind,source=/etc/timezone,target=/etc/timezone \
--mount type=bind,source=/home/test/docker/manager,target=/root \
--network test-network \
--workdir /root \
--name test-manager \
--replicas 2 \
--limit-cpu 2 \
--limit-memory 4G \
java:8 java -Dspring.profiles.active=$PROFILE -jar -Xms512m -Xmx4096m /root/target/test-manager.jar
After the service started, I tested it with curl 192.168.2.48:8080/info; 50% of the requests fail. I then entered the containers with docker exec -it xxx bash and ran curl 10.0.1.6:8080/info and curl 10.0.1.7:8080/info; both returned fine.
But if I restart the service several times, sometimes all requests succeed.
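To put a number on the failure rate, the request can be repeated in a loop. A small sketch; count_ok is a made-up helper, and the 3 second limit is an arbitrary choice so that hanging requests count as failures.

```shell
#!/bin/sh
# count_ok URL N
# Sends N requests to URL, each limited to 3 seconds, and prints how many
# of them succeeded.
count_ok() {
  url="$1"
  n="$2"
  ok=0
  i=0
  while [ "$i" -lt "$n" ]; do
    # -f makes curl fail on HTTP errors, -s silences the progress output
    if curl -sf -o /dev/null --max-time 3 "$url"; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "$ok/$n succeeded"
}

# Example: count_ok http://192.168.2.48:8080/info 20
```

With 2 replicas, a result close to half would be consistent with one of the two tasks being unreachable through the published port.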
Network check
nc -vuz 192.168.2.48 4789
nc -vz 192.168.2.48 2377
nc -vuz 192.168.2.48 7946
nc -vz 192.168.2.48 7946
All succeeded.
Results of docker info:
Containers: 4
Running: 3
Paused: 0
Stopped: 1
Images: 25
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 102
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: tdc32kn0n6bwcz32ljvprcmq0
Is Manager: true
ClusterID: hdyakushxu1c6rsk2cml7b0l3
Managers: 2
Nodes: 2
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Root Rotation In Progress: false
Node Address: 192.168.2.47
Manager Addresses:
192.168.2.47:2377
192.168.2.48:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-92-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.78GiB
Name: ubuntu-qgsp01
ID: RP4U:E3PW:AU5R:BLD2:2QDL:DA25:GY2P:YV67:IR2F:GEBZ:XVX3:XC72
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://20agqwyc.mirror.aliyuncs.com/
Live Restore Enabled: false
I suspect the problem is caused by the network load balancing or the VIP: one node cannot be reached, so the request hangs. But if I ping the other container from within a container, the network works. I am puzzled.
This problem has been bothering me for a long time; I hope someone can help me.