Docker swarm mode routing mesh not working - docker

When I deploy a service on a swarm using:
docker service create --replicas 1 --publish published=80,target=80 tutum/hello-world
I can access the service only from the IP of the node that is running the container. If I scale the service to run on both nodes, I can access it from both IPs, but each IP is only ever served by the container running on that same node, never by the container on the other node (as confirmed by the hostname printed by the tutum/hello-world image).
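A check like the following shows which container answers, since the tutum/hello-world page prints the container hostname (a sketch; <node-ip> is a placeholder for either node's address, and curl is assumed to be available on the host):
# hit the published port a few times and look at the reported hostname
for i in 1 2 3 4 5; do curl -s http://<node-ip>/ | grep -i hostname; done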
The documentation suggests that load balancing should work when it says:
Three tasks will run on up to three nodes. You don’t need to know which nodes are running the tasks; connecting to port 8080 on any of the 10 nodes will connect you to one of the three nginx tasks.
The swarm was created using docker swarm init and docker swarm join.
Using docker network ls the ingress swarm network is found on both nodes:
NETWORK ID      NAME      DRIVER    SCOPE
cv6hk9wce8bf    ingress   overlay   swarm
Edit:
The manager node runs Linux, the worker node runs macOS. Running modinfo ip_vs on the manager node returns:
filename: /lib/modules/4.4.0-109-generic/kernel/net/netfilter/ipvs/ip_vs.ko
license: GPL
srcversion: D856EAE372F4DAF27045C82
depends: nf_conntrack,libcrc32c
intree: Y
vermagic: 4.4.0-109-generic SMP mod_unload modversions
parm: conn_tab_bits:Set connections' hash size (int)
Running modinfo ip_vs_rr returns:
filename: /lib/modules/4.4.0-109-generic/kernel/net/netfilter/ipvs/ip_vs_rr.ko
license: GPL
srcversion: F21F7372F5E2331EF5F4F73
depends: ip_vs
intree: Y
vermagic: 4.4.0-109-generic SMP mod_unload modversions
Edit 2:
I tried adding a Linux worker to the swarm, and it worked as advertised, so the problem appears to be related to the macOS machine.
The problem is solved for me; however, I'll leave the question up for future reference.

Ensure that 7946/tcp, 7946/udp, and 4789/udp are open and reachable on all nodes in the cluster BEFORE running docker swarm init.
Not sure why, but if they are not open PRIOR to creating the swarm, the routing mesh will not load balance properly.
https://docs.docker.com/engine/swarm/ingress/
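To confirm the ports are actually reachable between nodes once the daemons are listening (a sketch; <manager-ip> is a placeholder, netcat is assumed to be installed, and the UDP probes are best-effort):
nc -zv <manager-ip> 2377    # cluster management traffic (TCP, manager only)
nc -zv <manager-ip> 7946    # node-to-node communication (TCP)
nc -zvu <manager-ip> 7946   # node-to-node communication (UDP)
nc -zvu <manager-ip> 4789   # overlay network / VXLAN data traffic (UDP)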

This happened to me as well; it was caused by a firewall issue, so I opened the ports on every worker and manager:
sudo firewall-cmd --permanent --add-port=2377/tcp
sudo firewall-cmd --permanent --add-port=7946/tcp
sudo firewall-cmd --permanent --add-port=7946/udp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
sudo reboot
Restart the server if that doesn't work; the Docker service may need to be restarted as well.
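If you would rather not reboot, restarting just the Docker daemon on each node should be enough for it to pick up the new rules (a sketch, assuming a systemd-based host):
sudo systemctl restart docker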

Related

Can't start minikube inside docker network

I'm trying to start minikube on Ubuntu 18.04 inside the nginx proxy manager Docker network, in order to set up some Kubernetes services and manage the domain names and proxy hosts from the nginx proxy manager platform.
So I have the nginxproxymanager_default Docker network, and when I run minikube start --network=nginxproxymanager_default I get
Exiting due to GUEST_PROVISION: Failed to start host: can't create with that IP, address already in use
What might I be doing wrong?
A similar error was reported in kubernetes/minikube issue 12894:
please check whether other services are using that IP address, and try starting minikube again.
Considering the minikube start man page:
--network string
network to run minikube with.
Now it is used by docker/podman and KVM drivers.
If left empty, minikube will create a new network.
Using an existing NGiNX network (as opposed to a docker/podman one) might not be supported.
I have seen NGiNX set up as an ingress, not directly as a "network".
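A minimal sketch of what the flag description implies, under the assumption that the existing nginxproxymanager_default network is simply left out: remove the failed profile, let minikube create its own network, and inspect the result:
minikube delete        # drop the profile that failed to provision
minikube start         # no --network, so minikube creates a new docker network
docker network ls      # the new minikube network appears alongside nginxproxymanager_default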

FirewallD blocks inbound connections to Docker containers on OpenSUSE Leap 15.3

I've got Nginx and a few other containers running under docker-compose.
I can access the web server from the host machine, but not from a remote one on ports 80 and 443. Once I disable firewalld, I can reach the Docker containers, which indicates this is a firewall issue. I've tried every solution people used for CentOS and Fedora, but none of them worked for me. Also, I'm using Docker 20, so this shouldn't be a problem, since firewalld has a docker zone that is supposed to configure everything.
My active zones:
docker
  interfaces: docker0 br-5da83ae671bb br-7918f1f94df9
public
  interfaces: br0 eth0
trusted
  interfaces: tun0
  sources: 10.0.0.0/24
The network I'm running the containers on is br-7918f1f94df9, which has the IPv4 address 172.18.0.1/16.
In general, does anyone have any commands for me to try to see if it gets fixed? Let me know if I need to include anything else.
Epilogue: Maybe I'm just pissed because I've been having this issue for the past two days now, but I think that SUSE is just a terrible distro for user-friendliness because I keep getting issues like this that seemingly have no fix. What's a better distribution in your opinion?
Remove the docker0 interface from being managed by firewalld.
Check the Docker documentation: https://docs.docker.com/network/iptables/
Substitute the appropriate zone and Docker interface:
firewall-cmd --zone=docker --remove-interface=docker0 --permanent
firewall-cmd --reload
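Since the containers in the question sit on the user-defined bridge br-7918f1f94df9 rather than docker0, a commonly used alternative is to put that bridge interface into the trusted zone instead (a sketch, not taken from the linked page; substitute your own bridge name):
firewall-cmd --permanent --zone=trusted --add-interface=br-7918f1f94df9
firewall-cmd --reload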

Error response from daemon: attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded

I am trying to set up Docker swarm with an overlay network. I have some hosts on AWS, while others are laptops running Ubuntu (same as on AWS). Every node has a static public IP. I have created an overlay network with:
docker network create --driver=overlay --attachable test-net
I have created a swarm on one of the AWS hosts, and every other node is able to join it.
However, when I run docker run -it --name alpine2 --network test-net alpine on any node not on AWS, I get the error: docker: Error response from daemon: attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded.
But if I run the same command on any AWS host, everything works fine. Is there anything more I need to do in terms of networking/ports if some nodes are on AWS while others are not?
I have opened the ports required for swarm networking on all machines.
EDIT: All the nodes are marked as "active" when listed on the manager node.
UPDATE: Solved this issue by opening the respective ports. It now works if all the nodes are Linux based. But when the manager is a Linux (Ubuntu) machine, macOS machines are not able to join the swarm.
Check whether the node is in drain state:
docker node inspect --format '{{ .Spec.Availability }}' <node-name>
If it is, set it back to active:
docker node update --availability active <node-name>
Here is the explanation:
Resolution
When a node is in drain state, it is expected behavior that you should
not be able to allocate swarm mode resources such as multi-host
overlay network IP addresses to the node. However, swarm mode does not
currently provide a messaging mechanism between the swarm leader where
IP address management occurs back to the worker node that requested
the IP address. So docker run fails with context deadline exceeded.
Internal engineering issue escalation/292 has been opened to provide a
better error message in a future release of the Docker daemon.
source
Check that the ports below are open on both machines.
TCP port 2377
TCP and UDP port 7946
UDP port 4789
You may use ufw to allow the ports:
ufw allow 2377/tcp
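For completeness, the rest of the list would look roughly like this (a sketch, assuming ufw is the active firewall on every node):
ufw allow 7946/tcp    # node-to-node communication
ufw allow 7946/udp
ufw allow 4789/udp    # overlay network traffic
ufw reload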
I had a similar issue and managed to fix it by making sure the ENGINE VERSION of the nodes was the same:
sudo docker node ls
Another common cause for this is the Ubuntu server installer installing Docker using snap, and that package is buggy. Uninstall the snap package and install Docker using apt. And reconsider Ubuntu. :-/

ELK containers are unable to connect on different nodes when firewall is running

I have an ELK Docker swarm setup running across 4 different hosts. I am able to ping containers that are on a different host, but when I try to run curl commands they do not connect (curl http://elastic:9200). The Logstash and Kibana applications are unable to connect to the Elasticsearch containers (a 3-node ES cluster) that are on different hosts. I have opened all the ports mentioned in the Docker swarm documentation on all hosts (https://docs.docker.com/engine/swarm/swarm-tutorial/#the-ip-address-of-the-manager-machine), but no luck. After stopping the firewall on all hosts, Logstash/Kibana are able to connect to Elasticsearch.
Attempted to resurrect connection to dead ES instance, but got an error Unable to connect to Elasticsearch at http://es-proxy:9200/
Has anyone experienced this issue? Thanks.
I was able to resolve the issue by adding my docker network interface into a trusted zone.
https://success.docker.com/article/firewalld-problems-with-container-to-container-network-communications
Lastly, I added my docker subnet as a trusted source.
firewall-cmd --permanent --zone=trusted --add-source=subnet/range
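For the interface part mentioned above, the equivalent command looks roughly like this (a sketch; docker_gwbridge is the bridge swarm creates for overlay traffic, so substitute whichever interface your containers actually use), followed by a reload:
firewall-cmd --permanent --zone=trusted --add-interface=docker_gwbridge
firewall-cmd --reload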

Swarm node Status down, but node should be Ready

I am trying to run a service on a swarm composed of three Raspberry PIs.
I have one manager and two worker nodes.
The problem is that sometimes the status of the worker nodes is "Down" even though the nodes are correctly switched on and connected to the network.
I just started using Docker so I might be doing something wrong, but everything seems to be correctly set.
How would you avoid that "Down" status?
It can depend on your exact version of Docker, but your issue was seen in this thread.
A possible workaround was to run docker ps, which seems to have helped nodes join the swarm.
In my case, the Docker node had an invalid default route and DNS did not work. I was still able to SSH into the machine by IP address. First I tested:
ping google.com
which did not work. Then I changed the default route:
route -n
route add default gw 10.1.2.3
route del default gw 10.1.2.1   # the offending gateway
And finally I changed the DNS server in /etc/resolv.conf.
Then the node came up automatically.
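The DNS change itself can be as simple as pointing /etc/resolv.conf at a resolver the node can actually reach (a sketch; 8.8.8.8 is just an example nameserver, adjust it for your network):
# /etc/resolv.conf
nameserver 8.8.8.8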
I've had the same issue before. You can fix it by cleaning up /var/lib/docker/swarm/ on the problematic node and then reattaching it to the swarm.
1) On the problem node:
sudo systemctl stop docker
sudo rm -rf /var/lib/docker/swarm
2) On the swarm manager:
docker node rm <problem-node-name>
docker swarm join-token worker   # prints the join command with the current token
3) On the problem node:
sudo systemctl start docker
docker swarm join --token <token> <manager_ip>:2377
In my case, (virtual) network devices had changed. I just adjusted the settings, did docker swarm leave and docker swarm join on each of the problem nodes, and then removed the stale entries from the manager (docker node rm ...). It worked without issues after that.
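Roughly the commands referenced above, in the same order (a sketch; node names, the token, and the manager address are placeholders):
docker swarm leave                                      # on each affected node
docker swarm join --token <token> <manager-ip>:2377    # rejoin from that node
docker node rm <stale-node-entry>                       # on the manager, drop the old Down entry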
