I have successfully installed rootless Docker and am now trying to use Docker Swarm with it. I'm running four GCP instances. I followed the steps below:
on Node 1
docker swarm init --advertise-addr 34.93.X.X
docker swarm join-token manager gives
docker swarm join --token SWMTKN-1-21vhv6gawb9mpur1v379sq52ia2jq4n0boqes0wos10o7m833l-5935hxvsht0x21o0qjpeqykae 34.93.X.X:2377
on Node 2
docker swarm join --token SWMTKN-1-2xtpxpc18p8qf3e4kb3dvsjr4a4ae786entmwuekh6w5bbfmpz-e5rhoya81d1pajet80wx34mcv 34.93.X.X:2377 --advertise-addr 34.93.X.X
gives the error below:
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 34.93.X.X:2377: connect: connection refused"
NOTE: With rootful Docker I am able to join the nodes.
It's not possible today. It's not Swarm's fault, it's the design of Linux. Swarm (by default) uses overlay networking that creates virtual IPs, VXLAN routes, and more in iptables, and rootless (anything) can't control Linux networking to that level as far as I know.
See https://docs.docker.com/engine/security/rootless/#known-limitations
If your goal is just to lock down Docker, I think it's much more effective to use things like User Namespaces (dockerd runs as root, but containers don't run as root), change the default user running in containers, and take the other steps I list here: https://github.com/BretFisher/ama/issues/17
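As a rough sketch of the User Namespaces suggestion: the relevant daemon option is userns-remap, normally set in the daemon configuration file (usually /etc/docker/daemon.json on Linux). The snippet below only prints a minimal config so it can be reviewed before being merged into an existing file. The function name print_userns_config is made up for illustration; the value "default" tells dockerd to create and use the dockremap user for the mapping.

```shell
#!/usr/bin/env bash
# Sketch only: print a minimal daemon.json that enables user-namespace remapping.
# "default" makes dockerd create and use the "dockremap" user for the mapping.
# print_userns_config is a hypothetical helper name, not a Docker command.
print_userns_config() {
  cat <<'EOF'
{
  "userns-remap": "default"
}
EOF
}

print_userns_config
```

The daemon has to be restarted after the file changes, and images and containers created before remapping was enabled will not be visible while it is on.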
Related
My friend and I are trying to connect our Docker daemons using Docker Swarm. We are both using Windows and we are NOT on the same network. According to the Docker docs, each Docker host must have the following ports open:
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
We have both added inbound and outbound firewall rules for the given ports. However, we keep getting the same two errors when trying to join with the token created by the manager node, using the docker swarm join --token command:
1. error response from daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.65.3:2377: connect: connection refused"
2. Timeout error
Also, if either of us runs docker swarm init, it shows the IP address 192.168.65.3, which isn't part of any network we're connected to. What does it mean?
The Docker overlay tutorial also states that, in order to connect to the manager node, the worker node must specify the IP address of the manager:
docker swarm join --token <TOKEN> \
  --advertise-addr IP-ADDRESS-OF-WORKER-1 \
  IP-ADDRESS-OF-MANAGER:2377
Does it mean that in our case we have to use the public IP address of the manager node after enabling port forwarding?
Potential network issues aside, here is your problem:
We both are using Windows OS
I have seen this issue in other threads when attempting to use Windows nodes in a multi-node swarm. Here are some important pieces of information from the Docker overlay networks documentation:
Before you can create an overlay network, you need to either initialize your Docker daemon as a swarm manager using docker swarm init or join it to an existing swarm using docker swarm join. Either of these creates the default ingress overlay network which is used by swarm services by default.
Overlay network encryption is not supported on Windows. If a Windows node attempts to connect to an encrypted overlay network, no error is detected but the node cannot communicate.
By default, Docker encrypts all swarm service management traffic. As far as I know, disabling this encryption is not possible. Do not confuse this with the --opt encrypted option, as that involves encrypting application data, not swarm management traffic.
For a single-node swarm, using Windows is just fine. For a multi-node swarm, which would be deployed using Docker stack, I highly recommend using Linux for all worker and manager nodes.
A while ago I was using Linux as a manager node and Windows as a worker node. I noticed that joining the swarm would only work if the Linux machine was the swarm manager; if the Windows machine was the manager, joining the swarm would not work. After the Windows machine joined the swarm, container-to-container communication over a user-defined overlay network would not work no matter what. Replacing the Windows machine with a Linux machine fixed all issues.
I am seeing some strange behavior with Docker since the last update.
Can you help me with this?
This is not my first package upgrade, and the issue has been reproduced on a fresh stack.
Upgraded from 18.09.9 to 19.03.12
OS : Ubuntu 16.04 Server
Docker package
docker-ce=5:18.09.9~3-0~ubuntu-bionic
docker-ce-cli=5:19.03.11~3-0~ubuntu-bionic
containerd.io=1.2.13-2
Details
A problem was identified with version 19.03.12 of Docker.
The managers have been upgraded to version 19.03.12.
When adding a manager to a swarm that has an active leader, an error message appears.
The various known solutions have been tried.
Case
After running the docker swarm join --token command on the non-leader managers, the leader manager becomes unavailable within a few minutes.
-> I am forced to rerun docker swarm init --force-new-cluster --advertise-addr xx.xx.xx.xx --listen-addr xx.xx.xx.xx:2377 to get the leader operational again.
The leader sees the worker nodes running version 19.03.12. There is no problem with the workers.
Restarting the Docker service leads to the same result.
Error Message
The swarm does not have a leader
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
docker msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn"
References applied
https://github.com/moby/moby/issues/34384
Docker Node is Down after service restart
https://cynici.wordpress.com/2018/05/31/docker-info-rpc-error-on-manager-node/
https://gitmemory.com/issue/docker/swarmkit/2670/481951641
https://forums.docker.com/t/cant-add-third-swarm-manager-or-create-overlay-network-the-swarm-does-not-have-a-leader/50849
https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker
I am trying to learn Docker and Swarm. I created a swarm with 3 nodes and completed an example using VirtualBox and docker-machine. Once I restarted my machine, all nodes were shown as stopped. I started all nodes using:
docker-machine start node1 node2 node3
All nodes started, but I am still unable to list nodes, even on the master node, and I get the error below:
docker@node1:~$ docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
Also the docker state on node1 (master) is pending.
Swarm: pending
NodeID: c93hv5pixlfiei7q9qneuiuen
Error: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
I get this error every time I restart my machine. This forces me to set up everything from scratch each time.
Is there any way I can avoid setting up the cluster again and again?
Thanks
You must start the Docker service somewhere in your boot configuration, so that it comes up again after a reboot.
Preventing
Demote the node you are going to "switch off" or remove from the swarm:
# find node id
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
o1iz67ehuenfzbyg2gjxayaee hostA Ready Active Reachable 20.10.6
fic857lrupfemxqie5rvq63yt * hostB Ready Active Leader 20.10.6
$ docker node demote o1iz67ehuenfzbyg2gjxayaee
Manager o1iz67ehuenfzbyg2gjxayaee demoted in the swarm.
# from now on, the node can safely leave the swarm
$ docker swarm leave --force
Reacting
Restart if there are no healthy nodes.
Stop and then start the Docker engine (do NOT use restart), then init the swarm again. Validate the firewall ruleset afterwards, as Docker overwrites it.
$ systemctl stop docker
$ systemctl start docker
If there is a healthy manager node, drain the node that left.
Reference https://cynici.wordpress.com/2018/05/31/docker-info-rpc-error-on-manager-node/
Please check the firewall on Linux:
If you want to promote a node to manager, check that port 2377 is accepting requests on that node. Only then can the node work as a manager. Otherwise you will get an error like the one below:
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
Solution: open port 2377 in the firewall.
firewall-cmd --zone=public --add-port=2377/tcp --permanent
success
firewall-cmd --reload
success
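To confirm from another node that port 2377 really is accepting connections after the firewall change, a quick TCP probe can help. This is only a sketch: check_tcp is a hypothetical helper name, it assumes bash (for the /dev/tcp pseudo-device) and the coreutils timeout command, and it covers TCP only, so it says nothing about the UDP ports swarm also uses.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a TCP port accepts connections.
# check_tcp is a hypothetical helper; it relies on bash's /dev/tcp
# pseudo-device and the coreutils `timeout` command.
check_tcp() {
  local host="$1" port="$2"
  # Try to open a TCP connection; give up after 3 seconds.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example (MANAGER_IP is a placeholder for the manager's address):
# check_tcp MANAGER_IP 2377
```

A "closed" result from a worker while the manager is running suggests the firewall, or something between the two hosts, is still blocking the port.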
I am trying to set up Docker Swarm with an overlay network. I have some hosts on AWS, while others are laptops running Ubuntu (same as on AWS). Every node has a static public IP. I have created an overlay network as follows:
docker network create --driver=overlay --attachable test-net
I have created a swarm on one of the AWS hosts. Every other node is able to join that swarm.
However, when I run docker run -it --name alpine2 --network test-net alpine on any node that is not on AWS, I get the error: docker: Error response from daemon: attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded.
But if I run the same command on any AWS host, everything works fine. Is there anything more I need to do in terms of networking/ports if some nodes are on AWS while others are not?
I have opened the ports required for swarm networking on all machines.
EDIT: All the nodes are marked as "active" when listing in the manager node.
UPDATE: Solved this issue by opening the respective ports. It now works if all the nodes are Linux-based. But when I try to make a swarm with a Linux (Ubuntu) manager, macOS machines are not able to join the swarm.
Check whether the node is in the drain state:
docker node inspect --format '{{.Spec.Availability}}' node
If so, update its availability:
docker node update --availability active node
Here is the explanation:
Resolution
When a node is in drain state, it is expected behavior that you should
not be able to allocate swarm mode resources such as multi-host
overlay network IP addresses to the node. However, swarm mode does not
currently provide a messaging mechanism between the swarm leader where
IP address management occurs back to the worker node that requested
the IP address. So docker run fails with context deadline exceeded.
Internal engineering issue escalation/292 has been opened to provide a
better error message in a future release of the Docker daemon.
source
Check if the below ports are opened on both machines.
TCP port 2377
TCP and UDP port 7946
UDP port 4789
You may use ufw to allow the ports:
ufw allow 2377/tcp
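The single ufw command above covers only the management port. A sketch of the full set for the ports listed above follows; swarm_ufw_rules is a hypothetical helper name, and it only prints the commands so they can be reviewed before being run as root on each node.

```shell
#!/usr/bin/env bash
# Sketch only: print one ufw rule per swarm port listed above.
# swarm_ufw_rules is a hypothetical helper; it prints commands, it does not run them.
swarm_ufw_rules() {
  local rule
  # 2377/tcp: cluster management; 7946/tcp+udp: node communication;
  # 4789/udp: overlay network traffic.
  for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    echo "ufw allow ${rule}"
  done
}

swarm_ufw_rules
```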
I had a similar issue and managed to fix it by making sure the ENGINE VERSION was the same on all nodes.
sudo docker node ls
Another common cause for this is the Ubuntu Server installer installing Docker via snap, and that package is buggy. Remove the snap package and install using apt. And reconsider Ubuntu. :-/
I am following this tutorial. I ran sudo docker swarm init --advertise-addr <myip> on the 1st Ubuntu machine. Then I took the manager join token and ran it on the 2nd Ubuntu machine, which was able to join as a manager.
But the problem starts when I run docker network create --attachable --driver overlay my-net on the 1st machine. It gives me the following error:
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
If I run the above command to create the network before joining the 2nd node, the network is created successfully and the 2nd node also joins the 1st swarm node. But after that, whenever I do anything on the 1st Ubuntu machine, I get the same error on it.
Both Ubuntu machines are on the same network and can ping each other.
Ubuntu version - 17.1 64 bit
Docker version 18.03.1-ce, build 9ee9f40
Docker-compose version 1.21.2, build a133471
It seems that the tutorial is off: you end up with only two managers, and with two managers the quorum is two, so the swarm has no leader as soon as either manager is unreachable. You can either add a third manager node or simply create a single manager (docker swarm init) and then join a single worker using the command that is printed in the output of docker swarm init. You should SKIP the docker swarm join-token manager step from the tutorial.
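The quorum point can be checked with simple arithmetic. Swarm managers use Raft, which needs a majority of managers online. The sketch below computes that majority and the number of managers that can be lost for a few cluster sizes; quorum and tolerated_failures are hypothetical helper names, not Docker commands.

```shell
#!/usr/bin/env bash
# Sketch only: Raft majority arithmetic for N swarm managers.
# quorum and tolerated_failures are hypothetical helper names.
quorum() {             # managers that must be online: floor(N/2) + 1
  echo $(( $1 / 2 + 1 ))
}
tolerated_failures() { # managers that may be lost: floor((N-1)/2)
  echo $(( ($1 - 1) / 2 ))
}

for n in 1 2 3 5; do
  echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two managers the quorum is 2 and no failure is tolerated, which is no better than one manager. Three managers is the smallest size that survives the loss of one.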
Just change the IP of your Ubuntu machine.
Machine -> Settings -> Network -> set "Attached to" to "Bridged Adapter".
Restart your machine.