Docker Swarm: joining a node not working

I am trying to join a worker node to a manager on another machine; the worker is a Mac and the manager is a Windows machine. On the Mac worker I get this response:
Timeout was reached before node joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node.
When I ran the join-token command again, I received a response saying:
This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.
When I typed the command in manager side:
docker node ls
it only shows one node, the manager node.
Am I doing something wrong here?

Do you use the same docker version on all hosts?
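A common cause of this timeout is that the swarm ports are blocked between the two machines. Docker's documentation lists TCP 2377 (cluster management), TCP/UDP 7946 (node-to-node communication), and UDP 4789 (overlay network traffic) as required. A troubleshooting sketch, assuming a hypothetical manager address of 192.168.1.10:

```shell
# Check that the manager's cluster-management port is reachable from the worker
nc -vz 192.168.1.10 2377   # hypothetical manager IP

# The failed attempt left this node half-joined, so leave first
docker swarm leave --force

# Rejoin with the token printed by "docker swarm join-token worker" on the manager
docker swarm join --token <worker-token> 192.168.1.10:2377
```

If the port check fails, fix the firewall rules on the Windows manager before retrying the join.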

Related

adding a manager back into swarm cluster

I have a swarm cluster with two nodes: one manager and one worker. My application runs on the worker node, and as a test case I force-removed the manager from the swarm cluster.
My application continues to work, but I would like to know whether it is possible to add the force-removed manager back into the cluster. (I don't remember the join token and didn't copy it anywhere.)
I understand Docker advises an odd number of manager nodes to maintain quorum, but I would like to know whether Docker addresses such scenarios anywhere.
docker swarm init --force-new-cluster --advertise-addr node01:2377
When you run the docker swarm init command with the --force-new-cluster flag, the Docker Engine where you run the command becomes the manager node of a single-node swarm which is capable of managing and running services. The manager has all the previous information about services and tasks, worker nodes are still part of the swarm, and services are still running. You need to add or re-add manager nodes to achieve your previous task distribution and ensure that you have enough managers to maintain high availability and prevent losing the quorum.
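To re-add managers as the passage above describes, there are two routes: join a node with the manager token, or promote an existing worker. A sketch (node names are placeholders):

```shell
# On the surviving manager: print a fresh manager join token
docker swarm join-token manager

# On the machine you want as a new manager: run the printed join command
docker swarm join --token <manager-token> <manager-ip>:2377

# Alternatively, promote an existing worker from any manager node
docker node promote <worker-node-name>
```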
How about creating a new cluster from the previous manager machine and having the worker leave the old cluster to join the new one? That seems like the only possible solution to me: since your cluster now has no managers and the previous manager is no longer in any cluster, you can simply create a new cluster and have the workers join it.
# On the former manager machine
$ docker swarm init --advertise-addr <manager ip address>
# copy the "docker swarm join" command this prints for workers

# On the worker machine
$ docker swarm leave
# paste and execute the copied join command here

Remove terminated instances (managers) from swarm and recover swarm state

I have a Docker swarm cluster with managers running on 6 AWS instances. During some testing, we accidentally terminated 3 instances that were running managers. Now the swarm is broken and generates errors like:
Error: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online
I tried removing the terminated managers through Docker commands, but whatever command I run, such as docker node ls, gives me the same error as above. I also tried adding a new node; joining it to the swarm produces the same error.
I can see all the terminated instances' IPs when I issue docker info on one of the surviving managers, but I can't do anything. How can I recover from this state?
Node Address: 10.80.8.195
Manager Addresses:
10.80.7.104:2377
10.80.7.213:2377
10.80.7.226:2377
10.80.7.91:2377
10.80.8.195:2377
10.80.8.219:2377
The clustering facility within the swarm is maintained by the manager nodes.
In your case, you lost the cluster quorum by deleting half of the managers: with three of six managers gone, more than half are no longer online, so the remaining nodes cannot elect a new leader and no manager can control the swarm.
The only way to recover your cluster is to re-initialize it, which forces the creation of a new cluster.
On a manager node, run this command:
docker swarm init --force-new-cluster
On the other surviving manager nodes, the documented procedure is to leave the broken swarm with docker swarm leave --force and then rejoin using the new manager join token; they do not reattach automatically.
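Putting the recovery steps together, a sketch (addresses and tokens are placeholders):

```shell
# On one surviving manager: force a new single-manager cluster
docker swarm init --force-new-cluster --advertise-addr <this-node-ip>:2377

# Still on that node: print a fresh manager join token
docker swarm join-token manager

# On each other surviving manager: leave the broken swarm, then rejoin
docker swarm leave --force
docker swarm join --token <manager-token> <this-node-ip>:2377
```

Once quorum is restored, docker node rm can clean up the entries for the terminated instances.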

Docker swarm - flow of request in case of many managers

As we know, in Docker swarm mode we can have more than one manager. Suppose we have 2 nodes and 2 managers (so each node is both a manager and a worker).
Now let a client (using the CLI) execute the following two separate scenarios:
1. docker service create some_service
2. docker service update --force some_service
where the client is launched on one of the swarm nodes.
Where will the above requests be sent? Only to the leader, or to each worker node? How does Docker deal with simultaneous requests?
I assume you're talking about the docker cli talking to the manager api.
The docker cli on a node will default to connecting to localhost. Assuming you're on a manager, you can see which node your cli is talking to with docker node ls.
The * next to a node name indicates that's the one you're talking to.
From there if that node isn't the Leader, it will relay the commands to the Leader node and wait on a response to return to your cli. This all means:
Just ensure you're running the docker cli on a manager node or your cli is configured to talk to one.
It doesn't matter which manager, as they will all relay your command to the current Leader.
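If your shell is on a worker (or an off-cluster machine), one way to make sure your commands reach a manager is an SSH-based Docker context, available in recent Docker versions. A sketch with hypothetical host names:

```shell
# Create a context that tunnels the Docker API over SSH to a manager node
docker context create swarm-mgr --docker "host=ssh://user@manager1"

# Switch to it; subsequent commands run against that manager
docker context use swarm-mgr
docker service ls
```

Setting DOCKER_HOST=ssh://user@manager1 for a single shell session achieves the same effect without a named context.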

In Docker swarm mode, is there a way to get managers' information from workers?

In Docker swarm mode I can run docker node ls to list swarm nodes, but it does not work on worker nodes. I need a similar function. I know worker nodes do not have a strongly consistent view of the cluster, but there should be a way to get the current leader or a reachable manager.
So is there a way to get the current leader/manager from a worker node in Docker swarm mode 1.12.1?
You can get manager addresses by running docker info from a worker.
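As the output quoted in the question shows, docker info on a worker prints the managers under a "Manager Addresses:" heading. A minimal parsing sketch in pure Python, operating on captured text rather than calling Docker (the sample mirrors the output shown above):

```python
def parse_manager_addresses(info_text: str) -> list[str]:
    """Extract the addresses listed under "Manager Addresses:" in `docker info` output."""
    addresses = []
    in_section = False
    for line in info_text.splitlines():
        stripped = line.strip()
        if stripped == "Manager Addresses:":
            in_section = True
            continue
        if in_section:
            # Address lines look like "10.80.7.104:2377"; the section ends at
            # the first line that is not shaped like host:port.
            host, sep, port = stripped.rpartition(":")
            if sep and host and port.isdigit():
                addresses.append(stripped)
            else:
                break
    return addresses

sample = """\
Swarm: active
 Node Address: 10.80.8.195
 Manager Addresses:
  10.80.7.104:2377
  10.80.7.213:2377
Server Version: 20.10.9
"""

print(parse_manager_addresses(sample))  # → ['10.80.7.104:2377', '10.80.7.213:2377']
```

Recent Docker versions can also format this directly with a Go template, e.g. docker info --format '{{range .Swarm.RemoteManagers}}{{.Addr}} {{end}}'; verify that the .Swarm.RemoteManagers field exists on your Engine version.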
The docs and the error message from a worker node mention that you have to be on a manager node to execute swarm commands or view cluster state:
Error message from a worker node: "This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager."
After further thought:
One way you could crack this nut is to use an external key/value store like etcd or any other key/value store that Swarm supports and store the elected node there so that it can be queried by all the nodes. You can see examples of that in the Shipyard Docker management / UI project: http://shipyard-project.com/
Another simple way would be to run a redis service on the cluster and another service to announce the elected leader. This announcement service would have a constraint to only run on the manager node(s): --constraint node.role == manager
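That constraint would be applied when creating the announcement service; a sketch with a hypothetical service and image name:

```shell
# Hypothetical "leader announcer" service pinned to manager nodes only
docker service create \
  --name leader-announcer \
  --constraint 'node.role == manager' \
  --replicas 1 \
  myorg/leader-announcer:latest   # hypothetical image
```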

How to remove node from swarm?

I added three nodes to a swarm cluster in static file mode. I want to remove host1 from the cluster, but I can't find a docker swarm remove command:
Usage: swarm [OPTIONS] COMMAND [arg...]
Commands:
create, c Create a cluster
list, l List nodes in a cluster
manage, m Manage a docker cluster
join, j join a docker cluster
help, h Shows a list of commands or help for one command
How can I remove the node from the swarm?
Using Docker Version: 1.12.0, docker help offers:
➜ docker help swarm
Usage: docker swarm COMMAND
Manage Docker Swarm
Options:
--help Print usage
Commands:
init Initialize a swarm
join Join a swarm as a node and/or manager
join-token Manage join tokens
update Update the swarm
leave Leave a swarm
Run 'docker swarm COMMAND --help' for more information on a command.
So, next try:
➜ docker swarm leave --help
Usage: docker swarm leave [OPTIONS]
Leave a swarm
Options:
--force Force leave ignoring warnings.
--help Print usage
Using the swarm mode introduced in Docker Engine 1.12, you can directly run docker swarm leave.
The reference to "static file mode" implies the container based standalone swarm that predated the current Swarm Mode that most know as Swarm. These are two completely different "Swarm" products from Docker and are managed with completely different methods.
The other answers here focused on Swarm Mode. With Swarm Mode docker swarm leave on the target node will cause the node to leave the swarm. And when the engine is no longer talking to the manager, docker node rm on an active manager for the specific node will cleanup any lingering references inside the cluster.
With the container based classic swarm, you would recreate the manager container with an updated static list. If you find yourself doing this a lot, the external DB for discovery would make more sense (e.g. consul, etcd, or zookeeper). Given the classic swarm is deprecated and no longer being maintained, I'd suggest using either Swarm Mode or Kubernetes for any new projects.
Try this:
docker node list # to get a list of nodes in the swarm
docker node rm <node-id>
Using the Docker CLI
I work with Docker Swarm clusters, and there are two options for removing a node from a cluster.
It depends on where you want to run the command: on the node you want to remove, or on a manager node other than the node being removed.
The important thing is that the desired node must be drained before being removed to maintain cluster integrity.
First option:
So I think the best thing to do is (following the steps in the official documentation):
SSH into one of the manager nodes;
Optionally list your cluster nodes;
Change the availability of the node you want to remove to drain;
And remove it;
# step 1
ssh user@node1cluster3
# step 2, list the nodes in your cluster
docker node ls
# step 3, drain the node you want to remove
docker node update --availability drain node4cluster3
# step 4, remove the drained node
docker node rm node4cluster3
Second option:
The second option needs two terminal logins, one on a manager node and one on the node you want to remove.
Perform the 3 initial steps described in the first option to drain the desired node.
Afterwards, log in to the node you want to remove and run the docker swarm leave command.
# remove from swarm using leave
docker swarm leave
# OR, if the desired node is a manager, you can use force (be careful*)
docker swarm leave --force
*For information about maintaining a quorum and disaster recovery, refer to the Swarm administration guide.
My environment information
I use Ubuntu 20.04 for nodes inside VMs;
With Docker version 20.10.9;
Swarm: active;
