Join multiple docker swarm clusters - docker-swarm

There are two docker swarm clusters. Cluster1 has one manager and two workers; cluster2 has one manager and one worker.
How can I join these two docker swarm clusters (cluster1 and cluster2) into one?

No, not directly. But you could remove the nodes from one cluster and then join them to the other. The manager node handles its worker nodes, their encryption, and so on, which cannot be done by the managers of multiple swarms at the same time.
You can always let additional workers and managers join an existing swarm cluster (even when the nodes are on an overlay network). In your case, cluster1 and cluster2 are both running, so the nodes have to leave cluster2 before they can join cluster1; a node (manager or worker) can be part of only one cluster at a time. That means leaving the swarm (and running docker node rm on the cluster2 manager to clean up) and then running docker swarm join with the join token of cluster1, or vice versa.
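A minimal sketch of that migration, assuming the nodes of cluster2 should move into cluster1 (node names and addresses are placeholders):
On each cluster2 worker, leave the old swarm:
docker swarm leave
On the cluster2 manager, leave as well (this dissolves cluster2):
docker swarm leave --force
On the cluster1 manager, print the join command including its token:
docker swarm join-token worker
On each node that just left cluster2, run the printed command, e.g.:
docker swarm join --token <token> <cluster1-manager-ip>:2377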

Related

Change Leader in Docker Swarm

I have a docker swarm environment with three managers. They are named swarm1, swarm2 and swarm3. Due to different circumstances (e.g. network and resources), swarm1 was set as the leader and should stay the leader. However, after a resource upgrade, swarm1 was rebooted. As a result, swarm2 became the leader and swarm1 now has the status Reachable. How can swarm1 be made the leader again?
With swarm managers, it's bad practice to have a "special" node that needs to be the leader at all times. All of your manager nodes should be as identical as possible. But to answer your question: there is no way to manually set the swarm leader. What you can do is docker node demote the current leader (swarm2) and the other manager (swarm3). Once those managers are demoted to workers, swarm1, as the only remaining manager, becomes the leader by default. Then all you have to do is docker node promote swarm2 and swarm3 again.
I'll write out the whole scenario, so that everything about changing the leader and reverting is in one place.
Scenario: the three nodes are swarm1, swarm2 and swarm3. The initial leader (the one that needs the resource upgrade) is swarm1.
Step 1
Make swarm2 a manager so it can take over as leader:
docker node promote swarm2
docker node ls
Make sure that the swarm2 Manager status is Reachable (Not Down)
Step 2
Now demote swarm1 to a worker:
docker node demote swarm1
docker node ls
Ensure Manager status for swarm1 is now blank.
You may now remove the node from the swarm.
From the swarm1 node, issue:
docker swarm leave
Do the necessary upgrades to the swarm1 node and join it back to the swarm.
Step 3
Rejoin the swarm and change the leader back.
From swarm2, issue:
docker swarm join-token manager
From the swarm1 node, issue the printed join command, e.g.:
docker swarm join --token <token> <ip:port>
Step 4
Now re-elect the leader.
From swarm1, issue:
docker node demote swarm2
docker node ls
Now you should see your old configuration with swarm1 as leader.

Docker container deployment to multiple nodes in a swarm cluster

I have a swarm cluster of 6 worker nodes and 3 manager nodes, so 9 nodes in total.
The machines in my swarm cluster are of different sizes.
So there is a requirement that I need to deploy certain services (containers) on particular worker nodes according to the size of the node.
I am aware we can have placement constraints in the docker-compose file and can specify the hostname.
Since I will be running 2 replicas of the service, swarm will create both replicas on the same worker to which I have set the constraint, but I don't want the replicas to be running on the same worker node.
Can we have an option to specify multiple hostnames while setting the placement constraint? Please guide.
You can group nodes together with labels.
docker node update --label-add ssd=fat_machine hostname1
docker node update --label-add ssd=fat_machine hostname2
and reference the label in the docker-compose file:
constraints: [node.labels.ssd == fat_machine]
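For completeness, a sketch of how that could look in the compose file (Compose file version 3.8 assumed; the service and image names are placeholders). Adding max_replicas_per_node keeps the two replicas from landing on the same labeled node:
version: "3.8"
services:
  myservice:
    image: myimage:latest
    deploy:
      replicas: 2
      placement:
        max_replicas_per_node: 1
        constraints:
          - node.labels.ssd == fat_machine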

Do I need to install docker in all my nodes inside the swarm mode?

I know this is a basic question. But I'm new to docker and have this query.
Do I need to install Docker on all the nodes that are part of my swarm?
If so, what are the ways to install Docker on all my nodes in one shot?
Of course you need to install Docker and its dependencies on each node. On one of the manager nodes, you need to initiate the swarm with docker swarm init and then you join the other machines either as manager nodes or worker nodes.
The number of manager nodes depends on your requirement to compensate node losses:
1 manager node: requires 1 healthy node
3 manager nodes: require 2 healthy nodes for quorum, can compensate 1 unhealthy node
5 manager nodes: require 3 healthy nodes for quorum, can compensate 2 unhealthy nodes
7 manager nodes: require 4 healthy nodes for quorum, can compensate 3 unhealthy nodes
more than 7 is not recommended due to overhead
Using an even number does not provide more reliability; quite the opposite. If you have 2 manager nodes, the loss of either one of them renders the cluster headless. If the cluster is not able to reach quorum (which requires the majority of manager nodes to be healthy), it is headless and cannot be controlled: running containers continue to run, but no new containers can be deployed, failed containers won't be redeployed, and so on.
People usually deploy a swarm configuration with a configuration management tool like Ansible, Puppet, Chef or Salt.
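The manual bootstrap that such tools automate is roughly the following sketch (the address 10.0.0.1 is a placeholder, and Docker must already be installed on every machine):
On the first manager, initialize the swarm:
docker swarm init --advertise-addr 10.0.0.1
Still on that manager, print the join commands for each role:
docker swarm join-token worker
docker swarm join-token manager
On every other machine, run the printed command for the desired role, e.g.:
docker swarm join --token <token> 10.0.0.1:2377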

Docker swarm strategy

Can anyone share their experience of changing the docker swarm scheduling strategy? There are three (spread, binpack and random); spread is the default strategy used by docker swarm, and I want to change it to binpack.
The Swarm scheduling strategies you've listed are for the Classic Swarm, which is implemented as a standalone container acting as a reverse proxy to various docker engines. Almost everyone is using the newer Swarm Mode instead, and little development effort happens for Classic Swarm.
The newer Swarm Mode includes a single option for the scheduler that can be tuned. That single option is an HA Spread algorithm. When you have multiple replicas of a single service, it will first seek to spread out those replicas across multiple nodes meeting the required criteria. And among the nodes with the fewest replicas, it will pick the nodes with the fewest other scheduled containers first.
The tuning of this algorithm includes constraints and placement preferences. Constraints allow you to require the service run on nodes with specific labels or platforms. And the placement preferences allow you to spread the workload across different values of a given label, which is useful to ensure all replicas are not running within the same AZ.
None of these configurations in Swarm Mode include a binpacking option. If you wish to reduce the number of nodes in your swarm cluster, then you can update the node state to drain workload from the node. This will gracefully stop all swarm managed containers on that node and migrate them to other nodes. Or you can simply pause new workloads from being scheduled on the node which will gradually remove replicas as services are updated and scheduled on other nodes, but not preemptively stop running replicas on that node. These two options are controlled by docker node update --availability:
$ docker node update --help
Usage: docker node update [OPTIONS] NODE
Update a node
Options:
--availability string   Availability of the node ("active"|"pause"|"drain")
--label-add list        Add or update a node label (key=value)
--label-rm list         Remove a node label if exists
--role string           Role of the node ("worker"|"manager")
For more details on constraints and placement preferences, see: https://docs.docker.com/engine/reference/commandline/service_create/#specify-service-constraints---constraint
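As a rough example of both mechanisms (the service name, image, and labels below are made up), a service can be pinned to nodes carrying a label and spread across the values of another label:
docker service create \
  --name web \
  --replicas 4 \
  --constraint 'node.labels.size == large' \
  --placement-pref 'spread=node.labels.az' \
  nginx:alpine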

How to recover a node in quorum?

I have set up a three-node quorum network using docker. If my network crashes and I only have the information of one of the nodes, can I recover that node using the binary? Also, the blocks in the new network should be in sync with the others. Please guide me on how this is possible.
I assume you're using docker swarm. The clustering facility within the swarm is maintained by the manager nodes. If for any reason the manager leader becomes unavailable and there are not enough remaining managers to reach quorum and elect a new leader, quorum is lost and no manager node is able to control the swarm.
In this kind of situation, it may be necessary to re-initialize the swarm and force the creation of a new cluster by running the following command on the former manager leader once it is brought back online:
# docker swarm init --force-new-cluster
This removes all managers except the manager the command was run from. The good thing is that worker nodes will continue to function normally and the other manager nodes should resume functionality once the swarm has been re-initialized.
Sometimes it might be necessary to remove manager nodes from the swarm and rejoin them.
But note that when a node rejoins the swarm, it must join the swarm via a manager node.
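A sketch of that remove-and-rejoin cycle (node names and addresses are placeholders):
On a healthy manager, demote the broken manager to a worker:
# docker node demote manager2
On manager2 itself, leave the swarm to reset its local state:
# docker swarm leave
Back on the healthy manager, remove the stale entry and print the manager join token:
# docker node rm manager2
# docker swarm join-token manager
On manager2, rejoin as a manager with the printed command, e.g.:
# docker swarm join --token <manager-token> <healthy-manager-ip>:2377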
You can always monitor the health of manager nodes by querying the Docker nodes API in JSON format through the /nodes HTTP endpoint, or from the CLI with docker node inspect:
# docker node inspect manager1 --format "{{ .ManagerStatus.Reachability }}"
# docker node inspect manager1 --format "{{ .Status.State }}"
Also, make it a practice to automate backups of the docker swarm config directory /var/lib/docker/swarm so that you can easily recover from a disaster.
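A bare-bones backup sketch for a manager node (the backup path and filename are just examples; stopping the engine briefly keeps the Raft data consistent):
# systemctl stop docker
# tar -czvf /backup/swarm-backup.tar.gz /var/lib/docker/swarm
# systemctl start docker
To restore on a fresh manager, unpack the archive back to /var/lib/docker/swarm and re-initialize with docker swarm init --force-new-cluster.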
