How to stop and start a whole Docker Swarm cluster

My question is simple: I need to stop all nodes of a Swarm cluster. How do I stop it properly before shutting the nodes down?
And afterwards, how do I restart it properly?
My cluster (10 nodes) runs at least 60 services.

If all services are in the same docker stack, you can stop every service running on the nodes with docker stack rm.
To start them again (restart), run docker stack deploy.
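For example, assuming the stack was originally deployed from a compose file (the stack name mystack and the file name docker-stack.yml are placeholders):

# remove every service in the stack before shutting the nodes down
docker stack rm mystack

# ...shut down the nodes, do the maintenance, boot them back up...

# once the cluster is healthy again, redeploy the same stack
docker stack deploy -c docker-stack.yml mystack

Note that docker stack rm removes the services and their tasks but does not remove named volumes, so persistent data survives the restart.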

Related

Delay Docker Swarm Replicas Reschedule when Node Restart

I currently have a 3 node swarm mode cluster: 1 manager and 2 workers. I have created a service with 20 replicas. When running docker service ps <service>, I see all replicas have been deployed evenly across the 3 nodes. I believe the default swarm placement strategy is spread rather than binpack. That's all good.

The problem is when I restart one of the workers after some OS maintenance. The node takes a while to reboot, but during this time I do not want the service's tasks rescheduled onto the other 2 nodes, because I know the restarted node will soon come back online. Is there a way to delay swarm from rescheduling replicas after a node reboot or failure? I want to give it more time before confirming the node has really failed, maybe 5 minutes or so.
Docker version 20.10.7
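For reference, the setup described can be reproduced with something like this (the service name web and the image are placeholders):

# spread 20 replicas across the 3 nodes (spread is the default strategy)
docker service create --name web --replicas 20 nginx:alpine

# verify which node each replica landed on
docker service ps web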

Docker Swarm bringing up containers after node is down only after 1 minute

I'm testing Docker Swarm's behavior for the case when one Swarm node goes down.
I've seen that Docker Swarm only starts bringing up containers from the failed node after 50-60 seconds, which is quite long from my point of view. The containers themselves start quite fast.
So, is it possible to tune Swarm to shorten the time it takes to decide that a node is down and that new containers need to be started?
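One setting that does exist in this area is the dispatcher heartbeat period (5 seconds by default), which controls how often workers report their state to the managers. Whether lowering it alone shrinks the 50-60 second failover window is an assumption to verify on your Docker version:

# run on a manager node; 2s is an illustrative value, not a recommendation
docker swarm update --dispatcher-heartbeat 2s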

How to balance containers on newly added node with same elastic IP?

I need help distributing already running containers onto a newly added docker swarm worker node.
I am running docker swarm mode on docker version 18.09.5. I am using AWS autoscaling to create 3 masters and 4 workers. For high availability, if one of the workers goes down, all the containers from that worker node are rebalanced onto the other workers. When autoscaling brings a new node up, I add that worker node to the current docker swarm setup using some automation. But docker swarm does not balance any containers onto that worker node. Even when I deploy the docker stack again, swarm still does not balance the containers. Is it because of a different node id? How can I customize this? I am deploying the stack with a docker compose file:
docker stack deploy -c dockerstack.yml NAME
The only (current) way to force re-balancing is to force-update the services. See https://docs.docker.com/engine/swarm/admin_guide/#force-the-swarm-to-rebalance for more information.
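For example (the service name is a placeholder; note that a forced update restarts the service's tasks, so expect a brief disruption):

# force the scheduler to re-place this service's tasks,
# allowing some of them to land on the newly joined node
docker service update --force myservice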

leave docker swarm, but leave services running

Is it possible to have a docker swarm node leave the swarm, but keep running services started while being a member of the swarm?
Short answer: don't try to do that.
This would be against the design of swarm mode. For the same reason that docker disables the live-restore functionality when swarm mode is enabled, you shouldn't be able to keep services running when a node leaves the swarm cluster. The logic behind both decisions is the same: when swarm mode detects that the current state of a service doesn't match its target state, it will do what it can to reach that target state.
With live-restore, docker would normally leave containers running when the daemon is stopped and reattach them to the daemon when it restarts. Similarly, if containers continued to run after the node left the swarm, the result from the swarm manager's point of view would be the same: running containers whose current state the manager has no way to track.
Since the current state cannot be tracked in those scenarios, docker errs on the side of stopping the containers when a node gracefully stops or leaves the swarm. The scenario where containers do continue to run is an ungraceful disconnection from the swarm manager(s). In that case, the worker doesn't know whether only it is down and the workload has been rescheduled elsewhere, or whether only the manager is down, in which case stopping the containers would turn a small outage into a big one. So docker errs on the side of leaving the containers running when the disconnect is uncontrolled.
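For context, live-restore (the standalone-daemon feature mentioned above, which swarm mode disables) is enabled in the daemon configuration like this:

# /etc/docker/daemon.json on a standalone (non-swarm) daemon;
# keeps containers running while the daemon itself restarts
{
  "live-restore": true
}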

Can Docker-Swarm run in fail-over-mode only?

I am facing a situation where I need to run about 100 different applications in Docker containers. It is not reasonable to run all 100 containers on one machine, so I need to spread the applications over several machines.
As far as I understand, docker-swarm is for scaling only, which would mean that when I run my containers in a swarm, all 100 containers will automatically be deployed and started on every node of my docker-swarm. But this is not what I am looking for. I want to split the applications and, for example, run 50 on node1 and 50 on node2.
Question 1:
Is there a way to configure docker-swarm so that my applications are automatically dispatched to the node with the most idle resources?
Question 2:
Is there a kind of fail-over mode in docker swarm which can stop a container on one node and try to start it on another in case something goes wrong?
all 100 containers will automatically be deployed and started on every node of my docker-swarm
This is not true. When you deploy 100 containers in a swarm, the containers will be distributed on the available nodes in the swarm. You will mostly get an even distribution of containers on all nodes.
Question 1: Is there a way to configure docker-swarm so that my applications are automatically dispatched to the node with the most idle resources?
Docker swarm does not check the resources (memory, cpu ...) available on the nodes before deploying a container on one of them. The distribution of containers is balanced per node, without taking the availability of resources on each node into account.
You can, however, build your own strategy for distributing containers across the nodes. Placement constraints let you restrict where a container can be deployed: for example, label the nodes that have a lot of resources and restrict heavy containers to run only on those nodes, as sketched below.
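A minimal sketch of that labeling approach (the node name, label, and image are placeholders):

# run on a manager: mark a node that has plenty of resources
docker node update --label-add size=large node1

# restrict a heavy service to nodes carrying that label
docker service create --name heavy-app \
  --constraint 'node.labels.size == large' \
  myorg/heavy-app:latest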
Question 2: Is there a kind of fail-over mode in docker swarm which can stop a container on one node and try to start it on another in case something goes wrong?
If a container crashes, docker swarm will ensure that a new container is started. Again, which node the replacement container is scheduled on cannot be predetermined.
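The restart behavior itself is tunable per service; a sketch with illustrative values (service name and image are placeholders):

# retry a failed task up to 3 times, waiting 5s between attempts
docker service create --name app \
  --restart-condition on-failure \
  --restart-delay 5s \
  --restart-max-attempts 3 \
  myorg/app:latest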
