Docker Engine version upgrade on Docker Swarm managed nodes, without downtime - docker

I'd like to upgrade the Docker engine on my Docker Swarm managed nodes (both manager and worker nodes) from 18.06 to 19.03, without causing any downtime.
I see there are many tutorials online for rolling update of a Dockerized application without downtime, but nothing related to upgrading the Docker engine on all Docker Swarm managed nodes.
Is it really not possible to upgrade the Docker daemon on Docker Swarm managed nodes without downtime? If true, that would indeed be a pity.
Thanks in advance to the wonderful community at SO!

You can upgrade managers in place, one at a time. For each manager: drain the node with docker node update, upgrade the Docker engine with the normal OS package commands, then return the node to active. What will not work is adding or removing nodes from the cluster while the managers are running mixed versions. This means you cannot replace nodes with a from-scratch install at the same time as you upgrade versions: all managers need to be on the same (upgraded) version first, and only then can you look at rebuilding or replacing the hosts. What I've seen in the past when this rule is violated is that new nodes do not fully join the manager quorum, and after losing enough managers you eventually lose quorum.
Once all managers are upgraded, you can upgrade the workers, either with in-place upgrades or by replacing the nodes. Until all workers have been upgraded, do not use any new features.
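To make the rolling procedure above concrete, here is a sketch of the per-manager loop, shown as a dry run that only prints each step. The node names and the apt-based engine upgrade are assumptions; substitute your own hosts and package manager, and drop the echoes to execute for real.

```shell
# Dry-run sketch: each step is echoed rather than executed.
upgrade_manager() {
  node="$1"
  echo "docker node update --availability drain $node"   # move tasks off the node
  echo "ssh $node 'apt-get install -y docker-ce'"        # upgrade the engine via the OS package manager
  echo "docker node update --availability active $node"  # return the node to service
}

# Upgrade managers strictly one at a time so quorum is never at risk.
for m in manager-1 manager-2 manager-3; do
  upgrade_manager "$m"
done
```

The point of going one at a time is that the remaining managers keep raft quorum while each node is drained and upgraded.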

You can drain a node, then upgrade its Docker version, then make it ACTIVE again.
Repeat these steps for all the nodes.
DRAIN availability prevents a node from receiving new tasks from the swarm manager. The manager stops the tasks running on the node and launches replica tasks on nodes with ACTIVE availability.
For detailed information you can refer to this link: https://docs.docker.com/engine/swarm/swarm-tutorial/drain-node/
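A quick way to confirm the drain took effect before upgrading, shown as a dry run (the node name worker-1 is a placeholder):

```shell
# Dry run: commands are echoed, not executed; drop the echoes to run them.
drain_and_verify() {
  node="$1"
  echo "docker node update --availability drain $node"
  echo "docker node ps $node"   # should show no tasks in Running state once drained
  echo "docker node ls"         # the AVAILABILITY column for the node should read Drain
}

drain_and_verify worker-1
```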

Related

Possible reasons for docker swarm losing ingress load balancing after patching of underlying VM and upgrade of docker version

We have a set of 3 managers and 3 workers in a Docker Swarm cluster (community edition) running on RHEL 8.1 in a DMZ. We have a like-for-like setup in a non-prod environment, where we have no issues when we patch the underlying VMs to the latest RHEL 8.x versions, including upgrading Docker to the latest version.
But whenever we try patching the production cluster VMs, even though the swarm on the surface comes back up fine and we see all the services and tasks running, for some weird reason the swarm loses its ingress load-balancing capability. We have tried upgrading several different ways and many times, but every time we end up with the same result and have had to revert.
Can anyone please shed some light on where to look and why this could be happening?
Thanks in advance,
ethtool -K <interface> tx off
Fixed it for us, see: Docker Swarm Overlay Network ICMP Works, But Not Anything Else
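Note that an ethtool setting does not survive a reboot. One way to persist the workaround, assuming a systemd host, is a oneshot unit; the sketch below just prints such a unit file, and the interface name ens192 is a placeholder for your VM's NIC.

```shell
# Prints a unit file to save as /etc/systemd/system/swarm-tx-off.service,
# then enable it with: systemctl daemon-reload && systemctl enable --now swarm-tx-off
print_unit() {
  cat <<'EOF'
[Unit]
Description=Disable TX checksum offload (Swarm overlay workaround)
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens192 tx off

[Install]
WantedBy=multi-user.target
EOF
}

print_unit
```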

How services are distributed in docker swarm

Can I somehow configure how the manager node distributes services in Docker Swarm? I thought it would look at the free resources of the worker nodes and assign new tasks to the least-loaded node.
Currently I have a problem where services get scheduled onto one node that is already full (90% RAM) and starts to be laggy, while at the same time the second node runs only a few services and could handle another one.
docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
wdkklpy6065zxckxyuj000ei4 *   docker-master   Ready    Drain          Leader           18.09.6
sk45rol2whdr5eh2jqozy0035     docker-node01   Ready    Active         Reachable        18.09.6
o4zwwbwwcrbwo4tsd00pxkfuc     docker-node02   Ready    Active                          18.09.6
Now I have 36 (very similar) services; 28 run on docker-node01 and 8 on docker-node02. I thought the ideal state would be 18 services on each node.
Both docker nodes are identical.
How does Docker Swarm decide where to run a service? What algorithm does it use?
Is it possible to change or update the algorithm for selecting a node?
According to the swarmkit project README, the only available strategy is spread, so it schedules tasks on the least-loaded nodes.
Note that the swarm won't move tasks around to maintain this strategy, so if you added node02 after node01 was already full, node02 will remain mostly empty. You could drain both nodes and then reactivate them to see if the load distributes better.
You can find a more detailed description of the scheduling algorithm in the project documentation: scheduling-algorithm
For the older standalone Swarm manager this strategy was configurable:
https://docs.docker.com/swarm/reference/manage/#--strategy--scheduler-placement-strategy
I also found https://docs.docker.com/swarm/scheduler/strategy/, which explains a lot about the standalone Swarm strategies.
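Since the spread strategy counts reserved resources rather than live usage, one workaround is to declare reservations or placement constraints yourself. A dry-run sketch (the service name "web" and all values are hypothetical; drop the echoes to execute):

```shell
# Dry run: commands are echoed, not executed.
rebalance_hints() {
  # Declaring a memory reservation makes a service's footprint visible
  # to the spread scheduler, which otherwise only counts task numbers:
  echo "docker service create --name web --replicas 4 --reserve-memory 256M nginx"
  # A placement constraint keeps a service off the overloaded node entirely:
  echo "docker service update --constraint-add 'node.hostname != docker-node01' web"
  # Forcing an update redeploys the service's tasks, which respreads them:
  echo "docker service update --force web"
}

rebalance_hints
```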

restore a docker swarm

Let's say we have swarm1 (1 manager and 2 workers). I am going to back up this swarm on a daily basis, so if there is a problem some day I can restore the whole swarm to a new one (swarm2, also 1 manager and 2 workers).
I followed what is described here, but it seems that on restore the new manager gets the same token as the old manager; as a result, the 2 workers get disconnected and I end up with a new swarm2 with 1 manager and 0 workers.
Any ideas / solutions?
I don't recommend restoring workers. Assuming you've only lost your single manager, just run docker swarm leave on the workers and then join them again. On the manager you can always clean up the old worker entries later (this does not affect uptime) with docker node rm.
Note that losing the manager quorum does not mean the apps you're running go down, so you'll want to keep your workers up and serving your apps to your users until you fix your manager.
If your last manager fails or you lose quorum, then focus on restoring the raft DB so the swarm managers have quorum again. Then rejoin the workers, or create new workers in parallel and only shut down the old workers once the new ones are running your app. Laura Frank gave a great talk at DockerCon that goes into this.
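As a sketch of that raft-restore flow (paths follow the Docker swarm admin guide; the backup path is a placeholder, and the commands are echoed as a dry run):

```shell
BACKUP="/backups/swarm-$(date +%F).tar.gz"   # placeholder backup path

backup_swarm() {                             # run on the (old) manager
  echo "systemctl stop docker"                     # stop the engine so the raft DB is consistent
  echo "tar czf $BACKUP -C /var/lib/docker swarm"  # /var/lib/docker/swarm holds the raft state
  echo "systemctl start docker"
}

restore_swarm() {                            # run on the replacement manager
  echo "systemctl stop docker"
  echo "rm -rf /var/lib/docker/swarm"
  echo "tar xzf $BACKUP -C /var/lib/docker"
  echo "systemctl start docker"
  echo "docker swarm init --force-new-cluster"     # rebuild a one-manager cluster from the restored state
}

backup_swarm
restore_swarm
```

After --force-new-cluster the manager keeps its services and raft data but starts as a single-manager cluster, so workers then rejoin (or new ones are added) as the answer describes.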

Adding nodes to docker swarm

We have a swarm running docker 1.13 to which I need to add 3 more nodes running docker 17.04.
Is this possible or will it cause problems?
Will it be possible to update the old nodes without bringing the entire swarm down?
Thanks
I ran into this one myself yesterday and the advice from the Docker developers is that you can mix versions of docker on the swarm managers temporarily, but you cannot promote or demote nodes that don't match the version on all the other swarm managers. They also recommended upgrading all managers before upgrading workers.
According to that advice, you should upgrade the old nodes first, one at a time, to avoid bringing down the cluster. If containers are deployed to those managers, you'll want to drain the node first with docker node update --availability drain $node_name. After the upgrade, you can bring it back into service with docker node update --availability active $node_name.
When trying to promote a newer node into an older swarm, what I saw was some very disruptive behavior that wasn't obvious until looking at the debugging logs. The comments on this issue go into more details on Docker's advice and problems I saw.

DC/OS on top of a docker container cluster

Given that I have only one machine (a high-spec laptop), can I run the entire DC/OS on it, purely for simulation/learning purposes? The way I was thinking of setting this up was with some number N of Docker containers (with networking enabled between them), where some of the N would be masters, some slaves, one maybe ZooKeeper, and one container to run the scheduler/application. So basically one Docker container would be synonymous with one machine instance in this case (since I don't have multiple machines, and using multiple VMs on one machine would be overkill).
Has this already been done, so that I can try it out directly, or am I completely missing something here in my understanding?
We're running such a development configuration, where ZooKeeper, the Mesos masters and slaves, as well as Marathon, run fully Dockerized (but on a 3-machine bare-metal cluster) on the latest stable CoreOS. It has some known downsides; for example, when a slave dies, AFAIK its running tasks cannot be recovered by the restarted slave.
I think it also depends on the OS you're running on your laptop. If it's not Windows, you should normally be fine. If your system supports systemd, you can have a look at tobilg/coreos-setup to see how I start the Mesos services via Docker.
Still, I would recommend using a Vagrant/VirtualBox solution if you just want to test how Mesos works/"feels"... That will probably save you some headaches compared to a from-scratch solution. The tobilg/coreos-mesos-cluster project runs the services via Docker on CoreOS within Vagrant.
Also, you can have a look at dharmeshkakadia/awesome-mesos, and especially the Vagrant-based setup section, to get some references.
Have a look at https://github.com/dcos/dcos-docker; it is quite young but enables you to do exactly what you want.
It starts a DC/OS cluster with masters and agents in Docker containers on a single node.
