I have a 3-node Docker swarm cluster, and we might want to have 2 managers. I know that there is only one leader at any given time. I am trying to find some literature on the pros and cons of multiple managers. Specifically: if my 3-node cluster has 2 managers and 1 worker, what is the downside of simply making all 3 nodes managers? Any thoughts would be helpful.
A Docker swarm with two managers is not recommended.
Why?
Docker swarm uses the Raft consensus algorithm:
Raft tolerates up to (N-1)/2 failures and requires a majority or quorum of (N/2)+1 members to agree on values proposed to the cluster. This means that in a cluster of 5 Managers running Raft, if 3 nodes are unavailable, the system will not process any more requests to schedule additional tasks
So with 2 managers, if one is down, the other will not be able to schedule additional tasks (no cluster upgrades, no new services, etc...).
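The two formulas quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of how Docker implements Raft; the function names are made up for illustration.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority: floor(N/2) + 1."""
    return managers // 2 + 1


def failures_tolerated(managers: int) -> int:
    """Managers that can be lost while a majority remains: floor((N-1)/2)."""
    return (managers - 1) // 2


def can_schedule(managers: int, unavailable: int) -> bool:
    """True if the managers that are still reachable form a quorum."""
    return managers - unavailable >= quorum(managers)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} managers: quorum={quorum(n)}, "
              f"failures tolerated={failures_tolerated(n)}")
```

With these definitions, can_schedule(2, 1) is False (the case described above), while can_schedule(3, 1) is True.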
The docs are also clear about the number of managers you should have for high availability:
Size your deployment
To make the cluster tolerant to more failures, add additional replica nodes to your cluster.
Manager nodes    Failures tolerated
1                0
3                1
5                2
7                3
So in brief, as the doc states here:
Adding more managers does NOT mean increased scalability or higher performance. In general, the opposite is true.
I have 2 swarm nodes, and I want the remaining node to take over all services if the other one shuts down.
Right now I have one leader (manager) and one worker. This works perfectly if the worker goes down, because the leader reschedules all services onto itself.
My problem is that when the leader goes down, nothing takes over the services that were running on it.
I already tried with two managers, but that didn't work.
So I am thinking about placing all my services on the worker node: if the leader node goes down there is no problem at all, and if the worker node goes down, the leader node would reschedule all services onto itself.
I tried with
deploy:
  placement:
    constraints:
      - "node.role!=manager"
But that does not work either, because the service will then never be scheduled on a manager node.
So I would like to ask whether there is any way to make each of these two nodes take over all services if the other one goes down.
or
Is there a way to configure a service to be deployed "preferably" on one specific node if that node is available, and otherwise on any other node?
The rub of it is, you need 3 nodes, all managers. It is not a good idea, even with a 2-node swarm, to make both nodes managers: Docker swarm uses the Raft protocol for manager quorum, and this protocol requires a clear majority. With two manager nodes, if either node goes down, the remaining manager represents only 50% of the swarm managers and so cannot act for the swarm until quorum is restored.
Once you have 3 nodes, all managers, the swarm will tolerate the failure of any single node and move its tasks to the other two nodes.
Don't bother with 4 manager nodes. They don't provide extra protection from single-node failures, and they don't protect from two-node failures because, again, 2 out of 4 does not represent more than 50%. To survive 2 node failures you want 5 managers.
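To see why 4 managers buy nothing over 3, one can search for the largest number of failures that still leaves strictly more than half of the managers alive. The helper names below are hypothetical and the code only models the majority rule, not Docker itself.

```python
def has_quorum(total: int, alive: int) -> bool:
    """A strict majority of all managers must be alive (exactly 50% is not enough)."""
    return 2 * alive > total


def max_failures(total: int) -> int:
    """Largest number of failed managers for which the rest still has quorum."""
    lost = 0
    # Keep failing one more manager as long as the survivors keep a majority.
    while has_quorum(total, total - (lost + 1)):
        lost += 1
    return lost


if __name__ == "__main__":
    for total in (2, 3, 4, 5):
        print(f"{total} managers survive {max_failures(total)} failure(s)")
```

In this model 3 and 4 managers both survive exactly one failure, and only 5 managers survive two.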
I am running a k8s cluster on GKE.
It has 4 node pools with different configurations:
Node pool : 1 (Single node, cordoned status)
Running Redis & RabbitMQ
Node pool : 2 (Single node, cordoned status)
Running Monitoring & Prometheus
Node pool : 3 (Single large node)
Application pods
Node pool : 4 (Single node with auto-scaling enabled)
Application pods
Currently, I am running a single replica of each service on GKE,
except for the main service, which manages almost everything and runs with 3 replicas.
When scaling this main service with HPA, I have sometimes seen the node crash or the kubelet restart frequently, and the pods go into the Unknown state.
How can I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
Question : 2
Node pools 3 and 4 are running the application pods. Inside the application there are 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node,
while only the small node pool would run the main service, which has HPA, with node auto-scaling enabled for that node pool.
However, I feel that using a node selector is not the best way to do it.
It's always best to run more than one replica of each service, but currently we are running only a single replica of each, so please take that into account in your suggestions.
As Patrick W rightly suggested in his comment:
if you have a single node, you leave yourself with a single point of
failure. Also keep in mind that autoscaling takes time to kick in and
is based on resource requests. If your node suffers OOM because of
memory intensive workloads, you need to readjust your memory requests
and limits – Patrick W Oct 10 at
You may need to redesign your infrastructure a bit so that you have more than a single node in every node pool, and readjust memory requests and limits.
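As a rough illustration of the point about requests, the sketch below compares what gets reserved on a node (the sum of memory requests) with what the workloads actually use at peak. All pod names and numbers are invented for the example, and the check is a simplification of what Kubernetes does.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    memory_request_mib: int  # amount reserved for the pod when it is scheduled
    memory_peak_mib: int     # hypothetical observed peak usage


def fits_by_requests(pods, allocatable_mib: int) -> bool:
    """Placement view: only the declared requests are counted."""
    return sum(p.memory_request_mib for p in pods) <= allocatable_mib


def fits_by_usage(pods, allocatable_mib: int) -> bool:
    """Reality view: what the pods actually consume at peak."""
    return sum(p.memory_peak_mib for p in pods) <= allocatable_mib


def under_requested(pods):
    """Pods whose peak usage exceeds their request; candidates for readjusting."""
    return [p.name for p in pods if p.memory_peak_mib > p.memory_request_mib]


# Hypothetical single node with 8 GiB allocatable and three made-up pods.
NODE_ALLOCATABLE_MIB = 8192
pods = [
    Pod("main-service", 512, 700),
    Pod("report-worker", 1024, 4096),
    Pod("image-resizer", 1024, 3800),
]

if __name__ == "__main__":
    print("fits by requests:", fits_by_requests(pods, NODE_ALLOCATABLE_MIB))
    print("fits by peak usage:", fits_by_usage(pods, NODE_ALLOCATABLE_MIB))
    print("under-requested:", under_requested(pods))
```

In this made-up case the requests fit comfortably, so nothing triggers a scale-up, yet the peak usage exceeds what the node can offer. Raising the requests toward real usage is what lets scheduling and autoscaling react before the node runs out of memory.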
You may want to take a look at the following sections in the official Kubernetes docs and the Google Cloud blog:
Managing Resources for Containers
Assign CPU Resources to Containers and Pods
Configure Default Memory Requests and Limits for a Namespace
Resource Quotas
Kubernetes best practices: Resource requests and limits
How can I handle this scenario? If the node crashes, GKE takes time
to auto-repair it, which causes service downtime.
That's why having more than one node in a node pool can be a much better option. It greatly reduces the likelihood that you'll end up in the situation described above. The GKE auto-repair feature needs time (usually a few minutes), and if this is your only node, you cannot do much about it and need to accept possible downtime.
Node pools 3 and 4 are running the application pods. Inside the application
there are 3-4 memory-intensive microservices, and I am thinking of using
a node selector to pin them to one node,
while only the small node pool would run the main service, which has HPA,
with node auto-scaling enabled for that node pool.
However, I feel that using a node selector is not the best way to do it.
You may also take a look at node affinity and anti-affinity as well as taints and tolerations.
I am new to docker-swarm and very interested in the inner workings of how docker-swarm distributes incoming requests for a single service.
For example, I deployed a stack with one single service and 10 replicas across 2 nodes. When I brought it up, 5 containers showed up on node1 and the other 5 on node2. Now, if I make 10 HTTP requests to the same service from 10 different browser instances, does each container end up with one request? If it were round-robin, I would think so. However, I am not observing that behavior from the stack that I have just deployed.
I brought up the stack with the configuration above and made 10 requests. The load was more distributed than concentrated, but the 10 requests were served by only 7 of the containers, and 3 received none.
This tells me that it is not an evenly distributed round-robin. If not, what algorithm does the docker services API follow to determine which container will serve the next request?
When I searched for the inner workings of docker swarm, I ended up at this article: https://docs.docker.com/engine/swarm/ingress/
which is an interesting look into ingress and the routing mesh, but it still does not answer my original question.
Anyone?
Let's say we have a test setup of 10 nodes: 4 managers and 6 workers.
When the leader fails, the other 3 managers will choose another manager as leader.
When this leader fails as well, we only have 2 managers left out of 4. The remaining managers then say
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
Because no more than half of the managers are left, they are not able to choose a new leader, although 2 managers of the cluster remain.
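The sequence described here can be replayed step by step. The key detail in this sketch is that the majority is counted against all 4 managers that belong to the swarm, not against the ones that happen to be alive. The function names are invented for illustration.

```python
def leader_possible(total_managers: int, alive: int) -> bool:
    """More than half of ALL managers in the swarm must be online."""
    return alive > total_managers / 2


def fail_one_by_one(total_managers: int) -> list:
    """Return (alive, leader_possible) after each successive manager failure."""
    history = []
    for alive in range(total_managers - 1, -1, -1):
        history.append((alive, leader_possible(total_managers, alive)))
    return history


if __name__ == "__main__":
    for alive, ok in fail_one_by_one(4):
        status = "can elect a leader" if ok else "no leader possible"
        print(f"{alive} of 4 managers online: {status}")
```

For 4 managers, the first failure leaves 3 of 4 online and a new leader can be elected. The second failure leaves 2 of 4, which is exactly half and therefore not a majority.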
My question is about the sense of this rule: the cluster is without a leader and cannot be managed as long as no additional managers are added, although there are 2 managers available.
Why should I choose the worker role for nodes at all? What advantage is there in having nodes as workers? Managers also act as workers by default; workers only have the disadvantage that they cannot take over when manager nodes fail.
Docker recommends using an odd number of manager nodes. Your initial setup of 4 managers is only as fault-tolerant as one with 3 managers. Since you are losing 2 nodes, it is recommended that you start with 5 managers. Also, isn't there a more serious issue to be addressed in your setup? (Losing so many nodes is not a good sign.)
If the swarm loses the quorum of managers, the swarm cannot perform management tasks. If your swarm has multiple managers, always have more than two. To maintain quorum, a majority of managers must be available. An odd number of managers is recommended, because the next even number does not make the quorum easier to keep. For instance, whether you have 3 or 4 managers, you can still only lose 1 manager and maintain the quorum. If you have 5 or 6 managers, you can still only lose two.
Having dedicated worker nodes makes sure that they won't participate in the Raft distributed state, make scheduling decisions, or serve the swarm mode HTTP API. So the complete compute power of these nodes is dedicated to running containers.
because manager nodes use the Raft consensus algorithm to replicate data in a consistent way, they are sensitive to resource starvation
The quotes are taken from the official Docker documentation.
I am playing with a multi-node Docker swarm in the cloud. I set up a 4-node swarm with 2 managers (1 primary and 1 reachable manager) and 2 worker nodes. While reading the docs, I found out that we have to choose an odd number of manager nodes, like 1, 3, .... I am not sure what the technical reason behind this is.
This is related to how consensus across managers is determined when maintaining cluster consistency during an outage. See Raft consensus in swarm mode.
The algorithm used to derive consensus for a cluster of N nodes requires (N/2)+1 of them to agree. For a cluster of 2 managers you would actually be reducing reliability because if either of them goes down the other would be unable to do anything. In general, having an even number of managers provides no benefit over having one less.