Hyperledger - Docker swarm fails when deploying to multiple hosts - docker

I am following this tutorial. I ran sudo docker swarm init --advertise-addr <myip> on the first Ubuntu machine, then took the manager join token and ran it on the second Ubuntu machine, which joined as a manager.
The problem starts when I run docker network create --attachable --driver overlay my-net on the first machine. It gives me the following error:
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
If I run the above command to create the network before joining the second node, the network is created successfully and the second node also joins the swarm on the first node. But after that, whenever I do anything on the first Ubuntu machine, I get the same error.
Both Ubuntu machines are in same network and can be pinged by each other.
Ubuntu version - 17.1 64 bit
Docker version 18.03.1-ce, build 9ee9f40
Docker-compose version 1.21.2, build a133471

It seems the tutorial is off: you end up with only two managers, which is not a workable quorum. Swarm managers need a majority online to elect a leader, and with two managers the majority is two, so as soon as either manager cannot reach the other the swarm has no leader. You can either add a third manager node, or simply create a single manager (docker swarm init) and then join the second machine as a worker using the command printed in the output of docker swarm init. In that case, SKIP the docker swarm join-token manager step from the tutorial.
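The quorum arithmetic behind this answer can be sketched in a few lines of shell. This is only an illustration of the majority rule (floor(N/2) + 1); the function names quorum and fault_tolerance are made up for this example and are not Docker commands.

```shell
#!/bin/sh
# Majority of managers needed to elect a leader: floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many managers can be lost while a majority still remains.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a few manager counts; two managers tolerate no failures at all.
for n in 1 2 3 5; do
    echo "managers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With two managers the quorum is two and the tolerated failures are zero, which is why a second manager adds risk rather than redundancy.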

Just change the IP of your Ubuntu machine.
Machine -> Settings -> Network -> set "Attached to" to Bridged Adapter.
Restart your machine.

Related

Docker 19.03.12: The swarm does not have a leader after swarm upgrade

I have been troubleshooting some strange Docker behavior since the last update.
Can you help me with this?
This is not my first package upgrade, and the problem has been reproduced on a brand-new stack.
Upgraded from 18.09.9 to 19.03.12
OS : Ubuntu 16.04 Server
Docker package
docker-ce=5:18.09.9~3-0~ubuntu-bionic
docker-ce-cli=5:19.03.11~3-0~ubuntu-bionic
containerd.io=1.2.13-2
Details
A problem was identified with version 19.03.12 of Docker
Managers have been upgraded to version 19.03.12
When adding a manager to a group that has an active leader, an error message appears
The various known solutions were tried
Case
As soon as the docker swarm join --token command is run on the non-leader managers, the leader manager becomes unavailable after a few minutes
-> I am forced to rerun docker swarm init --force-new-cluster --advertise-addr xx.xx.xx.xx --listen-addr xx.xx.xx.xx:2377 to get the leader operational again
The leader sees the worker nodes in version 19.03.12. No problem with workers
Restarting the docker service leads to the same result
Error Message
The swarm does not have a leader
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
docker msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn"
References applied
https://github.com/moby/moby/issues/34384#:~:text=demote%20master%20...-,new-server%23%20docker%20node%20ls%20Error%20response%20from%20daemon%3A,too%20few%20managers%20are%20online.&text=have%20a%20leader.-,It%27s%20possible%20that%20too%20few%20managers%20are%20online.,of%20the%20managers%20are%20online.
Docker Node is Down after service restart
https://cynici.wordpress.com/2018/05/31/docker-info-rpc-error-on-manager-node/
https://gitmemory.com/issue/docker/swarmkit/2670/481951641
https://forums.docker.com/t/cant-add-third-swarm-manager-or-create-overlay-network-the-swarm-does-not-have-a-leader/50849
https://askubuntu.com/questions/935569/how-to-completely-uninstall-docker
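For the error quoted in this question, one thing that can be checked while the swarm still has a leader is whether the reachable managers form a majority. The sketch below is an assumption about how one might do that, not something taken from the post: check_quorum is a hypothetical helper that reads "hostname manager-status" lines on stdin. Note that docker node ls itself fails with the same error once quorum is already lost.

```shell
# Hypothetical helper: reads "hostname manager-status" lines on stdin and
# reports whether the reachable managers still form a majority.
# Worker lines have an empty status column and are ignored.
check_quorum() {
    awk '
        NF >= 2 { total++ }
        $2 == "Leader" || $2 == "Reachable" { up++ }
        END {
            need = int(total / 2) + 1
            printf "managers=%d reachable=%d needed=%d\n", total, up, need
            exit ((up >= need) ? 0 : 1)
        }'
}
```

On a manager it could be fed like this: docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | check_quorum. A non-zero exit status means the reachable managers are below the majority.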

Is the 'local' vm required once the swarm cluster has been deployed?

According to the official documentation on Install and Create a Docker Swarm, the first step is to create a VM named local, which is needed to obtain the token with swarm create.
Once the manager and all nodes have been created and added to the swarm cluster, do I need to keep the local VM running?
Note: this tutorial is for the first version of Swarm (called Swarm legacy). There is a new version called Swarm mode available since Docker 1.12. Putting it out there because there seems to be a lot of confusion between the two.
No, you don't have to keep the local VM; it is only used to get a unique cluster token from the Docker Hub discovery service.
Creating a VM is overkill just to generate a token. You can bypass this step by:
Running the swarm container directly if you have Docker for Mac or, more generally, a local instance of Docker running:
docker run --rm swarm create
Directly query the service discovery URL to generate a token:
curl -X POST "https://discovery.hub.docker.com/v1/clusters"

docker-compose swarm without docker-machine

After looking through the official Docker swarm explanations, GitHub issues and Stack Overflow answers, I'm still at a loss as to why I am having this problem.
Issue at hand: docker-compose up starts services outside the swarm even though the swarm is active and has 2 nodes.
I'm using Docker version 1.12.1.
Following the swarm tutorial, I was able to start and scale my swarm using docker service create without any issues.
Running docker-compose up with a version 2 docker-compose.yml results in services starting outside of the swarm; I can see them through docker ps but not docker service ls.
I can see that docker-machine is the tool that solves this problem, but then again it needs VirtualBox to be installed.
So my questions are:
1) Can I use docker-compose with docker-swarm (NOT docker-engine) without docker-machine and without the experimental build bundle functionality?
2) If docker service create can start a service on any node, is that an indication that the network configuration of the swarm is correct?
3) What are the advantages/disadvantages of docker-machine versus the experimental build functionality?
1) No. Docker Compose isn't integrated with the new Swarm Mode yet. Issue 3656 in GitHub is tracking that. If you start containers on a swarm with Docker Compose at the moment, it uses docker run to start containers, which is why you see them all on one node.
2) Yes. Actually, you can use docker node ls on the manager to confirm all the nodes are up and active, and docker node inspect to check a particular node; you don't need to create a service to validate the swarm.
3) Docker Machine is also behind the 1.12 release, so if you start a swarm with Docker Machine it will be the 'old' type of swarm. The old Docker Swarm product needed a whole lot of extra setup for a key-value store, TLS etc. which Swarm Mode does for free.
1) You can't start services using docker-compose on the new Docker "Swarm Mode". There's a feature to convert a docker-compose file to the new dab format which is understood by the new swarm mode but that's incomplete and experimental at this point. You basically need to use bash scripts to start services at the moment.
2) The nodes in a swarm (swarm mode) interact using their own overlay network. It's the one named ingress when you do docker network ls. You need to set up your own overlay network to run services in, e.g.:
docker network create -d overlay mynet
docker service create --name serv1 --network mynet nginx
3) I'm not sure what feature you mean by "experimental build". docker-machine is just a way to create hosts (the nodes). It facilitates setting up the Docker daemon and the certificates on each host, and allows some basic maintenance (renewing the certs, stopping/starting a host if you're the one who created it). It doesn't create or manage services, volumes or networks. That's the job of the Docker API.
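As a rough idea of what such a bash script could look like, here is a minimal sketch built from the two commands shown above. start_services is a hypothetical helper name, and the "name=image" argument format is an assumption made for this example.

```shell
# Hypothetical helper: create an overlay network, then start one service per
# "name=image" argument on that network. Stops at the first failing command.
start_services() {
    network=$1
    shift
    docker network create -d overlay "$network" || return 1
    for spec in "$@"; do
        name=${spec%%=*}    # text before the first "="
        image=${spec#*=}    # text after the first "="
        docker service create --name "$name" --network "$network" "$image" || return 1
    done
}
```

For example, start_services mynet serv1=nginx would run the same two commands as the answer above. The function returns early if the network already exists, because docker network create fails in that case.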

Overlay network on Swarm Mode without Docker Machine

I currently have three hosts (docker1, docker2 and docker3) which I have not set up using Docker Machine, each one running the v1.12-rc4 Docker daemon.
I run docker swarm init on docker1, which in turn prints a docker swarm join command which I run on both docker2 and docker3. At that point, running docker info on each host contains the Swarm: active line.
It is at this point that the behavior seems to differ from what I used to get with the standalone Swarm container. In particular, running docker network ls only shows me the networks on the local host, and when I create an overlay network, the worker nodes do not seem to be aware of it (i.e. it does not show up in their docker network ls).
I feel like I have missed out on some important information relating to the workings of the Swarm Mode as opposed to the Swarm container.
What is the correct way of setting up such a cluster without Docker Machine on Docker 1.12 while getting the overlay network feature?
I too thought this was an issue when I first started using it.
This works a little differently in 1.12rc4 - when you deploy a container to your swarm with that network attached to it, it should then create the network on the other nodes as well.
Hope this helps!
Issue
You are using the docker command (used to communicate with your localhost Docker daemon) and not the "swarm" command (used to communicate with the Swarm master).
Solution
It depends on the command you used to start Swarm.
A full step-by-step tutorial (including details on how to deploy an overlay network) is detailed in this answer. I'm sure that reading this will help you ;)
With a network scope of swarm, the network is only propagated to worker nodes on an as-needed basis. If you create a service using that network, and it gets scheduled on that worker node, the network will show up in the docker network ls.
With the now-upcoming 1.13 release, you can get a network that has similar behavior to the non-swarm networks by doing docker network create --attachable .... That network will be valid for both services and normal containers, and will be available to all members of the cluster. As of 1.13.0-rc2, those don't seem to show up in the output of docker network ls.
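A minimal sketch of the attachable-network approach described above, assuming Docker 1.13 or later with the network created on a swarm manager. The helper name attachable_demo, the container name attach-test and the nginx image are placeholders chosen for this example.

```shell
# Sketch: create an attachable overlay network, then start a plain
# (non-service) container on it. Defaults to a network named my-net.
attachable_demo() {
    network=${1:-my-net}
    docker network create --attachable --driver overlay "$network" || return 1
    docker run -d --name attach-test --network "$network" nginx
}
```

Calling attachable_demo my-net on a manager creates the network and attaches one standalone container to it; services can use the same network through docker service create --network.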

Checking reason behind node failure

I have a Docker swarm set up with three nodes: node-1, node-2 and node-3. For some reason, one of my nodes fails every day; basically, it exits. I ran docker logs <container id of swarm>, but the logs don't contain any information related to the node failure.
So, is there a log file where I can see the reason for this failure? Or is it due to insufficient memory allocation?
Can anyone suggest how to investigate this problem and find a proper solution? Every day I have to restart my swarm nodes.
Like most containers, Swarm containers run and exit unless you use docker run with the -d option to "daemonize" them. For example:
$ docker run -d swarm join --advertise=172.30.0.69:2375 consul://172.30.0.161:8500
On the other hand, if you used Docker Machine to create the VMs, then also use Docker Machine to create the Swarm manager and nodes. By default, Docker Machine applies TLS authentication to the Docker Engine nodes. The easiest thing to do is to also create the Swarm manager and nodes at the same time as you create the Docker Engine nodes.
For more info, check out the brand new Swarm doc.
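To check whether an exited Swarm container was killed for lack of memory, as the question suspects, its recorded state can be inspected. This is a hedged sketch rather than part of the answer above: why_exited is a hypothetical helper, and it relies only on the State fields that docker inspect reports for a container.

```shell
# Hypothetical helper: print the exit code, OOM-kill flag and finish time of a
# container, followed by the last 50 lines of its log.
why_exited() {
    container=$1
    docker inspect --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} finished={{.State.FinishedAt}}' "$container" || return 1
    docker logs --tail 50 "$container"
}
```

Exited containers can be listed with docker ps -a --filter status=exited, and the ID passed to why_exited. An oom=true result would point toward a memory limit problem; anything else needs the exit code and logs to interpret.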

Resources