Docker swarm manager cannot recognize swarm nodes - docker

I follow these steps to create Docker swarm cluster.
First: Create Cunsol
docker-machine create -d virtualbox mh-keystore
eval "$(docker-machine env mh-keystore)"
docker run -d \
-p "8500:8500" \
-h "consul" \
progrium/consul -server -bootstrap
Second: Create swarm manager
docker-machine create -d virtualbox node1
docker run -d -p 4000:4000 swarm manage -H :4000 --replication -- advertise $(docker-machine ip node1):4000 consul://$(docker-machine ip mh-keystore):8500
Third: Create swarm node
docker-machine create -d virtualbox node2
docker run -d swarm join --advertise=$(docker-machine ip node2):2375 consul://$(docker-machine ip mh-keystore):8500
Fourth: Login to node1
docker-machine ssh node1
docker -H :4000 info
But this instruction output
(unknown): 192.168.99.106:2375(node2 ip)
└ ID:
└ Status: Pending
└ Containers: 0
└ Reserved CPUs: 0 / 0
└ Reserved Memory: 0 B / 0 B
└ Labels:
└ Error: Cannot connect to the Docker daemon. Is the docker daemon running on this host?....
How can I fix this ?
I have already checked node2 and it runs well.
[Update] I follow this page and it works well. But I still wan't to know how set up swarm cluster without docker-machine.
[Update] Another approach doen't work either.
docker-machine create -d virtualbox \
--swarm \
--swarm-discovery="consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
mhs-demo1
Node1 docker info appear mhs-demo1 ip but info still unknown..
[Update]
When I type eval docker-machine env --swarm node1 It shows
Error checking TLS connection: "node1" is not a swarm master. The
--swarm flag is intended for use with swarm masters Does this cause error ? Why using swarm manager instruction to set up is not swarm
master?
It's so strange. How can I get the same result as
docker-machine create \ -d virtualbox \ --swarm --swarm-master \
--swarm-discovery="consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-store=consul://$(docker-machine ip mh-keystore):8500" \
--engine-opt="cluster-advertise=eth1:2376" \
mhs-demo0
using swarm instruction?
I want to use swarm instruction because I don't want to declare swarm master when I create it.

Why are you using docker-machine just to start a node? You can use docker machine to setup your node with swarm ready to go..
You can follow this tutorial
https://docs.docker.com/engine/userguide/networking/get-started-overlay/

Try deleting this file with:
sudo rm /etc/docker/key.json
Then restart docker with:
sudo service docker restart
At this point docker will make a new key.json file and your master should be able to find your workers. This happens sometimes when you use the same image for all your worker nodes, but its an easy fix.

In docker 1.12 swarm mode is directly available. There is no need for a key value store for the cluster.
just follow the this : https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/

Related

Is there a way to setup a test docker swarm on a single machine?

I am trying to setup a docker swarm on WSL2 for testing purposes. I want to know, if it is possible to have a swarm with multiple "dummy" nodes on a single machine.
Here are the two ways that I trid:
Run multiple WSL instances as suggested here.
PS C:\Users\jdu> wsl -l
Windows-Subsystem für Linux-Distributionen:
Ubuntu3
Ubuntu
Ubuntu2
Docker is installed and run in each WSL instance. So I manage to initialize a swarm on Ubuntu and let Ubuntu2 and Ubuntu3 to join as workers.
On Ubuntu
$ docker swarm init
Swarm initialized: current node (hude19jo7t9dqpe0akg55ipmy) is now a manager.
On Ubuntu2
$ docker swarm join --token SWMTKN-1-xxxxxxxxx-xxxxxxxxx 192.168.189.5:2377 --listen-addr 0.0.0.0:12377
This node joined a swarm as a manager.
Then if I check on Ubuntu
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hude19jo7t9dqpe0akg55ipmy * laptop-ebc155 Ready Active Leader 20.10.21
ozeq43yukgfbltjnfya0tlx08 laptop-ebc155 Ready Active Reachable 20.10.20
Inspired by the ideas here, I have tried with docker-in-docker containers, e.g. I deploy multiple docker instances on a single WSL.
# Init Swarm master
docker swarm init
# Get join token:
SWARM_TOKEN=$(docker swarm join-token -q worker)
echo $SWARM_TOKEN
# Get Swarm master IP (Docker for Mac xhyve VM IP)
SWARM_MASTER_IP=$(docker info | grep -w 'Node Address' | awk '{print $3}')
echo $SWARM_MASTER_IP
DOCKER_VERSION=dind
# setup deploy Docker-in-Docker containers and join them to a swarm
docker run -d --privileged --name worker-1 --hostname=worker-1 -p 12377:2377 docker:${DOCKER_VERSION}
docker exec worker-1 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
docker run -d --privileged --name worker-2 --hostname=worker-2 -p 22377:2377 docker:${DOCKER_VERSION}
docker exec worker-2 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
docker run -d --privileged --name worker-3 --hostname=worker-3 -p 32377:2377 docker:${DOCKER_VERSION}
docker exec worker-3 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
After that
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
s371tmygu9h640xfosn6kyca4 * laptop-ebc155 Ready Active Leader 20.10.21
w1ina9ttvje4hn6r13p3gzbge worker-1 Ready Active 20.10.20
m8mqky6jchjao01nz8t5e392a worker-2 Ready Active 20.10.20
n29afhbb090tlyn9p0byga9au worker-3 Ready Active 20.10.20
To test the above two swarm setup, I use a very simple compose file as suggested by the official docs. As you can expect, these two swarm setup didn't work that well :/
If the MongoDB and MongoExpress are deployed on different nodes, both of the swarm setups show a same error MongoNetworkError: failed to connect to server [mongo:27017] on first connect. My understanding to this error is, that MongoExpress can not reach MongoDB under mongo:27017, which seems like a problem of the docker internal DNS. Can someone help me out? Or just feel free to tell me, dont try this single-multi nodes ideas anymore :D I am very appreciate to any help!
I just tried the same two exercises :)
Approach 1 - swarm nodes in WSL instances
I think it is currently impossible because of WSL2 design see https://github.com/microsoft/WSL/issues/4304. WSL2 instances are in fact sharing network setup - ip, interfaces, network namespaces, and so on. Every change made in one of them is immediately visible in all others and this conflicts with virtual interfaces and namespaces created by docker swarm nodes when they start up.
I tried configuring multiple ip addresses on eth0 interface, so that each node can have it's own (like here), and then used --advertise-addr --listen-addr options in docker swarm init and docker swarm join commands. Still I'm getting this error in dockerd logs:
moving interface ov-001000-yis5e to host ns failed, invalid argument, after config error error setting interface \"ov-001000-yis5e\" IP to 10.0.0.1/24: cannot program address 10.0.0.1/24 in sandbox interface because it conflicts with existing route {Ifindex: 4 Dst: 10.0.0.0/24 Src: 10.0.0.1 Gw: <nil> Flags: [] Table: 254}"
I believe here docker swarm hits a problem, because it already sees master's interfaces when it tries to to set up routing mesh networking for the worker. All because master and node share network config.
Approach 2 - swarm nodes as docker containers (docker-in-docker)
But I've got no 2. working with just a small change in swarm init command:
# advertise swarm on default bridge network
docker swarm init --advertise-addr 172.17.0.1
For me, the standard docker swarm init selected by default the eth0 address, which was only working for communication from dind -> wsl, but not the other way round.
Another but probably unrelated problem was that I could not access services/stacks executed this way from Windows host. This seems to be a wls bug and luckily there is a workaround.
One last hint about this mongo stack is ... patience. The stack consists of 2 services: mongo - the database and mongo-express - the client. Mongo image is a lot bigger ~600MB while mongo-express just ~135MB. The mongo-express image will be downloaded faster and it will be recreated by swarm multiple times before mongo is even started. Note also that docker images are independently downloaded for each worker in this setup, so also rebalancing may take some time.
I found these commands useful to see what is really happening:
# overview of services
docker service ls
# containers in each swarm service
docker service ps $(docker service ls --format {{.Name}})
# images in each dind worker
for i in $(seq "${NUM_WORKERS}"); do
docker exec worker-${i} docker images
done
#containers in each dind worker
for i in $(seq "${NUM_WORKERS}"); do
docker exec worker-${i} docker ps -a
done
Full listing of commands necessary to get working docker swarm using dind:
docker swarm init --advertise-addr docker0
SWARM_TOKEN=$( docker swarm join-token -q worker)
echo $SWARM_TOKEN
SWARM_MASTER_IP=$( docker info 2>&1 | grep -w 'Node Address' | awk '{print $3}')
echo $SWARM_MASTER_IP
DOCKER_VERSION=20.10.12-dind
NUM_WORKERS=3
# Run NUM_WORKERS workers with SWARM_TOKEN
for i in $(seq "${NUM_WORKERS}"); do
docker run -d --privileged --name worker-${i} --hostname=worker-${i} docker:${DOCKER_VERSION}
sleep 5
docker exec worker-${i} docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
done
# Setup the visualizer
docker service create \
--detach=true \
--name=viz \
--publish=8000:8080/tcp \
--constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
dockersamples/visualizer
####### play with mongo
mkdir mongodemo && cd mongodemo
wget https://raw.githubusercontent.com/docker-library/docs/f6c9b596064e2eed9c3b6ac75bea606cb6d94099/mongo/stack.yml
docker stack deploy -c stack.yml mongo
# from windows:
# mongo will be available under <eth0>:8081
# visualizer under <eth0>:8000
ip -4 addr | grep eth0

ERROR: Error response from daemon: datastore for scope "global" is not initialized

I create sucsessfully a swarm, with two nodes. However when I use docker-compose build && docker-compose up in order to start my project it crashes erroring out this:
ERROR: Error response from daemon: datastore for scope "global" is not initialized
It's a very very simple process:
docker run swarm create
swarm hash:
1477bcd7778d083e02a80c352d4f1b87
docker-machine create -d virtualbox --swarm --swarm-master --swarm-discovery token://1477bcd7778d083e02a80c352d4f1b87 myswarmmaster
docker-machine create -d virtualbox --swarm --swarm-discovery token://1477bcd7778d083e02a80c352d4f1b87 myremotenode1
eval $(docker-machine env --swarm myswarmmaster)
docker-compose build && docker-compose up
And then I get the error:
ERROR: Error response from daemon: datastore for scope "global" is not initialized
I'm running docker on Fedora 25.
I had the same error when I did docker swarm init on an Ubuntu machine. What I found was that swarm tries to access port 2377 so first open up the port 2377 sudo ufw allow 2377
And now docker swarm init worked and showed a message like this
Swarm initialized: current node (sdf23fsd3f24fr3f2f) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SW3Wwww-1-0dfsdffsdfdsfsdfdsfdfdsfdsf-dsfsdfdsfdsfdsfd \
52.15.91.31:2377
The key is make sure that appropriate ports are available.
Hope this helps

Can not connect nodes to docker swarm master (using zookeeper)

I am building my docker swarm cluster in a sandbox.
I have 1 zookeeper on a machine for discovery, 1 swarm master and 2 swarm nodes.
I try to connect them but when I try to run my docker run commands on the swarm master, it does not distribute the work to the nodes.
Also when I do docker info on the swarm master I can see that the nodes are not connected.
I do not know what I am doing wrong.
Here are the step to reproduce my problem:
I have an empty pwd/data folder and a pwd/config folder with my zoo.cfg:
tickTime=2000
dataDir=/tmp/zookeeper
clientPort=2181
initLimit=5
-
#---- CREATE ZOO ---
docker-machine create --driver virtualbox zoo1
docker-machine start zoo1
eval $(docker-machine env zoo1)
docker pull jplock/zookeeper
docker run -p 2181:2181 -v `pwd`/conf:/opt/zookeeper/conf -v `pwd`/data:/tmp/zookeeper jplock/zookeeper
docker-machine ip zoo1 #############192.168.99.100
-
#--- CREATE CLUSTER ---
docker-machine create --driver virtualbox --swarm --swarm-master machine-smaster
docker-machine create --driver virtualbox --swarm machine-s01
docker-machine create --driver virtualbox --swarm machine-s02
-
eval "$(docker-machine env machine-smaster)"
docker run -p 2375:2375 -d -t swarm manage -H 0.0.0.0:2375 --advertise $(docker-machine ip machine-smaster):2375 zk://192.168.99.100:2181/swarm
docker run swarm list zk://192.168.99.100:2181/swarm
sleep 10
eval "$(docker-machine env machine-s01)"
docker run -d swarm join --advertise $(docker-machine ip machine-s01):2375 zk://192.168.99.100:2181/swarm
docker run swarm list zk://192.168.99.100:2181/swarm
eval "$(docker-machine env machine-s02)"
docker run -d swarm join --advertise $(docker-machine ip machine-s02):2375 zk://192.168.99.100:2181/swarm
docker run swarm list zk://192.168.99.100:2181/swarm
If I run some containers:
eval "$(docker-machine env machine-smaster)"
docker run hello-world
The work is not dispatched to nodes (it is run by the master).
If I run docker info:
eval "$(docker-machine env machine-smaster)"
docker info
I do not see the swarm nodes.
Can you verify that the addresses you're advertising are actually reachable from the manager instance? i.e., does docker -H $(docker-machine ip machine-s01):2375 info return a valid result?
(Note that this subshell won't work inside the manager VM, just on your original client.)
Maybe your problem is that the started Docker Machine instances are listening on :2376 with TLS, but your started Swarm containers are trying to advertise and connect to :2375 without any TLS settings specified?
What do the docker logs for the Swarm containers say?
It looks like you're connecting to the "Swarm master" machine through the Docker API, not the Swarm API. Because of this, Docker will always deploy containers on the host you're connected to, and does not take advantage of Swarm scheduling the containers on the right host.
To connect to the Swarm API, add the --swarm option when running docker-machine env, so in your case:
eval "$(docker-machine env --swarm machine-smaster)"

Need Explanation for the docker documentation on the swarm

I finished this documentation:
https://docs.docker.com/swarm/install-w-machine/
It works fine.
Now I tried to setup this EC2 instances by following this documentation:
https://docs.docker.com/swarm/install-manual/
I am in Step 4. Set up a discovery backend
I cannot understand the steps what I need to do further.
I created 5 nodes in EC2: manager0, manager1, consul0, node0, node1. Now I need to know how to setup service discovery with swarm.
In that document they ask us to connect manager0 and consul0 then ifconfig, then they given as etc0 instance. I don't know where this is coming from.
Ultimately I need to know where (in which node?) to run this command:
$ docker run -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap
Any suggestion for me How to clear this step?
Consul will run on the consul0 server you created. So basically you first need to be able to run docker on worker0 and worker1 remotely, this is step 3. A better way of doing this is editing the daemon directly with the command:
sudo echo 'DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"' > /etc/default/docker`
Then restart docker. Afterwards you will find that you can run docker remotely from master0, master1 or any other instance behind your firewall with docker commands that start with:
docker -H $WORKER0_IPADDRESS:2375
For example if your workers ip address was 1.2.3.4 this would run the docker ps command remotely:
docker -H 1.2.3.4:2375 ps
This is what swarm runs on. Then start up your consul server with the command you want to run, you got that one right and thats it you wont do anything else with the consul0 server except use its IP address when you run your swarm commands.
So if $CONSUL0 represented the IP address of your consul server this is how you would set up the rest of swarm. If you ran each of them on the local machine of each node:
On consul0:
docker run -d -p 8500:8500 --restart=unless-stopped --name=consul progrium/consul -server -bootstrap
On master0 and master1:
docker run --name=master -d -p 4000:4000 swarm manage -H :4000 --replication --advertise $(hostname -i):4000 consul://$CONSUL0:8500
On worker0 and worker1:
docker run -d --name=worker swarm join --advertise=$(hostname -i):2375 consul://$CONSUL0:8500/

docker-machine create node without tls verification

When I create a node with docker-machine
docker-machine create -d virtualbox node1
it is created with tls verification enabled for docker deamon which made things a bit more of a hassle than normal for swarm.
I want to create a node with docker-machine without tls verification for testing purpose.
I tried with:
docker-machine create -d virtualbox --engine-tls false node1
and
docker-machine create -d virtualbox --engine-tls-verify false node1
and
docker-machine create -d virtualbox --engine-opt-tls false node1
I use commands below:
docker-machine create -d virtualbox --engine-env DOCKER_TLS=no node1
And then ssh to the node to execute docker commands:
docker-machine ssh node1
$ docker info
try:
docker-machine create -d virtualbox --engine-opt tlsverify=false node1
and after running:
eval "$(docker-machine env node1)"
run:
unset DOCKER_TLS_VERIFY
This worked best for me:
docker-machine create -d virtualbox --engine-env DOCKER_TLS=no --engine-opt host=tcp://0.0.0.0:2375 node1
This way it binds to 2375 in addition to 2376. 2375 is the tradition for non-tls daemons.

Resources