Docker Swarm: Apparent inconsistency

I'm using docker 1.13.1 on CentOS 7. I have created a swarm having a leader and two workers. Here are the nodes:
[root@inf-jenkins02-prd ~]# docker node ls
ID                           HOSTNAME           STATUS  AVAILABILITY  MANAGER STATUS
jfyycwch6l1rdarc9j7hd69dg    inf-jenkins04-prd  Ready   Active
jy182rao4rnm3vn1uhm2ghslt    inf-jenkins03-prd  Ready   Active
xuc8l7ra249y7e9s7u778g46l *  inf-jenkins02-prd  Ready   Active        Leader
Now, I want to see the details of each node:
[root@inf-jenkins02-prd ~]# docker node ps inf-jenkins02-prd
ID  NAME  IMAGE  NODE  DESIRED STATE  CURRENT STATE  ERROR  PORTS
[root@inf-jenkins02-prd ~]#
The command is run on the leader, of course, but nothing is displayed. This seems like a major inconsistency, as there are no running containers:
[root@inf-jenkins02-prd ~]# docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
[root@inf-jenkins02-prd ~]#
and also:
[root@inf-jenkins02-prd ~]# docker container ls
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
[root@inf-jenkins02-prd ~]#
I created the cluster with Ansible, but I don't think that detail is relevant. Does anyone know what might be wrong here?

The commands you are using to see the state of the nodes are not the ones you should be using. To get details on your nodes, try something like:
docker node inspect
or
docker system info
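For example (a quick sketch; --pretty renders a human-readable summary instead of raw JSON, and the hostname comes from your docker node ls output above):
docker node inspect --pretty inf-jenkins02-prd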
The command you were using, docker node ps, lists the swarm tasks (the containers backing your services, from docker/swarm's perspective) scheduled on a node, so it prints nothing while no services are running.
Just for the sake of testing, you could run a container (detached, so your shell stays free):
docker container run -d --name test nginx
and then execute your
docker container ls
Hope this helped!
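And for docker node ps to show anything, there has to be an actual swarm service, since it lists tasks rather than standalone containers. A minimal sketch:
docker service create --name web --replicas 2 nginx
docker node ps inf-jenkins02-prd
# now lists any web.N tasks scheduled on that node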

Related

Is there a way to set up a test docker swarm on a single machine?

I am trying to set up a docker swarm on WSL2 for testing purposes. I want to know if it is possible to have a swarm with multiple "dummy" nodes on a single machine.
Here are the two ways that I tried:
Run multiple WSL instances as suggested here.
PS C:\Users\jdu> wsl -l
Windows Subsystem for Linux distributions:
Ubuntu3
Ubuntu
Ubuntu2
Docker is installed and running in each WSL instance. I managed to initialize a swarm on Ubuntu and let Ubuntu2 and Ubuntu3 join as workers.
On Ubuntu
$ docker swarm init
Swarm initialized: current node (hude19jo7t9dqpe0akg55ipmy) is now a manager.
On Ubuntu2
$ docker swarm join --token SWMTKN-1-xxxxxxxxx-xxxxxxxxx 192.168.189.5:2377 --listen-addr 0.0.0.0:12377
This node joined a swarm as a manager.
Then if I check on Ubuntu
$ docker node ls
ID                           HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
hude19jo7t9dqpe0akg55ipmy *  laptop-ebc155  Ready   Active        Leader          20.10.21
ozeq43yukgfbltjnfya0tlx08    laptop-ebc155  Ready   Active        Reachable       20.10.20
Inspired by the ideas here, I tried docker-in-docker containers, i.e. deploying multiple docker instances on a single WSL instance.
# Init Swarm master
docker swarm init
# Get join token:
SWARM_TOKEN=$(docker swarm join-token -q worker)
echo $SWARM_TOKEN
# Get Swarm master IP (Docker for Mac xhyve VM IP)
SWARM_MASTER_IP=$(docker info | grep -w 'Node Address' | awk '{print $3}')
echo $SWARM_MASTER_IP
DOCKER_VERSION=dind
# set up Docker-in-Docker containers and join them to the swarm
docker run -d --privileged --name worker-1 --hostname=worker-1 -p 12377:2377 docker:${DOCKER_VERSION}
docker exec worker-1 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
docker run -d --privileged --name worker-2 --hostname=worker-2 -p 22377:2377 docker:${DOCKER_VERSION}
docker exec worker-2 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
docker run -d --privileged --name worker-3 --hostname=worker-3 -p 32377:2377 docker:${DOCKER_VERSION}
docker exec worker-3 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
After that
$ docker node ls
ID                           HOSTNAME       STATUS  AVAILABILITY  MANAGER STATUS  ENGINE VERSION
s371tmygu9h640xfosn6kyca4 *  laptop-ebc155  Ready   Active        Leader          20.10.21
w1ina9ttvje4hn6r13p3gzbge    worker-1       Ready   Active                        20.10.20
m8mqky6jchjao01nz8t5e392a    worker-2       Ready   Active                        20.10.20
n29afhbb090tlyn9p0byga9au    worker-3       Ready   Active                        20.10.20
To test the above two swarm setups, I used a very simple compose file as suggested by the official docs. As you might expect, the two setups didn't work that well :/
If MongoDB and MongoExpress are deployed on different nodes, both swarm setups show the same error: MongoNetworkError: failed to connect to server [mongo:27017] on first connect. My understanding of this error is that MongoExpress cannot reach MongoDB under mongo:27017, which looks like a problem with Docker's internal DNS. Can someone help me out? Or feel free to tell me not to try these single-machine multi-node ideas anymore :D I'd appreciate any help!
I just tried the same two exercises :)
Approach 1 - swarm nodes in WSL instances
I think it is currently impossible because of the WSL2 design, see https://github.com/microsoft/WSL/issues/4304. WSL2 instances in fact share their network setup - IP, interfaces, network namespaces, and so on. Every change made in one of them is immediately visible in all the others, and this conflicts with the virtual interfaces and namespaces created by docker swarm nodes when they start up.
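You can observe the sharing yourself from the Windows host (a quick check, using the distribution names from the wsl -l output above):
PS C:\Users\jdu> wsl -d Ubuntu ip addr show eth0
PS C:\Users\jdu> wsl -d Ubuntu2 ip addr show eth0
# both print the same IP address and interface state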
I tried configuring multiple IP addresses on the eth0 interface, so that each node can have its own (like here), and then used the --advertise-addr and --listen-addr options in the docker swarm init and docker swarm join commands. Still, I'm getting this error in the dockerd logs:
moving interface ov-001000-yis5e to host ns failed, invalid argument, after config error error setting interface \"ov-001000-yis5e\" IP to 10.0.0.1/24: cannot program address 10.0.0.1/24 in sandbox interface because it conflicts with existing route {Ifindex: 4 Dst: 10.0.0.0/24 Src: 10.0.0.1 Gw: <nil> Flags: [] Table: 254}"
I believe docker swarm hits a problem here because it already sees the master's interfaces when it tries to set up routing mesh networking for the worker. All because master and worker share the network config.
Approach 2 - swarm nodes as docker containers (docker-in-docker)
But I got approach no. 2 working with just a small change in the swarm init command:
# advertise swarm on default bridge network
docker swarm init --advertise-addr 172.17.0.1
For me, the standard docker swarm init selected the eth0 address by default, which only worked for communication from dind -> wsl, but not the other way round.
Another, but probably unrelated, problem was that I could not access services/stacks run this way from the Windows host. This seems to be a WSL bug, and luckily there is a workaround.
One last hint about this mongo stack is... patience. The stack consists of 2 services: mongo (the database) and mongo-express (the client). The mongo image is a lot bigger (~600MB) while mongo-express is just ~135MB. The mongo-express image will be downloaded faster, and it will be recreated by swarm multiple times before mongo has even started. Note also that docker images are downloaded independently for each worker in this setup, so rebalancing may also take some time.
I found these commands useful to see what is really happening:
# overview of services
docker service ls
# containers in each swarm service
docker service ps $(docker service ls --format '{{.Name}}')
# images in each dind worker
for i in $(seq "${NUM_WORKERS}"); do
  docker exec worker-${i} docker images
done
# containers in each dind worker
for i in $(seq "${NUM_WORKERS}"); do
  docker exec worker-${i} docker ps -a
done
Full listing of the commands necessary to get a working docker swarm using dind:
docker swarm init --advertise-addr docker0
SWARM_TOKEN=$( docker swarm join-token -q worker)
echo $SWARM_TOKEN
SWARM_MASTER_IP=$( docker info 2>&1 | grep -w 'Node Address' | awk '{print $3}')
echo $SWARM_MASTER_IP
DOCKER_VERSION=20.10.12-dind
NUM_WORKERS=3
# Run NUM_WORKERS workers with SWARM_TOKEN
for i in $(seq "${NUM_WORKERS}"); do
  docker run -d --privileged --name worker-${i} --hostname=worker-${i} docker:${DOCKER_VERSION}
  sleep 5
  docker exec worker-${i} docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
done
# Setup the visualizer
docker service create \
  --detach=true \
  --name=viz \
  --publish=8000:8080/tcp \
  --constraint=node.role==manager \
  --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  dockersamples/visualizer
####### play with mongo
mkdir mongodemo && cd mongodemo
wget https://raw.githubusercontent.com/docker-library/docs/f6c9b596064e2eed9c3b6ac75bea606cb6d94099/mongo/stack.yml
docker stack deploy -c stack.yml mongo
# from windows:
# mongo-express will be available under <eth0>:8081
# visualizer under <eth0>:8000
ip -4 addr | grep eth0
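If you still hit the MongoNetworkError from the question, it can help to test swarm's service DNS from inside the mongo-express task. A sketch, assuming the stack was deployed under the name mongo (so the service is called mongo_mongo-express) and that nslookup is available in the image:
# find which node runs the mongo-express task
docker service ps mongo_mongo-express
# on that node (for a dind worker: prefix with docker exec worker-N), resolve the service name
docker exec $(docker ps -q --filter name=mongo_mongo-express) nslookup mongo
# a working setup returns the service VIP on the overlay network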

Creating a docker service is not also creating a docker container

I am trying to create a docker container in a swarm. I am expecting to see the service when I execute "docker service ls", and to see a container running when I execute "docker ps". I see the service but not the container.
[root@docker01-staging dcater]# docker service create --name dbcservice alpine ping 127.0.0.1
lm2b7g3kbnbn11m33y15bplqf
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
[root@docker01-staging dcater]# docker service ls
ID            NAME        MODE        REPLICAS  IMAGE          PORTS
maad961bcum4  dbcservice  replicated  1/1       alpine:latest
[root@docker01-staging dcater]# docker ps --filter name=dbcservice
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
Any idea what I am missing?
I figured out the answer (roughly). I'm not sure I have the terminology right, but docker01-staging is the management node. I checked docker02-staging, and that's actually where the process is running:
[root@docker02-staging dcater]# docker ps --filter name=dbcservice
CONTAINER ID  IMAGE          COMMAND           CREATED         STATUS         PORTS  NAMES
3f30b6fa3d40  alpine:latest  "ping 127.0.0.1"  56 minutes ago  Up 56 minutes         dbcservice.1.fke9ljd8brpwzhklzqy0agt1r
docker ps is an engine-level command: it talks to the docker daemon running on the node where it is executed. In the context of Docker Swarm, docker service is a swarm-level command that queries the swarm state. Thus docker ps must be executed on each node in the swarm to see that node's running containers.
There is also docker node ps, a swarm-level command that shows the containers running on swarm nodes, addressed by swarm node name. Use docker node ls to list the node names.
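For example, from a manager you can list the tasks on every node in one go (docker node ps accepts multiple node IDs):
docker node ps $(docker node ls -q)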

How to connect a Docker container to a local network

I'm running Docker on a Raspberry Pi 3 using Raspbian (Jessie). I want to access my containers from other PCs on the same network. Can someone explain how I can make containers show up in my router's device list as independent machines?
Port forwarding doesn't work for us because we are using a few ports, and if we need to add some new function, we must commit the container, delete it, create a new container from the committed image, and add the new ports to forward.
Maybe you can try docker ps -a to check the containers' states and published ports.
This is the result on my Linux machine (IP address 135.251.247.21):
sdn@sdn-KVM:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
be8c8289fe20 135.249.45.113:9005/onos:1.7.004 "./bin/onos-service" 3 weeks ago Up 7 hours 0.0.0.0:6633->6633/tcp, 6653/tcp, 0.0.0.0:8101->8101/tcp, 9876/tcp, 0.0.0.0:9191->8181/tcp onos-docker
I can access this container from a remote machine via SSH:
ssh -p 8101 karaf@135.251.247.21
If you cannot access your container from a remote machine, you can try to access it on your local machine by running docker exec -it xxx bash, where xxx is the container name.
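That said, if the goal is reaching containers from other PCs on the network, publishing their ports when the container is created is the usual route. A minimal sketch with a hypothetical nginx container (host port 8080 chosen arbitrarily):
docker run -d --name web -p 8080:80 nginx
# other machines on the LAN can now reach it at <raspberry-pi-ip>:8080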

client access to docker swarm

I have a docker swarm cluster consisting of one manager and one worker node. Then I configured a client on my laptop (TLS and DOCKER_HOST) to get access to this cluster.
When I run docker ps I see only containers from the worker node (and not even all of the worker node's containers (!)).
For example, from my client:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a129d9402aeb progrium/consul "/bin/start -rejoi..." 2 weeks ago Up 22 hours IP:8300-8302->8300-8302/tcp, IP:8400->8400/tcp, IP:8301-8302->8301-8302/udp, 53/tcp, 53/udp, IP:8500->8500/tcp, IP:8600->8600/udp hadoop1103/consul-agt2-hadoop
And when I run docker ps on the worker node:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4fec7fbf0b00 swarm "/swarm join --advert" 16 hours ago Up 16 hours 2375/tcp join
a129d9402aeb progrium/consul "/bin/start -rejoin -" 2 weeks ago Up 22 hours 0.0.0.0:8300-8302->8300-8302/tcp, 0.0.0.0:8400->8400/tcp, 0.0.0.0:8301-8302->8301-8302/udp, 53/tcp, 53/udp, 0.0.0.0:8500->8500/tcp, 0.0.0.0:8600->8600/udp consul-agt2-hadoop
So two questions: why doesn't docker ps show containers from the manager machine, and why not all containers from the worker node?
Classic swarm (run as a container) by default hides the swarm management containers from docker ps output. You can show these containers with a docker ps -a command instead.
This behavior may be documented elsewhere, but the one place I've seen it documented is in the API differences docs:
GET "/containers/json"
Containers started from the swarm official image are hidden by default, use all=1 to display them.
The all=1 API parameter is the equivalent of docker ps -a on the CLI.
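A quick way to see the difference at the API level (a sketch, assuming the daemon listens on the default Unix socket; with a remote swarm manager you would target your DOCKER_HOST instead):
# equivalent of docker ps - swarm management containers hidden
curl --unix-socket /var/run/docker.sock http://localhost/containers/json
# equivalent of docker ps -a - all=1 makes them visible
curl --unix-socket /var/run/docker.sock 'http://localhost/containers/json?all=1'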

Docker swarm-manager displays old container information

I am using docker-machine with Google Compute Engine (GCE) to run a docker swarm cluster. I created a swarm successfully with 2 nodes (swnd-01 & swnd-02) in the cluster. I created a daemon container like this in the swarm-manager environment:
docker run -d ubuntu /bin/bash
docker ps shows the container running on swnd-01. When I tried executing a command over the container using docker exec, I got the error that the container is not running, while docker ps showed otherwise. I ssh'ed into swnd-01 via docker-machine and came to know that the container had exited as soon as it was created. I tried the docker run command inside swnd-01, but it still exits. I don't understand the behavior. Any suggestions will be thankfully received.
The reason it exits is that the /bin/bash command completes, and a Docker container only runs as long as its main process does (if you run such a container with the -it flags, the process will keep running while the terminal is attached).
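For example (a quick sketch; sleep infinity is just a convenient long-lived main process):
# exits immediately: bash completes at once without a terminal attached
docker run -d ubuntu /bin/bash
# keeps running: the main process never ends
docker run -d ubuntu sleep infinity
# also keeps running: -t allocates a tty, so bash stays alive even when detached
docker run -dit ubuntu /bin/bash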
As to why the swarm manager thought the container was still running, I'm not sure. I guess there is a short delay while Swarm updates the status of everything.
