Running LINSTOR in Docker Swarm - docker-swarm

I am currently trying out LINSTOR in my lab, aiming to separate compute and storage nodes: the storage node runs LINSTOR, while the compute node runs Docker Swarm or K8s. For this test I have set up one LINSTOR node and one Docker Swarm node. The LINSTOR node is configured successfully.
Linstor Node
DRBD 9.1.2
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ instance-2 ┊ DISKLESS ┊ ┊ ┊ ┊ False ┊ Ok ┊ ┊
┊ pd-std-pool ┊ instance-2 ┊ LVM_THIN ┊ vg/lvmthinpool ┊ 199.80 GiB ┊ 199.80 GiB ┊ True ┊ Ok ┊ ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
# linstor node list
╭─────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ State ┊
╞═════════════════════════════════════════════════════════════╡
┊ instance-2 ┊ COMBINED ┊ 10.100.0.29:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────╯
Docker Node
On another node, I have Docker Swarm running. This node does not have any DRBD tools installed, such as drbd, drbdtop, drbdsetup, etc. It runs a minimal installation that is just sufficient to run Docker, to keep it lightweight. The Docker version is 20.10.3. I have also installed the LINSTOR Docker volume plugin, which is written in Go.
Below are my /etc/linstor/docker-volume.conf and the Docker volume plugin installed on my Docker Swarm node:
$ docker plugin ls
ID NAME DESCRIPTION ENABLED
6300029b3178 linbit/linstor-docker-volume:latest Linstor volume plugin for Docker true
$ cat /etc/linstor/docker-volume.conf
[global]
controllers = linstor://instance-2
fs = xfs
I got an error when trying to use the volume created by LINSTOR. I have confirmed that I can ping the LINSTOR controller at instance-2 and that all ports are open in the firewall. Here are the error and the steps to reproduce it:
$ docker volume create -d linbit/linstor-docker-volume:latest --name=first --opt size=20 --opt replicas=1 --opt storage-pool=pd-std-pool
$ docker volume ls
DRIVER VOLUME NAME
local 64f864db31990baa6b790dde34513a7f6fc466ca0c5e72ffab7024365a9f45da
linbit/linstor-docker-volume:latest first
$ docker volume inspect first
[
{
"CreatedAt": "0001-01-01T00:00:00Z",
"Driver": "linbit/linstor-docker-volume:latest",
"Labels": {},
"Mountpoint": "",
"Name": "first",
"Options": {
"replicas": "1",
"size": "20",
"storage-pool": "pd-std-pool"
},
"Scope": "global"
}
]
$ docker run --rm -it -v first:/data alpine sh
docker: Error response from daemon: error while mounting volume '': VolumeDriver.Mount: 404 Not Found.
ERRO[0000] error waiting for container: context canceled
Questions
Do I need to install drbd-utils on my Docker Swarm node for this to work?
What does the error VolumeDriver.Mount: 404 Not Found mean?

LINSTOR manages storage in a cluster of nodes, replicating disk space inside an LVM or ZFS volume (or a bare partition, I'd say) by using DRBD (Distributed Replicated Block Device) to replicate data across the nodes, as per the official docs:
"LINSTOR is a configuration management system for storage on Linux systems. It manages LVM logical volumes and/or ZFS ZVOLs on a cluster of nodes. It leverages DRBD for replication between different nodes and to provide block storage devices to users and applications. It manages snapshots, encryption and caching of HDD backed data in SSDs via bcache."
So I'd say yes, you really need to have the driver (and DRBD) on every node on which you want to use it (I did see Docker's storage plugin try to mount the DRBD volume locally).
However, you do not necessarily need to have the storage space itself on the compute node, since you can mount a diskless DRBD resource backed by volumes that are replicated on separate nodes. So your idea should work, unless there is some bug in the driver itself that I haven't discovered yet: your compute node(s) need to be registered as diskless nodes for all the required pools (I haven't tried this, but I remember reading it's not only possible but recommended for some types of data migrations).
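A minimal sketch of what that setup could look like, assuming a compute node named swarm-node-1 at 10.100.0.30 (both hypothetical) and that LINBIT packages are available for your distribution:
# on the Docker Swarm node: DRBD kernel module, userland tools and the LINSTOR satellite
apt install drbd-dkms drbd-utils linstor-satellite
systemctl enable --now linstor-satellite
# on the LINSTOR controller: register the compute node; diskless resources can then use DfltDisklessStorPool
linstor node create swarm-node-1 10.100.0.30 --node-type satellite
linstor node list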
Of course, if you don't have more than one storage node, you don't gain much from using LINSTOR/DRBD (a node or disk failure will leave you diskless). My use case was replicated storage across different servers in different datacenters, so that the next time one burns to a crisp 😅 I can have my data and containers running again after minutes instead of several days...

Related

Is there a way to setup a test docker swarm on a single machine?

I am trying to set up a Docker Swarm on WSL2 for testing purposes. I want to know if it is possible to have a swarm with multiple "dummy" nodes on a single machine.
Here are the two ways that I tried:
Run multiple WSL instances as suggested here.
PS C:\Users\jdu> wsl -l
Windows Subsystem for Linux distributions:
Ubuntu3
Ubuntu
Ubuntu2
Docker is installed and running in each WSL instance, so I managed to initialize a swarm on Ubuntu and have Ubuntu2 and Ubuntu3 join as workers.
On Ubuntu
$ docker swarm init
Swarm initialized: current node (hude19jo7t9dqpe0akg55ipmy) is now a manager.
On Ubuntu2
$ docker swarm join --token SWMTKN-1-xxxxxxxxx-xxxxxxxxx 192.168.189.5:2377 --listen-addr 0.0.0.0:12377
This node joined a swarm as a manager.
Then if I check on Ubuntu
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
hude19jo7t9dqpe0akg55ipmy * laptop-ebc155 Ready Active Leader 20.10.21
ozeq43yukgfbltjnfya0tlx08 laptop-ebc155 Ready Active Reachable 20.10.20
Inspired by the ideas here, I have tried docker-in-docker containers, i.e. I deploy multiple Docker instances on a single WSL instance.
# Init Swarm master
docker swarm init
# Get join token:
SWARM_TOKEN=$(docker swarm join-token -q worker)
echo $SWARM_TOKEN
# Get Swarm master IP (Docker for Mac xhyve VM IP)
SWARM_MASTER_IP=$(docker info | grep -w 'Node Address' | awk '{print $3}')
echo $SWARM_MASTER_IP
DOCKER_VERSION=dind
# setup deploy Docker-in-Docker containers and join them to a swarm
docker run -d --privileged --name worker-1 --hostname=worker-1 -p 12377:2377 docker:${DOCKER_VERSION}
docker exec worker-1 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
docker run -d --privileged --name worker-2 --hostname=worker-2 -p 22377:2377 docker:${DOCKER_VERSION}
docker exec worker-2 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
docker run -d --privileged --name worker-3 --hostname=worker-3 -p 32377:2377 docker:${DOCKER_VERSION}
docker exec worker-3 docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
After that
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
s371tmygu9h640xfosn6kyca4 * laptop-ebc155 Ready Active Leader 20.10.21
w1ina9ttvje4hn6r13p3gzbge worker-1 Ready Active 20.10.20
m8mqky6jchjao01nz8t5e392a worker-2 Ready Active 20.10.20
n29afhbb090tlyn9p0byga9au worker-3 Ready Active 20.10.20
To test the above two swarm setups, I use a very simple compose file as suggested by the official docs. As you can expect, these two swarm setups didn't work that well :/
If MongoDB and MongoExpress are deployed on different nodes, both swarm setups show the same error MongoNetworkError: failed to connect to server [mongo:27017] on first connect. My understanding of this error is that MongoExpress cannot reach MongoDB at mongo:27017, which looks like a problem with Docker's internal DNS. Can someone help me out? Or just feel free to tell me not to pursue this single-machine, multi-node idea anymore :D I would appreciate any help!
I just tried the same two exercises :)
Approach 1 - swarm nodes in WSL instances
I think it is currently impossible because of the WSL2 design, see https://github.com/microsoft/WSL/issues/4304. WSL2 instances in fact share the network setup - IPs, interfaces, network namespaces, and so on. Every change made in one of them is immediately visible in all the others, and this conflicts with the virtual interfaces and namespaces created by Docker Swarm nodes when they start up.
I tried configuring multiple IP addresses on the eth0 interface, so that each node could have its own (like here), and then used the --advertise-addr and --listen-addr options in the docker swarm init and docker swarm join commands. Still, I'm getting this error in the dockerd logs:
moving interface ov-001000-yis5e to host ns failed, invalid argument, after config error error setting interface \"ov-001000-yis5e\" IP to 10.0.0.1/24: cannot program address 10.0.0.1/24 in sandbox interface because it conflicts with existing route {Ifindex: 4 Dst: 10.0.0.0/24 Src: 10.0.0.1 Gw: <nil> Flags: [] Table: 254}"
I believe docker swarm hits a problem here because it already sees the master's interfaces when it tries to set up the routing-mesh networking for the worker - all because master and worker share the network config.
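For reference, a sketch of what that multi-IP attempt looked like (the 192.168.189.x addresses are assumptions, picked to match the join command above):
# give this WSL instance its own address on the shared eth0 (sketch; addresses are assumed)
sudo ip addr add 192.168.189.101/24 dev eth0
# then try to bind this swarm node to that address only
docker swarm init --advertise-addr 192.168.189.101 --listen-addr 192.168.189.101:2377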
Approach 2 - swarm nodes as docker containers (docker-in-docker)
But I got approach no. 2 working with just a small change to the swarm init command:
# advertise swarm on default bridge network
docker swarm init --advertise-addr 172.17.0.1
For me, the standard docker swarm init selected the eth0 address by default, which only worked for communication from dind -> WSL, but not the other way round.
Another, but probably unrelated, problem was that I could not access services/stacks started this way from the Windows host. This seems to be a WSL bug, and luckily there is a workaround.
One last hint about this mongo stack is... patience. The stack consists of 2 services: mongo - the database - and mongo-express - the client. The mongo image is a lot bigger (~600 MB) while mongo-express is only ~135 MB. The mongo-express image will be downloaded faster and will be recreated by the swarm multiple times before mongo has even started. Note also that the images are downloaded independently by each worker in this setup, so rebalancing may take some time.
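If you want to shorten that wait, one possible (untested) shortcut is to pre-pull both images into every dind worker before deploying the stack:
# pre-pull the stack's images in each dind worker (sketch; assumes the NUM_WORKERS variable from the full listing below)
for i in $(seq "${NUM_WORKERS}"); do
  docker exec worker-${i} docker pull mongo
  docker exec worker-${i} docker pull mongo-express
done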
I found these commands useful to see what is really happening:
# overview of services
docker service ls
# containers in each swarm service
docker service ps $(docker service ls --format {{.Name}})
# images in each dind worker
for i in $(seq "${NUM_WORKERS}"); do
  docker exec worker-${i} docker images
done
# containers in each dind worker
for i in $(seq "${NUM_WORKERS}"); do
  docker exec worker-${i} docker ps -a
done
Full listing of commands necessary to get working docker swarm using dind:
docker swarm init --advertise-addr docker0
SWARM_TOKEN=$( docker swarm join-token -q worker)
echo $SWARM_TOKEN
SWARM_MASTER_IP=$( docker info 2>&1 | grep -w 'Node Address' | awk '{print $3}')
echo $SWARM_MASTER_IP
DOCKER_VERSION=20.10.12-dind
NUM_WORKERS=3
# Run NUM_WORKERS workers with SWARM_TOKEN
for i in $(seq "${NUM_WORKERS}"); do
  docker run -d --privileged --name worker-${i} --hostname=worker-${i} docker:${DOCKER_VERSION}
  sleep 5
  docker exec worker-${i} docker swarm join --token ${SWARM_TOKEN} ${SWARM_MASTER_IP}:2377
done
# Setup the visualizer
docker service create \
--detach=true \
--name=viz \
--publish=8000:8080/tcp \
--constraint=node.role==manager \
--mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
dockersamples/visualizer
####### play with mongo
mkdir mongodemo && cd mongodemo
wget https://raw.githubusercontent.com/docker-library/docs/f6c9b596064e2eed9c3b6ac75bea606cb6d94099/mongo/stack.yml
docker stack deploy -c stack.yml mongo
# from windows:
# mongo will be available under <eth0>:8081
# visualizer under <eth0>:8000
ip -4 addr | grep eth0

Error response from daemon: bridge is a pre-defined network and cannot be removed

CentOS 7
Docker 20.10
I want to delete all networks.
docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
As you can see, there are no containers; I removed them before.
I tried this:
docker network ls
NETWORK ID NAME DRIVER SCOPE
1b6758d38df3 bridge bridge local
89dea066d590 host host local
8e235018309e none null local
And this:
docker network rm 1b6758d38df3
Error response from daemon: bridge is a pre-defined network and cannot be removed
P.S. The folder /var/lib/docker is empty.
Those are the system networks included in every Docker installation; they are not like user-defined networks and cannot be removed.
From the docs for the docker network prune command:
Note that system networks such as bridge, host, and none will never be pruned
From the Network containers tutorial page:
Every installation of the Docker Engine automatically includes three default networks.
[...]
The network named bridge is a special network. Unless you tell it otherwise, Docker always launches your containers in this network.
This would mean that removing those networks would break some of Docker's networking features.
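If the goal is simply to clean up, a short sketch of what can be removed (only unused, user-defined networks are affected; bridge, host and none always remain):
# remove all unused user-defined networks; the three system networks are never pruned
docker network prune
# or remove one specific user-defined network by name or ID (my-app-net is a hypothetical example)
docker network rm my-app-net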
For what purpose do you want to remove/delete the default networks provided by Docker? Please share your use case so someone from the community can guide you accordingly.
bridge, host and none are default, pre-defined networks. They are created during the installation of Docker.
Bridge - all containers started without the --network option are created in the bridge network only.
To verify this you can run the following command:
docker run -it --rm --name=default-bridge-container1 busybox
As the above command does not have the --network option, it will create the container default-bridge-container1 in the bridge network. To verify this, run:
docker network inspect bridge
Under the Containers section of the inspect output, you will see the container default-bridge-container1 with an IP assigned to it from the bridge subnet.
Host - this option tells the container to use the underlying host network.
None - a container started with --network=none runs in isolation; it has no inbound or outbound network access.
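To illustrate the other two modes, a quick sketch (the image choices are arbitrary):
# host: the container shares the host's network stack, so port 80 is exposed without any -p mapping
docker run -d --rm --network=host --name host-demo nginx
# none: the container gets only a loopback interface and has no external connectivity
docker run --rm --network=none busybox ip addr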

How to share Minikube instance on both Docker for Windows and WSL2?

How to share a Minikube instance amongst Windows/Windows WSL?
In Windows WSL minikube start fails:
😄 minikube v1.22.0 on Ubuntu 20.04
✨ Using the docker driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
🏃 Updating the running docker "minikube" container ...
🤦 StartHost failed, but will try again: provision: Temporary Error: NewSession: new client: new client: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
🏃 Updating the running docker "minikube" container ...
😿 Failed to start docker container. Running "minikube delete" may fix it: provision: Temporary Error: NewSession: new client: new client: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
❌ Exiting due to IF_SSH_AUTH: Failed to start host: provision: Temporary Error: NewSession: new client: new client: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
💡 Suggestion: Your host is failing to route packets to the minikube VM. If you have VPN software, try turning it off or configuring it so that it does not re-route traffic to the VM IP. If not, check your VM environment routing options.
📘 Documentation: https://minikube.sigs.k8s.io/docs/handbook/vpn_and_proxy/
🍿 Related issue: https://github.com/kubernetes/minikube/issues/3930
The above works in Windows:
😄 minikube v1.17.1 on Microsoft Windows 10 Pro 10.0.19042 Build 19042
🎉 minikube 1.22.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.22.0
💡 To disable this notice, run: 'minikube config set WantUpdateNotification false'
✨ Using the docker driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🏃 Updating the running docker "minikube" container ...
🐳 Preparing Kubernetes v1.20.2 on Docker 20.10.2 ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, dashboard, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Questions:
Why would it fail on Linux? Note: I originally installed Minikube on Windows.
Is there a way to share the Minikube environment?
Second, I want to share one Docker context. docker context ls:
Windows:
Returns:
NAME            TYPE   DESCRIPTION                               DOCKER ENDPOINT                             KUBERNETES ENDPOINT                 ORCHESTRATOR
default *       moby   Current DOCKER_HOST based configuration   npipe:////./pipe/docker_engine              https://127.0.0.1:53873 (default)   swarm
desktop-linux   moby                                             npipe:////./pipe/dockerDesktopLinuxEngine
Windows WSL (Ubuntu):
docker context ls
NAME            TYPE   DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT                 ORCHESTRATOR
default *       moby   Current DOCKER_HOST based configuration   unix:///var/run/docker.sock   https://127.0.0.1:51967 (default)   swarm
desktop-linux   moby
Questions:
What is default?
What is desktop-linux?
Which one is recommended to fully utilize the performance boost of WSL?
Third, I want to share one Kubernetes context. kubectl config get-contexts:
Windows:
CURRENT   NAME                  CLUSTER          AUTHINFO         NAMESPACE
          docker-desktop        docker-desktop   docker-desktop
          kubernetes/REDACTED   kubernetes       REDACTED
*         minikube              minikube         minikube         default
Windows WSL (Ubuntu):
CURRENT   NAME             CLUSTER          AUTHINFO         NAMESPACE
          docker-desktop   docker-desktop   docker-desktop
*         minikube         minikube         minikube
Questions:
What is docker-desktop?
Can I safely delete docker-desktop?
I was able to synchronize Windows and WSL by copying the configuration files:
mkdir ~/.kube && cp /mnt/c/Users/[USERNAME]/.kube/config ~/.kube
kubectl config use-context docker-for-desktop # Select the Windows context
kubectl cluster-info # Check if it works
Ref: https://devkimchi.com/2018/06/05/running-kubernetes-on-wsl/
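To confirm that WSL now points at the same cluster, a quick check (sketch; assumes the copied config is reachable from WSL):
# list the contexts WSL now sees and switch to the minikube context created on Windows
kubectl config get-contexts
kubectl config use-context minikube
kubectl get nodes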

Docker Swarm: Apparent inconsistency

I'm using Docker 1.13.1 on CentOS 7. I have created a swarm with a leader and two workers. Here are the nodes:
[root@inf-jenkins02-prd ~]# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
jfyycwch6l1rdarc9j7hd69dg inf-jenkins04-prd Ready Active
jy182rao4rnm3vn1uhm2ghslt inf-jenkins03-prd Ready Active
xuc8l7ra249y7e9s7u778g46l * inf-jenkins02-prd Ready Active Leader
Now, I want to see the details of each node:
[root@inf-jenkins02-prd ~]# docker node ps inf-jenkins02-prd
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
[root@inf-jenkins02-prd ~]#
The command is run on the leader, of course, but nothing is displayed. This seems like a major inconsistency, as there are no running containers:
[root@inf-jenkins02-prd ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[root@inf-jenkins02-prd ~]#
and also:
[root@inf-jenkins02-prd ~]# docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[root@inf-jenkins02-prd ~]#
I have created the cluster with Ansible, but I don't think that detail is relevant. Does anyone know what might be wrong here?
The commands you are using to see the state of the nodes are not the ones you should be using. To get some details on your nodes you can try things like:
docker node inspect
or
docker system info
The commands you are using are suitable for when you want to list the "services" (from the Docker Swarm perspective, their tasks) that are currently running on a node.
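For example, a minimal sketch (the service name web is hypothetical) - docker node ps only lists swarm tasks, so it stays empty until a service is scheduled on the node:
# create a swarm service so there are tasks to list
docker service create --name web --replicas 3 nginx
# now list the tasks scheduled on this node
docker node ps inf-jenkins02-prd
# human-readable node-level details
docker node inspect inf-jenkins02-prd --pretty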
Just for the sake of testing, you could try to run a container like
docker container run --name test nginx
and then execute your
docker container ls
Hope this helped!

The container from the worker node can't join the swarm's attachable network

I encountered a problem when using 'docker run' on a worker node. The scenario is as follows:
I have the following three VMs in my environment, and they are already in Swarm mode:
VM.1 -> Master node in the Swarm
VM.2 -> Worker node in the Swarm
VM.3 -> Worker node in the Swarm
and I've also created the overlay network in this environment via:
docker network create --attachable --driver overlay --subnet 10.10.0.0/16 --gateway 10.10.0.1 test-net
and the overlay network is created successfully
# docker network ls
NETWORK ID NAME DRIVER SCOPE
fc1b70304011 bridge bridge local
f9ca924c1a4d docker_gwbridge bridge local
ea8fc696d6f1 host host local
r311gaq7iobo ingress overlay swarm
bd08afac574d none null local
wb7vfpxzdkyt test-net overlay swarm
but once I use 'docker run' to start a container and let it join "test-net" from a worker node (VM.2 or VM.3), I encounter the following problem:
# docker run -itd --name=test --net=test-net kafka:latest
c0324e6c3a8720b291cfc3aa7980846348f7a4450381036927924c52d343f622
docker: Error response from daemon: error creating external connectivity network: cannot create network 246bb018a15a6641a9cb26afec30c62eb4714816cfc0a307786c8a209a2418e6 (docker_gwbridge): conflicts with network 0093ca50dcbcf729aeeae537f424727b674843312ef63ea647db48c7b0077e45 (docker_gwbridge): networks have same bridge name.
However, the same 'docker run' command works on the master node. I've googled this problem several times but still don't understand what is happening on the worker node...
Thanks for reading, and thanks for your help!
During my investigation I found that this issue is not 100% reproducible on other machines/distributions.
Some machines work fine with docker run -itd --name=test --net=test-net kafka:latest.
But if that does not work on a specific machine, you can try to run the container without --net first, and then use docker network connect --ip <ipaddress> <network> <container> to attach the specific network to your container, as sketched below.
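A minimal sketch of that workaround, reusing the names from the question (the 10.10.0.5 address is an arbitrary pick from the 10.10.0.0/16 subnet):
# start the container without attaching it to the overlay network
docker run -itd --name=test kafka:latest
# then connect it to the attachable overlay network, optionally pinning an IP address
docker network connect --ip 10.10.0.5 test-net test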
