Docker Windows master node "docker swarm init" causes worker nodes in same Virtual Network to no longer see the master node - docker-swarm

I have strange behaviour related to docker swarm mode on windows. What I have done:
Deployed two "Windows Server 2019 Datacenter with Containers - Gen1" virtual machines in Azure
Set up RDP access from my IP to the virtual machines
Ensured they are in the same virtual network and that their subnet is associated with it
Downloaded all windows updates
Used telnet to check if worker machine sees master by running "telnet 10.0.0.4 3389". This works.
Used telnet to check if master machine sees worker by running "telnet 10.0.0.5 3389". This works.
Ensured that Docker Swarm ports are open in the Windows Firewall on both machines: 4789 and 7946 (UDP) and 2377 and 7946 (TCP)
Initialized docker swarm mode on master node with the command: "docker swarm init --advertise-addr 10.0.0.4"
Checked that "docker node ls" lists the master as Ready
Immediately after this I tried "telnet 10.0.0.4 3389" from the worker node to see if the master is still accessible - it no longer works!
Not surprisingly, trying to join the docker swarm from the worker also fails with the usual "timeout" error
Since "telnet 10.0.0.4 3389" worked before the master node entered swarm mode but not after, it seems Docker on Windows is changing firewall priorities or rules, or switching the active network. Which is bonkers. I have not found a solution to this problem, which makes docker-for-windows unusable. Note: this problem only occurs in Azure. Using virtual machines in Exoscale and manually installing Docker with PowerShell scripts did not show the same issue, which makes me think the "Windows Server 2019 Datacenter with Containers - Gen1" image has some faulty configuration.
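The before/after telnet checks described above can be scripted; this is a minimal Python sketch using only the standard library (the host and port in the comment are the values from the question, treat them as placeholders for your own setup):

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This gives the same signal as a successful `telnet host port`:
    something is listening and the path is not blocked by a firewall.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From the question: run on the worker before and after `docker swarm init`
# on the master, e.g. tcp_reachable("10.0.0.4", 3389).
```

Running this from the worker immediately before and after `docker swarm init` would confirm whether the reachability change really coincides with entering swarm mode.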
Edit:
I can confirm that this behaviour does not appear when manually installing Docker on Windows Server 2019 Datacenter using the following guide: https://blog.sixeyed.com/getting-started-with-docker-on-windows-server-2019/ (sixeyed is a known Docker on Windows expert). In other words, the plain "Windows Server 2019 Datacenter" image works.

So, do not use the "Windows Server 2019 Datacenter with Containers - Gen1" image. Instead, use the standard image and follow standard docker-for-windows-server-2019 installation guides to get swarm mode working.

Cannot Connect to docker daemon. is docker daemon running?

I'm using Jenkins on Docker on my local Mac Machine.
And I'm running another Docker on an Ubuntu VirtualBox VM. So now there are 2 docker machines: one on my Mac and one on my Ubuntu VirtualBox machine. I'm running Jenkins on the Mac's Docker. Now, in the Jenkins pipeline, I want to build an image on my Ubuntu machine.
I've configured Jenkins docker cloud and in the docker host URL, it is connected to the ubuntu docker-machine.
But while building a new image, I'm getting the error. Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I've even tried adding ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:4243 -H unix:///var/run/docker.sock
to /lib/systemd/system/docker.service
and checked ps -aux.
Can someone please help me out?
help is appreciated.
First, personally, if I had a setup like that I would not bother connecting to the remote Docker; I would just install a Jenkins agent on the Ubuntu machine and have it talk to the Jenkins master.
But if you want to keep the setup you have right now, with Jenkins talking from inside one Docker host into another Docker host, I suggest looking into the following:
Your Jenkins master and the Ubuntu machine are very isolated; they might as well be on different machines, not even in the same room. Unix domain sockets, the ones identified by unix://*, are made for communicating within a single local OS kernel; trying to bridge them to a remote machine will lead to disaster.
So the only way Jenkins could communicate to the remote host is via a remote protocol like TCP. Most of the time when you install docker with the default settings it doesn't even listen to TCP at all, mostly for security reasons.
First thing you should do is configure the Docker daemon inside the Ubuntu machine to listen on a TCP port and accept connections from remote hosts. You can use netstat -nat to see if anything is listening on TCP 4243. When things are configured correctly you will see a line that starts with 0.0.0.0:4243, or something like that, in the netstat output.
Second you need to make sure your the firewalls/iptables/netfilter configuration on the Ubuntu host lets in connections from outside. A good test to try is to telnet <ubuntu-ip> 4243 from a terminal session on your Mac.
Then you need to make sure that Docker networking is configured correctly, so that connections from inside the container running Jenkins end up on your Ubuntu box. To test, exec -it into your Jenkins container and repeat the telnet test. On modern Linuxes telnet is usually not installed, so you can use curl -vvv, which will always end with an error; just look at the verbose output to see whether the error is because things cannot communicate (timeout, connection reset, etc.) or because your curl tried to talk HTTP to Docker and got a gibberish response. In the latter case you can consider things to be set up correctly.
Finally you need to tell Jenkins to communicate with the remote Docker daemon via TCP. On the command line that is usually given with the -H flag (e.g. docker -H tcp://<ubuntu-ip>:4243 ps) or via the DOCKER_HOST environment variable; in Jenkins it is the docker host URL in the cloud configuration.
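The curl-based probe described above can also be sketched in Python; this is a rough illustration (the Docker endpoint URL in the comment is a hypothetical example), whose only point is to distinguish "no network path at all" from "got some HTTP response, even an error":

```python
import urllib.error
import urllib.request

def probe(url, timeout=5.0):
    """Probe a URL the way `curl -vvv` is used in the answer above.

    Any HTTP response, even an error status, means the network path works;
    a connection-level failure (timeout, refused, reset) means it does not.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "reachable (HTTP %d)" % resp.status
    except urllib.error.HTTPError as e:
        # We got an HTTP response with an error status: the path is fine.
        return "reachable (HTTP %d)" % e.code
    except (urllib.error.URLError, OSError):
        # No response at all: firewall, routing, or nothing listening.
        return "unreachable"

# Hypothetical example, run from inside the Jenkins container:
#   probe("http://<ubuntu-ip>:4243/version")
```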
I've configured it by defining the slave label in my Jenkins Pipeline.
Jenkins agents run on a variety of different environments such as physical machines, virtual machines, Kubernetes clusters, and Docker images.
In your Jenkins Pipeline or Jenkinsfile, you have to set the agent according to what you're using, whether a Docker image or a virtual machine.
Also, thank you so much @Vlad; all the things you told me were really helpful.

Docker node level load balancing not working

I have two laptops, Ubuntu-14 and Mac (Big Sur) and both of them have docker (with swarm support) installed in it.
I used Ubuntu as my Swarm manager (and) Mac as my worker node.
Ubuntu's private IP is 192.168.0.14 (and) the Mac's private IP is 192.168.0.11 [private IPs can be shared in public without any issues because every class C network has the same IPs :P]
"docker swarm init --advertise-addr" was the command I used to make my Ubuntu host the manager, (and) I entered the join command on the Mac to make the Mac node join the swarm as a worker.
So, at a high level, I used a docker-compose.yml (which has only 1 Python web service). Using the compose file, I started a "docker stack" and then scaled the "python webservice" to 5 replicas. All these actions were carried out on the manager node.
The Ubuntu manager node (which also behaved as a worker and had 2 container instances) (and) the Mac node (which had 3 container instances) ran the "python webservice" replicas between them. I set "ports" to "80:1234", which means that if I hit port 80 of the host machine, it redirects to the "python application webservice port", 1234, inside the container.
When I hit the manager IP (192.168.0.14:80) some 50 times and checked the logs of all 5 containers on both Mac and Ubuntu,
I found that the 2 containers on Ubuntu got 25 hits each (in a round-robin fashion) BUT
I couldn't find any logs for any of the containers present on the Mac machine.
Is this an expected behavior?
Only when I hit the IP address (192.168.0.11:80) of the Mac machine (worker) directly was I able to get the logs/request hits for the containers present on the Mac machine.
So, there are two types of load balancing happening here:
When I hit the IP:port (of a worker/manager), only the containers present on that worker/manager machine are load balanced and served, in a round-robin fashion (I can see that's the algorithm used). Let's name this load balancing type "container level load balancing".
But when I hit 192.168.0.14 (the manager IP), I expected the load to be balanced across all 5 containers deployed across the 2 nodes. Somehow this didn't work. Let's call it "node level load balancing".
I have tried searching a lot on Google for this but found nothing. Most sites use external technologies like Nginx or HAProxy load balancers to solve "node level load balancing".
Isn't there an out of the box support for this by docker itself?
EDIT 1 - Added docker-compose.yml as Metin asked in comment section
docker-compose.yml
version: '3'
services:
  webservice:
    image: python_ws_test
    ports:
      - '80:1234'
    command: ["python", "app.py"]
The main issue was that I tried to join a Linux node and a Mac node, and Docker for Mac (only Swarm, I think) is kind of broken, as mentioned
in this comment https://dev.to/aguedeney/comment/172d6 (and)
subsequently in the thread (https://dev.to/natterstefan/docker-tip-how-to-get-host-s-ip-address-inside-a-docker-container-5anh).
The Mac's private IP is 192.168.0.11, but somehow 192.168.65.3 is the IP taken for the Mac worker node.
How did I find out?
Point 1
=> I made the Mac my manager using the "swarm init" command without any "advertise-addr" or "listen-addr" etc. The "docker swarm join" command I got had the IP address 192.168.65.3. I don't know why, because my Mac host IP is 192.168.0.11. This is not expected behaviour.
=> I did the same on Ubuntu, making my Ubuntu host the manager with the raw "swarm init" command, and the "docker swarm join" command I got had the IP address 192.168.0.14, which is the IP of the Ubuntu host machine. That is expected behaviour.
Point 2
Once the stack was deployed, I inspected the overlay network being used with "docker network inspect $networkName". The Linux manager node listed as peers itself and 192.168.65.3, which was unreachable because my Mac node's IP is 192.168.0.11.
But somehow, when I scaled up using the "scale" command on the manager node (Ubuntu), the docker manager was able to schedule containers on both Mac and Ubuntu. This is very odd.
Default Overlay network - behaviour
Also, "docker stack deploy" by default creates an overlay ingress network, irrespective of whether you mention it in docker-compose.yml or the deploy command. Docker managers and nodes communicate between themselves on top of this network.
Answer to the issue mentioned in question
Does Docker have out-of-the-box support for "node level load balancing"? Yes!
I was so frustrated by this odd behaviour on the Mac that I installed an Ubuntu 20.04 VM on my Mac and used Ubuntu 14.04 (the separate laptop / base OS) as manager and Ubuntu 20.04 (the VM) as worker node. Now I was able to load balance between the two nodes (I was getting hits on the worker node), even though I kept hitting only the manager's IP.
I'll update why Mac is broken here if I get more insights. Anyone who already has the knowledge about this, please share.
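As an aside, the "hit the manager IP 50 times and check every container's logs" experiment from the question can be automated from the client side. A sketch, assuming each replica answers with something unique such as its container hostname (the URL in the comment is hypothetical):

```python
from collections import Counter
from urllib.request import urlopen

def tally_responses(url, hits=50, timeout=5.0):
    """Send `hits` GET requests to one endpoint and count distinct bodies.

    If every replica responds with its own id/hostname, the counts show
    how the ingress routing mesh spread the load across replicas.
    """
    counts = Counter()
    for _ in range(hits):
        with urlopen(url, timeout=timeout) as resp:
            counts[resp.read().strip()] += 1
    return counts

# Hypothetical: tally_responses("http://192.168.0.14:80/") should list
# replicas from both nodes when node-level load balancing works.
```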

Add a VM running Ubuntu as a worker node in Docker Swarm

I'm trying to create a swarm consisting of 2 nodes. Using docker-machine it is easy to provision a VM and add it as a node, but I want to create a swarm using an Ubuntu VM as worker and Docker on Windows as manager, without using docker-machine.
Running
docker swarm init
in Windows (the host machine) gives me a token to add a worker. I have Ubuntu running in VirtualBox; Docker is also installed in the VM and I'm able to ssh into it and run commands, but whenever I try to add this Ubuntu machine as a worker node using the token generated on the Windows machine, it says
Error response from daemon: Timeout was reached before node joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node.
I think it is related to port forwarding. I'm forwarding the VM's port 22 to 127.0.0.1:22 in VirtualBox for connecting via SSH, and I have tried several other forwarding combinations, but the VM is still not able to join the swarm I created in Windows as a node.
Any guidance will be of great value.
Check if you have connectivity from your Ubuntu machine to your Windows machine. First, ssh into your Ubuntu box and check that:
Windows is addressable, for example using ping windows-ip.
If it is not, make sure both are in the same network, for example setting a bridge network in your VM configuration.
Windows is listening on the ports needed by docker swarm:
TCP port 2376 for secure Docker client communication. This port is required for Docker Machine to work. Docker Machine is used to orchestrate Docker hosts.
TCP port 2377. This port is used for communication between the nodes of a Docker Swarm or cluster. It only needs to be opened on manager nodes.
TCP and UDP port 7946 for communication among nodes (container network discovery).
UDP port 4789 for overlay network traffic (container ingress networking).
You can check this using telnet windows-ip port.
If they are not reachable, check your Windows firewall.
I hope it helps!
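The TCP half of the port checklist above can be swept in one go; a minimal Python sketch (UDP 7946 and 4789 cannot be verified with a plain connect, because UDP has no handshake):

```python
import socket

# TCP ports from the checklist above: cluster management and node gossip.
SWARM_TCP_PORTS = (2377, 7946)

def check_tcp_ports(host, ports=SWARM_TCP_PORTS, timeout=3.0):
    """Return {port: bool} telling which TCP ports accept a connection."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results

# e.g. check_tcp_ports("<windows-ip>") returning all-True means the
# manager's firewall rules for swarm are in place.
```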
I tried to create a similar Swarm with a Windows manager node but never really got it to work. You can initialize a single-node Swarm from Windows with docker swarm init. However adding multiple worker nodes does not appear to be supported at the moment:
https://docs.docker.com/engine/swarm/swarm-tutorial/.
"Currently, you cannot use Docker Desktop for Mac or Docker Desktop for Windows alone to test a multi-node swarm".
The following options are possible:
Pure Linux swarm (Linux manager + Linux workers) which runs only Linux containers
Hybrid Swarm (Linux manager + Windows workers + Linux workers) which runs Windows and Linux containers
(Sometimes) Pure Windows Swarm using Win Server 2019 as the manager. The regular Windows updates have been known to break various features of Swarm. For example, https://github.com/moby/moby/issues/40998
Then everyone either tries workarounds or waits for the next Windows update to fix the problem.
Personally I've had good luck with hybrid Swarm. It works fine with simple Ubuntu manager + standard Windows 10 workers. No need for Win Server.

Docker Host And Other Fundamental Questions

I am new to Docker and have a few easy questions I hope you could help with.
I have a Windows 10 machine with "Docker for Windows" installed. In its Hyper-V Manager I can see a virtual machine called "MobyLinuxVM".
So my questions are:
1, When people talk about "Docker Host" and "Docker Engine", what are they in my situation?
-- I assume "Docker Host" should be my Windows PC, and "Docker Engine" is that virtual machine inside Hyper-V.
2, If I use ipconfig on my PC, I find I have at least 2 networks and IP addresses:
(a) Lan Adapter -- show my IP is 192.168.xxx.yyy
(b) DockerNAT -- show my IP is 10.0.75.1
Then when I try to use docker-compose.yml to create containers, I found I could ONLY use:
environment:
  - MAGENTO_HOST=10.0.75.2
  - MARIADB_HOST=10.0.75.2
to create containers that can be directly accessed (e.g. via browser to the Magento website). So the question is:
If my machine is 10.0.75.1 within the Docker network, then what is 10.0.75.2? Why can't I use e.g. 10.0.75.3?
3, My yml script actually creates multiple containers, e.g. 2 Magento containers + 2 MariaDB containers + etc. When I specify their docker 'HOST', why is it not my machine? (Given we called my machine the 'docker host' and the Hyper-V virtual image the 'docker engine' in my 1st question.)
4, Also, following my 3rd question: I currently deploy all containers on 1 host. Is it worth using Docker Swarm, which people use to cluster multiple Docker hosts? If so, does that mean I need to use Hyper-V to create another "MobyLinuxVM"?
Thanks a lot!
1 Docker Engine + Docker Host
The Docker Engine is the group of processes that manage Docker containers. dockerd is usually the head of that process tree.
The Docker Host is the OS running Docker engine, that is MobyLinuxVM
Your VM host is your Windows box.
2 Docker Host IP
10.0.75.2 is most likely the address assigned to MobyLinuxVM. I don't run Docker for Windows so can't entirely confirm but searching the web seems to back this up.
3 - see 1
4 Swarm
You would need to run multiple VMs to setup swarm. Docker machine is the tool to use when setting up swarm instances. It allows you to manage multiple Docker instances and comes with a HyperV driver.

Docker Service won't start Windows Server 2016

I followed the steps in this link to install Docker on Windows Server 2016.
OS Name Microsoft Windows Server 2016 Standard
Version 10.0.14393 Build 14393
Docker installs fine, but the service just stays in "Starting" when I restart the server. There are no Docker related messages in the event logs, so I have absolutely no idea what the problem is. I also tried deregistering the service, and registering it listening on the default named pipe and an IP address.
In my case the Docker service didn't start after a fresh installation because I already had a Hyper-V switch of type NAT and a corresponding NetNat object. Docker for Windows tries to create a new NetNat object for its internal HNS network, and can't do so because the other NetNat object already exists.
I removed the Hyper-V switch and the NetNat object (get-netnat | remove-netnat), and after that the installation of Docker on Windows Server 2016 worked without any problems - and the Docker service started automatically after reboot.
