Docker swarm worker service unable to access service on manager

I have a single-node Docker swarm. I deployed a stack with InfluxDB (placement: manager), Grafana (placement: manager) and cAdvisor (deploy mode: global).
I use Portainer to visualize the stack, and it correctly shows 1 manager node with its services.
Then I added a second (worker) node with docker swarm join etc.
Now the cAdvisor service is also launched on the worker, but it has a problem: it keeps logging:
E1115 14:52:46.772290 1 memory.go:91] failed to write stats to influxDb - Post http://influx:8086/write?consistency=&db=cadvisor&precision=&rp=: dial tcp: lookup influx on 127.0.0.11:53: no such host
Any ideas?
Thanks
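A quick way to narrow this down is to check, on the worker, whether the stack's overlay network is visible there and whether swarm DNS resolves the service from inside the task. A minimal sketch, assuming the stack was deployed under the name monitoring (so the default network is monitoring_default):

docker network ls --filter driver=overlay            # on the worker: is the stack's network listed?
docker network inspect monitoring_default            # it only appears here once a task on this node attaches to it
docker exec -it <cadvisor-container> nslookup influx # tests swarm DNS at 127.0.0.11 (requires nslookup in the image)

If the lookup fails only on the worker, the usual suspects are the swarm ports between the nodes: TCP/UDP 7946 for node-to-node communication and UDP 4789 for the overlay data path.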

Related

Why does Docker swarm DNS resolve a service differently?

I have a Docker swarm cluster formed by 6 ECS nodes (1 manager, 5 workers) and 34 services, with Portainer and Watchtower running globally.
I run redis and mysql (replicated mode, bound to specific labeled nodes) as swarm services, and other services access them via service name. Normally it works as I desire.
However, if I update a service with the --force option (trying to take effect immediately), some of the other services (not all of them, but quite a few) start receiving errors like 'redis: Name or service not known'. The thing is that I never update the redis service. If I enter a container and ping the service, it returns the correct IP.
The second weird phenomenon is that one service with multiple instances acts inconsistently: some instances have the service-name resolution issue, some do not.
What's the cause of that? DNS cache? How can I avoid it? What is the correct way to update a docker swarm service?
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
pqcbjo1rgsta81nudzkkphbqt * bc1 Ready Active Leader 20.10.7
u0za3ktnk0ih4zpmzvpij6h35 bc2 Ready Active 20.10.7
t1jp4z5kiyc4gvml56j5dpzj0 bc3 Ready Active 20.10.7
fa2umzxk5b6vun85vlbw9r7xu bc4 Ready Active 20.10.17
cbamwvq2s5hmkvia035i9i2xw bc5 Ready Active 20.10.17
9uj8g25oirfq8i3fgsvbup5ou d2 Ready Active 20.10.7
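One knob that matters for the questions above: the rollout behavior of docker service update is configurable, and with a start-first order the replacement task is brought up before the old one (and its VIP/DNS registration) is torn down. A sketch, with some-service as a placeholder name:

docker service update \
  --update-order start-first \
  --update-parallelism 1 \
  --update-delay 10s \
  some-service

The same policy can be set per service in a stack file under deploy.update_config, so that a later --force update follows it.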

How to deploy a compose file in docker swarm which is present on a worker node

On system1 (i.e. the hostname of the manager node), the swarm is initialized using
docker swarm init
Later, the compose files available on system1 (*.yml) are deployed using
docker stack deploy --compose-file file_1.yml system1
docker stack deploy --compose-file file_2.yml system1
docker stack deploy --compose-file file_3.yml system1
Next, on system2 (i.e. the hostname of the worker node), I join the manager node (system1): I run the command below on system1, copy its output, and run that output on system2 to join the swarm.
docker swarm join-token worker
After running the output of that command on system2, it successfully joined the swarm.
I also cross-verified by using
docker node ls
and I could see both the manager node and the worker in Ready and Active state.
In my case I'm using the worker node (system2) for failover.
Now, I have similar compose files (*.yml) on system2.
How do I get those deployed in the docker swarm?
Since system2 is a worker node, I cannot deploy from system2.
First of all, I'm not sure what you mean by
In my case I'm using worker node(system2) for failover .
We are running Docker Swarm in production, and the only way you can achieve failover with managers is to use more of them. Note that Docker Swarm's managers use a Raft consensus store that requires a quorum, so go with an odd number of managers: 1, 3, 5, ...
As for deployments from non-manager nodes: it is not possible in Docker Swarm unless you use a management service with a docker socket proxy. Such a proxy runs as a service on a manager, and since it all lives inside the swarm, you can then invoke the API calls from the worker through it.
But there is no way to directly deploy or administrate the swarm from the worker node.
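For illustration, such a socket-proxy service might look like this in a stack file; a sketch, assuming the community tecnativa/docker-socket-proxy image (its environment variables whitelist which parts of the Docker API the proxy exposes):

version: "3.8"
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      SERVICES: 1   # expose the /services endpoints
      TASKS: 1      # expose the /tasks endpoints
      POST: 1       # allow state-changing requests, needed for deploys
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # the manager's socket
    networks:
      - proxy-net
    deploy:
      placement:
        constraints:
          - node.role == manager
networks:
  proxy-net:
    driver: overlay

A service running on a worker and attached to the same network can then point its Docker client at DOCKER_HOST=tcp://socket-proxy:2375, keeping the raw socket itself off the worker.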
Some things:
First:
Docker contexts are used to communicate with a swarm manager remotely so that you do not have to be on the manager when executing docker commands.
i.e. to deploy remotely to a swarm you could create and then use a context like this:
docker context create swarm1 --docker "host=ssh://user@node1"
docker --context swarm1 stack deploy --compose-file stack.yml stack1
Second:
Once the swarm is set up, you always communicate with a manager node, and it orchestrates the deployment of services to the available worker nodes. If worker nodes are added after services are deployed, Docker will not move tasks to them until new deployments are performed, as it prefers not to interrupt running tasks; the goal is eventual balance. If you want to force Docker to rebalance onto the new worker node immediately, just redeploy the stack, or run
docker service update --force some-service
Third:
To control which worker nodes services run tasks on you can use placement constraints and node labels.
docker service create --constraint node.role==worker ... would only deploy onto nodes that have the worker role (i.e. are not managers),
or
docker service update --constraint-add "node.labels.is-nvidia-enabled==1" some-service would only deploy tasks to nodes that you have explicitly labeled with the corresponding label and value,
e.g. docker node update --label-add is-nvidia-enabled=1 node1 (run once per node, so again for node3).
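The same constraints can also be expressed in the stack file itself, which keeps the placement rules versioned with the deployment; a minimal sketch (myapp is a placeholder):

services:
  myapp:
    image: myapp:latest
    deploy:
      placement:
        constraints:
          - node.role == worker
          - node.labels.is-nvidia-enabled == 1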

Cannot join Docker manager node in Windows using tokens

My friend and I are trying to connect our Docker daemons using Docker Swarm. We are both using Windows and we are NOT on the same network. According to the Docker docs, each docker host must have the following ports open:
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
We have both added rules for the given ports to the inbound and outbound rules in our firewalls. Still, we keep getting the same two errors while trying to join with the token created by the manager node via the docker swarm join --token command:
1. error response from daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 192.168.65.3:2377: connect: connection refused"
2. Timeout error
Also, if either of us runs docker swarm init, it shows the IP address 192.168.65.3, which isn't part of any network we're connected to. What does that mean?
The Docker overlay tutorial also states that in order to connect to the manager node, the worker node should add the IP address of the manager:
docker swarm join --token <TOKEN> \
  --advertise-addr <IP-ADDRESS-OF-WORKER-1> \
  <IP-ADDRESS-OF-MANAGER>:2377
Does it mean that in our case we have to use public IP address of the manager node after enabling port forwarding?
Potential network issues aside, here is your problem:
We both are using Windows OS
I have seen this issue in other threads when attempting to use Windows nodes in a multi-node swarm. Here are some important pieces of information from the Docker overlay networks documentation:
Before you can create an overlay network, you need to either initialize your Docker daemon as a swarm manager using docker swarm init or join it to an existing swarm using docker swarm join. Either of these creates the default ingress overlay network which is used by swarm services by default.
Overlay network encryption is not supported on Windows. If a Windows node attempts to connect to an encrypted overlay network, no error is detected but the node cannot communicate.
By default, Docker encrypts all swarm service management traffic. As far as I know, disabling this encryption is not possible. Do not confuse this with the --opt encrypted option, as that involves encrypting application data, not swarm management traffic.
For a single-node swarm, using Windows is just fine. For a multi-node swarm, which would be deployed using Docker stack, I highly recommend using Linux for all worker and manager nodes.
A while ago I was using Linux as a manager node and Windows as a worker node. I noticed that joining the swarm would only work if the Linux machine was the swarm manager; if the Windows machine was the manager, joining the swarm would not work. And after the Windows machine had joined the swarm, container-to-container communication over a user-defined overlay network would not work no matter what. Replacing the Windows machine with a Linux machine fixed all of these issues.

Rolling update in Docker Swarm across services with health check

I am currently attempting a Kafka cluster deployment in Docker Swarm. Kafka does not work with the replica feature of Swarm because each Kafka broker (node) needs to be configured and reachable individually (i.e. no load balancer in front of it). Therefore, each broker is configured as an individual service with replicas=1, e.g. kafka1, kafka2 and kafka3 services.
Every now and then, the configuration or image for the Kafka brokers needs to be changed via docker stack deploy (done by a person or a CI/CD pipeline). Swarm then recreates all containers simultaneously, and as a result the Kafka cluster is temporarily unavailable, which is not acceptable for a critical piece of infrastructure that is supposed to run 24/7. And I haven't even mentioned the ZooKeeper cluster underneath Kafka, to which the same applies.
The desired behavior is that Swarm recreates the container of the kafka1 service, waits until it has fully started up and synchronized with the other brokers (all topic partitions in sync), and only then restarts the kafka2 service, and so on.
I think I can construct a health check within the Kafka Docker image that would tell the Docker engine when the Kafka broker is fully synchronized. But how do I make Swarm perform what amounts to a rolling update across service boundaries? It ignores the depends_on setting that Docker Compose knows, and rolling-update policies apply within a single service's replicas only. Any ideas?
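For reference, the within-service half of this is expressible in the stack file: a health check that gates each broker's own restart, plus an update_config that waits on it. A sketch, where /opt/check-insync.sh is a hypothetical script that exits 0 once the broker reports no under-replicated partitions:

services:
  kafka1:
    image: my-kafka:latest                      # placeholder image
    healthcheck:
      test: ["CMD", "/opt/check-insync.sh"]     # hypothetical in-sync check
      interval: 30s
      timeout: 10s
      retries: 10
      start_period: 2m
    deploy:
      replicas: 1
      update_config:
        monitor: 2m        # observe the new task this long before declaring the update done

The ordering across kafka1, kafka2 and kafka3 would still have to be driven externally, e.g. by a script or CI step that runs docker service update per broker and waits for its health check to go green before moving on.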

Error response from daemon: attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded

I am trying to set up Docker swarm with an overlay network. Some of my hosts are on AWS while others are laptops running Ubuntu (the same as on AWS). Every node has a static public IP. I created an overlay network as:
docker network create --driver=overlay --attachable test-net
I created the swarm on one of the AWS hosts, and every other node is able to join it.
However, when I run docker run -it --name alpine2 --network test-net alpine on any node not on AWS, I get the error: docker: Error response from daemon: attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded.
But if I run the same on any AWS host, everything works fine. Is there anything more I need to do in terms of networking/ports if some nodes are on AWS while others are not?
I have opened the ports required for swarm networking on all machines.
EDIT: All the nodes are marked as "active" when listing in the manager node.
UPDATE: I solved this issue by opening the respective ports. It now works if all the nodes are Linux-based. But when I try to make a swarm with a Linux (Ubuntu) manager, macOS machines are not able to join the swarm.
Check if the node is in drain state:
docker node inspect --format '{{ .Spec.Availability }}' node
If it is, then update the state:
docker node update --availability active node
Here is the explanation:
Resolution
When a node is in drain state, it is expected behavior that you should not be able to allocate swarm mode resources, such as multi-host overlay network IP addresses, to the node. However, swarm mode does not currently provide a messaging mechanism from the swarm leader, where IP address management occurs, back to the worker node that requested the IP address. So docker run fails with context deadline exceeded. Internal engineering issue escalation/292 has been opened to provide a better error message in a future release of the Docker daemon.
source
Check if the below ports are opened on both machines.
TCP port 2377
TCP and UDP port 7946
UDP port 4789
You may use ufw to allow the ports:
ufw allow 2377/tcp
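The remaining swarm ports from the list can be opened the same way:
ufw allow 7946/tcp
ufw allow 7946/udp
ufw allow 4789/udp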
I had a similar issue and managed to fix it by making sure the ENGINE VERSION of all the nodes was the same:
sudo docker node ls
Another common cause of this is the Ubuntu server installer installing Docker via snap; that package is buggy. Uninstall it with snap and install Docker using apt. And reconsider Ubuntu. :-/
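If that is the case, the switch looks roughly like this (a sketch; docker.io is the package in Ubuntu's own apt repository, or follow Docker's upstream apt instructions instead):
sudo snap remove docker
sudo apt-get update
sudo apt-get install docker.io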
