Docker swarm - Manager node cannot access the containers in worker node - docker

In our Docker Swarm environment there is 1 manager node and 2 worker nodes.
We also installed Portainer and Swarmpit, along with their agents, on all nodes.
Yesterday, one of the virtual servers hosting a worker node rebooted unexpectedly.
When we checked the Docker service it was stopped, so we restarted it with this command:
systemctl restart docker
After that, all the containers seemed to work fine on the worker node. But when we check the containers through Portainer, which runs on the master node, they look stopped. Swarmpit reports that the worker node is active and ready.
What could be the problem?
Screenshots: worker node, master node running containers, Swarmpit.

We found out that the firewall caused the error.
After rebooting CentOS, the firewall was enabled automatically and it conflicted with the Docker engine, so we disabled the firewall with this command:
systemctl disable firewalld
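If disabling firewalld entirely is not desirable, an alternative sketch (assuming firewalld and the standard swarm ports) is to open just the ports Docker Swarm needs and reload the firewall:
# Cluster management, node gossip, and overlay (VXLAN) traffic:
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload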

Related

How to deploy a compose file in docker swarm which is present in Worker node

In system1 (i.e. the hostname of the master node), the swarm is initialized using
docker swarm init
Later, the compose files available in system1 (*.yml) are deployed using
docker stack deploy --compose-file file_1.yml system1
docker stack deploy --compose-file file_2.yml system1
docker stack deploy --compose-file file_3.yml system1
Next, in system2 (i.e. the hostname of the worker node),
I join the manager node (system1) using the join token: I run the command below on the manager, copy its output, and run that output on system2 to join the swarm.
docker swarm join-token worker
After running the output of the above command on system2, it successfully joined the swarm.
I also cross-verified using
docker node ls
And I could see both the manager node and the worker node in Ready and Active state.
In my case I'm using the worker node (system2) for failover.
Now, I have similar compose files (*.yml) in system2.
How do I get them deployed in the swarm?
Since system2 is a worker node, I cannot deploy from system2.
First, I'm not sure what you mean by
In my case I'm using the worker node (system2) for failover.
We are running Docker Swarm in production, and the only way you can achieve failover with managers is to use more of them. Because Docker Swarm's manager state is replicated via Raft and relies on quorum, go with an odd number of managers: 1, 3, 5, ...
As for deployments from non-manager nodes: it is not possible in Docker Swarm unless you use a management service that exposes a Docker socket proxy. The proxy runs as a service on a manager, and since everything lives inside the swarm, you can then invoke the API calls from the worker through it.
But there is no way to directly deploy to or administer the swarm from a worker node.
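As a rough sketch of that approach (the proxy image and the environment settings below are assumptions, not part of the original setup), a socket proxy can be pinned to a manager and the worker can point DOCKER_HOST at it:
# Run a Docker socket proxy as a service constrained to manager nodes
# (tecnativa/docker-socket-proxy is one commonly used image; assumed here).
# Note: this exposes the Docker API without authentication, so restrict access.
docker service create --name socket-proxy \
  --constraint node.role==manager \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  --publish 2375:2375 \
  -e SERVICES=1 -e TASKS=1 -e NODES=1 -e NETWORKS=1 -e POST=1 \
  tecnativa/docker-socket-proxy
# From the worker (system2), talk to the swarm through the proxy
# (replace <manager-ip> with the manager's address):
DOCKER_HOST=tcp://<manager-ip>:2375 docker stack deploy --compose-file file_1.yml system1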
Some things:
First:
Docker contexts are used to communicate with a swarm manager remotely so that you do not have to be on the manager when executing docker commands.
i.e. to deploy remotely to a swarm, you could create and then use a context like this:
docker context create swarm1 --docker "host=ssh://user@node1"
docker --context swarm1 stack deploy --compose-file stack.yml stack1
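If the same swarm is targeted repeatedly, the context can also be made the default for subsequent commands (a small usage note; the context and stack names match the example above):
docker context use swarm1
docker stack ps stack1
docker context use default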
2nd:
Once the swarm is set up, you always communicate with a manager node, and it orchestrates the deployment of services to the available worker nodes. If worker nodes are added after services are deployed, docker will not move tasks to them until new deployments are performed, as it prefers not to interrupt running tasks. The goal is eventual balance. If you want to force docker to rebalance and consider the new worker node immediately, just redeploy the stack, or run
docker service update --force some-service
3rd:
To control which worker nodes services run tasks on you can use placement constraints and node labels.
docker service create --constraint node.role==worker ... would only deploy onto nodes that have the worker role (are not managers)
or
docker service update --constraint-add "node.labels.is-nvidia-enabled==1" some-service would only deploy tasks to nodes that you have explicitly labeled with the corresponding label and value.
e.g. docker node update --label-add is-nvidia-enabled=1 node1 (and likewise for node3)
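To check which labels a node currently carries (the node name comes from the example above):
docker node inspect --format '{{ json .Spec.Labels }}' node1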

Kubernetes Cluster - Containers do not restart after reboot

I have a kubernetes cluster setup at home on two bare metal machines.
I used kubespray to install both and it uses kubeadm behind the scenes.
The problem I encounter is that all containers within the cluster have a restartPolicy: no which makes my cluster break when I restart the main node.
I have to manually run "docker container start" for all containers in the "kube-system" namespace to make it work after a reboot.
Does anyone have an idea where the problem might be coming from?
Docker provides restart policies to control whether your containers start automatically when they exit, or when Docker restarts. Here your containers have the restart policy no, which means Docker will never automatically start them under any circumstance.
You need to change the restart policy to always, which restarts the container if it stops. If it is manually stopped, it is restarted only when the Docker daemon restarts or when the container itself is manually restarted.
You can change the restart policy of an existing container using docker update. Pass the name of the container to the command. You can find container names by running docker ps -a.
docker update --restart=always <CONTAINER NAME>
Restart policy details:
Keep the following in mind when using restart policies:
A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.
If you manually stop a container, its restart policy is ignored until the Docker daemon restarts or the container is manually restarted. This is another attempt to prevent a restart loop.
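For new containers, the restart policy can also be set at creation time, for example (the image and container name here are only for illustration):
docker run -d --name web --restart=always nginx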
I am answering my question:
It probably wasn't very clear, but I was talking about the kube-system pods that manage the whole cluster and that should automatically start when the machine restarts.
It turns out those pods (e.g. coredns, kube-proxy, etc.) intentionally have a restart policy of "no", and it is the kubelet service on the node that spins up the whole cluster when you restart your node.
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
In my case kubelet could not start due to a missing cri-dockerd process.
Check the issue I opened at kubespray.
Verifying the kubelet logs is done like so:
journalctl -u kubelet -f
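If cri-dockerd is the suspect, its service can be checked alongside kubelet (a sketch; the unit names below assume the standard cri-dockerd systemd packaging):
systemctl status kubelet
systemctl status cri-docker.service cri-docker.socket
journalctl -u cri-docker -f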

Unable to deploy portainer agent to docker swarm - /proc/sys/net/bridge/bridge-nf-call-iptables: permission denied

I've been having an issue deploying the Portainer agent across a newly created Docker swarm. One of the nodes starts the agent without any issue (we'll call that HOST#1), but HOST#2 just
tries to deploy the agent container indefinitely (showing "preparing container" under the services menu in Portainer), before eventually failing with the error below and then attempting to create a new container.
Error:
starting container failed: error creating external connectivity network: cannot restrict inter-container communication: open /proc/sys/net/bridge/bridge-nf-call-iptables: permission denied
What I've tested/tried:
I have been following the instructions outlined in the Portainer wiki, using the agent-stack.yml file for adding an existing agent to a swarm: https://docs.portainer.io/v/ce-2.11/start/install/agent/swarm/linux. I also tried deleting the agent from the swarm altogether and deploying it again, with the same results.
No issues deploying the hello world service to the swarm.
Temporarily disabling ufw
setting ufw allow in on docker0
setting ufw allow in on docker_gwbridge
docker node ls reports both nodes are Ready & available
Environment details:
Both systems running Ubuntu server 20.04
Both systems running Docker version 20.10.12
Both systems running kernel versions 5.4.0*
Both are running as manager nodes in the swarm
Portainer Agent 2.11.0
The system unable to deploy the agent is an OpenVZ VPS [HOST#2]
The VPS [HOST#2] is connected to my local network via an OpenVPN (layer 2) tap adapter, so the swarm is connecting over the VPN
HOST2 is running ufw for firewall management while HOST1 is not
I'm quite new to Docker Swarm but I have been using Docker for many years. Any help is highly appreciated.
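For reference, the bridge netfilter setting named in the error can be inspected like this (a diagnostic sketch assuming a standard Linux host; inside an OpenVZ container the br_netfilter module may not be loadable at all, which would explain the permission error):
lsmod | grep br_netfilter
sudo modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables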

docker swarm services are restarting automatically

We are running a Docker Swarm cluster with 3 managers and 5 workers. Twice now we have experienced an error in the cluster where every service is restarted automatically.
This happens when the heartbeat fails. The service 4adb11869318 on a manager node and the service e7b284330420 on a worker node have had this issue very frequently.
Manager Node Logs: https://pastebin.com/YdriawA6
Worker Node Logs: https://pastebin.com/AvGCstfg
I don't know how to prevent the docker services from restarting. Do you have any suggestions?
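One setting worth looking at (a sketch, not a confirmed fix) is the dispatcher heartbeat interval: raising it makes the managers more tolerant of slow or missed heartbeats before they reschedule tasks. The default is 5s; the value below is only an example:
docker swarm update --dispatcher-heartbeat 20s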

Docker swarm cluster how to add manager nodes as a reachable

I am using Docker with VirtualBox on a Windows 7 machine.
$ docker-machine ls
NAME       ACTIVE   DRIVER       STATE     URL                     SWARM   DOCKER        ERRORS
default    *        virtualbox   Running   tcp://1.2.3.101:2376            v17.04.0-ce
manager1   -        virtualbox   Running   tcp://1.2.3.106:2376            v17.04.0-ce
manager2   -        virtualbox   Running   tcp://1.2.3.105:2376            v17.04.0-ce
worker1    -        virtualbox   Running   tcp://1.2.3.102:2376            v17.04.0-ce
worker2    -        virtualbox   Running   tcp://1.2.3.104:2376            v17.04.0-ce
worker3    -        virtualbox   Running   tcp://1.2.3.103:2376            v17.04.0-ce
$ docker node ls
ID                          HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
e8kum3w0xqd4g02cx1tfps9ni   manager1   Down     Active
aibbgvqtiv9bhzbs8l20lbx2m * default    Ready    Active         Leader
sbt75u8ayvf7lqj7y3zppjwvk   worker1    Ready    Active
ny2j5556w4tyflf3tjfqzjrte   worker2    Ready    Active
veipdd0qs2gjnogftxvr1kfhq   worker3    Ready    Active
Now I am planning to set up a Docker Swarm cluster with three manager nodes (default, manager1, manager2) and three worker nodes (worker1, worker2, worker3).
Using the default manager node, I initialized the swarm with an advertise address:
$ docker swarm init --advertise-addr 1.2.3.101:2376
output starting
Swarm initialized: current node (acbbgvqtiv6bhzbs8l20lbx1e) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-1ie1b420bhs452ubt4iy01brfc97801q0ya608spbt0fnuzkp0-1h2a86acczxe4qta164np487r 1.2.3.101:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
output ending
Using this output I easily added the worker nodes. Now my question is how to add the other managers (manager1, manager2) so they show up as Reachable. Note that the default node still acts as the leader.
Could anyone please help with this?
Thanks
Sorry for the late answer.
On the existing manager host, get the manager token:
docker swarm join-token manager
and then execute the resulting output on the prospective manager host.
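Put together, the flow looks roughly like this (the token and address are placeholders patterned on the worker-join output above):
# On the current leader (default):
docker swarm join-token manager
# It prints a join command of the form:
#   docker swarm join --token SWMTKN-1-<manager-token> 1.2.3.101:2377
# Run that printed command on manager1 and manager2, then verify from any manager:
docker node ls   # manager1 and manager2 should now show "Reachable" under MANAGER STATUS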
Run this command on the manager node:
docker swarm join-token manager
It returns the token for adding other nodes as managers; the output should be similar to the worker token you got above.
You need to SSH to the other machine that you want to add as a manager node to the swarm.
Once there, run that command.
For the manager's advertise address you can also provide the --advertise-addr and --listen-addr flags; they take host:port as a parameter.
Hope this helps
