Unable to join Docker swarm because control.sock is missing? - docker

I have an existing Docker swarm consisting of three machines. I am trying to add a new manager to this swarm. I run the command:
docker swarm join --token SWMTKN-1-<...> 192.168.200.200:2377
After a while I get the error:
Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to prospective new cluster member using its advertised address: rpc error: code = DeadlineExceeded desc = context deadline exceeded
When I view the daemon logs using tail -f /var/log/messages | grep docker, I see this:
Mar 17 17:07:48 UAT-Blockchain dockerd: time="2021-03-17T17:07:48.575024542+08:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {/var/run/docker/swarm/control.sock <nil> 0 <nil>}. Err :connection error: desc= \"transport: Error while dialing dial unix /var/run/docker/swarm/control.sock: connect: no such file or directory\". Reconnecting..." module=grpc
A quick check shows that /var/run/docker/swarm/control.sock is indeed missing on this machine, but is present on the machines in the existing swarm.
What is this control.sock? How should I go about enabling/reinstating it on this current machine? Is this a problem of faulty installation?
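The quick check described above can be scripted so that the same test runs identically on every node. This is only a sketch: sock_status is a hypothetical helper name, not a Docker command, and it reports nothing more than whether a Unix socket exists at the given path. It does not say why the socket is absent.

```shell
#!/usr/bin/env bash
# sock_status is a hypothetical helper (not part of Docker).
# Prints "present" if the path exists and is a Unix socket, otherwise "missing".
sock_status() {
  path="${1:-/var/run/docker/swarm/control.sock}"
  if [ -S "$path" ]; then
    echo "present"
  else
    echo "missing"
  fi
}

# Check the path named in the daemon log message.
sock_status /var/run/docker/swarm/control.sock
```

Running it on the new machine and on an existing manager makes the difference between the nodes easy to compare.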

Related

How to build from docker file inside another docker container?

When I run:
docker build -t random-letter .
I get the error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I tried running dockerd but got some other errors:
Running iptables --wait -t nat -L -n failed with message: `iptables v1.8.4 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.`, error: exit status 3
INFO[2022-04-13T14:32:13.795289191Z] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2022-04-13T14:32:13.795587753Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2022-04-13T14:32:13.795630880Z] stopping healthcheck following graceful shutdown module=libcontainerd
WARN[2022-04-13T14:32:14.796355453Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.4 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
Here's a link to a similar question that may help you get a good answer. I believe DinD should be avoided to reduce complexity.

Base Docker in Docker image cannot start Docker daemon

I have reduced my dockerfile to the following
FROM docker:latest
EXPOSE 3000
But when running the image, the Docker daemon cannot start.
Running dockerd in the container results in a large chain of info, errors and warnings ending with the following:
WARN[2021-12-09T01:07:36.691842800Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: Iptables not found
Am I missing something? I can manually install iptables, but then it fails again with:
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.7 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
So I assume I have some setting wrong, as it seems to work out of the box here: https://hub.docker.com/_/docker
I am running docker on Windows with the WSL 2 backend.
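The "Permission denied (you must be root)" message from iptables usually indicates that the process lacks the NET_ADMIN capability, which is normally the case in a container started without --privileged. As a rough sketch, assuming a Linux /proc filesystem, the hypothetical helper has_net_admin below reads the effective capability mask and tests bit 12 (CAP_NET_ADMIN). The helper name and the optional file argument are illustrative assumptions, not part of Docker.

```shell
#!/usr/bin/env bash
# has_net_admin is a hypothetical helper (not part of Docker).
# It reads the CapEff hex mask from a /proc status file and succeeds
# only if bit 12 (CAP_NET_ADMIN) is set. Returns 2 if no mask is found.
has_net_admin() {
  status_file="${1:-/proc/self/status}"
  capeff=$(awk '/^CapEff:/ {print $2}' "$status_file" 2>/dev/null)
  [ -n "$capeff" ] || return 2
  [ $(( (0x$capeff >> 12) & 1 )) -eq 1 ]
}

# Run inside the container before starting dockerd.
if has_net_admin; then
  echo "CAP_NET_ADMIN present"
else
  echo "CAP_NET_ADMIN missing or undetermined"
fi
```

If the capability is missing, starting the container with docker run --privileged is the usual way to run a Docker-in-Docker daemon. Whether that is acceptable depends on your security requirements.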

Connection error with Docker Swarm in Gitlab Runner - half bare - half containers

I'm trying to use Docker Swarm with a Gitlab Runner to roll out deployments on 5-6 servers.
The swarm currently consists of 1 manager (call it M1) and 2 workers (W1 and W2). My Gitlab Runner is using a docker executor with a docker in docker image. When a commit occurs, the Runner tries to register as a manager in the swarm (so it can start jobs / services). However, I get the following error:
Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to prospective new cluster member using its advertised address: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: IPADDRESS connect: connection refused"
Interestingly, the IP address in the error is not the IP address of any of the servers. The servers' addresses are sequential, and the one in the error message is the next address in that sequence.
Is there a better way to issue commands to my swarm from my runner / a computer not connected to the server?

Why do I get an authentication error when trying to join docker swarm?

I am trying to run
docker-machine ssh myvm2 "docker swarm join --token SwMTKN-1-<cut> 192.168.161.163:2376"
to join myvm2 as a worker to the cluster, but I get this error:
Error response from daemon: rpc error: code = Unavailable desc = all Subconns are in TransientFailure,
latest connection error: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate" exit status 1
I am following the docker course at https://docs.docker.com/v17.09/get-started/part4/#create-a-cluster
Change the port to 2377 in --advertise-addr when creating the swarm, and join the other managers and workers using the same port. That solves this problem.
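The port mismatch is easy to miss: the failing command targets 2376, while the answer recommends 2377, the port swarm uses for cluster management traffic. A small pre-flight check could catch this before running docker swarm join. The helper below is a sketch only, and swarm_port_ok is a hypothetical name, not a Docker command.

```shell
#!/usr/bin/env bash
# swarm_port_ok is a hypothetical helper (not part of Docker).
# It inspects a HOST:PORT join address and reports whether the port is 2377.
swarm_port_ok() {
  addr="$1"
  case "$addr" in
    *:*) port="${addr##*:}" ;;
    *)   echo "no port given"; return 1 ;;
  esac
  if [ "$port" = "2377" ]; then
    echo "ok"
  else
    echo "unexpected port $port, expected 2377"
    return 1
  fi
}

# The address from the failing command above.
swarm_port_ok 192.168.161.163:2376 || true
```

The check is purely textual and only handles plain HOST:PORT addresses. It does not contact the manager.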

Docker - Unable to join swarm as manager, able to join as worker

When executing a docker swarm join command (as manager), I face the following error:
Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = "transport: x509: certificate is not valid for any names, but wanted to match swarm-manager"
Joining the same swarm as a worker works flawlessly.
The logfiles show me the following items:
kmo@GETSTdock-app01 ~ $ sudo tail -f /var/log/upstart/docker.log
time="2018-07-06T09:18:17.890620199+02:00" level=info msg="Listening for connections" addr="[::]:2377" module=node node.id=7j75bmugpf8k2o0onta1yp4zy proto=tcp
time="2018-07-06T09:18:17.892234469+02:00" level=info msg="manager selected by agent for new session: { 10.130.223.107:2377}" module=node/agent node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:17.892364019+02:00" level=info msg="waiting 0s before registering session" module=node/agent node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.161362606+02:00" level=error msg="fatal task error" error="cannot create a swarm scoped network when swarm is not active" module=node/agent/taskmanager node.id=7j75bmugpf8k2o0onta1yp4zy service.id=p3ng4om2m8rl7ygoef18ayohp task.id=weaubf3qj5goctlh2039sjvdg
time="2018-07-06T09:18:18.162182077+02:00" level=error msg="fatal task error" error="cannot create a swarm scoped network when swarm is not active" module=node/agent/taskmanager node.id=7j75bmugpf8k2o0onta1yp4zy service.id=6sl9y5rcov6htwbyvm504ewh2 task.id=j3foc6rjszuqszj41qyqb6mpe
time="2018-07-06T09:18:18.184847516+02:00" level=info msg="Stopping manager" module=node node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.184993569+02:00" level=info msg="Manager shut down" module=node node.id=7j75bmugpf8k2o0onta1yp4zy
time="2018-07-06T09:18:18.185020917+02:00" level=info msg="shutting down certificate renewal routine" module=node/tls node.id=7j75bmugpf8k2o0onta1yp4zy node.role=swarm-manager
time="2018-07-06T09:18:18.185163663+02:00" level=error msg="cluster exited with error: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate is not valid for any names, but wanted to match swarm-manager\""
time="2018-07-06T09:18:18.185492995+02:00" level=error msg="Handler for POST /v1.37/swarm/join returned error: manager stopped: can't initialize raft node: rpc error: code = Internal desc = connection error: desc = \"transport: x509: certificate is not valid for any names, but wanted to match swarm-manager\""
I face similar problems when I join as worker, and then attempt to promote the node to a manager node.
Docker version = 18.03.1
OS = Ubuntu 14.04 LTS
Does anybody have an idea how to resolve this?
For me, I had to open port 2377 in the joining manager node's firewall; that seemed to do the trick. I'm not sure if this is best practice, as I'm still a noob with Docker Swarm, but add it to the list of things to try if you have this issue.
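Before changing firewall rules, it can help to confirm whether port 2377 on the joining node is actually reachable from an existing manager. The following is a minimal sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command. port_open is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# port_open is a hypothetical helper (not part of Docker).
# It succeeds only if a TCP connection to HOST:PORT opens within 3 seconds.
port_open() {
  host="$1"
  port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage, run from an existing manager (replace the placeholder address):
#   port_open <joining-node-ip> 2377 && echo reachable || echo blocked
```

A failure here only shows that the connection could not be opened. A firewall is one possible cause, and a daemon that is not listening on the port is another.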
This may or may not work, but you can try the following.
On the manager, run:
docker swarm leave --force
Recreate the swarm using:
docker swarm init --advertise-addr [ip-address for initial manager]
Then try to add managers using the advertised address
You can also try the following:
Comment out the proxy settings in the Docker proxy definition file, /etc/systemd/system/docker.service.d/docker.conf or /etc/systemd/system/docker.service.d/docker_proxy.conf
Reload the daemon with:
systemctl daemon-reload
Re-execute docker swarm join --token manager
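The first set of steps above (leave, re-initialise, then add managers) can be sketched as a script. Because docker swarm leave --force on a manager discards that node's swarm state, this sketch defaults to a dry run that only prints the commands. The names run, recreate_swarm, and DRY_RUN, and the address 10.0.0.1, are illustrative assumptions. The final step uses docker swarm join-token manager to print the join command for new managers.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN=1 (the default here) commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# recreate_swarm is a hypothetical helper; its argument is the initial manager's IP.
recreate_swarm() {
  manager_ip="$1"
  run docker swarm leave --force                        # drop the old swarm state
  run docker swarm init --advertise-addr "$manager_ip"  # new swarm on a fixed address
  run docker swarm join-token manager                   # print the join command for managers
}

# 10.0.0.1 is a placeholder address, not taken from the question.
recreate_swarm 10.0.0.1
```

Set DRY_RUN=0 only after reviewing the printed commands, since recreating the swarm removes existing services and node membership.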
