I have an issue with the docker daemon installed on an Ubuntu 14.04 VM. The logs reveal that ipv6 is enabled hence the docker seems to be listening on this ip address. Essentially, this effects Clair. I have made sure that ipv6 is disabled on the following recommendation here. I also disabled ipv6 in daemon.json as specified in Docker documentation. My docker version is Docker version 17.06.1-ce, build 874a737.
Docker daemon logs :
time="2018-02-20T20:33:17.736203462+01:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 20 01:4860:4860::8844]"
Clair logs:
2018/02/20 20:43:51 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp [::]:6060: connect: cannot assign requested address"; Reconnecting to {[::]:6060 <nil>}
2018/02/20 20:46:14 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp [::]:6060: connect: cannot assign requested address"; Reconnecting to {[::]:6060 <nil>}
It's trying to make an IPv6 connection, but the address is wrong. [::] is IN6ADDR_ANY, not an actual address you can connect to. Provide the correct address in your config.yaml.
Did you mean to connect to localhost?
api:
# v3 grpc/RESTful API server address
addr: "[::1]:6060"
Related
I gave role like this.
env: oracle cloud.
open port: TCP 2377 , UDP TCP 7946 ,UDP 4786
Instance A : manager
Instance B : worker
Local PC : worker
init swarm mode with this cli on A
docker swarm init --advertise-addr <A's IP>
B got
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp A's IP:2377: connect: no route to host"
Local PC got
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp A's IP:2377: connect: connection refused"
well I have no idea what should I need to do more.
thank you in advance.
the problem was firewall setting on manager node's instance.
sudo firewall-cmd --add-port=2377/tcp --permanent
sudo firewall-cmd --reload
for me #Logan Lee solution perfectly matched.
According to the docker documentation, following ports need to manage accordingly
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
Thanks #Logan Lee
Check the network connectivity between A and B, seems like they are not on the same network.
I am running a docker swarm manager in VM Workstation 15 player with NAT(VM: Ubuntu 19.10, Host: Windows 10). I ran docker swarm init --advertise-addr 223.181.240.48:2377 on my mangager vm. Now i copied to the token and used it on my my other vm that is running on another node and another network with NAT. it returns the following error:
Error response from daemon: Timeout was reached before node joined.
The attemp to join the swarm will continue in the background. Use the
"docker info" command to see the current swarm status of your node.
Then i tried googling for error and got to know that the problem may arise due to firewall and i might have to unblock the port.Also, as i am using NAT, i have to either use automatic bridge or port forward.First, I tried using bride(in vm setting, i changed network to bridge), but when i tried "my ip",the results were same in both host machine and vm(223.181.240.48).So, i tried port forwarding with NAT,i went to C:/ProgramData/VMware/vmnetnat.conf and added the following line
[incomingtcp]
2377:192.168.172.2:2377
192.168.172.2 is my vm's net gateway address. Then i again ran the docker swarm command, copied to my other vm. Now, i got the following error:
Error response from daemon: rpc error: code =Unavailable desc = all
SubConns are in TransientFailure, latest connection error: connection
error: desc = "transport: Error while dialing dial tcp
233.181.240.48:2377: connect: connection refused"
Then i tried sudo ufw allow 2377/tcp to unblock port in vm. Then retried the whole procedure again. Now i am again receiving the timeout error. Did i miss something in the middle? or did something wrong? And what is the difference between the ip i receive through a "my ip " google search and the ipv4 i receive in wired connection setting(dhcp on).
I have been tasked to build a production ready Swarm cluster using Zookeeper as dicovery backend. I used the official documentation for this purpose, https://docs.docker.com/swarm/install-manual/. Concerning backend discovery I used this one: https://docs.docker.com/swarm/discovery/. Now I have an issue. When I try to communicate with the swarm, I have this error: No elected primary cluster manager.
This is my setup:
I'm running on Ubuntu 16.04 with docker Client/Server version 1.12.3, with zookeeper 3.4.9 launch in the same host as my swarm manager. I'm using a two nodes architecture with one swarm manager and one swarm worker
After Docker Engine installation on each node,
$ nohup docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock &
Now on the swarm manager:
$ docker run -d -p 4000:4000 swarm manage -H :4000 --replication --advertise <swarm-manager-ip>:4000 zk://<swarm-manager-ip>/swarm
On the swarm worker:
$ docker run -d swarm join --advertise=<swarm-worker-ip>:2375 zk://<swarm-manager-ip>/swarm
Now when I try to see if everything is good, I hit the command below and the result follows.
$ docker -H <swarm-manager-ip>:4000 ps -a
Error response from daemon: No elected primary cluster manager
When I just do this:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
91c3864ba6ee swarm "/swarm manage -H :40" 17 hours ago Up 19 minutes 2375/tcp, 0.0.0.0:4000->4000/tcp swarm-master
I can see the swarm master and when I try to see the logs of the swarm node, I can see this:
$ docker logs 91c3864ba6ee
time="2016-12-09T20:29:39Z" level=info msg="Initializing discovery without TLS"
time="2016-12-09T20:29:39Z" level=info msg="Listening for HTTP" addr=":4000" proto=tcp
time="2016-12-09T20:29:39Z" level=info msg="Leader Election: Cluster leadership lost"
2016/12/09 20:29:40 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:40Z" level=error msg="zk: could not connect to a server"
time="2016-12-09T20:29:40Z" level=error msg="zk: could not connect to a server"
time="2016-12-09T20:29:40Z" level=error msg="Discovery error: zk: could not connect to a server"
2016/12/09 20:29:42 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:42Z" level=error msg="Discovery error: zk: could not connect to a server"
2016/12/09 20:29:44 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:44Z" level=error msg="Discovery error: zk: could not connect to a server"
time="2016-12-09T20:29:44Z" level=error msg="Discovery error: Unexpected watch error"
2016/12/09 20:29:46 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
2016/12/09 20:29:48 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:50Z" level=info msg="Leader Election: Cluster leadership lost"
2016/12/09 20:29:50 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
time="2016-12-09T20:29:50Z" level=error msg="zk: could not connect to a server"
time="2016-12-09T20:29:50Z" level=error msg="zk: could not connect to a server"
But a simple telnet command shows me that my zookeeper host is working. So how do I have a i/o timeout when the swarm try to connect to zookeeper discovery backend?
As mentioned in the comments there is a new version called Swarm mode embedded with Docker since 1.12. It includes a built-in high-available distributed object store so you don't have to setup an external KV store yourself.
Now regarding your issue with the first version of Swarm, one line caught my attention:
2016/12/09 20:29:50 Failed to connect to <swarm-manager-ip>:2181: dial tcp <swarm-manager-ip>:2181: i/o timeout
To me it seems that zookeeper is not running on your machine or that you didn't point to the right port.
First check that zookeeper is running on your machine with:
ps aux | grep zookeeper
You should see a process running.
If not, make sure you create a zoo.cfg file in the conf directory of your zookeeper installation specifying the right port, for example:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
You can look at This Tutorial to bootstrap zookeeper.
After this you can run the zkStart.sh script to start your zookeeper instance and swarm should now be able to properly connect and register the Leader key.
If this still does not work, try downgrading to zookeeper 3.4.6 as this is the last known supported version since the switch to Docker Swarm Mode.
Update
I believe the culprit is the master who does not appear to be listening on port 7946. netstat shows that 7946 is listening on the nodes, but not the master. When I check the syslogs for the nodes I see the following error
level=error msg="Failed to join memberlist [10.0.0.12] on retry: 1 error(s) occurred:\n\n* Failed to join 10.0.0.12: dial tcp 10.0.0.12:7946: getsockopt: connection refused"
Original Post
I am running a three node Swarm Mode cluster in AWS; one master and two workers. This is swarm mode not to be confused with docker swarm from pre 1.12.
I created all of the services with docker-machine. Each machine is running Ubuntu 15.10 with Docker 1.12.3.
Linux swarm-master-01 4.2.0-42-generic #49-Ubuntu SMP Tue Jun 28 21:26:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Using the master node I have created a service with the following
docker service create --replicas 1 --name myapp -p 3000 myapp
When I run docker service ps myapp I get the following output
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
02awst8p9pezgpkfzqgz8z79t myapp.1 myapp:latest swarm-node-01 Running Running 19 minutes ago
The running task is deployed to swarm-node-01.
I checked the auto-selected port which was published publicly
$ docker service inspect myapp | jq .[].Endpoint.Ports[].PublishedPort
30000
According to the documentation:
External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster whether or not the node is currently running the task for the service. All nodes in the swarm route ingress connections to a running task instance.
But when I try to curl the nodes who do not have the task running I'm getting connection refused.
$ curl $(docker-machine ip swarm-node-01):30000/stats
{"uptime":"2016-11-09T14:48:35Z","requestCount":7,"statuses":{"200":7},"pid":1,"open_db_conns":0}
$ curl $(docker-machine ip swarm-node-02):30000/stats
curl: (7) Failed to connect to [the IP] port 30000: Connection refused
note: I scrubbed the IP of node-02
My Troubleshooting:
The nodes are both properly connected to the swarm
Scaling the service up to 5 (which inherently deploys the task to every node) makes curl work on every node, because the task is deployed to every node.
UPDATE 1
I initialized the swarm with
docker swarm init --advertise-addr 10.0.0.12:2377 --listen-addr 10.0.0.12:2377
I checked the syslogs from the nodes and I'm seeing the following errors
level=error msg="Failed to join memberlist [10.0.0.12] on retry: 1 error(s) occurred:\n\n* Failed to join 10.0.0.12: dial tcp 10.0.0.12:7946: getsockopt: connection refused"
I checked to see if the ingress port was listening and it doesn't seem to be
ubuntu#swarm-master-01:~$ sudo lsof -i :7946
ubuntu#swarm-master-01:~$ cat < /dev/tcp/10.0.0.12/7946
-bash: connect: Connection refused
-bash: /dev/tcp/10.0.0.12/7946: Connection refused
ubuntu#swarm-master-01:~$ cat < /dev/tcp/0.0.0.0/7946
-bash: connect: Connection refused
-bash: /dev/tcp/0.0.0.0/7946: Connection refused
I was able to get around the issue for now, but I don't know what initially caused it. The overlay network (port 7946) wasn't listening on swarm-master-01. I figured this out with netstat -nlt. I searched the syslogs and found these errors related to the port in the syslog.
Nov 8 20:28:20 ubuntu docker[23092]: time="2016-11-08T20:28:20.171385360Z" level=warning msg="2016/11/08 20:28:20 [ERR] memberlist: Failed TCP fallback ping: read tcp 10.0.0.85:54016->10.0.0.13:7946: i/o timeout"
Nov 9 18:26:17 swarm-node-01 docker[714]: time="2016-11-09T18:26:17.573441271Z" level=warning msg="2016/11/09 18:26:17 [ERR] memberlist: Failed to send indirect ping: write udp [::]:7946->10.0.0.38:7946: use of closed network connection"
For some reason docker refused to open this port and listen any more. Here is what I did (albeit undesirable) to circumvent the issue:
Created another node with docker-machine called swarm-master-02
Joined swarm-master-02 to the cluster as a master
Demoted master-01 which set master-02 as the leader
Restarted the docker daemon on each node (might not have been necessary)
Now all of the machines are working as expected except for swarm-master-01. One task is running on swarm-node-01 and curl works against all nodes by forwarding the traffic to the proper container on the proper node. However, swarm-master-01 refuses to listen on the overlay network and curl does not work against this node. I was only able to fix swarm-master-01 by completely removing it from the cluster, restarting the docker daemon, and joining it again as a master. Now 7946 is listening on that machine.
I have set up docker swarm, installed on 2 ubuntu boxes, one centos, turned of firewalls, selinux, iptables.
Here is the guide I used: http://devopscube.com/docker-tutorial-getting-started-with-docker-swarm/
When I try and manage the swarm, I get this:
swarm manage token://28dc122221ee60ea44f587e0a338f638
INFO[0000] Listening for HTTP addr=127.0.0.1:2375 proto=tcp
ERRO[0000] Get http://10.20.7.143:2375/v1.15/info: dial tcp 10.20.7.143:2375: connection refused
ERRO[0000] Get http://10.20.7.144:2375/v1.15/info: dial tcp 10.20.7.144:2375: connection refused
ERRO[0000] Get http://10.20.7.146:2375/v1.15/info: dial tcp 10.20.7.146:2375: connection refused
Any Ideas?
You might have missed the line about the updated swarm post. Here is the link to the post.
http://devopscube.com/how-to-setup-and-configure-docker-swarm-cluster/
It works!