I have set up Docker Swarm on two Ubuntu boxes and one CentOS box, and turned off firewalls, SELinux, and iptables.
Here is the guide I used: http://devopscube.com/docker-tutorial-getting-started-with-docker-swarm/
When I try to manage the swarm, I get this:
swarm manage token://28dc122221ee60ea44f587e0a338f638
INFO[0000] Listening for HTTP addr=127.0.0.1:2375 proto=tcp
ERRO[0000] Get http://10.20.7.143:2375/v1.15/info: dial tcp 10.20.7.143:2375: connection refused
ERRO[0000] Get http://10.20.7.144:2375/v1.15/info: dial tcp 10.20.7.144:2375: connection refused
ERRO[0000] Get http://10.20.7.146:2375/v1.15/info: dial tcp 10.20.7.146:2375: connection refused
Any ideas?
You might have missed the line about the updated swarm post. Here is the link to the post:
http://devopscube.com/how-to-setup-and-configure-docker-swarm-cluster/
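For context, the "connection refused" errors mean the Docker daemon on each node is not listening on TCP 2375 at all, which the legacy standalone swarm required. A minimal sketch of the check and the daemon flag, assuming the daemons are started manually (the IPs are the ones from the question; on older releases the binary is docker daemon rather than dockerd):
curl http://10.20.7.143:2375/v1.15/info    # should return JSON, not "connection refused"
sudo dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock    # per-node daemon TCP listener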
It works!
I assigned roles like this.
Environment: Oracle Cloud.
Open ports: TCP 2377, TCP and UDP 7946, UDP 4789.
Instance A : manager
Instance B : worker
Local PC : worker
I initialized swarm mode with this command on A:
docker swarm init --advertise-addr <A's IP>
B got:
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp A's IP:2377: connect: no route to host"
The local PC got:
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp A's IP:2377: connect: connection refused"
Well, I have no idea what more I should do.
Thank you in advance.
The problem was the firewall setting on the manager node's instance.
sudo firewall-cmd --add-port=2377/tcp --permanent
sudo firewall-cmd --reload
For me, @Logan Lee's solution worked perfectly.
According to the Docker documentation, the following ports need to be open (a firewalld sketch follows the list):
TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
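A minimal firewalld sketch covering all three, in the same style as the fix above (assumes a firewalld-based distribution):
sudo firewall-cmd --add-port=2377/tcp --permanent   # cluster management
sudo firewall-cmd --add-port=7946/tcp --permanent   # node-to-node communication (TCP)
sudo firewall-cmd --add-port=7946/udp --permanent   # node-to-node communication (UDP)
sudo firewall-cmd --add-port=4789/udp --permanent   # overlay (VXLAN) traffic
sudo firewall-cmd --reload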
Thanks @Logan Lee.
Check the network connectivity between A and B; it seems like they are not on the same network.
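A quick way to test that from B or the local PC (a sketch; <A's IP> is a placeholder as in the question):
ping -c 3 <A's IP>     # basic reachability
nc -zv <A's IP> 2377   # can this node reach the swarm management port?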
I have an issue with the Docker daemon installed on an Ubuntu 14.04 VM. The logs reveal that IPv6 is enabled, so Docker seems to be listening on an IPv6 address. Essentially, this affects Clair. I have made sure that IPv6 is disabled per the recommendation here, and I also disabled IPv6 in daemon.json as specified in the Docker documentation. My Docker version is 17.06.1-ce, build 874a737.
Docker daemon logs:
time="2018-02-20T20:33:17.736203462+01:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 20 01:4860:4860::8844]"
Clair logs:
2018/02/20 20:43:51 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp [::]:6060: connect: cannot assign requested address"; Reconnecting to {[::]:6060 <nil>}
2018/02/20 20:46:14 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp [::]:6060: connect: cannot assign requested address"; Reconnecting to {[::]:6060 <nil>}
It's trying to make an IPv6 connection, but the address is wrong: [::] is IN6ADDR_ANY, not an actual address you can connect to. Provide the correct address in your config.yaml.
Did you mean to connect to localhost?
api:
  # v3 grpc/RESTful API server address
  addr: "[::1]:6060"
I have a Kubernetes cluster with three nodes: 10.9.84.149, 10.9.105.90, and 10.9.84.149. When my application tries to execute a command inside some pod:
kubectl exec -it <podName>
it sometimes gets this error:
Error from server: error dialing backend: dial tcp 10.9.84.149:10250: getsockopt: connection refused
As far as I could see, everything was fine with the cluster: all kube-system services and pods were running well. Besides, the error didn't appear consistently.
Can anybody help me with this issue?
I got the same error as the one below:
Error from server: Get https://192.168.100.102:10250/containerLogs/default/kubia-n8nv9/kubia: dial tcp 192.168.100.102:10250: connect: no route to host
Disabling the firewall on all nodes was my fix.
I figured out that the firewall on my worker nodes was not disabled. I ran the commands below to fix the problem:
systemctl disable firewalld && systemctl stop firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1...
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
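A less drastic alternative (a sketch, assuming firewalld) is to open just the kubelet port from the error message instead of disabling the firewall entirely:
sudo firewall-cmd --add-port=10250/tcp --permanent   # kubelet API port from the error
sudo firewall-cmd --reload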
It looks like your kubelet process is not running, or keeps restarting.
ss -tnpl |grep 10250
LISTEN 0 128 :::10250 :::* users:(("kubelet",pid=1102,fd=21))
Check that the kubelet process is running.
If it is running, see when it started.
Look at the /var/log/messages file for any issues with the node.
Make sure you don't have a firewall blocking the traffic (a sketch of these checks follows).
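A minimal sketch of those checks, assuming a systemd-based node:
systemctl status kubelet              # is kubelet running, and since when?
journalctl -u kubelet --since today   # recent kubelet errors
ss -tnpl | grep 10250                 # is the kubelet API port listening?
sudo tail -n 50 /var/log/messages     # node-level issues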
Update
I believe the culprit is the master, which does not appear to be listening on port 7946. netstat shows that 7946 is listening on the nodes, but not on the master. When I check the syslogs for the nodes I see the following error:
level=error msg="Failed to join memberlist [10.0.0.12] on retry: 1 error(s) occurred:\n\n* Failed to join 10.0.0.12: dial tcp 10.0.0.12:7946: getsockopt: connection refused"
Original Post
I am running a three-node Swarm Mode cluster in AWS: one master and two workers. This is Swarm Mode, not to be confused with the standalone Docker Swarm from before 1.12.
I created all of the machines with docker-machine. Each machine is running Ubuntu 15.10 with Docker 1.12.3.
Linux swarm-master-01 4.2.0-42-generic #49-Ubuntu SMP Tue Jun 28 21:26:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Using the master node, I created a service with the following:
docker service create --replicas 1 --name myapp -p 3000 myapp
When I run docker service ps myapp I get the following output:
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
02awst8p9pezgpkfzqgz8z79t myapp.1 myapp:latest swarm-node-01 Running Running 19 minutes ago
The running task is deployed to swarm-node-01.
I checked the auto-selected port, which was published publicly:
$ docker service inspect myapp | jq .[].Endpoint.Ports[].PublishedPort
30000
According to the documentation:
External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster whether or not the node is currently running the task for the service. All nodes in the swarm route ingress connections to a running task instance.
But when I try to curl the nodes that do not have the task running, I get connection refused.
$ curl $(docker-machine ip swarm-node-01):30000/stats
{"uptime":"2016-11-09T14:48:35Z","requestCount":7,"statuses":{"200":7},"pid":1,"open_db_conns":0}
$ curl $(docker-machine ip swarm-node-02):30000/stats
curl: (7) Failed to connect to [the IP] port 30000: Connection refused
Note: I scrubbed the IP of node-02.
My troubleshooting:
The nodes are both properly connected to the swarm.
Scaling the service up to 5 deploys the task to every node, and curl then works against every node.
UPDATE 1
I initialized the swarm with
docker swarm init --advertise-addr 10.0.0.12:2377 --listen-addr 10.0.0.12:2377
I checked the syslogs from the nodes and I'm seeing the following errors:
level=error msg="Failed to join memberlist [10.0.0.12] on retry: 1 error(s) occurred:\n\n* Failed to join 10.0.0.12: dial tcp 10.0.0.12:7946: getsockopt: connection refused"
I checked to see if the ingress port was listening, and it doesn't seem to be:
ubuntu@swarm-master-01:~$ sudo lsof -i :7946
ubuntu@swarm-master-01:~$ cat < /dev/tcp/10.0.0.12/7946
-bash: connect: Connection refused
-bash: /dev/tcp/10.0.0.12/7946: Connection refused
ubuntu@swarm-master-01:~$ cat < /dev/tcp/0.0.0.0/7946
-bash: connect: Connection refused
-bash: /dev/tcp/0.0.0.0/7946: Connection refused
I was able to get around the issue for now, but I don't know what initially caused it. The overlay network port (7946) wasn't listening on swarm-master-01, which I discovered with netstat -nlt. I searched the syslogs and found these errors related to the port:
Nov 8 20:28:20 ubuntu docker[23092]: time="2016-11-08T20:28:20.171385360Z" level=warning msg="2016/11/08 20:28:20 [ERR] memberlist: Failed TCP fallback ping: read tcp 10.0.0.85:54016->10.0.0.13:7946: i/o timeout"
Nov 9 18:26:17 swarm-node-01 docker[714]: time="2016-11-09T18:26:17.573441271Z" level=warning msg="2016/11/09 18:26:17 [ERR] memberlist: Failed to send indirect ping: write udp [::]:7946->10.0.0.38:7946: use of closed network connection"
For some reason Docker refused to open this port and listen on it any more. Here is what I did (albeit undesirable) to circumvent the issue; a command sketch follows the list:
Created another node with docker-machine called swarm-master-02.
Joined swarm-master-02 to the cluster as a master.
Demoted master-01, which made master-02 the leader.
Restarted the Docker daemon on each node (might not have been necessary).
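A hedged sketch of those steps (machine names are from the post; the docker-machine driver and its flags are assumptions):
docker-machine create --driver amazonec2 swarm-master-02   # driver/flags will vary by environment
docker swarm join-token manager      # run on the current leader; prints a join command
# ...run the printed "docker swarm join ..." command on swarm-master-02...
docker node demote swarm-master-01   # run from any manager
sudo systemctl restart docker        # on each node; might not be necessary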
Now all of the machines are working as expected except for swarm-master-01. One task is running on swarm-node-01, and curl works against all nodes by forwarding the traffic to the proper container on the proper node. However, swarm-master-01 still refused to listen on the overlay network port, and curl did not work against it. I was only able to fix swarm-master-01 by completely removing it from the cluster, restarting the Docker daemon, and joining it again as a master. Now 7946 is listening on that machine.
I am trying to follow this tutorial with two Vagrant instances:
http://kubernetes.io/v1.0/docs/getting-started-guides/docker-multinode.html
After setting up the master and worker nodes, I tried to connect from the master to the service IP of a simple nginx service. But it looks like kube-proxy cannot reach the Docker container on the worker node.
The virtual IP of the service and the container IP respond well on the worker node itself.
That made me suspect a malfunction of flanneld.
Does anybody know how I could track down this error?
Any help is appreciated!
Thanks in advance
Best, Johannes
Output of the kube-proxy container:
I1016 20:53:42.829290 1 proxysocket.go:130] Accepted TCP connection from 10.0.2.15:51774 to 10.0.2.15:40197
E1016 20:53:43.829575 1 proxysocket.go:99] Dial failed: dial tcp 10.1.12.3:80: i/o timeout
E1016 20:53:45.825473 1 proxysocket.go:99] Dial failed: dial tcp 10.1.12.3:80: no route to host
E1016 20:53:48.825556 1 proxysocket.go:99] Dial failed: dial tcp 10.1.12.3:80: no route to host
E1016 20:53:51.825627 1 proxysocket.go:99] Dial failed: dial tcp 10.1.12.3:80: no route to host
E1016 20:53:51.825710 1 proxysocket.go:133] Failed to connect to balancer: failed to connect to an endpoint.
This looks like https://github.com/kubernetes/kubernetes/issues/14426. Try upgrading flannel to 0.5.3.
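A quick, hedged way to see which flannel version each node is actually running (assuming flannel runs as a container, as in the multinode guide, where the image tag carries the version; that guide runs it under a separate bootstrap daemon):
docker ps --format '{{.Image}}' | grep -i flannel
docker -H unix:///var/run/docker-bootstrap.sock ps | grep -i flannel   # bootstrap daemon socket used by the guide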