I have followed the most recent instructions (updated 7th May '15) to set up a cluster on Ubuntu** with etcd and flanneld, but I'm having trouble with the network... it seems to be in some kind of broken state.
**Note: I updated the config script so that it installed 0.16.2. Also, kubectl get minions returned nothing at first, but after a sudo service kube-controller-manager restart the nodes appeared.
This is my setup:
| ServerName | Public IP   | Private IP |
|------------|-------------|------------|
| KubeMaster | 107.x.x.32  | 10.x.x.54  |
| KubeNode1  | 104.x.x.49  | 10.x.x.55  |
| KubeNode2  | 198.x.x.39  | 10.x.x.241 |
| KubeNode3  | 104.x.x.52  | 10.x.x.190 |
| MongoDev1  | 162.x.x.132 | 10.x.x.59  |
| MongoDev2  | 104.x.x.103 | 10.x.x.60  |
From any machine I can ping any other machine... it's when I create pods and services that I start getting issues.
Pod
POD IP CONTAINER(S) IMAGE(S) HOST LABELS STATUS CREATED
auth-dev-ctl-6xah8 172.16.37.7 sis-auth leportlabs/sisauth:latestdev 104.x.x.52/104.x.x.52 environment=dev,name=sis-auth Running 3 hours
So this pod has been spun up on KubeNode3... if I try to ping it from any machine other than KubeNode3 itself I get a Destination Net Unreachable error. E.g.
# ping 172.16.37.7
PING 172.16.37.7 (172.16.37.7) 56(84) bytes of data.
From 129.250.204.117 icmp_seq=1 Destination Net Unreachable
I can call etcdctl get /coreos.com/network/config on all four and get back {"Network":"172.16.0.0/16"}.
I'm not sure where to look from there. Can anyone help me out here?
Supporting Info
On the master node:
# ps -ef | grep kube
root 4729 1 0 May07 ? 00:06:29 /opt/bin/kube-scheduler --logtostderr=true --master=127.0.0.1:8080
root 4730 1 1 May07 ? 00:21:24 /opt/bin/kube-apiserver --address=0.0.0.0 --port=8080 --etcd_servers=http://127.0.0.1:4001 --logtostderr=true --portal_net=192.168.3.0/24
root 5724 1 0 May07 ? 00:10:25 /opt/bin/kube-controller-manager --master=127.0.0.1:8080 --machines=104.x.x.49,198.x.x.39,104.x.x.52 --logtostderr=true
# ps -ef | grep etcd
root 4723 1 2 May07 ? 00:32:46 /opt/bin/etcd -name infra0 -initial-advertise-peer-urls http://107.x.x.32:2380 -listen-peer-urls http://107.x.x.32:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster infra0=http://107.x.x.32:2380,infra1=http://104.x.x.49:2380,infra2=http://198.x.x.39:2380,infra3=http://104.x.x.52:2380 -initial-cluster-state new
On a node:
# ps -ef | grep kube
root 10878 1 1 May07 ? 00:16:22 /opt/bin/kubelet --address=0.0.0.0 --port=10250 --hostname_override=104.x.x.49 --api_servers=http://107.x.x.32:8080 --logtostderr=true --cluster_dns=192.168.3.10 --cluster_domain=kubernetes.local
root 10882 1 0 May07 ? 00:05:23 /opt/bin/kube-proxy --master=http://107.x.x.32:8080 --logtostderr=true
# ps -ef | grep etcd
root 10873 1 1 May07 ? 00:14:09 /opt/bin/etcd -name infra1 -initial-advertise-peer-urls http://104.x.x.49:2380 -listen-peer-urls http://104.x.x.49:2380 -initial-cluster-token etcd-cluster-1 -initial-cluster infra0=http://107.x.x.32:2380,infra1=http://104.x.x.49:2380,infra2=http://198.x.x.39:2380,infra3=http://104.x.x.52:2380 -initial-cluster-state new
# ps -ef | grep flanneld
root 19560 1 0 May07 ? 00:00:01 /opt/bin/flanneld
So I noticed that the flannel configuration (/run/flannel/subnet.env) was different from what Docker was starting up with (I have no idea how they got out of sync).
# ps -ef | grep docker
root 19663 1 0 May07 ? 00:09:20 /usr/bin/docker -d -H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.85.1/24 --mtu=1472
# cat /run/flannel/subnet.env
FLANNEL_SUBNET=172.16.60.1/24
FLANNEL_MTU=1472
FLANNEL_IPMASQ=false
Note that the Docker --bip=172.16.85.1/24 was different from the flannel subnet FLANNEL_SUBNET=172.16.60.1/24.
So naturally I changed /etc/default/docker to reflect the new value.
DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=172.16.60.1/24 --mtu=1472"
But then sudo service docker restart wasn't reporting any error on the command line even though Docker still wasn't working, so looking at /var/log/upstart/docker.log I could see the following:
FATA[0000] Shutting down daemon due to errors: Bridge ip (172.16.85.1) does not match existing bridge configuration 172.16.60.1
So the final piece to the puzzle was deleting the old bridge and restarting docker...
# sudo brctl delbr docker0
# sudo service docker start
If sudo brctl delbr docker0 returns "bridge docker0 is still up; can't delete it", run ifconfig docker0 down and try again.
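To keep the two values in sync automatically instead of hard-coding the subnet, one option (just a sketch for the /etc/default/docker setup shown above; flannel also ships a mk-docker-opts.sh helper that does the same job) is to source flannel's subnet file when building DOCKER_OPTS:
# /etc/default/docker -- derive the bridge settings from flannel so they cannot drift apart again
. /run/flannel/subnet.env
DOCKER_OPTS="-H tcp://127.0.0.1:4243 -H unix:///var/run/docker.sock --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}"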
Please try this:
ip link del docker0
systemctl restart flanneld
Related
I've been trying all the existing commands for several hours and could not fix this problem.
I used everything covered in this article: Docker - Bind for 0.0.0.0:4000 failed: port is already allocated.
I currently have one container according to docker ps -a, while docker ps shows nothing:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ebb9289dfd1 dockware/dev:latest "/bin/bash /entrypoi…" 2 minutes ago Created TheGoodPartDocker
When I try docker-compose up -d I get the error:
ERROR: for TheGoodPartDocker Cannot start service shop: driver failed programming external connectivity on endpoint TheGoodPartDocker (3b59ebe9366bf1c4a848670c0812935def49656a88fa95be5c4a4be0d7d6f5e6): Bind for 0.0.0.0:80 failed: port is already allocated
I've tried to remove everything using: docker ps -aq | xargs docker stop | xargs docker rm
or killing whatever holds the port: fuser -k 80/tcp
even deleting networks:
sudo service docker stop
sudo rm -f /var/lib/docker/network/files/local-kv.db
or just manually stopping and removing the container:
docker-compose down
docker stop 5ebb9289dfd1
docker rm 5ebb9289dfd1
Here is also my netstat output (netstat | grep 80):
unix 3 [ ] STREAM CONNECTED 20680 /mnt/wslg/PulseAudioRDPSink
unix 3 [ ] STREAM CONNECTED 18044
unix 3 [ ] STREAM CONNECTED 32780
unix 3 [ ] STREAM CONNECTED 17805 /run/guest-services/procd.sock
And docker port TheGoodPartDocker gives me no result.
I also restarted my computer, but nothing works :(.
Thanks for helping
Obviously port 80 is already occupied by some other process. You need to stop that process before you start the container. To find out which process it is, use ss (the example below greps for port 22 just to show the output format):
$ ss -tulpn | grep 22
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1187,fd=3))
tcp LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1187,fd=4))
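For this question's case you would grep for port 80 instead and stop whatever owns it; a minimal sketch (the nginx process and PID below are purely illustrative, not taken from the question):
$ sudo ss -tulpn | grep ':80 '
tcp LISTEN 0 511 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=1234,fd=6))
$ sudo systemctl stop nginx   # or: sudo kill 1234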
I'm trying to use resources from other computers with python3-mpi4py, since my research involves a lot of computation.
My code and data are in a Docker container.
To use MPI I have to be able to SSH directly into the Docker container from other computers on the same network as the host computer, but I cannot SSH into it.
My setup looks like the diagram below:
|       Host         | <- On the same network ->  | Other Computers |
|     port 10000     |                            |                 |
|         ^          |                            |                 |
|---------|----------|                            |                 |
|         v          |                            |                 |
|     port 10000     |                            |                 |
| docker container <-|------------ ssh -----------|-->              |
Can anyone teach me how to do this?
You can run an SSH server on the host computer; then SSH to the host and use a docker command such as docker exec -i -t containerName /bin/bash to get an interactive shell.
example:
# 1. On Other Computers
ssh root@host_ip
>> enter into Host ssh shell
# 2. On Host ssh shell
docker exec -i -t containerName /bin/bash
>> enter into docker interactive shell
# 3. On docker interactive shell
do sth.
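If you really do need to SSH straight into the container (as drawn in the question's diagram), one common approach, sketched here assuming the image has openssh-server installed and configured, and using placeholder names mpi-node / my-mpi-image / user, is to publish a host port to the container's port 22:
# On the Host: map host port 10000 to the container's sshd (port 22)
docker run -d --name mpi-node -p 10000:22 my-mpi-image
# On the Other Computers: ssh to the Host's IP, but on port 10000
ssh -p 10000 user@host_ip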
My real question is: if secrets are mounted as volumes in pods, can they be read if someone gains root access to the host OS?
For example, by accessing /var/lib/docker and drilling down to the volume.
If someone has root access to your host with containers, they can do pretty much whatever they want... Don't forget that pods are just a bunch of containers, which are in fact processes with PIDs. So for example, if I have a pod called sleeper:
kubectl get pods sleeper-546494588f-tx6pp -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
sleeper-546494588f-tx6pp 1/1 Running 1 21h 10.200.1.14 k8s-node-2 <none>
running on the node k8s-node-2. With root access to this node, I can check which PIDs this pod's containers have (I am using containerd as the container engine, but the points below are very similar for Docker or any other container engine):
[root@k8s-node-2 /]# crictl -r unix:///var/run/containerd/containerd.sock pods -name sleeper-546494588f-tx6pp -q
ec27f502f4edd42b85a93503ea77b6062a3504cbb7ac6d696f44e2849135c24e
[root@k8s-node-2 /]# crictl -r unix:///var/run/containerd/containerd.sock ps -p ec27f502f4edd42b85a93503ea77b6062a3504cbb7ac6d696f44e2849135c24e
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT POD ID
70ca6950de10b 8ac48589692a5 2 hours ago Running sleeper 1 ec27f502f4edd
[root@k8s-node-2 /]# crictl -r unix:///var/run/containerd/containerd.sock inspect 70ca6950de10b | grep pid | head -n 1
"pid": 24180,
And then finally, with that information (the PID), I can access the "/" mount point of this process and check its contents, including the mounted secrets:
[root@k8s-node-2 /]# ll /proc/24180/root/var/run/secrets/kubernetes.io/serviceaccount/
total 0
lrwxrwxrwx. 1 root root 13 Nov 14 13:57 ca.crt -> ..data/ca.crt
lrwxrwxrwx. 1 root root 16 Nov 14 13:57 namespace -> ..data/namespace
lrwxrwxrwx. 1 root root 12 Nov 14 13:57 token -> ..data/token
[root@k8s-node-2 serviceaccount]# cat /proc/24180/root/var/run/secrets/kubernetes.io/serviceaccount/namespace ; echo
default
[root@k8s-node-2 serviceaccount]# cat /proc/24180/root/var/run/secrets/kubernetes.io/serviceaccount/token | cut -d'.' -f 1 | base64 -d ;echo
{"alg":"RS256","kid":""}
[root@k8s-node-2 serviceaccount]# cat /proc/24180/root/var/run/secrets/kubernetes.io/serviceaccount/token | cut -d'.' -f 2 | base64 -d 2>/dev/null ;echo
{"iss":"kubernetes/serviceaccount","kubernetes.io/serviceaccount/namespace":"default","kubernetes.io/serviceaccount/secret.name":"default-token-6sbz9","kubernetes.io/serviceaccount/service-account.name":"default","kubernetes.io/serviceaccount/service-account.uid":"42e7f596-e74e-11e8-af81-525400e6d25d","sub":"system:serviceaccount:default:default"}
This is one of the reasons why it is so important to properly secure access to your Kubernetes infrastructure.
I have 2 nodes in my swarm cluster, a manager and a worker. I deployed a stack with 5 replicas distributed across those nodes. The YAML file has a network called webnet for the web service. After they are deployed I try to access the service, but when I use the IP address of the manager node it load-balances between the 2 replicas running there, and if I use the IP address of the worker it load-balances among the other 3 replicas. So, using only Docker, how can I load-balance among all 5 replicas?
My nodes:
root@debiancli:~/docker# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
5y256zrqwalq1hcxmqnnqc177 centostraining Ready Active 18.06.0-ce
mkg6ecl3x28uyyqx7gvzz0ja3 * debiancli Ready Active Leader 18.06.0-ce
Tasks in manager (self) and worker (centostraining):
root@debiancli:~/docker# docker node ps self
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
stbe721sstq7 getstartedlab_web.3 get-started:part2 debiancli Running Running 2 hours ago
6syjojjmyh0y getstartedlab_web.5 get-started:part2 debiancli Running Running 2 hours ago
root@debiancli:~/docker# docker node ps centostraining
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
wpvsd98vfwd1 getstartedlab_web.1 get-started:part2 centostraining Running Running less than a second ago
e3z8xybuv53l getstartedlab_web.2 get-started:part2 centostraining Running Running less than a second ago
sd0oi675c2am getstartedlab_web.4 get-started:part2 centostraining Running Running less than a second ago
The stack and its tasks:
root@debiancli:~/docker# docker stack ls
NAME SERVICES ORCHESTRATOR
getstartedlab 1 Swarm
root@debiancli:~/docker# docker stack ps getstartedlab
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
wpvsd98vfwd1 getstartedlab_web.1 get-started:part2 centostraining Running Running less than a second ago
e3z8xybuv53l getstartedlab_web.2 get-started:part2 centostraining Running Running less than a second ago
stbe721sstq7 getstartedlab_web.3 get-started:part2 debiancli Running Running 2 hours ago
sd0oi675c2am getstartedlab_web.4 get-started:part2 centostraining Running Running less than a second ago
6syjojjmyh0y getstartedlab_web.5 get-started:part2 debiancli Running Running 2 hours ago
The networks (getstartedlab_webnet is the one used by my tasks):
root@debiancli:~/docker# docker network ls
NETWORK ID NAME DRIVER SCOPE
b95dd9ee2ae6 bridge bridge local
63578897e920 docker_gwbridge bridge local
x47kwsfa23oo getstartedlab_webnet overlay swarm
7f77ad495edd host host local
ip8czm66ofng ingress overlay swarm
f2cc6118fde7 none null local
docker-compose.yml used to deploy the stack:
root@debiancli:~/docker# cat docker-compose.yml
version: "3"
services:
web:
image: get-started:part2
deploy:
replicas: 5
resources:
limits:
cpus: "0.1"
memory: 50M
restart_policy:
condition: on-failure
ports:
- "4000:80"
networks:
- webnet
networks:
webnet:
Accessing the service from a third machine (this curl and grep pulls the container name)
[Ubuntu:~]$ debiancli=192.168.182.129
[Ubuntu:~]$ centostraining=192.168.182.133
[Ubuntu:~]$ curl -s $debiancli:4000 | grep -oP "(?<=</b> )[^<].{11}"
f4c1e3ff53f2
[Ubuntu:~]$ curl -s $debiancli:4000 | grep -oP "(?<=</b> )[^<].{11}"
de2110bee2f7
[Ubuntu:~]$ curl -s $debiancli:4000 | grep -oP "(?<=</b> )[^<].{11}"
f4c1e3ff53f2
[Ubuntu:~]$ curl -s $debiancli:4000 | grep -oP "(?<=</b> )[^<].{11}"
de2110bee2f7
[Ubuntu:~]$ curl -s $debiancli:4000 | grep -oP "(?<=</b> )[^<].{11}"
f4c1e3ff53f2
[Ubuntu:~]$ curl -s $debiancli:4000 | grep -oP "(?<=</b> )[^<].{11}"
de2110bee2f7
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
72b757f92983
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
d2e824865436
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
b53c3fd0cfbb
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
72b757f92983
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
d2e824865436
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
b53c3fd0cfbb
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
72b757f92983
[Ubuntu:~]$ curl -s $centostraining:4000 | grep -oP "(?<=</b> )[^<].{11}"
d2e824865436
Notice that when I probe debiancli (the swarm manager) it only loops through containers f4c1e3ff53f2 and de2110bee2f7, that is, the 2 replicas running on the manager; the same happens for the 3 replicas on centostraining (the swarm worker). So, what am I missing?
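One thing worth checking: swarm's ingress routing mesh can only spread requests across both nodes when the overlay-networking ports are open between them. A quick diagnostic sketch from the manager (ports per Docker's swarm documentation; nc is just one way to probe, and the worker/manager IPs are the ones from the curl example above):
nc -zv 192.168.182.133 7946      # container network discovery (TCP; also UDP 7946)
nc -zv -u 192.168.182.133 4789   # VXLAN data plane used by the routing mesh (UDP)
nc -zv 192.168.182.129 2377      # cluster management traffic to the manager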
I am trying to set up Minikube, and have run into a problem. My Minikube is set up, and I started the Nginx pod. I can see that the pod is up, but the service doesn't appear as active. On the dashboard too, although the pod appears, the deployment doesn't show up. Here are my PowerShell command outputs.
I am learning this technology and may have missed something. My understanding is that when using the Docker tools, no explicit configuration is necessary at the Docker level other than setting it up. Am I wrong here? If so, where?
Relevant PowerShell output:
Let's deploy the hello-nginx deployment
C:\> kubectl.exe run hello-nginx --image=nginx --port=80
deployment "hello-nginx" created
View List of pods
c:\> kubectl.exe get pods
NAME READY STATUS RESTARTS AGE
hello-nginx-6d66656896-hqz9j 1/1 Running 0 6m
Expose as a Service
c:\> kubectl.exe expose deployment hello-nginx --type=NodePort
service "hello-nginx" exposed
List exposed services using minikube
c:\> minikube.exe service list
|-------------|----------------------|-----------------------------|
| NAMESPACE | NAME | URL |
|-------------|----------------------|-----------------------------|
| default | hello-nginx | http://192.168.99.100:31313 |
| default | kubernetes | No node port |
| kube-system | kube-dns | No node port |
| kube-system | kubernetes-dashboard | http://192.168.99.100:30000 |
|-------------|----------------------|-----------------------------|
Access Nginx from browser http://192.168.99.100:31313
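To double-check the NodePort service from the command line, a small sketch reusing the hello-nginx names above (minikube service without the list argument also opens the service URL in the default browser):
c:\> kubectl.exe get deployment,svc hello-nginx
c:\> minikube.exe service hello-nginx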
This method can be used.
This worked for me on CentOS 7:
$ systemctl enable nginx
$ systemctl restart nginx
or
$ systemctl start nginx