Can't get Pod info in the kubernetes cluster - docker

Following instructions in the book of "Kubernetes Cookbook", I create a docker cluster with one master and two nodes:
master: 198.11.175.18
etcd, flannel, kube-apiserver, kube-controller-manager, kube-scheduler
minion:
etcd, flannel, kubelet, kube-proxy
minion1: 120.27.94.15
minion2: 114.215.142.7
OS version is:
[user1#iZu1ndxa4itZ ~]$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
[user1#iZu1ndxa4itZ ~]$ uname -a
Linux iZu1ndxa4itZ 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Kuberneters version is:
Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"ec7364b6e3b155e78086018aa644057edbe196e5", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"ec7364b6e3b155e78086018aa644057edbe196e5", GitTreeState:"clean"}
I can get the status of two nodes by issuing kubectl on Master.
[user1#iZu1ndxa4itZ ~]$ kubectl get nodes
NAME STATUS AGE
114.215.142.7 Ready 23m
120.27.94.15 Ready 14h
The components on Master work well:
[user1#iZu1ndxa4itZ ~]$ kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
But after starting a nginx container, there is no Pods status.
[user1#iZu1ndxa4itZ ~]$ kubectl run --image=nginx nginx-test
deployment "nginx-test" created
[user1#iZu1ndxa4itZ ~]$ kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
my-first-nginx 2 0 0 0 20h
my-first-nginx01 1 0 0 0 20h
my-first-nginx02 1 0 0 0 19h
nginx-test 1 0 0 0 5h
[user1#iZu1ndxa4itZ ~]$ kubectl get pods
Any clue to diagnose the problem? Thanks.
BTW, I attempted to run two Docker containers manually in different nodes, the two containers can communicate with each other using ping.
Update 2016-08-19
Found clue from services log of kube-apiser and kube-controller-manager, the problem may be caused by incorrect secure configuration:
sudo service kube-apiserver status -l
Aug 19 14:59:53 iZu1ndxa4itZ kube-apiserver[21393]: E0819 14:59:53.118954 21393 genericapiserver.go:716] Unable to listen for secure (open /var/run/kubernetes/apiserver.crt: no such file or directory); will try again.
Aug 19 15:00:08 iZu1ndxa4itZ kube-apiserver[21393]: E0819 15:00:08.120253 21393 genericapiserver.go:716] Unable to listen for secure (open /var/run/kubernetes/apiserver.crt: no such file or directory); will try again.
Aug 19 15:00:23 iZu1ndxa4itZ kube-apiserver[21393]: E0819 15:00:23.121345 21393 genericapiserver.go:716] Unable to listen for secure (open /var/run/kubernetes/apiserver.crt: no such file or directory); will try again.
Aug 19 15:00:38 iZu1ndxa4itZ kube-apiserver[21393]: E0819 15:00:38.122638 21393 genericapiserver.go:716] Unable to listen for secure (open /var/run/kubernetes/apiserver.crt: no such file or directory); will try again.
sudo service kube-controller-manager status -l
Aug 19 15:01:52 iZu1ndxa4itZ kube-controller-manager[21415]: E0819 15:01:52.138742 21415 replica_set.go:446] unable to create pods: pods "my-first-nginx02-1004561501-" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account
Aug 19 15:01:52 iZu1ndxa4itZ kube-controller-manager[21415]: I0819 15:01:52.138799 21415 event.go:211] Event(api.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"my-first-nginx02-1004561501", UID:"ba35be11-652a-11e6-88d2-00163e0017a3", APIVersion:"extensions", ResourceVersion:"120", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: pods "my-first-nginx02-1004561501-" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account
Aug 19 15:01:52 iZu1ndxa4itZ kube-controller-manager[21415]: E0819 15:01:52.144583 21415 replica_set.go:446] unable to create pods: pods "my-first-nginx-3671155609-" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account
Aug 19 15:01:52 iZu1ndxa4itZ kube-controller-manager[21415]: I0819 15:01:52.144657 21415 event.go:211] Event(api.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"my-first-nginx-3671155609", UID:"d6c8288c-6529-11e6-88d2-00163e0017a3", APIVersion:"extensions", ResourceVersion:"54", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: pods "my-first-nginx-3671155609-" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account
Aug 19 15:04:17 iZu1ndxa4itZ kube-controller-manager[21415]: I0819 15:04:17.149320 21415 event.go:211] Event(api.ObjectReference{Kind:"ReplicaSet", Namespace:"default", Name:"nginx-test-863723326", UID:"624ed0ea-65a2-11e6-88d2-00163e0017a3", APIVersion:"extensions", ResourceVersion:"12247", FieldPath:""}): type: 'Warning' reason: 'FailedCreate' Error creating: pods "nginx-test-863723326-" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service account
Aug 19 15:04:17 iZu1ndxa4itZ kube-controller-manager[21415]: E0819 15:04:17.148513 21415 replica_set.go:446] unable to create pods: pods "nginx-test-863723326-" is forbidden: no API token found for service account default/default, retry after the token is automatically created and added to the service accoun

Resolved the problem with following procedure:
openssl genrsa -out /tmp/service_account.key 2048
sudo cp /tmp/service_account.key /etc/kubernetes/service_account.key
sudo vim /etc/kubernetes/apiserver
KUBE_API_ARGS="--secure-port=0 --service-account-key-file=/etc/kubernetes/service_account.key"
sudo service kube-apiserver restart
sudo vim /etc/kubernetes/controller-manager
KUBE_CONTROLLER_MANAGER_ARGS="--service_account_private_key_file=/etc/kubernetes/service_account.key"
sudo service kube-controller-manager restart

Related

Unable to create kubernetes with HA etc on Ubuntu 20.04

Since two days I am fighting with Kubernetes setup on Ubuntu 20.04. I created so called template vm on vSphere and I cloned three vm's out of it.
I have following configurations for each master node:
/etc/hosts
127.0.0.1 localhost
127.0.1.1 kubernetes-master1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.255.200 kubernetes-cluster.homelab01.local
192.168.255.201 kubernetes-master1.homelab01.local
192.168.255.202 kubernetes-master2.homelab01.local
192.168.255.203 kubernetes-master3.homelab01.local
192.168.255.204 kubernetes-worker1.homelab01.local
192.168.255.205 kubernetes-worker2.homelab01.local
192.168.255.206 kubernetes-worker3.homelab01.local
127.0.1.1 kubernetes-master1on a first master and 127.0.1.1 kubernetes-master2 on second one and 127.0.1.1 kubernetes-master3 on the third one.
I am using Docker 19.03.11 which is latest supported by Kubernetes as per documentation.
Docker
Client: Docker Engine - Community
Version: 19.03.11
API version: 1.40
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:12:34 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.11
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 42e35e61f3
Built: Mon Jun 1 09:11:07 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
I used following commands to install docker:
sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
I marked all the necessary packets on hold.
sudo apt-mark hold kubelet kubeadm kubectl docker-ce containerd.io docker-ce-cli
Some details about the VM's.
Master1
sudo cat /sys/class/dmi/id/product_uuid
f09c3242-c8f7-c97e-bc6a-b2065c286ea9
IP: 192.168.255.201
Master2
sudo cat /sys/class/dmi/id/product_uuid
b4fe3242-ba37-a533-c12f-b30b735cbe9f
IP: 192.168.255.202
Master3
sudo cat /sys/class/dmi/id/product_uuid
c3cc3242-4115-8c38-8e46-166190620249
IP: 192.168.255.203
IP addresses and name resolution works flawless on all hosts
192.168.255.200 kubernetes-cluster.homelab01.local
192.168.255.201 kubernetes-master1.homelab01.local
192.168.255.202 kubernetes-master2.homelab01.local
192.168.255.203 kubernetes-master3.homelab01.local
192.168.255.204 kubernetes-worker1.homelab01.local
192.168.255.205 kubernetes-worker2.homelab01.local
192.168.255.206 kubernetes-worker3.homelab01.local
Keepalived.conf
From master1. On master2 it has state=backup and priority 100, on master3 state=backup and priority 89.
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
$STATE=MASTER
$INTERFACE=ens160
$ROUTER_ID=51
$PRIORITY=255
$AUTH_PASS=Kub3rn3t3S!
$APISERVER_VIP=192.168.255.200/24
global_defs {
router_id LVS_DEVEL
}
vrrp_script check_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 3
weight -2
fall 10
rise 2
}
vrrp_instance VI_1 {
state $STATE
interface $INTERFACE
virtual_router_id $ROUTER_ID
priority $PRIORITY
authentication {
auth_type PASS
auth_pass $AUTH_PASS
}
virtual_ipaddress {
$APISERVER_VIP
}
track_script {
check_apiserver
}
}
check_apiserver.sh
/etc/keepalived/check_apiserver.sh
#!/bin/sh
APISERVER_VIP=192.168.255.200
APISERVER_DEST_PORT=6443
errorExit() {
echo "*** $*" 1>&2
exit 1
}
curl --silent --max-time 2 --insecure https://localhost:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://localhost:${APISERVER_DEST_PORT}/"
if ip addr | grep -q ${APISERVER_VIP}; then
curl --silent --max-time 2 --insecure https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/ -o /dev/null || errorExit "Error GET https://${APISERVER_VIP}:${APISERVER_DEST_PORT}/"
fi
Keepalive service
sudo service keepalived status
● keepalived.service - Keepalive Daemon (LVS and VRRP)
Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-01-06 16:41:38 CET; 1min 26s ago
Main PID: 804 (keepalived)
Tasks: 2 (limit: 4620)
Memory: 4.7M
CGroup: /system.slice/keepalived.service
├─804 /usr/sbin/keepalived --dont-fork
└─840 /usr/sbin/keepalived --dont-fork
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering Kernel netlink reflector
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering Kernel netlink command channel
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Opening file '/etc/keepalived/keepalived.conf>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: WARNING - default user 'keepalived_script' fo>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (Line 29) Truncating auth_pass to 8 characters
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: SECURITY VIOLATION - scripts are being execut>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (VI_1) ignoring tracked script check_apiserve>
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Warning - script check_apiserver is not used
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: Registering gratuitous ARP shared channel
Jan 06 16:41:38 kubernetes-master1 Keepalived_vrrp[840]: (VI_1) Entering MASTER STATE
lines 1-20/20 (END)
haproxy.cfg
# /etc/haproxy/haproxy.cfg
#
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log /dev/log local0
log /dev/log local1 notice
daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 1
timeout http-request 10s
timeout queue 20s
timeout connect 5s
timeout client 20s
timeout server 20s
timeout http-keep-alive 10s
timeout check 10s
#---------------------------------------------------------------------
# apiserver frontend which proxys to the masters
#---------------------------------------------------------------------
frontend apiserver
bind *:8443
mode tcp
option tcplog
default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
option httpchk GET /healthz
http-check expect status 200
mode tcp
option ssl-hello-chk
balance roundrobin
server kubernetes-master1 192.168.255.201:6443 check
server kubernetes-master2 192.168.255.202:6443 check
server kubernetes-master3 192.168.255.203:6443 check
haproxy service status
sudo service haproxy status
● haproxy.service - HAProxy Load Balancer
Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-01-06 16:41:38 CET; 3min 12s ago
Docs: man:haproxy(1)
file:/usr/share/doc/haproxy/configuration.txt.gz
Process: 847 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUC>
Main PID: 849 (haproxy)
Tasks: 3 (limit: 4620)
Memory: 4.7M
CGroup: /system.slice/haproxy.service
├─849 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/hapro>
└─856 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/hapro>
Jan 06 16:41:38 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master1 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [WARNING] 005/164139 (856) : Server apiserver/kuberne>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master2 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master2 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [WARNING] 005/164139 (856) : Server apiserver/kuberne>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: [ALERT] 005/164139 (856) : backend 'apiserver' has no>
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master3 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: Server apiserver/kubernetes-master3 is DOWN, reason: >
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: backend apiserver has no server available!
Jan 06 16:41:39 kubernetes-master1 haproxy[856]: backend apiserver has no server available!
lines 1-23/23 (END)
I am creating the first kubernetes node with following command
sudo kubeadm init --control-plane-endpoint kubernetes-cluster.homelab01.local:8443 --upload-certs
This works well and I apply Calico CNI plugin with command
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
After that I am attempting join from master2.
Keepalived works perfectly fine as I tested it on all three with stopping service and observing failover to other nodes. When on the first master1 node I created kubernetes haproxy informed that backend was visible.
Kubernetes cluster bootstrap process
udo kubeadm init --control-plane-endpoint kubernetes-cluster.homelab01.local:8443 --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-cluster.homelab01.local kubernetes-master1 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.255.201]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master1 localhost] and IPs [192.168.255.201 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master1 localhost] and IPs [192.168.255.201 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 18.539325 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
[mark-control-plane] Marking the node kubernetes-master1 as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node kubernetes-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 2cu336.rjxs8i0svtna27ke
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
--discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54 \
--control-plane --certificate-key 57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
--discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54
All stuff is up and running on master1
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/calico-kube-controllers-744cfdf676-mks4d 1/1 Running 0 36s
kube-system pod/calico-node-bnvmz 1/1 Running 0 37s
kube-system pod/coredns-74ff55c5b-skdzk 1/1 Running 0 3m11s
kube-system pod/coredns-74ff55c5b-tctl9 1/1 Running 0 3m11s
kube-system pod/etcd-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-apiserver-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-controller-manager-kubernetes-master1 1/1 Running 0 3m4s
kube-system pod/kube-proxy-smmmx 1/1 Running 0 3m11s
kube-system pod/kube-scheduler-kubernetes-master1 1/1 Running 0 3m4s
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3m17
s
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 3m11
s
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SE
LECTOR AGE
kube-system daemonset.apps/calico-node 1 1 1 1 1 kuberne
tes.io/os=linux 38s
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kuberne
tes.io/os=linux 3m11s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/calico-kube-controllers 1/1 1 1 38s
kube-system deployment.apps/coredns 2/2 2 2 3m11s
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/calico-kube-controllers-744cfdf676 1 1 1 37s
kube-system replicaset.apps/coredns-74ff55c5b 2 2 2 3m11s
Immediately after attempting to join master2 to cluster master1 kubernetes dies.
wojcieh#kubernetes-master2:~$ sudo kubeadm join kubernetes-cluster.homelab01.local:8443 --token 2cu336.rjxs8i0svtna27ke \
> --discovery-token-ca-cert-hash sha256:eb0668ca16acec622e4a97d69e0d4c42e64b1a61ffea13a3787956817021ca54 \
> --control-plane --certificate-key 57abea9f00357a4459c852249ac0170633c9a0f2327cde191e529a1689ea158b
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes-cluster.homelab01.local kubernetes-master2 kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.255.202]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kubernetes-master2 localhost] and IPs [192.168.255.202 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kubernetes-master2 localhost] and IPs [192.168.255.202 127.0.0.1 ::1]
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
Broadcast message from systemd-journald#kubernetes-master2 (Wed 2021-01-06 16:53:04 CET):
haproxy[870]: backend apiserver has no server available!
Broadcast message from systemd-journald#kubernetes-master2 (Wed 2021-01-06 16:53:04 CET):
haproxy[870]: backend apiserver has no server available!
^C
wojcieh#kubernetes-master2:~$
Here are some logs which might be relevant
Logs from master1 https://pastebin.com/Y1zcwfWt
Logs from master2 https://pastebin.com/rBELgK1Y

Cannot initialize Kubernetes cluster on Ubuntu 18.04 (Virtual Box)

I struggle to initialize a simple Kubernetes cluster using Ubuntu on Virtualbox. I tried both server and desktop version, following the official documentation:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
I also tried to follow some other ones, thinking the issue was because i'm using Virtualbox VM's, like this one:
https://medium.com/#gunjangarge/create-kubernetes-cluster-using-kubeadm-on-ubuntu-virtualbox-step-by-step-68a3eeb1f74c
But everytime I have the same issue with port 6443 not being exposed. Sometimes the process starts correctly, giving me the join command:
kubeadm init --pod-network-cidr=192.168.0.0/16
W1029 08:47:53.841460 11540 configset.go:348] WARNING: kubeadm cannot validate component configs
for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.192:6443 --token ztnoww.t8ng5a3jo2kx5cb2 \
--discovery-token-ca-cert-hash
sha256:907dde6cc6d72ed4cd7fe7e7f252e2cf657dd3256fba6ee5ec92027132a9c5af
Sometimes it's not starting at all and timeouting:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Anyway, even when it's starting, port 6443 is never exposed, and kubelet is not happy with it:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2020-10-29 08:48:15 CET; 20s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 13262 (kubelet)
Tasks: 14 (limit: 4666)
CGroup: /system.slice/kubelet.service
└─13262 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-contai
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.588386 13262 controller.go:136] failed to ensure node lease exists, will retry in 800ms, error: Get
"https://192.168.1.192:6443/apis/coordination.k8s.io/v1/names
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.785951 13262 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.192:644
Okt 29 08:48:23 master kubelet[13262]: I1029 08:48:23.022354 13262 kubelet_node_status.go:70] Attempting to register node master
Okt 29 08:48:24 master kubelet[13262]: I1029 08:48:24.188510 13262 request.go:645] Throttling request took 1.097264312s, request: POST:https://192.168.1.192:6443/api/v1/namespaces/kube-system/pods
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.678880 13262 kubelet_node_status.go:108] Node master was previously registered
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.679004 13262 kubelet_node_status.go:73] Successfully registered node master
Okt 29 08:48:25 master kubelet[13262]: W1029 08:48:25.765981 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:27 master kubelet[13262]: E1029 08:48:27.148246 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
Okt 29 08:48:30 master kubelet[13262]: W1029 08:48:30.767511 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:32 master kubelet[13262]: E1029 08:48:32.164211 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
I have to say I don't know what to do now. I tried for hours with different Ubuntu versions, trying to find solutions on the Internet but I didn't found any solution. I also went trough the logs and found that maybe the config file is not created correctly for any reason:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml
but I found nothing about it, except "try to init the cluster again", which I did several times..
Thank you in advance for your help!
OK, I think I finally found the problem. I tried the same process on another PC and everything worked smoothly, so for anyway of you having a similar issue, just don't try to use VirtualBox and WSL at the same time (even if wsl is shut off)
I just did what's explained here: https://stackoverflow.com/a/63229718/2428805 and now everything's fine...

I can't set up Kubernetes in Centos 7: Unable to update cni config

I am trying to follow docs to setup a one node Kubernetes cluster with Centos 7.
kubeadm init will return no error but kubectl get nodes will return:
NAME STATUS AGE VERSION
[MY_IP] NotReady 22s v1.6.4
system log repeats:
Jun 6 16:21:48 localhost kubelet: W0606 16:21:48.064388 11520 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 6 16:21:48 localhost kubelet: E0606 16:21:48.064537 11520 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I can only find info about this in Kubernetes github logs but they talk about a bug and I haven't found a workaround. Thanks
you can run this command
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Looks like you need a pod network. Have you completed step 3 in the guide here? If you install one of the network overlays (listed at https://kubernetes.io/docs/concepts/cluster-administration/addons/), you should be good to go.

When using mesos, marathon, and zookeeper my mesos-slave doesnt start when I specify the "containerizers" file with "docker,mesos"?

I have 3 CentOS VMs and I have installed Zookeeper, Marathon, and Mesos on the master node, while only putting Mesos on the other 2 VMs. The master node has no mesos-slave running on it. I am trying to run Docker containers so i specified "docker,mesos" in the containerizes file. One of the mesos-agents starts fine with this configuration and I have been able to deploy a container to that slave. However, the second mesos-agent simply fails when I have this configuration (it works if i take out that containerizes file but then it doesn't run containers). Here are some of the logs and information that has come up:
Here are some "messages" in the log directory:
Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
WARNING: Logging before InitGoogleLogging() is written to STDERR
[main.cpp:243] Build: 2017-04-12 16:39:09 by centos
[main.cpp:244] Version: 1.2.0
[main.cpp:247] Git tag: 1.2.0
[main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
[logging.cpp:194] INFO level logging started!
[systemd.cpp:238] systemd version `219` detected
[main.cpp:342] Inializing systemd state
[systemd.cpp:326] Started systemd slice `mesos_executors.slice`
[containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
[linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
[provisioner.cpp:249] Using default backend 'copy'
[slave.cpp:211] Mesos agent started on (1)#172.22.150.87:5051
[slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
[slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
[slave.cpp:549] Agent attributes: [ ]
[slave.cpp:554] Agent hostname: node3
[status_update_manager.cpp:177] Pausing sending status updates
[state.cpp:62] Recovering state from '/var/lib/mesos/meta'
[state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
[status_update_manager.cpp:203] Recovering status update manager
[docker.cpp:868] Recovering Docker containers
[containerizer.cpp:599] Recovering containerizer
[provisioner.cpp:410] Provisioner recovery complete
[group.cpp:340] Group process (zookeeper-group(1)#172.22.150.87:5051) connected to ZooKeeper
[group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
[group.cpp:418] Trying to create path '/mesos' in ZooKeeper
[detector.cpp:152] Detected a new leader: (id='15')
[group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
[zookeeper.cpp:259] A new leading master (UPID=master#172.22.150.88:5050) is detected
Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.
Logs from docker:
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine Loaded:
loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT;
24h ago Docs: docs.docker.com Main PID: 872 (code=exited, status=0/SUCCESS)
Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'
Logs from flannel:
[flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
You have answer in your logs
Failed to perform recovery: Collect failed:
Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1;
stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Mesos keeps it state/metadata on local disk. When it's restarted it try to load this state. If configuration changed and is not compatible with previous state it won't start.
Just bring docker to live by fixing problems with flannel and etcd and everything will be fine.
add the following flag while starting agent,
--reconfiguration_policy=additive
more details here: http://mesos.apache.org/documentation/latest/agent-recovery/

SkyDNS does not work with Kubernetes 1.1.2

I installed successfully Kubernetes 1.1.2 on CoreOS alpha (877.1.0) will the service files as link https://gist.github.com/thanhson1085/5a005e92245cb2288dee
After that, I want to run SkyDNS AddOn for my Kubernetes as a Service Discovery. And I followed this guide: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns
But it does not work:
core#coreos-1 ~/projects $ sudo kubectl exec busybox -- nslookup kubernetes.default.cluster.local
Server: 10.100.100.100
Address 1: 10.100.100.100
nslookup: can't resolve 'kubernetes.default.cluster.local'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
core#coreos-1 ~/projects $ sudo kubectl exec busybox -- nslookup kubernetes.default
Server: 10.100.100.100
Address 1: 10.100.100.100
nslookup: can't resolve 'kubernetes.default'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
core#coreos-1 ~/projects $ sudo kubectl exec busybox -- nslookup kubernetes.default.svc.cluster.local
Server: 10.100.100.100
Address 1: 10.100.100.100
nslookup: can't resolve 'kubernetes.default.svc.cluster.local'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
It works if i lookup global domain:
core#coreos-1 ~/projects $ sudo kubectl exec busybox -- nslookup google.com
Server: 10.100.100.100
Address 1: 10.100.100.100
Name: google.com
Address 1: 2404:6800:4003:c02::64 sc-in-x64.1e100.net
Address 2: 74.125.200.138 sa-in-f138.1e100.net
Address 3: 74.125.200.139 sa-in-f139.1e100.net
Address 4: 74.125.200.100 sa-in-f100.1e100.net
Address 5: 74.125.200.113 sa-in-f113.1e100.net
Address 6: 74.125.200.102 sa-in-f102.1e100.net
Address 7: 74.125.200.101 sa-in-f101.1e100.net
LOGS:
core#coreos-1 ~/projects $ sudo docker logs k8s_skydns.ed0ae89c_kube-dns-v9-a5afs_kube-system_b45f88f2-9684-11e5-9f5a-04018a53d701_32d98a24
2015/11/29 10:47:00 skydns: falling back to default configuration, could not read from etcd: 100: Key not found (/skydns) [2]
2015/11/29 10:47:00 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2015/11/29 10:47:00 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
kube2sky logs:
Failed to list *api.Service: couldn't get version/kind; json parse error: invalid character '<' looking for beginning of value
Failed to list *api.Endpoints: couldn't get version/kind; json parse error: invalid character '<' looking for beginning of value
And the below is the information for my installation:
core#coreos-1 ~/projects $ sudo kubectl get endpoints -a --all-namespaces
NAMESPACE NAME ENDPOINTS AGE
default frontend 10.100.21.7:80,10.100.21.9:80 1d
default kubernetes 128.199.134.19:6443 1d
default redis-master 10.100.21.2:6379 1d
default redis-slave 10.100.21.5:6379,10.100.21.6:6379 1d
kube-system kube-dns 51m
core#coreos-1 ~/projects $ sudo kubectl get pods -a --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 1/1 Running 0 51m
default frontend-1w89q 1/1 Running 0 1d
default frontend-w44qx 1/1 Running 0 1d
default redis-master-qde8g 1/1 Running 0 1d
default redis-slave-3m0t3 1/1 Running 0 1d
default redis-slave-m6fc8 1/1 Running 0 1d
kube-system kube-dns-v9-zrd22 3/4 Running 96 52m
core#coreos-1 ~/projects $ sudo kubectl get services -a --all-namespaces
NAMESPACE NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE
default frontend 10.100.192.188 nodes 80/TCP name=frontend 1d
default kubernetes 10.100.0.1 <none> 443/TCP <none> 1d
default redis-master 10.100.180.191 <none> 6379/TCP name=redis-master 1d
default redis-slave 10.100.146.91 <none> 6379/TCP name=redis-slave 1d
kube-system kube-dns 10.100.100.100 <none> 53/UDP,53/TCP k8s-app=kube-dns 52m
core#coreos-1 ~/projects $ sudo kubectl get rc -a --all-namespaces
NAMESPACE CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
default frontend php-redis gcr.io/google_samples/gb-frontend:v3 name=frontend 2 1d
default redis-master master redis name=redis-master 1 1d
default redis-slave worker gcr.io/google_samples/gb-redisslave:v1 name=redis-slave 2 1d
kube-system kube-dns-v9 etcd gcr.io/google_containers/etcd:2.0.9 k8s-app=kube-dns,version=v9 1 52m
kube2sky gcr.io/google_containers/kube2sky:1.11
skydns gcr.io/google_containers/skydns:2015-10-13-8c72f8c
healthz gcr.io/google_containers/exechealthz:1.0
See Kubelet Service Status:
core#coreos-1 ~ $ sudo systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2015-11-28 07:17:28 UTC; 52min ago
Main PID: 27007 (kubelet)
Memory: 2.9M
CPU: 944ms
CGroup: /system.slice/kubelet.service
├─27007 /opt/bin/kubelet --port=10250 --cluster-dns=10.100.100.100 --cluster-domain=cluster.local --hostname-override=coreos-1 --api-servers=http://127.0.0.1:8080 --logtostderr=true
└─27018 journalctl -k -f
Nov 28 08:09:26 coreos-1 kubelet[27007]: E1128 08:09:26.690587 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:27 coreos-1 kubelet[27007]: E1128 08:09:27.689255 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:28 coreos-1 kubelet[27007]: E1128 08:09:28.696916 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:29 coreos-1 kubelet[27007]: E1128 08:09:29.697705 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:30 coreos-1 kubelet[27007]: E1128 08:09:30.691816 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:31 coreos-1 kubelet[27007]: E1128 08:09:31.684655 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:32 coreos-1 kubelet[27007]: I1128 08:09:32.008987 27007 manager.go:1769] pod "kube-dns-v9-zrd22_kube-system" container "skydns" is unhealthy (probe result: failure), it will be killed and re-created.
Nov 28 08:09:32 coreos-1 kubelet[27007]: E1128 08:09:32.708600 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:33 coreos-1 kubelet[27007]: E1128 08:09:33.708741 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Nov 28 08:09:34 coreos-1 kubelet[27007]: E1128 08:09:34.685334 27007 fs.go:211] Stat fs failed. Error: no such file or directory
Please help me to fix it.
And can I use Consul to place of SkyDNS in this case?
Isn't it just a typo? Try nslookup kubernetes.default.svc.cluster.local or kubernetes.default
Right now, you searched for kubenetes when it should be kubernetes
Hope it helps!
UPDATE:
It seems like you haven't configured the --cluster-dns, and --cluster-domain in your kubelet service.
I found the solution from #Tim Hockin comment.
I got logs from kube2sky service:
Failed to list *api.Service: couldn't get version/kind; json parse error: invalid character '<' looking for beginning of value
Failed to list *api.Endpoints: couldn't get version/kind; json parse error: invalid character '<' looking for beginning of value
With this error, I just need to add:
- -kube_master_url=http://10.130.162.60:8080 # IP Kubernets Api Server
To skydns-rc.yaml file.
All the details about my service configuration files, please see: https://gist.github.com/thanhson1085/5a005e92245cb2288dee
After that, restart dns service containers. It works:
core#coreos-1 ~/projects $ sudo kubectl exec busybox -- nslookup kubernetes.default
Server: 10.100.100.100
Address 1: 10.100.100.100
Name: kubernetes.default
Address 1: 10.100.0.1

Resources