CrashLoopBackOff on kubernetes-dashboard - docker

I'm a noob with Kubernetes. I'm trying to follow some recipes to get a small cluster up and running, but I'm having trouble ...
I have a master and (4) nodes, all running Ubuntu 16.04
installed docker on all nodes:
$ sudo apt-get update
$ sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
"deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
$(lsb_release -cs) \
stable"
$ sudo apt-get update && sudo apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')
$ sudo docker version
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
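Since a specific docker-ce build is being pinned above, the package can also be held so a later apt-get upgrade does not replace it (a small sketch, not part of the original recipe):
$ sudo apt-mark hold docker-ce
$ apt-mark showhold    # verify the hold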
turned off swap on all nodes
$ sudo swapoff -a
commented out the swap mounts in /etc/fstab
$ sudo vi /etc/fstab
$ mount -a
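As a non-interactive alternative to editing /etc/fstab by hand, the swap entries can be commented out with sed (a sketch; it assumes the swap lines use the standard whitespace-separated fstab fields):
$ sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab   # keeps the original as /etc/fstab.bak
$ free -h                                          # the Swap: line should show 0B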
installed kubeadm & kubectl on all nodes:
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubeadm kubectl
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4",
GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean",
BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc",
Platform:"linux/amd64"}
downloaded and unpacked this into /usr/local/bin on master and all nodes: https://github.com/kubernetes-incubator/cri-tools/releases
installed etcd 3.3.0 on all nodes:
$ sudo groupadd --system etcd
$ sudo useradd --home-dir "/var/lib/etcd" \
--system \
--shell /bin/false \
-g etcd \
etcd
$ sudo mkdir -p /etc/etcd
$ sudo chown etcd:etcd /etc/etcd
$ sudo mkdir -p /var/lib/etcd
$ sudo chown etcd:etcd /var/lib/etcd
$ sudo rm -rf /tmp/etcd && mkdir -p /tmp/etcd
$ sudo curl -L https://github.com/coreos/etcd/releases/download/v3.3.0/etcd-v3.3.0-linux-amd64.tar.gz -o /tmp/etcd-3.3.0-linux-amd64.tar.gz
$ sudo tar xzvf /tmp/etcd-3.3.0-linux-amd64.tar.gz -C /tmp/etcd --strip-components=1
$ sudo cp /tmp/etcd/etcd /usr/bin/etcd
$ sudo cp /tmp/etcd/etcdctl /usr/bin/etcdctl
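To confirm the copied binaries are on the PATH and report the expected version, a quick sanity check (not in the original steps):
$ etcd --version
$ etcdctl --version    # with ETCDCTL_API=3 the equivalent is 'etcdctl version'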
noted the IP of the master:
$ sudo ifconfig -a eth0
eth0 Link encap:Ethernet HWaddr 1e:00:51:00:00:28
inet addr:172.20.43.30 Bcast:172.20.43.255 Mask:255.255.254.0
inet6 addr: fe80::27b5:3d06:94c9:9d0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3194023 errors:0 dropped:0 overruns:0 frame:0
TX packets:3306456 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:338523846 (338.5 MB) TX bytes:3682444019 (3.6 GB)
initialized kubernetes on the master:
$ sudo kubeadm init --pod-network-cidr=172.20.43.0/16 \
--apiserver-advertise-address=172.20.43.30 \
--ignore-preflight-errors=cri \
--kubernetes-version stable-1.9
[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [jenkins-kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.20.43.30]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 37.502640 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node jenkins-kube-master as master by adding a label and a taint
[markmaster] Master jenkins-kube-master tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 6be4b1.9a8dacf89f71e53c
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 --discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf
on the master node:
$ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ export KUBECONFIG=$HOME/.kube/config
$ echo "export KUBECONFIG=$HOME/.kube/config" | tee -a ~/.bashrc
set up flannel for networking on the master:
$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
clusterrole "flannel" created
clusterrolebinding "flannel" created
serviceaccount "flannel" created
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
clusterrole "flannel" configured
clusterrolebinding "flannel" configured
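To confirm the flannel DaemonSet created above is scheduled and running on every node, something like this can be run (a sketch):
$ kubectl get daemonset kube-flannel-ds --namespace=kube-system
$ kubectl get pods --namespace=kube-system -o wide | grep flannel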
joined the nodes to the cluster by running this on each:
$ sudo kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 \
--discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf \
--ignore-preflight-errors=cri
installed the dashboard on the master:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role "kubernetes-dashboard-minimal" created
rolebinding "kubernetes-dashboard-minimal" created
deployment "kubernetes-dashboard" created
service "kubernetes-dashboard" created
started the proxy:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
opened another ssh session to the master with -L 8001:127.0.0.1:8001 and opened a local browser window to http://localhost:8001/ui
it redirects to http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ and says:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "no endpoints available for service \"https:kubernetes- dashboard:\"",
"reason": "ServiceUnavailable",
"code": 503
}
checking the pods ...
$ sudo kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default guids-74487d79cf-zsj8q 1/1 Running 0 4h
kube-system etcd-jenkins-kube-master 1/1 Running 1 21h
kube-system kube-apiserver-jenkins-kube-master 1/1 Running 1 21h
kube-system kube-controller-manager-jenkins-kube-master 1/1 Running 2 21h
kube-system kube-dns-6f4fd4bdf-7pr9q 3/3 Running 0 1d
kube-system kube-flannel-ds-pvk8m 1/1 Running 0 4h
kube-system kube-flannel-ds-q4fsl 1/1 Running 0 4h
kube-system kube-flannel-ds-qhxn6 1/1 Running 0 21h
kube-system kube-flannel-ds-tkspz 1/1 Running 0 4h
kube-system kube-flannel-ds-vgqsb 1/1 Running 0 4h
kube-system kube-proxy-7np9b 1/1 Running 0 4h
kube-system kube-proxy-9lx8h 1/1 Running 1 1d
kube-system kube-proxy-f46d8 1/1 Running 0 4h
kube-system kube-proxy-fdtx9 1/1 Running 0 4h
kube-system kube-proxy-kmnjf 1/1 Running 0 4h
kube-system kube-scheduler-jenkins-kube-master 1/1 Running 1 21h
kube-system kubernetes-dashboard-5bd6f767c7-xf42n 0/1 CrashLoopBackOff 53 4h
checking the log ...
$ sudo kubectl logs kubernetes-dashboard-5bd6f767c7-xf42n --namespace=kube-system
2018/03/20 17:56:25 Starting overwatch
2018/03/20 17:56:25 Using in-cluster config to connect to apiserver
2018/03/20 17:56:25 Using service account token for csrf signing
2018/03/20 17:56:25 No request provided. Skipping authorization
2018/03/20 17:56:55 Error while initializing connection to Kubernetes apiserver.
This most likely means that the cluster is misconfigured (e.g., it has invalid
apiserver certificates or service accounts configuration) or the
--apiserver-host param points to a server that does not exist.
Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
I find this reference to 10.96.0.1 rather odd. I don't have that on my network anywhere that I'm aware of.
I put the output of sudo kubectl describe pod --namespace=kube-system on pastebin:
https://pastebin.com/cPppPkRw
Thanks in advance for any pointers.
-Steve Maring
Orlando, FL

--service-cluster-ip-range=10.96.0.0/12
Line 76 of your pastebin shows the Service CIDR to be exactly that, which squares with how Kubernetes thinks of the world: .1 in the Service CIDR is always the kubernetes Service (IIRC kube-dns gets a fairly low IP assignment too, but I can't recall whether it is always fixed the way the kubernetes one is).
You'll want to either change both the Service and Pod CIDRs to fit within the 10.244.0.0/16 subnet that flannel created as a side effect of deploying that yaml, or change flannel's ConfigMap (err, at your peril, now that the network has already been pushed into etcd) to align with the Service and Pod CIDRs specified to your apiserver. A sketch of both options follows.
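A minimal sketch, reusing the flag values and object names already shown in the question and its output (adapt to your setup):
# Option 1: start over with a Pod CIDR that matches flannel's default network
$ sudo kubeadm reset
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=172.20.43.30 \
  --kubernetes-version stable-1.9
# then re-join the nodes and re-apply the flannel manifest

# Option 2 (at your peril, as noted above): inspect/edit flannel's net-conf.json
# so its Network matches the Pod CIDR actually given to kubeadm
$ kubectl -n kube-system get configmap kube-flannel-cfg -o yaml
$ kubectl -n kube-system edit configmap kube-flannel-cfg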

Related

Error trying to run kubeadm init on CentOS 7

I am new to Kubernetes and I can't run "kubeadm init" successfully.
Let me show you step by step what I did:
I installed the latest Docker version using yum, following the Docker documentation
(I have configured 'Environment="HTTP_PROXY=http://usuario:password@proxy:port/" "HTTPS_PROXY=http://usuario:password@proxy:port/"' in /etc/systemd/system/docker.service.d/http-proxy.conf).
I have disabled SELinux, disabled swap with the command "swapoff -a", and commented out "#/dev/mapper/centos-swap swap swap defaults 0 0" in /etc/fstab.
I used "modprobe br_netfilter" and "echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables" to activate the module called "br_netfilter".
"kubernetes.repo" file to install "kubelet kubeadm kubectl" using yum:
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
Opened ports:
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=10251/tcp
firewall-cmd --permanent --add-port=10252/tcp
firewall-cmd --permanent --add-port=10255/tcp
firewall-cmd --reload
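To confirm the reload actually picked those rules up, the open ports can be listed (a quick sanity check, not one of the steps above):
firewall-cmd --list-ports
firewall-cmd --list-all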
I created "10-kubeadm.conf" file:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
Reload and enable services:
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
systemctl restart kubelet
systemctl enable kubelet
(both services with status: active(running))
Error:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
Thanks in advance for the help.
Best Regards.
Please disable your swap:
swapoff -a
vim /etc/fstab
comment out the swap line
After that, install these packages:
yum install -y yum-utils device-mapper-persistent-data lvm2
and add the Docker repo:
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
then install Docker with this command:
yum install -y docker-ce
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
yum install -y kubelet kubeadm kubectl
then reboot
systemctl start docker && systemctl enable docker
systemctl start kubelet && systemctl enable kubelet
systemctl daemon-reload
systemctl restart kubelet
kubeadm init --apiserver-advertise-address=MASTER_IP --pod-network-cidr=10.244.0.0/16
do not change 10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Next, deploy the flannel network to the kubernetes cluster using the kubectl command.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
I have written out the complete way to run Kubernetes, and I have brought up clusters with these commands many times.
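Once the flannel manifest is applied, the state of the node and of the kube-system pods can be checked roughly like this (a sketch):
kubectl get nodes
kubectl get pods -n kube-system -o wide
# the node should move to Ready once the network add-on pods are running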

No nodes available on Minikube

I am working with Minikube inside a VMware VM (Ubuntu 16.04).
Everything was fine for a couple of weeks.
One day I noticed that all my pods were stuck on "Pending".
I described one of the pods and saw:
no nodes available to schedule pods
I uninstalled minikube:
minikube stop; minikube delete &&
docker stop $(docker ps -aq) &&
rm -rf ~/.kube ~/.minikube &&
rm -rf /usr/local/bin/localkube /usr/local/bin/minikube &&
rm -rf /etc/kubernetes/ &&
docker system prune -af --volumes
systemctl stop kubelet
systemctl disable kubelet
Installed it again:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && mv minikube /usr/local/bin/
swapoff -a
minikube start --vm-driver=none
mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
chown -R $USER $HOME/.minikube
chgrp -R $USER $HOME/.minikube
When I ran kubectl get nodes I received: The connection to the server 192.168.21.129:8443 was refused - did you specify the right host or port?
I ran docker ps -a:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
602e5d1a27c1 3193be46e0b3 "kube-scheduler --ad…" About a minute ago Exited (2) 22 seconds ago k8s_kube-scheduler_kube-scheduler-minikube_kube-system_9729a196c4723b60ab401eaff722982d_1
29208eeb5f46 k8s.gcr.io/pause:3.1 "/pause" 2 minutes ago Exited (0) 22 seconds ago k8s_POD_kube-scheduler-minikube_kube-system_9729a196c4723b60ab401eaff722982d_0
I have only these two containers, and they have exited...
What is happening here?
I uninstalled it as thoroughly as I could (or did I?).
How can I troubleshoot and fix this?
EDIT:
I uninstalled Minikube, but this time I also removed the kubelet service.
I followed the instructions for removing a service from here:
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /lib/systemd/system/kubelet.service
rm -rf /var/lib/kubelet
rm -rf /usr/libexec/kubernetes/kubelet-plugins
rm -rf /usr/bin/kubelet
systemctl daemon-reload
systemctl reset-failed
I searched to see whether I still had it on my system with find / | grep kubelet and found lots of files under /sys/kernel/slab. I restarted the machine and they were gone.
I installed Minikube again, and at first I received errors about the kubelet:
sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI
output: [init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[WARNING FileExisting-ebtables]: ebtables not found in system path
[WARNING FileExisting-socat]: socat not found in system path
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.02.0-ce. Latest validated version: 18.06
[WARNING Hostname]: hostname "minikube" could not be reached
[WARNING Hostname]: hostname "minikube": lookup minikube on 127.0.1.1:53: server misbehaving
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/var/lib/minikube/certs/"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [minikube localhost] and IPs [192.168.21.129 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [minikube localhost] and IPs [192.168.21.129 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
: running command:
sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI
I tried to start it again and it reported success, but once again all the containers were being deleted.
I think it is related to the kubelet, but I removed and reinstalled it completely.
One of the errors was:
[WARNING SystemVerification]: this Docker version is not on the list
of validated versions: 18.02.0-ce. Latest validated version: 18.06
I decided to also remove docker.
This is the full uninstall process I did:
# Remove minikube
minikube stop; minikube delete &&
docker stop $(docker ps -aq) &&
rm -rf ~/.kube ~/.minikube &&
rm -rf /usr/local/bin/localkube /usr/local/bin/minikube &&
rm -rf /etc/kubernetes/ &&
docker system prune -af --volumes
# remove kubelet
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /lib/systemd/system/kubelet.service
rm -rf /var/lib/kubelet
rm -rf /usr/libexec/kubernetes/kubelet-plugins
rm -rf /usr/bin/kubelet
systemctl daemon-reload
systemctl reset-failed
# uninstall docker
dpkg -l | grep -i docker
apt-get purge -y docker.io
rm -rf /var/lib/docker
apt-get autoremove -y --purge docker.io
apt-get autoclean
# remove other pieces
rm -rf /home/myuser/.minikube
rm -rf ~/.kube
rm -f /var/lib/dpkg/info/kubelet*
rm -f /var/cache/apt/archives/kubelet_1.13.2-00_amd64.deb
rm -f /var/lib/systemd/deb-systemd-helper-enabled/kubelet.service.dsh-also
rm -f /var/lib/systemd/deb-systemd-helper-enabled/multi-user.target.wants/kubelet.service
rm -f /etc/default/kubelet
I restarted my machine and installed everything:
apt-get install -y docker.io
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && mv minikube /usr/local/bin/
swapoff -a
minikube start --v=3 --vm-driver=none
Now everything works fine.
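To confirm the reinstalled setup is healthy, a few standard checks (a sketch, not part of the original write-up):
minikube status
kubectl get nodes
kubectl get pods -n kube-system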

Ubuntu docker swarm error "docker: Cannot connect to the Docker daemon. Is the docker daemon running on this host?."

I am trying to set up docker swarm with consul on some Ubuntu 14.04 Vagrant boxes; however, there is an issue with the docker daemon. I already have a progrium/consul container running and a swarm manager container running. 172.28.128.3 is the master machine running everything, and 172.28.128.4 is the machine I am trying to start a docker swarm container on. Here is my command and the output:
vagrant@ubuntu-14:~$ docker -H=172.28.128.4:2375 run -d swarm join \
> --advertise=172.28.128.4:2375 \
> consul://172.28.128.3:8500/
docker: Cannot connect to the Docker daemon. Is the docker daemon running on this host?.
See 'docker run --help'.
There is no other problem with docker, and attempting to start the daemon the same way I would with boot2docker on my Mac gives the following output:
vagrant@ubuntu-14:~$ eval "$(docker-machine env default)"
docker-machine: command not found
Update: here is the output of sudo docker info and docker info (they are exactly the same except for one line, described below)
vagrant@ubuntu-14:~$ sudo docker info
Containers: 8
Running: 2
Paused: 0
Stopped: 6
Images: 8
Server Version: 1.11.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 81
Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 3.13.0-24-generic
Operating System: Ubuntu 14.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 993.9 MiB
Name: ubuntu-14
ID: BBEM:JVHD:UXV7:AGQR:ITUY:3KGT:K4RS:7KSR:ESCJ:2VZQ:QTOG:J26U
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
The only difference between the two commands is that $docker info has the following entry for Network:
Network: host bridge null
On my second machine there is no difference at all between the two command outputs.
UPDATE: after adding DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock" to /etc/default/docker on my worker machine and restarting the docker service on that server (sudo service docker restart), swarm is working correctly.
Thank you JorelC for the solution.
You have to configure every machine on which you want to use Docker over TCP to run in TCP mode. On your remote machine (172.28.128.4 in your question), edit the /etc/default/docker file and add something like this to DOCKER_OPTS:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
After that, you need to restart the service:
sudo service docker restart
Then you should be able to use Docker over TCP. From your client machine, try:
docker -H=172.28.128.4:2375 info
to test whether it is working.
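If that still fails, it is worth confirming on the remote machine (172.28.128.4) that the daemon is really listening on port 2375 (a sketch; netstat ships with Ubuntu 14.04):
sudo netstat -tlnp | grep 2375
ps aux | grep docker    # the daemon line should contain -H tcp://0.0.0.0:2375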
There can also be issues if you are using clones of instances or instance images that have docker preinstalled on them.
To get around that use the following shell script:
#UNINSTALL
sudo apt-get purge -y docker-engine
sudo apt-get autoremove -y --purge docker-engine
#CLONES
sudo rm /etc/docker/key.json
#INSTALL
sudo apt-get install -y curl
sudo curl -sSL http://get.docker.com | sudo sh
sudo usermod -aG docker $(whoami)
sudo su root
And if you want to use the newest version of Docker (1.12, the one with swarm mode built in), use the following script:
# DOCKER 1.12.0
sudo apt-get update
sudo apt-get purge -y lxc-docker docker-engine
sudo apt-get autoremove -y --purge docker-engine
sudo curl -fsSL https://experimental.docker.com/ | sudo sh
sudo chmod 777 /etc/default/docker
echo 'DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"' > /etc/default/docker
sudo chmod 755 /etc/default/docker
sudo rm /etc/docker/key.json
sudo service docker restart
sudo usermod -aG docker $(whoami)
sudo su root
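With 1.12's built-in swarm mode, the swarm itself is then created with the docker CLI rather than by running a swarm container (a sketch; the address reuses the manager IP from the question):
# on the manager (172.28.128.3)
docker swarm init --advertise-addr 172.28.128.3
# on each worker, run the "docker swarm join --token ..." command printed by init
# back on the manager, verify:
docker node ls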

Kubernetes on Mesos, no suitable offer available

I followed the instructions on this page to build and deploy Mesos. I did this on an Ubuntu Trusty VM with one Mesos master and one slave. The following commands are what I used to run Mesos.
$ mesos-master --ip=10.0.2.15 --work_dir=/var/lib/mesos --log_dir=/var/log/mesos
$ mesos-slave --master=10.0.2.15:5050 --containerizers=docker,mesos
All three tests finished without error messages.
Then I followed this page to deploy Kubernetes. After building Kubernetes-Mesos, I used the following commands to deploy Kubernetes.
$ export KUBERNETES_MASTER_IP=10.0.2.15
$ export KUBERNETES_MASTER=http://${KUBERNETES_MASTER_IP}:8888
$ docker run -d --hostname $(uname -n) --name etcd \
-p 4001:4001 -p 7001:7001 quay.io/coreos/etcd:v2.0.12 \
--listen-client-urls http://0.0.0.0:4001 \
--advertise-client-urls http://${KUBERNETES_MASTER_IP}:4001
etcd container is running.
$ export PATH="$(pwd)/_output/local/go/bin:$PATH"
$ export MESOS_MASTER=10.0.2.15:5050
$ cat <<EOF >mesos-cloud.conf
[mesos-cloud]
mesos-master = ${MESOS_MASTER}
EOF
$ km apiserver \
--address=${KUBERNETES_MASTER_IP} \
--etcd-servers=http://${KUBERNETES_MASTER_IP}:4001 \
--service-cluster-ip-range=10.10.10.0/24 \
--port=8888 \
--cloud-provider=mesos \
--cloud-config=mesos-cloud.conf \
--secure-port=0 \
--v=1 >apiserver.log 2>&1 &
$ km controller-manager \
--master=${KUBERNETES_MASTER_IP}:8888 \
--cloud-provider=mesos \
--cloud-config=./mesos-cloud.conf \
--v=1 >controller.log 2>&1 &
$ km scheduler \
--address=${KUBERNETES_MASTER_IP} \
--mesos-master=${MESOS_MASTER} \
--etcd-servers=http://${KUBERNETES_MASTER_IP}:4001 \
--mesos-user=root \
--api-servers=${KUBERNETES_MASTER_IP}:8888 \
--cluster-dns=10.10.10.10 \
--cluster-domain=cluster.local \
--v=2 >scheduler.log 2>&1 &
Logs seem correct, no error message.
kubectl get services shows:
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
k8sm-scheduler 10.10.10.50 <none> 10251/TCP 1m
kubernetes 10.10.10.1 <none> 443/TCP 2m
Then I created a simple nginx pod; kubectl get pods always shows it as Pending. kubectl get events shows:
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
2m 47s 9 nginx Pod Warning FailedScheduling {default-scheduler } Error scheduling: No suitable offers for pod/task
What does "No suitable offers for pod/task" mean? In Mesos' log, I see Mesos keeps sending offers to the Kubernetes framework, but they keep being DECLINED. If I run mesos-execute --master=10.0.2.15:5050 --name=echo --command="echo 'hello world'" --containerizer=docker --docker_image=ubuntu:14.04, it can deploy a Docker image with the "mesos-" prefix and run the command, so the Docker containerizer seems to work properly.
Kubernetes-Mesos will decline offers for several reasons:
The resources in the offer don't satisfy the minimum required to launch the pod-task. The first pod-task launched on a given slave requires executor resources in addition to the pod-task resources.
The resources in the offer aren't compatible with the scheduler. This happens if you start the framework, launch a task, kill the scheduler process, and then restart the scheduler with different flags; some scheduler flags affect the command line used to launch the executor. The quickest way to remedy this is to delete any running pods and manually kill the incompatible executor process(es) already running on the slave(s).
There is a problem with the node info in the apiserver registry.
What version of k8sm are you running? master? You might try increasing the verbosity of the scheduler logs (--v=3) and then dumping a copy of your scheduler logs up on pastebin or some such so that they can be analyzed. It's often difficult to troubleshoot situations like this without the logs.
It sounds like the offers that are coming in do not satisfy the needs of Kubernetes. You have to find out what your framework needs, and then compare that to what the rejected offers look like.
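One way to see what the incoming offers actually contain is to dump the Mesos master's state endpoint and compare the offered cpus/mem/ports against what the scheduler requests (a sketch; the endpoint path matches Mesos releases of that era):
$ curl -s http://10.0.2.15:5050/master/state.json | python -m json.tool > state.json
# inspect the "offers" and "frameworks" sections of state.json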

Docker on AWS fails to launch - custom kernel

I cannot launch docker on my AWS instance:
root@system:~# docker -H tcp://127.0.0.1:2375 -H
unix:///var/run/docker.sock -d
root@system:~# INFO[0000] +job serveapi(tcp://127.0.0.1:2375, unix:///var/run/docker.sock)
INFO[0000] +job init_networkdriver()
INFO[0000] Listening for HTTP on tcp (127.0.0.1:2375)
INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
Unable to allow incoming packets: iptables failed: iptables --wait -I
FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT:
iptables: No chain/target/match by that name.
(exit status 1)
INFO[0000] -job init_networkdriver() = ERR (1)
FATA[0000] (exit status 1)
How would you troubleshoot that sequence of error messages?
Double-check the docker installation procedure on AWS:
[ec2-user ~]$ sudo yum update -y
[ec2-user ~]$ sudo yum install -y docker
[ec2-user ~]$ sudo service docker start
Starting cgconfig service: [ OK ]
Starting docker: [ OK ]
[ec2-user ~]$ sudo usermod -a -G docker ec2-user
# Log out and log back in again to pick up the new docker group permissions.
# Verify that the ec2-user can run Docker commands without sudo.
[ec2-user ~]$ docker info
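Since the failure is about a missing iptables chain, it can also help to look at the chains before and after restarting the daemon (a sketch, not part of the quoted procedure):
[ec2-user ~]$ sudo iptables -L -n           # filter table; the FORWARD chain should exist
[ec2-user ~]$ sudo iptables -t nat -L -n    # docker normally creates its DOCKER chain here on startup
[ec2-user ~]$ sudo service docker restart   # lets the daemon recreate its chains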
