No nodes available on Minikube - docker

I am working with Minikube inside a VMware VM (Ubuntu 16.04).
Everything was fine for a couple of weeks.
One day I came in and noticed that all my pods were stuck in "Pending".
I described one of the pods and saw:
no nodes available to schedule pods
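For reference, this is the kind of check that shows the scheduler's view (the pod name is just a placeholder):
kubectl get nodes -o wide                       # empty or NotReady output explains the Pending pods
kubectl describe pod my-pod | tail -n 20        # scheduling events for a stuck pod
kubectl get events --sort-by=.metadata.creationTimestamp | tail -n 20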
I uninstalled minikube:
minikube stop; minikube delete &&
docker stop $(docker ps -aq) &&
rm -rf ~/.kube ~/.minikube &&
rm -rf /usr/local/bin/localkube /usr/local/bin/minikube &&
rm -rf /etc/kubernetes/ &&
docker system prune -af --volumes
systemctl stop kubelet
systemctl disable kubelet
Installed it again:
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && mv minikube /usr/local/bin/
swapoff -a
minikube start --vm-driver=none
mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
chown -R $USER $HOME/.minikube
chgrp -R $USER $HOME/.minikube
When I ran kubectl get nodes I received: The connection to the server 192.168.21.129:8443 was refused - did you specify the right host or port?
I ran docker ps -a:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
602e5d1a27c1 3193be46e0b3 "kube-scheduler --ad…" About a minute ago Exited (2) 22 seconds ago k8s_kube-scheduler_kube-scheduler-minikube_kube-system_9729a196c4723b60ab401eaff722982d_1
29208eeb5f46 k8s.gcr.io/pause:3.1 "/pause" 2 minutes ago Exited (0) 22 seconds ago k8s_POD_kube-scheduler-minikube_kube-system_9729a196c4723b60ab401eaff722982d_0
I have only these 2 containers, and both have exited...
What is happening here?
I uninstalled it as deeply as I could (or did I?).
How do I troubleshoot and fix it?
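For reference, the basic health checks for the none driver look something like this (everything runs directly in the host's Docker):
sudo ss -tlnp | grep 8443                        # is anything listening on the apiserver port?
sudo docker ps -a | grep kube | grep -v pause    # control-plane containers
sudo systemctl status kubelet
sudo journalctl -xeu kubelet | tail -n 50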
EDIT:
I uninstalled Minikube again, but this time also removed the kubelet service.
I followed the instructions for removing a systemd service:
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /lib/systemd/system/kubelet.service
rm -rf /var/lib/kubelet
rm -rf /usr/libexec/kubernetes/kubelet-plugins
rm -rf /usr/bin/kubelet
systemctl daemon-reload
systemctl reset-failed
I searched to see if anything was left with find / | grep kubelet and found lots of entries under /sys/kernel/slab. After restarting the machine they were gone.
I installed Minikube again, and at first I received errors about the kubelet:
sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI
output: [init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[WARNING FileExisting-ebtables]: ebtables not found in system path
[WARNING FileExisting-socat]: socat not found in system path
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.02.0-ce. Latest validated version: 18.06
[WARNING Hostname]: hostname "minikube" could not be reached
[WARNING Hostname]: hostname "minikube": lookup minikube on 127.0.1.1:53: server misbehaving
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/var/lib/minikube/certs/"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [minikube localhost] and IPs [192.168.21.129 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [minikube localhost] and IPs [192.168.21.129 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
: running command:
sudo /usr/bin/kubeadm init --config /var/lib/kubeadm.yaml --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests --ignore-preflight-errors=DirAvailable--data-minikube --ignore-preflight-errors=Port-10250 --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml --ignore-preflight-errors=FileAvailable--etc-kubernetes-manifests-etcd.yaml --ignore-preflight-errors=Swap --ignore-preflight-errors=CRI
I tried to start it again and this time it reported success, but then once more all the containers were deleted.
I think it is related to the kubelet, but I removed it and reinstalled it completely.

One of the errors was:
[WARNING SystemVerification]: this Docker version is not on the list
of validated versions: 18.02.0-ce. Latest validated version: 18.06
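Before removing Docker, the version itself can be checked and, if desired, pinned to a validated release, roughly like this (the pinned package version is left as a placeholder):
docker version --format '{{.Server.Version}}'
apt-cache madison docker.io                       # list the versions the repos offer
# sudo apt-get install -y docker.io=<validated version from the list above>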
In the end I decided to also remove Docker.
This is the full uninstall process I did:
# Remove minikube
minikube stop; minikube delete &&
docker stop $(docker ps -aq) &&
rm -rf ~/.kube ~/.minikube &&
rm -rf /usr/local/bin/localkube /usr/local/bin/minikube &&
rm -rf /etc/kubernetes/ &&
docker system prune -af --volumes
# remove kubelet
systemctl stop kubelet
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service.d
rm -rf /lib/systemd/system/kubelet.service
rm -rf /var/lib/kubelet
rm -rf /usr/libexec/kubernetes/kubelet-plugins
rm -rf /usr/bin/kubelet
systemctl daemon-reload
systemctl reset-failed
# uninstall docker
dpkg -l | grep -i docker
apt-get purge -y docker.io
rm -rf /var/lib/docker
apt-get autoremove -y --purge docker.io
apt-get autoclean
# remove other pieces
rm -rf /home/myuser/.minikube
rm -rf ~/.kube
rm -f /var/lib/dpkg/info/kubelet*
rm -f /var/cache/apt/archives/kubelet_1.13.2-00_amd64.deb
rm -f /var/lib/systemd/deb-systemd-helper-enabled/kubelet.service.dsh-also
rm -f /var/lib/systemd/deb-systemd-helper-enabled/multi-user.target.wants/kubelet.service
rm -f /etc/default/kubelet
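To make sure nothing was left behind, checks along these lines are enough (purely illustrative):
which kubelet kubeadm kubectl minikube docker
systemctl list-unit-files | grep -E 'kubelet|docker'
ls /etc/kubernetes /var/lib/kubelet /var/lib/docker 2>/dev/null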
I restarted my machine and installed everything:
apt-get install -y docker.io
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && mv minikube /usr/local/bin/
swapoff -a
minikube start --v=3 --vm-driver=none
Now everything works fine.
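A quick way to confirm the cluster is really healthy after the clean start:
kubectl get nodes
kubectl get pods -n kube-system
kubectl cluster-info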

Related

Issue with Hyperledger byfn - ERRO 002 Cannot run peer because cannot init crypto

I was running Hyperledger byfn, to bring up the first network, on Mac. Each time I got the error above. What I have tried so far for resolution:
docker rm -f $(docker ps -aq)       # delete existing containers
docker rmi -f $(docker images -aq)  # delete existing images
./byfn.sh -m down
./byfn.sh -m generate
./byfn.sh -m up
But I keep getting the same error. I also tried executing the command from the byfn script where the error is generated, separately on the Docker CLI:
docker exec cli peer channel create -o orderer.example.com:7050 -c mychannel -f ./channel-artifacts/channel.tx --tls true --cafile /Users/debg/fabric-samples/first-network/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem
Error: failed to create deliver client: failed to load config for OrdererClient: unable to load orderer.tls.rootcert.file: open /Users/debg/fabric-samples/first-network/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem: no such file or directory
But I can clearly see the .pem file in that folder, with 755 permissions on the file and on every directory in the hierarchy. Can anyone please help?
The given path: /Users/debg/fabric-samples/first-network/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem is resolved inside your cli Docker container, not on your Mac.
Can you please confirm that the certificate is actually mounted at that location inside the cli container?
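A quick way to check that from outside (the container name cli comes from byfn's compose file):
docker exec cli ls -l /Users/debg/fabric-samples/first-network/crypto-config/ordererOrganizations/example.com/orderers/orderer.example.com/msp/tlscacerts/tlsca.example.com-cert.pem
docker inspect -f '{{ json .Mounts }}' cli   # what is actually mounted into the container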

Error trying to run kubeadm init on CentOS 7

I am new to Kubernetes and I can't run "kubeadm init" successfully.
Let me show you step by step what I did:
I installed the latest Docker version using yum, following Docker's documentation
(I have configured 'Environment="HTTP_PROXY=http://usuario:password#proxy:port/" "HTTPS_PROXY=http://usuario:password#proxy:port/"' in /etc/systemd/system/docker.service.d/http-proxy.conf).
I disabled SELinux, disabled swap with the command "swapoff -a" and commented out "#/dev/mapper/centos-swap swap swap defaults 0 0" in /etc/fstab.
I used "modprobe br_netfilter" and "echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables" to activate the "br_netfilter" module.
"kubernetes.repo" file to install "kubelet kubeadm kubectl" using yum:
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
Opened ports:
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=10251/tcp
firewall-cmd --permanent --add-port=10252/tcp
firewall-cmd --permanent --add-port=10255/tcp
firewall-cmd --reload
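You can double-check the result with:
firewall-cmd --state
firewall-cmd --list-ports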
I created "10-kubeadm.conf" file:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
Reload and enable services:
systemctl daemon-reload
systemctl restart docker
systemctl enable docker
systemctl restart kubelet
systemctl enable kubelet
(both services have status: active (running))
Error:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
Thanks in advance for the help.
Best Regards.
Please disable your swap:
swapoff -a
vim /etc/fstab
and comment out the swap line (for example with the sed one-liner below).
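A non-interactive way to comment it out (back up fstab first; the pattern just matches the swap mount line):
cp /etc/fstab /etc/fstab.bak
sed -i '/\sswap\s/ s/^/#/' /etc/fstab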
After that, install these packages:
yum install -y yum-utils device-mapper-persistent-data lvm2
and add the Docker repo:
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
Then install Docker with this command:
yum install -y docker-ce
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
yum install -y kubelet kubeadm kubectl
then reboot
systemctl start docker && systemctl enable docker
systemctl start kubelet && systemctl enable kubelet
systemctl daemon-reload
systemctl restart kubelet
kubeadm init --apiserver-advertise-address=MASTER_IP --pod-network-cidr=10.244.0.0/16
Do not change 10.244.0.0/16; it has to match the flannel manifest.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Next, deploy the flannel network to the kubernetes cluster using the kubectl command.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
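Then watch until the flannel pods are Running and the node reports Ready:
kubectl get pods -n kube-system -w
kubectl get nodes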
I have written out the complete way to bring up Kubernetes here; I have run clusters with these commands many times.

CrashLoopBackOff on kubernetes-dashboard

I'm a noob with Kubernetes. I'm trying to follow some recipes to get a small cluster up and running, but I'm having trouble...
I have a master and (4) nodes, all running Ubuntu 16.04
installed docker on all nodes:
$ sudo apt-get update
$ sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
$ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ sudo add-apt-repository \
"deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
$(lsb_release -cs) \
stable"
$ sudo apt-get update && apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')
$ sudo docker version
Client:
Version: 17.12.1-ce
API version: 1.35
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:17:40 2018
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.1-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.4
Git commit: 7390fc6
Built: Tue Feb 27 22:16:13 2018
OS/Arch: linux/amd64
Experimental: false
turned off swap on all nodes
$ sudo swapoff -a
commented out the swap mounts in /etc/fstab
$ sudo vi /etc/fstab
$ mount -a
installed kubeadm & kubectl on all nodes:
$ sudo curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ sudo cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubeadm kubectl
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.4",
GitCommit:"bee2d1505c4fe820744d26d41ecd3fdd4a3d6546", GitTreeState:"clean",
BuildDate:"2018-03-12T16:21:35Z", GoVersion:"go1.9.3", Compiler:"gc",
Platform:"linux/amd64"}
downloaded and unpacked this into /usr/local/bin on master and all nodes: https://github.com/kubernetes-incubator/cri-tools/releases
installed etcd 3.3.0 on all nodes:
$ sudo groupadd --system etcd
$ sudo useradd --home-dir "/var/lib/etcd" \
--system \
--shell /bin/false \
-g etcd \
etcd
$ sudo mkdir -p /etc/etcd
$ sudo chown etcd:etcd /etc/etcd
$ sudo mkdir -p /var/lib/etcd
$ sudo chown etcd:etcd /var/lib/etcd
$ sudo rm -rf /tmp/etcd && mkdir -p /tmp/etcd
$ sudo curl -L https://github.com/coreos/etcd/releases/download/v3.3.0/etcd-v3.3.0-linux-amd64.tar.gz -o /tmp/etcd-3.3.0-linux-amd64.tar.gz
$ sudo tar xzvf /tmp/etcd-3.3.0-linux-amd64.tar.gz -C /tmp/etcd --strip-components=1
$ sudo cp /tmp/etcd/etcd /usr/bin/etcd
$ sudo cp /tmp/etcd/etcdctl /usr/bin/etcdctl
noted the IP of the master:
$ sudo ifconfig -a eth0
eth0 Link encap:Ethernet HWaddr 1e:00:51:00:00:28
inet addr:172.20.43.30 Bcast:172.20.43.255 Mask:255.255.254.0
inet6 addr: fe80::27b5:3d06:94c9:9d0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3194023 errors:0 dropped:0 overruns:0 frame:0
TX packets:3306456 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:338523846 (338.5 MB) TX bytes:3682444019 (3.6 GB)
initialized kubernetes on the master:
$ sudo kubeadm init --pod-network-cidr=172.20.43.0/16 \
--apiserver-advertise-address=172.20.43.30 \
--ignore-preflight-errors=cri \
--kubernetes-version stable-1.9
[init] Using Kubernetes version: v1.9.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [jenkins-kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.20.43.30]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 37.502640 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node jenkins-kube-master as master by adding a label and a taint
[markmaster] Master jenkins-kube-master tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 6be4b1.9a8dacf89f71e53c
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 --discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf
on the master node:
$ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ export KUBECONFIG=$HOME/.kube/config
$ echo "export KUBECONFIG=$HOME/.kube/config" | tee -a ~/.bashrc
setup flannel for networking on master:
$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
clusterrole "flannel" created
clusterrolebinding "flannel" created
serviceaccount "flannel" created
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
clusterrole "flannel" configured
clusterrolebinding "flannel" configured
join the nodes to the cluster running this on each:
$ sudo kubeadm join --token 6be4b1.9a8dacf89f71e53c 172.20.43.30:6443 \
--discovery-token-ca-cert-hash sha256:524d29b032d7bfd319b147ab03a936bd429805258425bccca749de71bcb1efaf \
--ignore-preflight-errors=cri
installed the dashboard on the master:
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role "kubernetes-dashboard-minimal" created
rolebinding "kubernetes-dashboard-minimal" created
deployment "kubernetes-dashboard" created
service "kubernetes-dashboard" created
started the proxy:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
opened another ssh to master with -L 8001:127.0.0.1:8001 and opened a local browser window for http://localhost:8001/ui
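(The tunnel itself is just something along these lines; the user name is a placeholder:)
ssh -L 8001:127.0.0.1:8001 user@jenkins-kube-master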
it redirects to http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ and says:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "no endpoints available for service \"https:kubernetes- dashboard:\"",
"reason": "ServiceUnavailable",
"code": 503
}
checking the pods ...
$ sudo kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default guids-74487d79cf-zsj8q 1/1 Running 0 4h
kube-system etcd-jenkins-kube-master 1/1 Running 1 21h
kube-system kube-apiserver-jenkins-kube-master 1/1 Running 1 21h
kube-system kube-controller-manager-jenkins-kube-master 1/1 Running 2 21h
kube-system kube-dns-6f4fd4bdf-7pr9q 3/3 Running 0 1d
kube-system kube-flannel-ds-pvk8m 1/1 Running 0 4h
kube-system kube-flannel-ds-q4fsl 1/1 Running 0 4h
kube-system kube-flannel-ds-qhxn6 1/1 Running 0 21h
kube-system kube-flannel-ds-tkspz 1/1 Running 0 4h
kube-system kube-flannel-ds-vgqsb 1/1 Running 0 4h
kube-system kube-proxy-7np9b 1/1 Running 0 4h
kube-system kube-proxy-9lx8h 1/1 Running 1 1d
kube-system kube-proxy-f46d8 1/1 Running 0 4h
kube-system kube-proxy-fdtx9 1/1 Running 0 4h
kube-system kube-proxy-kmnjf 1/1 Running 0 4h
kube-system kube-scheduler-jenkins-kube-master 1/1 Running 1 21h
kube-system kubernetes-dashboard-5bd6f767c7-xf42n 0/1 CrashLoopBackOff 53 4h
checking the log ...
$ sudo kubectl logs kubernetes-dashboard-5bd6f767c7-xf42n --namespace=kube-system
2018/03/20 17:56:25 Starting overwatch
2018/03/20 17:56:25 Using in-cluster config to connect to apiserver
2018/03/20 17:56:25 Using service account token for csrf signing
2018/03/20 17:56:25 No request provided. Skipping authorization
2018/03/20 17:56:55 Error while initializing connection to Kubernetes apiserver.
This most likely means that the cluster is misconfigured (e.g., it has invalid
apiserver certificates or service accounts configuration) or the
--apiserver-host param points to a server that does not exist.
Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
I find this reference to 10.96.0.1 rather odd. I don't have that on my network anywhere that I'm aware of.
I put the output of sudo kubectl describe pod --namespace=kube-system on pastebin:
https://pastebin.com/cPppPkRw
Thanks in advance for any pointers.
-Steve Maring
Orlando, FL
--service-cluster-ip-range=10.96.0.0/12
Line 76 of your pastebin shows the Service CIDR to be that, which squares with how Kubernetes thinks of the world: .1 in the Service CIDR is always the kubernetes Service itself (IIRC kube-dns also gets a pretty low IP assignment, but I can't recall whether it is pinned the way the kubernetes one is).
You'll want to either change the Service and Pod CIDRs to fit within the 10.244.0.0/16 subnet that flannel created as a side effect of deploying that yaml, or change flannel's ConfigMap (err, at your peril now that the network has already been pushed into etcd) to align with the Service and Pod CIDRs specified to your apiserver.
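If you would rather start over than edit the flannel ConfigMap, a rough sketch of a clean re-init with non-overlapping ranges (reusing the advertise address from your question) would be:
sudo kubeadm reset
sudo kubeadm init --apiserver-advertise-address=172.20.43.30 \
    --pod-network-cidr=10.244.0.0/16 \
    --service-cidr=10.96.0.0/12 \
    --kubernetes-version stable-1.9
# then re-apply the flannel manifest and re-join the worker nodes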

Unable to start Docker service with error "Failed to start docker.service: Unit not found."

I have installed Docker with yum install docker:
$ uname -a
Linux caspgval4 3.10.0-229.20.1.el7.x86_64 #1 SMP Wed Nov 4 10:08:36 CST 2015 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 1.12.6, build 3a094bd/1.12.6
$ docker info
Cannot connect to the Docker daemon. Is the docker daemon running on this host?
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Docs: http://docs.docker.com
I am trying to install and run Docker, but it is giving an error as below:
$ sudo service docker start
Redirecting to /bin/systemctl start docker.service
Failed to start docker.service: Unit not found.
How do I resolve this issue? I tried the following commands, but no luck:
$ sudo systemctl start docker
Failed to start docker.service: Unit not found.
Extra information:
$ journalctl -u docker
No journal files were found.
-- No entries --
$ cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
Requires=rhel-push-plugin.socket
Requires=docker-cleanup.timer
[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
--add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
--default-runtime=docker-runc \
--authorization-plugin=rhel-push-plugin \
--exec-opt native.cgroupdriver=systemd \
--userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
$OPTIONS \
$DOCKER_STORAGE_OPTIONS \
$DOCKER_NETWORK_OPTIONS \
$ADD_REGISTRY \
$BLOCK_REGISTRY \
$INSECURE_REGISTRY
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
[Install]
WantedBy=multi-user.target
I tried the following:
$ sudo systemctl daemon-reload
$ sudo systemctl start docker
Failed to start docker.service: Unit not found.
$ sudo journalctl -u docker
-- No entries --
More debug information:
$ sudo systemctl status network.target
● network.target - Network
Loaded: loaded (/usr/lib/systemd/system/network.target; static; vendor preset: disabled)
Active: active since Mon 2017-01-23 02:54:39 PST; 2 months 29 days ago
Docs: man:systemd.special(7)
http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget
Jan 23 02:54:39 mymachine systemd[1]: Starting Network.
Jan 23 02:54:39 mymachine systemd[1]: Reached target Network.
$ sudo systemctl status docker-storage-setup.service
● docker-storage-setup.service - Docker Storage Setup
Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; disabled; vendor preset: disabled)
Active: inactive (dead)
$ sudo systemctl status rhel-push-plugin.socket
Unit rhel-push-plugin.socket could not be found.
$ sudo systemctl status docker-cleanup.timer
● docker-cleanup.timer - Run docker-cleanup every hour
Loaded: loaded (/usr/lib/systemd/system/docker-cleanup.timer; disabled; vendor preset: disabled)
Active: inactive (dead)
In case someone has installed docker using snap, they can start the service using
sudo snap services docker # check the status
sudo snap start docker # start the service
Run this command to list all the services:
sudo systemctl list-units --type=service
Look for the correct Docker service name (in my case it is snap.docker.dockerd.service) then run:
sudo systemctl restart snap.docker.dockerd.service
It looks like you're missing the rhel-push-plugin.socket unit which is presumably part of a rhel-push-plugin package. You can try fixing that install, or you can install the upstream Docker package directly from Docker with the following as root:
curl -sSL https://get.docker.com/ | sh
Or more appropriately following the CentOS install guide from Docker. (The CentOS install tends to work on even a RHEL system when you have a supported version, which until the recent 20.10 release did not include CentOS 8 or RHEL 8.)
The upstream Docker install will be a more recent version of Docker, but it will not have the various RHEL modifications like the rhel-push-plugin.
Use the following command if your OS is Ubuntu, it will install Docker successfully:
apt install docker.io
if you have installed docker with snap you can check the status of the daemon like this:
sudo snap services docker
If the current column shows inactive then it's not running. Try starting it with this:
sudo snap start docker
Check the service again, if it's still not running after that you can check the logs:
sudo snap logs docker
which might give more hints on what's stopping it from running
For me, I had an error saying that docker could not create a symlink in /etc/docker/ because another file was in the way. Emptying the directory was not enough due to a known bug in the docker snap. The work-around was to delete the directory then refresh the snap (see https://forum.snapcraft.io/t/layouts-still-brittle-when-refreshing-snaps/26252/2)
sudo rm -rf /etc/docker
sudo snap refresh
I worked around the issue by simply doing the following:
$ sudo apt-get purge containerd.io docker-ce
$ rm -rf /var/lib/containerd
[reboot]
$ sudo apt-get install containerd.io docker-ce
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
Note
In some cases, you may need to reboot your instance to provide permissions for the ec2-user to access the Docker daemon. Try rebooting your instance if you see the following error:
Try installing Docker as root (sudo):
sudo yum install docker
See Installation on Red Hat Enterprise Linux.

docker inside docker container

I want to install docker inside a running docker container.
docker run -it centos:centos7
My base container uses CentOS, and I can log in to the running container using docker exec. When I try to install Docker inside it using yum install -y docker, it installs fine.
But somehow I can't start the Docker service with docker -d &; it gives me this error:
INFO[0000] Option DefaultNetwork: bridge
WARN[0000] Running modprobe bridge nf_nat br_netfilter failed with message: , error: exit status 1
FATA[0000] Error starting daemon: Error initializing network controller: Error initializing bridge driver: Setup IP forwarding failed: open /proc/sys/net/ipv4/ip_forward: read-only file system
Is there a way I can install Docker inside a Docker container, or build an image that already has Docker running? I have already seen these examples, but none of them works for me.
The output of uname -r on the host machine:
[fedora# ~]$ uname -r
4.2.6-200.fc22.x86_64
Any help would be appreciated.
Thanks in advance
Update
Thanks to https://stackoverflow.com/a/38016704/372019 I want to show another approach.
Instead of mounting the host's docker binary, you should copy or install a container-specific release of the docker binary. Since you're only using it in client mode, you won't need to install it as a system service. You still need to mount the Docker socket into the container so that you can easily communicate with the host's Docker engine.
Assuming that you got a base image with a working Docker binary (e.g. the official docker image), the example now looks like this:
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    docker:1.12 docker info
Without actually answering your question I'd suggest you to read Using Docker-in-Docker for your CI or testing environment? Think twice.
It explains why running docker-in-docker should be replaced with a setup where Docker containers run as siblings of the "outer" or "base" container. The article also links to the original https://github.com/jpetazzo/dind project where you can find working examples how to run Docker in Docker - in case you still want to have docker-in-docker.
An example how to enable a container to access the host's Docker daemon look like this:
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/bin/docker:/usr/bin/docker \
    busybox:latest /usr/bin/docker info
If you are on Mac with Docker Toolbox, the command below WON'T WORK:
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/bin/docker:/usr/bin/docker \
    busybox:latest /usr/bin/docker info
because /var/run/docker.sock will not be on your OS X filesystem; the Docker daemon is running inside the boot2docker VM, and that's where the unix socket is.
So you have to run the container from the boot2docker VM:
$ docker-machine ssh default
$ docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v $(which docker):/usr/bin/docker \
    busybox:latest /usr/bin/docker info
$ exit
This looks like Docker-in-Docker and feels like Docker-in-Docker, but it's not Docker-in-Docker: when this container creates more containers, those containers will be created in the top-level Docker.
You need the --privileged parameter.
By default, Docker containers are “unprivileged” and cannot, for
example, run a Docker daemon inside a Docker container.
Source
Run your base image with the command docker run --privileged -it centos:centos7 bash. Then you may install and run another docker container inside that container.
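For comparison, the official docker:dind image packages the same --privileged pattern; a minimal sketch (the container name is arbitrary):
docker run --privileged --name dind-test -d docker:dind   # start an inner Docker daemon
# give the inner daemon a few seconds to come up, then talk to it
docker exec dind-test docker info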
I had a similar problem in my VMs.
I solved it by changing the storage driver to vfs in the daemon.json file, along the lines of the snippet below.
FROM centos:7
ENV container docker
RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ $i == \
systemd-tmpfiles-setup.service ] || rm -f $i; done); \
rm -f /lib/systemd/system/multi-user.target.wants/*;\
rm -f /etc/systemd/system/*.wants/*;\
rm -f /lib/systemd/system/local-fs.target.wants/*; \
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \
rm -f /lib/systemd/system/basic.target.wants/*;\
rm -f /lib/systemd/system/anaconda.target.wants/*;
VOLUME [ "/sys/fs/cgroup" ]
CMD ["/usr/sbin/init"]
With this image built (in my case I called it local/c7-systemd), create a second image that installs Docker and copies daemon.json inside:
FROM local/c7-systemd
RUN yum install -y yum-utils
RUN yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
RUN yum install -y docker-ce docker-ce-cli containerd.io
RUN curl -L "https://github.com/docker/compose/releases/download/1.28.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
RUN chmod +x /usr/local/bin/docker-compose
RUN ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
COPY daemon.json /etc/docker/daemon.json
RUN yum install -y nano
RUN systemctl enable docker
EXPOSE 80
EXPOSE 8080
EXPOSE 8161
EXPOSE 6379
EXPOSE 8761
CMD ["/usr/sbin/init"]
enjoy!
