Using kubeadm to init kubernetes 1.12.0 falied:node "xxx" not found - docker

My environment:
CentOS7 linux
/etc/hosts:
192.168.0.106 master01
192.168.0.107 node02
192.168.0.108 node01
On master01 machine:
/etc/hostname:
master01
On master01 machine I execute commands as follows:
1)yum install docker-ce kubelet kubeadm kubectl
2)systemctl start docker.service
3)vim /etc/sysconfig/kubelet
EDIT the file:
KUBELET_EXTRA_ARGS="--fail-swap-on=false"
4)systemctl enable docker kubelet
5)kubeadm init --kubernetes-version=v1.12.0 --pod-network-cidr=10.244.0.0/16 servicecidr=10.96.0.0/12 --ignore-preflight-errors=all
THEN
The first error message:
unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
kubelet.go:2236] node "master01" not found
kubelet_node_status.go:70] Attempting to register node master01
Oct 2 23:32:35 master01 kubelet: E1002 23:32:35.974275 49157
kubelet_node_status.go:92] Unable to register node "master01" with API server: Post https://192.168.0.106:6443/api/v1/nodes: dial tcp 192.168.0.106:6443: connect: connection refused
l don't know why node master01 not found?
l have tried a lot of ways but can't solve the problem.
Thank you!

Your issue also might be caused by firewall rules, restricting tcp connection to 6443 port.
So you can temporary disable firewall on master node to validate this:
systemctl stop firewalld
and then try to perform kubeadm init once again.
Hope it helps.

I'm not sure that the problem is always related to firewall rules.
Consider the steps below:
1 ) Try to resolve your error related to: /etc/kubernetes/pki/ca.crt.
Run kubeadm init again without the --ignore-preflight-errors=all and see if all actions in the [certs] phase took place - for example:
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
.
.
2 ) Try running re-installing kubeadm or running kubeadm init with other version (use the --kubernetes-version=X.Y.Z flag).
3 ) Try restarting kubelet with systemctl restart kubelet.
4 ) Check if the docker version need to be updated.
5 ) Open specific ports which are needed (if you decide to stop firewalld remember to start it right after you resolve the issue).

Related

How to use vpnkit with minikube on mac

There are many question around this topic, but not the specific info I am after.
Host OS is Mac, and recently had to uninstall Docker Desktop due to their licensing change. So instead we have moved to minikube, and it is all working great with VirtualBox driver.
But ideally we would like to use the hyperkit driver, as it requires less resources than virtualbox, and is (anecdotally) faster. This also all works great until we connect to our VPN (using cisco anyconnect) and then all outbound networking from within the minikube VM stops working. e.g.
k8> minikube ssh "traceroute 8.8.8.8"
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 46 byte packets
1 host.minikube.internal (192.168.64.1) 0.154 ms 0.181 ms 0.151 ms
2 * * *
Everything else is is fine, inbound networking via ingress is all good. And maven-docker-plugin is happily creating images with the minikube docker daemon. Just nothing outbound.
So figured I'd try to work with VPNKit as I have read it is meant to address this issue. But cannot find a lot of detailed documentation, and so am struggling.
We have tried starting VPNKit with minimal config:
vpnkit --ethernet /tmp/vpkit-ethernet.socket --debug
And then attempt to start minikube, but it fails:
k8> minikube delete
๐Ÿ”ฅ Deleting "minikube" in hyperkit ...
๐Ÿ’€ Removed all traces of the "minikube" cluster.
k8> minikube start --driver=hyperkit --hyperkit-vpnkit-sock=/tmp/vpnkit-ethernet.socket
๐Ÿ˜„ minikube v1.25.1 on Darwin 10.15.7
โœจ Using the hyperkit driver based on user configuration
๐Ÿ‘ Starting control plane node minikube in cluster minikube
๐Ÿ”ฅ Creating hyperkit VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
๐Ÿ”ฅ Deleting "minikube" in hyperkit ...
๐Ÿคฆ StartHost failed, but will try again: creating host: create: Error creating machine: Error in driver during machine creation: hyperkit crashed! command line:
hyperkit loglevel=3 console=ttyS0 console=tty0 noembed nomodeset norestore waitusb=10 systemd.legacy_systemd_cgroup_controller=yes random.trust_cpu=on hw_rng_model=virtio base host=minikube
๐Ÿ”ฅ Creating hyperkit VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
๐Ÿ˜ฟ Failed to start hyperkit VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: hyperkit crashed! command line:
hyperkit loglevel=3 console=ttyS0 console=tty0 noembed nomodeset norestore waitusb=10 systemd.legacy_systemd_cgroup_controller=yes random.trust_cpu=on hw_rng_model=virtio base host=minikube
โŒ Exiting due to PR_HYPERKIT_CRASHED: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: hyperkit crashed! command line:
hyperkit loglevel=3 console=ttyS0 console=tty0 noembed nomodeset norestore waitusb=10 systemd.legacy_systemd_cgroup_controller=yes random.trust_cpu=on hw_rng_model=virtio base host=minikube
๐Ÿ’ก Suggestion: Hyperkit is broken. Upgrade to the latest hyperkit version and/or Docker for Desktop. Alternatively, you may choose an alternate --driver
๐Ÿฟ Related issues:
โ–ช https://github.com/kubernetes/minikube/issues/6079
โ–ช https://github.com/kubernetes/minikube/issues/5780
And in the vpnkit log we see:
time="2022-02-14T06:07:57Z" level=debug msg="usernet: accepted vmnet connection"
time="2022-02-14T06:07:57Z" level=warning msg="Uwt: Pipe.listen: rejected ethernet connection: EOF"
time="2022-02-14T06:08:07Z" level=debug msg="usernet: accepted vmnet connection"
time="2022-02-14T06:08:07Z" level=warning msg="Uwt: Pipe.listen: rejected ethernet connection: EOF"
So kind of implies something is not right with how I started vpnkit. Have played with IP args to ensure it all matches, but does not help.
My guess is that the --ethernet=path arg is not the right type of socket. I have seen there is also --vsock-path=path but specifying this does not appear to create the socket file like --ethernet=path does. Do I have to create this some other way?
Or are there other config options I need to mess with. e.g. I thought --gateway-forwards=path could help, but can find no documentation on file format or contents.
So, I guess two main questions:
Is what we are trying even possible? Is it the the right way to go about it? Or is it much more complicated than simply running the vpnkit command?
If we are on the right track, does anyone have experience with this, and know how to set up the socket for minikube+vpnkit+hyperkit? What args, config, or other setup is required?
And just to note: --hyperkit-vpnkit-sock=auto is not an option for us, as we do not have docker installed, and so the docker socket file does not exist.
And just in case its a version issue:
k8> minikube version
minikube version: v1.25.1
commit: 3e64b11ed75e56e4898ea85f96b2e4af0301f43d
k8> vpnkit --version
854498c13b1884d4a48d84f3569eb34681af2126
k8> hyperkit -v
hyperkit: 0.20200908
Homepage: https://github.com/docker/hyperkit
License: BSD

Cannot initialize Kubernetes cluster on Ubuntu 18.04 (Virtual Box)

I struggle to initialize a simple Kubernetes cluster using Ubuntu on Virtualbox. I tried both server and desktop version, following the official documentation:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
I also tried to follow some other ones, thinking the issue was because i'm using Virtualbox VM's, like this one:
https://medium.com/#gunjangarge/create-kubernetes-cluster-using-kubeadm-on-ubuntu-virtualbox-step-by-step-68a3eeb1f74c
But everytime I have the same issue with port 6443 not being exposed. Sometimes the process starts correctly, giving me the join command:
kubeadm init --pod-network-cidr=192.168.0.0/16
W1029 08:47:53.841460 11540 configset.go:348] WARNING: kubeadm cannot validate component configs
for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.192:6443 --token ztnoww.t8ng5a3jo2kx5cb2 \
--discovery-token-ca-cert-hash
sha256:907dde6cc6d72ed4cd7fe7e7f252e2cf657dd3256fba6ee5ec92027132a9c5af
Sometimes it's not starting at all and timeouting:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Anyway, even when it's starting, port 6443 is never exposed, and kubelet is not happy with it:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
โ””โ”€10-kubeadm.conf
Active: active (running) since Thu 2020-10-29 08:48:15 CET; 20s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 13262 (kubelet)
Tasks: 14 (limit: 4666)
CGroup: /system.slice/kubelet.service
โ””โ”€13262 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-contai
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.588386 13262 controller.go:136] failed to ensure node lease exists, will retry in 800ms, error: Get
"https://192.168.1.192:6443/apis/coordination.k8s.io/v1/names
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.785951 13262 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.192:644
Okt 29 08:48:23 master kubelet[13262]: I1029 08:48:23.022354 13262 kubelet_node_status.go:70] Attempting to register node master
Okt 29 08:48:24 master kubelet[13262]: I1029 08:48:24.188510 13262 request.go:645] Throttling request took 1.097264312s, request: POST:https://192.168.1.192:6443/api/v1/namespaces/kube-system/pods
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.678880 13262 kubelet_node_status.go:108] Node master was previously registered
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.679004 13262 kubelet_node_status.go:73] Successfully registered node master
Okt 29 08:48:25 master kubelet[13262]: W1029 08:48:25.765981 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:27 master kubelet[13262]: E1029 08:48:27.148246 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
Okt 29 08:48:30 master kubelet[13262]: W1029 08:48:30.767511 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:32 master kubelet[13262]: E1029 08:48:32.164211 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
I have to say I don't know what to do now. I tried for hours with different Ubuntu versions, trying to find solutions on the Internet but I didn't found any solution. I also went trough the logs and found that maybe the config file is not created correctly for any reason:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml
but I found nothing about it, except "try to init the cluster again", which I did several times..
Thank you in advance for your help!
OK, I think I finally found the problem. I tried the same process on another PC and everything worked smoothly, so for anyway of you having a similar issue, just don't try to use VirtualBox and WSL at the same time (even if wsl is shut off)
I just did what's explained here: https://stackoverflow.com/a/63229718/2428805 and now everything's fine...

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a docker container I want to run k8s.
When I run kubeadm join ... or kubeadm init commands I see sometimes errors like
\"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could
not open moddep file
'/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.
nmodprobe:
FATAL: Module configs not found in directory
/lib/modules/3.10.0-1062.1.2.el7.x86_64",
err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at k8s code I see
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
possibePaths := []string{
"/proc/config.gz",
"/boot/config-" + k.kernelRelease,
"/usr/src/linux-" + k.kernelRelease + "/.config",
"/usr/src/linux/.config",
}
so I am bit confused what is simplest way to run k8s inside a container such that it consistently past this getting the kernel info.
I note that running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host I see :
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but running exact same on a Ubuntu VM I see :
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. Real problem was weird/misleading logs. I have added an answer to clarify.
I do not believe that is possible given the nature of containers.
You should instead test your app in a docker container then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind which uses docker driver instead of VirtualBox
https://kind.sigs.k8s.io/docs/user/quick-start/
It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line.
When k8s was failing I saw the FATAL and assumed (incorrectly) that was root cause.
When I format the logs nicely I see ...
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally though were a side-effect of the nasty looking FATAL error e.g. .... "[util/etcd] Attempt timed out"]} but I now think root cause is Etcd part times out sometimes.
Adding this answer in case someone else puzzled like I was.

Minikube start stuck in waiting for pods and timeout

I try to run a sample application in my Ubuntu 18 vm.
I have installed Docker client and server version of 18.06.1-ce. I already have VirtualBox running.
I use below link and install kubectl 1.14 too: https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux
I have Minikube v1.0.1 also installed. But Minikube start command stuck in Waiting for pods: apiserver and timeout
harshana#-Virtual-Machine:~$ sudo minikube start
๐Ÿ˜„ minikube v1.0.1 on linux (amd64)
๐Ÿคน Downloading Kubernetes v1.14.1 images in the background ...
โš ๏ธ Ignoring --vm-driver=virtualbox, as the existing "minikube" VM was created using the none driver.
โš ๏ธ To switch drivers, you may create a new VM using `minikube start -p <name> --vm-driver=virtualbox`
โš ๏ธ Alternatively, you may delete the existing VM using `minikube delete -p minikube`
๐Ÿ”„ Restarting existing none VM for "minikube" ...
โŒ› Waiting for SSH access ...
๐Ÿ“ถ "minikube" IP address is xxx.xxx.x.xxx
๐Ÿณ Configuring Docker as the container runtime ...
๐Ÿณ Version of container runtime is 18.06.1-ce
โŒ› Waiting for image downloads to complete ...
โœจ Preparing Kubernetes environment ...
๐Ÿ’พ Downloading kubeadm v1.14.1
๐Ÿ’พ Downloading kubelet v1.14.1
๐Ÿšœ Pulling images required by Kubernetes v1.14.1 ...
๐Ÿ”„ Relaunching Kubernetes v1.14.1 using kubeadm ...
โŒ› Waiting for pods: apiserver
sudo minikube logs:
May 19 08:11:40 harshana-Virtual-Machine kubelet[10572]: E0519 08:11:40.825465 10572 kubelet.go:2244] node "minikube" not found
May 19 08:11:40 harshana-Virtual-Machine kubelet[10572]: E0519 08:11:40.895848 10572 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Get https://localhost:8443/api/v1/nodes?fieldSelector=metadata.name%!D(MISSING)minikube&limit=500&resourceVersion=0: dial tcp 127.0.0.1:8443: connect: connection refused
I got the same behaviour because I have created a first VM using kvm. I have followed the instructions and deleted the VM. Run the below :
1- minikube delete -p minikube
2- minikube start

Creating new docker-machine instance always fails validating certs using openstack driver

Everytime I try to create a new instance via docker-machine on open stack, I always get this error for validating the certs. I have to end up regenerating the certs right after I create the instance for me to be able to use the instances.
$ docker-machine create --driver openstack --openstack-ssh-user root --openstack-keypair-name "KeyName" --openstack-private-key-file ~/.ssh/id_rsa --openstack-flavor-id 50 --openstack-image-name "Ubuntu-16.04" manager1
Running pre-create checks...
Creating machine...
(staging-worker1) Creating machine...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "xxx.xxx.xxx.xxx:2376": dial tcp xxx.xxx.xxx.xxx:2376: i/o timeout
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
$ docker-machine regenerate-certs manager1
Regenerate TLS machine certs? Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Waiting for SSH to be available...
Detecting the provisioner...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Then it seems to work
$ docker-machine ssh manager1 pwd
/home/ubuntu
But when I try to do env
$ docker-machine env manager1
Error checking TLS connection: Error checking and/or regenerating the certs: There was an error validating certificates for host "xxx.xxx.xxx.xx:2376": dial tcp xxx.xxx.xxx.xx:2376: i/o timeout
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which might stop running containers.
Any ideas on what might be causing this?
I've documented it further in github https://github.com/docker/machine/issues/3829
It turns out my hosting service locked down everything other than 22, 80, and 443 on the Open Stack Security Group Rules. I had to add 2376 TCP Ingress for docker-machine's commands to work.
It helps explain why docker-machine ssh worked but not docker-machine env
On Ubuntu you will need to SSH to your machine and cd into following directory:
cd /etc/systemd/system/docker.service.d/
list all files in it with:
ls -l
you will probably have something like this:
-rw-r--r-- 1 root root 274 Jul 2 17:47 10-machine.conf
-rw-r--r-- 1 root root 101 Jul 2 17:46 override.conf
you will need to delete all files except 10-machine.conf with sudo rm. After that remove existing machine which is failing with:
docker-machine rm machine1
and try to create it one more time like this:
docker-machine create -d generic --generic-ip-address ip --generic-ssh-key ~/.ssh/key --generic-ssh-user username --generic-ssh-port 22 machine1
please change ip, key, username and machine1 with you actual values. It should work now. I hope this helps.

Resources