I created a virtual switch named "Minikube2". Previously I had created a virtual switch named "minikube", but deleted it later because of a config issue.
I did all the required configuration ("sharing on ethernet", etc.).
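For reference, an external switch shared with the management OS can be created from an elevated PowerShell roughly like this ("Ethernet" is a placeholder for your actual adapter name, as reported by Get-NetAdapter):
# create an external vswitch bound to the physical adapter, shared with the host OS
New-VMSwitch -Name "Minikube2" -NetAdapterName "Ethernet" -AllowManagementOS $true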
Now when I try to run
minikube start --kubernetes-version="v1.10.3" --vm-driver="hyperv" --hyperv-virtual-switch="minikube2"
it downloads the ISO, but fails to configure the switch; it says vswitch "minikube2" not found.
The short answer is to delete C:\Users\%USERNAME%\.minikube and try again.
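For example, from PowerShell (a sketch; adjust the path if you use a custom MINIKUBE_HOME):
# discard any stale cluster state, then remove the cached config and certs
minikube delete
Remove-Item -Recurse -Force "$env:USERPROFILE\.minikube"
minikube start --kubernetes-version="v1.10.3" --vm-driver="hyperv" --hyperv-virtual-switch="minikube2"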
Below is my investigation:
First I created the virtual switch "minikube", started the cluster, and it worked as expected.
Then I stopped minikube, created the new "Minikube2" switch, and started minikube:
minikube start --kubernetes-version="v1.10.3" --vm-driver="hyperv" --hyperv-virtual-switch="minikube2" --v=9
The following issue appeared:
Starting local Kubernetes v1.10.3 cluster... Starting VM... [executing
==>] : C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoProfile -NonInteractive ( Hyper-V\Get-VM minikube ).state [stdout =====>] : Off
[stderr =====>] : [executing ==>] :
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoProfile
-NonInteractive Hyper-V\Start-VM minikube [stdout =====>] : [stderr =====>] : Hyper-V\Start-VM : 'minikube' failed to start. Synthetic Ethernet Port (Instance ID AF9D08DC-2625-4F24-93E5-E09BAD904899):
Error 'Insufficient system resources exist to complete the requested
service.'. Failed to allocate resources while connecting to a virtual
network. The Ethernet switch may not exist. 'minikube' failed to
start. (Virtual machine ID 863D6558-78EC-4648-B712-C1FDFC907588)
'minikube' Synthetic Ethernet Port: Failed to finish reserving
resources with Error 'Insufficient system resources exist to complete
the requested service.' (0x800705AA). (Virtual machine ID
863D6558-78EC-4648-B712-C1FDFC907588) 'minikube' failed to allocate
resources while connecting to a virtual network: Insufficient system
resources exist to complete the requested service. (0x800705AA)
(Virtual Machine ID 863D6558-78EC-4648-B712-C1FDFC907588). The
Ethernet switch may not exist. Could not find Ethernet switch
'minikube'. At line:1 char:1
+ Hyper-V\Start-VM minikube
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Start-VM], VirtualizationException
+ FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands.StartVM
E1022 12:50:43.384867 6216 start.go:168] Error starting host: Error
starting stopped host: exit status 1.
Retrying. E1022 12:50:43.398832 6216 start.go:174] Error starting
host: Error starting stopped host: exit status 1 PS
C:\Windows\system32>
Then I deleted C:\Users\%USERNAME%\.minikube and the minikube VM inside Hyper-V, and started again:
C:\Windows\system32> minikube start --kubernetes-version="v1.10.3" --vm-driver="hyperv" --hyperv-virtual-switch="minikube2" --v=9
Result:
Starting local Kubernetes v1.10.3 cluster... Starting VM...
Downloading Minikube ISO
170.78 MB / 170.78 MB [============================================] 100.00% 0s
Creating CA: C:\Users\Vitalii\.minikube\certs\ca.pem
Creating client certificate: C:\Users\Vitalii\.minikube\certs\cert.pem
----- [stderr =====>] : Using switch "Minikube2"
----- Moving files into cluster...
Downloading kubeadm v1.10.3
Downloading kubelet v1.10.3 Finished
Downloading kubeadm v1.10.3 Finished
Finished Downloading kubelet v1.10.3
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
PS C:\Windows\system32> kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-c4cffd6dc-cjzsm 1/1 Running 0 1m
kube-system etcd-minikube 1/1 Running 0 56s
kube-system kube-addon-manager-minikube 1/1 Running 0 13s
kube-system kube-apiserver-minikube 1/1 Running 0 41s
kube-system kube-controller-manager-minikube 1/1 Running 0 1m
kube-system kube-dns-86f4d74b45-w62rv 2/3 Running 0 1m
kube-system kube-proxy-psgss 1/1 Running 0 1m
kube-system kube-scheduler-minikube 1/1 Running 0 21s
kube-system kubernetes-dashboard-6f4cfc5d87-jz266 1/1 Running 0 1m
kube-system storage-provisioner 1/1 Running 0 1m
It looks like the Hyper-V driver has some problems running Minikube. For me, when trying to run it on Windows, it was much simpler to use the docker driver. Just run:
minikube start --driver=docker
and it works without problems. Make sure your Docker daemon is running, e.g. via Docker Desktop. For reference about drivers, look here: https://kubernetes.io/docs/setup/learning-environment/minikube/#specifying-the-vm-driver
Related
Trying to connect to IP of a node and getting a timeout error
I have started minikube
minikube start
* minikube v1.27.0 on Microsoft Windows 10 Pro 10.0.19042 Build 19042
! Kubernetes 1.25.0 has a known issue with resolv.conf. minikube is using a workaround that should work for most use cases.
! For more information, see: https://github.com/kubernetes/kubernetes/issues/112135
* Using the hyperv driver based on existing profile
* Starting control plane node minikube in cluster minikube
* Restarting existing hyperv VM for "minikube" ...
* Preparing Kubernetes v1.25.0 on Docker 20.10.18 ...
* Verifying Kubernetes components...
- Using image gcr.io/k8s-minikube/storage-provisioner:v5
* Enabled addons: storage-provisioner, default-storageclass
! C:\Program Files\Docker\Docker\resources\bin\kubectl.exe is version 1.22.4, which may have incompatibilites with Kubernetes 1.25.0.
- Want kubectl v1.25.0? Try 'minikube kubectl -- get pods -A'
* Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Checked that the pods are up and running
PS C:\WINDOWS\system32> minikube kubectl -- get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default account-docker-kubernetes 1/1 Running 3 (2m59s ago) 3h43m
kube-system coredns-565d847f94-dh5qw 1/1 Running 4 (2m59s ago) 3h57m
kube-system etcd-minikube 1/1 Running 0 2m1s
kube-system kube-apiserver-minikube 1/1 Running 0 2m2s
kube-system kube-controller-manager-minikube 1/1 Running 4 (2m59s ago) 3h57m
kube-system kube-proxy-gs6pm 1/1 Running 4 (2m59s ago) 3h57m
kube-system kube-scheduler-minikube 1/1 Running 4 (2m59s ago) 3h57m
kube-system storage-provisioner 1/1 Running 6 (2m59s ago) 3h57m
Checked the service
PS C:\WINDOWS\system32> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
account-docker-kubernetes NodePort 10.105.105.236 <none> 8082:30163/TCP 3h44m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4h
Found the IP of the node
PS C:\WINDOWS\system32> minikube ip
172.25.177.1
So, to connect to the service from outside Kubernetes, it should be the node IP plus the NodePort. But at http://172.25.177.1:30163/bank/health/ I get a timeout.
So I tried to connect to the Pod from the inside. First I got the pod IP:
PS C:\WINDOWS\system32> kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
account-docker-kubernetes 1/1 Running 3 (12m ago) 3h53m 172.18.0.2 minikube <none> <none>
Then I exec into it:
PS C:\WINDOWS\system32> kubectl exec --stdin --tty account-docker-kubernetes -- /bin/bash
root@account-docker-kubernetes:/app#
Then running the curl command inside the pod returns 200:
root@account-docker-kubernetes:/app# curl -v http://172.18.0.2:8082/bank/health/
* Trying 172.18.0.2:8082...
* TCP_NODELAY set
* Connected to 172.18.0.2 (172.18.0.2) port 8082 (#0)
> GET /bank/health/ HTTP/1.1
> Host: 172.18.0.2:8082
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200
< Content-Type: text/plain;charset=UTF-8
< Content-Length: 2
< Date: Sat, 01 Oct 2022 18:09:21 GMT
<
* Connection #0 to host 172.18.0.2 left intact
So I can connect from inside the pod but not from outside using the node IP.
I have also set up a firewall rule to allow connections to port 30163.
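For reference, minikube can also print (and with some drivers, tunnel) the URL it considers reachable for a NodePort service; a quick cross-check using the service name from above:
minikube service account-docker-kubernetes --url
If that prints a different host or port than 172.25.177.1:30163, that URL is the one to try.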
I feel I have created an abomination. The goal of what I am doing is to run a Docker image and start the AWX web application, and be able to use AWX on my local machine. The issue with this is that AWX uses Kubernetes to run. I have created an image that is able to run Kubernetes and the AWX application inside a container. The final output after running my bash script in the container to start AWX looks like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
awx-operator-system awx-demo-586bd67d59-vj79v 4/4 Running 0 3m14s
awx-operator-system awx-demo-postgres-0 1/1 Running 0 4m11s
awx-operator-system awx-operator-controller-manager-5b4fdf998d-7tzgh 2/2 Running 0 5m4s
ingress-nginx ingress-nginx-admission-create-pfcqs 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-admission-patch-8rghp 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-controller-755dfbfc65-f7vm7 1/1 Running 0 5m33s
kube-system coredns-6d4b75cb6d-4lnvw 1/1 Running 0 5m33s
kube-system etcd-minikube 1/1 Running 0 5m46s
kube-system kube-apiserver-minikube 1/1 Running 0 5m45s
kube-system kube-controller-manager-minikube 1/1 Running 0 5m45s
kube-system kube-proxy-ddnh7 1/1 Running 0 5m34s
kube-system kube-scheduler-minikube 1/1 Running 0 5m45s
kube-system storage-provisioner 1/1 Running 1 (5m33s ago) 5m43s
go to http://192.168.49.2:30085 , the username is admin and the password is XL8aBJPy16ziBau84v63QJLNVw2JGmnb
So I believe that it is running and starting properly. The IP address 192.168.49.2 is the IP of one of the Kubernetes pods. I have been struggling to forward the info coming from this pod to my local machine. I have been trying to go from Kubernetes pod -> docker localhost -> local machine's localhost.
I have tried using kubectl proxy, host.docker.internal, curl, and a few others with no success. However, I might be using these in the wrong form.
I understand that docker containers run in a very isolated environment, so is it possible to forward this information from the pod to my local machine?
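For illustration, the kind of chain I mean, sketched with hypothetical names ("awx-demo-service", the namespace, and the ports would all need to match the actual objects):
# inside the container: forward the AWX service and bind to all interfaces
kubectl port-forward -n awx-operator-system svc/awx-demo-service --address 0.0.0.0 8080:80
# the container itself must also publish that port to the host, e.g. started with
docker run -p 8080:8080 ...
so that http://localhost:8080 on the local machine would reach AWX.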
Thanks for your time!
When trying to create Pods that can use GPU, I get the error: exec: "nvidia-smi": executable file not found in $PATH.
To explain the error from the beginning, my main goal was to create JupyterHub environments that can use GPU. I installed Zero to JupyterHub for Kubernetes. I followed these steps to be able to use GPU. When I check my nodes, the GPUs seem schedulable by Kubernetes. So far everything seemed fine.
kubectl get nodes -o=custom-columns=NAME:.metadata.name,GPUs:.status.capacity.'nvidia\.com/gpu'
NAME GPUs
arge-server 1
However, when I logged in to JupyterHub and tried to open the profile using GPU, I got an error: [Warning] 0/1 nodes are available: 1 Insufficient nvidia.com/gpu. So, I checked the Pods and found that they were all in the "Waiting: PodInitializing" state.
kubectl get pods -n gpu-operator-resources
NAME READY STATUS RESTARTS AGE
nvidia-dcgm-x5rqs 0/1 Init:0/1 2 6d20h
nvidia-device-plugin-daemonset-jhjhb 0/1 Init:0/1 0 6d20h
gpu-feature-discovery-pd4xv 0/1 Init:0/1 2 6d20h
nvidia-dcgm-exporter-7mjgt 0/1 Init:0/1 2 6d20h
nvidia-operator-validator-9xjmv 0/1 Init:Error 10 26m
After that, I took a closer look at the Pod nvidia-operator-validator-9xjmv, where the error started, and I saw that the toolkit-validation container was in CrashLoopBackOff. Here is the relevant part of the describe output:
kubectl describe pod nvidia-operator-validator-9xjmv -n gpu-operator-resources
Name: nvidia-operator-validator-9xjmv
Namespace: gpu-operator-resources
.
.
.
Controlled By: DaemonSet/nvidia-operator-validator
Init Containers:
.
.
.
toolkit-validation:
Container ID: containerd://e7d004f0809cbefdae5407ea42eb659972ea7eefa5dd6e45e968cbf3ed22bf2e
Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2
Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:a07fd1c74e3e469ac316d17cf79635173764fdab3b681dbc282027a23dbbe227
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
nvidia-validator
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 18 Nov 2021 12:55:00 +0300
Finished: Thu, 18 Nov 2021 12:55:00 +0300
Ready: False
Restart Count: 16
Environment:
WITH_WAIT: false
COMPONENT: toolkit
Mounts:
/run/nvidia/validations from run-nvidia-validations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hx7ls (ro)
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 58m default-scheduler Successfully assigned gpu-operator-resources/nvidia-operator-validator-9xjmv to arge-server
Normal Pulled 58m kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 58m kubelet Created container driver-validation
Normal Started 58m kubelet Started container driver-validation
Normal Pulled 56m (x5 over 58m) kubelet Container image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v1.8.2" already present on machine
Normal Created 56m (x5 over 58m) kubelet Created container toolkit-validation
Normal Started 56m (x5 over 58m) kubelet Started container toolkit-validation
Warning BackOff 3m7s (x255 over 58m) kubelet Back-off restarting failed container
Then, I looked at the logs of the container and I got the following error.
kubectl logs -n gpu-operator-resources -f nvidia-operator-validator-9xjmv -c toolkit-validation
time="2021-11-18T09:29:24Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
toolkit is not ready
For similar issues, it was suggested to delete the failed Pod and deployment. However, doing so did not fix my problem. Do you have any suggestions?
I have:
Ubuntu 20.04
Kubernetes v1.21.6
Docker 20.10.10
NVIDIA-SMI 470.82.01
CUDA 11.4
CPU: Intel Xeon E5-2683 v4 (32) @ 2.097GHz
GPU: NVIDIA GeForce RTX 2080 Ti
Memory: 13815MiB / 48280MiB
Thanks in advance.
In case you're still having the issue: we just had the same issue on our cluster, and the "dirty" fix is to do this:
rm /run/nvidia/driver
ln -s / /run/nvidia/driver
kubectl delete pod -n gpu-operator nvidia-operator-validator-xxxxx
The reason is that the init pod of the nvidia-operator-validator tries to execute nvidia-smi within a chroot from /run/nvidia/driver, which is a tmpfs (so it doesn't persist across reboots) and is not populated when performing a manual install of the drivers.
We do hope for a better fix from Nvidia.
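A quick way to see the problem before applying the fix, assuming a shell on the node:
ls /run/nvidia/driver
If that directory is empty (or missing), the validator's chroot has no nvidia-smi to execute, which matches the error above.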
After installing, this is what my pods look like:
Running pods
NAME READY STATUS RESTARTS AGE
elk-elasticsearch-client-5ffc974f8-987zv 1/1 Running 0 21m
elk-elasticsearch-curator-1582107120-4f2wm 0/1 Completed 0 19m
elk-elasticsearch-data-0 0/1 Pending 0 21m
elk-elasticsearch-exporter-84ff9b656d-t8vw2 1/1 Running 0 21m
elk-elasticsearch-master-0 1/1 Running 0 21m
elk-elasticsearch-master-1 1/1 Running 0 20m
elk-filebeat-4sxn9 0/2 Init:CrashLoopBackOff 9 21m
elk-kibana-77b97d7c69-d4jzz 1/1 Running 0 21m
elk-logstash-0 0/2 Pending 0 21m
So filebeat refuses to start.
Getting the logs from this pod, I get:
Exiting: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elk-elasticsearch-client.elk.svc:9200: Get http://elk-elasticsearch-client.elk.svc:9200: lookup elk-elasticsearch-client.elk.svc on 10.96.0.10:53: no such host]
Also, when trying to access the Kibana service (the only one I can call over HTTP), I get that it is not ready.
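One way to reproduce the lookup failure directly is with a throwaway busybox pod (busybox:1.28 ships a working nslookup):
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup elk-elasticsearch-client.elk.svc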
kubectl get pv:
pvc-9b9b13d8-48d2-4a79-a10c-8d1278554c75 4Gi RWO Delete Bound default/data-elk-elasticsearch-master-0 standard 113m
pvc-d8b361d7-8e04-4300-a0f8-c79f7cea7e44 4Gi RWO Delete Bound default/data-elk-elasticsearch-master-1 standard 112m
I'm running minikube with the 'none' vm-driver, which, it tells me, does not respect the memory or CPU flags. But I don't see it complaining about resources.
kubectl version 1.17
docker version 19.03.5, build 633a0ea838
minikube version 1.6.2
The elk stack was installed using helm.
I have the following versions:
elasticsearch-1.32.2.tgz
elasticsearch-curator-2.1.3.tgz
elasticsearch-exporter-2.2.0.tgz
filebeat-4.0.0.tgz
kibana-3.2.6.tgz
logstash-2.4.0.tgz
Running on Ubuntu 18.04.
Tearing everything down and then installing the required components from other Helm charts solved the issues. It may be that the charts I was using were not intended to run locally on minikube.
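For example, if you switch to the charts from the official elastic repository (a minimal sketch in Helm 3 syntax; chart versions and values left at defaults):
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elasticsearch elastic/elasticsearch
helm install kibana elastic/kibana
helm install filebeat elastic/filebeat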
I have created a k8s cluster on RHEL7 with Kubernetes packages (GitVersion: "v1.8.1"). I'm trying to deploy WordPress on my custom cluster, but pod creation is always stuck in the ContainerCreating state.
phani#k8s-master]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-766d75457d-zlvdn 0/1 ContainerCreating 0 11m
kube-system etcd-k8s-master 1/1 Running 0 1h
kube-system kube-apiserver-k8s-master 1/1 Running 0 1h
kube-system kube-controller-manager-k8s-master 1/1 Running 0 1h
kube-system kube-dns-545bc4bfd4-bb8js 3/3 Running 0 1h
kube-system kube-proxy-bf4zr 1/1 Running 0 1h
kube-system kube-proxy-d7zvg 1/1 Running 0 34m
kube-system kube-scheduler-k8s-master 1/1 Running 0 1h
kube-system weave-net-92zf9 2/2 Running 0 34m
kube-system weave-net-sh7qk 2/2 Running 0 1h
Docker Version:1.13.1
Pod status from the describe command:
Normal Scheduled 18m default-scheduler Successfully assigned wordpress-766d75457d-zlvdn to worker1
Normal SuccessfulMountVolume 18m kubelet, worker1 MountVolume.SetUp succeeded for volume "default-token-tmpcm"
Warning DNSSearchForming 18m kubelet, worker1 Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local
Warning FailedCreatePodSandBox 14m kubelet, worker1 Failed create pod sandbox.
Warning FailedSync 25s (x8 over 14m) kubelet, worker1 Error syncing pod
Normal SandboxChanged 24s (x8 over 14m) kubelet, worker1 Pod sandbox changed, it will be killed and re-created.
From the kubelet log I observed the error below on the worker:
error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
But the kubelet is stable; no problems are seen on the worker.
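(Side note: that message usually means Docker and the kubelet disagree on the cgroup driver. A common fix, sketched assuming a stock Docker install, is to point Docker at systemd and restart both:
cat <<EOF | sudo tee /etc/docker/daemon.json
{ "exec-opts": ["native.cgroupdriver=systemd"] }
EOF
sudo systemctl restart docker
sudo systemctl restart kubelet)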
How do I solve this problem?
I checked for the CNI failure, but couldn't find anything.
~]# ls /opt/cni/bin
bridge cnitool dhcp flannel host-local ipvlan loopback macvlan noop ptp tuning weave-ipam weave-net weave-plugin-2.3.0
The messages below appear repeatedly in the journal logs; it seems the kubelet keeps trying to kill and recreate the container sandbox.
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421184 14339 remote_runtime.go:115] StopPodSandbox "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421212 14339 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304"}
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421247 14339 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421262 14339 pod_workers.go:182] Error syncing pod 7f1c6bf1-6af3-11e8-856b-fa163e3d1891 ("wordpress-766d75457d-spdrb_default(7f1c6bf1-6af3-11e8-856b-fa163e3d1891)"), skipping: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Failed create pod sandbox.
... is almost always a CNI failure; I would check on the node that all the weave containers are happy, and that /opt/cni/bin is present (or its weave equivalent).
You may have to check both the journalctl -u kubelet.service as well as the docker logs for any containers running to discover the full scope of the error on the node.
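Concretely, something like:
journalctl -u kubelet.service --no-pager | tail -n 50
docker ps -a
docker logs <container-id>
for any recently exited sandbox container (the <container-id> comes from the docker ps output).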
It seems to work after removing $KUBELET_NETWORK_ARGS in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.
I removed $KUBELET_NETWORK_ARGS and restarted the worker node, and then the pods got deployed successfully.
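Roughly equivalent to the following, assuming the stock kubeadm drop-in file:
sudo sed -i 's/\$KUBELET_NETWORK_ARGS//' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Note that this disables the CNI network arguments rather than fixing the underlying CNI problem.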
As Matthew said, it's most likely a CNI failure.
First, find the node this pod is running on:
kubectl get po wordpress-766d75457d-zlvdn -o wide
Next, on the node where the pod is located, check /etc/cni/net.d; if you have more than one .conf file, you can delete one and restart the node.
source: https://github.com/kubernetes/kubeadm/issues/578.
note this is one of the solutions.
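For example (with hypothetical filenames; keep the config that belongs to your actual CNI):
ls /etc/cni/net.d
sudo mkdir -p /root/cni-backup
sudo mv /etc/cni/net.d/87-podman-bridge.conflist /root/cni-backup/
sudo reboot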
While hopefully it's no one else's problem, for me, this happened when part of my filesystem was full.
I had pods stuck in ContainerCreating only on one node in my cluster. I also had a bunch of pods which I expected to shut down, but which hadn't. Someone recommended running
sudo systemctl status kubelet -l
which showed me a bunch of lines like
Jun 18 23:19:56 worker01 kubelet[1718]: E0618 23:19:56.461378 1718 kuberuntime_manager.go:647] createPodSandbox for pod "REDACTED(2c681b9c-cf5b-11eb-9c79-52540077cc53)" failed: mkdir /var/log/pods/2c681b9c-cf5b-11eb-9c79-52540077cc53: no space left on device
I confirmed that I was out of space with
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 189G 0 189G 0% /dev
tmpfs 189G 0 189G 0% /sys/fs/cgroup
/dev/mapper/vg01-root 20G 7.0G 14G 35% /
/dev/mapper/vg01-tmp 4.0G 34M 4.0G 1% /tmp
/dev/mapper/vg01-home 4.0G 72M 4.0G 2% /home
/dev/mapper/vg01-varlog 10G 10G 20K 100% /var/log
/dev/mapper/vg01-varlogaudit 2.0G 68M 2.0G 4% /var/log/audit
I just had to clear out that dir (and did some manual cleanup on all the pending pods and pods that were stuck running).
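For example (a sketch; the paths follow the df output above, and the journalctl step only helps if journald files also live under /var/log):
sudo du -sh /var/log/pods/* | sort -h | tail
sudo journalctl --vacuum-size=500M
followed by removing the stale directories under /var/log/pods for pods that no longer exist.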