Docker desktop - kubernetes failed to start - docker

I have installed Docker Desktop (version : 2.3.0.4) and enabled Kubernetes.
I deployed couple of PODS and everything was working fine, Since yesterday I am facing a weird issue mentioned below:
Unable to connect to the server: dial tcp 127.0.0.1:6443: connectex: No
connection could be made because the target machine actively refused it.
As such, no changes were made on my system. I am using Linux Containers on Windows 10 machine.
Following steps I have tried:
Restarted the Docker Desktop
Tried the same with minikube and Docker Desktop both
Tried to disable the firewall but due to some permissions, I am not able to turn it off.
I have reset the kubernetes cluster as well.

I tried numerous different changes to fix docker desktop kubernetes failing to start. What finally worked for me is...
Clicked the troubleshooting icon (it's a bug icon) and then chose Clean/Purge Data.*

Finally,I found the solution for this.
VPN was causing the issue, I am using my office laptop and after restart, VPN was enabled and logged-in and due to this Kubernetes was not working.
After disabling the VPN, Kubernetes cluster working fine.
Hope that helps others as well.

For me, just "Clean and Purge" wasn't enough. Here is what I did.
Log off VPN
Go to bug and "Clean and Purge Data"
Also choose "Reset to Factory Defaults"
Restart Docker Desktop
Choose "Enable Kubernetes"
At this point, the "Starting" took a while for Kubernetes to be enabled. Now's it all good.
$ kubectl get namespace
NAME STATUS AGE
default Active 80s
kube-node-lease Active 82s
kube-public Active 82s
kube-system Active 82s

I tried clean/purge data and resetting factory settings but that didn't worked.
I had to reset kubernetes cluster from here.

In my case, the corporate proxy server caused the Kubernetes startup to fail. Addiing *.docker.internal to the no_proxy hosts solved the issue.

I had similar problem.
Install Minikube
I install minikube and I run as following on windows 10.
starting of kubectl
Then I gave permission for docker.
Check cluster-info
When I check cluster-info result as following
cluster info results
Try to get pods
When I try to get pods I did not get any error.

As #N-ate mentioned above, after clicking Clean/Purge Data which removes all downloaded images from my computer, now docker and kubernates are running properly.
As you can see in the image below, I only have kubernates images running on docker and it takes most of the allocated memory. I guess the failure of starting kubernates was related to this memory issue.

In my case, the Kubernetes (Docker Desktop on Mac) is not running properly though I can manage Pods, Services, etc., when I opened the Docker Desktop, it says
Kubernetes failed to start (red background)
I managed to fix the issue by resetting Docker Desktop and Prune/cleaning the storage.

Even I had similar problem after updating to Docker desktop(version 4.11.1). After I downgraded the version it works fine.
Troubleshooting steps
check is there any errors by running following command
kubectl get events|grep node
and make sure all pods are in running state.
kubectl get pods --namespace kube-system

I don't know for others but for some reasons, the above suggested options didn't work for me while fixing K8s on Docker Desktop on Windows. Tried fixing by cleaning the cluster, resetting to default, restarting pc, installing previous versions of Docker Desktop, enabling my pc HiperVisor, and giving it more resource priority, and others but yet still K8s failed to start, even though the Docker starts.
I chanced on Minikube as an alternative tool (without UI) to create my cluster and interacted with it using Kubectl.
And K8s worked for me locally.
I followed this guide - https://minikube.sigs.k8s.io/docs/start/

My docker-desktop is running behind the company proxy server.
I deleted following Proxy Env Variables from my windows OS.
HTTPS_PROXY:serveraddess
HTTP_PROXY:serveraddress
and I set up manual proxy in docker desktop.

My steps:
restart docker - it didn't help.
reset Kubernetes - it didn't help.
Adding missing 'wslconfig' file to C:\Users[MY USER] - it didn't help.
Restart the computer between any step - it didn't help.
stop using Wsl reuse Wsl - it didn't help.
uninstall docker and install again and enable Kubernetes - it didn't help.
Remove '.kube' folder from C:\Users[MY USER] and reset Kubernetes - It causes the Kubernetes to try stopping, and after failure - restart docker - which succeeded.

Related

NiFi 1.19 keeps restarting within Docker container

I have Docker Desktop on Windows 10 and I run Apache NiFi 1.19 within container.
NiFi keeps restarting, by itself, without giving any useful log message, exception message, or whatever I can trace back with.
Any ideas what can be going wrong? I have tried many things.. including this, but nothing helped out.
Ok, finally, so the problem was built-in Hyper-V backend of the Docker (i.e. the default VM environment that comes with Docker Desktop).
I just opted for Use the WSL 2 based engine choice and problem went away.
From the Docker Desktop dashboard, go to settings and in the General tab, select "Use the WSL 2 based enginge":

Unable to reach registry-1.docker.io from Kind cluster node on WSL2

I am setting up and airflow k8s cluster using kind deployment on a WSL2 setup. When I execute standard helm install $RELEASE_NAME apache-airflow/airflow --namespace $NS it fails. Further investigation shows that cluster worker node cannot connect to registry-1.docker.io.
Error log for one the image pull
Failed to pull image "redis:6-buster": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/redis:6-buster": failed to resolve reference "docker.io/library/redis:6-buster": failed to do request: Head "https://registry-1.docker.io/v2/library/redis/manifests/6-buster": dial tcp: lookup registry-1.docker.io on 172.19.0.1:53: no such host
I can access all other websites from this node e.g. google.com, yahoo.com merriam-webster.com etc. ; even docker.com works. This issue is very specific to registry-1.docker.io.
All the search and links seems to be around general internet connection issue.
Current solution:
If I manually change the /etc/resolv.conf on the kind worker node to point to the IP address from /etc/resolv.conf of the WSL2 Debian main IP address, then it works.
But, this is a dynamic cluster and node and I cannot do this every time. I am currently searching for a way as to how the make it a part of the cluster configuration. Some way that makes it work just by saying kind create cluster and one should be able to use kubectl or helm by default.
However, I am more interested in figuring out why this network setup fails specifically for registry-1.docker.io. Is there some configuration that can be done to avoid changing DNS to host IP or google DNS? As the current network configuration seems to work pretty much for the rest of the internet.
I have documented all the steps and investigation details including some of network configuration details on github repositroy. If you need any further information to help solve the issue, please let me know. I will keep on updating the github documentation as I make progress.
Setup:
Windows 11 with WSL2 without any Docker desktop
WSL2 image : Debian bullseye (11) with docker engine on linux
Docker version : 20.10.2
Kind version : 0.11.1
Kind image: kindest/node:v1.20.7#sha256:cbeaf907fc78ac97ce7b625e4bf0de16e3ea725daf6b04f930bd14c
67c671ff9
I am not sure, if it is an answer or not. After spending 2 days trying to find solution. I thought to change the node image version. On the Kind release page, it says 1.21 as the latest image for the kind version 0.11.1. I had problems with 1.21 to even start the cluster. 1.20 faced this strange DNS image. So went with 1.23. It all worked fine with thus image.
However, to my surprise, when I changed the cluster configuration back to 1.20, the DNS issue was gone. So, I do not what changed due to switch of of the image, but I cannot reproduce the issue again! Maybe it will help someone else
I find that i have found the correct workaround for this bug: Switching IPTables to legacy mode has fixed this for me.
https://github.com/docker/for-linux/issues/1406#issuecomment-1183487816

Testcontainers do not start after replacing Docker Desktop with minikube

I want to make my testcontainers in Java integration tests work with minikube replacing Docker Desktop.
I followed below article to get started:
https://www.atomicjar.com/2021/10/docker-on-windows-and-macos/#minikube
This is what I've got in testcontainers.properties
docker.client.strategy=org.testcontainers.dockerclient.EnvironmentAndSystemPropertyClientProviderStrategy
docker.host=tcp\://192.168.64.2\:2376
docker.cert.path=/Users/username/.minikube/certs
docker.tls.verify=true
Although my docker is up and running, I'm getting following exception:
Caused by: java.lang.IllegalStateException: Could not find a valid Docker environment. Please see logs and check configuration
Can anybody please suggest anything to make it working?
TA
If you are using gradle try -no-daemon flag to use a new daemon. Your old gradle daemon still using your previous testcontainers properties, also restart your IDE if you're running your build inside.
After restarting Minikube and Intellij editor, and updating testcontainer-bom to be the latest - from 1.15 to 1.16.2, I was able to pull some third-party docker images. This means docker is working now.
However, I'm still trying to find a way to work with local images (Other application docker images) for integration testing as it used to work with Docker Desktop.

Docker Desktop Kubernetes Unable to connect to the server: EOF

Earlier today I had increased my Docker desktop resources, but when ever since it restarted Kubernetes has not been able to complete its startup. Whenever I try to run a kubectl command, I get Unable to connect to the server: EOF in response.
I had thought that it started because I hadn't deleting a helm chart before adjusting the resource values in Settings, thus said resources having been assigned to the pods instead of the Kubernetes api server. But I have not been able to fix this issue.
This is what I have tried thus far:
Restarting Docker again
Reset Kubernetes
Reset Docker to factory settings
Deleting the VM in hyper-v and restarting Docker
Uninstalling and reinstalling Docker Desktop
Deleting the pki folder and restart Docker
Set the Environment Variable for KUBECONFIG
Deleting .kube/config and restart
Another clean reinstall of Docker Desktop
But Kubernetes does not complete its startup, so I still get Unable to connect to the server: EOF in response.
Is there anything I haven't tried yet?
I'll share that what solved this for me was Docker Desktop settings feature for "reset kubernetes cluster". I know that #shenyongo said that a "reset kubernetes" didn't work, and I suppose they mean this.
But for the sake of other readers who may find this, I had this same error message (with Docker Desktop on Windows 11, using wsl2), and the solution for me was indeed to do this:
open the Settings page (in Docker Desktop--right-click on it in the status tray)
then choose "Kubernetes" on the left
then choose "reset kubernetes cluster"
Yes, that warns that "all stacks and kubernetes resources will be deleted", but as nothing else had worked for me (and I wasn't worried about losing much), I tried it, and it did the trick. In moments, all my k8s functionality was back to working.
As background, k8s had been working fine for me for some time. It was just that one day I found I was getting this error. I searched and searched and found lots of folks asking about it but not getting answers, let alone this answer. To be clear, like the OP here I had tried restarting Docker Desktop, restarting the host machine, even downloading and installing an available DD update (I was only a bit behind), and none of those worked. I didn't proceed to ALL the steps shenyongo did, as I thought I'd try this first, and the reset worked.
Hope that may help others. I realize some may fear losing something, but this helps stress the power of declarative vs imperative k8s configuration. It SHOULD be easy to recreate most everything if necessary. I realize it may not be so for everyone.

Docker Compose stuck downloading or pulling fs layer

I have the latest Docker for Mac installed, and I'm running into a problem where it appears that docker-compose up is stuck in a Downloading state for one of the containers:
± |master ✗| → docker-compose up --build
Pulling container (repo.io/company/container:prod)...
prod: Pulling from company/container
somehash: Already exists
somehash: Already exists
somehash: Already exists
somehash: Already exists
somehash: Pulling fs layer
somehash: Already exists
somehash: Already exists
somehash: Downloading [=================================================> ] 234.6 MB/239.3 MB
somehash: Download complete
somehash: Download complete
^^ this is literally what it looks like on my command line. Stopping and starting hasn't helped, it immediately outputs this same output.
I've tried to rm the container but I guess it doesn't yet exist, it returns the output No stopped containers. --force-recreate also gets stuck in the same place. And perhaps I'm not googling for the right terminology but I haven't found anything useful to try - any pointers?
I just needed to restart Docker.
Linux users can use sudo service docker restart.
Docker for Mac has a handy button for this in the Docker widget in the macOS toolbar:
If you happen to be using Docker Toolkit try docker-machine restart.
I faced the same problem! Restarting the service didn't help, downloading again didn't help. It used to get stuck at random instances leaving me with no option but to kill the pull request.
One thing which worked for me was to download 1 file at a time. For Ubuntu users, you can use the following steps:
Stop the service:
sudo service docker stop
Start docker with max concurrent download set as 1:
sudo dockerd --max-concurrent-downloads 1
Download the required image:
sudo docker pull <image_name>
Download images, after that stop the terminal and start the daemon again as it was earlier.
sudo service docker start
I had the similar situation this morning where my network suddenly went down and I was forced to power cycle the modern, while docker-compose was still in the middle of downloading stuff from docker hub.
Yes, bouncing the docker daemon process seems to resolve this.
For Linux users - do sudo service docker restart to fix it.
Go to the Docker Preferences from its menu bar icon. Within there is a "bug" icon. Click on that and then "clean / Purge data"
I'm running OSX and restarting Docker for Mac didn't help. Neither did a full restart or upgrading VirtualBox. What did work was turning my wifi interface on and off every time it got stuck. I had to do this repeatedly, but it eventually downloaded the entire image.
Directly download the necessary images using docker, e.g.
docker pull company/container
and then run
docker-compose up
again. Worked for me on MacOS.
I found a possible workaround.
I have my docker engine installed in a Ubuntu 18.04 Snap Environment.
I discovered searching in some forums that users relate this behaviors to limitation in the download bandwith.
So in the picture below you are going to watch that the components was stucked
Part of the Downloads stucked and finally I cancelled the process CTRL + C
I added two parameters or flags in the configuration file that controls the docker daemon behavior: max-concurrent-downloads 1 and max-concurrent-uploads 1
In my case remember, i am working in a snap environment. This file is located in this directory: /var/lib/docker/current/config/daemon.json
REMEMBER TO STOP ALL DOCKER PROCESS BEFORE THE FILE MODIFICATION, AND CREATE BACKUP OF THE FILE
Add the two lines in the picture. This is going to help you to limit the downloads to only one by one
This is the process that helped me to resolve this problem.
Download Succesfull
I had this issue in my VirtualBox when doing a docker pull on the image but it got stuck at a specific position and never moved from there. So, the issue was due to the network adapters in my VM. I was using NAT by default. When I switched it to "Bridged adapter", the issue went away.
I had a similar problem on docker for windows for a couple of days and when I tried to connect to the virtual machine (via Hyper-V Manager) the downloads started speeding along. I have no idea why but it worked for me...
Completely remove docker
Install docker again
It should work now
I tried to restar docker, update docker, but didnt help

Resources