Unable to reach registry-1.docker.io from Kind cluster node on WSL2 - docker

I am setting up an Airflow k8s cluster using a kind deployment on WSL2. When I execute the standard helm install $RELEASE_NAME apache-airflow/airflow --namespace $NS, it fails. Further investigation shows that the cluster worker node cannot connect to registry-1.docker.io.
Error log for one of the image pulls:
Failed to pull image "redis:6-buster": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/redis:6-buster": failed to resolve reference "docker.io/library/redis:6-buster": failed to do request: Head "https://registry-1.docker.io/v2/library/redis/manifests/6-buster": dial tcp: lookup registry-1.docker.io on 172.19.0.1:53: no such host
I can access all other websites from this node, e.g. google.com, yahoo.com, merriam-webster.com; even docker.com works. The issue is specific to registry-1.docker.io.
All the search results and links I found deal with general internet connection issues.
Current solution:
If I manually change /etc/resolv.conf on the kind worker node to point to the nameserver IP address from /etc/resolv.conf on the WSL2 Debian host, then it works.
But this is a dynamic cluster with dynamic nodes, and I cannot do this every time. I am currently looking for a way to make it part of the cluster configuration, so that it works just by running kind create cluster and one can use kubectl or helm right away.
However, I am more interested in figuring out why this network setup fails specifically for registry-1.docker.io. Is there some configuration that avoids changing DNS to the host IP or Google DNS? The current network configuration seems to work for pretty much the rest of the internet.
I have documented all the steps and investigation details, including some of the network configuration, in a GitHub repository. If you need any further information to help solve the issue, please let me know. I will keep updating the GitHub documentation as I make progress.
Setup:
Windows 11 with WSL2 without any Docker desktop
WSL2 image : Debian bullseye (11) with docker engine on linux
Docker version : 20.10.2
Kind version : 0.11.1
Kind image: kindest/node:v1.20.7@sha256:cbeaf907fc78ac97ce7b625e4bf0de16e3ea725daf6b04f930bd14c67c671ff9
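The manual workaround described above (copying the host's nameserver into /etc/resolv.conf on the kind node) could be scripted to run right after kind create cluster. The following is a minimal sketch, not a kind feature: the function names first_nameserver and patch_kind_dns are made up for illustration, and it assumes the node containers can be reached with docker exec under the names printed by kind get nodes.

```shell
# Print the first nameserver listed in a resolv.conf-style file
# (defaults to the host's /etc/resolv.conf).
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Overwrite /etc/resolv.conf on every node of a kind cluster with the
# host's first nameserver. "kind" is the default cluster name; pass
# another name as the first argument if yours differs.
patch_kind_dns() {
  local cluster="${1:-kind}" ns node
  ns="$(first_nameserver /etc/resolv.conf)"
  [ -n "$ns" ] || { echo "no nameserver found on host" >&2; return 1; }
  for node in $(kind get nodes --name "$cluster"); do
    docker exec "$node" sh -c "echo 'nameserver $ns' > /etc/resolv.conf"
  done
}

# Possible usage after creating the cluster:
#   kind create cluster && patch_kind_dns
```

This only automates the workaround; it does not explain why the default resolver fails for this one hostname.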

I am not sure whether this is an answer or not. After spending 2 days trying to find a solution, I decided to change the node image version. The kind release page lists 1.21 as the latest image for kind version 0.11.1. With 1.21 I could not even start the cluster, and 1.20 had this strange DNS issue, so I went with 1.23. It all worked fine with this image.
However, to my surprise, when I changed the cluster configuration back to 1.20, the DNS issue was gone. I do not know what changed due to the switch of the image, but I cannot reproduce the issue anymore. Maybe this will help someone else.

I think I have found the correct workaround for this bug: switching iptables to legacy mode fixed this for me.
https://github.com/docker/for-linux/issues/1406#issuecomment-1183487816
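For reference, a rough sketch of how the switch to legacy mode might look on a Debian-based WSL2 distribution. The helper names are made up for illustration, the paths /usr/sbin/iptables-legacy and /usr/sbin/ip6tables-legacy are the usual Debian locations and are an assumption here, and restarting Docker with the service command assumes there is no systemd in the distribution.

```shell
# Classify the iptables backend from the text printed by `iptables --version`,
# e.g. "iptables v1.8.7 (nf_tables)" or "iptables v1.8.7 (legacy)".
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown" ;;
  esac
}

# Point Debian's alternatives at the legacy binaries, then restart Docker
# so it recreates its rules with the legacy backend.
switch_to_legacy() {
  sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
  sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
  sudo service docker restart
}

# Possible usage:
#   [ "$(iptables_backend "$(iptables --version)")" = "nf_tables" ] && switch_to_legacy
```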

Related

All docker stack are restarting automatically

I have a multi-service environment hosted with Docker Swarm, with multiple stacks. All the running Docker containers have a built-in Spring Boot application. The issue is that all my stacks get restarted on their own. I know that in the compose file I have set the restart_policy to restart on failure, hence the automatic restart. The problem is that when the services are restarted, I get errors from one particular service, and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and checked the following:
The Docker daemon was not restarted. I double-checked this against the uptime of the Docker daemon.
I checked docker service ps <Service_ID>, where I can see the service shown as shutting down and starting. No other information.
I checked docker service logs <Service_ID>, but there is no error there either.
I checked for a resource crunch. There were plenty of resources available at the host level as well as at each container level.
Can someone tell me where exactly to find logs for this event? Any other thoughts on this?
My host is actually a VM hosted on VMware vCenter.
After a lot of research and going through all the Docker logs, I could not find the solution. Later on, I discovered that a memory snapshot was taken for backup every 24 hours.
Here is what I observed:
Whenever we take a snapshot, all Docker services running on the host restart automatically. There are no errors; they just restart gracefully.
I found some existing questions describing this problem with VMware snapshots.
As far as I know, when we take a snapshot, it points to a different memory location and saves the previous one. I have not been able to find out why this happens, but the snapshot was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.
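To check whether restarts line up with an external schedule such as a nightly snapshot, the task history and recent container events can be pulled with timestamps and compared against the snapshot times. This is only a sketch: the function names are illustrative, the service name web in the usage line is hypothetical, and the daemon keeps only a limited buffer of past events, so older restarts may no longer be listed.

```shell
# List every task of a swarm service with its state and error column, untruncated.
task_history() {
  docker service ps --no-trunc \
    --format '{{.Name}} {{.CurrentState}} {{.Error}}' "$1"
}

# Print container events from the last 24 hours and then exit
# (--until stops docker events from streaming indefinitely).
recent_container_events() {
  local now
  now="$(date +%s)"
  docker events --filter type=container \
    --since "$(( now - 86400 ))" --until "$now"
}

# Possible usage:
#   task_history web
#   recent_container_events
```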

Docker desktop - kubernetes failed to start

I have installed Docker Desktop (version : 2.3.0.4) and enabled Kubernetes.
I deployed a couple of pods and everything was working fine. Since yesterday I have been facing the weird issue below:
Unable to connect to the server: dial tcp 127.0.0.1:6443: connectex: No connection could be made because the target machine actively refused it.
No changes were made on my system. I am using Linux containers on a Windows 10 machine.
I have tried the following steps:
Restarted the Docker Desktop
Tried the same with minikube and Docker Desktop both
Tried to disable the firewall but due to some permissions, I am not able to turn it off.
I have reset the kubernetes cluster as well.
I tried numerous different changes to fix docker desktop kubernetes failing to start. What finally worked for me is...
Clicked the troubleshooting icon (it's a bug icon) and then chose Clean/Purge Data.
Finally, I found the solution for this.
The VPN was causing the issue. I am using my office laptop, and after a restart the VPN was enabled and logged in, which kept Kubernetes from working.
After disabling the VPN, the Kubernetes cluster is working fine.
Hope that helps others as well.
For me, just "Clean and Purge" wasn't enough. Here is what I did.
Log off VPN
Go to bug and "Clean and Purge Data"
Also choose "Reset to Factory Defaults"
Restart Docker Desktop
Choose "Enable Kubernetes"
At this point, the "Starting" state took a while before Kubernetes was enabled. Now it's all good.
$ kubectl get namespace
NAME              STATUS   AGE
default           Active   80s
kube-node-lease   Active   82s
kube-public       Active   82s
kube-system       Active   82s
I tried clean/purge data and resetting to factory settings, but that didn't work.
I had to reset the Kubernetes cluster from here.
In my case, the corporate proxy server caused the Kubernetes startup to fail. Adding *.docker.internal to the no_proxy hosts solved the issue.
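If the proxy exclusions are kept in an environment variable, the entry can be appended without duplicating it. The snippet below is a small sketch for a POSIX-style shell (for example inside WSL or Git Bash); add_no_proxy is a made-up helper name, and on Windows the same value may instead need to go into the system environment variables or into Docker Desktop's own proxy settings.

```shell
# Append an entry to a comma-separated no_proxy list unless it is already present.
add_no_proxy() {
  local list="$1" entry="$2"
  case ",$list," in
    *",$entry,"*) echo "$list" ;;          # already listed, leave unchanged
    ",,")         echo "$entry" ;;         # list was empty
    *)            echo "$list,$entry" ;;   # append to existing entries
  esac
}

# Possible usage:
#   export NO_PROXY="$(add_no_proxy "${NO_PROXY:-}" '*.docker.internal')"
```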
I had a similar problem.
Install Minikube
I installed minikube and ran it as follows on Windows 10.
[Screenshot: starting of kubectl]
Then I gave permission for Docker.
Check cluster-info
When I checked cluster-info, the result was as follows.
[Screenshot: cluster info results]
Try to get pods
When I tried to get pods, I did not get any error.
As @N-ate mentioned above, after clicking Clean/Purge Data, which removes all downloaded images from my computer, Docker and Kubernetes are now running properly.
As you can see in the image below, I only have Kubernetes images running on Docker, and they take most of the allocated memory. I guess the failure to start Kubernetes was related to this memory issue.
In my case, Kubernetes (Docker Desktop on Mac) was not running properly even though I could manage Pods, Services, etc. When I opened Docker Desktop, it said
Kubernetes failed to start (red background)
I managed to fix the issue by resetting Docker Desktop and pruning/cleaning the storage.
I had a similar problem after updating Docker Desktop to version 4.11.1. After I downgraded, it worked fine.
Troubleshooting steps
Check whether there are any errors by running the following command
kubectl get events | grep node
and make sure all pods are in the Running state.
kubectl get pods --namespace kube-system
I don't know about others, but for some reason the options suggested above didn't work for me when fixing K8s on Docker Desktop on Windows. I tried cleaning the cluster, resetting to defaults, restarting the PC, installing previous versions of Docker Desktop, enabling the hypervisor on my PC, giving it more resource priority, and more, yet K8s still failed to start even though Docker started.
I chanced upon Minikube as an alternative tool (without a UI) to create my cluster, and interacted with it using kubectl.
And K8s worked for me locally.
I followed this guide - https://minikube.sigs.k8s.io/docs/start/
My Docker Desktop is running behind the company proxy server.
I deleted the following proxy environment variables from my Windows OS:
HTTPS_PROXY:serveraddress
HTTP_PROXY:serveraddress
and I set up a manual proxy in Docker Desktop.
My steps:
Restart Docker - it didn't help.
Reset Kubernetes - it didn't help.
Add the missing '.wslconfig' file to C:\Users\[MY USER] - it didn't help.
Restart the computer between steps - it didn't help.
Stop using WSL, then use WSL again - it didn't help.
Uninstall Docker, install it again and enable Kubernetes - it didn't help.
Remove the '.kube' folder from C:\Users\[MY USER] and reset Kubernetes - this caused Kubernetes to try stopping and, after that failed, to restart Docker - which succeeded.

Docker Desktop Kubernetes Unable to connect to the server: EOF

Earlier today I increased my Docker Desktop resources, but ever since it restarted, Kubernetes has not been able to complete its startup. Whenever I try to run a kubectl command, I get Unable to connect to the server: EOF in response.
I thought it started because I hadn't deleted a helm chart before adjusting the resource values in Settings, so that the resources were assigned to the pods instead of the Kubernetes API server. But I have not been able to fix this issue.
This is what I have tried thus far:
Restarting Docker again
Reset Kubernetes
Reset Docker to factory settings
Deleting the VM in hyper-v and restarting Docker
Uninstalling and reinstalling Docker Desktop
Deleting the pki folder and restart Docker
Set the Environment Variable for KUBECONFIG
Deleting .kube/config and restart
Another clean reinstall of Docker Desktop
But Kubernetes does not complete its startup, so I still get Unable to connect to the server: EOF in response.
Is there anything I haven't tried yet?
I'll share that what solved this for me was the Docker Desktop settings feature "reset kubernetes cluster". I know that @shenyongo said that a "reset kubernetes" didn't work, and I suppose they mean this.
But for the sake of other readers who may find this, I had this same error message (with Docker Desktop on Windows 11, using wsl2), and the solution for me was indeed to do this:
open the Settings page (in Docker Desktop--right-click on it in the status tray)
then choose "Kubernetes" on the left
then choose "reset kubernetes cluster"
Yes, that warns that "all stacks and kubernetes resources will be deleted", but as nothing else had worked for me (and I wasn't worried about losing much), I tried it, and it did the trick. In moments, all my k8s functionality was back to working.
As background, k8s had been working fine for me for some time. It was just that one day I found I was getting this error. I searched and searched and found lots of folks asking about it but not getting answers, let alone this answer. To be clear, like the OP here I had tried restarting Docker Desktop, restarting the host machine, even downloading and installing an available DD update (I was only a bit behind), and none of those worked. I didn't proceed to ALL the steps shenyongo did, as I thought I'd try this first, and the reset worked.
Hope that may help others. I realize some may fear losing something, but this helps stress the power of declarative vs imperative k8s configuration. It SHOULD be easy to recreate most everything if necessary. I realize it may not be so for everyone.

Access to internal infrastructure from Kubernetes

If I run Docker (Docker for Desktop, 2.0.0.3 on Windows 10), then access to internal infrastructure and containers is fine. I can easily do
docker pull internal.registry:5005/container:latest
But once I enable Kubernetes there, I completely lose access to the internal infrastructure, and [Errno 113] Host is unreachable appears in Kubernetes itself, or connect: no route to host from Docker.
I have tried several ways, including switching the NAT from DockerNAT to Default Switch. That doesn't take effect without a restart, and a restart changes it back to DockerNAT, so no luck here. This option also seems not to work.
Let's start from the basics from the official documentation:
Please make sure you meet all the prerequisites and have followed all other instructions.
You can also use this guide. It has more details that may point to what went wrong in your case.
If the above doesn't help, there are a few other things to consider:
In case you are using a virtual machine, make sure that the IP you are referring to is that of the Docker engine's host and not the one on which the client is running.
Try to add tmpnginx in docker-compose.
Try to delete the pki directory in C:\programdata\DockerDesktop (first stop Docker, delete the directory and then start Docker). The directory will be recreated and k8s-app=kube-dns labels should work fine.
Please let me know if that helped.

No response from Docker service

I tried following the tutorial here
https://docs.docker.com/get-started/part3/.
The first issue I ran into was when I called docker swarm init. It asked for docker swarm init --advertise-addr with one of two possible IPv6 addresses.
I tried initializing the swarm on both and then starting the service. The service starts successfully, but I can't get any response when accessing localhost:4000. It just loads forever.
I have tried rebuilding the image, creating the swarm on both IPs, and checking the logs (there was nothing there), but I have run out of ideas. If it helps, the computer is dual-boot, which might affect the networking in ways I am unable to figure out.
How can I receive a response to my request?
The issue I was facing was a connection problem between Google Chrome and Docker swarm, documented better here
https://forums.docker.com/t/google-chrome-and-localhost-in-swarm-mode/32229/9.
There is no apparent solution
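One way to tell a browser-specific problem apart from a service that is not answering at all is to probe the published port with a command-line client. The sketch below assumes curl is installed and that the service is published on port 4000 as in the question; responds is a made-up helper name, and trying 127.0.0.1 instead of localhost to force IPv4 is an assumption worth testing, not a confirmed fix.

```shell
# Return 0 if the URL answers an HTTP request within 5 seconds, non-zero otherwise.
responds() {
  curl --silent --output /dev/null --max-time 5 "$1"
}

# Compare name-based and IPv4-literal access to the published port:
#   responds http://localhost:4000 && echo "localhost ok"
#   responds http://127.0.0.1:4000 && echo "127.0.0.1 ok"
```

If the second probe succeeds while the browser still hangs on localhost, the service itself is reachable and the problem lies in how the name is resolved or in the browser.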
