Docker for Desktop Kubernetes: Unable to connect to the server: dial tcp [::1]:6445

I am using Docker Desktop on Windows 10 Professional with Hyper-V, and I am not using minikube. I enabled the Kubernetes cluster via Docker Desktop's settings, and it shows that Kubernetes is successfully installed and running.
When I run the following command:
kubectl config view
I get the following output:
apiVersion: v1
clusters:
- cluster:
    insecure-skip-tls-verify: true
    server: https://localhost:6445
  name: docker-for-desktop-cluster
contexts:
- context:
    cluster: docker-for-desktop-cluster
    user: docker-for-desktop
  name: docker-for-desktop
current-context: docker-for-desktop
kind: Config
preferences: {}
users:
- name: docker-for-desktop
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED
However, when I run:
kubectl cluster-info
I get the following error:
Unable to connect to the server: dial tcp [::1]:6445: connectex: No connection could be made because the target machine actively refused it.
It seems like there is some network issue, but I am not sure how to resolve it.

I know this is an old question, but the following helped me resolve a similar issue. The root cause was that I had minikube installed previously, and it was still set as my default context.
I was getting the following error:
Unable to connect to the server: dial tcp 192.168.1.8:8443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
In PowerShell, run the following command:
> kubectl config get-contexts
CURRENT   NAME                 CLUSTER          AUTHINFO         NAMESPACE
          docker-desktop       docker-desktop   docker-desktop
          docker-for-desktop   docker-desktop   docker-desktop
*         minikube             minikube         minikube
This lists all contexts; check whether there is more than one. If you installed minikube in the past, a * will mark it as the currently selected default context. You can switch to the docker-desktop context as follows:
> kubectl config use-context docker-desktop
Run the get-contexts command again to verify that the * mark has moved.
Now, the following command should work:
> kubectl get pods
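If the switch succeeded, kubectl confirms it, and cluster-info should reach the Docker Desktop endpoint again. A rough sketch of what to expect (the server address is taken from the question above and may differ on your machine):
> kubectl config use-context docker-desktop
Switched to context "docker-desktop".
> kubectl cluster-info
Kubernetes master is running at https://localhost:6445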

Posting a response to this very old question, as I was searching for a solution and eventually found a different cause for my problem; the solution was simple.
The cause was that the config file was missing from the $HOME/.kube directory.
A simple restart of Docker Desktop restored the file with some defaults, and things were back to normal.
Side note: the issue started after I upgraded my Docker Desktop installation to the latest version (when I got the update-available popup). I should also mention that the cluster had already stopped working before this problem occurred, and I had to manually remove Docker Desktop and reinstall the latest version.
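To check for this cause before reinstalling anything, you can verify whether the kubeconfig file exists at its default location (a PowerShell sketch; the path is the standard default, adjust if you use a custom KUBECONFIG):
> Test-Path "$HOME\.kube\config"
True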

Related

Unable to export traces to OpenTelemetry Collector on Kubernetes

I am using the opentelemetry-ruby otlp exporter for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-ruby/tree/main/exporter/otlp
The otel collector was installed as a daemonset:
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector
I am trying to get the OpenTelemetry collector to collect traces from the Rails application. Both are running in the same cluster, but in different namespaces.
We have enabled auto-instrumentation in the app, but the rails logs are currently showing these errors:
E, [2022-04-05T22:37:47.838197 #6] ERROR -- : OpenTelemetry error: Unable to export 499 spans
I set the following env variables within the app:
OTEL_LOG_LEVEL=debug
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
I can't confirm that the application can communicate with the collector pods on this port.
Curling this address from the Rails app returns "Connection Refused". However, I am able to curl http://<OTEL_POD_IP>:4318, which returns 404 page not found.
From inside a pod:
# curl http://localhost:4318/
curl: (7) Failed to connect to localhost port 4318: Connection refused
# curl http://10.1.0.66:4318/
404 page not found
This helm chart created a daemonset but there is no service running. Is there some setting I need to enable to get this to work?
I confirmed that otel-collector is running on every node in the cluster and the daemonset has HostPort set to 4318.
The problem is with this setting:
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
Think of your pod as a stripped-down host of its own: localhost (or 0.0.0.0) refers to the pod itself, and there is no collector deployed inside your pod.
You need to use the address of your collector instead. I checked the examples in the repo you shared, and for the agent-and-standalone and standalone-only deployments there is also a k8s resource of type Service.
With that, you can use the full service name (with namespace) to configure your environment variable.
Also, the environment variable is now called OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, so you will need something like this:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=<service-name>.<namespace>.svc.cluster.local:<service-port>
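As a minimal sketch, assuming a chart release named opentelemetry-collector installed into a monitoring namespace and exposing the OTLP HTTP port 4318 (both names are assumptions, substitute your own):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://opentelemetry-collector.monitoring.svc.cluster.local:4318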
The correct solution is to use the Kubernetes Downward API to fetch the node IP address, which will allow you to export the traces directly to the daemonset pod within the same node:
containers:
- name: my-app
  image: my-image
  env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://$(HOST_IP):4318
Note that using the deployment's service as the endpoint (<service-name>.<namespace>.svc.cluster.local) is incorrect, as it effectively bypasses the daemonset and sends the traces directly to the deployment, which makes the daemonset useless.
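To sanity-check the wiring, you can print the resolved variables from inside a running application pod (the pod name here is hypothetical, and this assumes the image ships printenv):
kubectl exec my-app-pod -- printenv HOST_IP OTEL_EXPORTER_OTLP_ENDPOINT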

Skaffold dev fails

I am getting this error after running skaffold dev.
Step 1/6 : FROM node:current-alpine3.11
exiting dev mode because first build failed: unable to stream build output: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.49.1:53: read udp 192.168.49.2:35889->192.168.49.1:53: i/o timeout. Please fix the Dockerfile and try again..
Here is skaffold.yml:
apiVersion: skaffold/v2beta11
kind: Config
metadata:
  name: *****
build:
  artifacts:
  - image: 127.0.0.1:32000/auth
    context: auth
    docker:
      dockerfile: Dockerfile
deploy:
  kubectl:
    manifests:
    - infra/k8s/auth-depl.yaml
local:
  push: false
artifacts:
- image: 127.0.0.1:32000/auth
  context: auth
  docker:
    dockerfile: Dockerfile
  sync:
    manual:
    - src: "src/**/*.ts"
      dest: .
I have tried all the solutions I found online, including adding 8.8.8.8 as the DNS server, but the error persists. I am running Ubuntu Linux and using Minikube locally. Please assist.
This is a Community Wiki answer, posted for better visibility, so feel free to edit it and add any additional details you consider important.
In this case:
minikube delete && minikube start
solved the problem, but you can also start by restarting the Docker daemon. Since this is a Minikube cluster and Skaffold uses Minikube's Docker daemon for its builds, as suggested by Brian de Alwis in his comment, you may start with:
minikube stop && minikube start
or
minikube ssh
su
systemctl restart docker
I searched for similar errors, and in many cases, e.g. here or in this thread, setting your DNS to something reliable like 8.8.8.8 may also help. Note that a plain sudo echo "..." >> /etc/resolv.conf does not work, because the redirection is performed by your non-root shell; pipe through tee instead:
echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf
If you use Minikube, you should first:
minikube ssh
su ### to become root
and then run:
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
The following error message:
Please fix the Dockerfile and try again
may be somewhat misleading in cases like this, as the Dockerfile is probably totally fine. But as we can read in the other part:
lookup registry-1.docker.io on 192.168.49.1:53: read udp 192.168.49.2:35889->192.168.49.1:53: i/o timeout.
it is definitely related to a failing DNS lookup. This is well described here as a well-known issue.
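A quick way to confirm the diagnosis is to test name resolution from inside the Minikube VM, for example:
minikube ssh
nslookup registry-1.docker.io
If the lookup times out there while it succeeds on your host, the VM's DNS configuration is the culprit.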
Get i/o timeout
Get https://index.docker.io/v1/repositories//images: dial tcp: lookup on :53: read udp :53: i/o timeout
Description
The DNS resolver configured on the host cannot resolve the registry's hostname.
GitHub link
N/A
Workaround
Retry the operation, or if the error persists, use another DNS resolver. You can do this by updating your /etc/resolv.conf file with these or other DNS servers:
nameserver 8.8.8.8
nameserver 8.8.4.4

Trying to pull/run docker images from docker hub on Minikube fails

I am very new to Kubernetes and have done some work with Docker previously. I am trying to accomplish the following:
Spin up Minikube
Use kubectl to run a Docker image from Docker Hub.
I started minikube and things look like they are up and running. Then I ran the following command (note that I do not have this image anywhere on my machine and I expect Kubernetes to fetch it for me):
kubectl run nginx --image=nginx
Now, when I do that, it spins up the pod, but the status is ImagePullBackOff. So I ran the kubectl describe pod command on it, and the results look like the following:
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  8m               default-scheduler  Successfully assigned default/ngix-67c6755c86-qm5mv to minikube
  Warning  Failed     8m               kubelet, minikube  Failed to pull image "nginx": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.64.1:53: read udp 192.168.64.2:52133->192.168.64.1:53: read: connection refused
  Normal   Pulling    8m (x2 over 8m)  kubelet, minikube  Pulling image "nginx"
  Warning  Failed     8m (x2 over 8m)  kubelet, minikube  Error: ErrImagePull
  Warning  Failed     8m               kubelet, minikube  Failed to pull image "nginx": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.64.1:53: read udp 192.168.64.2:40073->192.168.64.1:53: read: connection refused
  Normal   BackOff    8m (x3 over 8m)  kubelet, minikube  Back-off pulling image "nginx"
  Warning  Failed     8m (x3 over 8m)  kubelet, minikube  Error: ImagePullBackOff
Then I searched around to see if anyone had faced a similar issue, and it turned out that some people had, and they resolved it by restarting Minikube with some extra flags, like below:
minikube start --vm-driver="xhyve" --insecure-registry="$REG_IP":80
When I do an nslookup inside Minikube, it does resolve, with the following information:
Server: 10.12.192.22
Address: 10.12.192.22#53
Non-authoritative answer:
hub.docker.com canonical name = elb-default.us-east-1.aws.dckr.io.
elb-default.us-east-1.aws.dckr.io canonical name = us-east-1-elbdefau-1nlhaqqbnj2z8-140214243.us-east-1.elb.amazonaws.com.
Name: us-east-1-elbdefau-1nlhaqqbnj2z8-140214243.us-east-1.elb.amazonaws.com
Address: 52.205.36.130
Name: us-east-1-elbdefau-1nlhaqqbnj2z8-140214243.us-east-1.elb.amazonaws.com
Address: 3.217.62.246
Name: us-east-1-elbdefau-1nlhaqqbnj2z8-140214243.us-east-1.elb.amazonaws.com
Address: 35.169.212.184
Still no luck. Is there anything I am doing wrong here?
The error message suggests that the Docker daemon running in the minikube VM can't resolve the registry-1.docker.io hostname, because the DNS nameserver it is configured to use (192.168.64.1:53) is refusing connections. It's strange to me that the Docker daemon is trying to resolve registry-1.docker.io via a nameserver at 192.168.64.1, while nslookup on the VM uses a nameserver at 10.12.192.22. I did an Internet search for "minkube Get registry-1.docker.io/v2: dial tcp: lookup registry-1.docker.io on 192.168.64.1:53" and found an issue where someone made this comment; it seems identical to your problem and specific to xhyve.
In that comment the person says:
This issue does look like an xhyve issue not seen with virtualbox.
and
Switching to virtualbox fixed this issue for me.
I stopped minikube, deleted it, started it without --vm-driver=xhyve (minikube uses the virtualbox driver by default), and then docker build -t hello-node:v1 . worked fine without errors.
In my case it was caused by dnsmasq, a DNS server, running on my Mac via Homebrew, which caused DNS requests to fail inside minikube. After stopping dnsmasq, everything worked.
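For reference, assuming dnsmasq was installed through Homebrew, stopping it looks like this:
brew services stop dnsmasq   # prepend sudo if it was registered as a root service (common, since it binds port 53)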
I got this problem with my local minikube setup, and I wasn't able to pull any image referenced in a simple deployment manifest.
$ kubectl get pods
NAME    READY   STATUS             RESTARTS   AGE
test1   0/1     ImagePullBackOff   0          68s
I tried to run the following test pod:
apiVersion: v1
kind: Pod
metadata:
  name: test1
  labels:
    site: blog
spec:
  containers:
  - name: web
    image: nginx:latest
It was fixed only after restarting Minikube.
Maybe dnsmasq really was the cause in this case too.
You have:
minikube running with default settings
docker building your images
(*) minikube configured to point at your local docker image repo
And now minikube can't pull images from public container registries like Docker Hub.
Stop and start minikube, then point it back at your local docker image repo. The commands to do this (including step (*)):
minikube stop
minikube start
minikube -p minikube docker-env
eval $(minikube -p minikube docker-env)
After running the above, I was able to pull nginx, alpine, and friends from hub.docker.com just by setting image: alpine in the YAML spec.
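You can confirm that your shell is now pointed at Minikube's Docker daemon rather than your host's; if the eval worked, the list will include Kubernetes system containers:
docker ps --format '{{.Names}}'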
In my case the issue was just a short drop in my network connectivity. So if you have no DNS/VPN/xhyve complications and pulling just stops working, the fix is easy enough.

Issues with Kubernetes multi-master using kubeadm on premises

Following the Kubernetes v1.11 documentation, I have managed to set up Kubernetes high availability using kubeadm, with stacked control-plane nodes and 3 masters running on-premises on CentOS 7 VMs. With no load balancer available, I used Keepalived to set up a failover virtual IP (10.171.4.12) for the apiserver, as described in the Kubernetes v1.10 documentation. As a result, the "kubeadm-config.yaml" used to bootstrap the control planes had the following header:
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "10.171.4.12"
api:
  controlPlaneEndpoint: "10.171.4.12:6443"
etcd:
...
The configuration went fine, apart from the following warning that appeared when bootstrapping all 3 masters:
[endpoint] WARNING: port specified in api.controlPlaneEndpoint overrides api.bindPort in the controlplane address
And this warning when joining workers:
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
Afterwards, basic tests succeed:
When a master is stopped, Keepalived fails over to another master and the apiserver remains accessible (all kubectl commands succeed).
When the main master (the one with the highest Keepalived preference) is stopped, deploying apps succeeds (tested with Kubernetes Bootcamp) and everything syncs properly with the main master once it is back online.
The Kubernetes Bootcamp application runs successfully, and all master and worker nodes respond properly when the NodePort service exposing Bootcamp is curled.
Successfully deployed docker-registry as per https://github.com/kubernetes/ingress-nginx/tree/master/docs/examples/docker-registry
But then come these issues:
The Nginx ingress controller pod fails to run and enters CrashLoopBackOff (refer to the events below).
After installing helm and tiller on any master, all commands using "helm install" or "helm list" fail to execute (refer to the command outputs below).
I am running Kubernetes v1.11.1 but kubeadm-config.yaml mentions v1.11.0; is this something I should worry about?
Should I deviate from the official documentation and go for alternatives such as the one described at https://medium.com/@bambash/ha-kubernetes-cluster-via-kubeadm-b2133360b198?
Note: I hit the same issue with a new Kubernetes HA installation using the latest version 1.11.2 (three masters + one worker) and the latest nginx ingress controller release, 0.18.0.
-- Nginx controller pod events & logs:
Normal Pulled 28m (x38 over 2h) kubelet, node3.local Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
Warning Unhealthy 7m (x137 over 2h) kubelet, node3.local Liveness probe failed: Get http://10.240.3.14:10254/healthz: dial tcp 10.240.3.14:10254: connect: connection refused
Warning BackOff 2m (x502 over 2h) kubelet, node3.local Back-off restarting failed container
nginx version: nginx/1.13.12
W0809 14:05:46.171066 5 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0809 14:05:46.171748 5 main.go:191] Creating API client for https://10.250.0.1:443
-- helm command outputs:
# helm install ...
Error: no available release name found
# helm list
Error: Get https://10.250.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.250.0.1:443: i/o timeout
-- kubernetes service & endpoints:
# kubectl describe svc kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.250.0.1
Port:              https 443/TCP
TargetPort:        6443/TCP
Endpoints:         10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443
Session Affinity:  None
Events:            <none>
# kubectl get endpoints --all-namespaces
NAMESPACE       NAME                      ENDPOINTS                                                AGE
default         bc-svc                    10.240.3.27:8080                                         6d
default         kubernetes                10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443        7d
ingress-nginx   default-http-backend      10.240.3.24:8080                                         4d
kube-system     kube-controller-manager   <none>                                                   7d
kube-system     kube-dns                  10.240.2.4:53,10.240.2.5:53,10.240.2.4:53 + 1 more...   7d
kube-system     kube-scheduler            <none>                                                   7d
kube-system     tiller-deploy             10.240.3.25:44134                                        5d
The problems were solved when I switched my pod network from Flannel to Calico.
(Tested on Kubernetes 1.11.0; I will repeat the tests tomorrow on the latest k8s version, 1.11.2.)
As you can see in the Kubernetes client-go code, IP address and port are read from environment variables inside a container:
host, port := os.Getenv("KUBERNETES_SERVICE_HOST"), os.Getenv("KUBERNETES_SERVICE_PORT")
You can check these variables by running the following command against any healthy pod:
$ kubectl exec <healthy-pod-name> -- printenv | grep SERVICE
I think the cause of the problem is that the variables KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are set to 10.250.0.1:443 instead of 10.171.4.12:6443.
Could you confirm it by checking these variables in your cluster?
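If this hypothesis is right, the output would look roughly like this (values taken from the service description above):
$ kubectl exec <healthy-pod-name> -- printenv | grep SERVICE
KUBERNETES_SERVICE_HOST=10.250.0.1
KUBERNETES_SERVICE_PORT=443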
Important additional notes:
After running a couple of labs, I got the same issue with:
- a new Kubernetes HA installation using the latest version 1.11.2 (three masters + one worker) and the latest nginx ingress controller release, 0.18.0;
- a standalone Kubernetes master with a few workers using version 1.11.1 (one master + two workers) and the latest nginx ingress controller release, 0.18.0;
- but with a standalone Kubernetes master on version 1.11.0 (one master + two workers), nginx ingress controller 0.17.1 worked with no complaints, while 0.18.0 complained that the readiness probe failed although the pod went into the running state.
=> As a result, I think the issue may be related to how Kubernetes releases 1.11.1 and 1.11.2 interpret health probes.
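To compare probe behavior across versions, one way is to inspect the controller pod's probe configuration and recent failures (the pod name placeholder is hypothetical):
kubectl -n ingress-nginx describe pod <nginx-ingress-controller-pod> | grep -A 3 Liveness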

Pod creation in ContainerCreating state always

I am trying to create a pod in Kubernetes with the following simple command:
kubectl run example --image=nginx
It runs and assigns the pod to the minion correctly, but the pod always stays in the ContainerCreating state due to the following error. I have not set up GCR or GCloud on my machine, so I am not sure why it is pulling from there.
1h 29m 14s {kubelet centos-minion1} Warning FailedSync Error syncing pod, skipping:
failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed
for gcr.io/google_containers/pause:2.0, this may be because there are no
credentials on this request. details: (unable to ping registry endpoint
https://gcr.io/v0/\nv2 ping attempt failed with error: Get https://gcr.io/v2/:
http: error connecting to proxy http://87.254.212.120:8080: dial tcp
87.254.212.120:8080: i/o timeout\n v1 ping attempt failed with error:
Get https://gcr.io/v1/_ping: http: error connecting to proxy
http://87.254.212.120:8080: dial tcp 87.254.212.120:8080: i/o timeout)
Kubernetes is trying to create a pause container for your pod; this container is used to create the pod's network namespace. See this question and its answers for more general information on the pause container.
To your specific error: Kubernetes tries to pull the pause container's image (which would be gcr.io/google_containers/pause:2.0, according to your error message) from the Google Container Registry (gcr.io). Apparently, your Docker engine tries to connect to GCR using an HTTP proxy located at 87.254.212.120:8080, to which it cannot connect (i/o timeout).
To correct this error, either make sure that your HTTP proxy server is online and does not block HTTP requests to GCR, or (if you do have public Internet access) disable the proxy connection for your Docker engine. This is typically done using the http_proxy and https_proxy environment variables, which would have been set in /etc/sysconfig/docker or /etc/default/docker, depending on your Linux distribution.
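As a sketch (the file location and exact variable names vary by distribution and Docker version), on CentOS you would edit /etc/sysconfig/docker to drop the proxy or exempt gcr.io, then restart the daemon:
# in /etc/sysconfig/docker: comment out the proxy, or add an exemption
# HTTP_PROXY=http://87.254.212.120:8080
# HTTPS_PROXY=http://87.254.212.120:8080
# NO_PROXY=gcr.io
sudo systemctl restart docker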
