When I run:
docker build -t random-letter .
I get this error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I tried running dockerd, but got some other errors:
Running iptables --wait -t nat -L -n failed with message: `iptables v1.8.4 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.`, error: exit status 3
INFO[2022-04-13T14:32:13.795289191Z] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2022-04-13T14:32:13.795587753Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2022-04-13T14:32:13.795630880Z] stopping healthcheck following graceful shutdown module=libcontainerd
WARN[2022-04-13T14:32:14.796355453Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.4 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
Here's a link to a similar question that may help you get a good answer, since I believe DinD should be avoided to reduce complexity.
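If all you need inside the container is the Docker CLI, the usual alternative to DinD is to mount the host daemon's socket, so the inner client talks to the outer daemon. A minimal sketch (the image and command here are just examples):
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:latest docker ps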
I have reduced my Dockerfile to the following:
FROM docker:latest
EXPOSE 3000
But when I run the image, the Docker daemon cannot start.
Running dockerd in the container produces a long chain of info, error, and warning messages, ending with the following:
WARN[2021-12-09T01:07:36.691842800Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: Iptables not found
Am I missing something? I can manually install iptables, but then it fails again with:
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables -t nat -N DOCKER: iptables v1.8.7 (legacy): can't initialize iptables table `nat': Permission denied (you must be root)
So I am assuming I have some setting wrong, since it seems to work out of the box here: https://hub.docker.com/_/docker
I am running Docker on Windows with the WSL 2 backend.
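For reference, the example on that page starts the daemon with elevated privileges, roughly:
docker run --privileged --name some-docker -d docker:dind
Without --privileged, dockerd inside a container cannot modify the nat table, which matches the "Permission denied (you must be root)" error above.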
I have an existing Docker swarm consisting of three machines. I am trying to add a new manager to this swarm. I run the command
docker swarm join --token SWMTKN-1-<...> 192.168.200.200:2377
After a while I get the error
Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to prospective new cluster member using its advertised address: rpc error: code = DeadlineExceeded desc = context deadline exceeded
When I view the daemon logs using tail -f /var/log/messages | grep docker, I see this:
Mar 17 17:07:48 UAT-Blockchain dockerd: time="2021-03-17T17:07:48.575024542+08:00" level=warning msg="grpc: addrConn.createTransport failed to connect to {/var/run/docker/swarm/control.sock <nil> 0 <nil>}. Err :connection error: desc= \"transport: Error while dialing dial unix /var/run/docker/swarm/control.sock: connect: no such file or directory\". Reconnecting..." module=grpc
A quick check shows that /var/run/docker/swarm/control.sock is indeed missing on this machine, but is present on the machines in the existing swarm.
What is this control.sock? How should I go about enabling/reinstating it on this machine? Is this a problem of a faulty installation?
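Before digging into control.sock, it may be worth confirming that the swarm ports are reachable from the joining node, since the first error was a connection deadline. A quick check (a diagnostic suggestion, not a confirmed fix):
nc -zv 192.168.200.200 2377   # cluster management port used by swarm join
nc -zv 192.168.200.200 7946   # node-to-node gossip (also used over UDP)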
I have Docker installed via snap on Ubuntu 18.04.2.
When I try to start Docker, it fails with the following error log:
2020-07-16T23:49:14Z docker.dockerd[932]: failed to start containerd: timeout waiting for containerd to start
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Main process exited, code=exited, status=1/FAILURE
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Failed with result 'exit-code'.
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Service hold-off time over, scheduling restart.
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Scheduled restart job, restart counter is at 68.
2020-07-16T23:49:14Z systemd[1]: Stopped Service for snap application docker.dockerd.
2020-07-16T23:49:14Z systemd[1]: Started Service for snap application docker.dockerd.
It goes into a restart loop, over and over. What should I do to get Docker working again?
In this case, Docker was waiting for containerd to start. The containerd PID file is located at
/var/snap/docker/471/run/docker/containerd/containerd.pid.
No process with this PID existed, but the file was not deleted when the server was unceremoniously shut down. Deleting this file allows the containerd process to start again, and the problem is solved. I believe similar problems exist out there where a docker.pid file also points to a non-existent PID.
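A minimal sketch of that cleanup, assuming the same snap revision (471) as above (the revision number will differ per install):
PIDFILE=/var/snap/docker/471/run/docker/containerd/containerd.pid
# remove the PID file only if no process with that PID is still alive
if [ -f "$PIDFILE" ] && ! ps -p "$(cat "$PIDFILE")" > /dev/null 2>&1; then
    sudo rm "$PIDFILE"
    sudo snap restart docker   # let the service come back up cleanly
fi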
I've also faced the `error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout` error on a fresh Docker install on Arch Linux today.
I've installed Docker and tried to start it:
sudo systemctl enable docker
sudo systemctl start docker
It doesn't start; sudo systemctl status docker says:
× docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2022-02-20 20:29:53 +03; 8s ago
TriggeredBy: × docker.socket
Docs: https://docs.docker.com
Process: 8368 ExecStart=/usr/bin/dockerd -H fd:// (code=exited, status=1/FAILURE)
Main PID: 8368 (code=exited, status=1/FAILURE)
CPU: 414ms
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: Stopped Docker Application Container Engine.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Start request repeated too quickly.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: Failed to start Docker Application Container Engine.
I managed to get more info after executing sudo dockerd:
$ sudo dockerd
INFO[2022-02-20T20:32:05.923357711+03:00] Starting up
INFO[2022-02-20T20:32:05.924015767+03:00] libcontainerd: started new containerd process pid=8618
INFO[2022-02-20T20:32:05.924036777+03:00] parsed scheme: "unix" module=grpc
INFO[2022-02-20T20:32:05.924043494+03:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2022-02-20T20:32:05.924058420+03:00] ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>} module=grpc
INFO[2022-02-20T20:32:05.924068315+03:00] ClientConn switching balancer to "pick_first" module=grpc
containerd: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by containerd)
ERRO[2022-02-20T20:32:05.924198775+03:00] containerd did not exit successfully error="exit status 1" module=libcontainerd
WARN[2022-02-20T20:32:06.925000686+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:09.397384787+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:13.645272915+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:19.417671818+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start containerd: timeout waiting for containerd to start
So it seems like containerd could not start in my case.
I tried sudo containerd and voila:
$ sudo containerd
containerd: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by containerd)
On my OS (Arch Linux), the solution was to update the package:
sudo pacman -S lib32-glibc
It may be just sudo pacman -S glibc for someone else on Arch Linux as well.
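To confirm the mismatch before updating, you can check which glibc the system currently provides (a quick diagnostic, not part of the fix itself):
ldd --version | head -n 1                        # prints the installed glibc version
strings /usr/lib/libc.so.6 | grep '^GLIBC_2.34'  # no output means the symbol version is missing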
If I use the kubectl command after a reboot, I receive an error:
The connection to the server x.x.x.x:6443 was refused - did you specify the right host or port?
If I check my containers with docker ps, kube-apiserver and kube-scheduler keep turning on and off.
Why is this happening?
root@taeil-linux:/etc/systemd/system/kubelet.service.d# cd
root@taeil-linux:~# kubectl get nodes
The connection to the server 10.0.0.152:6443 was refused - did you specify the right host or port?
root@taeil-linux:~# docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
root@taeil-linux:~# docker images
REPOSITORY                           TAG             IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-proxy                v1.15.3         232b5c793146   2 weeks ago     82.4MB
k8s.gcr.io/kube-apiserver            v1.15.3         5eb2d3fc7a44   2 weeks ago     207MB
k8s.gcr.io/kube-scheduler            v1.15.3         703f9c69a5d5   2 weeks ago     81.1MB
k8s.gcr.io/kube-controller-manager   v1.15.3         e77c31de5547   2 weeks ago     159MB
node                                 carbon          c83f74dcf58e   3 weeks ago     895MB
kubernetesui/dashboard               v2.0.0-beta1    4640949a39e6   2 months ago    64.6MB
weaveworks/weave-kube                2.5.2           f04a043bb67a   3 months ago    148MB
weaveworks/weave-npc                 2.5.2           5ce48e0d813c   3 months ago    49.6MB
kubernetesui/metrics-scraper         v1.0.0          44390ebe2b73   4 months ago    36.8MB
k8s.gcr.io/coredns                   1.3.1           eb516548c180   7 months ago    40.3MB
k8s.gcr.io/etcd                      3.3.10          2c4adeb21b4f   9 months ago    258MB
quay.io/coreos/flannel               v0.10.0-amd64   f0fad859c909   19 months ago   44.6MB
k8s.gcr.io/pause                     3.1             da86e6ba6ca1   20 months ago   742kB
root@taeil-linux:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Fri 2019-09-06 14:29:25 KST; 4min 19s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 14470 (kubelet)
Tasks: 19 (limit: 4512)
CGroup: /system.slice/kubelet.service
└─14470 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --resolv-con
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.800330 14470 pod_workers.go:190] Error syncing pod 9a745ac0a776afabd0d387fd0fcb2f54 ("kube-apiserver-taeil-linux_kube-system(9a745ac0a776afabd0d387fd0fcb2f54)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-ta
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.897945 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.916566 14470 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.0.152:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dtaeil-linux&limit=500&resourceVersion=0: dia
Sep 06 14:33:44 taeil-linux kubelet[14470]: E0906 14:33:44.998190 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.098439 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.198732 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.299052 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.399343 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.499561 14470 kubelet.go:2248] node "taeil-linux" not found
Sep 06 14:33:45 taeil-linux kubelet[14470]: E0906 14:33:45.599723 14470 kubelet.go:2248] node "taeil-linux" not found
root@taeil-linux:~# systemctl status kube-apiserver
Unit kube-apiserver.service could not be found.
If I try docker logs on the kube-apiserver container, I get:
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0906 10:54:19.636649 1 server.go:560] external host was not specified, using 10.0.0.152
I0906 10:54:19.636954 1 server.go:147] Version: v1.15.3
I0906 10:54:21.753962 1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.753988 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
E0906 10:54:21.754660 1 prometheus.go:55] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754701 1 prometheus.go:68] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754787 1 prometheus.go:82] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754842 1 prometheus.go:96] failed to register workDuration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754883 1 prometheus.go:112] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754918 1 prometheus.go:126] failed to register unfinished metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754952 1 prometheus.go:152] failed to register depth metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.754986 1 prometheus.go:164] failed to register adds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755047 1 prometheus.go:176] failed to register latency metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755104 1 prometheus.go:188] failed to register work_duration metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755152 1 prometheus.go:203] failed to register unfinished_work_seconds metric admission_quota_controller: duplicate metrics collector registration attempted
E0906 10:54:21.755188 1 prometheus.go:216] failed to register longest_running_processor_microseconds metric admission_quota_controller: duplicate metrics collector registration attempted
I0906 10:54:21.755215 1 plugins.go:158] Loaded 10 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,MutatingAdmissionWebhook.
I0906 10:54:21.755226 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I0906 10:54:21.757263 1 client.go:354] parsed scheme: ""
I0906 10:54:21.757280 1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:21.757335 1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0 <nil>}]
I0906 10:54:21.757402 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:21.757666 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
I0906 10:54:22.753069 1 client.go:354] parsed scheme: ""
I0906 10:54:22.753118 1 client.go:354] scheme "" not registered, fallback to default scheme
I0906 10:54:22.753204 1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{127.0.0.1:2379 0 <nil>}]
I0906 10:54:22.753354 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{127.0.0.1:2379 <nil>}]
W0906 10:54:22.753855 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:22.757983 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:23.754019 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:24.430000 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:25.279869 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:26.931974 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:28.198719 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:30.825660 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:32.850511 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:36.294749 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0906 10:54:38.737408 1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
F0906 10:54:41.757603 1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry {[https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt} true 0xc00063dd40 apiextensions.k8s.io/v1beta1 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:2379: connect: connection refused)
The answer is in the comment by @cewood:
Okay, that helps to understand what your installation is likely to look
like. Regarding the other master components, these are likely running
via the kubelet, and hence there won't be any systemd units for them,
only for the kubelet itself.
With a kubeadm install you don't see the services.
As root:
systemctl start docker
systemctl start kubelet
Then switch to a non-root user:
su - nonrootuser
kubectl get pods
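Once the kubelet is running again, the control-plane components come back as static pods rather than systemd services, so a quick sanity check (assuming a Docker-based setup like the one above) is:
docker ps --filter name=kube-apiserver   # the apiserver container should be up
kubectl get pods -n kube-system          # the static pods should report Running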
Long time no see.
I finally figured out how to solve this problem!
If you get an error like this for no apparent reason, you can fix it with:
docker rm $(docker ps -a -q)
Perhaps an error occurred when the existing Kubernetes containers were rebooted and the newly started containers crashed.
watch docker ps
If you watch the containers this way, you can see kube-apiserver and the others being turned off within a minute.
So I deleted all the containers appearing in docker ps -a, and the problem is fixed!
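A slightly narrower variant, if you want to keep any containers that are still running (my tweak, not part of the original fix):
docker rm $(docker ps -a -q -f status=exited)   # remove only exited containers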