Failure creating multiple clusters in kind - docker

I am trying to create multiple clusters with kind, but it only creates the first one and exits when creating the second with an error: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged k8ssandra-multinode-worker01-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
cluster-one.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: k8ssandra-multinode-control
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: /root/data
    containerPath: /files
cluster-two.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: k8ssandra-multinode-worker01
nodes:
- role: control-plane
- role: worker
- role: worker
  extraMounts:
  - hostPath: /root/data
    containerPath: /files
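For reference, both clusters are created with the standard kind command (a sketch; the file names match the configs above, and kind takes each cluster's name from the config's name field):

kind create cluster --config cluster-one.yaml
kind create cluster --config cluster-two.yaml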
error
Creating cluster "k8ssandra-multinode-worker01" ...
βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
βœ“ Preparing nodes πŸ“¦ πŸ“¦ πŸ“¦
βœ“ Writing configuration πŸ“œ
βœ“ Starting control-plane πŸ•ΉοΈ
βœ“ Installing CNI πŸ”Œ
βœ“ Installing StorageClass πŸ’Ύ
βœ— Joining worker nodes 🚜
ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged k8ssandra-multinode-worker01-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
Command Output: I1117 09:17:08.518417 135 join.go:416] [preflight] found NodeName empty; using OS hostname as NodeName
I1117 09:17:08.519112 135 joinconfiguration.go:76] loading configuration from "/kind/kubeadm.conf"
I1117 09:17:08.520143 135 controlplaneprepare.go:220] [download-certs] Skipping certs download
I1117 09:17:08.520165 135 join.go:533] [preflight] Discovering cluster-info
I1117 09:17:08.520196 135 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "k8ssandra-multinode-worker01-control-plane:6443"
I1117 09:17:08.528982 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 7 milliseconds
I1117 09:17:08.529890 135 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "abcdef", will try again
I1117 09:17:14.440113 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 2 milliseconds
I1117 09:17:14.440525 135 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "abcdef", will try again
I1117 09:17:20.857223 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 2 milliseconds
I1117 09:17:20.858092 135 token.go:105] [discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "k8ssandra-multinode-worker01-control-plane:6443"
I1117 09:17:20.858105 135 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I1117 09:17:20.858121 135 join.go:547] [preflight] Fetching init configuration
I1117 09:17:20.858126 135 join.go:593] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
I1117 09:17:20.865945 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s 200 OK in 7 milliseconds
I1117 09:17:20.868770 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy?timeout=10s 200 OK in 1 milliseconds
I1117 09:17:20.869969 135 kubelet.go:74] attempting to download the KubeletConfiguration from ConfigMap "kubelet-config"
I1117 09:17:20.871559 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 200 OK in 1 milliseconds
I1117 09:17:20.873364 135 interface.go:432] Looking for default routes with IPv4 addresses
I1117 09:17:20.873375 135 interface.go:437] Default route transits interface "eth0"
I1117 09:17:20.873507 135 interface.go:209] Interface eth0 is up
I1117 09:17:20.873584 135 interface.go:257] Interface "eth0" has 3 addresses :[172.18.0.6/16 fc00:f853:ccd:e793::6/64 fe80::42:acff:fe12:6/64].
I1117 09:17:20.873610 135 interface.go:224] Checking addr 172.18.0.6/16.
I1117 09:17:20.873622 135 interface.go:231] IP found 172.18.0.6
I1117 09:17:20.873650 135 interface.go:263] Found valid IPv4 address 172.18.0.6 for interface "eth0".
I1117 09:17:20.873660 135 interface.go:443] Found active IP 172.18.0.6
I1117 09:17:20.881864 135 kubelet.go:120] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I1117 09:17:20.882821 135 kubelet.go:135] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I1117 09:17:20.883136 135 loader.go:374] Config loaded from file: /etc/kubernetes/bootstrap-kubelet.conf
I1117 09:17:20.883425 135 kubelet.go:156] [kubelet-start] Checking for an existing Node in the cluster with name "k8ssandra-multinode-worker01-worker" and status "Ready"
I1117 09:17:20.885860 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/nodes/k8ssandra-multinode-worker01-worker?timeout=10s 404 Not Found in 2 milliseconds
I1117 09:17:20.886355 135 kubelet.go:171] [kubelet-start] Stopping the kubelet
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I1117 09:17:26.056747 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:31.059092 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:36.058998 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:41.060172 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:46.060067 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:51.055910 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:56.059946 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] Initial timeout of 40s passed.
I1117 09:18:01.059081 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:06.059225 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:11.058780 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:16.059417 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:21.055401 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:26.058925 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:31.059803 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:36.056263 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:41.059902 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:46.058631 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:51.055696 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:56.056784 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:01.056467 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:06.059134 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:11.060130 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:16.056531 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
timed out waiting for the condition
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
cmd/kubeadm/app/cmd/join.go:181
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1594
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
journalctl -u kubelet -f
Nov 17 09:36:36 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:36:36 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 17 09:36:46 k8ssandra systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 172.
Nov 17 09:36:46 k8ssandra systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 17 09:36:46 k8ssandra systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 17 09:36:46 k8ssandra kubelet[115167]: E1117 09:36:46.972947 115167 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Nov 17 09:36:46 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:36:46 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 17 09:36:57 k8ssandra systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 173.
Nov 17 09:36:57 k8ssandra systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 17 09:36:57 k8ssandra systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 17 09:36:57 k8ssandra kubelet[115206]: E1117 09:36:57.215363 115206 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Nov 17 09:36:57 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:36:57 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 17 09:37:07 k8ssandra systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 174.
Nov 17 09:37:07 k8ssandra systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 17 09:37:07 k8ssandra systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 17 09:37:07 k8ssandra kubelet[115249]: E1117 09:37:07.463490 115249 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Nov 17 09:37:07 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:37:07 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
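Note that kind nodes are containers, so kubeadm's systemctl/journalctl advice applies inside the node container, not on the host. The journal above is from the host "k8ssandra" (note the hostname), so it may be showing a stray host-level kubelet service rather than the kubelet inside the failing kind node. A sketch of pulling the kubelet log from the worker container instead (container name taken from the kind error above):

docker exec k8ssandra-multinode-worker01-worker journalctl -u kubelet --no-pager | tail -50

kind can also collect all node logs in one go:

kind export logs --name k8ssandra-multinode-worker01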

Related

Rails 6 + Capistrano - No such puma.sock file

Please help, I have been fighting a giant problem for more than 10 hours.
Whenever I deploy my Rails application with Capistrano and Puma and then restart nginx, I see this error when I try to access my site:
[screenshot of the error page]
When I access my nginx logs, I see the following error:
2020/12/29 04:09:50 [crit] 9536#9536: *73 connect() to unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock failed (2: No such file or directory) while connecting to upstream, client: [CLIENT_ID], server: , request: "GET / HTTP/1.1", upstream: "http://unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock:/", host: "[MY_HOST]"
2020/12/29 04:09:50 [crit] 9536#9536: *73 connect() to unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock failed (2: No such file or directory) while connecting to upstream, client: [CLIENT_ID], server: , request: "GET / HTTP/1.1", upstream: "http://unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock:/500.html", host: "[MY_HOST]"
Thanks in advance for your help, because I have been trying for over 10 hours to solve this problem of the missing ".sock" file and I can't.
Update 1:
Following a tutorial, I created a file puma-website.service in /etc/systemd/system.
It contains:
[Unit]
After=network.target
[Service]
# Foreground process (do not use --daemon in ExecStart or config.rb)
Type=simple
# Preferably configure a non-privileged user
User=ubuntu
Group=ubuntu
# Specify the path to your puma application root
WorkingDirectory=/home/ubuntu/my_app/current
# Helpful for debugging socket activation, etc.
Environment=PUMA_DEBUG=1
#EnvironmentFile=/var/www/my-website.com/.env
# The command to start Puma
ExecStart=/home/ubuntu/.rbenv/shims/bundle exec puma -C /home/ubuntu/my_app/current/config/puma.rb
Restart=always
[Install]
WantedBy=multi-user.target
But I get an error:
:/etc/systemd/system$ sudo systemctl status puma-website.service
● puma-website.service
Loaded: loaded (/etc/systemd/system/puma-website.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-12-29 00:52:19 UTC; 12h ago
Process: 4316 ExecStart=/home/ubuntu/.rbenv/shims/bundle exec puma -C /home/ubuntu/my_app/current/config/puma.rb (code=exited, status=1/FAILURE)
Main PID: 4316 (code=exited, status=1/FAILURE)
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Main process exited, code=exited, status=1/FAILURE
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Failed with result 'exit-code'.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Service hold-off time over, scheduling restart.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Scheduled restart job, restart counter is at 10.
Dec 29 00:52:19 MyIp systemd[1]: Stopped puma-website.service.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Start request repeated too quickly.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Failed with result 'exit-code'.
Dec 29 00:52:19 MyIp systemd[1]: Failed to start puma-website.service.
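For the nginx upstream above to find the socket, puma has to actually bind to that path. A minimal config/puma.rb sketch (the socket path is copied from the nginx log; treat everything else as an assumption to adapt):

# config/puma.rb
# Bind to the unix socket nginx proxies to; puma creates this file on boot
bind "unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock"
# Stay in the foreground to match Type=simple in the unit above (no daemonize)

If puma exits with status=1 before binding, the .sock file never appears, which matches both errors. Also note that the unit's WorkingDirectory (/home/ubuntu/my_app/current) and the nginx socket path (/home/ubuntu/apps/my_app/shared/...) disagree about the app root, which is worth checking.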

Docker fails with "failed to start containerd: timeout waiting for containerd to start"

I have docker installed on Ubuntu 18.04.2 with snap.
When I try to start docker it fails with the following error log.
2020-07-16T23:49:14Z docker.dockerd[932]: failed to start containerd: timeout waiting for containerd to start
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Main process exited, code=exited, status=1/FAILURE
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Failed with result 'exit-code'.
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Service hold-off time over, scheduling restart.
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Scheduled restart job, restart counter is at 68.
2020-07-16T23:49:14Z systemd[1]: Stopped Service for snap application docker.dockerd.
2020-07-16T23:49:14Z systemd[1]: Started Service for snap application docker.dockerd.
It goes over and over into a restart loop. What should I do to get docker working again?
In this case, docker was waiting for containerd to start. The containerd pid file is located at
/var/snap/docker/471/run/docker/containerd/containerd.pid.
The process with this pid no longer existed, but the file was not deleted when the server was unceremoniously shut down. Deleting this stale file allows the containerd process to start again, and the problem is solved. I believe similar problems exist out there where a docker.pid file also points to a non-existent pid.
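A sketch of that cleanup (the snap revision number 471 is specific to this machine and will differ elsewhere):

sudo rm /var/snap/docker/471/run/docker/containerd/containerd.pid
sudo snap restart docker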
I've also faced the error error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout on a fresh docker install on Arch Linux today.
I installed docker and tried to start it:
sudo systemctl enable docker
sudo systemctl start docker
It doesn't start; sudo systemctl status docker says:
Γ— docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2022-02-20 20:29:53 +03; 8s ago
TriggeredBy: Γ— docker.socket
Docs: https://docs.docker.com
Process: 8368 ExecStart=/usr/bin/dockerd -H fd:// (code=exited, status=1/FAILURE)
Main PID: 8368 (code=exited, status=1/FAILURE)
CPU: 414ms
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: Stopped Docker Application Container Engine.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Start request repeated too quickly.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: Failed to start Docker Application Container Engine.
I managed to get more info after executing sudo dockerd:
$ sudo dockerd
INFO[2022-02-20T20:32:05.923357711+03:00] Starting up
INFO[2022-02-20T20:32:05.924015767+03:00] libcontainerd: started new containerd process pid=8618
INFO[2022-02-20T20:32:05.924036777+03:00] parsed scheme: "unix" module=grpc
INFO[2022-02-20T20:32:05.924043494+03:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2022-02-20T20:32:05.924058420+03:00] ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>} module=grpc
INFO[2022-02-20T20:32:05.924068315+03:00] ClientConn switching balancer to "pick_first" module=grpc
containerd: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by containerd)
ERRO[2022-02-20T20:32:05.924198775+03:00] containerd did not exit successfully error="exit status 1" module=libcontainerd
WARN[2022-02-20T20:32:06.925000686+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:09.397384787+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:13.645272915+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:19.417671818+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start containerd: timeout waiting for containerd to start
So it seems like containerd could not start in my case.
I tried sudo containerd and voila:
$ sudo containerd
containerd: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by containerd)
On my OS (Arch Linux) the solution was to update the package:
sudo pacman -S lib32-glibc
It may be just sudo pacman -S glibc for someone else on Arch Linux as well.

1 out 5 fluentd is in ImagePullBackOff state

I have a k8s cluster with 1 master and 5 nodes. I am setting up EFK with this reference: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes#step-4-%E2%80%94-creating-the-fluentd-daemonset
While creating the Fluentd DaemonSet, 1 out of 5 fluentd pods is in ImagePullBackOff state:
kubectl get all -n kube-logging -o wide Tue Apr 21 03:49:26 2020
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS   IMAGES                                                                 SELECTOR
ds/fluentd   5         5         4       5            4           <none>          1d    fluentd      fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1   app=fluentd
NAME READY STATUS RESTARTS AGE IP NODE
po/fluentd-82h6k 1/1 Running 1 1d 100.96.15.56 ip-172-20-52-52.us-west-1.compute.internal
po/fluentd-8ghjq 0/1 ImagePullBackOff 0 17h 100.96.10.170 ip-172-20-58-72.us-west-1.compute.internal
po/fluentd-fdmc8 1/1 Running 1 1d 100.96.3.73 ip-172-20-63-147.us-west-1.compute.internal
po/fluentd-g7755 1/1 Running 1 1d 100.96.2.22 ip-172-20-60-101.us-west-1.compute.internal
po/fluentd-gj8q8 1/1 Running 1 1d 100.96.16.17 ip-172-20-57-232.us-west-1.compute.internal
admin@ip-172-20-58-79:~$ kubectl describe po/fluentd-8ghjq -n kube-logging
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 12m (x4364 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Back-off pulling image "fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1"
Warning FailedSync 2m (x4612 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Error syncing pod
Kubelet logs on the node which is failing to run Fluentd:
admin@ip-172-20-58-72:~$ journalctl -u kubelet -f
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095334 755 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095369 755 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: W0421 03:53:53.095440 755 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Apr 21 03:53:54 ip-172-20-58-72 kubelet[755]: I0421 03:53:54.882213 755 server.go:779] GET /metrics/cadvisor: (50.308555ms) 200 [[Prometheus/2.12.0] 172.20.58.79:54492]
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: I0421 03:53:55.452951 755 kuberuntime_manager.go:500] Container {Name:fluentd Image:fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:FLUENT_ELASTICSEARCH_HOST Value:vpc-cog-01-es-dtpgkfi.ap-southeast-1.es.amazonaws.com ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_PORT Value:443 ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_SCHEME Value:https ValueFrom:nil} {Name:FLUENTD_SYSTEMD_CONF Value:disable ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:varlog ReadOnly:false MountPath:/var/log SubPath: MountPropagation:<nil>} {Name:varlibdockercontainers ReadOnly:true MountPath:/var/lib/docker/containers SubPath: MountPropagation:<nil>} {Name:fluentd-token-k8fnp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: E0421 03:53:55.455327 755 pod_workers.go:182] Error syncing pod aa65dd30-82f2-11ea-a005-0607d7cb72ed ("fluentd-8ghjq_kube-logging(aa65dd30-82f2-11ea-a005-0607d7cb72ed)"), skipping: failed to "StartContainer" for "fluentd" with ImagePullBackOff: "Back-off pulling image \"fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1\""
Kubelet logs on the node which is running Fluentd successfully:
admin@ip-172-20-63-147:~$ journalctl -u kubelet -f
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874293 1272 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874336 1272 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: W0421 04:09:25.874453 1272 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
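A quick diagnostic sketch (not from the thread): pull the image manually on the failing node, which surfaces the real error hidden behind ImagePullBackOff, e.g. a DNS, proxy, or disk problem specific to that one machine:

admin@ip-172-20-58-72:~$ docker pull fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1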

Kubernetes installation conflicts with docker-ce-17.03.0

I can't install kubernetes on CentOS following this installation guide (link).
1: Flannel and docker services can't start after the default installation
By default the above installation installs Docker 1.12, but the flannel and docker services can't start.
● flanneld.service - Flanneld overlay address etcd agent
Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
Active: activating (start) since Mon 2017-03-20 11:24:45 EDT; 27s ago
Main PID: 31572 (flanneld)
CGroup: /system.slice/flanneld.service
└─31572 /usr/bin/flanneld -etcd-endpoints=http://127.0.0.1:2379 -etcd-prefix=/atomic.io/network
Mar 20 11:25:00 JackKubeNode1 flanneld-start[31572]: E0320 11:25:00.259468 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:01 JackKubeNode1 flanneld-start[31572]: E0320 11:25:01.265559 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:02 JackKubeNode1 flanneld-start[31572]: E0320 11:25:02.592586 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:03 JackKubeNode1 flanneld-start[31572]: E0320 11:25:03.677965 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:04 JackKubeNode1 flanneld-start[31572]: E0320 11:25:04.719815 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:05 JackKubeNode1 flanneld-start[31572]: E0320 11:25:05.820301 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:09 JackKubeNode1 flanneld-start[31572]: E0320 11:25:09.016167 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:10 JackKubeNode1 flanneld-start[31572]: E0320 11:25:10.021494 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:11 JackKubeNode1 flanneld-start[31572]: E0320 11:25:11.022784 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:12 JackKubeNode1 flanneld-start[31572]: E0320 11:25:12.238389 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:13 JackKubeNode1 flanneld-start[31572]: E0320 11:25:13.513397 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
A dependency job for docker.service failed. See 'journalctl -xe' for details.
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf
Active: inactive (dead) since Mon 2017-03-20 11:25:16 EDT; 1min 29s ago
Docs: http://docs.docker.com
Main PID: 30412 (code=exited, status=0/SUCCESS)
Mar 20 11:18:32 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:18:32.059329808-04:00" level=info msg="Daemon has completed initialization"
Mar 20 11:18:32 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:18:32.059499814-04:00" level=info msg="Docker daemon" commit="96d83a5/1.12.6" graphdriver=devicemapper version=1.12.6
Mar 20 11:18:33 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:18:33.169919003-04:00" level=info msg="API listen on /var/run/docker.sock"
Mar 20 11:18:33 JackKubeNode1 systemd[1]: Started Docker Application Container Engine.
Mar 20 11:25:15 JackKubeNode1 systemd[1]: Stopping Docker Application Container Engine...
Mar 20 11:25:15 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:25:15.912002109-04:00" level=info msg="Processing signal 'terminated'"
Mar 20 11:25:16 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:25:15.982882827-04:00" level=info msg="stopping containerd after receiving terminated"
Mar 20 11:25:16 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:25:16.352579523-04:00" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
Mar 20 11:26:42 JackKubeNode1 systemd[1]: Dependency failed for Docker Application Container Engine.
Mar 20 11:26:42 JackKubeNode1 systemd[1]: Job docker.service/start failed with result 'dependency'.
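Every flanneld error above is the same complaint: etcd at http://127.0.0.1:2379 is unreachable or holds no network config, and docker then fails only as a dependency of flannel. A sketch of checking both directly (the endpoint and the /atomic.io/network prefix are copied from the flanneld command line above; the v2 key API matches etcd of this era):

curl http://127.0.0.1:2379/health
curl http://127.0.0.1:2379/v2/keys/atomic.io/network/config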
2: The link above says this issue is fixed in Docker 1.13. So I manually installed docker first and then installed kubernetes. But docker-ce-17.03 was installed, and then there were conflicts between kubernetes and docker-ce-17.03 during kubernetes dependency resolution. How can I work around this?
Processing Conflict: docker-ce-17.03.0.ce-1.el7.centos.x86_64 conflicts docker
Processing Conflict: docker-ce-17.03.0.ce-1.el7.centos.x86_64 conflicts docker-io
Processing Conflict: docker-ce-selinux-17.03.0.ce-1.el7.centos.noarch conflicts docker-selinux
3: Docker recently renamed docker-VERSION to docker-ce-VERSION, and it looks like kubernetes doesn't accept the new docker-ce-VERSION name. I think the issue I hit can be worked around if I manually install docker 1.13. But how do I install docker 1.13? I always get docker-ce-17.03 when running "yum install docker".
Docker CE 17.03 is basically Docker 1.13, which isn't supported yet by the stable kubernetes release. See this kubernetes github issue.
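If the goal is specifically the distro's Docker 1.13 rather than docker-ce, yum can list and install an exact version (a sketch; which version strings are available depends on the repos enabled, so check the list output first):

yum --showduplicates list docker docker-ce
sudo yum install docker-1.13.1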

Docker can't start on centos7: failed to start docker application container engine

CentOS 7 via VMware Workstation Player, and
[root@localhost Desktop]# uname -r
3.10.0-229.14.1.el7.x86_64
first, yum install docker-engine
then, append other_args="--selinux-enabled" to /etc/sysconfig/docker
when I run service docker start, I get this error:
[root@localhost Desktop]# systemctl status docker.service -l
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled)
Active: activating (start) since Sun 2015-10-25 19:49:32 PDT; 46s ago
Docs: https://docs.docker.com
Main PID: 14387 (docker)
CGroup: /system.slice/docker.service
└─14387 /usr/bin/docker daemon -H fd://
Oct 25 19:49:32 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
Oct 25 19:49:32 localhost.localdomain systemd[1]: Unit docker.service entered failed state.
Oct 25 19:49:32 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.092885953-07:00" level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.093697949-07:00" level=info msg="Option DefaultDriver: bridge"
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.093729432-07:00" level=info msg="Option DefaultNetwork: bridge"
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.108983655-07:00" level=warning msg="Running modprobe bridge nf_nat br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found.\n, error: exit status 1"
Who can help me? Thanks.
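When systemctl output is this terse, running the daemon in the foreground with debug logging usually shows the real failure. A sketch for this docker-engine era, where the daemon is invoked as docker daemon (visible in the CGroup line above):

sudo systemctl stop docker
sudo /usr/bin/docker daemon -D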
