I am trying to create multiple clusters with kind, but it only creates the first one and exits while creating the second with the error: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged k8ssandra-multinode-worker01-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
cluster-one.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: k8ssandra-multinode-control
nodes:
- role: control-plane
- role: worker
  extraMounts:
  - hostPath: /root/data
    containerPath: /files
cluster-two.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: k8ssandra-multinode-worker01
nodes:
- role: control-plane
- role: worker
- role: worker
  extraMounts:
  - hostPath: /root/data
    containerPath: /files
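For reference, the two clusters are created from these files with the standard kind CLI; the exact invocation is assumed here:
kind create cluster --config cluster-one.yaml
kind create cluster --config cluster-two.yaml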
The first cluster is created successfully; the second one fails with the following error:
Creating cluster "k8ssandra-multinode-worker01" ...
 ✓ Ensuring node image (kindest/node:v1.25.3) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✗ Joining worker nodes 🚜
ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged k8ssandra-multinode-worker01-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1
Command Output: I1117 09:17:08.518417 135 join.go:416] [preflight] found NodeName empty; using OS hostname as NodeName
I1117 09:17:08.519112 135 joinconfiguration.go:76] loading configuration from "/kind/kubeadm.conf"
I1117 09:17:08.520143 135 controlplaneprepare.go:220] [download-certs] Skipping certs download
I1117 09:17:08.520165 135 join.go:533] [preflight] Discovering cluster-info
I1117 09:17:08.520196 135 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "k8ssandra-multinode-worker01-control-plane:6443"
I1117 09:17:08.528982 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 7 milliseconds
I1117 09:17:08.529890 135 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "abcdef", will try again
I1117 09:17:14.440113 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 2 milliseconds
I1117 09:17:14.440525 135 token.go:223] [discovery] The cluster-info ConfigMap does not yet contain a JWS signature for token ID "abcdef", will try again
I1117 09:17:20.857223 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s 200 OK in 2 milliseconds
I1117 09:17:20.858092 135 token.go:105] [discovery] Cluster info signature and contents are valid and no TLS pinning was specified, will use API Server "k8ssandra-multinode-worker01-control-plane:6443"
I1117 09:17:20.858105 135 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I1117 09:17:20.858121 135 join.go:547] [preflight] Fetching init configuration
I1117 09:17:20.858126 135 join.go:593] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
I1117 09:17:20.865945 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s 200 OK in 7 milliseconds
I1117 09:17:20.868770 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-system/configmaps/kube-proxy?timeout=10s 200 OK in 1 milliseconds
I1117 09:17:20.869969 135 kubelet.go:74] attempting to download the KubeletConfiguration from ConfigMap "kubelet-config"
I1117 09:17:20.871559 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config?timeout=10s 200 OK in 1 milliseconds
I1117 09:17:20.873364 135 interface.go:432] Looking for default routes with IPv4 addresses
I1117 09:17:20.873375 135 interface.go:437] Default route transits interface "eth0"
I1117 09:17:20.873507 135 interface.go:209] Interface eth0 is up
I1117 09:17:20.873584 135 interface.go:257] Interface "eth0" has 3 addresses :[172.18.0.6/16 fc00:f853:ccd:e793::6/64 fe80::42:acff:fe12:6/64].
I1117 09:17:20.873610 135 interface.go:224] Checking addr 172.18.0.6/16.
I1117 09:17:20.873622 135 interface.go:231] IP found 172.18.0.6
I1117 09:17:20.873650 135 interface.go:263] Found valid IPv4 address 172.18.0.6 for interface "eth0".
I1117 09:17:20.873660 135 interface.go:443] Found active IP 172.18.0.6
I1117 09:17:20.881864 135 kubelet.go:120] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I1117 09:17:20.882821 135 kubelet.go:135] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I1117 09:17:20.883136 135 loader.go:374] Config loaded from file: /etc/kubernetes/bootstrap-kubelet.conf
I1117 09:17:20.883425 135 kubelet.go:156] [kubelet-start] Checking for an existing Node in the cluster with name "k8ssandra-multinode-worker01-worker" and status "Ready"
I1117 09:17:20.885860 135 round_trippers.go:553] GET https://k8ssandra-multinode-worker01-control-plane:6443/api/v1/nodes/k8ssandra-multinode-worker01-worker?timeout=10s 404 Not Found in 2 milliseconds
I1117 09:17:20.886355 135 kubelet.go:171] [kubelet-start] Stopping the kubelet
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I1117 09:17:26.056747 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:31.059092 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:36.058998 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:41.060172 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:46.060067 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:51.055910 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:17:56.059946 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] Initial timeout of 40s passed.
I1117 09:18:01.059081 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:06.059225 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:11.058780 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:16.059417 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:21.055401 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:26.058925 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:31.059803 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:36.056263 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
I1117 09:18:41.059902 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:46.058631 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:51.055696 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:18:56.056784 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:01.056467 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:06.059134 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:11.060130 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
I1117 09:19:16.056531 135 loader.go:374] Config loaded from file: /etc/kubernetes/kubelet.conf
timed out waiting for the condition
error execution phase kubelet-start
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
cmd/kubeadm/app/cmd/join.go:181
github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:974
github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1594
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
journalctl -u kubelet -f
Nov 17 09:36:36 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:36:36 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 17 09:36:46 k8ssandra systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 172.
Nov 17 09:36:46 k8ssandra systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 17 09:36:46 k8ssandra systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 17 09:36:46 k8ssandra kubelet[115167]: E1117 09:36:46.972947 115167 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Nov 17 09:36:46 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:36:46 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 17 09:36:57 k8ssandra systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 173.
Nov 17 09:36:57 k8ssandra systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 17 09:36:57 k8ssandra systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 17 09:36:57 k8ssandra kubelet[115206]: E1117 09:36:57.215363 115206 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Nov 17 09:36:57 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:36:57 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
Nov 17 09:37:07 k8ssandra systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 174.
Nov 17 09:37:07 k8ssandra systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Nov 17 09:37:07 k8ssandra systemd[1]: Started kubelet: The Kubernetes Node Agent.
Nov 17 09:37:07 k8ssandra kubelet[115249]: E1117 09:37:07.463490 115249 run.go:74] "command failed" err="failed to load kubelet config file, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory, path: /var/lib/kubelet/config.yaml"
Nov 17 09:37:07 k8ssandra systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 09:37:07 k8ssandra systemd[1]: kubelet.service: Failed with result 'exit-code'.
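Since kind nodes are Docker containers, the kubelet that kubeadm is waiting for runs inside the worker node container rather than on the host, so its status and logs can also be checked with something like the following (container name taken from the failing join command above):
docker exec -it k8ssandra-multinode-worker01-worker systemctl status kubelet
docker exec -it k8ssandra-multinode-worker01-worker journalctl -xeu kubelet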
Related
Please help; I have been struggling with this problem for more than 10 hours.
Whenever I deploy my Rails application with Capistrano and Puma and then restart nginx, I see this error when I try to access my site:
[screenshot of the error page shown in the browser]
When I access my nginx logs, I see the following error:
2020/12/29 04:09:50 [crit] 9536#9536: *73 connect() to unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock failed (2: No such file or directory) while connecting to upstream, client: [CLIENT_ID], server: , request: "GET / HTTP/1.1", upstream: "http://unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock:/", host: "[MY_HOST]"
2020/12/29 04:09:50 [crit] 9536#9536: *73 connect() to unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock failed (2: No such file or directory) while connecting to upstream, client: [CLIENT_ID], server: , request: "GET / HTTP/1.1", upstream: "http://unix:///home/ubuntu/apps/my_app/shared/tmp/sockets/my_app-puma.sock:/500.html", host: "[MY_HOST]"
Thanks in advance for any help. I have been trying to solve this problem of the missing ".sock" file for over 10 hours and I can't figure it out.
Update 1:
Following a tutorial, I created the file puma-website.service in /etc/systemd/system with the following contents:
[Unit]
After=network.target
[Service]
# Foreground process (do not use --daemon in ExecStart or config.rb)
Type=simple
# Preferably configure a non-privileged user
User=ubuntu
Group=ubuntu
# Specify the path to your puma application root
WorkingDirectory=/home/ubuntu/my_app/current
# Helpful for debugging socket activation, etc.
Environment=PUMA_DEBUG=1
#EnvironmentFile=/var/www/my-website.com/.env
# The command to start Puma
ExecStart=/home/ubuntu/.rbenv/shims/bundle exec puma -C /home/ubuntu/my_app/current/config/puma.rb
Restart=always
[Install]
WantedBy=multi-user.target
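Then I reload systemd and enable and start the unit, roughly as follows (the exact commands are assumed):
sudo systemctl daemon-reload
sudo systemctl enable puma-website.service
sudo systemctl start puma-website.service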
But I get an error:
:/etc/systemd/system$ sudo systemctl status puma-website.service
● puma-website.service
Loaded: loaded (/etc/systemd/system/puma-website.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-12-29 00:52:19 UTC; 12h ago
Process: 4316 ExecStart=/home/ubuntu/.rbenv/shims/bundle exec puma -C /home/ubuntu/my_app/current/config/puma.rb (code=exited, status=1/FAILURE)
Main PID: 4316 (code=exited, status=1/FAILURE)
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Main process exited, code=exited, status=1/FAILURE
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Failed with result 'exit-code'.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Service hold-off time over, scheduling restart.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Scheduled restart job, restart counter is at 10.
Dec 29 00:52:19 MyIp systemd[1]: Stopped puma-website.service.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Start request repeated too quickly.
Dec 29 00:52:19 MyIp systemd[1]: puma-website.service: Failed with result 'exit-code'.
Dec 29 00:52:19 MyIp systemd[1]: Failed to start puma-website.service.
I have docker installed on Ubuntu 18.04.2 with snap.
When I try to start docker it fails with the following error log.
2020-07-16T23:49:14Z docker.dockerd[932]: failed to start containerd: timeout waiting for containerd to start
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Main process exited, code=exited, status=1/FAILURE
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Failed with result 'exit-code'.
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Service hold-off time over, scheduling restart.
2020-07-16T23:49:14Z systemd[1]: snap.docker.dockerd.service: Scheduled restart job, restart counter is at 68.
2020-07-16T23:49:14Z systemd[1]: Stopped Service for snap application docker.dockerd.
2020-07-16T23:49:14Z systemd[1]: Started Service for snap application docker.dockerd.
It goes over and over into a restart loop. What should I do to get docker working again?
In this case, Docker was waiting for containerd to start. The containerd PID file is located at
/var/snap/docker/471/run/docker/containerd/containerd.pid.
The process with that PID no longer existed, but the file was not deleted when the server was shut down unceremoniously. Deleting this stale file allows the containerd process to start again, and the problem is solved. I believe similar problems exist out there where a docker.pid file also points to a non-existent PID.
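A minimal sketch of that fix (the snap revision 471 comes from the path above and should be adjusted to whatever revision is active on your system; the unit name is taken from the log):
PIDFILE=/var/snap/docker/471/run/docker/containerd/containerd.pid
# remove the PID file only if the recorded process is no longer running
if [ -f "$PIDFILE" ] && ! ps -p "$(cat "$PIDFILE")" > /dev/null 2>&1; then
  sudo rm "$PIDFILE"
fi
sudo systemctl restart snap.docker.dockerd.service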
I've also faced the error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout on a fresh Docker install on Arch Linux today.
I installed Docker and tried to start it:
sudo systemctl enable docker
sudo systemctl start docker
It doesn't start; sudo systemctl status docker says:
× docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2022-02-20 20:29:53 +03; 8s ago
TriggeredBy: × docker.socket
Docs: https://docs.docker.com
Process: 8368 ExecStart=/usr/bin/dockerd -H fd:// (code=exited, status=1/FAILURE)
Main PID: 8368 (code=exited, status=1/FAILURE)
CPU: 414ms
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: Stopped Docker Application Container Engine.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Start request repeated too quickly.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: docker.service: Failed with result 'exit-code'.
Feb 20 20:29:53 V-LINUX-087 systemd[1]: Failed to start Docker Application Container Engine.
I managed to get more info after executing sudo dockerd:
$ sudo dockerd
INFO[2022-02-20T20:32:05.923357711+03:00] Starting up
INFO[2022-02-20T20:32:05.924015767+03:00] libcontainerd: started new containerd process pid=8618
INFO[2022-02-20T20:32:05.924036777+03:00] parsed scheme: "unix" module=grpc
INFO[2022-02-20T20:32:05.924043494+03:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2022-02-20T20:32:05.924058420+03:00] ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}] <nil> <nil>} module=grpc
INFO[2022-02-20T20:32:05.924068315+03:00] ClientConn switching balancer to "pick_first" module=grpc
containerd: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by containerd)
ERRO[2022-02-20T20:32:05.924198775+03:00] containerd did not exit successfully error="exit status 1" module=libcontainerd
WARN[2022-02-20T20:32:06.925000686+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:09.397384787+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:13.645272915+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
WARN[2022-02-20T20:32:19.417671818+03:00] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: error while dialing: dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
failed to start containerd: timeout waiting for containerd to start
So it seems like containerd could not start in my case.
I tried sudo containerd and voila:
$ sudo containerd
containerd: /usr/lib/libc.so.6: version `GLIBC_2.34' not found (required by containerd)
On my OS (Arch Linux) the solution was to update the package:
sudo pacman -S lib32-glibc
It may just be sudo pacman -S glibc for someone else on Arch Linux as well.
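As a quick sanity check after the update (these commands are assumed, not from the original answer), containerd should now start and Docker should come up:
sudo containerd --version   # should no longer complain about GLIBC_2.34
sudo systemctl restart docker
sudo systemctl status docker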
I have a k8s cluster with 1 master and 5 nodes. I am setting up EFK with this reference: https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes#step-4-%E2%80%94-creating-the-fluentd-daemonset
While creating the Fluentd DaemonSet, 1 out of 5 fluentd pods is in ImagePullBackOff state:
kubectl get all -n kube-logging -o wide Tue Apr 21 03:49:26 2020
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 app=fluentd
ds/fluentd 5 5 4 5 4 <none> 1d fluentd fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 app=fluentd
NAME READY STATUS RESTARTS AGE IP NODE
po/fluentd-82h6k 1/1 Running 1 1d 100.96.15.56 ip-172-20-52-52.us-west-1.compute.internal
po/fluentd-8ghjq 0/1 ImagePullBackOff 0 17h 100.96.10.170 ip-172-20-58-72.us-west-1.compute.internal
po/fluentd-fdmc8 1/1 Running 1 1d 100.96.3.73 ip-172-20-63-147.us-west-1.compute.internal
po/fluentd-g7755 1/1 Running 1 1d 100.96.2.22 ip-172-20-60-101.us-west-1.compute.internal
po/fluentd-gj8q8 1/1 Running 1 1d 100.96.16.17 ip-172-20-57-232.us-west-1.compute.internal
admin@ip-172-20-58-79:~$ kubectl describe po/fluentd-8ghjq -n kube-logging
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 12m (x4364 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Back-off pulling image "fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1"
Warning FailedSync 2m (x4612 over 17h) kubelet, ip-172-20-58-72.us-west-1.compute.internal Error syncing pod
Kubelet logs on the node which is failing to run Fluentd
admin@ip-172-20-58-72:~$ journalctl -u kubelet -f
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095334 755 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: E0421 03:53:53.095369 755 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 03:53:53 ip-172-20-58-72 kubelet[755]: W0421 03:53:53.095440 755 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
Apr 21 03:53:54 ip-172-20-58-72 kubelet[755]: I0421 03:53:54.882213 755 server.go:779] GET /metrics/cadvisor: (50.308555ms) 200 [[Prometheus/2.12.0] 172.20.58.79:54492]
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: I0421 03:53:55.452951 755 kuberuntime_manager.go:500] Container {Name:fluentd Image:fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:FLUENT_ELASTICSEARCH_HOST Value:vpc-cog-01-es-dtpgkfi.ap-southeast-1.es.amazonaws.com ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_PORT Value:443 ValueFrom:nil} {Name:FLUENT_ELASTICSEARCH_SCHEME Value:https ValueFrom:nil} {Name:FLUENTD_SYSTEMD_CONF Value:disable ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:varlog ReadOnly:false MountPath:/var/log SubPath: MountPropagation:<nil>} {Name:varlibdockercontainers ReadOnly:true MountPath:/var/lib/docker/containers SubPath: MountPropagation:<nil>} {Name:fluentd-token-k8fnp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 21 03:53:55 ip-172-20-58-72 kubelet[755]: E0421 03:53:55.455327 755 pod_workers.go:182] Error syncing pod aa65dd30-82f2-11ea-a005-0607d7cb72ed ("fluentd-8ghjq_kube-logging(aa65dd30-82f2-11ea-a005-0607d7cb72ed)"), skipping: failed to "StartContainer" for "fluentd" with ImagePullBackOff: "Back-off pulling image \"fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1\""
Kubelet logs on the node which is running Fluentd successfully
admin@ip-172-20-63-147:~$ journalctl -u kubelet -f
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874293 1272 summary.go:92] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: E0421 04:09:25.874336 1272 summary.go:92] Failed to get system container stats for "/system.slice/docker.service": failed to get cgroup stats for "/system.slice/docker.service": failed to get container info for "/system.slice/docker.service": unknown container "/system.slice/docker.service"
Apr 21 04:09:25 ip-172-20-63-147 kubelet[1272]: W0421 04:09:25.874453 1272 helpers.go:847] eviction manager: no observation found for eviction signal allocatableNodeFs.available
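To narrow down why the pull fails on just that one node, the image could presumably be pulled by hand there (node address and image name taken from the output above):
ssh admin@ip-172-20-58-72
docker pull fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1   # on the node showing ImagePullBackOff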
I can't install Kubernetes on CentOS following this installation guide (link).
1: The flannel and docker services can't start after the default installation
By default the above installation installs Docker 1.12, but the flannel and docker services can't start.
● flanneld.service - Flanneld overlay address etcd agent
Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
Active: activating (start) since Mon 2017-03-20 11:24:45 EDT; 27s ago
Main PID: 31572 (flanneld)
CGroup: /system.slice/flanneld.service
└─31572 /usr/bin/flanneld -etcd-endpoints=http://127.0.0.1:2379 -etcd-prefix=/atomic.io/network
Mar 20 11:25:00 JackKubeNode1 flanneld-start[31572]: E0320 11:25:00.259468 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:01 JackKubeNode1 flanneld-start[31572]: E0320 11:25:01.265559 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:02 JackKubeNode1 flanneld-start[31572]: E0320 11:25:02.592586 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:03 JackKubeNode1 flanneld-start[31572]: E0320 11:25:03.677965 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:04 JackKubeNode1 flanneld-start[31572]: E0320 11:25:04.719815 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:05 JackKubeNode1 flanneld-start[31572]: E0320 11:25:05.820301 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:09 JackKubeNode1 flanneld-start[31572]: E0320 11:25:09.016167 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:10 JackKubeNode1 flanneld-start[31572]: E0320 11:25:10.021494 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:11 JackKubeNode1 flanneld-start[31572]: E0320 11:25:11.022784 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:12 JackKubeNode1 flanneld-start[31572]: E0320 11:25:12.238389 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Mar 20 11:25:13 JackKubeNode1 flanneld-start[31572]: E0320 11:25:13.513397 31572 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
A dependency job for docker.service failed. See 'journalctl -xe' for details.
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf
Active: inactive (dead) since Mon 2017-03-20 11:25:16 EDT; 1min 29s ago
Docs: http://docs.docker.com
Main PID: 30412 (code=exited, status=0/SUCCESS)
Mar 20 11:18:32 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:18:32.059329808-04:00" level=info msg="Daemon has completed initialization"
Mar 20 11:18:32 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:18:32.059499814-04:00" level=info msg="Docker daemon" commit="96d83a5/1.12.6" graphdriver=devicemapper version=1.12.6
Mar 20 11:18:33 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:18:33.169919003-04:00" level=info msg="API listen on /var/run/docker.sock"
Mar 20 11:18:33 JackKubeNode1 systemd[1]: Started Docker Application Container Engine.
Mar 20 11:25:15 JackKubeNode1 systemd[1]: Stopping Docker Application Container Engine...
Mar 20 11:25:15 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:25:15.912002109-04:00" level=info msg="Processing signal 'terminated'"
Mar 20 11:25:16 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:25:15.982882827-04:00" level=info msg="stopping containerd after receiving terminated"
Mar 20 11:25:16 JackKubeNode1 dockerd-current[30412]: time="2017-03-20T11:25:16.352579523-04:00" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
Mar 20 11:26:42 JackKubeNode1 systemd[1]: Dependency failed for Docker Application Container Engine.
Mar 20 11:26:42 JackKubeNode1 systemd[1]: Job docker.service/start failed with result 'dependency'.
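The flanneld unit above points at the etcd endpoint http://127.0.0.1:2379 with the prefix /atomic.io/network, so one way to check whether etcd is reachable and actually holds the network config would be, for example:
curl http://127.0.0.1:2379/health
curl http://127.0.0.1:2379/v2/keys/atomic.io/network/config   # v2 keys API; path matches the -etcd-prefix above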
2: The link above says the issue is fixed in Docker 1.13, so I manually installed Docker first and then installed Kubernetes. But docker-ce-17.03 was installed, and there were conflicts between kubernetes and docker-ce-17.03 during Kubernetes dependency resolution. How can I work around this?
Processing Conflict: docker-ce-17.03.0.ce-1.el7.centos.x86_64 conflicts docker
Processing Conflict: docker-ce-17.03.0.ce-1.el7.centos.x86_64 conflicts docker
Processing Conflict: docker-ce-17.03.0.ce-1.el7.centos.x86_64 conflicts docker-io
Processing Conflict: docker-ce-17.03.0.ce-1.el7.centos.x86_64 conflicts docker-io
Processing Conflict: docker-ce-selinux-17.03.0.ce-1.el7.centos.noarch conflicts docker-selinux
Processing Conflict: docker-ce-selinux-17.03.0.ce-1.el7.centos.noarch conflicts docker-selinux
3: Docker recently renamed docker-VERSION to docker-ce-VERSION, and it looks like Kubernetes doesn't accept the new docker-ce-VERSION name. I think the issue I hit can be worked around if I manually install docker 1.13, but how do I install docker 1.13? I always get docker-ce-17.03 when running "yum install docker".
Docker CE 17.03 is basically Docker 1.13, which isn't supported yet by the stable Kubernetes release. See this kubernetes GitHub issue.
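For what it's worth, yum can usually list and pin a specific package version; a sketch (the exact version strings available in your repos may differ):
yum --showduplicates list docker docker-ce
sudo yum install docker-ce-17.03.0.ce-1.el7.centos   # or an older 1.12/1.13 build offered by your repos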
I am running CentOS 7 via VMware Workstation Player, and
[root@localhost Desktop]# uname -r
3.10.0-229.14.1.el7.x86_64
First, I ran yum install docker-engine.
Then I added other_args="--selinux-enabled" to /etc/sysconfig/docker.
When I run service docker start, I get this error:
[root@localhost Desktop]# systemctl status docker.service -l
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled)
Active: activating (start) since Sun 2015-10-25 19:49:32 PDT; 46s ago
Docs: https://docs.docker.com
Main PID: 14387 (docker)
CGroup: /system.slice/docker.service
└─14387 /usr/bin/docker daemon -H fd://
Oct 25 19:49:32 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
Oct 25 19:49:32 localhost.localdomain systemd[1]: Unit docker.service entered failed state.
Oct 25 19:49:32 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.092885953-07:00" level=info msg="[graphdriver] using prior storage driver \"devicemapper\""
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.093697949-07:00" level=info msg="Option DefaultDriver: bridge"
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.093729432-07:00" level=info msg="Option DefaultNetwork: bridge"
Oct 25 19:49:33 localhost.localdomain docker[14387]: time="2015-10-25T19:49:33.108983655-07:00" level=warning msg="Running modprobe bridge nf_nat br_netfilter failed with message: modprobe: WARNING: Module br_netfilter not found.\n, error: exit status 1"
Who can help me? Thanks.