Flannel failed to start when installing Kubernetes - Docker

I'm installing Kubernetes, but Docker and kube-apiserver both fail to start. Looking through the journal, I found that the cause is flannel. So I tried to start flannel, but it also failed. Here is the output:
Job for flanneld.service failed because a timeout was exceeded. See "systemctl status flanneld.service" and "journalctl -xe" for details.
Then I typed systemctl status flanneld.service, and here is the result:
● flanneld.service - Flanneld overlay address etcd agent
Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
Active: activating (start) since Fri 2018-01-26 03:28:08 EST; 51s ago
Main PID: 5870 (flanneld)
Memory: 6.5M
CGroup: /system.slice/flanneld.service
└─5870 /usr/bin/flanneld -etcd-endpoints=http://127.0.0.1:2379 -etcd-prefix=/atomic.io/network
Jan 26 03:28:50 node-1 flanneld-start[5870]: E0126 03:28:50.755931 5870 network.go:102] failed to retrieve network config: 100: Key not found (/atomic.io) [23858]
Can anyone help me with this?

Curl the endpoint you have defined. You may need to point flannel to your master node.
/etc/sysconfig/flanneld
FLANNEL_ETCD_ENDPOINTS="http://<master ip>:2379"
FLANNEL_ETCD_PREFIX="/kube-centos/network"
Try that and then restart flanneld.
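The "Key not found (/atomic.io)" error also means flanneld could not find a network config under its etcd prefix. A minimal sketch of creating one, assuming the /kube-centos/network prefix from above, the etcd v2 API that flannel 0.x reads, and a hypothetical 172.30.0.0/16 pod network:
# on the master node (etcd v2 API; CIDR is a placeholder)
etcdctl --endpoints http://<master ip>:2379 \
  set /kube-centos/network/config '{"Network":"172.30.0.0/16","Backend":{"Type":"vxlan"}}'
# then on each node
sudo systemctl restart flanneld
Adjust the CIDR and backend to whatever your cluster actually uses.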

Related

Docker error after moving storage to a second disk and using overlay2

I just moved the Docker default storage location to a second disk by setting up /etc/docker/daemon.json as described in the documentation; so far so good.
The problem is that now I keep getting a bunch of volumes being continuously (re)mounted, which is obviously really annoying.
So I tried to set up overlay2 in /etc/docker/daemon.json, but now Docker doesn't even start:
# sudo systemctl restart docker
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
# systemctl status docker.service
× docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2022-12-15 11:06:36 CET; 10s ago
TriggeredBy: × docker.socket
Docs: https://docs.docker.com
Process: 17614 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status>
Main PID: 17614 (code=exited, status=1/FAILURE)
CPU: 54ms
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: Stopped Docker Application Container Engine.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: docker.service: Start request repeated too quickly.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: docker.service: Failed with result 'exit-code'.
dic 15 11:06:36 sgratani-OptiPlex-7060 systemd[1]: Failed to start Docker Application Container Engine.
So, for now I give up on using overlay2, since having all the Docker images and containers on the second disk is more important than getting rid of a bunch of volumes being mounted continuously. But can anyone tell me where the problem is and whether there is a solution?
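For reference, the combination being attempted is just two keys in /etc/docker/daemon.json; this is a sketch with a hypothetical mount point for the second disk, not the poster's actual path:
{
  "data-root": "/mnt/second-disk/docker",
  "storage-driver": "overlay2"
}
If dockerd still refuses to start with this, journalctl -xeu docker.service usually shows the concrete reason, for example overlay2 rejecting the backing filesystem (xfs created without ftype=1 is a common one).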
Update #1: strange permissions behaviour
I have a simple docker-compose.yml with a WordPress service (the official WP image) and a database service, and when the Docker storage location is on the second disk instead of the default one, the database (volume, maybe?) seems inaccessible:
WordPress keeps giving errors on the DB connection
trying to run mysql interactively from the db service results in a login error for the root user
Obviously this is related to the Docker storage location, but I cannot figure out why, since the new location is created by Docker itself when it starts.
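One way to narrow down the Update #1 problem, sketched with a hypothetical volume name, is to find where the database volume now lives under the new data-root and check its ownership:
docker volume ls
docker volume inspect <db-volume> --format '{{ .Mountpoint }}'
sudo ls -ldn $(docker volume inspect <db-volume> --format '{{ .Mountpoint }}')
Official database images expect to own their data directory as their internal user, so an ownership mismatch or an unexpected mount option on the second disk would fit the symptoms described above.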

Can nerdctl/crictl be used to list containers started by Docker?

I'm using Docker version 20.10.21. As I understand it, this version of Docker uses containerd to manage the image and container lifecycle, so why can't I use crictl/nerdctl to list the containers I started with the Docker CLI?
What I've tried:
Check whether Docker uses containerd to manage containers; this is the result of systemctl status docker:
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─http-proxy.conf
Active: active (running) since Sun 2022-12-04 22:44:27 CST; 1min 18s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1821 (dockerd)
Tasks: 91 (limit: 38297)
Memory: 229.6M
CPU: 1.214s
CGroup: /system.slice/docker.service
├─1821 /usr/bin/dockerd -H fd://
├─1845 containerd --config /var/run/docker/containerd/containerd.toml --log-level info
I guess this means containerd is started by the Docker daemon, and the Unix socket is located at /var/run/docker/containerd/containerd.sock.
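A quick way to confirm that, sketched here with ctr pointed at Docker's own containerd socket (this only lists namespaces and changes nothing):
sudo ctr --address /var/run/docker/containerd/containerd.sock namespaces list
The assumption here is that Docker keeps its containers in a namespace called moby rather than in nerdctl's default namespace, which matters for the next attempt below.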
I tried nerdctl to list the containers, but got an error message:
$ nerdctl --address unix:///var/run/docker/containerd/containerd.sock ps
FATA[0000] rootless containerd not running? (hint: use `containerd-rootless-setuptool.sh install` to start rootless containerd): stat /run/user/1000/containerd-rootless: no such file or directory
Then I tried again with sudo:
sudo nerdctl --address unix:///var/run/docker/containerd/containerd.sock ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
As you can see, there's no container listed, but docker ps shows many containers I started.
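Given that, one thing to try (a sketch, assuming the moby namespace shows up in the listing above) is to point nerdctl at that namespace explicitly:
sudo nerdctl --address unix:///var/run/docker/containerd/containerd.sock --namespace moby ps -a
Even then, nerdctl may show only limited information for containers it did not create itself, since Docker stores most container metadata outside containerd.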
I tried crictl to check the result, but got errors:
sudo crictl --r unix:///var/run/docker/containerd/containerd.sock ps
E1204 22:47:27.190569 3925 remote_runtime.go:557] "ListContainers with filter from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService" filter="&ContainerFilter{Id:,State:&ContainerStateValue{State:CONTAINER_RUNNING,},PodSandboxId:,LabelSelector:map[string]string{},}"
FATA[0000] listing containers: rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService
So my question is: why can't I get the same results as the Docker CLI with nerdctl/crictl? Is there anything I've done wrong, or anything wrong in my understanding?
Thanks for any tips.

Cannot initialize Kubernetes cluster on Ubuntu 18.04 (VirtualBox)

I'm struggling to initialize a simple Kubernetes cluster using Ubuntu on VirtualBox. I tried both the server and desktop versions, following the official documentation:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
I also tried to follow some other guides, thinking the issue was due to my using VirtualBox VMs, like this one:
https://medium.com/@gunjangarge/create-kubernetes-cluster-using-kubeadm-on-ubuntu-virtualbox-step-by-step-68a3eeb1f74c
But every time I have the same issue with port 6443 not being exposed. Sometimes the process completes correctly, giving me the join command:
kubeadm init --pod-network-cidr=192.168.0.0/16
W1029 08:47:53.841460 11540 configset.go:348] WARNING: kubeadm cannot validate component configs
for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.192:6443 --token ztnoww.t8ng5a3jo2kx5cb2 \
--discovery-token-ca-cert-hash sha256:907dde6cc6d72ed4cd7fe7e7f252e2cf657dd3256fba6ee5ec92027132a9c5af
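At that point the remaining step from the output above is the pod network add-on. With --pod-network-cidr=192.168.0.0/16 a common choice is Calico; the manifest URL below is an assumption based on what was current for Kubernetes v1.19, so check it against the add-ons page linked in the output:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Until an add-on is applied, the "no networks found in /etc/cni/net.d" and "NetworkPluginNotReady" messages shown further down are expected and are not the cause of port 6443 being unreachable.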
Sometimes it doesn't start at all and times out:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Anyway, even when it does start, port 6443 is never exposed, and kubelet is not happy about it:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2020-10-29 08:48:15 CET; 20s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 13262 (kubelet)
Tasks: 14 (limit: 4666)
CGroup: /system.slice/kubelet.service
└─13262 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --pod-infra-contai
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.588386 13262 controller.go:136] failed to ensure node lease exists, will retry in 800ms, error: Get
"https://192.168.1.192:6443/apis/coordination.k8s.io/v1/names
Okt 29 08:48:22 master kubelet[13262]: E1029 08:48:22.785951 13262 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.192:644
Okt 29 08:48:23 master kubelet[13262]: I1029 08:48:23.022354 13262 kubelet_node_status.go:70] Attempting to register node master
Okt 29 08:48:24 master kubelet[13262]: I1029 08:48:24.188510 13262 request.go:645] Throttling request took 1.097264312s, request: POST:https://192.168.1.192:6443/api/v1/namespaces/kube-system/pods
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.678880 13262 kubelet_node_status.go:108] Node master was previously registered
Okt 29 08:48:25 master kubelet[13262]: I1029 08:48:25.679004 13262 kubelet_node_status.go:73] Successfully registered node master
Okt 29 08:48:25 master kubelet[13262]: W1029 08:48:25.765981 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:27 master kubelet[13262]: E1029 08:48:27.148246 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
Okt 29 08:48:30 master kubelet[13262]: W1029 08:48:30.767511 13262 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
Okt 29 08:48:32 master kubelet[13262]: E1029 08:48:32.164211 13262 kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: c
I have to say I don't know what to do now. I tried for hours with different Ubuntu versions, trying to find solutions on the Internet, but I didn't find any. I also went through the logs and found that maybe the config file is not created correctly for some reason:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml
but I found nothing about it, except "try to init the cluster again", which I did several times.
Thank you in advance for your help!
OK, I think I finally found the problem. I tried the same process on another PC and everything worked smoothly, so for anyone having a similar issue: just don't try to use VirtualBox and WSL at the same time (even if WSL is shut off).
I just did what's explained here: https://stackoverflow.com/a/63229718/2428805 and now everything's fine.
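For anyone who cannot simply switch machines, the usual conflict between VirtualBox and WSL is the Windows hypervisor holding onto VT-x. A rough sketch of turning it off, run from an elevated Windows prompt (the exact feature names are an assumption here, follow the linked answer for your setup):
dism.exe /online /disable-feature /featurename:Microsoft-Hyper-V-All
bcdedit /set hypervisorlaunchtype off
Reboot Windows afterwards, then retry kubeadm init inside the VirtualBox VM.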

Cannot start memcached

I cannot get memcached to run on my server.
This is what I tried so far:
% sudo systemctl start memcached # no output
% sudo systemctl status memcached.service
● memcached.service - memcached daemon
Loaded: loaded (/lib/systemd/system/memcached.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2020-02-16 17:45:09 CET; 4s ago
Process: 22725 ExecStart=/usr/share/memcached/scripts/systemd-memcached-wrapper /etc/memcached.conf (code=exited, status=71)
Main PID: 22725 (code=exited, status=71)
systemd[1]: Started memcached daemon.
systemd-memcached-wrapper[22725]: bind(): Cannot assign requested address
systemd-memcached-wrapper[22725]: failed to listen on TCP port 11211: Cannot assign requested address
systemd[1]: memcached.service: Main process exited, code=exited, status=71/n/a
systemd[1]: memcached.service: Unit entered failed state.
systemd[1]: memcached.service: Failed with result 'exit-code'.
I am running Ubuntu 16.04.6 LTS
How can I start my memcached service?
Have a look at /etc/memcached.conf; there might be a line like
-l xxx.xx.xx.xx
If you are trying to connect via localhost, just comment out that line.
If you are trying to connect from somewhere else, check that the IP is correct.
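As a concrete sketch (the address is hypothetical), the relevant part of /etc/memcached.conf would go from
-l 192.168.0.42
to
# -l 192.168.0.42
-l 127.0.0.1
followed by a restart:
sudo systemctl restart memcached
The bind()/"Cannot assign requested address" errors above happen precisely when the -l address is not configured on any interface of the host.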

When using Mesos, Marathon, and ZooKeeper, my mesos-slave doesn't start when I specify the "containerizers" file with "docker,mesos"?

I have 3 CentOS VMs. I have installed ZooKeeper, Marathon, and Mesos on the master node, while only putting Mesos on the other 2 VMs. The master node has no mesos-slave running on it. I am trying to run Docker containers, so I specified "docker,mesos" in the containerizers file. One of the mesos-agents starts fine with this configuration, and I have been able to deploy a container to that slave. However, the second mesos-agent simply fails when I have this configuration (it works if I take out the containerizers file, but then it doesn't run containers). Here are some of the logs and information that has come up:
Here are some "messages" in the log directory:
Apr 26 16:09:12 centos-minion-3 systemd: Started Mesos Slave.
Apr 26 16:09:12 centos-minion-3 systemd: Starting Mesos Slave...
WARNING: Logging before InitGoogleLogging() is written to STDERR
[main.cpp:243] Build: 2017-04-12 16:39:09 by centos
[main.cpp:244] Version: 1.2.0
[main.cpp:247] Git tag: 1.2.0
[main.cpp:251] Git SHA: de306b5786de3c221bae1457c6f2ccaeb38eef9f
[logging.cpp:194] INFO level logging started!
[systemd.cpp:238] systemd version `219` detected
[main.cpp:342] Inializing systemd state
[systemd.cpp:326] Started systemd slice `mesos_executors.slice`
[containerizer.cpp:220] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
[linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
[provisioner.cpp:249] Using default backend 'copy'
[slave.cpp:211] Mesos agent started on (1)@172.22.150.87:5051
[slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="linux" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
[slave.cpp:541] Agent resources: cpus(*):1; mem(*):919; disk(*):2043; ports(*):[31000-32000]
[slave.cpp:549] Agent attributes: [ ]
[slave.cpp:554] Agent hostname: node3
[status_update_manager.cpp:177] Pausing sending status updates
[state.cpp:62] Recovering state from '/var/lib/mesos/meta'
[state.cpp:706] No committed checkpointed resources found at '/var/lib/mesos/meta/resources/resources.info'
[status_update_manager.cpp:203] Recovering status update manager
[docker.cpp:868] Recovering Docker containers
[containerizer.cpp:599] Recovering containerizer
[provisioner.cpp:410] Provisioner recovery complete
[group.cpp:340] Group process (zookeeper-group(1)@172.22.150.87:5051) connected to ZooKeeper
[group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
[group.cpp:418] Trying to create path '/mesos' in ZooKeeper
[detector.cpp:152] Detected a new leader: (id='15')
[group.cpp:699] Trying to get '/mesos/json.info_0000000015' in ZooKeeper
[zookeeper.cpp:259] A new leading master (UPID=master@172.22.150.88:5050) is detected
Failed to perform recovery: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1; stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Apr 26 16:09:13 centos-minion-3 systemd: Unit mesos-slave.service entered failed state.
Apr 26 16:09:13 centos-minion-3 systemd: mesos-slave.service failed.
Logs from docker:
$ sudo systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/docker.service.d
└─flannel.conf
Active: inactive (dead) since Tue 2017-04-25 18:00:03 CDT; 24h ago
Docs: docs.docker.com
Main PID: 872 (code=exited, status=0/SUCCESS)
Apr 26 18:25:25 centos-minion-3 systemd[1]: Dependency failed for Docker Application Container Engine.
Apr 26 18:25:25 centos-minion-3 systemd[1]: Job docker.service/start failed with result 'dependency'
Logs from flannel:
[flanneld-start: network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
You have the answer in your logs:
Failed to perform recovery: Collect failed:
Failed to run 'docker -H unix:///var/run/docker.sock ps -a': exited with status 1;
stderr='Cannot connect to the Docker daemon. Is the docker daemon running on this host?'
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/meta/slaves/latest
This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
Mesos keeps its state/metadata on local disk. When it is restarted, it tries to load this state. If the configuration changed and is not compatible with the previous state, it won't start.
Just bring Docker back to life by fixing the problems with flannel and etcd, and everything will be fine.
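Put together, the recovery sequence that follows from the logs above would look roughly like this on the failing agent (a sketch; service names may differ on your boxes, and the metadata path is the one from the log):
# get the container runtime healthy first
sudo systemctl start etcd flanneld
sudo systemctl start docker
# then clear the stale agent metadata, as the log itself suggests
sudo rm -f /var/lib/mesos/meta/slaves/latest
sudo systemctl restart mesos-slave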
Alternatively, add the following flag when starting the agent:
--reconfiguration_policy=additive
more details here: http://mesos.apache.org/documentation/latest/agent-recovery/
