kube-addons.service failed on CoreOS-libvirt installation - docker

I have the following issue installing and provisioning my Kubernetes CoreOS-libvirt-based cluster.
When I log in to the master node, I see the following:
ssh core@192.168.10.1
Last login: Thu Dec 10 17:19:21 2015 from 192.168.10.254
CoreOS alpha (884.0.0)
Update Strategy: No Reboots
Failed Units: 1
kube-addons.service
Trying to debug it, I ran the following:
core@kubernetes-master ~ $ systemctl status kube-addons.service
● kube-addons.service - Kubernetes addons
Loaded: loaded (/etc/systemd/system/kube-addons.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2015-12-10 16:41:06 UTC; 41min ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 801 ExecStart=/opt/kubernetes/bin/kubectl create -f /opt/kubernetes/addons (code=exited, status=1/FAILURE)
Process: 797 ExecStartPre=/bin/sleep 10 (code=exited, status=0/SUCCESS)
Process: 748 ExecStartPre=/bin/bash -c while [[ "$(curl -s http://127.0.0.1:8080/healthz)" != "ok" ]]; do sleep 1; done (code=exited, status=0/SUCCESS)
Main PID: 801 (code=exited, status=1/FAILURE)
Dec 10 16:40:53 kubernetes-master systemd[1]: Starting Kubernetes addons...
Dec 10 16:41:06 kubernetes-master kubectl[801]: replicationcontroller "skydns" created
Dec 10 16:41:06 kubernetes-master kubectl[801]: error validating "/opt/kubernetes/addons/skydns-svc.yaml": error validating data: found invalid field portalIP for v1.ServiceSpec; if you choose to ignore these errors, turn validation off with --validate=false
Dec 10 16:41:06 kubernetes-master systemd[1]: kube-addons.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 16:41:06 kubernetes-master systemd[1]: Failed to start Kubernetes addons.
Dec 10 16:41:06 kubernetes-master systemd[1]: kube-addons.service: Unit entered failed state.
Dec 10 16:41:06 kubernetes-master systemd[1]: kube-addons.service: Failed with result 'exit-code'.
My etcd version is:
etcd --version
etcd version 0.4.9
But I also have etcd2:
etcd2 --version
etcd Version: 2.2.2
Git SHA: b4bddf6
Go Version: go1.4.3
Go OS/Arch: linux/amd64
And at the moment the second one is the one running:
ps aux | grep etcd
etcd 731 0.5 8.4 329788 42436 ? Ssl 16:40 0:16 /usr/bin/etcd2
root 874 0.4 7.4 59876 37804 ? Ssl 17:19 0:02 /opt/kubernetes/bin/kube-apiserver --address=0.0.0.0 --port=8080 --etcd-servers=http://127.0.0.1:2379 --kubelet-port=10250 --service-cluster-ip-range=10.11.0.0/16
core 953 0.0 0.1 6740 876 pts/0 S+ 17:27 0:00 grep --colour=auto etcd
What causes the issue and how can I solve it?
Thank you.

The relevant log line is:
/opt/kubernetes/addons/skydns-svc.yaml": error validating data: found invalid field portalIP for v1.ServiceSpec; if you choose to ignore these errors, turn validation off with --validate=false
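The validator is rejecting the field name, not the address: portalIP is not part of v1.ServiceSpec (the v1 API calls this field clusterIP). Either rename the field in skydns-svc.yaml or pass --validate=false to kubectl to skip validation.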
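The log complains about the portalIP key itself, which the v1 API names clusterIP. As a minimal sketch of the rename, assuming the key appears on its own line in the manifest, the hypothetical helper below prints a rewritten copy and leaves the original file untouched:

```shell
# Print a copy of a Service manifest with the portalIP key renamed to
# clusterIP. Indentation is preserved; the input file is not modified.
# rename_portal_ip is a hypothetical helper name, not part of kubectl.
rename_portal_ip() {
  sed 's/^\([[:space:]]*\)portalIP:/\1clusterIP:/' "$1"
}
```

It could be used as rename_portal_ip /opt/kubernetes/addons/skydns-svc.yaml > /tmp/skydns-svc.yaml, followed by kubectl create -f /tmp/skydns-svc.yaml. The /tmp path is only an example.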

Related

How to fix docker storage-driver=overlay2 problem

I need to change the underlying storage for a Proxmox LXC Debian Buster container from RAW to ZFS. For this I restored a snapshot to ZFS storage. This is normally transparent for the OS in the container, but in this case docker no longer starts.
The initial problem was that docker didn't start, and after some digging around I found this:
# dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
INFO[2021-08-03T09:24:40.909844803Z] Starting up
...
ERRO[2021-08-03T09:24:56.914420548Z] failed to mount overlay: invalid argument storage-driver=overlay2
ERRO[2021-08-03T09:24:56.914439880Z] [graphdriver] prior storage driver overlay2 failed: driver not supported
failed to start daemon: error initializing graphdriver: driver not supported
How can I fix this?
EDIT:
I tried the suggested fix, but still no cigar:
root@mail:/var/log# systemctl status docker.service
* docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2021-10-09 10:05:49 UTC; 1min 23s ago
Docs: https://docs.docker.com
Process: 236 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
Main PID: 236 (code=exited, status=1/FAILURE)
Oct 09 10:05:49 mail systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Oct 09 10:05:49 mail systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Oct 09 10:05:49 mail systemd[1]: Stopped Docker Application Container Engine.
Oct 09 10:05:49 mail systemd[1]: docker.service: Start request repeated too quickly.
Oct 09 10:05:49 mail systemd[1]: docker.service: Failed with result 'exit-code'.
Oct 09 10:05:49 mail systemd[1]: Failed to start Docker Application Container Engine.
The link offered suggests creating a new zpool within the container. Seems a bit of an overkill for that to be necessary, no?
Configure Docker to use zfs. Edit /etc/docker/daemon.json and set the storage-driver to zfs. If the file was empty before, it should now look like this:
{
"storage-driver": "zfs"
}
more details: https://docs.docker.com/storage/storagedriver/zfs-driver/
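Before editing daemon.json it may help to confirm which filesystem actually backs Docker's data directory. The sketch below is an assumption-laden simplification: suggest_driver is a hypothetical helper, and it falls back to overlay2 for every filesystem other than zfs and btrfs, which will not be right for every setup.

```shell
# Map a filesystem type name to a Docker storage driver that can run on it.
# Simplified on purpose: anything that is not zfs or btrfs maps to overlay2.
suggest_driver() {
  case "$1" in
    zfs)   echo zfs ;;
    btrfs) echo btrfs ;;
    *)     echo overlay2 ;;
  esac
}

# Example (not run here): look up the filesystem under /var/lib/docker
#   suggest_driver "$(findmnt -n -o FSTYPE -T /var/lib/docker)"
```

If the result is zfs, that matches the "storage-driver": "zfs" setting shown above.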

Error creating default "bridge" network: package not installed

Suddenly my docker daemon stopped and never started again. I'm running docker on a Raspberry Pi (Linux raspberrypi 4.1.13-v7+). It worked until last week, when the docker service suddenly stopped working, and I don't have a clue why.
My docker version is:
raspberrypi:~ $ docker version
Client:
Version: 1.10.3
API version: 1.22
Go version: go1.4.3
Git commit: 20f81dd
Built: Thu Mar 10 22:23:48 2016
OS/Arch: linux/arm
An error occurred trying to connect: Get http:///var/run/docker.sock/v1.22/version: read unix /var/run/docker.sock: connection reset by peer
Socket is ok:
● docker.socket - Docker Socket for the API
Loaded: loaded (/lib/systemd/system/docker.socket; disabled)
Active: active (listening) since Sat 2018-03-17 00:42:46 UTC; 6s ago
Listen: /var/run/docker.sock (Stream)
My service status shows the following log:
docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled)
Active: failed (Result: start-limit) since Sat 2018-03-17 00:05:52 UTC; 4min 55s ago
Docs: https://docs.docker.com
Process: 2891 ExecStart=/usr/bin/docker daemon -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 2891 (code=exited, status=1/FAILURE)
Mar 17 00:05:52 raspberrypi docker[2891]: time="2018-03-17T00:05:52.743474604Z" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.1)"
Mar 17 00:05:52 raspberrypi docker[2891]: time="2018-03-17T00:05:52.758090386Z" level=debug msg="ReleasePool(LocalDefault/172.17.0.0/16)"
Mar 17 00:05:52 raspberrypi docker[2891]: time="2018-03-17T00:05:52.772819345Z" level=debug msg="Cleaning up old shm/mqueue mounts: start."
Mar 17 00:05:52 raspberrypi docker[2891]: time="2018-03-17T00:05:52.773269239Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: package not installed"
Mar 17 00:05:52 raspberrypi systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Mar 17 00:05:52 raspberrypi systemd[1]: Failed to start Docker Application Container Engine.
I already tried this solution but for me it didn't work.
How can I make my docker service start again? It seems that a package is not installed, but when I try to load the bridge module I get:
raspberrypi:~ $ modprobe bridge
modprobe: FATAL: Module bridge not found.
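One thing worth ruling out, offered as an assumption rather than a diagnosis: modprobe looks for modules under /lib/modules/<running kernel release>, so if that directory is missing (for example after a kernel update without a reboot), every module, bridge included, is reported as not found. A minimal check, with has_modules_dir as a hypothetical helper name:

```shell
# Succeed if a modules directory exists for the given kernel release.
# $1: kernel release, e.g. the output of `uname -r`
# $2: modules root, defaults to /lib/modules
has_modules_dir() {
  [ -d "${2:-/lib/modules}/$1" ]
}

# Example (not run here):
#   has_modules_dir "$(uname -r)" || echo "no modules for the running kernel"
```

If the directory is missing, the running kernel and the installed modules are out of sync, which would also explain why docker cannot create its bridge.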

Docker could not start after install on CentOS 7

I installed docker on CentOS 7 (Linux version 3.10.0-327.el7.x86_64) with the command yum install -y docker, but when I try to start it with systemctl start docker, docker fails to start. Below is the error message:
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2018-03-15 16:38:37 CST; 10s ago
Docs: http://docs.docker.com
Process: 5166 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --seccomp-profile=/etc/docker/seccomp.json $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY $REGISTRIES (code=exited, status=1/FAILURE)
Main PID: 5166 (code=exited, status=1/FAILURE)
Mar 15 16:38:36 localhost.localdomain systemd[1]: Starting Docker Application Container Engine...
Mar 15 16:38:36 localhost.localdomain dockerd-current[5166]: time="2018-03-15T16:38:36.570661801+08:00" level=info msg="libcontainerd... 5171"
Mar 15 16:38:37 localhost.localdomain dockerd-current[5166]: time="2018-03-15T16:38:37.585565695+08:00" level=warning msg="overlay2: the ba...
Mar 15 16:38:37 localhost.localdomain dockerd-current[5166]: Error starting daemon: SELinux is not supported with the overlay2 graph ...false)
Mar 15 16:38:37 localhost.localdomain systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Mar 15 16:38:37 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
Mar 15 16:38:37 localhost.localdomain systemd[1]: Unit docker.service entered failed state.
Mar 15 16:38:37 localhost.localdomain systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
How can I solve this issue?

How do I clear a thinpool device for docker

I am running docker on a Red Hat system with devicemapper and a thinpool device, just as recommended for production systems. Now when I want to reinstall docker I need two steps:
1) remove docker directory (in my case /area51/docker)
2) clear thinpool device
The docker documentation states that when using devicemapper with the dm.metadatadev and dm.datadev options, the easiest way of cleaning devicemapper would be:
If setting up a new metadata pool it is required to be valid.
This can be achieved by zeroing the first 4k to indicate empty metadata, like this:
$ dd if=/dev/zero of=$metadata_dev bs=4096 count=1
Unfortunately, according to the documentation, dm.metadatadev is deprecated; it says to use dm.thinpooldev instead.
My thinpool has been created along the lines of this docker instruction.
So, my setup now looks like this:
cat /etc/docker/daemon.json
{
"storage-driver": "devicemapper",
"storage-opts": [
"dm.thinpooldev=/dev/mapper/thinpool_VG_38401-thinpool",
"dm.basesize=18G"
]
}
Under /dev/mapper I see the following thinpool devices:
ls -l /dev/mapper/thinpool_VG_38401-thinpool*
lrwxrwxrwx 1 root root 7 Dec 6 08:31 /dev/mapper/thinpool_VG_38401-thinpool -> ../dm-8
lrwxrwxrwx 1 root root 7 Dec 6 08:31 /dev/mapper/thinpool_VG_38401-thinpool_tdata -> ../dm-7
lrwxrwxrwx 1 root root 7 Dec 6 08:31 /dev/mapper/thinpool_VG_38401-thinpool_tmeta -> ../dm-6
So, after running docker successfully, I tried to reinstall as described above, cleared the thinpool by writing 4K of zeroes into the tmeta device, and restarted docker:
dd if=/dev/zero of=/dev/mapper/thinpool_VG_38401-thinpool_tmeta bs=4096 count=1
systemctl start docker
And ended up with:
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-12-06 10:28:46 UTC; 10s ago
Docs: https://docs.docker.com
Process: 1566 ExecStart=/usr/bin/dockerd -G uwsgi --data-root=/area51/docker -H unix:///var/run/docker.sock (code=exited, status=1/FAILURE)
Main PID: 1566 (code=exited, status=1/FAILURE)
Memory: 236.0K
CGroup: /system.slice/docker.service
Dec 06 10:28:45 yoda3 systemd[1]: Starting Docker Application Container Engine...
Dec 06 10:28:45 yoda3 dockerd[1566]: time="2017-12-06T10:28:45.816049000Z" level=info msg="libcontainerd: new containerd process, pid: 1577"
Dec 06 10:28:46 yoda3 dockerd[1566]: time="2017-12-06T10:28:46.816966000Z" level=warning msg="failed to rename /area51/docker/tmp for background deletion: renam...chronously"
Dec 06 10:28:46 yoda3 dockerd[1566]: Error starting daemon: error initializing graphdriver: devmapper: Unable to take ownership of thin-pool (thinpool_VG_38401-...data blocks
Dec 06 10:28:46 yoda3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Dec 06 10:28:46 yoda3 systemd[1]: Failed to start Docker Application Container Engine.
Dec 06 10:28:46 yoda3 systemd[1]: Unit docker.service entered failed state.
Dec 06 10:28:46 yoda3 systemd[1]: docker.service failed.
I assumed I could get around the 'unable to take ownership of thin-pool' by doing a reboot. But after reboot and trying to start docker again I got the following error:
systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2017-12-06 10:30:37 UTC; 2min 29s ago
Docs: https://docs.docker.com
Process: 3180 ExecStart=/usr/bin/dockerd -G uwsgi --data-root=/area51/docker -H unix:///var/run/docker.sock (code=exited, status=1/FAILURE)
Main PID: 3180 (code=exited, status=1/FAILURE)
Memory: 37.9M
CGroup: /system.slice/docker.service
Dec 06 10:30:36 yoda3 systemd[1]: Starting Docker Application Container Engine...
Dec 06 10:30:36 yoda3 dockerd[3180]: time="2017-12-06T10:30:36.893777000Z" level=warning msg="libcontainerd: makeUpgradeProof could not open /var/run/docker/lib...containerd"
Dec 06 10:30:36 yoda3 dockerd[3180]: time="2017-12-06T10:30:36.901958000Z" level=info msg="libcontainerd: new containerd process, pid: 3224"
Dec 06 10:30:37 yoda3 dockerd[3180]: Error starting daemon: error initializing graphdriver: devicemapper: Non existing device thinpool_VG_38401-thinpool
Dec 06 10:30:37 yoda3 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Dec 06 10:30:37 yoda3 systemd[1]: Failed to start Docker Application Container Engine.
Dec 06 10:30:37 yoda3 systemd[1]: Unit docker.service entered failed state.
Dec 06 10:30:37 yoda3 systemd[1]: docker.service failed.
So, obviously, writing zeroes into the thinpool_tmeta device is not the right thing to do; it seems to destroy my thinpool device.
Can anyone here tell me the right steps to clear the thin-pool device? Preferably the solution should not require a reboot.
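One possible approach, sketched under assumptions and not verified on this system, is to drop the thin-pool logical volume with LVM and recreate it, instead of writing into the _tmeta device directly. The hypothetical helper below only prints the commands so they can be reviewed first. The volume group and LV names are inferred from the device-mapper name (VG-LV), and the pool size is a placeholder.

```shell
# Print (do not run) the commands that would drop and recreate a thin pool.
# $1: volume group, $2: thin-pool LV name, $3: pool size (e.g. 20G),
# $4: Docker data root to wipe. All four values are supplied by the caller.
plan_thinpool_reset() {
  vg=$1 lv=$2 size=$3 root=$4
  echo "systemctl stop docker"
  echo "rm -rf $root"
  echo "lvremove -f $vg/$lv"
  echo "lvcreate --type thin-pool -L $size -n $lv $vg"
  echo "systemctl start docker"
}
```

For the setup above this might be called as plan_thinpool_reset thinpool_VG_38401 thinpool 20G /area51/docker, where 20G is only an example size. Because the pool is recreated under the same name, the dm.thinpooldev entry in daemon.json would not need to change.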

After installing docker on CentOS 7, failed to start docker: "Job for docker.service failed."

After running yum install docker on CentOS 7, I tried to start docker with service docker start and got this error:
Redirecting to /bin/systemctl start docker.service
Job for docker.service failed. See 'systemctl status docker.service' and 'journalctl -xn' for details.
Then I ran systemctl status docker.service -l, which shows:
docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled)
Active: failed (Result: exit-code) since Sun 2015-03-15 03:49:49 EDT; 12min ago
Docs: http://docs.docker.com
Process: 11444 ExecStart=/usr/bin/docker -d $OPTIONS $DOCKER_STORAGE_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 11444 (code=exited, status=1/FAILURE)
Mar 15 03:49:48 localhost.localdomain docker[11444]: 2015/03/15 03:49:48 docker daemon: 1.3.2 39fa2fa/1.3.2; execdriver: native; graphdriver:
Mar 15 03:49:48 localhost.localdomain docker[11444]: [a25f748b] +job serveapi(fd://)
Mar 15 03:49:48 localhost.localdomain docker[11444]: [info] Listening for HTTP on fd ()
Mar 15 03:49:48 localhost.localdomain docker[11444]: [a25f748b] +job init_networkdriver()
Mar 15 03:49:48 localhost.localdomain docker[11444]: [a25f748b] -job init_networkdriver() = OK (0)
Mar 15 03:49:49 localhost.localdomain docker[11444]: 2015/03/15 03:49:49 write /var/lib/docker/init/dockerinit-1.3.2: no space left on device
Mar 15 03:49:49 localhost.localdomain systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Mar 15 03:49:49 localhost.localdomain systemd[1]: Failed to start Docker Application Container Engine.
Mar 15 03:49:49 localhost.localdomain systemd[1]: Unit docker.service entered failed state.
I really have no idea what is wrong. I look forward to your response and would be very grateful!
This error usually occurs because the device-mapper-event-libs package is missing.
# yum install device-mapper-event-libs
Thanks for Ben Whaley's advice. When I checked my disk space, it was indeed not enough. I extended my disk space and that solved the problem. It's the first time I have asked a question here, and it was really helpful. Thanks again.
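The "no space left on device" line in the log can be checked directly. A minimal sketch, with free_kb as a hypothetical helper name, that reports the free space on whichever filesystem holds a given path:

```shell
# Print the free space, in KiB, on the filesystem that holds the given path.
# -P forces the POSIX one-line-per-filesystem format, -k reports KiB.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Example (not run here):
#   free_kb /var/lib/docker
```

A value near zero for /var/lib/docker would be consistent with the write failure on /var/lib/docker/init/dockerinit-1.3.2 shown in the log.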
I upgraded the CentOS 7 kernel from version 3 to version 4.
NOTE: I upgraded the kernel for other reasons as well, so first try the steps below without upgrading the kernel.
1) Delete the docker folder under /var/lib.
2) cd /etc/sysconfig
3) Back up the docker file (cp docker docker.org), then open it with vi docker.
4) Find the line OPTIONS='--selinux-enabled --log-driver=journald'.
5) Remove --selinux-enabled so that the line reads OPTIONS='--log-driver=journald'.
6) Un-comment the line # setsebool -P docker_transition_unconfined 1 so that it reads setsebool -P docker_transition_unconfined 1.
7) Reboot the machine, or just try starting docker again. For me it works :)
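The OPTIONS edit described above can also be expressed as a small filter. This is a sketch, not the answerer's method: strip_selinux_flag is a hypothetical helper, and it removes any flag that starts with --selinux- so it does not depend on the exact spelling found in the file.

```shell
# Read text on stdin and remove the first --selinux-... flag on each line,
# together with any whitespace that follows it.
strip_selinux_flag() {
  sed "s/--selinux-[^[:space:]']*[[:space:]]*//"
}

# Example (not run here): preview the change without touching the file
#   strip_selinux_flag < /etc/sysconfig/docker
```

Printing the result first, rather than editing in place, keeps the backup step from the list above meaningful.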
