docker - driver "devicemapper" failed to remove root filesystem after process in container killed

I am using Docker version 17.06.0-ce on Red Hat with devicemapper storage. I launch a container that runs a long-running service. The master process inside the container sometimes dies for whatever reason, and I get the following error message:
/bin/bash: line 1: 40 Killed python -u scripts/server.py start go
I would like the container to exit and be restarted by Docker. However, the container never exits. If I try to do it manually, I get the following error:
Error response from daemon: driver "devicemapper" failed to remove root filesystem.
After googling, I tried a bunch of things:
docker rm -f <container>
rm -f <path to mount>
umount <path to mount>
All of them fail with "device is busy". The only remedy right now is to reboot the host system, which is obviously not a long-term solution.
Any ideas?

I had the same problem and the solution was a real surprise.
So here is the error on docker rm:
$ docker rm 08d51aad0e74
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 08d51aad0e74060f54bba36268386fe991eff74570e7ee29b7c4d74047d809aa: remove /var/lib/docker/devicemapper/mnt/670cdbd30a3627ae4801044d32a423284b540c5057002dd010186c69b6cc7eea: device or resource busy
Then I did the following (basically go through all processes and look for docker in mountinfo):
$ grep docker /proc/*/mountinfo | grep 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
/proc/20416/mountinfo:629 574 253:15 / /var/lib/docker/devicemapper/mnt/958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,relatime shared:288 - xfs /dev/mapper/docker-253:5-786536-958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac rw,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota
This gave me the PID of the offending process keeping it busy: 20416 (the item after /proc/).
So I ran ps -p and, to my surprise, found:
[devops@dp01app5030 SeGrid]$ ps -p 20416
PID TTY TIME CMD
20416 ? 00:00:19 ntpd
A true WTF moment. So I pair-problem-solved with Google and found this: https://github.com/docker/for-linux/issues/124
It turns out I had to restart the ntp daemon, and that fixed the issue!
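If you want to repeat the search above without the manual grep and ps steps, here is a minimal sketch. The function name find_mount_holders and the optional PROC_ROOT argument are my own additions (the second argument exists only so the function can be tried against a copy of the files); run it as root so every process's mountinfo is readable.

```shell
#!/bin/sh
# find_mount_holders ID [PROC_ROOT]
# Prints "PID COMMAND" for every process whose mountinfo mentions ID.
# PROC_ROOT defaults to /proc (hypothetical parameter, for testing only).
find_mount_holders() {
    id=$1
    proc_root=${2:-/proc}
    # grep -l lists the mountinfo files that contain the ID.
    grep -l -- "$id" "$proc_root"/[0-9]*/mountinfo 2>/dev/null |
    while IFS= read -r file; do
        dir=${file%/mountinfo}
        pid=${dir##*/}
        # /proc/PID/comm holds the short command name of the process.
        comm=$(cat "$dir/comm" 2>/dev/null)
        printf '%s %s\n' "$pid" "${comm:-?}"
    done
}

# Usage (as root), with the mount ID taken from the docker rm error:
#   find_mount_holders 958722d105f8586978361409c9d70aff17c0af3a1970cb3c2fb7908fe5a310ac
```

Whatever process it reports (ntpd in my case) is the one to restart.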

Related

Logspout container in Docker

I am trying to deploy the logspout container in Docker, but I keep running into an issue. I have searched this website and GitHub to no avail, so I am hoping someone knows the answer.
I ran the following commands as per the README here: https://github.com/gliderlabs/logspout
(1) docker pull gliderlabs/logspout:latest (also tried with logspout:master, same results)
(2) docker run -d --name="logspout" --volume=/var/run/docker.sock:/var/run/docker.sock --publish=127.0.0.1:8000:80 gliderlabs/logspout (also tried with -v /var/run/docker.sock:/var/run/docker.sock, same results)
The container gets created but stops immediately. When I check the container logs (docker container logs logspout), I only see the following entries:
2021/12/19 06:37:12 # logspout v3.2.14 by gliderlabs
2021/12/19 06:37:12 # adapters: raw syslog tcp tls udp multiline
2021/12/19 06:37:12 # options :
2021/12/19 06:37:12 persist:/mnt/routes
2021/12/19 06:37:12 # jobs : pump routes http[health,logs,routes]:80
2021/12/19 06:37:12 # routes : none
2021/12/19 06:37:12 pump ended: Get http://unix.sock/containers/json?: dial unix /var/run/docker.sock: connect: no such file or directory
I checked docker.sock: ls -la /var/run/docker.sock returns srw-rw---- 1 root docker 0 Dec 12 09:49 /var/run/docker.sock. So docker.sock does exist, which adds to the confusion as to why the container can't find it.
I am new to Linux/Docker, but my understanding is that using -v or --volume would automatically mount the location into the container, which does not seem to be happening here. So I am wondering if anyone has a suggestion on what needs to be done so that the logspout container can find docker.sock.
System Info: Docker version 20.10.11, build dea9396; Raspberry Pi 4 ARM 64, OS: Debian GNU/Linux 11 (bullseye)
EDIT: added a comment about the -v flag in step (2) above

The container must be able to access the Docker Unix socket in order to mount it. This is typically a problem when user namespace remapping is enabled. To disable remapping for the logspout container, pass the --userns=host flag to docker run, docker create, etc.
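A rough sketch of how to check for remapping before changing the run command. It assumes that docker info lists name=userns among its security options when remapping is active; the helper name userns_remap_enabled is made up for illustration.

```shell
#!/bin/sh
# userns_remap_enabled: reads the `docker info` security options on stdin
# and succeeds if user namespace remapping is listed.
# Assumption: the daemon reports remapping as "name=userns".
userns_remap_enabled() {
    grep -q 'name=userns'
}

# Usage (needs a running Docker daemon):
#   if docker info --format '{{json .SecurityOptions}}' | userns_remap_enabled; then
#       docker run -d --name=logspout --userns=host \
#           --volume=/var/run/docker.sock:/var/run/docker.sock \
#           --publish=127.0.0.1:8000:80 gliderlabs/logspout
#   fi
```

If the check does not report remapping, --userns=host is unlikely to be the fix and the cause is elsewhere.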

Can I run k8s master INSIDE a docker container? Getting errors about k8s looking for host's kernel details

In a Docker container I want to run k8s.
When I run the kubeadm join ... or kubeadm init commands, I sometimes see errors like
"modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/3.10.0-1062.1.2.el7.x86_64/modules.dep.bin'.\nmodprobe: FATAL: Module configs not found in directory /lib/modules/3.10.0-1062.1.2.el7.x86_64", err: exit status 1
because (I think) my container does not have the expected kernel header files.
I realise that the container reports its kernel based on the host that is running the container; and looking at k8s code I see
// getKernelConfigReader search kernel config file in a predefined list. Once the kernel config
// file is found it will read the configurations into a byte buffer and return. If the kernel
// config file is not found, it will try to load kernel config module and retry again.
func (k *KernelValidator) getKernelConfigReader() (io.Reader, error) {
possibePaths := []string{
"/proc/config.gz",
"/boot/config-" + k.kernelRelease,
"/usr/src/linux-" + k.kernelRelease + "/.config",
"/usr/src/linux/.config",
}
So I am a bit confused about the simplest way to run k8s inside a container such that it consistently gets past this kernel info lookup.
I note that when running docker run -it solita/centos-systemd:7 /bin/bash on a macOS host, I see:
# uname -r
4.9.184-linuxkit
# ls -l /proc/config.gz
-r--r--r-- 1 root root 23834 Nov 20 16:40 /proc/config.gz
but when running the exact same command on an Ubuntu VM, I see:
# uname -r
4.4.0-142-generic
# ls -l /proc/config.gz
ls: cannot access /proc/config.gz
[Weirdly I don't see this FATAL: Module configs not found in directory error every time, but I guess that is a separate question!]
UPDATE 22/November/2019. I see now that k8s DOES run okay in a container. The real problem was weird/misleading logs. I have added an answer to clarify.
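To see quickly which of the candidate paths from the k8s snippet quoted above exists in a given container, something like the following sketch may help. The function name find_kernel_config and its optional ROOT prefix are hypothetical (ROOT is only there so the function can be pointed at a test tree); the four paths are the ones from the quoted Go code.

```shell
#!/bin/sh
# find_kernel_config RELEASE [ROOT]
# Prints the first existing kernel config path from the quoted list and
# returns 0, or returns 1 if none of them exists.
# ROOT is a hypothetical prefix (default: empty) for trying a test tree.
find_kernel_config() {
    release=$1
    root=${2:-}
    for path in \
        "/proc/config.gz" \
        "/boot/config-$release" \
        "/usr/src/linux-$release/.config" \
        "/usr/src/linux/.config"
    do
        if [ -e "$root$path" ]; then
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# Usage inside the container:
#   find_kernel_config "$(uname -r)" || echo "no kernel config found"
```

When it finds nothing, kubeadm falls back to loading the configs module, which is where the modprobe message comes from.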

I do not believe that is possible given the nature of containers.
You should instead test your app in a Docker container, then deploy that image to k8s either in the cloud or locally using minikube.
Another solution is to run it under kind, which uses the Docker driver instead of VirtualBox:
https://kind.sigs.k8s.io/docs/user/quick-start/

It seems the FATAL error part was a bit misleading.
It was badly formatted by my test environment (all on one line).
When k8s was failing, I saw the FATAL and assumed (incorrectly) that it was the root cause.
When I format the logs nicely, I see:
kubeadm join 172.17.0.2:6443 --token 21e8ab.1e1666a25fd37338 --discovery-token-unsafe-skip-ca-verification --experimental-control-plane --ignore-preflight-errors=all --node-name 172.17.0.3
[preflight] Running pre-flight checks
[WARNING FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 4.4.0-142-generic
DOCKER_VERSION: 18.09.3
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.3. Latest validated version: 18.06
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-142-generic/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/4.4.0-142-generic\n", err: exit status 1
[discovery] Trying to connect to API Server "172.17.0.2:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.17.0.2:6443"
[discovery] Failed to request cluster info, will try again: [the server was unable to return a response in the time allotted, but may still be processing the request (get configmaps cluster-info)]
There are other errors later, which I originally thought were a side effect of the nasty-looking FATAL error, e.g. .... "[util/etcd] Attempt timed out"]}, but I now think the root cause is that the etcd part sometimes times out.
Adding this answer in case someone else is puzzled like I was.

docker - start failed because /etc/fstab not found

I'm using the Windows Subsystem for Linux (Debian stretch). Following the instructions on the Docker website, I installed docker-ce, but it cannot start. Here is the info:
$ sudo service docker start
grep: /etc/fstab: No such file or directory
[ ok ] Starting Docker: docker.
$ sudo service docker status
[FAIL] Docker is not running ... failed!
What should I do about /etc/fstab not being found?

To fix the missing fstab, create it:
touch /etc/fstab
If you then run dockerd, it fails with the following messages:
INFO[2022-01-27T17:55:14.100489400+07:00] Loading containers: start.
WARN[2022-01-27T17:55:14.191666800+07:00] Running iptables --wait -t nat -L -n failed with message: `iptables v1.8.2 (nf_tables): CHAIN_ADD failed (No such file or directory): chain PREROUTING`, error: exit status 4
INFO[2022-01-27T17:55:14.493716300+07:00] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2022-01-27T17:55:14.494906600+07:00] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2022-01-27T17:55:14.495048400+07:00] stopping healthcheck following graceful shutdown module=libcontainerd
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain DOCKER: iptables failed: iptables --wait -t nat -N DOCKER: iptables v1.8.2 (nf_tables): CHAIN_ADD failed (No such file or directory): chain PREROUTING
(exit status 4)
That is a Debian iptables issue (the nf_tables backend); fix it by switching to the legacy backend:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
Now you can start the service again.
You can follow this to make it start on startup: https://askubuntu.com/a/1356147/138352
Edit:
If the iptables issue persists, try setting the WSL version to 2 by running this command from a Windows shell:
wsl --set-version <distribution name> 2
The distribution list can be found with the command wsl -l.
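The two checks above can be made repeatable with a small sketch like this one. The helper names ensure_file and iptables_backend are my own; the second one only inspects the text printed by iptables --version, which includes "(nf_tables)" or "(legacy)" on iptables 1.8 and later.

```shell
#!/bin/sh
# ensure_file PATH: create an empty file at PATH only if nothing exists there.
ensure_file() {
    [ -e "$1" ] || : > "$1"
}

# iptables_backend: reads `iptables --version` output on stdin and prints
# "nf_tables", "legacy", or "unknown" (older releases name no backend).
iptables_backend() {
    read -r line
    case $line in
        *nf_tables*) echo nf_tables ;;
        *legacy*)    echo legacy ;;
        *)           echo unknown ;;
    esac
}

# Usage (as root):
#   ensure_file /etc/fstab
#   iptables --version | iptables_backend
```

If the second command prints nf_tables, the update-alternatives commands above are the ones to run.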

I was getting the same error. Apparently my install of WSL with Debian didn't have an /etc/fstab file. Surprisingly, just creating the file via touch worked:
sudo touch /etc/fstab

Perhaps a good signal https://learn.microsoft.com/en-us/windows/wsl/release-notes#build-17093
WSL now processes the /etc/fstab file during instance start [GH 2636].

For anybody stumbling across this years later like me: Docker doesn't work inside WSL.
But you can use Docker for Windows and WSL2 to run native containers inside your Linux distro, and the install and config are quite painless: https://learn.microsoft.com/en-us/windows/wsl/tutorials/wsl-containers

lxc-kill: failed to get the init pid

I have a problem: it seems that the container is already stopped, because when I ping the container's IP I get no answer.
lxc-info indicates that the container is STOPPED:
[root@matrix-node04 mnt]# lxc-info -n f3a939113d6e12450829a2dc76be3c761b818e63fbd33df513772e6e4485565e
state: STOPPED
pid: -1
But docker ps indicates the container is still running:
[root@matrix-node04 mnt]# docker ps | grep f3a939113d6e
f3a939113d6e c69436ea2169 /bin/sh -c '/usr/loc 4 weeks ago Up 2 weeks d-mcl-354_lisx_test_kr22-n-3
Can I use lxc-start to manually start the container? I tried the following command:
[root@matrix-node04 mnt]# lxc-start -n f3a939113d6e12450829a2dc76be3c761b818e63fbd33df513772e6e4485565e -f /srv/docker/containers/f3a939113d6e12450829a2dc76be3c761b818e63fbd33df513772e6e4485565e/config.lxc
lxc-start: No such file or directory - failed to get real path for '/srv/docker/devicemapper/mnt/f3a939113d6e12450829a2dc76be3c761b818e63fbd33df513772e6e4485565e/rootfs'
lxc-start: failed to pin the container's rootfs
lxc-start: failed to spawn 'f3a939113d6e12450829a2dc76be3c761b818e63fbd33df513772e6e4485565e'
Has anyone run into this?

Docker on RHEL 6 Cgroup mounting failing

I'm trying to get my head around something that works on a CentOS + Vagrant setup, but not on our provider's RHEL (Red Hat Enterprise Linux Server release 6.5 (Santiago)). A sudo service docker restart returns this:
Stopping docker: [ OK ]
Starting cgconfig service: Error: cannot mount cpuset to /cgroup/cpuset: Device or resource busy
/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf [FAILED]
Starting docker: [ OK ]
The service starts okay, but images cannot run; a "mounting failed" error is shown when I try. The startup log also gives a warning or two. As for the kernel warning, CentOS gives the same one and has no problems, as EPEL should resolve this:
WARNING: You are running linux kernel version 2.6.32-431.17.1.el6.x86_64, which might be unstable running docker. Please upgrade your kernel to 3.8.0.
2014/08/07 08:58:29 docker daemon: 1.1.2 d84a070; execdriver: native; graphdriver:
[1233d0af] +job serveapi(unix:///var/run/docker.sock)
[1233d0af] +job initserver()
[1233d0af.initserver()] Creating server
2014/08/07 08:58:29 Listening for HTTP on unix (/var/run/docker.sock)
[1233d0af] +job init_networkdriver()
[1233d0af] -job init_networkdriver() = OK (0)
2014/08/07 08:58:29 WARNING: mountpoint not found
Has anyone had any success overcoming this problem, or should I throw in the towel and wait for the provider to update to RHEL 7?

I have the same issue.
(1) Check the cgconfig status:
# /etc/init.d/cgconfig status
If it is stopped, restart it:
# /etc/init.d/cgconfig restart
Then check that cgconfig is running.
(2) Check that cgconfig is enabled at boot:
# chkconfig --list cgconfig
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
If cgconfig is off, turn it on.
(3) If it still does not work, some cgroup modules may be missing from the kernel. Run make menuconfig, enable those modules in the kernel .config, then recompile and reboot.
After that, it should be OK.
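Step (2) can be scripted roughly as follows. This is only a sketch: the helper name chkconfig_enabled is made up, and it simply looks for an "on" entry in the chkconfig --list line shown above.

```shell
#!/bin/sh
# chkconfig_enabled: reads one `chkconfig --list SERVICE` line on stdin and
# succeeds if the service is "on" in at least one runlevel (0-6).
chkconfig_enabled() {
    grep -Eq '[0-6]:on'
}

# Usage on RHEL/CentOS 6 (as root):
#   chkconfig --list cgconfig | chkconfig_enabled || chkconfig cgconfig on
```

chkconfig cgconfig on enables the service in its default runlevels, so it is started again after the next reboot.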

I ended up asking the same question at Google Groups and in the end finding a solution with some help. What worked for me was this:
umount cgroup
sudo service cgconfig start
The project of making Docker work was put on hold all the same. Later, a network connectivity problem came up for the containers. It took too much time to solve, and I had to give up.
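Before unmounting anything, it can help to see which cgroup hierarchies are already mounted, since an existing mount is a common reason for the "Device or resource busy" message. A minimal sketch, where the function name list_cgroup_mounts and its optional file argument are my own additions:

```shell
#!/bin/sh
# list_cgroup_mounts [MOUNTS_FILE]
# Prints "MOUNTPOINT OPTIONS" for each mounted cgroup (v1) hierarchy.
# MOUNTS_FILE defaults to /proc/mounts.
list_cgroup_mounts() {
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk '$3 == "cgroup" { print $2, $4 }' "${1:-/proc/mounts}"
}

# Usage:
#   list_cgroup_mounts
```

Each mountpoint it prints is a candidate for umount before starting cgconfig again.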

So I spent the whole day trying to rig docker to work on my vps. I was running into this same error. Basically what it came down to was the fact that OpenVZ didn't support docker containers up until a couple months ago. Specifically this RHEL update:
https://openvz.org/Download/kernel/rhel6/042stab105.14
Assuming this is your problem, or some variation of it, the burden of solving it is on your host. They will need to follow these steps:
https://openvz.org/Docker_inside_CT

In my case
/etc/rc.d/rc.cgconfig start
was generating
Starting cgconfig service: Error: cannot mount cpu,cpuacct,memory to /cgroup/cpu_and_mem: Device or resource busy
/usr/sbin/cgconfigparser; error loading /etc/cgconfig.conf: Cgroup mounting failed
Failed to parse /etc/cgconfig.conf
I had to use:
/etc/rc.d/rc.cgconfig restart
and it automagically unmounted and remounted the cgroups:
Stopping cgconfig service: Starting cgconfig service:

It seems like the cgconfig service is not running, so check it:
# /etc/init.d/cgconfig status
# mkdir -p /cgroup/cpuacct /cgroup/memory /cgroup/devices /cgroup/freezer /cgroup/net_cls /cgroup/blkio
# cat /etc/cgconfig.conf |tail|grep "="|awk '{print "mount -t cgroup -o",$1,$1,$NF}'>cgroup_mount.sh
# sh ./cgroup_mount.sh
# /etc/init.d/cgconfig restart
# /etc/init.d/docker restart

This situation occurs when the kernel is booted with cgroup_disable=memory and /etc/cgconfig.conf contains memory = /cgroup/memory;
This causes only /cgroup/cpuset to be mounted instead of the full set.
Solution: either remove cgroup_disable=memory from your kernel boot options or comment out memory = /cgroup/memory; from cgconfig.conf.
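A quick way to check whether you are in this situation is sketched below. The function name memory_cgroup_conflict and its two optional file arguments are hypothetical; by default it reads /proc/cmdline and /etc/cgconfig.conf.

```shell
#!/bin/sh
# memory_cgroup_conflict [CMDLINE_FILE] [CGCONFIG_FILE]
# Succeeds if the kernel was booted with cgroup_disable=memory while
# cgconfig.conf still has an uncommented "memory = ..." mount entry.
memory_cgroup_conflict() {
    cmdline=${1:-/proc/cmdline}
    conf=${2:-/etc/cgconfig.conf}
    # Split the boot options one per line and look for an exact match.
    tr ' ' '\n' < "$cmdline" | grep -qx 'cgroup_disable=memory' || return 1
    # Lines commented out with "#" do not match this pattern.
    grep -Eq '^[[:space:]]*memory[[:space:]]*=' "$conf"
}

# Usage:
#   memory_cgroup_conflict && echo "cgroup_disable=memory conflicts with cgconfig.conf"
```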

The cgconfig service startup uses mount and umount, which require an extra privilege bump from Docker.
See the documentation for the --privileged=true flag for more info.
I was able to overcome this issue by starting my container with:
docker run -it --privileged=true my-image
Tested on CentOS 6 and CentOS 6.5.
