Docker container lost files after restarting the docker service or rebooting

My server's main drive is nearly full, so I moved the /var/lib/docker directory to a location on a second drive, /media/my-username/sec-drive/docker, using:
sudo -s # enter root mode
service docker stop
rsync -aXS /var/lib/docker/ /media/my-username/sec-drive/docker
rm -rf /var/lib/docker
ln -s /media/my-username/sec-drive/docker /var/lib/docker
service docker start
Then I start all my Docker containers with docker-compose up -d and all containers work just fine.
But when I reboot or restart the docker service, one of my containers loses a bunch of files (the other containers keep working just fine). One of those files is libmxnet.so (file mode 777) under /opt/myproj/mxnet/:
use local mxnet
RuntimeError: Cannot find the files.
List of candidates:
/opt/myproj/mxnet/libmxnet.so
/opt/myproj/mxnet/libmxnet.so
/opt/myproj/mxnet/../../build/libmxnet.so
/usr/local/nvidia/lib/libmxnet.so
/usr/local/nvidia/lib64/libmxnet.so
../../../libmxnet.so
Those files seem to be lost randomly. In the mxnet folder, __init__.py is gone but __init__.pyc is still there. That's really weird.
Then I tried removing the images and containers and importing them again, but the result was the same.
UPDATE:
This error occurred again on another server. But this time I had reinstalled the system and had not moved docker to another drive, so it seems to have nothing to do with the docker directory location.

You have to instruct the Docker daemon that you changed the folder.
In your docker.service you should add a parameter (-g):
FROM:
ExecStart=/usr/bin/docker daemon
TO:
ExecStart=/usr/bin/docker daemon -g /new/path/docker
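Note that on newer Docker releases the -g flag has been deprecated in favour of --data-root; a minimal sketch of the newer form, reusing the same placeholder path:
ExecStart=/usr/bin/dockerd --data-root /new/path/docker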
Some references here:
https://www.rb-associates.co.uk/blog/move-var-lib-docker-to-another-directory/
https://linuxconfig.org/how-to-move-docker-s-default-var-lib-docker-to-another-directory-on-ubuntu-debian-linux

Related

Getting error while running local docker registry

I am getting an error while running a local docker registry on a CentOS system. I am explaining the error below.
docker: Error response from daemon: lstat /var/lib/docker/overlay2/3202584ed599bad99c7896e0363ac9bb80a0385910844ce13e9c5e8849494d07: no such file or directory.
I am setting up the local registry like below.
vi /etc/docker/daemon.json:
{ "insecure-registries":["ip:5000"] }
I have the registry image installed on my system and I am running it using the below command.
docker run -dit -p 5000:5000 --name registry bundle/tools:registry_3.0.0-521
I have cleaned all volumes as per some suggestions from Google, but I still have the same issue. Can anybody help me to resolve this error?
The error is not related to the registry; it is happening on the client side because of local caching (or some other docker-related issue) on your system.
I've seen this error a lot in the docker community and the most suggested approach to solve this error is to clean up the whole /var/lib/docker directory.
On your local client system, if you don't care about your current containers, images, and caches, try stopping the docker daemon, removing the whole /var/lib/docker directory, and starting it again.
Note that sometimes it gets fixed by only restarting the daemon, so it's worth trying that first:
sudo service docker restart
If a simple restart can't solve the problem, go ahead and destroy it:
sudo service docker stop
sudo rm -rf /var/lib/docker
sudo service docker start
(I'm not sure whether these service commands will work on your CentOS system too.)
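If there are images you'd like to keep (such as your registry image), one option is to export them before wiping the directory and load them back afterwards; a minimal sketch using the image name from your question:
docker save -o /tmp/registry-image.tar bundle/tools:registry_3.0.0-521
sudo service docker stop
sudo rm -rf /var/lib/docker
sudo service docker start
docker load -i /tmp/registry-image.tar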

Make systemctl work from inside a container in a debian stretch image

Purpose - What do I want to achieve?
I want to access systemctl from inside a container running on a Kubernetes node (AMI: running Debian Stretch).
Working setup:
Node AMI: kope.io/k8s-1.10-debian-jessie-amd64-hvm-ebs-2018-08-17
Node Directories Mounted in the container to make systemctl work:
/var/run/dbus
/run/systemd
/bin/systemctl
/etc/systemd/system
Not Working setup:
Node AMI: kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17
Node Directories Mounted in the container to make systemctl work:
/var/run/dbus
/run/systemd
/bin/systemctl
/etc/systemd/system
Debugging in an attempt to solve the problem
To debug this issue with the debian-stretch image not supporting systemctl with the same mounts as debian-jessie:
1) I began by spinning up an nginx deployment and mounting the above-mentioned volumes in it:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
kubectl exec -it nginx-deployment /bin/bash
root@nginx-deployment-788f65877d-pzzrn:/# systemctl
systemctl: error while loading shared libraries: libsystemd-shared-232.so: cannot open shared object file: No such file or directory
2) The above error showed that the file libsystemd-shared-232.so could not be found, so I looked for its actual path on the node:
admin@ip-10-0-20-11:~$ sudo find / -iname 'libsystemd-shared-232.so'
/lib/systemd/libsystemd-shared-232.so
3) Mounted /lib/systemd in the nginx pod and ran systemctl again:
kubectl exec -it nginx-deployment /bin/bash
root@nginx-deployment-587d866f54-ghfll:/# systemctl
systemctl: error while loading shared libraries: libcap.so.2: cannot open shared object file: No such file or directory
4) So systemctl was now failing because of another missing shared object, libcap.so.2.
5) To solve the above error I again searched the node for libcap.so.2 and found it in the below path:
admin@ip-10-0-20-11:~$ sudo find / -iname 'libcap.so.2'
/lib/x86_64-linux-gnu/libcap.so.2
6) Seeing that the above directory was not mounted in my pod, I mounted /lib/x86_64-linux-gnu in the nginx pod (deployment).
7) The nginx pod is not able to come up after adding the above mount and fails with the below error:
$ k logs nginx-deployment-f9c5ff956-b9wn5
standard_init_linux.go:178: exec user process caused "no such file or directory"
Please suggest how to debug this further, and which mounts are required to make systemctl work from inside a container in a Debian Stretch environment. Any pointers to take the debugging further would be helpful.
Rather than mounting some of the library files from the host you can just install systemd in the container.
$ apt-get -y install systemd
Now, that won't necessarily make systemctl run. You will need systemd to be running in your container, which is spawned by /sbin/init on your system. /sbin/init needs to run as root, so essentially you would have to run this with the privileged flag in the pod or container security context on Kubernetes. Now, this is insecure, and there is a long history of running systemd in a container, where the Docker folks were mostly against it (security) and the Red Hat folks said that it was needed.
Nevertheless, the Red Hat folks figured out a way to make it work without the privileged flag. You need:
/run mounted as a tmpfs in the container.
/sys/fs/cgroup mounted as read-only is ok.
/sys/fs/cgroup/systemd/ mounted as read/write.
Use STOPSIGNAL SIGRTMIN+3.
In Kubernetes you need an emptyDir to mount a tmpfs. The others can be mounted as host volumes.
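For plain Docker (outside Kubernetes), a minimal sketch of a docker run command following the points above; the image name and init path are placeholders, not from the original answer:
docker run -d --tmpfs /run -v /sys/fs/cgroup:/sys/fs/cgroup:ro --stop-signal SIGRTMIN+3 my-systemd-image /sbin/init
# on some setups /sys/fs/cgroup/systemd may additionally need to be mounted read/write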
After mounting your host's /lib directory into the container, your Pod most probably will not start because the Docker image's /lib directory contained some library needed by the Nginx server that should start in that container. By mounting /lib from the host, the libraries required by Nginx will not be accessible any more. This will result in a No such file or directory error when trying to start Nginx.
To make systemctl available from within the container, I would suggest simply installing it within the container, instead of trying to mount the required binaries and libraries into the container. This can be done in your container's Dockerfile:
FROM whatever
RUN apt-get update && apt-get install -y systemd
No need to mount /bin/systemd or /lib/ with this solution.
I had a similar problem where one of the lines in my Dockerfile was:
RUN apt-get install -y --reinstall systemd
but after a docker restart, when I tried to use the systemctl command, the output was:
Failed to connect to bus: No such file or directory.
I solved this issue by adding following to my docker-compose.yml:
volumes:
- "/sys/fs/cgroup:/sys/fs/cgroup:ro"
It can also be done with:
sudo docker run -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro {other options}

Docker - ERROR: failed to register layer: symlink

I'm running a docker-compose file we have; I usually run it with the command:
docker-compose up
But today I'm getting this error:
ERROR: failed to register layer: symlink ../bdf441e8145a625c4ab289f13ac2274b37d35475b97680f50b7eccda4328f973/diff /var/lib/docker/overlay2/l/7O5XKRTJV6RMTXBV5DTPDOHYNX: no such file or directory
To solve this issue, just stop and start the docker service from the terminal:
# service docker stop
# service docker start
For me, this issue came up when I tried to clear the /var/lib/docker/overlay folder by deleting all its contents (not a good thing to do). After that, I was not able to build any of my images again.
Solved it by running this:
docker system prune --volumes -a
Warning: this removes all the volumes and their contents, which may result in data loss. That was fine for me since I had already deleted everything.
Following this answer, just restarting docker fixed the problem: https://stackoverflow.com/a/35325477/4031815
Restart docker, or if that doesn't work, do Docker > Reset > Remove all Data.
I got the same error and the latter is the only thing that ended up working for me.
What worked for me and did not involve losing all my data:
Make sure all docker containers are down: docker compose down
Remove problematic overlay2: sudo rm -R /var/lib/docker/overlay2
Remove images: sudo rm -R /var/lib/docker/image
Clear any other cached data: docker system prune -f
Restart docker service: systemctl restart docker
Close and reopen VS Code
Then, docker compose build finally worked fine for me

How to change the default location for "docker create volume" command?

When creating volumes through the volume API (now that the container volume pattern is not necessarily best practice anymore), a volume looks like this:
# docker volume inspect test-data
[
    {
        "Name": "test-data",
        "Driver": "local",
        "Mountpoint": "/var/lib/docker/volumes/test-data/_data"
    }
]
I would like to, for example, have docker volumes live in /data (which is mounted on a different physical volume).
This is not possible to do with symbolic links; it is possible with bind mounts, but I'm wondering if there is some configuration in Docker to change the default location for each separate volume.
You can change where Docker stores its files, including volumes, by changing one of its startup parameters, --data-root.
If you're using systemd for service management, the file is usually located at /lib/systemd/system/docker.service. Edit the file as such:
# Old - taken from the generated docker.service file in Ubuntu 16.04's docker.io package
ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS
# New
ExecStart=/usr/bin/dockerd --data-root /new_location/ -H fd:// $DOCKER_OPTS
Alternatively, you can edit the Docker daemon configuration file which defaults to /etc/docker/daemon.json.
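A minimal sketch of that file (the path is just an example):
{
    "data-root": "/new_location"
}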
Restart the Docker daemon and your volumes will be under /new_location/volumes/{volume_name}/_data
Note: be careful in production and also locally! You also have to move the existing data from /var/lib/docker/ to the new location for your docker install to work as expected.
You can use symlinks from the new location if you want specific folders to be in a specific place.
2017: with 17.05.0-ce (2017-05-04), PR 28696 deprecates the --graph flag in favor of --data-root: commit 1ecaed0
The name "graph" is a legacy term from long ago when there used to be a directory at the default location /var/lib/docker/graph.
However, the flag would indicate the path of the parent directory of the "graph" directory which contains not only image data but also data for volumes, containers, and networks.
In the most recent version of docker, this directory also contains swarm cluster state and node certificates.
With issue 5922 and PR 5978, the documentation has been updated.
Example:
ExecStart=/usr/bin/dockerd -H fd:// --data-root=/mnt/ssd/lib/docker
2016 (now deprecated)
I only know of a docker option to change /var/lib/docker itself, not its subfolders (part of its "graph", used by a docker daemon storage driver).
See docker daemon "Miscellaneous options":
Docker supports softlinks for the Docker data directory (/var/lib/docker) and for /var/lib/docker/tmp.
The DOCKER_TMPDIR and the data directory can be set like this:
DOCKER_TMPDIR=/mnt/disk2/tmp /usr/local/bin/docker daemon -D -g /var/lib/docker -H unix:// > /var/lib/docker-machine/docker.log 2>&1
# or
export DOCKER_TMPDIR=/mnt/disk2/tmp
/usr/local/bin/docker daemon -D -g /var/lib/docker -H unix:// > /var/lib/docker-machine/docker.log
As mentioned in "Where are docker images stored on the host machine?" (and that would apply also for containers/volumes):
The contents of the /var/lib/docker directory vary depending on the driver Docker is using for storage.
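If you're not sure which storage driver your daemon is using, docker info reports it; for example:
docker info | grep "Storage Driver"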
I successfully moved the storage location of docker by moving the content of /var/lib/docker to a new location and then placing a symlink pointing to the new location (I took this solution from https://askubuntu.com/questions/631450/change-data-directory-of-docker):
Caution - These steps depend on your current /var/lib/docker being an
actual directory (not a symlink to another location).
1) Stop docker: service docker stop. Verify no docker process is running:
ps aux | grep -i [d]ocker
2) Double check docker really isn't running. Take a look at the current docker directory:
ls /var/lib/docker/
2b) Make a backup: tar -zcC /var/lib docker > /mnt/pd0/var_lib_docker-backup-$(date +%s).tar.gz
3) Move the /var/lib/docker directory to your new partition:
mv /var/lib/docker /mnt/pd0/docker
4) Make a symlink: ln -s /mnt/pd0/docker /var/lib/docker
5) Take a peek at the directory structure to make sure it looks like it did before the mv: ls /var/lib/docker/ (note the trailing slash)
6) Start docker back up: service docker start
7) Restart your containers (they will resolve the symlink).
Worked for me on Ubuntu 18.04.1 LTS on an Azure VM with Docker 18.09.2
If you're on Fedora (tested on 32), just change or add the --data-root flag with your desired path to the OPTIONS variable in /etc/sysconfig/docker; this is the environment file used by systemd to start the dockerd service.
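A minimal sketch of what that line could look like (keep whatever options are already set there; the path is just an example):
OPTIONS="--data-root=/data/docker"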

Moving docker root folder to a new drive / partition

I am trying to move the "/var/lib/docker" folder from one disk to another, since it is taking up too much space, but I keep running into errors relating to permissions!
According to these questions:
How do I move a docker container's image to a persistent disk?
How to run docker LXC containers on another partition?
My disk is mounted on "/data" and I copied the "/var/lib/docker" folder to "/data/docker"
This is what I tried:
Tried the -g flag from DOCKER_OPTS with "/data/docker"
Tried creating a symbolic link from the new disk drive
Tried doing a bind mount from /data/docker
However, in all cases, when I try to launch services inside my container I get an error about missing permissions to write to /dev/null (as user root).
I simply copied the folder to the new disk, which preserved all the permissions as well (this is an ext4 filesystem with the same filesystem-level permissions as the original disk on which docker currently lives).
Specs:
The filesystem I am using is aufs.
Docker version is 0.7.6
Ubuntu 12.04
How do I move the data properly? Do I need an upgrade first?
I just did the following and it seems to work well:
as root:
service docker stop
mv /var/lib/docker /data/
# reboot and get root
service docker stop
rm -rf /var/lib/docker && ln -s /data/docker /var/lib/
service docker start
To add custom startup options to docker in Debian / Ubuntu (such as using a different data directory):
Edit /lib/systemd/system/docker.service:
[Service]
EnvironmentFile=-/etc/default/docker
ExecStart=/usr/bin/docker -d $DOCKER_OPTS -H fd://
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
In /etc/default/docker set :
DOCKER_OPTS="-g /srv/docker"
In more recent Docker versions on Ubuntu you need to edit /etc/docker/daemon.json:
{
"data-root": "/new/location"
}
