Docker: is it possible to use an overlaid backing filesystem?

I'd like to control whether Docker operates on persistent storage or on persistent storage overlaid with a volatile layer.
The reason is that my filesystem lives on an SD card (Raspberry Pi) and it needs to last a long time. Most of the time I want to operate on a read-only filesystem (ext4) overlaid with tmpfs and run containers on that, but when I detect that an update is available I want to unmount the overlayfs, remount the filesystem read-write, update the image, and then switch everything back to the tmpfs-overlaid read-only filesystem.
# mv /var/lib/docker /var/lib/docker~
# mkdir -p /var/lib/docker /tmp/docker /tmp/work
# mount -t overlay -o lowerdir=/var/lib/docker~,upperdir=/tmp/docker,workdir=/tmp/work overlay /var/lib/docker
# docker daemon --storage-driver devicemapper
I tried two storage drivers: overlay2 and devicemapper (loop). The former refused to work with overlayfs as the backing filesystem (the documentation also says this is not supported); the latter consumes all my memory until Docker gets killed by the OS. The behaviour is the same on the Raspberry Pi and on my PC.
The only storage driver that should work is vfs, but from what I have read it is very inefficient for storage (no copy-on-write), so it is of no use to me.
Now I'm trying the aufs storage driver on an overlayfs backing filesystem (the Docker documentation doesn't state that this combination is unsupported). I hope it will work, but it has a disadvantage: aufs is not supported by the mainline Linux kernel.
Is there some other way to switch between the two filesystems? Or could the SD card be spared in some completely different way (e.g. running in-memory containers)?
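The switching described in the question can be sketched as two shell functions. This is only an illustration of the mount sequence, not a fix for the storage-driver limitation. The variable names, the run helper, the DRY_RUN switch, the bind mount used for the read-write phase, and the assumption that the ext4 filesystem is mounted at / and that /tmp is a tmpfs are all hypothetical and not from the question. The Docker daemon would have to be stopped before either function is called.

```shell
# Paths mirror the commands in the question; everything else is an assumption.
LOWER=/var/lib/docker~     # persistent data, normally read-only
UPPER=/tmp/docker          # volatile upper layer (assumes /tmp is a tmpfs)
WORK=/tmp/work             # overlayfs work directory, same filesystem as UPPER
MERGED=/var/lib/docker     # the directory the Docker daemon uses
ROOT_MNT=/                 # assumption: the ext4 filesystem is the root mount

# Hypothetical safety switch: by default commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build the option string that mount -t overlay expects.
overlay_opts() {
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$1" "$2" "$3"
}

# Read-only ext4 below, tmpfs on top: writes never reach the SD card.
to_volatile() {
    run umount "$MERGED" || true          # drop a bind mount left by to_persistent, if any
    run mount -o remount,ro "$ROOT_MNT"
    run mkdir -p "$UPPER" "$WORK"
    run mount -t overlay -o "$(overlay_opts "$LOWER" "$UPPER" "$WORK")" overlay "$MERGED"
}

# Expose the persistent directory directly so an update is written to the card.
to_persistent() {
    run umount "$MERGED"
    run mount -o remount,rw "$ROOT_MNT"
    run mount --bind "$LOWER" "$MERGED"
}
```

With the default DRY_RUN=1, calling to_persistent only prints the three commands it would run, which makes it easy to review the sequence before setting DRY_RUN=0 as root.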

EDIT: Sorry, this DOES NOT WORK after all! The Docker daemon starts but is unable to create containers. This is the error:
Handler for POST /v1.24/containers/create returned error: error creating aufs mount to /var/lib/docker/aufs/mnt c549130a63857658f8675fd84296afae46293a9f7ae54e9ee04e83c231db600f-init: invalid argument
The aufs storage driver with an overlayfs backing filesystem works. For now it seems like the only option, but I'm not satisfied with the solution, because it looks like a hack to me and because aufs is not in the mainline kernel, so I needed to compile the kernel myself.
This is how I did it (it's quite a hack; please advise me how to do it better):
on my PC:
$ git clone https://github.com/p4l1ly/rpi-kernel
$ cd rpi-kernel
$ vagrant up
...wait quite a long time...
$ vagrant ssh
$ cp /var/kernel_build/results/kernel-20161003-100112/rpi2_3/kernel7.img /vagrant/
$ exit
$ sudo cp kernel7.img /mnt
then on the SD card:
# mv /var/lib/docker /var/lib/docker~
# mkdir -p /var/lib/docker /tmp/docker /tmp/work
# mount -t overlay -o lowerdir=/var/lib/docker~,upperdir=/tmp/docker,workdir=/tmp/work overlay /var/lib/docker
# docker daemon --storage-driver aufs

Related

overlayfs inside docker container

Is it possible to mount an overlay fs inside a (privileged) docker container? At least my intuitive approach, which works fine outside of a container, fails:
> mkdir /tmp/{up,low,work,merged}
> mount -t overlay overlay -o lowerdir=/tmp/low/,upperdir=/tmp/up/,workdir=/tmp/work/ /tmp/merged/
mount: /tmp/merged: wrong fs type, bad option, bad superblock on overlay, missing codepage or helper program, or other error.
Additional information:
Docker version 18.09.1, build 4c52b90
Kernel 4.19.0-8-amd64
Debian 10 (host and docker-image)
Found something that worked! Mounting the workdir and upperdir as tmpfs does the trick for me.
Like so:
> mkdir /tmp/overlay
> mkdir /tmp/{low,merged}
> mount -t tmpfs tmpfs /tmp/overlay
> mkdir /tmp/overlay/{up,work}
> mount -t overlay overlay -o lowerdir=/tmp/low/,upperdir=/tmp/overlay/up/,workdir=/tmp/overlay/work/ /tmp/merged/
I'd still be interested in an explanation of why creating an overlay without tmpfs fails within a Docker container.
How to mount an overlayfs inside a docker container:
https://gist.github.com/detunized/7c8fc4c37b49c5475e68ef9574587eee
Basically, you need to run the container with either --privileged or the more secure --cap-add=SYS_ADMIN.
This is a bit of a guess but I suspect it is because docker is already using overlayfs and overlayfs is refusing to use upperdir as another overlayfs.
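One way to test the guess above is to look at the filesystem type of the directory you intend to use as upperdir. The sketch below assumes GNU coreutils stat (the -f and -c options and the "overlayfs" type name are GNU-specific); the function names are made up for illustration.

```shell
# Print the type of the filesystem that holds the given path (GNU stat).
fs_type() {
    stat -f -c %T "$1"
}

# Succeeds if the path lives on an overlay filesystem.
is_overlay() {
    [ "$(fs_type "$1")" = "overlayfs" ]
}

# Usage inside the container:
#   is_overlay /tmp && echo "/tmp is itself on overlayfs; use a tmpfs for upperdir"
```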
I suspect this may be due to whiteout files:
In order to support rm and rmdir without changing the lower
filesystem, an overlay filesystem needs to record in the upper
filesystem that files have been removed. This is done using whiteouts
and opaque directories (non-directories are always opaque).
A whiteout is created as a character device with 0/0 device number.
When a whiteout is found in the upper level of a merged directory, any
matching name in the lower level is ignored, and the whiteout itself
is also hidden.
To delete a file that exists in a lowerdir, overlayfs creates a whiteout file in the upperdir, and it hides all whiteout files (character devices with device number 0,0) from the merged view. This logically means that you cannot create a character device file with number 0,0 inside an overlayfs, because overlayfs itself must hide it.
If you were allowed to use an overlayfs as an upperdir, the outer overlayfs wouldn't be able to create whiteout files there, because it can't create a character device file with number 0,0 on another overlayfs, and therefore it wouldn't be able to rm or rmdir any files from the lower layers.
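To see the whiteouts described above, you can inspect an upperdir directly (not the merged view, where they are hidden). A minimal sketch, assuming GNU stat; is_whiteout is a hypothetical helper name.

```shell
# Succeeds if the path is a character device with device number 0/0,
# which is how overlayfs marks a deleted lower-layer entry.
# GNU stat: %F = file type, %t / %T = major / minor device number in hex.
is_whiteout() {
    [ "$(stat -c '%F %t %T' "$1" 2>/dev/null)" = "character special file 0 0" ]
}

# Usage: list whiteouts in an upper directory (path taken from the example above):
#   find /tmp/overlay/up -type c | while read -r f; do
#       is_whiteout "$f" && echo "whiteout: $f"
#   done
```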

Why can't Docker's `/var/lib/docker/overlay2` directory be on an overlay2 fs?

When /var/lib/docker/overlay2 directory is on an overlay2 fs, Docker fails to start with:
level=error msg="'overlay2' is not supported over overlayfs" storage-driver=overlay2
level=error msg="[graphdriver] prior storage driver overlay2 failed: backing file system is unsupported for this graph driver"
Error starting daemon: error initializing graphdriver: backing file system is unsupported for this graph driver
The relevant code seems to be https://github.com/moby/moby/blob/master/daemon/graphdriver/overlay2/overlay.go#L162 but it doesn't explain the why!
If the backing filesystem is XFS, you can run xfs_info on the /var/lib/docker/overlay2 directory to check whether it can support overlay2.
For instance:
$ xfs_info /var/lib/docker/overlay2 | grep ftype
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
If the output contains ftype=1, the filesystem can support overlay2.
See "What is d_type and why Docker overlayfs need it" for more details.
I hope this helps. :^)
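The ftype check above can be scripted. The sketch below only parses text in the format shown in the example output; xfs_ftype is a hypothetical helper name, and it applies only to XFS filesystems.

```shell
# Read xfs_info output on stdin and print the ftype value (0 or 1).
xfs_ftype() {
    sed -n 's/.*ftype=\([01]\).*/\1/p' | head -n 1
}

# Usage:
#   if [ "$(xfs_info /var/lib/docker/overlay2 | xfs_ftype)" = 1 ]; then
#       echo "d_type is available"
#   fi
```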
As explained by the author in a GitHub commit comment, overlay on overlay leads to "very quirky behavior" and is thus currently not supported in Docker.
It is not clear why you are attempting to do so. But a common reason is running Docker inside Docker. If this is the case, you can simply mount a folder from host to /var/lib/docker inside your container, e.g.:
docker run -d -v /my/storage:/var/lib/docker docker-in-docker-image
In addition, when building an image for Docker-in-Docker, add a VOLUME /var/lib/docker statement to your Dockerfile to ensure the folder is a mount point. This is also how the official docker:dind image works: https://github.com/docker-library/docker/blob/master/18.09/dind/Dockerfile#L40

How can I fix 'No space left on device' error in Docker?

I'm running a Mac-native Docker (no virtualbox/docker-machine).
I have a huge image with a lot of infrastructure in it (Postgres, etc.).
I have run cleanup scripts to get rid of a lot of cruft--unused images and so forth.
When I run my image I get an error like:
could not create directory "/var/lib/postgresql/data/pg_xlog": No space left on device
On my host Mac /var is sitting at 60% space available and generally my disk has lots of storage free.
Is this some Docker configuration I need to bump up to give it more resources?
Relevant lines from mount inside docker:
none on / type aufs (rw,relatime,si=5b19fc7476f7db86,dio,dirperm1)
/dev/vda1 on /data type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /etc/hosts type ext4 (rw,relatime,data=ordered)
/dev/vda1 on /var/lib/postgresql/data type ext4 (rw,relatime,data=ordered)
Here’s df:
Filesystem     1K-blocks    Used Available Use% Mounted on
none           202054928 4333016 187269304   3% /
tmpfs            1022788       0   1022788   0% /dev
tmpfs            1022788       0   1022788   0% /sys/fs/cgroup
/dev/vda1      202054928 4333016 187269304   3% /data
shm                65536       4     65532   1% /dev/shm
tmpfs             204560     284    204276   1% /run/docker.sock
I haven't found many options for this; the main issue on GitHub is https://github.com/docker/for-mac/issues/371
Some of the options suggested there are:
If you can remove all images/containers, you can follow these instructions:
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)
docker volume rm $(docker volume ls -q)
rm -rf ~/Library/Containers/com.docker.docker/Data/*
You can try to prune all unused images/containers but this has proven not very effective:
docker system prune
Use a template image that is larger, install qemu using homebrew and move the image around, see this specific comment: https://github.com/docker/for-mac/issues/371#issuecomment-242047368 but you need to have at least 2x of space left to do this without losing containers/images.
See also: How do you get around the size limitation of Docker.qcow2 in the Docker for Mac?
And https://forums.docker.com/t/no-space-left-on-device-error/10894/26
I ran into the same issue, running docker system prune --volumes resolved the problem.
"Volumes are not pruned by default, and you must specify the --volumes flag for docker system prune to prune volumes."
See: https://docs.docker.com/config/pruning/#prune-everything
I ran into this recently with a Docker installation on Linux that uses the devicemapper storage driver (the default there). There was indeed a Docker configuration setting I needed to change to fix this.
Docker images are made of read-only layers of filesystem snapshots, each layer created by a command in your Dockerfile, which are built on top of a common base storage snapshot. The base snapshot is shared by all your images and has a file system with a default size of 10GB. When you run your image you get a new writable layer on top of all the layers in the image, so you can add new files in your running container but it's still eventually based on the same base snapshot with the 10GB filesystem. This is at least true for devicemapper, not sure about other drivers. Here is the relevant documentation from docker.
To change this default value, there's a daemon parameter you can set, e.g. docker daemon --storage-opt dm.basesize=100G. Since you probably don't run the daemon manually, you need to edit the Docker daemon options in some file, depending on how you run the daemon. With Docker for Mac you can edit the daemon parameters as JSON in the preferences under Daemon->Advanced. You probably need to add something like this:
{
"storage-opts": ["dm.basesize=100G"]
}
(but like I said, I had this problem on linux, so didn't try the above).
Anyway in order for this to take effect, you'll need to remove all your existing images (so that they're re-created on top of the new base snapshot with the new size). See storage driver options.

Clean docker environment: devicemapper

I have a docker environment with 2 containers (Jenkins and Nexus, both with their own named volume).
I have a daily cron-job which deletes unused containers and images. This is working fine. But the problem is inside my devicemapper:
du -sh /var/lib/docker/
30G docker/
I checked each folder in my Docker directory:
Volumes (big, but that's normal in my case):
/var/lib/docker# du -sh volumes/
14G volumes/
Containers:
/var/lib/docker# du -sh containers/
3.2M containers/
Images:
/var/lib/docker# du -sh image/
5.8M image/
Devicemapper:
/var/lib/docker# du -sh devicemapper/
16G devicemapper/
/var/lib/docker/devicemapper/mnt is 7.3G
/var/lib/docker/devicemapper/devicemapper is 8.1G
Docker info:
Storage Driver: devicemapper
Pool Name: docker-202:1-xxx-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 5.377 GB
Data Space Total: 107.4 GB
Data Space Available: 28.8 GB
Metadata Space Used: 6.148 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.141 GB
Udev Sync Supported: true
What is this space and am I able to clean this without breaking stuff?
Don't use a devicemapper loop file for anything serious! Docker has big warnings about this.
The /var/lib/docker/devicemapper/devicemapper directory contains the sparse loop files that hold all the data Docker mounts, so you would need to use LVM tools to trawl around them and do things. Have a read through the removal issues with devicemapper; they are more or less resolved, but maybe not completely.
I would move away from devicemapper where possible, or use LVM thin pools on anything RHEL-based. If you can't change storage drivers, the procedure below will at least clear up any allocated sparse space you can't reclaim.
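Because the loop files are sparse, their apparent size and the space they actually occupy on disk can differ a lot, which is worth checking before deciding what to clean. A minimal sketch, assuming GNU stat; sparse_usage is a hypothetical helper name.

```shell
# Print "<apparent bytes> <allocated bytes>" for a file.
# GNU stat: %s = apparent size, %b = allocated blocks, %B = bytes per block.
sparse_usage() {
    stat -c '%s %b %B' "$1" | awk '{ printf "%s %.0f\n", $1, $2 * $3 }'
}

# Usage (as root):
#   sparse_usage /var/lib/docker/devicemapper/devicemapper/data
```

The first number corresponds to what ls -l reports, the second to what du reports.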
Changing the docker storage driver
Changing the storage driver requires dumping your /var/lib/docker directory, which contains all your Docker data. There are ways to save portions of it, but that involves messing around with Docker internals. It is better to commit and export any containers or volumes you want to keep and import them after the change. Otherwise you will have a fresh, blank Docker install!
Export data
Stop Docker
Remove /var/lib/docker
Modify your docker startup to use the new storage driver.
Set --storage-driver=<name> in /lib/systemd/system/docker.service or /etc/systemd/system/docker.service or /etc/default/docker or /etc/sysconfig/docker
Start Docker
Import Data
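The steps above can be sketched as a script. This is an outline under several assumptions that are not from the answer: systemd manages the daemon (systemctl), only images are preserved (docker save/load does not cover volumes or container state), the old directory is moved aside instead of deleted, and the run helper with its DRY_RUN switch is hypothetical. Editing the daemon options is left as a manual step because the right file depends on the distribution.

```shell
# Hypothetical safety switch: by default commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# change_storage_driver <new-driver> <archive> <image>...
change_storage_driver() {
    new_driver=$1
    archive=$2                                    # where the saved images go
    shift 2                                       # remaining arguments: images to keep

    run docker save -o "$archive" "$@"            # export data
    run systemctl stop docker                     # stop Docker
    run mv /var/lib/docker /var/lib/docker.old    # set the old data aside
    echo "# set --storage-driver=$new_driver in the daemon options now"
    run systemctl start docker                    # start Docker
    run docker load -i "$archive"                 # import data
}
```

With the default DRY_RUN=1, change_storage_driver overlay2 /root/images.tar jenkins nexus prints the command sequence for review without touching anything; the image names are placeholders.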
AUFS
AUFS is not in the mainline kernel (and never will be), which means distros have to actively include it somehow. For Ubuntu it's in the linux-image-extra packages.
apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
Then change the storage driver option to --storage-driver=aufs
OverlayFS
OverlayFS is already available in Ubuntu, just change the storage driver to --storage-driver=overlay2 or --storage-driver=overlay if you are still using a 3.x kernel
I'm not sure how good an idea this is right now. It can't be much worse than the loop file but
The overlay2 driver is pretty solid for dev use but isn't considered production-ready yet (e.g. Docker Enterprise doesn't provide support for it); it is, however, being pushed to become the standard driver due to the AUFS/kernel issues.
Direct LVM Thin Pool
Instead of the devicemapper loop file you can use an LVM thin pool directly. RHEL makes this easy with a docker-storage-setup utility that is distributed with their EPEL docker package. Docker has detailed steps for setting up the volumes manually.
--storage-driver=devicemapper \
--storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool \
--storage-opt dm.use_deferred_removal=true
Docker 17.06+ supports managing simple direct-lvm block device setups for you.
Just don't run out of space in the LVM volume, ever. You end up with an unresponsive Docker daemon that needs to be killed and then LVM resources that are still in use that are hard to clean up.
A periodic docker system prune -a works for me on systems where I use devicemapper and not the LVM thinpool. The pattern I use is:
I label any containers, images, etc with label "protected" if I want them to be exempt from cleanup
I then periodically run docker system prune -a --filter=label!=protected (either manually or on cron with -f)
Labeling examples:
docker run --label protected ...
docker create --label=protected=true ...
For images, Dockerfile's LABEL, eg LABEL protected=true
To add a label to an existing image that I cannot easily rebuild, I make a 2 line Dockerfile with the above, build a new image, then switch the new image for the old one (tag).
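The two-line Dockerfile mentioned above could be generated like this. A sketch only: label_dockerfile is a hypothetical helper, and the image names in the usage comment are placeholders.

```shell
# Emit a two-line Dockerfile that adds the "protected" label to an existing image.
label_dockerfile() {
    printf 'FROM %s\nLABEL protected=true\n' "$1"
}

# Usage (docker build reads the Dockerfile from stdin when the context is "-"):
#   label_dockerfile myimage:1.0 | docker build -t myimage:1.0-protected -
#   docker tag myimage:1.0-protected myimage:1.0
```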
General Docker label documentation
First, what is devicemapper (official documentation)
Device Mapper has been included in the mainline Linux kernel since version 2.6.9 [in 2005]. It is a core part of RHEL family of Linux distributions.
The devicemapper driver stores every image and container on its own virtual device. These devices are thin-provisioned copy-on-write snapshot devices.
Device Mapper technology works at the block level rather than the file level. This means that devicemapper storage driver's thin provisioning and copy-on-write operations work with blocks rather than entire files.
The devicemapper is the default Docker storage driver on some Linux distributions.
Docker hosts running the devicemapper storage driver default to a configuration mode known as loop-lvm. This mode uses sparse files to build the thin pool used by image and container snapshots
Docker 1.10 [from 2016] and later no longer matches image layer IDs with directory names in /var/lib/docker.
However, there are two key directories.
The /var/lib/docker/devicemapper/mnt directory contains the mount points for image and container layers.
The /var/lib/docker/devicemapper/metadata directory contains one file for every image layer and container snapshot.
If your docker info does show your Storage Driver is devicemapper (and not aufs), proceed with caution with those folders.
See for instance issue 18867.
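Before touching those folders it helps to confirm the driver and how full the thin pool is. The sketch below parses docker info text in the format shown in the question (Storage Driver, Data Space Used, Data Space Total); the function names are hypothetical, and the unit handling assumes the decimal kB/MB/GB/TB suffixes seen there.

```shell
# Read `docker info` text on stdin and print the storage driver name.
storage_driver() {
    sed -n 's/^ *Storage Driver: *//p' | head -n 1
}

# Read `docker info` text on stdin and print the data space usage in percent.
data_space_used_pct() {
    awk -F': *' '
        function to_bytes(value,    parts, mult) {
            split(value, parts, " ")
            mult = 1
            if (parts[2] == "kB") mult = 1e3
            else if (parts[2] == "MB") mult = 1e6
            else if (parts[2] == "GB") mult = 1e9
            else if (parts[2] == "TB") mult = 1e12
            return parts[1] * mult
        }
        /^ *Data Space Used:/  { used  = to_bytes($2) }
        /^ *Data Space Total:/ { total = to_bytes($2) }
        END { if (total > 0) printf "%.1f\n", 100 * used / total }
    '
}

# Usage:
#   docker info 2>/dev/null | storage_driver
#   docker info 2>/dev/null | data_space_used_pct
```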
I faced the same issue: my /var/lib/docker/devicemapper/devicemapper/data file had grown to ~91% of the root volume (~45G of 50G). I tried removing all the unwanted images and deleting volumes, but nothing helped to shrink this file.
After some googling I understood that the "data" file is a loopback-mounted sparse file that Docker uses to store the mount locations and other files we would have stored inside the containers.
Finally I removed all the containers that had been run before and were now stopped:
Warning: Deletes all docker containers
docker rm $(docker ps -aq)
This reduced the devicemapper file significantly. I hope this helps.

Install Docker on a specific volume

Is there a way to install Docker on a specific volume ?
When I install Docker on Amazon Linux with the following command :
sudo yum install docker
and then start the docker service using :
sudo service docker start
It creates two Data Spaces :
Data file: /dev/loop0
Metadata file: /dev/loop1
How can I have those spaces be on a given volume such as /mnt/docker for example ?
Those are device files. They will always be in /dev (actually not, but let's just assume for sake of simplicity, here). loop0 and loop1 are loop devices that are backed by the actual Docker volume files. You can easily see this using losetup -l:
> losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE
/dev/loop0         0      0         1  0 /var/lib/docker/devicemapper/devicemapper/data
/dev/loop1         0      0         1  0 /var/lib/docker/devicemapper/devicemapper/metadata
What you might want to do (depending on your file system layout) is move the Docker runtime directory somewhere else (the default is /var/lib/docker; all Docker volumes and images are stored there). For this, you can supply the -g flag to the Docker daemon.
In CentOS/Fedora/RHEL (and probably, because it's based on RHEL, also Amazon Linux), you can modify the /etc/sysconfig/docker file for this (look for an OPTIONS variable). In Ubuntu/Debian /etc/default/docker would be the place to look.
I was able to get docker to store all of its data (containers and their data volumes) at a different place in the file system (my EBS volume) by editing /etc/sysconfig/docker
which has the line:
OPTIONS="--default-ulimit nofile=1024:4096"
I added the -g option, as documented here
OPTIONS="--default-ulimit nofile=1024:4096 --graph=/home/ec2-user/myvolume"
where myvolume is the directory where I mounted my EBS volume. Of course you need to stop and restart the docker daemon for this to take effect.
This is on Amazon Linux. Apparently the docker config file is /etc/default/docker on some Linuxes.
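A note that is not from the answers above: on newer Docker releases the -g/--graph flag has been superseded by --data-root, which can also be set as "data-root" in /etc/docker/daemon.json. A minimal sketch for generating such a file follows; daemon_json_data_root is a hypothetical helper, it does no JSON escaping of the path, and writing its output to daemon.json would overwrite any settings already there.

```shell
# Print a minimal daemon.json that points Docker's data directory at the given path.
daemon_json_data_root() {
    printf '{\n  "data-root": "%s"\n}\n' "$1"
}

# Usage (then restart the Docker daemon):
#   daemon_json_data_root /mnt/docker | sudo tee /etc/docker/daemon.json
```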
