Docker backing filesystem on AWS EFS

I am mounting an AWS EFS filesystem on /var/lib/docker and using it as the default Docker backing filesystem. The storage driver is overlay2. I see in the docs that overlay2 only supports xfs and ext4. My aim is to mount this backing filesystem on multiple machines so that they all share the image data, but multi-attach is not supported by AWS EBS (which would give me ext4, a backing filesystem supported by overlay2). One way would be to pull the images onto an ext4 filesystem and copy the image data into the EFS, but that would be too time-consuming. What would be another way to go about this?

The short answer is "don't do that", because /var/lib/docker is not designed to be shared by multiple daemons. You'll hit race conditions, erroneous output about networks and containers that don't exist locally, and other errors that won't be fixed or supported.
Instead, put a registry near your cluster, in the same VPC/AZ, and have your nodes pull from that registry. Or have a look at the work done to support estargz in runtimes like containerd, which can start running a container before the layers are completely pulled.
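As a rough sketch of that approach (the registry hostname and image names below are just placeholders):
# run a registry close to the nodes, e.g. in the same VPC/AZ
docker run -d -p 5000:5000 --restart=always --name registry registry:2
# push an image to it once...
docker tag myapp:1.0 registry.internal:5000/myapp:1.0
docker push registry.internal:5000/myapp:1.0
# ...and have every node pull from it instead of sharing /var/lib/docker
docker pull registry.internal:5000/myapp:1.0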

Related

How is non-volume data for a Docker container stored?

I understand Docker volumes and the way they refer to directories on the host. What about the rest of the filesystem within a container?
To think about it a different way: suppose you have a server with most of the storage on a remote drive, meaning reads and writes take longer than usual. If you don't mount any volumes, would it keep any/some/most/all of the container filesystem in RAM? Or does it write some amount of it to disk, meaning it would be just as slow as a volume in this case?
Non-volume data is stored in a layered overlay filesystem (in most distributions, this will be either an AUFS or DeviceMapper filesystem). The principle is the same in both cases.
As already mentioned in comments, I can recommend reading the section "Understand images, containers and storage drivers" from the official documentation. This answer is just a short summary.
Each Docker image consists of multiple layers of filesystem images. For example, an Apache+PHP image might consist of (1) a generic Ubuntu base layer, (2) an additional layer with the Apache HTTP server installed and (3) another layer on top with PHP-FPM and configuration files (just an example).
When you start a new container from an image, a new per-container layer is added on top of the existing image layers. This layer contains all changes written within the container itself (to non-volume directories).
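As a rough illustration (the image and container names are just placeholders), you can list an image's read-only layers and see what a container has written to its own copy-on-write layer:
# one row per image layer, newest on top
docker history httpd:2.4
# files added (A), changed (C) or deleted (D) in the container's writable layer
docker diff mycontainer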
Regarding your specific questions:
If you don't mount any volumes, would it keep any/some/most/all of the container filesystem in RAM?
Nope, there's nothing in RAM (besides the usual filesystem caches). It's all in the overlay filesystem, which is mounted using AUFS, DeviceMapper or another storage driver.
Or does it write some amount of it to disk, meaning it would be just as slow as a volume in this case?
In general, filesystem access in volumes is more performant than in the overlay filesystem. After all, a volume (at least a regular host-based volume, leaving aside volume drivers that add network storage) is simply a bind mount to a regular directory in the host filesystem, bypassing the layered filesystem entirely. The performance of volumes in comparison to the layered filesystem is (among other topics) investigated in this paper:
AUFS introduces significant overhead which is not surprising since I/O is going through several layers, [...]. Applications that are filesystem or disk intensive should bypass AUFS by using volumes. [...] Although containers themselves have almost no overhead, Docker is not without performance gotchas. Docker volumes have noticeably better performance than files stored in AUFS.
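To see the bypass in practice, a named volume is just a directory on the host, outside the layered filesystem (the volume and image names below are placeholders):
docker volume create appdata
docker run -d --name worker -v appdata:/data myapp:latest
# writes to /data skip the storage driver and land directly in this host directory
ls /var/lib/docker/volumes/appdata/_data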

Clean docker environment: devicemapper

I have a docker environment with 2 containers (Jenkins and Nexus, both with their own named volume).
I have a daily cron-job which deletes unused containers and images. This is working fine. But the problem is inside my devicemapper:
du -sh /var/lib/docker/
30G docker/
I checked each folder in my docker folder:
Volumes (big, but that's normal in my case):
/var/lib/docker# du -sh volumes/
14G volumes/
Containers:
/var/lib/docker# du -sh containers/
3.2M containers/
Images:
/var/lib/docker# du -sh image/
5.8M image/
Devicemapper:
/var/lib/docker# du -sh devicemapper/
16G devicemapper/
/var/lib/docker/devicemapper/mnt is 7.3G
/var/lib/docker/devicemapper/devicemapper is 8.1G
Docker info:
Storage Driver: devicemapper
Pool Name: docker-202:1-xxx-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 5.377 GB
Data Space Total: 107.4 GB
Data Space Available: 28.8 GB
Metadata Space Used: 6.148 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.141 GB
Udev Sync Supported: true
What is this space and am I able to clean this without breaking stuff?
Don't use a devicemapper loop file for anything serious! Docker has big warnings about this.
The /var/lib/docker/devicemapper/devicemapper directory contains the sparse loop files that hold all the data Docker mounts, so you would need to use LVM tools to trawl around in them and do things. Have a read through the removal issues with devicemapper; they are kinda sorta resolved, but maybe not.
I would move away from devicemapper where possible, or use LVM thin pools on anything RHEL based. If you can't change storage drivers, the procedure below will at least clear up any allocated sparse space you can't otherwise reclaim.
Changing the docker storage driver
Changing the storage driver requires dumping your /var/lib/docker directory, which contains all your Docker data. There are ways to save portions of it, but that involves messing around with Docker internals. Better to commit and export any containers or volumes you want to keep and import them after the change (a sketch of the export/import commands follows the steps below). Otherwise you will have a fresh, blank Docker install!
Export data
Stop Docker
Remove /var/lib/docker
Modify your docker startup to use the new storage driver.
Set --storage-driver=<name> in /lib/systemd/system/docker.service or /etc/systemd/system/docker.service or /etc/default/docker or /etc/sysconfig/docker
Start Docker
Import Data
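A sketch of the export and import steps, assuming an image called myapp:1.0 and a named volume called myvolume (substitute your own names):
# before wiping /var/lib/docker: save images and archive volume contents
docker save -o /backup/myapp.tar myapp:1.0
docker run --rm -v myvolume:/data -v /backup:/backup busybox tar czf /backup/myvolume.tgz -C /data .
# after restarting Docker with the new storage driver: load everything back
docker load -i /backup/myapp.tar
docker volume create myvolume
docker run --rm -v myvolume:/data -v /backup:/backup busybox tar xzf /backup/myvolume.tgz -C /data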
AUFS
AUFS is not in the mainline kernel (and never will be), which means distros have to actively include it somehow. For Ubuntu it's in the linux-image-extra packages.
apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual
Then change the storage driver option to --storage-driver=aufs
OverlayFS
OverlayFS is already available in Ubuntu; just change the storage driver to --storage-driver=overlay2, or --storage-driver=overlay if you are still using a 3.x kernel.
I'm not sure how good an idea this is right now, but it can't be much worse than the loop file. The overlay2 driver is pretty solid for dev use but isn't considered production ready yet (e.g. Docker Enterprise doesn't provide support for it), though it is being pushed to become the standard driver due to the AUFS/kernel issues.
Direct LVM Thin Pool
Instead of the devicemapper loop file you can use an LVM thin pool directly. RHEL makes this easy with the docker-storage-setup utility that is distributed with their EPEL docker package. Docker has detailed steps for setting up the volumes manually.
--storage-driver=devicemapper \
--storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool \
--storage-opt dm.use_deferred_removal=true
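On newer daemons the same options are usually placed in /etc/docker/daemon.json instead of the command line; a minimal sketch assuming the same thin pool device:
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.thinpooldev=/dev/mapper/docker-thinpool",
    "dm.use_deferred_removal=true"
  ]
}
EOF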
Docker 17.06+ supports managing simple direct-lvm block device setups for you.
Just don't run out of space in the LVM volume, ever. You'll end up with an unresponsive Docker daemon that needs to be killed, and LVM resources that are still in use and hard to clean up.
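To keep an eye on the pool before it fills up, the thin pool usage can be watched with lvs (this assumes the volume group is called docker, as in the direct-lvm walkthrough):
# data_percent/metadata_percent show how full the thin pool is
lvs -o lv_name,data_percent,metadata_percent docker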
A periodic docker system prune -a works for me on systems where I use devicemapper and not the LVM thinpool. The pattern I use is:
I label any containers, images, etc. with the label "protected" if I want them to be exempt from cleanup
I then periodically run docker system prune -a --filter=label!=protected (either manually or on cron with -f)
Labeling examples:
docker run --label protected ...
docker create --label=protected=true ...
For images, use Dockerfile's LABEL instruction, e.g. LABEL protected=true
To add a label to an existing image that I cannot easily rebuild, I make a two-line Dockerfile with the above, build a new image, then switch the new image for the old one (re-tag it).
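For example, such a relabelling build might look like this (the image name is a placeholder); the two-line Dockerfile is fed on stdin and the result is tagged over the old name:
docker build -t myregistry/legacy-app:1.0 - <<'EOF'
FROM myregistry/legacy-app:1.0
LABEL protected=true
EOF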
General Docker label documentation
First, what is devicemapper (official documentation)
Device Mapper has been included in the mainline Linux kernel since version 2.6.9 [in 2004]. It is a core part of the RHEL family of Linux distributions.
The devicemapper driver stores every image and container on its own virtual device. These devices are thin-provisioned copy-on-write snapshot devices.
Device Mapper technology works at the block level rather than the file level. This means that devicemapper storage driver's thin provisioning and copy-on-write operations work with blocks rather than entire files.
The devicemapper is the default Docker storage driver on some Linux distributions.
Docker hosts running the devicemapper storage driver default to a configuration mode known as loop-lvm. This mode uses sparse files to build the thin pool used by image and container snapshots.
Docker 1.10 [from 2016] and later no longer matches image layer IDs with directory names in /var/lib/docker.
However, there are two key directories.
The /var/lib/docker/devicemapper/mnt directory contains the mount points for image and container layers.
The /var/lib/docker/devicemapper/metadata directory contains one file for every image layer and container snapshot.
If your docker info shows that the Storage Driver is devicemapper (and not aufs), proceed with caution around those folders.
See for instance issue 18867.
I faced the same issue, where the /var/lib/docker/devicemapper/devicemapper/data file had reached ~91% of the root volume (~45G of 50G). I tried removing all the unwanted images and deleting volumes; nothing helped in reducing this file.
After some googling I understood that the "data" file is a loopback-mounted sparse file which Docker uses to store the mount locations and the files we would have stored inside the containers.
Finally I removed all the containers that had been run before and were stopped:
Warning: Deletes all docker containers
docker rm $(docker ps -aq)
This reduced the devicemapper file significantly. Hope this helps.

Docker - Disk Quotas

I understand that docker containers have a maximum of 10GB of disk space with the Device Mapper storage driver by default.
In my case, I have a worker container and a data volume container. I then link them together using "--volumes-from". I also use the Device Mapper storage driver.
For example, I did a test by creating a script to download a 20GB file onto my data volume container, and that worked successfully.
My question, is there a 10GB quota for the data volume container? How did the 20GB file download successfully if it is limited by 10GB?
Volumes are outside of the Union File System by definition, so any data in them will not count towards the devicemapper 10GB limit. By default volumes are stored under /var/lib/docker/vfs if you don't specify a mount point.
You can find the exactly where your volumes are on the host by using the docker inspect command e.g:
docker inspect -f {{.Volumes}} CONTAINER
You will get a result like:
map[/CONTAINER/VOLUME:/var/lib/docker/vfs/dir/5a6f7b306b96af38723fc4d31def1cc515a0d75c785f3462482f60b730533b1a]
Where the path after the colon is the location of the volume on the host.
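On more recent Docker versions the .Volumes field no longer exists; the same information is available under .Mounts:
docker inspect -f '{{json .Mounts}}' CONTAINER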

Can docker "instance" run totally in ram?

I would like to run a docker instance in RAM... totally inside RAM... using tmpfs.
Can it be done?
I'm not sure how Docker uses filesystems; I'm too used to KVM and Xen, which both need a default size set up before they can be used.
So how does "docker fs" work?
This can be done. If you mount /var/lib/docker on a tmpfs, Docker can use other storage backends, like OverlayFS, on top of it.
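A minimal sketch (the size is arbitrary): stop the daemon, mount a tmpfs over /var/lib/docker, and start it again. Everything Docker writes then lives in RAM and is gone after an unmount or reboot.
systemctl stop docker
mount -t tmpfs -o size=8g tmpfs /var/lib/docker
systemctl start docker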
Docker uses what it calls a "Union File System", made up of multiple read-only layers with a copy-on-write layer on top (see http://docs.docker.com/terms/layer/). It can use one of several storage drivers for this (in order of preference): AUFS, BTRFS, devicemapper, overlayfs or VFS. Because of this, no, I don't think you will be able to use tmpfs.
More information at https://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/

Do I need a private docker registry?

I've recently discovered docker. It looks very useful for us.
But what I don't understand is the role of the registry beyond getting initial docker images. We'll likely be starting with some images based on those from docker.io, but will be customizing those and adding some private closed source software.
What concerns me is that, if the images were large enough, I could run out of space on my / drive.
Can /var/lib/docker just be a mount to a shared file system like cephfs or nfs?
I'm also interested in using CoreOS in a PXE or iPXE configuration. It appears that in that scenario / is mounted as a tmpfs of up to 50% of RAM, which is needlessly wasteful for pulling images that could be available on a shared file system. However, I've read comments that for some reason /var/lib/docker needs to be on btrfs. Is this true? Why?
OK, I've found an answer to my last question. CoreOS requires /var/lib/docker to be mounted on btrfs because it uses the btrfs backend. This backend uses btrfs snapshots to implement the layers Docker uses to represent its images.
That helps with my second question: can /var/lib/docker just be a mount to a shared file system? By the looks of it, no, not unless the super slow vfs backend is used.
It's easy and cheap to store your registry in S3.
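As a sketch, the official registry image can be pointed at an S3 bucket through its environment-variable configuration (the bucket, region and credentials below are placeholders; an instance IAM role can replace the keys):
docker run -d -p 5000:5000 --name registry \
  -e REGISTRY_STORAGE=s3 \
  -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
  -e REGISTRY_STORAGE_S3_BUCKET=my-registry-bucket \
  -e REGISTRY_STORAGE_S3_ACCESSKEY=AKIA... \
  -e REGISTRY_STORAGE_S3_SECRETKEY=... \
  registry:2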
I would recommend against mounting /var/lib/docker on NFS. If someone hammers the NFS, all your services will essentially stop working, since the containers' filesystems live there.

Resources