How is non-volume data for a Docker container stored?

I understand Docker volumes and the way they refer to directories on the host. What about the rest of the filesystem within a container?
To think about it a different way: suppose you have a server with most of the storage on a remote drive, meaning reads and writes take longer than usual. If you don't mount any volumes, would it keep any/some/most/all of the container filesystem in RAM? Or does it write some amount of it to disk, meaning it would be just as slow as a volume in this case?

Non-volume data is stored in a layered filesystem managed by Docker's storage driver. When this answer was written, most distributions used either AUFS or DeviceMapper; current Docker releases default to overlay2 on most Linux distributions. The principle is the same in all cases.
As already mentioned in the comments, I recommend reading the section "Understand images, containers and storage drivers" in the official documentation. This answer is just a short summary.
Each Docker image consists of multiple layers of filesystem images. For example, an Apache+PHP image might consist of (1) a generic Ubuntu base layer, (2) an additional layer with the Apache HTTP server installed, and (3) another layer on top with PHP-FPM and configuration files.
When you start a new container from an image, a new writable per-container layer is added on top of the existing read-only image layers. This layer contains all changes that are written within the container itself (to non-volume directories).
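To make the lookup and copy-on-write rules concrete, here is a toy model in bash that uses ordinary directories as layers. It is only an illustration under simplifying assumptions: the layer names, file contents and the ".whiteout" marker are hypothetical, and real storage drivers such as overlay2 or AUFS do this in the kernel (overlay2, for instance, marks a deleted file with a special character device in the upper layer rather than a marker file like this).

```shell
#!/usr/bin/env bash
# Toy model of read-only image layers plus a per-container writable layer.
# Illustration only: real storage drivers implement this in the kernel.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
rw="$root/container-rw"                             # per-container writable layer
mkdir -p "$root/ubuntu/etc" "$root/apache/etc" "$rw"
echo "base motd"   > "$root/ubuntu/etc/motd"        # hypothetical image contents
echo "apache conf" > "$root/apache/etc/httpd.conf"

# Lookup order: writable layer first, then image layers from top to bottom.
layers=("$rw" "$root/apache" "$root/ubuntu")

# Print the topmost visible copy of a file; fail if it is missing or deleted.
read_file() {
  local layer
  for layer in "${layers[@]}"; do
    if [[ -f "$layer/$1" ]]; then cat "$layer/$1"; return 0; fi
    if [[ -e "$layer/$1.whiteout" ]]; then return 1; fi   # hides lower layers
  done
  return 1
}

# Append to a file: copy it up into the writable layer first, then modify
# only that copy. The read-only image layers are never touched.
append_file() {
  local tmp
  mkdir -p "$(dirname "$rw/$1")"
  if [[ ! -f "$rw/$1" ]] && read_file "$1" >/dev/null; then
    tmp=$(mktemp)
    read_file "$1" > "$tmp"      # read from lower layers before creating the copy
    mv "$tmp" "$rw/$1"           # copy-up
  fi
  echo "$2" >> "$rw/$1"
  rm -f "$rw/$1.whiteout"
}

# Delete a file: image layers are read-only, so record a whiteout instead.
delete_file() {
  mkdir -p "$(dirname "$rw/$1")"
  rm -f "$rw/$1"
  touch "$rw/$1.whiteout"
}
```

After append_file etc/motd "changed", only container-rw/etc/motd holds the change and the ubuntu layer still contains the original. On a real host using overlay2, docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' CONTAINER prints the directory that holds a container's writable layer.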
Regarding your specific questions:
If you don't mount any volumes, would it keep any/some/most/all of the container filesystem in RAM?
Nope, there's nothing in RAM (besides the usual filesystem caches). It's all in the overlay filesystem, which is mounted using AUFS, DeviceMapper or another storage driver.
Or does it write some amount of it to disk, meaning it would be just as slow as a volume in this case?
In general, filesystem access in volumes is faster than in the layered filesystem. After all, a volume (at least a regular host-based volume, leaving aside volume drivers that add network storage volumes) is simply a bind mount of a regular directory in the host filesystem, bypassing the layered filesystem entirely. The performance of volumes compared to the layered filesystem is investigated (among other topics) in this paper:
AUFS introduces significant overhead which is not surprising since I/O is going through several layers, [...]. Applications that are filesystem or disk intensive should bypass AUFS by using volumes. [...] Although containers themselves have almost no overhead, Docker is not without performance gotchas. Docker volumes have noticeably better performance than files stored in AUFS.
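If you want to check the difference on your own hardware, a crude comparison is to time the same write inside one container, once to a volume path and once to a path in the container's own filesystem. This sketch assumes GNU dd and GNU date (for nanosecond timestamps), as found in Debian- or Ubuntu-based images. A single sequential write is only a rough proxy; a dedicated tool such as fio gives more meaningful numbers.

```shell
#!/usr/bin/env bash
# Crude write benchmark: time writing N MiB to a directory, fsync included.
# Assumes GNU dd (status=none, conv=fsync) and GNU date (%N nanoseconds).
set -euo pipefail

# Usage: write_bench DIR SIZE_MIB   -> prints elapsed milliseconds
write_bench() {
  local dir=$1 mib=$2 file start end
  file="$dir/bench.$$"
  start=$(date +%s%N)
  dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fsync status=none
  end=$(date +%s%N)
  rm -f "$file"                  # leave the target directory as we found it
  echo $(( (end - start) / 1000000 ))
}
```

Saved as bench.sh (a hypothetical file name), it could be run against a named volume (benchvol, also hypothetical) and against /tmp, which lives in the container's writable layer by default:
$ docker run --rm -v benchvol:/data -v "$PWD/bench.sh:/bench.sh:ro" ubuntu bash -c 'source /bench.sh; write_bench /data 512; write_bench /tmp 512'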

Related

Can a Docker volume do transparent decompression for read-only files?

tl;dr What are best practices for exposing read-only data to a Dockerized app, with transparent decompression?
I have a Dockerized app which needs read-only access to a set of data files totalling 80 GB. Currently, I manually copy these to hosts, and expose them via a bind-mount. I would like to migrate these files to a Docker volume, but their size is an issue.
These files compress well, down to 15 GB. Is it possible to take advantage of that compression in the Docker volume, to avoid eagerly decompressing the full 80 GB to the host? For example, can a Docker volume use SquashFS or similar for on-demand decompression?
Things I've tried:
Using the btrfs storage driver, but this requires that the host be configured with a dedicated block device.
Using fuse-zip and mounting inside the container, but this is very hacky and requires granting the container the additional SYS_ADMIN capability.
In case it's relevant, the files are accessed linearly, not random-access. Thank you for any help!

Docker backing filesystem on AWS EFS

I am mounting an AWS EFS filesystem on /var/lib/docker and using it as Docker's default backing filesystem, with the overlay2 storage driver. I see in the docs that overlay2 only supports xfs and ext4 as backing filesystems. My aim is to mount this backing filesystem on multiple machines so that all of them have the image data, but AWS EBS (formatted as ext4, a supported backing filesystem for overlay2) does not support being mounted on multiple machines. One option would be to pull the images onto an ext4 filesystem and copy the image data into EFS, but that would be too time-consuming. What else could I do?
The short answer is "don't do that" because /var/lib/docker is not designed to be shared by multiple daemons. You'll find race conditions, erroneous output about networks and containers that don't exist locally, and other errors that won't be fixed/supported.
Instead, put a registry near your cluster, in the same VPC/AZ, and have your nodes pull from that registry. Or have a look at the work done to support eStargz in runtimes like containerd, which can start running a container before its layers are completely pulled.

Docker bind mount directory vs named volume performance comparison

Is there any performance difference between Docker named volumes and bind mounts, as in the examples below? If so, how big a difference are we talking about?
Docker volume example:
docker run -v mysql:/var/lib/mysql mysql:tag
Docker bind mount example:
docker run -v /path/to/mysql-data:/var/lib/mysql mysql:tag
These containers are mostly used for databases such as Elasticsearch, MySQL and MongoDB. Which one should I prefer?
On a couple of platforms (macOS, and Windows with WSL 2) bind mounts are known to be especially slow.
Beyond that, you shouldn't see a perceptible performance difference between named volumes, the container filesystem, files in the image (regardless of the number of layers), or bind mounts (particularly on native Linux).
A good general rule might be to use bind mounts for config files and log files, where I/O is relatively rare but you as a human need to access the files directly; named volumes for database storage and other content where I/O is relatively frequent but as a human you can't directly read the files; and the image itself for your application code.
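A sketch of that rule of thumb for a MySQL container, using the --mount syntax: a read-only bind mount for a hand-edited config file and a named volume for the database files, with the server itself coming from the image. The volume name mysql-data, the host file ./my.cnf and the mysql:8.0 tag are hypothetical choices; setting DOCKER=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Rule of thumb applied to MySQL: bind mount for a hand-edited config file,
# named volume for the data files; the server itself comes from the image.
# Names (mysql-data, ./my.cnf, mysql:8.0) are hypothetical choices.
set -euo pipefail

run_mysql() {
  local docker=${DOCKER:-docker}   # set DOCKER=echo to print the command instead
  "$docker" run -d --name mysql \
    -e MYSQL_ROOT_PASSWORD="${MYSQL_ROOT_PASSWORD:?set MYSQL_ROOT_PASSWORD first}" \
    --mount type=bind,src="$PWD/my.cnf",dst=/etc/mysql/conf.d/my.cnf,readonly \
    --mount type=volume,src=mysql-data,dst=/var/lib/mysql \
    mysql:8.0
}
```

With --mount, Docker refuses to start the container if the bind-mount source does not exist (unlike -v, which creates an empty directory), so create my.cnf first. Named volumes are created on first use, and docker volume inspect mysql-data shows where the data lives on the host.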

Docker: in memory file system

I have a Docker container which does a lot of reading and writing to disk. I would like to test what happens when my entire Docker filesystem is in memory. I have seen some answers here saying it won't be a real performance improvement, but this is for testing.
The ideal solution I would like to test shares the common parts of each image and copies them into memory when needed.
Files each container creates at runtime should also be in memory, kept separate per container. They shouldn't take up more than 5 GB when idle and up to 7 GB while processing.
Simple solutions would duplicate all shared files (even the parts of the OS you never use) for each container.
There's no difference between the storage of the image and the base filesystem of the container: the layered FS accesses the image's layers directly as read-only (RO) layers, with the container using a read-write (RW) layer above them to catch any changes. Therefore your goal of having the container run in memory while the Docker installation remains on disk doesn't have an easy implementation.
If you know where your RW activity is occurring (it's fairly easy to check with docker diff on a running container), the best option in my view would be a tmpfs mounted at that location in your container, which Docker supports natively (from the docker run reference):
$ docker run -d --tmpfs /run:rw,noexec,nosuid,size=65536k my_image
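To find where the RW activity happens, you can summarise docker diff output by directory. This hypothetical helper (the name diff_hotspots and the default grouping depth of two path components are my own choices) counts added (A) and changed (C) entries per directory prefix, busiest first, and ignores deletions (D).

```shell
#!/usr/bin/env bash
# Summarise "docker diff" output: count added (A) and changed (C) entries per
# directory prefix, busiest first. Deleted (D) entries are ignored.
set -euo pipefail

# Usage: docker diff CONTAINER | diff_hotspots [DEPTH]
diff_hotspots() {
  local depth=${1:-2}   # number of leading path components to group by
  awk -v depth="$depth" '
    $1 == "A" || $1 == "C" {
      path = substr($0, 3)              # docker diff prints "<kind> <path>"
      n = split(path, parts, "/")       # "/var/log/x" -> "", "var", "log", "x"
      prefix = ""
      for (i = 2; i <= n && i <= depth + 1; i++) prefix = prefix "/" parts[i]
      count[prefix]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -k1,1nr -k2,2
}
```

For example, if /var/log turned out to be the busiest directory of a container named my_container (a hypothetical name):
$ docker diff my_container | diff_hotspots
$ docker run -d --tmpfs /var/log:rw,size=65536k my_image
Keep in mind that a tmpfs mount starts empty and hides whatever the image had at that path, so pick directories whose initial contents the application does not need.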
By default, Docker stores image, container, and volume data in its data directory (/var/lib/docker on Linux). A container's filesystem is made up of the original image and the 'container layer'.
You might be able to set this up using a RAM disk: hard-allocate some RAM, format it with the filesystem of your choice, and mount it. Then move your Docker installation to the mounted RAM disk and symlink it back to the original location.
Setting up a Ram Disk
Best way to move the Docker directory
Obviously this is only useful for testing, as Docker and its images, volumes, containers, etc. would be lost on reboot.
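The RAM-disk steps can be sketched as a script. It assumes a systemd-managed Docker daemon with the default data directory /var/lib/docker, root privileges, rsync installed, and enough RAM for all images and containers. It swaps in tmpfs as the RAM-backed filesystem, since tmpfs needs no mkfs step (a brd or zram block device formatted with ext4 would be closer to the description above), and it assumes the tmpfs provides the extended-attribute support overlay2 relies on. The mount point /mnt/docker-ram and the 8G size are hypothetical defaults; DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: move Docker's data directory onto a RAM-backed tmpfs (testing only).
# Assumes root, systemd, rsync, and the default data dir /var/lib/docker.
# Everything under the tmpfs is lost on reboot or unmount.
set -euo pipefail

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

move_docker_to_ram() {
  local data_dir=/var/lib/docker
  local ram_dir=${RAM_DIR:-/mnt/docker-ram}   # hypothetical mount point
  local ram_size=${RAM_SIZE:-8G}              # must hold images + containers

  run systemctl stop docker.service docker.socket   # socket activation could restart it
  run mkdir -p "$ram_dir"
  run mount -t tmpfs -o size="$ram_size" tmpfs "$ram_dir"
  run rsync -aHX "$data_dir/" "$ram_dir/docker/"    # -H hard links, -X xattrs
  run mv "$data_dir" "$data_dir.disk"               # keep the on-disk copy
  run ln -s "$ram_dir/docker" "$data_dir"           # symlink back to the original path
  run systemctl start docker.service
}
```

Preview with DRY_RUN=1 move_docker_to_ram after sourcing the file, then run it as root. To undo it, stop Docker, remove the symlink, move /var/lib/docker.disk back, unmount the tmpfs and start Docker again. Instead of a symlink, the data-root option in /etc/docker/daemon.json can also point the daemon at the new directory.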

Do I need a private docker registry?

I've recently discovered docker. It looks very useful for us.
But what I don't understand is the role of the registry beyond getting initial docker images. We'll likely be starting with some images based on those from docker.io, but will be customizing those and adding some private closed source software.
What concerns me is that if the images were large enough, I could run out of space on my / drive.
Can /var/lib/docker just be a mount to a shared file system like cephfs or nfs?
I'm also interested in using CoreOS in a PXE or iPXE configuration. It appears that in that scenario / is mounted as tmpfs using up to 50% of RAM, which is needlessly wasteful for pulling images that could be available on a shared file system. However, I've read comments that for some reason /var/lib/docker needs to be on btrfs. Is this true? Why?
OK, I've found an answer to my last question. CoreOS requires /var/lib/docker to be mounted on btrfs because it uses the btrfs backend. This backend uses btrfs snapshots to implement the layers Docker uses to represent its images.
That also helps with my second question: can /var/lib/docker just be a mount to a shared file system? By the looks of it, no, not unless the very slow vfs backend is used.
It's easy and cheap to store your registry in S3.
I would recommend against mounting /var/lib/docker on NFS. If someone hammers the NFS server, all your services will essentially stop working, since the containers' filesystems live there.
