In Docker, how are the storage driver and the backing filesystem different?

The docker info command lists both a storage driver (e.g. devicemapper) and a backing filesystem (e.g. XFS). What do these two attributes mean, and how do they differ from each other?

The "storage driver" is the software component that Docker uses to manage storage. This may be one of the overlay drivers, which use the overlay filesystem driver in the kernel, or the devicemapper driver, which allocates chunks of storage using the Linux device mapper, or any of several other drivers.
At some level, all of these drivers need to store files, which means they need to use a filesystem. In the case of overlay-type drivers (like overlay, overlay2, aufs) this is an existing filesystem on your host. For the devicemapper driver (and similar drivers that operate on block storage), this is a filesystem created on the block devices that Docker carves out of the devicemapper storage. The "backing filesystem" is the filesystem being used to store the files, and will be something like "XFS" or "ext4", etc.

Some of the Docker storage drivers use regular files on top of an existing file system. aufs, overlay, overlay2 and devicemapper in loop-lvm mode all work via a formatted "backing filesystem".
btrfs, zfs and devicemapper in direct-lvm mode use a volume/device directly so there is no formatted file system in between the Docker container and the actual device being used. A file system is still required for Docker to store data on devices so it will create one using the reported "backing filesystem".
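To see the two attributes side by side, here is a minimal Python sketch that pulls them out of the text printed by docker info. The function name storage_summary and the SAMPLE text are made up for illustration; only the "Storage Driver" and "Backing Filesystem" labels come from docker info itself, and the exact indentation of real output may differ.

```python
def storage_summary(info_text):
    """Return (storage driver, backing filesystem) found in `docker info` text.

    Either value is None when the corresponding line is missing, e.g. for
    drivers that do not report a backing filesystem.
    """
    fields = {}
    for line in info_text.splitlines():
        # Each interesting line looks like "  Key: value".
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields.get("storage driver"), fields.get("backing filesystem")


# Hypothetical excerpt of `docker info` output, used only as sample input.
SAMPLE = """\
Server:
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
"""

print(storage_summary(SAMPLE))  # ('overlay2', 'xfs')
```

In practice the input would be the captured output of docker info rather than a hard-coded string.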

Related

How can I manipulate storage devices outside of Docker?

I'd like to spin up an Ubuntu image with certain tools like testdisk for disk recovery. How can I manage all detected volumes on the host machine with testdisk inside a Docker container?
The O'Reilly info worked for Windows, with some reported limitations (inability to repartition). I'm assuming that if you use Disk Management to see the disk number (0, 1, 2, etc.), it will correspond to the sd# you have to reference. Supposedly, with Windows Server editions you can use the device flag and specify a device class GUID to share a device inside Docker. But, as previously mentioned, this isn't raw access but rather a shared device.

Is a Docker storage driver persistent storage?

I'm new to Docker and I'm trying to understand persistent storage in Docker.
In the section Manage Application data > Store data within containers > About storage drivers (https://docs.docker.com/storage/storagedriver/), the documentation says:
"Storage drivers allow you to create data in the writable layer of your container. The files won't be persisted after the container is deleted, and both read and write speeds are lower than native file system performance."
But later, in the section Manage Application data > Store data within containers > Use the Device Mapper storage driver (https://docs.docker.com/storage/storagedriver/device-mapper-driver/), they use direct-lvm, which creates logical volumes that allow data to persist.
My question: does using a storage driver mean that
the container-generated data is ephemeral?
the container-generated data is ephemeral if we are using a logical volume on a loopback device?
the container-generated data is persistent if we are using a logical volume on a block device?
The storage driver configuration is essentially an install-time setting that's not really relevant once you've gotten it set up correctly. In particular if you run docker info and it says it's using an overlay2 driver I would recommend closing this particular browser tab and not changing anything.
Of the paragraph you quoted, the important thing to take away is that files you create inside a container, that aren't inside a volume directory, will be lost as soon as the container is deleted. It doesn't matter what underlying storage driver you're using. The performance differences between the container filesystem, named volumes, and bind mounts almost never matter (except on macOS hosts, where bind mounts are very, very slow).
The data the storage driver persists includes both the temporary container filesystems (they get persisted until the container is deleted) and the underlying image data. It does not include named Docker volumes or other bind-mounted host directories.
If you're using devicemapper, you might see if you can upgrade your host to a newer Linux distribution that can use the overlay2 driver. In particular that avoids the fixed space limit of the devicemapper driver. If you must use devicemapper, general wisdom has been that using a dedicated partition for it is better than using a file. As I said up front, though, this is essentially install-time configuration and has no bearing on your application or docker run commands.

How to bypass memory caching while using FIO inside of a docker container?

I am trying to benchmark I/O performance on my host and in a Docker container using the flexible I/O tester (fio) with O_DIRECT enabled in order to bypass memory caching. The result is very suspicious: Docker performs almost 50 times better than my host machine, which is impossible. It seems like Docker is not bypassing the cache at all, even if I run the container in --privileged mode. This is the command I ran inside the container. Any suggestions?
fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=10G --runtime=600 --group_reporting --output-format=json >/home/docker/docker_seqread_4k.json
(Note this isn't really a programming question, so Stack Overflow is the wrong place to ask it. Maybe Super User or Server Fault would be a better choice and get faster answers?)
"The result is very suspicious. docker performs almost 50 times better than my host machine which is impossible. It seems like docker is not bypassing the caching at all."
If your best case latencies are suspiciously small compared to your worst case latencies it is highly likely your suspicions are well founded and that kernel caching is still happening. Asking for O_DIRECT is a hint not an order and the filesystem can choose to ignore it and use the cache anyway (see the part about "You're asking for direct I/O to a file in a filesystem but...").
If you have the option and you're interested in disk speed, it is better to do any such test outside of a container (with all the caveats that implies). Another option, when you can't or don't want to disable caching, is to ensure that you do I/O that is at least two to three times the size of RAM (both in terms of the amount of I/O and the region being used), so that the majority of I/O can't be satisfied by buffers/cache (and if you're doing write I/O then do something like end_fsync=1 too).
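The sizing rule above can be sketched in Python as follows. This is a rough illustration, not a tuned benchmark recipe: the function name fio_args, the default job name "bench" and the default factor of 3 are assumptions, while the fio options themselves (--rw, --direct, --ioengine, --bs, --size, --end_fsync) are the ones already mentioned in the question and the answer.

```python
def fio_args(ram_bytes, rw="read", factor=3, block_size="4k", name="bench"):
    """Build a fio command line whose I/O region is `factor` times RAM.

    `ram_bytes` is supplied by the caller (for example read from the host),
    so that most of the I/O cannot be served from the page cache.
    """
    gib = 2**30
    total = ram_bytes * factor
    size_gib = (total + gib - 1) // gib  # round up to whole GiB
    args = [
        "fio",
        f"--name={name}",
        f"--rw={rw}",
        "--direct=1",          # still only a hint, as explained above
        "--ioengine=libaio",
        f"--bs={block_size}",
        f"--size={size_gib}G",
    ]
    # Any workload that is not purely reading also writes data.
    if rw not in ("read", "randread"):
        args.append("--end_fsync=1")  # flush written data before the job ends
    return args


# Example: a host with 8 GiB of RAM gets a 24G test region.
print(" ".join(fio_args(8 * 2**30)))
```

The returned list can be passed to subprocess.run, or joined into a single line and pasted into a shell inside the container.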
In summary, the filesystem being used by docker may make it impossible to accurately do what you're requesting (measure the disk speed by bypassing cache while using whatever your default docker filesystem is).
Why a Docker benchmark may not give the results you expect
The Docker engine uses, by default, the OverlayFS [1][2] driver for data storage in containers. It assembles all of the different layers from the images and makes them readable. Writing is always done to the "top" layer, which is the container storage.
When performing reads and writes to the container's filesystem, you're passing through Docker's overlay2 driver, through the OverlayFS kernel driver, through your filesystem driver (e.g. ext4) and onto your block device. Additionally, as Anon mentioned, DIRECT/O_DIRECT is just a hint, and may not be respected by any of the layers you're passing through.
Getting more accurate results
To get accurate benchmarks within a Docker container, you should write to a volume mount or change your storage driver to one that is not overlaid, such as the Device Mapper driver or the ZFS driver.
Both the Device Mapper driver and the ZFS driver require a dedicated block device (you'll likely need a separate hard drive), so using a volume mount might be the easiest way to do this.
Use a volume mount
Use the -v option with a directory that sits on a block device on your host.
docker run -v /absolute/host/directory:/container_mount_point alpine
Use a different Docker storage driver
Note that the storage driver must be changed on the Docker daemon (dockerd) and cannot be set per container. From the documentation:
Important: When you change the storage driver, any existing images and containers become inaccessible. This is because their layers cannot be used by the new storage driver. If you revert your changes, you can access the old images and containers again, but any that you pulled or created using the new driver are then inaccessible.
With that disclaimer out of the way, you can change your storage driver by editing daemon.json and restarting dockerd.
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/sd_",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1",
    "dm.thinp_autoextend_threshold=80",
    "dm.thinp_autoextend_percent=20",
    "dm.directlvm_device_force=false"
  ]
}
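If daemon.json already contains other settings, editing it by hand risks dropping them. The following Python sketch shows one way to merge the storage driver keys into existing content. The function name with_storage_driver is hypothetical, the "log-driver" entry is just an example of an unrelated setting, and /dev/sd_ is the same placeholder as above, not a real device.

```python
import json


def with_storage_driver(config_text, driver, storage_opts=None):
    """Return daemon.json text with the storage driver set to `driver`.

    Unrelated keys already present in `config_text` are preserved.
    An empty `config_text` is treated as an empty configuration.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config["storage-driver"] = driver
    if storage_opts:
        config["storage-opts"] = list(storage_opts)
    else:
        # Options of a previous driver would not apply to the new one.
        config.pop("storage-opts", None)
    return json.dumps(config, indent=2) + "\n"


existing = '{"log-driver": "json-file"}'  # example of a pre-existing setting
print(with_storage_driver(
    existing,
    "devicemapper",
    ["dm.directlvm_device=/dev/sd_", "dm.thinp_percent=95"],
))
```

The result still has to be written back to daemon.json and dockerd restarted, with the same caveat about existing images and containers becoming inaccessible.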
Additional container benchmark notes - kernel
If you are trying to compare different flavors of Linux, keep in mind that Docker is still running on your host machine's kernel.

Do Docker containers on the same host machine share the same page cache?

If I have two Docker containers running on the same host machine do they each have their own page cache or do they use the page cache of the host machine?
Page cache is managed by the kernel, which is used by all the containers.
See more at moby/moby issue 21759
Docker makes it easy to spawn a lot of containers and get better density, but it also makes it easy to run too many services on one machine or to run services which require way too much RAM.
The official documentation lists devicemapper (direct-lvm) as a production ready storage driver, but it doesn't have very efficient memory usage. The official documentation doesn't state otherwise either. Multiple identical containers will increase memory usage for the page cache.
In order to make this better and get better performance, the following should help, in a similar way to how it helps outside of Docker and containers in general:
make containers smaller for long running services & applications (e.g. smaller binaries, smaller images, optimize memory usage, etc)
VERY IMPORTANT: use volumes and bind mounts, instead of storing data inside the container
VERY IMPORTANT: make sure to run a system with a maintained kernel, up to date Docker and devicemapper libraries (e.g. fully updated CentOS 7 / RHEL 7 / Ubuntu 14.04 / Ubuntu 16.04)
Current behaviour (January 2020) is that by default containers on the same host share the same page cache.
Current docker documentation explains:
OverlayFS is a modern union filesystem that is similar to AUFS, but faster and with a simpler implementation. Docker provides two storage drivers for OverlayFS: the original overlay, and the newer and more stable overlay2.
The overlay2 driver is supported on Docker Engine - Community, and Docker EE 17.06.02-ee5 and up, and is the recommended storage driver.
Page Caching. OverlayFS supports page cache sharing. Multiple containers accessing the same file share a single page cache entry for that file. This makes the overlay and overlay2 drivers efficient with memory and a good option for high-density use cases such as PaaS
https://docs.docker.com/storage/storagedriver/overlayfs-driver/

How to limit Docker filesystem space available to container(s)

The general scenario is that we have a cluster of servers and we want to set up virtual clusters on top of that using Docker.
For that we have created Dockerfiles for different services (Hadoop, Spark etc.).
Regarding the Hadoop HDFS service, however, we have the situation that the disk space available to the Docker containers equals the disk space available to the server. We want to limit the available disk space on a per-container basis so that we can dynamically spawn an additional datanode with some storage size to contribute to the HDFS filesystem.
We had the idea to use loopback files formatted with ext4 and mount these on directories which we use as volumes in docker containers. However, this implies a large performance loss.
I found another question on SO (Limit disk size and bandwidth of a Docker container) but the answers are almost 1.5 years old which, given the speed of Docker development, is ancient.
Which approach or storage backend would
limit storage on a per-container basis,
have near bare-metal performance, and
not require repartitioning of the server drives?
You can specify runtime constraints on memory and CPU, but not disk space.
The ability to set constraints on disk space has been requested (issue 12462, issue 3804), but isn't yet implemented, as it depends on the underlying filesystem driver.
This feature is going to be added at some point, but not right away. It's a bit more difficult to add this functionality right now because a lot of chunks of code are moving from one place to another. After this work is done, it should be much easier to implement this functionality.
Please keep in mind that quota support can't be added as a hack to devicemapper, it has to be implemented for as many storage backends as possible, so it has to be implemented in a way which makes it easy to add quota support for other storage backends.
Update August 2016: as shown below, and in issue 3804 comment, PR 24771 and PR 24807 have been merged since then. docker run now allows setting storage driver options per container:
$ docker run -it --storage-opt size=120G fedora /bin/bash
The size option sets the container rootfs size to 120G at creation time.
This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers.
Documentation: docker run/#Set storage driver options per container.
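As a small illustration of the restriction above, this Python sketch builds the docker run command only when the active storage driver is one of the listed ones. The names SIZE_CAPABLE_DRIVERS and run_with_size_limit are made up for this example; the driver would typically be taken from the Storage Driver line of docker info.

```python
# Drivers listed above as supporting the per-container size option.
SIZE_CAPABLE_DRIVERS = {"devicemapper", "btrfs", "overlay2", "windowsfilter", "zfs"}


def run_with_size_limit(image, size, driver, command=()):
    """Return the argv for `docker run` with a rootfs size limit.

    Raises ValueError when `driver` is not one of the drivers that
    accept `--storage-opt size=...`.
    """
    if driver not in SIZE_CAPABLE_DRIVERS:
        raise ValueError(f"storage driver {driver!r} does not support size=")
    return ["docker", "run", "-it", "--storage-opt", f"size={size}", image, *command]


print(" ".join(run_with_size_limit("fedora", "120G", "overlay2", ["/bin/bash"])))
# docker run -it --storage-opt size=120G fedora /bin/bash
```

Note that the driver check alone may not be sufficient: with overlay2, as far as I know, the size option also depends on the backing filesystem (XFS mounted with the pquota option), so the command can still fail at run time.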
