Is a Docker virtual volume fast enough for MariaDB read / write operations?

I have a question regarding MariaDB and Docker. Is it wise to use the volume that is already provided with the official MariaDB-Docker-image? Or is it better to create a folder that is shared with the host for better performance? One of my colleagues was afraid that read / write operations could be too slow in the virtual volume.
In my opinion, read / write should be fast enough on that virtual volume, since Docker just uses the host's Linux kernel, right?
Thank you in advance!

I think you are asking if there is a performance difference between volumes and bind mounts.
The answer is there shouldn't be. Both types bypass the slow copy-on-write storage drivers and are stored directly on the host:
From Performance best practices:
Use volumes for write-heavy workloads: Volumes provide the best and
most predictable performance for write-heavy workloads. This is
because they bypass the storage driver and do not incur any of the
potential overheads introduced by thin provisioning and copy-on-write...
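To make the two options concrete, here is a minimal sketch of starting the official mariadb image once with a named volume and once with a bind mount. The container names, the volume name mariadb_data, the password argument, the host path, and the DOCKER variable (there only to allow a dry run) are assumptions for illustration; /var/lib/mysql and MARIADB_ROOT_PASSWORD are the data directory and variable the official image uses. In both cases the data lives directly on the host filesystem and not in the container's copy-on-write layer.

```shell
# Set DOCKER=echo to print the commands instead of running them (dry run).
DOCKER=${DOCKER:-docker}

# Named volume: Docker creates "mariadb_data" on first use and manages
# its location on the host.
# $1 = root password (hypothetical value supplied by the caller)
run_mariadb_named_volume() {
    "$DOCKER" run -d --name mariadb-vol \
        -e MARIADB_ROOT_PASSWORD="$1" \
        -v mariadb_data:/var/lib/mysql \
        mariadb
}

# Bind mount: you choose the host directory yourself.
# $1 = root password, $2 = absolute host directory
run_mariadb_bind_mount() {
    "$DOCKER" run -d --name mariadb-bind \
        -e MARIADB_ROOT_PASSWORD="$1" \
        -v "$2":/var/lib/mysql \
        mariadb
}
```

Calling run_mariadb_named_volume with a password, or run_mariadb_bind_mount with a password and an absolute host directory, starts the respective container. Which one to prefer is mostly a question of who should manage the directory, not of speed.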

Related

How to bypass memory caching while using FIO inside of a docker container?

I am trying to benchmark I/O performance on my host and in a Docker container using the flexible I/O tester (fio) with O_DIRECT enabled in order to bypass memory caching. The result is very suspicious: Docker performs almost 50 times better than my host machine, which is impossible. It seems like Docker is not bypassing the cache at all, even when I run it in --privileged mode. This is the command I ran inside a container. Any suggestions?
fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=10G --runtime=600 --group_reporting --output-format=json >/home/docker/docker_seqread_4k.json
(Note: this isn't really a programming question, so Stack Overflow is the wrong place to ask it. Maybe Super User or Server Fault would be a better choice and get faster answers?)
"The result is very suspicious. docker performs almost 50 times better than my host machine which is impossible. It seems like docker is not bypassing the caching at all."
If your best-case latencies are suspiciously small compared to your worst-case latencies, it is highly likely that your suspicions are well founded and that kernel caching is still happening. Asking for O_DIRECT is a hint, not an order, and the filesystem can choose to ignore it and use the cache anyway (see the part about "You're asking for direct I/O to a file in a filesystem but...").
If you have the option and you're interested in disk speed, it is better to do any such test outside of a container (with all the caveats that implies). Another option, when you can't or don't want to disable caching, is to ensure that you do I/O that is at least two to three times the size of RAM (both in terms of the amount of I/O and the region being used), so the majority of I/O can't be satisfied by buffers/cache (and if you're doing write I/O, then use something like end_fsync=1 too).
In summary, the filesystem being used by docker may make it impossible to accurately do what you're requesting (measure the disk speed by bypassing cache while using whatever your default docker filesystem is).
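The sizing advice above (do I/O that is two to three times the size of RAM) can be sketched as a small helper that reads MemTotal and prints a byte count to pass to fio's --size. The function names and the default factor of 3 are assumptions for illustration, and this only reduces the share of I/O served from cache; it does not guarantee that caching is bypassed.

```shell
# Total RAM in KiB, read from a meminfo-style file (default: /proc/meminfo).
mem_total_kib() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print a size in bytes that is FACTOR times RAM.
# $1 = factor (default 3), $2 = meminfo file (default /proc/meminfo)
fio_size_for_ram() {
    factor=${1:-3}
    meminfo=${2:-/proc/meminfo}
    echo "$(( $(mem_total_kib "$meminfo") * 1024 * factor ))"
}

# Example invocation (not executed here), reusing the flags from the question:
#   fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k \
#       --numjobs=1 --size="$(fio_size_for_ram 3)" --runtime=600
```

Note that with a fixed --runtime, a run may stop before the whole region has been touched, so the runtime has to be long enough for the larger size to matter.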
Why a Docker benchmark may not give the results you expect
The Docker engine uses, by default, the OverlayFS [1][2] driver for data storage in containers. It assembles all of the different layers from the images and makes them readable. Writing is always done to the "top" layer, which is the container storage.
When performing reads and writes to the container's filesystem, you're passing through Docker's overlay2 driver, through the OverlayFS kernel driver, through your filesystem driver (e.g. ext4) and onto your block device. Additionally, as Anon mentioned, DIRECT/O_DIRECT is just a hint, and may not be respected by any of the layers you're passing through.
Getting more accurate results
To get accurate benchmarks within a Docker container, you should write to a volume mount or change your storage driver to one that is not overlaid, such as the Device Mapper driver or the ZFS driver.
Both the Device Mapper driver and the ZFS driver require a dedicated block device (you'll likely need a separate hard drive), so using a volume mount might be the easiest way to do this.
Use a volume mount
Use the -v option with a directory that sits on a block device on your host.
docker run -v /absolute/host/directory:/container_mount_point alpine
Use a different Docker storage driver
Note that the storage driver must be changed on the Docker daemon (dockerd) and cannot be set per container. From the documentation:
Important: When you change the storage driver, any existing images and containers become inaccessible. This is because their layers cannot be used by the new storage driver. If you revert your changes, you can access the old images and containers again, but any that you pulled or created using the new driver are then inaccessible.
With that disclaimer out of the way, you can change your storage driver by editing daemon.json and restarting dockerd.
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/sd_",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1",
    "dm.thinp_autoextend_threshold=80",
    "dm.thinp_autoextend_percent=20",
    "dm.directlvm_device_force=false"
  ]
}
Additional container benchmark notes - kernel
If you are trying to compare different flavors of Linux, keep in mind that Docker is still running on your host machine's kernel.

What's the fastest way to use Ceph volumes in Docker Swarm?

I want to set up a Swarm with persistent and replicated volumes through Ceph. I see these options to combine both services, once both are set up:
Configure the host OS to mount a CephFS in /var/lib/docker/volumes.
Use rexray/rbd as a volume driver.
Use rexray/s3fs to access Ceph object store, which is S3-compatible.
I wonder now: which option would deliver fastest performance? Is there another better option that I'm missing?
Thanks.
In general, for best performance you should go for rbd, since it gives you direct block access to the Ceph volume, whereas s3fs involves considerably more machinery, which ultimately results in longer response times. Quick responses for random reads/writes are especially important when you are running something like a PostgreSQL (or MariaDB) database with a mixed read/write load.
This is only general advice based on Ceph rbd, but my guess is that it applies to Docker storage drivers as well.

How is concurrency managed in volumes when they are shared by multiple containers?

How is concurrency managed in volumes when the volumes are shared by multiple containers?
It is managed the same way as a shared path / directory / file that is accessed by multiple processes / applications without containers.
Avoid parallel / simultaneous writes using locks, semaphores, mutexes and mutual exclusion.
Be careful about stale / old data when reading from shared volume.
Keep in mind things like eventual consistency, volume backups and data migration since the volume is shared by multiple containers.
Make sure that the shared volume is not corrupted. If corrupted, have a recovery plan.
Use dedicated storage servers so that containers can move in the cluster (hostPath and emptyDir in kubernetes do not move with pods).
The underlying OS, disk, storage software / driver, LVM, filesystem will also handle IO concurrency.
As it would be on a single host with multiple processes, because that's what it effectively is.
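As a minimal sketch of the locking advice above, writers can serialize appends to a file on the shared volume with flock(1) from util-linux. The helper name append_locked and the ".lock" file naming are assumptions for illustration. The lock is advisory, so every writer has to use it, and this assumes all containers run on the same host (same kernel) with a filesystem that supports flock; it is not a substitute for a real coordination service across hosts.

```shell
# Append one line to a shared file while holding an exclusive advisory lock.
# $1 = target file on the shared volume, remaining args = text to append
append_locked() {
    target=$1
    shift
    (
        # Block until the exclusive lock on file descriptor 9 is granted.
        flock -x 9 || exit 1
        printf '%s\n' "$*" >> "$target"
    ) 9>"$target.lock"   # the lock is released when the subshell exits
}
```

For example, each container sharing the volume at a hypothetical /data mount would call append_locked /data/shared.log "some message" for every write, with no writer bypassing the helper.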

Do Docker containers on the same host machine share the same page cache?

If I have two Docker containers running on the same host machine do they each have their own page cache or do they use the page cache of the host machine?
Page cache is managed by the kernel, which is used by all the containers.
See more at moby/moby issue 21759
Docker makes it easy to spawn a lot of containers and get better density, but it also makes it easy to run too many services on one machine or to run services which require way too much RAM.
The official documentation lists devicemapper (direct-lvm) as a production ready storage driver, but it doesn't have very efficient memory usage. The official documentation doesn't state otherwise either. Multiple identical containers will increase memory usage for the page cache.
In order to make this better and get better performance, the following should help, in a similar way to how it helps outside of Docker and containers in general:
make containers smaller for long running services & applications (e.g. smaller binaries, smaller images, optimize memory usage, etc)
VERY IMPORTANT: use volumes and bind mounts, instead of storing data inside the container
VERY IMPORTANT: make sure to run a system with a maintained kernel, up to date Docker and devicemapper libraries (e.g. fully updated CentOS 7 / RHEL 7 / Ubuntu 14.04 / Ubuntu 16.04)
Current behaviour (January 2020) is that by default containers on the same host share the same page cache.
Current docker documentation explains:
OverlayFS is a modern union filesystem that is similar to AUFS, but faster and with a simpler implementation. Docker provides two storage drivers for OverlayFS: the original overlay, and the newer and more stable overlay2.
The overlay2 driver is supported on Docker Engine - Community, and Docker EE 17.06.02-ee5 and up, and is the recommended storage driver.
Page Caching. OverlayFS supports page cache sharing. Multiple containers accessing the same file share a single page cache entry for that file. This makes the overlay and overlay2 drivers efficient with memory and a good option for high-density use cases such as PaaS.
https://docs.docker.com/storage/storagedriver/overlayfs-driver/

How to limit Docker filesystem space available to container(s)

The general scenario is that we have a cluster of servers and we want to set up virtual clusters on top of that using Docker.
For that we have created Dockerfiles for different services (Hadoop, Spark etc.).
Regarding the Hadoop HDFS service, however, we have the situation that the disk space available to the Docker containers equals the disk space available to the server. We want to limit the available disk space on a per-container basis so that we can dynamically spawn an additional datanode with some storage size to contribute to the HDFS filesystem.
We had the idea to use loopback files formatted with ext4 and mount these on directories which we use as volumes in docker containers. However, this implies a large performance loss.
I found another question on SO (Limit disk size and bandwidth of a Docker container), but the answers are almost 1.5 years old, which, given the speed of development of Docker, is ancient.
Which approach or storage backend would:
Limit storage on a per-container basis
Offer near bare-metal performance
Not require repartitioning of the server drives
You can specify runtime constraints on memory and CPU, but not disk space.
The ability to set constraints on disk space has been requested (issue 12462, issue 3804), but isn't yet implemented, as it depends on the underlying filesystem driver.
This feature is going to be added at some point, but not right away. It's a bit more difficult to add this functionality right now because a lot of chunks of code are moving from one place to another. After this work is done, it should be much easier to implement this functionality.
Please keep in mind that quota support can't be added as a hack to devicemapper, it has to be implemented for as many storage backends as possible, so it has to be implemented in a way which makes it easy to add quota support for other storage backends.
Update August 2016: as shown below, and in a comment on issue 3804, PR 24771 and PR 24807 have been merged since then. docker run now allows setting storage driver options per container:
$ docker run -it --storage-opt size=120G fedora /bin/bash
The size option sets the container's rootfs size to 120G at creation time.
This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers.
Documentation: docker run, "Set storage driver options per container".

Resources