So I understand that docker is using /var/lib/docker/ to store every container and images... right?
That means the only optimization that I can do to my container is to optimize the underlying fs that /var/lib/docker/ is sitting on?
In that sense, can I assume that I should be optimizing the mount options of my underlying system fs? e.g. ext4 noatime, noadirtime etc etc
Also, can i use a different mount per /var/lib/docker/folder?? Any limitations and optimization settings considerations for the underlying disk docker is sitting on?
I would rather squash my images
https://github.com/jwilder/docker-squash
and prefer Debian, for example as said here
https://docker.cn/p/6-dockerfile-tips-official-images-en
extract
The main advantage of the Debian image is the smaller size – it clocks in at around 85.1 MB compared to around 200 MB for Ubuntu.
Related
I am using Docker Desktop for Windows on Windows 10.
I was experiencing issues with system SSD always being full and moved 'docker-desktop-data' distro (which is used to store docker images and other stuff) out of the system drive to drive D: which is HDD using this guide.
Finally, I was happy to have a lot of space on my SSD... but docker containers started to work slower. I guess this happens due to HDD write/read operations being slower than on SSD.
Is there a better way to solve the problem of the continuously growing size of Docker distro's without impacting how fast containers actually work and images are built?
Actually only be design. As you know, a docker container is layered. So it might be feasible to check if it is possible to create something like a "base-container" from which your actual image in derived.
Also it might be sensible to check if your base distro is small enough. I often have seen containers created from full blown Debian or Ubuntu distros. Thats not the best idea. Try to derive from an alpine version or check for even smaller approaches.
I have a Kubernetes-cluster with 1 master-node and 3 worker-nodes. All Nodes are running on CentOS 7 with Docker 19.06. I'm also running Longhorn for dynamic provisioning of volumes (if that's important).
My problem is that every few days one of the worker nodes grows the HDD-usage to 85% (43GB). This is not a linear increase, but happens over a few hours, sometimes quite rapidly. I can "solve" this problem for a few days by first restarting the docker service and then doing a docker system prune -a. If I don't restart the service first, the prune removes next to nothing (only a few MB).
I also tried to find out which container is taking up all that space, but docker system df says it doesn't use the space. I used df and du to crawl along the /var/lib/docker subdirectories too, and it seems none of the folders (alone or all together) takes up much space either. Continuing this all over the system, I can't find any other big directories either. There are 24GB that I just can't account for. What makes me think this is a docker problem nonetheless is that a restart and prune just solves it every time.
Googling around I found a lot of similar issues where most people just decided to increase disk space. I'm not keen on accepting this as the preferred solution, as it feels like kicking the can down the road.
Would you have any smart ideas on what to do instead of increasing disk space?
It seems like it is expected behavior, from Docker documentation you can read:
Docker takes a conservative approach to cleaning up unused objects
(often referred to as “garbage collection”), such as images,
containers, volumes, and networks: these objects are generally not
removed unless you explicitly ask Docker to do so. This can cause
Docker to use extra disk space. For each type of object, Docker
provides a prune command. In addition, you can use docker system prune to clean up multiple types of objects at once. This topic shows
how to use these prune commands.
So it seems like you have to clean it up manually using docker system/image/container prune. Other issue might be that those containers create too much logs and you might need to clean it up.
I am trying to benchmark I/O performance on my host and docker container using flexible IO tool with O_direct enabled in order to bypass memory caching. The result is very suspicious. docker performs almost 50 times better than my host machine which is impossible. It seems like docker is not bypassing the caching at all. even if I ran it with --privileged mode. This is the command I ran inside of a container, Any suggestions?
fio --name=seqread --rw=read --direct=1 --ioengine=libaio --bs=4k --numjobs=1 --size=10G --runtime=600 --group_reporting --output-format=json >/home/docker/docker_seqread_4k.json
(Note this isn't really a programming question so Stackoverflow is the wrong place to ask this... Maybe Super User or Serverfault would be a better choice and get faster answers?)
The result is very suspicious. docker performs almost 50 times better than my host machine which is impossible. It seems like docker is not bypassing the caching at all.
If your best case latencies are suspiciously small compared to your worst case latencies it is highly likely your suspicions are well founded and that kernel caching is still happening. Asking for O_DIRECT is a hint not an order and the filesystem can choose to ignore it and use the cache anyway (see the part about "You're asking for direct I/O to a file in a filesystem but...").
If you have the option and you're interested in disk speed, it is better to do any such test outside of a container (with all the caveats that implies). Another option when you can't/don't want to disable caching is ensure that you do I/O that is at least two to three times the size (both in terms of amount and the region being used) of RAM so the majority of I/O can't be satisfied by buffers/cache (and if you're doing write I/O then do something like end_fsync=1 too).
In summary, the filesystem being used by docker may make it impossible to accurately do what you're requesting (measure the disk speed by bypassing cache while using whatever your default docker filesystem is).
Why a Docker benchmark may give the results you expect
The Docker engine uses, by default, the OverlayFS [1][2] driver for data storage in a containers. It assembles all of the different layers from the images and makes them readable. Writing is always done to the "top" layer, which is the container storage.
When performing reads and writes to the container's filesystem, you're passing through Docker's overlay2 driver, through the OverlayFS kernel driver, through your filesystem driver (e.g. ext4) and onto your block device. Additionally, as Anon mentioned, DIRECT/O_DIRECT is just a hint, and may not be respected by any of the layers you're passing through.
Getting more accurate results
To get an accurate benchmarks within a Docker container, you should write to a volume mount or change your storage driver to one that is not overlaid, such as the Device Mapper driver or the ZFS driver.
Both the Device Mapper driver and the ZFS driver require a dedicated block device (you'll likely need a separate hard drive), so using a volume mount might be the easiest way to do this.
Use a volume mount
Use the -v options with a directory that sits on a block device on your host.
docker run -v /absolute/host/directory:/container_mount_point alpine
Use a different Docker storage driver
Note that the storage driver must be changed on the Docker daemon (dockerd) and cannot be set per container. From the documentation:
Important: When you change the storage driver, any existing images and containers become inaccessible. This is because their layers cannot be used by the new storage driver. If you revert your changes, you can access the old images and containers again, but any that you pulled or created using the new driver are then inaccessible.
With that disclaimer out of the way, you can change your storage driver by editing daemon.json and restarting dockerd.
{
"storage-driver": "devicemapper",
"storage-opts": [
"dm.directlvm_device=/dev/sd_",
"dm.thinp_percent=95",
"dm.thinp_metapercent=1",
"dm.thinp_autoextend_threshold=80",
"dm.thinp_autoextend_percent=20",
"dm.directlvm_device_force=false"
]
}
Additional container benchmark notes - kernel
If you are trying to compare different flavors of Linux, keep in mind that Docker is still running on your host machine's kernel.
The general scenario is that we have a cluster of servers and we want to set up virtual clusters on top of that using Docker.
For that we have created Dockerfiles for different services (Hadoop, Spark etc.).
Regarding the Hadoop HDFS service however, we have the situation that the disk space available to the docker containers equals to the disk space available to the server. We want to limit the available disk space on a per-container basis so that we can dynamically spawn an additional datanode with some storage size to contribute to the HDFS filesystem.
We had the idea to use loopback files formatted with ext4 and mount these on directories which we use as volumes in docker containers. However, this implies a large performance loss.
I found another question on SO (Limit disk size and bandwidth of a Docker container) but the answers are almost 1,5 years old which - regarding the speed of development of docker - is ancient.
Which way or storage backend would allow us to
Limit storage on a per-container basis
Has near bare-metal performance
Doesn't require repartitioning of the server drives
You can specify runtime constraints on memory and CPU, but not disk space.
The ability to set constraints on disk space has been requested (issue 12462, issue 3804), but isn't yet implemented, as it depends on the underlying filesystem driver.
This feature is going to be added at some point, but not right away. It's a bit more difficult to add this functionality right now because a lot of chunks of code are moving from one place to another. After this work is done, it should be much easier to implement this functionality.
Please keep in mind that quota support can't be added as a hack to devicemapper, it has to be implemented for as many storage backends as possible, so it has to be implemented in a way which makes it easy to add quota support for other storage backends.
Update August 2016: as shown below, and in issue 3804 comment, PR 24771 and PR 24807 have been merged since then. docker run now allow to set storage driver options per container
$ docker run -it --storage-opt size=120G fedora /bin/bash
This (size) will allow to set the container rootfs size to 120G at creation time.
This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers
Documentation: docker run/#Set storage driver options per container.
As far as I understand Docker uses memory mapped files to start from image. Since I can do this over and over again and as far as I remember start different instances of the same image in parallel, I guess docker abstracts the file system and stores changes somewhere else.
I wonder if docker can be configured (or does it by default) to run in a memory only mode without some sort of a temporary file?
Docker uses a union filesystem that allows it to work in "layers" (devicemapper, BTRFS, etc). It's doing copy-on-write so that starting new containers is cheap, and when it performs the first write, it actually creates a new layer.
When you start a container from an image, you are not using memory-mapped files to restore a frozen process (unless you built all of that into the image yourself...). Rather, you're starting a normal Unix process but inside a sandbox where it can only see its own unionfs filesystem.
Starting many copies of an image where no copy writes to disk is generally cheap and fast. But if you have a process with a long start-up time, you'll still pay that cost for every instance.
As for running Docker containers wholly in memory, you could create a RAM disk and specify that as Docker's storage volume (configurable, but typically located under /var/lib/docker).
In typical use-cases, I would not expect this to be a useful performance tweak. First, you'll spend a lot of memory holding files you won't access. The base layer of an image contains most Linux system files. If you fetch 10 packages from the Docker Hub, you'll probably hit 20G worth of images easily (after that the storage cost tends to plateau). Second, the system already manages memory and swapping pretty well (which is why a RAM disk is a performance tweak) and you get all of that applied to processes running inside a container. Third, for most of the cases where a RAM disk might help, you can use the -v flag to mount the disk as a volume on the container rather than needing to store your whole unionfs there.