Files deleted from within /var/lib/docker/aufs/diff/ - docker

The Docker docs state:
Warning: Do not directly manipulate any files or directories within /var/lib/docker/. These files and directories are managed by Docker.
Let's say someone hasn't read that hint and deleted some files from /var/lib/docker/aufs/diff to free up some disk space. These files didn't live in a Docker volume and are not part of the original Docker image, but were created in the container's writable layer. Restarting the given container frees up the disk space, but are there any known side effects?
And for the next time: does removing those kinds of files or directories from within the container (via docker exec .. rm ..) result in a proper removal, or are they only marked as deleted? The documentation currently doesn't describe this special case.

Restarting the given container frees up the disk space but are there any known side effects?
As you stated in your question, you should not "manipulate any files or directories within /var/lib/docker/": any side effect may appear, and no documentation describes what happens in that case. It's internal Docker plumbing which may change significantly between Docker versions; it's not supposed to be exposed to end users nor tampered with. You could read the Docker code for your Docker version and all its dependencies to understand what happened, but it's not really practical :-)
are there any known side effects?
There may be side effects - I insist on the may, as anything can happen depending on your Docker version and configuration. Even if everything seems to be working, something may be broken.
A well-known side effect is corruption of the Docker installation, which may present itself in various fashions: random container crashes, data loss, unexplained bugs, etc.
Best case scenario, you just discarded some data in your container and everything will work fine in the future.
Not-so-good scenario: you actually broke something in your installation and corrupted it; you'd be better off re-installing Docker entirely.
Does removing those kinds of files or directories from within the container (via docker exec .. rm ..) result in a proper removal, or are they only marked as deleted?
Deleting a file in the container will not always remove it from the system; it depends on the storage driver you are using. The docs have a section on how writes and deletes are handled for each of them (a quick way to observe this for yourself is sketched after the list):
AUFS - it seems implied that the file is only removed from the container layer: AUFS copies the file up from the image layer to work on it, and a delete masks that copy with a whiteout file while the image version is kept:
When a file is deleted within a container, a whiteout file is created in the container layer. The version of the file in the image layer is not deleted [...] Subsequent writes to the same file operate against the copy of the file already copied up to the container.
BTRFS - deleted and space reclaimed, doc is quite clear:
If a container creates a file and then deletes it, this operation is performed in the Btrfs filesystem itself and the space is reclaimed.
devicemapper - may not be deleted depending on config:
if you are using direct-lvm, the blocks are freed. If you use loop-lvm, the blocks may not be freed
OverlayFS - it seems implied that the file is deleted in the container layer, but the image copy is kept:
When a file is deleted within a container, a whiteout file is created in the container (upperdir). The version of the file in the image layer (lowerdir) is not deleted
ZFS - deleted:
If you create and then delete a file or directory within the container’s writable layer, the blocks are reclaimed by the zpool.
VFS - it uses a full copy of the previous layer and works directly in a directory representing that layer, so a deletion in the container should delete the file from the corresponding directory on the host machine
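A rough way to see this behaviour for yourself on an AUFS host (the container name, file path and layer id below are placeholders, not anything Docker defines):
docker exec my-container rm /etc/some-image-file
# the container's writable layer now holds a whiteout marker (.wh.some-image-file)
# instead of freeing the image layer's copy of the file
sudo ls -la /var/lib/docker/aufs/diff/<layer-id>/etc/ | grep '\.wh\.'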
The documentation currently doesn't describe this special case.
Yes, and it probably won't ;)

Related

var/lib/docker/containers/* eats my hard disk space

My raspberrypi suddenly had no more free space.
By looking at the folder sizes with the following command:
sudo du -h --max-depth=3
I noticed that a docker folder eats an incredible amount of hard disk space. It's the folder
/var/lib/docker/containers/*
The folder seems to contain data for the currently running Docker containers. The first letters of the folder names correspond to the Docker container IDs. One folder seemed to grow dramatically fast. After stopping the affected container and removing it, the related folder disappeared, so it seems to have belonged to that container.
Problem solved.
I now wonder what could cause this folder to grow so much, and what the best way is to avoid running into the same problem again later.
I could write a bash script which removes the related container at boot and runs it again, but better ideas are very welcome.
The container ids are directories, so you can look inside to see what is using space in there. The two main reasons are:
Logs from stdout/stderr. These can be limited with logging options (see the example after this list). You can view them with docker logs.
Filesystem changes. The underlying image filesystem is not changed, so any writes trigger a copy-on-write to a directory within each container id. You can view these with docker diff.
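For example, the json-file log size can be capped per container, and both sources of growth can be inspected with the standard CLI (the container and image names below are placeholders):
docker run -d --name my-app --log-opt max-size=10m --log-opt max-file=3 my-image
docker logs my-app   # stdout/stderr captured by the logging driver
docker diff my-app   # files added/changed/deleted in the writable layer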

Why shouldn't our work inside the container modify the content of the container itself?

I am reading an article related to docker images and containers.
It says that a container is an instance of an image. Fair enough. It also says that whenever you make some changes to a container, you should create an image of it which can be used later.
But at the same time it says:
Your work inside a container shouldn’t modify the container. Like
previously mentioned, files that you need to save past the end of a
container’s life should be kept in a shared folder. Modifying the
contents of a running container eliminates the benefits Docker
provides. Because one container might be different from another,
suddenly your guarantee that every container will work in every
situation is gone.
What I want to know is: what is the problem with modifying a container's contents? Isn't this what containers are for, where we make our own changes and then create an image which will work every time? Even if we are talking about modifying the container's content itself and not just adding additional packages, how will it harm anything, since the image created from this container will also have these changes, and other containers created from that image will inherit those changes too?
Treat the container filesystem as ephemeral. You can modify it all you want, but when you delete it, the changes you have made are gone.
This is based on a union filesystem, the most popular/recommended being overlay2 in current releases. The overlay filesystem merges together multiple lower layers of the image with an upper layer of the container. Reads will be performed through those layers until a match is found, either in the container or in the image filesystem. Writes and deletes are only performed in the container layer.
So if you install packages, and make other changes, when the container is deleted and recreated from the same image, you are back to the original image state without any of your changes, including a new/empty container layer in the overlay filesystem.
From a software development workflow, you want to package and release your changes to the application binaries and dependencies as new images, and those images should be created with a Dockerfile. Persistent data should be stored in a volume. Configuration should be injected as either a file, environment variable, or CLI parameter. And temp files should ideally be written to a tmpfs unless those files are large. When done this way, it's even possible to make the root FS of a container read-only, eliminating a large portion of attacks that rely on injecting code to run inside of the container filesystem.
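As a rough illustration of that last point, something like the following keeps the root filesystem read-only while still giving the application a named volume and a tmpfs for scratch files (the image name and paths are just examples):
docker run -d --read-only --tmpfs /tmp -v app-data:/var/lib/app my-image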
The standard Docker workflow has two parts.
First you build an image:
Check out the relevant source tree from your source control system of choice.
If necessary, run some sort of ahead-of-time build process (compile static assets, build a Java .jar file, run Webpack, ...).
Run docker build, which uses the instructions in a Dockerfile and the content of the local source tree to produce an image.
Optionally docker push the resulting image to a Docker repository (Docker Hub, something cloud-hosted, something privately-run).
Then you run a container based off that image:
docker run the image name from the build phase. If it's not already on the local system, Docker will pull it from the repository for you.
Note that you don't need the local source tree just to run the image; having the image (or its name in a repository you can reach) is enough. Similarly, there's no "get a shell" or "start the service" in this workflow, just docker run on its own should bring everything up.
(It's helpful in this sense to think of an image the same way you think of a Web browser. You don't download the Chrome source to run it, and you never "get a shell in" your Web browser; it's almost always precompiled and you don't need access to its source, or if you do, you have a real development environment to work on it.)
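In command form, the workflow above might look roughly like this; the registry, image name and tag are placeholders, and the Dockerfile is assumed to sit in the checked-out source tree:
docker build -t registry.example.com/my-app:1.0.0 .
docker push registry.example.com/my-app:1.0.0
docker run -d --name my-app registry.example.com/my-app:1.0.0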
Now: imagine there's some critical widespread security vulnerability in some core piece of software that your application is using (OpenSSL has had a couple, for example). It's prominent enough that all of the Docker base images have already updated. If you're using this workflow, updating your application is very easy: check out the source tree, update the FROM line in the Dockerfile to something newer, rebuild, and you're done.
Note that none of this workflow is "make arbitrary changes in a container and commit it". When you're forced to rebuild the image on a new base, you really don't want to be in a position where the binary you're running in production is something somebody produced by manually editing a container, but they've since left the company and there's no record of what they actually did.
In short: never run docker commit. While docker exec is a useful debugging tool it shouldn't be part of your core Docker workflow, and if you're routinely running it to set up containers or are thinking of scripting it, it's better to try to move that setup into the ordinary container startup instead.

What does docker/aufs/diff directory contain? Why is it so big?

Its size is 20G and it contains tons of hash like 00074a74d6cf2052eeb6a9e61bd2b407b464bce6a23a4596ce2e9100f58b6de6.
What is this "diff" folder for?
Docker uses a union filesystem. A diff, in Docker terms, is simply a difference in the filesystem. Like git, Docker takes an initial read-only image and builds the final container by layering your diffs on top. Every time you change something in the container, it creates a change in that layer, which may be committed to a new image via docker commit. If you know what you're doing, you can delete those diffs and reclaim disk space.
Likely there have been many changes or large files committed to those layered (diff) filesystems.
This will clean out your system. Be careful, it could delete something you might want.
docker system prune
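Before pruning, it can help to see where the space is actually going; docker system df summarizes images, containers, local volumes and the build cache, and the verbose flag breaks it down per item:
docker system df
docker system df -v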
First off, you don't want to interact with files in /var/lib/docker, those are only meant to be interacted with by Docker.
In terms of what the aufs/diff directory contains:
AUFS is a union filesystem, which means that it layers multiple directories on a single Linux host and presents them as a single directory. These directories are called branches in AUFS terminology, and layers in Docker terminology. The unification process is referred to as a union mount.
diff specifically contains:
the contents of each layer, each stored in a separate subdirectory
However, this changes if the container is running, in which case it contains:
Differences introduced in the writable container layer, such as new or modified files.
Source: https://docs.docker.com/storage/storagedriver/aufs-driver/#how-the-aufs-storage-driver-works
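If you just want to find out which layers are large, you can inspect them read-only without touching anything under /var/lib/docker (the image name below is a placeholder):
sudo du -sh /var/lib/docker/aufs/diff/* | sort -h | tail
docker history my-image   # per-layer sizes of an image
docker ps -s              # size of each container's writable layer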

How to synchronize 2 docker container's overlay2 filesystems?

I happen to use docker in a questionable way for a specific purpose:
I have a container with a few development tools and their configurations. These are typically tools found in operating systems that are configured with dotconf files usually found in home directories (like tmux, vim, git, programming languages...). This is normally handled by configuring an OS with tools and dotfiles.
But with the setup becoming more complex over the years, properly setting up a new OS is becoming very hard. On the other hand, moving a container around machines is very simple. So for that precise case, I happen to use docker in a way that goes against the "docker way". But that is really comfortable to me.
That being said, I also want to synchronize the container's filesystem with another container (which, in my understanding, is definitely not the "docker way", but still). I want to run 2 instances of the same image on different machines, and then synchronize their read-write layer that sits on top of the image, so that when a file is created, deleted or modified on one, the change is replicated on the other.
I was thinking of using rsync or unison to do that, but I don't know how the overlay2 driver works. Are the directories in /var/lib/docker/overlay2/<container-id> the actual container's filesystem layer, or do they have to be mounted? I saw some people mount their container's filesystem on the host with the devicemapper driver fairly easily. Would that make sense with overlay2?
I think your best option here is to use a bind mount. This changes your initial design a bit - but it will likely be the cleanest, and easiest to implement.
First things first - you'll want to ensure that any files you want synced live in a specific folder; rather than rsyncing the entire underlying filesystem, you'll sync just, for example, /app/my_files inside your container, and set your application to read/write from there.
Now - create your directory and set up the rsync between your machines; let's say it's at /rsync
Lastly - run your containers and use a bind-mount; which if you're just bringing up a container would look like: docker run -d -v /rsync:/app/my_files my_image
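The sync between hosts is then just ordinary file synchronization of /rsync. As a sketch, a one-way push with rsync could look like the following (the remote user and host are placeholders; for two-way sync you would reach for unison instead):
rsync -az --delete /rsync/ user@other-host:/rsync/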
After reading this page: https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/ and experimenting with image / container creation and deletion, I have found out a few things.
When creating an image, the folder /var/lib/docker/overlay2 is populated with a new folder, named with what seems to be a hash (let's call it 123), which itself contains the image content / filesystem.
Then when creating the container from this same image, two more folders get created inside /var/lib/docker/overlay2, also named with what seems to be another hash, with one of them having -init at the end. Let's call them 456 and 456-init. They seem to contain the container layer.
When looking at the output of docker inspect <container-name>, the GraphDriver section has some info about how docker uses the overlay2. The lower dir contains the init container dir + the image dir as in: /var/lib/docker/overlay2/456-init/diff:/var/lib/docker/overlay2/123/diff. I don't fully understand how that works, but I understand that I am not interested in the lower dir since it should be the image dir in read only mode. And that is something I already have on all hosts and thus do not need to sync.
Then in my understanding the upper dir in overlay2 is the read write layer that the container uses on top of the image layer. In the GraphDriver this is found to be /var/lib/docker/overlay2/456/diff. That is the directory that gets the changes made inside the container. I could not find the documentation so I experimented a bit and found out that this upper dir never changed during the life of the container. I stopped and started it and the upper dir stayed the same. Then when removing the container this folder is deleted. And when creating the container again, a new folder with a different name is created.
So it looks like what I need to sync is this upper dir, which can be found with docker inspect. I'll experiment a bit more with that.
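For reference, the upper dir can be read straight out of docker inspect with a format string, which avoids guessing at the hashed directory names (the container name is a placeholder):
docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' my-container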

Is it a docker best practice to use volume for the code?

The VOLUME instruction should be used to expose any database storage area, configuration storage, or files/folders created by your docker container. You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.
will you store your code in volume?
Such as your jar files. It could be a little convenient to deploy the application without rebuilding the image.
Are there any considerations when storing the code in a volume, like performance, security or others?
I don't recommend using a VOLUME statement inside the Dockerfile for anything with current versions of docker (current being any version of docker since the introduction of named volumes). Including a VOLUME command has multiple downsides, including:
possible inability to change contents at that location of the image with any later steps or child images (this behavior appears to be different with different scenarios and different versions of docker)
potential to create volumes with just a hash for the name that clutter up the docker volume ls output and are very difficult to find and reuse later if you needed the data inside
for your changing code, if you place it in a volume and recreate your container from a new version of the image, the volume will still have the old copy of your code unless you update that volume yourself (the key feature of volumes is persistent data that you want to keep between image versions)
I do recommend putting your data in a volume that you define on the docker run command line or inside a docker-compose.yml. Volumes defined there can have a name or map back to a path on the docker host. And you can make any folder or file a volume without needing to define it in the Dockerfile. Volumes defined at this step don't impact the image, allowing you to extend an image without being locked out of making changes to a directory.
For your code, it is a common best practice to inject code with a volume if it is interpreted (e.g. javascript) or already compiled (e.g. a jar file) during application development. You would define the volume on the container (not the Dockerfile), and overlay the code or binaries that were also copied into the image using the same filenames. This allows you to rapidly iterate in development without frequently rebuilding the image. Depending on the application, you may be able to live reload the code, otherwise, a container restart should be all that's needed to see the latest change. And once development is finished, you rebuild the image with your current code and ship that to someone that can use it without needing the volume mount for the code.
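As a sketch, the development-time mount described above could look like this for a jar, overlaying the copy that was baked into the image (the paths and image name are examples):
docker run -d --name my-app -v "$(pwd)/target/app.jar:/app/app.jar" my-image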
I've also blogged about my concerns with volumes inside of Dockerfiles if you'd like to see more details on this.
You say:
It could be a little convenient to deploy the application without rebuilding the image.
Instead of that, encapsulating your application version inside an image build has a lot of advantages. You can easily deploy your app by deploying only the image, whereas using a volume for the app code forces you to orchestrate some other deployment method to update that volume as well.
And you have to (eventually) match the jar version with the proper image version.
Regarding security or performance, I don't think that there are special considerations.
Anyway, it is not a common approach to use volumes for that. And as #BMitch says, using VOLUME inside a Dockerfile is somewhat tricky.
