EC2 - Docker container dir increases in size

I am running a Docker Compose setup on an AWS EC2 instance with three Docker containers.
After my containers have been running for a few weeks, the size of the /var/lib/docker/containers dir increases quite a bit:
8,1G /var/lib/docker/containers
0 /var/lib/docker/plugins
3,1G /var/lib/docker/overlay2
When I stop and remove all my containers and images and then start everything again, it looks like this:
96K /var/lib/docker/containers
0 /var/lib/docker/plugins
3,1G /var/lib/docker/overlay2
A docker image prune --all did not free anything.
So how can I prevent /var/lib/docker/containers from growing that much?

This happens because you are writing data into the container itself. You should write data to an external volume instead. Each time you write data inside the container, the change is stored in the container's thin writable layer, which sits on top of the read-only layers of the image.
After a while, that writable layer collects every changed or written file, so the container's storage on disk keeps growing.
When you remove your container, its writable layer is removed too, and you are back to the original state of the image as you built it.
Quote:
Containers that write a lot of data consume more space than containers that do not. This is because most write operations consume new space in the container’s thin writable top layer.
Note: for write-heavy applications, you should not store the data in the container. Instead, use Docker volumes, which are independent of the running container and are designed to be efficient for I/O. In addition, volumes can be shared among containers and do not increase the size of your container’s writable layer.
Reference: https://docs.docker.com/storage/storagedriver/
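To make the volume advice concrete, here is a toy Python model (not how Docker is implemented) of where writes land: anything written at or below a volume mount point bypasses the container's writable layer, and everything else accumulates in it. The function name split_writes and the example paths and sizes are hypothetical.

```python
from pathlib import PurePosixPath

def split_writes(writes, volume_mounts):
    """Toy model: how many bytes grow the writable layer vs. go to volumes.

    writes: (path, nbytes) pairs for files written inside the container.
    volume_mounts: container paths where a volume is mounted.
    """
    mounts = [PurePosixPath(m) for m in volume_mounts]
    layer_bytes = volume_bytes = 0
    for path, nbytes in writes:
        p = PurePosixPath(path)
        # A write goes to a volume if it is at or below one of the mount points.
        if any(p == m or m in p.parents for m in mounts):
            volume_bytes += nbytes
        else:
            layer_bytes += nbytes
    return layer_bytes, volume_bytes

# Hypothetical writes: a growing database file and some scratch data.
writes = [("/var/lib/app/data.db", 500_000_000), ("/tmp/cache.bin", 20_000_000)]
print(split_writes(writes, []))                # (520000000, 0)
print(split_writes(writes, ["/var/lib/app"]))  # (20000000, 500000000)
```

With a volume mounted where the application keeps its data, only the scratch writes still grow the container's layer.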

Related

How do I clean docker?

I got an error because there was not enough disk space.
I decided to check how much was free and came across this surprise.
I cleaned up via docker system prune -a and
docker container prune -f
docker image prune -f
docker system prune -f
But only 9GB was cleared
Prune only removes stopped containers and unused (or dangling) images. I would suggest you run docker ps -a, stop the containers you don't want with docker stop <container-id>, and remove them with docker rm <container-id>. Then list the images with docker images and remove the unwanted ones with docker rmi <image-name>.
Once you have stopped/removed all the unwanted containers, run docker system prune --volumes to remove unused volumes, the build cache, and dangling images.
Don't forget to prune the volumes with docker volume prune! Those almost always take up way more space than images and containers. Be careful if you have anything of value stored in them, though.
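The order described above can be sketched as a small Python dry-run helper that only builds the docker CLI commands, so you can review them before running anything. The helper name cleanup_plan and the container ID and image name in the example are hypothetical; the commands themselves (docker stop, docker rm, docker rmi, docker system prune --volumes) are the ones mentioned above.

```python
def cleanup_plan(container_ids, image_names, prune_volumes=False):
    """Build (but do not run) docker CLI commands for a manual cleanup.

    Order matters: containers must be stopped and removed before their images
    can be removed, and a volume is only prunable once no container uses it.
    """
    plan = []
    for cid in container_ids:
        plan.append(["docker", "stop", cid])
    for cid in container_ids:
        plan.append(["docker", "rm", cid])
    for name in image_names:
        plan.append(["docker", "rmi", name])
    prune = ["docker", "system", "prune", "-f"]
    if prune_volumes:
        # Deletes the data in unused volumes: make sure nothing valuable is there.
        prune.append("--volumes")
    plan.append(prune)
    return plan

for cmd in cleanup_plan(["web1"], ["myapp:old"], prune_volumes=True):
    print(" ".join(cmd))
```

To actually execute a reviewed plan, each list could be passed to subprocess.run(cmd, check=True).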
It could be the logs of your running containers. I've seen Docker logs fill up entire disks. By default, Docker containers can write unlimited logs.
I always add a logging configuration to my docker-compose file to restrict the total size (see Docker Logging Configuration).
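As a sketch of such a limit: the default json-file logging driver accepts max-size and max-file options, which can be set for all containers in /etc/docker/daemon.json (or per service under logging: options: in docker-compose). The helper below and its 10m / 3 values are illustrative choices, not recommendations. Note that log-opts values in daemon.json must be strings, and daemon-level defaults only apply to containers created after the daemon is restarted.

```python
import json

def logging_config(max_size="10m", max_file=3):
    """Return daemon.json content that caps json-file logs per container.

    The default values are example choices. log-opts values must be strings
    in daemon.json, so both are converted explicitly.
    """
    config = {
        "log-driver": "json-file",
        "log-opts": {
            "max-size": str(max_size),  # rotate when the log file reaches this size
            "max-file": str(max_file),  # keep at most this many log files
        },
    }
    return json.dumps(config, indent=2)

print(logging_config())
```

With these example values, each container keeps at most about 30 MB of logs (3 files of 10 MB).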
From the screenshot I think there's some confusion on what's taking up how much space. Overlay filesystems are assembled from directories on the parent filesystem. In this case, that parent filesystem is within /var/lib/docker which is part of / in your example. So the df output for each overlay filesystem is showing how much disk space is used/available within /dev/vda2. Each container isn't using 84G, that's just how much is used in your entire / filesystem.
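The difference between what df and du report can be sketched in Python: shutil.disk_usage, like df, returns figures for the whole filesystem that contains a path, while summing file sizes (as du does) measures only the directory's own contents. As far as I know, an overlay mount reports the statistics of the filesystem holding its upper directory, which is why every overlay line shows the same numbers as /. The dir_size helper is a hypothetical name, and the temporary directory is a plain stand-in, not a real overlay mount.

```python
import os
import shutil
import tempfile

def dir_size(path):
    """Bytes used by the files under path (apparent sizes, roughly what du measures)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, "merged")  # plain stand-in directory, not an overlay mount
    os.makedirs(sub)
    with open(os.path.join(sub, "file.bin"), "wb") as f:
        f.write(b"\0" * 1024)
    # df-style figures describe the whole filesystem, so they match for both paths...
    print(shutil.disk_usage(root).total == shutil.disk_usage(sub).total)  # True
    # ...while du-style sizes describe only the directory's own contents.
    print(dir_size(sub))  # 1024
```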
To see how much space is being used by docker containers, images, volumes, etc., you can use docker system df. If you have running containers (likely, from the above screenshot), docker will not clean those up; you need to stop them before they are eligible for pruning. Once containers have been deleted, you can then prune images and volumes. Note that deleting volumes deletes any data you were storing in that volume, so be sure it's not data you wanted to save.
It's not uncommon for docker to use a lot of disk space from downloading lots of images (docker makes it easy to try new things) and those are the easiest to prune when the containers are stopped. However what's harder to see are the logs that containers are outputting which will slowly fill the drive within the containers directory. For more details on how to clean the logs automatically, see this answer.
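To find which container's logs are filling /var/lib/docker/containers, a short Python sketch can list the log files by size. It assumes the default data root and the json-file logging driver, which writes <root>/<container-id>/<container-id>-json.log (and, when rotation is configured, numbered rotated files next to it); the function name container_log_sizes is hypothetical.

```python
import os

def container_log_sizes(root="/var/lib/docker/containers"):
    """Return (bytes, path) for every json-file log under root, largest first.

    Assumes the default data root and the json-file logging driver, which keeps
    <root>/<container-id>/<container-id>-json.log plus any rotated siblings.
    """
    results = []
    for container_id in os.listdir(root):
        container_dir = os.path.join(root, container_id)
        if not os.path.isdir(container_dir):
            continue
        for name in os.listdir(container_dir):
            # Matches the live log and rotated ones such as <id>-json.log.1
            if name.startswith(f"{container_id}-json.log"):
                log_path = os.path.join(container_dir, name)
                results.append((os.path.getsize(log_path), log_path))
    return sorted(results, reverse=True)
```

Run as root (reading /var/lib/docker usually requires it), it returns (bytes, path) pairs with the largest logs first.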
If you want, you can dig deeper at a granular level and pinpoint the file(s) causing this much disk usage:
du -sh /var/lib/docker/overlay2/<container hash>/merged/* | sort -h
This should help you reach a conclusion much more easily.
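If du is not handy, or you want the numbers in a script, the same per-entry breakdown can be sketched in Python. It reports apparent file sizes (st_size) rather than allocated blocks, skips symlinks, and, unlike the shell glob, includes dotfiles, so its numbers can differ somewhat from du. On the overlay2 storage driver, the merged directory for a given container can be looked up with docker inspect --format '{{.GraphDriver.Data.MergedDir}}' <container>. The function name entry_sizes is hypothetical.

```python
import os

def entry_sizes(path):
    """Roughly `du -s path/* | sort -h`: total bytes per top-level entry, smallest first."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            # os.walk does not descend into symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    file_path = os.path.join(dirpath, name)
                    if not os.path.islink(file_path):
                        total += os.path.getsize(file_path)
        else:
            total = entry.stat(follow_symlinks=False).st_size
        sizes.append((total, entry.name))
    return sorted(sizes)
```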

What does remove container mean/do?

I am getting my hands dirty with Containers and Docker. In the world of Docker (or the world of Containers in general), I am trying to understand what "removing a container" means. The mental model I have of a container is that it is a running instance of an image. If a container has stopped executing, what resources is it consuming that need to be freed? I can understand removing the associated image(s) as they consume disk space. Maybe my mental model of a container as "a running instance of an image" is not correct, and there is more to it. If someone could shed some light, I would greatly appreciate it.
You are nearly there.
In Docker, when a container is run, the docker daemon mounts the image's layers using a union filesystem. It adds a layer on top of the image's layers to track changes that happen inside that container. For example, if a log file is written, it is tracked as part of this layer.
By removing a container, you remove the resources used to track this layer of changes that the container has made on top of the image.
When a container is running it consumes CPU, memory and the HDD space associated with this layer.
When a container is stopped, the CPU and memory are released but the HDD space is still used. (You can check stopped containers using docker ps -a; this will show you all the containers across all states)
When a stopped container is removed the HDD space is also freed up.
Docker also removes all the metadata (when it was started, where its files are, etc.) associated with the container when you remove the container.
To have some more fun with this, do the following:
docker inspect <image_name>:<tag> and see its output.
docker inspect <containerid> and see its output; here containerid should be a container running off the above image. See if you can figure out the layers and cross-reference them between the image and the container.
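The layering described above can be modeled loosely with Python's collections.ChainMap, which reads through a list of mappings in order but writes only to the first one. This is a toy model, not how the overlay2 driver works internally; the layer contents and the run_container helper are hypothetical.

```python
from collections import ChainMap

# Toy model: each layer maps file paths to contents.
base_layer = {"/etc/os-release": "ubuntu", "/usr/bin/app": "v1"}  # read-only image layer
app_layer = {"/usr/bin/app": "v2"}  # a later image layer shadows the base

def run_container(*image_layers):
    # A fresh, empty writable layer goes on top; reads fall through to the image.
    return ChainMap({}, *image_layers)

container = run_container(app_layer, base_layer)
container["/var/log/app.log"] = "started"  # lands in the writable layer only
print(container["/usr/bin/app"])           # v2 (read from the image layers)
print(container.maps[0])                   # {'/var/log/app.log': 'started'}

# "docker rm": drop the writable layer; the image layers are untouched.
del container
print(sorted(base_layer), app_layer)
```

One gap in the analogy: deleting an image file from inside a real container records a whiteout entry in the writable layer, whereas ChainMap only allows deleting keys from the first mapping.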

Docker: in memory file system

I have a docker container which does a lot of reads/writes to disk. I would like to test what happens when my entire docker filesystem is in memory. I have seen some answers here that say it will not be a real performance improvement, but this is for testing.
The ideal solution I would like to test is sharing the common parts of each image and copy to your memory space when needed.
Each container's files created during runtime should be in memory as well, kept separate. It shouldn't be more than a 5GB fs at idle time and up to 7GB during processing.
Simple solutions would duplicate all shared files (even those part of the OS you never use) for each container.
There's no difference between the storage of the image and the base filesystem of the container: the layered FS accesses the image's layers directly as a RO layer, with the container using a RW layer above to catch any changes. Therefore your goal of having the container running in memory while the Docker installation remains on disk doesn't have an easy implementation.
If you know where your RW activity is occurring (it's fairly easy to check the docker diff of a running container), the best option to me would be a tmpfs mounted at that location in your container, which is natively supported by docker (from the docker run reference):
$ docker run -d --tmpfs /run:rw,noexec,nosuid,size=65536k my_image
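Finding those locations can be sketched in Python: parse the output of docker diff (one line per change, starting with A for added, C for changed, or D for deleted), count additions and changes per parent directory, and turn the busiest directories into --tmpfs flags. This is a crude heuristic; the function name, the 64m size, and the sample output are hypothetical. Be careful with the suggestions: a tmpfs mount hides whatever the image already had at that path.

```python
from collections import Counter
from posixpath import dirname

def tmpfs_candidates(diff_output, top=2, size="64m"):
    """Suggest --tmpfs flags from `docker diff` output (crude heuristic)."""
    counts = Counter()
    for line in diff_output.splitlines():
        kind, _, path = line.strip().partition(" ")
        # Count added/changed entries against their parent directory.
        if kind in ("A", "C") and path.startswith("/"):
            counts[dirname(path)] += 1
    counts.pop("/", None)  # never suggest mounting over the root
    args = []
    for directory, _n in counts.most_common(top):
        args += ["--tmpfs", f"{directory}:rw,size={size}"]
    return counts, args

# Hypothetical docker diff output for a running container.
sample = """C /run
A /run/app.pid
C /tmp
A /tmp/work-1
A /tmp/work-2
C /var
C /var/log
A /var/log/app.log"""

counts, args = tmpfs_candidates(sample)
print(args)  # ['--tmpfs', '/tmp:rw,size=64m', '--tmpfs', '/run:rw,size=64m']
```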
Docker stores image, container, and volume data in its directory by default. A container's filesystem is made of the original image and the 'container layer'.
You might be able to set this up using a RAM disk. You would hard-allocate some RAM, mount it, and format it with your file system of choice. Then move your docker installation to the mounted RAM disk and symlink it back to the original location.
Setting up a Ram Disk
Best way to move the Docker directory
Obviously this is only useful for testing, as Docker and its images, volumes, containers, etc. would be lost on reboot.
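Before testing, it may be worth confirming that the Docker directory really ends up on the RAM disk. A minimal Python sketch reads mount information in the /proc/mounts format and picks the longest mount point containing a path. The sample mount table and the /mnt/ramdisk path are hypothetical; on a real host you would pass the contents of /proc/mounts. (As an alternative to the symlink, the dockerd data-root setting in daemon.json can point Docker at a different directory.)

```python
import os

def fs_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    mounts_text uses the /proc/mounts format: "device mountpoint fstype options 0 0".
    The longest matching mount point wins; later entries win ties, since a later
    mount on the same point hides the earlier one. Escaped characters in mount
    points (such as spaces) are not decoded.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type

# Hypothetical mount table: root on ext4, a RAM disk mounted at /mnt/ramdisk.
sample = """/dev/vda2 / ext4 rw,relatime 0 0
tmpfs /mnt/ramdisk tmpfs rw,size=8388608k 0 0"""

print(fs_type("/mnt/ramdisk/docker", sample))  # tmpfs
print(fs_type("/var/lib/docker", sample))      # ext4
```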

Why are container's size and image's size equivalent?

The glossary of docker says that
A Docker container consists of
A Docker image
Execution environment
A standard set of instructions
When I type docker images, I see 324.2 MB in SIZE column of mysql:5.6.
When I type docker ps -s -a, this command tells me that the size of the container, which is created by docker run -d mysql:5.6, is also 324.2 MB.
Does this mean that Execution environment and A standard set of instructions do not occupy any disk space?
or the disk space they use is less than 0.1 MB?
or docker ps -s -a just lists the size of the container's image?
Because of the copy-on-write mechanism, the size of a container is... at first 0.
Meaning, you can launch 100 containers and they won't take 100 times the size of the image. They will share the filesystem provided by the image.
Then any modification made during the life of a container will be written in a new layer, one per container.
See more at "Understand images, containers, and storage drivers":
When you create a new container, you add a new, thin, writable layer on top of the underlying stack. This layer is often called the “container layer”.
All changes made to the running container - such as writing new files, modifying existing files, and deleting files - are written to this thin writable container layer. The diagram below shows a container based on the Ubuntu 15.04 image.
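The numbers in the question can be reconciled with a little arithmetic. docker ps -s reports two figures per container: the size of its writable layer, and a "virtual" size that adds the read-only image data it uses (likely the 324.2 MB seen in the question). The toy Python function below (a hypothetical helper, sizes in MB) shows why 100 fresh containers from a 324.2 MB image each report a virtual size of 324.2 MB while the image is stored only once.

```python
def disk_usage_mb(image_mb, writable_mb):
    """Toy copy-on-write arithmetic, all sizes in MB.

    image_mb: size of the read-only image layers, stored once and shared.
    writable_mb: one entry per container, the size of its thin writable layer.
    Returns (virtual size per container, total disk actually used).
    """
    virtual = [image_mb + w for w in writable_mb]  # what docker ps -s calls "virtual"
    total = image_mb + sum(writable_mb)            # image counted once, not per container
    return virtual, total

# 100 fresh containers from the 324.2 MB mysql:5.6 image: nothing written yet.
virtual, total = disk_usage_mb(324.2, [0.0] * 100)
print(virtual[0], total)  # 324.2 324.2

# After each container writes 5 MB, disk use grows by 100 x 5 MB, not 100 x 324.2 MB.
virtual, total = disk_usage_mb(324.2, [5.0] * 100)
print(round(total, 1))    # 824.2
```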

Docker container and memory consumption

Assume I am starting a large number of docker containers which are based on the same docker image. This means that each docker container is running the same application. It could be that the application is big and requires a lot of disk space.
How is docker dealing with it?
Do all docker containers share the static part defined in the docker image?
If not does it make sense to copy the application into some directory on the machine which is used to run docker containers and to mount this app directory for each docker container?
Docker shares resources at the kernel level. This means application logic is never replicated on disk when it is run. If you start Notepad 1000 times, it is still stored only once on your hard disk; the same goes for docker instances.
If you run 100 instances of the same docker image, all you really do is keep the state of the same piece of software in your RAM in 100 different, separate timelines. The host's processor(s) switch between the in-memory state of each of these container instances, so you DO consume 100 times the RAM required for running the application.
There is no point in physically storing the exact same byte-code for the software 100 times because this part of the application is always static and will never change. (Unless you write some crazy self-altering piece of software, or you choose to rebuild and redeploy your container's image)
This is why containers don't allow persistence out of the box, and how docker differs from regular VMs that use virtual hard disks. However, this is only true for persistence inside the container. Files that docker software changes on the hard disk are "mounted" into containers using docker volumes and thus aren't really part of the docker environment, just mounted into it. (Read more about this at: https://docs.docker.com/userguide/dockervolumes/)
Another question that you might want to ask when you think about this is how docker stores the changes it makes to its disk at runtime. What is really sweet to check out is how docker actually manages to get this working. The original state of the container's hard disk is what is given to it from the image. It can NOT write to this image. Instead of writing to the image, a diff is made of what has changed in the container's internal state compared to what is in the docker image.
Docker uses a technology called "Union Filesystem", which creates a diff layer on top of the initial state of the docker image.
This "diff" (referenced as the writable container in the image below) is stored on disk, separate from the image, and disappears when you delete your container. (Unless you use the command "docker commit"; however, I don't recommend this. The state of your new docker image is not represented in a Dockerfile and cannot easily be regenerated by a rebuild.)
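The diff idea can be illustrated with a toy Python function that compares the filesystem as seen from the image with the filesystem as seen from the container, and labels paths the way docker diff does: A (added), C (changed), D (deleted). Real Docker derives this from the writable layer itself, and docker diff also lists changed parent directories; the function and sample data here are hypothetical.

```python
def diff_layers(image, container):
    """Label changes like `docker diff`: A = added, C = changed, D = deleted.

    image / container: dicts mapping path -> content, a stand-in for the
    filesystem as seen from the image and from the running container.
    """
    changes = []
    for path in sorted(set(image) | set(container)):
        if path not in image:
            changes.append(("A", path))
        elif path not in container:
            changes.append(("D", path))
        elif image[path] != container[path]:
            changes.append(("C", path))
    return changes

image = {"/etc/app.conf": "debug=false", "/tmp/seed": "x"}
container = {"/etc/app.conf": "debug=true", "/var/log/app.log": "started"}
for kind, path in diff_layers(image, container):
    print(kind, path)
# C /etc/app.conf
# D /tmp/seed
# A /var/log/app.log
```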
