Docker daemon memory leak due to logs from long running process

I have the following setup on a machine with 600 MB of RAM:
A Perl service running in a container and writing its logs to STDERR
logspout shipping those logs to a remote server for archiving
I also truncate the logs periodically at:
/var/lib/docker/containers/CID/CID-json.log
as suggested here to avoid 100% disk scenarios.
Problem
The Docker daemon starts off with low memory usage (about 1%) and slowly climbs to 40% after the container has been running for 2 days.
Reference
Docker daemon memory leaks have been discussed in this issue and this issue, but both are now closed as fixed by a merged commit. I am running the latest major version of Docker (Docker version 1.4.0, build 4595d4f) and still see monotonically increasing memory usage.
EDIT: I ran an experiment: start a bash process in the container and print a lot of lines to STDERR. The Docker daemon's memory usage then grows very quickly.
Does Docker buffer logs and fail to release the memory even if the underlying log file (/var/lib/docker/containers/CID/CID-json.log) is cleared?
There's apparently no way to clear the logs. Will this commit solve this issue for long running tasks?
I don't know why docker daemon's memory usage keeps increasing. How do I debug this issue?

There is still at least one outstanding issue relating to memory leaks with logs: https://github.com/docker/docker/issues/9139
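To put numbers on the growth described in the question, a minimal sketch that samples the resident memory of a process over time may help. It assumes a Linux host with /proc mounted. The function names rss_kib and sample_rss are illustrative, and the daemon's PID has to be looked up separately (for example with pidof, whose argument depends on the Docker version).

```shell
# Print the resident set size (VmRSS) of a process in KiB.
# Linux-only: reads /proc/<pid>/status.
# $1 = PID to inspect
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Print "epoch-seconds rss-kib" samples for a PID at a fixed interval.
# $1 = PID, $2 = interval in seconds, $3 = number of samples
sample_rss() {
    i=0
    while [ "$i" -lt "$3" ]; do
        echo "$(date +%s) $(rss_kib "$1")"
        i=$((i + 1))
        if [ "$i" -lt "$3" ]; then
            sleep "$2"
        fi
    done
}

# Example (not executed here): one sample per minute for an hour.
# sample_rss "$DAEMON_PID" 60 60 >> docker-rss.log
```

Plotting the second column against the first shows whether memory grows in step with log volume or independently of it.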

This may not be what you are looking for, but I usually run a cron job to restart my containers after a certain amount of time every day. This ensures that the container always has enough RAM. I also generally restrict the container's maximum RAM usage when creating it.
Containers take only a few seconds to restart and serve data, so if you are not running a high-availability service and can afford a few seconds of downtime, consider restarting the container (assuming that you don't have persistent volumes).
However, if you do find a solution to your problem, do let us know.

docker rm $(docker ps -a -q)
docker rmi --force $(docker images -q)
docker system prune --force
Need to be root user.
systemctl stop docker
rm -rf /var/lib/docker/aufs
apt-get autoclean
apt-get autoremove
systemctl start docker

Related

Reduce the disk space Docker uses [duplicate]

(Post created on Oct 05 '16)
I noticed that every time I run an image and delete it, my system doesn't return to the original amount of available space.
The lifecycle I'm applying to my containers is:
> docker build ...
> docker run CONTAINER_TAG
> docker stop CONTAINER_TAG
> docker rm CONTAINER_ID
> docker rmi IMAGE_ID
[ running on a default mac terminal ]
The containers in fact were created from custom images, running from node and a standard redis. My OS is OSX 10.11.6.
At the end of the day I see that I keep losing MBs of disk space. How can I address this problem?
EDITED POST
2020 and the problem persists, leaving this update for the community:
Today running:
macOS 10.13.6
Docker Engine 18.9.2
Docker Desktop Cli 2.0.0.3
The easiest way to work around the problem is to prune the system with the Docker utilities.
docker system prune -a --volumes

WARNING:
By default, volumes are not removed to prevent important data from being deleted if there is currently no container using the volume. Use the --volumes flag when running the command to prune volumes as well:
Docker now has a single command to do that:
docker system prune -a --volumes
See the Docker system prune docs

There are three areas of Docker storage that can mount up, because Docker is cautious - it doesn't automatically remove any of them: exited containers, unused container volumes, unused image layers. In a dev environment with lots of building and running, that can be a lot of disk space.
These three commands clear down anything not being used:
docker rm $(docker ps -f status=exited -aq) - remove stopped containers
docker rmi $(docker images -f "dangling=true" -q) - remove image layers that are not used in any images
docker volume rm $(docker volume ls -qf dangling=true) - remove volumes that are not used by any containers.
These are safe to run: they won't delete image layers that are referenced by images, or data volumes that are used by containers. You can alias them and/or put them in a cron job to regularly clean up the local disk.
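One practical wrinkle with the three commands is that docker rm, docker rmi and docker volume rm report an error when the inner listing is empty. The following is a minimal sketch of a wrapper that skips the removal in that case. The names run_if_any and cleanup_unused are illustrative, and the DOCKER variable is an assumption added here so the script can be dry-run with DOCKER=echo.

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run; this is an
# assumption of this sketch, not something the docker CLI provides.
DOCKER="${DOCKER:-docker}"

# Run "$DOCKER <subcommand...> <ids...>" only if stdin supplies at least one ID.
run_if_any() {
    ids=$(cat)
    if [ -n "$ids" ]; then
        # Word splitting of $ids is intentional: one argument per ID.
        "$DOCKER" "$@" $ids
    fi
}

# The three cleanup steps from the answer above, each skipped when empty.
cleanup_unused() {
    "$DOCKER" ps -aq -f status=exited | run_if_any rm
    "$DOCKER" images -q -f dangling=true | run_if_any rmi
    "$DOCKER" volume ls -q -f dangling=true | run_if_any volume rm
}

# Example (not executed here): cleanup_unused
```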

It is also worth mentioning that file size of docker.qcow2 (or Docker.raw on High Sierra with Apple Filesystem) can seem very large (~64GiB), larger than it actually is, when using the following command:
ls -klsh Docker.raw
This can be somewhat misleading because it outputs the logical size of the file rather than its physical size.
To see the physical size of the file you can use this command:
du -h Docker.raw
Source: https://docs.docker.com/docker-for-mac/faqs/#disk-usage
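The difference between logical and physical size can be reproduced without Docker. The sketch below creates a throwaway sparse file (sparse_demo.img is a made-up name) and compares what ls -l and du report. It assumes the current directory is on a filesystem that supports sparse files.

```shell
# Create a sparse file: seek 64 MiB into the output without writing any data.
dd if=/dev/zero of=sparse_demo.img bs=1 count=0 seek=67108864 2>/dev/null

# Logical size in bytes, as reported by ls -l (5th column).
logical_bytes=$(ls -l sparse_demo.img | awk '{print $5}')

# Physical usage in KiB, as reported by du -k (1st column).
physical_kib=$(du -k sparse_demo.img | awk '{print $1}')

echo "logical:  $((logical_bytes / 1024)) KiB"
echo "physical: ${physical_kib} KiB"

# Remove the throwaway file.
rm -f sparse_demo.img
```

The logical figure reflects the declared 64 MiB, while the physical figure stays far smaller because no data blocks were written. This is the same effect that makes Docker.raw look larger in ls than in du.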

Why does the file keep growing?
If Docker is used regularly, the size of the Docker.raw (or Docker.qcow2) can keep growing, even when files are deleted.
To demonstrate the effect, first check the current size of the file on the host:
$ cd ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/
$ ls -s Docker.raw
9964528 Docker.raw
Note the use of -s which displays the number of filesystem blocks actually used by the file. The number of blocks used is not necessarily the same as the file “size”, as the file can be sparse.
Next start a container in a separate terminal and create a 1GiB file in it:
$ docker run -it alpine sh
# and then inside the container:
/ # dd if=/dev/zero of=1GiB bs=1048576 count=1024
1024+0 records in
1024+0 records out
/ # sync
Back on the host check the file size again:
$ ls -s Docker.raw
12061704 Docker.raw
Note the increase in size from 9964528 to 12061704, where the increase of 2097176 512-byte sectors is approximately 1GiB, as expected. If you switch back to the alpine container terminal and delete the file:
/ # rm -f 1GiB
/ # sync
then check the file on the host:
$ ls -s Docker.raw
12059672 Docker.raw
The file has not got any smaller! Whatever has happened to the file inside the VM, the host doesn’t seem to know about it.
Next if you re-create the “same” 1GiB file in the container again and then check the size again you will see:
$ ls -s Docker.raw
14109456 Docker.raw
It’s got even bigger! It seems that if you create and destroy files in a loop, the size of the Docker.raw (or Docker.qcow2) will increase up to the upper limit (currently set to 64 GiB), even if the filesystem inside the VM is relatively empty.
The explanation for this odd behaviour lies with how filesystems typically manage blocks. When a file is to be created or extended, the filesystem will find a free block and add it to the file. When a file is removed, the blocks become “free” from the filesystem’s point of view, but no-one tells the disk device. Making matters worse, the newly-freed blocks might not be re-used straight away – it’s completely up to the filesystem’s block allocation algorithm. For example, the algorithm might be designed to favour allocating blocks contiguously for a file: recently-freed blocks are unlikely to be in the ideal place for the file being extended.
Since the block allocator in practice tends to favour unused blocks, the result is that the Docker.raw (or Docker.qcow2) will constantly accumulate new blocks, many of which contain stale data. The file on the host gets larger and larger, even though the filesystem inside the VM still reports plenty of free space.
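The block counts quoted in the walkthrough above can be checked with plain shell arithmetic. This sketch assumes, as the text does, that ls -s reports 512-byte sectors. The helper name sectors_to_mib is illustrative.

```shell
# Convert a count of 512-byte sectors to whole MiB (integer division).
sectors_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

before=9964528    # ls -s Docker.raw before creating the 1GiB file
after=12061704    # ls -s Docker.raw after creating the 1GiB file

growth=$((after - before))
echo "grew by $growth sectors, about $(sectors_to_mib "$growth") MiB"
```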
TRIM
A TRIM command (or a DISCARD or UNMAP) allows a filesystem to signal to a disk that a range of sectors contain stale data and they can be forgotten. This allows:
an SSD drive to erase and reuse the space, rather than spend time shuffling it around; and
Docker for Mac to deallocate the blocks in the host filesystem, shrinking the file.
So how do we make this work?
Automatic TRIM in Docker for Mac
In Docker for Mac 17.11 there is a containerd “task” called trim-after-delete listening for Docker image deletion events. It can be seen via the ctr command:
$ docker run --rm -it --privileged --pid=host walkerlee/nsenter -t 1 -m -u -i -n ctr t ls
TASK PID STATUS
vsudd 1741 RUNNING
acpid 871 RUNNING
diagnose 913 RUNNING
docker-ce 958 RUNNING
host-timesync-daemon 1046 RUNNING
ntpd 1109 RUNNING
trim-after-delete 1339 RUNNING
vpnkit-forwarder 1550 RUNNING
When an image deletion event is received, the process waits for a few seconds (in case other images are being deleted, for example as part of a docker system prune ) and then runs fstrim on the filesystem.
Returning to the example in the previous section, if you delete the 1 GiB file inside the alpine container
/ # rm -f 1GiB
then run fstrim manually from a terminal in the host:
$ docker run --rm -it --privileged --pid=host walkerlee/nsenter -t 1 -m -u -i -n fstrim /var/lib/docker
then check the file size:
$ ls -s Docker.raw
9965016 Docker.raw
The file is back to (approximately) its original size – the space has finally been freed!
Hopefully this blog will be helpful. Also check out the following macOS Docker utility scripts for this problem:
https://github.com/wanliqun/macos_docker_toolkit

Docker on Mac has an additional problem that is hurting a lot of people: the docker.qcow2 file can grow out of proportion (up to 64 GB) and won't ever shrink back down on its own.
https://github.com/docker/for-mac/issues/371
As stated in one of the replies by djs55, a fix is planned, but it's not a quick one. Quote:
The .qcow2 is exposed to the VM as a block device with a maximum size
of 64GiB. As new files are created in the filesystem by containers,
new sectors are written to the block device. These new sectors are
appended to the .qcow2 file causing it to grow in size, until it
eventually becomes fully allocated. It stops growing when it hits this
maximum size.
...
We're hoping to fix this in several stages: (note this is still at the
planning / design stage, but I hope it gives you an idea)
1) we'll switch to a connection protocol which supports TRIM, and
implement free-block tracking in a metadata file next to the qcow2.
We'll create a compaction tool which can be run offline to shrink the
disk (a bit like the qemu-img convert but without the dd if=/dev/zero
and it should be fast because it will already know where the empty
space is)
2) we'll automate running of the compaction tool over VM reboots,
assuming it's quick enough
3) we'll switch to an online compactor (which is a bit like a GC in a
programming language)
We're also looking at making the maximum size of the .qcow2
configurable. Perhaps 64GiB is too large for some environments and a
smaller cap would help?
Update 2019: many updates have been done to Docker for Mac since this answer was posted to help mitigate problems (notably: supporting a different filesystem).
Cleanup is still not fully automatic, though; you may need to prune from time to time. For a single command that can help to clean up disk space, see zhongjiajie's answer.

docker container prune
docker system prune
docker image prune
docker volume prune

Since nothing here was working for me, here's what I did. Check the file size:
ls -lhks ~/Library/Containers/com.docker.docker/Data/vms/0/data/Docker.raw
Then, in Docker Desktop, simply reduce the disk image size (I was using the raw format). It will warn that everything will be deleted, but by the time you are reading this post, you have probably already deleted it all anyway. This creates a fresh, empty file.

I'm not sure if it is related to the current topic, but this has been a solution for me personally:
Open Docker settings -> Resources -> Disk image size and set it to 16 GB

There are several options for limiting Docker's disk space. I'd start by limiting/rotating the logs: Docker container logs taking all my disk space
E.g. if you have a recent Docker version, you can start each container with a --log-opt max-size=50m option.
Also, if you've got old, unused containers, consider having a look at the Docker logs, which are located at /var/lib/docker/containers/*/*-json.log
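To see which containers' logs are worth rotating or truncating first, a minimal sketch that lists the largest *-json.log files may help. The function name largest_logs and its defaults are illustrative. Reading /var/lib/docker usually requires root.

```shell
# List the N largest container log files under a Docker containers directory.
# Output format is that of du -k: size in KiB, a tab, then the path.
# $1 = containers directory (default: /var/lib/docker/containers)
# $2 = number of entries to show (default: 5)
largest_logs() {
    dir="${1:-/var/lib/docker/containers}"
    count="${2:-5}"
    find "$dir" -name '*-json.log' -exec du -k {} + | sort -rn | head -n "$count"
}

# Example (not executed here): sudo sh -c '. ./largest_logs.sh; largest_logs'
```

For new containers, the --log-opt max-size=50m option mentioned above can be combined with the json-file driver's max-file option (for example --log-opt max-file=3) so that rotated files are capped in number as well as size.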

$ sudo docker system prune
WARNING! This will remove:
all stopped containers
all networks not used by at least one container
all dangling images
all dangling build cache

Docker prune does not help in clearing space, docker-desktop-data seems not to be clearing

I am trying to clear my system of Docker images by pruning constantly. My latest prune even reported that 22 GB were freed, but when I look at my system, not a single MB has been freed. My system is not very low on storage, but I am sure there are hundreds of GB of unused Docker images on it. I need help clearing them, if anyone else has faced and resolved this.
docker system prune -a
I have tried the command above; it does not work.
Software : Docker Desktop
OS : Windows
WSL backend Docker running
From docker info, I could find Docker Root Dir: /var/lib/docker
That path is not present in any of my WSL distributions such as Ubuntu. There are two Docker WSL distributions on my system, docker-desktop-data and docker-desktop, and I suspect the problem is related to this difference. Can someone help?

Does the following work?
docker rmi -f $(docker images -f "dangling=true" -q)
You can actually see what's being deleted that way.

Docker taking much more space than sum of containers, images and volumes

Docker running on Ubuntu is taking 18G of disk space (on a partition of 20G), causing server crashes. Commands below show that there is a serious mismatch between "official" image, container and volume sizes and the docker folder size.
What causes this and how can I clean up?
I already tried docker system prune, which doesn't help.
du -sh /var/lib/docker
docker system df
du -sh /var/lib/docker/*
du -sh /var/lib/docker/containers/*

I was having the same problem. I solved it by deleting the log files:
sudo sh -c "truncate -s 0 /var/lib/docker/containers/*/*-json.log"
Link: How to clear the logs properly for a Docker container?

You have two containers that are eating your storage. Those containers must be running, because you said you already ran docker system prune; otherwise /var/lib/docker/containers would be empty.
So check why those two are consuming so much. They are probably logging too much to stdout.

docker system prune should clean up old unused layers, I have managed to get lots of disk space back several times with this. Hope it helps.

Docker CPU and memory too low

I am a newbie to the Docker world. I could successfully build and run a container with Tomcat, but performance is very poor. I logged into the running system and found that only 2 CPU cores and 4 GB of RAM are allocated. Is this one reason for the bad performance? If so, how can I allocate more resources?
I tried the following command, but no luck.
docker run --rm -c 3 -p 32772:8080 --memory=8Gb -d helloworld
Any pointer will be helpful.
thanks in advance.

Do you use Docker for Windows/Mac? Then you can change it in the settings (Docker icon in the taskbar).
On Windows, Docker runs in Hyper-V without dynamic memory, so the memory will not be available to your system even if it isn't used.
With docker info you can find out how many resources are available.
The bad performance may also be caused by very slow file access on Docker for Mac.
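On Linux, Docker has no upper limit by default.
The cpu and memory args of docker run limit the resources for one container; if they are not set, there is no upper limit. Note that -c is short for --cpu-shares, which sets a relative scheduling weight rather than a number of cores.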

how to kill lots of docker container processes effectively and faster?

We are using Jenkins and Docker in combination. We have set up Jenkins in a master/slave model, and containers are spun up on the slave agents.
Sometimes, due to a bug in the Jenkins Docker plugin or for some unknown reason, containers are left dangling.
Killing them all takes time: about 5 seconds per container process, and we have about 15000 of them, so the cleanup job will take ~24hrs to finish. How can I remove a bunch of containers at once, or otherwise more efficiently, so that it takes less time?
Will uninstalling the docker client remove the containers?
Is there a volume where these container processes are kept that could be removed (bad idea)?
Is there any threading/parallelism to remove them faster?
I am going to run a weekly cron job to work around these bugs, but right now I don't have a whole day to get these removed.

Try this:
Uninstall docker-engine
Reboot host
rm -rf /var/lib/docker
Rebooting effectively stops all of the containers and uninstalling docker prevents them from coming back upon reboot. (in case they have restart=always set)

If you are only interested in killing the processes because they are not exiting properly (my assessment of what you mean; correct me if I'm wrong), you can walk the running containers and kill their processes using the Pid from each container's metadata. docker kill is taking so long per container because the container may not respond to the right signals, so the engine waits patiently and only then kills the process. Since you don't seem to care about clean process shutdown at this point, kill -9 is a much swifter and more drastic way to end these containers and clean up.
A quick test using the latest docker release shows I can kill ~100 containers in 11.5 seconds on a relatively modern laptop:
$ time docker ps --no-trunc --format '{{.ID}}' | xargs -n 1 docker inspect --format '{{.State.Pid}}' | xargs -n 1 sudo kill -9
real 0m11.584s
user 0m2.844s
sys 0m0.436s
A clear explanation of what's happening:
I'm asking the docker engine for a "full container ID only" list of all running containers (the docker ps)
I'm passing that through docker inspect one by one, asking to output only the process ID (.State.Pid), which
I then pass to kill -9 to have the system directly kill the container process; much quicker than waiting for the engine to do so.
Again, this is not recommended for general use as it does not allow for standard (clean) exit processing for the containerized process, but in your case it sounds like that is not an important criterion.
If there is leftover container metadata for these exited containers you can clean that out by using:
docker rm $(docker ps -q -a --filter status=exited)
This will remove all exited containers from the engine's metadata store (the /var/lib/docker content) and should be relatively quick per container.

So,
docker kill $(docker ps -a -q)
isn't what you need?
EDIT: obviously it isn't. My next take then:
A) somehow create a list of all containers that you want to stop.
B) Partition that list (maybe by just slicing it into n parts).
C) Kick off n jobs in parallel, each one working on one of those list slices.
D) Hope that "docker" is robust enough to handle n processes sending n kill requests in sequence in parallel.
E) If that really works: maybe start experimenting to determine the optimum setting for n.
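Steps A to C can be sketched with xargs, which handles both the slicing and the parallel workers. This is a minimal sketch under stated assumptions: the function name kill_in_batches is illustrative, the DOCKER variable is added here so the pipeline can be dry-run with DOCKER=echo, and whether the Docker daemon copes well with many concurrent kill requests is exactly the open question in step D.

```shell
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run; this is an
# assumption of this sketch, not a docker CLI feature.
DOCKER="${DOCKER:-docker}"

# Read container IDs from stdin (one per line) and kill them in batches.
# $1 = IDs per "docker kill" invocation (default 50)
# $2 = number of invocations to run in parallel (default 4)
kill_in_batches() {
    batch="${1:-50}"
    workers="${2:-4}"
    xargs -n "$batch" -P "$workers" "$DOCKER" kill
}

# Example (not executed here): 8 workers, 50 containers per invocation.
# docker ps -q | kill_in_batches 50 8
```

Varying the two arguments is one way to run the experiment suggested in step E.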
