docker out of memory

I'm new to docker and have a problem with existing scripts that work on one machine but not on another. I'm willing to read documentation and existing answers but am a little lost on the many levels of abstraction in this topic.
Running an application in docker results in an out-of-memory exception. I start docker with --ulimit memlock=-1:-1, and no other memory limit seems to be applied.
df -h inside docker yields
root@localhost:/# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-1312128-9219e5dbff0bc6da3a663fab31ec34e6f6b28ba6c8fbd3b343d9131d41f6b1c9   10G  3.0G  7.1G  30% /
tmpfs                    3.9G     0  3.9G   0% /dev
tmpfs                    3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/fedora-root   50G   20G   28G  42% /etc/hosts
/dev/mapper/fedora-home  401G  151G  231G  40% /var/results
shm                       64M     0   64M   0% /dev/shm
When the OOM occurs, the first filesystem is 95% full. Where does this 10G limit come from? Where could I adjust it?
All partitions on my machine surely have enough free space.
[uscholz@localhost RegressionTesting]$ docker info
Containers: 2
Running: 1
Paused: 0
Stopped: 1
Images: 52
Server Version: 1.10.3
Storage Driver: devicemapper
Pool Name: docker-253:0-1312128-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 12.17 GB
Data Space Total: 107.4 GB
Data Space Available: 32.16 GB
Metadata Space Used: 7.889 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.14 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.122 (2016-04-09)
Execution Driver: native-0.2
Logging Driver: journald
Plugins:
Volume: local
Network: bridge null host
Kernel Version: 4.8.12-200.fc24.x86_64
Operating System: Fedora 24 (Twenty Four)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 8
Total Memory: 7.787 GiB
Name: localhost.localdomain
ID: YXHN:34PG:ZQA3:P4DU:4TFY:6THC:VFI2:E7BE:IGOW:2TTH:3BS7:3OOD
Registries: docker.io (secure)

Where does this limit 10G come from?
It comes from the dockerd daemon, which by default creates a special 'device' of 10G for every new container.
Where could I adjust it?
In case of the devicemapper storage driver, you can set it by providing --storage-opt dm.basesize=50G to dockerd; a sketch follows.
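For example, assuming you configure the daemon through /etc/docker/daemon.json (the 50G value is only an illustration, and the new base size applies only to images and containers created after the daemon restart):
{
"storage-driver": "devicemapper",
"storage-opts": ["dm.basesize=50G"]
}
Then restart the daemon (e.g. systemctl restart docker) and recreate the container.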
P.S.: actually, I think the OOM is caused by memory, not disk space.
And there are two possible reasons:
You are out of real memory. Use free -m on the docker host machine to watch it; a container uses the same memory as its host.
You are out of shared memory. I'm not sure it causes an OOM, but docker sets /dev/shm to 64M by default, which is not appropriate for some applications; see the sketch below.
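If shared memory is the problem, it can be raised per container with the --shm-size flag (a sketch; 1g is an arbitrary example value, and my-app-image is a placeholder for your actual image):
docker run --shm-size=1g my-app-image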

Related

Running Docker on Ubuntu VM - keeps failing because of disk space

I have been struggling to build an application on my Ubuntu VM. On this VM, I have cloned a git repository, which contains an application (frontend, backend, database). When running the make command, it ultimately fails somewhere in the build process because of 'no space left on device'. Having increased the RAM and hard-disk size several times now, I am still wondering what exactly causes this error.
Is it the RAM size, or the hard-disk size?
Let me give some more information:
OS: Ubuntu 19.04
RAM allocated: 9.2 GB
Processors (CPU): 6
Hard disk space: 43 GB
The Ubuntu VM is a rather clean install, with only Docker, Docker Compose, and NodeJS installed on it. The VM runs via VMWare.
The following repository is cloned, which is meant to be built on the VM:
git@github.com:reactioncommerce/reaction-platform.git
For more information on the requirements they pose, which I seem to meet: https://docs.reactioncommerce.com/docs/installation-reaction-platform
After having increased RAM, CPU count, and hard disk space iteratively, I still end up with the 'no space left on device' error. When checking the disk space via df -h, I get the following:
Filesystem      Size  Used Avail Use% Mounted on
udev            4.2G     0  4.2G   0% /dev
tmpfs           853M  1.8M  852M   1% /run
/dev/sr0        1.6G  1.6G     0 100% /cdrom
/dev/loop0      1.5G  1.5G     0 100% /rofs
/cow            4.2G  3.7G  523M  88% /
tmpfs           4.2G   38M  4.2G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           4.2G     0  4.2G   0% /sys/fs/cgroup
tmpfs           4.2G  584K  4.2G   1% /tmp
tmpfs           853M   12K  853M   1% /run/user/999
Now this makes me wonder: it seems that /dev/sr0, /dev/loop0, and /cow are the partitions used when building the application. However, I do not quite understand whether I am constrained by RAM or by actual disk space at the moment.
Other Docker issues made me look at the inodes as well, as they could be problematic, and these also seem to be maxed out; however, I think the issue resides in the above. One way to check is sketched below.
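A quick way to check whether inodes rather than bytes are exhausted (a sketch; the find invocation just counts files per directory and needs root for full coverage):
df -ih
sudo find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head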
I saw a similar question on SuperUser (found here), but I could not really map that situation onto mine.

docker is full, all inodes are used

I've got a huge problem: all my inodes seem to be used.
I've cleaned all unused volumes and removed all unused containers and images with docker system prune, but it still seems to stay full:
Filesystem      Inodes   IUsed  IFree IUse% Mounted on
none           3200000 3198742   1258  100% /
tmpfs           873942      16 873926    1% /dev
tmpfs           873942      13 873929    1% /sys/fs/cgroup
/dev/sda1      3200000 3198742   1258  100% /images
shm             873942       1 873941    1% /dev/shm
tmpfs           873942       1 873941    1% /sys/firmware
docker info
Containers: 5
Running: 3
Paused: 0
Stopped: 2
Images: 23
Server Version: 17.06.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 53
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.668GiB
Name: serveur-1
ID: CW7J:FJAH:S4GR:4CGD:ZRWI:EDBY:AYBX:H2SD:TWZO:STZU:GSCX:TRIC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
The only thing I think could be doing this is a build I'm running on this machine.
This build runs an npm install with many files.
Can these files stay on the server?
Is there any chance I have to delete these temporary files?
Are there any dangling volumes left in the system? If you have dangling volumes, they may fill up your disk space.
List all dangling volumes
docker volume ls -q -f dangling=true
Remove all dangling volumes
docker volume rm `docker volume ls -q -f dangling=true`
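On Docker 1.13 and newer, the same cleanup is a single built-in command (it asks for confirmation before deleting):
docker volume prune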
Found the error:
this seems to be a Docker 17.06.1-ce bug.
That version seems not to delete images correctly, keeping files in /var/lib/docker/aufs/mnt/.
So just upgrade to a newer docker version and this will be fine.
Now df shows me:
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1       51558236 3821696  45595452   8% /
udev               10240       0     10240   0% /dev
tmpfs            1398308   57696   1340612   5% /run
tmpfs            3495768       0   3495768   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            3495768       0   3495768   0% /sys/fs/cgroup
This is better :)
I had the same problem. I had Jenkins running inside Docker with a volume attached to it. After a few weeks Jenkins told me "npm WARN tar ENOSPC: no space left on device". After some googling I found out via sudo df -ih that all inodes were taken. With sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n I could locate the folder using up all the inodes; it was a certain build with npm. I deleted that folder and now I'm good to go again.
This can also be an effect of a lot of stopped containers, for example if there's a cron job running that uses a container and the docker run command line does not include --rm; in that case, every cron invocation leaves a stopped container on the filesystem.
In this case, the output of docker info will show a high number under Server -> Containers -> Stopped.
To cure this:
docker container prune
Add --rm to your docker run command line in the cron job, as in the sketch below.
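A hypothetical crontab entry with the flag in place (image name, script path, and schedule are placeholders; cron often needs the full path to the docker binary):
*/15 * * * * /usr/bin/docker run --rm my-job-image /run-job.sh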
In my case it was a dangling build cache, because removing dangling images did not solve the issue.
This cache can be removed with the following command: docker system prune --all --force. But be careful: maybe you still need some of your volumes or images.
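If you only want to drop the build cache while keeping images and volumes, newer Docker versions (18.09+) have a dedicated subcommand for that (a sketch; add --all to remove all unused build cache rather than just the dangling layers):
docker builder prune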
In my case, pods with ingress-nginx and modsecurity active were creating a lot of directories and files on container volumes within the modsecurity structure (/var/lib/docker/overlay2/...) after more than 80 days of execution.
Restarting the pods removed the problem.
This can be generalized to other cases where a container's internal storage is not kept under control.

Can't run or build Docker images on CentOS 7

I'm learning to use Docker now; I've installed docker on my server (CentOS 7). But when I follow the official tutorial, I hit one problem which prevents me from continuing:
mkdir /var/lib/docker/overlay/7849ab40fd8072dcd724387dab14707bb4af0e94d9ab4f71795d0c478c3d49a9-init/merged/dev/shm: invalid argument
This appears when I build/run most images (the few images that didn't fail were the official python:latest and hello-world).
What I'm trying to do is pull the official image "docker/whalesay" and run it as follows:
docker run docker/whalesay
Unable to find image 'docker/whalesay:latest' locally
latest: Pulling from docker/whalesay
e190868d63f8: Pull complete
909cd34c6fd7: Pull complete
0b9bfabab7c1: Pull complete
a3ed95caeb02: Pull complete
00bf65475aba: Pull complete
c57b6bcc83e3: Pull complete
8978f6879e2f: Pull complete
8eed3712d2cf: Pull complete
Digest: sha256:178598e51a26abbc958b8a2e48825c90bc22e641de3d31e18aaf55f3258ba93b
Status: Downloaded newer image for docker/whalesay:latest
docker: Error response from daemon: mkdir /var/lib/docker/overlay/fb4b7f34f0963d158856dadccec49963e47716865c83066f7e6eaf0bae057a13-init/merged/dev/shm: invalid argument.
See 'docker run --help'.
Here is my docker info:
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 4
Server Version: 1.13.0
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 03e5862ec0d8d3b3f750e19fca3ee367e13c090e
runc version: 2f7393a47307a16f8cee44a37b262e8b81021e3e
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.702 GiB
Name: iZ25d1y69iaZ
ID: VJAP:FBMM:CQ5I:KIV5:FO47:VJUJ:ECU2:5TOS:JZBE:EUSH:HUFF:NCAZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
I tried to search but did not find the same issue. It looks like a file system problem, so here are my df -h and file -s /dev/vda1:
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        40G  4.9G   33G  14% /
devtmpfs        1.9G     0  1.9G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G  352K  1.9G   1% /run
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           380M     0  380M   0% /run/user/1000
/dev/vda1: Linux rev 1.0 ext4 filesystem data, UUID=80b9b662-0a1d-4e84-b07b-c1bf19e72d97 (needs journal recovery) (extents) (large files) (huge files)
I'm new to Docker, so this may be a configuration problem or a version issue, but I haven't figured it out.
I appreciate any suggestions and answers!
I was on SLES 12 and I fixed the issue by:
service docker stop
vi /etc/docker/daemon.json (this is a new file)
Add the following entry:
{
"storage-driver": "devicemapper"
}
service docker start
Edit: GitHub issue with the discussion of this problem.
I remember there was an issue with the 3.10 kernel and overlayfs/ext4/xfs filesystems, and some people noticed that it started working again with a more recent kernel (I think it was in 3.18 that the overlayfs module was added to the mainline kernel).
So if upgrading the kernel is an option for you, you can check whether overlayfs+ext4 works; see the check sketched below.
If a kernel upgrade is not an option, then I guess your only choice is to use another storage driver (aufs should not be available, so device mapper).
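A quick way to see whether the running kernel can provide overlayfs at all (a sketch; assumes root access, and on pre-3.18 kernels the module may be called overlayfs instead of overlay):
modprobe overlay
grep overlay /proc/filesystems
If the grep prints a line such as "nodev overlay", the module is available.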

docker-registry disk full and no ideas to diet it

I have a server with a docker registry, and I have pushed builds many times to the same :latest tag. Now my hard disk is full and I can't figure out how to slim it down.
disk is full
df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        48G   45G  397M 100% /
udev             10M     0   10M   0% /dev
tmpfs           794M   81M  713M  11% /run
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/dm-1       9.8G   56M  9.2G   1% /var/lib/docker/devicemapper/mnt/2e895760700ac3e1575e496a4ac6adde4de6129226febba8c0c3126af1655ad9
shm              64M     0   64M   0% /var/lib/docker/containers/5aa47e34d1b8be22deeae473729b4e587e6e4bfe7fb3e262eda891bad4b05042/shm
There are no dangling volumes or images:
# docker volume ls -qf dangling=true
#
# docker images -f "dangling=true" -q
#
docker images
[root@kvm22:/etc/cron.daily] # docker images
REPOSITORY           TAG    IMAGE ID      CREATED       SIZE
jwilder/nginx-proxy  0.5.0  72b65b5a6f38  4 weeks ago   248.4 MB
registry             2      c9bd19d022f6  11 weeks ago  33.27 MB
registry             2.5    c9bd19d022f6  11 weeks ago  33.27 MB
disk usage
# du -h -d 7 /var/lib/docker/volumes/
12K /var/lib/docker/volumes/24000fbe2e81da06924be8f7ce81e07101824036bca5f87d4d811f2a6f7bfa7b/_data
16K /var/lib/docker/volumes/24000fbe2e81da06924be8f7ce81e07101824036bca5f87d4d811f2a6f7bfa7b
42G /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker/registry/v2/blobs/sha256
42G /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker/registry/v2/blobs
5.9M /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker/registry/v2/repositories/labor-prod
5.9M /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker/registry/v2/repositories
43G /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker/registry/v2
43G /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker/registry
43G /var/lib/docker/volumes/registry_docker-registry-volume/_data/docker
43G /var/lib/docker/volumes/registry_docker-registry-volume/_data
43G /var/lib/docker/volumes/registry_docker-registry-volume
43G /var/lib/docker/volumes/
Output of docker version:
# docker --version
Docker version 1.12.4, build 1564f02
Output of docker info:
# docker info
Containers: 4
Running: 1
Paused: 0
Stopped: 3
Images: 5
Server Version: 1.12.4
Storage Driver: devicemapper
Pool Name: docker-8:1-1184923-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: ext4
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 1.058 GB
Data Space Total: 107.4 GB
Data Space Available: 3.036 GB
Metadata Space Used: 2.142 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.145 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.90 (2014-09-01)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.873 GiB
Name: kvm22
ID: G6OC:EKKY:ER4W:3JVZ:25BI:FF2Y:YXVA:RZRR:WPAP:SASB:AJJA:DM6J
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No memory limit support
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No oom kill disable support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:
127.0.0.0/8
I had the same problem; I can't believe there's no ready-made solution for this. Anyway, I hacked a tool together and it seems to work.
You can find it here: https://github.com/Richie765/docker-tools
Basically it uses a bash script to find out which manifests are untagged, then deletes them through the registry API. Afterwards you can run a garbage collection to actually delete the data; the manual flow is sketched below.
I'm sure the script isn't perfect. Any improvements are welcome!
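For reference, the manual version of that flow might look like this (a sketch with assumptions: the registry container is named registry, listens on localhost:5000, uses the stock config path, and was started with deletion enabled via REGISTRY_STORAGE_DELETE_ENABLED=true; <name> and <digest> are placeholders):
curl -X DELETE http://localhost:5000/v2/<name>/manifests/<digest>
docker exec registry bin/registry garbage-collect /etc/docker/registry/config.yml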
Found another tool here:
https://github.com/burnettk/delete-docker-registry-image
It includes a clean_old_version.py script.
I'd give it a try too.
In case anyone still has this problem:
This is the 'reset' way I solved it:
You can stop and delete the registry
docker stop registry && docker rm -v registry
and restart it afterwards:
docker run -d -p 5000:5000 --restart=always --name registry registry:2
Then you would have to rebuild your images locally and push them to the registry again.

docker disk space grows faster than container's

Docker containers that are modifying, adding, and deleting files extensively (leveldb) are growing disk usage faster than the container itself reports and eventually use up all the disk.
Here's one snapshot of df, and a second. You'll note that disk space has increased considerably (300 MB) from the host's perspective, but the container's self-reported usage of disk space has only increased by 17 MB. As this continues, the host runs out of disk.
Ubuntu stock 14.04, Docker version 1.10.2, build c3959b1.
Is there some sort of trim-like issue going on here?
root@9e7a93cbcb02:~# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/mapper/docker-202:1-136171-d4[...]                 9.8G  667M  8.6G   8% /
tmpfs                                                   1.9G     0  1.9G   0% /dev
tmpfs                                                   1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/disk/by-uuid/0a76513a-37fc-43df-9833-34f8f9598ada  7.8G  2.9G  4.5G  39% /etc/hosts
shm                                                      64M     0   64M   0% /dev/shm
And later on:
root@9e7a93cbcb02:~# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/mapper/docker-202:1-136171-d4[...]                 9.8G  684M  8.6G   8% /
tmpfs                                                   1.9G     0  1.9G   0% /dev
tmpfs                                                   1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/disk/by-uuid/0a76513a-37fc-43df-9833-34f8f9598ada  7.8G  3.2G  4.2G  43% /etc/hosts
shm                                                      64M     0   64M   0% /dev/shm
This is happening because of a kernel bug fix that has not been propagated to many mainstream OS distros. It's actually quite bad for newbie Docker users who naively fire up docker on the default Amazon AMI, as I did.
Stick with CoreOS Stable and you won't have this issue. I have zero affiliation with CoreOS, and frankly I am greatly annoyed to have to deal with Yet Another Distro. On CoreOS, or any other distro with a correctly working Linux kernel, the disk usage of container and host track each other up and down correctly as the container frees or uses space. I'll note that OS X and other VirtualBox-based setups use CoreOS and thus work correctly.
Here's a long writeup on a very similar issue, but the root cause is a trim/discard issue in devicemapper. You need a fairly recent version of the Linux kernel to handle this properly. I'd go so far as to say that Docker is unfit for purpose unless you have the correct Linux kernel. See that article for a discussion on which version of your distro to use.
Note that the above article only deals with management of docker containers and images, but AFAICT it also affects attempts by the container itself to free up disk space during normal addition/removal of files or blocks.
Be careful of what distro your cloud provider is using for cloud container management.
