Is there a way to run replicas on tmpfs (in-memory) on the host? I am hitting this problem (infinite restart loop):
time="2018-11-02T21:55:05Z" level=fatal msg="Error running start replica command: failed to find extents, error: invalid argument"
Is the service able to work on disks mounted in memory?
Currently the OpenEBS Jiva storage engine supports only file systems that support extent mapping (ext4, XFS, etc.), whereas tmpfs does not support extent mapping, hence it fails.
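A rough way to probe this from a shell, as a hedged sketch (filefrag exercises the FIEMAP/FIBMAP ioctls, the same kind of extent-mapping interface the error above comes from; exact behavior varies by kernel, and the mount points are illustrative):

touch /mnt/ext4-disk/probe && filefrag /mnt/ext4-disk/probe   # ext4/XFS: reports "1 extent found"
touch /dev/shm/probe && filefrag /dev/shm/probe               # tmpfs: fails with an "invalid argument" style error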
I am trying to fetch a large number of URLs with Selenium WebDriver (selenium/standalone-chrome:96.0 image) running in a container on an EC2 instance with 30 GB of storage. I put a lot of effort into avoiding disk space leaks during this process, but finally gave up. After a while the container runs out of space and I get an error from WebDriver like selenium.common.exceptions.WebDriverException: Message: unknown error: cannot create temp dir for user data dir
As a workaround I can force the container to exit after a while, so docker will restart it (with the restart: always policy), but the disk space is not reclaimed, and sooner or later the docker restart manager throws an error like
restartmanger wait error: mount /dev/mapper/docker-259:3-394503-72f7b76024003665f890079f6f681414587483fa2f30e0f080c027cd516ba7d2:/var/lib/docker/devicemapper/mnt/72f7b76024003665f890079f6f681414587483fa2f30e0f080c027cd516ba7d2: input/output error\nFailed to mount; and leaves container stopped.
Is there any technique to reclaim disk space on container restart?
UPDATE
Creating/closing the webdriver is performed after each driver.get():
import logging
import sys

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# module-level state assumed by the original snippet
logger = logging.getLogger(__name__)
driver = None

def create_webdriver():
    global driver
    try:
        logger.info("WebDriver: creating...")
        options = Options()
        options.add_argument("start-maximized")
        options.add_argument("enable-automation")
        options.add_argument("--headless")
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-infobars")
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-browser-side-navigation")
        options.add_argument("--disable-gpu")
        driver = webdriver.Chrome(options=options)
    except Exception:
        logger.exception("WebDriver: exception while creating, can not manage, exiting.")
        sys.exit(1)

def close_webdriver():
    global driver
    if driver is not None:
        driver.quit()
        driver = None
UPDATE2
It seems there is no disk space leakage, but rather some issue with the docker devicemapper filesystem on the EC2 instance. I carefully investigated disk and docker space usage during the process and found no issues (df and docker system df output below):
Filesystem      1K-blocks    Used Available Use% Mounted on
devtmpfs         16323728     120  16323608   1% /dev
tmpfs            16333664       0  16333664   0% /dev/shm
/dev/nvme0n1p1    8189348 1919080   6170020  24% /
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 9 1 9.033GB 8.571GB (94%)
Containers 8 6 144.8MB -2B (0%)
Local Volumes 0 0 0B 0B
Build Cache 0 0 0B 0B
but the container still ends up failing with
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot create temp dir for user data dir
and exits; docker can't restart it, and there are errors in /var/log/docker:
time="2021-12-26T01:36:06.030765815Z" level=error msg="Driver devicemapper couldn't return diff size of container 258399ca6d95cb3510e5e02fec9253b2f22852e8a3553cfad8774b9f913ed279: Failed to mount; dmesg: <3>[ 3761.830462] Buffer I/O error on dev dm-8, logical block 2185471, lost async page write\n<4>[ 3761.839429] JBD2: recovery failed\n<3>[ 3761.843623] EXT4-fs (dm-8): error loading journal\n: mount /dev/mapper/docker-259:3-394503-26a311e2927d080ef4895f43d7dcd6ddaa26e5c0d8e71b6eb46bcdc8d1601194:/var/lib/docker/devicemapper/mnt/26a311e2927d080ef4895f43d7dcd6ddaa26e5c0d8e71b6eb46bcdc8d1601194: input/output error"
time="2021-12-26T01:36:25.009915383Z" level=info msg="ignoring event" container=f47ab38bdab172205bd30c3cdbc6723162e4422ef4dcda23f6fec0ac99a20035 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2021-12-26T01:36:25.010710566Z" level=info msg="shim disconnected" id=f47ab38bdab172205bd30c3cdbc6723162e4422ef4dcda23f6fec0ac99a20035
time="2021-12-26T01:36:25.010797187Z" level=error msg="copy shim log" error="read /proc/self/fd/36: file already closed"
time="2021-12-26T01:36:28.788036177Z" level=warning msg="error locating sandbox id c1e0abc725ee3e88f388042a34b8e46db09a8fd8024774862899d0f7d9af721b: sandbox c1e0abc725ee3e88f388042a34b8e46db09a8fd8024774862899d0f7d9af721b not found"
time="2021-12-26T01:36:28.788396052Z" level=error msg="Error unmounting device 8de02009e67a0fea87313b35b117eaed6cf654837532e04ce16a6fc0846d1954: invalid argument" storage-driver=devicemapper
time="2021-12-26T01:36:28.788426923Z" level=error msg="error unmounting container" container=f47ab38bdab172205bd30c3cdbc6723162e4422ef4dcda23f6fec0ac99a20035 error="invalid argument"
time="2021-12-26T01:36:28.789562261Z" level=error msg="f47ab38bdab172205bd30c3cdbc6723162e4422ef4dcda23f6fec0ac99a20035 cleanup: failed to delete container from containerd: no such container"
time="2021-12-26T01:36:28.794739546Z" level=error msg="restartmanger wait error: mount /dev/mapper/docker-259:3-394503-8de02009e67a0fea87313b35b117eaed6cf654837532e04ce16a6fc0846d1954:/var/lib/docker/devicemapper/mnt/8de02009e67a0fea87313b35b117eaed6cf654837532e04ce16a6fc0846d1954: input/output error\nFailed to mount; dmesg: <3>[ 3784.574178] Buffer I/O error on dev dm-10, logical block 1048578, lost async page write\n<4>[ 3784.583183] JBD2: recovery failed\n<3>[ 3784.587446] EXT4-fs (dm-10): error loading journal\n\ngithub.com/docker/docker/daemon/graphdriver/devmapper.(*DeviceSet).MountDevice\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/daemon/graphdriver/devmapper/deviceset.go:2392\ngithub.com/docker/docker/daemon/graphdriver/devmapper.(*Driver).Get\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/daemon/graphdriver/devmapper/driver.go:208\ngithub.com/docker/docker/layer.(*referencedRWLayer).Mount\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/layer/mounted_layer.go:104\ngithub.com/docker/docker/daemon.(*Daemon).Mount\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/daemon/daemon.go:1320\ngithub.com/docker/docker/daemon.(*Daemon).conditionalMountOnStart\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/daemon/daemon_unix.go:1360\ngithub.com/docker/docker/daemon.(*Daemon).containerStart\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/daemon/start.go:145\ngithub.com/docker/docker/daemon.(*Daemon).handleContainerExit.func1\n\t/builddir/build/BUILD/docker-20.10.7-3.71.amzn1/src/github.com/docker/docker/daemon/monitor.go:84\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1374"
SOLVED
It really was an issue with the default Amazon Linux AMI docker configuration, which uses the devicemapper storage driver, on the EC2 instance. A clean install of docker on Ubuntu 18.04 with the overlay2 storage driver solved the issue completely.
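A quick way to confirm which storage driver a daemon is using (docker info exposes this field):

docker info --format '{{.Driver}}'   # prints e.g. overlay2 or devicemapper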
Below is the overlay2 file system eating disk space, on Ubuntu Linux 18.04 LTS.
Total disk space of the server: 125 GB.
Filesystem  Size  Used Avail Use% Mounted on
overlay     124G  6.0G  113G   6% /var/lib/docker/overlay2/9ac0eb938cd2a50bb87e8ed13605d3f09214fdd9c8967f18dfc3f9432701fea7/merged
overlay     124G  6.0G  113G   6% /var/lib/docker/overlay2/397b099799212060ee7a4718660aa13aba8aa1fbb92f4d88d86fbad94e572847/merged
shm          64M     0   64M   0% /var/lib/docker/containers/7ffb129016d187a61a31c33f9e468b98d0ac7ab1771b87631f6caade5b84adc6/mounts/shm
overlay     124G  6.0G  113G   6% /var/lib/docker/overlay2/df7c4acee73f7aa2536d2a8929a48241bc8e92a5f7b9cb63ab70cea731b52cec/merged
Another solution, if the above doesn't work, is to set up log rotation.
nano /etc/docker/daemon.json
If the file is not found, create it:
cat > /etc/docker/daemon.json
Add the following lines to the file:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Restart the docker daemon: systemctl restart docker
Please refer to: How to set up log rotation post installation
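Note that daemon.json settings only apply to containers created after the restart. A hedged way to confirm a recreated container picked up the rotation options (the container name is illustrative):

docker inspect --format '{{.HostConfig.LogConfig}}' my-container
# output should resemble: {json-file map[max-file:3 max-size:10m]}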
In case someone else runs into this, here's what's happening:
Your container may be writing data (logs, deployables, downloads...) to its local filesystem, and overlay2 records a diff on every append/create/delete, so the container's writable layer will keep growing until it fills all available space on the host.
There are a few workarounds that won't require changing the storage driver:
first of all, make sure the data saved by the container may be discarded (you probably don't want to delete your database or anything similar)
periodically stop the container, prune the system (docker system prune) and restart the container
make sure the container doesn't write to its local filesystem, but if you can't:
replace any directories the container writes to with volumes or mounts (see the sketch below).
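For example, a minimal sketch (volume and image names are illustrative) that puts a write-heavy directory on a named volume, so the writes bypass the overlay2 diff layer:

# create a named volume and mount it over the directory the app writes to
docker volume create app-data
docker run -d --name my-app -v app-data:/var/lib/myapp my-app-image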
Follow these steps if your server is Linux Ubuntu 18.04 LTS (they should work for others too).
Docker info for overlay2:
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
If you get lines like the following when you check disk usage with du (e.g. du -h under /var/lib/docker/overlay2):
19M /var/lib/docker/overlay2/00d82017328c49c661c78ce14550c4073c50a550fe5004911bd3488b085aea76/diff
5.9M /var/lib/docker/overlay2/00e3e4fa0cbff7c242c38cfc9501ef1a523158d69b50779e08a773e7e22a01f1/diff
44M /var/lib/docker/overlay2/0e8e7e893b2c8aa17b4875d421670e058e4d97de066c970bbeab6cba566a44ba/diff
28K /var/lib/docker/overlay2/12a4c4e4877d35e9db657e4acff32e513042cb44119cca5c43fc19ad81c3915f/diff
............
............
then make the changes as follows:
First stop docker: sudo systemctl stop docker
Next, go to /etc/docker
Check for the file daemon.json; if it is not found, create it:
cat > daemon.json
and enter the following inside:
{
"storage-driver": "aufs"
}
then save and close the file (Ctrl+D ends cat's input).
Finally restart docker: sudo systemctl start docker
Check that the changes have been applied (docker info):
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 0
Dirperm1 Supported: true
Changing the storage driver can help you resolve this issue.
Please check whether your docker version supports aufs here:
Please also check your Linux distribution and which storage drivers it supports here:
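A quick local check, assuming aufs would be built into or loaded by your kernel, is to look for it in /proc/filesystems:

grep aufs /proc/filesystems   # prints "nodev aufs" if the kernel supports it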
I had a similar issue with the docker swarm.
docker system prune --volumes, restarting the server, and removing and recreating the swarm stack did not help.
In my case, I was hosting RabbitMQ, where the docker-compose config was:
services:
  rabbitmq:
    image: rabbitmq:.....
    ....
    volumes:
      - "${PWD}/queues/data:/var/lib/rabbitmq"
In such a case, every container restart or server reboot (anything that restarts the rabbitmq container) takes more and more hard drive space.
Initial value:
ls -ltrh queues/data/mnesia/ | wc -l
61
du -sch queues/data/mnesia/
7.8G queues/data/mnesia/
7.8G total
After restart:
ls -ltrh queues/data/mnesia/ | wc -l
62
du -sch queues/data/mnesia/
8.3G queues/data/mnesia/
8.3G total
My solution was to stop rabbitmq and remove the directories in queues/data/mnesia/, then restart rabbitmq.
Maybe something is wrong with my config... But if you have such an issue, it is worth checking whether your containers' volumes are leaving some trash behind.
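One hedged guess at the root cause: RabbitMQ names its mnesia directory after the node name (rabbit@<hostname>), and Docker generates a fresh random hostname every time a container is recreated, so each recreation can leave the previous directory behind. Pinning the hostname (names and tag below are illustrative) makes rabbitmq reuse one directory:

# docker run equivalent; in compose, add `hostname: rabbitmq` to the service
docker run -d --hostname rabbitmq \
  -v "${PWD}/queues/data:/var/lib/rabbitmq" rabbitmq:3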
If you are troubled by the /var/lib/docker/overlay2 directory taking too much space (use the du command to check usage), then the answer below may be suitable for you.
docker xxx prune commands will clean up unused things: all stopped containers (in /var/lib/docker/containers), the virtual filesystems of stopped containers (in /var/lib/docker/overlay2), unmounted volumes (in /var/lib/docker/volumes) and images that have no related containers (in /var/lib/docker/image). But none of this touches running containers.
Limiting the size of logs in the configuration will limit the size of /var/lib/docker/containers/*/*-json.log, but it doesn't involve the overlay2 directory.
You can find two folders called merged and diff in /var/lib/docker/overlay2/<hash>/. If these folders are big, that means there is high disk usage inside your containers themselves, not on the docker host. In this case, you have to attach a terminal to the relevant containers, find the high-usage locations inside them, and apply your own solutions.
Just like Nick M said.
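A hedged helper to map a big /var/lib/docker/overlay2/<hash> directory back to the container that owns it, using the GraphDriver paths that docker inspect reports for overlay2:

docker ps -aq \
  | xargs docker inspect --format '{{.Name}} {{.GraphDriver.Data.UpperDir}}' \
  | grep <hash>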
I have a server where I run some containers with volumes. All my volumes are in /var/lib/docker/volumes/ because docker manages them. I use docker-compose to start my containers.
Recently, I tried to stop one of my containers but it was impossible:
$ docker-compose down
[17849] INTERNAL ERROR: cannot create temporary directory!
So, I checked how the data is mounted on the server:
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 7,8G 0 7,8G 0% /dev
tmpfs 1,6G 1,9M 1,6G 1% /run
/dev/md3 20G 19G 0 100% /
tmpfs 7,9G 0 7,9G 0% /dev/shm
tmpfs 5,0M 0 5,0M 0% /run/lock
tmpfs 7,9G 0 7,9G 0% /sys/fs/cgroup
/dev/md2 487M 147M 311M 33% /boot
/dev/md4 1,8T 1,7G 1,7T 1% /home
tmpfs 1,6G 0 1,6G 0% /run/user/1000
As you can see, / is only 20 GB, so it is full and I can't stop my containers using docker-compose.
My questions are:
Is there a simple solution to increase the available space in /, using /dev/md4?
Or can I move the volumes to another place without losing data?
This part of the Docker daemon is configurable. Best practice is to change the data folder; this can be done with OS-level Linux commands like a symlink, but I would say it's better to actually configure the Docker daemon to store its data elsewhere!
You can do that by editing the Docker command line (e.g. the systemd unit that starts the Docker daemon), or by changing /etc/docker/daemon.json.
The file should have this content:
{
"data-root": "/path/to/your/docker"
}
If you add a new hard drive, partition, or mount point you can add it here and docker will store its data there.
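A hedged sketch of the migration itself, reusing the placeholder path from above; copying before switching preserves existing images and volumes:

sudo systemctl stop docker
sudo rsync -aP /var/lib/docker/ /path/to/your/docker/   # copy existing data first
# now set "data-root" in /etc/docker/daemon.json as shown above, then:
sudo systemctl start docker
docker info --format '{{.DockerRootDir}}'               # should print the new path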
I landed here as I had the very same issue. Even though some sources suggest you could do it with a symbolic link, this will cause all kinds of issues.
Depending on the OS and Docker version I had malformed images, weird errors, or the docker daemon refusing to start.
Here is a solution, but it seems it varies a little from version to version. For me the solution was:
Open
/lib/systemd/system/docker.service
And change this line
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
to:
ExecStart=/usr/bin/dockerd -g /mnt/WHATEVERYOUR/PARTITIONIS/docker --containerd=/run/containerd/containerd.sock
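After editing the unit file, systemd has to be reloaded before the change takes effect (note that -g is the legacy spelling of the --data-root option):

sudo systemctl daemon-reload
sudo systemctl restart docker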
I solved it by creating a symbolic link to a partition with a bigger size:
ln -s /scratch/docker_meta /var/lib/docker
/scratch/docker_meta is a folder on a bigger partition.
Do a bind mount.
For example, moving /docker/volumes to /mnt/large.
Append this line to /etc/fstab:
/mnt/large /docker/volumes none bind 0 0
And then:
mv /docker/volumes/* /mnt/large/
mount /docker/volumes
Do not forget to chown and chmod /mnt/large first if you are using non-root docker.
I have been using the VSCode Remote Container Plugin for some time without issue. But today when I tried to open my project the remote container failed to open with the following error:
Command failed: docker exec -w /home/vscode/.vscode-server/bin/9833dd88 24d0faab /bin/sh -c echo 34503 >.devport
rejected promise not handled within 1 second: Error: ENOSPC: no space left on device, mkdir '/home/vscode/.vscode-server/data/logs/20191209T160810
It looks like the container is out of disk space but I'm not sure how to add more.
Upon further inspection I am a bit confused. When I run df from inside the container, it shows that I have used 60G of disk space, but the size of my root directory is only ~9G.
$ df
Filesystem Size Used Avail Use% Mounted on
overlay 63G 61G 0 100% /
tmpfs 64M 0 64M 0% /dev
tmpfs 7.4G 0 7.4G 0% /sys/fs/cgroup
shm 64M 0 64M 0% /dev/shm
/dev/sda1 63G 61G 0 100% /etc/hosts
tmpfs 7.4G 0 7.4G 0% /proc/acpi
tmpfs 7.4G 0 7.4G 0% /sys/firmware
$ du -h --max-depth=1 /
9.2G /
What is the best way to resolve this issue?
Try docker system prune --all if you don't see any containers or images with docker ps and docker images; but be careful, it removes all build cache and unused containers, images and networks. docker ps -a and docker images -a show you all containers and images, including ones that are not currently running or in use.
Check the docs if the problem persists: Clean unused docker resources
It looks like all docker containers on your system share the same disk space. I found two solutions:
Go into Docker Desktop's settings and increase the amount of disk space available.
Run docker container prune to free disk space being used by stopped containers.
In my case I had a bunch of stopped docker containers from months back taking up all the disk space allocated to Docker.
With some sites, headless Chromium fails when running inside a Docker container:
[0520/093103.024239:ERROR:platform_shared_memory_region_posix.cc(268)] Failed to reserve 16728064 bytes for shared memory.: No space left on device (28)
[0520/093103.024591:ERROR:validation_errors.cc(76)] Invalid message: VALIDATION_ERROR_UNEXPECTED_NULL_POINTER (null field 1)
[0520/093103.024946:FATAL:memory.cc(22)] Out of memory. size=16723968
How should I tune Docker to fix this?
You're running out of shared memory, as described in the first line:
[0520/093103.024239:ERROR:platform_shared_memory_region_posix.cc(268)] Failed to reserve 16728064 bytes for shared memory.: No space left on device (28)
This is handled by /dev/shm, which is set to a default of 64 MB in Docker; that isn't much for modern web applications.
For context on /dev/shm see here https://superuser.com/questions/45342/when-should-i-use-dev-shm-and-when-should-i-use-tmp
Option 1:
Run chrome with --disable-dev-shm-usage
Option 2:
Set the /dev/shm size to a reasonable amount: docker run -it --shm-size=1g, replacing 1g with whatever amount you want.
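For example, a complete invocation (the image name is a placeholder):

# give the container a 1 GB /dev/shm instead of the 64 MB default
docker run -it --shm-size=1g your-image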