Docker error at higher core counts on a multi-core machine

I am running a CentOS container using Docker on a RHEL 6.5 machine. I am trying to run an MPI application (MILC) on 16 cores.
My server has 20 cores and 128 GB of memory.
My application runs fine on up to 15 cores but fails with APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7) when using 16 cores or more. At 16 cores and up, these are the messages I see in the logs:
Jul 16 11:29:17 localhost abrt[100668]: Can't open /proc/413/status: No such file or directory
Jul 16 11:29:17 localhost abrt[100669]: Can't open /proc/414/status: No such file or directory
Jul 16 11:29:17 localhost abrt[100670]: Can't open /proc/417/status: No such file or directory
A few details on the container:
kernel 2.6.32-431.el6.x86_64
Official centos from docker hub
Started container as:
docker run -t -i -c 20 -m 125g --name=test --net=host centos /bin/bash
I would greatly appreciate any and all feedback regarding this. Please do let me know if I can provide any further information.
Regards
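(Note: one thing that commonly causes SIGBUS for MPI jobs inside Docker, though not confirmed for this case, is the default 64 MB /dev/shm in the container; shared-memory transports can exhaust it as the rank count grows. A minimal sketch of the same run command with a larger shared-memory segment, assuming a Docker version that supports --shm-size:
docker run -t -i -c 20 -m 125g --shm-size=8g --name=test --net=host centos /bin/bash
or, sharing the host's IPC namespace and its /dev/shm instead:
docker run -t -i -c 20 -m 125g --ipc=host --name=test --net=host centos /bin/bash)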

Related

Vda devices in podman

I have a Podman image with Ubuntu 18.04, and if I run a container with the command:
podman run -it <my image> /bin/bash
and I run the command:
cat /proc/partitions
I see the following:
root@0382d078cd30:/# cat /proc/partitions
major minor  #blocks  name
  11        0    1048575 sr0
 252        0  104857600 vda
 252        1       1024 vda1
 252        2     130048 vda2
 252        3     393216 vda3
 252        4  104332271 vda4
Can anyone explain to me what these devices are? I tried to do some searching but had no luck. The reason I want to know is that I need to test some commands like parted, mkfs.ext4, tune2fs, and e2fsck on the /dev/vda1 device. However, I don't know how these virtual devices are mapped to my host system. I'm afraid that by formatting the /dev/vda1 device with the mkfs.ext4 command I might lose some data on my macOS host.
I didn't explicitly map these devices to anything on my host, so I think Podman created them by default and maps them to my host in some way.
Is it safe to run the above commands against these virtual devices, or is there a risk I break something on my host?
Thank you in advance for your help.
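(If the goal is only to exercise parted, mkfs.ext4, tune2fs, and e2fsck, one safer approach, sketched below under the assumption of a privileged container with loop-device support, is to run them against a loop device backed by an ordinary file rather than against /dev/vda1:
dd if=/dev/zero of=/tmp/disk.img bs=1M count=1024   # 1 GiB backing file
losetup -fP /tmp/disk.img                           # attach it to the first free loop device
losetup -a                                          # see which /dev/loopN was assigned
mkfs.ext4 /dev/loop0                                # assuming loop0 was assigned
e2fsck -f /dev/loop0
tune2fs -l /dev/loop0
losetup -d /dev/loop0                               # detach when finished)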

No space left on device on Docker Desktop for macOS and Skaffold

I had tried following the advice here, specifically:
ran docker system prune, which freed about 6 GB
increased the disk image size in Docker Desktop preferences to 64 GB (43 GB used)
but am still seeing this when running Skaffold: exiting dev mode because first build failed: couldn't build "user/orders": docker build: Error response from daemon: Error processing tar file(exit status 1): write /tsconfig.json: no space left on device. Another run of Skaffold gave me this on another occasion:
exiting dev mode because first build failed: couldn't build "user/orders": unable to stream build output: failed to create rwlayer: mkdir /var/lib/docker/overlay2/7c6618702ad15fe0fa7d4655109aa6326fb4f954df00d2621f62d66d7b328ed9/diff: no space left on device
Also, when running docker system df, I see this:
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          10        0         28.86GB   28.86GB (100%)
Containers      0         0         0B        0B
Local Volumes   30        0         15.62GB   15.62GB (100%)
Build Cache     0         0         0B        0B
I also have about 200GB of physical hard drive space available.
I'm hoping I don't have to manually run rm * as proposed here, which was for a Linux distro.
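(For what it's worth, the docker system df output above reports both the images and the local volumes as 100% reclaimable, so a more aggressive prune may recover most of that space; a sketch, assuming none of those volumes hold data you still need:
docker system prune -a --volumes   # removes unused images, stopped containers, networks, and unused volumes
docker builder prune               # clears the build cache (already 0B in the output above)
docker system df                   # re-check usage afterwards)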
If you're running on a Mac and have 200 GB free on the host, will increasing the disk image size further help you?

GitLab (via Docker) on a QNAP NAS with ARM CPU ("exec format error")

I just bought a QNAP TS-832X NAS (Firmware: 4.3.4.0695 Build 20180830).
This machine comes with an ARM CPU (Annapurna Labs Alpine AL324 quad-core ARM Cortex-A57 CPU @ 1.70GHz).
I bought it only to install GitLab on it, but the official image doesn't seem to work.
When I try to run the image it fails.
[~] # docker run -d --name gitlab-server --hostname build1 -p 10080:10080 -p 10022:22 -p 10443:443 -v /share/GitLab/config:/etc/gitlab -v /share/GitLab/logs:/var/log/gitlab -v /share/GitLab/data:/var/opt/gitlab --restart always gitlab/gitlab-ce:latest
[~] # docker ps
CONTAINER ID   IMAGE                     COMMAND             CREATED         STATUS                        PORTS   NAMES
a176158729ad   gitlab/gitlab-ce:latest   "/assets/wrapper"   5 seconds ago   Restarting (1) 1 second ago           gitlab-server
[~] # docker logs a1
standard_init_linux.go:185: exec user process caused "exec format error"
standard_init_linux.go:185: exec user process caused "exec format error"
standard_init_linux.go:185: exec user process caused "exec format error"
standard_init_linux.go:185: exec user process caused "exec format error"
standard_init_linux.go:185: exec user process caused "exec format error"
standard_init_linux.go:185: exec user process caused "exec format error"
standard_init_linux.go:185: exec user process caused "exec format error"
After googling I figured it might be caused by the host architecture, so I tried running ulm0/gitlab, but with the same result.
I also tried other images with "ARM" in their tags like arm64v8/ubuntu. This one didn't even give any logs.
[~] # docker ps
CONTAINER ID   IMAGE                     COMMAND             CREATED         STATUS                          PORTS   NAMES
2b2b68bc912c   arm64v8/ubuntu:latest     "/bin/bash"         7 seconds ago   Restarting (0) 1 second ago             ubuntu-arm
a176158729ad   gitlab/gitlab-ce:latest   "/assets/wrapper"   2 hours ago     Restarting (1) 51 seconds ago           gitlab-server
[~] # docker logs 2b
[~] #
uname -a
Linux build1 4.2.8 #2 SMP Thu Aug 30 07:33:01 CST 2018 aarch64 GNU/Linux
docker version
Client:
Version: 17.09.1-ce
API version: 1.32
Go version: go1.8.3
Git commit: a9fd393
Built: Fri Aug 3 04:31:20 2018
OS/Arch: linux/arm64
Server:
Version: 17.09.1-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: a9fd393
Built: Fri Aug 3 04:31:20 2018
OS/Arch: linux/arm64
Experimental: false
Sorry to hear about your problem; unfortunately, I don't believe there is any official GitLab Docker image for ARM devices.
From personal experience I've found that most developers will make a Docker image for Intel devices, but not one that works on ARM devices.
This topic has been discussed on the QNAP Forums already:
My QNAP is Intel based, so I can't corroborate your results, but quoting a few sentences from a page about docker on Raspberry Pi:
"Docker-based apps you use have to be packaged specifically for ARM architecture! Docker-based apps packaged for x86/x64 will not work and will result in an error such as:
FATA[0003] Error response from daemon: Cannot start container 0f0fa3f8e510e53908e6a459e817d600b9649e621e7dede974d6a65761ad39e5: exec format error
Keep this in mind when searching for apps on the Docker Hub - the source for Docker apps/images. If you see the keyword RPI or ARM in the heading or description, this app can usually be used for the Raspberry Pi."
The TS-831X has an "AnnapurnaLabs, an Amazon company Alpine AL-314 Quad-core 1.7 GHz Cortex-A15 processor" CPU, which is an ARM architecture much like the Raspberry Pi's.
So, I suspect you may be limited in which Docker images you have access to, and unless an official/canonical maintainer of an app also makes an ARM build, you may be stuck with either rolling your own or trusting a third-party hobbyist to do so.
I hate to say this, but I'd say you should have picked up an Intel-based one instead.
I have a QNAP TS-251+ (Intel-based) with 8 GB RAM and 2x8TB in a RAID configuration, and this works perfectly for my GitLab instance, in addition to running Plex and serving as a web server.
I would also suggest, when you do finally get it up and running, mapping the volumes to directories that are easy to access so you can make configuration changes easily.
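(As a quick way to confirm the architecture mismatch before pulling or running anything, here is a sketch using standard Docker CLI commands; the exact fields may vary slightly by Docker version:
uname -m                                                              # aarch64 on this NAS
docker image inspect --format '{{.Os}}/{{.Architecture}}' gitlab/gitlab-ce:latest
# the official image reports linux/amd64, which this arm64 host cannot execute)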

How to check the number of cores used by docker container?

I have been working with Docker for a while now. I installed Docker and launched a container using:
docker run -it --cpuset-cpus=0 ubuntu
When I log into the container's console and run:
grep processor /proc/cpuinfo | wc -l
It shows 3, which is the number of cores I have on my host machine.
Any idea how to restrict the resources available to the container and how to verify the restriction?
The issue has already been raised in #20770. The file /sys/fs/cgroup/cpuset/cpuset.cpus reflects the correct output.
The --cpuset-cpus setting is taking effect; however, it is not reflected in /proc/cpuinfo.
docker inspect <container_name>
will give the details of the launched container. You have to check for "CpusetCpus" in there, and then you will find the details.
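(For example, a sketch that pulls out just that field; the Go-template format is standard docker inspect syntax:
docker inspect --format '{{.HostConfig.CpusetCpus}}' <container_name>
# prints "0" for a container started with --cpuset-cpus=0)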
Containers aren't complete virtual machines. Some kernel resources will still appear as they do on the host.
In this case, --cpuset-cpus=0 modifies the resources the container's cgroup has access to, which is visible in /sys/fs/cgroup/cpuset/cpuset.cpus, not what the VM or container reports in /proc/cpuinfo.
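(To see the difference in one go, a sketch assuming cgroup v1, which matches the /sys/fs/cgroup paths discussed above:
docker run --rm --cpuset-cpus=0 ubuntu cat /sys/fs/cgroup/cpuset/cpuset.cpus   # prints 0
docker run --rm --cpuset-cpus=0 ubuntu grep -c processor /proc/cpuinfo          # still prints the host's core count)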
One way to verify is to run the stress-ng tool in a container:
Using 1 CPU, the load will be pinned at 1 core (1 / 3 cores in use, 100% or 33% depending on which tool you use):
docker run --cpuset-cpus=0 deployable/stress -c 3
This will use 2 cores (2 / 3 cores, 200%/66%):
docker run --cpuset-cpus=0,2 deployable/stress -c 3
This will use 3 ( 3 / 3 cores, 300%/100%):
docker run deployable/stress -c 3
Memory limits are another area that doesn't appear in kernel stats:
$ docker run -m 64M busybox free -m
             total       used       free     shared    buffers     cached
Mem:          3443       2500        943        173        261       1858
-/+ buffers/cache:         379       3063
Swap:         1023          0       1023
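(Similarly, a sketch assuming cgroup v1: the enforced limit is visible in the container's memory cgroup even though free shows host-wide numbers:
docker run --rm -m 64M busybox cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# prints 67108864 (64 MiB), while free -m above keeps reporting the host's memory)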
yamaneks' answer includes the GitHub issue.
The value should be in double quotes, e.g. --cpuset-cpus=""; --cpuset-cpus="0" means the container makes use of cpu0.

Pagecache and dirty pages in paused container

I have a Java application running in an Ubuntu 14.04 container. The application relies on the OS page cache to speed up reads and writes. The container is issued a pause command, which according to the Docker documentation triggers the cgroup freezer: https://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt.
What happens to the dirty pages and page cache of the paused container? Are they flushed to disk? Or is the whole notion of a container-scoped page cache wrong, and dirty pages for all containers are managed at the Docker host level?
docker host free -m:
user@0000 ~ # free -m
             total       used       free     shared    buffers     cached
Mem:         48295      47026       1269          0         22      45010
-/+ buffers/cache:        1993      46302
Swap:        24559         12      24547
container (docker exec f1b free -m):
user@0000 ~ # docker exec f1b free -m
             total       used       free     shared    buffers
Mem:         48295      47035       1259          0         22
-/+ buffers:             47013       1282
Swap:        24559         12      24547
Once a container is paused, I cannot check memory as seen by the container.
FATA[0000] Error response from daemon: Container f1 is paused, unpause the container before exec
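(One workaround, sketched under the assumption of cgroup v1 with the default cgroupfs driver, where container cgroups live under /sys/fs/cgroup/memory/docker/<id>: the freezer does not block reading the cgroup files from the host, so the cache and dirty-page counters stay visible without docker exec:
CID=$(docker inspect --format '{{.Id}}' f1b)                      # full container ID
grep -E 'total_cache|total_dirty|total_writeback' /sys/fs/cgroup/memory/docker/$CID/memory.stat)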
