Why is my cgroup write throughput not limited?

I am trying to set an upper write throughput limit per cgroup via the blkio cgroup controller.
I have tried it like this:
echo "major:minor 10485760" > /sys/fs/cgroup/blkio/docker/XXXXX/blkio.throttle.write_bps_device
This should limit throughput to 10 MBps. However, the tool that monitors the server's disk reports this behaviour.
I expected the line to hold at around 10M. Can somebody explain this behaviour to me and maybe propose a better way to limit throughput?

Are you sure that the major/minor numbers you specified on the command line are correct? Moreover, since you are running in Docker, the limit applies to the processes running in the Docker container, not to processes running outside it. So you need to check where the monitoring tool takes its information from (does it count all processes inside and outside the container, or only the processes inside the container?).
To check the setting, the Linux documentation provides an example that uses the dd command on a device limited to 1 MB/s for reads. You can try the same with a write limit to see whether the monitoring tool is consistent with the output of dd. Run dd inside the container.
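One way to double-check the major:minor pair is to read it from sysfs instead of from ls -l. Since, as this answer notes, the limit has to target the whole disk rather than a partition, the sketch below resolves a partition such as sdb2 to its parent disk. It assumes the usual sysfs layout, where /sys/class/block/<name> is a symlink into the device tree and a partition's directory sits inside its disk's directory; the function name disk_majmin is hypothetical.

```shell
# disk_majmin NAME [CLASS_DIR]   (hypothetical helper)
# Print the major:minor of the whole disk behind block device NAME (e.g. sdb2).
# CLASS_DIR defaults to /sys/class/block; it is a parameter only so it can be tested.
disk_majmin() {
  local name=$1 class=${2:-/sys/class/block} dir
  dir=$(readlink -f "$class/$name") || return 1
  # Partitions carry a "partition" attribute; their parent directory is the disk.
  if [ -e "$dir/partition" ]; then
    dir=$(dirname "$dir")
  fi
  cat "$dir/dev"
}

# Usage: disk_majmin sdb2
# On the machine shown in this answer, that would print the disk's 8:16, not 8:18.
```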
For example, my home directory is located on /dev/sdb2:
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
[...]
/dev/sdb2 2760183720 494494352 2125409664 19% /home
[...]
$ ls -l /dev/sdb*
brw-rw---- 1 root disk 8, 16 mars 14 08:14 /dev/sdb
brw-rw---- 1 root disk 8, 17 mars 14 08:14 /dev/sdb1
brw-rw---- 1 root disk 8, 18 mars 14 08:14 /dev/sdb2
I check the write speed to a file:
$ dd oflag=direct if=/dev/zero of=$HOME/file bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4,2 MB, 4,0 MiB) copied, 0,131559 s, 31,9 MB/s
I set the 1 MB/s write limit on the whole disk (8:16), because it does not work on the individual partition (8:18) where my home directory resides:
# echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
# cat /sys/fs/cgroup/blkio/blkio.throttle.write_bps_device
8:16 1048576
dd's output confirms that the I/O throughput is limited to 1 MB/s:
$ dd oflag=direct if=/dev/zero of=$HOME/file bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4,2 MB, 4,0 MiB) copied, 4,10811 s, 1,0 MB/s
So you can do the same in a container.
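The same throttle can be applied from the host to a running container's own cgroup. The sketch below assumes cgroup v1 with Docker's cgroupfs driver, i.e. the /sys/fs/cgroup/blkio/docker/<id>/ layout used in the question (the systemd cgroup driver puts containers elsewhere); the function name set_write_limit and the container name mycontainer are hypothetical. Docker can also write this file for you at start-up via docker run --device-write-bps (e.g. --device-write-bps /dev/sdb:10mb). One caveat worth checking against the monitoring graph: the dd runs above use oflag=direct, and with cgroup v1, buffered writes are usually flushed later by kernel writeback threads that are not in the container's cgroup, so they may escape the throttle.

```shell
# set_write_limit ID MAJ:MIN BYTES_PER_SEC [BLKIO_ROOT]   (hypothetical helper)
# Write a cgroup v1 blkio write throttle for one Docker container, then print
# the file back. Assumes the cgroupfs layout <root>/docker/<full container id>/.
set_write_limit() {
  local id=$1 dev=$2 bps=$3 root=${4:-/sys/fs/cgroup/blkio}
  local file="$root/docker/$id/blkio.throttle.write_bps_device"
  # Accept only "digits:digits", e.g. 8:16 (the whole disk, not a partition).
  case $dev in
    *[!0-9:]* | *:*:* | :* | *:) echo "bad device number: $dev" >&2; return 1 ;;
    *:*) ;;
    *) echo "bad device number: $dev" >&2; return 1 ;;
  esac
  echo "$dev $bps" > "$file" || return 1
  cat "$file"
}

# Usage (as root on the host); 10 MiB/s = 10 * 1024 * 1024 = 10485760 bytes/s:
#   set_write_limit "$(docker inspect --format '{{.Id}}' mycontainer)" 8:16 10485760
```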

Related

Docker: Monitor disk writes to the container, i.e. by the overlay storage driver

I would like to monitor data written "inside" a Docker container, meaning data written to the backing filesystem by the overlay storage driver. Not data written to volumes, tmpfs or bind mounts. Typical monitoring tools, such as docker stats, seem to report the total amount of data written.
BLOCK I/O The amount of data the container has read to and written from [sic] block devices on the host
Source: docker stats
The idea is to keep containers as read-only as possible, by finding "write-heavy" files / folders and moving them to volumes or bind mounts. So an ideal solution would not (only) show the data currently written, but the total amount of data written since the container was started, ideally breaking it down to single files.
At the moment I'm simply using find -type f -mtime x from the container shell, where x is smaller than the image age, but there must be a better solution.
I'm using: Server Version: 18.06.1-ce, Storage Driver: overlay2, Backing Filesystem: extfs

Actually, the Docker storage driver itself already provides the answer.
Taking the overlay2 storage driver, which is the default driver on most distributions, as an example, we see that the container layer, where all data written to the container is stored, is kept in a separate folder:
Source: How the overlay driver works
Total amount of data written to the container layer
For a complete overview of what has been written to the container, we only have to take a look at the upperdir, which is called diff on the backing (host) file system.
The path of the diff folder can be found with
docker container inspect <container_name> --format='{{.GraphDriver.Data.UpperDir}}' # or
docker container inspect <container_name> | grep UpperDir
With default settings, this path points to a directory under /var/lib/docker/overlay2/. Note that access to the "inner workings" of Docker requires root access on the host, and it's a good idea not to write to these folders.
Now that we have the folder on the backing file system, we can simply run du in as much detail as we want. As a test example, I've used an Alpine image that runs a script which writes a 10 MB dummy file every 10 seconds.
root#testbox:/var/lib/docker/overlay2/83a825d...# du -h -d 1
8.0K ./work
216M ./diff
216M .
root#testbox:/var/lib/docker/overlay2/83a825d...# ll diff/tmp
total 220164
drwxrwxrwt 2 root root 4096 Okt 21 22:57 ./
drwxr-xr-x 3 root root 4096 Okt 21 22:53 ../
-rw-r--r-- 1 root root 9266613 Okt 21 22:53 dummy0.tar.gz
-rw-r--r-- 1 root root 9266613 Okt 21 22:55 dummy10.tar.gz
-rw-r--r-- 1 root root 9266613 Okt 21 22:55 dummy11.tar.gz
[...]
Hence, seeing all the files and folders written to the container is as easy as with any other directory.
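For the per-file breakdown the question asks for, the same UpperDir can be ranked by file size. This is a minimal sketch assuming GNU find (for -printf); the function name largest_files and the container name mycontainer are hypothetical. Note that, like du, it shows what currently sits in the container layer, not every byte ever written: a file overwritten ten times is counted once.

```shell
# largest_files DIR [N]   (hypothetical helper)
# List the N (default 10) largest regular files under DIR, biggest first,
# as "<size in bytes><TAB><path>". DIR would typically be the container's UpperDir.
largest_files() {
  local dir=$1 n=${2:-10}
  # GNU find: %s = size in bytes, %p = path
  find "$dir" -type f -printf '%s\t%p\n' | sort -rn | head -n "$n"
}

# Usage (as root on the host):
#   largest_files "$(docker container inspect mycontainer --format '{{.GraphDriver.Data.UpperDir}}')" 5
```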

pvcreate not able to initialize physical volume

I have an application that calls pvcreate every time.
I can see the volumes in my VM as follows:
$ pvscan
PV /dev/vda5 VG ubuntu-vg lvm2 [99.52 GiB / 0 free]
Total: 1 [99.52 GiB] / in use: 1 [99.52 GiB] / in no VG: 0 [0 ]
$ pvcreate --metadatasize=128M --dataalignment=256K '/dev/vda5'
Can't initialize physical volume "/dev/vda5" of volume group "ubuntu-vg" without -ff
$ pvcreate --metadatasize=128M --dataalignment=256K '/dev/vda5' -ff
Really INITIALIZE physical volume "/dev/vda5" of volume group "ubuntu-vg" [y/n]? y
Can't open /dev/vda5 exclusively. Mounted filesystem?
I have also tried wipefs and observed the same results for the commands above:
$ wipefs -af /dev/vda5
/dev/vda5: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
How can I execute pvcreate?
Is there anything I need to add for my VM?

It seems your partition (/dev/vda5) is already being used by your ubuntu-vg volume group. I think you cannot use the same partition in two different PVs, nor can you add it again.
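Since the application calls pvcreate on every run, one option is to have it skip devices that already belong to a volume group. The "Can't open /dev/vda5 exclusively" error is consistent with the partition being held by the active ubuntu-vg, which -ff does not change. The sketch below is a hypothetical wrapper (ensure_pv and vg_of are made-up names); it relies only on the standard pvs fields pv_name and vg_name. A device that is a PV but not in any VG yields an empty name here, so pvcreate would still be attempted for it.

```shell
# vg_of DEVICE   (hypothetical helper)
# Reads `pvs --noheadings -o pv_name,vg_name` output on stdin and prints the
# volume group DEVICE belongs to; prints nothing if DEVICE is not listed.
vg_of() {
  awk -v dev="$1" '$1 == dev { print $2 }'
}

# ensure_pv DEVICE   (hypothetical wrapper for an app that runs pvcreate each time)
ensure_pv() {
  local dev=$1 vg
  vg=$(pvs --noheadings -o pv_name,vg_name 2>/dev/null | vg_of "$dev")
  if [ -n "$vg" ]; then
    echo "$dev is already a PV in volume group $vg; not running pvcreate" >&2
    return 0
  fi
  pvcreate --metadatasize=128M --dataalignment=256K "$dev"
}
```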

Able to malloc more than docker-compose mem_limit

I'm trying to limit my container so that it doesn't take up all the RAM on the host. From the Docker docs I understand that --memory limits the RAM and --memory-swap limits (RAM+swap). From the docker-compose docs it looks like the terms for those are mem_limit and memswap_limit, so I've constructed the following docker-compose file:
> cat docker-compose.yml
version: "2"
services:
  stress:
    image: progrium/stress
    command: '-m 1 --vm-bytes 15G --vm-hang 0 --timeout 10s'
    mem_limit: 1g
    memswap_limit: 2g
The progrium/stress image just runs stress, which in this case spawns a single thread which requests 15GB RAM and holds on to it for 10 seconds.
I'd expect this to crash, since 15>2. (It does crash if I ask for more RAM than the host has.)
The kernel has cgroups enabled, and docker stats shows that the limit is being recognised:
> docker stats
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
7624a9605c70 0.00% 1024MiB / 1GiB 99.99% 396B / 0B 172kB / 0B 2
So what's going on? How do I actually limit the container?
Update:
Watching free, it looks like the RAM usage is effectively limited (only 1GB of RAM is used) but swap is not: the container gradually increases its swap usage until it has eaten through all of the swap and stress crashes (it takes about 20 secs to get through 5GB of swap on my machine).
Update 2:
Setting mem_swappiness: 0 causes an immediate crash when requesting more memory than mem_limit, regardless of memswap_limit.

Running docker info shows "WARNING: No swap limit support".
According to https://docs.docker.com/engine/installation/linux/linux-postinstall/#your-kernel-does-not-support-cgroup-swap-limit-capabilities, this is disabled by default ("Memory and swap accounting incur an overhead of about 1% of the total available memory and a 10% overall performance degradation."). You can enable it by editing the /etc/default/grub file:
Add or edit the GRUB_CMDLINE_LINUX line to add the following two key-value pairs:
GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
Then update GRUB with update-grub and reboot.
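To check whether swap accounting is active before and after the reboot, you can look for the memory.memsw.* files. On cgroup v1 the memory controller only creates them when swap accounting is enabled, and as far as I know their absence is what Docker's "No swap limit support" warning reflects. The helper name swap_limit_supported is hypothetical.

```shell
# swap_limit_supported [MEMCG_DIR]   (hypothetical helper)
# Succeeds if the cgroup v1 memory controller exposes memory.memsw.limit_in_bytes,
# i.e. if swap accounting (and therefore memswap_limit) can be enforced.
swap_limit_supported() {
  [ -e "${1:-/sys/fs/cgroup/memory}/memory.memsw.limit_in_bytes" ]
}

# Usage, after rebooting with the new GRUB_CMDLINE_LINUX:
#   grep -o 'swapaccount=[01]' /proc/cmdline
#   swap_limit_supported && echo "swap accounting enabled" || echo "swap accounting disabled"
```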

How to check the number of cores used by docker container?

I have been working with Docker for a while now. I installed Docker and launched a container using
docker run -it --cpuset-cpus=0 ubuntu
When I log into the docker console and run
grep processor /proc/cpuinfo | wc -l
It shows 3, which is the number of cores on my host machine.
Any idea how to restrict the container's resources and how to verify the restrictions?

This issue has already been raised in #20770. The file /sys/fs/cgroup/cpuset/cpuset.cpus reflects the correct output.
--cpuset-cpus is taking effect; however, it is not reflected in /proc/cpuinfo.
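To turn the contents of cpuset.cpus into a core count, here is a small parser for the kernel's list format (comma-separated CPUs and ranges, such as 0-3,6). The function name count_cpus and the container name mycontainer are hypothetical. As an aside, GNU nproc reports the CPUs in the calling process's affinity mask, so inside the container it should also reflect --cpuset-cpus, unlike /proc/cpuinfo.

```shell
# count_cpus LIST   (hypothetical helper)
# Count the CPUs in a cpuset list such as "0", "0,2" or "0-3,6".
count_cpus() {
  local total=0 entry lo hi
  # Split the comma-separated list into individual CPUs or ranges.
  for entry in $(printf '%s\n' "$1" | tr ',' ' '); do
    case $entry in
      *-*) lo=${entry%-*}; hi=${entry#*-}; total=$((total + hi - lo + 1)) ;;
      *)   total=$((total + 1)) ;;
    esac
  done
  echo "$total"
}

# Usage:
#   count_cpus "$(docker exec mycontainer cat /sys/fs/cgroup/cpuset/cpuset.cpus)"
#   docker inspect --format '{{.HostConfig.CpusetCpus}}' mycontainer
```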

docker inspect <container_name>
gives the details of the launched container. Check the "CpusetCpus" entry there to find the CPU setting.

Containers aren't complete virtual machines. Some kernel resources will still appear as they do on the host.
In this case, --cpuset-cpus=0 modifies the resources the container's cgroup has access to, which is visible in /sys/fs/cgroup/cpuset/cpuset.cpus, not what the VM and container see in /proc/cpuinfo.
One way to verify is to run the stress-ng tool in a container:
With one CPU, the container is pinned to 1 core (1 / 3 cores in use, shown as 100% or 33% depending on which tool you use):
docker run --cpuset-cpus=0 deployable/stress -c 3
This will use 2 cores (2 / 3 cores, 200%/66%):
docker run --cpuset-cpus=0,2 deployable/stress -c 3
This will use 3 cores (3 / 3 cores, 300%/100%):
docker run deployable/stress -c 3
Memory limits are another area that doesn't appear in kernel stats:
$ docker run -m 64M busybox free -m
total used free shared buffers cached
Mem: 3443 2500 943 173 261 1858
-/+ buffers/cache: 379 3063
Swap: 1023 0 1023
yamanek's answer includes the GitHub issue.

It should be in double quotes: --cpuset-cpus="". For example, --cpuset-cpus="0" means it makes use of CPU 0.

Rethinkdb container: rethinkdb process takes less RAM than the whole container

I'm running my rethinkdb container in Kubernetes cluster. Below is what I notice:
Running top on the host, which is CoreOS, shows that the rethinkdb process takes about 3 GB:
$ top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
981 root 20 0 53.9m 34.5m 20.9m S 15.6 0.4 1153:34 hyperkube
51139 root 20 0 4109.3m 3.179g 22.5m S 15.0 41.8 217:43.56 rethinkdb
579 root 20 0 707.5m 76.1m 19.3m S 2.3 1.0 268:33.55 kubelet
But running docker stats to check the rethinkdb container shows it taking about 7 GB!
$ docker ps | grep rethinkdb
eb9e6b83d6b8 rethinkdb:2.1.5 "rethinkdb --bind al 3 days ago Up 3 days k8s_rethinkdb-3.746aa_rethinkdb-rc-3-eiyt7_default_560121bb-82af-11e5-9c05-00155d070266_661dfae4
$ docker stats eb9e6b83d6b8
CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O
eb9e6b83d6b8 4.96% 6.992 GB/8.169 GB 85.59% 0 B/0 B
$ free -m
total used free shared buffers cached
Mem: 7790 7709 81 0 71 3505
-/+ buffers/cache: 4132 3657
Swap: 0 0 0
Can someone explain why the container is taking a lot more memory than the rethinkdb process itself?
I'm running docker v1.7.1, CoreOS v773.1.0, kernel 4.1.5

In top, you are looking at the amount of physical memory used by the process. The docker stats figure also includes RAM used as disk cache, so it is bigger than the process's physical memory. When an application really needs more RAM, the disk cache will be released for it to use.
Indeed, the memory usage is read from the cgroup's memory.usage_in_bytes, which you can access at /sys/fs/cgroup/memory/docker/long_container_id/memory.usage_in_bytes. And according to the Linux documentation at https://www.kernel.org/doc/Documentation/cgroups/memory.txt, section 5.5:
5.5 usage_in_bytes
For efficiency, as other kernel components, memory cgroup uses some
optimization to avoid unnecessary cacheline false sharing.
usage_in_bytes is affected by the method and doesn't show 'exact'
value of memory (and swap) usage, it's a fuzz value for efficient
access. (Of course, when necessary, it's synchronized.) If you want to
know more exact memory usage, you should use RSS+CACHE(+SWAP) value in
memory.stat(see 5.2).
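Following the documentation's advice to use RSS+CACHE(+SWAP) from memory.stat, this sketch splits a cgroup v1 memory.stat file into those parts, so you can see how much of the docker stats figure is page cache. It reads the cgroup's own rss, cache and swap lines (swap only appears when swap accounting is enabled); the function name mem_breakdown is hypothetical. If the explanation above holds, rss should be close to top's RES column, and cache should account for most of the difference.

```shell
# mem_breakdown STATFILE   (hypothetical helper)
# Print rss, cache, swap and their sum (in bytes) from a cgroup v1 memory.stat file.
mem_breakdown() {
  awk '$1 == "rss"   { rss = $2 }
       $1 == "cache" { cache = $2 }
       $1 == "swap"  { swap = $2 }
       END { printf "rss=%.0f cache=%.0f swap=%.0f total=%.0f\n",
                    rss, cache, swap, rss + cache + swap }' "$1"
}

# Usage (on the host):
#   mem_breakdown /sys/fs/cgroup/memory/docker/long_container_id/memory.stat
```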

Develop Reference