I am trying to use real-time scheduling in a docker container running on Ubuntu 18.04.
I have already installed a realtime kernel following the method given here. I have selected kernel version 5.2.9 and its associated rt patch.
The output of uname -a confirms that the real-time kernel is indeed installed and running:
Linux myLaptop 5.2.9-rt3 #1 SMP PREEMPT RT ...
To run my container I issue the following command:
docker run --cpu-rt-runtime=95000 \
--ulimit rtprio=99 \
--ulimit memlock=102400 \
--cap-add=sys_nice \
--privileged \
-it \
myimage:latest
However, the output I got is:
docker: Error response from daemon: Your kernel does not support cgroup cpu real-time runtime.
I have seen that this can be linked to the missing CONFIG_RT_GROUP_SCHED option, as detailed in the issue here. Indeed, if I run the script provided on this page to check the kernel's compatibility with Docker, I get:
- CONFIG_RT_GROUP_SCHED: missing
This seems to confirm that Docker relies on this option for real-time scheduling, but that the option is not enabled in my kernel, even though the kernel is patched to be real-time.
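For reference, the option can also be checked directly against the kernel configuration (assuming the config is shipped under /boot, as on most distributions):

grep CONFIG_RT_GROUP_SCHED /boot/config-$(uname -r)
# or, if the running kernel exposes its configuration:
zgrep CONFIG_RT_GROUP_SCHED /proc/config.gz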
From there I tried, in vain, to find a solution. I am not well versed enough in kernel configuration to know whether I need to recompile the kernel with a specific option, and if so which one, in order to add the missing CONFIG_RT_GROUP_SCHED.
Thanks a lot in advance for recommendations and help.
When talking about real-time Linux there are different approaches, ranging from single-kernel approaches (like PREEMPT_RT) to dual-kernel approaches (such as Xenomai). You can use real-time capable Docker containers in combination with all of them (clearly the kernel of your host machine has to match) to produce real-time capable systems, but the set-ups differ. In your case you are mixing up two of these approaches: you installed PREEMPT_RT while following a guide for control groups, which are incompatible with PREEMPT_RT.
By default the Linux kernel can be compiled with different levels of preemptibility (see e.g. Reghenzani et al. - "The real-time Linux kernel: a Survey on PREEMPT_RT"):
PREEMPT_NONE has no way of forced preemption
PREEMPT_VOLUNTARY where preemption is possible in some locations in order to reduce latency
PREEMPT where preemption can occur in any part of the kernel (excluding spinlocks and other critical sections)
These can be combined with the feature of control groups (cgroups for short) by setting CONFIG_RT_GROUP_SCHED=y during kernel compilation, which reserves a certain fraction of CPU-time for processes of a certain (user-defined) group.
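To make this concrete in Docker terms, here is a rough sketch of what the control-group route from your question looks like on a kernel that does have this option (the flag names exist in current Docker releases; the numbers are just an illustrative real-time budget in microseconds per 1 s period):

# 1. Give Docker's parent cgroup a real-time budget when starting the daemon
#    (or via the equivalent cpu-rt-runtime entry in /etc/docker/daemon.json):
sudo dockerd --cpu-rt-runtime=950000
# 2. Hand part of that budget to the container:
docker run --cpu-rt-runtime=950000 --ulimit rtprio=99 --cap-add=sys_nice -it myimage:latest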
PREEMPT_RT developed from PREEMPT and is a set of patches that aims at making the kernel fully preemptible, even in critical sections (PREEMPT_RT_FULL). For this purpose e.g. spinlocks are largely replaced by mutexes.
As of 2021 it is slowly being merged into the mainline kernel and will eventually be available to the general public without the need to patch the kernel. As stated here, PREEMPT_RT currently cannot be compiled with the CONFIG_RT_GROUP_SCHED option and therefore cannot be used together with control groups (see here for a comparison). From what I have read this is due to high latency spikes, something I have already observed with control groups by means of cyclictest.
This means you can compile your kernel (see the Ubuntu manual for details) either:
Without PREEMPT_RT but with CONFIG_RT_GROUP_SCHED (see this post for details), and then follow the Docker guide on real-time with control groups as well as my post here. In my experience, though, this suffers from quite high latency spikes, which is not desirable for a real-time system, where the worst-case latency matters much more than the average latency.
With PREEMPT_RT without CONFIG_RT_GROUP_SCHED (which can also be installed from a Debian package such as this one). In this case it is sufficient to run the Docker container with the options --privileged --net=host, or the Docker Compose equivalents privileged: true and network_mode: host. Then any process inside the container can set real-time priorities rtprio (e.g. by calling ::pthread_setschedparam from inside the code or by using chrt from the command line, as sketched below).
In case you are not using root as the user inside the Docker container, you will furthermore have to make sure that your user belongs to a group with real-time privileges on your host computer (check with $ ulimit -r). This can be done by configuring the PAM limits (the /etc/security/limits.conf file) accordingly (as described here), either by copying the section of the @realtime user group and creating a new group (e.g. @some_group), or by adding the user (e.g. some_user) directly:
@some_group soft rtprio 99
@some_group soft priority 99
@some_group hard rtprio 99
@some_group hard priority 99
In this context rtprio is the maximum real-time priority allowed for non-privileged processes. The hard limit is the actual limit to which the soft limit can be raised; hard limits are set by the super-user and enforced by the kernel, and a user cannot run code with a higher priority than the hard limit allows. The soft limit, on the other hand, is the default value, bounded by the hard limit. For more information see e.g. here.
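A minimal command-line sketch of this second route (using the image name from your question and a placeholder program name):

docker run --privileged --net=host -it myimage:latest
# ...and inside the container, give your process a real-time priority:
chrt -f 80 ./my_realtime_app              # start it under SCHED_FIFO, priority 80
chrt -f -p 80 $(pidof my_realtime_app)    # or change an already running process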
I use the latter option for real-time capable robotic applications and could not observe any difference in latency between running with and without Docker. You can find a guide on how to set up PREEMPT_RT, as well as automated scripts for building it, on my GitHub.
Related
Let's say that I make an image for an OS that uses a kernel of version 10. What behavior does Docker exhibit if I run a container for that image on a host OS running a kernel of version 9? What about version 11?
Does the backward compatibility of the versions matter? I'm asking out of curiosity because the documentation only talks about "minimum Linux kernel version", etc. This sounds like it doesn't matter what kernel version the host is running beyond that minimum. Is this true? Are there caveats?
Let's say that I make an image for an OS that uses a kernel of version 10.
I think this is a bit of a misconception, unless you are talking about specific software that relies on newer kernel features inside your Docker image, which should be pretty rare. Generally speaking a Docker image is just a custom file/directory structure, assembled in layers via FROM and RUN instructions in one or more Dockerfiles, with a bit of metadata like which ports to open or which file to execute on container start. That's really all there is to it. The basic principle of Docker is very much like a classic chroot jail, only a bit more modern and with some candy on top.
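You can see both parts of that for yourself (the image name here is just an example):

docker image history ubuntu:20.04   # the stacked filesystem layers
docker image inspect ubuntu:20.04   # the metadata: exposed ports, default command, and so on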
What behavior does Docker exhibit if I run a container for that image on a host OS running a kernel of version 9? What about version 11?
If the kernel can run the Docker daemon it should be able to run any image.
Are there caveats?
As noted above, Docker images that include software which relies on bleeding-edge kernel features will not work on kernels that do not have those features, which should be no surprise. Docker will not stop you from running such an image on an older kernel, as it simply does not care what's inside an image, nor does it know which kernel was used to create the image.
The only other thing I can think of is software compiled manually with aggressive optimizations for a specific CPU, like Intel or AMD. Such images can fail (e.g. with an illegal-instruction error) on hosts with a different CPU.
Docker's behaviour is no different: it doesn't concern itself (directly) with the behaviour of the containerized process. What Docker does do is set up various parameters (root filesystem, other mounts, network interfaces and configuration, separate namespaces or restrictions on what PIDs can be seen, etc.) for the process that let you consider it a "container," and then it just runs the initial process in that environment.
The specific software inside the container may or may not work with your host operating system's kernel. Using a kernel older than the software was built for is not infrequently problematic; more often it's safe to run older software on a newer kernel.
More often, but not always. On a host with kernel 4.19 (e.g. Ubuntu 18.04) try docker run centos:6 bash. You'll find it segfaults (exit code 139) because that old build of bash does something that greatly displeases the newer kernel. (On a 4.9 or lower kernel, docker run centos:6 bash will work fine.) However, docker run centos:6 ls will not die in the same way because that program is not dependent on particular kernel facilities that have changed (at least, not when run with no arguments).
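If you want to reproduce this on such a host, something like the following shows it (exit status 139 corresponds to a SIGSEGV):

docker run --rm centos:6 bash   # dies immediately on a 4.19 kernel
echo $?                         # prints 139
docker run --rm centos:6 ls     # still works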
This sounds like it doesn't matter what kernel version the host is running beyond that minimum. Is this true?
As long as your kernel meets Docker's minimum requirements (which mostly involve having the necessary APIs to support the isolated execution environment that Docker sets up for each container), Docker doesn't really care what kernel you're running.
In many ways, this isn't entirely a Docker question: for the most part, user-space tools aren't tied particularly tightly to specific kernel versions. This isn't universally true; there are some tools that by design interact with a very specific kernel version, or that can take advantage of APIs in recent kernel versions for improved performance, but for the most part your web server or database just doesn't care.
Are there caveats?
The kernel version you're running may dictate things like which storage drivers are available to Docker, but this doesn't really have any impact on your containers.
Older kernel versions may have security vulnerabilities that are fixed in more recent versions, and newer versions may have fixes that offer improved performance.
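As a concrete illustration of the storage-driver point, you can check which driver your daemon ended up with (a standard docker info field):

docker info --format '{{.Driver}}'   # e.g. overlay2 on recent kernels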
I have a server (Ubuntu 16.04) with 4 GPUs. My team shares this, and our current approach is to containerize all of our work with Docker and to restrict containers to GPUs using something like $ NV_GPU=0 nvidia-docker run -ti nvidia/cuda nvidia-smi. This works well when we're all very clear about who's using which GPU, but our team has grown and I'd like a more robust way of monitoring GPU use and of prohibiting access to GPUs while they're in use. nvidia-smi provides one channel of information with its "GPU-Util" field, but a GPU may show 0% GPU-Util at a given moment while it is still reserved by someone working in a container.
Do you have any recommendations for:
Tracking when a user runs $ NV_GPU='gpu_id' nvidia-docker run
Kicking an error when another user runs $ NV_GPU='same_gpu_id' nvidia-docker run
Keeping an updated log along the lines of {'gpu0': 'user_name or free', ..., 'gpu3': 'user_name or free'}, where for every GPU it identifies the user who ran the active Docker container utilizing that GPU, or states that it is 'free'. Ideally, it would state both the user and the container linked to the GPU.
Updating the log when the user closes the container that is utilizing the GPU
I may be thinking about this the wrong way too, so open to other ideas. Thanks!
I read some PPTs, and it seems that one container can run on different Linux vendors. Is this true?
Yes. That's the main idea of Docker.
It creates a "static container" in a chrooted environment that is able to run on any Linux, because all the needed user-land dependencies are included in the image.
Since Linux (the kernel) maintains backward compatibility of system calls and their calling conventions, the idea works across versions and even across different distributions of Linux.
Of course, the binary architecture (say amd64) needs to be the same on the source and target system.
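A quick way to convince yourself of this (the image names are only examples): every container reports the host's kernel while shipping its own userspace.

docker run --rm ubuntu:20.04 uname -r          # prints the host kernel version
docker run --rm centos:7 uname -r              # same kernel version again
docker run --rm centos:7 cat /etc/os-release   # but a CentOS 7 userspace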
Yes, for most applications this works. The kernel is whatever you are really running on (RedHat in your example) while the userspace is supplied by the container (Ubuntu).
Most Linux kernel variants are sufficiently similar that applications will not notice. However if the code relies on something specific in the kernel that is not there, Docker can't help you.
Docker itself relies on certain minimum kernel features, version 3.8 at the time of writing. https://docs.docker.com/engine/installation/binaries/
I'm trying to build a system which runs pieces of code in consistent conditions, and one way I imagine this being possible is to run the various programs in docker containers with the same layout, reserving the same amount of memory, etc. However, I can't seem to figure out how to keep CPU usage consistent.
The closest thing I can find is "CPU shares," which, if I understand the documentation correctly, limit CPU usage relative to what other containers and processes are running on the system and to what is available on the system. They do not seem to be capable of limiting the container to an absolute amount of CPU usage.
Ideally, I'd like to set up docker containers that would be limited to using a single cpu core. Is this at all possible?
If you use a newer version of Docker, you can use --cpuset-cpus="" in docker run to specify the CPU cores you want to allocate:
docker run --cpuset-cpus="0" [...]
If you use an older version of Docker (< 0.9), which uses LXC as the default execution environment, you can use --lxc-conf to configure the allocated CPU cores:
docker run --lxc-conf="lxc.cgroup.cpuset.cpus = 0" [...]
In both of those cases, only the first CPU core will be available to the docker container. Both of these options are documented in the docker help.
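A quick way to verify that the pinning took effect (nproc honours the CPU affinity the container is started with; the image name is only an example):

docker run --rm --cpuset-cpus="0" ubuntu:20.04 nproc   # prints 1

If you are after an absolute cap on CPU time rather than pinning to a core, newer Docker releases also offer --cpus (and the lower-level --cpu-period/--cpu-quota pair).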
I've tried to provide a tutorial on container resource allocation:
https://gist.github.com/afolarin/15d12a476e40c173bf5f
I need to gather Docker metrics like CPU, memory and I/O, but I noticed that on my Ubuntu 14.04 system the location of the metrics differs from the location on my CoreOS system:
For example:
The Docker CPU metrics on Ubuntu are located under:
/sys/fs/cgroup/cpuacct/docker/<dockerLongId>/cpuacct.stat
The Docker CPU metrics on CoreOS are located under:
/sys/fs/cgroup/cpuacct/system.slice/docker-<dockerLongId>.scope/cpuacct.stat
Do you have an idea of the best way to support both environments?
There are a few things going on here. To start with the CoreOS vs. Ubuntu part: this is because systemd slices are not used on Ubuntu 14.04.
man systemd.slice
In the end, control groups are designed to be configurable: at any given time a process can be reconfigured by moving its PID between different cgroups, so inherently there will be a small amount of unpredictable behavior. The paths above should, however, be stable for processes started by their respective init systems.
The best way to detect which method should be used would be to read /etc/os-release. The purpose of this file is to provide a stable method for determining not only the distro, but the version as well.
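A minimal sketch of that detection (CONTAINER_ID stands for the long container id; the two paths are the ones from your question, and the os-release ID values may need adjusting for other distributions):

. /etc/os-release
case "$ID" in
  coreos) STAT=/sys/fs/cgroup/cpuacct/system.slice/docker-${CONTAINER_ID}.scope/cpuacct.stat ;;
  *)      STAT=/sys/fs/cgroup/cpuacct/docker/${CONTAINER_ID}/cpuacct.stat ;;
esac
cat "$STAT"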