GPU Memory in TensorFlow container with NVIDIA SLI - docker

I have built a machine-learning computer with two NVIDIA RTX 2070 SUPER GPUs connected with an SLI bridge, running Windows (SLI verified in the NVIDIA Control Panel).
I have benchmarked the system using http://ai-benchmark.com/alpha and got impressive results.
In order to take full advantage of libraries that use the GPU for scientific tasks (cuDF), I created a TensorFlow Linux container:
https://www.tensorflow.org/install/docker
using the “latest-gpu-py3-jupyter” tag.
I then connected PyCharm to this container and configured its Python interpreter as the interpreter for the same project (I mounted the host project folder into the container).
When I run the same benchmark on the container, I get the error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[50,56,56,144] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[node MobilenetV2/expanded_conv_2/depthwise/BatchNorm/FusedBatchNorm (defined at usr/local/lib/python3.6/dist-packages/ai_benchmark/utils.py:238) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
This error relates to the exhaustion of GPU memory inside the container.
Why does the GPU on the Windows host handle the computation successfully, while the GPU in the Linux container exhausts its memory?
What makes the difference? Is it related to memory allocation in the container?
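As a quick sanity check (assuming the image's bundled python3 and the usual nvidia-smi tool are available in the container), one can verify whether TensorFlow inside the container sees the GPU at all; note that the error above is reported by the CPU allocator:
# run inside the container; an empty list means TensorFlow has fallen back to the CPU
python3 -c "import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"
# nvidia-smi will also fail here if the container has no access to the NVIDIA driver
nvidia-smi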

Here is an awesome link from docker.com that explains why your desired workflow won't work. It wouldn't work with RAPIDS cuDF either. Docker Desktop runs on Hyper-V, which isolates the hardware and doesn't expose the GPU the way the Linux drivers expect. Also, nvidia-docker is Linux-only.
I can confirm that RAPIDS (cuDF) currently doesn't support this setup either. For both TensorFlow and cuDF, I strongly recommend that you use (or dual boot) one of the recommended OSes as your host OS, found here: https://rapids.ai/start.html#prerequisites. If you need Windows in your workflow, you can run it as a guest on top of your Linux host.
There is a chance that, in the future, a WSL version will allow you to run RAPIDS on Windows, letting you craft your own Windows-based solution.
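For reference, on a supported Linux host with the NVIDIA driver and the NVIDIA container toolkit installed, the same image can be started with GPU access along these lines (a sketch; the project path is a placeholder):
# expose all GPUs to the container and mount the project folder
docker run --gpus all -it --rm -p 8888:8888 \
  -v /path/to/project:/tf/project \
  tensorflow/tensorflow:latest-gpu-py3-jupyter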
Hope this helps!

Related

Docker container CPU features do not match the host's ones (RDTSCP)

I am using a Docker container to run a C++ compiled executable. The Docker container is built from the latest Debian Linux distribution, while the host is a macOS system (macOS 12.6, 16-inch MacBook Pro, 2019).
Within the C++ code I call the function __rdtscp(unsigned int *__A), including x86intrin.h, for monitoring purposes. Compiling and executing the application on the macOS host, it works correctly. But if I try to run it within the Docker container, I get an Illegal instruction error. (The executable is compiled on another physical Linux host, which is what I need; in any case, I can run the same executable on different Linux machines, and also in a container generated from the same Docker image when it is executed on another host.)
Looking deeper into the issue, I found that __rdtscp(unsigned int *__A) must be supported by the CPU; it should be supported by all CPUs after 2010/2011. In fact, the corresponding flag (RDTSCP) is reported among the host CPU's features. The problem is that I cannot find it among the container CPU's features.
Note that __rdtsc() works correctly, but it is not serializing, so I want to use __rdtscp(unsigned int *__A).
Below is the macOS host output of sysctl -a | grep machdep.cpu
And this is the output of lscpu in the Debian Docker container
Could you help me figure out the reason for this difference? Is there a way to force Docker to expose the same CPU features as the host?
Thank you!
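One way to compare the two environments directly is to grep for the flag on both sides (assuming an Intel Mac, where RDTSCP shows up in the machdep.cpu feature strings, and a container in which /proc/cpuinfo is readable):
# on the macOS host
sysctl -a | grep -i machdep.cpu | grep -i rdtscp
# inside the Debian container; no output means the flag is not exposed
grep -m1 -o rdtscp /proc/cpuinfo
lscpu | grep -io rdtscp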

Are containers specific to a host OS?

Are containers specific to a particular host OS? For instance, if a container is created on Windows with particular dependencies (e.g., DLL files), can it run in a setup in which the host OS is Linux? I initially assumed that a container must be specific to a particular host OS.
But the following two excerpts seem to suggest that I may not have understood the mechanics correctly. So my question is: are containers built on top of the Docker engine, so that once the dependencies are included they only depend on the Docker engine and the underlying host OS does not matter?
(1) From IBM:
Containerization allows developers to create and deploy applications faster and more securely. With traditional methods, code is developed in a specific computing environment which, when transferred to a new location, often results in bugs and errors. For example, when a developer transfers code from a desktop computer to a virtual machine (VM) or from a Linux to a Windows operating system. Containerization eliminates this problem by bundling the application code together with the related configuration files, libraries, and dependencies required for it to run. This single package of software or “container” is abstracted away from the host operating system, and hence, it stands alone and becomes portable—able to run across any platform or cloud, free of issues. [https://www.ibm.com/cloud/learn/containerization]
(2) From Docker:
Does Docker run on Linux, macOS, and Windows?
You can run both Linux and Windows programs and executables in Docker containers. The Docker platform runs natively on Linux (on x86-64, ARM and many other CPU architectures) and on Windows (x86-64).
Docker Inc. builds products that let you build and run containers on Linux, Windows and macOS.
What does Docker technology add to just plain LXC?
Docker technology is not a replacement for LXC. “LXC” refers to capabilities of the Linux kernel (specifically namespaces and control groups) which allow sandboxing processes from one another, and controlling their resource allocations. On top of this low-level foundation of kernel features, Docker offers a high-level tool with several powerful functionalities:
Portable deployment across machines. Docker defines a format for bundling an application and all its dependencies into a single object called a container. This container can be transferred to any Docker-enabled machine. The container can be executed there with the guarantee that the execution environment exposed to the application is the same in development, testing, and production. LXC implements process sandboxing, which is an important pre-requisite for portable deployment, but is not sufficient for portable deployment. If you sent me a copy of your application installed in a custom LXC configuration, it would almost certainly not run on my machine the way it does on yours. The app you sent me is tied to your machine’s specific configuration: networking, storage, logging, etc. Docker defines an abstraction for these machine-specific settings. The exact same Docker container can run - unchanged - on many different machines, with many different configurations.
The host OS, or more precisely the kernel it provides, still matters. That's why you can't run Windows containers on Linux. You can run Linux containers on Windows thanks to Hyper-V and WSL 2, and on macOS with the Hypervisor framework, but that's it. If the provided kernel is compatible (it doesn't have to be identical), meaning a similar version and the same architecture (remember, there are x64, ARM64, etc.), or at least supported emulation (x64 containers can run on an M1, which is ARM64), then you can just run the container. There is no need to worry about DLLs, because they are supposed to be included either in the base image you start from or in the image you build.
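A quick way to see what a given image was built for is to inspect its OS/architecture metadata (nginx is just an arbitrary example image here):
# prints something like linux/amd64 or windows/amd64
docker image inspect --format '{{.Os}}/{{.Architecture}}' nginx:latest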

Best way to test a custom kernel for hardware performance counters

I would like to modify an official Linux kernel to test some possibilities for the perf subsystem (I need to modify some files in kernel/events/..., not only tools/perf/...).
Naively, I thought of using a VM or Docker, but I need to test my custom version with hardware performance counters (HPCs), and that is a big problem:
Docker can use the HPCs, but as I understand it, only through my host kernel; I can't directly test a custom kernel without installing it on my system (correct me if I am wrong).
A VM can't use the HPCs because it can't emulate them.
What is the best way to test a custom Linux kernel without installing it directly on my Ubuntu system? And if I have to install it, what is the most elegant way to run these tests? Thank you.
I found a solution: KVM + the QEMU emulator.
To use the PMU, I changed this parameter in the VM definition (libvirt XML format):
<cpu mode='host-passthrough'/>
Or you can add this option on the QEMU command line:
-cpu host
I partly followed this page for building the kernel under QEMU, and this page for the counters.
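If you build the kernel yourself, a minimal QEMU/KVM invocation with host CPU (and therefore PMU) passthrough looks roughly like this; the disk image and root= device are placeholders for your own test rootfs:
qemu-system-x86_64 -enable-kvm -cpu host -m 4G -smp 4 \
  -kernel arch/x86/boot/bzImage \
  -append "root=/dev/sda console=ttyS0" \
  -drive file=disk.img,format=raw \
  -nographic
# inside the guest, perf stat -e cycles,instructions true should then report real counts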

Which docker version to use for app using linux kernel 2.6? [duplicate]

Let's say that I make an image for an OS that uses a kernel of version 10. What behavior does Docker exhibit if I run a container for that image on a host OS running a kernel of version 9? What about version 11?
Does the backward compatibility of the versions matter? I'm asking out of curiosity because the documentation only talks about "minimum Linux kernel version", etc. This sounds like it doesn't matter what kernel version the host is running beyond that minimum. Is this true? Are there caveats?
Let's say that I make an image for an OS that uses a kernel of version 10.
I think this is a bit of a misconception, unless you are talking about specific software that relies on newer kernel features inside your Docker image, which should be pretty rare. Generally speaking, a Docker image is just a custom file/directory structure, assembled in layers via FROM and RUN instructions in one or more Dockerfiles, with a bit of metadata like which ports to open or which file to execute on container start. That's really all there is to it. The basic principle of Docker is very much like a classic chroot jail, only a bit more modern and with some candy on top.
What behavior does Docker exhibit if I run a container for that image on a host OS running a kernel of version 9? What about version 11?
If the kernel can run the Docker daemon it should be able to run any image.
Are there caveats?
As noted above, Docker images that include software which relies on bleeding-edge kernel features will not work on kernels that do not have those features, which should be no surprise. Docker will not stop you from running such an image on an older kernel, as it simply does not care what's inside an image, nor does it know what kernel was used to create the image.
The only other thing I can think of is compiling software manually with aggressive optimizations for a specific CPU, like Intel or AMD. Such images will fail on hosts with a different CPU.
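For example (a hypothetical build step, not from the question), a binary compiled inside the image with
gcc -O3 -march=native -o app app.c
is tuned for the instruction-set extensions of the build machine's CPU, and running that image on a host whose CPU lacks those extensions can end in an illegal-instruction crash.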
Docker's behaviour is no different: it doesn't concern itself (directly) with the behaviour of the containerized process. What Docker does do is set up various parameters (root filesystem, other mounts, network interfaces and configuration, separate namespaces or restrictions on what PIDs can be seen, etc.) for the process that let you consider it a "container," and then it just runs the initial process in that environment.
The specific software inside the container may or may not work with your host operating system's kernel. Using a kernel older than the software was built for is not infrequently problematic; more often it's safe to run older software on a newer kernel.
More often, but not always. On a host with kernel 4.19 (e.g. Ubuntu 18.04) try docker run centos:6 bash. You'll find it segfaults (exit code 139) because that old build of bash does something that greatly displeases the newer kernel. (On a 4.9 or lower kernel, docker run centos:6 bash will work fine.) However, docker run centos:6 ls will not die in the same way because that program is not dependent on particular kernel facilities that have changed (at least, not when run with no arguments).
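You can see the difference directly from the exit codes (mirroring the commands above; --rm just removes the stopped containers):
docker run --rm centos:6 bash ; echo "bash exit code: $?"   # 139 (SIGSEGV) on a 4.19-era kernel
docker run --rm centos:6 ls   ; echo "ls exit code: $?"     # 0, the program runs normally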
This sounds like it doesn't matter what kernel version the host is running beyond that minimum. Is this true?
As long as your kernel meets Docker's minimum requirements (which mostly involve having the necessary APIs to support the isolated execution environment that Docker sets up for each container), Docker doesn't really care what kernel you're running.
In many ways, this isn't entirely a Docker question: for the most part, user-space tools aren't tied particularly tightly to specific kernel versions. This isn't universally true; there are some tools that by design interact with a very specific kernel version, or that can take advantage of APIs in recent kernel versions for improved performance, but for the most part your web server or database just doesn't care.
Are there caveats?
The kernel version you're running may dictate things like which storage drivers are available to Docker, but this doesn't really have any impact on your containers.
Older kernel versions may have security vulnerabilities that are fixed in more recent versions, and newer versions may have fixes that offer improved performance.
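Both points are easy to check on a given host (assuming a reasonably recent Docker CLI that supports --format for docker info):
# storage driver the daemon selected on this kernel, e.g. overlay2
docker info --format '{{.Driver}}'
# kernel version the daemon is running on
docker info --format '{{.KernelVersion}}'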

Getting Docker to recognize nvidia graphics card on mac

When I am in my container, I run
lspci | grep -i nvidia
and nothing shows.
When I run ./deviceQuery from the samples NVIDIA provides I get
no CUDA-capable device is detected
I know I have an NVIDIA driver on my Mac; I just can't figure out how to get my Docker container to recognize it.
On OS X, Docker runs inside a separate VirtualBox VM, which does not expose the host GPU.
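You can confirm this indirection from any container: the kernel it reports is the Linux kernel of the VM, not Darwin (alpine is just a small example image):
docker run --rm alpine uname -a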
You'll first need to make the graphics card available in the VirtualBox VM. I'm not sure how to do that, but this looks like it might help:
https://www.virtualbox.org/manual/ch04.html#guestadd-video
Once you've got it mounted within the VM, then you can also share it with the container.
I haven't tried this myself, but this guy says that he can run native X11 apps on a Mac using a beta Docker client called Kitematic along with socat, XQuartz, and QGIS, and he seems to imply that NVIDIA driver issues were thus avoided. This looks worth a try!
