Vulkan does not detect GPU when running Unity build in Docker container

Running Unity builds on my PC usually works fine.
However, when I try to run the same builds inside a Docker container, they crash with Segmentation fault (core dumped). I am using Ubuntu 20.04 with an NVIDIA GTX 1080 and have installed all required dependencies, including the NVIDIA Container Toolkit.
Looking at the logs generated by Unity, it seems that my NVIDIA GPU is not detected by Vulkan:
[Vulkan init] SelectPhysicalDevice requestedDeviceIndex=-1 xrDevice=(nil)
[Vulkan init] Physical Device 0xfe9930 [0]: "llvmpipe (LLVM 12.0.0, 256 bits)" deviceType=4 vendorID=10005 deviceID=0
[Vulkan init] Selected physical device (nil)
Caught fatal signal - signo:11 code:1 errno:0 addr:(nil)
Looking at the output of vulkaninfo, only llvmpipe is detected as a physical device.
GPU0:
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 4198582 (1.1.182)
driverVersion = 1 (0x0001)
vendorID = 0x10005
deviceID = 0x0000
deviceType = PHYSICAL_DEVICE_TYPE_CPU
deviceName = llvmpipe (LLVM 12.0.0, 256 bits)
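When only llvmpipe shows up like this, a common cause is that the Vulkan loader inside the container cannot find the NVIDIA ICD manifest (nvidia_icd.json). A quick check, assuming the usual default paths (they can differ per distribution):

```shell
# Inside the container: list the ICD manifests the Vulkan loader searches.
# If nvidia_icd.json is absent, only llvmpipe will be enumerated.
ls /usr/share/vulkan/icd.d/ /etc/vulkan/icd.d/ 2>/dev/null || true

# If the NVIDIA manifest exists at a non-standard location, point the
# loader at it explicitly before launching the Unity build:
export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json
```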
In my Dockerfile I set the following NVIDIA settings:
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES all
and used --gpus='all,"capabilities=compute,utility,graphics,display"' -e DISPLAY when starting the container.
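For reference, a minimal sketch of how that Dockerfile section might look with those settings (the libvulkan1/vulkan-tools packages are an assumption on my part: the Vulkan loader has to be installed in the image, while the driver itself is injected at runtime by the NVIDIA Container Toolkit):

```dockerfile
FROM ubuntu:20.04

# Expose all GPUs and all driver capabilities ("graphics" is what Vulkan needs).
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all

# Install the Vulkan loader and diagnostic tools inside the image;
# the NVIDIA driver libraries are mounted in at runtime by the toolkit.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libvulkan1 vulkan-tools \
    && rm -rf /var/lib/apt/lists/*
```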
Running nvidia-smi inside the container also works:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.203.03 Driver Version: 450.203.03 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:15:00.0 On | N/A |
| 12% 28C P8 18W / 250W | 644MiB / 11170MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Any ideas on resolving this problem? Thanks!

I don't know what the root cause was, but I found a workaround.
Instead of using ubuntu:20.04 as the base image, I am now using unityci/editor:ubuntu-2022.1.20f1-base-1 in my Dockerfile.
I use the following settings in a Docker Compose file to start the container:
unity:
  build:
    dockerfile: unity.Dockerfile
  volumes:
    - /tmp/.X11-unix:/tmp/.X11-unix
    - ${XAUTHORITY}:${XAUTHORITY}
    - $XDG_RUNTIME_DIR:$XDG_RUNTIME_DIR
  environment:
    - XAUTHORITY
    - DISPLAY
    - XDG_RUNTIME_DIR
    - NVIDIA_VISIBLE_DEVICES=all
    - NVIDIA_DRIVER_CAPABILITIES=all
  deploy:
    resources:
      reservations:
        devices:
          - capabilities: [gpu]
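For anyone not using Compose, a roughly equivalent docker run invocation would look something like this (the image name unity-app is a placeholder; the flags mirror the Compose settings above):

```shell
# Hypothetical docker run equivalent of the Compose service above.
docker run --rm \
  --gpus 'all,"capabilities=compute,utility,graphics,display"' \
  -e DISPLAY -e XAUTHORITY -e XDG_RUNTIME_DIR \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v "$XAUTHORITY:$XAUTHORITY" \
  -v "$XDG_RUNTIME_DIR:$XDG_RUNTIME_DIR" \
  unity-app
```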


How to properly use current Docker Desktop, WSL 2, and NVIDIA GPU support

First time posting a question - suggestions welcome to improve the post!
OS: Windows 10 Enterprise, 21H2 19044.2468
WSL2: Ubuntu-20.04
NVIDIA Hardware: A2000 8GB Ampere-based laptop GPU
NVIDIA Driver: Any from 517.88 to 528.24 (current prod. release)
DOCKER DESKTOP: Any from 4.9 to 4.16
I get the following error when I try to use GPU-enabled docker containers:
$ docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: failed to create shim task: OCI runtime create failed:
runc create failed: unable to start container process: error during container init:
error running hook #0: error running hook: exit status 1, stdout: , stderr:
Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: WSL environment detected but no adapters were found: unknown.
I have searched this error somewhat extensively and found many questions and answers. Those questions and answers differ from my case because:
They deal with beta NVIDIA drivers, which is dated
They deal with Docker version 19 and prior, not Docker Desktop 4.x and up
They are based on the Windows Insider channel, which is dated
They are relevant for older and less supported hardware - this is an Ampere-based card
They relate to the nvidia-docker2 package, which is deprecated
They relate to installs using the Docker toolkit in ways that do not follow the NVIDIA instructions
I followed the instructions at https://docs.nvidia.com/cuda/wsl-user-guide/index.html carefully. I also followed instructions at https://docs.docker.com/desktop/windows/wsl/. This install was previously working for me, and suddenly stopped one day without my intervention. Obviously, something changed, but I wiped all the layers of software here and reinstalled to no avail. Then I tried on other systems which previously were working, but which were outside the enterprise domain, and they too had this issue.
Relevant info from nvidia-smi in windows, wsl, and the dxdiag:
PS C:\> nvidia-smi
Mon Jan 30 10:37:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 517.88 Driver Version: 517.88 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A200... WDDM | 00000000:01:00.0 On | N/A |
| N/A 55C P8 14W / N/A | 1431MiB / 8192MiB | 6% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1880 C+G C:\Windows\System32\dwm.exe N/A |
| 0 N/A N/A 5252 C+G ...3d8bbwe\CalculatorApp.exe N/A |
| 0 N/A N/A 6384 C+G ...werToys.ColorPickerUI.exe N/A |
| 0 N/A N/A 6764 C+G ...2txyewy\TextInputHost.exe N/A |
| 0 N/A N/A 7024 C+G ...artMenuExperienceHost.exe N/A |
| 0 N/A N/A 10884 C+G ...ll\Dell Pair\DellPair.exe N/A |
| 0 N/A N/A 11044 C+G ...\AppV\AppVStreamingUX.exe N/A |
| 0 N/A N/A 11944 C+G ...me\Application\chrome.exe N/A |
| 0 N/A N/A 12896 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 14372 C+G ...---------<edit>---------- N/A |
| 0 N/A N/A 14748 C+G ...\PowerToys.FancyZones.exe N/A |
| 0 N/A N/A 15472 C+G ...ontend\Docker Desktop.exe N/A |
| 0 N/A N/A 16500 C+G ...werToys.PowerLauncher.exe N/A |
| 0 N/A N/A 17356 C+G ...---------<edit>---------- N/A |
| 0 N/A N/A 17944 C+G ...ge\Application\msedge.exe N/A |
| 0 N/A N/A 18424 C+G ...t\Teams\current\Teams.exe N/A |
| 0 N/A N/A 18544 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 18672 C+G ...oft OneDrive\OneDrive.exe N/A |
| 0 N/A N/A 19216 C+G ...t\Teams\current\Teams.exe N/A |
| 0 N/A N/A 20740 C+G ...ck\app-4.29.149\slack.exe N/A |
| 0 N/A N/A 22804 C+G ...lPanel\SystemSettings.exe N/A |
+-----------------------------------------------------------------------------+
On the WSL side
$ nvidia-smi
Mon Jan 30 10:35:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01 Driver Version: 517.88 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A200... On | 00000000:01:00.0 On | N/A |
| N/A 54C P8 14W / N/A | 1276MiB / 8192MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 23 G /Xwayland N/A |
+-----------------------------------------------------------------------------+
You can see from this that the card is in WDDM mode, which also addresses points raised in answers to previous questions.
cat /proc/version
Linux version 5.15.79.1-microsoft-standard-WSL2 (oe-user#oe-host) (x86_64-msft-linux-gcc (GCC)
9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220) #1 SMP Wed Nov 23 01:01:46 UTC 2022
The WSL kernel version seems fine.
Running dxdiag in windows produced a result with WHQL signed drivers with no problems reported. Everything in the hardware is exactly as it was when the containers worked previously.
This led me to query the lspci with:
$ lspci | grep -i nvidia
$
So this points to the root of the issue. My card is certainly CUDA capable, and my system has the proper drivers and CUDA toolkit, but WSL2 doesn't see any NVIDIA devices - lspci only shows a 3D controller: MS Corp Basic Render Driver. As a result, I ran update-pciids:
$ sudo update-pciids
... (it pulled 270k)
Done.
$ lspci | grep -i nvidia
$
Blank again...
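(Though, on reflection, I'm not sure lspci is conclusive here: in WSL2 the GPU is paravirtualized through the /dev/dxg device rather than exposed on the PCI bus. Assuming the usual WSL paths, a more direct sanity check might be:)

```shell
# The WSL2 GPU shows up as /dev/dxg, not as a PCI device.
ls -l /dev/dxg || true
# The Windows driver injects its user-mode libraries here:
ls /usr/lib/wsl/lib/ 2>/dev/null | grep -i -E 'cuda|nvidia' || true
```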
I reverted drivers to the 514.08 version to see if it would help. No dice.
Out of curiosity, I tried this entire clean install process with and without the linux CUDA install, with and without the windows CUDA toolkit, and with and without the docker nvidia toolkit. None of these changed the behavior. I tried with about 10 different versions of the nvidia driver and nvidia cuda toolkit as well. All have the same behavior, but in the CLI mode I can run current releases of everything. I just can't publish that as the fix, see below.
I was able to work around this by installing the Docker CLI directly within Ubuntu, using the Docker Engine (docker-ce) packages currently sourced from Docker's repository. However, this option does not work with solutions that mount the docker-data folder on an external hard drive mounted in Windows, which is a requirement for the system. (There are many reasons for this, but Docker Desktop does solve them with its WSL repo locations.)
I replicated this issue on a Windows 11 machine, brought it fully up to date, and the issue still persisted. Thus, I have ruled out the Windows 10 build version as the root cause. Plus, both systems worked previously, and neither had any updates applied manually. Both may have had automatic updates applied; however, the enterprise system cannot be controlled, and the Windows 11 machine was brought fully up to date without fixing the issue. Both machines have the issue where lspci does not see the card.
Am I just an idiot? Have I failed to set something up properly? I don't understand how my own setup could fail after it was working properly. I have followed the NVIDIA and Docker directions to a T. So many processes update dependencies prior to installing that it feels impossible to track. But at the same time, something had to have changed, and others must be experiencing this issue. Thanks for reading along this far! Suggestions on revision welcome.

Bug after installing NVIDIA drivers on Ubuntu PREEMPT_RT kernel

I installed NVIDIA drivers on my Ubuntu 20.04 PC with the PREEMPT_RT kernel patch 5.15.79-rt54. For installation I used this tutorial: https://gist.github.com/pantor/9786c41c03a97bca7a52aa0a72fa9387.
After installation I'm getting this bug: bug scheduling while atomic: irq/88-s-nvidia/773/0x00000002. It appears every few seconds or every few minutes.
I'm aware that NVIDIA drivers are not supported on real-time kernels, but maybe someone has found a solution or workaround for this problem? I don't have much experience with Ubuntu kernels, but the PREEMPT_RT kernel is required to control the 6-DOF robotic arm that I'm working with, and I also need CUDA for the image processing that will determine the robot's movement. I was considering using a second PC that runs the PREEMPT_RT kernel and commands robot velocities: I would do all calculations on the first machine (with the NVIDIA GPU and a generic kernel) and transfer the data over TCP/IP, but I'm afraid that would add too much latency to the system. I apologize if I didn't provide enough information, but I didn't really know what would be helpful in this situation.
I tried installing an older version of the NVIDIA drivers (520.56.06), but it didn't help.
This is the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 38C P8 5W / N/A | 243MiB / 4096MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1629 G /usr/lib/xorg/Xorg 59MiB |
| 0 N/A N/A 2154 G /usr/lib/xorg/Xorg 68MiB |
| 0 N/A N/A 2278 G /usr/bin/gnome-shell 104MiB |
+-----------------------------------------------------------------------------+

Hardware Acceleration for Headless Chrome on WSL2 Docker

My goal is to server-side render a frame of a WebGL2 application in a performant way.
To do so, I'm using Puppeteer with headless Chrome inside a Docker container running Ubuntu 20.04.4 on WSL2 on Windows 10 22H2.
However, I can't get any hardware acceleration to become active.
It seems that Chrome doesn't detect my NVIDIA GTX 1080, while on the host system with headless: false it is used and increases render performance drastically.
I have followed the tutorial here to install the CUDA toolkit on WSL2.
A container spun up using sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark uses the GTX1080.
Running nvidia-smi inside the container shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.65 Driver Version: 527.37 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 49C P8 10W / N/A | 871MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
Using flags like
--ignore-gpu-blocklist
--enable-gpu-rasterization
--enable-zero-copy
--use-gl=desktop / --use-gl=egl
did not solve the problem.
I also tried libosmesa6 with --use-gl=osmesa which had no effect.
I'm starting my container with docker-compose, including the following section in the service's block:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
Chrome version is HeadlessChrome/108.0.5351.0.
The Chrome WebGL context's WEBGL_debug_renderer_info tells me
vendor: Google Inc. (Google)
renderer: ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero) (0x0000C0DE)), SwiftShader driver)
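That SwiftShader renderer string means Chrome fell back to its bundled CPU Vulkan implementation. As a first diagnostic (not a fix), it may help to confirm whether the Vulkan loader inside the container enumerates the NVIDIA device at all; vulkan-tools is the Ubuntu 20.04 package that provides vulkaninfo:

```shell
# Install Vulkan diagnostics inside the container and list enumerated devices.
apt-get update && apt-get install -y --no-install-recommends vulkan-tools
vulkaninfo | grep -i 'deviceName'
# If only llvmpipe appears here, Chrome has no GPU device to pick up either.
```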
Visiting chrome://gpu with Puppeteer shows me the following:
(screenshot of chrome://gpu not included)
Any ideas what might still be missing to have Chrome use the GPU for hardware acceleration?

Cannot find CUDA device from PyTorch even though CUDA is installed on the system

NVCC version
C:\Users\acute>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:24:09_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
NVIDIA SMI version
C:\Users\acute>nvidia-smi
Mon May 02 00:12:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 431.02 Driver Version: 431.02 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 43C P8 5W / N/A | 134MiB / 4096MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
C:\Users\acute>pip freeze
certifi==2021.10.8
charset-normalizer==2.0.12
idna==3.3
numpy==1.22.3
Pillow==9.1.0
requests==2.27.1
torch==1.11.0+cu113
torchaudio==0.11.0+cu113
torchvision==0.12.0+cu113
typing_extensions==4.2.0
urllib3==1.26.9
Accessing from torch:
>>> torch.cuda.is_available()
C:\Users\acute\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\cuda\__init__.py:82: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\c10\cuda\CUDAFunctions.cpp:112.)
return torch._C._cuda_getDeviceCount() > 0
False
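One detail that stands out in the output above: the nvidia-smi header reports driver 431.02 / CUDA 10.2, while the installed wheels are +cu113 builds. CUDA 11.3 generally requires a much newer driver (NVIDIA's release notes list 465.89 as the Windows minimum - treat that exact figure as an assumption to verify), which is consistent with the driver-initialization failure. A quick version-aware comparison:

```shell
# Compare the installed driver version against the (assumed) minimum
# driver for CUDA 11.3 on Windows. Whichever version sorts first is older.
printf '431.02\n465.89\n' | sort -V | head -n1
# -> 431.02, i.e. the installed driver is older than the assumed minimum
```

Since the installed driver sorts first, it predates what the cu113 runtime expects, so updating the NVIDIA driver would be the first thing to try.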

NVIDIA driver support in an Ubuntu Docker container from a Windows host - 'Found no NVIDIA driver on your system' error

I have built a Docker image: Ubuntu 20.04 + Python 3.8 + torch, various libs (llvmlite, cuda, pyyaml, etc.) + my Flask app. The app uses torch, which needs the NVIDIA driver inside the container. The host machine is Windows 10 x64.
When running the container and testing it with Postman, this error appeared:
<head>
<title>AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx // Werkzeug Debugger</title>
On my machine, the nvidia-smi output is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 442.92 Driver Version: 442.92 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 40C P8 3W / N/A | 382MiB / 6144MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6212 C+G ...ta\Local\Postman\app-7.31.0\Postman.exe N/A |
| 0 6752 C+G ...are\Brave-Browser\Application\brave.exe N/A |
+-----------------------------------------------------------------------------+
This has been asked many times on SO, and the traditional answer is that NVIDIA can't support GPU acceleration in a Linux Docker container from a Windows host.
I found similar answers, and I have read the question and answers to this question. But those solutions involve an Ubuntu host plus a Docker image with Ubuntu inside.
This link instructs how to use nvidia-docker2, but nvidia-docker2 is deprecated according to this answer.
The official nvidia-docker repo has instructions - but for Linux hosts only.
But there is also WSL for Docker (with a Linux backend) installed - can it be used?
Is there still a way to make an Ubuntu container use the NVIDIA GPU from a host Windows machine?
It looks like you can now run Docker in Ubuntu with the Windows Subsystem for Linux (WSL 2) and get GPU passthrough.
This link goes through installation, setup, and running a TensorFlow Jupyter notebook with GPU support:
https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2
Note - I haven't done this myself yet.
