Run NVIDIA for GPGPU, Intel for graphics simultaneously

I have a laptop running Ubuntu 18.04 with both Intel and NVIDIA graphics cards:
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204M [GeForce GTX 970M] (rev a1)
I would like to use the Intel card for my actual graphics display, and my NVIDIA card for simultaneously running GPGPU things (e.g. TensorFlow models, other CUDA stuff, OpenCL). Is this possible? How would I go about this?
Ideally, I'd be able to turn the NVIDIA GPU on and off easily, so that I can just turn it on when I need to run something on it, and turn it off after to save power.
Currently, I have it set up with nvidia-prime so that I can switch between the two cards (I need to reboot in between). However, if I've loaded the Intel card for graphics (prime-select intel), then the NVIDIA kernel drivers never get loaded and I can't access the NVIDIA GPU (nvidia-smi doesn't work).
I tried loading the NVIDIA kernel module with sudo modprobe nvidia when running the graphics on Intel, but I get ERROR: could not insert 'nvidia': No such device.

Yes, this is indeed possible. It is called "Nvidia Optimus" and means that the integrated Intel GPU is used by default to save power and the dedicated Nvidia GPU is used only for high-performance applications. Here are guides on how to set it up in Linux:
The Ultimate Guide to Setting Up Nvidia Optimus on Linux
archlinux: Nvidia Optimus

Short answer: You can try my modified version of prime-select, which adds a 'hybrid' profile (graphics on the Intel GPU, TensorFlow and other CUDA workloads on the Nvidia GPU): https://github.com/lperez31/prime-select-hybrid
Long answer:
I ran into the same issue and found several blogs describing different solutions, but I wanted something more straightforward: I didn't want to have to switch between profiles each time I needed TensorFlow to run on the Nvidia GPU.
When setting the 'intel' profile, prime-select blacklists three modules: nvidia, nvidia-drm and nvidia-modeset. It also removes the aliases to these modules. That is why the sudo modprobe nvidia command fails in the 'intel' profile; if the alias had not been removed, this command would do the trick.
In order to use Intel for graphics and the Nvidia GPU for TensorFlow, the 'hybrid' profile in the modified prime-select above blacklists the nvidia-drm and nvidia-modeset modules, but not the nvidia module. Thus the Nvidia driver is loaded, but since nvidia-drm (Direct Rendering Manager) is not loaded, the graphics remain on the Intel GPU.
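Conceptually, the modprobe blacklist that the 'hybrid' profile writes looks roughly like this (a sketch only; the exact file path and wording used by the modified prime-select may differ):
# contents of the profile's modprobe.d blacklist file (sketch)
blacklist nvidia-drm
blacklist nvidia-modeset
alias nvidia-drm off
alias nvidia-modeset off
Note the absence of the blacklist nvidia and alias nvidia off lines, which is what keeps the core module loadable.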
If you don't want to use my version of prime-select, you can just edit /usr/bin/prime-select and comment out the following two lines:
blacklist nvidia
alias nvidia off
With these lines commented out, the nvidia-smi command should run even in the 'intel' profile, you should be able to run CUDA workloads on the Nvidia GPU, and your graphics will stay on Intel.
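A quick way to verify the result (a sketch; nvidia-smi and modprobe are standard tools, and the last line assumes a GPU-enabled TensorFlow installation):
# Load the core driver manually if it is not yet loaded
sudo modprobe nvidia
# The GTX 970M should now be visible for compute while the display stays on Intel
nvidia-smi
# Optional: confirm TensorFlow can see the GPU (assumes tensorflow-gpu is installed)
python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"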

Related

CPU and GPU Stress test for Nvidia Jetson Xavier (Ubuntu 18.04, Jetpack 4.6)

How can I run a simultaneous CPU and GPU stress test on a Jetson Xavier machine (Ubuntu 18.04, JetPack 4.6)?
The only code I found is https://github.com/JTHibbard/Xavier_AGX_Stress_Test, which has tough package incompatibility issues and only works for the CPU.
Can anyone contribute another script or fix the issues with the one mentioned? Python code is preferred.
Solution found. For the CPU stress test, the link above works; it needs the numba package to be installed. For the GPU stress test, the CUDA samples that ship with Nvidia Jetson machines can be used simply and efficiently. The samples are in /usr/local/cuda/samples. Choose one and compile it using sudo make. The compiled test binary will be placed in /usr/local/cuda/samples/bin/aarch64/linux/release (aarch64 may differ on other architectures). Run the test and check the performance using sudo jtop in another terminal.
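As a concrete illustration of the steps above (a sketch; the chosen sample, matrixMul, and its 0_Simple subdirectory are assumptions that may differ between CUDA versions):
# Build one of the bundled CUDA samples
cd /usr/local/cuda/samples/0_Simple/matrixMul
sudo make
# Loop the compiled binary to keep the GPU busy; Ctrl-C to stop
while true; do /usr/local/cuda/samples/bin/aarch64/linux/release/matrixMul; done
# In a second terminal, monitor CPU/GPU utilisation and temperatures
sudo jtop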

RTX 3080 LHR Missing gpu__dram_throughput CUDA metric

As part of a machine learning project, we are optimizing some custom CUDA kernels.
We are trying to profile them using Nsight Compute, but we encounter the following error on the LHR RTX 3080 when profiling a simple wrapper around the CUDA kernel:
==ERROR== Failed to access the following 4 metrics: dram__cycles_active.avg.pct_of_peak_sustained_elapsed, dram__cycles_elapsed.avg.per_second, gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed, gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed
==ERROR== Failed to profile kernel "kernel" in process 20204
Running a diff of the metrics available on an RTX 3080 Ti (non-LHR) versus an RTX 3080 (LHR) via nv-nsight-cu-cli --devices 0 --query-metrics, we notice that the following metrics are missing on the RTX 3080 LHR:
gpu__compute_memory_request_throughput
gpu__compute_memory_throughput
gpu__dram_throughput
All of these are required for even basic memory profiling in Nsight Compute; all other metrics are present. Is this a limitation of LHR cards? Why would they not be present?
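For reference, the comparison described above can be reproduced roughly like this (a sketch; the output file names are placeholders, and each query is run on the corresponding machine):
# Dump the metric list on each machine, then compare the two files
nv-nsight-cu-cli --devices 0 --query-metrics > metrics_3080ti_nonlhr.txt
nv-nsight-cu-cli --devices 0 --query-metrics > metrics_3080_lhr.txt
diff metrics_3080ti_nonlhr.txt metrics_3080_lhr.txt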
Details:
Gigabyte RTX 3080 Turbo (LHR)
CUDA version: 11.5
Driver version: 497.29.
Windows 10
I saw your post on the NVIDIA developer forums, and from the look of it NVIDIA didn't intend this, so I'd just go with what works (non-LHR) for now until they fix it. Quadro and Tesla cards are supported by Nsight Compute, so they might be a stopgap solution.
So to answer the main question:
Will buying a non-LHR GPU address this problem?
For right now, yes, buying a non-LHR 3080 should fix the issue.
As per the Nvidia forums, this is an unintended bug that is fixed by upgrading from CUDA 11.5 to CUDA 11.6, under which profiling works correctly with all metrics available.
Successful conditions:
Gigabyte RTX 3080 Turbo (LHR)
CUDA version: 11.6
Driver version: 511.23.
Windows 10
We don't know why these metrics were unavailable, but the version update is definitely the correct fix.
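A quick sanity check after the upgrade (a sketch; ./kernel_wrapper is a placeholder for the wrapper binary mentioned in the question):
# Confirm the previously missing metrics are reported again
nv-nsight-cu-cli --devices 0 --query-metrics | grep gpu__dram_throughput
# Profile the kernel requesting one of them explicitly
nv-nsight-cu-cli --metrics gpu__dram_throughput.avg.pct_of_peak_sustained_elapsed ./kernel_wrapper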

GPU Memory in TensorFlow container with NVIDIA SLI

I have built a machine-learning computer with two NVIDIA RTX 2070 SUPER GPUs connected with an SLI bridge, running Windows (SLI verified in the NVIDIA Control Panel).
I have benchmarked the system using http://ai-benchmark.com/alpha and got impressive results.
In order to take best advantage of libraries that use the GPU for scientific tasks (cuDF), I created a TensorFlow Linux container:
https://www.tensorflow.org/install/docker
using “latest-gpu-py3-jupyter” tag.
I then connected PyCharm to this container and configured its Python interpreter as the project interpreter (the host project folder is mounted in the container).
When I run the same benchmark on the container, I get the error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[50,56,56,144] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[node MobilenetV2/expanded_conv_2/depthwise/BatchNorm/FusedBatchNorm (defined at usr/local/lib/python3.6/dist-packages/ai_benchmark/utils.py:238) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
This error relates to the exhaustion of GPU memory inside the container.
Why does the GPU on the Windows host handle the computation successfully while the same benchmark in the Linux container exhausts memory?
What makes this difference? Is it related to memory allocation in the container?
There is an awesome link from docker.com that explains why your desired workflow won't work. It wouldn't work with RAPIDS cuDF either. Docker Desktop works using Hyper-V, which isolates the hardware and doesn't give access to the GPU the way the Linux drivers expect. Also, nvidia-docker is Linux only.
I can tell you that RAPIDS (cuDF) currently doesn't support this setup either. Windows as a guest, however, does work on top of a Linux host. For both TensorFlow and cuDF, I strongly recommend that you use (or dual boot) one of the recommended OSes as your host OS, listed here: https://rapids.ai/start.html#prerequisites. If you need Windows in your workflow, you can run it on top of your Linux host.
There is a chance that in the future a WSL version will allow you to run RAPIDS on Windows, letting you craft your own on-Windows solution.
Hope this helps!
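One quick way to confirm that the container never sees the GPU (a sketch; the image tag matches the one in the question, and device_lib.list_local_devices is TensorFlow's device-listing API):
# List the devices TensorFlow can see inside the container
docker run --rm tensorflow/tensorflow:latest-gpu-py3-jupyter \
  python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
# On Docker Desktop for Windows (Hyper-V) only CPU devices are listed, so the
# benchmark falls back to the CPU, which matches the CPU-allocator OOM above.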

ML training process is not on GPU

I just moved from AWS to Google Cloud Platform because of its lower GPU prices. Following the instructions on the website, I created a Compute Engine instance with a K80 GPU and installed the latest TensorFlow, Keras, CUDA driver and cuDNN; everything went well. However, when I try to train my model, the training process still runs on the CPU.
NVIDIA-SMI 387.26 Driver Version: 387.26
Cuda compilation tools, release 9.1, V9.1.85
TensorFlow version: 1.4.1
cuDNN: cudnn-9.1-linux-x64-v7
Ubuntu 16.04
Perhaps you installed the CPU-only version of TensorFlow?
Google Cloud's Compute Engine now offers a VM image with all the needed software preinstalled, for an easier/faster way to get started: https://cloud.google.com/deep-learning-vm/
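A quick check of what is installed and what TensorFlow can see (a sketch; the package names assume the tensorflow / tensorflow-gpu split used in the 1.x series):
# If only the plain "tensorflow" (CPU build) package shows up, the GPU build is missing
pip list 2>/dev/null | grep -i tensorflow
# List the devices TensorFlow can see; a working setup shows a /device:GPU:0 entry
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
# If needed, replace the CPU build with the GPU build of the same version
pip uninstall -y tensorflow && pip install tensorflow-gpu==1.4.1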

How to run GPGPU inside docker image with different from host kernel and GPU driver version

I have a machine with several GPUs. My idea is to attach them to different Docker instances in order to use those instances for CUDA (or OpenCL) calculations.
My goal is to set up a Docker image with a quite old Ubuntu and quite old AMD video drivers (13.04). The reason is simple: upgrading to a newer driver version breaks my OpenCL program (due to buggy AMD Linux drivers).
So the question is: is it possible to run a Docker image with an old Ubuntu, an old kernel (3.14, for example) and the old AMD (fglrx) driver on a fresh Arch Linux setup with a recent kernel 4.2 and the newer AMD (fglrx) drivers from the repository?
P.S. I tried this answer (with the Nvidia cards) and unfortunately deviceQuery inside the Docker image doesn't see any CUDA devices (as happened for some commenters on the original answer)...
P.P.S. My setup:
CPU: Intel Xeon E5-2670
GPUs:
1 x Radeon HD 7970
$ lspci -nn | grep Rad
83:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X] [1002:6798]
83:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT HDMI Audio [Radeon HD 7970 Series] [1002:aaa0]
2 x GeForce GTX Titan Black
With Docker you rely on operating-system-level virtualization, which means every container shares the host kernel. If you wish to run a different kernel for each container, you'll have to use system-level (hardware) virtualization, e.g. KVM or VirtualBox. If your setup supports Intel's VT-d, you can pass the GPU through as a PCIe device to the guest (the better term in that case being a virtual machine).
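A rough outline of VT-d/VFIO passthrough of the Radeon on the Arch host (a sketch only, not a complete guide; it assumes IOMMU is enabled in the firmware and on the kernel command line, that fglrx is not already bound to the card, and the disk image name is a placeholder; the PCI address and IDs come from the lspci output above):
# Bind the Radeon (83:00.0, vendor:device 1002:6798) to the vfio-pci driver
sudo modprobe vfio-pci
echo "1002 6798" | sudo tee /sys/bus/pci/drivers/vfio-pci/new_id
# Boot a VM (e.g. an old Ubuntu with kernel 3.14 and fglrx 13.04) with the GPU passed through
qemu-system-x86_64 -enable-kvm -m 8G \
  -device vfio-pci,host=83:00.0 \
  -drive file=ubuntu-old.qcow2,format=qcow2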
