Yolo v7 not detecting objects on image

I'm trying YOLOv7. It seems to run, but the resulting image has no object-detection boxes drawn on it, even though it should. I followed the GitHub instructions for setting up YOLOv7 on Docker; here are the full commands, so you should be able to reproduce my problem.
git clone https://github.com/WongKinYiu/yolov7
cd yolov7
nvidia-docker run --name yolov7 -it --rm -v "$CWD":/yolov7 --shm-size=64g nvcr.io/nvidia/pytorch:21.08-py3
# inside the container
cd /yolov7
python -m pip install virtualenv
python -m virtualenv venv3
. venv3/bin/activate
pip install -r requirements.txt
apt update
apt install -y zip htop screen libgl1-mesa-glx
pip install seaborn thop
python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg
And this is the console output of the last command:
# python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg
Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.25, device='', exist_ok=False, img_size=640, iou_thres=0.45, name='exp', no_trace=False, nosave=False, project='runs/detect', save_conf=False, save_txt=False, source='inference/images/horses.jpg', update=False, view_img=False, weights=['yolov7.pt'])
YOLOR 🚀 v0.1-115-g072f76c torch 1.13.0+cu117 CUDA:0 (NVIDIA GeForce GTX 1650, 3903.875MB)
Fusing layers...
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
Model Summary: 306 layers, 36905341 parameters, 6652669 gradients
Convert model to Traced-model...
traced_script_module saved!
model is traced!
/yolov7/venv3/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Done. (150.9ms) Inference, (0.3ms) NMS
The image with the result is saved in: runs/detect/exp6/horses.jpg
Done. (0.616s)
Now I should be able to see the detections on the generated image runs/detect/exp6/horses.jpg, produced from the original image inference/images/horses.jpg, right? But the two images look identical; there is no difference at all. What's wrong with the setup?
Nvidia driver:
$ nvidia-smi
Tue Dec 6 09:47:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 45% 27C P8 N/A / 75W | 13MiB / 4096MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1152 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 1256 G /usr/bin/gnome-shell 2MiB |
+-----------------------------------------------------------------------------+
Ubuntu version:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal

Modify the variable half in the detect.py file to False:
Line 31: half = False
When using the GPU, the program defaults to running the model in half precision; that is what the comment on line 31 of detect.py says (# half precision only supported on CUDA).
Changing it to False did the trick for me.
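For reference, this is the change in context (a minimal sketch of detect.py around line 31; the original line is the one quoted in the answer below):
# detect.py, around line 31
# original: half = device.type != 'cpu'  # half precision only supported on CUDA
half = False  # force full precision; works around empty detections on GPUs without proper fp16 support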

I came across the same issue too. It is basically what others mentioned; some explanation is added below.
The reason is line 31: half = device.type != 'cpu'  # half precision only supported on CUDA.
Not all GPUs, nor even all Nvidia GPUs with CUDA, support half-precision (16-bit) floats, especially if your GPU is a bit older. In my case I was using an AMD 5700 XT (via ROCm), and this GPU also has no fp16 support!
To make it configurable, I added a command line argument that lets the user override the variable half mentioned in the other answers:
# Around line 31, just after `device = select_device(opt.device)`, honour the command line override:
half = opt.fp16 and device.type != 'cpu'  # half precision only supported on CUDA

# Around line 169, after `parser = argparse.ArgumentParser()`:
# (argparse's type=bool is a trap: any non-empty string parses as True, so use a store_true flag)
parser.add_argument("--fp16", action="store_true", help="use float16 (some GPUs only)")
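With that patch, detect.py runs in full precision by default and you opt in with --fp16 on the command line. If you want to check beforehand whether your GPU handles fp16 sanely, a standalone snippet like this works (a minimal sketch, assuming only that the PyTorch from requirements.txt is installed; fp16_check.py is a hypothetical file name):
# fp16_check.py
import torch

if torch.cuda.is_available():
    x = torch.randn(8, 8, device="cuda", dtype=torch.float16)
    y = x @ x  # if this raises or produces non-finite values, keep half = False
    print("fp16 matmul ok:", torch.isfinite(y).all().item())
else:
    print("No CUDA device visible; detect.py falls back to full precision anyway")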

Related

bug after installing nvidia drivers on ubuntu PREEMPT_RT kernel

I installed Nvidia drivers on my Ubuntu 20.04 PC with the PREEMPT_RT kernel patch 5.15.79-rt54. For the installation I used this tutorial: https://gist.github.com/pantor/9786c41c03a97bca7a52aa0a72fa9387.
After installation I'm getting this bug: bug scheduling while atomic: irq/88-s-nvidia/773/0x00000002. The bug appears sometimes every few seconds, sometimes every few minutes.
I'm aware that Nvidia drivers are not supported on realtime kernels, but maybe someone has found a solution or a workaround for this problem? I don't have much experience with Ubuntu kernels, but the PREEMPT_RT kernel is required to control the 6-DOF robotic arm I'm working with. I also need CUDA for the image processing that will determine the robot's movement. I was considering using a second PC that runs the PREEMPT_RT kernel and commands the robot velocities: I would do all the calculations on the first machine (with the Nvidia GPU and a generic kernel) and transfer the data over TCP/IP, but I'm afraid that would add too much latency to the system. I apologize if I haven't provided enough information; I didn't really know what would be helpful in this situation.
I tried installing an older version of the Nvidia drivers (520.56.06), but it didn't help.
This is Nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| N/A 38C P8 5W / N/A | 243MiB / 4096MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1629 G /usr/lib/xorg/Xorg 59MiB |
| 0 N/A N/A 2154 G /usr/lib/xorg/Xorg 68MiB |
| 0 N/A N/A 2278 G /usr/bin/gnome-shell 104MiB |
+-----------------------------------------------------------------------------+

Hardware Acceleration for Headless Chrome on WSL2 Docker

My target is to server-side render a frame of a WebGL2 application in a performant way.
To do so, I'm using Puppeteer with headless Chrome inside a Docker container running Ubuntu 20.04.4 on WSL2 on Windows 10 22H2.
However, I can't get any hardware acceleration to become active.
It seems that Chrome doesn't detect my Nvidia GTX1080 card, while on the host system with headless: false it's being used and increases render performance drastically.
I have followed the tutorial here to install the CUDA toolkit on WSL2.
A container spun up using sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark uses the GTX1080.
Running nvidia-smi inside the container shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.65 Driver Version: 527.37 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 49C P8 10W / N/A | 871MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
Using flags like
--ignore-gpu-blocklist
--enable-gpu-rasterization
--enable-zero-copy
--use-gl=desktop / --use-gl=egl
did not solve the problem.
I also tried libosmesa6 with --use-gl=osmesa which had no effect.
I'm starting my container with docker-compose, including the following section in the service's block:
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
Chrome version is HeadlessChrome/108.0.5351.0.
The Chrome WebGL context's WEBGL_debug_renderer_info tells me
vendor: Google Inc. (Google)
renderer: ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero) (0x0000C0DE)), SwiftShader driver)
Visiting chrome://gpu with Puppeteer shows me the following:
(screenshot of the chrome://gpu page not reproduced here)
Any ideas what might still be missing to have Chrome use the GPU for hardware acceleration?

Vulkan does not detect GPU when running Unity build in Docker container

Running Unity builds on my PC usually works fine.
However, when I try to run a Unity build inside a Docker container, I get a segmentation error: Segmentation fault (core dumped). I am using Ubuntu 20.04 with an Nvidia GTX 1080 and have installed all the required dependencies, such as the Nvidia Docker toolkit.
Looking at the logs generated by Unity, it seems that my Nvidia GPU is not detected by Vulkan.
[Vulkan init] SelectPhysicalDevice requestedDeviceIndex=-1 xrDevice=(nil)
[Vulkan init] Physical Device 0xfe9930 [0]: "llvmpipe (LLVM 12.0.0, 256 bits)" deviceType=4 vendorID=10005 deviceID=0
[Vulkan init] Selected physical device (nil)
Caught fatal signal - signo:11 code:1 errno:0 addr:(nil)
Looking at the output of vulkaninfo, only llvmpipe is detected as a physical device.
GPU0:
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 4198582 (1.1.182)
driverVersion = 1 (0x0001)
vendorID = 0x10005
deviceID = 0x0000
deviceType = PHYSICAL_DEVICE_TYPE_CPU
deviceName = llvmpipe (LLVM 12.0.0, 256 bits)
In my Dockerfile I set the following Nvidia settings:
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES all
and used --gpus='all,"capabilities=compute,utility,graphics,display"' -e DISPLAY when starting the container.
Also running nvidia-smi within the container works.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.203.03 Driver Version: 450.203.03 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:15:00.0 On | N/A |
| 12% 28C P8 18W / 250W | 644MiB / 11170MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Any ideas on resolving this problem? Thanks!
I don't know what the problem was, but I found a workaround.
Instead of using ubuntu:20.04 as the base image, I am now using unityci/editor:ubuntu-2022.1.20f1-base-1 in my Dockerfile.
I am using the following settings in a docker-compose file to start the container:
unity:
  build:
    dockerfile: unity.Dockerfile
  volumes:
    - /tmp/.X11-unix:/tmp/.X11-unix
    - ${XAUTHORITY}:${XAUTHORITY}
    - $XDG_RUNTIME_DIR:$XDG_RUNTIME_DIR
  environment:
    - XAUTHORITY
    - DISPLAY
    - XDG_RUNTIME_DIR
    - NVIDIA_VISIBLE_DEVICES=all
    - NVIDIA_DRIVER_CAPABILITIES=all
  deploy:
    resources:
      reservations:
        devices:
          - capabilities: [gpu]

nvidia driver support on ubuntu in docker from host windows - 'Found no NVIDIA driver on your system' error

I have built a Docker image: Ubuntu 20.04 + Python 3.8 + torch and various libs (llvmlite, cuda, pyyaml, etc.) + my Flask app. The app uses torch and needs the Nvidia driver inside the container. The host machine is Windows 10 x64.
When running the container and testing it with Postman, this error appeared:
<head>
<title>AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx // Werkzeug Debugger</title>
On my machine nvidia-smi is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 442.92 Driver Version: 442.92 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 166... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 40C P8 3W / N/A | 382MiB / 6144MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6212 C+G ...ta\Local\Postman\app-7.31.0\Postman.exe N/A |
| 0 6752 C+G ...are\Brave-Browser\Application\brave.exe N/A |
+-----------------------------------------------------------------------------+
This has been asked many times on SO, and the traditional answer is that Nvidia can't provide GPU acceleration to a Linux Docker container from a Windows host.
I found similar answers. I have read the question and answers to this question, but those solutions involve an Ubuntu host plus a Docker image with Ubuntu inside.
This link explains how to use nvidia-docker2, but nvidia-docker2 is deprecated according to this answer.
The official nvidia-docker repo has instructions, but only for a Linux host.
But there is also WSL for Docker (Linux backend) installed: can that be used?
Is there still a way to make an Ubuntu container use the Nvidia GPU from a Windows host machine?
It looks like you can now run Docker in Ubuntu with the Windows Subsystem for Linux (WSL 2) and get GPU passthrough.
This link goes through installation, setup and running a TensorFlow Jupyter notebook with GPU support:
https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2
Note - I haven't done this myself yet.
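If you do go this route, a quick way to confirm that the GPU is actually visible from inside the container is a torch check like this (a minimal sketch, assuming PyTorch is installed in the image, as in the question's Flask app):
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))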

Old docker containers are not usable (no GPU) after updating the GPU driver in the host machine

Today, we updated the GPU driver for our host machine, and the new containers that we created are all working fine. However, all of our existing docker containers give the following error when running the nvidia-smi command inside:
Failed to initialize NVML: Driver/library version mismatch
How to rescue these old containers? Our previous GPU driver version in the host machine was 384.125 and it is now 430.64.
Host Configuration
nvidia-smi gives
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64 Driver Version: 430.64 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-DGXS... Off | 00000000:07:00.0 On | 0 |
| N/A 40C P0 39W / 300W | 182MiB / 32505MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-DGXS... Off | 00000000:08:00.0 Off | 0 |
| N/A 40C P0 39W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-DGXS... Off | 00000000:0E:00.0 Off | 0 |
| N/A 39C P0 40W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-DGXS... Off | 00000000:0F:00.0 Off | 0 |
| N/A 40C P0 38W / 300W | 12MiB / 32508MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1583 G /usr/lib/xorg/Xorg 169MiB |
+-----------------------------------------------------------------------------+
nvcc --version gives
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
dpkg -l | grep -i docker gives
ii dgx-docker-cleanup 1.0-1 amd64 DGX Docker cleanup script
rc dgx-docker-options 1.0-7 amd64 DGX docker daemon options
ii dgx-docker-repo 1.0-1 amd64 docker repository configuration file
ii docker-ce 5:18.09.2~3-0~ubuntu-xenial amd64 Docker: the open-source application container engine
ii docker-ce-cli 5:18.09.2~3-0~ubuntu-xenial amd64 Docker CLI: the open-source application container engine
ii nvidia-container-runtime 2.0.0+docker18.09.2-1 amd64 NVIDIA container runtime
ii nvidia-docker 1.0.1-1 amd64 NVIDIA Docker container tools
rc nvidia-docker2 2.0.3+docker18.09.2-1 all nvidia-docker CLI wrapper
docker version gives
Client:
Version: 18.09.2
API version: 1.39
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:50 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 03:42:13 2019
OS/Arch: linux/amd64
Experimental: false
I ran into this issue as well. In my case, I had the line
apt install -y nvidia-cuda-toolkit
in my Dockerfile. Removing this line resolved the issue. In general, I would recommend using an Nvidia-provided container that is compatible with the drivers on your local machine.
