Python script that prints all devices on bear-metal ubuntu18 (The host) doesn't see those devices in Docker container
Host: Driver Version: 470.141.03 CUDA Version: 11.4
import pyrender
from pyrender.platforms import egl
from OpenGL.EGL import *
devices = egl.query_devices()
print(devices)
The script works (prints all 8 GPUs) when runs on bare metal Ubuntu18 (amd64) but fails on Docker.
I tried image built from
FROM nvidia/cudagl:11.4.0-devel-ubuntu18.04
and from
FROM tensorflow/tensorflow:2.7.0-gpu
Also, tried NVIDIA_DRIVER_CAPABILITIES=all and NVIDIA_VISIBLE_DEVICES=all
Related
I am stuck on this for quite a while now i have tried searching and trying stuff but i am getting nowhere.
My setup is as follows:
Host
linux Distro: Archlinux
kernel version: 5.14.2
docker version: 20.10.8, build 3967b7d28e
nvidia driver version: 470.63.01-1
nvidia container toolkit version: 1.5.0-2 , cgroups disabled.
amd gpu driver: xf86-video-amdgpu 21.0.0-1
Container
base image: ubuntu:18.04
command line : docker run -it --rm --privileged --gpus all -e DISPLAY=$DISPLAY -e XAUTHORITY=~/.Xauthority --network host --volume /tmp/.X11-unix/:/tmp/.X11-unix --volume $XAUTHORITY:/root/.Xauthority gazebo:libgazebo9-bionic gazebo
Expected results
expected gazebo window to open with hardware acceleration, using privileged access.
Actual results
On using --privileged:
si_init_perfcounters: max_sh_per_se = 2 not supported (inaccurate performance counters)
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 149 ()
Minor opcode of failed request: 2
Serial number of failed request: 35
Current serial number in output stream: 36
Without --privileged and specifying graphic cards in --device manually:
gazebo window opens up with hardware acceleration and works smoothly as expected.
Detailed description
I was actually trying to run gazebo version 9 in a custom image which i had created using ubuntu:18.04 as base image. i referred to gazebo:libgazebo9-bionic,nvidia/cuda:11.4.1-cudnn8-devel-ubuntu18.04 and ros:melodic-desktop while writing the dockerfile. i even tried the same thing for gazebo 11 on the same base image and got the same issue as above. Whereas the exactly similar setup for ubuntu foxy works smoothly. i really need to use privileged because i am going to be working on hardware for a lot of time. please help me on how should this be fixed. thanks alot
P.S. Other GUI applications (rviz,moveit,etc) are running without any issues. Im getting this issue with gazebo only.
Ok found the solution!
Gazebo was working on osrf/ros:noetic-desktop-full but not on osrf/ros:melodic-desktop-full.
I got the exact same error:
X Error of failed request: BadAlloc (insufficient resources for operation)
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 149 ()
The solution was to update the MESA drivers on the ros:melodic image from version Mesa 20.0.8 to Mesa 22.0.2.
sudo add-apt-repository ppa:kisak/kisak-mesa -y
sudo apt update
sudo apt upgrade -y
If you want to check your current Mesa version:
sudo apt install mesa-utils
glxinfo | grep Mesa
I've noticed that when Nmap is dockerized it is yielding incorrect OS results. I've tried various pre-built docker images as well as one I created myself and they all show the same results.
Here are a few of the pre-built images I've tried:
https://hub.docker.com/r/instrumentisto/nmap
https://hub.docker.com/r/uzyexe/nmap/
I've run the same Nmap command with these images and using my locally installed Nmap version and here are the results (all images are using Nmap 7.80):
$ nmap -sV -O 192.168.1.1
------(locally installed nmap result - correct):
OS CPE: cpe:/o:linux:linux_kernel:2.6
OS details: Linux 2.6.8 - 2.6.30
Network Distance: 1 hop
Service Info: OS: Linux; Device: broadband router; CPE: cpe:/o:linux:linux_kernel
------(all docker image nmap results - incorrect):
OS CPE: cpe:/h:hp:jetdirect_170x cpe:/h:hp:inkjet_3000
Aggressive OS guesses: HP 170X print server or Inkjet 3000 printer (85%), HP LaserJet 4000 printer (85%), HP LaserJet 4250 printer (85%)
No exact OS matches for host (test conditions non-ideal).
Service Info: OS: Linux; Device: broadband router; CPE: cpe:/o:linux:linux_kernel
What's interesting to me is that the Service Info is actually correct across the scans, but nothing else is.
I'm trying to figure out of there is a setting/flag that I'm missing when executing the docker command. Here's what I've tried:
Setting the docker network to host (no change in result)
Setting the docker network to bridge (no change in result)
Not setting any network setting (no change in result)
I really need to get Nmap working in a docker container because it's integrated into a rails web app that I'm building utilizing the ruby-nmap gem.
Thanks!
I am trying to setup one small kubenertes cluster on my ubuntu 18.04 LTS server. Now every step is done, but checking the GPU status fails. The container keeps reporting errors:
1. Issue Description
I have done steps by Quick-Start, but when I run the test case, it reports error.
2. Steps to reproduce the issue
exec shell cmd
docker run --security-opt=no-new-privileges --cap-drop=ALL
--network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins
nvidia/k8s-device-plugin:1.9
check the erros
2020/02/09 00:20:15 Starting to serve on
/var/lib/kubelet/device-plugins/nvidia.sock
2020/02/09 00:20:15 Could not register device plugin: rpc error: code = Unimplemented desc =
unknown service deviceplugin.Registration
2020/02/09 00:20:15 Could
not contact Kubelet, retrying. Did you enable the device plugin
feature gate?
2020/02/09 00:20:15 You can check the prerequisites at:
https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2020/02/09
00:20:15 You can learn how to set the runtime at:
https://github.com/NVIDIA/k8s-device-plugin#quick-start
3. Environment Information
- outputs of nvidia-docker run --rm dlws/cuda nvidia-smi
NVIDIA-SMI 440.48.02 Driver Version: 440.48.02 CUDA Version: 10.2
outputs of nvidia-docker run --rm dlws/cuda nvidia-smi
NVIDIA-SMI 440.48.02 Driver Version: 440.48.02 CUDA Version: 10.2
contents of /etc/docker/daemon.json
contents:
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
docker version: 19.03.2
kubernetes version: 1.15.2
Finally I found the answer, hope this post would be helpful for others who encounter the same issue:
For kubernetes 1.15, use k8s-device-plugin:1.11 instead. The version 1.9 is not able to communicate with kubelet.
I am using docker installation of TensorFlow .
I initiate the container using
nvidia-docker run -it -p 8888:8888 -v /*/Data/docker:/docker --name TensorFlow gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash
This allows me to link a folder names "docker" in my secondary local drive with a folder inside docker container.
The issue is that whenever my computer (Ubuntu - GTX 1070 - 6700k Intel CPU) goes to sleep, the GPU becomes unavailable and code runs only on CPU. When I run the code in ipython notebook session inside docker I get:
failed call to cuInit: CUDA_ERROR_UNKNOWN.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: 123456c234ds
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: 123456c234ds
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.9.3 (Ubuntu 4.9.3-13ubuntu2)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 367.57.0
When i restart the computer, the GPU becomes available without the UNKNOWN message.
I have searched the Internet and the solutions such as sudo apt-get install nvidia-modprobe does not solve the issue.
I'm trying to connect to my docker instance the devices I have connected to my laptop.
Concretely I have 4 devices (two iphones, two android) and I would like to be able to start 4 docker instances and connect each device to one instance.
What I expected to do is as simply as in ubuntu
docker run --privileged -v /dev/bus/usb:/dev/bus/usb -d -P my-android:0.0.1
But my host OS is a Mac OS X, also the instances I'm creating, because I need access to the instruments tool.
but so far I read that under mac os x, devices are connected directly though usb not being mounted.
this is what I got when I search for the iphone device:
iPhone USB:
Type: Ethernet
BSD Device Name: en6
IPv4:
Configuration Method: DHCP
IPv6:
Configuration Method: Automatic
Proxies:
Exceptions List: *.local, 169.254/16
FTP Passive Mode: Yes
Do you know how can I connect the devices to the docker instances?
Thanks!!!!
I got this working with the docker-machine on virtualbox with the VirtualBox Extension Pack installed (provides support for USB 2.0 and USB 3.0 devices).
have the mobile phone connected to the host system.
$ ioreg -p IOUSB | grep SAMSUNG
+-o SAMSUNG_Android#14100000 <class AppleUSBDevice, id 0x100000c66, registered, matched, active, busy 0 (13 ms), retain 34>
create the a docker machine with the virtualbox driver (I've named it
base)
docker-machine create --driver virtualbox base
stop the machine to enable the USB Controller on the VM
docker-machine stop base
docker-machine start base
- activate the base VM as docker host
eval $(docker-machine env base)
- start ubuntu container with the usb devices mounted
docker run -it --rm -v /dev/usb/bus:/dev/bus/usb ubuntu /bin/bash
- install the usbutils just to demo with lsusb that the android device is connected
root#ce1e4be0bb73:/# apt-get update && apt-get install -y usbutils
1st run of lsusb (did not show the device)
root#ce1e4be0bb73:/# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
to show the device I had to unplug and plug again my phone, 2nd run of lsusb
root#ce1e4be0bb73:/# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 04e8:6860 Samsung Electronics Co., Ltd Galaxy (MTP)
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub