Cannot run Gazebo 9 in Docker with --privileged on Ubuntu 18.04

I have been stuck on this for quite a while now. I have tried searching and experimenting, but I am getting nowhere.
My setup is as follows:
Host
linux Distro: Archlinux
kernel version: 5.14.2
docker version: 20.10.8, build 3967b7d28e
nvidia driver version: 470.63.01-1
nvidia container toolkit version: 1.5.0-2 , cgroups disabled.
amd gpu driver: xf86-video-amdgpu 21.0.0-1
Container
base image: ubuntu:18.04
command line : docker run -it --rm --privileged --gpus all -e DISPLAY=$DISPLAY -e XAUTHORITY=~/.Xauthority --network host --volume /tmp/.X11-unix/:/tmp/.X11-unix --volume $XAUTHORITY:/root/.Xauthority gazebo:libgazebo9-bionic gazebo
Expected results
I expected the Gazebo window to open with hardware acceleration when running with --privileged.
Actual results
On using --privileged:
si_init_perfcounters: max_sh_per_se = 2 not supported (inaccurate performance counters)
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 149 ()
Minor opcode of failed request: 2
Serial number of failed request: 35
Current serial number in output stream: 36
Without --privileged and specifying graphic cards in --device manually:
gazebo window opens up with hardware acceleration and works smoothly as expected.
Detailed description
I was trying to run Gazebo 9 in a custom image that I built with ubuntu:18.04 as the base image. I referred to gazebo:libgazebo9-bionic, nvidia/cuda:11.4.1-cudnn8-devel-ubuntu18.04 and ros:melodic-desktop while writing the Dockerfile. I also tried Gazebo 11 on the same base image and got the same error as above, whereas an otherwise identical setup for ubuntu foxy works smoothly. I really need --privileged because I am going to be working with hardware for a long time. How can this be fixed? Thanks a lot.
P.S. Other GUI applications (rviz, moveit, etc.) run without any issues. I'm getting this issue with Gazebo only.

Ok found the solution!
Gazebo was working on osrf/ros:noetic-desktop-full but not on osrf/ros:melodic-desktop-full.
I got the exact same error:
X Error of failed request: BadAlloc (insufficient resources for operation)
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 149 ()
The solution was to update the Mesa drivers in the ros:melodic image from Mesa 20.0.8 to Mesa 22.0.2:
sudo add-apt-repository ppa:kisak/kisak-mesa -y
sudo apt update
sudo apt upgrade -y
If you want to check your current Mesa version:
sudo apt install mesa-utils
glxinfo | grep Mesa
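The same check can be scripted, for example as a step in an image build. The sketch below is only an illustration: the helper names `parse_mesa_version` and `needs_mesa_upgrade` are hypothetical, the sample line is an assumed example of what glxinfo prints, and the 22.0.2 threshold is simply the version that worked in this answer, not a documented minimum.

```python
import re

# Version that worked in the answer above; an assumption, not a documented minimum.
KNOWN_GOOD = (22, 0, 2)


def parse_mesa_version(glxinfo_output):
    """Return the first 'Mesa X.Y.Z' version found as a tuple of ints, or None."""
    match = re.search(r"Mesa (\d+)\.(\d+)\.(\d+)", glxinfo_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_mesa_upgrade(glxinfo_output, known_good=KNOWN_GOOD):
    """True if a Mesa version is found and it is older than known_good."""
    version = parse_mesa_version(glxinfo_output)
    return version is not None and version < known_good


if __name__ == "__main__":
    # Illustrative input only; in practice pass the text printed by `glxinfo`.
    sample = "OpenGL version string: 4.6 (Compatibility Profile) Mesa 20.0.8"
    print(parse_mesa_version(sample), needs_mesa_upgrade(sample))
```

Tuples compare element by element, so (20, 0, 8) sorts before (22, 0, 2) without any string handling.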

Related

Running container fails with failed to add the host cannot allocate memory

OS: Red Hat Enterprise Linux release 8.7 (Ootpa)
Version:
$ sudo yum list installed | grep docker
containerd.io.x86_64 1.6.9-3.1.el8 #docker-ce-stable
docker-ce.x86_64 3:20.10.21-3.el8 #docker-ce-stable
docker-ce-cli.x86_64 1:20.10.21-3.el8 #docker-ce-stable
docker-ce-rootless-extras.x86_64 20.10.21-3.el8 #docker-ce-stable
docker-scan-plugin.x86_64 0.21.0-3.el8 #docker-ce-stable
Out of hundreds of docker calls made over days, a few fail. This is the general form of the command line:
/usr/bin/docker run -u 1771:1771 -a stdout -a stderr -v /my_path:/data --rm my_image:latest my_entry --my_args
The failure:
docker: Error response from daemon: failed to create endpoint recursing_aryabhata on network bridge: failed to add the host (veth6ad97f8) <=> sandbox (veth23b66ce) pair interfaces: cannot allocate memory.
It is not reproducible. The failure rate is less than one percent. When the error happens, the system has plenty of free memory. Around that time, the application is making about 5 docker calls per second, and each call takes about 5 to 10 seconds to complete.
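Since the failure is rare and not reproducible, one possible stopgap (not a root-cause fix) is to retry the call when this specific message appears. The sketch below is a hedged illustration: `run_with_retry` and its parameters are hypothetical names, and the only thing taken from the question is the "cannot allocate memory" text in the error.

```python
import subprocess
import time

# Substring taken from the error message quoted in the question.
TRANSIENT_MARKER = "cannot allocate memory"


def run_with_retry(cmd, attempts=3, delay=1.0, runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only when it fails and stderr contains TRANSIENT_MARKER.

    Returns the result of the last attempt (a CompletedProcess when
    runner is subprocess.run).
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        transient = result.returncode != 0 and TRANSIENT_MARKER in (result.stderr or "")
        if not transient or attempt == attempts:
            break
        sleep(delay * attempt)  # linear backoff between attempts
    return result
```

A call would look like `run_with_retry(["/usr/bin/docker", "run", "--rm", "my_image:latest", "my_entry"])`. Note that a container which itself exits non-zero and prints the same text on stderr would also be retried, so this is only reasonable when rerunning the job is safe.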

Turn on SCTP support on Ubuntu 22.04

I am building an SCTP-enabled application with Erlang and I stumbled upon some problems that are likely related to my machine (the same code works just fine on another machine). I am using Ubuntu 22.04. When I call gen_sctp:open(...) it returns "{error,eprotonosupport}", which after some research turns out to mean "The protocol type or the specified protocol is not supported within this domain.".
I tried:
sudo apt-get install libsctp-dev lksctp-tools
sctp_darn -H 0 -P 2500 -l
sctp_darn -H 0 -P 2600 -h 127.0.0.1 -p 2500 -s
And it seems to work just fine.
After:
lynis audit system | grep sctp
It returns:
* Determine if protocol 'sctp' is really needed on this system [NETW-3200]
So it seems to be enabled. What am I missing? (port is 3868)
Edit:
The port is open. I tried with ufw and iptables, for all protocols and for sctp only. It didn't work.
Edit 2:
So after setting up two VMs (Ubuntu 20.04 and Ubuntu 22.04), everything seems to work as expected. I guess I have messed something up on my system.
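One way to narrow this kind of problem down is to check, outside Erlang, whether the kernel accepts an SCTP socket at all, since eprotonosupport presumably mirrors the POSIX EPROTONOSUPPORT error. The sketch below is a hedged illustration: `sctp_status` is a hypothetical helper, and it opens a one-to-one (SOCK_STREAM) SCTP socket, which is an assumption and may differ from the socket type gen_sctp opens.

```python
import errno
import socket

# 132 is the IANA protocol number for SCTP; used if the constant is missing.
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)


def sctp_status(make_socket=socket.socket):
    """Try to create an SCTP socket; return 'ok', 'unsupported', or 'error: ...'."""
    try:
        sock = make_socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_SCTP)
    except OSError as exc:
        if exc.errno == errno.EPROTONOSUPPORT:
            return "unsupported"
        return "error: " + str(exc)
    sock.close()
    return "ok"


if __name__ == "__main__":
    print(sctp_status())
```

If this reports "unsupported" on the affected machine but "ok" in the VMs, the difference is in the host's kernel or module configuration rather than in the Erlang code.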

Docker Container nvidia/k8s-device-plugin:1.9 Keeps Reporting Error

I am trying to set up a small Kubernetes cluster on my Ubuntu 18.04 LTS server. Every step is done, but checking the GPU status fails. The container keeps reporting errors:
1. Issue Description
I followed the Quick Start steps, but when I run the test case, it reports an error.
2. Steps to reproduce the issue
Run the shell command:
docker run --security-opt=no-new-privileges --cap-drop=ALL \
--network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
nvidia/k8s-device-plugin:1.9
Check the errors:
2020/02/09 00:20:15 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2020/02/09 00:20:15 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2020/02/09 00:20:15 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2020/02/09 00:20:15 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2020/02/09 00:20:15 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
3. Environment Information
- outputs of nvidia-docker run --rm dlws/cuda nvidia-smi
NVIDIA-SMI 440.48.02 Driver Version: 440.48.02 CUDA Version: 10.2
contents of /etc/docker/daemon.json:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
docker version: 19.03.2
kubernetes version: 1.15.2
I finally found the answer. I hope this post helps others who encounter the same issue:
For Kubernetes 1.15, use k8s-device-plugin:1.11 instead. Version 1.9 is not able to communicate with the kubelet.

GPU becomes unavailable when computer goes to sleep

I am using the Docker installation of TensorFlow.
I initiate the container using
nvidia-docker run -it -p 8888:8888 -v /*/Data/docker:/docker --name TensorFlow gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash
This allows me to link a folder named "docker" on my secondary local drive with a folder inside the Docker container.
The issue is that whenever my computer (Ubuntu, GTX 1070, Intel 6700K CPU) goes to sleep, the GPU becomes unavailable and code runs only on the CPU. When I run the code in an IPython notebook session inside Docker I get:
failed call to cuInit: CUDA_ERROR_UNKNOWN.
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: 123456c234ds
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: 123456c234ds
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 4.9.3 (Ubuntu 4.9.3-13ubuntu2)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 367.57.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:293] kernel version seems to match DSO: 367.57.0
When I restart the computer, the GPU becomes available again without the UNKNOWN message.
I have searched the Internet, and solutions such as sudo apt-get install nvidia-modprobe do not solve the issue.

Cannot install docker on OS X Version 10.9.5

I first tried installing VirtualBox by downloading "VirtualBox 5.0 for OS X hosts (amd64)" from the VirtualBox download page, and then installed boot2docker and docker via brew.
The first apparent issue appeared when creating the boot2docker-vm image:
$ boot2docker init
2015/07/27 21:38:13 Creating VM boot2docker-vm...
2015/07/27 21:38:13 Apply interim patch to VM boot2docker-vm (https://www.virtualbox.org/ticket/12748)
2015/07/27 21:38:13 Failed to modify VM "boot2docker-vm": exit status 1
Launching the VirtualBox manager application I can see the boot2docker-vm machine "Running", but looking at the log I see something like this which appears to indicate that the boot2docker-vm "machine" failed to boot:
00:00:04.169546 Guest Log: BIOS: Boot : bseqnr=1, bootseq=4231
00:00:04.169711 Guest Log: BIOS: Boot from Floppy 0 failed
00:00:04.170101 Guest Log: BIOS: Boot : bseqnr=2, bootseq=0423
00:00:04.170490 Guest Log: BIOS: CDROM boot failure code : 0002
00:00:04.170800 Guest Log: BIOS: Boot from CD-ROM failed
00:00:04.171190 Guest Log: BIOS: Boot : bseqnr=3, bootseq=0042
00:00:04.171795 Guest Log: int13_harddisk: function 02, unmapped device for ELDL=80
00:00:04.172304 Guest Log: BIOS: Boot from Hard Disk 0 failed
00:00:04.172706 Guest Log: BIOS: Boot : bseqnr=4, bootseq=0004
00:00:04.172991 Guest Log: BIOS: Booting from LAN...
00:00:04.191271 Display::handleDisplayResize(): uScreenId = 0, pvVRAM=0000000000000000 w=720 h=400 bpp=0 cbLine=0x0, flags=0x1
00:00:06.446949 Guest Log: BIOS: Boot from LAN failed
00:00:06.448852 Guest Log: Could not read from the boot medium! System halted.
I uninstalled everything and then tried downloading and installing from boot2docker download page, which installs VirtualBox, boot2docker, and docker all in one go. But I still see the same problem indicated above (the boot2docker-vm machine fails to boot).
I'm reluctant to make big changes to the OS X version on my laptop, since my hardware is old. But I'll try the installation sequence on a more modern machine and see if it works there.
Has anyone managed to make docker work on OS X Version 10.9.5?
EDIT (adding additional information which comments suggest might be relevant):
My machine has:
2.26GHz Intel Core 2 Duo
4 GB of RAM (1067 MHz DDR3)
NVIDIA GeForce 9400M 256 MB
OS X 10.9.5
I installed everything as the primary User (not root) on my system.
And the versions of everything which I installed are:
VirtualBox 4.3.30 r101610
boot2docker version 1.7.1
docker version 1.7.1
This issue on GitHub might help (in the issue described, the latest VirtualBox 4.3.x works fine). However, I would suggest using docker-machine. The installation steps are:
$ docker-machine create --driver virtualbox dev
$ eval "$(docker-machine env dev)"
And then you can use docker commands as usual.
Some of the comments in the github issue suggested by nash_ag and this stackoverflow question pointed me in the right direction.
This is the sequence of steps I used to get VirtualBox, boot2docker, docker, and docker-machine working in my environment (where $USERNAME is my primary OS X User, who installed VirtualBox), with several wrong turns elided, and most output omitted:
$ rm -rf /Users/$USERNAME/VirtualBox\ VMs/
$ boot2docker delete
(ran VirtualBox Uninstall script from my desktop)
...
$ brew tap caskroom/cask
...
$ brew update
...
$ brew install brew-cask
...
$ brew cask install virtualbox
...
$ VBoxManage -v
5.0.0r101573
$ boot2docker -v
Boot2Docker-cli version: v1.7.1
Git commit: 8fdc6f5
$ VBoxManage list vms
(nothing)
$ boot2docker init -v
...
$ boot2docker up
...
$ eval "$(boot2docker shellinit)"
(writes .pem files)
$ brew install docker-machine
...
$ docker-machine -v
docker-machine version 0.3.1 (HEAD)
$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM
$ docker-machine create --driver virtualbox dev
...
$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM
dev virtualbox Running tcp://192.168.99.100:2376
$ VBoxManage list vms
"boot2docker-vm" {99d5c5c1-e7cc-49bf-93c7-b0cbf626d62c}
"dev" {341fd11e-86f9-46ca-89e6-39ee78458a4b}
$ eval "$(docker-machine env dev)"
$ docker run -d -p 8000:80 nginx
...
$ curl $(docker-machine ip dev):8000
<!DOCTYPE html>
...
At this point things appear to be working well enough for me to use the "standard" docs/instructions for docker and docker-machine, so my original problem is solved.
