I'm currently working on running DL algorithms inside Docker containers and I've been successful. However, I can only get it running by passing the --net=host flag to the docker run command, which makes the container use the host computer's network interface. If I don't pass that flag, it throws the following error:
No EGL Display
nvbufsurftransform: Could not get EGL display connection
No protocol specified
nvbuf_utils: Could not get EGL display connection
When I do
echo $DISPLAY
it outputs :0, which is correct.
But I don't understand what GStreamer, X11 or EGL has to do with the full network feature. Is there any explanation for this, or any workaround other than the --net=host flag? Because of this I can't map different ports for various containers.
I have also created a topic on this on the NVIDIA DevTalk Forum, but it is still a dark spot for me. I wasn't satisfied with the answers I got.
That said, it is OK to use the --net=host flag to solve this problem anyway.
Quick heads up: GStreamer does not work over X11 forwarding natively; you are better off using a VNC solution, or having access to the physical machine.
Troubleshooting
Is GStreamer installed? apt install -y gstreamer1.0-plugins-base
What does xrandr return?
What does xauth list return?
What does gst-launch-1.0 nvarguscamerasrc ! nvoverlaysink return?
For example:
On my setup, because I do not use a Dockerfile, I copy the cookie from xauth list and then paste it inside the Docker container:
xauth add user/unix:11 MIT-MAGIC-COOKIE-1 cccccccccccccccccccccccccc
After this I can test the display with xterm &.
Besides, once this is done, xrandr gives me an output.
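The cookie dance above can be sketched end to end; the image name and cookie value below are placeholders, and this assumes the host X server is reachable from the container (e.g. via a bind mount of the X socket):

```shell
# On the host: list the cookie for the current display
xauth list "$DISPLAY"
# e.g.  myhost/unix:0  MIT-MAGIC-COOKIE-1  cccccccccccccccccccccccccc

# Start the container with DISPLAY passed through and the X socket mounted
# ("my-deepstream-image" is a placeholder image name)
docker run -it --rm \
  -e DISPLAY="$DISPLAY" \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  my-deepstream-image bash

# Inside the container: add the cookie copied above, then test the display
xauth add myhost/unix:0 MIT-MAGIC-COOKIE-1 cccccccccccccccccccccccccc
xterm &
xrandr
```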
Getting more verbose
Also, I connect to the Docker container over an SSH connection with verbose output (to the host or to the guest, it doesn't matter): ssh -X -v user@192.168.123.123
This way the EGL error is wrapped in debug details.
stream stuff
This is related to DeepStream and GStreamer customizations from NVIDIA.
Some NVIDIA threads point out that EGL needs a "sink" but no X11 display.
If there is some server running on the host at a designated port, running Docker with --net=host will allow a client inside the container to connect to it.
According to the doc, there are some servers used by the GPU.
Doc
$DISPLAY
According to NVIDIA threads, unset DISPLAY provides better results.
On my setup, without a display, the EGL error is gone, but then the stream cannot be seen.
Related
I have a docker container that uses a GStreamer plugin to capture the input of a camera. It runs fine with a Basler camera, but now I need to use an IDS uEye camera. To be able to use this camera I need to have the ueyeusbdrc service running. The IDS documentation says that to start it I can run sudo systemctl start ueyeusbdrc or sudo /etc/init.d/ueyeusbdrc start. The problem is that when the docker container runs, that service is not running and I get a Failed to initialize camera error, which is the same error I get if I run gst-launch-1.0 -v idsueyesrc ! videoconvert ! autovideosink while the ueyeusbdrc service is not running outside the container on my PC. So this tells me that the issue is that the ueyeusbdrc service is not running inside the container.
How can I run the ueyeusbdrc inside the docker container? I tried to run /etc/init.d/ueyeusbdrc start in the .sh script that launches the application (which is called using ENTRYPOINT ["<.sh file>"] in the Dockerfile), but it fails. Also if I try to use sudo, it tells me that the command doesn't exist. If I run systemctl it also tells me the command doesn't exist. BTW, I am running the docker with privileged: true (at least that's what is set in the docker-compose.yml file).
I am using Ubuntu 18.04.
Update:
I mapped /run/ueyed and /var/run/ueyed to the container and that changed the error from Failed to initialize camera to Failed to initialize video capture. It may be that I can run the daemon in the host and there is a way to hook it to the container. Any suggestions on how to do that?
Finally got this working. I had to add a few options to the docker command (in my case to the docker-compose yml file). I based my solution on the settings found here: https://github.com/chalmers-revere/opendlv-device-camera-ueye
Adding these arguments to the docker command solved the issue: --ipc=host --pid=host -v /var/run:/var/run. With these options there is no need to run the service inside the container.
The other key part is to install the IDS software inside the docker container. This can be easily done by downloading, extracting and running the installer (the git repo mentioned above has an outdated version, but the most recent version can be found in the IDS web page).
Also, make sure the system service for the IDS uEye camera is running in the host (sudo systemctl start ueyeusbdrc).
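Since the question uses docker-compose, the flags above translate to something like the following compose file; the service and image names are placeholders, and this is just a sketch of the relevant settings:

```yaml
version: "3"
services:
  camera-app:
    image: my-ueye-app      # placeholder image name
    privileged: true
    ipc: host
    pid: host
    volumes:
      - /var/run:/var/run
```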
I'm trying to create a Windows Docker container with access to GPUs. To start, I just wanted to check whether I can access the GPU from inside a Docker container at all.
Dockerfile
FROM mcr.microsoft.com/windows:1903
CMD [ "ping", "-t", "localhost" ]
Build and run
docker build -t debug_image .
docker run -d --gpus all --mount src="C:\Program Files\NVIDIA Corporation\NVSMI",target="C:\Program Files\NVIDIA Corporation\NVSMI",type=bind debug_image
docker exec -it CONTAINER_ID powershell
Problem and question
Now that I'm inside, I try to execute the shared NVIDIA SMI executable. However, I get an error and it is not able to run. The obvious question is why, given that the host can run it.
PS C:\Program Files\NVIDIA Corporation\NVSMI> .\nvidia-smi.exe
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA
driver. Make sure that the latest NVIDIA driver is installed and
running. This can also be happening if non-NVIDIA GPU is running as
primary display, and NVIDIA GPU is in WDDM mode.
About the NVIDIA driver: AFAIK it should not be a problem, since it works on the host, where the NVIDIA driver is installed.
My host has 2 NVIDIA GPUs, and it has no "primary" display, as it's a server with no screen connected. AFAIK its CPU doesn't have an integrated GPU, so I would assume one of the connected NVIDIA GPUs is the primary display (if such a thing exists when no display is connected to the server). Also, I think one of them must be, because one renders the screen when I connect through TeamViewer if needed, and dxdiag reports one of them as Display 1.
About WDDM mode, I've found ways to change it, but didn't find ways to check the current mode.
So basically the question, is why is it not working? Any insight or help in the previous points would be helpful.
Update.
About:
1) I've updated my drivers from 431 to 441, latest version available for GTX 1080 Ti, and the error message remains the same.
2-3) I've confirmed that GTX cards (except some Titan models) cannot run in TCC mode. Therefore they're running in WDDM mode.
I have been trying to connect Spyder to a docker container running on a remote server and failing time and again. In short, I want Spyder on my local machine to talk to a Jupyter kernel running inside a docker container on the remote server.
Currently I am launching the docker container on the remote machine through ssh with
docker run --runtime=nvidia -it --rm --shm-size=2g -v /home/timo/storage:/storage -v /etc/passwd:/etc/passwd -v /etc/group:/etc/group --ulimit memlock=-1 -p 8888:8888 --ipc=host ufoym/deepo:all-jupyter
so I am forwarding on port 8888. Then inside the docker container I am running
jupyter notebook --no-browser --ip=0.0.0.0 --port=8888 --allow-root --notebook-dir='/storage'
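If the remote machine's port 8888 is not directly reachable from the local machine, an SSH tunnel is one way to bridge it; the username and hostname below are placeholders:

```shell
# Forward local port 8888 to port 8888 on the remote host,
# where docker has published the notebook port (-p 8888:8888).
# Run this on the local machine and leave it open.
ssh -N -L 8888:localhost:8888 timo@remote-server
```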
OK, now for the Spyder part - As per the instructions here, I go to ~/.local/share/jupyter/runtime, where I find the following files:
kernel-ada17ae4-e8c3-4e17-9f8f-1c029c56b4f0.json nbserver-11-open.html nbserver-21-open.html notebook_cookie_secret
kernel-e81bc397-05b5-4710-89b6-2aa2adab5f9c.json nbserver-11.json nbserver-21.json
Not knowing which one to take, I copy them all to my local machine.
I now go to Consoles -> Connect to an Existing Kernel, which gives me the "Connect to an Existing Kernel" window. I fill it out using my actual remote IP address, choosing the first of the JSON files for the Connection info field. I hit enter and Spyder goes dark and crashes.
This happens regardless of which connection info file I choose. So, my questions are:
1: Am I doing all of this correctly? I have found lots of instructions for how to connect to remote servers, but not so far for specifically connecting to a jupyter notebook on a docker on a remote server.
2: If yes, then what else can I do to troubleshoot the issues I am encountering?
I should also note that I have no problems connecting to the Jupyter Notebook through the browser on my local machine. It's just that I would prefer to be working with Spyder as my IDE.
Many thanks in advance!
This isn't a solution so much as a work around, but sshfs might be of help
Use sshfs to mount the remote machine's home directory on a local directory, then your local copy of Spyder can edit the file as if it were a local file.
sshfs remotehost.com:/home/user/ ./remote-host/
It typically takes about half a second to upload the changes to an AWS host when I hit save in Spyder, which is an acceptable delay for me. When it's time to run the code, ssh into the remote machine and run the code from an IPython shell. It's not elegant, but it does work.
I'm not expecting this to be the best answer, but maybe you can use it as a stopgap solution.
I have the same problem as you. I got it working, though maybe a bit clumsily, as I am totally new to docker. Here are my steps, with notes on where we differ; hope this helps:
Launch the docker container on the remote machine:
docker run --gpus all --rm -ti --net=host -v /my_storage/data:/home/data -v /my_storage/JSON:/root/.local/share/jupyter/runtime repo/tensorflow:20.03-tf2-py3
I use a second volume mount in order to get the kernel.json file to my local computer. I couldn't manage to access it directly from the docker container via ssh, as it is in the /root/ folder in the container, with root-only access. If you know how to read it from there directly, I'll be happy to learn. My workaround is:
On the remote machine, create a JSON/ directory and map it to the output of "jupyter --runtime-dir" in the container. Once the kernel is created, access the kernel-xxx.json file through this volume mount, copy it to the local machine and chmod it.
Launch ipython kernel in container:
ipython kernel
You are launching a jupyter notebook; I suspect this is the reason for your problem. I am not sure whether Spyder works with notebooks, but it does work with IPython kernels. It probably works even better with spyder-kernels.
Copy the kernel.json file from /remote_machine/JSON to the local machine, and chmod it so it can be read.
Launch Spyder, and use the local kernel.json and your ssh settings. This part is the same as yours.
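The copy-and-chmod steps above can be sketched as follows; the username, hostname and paths are placeholders matching the volume mount used earlier:

```shell
# On the local machine: fetch the connection file that appeared in the
# mapped JSON/ directory on the remote host, then make it readable.
scp timo@remote-server:/my_storage/JSON/kernel-*.json .
chmod 644 kernel-*.json
# Then in Spyder: Consoles -> Connect to an existing kernel,
# point "Connection info" at this file and fill in the ssh settings.
```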
Not enough reputation to add a comment, but to chime in on @asim's solution: I was able to get my locally installed Spyder to connect to a kernel running in a container on a remote machine. There is a bit of manual work, but I am okay with this since I can get much more done with Spyder than with other IDEs.
docker run --rm -it --net=host -v /project_directory_remote_machine:/container_project_directory image_id bash
from the container:
python -m spyder_kernels.console --matplotlib='inline' --ip=127.0.0.1 -f=/container_project_directory/connection_file.json
From the remote machine, chmod connection_file.json so it can be read, then open it and copy/paste its content into a file on the local machine :) Use the JSON file to connect to the remote kernel, following the steps in the sources below.
https://medium.com/@halmubarak/connecting-spyder-ide-to-a-remote-ipython-kernel-25a322f2b2be
https://mazzine.medium.com/how-to-connect-your-spyder-ide-to-an-external-ipython-kernel-with-ssh-putty-tunnel-e1c679e44154
I am using Wireshark 2.4.6 Portable (downloaded from their site) and I am trying to configure remote capture.
I am not clear on what I should use in the remote capture command line.
There is a help for this but it refers to the CLI option
https://www.wireshark.org/docs/man-pages/sshdump.html
On the above page they say that using the sshdump CLI is the equivalent of this Unix pipeline:
$ ssh remoteuser@remotehost -p 22222 'tcpdump -U -i IFACE -w -' > FILE &
$ wireshark FILE
You just have to configure the SSH settings in that window to get Wireshark to log in and run tcpdump.
You can leave the capture command empty and it will capture on eth0. You'd only want to change it if you have specific requirements (such as needing to capture on a different interface).
You might want to set the capture filter to not ((host x.x.x.x) and port 22) (replacing x.x.x.x with your own ip address) so the screen doesn't get flooded with its own SSH traffic.
The following works as a remote capture command:
/usr/bin/dumpcap -i eth0 -q -f 'not port 22' -w -
Replace eth0 with the interface to capture traffic on and not port 22 with the remote capture filter remembering not to capture your own ssh traffic.
This assumes you have configured dumpcap on the remote host to run without requiring sudo.
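On Debian/Ubuntu, allowing a non-root user to run dumpcap is typically done through the wireshark group; a sketch (you will need to log out and back in for the group change to take effect):

```shell
# Enable non-root packet capture via the dumpcap binary
sudo dpkg-reconfigure wireshark-common    # answer "Yes" at the prompt
sudo usermod -aG wireshark "$USER"

# Verify the file capabilities on the binary; expect
# cap_net_admin,cap_net_raw to be listed
getcap "$(command -v dumpcap)"
```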
The other capture fields appear to be ignored when a remote capture command is specified.
Tested with Ubuntu 20.04 (on both ends) with wireshark 3.2.3-1.
The default remote capture command appears to be tcpdump.
I have not found any documentation which explains how the GUI dialog options for remote ssh capture are translated to the remote host command.
Pertaining to sshdump, if you're having trouble finding the command via the commandline, note that it is not in the system path by default on all platforms.
For GNU/Linux (for example in my case, Ubuntu 20.04, wireshark v3.2.3) it was under /usr/lib/x86_64-linux-gnu/wireshark/extcap/.
If this is not the case on your system, it may be handy to ensure that mlocate is installed (sudo apt install mlocate), then use locate sshdump to find its path (you may find some other interesting tools in the same location - use the man pages or --help to learn more).
As an evolution of "can you run GUI apps in a docker container", is it possible to run GUI applications via Docker without other tools like VNC or X11/XQuartz?
In VirtualBox, you could pass the --type gui to launch a headed VM, and this doesn't require installing any additional software. Is anything like that possible via Dockerfile or CLI arguments?
Docker doesn't provide a virtual video device and a place to render that video content in a window like a VM does.
It might be possible to run a container with --privileged and write to the Docker host's video devices. That would probably require a second video card that's not in use. The software that Docker runs in the container would also need to support that video device and be able to write directly to it or to a framebuffer. This limits what could run in the container to something like an X server or Wayland that draws a display to a device.
You could try the following, which worked in my case.
Check the local machine's display and its authentication:
[root@localhost ~]# echo $DISPLAY
[root@localhost ~]# xauth list $DISPLAY
localhost:15 MIT-MAGIC-COOKIE-1 cc2764a7313f243a95c22fe21f67d7b1
Copy the above authentication, join your existing container, and add the display authentication:
[root@apollo-server ~]# docker exec -it -e DISPLAY=$DISPLAY 3a19ab367e79 bash
root@3a19ab367e79:/# xauth add 192.168.10.10:15.0 MIT-MAGIC-COOKIE-1 cc2764a7313f243a95c22fe21f67d7b1
root@3a19ab367e79:/# firefox