I'm wanting to install the minimal cuda runtime files into alpine linux and create a much smaller docker base with cuda than that provided by nvidia themselves. The nvidia official ones are enormous as usual.
How do I obtain these runtime files without pulling the entire cuda 8 toolkit during docker build?
I can't speak as to what other files might be needed. However, Nvidia drivers are compiled with glibc, and alpine uses musl to maintain its small footprint. You would likely need the nvidia driver's source code so you could recompile it with musl, or an alpine baseimage that implements glibc such as this one. I haven't tried using this yet, but I was able to sucessfully compile libcudacore with musl and gcc/make on an alpine 3.8 container. I have not yet been able to compile the entire Nvidia/Cuda toolkit yet. I will attempt to test this more when I have more time.
The reality is that Nvidia/CUDA is not supported in any way with Alpine Linux Musl or its libc port, and you will end up with a flaky image nevertheless even if you succeed with your alchemist venture.
Nvidia drivers and CUDA Toolkits are incredibly complex systems that honestly I can't see the point to compile it yourself for an unsupported system library or an unsupported port for libc, with all the unexpected to happen even in the case it compiles. Use Debian's slim images or Ubuntu minimal and install official supported files manually, as this is the smallest you can go. Or even better use the "huge" Nvidia DockerHub images (ubuntu LTS based).
Anyway, beyond this question, the Nvidia DockerHub ones are the best way to go, they are supported by the creators of CUDA Toolkit itself and they are no brainers. If you want to be picky go to their Gitlab's repository for dockers, you can build up Debian/Ubuntu by hand pretty easily and quick.
Yes they Nvidia DockerHub images are 1-2 gig's large, but normally you only have to download them once, as you use the image as a base, if you add your code to it only those layers of your code which are normally small to dozens of Mbi are to be recurrently pulled/pushed, not the entire image, so honestly I can't see a reason why people is so much concerned about image sizes, small is better no doubt but up to a point, spending your valuable time in your actual needs is far better.
somebody's solution for alpine-cuda:
https://arto.s3.amazonaws.com/notes/cuda
Drivers
https://developer.nvidia.com/vulkan-driver
$ lsmod | fgrep nvidia
$ nvidia-smi
Driver Installation
https://us.download.nvidia.com/XFree86/Linux-x86_64/390.77/README/
https://github.com/NVIDIA/nvidia-installer
Driver Installation on Alpine Linux
https://github.com/sgerrand/alpine-pkg-glibc
https://github.com/sgerrand/alpine-pkg-glibc/releases
https://wiki.alpinelinux.org/wiki/Running_glibc_programs
$ apk add sudo bash ca-certificates wget xz make gcc linux-headers
$ wget -q -O /etc/apk/keys/sgerrand.rsa.pub https://raw.githubusercontent.com/sgerrand/alpine-pkg-glibc/master/sgerrand.rsa.pub
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-2.27-r0.apk
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-bin-2.27-r0.apk
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-dev-2.27-r0.apk
$ wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/2.27-r0/glibc-i18n-2.27-r0.apk
$ apk add glibc-2.27-r0.apk glibc-bin-2.27-r0.apk glibc-dev-2.27-r0.apk glibc-i18n-2.27-r0.apk
$ /usr/glibc-compat/bin/localedef -i en_US -f UTF-8 en_US.UTF-8
$ bash NVIDIA-Linux-x86_64-390.77.run --check
$ bash NVIDIA-Linux-x86_64-390.77.run --extract-only
$ cd NVIDIA-Linux-x86_64-390.77 && ./nvidia-installer
Driver Uninstallation
$ nvidia-uninstall
Driver Troubleshooting
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 390.77NVIDIA-Linux-x86_64-390.77.run: line 998: /tmp/makeself.XXX/xz: No such file or directory\nExtraction failed.
$ apk add xz # Alpine Linux
bash: ./nvidia-installer: No such file or directory
Install the glibc compatibility layer package for Alpine Linux.
ERROR: You do not appear to have libc header files installed on your system. Please install your distribution's libc development package.
$ apk add musl-dev # Alpine Linux
ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured
$ apk add linux-vanilla-dev # Alpine Linux
ERROR: Failed to execute `/sbin/ldconfig`: The installer has encountered the following error during installation: 'Failed to execute `/sbin/ldconfig`'. Would you like to continue installation anyway?
Continue installation.
Toolkit
https://developer.nvidia.com/cuda-toolkit
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/
Toolkit Download
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
$ wget -c https://developer.nvidia.com/compute/cuda/9.2/Prod2/local_installers/cuda_9.2.148_396.37_linux
Toolkit Installation
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/
Toolkit Installation on Alpine Linux
$ apk add sudo bash
$ sudo bash cuda_9.2.148_396.37_linux
# You are attempting to install on an unsupported configuration. Do you wish to continue? y
# Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37? y
# Do you want to install the OpenGL libraries? y
# Do you want to run nvidia-xconfig? n
# Install the CUDA 9.2 Toolkit? y
# Enter Toolkit Location: /opt/cuda-9.2
# Do you want to install a symbolic link at /usr/local/cuda? y
# Install the CUDA 9.2 Samples? y
# Enter CUDA Samples Location: /opt/cuda-9.2/samples
$ sudo ln -s cuda-9.2 /opt/cuda
$ export PATH="/opt/cuda/bin:$PATH"
Toolkit Uninstallation
$ sudo /opt/cuda-9.2/bin/uninstall_cuda_9.2.pl
Toolkit Troubleshooting
Cannot find termcap: Can't find a valid termcap file at /usr/share/perl5/core_perl/Term/ReadLine.pm line 377.
$ export PERL_RL="Perl o=0"
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
$ apk add g++ # Alpine Linux
cicc: Relink `/usr/lib/libgcc_s.so.1' with `/usr/glibc-compat/lib/libc.so.6' for IFUNC symbol `memset'
https://github.com/sgerrand/alpine-pkg-glibc/issues/58
$ scp /lib/x86_64-linux-gnu/libgcc_s.so.1 root#alpine:/usr/glibc-compat/lib/libgcc_s.so.1
$ sudo /usr/glibc-compat/sbin/ldconfig /usr/glibc-compat/lib /lib /usr/lib
Compiler
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/
$ nvcc -V
Please define what you actually mean by "into Alpine Linux".
Regardless whether you're running the workloads directly on the host or in a container or chroot - you need to install the whole NVidia driver stack (including Cuda libs, kernel drivers, etc) on the host. Also kernel and userland drivers are two sides of the same product, both have to have the same version.
This means: whatever the host OS actually is, it has to be exactly one of those directly supported by NVidia. You have to use exactly the kernel versions (and configurations) that Nvidia built their proprietary/binary-only drivers for. Using a different kernel version or recompiling it with different configuration MIGHT POSSIBLY work, but it's DANGEROUS. Even with exactly the officially supported distros, it's still gambling, and depending on moon phase or whether some Chinese rice bag fallen over. It often works, but when it doesn't anymore, you're most likely out of luck.
Now when you're putting your workloads into some separate OS image, e.g. chroot or container, you also have to have the same driver package version in that image, too. One of the primary reasons for using containers or chroots - isolating and decoupling applications from host OS (so you don't need to fit them in anymore and do upgrades independently, even have container images independent from the host OS) - is now immediately voided. Host and workload need to fit together exactly.
In short: if you wanna have a CUDA workload, both host OS as well as workload image (container, chroot, etc) need to be supported by that, and they both need to have the same driver version installed. Anything else is just russian roulette.
Since somebody mentioned "nvidia-docker". This breaks the security isolation that docker is originally meant for. (just look at the source, which actually is available somewhere on github). It's nothing but a better chroot. And still, host and docker image need to have the same driver stack version installed.
Finally, I'd like to ask the question, what your actual use case is here.
Be warned: this all might be okay for playing games on an totally unimportant home computer, but really not suited for anything professional, where stability and security matter. If you're bound to certain data security / privacy regulations like GDPO, keep far away from this - you just cannot comply to these regulations with those proprietary drivers. Legally dangerous.
--mtx
Addendum: why do proprietary kernel drivers never work reliably ?
Express answer: the Linux kernel was never ever made for that, this just isn't supported.
Longer answer: kernel modules are NOT external programs, that are executed in some isolated environment (like eg. done with userland programs) - they are (by definition) integral pieces of the kernel that just happen to be lazily loaded when needed. (they are not even like shared libraries / DLLs). This means that they have to fit - on binary level - exactly to the actual build of the kernel you're running. When compiling the kernel, there're lots of config options that influence the actual internal binary layout in subtle ways, e.g. enabling/disabling some features can change the layout of certain data structures, cpu specific optimizations can change datastructures, calling conventions, locking mechanisms, and much much more.
And those things also change from kernel version to another. We're e.g. doing lots of internal refactorings (e.g. in data structures, macros and inline functions) after which the same piece source code generates very different binary code.
Therefore, any kernel modules always need to be compiled exactly for a specific kernel image (with the same config options, against the same includes, with the same compiler flags), or you risks horrible failures that could lead to lockups, security flaws, data corruptions or even total data loss.
You have been warned.
To clarify, this is just the driver. Not cuda. That's another story.
In fact this turns out to be much easier than expected. I just didn't quite /understand how far nvidia-docker project had come and quite how it worked.
Basically, download and install the latest nvidia-docker. From the nvidia-docker project.
https://github.com/NVIDIA/nvidia-docker/releases
Then create an alpine linux Dockerfile.
FROM alpine:3.5
LABEL com.nvidia.volumes.needed="nvidia_driver"
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
RUN /bin/sh
Build it.
docker build -t alpine-nvidia
Run
nvidia-docker run -ti --rm alpine-nvidia
Note the use of the nvidia-docker cli instead of the normal docker cli.
nvidia-docker calls docker cli with extra parameters.
Related
When I run sudo apt install linux-modules-extra-$(uname -r) in a Docker container based on a Ubuntu 20.04 on a single board computer running Ubuntu 18.04, I get the following errors:
E: Unable to locate package linux-modules-extra-4.15.0-143-generic
E: Couldn't find any package by glob 'linux-modules-extra-4.15.0-143-generic'
E: Couldn't find any package by regex 'linux-modules-extra-4.15.0-143-generic'
To me, this makes me wonder whether it is even possible to install linux-modules-extra-4.15.0-143-generic in Ubuntu 20.04? Maybe it is only compatible with Ubuntu 18.04?
Could anyone clarify this for me please?
In general, if you're building a kernel module, it has to match exactly the kernel that's running on the host system. If you're using a native Debian or Ubuntu system (without Docker), there's a system where kernel modules can be rebuilt or reinstalled when the host kernel is updated. See for example the Debian wiki KernelDKMS page.
In contrast, a Docker image is generally supposed to be portable across hosts. If you upgrade the host's kernel, or if you run a FROM ubuntu:18.04 image on an Ubuntu 20.04 host, the image isn't really supposed to be aware of this.
In your particular case, you can't get the kernel headers you need, because they're not part of the Ubuntu 18.04 distribution. For this particular case it might be possible to get the headers from the later version of Ubuntu, but it might not be possible in the general case; maybe because the system is actually running plain Debian or RHEL and the kernel build is different, maybe because the operator built their own kernel.
Since a Linux kernel module is so specific to the host it runs on, and since it can bypass any and all security concerns, it's not appropriate to try to install one in a container. Do it directly on the host instead.
I have a bootable iso image (live cd) with Linux system that is pretty old. That distro doesn't have remote repo (all installations are done from cdrom and separate disk with packages). I wanted to turn it into a docker image. Reading through articles google gave me, I've found several ways to do that. The first one is to mount the iso and find filesystem.squashfs - only modern distros use that way, not my case. My distro doesn't have that file available. The second approach is to call debootstrap but it requires to specify the repo for the distro with dist directory available in it. My distro doesn't have a public repo. What can I do? Is it even possible? I think that should be possible by doing a lot of things manually but how?
I faced similar problems when I had to containerize an old build server (building natively for legacy systems), eventually I succeeded. This approach describes how to containerize some old Linux distro (kernel 2.6.27 in my case), in the present Linux kernel 5 era.
General steps
if necessary: boot the old OS (or Live CD image)
login to the old system as root (or use sudo)
create a tarball from the relevant folders present in root
cd / ; tar cfvz image.tar.gz --one-file-system --exclude=/var/log --exclude=/image.tar.gz /
the selection worked in my case; review for yourself which folders to include or exclude
transfer the tarball to the Docker host (step not shown here)
and import it:
docker import image.tar.gz
the previous command will print out some hash
if convenient, tag the imported image:
docker tag <import-hash> <your-label>
Legacy problem: unsupported system calls
The imported image contains a Linux distribution snapshot. Some binaries can be executed from Docker, eg.:
docker run --rm <your-label> bin/ls
may actually work.
Some important binaries initially did not work for me, most notably bash:
docker run -it --rm <your-label> bin/bash
was failing silently. (Also, running with strace was possible but gave no clear indication.)
As #hiranchaudhuri pointed out, this is likely due to an API discrepancy between the host's kernel and the container's user space code.
In my case the problem was solved by enabling the legacy vsyscall kernel API
for Windows WSL2, this is described here https://learn.microsoft.com/en-us/windows/wsl/wsl-config
for native Linux systems of today, I guess this can be set in the boot configuration, with the kernel command-line parameter vsyscall=emulate, if the present kernel supports this option
I seriously doubt you will succeed on that.
Be aware Docker is not a full virtualization like KVM or VirtualBox. The lightweight virtualization benefits from the docker containers running on the host's Linux kernel. Which means the kernel is the same inside and outside of the container.
If you now try to install some old distro inside the container you may end up with an incompatible combination. Patching the kernel may involve upgrading glibc, and patching that may involve recompiling the rest of the OS.
I am not sure why you want to stick to the old distro, but seriously I believe you are better off with real virtualization.
I have been experimenting with GrovePI+ running python programs and am looking to extend my experimentation to include integrating with Azure IoT Hub by create Azure IoT Edge modules. I know that I need to update module settings to run with escalated rights so that the program can access I/O and have seen the documentation on how to accomplish that, but I am struggling a bit with getting the container built. The approach that I had in mind was to base the image on the arm32v7/python:3.7-stretch image and from there include the following run command:
RUN apt-get update &&\
apt-get -y install apt-utils curl &&\
curl -kL dexterindustries.com/update_grovepi | bash
The problem is that the script is failing because it looks for files in /home/pi/. Before I go deeper down the rabbit hole, I figured I should check and see if I am working on a problem that someone else already solved. Has anyone built Docker images to run GrovePi programs? If so, what worked for you?
I've no experience with GrovePi, but remember that modules (Docker containers) are completely self-contained and don't have access to the system. So if that script works when ssh'ed into a device, then I can see why it would not work in a module; the module is a little system-in-a-box that has no awareness of or access to locations like /home/pi/.
Basically, I'd expect you need to configure the Pi itself with whatever is needed for Grove Pi stuff, then you package your Python into a module. The tricky bit might be getting access to hardware like I2C from within the module, but that's not too terrible. This kind of thing is what you'll need (but different devices).
This was a huge surprise for me:
Today, using Docker For Mac (18.03.1-ce-mac65), I ran a Debian Stretch image. Inside the image I mounted the latest Raspbian Stretch image (2018-04-18-raspbian-stretch-lite) using mount. I then used chroot to this mounted Raspbian filesystem.
This is where it got weird. I was able to use apt (without any special modifications) to install software into this mounted filesystem.
Running:
dpkg --print-architecture
returned: armfh
and the software I installed (vim) worked like a charm
I was even able to compile a simple program using gcc and run it.
But, I need to know! How is this possible?
According to Docker:
Docker for Mac provides binfmt_misc multi architecture support, so you can run containers for different Linux architectures, such as arm, mips, ppc64le, and even s390x.
EDIT
On Linux, you can install qemu-user-static and then follow this git repo to get cross-architecture support!
I am the developer of a software product (NJOY) with build requirements of:
CMake 3.2
Python 3.4
gcc 6.2 or clang 3.9
gfortran 5.3+
In reading about Docker, it seems that I should be able to create an image with just these components so that I can compile my code and use it. Much of the documentation is written with the implication that one wants to create a scalable web architecture and thus, doesn’t appear to be applicable to compiled applications like what I’m trying to do. I know it is applicable, I just can’t seem to figure out what to do.
I’m struggling with separating the Docker concept from a Virtual Machine; I can only conceive of compiling my code in an environment that contains an entire OS instead of just the necessary components. I’ve begun a Docker image by starting with an Ubuntu image. This seems to work just fine, but I get the feeling that I’m overly complicating things.
I’ve seen a Docker image for gcc; I’d like to combine it with CMake and Python into an image that we can use. Is this even possible?
What is the right way to approach this?
Combining docker images is not available. Docker images are chained. You start from a base images and you then install additional tools that you want to add on top of the base image.
For instance, you can start from the gcc image and build on it by creating a Dockerfile. Your Dockerfile might look something like:
FROM gcc:latest
# install cmake
RUN apt-get install cmake
# Install python
RUN apt-get install python
Then you build this dockerfile to create the Docker image. This will give you an image that contains gcc, cmake and python.