I am trying to create an nvidia-docker image with installed TensorRT for my specific application. I can't use any of the provided TensortRT base images, as they are using CUDA version not compatible with the application, but I have a custom TensorRT debian package which is used in my organization. The problem is, when I install it from the Dockerfile, it also installs nvidia drivers. As a result, the container is successfully created, but can't be started - the result is:
svc_moma_usr#PL1LXD-529389:~/gutkowsp/Docker_projects/test_cuda$ nvidia-docker run tensorrt-test
docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/97f449ff2535b1ad304520dae75c613931888658a66b89235b0d040a872a625c/merged/usr/bin/nvidia-smi: file exists\\\\n\\\"\"": unknown.
ERRO[0001] error waiting for container: context canceled
The dockerfile is:
FROM nvidia/cuda:9.1-devel-ubuntu16.04
ENV DEBIAN_FRONTEND noninteractive
ENV CUDNN_VERSION 7.0.5.15
LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"
RUN apt update -y && \
apt install software-properties-common -y && \
apt-add-repository --yes --update ppa:ansible/ansible && \
apt install ansible -y
RUN apt update -y && \
apt install -y --no-install-recommends \
libcudnn7=$CUDNN_VERSION-1+cuda9.1 \
libcudnn7-dev=$CUDNN_VERSION-1+cuda9.1
RUN apt update -y && \
apt install tensorrt -y
How this problem of unnecessary drivers is solved? This seems to me like a common issue, as in general nvidia docker images typically have installed nvidia software, which usually comes with drivers. Maybe someone can share the dockerfiles for the TensorRT images for reference?
For anyone who facing the same issue:
If necessary use CUDNN enabled docker image, like 11.7.1-cudnn8-runtime-ubuntu18.04 to avoid the necessity to install it using apt
Run apt update
Run apt install <your package> -y --dry-run | grep nvidia
Add all listed nvidia packages to apt ignore list - add a dash after the package name with an asterisk in place of version number
apt install <your package> libnvidia-compute-*-server- \
libnvidia-compute-*- --dry-run | grep nvidia
Make sure that none of nvidia packages will be installed. If necessary add newly discovered packages to ignore list.
If everything is OK then remove --dry-run flag and install your package
apt install <your package> libnvidia-compute-*-server- libnvidia-compute-*-
Related
I had to compile a custom kernel to get Ubuntu to run on my laptop and now I'm trying to run docker containers on it.
It generated packages I installed:
linux-headers-5.15.30-25.25custom_5.15.30-25.25custom-1_amd64.deb
linux-image-5.15.30-25.25custom-dbg_5.15.30-25.25custom-1_amd64.deb
linux-image-5.15.30-25.25custom_5.15.30-25.25custom-1_amd64.deb
Now when I try to create docker images I get the following error:
...
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package linux-headers-5.15.30-25.25custom
E: Couldn't find any package by glob 'linux-headers-5.15.30-25.25custom'
E: Couldn't find any package by regex 'linux-headers-5.15.30-25.25custom'
The Dockerfile just pulls an nvidia image and adds some other packages required
FROM nvidia/cuda:11.4.2-devel-ubuntu18.04
ARG COMPILE_GRAPHICS=OFF
ARG DEBIAN_FRONTEND=noninteractive
USER root
RUN \
set -ex && \
apt-key update && \
apt-get update && \
apt-get install -y -q \
build-essential \
software-properties-common \
openssl \
curl && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/apt/archives/ && \
rm -rf /usr/share/doc/ && \
rm -rf /usr/share/man/
...
It is installed on the host PC
~$ sudo dpkg -l | grep linux-headers-5.15.30-25.2
ii linux-headers-5.15.30-25.25custom 5.15.30-25.25custom-1 amd64 Linux kernel headers for 5.15.30-25.25custom on amd64
There's no problem on other machines using the upstream Ubuntu kernel packages.
So guess docker needs the actual package. How can I add a custom location to fetch the packages?
Thanks
I get the feeling you are mixing up what is outside and inside of a container.
Outside - on your host operating system you had to compile a custom kernel to get Linux running. So far so good.
Now you are trying to build a docker container. So the next steps are happening inside the container. Docker is lightweight virtualization therefore the container runs on the same kernel as the host. Since some package dependency is on the kernel's headers, and apt is trying to install the Debian package for them but cannot find them. Seems obvious as you are running a custom kernel and the package is in none well-known repository.
To get out of the situation:
check whether the headers for your kernel are available as .deb
make that .deb available inside the container. This may happen by placing them into the Docker build context.
ensure your Dockerfile installs the .deb before installing whatever needs that .deb. This will prevent it is searched in online repositories.
Created a Docker file in oreder to install Tomcat server from Unix as bashe os
My Dockerfile:
FROM ubuntu
RUN apt-get update && apt-get upgrade -y #to update os
RUN apt-get dist-upgrade
RUN apt-get install build-essential
RUN apt-get install openjdk-8-jdk # to install java 8
RUN apt-get wget -y #to install wget package
RUN apt-get wget https://mirrors.estointernet.in/apache/tomcat/tomcat-9/v9.0.37/bin/apache-tomcat-9.0.37.tar.gz #to download tomcat
RUN tar -xvzf apache-tomcat-9.0.37 # unzipping the tomcat
RUN mkdir tomcat # craeting tomacat directory
RUN cp apache-tomcat-9.0.37/* tomcat # copying tomact files to tomact directory
Command to create Docker Image from Docker file:
docker build -t [img name] -f [file name] .
On execution, while installing java package am getting like this:
'''After this operation, 242 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y'''
You are getting the prompt because the command is awaiting user input for whether or not to install a package. The -y flag you're using for a few of them (like wget) allows bash to assume a yes. Add this flag to all your installation commands.
By the way, there's quite a few potential issues with the Dockerfile you posted.
For example, you have RUN apt-get wget ...
Are you sure that is what you want to do, and not just RUN wget ...? Unless wget is a command that apt-get takes, which it isn't, it will cause unexpected behavior.
You also seem to be missing the command to start the Tomcat server, which can make it so that nothing happens when you attempt to run the image.
I think you should add DEBIAN_FRONTEND=noninteractive when running the apt-get commands, something like this:
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install build-essential -y
Also, it's considered bad practice to use multiple RUN steps which could be consolidated into one. More about Dockerfile best practices can be found here.
I am trying to install the java runtime in a Debian based docker image (mcr.microsoft.com/dotnet/core/sdk:3.1-buster). According to various howtos this should be possible by running
RUN apt update
RUN apt-get install openjdk-11-jre
The latter command comes back with
E: Unable to locate package openjdk-11-jre
However according to https://packages.debian.org/buster/openjdk-11-jre the package does exist. What am I doing wrong?
Unsure from which image your are pulling. I used slim, Dockerfile.
from debian:buster-slim
ENV DEBIAN_FRONTEND=noninteractive
RUN mkdir -p /usr/share/man/man1 /usr/share/man/man2
RUN apt-get update && \
apt-get install -y --no-install-recommends \
openjdk-11-jre
# Prints installed java version, just for checking
RUN java --version
NOTE: If you don't run the mkdir -p /usr/share/man/man1 /usr/share/man/man2 you'll run into dependency problems with ca-certificates, openjdk-11-jre-headless etc. I've been using this fix provided by community, haven't really checked the permanent fix.
I'm building a image that builds a Jenkins and I try to use a plugin over the Jenkins when it is running, so, I need get run Jenkins before my plugin execution.
I execute it like docker build -t dockerfile and the error wich I am obtaining:
jenkins.JenkinsException: Error in request: [Errno 99]
Cannot assign requested address
I think the problem is when the plugin is executed it guess Jenkins is running and not.
FROM foxylion/jenkins
MAINTAINER Mishel Uchuari <dmuchuari#hotmail.com>
RUN /usr/local/bin/install-plugins.sh workflow-remote-loader workflow-aggregator build-pipeline-plugin
ENV JENKINS_USER replicate
ENV JENKINS_PASS replicate
USER root
RUN apt-get -y update && apt-get -y upgrade
RUN apt-get install -y apt-utils
RUN apt-get install -y python-pip
RUN apt install -y linuxbrew-wrapper
RUN useradd someuser -m -s /bin/bash
USER someuser
RUN chmod -R 777 /home/someuser
RUN brew install libyaml
USER root
RUN apt-get install build-essential
RUN apt-get -y update && apt-get -y upgrade
RUN pip install jenkins-job-builder==2.0.0.0b2
RUN pip install PyYAML python-jenkins
RUN mkdir /etc/jenkins_jobs/
COPY jenkins_jobs.ini /etc/jenkins_jobs/
COPY scm_pipeline.yaml /etc/jenkins_jobs/
RUN jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update /etc/jenkins_jobs/scm_pipeline.yaml
I had the same issue myself when using it under Docker:
File "/src/.tox/py27/local/lib/python2.7/site-packages/jenkins_jobs/builder.py", line 124, in get_plugins_info
raise e
JenkinsException: Error in request: [Errno 99] Cannot assign requested address
That was caused when it tries to retrieve the list of plugins, I went overriding plugins_info to short circuit the code path:
jjb = JenkinsJobs(args=['test', config_dir, '-o', output_dir])
jjb.builder['plugins_info'] = [] # prevents 99 cannot assign requested address
jjb.execute()
I had the issue with python 2.7.9 on Debian Jessie. If I remember correctly that is no more an issue with a later python version eg 2.7.13 from Debian Stretch.
(the patch on which I encountered the issue):
https://gerrit.wikimedia.org/r/#/c/380929/8/tests/test_integration.py
RUN brew install libyaml
brew is a package manager for Mac OS X. Also PyYAML gracefully skip compilation when the lib is not availble. So you probably do not need that one. And I guess it would work without installing build-essential.
RUN pip install jenkins-job-builder==2.0.0.0b2
RUN pip install PyYAML python-jenkins
I am surprised you have install PyYAML and python-jenkins explicitly. Supposedly installing jenkins-job-builder should install all the dependencies (eg PyYAML and python-jenkins).
I am trying to build a docker container based on amazonlinux which is sort of centos.
One of the packages I need is supervisor and it is not available on the official repos so I have to do it with easy_install or pip.
The problem is that, although I tried installing python-setuptools and python-pip, then when I try to do:
RUN easy_install supervisor
or
RUN pip install supervisor
It says the command doesn't exists
/bin/sh: easy_install: command not found
The command '/bin/sh -c easy_install supervisor' returned a non-zero code: 127
I tried with full path, but same result, and I see other dockerfiles people doing it like that on centos images.
After a while, I found the reason.
By default, yum was installing python26 and the easy_install script runs with python27, so I had to be calling easy_install-2.6 or install the python27 package
Not familiar with AWS's specific image, but for a general centos image, you'll need to install pip or easy_install with a yum command first, which requires the epel repository:
RUN yum -y install epel-release \
&& yum -y install python-pip python-setuptools \
&& yum clean all
Python documented the process in detail on their page here: https://packaging.python.org/install_requirements_linux/
There's also some documentation on this over at superuser: https://superuser.com/q/877759/587488