persistent pip install in rapids.ai docker container - docker

This is probably a really stupid question, but one has got to start somewhere. I am playing with NVIDIA's rapids.ai GPU-enhanced docker container, but this (presumably by design) does not come with pytorch. Now, of course, I can do a pip install torch torch-ignite every time, but this is both annoying and resource-consuming (and pytorch is a large download). What is the approved method for persisting a pip install in a container?

Create a new Dockerfile that builds a new image based on the existing one:
FROM the/rapids-ai/image
RUN pip install torch torch-ignite
And then
$ ls Dockerfile
Dockerfile
$ docker build -t myimage .
You can now do:
$ docker run myimage
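As a concrete sketch (the image tag below is an assumption; substitute whichever rapidsai tag you actually pull), the Dockerfile could look like:
# Hypothetical rapidsai tag; use the one you normally run
FROM rapidsai/rapidsai:22.04-cuda11.5-runtime-ubuntu20.04-py3.9
RUN pip install torch torch-ignite
$ docker build -t rapids-torch .
$ docker run --gpus all -it rapids-torch
Because the install is baked into a layer of the new image, it persists across every container started from rapids-torch without re-downloading.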

Related

Error installing PyTorch when building Docker image

I run into this error when trying to build a Docker image. My requirements.txt file only contains 'torch==1.9.0'. This version clearly exists, but after downloading for a minute or longer, this error pops up.
There is a pytorch docker container on docker hub that has the latest releases: https://hub.docker.com/r/pytorch/pytorch/tags?page=1&ordering=last_updated
Maybe you can either base your docker container on that container (if it works for you) or you can compare the Dockerfile of your container with the Dockerfile of the container on docker hub to see if you are missing any system level dependencies or configurations...
Modify your Dockerfile to install requirements using:
RUN pip install -r requirements.txt --no-cache-dir
This will solve RAM/memory-related issues with large packages like torch.
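For context, a minimal Dockerfile around that line might look like this (the base image and directory layout are assumptions, not from the question):
# Hypothetical base image and layout
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
# --no-cache-dir stops pip from keeping downloaded wheels around,
# which avoids memory/disk blow-ups with large packages like torch
RUN pip install -r requirements.txt --no-cache-dir
COPY . .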

Purpose of specifying several UNIX commands in a single RUN instruction in Dockerfile

I have noticed that many Dockerfiles try to minimize the number of instructions by combining several UNIX commands in a single RUN instruction. Is there any reason for this?
Also is there any difference in the outcomes between the two Dockerfiles below?
Dockerfile1
FROM ubuntu
MAINTAINER demousr@example.com
RUN apt-get update
RUN apt-get install -y nginx
CMD ["echo", "Image created"]
Dockerfile2
FROM ubuntu
MAINTAINER demousr@example.com
RUN apt-get update && apt-get install -y nginx
CMD ["echo", "Image created"]
Roughly speaking, a Docker image contains some metadata and an array of layers, and a running container is built on top of these layers by adding a writable container layer, while the layers from the underlying image remain read-only.
These layers can be stored on disk in different ways depending on the configured storage driver. The official Docker documentation illustrates, for example, how files changed across these layers are reconciled by the OverlayFS storage driver.
Next, the Dockerfile instructions RUN, COPY, and ADD each create a layer, and the best practices on the Docker website specifically recommend merging consecutive RUN commands into a single one, to reduce the number of layers and thereby the size of the final image:
https://docs.docker.com/develop/dev-best-practices/
[…] try to reduce the number of layers in your image by minimizing the number of separate RUN commands in your Dockerfile. You can do this by consolidating multiple commands into a single RUN line and using your shell’s mechanisms to combine them together. […]
See also: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
Moreover, in your example:
RUN apt-get update -y -q
RUN apt-get install -y nginx
if you run docker build -t your-image-name . on this Dockerfile, then edit the Dockerfile some time later to add another package besides nginx and run docker build -t your-image-name . again, Docker's cache mechanism will skip the apt-get update -y -q step, so the APT cache used by the install will be obsolete. This is another upside of merging the two RUN commands (see the merged sketch below).
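Putting both points together, a merged version might look like this (the rm -rf of the APT lists is a common convention to keep the layer small, not something taken from the question):
FROM ubuntu
# A single RUN keeps update + install in one layer, so the package index can
# never be stale relative to the install, and the cleanup keeps the layer small
RUN apt-get update -y -q && \
    apt-get install -y nginx && \
    rm -rf /var/lib/apt/lists/*
CMD ["echo", "Image created"]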
In addition to the space savings, it's also about correctness.
Consider your first Dockerfile (a common mistake when working with Debian-like systems which use apt):
FROM ubuntu
MAINTAINER demousr@example.com
RUN apt-get update
RUN apt-get install -y nginx
CMD ["echo", "Image created"]
If two or more builds follow this pattern, a cache hit could cause the image to be unbuildable due to stale cached metadata:
Let's say I built an image which looks similar to that a few weeks ago.
Now I'm building the same image today: there is a cache present up until the RUN apt-get update line.
The docker build will reuse those cached layers (since the Dockerfile and base image are identical), including the RUN apt-get update layer.
When the RUN apt-get install line runs, it will use the cached APT metadata, which is now weeks out of date and will likely error (see the sketch below).
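For instance (curl below stands in for an arbitrary extra package added during the later edit):
RUN apt-get update                    # cache hit: layer reused from the weeks-old build
RUN apt-get install -y nginx curl     # cache miss: runs against the stale package index and may 404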

How to extend/inherit/join from two separate Dockerfiles, multi-stage builds?

I have a deployment process which I currently achieve via docker-machine and docker-compose. (I have multiple services deployed which are interrelated: one a Django application, and another the resty-auto-ssl Docker image; ref: https://github.com/Valian/docker-nginx-auto-ssl.)
My docker-compose file is something like:
services:
  web:
  nginx:
  postgres:
(N.B. I'm not using postgres in production; that's merely an example.)
What I need to do, is to essentially bundle all of this up into one built Docker image.
Each service references a different Dockerfile base, one for the Django application:
FROM python:3.7.2-slim
RUN apt-get update && apt-get -y install cron && apt-get -y install nano
ENV PYTHONUNBUFFERED 1
RUN mkdir /usr/src/app
WORKDIR /usr/src/app
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
COPY . /usr/src/app
RUN ["chmod", "+x", "/usr/src/app/init.sh"]
And one for the valian/docker-nginx-auto-ssl image:
FROM valian/docker-nginx-auto-ssl
COPY nginx.conf /usr/local/openresty/nginx/conf/
I assume that, theoretically, I could somehow join these two Dockerfiles into one? Would this be a case of using multi-stage Docker builds (https://docs.docker.com/v17.09/engine/userguide/eng-image/multistage-build/#before-multi-stage-builds) to produce a single joined docker-compose service?
I don't believe you can join images. A Docker image is like a VM hard disk; it would be like saying you want to join two hard disk images together. The images may even be based on different versions of Linux, or nowadays even Windows. If you want one single image, you could build it yourself by starting off with a base image like Alpine Linux and then installing all the dependencies you want.
But the good news is that you can get the source for the images you reference in your Dockerfile, so all the hard work of deciding what to put in your image has been done for you.
eg. For the python bit -> https://github.com/jfloff/alpine-python
And then for nginx-auto -> https://github.com/Valian/docker-nginx-auto-ssl
Because nginx-auto-ssl is based on alpine-fat, I would suggest using that one as the base, then take the details from both Dockerfiles and append them to each other.
Once you have created this image you can use it again and again. So although it might be a pain to set up initially, it pays dividends later.
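A rough sketch of such an appended Dockerfile, assuming the Python side can be adapted to the Alpine-based nginx-auto-ssl image (the package names and paths below are assumptions, not taken verbatim from either Dockerfile):
FROM valian/docker-nginx-auto-ssl
# nginx configuration from the second Dockerfile
COPY nginx.conf /usr/local/openresty/nginx/conf/
# Django/Python pieces adapted from the first Dockerfile (Alpine uses apk rather than apt-get)
RUN apk add --no-cache python3 py3-pip
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/requirements.txt
RUN pip3 install -r requirements.txt
COPY . /usr/src/app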

Pip install actually IN a docker container (Airflow)

I am really new to Docker. I have it running now for Airflow. For one of the Airflow DAGs, I run python jobs.<job_name>.run, which is located on the server and inside the Docker container. However, this Python code needs packages to run, and I am having trouble installing them.
If I put a RUN pip install ... in the Dockerfile, it doesn't seem to work. If I go 'into' the Docker container with docker exec -ti <name_of_worker> bash and run pip freeze, no packages show up.
However, if I run the pip install command while I am inside the worker, the Airflow DAG runs successfully. But I shouldn't have to do this every time I rebuild my containers. Can anyone help me?
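For reference, extending the worker image with a RUN pip install would normally look something like this (the base image tag and package names below are assumptions, not from the question). One common cause of the symptom described is that the compose file is still pulling the stock image instead of building and using the extended one:
# Hypothetical tag; use whatever Airflow image your compose file actually references
FROM apache/airflow:2.3.0
# Example packages, not taken from the question
RUN pip install --no-cache-dir pandas requests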

Is there a way to use conda to install libraries in a Docker image?

I'm trying to install some libraries (specifically pytorch) in my docker image. I have a Dockerfile that installs anaconda properly, but now I'd like to use conda to install a few other things in the image. Is there a way to do this? I've tried
RUN /opt/anaconda/bin/conda install -y pytorch torchvision
And anaconda is installed in the right location. Am I doing this the right way?
First, check whether the way you have added/installed anaconda in your own image matches ContinuumIO/docker-images/anaconda.
Second, you can test the installation dynamically, as the README recommends:
docker run -it yourImage /bin/bash -c "/opt/conda/bin/conda install ..."
If that is working correctly, you can do the same in your Dockerfile.
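If the dynamic test succeeds, the same command should work as a Dockerfile instruction. Assuming the ContinuumIO image layout (where conda lives in /opt/conda), that would be roughly:
# Assumes the ContinuumIO anaconda3 image, with conda installed at /opt/conda
FROM continuumio/anaconda3
RUN /opt/conda/bin/conda install -y pytorch torchvision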
