How to find contents of an NGC Docker image? - docker

The NVIDIA NGC container catalog has a broad range of GPU-optimised containers for common activities such as deep learning. How does one find what is inside the Docker images?
For example, I need an image with PyTorch 1.4 and Python 3.6 or 3.7, but the PyTorch tags run from pytorch:17.10 to pytorch:21.06-py3, where the xx.xx prefix is the container version. Is there a list somewhere of what is installed in each container, or even better, the Dockerfile that was used to build each image?

You can do a high-level inspection of the image using:
docker history <IMAGE>
At one point I also used this tool: https://github.com/wagoodman/dive, which was quite nice for inspecting the different layers.
So basically you can inspect each layer to see the instructions used to build that specific image and search for the commands that installed the various packages.
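For example, against an NGC image the inspection could look like the sketch below; the tag is only an illustration, substitute whichever release you are interested in:

# pull the image and list its layers with the commands that created them
docker pull nvcr.io/nvidia/pytorch:20.01-py3
docker history --no-trunc nvcr.io/nvidia/pytorch:20.01-py3

# or simply ask the container which framework and Python versions it ships
docker run --rm nvcr.io/nvidia/pytorch:20.01-py3 \
    python -c "import sys, torch; print(torch.__version__, sys.version)"

# dive gives an interactive, layer-by-layer view of the filesystem
dive nvcr.io/nvidia/pytorch:20.01-py3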

The details of the PyTorch NGC containers are listed in the PyTorch Release Notes, linked at the bottom of the PyTorch NGC page.
Documentation for all the other deep learning framework containers is collected at NVIDIA Deep Learning Frameworks.

Related

Shared Python Packages Among Docker Containers

I have multiple docker containers that host flask apps running some machine learning services. Let's say container 1 uses pytorch, and container 2 also uses pytorch. When I build the images, each copy of pytorch takes up space on disk. We split these two services into different containers for a reason; if I keep it that way, is it possible to build pytorch only once so that both containers can import it? Thanks in advance, appreciate any help and suggestions!
You can build one docker image with pytorch installed and use it as the base image for both apps. That way pytorch takes up disk space only once, and you save the time of installing it twice.
You can also build only one image and copy your code into two different directories,
for example /app1 and /app2, then change the working directory for each app in your docker compose file.
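A minimal sketch of the shared-base-image approach; the base tag, the installed packages and the app layout are assumptions, not taken from the question:

# pytorch-base/Dockerfile — built once, installs pytorch a single time
FROM python:3.8-slim
RUN pip install --no-cache-dir torch flask
# build it with: docker build -t pytorch-base ./pytorch-base

# app1/Dockerfile — only adds this service's code on top of the shared base
FROM pytorch-base
COPY . /app1
WORKDIR /app1
CMD ["python", "service.py"]
# app2/Dockerfile is identical apart from the directory and entry point

Because Docker stores each layer only once, both app images then share the pytorch layers on disk instead of duplicating them.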

Which docker base image to use in the Dockerfile?

I have a web application, which consists of two projects:
using VueJS as a front-end part;
using ExpressJS as a back-end part;
I now need to dockerize my application, but I'm not sure about the very first line in my Dockerfiles (which refers to the base environment, I guess, source).
What I need now are separate docker images for the two projects, but since I'm very new to this, I can't figure out what the very first line should be in each of the two Dockerfiles.
I was developing the project on Windows 10, where I have node version v8.11.1 and expressjs version 4.16.3.
I tried some of the versions I found (such as node:8.11.1-alpine), but I got a warning:
SECURITY WARNING: You are building a Docker image from Windows against
a non-Windows Docker host.
This made me think that I should not only care about the node version but about the OS as well, so now I'm not sure which base images to use.
node:8.11.1-alpine is a perfectly correct tag for a Node image. This particular one is based on Alpine Linux, a lightweight Linux distro that is often used when building Docker images because of its small footprint.
If you are not sure about which base image you should choose, just read the documentation at DockerHub. It lists all currently supported tags and describes different flavours of the Node image ('Image Variants' section).
Quote:
Image Variants
The node images come in many flavors, each designed for a specific use case.
node:<version>
This is the defacto image. If you are unsure about what your needs are, you probably want to use this one. It is designed to be used both as a throw away container (mount your source code and start the container to start your app), as well as the base to build other images off of. This tag is based off of buildpack-deps. buildpack-deps is designed for the average user of docker who has many images on their system. It, by design, has a large number of extremely common Debian packages. This reduces the number of packages that images that derive from it need to install, thus reducing the overall size of all images on your system.
node:<version>-alpine
This image is based on the popular Alpine Linux project, available in the alpine official image. Alpine Linux is much smaller than most distribution base images (~5MB), and thus leads to much slimmer images in general.
This variant is highly recommended when final image size being as small as possible is desired. The main caveat to note is that it does use musl libc instead of glibc and friends, so certain software might run into issues depending on the depth of their libc requirements. However, most software doesn't have an issue with this, so this variant is usually a very safe choice. See this Hacker News comment thread for more discussion of the issues that might arise and some pro/con comparisons of using Alpine-based images.
To minimize image size, it's uncommon for additional related tools (such as git or bash) to be included in Alpine-based images. Using this image as a base, add the things you need in your own Dockerfile (see the alpine image description for examples of how to install packages if you are unfamiliar).
node:<version>-onbuild
The ONBUILD image variants are deprecated, and their usage is discouraged. For more details, see docker-library/official-images#2076.
While the onbuild variant is really useful for "getting off the ground running" (zero to Dockerized in a short period of time), it's not recommended for long-term usage within a project due to the lack of control over when the ONBUILD triggers fire (see also docker/docker#5714, docker/docker#8240, docker/docker#11917).
Once you've got a handle on how your project functions within Docker, you'll probably want to adjust your Dockerfile to inherit from a non-onbuild variant and copy the commands from the onbuild variant Dockerfile (moving the ONBUILD lines to the end and removing the ONBUILD keywords) into your own file so that you have tighter control over them and more transparency for yourself and others looking at your Dockerfile as to what it does. This also makes it easier to add additional requirements as time goes on (such as installing more packages before performing the previously-ONBUILD steps).
node:<version>-slim
This image does not contain the common packages contained in the default tag and only contains the minimal packages needed to run node. Unless you are working in an environment where only the node image will be deployed and you have space constraints, we highly recommend using the default image of this repository.
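For the two projects in the question, a minimal back-end Dockerfile based on that tag could look like the sketch below; the file names, port and start command are assumptions about the project layout, and the Vue front-end image would follow the same pattern with its own build or serve command:

# back-end/Dockerfile — Express app on the Alpine-based Node image
FROM node:8.11.1-alpine
WORKDIR /app
# install dependencies first so this layer is cached between code changes
COPY package.json package-lock.json ./
RUN npm install --production
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]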

difference between host and docker container

I have been trying to train a 3DCNN network with a specific architecture. I wanted to create a dockerfile with all the steps necessary to have the network working. The issue is that if I run the neural network on the host I have no problem, everything works fine. But doing almost the same thing in a docker container I always get a "segmentation fault (core dumped)" error.
The two installations are not exactly the same, but the variations (maybe some extra package installed) shouldn't be a problem, right? Besides, I don't get any error until it starts iterating, so it seems like a memory problem. The GPU works in the docker container and it is the same GPU as on the host; the python code is the same.
The neural network in the Docker container starts training on the data, but at epoch 1 it hits the "segmentation fault (core dumped)".
So my question is the following: is it possible for there to be critical differences between the host and a docker container even if they have exactly the same packages installed? Especially in relation to tensorflow and the GPU. The error must come from outside the code, given that the code works in a similar environment.
Hope I explained myself well enough to give an idea of my question, thank you.
A docker container will, at runtime, resolve its system calls through the host kernel.
See "How can Docker run distros with different kernels?".
In your case, the error is:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1, SSE4.2
See "How to compile Tensorflow with SSE4.2 and AVX instructions?"
(referenced by tensorflow/tensorflow issue 8037)
You could try building an image from a TensorFlow compiled from source, using a docker multi-stage build.
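A rough multi-stage sketch of that idea follows; the devel base tag, the configure defaults and the build flags are assumptions and will need to be adapted to the TensorFlow release you target:

# stage 1: compile TensorFlow with the instruction sets your CPU supports
FROM tensorflow/tensorflow:devel AS builder
RUN git clone --depth 1 https://github.com/tensorflow/tensorflow.git /tf && \
    cd /tf && yes "" | ./configure && \
    bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 \
        //tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pkg

# stage 2: keep only the resulting wheel in a small runtime image
FROM python:3.6-slim
COPY --from=builder /tmp/pkg/ /tmp/pkg/
RUN pip install --no-cache-dir /tmp/pkg/*.whl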

Is it possible to install the CPU and GPU versions of tensorflow at the same time

I am using nvidia-docker to access GPUs from a docker container. However, not all of our machines have GPUs and I would like to automatically fall back to the CPU version when GPUs are not available.
Do I have to build separate docker images--one for CPU and one for GPU--or is it possible to install tensorflow and tensorflow-gpu and pick the right variant depending on whether a GPU is available?
The GPU version of tensorflow fails to load in the container when started using normal docker (as opposed to nvidia-docker) because the library libcuda.so.1 is missing. We managed to use the same image for different hosts in three steps:
Link the library stub /usr/local/cuda/lib64/stubs/libcuda.so to libcuda.so.1 in the same directory.
Add the stubs directory as a search path to /etc/ld.so.conf.d with lower precedence than the directory in which libcuda.so.1 is mounted by nvidia-docker.
Call ldconfig to refresh the library cache.
If the image is used on a host without a GPU via normal docker, tensorflow loads the stub and places all ops on the CPU. If the image is used on a host with a GPU via nvidia-docker, tensorflow loads the mounted library and places appropriate ops on the GPU. Full example here.
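Expressed as Dockerfile instructions, the three steps could look roughly like this; the stub path is the standard CUDA toolkit location, and the z- prefix on the conf file is an assumption meant to make ldconfig visit the stub directory after the entries for the driver libraries mounted by nvidia-docker, so verify both on your base image:

# link the stub so the loader can find a libcuda.so.1 even without a GPU
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so \
          /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
    echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf && \
    ldconfig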
You might want to take a look at the official Tensorflow docker images. The GPU version uses nvidia-docker to access the GPU.
What I've done in the past is keep two nearly identical Dockerfiles. The only difference between them would be the FROM directive:
FROM tensorflow/tensorflow:latest-py3
OR
FROM tensorflow/tensorflow:latest-gpu-py3
(you could also choose the Python2 image if you want)
Everything else would be the same, and you could even automate this, such that the appropriate FROM tag is set when you build the image. I have used makefiles to build the appropriate image depending on whether the host machine has a GPU or not.
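One way to automate the tag selection without two Dockerfiles is a build argument plus a small wrapper script; this is a hypothetical sketch (it needs Docker 17.05+ for ARG before FROM), not the makefile setup mentioned above:

# Dockerfile (first lines):
#   ARG BASE_IMAGE=tensorflow/tensorflow:latest-py3
#   FROM ${BASE_IMAGE}

# build.sh — pick the GPU base only if the host actually has a usable GPU
if command -v nvidia-smi >/dev/null 2>&1; then
    BASE=tensorflow/tensorflow:latest-gpu-py3
else
    BASE=tensorflow/tensorflow:latest-py3
fi
docker build --build-arg BASE_IMAGE="$BASE" -t myapp .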

Extend docker devicemapper loop-lvm sparse file

We are using Docker as part of our build pipeline. As we're using CentOS, the default Docker installation has set up devicemapper with a loop-lvm sparse file. Now that sparse file has reached its size limit of 100G and we're not able to build any new images or containers. Due to the issues mentioned in Clean docker environment: devicemapper, we cannot free any space by removing unused containers or images.
As I've learnt in the meantime, using loop-lvm is not a very good idea (http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/) and we're planning the migration to direct-lvm. However, this will take some time to plan. Therefore I'm looking into possibilities to extend the currently used sparse file without losing all data.
The current Docker docs have some instructions on how to extend the sparse file (https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#/for-a-loop-lvm-configuration). However, we are currently running Docker version 1.9 and the docs for this version do not contain these instructions. Any chance that they are still applicable for Docker 1.9?
Answering my own question: I was able to extend the loop-lvm file using the method described in the docs and have not encountered any issues so far.
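For reference, that procedure boils down to something like the sketch below; the loop device number and the target size are assumptions, so check losetup -a and dmsetup ls on your host first, and back up /var/lib/docker before trying it:

# grow the sparse file backing the loop device
sudo truncate -s 200G /var/lib/docker/devicemapper/devicemapper/data
# tell the loop device to re-read its size
sudo losetup -c /dev/loop0
# find the docker thin pool, then reload its table with the new sector count
sudo dmsetup ls | grep docker
# (dmsetup suspend <pool>; dmsetup reload <pool> --table "<updated table>"; dmsetup resume <pool>)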
