I have been trying to train a 3D CNN with a specific architecture. I wanted to create a Dockerfile with all the steps necessary to have the network working. The issue is that if I run the neural network on the host I have no problem, everything works fine. But doing almost the same in a Docker container I always get the "segmentation fault (core dumped)" error.
Both installations are not exactly the same, but the variations (maybe some extra package installed) shouldn't be a problem, right? Besides, I don't get any error until it starts iterating, so it seems like a memory problem. The GPU works in the Docker container and it is the same GPU as on the host. The Python code is the same.
The neural network in the Docker container starts training with the data, but at epoch 1 it gets the "segmentation fault (core dumped)".
So my question is the following: is it possible to have critical differences between the host and a Docker container even if they have exactly the same packages installed? Especially in relation to TensorFlow and the GPU. The error must come from outside the code, given that the code works in a similar environment.
Hope I explained myself well enough to get the idea of my question across, thank you.
A Docker image will resolve its system calls, at runtime, through the host kernel.
See "How can Docker run distros with different kernels?".
In your case, the error is:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1, SSE4.2
See "How to compile Tensorflow with SSE4.2 and AVX instructions?"
(referenced by tensorflow/tensorflow issue 8037)
You could try to build an image from TensorFlow built from source, using a Docker multi-stage build.
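A rough sketch of such a multi-stage Dockerfile, under some assumptions: the devel image tag, the source path and the bazel flags vary between TensorFlow versions, and the Python version in the final stage must match the one used by the builder.

# Stage 1: build TensorFlow from source with the instruction sets the warning mentions.
FROM tensorflow/tensorflow:devel AS builder
WORKDIR /tensorflow_src
# Accept ./configure's defaults non-interactively, then build a pip wheel with SSE4.x enabled.
RUN yes "" | ./configure && \
    bazel build --config=opt --copt=-msse4.1 --copt=-msse4.2 \
        //tensorflow/tools/pip_package:build_pip_package && \
    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pkg

# Stage 2: a lean runtime image that only receives the built wheel, not the build toolchain.
FROM python:3.10-slim
COPY --from=builder /tmp/pkg/*.whl /tmp/
RUN pip install --no-cache-dir /tmp/*.whl

The point of the second stage is that the bazel toolchain and the source tree stay in the builder image, so the final image only carries the optimized wheel.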
Related
I've got multiple Docker containers that host some Flask apps which run some machine learning services. Let's say container 1 is using PyTorch, and container 2 is also using PyTorch. When I build the images, PyTorch takes up disk space in both. For some reason we split these two services into different containers; if I insist on keeping it this way, is it possible to build PyTorch only once so that both containers can import it? Thanks in advance, appreciate any help and suggestions!
You can build one Docker image and install PyTorch on it, then use that image as the base image for the two services. This way, PyTorch only takes up disk space once, and you save time by not installing PyTorch twice.
You can also build only one image and copy your code into two different directories, for example /app1 and /app2. Then, in your Docker Compose file, change the working directory for each app.
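A minimal sketch of the shared-base approach; the image name, directory and entry script below are placeholders, not anything from the original setup.

Base Dockerfile, built once, e.g. as my-pytorch-base:

FROM python:3.10-slim
# PyTorch is installed only here, in the shared base layer.
RUN pip install --no-cache-dir torch

Dockerfile for each service, which reuses that layer instead of installing PyTorch again:

FROM my-pytorch-base
COPY . /app
WORKDIR /app
CMD ["python", "service.py"]

Because image layers are content-addressed, the PyTorch layer is stored on disk once and shared by both service images.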
Let's say that I make an image for an OS that uses a kernel of version 10. What behavior does Docker exhibit if I run a container for that image on a host OS running a kernel of version 9? What about version 11?
Does the backward compatibility of the versions matter? I'm asking out of curiosity because the documentation only talks about "minimum Linux kernel version", etc. This sounds like it doesn't matter what kernel version the host is running beyond that minimum. Is this true? Are there caveats?
Let's say that I make an image for an OS that uses a kernel of version 10.
I think this is a bit of a misconception, unless you are talking about specific software that relies on newer kernel features inside your Docker image, which should be pretty rare. Generally speaking, a Docker image is just a custom file/directory structure, assembled in layers via FROM and RUN instructions in one or more Dockerfiles, with a bit of metadata like which ports to open or which file to execute on container start. That's really all there is to it. The basic principle of Docker is very much like a classic chroot jail, only a bit more modern and with some candy on top.
What behavior does Docker exhibit if I run a container for that image on a host OS running a kernel of version 9? What about version 11?
If the kernel can run the Docker daemon it should be able to run any image.
Are there caveats?
As noted above, Docker images that include software which relies on bleeding edge kernel features will not work on kernels that do not have those features, which should be no surprise. Docker will not stop you from running such an image on an older kernel, as it simply does not care what's inside an image, nor does it know what kernel was used to create the image.
The only other thing I can think of is compiling software manually with aggressive optimizations for a specific CPU (Intel or AMD, for example). Such images will fail on hosts whose CPU lacks those instructions.
Docker's behaviour is no different: it doesn't concern itself (directly) with the behaviour of the containerized process. What Docker does do is set up various parameters (root filesystem, other mounts, network interfaces and configuration, separate namespaces or restrictions on what PIDs can be seen, etc.) for the process that let you consider it a "container," and then it just runs the initial process in that environment.
The specific software inside the container may or may not work with your host operating system's kernel. Using a kernel older than the software was built for is not infrequently problematic; more often it's safe to run older software on a newer kernel.
More often, but not always. On a host with kernel 4.19 (e.g. Ubuntu 18.04) try docker run centos:6 bash. You'll find it segfaults (exit code 139) because that old build of bash does something that greatly displeases the newer kernel. (On a 4.9 or lower kernel, docker run centos:6 bash will work fine.) However, docker run centos:6 ls will not die in the same way because that program is not dependent on particular kernel facilities that have changed (at least, not when run with no arguments).
This sounds like it doesn't matter what kernel version the host is running beyond that minimum. Is this true?
As long as your kernel meets Docker's minimum requirements (which mostly involve having the necessary APIs to support the isolated execution environment that Docker sets up for each container), Docker doesn't really care what kernel you're running.
In many ways, this isn't entirely a Docker question: for the most part, user-space tools aren't tied particularly tightly to specific kernel versions. This isn't universally true; there are some tools that by design interact with a very specific kernel version, or that can take advantage of APIs in recent kernel versions for improved performance, but for the most part your web server or database just doesn't care.
Are there caveats?
The kernel version you're running may dictate things like which storage drivers are available to Docker, but this doesn't really have any impact on your containers.
Older kernel versions may have security vulnerabilities that are fixed in more recent versions, and newer versions may have fixes that offer improved performance.
I am using nvidia-docker to access GPUs from a docker container. However, not all of our machines have GPUs and I would like to automatically fall back to the CPU version when GPUs are not available.
Do I have to build separate docker images--one for CPU and one for GPU--or is it possible to install tensorflow and tensorflow-gpu and pick the right variant depending on whether a GPU is available?
The GPU version of tensorflow fails to load in the container when started using normal docker (as opposed to nvidia-docker) because the library libcuda.so.1 is missing. We managed to use the same image for different hosts in three steps:
1. Link the library stub /usr/local/cuda/lib64/stubs/libcuda.so to libcuda.so.1 in the same directory.
2. Add the stubs directory as a search path to /etc/ld.so.conf.d, with lower precedence than the directory in which libcuda.so.1 is mounted by nvidia-docker.
3. Call ldconfig to refresh the library cache.
If the image is used on a host without a GPU via normal docker, tensorflow loads the stub and places all ops on the CPU. If the image is used on a host with a GPU via nvidia-docker, tensorflow loads the mounted library and places appropriate ops on the GPU. Full example here.
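A Dockerfile sketch of those three steps; the base image, the stub path and the conf file name are assumptions, and the exact paths depend on your CUDA installation and nvidia-docker version.

FROM tensorflow/tensorflow:latest-gpu
# 1. Give the dynamic loader a libcuda.so.1 fallback by linking the CUDA stub.
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1
# 2. Register the stubs directory; a file name sorting late is meant to keep it at
#    lower precedence than the directory where nvidia-docker mounts the real library.
RUN echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf
# 3. Rebuild the dynamic linker cache so the stub is found when no real driver is mounted.
RUN ldconfig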
You might want to take a look at the official TensorFlow Docker images. The GPU version uses nvidia-docker to access the GPU.
What I've done in the past is have two nearly identical Dockerfiles. The only difference in the Dockerfiles would be the FROM directive:
FROM tensorflow/tensorflow:latest-py3
OR
FROM tensorflow/tensorflow:latest-gpu-py3
(you could also choose the Python2 image if you want)
Everything else would be the same, and you could even automate this so that the appropriate FROM tag is set when you build the image; one way is sketched below. I have used makefiles to build the appropriate image depending on whether the host machine has a GPU or not.
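A sketch of that automation using a build argument in the FROM line; TF_TAG, /app and train.py are placeholders, not anything from the original setup.

ARG TF_TAG=latest-py3
FROM tensorflow/tensorflow:${TF_TAG}
# The rest of the Dockerfile is identical for both variants.
COPY . /app
WORKDIR /app
CMD ["python", "train.py"]

The makefile (or a shell script) then picks the tag, e.g. docker build --build-arg TF_TAG=latest-gpu-py3 -t myapp:gpu . on machines with a GPU, and the default CPU tag everywhere else.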
I am experimenting with distributed tensorflow and an example project.
Running the project in the same Docker container seems to work well. As soon as you run the application in different containers, they cannot connect to each other.
I don't really know the cause, but I think this is because Docker and TensorFlow each open ports which have to be concatenated to connect to the application, like localhost:[docker-port]:[tf-port].
Do you think my guess is correct? And how can I solve this problem?
Can issues result if a Docker image requires a kernel feature not provided by the host OS kernel (e.g. an image which requires a very specific kernel version)? Is this issue guaranteed to be prevented in some way?
Can issues result if a docker image requires a kernel feature not provided by host OS kernel
Yes, but note that the Docker installation page recommends a minimum kernel version for Docker itself to run.
For instance on RedHat "your kernel must be 3.10 at minimum".
If the image you run requires more recent kernel features, it won't work even though docker itself will.
Is this issue guaranteed to be prevented in some way?
Not really, as illustrated in "Docker - the pain of finding the right distribution+kernel+hardware combination".
As noted in "Can a docker image based on Ubuntu run in Redhat?"
Most Linux kernel variants are sufficiently similar that applications will not notice. However if the code relies on something specific in the kernel that is not there, Docker can't help you.
Plus:
system architecture is a limitation.
x86_64 images won't run on ARM, for example; i.e., you won't run the official ubuntu image on a Raspberry Pi.