"No module named PIL" after "RUN pip3 install Pillow" in docker container; neither PIL nor Pillow present in dist-packages directory - docker

I'm following this SageMaker guide and using the 1.12 cpu docker file.
https://github.com/aws/sagemaker-tensorflow-serving-container
If I use the requirements.txt file to install Pillow, my container works great locally, but when I deploy to SageMaker, 'pip3 install' fails with an error indicating my container doesn't have internet access.
To work around that issue, I'm trying to install Pillow in my container before deploying to SageMaker.
When I include the lines "RUN pip3 install Pillow" and "RUN pip3 show Pillow" in my docker file, when building, I see output saying "Successfully installed Pillow-6.2.0" and the show command indicates the lib was installed at /usr/local/lib/python3.5/dist-packages. Also running "RUN ls /usr/local/lib/python3.5/dist-packages" in the docker files shows "PIL" and "Pillow-6.2.0.dist-info" in dist-packages, and the PIL directory includes many code files.
However, when I run my container locally, trying to import in python using "from PIL import Image" results in error "No module named PIL". I've tried variations like "import Image", but PIL doesn't seem to be installed in the context in which the code is running when I start the container.
Before the line "from PIL import Image", I added "import subprocess" and 'print(subprocess.check_output("ls /usr/local/lib/python3.5/dist-packages".split()))'
This ls output matches what I get when running it in the docker file, except "PIL" and "Pillow-6.2.0.dist-info" are missing. Why are those two in /usr/local/lib/python3.5/dist-packages when I run the docker file but not when my container is started locally?
Is there a better way to include Pillow in my container? The referenced Github page also shows that I can deploy libraries by including the files (in code/lib of model package), but to get files compatible with Ubuntu 16.04 (which the docker container uses; I'm on a Mac), I'd probably copy them from the docker container after running "RUN pip3 install Pillow" in my docker file, and it seems odd that I would need to get files from the docker container to deploy to the docker container.
My docker file looks like:
ARG TFS_VERSION
FROM tensorflow/serving:${TFS_VERSION} as tfs
FROM ubuntu:16.04
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
COPY --from=tfs /usr/bin/tensorflow_model_server /usr/bin/tensorflow_model_server
# nginx + njs
RUN \
apt-get update && \
apt-get -y install --no-install-recommends curl && \
curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - && \
echo 'deb http://nginx.org/packages/ubuntu/ xenial nginx' >> /etc/apt/sources.list && \
apt-get update && \
apt-get -y install --no-install-recommends nginx nginx-module-njs python3 python3-pip python3-setuptools && \
apt-get clean
RUN pip3 install Pillow
# cython, falcon, gunicorn, tensorflow-serving
RUN \
pip3 install --no-cache-dir cython falcon gunicorn gevent requests grpcio protobuf tensorflow && \
pip3 install --no-dependencies --no-cache-dir tensorflow-serving-api
COPY ./ /
ARG TFS_SHORT_VERSION
ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
ENV PATH "$PATH:/sagemaker"
RUN pip3 show Pillow
RUN ls /usr/local/lib/python3.5/dist-packages
I've tried installing Pillow on the same line as cython and other dependencies, but the result is the same...those dependencies are in /usr/local/lib/python3.5/dist-packages both at the time the container is built and when the container is started locally, while "PIL" and "Pillow-6.2.0.dist-info" are only present when the container is built.

Apologies for the late response.
> If I use the requirements.txt file to install Pillow, my container works great locally, but when I deploy to SageMaker, 'pip3 install' fails with an error indicating my container doesn't have internet access.
If restricted internet access isn't a requirement, you should be able to enable internet access by setting enable_network_isolation=False when instantiating your Model class in the SageMaker Python SDK, as shown here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L85
If restricted internet access is a requirement, this means that you will need to either install your dependencies in your own container beforehand or make use of the packaging as you mentioned in your correspondence.
I copied your Dockerfile, built an image from it, and ran that image as a container to try to reproduce the error you are seeing. I was not able to reproduce the error quoted below:
> However, when I run my container locally, trying to import in python using "from PIL import Image" results in error "No module named PIL". I've tried variations like "import Image", but PIL doesn't seem to be installed in the context in which the code is running when I start the container.
I created a similar Docker image and ran it as a container with the following command:
docker run -it --entrypoint bash <DOCKER_IMAGE>
From within the container, I started a Python 3 session and ran the following commands without error:
root@13eab4c6e8ab:/# python3 -s
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
Can you please provide the code for how you're starting your SageMaker jobs?
Please double check that the Docker image you have created is the one being referenced when starting your SageMaker jobs.
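One way to check which interpreter and search path the failing code actually sees is to log them just before the import. The following is a minimal sketch using only the standard library; the helper name describe_module is made up for illustration and is not part of SageMaker or the serving container.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable and where it would be loaded from."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        # origin is the file the import would load; None when the module is missing
        "origin": getattr(spec, "origin", None),
        "interpreter": sys.executable,
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    # Run this in the same process that fails on "from PIL import Image".
    for key, value in describe_module("PIL").items():
        print(key, "=", value)
```

If the interpreter or search path printed here differs from what the build-time "RUN pip3 show Pillow" reported, the running container is probably not using the image (or the Python installation) that was built.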
Please let me know if there is anything I can clarify.
Thanks!

Related

How to pack and transport only the delta of a container?

I have the following scenario:
A Docker or Podman container is set up and deployed to several production instances that are NOT connected to the internet.
A new release has been developed that needs only one new package, such as a Python module a few kilobytes in size.
The new package is installed in the dev container, and the Dockerfile has been updated to also load the latest module (just for documentation, because the target system cannot reach docker.io).
We have packed the new container release, which is more than a gigabyte in size, and could transport the whole container to the target environments.
My question is: is there a way to create, pack, and transport only the delta of the container relative to the previously deployed version?
podman version 3.4.7
echo "\
FROM jupyter/scipy-notebook
USER root
RUN apt-get update && apt-get install --no-install-recommends -y mupdf-tools python3-dev
USER user
RUN pip -V
RUN pip install fitz==0.0.1.dev2
RUN pip install PyMuPDF==1.20.2
RUN pip install seaborn
RUN pip install openpyxl==3.0.10
RUN pip install flask==2.1.3
" > sciPyDockerfile
podman build --tag python_runner -f ./sciPyDockerfile
sudo podman save -o python_runner.tar python_runner
gzip python_runner.tar
The result is a file
1.1G Nov 28 15:27 python_runner.tar.gz
Is there any way to pack the delta only?

Run protoc command into docker container

I'm trying to run the protoc command in a Docker container.
I've tried using the gRPC image, but the protoc command is not found:
/bin/sh: 1: protoc: not found
So I assume I have to install it manually using RUN instructions, but is there a better solution, such as an official precompiled image with protoc installed?
I've also tried to install it via a Dockerfile, but I still get protoc: not found.
This is my Dockerfile
#I'm not using "FROM grpc/node" because that image can't unzip
FROM node:12
...
# Download proto zip
ENV PROTOC_ZIP=protoc-3.14.0-linux-x86_32.zip
RUN curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.14.0/${PROTOC_ZIP}
RUN unzip -o ${PROTOC_ZIP} -d ./proto
RUN chmod 755 -R ./proto/bin
ENV BASE=/usr/local
# Copy into path
RUN cp ./proto/bin/protoc ${BASE}/bin
RUN cp -R ./proto/include/* ${BASE}/include
RUN protoc -I=...
I've added RUN echo $PATH to confirm that the folder is in the path, and it looks correct:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
I've also added RUN ls -la /usr/local/bin to check that the protoc file is in the folder, and it shows:
-rwxr-xr-x 1 root root 4849692 Jan 2 11:16 protoc
So the file is in the /usr/local/bin folder and the folder is in the path.
Have I missed something?
Also, is there a simple way to get an image with protoc installed, or is the best option to build my own image and pull it from my repository?
Thanks in advance.
Edit: Solved by downloading the linux-x86_64 zip file instead of x86_32. I downloaded the build with the lower architecture requirements, thinking that an x86_64 machine can run an x86_32 binary but not the other way around. I don't know whether I'm missing something about architecture requirements (probably) or whether it is a bug.
Anyway, in case it helps someone, I found the solution and have added an answer with the necessary Dockerfile to run protoc and protoc-gen-grpc-web.
The easiest way to get non-default tools like this is to install them through the underlying Linux distribution's package manager.
First, look at the Docker Hub page for the node image. (For "library" images like node, construct the URL https://hub.docker.com/_/node.) You'll notice there that there are several variations named "alpine", "buster", or "stretch"; plain node:12 is the same as node:12-stretch and node:12.20.0-stretch. The "alpine" images are based on Alpine Linux; the "buster" and "stretch" ones are different versions of Debian GNU/Linux.
For Debian-based packages, you can then look up the package on https://packages.debian.org/ (type protoc into the "Search the contents of packages" form at the bottom of the page). That leads you to the protobuf-compiler package. Knowing that contains the protoc binary, you can install it in your Dockerfile with:
# Debian-based
FROM node:12
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
protobuf-compiler
# The rest of your Dockerfile as above
COPY ...
RUN protoc ...
You generally must run apt-get update and apt-get install in the same RUN command, lest a subsequent rebuild get an old version of the package cache from the Docker build cache. I generally have only a single apt-get install command if I can manage it, with the packages listed alphabetically, one per line, for maintainability.
If the image is Alpine-based, you can do a similar search on https://pkgs.alpinelinux.org/contents to find protoc, and similarly install it:
FROM node:12-alpine
RUN apk add --no-cache protoc
# The rest of your Dockerfile as above
I finally solved my own issue.
The problem was the architecture of the download: I was using linux-x86_32.zip, but it works with linux-x86_64.zip.
Even though @David Maze's answer is very thorough, it didn't solve my problem, because apt-get installs version 3.0.0 and I wanted 3.14.0.
So, the Dockerfile I used to run protoc in a Docker container looks like this:
FROM node:12
...
# Download proto zip
ENV PROTOC_ZIP=protoc-3.14.0-linux-x86_64.zip
RUN curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v3.14.0/${PROTOC_ZIP}
RUN unzip -o ${PROTOC_ZIP} -d ./proto
RUN chmod 755 -R ./proto/bin
ENV BASE=/usr
# Copy into path
RUN cp ./proto/bin/protoc ${BASE}/bin/
RUN cp -R ./proto/include/* ${BASE}/include/
# Download protoc-gen-grpc-web
ENV GRPC_WEB=protoc-gen-grpc-web-1.2.1-linux-x86_64
ENV GRPC_WEB_PATH=/usr/bin/protoc-gen-grpc-web
RUN curl -OL https://github.com/grpc/grpc-web/releases/download/1.2.1/${GRPC_WEB}
# Copy into path
RUN mv ${GRPC_WEB} ${GRPC_WEB_PATH}
RUN chmod +x ${GRPC_WEB_PATH}
RUN protoc -I=...
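The 32-bit versus 64-bit mix-up above can be detected before running the binary. A likely explanation for the misleading "not found" message is that the image lacks the 32-bit loader the x86_32 binary needs, although the thread does not confirm this. The sketch below reads the ELF header to report the word size of an executable; the function name elf_class is hypothetical, and the example path is taken from the question.

```python
def elf_class(path):
    """Return '32-bit', '64-bit', or None if `path` is not an ELF file."""
    with open(path, "rb") as f:
        header = f.read(5)
    # ELF files start with the magic bytes 0x7f 'E' 'L' 'F';
    # the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return {1: "32-bit", 2: "64-bit"}.get(header[4])


if __name__ == "__main__":
    import os

    candidate = "/usr/local/bin/protoc"  # path used in the question
    if os.path.exists(candidate):
        print(candidate, "is", elf_class(candidate))
```

Running this inside the container against the copied protoc file should show whether the x86_32 or the x86_64 archive was unpacked.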
Because this is currently the highest-ranked result on Google and the instructions above won't work with docker/dind (e.g. for GitLab), here is how you can get the glibc dependency for protoc working there:
#!/bin/bash
# install gcompat, because protoc needs a real glibc or compatible layer
apk add gcompat
# install a recent protoc (use a version that fits your needs)
export PB_REL="https://github.com/protocolbuffers/protobuf/releases"
curl -LO $PB_REL/download/v3.20.0/protoc-3.20.0-linux-x86_64.zip
unzip protoc-3.20.0-linux-x86_64.zip -d $HOME/.local
export PATH="$PATH:$HOME/.local/bin"

How to retain cmake changes when building Docker image in Google Cloud Build?

I am working on a CI pipeline with Google Cloud Build to run tests on code stored in Cloud Source Repositories. As it stands, the pipeline uses the docker cloud builder to build an image with docker build. The building process takes nearly an hour to complete and it runs periodically. It builds the program and then runs a single test on it in one build step, this part works fine. What I want to do is build the program using cmake and make then store the image in the container registry so that I can run future tests from that image without having to spend the time building it before testing.
The issue is that when I run the custom image from the Container Registry in Cloud Build, it does not recognize the module that was built with cmake. The pipeline ran tests just fine when I built it then ran tests in the same build steps, but no longer recognizes the module when I run the image as a custom builder on Cloud Build.
The dockerfile used to build the image is as follows:
FROM ubuntu
ARG DEBIAN_FRONTEND=noninteractive
COPY . /app
WORKDIR /app
RUN apt-get update
RUN apt-get -y install python3.7
RUN apt-get -y install python3-pip
RUN pip3 install numpy
RUN pip3 install matplotlib
RUN pip3 install transitions
RUN pip3 install pandas
RUN apt-get install -y cmake
RUN apt-get install -y swig
RUN pip3 install conan
RUN ln -s ~/.local/bin/conan /usr/bin/conan
RUN apt-get install gcc
RUN cd ~
RUN python3 path/to/master_build.py
The master_build.py uses os.system commands to build the program. It calls a shell script to do the cmake process. The shell script is:
#!/bin/sh
mkdir dist3
cd dist3
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../src
make
cd ~
This builds the program no problem, then the python script calls other scripts to run basic tests, which all work fine when I do it in this build step. The issue is, when I use Cloud Build to run the custom image from container registry, it can no longer find the module that was built with CMake.
This is the cloudbuild config file that runs the custom image:
steps:
- name: 'gcr.io/$PROJECT_ID/built_images:latest'
  args: ['python3', 'path/to/run_tests.py']
I get a ModuleNotFoundError, which is weird because it worked fine when I ran the test script in the same build after calling cmake. I'm guessing that the file system is not being retained when I push the image to container registry and it can no longer find the dist3 folder.
How can I retain the dist3 folder when I am pushing the image to container registry?
Thanks!

Docker: Unable to edit code within image due to missing bash

I have Ubuntu 18 running on an AWS server. Within that server I have a Docker image that I want to change the code for while it is still running.
ubuntu@ip-172-31-6-79:~$ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED         SIZE
fc                         latest   20949d0fd7ec   7 days ago      1.74GB
debian                     latest   8d31923452f8   5 weeks ago     101MB
ekholabs/face-classifier   latest   b1a390b8ec60   21 months ago   1.77GB
In order to change the code I ran the following command
ubuntu@ip-172-31-6-79:~$ docker run -it fc bash
But I get the following error
python3: can't open file 'bash': [Errno 2] No such file or directory
How do I go about fixing this so I can edit the code within the Docker image? As a side note, here is the Dockerfile:
FROM debian:latest
RUN apt-get -y update && apt-get install -y git python3-pip python3-dev python3-tk vim procps curl
#Face classification dependencies & web application
RUN pip3 install numpy scipy scikit-learn pillow tensorflow pandas h5py opencv-python==3.2.0.8 keras statistics pyyaml pyparsing cycler matplotlib Flask
ADD . /ekholabs/face-classifier
WORKDIR ekholabs/face-classifier
ENV PYTHONPATH=$PYTHONPATH:src
ENV FACE_CLASSIFIER_PORT=8084
EXPOSE $FACE_CLASSIFIER_PORT
ENTRYPOINT ["python3"]
CMD ["src/web/faces.py"]
The problem is that your Dockerfile sets an
ENTRYPOINT ["python3"]
which means that when you run
docker run -it fc bash
the command becomes "python3 bash" inside the container, which is why you get the error
python3: can't open file 'bash': [Errno 2] No such file or directory
Try removing the ENTRYPOINT.
Hope that resolves the problem.
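To make the rule concrete, here is a small model of how Docker assembles the container's command line from an exec-form ENTRYPOINT, the CMD, and any arguments given after the image name. It is an illustration only, not Docker's own code, and the function name effective_command is made up.

```python
def effective_command(entrypoint, cmd, run_args=None):
    """Model how Docker builds the container's argv for exec-form ENTRYPOINT/CMD.

    Arguments given after the image name on `docker run` replace CMD,
    and the result is appended to ENTRYPOINT.
    """
    args = list(run_args) if run_args else list(cmd)
    return list(entrypoint) + args


# Values taken from the Dockerfile in the question.
ENTRYPOINT = ["python3"]
CMD = ["src/web/faces.py"]

print(effective_command(ENTRYPOINT, CMD))            # ['python3', 'src/web/faces.py']
print(effective_command(ENTRYPOINT, CMD, ["bash"]))  # ['python3', 'bash']
print(effective_command([], CMD, ["bash"]))          # ['bash']
```

If the ENTRYPOINT has to stay in the image, it can also be overridden for a single run with docker run -it --entrypoint bash fc.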

Docker-compose cannot find Java

I am trying to use a Python wrapper for a Java library called Tabula. I need both Python and Java images within my Docker container. I am using the openjdk:8 and python:3.5.3 images. I am trying to build the file using Docker-compose, but it returns the following message:
/bin/sh: 1: java: not found
when it reaches the line RUN java -version within the Dockerfile. The line RUN find / -name "java" also doesn't return anything, so I can't even find where Java is being installed in the Docker environment.
Here is my Dockerfile:
FROM python:3.5.3
FROM openjdk:8
FROM tailordev/pandas
RUN apt-get update && apt-get install -y \
python3-pip
# Create code directory
ENV APP_HOME /usr/src/app
RUN mkdir -p $APP_HOME/temp
WORKDIR /$APP_HOME
# Install app dependencies
ADD requirements.txt $APP_HOME
RUN pip3 install -r requirements.txt
# Copy source code
COPY *.py $APP_HOME/
RUN find / -name "java"
RUN java -version
ENTRYPOINT [ "python3", "runner.py" ]
How do I install Java within the Docker container so that the Python wrapper class can invoke Java methods?
This Dockerfile cannot work, because the multiple FROM statements at the beginning don't mean what you think they mean. They do not cause the contents of all the images named in the FROM statements to end up in the image you're building. Multiple FROM statements have actually meant two different things over the history of Docker:
In newer versions of Docker, they define multi-stage builds, which is a very different thing from what you're trying to achieve (but very interesting nonetheless).
In earlier versions of Docker, they gave you the ability to simply build multiple images in one Dockerfile.
The behavior you are describing makes me assume you are using such an earlier version. Let me explain what's actually happening when you run docker build on this Dockerfile:
FROM python:3.5.3
# Docker: "The User wants me to build an Image that is based on python:3.5.3. No Problem!"
# Docker: "Ah, the next FROM Statement is coming up, which means that the User is done with building this image"
FROM openjdk:8
# Docker: "The User wants me to build an Image that is based on openjdk:8. No Problem!"
# Docker: "Ah, the next FROM Statement is coming up, which means that the User is done with building this image"
FROM tailordev/pandas
# Docker: "The User wants me to build an Image that is based on tailordev/pandas. No Problem!"
# Docker: "A RUN Statement is coming up. I'll put this as a layer in the Image the user is asking me to build"
RUN apt-get update && apt-get install -y \
python3-pip
...
# Docker: "EOF Reached, nothing more to do!"
As you can see, this is not what you want.
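The same point as a short sketch: under either interpretation, the image you end up with is based only on the last FROM line. The helper below is a simplified illustration (the name final_base_image is made up, and it ignores options such as --platform); it is not how Docker parses Dockerfiles.

```python
def final_base_image(dockerfile_text):
    """Return the image named by the last FROM instruction, or None."""
    base = None
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        # Instruction names are case-insensitive; comment and blank lines never match.
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            base = parts[1]
    return base


# The FROM lines from the question, followed by the failing instruction.
DOCKERFILE = """\
FROM python:3.5.3
FROM openjdk:8
FROM tailordev/pandas
RUN java -version
"""

print(final_base_image(DOCKERFILE))  # tailordev/pandas
```

Since tailordev/pandas is the base of the final image, nothing from openjdk:8 is present when RUN java -version executes. In a multi-stage build, files from an earlier stage only reach the final image if they are copied explicitly with COPY --from.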
What you should do instead is build a single image where you will first install your runtimes (python, java, ..), and then your application specific dependencies. The last two parts you're already doing, here's how you could go about installing your general dependencies:
# Let's start from the Alpine Java Image
FROM openjdk:8-jre-alpine
# Install Python runtime
RUN apk add --update \
python \
python-dev \
py-pip \
build-base \
&& pip install virtualenv \
&& rm -rf /var/cache/apk/*
# Install your framework dependencies
RUN pip install numpy scipy pandas
# ... do the rest ...
Note that I haven't tested the above snippet, you may have to adapt a few things.

Resources