pip unable to fetch wheels when inside container - docker

When I try to build an image which includes installing Python modules via pip, pip in the build container only uses source distributions and therefore always compiles all modules from source, which is extremely annoying...
How can I get pip inside the container to use wheels just like the host system does?
When run from the host system
# pip3 -V
pip 18.1 from /usr/lib/python3/dist-packages/pip (python 3.7)
# pip3 install numpy
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting numpy
Using cached https://www.piwheels.org/simple/numpy/numpy-1.21.2-cp37-cp37m-linux_armv7l.whl
When run inside docker run -it python:3.7 /bin/bash
# pip3 -V
pip 21.2.4 from /usr/local/lib/python3.7/site-packages/pip (python 3.7)
# pip3 install numpy
Collecting numpy
Downloading numpy-1.21.2.zip (10.3 MB)
When I run everything as a single command without a shell, it works too
# docker run -it python:3.8 python3 -m pip install -U pip wheel setuptools && python3 -m pip install Pillow
Requirement already satisfied: pip in /usr/local/lib/python3.8/site-packages (21.2.4)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/site-packages (0.37.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/site-packages (57.4.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting Pillow
Downloading https://www.piwheels.org/simple/pillow/Pillow-8.3.1-cp37-cp37m-linux_armv7l.whl (1.3 MB)
Things I've tried:
Run the container as privileged
Use the pip flag --no-binary=:all:
System information:
Raspberry Pi 4
Linux 5.10.17-v7l+ armv7l GNU/Linux
Raspbian GNU/Linux 10 (buster)
ARMv7 Processor rev 3 (v7l)

Notice that you're using https://www.piwheels.org/, which only provides wheels for the Raspberry Pi. If you open https://www.piwheels.org/simple/numpy/ you can see there are only armv6l and armv7l wheels.
Maybe in the container you forgot to set the extra index URL to https://www.piwheels.org/simple?
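For example, a minimal sketch of pointing pip at piwheels inside the image (assuming the piwheels cp37 wheels are compatible with the Python in the python:3.7 base image; numpy is just an example package):
FROM python:3.7
# make pip also search piwheels, which carries prebuilt armv6l/armv7l wheels
RUN printf '[global]\nextra-index-url=https://www.piwheels.org/simple\n' > /etc/pip.conf
RUN pip3 install numpy
The same can be done for a one-off container with pip3 install --extra-index-url https://www.piwheels.org/simple numpy.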

Related

Docker ssh-agent forwarding breaks when multiple repos installed via requirements.txt

My team is using docker build secrets to forward our ssh-agent during a docker image build. We are running Docker Desktop on Mac, version 20.10.22, and we use the following step to install all image dependencies:
RUN --mount=type=ssh,uid=50000 pip install --no-warn-script-location --user -r requirements.txt -c constraints.txt
Our requirements.txt has pip installing multiple repos via git+ssh, and we are having intermittent exchange identification errors depending on the number of repos included and the order of their installation:
ssh_exchange_identification: Connection closed by remote host
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
This same step runs successfully if we install the dependencies one-by-one:
RUN --mount=type=ssh,uid=50000 xargs -L 1 pip install --no-warn-script-location --user -c constraints.txt < requirements.txt
Installing one-by-one is not ideal, because it does not allow the dependency resolver to work on all requirements at once, including dependencies that are common to several packages. We believe this issue may come from a break in Docker ssh-agent forwarding when pip spawns subprocesses to clone the entries of our requirements.txt.
Has anyone run into this issue or know of any workarounds? Thank you!
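One workaround we are experimenting with (a sketch only, assuming the drops come from the many short-lived SSH connections pip's git clones open; the multiplexing settings are a guess, not a confirmed fix) is to have git reuse a single multiplexed SSH connection for all clones via GIT_SSH_COMMAND:
# reuse one multiplexed SSH connection per host for all of pip's git clones,
# so each clone does not start a fresh ssh exchange
RUN --mount=type=ssh,uid=50000 \
    mkdir -p /tmp/ssh-mux && \
    GIT_SSH_COMMAND='ssh -o ControlMaster=auto -o ControlPath=/tmp/ssh-mux/%r@%h:%p -o ControlPersist=60s' \
    pip install --no-warn-script-location --user -r requirements.txt -c constraints.txt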

How to pack and transport only the delta of a container?

I have the following scenario:
A Docker or Podman container is set up and deployed to several production instances that are NOT connected to the internet.
A new release has been developed that needs only one new package, e.g. a Python module of a few kilobytes in size.
The new package is installed in the dev container, and the Dockerfile has been updated to also install the latest module (just for documentation, because the target system cannot reach docker.io).
We have packed the new container release, which is more than a gigabyte in size, and could transport the new container to the target environments.
My question is: is there a way, to pack, create and transport only a delta of the container compared to the previously deployed version?
podman version 3.4.7
echo "\
FROM jupyter/scipy-notebook
USER root
RUN apt-get update && apt-get install --no-install-recommends -y mupdf-tools python3-dev
USER user
RUN pip -V
RUN pip install fitz==0.0.1.dev2
RUN pip install PyMuPDF==1.20.2
RUN pip install seaborn
RUN pip install openpyxl==3.0.10
RUN pip install flask==2.1.3
" > sciPyDockerfile
podman build --tag python_runner -f ./sciPyDockerfile
sudo podman save -o python_runner.tar python_runner
gzip python_runner.tar
The result is a file
1.1G Nov 28 15:27 python_runner.tar.gz
Is there any way to pack the delta only?
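One approach that might work (a sketch only, not verified in this setup; registry.internal:5000 and the tags are placeholders): let an image registry compute the delta for you. Pushes and pulls only transfer layers the other side does not already have, so if the new pip install is appended as its own final layer, only that small layer moves.
# keep the existing Dockerfile lines unchanged and append the new module as a new layer
echo "RUN pip install some-new-module==1.0.0" >> sciPyDockerfile
podman build --tag python_runner:v2 -f ./sciPyDockerfile
# pushing to a registry that already holds v1 transfers only the new layer
# (--tls-verify=false only if the internal registry has no TLS)
podman push --tls-verify=false python_runner:v2 registry.internal:5000/python_runner:v2
This requires a registry that is reachable from, or can be carried into, the offline environment.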

Add packages from requirements.txt to Docker image to minimize cold start time on EC2?

When deploying a machine learning model on EC2 from a Docker image, the cold start time is high because the instance downloads and installs the packages from requirements.txt at startup, even though the Dockerfile contains a pip install step for all of these packages.
Some sample output when booting up:
2021-11-21 05:28:57.632740:cortex:pid-1:INFO:downloading the project code
2021-11-21 05:28:57.746448:cortex:pid-1:INFO:downloading the python serving image
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting imutils
Downloading imutils-0.5.4.tar.gz (17 kB)
Collecting tensorflow==2.4.1
Downloading tensorflow-2.4.1-cp36-cp36m-manylinux2010_x86_64.whl (394.3 MB)
Collecting opencv-python==4.1.2.30
Downloading opencv_python-4.1.2.30-cp36-cp36m-manylinux1_x86_64.whl (28.3 MB)
Collecting pillow==7.0.0
Downloading Pillow-7.0.0-cp36-cp36m-manylinux1_x86_64.whl (2.1 MB)
Collecting flask-cors==3.0.8
Downloading Flask_Cors-3.0.8-py2.py3-none-any.whl (14 kB)
Requirement already satisfied: boto3 in /opt/conda/envs/env/lib/python3.6/site-packages (from -r /mnt/project/requirements.txt (line 6)) (1.13.7)
Collecting torch==1.8.1+cu101
Downloading https://download.pytorch.org/whl/cu101/torch-1.8.1%2Bcu101-cp36-cp36m-linux_x86_64.whl (763.6 MB)
Collecting torchvision==0.9.1+cu101
Downloading https://download.pytorch.org/whl/cu101/torchvision-0.9.1%2Bcu101-cp36-cp36m-linux_x86_64.whl (17.3 MB)
Rather than download and install these files for each EC2 instance launched, is it possible to do this once and incorporate the files into the Docker image during the Docker build process?
Dockerfile
FROM nvidia/cuda:11.4.0-runtime-ubuntu18.04
WORKDIR /usr/src/app
RUN apt-get -y update && \
apt-get install -y --fix-missing \
build-essential \
cmake \
python3 \
python3-pip \
ffmpeg \
libsm6 \
libxext6 \
&& apt-get clean && rm -rf /tmp/* /var/tmp/*
ADD ./requirements.txt ./
# install our dependencies
RUN python3 -m pip install --upgrade pip && python3 -m pip install -r requirements.txt && apt-get clean && rm -rf /tmp/* /var/tmp/*
ADD ./ ./
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
EXPOSE 8080
CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8080","--log-level", "debug"]
requirements.txt
torch==1.9.0
ffmpeg_python==0.1.17
fastai==1.0.51
boto3==1.18.15
botocore==1.21.15
scikit_image==0.17.2
requests==2.26.0
torchvision==0.10.0
opencv_python==4.5.3.56
starlette==0.14.2
scipy==1.5.4
numpy==1.19.5
fastapi==0.68.0
ffmpeg==1.4
ipython==7.16.1
Pillow==8.3.1
tensorboardX==2.4
uvicorn
python-multipart
youtube_dl==2021.6.6
uvloop
I think the workflow should change as follows.
AS-IS:
(Local)
write code and Dockerfile
=> push to remote git server
(EC2)
pull git repo
=> run Dockerfile
TO-BE:
(Local)
write code and Dockerfile
=> push to remote git server
=> build Docker image (by GitHub Action or manually)
=> push Docker image to a container registry (by GitHub Action or manually)
(EC2)
pull Docker image
=> run Docker image
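As a rough illustration of the TO-BE flow (the account ID, region and repository name are placeholders, and ECR is just one possible registry):
# build the image once, with the pip install of requirements.txt baked into a layer
docker build -t my-ml-api .
# authenticate to ECR and push the image
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker tag my-ml-api:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-api:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-api:latest
# on the EC2 instance: pull and run the prebuilt image instead of rebuilding it
docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-api:latest
docker run -p 8080:8080 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-ml-api:latest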
Doing some guesswork from the info you are providing, it seems that you are using Cortex to deploy your workloads. You are trying to preinstall your Python dependencies in your container so that you don't have to download them every time your container starts. The log lines come from your application inside the container. Although you have already installed the packages in the container OS, Cortex most probably uses virtual environments to separate jobs and downloads the requirements again. This is how virtual environments work, and it is actually the only way the EC2 images could support different jobs with different requirements each.
Provide some more details to get any further assistance.
This is not a complete answer, but I hope it will be of help.
According to the error trace you provided in your question, and as pointed out in other answers, you seem to be using Cortex.
Please consider reviewing this GitHub issue. It describes the behavior you are indicating. Consider especially the first comment from David Eliahu:
"downloading the python serving image" refers to downloading the docker image for the API.
By default, the image is hosted on quay.io, and likely has dependencies installed which you do not need. So, the best way to speed this up is to make the image smaller, and host the image on ECR in the same region as your cluster (assuming you're running on AWS). Here is our documentation for both of these approaches: https://docs.cortex.dev/workloads/managing-dependencies/images
The provided link is broken, although the colocated image approach is still suggested in the product Production Guide and especially in this article.
Unfortunately, it only provides advice about how to minimize the image pull time but it says nothing about how to speed up the actual container boot process.
The aforementioned issue talks about optimizing the image by making it smaller, but that seems to be no longer described in the documentation.
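For completeness, a rough sketch of the usual image-slimming steps (generic Docker practice, not taken from the Cortex documentation):
# avoid keeping pip's download cache in the image layer
RUN python3 -m pip install --no-cache-dir -r requirements.txt
# install apt packages and remove the package lists within the same layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg libsm6 libxext6 && \
    rm -rf /var/lib/apt/lists/*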

"No module named PIL" after "RUN pip3 install Pillow" in docker container; neither PIL nor Pillow present in dist-packages directory

I'm following this SageMaker guide and using the 1.12 CPU Dockerfile.
https://github.com/aws/sagemaker-tensorflow-serving-container
If I use the requirements.txt file to install Pillow, my container works great locally, but when I deploy to SageMaker, 'pip3 install' fails with an error indicating my container doesn't have internet access.
To work around that issue, I'm trying to install Pillow in my container before deploying to SageMaker.
When I include the lines "RUN pip3 install Pillow" and "RUN pip3 show Pillow" in my Dockerfile, I see output during the build saying "Successfully installed Pillow-6.2.0", and the show command indicates the lib was installed at /usr/local/lib/python3.5/dist-packages. Also, running "RUN ls /usr/local/lib/python3.5/dist-packages" in the Dockerfile shows "PIL" and "Pillow-6.2.0.dist-info" in dist-packages, and the PIL directory includes many code files.
However, when I run my container locally, trying to import in python using "from PIL import Image" results in error "No module named PIL". I've tried variations like "import Image", but PIL doesn't seem to be installed in the context in which the code is running when I start the container.
Before the line "from PIL import Image", I added "import subprocess" and 'print(subprocess.check_output("ls /usr/local/lib/python3.5/dist-packages".split()))'
This ls output matches what I get when running it in the Dockerfile, except "PIL" and "Pillow-6.2.0.dist-info" are missing. Why are those two in /usr/local/lib/python3.5/dist-packages when I run the Dockerfile but not when my container is started locally?
Is there a better way to include Pillow in my container? The referenced GitHub page also shows that I can deploy libraries by including the files (in code/lib of the model package), but to get files compatible with Ubuntu 16.04 (which the Docker container uses; I'm on a Mac), I'd probably have to copy them from the Docker container after running "RUN pip3 install Pillow" in my Dockerfile, and it seems odd that I would need to get files from the Docker container to deploy to the Docker container.
My docker file looks like:
ARG TFS_VERSION
FROM tensorflow/serving:${TFS_VERSION} as tfs
FROM ubuntu:16.04
LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
COPY --from=tfs /usr/bin/tensorflow_model_server /usr/bin/tensorflow_model_server
# nginx + njs
RUN \
apt-get update && \
apt-get -y install --no-install-recommends curl && \
curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - && \
echo 'deb http://nginx.org/packages/ubuntu/ xenial nginx' >> /etc/apt/sources.list && \
apt-get update && \
apt-get -y install --no-install-recommends nginx nginx-module-njs python3 python3-pip python3-setuptools && \
apt-get clean
RUN pip3 install Pillow
# cython, falcon, gunicorn, tensorflow-serving
RUN \
pip3 install --no-cache-dir cython falcon gunicorn gevent requests grpcio protobuf tensorflow && \
pip3 install --no-dependencies --no-cache-dir tensorflow-serving-api
COPY ./ /
ARG TFS_SHORT_VERSION
ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
ENV PATH "$PATH:/sagemaker"
RUN pip3 show Pillow
RUN ls /usr/local/lib/python3.5/dist-packages
I've tried installing Pillow on the same line as cython and other dependencies, but the result is the same...those dependencies are in /usr/local/lib/python3.5/dist-packages both at the time the container is built and when the container is started locally, while "PIL" and "Pillow-6.2.0.dist-info" are only present when the container is built.
Apologies for the late response.
If I use the requirements.txt file to install Pillow, my container works great locally, but when I deploy to SageMaker, 'pip3 install' fails with an error indicating my container doesn't have internet access.
If restricted internet access isn't a requirement, then you should be able to enable internet access by setting enable_network_isolation=False when instantiating your Model class in the SageMaker Python SDK, as shown here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L85
If restricted internet access is a requirement, this means that you will need to either install your dependencies in your own container beforehand or make use of the packaging as you mentioned in your correspondence.
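For reference, a minimal sketch of what that could look like with a recent SageMaker Python SDK (v2-style argument names; the image URI, model data path, role and instance type are placeholders):
from sagemaker.model import Model

# enable_network_isolation=False leaves outbound network access enabled,
# so a requirements.txt-based install can reach the internet again
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-tfs-container:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role="MySageMakerExecutionRole",
    enable_network_isolation=False,
)
model.deploy(initial_instance_count=1, instance_type="ml.m5.large")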
I have copied your provided Dockerfile and built an image from it in order to reproduce the error you are seeing. I was not able to reproduce the error quoted below:
However, when I run my container locally, trying to import in python using "from PIL import Image" results in error "No module named PIL". I've tried variations like "import Image", but PIL doesn't seem to be installed in the context in which the code is running when I start the container.
I created a similar Docker image and ran it as a container with the following command:
docker run -it --entrypoint bash <DOCKER_IMAGE>
from within the container I started a Python3 session and ran the following commands locally without error:
root@13eab4c6e8ab:/# python3 -s
Python 3.5.2 (default, Oct 8 2019, 13:06:37)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
Can you please provide the code for how you're starting your SageMaker jobs?
Please double check that the Docker image you have created is the one being referenced when starting your SageMaker jobs.
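For example, a quick way to check from the host (the image name is a placeholder):
# prints the installed Pillow version, or fails if PIL is missing from the image
docker run --rm <DOCKER_IMAGE> python3 -c "import PIL; print(PIL.__version__)"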
Please let me know if there is anything I can clarify.
Thanks!

Unable to upgrade pip in docker build

When running the Docker build (using Jenkins CI), it fails on upgrading pip (the last line of the Dockerfile). I need it to upgrade to version 8.1.1, as suggested in the log, because my deploy fails on a pip version mismatch.
Dockerfile
FROM ubuntu:14.04
FROM python:3.4
# Expose a port for gunicorn to listen on
EXPOSE 8002
# Make a workdir and virtualenv
WORKDIR /opt/documents_api
# Install everything else
ADD . /opt/documents_api
# Set some environment variables for PIP installation and db management
ENV CQLENG_ALLOW_SCHEMA_MANAGEMENT="True"
RUN apt-get update
RUN apt-get install -y python3-pip
RUN pip3 install --upgrade pip
Here's the error:
Step 15 : RUN pip3 install --upgrade pip
19:46:00 ---> Running in 84e2bcc850c0
19:46:04 Collecting pip
19:46:04 Downloading pip-8.1.1-py2.py3-none-any.whl (1.2MB)
19:46:04 Installing collected packages: pip
19:46:04 Found existing installation: pip 7.1.2
19:46:04 Uninstalling pip-7.1.2:
19:46:05 Successfully uninstalled pip-7.1.2
19:46:10 Exception:
19:46:10 Traceback (most recent call last):
19:46:10 File "/usr/local/lib/python3.4/shutil.py", line 424, in _rmtree_safe_fd
19:46:10 os.unlink(name, dir_fd=topfd)
19:46:10 FileNotFoundError: [Errno 2] No such file or directory: 'pip'
19:46:10 You are using pip version 7.1.2, however version 8.1.1 is available.
When you use two FROM directives, Docker creates two output images; that's why it's messed up.
First, remove FROM ubuntu:14.04, and don't apt-get update in a Dockerfile; it's a bad practice (your image will be different every time you build, defeating the whole purpose of containers/Docker).
Second, you can check the official python images' Dockerfiles to know which version of pip is installed; for example, python:3.4 already ships pip 8.1.1.
Third, there is a special image for your case (an external application): python:3.4-onbuild. Your Dockerfile can be reduced to:
FROM python:3.4-onbuild
ENV CQLENG_ALLOW_SCHEMA_MANAGEMENT="True"
EXPOSE 8002
CMD python myapp.py
One last thing: try to use Alpine-based images; they're much smaller (for Python, almost 10 times smaller than the Ubuntu-based ones).
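For instance, a non-onbuild Alpine variant might look roughly like this (a sketch only, assuming the project has a requirements.txt, which the onbuild image also expects):
FROM python:3.4-alpine
ENV CQLENG_ALLOW_SCHEMA_MANAGEMENT="True"
EXPOSE 8002
WORKDIR /opt/documents_api
# install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD python myapp.py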
Turns out the host I was running on had no outside (internet) access, so the upgrade was failing. We solved it by adding another package to the DTR that had the necessary version in it.
Use /usr/bin/ to run pip. Example:
/usr/bin/pip install --upgrade pip
Running this command solved the same problem for me (Python 3.9):
RUN /usr/local/bin/python -m pip install --upgrade pip
