PyPy data science Docker image - docker

I have some data science projects running in Docker containers (I use k8s). I am trying to speed up my code by using PyPy as my interpreter, but this has been a nightmare.
My OS is Ubuntu 20.04.
The main libraries I need are:
SQLAlchemy
SciPy
gRPC
For gRPC I'm using grpclib, and for SciPy I'm installing it using the miniconda Docker image.
My final hurdle is installing psycopg2cffi to make SQLAlchemy work, but after a couple of all-nighters I still haven't managed to get it working. I can install it, but when I run my code I get a SCRAM authentication problem that I've seen others hit as well.
Is there a PyPy Dockerfile someone has already created that has data science libraries in it? It doesn't seem like something no one has tried to do before…
Here's my Dockerfile so far:
FROM conda/miniconda3 as base
# Set up a conda env with pypy3 as the interpreter
RUN conda create -c conda-forge -n pypy-env pypy python=3.8 -y
ENV PATH="/usr/local/envs/pypy-env/bin:$PATH"
RUN pypy -m ensurepip
RUN apt-get -y update && \
    apt-get -y install build-essential g++ python3-dev libpq-dev
# Install big/annoying libraries first
RUN pip install psycopg2cffi
RUN conda install scipy -y
RUN pip install numpy
WORKDIR /home
COPY ./core/requirements/requirements.txt .
COPY ./core/requirements/basic_requirements.txt .
RUN pip install -r ./requirements.txt

FROM python:3.8-slim as final
WORKDIR /home
# Carry the libpq shared libraries and the whole pypy env into the slim image
COPY --from=base /usr/lib/x86_64-linux-gnu/libpq* /usr/lib/x86_64-linux-gnu/
COPY --from=base /usr/local/envs/pypy-env /usr/local/envs/pypy-env
ENV PATH="/usr/local/envs/pypy-env/bin:$PATH"
COPY .env .env
COPY ./src/ .
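One thing worth ruling out with that SCRAM error: SCRAM-SHA-256 authentication was introduced in PostgreSQL 10, so the libpq that psycopg2cffi loads at runtime has to be new enough to support it. Copying libpq*.so files between images by hand can silently pin an old version; a minimal sketch of an alternative final stage, assuming a Debian-based image whose apt libpq5 is recent enough (not a guaranteed fix):
FROM python:3.8-slim as final
WORKDIR /home
# Install libpq from apt instead of copying .so files across stages;
# SCRAM-SHA-256 needs libpq from PostgreSQL 10 or newer
RUN apt-get update && \
    apt-get install -y --no-install-recommends libpq5 && \
    rm -rf /var/lib/apt/lists/*
COPY --from=base /usr/local/envs/pypy-env /usr/local/envs/pypy-env
ENV PATH="/usr/local/envs/pypy-env/bin:$PATH"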

Related

Workflow for building Python wheels in a multi-stage Dockerfile with pipenv

In order to keep the final Docker image small, my usual approach to building Python projects with binary dependencies is to build the pinned dependencies in a first stage and copy them into a final stage that lacks the build toolchain. Broadly:
FROM python:3 as builder
RUN apt-get update && apt-get install -y libfoo-dev libbar-dev
COPY constraints.txt /
RUN pip wheel \
    --constraint /constraints.txt \
    --wheel-dir /wheels \
    python-foo pyBar

FROM python:3-slim
RUN apt-get update && apt-get install -y libfoo libbar
COPY requirements.txt constraints.txt /
COPY --from=builder /wheels /wheels
RUN pip install \
    --requirement /requirements.txt \
    --constraint /constraints.txt \
    --only-binary :all: \
    --find-links /wheels
Now I am trying to do something similar on a project managed with pipenv, and I am quite astray as to how to achieve the same effect: pre-building, in a first stage, the few packages that lack a public wheel, at the versions pinned in the lockfile, and using them in a later pipenv install --deploy in the final stage.
Does this even make sense with the hash checking pipenv does? Is there any alternative to reduce the final image size? I'd like to avoid using a private index to store prebuilt wheels; I'd rather keep the solution contained in the Dockerfile.
Related question: How to make lightweight docker image for python app with pipenv
A solution is to install a full virtualenv and copy it, rather than only some wheels:
FROM python:3 as builder
RUN apt-get update && apt-get install -y libfoo-dev libbar-dev
RUN pip install pipenv
WORKDIR /app
COPY Pipfile* /app/
# An existing .venv directory makes pipenv create the virtualenv in-project
RUN mkdir /app/.venv
RUN pipenv install --deploy

FROM python:3-slim
RUN apt-get update && apt-get install -y libfoo libbar
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
ENV PATH=/app/.venv/bin:$PATH

Is it possible to create a base image from a file?

I have a repo with a few services, and in each service I have the following base code:
FROM python:3.8.13-slim-bullseye
WORKDIR /usr/app
RUN apt-get update
RUN apt-get install default-libmysqlclient-dev build-essential -y
RUN python -m pip install --upgrade pip
RUN pip install pipenv setuptools
This is a little slow to rebuild each time, and sometimes I need to drop all my images, so I'd like to know whether it's possible to create a Dockerfile as a base image and import it from another Dockerfile, so that these steps only have to be built once locally.
Thanks
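Yes. Build the shared steps once into a locally tagged image, then have each service's Dockerfile start FROM that tag. A minimal sketch, assuming you name the base image my-python-base (a made-up tag):
# Dockerfile.base -- build once with: docker build -f Dockerfile.base -t my-python-base .
FROM python:3.8.13-slim-bullseye
WORKDIR /usr/app
RUN apt-get update && \
    apt-get install -y default-libmysqlclient-dev build-essential
RUN python -m pip install --upgrade pip
RUN pip install pipenv setuptools
Each service's Dockerfile then shrinks to something like:
FROM my-python-base
COPY . .
RUN pipenv install --deploy
As long as the my-python-base tag exists in the local image cache, rebuilding a service skips all of the slow base steps.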

How can I reduce the size of the Docker image?

I'm new to Docker, and I created a Docker image with the Dockerfile below. It's used on a Raspberry Pi, so all the packages are needed. I've read articles about multi-stage Dockerfiles but didn't understand much of them. How can I reduce the size of the image to simplify deployment on the Raspberry Pi?
FROM continuumio/anaconda3:latest
RUN conda create -y -n dcase2020 python=3.7
SHELL ["conda", "run", "-n", "dcase2020", "/bin/bash", "-c"]
RUN conda install -c conda-forge vim -y
RUN conda install pyaudio
RUN pip install librosa
RUN conda install psutil
RUN pip install psds_eval
RUN conda install -y pandas h5py scipy \
    && conda install -y pytorch torchvision -c pytorch \
    && conda install -y pysoundfile youtube-dl tqdm -c conda-forge \
    && conda install -y ffmpeg -c conda-forge \
    && pip install dcase_util \
    && pip install sed-eval
EXPOSE 80
CMD ["bash"]
Thank you very much!
You are creating a new environment that probably only contains the requirements for your project, so there's no point in carrying the huge Anaconda base environment as extra weight. Instead, just switch to a Miniconda container like continuumio/miniconda3.
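A sketch of that swap, keeping the rest of the Dockerfile unchanged (untested; it assumes the same packages resolve on the Pi's architecture):
FROM continuumio/miniconda3:latest
RUN conda create -y -n dcase2020 python=3.7
SHELL ["conda", "run", "-n", "dcase2020", "/bin/bash", "-c"]
# ... the same conda/pip installs as in the question ...
# Clearing conda's download and package caches afterwards also trims the image
RUN conda clean -afy
EXPOSE 80
CMD ["bash"]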

How to run Jupyter Notebook using Docker with Ubuntu?

I am very new to Docker and could not figure out how to search Google to answer my question.
I am using Windows OS.
I've created a Docker image using
FROM python:3
RUN apt-get update && apt-get install -y python3-pip
COPY requirements.txt .
RUN pip install -r requirements.txt
RUN pip3 install jupyter
RUN useradd -ms /bin/bash demo
USER demo
WORKDIR /home/demo
ENTRYPOINT ["jupyter", "notebook", "--ip=0.0.0.0"]
and it worked fine. Now I've tried to build it again, but with different libraries in requirements.txt, and it fails to build with ERROR: Could not find a version that satisfies the requirement apturl==0.5.2. From what I can find, apturl seems to be an Ubuntu system package (shipped via apt rather than PyPI), so I think we need the Ubuntu OS to install it.
So my question is: how do you create a Jupyter notebook server using Docker with Ubuntu libraries? (I am using Windows OS.) Thanks!
Try upgrading pip:
RUN pip install -U pip
RUN pip3 install -r requirements.txt
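If the build succeeds, a hypothetical build-and-run sequence for the image above looks like this (jupyter-demo is a made-up tag); the -p flag publishes the notebook port so it's reachable from the Windows host's browser:
docker build -t jupyter-demo .
# Jupyter listens on 8888 by default; publish it to the host
docker run -p 8888:8888 jupyter-demo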

Using Pillow in Docker

I am not able to install Python's PIL module in Docker for some reason. Here's a description of what I have:
requirements.txt
Pillow
flask
redis
Dockerfile
FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirements.txt
CMD python app.py
app.py
import PIL
Commands
$ sudo docker build -t web .
Installing collected packages: Pillow, Werkzeug, MarkupSafe, Jinja2, itsdangerous, flask, redis
Successfully installed Jinja2-2.8 MarkupSafe-0.23 Pillow-2.9.0 Werkzeug-0.10.4 flask-0.10.1 itsdangerous-0.24 redis-2.10.3
---> 91dfb38bd480
Removing intermediate container 4e4ca5801814
Step 4 : CMD python app.py
---> Running in e71453f2fab6
---> d62996658bd6
Removing intermediate container e71453f2fab6
Successfully built d62996658bd6
$ sudo docker-compose up
Here's what I get:
Output
web_1 | File "app.py", line 1, in <module>
web_1 | import PIL
web_1 | ImportError: No module named PIL
I thought maybe adding PIL to requirements.txt would work, but here's what happens when I build:
$ sudo docker build -t web .
....
Collecting PIL (from -r requirements.txt (line 1))
Could not find a version that satisfies the requirement PIL (from -r requirements.txt (line 1)) (from versions: )
Some externally hosted files were ignored as access to them may be unreliable (use --allow-external PIL to allow).
No matching distribution found for PIL (from -r requirements.txt (line 1))
Any idea what should be done from here?
Add RUN apk add zlib-dev jpeg-dev gcc musl-dev to the Dockerfile (apk is for Alpine-based images) and then add Pillow to requirements.txt.
PIL is the Python Imaging Library.
(Sometimes you need import Image instead of import PIL.)
According to "How do I install python imaging library (PIL)?", you need to install other components as well:
sudo apt-get build-dep python-imaging
sudo apt-get install libjpeg62 libjpeg62-dev
pip install PIL
See also a5huynh/scrapyd-playground/Dockerfile for an example using Pillow (Python Imaging Library) dependencies.
(But be aware, as Hugo comments below, that this mixes two modules: PIL and Pillow.
Pillow is a maintained fork and a drop-in replacement for the original, unmaintained PIL, so you shouldn't have both installed at the same time.)
RUN apt-get update && apt-get install -y \
    python-dev python-pip python-setuptools \
    libffi-dev libxml2-dev libxslt1-dev \
    libtiff4-dev libjpeg8-dev zlib1g-dev libfreetype6-dev \
    liblcms2-dev libwebp-dev tcl8.5-dev tk8.5-dev python-tk
# Add the dependencies to the container and install the python dependencies
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt && rm /tmp/requirements.txt
RUN pip install Pillow
with requirements:
Pillow==2.6.1
Scrapy==0.24.4
Twisted==14.0.2
boto==2.36.0
cffi==0.8.6
characteristic==14.2.0
cryptography==0.6.1
cssselect==0.9.1
lxml==3.4.0
pyOpenSSL==0.14
pyasn1==0.1.7
pyasn1-modules==0.0.5
pycparser==2.10
pymongo==2.8
queuelib==1.2.2
scrapy-mongodb==0.8.0
scrapyd==1.0.1
service-identity==14.0.0
six==1.8.0
w3lib==1.10.0
zope.interface==4.1.1
In 2019 (4 years later), Daniel W. complains that:
the decoders / image processors are still missing which results in error like OSError: decoder tiff_lzw not available
He adds however:
I found out my problem originated from a buggy Pillow version (5.0), it complained about missing tiff stuff but in fact it was not missing.
The docs say "Pillow and PIL cannot co-exist in the same environment. Before installing Pillow, please uninstall PIL." Once that is taken care of, the answer by Nooras Fatima Ansari above solved the issue for me. Depending on which Docker image you use, you can check the Pillow Docker images for the exact dependencies your image needs.
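In Dockerfile terms that cleanup could look like this sketch (the || true keeps the build going when PIL was never installed):
# Remove the old PIL if present, then install its drop-in replacement Pillow
RUN pip uninstall -y PIL || true
RUN pip install Pillow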
If you're using Docker Compose like me, just run docker-compose up --build after adding Pillow to requirements.txt.
