How to solve a Python error in Docker: "Failed building wheel for pyarrow"?

I am trying to build in Bamboo and got this error:
Failed to build pyarrow
21-Sep-2022 06:24:14 ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects
21-Sep-2022 06:24:15 The command '/bin/sh -c pip install --upgrade pip && pip install pyarrow' returned a non-zero code: 1
21-Sep-2022 06:24:15 An error occurred when executing task 'DockerBuild'.
This error occurs only when I add pyarrow or fastparquet to requirements.txt.
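For reference, the failure reproduces with nothing but the base image and pyarrow; a minimal sketch, using the same base image as the Dockerfile below:
FROM python:3.10.4-alpine3.15
# Fails on Alpine: pip finds no musl-compatible wheel and tries to compile pyarrow
RUN pip install --upgrade pip && pip install pyarrow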
This is my requirements.txt file:
requests
urllib3
fastapi
uvicorn[standard]
gunicorn
pytest-cov
prometheus-fastapi-instrumentator
prometheus_client
fastapi-health
python-decouple
ecs-logging
fastapi_health
psycopg2
arrow
anyio
asgiref
certifi
charset-normalizer
click
colorama
h11
idna
python-dotenv
pydantic
sniffio
starlette
typing_extensions
datetime
fastapi_resource_server
sendgrid
PyJWT==2.4.0
bcrypt==3.2.
cryptography==37.0.2
passlib
jose
jira
adal==1.2.7
aiohttp==3.8.1
aiosignal==1.2.0
async-timeout==4.0.2
azure-core==1.25.0
azure-identity==1.10.0
azure-storage-blob==12.13.1
pandas==1.4.4
multidict==6.0.2
numpy==1.23.2
ordered-set==4.1.0
oauthlib==3.2.0
packaging==21.3
python-dateutil==2.8.2
pytz==2022.2.1
requests-oauthlib==1.3.1
six==1.16.0
yarl==1.8.1
Below is my Dockerfile:
FROM python:3.10.4-alpine3.15
RUN adduser -D pythonwebapi
WORKDIR /home/pythonwebapi
COPY requirements.txt requirements.txt
COPY logger_config.py logger_config.py
RUN echo 'http://dl-3.alpinelinux.org/alpine/v3.12/main' >> /etc/apk/repositories
RUN apk upgrade && apk add make gcc g++
RUN apk update
RUN apk add libffi-dev
RUN apk add postgresql-dev gcc python3-dev musl-dev
RUN apk add --no-cache musl-dev linux-headers g++
RUN pip install --upgrade pip && pip install arrow && pip install pyarrow
RUN pip install -r requirements.txt && pip install gunicorn
RUN apk del gcc g++ make
COPY app app
COPY init_app.py ./
ENV FLASK_APP init_app.py
RUN chown -R pythonwebapi:pythonwebapi ./
RUN chown -R 777 ./
USER pythonwebapi
EXPOSE 8000 7000
ENTRYPOINT ["gunicorn","--timeout", "1000","init_app:app","-k","uvicorn.workers.UvicornWorker","-b","0.0.0.0"]
Is this error because of the Python image?
I am still learning Docker, so I'm not sure what went wrong here. Can anyone please help me understand this?

I have changed the Dockerfile and now build pyarrow from source, since I learned that pyarrow wheels are not provided for Alpine.
FROM python:3.9-alpine
RUN adduser -D pythonwebapi
WORKDIR /home/pythonwebapi
COPY requirements.txt requirements.txt
COPY logger_config.py logger_config.py
RUN echo 'http://dl-3.alpinelinux.org/alpine/v3.9/main' >> /etc/apk/repositories
RUN apk update \
&& apk upgrade \
&& apk add --no-cache build-base \
autoconf \
bash \
bison \
boost-dev \
cmake \
flex \
libressl-dev \
zlib-dev
RUN apk add make gcc g++
RUN apk add libffi-dev
RUN apk add postgresql-dev gcc python3-dev musl-dev
RUN pip install --upgrade pip && pip install -r requirements.txt && pip install gunicorn
RUN apk del gcc g++ make
RUN pip install --no-cache-dir six pytest numpy cython
RUN pip install --no-cache-dir pandas
ARG ARROW_VERSION=3.0.0
ARG ARROW_SHA1=c1fed962cddfab1966a0e03461376ebb28cf17d3
ARG ARROW_BUILD_TYPE=release
ENV ARROW_HOME=/usr/local \
PARQUET_HOME=/usr/local
# Download and build Apache Arrow
RUN mkdir -p /arrow \
&& wget -q https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz -O /tmp/apache-arrow.tar.gz \
&& echo "${ARROW_SHA1} *apache-arrow.tar.gz" | sha1sum /tmp/apache-arrow.tar.gz \
&& tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 \
&& mkdir -p /arrow/cpp/build \
&& cd /arrow/cpp/build \
&& cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
-DOPENSSL_ROOT_DIR=/usr/local/ssl \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_PARQUET=ON \
-DARROW_PYTHON=ON \
-DARROW_PLASMA=ON \
-DARROW_BUILD_TESTS=OFF \
.. \
&& make -j$(nproc) \
&& make install \
&& cd /arrow/python \
&& python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet \
&& python setup.py install \
&& rm -rf /arrow /tmp/apache-arrow.tar.gz
COPY app app
COPY init_app.py ./
ENV FLASK_APP init_app.py
RUN chown -R pythonwebapi:pythonwebapi ./
RUN chown -R 777 ./
USER pythonwebapi
EXPOSE 8000 7000
ENTRYPOINT ["gunicorn","--timeout", "5000","init_app:app","-k","uvicorn.workers.UvicornWorker","-b","0.0.0.0","-m 3000m"]

Related

Problem installing packages in multi-stage Dockerfile in the final stage

I want to create a minimal Docker image.
For that purpose I am using the following multi-stage build Dockerfile.
FROM python:3.9-slim as base
ENV LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONFAULTHANDLER=1 \
PYTHONHASHSEED=random \
PYTHONUNBUFFERED=1
WORKDIR /app
FROM base as builder
ENV PIP_DEFAULT_TIMEOUT=100 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
POETRY_VERSION=1.1.13
COPY pyproject.toml poetry.lock ./
RUN apt-get update && \
apt-get install make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
libffi-dev liblzma-dev python3.9-venv --yes && \
pip install "poetry==$POETRY_VERSION" && \
python -m venv /venv && \
poetry export -f requirements.txt | /venv/bin/pip install -r /dev/stdin
COPY . /app
RUN poetry build && /venv/bin/pip install dist/*.whl
FROM base as final
ENV PATH=/venv/bin:$PATH
COPY --from=builder /venv /venv
RUN apt-get update && apt-get install -y procps curl
# for prometheus
EXPOSE 9090
CMD ["my_command"]
However, no matter where I put the install command in the final stage, the installed commands are not found in the final image:
RUN apt-get update && apt-get install -y procps curl
I have tried putting it before and after the COPY and ENV lines, and still nothing.
Finally, I added another stage between base and builder just to run this command, and then everything worked fine (a sketch of that workaround follows below).
It's bugging me why this would be the case, though. Any ideas what's wrong with the Dockerfile above?
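For reference, a sketch of the intermediate-stage workaround described above; the stage name runtime-deps and the exact rebasing of final are my guesses at the wiring, since the question doesn't show it:
FROM python:3.9-slim as base
# ...same ENV and WORKDIR as above...

# New stage inserted after base, just for the runtime tools
FROM base as runtime-deps
RUN apt-get update && apt-get install -y procps curl

FROM runtime-deps as builder
# ...same poetry export / venv build steps as above...

# Rebasing final on runtime-deps means procps and curl survive into the image
FROM runtime-deps as final
ENV PATH=/venv/bin:$PATH
COPY --from=builder /venv /venv
EXPOSE 9090
CMD ["my_command"]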

Cannot access installed python packages in Docker container

I have a Docker container that requires a couple of Python packages to be installed. I added some commands to install them, but the packages are not available in the container. I think I'm adding them wrong. Is there a certain way I need to copy over what was installed? Any help appreciated.
Specifically, I should be able to run lottie.py in my container after installing it, but it does not exist, nor does pip3 even though the install is successful.
Below is my Dockerfile:
FROM alpine AS builder
COPY . /go/src/matterbridge
RUN apk --no-cache add go git
WORKDIR /go/src/matterbridge
RUN go build -mod vendor -o /bin/matterbridge
FROM python:3
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir lottie cairosvg
RUN apt-get install libcairo2-dev
FROM alpine
RUN apk --no-cache add ca-certificates mailcap
COPY --from=builder /bin/matterbridge /bin/matterbridge
RUN mkdir /etc/matterbridge \
&& touch /etc/matterbridge/matterbridge.toml \
&& ln -sf /matterbridge.toml /etc/matterbridge/matterbridge.toml
ENTRYPOINT ["/bin/matterbridge", "-conf", "/etc/matterbridge/matterbridge.toml"]
I was given help in the comments: I wasn't copying anything over from the Python image. Each FROM starts a new stage, so nothing installed in the python:3 stage ends up in the final FROM alpine stage unless it is copied over with COPY --from. I was able to find python:alpine and use that as the final base in my config below:
FROM alpine AS builder
COPY . /go/src/matterbridge
RUN apk --no-cache add go git
WORKDIR /go/src/matterbridge
RUN go build -mod vendor -o /bin/matterbridge
FROM python:alpine
RUN apk --no-cache add ca-certificates mailcap
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev curl
RUN apk add jpeg-dev zlib-dev freetype-dev lcms2-dev openjpeg-dev tiff-dev tk-dev tcl-dev
RUN apk add --no-cache --virtual .pynacl_deps build-base python3-dev py3-pip
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir lottie cairosvg
COPY --from=builder /bin/matterbridge /bin/matterbridge
RUN mkdir /etc/matterbridge \
&& touch /etc/matterbridge/matterbridge.toml \
&& ln -sf /matterbridge.toml /etc/matterbridge/matterbridge.toml
ENTRYPOINT ["/bin/matterbridge", "-conf", "/etc/matterbridge/matterbridge.toml"]

Docker | Problem with installing lxml on Python 3.8 [duplicate]

I want to deploy my Python project in Docker. I wrote lxml>=3.5.0 in requirements.txt, as the project needs lxml. Here is my Dockerfile:
FROM gliderlabs/alpine:3.3
RUN set -x \
&& buildDeps='\
python-dev \
py-pip \
build-base \
' \
&& apk --update add python py-lxml $buildDeps \
&& rm -rf /var/cache/apk/* \
&& mkdir -p /app
ENV INSTALL_PATH /app
WORKDIR $INSTALL_PATH
COPY requirements-docker.txt ./
RUN pip install -r requirements.txt
COPY . .
RUN apk del --purge $buildDeps
ENTRYPOINT ["celery", "-A", "tasks", "worker", "-l", "info", "-B"]
I got this when I deployed it to Docker:
*********************************************************************************
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
*********************************************************************************
error: command 'gcc' failed with exit status 1
----------------------------------------
Rolling back uninstall of lxml
I thought it was because of 'python-dev' and 'python-lxml', so I edited the Dockerfile like this:
WORKDIR $INSTALL_PATH
COPY requirements-docker.txt ./
RUN apt-get build-dev python-lxml
RUN pip install -r requirements.txt
It did not work, and I got another error:
---> Running in 73201a0dcd59
/bin/sh: apt-get: not found
How can I install lxml correctly in docker?
I added RUN apk add --update --no-cache g++ gcc libxslt-dev before RUN pip install -r requirements.txt and it worked.
The accepted answer is not neat and installs redundant packages. A better solution for reducing the image size is:
RUN apk add --no-cache --virtual .build-deps gcc libc-dev libxslt-dev && \
apk add --no-cache libxslt && \
pip install --no-cache-dir lxml>=3.5.0 && \
apk del .build-deps
The resulting image size will be < 163 MB.
Since I was using a much more bare-bones image, I needed some more libs/apps.
This worked for me:
RUN apk add --update --no-cache g++ gcc libxml2-dev libxslt-dev python-dev libffi-dev openssl-dev make
RUN pip install -r requirements.txt
Only the accepted answer worked for me, but I wanted something lighter. I liked the --virtual approach above, though it didn't work for me at first, so I edited it for myself and ended up with this:
RUN apk add --update --no-cache --virtual .build-deps g++ gcc libxml2-dev libxslt-dev python-dev && \
apk add --no-cache libxslt && \
pip install --no-cache-dir lxml>=3.5.0 && \
apk del .build-deps
The final image is around 110 MB and no longer has any libxml or libxslt errors.
Do as in https://hub.docker.com/r/ryanfox1985/docker-couchpotato/builds/boinrrs9dbhnutwjxjw2l8m/: download the apk and install it.
RUN wget http://nl.alpinelinux.org/alpine/edge/main/x86_64/py-lxml-3.4.0-r0.apk -O /var/cache/apk/py-lxml.apk
RUN apk add --allow-untrusted /var/cache/apk/py-lxml.apk
Actually, on a Debian/Ubuntu-based image it's just:
RUN apt-get install -y libxslt1-dev
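For context, that works because apt-get exists only on Debian/Ubuntu-based images, not on Alpine. A minimal Debian-based sketch (base image chosen to match the question's Python 3.8):
FROM python:3.8-slim
# Header packages plus gcc let pip compile lxml when no matching wheel exists
RUN apt-get update && apt-get install -y --no-install-recommends gcc libxml2-dev libxslt1-dev \
 && pip install --no-cache-dir "lxml>=3.5.0" \
 && rm -rf /var/lib/apt/lists/*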

Google Cloud Platform - AI Notebook is deleted after instance is stopped

I've set up an AI Platform Notebook instance using a custom container with the below Dockerfile. I can access the notebook via the JupyterLab interface. But when I save everything, stop the notebook, and then turn it back on, I lose all of my files.
I cannot figure out where to set this in the GCP Console or my Dockerfile.
Any advice would be greatly appreciated.
Dockerfile:
FROM osgeo/gdal:ubuntu-small-3.0.4
ARG NB_USER="root"
ARG NB_UID="1000"
ARG NB_GID="100"
USER root
RUN apt-get update && apt-get install -y build-essential --no-install-recommends \
ca-certificates \
python3-pip \
unzip \
wget \
python3-rtree \
python-numpy \
git \
gdal-bin \
libgtk2.0-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /root
RUN mkdir /root/work
# update pip
RUN python3 -m pip install pip --upgrade \
&& python3 -m pip install wheel \
&& python3 -m pip install pip setuptools \
&& python3 -m pip install notebook==6.0.0 \
&& python3 -m pip install jupyterhub==1.0.0 \
&& python3 -m pip install jupyterlab==1.1.3
RUN python3 -m pip install git+https://github.com/toblerity/shapely.git#master#egg=shapely-1.7.1dev \
&& python3 -m pip install rasterio \
&& python3 -m pip install geopandas \
&& python3 -m pip install descartes \
&& python3 -m pip install solaris \
&& python3 -m pip install rio-tiler
EXPOSE 8080
CMD ["jupyter", "lab","--ip", "0.0.0.0", "--allow-root"]
COPY start.sh /usr/local/bin/
COPY start-notebook.sh /usr/local/bin/
COPY start-singleuser.sh /usr/local/bin/
COPY jupyter_notebook_config.py /etc/jupyter/
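One thing to check, as an assumption rather than a confirmed fix: AI Platform Notebook instances keep a persistent data disk mounted at /home/jupyter, while everything else in the container filesystem (such as /root/work above) is recreated from the image when the instance restarts. If that applies here, pointing JupyterLab at the mounted path should let files survive a stop/start:
# Sketch: save work under the persistent mount instead of the container layer.
# Assumes the data disk is mounted at /home/jupyter (verify with `df -h` in a terminal).
WORKDIR /home/jupyter
CMD ["jupyter", "lab", "--ip", "0.0.0.0", "--allow-root", "--notebook-dir=/home/jupyter"]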

Why can I not access the Google Storage URL from inside Docker with moviepy?

I have a Docker instance set up in which I am using moviepy.editor.VideoFileClip to edit videos at URLs that come from a Google Cloud bucket.
Locally, I have no problem doing this at all and can run:
from moviepy.editor import VideoFileClip
vfc = VideoFileClip('https://storage.googleapis.com/<bucket>/<mp4 name>')
...
However, in the docker instance, I am having problems accessing the file via moviepy, with the error:
Failed to resolve hostname storage.googleapis.com: Name or service not known
In the same Python shell, I can run:
import urllib.request
urllib.request.urlretrieve('https://storage.googleapis.com/<bucket>/<mp4 name>', '/tmp/file.mp4')
And it works perfectly. Any idea what's going wrong?
Python Version: Python 3.7.3
Moviepy Version: moviepy==0.2.3.5
Platform Name: Alpine
Platform Version: Linux fe434704cf18 4.9.125-linuxkit #1 SMP Fri Sep 7 08:20:28 UTC 2018 x86_64 Linux
Dockerfile:
FROM jrottenberg/ffmpeg:4.1-alpine as ffmpeg
FROM python:3.7-alpine3.8
RUN apk update && apk upgrade && \
apk add --no-cache --update \
libgcc \
libstdc++ \
curl \
ca-certificates \
libcrypto1.0 \
libssl1.0 \
libgomp \
bash \
expat \
git \
openblas \
musl \
ffmpeg \
ghostscript \
file \
imagemagick
COPY --from=ffmpeg /usr/local /usr/local
WORKDIR /
COPY requirements.txt ./
RUN apk add --no-cache jpeg-dev zlib-dev postgresql-libs postgresql-dev && \
apk add --no-cache --virtual .build-deps gcc g++ build-base linux-headers \
ca-certificates python3-dev libffi-dev libressl-dev && \
ln -s /usr/include/local.h /usr/include/xlocale.h && \
apk add py-numpy && \
pip install pip --upgrade && \
pip install numpy && \
pip install --no-cache-dir -r requirements.txt && \
apk --purge del .build-deps
RUN rm requirements.txt
COPY ./docker/imagemagick.policy.xml etc/ImageMagick-6/policy.xml
COPY . .
# Run celery.py when the container launches
CMD ["celery", "worker", "-A", "a.celery", "--loglevel=info"]
