Extend Dockerfile with additional pip install commands - docker

I'm not a docker expert and I've been searching for an answer to this as it seems like it should be pretty simple -- specifically as a multi-stage build. But if so, it's still not clear to me how I pull off what I'm trying to do within the multi-stage build framework.
original Dockerfile:
FROM python:3.8.5
RUN mkdir /src
WORKDIR /src
RUN apt-get update && apt-get install -y --no-install-recommends \
git
COPY api/requirements.txt /src/
RUN pip install pip
RUN pip install -r requirements.txt
COPY . /src/
Additional commands that I'd like effectively inserted after the two RUN pip install lines:
COPY api/requirements-dev.txt /src/
RUN pip install -r requirements-dev.txt
Ideally the second couple of lines would (with whatever FROM ... AS statements might be needed) be in Dockerfile-dev, and then I could just build from Dockerfile-dev to capture whatever changes might be in Dockerfile and tack on my dev dependencies.
Obviously I could just copy the original Dockerfile, add the extra lines, call the result Dockerfile-dev, and build from that. However I'm trying to corral all of the dev dependencies into their own files that explicitly inherit the "prod" files as much as possible, as with docker-compose.yml-like inheritance/overrides. That lets me leave the "prod" code untouched and avoid e.g. conflicts when I merge it in, and makes it clear via additional files what stuff is being added to make my dev environment.
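One possible way to get this kind of inheritance is to build the prod image first and let Dockerfile-dev start FROM it. This is only a sketch under stated assumptions: the tag myapp:prod and the BASE_IMAGE build argument are hypothetical names, not something from the original setup.

```dockerfile
# Dockerfile-dev
# Assumes the prod image was built and tagged first, for example:
#   docker build -t myapp:prod -f Dockerfile .
# "myapp:prod" and BASE_IMAGE are hypothetical names chosen for illustration.
ARG BASE_IMAGE=myapp:prod
FROM ${BASE_IMAGE}

# WORKDIR /src is inherited from the prod image.
COPY api/requirements-dev.txt /src/
RUN pip install -r requirements-dev.txt
```

Note that the dev layers land after the prod image's final COPY . /src/ rather than directly after its pip install lines, so a source change that rebuilds the prod image also reruns the dev pip install.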

Related

got pip error while trying to convert an existing docker file to use distroless image

I have a Dockerfile in which I am using python:3.9.2-slim-buster as the base image, and I am doing the following:
FROM lab.com:5000/python:3.9.2-slim-buster
ENV PYTHONPATH=base_platform_update
RUN apt-get update && apt-get install -y curl && apt-get clean
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin
WORKDIR /script
RUN pip install SomePackage
COPY base_platform_update ./base_platform_update
ENTRYPOINT ["python3", "base_platform_update/core/main.py"]
I want to convert this to use a distroless image. I tried, but it's not working. I found these resources:
https://github.com/GoogleContainerTools/distroless/blob/main/examples/python3/Dockerfile
https://www.abhaybhargav.com/stories-of-my-experiments-with-distroless-containers/
I know this is not correct, but this is what I came up with after following these resources:
# first stage
FROM lab.com:5000/python:3.9.2-slim-buster AS build-env
WORKDIR /script
COPY base_platform_update ./base_platform_update
RUN apt-get update && apt-get install -y curl && apt-get clean
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN mv ./kubectl /usr/local/bin
# second stage
FROM gcr.io/distroless/python3
WORKDIR /script
COPY --from=build-env /script/base_platform_update ./base_platform_update
COPY --from=build-env /usr/local/bin/kubectl /usr/local/bin/kubectl
COPY --from=build-env /bin/chmod /bin/chmod
COPY --from=build-env /usr/local/bin/pip /usr/local/bin/pip
RUN chmod +x /usr/local/bin/kubectl
ENV PYTHONPATH=base_platform_update
RUN pip install SomePackage
ENTRYPOINT ["python3", "base_platform_update/core/main.py"]
it gives the following error:
/bin/sh: 1: pip: not found
The command '/bin/sh -c pip install SomePackage' returned a non-zero code: 127
I also thought of moving RUN pip install SomePackage to the first stage, but I couldn't figure out how to do that.
Any help would be appreciated. Thanks
EDIT:
docker images output
gcr.io/distroless/python3 latest 7f711ebcfe29 51 years ago 52.2MB
gcr.io/distroless/python3 debug 7c587fbe3d02 51 years ago 53.3MB
It could be that you need to add that dir to the PATH.
ENV PATH="/usr/local/bin:$PATH"
Consider, though, the final image size after adding all those dependencies; it might not be worth the hassle.
The latest image tagged python:3.8.5-alpine is 42.7MB, while gcr.io/distroless/python3 is 52.2MB as of this writing. After adding the binaries, the script, and the package you want to install, you may surpass that figure in the end. If pull time is important and network bandwidth is expensive, that is worth weighing; otherwise, for the current use case, it seems like too much effort.
Distroless images are meant only for runtime. As a result, you can't (by default) use the Python package manager to install packages; see the README of Google's GitHub project:
"Distroless" images contain only your application and its runtime
dependencies. They do not contain package managers, shells or any
other programs you would expect to find in a standard Linux
distribution.
You could install the packages in a separate build stage and copy the installed packages from it into the final stage, but that is not guaranteed to work, because the packages may have been built for a different target OS, the two stages may be otherwise incompatible, and so on.
Here's an example Dockerfile for that:
# first stage
FROM python:3.8 AS builder
COPY requirements.txt .
# install dependencies to the local user directory (eg. /root/.local)
RUN pip install --user -r requirements.txt
# second unnamed stage
FROM python:3.8-slim
WORKDIR /code
# copy only the dependencies installation from the 1st stage image
COPY --from=builder /root/.local /root/.local
COPY ./src .
# update PATH so that scripts installed by pip --user (under /root/.local/bin) are found
ENV PATH=/root/.local/bin:$PATH
CMD [ "python", "./server.py" ]
Dockerfile credits
You could also package your application into a binary using one of several Python libraries, depending on how much you need it. PyInstaller can do this, though it mainly bundles the project rather than turning it into a true single binary; Nuitka is a rising and very popular option, and there is also cx_Freeze.
Here's a relevant thread on the topic if you're interested.
There's also this article.

Docker image Package Patch within Dockerfile

I have the Docker image below, in which I need to apply a patch update to the curl package. On line 3 of the Dockerfile I already run an update, but curl still shows up in the vulnerability report.
When I add RUN yum -y update curl at the end of the Dockerfile, it no longer shows up in the vulnerability report.
Is there a fix? All packages must be installed at their latest version, and I don't want to name each one explicitly.
Or are there any mistakes in the Dockerfile?
FROM centos:7 AS base
FROM base AS build
# Install all dependencies
RUN yum -y update \
&& yum install -y openssl-devel bzip2-devel libffi-devel \
zlib-devel wget gcc make
# Below compile python from source
FROM base
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
COPY --from=build /usr/local/ /usr/local/
# Copy Code
COPY . /app/
WORKDIR /app
# Install code dependencies.
RUN /usr/local/bin/python -m pip install --upgrade pip \
&& pip install -r requirements.txt
# Why do I need this step when I already run an update on line 3? Without it, curl shows up in the compliance report. Any fix?
RUN yum -y update curl
# run Application
ENTRYPOINT ["python"]
CMD ["test.py"]
In order to understand what constitutes an image, you need to look at a Dockerfile in a different way:
Every step (with the exception of FROM) creates a new image, with the results of the previous step as a base.
FROM doesn't use the previous step, but an explicitly specified one.
Now, looking at your Dockerfile, you seem to wonder why RUN yum -y update curl doesn't work as expected. For easier understanding, let's trace it backwards:
RUN yum -y update curl
RUN /usr/local/bin/python -m pip install --upgrade pip && pip install -r requirements.txt
WORKDIR /app
COPY . /app/
COPY --from=build /usr/local/ /usr/local/
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
FROM base -- at this point, the previous step is changed to the last step of base
FROM centos:7 AS base -- here, the previous step is changed to centos:7
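As you see, nowhere in the earlier steps of the final image is there a yum update! The yum -y update on line 3 ran in the build stage, and the final stage only copies /usr/local/ out of that stage.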
BTW: Typing this, I'm wondering what your precise question is, i.e. whether this works or doesn't or whether you wonder why it's necessary. Are you aware of the difference between yum update and yum update curl even?
docker build and friends have a cache system, based on the text of the input. So if the text of the command yum -y update doesn't change, it will continue using the same cached version of the output forever (or until the cache is deleted). Try running the build with --no-cache and see if that helps.
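To make the point about stages concrete, here is a minimal sketch of the same Dockerfile with the update moved into the base stage, so that every stage starting FROM base inherits it. It assumes the elided "compile python from source" steps stay where they were; the yum clean all is an addition of mine, not part of the original.

```dockerfile
FROM centos:7 AS base
# Runs in base, so both the build stage and the final stage inherit the updated packages.
RUN yum -y update && yum clean all

FROM base AS build
# Install all build dependencies
RUN yum install -y openssl-devel bzip2-devel libffi-devel \
    zlib-devel wget gcc make
# (compile Python from source here, as in the original)

FROM base
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
COPY --from=build /usr/local/ /usr/local/
COPY . /app/
WORKDIR /app
RUN /usr/local/bin/python -m pip install --upgrade pip \
    && pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["test.py"]
```

The update layer is still subject to the build cache described above, so a periodic build with --no-cache is needed to actually pick up newer packages.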

How do I modify my DOCKERFILE to install wget into kubernetes pod?

Right now my DOCKERFILE builds a dotnet image that is installed/updated and run inside its own pod in a Kubernetes cluster.
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
WORKDIR /app
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
ARG ArtifactPAT
WORKDIR /src
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
COPY /src .
RUN dotnet restore "./sourceCode.csproj" -s "https://api.nuget.org/v3/index.json"
RUN dotnet build "./sourceCode.csproj" -c Release -o /app
FROM build AS publish
RUN dotnet publish "./sourceCode.csproj" -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "SourceCode.dll"]
EXPOSE 80
The cluster is very bare-bones and includes neither curl nor wget. So I need to get wget or curl installed in the pod/cluster to execute scripted commands that are set to run automatically after deployment and startup are completed. The command to do the install:
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
within the DOCKERFILE seems to do nothing to install it in the Kubernetes cluster. After the build runs and deploys, if I exec into the pod and try to run
wget --help
I get an error that wget doesn't exist. I do not have a lot of experience building DOCKERFILEs, so I am truly stumped. I want this automated in the DOCKERFILE, as I will not be able to log into environments above our Test environment to perform the install manually.
This is not related to Kubernetes or pods. In fact, you can't install anything into a Kubernetes pod as such; you install packages into the containers that run in the pod.
Your problem is that you install wget in your build image. When the final image is built from base further down, all of those installed packages are lost, because they belong to the build image. build, base, and final are different images. Anything you want from another stage has to be copied explicitly, as you already do in the final image, like this:
COPY --from=publish /app .
So add the command below to your final image, and you can use wget without any problem.
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
See this link for more info and best practices.
https://www.docker.com/blog/intro-guide-to-dockerfile-best-practices/
Everything between:
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
WORKDIR /app
and:
FROM base AS final
is irrelevant to what is installed in the image you run. With that line, you start constructing a new image from base, which was defined in the first block; only files you explicitly copy with COPY --from carry over from the other stages.
(Incidentally, on the next line you repeat the WORKDIR statement needlessly, since it is inherited from base. Also, AS final names this newly started stage, not base; a stage name only matters when something refers to it later, e.g. COPY --from=final in a subsequent stage, so it is optional on the last stage.)
You need to install wget in either the base image, or in the last defined image which you'll actually be running, at the end.
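Putting both answers together, a sketch of the question's Dockerfile with wget installed in the final stage might look like the following. It assumes the aspnet:3.1 base image is Debian-based so that apt-get is available; everything else is taken from the question.

```dockerfile
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
WORKDIR /app

FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
ARG ArtifactPAT
WORKDIR /src
COPY /src .
RUN dotnet restore "./sourceCode.csproj" -s "https://api.nuget.org/v3/index.json"
RUN dotnet build "./sourceCode.csproj" -c Release -o /app

FROM build AS publish
RUN dotnet publish "./sourceCode.csproj" -c Release -o /app

FROM base AS final
# Installed here, in the stage that becomes the running container,
# instead of in the build stage whose filesystem is discarded.
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "SourceCode.dll"]
EXPOSE 80
```

The wget install is dropped from the build stage in this sketch on the assumption that the build itself does not use it; if it does, keep the line there as well.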

Monolith docker application with webpack

I am running my monolith application in a Docker container on k8s (GKE).
The application has Python and Node dependencies, and uses webpack for the front-end bundle.
We have implemented CI/CD, which takes around 5-6 minutes to build and deploy a new version to the k8s cluster.
The main goal is to reduce the build time as much as possible. The Dockerfile is multi-stage.
Webpack takes the most time, generating the bundle. I am already using a high-spec worker to build the Docker image.
To reduce the time, I tried using the Kaniko builder.
Issue:
Docker's layer caching works perfectly for the Python code. But whenever a JS or CSS file changes, we have to regenerate the bundle.
Instead, when a JS or CSS file changes, the build uses the cached layer rather than generating a new bundle.
Is there any way to choose between building a new bundle and using the cache by passing some value to the Dockerfile?
Here is my docker file :
FROM python:3.5 AS python-build
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt &&\
pip3 install Flask-JWT-Extended==3.20.0
ADD . /app
FROM node:10-alpine AS node-build
WORKDIR /app
COPY --from=python-build ./app/app/static/package.json app/static/
COPY --from=python-build ./app ./
WORKDIR /app/app/static
RUN npm cache verify && npm install && npm install -g --unsafe-perm node-sass && npm run sass && npm run build
FROM python:3.5-slim
COPY --from=python-build /root/.cache /root/.cache
WORKDIR /app
COPY --from=node-build ./app ./
RUN apt-get update -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt
EXPOSE 9595
CMD python3 run.py
I would suggest creating separate build pipelines for your Docker images, since the npm and pip requirements don't change that frequently.
This will greatly improve the speed by cutting the time spent accessing the npm and pip registries.
Use a private Docker registry (the official one, or something like VMware Harbor or Sonatype Nexus OSS).
You store those build images in your registry and use them whenever something in the project changes.
Something like this:
First Docker Builder // python-builder:YOUR_TAG [gitrev, date, etc.]
docker build --no-cache -t python-builder:YOUR_TAG -f Dockerfile.python.build .
FROM python:3.5
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt &&\
pip3 install Flask-JWT-Extended==3.20.0
Second Docker Builder // js-builder:YOUR_TAG [gitrev, date, etc.]
docker build --no-cache -t js-builder:YOUR_TAG -f Dockerfile.js.build .
FROM node:10-alpine
WORKDIR /app
COPY app/static/package.json /app/app/static/
WORKDIR /app/app/static
RUN npm cache verify && npm install && npm install -g --unsafe-perm node-sass
Your Application Multi-stage build:
docker build --no-cache -t app_delivery:YOUR_TAG -f Dockerfile.app .
FROM python-builder:YOUR_TAG as python-build
# Nothing to do here; the dependencies are already fixed by the separate build process
FROM js-builder:YOUR_TAG AS node-build
ADD ##### YOUR JS/CSS files only here, required from npm! ###
RUN npm run sass && npm run build
FROM python:3.5-slim
# your original clean app
COPY . /app
COPY --from=python-build #### only the files installed with the pip command
WORKDIR /app
COPY --from=node-build ##### Only the generated files from npm here! ###
RUN apt-get update -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt
EXPOSE 9595
CMD python3 run.py
One question: why do you install curl and run pip install -r requirements.txt again in the final Docker image?
Running apt-get update and install every time without cleaning the apt cache (the /var/cache/apt folder) produces a bigger image.
As a suggestion, use the docker build command with the --no-cache option to avoid using cached results:
docker build --no-cache -t your_image:your_tag -f your_dockerfile .
Remarks:
You'll have 3 separate Dockerfiles, as I listed above.
Build the Docker images 1 and 2 only if you change your python-pip and node-npm requirements, otherwise keep them fixed for your project.
If any dependency requirement changes, then update the docker image involved and then the multistage one to point to the latest built image.
You should always build only the source code of your project (CSS, JS, python). In this way, you have also guaranteed reproducible builds.
To optimize your environment and copy files across the multi-stage builders, try to use virtualenv for python build.
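The virtualenv suggestion can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's exact setup: the /opt/venv path is a hypothetical choice, and copying the environment between stages relies on both images providing the same Python at the same location, as the official python:3.5 and python:3.5-slim images do.

```dockerfile
FROM python:3.5 AS python-build
# /opt/venv is a hypothetical location; any fixed path works
# as long as it is the same in both stages.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt ./
RUN pip install -r requirements.txt

FROM python:3.5-slim
# Copy the whole environment instead of re-running pip install.
COPY --from=python-build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . /app
EXPOSE 9595
CMD ["python3", "run.py"]
```

Packages with compiled extensions may still need shared libraries that the slim image lacks, so this is worth checking against the actual requirements.txt.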

Is there a way to install packages in app engine once to avoid the long deploy each time?

I need to have ghostscript and ImageMagick available to do some PDF editting and OCR. I have got to the point that I use a Dockerfile but it seems that gcloud app deploy would start from the beginning each time. Is there a way to speed it up by having the packages installed once?
Here's my Dockerfile:
FROM gcr.io/google-appengine/python
LABEL python_version=python3.6
RUN virtualenv --no-download /env -p python3.6
# Set virtualenv environment variables. This is equivalent to running
# source /env/bin/activate
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/
RUN pip install -r requirements.txt
ADD . /app/
RUN apt-get update
RUN apt-get install imagemagick -y
RUN apt-get install ghostscript
CMD exec gunicorn -b :$PORT main:app
Move those steps earlier in the Dockerfile.
Docker’s layer-caching feature means that it won’t rebuild a step where it’s already run that step from the exact same base image. However, as soon as you run a step that invalidates the cache, nothing after that will be cached. In particular, the ADD . step will invalidate the cache if anything at all in your source tree changes.
Style-wise, I’d change two other things. First, for similar caching reasons, it’s important to run apt-get update and apt-get install in the same RUN step, since previously-cached URLs from the “update” could become invalid. Second, I wouldn’t bother trying to set up a Python virtual environment, since a Docker image already provides an isolated filesystem and Python installation.
That ultimately leaves you with:
FROM gcr.io/google-appengine/python
LABEL python_version=python3.6
RUN apt-get update \
&& apt-get install -y ghostscript imagemagick
COPY requirements.txt /app/
RUN pip install -r requirements.txt
COPY . /app/
EXPOSE 8000
CMD ["gunicorn", "-b", ":8000", "main:app"]

Resources