Can anyone help me understand how to:
Optimize the Dockerfile by removing all unnecessary cache/files to reduce the image size, and
Remove unnecessary binaries/permissions to improve container security?
My Dockerfile looks like this:
FROM python:3.7-alpine
WORKDIR /code
ENV FLASK_APP app.py
ENV FLASK_RUN_HOST 0.0.0.0
RUN apk add --no-cache gcc musl-dev linux-headers
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
CMD ["flask", "run"]
Well, there are actually a few ways to do that, I guess:
Multi-stage build
# STAGE1
FROM alpine AS stage1
WORKDIR /bin
RUN wget https://link/of/some/binaries -O app1 \
&& chmod +x app1
# Run additional commands if you want
# STAGE2
FROM alpine AS stage2
WORKDIR /usr/local/bin
RUN wget https://link/of/some/binaries -O app2 \
&& chmod +x app2
# Run additional commands if you want
# FINAL STAGE (runtime)
FROM python:3.7-alpine as runtime
COPY --from=stage1 /bin/app1 /bin/app1
COPY --from=stage2 /usr/local/bin/app2 /bin/app2
...
This lets you keep only the binaries you downloaded in the previous stages and leave everything else behind.
If you are using apk add and don't know where things get installed, you can test on an alpine image by running the which command.
Remove cache
... # Install some stuff...
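# Remove cache (this only shrinks the image when it runs in the same RUN instruction as the install;
# with apk add --no-cache there is nothing left to remove)
RUN rm -rf /var/cache/apk/*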
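Applied to the Flask Dockerfile from the question, the multi-stage idea might look like the sketch below. It assumes the packages in requirements.txt only need gcc, musl-dev and linux-headers at build time and no extra shared libraries at runtime; the /opt/venv path and the user name app are arbitrary choices, not something from the question.

```dockerfile
# Build stage: the compiler toolchain exists only here
FROM python:3.7-alpine AS builder
RUN apk add --no-cache gcc musl-dev linux-headers
# Install into a virtualenv so the result can be copied as one directory
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: same base image, so the virtualenv's interpreter path still resolves
FROM python:3.7-alpine
# Unprivileged user (hypothetical name) so the app does not run as root
RUN adduser -D app
WORKDIR /code
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
ENV FLASK_APP=app.py
ENV FLASK_RUN_HOST=0.0.0.0
COPY . .
USER app
CMD ["flask", "run"]
```

The final image never contains gcc or the headers, which helps with both size and attack surface. If a package links against a library at runtime, that library would still have to be added to the runtime stage with apk add --no-cache.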
I have a Dockerfile that uses python:3.9.2-slim-buster as the base image and does the following:
FROM lab.com:5000/python:3.9.2-slim-buster
ENV PYTHONPATH=base_platform_update
RUN apt-get update && apt-get install -y curl && apt-get clean
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin
WORKDIR /script
RUN pip install SomePackage
COPY base_platform_update ./base_platform_update
ENTRYPOINT ["python3", "base_platform_update/core/main.py"]
I want to convert this to use a distroless image. I tried, but it's not working. I found these resources:
https://github.com/GoogleContainerTools/distroless/blob/main/examples/python3/Dockerfile
https://www.abhaybhargav.com/stories-of-my-experiments-with-distroless-containers/
I know this is not correct, but this is what I came up with after following these resources:
# first stage
FROM lab.com:5000/python:3.9.2-slim-buster AS build-env
WORKDIR /script
COPY base_platform_update ./base_platform_update
RUN apt-get update && apt-get install -y curl && apt-get clean
RUN curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
RUN mv ./kubectl /usr/local/bin
# second stage
FROM gcr.io/distroless/python3
WORKDIR /script
COPY --from=build-env /script/base_platform_update ./base_platform_update
COPY --from=build-env /usr/local/bin/kubectl /usr/local/bin/kubectl
COPY --from=build-env /bin/chmod /bin/chmod
COPY --from=build-env /usr/local/bin/pip /usr/local/bin/pip
RUN chmod +x /usr/local/bin/kubectl
ENV PYTHONPATH=base_platform_update
RUN pip install SomePackage
ENTRYPOINT ["python3", "base_platform_update/core/main.py"]
It gives the following error:
/bin/sh: 1: pip: not found
The command '/bin/sh -c pip install SomePackage' returned a non-zero code: 127
I also thought of moving RUN pip install SomePackage to the first stage, but I couldn't figure out how to do that.
Any help would be appreciated. Thanks
EDIT:
docker images output
gcr.io/distroless/python3 latest 7f711ebcfe29 51 years ago 52.2MB
gcr.io/distroless/python3 debug 7c587fbe3d02 51 years ago 53.3MB
It could be that you need to add that dir to the PATH.
ENV PATH="/usr/local/bin:$PATH"
Consider, though, the final image size after adding all those dependencies; it might not be worth the hassle.
The latest image tagged python:3.8.5-alpine is 42.7MB, while gcr.io/distroless/python3 is 52.2MB as of this writing. After adding the binaries, the script, and the package you want to install, you may well surpass that figure. If pull time matters and network bandwidth is expensive, that is worth thinking about; otherwise it seems like too much for the current use case.
Distroless images are meant only for runtime. As a result, you can't (by default) use the Python package manager to install packages; see the Google GitHub project readme:
"Distroless" images contain only your application and its runtime
dependencies. They do not contain package managers, shells or any
other programs you would expect to find in a standard Linux
distribution.
You could install the packages in a separate build stage and copy the installed packages from it into the final stage, but that's not guaranteed to work, because the packages may have been built for a different target OS or be otherwise incompatible between the two stages.
Here's an example Dockerfile for that:
# first stage
FROM python:3.8 AS builder
COPY requirements.txt .
# install dependencies to the local user directory (eg. /root/.local)
RUN pip install --user -r requirements.txt
# second unnamed stage
FROM python:3.8-slim
WORKDIR /code
# copy only the dependencies installation from the 1st stage image
COPY --from=builder /root/.local /root/.local
COPY ./src .
# update PATH so that scripts installed by pip --user (in /root/.local/bin) are found
ENV PATH=/root/.local/bin:$PATH
CMD [ "python", "./server.py" ]
Dockerfile credits
You could package your application into a binary using any of several Python libraries, depending on how much you need it. PyInstaller can do that, though it mainly bundles the project rather than turning it into a single binary; Nuitka is a rising and very popular option, along with cx_Freeze.
Here's a relevant thread on the topic if you're interested.
There's also this article.
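For the distroless question itself, one possible shape is sketched below: everything that needs a shell, pip or chmod happens in the build stage, and the final stage only copies results. It assumes the Python version in the build image matches the interpreter shipped in gcr.io/distroless/python3 (otherwise compiled packages may not load), and the /deps directory is a name picked for illustration.

```dockerfile
# Build stage: has a shell, pip, curl and chmod
FROM lab.com:5000/python:3.9.2-slim-buster AS build-env
WORKDIR /script
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
# Download kubectl and make it executable here; COPY keeps the permission bits
RUN curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl" \
    && chmod +x kubectl
# Install into a plain directory (hypothetical /deps) that can be copied as a unit
RUN pip install --no-cache-dir --target=/deps SomePackage
COPY base_platform_update ./base_platform_update

# Final stage: no shell and no pip, so it contains only COPY and metadata instructions
FROM gcr.io/distroless/python3
WORKDIR /script
COPY --from=build-env /script/kubectl /usr/local/bin/kubectl
COPY --from=build-env /deps /deps
COPY --from=build-env /script/base_platform_update ./base_platform_update
# Make the copied packages importable alongside the application code
ENV PYTHONPATH=/deps:base_platform_update
ENTRYPOINT ["python3", "base_platform_update/core/main.py"]
```

The original attempt failed because RUN needs /bin/sh and pip inside the image being built, and the distroless image ships neither; keeping every RUN in the build stage avoids that.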
I have this Dockerfile
ARG FUNCTION_DIR="/opt/"
FROM node:10.13-alpine@sha256:22c8219b21f86dfd7398ce1f62c48a022fecdcf0ad7bf3b0681131bd04a023a2 AS BUILD_IMAGE
ARG FUNCTION_DIR
RUN apk --update add cmake autoconf automake libtool binutils libexecinfo-dev python2 gcc make g++ zlib-dev
ENV NODE_ENV=production
ENV PYTHON=/usr/bin/python2
RUN mkdir -p ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
COPY package.json yarn.lock ./
RUN yarn --frozen-lockfile
RUN npm prune --production
RUN yarn cache clean
RUN npm cache clean --force
FROM node:10.13-alpine@sha256:22c8219b21f86dfd7398ce1f62c48a022fecdcf0ad7bf3b0681131bd04a023a2
ARG FUNCTION_DIR
ENV NODE_ENV=production
ENV NODE_OPTIONS=--max_old_space_size=4096
RUN apk update \
&& apk upgrade \
&& apk add mongodb-tools fontconfig dumb-init \
&& rm -rf /var/cache/apk/*
RUN mkdir -p ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
COPY --from=BUILD_IMAGE ${FUNCTION_DIR}/node_modules ./node_modules
COPY . .
RUN if [ -f core/config/local.js ]; then rm core/config/local.js; fi
RUN cp core/config/local.js.aws.readonly core/config/local.js
USER node
EXPOSE 8080
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["node", "app.js", "--app=search", "--env=production"]
I use this Dockerfile to generate an image (called core-a) that runs our application in K8s. I've added some code to the application to handle the case where it is launched from a Lambda function, and I've created another Dockerfile like the one above but with a custom ENTRYPOINT and CMD set to these values:
ENTRYPOINT [ "/usr/local/bin/npx", "aws-lambda-ric" ]
CMD [ "apps/search/index.handler" ]
Then I pushed this image, called core-b, to ECR and used it as the Docker image for a Lambda function, and everything works as expected.
After that, I thought I could override ENTRYPOINT and CMD in order to use the same Docker image for both environments, so I pointed the Lambda function's image at core-a and set the entrypoint and cmd values I used in the core-b Dockerfile. Doing so, I get an error:
Couldn't find valid bootstrap(s): [\"/usr/local/bin/npx\"]
Does anyone have any suggestions?
Try removing the quotation marks (" ") when entering the override value in this web form.
These AWS docs unfortunately have an incorrect note that says to use quotation marks around each string.
Right now my DOCKERFILE builds a dotnet image that is installed/updated and run inside its own pod in a Kubernetes cluster.
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
WORKDIR /app
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
ARG ArtifactPAT
WORKDIR /src
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
COPY /src .
RUN dotnet restore "./sourceCode.csproj" -s "https://api.nuget.org/v3/index.json"
RUN dotnet build "./sourceCode.csproj" -c Release -o /app
FROM build AS publish
RUN dotnet publish "./sourceCode.csproj" -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "SourceCode.dll"]
EXPOSE 80
The cluster is very bare-bones and includes neither curl nor wget. So I need to get wget or curl installed in the pod/cluster to execute scripted commands that are set to run automatically after deployment and startup are completed. The command to do the install:
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
within the DOCKERFILE seems to do nothing to install it in the Kubernetes cluster. After the build runs and deploys, if I exec into the pod and try to run
wget --help
I get that wget doesn't exist. I do not have a lot of experience building DOCKERFILEs, so I am truly stumped. And I want this automated in the DOCKERFILE, as I will not be able to log into environments above our Test to perform the install manually.
It's not related to Kubernetes or pods. Actually, you can't install anything into a Kubernetes pod; you can install packages into the containers that run in the pod.
Your problem is that you install wget in your build image. When the final image is built from base, you lose all the installed packages, because those packages belong to the build image. build, base, and final are different images. You need to copy files explicitly, as you did for the final image, like this:
COPY --from=publish /app .
So add the command below to your final image and you can use wget without any problem.
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*
See this link for more info and best practices:
https://www.docker.com/blog/intro-guide-to-dockerfile-best-practices/
Everything between:
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
ARG DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
ARG DOTNET_CLI_TELEMETRY_OPTOUT=1
WORKDIR /app
and:
FROM base AS final
is irrelevant to what ends up installed in the final image. With that line, you start constructing a new image from base, which was defined in the first block.
(Incidentally, on the next line you repeat the WORKDIR statement needlessly, since base already sets it. Also, final only names this last stage so that other stages could refer to it; nothing does, so the name is optional.)
You need to install wget in either the base image, or in the last defined image which you'll actually be running, at the end.
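As a rough sketch, the last stage with wget installed where it will actually run could look like this. It is a fragment: it assumes the base, build and publish stages from the question are defined above it unchanged.

```dockerfile
# Final stage: this is the image that ends up in the pod
FROM base AS final
# Install wget here (or in base); packages installed in the build stage are not inherited
RUN apt-get update \
    && apt-get install -y wget \
    && rm -rf /var/lib/apt/lists/*
# Only the published output crosses over from the publish stage
COPY --from=publish /app .
EXPOSE 80
ENTRYPOINT ["dotnet", "SourceCode.dll"]
```

With this in place, the wget install in the build stage is only needed if the build itself downloads something.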
I am running my monolith application in a Docker container on k8s on GKE.
The application contains Python and Node dependencies, plus webpack for the front-end bundle.
We have implemented CI/CD, which takes around 5-6 min to build and deploy a new version to the k8s cluster.
The main goal is to reduce the build time as much as possible. The Dockerfile is multi-stage.
Webpack takes the most time, generating the bundle. To build the Docker image I am already using a high-spec worker.
To reduce the time, I tried using the Kaniko builder.
Issue:
Docker layer caching works perfectly for the Python code. But when there are any changes in a JS or CSS file, we have to generate the bundle.
When a JS or CSS file changes, instead of generating a new bundle, the build uses the cached layer.
Is there any way to choose between building a new bundle and using the cache by passing some value to the Dockerfile?
Here is my Dockerfile:
FROM python:3.5 AS python-build
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt &&\
pip3 install Flask-JWT-Extended==3.20.0
ADD . /app
FROM node:10-alpine AS node-build
WORKDIR /app
COPY --from=python-build ./app/app/static/package.json app/static/
COPY --from=python-build ./app ./
WORKDIR /app/app/static
RUN npm cache verify && npm install && npm install -g --unsafe-perm node-sass && npm run sass && npm run build
FROM python:3.5-slim
COPY --from=python-build /root/.cache /root/.cache
WORKDIR /app
COPY --from=node-build ./app ./
RUN apt-get update -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt
EXPOSE 9595
CMD python3 run.py
I would suggest creating separate build pipelines for your Docker images, since you know that the requirements for npm and pip don't change that frequently.
This will greatly improve the speed by reducing the time spent accessing the npm and pip registries.
Use a private Docker registry (the official one, or something like VMware Harbor or Sonatype Nexus OSS).
You store those build images in your registry and use them whenever something in the project changes.
Something like this:
First Docker Builder // python-builder:YOUR_TAG [gitrev, date, etc.]
docker build --no-cache -t python-builder:YOUR_TAG -f Dockerfile.python.build .
FROM python:3.5
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt &&\
pip3 install Flask-JWT-Extended==3.20.0
Second Docker Builder // js-builder:YOUR_TAG [gitrev, date, etc.]
docker build --no-cache -t js-builder:YOUR_TAG -f Dockerfile.js.build .
FROM node:10-alpine
WORKDIR /app
COPY app/static/package.json /app/app/static/
WORKDIR /app/app/static
RUN npm cache verify && npm install && npm install -g --unsafe-perm node-sass
Your Application Multi-stage build:
docker build --no-cache -t app_delivery:YOUR_TAG -f Dockerfile.app .
FROM python-builder:YOUR_TAG as python-build
# Nothing, already set in stone in another build process
FROM js-builder:YOUR_TAG AS node-build
ADD ##### YOUR JS/CSS files only here, required from npm! ###
RUN npm run sass && npm run build
FROM python:3.5-slim
# your original clean app (Dockerfile comments must be on their own line)
COPY . /app
COPY --from=python-build #### only the files installed with the pip command
WORKDIR /app
COPY --from=node-build ##### Only the generated files from npm here! ###
RUN apt-get update -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt
EXPOSE 9595
CMD python3 run.py
One question: why do you install curl and run pip install -r requirements.txt again in the final Docker image?
Running apt-get update and install every time without cleaning the apt cache in /var/cache/apt produces a bigger image.
As a suggestion, use the docker build command with the --no-cache option to avoid cached results:
docker build --no-cache -t your_image:your_tag -f your_dockerfile .
Remarks:
You'll have 3 separate Dockerfiles, as I listed above.
Build the Docker images 1 and 2 only if you change your python-pip and node-npm requirements, otherwise keep them fixed for your project.
If any dependency requirement changes, then update the docker image involved and then the multistage one to point to the latest built image.
You should always build only the source code of your project (CSS, JS, Python). That way, you also get reproducible builds.
To optimize your environment and copy files across the multi-stage builders, try using virtualenv for the Python build.
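Independent of splitting the builders, the order of COPY instructions inside the node stage decides when the bundle is rebuilt. A minimal sketch, assuming the front-end sources live under app/static next to package.json as in the question's Dockerfile, and that node_modules is excluded through .dockerignore:

```dockerfile
FROM node:10-alpine AS node-build
WORKDIR /app/app/static
# Dependency manifest first: these layers are reused until package.json changes
COPY app/static/package.json ./
RUN npm install && npm install -g --unsafe-perm node-sass
# Front-end sources next: any JS/CSS change invalidates the cache from this line on,
# so the bundle is regenerated, while Python-only changes leave this stage cached
COPY app/static/ ./
RUN npm run sass && npm run build
```

Because this stage copies from the build context instead of from the python-build stage, a change to a Python file no longer touches its cache key, and a change to a JS or CSS file always does.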
FROM golang:1.8
RUN apt-get -y update && apt-get install -y curl
RUN go get -u github.com/gorilla/mux
RUN go get github.com/mattn/go-sqlite3
RUN curl -sL https://deb.nodesource.com/setup_6.x | bash - && \
apt-get install -y nodejs
COPY . /go/src/beginnerapp
WORKDIR ./src/beginnerapp/beginner-app-react
RUN npm run build
RUN go install beginnerapp/
WORKDIR /go/src/beginnerapp/beginner-app-react
VOLUME /go/src/beginnerapp/local-db
WORKDIR /go/src/beginnerapp
ENTRYPOINT /go/bin/beginnerapp
EXPOSE 8080
At the start, the golang project as well as the reactjs code don't exist on the image and need to be copied over before being able to build (js) / install (golang). Is there a way I can do that build/install process before copying files over to the image? Ideally I'd only need to copy over the golang executable and reactjs production build.
Yes, this is now possible using multi-stage builds. The idea is that you can have multiple FROM instructions in your Dockerfile, and your final image will be built from the last FROM. Below is a sample pseudo-structure:
FROM node:latest as reactbuild
WORKDIR /app
COPY . .
RUN webpack build
FROM golang:latest as gobuild
WORKDIR /app
COPY . .
RUN go build
FROM alpine
WORKDIR /app
COPY --from=gobuild /app/myapp /app/myapp
COPY --from=reactbuild /app/dist /app/dist
Please read the article below for more details:
https://docs.docker.com/engine/userguide/eng-image/multistage-build/
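A more concrete sketch for the beginnerapp layout from the question follows. Several things in it are assumptions: that the React project has a package.json whose build script writes to a build directory, that the Go binary looks for that directory relative to its working directory, and that a Debian-based final image is acceptable. The last point matters because go-sqlite3 uses cgo, so a binary built in the Debian-based golang image links against glibc and would not start on a plain alpine image.

```dockerfile
# Stage 1: build the React bundle (Node 6, matching setup_6.x in the question)
FROM node:6 AS reactbuild
WORKDIR /app
COPY beginner-app-react/ .
RUN npm install && npm run build

# Stage 2: compile the Go binary
FROM golang:1.8 AS gobuild
RUN go get -u github.com/gorilla/mux && go get github.com/mattn/go-sqlite3
COPY . /go/src/beginnerapp
RUN go install beginnerapp

# Final stage: only the executable and the production bundle
FROM debian:stable-slim
WORKDIR /app
COPY --from=gobuild /go/bin/beginnerapp ./beginnerapp
# The "build" output directory is an assumption about the React tooling
COPY --from=reactbuild /app/build ./beginner-app-react/build
VOLUME /app/local-db
EXPOSE 8080
ENTRYPOINT ["/app/beginnerapp"]
```

Neither the Go toolchain nor Node ends up in the final image; the files are still copied into the build stages, but those stages are discarded after the build.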