How to reduce the time cost of duplicated steps in a multi-stage build? - docker

I have a Go application that depends on cgo. At build time it needs libsodium-dev, libzmq3-dev, and libczmq-dev, and at run time it needs the same three packages.
Currently I use the following multi-stage build: a golang build environment as the first stage and a Debian slim image as the second stage. As you can see, the three packages are installed twice, which wastes time (and I may add more packages of this kind later).
FROM golang:1.12.9-buster AS builder
WORKDIR /src/pigeon
COPY . .
RUN apt-get update && \
apt-get install -y --no-install-recommends libsodium-dev && \
apt-get install -y --no-install-recommends libzmq3-dev && \
apt-get install -y --no-install-recommends libczmq-dev && \
go build cmd/main/pgd.go
FROM debian:buster-slim
RUN apt-get update && \
apt-get install -y --no-install-recommends libsodium-dev && \
apt-get install -y --no-install-recommends libzmq3-dev && \
apt-get install -y --no-install-recommends libczmq-dev && \
apt-get install -y --no-install-recommends python3 && \
apt-get install -y --no-install-recommends python3-pip && \
pip3 install jinja2
WORKDIR /root/
RUN mkdir logger
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
Of course, I could give up the multi-stage build and use golang:1.12.9-buster for both building and running, but that would make the final run image bigger (avoiding that is the advantage of a multi-stage build).
Am I missing something, or do I have to choose between the two?

This is my take on your question:
FROM debian:buster-slim as base
RUN mkdir /debs /debs_tmp \
&& chmod 777 /debs /debs_tmp
WORKDIR /debs
RUN apt-get update \
&& apt-get install -y -d \
--no-install-recommends \
-o dir::cache::archives="/debs_tmp/" \
libsodium-dev \
libzmq3-dev \
libczmq-dev \
&& mv /debs_tmp/*.deb /debs \
&& rm -rf /debs_tmp \
&& apt-get install -y --no-install-recommends \
python3 \
python3-pip \
&& pip3 install jinja2 \
&& rm -rf /var/lib/apt/lists/*
##################
FROM golang:1.12.9-buster AS builder
COPY --from=base /debs /debs
WORKDIR /debs
RUN dpkg -i *.deb
WORKDIR /src/pigeon
COPY . .
RUN go build cmd/main/pgd.go
##################
FROM base
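# apt-get -d only downloaded the packages, so install them here before removing the archives
RUN dpkg -i /debs/*.deb \
&& rm -rf /debs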
WORKDIR /root/
RUN mkdir logger
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
You can download the required packages into a temporary folder, move the .deb files to a new location, and then COPY them into the next stage. The final stage simply reuses the first image you created.
BTW, the containers will run as root. This might be an issue depending on what the software does; you might want to consider running as an unprivileged user.
EDIT: sorry for the edits but I ran a couple of example locally and didn't have a go script ready.
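The non-root suggestion above could look roughly like the sketch below. It only shows a final stage; the account name "pigeon" is an assumption, the "builder" stage is the one defined earlier, and the run-time library installation is left out for brevity.

```dockerfile
# Sketch of a final stage that drops root privileges.
# "builder" is the build stage defined earlier; "pigeon" is a hypothetical account name.
FROM debian:buster-slim
RUN useradd --system --create-home --home-dir /home/pigeon pigeon
WORKDIR /home/pigeon
# --chown makes the copied files owned by the unprivileged account.
COPY --from=builder --chown=pigeon:pigeon /src/pigeon/pgd .
COPY --from=builder --chown=pigeon:pigeon /src/pigeon/logger logger
# Later instructions and the container process run as this user.
USER pigeon
CMD ["./pgd"]
```

Anything that needs root (apt-get, dpkg) has to come before the USER instruction.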

At the COPY . . step, any time your source changes, the cache will bust and you will run all later steps again. You can reorder the steps to allow docker to cache the install of your dependencies. You can also join the apt-get install commands into one to reduce overhead of processing the package manager db.
FROM golang:1.12.9-buster AS builder
WORKDIR /src/pigeon
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsodium-dev \
libzmq3-dev \
libczmq-dev
COPY . .
RUN go build cmd/main/pgd.go
FROM debian:buster-slim
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsodium-dev \
libzmq3-dev \
libczmq-dev \
python3 \
python3-pip \
&& pip3 install jinja2
WORKDIR /root/
RUN mkdir logger
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
You will still install the packages twice, but now those installs are cached for future builds. The way to reuse the install of the libraries is to reorder the steps: install the libraries in a common base image, then install the Go compiler in your build stage. That will almost certainly be more overhead than installing the libraries twice.
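For completeness, here is a rough sketch of that common-base layout. It assumes the Go toolchain can be copied from the official image (which keeps it in /usr/local/go) instead of being installed separately, and that gcc, libc6-dev and pkg-config are what cgo needs on the slim image; neither assumption comes from the question, so treat it as a starting point only.

```dockerfile
# Shared base: the three libraries are installed exactly once.
FROM debian:buster-slim AS base
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      libsodium-dev libzmq3-dev libczmq-dev \
 && rm -rf /var/lib/apt/lists/*

# Build stage: add the Go toolchain on top of the shared base.
FROM base AS builder
# Assumption: reuse the toolchain from the official image.
COPY --from=golang:1.12.9-buster /usr/local/go /usr/local/go
ENV PATH=/usr/local/go/bin:$PATH
# cgo needs a C compiler and libc headers, which the slim image lacks.
RUN apt-get update \
 && apt-get install -y --no-install-recommends gcc libc6-dev pkg-config \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /src/pigeon
COPY . .
RUN go build cmd/main/pgd.go

# Run stage: starts from the same base, so the libraries are already there.
FROM base
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip \
 && pip3 install jinja2 \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /root/
COPY --from=builder /src/pigeon/pgd .
COPY --from=builder /src/pigeon/logger logger
CMD ["./pgd"]
```

The trade-off is the one described above: the build stage now has to set up a compiler itself, which may well cost more than the duplicated library install it saves.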
With BuildKit, you could share the apt cache between builds using an experimental syntax, but this requires that all builds use BuildKit (the syntax is not backwards compatible), and modifying docker's Debian image to preserve the apt package cache. From the BuildKit experimental documentation, there's the following example for apt:
# syntax = docker/dockerfile:experimental
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
apt update && apt install -y gcc
https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
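Applied to the builder stage from the question, the same idea might look like the sketch below. It assumes a newer Docker release where cache mounts are available through the stable docker/dockerfile:1 frontend rather than the experimental one, and that BuildKit is enabled for the build; the sharing=locked option is added here so that parallel builds do not use the apt cache at the same time.

```dockerfile
# syntax=docker/dockerfile:1
FROM golang:1.12.9-buster AS builder
# Stop the image's apt hook from deleting downloaded packages.
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
# The cache mounts persist between builds but are not part of the image layers.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update \
 && apt-get install -y --no-install-recommends \
      libsodium-dev libzmq3-dev libczmq-dev
WORKDIR /src/pigeon
COPY . .
RUN go build cmd/main/pgd.go
```

The packages are still installed in each stage, but the downloads come from the shared cache instead of the network.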

Related

Problem installing packages in multi-stage Dockerfile in the final stage

I want to create a minimal docker image.
For that purpose I am using the following multi-stage build dockerfile.
FROM python:3.9-slim as base
ENV LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONFAULTHANDLER=1 \
PYTHONHASHSEED=random \
PYTHONUNBUFFERED=1
WORKDIR /app
FROM base as builder
ENV PIP_DEFAULT_TIMEOUT=100 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_NO_CACHE_DIR=1 \
POETRY_VERSION=1.1.13
COPY pyproject.toml poetry.lock ./
RUN apt-get update && \
apt-get install make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
libffi-dev liblzma-dev python3.9-venv --yes && \
pip install "poetry==$POETRY_VERSION" && \
python -m venv /venv && \
poetry export -f requirements.txt | /venv/bin/pip install -r /dev/stdin
COPY . /app
RUN poetry build && /venv/bin/pip install dist/*.whl
FROM base as final
ENV PATH=/venv/bin:$PATH
COPY --from=builder /venv /venv
RUN apt-get update && apt-get install -y procps curl
# for prometheus
EXPOSE 9090
CMD ["my_command"]
However, no matter where I put the install command in the final stage, the installed commands are not found in the final image.
RUN apt-get update && apt-get install -y procps curl
I have tried putting it before and after the COPY and ENV instructions, and still nothing.
Finally, I added another stage between base and builder just to run this command and then everything works fine.
It's bugging me why this would be the case though. Any ideas what's wrong with the dockerfile above?

How to run a bash script that takes multiple interactive user inputs as part of a Dockerfile

I have the Dockerfile below, which needs to run an OWASP bash script as part of its installation.
This .sh file needs multiple inputs (like 1, Y, Enter) from the user to complete the installation.
How do I provide these inputs from the Dockerfile, or is there a way to skip these inputs and continue the installation?
This Dockerfile is part of a docker-compose setup.
Below is the Dockerfile:
FROM ubuntu:20.04
RUN apt-get update && apt-get upgrade -y && apt-get clean
RUN apt-get install python3-pip -y
RUN apt-get install vim -y
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Indian
# Install OpenJDK-8
RUN apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean;
# Fix certificate issues
RUN apt-get update && \
apt-get install ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME
RUN apt-get install wget -y && \
apt-get install unzip -y && \
apt-get install zip -y
RUN mkdir /home/owasp
RUN wget -c https://github.com/zaproxy/zaproxy/releases/download/v2.11.0/ZAP_2_11_0_unix.sh -P /home/owasp
RUN chmod u+x /home/owasp/ZAP_2_11_0_unix.sh
RUN ./home/owasp/ZAP_2_11_0_unix.sh
Use the Linux package: https://github.com/zaproxy/zaproxy/releases/download/v2.11.0/ZAP_2.11.0_Linux.tar.gz
It has the same contents but is just a gzipped tar file :)
The full list of ZAP downloads is available at https://www.zaproxy.org/download/
Or you can always extend our Docker images: https://www.zaproxy.org/docs/docker/
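As a rough sketch, the installer steps from the question could be swapped for the tarball like this. The URL is the one given above; the /home/owasp target directory is taken from the question, and the temporary file name is an arbitrary choice.

```dockerfile
FROM ubuntu:20.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates wget \
 && rm -rf /var/lib/apt/lists/*
# Download, unpack and delete the archive in a single layer; no prompts are involved.
RUN mkdir -p /home/owasp \
 && wget -q https://github.com/zaproxy/zaproxy/releases/download/v2.11.0/ZAP_2.11.0_Linux.tar.gz -O /tmp/zap.tar.gz \
 && tar -xzf /tmp/zap.tar.gz -C /home/owasp \
 && rm /tmp/zap.tar.gz
```

Java still has to be installed as in the original Dockerfile.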
To provide input to a command, use an input generator and pipe it into your command.
A typical example is the yes command, which writes an endless stream of "y" to its output:
RUN yes|./own-shell-script.sh
You can also run printf 'y\n1abc\nxxx' and pipe it. "\n" in printf stands for a newline (i.e. Enter).
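For the installer in the question, which is said to ask for 1, Y and Enter, the printf variant might look like the line below. This assumes the script reads its answers from standard input in exactly that order, which I have not verified.

```dockerfile
# Answers are fed on stdin in order: "1", "Y", then an empty line for Enter.
RUN printf '1\nY\n\n' | /home/owasp/ZAP_2_11_0_unix.sh
```

If the installer asks more questions than that, add one line per prompt to the printf string.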
I would suggest adding an ENTRYPOINT so that the container invokes your bash script by default, while still giving the end user the flexibility to pass different arguments. See the official docs. Keep in mind that the CMD provided in a Dockerfile is only a default command; you override it by passing any other value.
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Indian
RUN apt-get update && apt-get upgrade -y && apt-get clean
RUN apt-get install python3-pip -y
RUN apt-get install vim -y
# Install OpenJDK-8
RUN apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean;
# Fix certificate issues
RUN apt-get update && \
apt-get install ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME
RUN apt-get install wget -y && \
apt-get install unzip -y && \
apt-get install zip -y
RUN mkdir /home/owasp
RUN wget -c https://github.com/zaproxy/zaproxy/releases/download/v2.11.0/ZAP_2_11_0_unix.sh -P /home/owasp
RUN chmod u+x /home/owasp/ZAP_2_11_0_unix.sh
ENTRYPOINT ["/home/owasp/ZAP_2_11_0_unix.sh"]
CMD ["--some", "--default", "--args"]
You can even pass default flags at build time. Your script will then always run with the defaults you provided via docker build --build-arg DEFAULT_PARAMS=--foo, unless you override them:
FROM ubuntu:20.04
ARG DEFAULT_PARAMS
ENV DEFAULT_PARAMS=${DEFAULT_PARAMS}
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Indian
RUN apt-get update && apt-get upgrade -y && apt-get clean
RUN apt-get install python3-pip -y
RUN apt-get install vim -y
# Install OpenJDK-8
RUN apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean;
# Fix certificate issues
RUN apt-get update && \
apt-get install ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
RUN export JAVA_HOME
RUN apt-get install wget -y && \
apt-get install unzip -y && \
apt-get install zip -y
RUN mkdir /home/owasp
RUN wget -c https://github.com/zaproxy/zaproxy/releases/download/v2.11.0/ZAP_2_11_0_unix.sh -P /home/owasp
RUN chmod u+x /home/owasp/ZAP_2_11_0_unix.sh
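# Shell form, so ${DEFAULT_PARAMS} is expanded when the container starts.
# Override the defaults with: docker run -e DEFAULT_PARAMS=... <image>
ENTRYPOINT exec /home/owasp/ZAP_2_11_0_unix.sh ${DEFAULT_PARAMS}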

Docker: COPY failed: stat <file>: file does not exist

I am trying to copy a file into my docker container but the command fails. The file is in the same directory as the Dockerfile, so I don't understand the reason for the error.
I'd appreciate any help or advice. Thanks beforehand.
This is the code:
FROM ubuntu:20.04 as builder
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y \
build-essential \
cmake \
software-properties-common \
libopencv-dev
RUN add-apt-repository -y ppa:chrberger/libcluon
RUN apt-get update
RUN apt-get install -y libcluon
ADD . /opt/sources
WORKDIR /opt/sources
RUN mkdir build && \
cd build && \
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/tmp/dest .. && \
make && make install
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update --fix-missing
RUN apt-get install -y \
libopencv-core4.2 \
libopencv-imgproc4.2 \
libopencv-video4.2 \
libopencv-calib3d4.2 \
libopencv-features2d4.2 \
libopencv-objdetect4.2 \
libopencv-highgui4.2 \
libopencv-videoio4.2 \
libopencv-flann4.2 \
libopencv-dnn-dev \
python3-opencv
WORKDIR /usr/bin
COPY --from=builder /tmp/dest /usr
COPY --from=builder yolov3-tiny_obj.cfg /params
ENTRYPOINT ["/usr/bin/opendlv-perception-helloworld"]
Could you please clarify which line in your Dockerfile causes the error message?
Is the file you are trying to copy from your working directory yolov3-tiny_obj.cfg?
If that is the case, it fails because you specify to copy it from the builder stage.
The line should probably look like this:
COPY yolov3-tiny_obj.cfg /params
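If the file should come from the builder stage instead, a possible alternative is to give its full path there. As far as I know, source paths in COPY --from are resolved from the root of that stage's filesystem rather than from its WORKDIR, and the builder stage placed the sources under /opt/sources with ADD . /opt/sources. This sketch assumes the .cfg file sits at the top level of the build context.

```dockerfile
# Absolute path inside the builder stage, where ADD . /opt/sources put the sources.
COPY --from=builder /opt/sources/yolov3-tiny_obj.cfg /params
```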

Docker image size is coming up to 1.7 G for Ubuntu with Python packages

The following is my Dockerfile:
FROM ubuntu:18.04 AS builder
RUN apt update -y
RUN apt install python3.8 -y && apt install python3-pip -y
RUN apt install build-essential automake pkg-config libtool libffi-dev libgmp-dev -y
RUN apt install libsecp256k1-dev -y
RUN apt install openjdk-8-jre -y
RUN apt install git -y
RUN apt install libkrb5-dev -y
RUN apt install vim -y
RUN mkdir /opt/app
RUN chown -R root:root /opt/app
COPY ["requirements.txt","/opt/app/requirements.txt"]
SHELL ["/bin/bash", "-c"]
WORKDIR /opt/app
RUN pip3 install -r requirements.txt && apt-get -y clean all
RUN mkdir /opt/app/
RUN chown -R root:root /opt/app/
RUN cd /opt/app/
RUN git clone -b master https://bitbucket.org/heroes/test.git
CMD ["bash","/opt/app/bin/connect.sh"]
The resulting Docker image is 1.7 GB. I need OpenJDK, so I cannot use a standard Python image as the base. When I run docker history, I can see two or three layers (those installing packages such as Python 3.8, OpenJDK, and libsecp256k1-dev) each taking 400 MB to 500 MB. The Ubuntu base image takes only 64 MB; the rest of the size comes from my Dockerfile layers.
I believe I need to rewrite the Dockerfile to reduce the image size. I tried, but it made no concrete difference.
Please help me reduce the image to less than 1 GB at least.
[Update]
Below is my updated Dockerfile:
FROM ubuntu:18.04 AS builder
WORKDIR /opt/app
COPY requirements.txt /opt/app/aws/requirements.txt
RUN mkdir -p /opt/app/aws \
&& apt-get update -yq \
&& apt-get install -y python3.8 python3-pip openjdk-8-jre -yq && apt-get -y clean all \
&& chown -R root:root /opt/app && cd /opt/app/aws && pip3 install -r requirements.txt
FROM alpine
COPY --from=builder /opt/app /opt/app
SHELL ["/bin/bash", "-c"]
CMD ["bash","/opt/app/aws/bin/connector/connect.sh"]
After removing unwanted libraries such as git and using a multi-stage build, the image is now approximately 1.7 GB, which I believe is a lot. Any suggestions to improve this?
You have multiple issues going on.
First, each separate RUN apt install adds a layer and increases your image size. Put them all in the same RUN instruction and, at the end of that instruction, delete all cached apt files.
Second, you're installing unnecessary stuff. Why would you need vim and git for instance? Why are you installing build-essential and other build-related stuff if you're not building anything?
Third, it seems you tried to do a multi-stage build but ended up adding everything to the same image. Read up on Python multi-stage builds.
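To illustrate the third point, here is a minimal sketch of a Python multi-stage build for this case. It assumes the requirements can be installed into a virtual environment (/opt/venv is an arbitrary path), and that libffi6, libgmp10, libsecp256k1-0 and libkrb5-3 are the run-time counterparts of the -dev packages on Ubuntu 18.04. Copying the application code is left out.

```dockerfile
# Build stage: compilers and -dev headers live only here.
FROM ubuntu:18.04 AS builder
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      python3 python3-venv python3-dev build-essential pkg-config \
      libffi-dev libgmp-dev libsecp256k1-dev libkrb5-dev
# A virtual environment keeps everything pip installs under one directory.
RUN python3 -m venv /opt/venv
COPY requirements.txt /tmp/requirements.txt
RUN /opt/venv/bin/pip install --no-cache-dir -r /tmp/requirements.txt

# Run-time stage: same base image, so the venv's interpreter path still resolves.
FROM ubuntu:18.04
# Assumption: these are the run-time counterparts of the -dev packages above.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      python3 openjdk-8-jre-headless \
      libffi6 libgmp10 libsecp256k1-0 libkrb5-3 \
 && rm -rf /var/lib/apt/lists/*
# Only the installed Python packages cross over; the build tools stay behind.
COPY --from=builder /opt/venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH
WORKDIR /opt/app
CMD ["bash","/opt/app/bin/connect.sh"]
```

The point is that the final stage never contains build-essential or the header packages, so their size does not end up in the image.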
Following best practices, use a single RUN instead of multiple RUN instructions.
For example:
RUN apt-get update -yq \
&& apt-get install -y python3-dev build-essential -yq \
&& apt-get install curl -yq \
&& pip install -r requirements.txt \
&& apt-get purge -y --auto-remove gcc python3-dev build-essential
You can use multi-stage builds: if you don't need git in your final image, leave it out of the final stage.
If possible, you could also use an Alpine-based image.
Try disabling APT's recommended packages with --no-install-recommends.
Now the image is smaller:
FROM ubuntu:18.04 AS builder
RUN apt update -y
RUN apt install python3-pip -y
RUN apt install build-essential automake pkg-config libtool libffi-dev libgmp-dev -y
RUN apt install libsecp256k1-dev -y
RUN apt install openjdk-8-jre-headless -y
RUN apt install git -y
RUN apt install libkrb5-dev -y
RUN apt install vim -y
RUN mkdir /opt/app
RUN chown -R root:root /opt/app
COPY ["requirements.txt","/opt/app/requirements.txt"]
SHELL ["/bin/bash", "-c"]
WORKDIR /opt/app
RUN pip3 install -r requirements.txt && apt-get -y clean all
RUN mkdir /opt/app/
RUN chown -R root:root /opt/app/
RUN cd /opt/app/
RUN git clone -b master https://bitbucket.org/heroes/test.git
CMD ["bash","/opt/app/bin/connect.sh"]
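Combining the suggestions from the answers above (a single RUN, --no-install-recommends, the headless JRE, and cleaning up in the same layer) might give something like the sketch below. Which packages can be purged after pip install is an assumption here; it depends on what requirements.txt actually needs at run time.

```dockerfile
FROM ubuntu:18.04
WORKDIR /opt/app
COPY requirements.txt /opt/app/requirements.txt
# One layer: install, build, clone, then remove what is only needed at build time.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      python3-pip python3-setuptools python3-wheel python3-dev \
      build-essential automake pkg-config libtool \
      libffi-dev libgmp-dev libsecp256k1-dev libkrb5-dev \
      openjdk-8-jre-headless git ca-certificates \
 && pip3 install --no-cache-dir -r requirements.txt \
 && git clone -b master https://bitbucket.org/heroes/test.git \
 # Assumption: the compiler toolchain and git are not needed at run time.
 && apt-get purge -y --auto-remove build-essential automake libtool git \
 && rm -rf /var/lib/apt/lists/*
CMD ["bash","/opt/app/bin/connect.sh"]
```

Files deleted in a later RUN still occupy space in the earlier layer, which is why the cleanup has to happen in the same instruction as the install.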

How do I 'copy' installed R packages from the 1st stage to the 2nd stage using a multi-stage build on an r-base image?

I'm trying to build an image based on r-base, following the multi-stage method. How can I copy the installed packages from the 1st stage into the 2nd stage, and nothing else?
The current file basically gives me a 'packageless' r-base version, so the packages installed in the 1st stage are 'lost' somewhere.
I think it has something to do with creating and choosing the correct directories. This part is confusing for me, since I'm fairly new to dockerizing applications.
Thanks for all your help!
Below is my current file:
# Base image
FROM rocker/r-base:latest AS stage1
## install binary, build and dependend packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-stringr \
libxml2-dev \
libssl-dev && \
echo "r <- getOption('repos');r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile && \
Rscript -e "install.packages(c('AzureStor'))"
##2nd stage, pulling 'fresh' base image
FROM rocker/r-base:latest
#COPY packages from 1st stage
COPY --from=stage1 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
## create directories
RUN mkdir -p /script \
#Copy scripts
COPY /script /script
## Set workdir
WORKDIR /script
In the comments you note that you want to get rid of any excess 'weight'. That weight typically comes from having development tools and packages installed. The rocker/r-base image already brings in quite a bit of weight, since it has r-base-dev with its dependencies installed. However, we can avoid adding further weight by keeping only the run-time dependencies in the final image and getting rid of the build-time dependencies. Build-time dependencies that an R package does not need at run time are typically development files such as header files for system libraries; for example, you don't need the libxml2-dev package at run time. The libxml2 package would be enough.
I see several possible approaches to this.
First, you could use binary packages for those packages that need compilation against system libraries. I have not checked the dependencies of AzureStor, but it might well be that all the required R packages exist as compiled Debian packages. These depend only on the run-time dependencies, which keeps the image size small and the build time short. Your Dockerfile would look something like this:
FROM rocker/r-base:latest
## install binary, build and dependent packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-stringr \
r-cran-... \
r-cran-... && \
Rscript -e "install.packages(c('AzureStor'))" && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /tmp/*
## create directories
RUN mkdir -p /script
#Copy scripts
COPY /script /script
## Set workdir
WORKDIR /script
Second, you could install both build- and run-time dependencies before installing R packages from source and remove the build-time dependencies after it, all within one command:
FROM rocker/r-base:latest
## install binary, build and dependent packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-stringr \
libxml2-dev libxml2 \
libssl-dev libssl1.1 && \
Rscript -e "install.packages(c('AzureStor'))" && \
apt-get purge --yes libxml2-dev libssl-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /tmp/*
## create directories
RUN mkdir -p /script
#Copy scripts
COPY /script /script
## Set workdir
WORKDIR /script
Finally, you could use a multistage build with three stages:
1. Add the run-time dependencies.
2. Add the build-time dependencies and install packages into /usr/local/lib/R/site-library.
3. Use 1. as base and add the packages from 2.
So something like this:
# Base image
FROM rocker/r-base:latest AS stage1
## install binary, build and dependent packages
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-stringr \
libxml2 \
libssl1.1 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /tmp/*
FROM stage1 AS stage2
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
libxml2-dev \
libssl-dev && \
Rscript -e "install.packages(c('AzureStor'))"
FROM stage1
COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
## create directories
RUN mkdir -p /script
#Copy scripts
COPY /script /script
## Set workdir
WORKDIR /script
I have personally used the first and second approaches. I have not tested the third approach, but I expect it to work as well.
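Since the third approach is untested, one way to catch a 'packageless' result early is to load a package during the build, as in this sketch of the final stage. As far as I know, Debian's r-cran-* packages are installed under /usr/lib/R/site-library, while install.packages() writes to /usr/local/lib/R/site-library, so only the latter needs to be copied when the final stage starts from stage1.

```dockerfile
FROM stage1
COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
# Print the library search path, then fail the build if the copied package cannot be loaded.
RUN Rscript -e ".libPaths()" \
 && Rscript -e "library(AzureStor)"
```

This would also explain the original problem: starting the second stage from a fresh rocker/r-base drops the r-cran-* packages that apt installed in the first stage.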
