I need an application in a Docker image that requires specific versions of some libraries, which have to be built from source.
So I am building them during the Docker build process.
The problem is that it takes a long time (about 30 minutes).
I am wondering if it's possible to save the result in a cache layer and skip the whole build the next time the image is built.
Here is the critical part of my Dockerfile:
ADD https://sqlite.org/2022/sqlite-autoconf-3380200.tar.gz sqlite-autoconf-3380200.tar.gz
RUN tar -xvzf sqlite-autoconf-3380200.tar.gz
WORKDIR sqlite-autoconf-3380200
RUN ./configure
RUN make
RUN make install
WORKDIR /tmp
ADD https://download.osgeo.org/proj/proj-9.0.0.tar.gz proj-9.0.0.tar.gz
RUN tar -xvzf proj-9.0.0.tar.gz
WORKDIR proj-9.0.0
RUN mkdir build
WORKDIR build
RUN cmake ..
RUN cmake --build .
RUN cmake --build . --target install
RUN projsync --system-directory --list-files
The important detail about Docker layer caching is that, if any of the previous steps have changed, then all of the following steps will be rebuilt. So for your setup, if you change anything in one of the earlier dependencies, it will cause all of the later steps to be rebuilt again.
This is a case where Docker multi-stage builds can help. The idea is that you'd build each library in its own image, and therefore each library build can be independently cached. You can then copy all of the build results into a final image.
The specific approach I'll describe here assumes (a) all components install into /usr/local, (b) /usr/local is initially empty, and (c) there aren't conflicts between the different library installations. You should be able to adapt it to other filesystem layouts.
Everything below is in the same Dockerfile.
I'd make a very first stage selecting a base Linux-distribution image. If you know you'll always need to install something – TLS CA certificates, mandatory package updates – you can put it here. Having this helps ensure that everything is being built against a consistent base.
FROM ubuntu:20.04 AS base
# empty in this example
Since you have multiple things to build, the next stage installs any build-time dependencies. The C toolchain and its dependencies are large, so keeping this stage separate saves time and space, since the toolchain can be shared across the later stages.
FROM base AS build-deps
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
build-essential \
cmake
# libfoo-dev
Now for each individual library, you have a separate build stage that downloads the source, builds it, and installs it into /usr/local.
FROM build-deps AS sqlite
WORKDIR /sqlite
ADD https://sqlite.org/2022/sqlite-autoconf-3380200.tar.gz sqlite-autoconf-3380200.tar.gz
...
RUN make install
FROM build-deps AS proj
WORKDIR /proj
ADD https://download.osgeo.org/proj/proj-9.0.0.tar.gz proj-9.0.0.tar.gz
...
RUN cmake --build . --target install
To actually build your application, you'll need the C toolchain plus these various libraries.
FROM build-deps AS app
COPY --from=sqlite /usr/local/ /usr/local/
COPY --from=proj /usr/local/ /usr/local/
WORKDIR /app
COPY ./ ./
RUN ./configure && make && make install
Once you've done all of this, in the app image, the /usr/local tree will have all of the installed libraries (COPYed from the previous image) plus your application. So for the final stage, start from the original OS image (without the C toolchain) and COPY the /usr/local tree in (without the original sources).
FROM base
COPY --from=app /usr/local/ /usr/local/
EXPOSE 12345
# `myapp` is installed in /usr/local/bin
CMD ["myapp"]
Let's say you update to a newer patch version of proj. In the sqlite path, the base and build-deps layers haven't changed and the ADD and RUN commands are the same, so this stage runs entirely from cache. proj is rebuilt. That will cause the COPY --from=proj step to invalidate the cache in the app stage, and you'll rebuild your application against the newer library.
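If you're using BuildKit, you can also build and cache an individual stage ahead of time. A usage sketch (the image tags here are made-up examples):
DOCKER_BUILDKIT=1 docker build -t myapp .
# build only up to the proj stage, e.g. to warm its cache or debug it
DOCKER_BUILDKIT=1 docker build --target proj -t myapp-proj .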
I am building a Docker image and need to run pip install against a private PyPI with credentials.
What is the best way to secure the credentials?
Using the various file configuration options (pip.conf, requirements.txt, .netrc) is still a vulnerability even if I delete them afterwards, because they can be recovered from the image layers.
Environment variables are also visible.
What's the most secure approach?
I understand that you want to provide those credentials at build time and get rid of them afterwards.
The most secure way to handle this with pip is a multi-stage build.
First, declare an initial build image with the configuration files and any dependencies that could be needed to download or compile your desired packages; don't worry about those files being recoverable, since they will only ever exist in the build stage.
Afterwards, define your final image without the build dependencies and copy only the source code you want to run from your project and the dependencies from the build image. The resulting image won't have the configuration files, and it's impossible to recover them, since they were never there.
FROM python:3.10-slim as build
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
build-essential gcc
WORKDIR /usr/app
RUN python -m venv /usr/app/venv
ENV PATH="/usr/app/venv/bin:$PATH"
# [HERE YOU COPY YOUR CONFIGURATION FILES WITH CREDENTIALS]
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM python:3.10-slim
WORKDIR /usr/app
COPY --from=build /usr/app/venv ./venv
# [HERE YOU COPY YOUR SOURCE CODE INTO YOUR CURRENT WORKDIR]
ENV PATH="/usr/app/venv/bin:$PATH"
ENTRYPOINT ["python", "whatever.py"]
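As a concrete (hypothetical) example of the bracketed credentials placeholder: the index credentials might live in a pip.conf copied only into the build stage. Since just the venv directory is copied into the final image, the file and its layer never leave the build stage:
# hypothetical file name; pip reads /etc/pip.conf as global configuration
COPY pip.conf /etc/pip.conf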
I would like to say that this is my first container and actually my first Java app, so I may have basic questions; please be lenient.
I wrote a Spring Boot app and my colleague has written the frontend part for it in Angular. What I would like to achieve is to have "one button/one command" in IntelliJ to create a container containing the whole app, backend and frontend.
What I need to do is:
Clone the FE from the company repository (I am using an SSH key now)
Clone the BE from GitHub
Build the FE
Copy the built FE into the static folder of the Java app
Build the BE
Create a container running this app
My current solution is to create a "builder" container, build the FE and BE there, and then copy the result into a "production" container like this:
#BUILDER
FROM alpine AS builder
WORKDIR /src
# add credentials on build
ARG SSH_PRIVATE_KEY
RUN mkdir /root/.ssh/ \
&& echo "${SSH_PRIVATE_KEY}" > /root/.ssh/id_rsa \
&& echo "github.com,140.82.121.3 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==" >> /root/.ssh/known_hosts \
&& chmod 600 /root/.ssh/id_rsa
# installing dependencies
RUN apk update && apk upgrade && apk add --update nodejs nodejs-npm \
&& npm install -g @angular/cli \
&& apk add openjdk11 \
&& apk add maven \
&& apk add --no-cache openssh \
&& apk add --no-cache git
#cloning repositories
RUN git clone git@code.siemens.com:apcprague/edge/metal-forming-fe.git
RUN git clone git@github.com:bzumik1/metalForming.git
# builds front end
WORKDIR /src/metal-forming-fe
RUN npm install && ng build
# builds whole java app with front end
WORKDIR /src/metalForming
RUN cp -a /src/metal-forming-fe/dist/metal-forming/. /src/metalForming/src/main/resources/static \
&& mvn install -DskipTests=true
#PRODUCTION CONTAINER
FROM adoptopenjdk/openjdk11:debian-slim
LABEL maintainer jakub.znamenacek@siemens.com
RUN mkdir app
RUN ["chmod", "+rwx", "/app"]
WORKDIR /app
COPY --from=builder /src/metalForming/target/metal_forming-0.0.1-SNAPSHOT.jar .
EXPOSE 4200
RUN java -version
CMD java -jar metal_forming-0.0.1-SNAPSHOT.jar
This works, but it takes a very long time, so I guess this is not the correct way to do it. Could anyone point me in the right direction? I was thinking there might be a way to make Maven do all these steps for me, but maybe that is totally off.
Also, if you find any problem in my Dockerfile, please let me know; as I said, this is my first Dockerfile, so I could have overlooked something.
EDITED:
BTW, does anyone know how I can get rid of this part:
echo "github.com,140.82.121.3 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==" >> /root/.ssh/known_hosts \
It adds GitHub to known_hosts (I also need to add the company repository there). Without it, when I run git clone it asks me whether I trust the host and I have to type yes, but there is no way to type yes when it runs automatically inside a docker build. I have tried yes | git clone ... but that is also not working.
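For what it's worth, a common way to generate those known_hosts entries at build time instead of hard-coding the keys is ssh-keyscan; a sketch (it needs the openssh package installed first, and it trusts whatever key the host presents during the build):
RUN mkdir -p /root/.ssh \
    && ssh-keyscan github.com code.siemens.com >> /root/.ssh/known_hosts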
A few things:
1. If this runs slow on the machine, it will run slow inside a container too.
2. Remove --no-cache:* you want to cache everything that is static, because the next time you build, those commands will not run where there is no change. Once there is a change in one command, that command reruns instead of using the builder cache, and all subsequent commands have to rerun too.
*UPDATE: I mixed up "apk add --no-cache" with "docker build --no-cache". I wrongly stated that "apk add --no-cache" means the command is not cached; it is still cached at the docker builder level. What the parameter does mean is that you don't have to delete the /var/cache/apk/ directory in a later step to make your image smaller. You don't need to do that here anyway, because you are already using a multi-stage build, so it would not affect your final image size.
One more thing to clarify: every statement in a Dockerfile is checked for changes; if it did not change, the docker builder uses the cached layer for it and won't rerun that statement. The exception is the ADD and COPY commands, where the builder also checksums the copied or added files to see whether they changed. If a statement changed, or an ADD-ed or COPY-ed file changed, that statement is rerun and all subsequent statements rerun too, so you want to put your source-code COPY statement as close to the end as possible.
If you want to disable this cache, run "docker build --no-cache ..."; that way all the steps in the Dockerfile will be rerun.
3. Specify WORKDIR once at the top; if you need to switch directory later, use this instead:
RUN cd /someotherdir && mycommand
Also, a subsequent WORKDIR is relative to the previous WORKDIR, which messes up readability, and readability is (probably) the sole purpose of the WORKDIR statement.
4. Enable BuildKit:
Either set the environment variable
DOCKER_BUILDKIT=1
or add this to /etc/docker/daemon.json
{ "features": { "buildkit": true } }
BuildKit might not help in this case, but if you write more complex Dockerfiles with more stages, BuildKit can run those stages in parallel, so the overall build will be faster.
5. Do not skip tests with -DskipTests=true :)
6. As stated in a comment, do not clone the repos inside the image build; you do not need to do that at all. Just put the Dockerfile in the root of the repo and COPY the repo files with a Dockerfile command:
COPY . .
The first dot is the source: your current directory on your machine (the build context). The second dot is the target: the working directory inside the image, /src in your Dockerfile. You build the image and publish it, i.e. push it to a docker registry, so others can pull it and start using it. If you want more complex building and publishing with the help of a server, look up CI/CD techniques.
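Put together, points 3 and 6 might look roughly like this for the backend repo. This is only a sketch: the Maven base image and the assumption that the built frontend is already present in the repo are mine, and the artifact path follows the Dockerfile above:
# Dockerfile at the root of the backend repo
FROM maven:3-jdk-11 AS builder
WORKDIR /src
# copy the already-checked-out repo instead of cloning inside the build
COPY . .
RUN mvn package

FROM adoptopenjdk/openjdk11:debian-slim
WORKDIR /app
COPY --from=builder /src/target/metal_forming-0.0.1-SNAPSHOT.jar .
CMD ["java", "-jar", "metal_forming-0.0.1-SNAPSHOT.jar"]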
Using the official golang docker image, I can use the protoc command to generate the x.pb.go and x_grpc.pb.go files. The problem is that it uses the latest plugin versions, while I want to generate those files with whichever versions are part of the go.mod file.
I tried to start from the golang image, then get my project's go.mod file, get the dependencies and try to generate from there. Here is my dockerfile:
FROM golang:1.15
WORKDIR /app
RUN apt-get update
RUN apt install -y protobuf-compiler
COPY go.* .
RUN go mod download
RUN go get all
RUN export PATH="$PATH:$(go env GOPATH)/bin"
RUN mkdir /api
Then I try to bind-mount the folder with the .proto file and the /pb folder for the output, and use the protoc command again (I'm trying it directly inside the container right now). Something like this:
protoc --proto_path=/api --go_out=/pb --go-grpc_out=/pb /api/x.proto
I'm getting this error though:
protoc-gen-go: program not found or is not executable
--go_out: protoc-gen-go: Plugin failed with status code 1.
My go.sum file has google.golang.org/protobuf v1.25.0 in it, so how come it is not found?
go.mod & go.sum are used for versioning when building go programs. This is not what you need here. You want the protoc compiler to use the correct plugin versions when running it against your .proto file(s).
To pin the desired versions of the protoc-gen-go (and, if using gRPC, protoc-gen-go-grpc) plugins, install them directly. Update your Dockerfile like so:
FROM golang:1.15
WORKDIR /app
RUN apt-get update
RUN apt install -y protobuf-compiler
RUN GO111MODULE=on \
go get google.golang.org/protobuf/cmd/protoc-gen-go@v1.25.0 \
google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.1.0
# export is redundant here: `/go/bin` is already in the `golang` image's PATH
# (and any env change made via RUN is lost once the command completes)
# RUN export PATH="$PATH:$(go env GOPATH)/bin"
RUN mkdir /api
If you want the latest version of either plugin, use @latest, or drop the @ suffix entirely.
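With the plugins baked into the image, the protoc invocation from the question can then run inside a container with the volumes bound. A usage sketch (the image tag protogen is a made-up example):
docker build -t protogen .
docker run --rm -v "$(pwd)/api:/api" -v "$(pwd)/pb:/pb" protogen \
    protoc --proto_path=/api --go_out=/pb --go-grpc_out=/pb /api/x.proto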
I feel confused by the Dockerfile and build process. Specifically, I am working my way through the book Docker on AWS, and I feel stuck until I can work out a few more of the details. The book had me write the following Dockerfile.
#Test stage
FROM alpine as test
LABEL application=todobackend
#Install basic utilities
RUN apk add --no-cache bash git
#Install build dependencies
RUN apk add --no-cache gcc python3-dev libffi-dev musl-dev linux-headers mariadb-dev py3-pip
RUN ../../usr/bin/pip3 install wheel
#Copy requirements
COPY /src/requirements* /build/
WORKDIR /build
#Build and install requirements
RUN pip3 wheel -r requirements_test.txt --no-cache-dir --no-input
RUN pip3 install -r requirements_test.txt -f /build --no-index --no-cache-dir
# Copy source code
COPY /src /app
WORKDIR /app
# Test entrypoint
CMD ["python3","manage.py","test","--noinput","--settings=todobackend.settings_test"]
The following is a list of the things I understand versus don't understand.
I understand this.
#Test stage
FROM alpine as test
LABEL application=todobackend
It is defining a 'test' stage so I can run commands like docker build --target test, which will execute all of the following commands until the next FROM ... AS line indicates a different stage. LABEL is labeling the specific docker image that is built and from which containers will be 'born' (not sure if that is the right word to use). I don't feel any confusion about that, EXCEPT whether that label carries over to containers spawned from that image.
So NOW I start to feel confused.
I PARTLY understand this
#Install basic utilities
RUN apk add --no-cache bash git
I understand that apk is an overloaded term that represents both the package manager on Alpine Linux and a file type. In this context, it is a package manager command to install (or upgrade) a package on the running system. HOWEVER, I am supposed to be building / packaging up an application and all of its dependencies into an enclosed 'environment'. Sooo... where / when does this 'environment' come in? That is where I feel confused. When the docker file is running apk, is it just saying "locally, on your current machine, please install these the normal way" (i.e., the equivalent of a bash script where apk installs to its working directory)? When I run docker build --target test -t todobackend-test on my previously pasted docker file, is the docker command doing both a native command execution AND a Docker Engine call to create an isolated environment for my docker image? I feel like what must be happening is that, when the docker command is run, it acts like a wrapper around the built-in package manager / bash / pip functionality AND the docker engine, and is doing both, but I don't know.
Anyways, I hope that this made sense. I just want some implementation details. Feel free to link documentation, but it can feel super tedious and unnecessarily detailed OR obfuscated sometimes.
I DO want to point out that if I run an apk command in my Dockerfile with a bad package name (e.g. python3-pip instead of py3-pip), I get a very interesting error:
/bin/sh: pip3: not found
Notice the command path. I am assuming anyone reading this will understand why that feels hella confusing.
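A small experiment that might make the isolation concrete (a sketch, reusing the build command from above):
docker build --target test -t todobackend-test .
docker run --rm todobackend-test which pip3   # found inside the image
which pip3                                    # may well find nothing on the host
Every RUN line executes inside a temporary container created from the previous layer, not on the host; only the resulting filesystem changes are kept as a new image layer.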
I have a Go application that I build into a binary and distribute as a Docker image.
Currently, I'm using ubuntu as my base image, but this causes an issue: if a user tries to use a timezone other than UTC or their local timezone, they get an error stating:
pod error: panic: open /usr/local/go/lib/time/zoneinfo.zip: no such file or directory
This error occurs because the LoadLocation function in Go's time package requires that file.
I can think of two ways to fix this issue:
Continue using the ubuntu base image, but in my Dockerfile add the command: RUN apt-get install -y tzdata
Use one of Golang's base images, e.g. golang:1.7.5-alpine.
What would be the recommended way? I'm not sure if I need to or should be using a Golang image since this is the container where the pre-built binary runs. My understanding is that Golang images are good for building the binary in the first place.
I prefer to use a multi-stage build. In the first stage you use a special golang build container to install all dependencies and build the application. In the second stage you copy the binary into an empty alpine container. This gives you all the required tooling and a minimal docker image at the same time (in my case 6MB instead of 280MB).
Example Dockerfile:
# build stage
FROM golang:1.8
ADD . /src
RUN set -x && \
cd /src && \
go get -t -v github.com/lisitsky/go-site-search-string && \
CGO_ENABLED=0 GOOS=linux go build -a -o goapp
# final stage
FROM alpine
WORKDIR /app
COPY --from=0 /src/goapp /app/
ENTRYPOINT ["/app/goapp"]
EXPOSE 8080
Since not all OS images have the timezone database installed, this is what I did to make it work:
ADD https://github.com/golang/go/raw/master/lib/time/zoneinfo.zip /usr/local/go/lib/time/zoneinfo.zip
The full example of a multi-stage Docker image that adds timezone data is here.
This is more of a vote, but apt-get is what we (my company's tech group) do in situations like this. It gives us complete control over the hierarchy of images, though this assumes you may have future images based on this one.
You can use the system's tzdata. Or you can copy $GOROOT/lib/time/zoneinfo.zip into your image, which is a trimmed version of the system one.
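In a multi-stage build, that copy might look like the following sketch, assuming the binary is built in a golang stage as in the earlier answer (LoadLocation checks the ZONEINFO environment variable first, so setting it makes the location explicit):
FROM golang:1.8 AS build
# ... build the binary as above ...

FROM alpine
# copy the trimmed zoneinfo database shipped with the Go toolchain
COPY --from=build /usr/local/go/lib/time/zoneinfo.zip /usr/local/go/lib/time/zoneinfo.zip
ENV ZONEINFO=/usr/local/go/lib/time/zoneinfo.zip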