Go dockerized build, cacheing the dependency pull layer

Go dockerized build, cacheing the dependency pull layer - docker

I'm having a ripping good time using skaffold to develop some kubernetes services, but one of the longest steps in my cycle is pulling all the dependencies for the container.
Does anyone have recommendations on how I can best cache all the dependencies in a layer? Are there best practices with building go binaries inside docker containers? Should I have a layer where I do a go get? (Also I'm a novice building go binaries, don't know all the bells and whistles yet.)

I encountered exactly the same issue while integrating skaffold with kubebuilder and the idea to completely resolve this issue was:
To install buildkit, for example: brew install buildkit;
To enable skaffold for local build by having something like this:
apiVersion: skaffold/v1beta9
kind: Config
build:
local:
useBuildkit: true
useDockerCLI: true
...
Edit the Dockerfile to enable it:
# syntax=docker/dockerfile:experimental
# Build the manager binary
FROM golang:1.12.5 as builder
...
# Build
#RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o manager main.go
RUN --mount=type=cache,target=/go/pkg/mod --mount=type=cache,target=/root/.cache/go-build CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -o manager main.go
...
Then at first time it will still download every dependency but afterwards it will use the cache, which dramatically speeds up the build process.

I agree with Grigoriy Mikhalkin. Regarding your performance improvements, I want to name the Docker Build Enhancements which are based on moby/buildkit. At the time of writing, the tools aren't properly documented, but with some trial and error, you might find your solution.
Using buildkit, you can use a cache in your RUN statements in order to reduce the time of subsequent executions. They provide an example of Go in their docs, as well. In order to have it work, you have to enable the experimental features for both the Docker daemon and client (described on the link above).

I found this article after googling some more which covers the process: Using go mod download to speed up Golang Docker builds
The gist of the trick is to copy your go.mod and go.sum files into the container, then run go mod download to download dependencies, and then in another step continue with your build.
This works because your the go.mod and go.sum files do not change unless you add more dependencies. So, when the next RUN statement happens which is the go mod download docker knows that that it can cache this layer. (Source)
FROM golang:1.13.9-buster as builder
# Make Build dir
RUN mkdir /build
WORKDIR /build
# Copy golang dependency manifests
COPY go.mod .
COPY go.sum .
# Cache the downloaded dependency in the layer.
RUN go mod download
# add the source code
COPY . .
# Build
RUN go build -o app
# Run
FROM debian:buster-slim
COPY --from=builder /build
WORKDIR /app
CMD ["./app"]

It's customary to use multi-stage build for go services.
So that all dependencies resolved and executable is built at build stage. And final stage is actually running executable. As a result, your final image will be more slim in size. Although, it's not gonna speed up dependency resolution stage.

Related

Why does `docker build` sometimes not do anything and how can I force it to always build

I'm doing go development.
Sometimes when I do docker build it doesn't build. It's pretty obvious by the output, since the build process has to download some libs into the container.
What can I put in my Dockerfile to force it to always build?
This is one of the Dockerfiles that does this sometimes, with some minor employer stuff sanitized:
FROM golang:1.17 AS build
WORKDIR /software
COPY . ./
# Without this, the resultant binary expects some DNS resolver
# files that are actually optional, and (obviously) don't
# exist on scratch
ENV CGO_ENABLED=0
RUN go build
FROM scratch
COPY --from=build /program /program
EXPOSE 8000/tcp
ENTRYPOINT [ "/program" ]

Docker build will only re-build where there is a change in your Dockerfile.
It will cache what it has built that has not changed so it does not have to do it again later.
You can use --no-cache to force this, but I wouldn't do this every build as just wastes time, effort, and computing power.
Here are the official docs to this feature: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache

can use
docker build --no-cache

How to install particular go command in docker container?

I want to use go mod graph inside the docker container to generate dependency graph in go repository. I only use that command and not use the other go functionalities. I have try the other tools like godegraph, gomod, dept and nothing can do better than go mod graph . Currently I installed all the go functionalities and add about 400MB of my docker image size.
Question : Can I reduce the size of the docker container by installing spesific command in golang ? Or can I get the binary of go mod graph so its can reduce the size of the container?

If you don't need Go when executing your image (docker run), but only need it for building your image (docker build), then, as commented, use a multi-stage build.
Here is an example: "Create lean Docker images using the Builder Pattern" from Mike Kaperys:
Each FROM instruction defines a new base image for the following instructions, and begins a new build stage.
Before multi-stage builds it was possible to to achieve the same effect, but the process required 2 Dockerfiles.
Using multi-stage builds allows you to selectively copy files between build stages — this is the basis of the builder pattern.
See kaperys/blog/docker-builder-pattern
FROM golang:1.16
COPY . /go/src/github.com/kaperys/blog/docker-builder-pattern
WORKDIR /go/src/github.com/kaperys/blog/docker-builder-pattern
RUN go get && CGO_ENABLED=0 GOOS=linux go build -o server .
FROM scratch
LABEL maintainer="Mike Kaperys <mike#kaperys.io>"
COPY --from=0 /go/src/github.com/kaperys/blog/docker-builder-pattern/server /opt/kaperys/vision/server
COPY --from=0 /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
ADD html/ /opt/kaperys/vision/html
EXPOSE 8080
WORKDIR /opt/kaperys/vision
ENTRYPOINT [ "./server" ]
That way, you start from an image where Go is already installed (included go mod), then you build your actual image from any other one you want (including, in this example, "scratch").

Cache Cargo dependencies in a Docker volume

I'm building a Rust program in Docker (rust:1.33.0).
Every time code changes, it re-compiles (good), which also re-downloads all dependencies (bad).
I thought I could cache dependencies by adding VOLUME ["/usr/local/cargo"]. edit I've also tried moving this dir with CARGO_HOME without luck.
I thought that making this a volume would persist the downloaded dependencies, which appear to be in this directory.
But it didn't work, they are still downloaded every time. Why?
Dockerfile
FROM rust:1.33.0
VOLUME ["/output", "/usr/local/cargo"]
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
COPY src/ ./src/
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Built with just docker build ..
Cargo.toml
[package]
name = "mwe"
version = "0.1.0"
[dependencies]
log = { version = "0.4.6" }
Code: just hello world
Output of second run after changing main.rs:
...
Step 4/6 : COPY Cargo.toml .
---> Using cache
---> 97f180cb6ce2
Step 5/6 : COPY src/ ./src/
---> 835be1ea0541
Step 6/6 : RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
---> Running in 551299a42907
Updating crates.io index
Downloading crates ...
Downloaded log v0.4.6
Downloaded cfg-if v0.1.6
Compiling cfg-if v0.1.6
Compiling log v0.4.6
Compiling mwe v0.1.0 (/)
Finished dev [unoptimized + debuginfo] target(s) in 17.43s
Removing intermediate container 551299a42907
---> e4626da13204
Successfully built e4626da13204

A volume inside the Dockerfile is counter-productive here. That would mount an anonymous volume at each build step, and again when you run the container. The volume during each build step is discarded after that step completes, which means you would need to download the entire contents again for any other step needing those dependencies.
The standard model for this is to copy your dependency specification, run the dependency download, copy your code, and then compile or run your code, in 4 separate steps. That lets docker cache the layers in an efficient manner. I'm not familiar with rust or cargo specifically, but I believe that would look like:
FROM rust:1.33.0
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
RUN cargo fetch # this should download dependencies
COPY src/ ./src/
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Another option is to turn on some experimental features with BuildKit (available in 18.09, released 2018-11-08) so that docker saves these dependencies in what is similar to a named volume for your build. The directory can be reused across builds, but never gets added to the image itself, making it useful for things like a download cache.
# syntax=docker/dockerfile:experimental
FROM rust:1.33.0
VOLUME ["/output", "/usr/local/cargo"]
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
COPY src/ ./src/
RUN --mount=type=cache,target=/root/.cargo \
["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
Note that the above assumes cargo is caching files in /root/.cargo. You'd need to verify this and adjust as appropriate. I also haven't mixed the mount syntax with a json exec syntax to know if that part works. You can read more about the BuildKit experimental features here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
Turning on BuildKit from 18.09 and newer versions is as easy as export DOCKER_BUILDKIT=1 and then running your build from that shell.

I would say, the nicer solution would be to resort to docker multi-stage build as pointed here and there
This way you can create yourself a first image, that would build both your application and your dependencies, then use, only, in the second image, the dependency folder from the first one
This is inspired by both your comment on #Jack Gore's answer and the two issue comments linked here above.
FROM rust:1.33.0 as dependencies
WORKDIR /usr/src/app
COPY Cargo.toml .
RUN rustup default nightly-2019-01-29 && \
mkdir -p src && \
echo "fn main() {}" > src/main.rs && \
cargo build -Z unstable-options --out-dir /output
FROM rust:1.33.0 as application
# Those are the lines instructing this image to reuse the files
# from the previous image that was aliased as "dependencies"
COPY --from=dependencies /usr/src/app/Cargo.toml .
COPY --from=dependencies /usr/local/cargo /usr/local/cargo
COPY src/ src/
VOLUME /output
RUN rustup default nightly-2019-01-29 && \
cargo build -Z unstable-options --out-dir /output
PS: having only one run will reduce the number of layers you generate; more info here

Here's an overview of the possibilities. (Scroll down for my original answer.)
Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards remove the fake source and add the real ones. [Caches dependencies, but several fake files with workspaces].
Add Cargo files, create fake main.rs/lib.rs, then compile dependencies. Afterwards create a new layer with the dependencies and continue from there. [Similar to above].
Externally mount a volume for the cache dir. [Caches everything, relies on caller to pass --mount].
Use RUN --mount=type=cache,target=/the/path cargo build in the Dockerfile in new Docker versions. [Caches everything, seems like a good way, but currently too new to work for me. Executable not part of image. Edit: See here for a solution.]
Run sccache in another container or on the host, then connect to that during the build process. See this comment in Cargo issue 2644.
Use cargo-build-deps. [Might work for some, but does not support Cargo workspaces (in 2019)].
Wait for Cargo issue 2644. [There's willingness to add this to Cargo, but no concrete solution yet].
Using VOLUME ["/the/path"] in the Dockerfile does NOT work, this is per-layer (per command) only.
Note: one can set CARGO_HOME and ENV CARGO_TARGET_DIR in the Dockerfile to control where download cache and compiled output goes.
Also note: cargo fetch can at least cache downloading of dependencies, although not compiling.
Cargo workspaces suffer from having to manually add each Cargo file, and for some solutions, having to generate a dozen fake main.rs/lib.rs. For projects with a single Cargo file, the solutions work better.
I've got caching to work for my particular case by adding
ENV CARGO_HOME /code/dockerout/cargo
ENV CARGO_TARGET_DIR /code/dockerout/target
Where /code is the directory where I mount my code.
This is externally mounted, not from the Dockerfile.
EDIT1: I was confused why this worked, but #b.enoit.be and #BMitch cleared up that it's because volumes declared inside the Dockerfile only live for one layer (one command).

You do not need to use an explicit Docker volume to cache your dependencies. Docker will automatically cache the different "layers" of your image. Basically, each command in the Dockerfile corresponds to a layer of the image. The problem you are facing is based on how Docker image layer caching works.
The rules that Docker follows for image layer caching are listed in the official documentation:
Starting with a parent image that is already in the cache, the next
instruction is compared against all child images derived from that
base image to see if one of them was built using the exact same
instruction. If not, the cache is invalidated.
In most cases, simply comparing the instruction in the Dockerfile with
one of the child images is sufficient. However, certain instructions
require more examination and explanation.
For the ADD and COPY instructions, the contents of the file(s) in the
image are examined and a checksum is calculated for each file. The
last-modified and last-accessed times of the file(s) are not
considered in these checksums. During the cache lookup, the checksum
is compared against the checksum in the existing images. If anything
has changed in the file(s), such as the contents and metadata, then
the cache is invalidated.
Aside from the ADD and COPY commands, cache checking does not look at
the files in the container to determine a cache match. For example,
when processing a RUN apt-get -y update command the files updated in
the container are not examined to determine if a cache hit exists. In
that case just the command string itself is used to find a match.
Once the cache is invalidated, all subsequent Dockerfile commands
generate new images and the cache is not used.
So the problem is with the positioning of the command COPY src/ ./src/ in the Dockerfile. Whenever there is a change in one of your source files, the cache will be invalidated and all subsequent commands will not use the cache. Therefore your cargo build command will not use the Docker cache.
To solve your problem it will be as simple as reordering the commands in your Docker file, to this:
FROM rust:1.33.0
RUN rustup default nightly-2019-01-29
COPY Cargo.toml .
RUN ["cargo", "build", "-Z", "unstable-options", "--out-dir", "/output"]
COPY src/ ./src/
Doing it this way, your dependencies will only be reinstalled when there is a change in your Cargo.toml.
Hope this helps.

With the integration of BuildKit into docker, if you are able to avail yourself of the superior BuildKit backend, it's now possible to mount a cache volume during a RUN command, and IMHO, this has become the best way to cache cargo builds. The cache volume retains the data that was written to it on previous runs.
To use BuildKit, you'll mount two cache volumes, one for the cargo dir, which caches external crate sources, and one for the target dir, which caches all of your built artifacts, including external crates and the project bins and libs.
If your base image is rust, $CARGO_HOME is set to /usr/local/cargo, so your command looks like this:
RUN --mount=type=cache,target=/usr/local/cargo,from=rust,source=/usr/local/cargo \
--mount=type=cache,target=target \
cargo build
If your base image is something else, you will need to change the /usr/local/cargo bit to whatever is the value of $CARGO_HOME, or else add a ENV CARGO_HOME=/usr/local/cargo line. As a side note, the clever thing would be to set literally target=$CARGO_HOME and let Docker do the expansion, but it
doesn't seem to work right - expansion happens, but buildkit still doesn't persist the same volume across runs when you do this.
Other options for achieving Cargo build caching (including sccache and the cargo wharf project) are described in this github issue.

I figured out how to get this also working with cargo workspaces, using romac's fork of cargo-build-deps.
This example has my_app, and two workspaces: utils and db.
FROM rust:nightly as rust
# Cache deps
WORKDIR /app
RUN sudo chown -R rust:rust .
RUN USER=root cargo new myapp
# Install cache-deps
RUN cargo install --git https://github.com/romac/cargo-build-deps.git
WORKDIR /app/myapp
RUN mkdir -p db/src/ utils/src/
# Copy the Cargo tomls
COPY myapp/Cargo.toml myapp/Cargo.lock ./
COPY myapp/db/Cargo.toml ./db/
COPY myapp/utils/Cargo.toml ./utils/
# Cache the deps
RUN cargo build-deps
# Copy the src folders
COPY myapp/src ./src/
COPY myapp/db/src ./db/src/
COPY myapp/utils/src/ ./utils/src/
# Build for debug
RUN cargo build

I'm sure you can adjust this code for use with a Dockerfile, but I wrote a dockerized drop-in replacement for cargo that you can save to a package and run as ./cargo build --release. This just works for (most) development (uses rust:latest), but isn't set up for CI or anything.
Usage: ./cargo build, ./cargo build --release, etc
It will use the current working directory and save the cache to ./.cargo. (You can ignore the entire directory in your version control and it doesn't need to exist beforehand.)
Create a file named cargo in your project's folder, run chmod +x ./cargo on it, and place the following code in it:
#!/bin/bash
# This is a drop-in replacement for `cargo`
# that runs in a Docker container as the current user
# on the latest Rust image
# and saves all generated files to `./cargo/` and `./target/`.
#
# Be sure to make this file executable: `chmod +x ./cargo`
#
# # Examples
#
# - Running app: `./cargo run`
# - Building app: `./cargo build`
# - Building release: `./cargo build --release`
#
# # Installing globally
#
# To run `cargo` from anywhere,
# save this file to `/usr/local/bin`.
# You'll then be able to use `cargo`
# as if you had installed Rust globally.
sudo docker run \
--rm \
--user "$(id -u)":"$(id -g)" \
--mount type=bind,src="$PWD",dst=/usr/src/app \
--workdir /usr/src/app \
--env CARGO_HOME=/usr/src/app/.cargo \
rust:latest \
cargo "$#"

Artifact caching for multistage docker builds

I have a Dockerfiles like this
# build-home
FROM node:10 AS build-home
WORKDIR /usr/src/app
COPY /home/package.json /home/yarn.lock /usr/src/app/
RUN yarn install
COPY ./home ./
RUN yarn build
# build-dashboard
FROM node:10 AS build-dashboard
WORKDIR /usr/src/app
COPY /dashboard/package.json /dashboard/yarn.lock /usr/src/app/
RUN yarn install
COPY ./dashboard ./
RUN yarn build
# run
FROM nginx
EXPOSE 80
COPY nginx.conf /etc/nginx/nginx.conf
COPY --from=build-home /usr/src/app/dist /usr/share/nginx/html/home
COPY --from=build-dashboard /usr/src/app/dist /usr/share/nginx/html/dashboard
Here building two react application and then artifacts of build are put in nginx. To improve build performance, I need to cache the dist folder in the build-home andbuild-dashboard build-stages.
For this i create a volume in docker-compose.yml
...
web:
container_name: web
build:
context: ./web
volumes:
- ./web-build-cache:/usr/src/app
ports:
- 80:80
depends_on:
- api
I’ve stopped at this stage because I don’t understand how to add volume created bydocker-compose first for the build-home stage, and after adding thisvolume to the build-dashboard.
Maybe i should be create a two volumes and attach each to each of build stages, but how do this?
UPDATE:
Initial build.
Home application:
Install modules: 100.91s
Build app: 39.51s
Dashboard application:
Install modules: 100.91s
Build app: 50.38s
Overall time:
real 8m14.322s
user 0m0.560s
sys 0m0.373s
Second build (without code or dependencies change):
Home application:
Install modules: Using cache
Build app: Using cache
Dashboard application:
Install modules: Using cache
Build app: Using cache
Overall time:
real 0m2.933s
user 0m0.309s
sys 0m0.427s
Third build (with small change in code in first app):
Home application:
Install modules: Using cache
Build app: 50.04s
Dashboard application:
Install modules: Using cache
Build app: Using cache
Overall time:
real 0m58.216s
user 0m0.340s
sys 0m0.445s
Initial build of home application without Docker: 89.69s
real 1m30.111s
user 2m6.148s
sys 2m17.094s
Second build of home application without Docker, the dist folder exists on disk (without code or dependencies change): 18.16s
real 0m18.594s
user 0m20.940s
sys 0m2.155s
Third build of home application without Docker, the dist folder exists on disk (with small change in code): 20.44s
real 0m20.886s
user 0m22.472s
sys 0m2.607s
In the docker-container, the third builds of the application is 2 times longer. This shows that if the result of the first build is on disk, other builds completed faster. In the docker container, all assemblies after the first are executed as long as the first, because there is no dist folder.

If you're using multi-stage builds then there's a problem with docker cache. The final image don't have layers with build steps. By using --target and --cache-from together you can save this layers and reuse them in rebuild.
You need something like
docker build \
--target build-home \
--cache-from build-home:latest \
-t build-home:latest
docker build \
--target build-dashboard \
--cache-from build-dashboard:latest \
-t build-dashboard:latest
docker build \
--cache-from build-dashboard:latest \
--cache-from build-home:latest \
-t my-image:latest \
You can find more details at
https://andrewlock.net/caching-docker-layers-on-serverless-build-hosts-with-multi-stage-builds---target,-and---cache-from/

You can't use volumes during image building, and in any case Docker already does the caching you're asking for. If you leave your Dockerfile as-is and don't try to add your code in volumes in the docker-compose.yml, you should get caching of the built Javascript files access rebuilds as you expect.
When you run docker build, Docker looks at each step in turn. If the input to the step hasn't changed, the step itself hasn't changed, and any files that are being added haven't changed, then Docker will just reuse the result of running that step previously. In your Dockerfile, if you only change the nginx config, it will skip over all of the Javascript build steps and reuse their result from the previous time around.
(The other relevant technique, which you already have, is to build applications in two steps: first copy in files like package.json and yarn.lock that name dependencies, and install dependencies; then copy in and build your application. Since the "install dependencies" step is frequently time-consuming and the dependencies change relatively infrequently, you want to encourage Docker to reuse the last build's node_modules directory.)

Speeding up Go builds with go 1.10 build cache in Docker containers

I have a Go project with a large vendor/ directory which almost never changes.
I am trying to use the new go 1.10 build cache feature to speed up my builds in Docker engine locally.
Avoiding recompilation of my vendor/ directory would be enough optimization. So I'm trying to do Go equivalent of this common Dockerfile pattern for Python:
FROM python
COPY requirements.txt . # <-- copy your dependency list
RUN pip install -r requirements.txt # <-- install dependencies
COPY ./src ... # <-- your actual code (everything above is cached)
Similarly I tried:
FROM golang:1.10-alpine
COPY ./vendor ./src/myproject/vendor
RUN go build -v myproject/vendor/... # <-- pre-build & cache "vendor/"
COPY . ./src/myproject
However this is giving "cannot find package" error (likely because you cannot build stuff in vendor/ directly normally either).
Has anyone been able to figure this out?

Here's something that works for me:
FROM golang:1.10-alpine
WORKDIR /usr/local/go/src/github.com/myorg/myproject/
COPY vendor vendor
RUN find vendor -maxdepth 2 -mindepth 2 -type d -exec sh -c 'go install -i github.com/myorg/myproject/{}/... || true' \;
COPY main.go .
RUN go build main.go
It makes sure the vendored libraries are installed first. As long as you don't change a library, you're good.

Just use go install -i ./vendor/....
Consider the following Dockerfile:
FROM golang:1.10-alpine
ARG APP
ENV PTH $GOPATH/src/$APP
WORKDIR $PTH
# Pre-compile vendors.
COPY vendor/ $PTH/vendor/
RUN go install -i ./vendor/...
ADD . $PTH/
# Display time taken and the list of the packages being compiled.
RUN time go build -v
You can test it doing something like:
docker build -t test --build-arg APP=$(go list .) .
On the project I am working on, without pre-compile, it takes ~12sec with 90+ package each time, after, it take ~1.2s with only 3 (only the local ones).
If you still have "cannot find package", it means there are missing vendors. Re-run dep ensure should fix it.
An other tip, unrelated to Go is to have your .dockerignore start with *. i.e. ignore everything and then whitelist what you need.

As of Go 1.11, you would use go modules to accomplish this;
FROM golang
WORKDIR /src/myproject
COPY go.mod go.sum ./ # <-- copy your dependency list
RUN go mod download # <-- install dependencies
COPY . . # <-- your actual code (everything above is cached)
As long as go.sum doesn't change, the image layer created by go mod download shall be reused from cache.

Using go mod download alone didn't do the trick for me.
What ended up working for me was to leverage buildkit to mount Go's build cache.
This was the article that lead me to this feature.
My Dockerfile looks something like this
FROM golang AS build
WORKDIR /go/src/app
COPY go.mod ./
COPY go.sum ./
RUN go mod download
COPY . ./
RUN --mount=type=cache,target=/go/pkg/mod \
--mount=type=cache,target=/root/.cache/go-build \
go build -o /my-happy-app
For local development this did quite a different for me, changing buildtimes from 1.5 minutes to 3 seconds.
I'm using colima (to run Docker on my Mac. Just mentioning this since I'm not using Docker for Mac, but it should work just the same) with
colima version 0.4.2
git commit: f112f336d05926d62eb6134ee3d00f206560493b
runtime: docker
arch: x86_64
client: v20.10.9
server: v20.10.11
And golang 1.17, so this is not a 1.10 specific answer.
I took this setup from docker compose's cli dockerfile here.
Your mileage may vary.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart