Is it possible to cache multi-stage docker builds? - docker

I recently switched to multi-stage docker builds, and it doesn't appear that there's any caching on intermediate builds. I'm not sure whether this is a docker limitation, a feature that simply isn't available, or whether I'm doing something wrong.
I am pulling down the final build and passing --cache-from at the start of the new build, but it always runs the full build.

This appears to be a limitation of docker itself and is described under this issue - https://github.com/moby/moby/issues/34715
The workaround, sketched below, is to:
Build the intermediate stages with a --target
Push the intermediate images to the registry
Build the final image with a --target and use multiple --cache-from paths, listing all the intermediate images and the final image
Push the final image to the registry
For subsequent builds, pull the intermediate + final images down from the registry first
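A hedged sketch of that workaround (the registry, image names, and the builder stage name are illustrative; it assumes a Dockerfile stage declared with FROM ... AS builder):
# Pull any previously pushed images so --cache-from can use them
# (ignore failures on the very first run)
docker pull registry.example.com/myapp/builder:latest || true
docker pull registry.example.com/myapp/app:latest || true
# Build and push the intermediate stage explicitly
docker build --target builder \
--cache-from registry.example.com/myapp/builder:latest \
-t registry.example.com/myapp/builder:latest .
docker push registry.example.com/myapp/builder:latest
# Build the final image, listing the intermediate and final images as cache sources
docker build \
--cache-from registry.example.com/myapp/builder:latest \
--cache-from registry.example.com/myapp/app:latest \
-t registry.example.com/myapp/app:latest .
docker push registry.example.com/myapp/app:latest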

Since the previous answer was posted, there is now a solution using the BuildKit backend: https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources
This involves passing the argument --build-arg BUILDKIT_INLINE_CACHE=1 to your docker build command. You will also need to ensure BuildKit is being used by setting the environment variable DOCKER_BUILDKIT=1 (on Linux; I think BuildKit might be the default backend on Windows when using recent versions of Docker Desktop). A complete command line solution for CI might look something like:
export DOCKER_BUILDKIT=1
# Use cache from remote repository, tag as latest, keep cache metadata
docker build -t yourname/yourapp:latest \
--cache-from yourname/yourapp:latest \
--build-arg BUILDKIT_INLINE_CACHE=1 .
# Push new build up to remote repository replacing latest
docker push yourname/yourapp:latest
Some of the other commenters are asking about docker-compose. It works for this too, although you need to additionally set the environment variable COMPOSE_DOCKER_CLI_BUILD=1 to ensure docker-compose uses the docker CLI (with BuildKit thanks to DOCKER_BUILDKIT=1). You can then set BUILDKIT_INLINE_CACHE: 1 in the args: section of the build: section of your YAML file to ensure the required --build-arg is set.
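For example, a hedged docker-compose sketch under those assumptions (the service and image names are illustrative):
# docker-compose.yml (fragment)
services:
  app:
    image: yourname/yourapp:latest
    build:
      context: .
      cache_from:
        - yourname/yourapp:latest
      args:
        BUILDKIT_INLINE_CACHE: 1
# then build with both variables set
export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1
docker-compose build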

I'd like to add another important point to the answer above.
--build-arg BUILDKIT_INLINE_CACHE=1 writes cache metadata only for the layers of the final stage (the min cache mode), so it only helps when nothing in the earlier stages has changed.
To cache the layers of the whole build, replace it with the --cache-to flag using mode=max. Note that the inline exporter only supports the min mode, so mode=max needs a different cache exporter such as the registry one (available with docker buildx build). See the documentation.
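A minimal sketch of that approach (buildx required; the buildcache ref is just an example name):
docker buildx build \
--cache-from type=registry,ref=yourname/yourapp:buildcache \
--cache-to type=registry,ref=yourname/yourapp:buildcache,mode=max \
-t yourname/yourapp:latest --push .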

Related

docker doesn't generate a new image from docker build

I'm on a low-cost project where we push only the latest image to our container registry (DigitalOcean).
But every time, after running:
docker build .
it generates the same digest every time.
This is my build script:
docker build .
docker tag {image}:latest registry.digitalocean.com/{company}/{image}:latest;
docker push registry.digitalocean.com/{company}/{image}
I tried:
BUILD_VERSION=$(date '+%s');
docker build -t {image}:"$BUILD_VERSION" -t {image}:latest .
docker tag {image}:latest registry.digitalocean.com/{company}/{image}:latest;
docker push registry.digitalocean.com/{company}/{image}
but it didn't work.
Editing my answer: what David said is correct - the push without the tag should pick up the latest tag.
If you share what you have in your local repository and the output of the above commands, it will shed more light on your problem.
Edit 2:
I think I have figured out why:
it generates the same digest every time.
This means that although you are running docker build, there has been no change to the underlying artifacts being packaged into the image, and hence it results in the same digest.
Sometimes layers are cached but there are changes that aren't detected, so you can delete the image or use docker system prune to force-clear the cache.
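For example (a hedged sketch; {image} is the placeholder from the question):
docker builder prune -f                      # clear the local build cache
docker build --no-cache -t {image}:latest .  # rebuild every layer from scratch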

Does the order of --cache-from arguments matter when building an image with Docker Buildkit?

Suppose I am building an image using Docker Buildkit. My image is from a multistage Dockerfile, like so:
FROM node:12 AS some-expensive-base-image
...
FROM some-expensive-base-image AS my-app
...
I am now trying to build both images. Suppose that I push these to Docker Hub. If I were to use Docker Buildkit's external caching feature, then I would want to try to save build time on my CI pipeline by pulling in the remote some-expensive-base-image:latest image as the cache when building the some-expensive-base-image target. And I would want to pull in both the just-built some-expensive-base-image image and the remote my-app:latest image as the caches for the latter image. I believe that I need both in order to prevent the steps of some-expensive-base-image from being rebuilt, since...well...they are expensive.
This is what my build script looks like:
export DOCKER_BUILDKIT=1
docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from some-expensive-base-image:latest --target some-expensive-base-image -t some-expensive-base-image:edge .
docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from some-expensive-base-image:edge --cache-from my-app:latest --target my-app -t my-app:edge .
My question: Does the order of the --cache-from arguments matter for the second docker build?
I have been getting inconsistent results on my CI pipeline for this build. There are cache misses when building that latter image, even though there haven't been any code changes that would have caused cache busting. The cache manifest can be pulled without issue. Sometimes the cache image is pulled, but other times all steps of that latter target need to be rerun. I don't know why.
By chance, should I instead try to docker pull both images before running the docker build commands in my script?
Also, I know that I referred to Docker Hub in my example, but in real life, my application uses AWS ECR for its remote Docker repository. Would that matter for proper Buildkit functionality?
Yes, the order of --cache-from matters!
See the explanation on Github from the person who implemented the feature, quoting here:
When using multiple --cache-from they are checked for a cache hit in the order that user specified. If one of the images produces a cache hit for a command only that image is used for the rest of the build.
I've had similar problems in the past; you might find it useful to check this answer, where I've shared about using Docker cache in CI.

What's the purpose of "docker build --pull"?

When building a docker image you normally use docker build ..
But I've found that you can specify --pull, so the whole command would look like docker build --pull .
I'm not sure about the purpose of --pull. Docker's official documentation says "Always attempt to pull a newer version of the image", and I'm not sure what this means in this context.
You use docker build to build a new image, and eventually publish it somewhere to a container registry. Why would you want to pull something that doesn't exist yet?
It will pull the latest version of any base image(s) instead of reusing whatever you already have tagged locally.
Take for instance an image based on a moving tag (such as ubuntu:bionic). Upstream makes changes and rebuilds this periodically, but you might have a months-old image locally. Docker will happily build against the old base; --pull will pull as a side effect, so you build against the latest base image.
It's usually a best practice to use it to get upstream security fixes as soon as possible (instead of using stale, potentially vulnerable images), though you have to trade that off against breaking changes (and if you use immutable tags then it doesn't make a difference).
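Roughly speaking, --pull is equivalent to manually refreshing each FROM image before the build (the image names here are just examples):
docker pull ubuntu:bionic   # what --pull does implicitly for each base image
docker build -t myimage .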
Docker allows passing the --pull flag to docker build, e.g. docker build . --pull -t myimage. This is the recommended way to ensure that the build always uses the latest container image despite the version available locally. However one additional point worth mentioning:
To ensure that your build is completely rebuilt, including checking the base image for updates, use the following options when building:
--no-cache - This will force rebuilding of layers already available.
The full command will therefore look like this:
docker build . --pull --no-cache --tag myimage:version
The same options are available for docker-compose:
docker-compose build --no-cache --pull
Simple answer: docker build builds an image from a local Dockerfile, while docker pull pulls an existing image from a registry such as Docker Hub. If you use docker build without a Dockerfile, it throws an error.
When you specify --pull, docker will try to download the newest version of the base image (if any).
Basically, if you add --pull, it will try to pull the newest version each time it is run.

Labelling images in docker

I've got a jenkins server monitoring a git repo and building a docker image on code change. The .git directory is ignored as part of the build, but I want to associate the git commit hash with the image so that I know exactly what version of the code was used to make it and check whether the image is up to date.
The obvious solution is to tag the image with something like "application-name-branch-name:commit-hash", but for many develop branches I only want to keep the last good build, and adding more tags will make cleaning up old builds harder (rather than using the jenkins build number as the image is built, then retagging to :latest and untagging the build number)
The other possibility is labels, but while these looked promising initially, they proved more complicated in practice.
The only way I can see to apply a label directly to an image is in the Dockerfile, which cannot use the build environment variables, so I'd need to use some kind of templating to produce a custom Dockerfile.
The other way to apply a label is to start up a container from the image with some simple command (e.g. bash) and passing in the labels as docker run arguments. The container can then be committed as the new image. This has the unfortunate side effect of making the image's default command whatever was used with the labelling container (so bash in this case) rather than whatever was in the original Dockerfile. For my application I cannot use the actual command, as it will start changing the application state.
None of these seem particularly ideal - has anyone else found a better way of doing this?
Support for this was added in docker v1.9.0, so updating your docker installation to that version would fix your problem if that is OK with you.
Usage is described in the pull-request below:
https://github.com/docker/docker/pull/15182
As an example, take the following Dockerfile file:
FROM busybox
ARG GIT_COMMIT=unknown
LABEL git-commit=$GIT_COMMIT
and build it into an image named test as anyone would do naïvely:
docker build -t test .
Then inspect the test image to check what value ended up for the git-commit label:
docker inspect -f '{{index .ContainerConfig.Labels "git-commit"}}' test
unknown
Now, build the image again, but this time using the --build-arg option:
docker build -t test --build-arg GIT_COMMIT=0123456789abcdef .
Then inspect the test image to check what value ended up for the git-commit label:
docker inspect -f '{{index .ContainerConfig.Labels "git-commit"}}' test
0123456789abcdef
References:
Docker build command documentation for the --build-arg option
Dockerfile reference for the ARG directive
Dockerfile reference for the LABEL directive
You can specify a label on the command line when creating your image. So you would write something like
docker build -t myproject --label "myproject.version=githash" .
Instead of hard-coding the version, you can also get it directly from git:
docker build -t myproject --label "myproject.version=`git describe`" .
To read out the label from your images you can use docker inspect with a format string:
docker inspect -f '{{index .Config.Labels "myproject.version"}}' myproject
If you are using docker-compose, you could add the following to the build section:
labels:
git-commit-hash: ${COMMIT_HASH}
where COMMIT_HASH is your environment variable, which holds the commit hash.
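For context, a minimal docker-compose.yml build section might look like this (the service name is illustrative):
services:
  app:
    build:
      context: .
      labels:
        git-commit-hash: ${COMMIT_HASH}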

Pull docker images from a private repository during docker build?

Is there any way of pulling images from a private registry during a docker build instead of docker hub?
I deployed a private registry and I would like to be able to avoid naming its specific ip:port in the Dockerfile's FROM instruction. I was expecting a docker build option or a docker environment variable to change the default registry.
The image name should include the FQDN of the registry host.
So if you want to FROM <some private image> you must specify it as FROM registry_host:5000/foo/bar
In the future this won't be a requirement, but unfortunately for now it is.
I was facing the same issue in 2019. I solved this using arguments (ARG).
https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
Arguments allow you to set optional parameters (with defaults) that can be used in your FROM line.
Dockerfile-project-dev
ARG REPO_LOCATION=privaterepo.company.net/
ARG BASE_VERSION=latest
FROM ${REPO_LOCATION}project/base:${BASE_VERSION}
...
For my use-case I normally want to pull from the private repo, but if I'm working on the Dockerfiles I may want to be able to build from an image on my own machine, without having to modify the FROM line in my Dockerfile. To tell Docker to search my local machine for the image at build time I would do this:
docker build -t project/dev:latest -f ./Dockerfile-project-dev --build-arg REPO_LOCATION='' .
The docker folks generally want to ensure that if you run docker pull foo/bar you'll get the same thing (i.e., the foo/bar image from Docker Hub) regardless of your local environment.
This means that there are no options available to have Docker use anything else without an explicit hostname/port.
