What causes a cache invalidation when building a Dockerfile?

I've been reading the docs page Best practices for writing Dockerfiles. I came across a small inaccuracy (IMHO) whose meaning only became clear after reading further:
Using apt-get update alone in a RUN statement causes caching issues
and subsequent apt-get install instructions fail.
Why would they fail, I wondered. Later came an explanation of what they meant by "fail":
Because the apt-get update is not run, your build can potentially get
an outdated version of the curl and nginx packages.
However, for the following I still cannot understand what they mean by "If not, the cache is invalidated.":
Starting with a parent image that is already in the cache, the next
instruction is compared against all child images derived from that
base image to see if one of them was built using the exact same
instruction. If not, the cache is invalidated.
That part is mentioned in some answers on SO, e.g. How does Docker know when to use the cache during a build and when not? The concept of cache invalidation as a whole is clear to me; I've read the following:
When does Docker image cache invalidation occur?
Which algorithm Docker uses for invalidate cache?
But what is the meaning of "if not"? At first I was sure the phrase meant "if no such image is found". That would be overkill: invalidating a cache which may be useful later for other builds. And indeed the cache is not invalidated when no image is found, as I saw when I tried the following:
$ docker build -t alpine:test1 - <<HITTT
> FROM apline
> RUN echo "test1"
> RUN echo "test1-2"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM apline
pull access denied for apline, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
(base) nb0408:docker a.martianov$ docker build -t alpine:test1 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-2"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
---> Running in 928453d33c7c
test1
Removing intermediate container 928453d33c7c
---> 0e93df31058d
Step 3/3 : RUN echo "test1-2"
---> Running in b068bbaf8a75
test1-2
Removing intermediate container b068bbaf8a75
---> daeaef910f21
Successfully built daeaef910f21
Successfully tagged alpine:test1
$ docker build -t alpine:test1-1 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
---> Using cache
---> 0e93df31058d
Step 3/3 : RUN echo "test1-3"
---> Running in 74aa60a78ae1
test1-3
Removing intermediate container 74aa60a78ae1
---> 266bcc6933a8
Successfully built 266bcc6933a8
Successfully tagged alpine:test1-1
$ docker build -t alpine:test1-2 - <<HITTT
> FROM alpine
> RUN "test2"
> RUN
(base) nb0408:docker a.martianov$ docker build -t alpine:test2 - <<HITTT
> FROM alpine
> RUN echo "test2"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test2"
---> Running in 1a058ddf901c
test2
Removing intermediate container 1a058ddf901c
---> cdc31ac27a45
Step 3/3 : RUN echo "test1-3"
---> Running in 96ddd5b0f3bf
test1-3
Removing intermediate container 96ddd5b0f3bf
---> 7d8b901f3939
Successfully built 7d8b901f3939
Successfully tagged alpine:test2
$ docker build -t alpine:test1-3 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
---> Using cache
---> 0e93df31058d
Step 3/3 : RUN echo "test1-3"
---> Using cache
---> 266bcc6933a8
Successfully built 266bcc6933a8
Successfully tagged alpine:test1-3
The cache was used again for the last build. What do the docs mean by "if not"?

Let's focus on your original problem (regarding apt-get update) to make things easier. The following example is not based on any best practices. It just illustrates the point you are trying to understand.
Suppose you have the following Dockerfile:
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y nginx
You build a first image using docker build -t myimage:latest .
What happens is:
The ubuntu image is pulled if it does not exist
A layer is created and cached to run apt-get update
A layer is created and cached to run apt-get install -y nginx
Now suppose you modify your Dockerfile to be
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y nginx openssl
and you run a build again with the same command as before. What happens is:
There is already an ubuntu image locally, so it will not be pulled (unless you force it with --pull)
A layer was already created with the command apt-get update against the existing local image, so the cached one is used
The next command has changed, so a new layer is created to install nginx and openssl. Since the apt database was created in the preceding layer and taken from the cache, if a new nginx and/or openssl version has been released since then, you will not see it and you will install the outdated ones.
Does this help you to grasp the concept of cached layers?
In this particular example, the best handling is to do everything in a single layer, making sure you clean up after yourself:
FROM ubuntu:18.04
RUN apt-get update \
&& apt-get install -y nginx openssl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

The line would be better phrased as:
If not, there is a cache miss, and the cache is not used for this build step or any following build step in this stage of the Dockerfile.
That gets a bit verbose because a multi-stage Dockerfile can fail to find a cache match in one stage and then find a match in another stage. Different builds can all use the cache. The cache is "invalidated" only for a specific build process; the cache itself is not removed from the Docker host, and it remains available for future builds.
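To make the "miss for this build only" idea concrete, here is a toy model in POSIX shell. It is only a sketch: the names CACHE, COUNT, lookup and build are made up for illustration, and the real builder keys layers on more than the parent and the instruction text. It replays the four builds from the question and shows the same hit/miss pattern.

```shell
# Toy model of the build cache: a layer is looked up by (parent layer, instruction).
# CACHE, COUNT, lookup and build are hypothetical names, not Docker internals.
CACHE=""   # one "parent|instruction=>layer" entry per line
COUNT=0    # number of layers created so far

lookup() {  # print the cached layer id for key $1, or nothing on a miss
  printf '%s\n' "$CACHE" | while IFS= read -r entry; do
    case "$entry" in
      "$1=>"*) printf '%s\n' "${entry#"$1=>"}"; break ;;
    esac
  done
}

build() {   # each argument is one Dockerfile instruction after FROM
  parent="base"
  use_cache=1
  for instr in "$@"; do
    key="$parent|$instr"
    layer=""
    [ "$use_cache" -eq 1 ] && layer=$(lookup "$key")
    if [ -n "$layer" ]; then
      echo "HIT  $instr"
    else
      use_cache=0   # cache miss: stop using the cache for the rest of THIS build
      COUNT=$((COUNT + 1))
      layer="layer$COUNT"
      # Only append: entries from earlier builds stay available to later builds.
      CACHE=$(printf '%s\n%s' "$CACHE" "$key=>$layer")
      echo "MISS $instr"
    fi
    parent="$layer"
  done
}

build 'RUN echo test1' 'RUN echo test1-2'   # MISS, MISS
build 'RUN echo test1' 'RUN echo test1-3'   # HIT,  MISS
build 'RUN echo test2' 'RUN echo test1-3'   # MISS, MISS
build 'RUN echo test1' 'RUN echo test1-3'   # HIT,  HIT
```

Nothing is ever deleted from CACHE in this model, which is why the last build can reuse layers created by the first two even though the third build missed on every step.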

Related

Changed file is not picked up by docker

I have a dockerfile looking like this
FROM debian:bullseye-slim
RUN apt-get update; \
apt-get install -y libssl-dev; \
apt-get clean
COPY ./runner /runner
CMD /runner
I build the image like this:
docker build -f RunnerDockerfile -t $(RUNNER_IMG) target/release
Even though my executable (runner) changes, it's not picked up by Docker, i.e. Docker uses the cache.
According to my understanding of the docs, if the hash of a file in a COPY statement changes, the change should be picked up by Docker.
Example run
md5sum target/release/runner
32169761853677f5f7bc03acb7bbe19b target/release/runner
Then I make a change:
md5sum target/release/runner
0075f725ce4f2cd779706b2fb9f83218 target/release/runner
When rebuilding the Docker images I get this
docker build -f RunnerDockerfile -t "runner:latest" target/release
Sending build context to Docker daemon 513.4MB
Step 1/4 : FROM debian:bullseye-slim
---> f8000d381a2c
Step 2/4 : RUN apt-get update; apt-get install -y libssl-dev; apt-get clean
---> Using cache
---> fcd7212dd477
Step 3/4 : COPY ./runner /runner
---> Using cache
---> 1a79d547a9b5
Step 4/4 : CMD /runner
---> Using cache
---> a45a0fd57dd9
Successfully built a45a0fd57dd9
Successfully tagged runner:latest
I.e. it's using the cache for the COPY.
Am I doing something wrong or is this a bug?

Target dependency is output of a command

I've got a Makefile with a docker recipe which builds a docker image by doing make docker. The recipe looks like this:
# Build the docker file
docker: .setup ${GOBLDDIR}/docker-image
I wanted to set this up so that the docker image isn't rebuilt if everything is up-to-date, so I set up the ${GOBLDDIR}/docker-image dependency. That dependency is just a text file with the docker image ID. Here's that recipe, which actually does the docker build:
# DO THE DOCKER BUILD
${GOBLDDIR}/docker-image: ${GOBLDDIR} ${GOSRCFILES} go.mod Makefile Dockerfile
# Vendor our dependencies so we don't have to download them in the docker container
go mod vendor
# Wherever the config lives, we need it to be in the Docker build context
mkdir -p ./.docker-files
cp ${TFOUT} ./.docker-files/config.json
# The go app is actually compiled within this dockerfile
docker build . -t ${GOCMD} --build-arg AWS_PROFILE=${AWS_PROFILE} --build-arg SVC=${GOCMD} --build-arg VERSTR=${VERSTR}
docker images ${GOCMD} -q > ${GOBLDDIR}/docker-image
docker tag ${GOCMD}:latest ${GOCMD}:dev
docker tag ${GOCMD}:latest ${GOCMD}:${VERSTR}
# Clean up the crap we created just to build the Dockerfile
rm -rf vendor/
rm -rf ./.docker-files
Maybe this is an insane design - feels that way to me. There are certainly times when it doesn't work. I'm open to other ideas.
In particular, is there a way to make a dependency not be a file, but be the result of a command? For example, something like: docker inspect -f '{{ .Created }}' MY_IMAGE_NAME?

Your approach is generally a good one. Rather than checking the creation timestamp, you can check whether the image is actually the one that you previously built (someone else may have built a newer image with different contents that no longer reflects your repository).
In general, make decides whether to remake a target by comparing the timestamps of its dependencies, so dependencies are most commonly files. The list of dependencies can, however, be manipulated with some logic, which allows you to run arbitrary checks.
In your scenario you already store the image ID in a file. This can be used to check whether the current image ID is the same as the one we built previously. We compare the output of the same command (extracted to a variable for DRY-ness) with the stored contents when the dependencies are evaluated; if they do not match, we emit a dependency on FORCE, which is .PHONY and therefore always out of date, effectively triggering a remake of the target:
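$ cat Makefile
# Command that prints the ID of the current image (empty if the image is missing)
docker-id = docker image ls -q $(DOCKER_IMAGE)
# Depend on FORCE unless the stored ID matches the current image ID
docker-image: Dockerfile $(if $(findstring $(shell $(docker-id)),$(file <docker-image)),,FORCE)
	docker build . -t $(DOCKER_IMAGE)
	$(docker-id) > $@
.PHONY: FORCE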
Output:
# Initial image build
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Running in 6a9601225516
16517
---> 54b651cb8912
Removing intermediate container 6a9601225516
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# No files changed, image exists
$ make docker-image DOCKER_IMAGE=test
make: 'docker-image' is up to date.
# Changing Dockerfile forces rebuild
$ touch Dockerfile
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Using cache
---> 54b651cb8912
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# Mismatched id forces rebuild
$ echo foobar > docker-image
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Using cache
---> 54b651cb8912
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# Missing image forces rebuild
$ docker image rm test
Untagged: test:latest
Deleted: sha256:54b651cb8912a5d2505f1f92c8ea4ee367cdca0d3f8e6f1ebc69d0cf646ca72c
Deleted: sha256:3749695bf207f2ca0829d8faf0ef538af1e18e8c39228347f48fd6d7c149f73a
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Running in 70f8b09a92b3
16517
---> 8bdfdfff9db2
Removing intermediate container 70f8b09a92b3
Successfully built 8bdfdfff9db2
docker image ls -q test > docker-image
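The prerequisite expression in that Makefile is dense, so here is the same decision written as a small shell function. This is only a sketch of the logic: needs_force is a hypothetical name, and it mirrors $(if $(findstring CURRENT,STORED),,FORCE), where an empty CURRENT (image missing) never matches. As far as I know, reading a file with $(file <name) needs GNU make 4.2 or newer.

```shell
# needs_force CURRENT_ID STORED_CONTENTS
# Exit status 0 means "add FORCE" (rebuild), 1 means "stored ID still matches".
# Hypothetical helper that mirrors the make expression; it does not call docker itself.
needs_force() {
  [ -z "$1" ] && return 0     # image missing: findstring with an empty needle yields nothing
  case "$2" in
    *"$1"*) return 1 ;;       # stored file contains the current ID: no FORCE
    *)      return 0 ;;       # mismatched ID: FORCE
  esac
}

# Example wiring (assumes docker is installed and DOCKER_IMAGE is set):
#   if needs_force "$(docker image ls -q "$DOCKER_IMAGE")" "$(cat docker-image 2>/dev/null)"; then
#     echo "rebuild needed"
#   fi
```

The three branches correspond to the "Missing image", "No files changed" and "Mismatched id" runs in the output above.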

How to bust the cache for the FROM line of a Dockerfile

I have a Dockerfile like this:
FROM python:2.7
RUN echo "Hello World"
When I build this the first time with docker build -f Dockerfile -t test ., or build it with the --no-cache option, I get this output:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Running in 5b5b88e5ebce
Hello World
Removing intermediate container 5b5b88e5ebce
---> a23687d914c2
Successfully built a23687d914c2
My echo command executes.
If I run it again without busting the cache, I get this:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Using cache
---> a23687d914c2
Successfully built a23687d914c2
Successfully tagged test-requirements:latest
Cache is used for Step 2/2, and Hello World is not executed. I could get it to execute again by using --no-cache. However, each time, even when I am using --no-cache it uses a cached python:2.7 base image (although, unlike when the echo command is cached, it does not say ---> Using cache).
How do I bust the cache for the FROM python:2.7 line? I know I can do FROM python:latest, but that also seems to just cache whatever the latest version is the first time you build the Dockerfile.

If I understood the context correctly, you can use --pull while using docker build to get the latest base image -
$ docker build -f Dockerfile.test -t test . --pull
So using both --no-cache and --pull will create an absolutely fresh image from the Dockerfile:
$ docker build -f Dockerfile.test -t test . --pull --no-cache
Issue - https://github.com/moby/moby/issues/4238

FROM pulls an image from the registry (DockerHub in this case).
After the image is pulled to produce your build, you will see it if you run docker images.
You may remove it by running docker rmi python:2.7.

provide two docker images from the same dockerfile

I am setting an automatic build from which I would like to produce 2 images.
The use-case is in building and distributing a library:
- one image with the dependencies which will be reused for building and testing on Travis
- one image to provide the built software libs
Basically, I need to be able to push an image of the container at a certain point (before building) and one later (after building and installing).
Is this possible? I did not find anything relevant in Dockerfile docs.

You can do that using Docker multi-stage builds. Use two Dockerfiles:
Dockerfile
FROM alpine
RUN apk update && apk add gcc
RUN echo "This is a test" > /tmp/builtfile
Dockerfile-prod
FROM myapp:testing as source
FROM alpine
COPY --from=source /tmp/builtfile /tmp/builtfile
RUN cat /tmp/builtfile
build.sh
docker build -t myapp:testing .
docker build -t myapp:production -f Dockerfile-prod .
So to explain, what we do is build the image with dependencies first. Then, in our second file, Dockerfile-prod, we just include a FROM of our previously built image and copy the built file to the production image.
Truncated output from my build
vagrant@vagrant:~/so$ ./build.sh
Step 1/3 : FROM alpine
Step 2/3 : RUN apk update && apk add gcc
Step 3/3 : RUN echo "This is a test" > /tmp/builtfile
Successfully tagged myapp:testing
Step 1/4 : FROM myapp:testing as source
Step 2/4 : FROM alpine
Step 3/4 : COPY --from=source /tmp/builtfile /tmp/builtfile
Step 4/4 : RUN cat /tmp/builtfile
This is a test
Successfully tagged myapp:production
For more information refer to https://docs.docker.com/engine/userguide/eng-image/multistage-build/#name-your-build-stages
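As a variation on the two-file setup above, both images can also come from one Dockerfile by naming the stages and selecting one with docker build --target. The sketch below assumes a single Dockerfile laid out as in the comments; the stage names deps and production and the helper build_both are made up for illustration.

```shell
# Assumed single Dockerfile (stage names are hypothetical):
#   FROM alpine AS deps
#   RUN apk update && apk add gcc
#   RUN echo "This is a test" > /tmp/builtfile
#
#   FROM alpine AS production
#   COPY --from=deps /tmp/builtfile /tmp/builtfile

# build_both [context]: tag the dependency stage and the final stage separately.
build_both() {
  # Stop at the "deps" stage and tag it as the testing image.
  docker build --target deps -t myapp:testing "${1:-.}" &&
  # Build through the last stage; the "deps" steps can be served from the cache.
  docker build -t myapp:production "${1:-.}"
}

# Example: build_both .
```

The second build repeats the deps instructions, but they should be cache hits, so only the production stage does new work.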

How to force Docker for a clean build of an image

I have built a Docker image from a Dockerfile using the command below.
$ docker build -t u12_core -f u12_core .
When I am trying to rebuild it with the same command, it's using the build cache like:
Step 1 : FROM ubuntu:12.04
---> eb965dfb09d2
Step 2 : MAINTAINER Pavan Gupta <pavan.gupta@gmail.com>
---> Using cache
---> 4354ccf9dcd8
Step 3 : RUN apt-get update
---> Using cache
---> bcbca2fcf204
Step 4 : RUN apt-get install -y openjdk-7-jdk
---> Using cache
---> 103f1a261d44
Step 5 : RUN apt-get install -y openssh-server
---> Using cache
---> dde41f8d0904
Step 6 : RUN apt-get install -y git-core
---> Using cache
---> 9be002f08b6a
Step 7 : RUN apt-get install -y build-essential
---> Using cache
---> a752fd73a698
Step 8 : RUN apt-get install -y logrotate
---> Using cache
---> 93bca09b509d
Step 9 : RUN apt-get install -y lsb-release
---> Using cache
---> fd4d10cf18bc
Step 10 : RUN mkdir /var/run/sshd
---> Using cache
---> 63b4ecc39ff0
Step 11 : RUN echo 'root:root' | chpasswd
---> Using cache
---> 9532e31518a6
Step 12 : RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
---> Using cache
---> 47d1660bd544
Step 13 : RUN sed 's#session\s*required\s*pam_loginuid.so#session optional pam_loginuid.so#g' -i /etc/pam.d/sshd
---> Using cache
---> d1f97f1c52f7
Step 14 : RUN wget -O aerospike.tgz 'http://aerospike.com/download/server/latest/artifact/ubuntu12'
---> Using cache
---> bd7dde7a98b9
Step 15 : RUN tar -xvf aerospike.tgz
---> Using cache
---> 54adaa09921f
Step 16 : RUN dpkg -i aerospike-server-community-*/*.deb
---> Using cache
---> 11aba013eea5
Step 17 : EXPOSE 22 3000 3001 3002 3003
---> Using cache
---> e33aaa78a931
Step 18 : CMD /usr/sbin/sshd -D
---> Using cache
---> 25f5fe70fa84
Successfully built 25f5fe70fa84
The cache shows that aerospike is installed. However, I don't find it inside containers spawned from this image, so I want to rebuild this image without using the cache. How can I force Docker to rebuild a clean image without the cache?

There's a --no-cache option:
docker build --no-cache -t u12_core -f u12_core .
In older versions of Docker you needed to pass --no-cache=true, but this is no longer the case.

In some extreme cases, your only way around recurring build failures is by running:
docker system prune
The command will ask you for your confirmation:
WARNING! This will remove:
- all stopped containers
- all volumes not used by at least one container
- all networks not used by at least one container
- all images without at least one container associated to them
Are you sure you want to continue? [y/N]
This is of course not a direct answer to the question, but might save some lives... It did save mine.

To ensure that your image is completely rebuilt, including checking the base image for updates, use the following options when building:
--no-cache - This forces a rebuild of layers that are already available
--pull - This triggers a pull of the base image referenced by FROM, ensuring you get the latest version.
The full command will therefore look like this:
docker build --pull --no-cache --tag myimage:version .
Same options are available for docker-compose:
docker-compose build --no-cache --pull
Note that if your docker-compose file references an image, the --pull option will not actually pull the image if there is one already.
To force docker-compose to re-pull this, you can run:
docker-compose pull

The command docker build --no-cache . solved our similar problem.
Our Dockerfile was:
RUN apt-get update
RUN apt-get -y install php5-fpm
But should have been:
RUN apt-get update && apt-get -y install php5-fpm
To prevent caching the update and install separately.
See: Best practices for writing Dockerfiles

Most of the information here is correct.
Here is a compilation of it and my way of using it.
The idea is to stick to the recommended approach (build-specific, with no impact on other stored Docker objects) and to try the more radical approach (not build-specific, with an impact on other stored Docker objects) only when that is not enough.
Recommended approach :
1) Force the execution of each step/instruction in the Dockerfile :
docker build --no-cache
or with docker-compose build :
docker-compose build --no-cache
We could also combine that with the up sub-command, which recreates all containers:
docker-compose build --no-cache &&
docker-compose up -d --force-recreate
These commands do not use the cache, except for the Docker builder cache and the base image referenced by the FROM instruction.
2) Wipe the docker builder cache (if we use Buildkit we very probably need that) :
docker builder prune -af
3) If we don't want to use the cache of the parent images, we may try to delete them such as :
docker image rm -f fooParentImage
In most cases, these 3 things are perfectly enough to allow a clean build of our image.
So we should try to stick to that.
More radical approach :
In corner cases where some objects in the Docker cache still seem to be used during the build, and this is repeatable, we should try to understand the cause so that we can wipe the missing part specifically.
If we really can't find a way to rebuild from scratch, there are other ways, but it is important to remember that these generally delete much more than is required. So we should use them with caution, especially when we are not in a local/dev environment.
1) Remove all images without at least one container associated to them :
docker image prune -a
2) Remove many more things :
docker system prune -a
That says :
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
Even that heavy-handed command may not be enough, because what it removes depends on the state of the containers (running or not).
When it is not enough, I think carefully about which Docker containers could affect our build and stop them, so that the command can remove them.

With docker-compose try docker-compose up -d --build --force-recreate

I would not recommend using --no-cache in your case.
You are running a couple of installations in steps 3 to 9 (I would, by the way, prefer a one-liner), and if you don't want the overhead of re-running these steps each time you build your image, you can modify your Dockerfile with a temporary step prior to your wget instruction.
I usually do something like RUN ls . and change it to RUN ls ./ then RUN ls ./. and so on, once for each modification made to the tarball retrieved by wget.
You can of course do something like RUN echo 'test1' > test && rm test, increasing the number in 'test1' for each iteration.
It looks dirty, but as far as I know it's the most efficient way to continue benefiting from the cache system of Docker, which saves time when you have many layers...
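The same effect as the hand-edited dummy step can be had with a build argument, so the Dockerfile does not need to be touched between builds. This is a sketch under assumptions: the ARG name CACHEBUST and the helper build_refetch are made up, and the Dockerfile is assumed to be changed as shown in the comments. Per the Dockerfile reference, a changed ARG value causes a cache miss at the first instruction that uses it, so the steps before it should still come from the cache.

```shell
# Assumed change in the Dockerfile (CACHEBUST is a hypothetical ARG name),
# placed just before the step that must be re-run:
#   ARG CACHEBUST=0
#   RUN echo "cache bust: $CACHEBUST" && wget -O aerospike.tgz 'http://aerospike.com/download/server/latest/artifact/ubuntu12'

# build_refetch <tag> <dockerfile> [context]
# Passes a fresh value (seconds since the epoch) so the RUN above misses the cache.
build_refetch() {
  docker build \
    --build-arg "CACHEBUST=$(date +%s)" \
    -t "$1" -f "$2" "${3:-.}"
}

# Example, matching the command from the question:
#   build_refetch u12_core u12_core
```

Running a plain docker build without the --build-arg keeps the default value, so a normal build continues to use the cache for every step.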

You can manage the builder cache with docker builder
To clean all the cache with no prompt:
docker builder prune -af

GUI-driven approach: open the Docker Desktop tool (which usually comes with Docker):
under "Containers / Apps", stop all running instances of that image
under "Images", remove the built image (hover over the box name to get a context menu), and possibly also the underlying base image
