Target dependency is the output of a command - docker

I've got a Makefile with a docker recipe which builds a docker image by running make docker. The recipe looks like this:
# Build the docker file
docker: .setup ${GOBLDDIR}/docker-image
I wanted to set this up so that the Docker image isn't rebuilt if everything is up to date, so I added the ${GOBLDDIR}/docker-image dependency. That dependency is just a text file containing the Docker image ID. Here's that recipe, which actually does the docker build:
# DO THE DOCKER BUILD
${GOBLDDIR}/docker-image: ${GOBLDDIR} ${GOSRCFILES} go.mod Makefile Dockerfile
# Vendor our dependencies so we don't have to download them in the docker container
go mod vendor
# Wherever the config lives, we need it to be in the Docker build context
mkdir -p ./.docker-files
cp ${TFOUT} ./.docker-files/config.json
# The go app is actually compiled within this dockerfile
docker build . -t ${GOCMD} --build-arg AWS_PROFILE=${AWS_PROFILE} --build-arg SVC=${GOCMD} --build-arg VERSTR=${VERSTR}
docker images ${GOCMD} -q > ${GOBLDDIR}/docker-image
docker tag ${GOCMD}:latest ${GOCMD}:dev
docker tag ${GOCMD}:latest ${GOCMD}:${VERSTR}
# Clean up the crap we created just to build the Dockerfile
rm -rf vendor/
rm -rf ./.docker-files
Maybe this is an insane design - it feels that way to me. There are certainly times when it doesn't work. I'm open to other ideas.
In particular, is there a way to make a dependency not be a file, but be the result of a command? For example, something like: docker inspect -f '{{ .Created }}' MY_IMAGE_NAME?

Your approach is generally a good one. You can add a check, not necessarily of the creation timestamp, but of whether the image is actually the one you previously built (someone else may have built a newer image with different contents that no longer reflects your repository).
In general, make decides whether to remake a target by comparing the timestamps of its dependencies, so dependencies are most commonly files. The list of dependencies can, however, be manipulated with some logic, which allows you to run arbitrary checks.
In your scenario you already store the image ID in a file. That file can now be used to check whether the current image ID is the same one we built previously. We compare the output of the command (extracted to a variable for DRY-ness) with the stored contents when the dependencies are evaluated; if they do not match, we add a dependency on FORCE, which is .PHONY and therefore always out of date, effectively triggering a remake of the target:
$ cat Makefile
docker-id = docker image ls -q $(DOCKER_IMAGE)

docker-image: Dockerfile $(if $(findstring $(shell $(docker-id)),$(file <docker-image)),,FORCE)
	docker build . -t $(DOCKER_IMAGE)
	$(docker-id) > $@

.PHONY: FORCE
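The same pattern works with any command, not just docker. Here is a docker-free sketch you can run anywhere (the file names and the checksum command are illustrative, not from the original Makefile): the target gains a FORCE dependency whenever the current checksum of input.txt no longer matches the one stored in the stamp file. It uses $(shell cat ...) rather than $(file <...) so it also works on GNU make older than 4.2, and .RECIPEPREFIX so the recipe survives tab-mangling copy-paste:

```shell
# Set up a throwaway demo directory with an input file and a Makefile.
mkdir -p /tmp/force-demo && cd /tmp/force-demo
printf 'hello\n' > input.txt
cat > Makefile <<'EOF'
# Use '>' instead of a leading tab for recipe lines.
.RECIPEPREFIX = >
current-id = cksum input.txt | cut -d' ' -f1

# If the stored checksum does not contain the current one, depend on FORCE.
stamp: $(if $(findstring $(shell $(current-id)),$(shell cat stamp 2>/dev/null)),,FORCE)
> $(current-id) > $@
> @echo REBUILT

.PHONY: FORCE
EOF
make    # stamp is missing: it is built, prints REBUILT
make    # stored checksum matches: 'stamp' is up to date
printf 'changed\n' > input.txt
make    # checksum mismatch: FORCE triggers a rebuild, prints REBUILT
```

The mechanics are the same as in the docker version: the expensive command's identifying output is captured to a file, and a .PHONY FORCE prerequisite is injected only when that output drifts.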
Output:
# Initial image build
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Running in 6a9601225516
16517
---> 54b651cb8912
Removing intermediate container 6a9601225516
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# No files changed, image exists
$ make docker-image DOCKER_IMAGE=test
make: 'docker-image' is up to date.
# Changing Dockerfile forces rebuild
$ touch Dockerfile
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Using cache
---> 54b651cb8912
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# Mismatched id forces rebuild
$ echo foobar > docker-image
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Using cache
---> 54b651cb8912
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# Missing image forces rebuild
$ docker image rm test
Untagged: test:latest
Deleted: sha256:54b651cb8912a5d2505f1f92c8ea4ee367cdca0d3f8e6f1ebc69d0cf646ca72c
Deleted: sha256:3749695bf207f2ca0829d8faf0ef538af1e18e8c39228347f48fd6d7c149f73a
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Running in 70f8b09a92b3
16517
---> 8bdfdfff9db2
Removing intermediate container 70f8b09a92b3
Successfully built 8bdfdfff9db2
docker image ls -q test > docker-image

Related

What causes a cache invalidation when building a Dockerfile?

I've been reading the docs, Best practices for writing Dockerfiles. I encountered a small inaccuracy (IMHO) whose meaning became clear after reading further:
Using apt-get update alone in a RUN statement causes caching issues
and subsequent apt-get install instructions fail.
Why would it fail, I wondered. Later came the explanation of what they meant by "fail":
Because the apt-get update is not run, your build can potentially get
an outdated version of the curl and nginx packages.
However, in the following I still cannot understand what they mean by "If not, the cache is invalidated.":
Starting with a parent image that is already in the cache, the next
instruction is compared against all child images derived from that
base image to see if one of them was built using the exact same
instruction. If not, the cache is invalidated.
That part is mentioned in some answers on SO, e.g. How does Docker know when to use the cache during a build and when not?, and as a whole the concept of cache invalidation is clear to me; I've read the following:
When does Docker image cache invalidation occur?
Which algorithm Docker uses for invalidate cache?
But what is the meaning of "if not"? At first I was sure the phrase meant "if no such image is found". That would be overkill - invalidating a cache that may be useful later for other builds. And indeed, the cache is not invalidated when no image is found, as I verified below:
$ docker build -t alpine:test1 - <<HITTT
> FROM apline
> RUN echo "test1"
> RUN echo "test1-2"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM apline
pull access denied for apline, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
(base) nb0408:docker a.martianov$ docker build -t alpine:test1 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-2"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
---> Running in 928453d33c7c
test1
Removing intermediate container 928453d33c7c
---> 0e93df31058d
Step 3/3 : RUN echo "test1-2"
---> Running in b068bbaf8a75
test1-2
Removing intermediate container b068bbaf8a75
---> daeaef910f21
Successfully built daeaef910f21
Successfully tagged alpine:test1
$ docker build -t alpine:test1-1 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
---> Using cache
---> 0e93df31058d
Step 3/3 : RUN echo "test1-3"
---> Running in 74aa60a78ae1
test1-3
Removing intermediate container 74aa60a78ae1
---> 266bcc6933a8
Successfully built 266bcc6933a8
Successfully tagged alpine:test1-1
$ docker build -t alpine:test1-2 - <<HITTT
> FROM alpine
> RUN "test2"
> RUN
(base) nb0408:docker a.martianov$ docker build -t alpine:test2 - <<HITTT
> FROM alpine
> RUN echo "test2"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test2"
---> Running in 1a058ddf901c
test2
Removing intermediate container 1a058ddf901c
---> cdc31ac27a45
Step 3/3 : RUN echo "test1-3"
---> Running in 96ddd5b0f3bf
test1-3
Removing intermediate container 96ddd5b0f3bf
---> 7d8b901f3939
Successfully built 7d8b901f3939
Successfully tagged alpine:test2
$ docker build -t alpine:test1-3 - <<HITTT
> FROM alpine
> RUN echo "test1"
> RUN echo "test1-3"
> HITTT
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
---> 965ea09ff2eb
Step 2/3 : RUN echo "test1"
---> Using cache
---> 0e93df31058d
Step 3/3 : RUN echo "test1-3"
---> Using cache
---> 266bcc6933a8
Successfully built 266bcc6933a8
Successfully tagged alpine:test1-3
The cache was again used for the last build. What do the docs mean by "if not"?
Let's focus on your original problem (regarding apt-get update) to make things easier. The following example is not based on any best practices; it just illustrates the point you are trying to understand.
Suppose you have the following Dockerfile:
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y nginx
You build a first image using docker build -t myimage:latest .
What happens is:
The ubuntu image is pulled if it does not exist locally
A layer is created and cached to run apt-get update
A layer is created and cached to run apt-get install -y nginx
Now suppose you modify your Dockerfile to be
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get install -y nginx openssl
and you run the build again with the same command as before. What happens is:
There is already an ubuntu image locally, so it will not be pulled (unless you force it with --pull)
A layer was already created with the command apt-get update against the existing local image, so the cached one is used
The next command has changed, so a new layer is created to install nginx and openssl. Since the apt database was created in the preceding layer and taken from the cache, if a new nginx and/or openssl version was released since then, you will not see it and you will install the outdated ones.
Does this help you to grasp the concept of cached layers?
In this particular example, the best approach is to do everything in a single layer, making sure you clean up after yourself:
FROM ubuntu:18.04
RUN apt-get update \
&& apt-get install -y nginx openssl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
The line would be better phrased as:
If not, there is a cache miss and the cache is not used for this build step and any following build step of this stage of the Dockerfile.
That gets a bit verbose because a multi-stage Dockerfile can fail to find a cache match in one stage and then find a match in another stage. Different builds can all use the cache. The cache is "invalidated" only for a specific build process; the cache itself is not removed from the docker host, and it remains available for future builds.
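To make the per-stage behavior concrete, here is a minimal two-stage sketch (image names and commands are illustrative, not from the question): each stage is matched against the cache independently, so a miss in the builder stage does not spoil cache hits for the early layers of the final stage.

```dockerfile
# Stage 1: editing main.go causes a cache miss from the COPY line down,
# but only within this stage.
FROM golang:1.14 AS builder
WORKDIR /src
COPY main.go .
RUN go build -o /app main.go

# Stage 2: this RUN layer can still be served from cache even when the
# builder stage above missed, because it depends only on alpine:3.11
# and the instruction text.
FROM alpine:3.11
RUN apk add --no-cache ca-certificates
# The COPY --from layer does miss when the built artifact changes.
COPY --from=builder /app /app
CMD ["/app"]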

Getting reproducible docker layers on different hosts

Problem: I can't reproduce docker layers using exactly the same content (on one machine, or in a CI cluster where something is built from a git repo).
Consider this simple example
$ echo "test file" > test.txt
$ cat > Dockerfile <<EOF
FROM alpine:3.8
COPY test.txt /test.txt
EOF
If I build the image on one machine with caching enabled, then the last layer with the copied file is shared across images:
$ docker build -t test:1 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
3.8: Pulling from library/alpine
cd784148e348: Already exists
Digest: sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1
Status: Downloaded newer image for alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:1
$ docker build -t test:2 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> Using cache
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:2
But with the cache disabled (or simply on another machine) I get different hash values.
$ docker build -t test:3 --no-cache .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> ced4dff22d62
Successfully built ced4dff22d62
Successfully tagged test:3
At the same time, the history command shows that the file content was the same:
$ docker history test:1
IMAGE CREATED CREATED BY SIZE COMMENT
decab6a3fbe3 6 minutes ago /bin/sh -c #(nop) COPY file:d9210c40895e
$ docker history test:3
IMAGE CREATED CREATED BY SIZE COMMENT
ced4dff22d62 27 seconds ago /bin/sh -c #(nop) COPY file:d9210c40895e
Am I missing something, or is this behavior by design?
Are there any techniques to get reproducible/reusable layers that do not force me to do one of the following:
Share the docker cache across machines
Pull the "previous" image before building the next one
Ultimately this problem prevents me from getting thin layers with constantly changing app code while keeping my dependencies in a separate, infrequently changed layer.
After some extra googling, I found a great post describing a solution to this problem.
Starting from 1.13, docker has the --cache-from option, which tells docker to look at other images for layers. Importantly, the image must be explicitly pulled for this to work, and you still need to point at which image to use. It could be latest or any other "rolling" tag you have.
Given that, there is unfortunately no way to produce the same layer in "isolation", but --cache-from solves the root problem: how to reuse some layers during a CI build.
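A typical CI sequence using --cache-from looks roughly like this (the registry and image names are illustrative; adapt them to your setup):

```shell
# 1. Pull the previous image; ignore failure on the very first build.
docker pull registry.example.com/myapp:latest || true
# 2. Build, telling docker it may reuse layers from the pulled image.
docker build --cache-from registry.example.com/myapp:latest \
    -t registry.example.com/myapp:latest .
# 3. Push so the next CI run can pull and reuse these layers.
docker push registry.example.com/myapp:latest
```

The explicit pull in step 1 is what makes the old layers available locally; without it, --cache-from has nothing to match against.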

Does Docker build --no-cache actually download and refresh the base image?

Does docker build --no-cache refresh updated remote base images or not? Documentation does not seem to specify.
The --no-cache option will rebuild the image without using the locally cached layers. However, the FROM line will reuse the already-pulled base image if it exists on the build host (the FROM line itself may not be cached, but the image it pulls is). If you want to pull the base image again, use the --pull option with the build command. E.g.
$ docker build --no-cache --pull -t new-image-name:latest .
To see all the options the build command takes, you can run
$ docker build --help
or see the documentation at https://docs.docker.com/engine/reference/commandline/build/
Here's an example for how you can test this behavior yourself:
$ # very simple Dockerfile
$ cat df.test
FROM busybox:latest
RUN echo hello >test.txt
$ # pull an older version of busybox
$ docker pull busybox:1.29.2
1.29.2: Pulling from library/busybox
8c5a7da1afbc: Pull complete
Digest: sha256:cb63aa0641a885f54de20f61d152187419e8f6b159ed11a251a09d115fdff9bd
Status: Downloaded newer image for busybox:1.29.2
$ # retag that locally as latest
$ docker tag busybox:1.29.2 busybox:latest
$ # run the build, note the image id at the end of each build step
$ DOCKER_BUILDKIT=0 docker build --no-cache -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
---> e1ddd7948a1c
Step 2/2 : RUN echo hello >test.txt
---> Running in dba83fef49f9
Removing intermediate container dba83fef49f9
---> 1f824ff05612
Successfully built 1f824ff05612
$ # rerun the build, note step 1 keeps the same id and never pulled a new latest
$ DOCKER_BUILDKIT=0 docker build --no-cache -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
---> e1ddd7948a1c
Step 2/2 : RUN echo hello >test.txt
---> Running in 73df884b0f48
Removing intermediate container 73df884b0f48
---> e5870de6c24f
Successfully built e5870de6c24f
$ # run with --pull and see docker update the latest image, new container id from step 1
$ DOCKER_BUILDKIT=0 docker build --no-cache --pull -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
---> 59788edf1f3e
Step 2/2 : RUN echo hello >test.txt
---> Running in 7204116ecbf4
Removing intermediate container 7204116ecbf4
---> 2c6d8c48661b
Successfully built 2c6d8c48661b
$ # one last run now that busybox:latest is updated shows the pull has nothing to do
$ DOCKER_BUILDKIT=0 docker build --no-cache --pull -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Image is up to date for busybox:latest
---> 59788edf1f3e
Step 2/2 : RUN echo hello >test.txt
---> Running in f37e19024e99
Removing intermediate container f37e19024e99
---> 044a5d4011c4
Successfully built 044a5d4011c4
docker build --no-cache will rebuild your whole image without reusing cached layers, but it will not pull the newest base image from the remote repository. It will just use your locally stored image.
--no-cache rebuilds the image without using the cache, so it is essentially a clean build.
As per docker build --help:
--no-cache Do not use cache when building the image
Each Dockerfile command that you specify (such as RUN, CMD, ADD) creates a layer on your local system, and these layers will be reused by other docker images, provided they too use the same Dockerfile command with the same parameters.
When you run docker build with the --no-cache parameter, docker ignores the image layers already available on the local system where you are building; it always starts the build fresh, from scratch, and the reference count for any previous layer is not incremented while building the new image layers.
You can find the layers of an image by following this link:
Finding the layers and layer sizes for each Docker image

How to bust the cache for the FROM line of a Dockerfile

I have a Dockerfile like this:
FROM python:2.7
RUN echo "Hello World"
When I build this the first time with docker build -f Dockerfile -t test ., or build it with the --no-cache option, I get this output:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Running in 5b5b88e5ebce
Hello World
Removing intermediate container 5b5b88e5ebce
---> a23687d914c2
Successfully built a23687d914c2
My echo command executes.
If I run it again without busting the cache, I get this:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Using cache
---> a23687d914c2
Successfully built a23687d914c2
Successfully tagged test-requirements:latest
The cache is used for Step 2/2, and Hello World is not executed. I could get it to execute again by using --no-cache. However, each time, even when I am using --no-cache, it uses a cached python:2.7 base image (although, unlike when the echo command is cached, it does not say ---> Using cache).
How do I bust the cache for the FROM python:2.7 line? I know I can do FROM python:latest, but that also seems to just cache whatever the latest version is the first time you build the Dockerfile.
If I understood the context correctly, you can use --pull with docker build to get the latest base image:
$ docker build -f Dockerfile.test -t test . --pull
So using both --no-cache and --pull will create a completely fresh image from the Dockerfile:
$ docker build -f Dockerfile.test -t test . --pull --no-cache
Issue - https://github.com/moby/moby/issues/4238
FROM pulls an image from the registry (Docker Hub in this case).
After the image is pulled to produce your build, you will see it if you run docker images.
You may remove it by running docker rmi python:2.7.

How to automatically delete intermediate stage Docker containers from my system?

I have a Dockerfile which I have committed to a git repo. I'm trying to build the container in stages and only keep the final stage, with the static go binary installed in it.
However, the "staging" container also seems to be saved to my system. I have tried to automatically delete it using the --rm flag, but without success.
Here is my Dockerfile
# Use golang alpine3.7 image for build
FROM golang:1.10.0-alpine3.7
# Maintainer information
LABEL Maintainer="Kimmo Hintikka"
# set working directory to local source
WORKDIR /go/src/github.com/HintikkaKimmo/kube-test-app
# Copy kube-test-app to the current working directory
COPY kube-test-app.go .
# build golang binary from the app source
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o kube-test-app .
# Use Alpine 3.7 as base to match the build stage
FROM alpine:3.7
# Maintainer information
LABEL Maintainer="Kimmo Hintikka"
# Install ca-certificates for HTTPS connections
RUN apk --no-cache add ca-certificates
# Set working directory
WORKDIR /root/
# Copy binary from previous stage. --from=0 refers to the index of the build stage
COPY --from=0 /go/src/github.com/HintikkaKimmo/kube-test-app/kube-test-app .
# Run the binary
CMD [ "./kube-test-app" ]
And here is the command I used to build it.
docker build --rm github.com/hintikkakimmo/kube-test-app -t kube-test-app
This builds successfully but also creates an additional tagless (<none>) image, which is the first stage of my build.
~/go/src/github.com/hintikkakimmo/kube-test-app(master*) » docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kube-test-app latest f87087bf2549 8 minutes ago 11.3MB
<none> <none> 1bb40fa05f27 8 minutes ago 397MB
golang 1.10.0-alpine3.7 85256d3905e2 7 days ago 376MB
alpine 3.7 3fd9065eaf02 6 weeks ago 4.15MB
alpine latest 3fd9065eaf02 6 weeks ago 4.15MB
My Docker version is:
~ » docker --version
Docker version 17.12.0-ce, build c97c6d6
You should build it like this:
docker build --rm -t kube-test-app .
if the Dockerfile is in the directory you are in, or specify the path to the Dockerfile:
docker build --rm -t kube-test-app -f path/to/dockerfile .
-t is the tag, the name of your built docker image.
To remove everything except images and running containers, use docker system prune.
To remove an image, use docker rmi -f image_name.
I took a different approach. I set a label on the intermediate image:
FROM golang:1.14.2 AS builder
LABEL builder=true
When I run the build, I simply append a command that finds and removes images carrying that label:
docker build . -t custom-api:0.0.1 && \
docker rmi `docker images --filter label=builder=true -q`
