Problem: I can't reproduce Docker layers from exactly the same content (either on one machine or in a CI cluster where something is built from a git repo).
Consider this simple example:
$ echo "test file" > test.txt
$ cat > Dockerfile <<EOF
FROM alpine:3.8
COPY test.txt /test.txt
EOF
If I build the image on one machine with caching enabled, the last layer, containing the copied file, is shared across images:
$ docker build -t test:1 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
3.8: Pulling from library/alpine
cd784148e348: Already exists
Digest: sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1
Status: Downloaded newer image for alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:1
$ docker build -t test:2 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> Using cache
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:2
But with the cache disabled (or simply on another machine) I get different hash values:
$ docker build -t test:3 --no-cache .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> ced4dff22d62
Successfully built ced4dff22d62
Successfully tagged test:3
At the same time, the history command shows that the file content was the same:
$ docker history test:1
IMAGE CREATED CREATED BY SIZE COMMENT
decab6a3fbe3 6 minutes ago /bin/sh -c #(nop) COPY file:d9210c40895e
$ docker history test:3
IMAGE CREATED CREATED BY SIZE COMMENT
ced4dff22d62 27 seconds ago /bin/sh -c #(nop) COPY file:d9210c40895e
Am I missing something, or is this behavior by design?
Are there any techniques to get reproducible/reusable layers that do not force me to do one of the following?
Share the Docker cache across machines
Do a pull of the "previous" image before building the next one
Ultimately, this problem prevents me from getting thin layers for constantly changing app code while keeping my dependencies in a separate, infrequently changed layer.
After some extra googling, I found a great post describing a solution to this problem.
Starting from 1.13, Docker has a --cache-from option that tells the builder to consider the layers of another image as a cache source. The important part: the image must be explicitly pulled for this to work, and you still need to say which image to use. It could be latest or any other "rolling" tag you have.
Given that, there is unfortunately no way to produce the same layer in "isolation", but --cache-from solves the root problem: how to eventually reuse some layers during a CI build.
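The --cache-from flow described above could be wrapped in a CI step roughly like this. It is a minimal sketch, not a definitive recipe: the image name registry.example.com/myapp, the CACHE_TAG variable and the function name build_with_remote_cache are placeholders invented for illustration; only docker pull and docker build --cache-from come from the answer.

```shell
# Hypothetical image name and rolling tag; replace with your own.
IMAGE="${IMAGE:-registry.example.com/myapp}"
CACHE_TAG="${CACHE_TAG:-latest}"

# Build $IMAGE:<tag>, offering the previously pushed rolling image as a cache source.
build_with_remote_cache() {
    tag="$1"
    # The cache image must exist locally, so pull it explicitly.
    # A failed pull (for example on the very first build) is tolerated.
    docker pull "$IMAGE:$CACHE_TAG" || true
    # Layers of the pulled image are now candidates for cache hits.
    docker build --cache-from "$IMAGE:$CACHE_TAG" -t "$IMAGE:$tag" .
}
```

In a CI job you might call build_with_remote_cache with the commit hash as the tag, then push both that tag and the rolling tag so the next build has something to pull.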
I've got a Makefile with a docker recipe which builds a docker image by doing make docker. The recipe looks like this:
# Build the docker file
docker: .setup ${GOBLDDIR}/docker-image
I wanted to set this up so that the docker file isn't rebuilt if everything is up-to-date, so I set up the ${GOBLDDIR}/docker-image dependency. That dependency is just a text file with the docker image ID. Here's that recipe, which actually does the docker build:
# DO THE DOCKER BUILD
${GOBLDDIR}/docker-image: ${GOBLDDIR} ${GOSRCFILES} go.mod Makefile Dockerfile
	# Vendor our dependencies so we don't have to download them in the docker container
	go mod vendor
	# Wherever the config lives, we need it to be in the Docker build context
	mkdir -p ./.docker-files
	cp ${TFOUT} ./.docker-files/config.json
	# The go app is actually compiled within this dockerfile
	docker build . -t ${GOCMD} --build-arg AWS_PROFILE=${AWS_PROFILE} --build-arg SVC=${GOCMD} --build-arg VERSTR=${VERSTR}
	docker images ${GOCMD} -q > ${GOBLDDIR}/docker-image
	docker tag ${GOCMD}:latest ${GOCMD}:dev
	docker tag ${GOCMD}:latest ${GOCMD}:${VERSTR}
	# Clean up the crap we created just to build the Dockerfile
	rm -rf vendor/
	rm -rf ./.docker-files
Maybe this is an insane design; it feels that way to me. There are certainly times when it doesn't work. I'm open to other ideas.
In particular, is there a way to make a dependency not be a file, but be the result of a command? For example, something like: docker inspect -f '{{ .Created }}' MY_IMAGE_NAME?
Your approach is generally a good one. Rather than checking the creation timestamp, you can check whether the image is actually the one that you previously built (someone else may have built a newer image with different contents that no longer reflects your repository).
In general, make decides whether to remake a target by comparing timestamps of its dependencies, so they are most commonly files. The list of dependencies may, however, be manipulated with some logic, which allows you to run arbitrary checks.
In your scenario you already store the image ID in a file. This can be used to check whether the current image ID is the same as the one built previously. We compare the output of the same command (extracted to a variable for DRY-ness) with the stored contents when the dependencies are evaluated; if they do not match, we add a dependency on FORCE, which is .PHONY and therefore always out of date, effectively triggering a remake of the target:
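$ cat Makefile
docker-id = docker image ls -q $(DOCKER_IMAGE)
# Depend on FORCE unless the current image ID appears in the stored file.
# $(file <...) requires GNU make 4.2 or newer.
docker-image: Dockerfile $(if $(findstring $(shell $(docker-id)),$(file <docker-image)),,FORCE)
	docker build . -t $(DOCKER_IMAGE)
	$(docker-id) > $@
.PHONY: FORCE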
Output:
# Initial image build
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Running in 6a9601225516
16517
---> 54b651cb8912
Removing intermediate container 6a9601225516
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# No files changed, image exists
$ make docker-image DOCKER_IMAGE=test
make: 'docker-image' is up to date.
# Changing Dockerfile forces rebuild
$ touch Dockerfile
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Using cache
---> 54b651cb8912
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# Mismatched id forces rebuild
$ echo foobar > docker-image
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Using cache
---> 54b651cb8912
Successfully built 54b651cb8912
docker image ls -q test > docker-image
# Missing image forces rebuild
$ docker image rm test
Untagged: test:latest
Deleted: sha256:54b651cb8912a5d2505f1f92c8ea4ee367cdca0d3f8e6f1ebc69d0cf646ca72c
Deleted: sha256:3749695bf207f2ca0829d8faf0ef538af1e18e8c39228347f48fd6d7c149f73a
$ make docker-image DOCKER_IMAGE=test
docker build . -t test
Sending build context to Docker daemon 7.168 kB
Step 1/2 : FROM alpine
---> 965ea09ff2eb
Step 2/2 : RUN echo 16517
---> Running in 70f8b09a92b3
16517
---> 8bdfdfff9db2
Removing intermediate container 70f8b09a92b3
Successfully built 8bdfdfff9db2
docker image ls -q test > docker-image
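The check that the $(if ...) expression performs can also be sketched as a plain shell function, which may be easier to debug than make logic. This is only an illustration: the function name needs_rebuild is made up, and it assumes docker image ls -q prints a single ID for the given name (the Makefile version uses a substring match instead of strict equality).

```shell
# Succeeds (exit status 0) when the image should be rebuilt:
# the ID file is missing, the image no longer exists locally,
# or the current ID differs from the recorded one.
# $1 = image name, $2 = file holding the ID recorded by the last build.
needs_rebuild() {
    image="$1"
    id_file="$2"
    # No record of a previous build.
    [ -f "$id_file" ] || return 0
    current_id=$(docker image ls -q "$image")
    # Image was removed since the last build.
    [ -n "$current_id" ] || return 0
    # Rebuild only if the recorded ID no longer matches.
    [ "$current_id" != "$(cat "$id_file")" ]
}
```

A recipe or wrapper script could then run something like: if needs_rebuild test docker-image; then docker build . -t test; fi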
Does docker build --no-cache refresh updated remote base images or not? Documentation does not seem to specify.
The --no-cache option will rebuild the image without using the locally cached layers. However, the FROM line will reuse the already pulled base image if it exists on the build host (the FROM line itself may not be cached, but the image it pulls is). If you want to pull the base image again, you can use the --pull option to the build command, e.g.:
$ docker build --no-cache --pull -t new-image-name:latest .
To see all the options the build command takes, you can run
$ docker build --help
or see the documentation at https://docs.docker.com/engine/reference/commandline/build/
Here's an example of how you can test this behavior yourself:
$ # very simple Dockerfile
$ cat df.test
FROM busybox:latest
RUN echo hello >test.txt
$ # pull an older version of busybox
$ docker pull busybox:1.29.2
1.29.2: Pulling from library/busybox
8c5a7da1afbc: Pull complete
Digest: sha256:cb63aa0641a885f54de20f61d152187419e8f6b159ed11a251a09d115fdff9bd
Status: Downloaded newer image for busybox:1.29.2
$ # retag that locally as latest
$ docker tag busybox:1.29.2 busybox:latest
$ # run the build, note the image id at the end of each build step
$ DOCKER_BUILDKIT=0 docker build --no-cache -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
---> e1ddd7948a1c
Step 2/2 : RUN echo hello >test.txt
---> Running in dba83fef49f9
Removing intermediate container dba83fef49f9
---> 1f824ff05612
Successfully built 1f824ff05612
$ # rerun the build, note step 1 keeps the same id and never pulled a new latest
$ DOCKER_BUILDKIT=0 docker build --no-cache -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
---> e1ddd7948a1c
Step 2/2 : RUN echo hello >test.txt
---> Running in 73df884b0f48
Removing intermediate container 73df884b0f48
---> e5870de6c24f
Successfully built e5870de6c24f
$ # run with --pull and see docker update the latest image, new image id from step 1
$ DOCKER_BUILDKIT=0 docker build --no-cache --pull -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Downloaded newer image for busybox:latest
---> 59788edf1f3e
Step 2/2 : RUN echo hello >test.txt
---> Running in 7204116ecbf4
Removing intermediate container 7204116ecbf4
---> 2c6d8c48661b
Successfully built 2c6d8c48661b
$ # one last run now that busybox:latest is updated shows the pull has nothing to do
$ DOCKER_BUILDKIT=0 docker build --no-cache --pull -f df.test .
Sending build context to Docker daemon 23.04kB
Step 1/2 : FROM busybox:latest
latest: Pulling from library/busybox
Digest: sha256:2a03a6059f21e150ae84b0973863609494aad70f0a80eaeb64bddd8d92465812
Status: Image is up to date for busybox:latest
---> 59788edf1f3e
Step 2/2 : RUN echo hello >test.txt
---> Running in f37e19024e99
Removing intermediate container f37e19024e99
---> 044a5d4011c4
Successfully built 044a5d4011c4
docker build --no-cache will rebuild your whole image without reusing cached layers, but it will not pull the newest base image from the remote repository. It will just use your locally stored image.
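To see whether the local base image is actually behind the registry, one option is to compare the image ID before and after an explicit pull. A rough sketch, assuming a hypothetical helper name base_image_changed and a Docker CLI recent enough to support docker pull -q:

```shell
# Prints "updated" if pulling changed the local image ID, "unchanged" otherwise.
# $1 = image reference, e.g. busybox:latest
base_image_changed() {
    image="$1"
    # ID of the local copy, empty if the image is not present yet.
    before=$(docker image inspect -f '{{ .Id }}' "$image" 2>/dev/null || true)
    # Fetch whatever the tag currently points to in the registry.
    docker pull -q "$image" >/dev/null
    after=$(docker image inspect -f '{{ .Id }}' "$image")
    if [ "$before" = "$after" ]; then
        echo unchanged
    else
        echo updated
    fi
}
```

If this reports "updated", a plain --no-cache build run earlier would have used the stale local image, which is the situation --pull avoids.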
--no-cache rebuilds the image without using the cache, so it is essentially a clean build.
As per docker build --help:
--no-cache Do not use cache when building the image
Each instruction that we specify in the Dockerfile, such as RUN, CMD or ADD, creates a layer on your local system, and these layers are reused by other image builds provided they use the same instruction with the same parameters on top of the same parent layer.
When we run docker build with the --no-cache parameter, Docker ignores the layers already available on the system where you are building and always starts the build fresh, from scratch; previously built layers, if any, are not reused by the new image.
You can find the layers of an image by following this link:
Finding the layers and layer sizes for each Docker image
I have a Dockerfile like this:
FROM python:2.7
RUN echo "Hello World"
When I build this the first time with docker build -f Dockerfile -t test ., or build it with the --no-cache option, I get this output:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Running in 5b5b88e5ebce
Hello World
Removing intermediate container 5b5b88e5ebce
---> a23687d914c2
Successfully built a23687d914c2
My echo command executes.
If I run it again without busting the cache, I get this:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Using cache
---> a23687d914c2
Successfully built a23687d914c2
Successfully tagged test-requirements:latest
Cache is used for Step 2/2, and Hello World is not executed. I could get it to execute again by using --no-cache. However, each time, even when I am using --no-cache it uses a cached python:2.7 base image (although, unlike when the echo command is cached, it does not say ---> Using cache).
How do I bust the cache for the FROM python:2.7 line? I know I can do FROM python:latest, but that also seems to just cache whatever the latest version is the first time you build the Dockerfile.
If I understood the context correctly, you can use --pull with docker build to get the latest base image:
$ docker build -f Dockerfile.test -t test . --pull
So using both --no-cache and --pull will create an absolutely fresh image from the Dockerfile:
$ docker build -f Dockerfile.test -t test . --pull --no-cache
Issue - https://github.com/moby/moby/issues/4238
FROM pulls an image from the registry (DockerHub in this case).
After the image is pulled to produce your build, you will see it if you run docker images.
You may remove it by running docker rmi python:2.7.
I would like to understand the execution steps involved in building Docker images using a Dockerfile. I have listed a couple of questions below. Please help me understand the build process.
Dockerfile content
#from base image
FROM ubuntu:14.04
#author name
MAINTAINER RAGHU
#commands to run in the container
RUN echo "hello Raghu"
RUN sleep 10
RUN echo "TASK COMPLETED"
Command used to build the image: docker build -t raghavendar/hands-on:2.0 .
Sending build context to Docker daemon 20.04 MB
Step 1 : FROM ubuntu:14.04
---> b1719e1db756
Step 2 : MAINTAINER RAGHU
---> Running in 532ed79e6d55
---> ea6184bb8ef5
Removing intermediate container 532ed79e6d55
Step 3 : RUN echo "hello Raghu"
---> Running in da327c9b871a
hello Raghu
---> f02ff92252e2
Removing intermediate container da327c9b871a
Step 4 : RUN sleep 10
---> Running in aa58dea59595
---> fe9e9648e969
Removing intermediate container aa58dea59595
Step 5 : RUN echo "TASK COMPLETED"
---> Running in 612adda45c52
TASK COMPLETED
---> 86c73954ea96
Removing intermediate container 612adda45c52
Successfully built 86c73954ea96
In step 2 :
Step 2 : MAINTAINER RAGHU
---> Running in 532ed79e6d55
Question 1: it indicates that it is running in the container with ID 532ed79e6d55, but from which Docker image is this container created?
---> ea6184bb8ef5
Question 2: what is this ID? Is it an image or a container?
Removing intermediate container 532ed79e6d55
Question 3: Is the final image formed from multiple layers saved from intermediate containers?
Yes, Docker images are layered. When you build a new image, Docker does this for each instruction (RUN, COPY etc.) in your Dockerfile:
create a temporary container from the previous image layer (or from the base FROM image for the first instruction);
run the Dockerfile instruction in the temporary "intermediate" container;
save the temporary container as a new image layer.
The final image layer is tagged with whatever you name the image. This becomes clear if you run docker history raghavendar/hands-on:2.0: you'll see each layer and an abbreviation of the instruction that created it.
Your specific queries:
1) 532 is a temporary container created from image ID b17, which is your FROM image, ubuntu:14.04.
2) ea6 is the image layer created as the output of the instruction, i.e. from saving intermediate container 532.
3) Yes. Docker calls this the Union File System, and it's the main reason why images are so efficient.
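The three steps listed above (create a container, run the instruction, save the result) can be imitated by hand with ordinary Docker commands. The following is a simplified sketch, not what the builder literally executes: manual_run_step is a made-up name, and unlike a real build, docker commit here records the step's command as the new image's default command and adds no build cache metadata.

```shell
# Emulate a single Dockerfile RUN step manually.
# $1 = parent image, remaining arguments = shell command to run.
# Prints the ID of the newly committed image on stdout.
manual_run_step() {
    parent="$1"
    shift
    # 1. Create a temporary container from the previous image.
    container=$(docker create "$parent" /bin/sh -c "$*")
    # 2. Run the instruction inside it (its output goes to stderr so that
    #    stdout carries only the resulting image ID).
    docker start -a "$container" >&2
    # 3. Save the container's filesystem as a new image.
    image=$(docker commit "$container")
    # Remove the intermediate container, as docker build --rm does.
    docker rm "$container" >/dev/null
    echo "$image"
}
```

For example, manual_run_step ubuntu:14.04 'echo "hello Raghu"' would correspond roughly to Step 3 in the build output above, and its result could be passed as the parent of the next step.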
docker build --rm=true
This is the default option, which makes it delete all intermediate images after a successful build.
Does it affect caching adversely? The cache relies on the intermediate images, I think.
Why not try it and find out?
$ cat Dockerfile
FROM debian
RUN touch /x
RUN touch /y
$ docker build --rm .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon
Step 0 : FROM debian
---> df2a0347c9d0
Step 1 : RUN touch /x
---> Running in 2e5ff13506e5
---> fd4dd6845e31
Removing intermediate container 2e5ff13506e5
Step 2 : RUN touch /y
---> Running in b2a585989fa5
---> 0093f530941b
Removing intermediate container b2a585989fa5
Successfully built 0093f530941b
$ docker build --rm .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon
Step 0 : FROM debian
---> df2a0347c9d0
Step 1 : RUN touch /x
---> Using cache
---> fd4dd6845e31
Step 2 : RUN touch /y
---> Using cache
---> 0093f530941b
Successfully built 0093f530941b
So no, the cache still works. As you pointed out, --rm is actually on by default (you would have to run --rm=false to turn it off), but it refers to the intermediate containers, not the intermediate images. These are the containers that Docker ran your build commands in to create the images. In some cases you might want to keep those containers around for debugging, but normally the images are enough. In the above output, we can see the containers 2e5ff13506e5 and b2a585989fa5, which are deleted, but also the images fd4dd6845e31 and 0093f530941b, which are kept.
You can't delete the intermediate images as they are needed by the final image (an image is the last layer plus all ancestor layers).