This is a cut-down example of a problem I'm having with a bigger Dockerfile.
Here's a Dockerfile:
FROM alpine:latest AS base
COPY docker-compose.yml /tmp/docker-compose.yml
RUN touch /tmp/foo
Here's a docker-compose.yml:
version: '3.5'
services:
  web:
    build:
      context: .
What I expect is that docker build will be able to reuse the cached layers that docker-compose builds. What I see when I run docker-compose build web is:
$ docker-compose build web
Building web
Step 1/3 : FROM alpine:latest AS base
---> f70734b6a266
Step 2/3 : COPY docker-compose.yml /tmp/docker-compose.yml
---> 764c54eb3dd4
Step 3/3 : RUN touch /tmp/foo
---> Running in 77bdf96af899
Removing intermediate container 77bdf96af899
---> 7d8197f7004f
Successfully built 7d8197f7004f
Successfully tagged docker-compose-caching_web:latest
If I re-run docker-compose build web, I get:
...
Step 2/3 : COPY docker-compose.yml /tmp/docker-compose.yml
---> Using cache
---> 764c54eb3dd4
...
So it's clearly able to cache the layer with the file in it. However, when I run docker build ., here's the output I see:
$ docker build .
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine:latest AS base
---> f70734b6a266
Step 2/3 : COPY docker-compose.yml /tmp/docker-compose.yml
---> e8679333ba0d
Step 3/3 : RUN touch /tmp/foo
---> Running in af26cc65312d
Removing intermediate container af26cc65312d
---> 186c8341ee96
Successfully built 186c8341ee96
Note step 2 didn't come from the cache. Why not? Or, more importantly, how can I ensure that it does without using --cache-from?
The problem this causes is that after this step in my bigger Dockerfile that I'm not showing, there's a honking great RUN command that takes an age to run. How can I get docker build and docker-compose build to share cache layers?
(Docker Desktop v 2.3.0.2 (45183) on OS X 10.14.6 for those playing along at home)
With Docker-compose 1.25+ (Dec. 2019), try and use:
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build
That is what is needed to make docker-compose build with the docker CLI instead of its own internal builder.
See also "Faster builds in Docker Compose 1.25.1 thanks to BuildKit Support".
But be aware of docker-compose issue 7336 when using it with DOCKER_BUILDKIT=1 (in addition to COMPOSE_DOCKER_CLI_BUILD=1).
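A minimal sketch of combining the two (run from the directory containing the docker-compose.yml above), after which both commands should go through the same docker CLI/BuildKit backend and be able to share cached layers:
export COMPOSE_DOCKER_CLI_BUILD=1
export DOCKER_BUILDKIT=1
docker-compose build web
docker build .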
Looks like a known issue. For reasons I don't entirely understand, the hashes generated by docker-compose build are different from those generated by docker build.
https://github.com/docker/compose/issues/883
Related
I need to build 2 stages based on a common one
$ ls
Dockerfile dev other prod
$ cat Dockerfile
FROM scratch as dev
COPY dev /
FROM scratch as other
COPY other /
FROM scratch as prod
COPY --from=dev /dev /
COPY prod /
As you can see, the prod stage does not depend on the other stage; however, docker builds it anyway:
$ docker build . --target prod
Sending build context to Docker daemon 4.096kB
Step 1/7 : FROM scratch as dev
--->
Step 2/7 : COPY dev /
---> 64c24f1f1d8c
Step 3/7 : FROM scratch as other
--->
Step 4/7 : COPY other /
---> 9b0753ec4353
Step 5/7 : FROM scratch as prod
--->
Step 6/7 : COPY --from=dev /dev /
---> Using cache
---> 64c24f1f1d8c
Step 7/7 : COPY prod /
---> 9fe8cc3d3ac1
Successfully built 9fe8cc3d3ac1
Why does Docker need to build the other stage?
How can I build prod without other? Do I have to use another Dockerfile?
There are two different backends for docker build. The "classic" backend works exactly the way you describe: it runs through the entire Dockerfile until it reaches the final stage, so even if a stage is unused it will still be executed. The newer BuildKit backend can do some dependency analysis and determine that a stage is never used and skip over it as you request.
Very current versions of Docker use BuildKit as their default backend. Slightly older versions have BuildKit available, but it isn't the default. You can enable it by running
export DOCKER_BUILDKIT=1
in your shell environment where you run docker build.
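With BuildKit enabled, building the prod target of the Dockerfile above should skip the unused other stage entirely; a quick sketch:
DOCKER_BUILDKIT=1 docker build . --target prod
BuildKit walks the stage dependency graph backwards from prod, so only dev (which prod copies from) gets built.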
(It's often a best practice to run the same Docker image in all environments, and to use separate Dockerfiles for separate components. That avoids any questions around which stages exactly get run.)
Problem: I can't reproduce docker layers using exactly the same content (on one machine, or in a CI cluster where something is built from a git repo).
Consider this simple example
$ echo "test file" > test.txt
$ cat > Dockerfile <<EOF
FROM alpine:3.8
COPY test.txt /test.txt
EOF
If I build the image on one machine with caching enabled, then the last layer with the copied file is shared across images:
$ docker build -t test:1 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
3.8: Pulling from library/alpine
cd784148e348: Already exists
Digest: sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1
Status: Downloaded newer image for alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:1
$ docker build -t test:2 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> Using cache
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:2
But with the cache disabled (or simply using another machine), I get different hash values:
$ docker build -t test:3 --no-cache .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> ced4dff22d62
Successfully built ced4dff22d62
Successfully tagged test:3
At the same time, the history command shows that the file content was the same:
$ docker history test:1
IMAGE CREATED CREATED BY SIZE COMMENT
decab6a3fbe3 6 minutes ago /bin/sh -c #(nop) COPY file:d9210c40895e
$ docker history test:3
IMAGE CREATED CREATED BY SIZE COMMENT
ced4dff22d62 27 seconds ago /bin/sh -c #(nop) COPY file:d9210c40895e
Am I missing something, or is this behavior by design?
Are there any techniques to get reproducible/reusable layers that do not force me to do one of the following:
Share the docker cache across machines
Pull the "previous" image before building the next one
Ultimately this problem prevents me from keeping my constantly changing app code in thin layers while my dependencies sit in a separate, infrequently changed layer.
After some extra googling, I found a great post describing a solution to this problem.
Starting from 1.13, docker has a --cache-from option that can be used to tell docker to look at other images for layers. Importantly, the image has to be explicitly pulled for it to work, and you still need to point at which image to use. It could be latest or any other "rolling" tag you have.
Given that, there is unfortunately no way to produce the same layer in "isolation", but --cache-from solves the root problem: how to reuse some layers during a CI build.
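A rough sketch of that CI flow (registry.example.com/myapp is just a placeholder image name):
# pull the image produced by a previous build; tolerate failure on the very first run
docker pull registry.example.com/myapp:latest || true
# reuse its layers where the build steps match
docker build --cache-from registry.example.com/myapp:latest -t registry.example.com/myapp:latest .
# push so the next build can use this image as its cache source
docker push registry.example.com/myapp:latest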
I have a Dockerfile like this:
FROM python:2.7
RUN echo "Hello World"
When I build this the first time with docker build -f Dockerfile -t test ., or build it with the --no-cache option, I get this output:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Running in 5b5b88e5ebce
Hello World
Removing intermediate container 5b5b88e5ebce
---> a23687d914c2
Successfully built a23687d914c2
My echo command executes.
If I run it again without busting the cache, I get this:
Sending build context to Docker daemon 40.66MB
Step 1/2 : FROM python:2.7
---> 6c76e39e7cfe
Step 2/2 : RUN echo "Hello World"
---> Using cache
---> a23687d914c2
Successfully built a23687d914c2
Successfully tagged test-requirements:latest
Cache is used for Step 2/2, and Hello World is not executed. I could get it to execute again by using --no-cache. However, each time, even when I am using --no-cache it uses a cached python:2.7 base image (although, unlike when the echo command is cached, it does not say ---> Using cache).
How do I bust the cache for the FROM python:2.7 line? I know I can do FROM python:latest, but that also seems to just cache whatever the latest version is the first time you build the Dockerfile.
If I understood the context correctly, you can use --pull with docker build to get the latest base image:
$ docker build -f Dockerfile.test -t test . --pull
So using both --no-cache and --pull will create a completely fresh image from the Dockerfile:
$ docker build -f Dockerfile.test -t test . --pull --no-cache
Issue - https://github.com/moby/moby/issues/4238
FROM pulls an image from the registry (DockerHub in this case).
After the image is pulled to produce your build, you will see it if you run docker images.
You may remove it by running docker rmi python:2.7.
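So one way to force a fresh base image, sketched here with the Dockerfile from the question, is to remove the local copy before rebuilding (docker rmi will refuse if other local images still depend on it):
docker rmi python:2.7
docker build --no-cache -f Dockerfile -t test .   # FROM now pulls python:2.7 again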
With this new version of Docker, multi-stage builds get introduced; at least I'd never heard of them before. The question I have now is: should I use them like a standard Compose file?
I used docker-compose.yaml to start containers where many images were involved, one for the web server and one for the database. With this new multi-stage build, can I use one single Dockerfile with two FROM commands and that's it?
Will multi-stage builds eventually kill Compose (since images are smaller)?
Multi-stage builds don't impact the use of docker-compose (though you may want to look into using docker stack deploy with swarm mode to use that compose file on a swarm). Compose is still needed to connect multiple microservices together, e.g. running a proxy, a few applications, and an in-memory cache. Compose also simplifies passing all the configuration options to a complex docker image, attaching networks and volumes, configuring restart policies, swarm constraints, etc. All of these could be done with lots of scripting, but they are made easier by a simple yaml definition.
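A minimal sketch of the kind of wiring Compose handles; the service names and images here are purely illustrative:
version: '3.5'
services:
  proxy:
    image: nginx:alpine
    ports:
      - "80:80"
    depends_on:
      - app
  app:
    build: .                  # this is where your (possibly multi-stage) Dockerfile gets built
    environment:
      - REDIS_HOST=cache
    restart: unless-stopped
  cache:
    image: redis:alpine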
What multi-stage builds do replace is a multiple step build where you may have a build environment that should be different than a runtime environment. This is all prior to the docker-compose configuration of running your containers.
The popular example is a go binary. That binary is statically compiled so it doesn't really need anything else to run. But the build environment for it is much larger as it pulls in the compiler and various libraries. Here's an example hello.go:
package main

import "fmt"

func main() {
    fmt.Printf("Hello, world.\n")
}
And the corresponding Dockerfile:
ARG GOLANG_VER=1.8
FROM golang:${GOLANG_VER} as builder
WORKDIR /go/src/app
COPY . .
RUN go-wrapper download
RUN go-wrapper install
FROM scratch
COPY --from=builder /go/bin/app /app
CMD ["/app"]
The two FROM lines in that Dockerfile are what make it a multi-stage build. The first FROM line creates the first stage with the go compiler. The second FROM line is also the last, which makes it the default image to tag when you build. In this case, that stage is the runtime of a single binary. Other stages are all cached on the build server but don't get copied with the final image. You can target the build to a different stage if you need to build a single piece with the docker build --target=builder . command.
This becomes important when you look at the result of the build:
$ docker build -t test-mult-stage .
Sending build context to Docker daemon 4.096kB
Step 1/9 : ARG GOLANG_VER=1.8
--->
Step 2/9 : FROM golang:${GOLANG_VER} as builder
---> a0c61f0b0796
Step 3/9 : WORKDIR /go/src/app
---> Using cache
---> af5177aae437
Step 4/9 : COPY . .
---> Using cache
---> 976490d44468
Step 5/9 : RUN go-wrapper download
---> Using cache
---> e31ac3ce83c3
Step 6/9 : RUN go-wrapper install
---> Using cache
---> 2630f482fe78
Step 7/9 : FROM scratch
--->
Step 8/9 : COPY --from=builder /go/bin/app /app
---> 96b9364cdcdc
Removing intermediate container ed558a4da820
Step 9/9 : CMD /app
---> Running in 55db8ed593ac
---> 5fd74a4d4235
Removing intermediate container 55db8ed593ac
Successfully built 5fd74a4d4235
Successfully tagged test-mult-stage:latest
$ docker images | grep 2630
<none> <none> 2630f482fe78 5 weeks ago 700MB
$ docker images | grep test-mult-stage
test-mult-stage latest 5fd74a4d4235 33 seconds ago 1.56MB
Note the runtime image is only 1.56 MB, while the untagged builder image with the compiler is 700 MB. Previously, to get the same space savings, you would need to compile your application outside of docker and deal with all the dependency issues that docker would normally solve for you. Or you could do the build in one container, copy the result out of that container, and use that copied file as the input to another build. The multi-stage build turns this second option into a single reproducible and portable command.
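For comparison, a rough sketch of that older two-step approach (Dockerfile.build, Dockerfile.run, and the paths are illustrative):
# build in a throwaway image that contains the compiler
docker build -t myapp-builder -f Dockerfile.build .
# copy the compiled binary out of a temporary container
docker create --name extract myapp-builder
docker cp extract:/go/bin/app ./app
docker rm extract
# feed the copied binary into a second, minimal runtime image
docker build -t myapp -f Dockerfile.run .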
The multi-stage feature allows you to create temporary builds and extract their files to be used in your final build. For example, you need gcc to build your libraries, but you don't need gcc in the production container. Though you could do multiple builds with a few lines of bash scripting, the multi-stage feature allows you to do it with a single Dockerfile. Compose only uses your final image(s), regardless of how you've built them, so the two are unrelated.
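A sketch of that gcc example as a multi-stage Dockerfile; the image tag, file names, and stage name are assumptions for illustration:
# build stage: has the compiler and headers
FROM gcc:12 AS builder
WORKDIR /src
COPY hello.c .
RUN gcc -static -o hello hello.c

# runtime stage: ships only the compiled binary
FROM scratch
COPY --from=builder /src/hello /hello
CMD ["/hello"]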
I am using the jenkins image to create a docker container. For now I am just trying to create a new directory and copy a couple of files. The image build process runs fine, but when I start the container I cannot see the files or the directory.
Here is my Dockerfile:
FROM jenkins:2.46.1
MAINTAINER MandeepSinghGulati
USER jenkins
RUN mkdir /var/jenkins_home/aws
COPY aws/config /var/jenkins_home/aws/
COPY aws/credentials /var/jenkins_home/aws/
I found a similar question here, but it seems different because I am not creating the jenkins user. It already exists with home directory /var/jenkins_home/. Not sure what I am doing wrong.
Here is how I am building my image and starting the container:
➜ jenkins_test docker build -t "test" .
Sending build context to Docker daemon 5.632 kB
Step 1/6 : FROM jenkins:2.46.1
---> 04c1dd56a3d8
Step 2/6 : MAINTAINER MandeepSinghGulati
---> Using cache
---> 7f76c0f7fc2d
Step 3/6 : USER jenkins
---> Running in 5dcbf4ef9f82
---> 6a64edc2d2cb
Removing intermediate container 5dcbf4ef9f82
Step 4/6 : RUN mkdir /var/jenkins_home/aws
---> Running in 1eb86a351beb
---> b42587697aec
Removing intermediate container 1eb86a351beb
Step 5/6 : COPY aws/config /var/jenkins_home/aws/
---> a9d9a28fd777
Removing intermediate container ca4a708edc6e
Step 6/6 : COPY aws/credentials /var/jenkins_home/aws/
---> 9f9ee5a603a1
Removing intermediate container 592ad0031f49
Successfully built 9f9ee5a603a1
➜ jenkins_test docker run -it -v $HOME/jenkins:/var/jenkins_home -p 8080:8080 --name=test-container test
If I run the command without the volume mount, I can see the copied files and the directory. However, with the volume mount I cannot see them, even if I empty the directory on the host machine. Is this the expected behaviour? How can I copy files over to the directory being used as a volume?
Existing volumes can be mounted with
docker container run -v MY-VOLUME:/var/jenkins_home ...
Furthermore, the documentation of COPY states:
All new files and directories are created with a UID and GID of 0.
So COPY does not reflect your USER directive. This seems to be the second part of your problem.
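A sketch of how both points could be addressed, assuming a Docker version new enough for COPY --chown (17.09+) and a named volume instead of the host bind mount:
# Dockerfile
FROM jenkins:2.46.1
USER jenkins
RUN mkdir /var/jenkins_home/aws
COPY --chown=jenkins:jenkins aws/config aws/credentials /var/jenkins_home/aws/

# run with a named volume; Docker seeds an empty named volume from the image's
# /var/jenkins_home on first use, unlike a host bind mount, which hides it
docker volume create jenkins_home
docker run -it -v jenkins_home:/var/jenkins_home -p 8080:8080 --name=test-container test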