Docker Compose vs Multi-Stage Build

With this new version of Docker, multi-stage builds get introduced; at least I'd never heard of them before. The question I have now is: should I use one like a standard Compose file?
I used docker-compose.yaml to start containers where many images were involved, one for the web server and one for the database. With this new multi-stage build, can I use one single Dockerfile with two FROM commands and that's it?
Will multi-stage builds eventually kill Compose (since images are smaller)?

Multi-stage builds don't impact the use of docker-compose (though you may want to look into docker stack deploy with swarm mode to run that compose file on a swarm). Compose is still needed to connect multiple microservices together, e.g. running a proxy, a few applications, and an in-memory cache. Compose also simplifies passing all the configuration options to a complex Docker image: attaching networks and volumes, configuring restart policies, swarm constraints, etc. All of this could be done with lots of scripting, but it is made easier by a simple YAML definition.
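For illustration, a minimal sketch of such a compose file might look like this (the service names and images here are hypothetical, not from the question):
version: '3.5'
services:
  proxy:
    image: nginx:alpine
    ports:
      - "80:80"
    depends_on:
      - app
  app:
    build: .
    restart: unless-stopped
  cache:
    image: redis:alpine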
What multi-stage builds do replace is a multi-step build where the build environment should be different from the runtime environment. This all happens before the docker-compose configuration that runs your containers.
The popular example is a Go binary. That binary is statically compiled, so it doesn't really need anything else to run. But the build environment for it is much larger, since it pulls in the compiler and various libraries. Here's an example hello.go:
package main

import "fmt"

func main() {
    fmt.Printf("Hello, world.\n")
}
And the corresponding Dockerfile:
ARG GOLANG_VER=1.8
FROM golang:${GOLANG_VER} as builder
WORKDIR /go/src/app
COPY . .
RUN go-wrapper download
RUN go-wrapper install
FROM scratch
COPY --from=builder /go/bin/app /app
CMD ["/app"]
The two FROM lines in that Dockerfile are what make it a multi-stage build. The first FROM line creates the first stage with the Go compiler. The second FROM line is also the last, which makes it the default image to tag when you build. In this case, that stage is the runtime for a single binary. The other stages are all cached on the build server but don't get copied into the final image. If you need to build a single piece, you can target the build to a specific stage with the docker build --target=builder . command.
This becomes important when you look at the result of the build:
$ docker build -t test-mult-stage .
Sending build context to Docker daemon 4.096kB
Step 1/9 : ARG GOLANG_VER=1.8
--->
Step 2/9 : FROM golang:${GOLANG_VER} as builder
---> a0c61f0b0796
Step 3/9 : WORKDIR /go/src/app
---> Using cache
---> af5177aae437
Step 4/9 : COPY . .
---> Using cache
---> 976490d44468
Step 5/9 : RUN go-wrapper download
---> Using cache
---> e31ac3ce83c3
Step 6/9 : RUN go-wrapper install
---> Using cache
---> 2630f482fe78
Step 7/9 : FROM scratch
--->
Step 8/9 : COPY --from=builder /go/bin/app /app
---> 96b9364cdcdc
Removing intermediate container ed558a4da820
Step 9/9 : CMD /app
---> Running in 55db8ed593ac
---> 5fd74a4d4235
Removing intermediate container 55db8ed593ac
Successfully built 5fd74a4d4235
Successfully tagged test-mult-stage:latest
$ docker images | grep 2630
<none> <none> 2630f482fe78 5 weeks ago 700MB
$ docker images | grep test-mult-stage
test-mult-stage latest 5fd74a4d4235 33 seconds ago 1.56MB
Note the runtime image is only 1.5 MB, while the untagged builder image with the compiler is 700 MB. Previously, to get the same space savings, you would need to compile your application outside of Docker and deal with all the dependency issues that Docker would normally solve for you. Or you could do the build in one container, copy the result out of that container, and use that copied file as the input to another build. The multi-stage build turns this second option into a single reproducible and portable command.
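For comparison, here is a sketch of that older two-step pattern, assuming hypothetical Dockerfile.build and Dockerfile.run files holding the build and runtime halves:
# Build in one container, copy the artifact out, feed it to a second build.
docker build -t app-builder -f Dockerfile.build .
docker create --name extract app-builder
docker cp extract:/go/bin/app ./app
docker rm extract
docker build -t app-runtime -f Dockerfile.run .
Every step here is something a multi-stage build now does for you inside a single docker build.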

The multi-stage feature allows you to create temporary build stages and extract their files for use in your final build. For example, you need gcc to build your libraries, but you don't need gcc in the production container. While you could do multiple builds using a few lines of bash scripting, the multi-stage feature allows you to do it with a single Dockerfile. Compose only uses your final image(s), regardless of how you've built them, so the two are unrelated.
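A minimal sketch of that gcc case (hello.c and the image tags are hypothetical):
# Stage 1: compile with gcc available.
FROM gcc:9 AS build
COPY hello.c .
RUN gcc -static -o /hello hello.c

# Stage 2: ship only the static binary, without the compiler.
FROM scratch
COPY --from=build /hello /hello
CMD ["/hello"]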

Related

Why does Docker need to build all the previous stages?

I need to build 2 stages based on a common one
$ ls
Dockerfile dev other prod
$ cat Dockerfile
FROM scratch as dev
COPY dev /
FROM scratch as other
COPY other /
FROM scratch as prod
COPY --from=dev /dev /
COPY prod /
As you can see, the prod stage does not depend on the other stage; however, Docker builds it anyway:
$ docker build . --target prod
Sending build context to Docker daemon 4.096kB
Step 1/7 : FROM scratch as dev
--->
Step 2/7 : COPY dev /
---> 64c24f1f1d8c
Step 3/7 : FROM scratch as other
--->
Step 4/7 : COPY other /
---> 9b0753ec4353
Step 5/7 : FROM scratch as prod
--->
Step 6/7 : COPY --from=dev /dev /
---> Using cache
---> 64c24f1f1d8c
Step 7/7 : COPY prod /
---> 9fe8cc3d3ac1
Successfully built 9fe8cc3d3ac1
Why does Docker need to build the other stages?
How can I build prod without other? Do I have to use another Dockerfile?
There are two different backends for docker build. The "classic" backend works exactly the way you describe: it runs through the entire Dockerfile until it reaches the final stage, so even if a stage is unused it will still be executed. The newer BuildKit backend can do some dependency analysis and determine that a stage is never used and skip over it as you request.
Very current versions of Docker use BuildKit as their default backend. Slightly older versions have BuildKit available, but it isn't the default. You can enable it by running
export DOCKER_BUILDKIT=1
in your shell environment where you run docker build.
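With the Dockerfile from the question, that would be:
export DOCKER_BUILDKIT=1
# BuildKit analyzes stage dependencies and skips the unused other stage:
docker build . --target prod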
(It's often a best practice to run the same Docker image in all environments, and to use separate Dockerfiles for separate components. That avoids any questions around which stages exactly get run.)

Why isn't docker reusing docker-compose's cache layers?

This is a cut-down example of a problem I'm having with a bigger Dockerfile.
Here's a Dockerfile:
FROM alpine:latest AS base
COPY docker-compose.yml /tmp/docker-compose.yml
RUN touch /tmp/foo
Here's a docker-compose.yml:
version: '3.5'
services:
  web:
    build:
      context: .
What I expect is that docker build will be able to reuse the cached layers that docker-compose builds. What I see when I run docker-compose build web is:
$ docker-compose build web
Building web
Step 1/3 : FROM alpine:latest AS base
---> f70734b6a266
Step 2/3 : COPY docker-compose.yml /tmp/docker-compose.yml
---> 764c54eb3dd4
Step 3/3 : RUN touch /tmp/foo
---> Running in 77bdf96af899
Removing intermediate container 77bdf96af899
---> 7d8197f7004f
Successfully built 7d8197f7004f
Successfully tagged docker-compose-caching_web:latest
If I re-run docker-compose build web, I get:
...
Step 2/3 : COPY docker-compose.yml /tmp/docker-compose.yml
---> Using cache
---> 764c54eb3dd4
...
So it's clearly able to cache the layer with the file in it. However, when I run docker build ., here's the output I see:
$ docker build .
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine:latest AS base
---> f70734b6a266
Step 2/3 : COPY docker-compose.yml /tmp/docker-compose.yml
---> e8679333ba0d
Step 3/3 : RUN touch /tmp/foo
---> Running in af26cc65312d
Removing intermediate container af26cc65312d
---> 186c8341ee96
Successfully built 186c8341ee96
Note step 2 didn't come from the cache. Why not? Or, more importantly, how can I ensure that it does without using --cache-from?
The problem this causes is that after this step in my bigger Dockerfile that I'm not showing, there's a honking great RUN command that takes an age to run. How can I get docker build and docker-compose build to share cache layers?
(Docker Desktop v 2.3.0.2 (45183) on OS X 10.14.6 for those playing along at home)
With docker-compose 1.25+ (Dec. 2019), try:
COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build
That is what is needed to build through the docker CLI instead of docker-compose's own internal builder.
See also "Faster builds in Docker Compose 1.25.1 thanks to BuildKit Support".
But be aware of docker-compose issue 7336 when using it with DOCKER_BUILDKIT=1 (in addition to COMPOSE_DOCKER_CLI_BUILD=1).
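Combined, and using the service name from the question, that would look like:
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build web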
Looks like a known issue. For reasons I don't entirely understand, hashes generated by docker compose build are different from those generated by docker build.
https://github.com/docker/compose/issues/883

ENV, RUN produce layers, images or containers? (understanding docs question)

I've searched site:stackoverflow.com dockerfile: ENV, RUN - layers or images and have read Does Docker EXPOSE make a new layer? and What are Docker image "layers"?
While reading the docs, Best practices for writing Dockerfiles, I was trying to understand this part:
Each ENV line creates a new intermediate layer, just like RUN commands. This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can be dumped.
I recalled this earlier part of the same page:
In older versions of Docker, it was important that you minimized the number of layers in your images to ensure they were performant. The following features were added to reduce this limitation:
Only the instructions RUN, COPY, ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.
I've read How to unset "ENV" in dockerfile? and redid the example given on the doc page; it indeed proves ENV is not unset:
$ docker build -t alpine:envtest -<<HITHERE
> FROM alpine
> ENV ADMIN_USER="mark"
> RUN unset ADMIN_USER
> HITHERE
Sending build context to Docker daemon 3.072kB
Step 1/3 : FROM alpine
latest: Pulling from library/alpine
89d9c30c1d48: Already exists
Digest: sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a
Status: Downloaded newer image for alpine:latest
---> 965ea09ff2eb
Step 2/3 : ENV ADMIN_USER="mark"
---> Running in 5d34f829a387
Removing intermediate container 5d34f829a387
---> e9c50b16c0e1
Step 3/3 : RUN unset ADMIN_USER
---> Running in dbcf57ca390d
Removing intermediate container dbcf57ca390d
---> 2cb4de2e0257
Successfully built 2cb4de2e0257
Successfully tagged alpine:envtest
$ docker run --rm alpine:envtest sh -c 'echo $ADMIN_USER'
mark
And the output says the same "Removing intermediate container" for both ENV and RUN.
I've recently downloaded Docker, so I don't think it is that old:
$ docker --version
Docker version 19.03.5, build 633a0ea
Maybe the RUN instruction and the RUN command are different things?
ENV, RUN - do they create layers, images or containers?
Docker as a containerization system is based on two main concepts: the image and the container. The major difference between them is the top writable layer. When you create a new container, a new writable layer is put above the last image layer. This layer is often called the container layer.
All the underlying image content remains unchanged, and each change in the running container (creating new files, modifying existing files, and so on) is copied into this thin writable layer.
This way, Docker stores only each container's actual data plus one image instance, which decreases storage usage and simplifies the underlying workflow. I would compare it with static and dynamic linking in the C language: Docker uses the equivalent of dynamic linking.
The image is a combination of layers. Each layer is only a set of differences from the layer before it.
The documentation says:
Only the instructions RUN, COPY, ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.
That description is neither really clear nor accurate: in the latest versions of Docker, these aren't the only instructions that create layers.
For example, by default WORKDIR creates the given path if it does not exist and changes directory to it. If the new path had to be created, WORKDIR generates a new layer.
ENV, on the other hand, doesn't lead to layer creation, but the data is stored permanently in the image and container config, and there is no easy way to get rid of it. Basically, there are two options for organizing the workflow:
Temporary environment variables, which remain available only until the end of the current RUN directive:
RUN export NAME='megatron' && echo $NAME # 'megatron'
RUN echo $NAME # blank
A cleaned environment variable: if an absent variable and a blank one make no difference to you, then you could do:
ENV NAME='megatron'
# some instructions
ENV NAME=''
RUN echo $NAME
In the context of Docker, there is no distinction between commands and instructions. For RUN, any command that doesn't change the filesystem content won't trigger permanent layer creation. Consider the following Dockerfile:
FROM alpine:latest
RUN echo "Hello World" # no layer
RUN touch file.txt # new layer
WORKDIR /no/existing/path # new layer
In the end, the output would be:
Step 1/4 : FROM alpine:latest
---> 965ea09ff2eb
Step 2/4 : RUN echo "Hello World"
---> Running in 451adb70f017
Hello World
Removing intermediate container 451adb70f017
---> 816ccbd1e8aa
Step 3/4 : RUN touch file.txt
---> Running in 9edc6afdd1e5
Removing intermediate container 9edc6afdd1e5
---> ea0040ec0312
Step 4/4 : WORKDIR /no/existing/path
---> Running in ec0feaf6710d
Removing intermediate container ec0feaf6710d
---> f2fe46478f7c
Successfully built f2fe46478f7c
Successfully tagged envtest:latest
There is an inspect command for inspecting Docker objects:
docker inspect --format='{{json .RootFS.Layers}}' <image_id>
It shows us the SHAs of three layers, coming from the FROM, the second RUN, and the WORKDIR directives. I would also recommend using dive for exploring each layer in a Docker image.
So why does it say "Removing intermediate container" and not "removing intermediate layer"? To execute RUN commands, Docker needs to instantiate a container from the intermediate image built up to that line of the Dockerfile and run the actual command in it. It then "commits" the state of the container as a new intermediate image and continues the building process.
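As a quick cross-check, docker history lists one row per layer of an image; rows with a SIZE of 0B come from metadata-only instructions (ENV, CMD, and so on), while non-zero rows changed the filesystem:
docker history <image_id>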

Docker COPY does not cp

I would like to build my first Docker image, containing Apache Tomcat and a deployed web app. My Dockerfile is really small, based on the tomcat:8.0 image, and is supposed to copy a WAR file into Tomcat's appBase.
The build of the image reports success, but the file is nowhere to be found in the container.
Copying from host to container works without issues using "docker cp":
[root@edubox dock]# docker cp jdbcdemo_3.war 15dd44bbf992:/usr/local/tomcat/webapps/
My Dockerfile:
# we are extending everything from tomcat:8.0 image ...
FROM tomcat:8.0
MAINTAINER simo
# COPY path-to-your-application-war path-to-webapps-in-docker-tomcat
COPY ./jdbcdemo_3.war /usr/local/tomcat/webapps/
EXPOSE 8082
Image build:
[root@edubox dock]# docker image build -t simo/jdbcdemo_3 --tag=recent ./
Sending build context to Docker daemon 10.24 kB
Step 1/4 : FROM tomcat:8.0
---> ef6a7c98d192
Step 2/4 : MAINTAINER simo
---> Using cache
---> 54d824d7258b
Step 3/4 : COPY ./jdbcdemo_3.war /usr/local/tomcat/webapps/
---> Using cache
---> f94330423a93
Step 4/4 : EXPOSE 8082
---> Running in 74b6dd0364b2
---> 9464f11ac18e
Removing intermediate container 74b6dd0364b2
Successfully built 9464f11ac18e
I would expect COPY to place the file where specified or an error message because of which this does not work.
Please try this way:
Keep jdbcdemo_3.war where your Dockerfile exists, and remove ./ from the COPY line in the Dockerfile:
COPY jdbcdemo_3.war /usr/local/tomcat/webapps/
Also check the permission side of the file; you can give it full permissions (user:group) and test once.
Or try this in your Dockerfile:
COPY jdbcdemo_3.war /tmp
Then build the image and check in the /tmp directory. If the file copied successfully, fix the permissions on /usr/local/tomcat/webapps/, or copy to /tmp first and then copy from /tmp to /usr/local/tomcat/webapps/ using the COPY command in the Dockerfile.
Hi, and many thanks for offering advice. The issue was trivial in the end: I had not been examining the correct container.
I did not realize one needs to take the freshly created image, run a container from it, and only afterwards look in that container for the changes described in the Dockerfile.
I had been looking into the parent container, which I now understand could not have worked.
Sorry for wasting your time ;-)
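For anyone who hits the same confusion: a quick way to verify the COPY is to run a throwaway container from the image the build just tagged and list the target directory:
docker run --rm simo/jdbcdemo_3 ls /usr/local/tomcat/webapps/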

How to persist changes made by maven dependency:go-offline in docker image

I'm a Test Automation engineer working in a big product company. The company's big monolithic project is being divided, and parts of it are departing into the cloud. As part of this redesign, Test Automation projects should also become cloudy. Our typical TA project is based on Groovy, Selenium, TestNG, and Maven. Now I want to try putting the whole TA Maven project into a Docker image/container. It works well, but on the very first run it starts downloading dependencies into the local .m2 repository. I want to speed this up and have that task done at the image creation stage.
Here is my Dockerfile:
FROM maven:3.3-jdk-8
LABEL description="Embedded portal-web-testing"
MAINTAINER NNN
COPY ./settings.xml /root/.m2/
COPY ./acceptance-tests ./acceptance-tests
WORKDIR acceptance-tests
RUN mvn dependency:go-offline --debug >log
RUN ls /root/.m2/
#RUN mvn test
ENTRYPOINT ["bash"]
And here is the log:
Step 1 : FROM maven:3.3-jdk-8
---> 7addddbdd730
Step 2 : LABEL description "Embedded portal-web-testing"
---> Running in 1d195ccb9c57
---> f5372b024ca0
Removing intermediate container 1d195ccb9c57
Step 3 : MAINTAINER NNN
---> Running in 03ebbffda680
---> cb12da3d8ec6
Removing intermediate container 03ebbffda680
Step 4 : COPY ./settings.xml /root/.m2/
---> 164999e1f63a
Removing intermediate container 1e1778d2533b
Step 5 : COPY ./acceptance-tests ./acceptance-tests
---> 7d93fff4193e
Removing intermediate container a5d04eb30591
Step 6 : WORKDIR acceptance-tests
---> Running in f15111475fc6
---> beb4d090362b
Removing intermediate container f15111475fc6
Step 7 : RUN mvn dependency:go-offline --debug >log
---> Running in 2c09f1869143
---> 62326c2bb073
Removing intermediate container 2c09f1869143
Step 8 : RUN ls /root/.m2/
---> Running in 91b602f529da
settings.xml
---> b7bc32199ab3
Removing intermediate container 91b602f529da
Step 9 : ENTRYPOINT bash
---> Running in 3167f5a6d923
---> 94b3e0b146da
Removing intermediate container 3167f5a6d923
Successfully built 94b3e0b146da
On Step 7 files are surely being downloaded, but it looks like they are not stored.
The following console command shows that there are no updates in the local .m2 folder:
root@37f5a0d04232:/acceptance-tests# ls /root/.m2
settings.xml
If I run the same command again from the command line inside a container (once the image is created and the container has started):
root@37f5a0d04232:/acceptance-tests# mvn dependency:go-offline
Massive downloads start and the repository folder finally appears under .m2:
root@37f5a0d04232:/acceptance-tests# ls /root/.m2
repository settings.xml
I struggle to understand why the changes caused by the Maven command in the Dockerfile were not stored as a Docker layer.
I am using Docker 1.12 and Maven 3.3.3.
/root/.m2 is declared as a volume in the maven image; that is why anything written to it during the build gets cleared when a container is launched. This can be avoided by caching the content in a custom directory and then copying it to /root/.m2 when the container is launched.
Fortunately, the maven image comes pre-baked with all the copying logic, so you just have to point at the reference repository:
RUN mvn -B -f /tmp/pom.xml -s /usr/share/maven/ref/settings-docker.xml dependency:resolve
The entrypoint will take care of setting up the local repository for you. It helped me; I hope it helps you as well.
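Applied to the Dockerfile from the question, a minimal sketch might look like this. It assumes the settings-docker.xml that ships in the stock maven image, which points the local repository at /usr/share/maven/ref so that downloads are baked into the image instead of the /root/.m2 volume (adapt it if your settings.xml defines custom mirrors):
FROM maven:3.3-jdk-8
LABEL description="Embedded portal-web-testing"
COPY ./settings.xml /root/.m2/
COPY ./acceptance-tests ./acceptance-tests
WORKDIR acceptance-tests
# Dependencies land in /usr/share/maven/ref/repository, which survives in
# the image; the entrypoint copies it into /root/.m2 at container start.
RUN mvn -B -s /usr/share/maven/ref/settings-docker.xml dependency:go-offline
ENTRYPOINT ["bash"]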
