How are intermediate containers formed? - docker

I would like to understand the execution steps involved in building Docker images from a Dockerfile. I have listed a couple of questions below. Please help me understand the build process.
Dockerfile content
#from base image
FROM ubuntu:14.04
#author name
MAINTAINER RAGHU
#commands to run in the container
RUN echo "hello Raghu"
RUN sleep 10
RUN echo "TASK COMPLETED"
Command used to build the image: docker build -t raghavendar/hands-on:2.0 .
Sending build context to Docker daemon 20.04 MB
Step 1 : FROM ubuntu:14.04
---> b1719e1db756
Step 2 : MAINTAINER RAGHU
---> Running in 532ed79e6d55
---> ea6184bb8ef5
Removing intermediate container 532ed79e6d55
Step 3 : RUN echo "hello Raghu"
---> Running in da327c9b871a
hello Raghu
---> f02ff92252e2
Removing intermediate container da327c9b871a
Step 4 : RUN sleep 10
---> Running in aa58dea59595
---> fe9e9648e969
Removing intermediate container aa58dea59595
Step 5 : RUN echo "TASK COMPLETED"
---> Running in 612adda45c52
TASK COMPLETED
---> 86c73954ea96
Removing intermediate container 612adda45c52
Successfully built 86c73954ea96
In step 2 :
Step 2 : MAINTAINER RAGHU
---> Running in 532ed79e6d55
Question 1: This indicates that the step is running in the container with ID 532ed79e6d55, but from which Docker image is this container created?
---> ea6184bb8ef5
Question 2: What is this ID? Is it an image or a container?
Removing intermediate container 532ed79e6d55
Question 3: Is the final image formed from the multiple layers saved from the intermediate containers?

Yes, Docker images are layered. When you build a new image, Docker does the following for each instruction (RUN, COPY, etc.) in your Dockerfile:
create a temporary container from the previous image layer (or from the base FROM image for the first instruction);
run the Dockerfile instruction in that temporary "intermediate" container;
save the temporary container as a new image layer.
The final image layer is tagged with whatever name you give the image. This becomes clear if you run docker history raghavendar/hands-on:2.0: you'll see each layer and an abbreviation of the instruction that created it.
Your specific queries:
1) 532ed79e6d55 is a temporary container created from image b1719e1db756, which is your FROM image, ubuntu:14.04.
2) ea6184bb8ef5 is the image layer created as the output of the instruction, i.e. by saving intermediate container 532ed79e6d55.
3) Yes. Docker stacks the layers using a union file system, and because layers are shared between images, this is the main reason images are so efficient.
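To look at the layers yourself, here is a minimal sketch using the image tag from the question. It assumes that image exists locally; the IMAGE environment variable is only an illustrative way to substitute another tag.

```shell
# Image tag taken from the question; set IMAGE to inspect a different one.
image="${IMAGE:-raghavendar/hands-on:2.0}"

if command -v docker >/dev/null 2>&1; then
    # One row per Dockerfile instruction, newest first, with the size each added.
    docker history "$image" \
        || echo "could not read history for $image" >&2

    # Content digests of the filesystem layers that make up the image.
    docker image inspect --format '{{json .RootFS.Layers}}' "$image" \
        || echo "could not inspect $image" >&2
else
    echo "docker CLI not found; nothing to inspect" >&2
fi
```

Instructions that only change metadata, such as MAINTAINER, should show up in the history with a size of 0B, since they add no files.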

Related

RUN Instruction in building Docker Image

Below is my Docker file
ARG tag_info=latest
FROM alpine:${tag_info} AS my_apline
ARG message='Hello World'
RUN echo ${message}
RUN echo ${message} > myfile.txt
RUN echo "Hi Harry"
When I run docker image build -t myimage:v1_0 - < Dockerfile
the output is:
Sending build context to Docker daemon 2.048kB
Step 1/6 : ARG tag_info=latest
Step 2/6 : FROM alpine:${tag_info} AS my_apline
latest: Pulling from library/alpine
cbdbe7a5bc2a: Already exists
Digest: sha256:9a839e63dad54c3a6d1834e29692c8492d93f90c59c978c1ed79109ea4fb9a54
Status: Downloaded newer image for alpine:latest
---> f70734b6a266
Step 3/6 : ARG message='Hello World'
---> Running in 74bcc8897e8e
Removing intermediate container 74bcc8897e8e
---> d8d50432d375
Step 4/6 : RUN echo ${message}
---> Running in 730ed8e1c1d3
Hello World
Removing intermediate container 730ed8e1c1d3
---> 8417e3167e80
Step 5/6 : RUN echo ${message} > myfile.txt
---> Running in c66018331383
Removing intermediate container c66018331383
---> 07dc27d8ad3d
Step 6/6 : RUN echo "Hi Harry"
---> Running in fb92fb234e42
Hi Harry
Removing intermediate container fb92fb234e42
---> a3bec122a77f
It displays "Hello World" and "Hi Harry" in the middle of the build output, and I do not understand why.
Why are "Hi Harry" and "Hello World" not displayed when I start a container from the image?
Because the RUN instruction executes its command while the image is being built, not when a container is started from the image. It is used to alter the image, for example by installing packages with apt-get or changing file permissions.
If you need something to run when the container starts, use the CMD or ENTRYPOINT instructions.
From the official Docker documentation:
The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. The resulting committed image will be used for the next step in the Dockerfile.
You should use CMD if you want to obtain the described behavior.
The main purpose of a CMD is to provide defaults for an executing container. These defaults can include an executable, or they can omit the executable, in which case you must specify an ENTRYPOINT instruction as well.
The CMD instruction has three forms:
- CMD ["executable","param1","param2"] (exec form, this is the preferred form)
- CMD ["param1","param2"] (as default parameters to ENTRYPOINT)
- CMD command param1 param2 (shell form)
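A minimal sketch of the difference between RUN and CMD. The tag run-vs-cmd-demo and the temporary build directory are illustrative names, not part of the question.

```shell
# Throwaway build context; the directory and the tag below are illustrative.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM alpine:latest
# Executed once, while the image is being built.
RUN echo "printed at build time"
# Stored in the image and executed each time a container starts.
CMD ["echo", "printed at run time"]
EOF

if command -v docker >/dev/null 2>&1; then
    # The RUN output belongs to the build log; the CMD output to docker run.
    docker build -t run-vs-cmd-demo "$workdir" \
        && docker run --rm run-vs-cmd-demo \
        || echo "docker build or run failed (is the daemon running?)" >&2
fi
```

With the BuildKit builder the RUN output may be collapsed in the progress display; passing --progress=plain to docker build should show it in full.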

Getting reproducible docker layers on different hosts

Problem: I can't reproduce Docker layers from exactly the same content (on one machine, or in a CI cluster where the image is built from a git repo).
Consider this simple example:
$ echo "test file" > test.txt
$ cat > Dockerfile <<EOF
FROM alpine:3.8
COPY test.txt /test.txt
EOF
If I build the image on one machine with caching enabled, the last layer, which contains the copied file, is shared across images:
$ docker build -t test:1 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
3.8: Pulling from library/alpine
cd784148e348: Already exists
Digest: sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1
Status: Downloaded newer image for alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:1
$ docker build -t test:2 .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> Using cache
---> decab6a3fbe3
Successfully built decab6a3fbe3
Successfully tagged test:2
But with the cache disabled (or simply on another machine) I get different hash values.
$ docker build -t test:3 --no-cache .
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM alpine:3.8
---> 3f53bb00af94
Step 2/2 : COPY test.txt /test.txt
---> ced4dff22d62
Successfully built ced4dff22d62
Successfully tagged test:3
At the same time, the history command shows that the file content is the same:
$ docker history test:1
IMAGE CREATED CREATED BY SIZE COMMENT
decab6a3fbe3 6 minutes ago /bin/sh -c #(nop) COPY file:d9210c40895e
$ docker history test:3
IMAGE CREATED CREATED BY SIZE COMMENT
ced4dff22d62 27 seconds ago /bin/sh -c #(nop) COPY file:d9210c40895e
Am I missing something, or is this behavior by design?
Are there any techniques to get reproducible/reusable layers that do not force me to do one of the following?
Share the Docker cache across machines
Do a pull of the "previous" image before building the next one
Ultimately, this problem prevents me from getting thin layers for constantly changing app code while keeping my dependencies in a separate, infrequently changed layer.
After some extra googling, I found a great post describing a solution to this problem.
Starting from version 1.13, Docker has a --cache-from option that tells Docker to look at other images for reusable layers. One important point: the image must be pulled explicitly for this to work, and you still need to specify which image to use. It could be latest or any other "rolling" image you have.
Given that, there is unfortunately no way to produce the same layer in "isolation", but --cache-from solves the root problem: how to eventually reuse layers during a CI build.
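A rough sketch of how this might look in a CI script. The function name ci_build and the registry name in the usage note are hypothetical, and the sketch assumes the classic builder described above (BuildKit handles cache sources differently).

```shell
# ci_build IMAGE [CONTEXT]
# IMAGE is a hypothetical "rolling" tag such as registry.example.com/app:latest.
ci_build() {
    image="$1"
    context="${2:-.}"

    # The classic builder only reuses layers of images that are present
    # locally, so pull first; a failed pull (e.g. the first build) is not fatal.
    docker pull "$image" || true

    # Let the builder consider the pulled image's layers as cache candidates.
    docker build --cache-from "$image" -t "$image" "$context" || return 1

    # Publish the result so the next CI run can use it as its cache source.
    docker push "$image"
}
```

A CI job would then call something like ci_build registry.example.com/app:latest . from the repository root.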

Dockerfile: mkdir and COPY commands run fine but I can't see the directory and file

I am using the jenkins image to create a Docker container. For now I am just trying to create a new directory and copy a couple of files. The image build runs fine, but when I start the container I cannot see the files or the directory.
Here is my dockerfile
FROM jenkins:2.46.1
MAINTAINER MandeepSinghGulati
USER jenkins
RUN mkdir /var/jenkins_home/aws
COPY aws/config /var/jenkins_home/aws/
COPY aws/credentials /var/jenkins_home/aws/
I found a similar question here but it seems different because I am not creating the jenkins user. It already exists with home directory /var/jenkins_home/. Not sure what I am doing wrong
Here is how I am building my image and starting the container:
➜ jenkins_test docker build -t "test" .
Sending build context to Docker daemon 5.632 kB
Step 1/6 : FROM jenkins:2.46.1
---> 04c1dd56a3d8
Step 2/6 : MAINTAINER MandeepSinghGulati
---> Using cache
---> 7f76c0f7fc2d
Step 3/6 : USER jenkins
---> Running in 5dcbf4ef9f82
---> 6a64edc2d2cb
Removing intermediate container 5dcbf4ef9f82
Step 4/6 : RUN mkdir /var/jenkins_home/aws
---> Running in 1eb86a351beb
---> b42587697aec
Removing intermediate container 1eb86a351beb
Step 5/6 : COPY aws/config /var/jenkins_home/aws/
---> a9d9a28fd777
Removing intermediate container ca4a708edc6e
Step 6/6 : COPY aws/credentials /var/jenkins_home/aws/
---> 9f9ee5a603a1
Removing intermediate container 592ad0031f49
Successfully built 9f9ee5a603a1
➜ jenkins_test docker run -it -v $HOME/jenkins:/var/jenkins_home -p 8080:8080 --name=test-container test
If I run the command without the volume mount, I can see the copied files and the directory. With the volume mount, however, I cannot see them, even if I empty the directory on the host machine. Is this the expected behaviour? How can I copy files into the directory that is used as a volume?
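The -v $HOME/jenkins:/var/jenkins_home option bind-mounts a host directory over /var/jenkins_home, so the files baked into the image at that path are hidden for as long as the mount is in place. A named volume behaves differently: when it is empty, Docker populates it with the image's content at the mount path the first time it is used. Existing volumes can be mounted with
docker container run -v MY-VOLUME:/var/jenkins_home ...
Furthermore, the documentation of COPY states:
All new files and directories are created with a UID and GID of 0.
So COPY does not honor your USER instruction. This seems to be the second part of your problem.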
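To give the copied files the intended owner, newer Docker versions accept a --chown flag on COPY. Below is a sketch of the question's Dockerfile with that flag, written to a temporary directory. It assumes Docker 17.09 or later and that a jenkins group exists in the base image (the question only confirms the user); it has not been tested against that image.

```shell
# Temporary directory for the illustrative Dockerfile.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM jenkins:2.46.1
USER jenkins
RUN mkdir /var/jenkins_home/aws
# --chown sets the owner explicitly, because COPY ignores the USER instruction.
# The group name "jenkins" is an assumption about the base image.
COPY --chown=jenkins:jenkins aws/config /var/jenkins_home/aws/
COPY --chown=jenkins:jenkins aws/credentials /var/jenkins_home/aws/
EOF

# Show the resulting Dockerfile.
cat "$workdir/Dockerfile"
```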

Understanding docker build

When I run a docker build command, I see the following:
[root#hadoop01 myjavadir]# docker build -t runhelloworld .
Sending build context to Docker daemon 4.096 kB
Sending build context to Docker daemon
Step 0 : FROM java
---> 3323938eb5a2
Step 1 : MAINTAINER priyanka priyanka.patil#subex.com
---> Running in 89fa73dbc2b8
---> 827afdfa3d71
Removing intermediate container 89fa73dbc2b8
Step 2 : COPY ./HelloWorld.java .
---> 9e547d78d08c
Removing intermediate container ff5b7c7a8122
Step 3 : RUN javac HelloWorld.java
---> Running in d52f3093d6a3
---> 86121aadfc67
Removing intermediate container d52f3093d6a3
Step 4 : CMD java HelloWorld
---> Running in 7b4fa1b8ed37
---> 6eadaac27986
Removing intermediate container 7b4fa1b8ed37
Successfully built 6eadaac27986
Want to understand the meaning of these container ids like 7b4fa1b8ed37.
What does it mean when the daemon says "Removing intermediate container d52f3093d6a3"?
The docker build process automates what is described in the "Creating your own images" section of the Docker docs.
In your case above:
The image ID we start with is 3323938eb5a2 (the ID of the java image).
From that image we run a container (once created, it has the container ID 89fa73dbc2b8) to set the MAINTAINER metadata; Docker commits the changes, and the resulting layer ID is 827afdfa3d71.
Because we're finished with container 89fa73dbc2b8, we can remove it.
From the layer created by the MAINTAINER line, we create a new container (ID ff5b7c7a8122) to run the instruction COPY ./HelloWorld.java . and Docker commits the changes; the resulting layer ID is 9e547d78d08c.
Because we're finished with container ff5b7c7a8122, we can remove it.
Repeat for steps 3 and 4.
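The container-then-commit cycle described above can be reproduced by hand with docker create, docker start, docker commit and docker rm. This is a rough sketch, not the builder's actual implementation, and manual_step is a hypothetical helper name.

```shell
# manual_step BASE_IMAGE COMMAND...
# Prints the ID of the image produced by running COMMAND on top of BASE_IMAGE.
manual_step() {
    base_image="$1"
    shift

    # 1. Create a temporary container from the previous image.
    cid="$(docker create "$base_image" /bin/sh -c "$*")" || return 1

    # 2. Run the command inside it. Its output is sent to stderr so that
    #    stdout carries only the resulting image ID.
    docker start -a "$cid" >&2 || { docker rm "$cid" >/dev/null; return 1; }

    # 3. Commit the container's filesystem changes as a new image.
    new_image="$(docker commit "$cid")"
    status=$?

    # 4. Remove the intermediate container; the committed image stays.
    docker rm "$cid" >/dev/null
    [ "$status" -eq 0 ] && echo "$new_image"
}
```

For example, step1=$(manual_step debian touch /x) followed by manual_step "$step1" touch /y chains two steps, each starting from the image the previous one produced. Unlike a real build, docker commit also records the executed command as the new image's default command.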

Does docker build --rm=true affect caching adversely?

docker build --rm=true
This is the default option, which makes Docker delete all intermediate images after a successful build.
Does it affect caching adversely? I think the cache relies on the intermediate images.
Why not try it and find out?
$ cat Dockerfile
FROM debian
RUN touch /x
RUN touch /y
$ docker build --rm .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon
Step 0 : FROM debian
---> df2a0347c9d0
Step 1 : RUN touch /x
---> Running in 2e5ff13506e5
---> fd4dd6845e31
Removing intermediate container 2e5ff13506e5
Step 2 : RUN touch /y
---> Running in b2a585989fa5
---> 0093f530941b
Removing intermediate container b2a585989fa5
Successfully built 0093f530941b
$ docker build --rm .
Sending build context to Docker daemon 2.048 kB
Sending build context to Docker daemon
Step 0 : FROM debian
---> df2a0347c9d0
Step 1 : RUN touch /x
---> Using cache
---> fd4dd6845e31
Step 2 : RUN touch /y
---> Using cache
---> 0093f530941b
Successfully built 0093f530941b
So no, the cache still works. As you pointed out, --rm is on by default (you would have to pass --rm=false to turn it off), but it refers to the intermediate containers, not the intermediate images. These are the containers in which Docker ran your build commands to create the images. In some cases you might want to keep those containers around for debugging, but normally the images are enough. In the above output, we can see the containers 2e5ff13506e5 and b2a585989fa5, which are deleted, and the images fd4dd6845e31 and 0093f530941b, which are kept.
You can't delete the intermediate images as they are needed by the final image (an image is the last layer plus all ancestor layers).
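To see the difference for yourself, here is a small sketch that keeps the intermediate containers and then rebuilds. It assumes the classic builder shown in the output above; under BuildKit no intermediate containers are created, so --rm has no visible effect there.

```shell
# Throwaway build context with the Dockerfile from the example above.
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM debian
RUN touch /x
RUN touch /y
EOF

if command -v docker >/dev/null 2>&1; then
    # 1. Build without the cache and keep the intermediate containers.
    # 2. List all containers; the ones left behind by the RUN steps show up.
    # 3. Build again; both RUN steps should be served from the cache,
    #    because the cache lives in the images, not in the containers.
    docker build --rm=false --no-cache "$workdir" \
        && docker ps -a \
        && docker build "$workdir" \
        || echo "a docker step failed (is the daemon running?)" >&2
fi
```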
