Creating a Dockerfile - docker starts from scratch on each new build - docker

I am trying to build a dockerfile - iteratively adding lines and testing. My understanding was that docker will cache the lines that have already been built and start from the new lines I had added. The case seems to be that it just builds from scratch each time I call build on my container. Is this normal? If not - what am I doing wrong?

As demas said, if you're simply appending lines, the previous lines will be cached.
However, if anywhere in your Dockerfile you have a line like
ADD . /some/path
then Docker will assume that that line has changed even if it was only the Dockerfile that changed. So that line and anything after it will never be cached, unless nothing in the folder you're adding has changed.
You should be able to see whether this is happening by paying close attention to the output of the docker build command.
As a side note: a consequence of this is that if you're building a Dockerfile, you generally want to add the files in the directory as late as possible, doing any preparations beforehand. Of course, you will end up having to do things to your files (like some kind of build process) which is unfortunately hard to cache.

If I understood correctly you can look at Docker as version control system where each line in your Dockerfile is a commit to the container.
If you add new line to your Dockerfile, Docker get the last revision of container and make a new commit. If you add line on the middle of your Dockerfile, Docker get one of the previous revisions and make new commit to this part of the tree.

Related

New Keycloak image with own changes crashes directly

I have a weird problem with Docker and hope someone here can help me :)
I want to create a keycloak image that is derived from the image jboss/keycloak. The idea is that in the Dockerfile also a preconfigured standalone.xml is copied into the image and keycloak can start directly without manual work.
But as soon as I write for example a
"CMD touch /opt/test.txt"
into the file the container crashes with the message "12:02:14,290 INFO [org.jboss.modules] (main) JBoss Modules version 1.9.1.Final
WFLYSRV0073: Invalid option '/bin/sh'"
This is just a new file with no purpose, the changes to the .xml are not in there yet.
As soon as I put only the FROM back in and rebuild everything works again.
I thought through the layers in the container you could mod an image, but here it doesn't seem to work. Can someone tell me why ?
So far it has always worked with the alpine image, but I don't want to build the whole keycloak setup again myself, when there is already an official image for it.
This is basically what I had in mind:
FROM jboss/keycloak:X.XX
CMD rm /opt/jboss/keycloak/standalone/configuration/standalone.xml
COPY ./keycloak/standalone.xml /opt/jboss/keycloak/standalone/configuration/
Thanks for help :)
Change
CMD rm
to
RUN rm
RUN is part of building. every RUN command is executed while your image is built.
With CMD you define (or override) the default command when running/starting a container based on your image (and you don't want to change keycloaks default CMD)

Is there a way to force a particular line in my Dockerfile to always rebuild, whilst still benefiting from caching on the preceding layers? [duplicate]

This question already has answers here:
Disable cache for specific RUN commands
(9 answers)
Closed 1 year ago.
I frequently seem to have to write Dockerfiles like this (line numbers added for clarity):
1. FROM somebase
2. RUN cp /some/local/stuff /some/docker/container/path
3. RUN some-other-local-commands
4. RUN wget http://some.remote.server/some.remote.path.for.example.json
5. RUN some-other-local-commands-which-may-depend-on-the-json
On line (4), I'm fetching a remote resource. Let's assume for now that's a JSON file. It might change from time-to-time, maybe not on every build, but perhaps every few hours or days.
What this means is that every time I build my container, I want to ensure the freshest JSON file is fetched. One way to force this is to add the --no-cache parameter to my docker build command, but this forces all of the lines/layers to rebuild, including (1)-(3), where that is likely not necessary. Is there a pattern or technique to automatically 'taint' or 'mark' line (4) so that Docker knows it always has to re-run the wget (presumably this would also have to force a rebuild of line 5), whilst still getting the layer caching behaviour for lines (1)-(3) when Docker detects the pre-req files haven't changed?
If the specific thing you're trying to trigger rebuilds is the result of RUN wget ... a specific URL, Docker does actually have native support for this.
There are two similar commands to copy files into a container. COPY only copies files from the build context. ADD can also fetch external URLs and unpack local archives (but not both at the same time). The general recommendation is to use COPY, unless you need one of the specific things ADD does differently.
So you should be able to say
ADD http://some.remote.server/some.remote.path.for.example.json .
RUN some-other-local-commands-which-may-depend-on-the-json
and the RUN command will use the Docker layer cache based on the contents of the fetched file.
If this approach doesn't work for you (maybe you need special authentication to fetch the file) you can also fetch the file outside of Docker before you run docker build, and then COPY it in. Again, it will work like any other file you COPY in, and layer caching will take effect based on whether the file has changed or not.

Docker image layer: What does `ADD file:<some_hash> in /` mean?

In Docker Hub images there are lists of commands that being run for each image layer. Here is a golang example.
Some applications also provide their Dockerfile in GitHub. Here is a golang example.
According to the Docker Hub image layer, ADD file:4b03b5f551e3fbdf47ec609712007327828f7530cc3455c43bbcdcaf449a75a9 in / is the first command. The image layer doesn't have any "FROM" command included, and it doesn't seem to be suffice the ADD definition too.
So here are the questions:
What does ADD file:<HASH> in / means? What is this format?
Is there any way I could trace upwards using the hash? I suppose that hash represents the FROM image, but it seems there are no API for that.
Why it is not possible to build a dockerfile using the ADD file:<HASH> in / syntax? Is there any way I could build an image using such syntax, OR do a conversion between two format?
That Docker Hub history view doesn't show the actual Dockerfile; instead, it shows content essentially extracted from the docker history of the image. That doesn't preserve the specific details you're looking for: it doesn't remember the names of base images, or the build-context file names of things that get ADDed or COPYed in.
Chasing through GitHub and Docker Hub links, the golang:*-buster Dockerfile is built FROM buildpack-deps:...-scm; buildpack-deps:buster-scm is FROM buildpack-deps:buster-curl; that is FROM debian:buster; and that has a very simple Dockerfile (quoted here in its entirety):
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
FROM scratch starts from a completely totally empty image; that is the base of the Docker image tree (and what tells docker history and similar tools to stop). The ADD line unpacks a tar file of a Debian system image.
If you look at docker history or the Docker Hub history view you cite, you should be able to see these same steps happening. The ADD file:4b0... in / corresponds to the ADD rootfs.tar.gz /, and the second line is the CMD ["bash"]. It is not split up by Dockerfile or image, and the original filenames from ADD aren't saved. (You couldn't reproduce the image anyways without the contents of the rootfs.tar.gz, so it's merely slightly helpful to know its filename but not essential.)
The ADD file:hash in /path syntax is not standard Dockerfile syntax (the word in in particular is not part of it). I'm not sure there's a reliable way to translate from the host file or URL to the hash, but building the image and looking at its docker history would tell you (assuming you've got a perfect match for the file metadata). There's no way to get back to the original filename or syntax, and definitely no way to get back to the file contents.
ADD or COPY means that files are append to the images.
That are files, you cannot "trace" them.
You cannot just copy the commands, because the hashes are not the original files. See https://forums.docker.com/t/how-to-extract-file-from-image/96987 to get the file.

What does "From image" do in dockerfiles

I see that dockerfiles usually have a line beginning with "from" keywork, for example:
FROM composer/composer:1.1-alpine AS composer
As far as I know, dockerfiles are a set of commands that help to build and run many containers in docker.
As the example above, docker uses a image named composer/composer:1.1-alpine from docker hub. The As composer just make an alias, so we can use it more convenient.
When I looked for the image, I found the link enter link description here and then enter link description here.
The thing I dont really understand is that:
I guess docker will use the image to build something, but how exactly does it use the image? Does docker run the image or just prepare to use it when in need. Sometimes I dont see the dockerfiles use the image in following line (like this example, there are no lines using the keyword "composer" except the first line). It makes me confused.
Any help would be appreciated.
Thanks.
DockerFiles describes layers: Each command creates it's own layer. For example:
RUN touch test.txt
RUN cp test.txt foo.txt
would create two layers - the first one with the file test.txt, the second one without test.txt but with foo.txt
Each layer adds something to a container. When we walk the layers "up" we find that the very first layer is the empty layer, e.g. it contains only the linux (or windows) kernel itself - nothing else. But that's not really useful - we need a lot of tools (e.g. bash) to be able to run an app. So common base images like alpine add suc tools and core os functions.
It would be annoying as hell if we had to do this setup in every container so there a lots of base images, which do exactly this kind of setup.
If you want to see what a base image does, just search the name on hub.docker.com - there you will find the Dockerfile, describing the build process.
Aditionally, containers can be extenend, e.g. you use the elasticsearch container as a base image, and add your own functionality - that's the second use case for base images.
For your second question: You have to decide if you have to replicate the steps in your base image or not. If you inherit from a general OS image like apline - probably not, since linux normally ships with these tools. If you inherit from a more specialized container, it depends - if your machine matches the environment in the container, you don't need to, but if not you will have to apply these steps to your machine, too. E.g. if you don't have elasticsearch installed, you have to install it.
As for multiple froms in one Dockerfile: Please look up the documentation for Multi Stage builds. Essentially, they encapsulate multiple containers in a single dockerfile. Which can be very useful if you need a different set to build an app and to run the app. The first Container is responsible to build your app, while the second one takes the compiled source code and just runs it.
Watch for COPY --from= lines, these are copying files from one container to another.
The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image – it is especially easy to start by pulling an image from the Public Repositories.
FROM can appear multiple times within a single Dockerfile to create multiple images or use one build stage as a dependency for another. Simply make a note of the last image ID output by the commit before each new FROM instruction. Each FROM instruction clears any state created by previous instructions.
Optionally a name can be given to a new build stage by adding AS name to the FROM instruction. The name can be used in subsequent FROM and COPY --from= instructions to refer to the image built in this stage.
The tag or digest values are optional. If you omit either of them, the builder assumes a latest tag by default. The builder returns an error if it cannot find the tag value.
Taken from : https://docs.docker.com/engine/reference/builder/#from

Re-running Docker only until a certain step using caches?

Is there any option to force Docker to run a build without using caches from that step on?
A particular usecase is something like this:
...
ADD some.cfg some.cfg
RUN do something with some.cfg
While working on the Dockerfile it is often necessary to adjust configurations and test them.
From the Docker point of view the steps remain unmodified however from my point as a Dockerfile
write I want Docker to run the build using caches until the ADD operation and from that point on
without caches. Is this possible?
As Mykola suggests, Docker will take a checksum of the files and should invalidate the cache if the content changes. However, you can always force cache invalidation at a given line in a Dockerfile by setting or changing an environment variable at that point e.g:
...
ENV updated-adds-on 14-DEC-14
ADD...
According to this, you don't have to disable cache on the added file change as docker will examine the contents and skip cache on change. Otherwise
you can use the --no-cache=true option on the docker build command.

Resources