I have a Dockerfile in a directory on my local computer. That directory contains files and folders that I need on my cloud server. My understanding is that COPY will make those files and folders part of the Docker image, inside a newly created /app folder. When I build the image with docker build -t app:2023.01.25 . it builds without issues. Then I tag it and push it to Google Cloud's Container Registry with docker tag app:2023.01.25 gcr.io/app/app:2023.01.25 followed by docker push gcr.io/app/app:2023.01.25. I then spawn a new server with Google's Python SDK using the new image, and I can verify that the image was used for the server. But when I SSH into the server, I can't find any of the files. Am I misunderstanding Docker? This is my first time using Docker images.
# Dockerfile
FROM intel/oneapi-hpckit:devel-ubuntu20.04
COPY . /app
WORKDIR /app
RUN python3 -m pip install -r requirements.txt
Windows cmd:
docker build -t app:2023.01.25 .
docker tag app:2023.01.25 gcr.io/app/app:2023.01.25
docker push gcr.io/app/app:2023.01.25
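As a sanity check, one way to verify that the files actually made it into the image (assuming the tags above) is to run it locally and list /app:
docker run --rm app:2023.01.25 ls /app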
I have the following images in my Docker registry:
Let's assume that the file-finder image is derived from ls_files. If so, can I tell that file-finder shares 996 MB of disk storage with the ls_files image and has only 58.72 MB of its own storage?
No, your assumption is incorrect.
I think your Dockerfile probably looks like this:
FROM ls_files
RUN # Your commands, etc.
Then you run:
docker build -t file-finder:1.0.0 .
Now the image file-finder is a complete, stand-alone image. You can remove ls_files with no issue, since the contents of ls_files are now included in the file-finder image.
When you build an image on top of another, the base image then has nothing to do with the new image and you can remove the base.
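If you want to see exactly which layers each image consists of, you can compare their layer digests (the same technique appears further down in this thread), assuming the tags above:
docker inspect -f '{{json .RootFS.Layers}}' ls_files
docker inspect -f '{{json .RootFS.Layers}}' file-finder:1.0.0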
Example
FROM alpine:latest
RUN apk add nginx
ENTRYPOINT ["nginx", "-g", "daemon off;"]
Let us run:
docker build -t my_nginx:1 .
Now let us remove the alpine:latest image.
docker image rm alpine:latest
Now let's run the my_nginx:1 image and you should see no error.
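For example (the port mapping is just an assumption for this sketch):
docker run --rm -p 8080:80 my_nginx:1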
Let's say I created a docker image using a command like
docker build -t myimage .
In the current directory where I built the image, I have
ls
Dockerfile
myscript.py
Later, I made changes to ONLY the "myscript.py" file. How do I update the image without needing to rebuild?
I'm doing cross-platform testing (tooling, not kernel), so I have a custom image (used for ephemeral Jenkins slaves) for each OS, based on standard base images: centos6, centos7, ubuntu14, sles11, sles12, etc.
Aside for the base being different, my images have a lot in common with each other (all of them get a copy of pre-built and frequently changing maven/gradle/npm repositories for speed).
Here is a simplified example of the way the images are created (the tarball is the same across images):
# Dockerfile one
FROM centos:centos6
ADD some-files.tar.gz /
# Dockerfile two
FROM ubuntu:14.04
ADD some-files.tar.gz /
This results in large images (multi-GB) that have to be rebuilt regularly. Some layer reuse occurs between rebuilds thanks to the docker build cache, but if I can stop having to rebuild images altogether it would be better.
How can I reliably share the common contents among my images?
The images don't change much outside of these directories. This cannot be a simple mounted volume, because in use the directories in this layer are modified, so it cannot be read-only, and the source must not be changed (what I'm looking for is closer to copy-on-write, but applied to a specific subset of the image).
Problem with --cache-from:
The suggestion to use --cache-from will not work:
$ cat df.cache-from
FROM busybox
ARG UNIQUE_ARG=world
RUN echo Hello ${UNIQUE_ARG}
COPY . /files
$ docker build -t test-from-cache:1 -f df.cache-from --build-arg UNIQUE_ARG=docker .
Sending build context to Docker daemon 26.1MB
Step 1/4 : FROM busybox
---> 54511612f1c4
Step 2/4 : ARG UNIQUE_ARG=world
---> Running in f38f6e76bbca
Removing intermediate container f38f6e76bbca
---> fada1443b67b
Step 3/4 : RUN echo Hello ${UNIQUE_ARG}
---> Running in ee960473d88c
Hello docker
Removing intermediate container ee960473d88c
---> c29d98e09dd8
Step 4/4 : COPY . /files
---> edfa35e97e86
Successfully built edfa35e97e86
Successfully tagged test-from-cache:1
$ docker build -t test-from-cache:2 -f df.cache-from --build-arg UNIQUE_ARG=world --cache-from test-from-cache:1 .
Sending build context to Docker daemon 26.1MB
Step 1/4 : FROM busybox
---> 54511612f1c4
Step 2/4 : ARG UNIQUE_ARG=world
---> Using cache
---> fada1443b67b
Step 3/4 : RUN echo Hello ${UNIQUE_ARG}
---> Running in 22698cd872d3
Hello world
Removing intermediate container 22698cd872d3
---> dc5f801fc272
Step 4/4 : COPY . /files
---> addabd73e43e
Successfully built addabd73e43e
Successfully tagged test-from-cache:2
$ docker inspect test-from-cache:1 -f '{{json .RootFS.Layers}}' | jq .
[
"sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
"sha256:01bf0fcfc3f73c8a3cfbe9b7efd6c2bf8c6d21b6115d4a71344fa497c3808978"
]
$ docker inspect test-from-cache:2 -f '{{json .RootFS.Layers}}' | jq .
[
"sha256:6a749002dd6a65988a6696ca4d0c4cbe87145df74e3bf6feae4025ab28f420f2",
"sha256:c70c7fd4529ed9ee1b4a691897c2a2ae34b192963072d3f403ba632c33cba702"
]
The build output shows exactly where it stops using the cache: when the command changes. And the inspect output shows that the second layer id changed, even though the same COPY command was run in each build. Anytime the preceding layer differs, the cache cannot be reused from the other image build.
The --cache-from option is there to allow you to trust the build steps from an image pulled from a registry. By default, docker only trusts layers that were locally built. But the same rules apply even when you provide this option.
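So the intended workflow looks something like this, pulling a previously pushed image and using it as the cache source (the registry name is hypothetical):
docker pull registry.example.com/myapp:latest || true
docker build --cache-from registry.example.com/myapp:latest -t registry.example.com/myapp:latest .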
Option 1:
If you want to reuse the build cache, you must have the preceding layers identical in both images. You could try using a multi-stage build if the base image for each is small enough. However, doing this would lose all of the settings outside of the filesystem (environment variables, entrypoint specification, etc), so you'd need to recreate that as well:
ARG base_image
FROM ${base_image} as base
# the above from line makes the base image available for later copying
FROM scratch
COPY large-content /content
COPY --from=base / /
# recreate any environment variables, labels, entrypoint, cmd, or other settings here
And then build that with:
docker build --build-arg base_image=base1 -t image1 .
docker build --build-arg base_image=base2 -t image2 .
docker build --build-arg base_image=base3 -t image3 .
This could also be multiple Dockerfiles if you need to change other settings. This will result in the entire contents of each base image being copied, so make sure your base images are significantly smaller than the shared content to make this worth the effort.
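As a quick sanity check before committing to this approach, you can compare the sizes involved with docker image ls, e.g.:
docker image ls base1
docker image ls base2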
Option 2:
Reorder your build to keep common components at the top. I understand this won't work for you, but it may help others coming across this question later. It's the preferred and simplest solution that most people use.
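A minimal sketch of the idea, reusing the tarball from the question (the setup script name is made up):
FROM centos:centos6
# shared, rarely changing content first, so this layer stays cached across rebuilds
ADD some-files.tar.gz /
# image-specific, frequently changing steps last
COPY os-specific-setup.sh /tmp/
RUN /tmp/os-specific-setup.sh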
Option 3:
Remove the large content from your image and add it to your containers externally as a volume. You lose the immutability and copy-on-write features of layers in the docker filesystem. And you'll need to manually ship the volume content to each of your docker hosts (or use a network shared filesystem). I've seen solutions where a "sync container" is run on each of the docker hosts, performing a git pull, rsync, or equivalent command to keep the volume updated. If you can, consider mounting the volume with :ro at the end to make it read-only inside the container where you use it, which gives you back some immutability.
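A sketch of the volume variant (the host path and image name are hypothetical):
# ship /srv/shared-content to each docker host first (rsync, git pull, etc.), then:
docker run -d -v /srv/shared-content:/content:ro my-test-image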
Given that it sounds like this additional 4 GB of data is unrelated to the underlying container image, is there any way to mount that data outside of the container build/creation process? I know this creates an additional management step (getting the data everywhere you want the image), but assuming it can be a read-only shared mount (and then untarred by the image's main process into the container filesystem as needed), this might be easier than building it into every image.
It turns out that as of Docker 1.13, you can use the --cache-from OTHER_IMAGE flag (see the docs).
In this situation, the solution would look like this:
docker build -t image1 .
docker build -t image2 --cache-from image1 .
docker build -t image3 --cache-from image1 --cache-from image2 .
... and so on
This will ensure that any layer these images have in common is reused.
UPDATE: as mentioned in other answers, this doesn't do what I expected. I admit I still don't understand exactly what it does do, since it definitely changes the push behavior, but the layers are not ultimately reused.
The most reliable and Docker-idiomatic way to share common contents between different docker images is to refactor the commonalities into base images that the other images extend.
For example, if all the images build on top of a base image and install packages x, y, and z into it, refactor the installation of packages x, y, and z into a new base image that the downstream images build on top of.
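A minimal sketch of that refactoring, mirroring the Dockerfile style above (image names and packages are placeholders):
# Dockerfile.base, built once with: docker build -t my-base:1 -f Dockerfile.base .
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y x y z
# downstream Dockerfile
FROM my-base:1
# image-specific steps go here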
When I run the command:
docker run dockerinaction/hello_world
The first time, the following scenario plays out: Docker doesn't find the image locally, so it searches Docker Hub for it, downloads it, and runs it.
The dockerinaction/hello_world Dockerfile can be seen below:
FROM busybox:latest
CMD ["echo", "hello world"]
So from the wording:
Docker searches Docker Hub for the image
There are several things I'm curious about:
Is the image dockerinaction/hello_world?
Does this image reference another image named busybox:latest?
What about the Dockerfile? Is that on my machine somewhere?
Answers to each bulleted question, in corresponding order:
Yes, the image is dockerinaction/hello_world.
Yes, the image does reference busybox:latest, and builds upon it.
No, the Dockerfile is not stored on your machine. The docker run command is downloading a compressed version of the built Docker image that it found on Docker Hub. In some ways, you can think of the Dockerfile as the source code and the built image as the binary.
If you wanted to, you could write your own Dockerfile with the following contents:
FROM busybox:latest
CMD ["echo", "hello world"]
Then, in the directory containing that file (named Dockerfile), you could:
$ docker build -t my-hello-world:latest .
$ docker run my-hello-world:latest
The docker build command builds the Dockerfile, which in this case is stored on your machine. The built Docker image is tagged as my-hello-world:latest, and is only available on your machine (where it was built) unless you push it somewhere. You can run the built image from your machine by referring to the tag in the docker run command, as in the second line above.
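If you did want the image available elsewhere, you could push it, for example to Docker Hub (the username is a placeholder):
docker tag my-hello-world:latest yourname/my-hello-world:latest
docker push yourname/my-hello-world:latest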