Docker Caching, how does it really work?

I understand that Docker stores every image as layers. Suppose I have multiple users on one development server, and everyone builds from the same Dockerfile, but user1 stores the image as user1_myapp and user2 stores it as user2_myapp. Again, they are using the same Dockerfile.
The question is: if the image is, for example, 100 MB, do both images take 100 MB each, or do they share the same layers and use only 100 MB instead of 200 MB?

Yes, the two images will share the same layers if you meet the prerequisites. Docker layers are reused independently of the resulting image name. The requirements to use a cached layer instead of creating a new one are:
The build command needs to be run against the same docker host where the previous image's cache exists.
The previous layer ID must match between the cache layer and the running build step.
The command currently being run, or the source context if you are running a COPY or ADD, must be identical. Docker does not know if you are running a command that pulls from an external changing resource (e.g. git clone or apt-get update), which can result in a false cache hit.
You cannot have disabled caching in your build command.
Keep in mind that layers are immutable: once created they are never changed, just replaced with different layers with new IDs when you run a different build. When you run a container, it uses a copy-on-write read-write layer specific to that container, which allows multiple containers and images to point to the same image layers when they get a cache hit.
If you are having problems getting the cache to match in the two builds, e.g. importing a large file and something like the file timestamp doesn't match, consider creating an intermediate image that contains the common files. Then each project can build FROM that intermediate image.
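One way to see the sharing on a single Docker host is to compare the layer digests of the two tags; this is just a quick check, using the image names from the question:
# Build the same Dockerfile under two different names on the same host
docker build -t user1_myapp .
docker build -t user2_myapp .
# If the cache was hit, both tags report exactly the same layer digests
docker image inspect --format '{{json .RootFS.Layers}}' user1_myapp
docker image inspect --format '{{json .RootFS.Layers}}' user2_myapp
# docker system df -v also breaks image disk usage into shared vs. unique size
docker system df -v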

Related

Why do docker containers rely on uploading (large) images rather than building from the spec files?

Having needed several times in the last few days to upload a 1 GB image after some micro change, I can't help but wonder why there isn't a deploy path built into docker and related tech (e.g. k8s) to push just the application files (Dockerfile, docker-compose.yml and app-related code) and have it build out the infrastructure from within the (live) docker host?
In other words, why do I have to upload an entire linux machine whenever I change my app code?
Isn't the whole point of Docker that the configs describe a purely deterministic infrastructure output? I can't even see why one would need to upload the whole container image unless they make changes to it manually, outside of Dockerfile, and then wish to upload that modified image. But that seems like bad practice at the very least...
Am I missing something or this just a peculiarity of the system?
Good question.
Short answer:
Because storage is cheaper than processing power, and building images "live" might be complex, time-consuming and unpredictable.
On your Kubernetes cluster, for example, you just want to pull the "cached" layers of an image that you know works and run it, in seconds, instead of compiling binaries and downloading things (as you would specify in your Dockerfile).
About building images:
You don't have to build these images locally; you can use your CI/CD runners and run docker build and docker push from the pipelines that run when you push your code to a git repository.
Also, if the image is too big, you should look into ways of reducing its size: use multi-stage builds, use lighter/minimal base images, use fewer layers (for example, multiple RUN apt install lines can be grouped into one apt install command listing several packages), and use .dockerignore so you don't ship unnecessary files into the image. Finally, read more about caching in Docker builds, as it may reduce the size of the layers you have to push when making changes.
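As a rough sketch of the multi-stage idea (the Go application and paths here are hypothetical, not from the question), only the last stage ends up in the pushed image:
# Build stage: full toolchain, used only to compile
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# CGO disabled so the static binary runs on the minimal base image below
RUN CGO_ENABLED=0 go build -o /out/myapp ./cmd/myapp
# Final stage: only the compiled binary is shipped, not the toolchain
FROM alpine:3.19
COPY --from=build /out/myapp /usr/local/bin/myapp
ENTRYPOINT ["myapp"]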
Long answer:
Think of the Dockerfile as the source code, and the Image as the final binary. I know it's a classic example.
But just consider how long it would take to build/compile the binary every time you want to use it (either by running it, or importing it as a library in a different piece of software). Then consider how nondeterministic it would be to download that software's dependencies, or to compile them on different machines, every time you run it.
You can take for example Node.js's Dockerfile:
https://github.com/nodejs/docker-node/blob/main/16/alpine3.16/Dockerfile
Which is based on Alpine: https://github.com/alpinelinux/docker-alpine
You don't want your application to perform all the operations specified in these files (and their scripts) at runtime before actually starting, as that would be unpredictable, time-consuming, and more complex than it should be (for example, you'd need firewall exceptions for egress traffic from the cluster to the internet to download dependencies that may or may not still be available).
You would instead just ship an image based on the base image you tested and built your code to run on. That image is built and pushed to the registry, and then k8s runs it as a black box, which is predictable and deterministic.
Then, about your point on how annoying it is to push huge Docker images every time:
You can cut that size down by following some best practices and designing your Dockerfile well, for example:
Reduce your layers: for example, pass multiple arguments to a command whenever possible instead of re-running it multiple times.
Use multi-stage builds, so you only push the final image, not the stages you needed to compile and configure your application.
Avoid baking data into your images; you can pass it to the containers at runtime instead.
Order your layers so that the steps that change most often come last; then you won't have to rebuild untouched layers when making changes.
Don't include unnecessary files, and use .dockerignore.
And last but not least:
You don't have to push images from your machine; you can do it with CI/CD runners (for example the build-push GitHub Action), or you can use your cloud provider's "Cloud Build" products (like Cloud Build on GCP and AWS CodeBuild).
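Whichever CI system you pick, the runner side boils down to roughly the following; the registry URL and commit variable are placeholders:
# Typical steps a CI runner performs on every push to the repository
docker login registry.example.com
docker build -t registry.example.com/my-app:"$COMMIT_SHA" .
docker push registry.example.com/my-app:"$COMMIT_SHA"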

How to improve the automation of base image updates for running containers?

I want all running containers on my server to always use the latest version of an official base image, e.g. node:16.13, in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow, which has some limitations described below.
I have read the answers to this question but they either involve building or inspecting images on the target server which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check for changes in the node:16.13 image compared to when I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.13 with those of the first n = (number of layers of node:16.13) layers of my-app:1.0, as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.13 base image.
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server I have to keep previous versions of the my-app:1.0 images as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared it is not possible to detect a change in the base image where only the uppermost layers have been removed. However I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe, of downloading the image's manifest and then downloading layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, then it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause more redeploys if neither the base image nor the application changes.)
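A minimal sketch of that scheduled rebuild, using the image and registry names from elsewhere in this thread:
# Cron-driven rebuild: --pull refreshes the base image; if nothing changed,
# the build reuses the cache and keeps the same image ID
docker build --pull -t registry.example.com/my-app:1.0 .
# Pushing an unchanged image uploads nothing new, since the registry
# already has all of its layers
docker push registry.example.com/my-app:1.0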
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID as you have now works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5
docker rmi registry.example.com/my-app:1.0.4
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
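As a sketch of that Kubernetes update (the Deployment and container names are assumptions), changing the tag and watching the rollout looks like:
# Point the Deployment's container at the new tag and watch the rolling update
kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.5
kubectl rollout status deployment/my-app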

Why shouldn't our work inside the container modify the content of the container itself?

I am reading an article related to docker images and containers.
It says that a container is an instance of an image. Fair enough. It also says that whenever you make some changes to a container, you should create an image of it which can be used later.
But at the same time it says:
Your work inside a container shouldn't modify the container. Like previously mentioned, files that you need to save past the end of a container's life should be kept in a shared folder. Modifying the contents of a running container eliminates the benefits Docker provides. Because one container might be different from another, suddenly your guarantee that every container will work in every situation is gone.
What I want to know is: what is the problem with modifying a container's contents? Isn't this what containers are for, where we make our own changes and then create an image which will work every time? Even if we are talking about modifying the container's contents themselves and not just adding additional packages, how will it harm anything, since the image created from this container will also have these changes, and other containers created from that image will inherit those changes too?
Treat the container filesystem as ephemeral. You can modify it all you want, but when you delete it, the changes you have made are gone.
This is based on a union filesystem, the most popular/recommended being overlay2 in current releases. The overlay filesystem merges together multiple lower layers of the image with an upper layer of the container. Reads will be performed through those layers until a match is found, either in the container or in the image filesystem. Writes and deletes are only performed in the container layer.
So if you install packages, and make other changes, when the container is deleted and recreated from the same image, you are back to the original image state without any of your changes, including a new/empty container layer in the overlay filesystem.
From a software development workflow, you want to package and release your changes to the application binaries and dependencies as new images, and those images should be created with a Dockerfile. Persistent data should be stored in a volume. Configuration should be injected as either a file, environment variable, or CLI parameter. And temp files should ideally be written to a tmpfs unless those files are large. When done this way, it's even possible to make the root FS of a container read-only, eliminating a large portion of attacks that rely on injecting code to run inside of the container filesystem.
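Put together, a container run along those lines might look like this sketch (the image, volume, and variable names are made up):
# Read-only root filesystem, persistent data in a named volume,
# configuration via an environment variable, temp files on a tmpfs
docker run -d \
  --read-only \
  --tmpfs /tmp \
  -v app-data:/var/lib/myapp \
  -e LOG_LEVEL=info \
  my-app:1.0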
The standard Docker workflow has two parts.
First you build an image:
Check out the relevant source tree from your source control system of choice.
If necessary, run some sort of ahead-of-time build process (compile static assets, build a Java .jar file, run Webpack, ...).
Run docker build, which uses the instructions in a Dockerfile and the content of the local source tree to produce an image.
Optionally docker push the resulting image to a Docker repository (Docker Hub, something cloud-hosted, something privately-run).
Then you run a container based off that image:
docker run the image name from the build phase. If it's not already on the local system, Docker will pull it from the repository for you.
Note that you don't need the local source tree just to run the image; having the image (or its name in a repository you can reach) is enough. Similarly, there's no "get a shell" or "start the service" in this workflow, just docker run on its own should bring everything up.
(It's helpful in this sense to think of an image the same way you think of a Web browser. You don't download the Chrome source to run it, and you never "get a shell in" your Web browser; it's almost always precompiled and you don't need access to its source, or if you do, you have a real development environment to work on it.)
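Condensed into commands, the two phases look roughly like this (the registry, tag, and container names are placeholders):
# Build phase, run where the source tree is checked out
docker build -t registry.example.com/my-app:20240101 .
docker push registry.example.com/my-app:20240101
# Run phase, on any host that can reach the registry; no source tree needed
docker run -d --name my-app registry.example.com/my-app:20240101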
Now: imagine there's some critical widespread security vulnerability in some core piece of software that your application is using (OpenSSL has had a couple, for example). It's prominent enough that all of the Docker base images have already updated. If you're using this workflow, updating your application is very easy: check out the source tree, update the FROM line in the Dockerfile to something newer, rebuild, and you're done.
Note that none of this workflow is "make arbitrary changes in a container and commit it". When you're forced to rebuild the image on a new base, you really don't want to be in a position where the binary you're running in production is something somebody produced by manually editing a container, but they've since left the company and there's no record of what they actually did.
In short: never run docker commit. While docker exec is a useful debugging tool it shouldn't be part of your core Docker workflow, and if you're routinely running it to set up containers or are thinking of scripting it, it's better to try to move that setup into the ordinary container startup instead.

How to do deterministic builds of Docker images?

I'm trying to build Docker images and I would like my Docker images to be deterministic. Much to my surprise I found that even a trivial Dockerfile such as
FROM scratch
ENV a b
produces different IDs when built repeatedly using docker build --no-cache .
How can I make my builds deterministic, and what's causing the changes in image IDs? When caching is enabled, the same ID is produced.
The reason I'm trying to get this reproducibility is to enable producing the same layers in a distributed build environment. I can not control where a build is run therefore I can not know what is in the cache.
Also, the Docker build downloads files using wget from an FTP server whose contents may or may not have changed; currently I cannot easily tell Docker from within a Dockerfile whether the results of a RUN should invalidate the cache. Therefore, if I could just produce the same ID for identical layers (when no cache is used), those layers would not have to be pushed and pulled again.
Also all the reasons listed here: https://reproducible-builds.org/
AFAIK, Docker images currently do not hash to byte-exact values, since the metadata contains stateful information such as the created date. You can check out the design doc from 1.10. Unfortunately, it looks like the history metadata is an important part of image validity and identification.
Don't get me wrong, I'm all about reproducible builds. However, I don't believe hash-exactness is the best criterion for measuring the reproducibility of a Docker image. A Docker image isn't a compiled binary. There is no way to guarantee the results of a stage will ever be reproducible, so even if the datetime metadata were absent, it would not guarantee reproducible builds. Take this pathological example:
RUN curl "https://www.random.org/strings/?num=1&len=20&digits=on&unique=on&format=plain&rnd=new" -o nonce.txt
The image ID is a SHA256 of the image's configuration object (what you get when you do a docker image inspect). Run this with the images you are creating and you will see differences between them.
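To see exactly which fields differ, you can diff the configuration objects of two builds; the image names below are placeholders:
# Compare the full configuration objects of two builds of the same Dockerfile
diff <(docker image inspect build-one) <(docker image inspect build-two)
# The created timestamps alone are usually enough to change the IDs
docker image inspect --format '{{.Created}}' build-one build-two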

docker - how can we export/import (or save/load) only the new changes?

I'm new to Docker; could anyone help with the query below?
The server has a Docker image of about 1 GB, Image:ver1 [this image is stored as a .tar file on the server].
On an Ubuntu PC I downloaded the tar image from the server and loaded/imported the image [Image:ver1] using Docker.
A new Image:ver2 is available on the server; its size is still 1 GB, but the difference from ver1 is only 10 MB.
Q1: If it's possible to "import/load" the new image [Image:ver2] from the server, how can we export/import (or save/load) only the new changes [i.e. the 10 MB]?
Q2: If we are able to apply the above changes on top of the existing image [i.e. Image:ver1], what are the steps to do so?
Docker is a layer-based system, and on each pull it only downloads the layers that have changed. For example, suppose an image has 1 GB of data in its existing layers and a new version adds 500 MB in a new layer; docker pull will then only fetch the changed layers, i.e. the delta between the two versions. So you are safe; it won't pull everything again.
However, you should be careful when writing a Dockerfile, as each instruction that changes the filesystem (RUN, COPY, ADD) is stored as a layer. If you have 10 layers in your image and you change the 5th one, then all the layers after the 5th will be rebuilt and pulled again. That is the main catch with Docker.
Other than that, it will always pull only the delta of changes on each pull.
If you want to save/load tar files of Docker images, there's no option to export a partial image. You can send over the full image, move your data to an external volume that isn't transferred this way, or use a Docker registry.
The latter is relatively easy to implement; Docker provides an image that lets you run your own private registry. Pushes and pulls to a Docker registry only send the changed layers, so you can make use of layer caching and structure your Dockerfiles to minimize the number of changed layers.
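Setting that up is only a few commands; the host name and image tags below are just examples, and the client machine may need the registry added to its trusted (or insecure) registries:
# Run a private registry on the machine that already has the images
docker run -d -p 5000:5000 --name registry registry:2
# Tag and push; only layers the registry doesn't already have are uploaded
docker tag myimage:ver2 localhost:5000/myimage:ver2
docker push localhost:5000/myimage:ver2
# On the other machine, the pull downloads only the changed layers
docker pull <server-host>:5000/myimage:ver2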
OK, I've built a tool that creates a diff (of the top layers) between versions of a Docker image, layer by layer, as a tarball, and inflates the original image from it later.
Note: it works only for changes in the top layers.
4-step process:
Run docker inspect and save the old layers' hashes to a JSON file
Prepare a diff based on the new image and the hashes of the old (existing) layers
Transfer the diff to the target machine
Inflate the target image's tar based on the diff and the old image
