How to do deterministic builds of Docker images?

I'm trying to build Docker images and I would like my Docker images to be deterministic. Much to my surprise I found that even a trivial Dockerfile such as
FROM scratch
ENV a b
produces different IDs when built repeatedly using docker build --no-cache .
How could I make my builds deterministic, and what's causing the changes in image IDs? When caching is enabled, the same ID is produced.
The reason I'm trying to get this reproducibility is to enable producing the same layers in a distributed build environment. I cannot control where a build is run, so I cannot know what is in the cache.
Also, the Docker build downloads files using wget from an FTP server whose contents may or may not have changed; currently I cannot easily tell Docker, from within a Dockerfile, whether the results of a RUN step should invalidate the cache. Therefore, if I could just produce the same ID for identical layers (when no cache is used), these layers would not have to be pushed and pulled again.
Also all the reasons listed here: https://reproducible-builds.org/

AFAIK, Docker images currently do not hash to byte-exact values, since the metadata contains stateful information such as the creation date. You can check out the design doc from 1.10. Unfortunately, it looks like the history metadata is an important part of image validity and identification.
Don't get me wrong, I'm all for reproducible builds. However, I don't believe hash-exactness is the best criterion for measuring the reproducibility of a Docker image. A Docker image isn't a compiled binary. There is no way to guarantee that the results of a build stage can ever be reproduced, so even if the datetime metadata were absent, it would not guarantee reproducible builds. Take this pathological example:
RUN curl "https://www.random.org/strings/?num=1&len=20&digits=on&unique=on&format=plain&rnd=new" -o nonce.txt

The image ID is a SHA-256 hash of the image's configuration object (what you get when you run docker image inspect). Run this against the images you are creating and you will see the differences between them.
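For example (a sketch, assuming a Dockerfile in the current directory), you can compare the configuration objects of two --no-cache builds to see exactly which fields differ:
docker build --no-cache -t test-a .
docker build --no-cache -t test-b .
docker image inspect test-a > a.json
docker image inspect test-b > b.json
diff a.json b.json   # typically the Id, tag, and Created timestamps differ, not the filesystem layers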

Related

Why do docker containers rely on uploading (large) images rather than building from the spec files?

Having needed several times in the last few days to upload a 1 GB image after some micro change, I can't help but wonder why there isn't a deploy path built into Docker and related tech (e.g. k8s) to push just the application files (Dockerfile, docker-compose.yml and app-related code) and have it build out the infrastructure from within the (live) Docker host?
In other words, why do I have to upload an entire linux machine whenever I change my app code?
Isn't the whole point of Docker that the configs describe a purely deterministic infrastructure output? I can't even see why one would need to upload the whole container image unless they make changes to it manually, outside of the Dockerfile, and then wish to upload that modified image. But that seems like bad practice at the very least...
Am I missing something, or is this just a peculiarity of the system?
Good question.
Short answer:
Because storage is cheaper than processing power, and building images "live" would be complex, time-consuming, and unpredictable.
On your Kubernetes cluster, for example, you just want to pull the "cached" layers of an image that you know works and run it, in seconds, instead of compiling binaries and downloading things (as you would specify in your Dockerfile).
About building images:
You don't have to build these images locally; you can use your CI/CD runners and run docker build and docker push from the pipelines that run when you push your code to a git repository.
Also, if the image is too big, you should look into ways of reducing its size: use multi-stage builds, use lighter/minimal base images, use fewer layers (for example, multiple RUN apt-get install commands can be grouped into a single apt-get install listing several packages), and use .dockerignore so that unnecessary files are not shipped into the image. Finally, read more about caching in Docker builds, as it may reduce the size of the layers you have to push when making changes.
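For instance, the layer-grouping point might look like this (package names are placeholders):
# instead of one layer per command:
#   RUN apt-get update
#   RUN apt-get install -y curl
#   RUN apt-get install -y git
# group them into a single RUN instruction:
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl git && \
    rm -rf /var/lib/apt/lists/*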
Long answer:
Think of the Dockerfile as the source code, and the Image as the final binary. I know it's a classic example.
But just consider how long it would take to build/compile the binary every time you want to use it (either by running it, or by importing it as a library in a different piece of software). Then consider how nondeterministic it would be to download that software's dependencies, or to compile them, on a different machine every time you run it.
You can take for example Node.js's Dockerfile:
https://github.com/nodejs/docker-node/blob/main/16/alpine3.16/Dockerfile
Which is based on Alpine: https://github.com/alpinelinux/docker-alpine
You don't want your application to perform all of the operations specified in these files (and their scripts) at runtime, before actually starting your application, as that would be unpredictable, time-consuming, and more complex than it should be (for example, you would need firewall exceptions for egress traffic from the cluster to the internet to download dependencies that may or may not still be available).
You would instead just ship an image based on the base image you tested and built your code to run on. That image is built once and pushed to the registry, and then k8s runs it as a black box, which is predictable and deterministic.
Then, about your point of how annoying it is to push huge Docker images every time:
You can cut that size down by following some best practices and designing your Dockerfile well, for example:
Reduce your layers: for example, pass multiple arguments to a command whenever possible instead of re-running it in several separate instructions.
Use multi-stage builds, so you only push the final image, not the intermediate stages you needed to compile and configure your application (see the sketch after this list).
Avoid injecting data into your images; you can pass it to the containers at runtime instead.
Order your layers so that you do not have to rebuild untouched layers when making changes.
Don't include unnecessary files, and use .dockerignore.
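A minimal multi-stage sketch along these lines (a hypothetical Node.js app; the stage contents are illustrative):
# build stage: needs dev dependencies and build tooling
FROM node:16-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# final stage: only the built output and production dependencies are shipped
FROM node:16-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]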
And last but not least:
You don't have to push images from your machine; you can do it with CI/CD runners (for example the build-push GitHub Action), or you can use your cloud provider's build products (like Cloud Build on GCP and AWS CodeBuild).

How to improve automation of running container's base image updates?

I want all running containers on my server to always use the latest version of an official base image e.g. node:16.3 in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow which has some limitations described below.
I have read the answers to this question but they either involve building or inspecting images on the target server which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check for changes in the node:16.3 image compared to when I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.3 with those of the first n layers of my-app:1.0 (where n is the number of layers of node:16.3), as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.3 base image.
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0, it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server, I have to keep previous versions of the my-app:1.0 image as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry, e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared, it is not possible to detect a change in the base image where only the uppermost layers have been removed. However, I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe: downloading the image's manifest and then downloading the layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause extra redeploys when neither the base image nor the application has changed.)
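For example, a nightly cron entry or CI job could run (image names follow the example below):
docker build --pull -t registry.example.com/my-app:1.0 .
docker push registry.example.com/my-app:1.0
# --pull re-fetches the node base image if it has changed; if nothing changed,
# the build resolves to the existing image ID and the push uploads no new layers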
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID, as you have now, works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5
docker rmi registry.example.com/my-app:1.0.4
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
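For example (a sketch, assuming a Deployment and container both named my-app):
kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.5
kubectl rollout status deployment/my-app
# "kubectl set image" edits the image: field, which is what starts the rolling update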

does docker build --no-cache builds different layers?

A few months ago I decided to set up my project's CI to build Docker images with the --no-cache flag: I thought it would be best not to take the risk of letting Docker use an old cached layer.
I only realized now that the SHAs of my image's layers are always different (even when the newly built image should produce layers identical to the previous build), and whenever I pull the newly built image, all layers are downloaded from scratch.
I'm now thinking that the issue is the --no-cache flag. I know it sounds obvious, but honestly I thought --no-cache only made builds slower; I also thought layer creation was implemented in a functional way (same command + same content = same layer).
Can someone confirm that the --no-cache flag is the problem?
The thing with containers is that, practically speaking, you will never build the same layer with the same SHA twice. You only get the same SHA when you reuse a layer you previously built.
Think about it this way: every time you build a layer, there will be at least a log file, a timestamp, something that is different, and that is before we even mention external dependencies being pulled in.
The --no-cache flag simply stops the Docker engine from using the cached layers; it downloads and builds everything again. So that flag is indeed the (indirect) reason why your hashes are different, but that is the intended behavior. Building with the cache means your builds are faster and reuse previously built layers, hence the same SHAs (this may also mean reusing stale results, which is exactly why the flag exists).
Have a look at this article for further info:
https://thenewstack.io/understanding-the-docker-cache-for-faster-builds/
If you wish to guarantee some layers have the same SHA but still do not want to use the cache, you may want to look into multi-stage Docker builds:
https://docs.docker.com/develop/develop-images/multistage-build/
This way you can have a base image which is fixed and build everything else on top of that.
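One common way to keep the base layers byte-identical across builds (a related technique, not spelled out above) is to pin the base image by digest, for example:
# look up the digest of the exact base image you tested against
docker pull node:16-alpine
docker image inspect --format '{{index .RepoDigests 0}}' node:16-alpine
# then reference that digest in your Dockerfile, e.g.
# FROM node@sha256:<digest printed above>
Since FROM layers are pulled rather than rebuilt, their SHAs stay the same even when --no-cache rebuilds everything on top of them.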

Docker Caching, how does it really work?

I understand that Docker stores every image as a set of layers. Say I have multiple users on one development server, everyone is building from the same Dockerfile, but user1 stores the image as user1_myapp and user2 stores it as user2_myapp. Again, they are using the same Dockerfile.
The question is: if the image is, for example, 100 MB, do both images take 100 MB each, or do they share the same layers and use only 100 MB instead of 200 MB?
Yes, the two images will share the same layers if you meet the prerequisites. Docker layers are reused independently of the resulting image name. The requirements to use a cached layer instead of creating a new one are:
The build command needs to be run against the same docker host where the previous image's cache exists.
The previous layer ID must match between the cache layer and the running build step.
The command currently being run, or the source context if you are running a COPY or ADD, must be identical. Docker does not know if you are running a command that pulls from an external changing resource (e.g. git clone or apt-get update), which can result in a false cache hit.
You cannot have disabled caching in your build command.
Keep in mind that layers are immutable; once created they are never changed, just replaced with different layers with new IDs when you run a different build. When you run a container, it uses a copy-on-write RW layer specific to that container, which allows multiple containers and images to point to the same image layers when they get a cache hit.
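You can verify the sharing directly (a sketch, assuming both tags are built from the same context on the same host):
docker build -t user1_myapp .
docker build -t user2_myapp .   # every step reports "Using cache"
docker image inspect --format '{{.Id}}' user1_myapp user2_myapp
# both tags print the same image ID, so the ~100 MB is stored only once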
If you are having problems getting the cache to match in the two builds, e.g. importing a large file and something like the file timestamp doesn't match, consider creating an intermediate image that contains the common files. Then each project can build FROM that intermediate image.
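A sketch of that intermediate-image approach (image and path names are hypothetical):
# common.Dockerfile: holds the large shared files
FROM ubuntu:22.04
COPY shared-data/ /opt/shared/
Build and push it once (for example as registry.example.com/common-base:1), then start each project's Dockerfile with FROM registry.example.com/common-base:1 so the large layer is pulled as-is instead of being rebuilt and re-hashed in every project.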

Why doesn't Docker Hub cache Automated Build Repositories as the images are being built?

Note: It appears the premise of my question is no longer valid since the new Docker Hub appears to support caching. I haven't personally tested this. See the new answer below.
Docker Hub's Automated Build Repositories don't seem to cache images. As it is building, it removes all intermediate containers. Is this the way it was intended to work or am I doing something wrong? It would be really nice to not have to rebuild everything for every small change. I thought that was supposed to be one of the best advantages of docker and it seems weird that their builder doesn't use it. So why doesn't it cache images?
UPDATE:
I've started using Codeship to build my app and then run remote commands on my DigitalOcean server to copy the built files and run the docker build command. I'm still not sure why Docker Hub doesn't cache.
Disclaimer: I am a lead software engineer at Quay.io, a private Docker container registry, so this is an educated guess based on the same problem we faced in our own build system implementation.
Given my experience with Dockerfile build systems, I would suspect that Docker Hub does not support caching because of the way caching is implemented in the Docker Engine. Caching for Docker builds operates by comparing the commands to be run against the existing layers found locally on the build machine.
For example, if the Dockerfile has the form:
FROM somebaseimage
RUN somecommand
ADD somefile somefile
Then the Docker build code will:
Check to see if an image matching somebaseimage exists
Check if there is a local image with the command RUN somecommand whose parent is the previous image
Check if there is a local image with the command ADD somefile somefile + a hash of the contents of somefile (to make sure it is invalidated when somefile changes), whose parent is the previous image
If any of the above steps match, then that command will be skipped in the Dockerfile build process, with the cached image itself being used instead. However, the one key issue with this process is that it requires the cached images to be present on the build machine, in order to find and verify the matches. Having all of everyone's images on build nodes would be highly inefficient, making this a harder problem to solve.
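For instance (a sketch using the Dockerfile above, on a machine that still holds the previous build's images):
docker build -t demo .       # first build: no cache hits
docker build -t demo .       # the RUN and ADD steps report "Using cache"
echo change >> somefile
docker build -t demo .       # the RUN step still hits the cache, but the ADD step
                             # rebuilds because the hash of somefile changed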
At Quay.io, we solved the caching problem by creating a variation of the Docker caching code that could precompute these commands/hashes and then ask our registry for the cached layers, downloading them to the machine only after we had found the most efficient caching set. This required significant data model changes in our registry code.
If you'd like more information, we gave a technical overview into how we do so in this talk: https://youtu.be/anfmeB_JzB0?list=PLlh6TqkU8kg8Ld0Zu1aRWATiqBkxseZ9g
The new Docker Hub came out with a new Automated Build system that supports Build Caching.
https://blog.docker.com/2018/12/the-new-docker-hub/

Resources