How does docker handle digests?
I can see in plain text, when I run docker image --inspect, the digest of an image. And there's also the thing that local images don't have a digest until I push them to a registry (and AFAIK, if I push an image to various registries, it will have various digests, but never tried that).
I fear that docker might be actually using that info instead of calculating the hash every time that I use or pull an image.
Is there a way to actually tell docker: "Hey, I want you to recheck right now the hash of the image contents. Are they the exact same as when I first created the image? Or has someone manipulated it ever?"
And: does docker really calculate that hash every time an image is run (by digest), or at least every time an image is pulled (by digest)?
The digest is calculated on push and pull to a registry. It's a sha256 checksum of the image manifest, which is current versions of docker is independent of the registry (the older schema v1 syntax included the repository/tag in the manifest that resulted in the digest changing depending on the image name). The layer digests are included in that manifest, and those digests on the registry are compressed tar files. Once the files have been extracted on the local docker engine, they aren't reverified, and I'm not aware of a command yet that would verify the files under /var/lib/docker have not been changed since the image was pulled.
Related
I want all running containers on my server to always use the latest version of an official base image e.g. node:16.3 in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow which has some limitations described below.
I have read the answers to this question but they either involve building or inspecting images on the target server which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check for changes in the node:16.3 image compared to when I built the my-app:1.0 image I periodically compare the SHA256 digests of the layers of the node:16.3 with those of the first n=(number of layers of node:16.3) layers of my-app:1.0 as suggested in this answer. I retrieve the SHA256 digests with docker manifest inpect <image>:<tag> -v.
If they differ I rebuild my-app:1.0 and push it to my registry thus ensuring that my-app:1.0 always uses the latest node:16.3 base image.
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server I have to keep previous versions of the my-app:1.0 images as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared it is not possible to detect a change in the base image where only the uppermost layers have been removed. However I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe, of downloading the image's manifest and then downloading layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, then it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause more redeploys if neither the base image nor the application changes.)
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID as you have now works, date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5
docker rmi registry.example.com/my-app:1.0.4
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
Asked in a different question:
why does skaffold need two tags to the same image?
During deployment, Skaffold rewrites the image references in the Kubernetes manifests being deployed to ensure that the cluster pulls the the newly-built images and doesn't use stale copies (read about imagePullPolicy and some of the issues that it attempts to address). Skaffold can't just use the computed image tag as many tag conventions do not produce unique tags and the tag can be overwritten by another developer and point to a different image. It's not unusual for a team of devs, or parallel tests, to push images into the same image repository and encounter tag clashes. For example, latest will be overwritten by the next build, and the default gitCommit tagger generates tags like v1.17.1-38-g1c6517887 which uses the most recent version tag and the current commit SHA and so isn't unique across uncommitted source changes.
When pushing to a registry, Skaffold can use the image's digest, the portion after the # in gcr.io/my-project/image:latest#sha256:xxx. This digest is the hash of the image configuration and layers and uniquely identifies a specific image. A container runtime ignores the tag (latest here) when there is a digest.
When loading an image to a Docker daemon, as happens when deploying to minikube, the Docker daemon does not maintain image digests. So Skaffold instead tags the image with a second tag using a computed digest. It's extremely unlikely that two different images will have the same computed digest, unless they're the same image.
Tags are cheap: they're like symlinks, pointing to an image identifier.
I have created a docker image with a custom tag using Dockerfile. For the first time when I pushed it to docker repository (in Jfrog artifactory) using docker push command, it generated a SHA256 digest value. Now I again pushed the same image with same tag without any change in the content of the image to the same docker repository. But now it generated new SHA256 digest value.
Can someone explain me why this is happening? I'm struck at this point as my project is hardly dependent to SHA256 digest value of the docker image.
Since my comment answered your question, original credit goes to the post here: https://windsock.io/explaining-docker-image-ids/
Layers are identified by a digest in this form: algorithm:hex which looks like sha256:abcd.....
The hex is calculated by applying the algorithm (sha256) to the layers content. If the content changes, then the digest changes.
Is there a possibility to force pull of docker image?
I have redeployed docker image to another repository, but when I invoke
docker pull anotherrepo:port/my/image
nothing gets downloaded, instead I get info:
Digest: sha256:somehash
and that image is up to date.
docker rm/rmi doesn't work because the image is downloaded from originalrepo:port/my/image and I don't want to stop/delete it onyl for test purposes.
Is there possible to force pull to check if the image is correctly pushed?
The following should work. You add a temporary tag to avoid deletion of the image, delete the original tag and then pull:
docker tag "$originalTag" "tmpTag"
docker rmi "$originalTag"
docker pull "$originalTag"
docker rmi "tmpTag"
I think the answer lies in digests.
Images that use the v2 or later format have a content-addressable identifier called a digest. As long as the input used to generate the image is unchanged, the digest value is predictable.
Source: https://docs.docker.com/engine/reference/commandline/images/#list-the-full-length-image-ids
Maybe you don't need to verify if the push was successful, as Docker could be doing that automatically by using digests, but I'm not sure if this is indeed the case.
The only other way I can think of would be to pull from a different machine which has access to the new repository.
Is there any way to prevent images being uploaded to docker hub with the same tags as existing images? Our use case is as follows.
We deploy to production with a docker-compose file with the tags of images as version numbers. In order to support roll-back to previous environments and idempotent deployment it is necessary that a certain tagged docker image always refer to the same image.
However, docker hub allows images to be uploaded with the same tags as existing images (they override the old image). This completely breaks the idea of versioning your images.
We currently have work-arounds which involve our build scripts pulling all versions of an image and looking through the tags to check that an overwrite will not happen etc. but it feels like there has to be a better way.
If docker hub does not support this, is there a way to do docker deployment without docker hub?
The tag system has no way of preventing images been overwritten; you have to come up with your own processes to handle this (and h3nrik's answer is an example of this).
However, you could use the digest instead. In the new v2 of the registry, all images are given a checksum, known as a digest. If an image or any of its base layers change, the digest will change. So if you pull by digest, you can be absolutely certain that the contents of that image haven't changed over time and that the image hasn't been tampered with.
Pulling by digest looks like:
docker pull debian#sha256:f43366bc755696485050ce14e1429c481b6f0ca04505c4a3093dfdb4fafb899e
You should get the digest when you do a docker push.
Now, I agree that pulling by digest is a bit unwieldy, so you may want to set up a system that simply tracks digest and tag and can verify that the image hasn't changed.
In the future, this situation is likely to improve, with tools like Notary for signing images. Also, you may want to look at using labels to store metadata such as git hash or build number.
Assuming you have a local build system to build your Docker images: you could include the build number from your local build job in your tag. With that you assure your requirement:
... it is necessary that a certain tagged docker image always refer to the same image.
When your local build automatically pushes to docker hub it is assured that each push pushes an image with a unique tag.