Searching Docker Hub registry images/layers by their SHA digest

If you have ever browsed Docker images attentively on https://hub.docker.com, you may have dissected all the commands that make up an image of interest for a certain tag.
Great, but when you click on a specific line of a command, you may also have seen this kind of "translated" command:
I may be wrong here because I'm not a Docker expert, but this seems to be a SHA-256 digest that refers to... something else inside the Hub.
My question is: knowing the SHA value (3a7bff4e139bcacc5831fd70a035c130a91b5da001dd91c08b2acd635c7064e8), how can I find out exactly what it refers to?

The SHA value you see is the digest of the file that was added.
For example, suppose you have the following Dockerfile:
FROM scratch
ADD foo.txt /foo.txt
If you were to push that to Docker Hub, you would see something like:
ADD file:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c in /
where b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c is the digest of foo.txt.
There's no definitive way to reverse this information to obtain the file, considering ADD can do things like unpack tar archives.
With modern versions of Docker/buildkit, you might see the filename instead.
This is the same thing you see when using docker image history.
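Hashing is one-way, which is why the digest cannot be turned back into the file. For a feel of what a plain SHA-256 content digest looks like, Python's hashlib is enough. The example digest above happens to be the plain SHA-256 of the four bytes foo plus a newline (what echo foo > foo.txt writes); those file contents are an assumption chosen to reproduce it. The next question reports that the hash shown by docker history does not match a plain sha256 of the added file, so treat this as an illustration of the digest format rather than Docker's exact computation.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the plain SHA-256 digest of raw bytes as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


# Assumed contents of foo.txt, as written by `echo foo > foo.txt`.
print("file:" + content_digest(b"foo\n"))

# Changing a single byte gives an unrelated digest, so nothing about the
# original contents can be read back out of the hash.
print("file:" + content_digest(b"fop\n"))
```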

Related

Meaning of hash after ADD file: in docker history

If I run docker history --no-trunc IMAGE on an image which was built from a Dockerfile like:
FROM scratch
ADD something.tar.xz /
...
I see the following under the CREATED BY column in the last line of the docker history output:
/bin/sh -c #(nop) ADD file:fb0755f94145d1e0a46167faf3b43dba3db9383f4c230217a500a65e01307e27 in /
The part after file: looks like a SHA256 hash of something but it doesn't match the SHA256 of...
the file referenced in the Dockerfile (something.tar.xz)
the name of that file or its full path at build time
the layer digest or any other which I see in the output from any Docker ls/inspect/etc.
So my question is, purely out of interest, what is this the hash of?
Related questions
This question is related but based on the misunderstanding that the ADD file:... is actually in the Dockerfile.
This question kind of implicitly asks my question but the closest to an answer is:
I'm not sure there's a reliable way to translate from the host file or URL to the hash
So I want to explicitly ask the question of what the hash actually refers to, even though I fully understand that I cannot magically reconstitute any information from it about the original file which was ADDed.

Docker image layer: What does `ADD file:<some_hash> in /` mean?

Docker Hub image pages list the commands that were run for each image layer. Here is a golang example.
Some applications also provide their Dockerfile in GitHub. Here is a golang example.
According to the Docker Hub image layers, ADD file:4b03b5f551e3fbdf47ec609712007327828f7530cc3455c43bbcdcaf449a75a9 in / is the first command. The image layers don't include any "FROM" command, and this line doesn't seem to conform to the ADD syntax either.
So here are the questions:
What does ADD file:<HASH> in / mean? What is this format?
Is there any way I could trace upwards using the hash? I suppose that hash represents the FROM image, but there seems to be no API for that.
Why is it not possible to build a Dockerfile using the ADD file:<HASH> in / syntax? Is there any way I could build an image using such syntax, or convert between the two formats?
That Docker Hub history view doesn't show the actual Dockerfile; instead, it shows content essentially extracted from the docker history of the image. That doesn't preserve the specific details you're looking for: it doesn't remember the names of base images, or the build-context file names of things that get ADDed or COPYed in.
Chasing through GitHub and Docker Hub links, the golang:*-buster Dockerfile is built FROM buildpack-deps:...-scm; buildpack-deps:buster-scm is FROM buildpack-deps:buster-curl; that is FROM debian:buster; and that has a very simple Dockerfile (quoted here in its entirety):
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
FROM scratch starts from a completely empty image; that is the base of the Docker image tree (and what tells docker history and similar tools to stop). The ADD line unpacks a tar file of a Debian system image.
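The manual chasing described above amounts to following FROM links until reaching scratch, the empty root. As the question notes, there seems to be no API for this, so in this toy sketch the parent mapping is hard-coded by hand from the chain in this answer (only the fully named tags are included; the PARENT dict and base_chain function are hypothetical):

```python
from typing import Dict, List

# Hypothetical FROM relationships, transcribed by hand from the chain above.
PARENT: Dict[str, str] = {
    "buildpack-deps:buster-scm": "buildpack-deps:buster-curl",
    "buildpack-deps:buster-curl": "debian:buster",
    "debian:buster": "scratch",
}


def base_chain(image: str) -> List[str]:
    """Follow FROM links from `image` until reaching scratch, the empty base."""
    chain = [image]
    while chain[-1] != "scratch":
        parent = PARENT.get(chain[-1])
        if parent is None:
            raise KeyError(f"no known parent for {chain[-1]!r}")
        if parent in chain:
            raise ValueError(f"FROM cycle detected at {parent!r}")
        chain.append(parent)
    return chain


print(" -> ".join(base_chain("buildpack-deps:buster-scm")))
```

In docker history, the layers from every image in such a chain show up as one flat list, so these image boundaries are not visible there.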
If you look at docker history or the Docker Hub history view you cite, you should be able to see these same steps happening. The ADD file:4b0... in / corresponds to the ADD rootfs.tar.xz /, and the second line is the CMD ["bash"]. The history is not split up by Dockerfile or image, and the original filenames from ADD aren't saved. (You couldn't reproduce the image anyway without the contents of rootfs.tar.xz, so knowing its filename is only slightly helpful, not essential.)
The ADD file:hash in /path syntax is not standard Dockerfile syntax (in particular, the word "in" is not part of it). I'm not sure there's a reliable way to translate from the host file or URL to the hash, but building the image and looking at its docker history would tell you (assuming you've got a perfect match for the file metadata). There's no way to get back to the original filename or syntax, and definitely no way to get back to the file contents.
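To see why such a digest can depend on more than the file contents ("a perfect match for the file metadata"), here is a toy sketch that hashes a few metadata fields together with the contents. The choice of fields and their encoding is an assumption for illustration, not the builder's actual algorithm; the point is only that identical bytes with different metadata yield different digests, and that neither the name nor the contents can be read back out.

```python
import hashlib


def toy_file_digest(name: str, mode: int, uid: int, gid: int, data: bytes) -> str:
    """Hash a few metadata fields plus the contents.

    Illustrative only: the field list and encoding are assumptions, not the
    builder's actual algorithm.
    """
    h = hashlib.sha256()
    fields = (("name", name), ("mode", oct(mode)), ("uid", uid), ("gid", gid), ("size", len(data)))
    for key, value in fields:
        # Encode each field as "key=value" terminated by a NUL byte.
        h.update(f"{key}={value}".encode() + b"\0")
    h.update(data)
    return h.hexdigest()


data = b"foo\n"
a = toy_file_digest("foo.txt", 0o644, 0, 0, data)
b = toy_file_digest("foo.txt", 0o755, 0, 0, data)  # same bytes, different permissions

print(a == b)                                 # False: metadata changed the digest
print(a == hashlib.sha256(data).hexdigest())  # False: not a plain content hash
```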
ADD or COPY means that files are appended to the image.
Those are files; you cannot "trace" them.
You cannot just copy the commands, because the hashes are not the original files. See https://forums.docker.com/t/how-to-extract-file-from-image/96987 to get the file.

Why isn't Docker more transparent about what it's downloading?

When I download a Docker image, it downloads dependencies, but only displays their hashes. Why does it not display what it is downloading?
For example:
➜ ~ docker run ubuntu:16.04
Unable to find image 'ubuntu:16.04' locally
16.04: Pulling from library/ubuntu
b3e1c725a85f: Downloading 40.63 MB/50.22 MB
4daad8bdde31: Download complete
63fe8c0068a8: Download complete
4a70713c436f: Download complete
bd842a2105a8: Download complete
What's the point in only telling me that it's downloading b3e1c725a85f, etc.?
An image is made of filesystem layers, each represented by a hash. After the image is created, the base image's tag may be moved to a completely different set of hashes without affecting any images built from it. These layers come from things like RUN commands; a tag such as ubuntu:16.04 is only added after the image is made.
So the best that could be done is to say that 4a70713c436f comes from adding some directory (identified by a hash of the input folder itself) or from a multi-line RUN command, neither of which makes for a decent UI. The result may have no tagged name, or it could have several. So the simplest solution is to output what is universal and unchanging in every scenario: the hash.
To rephrase that pictorially:
b3e1c725a85f: could be ubuntu:16.04, ubuntu:16, ubuntu:latest, some.other.registry:5000/ubuntu-mirror:16.04
4daad8bdde31: could be completely untagged, just a run command
63fe8c0068a8: could be completely untagged, just a copy file
4a70713c436f: could point to a tagged base image where that tag has since changed
bd842a2105a8: could be created with a docker commit command (eek)
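The 12-character IDs in the pull output are prefixes of full digests. A minimal sketch of that shortening, assuming the usual convention of dropping the algorithm prefix and keeping the first 12 hex characters (short_id is a hypothetical helper; the sample digest is the one from the first question):

```python
def short_id(digest: str) -> str:
    """Shorten "sha256:<64 hex chars>" (or bare hex) to the 12-character display form."""
    # rpartition returns ("", "", digest) when there is no "algorithm:" prefix.
    _, _, hex_part = digest.rpartition(":")
    return hex_part[:12]


full = "sha256:3a7bff4e139bcacc5831fd70a035c130a91b5da001dd91c08b2acd635c7064e8"
print(short_id(full))  # 3a7bff4e139b
```

A short ID identifies content only and carries no tag, which is exactly why the list above can only say what each layer "could be".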

Docker: git add <filename> like feature

Suppose I have changed 10 files in my existing Docker image. How would I commit only 2 specific files and create a separate tag?
Something like:
docker commit -m file1 file2
Edit: my request has been closed by Docker members:
https://github.com/docker/docker/issues/20897
You cannot: as man docker commit shows, the command does not offer this operation.
Edit
It's just that the command does not offer this operation, and man pages usually don't detail what a tool doesn't offer or why. Nevertheless, my guess is that it is programmatically possible, but it could lead to inconsistent or incoherent images and might be dangerous. Say you installed a package and want to commit its files, but you forget a config file (or anything else): the image would be left in a weird, not-so-clean state. Also, using docker commit is far from best practice for creating images. Prefer a Dockerfile, which makes building the image lighter.
If you modify the contents of a container, you can use the docker commit command to save the current state of the container as a new tag.
$ docker commit <CONTAINER_ID> <IMAGE_NAME>:<NEW_TAG>

Docker: How do I pull a specific build-id?

I would like to always pull a specific version, rather than just the latest.
A random example: https://registry.hub.docker.com/u/aespinosa/jenkins/builds_history/9511/
I am doing this because I only want to deploy versions that I have audited. Is this currently possible? Or am I forced to fork them and make my own?
You can pull a specific image by digest by using the following syntax:
docker pull ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2
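A digest reference has the form repository@sha256:<hex>, so a deployment script that only allows audited versions can build and sanity-check it before calling docker pull. A sketch with a hypothetical helper; accepting only sha256 digests is a simplifying assumption:

```python
import re

# "sha256:" followed by exactly 64 lowercase hex characters.
SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_reference(repository: str, digest: str) -> str:
    """Return "repository@digest", rejecting anything that is not a sha256 digest."""
    if not SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repository}@{digest}"


print(pinned_reference("ubuntu", "sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2"))
```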
If you need to find the hash, it is output when pushing/pulling the image. Some automated builds output it at the end. I tried looking for the hash with docker inspect but it didn't appear to be there, so you'll have to delete the image and pull it again to view the hash.
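On current Docker versions the registry digest is also recorded locally after a pull: docker images --digests shows it as a column, and docker image inspect prints a JSON array whose objects carry a RepoDigests list with entries of the form repository@sha256:<hex>. A sketch that extracts those entries from saved inspect output (the sample JSON is trimmed and hypothetical):

```python
import json
from typing import List


def repo_digests(inspect_json: str) -> List[str]:
    """Collect RepoDigests entries from `docker image inspect` JSON output."""
    digests: List[str] = []
    for image in json.loads(inspect_json):  # inspect prints one object per image
        digests.extend(image.get("RepoDigests") or [])
    return digests


# Trimmed, hypothetical output of `docker image inspect ubuntu`.
sample = '[{"RepoDigests": ["ubuntu@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2"]}]'
print(repo_digests(sample))
```

An image that was built locally and never pushed or pulled has an empty RepoDigests list, since the digest comes from the registry round trip.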
The way I do it is to tag each build:
docker build -t $NAMESPACE/$APP_NAME:$SHA1 .
docker tag $NAMESPACE/$APP_NAME:$SHA1 $DOCKER_REGISTRY/$NAMESPACE/$APP_NAME:$SHA1
docker push $DOCKER_REGISTRY/$NAMESPACE/$APP_NAME:$SHA1
and then you pull the specific tag
docker pull $DOCKER_REGISTRY/$NAMESPACE/$APP_NAME:$SHA1
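The same tag-per-build flow can be scripted. A sketch that produces these commands as argument lists, which avoids shell quoting issues if they are passed to subprocess.run; the registry, namespace, app name and SHA values are hypothetical placeholders:

```python
import shlex
from typing import List


def tag_and_push_commands(registry: str, namespace: str, app: str, sha1: str) -> List[List[str]]:
    """Return the build, tag, push and pull commands as argv lists."""
    local = f"{namespace}/{app}:{sha1}"
    remote = f"{registry}/{namespace}/{app}:{sha1}"
    return [
        ["docker", "build", "-t", local, "."],
        ["docker", "tag", local, remote],
        ["docker", "push", remote],
        ["docker", "pull", remote],
    ]


# Hypothetical values; each argv could be run with subprocess.run(argv, check=True).
for argv in tag_and_push_commands("registry.example.com:5000", "myteam", "myapp", "3f2c1e9"):
    print(shlex.join(argv))
```

Unlike a digest, a tag can later be re-pointed, so this approach relies on nobody pushing a different image under the same SHA tag.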
In addition to Joel's answer, you might want to verify that the image exists in a specific Docker repo before trying to pull it. The easiest way I know is to use the Docker registry API: make a simple HTTP GET request to a URL assembled like this:
FullURL = DomainAndPort + "/v2/" + imageName + "/blobs/sha256:" + imageHash;
An example request that works for me on our network repo:
http://10.10.9.84:5000/v2/hello-world/blobs/sha256:8089101ead9ce9b8c68d6859995c98108e1022c23beaa55754acb89d66fd3381
Entering that string into a Chrome browser returns a JSON object describing the image. If you enter an invalid sha256 hash, the API returns:
{"errors":[{"code":"DIGEST_INVALID","message":"provided digest did not match uploaded content","detail":{}}]}
For more details see "Pulling a Layer" in https://docs.docker.com/registry/spec/api/
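The URL assembly above is easy to script. Per the registry API spec linked above, a HEAD request to the same blobs URL is a cheaper existence check than a GET (200 if the blob exists, 404 if not), and since blobs are content-addressed, downloaded bytes can be checked against the hash. A sketch using only the standard library, assuming an unauthenticated registry like the one in the example; Docker Hub itself requires a bearer token, which this sketch does not handle. The function names are hypothetical.

```python
import hashlib
import urllib.error
import urllib.request


def blob_url(domain_and_port: str, image_name: str, image_hash: str, scheme: str = "http") -> str:
    """Assemble <scheme>://<host:port>/v2/<name>/blobs/sha256:<hash>, like FullURL above."""
    return f"{scheme}://{domain_and_port}/v2/{image_name}/blobs/sha256:{image_hash}"


def blob_exists(url: str, timeout: float = 10.0) -> bool:
    """HEAD the blob URL: 200 means it exists, 404 means it does not."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # e.g. 401 from a registry that requires authentication


def matches_digest(blob: bytes, image_hash: str) -> bool:
    """Blobs are content-addressed: their SHA-256 must equal the hash in the URL."""
    return hashlib.sha256(blob).hexdigest() == image_hash


url = blob_url("10.10.9.84:5000", "hello-world",
               "8089101ead9ce9b8c68d6859995c98108e1022c23beaa55754acb89d66fd3381")
print(url)
# blob_exists(url) would contact the registry; it is not called here.
```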
