I read the Docker Image Specification v1.2.0.
It said:
Layers are referenced by cryptographic hashes of their serialized representation. This is a SHA256 digest over the tar archive used to transport the layer, represented as a hexadecimal encoding of 256 bits, e.g., sha256:a9561eb1b190625c9adb5a9513e72c4dedafc1cb2d4c5236c9a6957ec7dfd5a9. Layers must be packed and unpacked reproducibly to avoid changing the layer ID, for example by using tar-split to save the tar headers. Note that the digest used as the layer ID is taken over an uncompressed version of the tar.
I want to find out the specific process, so I tried the following:
chao@manager-02:~/image_lab$ docker image save busybox:1.27-glibc > busybox.tar
chao@manager-02:~/image_lab$ tar -xvf busybox.tar
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/VERSION
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/json
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json
manifest.json
repositories
chao@manager-02:~/image_lab$ ls
47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe Dockerfile manifest.json
busybox.tar fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json repositories
chao@manager-02:~/image_lab$ sha256sum 47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
545903a7a569bac2d6b75f18d399251cefb53e12af9f644f4d9e6e0d893095c8 47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
Why is the sha256sum I generated not equal to the sha256sum of the image layer?
Technically, you did answer your own question. This is what the Docker image spec says (as you quoted):
[DiffID] is a SHA256 digest over the tar archive used to transport the layer (...) Note that the digest used as the layer ID is taken over an uncompressed version of the tar."
But later on, when describing the content of the image, the same doc also says:
There is a directory for each layer in the image. Each directory is named with a 64 character hex name that is deterministically generated from the layer information. These names are not necessarily layer DiffIDs or ChainIDs.
If you look at the image config JSON of your image (the fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json file extracted next to manifest.json), you'll see that its rootfs.diff_ids array contains the same hash you obtained by running sha256sum on layer.tar. The hash you computed is the DiffID.
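For instance, you can verify this yourself against the files extracted above (a sketch that assumes jq is installed; the file names are the ones from the transcript). Both commands should print the same 545903a7… digest:
jq -r '.rootfs.diff_ids[]' fe2d514cd10652d0384abf2b051422722f9cdd7d189e661450cba8cd387a7bb8.json
sha256sum 47f54add1c481ac7754f9d022c2c420099a16e78faf85b4f2926a96ee40277fe/layer.tar
The same DiffIDs can also be read from the daemon with docker inspect --format '{{.RootFS.Layers}}' busybox:1.27-glibc.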
The obvious follow-up question then is: where did that directory name come from?!
I am not sure, but it seems that it is generated by whatever algorithm was used to generate layer IDs in the older Docker image format v1. Back then, images and layers were conflated into a single concept.
I'd guess they kept the v1 directory names unchanged to simplify using old layers with newer Docker versions.
Footnote: AFAIU, the Docker image format spec is superseded by the OCI image format specification, but docker image save seems to generate archives in the older Docker format.
Related
I know docker save can save an image to a tar file and docker load can reload it.
For example:
I have machines A and B. B can't connect to the hub. A has image:latest and B has image:base.
I have to save multiple images on A as tar files, but the tar files are too big to transfer.
Can I save the diff between tags or image IDs on A and load just that diff on B,
instead of saving the whole image? That would make the update patch much smaller.
This isn't possible using standard Docker tooling. The only option docker save takes is an option to write to a file rather than to stdout, and it always contains all parent layers (and base images).
If your only problem is transferring the images, consider either techniques to reduce the image size (for example, use a multi-stage image to not include build-time dependencies in the final image) or using tools like split(1) to break the tar file into smaller parts.
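For example, a minimal sketch of the split(1) approach (the image name and the 500M chunk size are placeholders):
docker save myimage:latest > myimage.tar
split -b 500M myimage.tar myimage.tar.part-
# copy the myimage.tar.part-* files to machine B, then reassemble and load:
cat myimage.tar.part-* | docker load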
I believe the docker save tar file output is the same as the "Export an image" API call. It might be possible to manually edit that tar file to delete layers, and there might be tools out there that do this. (This is not a particularly mainstream path, though; I looked into it several years ago but haven't done it myself, and I occasionally see such tools mentioned in infrequent SO answers.)
Given that the standard behavior of docker pull and docker save is to only ever create complete image chains, in principle there's no way to set up Docker so that you have only the "top half" of an image without the base layers below it. Editing the docker save tar files by hand would violate this invariant.
I have just copied a docker image from one repository to another, by pulling an explicit sha256 hash tag from our OpenShift 3.11 external repository, retagging it to our Harbor 1.9.2 repository and pushing that tag.
In that process the sha256 key for the new image was shown, and it was different from the sha256 key I started out with. This was unexpected, as I did not change anything about the image except assigning it another tag, so the bytes should be the same, giving the same hash.
Does this mean that the algorithms for some reason are different? That the repository name is included in the hash key calculation? Or something else?
You are confusing the image id digest with the layer digests. If you docker inspect those images you will notice that the underlying layer digests will match exactly.
Each image in the registry gets an image id. Run docker images --digests --no-trunc and notice that you will see a digest column and an image id column and they are not the same. The digest column is the digest of the manifest and is shown as RepoDigests in docker inspect output. If the manifest contains the name and the tag, then the digest will be different as well.
Also try diff <(docker inspect image_id_1) <(docker inspect image_id_2) to see what is going on.
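As a quick check of the claim that the layer digests match, you can compare only the layers (a sketch; the two image references are placeholders for your OpenShift and Harbor copies):
diff <(docker inspect --format '{{json .RootFS.Layers}}' registry-a.example/myimage@sha256:<digest>) \
     <(docker inspect --format '{{json .RootFS.Layers}}' harbor-b.example/myimage:mytag)
If both copies really are the same image, this prints nothing.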
See this answer and this article for additional details.
I would like to edit a docker image's metadata for the following reasons:
I don't like an image parent's EXPOSE, VOLUME, etc. declarations (see #3465; the Docker team did not want to provide a solution), so I'd like to "un-volume" or "un-expose" the image.
I don't like an image's ContainerConfig (see docker inspect [image]) because it was generated from a running container using docker commit [container]
Fix errors during docker build or docker run like:
cannot mount volume over existing file, file exists [path]
Is there any way I can do that?
It's a bit hacky, but it works (a worked example follows the steps):
Save the image to a tar file:
$ docker save [image] > [targetfile.tar]
Extract the tar file to get access to the raw image data:
tar -xvf [targetfile.tar]
Look up the image metadata file via the manifest.json file: there should be a Config key whose value is a [HEX].json file name, and a file with exactly that name in the root of the extracted folder.
This is the file containing the image metadata. Edit as you like.
Pack the extracted files back into a new .tar archive
Use cat [new.tar] | docker load to re-import the modified image
Use docker inspect [image] to verify your metadata changes have been applied
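Putting the steps together, a rough sketch (the image name is a placeholder, and jq is assumed for reading manifest.json):
docker save myimage:latest > myimage.tar
mkdir image_edit && tar -xf myimage.tar -C image_edit
jq -r '.[0].Config' image_edit/manifest.json   # prints the [HEX].json config file name
# edit image_edit/[HEX].json as needed, e.g. the entries under .config.Volumes or .config.ExposedPorts
cd image_edit && tar -cf ../myimage-edited.tar * && cd ..
cat myimage-edited.tar | docker load
docker inspect myimage:latest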
EDIT:
This has been wrapped into a handy script: https://github.com/gdraheim/docker-copyedit
I had come across the same workaround - since I have to edit the metadata of some images quite often (fixing an automated image rebuild from a third party), I have created a little script to help with the save/unpack/edit/load steps.
Have a look at docker-copyedit. It can remove or override volumes, as well as set other metadata values like entrypoint and cmd.
I want to know when I pulled a certain image. When you run docker images, the Created field appears, but the date the image was pulled doesn't.
If you installed docker-engine from the official repositories on your Linux distribution, its data should be under /var/lib/docker; if you use a custom configuration, find the respective path.
There is /var/lib/docker/image/aufs/repositories.json file where docker stores images with their sha256 values.
cat /var/lib/docker/image/aufs/repositories.json
Find the image you are looking for and copy its sha256 hash somewhere.
Then:
ls /var/lib/docker/image/aufs/imagedb/content/sha256 -lash
Find the sha256 value you found in repositories.json, then look at the date.
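For example, putting the two steps together (a sketch that assumes the aufs storage driver paths described above; the image name is a placeholder):
HASH=$(docker inspect --format '{{.Id}}' myimage:latest | cut -d: -f2)
ls -l /var/lib/docker/image/aufs/imagedb/content/sha256/$HASH
The modification time of that file is roughly when the image content was written locally, i.e. when it was pulled (or built).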
I am playing around with Docker for a couple of days and I already made some images (which was really fun!). Now I want to persist my work and came to the save and export commands, but I don't fully understand them.
What is the difference between save and export in Docker?
The short answer is:
save will fetch an image: for a VM or a physical server, that would be the installation .ISO image or disk. The base operating system.
It will pack the layers and metadata of all the chain required to build the image. You can then load this "saved" images chain into another docker instance and create containers from these images.
export will fetch the whole container: like a snapshot of a regular VM. Saves the OS of course, but also any change you made, any data file written during the container life. This one is more like a traditional backup.
It will give you a flat .tar archive containing the filesystem of your container.
Edit: as my explanation may still lead to confusion, I think that it is important to understand that one of these commands works with containers, while the other works with images.
An image has to be considered as 'dead' or immutable, starting 0 or 1000 containers from it won't alter a single byte. That's why I made a comparison with a system install ISO earlier. It's maybe even closer to a live-CD.
A container "boots" the image and adds an additional layer on top of it. This layer stores any change on the container (created/changed/removed files...).
There are two main differences between save and export commands.
The save command saves the whole image with history and metadata, but the export command exports only the file structure (without history and metadata). So the exported tar file will be smaller than the saved one.
When you use the exported file system to create a new image, the new image will not contain any USER, EXPOSE, RUN, etc. commands from your Dockerfile. Only the file structure is transferred.
So when you use those keywords in your Dockerfile, you cannot use the export command to transfer the image to another machine - you always need to use the save command.
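A small sketch of that metadata loss (names are placeholders):
docker export myapp_container | docker import - myapp:flat
docker inspect --format '{{.Config.ExposedPorts}} {{.Config.Entrypoint}}' myapp:flat
# both come back empty: the EXPOSE, ENTRYPOINT, USER, CMD, ... from the original Dockerfile are gone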
export: container (filesystem)->image tar.
import: exported image tar-> image. Only one layer.
save: image-> image tar.
load: saved image tar->image. All layers will be recovered.
From Docker in Action, Second Edition p190.
Layered images maintain the history of the image, container-creation metadata, and old files that might have been deleted or overridden.
Flattened images contain only the current set of files on the filesystem.
The exported image will not have any layer or history information saved, so it will be smaller and you will not be able to rollback.
The saved image will have layer and history information, so larger.
If giving this to a customer, the Q is do you want to keep those layers or not?
Technically, save/load works with repositories, which can contain one or more images; each image is in turn made up of one or more layers. Finally, a container is an instantiated image (running or not).
docker save produces a tar file containing all parent layers, and all tags + versions (or only the specified repo:tag), for each image argument provided.
docker export produces a tar archive of a container's flattened filesystem, without the contents of any volumes.
docker save is used on a docker image, while docker export is used on a container (that is, a running or stopped instance of an image).
Save Usage
docker save [OPTIONS] IMAGE [IMAGE...]
Save an image(s) to a tar archive (streamed to STDOUT by default)
  --help=false         Print usage
  -o, --output=""      Write to a file, instead of STDOUT
Export Usage
docker export [OPTIONS] CONTAINER
Export the contents of a container's filesystem as a tar archive
  --help=false         Print usage
  -o, --output=""      Write to a file, instead of STDOUT