Is docker push/pull atomic?

The question is as simple as it seems.
What happens if a push is interrupted (with ^C, say) partway through, while the remote repository already has an image with the same name and tag?
Will the successfully uploaded layers overwrite the existing image, possibly corrupting it?
The same thing could happen locally on a pull.
Has anyone already investigated this?

Existing layers are not overwritten.
This is how docker push/pull works according to the v2 API:
A Docker image is made up of a signed manifest file and the individual layers. The layers are stored as blobs in the registry, keyed by their digests. The manifest file has all the details required to pull, install, validate and run an image, including the list of layers making up the image.
Pushing an image
When you push an image, the client pushes the layers first and then uploads the signed manifest. So if the push is interrupted before the manifest is uploaded, the registry will have some unreferenced blobs lying around. When garbage collection is triggered, these blobs are removed.
When uploading a layer, the client asks the registry whether it already has that layer. If the registry already has the layer, the upload of that particular layer is skipped. If it doesn't, the client requests an upload and the registry returns a URL the client can use to upload the layer. A layer can be uploaded in chunks or as a single monolithic chunk. Once all chunks are uploaded, the client must send the digest of the layer to the registry, which the registry validates, returning a success message if the digest of the uploaded content matches. Only after verifying the digest is the upload considered complete.
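A rough sketch of that handshake, with the repository name `myrepo` as a placeholder and the endpoint paths following the v2 API:

```python
import hashlib

def blob_digest(content: bytes) -> str:
    """Registries key blobs by the sha256 digest of their content."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

layer = b"...compressed layer tarball bytes..."
digest = blob_digest(layer)

# 1. Ask the registry whether it already has the blob:
#      HEAD /v2/<repo>/blobs/<digest>   -> 200 means skip the upload
head_url = f"/v2/myrepo/blobs/{digest}"

# 2. Otherwise start an upload session; the registry returns an upload URL:
#      POST /v2/<repo>/blobs/uploads/   -> Location: <upload-url>
# 3. Upload in chunks (PATCH) or as one monolithic chunk, then finalize
#    with the digest so the registry can verify what it received:
#      PUT <upload-url>?digest=<digest>
print(digest)
```

The key point is step 3: the upload only becomes a real blob once the registry has verified the digest, which is why an interrupted push cannot corrupt an existing layer.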
Once all the layers are pushed, the client uploads the image manifest file. The registry checks that it has all the layers referenced in the manifest and returns an appropriate error, such as BLOB_UNKNOWN, if it doesn't.
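A minimal schema-2 manifest illustrating this; the digests are placeholders and the descriptor `size` fields are omitted for brevity:

```python
import json

# Illustrative digests; a real client computes these from the blob contents.
config_digest = "sha256:" + "a" * 64
layer_digest = "sha256:" + "b" * 64

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "digest": config_digest,
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "digest": layer_digest,
        }
    ],
}

# The registry accepts the manifest PUT only if every referenced blob
# already exists; otherwise it answers with a BLOB_UNKNOWN error.
body = json.dumps(manifest)
```

Until this manifest is accepted, the uploaded blobs are invisible to pullers, which is what makes the push effectively atomic from a client's point of view.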
Pulling an image
Pulling an image works in a similar way, but in the opposite order: the client first requests the image manifest and then downloads the layers it doesn't already have. The download is complete only once the digests are verified.
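The digest check on pull amounts to recomputing the sha256 of the downloaded content and comparing it to the digest the manifest listed; a minimal sketch:

```python
import hashlib

def verify_blob(content: bytes, expected_digest: str) -> bool:
    """A pulled blob is accepted only if its recomputed digest matches
    the digest the manifest listed for it."""
    actual = "sha256:" + hashlib.sha256(content).hexdigest()
    return actual == expected_digest

blob = b"layer contents"
good = "sha256:" + hashlib.sha256(blob).hexdigest()

assert verify_blob(blob, good)                      # intact download
assert not verify_blob(b"corrupted contents", good)  # truncated/corrupt download rejected
```

So an interrupted pull leaves at worst a partial download that fails verification and is discarded, never a corrupted local image.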

Related

Does Docker Hub utilize checksum based storage?

I am trying to understand how the underlying storage works for Docker Hub. For context, JFrog states that they use checksum-based storage, ensuring not only that each image is stored only once, but that each individual layer composing an image is stored only once, even if that layer is reused in another image.
This may have side effects that I'm trying to understand when cleaning and removing old artifacts and images from JFrog (and potentially Docker Hub). I would like to know if Docker Hub functions in a similar way, and cannot find a clear answer in the documentation.
There seem to be two questions here, one for Docker Hub and one for Artifactory.
Let me try addressing the Artifactory side. Your understanding is correct: Artifactory is checksum-based and stores every layer only once.
Use case 1:
We publish two images with a few layers in common. Even if we delete one image, the shared layers will not be deleted, because a reference to them still exists.
Use case 2:
Say we pull two images from Docker Hub that have a layer in common (when we pull, Artifactory saves a copy in the remote cache and the binary store); only the unique items are saved. When we delete an image, only the unreferenced layers are deleted. This is local to Artifactory and will not delete anything from the remote endpoint, Docker Hub.
Container registries are implemented as a combination of a Directed Acyclic Graph (DAG) and Content Addressable Storage (CAS). Each image has a manifest, in json, that lists the blobs for the layers and image config. Those blobs are referenced by their digest, and the API to push and fetch those blobs includes that digest. So two different images in the same repository that have a manifest referencing the same blobs will use the same API to pull those blobs. There's no way to tell the difference between the requests, so there's no need to store the same blob twice.
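A toy model of the CAS half of that design, showing why pushing the same blob from two different images stores it only once:

```python
import hashlib

class BlobStore:
    """Toy content-addressable store: blobs are keyed by digest,
    so identical content occupies exactly one slot."""

    def __init__(self):
        self.blobs = {}

    def put(self, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        # Writing the same bytes under the same digest is a no-op,
        # which is why the registry never needs to "overwrite" a layer.
        self.blobs[digest] = content
        return digest

store = BlobStore()
shared_layer = b"base image layer"
d1 = store.put(shared_layer)   # pushed as part of image A
d2 = store.put(shared_layer)   # pushed again as part of image B

assert d1 == d2                # same content, same address
assert len(store.blobs) == 1   # stored only once
```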
When deleting content, you shouldn't delete the blobs. Instead, delete the manifests you no longer want, and rely on the registry to garbage collect those blobs.
However, manifests are deleted by digest, and multiple tags can point to the same manifest. You can also have a manifest list, used for multi-platform images, that points to other manifests. While there is an API to delete tags, most registries don't implement it, so before deleting a manifest you need to verify that no tag you want to keep still references it. To minimize that risk, I delete a tag by pushing an empty manifest to the tag I want to delete, and then deleting the digest of that empty manifest.
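A toy model of why that caution is needed, assuming a simple tag-to-digest mapping (the tag names are invented):

```python
# Tags are pointers to manifest digests; deleting a manifest by digest
# silently breaks every tag that still references it.
tags = {
    "v1.0": "sha256:" + "a" * 64,
    "latest": "sha256:" + "a" * 64,  # both tags point at the same manifest
    "v2.0": "sha256:" + "b" * 64,
}

def delete_manifest(tags, digest):
    """Registry-side deletes are by digest: every tag pointing at that
    digest dangles, regardless of which tag you meant to remove."""
    return {t: d for t, d in tags.items() if d != digest}

# Trying to remove only "v1.0" by deleting its manifest also takes out "latest":
remaining = delete_manifest(tags, tags["v1.0"])
assert "latest" not in remaining
assert "v2.0" in remaining
```

Retagging with a throwaway (empty) manifest first, as described above, makes the digest you then delete one that only the doomed tag references.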
For more details on how registries work, see the OCI distribution-spec.

Do repositories in Docker contain one base layer on top of which all other layers are added?

I am learning about the Docker and want to clarify the meaning of a repository.
There is a single chain of union file system layers. There are references to different (maybe not all) layers in this chain, and these references are called tags. The set of tags referencing layers in a single chain is, in the Docker context, called a repository. Is that correct?
I am specifically interested in whether we can have more than one chain of layers in one repository. As I understand it, we cannot; is that so?
A repository is a bit like a namespace for registries. Authorization is handled at the repository level, and blobs and manifests must be pushed (or mounted) into a repository before they can be pulled from that repository.
Those blobs are image layers and configs, each referenced by their digest (registries are a CAS, a content-addressable store). Since the blobs are content addressable, each is stored only once, and they are automatically deduplicated if you attempt to push them again. The manifest for an image lists the layers and a config by digest, and is itself addressed by a digest. A tag is a (typically mutable) pointer to a manifest digest.
The layers referenced by any two manifests in the same repository can be identical, overlapping, or completely disjoint. There's no hierarchical requirement between two manifests to have anything in common. So the short answer to the question is yes, you can have more than one chain of layers in one repository.
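A small sketch of that, with invented tag names and placeholder digests: two manifests in one repository that share no layers at all.

```python
# One repository, two tagged manifests with completely disjoint layer sets;
# nothing requires them to share a base layer.
repo = {
    "manifests": {
        "sha256:" + "1" * 64: {"layers": ["sha256:" + "a" * 64, "sha256:" + "b" * 64]},
        "sha256:" + "2" * 64: {"layers": ["sha256:" + "c" * 64]},
    },
    "tags": {
        "debian-variant": "sha256:" + "1" * 64,
        "scratch-variant": "sha256:" + "2" * 64,
    },
}

layers_1 = set(repo["manifests"][repo["tags"]["debian-variant"]]["layers"])
layers_2 = set(repo["manifests"][repo["tags"]["scratch-variant"]]["layers"])

# Two independent layer chains coexisting in one repository:
assert layers_1.isdisjoint(layers_2)
```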
All middle and base layers are cached and stored once (per layer id/hash of the build steps), even if they are used in different builds (image tags) or in different images that share the same first build steps.

Universal way to check if container image changed without pulling that works for all container registries

I'm writing a tool to sync container images from any container registry. In order to sync images, I need a way to check whether a local image:tag differs from the remote image:tag, possibly by comparing image SHA IDs (since the image SHA digest is registry-based). Due to the nature of my tool, pulling the image first and then comparing with docker inspect is not suitable.
I was able to find some posts like this or this. They either tell me to use the docker v2 API to fetch remote metadata (which contains the image ID) and compare it with the local image ID, or to use container-diff (which seems to have been made for a more complicated problem: comparing packages from package-management systems inside images). The docker v2 API method is not universal because each registry (docker.io, gcr.io, ECR) requires different headers, authentication, etc. Therefore, container-diff seems the most suitable choice for me, but I have yet to figure out a way to simply output true/false when the local and remote images differ. Also, it seems this tool pulls images before diffing them.
Is there any way to do this universally for all registries? I see there are tools that already implement this feature, like fluxcd for Kubernetes, which can sync a remote image to a local pod image, but I have yet to learn their technical details.
On a high level your approach of comparing the SHA values is correct; however, you need to dive deeper into the container spec, as there is more to it (layers and blobs).
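One illustration of what comparing the SHA values means in practice: the digest a registry reports in its Docker-Content-Digest header (in response to a HEAD request for the manifest) is the sha256 of the exact manifest bytes it serves, so it can be compared against the local image's RepoDigests entry without downloading anything. A minimal sketch; the actual HTTP call and per-registry authentication are left out, since that is exactly the non-universal part the question describes:

```python
import hashlib

def manifest_digest(manifest_bytes: bytes) -> str:
    """The registry-side digest is just the sha256 of the manifest bytes,
    so two copies of the same image always compare equal by digest."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

# In practice you would issue:
#   HEAD /v2/<repo>/manifests/<tag>
#   Accept: application/vnd.docker.distribution.manifest.v2+json
# and read the Docker-Content-Digest response header, then compare it
# against the digest recorded locally (e.g. in RepoDigests). A mismatch
# means the remote tag has moved.
local = manifest_digest(b'{"schemaVersion": 2}')
remote = manifest_digest(b'{"schemaVersion": 2}')
assert local == remote
```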
There are already tools out there that can copy images from one registry to another. By default these tools don't copy the data if the image already exists in the target. Skopeo is a good starting point.
If you plan to copy images between different registries, you need to handle each registry individually. I would also recommend taking a look at Harbor. The Harbor container registry has the capability to copy images from and to various registries built in. You can use Harbor as your solution or as a starting point for your endeavor.

Can you push a Docker image to a private repo and only include differences from a public image?

Suppose I have a private Docker repository at myrepo.myhost.com.
I now build an image off of a very large public Docker Registry image. Assume it's called bandwidthguy/five-gigabyte-image:latest.
I have a Dockerfile that does one simple thing, for example:
FROM bandwidthguy/five-gigabyte-image
COPY some-custom-file /etc/bigstuff
I build the image:
docker build -t myversionof-five-gigabyte-image .
and tag it.
docker tag myversionof-five-gigabyte-image:latest myrepo.myhost.com/myversions/five-gigabyte-image:latest
Now I push to my repo.
docker push myrepo.myhost.com/myversions/five-gigabyte-image
I noticed that when doing this, the entire large source image gets pushed to my repository.
What I'm wondering is if there is any way to somehow have Docker only push a difference image, and then pull the other layers from their respective sources when the image is pulled. Pushing the entire image to my private repo can have problems:
If the private repo is hosted on my home ISP, my limited upstream can cause major lag when pulling the image while out and about.
If the private repo is on a hosted service, it might have a disk quota and I am using 5GB of that quota needlessly.
It takes a long time to push the image, especially if I have slow upload speed at the time.
It may just be the case that you can't put the parts on different servers, but I figured it's worth an ask to see if it can be done. It would make sense that you could store all the layers on your own host for the purposes of running an air-gapped server, but it seems a bit of an oversight that you can't pull the source images from the Registry.
This question showcased my early misunderstanding of Docker. There is no current mechanism for storing different layers of an image on different repositories. While there's no theoretical reason this couldn't be implemented, I'm guessing it's just not worth the extra effort.
So, the answer to my question is no, you can't store only image differences in a private repo - you'll be storing all layers, including those that were pulled from the public repo, in your private repo. However, since layers are represented by their hashes, clients that have already pulled the image from the public repo won't need to re-download those layers again from the private repo. This leads to the possibility that perhaps the hashes of the very large layers could be kicked out of the private repo manually, and then users could be required to first pull the source image from public manually. (Pulling fresh from the private repo only would error out.) I haven't looked into this, but it might be a possible hacky solution.
Luckily, there aren't too many Docker images that actually need multiple gigabytes of space. Even so, layers are stored compressed and deduplicated in the registry.

Does a docker feature exist similar to save that only saves the new layer as a .tar?

I have a question about whether a feature exists in Docker. I have a local network with multiple computers (all with Docker) and a local registry. There is one base image (2.5 GB) on every machine. Devs on the network run a container, make modifications, and can push to the registry and pull, and only the new layer is downloaded. They can also save and load, and the load is almost instant because the base layer already exists. Is there any way (docker diff?) to use an alternative to docker save that saves only that single new layer, for a smaller file transfer? My base image is 2.5 GB, so each additional layer is trivial by comparison, but there will be real scenarios where a registry won't be available and we will have to use save/load. Everything works, but if we could cut 2.5 GB out of each transfer, that would be amazing.
This is a post from 2014 asking for it, but I don't think anything official became of it.
Any ideas? Thanks.
