Deleting Docker Images from Artifactory - docker

We are doing an Artifactory cleanup, removing Docker images from our Docker repo.
I understand from the article at https://jfrog.com/knowledge-base/how-can-i-delete-docker-images-older-than-a-certain-time-period
that if a layer is shared between two different images and only one of them is a candidate for deletion, then that layer will not be deleted from the binary storage.
Our policy is to delete some specific tag versions (those not used in production), and we have a question based on the above article.
Is there a possibility that we will end up with partially deleted (corrupted) images? Say some of the layers of the image we are trying to delete are referenced by some other image: would the layers be partially deleted, leaving us with a corrupted image that could be pulled but would then fail?

You are correct: if the layer is referenced by another tag, it will not be deleted.
The worst that can happen is that you will leave a tag behind that points to a non-existent image.
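
To illustrate why a shared layer survives, here is a minimal Python sketch of reference-checked layer cleanup. It is a toy model under stated assumptions, not Artifactory's actual implementation; the tag names and layer digests are hypothetical.

```python
def delete_tag(images, tag):
    """Remove `tag` and return the set of layers that became unreferenced."""
    removed_layers = set(images.pop(tag))
    # Layers still used by any remaining tag must stay in storage.
    still_referenced = set()
    for layers in images.values():
        still_referenced.update(layers)
    return removed_layers - still_referenced


# Hypothetical repository: two tags that share the layer "sha256:base".
images = {
    "app:1.0": ["sha256:base", "sha256:old-code"],
    "app:2.0": ["sha256:base", "sha256:new-code"],
}

deletable = delete_tag(images, "app:1.0")
print(sorted(deletable))  # ['sha256:old-code']
```

Only the layer unique to the deleted tag is released; the shared base layer stays because app:2.0 still references it, so the remaining image is intact.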

Related

Why is a new docker image the same size of the original one from which the commit was made?

I downloaded a Docker image and made some changes inside a container based on it. Then I committed those changes to create a new image that I would actually like to use.
docker images says that these images have about the same size. So, it seemed to me that Docker copied everything it needs to the new image.
Yet I can't remove the old image which I no longer need. It seems like I'm getting the worst of both worlds: neither is space conserved by a parenting relationship, nor can I delete the unwanted files.
What gives? Am I interpreting docker images output wrong (maybe it's not reporting the actual on-disk size)?
You can remove the first image with force:
docker image rm -f $IMAGE_ID
As for the sizes being the same, that depends mainly on your changes. You can check whether they match exactly at the byte level with:
docker image inspect IMAGE_NAME:$TAG --format='{{.Size}}'
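
One likely reason the two sizes look alike: the SIZE column is cumulative, so it includes the layers an image shares with its parent. A committed image reuses the original's layers and adds one on top, so both listings show nearly the full total even though the shared layers are stored once. The Python sketch below models this with hypothetical layer names and sizes; it does not read Docker's real image store. On a real host, docker system df -v should show the shared and unique portions per image.

```python
# Hypothetical layer sizes in bytes; real values come from the image store.
layer_sizes = {"base": 120_000_000, "deps": 30_000_000, "commit": 2_000_000}

# The committed image reuses every layer of the original and adds one on top.
original = ["base", "deps"]
committed = ["base", "deps", "commit"]


def reported_size(layers):
    """Cumulative size, the way a per-image listing counts it."""
    return sum(layer_sizes[name] for name in layers)


def disk_usage(*images):
    """Actual storage: each distinct layer is stored only once."""
    unique = set().union(*images)
    return sum(layer_sizes[name] for name in unique)


print(reported_size(original))          # 150000000
print(reported_size(committed))         # 152000000
print(disk_usage(original, committed))  # 152000000
```

In this model the two images together occupy no more space than the larger one alone, which is the space saving the question expected from the parent relationship.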

How to add/mount large files kept in SharePoint to Docker Container through Dockerfile

I'm new to using Docker and wanted to understand how to add large folders (combined ~1GB) kept elsewhere (such as in SharePoint) to the Docker container using Dockerfile. What is the best way to add the files and can someone explain the commands to be used? For example, one method I have come across is the following:
ADD http://example.com/big.tar.xz /usr/src/things/
Does the /usr/src/things/ specify the location where I want to save the folders (not individual files) with respect to my original repository?
This answer is from "Adding large files to docker during build", which covers the question at a high level. Can someone share details/commands for each step involved? One answer mentions not adding the files to the image but mounting them as a volume. Is that a better option than using ADD in the Dockerfile?
Thanks!

Are tags exclusive to one image in a Docker repo?

I thought that tags in Docker worked like in stackoverflow where millions of questions can be tagged with the same tag. But when I tag a second image in Docker the first one loses its tag:
So are images to tags one-to-many, i.e. one image can have multiple tags in a repo, but a tag cannot be applied to 2 or more images in the same repo?
Pushing a new tag replaces the old tag, but if you know the digest, you can pull the old manifest until the registry garbage collects it.
A tag is a pointer to a manifest in the registry, and it can only point to a single manifest, similar to a symlink in Linux. This is needed since everything else in the registry is content addressable, so you need the tag to avoid needing to remember long digests.
There are a couple of manifest types: an image manifest and a manifest list. The manifest list contains references to other manifests and is commonly used for multi-platform images. So a tag pointing to a manifest list can refer to multiple images. But a runtime will only pull a single image out of that list. And that list is generated by the tool pushing the image, not dynamically created by the registry by merging the previous images into a list (that would break the content-addressable logic, since it would change the digest).
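
The pointer behaviour described above can be sketched in a few lines of Python. This is a toy model, not the registry's real storage code: the ToyRegistry class, its methods, and the manifest contents are all hypothetical.

```python
import hashlib
import json


class ToyRegistry:
    """Toy model: manifests stored by digest, tags are mutable pointers."""

    def __init__(self):
        self.manifests = {}  # digest -> manifest bytes (content addressable)
        self.tags = {}       # tag name -> digest (exactly one digest per tag)

    def push(self, tag, manifest):
        body = json.dumps(manifest, sort_keys=True).encode()
        digest = "sha256:" + hashlib.sha256(body).hexdigest()
        self.manifests[digest] = body
        self.tags[tag] = digest  # overwrites any previous pointer
        return digest

    def pull(self, reference):
        # A reference is either a tag name or a digest.
        digest = self.tags.get(reference, reference)
        return json.loads(self.manifests[digest])


registry = ToyRegistry()
old = registry.push("latest", {"layers": ["a", "b"]})
new = registry.push("latest", {"layers": ["a", "c"]})

print(registry.pull("latest"))  # {'layers': ['a', 'c']}
print(registry.pull(old))       # {'layers': ['a', 'b']}
```

The second push moves the tag, yet the first manifest remains reachable by its digest until something removes it.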

GC collection of Docker Registry

Since v2.4.0 a garbage collector command is included within the registry binary. I read about how it works in the official documentation.
To use the garbage-collection:
bin/registry garbage-collect [--dry-run] /path/to/config.yml
I see the config in /etc/docker/registry/config.yml
When I just perform a dry run I see a lot of blobs marked, and at the end the blobs which would have been deleted without the dry run.
But I don't see how I can easily link these blobs to images.
Which images will be deleted? Am I able to tell which image should be deleted, or do I need to use another command and then run the GC afterwards?
Can someone maybe provide an example in which case an image/blob will be deleted? Thanks
From your referenced documentation:
In the context of the Docker registry, garbage collection is the process of removing blobs from the filesystem which are no longer referenced by a manifest. Blobs can include both layers and manifests.
Manifests are groups of blobs (layers) used to represent an image tag. The only blobs deleted are those no longer referenced by any image. So to answer your question: if GC is working correctly, no one should be able to give an example of it deleting an image that is still referenced. What it does delete is blobs that nothing points to any more, and that includes your own blobs once their manifests are gone.
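
As a rough illustration of the mark-and-sweep idea, the Python sketch below uses two hypothetical manifests, A and B, that share layer a. It is a simplified model of the concept, not the registry's actual garbage-collect code.

```python
def garbage_collect(manifests, blobs):
    """Mark every blob referenced by a manifest, then sweep the rest."""
    marked = set()
    for layers in manifests.values():
        marked.update(layers)
    return blobs - marked  # blobs eligible for deletion


blobs = {"a", "b", "c"}
manifests = {"A": ["a", "b"], "B": ["a", "c"]}

# Everything is referenced, so nothing is eligible.
print(garbage_collect(manifests, blobs))  # set()

# After manifest B is deleted, only its unshared layer becomes eligible.
del manifests["B"]
print(garbage_collect(manifests, blobs))  # {'c'}
```

In this model a blob becomes eligible only after the manifest referencing it has been removed, so deleting the manifest comes first and the GC run second.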

What is the relationship between a docker image ID and the IDs in the manifests?

I am trying to understand the connection between the image ID as reported by docker images (or docker inspect) and the actual layers or images in the registry or manifest (using v2).
When I run docker images, I get (abbreviated and changed to protect the not-so-innocent):
REPOSITORY                     TAG      IMAGE ID
my.local.registry/some/image   latest   abcdefg
If I pull the manifest for the above image using the API, I get one that contains fsLayers, not one of which matches the (full) ID for the image. I get that, since the image is the sum of the layers.
However, if I pull that image elsewhere, I get the same ID. If I update the image, push and pull it, the new version has a new ID.
I thought it might be the hash of the manifest. However, (a) pulling the manifest via the API does not return the hash of the manifest in the JSON, and (b) looking in the registry directory itself, the sha256 of the given manifest in /var/lib/registry/v2/repositories/some/image/_manifests/tags/latest/current/link (or those in index/sha256/) gives the correct link for the manifest that was downloaded, but does not match the image ID.
I had assumed the image ID matches the blob, but perhaps I am in error?
When we push an image to a registry:
We create a manifest that defines the image (the layers inside it) and push the manifest and the layers independently.
We compress the layers and only then push them.
So on our host, the hashes we have are hashes of the content present in those layers, called the content hashes.
But we send compressed layers to the registry, so the content changes and the hashes change with it. Before the compressed layers are sent, their hashes are calculated; these are called the distribution hashes, and they are what goes into the manifest file.
This difference between the content and distribution hashes is why you see different IDs.
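
The effect of compression on the hash is easy to reproduce. The Python sketch below hashes a stand-in byte string before and after gzip compression; the layer content is hypothetical, and real layers are tar archives handled by Docker itself.

```python
import gzip
import hashlib

# Stand-in for an uncompressed layer tarball (hypothetical content).
layer = b"example layer contents\n" * 1000

# Hash of the uncompressed content, as kept on the local host.
content_hash = hashlib.sha256(layer).hexdigest()

# Hash of the compressed bytes, as recorded in the manifest sent to the registry.
compressed = gzip.compress(layer, mtime=0)  # fixed mtime keeps the output reproducible
distribution_hash = hashlib.sha256(compressed).hexdigest()

print(content_hash == distribution_hash)     # False
print(gzip.decompress(compressed) == layer)  # True
```

The same layer therefore has two different digests depending on whether it is hashed before or after compression, even though decompressing restores identical content.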
