check existence of docker image in private registry in Go - docker

I'm wondering how I can check that a docker image exists in a private registry (in eu.gcr.io), without pulling it.
I have a service, written in golang, which needs to check for the existence of a docker image in order to validate a config file passed to it by a user.
Pulling the image using the go docker client, as shown here, works. However, I don't want to pull down images just to check they exist, as they can be large.
I've tried using Client.ImageSearch, but this only searches public images. The cloud.google.com/go package also doesn't seem to have anything for dealing with the container registry.
There's possibly this and the crane tool it contains, but I'm really struggling to figure out how it works. The documentation is... not great.
I'd like the solution to be host agnostic, and the only option I have found is to simply make a http request and use the logic from this answer.
Are there any docker or other packages able to do this in a cleaner way?

Just realised the lib I've been using has an unhelpfully named client method, DistributionInspect (link), which just returns the image digest and manifest if the image is found, so the image doesn't get pulled down.
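For the host-agnostic plain-HTTP route mentioned in the question, a minimal sketch using only the Go standard library might look like the following. It relies on the Registry HTTP API v2 manifest endpoint, where a HEAD request returns 200 for an existing image and 404 otherwise. The helper names (manifestURL, imageExists, newFakeRegistry), the authHeader parameter, and the sha256:0000 digest are illustrative assumptions, and the demo runs against an in-memory stand-in rather than a real registry. How to obtain a credential for eu.gcr.io is not covered here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// manifestURL builds the Registry HTTP API v2 manifest endpoint for an image.
func manifestURL(baseURL, repo, ref string) string {
	return strings.TrimRight(baseURL, "/") + "/v2/" + repo + "/manifests/" + ref
}

// imageExists sends a HEAD request for the manifest, so no layers are downloaded.
// authHeader (hypothetical parameter) is sent as the Authorization header when
// non-empty; how that value is obtained depends on the registry.
func imageExists(baseURL, repo, ref, authHeader string) (bool, string, error) {
	req, err := http.NewRequest(http.MethodHead, manifestURL(baseURL, repo, ref), nil)
	if err != nil {
		return false, "", err
	}
	// Accept the common Docker and OCI manifest media types.
	req.Header.Set("Accept", strings.Join([]string{
		"application/vnd.docker.distribution.manifest.v2+json",
		"application/vnd.docker.distribution.manifest.list.v2+json",
		"application/vnd.oci.image.manifest.v1+json",
		"application/vnd.oci.image.index.v1+json",
	}, ", "))
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, "", err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		return true, resp.Header.Get("Docker-Content-Digest"), nil
	case http.StatusNotFound:
		return false, "", nil
	default:
		return false, "", fmt.Errorf("unexpected status %s", resp.Status)
	}
}

// newFakeRegistry starts an in-memory stand-in for a registry that knows a
// single image, my-project/my-image:v1 (placeholder name and digest).
func newFakeRegistry() (string, func()) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/v2/my-project/my-image/manifests/v1" {
			w.Header().Set("Docker-Content-Digest", "sha256:0000")
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	return srv.URL, srv.Close
}

func main() {
	baseURL, stop := newFakeRegistry()
	defer stop()
	for _, tag := range []string{"v1", "missing"} {
		ok, digest, err := imageExists(baseURL, "my-project/my-image", tag, "")
		fmt.Println(tag, ok, digest, err)
	}
}
```

Against a real registry, baseURL would be something like https://eu.gcr.io, and a 401 response would surface through the "unexpected status" error until a valid Authorization value is supplied.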

Related

Confused with the concept of docker tags

I am going through a book and came across this line:
Whereas a tag can only be applied to a single image in a repository, a single image can have several tags. For example, the Java repository on Docker Hub maintains the following tags: 7, 7-jdk, 7u71, 7u71-jdk, openjdk-7, and openjdk-7u71. All these tags are applied to the same image.
My question is: why will a single image have several tags? What is the purpose of tagging the same image with different tags?
In my opinion there are mainly two reasons.
First of all, it is for convenience, so you may give multiple aliases to the same images.
Second, you can give an image other (special) tags in order to push it to a different registry.
Suppose (for example) you are using microk8s and you enabled the registry service. To push a local image of yours, you have to tag it with a name of the form localhost:32000/my-image:my-tag.
In that scenario your image will have two tags, my-image:my-tag and localhost:32000/my-image:my-tag. So, to push it to the microk8s registry, the only thing you have to do is issue the command docker push localhost:32000/my-image:my-tag (the image tag is parsed to get the address of the registry to push to).
The above concept, obviously, can be applied to any other remote registry.
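To illustrate how a registry address can be read out of a tag, here is a deliberately simplified sketch. It assumes the first path component is a registry host when it contains a dot or a colon or equals localhost, and that docker.io and latest are the defaults otherwise. The imageRef type and parseRef function are hypothetical names, and digest references (name@sha256:...) are not handled, so this is not Docker's actual parser.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef is a hypothetical, simplified view of an image reference.
type imageRef struct {
	Registry   string
	Repository string
	Tag        string
}

// parseRef splits a reference into registry, repository and tag.
// Simplification: digest references are not supported.
func parseRef(ref string) imageRef {
	out := imageRef{Registry: "docker.io", Repository: ref, Tag: "latest"}
	// A leading component that looks like host[:port] names the registry.
	if i := strings.Index(ref, "/"); i >= 0 {
		first := ref[:i]
		if strings.ContainsAny(first, ".:") || first == "localhost" {
			out.Registry = first
			out.Repository = ref[i+1:]
		}
	}
	// Whatever follows the last colon of the remaining path is the tag.
	if i := strings.LastIndex(out.Repository, ":"); i >= 0 {
		out.Tag = out.Repository[i+1:]
		out.Repository = out.Repository[:i]
	}
	return out
}

func main() {
	for _, ref := range []string{"my-image:my-tag", "localhost:32000/my-image:my-tag", "ubuntu"} {
		fmt.Printf("%-34s %+v\n", ref, parseRef(ref))
	}
}
```

Both my-image:my-tag and localhost:32000/my-image:my-tag resolve to the same repository and tag here; only the registry part differs, which is what tells a push where to go.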
You can use multiple tags for all sorts of purposes. The most popular one is to have a "latest" image, for example.
Imagine you're pulling the latest ubuntu image.
That would be "docker pull ubuntu:latest". Try pulling ubuntu:20.04 - you'll find out you have already pulled the image.
NOTE: After a while, ubuntu:latest won't point to the same image as ubuntu:20.04 anymore, but to the newest release. However, you'll always have a pointer to the latest version of the ubuntu image, and wherever you're using it, you won't need to change the tags.

How to really delete Docker blobs and images to free up space in a private Docker registry

I am having trouble clearing up disk space in a private docker registry I am running. I am also having trouble understanding all the different digest SHAs returned by Docker API calls.
My private docker registry is set up in .../docker/registry/v2. It contains two folders: blobs (~1.4GB) and repositories (~944KB). I've pushed several repositories and multiple tags there.
At first I only needed to remove repo's/tags from being returned by some API calls for front-end display but now I need to free up disk space and can't figure out how.
1. Remove Repo and tag names (cosmetic)
I list docker repositories with their tags on a user front-end by parsing the results of the Docker API calls GET /_catalog and GET /<repo name>/tags/list for each repo. Sometimes I need to delete things so they don't show up in this list and I've been able to do that with the following approach.
GET /<repo name>/manifests/<tag>
DELETE /<repo name>/manifests/<digest SHA in headers.Docker-Content-Digest of the above response>
I'm not sure this is the right way to do this but I needed to somehow control the front-end display and this has worked for now.
This operation reduces the size of the local repositories folder by a few KB but doesn't affect the blobs folder.
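The two calls described above (look up the manifest for a tag, then delete by the digest from the Docker-Content-Digest header) can be sketched in Go roughly as follows. The /v2 path prefix, the function names deleteTag and newFakeRegistry, and the placeholder digest sha256:abc are assumptions for illustration. The demo talks to an in-memory stand-in, not a real registry. The sketch only mirrors the tag removal step and does not reclaim blob storage.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// Without this Accept header a registry may answer with an older manifest
// format, whose digest is not the one the DELETE call expects.
const manifestV2 = "application/vnd.docker.distribution.manifest.v2+json"

// deleteTag resolves a tag to its manifest digest, then deletes the manifest by digest.
func deleteTag(baseURL, repo, tag string) (string, error) {
	manifests := baseURL + "/v2/" + repo + "/manifests/"

	req, err := http.NewRequest(http.MethodGet, manifests+tag, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Accept", manifestV2)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("looking up %s:%s: %s", repo, tag, resp.Status)
	}
	digest := resp.Header.Get("Docker-Content-Digest")
	if digest == "" {
		return "", fmt.Errorf("registry sent no Docker-Content-Digest header")
	}

	req, err = http.NewRequest(http.MethodDelete, manifests+digest, nil)
	if err != nil {
		return "", err
	}
	resp, err = http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusAccepted {
		return "", fmt.Errorf("deleting %s: %s", digest, resp.Status)
	}
	return digest, nil
}

// newFakeRegistry serves one tag ("myrepo:v1") until its manifest is deleted.
func newFakeRegistry() (string, func()) {
	const digest = "sha256:abc" // placeholder, not a real digest
	var mu sync.Mutex
	deleted := false
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		switch {
		case r.Method == http.MethodGet && r.URL.Path == "/v2/myrepo/manifests/v1" && !deleted:
			w.Header().Set("Docker-Content-Digest", digest)
			fmt.Fprint(w, "{}")
		case r.Method == http.MethodDelete && r.URL.Path == "/v2/myrepo/manifests/"+digest && !deleted:
			deleted = true
			w.WriteHeader(http.StatusAccepted)
		default:
			http.NotFound(w, r)
		}
	}))
	return srv.URL, srv.Close
}

func main() {
	baseURL, stop := newFakeRegistry()
	defer stop()
	fmt.Println(deleteTag(baseURL, "myrepo", "v1")) // first call removes the manifest
	fmt.Println(deleteTag(baseURL, "myrepo", "v1")) // second call finds nothing to remove
}
```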
2. Clear up disk space and really delete repo's
Where I'm stuck is how to actually free up the large disk space taken up by the blobs directory. There are many different digest SHAs that are returned by GET /<repo name>/manifests/<tag>: body.config.digest and the various body.layers[i].digest.
I tried executing DELETE /<repo name>/blobs/<some digest> on these different digests. For some it is successful, for others it says blob not found. When successful the repositories directory size is reduced by a few KB but blobs directory size doesn't change.
I only tried this after I already hit DELETE /<repo name>/manifests/<headers.Docker-Content-Digest> because I didn't want to accidentally break something and not be able to remove the repo's/tags from the front-end.
My question is: what is the correct way to clear space due to repo's/tags from a private docker registry. My hope is that the best practice will also delete them from the /_catalog and /<repo name>/tags/list as well.
Any insight on the use/meaning of the different digests discussed above would also be greatly appreciated.

After deleting image from Google Container Registry and uploading another with the same tag, deleted one is still pulled

I don't know if this is intended behavior or bug in GCR.
Basically I tried to do it like this:
Create an image from local files using Docker on Windows (Linux-based image).
Before creating the image I delete all local images with the same name/tag.
The image is tagged like repository/project/name:v1
When testing locally, the image has the correct versions of the executables (docker run imageID).
Before pushing the image to GCR I delete all images from GCR with the same tag/name.
When trying to pull the new image from GCR into, for example, Kubernetes, it pulls the first (ever) image uploaded under that particular tag.
I want to reuse the same tag to not change config file with every test and I don't really need to store previous versions of images.
It sounds like you're hitting the problem described in kubernetes/kubernetes#42171.
tl;dr, the default pull policy of kubernetes is broken by design such that you cannot reuse tags (other than latest). I believe the guidance from the k8s community is to use "immutable tags", which is a bit of an oxymoron.
You have a few options:
Switch to using the latest tag, since kubernetes has hardcoded this in their default pull policy logic (I believe in an attempt to mitigate the problem you're having).
Never reuse a tag.
Switch to explicitly using the PullAlways ImagePullPolicy. If you do this, you will incur a small overhead, since your node will have to check with the registry that the tag has not changed.
Switch to deploying by image digest with the PullIfNotPresent ImagePullPolicy. A more detailed explanation is in the PR I linked, but this gets you the best of both worlds.
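The defaulting behaviour behind these options can be sketched as a small Go function. This reflects my understanding of the documented Kubernetes rule when no pull policy is set explicitly: a digest reference or a tag other than latest yields IfNotPresent, while latest or no tag yields Always. The function name defaultPullPolicy is hypothetical and the string parsing is simplified, so treat it as an illustration rather than the actual Kubernetes code.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy returns the pull policy assumed to apply when none is set.
func defaultPullPolicy(image string) string {
	// A digest pins the exact image, so a cached copy is always valid.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	// Only look for a tag after the last slash, so that a registry port
	// such as localhost:5000 is not mistaken for a tag.
	name := image
	if i := strings.LastIndex(image, "/"); i >= 0 {
		name = image[i+1:]
	}
	i := strings.LastIndex(name, ":")
	if i < 0 || name[i+1:] == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{
		"repository/project/name:v1",
		"repository/project/name:latest",
		"localhost:5000/name",
		"repository/project/name@sha256:abc",
	} {
		fmt.Printf("%-40s %s\n", img, defaultPullPolicy(img))
	}
}
```

With a reused tag such as v1 the result is IfNotPresent, which is why a node that already holds an image under that tag keeps using it.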

When would a Docker image and its repository have different names?

The standard usage of the docker tag command is:
docker tag <image> <username>/<repository>:<tag>
So for example: docker tag friendlyhello john/get-started:part1.
Coming from Java-land, I'm used to Maven/Gradle-style coordinates of group:artifact:version, so to me, it makes sense for the image and the repository to be one and the same:
The image is the artifact you're producing, and in Java-land there's usually a 1:1 relationship between the generated artifact and the source repo its code lives in. So to me, it makes more sense for the command to be just:
docker tag <username>/<repository>:<tag>
So for example: docker tag john/get-started:part1, where john is the username/group, get-started is the artifact/repo and part1 is the tag/version.
TO BE CLEAR: I am not asking what the difference is between an image and a repository! I understand that a repository is a location where images are stored, and I understand that an image is a Docker executable consisting of your Dockerized app and its dependencies. But from a naming standpoint, I'm confused as to why/when they should ever be different from one another.
So I ask: what is the difference between an image and a repository from a naming convention standpoint? For example, if I wanted to make my own MySQL Docker image, I'd choose to name the image "myapp-db", and that would also be the name of the repository where it lived (smeeb/myapp-db:v1, smeeb/myapp-db:v2, etc.).
So under what circumstances are/should image and repository names be different?
First a prerequisite: a tag is a pointer to an image, and an image is a sha256 reference to a manifest of configuration and layers that docker uses to make containers. What that means is that friendlyhello is not the name of an image, it's a tag that points to an image. The image is the id, something like c75bebcdd211.....
Next, each image can have zero, one, or multiple tags all pointing to it. When it doesn't have any tags pointing to it, that's referred to as a dangling image. That can happen if you build an image with a tag, and then rebuild it. The previous image is now untagged because the tag is pointing to the new image. Similarly you can have the tags image:latest, image:v1, image:1.0.1, and myrepo:5000/image:1.0 all pointing to the same image id.
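The "tags are pointers to image ids" model can be made concrete with a toy in-memory store. This is only an illustration of the idea, not how Docker stores images. The store type and its build, tag and dangling methods are hypothetical names, and the ids are made-up short strings.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a toy model: a set of image ids plus tags pointing at them.
type store struct {
	images map[string]bool   // image ids present locally
	tags   map[string]string // tag name -> image id
}

func newStore() *store {
	return &store{images: map[string]bool{}, tags: map[string]string{}}
}

// build models building an image under a tag: the image is recorded and the
// tag is moved to it, leaving whatever it pointed at before untouched.
func (s *store) build(id, tag string) {
	s.images[id] = true
	s.tags[tag] = id
}

// tag adds another pointer; source may be an existing tag or an image id.
func (s *store) tag(source, name string) error {
	id, ok := s.tags[source]
	if !ok {
		if !s.images[source] {
			return fmt.Errorf("no such image: %s", source)
		}
		id = source
	}
	s.tags[name] = id
	return nil
}

// dangling lists image ids that no tag points to, in sorted order.
func (s *store) dangling() []string {
	tagged := map[string]bool{}
	for _, id := range s.tags {
		tagged[id] = true
	}
	var out []string
	for id := range s.images {
		if !tagged[id] {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	s := newStore()
	s.build("id1", "image:latest")
	s.build("id2", "image:latest") // rebuild: the tag moves, id1 loses its only tag
	fmt.Println("dangling after rebuild:", s.dangling())

	// Several tags can point at the same id, addressed by tag or by id.
	_ = s.tag("image:latest", "image:v1")
	_ = s.tag("id2", "myrepo:5000/image:1.0")
	fmt.Println("tags:", s.tags)
}
```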
Tags have a dual use. They can be for convenience. But they are also used by docker push and docker pull to lookup where to send or retrieve the package. If you don't do a push or a pull, then you can name it whatever you want and no one will know the difference. But if you do want to store it on a registry, the tag needs to identify which registry, or the default docker hub. And that tag also needs to identify the path on the registry, called the repository, and the versioning after the colon.
One confusing bit is that the short name at the end of the repository name is often called an "image name", and the versioning after the colon is often called a "tag", and I think this is much easier to understand if you forget those terms were ever overloaded like that.
Now with all that background (sorry, that was a lot), here are a few corrections to the question:
Instead of:
docker tag <image> <username>/<repository>:<tag>
Think of the syntax as:
docker tag <source> <tag>
Where <source> can be an image id, or another tag name. This means the following command won't make sense:
docker tag <username>/<repository>:<tag>
Because docker tag needs a source to tag, and it has no sense of context for what image you are currently working with.
Lastly, as for why you would use a name other than your repository name for an image, here are a few reasons I've encountered:
The image won't be pushed to a repository. It could be for local testing, or an intermediate step in a workflow, or you build and run your images on the same system.
You may have multiple names for the same image. registry/repo/image:v1 and registry/repo/image:v1.0.1 is a common example. I'll also tag the current image in a specific environment with registry/repo/image:STAGE to note that it made it through dev and CI and is now in the staging environment.
You may be moving images between registries. We pull images from hub.docker.com and retag them locally with a local registry. That gives us both a local cache and also a way to control when we update our base images to the next version. That's preferable to having an upstream image update in the middle of a production rollout.
I've also used tags to override upstream images. So instead of changing all my build scripts for an issue I have with an upstream image, I can just make my change and tag it with the upstream name. Then as long as I don't run a pull on that docker host, the builds will run using my modified base image.
One situation where you can have an image with a different tag than the repository name is if you have an image in use that is outdated.
For instance, you download and run a mysql:5 image. This container is still running when you pull a newer version of the mysql:5 image. At that point the old image will be untagged (identifiable only by its hash), but not deleted, because it is still in use by the running MySQL container.
Another situation is that you can have intermediate images while building a new image. Basically each line gets committed as a new image, but they are not named with the name you specify as the final image name.
When using docker tag you don't even have to use the image name as the first parameter. You can even use the hash of the image that you want to tag as the first parameter, so it's more flexible than just namespace/repository:tag.
The difference between an image and repository must be stated:
An image is a tagged repository. That's the only difference. The <username> is part of the repository name.
From the overview of the Docker Registry Distribution API:
Classically, repository names have always been two path components
where each path component is less than 30 characters. The V2 registry
API does not enforce this. The rules for a repository name are as
follows:
A repository name is broken up into path components. A component of a
repository name must be at least one lowercase, alpha-numeric
characters, optionally separated by periods, dashes or underscores.
More strictly, it must match the regular expression
[a-z0-9]+(?:[._-][a-z0-9]+)*. If a repository name has two or more
path components, they must be separated by a forward slash ("/"). The
total length of a repository name, including slashes, must be less
than 256 characters.
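The quoted rules translate fairly directly into a check. The sketch below applies the regular expression from the quote to each slash-separated path component and enforces the total length limit. The function name validRepoName is hypothetical, and a real registry may apply further rules not covered here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// component is the per-component pattern from the quoted rules, anchored so
// that the whole path component has to match.
var component = regexp.MustCompile(`^[a-z0-9]+(?:[._-][a-z0-9]+)*$`)

// validRepoName reports whether name satisfies the quoted repository name rules.
func validRepoName(name string) bool {
	// Total length, including slashes, must be less than 256 characters.
	if len(name) == 0 || len(name) >= 256 {
		return false
	}
	for _, c := range strings.Split(name, "/") {
		if !component.MatchString(c) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"smeeb/myapp-db", "john/get-started", "Smeeb/myapp", "smeeb//myapp", "smeeb/myapp-"} {
		fmt.Printf("%-18s %v\n", name, validRepoName(name))
	}
}
```

Note that the tag (the part after the colon) is not part of the repository name and would need to be stripped before calling a check like this.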
Just use meaningful names for your images and tags. You could have smeeb/myapp and smeeb/myapp-db. For tags, the convention is to use versioned tags and a latest one.

How to delete docker image data or layer in nexus3

I'm trying out nexus oss 3.0.1-01. I have a docker repository set up and I'm able to push and pull images successfully. But I need a way to delete images. For docker, deleting a component won't actually delete the image layers from the file system, because they may be referred to by other components. So, what is the proper way to handle it?
I even deleted every single component and then ran a scheduled task to compact the blob store. But that didn't seem to do much in terms of freeing up storage space.
My understanding is that there isn't a feature in nexus3 at the moment. If there is, could you please point me to some documentation on it? Otherwise, how is everyone else managing their storage space for docker repository?
We had a user contribute this recently:
https://gist.github.com/lukewpatterson/bf9d19410094ea8bced1d4bb0523b67f
You can read about usage here: https://issues.sonatype.org/browse/NEXUS-9293
As well, a supported feature for this will be coming soon from Sonatype.
This is something that needs to be provided at the Docker Registry level. Currently it appears to be broken on v3.1
Did you try to go to assets and delete the layers? If that did not remove the files from the blob store, along with compact blob store, then it is a Nexus problem.
Make sure to track these issues and confirm that this is the desired behavior for 3.2
See issues
https://issues.sonatype.org/browse/NEXUS-9497
https://issues.sonatype.org/browse/NEXUS-9293
In Nexus 3.14 you go to WebUI -> Tasks -> Create -> Docker - Delete unused manifests and images
Then another job Admin - Compact blob store to actually rm the files from the Nexus directory.
Before that you need to delete the Nexus components (using the cleanup policy+job), as original poster did.

Resources