Is there any way to prevent images being uploaded to docker hub with the same tags as existing images? Our use case is as follows.
We deploy to production with a docker-compose file whose image tags are version numbers. To support roll-back to previous environments and idempotent deployment, it is necessary that a given tagged docker image always refer to the same image.
However, docker hub allows images to be uploaded with the same tags as existing images (they override the old image). This completely breaks the idea of versioning your images.
We currently have workarounds in which our build scripts pull all versions of an image and look through the tags to check that an overwrite will not happen, but it feels like there has to be a better way.
If docker hub does not support this, is there a way to do docker deployment without docker hub?
The tag system has no way of preventing images being overwritten; you have to come up with your own processes to handle this (and h3nrik's answer is an example of this).
However, you could use the digest instead. In the new v2 of the registry, all images are given a checksum, known as a digest. If an image or any of its base layers change, the digest will change. So if you pull by digest, you can be absolutely certain that the contents of that image haven't changed over time and that the image hasn't been tampered with.
Pulling by digest looks like:
docker pull debian@sha256:f43366bc755696485050ce14e1429c481b6f0ca04505c4a3093dfdb4fafb899e
You should get the digest when you do a docker push.
Now, I agree that pulling by digest is a bit unwieldy, so you may want to set up a system that simply tracks digest and tag and can verify that the image hasn't changed.
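For example (a rough sketch; myorg/my-app is a placeholder image name), you could record the digest right after pushing and store it alongside the tag you deploy:
docker push myorg/my-app:1.0
# RepoDigests is populated once the image has been pushed; capture it for your records
docker image inspect --format '{{index .RepoDigests 0}}' myorg/my-app:1.0 >> deployed-digests.txt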
In the future, this situation is likely to improve, with tools like Notary for signing images. Also, you may want to look at using labels to store metadata such as git hash or build number.
Assuming you have a local build system to build your Docker images, you could include the build number from your local build job in the tag. That satisfies your requirement:
... it is necessary that a certain tagged docker image always refer to the same image.
When your local build then automatically pushes to Docker Hub, every push carries a unique tag.
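A rough sketch of what that could look like in a build job (BUILD_NUMBER stands for whatever counter your CI system provides, and the image name is a placeholder):
docker build -t myorg/my-app:build-${BUILD_NUMBER} .
docker push myorg/my-app:build-${BUILD_NUMBER}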
Related
I want all running containers on my server to always use the latest version of an official base image, e.g. node:16.13, in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow, which has some limitations described below.
I have read the answers to this question but they either involve building or inspecting images on the target server which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check for changes in the node:16.13 image compared to when I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.13 with those of the first n = (number of layers of node:16.13) layers of my-app:1.0, as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.13 base image.
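Roughly, the comparison looks like this (simplified: it assumes single-architecture images and jq, and registry.example.com is a placeholder for my registry):
# digests of the base image layers
base=$(docker manifest inspect node:16.13 | jq -r '.layers[].digest')
# digests of my app image layers
app=$(docker manifest inspect registry.example.com/my-app:1.0 | jq -r '.layers[].digest')
n=$(echo "$base" | wc -l)
# the first n layers of my-app:1.0 should match the base image layers
if [ "$base" != "$(echo "$app" | head -n "$n")" ]; then
  echo "node:16.13 has changed, rebuilding my-app:1.0"
fi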
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server I have to keep previous versions of the my-app:1.0 images as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared, it is not possible to detect a change in the base image where only the uppermost layers have been removed. However, I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe: it downloads the image's manifest and then downloads the layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could trigger redeploys even when neither the base image nor the application has changed.)
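For example, a scheduled rebuild could be as simple as this sketch (the image name is a placeholder):
# --pull refreshes the base image; if nothing changed, the build is a pure cache hit
docker build --pull -t registry.example.com/my-app:1.0 .
docker push registry.example.com/my-app:1.0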
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID as you have now works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5
docker rmi registry.example.com/my-app:1.0.4
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
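For instance, with a Deployment and container both named my-app (hypothetical names), bumping the version is one command and Kubernetes performs the rolling update for you:
kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.6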
Asked in a different question:
why does skaffold need two tags to the same image?
During deployment, Skaffold rewrites the image references in the Kubernetes manifests being deployed to ensure that the cluster pulls the newly-built images and doesn't use stale copies (read about imagePullPolicy and some of the issues it attempts to address). Skaffold can't just use the computed image tag, as many tag conventions do not produce unique tags, and a tag can be overwritten by another developer and point to a different image. It's not unusual for a team of devs, or parallel tests, to push images into the same image repository and encounter tag clashes. For example, latest will be overwritten by the next build, and the default gitCommit tagger generates tags like v1.17.1-38-g1c6517887, which uses the most recent version tag and the current commit SHA and so isn't unique across uncommitted source changes.
When pushing to a registry, Skaffold can use the image's digest, the portion after the @ in gcr.io/my-project/image:latest@sha256:xxx. This digest is the hash of the image configuration and layers and uniquely identifies a specific image. A container runtime ignores the tag (latest here) when there is a digest.
When loading an image to a Docker daemon, as happens when deploying to minikube, the Docker daemon does not maintain image digests. So Skaffold instead tags the image with a second tag using a computed digest. It's extremely unlikely that two different images will have the same computed digest, unless they're the same image.
Tags are cheap: they're like symlinks, pointing to an image identifier.
How does docker handle digests?
I can see the digest of an image in plain text when I run docker image inspect. And there's also the fact that local images don't have a digest until I push them to a registry (and AFAIK, if I push an image to various registries, it will have various digests, but I've never tried that).
I fear that docker might be actually using that info instead of calculating the hash every time that I use or pull an image.
Is there a way to actually tell docker: "Hey, I want you to recheck right now the hash of the image contents. Are they the exact same as when I first created the image? Or has someone manipulated it ever?"
And: does docker really calculate that hash every time an image is run (by digest), or at least every time an image is pulled (by digest)?
The digest is calculated on push and pull to a registry. It's a sha256 checksum of the image manifest, which in current versions of docker is independent of the registry (the older schema v1 syntax included the repository/tag in the manifest, which resulted in the digest changing depending on the image name). The layer digests are included in that manifest, and on the registry those digests refer to compressed tar files. Once the files have been extracted on the local docker engine, they aren't reverified, and I'm not aware of a command yet that would verify that the files under /var/lib/docker have not been changed since the image was pulled.
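To see the digests docker has recorded for images that were pushed or pulled (note this just reads stored metadata, it does not re-verify the files on disk), you can use, for example:
docker images --digests
docker image inspect --format '{{.RepoDigests}}' debian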
I'm using some docker images, which I have pulled from a registry:
docker pull registry.example.com/project/backend:latest
docker pull registry.example.com/project/frontend:latest
Now there is a new version on the server registry. If I do a new pull, I will overwrite the current images. But I need to keep the current working images in case I run into problems with the newest latest images.
So, how do I create a kind of backup of my running backend:latest and frontend:latest? After that I can pull the new latest images and, in case I need to, fall back to the old working ones...
To keep the current image on your local environment you can use docker tag
docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
For example:
docker tag registry.example.com/project/backend:latest registry.example.com/project/backend:backup
Then when you pull latest again, registry.example.com/project/backend:backup still exists.
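If the new latest turns out to be broken, you can fall back to the backup tag, for example by re-tagging it or by running containers from it directly (a sketch; the run options are whatever your setup needs):
docker tag registry.example.com/project/backend:backup registry.example.com/project/backend:latest
# or start straight from the backup tag
docker run -d ... registry.example.com/project/backend:backup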
Pulling an image never deletes an existing image. However, if you have an image with the same name, the old image will become unnamed, and you'll have to refer to it by its image ID.
You've now seen the downside to using :latest tags. This is why it is better to reference an image by a specific version tag that the maintainer won't re-push.
First, you shouldn't be using latest in production environments. Rather, pin to a tag you have confirmed to be working.
And instead of executing setup steps inside a running container, you should write a Dockerfile so the installation is repeatable and produces your own image. That's actually one of the main reasons why docker is used.
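A minimal sketch of what such a Dockerfile could look like (the base image, package, and paths are placeholders for whatever setup was done by hand):
FROM debian:bullseye
# make the manual installation steps repeatable
RUN apt-get update && apt-get install -y --no-install-recommends some-package \
    && rm -rf /var/lib/apt/lists/*
COPY ./app /app
CMD ["/app/start.sh"]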
One can easily build Docker images with the docker build command.
What I'm wondering about is the -t flag that you can give when building the image. For example:
$ docker build -t ouruser/sinatra:v2 .
According to the documentation, the -t flag is for tagging and naming purposes. The name is the part before ':', and the tag is the part after it. So in our example, the name is ouruser/sinatra, and the tag is v2.
I thought this would be the image name and tag. But apparently, the name is actually some repository name? Why do I think it is? Well, because if you would after this list the images with command:
docker images
You would get a listing like this:
REPOSITORY        TAG      IMAGE ID       CREATED        SIZE
ouruser/sinatra   latest   5db5f8471261   11 hours ago   446.7 MB
Bang! Major shock! You thought you were creating an image with name, and instead, you specified some repository! Related to this, I have some questions:
Where is this repository located?
Can I name the image without creating a repository?
Where and how is this repository used, or could be used?
Where can I find more information about this repository? I only found this, and it doesn't tell much to be honest: docker build docs
Why is it common to use names that consist of two parts like this: somename/someothername?
Thank you for your help!
I believe the confusion here is the word "repository." In Docker, a repository is any group of builds of an image with the same name, and potentially multiple tags. A "registry" server, like hub.docker.com or your own private registry, holds multiple repositories, e.g. the redis repository on the public registry. That one repository has multiple tags for different versions of the build.
So with that background, to answer your questions:
ouruser/sinatra is located on your local Docker host until you do a docker push
No; the repository and the tag together are the name of the image.
While local on your system, you can use this image locally. Once you push it up to a registry, you can then pull it down to any other Docker host that has access to that registry. And if you do a docker save you can save that image for a docker load on another host.
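For example (registry.example.com is a placeholder for your own registry):
# give the local image a name that includes the registry, then push it
docker tag ouruser/sinatra:v2 registry.example.com/ouruser/sinatra:v2
docker push registry.example.com/ouruser/sinatra:v2
# or move it without any registry at all
docker save ouruser/sinatra:v2 | gzip > sinatra-v2.tar.gz
docker load < sinatra-v2.tar.gz    # on the other host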
I'm sure there is documentation covering this somewhere on docs.docker.com, but I learned from a class.
The username/imagebase format came about to support pushing to your own namespace in hub.docker.com. Without that, whoever makes the first "Redis" image calls it "redis" while the next person makes their own repository called "redis-improved", and we quickly get into a jumble of confusing names where it's not clear who made what and what is a reputable image. That naming isn't required for images you make locally, but is still encouraged since images you pull from hub.docker.com may lack a username if they are maintained by Docker themselves. Without your username, you won't know which images you pulled down and which you built yourself.