We have a continuous integration pipeline on CircleCI that does the following:
Loads repo/image:mytag1 from the cache directory to be able to use cached layers
Builds a new version: docker build -t repoimage:mytag2
Saves the new version to the cache directory with docker save
Runs tests
Pushes to Docker Hub: docker push repo/image:mytag2
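Roughly, the caching steps look like this (a sketch; the cache path is a placeholder):
docker load -i ~/cache/image.tar || true              # restore cached layers; ignore a cache miss
docker build -t repo/image:mytag2 .                   # unchanged layers are reused from the cache
docker save -o ~/cache/image.tar repo/image:mytag2    # refresh the cache for the next build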
The problem is with step 5. The push step takes 5 minutes every time. If I understand it correctly, Docker Hub is meant to cache layers so we don't have to re-push things like the base image and dependencies when they haven't changed.
I ran the build twice in a row, and I see a lot of crossover in the hash of the layers being pushed. Yet rather than "Image already exists" I see "Image successfully pushed".
Here's the output of build 1's docker push, and here's the output of build 2.
If you diff those two files you'll see that only 2 layers differ in each build:
< ca44fed88be6: Buffering to Disk
< ca44fed88be6: Image successfully pushed
< 5dbd19bfac8a: Buffering to Disk
< 5dbd19bfac8a: Image successfully pushed
---
> 9136b10cfb72: Buffering to Disk
> 9136b10cfb72: Image successfully pushed
> 0388311b6857: Buffering to Disk
> 0388311b6857: Image successfully pushed
So why is it that all the images have to be re-pushed every time?
Using a different tag creates a different image which, when pushed, cannot rely on the cache.
For example the two commands:
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v1
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v2
yield utterly distinct images even though they were created from the same image (db65bf421f96). When pushed, Docker Hub must treat them as completely separate images, as can be seen with:
$ docker images
REPOSITORY TAG IMAGE ID
me/thing v2 f14aa8ac6bae
me/thing v1 c7d72ccc1d71
The image IDs are unique, and thus the images are unique even if they differ only in their tags.
You could say "docker should recognize them as being bit for bit identical" and thus treat them as cachable. But it doesn't (yet).
The only surprise for me in your example is that you got any duplicate image IDs at all.
Authoritative (if less explanatory) documentation can be found in Docker's "Build your own images" guide.
The process should work as you described. In fact, we're building all of our images this way without problems. Usually only a few of the topmost layers change, and only those are pushed to the registry - otherwise the whole concept of image layers would be useless.
See here for an example: only the two topmost layers have changed and are pushed for :latest, while for :4.0.2 there's no push at all. We're tagging images with git tags, and for some projects we even tag images with git describe - to get rollback functionality, just in case. An example of such a tagging command follows below.
You can also get the project source code from GitHub to try it out.
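For illustration, tagging with git describe can be as simple as this (a sketch, not our exact CI script):
docker build -t repo/image:$(git describe --tags --always) .
docker push repo/image:$(git describe --tags --always)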
A few things to note about the setup: We're using a self-hosted GitLab CI with a customized runner which runs docker and docker-compose on an isolated host with Docker 1.9.1, but that should not make any difference.
There may also be differences in the registry version. I had the feeling (but I am not 100% sure) that some older repos on Docker Hub are still running on registry v1, while newer ones are always on v2 - so you could try creating a new repo and see if the issue still occurs.
Please note that the tag behavior described above only applies when pushing under the same image name. If you push the same image layers under another name, you always need to push all layers, despite the fact that all of them should already exist on the registry. So I guess repo/image:mytag1 and repoimage:mytag2 actually both go to repo/image, and the missing slash is just a typo.
Another cause could be that your images are built on different hosts on Circle CI, but then you should also get different layer IDs, so I think this is not very likely.
I suggest building an image manually to try to reproduce the problem, or contacting Circle CI about the issue.
Related
I want all running containers on my server to always use the latest version of an official base image, e.g. node:16.13, in order to get security updates. To achieve that, I have implemented an image update mechanism for all container images in my registry using a CI workflow, which has some limitations described below.
I have read the answers to this question, but they either involve building or inspecting images on the target server, which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check whether the node:16.13 image has changed since I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.13 with those of the first n layers of my-app:1.0, where n is the number of layers of node:16.13, as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.13 base image.
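A sketch of that check, assuming single-architecture manifests and jq installed (multi-arch images like node return a manifest list instead, which needs an extra selection step; the registry name is a placeholder):
base_layers=$(docker manifest inspect node:16.13 | jq -r '.layers[].digest')
n=$(echo "$base_layers" | wc -l)
app_layers=$(docker manifest inspect registry.example.com/my-app:1.0 | jq -r '.layers[].digest' | head -n "$n")
if [ "$base_layers" != "$app_layers" ]; then
  echo "base image changed - rebuilding my-app:1.0"
fi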
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0, it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server, I also have to keep previous versions of the my-app:1.0 image. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry, e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared, it is not possible to detect a change in the base image where only the uppermost layers have been removed. However, I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe: it downloads the image's manifest and then downloads only the layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause more redeploys even if neither the base image nor the application has changed, though.)
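A scheduled rebuild could then be as simple as this sketch (the registry name and date-based tag are illustrative):
docker build --pull -t registry.example.com/my-app:$(date +%Y%m%d) .
docker push registry.example.com/my-app:$(date +%Y%m%d)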
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID as you have now works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5           # fetch the new build
docker stop my-app                                       # stop the old container
docker rm my-app                                         # remove it (its image stays on disk)
docker run -d ... registry.example.com/my-app:1.0.5      # start the new version
docker rmi registry.example.com/my-app:1.0.4             # optionally remove the old image
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
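For example, a rolling update can then be triggered with a single command (deployment and container names are placeholders):
kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.5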
In 2019, I pulled a Python 3.6 image. After that, I assumed the image kept itself up to date (I did not use it actively; I just hoped that the latest pushes were pulled from the repository automatically, or something like that), but I was surprised when I accidentally noticed that its download/creation date was still 2019.
Q: How does image pull work? Are there flags so that the layer hashes are checked for freshness every time the image is built? Perhaps there is a way to set this check through the Docker daemon config file? Or do I have to delete the base image every time to get a new one?
What I want: every time I build my images, the base image should be checked against the last push (publication of the image) in the Docker Hub repository.
Note: I'm talking about images with an identical tag. Also, I'm not afraid of re-building my images; there is no need to preserve them.
Thanks.
You need to explicitly docker pull the image to get updates. For your custom images, there are docker build --pull and docker-compose build --pull options that will pull the base image (though there is not a "pull" option for docker-compose up --build).
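A minimal sketch of those options (image and service names are placeholders):
docker pull python:3.6                   # refresh a base image explicitly
docker build --pull -t my-image .        # have the build re-pull its base image
docker-compose build --pull my-service   # the docker-compose equivalent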
Without this, Docker will never check for updates for an image it already has. If your Dockerfile starts FROM python:3.6 and you already have a local image with that name and tag, Docker just uses it without contacting Docker Hub. If you don't have it then Docker will pull it, once, and then you'll have it locally.
The other thing to watch for is that the updates do eventually stop. If you look at the Docker Hub python image page you'll notice that there are no longer rebuilds for Python 3.5. If you pin to a very specific patch version, the automated builds generally only build the latest patch version for each supported minor version; if your image is FROM python:3.6.11 it will never get updates because 3.6.12 is the latest 3.6.x version.
I have a CI-pipeline that builds a docker image for my app for every run of the pipeline (and the pipeline is triggered by a code-push to the git repository.)
The docker image consists of several intermediate layers which progressively become very large in size. Most of the intermediate images are identical for each run, so Docker's caching mechanism is used heavily.
However, the problem is that the final couple of layers differ on each run, as they result from a COPY statement in the Dockerfile where the built application artifacts are copied into the image. Since the artifacts are modified for every run, these already cached final layers will ALWAYS be invalidated. These images have a size of 800 MB each.
What docker command can I use to identify (and delete) these images when they get replaced by newer ones, i.e. when they get invalidated?
I would like my CI pipeline to remove them at the end of the run so they don't end up dangling on the CI server and wasting a lot of disk space.
If I understand correctly: with every code push, the CI pipeline creates a new image containing the new version of the application. As a result, the previously created image becomes outdated, so you want to remove it. To do so, you have to:
Get rid of all outdated containers that were created from the outdated image:
display all containers with the command docker ps -a
if still running, stop the outdated containers with the command docker stop [containerID]
remove them with the command docker rm [containerID]
Remove the outdated images with the command docker rmi [imageID]
To sum up why this process is needed: you cannot remove an image while it is used by any existing container (even stopped containers still require their images). For this reason, you should first stop and remove the old containers, and then remove the old images.
The detection part, and the automation of the deletion process, should be based on the image versions and container names which your CI pipeline generates while creating new images. A sketch of such a cleanup is below.
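A sketch of such a cleanup, assuming the pipeline knows the outdated image reference (OLD_IMAGE is a placeholder):
OLD_IMAGE=my-app:previous-build                                           # outdated image reference
docker ps -a -q --filter ancestor="$OLD_IMAGE" | xargs -r docker stop     # stop its containers
docker ps -a -q --filter ancestor="$OLD_IMAGE" | xargs -r docker rm       # remove them
docker rmi "$OLD_IMAGE"                                                   # now the image can be removed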
Edit 1
To list all images, which have no relationship to any tagged images, you can use command: docker images -f dangling=true. You can delete them with the command: docker images purge.
Just one thing to remember here: If you build an image without tagging it, the image will appear on the list of "dangling" images. You can avoid this situation by providing a tag when you build it.
Edit 2
The command for image purging has changed. Right now the proper command is:
docker image prune
Here is a link to the documentation.
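For example (the -f flag, the -a flag, and the until filter are optional refinements, not from the original answer):
docker image prune                             # remove dangling images (asks for confirmation)
docker image prune -f                          # same, without the confirmation prompt
docker image prune -a --filter "until=24h"     # also remove unused images older than 24 hours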
I have the following scenario:
A daemon_pulling process that runs docker pull to fetch the latest version of an image from a private registry.
E.g. docker pull localhost:5000/myimage:v1 # sha or image id: 1234
A daemon_pushing process that runs docker push to publish the latest version of an image.
E.g. docker commit container_stable localhost:5000/myimage:v1 && docker push localhost:5000/myimage:v1 # sha or image id: 6789
The code works fine for deploying images based on containers!
The problem is when a daemon_pushing (sha or image id: 6789) and a daemon_pulling (sha or image id: 1234) run at the same time: the push (6789) has not finished when the docker pull (1234) runs, so the pull detects a local change (6789 != 1234) and tries to download the image (1234) again, while my latest stable image (6789) is still being pushed...
I'm looking for a way to push without affecting a pull in progress, and vice versa.
What is a better way to manage this concurrency?
I tried using a different Docker image name as a pivot and renaming it directly on the registry server, but I didn't find a way to rename remotely (only locally).
It looks like you have set up your CI build to pull an existing image, run a container from it, install the updates, commit the changes to the same image name, and then push it back to the registry. Continuously updating images by running containers and committing to the same image name is not a good practice: it hides the changes and makes it unnecessarily difficult to replicate the build.
A better way would be to build the image from a Dockerfile, where you define all build steps. Look at the Reference Architecture on Docker's official Continuous Integration use case for examples. If you want to shorten build times, you can make your own base image to start from.
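A minimal sketch of that approach, reusing the image name from the question (the Dockerfile contents are illustrative placeholders, not your actual build steps):
FROM ubuntu:20.04        # a pinned base image
COPY . /app              # the application files
RUN /app/setup.sh        # placeholder for whatever the commit-based flow used to install
Then each CI run builds and pushes from scratch:
docker build -t localhost:5000/myimage:v1 .
docker push localhost:5000/myimage:v1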
Is there any way to prevent images being uploaded to docker hub with the same tags as existing images? Our use case is as follows.
We deploy to production with a docker-compose file with the tags of images as version numbers. In order to support roll-back to previous environments and idempotent deployment it is necessary that a certain tagged docker image always refer to the same image.
However, docker hub allows images to be uploaded with the same tags as existing images (they override the old image). This completely breaks the idea of versioning your images.
We currently have work-arounds which involve our build scripts pulling all versions of an image and looking through the tags to check that an overwrite will not happen, but it feels like there has to be a better way.
If docker hub does not support this, is there a way to do docker deployment without docker hub?
The tag system has no way of preventing images from being overwritten; you have to come up with your own process to handle this (h3nrik's answer is an example).
However, you could use the digest instead. In the new v2 of the registry, all images are given a checksum, known as a digest. If an image or any of its base layers change, the digest will change. So if you pull by digest, you can be absolutely certain that the contents of that image haven't changed over time and that the image hasn't been tampered with.
Pulling by digest looks like:
docker pull debian@sha256:f43366bc755696485050ce14e1429c481b6f0ca04505c4a3093dfdb4fafb899e
You should get the digest when you do a docker push.
Now, I agree that pulling by digest is a bit unwieldy, so you may want to set up a system that simply tracks digest and tag and can verify that the image hasn't changed.
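For example, a build script might record the digest right after pushing and later deploy by it (a sketch; the image name is illustrative):
digest=$(docker inspect --format='{{index .RepoDigests 0}}' repo/image:1.0.5)   # e.g. repo/image@sha256:...
echo "repo/image:1.0.5 $digest" >> pushed-digests.log                           # track tag/digest pairs
docker pull "$digest"                                                           # deploy exactly what was pushed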
In the future, this situation is likely to improve, with tools like Notary for signing images. Also, you may want to look at using labels to store metadata such as git hash or build number.
Assuming you have a local build system to build your Docker images: you could include the build number from your local build job in your tag. With that you satisfy your requirement:
... it is necessary that a certain tagged docker image always refer to the same image.
When your local build automatically pushes to Docker Hub, each push is assured to carry a unique tag.
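A sketch of what that might look like in the build script (BUILD_NUMBER is whatever your build system provides; Jenkins, for example, sets it automatically):
docker build -t repo/my-app:1.0.$BUILD_NUMBER .
docker push repo/my-app:1.0.$BUILD_NUMBER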