Docker registry storage

Let's say I have a private Docker registry and a Dockerfile for an application.
When I build my image with a specific tag and push it to the registry, then rebuild it with different code (for example, I copy a different JAR, or I re-run npm install && ng build) and push it with that same tag, does the registry keep the old image, even though I don't need it anymore since I replaced it with a new one? I am asking because I am concerned about storage for the Docker registry: pushing so many layers that I no longer need and intended to replace might waste storage for nothing.

Yes, it will keep it.
A tag is just a convenient way to refer to an image; what ultimately matters is the image's digest, and that will be different.
The registry you use will probably offer some maintenance task (such as garbage collection) to remove old, untagged images, or you can write a simple script yourself.
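For the open-source Docker Distribution registry, for example, cleanup is a two-step process: delete the unwanted manifests through the HTTP API, then run the garbage collector. A minimal sketch, assuming the registry was started with deletes enabled; registry.example.com/my-app, the digest, and the container name are placeholders:

# The registry must allow deletes, e.g. REGISTRY_STORAGE_DELETE_ENABLED=true.
# 1. Find the digest of the manifest behind the tag you want to remove:
curl -sI -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  https://registry.example.com/v2/my-app/manifests/1.0 | grep -i docker-content-digest
# 2. Delete the manifest by digest, using the value from step 1:
curl -X DELETE https://registry.example.com/v2/my-app/manifests/sha256:<digest>
# 3. Garbage-collect inside the registry container to actually free the layer data:
docker exec <registry-container> registry garbage-collect /etc/docker/registry/config.yml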

Related

How to improve automation of running container's base image updates?

I want all running containers on my server to always use the latest version of an official base image, e.g. node:16.13, in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow, which has some limitations described below.
I have read the answers to this question but they either involve building or inspecting images on the target server which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check for changes in the node:16.13 image compared to when I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.13 with those of the first n layers of my-app:1.0 (where n is the number of layers of node:16.13), as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.13 base image.
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
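For reference, a rough shell sketch of that comparison, assuming jq is installed (image names and registry are placeholders; for multi-arch images the verbose output is an array, so the first platform entry is taken here):

# Layer digests of the base image:
base=$(docker manifest inspect -v node:16.13 \
  | jq -r '(if type == "array" then .[0] else . end) | .SchemaV2Manifest.layers[].digest')
n=$(printf '%s\n' "$base" | wc -l)
# The first n layer digests of the app image:
app=$(docker manifest inspect -v registry.example.com/my-app:1.0 \
  | jq -r '(if type == "array" then .[0] else . end) | .SchemaV2Manifest.layers[].digest' \
  | head -n "$n")
# If they differ, the base image has changed since the last build:
[ "$base" = "$app" ] || echo "base image changed; rebuild my-app:1.0"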
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0, it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server, I have to keep previous versions of the my-app:1.0 image as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry, e.g. my-app:1.0.1, my-app:1.0.2, ...
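For illustration, that double-tag push looks something like this (registry host and version numbers are placeholders):

docker build -t registry.example.com/my-app:1.0 .
# Give the same build an immutable patch tag alongside the moving 1.0 tag:
docker tag registry.example.com/my-app:1.0 registry.example.com/my-app:1.0.2
docker push registry.example.com/my-app:1.0
docker push registry.example.com/my-app:1.0.2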
Because of the way the layers of the base image and the app image are compared, it is not possible to detect a change in the base image where only the uppermost layers have been removed. However, I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe: it downloads the image's manifest and then downloads only the layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, it won't actually produce a new image; it will just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause unnecessary redeploys, though, if your deploy process runs unconditionally even when neither the base image nor the application has changed.)
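With that in mind, the scheduled job can be a minimal sketch like this (image name and registry are placeholders):

# --pull re-checks the base image; the build is effectively a no-op if nothing changed.
docker build --pull -t registry.example.com/my-app:1.0 .
docker push registry.example.com/my-app:1.0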
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
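If your CI system does have a Docker Hub account, authenticating before any pulls at least raises the per-account rate limit; the environment variable names here are placeholders:

# Authenticated pulls get a higher rate limit than anonymous ones:
echo "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USER" --password-stdin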
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID as you have now works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5     # fetch the specific build
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5   # start the new version
docker rmi registry.example.com/my-app:1.0.4      # optionally clean up the previous build
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
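With Kubernetes, the deploy step then reduces to updating that one field; for example (deployment and container names are placeholders):

kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.5
# Kubernetes sees the changed image: value and performs a rolling update;
# reverting is just setting the field back to the previous tag.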

How to instruct docker or docker-compose to automatically build image specified in FROM

When processing a Dockerfile, how do I instruct docker build to build the image specified in FROM locally using another Dockerfile if it is not already available?
Here's the context. I have a large Dockerfile that starts from the base Ubuntu image, installs Apache, then PHP, then some custom configuration on top of that. Whether this is a good idea is another question; let's assume the build steps cannot be changed. The problem is that every time I change anything in the config, everything has to be rebuilt from scratch, and this takes a while.
I would like to have a hierarchy of Dockerfiles instead:
my-apache : based on stock Ubuntu
my-apache-php: based on my-apache
final: based on my-apache-php
The first two images would be relatively static and can be uploaded to dockerhub, but I would like to retain an option to build them locally as part of the same build process. Only one container will exist, based on the final image. Thus, putting all three as "services" in docker-compose.yml is not a good idea.
The only solution I can think of is to have a manual build script that for each image checks whether it is available on Dockerhub or locally, and if not, invokes docker build.
Are there better solutions?
I have found this article on automatically detecting dependencies between Dockerfiles and building them in the proper order:
https://philpep.org/blog/a-makefile-for-your-dockerfiles
The actual Makefile from Philippe's git repo provides even more functionality:
https://github.com/philpep/dockerfiles/blob/master/Makefile
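As a simpler starting point, here is a minimal shell sketch of the "manual build script" idea from the question: build each image in dependency order, reusing any copy that already exists locally or can be pulled (the Dockerfile.<name> layout and image names are assumptions):

#!/bin/sh
set -e
for img in my-apache my-apache-php final; do
  # Reuse the image if it is already local or can be pulled from a registry...
  if docker image inspect "$img" >/dev/null 2>&1 || docker pull "$img" 2>/dev/null; then
    continue
  fi
  # ...otherwise build it; earlier images in the list serve as its base.
  docker build -t "$img" -f "Dockerfile.$img" .
done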

How stable are version-tagged docker baseimages? Should I make my own copy?

I am creating Docker images based on base images from docker.io (for example ubuntu:14.04).
I want my Docker builds to be 100% reproducible. One requirement for this is that the base image does not change (or if it changes, it is my decision to use the changed base image).
Can I be sure that a version tagged base image (like ubuntu:14.04) will always be exactly the same?
Or should I make my own copy in my own private repository?
Version tags like ubuntu:14.04 can be expected to change with bug fixes. If you want to be sure you get the exact same image every time (even if it still contains bugs that have since been fixed), you can pin the image by its digest:
FROM ubuntu@sha256:4a725d3b3b1c
But you cannot be sure this exact version will be hosted forever by Docker Hub.
The safest way is to run your own Docker registry server. Push the images you are using to that registry, and use the digest notation to pull the images from your local registry.
FROM dockerrepos.yourcompany.com/ubuntu@sha256:4a725d3b3b1c
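A sketch of that mirroring step, using the same placeholder registry as above:

docker pull ubuntu:14.04
docker tag ubuntu:14.04 dockerrepos.yourcompany.com/ubuntu:14.04
docker push dockerrepos.yourcompany.com/ubuntu:14.04   # the push output includes the repository digest
# Read the pinned digest back later with:
docker images --digests dockerrepos.yourcompany.com/ubuntu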

Use a certain FROM layer in Dockerfile

I want to base my container on centos:centos6. But for some reason, centos:centos6 gets updated on the registry. This can result in different images being produced when building on different machines at different times. Such a change recently caused a segmentation fault in our application.
Is there a way I can specify the exact version in the FROM declaration so that the build is the same even when the image is built on different machines at different times?
No, you can't tell a Dockerfile to pin to a particular image / layer ID; you have to use a tag (and if you don't use a tag, latest is assumed and used as the default).
If you're worried that a remote registry image will change, you should take a copy of the Dockerfile and build your own version of the image yourself. You can either host it on the Docker Hub under your account or run your own private registry.
That way, you're in complete control over what's in there and when it gets updated (i.e., if you need to pin a particular package to an old version).

Why doesn't Docker Hub cache Automated Build Repositories as the images are being built?

Note: It appears the premise of my question is no longer valid since the new Docker Hub appears to support caching. I haven't personally tested this. See the new answer below.
Docker Hub's Automated Build Repositories don't seem to cache images. As it is building, it removes all intermediate containers. Is this the way it was intended to work, or am I doing something wrong? It would be really nice not to have to rebuild everything for every small change. I thought that was supposed to be one of the best advantages of Docker, and it seems weird that their builder doesn't use it. So why doesn't it cache images?
UPDATE:
I've started using Codeship to build my app and then run remote commands on my DigitalOcean server to copy the built files and run the docker build command. I'm still not sure why Docker Hub doesn't cache.
Disclaimer: I am a lead software engineer at Quay.io, a private Docker container registry, so this is an educated guess based on the same problem we faced in our own build system implementation.
Given my experience with Dockerfile build systems, I would suspect that Docker Hub does not support caching because of the way caching is implemented in the Docker Engine. Caching for Docker builds operates by comparing the commands to be run against the layers already present on the local machine.
For example, if the Dockerfile has the form:
FROM somebaseimage
RUN somecommand
ADD somefile somefile
Then the Docker build code will:
Check to see if an image matching somebaseimage exists
Check if there is a local image with the command RUN somecommand whose parent is the previous image
Check if there is a local image with the command ADD somefile somefile plus a hash of the contents of somefile (to make sure the cache is invalidated when somefile changes), whose parent is the previous image
If any of the above steps match, then that command will be skipped in the Dockerfile build process, with the cached image itself being used instead. However, the one key issue with this process is that it requires the cached images to be present on the build machine, in order to find and verify the matches. Having all of everyone's images on build nodes would be highly inefficient, making this a harder problem to solve.
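You can watch this behaviour locally with the Dockerfile above; a quick sketch:

docker build -t cache-demo .   # first build: every step executes
docker build -t cache-demo .   # rebuild with no changes: each step reports "Using cache"
echo change >> somefile
docker build -t cache-demo .   # the ADD step re-runs because the file's checksum changed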
At Quay.io, we solved the caching problem by creating a variation of the Docker caching code that could precompute these commands/hashes and then ask our registry for the cached layers, downloading them to the machine only after we had found the most efficient caching set. This required significant data model changes in our registry code.
If you'd like more information, we gave a technical overview of how we do this in this talk: https://youtu.be/anfmeB_JzB0?list=PLlh6TqkU8kg8Ld0Zu1aRWATiqBkxseZ9g
The new Docker Hub came out with a new Automated Build system that supports Build Caching.
https://blog.docker.com/2018/12/the-new-docker-hub/
