How to manage concurrency in docker push and docker pull? - docker

I have the following scenario:
A daemon_pulling that runs docker pull to fetch the latest version of an image from a private registry.
E.g. docker pull localhost:5000/myimage:v1 # sha or image id: 1234
A daemon_pushing that runs docker push to upload the latest version of an image.
E.g. docker commit container_stable localhost:5000/myimage:v1 && docker push localhost:5000/myimage:v1 # sha or image id: 6789
The setup works fine for deploying images based on containers!
The problem occurs when daemon_pushing (sha or image id: 6789) is still running and daemon_pulling (sha or image id: 1234) runs at the same time: because the push (6789) has not finished yet, the docker pull (1234) detects a local difference (6789 != 1234) and downloads the image (1234) again, while my latest stable image (6789) is still being pushed...
I'm looking for a way to push without affecting a pull in progress, and vice versa.
What is a better way to manage this concurrency?
I tried using a different Docker image name as a pivot and renaming it directly on the registry server, but I didn't find a way to rename an image remotely (only a local rename).

It looks like you have set up your CI build to pull an existing image, run a container from it and install the updates, commit the changes to the same image name and then push it back to the registry. Continuously updating images by running containers and committing to the same image is not a good practice, since it hides the changes, and makes it unnecessarily difficult to replicate the build.
A better way would be to build the image from a Dockerfile, where you define all build steps. Look at the Reference Architecture on Docker's official Continuous Integration use case for examples. If you want to shorten build times, you can make your own base image to start from.
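For instance, a minimal sketch of what that could look like (the base image, file names, and build steps here are placeholders; only the registry/image name is taken from your example):
# Dockerfile: every build step is declared here instead of being committed from a running container
FROM python:3.9-slim
COPY . /app
RUN pip install -r /app/requirements.txt
CMD ["python", "/app/main.py"]
The CI job then builds and pushes a fresh image on every change instead of committing a container:
docker build -t localhost:5000/myimage:v1 .
docker push localhost:5000/myimage:v1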

Related

How to improve automation of running container's base image updates?

I want all running containers on my server to always use the latest version of an official base image, e.g. node:16.13, in order to get security updates. To achieve that I have implemented an image update mechanism for all container images in my registry using a CI workflow, which has some limitations described below.
I have read the answers to this question, but they either involve building or inspecting images on the target server, which I would like to avoid.
I am wondering whether there might be an easier way to achieve the container image updates or to alleviate some of the caveats I have encountered.
Current Image Update Mechanism
I build my container images using the FROM directive with the minor version I want to use:
FROM node:16.13
COPY . .
This image is pushed to a registry as my-app:1.0.
To check whether the node:16.13 image has changed since I built the my-app:1.0 image, I periodically compare the SHA256 digests of the layers of node:16.13 with those of the first n layers of my-app:1.0 (where n is the number of layers of node:16.13), as suggested in this answer. I retrieve the SHA256 digests with docker manifest inspect <image>:<tag> -v.
If they differ, I rebuild my-app:1.0 and push it to my registry, thus ensuring that my-app:1.0 always uses the latest node:16.13 base image.
I keep the running containers on my server up to date by periodically running docker pull my-app:1.0 on the server using a cron job.
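Roughly, the periodic check looks like this (a sketch; the registry name is a placeholder, and the digest-extraction step is only indicated in a comment because the exact JSON layout depends on whether the manifest is a single manifest or a manifest list):
docker manifest inspect node:16.13 -v > base-manifest.json
docker manifest inspect my-registry/my-app:1.0 -v > app-manifest.json
# extract the layer digests from both manifests (e.g. with jq) and compare the
# base image's digests with the first n digests of my-app:1.0; only if they
# differ, rebuild and push:
docker build -t my-registry/my-app:1.0 .
docker push my-registry/my-app:1.0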
Limitations
When I check for updates I need to download the manifests for all my container images and their base images. For images hosted on Docker Hub this unfortunately counts against the download rate limit.
Since I always update the same image my-app:1.0, it is hard to track which version is currently running on the server. This information is especially important when the update process breaks a service. I keep track of the updates by logging the output of the docker pull command from the cron job.
To be able to revert the container image on the server, I have to keep previous versions of the my-app:1.0 image as well. I do that by pushing incremental patch version tags along with the my-app:1.0 tag to my registry, e.g. my-app:1.0.1, my-app:1.0.2, ...
Because of the way the layers of the base image and the app image are compared, it is not possible to detect a change in the base image where only the uppermost layers have been removed. However, I do not expect this to happen very frequently.
Thank you for your help!
There are a couple of things I'd do to simplify this.
docker pull already does essentially the sequence you describe: it downloads the image's manifest and then downloads only the layers you don't already have. If you docker build a new image with an identical base image, an identical Dockerfile, and identical COPY source files, it won't actually produce a new image, just put a new name on the existing image ID. So it's possible to unconditionally docker build --pull images on a schedule, and it won't really use additional space. (It could cause more redeploys than necessary if neither the base image nor the application has changed.)
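For example (a sketch; the registry and image names are placeholders):
# pull the newest base image and rebuild; if nothing has changed, the existing
# image ID is simply re-tagged and no additional space is used
docker build --pull -t my-registry/my-app:1.0 .
docker push my-registry/my-app:1.0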
[...] this unfortunately counts against the download rate limit.
There's not a lot you can do about that beyond running your own mirror of Docker Hub or ensuring your CI system has a Docker Hub login.
Since I always update the same image my-app:1.0 it is hard to track which version is currently running on the server. [...] To be able to revert the container image on the server [...]
I'd recommend always using a unique image tag per build. A sequential build ID, as you have now, works; date stamps or source-control commit IDs are usually easy to come up with as well. When you go to deploy, always use the full image tag, not the abbreviated one.
docker pull registry.example.com/my-app:1.0.5
docker stop my-app
docker rm my-app
docker run -d ... registry.example.com/my-app:1.0.5
docker rmi registry.example.com/my-app:1.0.4
Now you're absolutely sure which build your server is running, and it's easy to revert should you need to.
(If you're using Kubernetes as your deployment environment, this is especially important. Changing the text value of a Deployment object's image: field triggers Kubernetes's rolling-update mechanism. That approach is much easier than trying to ensure that every node has the same version of a shared tag.)
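For example, with one unique tag per build, the deploy and rollback on Kubernetes become (a sketch; the Deployment and container name my-app are assumptions):
kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.5
kubectl rollout status deployment/my-app   # wait for the rolling update to finish
kubectl rollout undo deployment/my-app     # roll back if the new build misbehaves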

How do I get the last push of an image with the x.x tag if I already have an old push of an x.x image?

In 2019, I pulled the Python 3.6 image. After that, I assumed the image was updating itself (I did not use it actively; I just hoped that the latest pushes were being pulled from the repository automatically, or something like that), but I was surprised when I accidentally noticed that its download/creation date is still 2019.
Q: How does image pull work? Are there flags so that the layer hashes are checked for freshness every time the image is built? Perhaps there is a way to set this check through the docker daemon config file? Or do I have to delete the base image every time to get a new image?
What I want: every time I build my images, the base image should be checked against the latest push (publication of the image) in the Docker Hub repository.
Note: I'm talking about images with an identical tag. Also, I'm not afraid to re-build my images, there is no purpose to preserve them.
Thanks.
You need to explicitly docker pull the image to get updates. For your custom images, there are docker build --pull and docker-compose build --pull options that will pull the base image (though there is not a "pull" option for docker-compose up --build).
Without this, Docker will never check for updates for an image it already has. If your Dockerfile starts FROM python:3.6 and you already have a local image with that name and tag, Docker just uses it without contacting Docker Hub. If you don't have it then Docker will pull it, once, and then you'll have it locally.
The other thing to watch for is that the updates do eventually stop. If you look at the Docker Hub python image page you'll notice that there are no longer rebuilds for Python 3.5. Pinning to a very specific patch version also stops updates: the automated builds generally only build the latest patch version for each supported minor version, so if your image is FROM python:3.6.11 it will never get updates once 3.6.12 is the latest 3.6.x version.
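For example (a sketch; my-image is a hypothetical name for your own image):
docker pull python:3.6               # explicitly refresh the base image, or...
docker build --pull -t my-image .    # ...let the build pull the newest python:3.6 itself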

updating docker image given changes to local filesystem

I am trying to work out how I can update an existing image when I make changes to the local filesystem that was used to create the Docker image. I thought that I could use docker commit to do that, but it seems that it only allows you to change the image when there are changes to the filesystem of a running image?
/app.py
build from file system
sudo docker build -t app .
now there are local changes to /app.py. How do I change the image app to reflect the changes to /app.py? right now I'm having to delete the old image and then create a new one.
sudo docker rmi app
sudo docker build -t app .
any help is appreciated!
First of all, there's no such thing as a running image, only a running container. An image is the deliverable in the Docker way of working: you build your image and then start a container from it.
As for your problem, I think you have already mentioned your options:
Rebuild your image
Go inside a running container, make changes, and docker commit it back. Personally, I only use this approach to fix a tiny problem or make a hotfix to my image when docker build takes a really long time.
Docker uses a union FS with copy-on-write to build images, which means that if you want to make a change to an image, you can't change it in place; extra layer(s) are created to reflect your change(s), and in some cases the same image name simply points to the new layers. From the perspective of delivery, I think it's totally OK to build a new image (with a different tag) for each release, and arguably it should be done this way; that's why you have a Dockerfile. Images are not only something you start containers from, they are versioned delivery artifacts, and you can roll back to any version if you want or need to. So I think your current solution is OK.
A few more words here: for local development and testing, you can just bind-mount your /app.py into the container when you start it, something like docker run -v /path/to/host/app.py:/path/to/container/app.py your_base_image_to_run_app. Then anything you change in app.py on your local FS is reflected inside the container. When you finish your work, build a new image.
As per your current design, the solution is to create a new image and assign the same tag.
A better solution is to expose environment variables from the Docker image and use those variables to configure app.py, so that you don't need to rebuild the image every time. A single image is then sufficient.
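A minimal sketch of that idea (the variable name APP_GREETING and the file layout are hypothetical, not taken from your project):
# Dockerfile (sketch)
FROM python:3.6
COPY app.py /app.py
# hypothetical default, overridable at run time with docker run -e
ENV APP_GREETING="hello"
CMD ["python", "/app.py"]
# run the same image with different behaviour, no rebuild needed
docker run -e APP_GREETING="hi there" app
Inside app.py you would read the value from the environment (e.g. os.environ), so the image itself never has to change for configuration tweaks.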

Docker hub image cache doesn't seem to be working

We have a continuous integration pipeline on circleci that does the following:
Loads repo/image:mytag1 from the cache directory to be able to use cached layers
Builds a new version: docker build -t repoimage:mytag2
Saves the new version to the cache directory with docker save
Runs tests
Pushes to docker hub: docker push repo/image:mytag2
The problem is with step 5. The push step takes 5 minutes every time. If I understand it correctly, docker hub is meant to cache layers so we don't have to re-push things like the base image and dependencies if they are not updated.
I ran the build twice in a row, and I see a lot of crossover in the hashes of the layers being pushed. Yet rather than "Image already exists" I see "Image successfully pushed".
Here's the output of build 1's docker push, and here's build 2
If you diff those two files you'll see that only 2 layers differ in each build:
< ca44fed88be6: Buffering to Disk
< ca44fed88be6: Image successfully pushed
< 5dbd19bfac8a: Buffering to Disk
< 5dbd19bfac8a: Image successfully pushed
---
> 9136b10cfb72: Buffering to Disk
> 9136b10cfb72: Image successfully pushed
> 0388311b6857: Buffering to Disk
> 0388311b6857: Image successfully pushed
So why is it that all the images have to re-push every time?
Using a different tag creates a different image which, when pushed, cannot rely on the cache.
For example the two commands:
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v1
$ docker commit -m "thing" -a "me" db65bf421f96 me/thing:v2
yield utterly distinct images even though they were created from the identical image (db65bf421f96). When pushed, Docker Hub must treat them as completely separate images, as can be seen with:
$ docker images
REPOSITORY TAG IMAGE ID
me/thing v2 f14aa8ac6bae
me/thing v1 c7d72ccc1d71
The image IDs are unique, and thus the images are unique even if they differ only in their tags.
You could say "docker should recognize them as being bit for bit identical" and thus treat them as cachable. But it doesn't (yet).
The only surprise for me in your example is that you got any duplicate image IDs at all.
Authoritative (if less explanatory) documentation can be found in Docker's "Build your own images" guide.
The process should work as you described. In fact we're building all of our images in this way without problems. Usually there are just a few changes to the topmost layers and only those are pushed to the registry - otherwise the whole concept of image layers would be useless.
See here for an example: only the two topmost layers have changed and are pushed for :latest, while for :4.0.2 there's no push at all. We're tagging images with git tags, and for some projects we even tag images with git describe, to get rollback functionality just in case.
You can get the project source-code also from GitHub to try it out.
A few things to note about the setup: We're using a self-hosted GitLab CI with a customized runner which runs docker and docker-compose on an isolated host with Docker 1.9.1, but that should not make any difference.
There may also be differences in the registry version. I had the feeling (but I am not 100% sure) that some older repos on Docker Hub are still running on registry v1 while newer ones are always on v2, so you may try creating a new repo and see if the issue still occurs.
Please note that the tag behavior described above only applies when pushing the same image name. If you push the same image layers under another name, you always need to push all layers, despite the fact that all layers should already exist on the registry. So I guess repo/image:mytag1 and repoimage:mytag2 actually both go to repo/image and the missing slash is just a typo.
Another cause could be that your images are built on different hosts on Circle CI, but then you should also get different layer IDs, so I think this is not very likely.
I suggest building an image manually to try to reproduce the problem, or contacting Circle CI about this issue.

Docker: How to prevent the use of latest image from docker registry?

I was using centos image from https://registry.hub.docker.com/u/blalor/centos/
For some reason Blalor decided to remove passwd from the list of packages installed in the base image, and my containers stopped working on new deployments. Why doesn't Docker know which build was used for my containers? I have had to change my base images now and update every server's Docker image.
I could not use the tag feature because there is no tagging on blalor's images. Do I have to use the source code and host the centos image myself so that it does not change again?
You do not need to use the sources. If you have a working image, you can run docker history <your image> to see the image IDs that were used and tag the proper one as shortfellow/centos. If you do not have a working image, there is a build details section on the link you provided with the history of builds. You can see that it was built on January 13th, 2014, and the image at that time was a531daec9f98. You can put FROM a531daec9f98 in your Dockerfile to make sure it will never change, or you can docker tag a531daec9f98 shortfellow/centos (you will need to docker pull a531daec9f98 first).
It is very similar to git in the sense that if you are using someone's repository, and that someone does not use tags or branches, then when they update their repository and you re-pull, you will get the latest version with the new changes. In order to get back to the version you liked, you need to find the commit ID. The solution would be to fork the repository, which you can do in Docker by tagging the image under your username and pushing it to a registry (docker push username/image).
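A minimal sketch of that "fork" workflow (the image ID a531daec9f98 comes from the build history mentioned above; shortfellow stands in for your own username, and shortfellow/working-image is a hypothetical name):
docker history shortfellow/working-image      # find the base image ID a working image was built from
docker tag a531daec9f98 shortfellow/centos    # tag that exact image ID under your own name
docker push shortfellow/centos                # push it to your own repository so it cannot change underneath you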
