How do I use the "git-like" capabilities of Docker?

I'm specifically interested in how I run or roll back to a specific version of the (binary) image from docker, and have tried to clarify this question to that effect.
The Docker FAQ says:
Docker includes git-like capabilities for tracking successive versions of a container, inspecting the diff between versions, committing new versions, rolling back etc. The history also includes how a container was assembled and by whom, so you get full traceability from the production server all the way back to the upstream developer.
Google as I may, I can find no example of "rolling back" to an earlier container, inspecting differences, etc. (Obviously I can do such things for version-managed Dockerfiles, but the binary Docker image / container can change even when the Dockerfile does not, due to updated software sources, and I'm looking for a way to see and roll back between such changes).
For a basic example: imagine I run
docker build -t myimage .
on a Dockerfile that just updates the base ubuntu:
FROM ubuntu:14.04
RUN apt-get update -q && apt-get upgrade -y
If I build this same image a few days later, how can I diff these images to see what packages have been upgraded? How can I roll back to the earlier version of the image after re-running the same build command later?
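For what it's worth, one workaround sketch for the package-diff part (the tag names and dates here are hypothetical): tag each build distinctly and diff the package manifests the two images report.

```
# Tag each build distinctly, e.g. by date, instead of reusing myimage:latest
docker build -t myimage:2014-09-01 .
# ... a few days later ...
docker build --no-cache -t myimage:2014-09-05 .

# Dump each image's package list and diff them
docker run --rm myimage:2014-09-01 dpkg-query -W > old.txt
docker run --rm myimage:2014-09-05 dpkg-query -W > new.txt
diff old.txt new.txt
```

Note the --no-cache on the second build: without it, Docker reuses the cached layer for the unchanged RUN instruction, and the two images come out identical.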

Technically we are only rolling back AUFS layers, not necessarily rolling back history. If our workflow consists of interactively modifying our container and committing the changes with docker commit, then this really does roll back history, in the sense that it removes any package updates we applied in later layers, leaving the versions installed in earlier layers. It is very different if we rebuild the image from a Dockerfile: nothing here lets us get back to the previous version we had built; we can only remove steps (layers) from the Dockerfile. In other words, we can only roll back the history of our docker commits to an image.
It appears the key to rolling back to an earlier version of a docker image is simply to point the docker tag at an earlier hash.
For instance, consider inspecting the history of the standard ubuntu:latest image:
docker history ubuntu:latest
Shows:
IMAGE           CREATED         CREATED BY                                      SIZE
ba5877dc9bec    3 weeks ago     /bin/sh -c #(nop) CMD [/bin/bash]               0 B
2318d26665ef    3 weeks ago     /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.903 kB
ebc34468f71d    3 weeks ago     /bin/sh -c rm -rf /var/lib/apt/lists/*          8 B
25f11f5fb0cb    3 weeks ago     /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
9bad880da3d2    3 weeks ago     /bin/sh -c #(nop) ADD file:de2b0b2e36953c018c   192.5 MB
511136ea3c5a    14 months ago                                                   0 B
Imagine we want to go back to the image indicated by hash 25f:
docker tag 25f ubuntu:latest
docker history ubuntu:latest
And we see:
IMAGE           CREATED         CREATED BY                                      SIZE
25f11f5fb0cb    3 weeks ago     /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
9bad880da3d2    3 weeks ago     /bin/sh -c #(nop) ADD file:de2b0b2e36953c018c   192.5 MB
511136ea3c5a    14 months ago                                                   0 B
Of course, we probably never want to roll back in this way, since it makes ubuntu:latest not actually the most recent ubuntu in our local library. Note that we could have used any tag we wanted, e.g.
docker tag 25f ubuntu:notlatest
or simply launched the old image by hash:
docker run -it 25f /bin/bash
So simple and yet so neat. Note that we can combine this with docker inspect to get a bit more detail about the metadata of each image to which the Docker FAQ refers.
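For instance (a sketch; the template fields are from the standard docker inspect output for images), you can pull out individual metadata fields for a given layer:

```
docker inspect --format '{{.Created}} {{.Id}}' 25f
docker inspect 25f    # full JSON metadata for the layer
```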
Also note that docker diff and docker commit are rather unrelated to this process, as they refer to containers (i.e. running images), not to images directly. That is, if we run an image interactively and then add or change a file in the container, we can see the change (between the container and its image) using docker diff <container-id> and commit the change using docker commit <container-id>.

I'm not sure if you actually can use the hash as a tag. The hash IIRC is a reference to the image itself whereas the tag is more of a metadata field on the image.
The tags feature is imho quite badly documented, but the way you should probably use it is with semantic versioning of sorts to organise your tags and images. We're moving a complex (12-microservice) system to Docker, and after relying on latest I quickly ended up doing something like semantic versioning plus a changelog in the Git repo to keep track of changes.
This can also be good if, say, you have a docker branch that automatically takes changes and triggers a build on DockerHub - you can update a changelog and know which hash/timestamp goes with what.
Personally as the DockerHub build triggers are slow at present I prefer manually declaring a tag for each image and keeping a changelog, but YMMV and I suspect the tools will get better for this.
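That manual-tagging workflow can be sketched roughly like this (the image name and version numbers are hypothetical):

```
# Build and tag with an explicit semantic version, then move latest along
docker build -t myorg/payments:1.4.2 .
docker tag myorg/payments:1.4.2 myorg/payments:latest
docker push myorg/payments:1.4.2
docker push myorg/payments:latest
# Record 1.4.2 and its image hash in the changelog alongside the Git commit
```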

Related

Building tensorflow-serving docker gpu from source with docker?

I have followed these steps for building the docker images from source:
git clone https://github.com/tensorflow/serving
cd serving
docker build --pull -t $USER/tensorflow-serving-devel-gpu \
-f tensorflow_serving/tools/docker/Dockerfile.devel-gpu .
docker build -t $USER/tensorflow-serving-gpu \
--build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel-gpu \
-f tensorflow_serving/tools/docker/Dockerfile.gpu .
These took quite a long time to compile, and the build completed successfully.
Now if I check docker images, I see the response below:
REPOSITORY TAG IMAGE ID CREATED SIZE
root/tensorflow-serving-gpu latest 42e221bb6bc9 About an hour ago 8.49GB
root/tensorflow-serving-devel-gpu latest 7fd974e5e0c5 2 hours ago 21.8GB
nvidia/cuda 11.0-cudnn8-devel-ubuntu18.04 7c49b261611b 3 months ago 7.41GB
I have two doubts regarding this:
Building from source took a large amount of time, and now I want to back up / save these images or containers so I can re-use them later on a different machine with the same architecture. If you know how to do it, please help me with the commands.
Since I completed the build successfully, I want to free up some space by removing the unnecessary development images used to build tensorflow-serving-gpu. I have three images here related to TensorFlow Serving and I don't know which one to delete.
If you want to save images:
docker save root/tensorflow-serving-gpu:latest -o tfs.tar
And if you want to load it:
docker load -i tfs.tar
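Since these images are 8-21 GB, it may also be worth compressing the archive for transfer (a sketch; docker load accepts gzip-compressed archives directly):

```
docker save root/tensorflow-serving-gpu:latest | gzip > tfs.tar.gz
# on the target machine (same architecture):
docker load -i tfs.tar.gz
```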
root/tensorflow-serving-gpu and root/tensorflow-serving-devel-gpu are two different images. You can see the differences by looking at the details of Dockerfile.devel-gpu and Dockerfile.gpu.

Correctly keeping docker VSTS / Azure Devops build agent clean yet cached

We have added a dockerised build agent to our development Kubernetes cluster which we use to build our applications as part of our Azure Devops pipelines. We created our own image based on the deprecated Microsoft/vsts-agent-docker on Github.
The build agent uses Docker outside of Docker (DooD) to create images on our development cluster.
This agent was working well for a few days but then an error would occasionally occur on the docker commands in our build pipeline:
Error response from daemon: No such image: fooproject:ci-3284.2
/usr/local/bin/docker failed with return code: 1
We realised that the build agent was creating tons of images that were never being removed. These were clogging up the build agent, and some expected images were missing, which would explain the "no such image" error message.
By adding a step to our build pipelines with the following command we were able to get our build agent working again:
docker system prune -f -a
But of course this then removes all our images, and they must be built from scratch every time, which causes our builds to take an unnecessarily long time.
I'm sure this must be a solved problem but I haven't been able to locate any documentation on the normal strategy for dealing with a dockerised build agent becoming clogged over time. Being new to docker and kubernetes I may simply not know what I am looking for. What is the best practice for creating a dockerised build agent that stays clean and functional, while maintaining a cache?
EDIT: Some ideas:
Create a build step that cleans up all but the latest image for the given pipeline (this might still clog the build server though).
Have a cron job run that removes all the images every x days (this would result in slow builds the first time after the job runs, and could still clog the build server if it sees heavy usage).
Clear all images nightly and run all builds outside of work hours. This way builds would run quickly during the day. However heavy usage could still clog the build server.
EDIT 2:
I found someone with a docker issue on Github who seems to be trying to do exactly the same thing as me. He came up with a solution which he described as follows:
I was exactly trying to figure out how to remove "old" images out of my automated build environment without removing my build dependencies. This means I can't just remove by age, because the nodejs image might not change for weeks, while my app builds can be worthless in literally minutes.
docker image rm $(docker image ls --filter reference=docker --quiet)
That little gem is exactly what I needed. I dropped my repository name in the reference variable (not the most self-explanatory.) Since I tag both the build number and latest the docker image rm command fails on the images I want to keep. I really don't like using daemon errors as a protection mechanism, but it's effective.
Trying to follow these directions, I have applied the latest tag to everything that is built during the process, and then run
docker image ls --filter reference=fooproject
If I try to remove these I get the following error:
Error response from daemon: conflict: unable to delete b870ec9c12cc (must be forced) - image is referenced in multiple repositories
Which prevents the latest one from being removed. However this is not exactly a clean way of doing this. There must be a better way?
Probably you've already found a solution, but it might be useful for the rest of the community to have an answer here.
docker prune has a limited purpose. It was created to address the issue of cleaning up all local Docker images. (As mentioned by thaJeztah here.)
To remove images in a more precise way, it's better to divide this task into two parts:
1. select/filter images to delete
2. delete the list of selected images
E.g.:
docker image rm $(docker image ls --filter reference=docker --quiet)
docker image rm $(sudo docker image ls | grep 1.14 | awk '{print $3}')
docker image ls --filter reference=docker --quiet | xargs docker image rm
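A variant sketch that explicitly protects whatever latest currently points at, instead of relying on daemon errors (the fooproject reference is taken from the question above):

```
# Build the removal list so it never contains the ID currently tagged latest
keep=$(docker images --format '{{.ID}}' fooproject:latest)
docker images --format '{{.ID}}' fooproject \
  | grep -v "$keep" | sort -u | xargs -r docker image rm
```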
It is possible to combine filter clauses to get exactly what you want:
(I'm using Kubernetes master node as an example environment)
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.14.2 5c24210246bb 3 months ago 82.1MB
k8s.gcr.io/kube-apiserver v1.14.2 5eeff402b659 3 months ago 210MB
k8s.gcr.io/kube-controller-manager v1.14.2 8be94bdae139 3 months ago 158MB
k8s.gcr.io/kube-scheduler v1.14.2 ee18f350636d 3 months ago 81.6MB # before
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 6 months ago 52.6MB
k8s.gcr.io/coredns 1.3.1 eb516548c180 7 months ago 40.3MB # since
k8s.gcr.io/etcd 3.3.10 2c4adeb21b4f 8 months ago 258MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 20 months ago 742kB
$ docker images --filter "since=eb516548c180" --filter "before=ee18f350636d"
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 6 months ago 52.6MB
$ docker images --filter "since=eb516548c180" --filter "reference=quay.io/coreos/flannel"
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 6 months ago 52.6MB
$ docker images --filter "since=eb516548c180" --filter "reference=quay*/*/*"
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 6 months ago 52.6MB
$ docker images --filter "since=eb516548c180" --filter "reference=*/*/flan*"
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 6 months ago 52.6MB
As mentioned in the documentation, images / image ls filter is much better than docker prune filter, which supports until clause only:
The currently supported filters are:
• dangling (boolean - true or false)
• label (label=<key> or label=<key>=<value>)
• before (<image-name>[:<tag>], <image id> or <image@digest>) - filter images created before the given id or reference
• since (<image-name>[:<tag>], <image id> or <image@digest>) - filter images created since the given id or reference
If you need more than one filter, then pass multiple flags
(e.g., --filter "foo=bar" --filter "bif=baz")
You can use other linux cli commands to filter docker images output:
grep "something" # to include only specified images
grep -v "something" # to exclude images you want to save
sort [-k colN] [-r] [-g]] | head/tail -nX # to select X oldest or newest images
Combining them and putting the result to CI/CD pipeline allows you to leave only required images in the local cache without collecting a lot of garbage on your build server.
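For instance, here is a toy combination of these on a captured listing (the dates and image names below are sample data standing in for `docker images --format '{{.CreatedAt}};{{.Repository}}:{{.Tag}}'` output; the ISO timestamps make a plain text sort chronological):

```shell
# Sort by creation date, keep the 2 oldest entries, cut out the reference
printf '%s\n' \
  '2019-08-14;k8s.gcr.io/kube-proxy:v1.14.2' \
  '2019-02-01;quay.io/coreos/flannel:v0.11.0-amd64' \
  '2019-01-25;k8s.gcr.io/coredns:1.3.1' \
  | sort | head -n2 | cut -d';' -f2
```

In a real pipeline the cut result would be fed to xargs -r docker image rm.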
I've copied here a good example of that approach, provided by strajansebastian in the comments:
#example of deleting all builds except the last 2 for each kind of image
#(the image kind is based on the Repository value.)
#If you want to preserve just the last build, change tail -n+3 to tail -n+2.
# delete dead containers
docker container prune -f
# keep last 2 builds for each image from the repository
for diru in `docker images --format "{{.Repository}}" | sort | uniq`; do
for dimr in `docker images --format "{{.ID}};{{.Repository}}:{{.Tag}};'{{.CreatedAt}}'" --filter reference="$diru" | sed -r "s/\s+/~/g" | tail -n+3`; do
img_tag=`echo $dimr | cut -d";" -f2`;
docker rmi $img_tag;
done;
done
# clean dangling images if any
docker image prune -f

Why can't I run a newly created Docker image?

I created two new images, anubh_custom_build_image/ubuntu_bionic:version1 & ubuntu_bionic_mldev:version1, from the base image ubuntu:bionic. The purpose of creating the custom-built Ubuntu docker images was to get a Linux system on the Windows platform. I have faced many issues in the past; one such issue was installing a new version of the tensorflow library with ! pip install -q tf-nightly, since I can't find a substitute for the ! (the Jupyter shell-escape) to run this command in the Windows cmd-prompt/PowerShell. Moreover, I want to invest more time in my codebase rather than fixing issues on different OSes. So I pulled the latest Ubuntu image from Docker, installed a bunch of libraries for my usage, and committed it using the docker commit command:
docker commit 503130713dff ubuntu_bionic_MLdev:version1
I can see the images using:
PS C:\Users\anubh> docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu_bionic_mldev version1 e7d1b154b69f 21 hours ago 9.33GB
anubh_custom_build_image/ubuntu_bionic version1 3c98f8954731 22 hours ago 9.33GB
tensorflow/tensorflow latest 2c8d1fd8bde4 2 days ago 1.25GB
ubuntu bionic 735f80812f90 2 weeks ago 83.5MB
ubuntu latest 735f80812f90 2 weeks ago 83.5MB
floydhub/dl-docker cpu 0b9fc622f1b7 2 years ago 2.87GB
When I tried to spin up containers using these images, the following commands ran without any error.
PS C:\Users\anubh> docker run anubh_custom_build_image/ubuntu_bionic:version1
PS C:\Users\anubh> docker run ubuntu_bionic_mldev:version1
EDIT:
The issue is that the run command executes but the containers aren't spinning up for the above two images. I apologize for attaching the wrong error message in the first post; I have edited it now. The below two containers were spun up using the docker run -it -p 8888:8888 tensorflow/tensorflow & docker run ubuntu:bionic commands.
PS C:\Users\anubh> docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
94d59b217b70 tensorflow/tensorflow "/run_jupyter.sh --a…" 21 hours ago Up 21 hours 6006/tcp, 8888/tcp boring_clarke
503130713dff ubuntu:bionic "bash" 38 hours ago Up 38 hours awesome_bardeen
Could anyone suggest what I am missing to run these images anubh_custom_build_image/ubuntu_bionic:version1 & ubuntu_bionic_mldev:version1 (built from the base image ubuntu:bionic) as containers?
Also, I can't find the location of any of these images on my disk.
Could anyone also suggest where to look inside the Windows OS?
NOTE: I will write a dockerfile in the future to build custom images, but for now I want to use the commit command to create new images & use them.
Your docker run command doesn't work because you don't have the :version1 tag at the end of it. (Your question claims you do, but the actual errors you quote don't have it.)
However: if this was anything more than a simple typo, the community would probably be unable to help you, because you have no documentation about what went into your image. (And if not "the community", then "you, six months later, when you discover your image has a critical security vulnerability".) Learn how the Dockerfile system works and use docker build to build your custom images. Best-practice workflows tend to not use docker commit at all.
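As a sketch of that, a Dockerfile equivalent of the committed image might look like the following (the package list is hypothetical -- only you know what actually went into your commit):

```dockerfile
FROM ubuntu:bionic
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*
RUN pip3 install tf-nightly
CMD ["/bin/bash"]
```

Then docker build -t anubh_custom_build_image/ubuntu_bionic:version2 . reproduces the image from documentation instead of from a container's state.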

The approach to restore a pre-configured docker image

I am completely new to docker. I have a quick question about docker images.
Assume that I have set up a local docker image with certain software / servers installed. Now I want to set a checkpoint / snapshot here, so that all work done after this checkpoint is temporary; at a certain time, I would restore the original image (from that checkpoint) and overwrite everything in the temporary image.
My first question is: does this use-case make sense?
My second question: if it does, what is the approach for making that checkpoint (simply how, as I am keeping the checkpoint image on local disk only, no cloud repos involved), and how do I restore the image to overwrite everything in the temporary image when needed?
Though I have read a bit of the docker documentation, I am still struggling with the conceptual things.
It makes sense, though you could also consider managing the data in a volume (or a host folder mounted in the container).
That way the data remains persistent even when you stop and restart the container.
what is the approach in doing that checkpoint
If your container does not mount a volume, and has its data inside, then yes, stopping and removing a container will lose that data.
One possibility is to create that snapshot with docker commit.
That will freeze the container state as a new image, that you can run later.
Example:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c3f279d17e0a ubuntu:12.04 /bin/bash 7 days ago Up 25 hours desperate_dubinsky
197387f1b436 ubuntu:12.04 /bin/bash 7 days ago Up 25 hours focused_hamilton
$ docker commit c3f279d17e0a svendowideit/testimage:version3
f5283438590d
$ docker images
REPOSITORY TAG ID CREATED SIZE
svendowideit/testimage version3 f5283438590d 16 seconds ago 335.7 MB
As far as I know there is nothing like a checkpoint as such in Docker, but you can save the actions performed in a container by committing them as a new image.
An example for better understanding:
Let's run a container using a base Ubuntu image and create a folder inside the container:
# docker run -it ubuntu:14.04 /bin/bash
root@58246867493d:/#
root@58246867493d:/# cd /root
root@58246867493d:~# ls
root@58246867493d:~# mkdir TEST_DIR
root@58246867493d:~# exit
Status of the exited container:
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
58246867493d ubuntu:14.04 "/bin/bash" 2 minutes ago Exited (127) 57 seconds ago hungry_turing
Now you can commit the changes so that the container will be saved into a new Docker Image:
# docker commit 58246867493d ubuntu:15.0
Get the docker images:
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 15.0 acac1f3733b2 10 seconds ago 188MB
ubuntu 14.04 132b7427a3b4 10 hours ago 188MB
Run the newly built image to see the changes committed in the previous container:
# docker run -it ubuntu:15.0 /bin/bash
root@3a48af5eaec9:/# cd /root/
root@3a48af5eaec9:~# ls
TEST_DIR
root@3a48af5eaec9:~# exit
As you asked, any further changes you make in this container can either be committed to form a new image or be ignored.
Hope it helps.

Best practice to apply patch to a modified docker container?

So let's say we just spun up a docker container and allow users to SSH into the container by mapping port 22:22.
Users then install some software like git or whatever they want, so the container is now polluted.
Later on, suppose I want to apply some patches to the container, what is the best way to do so?
Keep in mind that the user has modified contents in container, including some system level directories like /usr/bin. So I cannot simply replace the running container with another image.
So to give you some real-life use cases: take Nitrous.io as an example. I saw they are using docker containers to serve as users' VMs, so users can install packages like Node.js global packages. So how do they update/apply patches to containers like a pro? Similar platforms like Codeanywhere might work in the same way.
I tried googling it but failed. I am not 100 percent sure whether this is a duplicate, though.
User then installed some software like git or whatever they want ... I want to apply some patch to the container, what is the best way to do so ?
The recommended way is to plan your updates through a Dockerfile. However, if you are unable to achieve that, then any additional changes or new packages installed in the container should be committed before the container exits.
ex: Below is a simple image which does not have vim installed.
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
pingimg 1.5 1e29ac7353d1 4 minutes ago 209.6 MB
Start the container and check whether vim is installed:
$ docker run -it pingimg:1.5 /bin/bash
root@f63accdae2ab:/#
root@f63accdae2ab:/# vim
bash: vim: command not found
Install the required packages inside the container (no sudo needed, since we are already root):
root@f63accdae2ab:/# apt-get update && apt-get install -y vim
Back on the host, commit the container with a new tag before stopping or exiting the container.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f63accdae2ab pingimg:1.5 "/bin/bash" About a minute ago Up About a minute modest_lovelace
$ docker commit f63accdae2ab pingimg:1.6
378e0359eedfe902640ff71df4395c3fe9590254c8c667ea3efb54e033f24cbe
$ docker stop f63accdae2ab
f63accdae2ab
Now docker images should show both tags or versions. Note that the updated image is larger:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
pingimg 1.6 378e0359eedf 43 seconds ago 252.8 MB
pingimg 1.5 1e29ac7353d1 4 minutes ago 209.6 MB
Start a container from the recently committed image, and you can see that vim is installed:
$ docker run -it pingimg:1.6 /bin/bash
root@63dbbb8a9355:/# which vim
/usr/bin/vim
Verify the contents of the previous version of the image; you should notice that vim is still missing:
$ docker run -it pingimg:1.5 /bin/bash
root@99955058ea0b:/# which vim
root@99955058ea0b:/# vim
bash: vim: command not found
Hope this helps!
There's a whole branch of software called configuration management that seeks to solve this issue, with solutions such as Ansible and Puppet. Whilst designed with VMs in mind, it is certainly possible to use such solutions with containers.
However, this is not the Docker way. Rather than patch a Docker container, throw it away and replace it with a new one. If you need to install new software, add it to the Dockerfile and build a new container as per #askb's solution. By doing things this way, we can avoid a whole set of headaches (similarly, prefer docker exec to installing ssh in containers).
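That replace-don't-patch cycle looks roughly like this (the image and container names are illustrative):

```
# Add the new package to the Dockerfile, then:
docker build -t myapp:1.1 .
docker stop myapp && docker rm myapp
docker run -d --name myapp myapp:1.1
# For a one-off shell, prefer exec over an in-container sshd:
docker exec -it myapp /bin/bash
```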
