docker: how to show the diffs between 2 images


I have a Dockerfile with a sequence of RUN instructions that execute "apt-get install"s; for example, a couple of lines:
RUN apt-get install -y tree
RUN apt-get install -y git
After having executed "docker build", if I then execute "docker images -a", I see the listing of all the base-child-child-.... images that were created during the build.
I'd like to see a list of all of the packages that were installed when the "apt-get install -y git" line was executed (including the dependent packages that may have also been installed, besides the git packages).
Note: I believe that the "docker diff" command shows the diffs between a container and the image from which it was started. Instead I'd like the diffs between 2 images (of the same lineage): the "tree" and "git" image IDs. Is this possible?
Thanks.

Have a look at:
https://github.com/GoogleCloudPlatform/container-diff
This tool can diff local or remote docker images and can do so without requiring docker to be installed. It has file as well as package level "differs" (for example: apt, npm, and pip) so that you can more easily see the differences in packages that have changed between two docker images.
Disclaimer: I am a contributor to this project

This one worked for me:
docker run --rm e5cba87ecd29 bash -c 'find /path/to/files -type f | sort | xargs -I{} sha512sum {}' > /tmp/dockerfiles.e5cba87ecd29.txt
docker run --rm b1d19fe1a941 bash -c 'find /path/to/files -type f | sort | xargs -I{} sha512sum {}' > /tmp/dockerfiles.b1d19fe1a941.txt
meld /tmp/dockerfiles*
Where e5cba87ecd29 and b1d19fe1a941 are images I am interested in and /path/to/files is a directory which could be "/".
It lists all files, sorts them, and adds a hash of each one, so that changed contents show up as well as added or removed files. meld then highlights all the differences.
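For the original question (which packages a given RUN step pulled in), the same capture-and-compare idea can be applied to package lists instead of file checksums. A minimal sketch, assuming both images are Debian-based so that dpkg-query is available inside them. The function names list_packages and new_packages are made up for illustration, and the sample lists are illustrative stand-ins, not real output:

```shell
#!/bin/sh
# Hypothetical helper: print the sorted package names installed in an image.
# Assumes a Debian/Ubuntu-based image; it is defined here but not called
# in the demo below, which uses stand-in lists instead.
list_packages() {
    docker run --rm "$1" dpkg-query -W -f='${Package}\n' | LC_ALL=C sort
}

# Hypothetical helper: print names that appear only in the second sorted list.
new_packages() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Demo with illustrative stand-in lists (not captured from real images).
work=$(mktemp -d)
printf '%s\n' bash coreutils tree > "$work/tree.txt"
printf '%s\n' bash coreutils git git-man tree > "$work/git.txt"
new_packages "$work/tree.txt" "$work/git.txt"
```

With real images, each list would come from something like list_packages IMAGE_ID > file.txt, once per image.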

I suppose you could send both images' file systems to tarballs via docker export CONTAINER_ID or docker save IMAGE_ID (updated based on comments)
Then use whatever tool you like to diff the file systems - Git, Rdiff, etc.
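A sketch of that approach using only tar and comm, so no extra diff tool is needed. The two tarballs built here are tiny local stand-ins for what docker export would produce, and listing_diff is a hypothetical helper name. It compares file names only, not contents:

```shell
#!/bin/sh
# Hypothetical helper: compare the file listings of two tarballs.
# Prints "- path" for entries only in the first tarball and
# "+ path" for entries only in the second.
listing_diff() {
    tmp=$(mktemp -d)
    tar -tf "$1" | LC_ALL=C sort > "$tmp/first.lst"
    tar -tf "$2" | LC_ALL=C sort > "$tmp/second.lst"
    LC_ALL=C comm -23 "$tmp/first.lst" "$tmp/second.lst" | sed 's/^/- /'
    LC_ALL=C comm -13 "$tmp/first.lst" "$tmp/second.lst" | sed 's/^/+ /'
    rm -rf "$tmp"
}

# Build two small stand-in tarballs (in practice: docker export CONTAINER_ID > a.tar).
work=$(mktemp -d)
mkdir -p "$work/a/usr/bin" "$work/b/usr/bin"
touch "$work/a/usr/bin/tree" "$work/a/usr/bin/old-tool"
touch "$work/b/usr/bin/tree" "$work/b/usr/bin/git"
tar -cf "$work/a.tar" -C "$work/a" .
tar -cf "$work/b.tar" -C "$work/b" .

listing_diff "$work/a.tar" "$work/b.tar"
```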

It is now 2019 and I just found a useful tool which was released in late 2017.
https://opensource.googleblog.com/2017/11/container-diff-for-comparing-container-images.html
The following content is from container-diff github page:
container-diff diff <img1> <img2> --type=history [History]
container-diff diff <img1> <img2> --type=file [File System]
container-diff diff <img1> <img2> --type=size [Size]
container-diff diff <img1> <img2> --type=rpm [RPM]
container-diff diff <img1> <img2> --type=pip [Pip]
container-diff diff <img1> <img2> --type=apt [Apt]
container-diff diff <img1> <img2> --type=node [Node]
You can similarly run many analyzers at once:
container-diff diff <img1> <img2> --type=history --type=apt --type=node

Each RUN instruction creates a new container, and you can inspect what a container changed by using docker diff <container>.
So after building your Dockerfile, run docker ps -a to get a list of the containers the build created. It should look something like:
CONTAINER ID IMAGE COMMAND CREATED STATUS ...
53d7dadafee7 f71e394eb0fc /bin/sh -c apt-get i 7 minutes ago Exit 0 ...
...
Now you can run docker diff 53d7dadafee7 to see what was changed.

If you know the container ID or name (even of a stopped container),
you can quickly dump its file list on the fly.
$ docker export CONTAINER_ID_OR_NAME | tar tv
-rwxr-xr-x 0 0 0 0 2 6 21:22 .dockerenv
-rwxr-xr-x 0 0 0 0 2 6 21:22 .dockerinit
drwxr-xr-x 0 0 0 0 10 21 13:46 bin/
-rwxr-xr-x 0 0 0 1021112 10 8 2014 bin/bash
-rwxr-xr-x 0 0 0 31152 10 21 2013 bin/bunzip2
-rwxr-xr-x 0 0 0 0 10 21 2013 bin/bzcat link to bin/bunzip2
lrwxrwxrwx 0 0 0 0 10 21 2013 bin/bzcmp -> bzdiff
-rwxr-xr-x 0 0 0 2140 10 21 2013 bin/bzdiff
lrwxrwxrwx 0 0 0 0 10 21 2013 bin/bzegrep -> bzgrep
-rwxr-xr-x 0 0 0 4877 10 21 2013 bin/bzexe
......
Then you can save the list to a file and compare the two list files.
If you insist on using an image ID or name, you can dump the first layer's file list on the fly:
$ docker save alpine |tar xO '*/layer.tar' | tar tv
drwxr-xr-x 0 0 0 0 12 27 06:32 bin/
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/ash -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/base64 -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/bbconfig -> /bin/busybox
-rwxr-xr-x 0 0 0 821408 10 27 01:15 bin/busybox
All in all, I suggest you start the container and then stop it, so you can get a merged file list as described in the first approach.
2017/02/01: The fastest way to see a running container's files is to enter its root directory directly:
# PID=$(docker inspect -f '{{.State.Pid}}' CONTAINER_ID_OR_NAME)
# cd /proc/$PID/root && ls -lF
drwxr-xr-x 0 0 0 0 12 27 06:32 bin/
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/ash -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/base64 -> /bin/busybox
lrwxrwxrwx 0 0 0 0 12 27 06:32 bin/bbconfig -> /bin/busybox
-rwxr-xr-x 0 0 0 821408 10 27 01:15 bin/busybox
Note: if you are using docker-machine, you first need to enter it with
docker-machine ssh and then sudo sh.
Now that you have the root directories of the two containers, you can use diff to compare them directly.
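Once both root directories are reachable, the comparison could look like the sketch below. The two directories are small local stand-ins for /proc/$PID/root of each container, since the real paths depend on the running containers and usually need root. On real container roots, pseudo file systems such as /proc and /sys would likely need to be skipped.

```shell
#!/bin/sh
# Stand-ins for the two container root directories.
work=$(mktemp -d)
mkdir -p "$work/root1/etc" "$work/root2/etc" "$work/root2/usr/bin"
echo "v1" > "$work/root1/etc/app.conf"
echo "v2" > "$work/root2/etc/app.conf"
touch "$work/root2/usr/bin/git"

# -r recurses into subdirectories; -q reports only which files differ,
# not the line-level changes. diff exits with status 1 when it finds
# differences, so that status is not treated as an error here.
report=$(diff -rq "$work/root1" "$work/root2" || true)
echo "$report"
```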

Have a look at:
https://github.com/moul/docker-diff
They list Brew install instructions for Mac. I'm assuming it's a Bash script, so it could probably be made to work in other *nix environments.

It sounds like in this case maybe you only needed to see the diff between two layers. If so, dive is awesome for this; it lets you inspect the filesystem at each layer and you can filter files by change type (unchanged, added, removed, modified).
And if you want to inspect the differences between two unrelated images, having two dive processes running side by side works okay too.

Related

Using the same Docker image file permissions differ from machine to machine

I have a problem that I cannot grasp at all. I'm running my Jenkins pipeline in a Docker container on the master node. Now I added another node and want to run the pipeline there as well.
However, using the same image I get different file permissions in the container:
### master
> docker image ls node:10.20.1-stretch
REPOSITORY TAG IMAGE ID CREATED SIZE
node 10.20.1-stretch c5f1efe092a0 13 days ago 912MB
> docker run --rm -ti -u 1000:1000 node:10.20.1-stretch ls -la /home/node
total 20
drwxr-xr-x 2 1000 1000 4096 May 15 20:31 .
drwxr-xr-x 3 0 0 4096 May 15 20:31 ..
-rw-r--r-- 1 1000 1000 220 May 15 2017 .bash_logout
-rw-r--r-- 1 1000 1000 3526 May 15 2017 .bashrc
-rw-r--r-- 1 1000 1000 675 May 15 2017 .profile
### node 1
> docker image ls node:10.20.1-stretch
REPOSITORY TAG IMAGE ID CREATED SIZE
node 10.20.1-stretch c5f1efe092a0 13 days ago 912MB
> docker run --rm -ti -u 1000:1000 node:10.20.1-stretch ls -la /home/node
total 20
drwxr-xr-x 2 0 0 4096 May 26 05:42 .
drwxr-xr-x 1 0 0 4096 May 26 05:42 ..
-rw-r--r-- 1 0 0 220 May 26 05:42 .bash_logout
-rw-r--r-- 1 0 0 3526 May 26 05:42 .bashrc
-rw-r--r-- 1 0 0 675 May 26 05:42 .profile
I observed a similar behavior for the /tmp directory, which has chmod 1777 on master and 1755 on node 1.
# master
> docker -v
Docker version 19.03.9, build 9d988398e7
> dockerd -v
Docker version 19.03.9, build 9d988398e7
# node 1
> docker -v
Docker version 19.03.10, build 9424aeaee9
> dockerd -v
Docker version 19.03.10, build 9424aeaee9
I assume the wrong behavior is on node 1, as the /home/node directory and all of its children are owned by root:root there, but the same directory is owned by node:node on the master. However, I already upgraded the Docker version on node 1 from 19.03.8 to 19.03.10 and nothing changed.
Is there anything I don't understand about Docker containers? I have been working with them for a while, but have never observed such behavior.

I changed the storage driver from overlay2 to aufs. Now I have the correct permissions.

what does 'tar --overwrite' actually do (or not do)?

I see that Linux tar has an option --overwrite. But overwriting seems to be the default. Moreover, specifying tar --no-overwrite does not change this behavior as the info file seems to suggest.
So what does that option actually do?
I test it with
ls -l >junk
ls -l junk
tar -cf junk.tar junk
>junk
ls -l junk
tar <option?> -xf junk.tar # option varies, results do not
ls -l junk

There are a few subtleties, but in general, here's the difference:
By default, "tar" tries to open output files with the flags O_CREAT | O_EXCL. If the file exists, this will fail, after which "tar" will retry by first trying to delete the existing file and then re-opening with the same flags (i.e., creating a new file).
In contrast, with the --overwrite option, "tar" tries to open output files with the flags O_CREAT | O_TRUNC. If the file exists, it will be truncated to zero size and overwritten.
The main implication is that "tar" by default will delete and re-create existing files, so they'll get new inode numbers. With --overwrite, the inode numbers won't change:
$ ls -li foo
total 0
5360222 -rw-rw-r-- 1 buhr buhr 0 Jun 26 15:16 bar
$ tar -cf foo.tar foo
$ tar -xf foo.tar # inode will change
$ ls -li foo
total 0
5360224 -rw-rw-r-- 1 buhr buhr 0 Jun 26 15:16 bar
$ tar --overwrite -xf foo.tar # inode won't change
$ ls -li foo
total 0
5360224 -rw-rw-r-- 1 buhr buhr 0 Jun 26 15:16 bar
$
This also means that, for each file overwritten, "tar" by default will need three syscalls (open, unlink, open) while --overwrite will need only one (open with truncation).
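One way to observe the difference without relying on inode numbers (which a file system may reuse) is a hard link: a second name for the same inode keeps the old content when tar unlinks and re-creates the file, but follows along when tar truncates it in place. A small sketch, assuming GNU tar; the file names f and g are arbitrary:

```shell
#!/bin/sh
work=$(mktemp -d)
echo "archived" > "$work/f"
tar -cf "$work/f.tar" -C "$work" f

# Default extraction: f is unlinked and re-created, so g keeps the old inode.
ln "$work/f" "$work/g"             # g is a second name for f's inode
echo "local edit" > "$work/f"      # truncates in place, so g sees this too
tar -xf "$work/f.tar" -C "$work"
default_g=$(cat "$work/g")

# --overwrite: f is truncated and rewritten in place, so g changes with it.
ln -f "$work/f" "$work/g"          # re-link g to the current f
echo "local edit" > "$work/f"
tar --overwrite -xf "$work/f.tar" -C "$work"
overwrite_g=$(cat "$work/g")

echo "default:     g contains '$default_g'"
echo "--overwrite: g contains '$overwrite_g'"
```

With GNU tar, g should still hold "local edit" after the default extraction and "archived" after the --overwrite one.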

Why isn't docker-compose seeing my .env file when executing an ssh command inside a bash script?

When I ssh to server.foo.com as user fish and execute docker-compose up, docker-compose sees the corresponding .env file that sits in the same directory.
[fish@staging-core ~]$ ls -al
total 28
drwx------. 4 fish fish 191 Jun 18 21:47 .
drwxr-xr-x. 4 root root 34 Jun 18 20:35 ..
-rw-------. 1 fish fish 558 Jun 18 22:15 .bash_history
-rw-r--r--. 1 fish fish 18 Apr 11 00:53 .bash_logout
-rw-r--r--. 1 fish fish 193 Apr 11 00:53 .bash_profile
-rw-r--r--. 1 fish fish 231 Apr 11 00:53 .bashrc
-rw-r--r--. 1 fish fish 0 Jun 18 20:19 .cloud-locale-test.skip
drwx------. 2 fish fish 25 Jun 18 21:41 .docker
-rw-r--r--. 1 fish fish 838 Jun 18 22:15 .env
-rw-r--r--. 1 fish fish 1066 Jun 18 22:15 docker-compose.yml
drwx------. 2 fish fish 29 Jun 18 20:35 .ssh
However, when I try to run this via a bash script, I get the following error:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null fish@server.foo.com docker-compose down && docker-compose pull && docker-compose up -d && docker-compose scale api=4 && docker-compose scale ml=4
I get
Warning: Permanently added 'server.foo.com,165.227.100.2' (ECDSA) to the list of known hosts.
Removing network core_default
Network core_default not found.
WARNING: The FISH_TAG variable is not set. Defaulting to a blank string.
Why isn't docker-compose seeing my .env file when executing an ssh command inside a bash script?

You should double-quote your command; otherwise the chained instructions after the first && will execute in the local shell:
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null fish@server.foo.com \
"docker-compose down && \
docker-compose pull && \
docker-compose up -d && \
docker-compose scale api=4 && \
docker-compose scale ml=4"
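The effect of the quoting can be reproduced locally. In the sketch below, remote is a hypothetical stand-in for ssh: like ssh, it joins its arguments with spaces and hands them to a shell, here one started in a different directory. The directory and file names are made up for illustration:

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/remote-home" "$work/local-dir"
touch "$work/remote-home/.env" "$work/local-dir/local-only"
cd "$work/local-dir" || exit 1

# Hypothetical stand-in for `ssh user@host ...`: run the joined arguments
# in a new shell whose working directory is the "remote" home.
remote() {
    (cd "$work/remote-home" && sh -c "$*")
}

# Unquoted: the local shell splits at &&, so only the first ls is "remote".
unquoted=$(remote ls -A && ls -A)
# Quoted: the whole chain is passed along and runs "remotely".
quoted=$(remote "ls -A && ls -A")

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

In the unquoted case the second ls lists the local directory, which has no .env file; this mirrors docker-compose pull and the later commands running on the local machine instead of the server.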

Docker is deleting downloaded files during build when I use a VOLUME, why?

I have this simple Dockerfile:
FROM fabric8/java-centos-openjdk8-jdk
VOLUME /tmp
RUN curl -k -Lo /tmp/oc.tar.gz "https://mirror.openshift.com/pub/openshift-v3/clients/3.6.173.0.21/linux/oc.tar.gz" && ls -l /tmp
RUN ls -l /tmp && tar zxf /tmp/oc.tar.gz -C /usr/local/bin
It downloads a file, prints the /tmp folder contents, then runs ls again and extracts the downloaded file's content.
The problem is that right after the download the file is there (&& ls -l /tmp), but in the next RUN ls -l /tmp the file isn't there anymore.
Step 6/17 : RUN curl -k -Lo /tmp/oc.tar.gz "https://mirror.openshift.com/pub/openshift-v3/clients/3.6.173.0.21/linux/oc.tar.gz" && ls -l /tmp
---> Running in 5ad24909ed82
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 34.4M 100 34.4M 0 0 2489k 0 0:00:14 0:00:14 --:--:-- 5660k
total 35308
drwxr-xr-x 2 root root 4096 Mar 17 11:10 hsperfdata_root
-rwx------ 1 root root 836 Mar 2 01:07 ks-script-IAlIsB
-rw-r--r-- 1 root root 36145614 May 24 08:07 oc.tar.gz
-rw------- 1 root root 0 Mar 2 01:06 yum.log
Removing intermediate container 5ad24909ed82
---> 09e50e6d4d84
Step 7/17 : RUN ls -l /tmp && tar zxf /tmp/oc.tar.gz -C /usr/local/bin
---> Running in 49c305788ac9
total 8
drwxr-xr-x 2 root root 4096 Mar 17 11:10 hsperfdata_root
-rwx------ 1 root root 836 Mar 2 01:07 ks-script-IAlIsB
-rw------- 1 root root 0 Mar 2 01:06 yum.log
tar (child): /tmp/oc.tar.gz: Cannot open: No such file or directory
It has something to do with the VOLUME /tmp; without it, it works fine. What's the explanation for this?

Once the volume is defined, you won't be able to modify it in later build steps. My best guess of what is happening internally during the build is that a temporary volume is set up with the temporary container used to perform the RUN step, and when the RUN step completes, the changes to the image are captured, which will not include any changes to the temporary volume files. This behavior is documented by Docker:
Changing the volume from within the Dockerfile: If any build steps
change the data within the volume after it has been declared, those
changes will be discarded.
I've also blogged on the topic here.

Docker: opposite of docker import

An image created with docker import fails to run, because a necessary file is missing.
I am attempting to debug what is in the image and what is missing.
What I have tried
docker save creates a tarball, but it is not a simple tarball of files; it also contains all kinds of metadata.
docker export requires a running container.
Do I have any other alternatives?

If the image has a working shell, use that to debug.
docker run -ti BROKEN_IMAGE sh
If a shell doesn't work, export the image contents and you will have a tar file you can extract somewhere and inspect.
CID=$(docker create BROKEN_IMAGE)
docker export $CID | tar -tvf -
Detail
The source file/data for a docker import should be a tar file of the plain image contents. This data can be viewed or extracted somewhere with the tar command:
$ CID=$(docker create busybox)
$ docker export $CID > myimage.tar
$ tar -tvf myimage.tar
-rwxr-xr-x 0 0 0 0 28 Nov 01:03 .dockerenv
drwxr-xr-x 0 0 0 0 1 Nov 22:58 bin/
-rwxr-xr-x 0 0 0 1049688 1 Nov 22:58 bin/[
-rwxr-xr-x 0 0 0 0 1 Nov 22:58 bin/[[ link to bin/[
-rwxr-xr-x 0 0 0 0 1 Nov 22:58 bin/acpid link to bin/[
...
If the file/data is from a docker save there are additional layers to cater for the image metadata and layers.
$ docker save busybox | tar -tvf -
drwxr-xr-x 0 0 0 0 3 Nov 22:39 036a82c6d65f2fa43a13599661490be3fca1c3d6790814668d4e8c0213153b12/
-rw-r--r-- 0 0 0 3 3 Nov 22:39 036a82c6d65f2fa43a13599661490be3fca1c3d6790814668d4e8c0213153b12/VERSION
-rw-r--r-- 0 0 0 1174 3 Nov 22:39 036a82c6d65f2fa43a13599661490be3fca1c3d6790814668d4e8c0213153b12/json
-rw-r--r-- 0 0 0 1337856 3 Nov 22:39 036a82c6d65f2fa43a13599661490be3fca1c3d6790814668d4e8c0213153b12/layer.tar
-rw-r--r-- 0 0 0 1497 3 Nov 22:39 6ad733544a6317992a6fac4eb19fe1df577d4dec7529efec28a5bd0edad0fd30.json
-rw-r--r-- 0 0 0 203 1 Jan 1970 manifest.json
-rw-r--r-- 0 0 0 90 1 Jan 1970 repositories
The LAYER_ID/layer.tar file(s) contain each layer's contents, which would need to be extracted in the order of the Layers array in manifest.json.
$ cat manifest.json | jq .
[
{
"Config": "6ad733544a6317992a6fac4eb19fe1df577d4dec7529efec28a5bd0edad0fd30.json",
"RepoTags": [
"busybox:latest"
],
"Layers": [
"036a82c6d65f2fa43a13599661490be3fca1c3d6790814668d4e8c0213153b12/layer.tar"
]
}
]
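Extracting the layers in that order can be scripted. A rough sketch, assuming jq is installed and the archive uses the layout shown above. extract_layers is a hypothetical helper, the two-layer archive is a hand-built stand-in for real docker save output, and whiteout files (which mark deletions in upper layers) are not handled:

```shell
#!/bin/sh
# Hypothetical helper: unpack every layer listed in manifest.json, in order,
# on top of each other.
# $1 = directory holding the unpacked `docker save` archive
# $2 = target directory for the merged file system
extract_layers() {
    mkdir -p "$2"
    jq -r '.[0].Layers[]' "$1/manifest.json" |
    while IFS= read -r layer; do
        tar -xf "$1/$layer" -C "$2"
    done
}

# Hand-built stand-in for an unpacked archive with two layers;
# the second layer replaces etc/motd from the first.
work=$(mktemp -d)
mkdir -p "$work/src1/etc" "$work/src2/etc" "$work/save/layer1" "$work/save/layer2"
echo "v1" > "$work/src1/etc/motd"
echo "base" > "$work/src1/etc/hostname"
echo "v2" > "$work/src2/etc/motd"
tar -cf "$work/save/layer1/layer.tar" -C "$work/src1" etc
tar -cf "$work/save/layer2/layer.tar" -C "$work/src2" etc
cat > "$work/save/manifest.json" <<'EOF'
[{"Config": "config.json", "RepoTags": ["demo:latest"],
  "Layers": ["layer1/layer.tar", "layer2/layer.tar"]}]
EOF

extract_layers "$work/save" "$work/rootfs"
echo "etc/motd: $(cat "$work/rootfs/etc/motd" 2>/dev/null)"
```

For a real image, the input directory could be produced with mkdir save && docker save busybox | tar -xf - -C save before calling extract_layers save rootfs.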

Docker doesn't need a running container to be able to run docker export, it just needs a container:
docker create --name toexport brokenimage
docker export toexport -o exported.tar
docker rm toexport
docker save is complementary to the docker load command and can be used to save an image in a format suitable for loading, preserving each image layer and the image metadata. This is why the resulting tarball has more tarballs inside it.
