Docker image layer: What does `ADD file:<some_hash> in /` mean?

Docker Hub images show a list of the commands that were run for each image layer. Here is a golang example.
Some applications also provide their Dockerfile on GitHub. Here is a golang example.
According to the Docker Hub image layer list, ADD file:4b03b5f551e3fbdf47ec609712007327828f7530cc3455c43bbcdcaf449a75a9 in / is the first command. The image layers don't include any FROM command, and the line doesn't seem to match the documented ADD syntax either.
So here are the questions:
What does ADD file:<HASH> in / mean? What is this format?
Is there any way I could trace upwards using the hash? I suppose the hash represents the FROM image, but there seems to be no API for that.
Why is it not possible to build a Dockerfile using the ADD file:<HASH> in / syntax? Is there any way I could build an image using that syntax, or convert between the two formats?

That Docker Hub history view doesn't show the actual Dockerfile; instead, it shows content essentially extracted from the docker history of the image. That doesn't preserve the specific details you're looking for: it doesn't remember the names of base images, or the build-context file names of things that get ADDed or COPYed in.
Chasing through GitHub and Docker Hub links, the golang:*-buster Dockerfile is built FROM buildpack-deps:...-scm; buildpack-deps:buster-scm is FROM buildpack-deps:buster-curl; that is FROM debian:buster; and that has a very simple Dockerfile (quoted here in its entirety):
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
FROM scratch starts from a completely empty image; that is the base of the Docker image tree (and what tells docker history and similar tools to stop). The ADD line unpacks a tar file of a Debian system image.
If you look at docker history or the Docker Hub history view you cite, you should be able to see these same steps happening. The ADD file:4b0... in / corresponds to the ADD rootfs.tar.xz /, and the second line is the CMD ["bash"]. The history is not split up by Dockerfile or image, and the original filenames from ADD aren't saved. (You couldn't reproduce the image anyway without the contents of rootfs.tar.xz, so knowing its filename is only slightly helpful, not essential.)
The ADD file:hash in /path syntax is not standard Dockerfile syntax (the word in in particular is not part of it). I'm not sure there's a reliable way to translate from the host file or URL to the hash, but building the image and looking at its docker history would tell you (assuming you've got a perfect match for the file metadata). There's no way to get back to the original filename or syntax, and definitely no way to get back to the file contents.
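The ADD file:<hash> in / entries can be picked out of a history listing mechanically. The sketch below assumes the classic builder's CreatedBy format (/bin/sh -c #(nop) ADD file:<hash> in <dest>), as printed by docker history --no-trunc --format '{{.CreatedBy}}' [image]; the regex and the names AddedContent and find_added_content are illustrative, and BuildKit-built images record history differently. It only extracts the hash and the destination: as explained above, the hash cannot be turned back into a filename or file contents.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed classic-builder format, e.g.
#   /bin/sh -c #(nop) ADD file:4b03b5f5...75a9 in /
# Kinds other than "file" ("dir", "multi") are included on the assumption
# that ADD/COPY of directories or several sources are recorded similarly.
ADD_RE = re.compile(
    r"#\(nop\)\s+(?P<instr>ADD|COPY)\s+"
    r"(?P<kind>file|dir|multi):(?P<hash>[0-9a-f]+)\s+in\s+(?P<dest>\S+)"
)


class AddedContent(NamedTuple):
    instruction: str   # "ADD" or "COPY"
    kind: str          # "file", "dir" or "multi"
    content_hash: str  # builder-computed hash: not a filename, not an image ID
    destination: str   # path inside the image


def parse_created_by(created_by: str) -> Optional[AddedContent]:
    """Return the ADD/COPY details of one history entry, or None."""
    m = ADD_RE.search(created_by)
    if m is None:
        return None
    return AddedContent(m["instr"], m["kind"], m["hash"], m["dest"])


def find_added_content(history: List[str]) -> List[AddedContent]:
    """Collect every ADD/COPY record from a list of CreatedBy strings."""
    found = []
    for entry in history:
        parsed = parse_created_by(entry)
        if parsed is not None:
            found.append(parsed)
    return found
```

Feeding it the debian:buster history would yield a single AddedContent with destination "/", which is the ADD rootfs.tar.xz / step with its filename already gone.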

ADD and COPY mean that files are appended to the image.
Those are just files; you cannot "trace" them back to a source image.
You cannot just copy the commands, because the hashes are not the original files. See https://forums.docker.com/t/how-to-extract-file-from-image/96987 to get the file.

Related

How to merge Docker Compose in one image? [duplicate]

I'm hoping to use docker to set up some bioinformatic analysis.
I have found two docker images that I would like to use:
jupyter/datascience-notebook
bioconductor/devel_base
I have been successful in running each of these images independently, however I don't know how to merge them together.
Is merging two docker containers possible? Or do you start with one, and then manually install the features of the other?
You can't just merge the images. You have to create your own image based on what was in each of the images you want. You can pull both images and get a rough view of the Dockerfile for each from its history like this:
docker history --no-trunc=true image1 > image1-dockerfile
docker history --no-trunc=true image2 > image2-dockerfile
Substitute image1 and image2 with the images you want to see the history for. You can then use those histories as a guide to write a Dockerfile for your own image that combines the two.
The fly in the ointment here is that any ADD or COPY commands will not reveal what was copied because you don't have access to the local file system from which the original images were created. With any luck that won't be necessary or you can get any missing bits from the images themselves.
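The history output is a table rather than a Dockerfile, so some manual translation is needed. As a rough sketch, the hypothetical helper below assumes the classic builder's CreatedBy format and input from docker history --no-trunc --format '{{.CreatedBy}}' [image], which lists the newest layer first. It reverses the entries, turns plain /bin/sh -c commands into RUN lines, keeps #(nop) metadata instructions verbatim, and comments out ADD/COPY records because their source files are not recoverable. The result is a starting point for a combined Dockerfile, not something that builds as-is.

```python
from typing import List

NOP_PREFIX = "/bin/sh -c #(nop) "  # metadata-only steps (ENV, CMD, ADD/COPY records)
SHELL_PREFIX = "/bin/sh -c "        # RUN steps in the classic builder


def history_to_draft_dockerfile(created_by_newest_first: List[str]) -> str:
    """Rough Dockerfile draft from CreatedBy strings, newest entry first.

    The result needs manual review: ADD/COPY sources are unknown, and
    entries in formats this sketch doesn't recognize are left as comments.
    """
    lines = []
    for entry in reversed(created_by_newest_first):  # build order: oldest first
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith(NOP_PREFIX):
            instruction = entry[len(NOP_PREFIX):].strip()
            if instruction.startswith(("ADD ", "COPY ")):
                lines.append("# source files unknown, original was: " + instruction)
            else:
                lines.append(instruction)
        elif entry.startswith(SHELL_PREFIX):
            lines.append("RUN " + entry[len(SHELL_PREFIX):])
        else:
            lines.append("# unrecognized history entry: " + entry)
    return "\n".join(lines) + "\n"
```

Running it on both images and diffing the drafts is one way to spot which apt-get install steps you would need to carry over.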
If there are specific files or directories that you want to cherry-pick from one of the two images, you can create a new Dockerfile that builds FROM one of them and copy over specific paths from the other using COPY's --from option. For example:
FROM bioconductor/devel_base
COPY --from=jupyter/datascience-notebook /path/to/something-you-want /path
However, a quick investigation of those images shows that in this specific case there isn't a lot that can easily be cherry-picked.
Alternatively, you can just look at the original Dockerfiles and combine them yourself:
https://github.com/jupyter/docker-stacks/blob/master/base-notebook/Dockerfile
https://github.com/Bioconductor/bioc_docker/blob/master/out/devel_base/Dockerfile
Fortunately they are both based on APT-based distros: Ubuntu and Debian. So most of the apt-get install commands should work fine whichever base image you pick.
You start with one, then manually install the features of the other. Merging would be far too complex, with too many unknowns.

How can docker load a diff image?

I know docker save can save an image to a tar file, and docker load can reload an image from it.
For example:
I have machines A and B. B can't connect to Docker Hub. A has image:latest and B has image:base.
I have to save multiple images on A as tar files, but the tar files are too big to transfer.
Can I save only the diff between tags or image IDs on A and load that diff on B,
instead of saving the whole image, so that update patches are much smaller?
This isn't possible using standard Docker tooling. The only option docker save takes is an option to write to a file rather than to stdout, and it always contains all parent layers (and base images).
If your only problem is transferring the images, consider either techniques to reduce the image size (for example, use a multi-stage image to not include build-time dependencies in the final image) or using tools like split(1) to break the tar file into smaller parts.
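If you'd rather not rely on split(1), the same idea can be sketched in Python: cut the tar file into fixed-size parts, copy them across, join them on the other machine, and compare SHA-256 checksums before running docker load -i on the joined file. The function names and the .partNNNN naming scheme are illustrative; each part is held in memory while it is written, so pick a chunk size that fits comfortably in RAM.

```python
import hashlib
from pathlib import Path
from typing import List


def split_file(path: Path, chunk_size: int) -> List[Path]:
    """Write path as numbered parts of at most chunk_size bytes each."""
    parts = []
    with path.open("rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part = path.with_name(f"{path.name}.part{index:04d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: List[Path], out: Path) -> None:
    """Concatenate the parts, in order, into out."""
    with out.open("wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()
```

Note that this only makes the transfer easier; the total amount of data is unchanged.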
I believe the docker save tar file output is the same as the "Export an image" API call. It might be possible to manually edit that tar file to delete layers, and there might be tools out there that do this. (This is not a particularly mainstream path, though; I looked into it several years ago but haven't done it myself, and I occasionally see tools mentioned in infrequent SO answers.)
Because docker pull and docker save only ever produce complete image chains, in principle Docker is set up so that you never have only the "top half" of an image without the base layers below it. Editing the docker save tar files by hand would violate this invariant.
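To see how much smaller a hypothetical diff would be, you can compare the layer chains of two images that have been saved and extracted. This sketch assumes the docker save layout where manifest.json names the image's Config file, whose rootfs.diff_ids array lists the uncompressed layers from the base upward. It only reports which layers the newer image adds on top of the shared prefix; it does not produce anything docker load will accept, for the reasons given above.

```python
import json
from pathlib import Path
from typing import List


def diff_ids(extracted_save_dir: Path) -> List[str]:
    """Layer diff IDs (base first) of the first image in an unpacked `docker save` archive."""
    manifest = json.loads((extracted_save_dir / "manifest.json").read_text())
    config = json.loads((extracted_save_dir / manifest[0]["Config"]).read_text())
    return config["rootfs"]["diff_ids"]


def layers_missing_from_base(base_ids: List[str], new_ids: List[str]) -> List[str]:
    """Layers of the new image above the longest prefix shared with the base.

    Layers form a chain, so a layer only counts as shared while every layer
    below it is shared too.
    """
    shared = 0
    for base_layer, new_layer in zip(base_ids, new_ids):
        if base_layer != new_layer:
            break
        shared += 1
    return new_ids[shared:]
```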

What does "FROM image" do in Dockerfiles?

I see that Dockerfiles usually have a line beginning with the "FROM" keyword, for example:
FROM composer/composer:1.1-alpine AS composer
As far as I know, dockerfiles are a set of commands that help to build and run many containers in docker.
In the example above, Docker uses an image named composer/composer:1.1-alpine from Docker Hub. The AS composer just makes an alias, so we can refer to it more conveniently.
When I looked up the image, I found a couple of links about it.
The thing I dont really understand is that:
I guess Docker will use the image to build something, but how exactly does it use the image? Does Docker run the image, or just prepare to use it when needed? Sometimes I don't see the Dockerfile use the image in any following line (in this example, no line uses the name "composer" except the first one). This confuses me.
Any help would be appreciated.
Thanks.
A Dockerfile describes layers: each instruction that changes the filesystem (such as RUN, COPY or ADD) creates its own layer. For example:
RUN touch test.txt
RUN cp test.txt foo.txt
would create two layers: the first one adds the file test.txt, and the second one records only the new foo.txt (test.txt is still visible, inherited from the first layer).
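The layer stacking described above can be modeled with a toy union filesystem: each layer maps a path to its contents, layers are applied bottom to top, and None stands in for a deletion. This is an illustration only; real image layers are tar archives, and deletions are recorded as whiteout files (names prefixed with .wh.) rather than a None value. The third layer is a hypothetical RUN rm test.txt step added to show a deletion.

```python
from typing import Dict, List, Optional

# A layer maps a path to file contents; None marks a deletion ("whiteout").
Layer = Dict[str, Optional[str]]


def merged_view(layers: List[Layer]) -> Dict[str, str]:
    """Apply layers bottom to top, the way a union filesystem presents them."""
    view: Dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                view.pop(path, None)   # deleted in this layer
            else:
                view[path] = content   # added or overwritten in this layer
    return view


layer1: Layer = {"test.txt": ""}    # RUN touch test.txt
layer2: Layer = {"foo.txt": ""}     # RUN cp test.txt foo.txt: only the change
layer3: Layer = {"test.txt": None}  # hypothetical RUN rm test.txt

print(merged_view([layer1, layer2]))          # both files are visible
print(merged_view([layer1, layer2, layer3]))  # only foo.txt is visible
```

Deleting a file in a later layer hides it but does not shrink the image, because the earlier layer still carries it.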
Each layer adds something to the image. When we walk the layers all the way down, we find that the very first layer sits on the empty scratch image, which contains nothing at all, not even a kernel: containers use the host's (Linux or Windows) kernel. But that's not really useful; we need a lot of tools (e.g. bash) to be able to run an app. So common base images like alpine add such tools and core OS functions.
It would be annoying as hell if we had to do this setup in every image, so there are lots of base images which do exactly this kind of setup.
If you want to see what a base image does, just search for its name on hub.docker.com; there you will usually find the Dockerfile describing the build process.
Additionally, images can be extended: e.g. you use the elasticsearch image as a base image and add your own functionality. That's the second use case for base images.
For your second question: you have to decide whether you need to replicate the steps in your base image or not. If you inherit from a general OS image like alpine, probably not, since Linux normally ships with these tools. If you inherit from a more specialized image, it depends: if your machine matches the environment in the container, you don't need to, but if not, you will have to apply these steps to your machine, too. E.g. if you don't have elasticsearch installed, you have to install it.
As for multiple FROMs in one Dockerfile: please look up the documentation for multi-stage builds. Essentially, they encapsulate multiple build stages in a single Dockerfile. This can be very useful if you need a different set of tools to build an app than to run it. The first stage is responsible for building your app, while the second one takes the compiled output and just runs it.
Watch for COPY --from= lines; these copy files from one stage (or image) to another.
The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image – it is especially easy to start by pulling an image from the Public Repositories.
FROM can appear multiple times within a single Dockerfile to create multiple images or use one build stage as a dependency for another. Simply make a note of the last image ID output by the commit before each new FROM instruction. Each FROM instruction clears any state created by previous instructions.
Optionally a name can be given to a new build stage by adding AS name to the FROM instruction. The name can be used in subsequent FROM and COPY --from= instructions to refer to the image built in this stage.
The tag or digest values are optional. If you omit either of them, the builder assumes a latest tag by default. The builder returns an error if it cannot find the tag value.
Taken from: https://docs.docker.com/engine/reference/builder/#from
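The name, tag, digest and stage parts described in the quoted documentation can be illustrated with a small parser for a single FROM line. This hypothetical parse_from helper is deliberately simplified: it does not expand ARG variables, normalize Docker Hub names (e.g. to docker.io/library/...), or validate the reference. It only shows that a colon after the last slash introduces a tag, a colon before it belongs to a registry port, and latest is assumed when neither a tag nor a digest is given.

```python
import re
from typing import NamedTuple, Optional

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<ref>\S+)(?:\s+AS\s+(?P<stage>\S+))?\s*$",
    re.IGNORECASE,
)


class FromLine(NamedTuple):
    name: str
    tag: Optional[str]
    digest: Optional[str]
    stage: Optional[str]  # the name given with "AS", if any


def parse_from(line: str) -> FromLine:
    m = FROM_RE.match(line)
    if m is None:
        raise ValueError(f"not a FROM line: {line!r}")
    ref = m["ref"]
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    # A colon after the last "/" starts a tag; one before it is a registry port.
    last_colon = ref.rfind(":")
    if last_colon > ref.rfind("/"):
        ref, tag = ref[:last_colon], ref[last_colon + 1:]
    if tag is None and digest is None:
        tag = "latest"  # default when both tag and digest are omitted
    return FromLine(ref, tag, digest, m["stage"])
```

For the question's example, parse_from("FROM composer/composer:1.1-alpine AS composer") returns FromLine(name='composer/composer', tag='1.1-alpine', digest=None, stage='composer'); the stage name only matters if a later FROM or COPY --from= refers to it.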

How can I edit an existing docker image metadata?

I would like to edit a docker images metadata for the following reasons:
I don't like a parent image's EXPOSE, VOLUME, etc. declarations (see #3465; the Docker team did not want to provide a solution), so I'd like to "un-volume" or "un-expose" the image.
I don't like an image's ContainerConfig (see docker inspect [image]) because it was generated from a running container using docker commit [container].
To fix errors during docker build or docker run like:
cannot mount volume over existing file, file exists [path]
Is there any way I can do that?
It's a bit hacky, but it works:
Save the image to a tar file (docker save writes an uncompressed tar):
$ docker save [image] > [targetfile.tar]
Extract the tar file to get access to the raw image data:
tar -xvf [targetfile.tar]
Look up the image metadata file in manifest.json: its Config key points to a file named [HEX].json, which should exist in the root of the extracted folder.
This is the file containing the image metadata. Edit it as you like.
Pack the extracted files back into a new tar archive, [new.tar], keeping them at the root of the archive as in the original.
Use cat [new.tar] | docker load to re-import the modified image.
Use docker inspect [image] to verify that your metadata changes have been applied.
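The lookup and edit steps can be scripted. The sketch below is a hypothetical helper, not docker-copyedit's code: it follows the Config entry in manifest.json to the image config and deletes keys such as Volumes and ExposedPorts from its config section, which is where VOLUME and EXPOSE end up. It assumes the legacy docker save layout; newer Docker versions that write an OCI-style archive may check config files against their digest, in which case renaming the file and updating the references would also be needed.

```python
import json
from pathlib import Path
from typing import Iterable


def strip_runtime_config(extracted_dir: Path,
                         keys: Iterable[str] = ("Volumes", "ExposedPorts")) -> Path:
    """Remove keys such as Volumes/ExposedPorts from the image config.

    Assumes extracted_dir holds an unpacked `docker save` archive whose
    manifest.json "Config" entry names the image config file (e.g. [HEX].json).
    Only the first image in the manifest is edited.
    """
    manifest = json.loads((extracted_dir / "manifest.json").read_text())
    config_path = extracted_dir / manifest[0]["Config"]
    image_config = json.loads(config_path.read_text())
    # Runtime defaults (EXPOSE, VOLUME, CMD, ENTRYPOINT, ...) live under "config".
    runtime = image_config.get("config") or {}
    for key in keys:
        runtime.pop(key, None)
    image_config["config"] = runtime
    # Assumption: docker load recomputes the image ID from the new contents and
    # does not insist that the filename still matches the file's digest.
    config_path.write_text(json.dumps(image_config))
    return config_path
```

After running it, repack the folder and load it as in the remaining steps, then check the result with docker inspect.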
EDIT:
This has been wrapped into a handy script: https://github.com/gdraheim/docker-copyedit
I came across the same workaround. Since I have to edit the metadata of some images quite often (fixing an automated image rebuild from a third party), I have created a little script to help with the save/unpack/edit/load steps.
Have a look at docker-copyedit. It can remove or override volumes as well as set other metadata values like entrypoint and cmd.

Why isn't Docker more transparent about what it's downloading?

When I download a Docker image, it downloads dependencies, but only displays their hashes. Why does it not display what it is downloading?
For example:
➜ ~ docker run ubuntu:16.04
Unable to find image 'ubuntu:16.04' locally
16.04: Pulling from library/ubuntu
b3e1c725a85f: Downloading 40.63 MB/50.22 MB
4daad8bdde31: Download complete
63fe8c0068a8: Download complete
4a70713c436f: Download complete
bd842a2105a8: Download complete
What's the point in only telling me that it's downloading b3e1c725a85f, etc.?
An image is built from layers of filesystems represented by hashes. After its creation, the base image tag may point to a completely different set of hashes without affecting any images built from it. These layers come from things like RUN commands; a tag such as ubuntu:16.04 is only attached after the image is made.
So the best that could be done is to say 4a70713c436f is based on adding some directory based on a hash of an input folder itself, or a multi-line run command, neither of which makes for a decent UI. The result may have no tagged name, or it could have multiple tagged names. So the simplest solution is to output what's universal and unchanging for all scenarios, an unchanging hash.
To rephrase that pictorially:
b3e1c725a85f: could be ubuntu:16.04, ubuntu:16, ubuntu:latest, some.other.registry:5000/ubuntu-mirror:16.04
4daad8bdde31: could be completely untagged, just a run command
63fe8c0068a8: could be completely untagged, just a copy file
4a70713c436f: could point to a tagged base image where that tag has since changed
bd842a2105a8: could be created with a docker commit command (eek)
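Naming layers by an unchanging hash can be sketched in a few lines: the ID is derived from the content itself, so any number of tags, or none, can refer to the same layer. The 12-character short_id below mimics the truncated digests shown in docker pull output; treat the exact truncation and what is hashed (here just arbitrary bytes) as an illustration rather than a description of Docker's internals.

```python
import hashlib
from typing import Dict


def short_id(blob: bytes) -> str:
    """First 12 hex characters of the SHA-256 digest of some content."""
    return hashlib.sha256(blob).hexdigest()[:12]


# Hypothetical layer contents and tag names, for illustration only.
layer_blob = b"pretend this is a compressed layer tarball"

# Several names can point at the same content...
tags: Dict[str, str] = {
    "ubuntu:16.04": short_id(layer_blob),
    "ubuntu:latest": short_id(layer_blob),
    "some.other.registry:5000/ubuntu-mirror:16.04": short_id(layer_blob),
}
# ...but the ID depends only on the bytes, so it is the same however the
# content is named, or whether it is named at all.
assert len(set(tags.values())) == 1
```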
