Modifying and rebuilding a Docker image


I'd like to make a change to a third-party Docker image (the official Shipyard image), and recompose a new image.
Will I have to export a TAR file, expand it into a directory, make the change, build a new TAR, and import that TAR, or is there a way to simply pour the contents of the image into a directory, and rebuild a new one, directly, when done?

You could either:
start from their Dockerfile, or
use FROM shipyard/shipyard as the first line of your own Dockerfile, so that your image builds on top of their published image.

Related

How can I load a Docker image created from the .tar file of an original Docker image?

I have a Docker image in .tar format. When I load it using sudo docker load < image.tar, it works fine.
I used tar -xf image.tar to un-archive the file. I then un-archived each layer file so I could edit some scripts and update some libraries manually. Once I was done with this, I used tar -cf on each layer and then the entire image.
When I load the modified image the same way I loaded the original, it does not work. I get this error:
open /var/lib/docker/tmp/docker-import-742628246/image-edited/json: no such file or directory
What could I have done wrong to cause this error and how can I properly load the modified .tar file into Docker?
P.S.: The problem appeared on Docker 20.10.12, running on Kali Linux 2021.4 inside VMWare Workstation Player.

I was modifying an image tarball and trying to load it back into the Docker daemon, and saw the same error. I tried this: run sha256sum on the layer.tar in each layer, search the config JSON file (the other JSON file at the same level as manifest.json, with a long SHA value as its filename) for the original hash, and replace it with the new sha256 value. After that, this error disappears.
But then another error like /var/lib/docker/tmp-xxxx/xxxxx cannot open file appears. So I think nothing should (or can) be modified in the layer tar, as we cannot bypass Docker's integrity checks on the image in any easy way.
I am planning to add another layer to the image using the Google Jib tool, copying in a script that makes the modification I want, so that the original layers stay intact.
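The hash-replacement step described above can be sketched in Python. This is only an illustration under assumptions: it expects the older `docker save` layout (a top-level manifest.json whose first entry has a "Config" file name and an ordered "Layers" list), and the function names are made up for this example. As noted in the answer, fixing the hashes may still not be enough for docker load to accept the archive.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_diff_ids(image_dir: Path) -> list[str]:
    """Rewrite rootfs.diff_ids in the config JSON to match the layer tars on disk.

    Assumption: image_dir is an extracted `docker save` archive in the older
    layout, with manifest.json pointing at the config file and the layers.
    """
    manifest = json.loads((image_dir / "manifest.json").read_text())
    entry = manifest[0]
    config_path = image_dir / entry["Config"]
    config = json.loads(config_path.read_text())

    # One digest per layer, in the same order as the manifest lists them.
    new_ids = ["sha256:" + sha256_of_file(image_dir / layer) for layer in entry["Layers"]]
    config["rootfs"]["diff_ids"] = new_ids
    config_path.write_text(json.dumps(config))
    return new_ids
```

Calling refresh_diff_ids(Path("image-edited")) after re-creating each layer.tar would bring the recorded digests back in line with the files; "image-edited" here is just the directory name from the error message in the question.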

How do I remove a directory inside container with JIB?

In a Dockerfile, I would remove it by executing the following command:
RUN rm /usr/bin/wget
How can I do the same with Jib? Any help is appreciated.

First thing to emphasize: in Dockerfile, RUN rm /usr/bin/wget doesn't physically remove the file. Files and directories in previous layers will physically stay there forever. So, if you are trying to remove a file with sensitive information using rm, it's not going to work. As an example, recently, this oversight has led to a high-profile security breach in Codecov.
Docker Layer Attacks: Publicly distributed Docker images should be either squashed or multistage such that intermediate layers that contain sensitive information are excluded from the final build.
What happens is, RUN rm /usr/bin/wget creates another layer that contains a "whiteout" file /usr/bin/.wh.wget, and this new layer sits on top of all previous layers. Then at runtime, it's just that container runtimes will hide the file and you will not see it. However, if you download the image and inspect each layer, you will be able to see and extract both /usr/bin/wget and /usr/bin/.wh.wget files. So, yes, doing rm later doesn't reduce the size of the image at all. (BTW, each RUN in Dockerfile creates a new layer at the end. So, for example, if you remove files within the same RUN like RUN touch /foo && rm /foo, you will not leave /foo in the final image.)
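To make the whiteout behaviour concrete, here is a small Python sketch that models layers as in-memory tar archives. It is a simplification under assumptions: the helper names are invented for this example, only plain `.wh.<name>` markers are handled (opaque-directory markers are ignored), and a real container runtime does considerably more.

```python
import io
import posixpath
import tarfile


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tar archive standing in for one image layer."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def layer_names(layer: bytes) -> list[str]:
    """List the entries physically stored in a single layer."""
    with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
        return tar.getnames()


def merged_view(layers: list[bytes]) -> set[str]:
    """Apply layers bottom to top, hiding paths named by `.wh.` markers."""
    visible: set[str] = set()
    for layer in layers:
        for name in layer_names(layer):
            directory, base = posixpath.split(name)
            if base.startswith(".wh."):
                hidden = posixpath.join(directory, base[len(".wh."):])
                # Hide the path itself and anything beneath it.
                visible = {p for p in visible
                           if p != hidden and not p.startswith(hidden + "/")}
            else:
                visible.add(name)
    return visible


if __name__ == "__main__":
    base = make_layer({"usr/bin/wget": b"binary", "usr/bin/curl": b"binary"})
    top = make_layer({"usr/bin/.wh.wget": b""})
    print(sorted(merged_view([base, top])))  # what a container would see
    print(layer_names(base))                 # what is still stored in the base layer
```

The merged view no longer lists usr/bin/wget, yet the base layer still stores it, which is why the rm neither shrinks the image nor protects a secret.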
Therefore, with Jib, if the file or directory you want to "delete" comes from a base image, what you can do is create a new whiteout file for it. Jib has the <extraDirectories> feature to copy arbitrary files into an image. So, for example, since <project root>/src/main/jib is the default extra directory, you can create an empty src/main/jib/usr/bin/.wh.wget, which will be copied into /usr/bin/.wh.wget in the image.
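Creating that marker file is a one-liner in a shell, but for a build script a small Python helper could look like the following. The function name and its parameters are hypothetical; only the src/main/jib default and the `.wh.` prefix come from the answer above.

```python
from pathlib import Path


def add_whiteout(project_root: Path, container_path: str,
                 extra_dir: str = "src/main/jib") -> Path:
    """Create an empty `.wh.<name>` marker under Jib's extra directory.

    container_path is the absolute path inside the image that should be hidden,
    for example "/usr/bin/wget".
    """
    target = Path(container_path.lstrip("/"))
    marker = project_root / extra_dir / target.parent / f".wh.{target.name}"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return marker
```

For instance, add_whiteout(Path("."), "/usr/bin/wget") creates src/main/jib/usr/bin/.wh.wget relative to the current directory.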
And of course, if you really want to physically remove the file that comes from the base image, the only option is to rebuild your base image so that it doesn't contain /usr/bin/wget.
For completeness: if the file or directory you want to remove is not from your base image but from Jib, you can use the Jib Layer-Filter extension (Maven/Gradle). (This is app-layer filtering and doesn't involve whiteout files.) However, normally there will be no reason to remove files put by Jib.

Docker image layer: What does `ADD file:<some_hash> in /` mean?

Docker Hub shows, for each image, a list of the commands that were run for each image layer. Here is a golang example.
Some applications also provide their Dockerfile in GitHub. Here is a golang example.
According to the Docker Hub image layer, ADD file:4b03b5f551e3fbdf47ec609712007327828f7530cc3455c43bbcdcaf449a75a9 in / is the first command. The image layer list doesn't include any "FROM" command, and this line doesn't seem to match the documented ADD syntax either.
So here are the questions:
What does ADD file:<HASH> in / mean? What is this format?
Is there any way I could trace upwards using the hash? I suppose that the hash represents the FROM image, but there seems to be no API for that.
Why is it not possible to build a Dockerfile using the ADD file:<HASH> in / syntax? Is there any way I could build an image using such syntax, or convert between the two formats?

That Docker Hub history view doesn't show the actual Dockerfile; instead, it shows content essentially extracted from the docker history of the image. That doesn't preserve the specific details you're looking for: it doesn't remember the names of base images, or the build-context file names of things that get ADDed or COPYed in.
Chasing through GitHub and Docker Hub links, the golang:*-buster Dockerfile is built FROM buildpack-deps:...-scm; buildpack-deps:buster-scm is FROM buildpack-deps:buster-curl; that is FROM debian:buster; and that has a very simple Dockerfile (quoted here in its entirety):
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
FROM scratch starts from a completely empty image; that is the base of the Docker image tree (and what tells docker history and similar tools to stop). The ADD line unpacks a tar file of a Debian system image.
If you look at docker history or the Docker Hub history view you cite, you should be able to see these same steps happening. The ADD file:4b0... in / corresponds to the ADD rootfs.tar.xz /, and the second line is the CMD ["bash"]. It is not split up by Dockerfile or image, and the original filenames from ADD aren't saved. (You couldn't reproduce the image anyway without the contents of the rootfs.tar.xz, so knowing its filename is slightly helpful but not essential.)
The ADD file:hash in /path syntax is not standard Dockerfile syntax (the word "in", in particular, is not part of it). I'm not sure there's a reliable way to translate from the host file or URL to the hash, but building the image and looking at its docker history would tell you (assuming you've got a perfect match for the file metadata). There's no way to get back to the original filename or syntax, and definitely no way to get back to the file contents.

ADD or COPY means that files are added to the image.
These are files, so you cannot "trace" them the way you would trace a base image.
You cannot just copy the commands, because the hashes are not the original files. See https://forums.docker.com/t/how-to-extract-file-from-image/96987 to get the file.

How can I edit an existing docker image metadata?

I would like to edit a Docker image's metadata for the following reasons:
I don't like a parent image's EXPOSE, VOLUME, etc. declarations (see #3465; the Docker team did not want to provide a solution), so I'd like to "un-volume" or "un-expose" the image.
I don't like an image's ContainerConfig (see docker inspect [image]) because it was generated from a running container using docker commit [container].
I want to fix errors during docker build or docker run like:
cannot mount volume over existing file, file exists [path]
Is there any way I can do that?

It's a bit hacky, but it works:
Save the image to a tar file (docker save writes an uncompressed tar archive):
$ docker save [image] > [targetfile.tar]
Extract the tar file to get access to the raw image data:
$ tar -xvf [targetfile.tar]
Look up the image metadata file in manifest.json: there should be a key like .Config whose value contains a [HEX] number. There should be a matching [HEX].json in the root of the extracted folder.
This is the file containing the image metadata. Edit as you like.
Pack the extracted files back into a new tar archive, [new.tar].
Use cat [new.tar] | docker load to re-import the modified image
Use docker inspect [image] to verify your metadata changes have been applied
EDIT:
This has been wrapped into a handy script: https://github.com/gdraheim/docker-copyedit
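The "edit as you like" step can also be scripted. The sketch below is not how docker-copyedit works internally; it is a minimal illustration that assumes the extracted layout described above (manifest.json with a "Config" entry) and that the declarations live under the "config" and "container_config" sections as "Volumes" and "ExposedPorts". The function name is made up for this example.

```python
import json
from pathlib import Path


def strip_volumes_and_ports(image_dir: Path) -> dict:
    """Remove VOLUME and EXPOSE declarations from an extracted image's metadata."""
    manifest = json.loads((image_dir / "manifest.json").read_text())
    config_path = image_dir / manifest[0]["Config"]
    metadata = json.loads(config_path.read_text())

    # Both sections may carry the declarations; either may be absent or null.
    for section in ("config", "container_config"):
        settings = metadata.get(section) or {}
        settings.pop("Volumes", None)
        settings.pop("ExposedPorts", None)

    config_path.write_text(json.dumps(metadata))
    return metadata
```

After running it on the extracted folder, repack and docker load as in the steps above, then check the result with docker inspect.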

I came across the same workaround. Since I have to edit the metadata of some images quite often (fixing an automated image rebuild from a third party), I have created a little script to help with the save/unpack/edit/load steps.
Have a look at docker-copyedit. It can remove or override volumes as well as set other metadata values like entrypoint and cmd.

What is the difference between save and export in Docker?

I am playing around with Docker for a couple of days and I already made some images (which was really fun!). Now I want to persist my work and came to the save and export commands, but I don't fully understand them.
What is the difference between save and export in Docker?

The short answer is:
save will fetch an image: for a VM or a physical server, that would be the installation .ISO image or disk, that is, the base operating system.
It will pack the layers and metadata of the whole chain required to build the image. You can then load this "saved" image chain into another Docker instance and create containers from these images.
export will fetch the whole container: like a snapshot of a regular VM. It saves the OS of course, but also any change you made and any data file written during the container's life. This one is more like a traditional backup.
It will give you a flat .tar archive containing the filesystem of your container.
Edit: as my explanation may still lead to confusion, I think that it is important to understand that one of these commands works with containers, while the other works with images.
An image has to be considered as 'dead' or immutable; starting 0 or 1000 containers from it won't alter a single byte. That's why I made a comparison with a system install ISO earlier. It's maybe even closer to a live-CD.
A container "boots" the image and adds an additional layer on top of it. This layer stores any change on the container (created/changed/removed files...).

There are two main differences between save and export commands.
The save command saves the whole image with history and metadata, but the export command exports only the file structure (without history and metadata). So the exported tar file will be smaller than the saved one.
When you use an exported file system to create a new image, this new image will not contain any USER, EXPOSE, RUN, etc. commands from your Dockerfile. Only the file structure will be transferred.
So if you use those keywords in your Dockerfile, you cannot use the export command to transfer the image to another machine; you always need to use the save command.

export: container (filesystem)->image tar.
import: exported image tar-> image. Only one layer.
save: image-> image tar.
load: saved image tar->image. All layers will be recovered.
From Docker in Action, Second Edition p190.
Layered images maintain the history of the image, container-creation metadata, and old files that might have been deleted or overridden.
Flattened images contain only the current set of files on the filesystem.
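One practical consequence of the mapping above is that the two kinds of tarball look different inside, so you can usually tell which command produced a given file. The helper below is a heuristic sketch, not an official check: it assumes a `docker save` archive carries a top-level manifest.json, while a `docker export` archive is just the container's root filesystem. The function name is hypothetical.

```python
import tarfile


def archive_kind(tar_path: str) -> str:
    """Guess whether a tar came from `docker save` or `docker export`.

    Heuristic: a saved image has manifest.json at the top level; an exported
    container is a flat filesystem (bin/, etc/, ...) with no such file.
    """
    with tarfile.open(tar_path) as tar:
        names = {name.removeprefix("./") for name in tar.getnames()}
    return "save" if "manifest.json" in names else "export"
```

A result of "save" means the file goes back in with docker load; "export" means docker import, which yields a single-layer image.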

The exported image will not have any layer or history information saved, so it will be smaller and you will not be able to rollback.
The saved image will have layer and history information, so larger.
If you are giving this to a customer, the question is: do you want to keep those layers or not?

Technically, save/load works with repositories which can be one or more of images, also referred to as layers. An image is a single layer within a repo. Finally, a container is an instantiated image (running or not).

docker save produces a tar archive of a repository, containing all parent layers and all tags and versions (or only the specified repo:tag) for each image argument provided.
docker export produces the specified file (can be tar or tgz) with the flat filesystem contents of a container, without the contents of its volumes.
docker save must be used on an image, while docker export must be used on a container (that is, an instantiated image).
Save usage:
docker save [OPTIONS] IMAGE [IMAGE...]
Save an image(s) to a tar archive (streamed to STDOUT by default)
--help=false          Print usage
-o, --output=""       Write to a file, instead of STDOUT
Export usage:
docker export [OPTIONS] CONTAINER
Export the contents of a container's filesystem as a tar archive
--help=false          Print usage
-o, --output=""       Write to a file, instead of STDOUT
