How to document a docker image

I have a docker image that receives a set of environment variables to customize its execution.
A simple example would be a web-server, that has stuff like client secret for OAuth2, a secret to sign cookies, etc.
The whole app is containerized on a docker image, that receives (runtime) environment variables.
I distribute that docker image on a private registry, and I would like to document that image, so that users can understand how they can customize the image.
Is it possible to ship annotations as part of the docker image, so that e.g. a (hypothetical) command like docker describe my_image prints markdown to stdout?
I could of course use a static page on the web for documentation, but the user would still need to know where that documentation could be found, and the whole distribution would be more complex this way (e.g. documentation changes with image tag).
Any ideas?

There is no silver bullet here as far as I know. All solutions below work, but require the user to be informed of how to retrieve the documentation.
There is no standard way of doing it.
The Open Container Initiative (OCI) image spec defines annotations suggesting that:
A link to more information about the image should be provided in a label called org.opencontainers.image.documentation.
A description of the software packaged inside the container should be provided in a label called org.opencontainers.image.description.
According to OCI, one of the variations of option 1 below is correct.
Option 1: Providing a link in a label (Preferred by OCI)
Assuming the Dockerfile and related assets are version controlled in a git repository that is publicly accessible (for example on GitHub), that git repository could also contain a README.md file. If you have a pipeline hooked up to the repo that builds and publishes the Docker image to a registry automatically, you could set up the docker build command to add a label with a link to the documentation as follows:
# Get the current commit id
commit=$(git rev-parse HEAD)
# Build docker image and attach a link to the Readme as a label
# (the trailing "." is the build context, here the current directory)
docker build -t myimagename:myversion \
  --label "org.opencontainers.image.documentation=https://github.com/<user>/<repo>/blob/$commit/README.md" \
  .
This solution links to the documentation at that specific commit, versioned alongside your Dockerfile. It does, however, require the user to have internet access to be able to read the documentation.
Option 1b: Providing full documentation in a label (Preferred by OCI)
A variation of option 1 where the full documentation is serialized and put into the label (there are no length restrictions on labels). This way the documentation is bundled with the image itself.
As Jorge Leitao pointed out in the comments, the image annotation spec from OCI specifies the name of such a label as org.opencontainers.image.description.
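Whichever variation of option 1 is used, the user reads the label back with docker inspect. A minimal sketch, assuming a POSIX shell and an image that is already available locally; show_image_label and show_image_docs are hypothetical helper names, not Docker commands:

```shell
# Print a single label of a local image (a blank line if the label is unset).
# Usage: show_image_label IMAGE LABEL
show_image_label() {
  docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Print the two OCI documentation-related labels discussed above.
# Usage: show_image_docs IMAGE
show_image_docs() {
  printf 'documentation: %s\n' "$(show_image_label "$1" org.opencontainers.image.documentation)"
  printf 'description:\n%s\n' "$(show_image_label "$1" org.opencontainers.image.description)"
}
```

For example, show_image_docs myimagename:myversion would print the link from option 1 and the embedded text from option 1b, if the image carries them.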
Option 2: Bundling documentation inside image
If you prefer to actually bundle the Readme.md file inside the image to make it independent of any external web page, consider the following:
Upon build, make sure to copy the Readme.md file to the docker image.
Also create a simple shell script, describe, that cats the Readme.md.
describe
#!/usr/bin/env sh
cat /docs/Readme.md
Dockerfile additions
...
COPY Readme.md /docs/Readme.md
COPY describe /opt/bin/describe
RUN chmod +x /opt/bin/describe
ENV PATH="/opt/bin:${PATH}"
...
A user who has your Docker image can now run the following command to have the markdown sent to stdout:
docker run myimage:version describe
This solution bundles the documentation for this particular version of the image inside the image, and it can be retrieved without any external dependencies.

Related

Docker image layer: What does `ADD file:<some_hash> in /` mean?

On Docker Hub, image pages list the commands that were run for each image layer. Here is a golang example.
Some applications also provide their Dockerfile on GitHub. Here is a golang example.
According to the Docker Hub layer view, ADD file:4b03b5f551e3fbdf47ec609712007327828f7530cc3455c43bbcdcaf449a75a9 in / is the first command. The layer list doesn't include any "FROM" command, and this line doesn't seem to match the documented ADD syntax either.
So here are the questions:
What does ADD file:<HASH> in / mean? What is this format?
Is there any way I could trace upwards using the hash? I suppose that hash represents the FROM image, but there seems to be no API for that.
Why is it not possible to build a Dockerfile using the ADD file:<HASH> in / syntax? Is there any way I could build an image using such syntax, or convert between the two formats?

That Docker Hub history view doesn't show the actual Dockerfile; instead, it shows content essentially extracted from the docker history of the image. That doesn't preserve the specific details you're looking for: it doesn't remember the names of base images, or the build-context file names of things that get ADDed or COPYed in.
Chasing through GitHub and Docker Hub links, the golang:*-buster Dockerfile is built FROM buildpack-deps:...-scm; buildpack-deps:buster-scm is FROM buildpack-deps:buster-curl; that is FROM debian:buster; and that has a very simple Dockerfile (quoted here in its entirety):
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
FROM scratch starts from a completely empty image; that is the base of the Docker image tree (and what tells docker history and similar tools to stop). The ADD line unpacks a tar file of a Debian system image.
If you look at docker history or the Docker Hub history view you cite, you should be able to see these same steps happening. The ADD file:4b0... in / corresponds to the ADD rootfs.tar.xz /, and the second line is the CMD ["bash"]. It is not split up by Dockerfile or image, and the original filenames from ADD aren't saved. (You couldn't reproduce the image anyway without the contents of rootfs.tar.xz, so knowing its filename is slightly helpful but not essential.)
The ADD file:hash in /path syntax is not standard Dockerfile syntax (the word "in" in particular is not part of it). I'm not sure there's a reliable way to translate from the host file or URL to the hash, but building the image and looking at its docker history would tell you (assuming you've got a perfect match for the file metadata). There's no way to get back to the original filename or syntax, and definitely no way to get back to the file contents.
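To see locally the same per-layer information that the Docker Hub view shows, docker history can be asked for the full, untruncated command of each layer. A minimal sketch; layer_commands is a hypothetical helper name, and the image is assumed to be pulled already:

```shell
# Print the command recorded for each layer of a local image, newest first.
# --no-trunc keeps the full "ADD file:<hash> in /" text instead of shortening it.
# Usage: layer_commands IMAGE
layer_commands() {
  docker history --no-trunc --format '{{.CreatedBy}}' "$1"
}
```

Because the newest layer is listed first, the base image's steps end up at the bottom of the output. The exact text of each entry depends on how the image was built.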

ADD or COPY means that files are added to the image.
They are just files, so you cannot "trace" them back to anything.
You cannot simply copy the commands either, because the hashes are not the original files. See https://forums.docker.com/t/how-to-extract-file-from-image/96987 for how to get a file out of an image.

Deploying cgal docker

I'm trying to deploy the official CGAL docker. From reading the README I understand that I first download the specific image (e.g. I want to open a docker with ubuntu16+CGAL and all of its dependencies) using the following command:
docker pull cgal/testsuite-docker:ubuntu # "ubuntu" is the tag here; replace it to get a different image
Then I need to install the CGAL library itself using the following script:
./test_cgal.py --user **** --passwd **** --images cgal-testsuite/ubuntu
The thing is that eventually I want to start the docker with an interactive shell, i.e.
docker run --rm -it -v "$(pwd)":/source somedocker
And I couldn't work out where the generated image is after the CGAL installation script has run.

Those images are not for running CGAL. They are only images we use to define an environment for our testsuite, and run tests in it, including compiling CGAL.
test_cgal.py will download the integration branch, which is rarely working as it is the branch in which we merge our PR to test them nightly. Don't use this to get a working CGAL. To my knowledge, there is no such image as the one you are looking for. No official one anyways.
Furthermore, installing CGAL at runtime in this image will not modify the image; once you close the container, your installation will be lost. You need to specify how to install CGAL in the Dockerfile of your image and then build it if you want a "ready to use" image.
You can use the Dockerfile of the image you found to write your own, as all the dependencies should be specified in it, but you need to edit it to download CGAL and maybe build it if you don't want the header-only version. This is not done in test_cgal.py or anywhere in this docker repository.
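As a rough sketch of that last suggestion, the script below writes a small Dockerfile that installs CGAL at build time from the distribution's packages. Everything in it is an assumption and not taken from the CGAL repository: the ubuntu:16.04 base, the libcgal-dev package, the Dockerfile.cgal file name, and the helper name write_cgal_dockerfile.

```shell
# Write a hypothetical Dockerfile that bakes CGAL into the image at build time,
# so the installation is still there after a container exits.
# Usage: write_cgal_dockerfile [OUTPUT_FILE]
write_cgal_dockerfile() {
  cat > "${1:-Dockerfile.cgal}" <<'EOF'
FROM ubuntu:16.04
# Assumption: the distribution's CGAL package is recent enough for your needs.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential cmake libcgal-dev \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /source
CMD ["bash"]
EOF
}
```

After calling write_cgal_dockerfile, something like docker build -f Dockerfile.cgal -t my-cgal . followed by docker run --rm -it -v "$(pwd)":/source my-cgal should give the interactive shell asked about, with my-cgal being an arbitrary tag.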

What does "From image" do in dockerfiles

I see that Dockerfiles usually have a line beginning with the "FROM" keyword, for example:
FROM composer/composer:1.1-alpine AS composer
As far as I know, Dockerfiles are a set of commands that help to build and run many containers in Docker.
In the example above, Docker uses an image named composer/composer:1.1-alpine from Docker Hub. The AS composer part just creates an alias, so we can refer to it more conveniently.
When I looked up the image, I found a couple of pages describing it.
What I don't really understand is this:
I guess Docker will use the image to build something, but how exactly does it use the image? Does Docker run the image, or just prepare it for use when needed? Sometimes I don't see the Dockerfile use the image in the following lines (as in this example, where no line other than the first uses the keyword "composer"). It confuses me.
Any help would be appreciated.
Thanks.

Dockerfiles describe layers: each instruction that changes the filesystem creates its own layer. For example:
RUN touch test.txt
RUN cp test.txt foo.txt
would create two layers: the first one adds test.txt, and the second one records only what that step changed, i.e. the new foo.txt (test.txt is still present in the final image, because layers are stacked on top of each other).
Each layer adds something to a container. When we walk the layers "up", we find that the very first layer is empty (FROM scratch). Containers use the kernel of the host, so not even a kernel is part of the image. But an empty image is not really useful: we need a lot of tools (e.g. a shell) to be able to run an app. So common base images like alpine add such tools and core OS functions.
It would be annoying as hell if we had to do this setup in every container, so there are lots of base images which do exactly this kind of setup.
If you want to see what a base image does, just search the name on hub.docker.com - there you will find the Dockerfile, describing the build process.
Additionally, containers can be extended: e.g. you use the elasticsearch image as a base image and add your own functionality. That's the second use case for base images.
For your second question: You have to decide if you have to replicate the steps in your base image or not. If you inherit from a general OS image like alpine, probably not, since Linux normally ships with these tools. If you inherit from a more specialized container, it depends: if your machine matches the environment in the container, you don't need to, but if not you will have to apply these steps to your machine, too. E.g. if you don't have elasticsearch installed, you have to install it.
As for multiple FROMs in one Dockerfile: please look up the documentation for multi-stage builds. Essentially, they encapsulate multiple images in a single Dockerfile, which can be very useful if you need one set of tools to build an app and a different one to run it. The first stage is responsible for building your app, while the second one takes the compiled output and just runs it.
Watch for COPY --from= lines: these copy files from one stage to another.
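A minimal sketch of such a multi-stage build, written out by a small script. The golang:1.21 and alpine:3.19 images, the stage name build, the file name Dockerfile.multistage, the paths, and the helper name are all illustrative assumptions. It also assumes the build context holds a Go module with a main package.

```shell
# Write a hypothetical two-stage Dockerfile: compile in one stage, run in another.
# Usage: write_multistage_dockerfile [OUTPUT_FILE]
write_multistage_dockerfile() {
  cat > "${1:-Dockerfile.multistage}" <<'EOF'
# Stage 1: full toolchain, used only to compile.
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: small runtime image; only the compiled binary is copied over.
FROM alpine:3.19
COPY --from=build /out/app /usr/local/bin/app
CMD ["app"]
EOF
}
```

The name given with AS only matters where a later FROM or COPY --from= refers to it. A stage that nothing refers to contributes nothing to the final image, which is why a Dockerfile can name a stage and then appear never to use it.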

The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. As such, a valid Dockerfile must start with a FROM instruction. The image can be any valid image – it is especially easy to start by pulling an image from the Public Repositories.
FROM can appear multiple times within a single Dockerfile to create multiple images or use one build stage as a dependency for another. Simply make a note of the last image ID output by the commit before each new FROM instruction. Each FROM instruction clears any state created by previous instructions.
Optionally a name can be given to a new build stage by adding AS name to the FROM instruction. The name can be used in subsequent FROM and COPY --from= instructions to refer to the image built in this stage.
The tag or digest values are optional. If you omit either of them, the builder assumes a latest tag by default. The builder returns an error if it cannot find the tag value.
Taken from : https://docs.docker.com/engine/reference/builder/#from

How to specify different .dockerignore files for different builds in the same project?

I used to list the tests directory in .dockerignore so that it wouldn't get included in the image, which I used to run a web service.
Now I'm trying to use Docker to run my unit tests, and in this case I want the tests directory included.
I've checked docker build -h and found no option related.
How can I do this?

Docker 19.03 shipped a solution for this.
The Docker client tries to load <dockerfile-name>.dockerignore first and then falls back to .dockerignore if it can't be found. So docker build -f Dockerfile.foo . first tries to load Dockerfile.foo.dockerignore.
Setting the DOCKER_BUILDKIT=1 environment variable is currently required to use this feature. This flag can be used with docker compose since 1.25.0-rc3 by also specifying COMPOSE_DOCKER_CLI_BUILD=1.
See also comment0, comment1, comment2
From Mugen's comment, please note:
the custom dockerignore should be in the same directory as the Dockerfile and not in the root context directory like the original .dockerignore
i.e. when calling
DOCKER_BUILDKIT=1 docker build -f /path/to/custom.Dockerfile ...
your .dockerignore file should be at
/path/to/custom.Dockerfile.dockerignore
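To check which ignore file a given build would be expected to pick up, the lookup order described above can be mirrored in a few lines of shell. This is only a sketch of that rule, not Docker's own code, and resolve_dockerignore is a hypothetical helper name:

```shell
# Print the ignore file expected to apply, following the order described above:
# <dockerfile>.dockerignore first, then .dockerignore in the context root.
# Prints nothing if neither file exists.
# Usage: resolve_dockerignore DOCKERFILE CONTEXT_DIR
resolve_dockerignore() {
  dockerfile="$1"
  context="$2"
  if [ -f "${dockerfile}.dockerignore" ]; then
    printf '%s\n' "${dockerfile}.dockerignore"
  elif [ -f "${context}/.dockerignore" ]; then
    printf '%s\n' "${context}/.dockerignore"
  fi
}
```

For example, resolve_dockerignore /path/to/custom.Dockerfile . prints /path/to/custom.Dockerfile.dockerignore when that file exists, and ./.dockerignore otherwise (if present).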

At the moment, there is no way to do this. There is a lengthy discussion about adding an --ignore flag to Docker to provide the ignore file to use - please see here.
The options you have at the moment are mostly ugly:
Split your project into subdirectories that each have their own Dockerfile and .dockerignore, which might not work in your case.
Create a script that copies the relevant files into a temporary directory and run the Docker build there.

Adding the cleaned tests as a volume mount to the container could be an option here. After you build the image, if running it for testing, mount the source code containing the tests on top of the cleaned up code.
services:
  tests:
    image: my-clean-image
    volumes:
      - '../app:/opt/app' # Add removed tests

I've tried activating DOCKER_BUILDKIT as suggested by @thisismydesign, but I ran into other problems (outside the scope of this question).
As an alternative, I'm creating an intermediary tar using the -T flag, which takes a text file listing the files to be included in the tar, so it's not so different from a whitelist .dockerignore.
I export this tar and pipe it to the docker build command, and specify my Dockerfile, which can live anywhere in my file hierarchy. In the end it looks like this:
tar -czh -T files-to-include.txt | docker build -f path/to/Dockerfile -
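Before piping the archive into docker build, it can help to check what the whitelist actually puts into it. A minimal sketch; make_build_context is a hypothetical helper name, and it adds an explicit -f - so that tar writes to stdout even where that is not the default:

```shell
# Write a gzip-compressed tar of the files named in a list (one path per line)
# to stdout. -h follows symlinks; "-f -" makes the stdout target explicit.
# Usage: make_build_context LIST_FILE
make_build_context() {
  tar -czh -f - -T "$1"
}
```

Running make_build_context files-to-include.txt | tar -tzf - lists the archive's contents. Replacing the second half of the pipe with docker build -f path/to/Dockerfile - performs the build as above.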

Another option is to have a further build process that includes the tests. The way I do it is this:
If the tests are unit tests then I create a new Docker image that is derived from the main project image; I just stick a FROM at the top, and then ADD the tests, plus any required tools (in my case, mocha, chai and so on). This new 'testing' image now contains both the tests and the original source to be tested. It can then simply be run as is or it can be run in 'watch mode' with volumes mapped to your source and test directories on the host.
If the tests are integration tests--for example the primary image might be a GraphQL server--then the image I create is self-contained, i.e., is not derived from the primary image (it still contains the tests and tools, of course). My tests use environment variables to tell them where to find the endpoint that needs testing, and it's easy enough to get Docker Compose to bring up both a container using the primary image, and another container using the integration testing image, and set the environment variables so that the test suite knows what to test.

Sadly it isn't currently possible to point to a specific file to use for .dockerignore, so we generate it in our build script based on the target/platform/image. As a docker enthusiast it's a sad and embarrassing workaround.

When pushing a docker image to a private docker registry, having trouble marking it 'public' via my script (but can do via web UI)

I am pushing a docker image to a private docker registry, and am having trouble marking it 'public' via a script.
For this discussion, I'm guessing the content of the Dockerfile doesn't matter, so let's assume I have the following in my current working directory:
Dockerfile
FROM ubuntu
RUN touch /tmp/foo
I build like this:
docker build -t my.private.docker.registry.com/foo/jdk1.8.on.ubuntu14.04 .
Then, I am doing my push like this:
docker push my.private.docker.registry.com/foo/jdk1.8.on.ubuntu14.04
Next, I navigate to the website that allows me to manage my private registry (at the URL http://my.private.docker.registry.com).
I look at my image, and I see it has a padlock icon next to it, indicating that it is private. I can manually unlock it from the web UI, but I'd like to know if there are any options to docker's push command that will allow me to mark the image as 'public' without manual intervention.
One thing I tried was setting global settings for my namespace such that all new repos would be readable/writable by all users. Specifically: I went into the Docker web UI for my private registry and, for the namespace 'foo', I tried adding default permissions (for any newly created repos) such that all users will have 'write' access to any new repo pushed under the 'foo' namespace.
However, even after doing the above, when I pushed a new image to my private registry under namespace foo, that image was still marked with the padlock. I looked up the command line options for 'docker push', and I did not see any option that looked like it would affect the visibility of the image at the time of push.
thanks in advance for your help !
-chris

So, according to the folks who manage the Docker registry at the company I'm at now: there is no command line way to enable permissions for users other than the repository creator to have write access to that repo. You have to go to the web UI and manually mark the repo 'public', and you have to add permissions for each user (although it is possible to have groups of users, and then add a whole group -- this still is clunky because new employees have to be manually added to the group).
I find it hard to believe that there's no command line way.. But this is what our experts say.. If there are other experts out there who have a better idea, please chime in ! Otherwise I will do it manually through the web UI (grrrrRRrr).
