Docker save only non-public layers - docker

I can export images with
docker save -o <save image to path> <image name>
but this packs all layers, so the file is big.
Is there a way to pack only the layers that are not publicly available, so that only the difference to the last public layer is exported?

You can try undocker. The tool can extract all or part of the layers of a Docker image onto the local filesystem. You can extract one or more specific layers:
$ docker save busybox |
undocker -vi -o busybox -l ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2
INFO:undocker:extracting image busybox (4986bf8c15363d1c5d15512d5266f8777bfba4974ac56e3270e7760f6f0a8125)
INFO:undocker:extracting layer ea13149945cb6b1e746bf28032f02e9b5a793523481a0a18645fc77ad53c4ea2
Of course, it doesn't automatically sort out which layers are publicly available, but it is something you can start with; see the tool's intro article by the original author.
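If you want to work out which layers those are, one rough approach (my own sketch, not part of undocker; it assumes jq is installed, that the public base image is also pulled locally, and uses placeholder image names such as myimage and ubuntu:20.04) is to diff the layer lists of your image and its public base:
myimage_layers=$(docker image inspect myimage | jq -r '.[].RootFS.Layers[]' | sort)
base_layers=$(docker image inspect ubuntu:20.04 | jq -r '.[].RootFS.Layers[]' | sort)
# layer digests that exist only in myimage, i.e. the non-public part
comm -23 <(echo "$myimage_layers") <(echo "$base_layers")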

The docker-save-last-layer command line utility combined with docker build --squash is made to accomplish exactly this.
It exports only the last layer of the specified docker image.
It works by using a patched version of the docker daemon inside a docker image that can access the images on your host machine. So it doesn't require doing a full docker save before using it like the undocker answer. This makes it much more performant for large base images.
Typical usage is simple and looks like:
pip install d-save-last
docker build -t myimage --squash .
d-save-last myimage -o ./myimage.tar

Related

Make Dockerfile from local image?

I have a Dockerfile which builds an image.
FROM public.ecr.aws/lambda/python:3.9
RUN export LANG=en_US.UTF-8
RUN export PYTHONUNBUFFERED=1
docker build -f dockers/mydocker -t python .
Then I would like to build further images from this image.
The image named python is listed in docker images.
Then, in another Dockerfile:
FROM python
ADD ./Pipfile* ./
RUN pipenv install --ignore-pipfile
When I try to build this Dockerfile, the output shows a line like this:
FROM docker.io/library/python
Does this mean I can use a local image to build the next image?
Do I need to set up a local repository for this purpose?
Or is there another way to do this?
This is probably working fine, but you should be careful to pick names that don't conflict with standard Docker Hub image names.
A Docker image name has the form registry.example.com/path/name. If you don't explicitly specify a registry, it always defaults to docker.io (this can't be changed), and if you don't specify a path either, it defaults to docker.io/library/name. This is true in all contexts – the docker build -t option, the docker run image name, Dockerfile FROM lines, and anywhere else an image name appears.
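For example, these commands all refer to the same fully-expanded image name (illustrative only):
docker pull python                           # expands to docker.io/library/python:latest
docker pull library/python:latest            # same image
docker pull docker.io/library/python:latest  # fully qualified form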
That means that, when you run docker build -t python, you're creating a local image that has the same name as the Docker Hub python image. The docker build diagnostics are showing you the expanded name. It should actually be based on your local image, though; Docker won't contact a remote registry unless the image is missing locally or you explicitly tell it to.
I'd recommend choosing some unambiguous name here. You don't specifically need a Docker Hub account, but try to avoid bare names that will conflict with standard images.
# this will work even if you don't "own" this Docker Hub name
docker build -f Dockerfile.base -t whitebear/python .
FROM whitebear/python
...
(You may have some trouble seeing the effects of your base image since RUN export doesn't do anything; change those lines in the base image to ENV instead.)
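A sketch of what the base Dockerfile from the question could look like with that change (same base image and variables as in the question):
FROM public.ecr.aws/lambda/python:3.9
ENV LANG=en_US.UTF-8
ENV PYTHONUNBUFFERED=1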

How to find out the base image for a docker image

I have a Docker image and I would like to find out from which image it has been created. Of course there are multiple layers, but I'd like to find the last base image (the FROM statement in the Dockerfile for this image).
I tried to use docker image history and docker image inspect, but I can't find this information in there.
I tried to use the following command but it gives me an error message
alias dfimage="sudo docker run -v /var/run/docker.sock:/var/run/docker.sock --rm xyz/mm:9e945ff"
dfimage febae8978318
This is the error message I'm getting
container_linux.go:235: starting container process caused "exec: \"febae8978318\": executable file not found in $PATH"
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"febae8978318\": executable file not found in $PATH".
An easy way is to use
docker image history deno
The command above will give you output like this:
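(Schematic shape only; apart from a24bb4013296, the IDs, dates, and sizes are elided here.)
IMAGE          CREATED   CREATED BY   SIZE
<your layers>  ...       ...          ...
a24bb4013296   ...       ...          ...   <- take this ID
<missing>      ...       ...          ...
<missing>      ...       ...          ...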
Then look at the IMAGE column and take the image ID that sits just above the first <missing> entry -- a24bb4013296 in this case.
Then run the following.
For Linux
docker image ls | grep a24bb4013296
For Windows
docker image ls | findstr a24bb4013296
This will give you the base image name (provided that image is present and tagged on your machine).
The information doesn't really exist, exactly. An image will contain the layers of its parent(s) but there's no easy way to reverse layer digests back to a FROM statement, unless you happen to have (or are able to figure out) the image that contains those layers.
If you have the parent image(s) on-hand (or can find them), you can infer which image(s) your image used for its FROM statement (or ancestry) by cross-referencing the layers.
Theoretical example
Suppose your image, FOO, contains the layers 1 2 3 4 5 6. If you have another image, BAR, on your system containing layers 1 2 3, you could infer that image BAR is an ancestor of image FOO -- i.e. that FROM BAR would have been used at some point in its hierarchy.
Suppose further that you have another image, BAZ which contains the layers 1 2 3 4 5. You could infer that image BAZ has image BAR in its ancestry and that image FOO inherits from image BAZ (and therefore indirectly from BAR).
From this information, you could infer that the Dockerfiles for these images might have looked something like this:
# Dockerfile of image BAR
FROM scratch
# layers 1 2 and 3
COPY ./one /
COPY ./two /
COPY ./three /
# Dockerfile of Image BAZ
FROM BAR
RUN echo "this makes layer 4" > /four
RUN echo "this makes layer 5" > /five
# Dockerfile of image FOO
FROM BAZ
RUN echo "this makes layer 6" > /six
You could get the exact commands by looking at docker image history for each image.
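For instance, with the hypothetical image names above:
docker image history --no-trunc BAZ   # prints the full command that created each layer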
One important thing to keep in mind here, however, is that docker tags are mutable; maintainers make new images and move the tags to those images. So if you built an image with FROM python:3.8.1 today, it won't contain the same layers as if you had built an image with that same FROM line a few weeks ago. You'll need the SHA256 digest to be sure you're using the exact same image.
Practical Example, local images
Now that we understand the theory behind identifying images and their bases, let's put it to practice with a real-world example.
Note: because the tags I use will change over time (see above RE: tag mutability), I'll be using the SHA256 digest to pull the images in this example so it can be reproduced by viewers of this answer.
Let's say we have a particular image and we want to find its base(s). We'll use the official maven image here.
First, we'll take a look at its layers.
# maven:3.6-jdk-11-slim at time of writing, on my platform
IMAGE="docker.io/maven#sha256:55f1c145a04e01706233d68fe0b6b20bf76f765ab32f3fe6e29c8ef933917af6"
docker pull $IMAGE
docker image inspect $IMAGE | jq -r '.[].RootFS.Layers[]'
This will output the layers:
sha256:6e06900bc10223217b4c78081a857866f674c462e4f90593b01894da56df336d
sha256:eda2f4da9b1e70500ac340d40ee039ef3877e8be13b9a24cd345406bf6693412
sha256:6bdb7b3c3e226bdfaa911ba72a95fca13c3979cd150061d570cf569e93037ce6
sha256:ce217e530345060ca0973807a3288560e1e15cf1a4eeec44d6aa594a926c92dc
sha256:f256c980a7d17a00f57fd42a19f6323fcc2341fa46eba128def04824cafa5afa
sha256:446b1af848de2dcb92bbd229ca6ecaabf2f48dab323c19f90d02622e09a8fa67
sha256:10652cf89eaeb5b5d8e0875a6b1867b5cf92c509a9555d3f57d87fab605115a3
sha256:d9a4cf86bf01eb170242ca3b0ce456159fd3fddc9c4d4256208a9d19bae096ca
Now, from here, we can try to find other images that have a (strict) subset of these layers. Assuming you have the images on-hand, you can find them by cross-referencing the layers of images you have on disk, for example, using docker image inspect.
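One rough way to automate that cross-reference (my own sketch, not from the original workflow; it assumes jq is installed and that the candidate images are already pulled) is to check, for every local image, whether all of its layers appear in the target's layer list:
TARGET_LAYERS=$(docker image inspect "$IMAGE" | jq -r '.[].RootFS.Layers[]')
for img in $(docker image ls -q | sort -u); do
  layers=$(docker image inspect "$img" | jq -r '.[].RootFS.Layers[]')
  subset=yes
  for layer in $layers; do
    # if any layer of this image is not in the target, it cannot be an ancestor
    echo "$TARGET_LAYERS" | grep -q "$layer" || subset=no
  done
  [ "$subset" = yes ] && echo "possible ancestor: $img"
done
(The target image itself will also be printed, since its layers are trivially a subset of themselves.)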
In this case, I just happen to know what these images are and have them on-hand (I'll discuss later what you might do if you don't have the images on-hand) so we'll go ahead and pull those images and take a look at the layers.
If you want to follow along:
# openjdk:11.0.10-jdk-slim at time of writing, on my platform
OPENJDK='docker.io/openjdk@sha256:fe6a46a26ff7d6c31b258e07b3d53f0c42fe68f55f646cc39d60d0b17cbc827b'
# debian:buster-20210329-slim at time of writing on my platform
DEBIAN='docker.io/debian@sha256:088be7d6017ad3ae98325f47707112e1f61687c371be1865e55d5e5531ca97fd'
docker pull $OPENJDK
docker pull $DEBIAN
If we inspect these images and compare them against the layers we saw in the output of docker image inspect for the maven image, we can confirm that the layers from openjdk and debian are present in our original maven image.
$ docker image inspect $DEBIAN | jq -r '.[].RootFS.Layers[]'
sha256:6e06900bc10223217b4c78081a857866f674c462e4f90593b01894da56df336d
$ docker image inspect $OPENJDK | jq -r '.[].RootFS.Layers[]'
sha256:6e06900bc10223217b4c78081a857866f674c462e4f90593b01894da56df336d
sha256:eda2f4da9b1e70500ac340d40ee039ef3877e8be13b9a24cd345406bf6693412
sha256:6bdb7b3c3e226bdfaa911ba72a95fca13c3979cd150061d570cf569e93037ce6
sha256:ce217e530345060ca0973807a3288560e1e15cf1a4eeec44d6aa594a926c92dc
As stated, because these 5 layers are a strict subset of the 8 layers from the maven image, we can conclude the openjdk and debian images are, at least, both in the ancestry path of the maven image.
We can further infer that the last 3 layers most likely come from the maven image itself (or, potentially, some unknown image).
Caveats, when you don't have images locally
Now, of course the above only works because I happen to have all the images on-hand. So, you'd either need to have the images or be able to locate them by the layer digests.
You might still be able to figure this out using information that may be available from registries like Docker Hub or your own private repositories.
For official images, the docker-library/repo-info repository contains historical information about the official images, including the layer digests for the various tags cataloged over the last several years. You could use this, for example, as a source of layer information.
If you can imagine this like a database of layer digests, you could infer ancestry of at least these official images.
"Distribution" (remote) digests vs "Content" (local) digests
An important caveat to note is that, when you inspect an image for its layer digests locally, you get the content digests of the layers. If you are looking at layer digests in a registry manifest (like what appears in the docker-library/repo-info project), you get the compressed distribution digests, which cannot be compared against the local content digests.
So you can compare digests local <--> local OR remote <--> remote only.
Example, using remote images
Suppose I want to do this same thing, but I want to associate images in a remote repository and find its base(s). We can do the same thing by looking at the layers in the remote manifest.
You can find references for how to do this with your particular registry, as described in this answer for Docker Hub.
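The get-remote-layers helper used below isn't shown here; a minimal sketch against the standard registry v2 API for Docker Hub official images might look like this (my own assumption of how it could be written; it takes the repository and tag/digest as two separate arguments and assumes curl and jq):
get-remote-layers() {
  repo="$1"   # e.g. library/maven
  ref="$2"    # a tag or a sha256:... digest
  # fetch a pull token, then read the layer digests from the image manifest
  token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:${repo}:pull" | jq -r .token)
  curl -s -H "Authorization: Bearer ${token}" \
       -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
       "https://registry-1.docker.io/v2/${repo}/manifests/${ref}" \
    | jq -r '.layers[].digest'
}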
Using the same images from the example above, we would find that the distribution layer digests also match in the same way.
$ get-remote-layers $IMAGE
sha256:6fcf2156bc23db75595b822b865fbc962ed6f4521dec8cae509e66742a6a5ad3
sha256:96fde6667c188c81fcddee021ccbb3e054ebe83350fd4609e17a3d37f0ec7f9d
sha256:74d17759dd2a1b51afc740fadd96f655260689a2087308e40d1865a0098c5fae
sha256:bbe8ebb5d0a64d265558901c7c6c66e1d09f664da57cdb2e5f69ba52a7109d31
sha256:b2edaadd7dd62cfe7f551b902244ee67b84bc5c0b6538b9480ac9ca97a0a4986
sha256:0fca65d33e353bdfdd5edd8d4c8ab5efde52c078bd25e2dcf454f995e5420725
sha256:d6d771d0512387eee1e419a965b929a9a3b0365cf1935b3719d60bf9feffcf63
sha256:dee8cd26669373102db07820072127c46bbfdad340a586ee9dfe60ae933eac2b
$ get-remote-layers $DEBIAN
sha256:6fcf2156bc23db75595b822b865fbc962ed6f4521dec8cae509e66742a6a5ad3
$ get-remote-layers $OPENJDK
sha256:6fcf2156bc23db75595b822b865fbc962ed6f4521dec8cae509e66742a6a5ad3
sha256:96fde6667c188c81fcddee021ccbb3e054ebe83350fd4609e17a3d37f0ec7f9d
sha256:74d17759dd2a1b51afc740fadd96f655260689a2087308e40d1865a0098c5fae
sha256:bbe8ebb5d0a64d265558901c7c6c66e1d09f664da57cdb2e5f69ba52a7109d31
One other caveat with distribution digests in repositories is that you can only compare digests of the same manifest schema version. So, if an image was pushed with manifest v1 it won't have the same digest pushed again with manifest v2.
TL;DR
Images contain the layers of their ancestor image(s). Therefore, if image A's layers are a strict subset of image B's layers, you know that image B is a descendant of image A.
You can use this property of Docker images to determine the base images from which your images were derived.
You can use the method suggested in this answer:
https://stackoverflow.com/a/53841690/3691891
First, pull chenzj/dfimage:
docker pull chenzj/dfimage
Get ID of your image:
docker images | grep <IMAGE_NAME> | awk '{print $3}'
Replace <IMAGE_NAME> with the name of your image. Use this ID as the parameter to chenzj/dfimage:
docker run -v /var/run/docker.sock:/var/run/docker.sock --rm chenzj/dfimage <IMAGE_ID>
If you find this too hard, just pull the chenzj/dfimage image and then use the following docker-get-dockerfile.sh script:
#!/usr/bin/env sh
if [ "$#" -lt 1 ]
then
    printf "Image name needed\n" >&2
    exit 1
fi
image_id="$(docker images | grep "^$1 " | awk '{print $3}')"
if [ -z "$image_id" ]
then
    printf "Image not found\n" >&2
    exit 2
fi
docker run -v /var/run/docker.sock:/var/run/docker.sock --rm chenzj/dfimage "$image_id"
You need to pass the image name as the parameter. Example usage:
$ ./docker-get-dockerfile.sh alpine
FROM alpine:latest
ADD file:fe64057fbb83dccb960efabbf1cd8777920ef279a7fa8dbca0a8801c651bdf7c in /
CMD ["/bin/sh"]
docker run image:tag cat /etc/*release*
Run a container from that image with the command above (replace "image:tag" with your image name and tag). The container will print the OS release details you need to answer your question.

Best practice to connect my own code into a standard docker image in kubernetes

I have a lot of standard runtime Docker images, like python3 with TensorFlow 1.7 installed, and I want to use these standard images to run customers' code that lives outside of them. The scenario seems quite similar to serverless. So what is the best way to get the code into the runtime containers?
Right now I am trying to use a persistent volume to mount the code into the runtime, but it is a lot of work. Is there an easier solution for this?
UPDATE
What is the workflow for Google Machine Learning Engine or FloydHub? I think what I want is similar: they have a command line tool that combines the local code with a standard environment.
Following cloud-native practices, code should be immutable, and releases and their dependencies uniquely identifiable for repeatability, replicability, etc. In short: you should really create images that contain your source code.
In your case, that would mean basing your Dockerfile on the upstream python3 or TF images. There are a couple of projects that may help with the workflow above (code + build-release-run):
https://github.com/Azure/draft -- looks like better suited for your case
https://github.com/GoogleContainerTools/skaffold -- more golang friendly afaics
Hope it helps --jjo
One of the best practices is NOT to mount the code into the container from a volume, but to create a client-specific image that uses your TensorFlow image as a base image:
# Your base image comes in here.
FROM aisensiy/tensorflow:1
# Copy the client into your image.
COPY src /
# As Kubernetes will run your containers with an
# arbitrary UID, we set the user to nobody.
USER nobody
# ... and they will run with GID 0, so we
# need to change the group to 0 and make
# your stuff accessible to GID 0.
RUN \
  chgrp -R 0 /src && \
  chmod -R g=u /src && \
  true
CMD ["/usr/bin/python", ...]
Some more best practices:
Always log to stdout instead of log files.
One process per container. If you need multiple local processes, co-locate them in a single pod.
Even more best practices are provided in the OpenShift documentation: https://docs.openshift.org/latest/creating_images/guidelines.html
The code file can be passed from stdin when the container is being started. This way you can run arbitrary code when starting the container.
Please see below for example:
root@node-1:~# cat hello.py
print("This line will be printed.")
root@node-1:~#
root@node-1:~# docker run --rm -i python python < hello.py
This line will be printed.
root@node-1:~#
If this is your case,
You have a docker image with code in it.
Aim: To update the code inside docker image.
Solution:
Run a bash session with the Docker image, with a directory from your file system mounted as a volume.
Place the updated code in the volume directory.
From the bash session inside the container, replace the old code with the updated code from the volume.
Save the current state of the container as a new Docker image.
Sample Commands
Assume ~/my-dir in your file system contains the new code, updated-code.py.
$ docker run -it --volume ~/my-dir:/workspace --workdir /workspace my-docker-image bash
Now a new bash session will start inside the Docker container.
Assuming the code is at /code/code.py inside the container,
you can simply update the code with:
$ cp /workspace/updated-code.py /code/code.py
Or you can create a new directory and place the code there:
$ cp /workspace/updated-code.py /my-new-dir/code.py
Now the container has the updated code, but the changes will be lost if you stop the container and run the image again. To create a Docker image with the latest code, save this state of the container using docker commit.
Open a new tab in the terminal.
$ docker ps
Will list all running docker containers.
Find CONTAINER ID of your docker container and save it.
$ docker commit id-of-your-container new-docker-image-name
Now run the Docker image with the latest code:
$ docker run -it new-docker-image-name
Note: it is recommended to remove the old Docker image using the docker rmi command, as Docker images take up a lot of space.
We're dealing with a similar challenge also. Our approach is to build a static docker image where Tensorflow, Python, etc are built once and maintained.
Each user has a PVC (persistent volume claim) where large files that may change such as datasets and workspaces live.
Then we have a bash script that launches the cluster resources and syncs the workspace using ksync (like rsync for a Kubernetes cluster).

How can I make Docker not eat up disk space when used in continuous integration

I am playing with docker and plan to use it in a GitLab CI environment to package the current project state to containers and provide running instances to do reviews.
I use a very simple Dockerfile as follows:
FROM php:7.0-apache
RUN sed -i 's!/var/www/html!/var/www/html/public!g' /etc/apache2/sites-available/000-default.conf
COPY . /var/www/html/
Now, as soon as I add a new (empty) file (touch foobar) to the current directory and call
docker build -t test2 --rm .
again, a full new layer is created, containing all of the code.
If I do not create a new file, the old image seems to be nicely reused.
I have a half-way solution using the following Dockerfile:
FROM test2:latest
RUN sed -i 's!/var/www/html!/var/www/html/public!g' /etc/apache2/sites-available/000-default.conf
COPY . /var/www/html/
After digging into that issue and switching the storage driver to overlay, this seems to be what I want - only a few bytes are added as a new layer.
But now I am wondering how I could integrate this into my CI setup: basically I would need two different Dockerfiles, depending on whether the image already exists or not.
Is there a better solution for this?
Build your images with the same tags, or with no tags:
docker build -t myapp:ci-build ....
or
docker build ....
If you use the same tags, the old images will be untagged and will show <none> as their name. If you don't tag them at all, they will also show <none> as the name.
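You can list those untagged (dangling) images first, if you want to see what will be removed:
docker images -f dangling=true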
Now you can schedule the command below:
docker system prune -f
This will remove all dangling images, stopped containers, etc.
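For example, scheduling it via cron on the CI runner might look like this (illustrative path and schedule):
# /etc/cron.d/docker-prune -- prune dangling objects every night at 03:00
0 3 * * * root /usr/bin/docker system prune -f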
One suggestion is to use the command docker image prune to clean dangling images. This can save you a lot of space. You can run this command regularly in your CI.

How to make a Docker image into a single layer

Docker images are created with multiple layers. I want to convert an image to a single layer. Is there any docker build command to achieve this? I googled but can't find anything.
There is no command to achieve that, and a single-layer image goes against Docker's design concept. The Understand images, containers, and storage drivers doc describes why a Docker image has multiple layers. In short, image layers are one of the reasons Docker is so lightweight: when you change a Docker image, such as when you update an application to a new version, a new layer is built and replaces only the layer it updates. Besides, even if your image has only one layer, when you create a container from that image, Docker will still add a thin read/writable container layer on top of your image layer.
If you just want to move your image around and think a single layer would make that easier, you should probably try the docker save command to create a tar file of it.
Or, if you have more complicated requirements, you may need to use a VM image rather than a Docker image.
I have a workaround using a multi-stage build (the last stage is just a COPY from the previous stage):
FROM alpine as build1
RUN echo "The 1st Build"
FROM scratch
COPY --from=build1 / /
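Building that and checking the history should show only the layers created by the final stage (illustrative tag name):
docker build -t flat-example .
docker image history flat-example   # only the final COPY layer(s) remain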
First option:
# docker image build .
# docker run <your-image>
# docker container export <container-id created from previous command> -o myimage.tar.gz
# docker image import myimage.tar.gz
The imported image will be a single layer file system image.
Second option (not a complete solution): use multi-stage builds to reduce the number of image layers.
During the build we can also pass the --squash option to make it a single-layer image.
Experimental (daemon) API 1.25+
Squash newly built layers into a single new layer
https://docs.docker.com/engine/reference/commandline/image_build/
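For example (requires experimental features to be enabled on the Docker daemon; the tag name is illustrative):
docker build --squash -t myimage:squashed .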
Flattening a Docker Image to a Single Layer:
docker run -d --name flat_container nginx
docker export flat_container > flat.tar
cat flat.tar | docker import - flat:latest
docker image history flat