What are docker child images?

What are docker child images and why can't I delete them?
I have been working with a Kali Linux image: I commit my changes and call the result kaliupdate1, make more changes and commit again as kaliupdate2, and then I try to remove kaliupdate1, but it doesn't work...
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kaliupdate2 latest e57f94c32fac 18 hours ago 2.25 GB
kaliupdate1 latest 16da215f736c 18 hours ago 1.12 GB
kaliupdate latest a841aa8bb8a9 19 hours ago 1.07 GB

From your question, assuming that your workflow has been to start a container, work interactively inside it, and then commit the changes to a new image, what you're essentially doing is creating a new layer on top of the existing kali base image.
As such, the full stack of layers is required for the image to operate. This doesn't mean the disk space used is 2.25 + 1.12 + 1.07 GB, however, as Docker shares the lower layers.
That said, this isn't a great way to create Docker images, as doing things like chown and mv can leave redundant files in the image.
A better way is to create a new Dockerfile based on the original kali image (using FROM kali:latest in the Dockerfile), make the changes you want there, and execute a build to give you the final image.
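As a rough sketch of that approach (the package names and paths below are only placeholders for whatever you installed interactively, not taken from your setup):
FROM kali:latest
# install the tools you were previously adding by hand inside the container
RUN apt-get update && \
    apt-get install -y --no-install-recommends <your-packages> && \
    rm -rf /var/lib/apt/lists/*
# copy in any files you were editing inside the container
COPY ./configs /etc/<your-app>/
Then build it with docker build -t kaliupdate . — rebuilding from this Dockerfile gives you a single, reproducible image instead of a chain of committed containers.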
There's more information on Docker's website here

Related

How to reduce size of docker image by removing files and committing

I have created a docker image and started installing many packages I needed for some tests.
However, after I was done, I realized I could remove some folders in order to reduce image size.
So that's what I did, and I committed those changes to a new image.
However, the image size remained the same. I saw someone with a similar issue here on SO, and the answer there explains that Docker uses layers for its storage, so the image size only ever increases.
So my question is if it is possible to reduce image size by deleting folders and files or should I start from scratch?
docker image ls output for reference:
REPOSITORY TAG IMAGE ID CREATED SIZE
abacate melancia 9c1b3acdf62c 3 seconds ago 34.8GB
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx v2 68f6862f8371 9 minutes ago 34.8GB
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx v1 2090d0a74e9d 5 days ago 34.8GB
Yes, you should start from scratch. The initial layers are indeed still adding content to your image, as the existing answer notes.
If you kept your Dockerfile, you should be able to just replay the changes you made to your original image. This is a key way of working with Docker: the outcome of a build isn't very valuable when the process itself is repeatable. Keep the Dockerfile under source control, and Docker images become almost as ephemeral as Docker containers.
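If you do rebuild, the thing that actually keeps the size down is creating and deleting the large files within the same RUN instruction, so the deleted content never lands in a committed layer. A minimal sketch, assuming a Debian/Ubuntu base and placeholder package names:
FROM ubuntu:22.04
# install the test prerequisites and clean up in a single layer,
# so the removed caches and temp folders never become part of the image
RUN apt-get update && \
    apt-get install -y --no-install-recommends <your-test-packages> && \
    rm -rf /var/lib/apt/lists/* /tmp/*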

FROM dockerimage:latest pulls wrong image

I have a client-web/base image I build using gitlab ci pipeline:
latest c4fba30df 204.03 MiB 6 days ago
version_2 c4fba30df 204.03 MiB 6 days ago
version_1 7904a77c0 153.69 MiB 2 months ago
These are the images in my docker repository: as you can see, the image with the latest tag is actually the latest image, having the same image ID (c4fba30df) as the image with the version_2 tag.
I build another image on top of the base image:
FROM gitlab.faccousa.net:4567/faccos/client-web/base:latest
...
...
...
Yesterday, I built the above Dockerfile, and this is what happened:
Step 1/6 : FROM gitlab.faccousa.net:4567/faccos/client-web/base:latest
---> 7904a77c0
But 7904a77c0 is version_1, so the older image ID.
Am I doing something wrong with the latest tag?
I know latest is misused by many people, but in this case I have a CI pipeline that always builds my base image and tags it twice with:
actual tag
latest tag
When you docker run an image, or if a Dockerfile is built FROM an image, and Docker thinks it already has the image locally, it will use the image it already has. In your case, since you already have a ...:latest version, Docker just uses it; it doesn't ever check that there might be a different version of the image with the same tag elsewhere.
The most reliable approach to this is to never use the :latest tag anywhere:
FROM gitlab.faccousa.net:4567/faccos/client-web/base:version_2
If you have a lot of dependent images and the base image changes at all regularly, though, maintaining this can become a hassle.
Another option is to tell docker build to --pull the base image every time:
docker build --pull -t ... .
with the downsides that this build will fail if the remote repository is unavailable, and builds will take somewhat longer even if the base image hasn't changed.
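If you'd rather not pass --pull on every build, manually refreshing the local copy of the base image before the builds that need it achieves the same thing (a sketch reusing the registry path from the question; the output tag is a placeholder):
# update the local :latest tag to whatever the registry currently has
docker pull gitlab.faccousa.net:4567/faccos/client-web/base:latest
# then build the dependent image as usual
docker build -t <your-image> .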
Is base your project name?
gitlab.example.com:port/user/projectname:latest
Here's the full guide.
It's normal that your version_2 and latest have the same image ID.

What is the practical use of a "tag" in docker?

For instance, there is one repository created using
$ docker tag friendlyhello john/get-started:part1
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
friendlyhello latest d9e555c53008 3 minutes ago 195MB
john/get-started part1 d9e555c53008 3 minutes ago 195MB
Now here are two images: friendlyhello:latest and john/get-started:part1.
I noticed that these two images have the same IMAGE ID.
So I guess there is just ONE image file on my disk.
If that is true, why should I tag the repository? It seems just like creating a link file in an operating system.
In short, tags are used for convenience in order to identify an image (which is a combination of filesystem layers). Because your image evolves over time and sees more layers being added to form a new image, tags are also a convenient way to do versioning.
When a user downloads your image from a registry like Docker Hub or the Google Container Registry, they can easily associate which version they are downloading from the tag.
It is no different from tagging a release with git, if you are familiar with that, except that here you are tracking changes to your image: you tag a release for your image and push the changes to your remote repository.
Taking Ubuntu as an example, tags are used to refer to specific releases of the operating system and there can be plenty. For example these are all valid tags for Ubuntu (see docker hub's page):
rolling
zesty
17.04
latest
xenial
16.04
trusty
14.04
Multiple tags can point to one container image. With Ubuntu, xenial, latest and 16.04 are tags that point to the same image; they are just different ways to refer to it. Because I know that the latest (stable) version of Ubuntu is xenial and its version number is 16.04, I can download this specific image from the Docker Hub using any of these tags.
By contrast, trusty and xenial do not point to the same image. They point to images that may share common filesystem layers but diverged at some point.
You can tag images to make sure you will be able to find them later. If you store an image only as latest, it's probable you'll overwrite it on the next build and then potentially delete it with docker image prune, which removes dangling images that are no longer referenced.
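As a small illustration of several tags pointing at one image (the myapp name and version number are made up for the example):
# build once, then attach both a version tag and latest to the same image ID
docker build -t myapp:1.0 .
docker tag myapp:1.0 myapp:latest
# `docker images` now shows both tags with the same IMAGE ID;
# pushing both publishes two names for one set of layers
docker push myapp:1.0
docker push myapp:latest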

What is the overhead of creating docker images?

I'm exploring using Docker so that we deploy new Docker images instead of specific file changes, so that everything the application needs ships with each deployment.
Question 1:
If I add a new application file, say 10 MB, to a Docker image, will deploying the new image, using the tools in Docker Toolbox, require shipping an entirely new image to my hosts, or do Docker deployments just transfer the difference between the two, similar to git version control?
Another way to put it: I looked at a list of Docker base images and saw a version of Ubuntu that is 188 MB. If I commit a new application into a Docker image using this base image, will my Docker hosts need to pull the full 188 MB, which they are already running, plus the application, or is there a differential way of getting just what has changed?
Supplementary Question
Am I correct in assuming that, when using Docker, deploying images is the intended approach? Meaning any new change requires a new image deployment, so that images are treated as immutable? When I was using AWS we followed this approach with AMIs (Amazon Machine Images), but storing AMIs had low overhead; for Docker I don't know yet.
Or is it a better practice to deploy Dockerfiles and have the new image be built on the target host itself?
Docker uses a layered union filesystem; only one copy of a layer will be pulled by a Docker engine and stored on its filesystem. When you build an image, Docker checks its layer cache to see if the same parent layer and same command have already been used to build an existing layer, and if so, that cached layer is reused instead of building a new one. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should put frequently changing steps at the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
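To make that ordering concrete, here's a rough Dockerfile sketch (the base image, package names and paths are placeholders, not taken from the question): the rarely-changing dependency install sits above the application copy, so a routine code change only rebuilds, and only re-transfers, the small final layers.
FROM ubuntu:22.04
# rarely-changing steps first, so their layers stay cached between builds
RUN apt-get update && \
    apt-get install -y --no-install-recommends <runtime-packages> && \
    rm -rf /var/lib/apt/lists/*
# frequently-changing application code last: only these layers are rebuilt
# and pushed/pulled when the application changes
COPY ./app /opt/app
CMD ["/opt/app/run.sh"]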
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also put in a little extra work to create smaller images.
You can use Alpine or Busybox instead of the bigger Ubuntu, Debian or Bitnami (Debian light) base images.
A smaller image is also more secure, as fewer tools are available in it.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
There are two great tools for making smaller Docker images:
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/
goes from 357.3 MB before to 18.73 MB after using docker-slim.
Or, for simh:
https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/
is 5.388 MB, when the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, whereas a netcat based on Alpine is 7.812 MB, and one based on busybox needs only 2 or 3 MB.
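As a quick sketch of the Alpine approach mentioned above (the Alpine tag and package name are just examples):
FROM alpine:3.19
# install only the single tool needed; --no-cache avoids keeping the apk index in the image
RUN apk add --no-cache netcat-openbsd
ENTRYPOINT ["nc"]
The result stays in the single-digit-MB range because the Alpine base layer is itself only a few MB.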

How to see tree view of docker images?

I know Docker has deprecated the --tree flag of the docker images command, but I could not find any handy command to get the same output as docker images --tree. I found dockviz, but it seems to be yet another container to run. Is there any built-in CLI command to see a tree view of images without using dockviz?
Update Nov. 2021: for online public images, you have the online service contains.dev.
Update Nov. 2018, docker 18.09.
You now have wagoodman/dive, a tool for exploring each layer in a docker image.
To analyze a Docker image simply run dive with an image tag/id/digest:
dive <your-image-tag>
or if you want to build your image then jump straight into analyzing it:
dive build -t <some-tag> .
The original (Sept 2015, docker 1.8) workaround, mentioned in issue 5001, was indeed dockviz:
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -t
The -t option allows you to stay in the CLI only (no graphics needed).
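If you only need the layer stack of a single image rather than a tree across images, the built-in docker history command already covers that part:
# show each layer of one image, with the command that created it and its size
docker history <your-image-tag>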
Update Sept. 2016 (post docker 1.10; docker 1.11, soon 1.12), one year later, as mentioned in the same issue 5001 by Michael Härtl:
Since 1.10 the way layer IDs work has changed fundamentally. For a lengthy explanation of this topic see #20399. There's also #20451, but I'm not sure if this could be used by the nate/dockviz image.
Personally I find the way the new layers work very very confusing and much less transparent than before. And it's not really well documented either.
AFAIK #tonistiigi's comments in the issue above are the only public explanation available.
Tõnis Tiigi:
Pre v1.10 there was no concept of layers or the other way to think about it is that every image only had one layer. You built a chain of images and you pushed and pulled a chain. All these images in the chain had their own config.
Now there is a concept of a layer that is a content addressable filesystem diff. Every image configuration has an array of layer references that make up the root filesystem of the image and no image requires anything from its parent to run. Push and pull only move a single image, the parent images are only generated for a local build to use for the cache.
If you build an image with a Dockerfile, every command adds a history item to the image configuration. This stores the command so you can see it in docker history. As this is part of the image configuration, it also moves with push/pull and is included in the checksum verification.
Here are some examples of content addressable configs:
https://gist.github.com/tonistiigi/6447977af6a5c38bbed8
Terms in v1.10: (the terms really have not changed in implementation but previously our docs probably simplified things).
Layer is a filesystem diff. Bunch of files that when stacked on top of each other make up a root filesystem. Layers are managed by graphdrivers, they don't know anything about images.
Image is something you can run and that shows up in docker images -a. Needs to have a configuration object. When container starts it needs some kind of way to generate a root filesystem from image info. On build every Dockerfile command creates a new image.
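You can see that layer array for yourself with docker inspect: the RootFS section of an image's configuration lists the content-addressable layer digests described above.
# print the array of layer digests that make up the image's root filesystem
docker inspect --format '{{json .RootFS.Layers}}' <your-image-tag>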
You can also refer to the more recent project TomasTomecek/sen, which:
had to understand the new 1.10 layer format (commit 82b224e)
includes an image tree representation.
