What's the benefit of Docker's image layers?

I'm new to Docker. Based on reading some Docker documentation, I plan to convert my project to a Docker image with the following design.
My project has the following characteristics:
The base OS almost never needs updating unless there is a big OS issue, so say the base OS is updated every 2 years.
The base libraries are updated only every 6 months.
Libraries are updated every month.
Project code is updated once a day.
Thus I plan to create 4 images (a sketch of the corresponding Dockerfiles follows the list):
image1 - base os
image2 - from image1 and add base libraries
image3 - from image2 and add libraries
image4 - from image3 and add project code
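A sketch of what those four Dockerfiles could look like, each built and tagged in turn with docker build -t imageN . (the base image and install commands are illustrative assumptions, not part of the original plan):
# Dockerfile for image1 - base OS
FROM ubuntu:16.04
# Dockerfile for image2 - base libraries, built on top of image1's tag
FROM image1
RUN apt-get update && apt-get install -y build-essential libssl-dev
# Dockerfile for image3 - application libraries
FROM image2
RUN apt-get update && apt-get install -y python3 python3-requests
# Dockerfile for image4 - project code only
FROM image3
COPY . /app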
My understanding is that image4 has 4 layers. Once I build a new image4, Docker only needs to pull layer 4, because layers 1, 2, and 3 are the same as in the old image4. Since my project code is just some text scripts, layer 4 should be very small, so pulling a new image4 should be very fast.
Is my understanding correct?

Since my project code is just some text scripts, layer 4 should be very small
Layer 4 will be small, but pulling image4 will be fast only if the client has already pulled images 1 to 3 before.
And the resulting image4 won't be small: it will be the concatenation of the 3 base images plus your code.
Plus, the term "layer" should not mask the fact that each instruction in a Dockerfile creates an intermediate image, making the actual image a collection of all those intermediate small layers (one per Dockerfile instruction).
You might have 4 "general" layers, but the actual image is likely to be composed of more than 4 layers.
You can check that with imagelayers.io
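You can also check locally with docker history, which prints one row per layer (the image name here is the one built by the script below):
docker history sshd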
I use an alias to clean up old dangling images:
# remove all dangling (untagged) images; fails harmlessly if there are none
alias drmiad='docker rmi $(docker images --filter "dangling=true" -q --no-trunc)'
My build script usually includes:
# load the alias above, build the image, and prune the dangling images the build replaced
. ../.bash_aliases
docker build -t sshd . || exit 1
drmiad 2> /dev/null


How to merge two Docker images into one?

I'm hoping to use docker to set up some bioinformatic analysis.
I have found two docker images that I would like to use:
jupyter/datascience-notebook
bioconductor/devel_base
I have been successful in running each of these images independently; however, I don't know how to merge them.
Is merging two Docker images possible? Or do you start with one and then manually install the features of the other?
You can't just merge the images. You have to recreate your own image based on what was in each of the images you want. You can download both images and re-create the Dockerfiles for each like this:
docker history --no-trunc=true image1 > image1-dockerfile
docker history --no-trunc=true image2 > image2-dockerfile
Substitute image1 and image2 with the images whose history you want to see. After this you can use those history listings to reconstruct the Dockerfiles and build your own image that is the combination of the two.
The fly in the ointment here is that any ADD or COPY commands will not reveal what was copied, because you don't have access to the local file system from which the original images were built. With any luck that won't be necessary, or you can extract any missing bits from the images themselves.
If there are specific files or directories that you want to cherry-pick from one of the two images, you can create a new Dockerfile that builds FROM one of them and copies specific paths from the other using COPY's --from option. For example:
FROM bioconductor/devel_base
COPY --from=jupyter/datascience-notebook /path/to/something-you-want /path
However, a quick investigation of those images shows that in this specific case there isn't a lot that can easily be cherry-picked.
Alternatively, you can just look at the original Dockerfiles and combine them yourself:
https://github.com/jupyter/docker-stacks/blob/master/base-notebook/Dockerfile
https://github.com/Bioconductor/bioc_docker/blob/master/out/devel_base/Dockerfile
Fortunately they are both based on APT-based distros, Ubuntu and Debian, so most of the apt-get install commands should work fine whichever base image you pick.
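A minimal sketch of that manual combination (the package names and steps are illustrative placeholders, not copied from the real Dockerfiles):
FROM bioconductor/devel_base
# re-run the relevant installation steps from the other image's Dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends wget bzip2 \
 && rm -rf /var/lib/apt/lists/*
# ... followed by the conda/Jupyter setup steps copied from base-notebook's Dockerfile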
You start with one, then manually install the features of the other. Merging would be far too complex, with too many unknowns.

How to load a diff of an image in Docker

I know docker save can save an image to a tar file and docker load can reload it.
For example:
I have machines A and B. B can't connect to the Hub. A has image:latest and B has image:base.
I have to save multiple images on A as tar files, but the tar files are too big to transfer.
Can I save just the diff between the tags (or image IDs) on A and load that diff on B?
Not saving the whole image would make update patches much smaller.
This isn't possible using standard Docker tooling. The only option docker save takes is one to write to a file rather than to stdout, and its output always contains all parent layers (and base images).
If your only problem is transferring the images, consider either techniques to reduce the image size (for example, a multi-stage build that doesn't include build-time dependencies in the final image) or tools like split(1) to break the tar file into smaller parts.
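For example, a compressed save split into 500 MB chunks, then reassembled on the other machine (the image name is illustrative):
# on machine A: save, compress, and split
docker save image:latest | gzip > image.tar.gz
split -b 500m image.tar.gz image.tar.gz.part-
# transfer the parts, then on machine B: reassemble and load
cat image.tar.gz.part-* | gunzip | docker load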
I believe the docker save tar output is the same as the "Export an image" API call. It might be possible to manually edit that tar file to delete layers, and there might be tools out there that do this. (This is not a particularly mainstream path, though; I looked into it several years ago but haven't done it myself, and only occasionally see such tools mentioned in SO answers.)
Note that the standard behavior of docker pull and docker save is to only ever create complete image chains: there is no way to set up Docker so that you have only the "top half" of an image without the base layers below it. Editing the docker save tar files by hand would violate that invariant.

Docker squash and layers: which layers are merged and which are still shared

Note: I am aware of this question, but it does not exactly or fully answer mine (though you could derive some of it). I would say there should be a clean yes/no question for this, so here it is.
CASE:
Assume I have 3 images: one called BASE, plus CHILDa and CHILDb, which both do FROM BASE.
Assume BASE has a size of 1GB, and assume we do not know or particularly care whether it is itself squashed, since it should not matter (IMHO).
CHILDa and CHILDb each add 10 layers totaling 500MB.
Assume we used docker build --squash when creating CHILDa and CHILDb.
Question:
When pulling CHILDa and CHILDb from the registry, I understand that the BASE layers will be pulled first. Now, my question is: what is the exact size of the images on the drive?
a) 1GB (BASE) + 500MB (CHILDa) + 500MB (CHILDb) = 2GB
b) (1GB + 500MB) + (1GB + 500MB) = 3GB
So are the layers of BASE shared as in the non-squash case (which would mean a), or are they not shared (which would mean b)?
I understand that the layers from BASE should not be squashed when CHILDa is built with --squash; only the layers created in CHILDa are squashed into one layer, so the history should look like this:
BASE LAYER1
BASE LAYER2
BASE LAYER3
...
CHILDa LAYER1 (squashed)
That means all the BASE layers should be shared with CHILDb on transfer, and also shared in terms of disk space when CHILDa and CHILDb are both pulled. This would mean a) is the answer.
I am asking this question to get a definite answer, not a suggestion or an implication based on the docs, and ideally one backed up by a test. It would not be the first time that the docs and the technical implementation don't match each other (in Docker).
The answer is a). You can see what Docker is doing easily enough with docker inspect on the resulting image:
First build two images:
$ docker build -t jenkins-blueocean:full .
$ docker build -t jenkins-blueocean:squash --squash .
Compare the total disk space reported for each image (which counts the base image, in this case jenkins/jenkins):
$ docker image ls jenkins/jenkins:2.77
REPOSITORY TAG IMAGE ID CREATED SIZE
jenkins/jenkins 2.77 1a057287c665 6 weeks ago 814MB
$ docker image ls jenkins-blueocean:full
REPOSITORY TAG IMAGE ID CREATED SIZE
jenkins-blueocean full 773f9e1cbd94 3 minutes ago 1.29GB
$ docker image ls jenkins-blueocean:squash
REPOSITORY TAG IMAGE ID CREATED SIZE
jenkins-blueocean squash 9a8816dcc900 2 minutes ago 1.28GB
This disk-space figure is cumulative and will double-count layers used in multiple images. So we need to look at the actual layers. Compare the layers for the three images (base, full, and squashed) using docker inspect:
$ docker inspect -f '{{json .RootFS.Layers}}' jenkins/jenkins:2.77 | jq .
[
"sha256:45f0f161f0749d09482ed1507925151b22b1f8c0c85970fe0857d61e530264b4",
"sha256:560ec518567f4117ed651db78b9c46eee39e00f38a87d6200ad7c87b79432064",
"sha256:deccd4ec00609f5f711578af469ce4ff43a5c73efc52517fc8ca362ebd36860c",
"sha256:23543e96fe44ca57a96d8552a6d9d218f7aa93b928a1ec8bafcaa9df3cc5723b",
"sha256:3de9ccb39b3bcf90c9215a49a84b340fedad87840d0580ffe0f0e0e8a1cb1f53",
"sha256:559298d0ee994bb9f12a77b1acc6fdfb6c7120cbcadfd640f7a9d171729b2cb1",
"sha256:4dc9d0cb0b3ca0f565aa29c7762f7322ece1e1fb51711feac3a52f3c20a28d2f",
"sha256:93d818bcd1d5eb6c689e6964e89feb8a8a3a394a998552c540b7db74986266c7",
"sha256:ac3d4345fe0474e18265fbb999fe6ab1c077fbb59876317406c7974c75c7ab5d",
"sha256:83a60a36cc44ca6fdab64823e805a853106be334239eb9d43cc1b220bb6ad238",
"sha256:7c78d70f156aaaee25540c9100ca28b68b554a966d448079896c413ae71a0e5d",
"sha256:cfd8defeb8a79686260691ce89a36772b21af0f736628492c835bb8a5740b817",
"sha256:fc4dc905efd22f932b74f95b53904736bebf52c2033e9853c54efb0b3f01560f",
"sha256:456fa2e1bb798ba4ccc5d433013973772b673dfff1f1386f18ceffe7d18132da",
"sha256:2446924bd5315bf6c46e8a5db2b61247da4ded48f4de148c15f8f5a2f9b1e91a",
"sha256:5a4416e8de72a14e97b53484e6016cc8a5b79398a25eb3b80fa099740b9f32e3",
"sha256:20901b1036e739e01c99d83e461059b3974003835a31e473f348fd70ece6c4e3",
"sha256:6d9d9244ead270d545d5de253a6ebb95398a7c63b10977c5cf7345b1cbf7d201",
"sha256:056fab22f880b32a4bbe4725b5b227891290097fd3791af452e52eb98b02cfb4",
"sha256:18a8691ee145f81f617bf788f39617f46d84b9924911317e6226139074b1f3e1"
]
This base image goes up to layer 18a8691... Compare that with the full:
$ docker inspect -f '{{json .RootFS.Layers}}' jenkins-blueocean:full | jq .
[
"sha256:45f0f161f0749d09482ed1507925151b22b1f8c0c85970fe0857d61e530264b4",
"sha256:560ec518567f4117ed651db78b9c46eee39e00f38a87d6200ad7c87b79432064",
"sha256:deccd4ec00609f5f711578af469ce4ff43a5c73efc52517fc8ca362ebd36860c",
"sha256:23543e96fe44ca57a96d8552a6d9d218f7aa93b928a1ec8bafcaa9df3cc5723b",
"sha256:3de9ccb39b3bcf90c9215a49a84b340fedad87840d0580ffe0f0e0e8a1cb1f53",
"sha256:559298d0ee994bb9f12a77b1acc6fdfb6c7120cbcadfd640f7a9d171729b2cb1",
"sha256:4dc9d0cb0b3ca0f565aa29c7762f7322ece1e1fb51711feac3a52f3c20a28d2f",
"sha256:93d818bcd1d5eb6c689e6964e89feb8a8a3a394a998552c540b7db74986266c7",
"sha256:ac3d4345fe0474e18265fbb999fe6ab1c077fbb59876317406c7974c75c7ab5d",
"sha256:83a60a36cc44ca6fdab64823e805a853106be334239eb9d43cc1b220bb6ad238",
"sha256:7c78d70f156aaaee25540c9100ca28b68b554a966d448079896c413ae71a0e5d",
"sha256:cfd8defeb8a79686260691ce89a36772b21af0f736628492c835bb8a5740b817",
"sha256:fc4dc905efd22f932b74f95b53904736bebf52c2033e9853c54efb0b3f01560f",
"sha256:456fa2e1bb798ba4ccc5d433013973772b673dfff1f1386f18ceffe7d18132da",
"sha256:2446924bd5315bf6c46e8a5db2b61247da4ded48f4de148c15f8f5a2f9b1e91a",
"sha256:5a4416e8de72a14e97b53484e6016cc8a5b79398a25eb3b80fa099740b9f32e3",
"sha256:20901b1036e739e01c99d83e461059b3974003835a31e473f348fd70ece6c4e3",
"sha256:6d9d9244ead270d545d5de253a6ebb95398a7c63b10977c5cf7345b1cbf7d201",
"sha256:056fab22f880b32a4bbe4725b5b227891290097fd3791af452e52eb98b02cfb4",
"sha256:18a8691ee145f81f617bf788f39617f46d84b9924911317e6226139074b1f3e1",
"sha256:679b85b8d42598a7ecb5988e408da49cbb3f86402fd2e5694104839ff17a7015",
"sha256:5fa620489d92edd3e7922d9335d803ea83c148793044e0da99144152f7988437",
"sha256:17d03c6eda4a4d989f6751bb53d7bf356309938a1076af75bdf440195471fa2b",
"sha256:7a78b2c7c995ddab1ba675aba1c2bc54cc289ba148fd39b600f592060d98c459",
"sha256:f56b6c3fd8713236d077a95568a58445e6d6423113c0b68c6f10bef39bd6b6ff"
]
The full build added 5 layers on top of the base image. Viewing the squashed build:
$ docker inspect -f '{{json .RootFS.Layers}}' jenkins-blueocean:squash | jq .
[
"sha256:45f0f161f0749d09482ed1507925151b22b1f8c0c85970fe0857d61e530264b4",
"sha256:560ec518567f4117ed651db78b9c46eee39e00f38a87d6200ad7c87b79432064",
"sha256:deccd4ec00609f5f711578af469ce4ff43a5c73efc52517fc8ca362ebd36860c",
"sha256:23543e96fe44ca57a96d8552a6d9d218f7aa93b928a1ec8bafcaa9df3cc5723b",
"sha256:3de9ccb39b3bcf90c9215a49a84b340fedad87840d0580ffe0f0e0e8a1cb1f53",
"sha256:559298d0ee994bb9f12a77b1acc6fdfb6c7120cbcadfd640f7a9d171729b2cb1",
"sha256:4dc9d0cb0b3ca0f565aa29c7762f7322ece1e1fb51711feac3a52f3c20a28d2f",
"sha256:93d818bcd1d5eb6c689e6964e89feb8a8a3a394a998552c540b7db74986266c7",
"sha256:ac3d4345fe0474e18265fbb999fe6ab1c077fbb59876317406c7974c75c7ab5d",
"sha256:83a60a36cc44ca6fdab64823e805a853106be334239eb9d43cc1b220bb6ad238",
"sha256:7c78d70f156aaaee25540c9100ca28b68b554a966d448079896c413ae71a0e5d",
"sha256:cfd8defeb8a79686260691ce89a36772b21af0f736628492c835bb8a5740b817",
"sha256:fc4dc905efd22f932b74f95b53904736bebf52c2033e9853c54efb0b3f01560f",
"sha256:456fa2e1bb798ba4ccc5d433013973772b673dfff1f1386f18ceffe7d18132da",
"sha256:2446924bd5315bf6c46e8a5db2b61247da4ded48f4de148c15f8f5a2f9b1e91a",
"sha256:5a4416e8de72a14e97b53484e6016cc8a5b79398a25eb3b80fa099740b9f32e3",
"sha256:20901b1036e739e01c99d83e461059b3974003835a31e473f348fd70ece6c4e3",
"sha256:6d9d9244ead270d545d5de253a6ebb95398a7c63b10977c5cf7345b1cbf7d201",
"sha256:056fab22f880b32a4bbe4725b5b227891290097fd3791af452e52eb98b02cfb4",
"sha256:18a8691ee145f81f617bf788f39617f46d84b9924911317e6226139074b1f3e1",
"sha256:e05668bb7cbab8f964ea3512a9ce41568330218e0e383693ad9edfd1befce9aa"
]
It only added a single new layer. The base image itself was not squashed.
On disk, each layer is only stored once, so you only count the base image once for disk usage.
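If you want to sanity-check that accounting on your own machine, docker system df -v reports, for each image, how much of its size is shared with other images and how much is unique (the exact output format varies by Docker version):
docker system df -v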
Note: I do not recommend squashing images in most scenarios. It breaks the value of caching the earlier layers inside the image. Instead, I recommend organizing the Dockerfile to maximize the value of layer caching, and using multi-stage builds to get the output down to a single copy layer if there's some systemic overhead.
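A minimal multi-stage sketch of that recommendation (the language, base images, and paths are illustrative assumptions):
# build stage: compilers and build-time dependencies live only in this stage
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .
# final stage: only the built artifact is copied in, as a single layer
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
CMD ["app"]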

Why isn't Docker more transparent about what it's downloading?

When I download a Docker image, it downloads dependencies, but only displays their hashes. Why does it not display what it is downloading?
For example:
➜ ~ docker run ubuntu:16.04
Unable to find image 'ubuntu:16.04' locally
16.04: Pulling from library/ubuntu
b3e1c725a85f: Downloading 40.63 MB/50.22 MB
4daad8bdde31: Download complete
63fe8c0068a8: Download complete
4a70713c436f: Download complete
bd842a2105a8: Download complete
What's the point in only telling me that it's downloading b3e1c725a85f, etc.?
An image is created from layers of filesystems, each represented by a hash. After its creation, the base image's tag may point to a completely different set of hashes without affecting any images built from it. And these layers result from things like RUN commands; the tag that names the result something like ubuntu:16.04 is only added after the image is made.
So the best that could be done is to say that 4a70713c436f is based on adding some directory (identified only by a hash of the input folder itself) or on a multi-line RUN command, neither of which makes for a decent UI. The resulting layer may have no tagged name, or it could have multiple tagged names. So the simplest solution is to output the one thing that is universal and unchanging in all scenarios: the hash.
To rephrase that pictorially:
b3e1c725a85f: could be ubuntu:16.04, ubuntu:16, ubuntu:latest, some.other.registry:5000/ubuntu-mirror:16.04
4daad8bdde31: could be completely untagged, just the result of a RUN command
63fe8c0068a8: could be completely untagged, just the result of a COPY
4a70713c436f: could point to a tagged base image where that tag has since changed
bd842a2105a8: could be created with a docker commit command (eek)
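You can see the tag/hash distinction for yourself: the same image ID (and the same underlying layer hashes) can carry any number of tags, so the layer hash is the only stable thing to print. A quick demonstration (the second tag is a made-up example):
docker pull ubuntu:16.04
docker tag ubuntu:16.04 some.other.registry:5000/ubuntu-mirror:16.04
docker images
# both tags show the same IMAGE ID; the layer hashes behind it are visible with:
docker inspect -f '{{json .RootFS.Layers}}' ubuntu:16.04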

Is this Dockerfile extending an image or creating a new one?

I was just going through this tutorial HERE about Docker images, specifically the section about extending a Docker image. If you scroll to the section called Building an image from a Dockerfile, you'll see that a new Dockerfile is built. Now, is this an independent image, or is this Dockerfile extending the training/sinatra image? That is my question.
So to repeat: is the Dockerfile in the Building an image from a Dockerfile section creating a new image or extending the training/sinatra image?
Thank you.
The command in that section is
docker build -t ouruser/sinatra:v2 .
That means it is creating a new image, extending the one mentioned in the Dockerfile: FROM ubuntu:14.04
The end result is:
a new image belonging to the user ouruser, with the repository name sinatra, and given the tag v2.
each step creates a new container, runs the instruction inside that container and then commits that change - just like the docker commit work flow we saw earlier.
When all the instructions have executed we’re left with the 97feabe5d2ed image (also helpfully tagged as ouruser/sinatra:v2) and all intermediate containers will get removed to clean things up.
So again, this is an independent image, independent from training/sinatra.
To extend an image, you:
either write a Dockerfile which starts with FROM <animage>, and build it: Docker will execute a series of docker commits, one for each intermediate container.
or, and that is what is described in "Updating and committing an image", you do it manually, by running a bash, executing a command, exiting, and committing the exited container into a new image (see the sketch below).
The first approach scales better: you chain multiple commits specified in one Dockerfile.
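A sketch of that manual workflow, using the image from the tutorial (the container name and the change made inside are illustrative):
# start a container from the base image and make a change interactively
docker run -it --name sinatra-work training/sinatra /bin/bash
# ... inside the container: install something, then exit ...
# commit the stopped container as a new image
docker commit -m "Added a gem" sinatra-work ouruser/sinatra:v2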
