I have two containers in which I installed two different sets of tools. Both containers use the same base image. Now I would like to combine both containers into one. Alternatively, after committing these containers to images, I would like to be able to merge the images. (But I would like to avoid creating Dockerfiles and merging them...)
(The sets of tools are independent, so none of the files added within one container should collide with those of the other.)
How can I do this? Or can I somehow tell docker to start a new container or create a new image with:
1. a base image and with
2. a stack of changes to the file system corresponding to my separate containers?
(Since docker stores the file-system difference of a container relative to its image (or between different layers?), in principle it should even be possible to merge the differences directly...)
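For reference, the commit step mentioned above might look roughly like this, with tools-a and tools-b as made-up container names:

docker commit tools-a tools-a-image   # snapshot container A's file-system changes as an image
docker commit tools-b tools-b-image   # snapshot container B's file-system changes as an image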
Related
I've created an ubuntu:bionic base image on my computer. It was originally super large, but I deleted 80% of the content by running a container and then committing it. If I go to the root directory and do "du -sh", it says the disk usage is 4.5 GB. Curiously enough, the size of the docker image when I do "docker images" shows 11 GB. After pushing to Docker Hub, I see that it's 3.34 GB. So I thought perhaps it cleaned up something before compressing? I ran the new image, deleted some more content, committed, and pushed again. This time, "du -sh" said 3.0 GB, "docker images" still said 11 GB, and Docker Hub also 3.34 GB. Clearly it is compressing the 11 GB image and not the 3.0 GB of content I'm expecting. Is there an easy way to "clean up" the image?
Docker images are built from layers. When you add a new layer, it doesn't remove the previous layers, it just adds a new one, rather like a new Git commit—the history is still there.
That means when you deleted the content, you made it invisible but it's still there in earlier layers.
You can see the layers and their sizes with docker history yourimagename.
Your options:
Make sure files you don't need never make it into the image in the first place, e.g. with .dockerignore.
Use a multi-stage build to create a new image from the old one with only the files you need (see the sketch below): https://docs.docker.com/develop/develop-images/multistage-build/
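A minimal sketch of that approach, assuming the bloated image is called yourimagename and the files worth keeping live under /app (both names are assumptions):

# first stage: use the existing, bloated image purely as a file source
FROM yourimagename AS source

# second stage: start from a clean base and copy over only what is needed
FROM ubuntu:bionic
COPY --from=source /app /app

Building this produces a fresh image whose layers contain only the copied files; the content deleted in later layers of the old image is no longer carried along.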
Is there any way to look at the layers of a set of docker images in a tree fashion? It would help examine whether any siblings serve the same purpose so that one can be replaced with the other.
Earlier, one was able to do this using docker images --tree. However, that option is no longer available in recent versions of docker.
Here is an external tool that can help achieve the same visualization
I am using docker successfully in the dev environment and now want to use it for staging and prod too.
I am developing a web application with Symfony where the code is mounted from the local machine into the docker container. For staging and prod I want to "bake" the source code into the image, because there's no need to change it anymore at that point.
At the moment my services "php" and "nginx" need access to the src files. For staging/prod I would create an extra volume called "src" and mount it into both services (roughly as sketched below). In one of the services (nginx/php) I would add a COPY command to copy the source code into the mounted "src" volume at build time.
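A rough docker-compose sketch of that setup; the image names and mount path are hypothetical, not taken from the actual project:

version: "3"
services:
  php:
    image: my-php-image      # hypothetical; its Dockerfile would contain the COPY of the src code
    volumes:
      - src:/var/www/src
  nginx:
    image: my-nginx-image    # hypothetical
    volumes:
      - src:/var/www/src
volumes:
  src:                       # named volume shared by both services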
The problem now is the following:
Whenever a new version of my code exists, the whole image has to be rebuilt ... the smallest image (nginx) has a size of 200 MB. So every time I want to update only my code (just 10 MB), the whole 200 MB image has to be rebuilt ...
In addition, I want to check all builds into a repository.
That is quite expensive in terms of time ...
My thought is the following:
Is it possible to rebuild only the data volume "src" on each code update (triggered through a Jenkins build job) and check that in?
I think there is no need to rebuild rarely changing services like php/nginx/mysql on every build ...
Or is there another approach?
Initially having 1.5 GB for all needed services is quite OK, but having another 200 MB in the repository for each version is too heavy.
Thanks
First, the approach you are following is definitely a bad practice. A docker container should be portable and self-contained. Relying on data volumes that are bound to the host machine will make your container not portable.
By design containers should package all of the dependencies needed to run the application. You should thus add the source to each image if the source code is a dependency that must be provided.
You should investigate other options to make the image size smaller. Depending on the programming language you are using, it is possible to compile/compress the source code and have a smaller binary for instance that can be copied into the image.
One final note: using very different approaches to deploy between environments (dev/staging/prod) is usually a bad idea. It is much preferable to have similar deployment strategies to avoid unexpected errors.
If you set up your Dockerfile properly (see docs) so you are adding the code last, it should be a pretty quick operation to update as all the other unchanged layers will be cached. This is pretty common practice as part of a Docker workflow.
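For example, a sketch of that ordering for the PHP service; the base image, packages, and paths are assumptions, not taken from the question:

# rarely changing layers first, so the build cache can reuse them
FROM php:7.4-fpm
RUN apt-get update \
    && apt-get install -y --no-install-recommends unzip \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /var/www/app

# the frequently changing application code goes last, so only this
# small layer is rebuilt (and pushed) when the code changes
COPY . .

Dependency installation (e.g. composer install against composer.json/composer.lock) would also belong in the early, cached layers, before the final COPY of the application code.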
You can use this same image for your local development and mount your working code over the code in the container for active development. As long as that exact same code is used to rebuild your images, you should maintain consistency. You could optimize further by choosing which parts of your code are likely to change and order your build accordingly.
You may also want to look into multi-stage build process where you can further optimize your base image and reduce final image size.
Let's say I have two different Dockerfiles.
Image one called nudoc/my-base-image:1.1
FROM ubuntu:16.10
COPY . /test.war
Image two called nudoc/my-testrun-image:1.1
FROM nudoc/my-base-image:1.1
CMD /test/start.sh
Both images have layers in common.
What are the advantages of having layers in a docker image? Do they provide a benefit when pulling from the registry?
As Henry already stated
Common layers are downloaded only once and are stored only once. So this has benefits for download as well as storage.
Additionally, building an image will reuse layers if the instruction that creates them allows it. This reduces the build time. For example, if you copy a file into your image and the file is the same as in the last build, the old layer will be reused. See the best practices for writing Dockerfiles for more details.
Common layers are downloaded only once and are stored only once. So this has benefits for download as well as storage.
A layer will be downloaded once, and possibly reused for other images. You can see them as intermediate images, and those intermediate images are combined together to create a bigger one.
In continuous integration, this can save quite some time!
I suggest you read the official documentation page: https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
Docker uses the AUFS file system as its default storage driver, where each instruction defined in your Dockerfile acts as an individual layer. If you add or update an instruction, it affects the corresponding layer, which helps you build, reuse, or update your Docker image quickly. To learn more about layers and images, read here.
I noticed that each line in the Dockerfile creates a separate image. Is there any limit on the number of images that are created?
Should we try to do a one-liner of RUN cmd1 && cmd2 && cmd3 instead?
How would this differ if we use a service like Quay?
Thanks!
As Alister said, there is an upper limit on the number of layers in a Docker image if you are using the AUFS file system. At Docker version 0.7.2 the limit was raised to 127 layers (changelog).
Since this is a limitation of the underlying union file system (in the case of AUFS), using Quay or other private registries won't change the outcome. But you could use a different file system.
The current alternative is the devicemapper storage driver (see CLI docs). Other file systems may have different limitations on the number of layers; I don't think devicemapper has an upper limit.
You're right: by running multiple commands in a single RUN statement, you can reduce the number of layers.
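For example (the packages here are just placeholders):

# one layer instead of three
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl git \
    && rm -rf /var/lib/apt/lists/*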
Alternatively, if you really need a lot of layers to build your image, you could build an image until it reaches the maximum and then use docker export to create an un-layered copy of the image's file system. Then use docker import to turn it back into an image again, this time with just one layer, and continue building. You lose the history that way, though.
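Roughly, that flattening step could look like this (the image and container names are made up):

# docker export works on containers, so create one from the intermediate image first
docker create --name flatten-tmp myimage:intermediate
docker export flatten-tmp | docker import - myimage:flat
docker rm flatten-tmp

Note that docker import produces a single-layer image without the original metadata (CMD, ENV, and so on), so those instructions have to be re-declared when you continue building.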
There is a limit of 42 layers - apparently a hard limit imposed by AUFS.
It can be somewhat avoided by putting what would be done in individual RUN commands into a script, and then running that script. You then end up with a single, larger image layer, rather than a number of smaller layers to merge. Smaller layers (from multiple RUN lines) make initial testing easier (since a new addition at the end of the RUN list can re-use the previous image), so it's typical to wait until your Dockerfile has stabilised before merging the lines.
You can also reduce the potential number of layers when ADDing a number of files by adding a whole directory rather than a number of individual files.
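A sketch of the script approach and the directory ADD (all file names here are hypothetical):

# setup.sh contains what would otherwise be many separate RUN lines
COPY setup.sh /tmp/setup.sh
RUN sh /tmp/setup.sh && rm /tmp/setup.sh

# one ADD for a whole directory instead of one per file
ADD config/ /etc/myapp/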