What is the impact of using multiple Base Images in Docker?

I understand that docker containers are portable between docker hosts, but I am confused about the relationship between the base image and the host.
From the documentation on Images, it appears that you would have a much heavier footprint (akin to multiple VMs) on the host machine if you had a variety of base images running. Is this assumption correct?
GOOD: Many containers sharing a single base image.
BAD: Many containers running separate/unique base images.
I'm sure a lot of this confusion comes from my lack of knowledge of LXC.

I am confused about the relationship between the base image and the host.
The only relation between the container and the host is that they use the same kernel. Programs running in Docker can't see the host filesystem at all, only their own filesystem.
it appears that you would have a much heavier footprint (akin to multiple VMs) on the host machine if you had a variety of base images running. Is this assumption correct?
No. The Ubuntu base image is about 150MB. But you'd be hard-pressed to actually use all of those programs and libraries. You only need a small subset for any particular purpose. In fact, if your container is running memcache, you could just copy the 3 or 4 libraries it needs, and it would be about 1MB. There's no need for a shell, etc. The unused files will just sit there patiently on disk, completely ignored. They are not loaded into memory, nor are they copied around on disk.
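To make that concrete, here is a rough sketch of the idea (not the answer's exact method, just an illustration; it assumes a Linux host with memcached installed, and ldd can miss libraries that are only loaded at runtime):

# Copy the memcached binary plus the shared libraries it links against
# into an empty build context.
mkdir -p minimal
cp "$(command -v memcached)" minimal/
ldd "$(command -v memcached)" | awk '{for (i=1;i<=NF;i++) if ($i ~ /^\//) print $i}' \
  | xargs -I{} cp --parents {} minimal/

# Build an image from scratch containing only those files.
printf 'FROM scratch\nCOPY . /\nUSER 65534\nENTRYPOINT ["/memcached"]\n' > minimal/Dockerfile
docker build -t memcached-minimal minimal/
docker image ls memcached-minimal   # a few MB, versus ~150MB for the full Ubuntu base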
GOOD: Many containers sharing a single base image.
BAD: Many containers running separate/unique base images.
No. Using multiple images will only use a tiny bit of extra RAM. (Obviously, multiple base images will take more disk space, but disk is cheap, so we'll ignore that.) So I'd argue that it's "OK" instead of "BAD".
Example: I start one Ubuntu container with Memcached and another CentOS container with Tomcat. If they were BOTH running Ubuntu, they could share the RAM for things like libc. But because they don't share the files, each base image must load its own copy of libc. But as we've seen, we're only talking about 150MB of files, and you're probably only using a few percent of that. So each image only wastes a few MB of RAM.
(Hint: look at your process in ps. That's how much RAM it's using, including any files mapped from its image.)
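If you want to verify this yourself, a couple of quick checks (my_container is a placeholder name):

docker stats --no-stream          # per-container memory/CPU as Docker accounts it
docker top my_container           # the container's processes as seen from the host
ps -o pid,rss,comm -p <pid>       # resident memory of one of those processes (<pid> from the line above)

Keep in mind that RSS counts shared library pages in every process that maps them, so summing RSS across containers overstates the real total.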

For the moment, Docker uses AUFS, which is a union filesystem with copy-on-write.
When you have multiple base images, those images take disk space, but when you run N containers from those images, no additional disk space is used. Because it is copy-on-write, only modified files take space on the host.
So really, whether you have 1 or N base images changes very little, no matter how many containers you have.
An image is nothing more than a filesystem you could chroot into; there is no relation between an image and the host other than that it must contain Linux binaries for the same architecture.
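On a reasonably current Docker install you can see the layering and the copy-on-write accounting directly (ubuntu here is just an example image):

docker pull ubuntu
docker history ubuntu      # the read-only layers that make up the image
docker system df           # disk used by images vs. by the containers' writable layers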

I think that multiple base images have only a minor impact on the memory used.
Explanation:
I think that your comparison to VMs is a bit misleading. Sure, with e.g. 3 base images running you will have higher memory requirements than with just 1 base image, but VMs will have even higher memory requirements:
Rough calculation - Docker with 1 base image, N containers:
1 x size of base image + N x container (filesystem changes + working memory)
Docker with M base images, N containers:
M x size of base image + N x container (filesystem changes + working memory)
Calculation - VMs, N VMs:
N x VM image = at least N x size of the base image for the specific VM + N x container (filesystem + working memory)
For Docker to gain an advantage you need M << N.
For small M and large N, the difference between Docker and multiple VMs is significant (a concrete illustration follows below).
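Purely illustrative, made-up numbers to see the shape of that inequality:

Docker, M = 3 base images of ~150MB each, N = 50 containers:
3 x 150MB (images, stored once and shared) + 50 x (small writable layer + working memory) ≈ 0.5GB + per-container overhead
VMs, N = 50:
50 x (full guest OS image + guest kernel + working memory) ≈ 50 x several GB

So with M = 3 and N = 50 (M << N), Docker's fixed image cost is negligible next to the VM case.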

Related

Increasing the storage space of a docker container on Windows to 2-3TB

I'm working on a Windows computer with 5TB of available space, building an application that processes large amounts of data and uses docker containers to create replicable environments. Most of the data processing is done in parallel using many smaller docker containers, but the final tool/container requires all the data to come together in one place. The output area is mounted to a volume, but most of the data is just copied into the container. This will be multiple TBs of storage space. RAM luckily isn't an issue in this case.
Willing to try any suggestions and make what changes I can.
Is this possible?
I've tried increasing the disk space available to Docker using .wslconfig, but this doesn't help.
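Not a fix for the .wslconfig limit itself, but a pattern that often sidesteps it (the paths and image name below are hypothetical): bind-mount the large input data from the host drive instead of copying it into the container, so the multi-TB data never has to live inside the container filesystem or the WSL virtual disk:

docker run --rm -v D:\data\input:/data/input:ro -v D:\data\output:/data/output my-processing-image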

Docker Container Library Duplication

New to docker...
Need some help to clarify basic container concept...
AFAIK, each container would include app code, libraries, runtime, config files, etc.
If I run N containers for N apps, and each of the apps happens to use the same set of libraries, does that mean my host system literally ends up with N-1 duplicate copies of those libraries?
While containers reduce the OS overhead of the VM approach to virtualization, I am just wondering if the container approach still has room to improve in terms of resource optimization.
Thanks
Mira
Containers are the runtime instance, defined by an image. Docker uses a unionfs to merge multiple layers together to create the root filesystem you see inside your container. Each step in the build of an image is a layer, and the container itself has a copy-on-write layer attached just to that container so that it sees its own changes. Because of this, Docker can point multiple instances of a running image back to the same image files for the unionfs layers; it never copies a layer when you spin up another container, they all point back to the same filesystem bytes.
In short, if you have a 1 GB image and spin up 100 containers all using that same image, on disk there will only be the 1 GB image plus any changes made in those 100 containers, not 100 GB.
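You can watch this happen on your own machine (nginx is just an example image):

docker run -d --name web1 nginx
docker run -d --name web2 nginx
docker image ls nginx      # the image is stored once
docker ps --size           # each container's SIZE is only its own writable layer;
                           # the "virtual" size includes the shared image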

One docker container per node or many containers per big node

We have a little farm of docker containers, spread over several Amazon instances.
Would it make sense to have fewer big host instances (in terms of RAM and size) hosting multiple smaller containers at once, or to have one host instance per container, sized according to that container's needs?
EDIT #1
The issue here is that we need to decide up-front. I understand that we can decide later using various monitoring stats, but we need to make some architecture and infrastructure decisions before it goes into use. Moreover, we do not have control over what content is going to be deployed.
You should read
An Updated Performance Comparison of Virtual Machines and Linux Containers
http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf
and
Resource management in Docker
https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
You need to check how much memory, CPU, I/O, etc. your containers consume, and then draw your conclusions.
At a minimum, you can easily check a few things with docker stats and docker top my_container.
The associated docs:
https://docs.docker.com/engine/reference/commandline/stats/
https://docs.docker.com/engine/reference/commandline/top/
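Once you know roughly what each container needs, you can also cap containers so that several of them share one big node predictably (my-app-image and the limit values below are made up, just to show the flags):

docker run -d --name app1 --memory=512m --cpus=1.0 my-app-image
docker stats --no-stream app1
docker top app1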

How important is a small Docker image when running?

There are a number of really tiny Linux Docker images that weigh in at around 4-5MB, and the "full" distros that start around 100MB and climb to twice that.
Setting aside storage space and download time from a repo, are there runtime considerations for small vs large images? For example, if I have a compiled Go program, one running on Busybox and the other on Ubuntu, and I run say 10 of them on a machine, in what ways (if any) does it matter that one image is tiny and the other pretty heavy? Does one consume more runtime resources than the other?
I never saw any real difference in resources consumed other than storage and RAM when the image is bigger; however, since Docker containers should be single-process, why carry the big overhead of unused clutter in your containers?
When trimming things down to small containers, there are some advantages you may consider:
Faster transfer when deploying (especially important if you want to do rolling upgrades)
Costs: most of the time when I used big containers, I ran into storage issues on small VMs
Distributed file systems: when using file storage like GlusterFS or other attached storage, big containers slow things down when booted and updated heavily
Massive overhead of data: if you have 500 MB of clutter, you'll have it on your dev machine, your CI/CD server, your registry AND every node of your production servers. This CAN matter depending on your use case.
I would say: if you just use a handful of containers internally, the size hardly matters; it matters much more when you run hundreds of containers in production. (A concrete size comparison is sketched below.)
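A sketch of the comparison the question describes, using a trivial Go program packaged once on BusyBox and once on Ubuntu (base image tags below are arbitrary examples); the running process is identical in both cases, only the image size differs:

cat > main.go <<'EOF'
package main

import "fmt"

func main() { fmt.Println("hello from a container") }
EOF

cat > Dockerfile <<'EOF'
FROM golang:1.22 AS build
WORKDIR /src
COPY main.go .
RUN CGO_ENABLED=0 go build -o /app main.go

FROM busybox AS small
COPY --from=build /app /app
ENTRYPOINT ["/app"]

FROM ubuntu:24.04 AS big
COPY --from=build /app /app
ENTRYPOINT ["/app"]
EOF

docker build --target small -t hello:small .
docker build --target big   -t hello:big .
docker image ls hello     # compare the on-disk sizes; both run the same single process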

Understanding docker from a layman point of view

I am just one day into docker, so it is still very new to me.
I read docker.io but could not find the answers to a few basic questions. Here they are:
1. Docker is basically a tool which allows you to make use of images and spin up your own customised images by installing software, so that you can then create VM-like environments from them.
Is this what docker is all about from a 10,000 ft bird's eye point of view?
2. What exactly is the meaning of a container? Is it a synonym for image?
3. I remember reading somewhere that it allows you to deploy applications. Is this correct? In other words, will it behave like IIS for deploying .NET applications?
Please answer my questions above so that I can understand it better and take it forward.
1) What is docker all about from a 10,000 ft bird's eye point of view?
From the website: Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.
Drill down a little bit more and a thorough explanation of the what/why docker addresses:
https://www.docker.io/the_whole_story/
https://www.docker.io/the_whole_story/#Why-Should-I-Care-(For-Developers)
https://www.docker.io/the_whole_story/#Why-Should-I-Care-(For-Devops)
Further depth can be found in the technology documentation:
http://docs.docker.io/introduction/technology/
2) What exactly is the meaning of a container? Is it a synonym for image?
An image is the set of layers that are built up and can be moved around. Images are read-only.
http://docs.docker.io/en/latest/terms/image/
http://docs.docker.io/en/latest/terms/layer/
A container is an active (or inactive if exited) stateful instantiation of an image.
http://docs.docker.io/en/latest/terms/container/
See also: In Docker, what's the difference between a container and an image?
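A quick way to see the distinction (alpine is just an example image):

docker pull alpine                        # an image: read-only layers on disk
docker run --name demo alpine echo hi     # a container: one runnable instance of that image
docker image ls alpine                    # lists images
docker ps -a                              # lists containers, including exited ones like "demo"
docker rm demo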
3) I remember reading somewhere that it allows you to deploy applications. Is this correct? In other words, will it behave like IIS for deploying .NET applications?
Yes, Docker can be used to deploy applications. You can deploy single components of the application stack or multiple components within a container. It depends on the use case. See the First steps with Docker page here: http://docs.docker.io/use/basics/
See also:
http://docs.docker.io/examples/nodejs_web_app/
http://docs.docker.io/examples/python_web_app/
http://docs.docker.io/examples/running_redis_service/
http://docs.docker.io/examples/using_supervisord/
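As a minimal illustration of what "deploying" means in Docker terms (nginx here is just a stand-in for your own application image):

docker run -d --name my-web-app -p 8080:80 nginx   # run an instance and publish its port on the host
curl http://localhost:8080/                        # the app is now being served, roughly the role IIS plays for .NET apps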
So.
It's about providing the separation of processes that you get with virtualisation without the overhead. Of course this doesn't come without a cost, the largest one in this case being that your dockerized containers will all be running under the same kernel.
A container is roughly a chroot (with better process encapsulation) plus some ethernet virtualisation. The image is the filesystem (plus a few bits) that is mounted to provide the root filesystem.^1
Deploy is just the term docker uses for spinning up a container instance.
Effectively, each running instance of a container thinks that it is the only thing^2 running on that machine (much like a cloud appliance is typically designed). It provides more separation of processes than running on the host OS would provide, and allows for easily spinning up multiple separate copies of the container as needed; while providing much, much lower overheads than using full virtualisation would need.
^1: Actually there may be several layers of file-system sandwiched together to form the root file system.
^2: Docker does support multiple processes running within a single instance, but that is generally considered to be somewhat advanced usage.
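You can see that separation directly (busybox is just a small example image):

docker run --rm busybox ps aux    # the container sees only its own process(es), not the host's
docker run --rm busybox ls /      # and only the image's filesystem, not the host's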
