How important is a small Docker image when running? - docker

There are a number of really tiny Linux Docker images that weigh in around 4-5M and the "full" distros that start around 100M and climb to twice that.
Setting aside storage space and download time from a repo, are there runtime considerations to small vs large images? For example, if I have a compiled Go program, one running on Busybox and the other on Ubuntu, and I run say 10 of them on a machine, in what ways (if any) does it matter that one image is tiny and the other pretty heavy? Does one consume more runtime resources than the other?

I have never seen a real difference in resource consumption, other than storage and RAM, when the image is bigger. However, since Docker containers should run a single process, why carry the overhead of unused clutter in your containers?
When trimming things down to small containers, there are some advantages you may consider:
Faster transfer when deploying (especially important if you want to do rolling upgrades)
Costs: most of the times I used big containers, I ran into storage issues on small VMs
Distributed file systems: when using file storage like GlusterFS or other attached storage, big containers slow things down when they are booted and updated heavily
Massive overhead of data: if you have 500 MB of clutter, you'll have it on your dev machine, your CI/CD server, your registry AND every node of your production servers. This CAN matter depending on your use case.
I would say: if you just use a handful of containers internally, then the size is less important, if it matters at all, than when using hundreds of containers in production.
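To put a rough number on the "overhead of data" point, here is a minimal sketch of the arithmetic. The machine counts and the 5 MB / 500 MB image sizes are made-up illustrative values, and the helper names are hypothetical. It also ignores that Docker shares identical layers between images on the same host, so treat the result as an upper bound rather than a measurement.

```python
# Rough estimate of how much disk one image occupies across a whole pipeline.
# All numbers used here are illustrative assumptions, not measurements.


def copies_in_pipeline(dev_machines: int, ci_servers: int,
                       registries: int, prod_nodes: int) -> int:
    """Count every place that ends up storing a full copy of the image."""
    return dev_machines + ci_servers + registries + prod_nodes


def total_footprint_mb(image_mb: float, copies: int) -> float:
    """Each machine that pulls the image stores one copy of it."""
    return image_mb * copies


if __name__ == "__main__":
    # Assumed setup: 1 dev machine, 1 CI server, 1 registry, 20 production nodes.
    copies = copies_in_pipeline(dev_machines=1, ci_servers=1,
                                registries=1, prod_nodes=20)
    for name, size_mb in [("small image", 5), ("large image", 500)]:
        total = total_footprint_mb(size_mb, copies)
        print(f"{name}: {total:.0f} MB across {copies} machines")
```

With a handful of machines the difference is negligible; it only starts to add up as the node count grows.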

Related

Increasing the storage space of a docker container on Windows to 2-3TB

I'm working on a Windows computer with 5 TB of available space, building an application to process large amounts of data that uses Docker containers to create replicable environments. Most of the data processing is done in parallel using many smaller Docker containers, but the final tool/container requires all the data to come together in one place. The output area is mounted to a volume, but most of the data is just copied into the container. This will be multiple TBs of storage space. RAM luckily isn't an issue in this case.
Willing to try any suggestions and make what changes I can.
Is this possible?
I've tried increasing disk space for docker using .wslconfig but this doesn't help.

How big can a GKE container image get before it's a problem?

This question is admittedly somewhat vague. If you have suggestions how to better word it, please by all means, give me feedback...
I want to understand how big a GKE container image can get before there may be problems, either serious or minor. For example, I've built a docker image (not deployed yet) that is 683 MB.
(As an aside, the reason it's so big is that I'm running a computer vision library licensed from a company with certain attributes: (1) uses native libraries that are not compatible with Alpine; (2) uses Java; (3) uses Node.js to run a required licensing daemon in same container; (4) has some very large machine learning model files.)
Although the service will have auto-scaling enabled, I expect the auto-scaling to be fairly light. It might add a new pod occasionally, but not major spikes up and down.
The size of the container will determine how many resources to assign it and thus how much CPU, memory and disk space your nodes must have. I have seen containers require over 2 GB of memory and still work fine within the cluster.
There probably is an upper limit, but the containers would have to be enormous. Your container size should not pose any issues aside from possibly slower container startup.
In practice, you're going to have issues pushing an image to GCR before you have issues running it on GKE, but there isn't a hard limit outside the storage capabilities of your nodes. You can get away with O(GB) pretty easily.
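For a feel of what "possibly container startup" could mean, a back-of-the-envelope sketch: the first start of a pod on a node that has not cached the image has to wait for the pull. The 683 MB figure is from the question; the 50 MB/s bandwidth is purely an assumed value, and the estimate ignores layer decompression and extraction, so real pulls will take longer.

```python
# Back-of-the-envelope pull time for an image on a node with a cold cache.
# The bandwidth value is an assumption; measure your own registry throughput.


def pull_seconds(image_mb: float, bandwidth_mb_per_s: float) -> float:
    """Time to transfer the image, ignoring decompression and extraction."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_mb / bandwidth_mb_per_s


if __name__ == "__main__":
    image_mb = 683            # image size mentioned in the question
    assumed_bandwidth = 50.0  # MB/s, hypothetical node-to-registry throughput
    print(f"~{pull_seconds(image_mb, assumed_bandwidth):.1f} s to pull")
```

With light auto-scaling, as described in the question, this cost is only paid when a pod lands on a node that does not have the image yet.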

Why might an image run differently in Kubernetes than in Docker?

I'm experiencing an issue where an image I'm running as part of a Kubernetes deployment is behaving differently from the expected and consistent behavior of the same image run with docker run <...>. My understanding of the main purpose of containerizing a project is that it will always run the same way, regardless of the host environment (ignoring the influence of the user and of outside data). Is this wrong?
Without going into too much detail about my specific problem (since I feel the solution may likely be far too specific to be of help to anyone else on SO, and because I've already detailed it here), I'm curious if someone can detail possible reasons to look into as to why an image might run differently in a Kubernetes environment than locally through Docker.
The general answer of why they're different is resources, but the real answer is that they should both be identical given identical resources.
Kubernetes uses docker for its container runtime, at least in most cases I've seen. There are some other runtimes (cri-o and rkt) that are less widely adopted, so using those may also contribute to variance in how things work.
On your local docker it's pretty easy to mount things like directories (volumes) into the image, and you can populate the directory with some content. Doing the same thing on k8s is more difficult, and probably involves more complicated mappings, persistent volumes or an init container.
Running docker on your laptop and k8s on a server somewhere may give you different hardware resources:
different amounts of RAM
different size of hard disk
different processor features
different core counts
The last one is most likely what you're seeing: Flask is probably looking up the core count on both systems and seeing two different values, so it runs with two different thread / worker counts.
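A quick way to check the core-count theory is to print what the process sees in both environments. The sketch below is only an illustration: the (2 x cores) + 1 worker formula is the rule of thumb suggested in the Gunicorn docs and is an assumption about how your server sizes itself, and the function names are hypothetical. Also note that, to my knowledge, neither call below reflects a CPU quota set through cgroups (such as a Kubernetes CPU limit), so the detected count can be higher than what the container may actually use.

```python
import os


def usable_cpus() -> int:
    """CPUs this process may run on, falling back to the host CPU count."""
    try:
        # Linux only: honours CPU affinity / cpusets, but not CPU quotas.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1


def default_workers(cpus: int) -> int:
    """Assumed sizing rule: (2 x cores) + 1, as suggested in the Gunicorn docs."""
    return 2 * cpus + 1


if __name__ == "__main__":
    cpus = usable_cpus()
    print(f"detected CPUs: {cpus}, derived workers: {default_workers(cpus)}")
```

Run it once with docker run on your laptop and once in the pod. If the numbers differ, pinning the worker count explicitly should make both environments behave the same.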

How much resources to allocate to docker

I have been playing around with docker for a few months now and we are now ready to run a few production containers, and it got me into researching the infrastructure.
It led me to the question of how many resources I need to allocate to Docker and how much should be left for the OS.
E.g. my server has 8 cores and 16 GB of RAM. How much of that should I allocate to Docker? Obviously I want to allocate the maximum possible, but at what point would the performance of the server itself degrade?
Your question is hard to answer, and here's why: "docker" itself doesn't really require much in the way of resources. On the other hand, the applications that you run using docker will have their own requirements.
For example, if you're hosting a multi-terabyte database in a docker container, you're going to require more memory (and probably a lot more storage) than you would for, say, a single wordpress site.
If you're hosting some sort of video transcoding pipeline in Docker, you might end up consuming a lot more of your available CPU.
The only resource that Docker really consumes on its own is the storage space for images and volumes...and again, how much space you need is entirely dependent on how you're using Docker.
It all depends on exactly what you plan on doing with your system.

One docker container per node or many containers per big node

We have a little farm of docker containers, spread over several Amazon instances.
Would it make sense to have fewer big host images (in terms of ram and size) to host multiple smaller containers at once, or to have one host instance per container, sized according to container needs?
EDIT #1
The issue here is that we need to decide up-front. I understand that we can decide later using various monitoring stats, but we need to make some architecture and infrastructure decisions before it is going to be used. Moreover, we do not have control over what content is going to be deployed.
You should read
An Updated Performance Comparison of Virtual Machines and Linux Containers
http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf
and
Resource management in Docker
https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
You need to check how much memory, CPU, I/O, etc. your containers consume, and then draw your conclusions.
At the very least, you can easily check a few things with docker stats and docker top my_container.
The associated docs:
https://docs.docker.com/engine/reference/commandline/stats/
https://docs.docker.com/engine/reference/commandline/top/
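If you want to collect those numbers over time rather than eyeball them, here is a minimal sketch that takes one snapshot from docker stats and parses it. It assumes the docker CLI is on the PATH and relies on the --no-stream and --format '{{json .}}' options, which print one JSON object per container; the function names are my own and the field names used (Name, CPUPerc, MemUsage) are the ones that JSON output uses as far as I know, so check them against your Docker version.

```python
import json
import subprocess


def parse_stats(output: str) -> list[dict]:
    """Parse `docker stats --format '{{json .}}'` output: one JSON object per line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def percent(value: str) -> float:
    """Turn a string such as '12.50%' into 12.5."""
    return float(value.strip().rstrip("%"))


def collect_stats() -> list[dict]:
    """Take a single snapshot of all running containers (requires the docker CLI)."""
    result = subprocess.run(
        # --no-stream prints one snapshot instead of refreshing continuously.
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout)


if __name__ == "__main__":
    for row in collect_stats():
        print(row["Name"], percent(row["CPUPerc"]), row["MemUsage"])
```

Sampling this for a while under realistic load gives you per-container numbers to size hosts against, which is about the best you can do when the decision has to be made up-front.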
