Docker taking up a lot of disk space

I am using Docker Desktop for Windows on Windows 10.
I was experiencing issues with my system SSD always being full, so I moved the 'docker-desktop-data' distro (which is used to store docker images and other stuff) off the system drive to drive D:, which is an HDD, using this guide.
Finally, I was happy to have a lot of space on my SSD... but docker containers started to work more slowly. I guess this happens because HDD read/write operations are slower than on an SSD.
Is there a better way to solve the problem of the continuously growing size of the Docker distros without impacting how fast containers actually work and how fast images are built?

Actually, only by design. As you know, a docker container is layered, so it might be feasible to check whether you can create something like a "base container" from which your actual image is derived.
It might also be sensible to check whether your base distro is small enough. I have often seen containers created from full-blown Debian or Ubuntu distros. That's not the best idea. Try to derive from an Alpine version, or check for even smaller approaches.
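As a rough sketch of that idea (the Python app and file names below are placeholders, not anything from the question), a Dockerfile that derives from Alpine instead of a full distro could look like this:
# Hypothetical example: build on a small Alpine base instead of full Debian/Ubuntu
FROM alpine:3.19
RUN apk add --no-cache python3
COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]
The resulting image is often considerably smaller than the same app on a full Debian or Ubuntu base, which also slows down the growth of the image store.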

Related

Kubernetes/Docker uses too much disk space

I have a Kubernetes cluster with 1 master node and 3 worker nodes. All nodes are running CentOS 7 with Docker 19.06. I'm also running Longhorn for dynamic provisioning of volumes (if that's important).
My problem is that every few days one of the worker nodes grows its HDD usage to 85% (43 GB). This is not a linear increase, but happens over a few hours, sometimes quite rapidly. I can "solve" this problem for a few days by first restarting the docker service and then running docker system prune -a. If I don't restart the service first, the prune removes next to nothing (only a few MB).
I also tried to find out which container is taking up all that space, but docker system df says the space is not being used. I used df and du to crawl along the /var/lib/docker subdirectories too, and it seems none of the folders (alone or all together) takes up much space either. Continuing this over the whole system, I can't find any other big directories either. There are 24 GB that I just can't account for. What makes me think this is a docker problem nonetheless is that a restart and prune solves it every time.
Googling around, I found a lot of similar issues where most people just decided to increase disk space. I'm not keen on accepting this as the preferred solution, as it feels like kicking the can down the road.
Would you have any smart ideas on what to do instead of increasing disk space?
It seems like this is expected behavior. From the Docker documentation:
Docker takes a conservative approach to cleaning up unused objects (often referred to as “garbage collection”), such as images, containers, volumes, and networks: these objects are generally not removed unless you explicitly ask Docker to do so. This can cause Docker to use extra disk space. For each type of object, Docker provides a prune command. In addition, you can use docker system prune to clean up multiple types of objects at once. This topic shows how to use these prune commands.
So it seems like you have to clean it up manually using docker system/image/container prune. Another possible issue is that those containers produce too many logs, and you may need to clean those up as well.
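For reference, a sketch of the commands usually involved in that kind of cleanup (exact flags depend on your Docker version, so double-check against docker --help):
# Show what is using space, per object type
docker system df
# Remove stopped containers, unused networks, dangling images and build cache
docker system prune
# Also remove all unused images and (optionally) unused volumes
docker system prune -a --volumes
# To keep container logs from growing without bound, the json-file logging
# driver can be capped in /etc/docker/daemon.json, for example:
#   { "log-driver": "json-file", "log-opts": { "max-size": "10m", "max-file": "3" } }
# (restart the docker daemon after changing it)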

Docker for Mac - sync issue

So I have noticed that on Mac there is a huge problem with file sync while developing a PHP app. It can take up to 60 seconds before a page loads.
Since Docker on Mac uses an additional virtual machine, I have used http://docker-sync.io to fix it. But I wonder, are you having similar issues? Yesterday I noticed that there is something called File Sharing in the Docker settings (screenshot omitted). As I've put my code at /Volumes/Documents/wwwdata, do I have to add that path there as well?
As the author of docker-sync, I might be able to give you a comprehensive answer.
Right now, under macOS, there is no solution using the native Docker for Mac tools that gives you a somewhat acceptable development environment, which means sharing source code into the container during its lifetime.
The main reason is that read and write speed on mounted volumes in Docker for Mac is extremely slow; see the performance comparison. That said, you could mount a volume using -v or volumes into a normal container, but this will be extremely slow. VirtualBox or Fusion shares are slow for the same reasons; OSXFS currently performs better than those, but is still horribly slow.
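For example, a plain bind mount like the following is exactly that slow OSXFS path (the paths and the php:apache image tag are only illustrative):
# Bind-mount the source code into a PHP container through OSXFS
docker run -d -p 8080:80 -v /Volumes/Documents/wwwdata:/var/www/html php:apache
Every file read the PHP app performs then crosses the OSXFS boundary, which is where the multi-second page loads come from.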
Docker-sync tries to take the slow OSXFS read/write speed out of the picture by using unison for synchronization rather than a direct mount.
Long story short:
Docker for Mac is still (very) slow; this holds even for High Sierra with APFS. It is unusable for development purposes.
The "folders" you are looking at, named "images", are nothing more than OSXFS-based mounts into the HyperKit VM, so just what has been used in the past; you can now simply configure folders other than the default ones to be OSXFS-synced and available for mounting. So this will not help you at all either.
To make this answer more balanced towards the general case: you can find alternatives to docker-sync here. The sheer number of alternatives also tells you that there is (still) a huge issue in Docker for Mac; it is not something docker-sync made up.

docker container does not need an OS, but each container has one. Why?

"docker" is a buzzword these days and I'm trying to figure out what it is and how it works. More specifically, how is it different from a normal VM (e.g. VirtualBox, Hyper-V or VMware solutions)?
The introduction section of the documentation (https://docs.docker.com/get-started/#a-brief-explanation-of-containers) reads:
Containers run apps natively on the host machine’s kernel. They have better performance characteristics than virtual machines that only get virtual access to host resources through a hypervisor. Containers can get native access, each one running in a discrete process, taking no more memory than any other executable.
Bingo! Here is the difference. Containers run directly on the kernel of the host OS; this is why they are so lightweight and fast (plus they provide isolation of processes and a nice distribution mechanism in the shape of Docker Hub, which plays well with the ability to connect containers with each other).
But wait a second. I can run Linux applications on Windows using docker. How can that be? Surely there is some VM involved; otherwise we would just not get the job done...
OK, but what does it look like when we work on a Linux host? And here comes the real confusion... there one still defines an OS as the base image for every image we want to create. Even if we say FROM scratch, scratch is still some minimalistic kernel... So here comes
QUESTION 1: If I run e.g. a CentOS host, can I create a container which would directly use the kernel of this host operating system (and not a VM, which includes its own OS)? If yes, how can I do it? If no, why does the documentation of docker lie to us (since then docker images always run within some VM and it is not too different from other VMs, or is it)?
After some thinking about it and looking around, I was wondering whether some optimization is done for running the images. Here comes
QUESTION 2: If I run two containers whose images are based on the same parent image, will this parent image be loaded into memory only once? Will there be one VM for each container, or just one which runs both containers? And what if we use different OSs?
The third question has been asked many times before:
QUESTION 3: Are there some resources somewhere which describe this kind of thing? Most of the articles which discuss docker just say "it is so cool, you must definitely use it. Just run one command and be happy", which does not explain too much.
Thanks.
Docker "containers" are not virtual machines; they are just regular processes running on the host system (and thus always on the host's Linux kernel) with some special configuration to partition them off from the rest of the system.
You can see this for yourself by starting a process in a container and doing a ps outside the container; you'll see that process in the host's list of all processes. Running ps in the containerized process, however, will show only processes in that container; limiting the view of processes on the system is one of the facilities that containerization provides.
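A quick way to do exactly that (the alpine image is just a convenient example):
# Start a long-running process inside a container
docker run -d --name demo alpine sleep 300
# On the host, that same process shows up in the normal process list
ps aux | grep 'sleep 300'
# Inside the container, ps only sees the container's own processes
docker exec demo ps
# Clean up
docker rm -f demo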
The container is also usually given a limited or separate view of many other system resources, such as files, network interfaces and users. In particular, containerized processes are often given a completely different root filesystem and set of users, making it look almost as if it's running on a separate machine. (But it's not; it still shares the host's CPU, memory, I/O bandwidth and, most importantly, the host's Linux kernel.)
To answer your specific questions:
On CentOS (or any other system), all containers you create are using the host's kernel. There is no way to create a container that uses a different kernel; you need to start a virtual machine for that.
The image is just files on disk; these files are "loaded into memory" in the same way any files are. So no, for any particular disk block of a file in a shared parent image there will never be more than one copy of that disk block in memory at once. However, each container has its own private "transparent" filesystem layer above the base image layer that is used to handle writes, so if you change a file the changed blocks will be stored there, and will now be separate from the underlying image that other processes (which have not changed any blocks in that file) see.
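As a small illustration (nginx is just an example image), two containers from the same image share its read-only layers and each only adds a thin writable layer on top:
docker run -d --name c1 nginx
docker run -d --name c2 nginx
# --size shows each container's own writable layer next to the shared (virtual) image size
docker ps --size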
In Linux you can try man cgroups and man cgroup_namespaces to get some fairly technical details about the cgroup mechanism, which is what Docker (and any other containerization scheme on Linux) uses to limit and change what a containerized process sees. I don't have any other particular suggestions on readings directly related to this, but I think it might help to learn the technical details of how processes and various other systems work on Unix and POSIX systems in general, because understanding that gives you the background to understand what kinds of things containerization does. Perhaps start with learning about the chroot(2) system call and programming with it a bit (or even playing around with the chroot(8) program); that would give you a practical, hands-on example of how one particular area of containerization works.
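A minimal chroot(8) experiment might look like this (run as root; it assumes a statically linked busybox binary is available on the host, which is not part of the answer above):
mkdir -p /tmp/jail/bin
cp "$(command -v busybox)" /tmp/jail/bin/
# The shell started below sees /tmp/jail as its root filesystem
chroot /tmp/jail /bin/busybox sh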
Follow-up questions:
There is no kernel version matching; only the one host kernel is ever used. If the program in the container doesn't work on that version of that kernel, you're simply out of luck. For example, try running the official Docker centos:6 or centos:5 container on a Linux system with a 4.19 or later kernel, and you'll see that /bin/bash segfaults when you try to start it. The kernel and userland program are not compatible. If the program tries to use newer facilities that are not in the kernel, it will similarly fail. This is no different from running the same binaries (program and shared libraries!) outside of a container.
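If you want to reproduce that mismatch yourself (assuming the centos:6 image is still pullable from your registry), something like this is enough:
# On a host with a recent kernel, the old centos:6 userland typically fails
docker run --rm centos:6 /bin/bash -c 'echo hello'
# bash can segfault here because the old glibc expects legacy kernel
# facilities (such as vsyscall emulation) that newer kernels disable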
Windows and Macintosh systems can't run Linux containers directly, since they're not Linux kernels with the appropriate facilities to run even Linux programs, much less supporting the same extra cgroup facilities. So when you install Docker on these, generally it installs a Linux VM on which to run the containers. Almost invariably it will install only a single VM and run all containers in that one VM; to do otherwise would be a waste of resources for no benefit. (Actually, there could be benefit in being able to have several different kernel versions, as mentioned above.)
Docker does not have an OS in its containers. In simple terms, a docker container image is just a kind of filesystem snapshot of the Linux image that the container image depends on.
The container image includes some basic programs like a bash shell and the vim editor to make it easy for developers to work with the docker image. Docker images can also include pre-installed dependencies like Node.js or redis-server, as we can find on Docker Hub.
Behind the scenes, docker uses the host OS, which is Linux itself, to run its containers. The programs included in the Linux-like filesystem snapshot that we see in the form of docker containers actually run on the host OS, in isolation.
The container images may sound like different Linux distros, but they are just filesystem snapshots of those distros. All Linux distributions are based on the same kernel; they differ in the programs, tools and dependencies that they ship with.
Also take note of this comment [click]. It is very much relevant to this question.
Hope this helps.
It's now a long time since I posted this question, but it seems like it still gets hits... So I decided to answer it; in fact, mainly the question in the title (the questions in the text are carefully answered by Curt J. Sampson).
So, the discussion of the "main" question: if containers are not VMs, then why do we need VMs for them?
As you may guess, I am working on Windows (on Linux this question would not come up, because on Linux one does not need VMs for docker).
The reason why we need a VM for containers on Windows is pretty obvious (probably this is the reason why nobody mentions it explicitly). As was already mentioned here and in many other FAQs, containers reuse the kernel and some other resources of the hosting OS. Taking into account that most of the containers available out there are based on Linux, one may conclude that those containers need the host OS to provide a Linux kernel for them to run. This is not natively easy on Windows (I am not sure, maybe it is now possible with the Windows Subsystem for Linux). This is why on Windows we need one VM which runs Linux, with the docker service inside this VM.
Then, when we start containers, they are also started inside this VM (and reuse the resources of its Linux OS). All the containers run inside the same VM. Getting a bit more technical: by default docker uses Hyper-V to run this Linux VM, but one can also use Docker Toolbox, which uses Oracle VirtualBox; in that case the VM can be freely seen in the VirtualBox interface. The nice part is that Docker (or Docker Toolbox) takes care of managing this VM, so we don't need to worry about it.
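You can actually see this split on Docker Desktop for Windows in Linux-container mode: the client runs on Windows, while the daemon reports Linux because it lives inside that VM. A quick check (the --format fields may vary slightly between Docker versions):
docker version --format 'client: {{.Client.Os}}  server: {{.Server.Os}}'
# typically prints something like:  client: windows  server: linux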
Now a bonus question, which at that time confused me even more. One may think: "OK, it is clear now. If we run a Linux container on a Windows OS, then we need a Linux kernel and thus a VM with Linux. But if we run a Windows container on Windows (by the way, those exist), then a VM should not be needed, right?..." Answer: "wrong" (or almost wrong). :) The problem is that the Windows-based containers (at least those which I saw) use the Windows Server kernel, which is not available e.g. in Windows 10. Thus one still needs a VM with a special version of Windows Server running on it. In fact, MS even created a special version of Windows Server which can be run on a VM for development purposes free of charge, specifically to enable development of Windows-Server-based containers. If my understanding is correct, it should be possible to run those containers without a VM on Windows Server. I should admit that I never checked it, though.
I hope that this messy explanation may help someone to better understand the topic.
We need a VM to run docker on the host machine if it is Windows (this is achieved through Docker Toolbox); on Linux we don't even need this. Once we have Docker Toolbox, a container in itself doesn't need a VM: each container has a baseline image which is very minimal and reuses a lot from the host kernel, which makes it lightweight compared to a VM. You can run many such containers on a single host kernel.

Can I share docker images between windows and linux?

This might seem a stupid question, but here I am:
I'm running Ubuntu 16.04 and managed to install windows 10 in dual boot.
Running docker exclusively in linux so far, I decided to give it a try on Windows 10.
As I have already downloaded several docker images on my Linux system, I would like to have a sort of shared development environment. I must admit it would be a waste of time and disk space to re-download, on my fresh Windows install, Docker images I have already downloaded before (on Linux).
So my question is simple: can I use my Linux images / containers on Windows? I'm thinking of something like a global path variable, configured in docker on Windows, pointing to my Linux images.
Any idea whether this is possible, and if yes, what the pros, cons and caveats are?
Thanks for helping me on this one.
Well, I would suggest creating your own local registry, pushing these images there, and then pulling them in your Windows docker.
Sonatype Nexus (an artifact storage repository) can also be used to store your docker images. Check if this helps.
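A minimal sketch of that workflow using the open-source registry image (the image name, host name and port are placeholders):
# On the Linux side: run a local registry, then tag and push an existing image
docker run -d -p 5000:5000 --name registry registry:2
docker tag my-image localhost:5000/my-image
docker push localhost:5000/my-image
# On the Windows side (with network access to that host): pull it back
docker pull <linux-host>:5000/my-image
# Note: a registry that is not behind TLS must be listed under
# "insecure-registries" in the daemon configuration on the pulling side.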
I guess it's not possible to share the same folder (to reduce disk usage) since the stored files are totally different:
Under Windows the file is:
C:\Users\Public\Documents\Hyper-V\Virtual hard disks\MobyLinuxVM.vhdx
the vhdx extension is specific to MS systems.
and under Linux it consists of 2 files:
/var/lib/docker/devicemapper/devicemapper/data
/var/lib/docker/devicemapper/devicemapper/metadata
see here for details
Where are Docker images stored on the host machine?
The technology underneath this is a filesystem layout that is specific to and optimized for docker on each platform. Even if they used the same filesystem storage, sharing it wouldn't be a good idea, IMHO.
If the purpose is only to save time when reinstalling, just dump the list of images from one system and re-pull them on the other one:
docker images --format "{{.Repository}}:{{.Tag}}" > image-list.txt
then loop over the list on the other OS:
while read -r p; do
  docker pull "$p"
done < image-list.txt

Docker container memory usage

I have a LAMP Docker image.
I want to start 500 containers of this image; how much RAM do I need?
I have tracked the memory usage of each new container, and it is nearly the same as for any other container of its image.
So, if a single container is using 200 MB, I can start 5 containers on a Linux machine with 1 GB of RAM.
My question is:
Is a docker container using the same amount of memory as, for example, the same Virtual Machine image would?
Maybe I am doing something wrong in the docker configuration or the docker files?
docker stats might give you the feedback you need. https://docs.docker.com/engine/reference/commandline/stats/
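For example, a one-shot snapshot of memory and CPU usage per container (the container names are placeholders):
# All running containers
docker stats --no-stream
# Only selected containers
docker stats --no-stream web db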
I don't know the exact details of the docker internals, but the general idea is that Docker tries to reuse as much as it can. So if you start five identical containers, it should be much lighter than five virtual machines, because docker only needs one instance of the base image and filesystem, which all containers refer to.
Any changes to the filesystem of one container will be added as a layer on top, only recording the change. The underlying image will not be changed, so the five containers can still refer to the same single base image.
The virtual machine, however (I believe), will have a complete copy of the filesystem for each of the five instances, because it doesn't use a layered filesystem.
So I'm not sure how you can determine exactly how much memory you need, but this should make the concept clearer. You could start one container to see the 'base memory' needed for one; each new container should then only add a smaller, constant amount of memory on top of that, which should give you a broad idea of how much you need.
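A practical way to check this (my-lamp-image is a placeholder for your image) is to measure one container and then cap the rest, so that 500 of them cannot exhaust the host:
# Start one container and look at its actual memory footprint
docker run -d --name lamp1 my-lamp-image
docker stats --no-stream lamp1
# Optionally give each container a hard memory limit
docker run -d --name lamp2 --memory=200m my-lamp-image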
