Purpose of Writable but Stateless Partitions under Container OS - docker

Recently I was running a container under Compute Engine's container OS, and my data (my TLS certificate specifically) wasn't getting persisted outside of the container across reboots because I was writing to /etc. After a bit of time, I stumbled upon Disks and file system overview - File system, which explains that there are two types of writable partitions: stateful and stateless. /etc is stateless, and I needed to move my persisted files to /var for stateful storage.
But I'm left wondering about the purpose of writable, stateless partitions. Deploying Containers - Limitations explains that a container OS (on a VM instance) can only run one container. What does a writable but stateless partition enable compared to just writing data inside the Docker container, since both of those writable locations would be lost on host OS reboot anyway? The only benefit I could see would be sharing data across containers on the same host OS, but the limitation above rules that out.

The main purpose of COS images is security: a minimal OS, without unnecessary system libraries and binaries, that is able to run containers.
That's why /etc is stateless: changes and updates (potential backdoors) are not persisted in the most critical parts of the COS.
On the container side, writes live in memory. You can write whatever you want, and it's written in memory (unless you have a volume mounted in your container, but that's not the point here). You are limited by the amount of memory available to the container, and when you stop the container, it is unloaded from memory and, of course, you lose all the data written inside it.
So keep in mind that the /etc of your container isn't the same as the /etc of your VM. Same for /var: the /var of your container is always stateless (if not mounted from a VM volume), while the /var of your VM is stateful.
In addition, the lifecycles aren't the same: you can start and stop several containers on your COS VM without stopping and restarting the VM. So the VM's /etc lives for the whole life of the VM and may "see" the lives of several containers.
Finally, the COS image is used on Compute Engine to run a container, and only one at a time. However, this COS image is also used for Kubernetes node pools (GKE on GCP), and with Kubernetes you can typically run several Pods (one or more containers each) on the same node (Compute Engine instance).
All these use cases should show you the meaning and the usefulness (or not) of these restrictions and features (and I hope I was clear in my explanations!)
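For example, a minimal sketch of how this could look on a COS VM (the paths and the image name my-app-image are hypothetical): keep the certificate under the stateful /var partition on the host and bind-mount it read-only into the container where the application expects it.

# On the COS host: store the certificate under the stateful /var partition
sudo mkdir -p /var/lib/my-app/certs
sudo cp fullchain.pem privkey.pem /var/lib/my-app/certs/
# Bind-mount it into the container at the path the application reads from
docker run -d \
  -v /var/lib/my-app/certs:/etc/my-app/certs:ro \
  my-app-image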

Related

Best way for a Docker container to work with huge shared drive

I am building an application in a docker container, that in the end will have to read from a filesystem that is quite large (terabytes) in size.
The application itself will be running on another device.
I am now wondering which is better to use for connecting the container to this filesystem, a volume or a bind mount?
Only read below if you want to hear more detailed reasoning from me
The documentation for volumes states, if I read it correctly, that the content of the volume will be in a place on the host system which Docker has access to. This makes me think that when I use a volume, Docker will try to place a copy of the really large filesystem from the shared drive somewhere on the device that the application will be running on.
The bind mount documentation says that the information can be stored anywhere on the host system. This seems to indicate to me that the original information will remain on the shared drive, without creating any copies. But several other questions on this site have stated that the performance of a bind mount is a lot worse than that of a volume.
Since you already know the location on the host system you want to use, you should use a bind mount.
docker run -v /mnt/very-large-device:/data ...
With a named volume, the storage is in space Docker controls, usually inside /var/lib/docker/volumes on a native-Linux host or inside the hidden VM on other systems. You can't (easily) control where the underlying storage actually is. (You could configure the system to have the entire Docker installation on the large disk, or use extended volume options to create a named volume that's actually a bind mount; both are more complicated than using Docker's native bind-mount option.)
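For reference, a sketch of the "named volume that's actually a bind mount" variant mentioned above, using the local driver's mount options (the volume name bigdata is made up):

docker volume create --driver local \
  --opt type=none --opt o=bind \
  --opt device=/mnt/very-large-device \
  bigdata
docker run -v bigdata:/data ...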
... the performance of the bind mount is a lot worse than the volume.
This is principally true on non-Linux systems. On native-Linux systems, bind mounts and named volumes should have identical performance, and that should be approximately the same as the container filesystem. On non-Linux systems, though, Docker needs to bridge between the Linux system inside the hidden VM and the host's filesystem, and this can be slow depending on access patterns.
As with all performance questions, the best way to determine which thing will be fastest is to actually set up an experiment on your intended system and measure it.
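One rough way to run such an experiment (illustrative only; busybox and the sizes are arbitrary choices) is to time the same write through the container filesystem and through the bind mount:

# Write through the container's own (union) filesystem
docker run --rm busybox sh -c 'time dd if=/dev/zero of=/tmp/test bs=1M count=512'
# Write through the bind mount onto the large drive
docker run --rm -v /mnt/very-large-device:/data busybox \
  sh -c 'time dd if=/dev/zero of=/data/test bs=1M count=512'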

container has its own disk but shared memory?

I am new to Docker, just a basic question on containers; below is a picture from a book:
It says that containers share the host computer's CPU, OS and memory, but each container has its own computer name, IP address and disk.
I am a little bit confused about the disk: isn't disk just a resource like memory? If a container has 1 GB of data inside, it must be allocated 1 GB of disk space by the host computer from its own disk, just like memory, so isn't the container's disk also shared?
You can make that diagram more precise by saying that each container has its own filesystem. /usr in a container is separate from /usr on other containers or on the host, even if they share the same underlying storage.
By way of analogy to ordinary processes, each process has its own address space and processes can't write to each other's memory, even though they share the same underlying memory hardware. The kernel assigns specific blocks (pages) of physical memory to specific process address spaces. If you go out of your way, there are actually a couple of ways to cause blocks of memory to be shared between processes. The same basic properties apply to container filesystems.
On older Docker installations (docker info will say devicemapper) Docker uses a reserved fixed-size disk area. On newer Docker installations (docker info will say overlay2) Docker can use the entire host disk. The Linux kernel is heavily involved in mapping parts of the host disk (or possibly host filesystem) into the per-container filesystem spaces.
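If you're not sure which case applies to your installation, you can check the storage driver directly (output wording may vary slightly between Docker versions):

docker info | grep -i 'storage driver'
# Storage Driver: overlay2      <- newer installations
# Storage Driver: devicemapper  <- older installations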

Jenkins in a container is much slower than on the server itself

We recently had our Jenkins redone. We decided to run the new version in a Docker container on the server.
While migrating, I noticed that Jenkins is MUCH slower when it's in a container than when it ran on the server itself.
This is a major issue and could mess up our migration.
I tried looking for ways to give more resources to the container, without much help.
How can I speed up the Jenkins container / give it all the resources it needs on the server (the server is dedicated only to Jenkins)?
Also, how do I divide these resources when I want to start up slave containers as well?
Disk operations
One thing that can go slow with Docker is when the process running in a container is making a lot of I/O calls to the container file system. The container file system is a union file system which is not optimized for speed.
This is where Docker volumes are useful. In addition to providing a location on the file system which survives container deletion, disk performance on a Docker volume is good.
The Jenkins Docker image defines the JENKINS_HOME location as a docker volume, so as long as your Jenkins jobs are making their disk operations within that location you should be fine.
If you determine that disk access on that volume is still too slow, you could customize the mount location of that volume on your Docker host so that it ends up on a fast drive such as an SSD.
Another trick is to make a docker volume mounted to RAM with tmpfs. Note that such a volume does not offer persistence and that data at that location will be lost when the container is stopped or deleted.
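As a sketch (the /ssd/jenkins_home path is illustrative), putting JENKINS_HOME on a fast drive and adding a RAM-backed scratch directory could look like this; make sure the host directory is writable by the container's jenkins user (uid 1000 in the official image):

docker run -d -p 8080:8080 -p 50000:50000 \
  -v /ssd/jenkins_home:/var/jenkins_home \
  --tmpfs /tmp:rw,size=512m \
  jenkins/jenkins:lts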
JVM memory exhaustion / Garbage collector
As Jenkins is a Java application, another potential issue comes to mind: memory exhaustion. If the JVM that the Jenkins process runs on has too little memory, the Java garbage collector will run too frequently. You can spot this when you notice your Java app using too much CPU (the garbage collector uses CPU). If that is the case, give more memory to the JVM:
docker run -p 8080:8080 -p 50000:50000 --env JAVA_OPTS="-Xmx2048m -Djava.awt.headless=true" jenkins/jenkins:lts
Network
Docker containers have a virtual network stack and custom network settings. You also want to make sure that all network-related operations are fast.
The DNS server might be an issue, check it by executing ping <some domain name> from the Jenkins container.
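For example (assuming your container is named jenkins and the image has ping available; otherwise use whatever name-resolution tool is present in the image):

docker exec -it jenkins ping -c 3 your.git-server.example.com
# If resolution is slow, point the container at a faster DNS server:
docker run --dns 8.8.8.8 ... jenkins/jenkins:lts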

Bandwidth and Disk space for Docker container

Does a Docker container get the same bandwidth as the host container? Or do we need to configure min and/or max? I've noticed that we need to override the default RAM (which is 2 GB) and swap space configuration if we need to run CPU-intensive jobs.
Also, do we need to configure the disk space? Or does it by default get as much space as the actual hard disk?
Memory and CPU are controlled using cgroups by docker. If you do not configure these, they are unrestricted and can use all of the memory and CPU on the docker host. If you run in a VM, which includes all Docker for Desktop installs, then you will be limited to that VM's resources.
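A minimal example of setting those cgroup limits on a single container (the values are arbitrary):

docker run --memory=4g --memory-swap=4g --cpus=2 my-image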
Disk space is usually limited to the disk space available in /var/lib/docker. For that reason, many people make this a separate mount. If you use devicemapper for docker's graph driver (this has been largely deprecated), it creates preallocated blocks of disk space, and you can control that block size. You can restrict containers by running them with read-only root filesystems and mounting volumes into the container that have limited disk space. I've seen this done with loopback device mounts, but it requires some configuration outside of docker to set up the loopback device. With a VM, you will again be limited by the disk space allocated to that VM.
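A sketch of the read-only-root approach mentioned above (paths, sizes and the volume name are illustrative; the volume itself only has a hard size limit if you back it with a size-limited device outside Docker):

docker run --read-only \
  --tmpfs /scratch:rw,size=1g \
  -v limited-volume:/data \
  my-image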
Network bandwidth is by default unlimited. I have seen an interesting project called docker-tc which monitors containers for their labels and updates bandwidth settings for a container using tc (traffic control).
Does a Docker container get the same bandwidth as the host container?
Yes. There is no limit imposed on network utilization. You could maybe impose limits using a bridge network.
Also, do we need to configure the disk space? Or does it by default get as much space as the actual hard disk?
It depends on which storage driver you're using, because each has its own options. For example, devicemapper uses 10G per container by default but can be configured to use more. The recommended driver now is overlay2. To configure a size limit there, start docker with the overlay2.size storage option.
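As far as I'm aware, overlay2.size only works when /var/lib/docker sits on an xfs filesystem mounted with the pquota option; under that assumption, the daemon invocation looks roughly like:

dockerd --storage-driver overlay2 --storage-opt overlay2.size=20G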
This depends some on what your host system is and how old it is.
In all cases network bandwidth isn't explicitly limited or allocated between the host and containers; a container can do as much network I/O as it wants up to the host's limitations.
On current native Linux there isn't a desktop application and docker info will say something like Storage driver: overlay2 (overlay and aufs are good here too). There are no special limitations on memory, CPU, or disk usage; in all cases a container can use up to the full physical host resources, unless limited with a docker run option.
On older native Linux there isn't a desktop application and docker info says Storage driver: devicemapper. (Consider upgrading your host!) All containers and images are stored in a separate filesystem and the size of that is limited (it is included in the docker info output); named volumes and host bind mounts live outside this space. Again, memory and CPU are not intrinsically limited.
Docker Toolbox and Docker for Mac both use virtual machines to provide a Linux kernel to non-Linux hosts. If you see a "memory" slider you are probably using a solution like this. Disk use for containers, images, and named volumes is limited to the VM capacity, along with memory and CPU. Host bind mounts generally get passed through to the host system.

I'm still confused by Docker containers and images

I know that containers are a form of isolation between the app and the host (the managed running process). I also know that container images are basically the package for the runtime environment (hopefully I got that correct). What's confusing to me is when they say that a Docker image doesn't retain state. So if I create a Docker image with a database (like PostgreSQL), wouldn't all the data get wiped out when I stop the container and restart? Why would I use a database in a Docker container?
It's also difficult for me to grasp LXC. On another question page I see:
LinuX Containers (LXC) is an operating system-level virtualization method for running multiple isolated Linux systems (containers) on a single control host (LXC host)
What does that exactly mean? Does it mean I can have multiple versions of Linux running on the same host as long as the host support LXC? What else is there to it?
LXC and Docker are completely different, but both are container technologies.
There are two types of containers:
1. Application containers: their main purpose is to provide an application and its dependencies. These are Docker containers (lightweight containers). They run as a process on your host and do whatever you need. They literally don't need any OS image or boot-up step; they come and go in a matter of seconds. You're not meant to run multiple processes/services inside a Docker container; you can if you want, but it is laborious. Here, resources (CPU, disk, memory) are shared.
2. System containers: these are fat containers, meaning they are heavy; they need OS images
to launch themselves. At the same time they are not as heavy as virtual machines; they are very similar to VMs but differ a bit in architecture.
For example, say Ubuntu is the host machine: if you have LXC installed and configured on your Ubuntu host, you can run a CentOS container, an Ubuntu container (with a different version), a RHEL, a Fedora, or any Linux flavour on top of that Ubuntu host (see the sketch after this list). You can also run multiple processes inside an LXC container. Here too, resources are shared.
So, if you have a huge application running in one LXC container that requires more resources, and another application running inside another LXC container that requires fewer, the container with the smaller requirement will share its resources with the container that has the larger requirement.
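To make the CentOS-on-Ubuntu example concrete, here is a sketch using the LXC command-line tools (the container name centos-ct is made up):

# Create a CentOS 7 container from the "download" template on an Ubuntu host
sudo lxc-create -n centos-ct -t download -- --dist centos --release 7 --arch amd64
sudo lxc-start -n centos-ct
sudo lxc-attach -n centos-ct   # get a shell inside the CentOS container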
Answering Your Question:
So if I create a Docker image with a database (like PostgreSQL), wouldn't all the data get wiped out when I stop the container and restart?
You won't create a database Docker image with data baked into it (this is not recommended).
You run/create a container from an image and you attach/mount data to it.
So, when you stop/restart a container, the data never gets lost if you attach it to a volume, as the volume resides somewhere other than inside the container (maybe an NFS server or the host itself). See the sketch below.
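For example, with the official postgres image (the volume name pgdata and the password are placeholders), the data directory is attached to a named volume that outlives the container:

docker volume create pgdata
docker run -d --name db \
  -e POSTGRES_PASSWORD=changeme \
  -v pgdata:/var/lib/postgresql/data \
  postgres
# Removing the db container and starting a new one with -v pgdata:/var/lib/postgresql/data
# brings back the same databases.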
Does it mean I can have multiple versions of Linux running on the same host as long as the host support LXC? What else is there to it?
Yes, you can do this. We run LXC containers in our production environment.
