How do I limit container disk usage without evicting?

I'm using Kubernetes on GKE (or EKS) to create Docker containers dynamically for each user and give users shell access to those containers. I want to set a maximum limit on the disk space a container can use (or at least on one of the folders within each container), but I want to implement it in such a way that the pod isn't evicted when the limit is exceeded. Instead, ideally, the user would get an error when trying to write more data to disk than the specified limit (e.g., "Disk quota exceeded").
I'd rather not use a Kubernetes volume backed by a gcePersistentDisk or an EBS volume, to keep costs down. Is there a way to achieve this with Kubernetes?

Assuming you're using an emptyDir volume on Kubernetes, which is temporary scratch space attached to your pod, you can set a size limit for it.
See the answer at https://stackoverflow.com/a/45565438/54929; this question is likely a duplicate.
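For reference, a minimal sketch of what that looks like (the pod name, image, mount path, and the 1Gi size below are all placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: user-shell              # hypothetical pod name
spec:
  containers:
  - name: shell
    image: ubuntu               # hypothetical image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: scratch
      mountPath: /home/user     # the directory whose usage you want to cap
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 1Gi            # cap on how much the volume may grow

How the limit is enforced when it is exceeded depends on your Kubernetes version and node configuration, so check the linked answer and the emptyDir documentation for the details.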

Related

Increasing the storage space of a docker container on Windows to 2-3TB

I'm working on a Windows computer with 5 TB of available space, building an application to process large amounts of data. It uses Docker containers to create replicable environments. Most of the data processing is done in parallel using many smaller Docker containers, but the final tool/container requires all the data to come together in one place. The output area is mounted to a volume, but most of the data is just copied into the container, which will amount to multiple TBs of storage. RAM, luckily, isn't an issue in this case.
I'm willing to try any suggestions and make whatever changes I can.
Is this possible?
I've tried increasing Docker's disk space using .wslconfig, but this doesn't help.

What purpose do ephemeral volumes serve in Kubernetes?

I've recently started learning Kubernetes, and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services, but they usually end there with some example microservice app built using a combination of those four. When it comes to databases, they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");
fs.writeFileSync("test.txt", "blah");
const value = fs.readFileSync("test.txt", "utf8");
console.log(value);
I suspect this would properly output "blah" and not crash due to an inability to write to disk. (Note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images into my cluster yet; I've only loaded images that are already on Docker Hub.)
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the list of ephemeral volume types highlights two major patterns:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from and where it gets mounted, and it hides any content that was in the underlying image at that path. Unlike persistent volumes, the underlying storage does not survive the pod being deleted and recreated.
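As a rough illustration of both patterns (every name below is made up), an init container can seed an emptyDir that the main container reads, and a ConfigMap can be projected into the pod as files:

apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo             # hypothetical name
spec:
  initContainers:
  - name: seed                     # init container publishes data into the shared volume
    image: busybox
    command: ["sh", "-c", "echo seeded > /work/seed.txt"]
    volumeMounts:
    - name: workdir
      mountPath: /work
  containers:
  - name: app                      # main container sees what the init container wrote
    image: busybox
    command: ["sh", "-c", "cat /work/seed.txt && sleep 3600"]
    volumeMounts:
    - name: workdir                # emptyDir shared within the pod
      mountPath: /work
    - name: app-config             # ConfigMap injected as files
      mountPath: /etc/app
  volumes:
  - name: workdir
    emptyDir: {}
  - name: app-config
    configMap:
      name: app-settings           # assumes a ConfigMap with this name exists

Both volumes disappear with the pod: the emptyDir's contents are gone, and the ConfigMap projection is simply rebuilt if the pod is recreated.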
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
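For example, a StatefulSet can declare a volume claim template so that each replica gets its own claim; this is only a hypothetical sketch, with illustrative names and sizes:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo                      # hypothetical name
spec:
  serviceName: mongo
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        volumeMounts:
        - name: data
          mountPath: /data/db      # the database's data directory
  volumeClaimTemplates:            # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi            # made-up size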
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The need for the concept comes from microservices and orchestration, and it is also guided by the twelve-factor app methodology. But why?
One major use case: when you are deploying a number of microservices (and their replicas) in containers across multiple machines in a cluster, you don't want a container to be reliant on its own storage. If containers rely on their own storage, shutting them down and starting new ones changes how your app behaves, and that is something everyone wants to avoid. Everyone wants to be able to start and stop containers at will, because that allows easy scaling, updates, etc.
When you actually need a service with persistent data (like a database), you need a special kind of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume just to make sure your data persists even after the container is stopped. But if you just want to load balance across hundreds of stateless API services, ephemeral storage is what you actually want.

Kubernetes Volume definition - explanation

I'm trying to become more familiar with the Kubernetes orchestration tool, and I've run into a conceptual issue with volumes.
From my understanding, a volume allocates space on a drive in order to persist data, and this volume can be mounted on a pod. This is fine so far.
But what will happen in the scenario below:
We have 3 pods, and each of them has a mounted volume to which we persist some data. At some point we don't need 3 pods anymore and we kill one of them. What happens to its volume and its data? Will the data be lost, or should we somehow transfer it to another volume?
Sorry for the rough description, but I'm trying to understand.
Thanks in advance!
A Volume is a way to describe a place where data can be stored. It does not have to be on a local drive, and it does not have to be network block storage. A whole bunch of volume implementations are available, ranging from emptyDir and hostPath, via iSCSI and EBS, all the way to NFS or GlusterFS. A volume is where you define a piece of a more or less POSIX-compliant filesystem.
What happens to it when its pod is gone depends mostly on what you are using. For example, an EBS volume can be reclaimed, but an NFS share may stay exactly as it was.
There is even more: you can have Persistent Volume Claims, Volume Claim Templates, and Persistent Volumes, which all build upon the Volume concept itself to provide useful abstractions.
I strongly encourage you to read about and play with all of them to get a better understanding of how storage can be managed in Kubernetes.
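Since Persistent Volume Claims come up above, here is a minimal, hypothetical sketch of a claim and a pod that mounts it (names, sizes, and images are made up, and the storage class depends on your cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                 # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi                 # made-up size
---
apiVersion: v1
kind: Pod
metadata:
  name: data-user
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data             # anything written here lands on the claimed storage
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim        # binds the pod's volume to the claim above

Whether the underlying storage survives deleting the claim depends on the reclaim policy of the PersistentVolume it bound to.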

Restrict memory across all containers

I know you can set a memory restriction per container in Docker via docker run -m <x>, but is it possible to set an aggregate restriction across all containers, rather than for each container individually?
For example, if I have 5 containers and 2GB of RAM, is it possible to configure docker so that it can allocate in total no more than 1GB, meaning the sum of memory allocated to containers may not pass 1GB?
For now, Kubernetes does limiting only at the container level, via the resources: limits parameter, and only for CPU and memory.
You can control how much memory/CPU a pod uses, since you define the pod. So, if you assign a specific maximum usage to each container, the pod will not be able to use more resources than the sum of the individual limits.
This is not ideal, because you may want to let each container use as much memory as it needs while keeping the pod as a whole under a certain threshold. They have an issue open for what you want here
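As a hypothetical illustration of the per-container approach described above (names, images, and numbers are made up), the pod as a whole cannot use more memory than the sum of its containers' limits:

apiVersion: v1
kind: Pod
metadata:
  name: limited-pod                # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    resources:
      limits:
        memory: "512Mi"            # this container is capped at 512 MiB
        cpu: "500m"
  - name: worker
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    resources:
      limits:
        memory: "512Mi"            # so the pod in total is capped at ~1 GiB
        cpu: "500m"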

Docker maximum offline containers

I've read about the limitations on Docker containers, and also about the maximum number of running containers, but I'd like to do the following:
Start a container on-the-fly (milliseconds).
In order to do so, I've noticed that I have to create it beforehand; this will save me about 2 seconds each time. This made me wonder:
Is there any limitation to the number of created containers? Do they use any resources?
obviously it uses disk space to store it
does it also preload it in RAM, or not?
related: is the "active" state of the process saved on stopping, or is the process simply stopped and then started again on start? (If the latter is the case, then why would anyone bother to re-create containers?)
does it have a reserved IP address? And if so, is there a maximum number of private IP addresses Docker will use?
... anything else that might prevent me from having 50,000 containers?
If a container is only created, there is no running process (and nothing is [pre-]cached either). I've also verified that if the container isn't running yet, the NetworkSettings section of docker inspect is blank, so no IP addresses should be allocated in this case. The metadata stored on disk to track the "container object" should be the only impact (and whatever memory the Docker daemon uses at runtime while keeping track of said metadata, which likely includes a copy of the metadata itself).
I've run for i in {0..999}; do docker create --name hello-$i hello-world; done on my local machine to test this, and it completed successfully (although it took what seems like an embarrassingly long time to complete, given that it's looking up and writing out the exact same metadata repeatedly).
