I've read about the limitations on Docker containers, and also about the maximum number of running containers, but I'd like to do the following:
Start a container on-the-fly (milliseconds).
In order to do so, I've noticed that I have to create it beforehand; this will save me about 2 seconds each time. This made me wonder:
Is there any limitation to the number of created containers? Do they use any resources?
obviously it uses disk space to store it
does it also preload it in RAM, or not?
related: is the "active" state of the process saved on stopping, or is the process simply stopped and then started again on start? (if the latter is the case, then why would anyone bother to re-create containers?)
does it have a reserved IP address? And if so, is there a maximum number of private IP addresses Docker will use?
... anything else that might prevent me from having 50,000 containers?
If a container is only created, there is no running process (and nothing is [pre-]cached either). I've also verified that if the container isn't running yet, the NetworkSettings section of docker inspect is blank, so no IP addresses should be allocated in this case. The metadata stored on disk to track the "container object" should be the only impact (and whatever memory the Docker daemon uses at runtime while keeping track of said metadata, which likely includes a copy of the metadata itself).
I've run for i in {0..999}; do docker create --name hello-$i hello-world; done on my local machine to test this, and it completed successfully (although it took what felt like an embarrassingly long time, given that it's looking up and writing out the exact same metadata repeatedly).
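If you want to double-check the footprint yourself, something along these lines should work (a small sketch reusing the hello-$i names from the loop above; exact output will vary):
# A created-but-never-started container has no process and no IP address
docker inspect --format 'state={{.State.Status}} ip={{.NetworkSettings.IPAddress}}' hello-0
# prints something like: state=created ip=
# Rough view of the disk space taken up by images, containers, and volumes
docker system df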
Related
I'm trying to find a way to identify which container created a volume, where that volume is meant to be mounted, and whether it will be reused when the container restarts, specifically for volumes that are currently not in use.
I know I can see which container is currently using a volume, and where it's mounted in that container, but that isn't enough. I need to identify containers that are no longer running.
The Situation
I've noticed a frequently recurring problem with Docker: I create a container to test something out, make some adjustments, restart it, make some more, restart it, until I get it working how I want.
In the process I often come across containers that create worthless volumes. After the fact I can identify these as 8K volumes that are not currently in use and just delete them.
But many times these volumes aren't even persistent, as the container will create a new one each time it runs.
At times I look at my volumes list and see over 100 volumes, none of which are currently in use. The 8KB ones I'll delete without a second thought, but the ones that are 12KB, 24KB, 100KB, 5MB and so on I don't want to just delete.
I use a Portainer agent inside Portainer solely for the ability to quickly browse these volumes and decide whether each one needs to be kept, transferred to a bind mount, or just discarded, but it's becoming more and more of a problem, and I figure there has to be some way to identify the container they came from. I'm sure it will require some sort of code exploration, but where? Is there no tool to do this? If I know where the information is, I should be able to write a script or even build a container just for this purpose; I just don't know where to begin.
The most annoying case is when a container creates a second container that I have no control over, and that second container uses a volume but creates a new one each time it starts.
Some examples
adoring_hellman created by VS Code Server container linuxserver/code-server
datadog/agent creates a container I believe is called st-vector or something similar
Both of which have access to /var/run/docker.sock.
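For reference, the obvious places to start digging seem to be the volume metadata and any stopped containers that still reference the volume (a sketch; <volume-name> is a placeholder, and labels only help if the creating tool actually set them, as Compose does for example):
# Volumes not referenced by any container, running or stopped
docker volume ls -qf dangling=true
# Per-volume size and how many containers link to each volume
docker system df -v
# Driver, mountpoint, creation time, and labels of a specific volume
docker volume inspect <volume-name>
# All containers, including stopped ones, that mount a given volume
docker ps -a --filter volume=<volume-name>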
I'm starting to learn Kubernetes recently and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services - but they usually end there with some example microservice app built using a combination of those four. When it comes to databases they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");
fs.writeFileSync("test.txt", "blah");
const value = fs.readFileSync("test.txt", "utf8");
console.log(value);
I suspect this would properly output "blah" and not crash due to an inability to write to disk (note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images in my cluster yet; I've only used images already on Docker Hub so far).
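A quick way to check this without a custom image would presumably be something like the following rough sketch with the stock busybox image (the pod name scratch-test is made up):
# Throwaway pod that writes and then reads a file on its own container filesystem
kubectl run scratch-test --image=busybox --restart=Never -- sh -c 'echo blah > /tmp/test.txt && cat /tmp/test.txt'
# If writing works, the log should show "blah"
kubectl logs scratch-test
kubectl delete pod scratch-test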
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the list of ephemeral volume types highlights two major things:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from, and where it gets mounted, and it hides any content that was in the underlying image. The underlying storage happens to not be persistent if a pod is deleted and recreated, unlike persistent volumes.
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
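To make the emptyDir case above concrete, a minimal pod that shares an emptyDir between an init container and the main container might look roughly like this (a sketch; all names are illustrative):
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  volumes:
    - name: shared
      emptyDir: {}          # lives exactly as long as the pod does
  initContainers:
    - name: init
      image: busybox
      command: ["sh", "-c", "echo hello > /shared/greeting"]
      volumeMounts:
        - name: shared
          mountPath: /shared
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "cat /shared/greeting && sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /shared
EOF
Delete the pod and the volume's contents are gone with it, which is exactly the "ephemeral" part.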
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The need for this concept is driven by microservices and orchestration practices, and is also guided by the twelve-factor app methodology. But why?
One major use case: when you are deploying a number of microservices (and their replicas) as containers across multiple machines in a cluster, you don't want a container to be reliant on its own storage. If containers rely on their own storage, shutting them down and starting new ones affects the way your app behaves, and this is something everyone wants to avoid. Everyone wants to be able to start and stop containers whenever they want, because this allows easy scaling, updates, and so on.
When you actually need a service with persistent data (like a DB), you need a special type of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume, just to be sure that your data will persist even after the container is stopped. But if you just want to load-balance across hundreds of stateless API services, ephemeral containers are exactly what you want.
I have a large number of bytes per second coming from a sensor device (e.g., video) that are being read and processed by a process in a Docker container.
I have a second Docker container that would like to read the processed byte stream (still a large number of bytes per second).
What is an efficient way to read this stream? Ideally I'd like to have the first container write to some sort of shared memory buffer that the second container can read from, but I don't think separate Docker containers can share memory. Perhaps there is some solution with a shared file pointer, with the file saved to an in-memory file system?
My goal is to maximize performance and minimize useless copies of data from one buffer to another as much as possible.
Edit: Would love to have solutions for both Linux and Windows. Similarly, I'm interested in finding solutions for doing this in C++ as well as python.
Create a fifo with mkfifo /tmp/myfifo. Share it with both containers: --volume /tmp/myfifo:/tmp/myfifo:rw
You can directly use it:
From container 1: echo foo >>/tmp/myfifo
In Container 2: read var </tmp/myfifo
Drawback: Container 1 is blocked until Container 2 reads the data and empties the buffer.
Avoid the blocking: In both containers, run in bash exec 3<>/tmp/myfifo.
From container 1: echo foo >&3
In Container 2: read var <&3 (or e.g. cat <&3)
This solution uses bash's exec file descriptor handling. I don't know exactly how, but it is certainly possible in other languages, too.
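Putting it together, a minimal end-to-end test might look like this (a sketch using the alpine image; the container names and the fifo path are just examples):
mkfifo /tmp/myfifo
# Reader: prints whatever arrives on the fifo
docker run --rm --name reader -v /tmp/myfifo:/tmp/myfifo alpine cat /tmp/myfifo &
# Writer: sends one line into the fifo
docker run --rm --name writer -v /tmp/myfifo:/tmp/myfifo alpine sh -c 'echo hello > /tmp/myfifo'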
Using a simple TCP socket would be my first choice. Only if measurements show that we absolutely need to squeeze the last bit of performance out of the system would I fall back to pipes or shared memory.
Going by the problem statement, the process seems to be bound by local CPU/memory resources rather than by external services. In that case, having both producer and consumer on the same machine (as Docker containers) is likely to exhaust the CPU before anything else, but I would measure first before acting.
Most of the effort in developing code is spent maintaining it, so I favor mainstream practices. The TCP stack has rock-solid foundations and is as optimized for performance as humanly possible. It is also a lot more (completely?) portable across platforms and frameworks. Docker containers on the same host do not hit the wire when communicating over TCP. If some day the processes do hit a resource limit, you can scale horizontally by splitting the producer and consumer across physical hosts, manually or with something like Kubernetes; TCP will keep working seamlessly in that case. If you are never going to need that level of throughput, then you also won't need system-level sophistication in your inter-process communication.
Go with TCP.
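If it helps, container-to-container TCP on a single host is only a few commands (a rough sketch; it assumes the BusyBox nc shipped in the alpine image, and all names are made up):
docker network create sensornet
# Consumer: listen on port 9000 and read whatever arrives
docker run -d --name consumer --network sensornet alpine nc -l -p 9000
# Producer: stream data to the consumer by container name; traffic stays on the local bridge
docker run --rm --name producer --network sensornet alpine sh -c 'yes hello | nc consumer 9000'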
I am wondering whether it is possible to start a new Docker container by some automated means, such that whenever the old container reaches a specific memory/CPU usage limit, the old container doesn't get killed and the new one balances the load.
You mean a sort of autoscaling. At the moment I don't know of a ready-made, built-in solution, but I can share my idea:
You can use a metrics collector like cAdvisor (https://github.com/google/cadvisor) to grab information about your container (you can also use docker stats to do that).
You can store this data in a time-series database like InfluxDB or Prometheus.
Create a continuous query that triggers a "create new container" event when some metric goes out of your limit; a toy version of that trigger is sketched below.
I know you are looking for something ready-made, but at the moment I don't see any tool that solves this problem.
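As a toy illustration of that trigger (a rough sketch only, with a made-up my-app image/container name and a hard-coded 80% memory threshold; a real setup would go through cAdvisor and the time-series database as described above):
while true; do
  # Current memory usage of the container, as a percentage (e.g. "73.5")
  mem=$(docker stats --no-stream --format '{{.MemPerc}}' my-app-1 | tr -d '%')
  if [ "$(echo "$mem > 80" | bc)" -eq 1 ]; then
    echo "my-app-1 is over the limit, starting another replica"
    docker run -d --name my-app-2 my-app
    break
  fi
  sleep 10
done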
It sounds like you need a container orchestrator, possibly for other use cases as well. You can drive scaling choices with metrics via almost any of them: Mesos, Kubernetes, or Swarm. Swarm is evolving a lot, with Docker investing heavily in it. Swarm mode is a new feature coming in 1.12 that will put a lot of this orchestration into the core product, and it would probably cover your use case well.
We have a little farm of docker containers, spread over several Amazon instances.
Would it make sense to have fewer, bigger host instances (in terms of RAM and size) that run multiple smaller containers at once, or to have one host instance per container, sized according to that container's needs?
EDIT #1
The issue here is that we need to decide up front. I understand that we can decide later using various monitoring stats, but we need to make some architecture and infrastructure decisions before the system goes into use. Moreover, we do not have control over what content is going to be deployed.
You should read An Updated Performance Comparison of Virtual Machines and Linux Containers
http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf
and Resource management in Docker
https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
You need to check how much memory, CPU, I/O, etc. your containers consume, and draw your conclusions from that.
At the very least, you can easily check a few things with docker stats and docker top my_container; see the example after the docs links below.
the associated docs
https://docs.docker.com/engine/reference/commandline/stats/
https://docs.docker.com/engine/reference/commandline/top/
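For example (a quick sketch; my_container is whatever container name or ID you are interested in):
# One-shot snapshot of CPU, memory, network and block I/O for every running container
docker stats --no-stream
# The same, restricted to one container and to the columns you care about
docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.BlockIO}}' my_container
# Processes running inside the container, as seen from the host
docker top my_container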