I'm trying to find a way to identify which container created a volume, where that volume is meant to mount, & whether it will be reused when the container restarts, for volumes that are currently not in use.
I know I can see which container is currently using a volume, & where it's mounted in said container, but that isn't enough. I need to identify containers that are no longer running.
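For reference, this is roughly how I check a volume that is still attached to something (names here are placeholders); the problem is that once the creating container is removed, these commands come back empty:

# List every container (running or stopped) that still references the volume
docker ps -a --filter volume=myvolume

# Show where the volume is mounted inside a given container
docker inspect --format '{{json .Mounts}}' mycontainer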
The Situation
I've noticed a frequently recurring problem with Docker: I create a container to test it out, make some adjustments, restart it, make some more, restart it, until I get it working how I want.
In the process I often come across containers that create worthless volumes. After the fact I can identify these as 8K volumes not currently in use & just delete them.
But many times these volumes aren't even persistent, as the container will create a new one each time it runs.
At times I look at my Volumes list & see over 100 volumes, none of which are currently in use. The 8KB ones I'll delete without a second thought, but the ones that are 12KB, 24KB, 100KB, 5MB, etc., I don't want to just delete.
I use a Portainer agent inside Portainer solely for the ability to quickly browse these volumes & decide whether each one needs to be kept, transferred to a bind mount, or just discarded, but it's becoming more & more of a problem, & I figure there has to be some way to identify the container they came from. I'm sure it will require some sort of code exploration, but where? Is there not a tool to do this? If I know where the information is, I should be able to write a script or even make a container just for this purpose; I just don't know where to begin.
The most annoying case is when a container creates a second container that I have no control over, & that second container uses a volume but creates a new one each time it starts.
Some examples
adoring_hellman created by VS Code Server container linuxserver/code-server
datadog/agent creates a container I believe is called st-vector or something similar
Both of which have access to /var/run/docker.sock
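For context, this sketch is about as far as I can get on my own today; it maps each volume to any container (running or stopped) that still references it, which of course tells me nothing once the creating container is gone:

#!/bin/sh
# Sketch: list, for every volume, the containers that still reference it.
# Volumes that print "no container references it" are the orphans in question.
for vol in $(docker volume ls -q); do
  users=$(docker ps -a --filter "volume=$vol" --format '{{.Names}} ({{.Image}})')
  if [ -n "$users" ]; then
    echo "$vol -> $users"
  else
    echo "$vol -> no container references it"
  fi
done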
How I understand the part I understand.
The docker service command replaces the docker run command. docker run is about a single container; docker service is about running multiple containers as one manageable unit, and the service is a part of that unit.
In fact, in Docker Swarm terms, a service is a container, but seen from another (group-level) point of view, at least at this point in the reasoning. But why "service", why not just "container"?
Because docker run simply runs a container, while docker service runs a container as a service. What does "service" mean here?
A service here is some fault-tolerant entity that provides some service (in a broad sense: you ask for something, it does something). But what are the criteria of fault tolerance? What may happen to a container?
It may be stopped for some reason, so you need to restart it; the container can run out of resources, so you need to edit the resource limits; one container may not be enough to handle the current load, so you need more instances to balance the load. (So now I understand why a service is not just one container and can't be named just "container"; we need another word here. One more thing I can state: in terms of a service we always deal with exactly one image, the same image, but we can have multiple containers based on that image.)
You can control all of that, manually or automatically (that's another question), but you are now able to do it in a systematic way. So here we have not just a launched container, we have a goal: keep the service (in a broad sense) always running. To reach that goal, some tasks have to be completed. As you can see, I've already started using the Docker vocabulary (but I arrived at it purely from the logic).
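To make the distinction concrete, here is a minimal sketch (nginx is just an example image, "web" is a made-up service name):

# docker run starts exactly one container and does nothing more for it
docker run -d --name single-web nginx

# docker service declares a desired state; Swarm keeps reconciling toward it
# (requires a Swarm: docker swarm init)
docker service create --name web --replicas 3 nginx

# Each replica shows up as a "task" backed by one container
docker service ps web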
What tasks are we talking about when we talk about the described goal?
Let's say the load has increased and we need, say, two more replicas of a container. Let's decompose that subgoal.
Subgoals (or tasks):
Create container №1.
Create container №2.
Create container №3.
Great, here we have three tasks that we (actually, the orchestrator) need to perform.
Here a task is closely connected to a container. That is what I often see in many, many tutorials: task = running container instance.
But let's move forward in the reasoning. Now we need to edit the available resources for a container, so our goal is to change some resource limits. Our task here is to change resources. And from that moment we have task != running container instance: this is not connected to container creation, we just change settings for some currently running container.
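In Swarm commands, the two cases I'm comparing look something like this (continuing the made-up "web" service from above):

# More load: ask for extra replicas; the orchestrator turns that into new tasks,
# each of which ends up as one more running container
docker service scale web=5

# Changing resource limits is also issued against the service as a whole,
# not against one particular container
docker service update --limit-memory 512M web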
What I can't understand.
Why do different tutorials always say that task = running container? In the described case we don't restart the container by hand, we just edit the available resources, so the primary task for us is not to run a container (like when we create new replicas) but to change the available resources for it.
I want to understand the creators' logic behind the names (the terminology).
I've recently started learning Kubernetes and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services - but they usually end there with some example microservice app built using a combination of those four. When it comes to databases they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");

// Write to the container's local filesystem, then read the same file back
fs.writeFileSync("test.txt", "blah");
const value = fs.readFileSync("test.txt", "utf8");
console.log(value); // expected output: "blah"
I suspect this would properly output "blah" and not crash due to an inability to write to disk (note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images in my cluster yet -- I've only loaded images already on Docker Hub so far).
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the types of ephemeral volumes highlight two major things:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from, and where it gets mounted, and it hides any content that was in the underlying image. The underlying storage happens to not be persistent if a pod is deleted and recreated, unlike persistent volumes.
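A minimal sketch of the emptyDir case (image and names are arbitrary): the init container publishes a file into the shared volume, the main container reads it, and the volume disappears with the pod.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  volumes:
    - name: shared
      emptyDir: {}
  initContainers:
    - name: prepare
      image: busybox
      command: ["sh", "-c", "echo generated-at-startup > /work/data.txt"]
      volumeMounts:
        - name: shared
          mountPath: /work
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "cat /work/data.txt && sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /work
EOF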
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
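For contrast, the persistent side of that usually boils down to a claim along these lines (name and size are placeholders), which the pod then mounts instead of an ephemeral volume:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF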
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The main need for the concept comes from microservices and orchestration, and it is also guided by the twelve-factor app methodology. But why?
Because one major use case is deploying a number of microservices (and their replicas) as containers across multiple machines in a cluster: you don't want a container to be reliant on its own storage. If containers rely on their own storage, shutting them down and starting new ones affects the way your app behaves, and this is something everyone wants to avoid. Everyone wants to be able to start and stop containers whenever they want, because this allows easy scaling, updates, etc.
When you actually need a service with persistent data (like a DB), you need a special type of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume, just to be sure that your data persists even after the container is stopped. But if you just want to load balance across hundreds of stateless API services, ephemeral containers are what you actually want.
Our application runs in a Docker container. The application retrieves files from a directory outside our network. It needs to process the files and save them somewhere for a year.
What is the best approach? Do we use the writable layer of the Docker container or a directory on the host system? What are the advantages and drawbacks?
Preferably, keep your container as dumb as possible. Also, if your container gets killed or crashes, you would lose your data if it is stored inside the container.
A volume mount would be a possible solution, which would make it possible to store the files directly on the host; but my question would then be: why use Docker at all?
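For completeness, a bind mount is as simple as pointing the container at a host directory (paths and image name here are only placeholders):

# Files the application writes to /data inside the container
# end up in /srv/archive on the host and survive the container
docker run -d --name fileprocessor -v /srv/archive:/data myapp:latest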
Another advantage of having a fairly dumb container is that you can scale more easily. With a queue, you could scale the number of containers up to a certain threshold to work through the queue. :-)
Therefore I would advise you to store the data somewhere else, e.g. a separate FTP server, hosted and managed by yourself.
I want to take a holistic approach to backing up multiple machines running multiple Docker containers. Some might run, for example, Postgres databases. I want to back up this system without having to have specific backup commands for different types of volumes.
It is fine to have a custom external script that sends e.g. signals to containers or runs Docker commands, but I strongly want to avoid anything specific to a certain image or type of image. In the example of Postgres, the documentation suggests running postgres-specific commands to backup databases, which goes against the design goals for the backup solution I am trying to create.
It is OK if I have to impose restrictions on the Docker images, as long as it is reasonably easy to implement by starting from existing Docker images and extending.
Any thoughts on how to solve this?
I just want to stress that I am not looking for a solution for how to back up Postgres databases under Docker, there are already many answers explaining how to do so. I am specifically looking for a way to back up any volume, without having to know what it is or having to run specific commands for its data.
(I considered whether this question belonged on SO or Serverfault, but I believe this is a problem to be solved by developers, hence it belongs here. Happy to move it if consensus is otherwise)
EDIT: To clarify, I want to do something similar to what is explained in this question
How to deal with persistent storage (e.g. databases) in docker
but using the approach in the accepted answer is not going to work with Postgres (and I am sure other database containers) according to documentation.
I'm skeptical that there is a holistic, multi-machine, multi-container, application/container-agnostic approach. From my point of view a lot of orchestration activity is necessary in the first place, and I wonder whether you wouldn't use something like Kubernetes anyway, which - supposedly - comes with its own backup solutions.
For a single-machine, multi-container setup I suggest storing your containers' data, configuration, and any build scripts within one directory tree (e.g. /docker/) and using a standard file-based backup program to back up that root directory.
Use docker-compose to manage your containers. This lets you store the configuration and even build options in a file (or files). I have an individual compose file for each service, but a single one would also work.
Have a subdirectory for each service, and put the container's bind-mount directories (aka its volumes) there. If you need to adapt the build process more thoroughly, you can easily store scripts, sources, Dockerfiles, etc. there as well.
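A rough sketch of that layout (service name, image, and paths are only examples):

# One directory per service under /docker/, with the compose file and data side by side
mkdir -p /docker/wiki/data

cat > /docker/wiki/docker-compose.yml <<'EOF'
services:
  wiki:
    image: nginx:alpine
    volumes:
      - ./data:/usr/share/nginx/html
EOF

docker compose -f /docker/wiki/docker-compose.yml up -d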
Since containers are supposed to be ephemeral, all persistent data should live in bind mounts and therefore under the main docker directory.
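The backup itself can then stay completely generic, for example (tar is just one option; stopping the stacks first gives a consistent snapshot):

# Stop everything, archive the whole tree, start everything again
for f in /docker/*/docker-compose.yml; do docker compose -f "$f" stop; done
tar czf /backup/docker-$(date +%F).tar.gz /docker
for f in /docker/*/docker-compose.yml; do docker compose -f "$f" start; done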
I'm really struggling to grasp the workflow of Docker. The issue is: where exactly are the deliverables?
One would expect the developer's image to be the same one used for testing and production.
But how can one develop with auto-reload and the like (probably via some shared volumes) without building the image again and again?
The image for testers should be a matter of just firing it up and you're ready to go. How are the images split?
I heard something about a data container, which probably holds the app deliverables. So does it mean that I will have one container for the DB, one for the app server, and one versioned image for my code itself?
The issue is: where exactly are the deliverables?
Static deliverables (which never change) are copied directly into the image.
Dynamic deliverables (which are generated during a docker run session, or which get updated) go into volumes (either a host-mounted volume or a data-container volume), in order to have persistence across the container life cycle.
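A small sketch of the split (image names and paths are purely illustrative):

# Static deliverables: baked into the image at build time
cat > Dockerfile <<'EOF'
FROM node:20
COPY ./dist /app
CMD ["node", "/app/server.js"]
EOF
docker build -t myapp .

# Dynamic deliverables: kept in a volume so they outlive any one container
docker run -d --name app -v appdata:/app/data myapp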
does it mean that I will have one container for DB, one for App.
Yes, in addition to your application container (which is what Docker primarily is: it puts applications in containers), you would have a data container in order to isolate the data that needs persistence.
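The classic data-container pattern looks roughly like this (names are examples; on current Docker versions a named volume achieves the same thing):

# A container that exists only to own the volume; it is never started
docker create -v /var/lib/postgresql/data --name dbdata postgres:16 /bin/true

# The actual database container borrows the data container's volumes
# (POSTGRES_PASSWORD is a placeholder)
docker run -d --name db --volumes-from dbdata -e POSTGRES_PASSWORD=secret postgres:16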