I'm new to Docker and as I understand, Docker uses the same libs/bins for multiple containers where possible.
How can I tell Docker not to do that, i.e. to use a fresh copy of a lib or bin even if the same lib/bin already exists?
To be concrete:
I use this image and I want to start multiple instances of geth-testnet, but each of them should use its own blockchain.
I don't believe you need to worry about this. Docker hashes the layers beneath the image to maximize reuse. These layers are all read-only and are mounted with the union filesystem under a container-specific read-write layer. The result is very efficient on the filesystem and transparent to the user, who sees the files as writable inside their isolated container. However, if you modify them in one container, the change will not be visible in any other container, and it will be lost when the container is removed and replaced with a new instance.
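For example, a quick way to see this isolation in practice (using a throwaway alpine container here purely for illustration, not the geth image itself):

    # Two containers from the same image, each with its own copy-on-write layer
    docker run -d --name node-a alpine sleep 3600
    docker run -d --name node-b alpine sleep 3600

    # Write a file in the first container only
    docker exec node-a sh -c 'echo test > /chaindata.txt'

    docker exec node-a ls /chaindata.txt   # present in node-a
    docker exec node-b ls /chaindata.txt   # "No such file or directory" in node-b

    # docker diff lists only the container-specific changes
    docker diff node-a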
I'm trying to find a way to identify which container created a volume, where it wants to mount to, & whether it will be reused when the container restarts, for volumes that are currently not in use.
I know I can see which container is currently using a volume, & where it's mounted in said container, but that isn't enough. I need to identify containers that are no longer running.
The Situation
I've noticed a frequently recurring problem with Docker: I create a container to test it out, make some adjustments, restart it, make some more, restart it, until I get it working how I want it to.
In the process, many times, I come across containers that create worthless volumes. After the fact, I can identify these as 8K volumes not currently in use & just delete them.
But many times these volumes aren't even persistent, as the container will create a new one each time it runs.
At times I look at my volumes list & see over 100 volumes, none of which are currently in use. The 8KB ones I'll delete without a second thought, but the ones that are 12KB, 24KB, 100KB, 5MB, etc., I don't want to just delete.
I use a Portainer agent inside Portainer solely for the ability to quickly browse these volumes & decide whether each needs to be kept, transferred to a bind mount, or just discarded, but it's becoming more & more of a problem, & I figure there has to be some way to identify the container they came from. I'm sure it will require some sort of code exploration, but where? Is there not a tool to do this? If I knew where the information is, I should be able to write a script or even make a container just for this purpose; I just don't know where to begin.
The most annoying thing is when a container creates a second container that I have no control over, & that second container uses a volume but creates a new one each time it starts.
Some examples
adoring_hellman, created by the VS Code Server container linuxserver/code-server
datadog/agent creates a container I believe is called st-vector or something similar
Both of which have access to /var/run/docker.sock
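For reference, here is roughly what the stock CLI can already report (nothing beyond docker itself is assumed; docker ps -a with the volume filter does cover stopped containers, but once a container is removed the link is gone, and docker volume inspect records a CreatedAt timestamp and labels, not the creating container):

    # For every volume, list any container -- running or stopped -- that still
    # references it. Volumes with no match were left behind by containers that
    # have already been removed, so the creator can no longer be looked up.
    for v in $(docker volume ls -q); do
      echo "== $v"
      refs=$(docker ps -a --filter "volume=$v" --format '{{.Names}} ({{.Image}}, {{.Status}})')
      echo "${refs:-  <no container references this volume>}"
    done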
My current understanding of a Docker image is that it is a collection of individual layers. Each layer only contains deltas that are merged via the union filesystem (which simply mounts all layers on top of each other). When instantiating an image, another (writable) layer is put on top that will then contain all container-specific changes that are persisted between restarts. Please correct me if I am wrong in any of the above.
I would like to inspect the contents of each of the various layers. I am particularly interested in inspecting the top-most layer to see whether my containerized app writes any data that would bloat the container, like a log or so. I am working on macOS, which does not store all the files in /var/lib/docker/, but seems to store them in a VM. I read about the docker-machine tool, which makes it easy to connect to the Docker engine via SSH, where one would be able to see and mount all layers. However, this tool seems to be discontinued.
Does anybody have an idea on 1) how to connect to the docker engine to get access to the layers and 2) how to find out what files are contained in a particular layer?
edit: It seems to be possible to use docker diff to see the file differences between the original image and the running container, which is what I mainly wanted to achieve, but the original questions remain.
You can list the layers and their sizes with the docker history command. But to inspect the contents of all layers, I recommend using the dive tool.
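For example (the image and container names below are just placeholders):

    # Layer sizes and the build steps that produced them
    docker history my-image:latest

    # Interactive per-layer file tree (dive is a separate, third-party tool)
    dive my-image:latest

    # Files added, changed or deleted in a running container's writable layer
    docker diff my-container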
Our application runs in a Docker container. The application retrieves files from a directory outside our network. The application needs to process the files and save them somewhere for a year.
What is the best approach? Do we use the writable layer of the container or a directory on the host system? What are the advantages and drawbacks?
Preferably you keep your container as dumb as possible. Also, if your container gets killed or crashes and has to be replaced, you would lose the data if it is stored inside the container.
A volume mount would be a possible solution, which would make it possible to store files directly on the host; but my question would then be: why use Docker?
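For completeness, a sketch of what such a mount could look like; the image name and paths below are only examples:

    # Bind mount: files are written straight to a directory on the host
    docker run -d --name processor \
      -v /srv/archive:/app/archive \
      my-processing-image

    # Named volume: managed by Docker, but independent of the container's life-cycle
    docker volume create archive-data
    docker run -d --name processor-vol \
      -v archive-data:/app/archive \
      my-processing-image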
Another advantage of having a quite dumb container is that you would be able to scale more easily. With a queue, you could scale the number of containers up to a certain threshold to process the queue. :-)
Therefore I would advise you to store the data somewhere else: another FTP server, hosted and managed by yourself.
New to docker...
Need some help to clarify basic container concept...
AFAIK, each container would include the app code, libraries, runtime, config files, etc.
If I run N containers for N apps, and each of the apps happens to use the same set of libraries, would it mean my host system literally ends up having N-1 duplicate copies of those libraries?
While containers reduce OS overhead compared with the VM approach to virtualization, I am just wondering if the container approach still has room to improve in terms of resource optimization.
Thanks
Mira
Containers are the runtime instance, defined by an image. Docker uses a union filesystem to merge multiple layers together to create the root filesystem you see inside your container. Each step in the build of an image is a layer, and the container itself has a copy-on-write layer attached just to that container so that it sees its own changes. Because of this, Docker is able to point multiple instances of a running image back to the same image files for the unionfs layers; it never copies the layers when you spin up another container, they all point back to the same filesystem bytes.
In short, if you have a 1 GB image and spin up 100 containers all using that same image, on disk there will only be the 1 GB image plus any changes made in those 100 containers, not 100 GB.
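You can see this for yourself; nginx:alpine is used here purely as a convenient public image for the demo:

    # Two containers sharing the same read-only image layers
    docker run -d --name web1 nginx:alpine
    docker run -d --name web2 nginx:alpine

    # SIZE shows each container's own writable layer (a few bytes or KB);
    # the "virtual" size is the shared image they both point back to
    docker ps -s --filter name=web

    # Aggregate view of how much space images and containers actually use
    docker system df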
I'm really struggling to grasp the workflow of Docker. The issue is: where exactly are the deliverables?
One would expect the developer's image to be the same as the one used for testing and production.
But how can one develop with auto-reload and such (probably via some shared volumes) without building the image again and again?
The image for testers should be just "fire it up and you are ready to go". How are the images split?
I heard something about data containers, which probably hold the app deliverables. So does it mean that I will have one container for the DB, one for the app server, and one versioned image for my code itself?
The issue is: where exactly are the deliverables?
Static deliverables (which never change) are copied directly into the image.
Dynamic deliverables (which are generated during a docker run session, or which are updated) live in volumes (either host-mounted volumes or data container volumes), in order to have persistence across the container life-cycle.
does it mean that I will have one container for DB, one for App.
Yes, in addition to your application container (which is what Docker primarily is: it puts applications in containers), you would have a data container in order to isolate the data that needs persistence.
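A minimal sketch of that split, assuming a bind-mounted source tree for development and a named volume for the database (the image names and paths are only examples):

    # Development: bind-mount the source so the app auto-reloads on changes
    # without rebuilding the image
    docker run -d --name app-dev \
      -v "$(pwd)/src:/usr/src/app" \
      my-app-image

    # Data that must outlive any single container lives in a volume
    docker volume create db-data
    docker run -d --name db \
      -e POSTGRES_PASSWORD=secret \
      -v db-data:/var/lib/postgresql/data \
      postgres:16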