How to re-bind docker volume after removing data-only container? - docker

Following the example in the official guide on Managing Data in Containers, suppose there is a data-only container created via
docker create -v /data --name data-container image /bin/true
Also suppose there is at least one running consumer container, using the volume created in the data-only container:
docker run -d --volumes-from data-container --name consumer-container image
Now consider deletion of the data-container with the -v switch while the consumer is still running:
docker rm -v data-container
I understand the docs say that this does not affect the consumer-container, because the volume actually remains available:
"To delete the volume from disk, you must explicitly call docker rm -v
against the last container with a reference to the volume."
So, that is comparable to the unlink() system call. The data volume remains available, in a "pending" state. By "pending" I mean that this exact volume can no longer be bound to other containers by simply using --volumes-from. In terms of the unlink() analogy, there is no longer a name by which the data could be accessed.
My question is: is there a way to re-create a data-only container based on this pending volume so that it can be included again via --volumes-from? Or is such a "pending" volume condemned, in the sense that once the last consumer stops the data will be gone?

It is possible to determine the location of the volume on the underlying host with docker inspect --format="{{.Volumes}}" <Container ID> (newer Docker releases expose this information as {{.Mounts}} instead).
Whilst the consumer-container is still running, you could possibly, on the host, copy the contents of the volume to another location and create a new data-container based on the contents of that new location.
A downside is that further changes might be written to the original volume in the time between the end of your copy and stopping the consumer-container (which leads to the deletion of the original volume).
It's not ideal then, but it's one way I could see that this might be possible.
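A minimal sketch of the copy approach described above, assuming a newer Docker release where docker inspect exposes .Mounts rather than .Volumes, and assuming root access on the Docker host. The function names, the destination directory and the new container name are hypothetical.

```shell
# Print "<host path> <path in container>" for each mount of a container.
# Assumption: a Docker release that reports mounts under .Mounts.
volume_paths() {
  docker inspect --format '{{range .Mounts}}{{.Source}} {{.Destination}}{{"\n"}}{{end}}' "$1"
}

# Copy a volume's contents (hidden files included) to a new host directory,
# preserving ownership and permissions where the user is allowed to.
snapshot_volume() {
  local src="$1" dest="$2"
  mkdir -p "$dest" && cp -a "$src/." "$dest/"
}

# Create a new data-only container that bind-mounts the copy at /data.
recreate_data_container() {
  local dest="$1" name="$2" image="$3"
  docker create -v "$dest:/data" --name "$name" "$image" /bin/true
}

# Usage (run as root on the host; paths and names are illustrative):
#   volume_paths consumer-container
#   snapshot_volume <host path printed above> /srv/data-copy
#   recreate_data_container /srv/data-copy data-container-2 image
```

As the answer notes, anything written to the original volume after the copy finishes is not carried over.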

Related

Can one recover output from a Docker container after it has been removed

I've just started learning Docker for work, and my current belief is that if one runs a Docker container with the --rm option, i.e.
docker run --rm mycontainer python3 ./mycode.py --logging='true'
then whatever output is produced disappears when the container closes. However, I just came across some code documentation that said:
"--rm: removes container at end of execution. Note - since output is stored in a volume, this persists beyond the life of the container"
What does this mean?
The original command this came from had the form:
docker run -it --name p2p -d --rm \
--mount type=bind,source=/home/me/scripts,target=/scripts \
--mount source=data,target=/data \
--mount source=output,target=/output \
--gpus device=GPU-5jhfjhjhjg-jhg-jgjgjh \
my_docker_container \
python3 mycode.py --logging='true' <lots of other flags>
What does it mean "the output is stored in a volume" and how do I go about finding this volume?
Docker volumes are basically just directories on the host, usually under /var/lib/docker/volumes. It's a little trickier to get to /var/lib/docker on OS X.
You can run docker volume ls to list the volumes, and docker volume inspect <id> to get the path on disk. The volume should hang around after the container is removed, unless you explicitly remove it or run docker system prune (and should be automatically re-attached by running the same command).
It looks like you didn't actually mount a volume with your first command, but I've occasionally lucked out and found data under /var/lib/docker that hasn't been deleted/garbage-collected/etc yet.
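As a rough sketch of how to find the volume, the two helpers below wrap docker volume inspect and a throwaway container. The helper names are hypothetical, and the alpine image is only an assumption; any image that ships ls would do.

```shell
# Print the host directory that backs a named volume such as "output".
volume_mountpoint() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# List a volume's files from a throwaway container. This avoids needing
# direct access to /var/lib/docker on the host.
# Assumption: the alpine image is available locally or can be pulled.
volume_files() {
  docker run --rm -v "$1:/mnt:ro" alpine ls -la /mnt
}

# Usage:
#   docker volume ls
#   volume_mountpoint output
#   volume_files output
```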
Volumes are the primary method of persisting data beyond the lifetime of a container and also for sharing data between containers so that, provided that the volume is writeable, the change one container makes is visible to another.
Think of it like a network file storage shared between two computers on a network with absolutely no hard disk of their own. Now, if the computer were to shut down and get restarted, by itself it doesn't have a hard-disk to get persisted data from but because of the network file storage, it can see the latest updates to the contents made by the other machine. Same goes with volumes and containers.
The source of a Docker volume can be any logical abstraction of persistent disk storage, whether it's a Windows drive or a Linux mount point. When mounting a volume on a container, you're basically creating a Linux mount point within the container that points to the outside logical storage, so that the container sees what the host sees and vice versa. For example, in the command you shared, the contents of the host directory /home/me/scripts are seen by the container under /scripts. In fact, if you enter the bash shell of the container and run rm on any file in /scripts, the file is removed from /home/me/scripts on the host as well, because in reality host and container are pointing at the same thing.
Volumes are essential for running databases in containers because the container by itself is ephemeral and everything is lost when it dies. But having a volume means that if the db container is started up again with the same volume mount pointing to the host file system where db data is residing, the db state remains intact.
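To make the database case concrete, here is a small sketch. It assumes the postgres:11 image (whose data directory is /var/lib/postgresql/data), a named volume called pgdata, and a throwaway password; all of these, like the function names, are illustrative choices rather than anything from the question.

```shell
# Start a database container whose data directory lives in a named volume.
start_db() {
  docker run -d --name "$1" \
    -e POSTGRES_PASSWORD=example \
    -v pgdata:/var/lib/postgresql/data \
    postgres:11
}

# Remove the container, then start a fresh one on the same volume.
# "docker rm" leaves the named volume alone, so the database state carries over.
recreate_db() {
  docker rm -f "$1" && start_db "$1"
}

# Usage:
#   start_db db
#   recreate_db db
```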
Most of what I said is aimed at conveying the basic idea of a volume rather than being completely accurate; I hope you get what I am saying. Here is a great article that goes deeper into docker volumes. It's 5 years old but the concept still holds.

What is the actual advantage of declaring a VOLUME in a Dockerfile? [duplicate]

First of all, I want to make it clear I've done due diligence in researching this topic. Very closely related is this SO question, which doesn't really address my confusion.
I understand that when VOLUME is specified in a Dockerfile, this instructs Docker to create an unnamed volume for the duration of the container which is mapped to the specified directory inside of it. For example:
# Dockerfile
VOLUME ["/foo"]
This would create a volume to contain any data stored in /foo inside the container. The volume (when viewed via docker volume ls) would show up as a random jumble of numbers.
Each time you do docker run, this volume is not reused. This is the key point causing confusion here. To me, the goal of a volume is to contain state persistent across all instances of an image (all containers started from it). So basically if I do this, without explicit volume mappings:
#!/usr/bin/env bash
# Run container for the first time
docker run -t foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -t foo
I expect the unnamed volume to be reused between the 2 run commands. However, this is not the case. Because I did not explicitly map a volume via the -v option, a new volume is created for each run.
Here's important part number 2: Since I'm required to explicitly specify -v to share persistent state between run commands, why would I ever specify VOLUME in my Dockerfile? Without VOLUME, I can do this (using the previous example):
#!/usr/bin/env bash
# Create a volume for state persistence
docker volume create foo_data
# Run container for the first time
docker run -t -v foo_data:/foo foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -t -v foo_data:/foo foo
Now, truly, the second container will have data mounted to /foo that was there from the previous instance. I can do this without VOLUME in my Dockerfile. From the command line, I can turn any directory inside the container into a mount to either a bound directory on the host or a volume in Docker.
So my question is: What is the point of VOLUME when you have to explicitly map named volumes to containers via commands on the host anyway? Either I'm missing something or this is just confusing and obfuscated.
Note that all of my assertions here are based on my observations of how docker behaves, as well as what I've gathered from the documentation.
Instructions like VOLUME and EXPOSE are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9, almost three years ago.
Before Docker 1.9, running a container whose image had one or more VOLUME instructions (or using the --volume option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from option. Here are some articles that describe this outdated pattern.
Docker Data Containers
Why Docker Data Containers (Volumes!) are Good
Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
Today, I consider the VOLUME instruction as an advanced tool that should only be used for specialized cases, and after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem. Docker doesn't have to search through all the layers of the container image for file requests at /var/lib/postgresql/data.
However, the VOLUME instruction does come at a cost.
Users might not be aware that unnamed volumes are being created, or that these volumes continue to take up storage space on their Docker host after containers are removed.
There is no way to remove a volume declared in a Dockerfile. Downstream images cannot add data to paths where volumes exist.
The latter issue results in problems like these.
How to “undeclare” volumes in docker image?
GitLab on Docker: how to persist user data between deployments?
For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.
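On the first cost listed above, leftover volumes can be found with the dangling filter of docker volume ls. The sketch below is only one way to do it; the function names are hypothetical, and note that the filter matches any volume no container references, named ones included.

```shell
# List volumes that no container references any more.
dangling_volumes() {
  docker volume ls -q -f dangling=true
}

# Remove every dangling volume. This deletes the data permanently.
remove_dangling_volumes() {
  dangling_volumes | while read -r vol; do
    docker volume rm "$vol"
  done
}

# Usage:
#   dangling_volumes            # review the list first
#   remove_dangling_volumes
```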
tl;dr: VOLUME was designed for a world before Docker 1.9. Best to just leave it out.

What is the use of VOLUME in this official Dockerfile of postgresql

I found the following code in the Dockerfile of official postgresql. https://github.com/docker-library/postgres/blob/master/11/Dockerfile
ENV PGDATA /var/lib/postgresql/data
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA" # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
VOLUME /var/lib/postgresql/data
I want to know what is the purpose of VOLUME in this regard.
VOLUME /var/lib/postgresql/data
As per my understanding it will create a new storage volume when we start a container and that storage volume will also be deleted permanently when the container is removed (docker stop containerid; docker rm containerid)
If the data is not going to persist, then why use this? VOLUME is normally used when we want the data to persist.
My question is: what is its use if the postgres data only remains while the container exists, and after that everything is wiped out? If I have done a lot of work and in the end everything is gone, then what's the use?
As per my understanding it will create a new storage volume when we start a container and that storage volume will also be deleted permanently when the container is removed (docker stop containerid; docker rm containerid)
If you run a container with the --rm option, anonymous volumes are deleted when the container exits. If you do not pass the --rm option when creating the container, then the -v option to docker container rm will also delete volumes. Otherwise, these anonymous volumes will persist after a stop/rm.
That said, anonymous volumes are difficult to manage since it's not clear which volume contains what data. Particularly with images like postgresql, I would prefer if they removed the VOLUME line from their Dockerfile, and instead provided a compose file that defined the volume with a name. You can see more about what the VOLUME line does and why it creates problems in my answer over here.
Your understanding of how volumes work is almost correct, but not completely.
As you stated, when you create a container from an image defining a VOLUME, docker will indeed create an anonymous volume (i.e. with a random name).
When you stop/remove the container the volume itself will not be deleted and will still be accessible by the docker volume family of commands.
Indeed, in order to remove a container and delete its associated volumes you have to use the -v flag, as in docker rm -v container-name. This command removes the container and deletes all the anonymous volumes associated with it (named volumes are never deleted unless explicitly requested via docker volume rm volume-name).
So to sum up, the VOLUME directive inside a Dockerfile identifies the places that will host persistent data and ensures the following:
data will survive the life of the container by default
data can be shared with other containers (i.e. --volumes-from)
The most important aspect to me is that it also serves as a sort of implicit documentation for your user to let them know where the persistent state is kept (so that they can name the volumes via the -v flag of docker run).
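The difference between the two ways of removing a container can be sketched as follows. The function names are hypothetical, and the .Type field in the inspect template is assumed to be available, which is the case on reasonably recent Docker releases.

```shell
# Print the names of the volumes (anonymous or named) attached to a container.
container_volumes() {
  docker inspect --format \
    '{{range .Mounts}}{{if eq .Type "volume"}}{{.Name}}{{"\n"}}{{end}}{{end}}' "$1"
}

# Remove a container but keep its anonymous volumes (the default behaviour).
remove_keep_volumes() {
  docker rm "$1"
}

# Remove a container together with its anonymous volumes.
remove_with_volumes() {
  docker rm -v "$1"
}

# Usage:
#   container_volumes pg        # note the names before removing
#   remove_keep_volumes pg      # volumes still show up in "docker volume ls"
```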

understanding parameters passed to -v in Docker

If I run the command below in my terminal,
$ docker create -v /dbdata --name dbdata training/postgres /bin/true
I kind of understand how the -v option works now, but I still don't quite understand the command above. Specifically, I am having difficulty understanding how
-v /dbdata
works. What is that part of the command really doing?
The -v /dbdata allocates a Docker volume and makes it available at /dbdata inside the container. A Docker "volume" is a chunk of storage allocated by the docker storage engine that is distinct from the rest of your container filesystem.
Quoting from the documentation:
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
Data volumes are designed to persist data, independent of the container’s life cycle. Docker therefore never automatically deletes volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container.
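A short sketch of how the command from the question is typically used, assuming the same training/postgres image. The function names and the second container name db1 are illustrative.

```shell
# Create (but do not start) a container that owns an anonymous volume
# mounted at /dbdata.
create_data_container() {
  docker create -v /dbdata --name dbdata training/postgres /bin/true
}

# Start another container that mounts the same volume at the same path.
run_with_shared_volume() {
  docker run -d --volumes-from dbdata --name db1 training/postgres
}

# For comparison, the forms that -v accepts:
#   -v /dbdata               anonymous volume mounted at /dbdata
#   -v mydata:/dbdata        named volume "mydata" mounted at /dbdata
#   -v /host/dir:/dbdata     host directory bind-mounted at /dbdata
```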
