What is the use of VOLUME in this official Dockerfile of postgresql - docker

I found the following code in the Dockerfile of the official postgresql image: https://github.com/docker-library/postgres/blob/master/11/Dockerfile
ENV PGDATA /var/lib/postgresql/data
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA" # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
VOLUME /var/lib/postgresql/data
I want to know what is the purpose of VOLUME in this regard.
VOLUME /var/lib/postgresql/data
As per my understanding it will create a new storage volume when we start a container, and that storage volume will also be deleted permanently when the container is removed (docker stop containerid; docker rm containerid)
Then if the data is not going to persist, why use this at all? VOLUME is supposed to be used when we want the data to persist.
My question is: what is its use if the postgres data only remains while the container is running, and after that everything is wiped out? If I have done a lot of work and in the end everything is gone, then what's the use?

As per my understanding it will create a new storage volume when we start a container, and that storage volume will also be deleted permanently when the container is removed (docker stop containerid; docker rm containerid)
If you run a container with the --rm option, anonymous volumes are deleted when the container exits. If you do not pass the --rm option when creating the container, then the -v option to docker container rm will also delete volumes. Otherwise, these anonymous volumes will persist after a stop/rm.
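For example, a quick way to watch this lifecycle with the postgres image from the question (the container name pg and the password value are just illustrative):
docker run -d --name pg -e POSTGRES_PASSWORD=mysecretpassword postgres:11
docker volume ls      # a new anonymous volume with a random hash name appears
docker stop pg
docker rm pg          # without -v, the anonymous volume outlives the container
docker volume ls      # the volume is still listed
docker volume prune   # removes volumes no longer referenced by any container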
That said, anonymous volumes are difficult to manage since it's not clear which volume contains what data. Particularly with images like postgresql, I would prefer if they removed the VOLUME line from their Dockerfile, and instead provided a compose file that defined the volume with a name. You can see more about what the VOLUME line does and why it creates problems in my answer over here.
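For instance, a minimal compose file along those lines might look like this (a sketch; the service name db and volume name pgdata are my own placeholders, not something the image provides):
version: "2"
services:
  db:
    image: postgres:11
    environment:
      POSTGRES_PASSWORD: mysecretpassword
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata: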

Your understanding of how volumes work is almost correct, but not complete.
As you stated, when you create a container from an image defining a VOLUME, docker will indeed create an anonymous volume (i.e. with a random name).
When you stop/remove the container the volume itself will not be deleted and will still be accessible by the docker volume family of commands.
Indeed, in order to remove a container and delete its associated volumes you have to use the -v flag, as in docker rm -v container-name. This command will remove the container and delete all the anonymous volumes associated with it (named volumes are never deleted unless explicitly requested via docker volume rm volume-name).
So to sum up, the VOLUME directive inside a Dockerfile is used to identify the places that will host persistent data, ensuring the following:
data will survive the life of the container by default
data can be shared with other containers (i.e. --volumes-from)
The most important aspect to me is that it also serves as a sort of implicit documentation for your users, letting them know where the persistent state is kept (so that they can name the volumes via the -v flag of docker run).
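For example, knowing from the Dockerfile above that the state lives under /var/lib/postgresql/data, a user can swap the anonymous volume for a named one (the names some-postgres and pgdata are arbitrary, and POSTGRES_PASSWORD is just the image's required bootstrap setting):
docker run -d --name some-postgres \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -v pgdata:/var/lib/postgresql/data \
  postgres:11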

Related

Combining VOLUME + docker run -v

I was looking for an explanation of the VOLUME entry when writing a Dockerfile and came across this statement:
A volume is persistent data stored in /var/lib/docker/volumes/...
You can either declare it in a Dockerfile, which means each time a container is started from the image, the volume is created (empty), even if you don't have any -v option.
You can declare it at runtime with docker run -v [host-dir:]container-dir.
combining the two (VOLUME + docker run -v) means that you can mount the content of a host folder into your volume persisted by the container in /var/lib/docker/volumes/...
docker volume create creates a volume without having to define a Dockerfile and build an image and run a container. It is used to quickly allow other containers to mount said volume.
But I'm having a hard time understanding this line:
...combining the two (VOLUME + docker run -v) means that you can mount the content of a host folder into your volume persisted by the container in /var/lib/docker/volumes/...
For example, let's say I have a config file on my host machine and I run the container based off the image I made with the Dockerfile I wrote. Will it copy the config file into the volume that I declared in the VOLUME entry?
Would it be something like (pseudocode)
#dockerfile
FROM ubuntu
RUN apt-get update
RUN apt-get install mysql
VOLUME . /etc/mysql/conf.d
CMD systemctl start mysql
And when I run it
docker run -it -v /path/to/config/file: ubuntu_based_image
Is this what they mean?
You probably don't want VOLUME in your Dockerfile. It isn't required in order to mount files or directories at runtime, and it has confusing side effects, like making subsequent RUN commands silently discard any changes they make to that directory.
If an image does have a VOLUME, and you don't mount anything else there when you start the container, Docker will create an anonymous volume and mount it for you. This can result in space leaks if you don't clean these volumes up.
You can use a docker run -v option on any container directory regardless of whether or not it's declared as a VOLUME.
If you docker run -v /host/path:/container/path, the two directories are actually the same; nothing is copied, and writes to one are (supposed to be) immediately visible on the other.
docker run -v /host/path:/container/path bind mounts aren't visible in /var/lib/docker at all.
You shouldn't usually be looking at content in /var/lib/docker (and can't if you're not on a native-Linux host). If you need to access the volume file content directly, use a bind mount rather than a named or anonymous volume.
Bind mounts like you've shown are appropriate for injecting config files into containers, and for reading log files back out. Named volumes are appropriate for stateful applications' storage, like the data for a MySQL database. Neither type of volume is appropriate for code or libraries; build these directly into Docker images instead.
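To make that concrete with the MySQL example from the question, a sketch using the official mysql image's documented paths (the my.cnf file and the mysql-data volume name are placeholders of mine):
# bind mount injects the config file read-only; the named volume holds the database files
docker run -d --name some-mysql \
  -e MYSQL_ROOT_PASSWORD=my-secret-pw \
  -v "$PWD/my.cnf":/etc/mysql/conf.d/my.cnf:ro \
  -v mysql-data:/var/lib/mysql \
  mysql:8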

Google Cloud docker image file getting deleted

I am running a docker image for Jupyter and TensorBoard. The data seems to get deleted every time the VM instance is stopped. Is there a way to stop this from happening? I couldn't find anything on the web that would allow me to do this.
TL;DR: You are not persisting your data.
Docker containers do not persist data out of the box; you need to explicitly tell Docker to persist any data created inside the container when the container is deleted.
You can read more on the Use volumes page of the Docker documentation.
If you want to persist data you need to follow these steps:
Create a local directory inside the VM where you want to persist data. This command should be executed on the GCE instance:
mkdir -p /opt/data/jupyterdata
Set the ownership of the folder to the UID of the user inside your container. For example, let's imagine that your container lspvic/tensorboard-notebook runs the application as the user tensorflow with UID 1500. Then you need to set the ownership of your folder to UID 1500:
chown 1500:1500 /opt/data/jupyterdata -R
Modify your docker run command to mount the local directory as a volume inside the container. For example, let's imagine that inside your container you want to save the files at /var/lib/jupyter (this is just an example); you will need to modify the docker run command as follows:
docker run -it --rm -p 8888:8888 \
-v /opt/data/jupyterdata:/var/lib/jupyter:Z \
lspvic/tensorboard-notebook
NOTE: the :Z option is needed to avoid SELinux issues.
With these steps, the data saved in /var/lib/jupyter inside the container will also be saved in /opt/data/jupyterdata on the VM, so no more data loss.

Move docker bind-mount to volume

Currently, I run my containers like this, for example:
docker run -v /nexus-data:/nexus-data sonatype/nexus3
^
After reading the documentation, I discovered volumes, which are completely managed by Docker. For various reasons, I want to change the way I run my containers, to something like this:
docker run -v nexus-data:/nexus-data sonatype/nexus3
^
I want to migrate my existing bind mounts to volumes.
But I don't want to lose the data in the /nexus-data folder. Is there a way to transfer this folder to the new volume without restarting everything? I also have Jenkins and Sonar containers, for example; I just want to change the way I keep persistent data. Is there a proper way to do this?
You can try the following steps so that you will not lose your current nexus-data.
#>docker run -v nexus-data:/nexus-data sonatype/nexus3
#>docker cp /nexus-data/. <container-name-or-id>:/nexus-data/
#>docker stop <container-name-or-id>
#>docker start <container-name-or-id>
docker cp will copy data from your host machine's /nexus-data folder to the container's /nexus-data folder, which is your mounted volume.
Let me know if you face any issues while performing these steps.
Here's another way to do this, that I just used successfully with a Heimdall container. It's outlined in the documentation for the sonatype/nexus3 image:
Stop the running container (e.g. named nexus3)
Create a docker volume called nexus-data with the following command: docker volume create nexus-data
By default, Docker will store the volume's content at /var/lib/docker/volumes/nexus-data/_data/
Simply copy the directory where you previously had been using a bind mount to the aforementioned volume directory (you'll need super user privileges to do this, or for the user to be part of the docker group): cp -R /path/to/nexus-data/* /var/lib/docker/volumes/nexus-data/_data/
Restart the nexus3 container with $ docker run --name=nexus3 -v nexus-data:/nexus-data sonatype/nexus3
Your container will be back up and running, with the files persisted in the directory /path/to/nexus-data/ now mirrored in the docker volume. Check if functionality is the same, of course, and if so, you can delete the /path/to/nexus-data/ directory
Q.E.D.
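Condensed into commands, and assuming the default /var/lib/docker/volumes location plus the placeholder paths above, the migration is roughly:
docker stop nexus3
docker rm nexus3        # free the container name; the data lives on the host, not in the container
docker volume create nexus-data
sudo cp -a /path/to/nexus-data/. /var/lib/docker/volumes/nexus-data/_data/
docker run -d --name nexus3 -v nexus-data:/nexus-data sonatype/nexus3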

How to re-bind docker volume after removing data-only container?

Following the example in the official guide on Managing Data in Containers, suppose there is a data-only container created via
docker create -v /data --name data-container image /bin/true
Also suppose there is at least one running consumer container, using the volume created in the data-only container:
docker run -d --volumes-from data-container --name consumer-container image
Now consider deletion of the data-container with the -v switch while the consumer is still running:
docker rm -v data-container
I understand the docs say that this does not affect the consumer-container, because the volume actually remains available:
"To delete the volume from disk, you must explicitly call docker rm -v
against the last container with a reference to the volume."
So, that is comparable to the unlink() system call. The data volume remains available, in a "pending" state. With "pending" I mean that this exact volume cannot be bound to other containers anymore by simply using --volumes-from. Speaking in terms of the unlink() analogy, there is no name anymore by which the data could be accessed.
My question is: is there a way to re-create a data-only container based on this pending volume so that it can be included again via --volumes-from? Or is such a "pending" volume condemned, in the sense that once the last consumer stops the data will be gone?
It is possible to determine the location of the volume on the underlying host with docker inspect --format="{{.Volumes}}" <Container ID>.
Whilst the consumer-container is still running, you could possibly, on the host, copy the contents of the volume to another location and create a new data-container based on the contents of that new location.
A downside is that further changes might be written to the original volume in the time between the end of your copy and the removal of the consumer-container (which is what finally deletes the original volume).
It's not ideal then, but it's one way I could see that this might be possible.
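A rough sketch of that idea while the consumer is still running (the copy destination and image name are placeholders, and on newer Docker versions the volume paths appear under .Mounts rather than .Volumes):
docker inspect --format '{{ .Volumes }}' consumer-container
# prints something like: map[/data:/var/lib/docker/vfs/dir/<volume id>]
sudo cp -a /var/lib/docker/vfs/dir/<volume id>/. /srv/data-copy/
docker create -v /srv/data-copy:/data --name data-container-2 image /bin/true
docker run -d --volumes-from data-container-2 --name consumer-container-2 image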

How Docker container volumes work even when they aren't running?

Take a typical data only Docker container:
FROM stackbrew/busybox:latest
RUN mkdir /data
VOLUME /data
Now I have seen a great deal of them that are run like this:
docker run --name my-data data true
The true command exits as soon as it runs, and so does the container. But surprisingly it continues to serve the volume when you connect it with another container via --volumes-from my-data.
My question is, how does that work? How does a stopped container still allow access to its volumes?
Volumes in docker are not a top-level thing. They are "simply" part of a container's metadata.
When you have VOLUME in your Dockerfile or start a container with -v, Docker will create a directory in /var/lib/docker/volumes* with a random ID (this is the exact same process as creating an image with commit, except it is empty) and add that random ID to the container's metadata.
When the container starts, Docker will bind-mount the directory /var/lib/docker/volumes/* at the given location for that volume.
When you use --volumes-from, Docker will just look up the volume ID and location from the other container, running or not, and bind-mount the directory at the set location.
Volumes are not tied to the container's runtime; they are just directories that get mounted.
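A quick way to see this in practice (busybox is just a convenient throwaway image):
docker run --rm --volumes-from my-data busybox ls /data
# the my-data container exited right after "true", yet its /data volume mounts fine here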
* With newer versions, Docker now uses the vfs driver for storage and /var/lib/docker/volumes/ is used only for metadata like size, creation time, etc. The actual data is stored in /var/lib/docker/vfs/dir/<volume id>
