Is there any way currently to create a dynamically named volume during Docker's build process? I'd like to see something like:
sudo docker run -e MOUNT_POINT="/path/to/mount" module/sub-module
and then in the Dockerfile have something like:
ln -s /internal/path/to/storage $MOUNT_POINT
VOLUME [$MOUNT_POINT]
This would allow the highly valuable volumes-from directive to be used but each storage container built could have a variant mount point (and avoid colliding with a consumer who wanted to consume more than one data-volume-container).
Any ideas would be VERY welcome.
Here is how one should use volumes.
You have one container, say your application container for e.g. a database.
You have another container, say your volumes container actually holding your data.
You start your volumes container with the volumes parameter -v. Here you can name your volume dynamically.
You start your application container with the option --volumes-from using your volumes container.
See the docs for detailed information https://docs.docker.com/userguide/dockervolumes/
Related
First of all, I want to make it clear I've done due diligence in researching this topic. Very closely related is this SO question, which doesn't really address my confusion.
I understand that when VOLUME is specified in a Dockerfile, this instructs Docker to create an unnamed volume for the duration of the container which is mapped to the specified directory inside of it. For example:
# Dockerfile
VOLUME ["/foo"]
This would create a volume to contain any data stored in /foo inside the container. The volume (when viewed via docker volume ls) would show up as a random jumble of numbers.
Each time you do docker run, this volume is not reused. This is the key point causing confusion here. To me, the goal of a volume is to contain state persistent across all instances of an image (all containers started from it). So basically if I do this, without explicit volume mappings:
#!/usr/bin/env bash
# Run container for the first time
docker run -t foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -t foo
I expect the unnamed volume to be reused between the 2 run commands. However, this is not the case. Because I did not explicitly map a volume via the -v option, a new volume is created for each run.
Here's important part number 2: Since I'm required to explicitly specify -v to share persistent state between run commands, why would I ever specify VOLUME in my Dockerfile? Without VOLUME, I can do this (using the previous example):
#!/usr/bin/env bash
# Create a volume for state persistence
docker volume create foo_data
# Run container for the first time
docker run -t -v foo_data:/foo foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -t -v foo_data:/foo foo
Now, truly, the second container will have data mounted to /foo that was there from the previous instance. I can do this without VOLUME in my Dockerfile. From the command line, I can turn any directory inside the container into a mount to either a bound directory on the host or a volume in Docker.
So my question is: What is the point of VOLUME when you have to explicitly map named volumes to containers via commands on the host anyway? Either I'm missing something or this is just confusing and obfuscated.
Note that all of my assertions here are based on my observations of how docker behaves, as well as what I've gathered from the documentation.
Instructions like VOLUME and EXPOSE are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9, almost three years ago.
Before Docker 1.9, running a container whose image had one or more VOLUME instructions (or using the --volume option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from option. Here's some articles that describe this outdated pattern.
Docker Data Containers
Why Docker Data Containers (Volumes!) are Good
Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
Today, I consider the VOLUME instruction as an advanced tool that should only be used for specialized cases, and after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem. Docker doesn't have to search through all the layers of the container image for file requests at /var/lib/postgresql/data.
However, the VOLUME instruction does come at a cost.
Users might not be aware of the unnamed volumes being created, and continuing to take up storage space on their Docker host after containers are removed.
There is no way to remove a volume declared in a Dockerfile. Downstream images cannot add data to paths where volumes exist.
The latter issue results in problems like these.
How to “undeclare” volumes in docker image?
GitLab on Docker: how to persist user data between deployments?
For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.
tl;dr: VOLUME was designed for a world before Docker 1.9. Best to just leave it out.
First of all, I want to make it clear I've done due diligence in researching this topic. Very closely related is this SO question, which doesn't really address my confusion.
I understand that when VOLUME is specified in a Dockerfile, this instructs Docker to create an unnamed volume for the duration of the container which is mapped to the specified directory inside of it. For example:
# Dockerfile
VOLUME ["/foo"]
This would create a volume to contain any data stored in /foo inside the container. The volume (when viewed via docker volume ls) would show up as a random jumble of numbers.
Each time you do docker run, this volume is not reused. This is the key point causing confusion here. To me, the goal of a volume is to contain state persistent across all instances of an image (all containers started from it). So basically if I do this, without explicit volume mappings:
#!/usr/bin/env bash
# Run container for the first time
docker run -t foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -t foo
I expect the unnamed volume to be reused between the 2 run commands. However, this is not the case. Because I did not explicitly map a volume via the -v option, a new volume is created for each run.
Here's important part number 2: Since I'm required to explicitly specify -v to share persistent state between run commands, why would I ever specify VOLUME in my Dockerfile? Without VOLUME, I can do this (using the previous example):
#!/usr/bin/env bash
# Create a volume for state persistence
docker volume create foo_data
# Run container for the first time
docker run -t -v foo_data:/foo foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -t -v foo_data:/foo foo
Now, truly, the second container will have data mounted to /foo that was there from the previous instance. I can do this without VOLUME in my Dockerfile. From the command line, I can turn any directory inside the container into a mount to either a bound directory on the host or a volume in Docker.
So my question is: What is the point of VOLUME when you have to explicitly map named volumes to containers via commands on the host anyway? Either I'm missing something or this is just confusing and obfuscated.
Note that all of my assertions here are based on my observations of how docker behaves, as well as what I've gathered from the documentation.
Instructions like VOLUME and EXPOSE are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9, almost three years ago.
Before Docker 1.9, running a container whose image had one or more VOLUME instructions (or using the --volume option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from option. Here's some articles that describe this outdated pattern.
Docker Data Containers
Why Docker Data Containers (Volumes!) are Good
Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
Today, I consider the VOLUME instruction as an advanced tool that should only be used for specialized cases, and after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem. Docker doesn't have to search through all the layers of the container image for file requests at /var/lib/postgresql/data.
However, the VOLUME instruction does come at a cost.
Users might not be aware of the unnamed volumes being created, and continuing to take up storage space on their Docker host after containers are removed.
There is no way to remove a volume declared in a Dockerfile. Downstream images cannot add data to paths where volumes exist.
The latter issue results in problems like these.
How to “undeclare” volumes in docker image?
GitLab on Docker: how to persist user data between deployments?
For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.
tl;dr: VOLUME was designed for a world before Docker 1.9. Best to just leave it out.
I want to know why we have two different options to do the same thing, What are the differences between the two.
We basically have 3 types of volumes or mounts for persistent data:
Bind mounts
Named volumes
Volumes in dockerfiles
Bind mounts are basically just binding a certain directory or file from the host inside the container (docker run -v /hostdir:/containerdir IMAGE_NAME)
Named volumes are volumes which you create manually with docker volume create VOLUME_NAME. They are created in /var/lib/docker/volumes and can be referenced to by only their name. Let's say you create a volume called "mysql_data", you can just reference to it like this docker run -v mysql_data:/containerdir IMAGE_NAME.
And then there's volumes in dockerfiles, which are created by the VOLUME instruction. These volumes are also created under /var/lib/docker/volumes but don't have a certain name. Their "name" is just some kind of hash. The volume gets created when running the container and are handy to save persistent data, whether you start the container with -v or not. The developer gets to say where the important data is and what should be persistent.
What should I use?
What you want to use comes mostly down to either preference or your management. If you want to keep everything in the "docker area" (/var/lib/docker) you can use volumes. If you want to keep your own directory-structure, you can use binds.
Docker recommends the use of volumes over the use of binds, as volumes are created and managed by docker and binds have a lot more potential of failure (also due to layer 8 problems).
If you use binds and want to transfer your containers/applications on another host, you have to rebuild your directory-structure, where as volumes are more uniform on every host.
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure of the host machine, volumes are completely managed by Docker. Volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container. More on
Differences between -v and --mount behavior
Because the -v and --volume flags have been a part of Docker for a long time, their behavior cannot be changed. This means that there is one behavior that is different between -v and --mount.
If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory.
If you use --mount to bind-mount a file or directory that does not yet exist on the Docker host, Docker does not automatically create it for you, but generates an error. More on
Docker for Windows shared folders limitation
Docker for Windows does make much of the VM transparent to the Windows host, but it is still a virtual machine. For instance, when using –v with a mongo container, MongoDB needs something else supported by the file system. There is also this issue about volume mounts being extremely slow.
More on
Bind mounts are like a superset of Volumes (named or unnamed).
Bind mounts are created by binding an existing folder in the host system (host system is native linux machine or vm (in windows or mac)) to a path in the container.
Volume command results in a new folder, created in the host system under /var/lib/docker
Volumes are recommended because they are managed by docker engine (prune, rm, etc).
A good use case for bind mount is linking development folders to a path in the container. Any change in host folder will be reflected in the container.
Another use case for bind mount is keeping the application log which is not crucial like a database.
Command syntax is almost the same for both cases:
bind mount:
note that the host path should start with '/'. Use $(pwd) for convenience.
docker container run -v /host-path:/container-path image-name
unnamed volume:
creates a folder in the host with an arbitrary name
docker container run -v /container-path image-name
named volume:
should not start with '/' as this is reserved for bind mount.
'volume-name' is not a full path here. the command will cause a folder to be created with path "/var/lib/docker/volumes/volume-name" in the host.
docker container run -v volume-name:/container-path image-name
A named volume can also be created beforehand a container is run (docker volume create). But this is almost never needed.
As a developer, we always need to do comparison among the options provided by tools or technology. For Volume & Bind mounts, I would suggest to list down what kind of application you are trying to containerize.
Following are the parameters that I would consider before choosing Volume over Bind Mounts:
Docker provide various CLI commands to Volumes easily outside containers.
For backup & restore, Volume is far easier than Bind as it depends upon the underlying host OS.
Volumes are platform-agnostic so they can work on Linux as well as on Window containers.
With Bind, you have 2 technologies to take care of. Your host machine directory structure as well as Docker.
Migration of Volumes are easier not only on local machines but on cloud machines as well.
Volumes can be easily shared among multiple containers.
Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I undestand the documentation, you have two options:
First option
define managed volumes for both, log- and db-files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options realy valid/ possible?
What is the better way to do it?
br volker
The answer to question 1 is that, yes both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely and which one to choose depends on whether or not this is a mission critical system and that data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could image these steps:
Create a reliable disk e.g. NFS that is backed-up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into my database container which then writes database files to this disk.
So following the above example, lets say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at Docker Volume API which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you created named volumes and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem if your Docker host. You can use:
docker volume inspect db-data
To find out where the data is being stored and back-up that location if you want to.
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.
What is the best way to persist containers data with docker? I would like to be able to retain some data and be able to get them back when restarting my container. I have read this interesting post but it does not exactly answer my question.
As far as I understand, I only have one option:
docker run -v /home/host/app:/home/container/app
This will mount the countainer folder onto the host.
Is there any other option? FYI, I don't use linking containers (--link )
Using volumes is the best way of handling data which you want to keep from a container. Using the -v flag works well and you shouldn't run into issues with this.
You can also use the VOLUME instruction in the Dockerfile which means you will not have to add any more options at run time, however they're quite tightly coupled with the specific container, you'd need to use docker start, rather than docker run to get the data back (or of course -v to the volume which was created in the past, likely in /var/ somewhere).
A common way of handling volumes is to create a data volume container with volumes defined by -v Then when you create your app container, use the --volumes-from flag. This will make your new container use the same volumes as the container you used the -v on (your data volume container). Of course this may seem like you're shifting the issue somewhere else.
This makes it quite simple to share volumes over multiple containers. Perhaps you have a container for your application, and another for logstash.
create a volume-container: this format of -v creates a volume, directory e.g. /var/lib/docker/volume/d3b0d5b781b7f92771b7342824c9f136c883af321a6e9fbe9740e18b93f29b69
which is still a bind mounted /container/path/vol
docker run -v /foo/bar/vol --name volbox ubuntu
I can now use this container, as my volume.
docker run --volumes-from volbox --name foobox ubuntu /bin/bash
root#foobox# ls /container/path/vol
Now, if I distribute these two containers, they will just work. The volume will always be available to foobox, regardless which host it is deployed to.
The snag of course comes if you don't want your storage to be in /var/lib/docker/volumes...
I suggest you take a look at some of the excellent post by Michael Crosby
https://docs.docker.com/userguide/dockervolumes/
and the docker docs
https://docs.docker.com/userguide/dockervolumes/