Why have docker-compose volumes to be declared twice when not pointing to an actual folder on the host? - docker

In a docker-compose.yml file, do I really need to specify the volumes twice; inside and outside a service? If yes, Why? (the docker-compose part of the doc doesn't have much information on that)
I have the feeling that, in the case shown here where the myapp volume is not explicitly a folder on the host machine, we have to set it twice, but if it actually is a folder on the host machine, specifying it only inside the frontend service block is enough.
In addition, when the volume is specified outside the service block, it's almost always written as key: without any actual value (or sometime as key: {}), which makes me confused.
Moreover, when I run docker-compose down -v it actually only applies on volumes which are not explicitly specified as a folder on the host machine, according to the doc:
-v, --volumes Remove named volumes declared in the `volumes`
section of the Compose file and anonymous volumes
attached to containers.
So maybe the declaration of a volume outside a service is for making this volume identifiable, hence 'removable'. And on the other hand, it will never be removable if it's not set outside the service?

This is a whole bunch of questions - let's try to answer them sequentially:
1. Do I really need to specify the volumes twice (inside and outside the services section)?
This is not a duplicate specification: outside you declare the volume and inside you specify how to mount it into a container. A volume has an independent life cycle from services. It can be mounted by several services and it will retain data if services are restarted.
2. When the volume is specified outside the service block, it's almost always written as key: {}
This key-only notation is the default and does not require any driver configuration. However, if you needed to e.g. connect to NFS, you would have something like:
volumes:
example:
driver_opts:
type: "nfs"
o: "addr=10.40.0.199,nolock,soft,rw"
device: ":/docker/example"
Also, please differentiate between bind mounts and regular volumes. While regular volumes are managed independently from services (and containers), e.g. with docker volumes list, bind-mounts are mere mappings between the host and the container file system. They are tied to the container they are mounted to.
3. When I run docker-compose down -v it actually only applies on volumes which are not explicitly specified as a folder on the host machine
Yes, this won't remove bind mounts, since bind mounts are mere host-container filesystem mappings and therefore Docker does not create an independent volume entity for them.
For deeper understanding, please consider this excerpt from the documentation:
Bind mounts have been around since the early days of Docker. Bind
mounts have limited functionality compared to volumes. When you use a
bind mount, a file or directory on the host machine is mounted into a
container. The file or directory is referenced by its absolute path on
the host machine. By contrast, when you use a volume, a new directory
is created within Docker’s storage directory on the host machine, and
Docker manages that directory’s contents.

Related

"a bind mount won't copy the container contents to the host automatically, unlike a named volume"

Need clarity on a comment here:
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
docs.docker.com/compose/compose-file/#volumes
Is this accurate? If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
how did Docker persist data across container restarts before there were named volumes?
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
Is this accurate?
Close to accurate, but I can see the confusion. Host volumes, aka bind mounts, do not have an initialization feature from docker. With anonymous and named volumes, docker will initialize the volume with the contents of the image at that path. This initialization includes ownership and permissions which helps avoid permission errors. This initialization only runs when the container is created and the volume is new or empty, so subsequent containers will not pickup changes to the image made in newer image versions.
If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data
in case of a container restart)?
Reads and writes from the app in the container will continue through to the host filesystem used in the bind mount as expected. It's only the initialization step that doesn't run.
how did Docker persist data across container restarts before there were named volumes?
There were data containers, mounting volumes from other containers, but this was inflexible (all volume paths were fixed to the path in the data container) and mixed management of persistent data with ephemeral containers, and has therefore been phased out.
Volumes are used to handle data persistence between containers. A single container restarting (rather than being replaced) will still have all the container specific filesystem changes. The docker rm command deletes these filesystem changes, along with container logs and metadata/configuration of the container.
The container specific changes are the read/write top layer of an overlay filesystem used by docker. Volume mounts are all separate mounts into subdirectories of this overlay filesystem (just like /home or /var are often separate filesystem mounts in the / filesystem of a Linux host, all reads and writes to those other paths go to a separate underlying filesystem).
If you're going to mount a volume into a container, and you want that volume to reliably contain some content from the image, you need to manually copy it there at container startup time. One way to do this is with an entrypoint wrapper script:
#!/bin/sh
# Copy data into a possibly-mounted location
cp -a /app/static /var/www
# Then run the image's CMD
exec "$#"
You'd include this in your image's Dockerfile
# Must use JSON-array syntax
ENTRYPOINT ["/app/entrypoint.sh"]
CMD same as it was before
There are two important details about Docker named volumes' initialization behavior to be aware of here. The first, which you note, is that Docker only copies content into a volume for Docker named volumes; it doesn't happen for bind mounts, and it doesn't happen in other environments like Kubernetes.
The second, more subtle detail is that the initialization only happens the first time the container runs. If there's already content in a volume that you mount into a container, it will hide what was already there. In other SO questions you can see this manifest as, for example, "I added a package to my Node package.json file, but when I put the node_modules directory in a volume, it ignores the update" or "I'm using a volume to export content to an nginx proxy but it doesn't update".
I think #BMitch having the accepted answer is correct, but I will just try to add in some details with the hope of being useful.
Is this accurate? If yes, then:
Given it is my claim being scrutinised - I totally defer to #BMitch here :)!
However I would also add:
https://github.com/docker/compose/issues/4581#issuecomment-389559090
Provides a layman explanation of how named volumes / host volumes behave
My explanation needs updated to reflect the notion of 'initialization'
https://stackoverflow.com/a/40030535/3080207
This is how I would recommend setting up volumes in docker-compose at the moment, courtesy of #kaiser
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
Both host volumes and named volumes can achieve this.
I think the point of contention is what you want to happen on the:
first run of the container
subsequent runs of the container and
the location/accessibility of the volume on the host system.
Once a volume is attached to a container (be it a named volume or bind mount), whatever is stored to that volume should be persisted between restarts - that effectively comes for free. This assumes the same docker-compose config, and no manual removal of volumes.
Previously it was a bit limiting using a named volume, as you couldn't tail logs, or edit code directly from the host as easily as you could with a bind mount - but it seems that problem is resolved / has a work around now.
Bind mounts are able to persist data between restarts. I personally find that bind volumes do what I want 99% of the time, that being said, named volumes can now 'do it all' and I'd be using those moving forward.
There are differences between them though, and I'm sure they'll still bite people occasionally, requiring them to reach out to actual experts, instead of users like me :).

How can I share data between host and container using mounts

I've been attempting to share data between my host and my container. I've been reading a lot about volumes and I believe I have misunderstood some of the fundamentals around sharing data.
Here's how I've been doing it (with Docker Compose)
version: "2"
services:
my-server:
volumes:
- type: bind
source: ./test/
target: /var/logs
The problem with this approach is that the initial creation of the mount destroys any data in the target folder. So for example if my image was built from another image that had some logs in that folder (for whatever reason), the logs would be destroyed.
This is a major problem with my use case. I need to mount a volume (a folder, basically) so that I can share data between my host and guest, similar to how a shared folder with a VM would work.
I've looked into named volumes but from what I understand, named and anonymous volumes are designed to share data between containers, and not to share data with the host (which is what I need for my use case).
So besides bind mounts, is it possible share data between the host and container?
This is not really a Docker problem. I think you'll run into this with any mount. Basically you are already using the correct mechanism for sharing data between the host and your container.
When you mount something in linux, the mount target (i.e. the path at which you mount something) is always replaced with the root of whatever you mount. It does not merge the contents of the mount target with the contents of the (in this case) bind mounted directory. I'm surprised that works with VM shared folders because you run a high risk of a collision. e.g. same file in both locations. How would it resolve that? File system mounts are not the same as a dropbox like synchronisation of files between two locations.
I suggest that you do your bind mount to somewhere else in your container which has no contents and then modify your in-container workflow to handle this. In your example it sounds like you are attempting to collect logs. It also sounds like the containers configured log directory might have some contents which you want to be copied to the host. You could achieve this by having your container init itself by configuring a new log directory before starting your services/running anything, and copying any existing logs to that location. This new location would be the bind mount. Your init script could also detect if the bind mount was already used in this fashion and not sync over the data. This is really an application specific problem.

What Is The Difference Between Binding Mounts And Volumes While Handling Persistent Data In Docker Containers?

I want to know why we have two different options to do the same thing, What are the differences between the two.
We basically have 3 types of volumes or mounts for persistent data:
Bind mounts
Named volumes
Volumes in dockerfiles
Bind mounts are basically just binding a certain directory or file from the host inside the container (docker run -v /hostdir:/containerdir IMAGE_NAME)
Named volumes are volumes which you create manually with docker volume create VOLUME_NAME. They are created in /var/lib/docker/volumes and can be referenced to by only their name. Let's say you create a volume called "mysql_data", you can just reference to it like this docker run -v mysql_data:/containerdir IMAGE_NAME.
And then there's volumes in dockerfiles, which are created by the VOLUME instruction. These volumes are also created under /var/lib/docker/volumes but don't have a certain name. Their "name" is just some kind of hash. The volume gets created when running the container and are handy to save persistent data, whether you start the container with -v or not. The developer gets to say where the important data is and what should be persistent.
What should I use?
What you want to use comes mostly down to either preference or your management. If you want to keep everything in the "docker area" (/var/lib/docker) you can use volumes. If you want to keep your own directory-structure, you can use binds.
Docker recommends the use of volumes over the use of binds, as volumes are created and managed by docker and binds have a lot more potential of failure (also due to layer 8 problems).
If you use binds and want to transfer your containers/applications on another host, you have to rebuild your directory-structure, where as volumes are more uniform on every host.
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure of the host machine, volumes are completely managed by Docker. Volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container. More on
Differences between -v and --mount behavior
Because the -v and --volume flags have been a part of Docker for a long time, their behavior cannot be changed. This means that there is one behavior that is different between -v and --mount.
If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory.
If you use --mount to bind-mount a file or directory that does not yet exist on the Docker host, Docker does not automatically create it for you, but generates an error. More on
Docker for Windows shared folders limitation
Docker for Windows does make much of the VM transparent to the Windows host, but it is still a virtual machine. For instance, when using –v with a mongo container, MongoDB needs something else supported by the file system. There is also this issue about volume mounts being extremely slow.
More on
Bind mounts are like a superset of Volumes (named or unnamed).
Bind mounts are created by binding an existing folder in the host system (host system is native linux machine or vm (in windows or mac)) to a path in the container.
Volume command results in a new folder, created in the host system under /var/lib/docker
Volumes are recommended because they are managed by docker engine (prune, rm, etc).
A good use case for bind mount is linking development folders to a path in the container. Any change in host folder will be reflected in the container.
Another use case for bind mount is keeping the application log which is not crucial like a database.
Command syntax is almost the same for both cases:
bind mount:
note that the host path should start with '/'. Use $(pwd) for convenience.
docker container run -v /host-path:/container-path image-name
unnamed volume:
creates a folder in the host with an arbitrary name
docker container run -v /container-path image-name
named volume:
should not start with '/' as this is reserved for bind mount.
'volume-name' is not a full path here. the command will cause a folder to be created with path "/var/lib/docker/volumes/volume-name" in the host.
docker container run -v volume-name:/container-path image-name
A named volume can also be created beforehand a container is run (docker volume create). But this is almost never needed.
As a developer, we always need to do comparison among the options provided by tools or technology. For Volume & Bind mounts, I would suggest to list down what kind of application you are trying to containerize.
Following are the parameters that I would consider before choosing Volume over Bind Mounts:
Docker provide various CLI commands to Volumes easily outside containers.
For backup & restore, Volume is far easier than Bind as it depends upon the underlying host OS.
Volumes are platform-agnostic so they can work on Linux as well as on Window containers.
With Bind, you have 2 technologies to take care of. Your host machine directory structure as well as Docker.
Migration of Volumes are easier not only on local machines but on cloud machines as well.
Volumes can be easily shared among multiple containers.

What's the point of data-only docker containers?

Instead of using a data-only container, I can ...
create a directory on the host (say /opt/shared_data)
Run every container with -v /opt/shared_data:/some/mount/point_inside/container
voila, now /opt/shared_data is effectively shared amongst all containers , correct?
If my understanding is correct, if I create a data-only container and then use "--volumes-from" when running other containers, I am stuck mounting them in the same location they were mounted, whereas, this way I get to choose which directory they are mounted as in my containers.
So why do I need "data-only" containers? Besides, the volume just points to somewhere on the host (/var/lib/docker/volumes?) which is functionally equivalent to my /opt/shared_data anyway right? Whats the advantage of the former?
Data containers have been largely deprecated in favor of named volumes. There's really no advantage to using a data container over a named volume, and includes the disadvantage of being stuck with the mount points.
To compare named volumes with host volumes (aka bind mounts), you have have a few differences:
Host volumes include permission issues, users inside the container will differ from those outside the container and files may not be easily accessed from both environments
Named volumes add the ability to use any volume driver so you can mount your data from remote locations.
Named volumes are initialized to the contents of the image at that path, including all files and any directory permissions.
The latter point is a big one for me, it means you can create an initial default value for a data folder in your image, but update it using the container and keep those changes in a named volume. With bind mounts, if the directory is empty or doesn't exist, that's also what you get when you mount it in your container.

docker data volume vs mounted host directory

We can have a data volume in docker:
$ docker run -v /path/to/data/in/container --name test_container debian
$ docker inspect test_container
...
Mounts": [
{
"Name": "fac362...80535",
"Source": "/var/lib/docker/volumes/fac362...80535/_data",
"Destination": "/path/to/data/in/container",
"Driver": "local",
"Mode": "",
"RW": true
}
]
...
But if the data volume lives in /var/lib/docker/volumes/fac362...80535/_data, is it any different from having the data in a folder mounted using -v /path/to/data/in/container:/home/user/a_good_place_to_have_data?
Although using volumes and bind mounts feels the same (with the only change being the location of the directory), there are differences in behavior.
Volumes vs Bind Mounts
With Bind Mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full or relative path on the host machine.
With Volume, a new directory is created within Docker's storage directory on the host machine, and Docker manages that directory's content.
Volumes advantages over bind mounts:
Volumes are easier to back up or migrate than bind mounts.
You can manage volumes using Docker CLI commands or the Docker API.
Volumes work on both Linux and Windows containers.
Volumes can be more safely shared among multiple containers.
Volume drivers allow you to store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
A new volume’s contents can be pre-populated by a container.
EDIT (9.9.2019):
According to #Sebi2020 comment, Bind mounts are much easier to backup. Docker doesn't provide any command to backup volumes. You have to use temporary containers with a bind mount to create backups.
Volumes
Created and managed by Docker. You can create a volume explicitly
using the docker volume create command, or Docker can create a volume
during container or service creation.
When you create a volume, it is stored within a directory on the
Docker host. When you mount the volume into a container, this
directory is what is mounted into the container. This is similar to
the way that bind mounts work, except that volumes are managed by
Docker and are isolated from the core functionality of the host
machine.
A given volume can be mounted into multiple containers simultaneously.
When no running container is using a volume, the volume is still
available to Docker and is not removed automatically. You can remove
unused volumes using docker volume prune.
When you mount a volume, it may be named or anonymous. Anonymous
volumes are not given an explicit name when they are first mounted
into a container, so Docker gives them a random name that is
guaranteed to be unique within a given Docker host. Besides the name,
named and anonymous volumes behave in the same ways.
Volumes also support the use of volume drivers, which allow you to
store your data on remote hosts or cloud providers, among other
possibilities.
Bind mounts
Available since the early days of Docker. Bind mounts have limited
functionality compared to volumes. When you use a bind mount, a file
or directory on the host machine is mounted into a container. The file
or directory is referenced by its full path on the host machine. The
file or directory does not need to exist on the Docker host already.
It is created on demand if it does not yet exist. Bind mounts are very
performant, but they rely on the host machine’s filesystem having a
specific directory structure available. If you are developing new
Docker applications, consider using named volumes instead. You can’t
use Docker CLI commands to directly manage bind mounts.
There is also tmpfs mounts.
tmpfs mounts
A tmpfs mount is not persisted on disk, either on the Docker host or
within a container. It can be used by a container during the lifetime
of the container, to store non-persistent state or sensitive
information. For instance, internally, swarm services use tmpfs mounts
to mount secrets into a service’s containers.
Reference:
https://docs.docker.com/storage/
is it any different from having the data in a folder mounted using -v /path/to/data/in/container:/home/user/a_good_place_to_have_data?
It is because, as mentioned in "Mount a host directory as a data volume"
The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile because built images should be portable. A host directory wouldn’t be available on all potential hosts.
If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it’s best to create a named Data Volume Container, and then to mount the data from it.
You can combine both approaches:
docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
Here we’ve launched a new container and mounted the volume from the dbdata container.
We’ve then mounted a local host directory as /backup.
Finally, we’ve passed a command that uses tar to backup the contents of the dbdata volume to a backup.tar file inside our /backup directory. When the command completes and the container stops we’ll be left with a backup of our dbdata volume.
Yes, this is quite different from a few perspectives. Like you wrote in the question's title, it is about understanding why we need data volumes vs bind mount to host.
Part 1 - Basic scenarios with examples
Lets take 2 scenarios.
Case 1: Web server
We want to provide our web server a configuration file that might change frequently. For example: exposing ports according to the current environment.
We can rebuild the image each time with the relevant setup or create 2 different images for each environment. Both of this solutions aren’t very efficient.
With Bind mounts Docker mounts the given source directory into a location inside the container.
(The original directory / file in the read-only layer inside the union file system will simply be overridden).
For example - binding a dynamic port to nginx:
version: "3.7"
services:
web:
image: nginx:alpine
volumes:
- type: bind #<-----Notice the type
source: ./mysite.template
target: /etc/nginx/conf.d/mysite.template
ports:
- "9090:8080"
environment:
- PORT=8080
command: /bin/sh -c "envsubst < /etc/nginx/conf.d/mysite.template >
/etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
(*) Notice that this example could also be solved using Volumes.
Case 2 : Databases
Docker containers do not store persistent data -- any data that will be written to the writable layer in container’s union file system will be lost once the container stops running.
But what if we have a database running on a container, and the container stops - that means that all the data will be lost?
Volumes to the rescue.
Those are named file system trees which are managed for us by Docker.
For example - persisting Postgres SQL data:
services:
db:
image: postgres:latest
volumes:
- "dbdata:/var/lib/postgresql/data"
volumes:
- type: volume #<-----Notice the type
source: dbdata
target: /var/lib/postgresql/data
volumes:
dbdata:
Notice that in this case, for named volumes, the source is the name of the volume
(for anonymous volumes, this field is omitted).
Part 2 - Comparison
Differences in management and isolation on the host
Bind mounts exist on the host file system and being managed by the host maintainer. Applications / processes outside of Docker can also modify it.
Volumes can also be implemented on the host, but Docker will manage them for us and they can not be accessed outside of Docker.
Volumes are a much wider solution
Although both solutions help us to separate the data lifecycle from containers,
by using Volumes you gain much more power and flexibility over your system.
With Volumes we can design our data effectively and decouple it from other parts of the system by storing it in dedicated remote locations (e.g., in the cloud) and integrate it with external services like backups, monitoring, encryption and hardware management.
The difference between host directory and a data volume is in that that Docker manages the latter by placing it into the $DOCKER-DATA-DIR/volumes directory and attaching a reference to it (names or randomly generated ids). That is you get a little bit of convenience.
Both host directories and data volumes are directories on the host. Both are host dependent. You can't reference either of them in a Dockerfile; the VOLUME directive creates a new nameless (with randomly generated id) volume every time you launch a new container and cannot reference an existing volume.
* $DOCKER-DATA-DIR is /var/lib/docker here unless you changed the defaults.

Resources