docker volume type - bind vs volume - docker

TLDR
In docker-compose, what's the difference between
volumes:
  - type: volume
    source: mydata
    target: /data
and
volumes:
  - type: bind
    source: mydata
    target: /data
?
The question in long:
When you specify the volumes option in your docker-compose file, you can use the long-syntax style.
According to the docs, the type option accepts 3 different values: volume, bind and tmpfs.
I understand the tmpfs option: it means that the volume will not be saved after the container is down.
But I can't find any reference in the docs to the difference between the other 2 options, bind and volume. Could someone enlighten me about that?

While bind mounts are files coming from your host machine, volumes are more like Docker's own NAS.
Bind mounts are files or directories mounted from your host machine (the one that runs your Docker daemon) into your container.
Volumes are storage spaces fully managed by Docker.
In the literature you will find two types of volumes:
named volumes (you provide the name)
anonymous volumes (Docker generates a random ID-like name, like the ones you see on containers or untagged images)
Volumes come with their own set of docker commands; you can list those commands via
docker volume --help
You can see your existing volumes via
docker volume ls
You can create a named volume via
docker volume create my_named_volume
But you can also create a volume via a docker-compose file
version: "3.3"
services:
  mysql:
    image: mysql
    volumes:
      - type: volume
        source: db-data
        target: /var/lib/mysql/data
volumes:
  db-data:
Here, this is the part that tells Docker to mount the volume named db-data on top of the container directory /var/lib/mysql/data:
      - type: volume
        source: db-data
        target: /var/lib/mysql/data
And this is the part that tells Docker to create a volume named db-data:
volumes:
  db-data:
Docker documentation about the three mount types:
https://docs.docker.com/storage/bind-mounts/
https://docs.docker.com/storage/volumes/
https://docs.docker.com/storage/tmpfs/

If I understood you correctly, you're asking, in other words: what is the difference between volumes and bind mounts?
Differences in management and isolation on the host
Bind mounts exist on the host file system and are managed by the host maintainer. Applications and processes outside of Docker can also modify them.
Volumes can also be implemented on the host, but Docker manages them for us, and they are not meant to be accessed or modified from outside of Docker.
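To make the contrast concrete, here is a minimal Compose sketch that puts both mount types on one service. The service name app, the alpine image, the ./config directory and the appdata volume are all made up for illustration; only the type, source and target keys come from the long syntax discussed in the question.

```yaml
# Sketch only: service, image, path and volume names are hypothetical.
services:
  app:
    image: alpine:3.19
    command: ["sleep", "infinity"]
    volumes:
      # Bind mount: source is a path on the host, owned and managed by you.
      - type: bind
        source: ./config
        target: /etc/app/config
      # Volume: source is a volume name; Docker decides where it is stored.
      - type: volume
        source: appdata
        target: /var/lib/app

volumes:
  appdata:
```

The only difference visible in the file is what source means: a host path for the bind mount, a name declared under the top-level volumes key for the volume.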
Volumes are a much wider solution
Although both solutions help us to separate the data lifecycle from containers,
by using Volumes you gain much more power and flexibility over your system.
With volumes we can design our data effectively and decouple it from the host and other parts of the system by storing it in dedicated remote locations (the cloud, for example) and integrating it with external services like backups, monitoring, encryption and hardware management.
More advantages of volumes over bind mounts:
No dependency on the host's directory layout.
Can be managed using the Docker CLI.
Volumes can save you some uid/gid permission issues, which occur in cases like when a container user's uid does not match the uid/gid that owns the files on the host.
A new volume's contents can be pre-populated by a container.
Examples
Let's take 2 scenarios.
Case 1: Web server.
We want to provide our web server with a configuration file that might change frequently. For example: exposing ports according to the current environment.
We can rebuild the image each time with the relevant setup, or create 2 different images, one for each environment. Neither of these solutions is very efficient.
With Bind mounts Docker mounts the given source directory into a location inside the container.
(The original directory / file in the read-only layer inside the union file system will simply be overridden).
For example - binding a dynamic port to nginx:
version: "3.7"
services:
  web:
    image: nginx:alpine
    volumes:
      - type: bind #<-----Notice the type
        source: ./mysite.template
        target: /etc/nginx/conf.d/mysite.template
    ports:
      - "9090:8080"
    environment:
      - PORT=8080
    command: /bin/sh -c "envsubst < /etc/nginx/conf.d/mysite.template >
      /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
(*) Notice that this example could also be solved using Volumes.
Case 2 : Databases.
Containers do not store data persistently on their own: anything written to the writable layer of the container's union file system is lost once the container is removed.
But what if we have a database running in a container, and the container is removed or replaced? Does that mean that all the data is lost?
Volumes to the rescue.
Those are named file system trees which are managed for us by Docker.
For example, persisting PostgreSQL data:
services:
  db:
    image: postgres:latest
    volumes:
      # Short-syntax equivalent: - "dbdata:/var/lib/postgresql/data"
      - type: volume #<-----Notice the type
        source: dbdata
        target: /var/lib/postgresql/data
volumes:
  dbdata:
Notice that in this case, for named volumes, the source is the name of the volume
(For anonymous volumes, this field is omitted).
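For comparison, a hedged sketch of the anonymous-volume variant of the same service: the source key is simply left out, and no top-level volumes entry is needed. The POSTGRES_PASSWORD value is a placeholder added only so the image can start.

```yaml
services:
  db:
    image: postgres:latest
    environment:
      POSTGRES_PASSWORD: example   # placeholder value
    volumes:
      # No source: Docker creates an anonymous volume with a generated name.
      - type: volume
        target: /var/lib/postgresql/data
```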

| Feature | Bind | Volume |
|---|---|---|
| Essence | Bind mounts attach a user-specified location on the host filesystem to a specific point in a container file tree. | Volumes are backed by disk storage on the host filesystem or by cloud storage. |
| Command | --mount type=bind,src="",dst="" | Docker CLI docker volume command |
| Dependency | Depends on a location on the host filesystem. | Container-independent data management |
| Separation of concerns | No | Yes |
| Conflict with other containers | Yes. Example: multiple instances of Cassandra that all use the same host location as a bind mount for data storage. In that case, each of the instances would compete for the same set of files. Without other tools such as file locks, that would likely result in corruption of the database. | No. By default, Docker creates volumes by using the local volume plugin. |
| When to choose | 1. Bind mounts are useful when the host provides a file or directory that is needed by a program running in a container, or when that containerized program produces a file or log that is processed by users or programs running outside containers. 2. They are appropriate tools for workstations and machines with specialized concerns. 3. They also suit systems with more traditional configuration management tooling. | Working with persistent storage: 1. Databases 2. Cloud storage |
| When not to choose | Better to avoid these kinds of specific bindings in generalized platforms or hardware pools. | To be written |

Related

Best practice - Anonymous volume vs bind mount

In a container, an anonymous volume can be created
with the syntax VOLUME /build in a Dockerfile
or
with the syntax below, where volumes has a /build entry:
cache:
  build: ../../
  dockerfile: docker/dev/Dockerfile
  volumes:
    - /tmp/cache:/cache
    - /build
  entrypoint: "true"
My understanding is that both approaches above keep the volume /build available after the container goes into the Exited state.
The volume is anonymous because /build points to some random new location (in the /var/lib/docker/volumes directory) on the Docker host.
It seems to me that anonymous volumes are safer than named volumes (like /tmp/cache:/cache),
because the /tmp/cache location is vulnerable: there is more chance that it is used by more than one Docker container.
1) Why is anonymous volume usage discouraged?
2) Is
VOLUME /build in the Dockerfile
not the same as
volumes:
  - /build
in the docker-compose.yml file? Is there a scenario where we need to mention both?
You're missing a key third option, named volumes. If you declare:
version: '3'
volumes:
  build: {}
services:
  cache:
    image: ...
    volumes:
      - build:/build
Docker Compose will create a named volume for you; you can see it with docker volume ls, for example. You can explicitly manage named volumes' lifetime, and set several additional options on them which are occasionally useful. The Docker documentation has a page describing named volumes in some detail.
I'd suggest that named volumes are strictly superior to anonymous volumes, for being able to explicitly see when they are created and destroyed, and for being able to set additional options on them. You can also mount the same named volume into several containers. (In this sequence of questions you've been asking, I'd generally encourage you to use a named volume and mount it into several containers and replace volumes_from:.)
Named volumes and bind mounts each have advantages and disadvantages. Bind mounts are easy to back up and manage, and for content like log files that you need to examine directly they are much easier to work with; on macOS systems they are extremely slow. Named volumes can run independently of any host-system directory layout and translate well to clustered environments like Kubernetes, but it's much harder to examine them or back them up.
You almost never need a VOLUME directive. You can mount a volume or host directory into a container regardless of whether it's declared as a volume. Its technical effect is to mount a new anonymous volume at that location if nothing else is mounted there; its practical effect is that it prevents future Dockerfile steps from modifying that directory. If you have a VOLUME line you can almost always delete it without affecting anything.
Actually, anonymous volumes (/build) usage is encouraged over the use of bind mounts (/tmp/cache:/cache):
Volumes have several advantages over bind mounts:
Volumes are easier to back up or migrate than bind mounts.
You can manage volumes using Docker CLI commands or the Docker API.
Volumes work on both Linux and Windows containers.
Volumes can be more safely shared among multiple containers.
Volume drivers let you store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
New volumes can have their content pre-populated by a container.
Regarding your second question, yes. You can create anonymous volumes in docker-compose file or in the Dockerfile. No need to specify in both places.

What's the difference between declaring in docker-compose.yml volume as section and under a service?

What's the difference between declaring in the docker-compose.yml file a volume section and just using the volumes keyword under a service?
For example, I map a volume this way for a container:
services:
  mysqldb:
    volumes:
      - ./data:/var/lib/mysql
This will map to the folder called data from my working directory.
But I could also map a volume by declaring a volume section and use its alias for the container:
services:
  mysqldb:
    volumes:
      - data_volume:/var/lib/mysql
volumes:
  data_volume:
    driver: local
In this method, the actual location of where the mapped files are stored appears to be somewhat managed by docker compose.
What are the differences between these 2 methods or are they the same? Which one should I really use?
Are there any benefits of using one method over the other?
The difference between the methods you've described is that the first method is a bind mount and the other is a volume. These are Docker features (rather than Docker Compose features), and volumes provide several benefits over mounting a path from your host's filesystem. As described in the documentation, they:
are easier to back up or migrate
can be managed with docker volume commands or the API (as opposed to the raw filesystem)
work on both Linux and Windows containers
can be safely shared among multiple containers
can have content pre-populated by a container (with bind mounts sometimes you have to copy data out, then restart the container)
Another massive benefit of using volumes is volume drivers, which you'd specify in place of local. They allow you to store volumes remotely (e.g. in the cloud) or add other features like encryption. This is core to the concept of containers, because if the running container is stateless and uses remote volumes, then you can move the container across hosts and it can be run without being reconfigured.
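As a rough illustration of remote storage, the sketch below keeps the built-in local driver and only passes it NFS mount options through driver_opts, since any third-party driver name would be a guess here. The server address and export path are hypothetical, and the service definition is a fragment.

```yaml
services:
  mysqldb:
    image: mysql
    volumes:
      - data_volume:/var/lib/mysql

volumes:
  data_volume:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw"       # hypothetical NFS server address
      device: ":/exports/mysql"    # hypothetical exported path
```

The service definition does not change at all; only the top-level volume declaration decides where the data physically lives.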
Therefore, the recommendation is to use Docker volumes. Another good example is the following:
services:
  webserver_a:
    volumes:
      - ./serving/prod:/var/www
  webserver_b:
    volumes:
      - ./serving/prod:/var/www
  cache_server:
    volumes:
      - ./serving/prod:/cache_root
If you move the ./serving directory somewhere else, the bind mount breaks because it's a relative path. As you noted, volumes have aliases and have their path managed by Docker, so:
you wouldn't need to find and replace the path 3 times
the volume using local stores data somewhere else on your system and would continue mounting just fine
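A sketch of the same three services rewritten around a named volume could look like the fragment below. The volume name serving_prod is made up, and image keys are omitted just as in the bind-mount fragment it mirrors.

```yaml
services:
  webserver_a:
    volumes:
      - serving_prod:/var/www
  webserver_b:
    volumes:
      - serving_prod:/var/www
  cache_server:
    volumes:
      - serving_prod:/cache_root

volumes:
  # Declared once; Docker manages where the data is stored.
  serving_prod:
```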
TL;DR: try and use volumes. They're portable, and encourage practices that reduce dependencies on your host machine.

Understanding volume in docker compose

The following is an example given in https://docker-curriculum.com/
version: "3"
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    container_name: es
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  web:
    image: prakhar1989/foodtrucks-web
    command: python app.py
    depends_on:
      - es
    ports:
      - 5000:5000
    volumes:
      - ./flask-app:/opt/flask-app
volumes:
  esdata1:
    driver: local
and it says, about /opt/flask-app: "The volumes parameter specifies a mount point in our web container where the code will reside".
I think this means that /opt/flask-app is a mount point that points to the host machine's ./flask-app.
However, it doesn't say anything about esdata1, and I can't apply the same explanation as the one given for /opt/flask-app, since there's no esdata1 directory or file on the host machine.
What is happening for the esdata1 ?
My guess is that it means creating a volume (The closest thing I can think of is a disk partition) and name it esdata1 and mount it on /usr/share/elasticsearch/data, am I correct on this guess?
These are two slightly different things: volumes and bind mounts. Bind mounts let you specify a folder on the host machine to serve as storage. Volumes (at least with the local driver) also have folders on the host machine, but their location is managed by Docker and is sometimes a bit more difficult to find.
When you specify a volume in docker-compose.yml with the short syntax, if the source path starts with / or . it becomes a bind mount, as in the web service. Otherwise, if it is a single word, it is a named volume, as for the es service.
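The rule can be illustrated with a trimmed-down version of the file from the question. The web service keeps the short syntax, while the es mount is rewritten in long syntax to spell out what the bare name means; other keys from the original file are left out for brevity.

```yaml
services:
  web:
    image: prakhar1989/foodtrucks-web
    volumes:
      # Source starts with "." -> bind mount of a host directory.
      - ./flask-app:/opt/flask-app
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    volumes:
      # Source is a bare name -> named volume. This is the long-syntax
      # form of "esdata1:/usr/share/elasticsearch/data".
      - type: volume
        source: esdata1
        target: /usr/share/elasticsearch/data

volumes:
  esdata1:
    driver: local
```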
You can inspect all volumes on your host machine by running docker volume ls.
What is happening for the esdata1 ? My guess is that it means creating
a volume (The closest thing I can think of is a disk partition) and
name it esdata1 and mount it on /usr/share/elasticsearch/data, am I
correct on this guess?
That's all correct.
I don't claim to be setting the rules, but in general, volumes are more suitable for sharing common data between several containers, as in docker-compose, while bind mounts are better suited to sharing data from the host with a container, such as initial configs for services.

Docker Anonymous Volumes

I've seen Docker volume definitions in docker-compose.yml files like so:
-v /path/on/host/modules:/var/www/html/modules
I noticed that the docker-compose.yml file for Drupal's official image uses anonymous volumes.
Notice the comments:
volumes:
  - /var/www/html/modules
  - /var/www/html/profiles
  - /var/www/html/themes
  # this takes advantage of the feature in Docker that a new anonymous
  # volume (which is what we're creating here) will be initialized with the
  # existing content of the image at the same location
  - /var/www/html/sites
Is there a way to associate an anonymous volume with a path on the host machine after the container is running? If not, what is the point of having anonymous volumes?
Full docker-compose.yml example:
version: '3.1'
services:
  drupal:
    image: drupal:8.2-apache
    ports:
      - 8080:80
    volumes:
      - /var/www/html/modules
      - /var/www/html/profiles
      - /var/www/html/themes
      # this takes advantage of the feature in Docker that a new anonymous
      # volume (which is what we're creating here) will be initialized with the
      # existing content of the image at the same location
      - /var/www/html/sites
    restart: always
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_PASSWORD: example
    restart: always
Adding a bit more info in response to a follow-up question/comment from @JeffRSon asking how anonymous volumes add flexibility, and also to answer this question from the OP:
Is there a way to associate an anonymous volume with a path on the host machine after the container is running? If not, what is the point of having anonymous volumes?
TL;DR: You can associate a specific anonymous volume with a running container via a 'data container', but that provides flexibility to cover a use case that is now much better served by the use of named volumes.
Anonymous volumes were helpful before the addition of volume management in Docker 1.9. Prior to that, you didn't have the option of naming a volume. With the 1.9 release, volumes became discrete, manageable objects with their own lifecycle.
Before 1.9, without the ability to name a volume, you had to reference it by first creating a data container
docker create -v /data --name datacontainer mysql
and then mounting the data container's anonymous volume into the container that needed access to the volume
docker run -d --volumes-from datacontainer --name dbinstance mysql
These days, it's better to use named volumes since they are much easier to manage and much more explicit.
Anonymous volumes are equivalent to having these directories defined as VOLUMEs in the image's Dockerfile. In fact, directories defined as VOLUMEs in a Dockerfile are anonymous volumes if they are not explicitly mapped to the host.
The point of having them is added flexibility.
PS:
Anonymous volumes already reside in the host somewhere in /var/lib/docker (or whatever directory you configured). To see where they are:
docker inspect --type container -f '{{range $i, $v := .Mounts }}{{printf "%v\n" $v}}{{end}}' $CONTAINER
Note: Substitute $CONTAINER with the container's name.
One possible use case for anonymous volumes these days is in combination with bind mounts: you want to bind a folder, but without some specific subfolders. Those subfolders are then declared as named or anonymous volumes. This guarantees that the subfolders are present in the container, inside the folder that is bound from the host, without having to exist in the bound folder on the host machine at all.
For example, you can build your frontend Node.js project in a container, which needs the node_modules folder, while you don't need that folder for coding at all. You can then bind your project folder from the host into the container and declare the node_modules folder as an anonymous volume. The node_modules folder will then always be present in the container, even if you do not have it in your working folder on the host machine.
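A minimal sketch of that Node.js setup, assuming a project in the current directory with a build script in its package.json. The service name frontend, the node:20 image tag, the /app path and the build command are all illustrative assumptions.

```yaml
services:
  frontend:
    image: node:20          # assumed image tag
    working_dir: /app
    command: sh -c "npm install && npm run build"   # assumes a "build" script exists
    volumes:
      # Bind mount: the project source from the host.
      - ./:/app
      # Anonymous volume: /app/node_modules is backed by Docker-managed
      # storage, so its contents are not written into the host project folder.
      - /app/node_modules
```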
Not sure why Drupal developers suggest such settings. Anyways, I can think of two differences:
With named volumes you have a name that suggests to which project it belongs.
After docker-compose down && docker-compose up -d a new empty anonymous volume gets attached to the container. (But the old one doesn't disappear. docker doesn't delete volumes unless you tell it to.) With named volumes you'll get the volume that was attached to the container before docker-compose down.
As such, you probably don't want to put data you don't want to lose into an anonymous volume (like db or something). Again, they won't disappear by themselves. But after docker-compose down && docker-compose up -d && docker volume prune a named volume will survive.
For something less critical (like node_modules) I don't have strong argument for or against named volumes.
Is there a way to associate an anonymous volume with a path on the host machine after the container is running?
For that you need to change the settings, e.g. /var/www/html/modules -> ./modules:/var/www/html/modules, and do docker-compose up -d. But that will turn an anonymous volume into a bind mount. And you will need to copy the data from the volume to ./modules. Similarly, you can turn an anonymous volume into a named volume.

What is the difference between volumes-from and volumes?

I saw the docker-compose patterns, but I'm confused. What is the best way to compose containers?
When should I use link or volumes_from?
When should I use volumes_from or volumes?
#1 app-db-data
app:
  image: someimage
  links:
    - db # data volume container name
db:
  image: mysql
  volumes_from:
    - data # data volume name
data:
  image: someimage
  volumes:
    - {host data}:{guest data}
#2 app-db+data
app:
  image: someimage
  links:
    - db # data volume container name
db:
  image: mysql
  volumes:
    - data # data file name
app
#1 app-service-data
app:
  image: someimage
  volumes_from:
    - service # service container name
service:
  image: mysql
  volumes_from:
    - data # image container name
data:
  image: someimage
  volumes:
    - {host data}:{guest data}
#2 app-service+data
app:
  image: someimage
  volumes_from:
    - service # service container name
service:
  image: mysql
  volumes:
    - data # mounted file
Thanks
In short:
volumes_from mounts volumes from other containers.
volumes defines mounts inline.
links connects containers.
Explained in a little more detail:
volumes_from mounts volumes from other containers, for example when you have data-only containers and want to mount their volumes in the container that runs your application code.
volumes is the inline way to define and mount volumes. If you read #17798, you can see that named volumes can replace data-only containers in most cases.
The simplest option is then to use volumes, since you can reuse them by naming them.
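As a hedged sketch of how a named volume can stand in for a data-only container, the fragment below mounts one volume into two services, with the second mount read-only. The service names, the backup container, the mount paths and the password value are all illustrative.

```yaml
services:
  db:
    image: mysql
    environment:
      MYSQL_ROOT_PASSWORD: example   # placeholder value
    volumes:
      - db-data:/var/lib/mysql
  backup:
    image: alpine:3.19
    command: ["sleep", "infinity"]   # stand-in for a real backup job
    volumes:
      # Same named volume, mounted read-only in a second container.
      - type: volume
        source: db-data
        target: /backup-source
        read_only: true

volumes:
  db-data:
```

No extra container exists just to hold the data; the volume itself is the shared object.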
links is different. Because it does not mount. Instead it connects containers. So if you do:
app:
container_name: app_container
links:
- db
That means that if you connect to app_container with docker exec -it app_container bash and try ping db, you will see that the container is able to resolve the IP address of db.
This is because docker creates a network between containers.
Link and volumes_from are different concepts. Links are used when you need to connect (by network) two containers. In this case if you want to connect an App to the Database, the way to do this is by using a link, since applications use a port and host to connect to a database (not a directory on the filesystem).
volumes and volumes_from differ in that the first only declares volumes that Docker will make persistent, or host:guest mounts, while volumes_from tells Docker to use volumes that are already declared on another container (making them available to this container).
Of those 4 cases that you present, I think that the first and second are good choices. In the first you are creating a data only container, and make the mysql container use it. In the second case the data and the mysql container are the same.
Links and volumes are perfectly explained in the docker documentation.
Hope it helps.
Addition: volumes_from is used when you want to mount all anonymous volumes of a container; named volumes could have been mounted directly since the early days.
As far as I can see from https://docs.docker.com/compose/compose-file/#volumes, docker-compose has removed this functionality entirely; I'm not sure how or why, or whether there is an alternative. But assume you have an app container and an httpd container. Usually you would define the codebase folder, /var/www, as an anonymous volume and then mount it in httpd to serve static files using the httpd service, while passing all dynamic requests (ruby/php/java) to an upstream backend on app.
The point of using an anonymous volume and not a named volume is that you want to be able to redeploy app and change the codebase (app update), which would not work if app had a named volume. Anonymous volumes do exactly that, and that's why volumes_from is used here; using named volumes is not an option in this case (though it is very practical in a lot of other cases).
For reference, the upgrade guide for volumes_from:
https://docs.docker.com/compose/compose-file/compose-versioning/#upgrading
So volumes_from is usually used in a different context/scenario, and named volumes are the standard in all other cases, as explained above. A brief post about that is https://stackoverflow.com/a/44744861/3625317

Resources