How to declare a Named Volume in a Dockerfile? - docker

If I use VOLUME in a docker file, it creates an anonymous volume. Is there any way to create a named volume from the dockerfile?
I'm looking for the Dockerfile equivalent of
docker run -v my-named-volume:/mnt/something repo/my-img
All I've managed to get via a Dockerfile is the equivalent of
docker run -v /mnt/something repo/my-img
I would think it is just not supported; however, the docs say this:
The VOLUME instruction creates a mount point with the specified name and marks it as holding externally mounted volumes from native host or other containers.
That seems to imply that there is a way to name the volume, but it doesn't say how.

It's not possible. I think the docs are perhaps misleadingly worded.
“The specified name” refers to the path / directory name at which the volume will be created.

It is a bit unclear. It creates a mount point using that name, but the actual file path does not use that name. If you do a docker inspect {container-name}, you will see the name, like "Destination": "/mnt/something", and the actual location, like "Source": "/var/lib/docker/volumes/cb80c7802244dd3669eed8afb7d94b61366844d80677eb180fa12002db04ea7c/_data".
This is because the Dockerfile isn't tied to a particular host and can't be sure the host volume path would exist. You need to do that in the run (or equivalent) statement. You can use the API or docker inspect to find out where the volume is located once the container is created, if you need to use that info in a script or similar.
Declaring the volume in the Dockerfile ensures that the data will persist and will be available to the host -- even if the location isn't preset.
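If you need the on-disk location in a script, a minimal sketch using docker inspect (the container name is a placeholder; the volume ID is the one from the example above):
# Show every mount of a container, including the host-side "Source" path
docker inspect -f '{{ json .Mounts }}' my-container
# Or inspect the volume itself once you know its ID
docker volume inspect cb80c7802244dd3669eed8afb7d94b61366844d80677eb180fa12002db04ea7c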

This issue comment explains the current landscape better than the docs do.
This is by design. Let me try to explain; the VOLUME directive in the Dockerfile only defines that a volume should be created for a specific path in the image (/container when running). The name of the volume should never be dictated by an image, because that's something that should be defined at runtime
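In practice, then, the name is supplied entirely at run time; a minimal sketch using the volume, path, and image from the question (the compose service name is a placeholder):
# docker run: name the volume when starting the container
docker run -v my-named-volume:/mnt/something repo/my-img

# docker-compose.yml: the same thing, declared once and reused
services:
  app:
    image: repo/my-img
    volumes:
      - my-named-volume:/mnt/something
volumes:
  my-named-volume: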

Related

Docker-kompose throws "mount destination not absolute" if volume moved into dockerfile [duplicate]

Docker run command has option to mount host directory into container
-v=[]: Create a bind mount with: [host-dir]:[container-dir]:[rw|ro].
If "host-dir" is missing, then docker creates a new volume.
And Dockerfile has VOLUME instruction
VOLUME ["/data"] - The VOLUME instruction will add one or more new volumes
to any container created from the image.
From what I see, there is no way to specify host-dir or rw/ro status when using Dockerfile.
Is there any other use of VOLUME in docker file other than wanting to share it with some other container?
Dockerfiles are meant to be portable and shared. The host-dir volume is 100% host dependent and will break on any other machine, which goes against the Docker idea of portability.
Because of this, it is only possible to use portable instructions within a Dockerfile. If you need a host-dir volume, you need to specify it at run-time.
A common usage of VOLUME in a Dockerfile is to store configuration or website sources so that they can be updated later by another container; see the sketch below.
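As a rough sketch of that pattern (image, container, and volume names are placeholders), the image declares the mount point and the run commands decide what actually backs it:
# Dockerfile
VOLUME ["/var/www"]

# Run time: back the mount point with a named volume and update it from another container
docker run -d --name web -v site-data:/var/www my-web-image
docker run --rm -v site-data:/var/www my-updater-image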

Docker force usage of volume

I know that you can specify a volume inside the dockerfile, but I see the problem that the user is not required to create such a volume.
What if he forgot to specify a volume and then there are many, possibly expensive-to-create, files saved there, but they are not persistent, because there is no volume specified?
So my question is whether it is possible to force the user to create a volume for that mount point, or at least check at start time (inside the container) whether a volume is mounted, so that it can react to the missing volume?
EDIT: With the new information that there are automatically created unnamed volumes, I would also accept a user-side solution (not changing the container in such a way that it checks the volume, but a docker-daemon setting which warns about or prevents creating unnamed volumes by mistake).
I think the VOLUME declaration is the best you can do here.
In general, a container cannot force itself to be run with any particular options. You could make a similar argument that a container "must" be run with a published port or with an attached stdin to be useful, but Docker doesn't allow an image to force these on either. (And more importantly, an image can't require direct access to the host filesystem, host networking, or privileged mode.)
As @masseyb notes in a comment, the key effect of the Dockerfile VOLUME directive is to create a new anonymous volume on the given directory if nothing else is mounted there. docker volume ls will show it and you should be able to use the volume ID directly in docker run -v options, so you won't actually lose data here. (There doesn't seem to be a command to give a name to the volume, surprisingly.)
In principle it's possible to check some things in an entrypoint wrapper script, but that won't work well for this volume case. The container can't tell whether a directory is an automatically-created anonymous volume or a new empty named volume.
(Also remember that volumes, including automatically-created anonymous volumes, are never committed to images. In your Dockerfile you can't change the directory content after you declare it a VOLUME; if an end user tries to docker commit a derived image it won't include the volume data. Unless you're sure it's what you want, I usually advise against declaring VOLUME. The case you describe in the question is pretty much the one case where it's useful.)
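If you do end up with such an anonymous volume, a minimal sketch of finding and reusing it (image name and path are placeholders):
# List volumes; anonymous ones show up as long hex IDs
docker volume ls
# Reuse that ID explicitly so the next container keeps the data
docker run -v <anonymous-volume-id>:/data my-image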

"a bind mount won't copy the container contents to the host automatically, unlike a named volume"

Need clarity on a comment here:
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
docs.docker.com/compose/compose-file/#volumes
Is this accurate? If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
how did Docker persist data across container restarts before there were named volumes?
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
Is this accurate?
Close to accurate, but I can see the confusion. Host volumes, aka bind mounts, do not have an initialization feature from docker. With anonymous and named volumes, docker will initialize the volume with the contents of the image at that path. This initialization includes ownership and permissions which helps avoid permission errors. This initialization only runs when the container is created and the volume is new or empty, so subsequent containers will not pickup changes to the image made in newer image versions.
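A rough way to see that initialization difference for yourself, using the public nginx and alpine images (volume and directory names are placeholders):
# Named volume: docker copies the image's content at that path into the new volume
docker run -d --name web1 -v web-data:/usr/share/nginx/html nginx
docker run --rm -v web-data:/data alpine ls /data     # shows the default nginx pages

# Bind mount: the empty host directory is used as-is and nothing is copied out
mkdir empty-dir
docker run -d --name web2 -v "$PWD/empty-dir":/usr/share/nginx/html nginx
ls empty-dir                                          # still empty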
If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data
in case of a container restart)?
Reads and writes from the app in the container will continue through to the host filesystem used in the bind mount as expected. It's only the initialization step that doesn't run.
how did Docker persist data across container restarts before there were named volumes?
There were data containers, mounting volumes from other containers, but this was inflexible (all volume paths were fixed to the path in the data container) and mixed management of persistent data with ephemeral containers, and has therefore been phased out.
Volumes are used to handle data persistence between containers. A single container restarting (rather than being replaced) will still have all the container specific filesystem changes. The docker rm command deletes these filesystem changes, along with container logs and metadata/configuration of the container.
The container specific changes are the read/write top layer of an overlay filesystem used by docker. Volume mounts are all separate mounts into subdirectories of this overlay filesystem (just like /home or /var are often separate filesystem mounts in the / filesystem of a Linux host, all reads and writes to those other paths go to a separate underlying filesystem).
If you're going to mount a volume into a container, and you want that volume to reliably contain some content from the image, you need to manually copy it there at container startup time. One way to do this is with an entrypoint wrapper script:
#!/bin/sh
# Copy data into a possibly-mounted location
cp -a /app/static /var/www
# Then run the image's CMD
exec "$@"
You'd include this in your image's Dockerfile
# Must use JSON-array syntax
ENTRYPOINT ["/app/entrypoint.sh"]
CMD same as it was before
There are two important details about Docker volumes' initialization behavior to be aware of here. The first, which you note, is that Docker only copies content into a volume for named (and anonymous) Docker volumes; it doesn't happen for bind mounts, and it doesn't happen in other environments like Kubernetes.
The second, more subtle detail is that the initialization only happens the first time the container runs with a new, empty volume. If there's already content in a volume that you mount into a container, it will hide whatever the image had at that path. In other SO questions you can see this manifest as, for example, "I added a package to my Node package.json file, but when I put the node_modules directory in a volume, it ignores the update" or "I'm using a volume to export content to an nginx proxy but it doesn't update".
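For instance, a sketch of the node_modules case (the image tags are hypothetical):
# First run: the volume is new, so it is initialized from the v1 image's /app/node_modules
docker run -v node_modules:/app/node_modules myapp:v1
# v2 adds a package to node_modules in the image, but the existing volume
# is mounted over that path, so the container still sees the old contents
docker run -v node_modules:/app/node_modules myapp:v2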
I think @BMitch having the accepted answer is correct, but I will just try to add in some details in the hope of being useful.
Is this accurate? If yes, then:
Given it is my claim being scrutinised - I totally defer to @BMitch here :)!
However I would also add:
https://github.com/docker/compose/issues/4581#issuecomment-389559090
Provides a layman's explanation of how named volumes / host volumes behave
My explanation needs to be updated to reflect the notion of 'initialization'
https://stackoverflow.com/a/40030535/3080207
This is how I would recommend setting up volumes in docker-compose at the moment, courtesy of @kaiser
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
Both host volumes and named volumes can achieve this.
I think the point of contention is what you want to happen on the:
first run of the container
subsequent runs of the container and
the location/accessibility of the volume on the host system.
Once a volume is attached to a container (be it a named volume or bind mount), whatever is stored to that volume should be persisted between restarts - that effectively comes for free. This assumes the same docker-compose config, and no manual removal of volumes.
Previously, using a named volume was a bit limiting, as you couldn't tail logs or edit code directly from the host as easily as you could with a bind mount - but it seems that problem is resolved / has a workaround now.
Bind mounts are able to persist data between restarts. I personally find that bind mounts do what I want 99% of the time; that being said, named volumes can now 'do it all', and I'd be using those moving forward.
There are differences between them though, and I'm sure they'll still bite people occasionally, requiring them to reach out to actual experts, instead of users like me :).
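As a rough docker-compose sketch of the two approaches side by side (service, image, and volume names are placeholders):
services:
  db:
    image: postgres
    volumes:
      - db-data:/var/lib/postgresql/data   # named volume: initialized by docker, survives container replacement
  app:
    image: my-app
    volumes:
      - ./src:/app/src                      # bind mount: host directory used as-is, easy to edit and tail from the host
volumes:
  db-data: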

Using docker volumes in packer build

Is it possible to use existing docker or external volumes in/during packer build?
I saw in https://www.packer.io/docs/builders/docker/:
"VOLUME /test1 /test2"
What exactly does it mean? "VOLUME String EX: "VOLUME FROM TO"" doesn't explain much. Is /test1 from the host?
I also saw in https://www.packer.io/docs/builders/docker/#volumes:
volumes (map[string]string) - A mapping of additional volumes to mount into this container. The key of the object is the host path, the value is the container path.
How can I make use of that? Where/how can I put/declare it, supposing that I want to map the /etc/dnsmasq.d/ host path into the container, during build time and at run time as well?
It has the same meaning as the corresponding Dockerfile directive (indeed, all of the directives in that section of the Packer documentation are Dockerfile commands). You probably don't need or want it.
This is different from the docker run -v option to mount content into a container. You cannot specify mount options like this at container build time (whether using docker build or Packer). You don't need to specify a VOLUME to be able to mount content on some container directory.
The Dockerfile VOLUME directive isn't needed for most common uses and mostly only has confusing side effects. You do not need it to mount configuration into your application; you do not need it to overwrite application source code with a development tree; the most obvious thing it does do is prevent future RUN instructions from having an effect. I'd avoid it unless you understand in detail what it does and why you want it.
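Separately from the VOLUME directive, the builder-level volumes option quoted in the question is what bind-mounts host paths into the build container. A hedged sketch of a template using it for /etc/dnsmasq.d (the image name and the classic Packer JSON layout are assumptions):
{
  "builders": [{
    "type": "docker",
    "image": "ubuntu:20.04",
    "commit": true,
    "volumes": {
      "/etc/dnsmasq.d": "/etc/dnsmasq.d"
    }
  }]
}
This only affects the container Packer runs during the build; at run time you would still pass something like -v /etc/dnsmasq.d:/etc/dnsmasq.d to docker run.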

What is the purpose of Dockerfile command "Volume"?

When a Dockerfile contains a VOLUME instruction, say VOLUME ["/opt/apache2/www", ...] (hope this path exists in a real installation), it means this path is going to be mounted to something (right?). And this VOLUME instruction is for the image, not for one instance of it (container) but for every instance.
Anyway irrespective of whether an image contains a VOLUME defined or not, at the time of starting a container the run command can create a volume by mapping a local host path to a container path.
docker run --name understanding_volumes -v /localhost/path1:/opt/apache2/www -v /localhost/path2:/any/container/path image_name
The above should make it clear that though /any/container/path is not defined as a VOLUME in Dockerfile, we are able to mount it while running container.
That said, this SOF question throws some light on it - What is the purpose of defining VOLUME mount points within DockerFile rather than adhoc cmd-line -v?. Here one benefit of the VOLUME instruction is mentioned: other containers can benefit from it, using the --from-container option (I could not find this option in docker run --help; not sure if the answer meant --volumes-from). Anyway, the mount point is thus accessible to other containers in some kind of automatic way. Great.
My first question is: is the other volume path /any/container/path, mounted on the container understanding_volumes, also available to the second container using --from-container or --volumes-from (whichever option is correct)?
My next question is: is the use of the VOLUME instruction just to let other containers link to this path, i.e. to make the data on /opt/apache2/www available to other containers through easy linking? So it's just sharing out. Or is there any data that can be made available to the first container too?
Defining a volume in a Dockerfile has the advantage of specifying the volume location inside the image definition as documentation from the image creator to the user of the image. That's just about the only upside.
It was added to docker very early on, quite possibly when data containers were the only way to persist data. We now have a solution for named volumes that has obsoleted data containers. We have also added the compose file to define how containers are run in an easy to understand and reuse syntax.
While there is the one upside of self documented images, there are quite a few downsides, to the point that I strongly recommend against defining a volume inside the image to my clients and anyone publishing images for general reuse:
The volume is forced on the end user, there's no way to undefine a volume in the image.
If the volume is not defined at runtime (with a -v or compose file), the user will see anonymous volumes in their docker volume ls that have no association to what created them. These are almost always useless wastes of disk space.
They break the ability to extend the image since any changes to a volume in an image after the VOLUME line are typically ignored by docker. This means a user can never add their own initial volume data, which is very confusing because docker gives no warning that it is ignoring the user changes during the image build.
If you need to have a volume as a user at runtime, you can always define it with -v or a compose file, even if that volume is not defined in the Dockerfile. Many users have the misconception that you must define it in the image to be able to make it a named volume at runtime.
The ability to use --volumes-from is unaffected by defining the volume in the image, but I'd encourage you to avoid this capability. It does not exist in swarm mode, and you can get all the same capabilities along with more granularity by using a named volume that you mount in two containers.
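A minimal sketch of that named-volume alternative to --volumes-from (volume, container, and image names are placeholders):
# Create (or let docker run create) a named volume
docker volume create shared-data
# Mount the same volume into two containers, at whatever path each one needs
docker run -d --name producer -v shared-data:/var/www my-producer-image
docker run -d --name consumer -v shared-data:/usr/share/nginx/html nginx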
