Specify origin of data for a shared volume - docker

I have a task that I already solved, but I'm not satisfied with the solution. Basically, I have a webserver container (Nginx) and a FastCGI container (PHP-FPM). The webserver container is built on an off-the-shelf image; the FCGI container is based on a custom image and contains the application files. Now, since not everything is source code that is processed on the FCGI container, I need to make the application files available inside the webserver container as well.
Here's the docker-compose.yml that does the job:
version: '3.3'
services:
  nginx:
    image: nginx:1-alpine
    volumes:
      - # customize just the Nginx configuration file
        type: bind
        source: ./nginx.conf
        target: /etc/nginx/nginx.conf
      - # mount application files from PHP-FPM container
        type: volume
        source: www-data
        target: /var/www/my-service
        read_only: true
        volume:
          nocopy: true
    ports:
      - "80:80"
    depends_on:
      - php-fpm
  php-fpm:
    image: my-service:latest
    command: ["/usr/sbin/php-fpm7.3", "--nodaemonize", "--force-stderr"]
    volumes:
      - # create volume from application files
        # This one populates the content of the volume.
        type: volume
        source: www-data
        target: /var/www/my-service
volumes:
  # volume with application files shared between nginx and php-fpm
  www-data:
What I don't like here is mostly reflected by the comments concerning the volumes. Who creates and stores data should be obvious from the code and not from comments. Also, what I really dislike is that docker actually creates a place where it stores data for this volume. Not only does this use up disk space and increase startup time, it also requires me to never forget to use docker-compose down --volumes in order to refresh the content on next start. Imagine my anger when I found out that down didn't tear down what up created and that I was hunting ghosts from previous runs.
My questions concerning this:
Can I express in code that one container contains data that should be made available to other containers more clearly? The above code works, but it fails utterly to express the intent.
Can I avoid anything persistent being created to avoid above mentioned downsides?
I would have liked to look into things like tmpfs volumes or other volume options. My problem is that I can't find documentation for the available volume drivers, or even a way to explore which volume drivers exist. Maybe I have missed some CLI for that; I'd really appreciate a nudge in the right direction here.

You can use the local driver with option type=tmpfs, for example:
volumes:
  www-data:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs
This meets your requirements:
Data will be shared between containers at runtime
Data will not be persisted, i.e. the volume will be emptied when the containers are stopped, restarted or destroyed
This is the CLI equivalent of
docker volume create --driver local --opt type=tmpfs --opt device=tmpfs www-data
Important note: this is NOT a Docker tmpfs mount, but a Docker volume using a tmpfs option. As stated in the local volume driver documentation, it uses the Linux mount options, in our case --types, to specify a tmpfs filesystem. Unlike a simple tmpfs mount, it allows you to share the volume between containers while retaining the classic behavior of a temporary filesystem.
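Putting the pieces together, a minimal sketch of the whole stack with the tmpfs-backed volume might look like the following. The service definitions are taken from the question; the size limit passed through o is an assumption added for illustration and can be dropped.

```yaml
version: '3.3'
services:
  nginx:
    image: nginx:1-alpine
    volumes:
      - type: volume
        source: www-data
        target: /var/www/my-service
        read_only: true      # nginx only reads the files
        volume:
          nocopy: true       # never populate the volume from the nginx image
    ports:
      - "80:80"
    depends_on:
      - php-fpm
  php-fpm:
    image: my-service:latest
    volumes:
      - type: volume         # first mount of the empty volume copies the image content in
        source: www-data
        target: /var/www/my-service
volumes:
  www-data:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs
      o: "size=256m"         # assumption: optional cap on the in-memory filesystem
```

Since the content lives in memory, keep an eye on the size of the application files before relying on this.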
I can't find documentation for available volume drivers or even explore which volume drivers exist
Volume doc, including tmpfs, Bind mount and Volumes
local driver options are found in the docker volume create doc
Volume driver plugins - some are still updated regularly or seem maintained, but most of them have not been updated for a long time or are deprecated. The list does not seem exhaustive though, for instance vieux/sshfs is not mentioned.
Can I express in code that one container contains data that should be made available to other containers more clearly? The above code works, but it fails utterly to express the intent.
I don't think so; your code is already quite clear as to the intent of this volume:
That's what volumes are for: sharing data between containers. As stated in the doc, volumes can be more safely shared among multiple containers; furthermore, only containers are supposed to write into volumes.
nocopy and read_only clearly express that nginx relies on data written by another container, as it will only be able to read from this volume
Given that your volume is not external, it is safe to assume only another container from the same stack can use it
A bit of logic and experience with Docker lets you quickly come to the previous point, but even for less experienced Docker users your comments give clear indications, and your comments are part of the code ;)

You can also build a custom nginx image that copies the static files from your PHP image.
Here is a Dockerfile for nginx:
FROM my-service:latest AS src-files
FROM nginx
COPY --from=src-files /path-to-static-in-my-service-image /path-to-static-in-nginx
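This lets you avoid using volumes for the source code at all.
You can also take the tag from a build argument in the Dockerfile (declared with ARG before the first FROM, and fed from an environment variable at build time if you like):
ARG TAG
FROM my-service:${TAG} AS src-files
...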
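On the Compose side, a rough sketch of wiring this up could look like the one below. The Dockerfile name nginx.Dockerfile and the fallback tag latest are assumptions for illustration; the only requirement is that the Dockerfile receives the tag as a build argument.

```yaml
version: '3.3'
services:
  nginx:
    build:
      context: .
      dockerfile: nginx.Dockerfile   # hypothetical file name for the nginx Dockerfile
      args:
        TAG: ${TAG:-latest}          # taken from the environment, falls back to "latest"
    ports:
      - "80:80"
    depends_on:
      - php-fpm
  php-fpm:
    image: my-service:${TAG:-latest} # same tag, so both images carry the same files
```

With this, no volume is shared between the two services; rebuilding the nginx image picks up whatever is in the tagged PHP image.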

It depends on the use case of your setup: whether it's only for local development or whether you want the same thing in production. In development, having a volume populated manually or by one container for the others could be OK.
But if you want something that runs the same way in production, you may need something else. For example, in production, I don't want my code in a volume; I want it in my image, in an immutable way, so that I just need to redeploy the image.
For me a volume is not for storing application code but for storing data like cache, user-uploaded content, etc. Something we want to keep between deployments.
So, if we want 2 images with the same files and no volume, I would build 2 images with the application code and static content, one for PHP and one for nginx.
But the deployment of the 2 images is usually not synchronous. We solve this by deploying the PHP application first and the nginx application after. On nginx, we add a config that tries to serve static content itself first and, if the file doesn't exist, passes the request to PHP.
For the dev environment, we reuse the same images but mount the current code inside the container (a host bind mount).
But in some cases, the bind mount can cause issues:
- On Mac, file sharing is slow, but it should be better with the latest version of Docker Desktop in the Edge channel (2.3.1.0)
- On Windows, file sharing is slow too
- On Linux, we need to be careful about file permissions and the user used inside the container
If you are trying to solve one or more of these issues with the volume solution, there are other options. For example, on Mac, I would try the Edge release of Docker Desktop first; on Windows, if possible, I would use WSL2 with Docker set to use the WSL2 backend.
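For the dev setup described above, one possible sketch is a docker-compose.override.yml next to the main file, which Compose merges in automatically. The ./src path is an assumption; point it at wherever the application code lives on the host.

```yaml
# docker-compose.override.yml (dev only, not deployed)
version: '3.3'
services:
  php-fpm:
    volumes:
      - type: bind
        source: ./src                 # hypothetical host directory with the current code
        target: /var/www/my-service
  nginx:
    volumes:
      - type: bind
        source: ./src                 # same directory, so static files match the PHP code
        target: /var/www/my-service
        read_only: true
```

In production the override file is simply left out, and both images serve the files baked into them.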

Related

Best practice - Anonymous volume vs bind mount

In a container, an anonymous volume can be created with the VOLUME /build syntax in a Dockerfile, or with the syntax below, where volumes has a /build entry:
cache:
  build: ../../
  dockerfile: docker/dev/Dockerfile
  volumes:
    - /tmp/cache:/cache
    - /build
  entrypoint: "true"
My understanding is that both approaches above keep the /build volume available after the container goes into the Exited state.
The volume is anonymous because /build points to some random new location (in the /var/lib/docker/volumes directory) on the Docker host.
I see anonymous volumes as safer than named volumes (like /tmp/cache:/cache), because the /tmp/cache location is vulnerable: there is a greater chance that it is used by more than one Docker container.
1) Why is anonymous volume usage discouraged?
2) Is
VOLUME /build
in a Dockerfile not the same as
volumes:
  - /build
in the docker-compose.yml file? Is there a scenario where we need to mention both?
You're missing a key third option, named volumes. If you declare:
version: '3'
volumes:
  build: {}
services:
  cache:
    image: ...
    volumes:
      - build:/build
Docker Compose will create a named volume for you; you can see it with docker volume ls, for example. You can explicitly manage named volumes' lifetime, and set several additional options on them which are occasionally useful. The Docker documentation has a page describing named volumes in some detail.
I'd suggest that named volumes are strictly superior to anonymous volumes, for being able to explicitly see when they are created and destroyed, and for being able to set additional options on them. You can also mount the same named volume into several containers. (In this sequence of questions you've been asking, I'd generally encourage you to use a named volume and mount it into several containers and replace volumes_from:.)
Named volumes vs. bind mounts have advantages and disadvantages in both directions. Bind mounts are easy to back up and manage, and for content like log files that you need to examine directly it's much easier; on MacOS systems they are extremely slow. Named volumes can run independently of any host-system directory layout and translate well to clustered environments like Kubernetes, but it's much harder to examine them or back them up.
You almost never need a VOLUME directive. You can mount a volume or host directory into a container regardless of whether it's declared as a volume. Its technical effect is to mount a new anonymous volume at that location if nothing else is mounted there; its practical effect is that it prevents future Dockerfile steps from modifying that directory. If you have a VOLUME line you can almost always delete it without affecting anything.
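As a rough illustration of mounting one named volume into several containers instead of using volumes_from:, consider the sketch below. The service names, the alpine image and the commands are made up for the example.

```yaml
version: '3'
volumes:
  build: {}
services:
  builder:
    image: alpine:3                       # hypothetical stand-in for the real build image
    command: ["sh", "-c", "echo artifact > /build/out.txt"]
    volumes:
      - build:/build                      # writes into the named volume
  consumer:
    image: alpine:3                       # hypothetical stand-in for the consuming service
    command: ["ls", "-l", "/build"]
    volumes:
      - build:/build:ro                   # same volume, mounted read-only
    depends_on:
      - builder                           # orders startup only; does not wait for builder to finish
```

The volume shows up under a project-prefixed name in docker volume ls and survives until it is removed explicitly, for example with docker-compose down --volumes.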
Actually, anonymous volumes (/build) usage is encouraged over the use of bind mounts (/tmp/cache:/cache):
Volumes have several advantages over bind mounts:
Volumes are easier to back up or migrate than bind mounts.
You can manage volumes using Docker CLI commands or the Docker API.
Volumes work on both Linux and Windows containers.
Volumes can be more safely shared among multiple containers.
Volume drivers let you store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
New volumes can have their content pre-populated by a container.
Regarding your second question, yes. You can create anonymous volumes in docker-compose file or in the Dockerfile. No need to specify in both places.

What's the difference between declaring in docker-compose.yml volume as section and under a service?

What's the difference between declaring in the docker-compose.yml file a volume section and just using the volumes keyword under a service?
For example, I map a volume this way for a container:
services:
  mysqldb:
    volumes:
      - ./data:/var/lib/mysql
This will map to the folder called data from my working directory.
But I could also map a volume by declaring a volume section and use its alias for the container:
services:
  mysqldb:
    volumes:
      - data_volume:/var/lib/mysql
volumes:
  data_volume:
    driver: local
In this method, the actual location of where the mapped files are stored appears to be somewhat managed by docker compose.
What are the differences between these 2 methods or are they the same? Which one should I really use?
Are there any benefits of using one method over the other?
The difference between the methods you've described is that first method is a bind mount, and the other is a volume. These are more of Docker functions (rather than Docker Compose), and there are several benefits volumes provide over mounting a path from your host's filesystem. As described in the documentation, they:
are easier to back up or migrate
can be managed with docker volume commands or the API (as opposed to the raw filesystem)
work on both Linux and Windows containers
can be safely shared among multiple containers
can have content pre-populated by a container (with bind mounts sometimes you have to copy data out, then restart the container)
Another massive benefit of using volumes is the volume drivers, which you'd specify in place of local. They allow you to store volumes remotely (e.g. in the cloud) or add other features like encryption. This is core to the concept of containers, because if the running container is stateless and uses remote volumes, then you can move the container across hosts and it can be run without being reconfigured.
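As a small sketch of remote storage, the built-in local driver itself can already mount an NFS export through its driver options, without a third-party plugin. The server address and export path below are placeholders, not values from the question.

```yaml
services:
  mysqldb:
    image: mysql                          # assumption: the question's snippet omits the image
    volumes:
      - data_volume:/var/lib/mysql
volumes:
  data_volume:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw,nolock,soft"  # placeholder NFS server address and mount options
      device: ":/exports/mysql"           # placeholder export path on that server
```

A plugin-provided driver would be selected the same way, by putting its name in driver and its own options in driver_opts.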
Therefore, the recommendation is to use Docker volumes. Another good example is the following:
services:
  webserver_a:
    volumes:
      - ./serving/prod:/var/www
  webserver_b:
    volumes:
      - ./serving/prod:/var/www
  cache_server:
    volumes:
      - ./serving/prod:/cache_root
If you move the ./serving directory somewhere else, the bind mount breaks because it's a relative path. As you noted, volumes have aliases and have their path managed by Docker, so:
you wouldn't need to find and replace the path 3 times
the volume using local stores data somewhere else on your system and would continue mounting just fine
TL;DR: try and use volumes. They're portable, and encourage practices that reduce dependencies on your host machine.
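For comparison, the three-service example rewritten with a named volume might look like this; the volume name serving_prod is made up, and the services are left as fragments just like in the example above.

```yaml
services:
  webserver_a:
    volumes:
      - serving_prod:/var/www
  webserver_b:
    volumes:
      - serving_prod:/var/www
  cache_server:
    volumes:
      - serving_prod:/cache_root
volumes:
  serving_prod:        # hypothetical name; declared once, referenced by alias everywhere
    driver: local
```

Moving the project directory no longer matters here, because none of the mounts refers to a host path.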

Docker Compose: volumes without colon (:)

I have a docker-compose.yml file with the following:
volumes:
  - .:/usr/app/
  - /usr/app/node_modules
First option maps current host directory to /usr/app, but what does the second option do?
[Refreshing this answer since it seems others have similar questions]
There are three kinds of volumes in docker:
Host volumes: these map a path from the host into the container with a bind mount. They have the short syntax /path/on/host:/path/in/container. Whatever exists on the host is what will be visible in the container, there's no merging of files or initialization from the image, and uid/gid's do not get any special mapping so you need to take care to allow the container uid/gid read and write access to this location (an exception is Docker for Mac with OSXFS). If the path on the host does not exist, docker will create an empty directory as root, and if it is a file, you can mount a single file into the container this way.
Named volumes: these have a name, instead of a host path as the source. They have the short syntax name:/path/in/container and in a compose file, you also need to define the named volume used in containers at the top level. By default, these are also a bind mount, but to a docker specific directory under /var/lib/docker/volumes that should be considered internal. However these defaults can be changed to allow things like NFS mounts, mounting disks, or even your own bind mounts to other locations. Named volumes also have a feature in docker, when they are new or empty and first used, docker copies the contents from the image into named volume before mounting it. This includes files, directories, uid/gid owners, and permissions. After that, they behave identical to a host volume, whatever is inside the volume overlays the image location.
Anonymous volumes: these only have a path inside the container. They are in the form /path/in/container and docker will create a default named volume with a guid as the name. They share the behaviors of named volumes, storing files under /var/lib/docker/volumes, initializing with the contents of the image, except they have a randomly generated guid that gives you no indication of how or even if they are being used. You can mount the volume in another container and inspect the contents, or you can find the container using the volume by inspecting each container to find the guid. If you create a container with the --rm flag, anonymous volumes will also be deleted automatically.
tmpfs: Wait, I said 3, and this is 4? That's because tmpfs isn't considered a volume, the syntax to mount it is different. The result is a pointer to an empty in memory filesystem. This is useful if you have temporary files you don't wish to save, they are relatively small, and you either need speed or want to be sure they aren't saved to disk.
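To see the four forms side by side, here is a sketch of a single service using each of them. The image, the app_data volume and the /usr/app/data and /tmp paths are assumptions for illustration.

```yaml
services:
  app:
    image: node:18-alpine          # hypothetical image
    volumes:
      - .:/usr/app                 # host volume: bind mount of the current directory
      - app_data:/usr/app/data     # named volume: declared at the top level below
      - /usr/app/node_modules      # anonymous volume: container path only
    tmpfs:
      - /tmp                       # tmpfs: separate key, in-memory, never written to disk
volumes:
  app_data:
```

Only app_data gets a readable name in docker volume ls; the anonymous volume shows up under a generated id.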
In the OP's case:
/usr/app is mounted from the host, commonly used for development
/usr/app/node_modules is an anonymous volume initialized from the image
Why do this? Likely because you do not want to modify the node_modules directory on the host, particularly if there's platform specific data and you're running on Docker desktop where it's Mac/Win on the host and Linux in the container. It's also possible there's data in the image you want to get access to within the directory structure of the other volume mount.
Are there downsides to anonymous volumes? Two that I can think of:
If there's anything in /usr/app/node_modules that you want to reuse in a future container, you're unlikely to find the old volume. I tend to consider any data written to these as likely lost.
You'll often find the volumes on the host full of guids over time, and it's unclear which are in use and which can be deleted. Unused anonymous volumes are one of several causes of excessive disk use in docker.
For more details on docker volumes, see: https://docs.docker.com/storage/
Original answer:
The second one creates an anonymous volume. It will be listed in docker volume ls with a long unique id rather than a name. Docker-compose will be able to reuse this if you update your image, but it's easy to lose track of which volume belongs to what with those names, so I recommend always giving your volume a name.
Just to complement the accepted answer, according to Docker's Knowledge Base there are three types of volumes: host, anonymous, and named:
A host volume lives on the Docker host's filesystem and can be accessed from within the container. Example volume path:
/path/on/host:/path/in/container
An anonymous volume is useful for when you would rather have Docker handle where the files are stored. It can be difficult, however, to refer to the same volume over time when it is an anonymous volume. Example volume path:
/path/in/container
A named volume is similar to an anonymous volume. Docker manages where on disk the volume is created, but you give it a volume name. Example volume path:
name:/path/in/container
The path used in your example is an anonymous volume.
I had the same question while I was going through this tutorial, and the answer to what those lines could actually be doing is this:
Without the anonymous volume ('/usr/src/app/node_modules'), the node_modules directory would essentially disappear by the mounting of the host directory at runtime:
Build - The node_modules directory is created.
Run - The current directory is mounted into the container, hiding the node_modules that were just installed when the image was built.
The docker-compose.yml file for this:
version: '3.5'
services:
  something-clever:
    container_name: something-clever
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - '.:/usr/src/app'
      - '/usr/src/app/node_modules'
    ports:
      - '4200:4200'

How to use docker image, without mounting the default volumes?

I want to use Docker MySQL.
docker run mysql
But I don't want to save data on the host machine. I want all the information to be protected inside the container. By default, this image created an unnamed volume, and attach it to the container.
Is it possible to use the same image (I don't want to create a new MySQL image from scratch) but disable the volume?
In other words: many Docker images on Docker Hub use volumes by default. What is the easiest way to save all the data inside the container (so that push and commit will contain the data)? Is there a command to stop a container, change its Mounts settings, and start it again?
I know that it is not best practice, my question is if it is possible.
EDIT: There is a tool mentioned in the comments of the below thread that can edit docker image metadata, allowing you to remove a volume.
This is currently an open issue awaiting someone with the bandwidth to code it. You can track the progress here, with this link going directly to the applicable comment:
@veqryn since reopening this issue, nobody started working on a pull-request; the existing pull request did no longer apply cleanly on the code-base so a new one has to be opened; if anyone is interested in working on this, then things can get going again.
I too would like this feature! Mounting /var/lib/mysql/ on windows hosts with NTFS gives the volume root:root permissions which can't be chown'd; I don't want to add mysql user to the root group. I would like to UNVOLUME the /var/lib/mysql directory and replace it with a symlink that does have mysql:mysql permissions, pointed at /host/ntfs/mnt which is root:root 🤷‍♀️
As shown in this question, you can create, name and associate a container volume easily enough to the default unnamed one of mysql
version: '2'
services:
  db:
    image: mysql
    volumes:
      - dbdata:/var/lib/mysql
volumes:
  dbdata:
    driver: local
See "Mount a shared-storage volume as a data volume": you can use other drivers, like flocker, and benefit from a multi-host portable volume.

chown docker volumes on host (possibly through docker-compose)

I have the following example
version: '2'
services:
  proxy:
    container_name: proxy
    hostname: proxy
    image: nginx
    ports:
      - 80:80
      - 443:443
    volumes:
      - proxy_conf:/etc/nginx
      - proxy_htdocs:/usr/share/nginx/html
volumes:
  proxy_conf: {}
  proxy_htdocs: {}
which works fine. When I run docker-compose up it creates those named volumes in /var/lib/docker/volumes and all is good. However, from the host, I can only access /var/lib/docker as root, because it's root:root (makes sense). I was wondering if there is a way of chowning the host's directories to something more sensible/safe (like, my relatively unprivileged user that I use to do most things on the host) or if I just have to suck it up and chown them manually. I'm starting to have a number of scripts already to work around other issues, so having an extra couple of lines won't be much of a problem, but I'd really like to keep my self-written automation minimal, if I can -- fewer chances for stupid mistakes.
By the way, no: if I mount host directories instead of creating volumes, they get overlaid, meaning that if they start empty, they stay empty, and I don't get the default configuration (or whatever) from inside the container.
Extra points: can I just move the volumes to a more convenient location? Say, /home/myuser/myserverstuff/volumes?
It's best to not try to access files inside /var/lib/docker directly. Those directories are meant to be managed by the docker daemon, and not to be messed with.
To access the data inside a volume, there are a number of options:
use a bind-mounted directory (you considered that, but it didn't fit your use case).
use a "service" container that uses the same volume and makes it accessible through that container, for example a container running ssh (to use scp) or a SAMBA container (such as svendowideit/samba)
use a volume-driver plugin. There are various plugins around that offer all kinds of options. For example, the local persist plugin is a really simple plug-in that allows you to specify where docker should store the volume data (so outside of /var/lib/docker)
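As an alternative to installing a plugin (a different technique from the local persist plugin mentioned above), the built-in local driver can, as far as I know, be pointed at a directory of your choice through bind options. The sketch below uses the path from the question as an example; the directories are assumed to exist already, owned by whichever user should manage them.

```yaml
version: '2'
services:
  proxy:
    image: nginx
    volumes:
      - proxy_conf:/etc/nginx
      - proxy_htdocs:/usr/share/nginx/html
volumes:
  proxy_conf:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/myuser/myserverstuff/volumes/proxy_conf    # must exist before "up"
  proxy_htdocs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /home/myuser/myserverstuff/volumes/proxy_htdocs  # must exist before "up"
```

Because these are still named volumes, an empty one should be populated from the image on first use, which is the behaviour the plain bind mount was missing. Files written by the container will still carry the container's uid/gid.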

Resources