How can I share data between host and container using mounts - docker

I've been attempting to share data between my host and my container. I've been reading a lot about volumes and I believe I have misunderstood some of the fundamentals around sharing data.
Here's how I've been doing it (with Docker Compose)
version: "2"
services:
my-server:
volumes:
- type: bind
source: ./test/
target: /var/logs
The problem with this approach is that the initial creation of the mount destroys any data in the target folder. So for example if my image was built from another image that had some logs in that folder (for whatever reason), the logs would be destroyed.
This is a major problem with my use case. I need to mount a volume (a folder, basically) so that I can share data between my host and guest, similar to how a shared folder with a VM would work.
I've looked into named volumes but from what I understand, named and anonymous volumes are designed to share data between containers, and not to share data with the host (which is what I need for my use case).
So besides bind mounts, is it possible share data between the host and container?

This is not really a Docker problem. I think you'll run into this with any mount. Basically you are already using the correct mechanism for sharing data between the host and your container.
When you mount something in linux, the mount target (i.e. the path at which you mount something) is always replaced with the root of whatever you mount. It does not merge the contents of the mount target with the contents of the (in this case) bind mounted directory. I'm surprised that works with VM shared folders because you run a high risk of a collision. e.g. same file in both locations. How would it resolve that? File system mounts are not the same as a dropbox like synchronisation of files between two locations.
I suggest that you do your bind mount to somewhere else in your container which has no contents and then modify your in-container workflow to handle this. In your example it sounds like you are attempting to collect logs. It also sounds like the containers configured log directory might have some contents which you want to be copied to the host. You could achieve this by having your container init itself by configuring a new log directory before starting your services/running anything, and copying any existing logs to that location. This new location would be the bind mount. Your init script could also detect if the bind mount was already used in this fashion and not sync over the data. This is really an application specific problem.

Related

Why have docker-compose volumes to be declared twice when not pointing to an actual folder on the host?

In a docker-compose.yml file, do I really need to specify the volumes twice; inside and outside a service? If yes, Why? (the docker-compose part of the doc doesn't have much information on that)
I have the feeling that, in the case shown here where the myapp volume is not explicitly a folder on the host machine, we have to set it twice, but if it actually is a folder on the host machine, specifying it only inside the frontend service block is enough.
In addition, when the volume is specified outside the service block, it's almost always written as key: without any actual value (or sometime as key: {}), which makes me confused.
Moreover, when I run docker-compose down -v it actually only applies on volumes which are not explicitly specified as a folder on the host machine, according to the doc:
-v, --volumes Remove named volumes declared in the `volumes`
section of the Compose file and anonymous volumes
attached to containers.
So maybe the declaration of a volume outside a service is for making this volume identifiable, hence 'removable'. And on the other hand, it will never be removable if it's not set outside the service?
This is a whole bunch of questions - let's try to answer them sequentially:
1. Do I really need to specify the volumes twice (inside and outside the services section)?
This is not a duplicate specification: outside you declare the volume and inside you specify how to mount it into a container. A volume has an independent life cycle from services. It can be mounted by several services and it will retain data if services are restarted.
2. When the volume is specified outside the service block, it's almost always written as key: {}
This key-only notation is the default and does not require any driver configuration. However, if you needed to e.g. connect to NFS, you would have something like:
volumes:
example:
driver_opts:
type: "nfs"
o: "addr=10.40.0.199,nolock,soft,rw"
device: ":/docker/example"
Also, please differentiate between bind mounts and regular volumes. While regular volumes are managed independently from services (and containers), e.g. with docker volumes list, bind-mounts are mere mappings between the host and the container file system. They are tied to the container they are mounted to.
3. When I run docker-compose down -v it actually only applies on volumes which are not explicitly specified as a folder on the host machine
Yes, this won't remove bind mounts, since bind mounts are mere host-container filesystem mappings and therefore Docker does not create an independent volume entity for them.
For deeper understanding, please consider this excerpt from the documentation:
Bind mounts have been around since the early days of Docker. Bind
mounts have limited functionality compared to volumes. When you use a
bind mount, a file or directory on the host machine is mounted into a
container. The file or directory is referenced by its absolute path on
the host machine. By contrast, when you use a volume, a new directory
is created within Docker’s storage directory on the host machine, and
Docker manages that directory’s contents.

"a bind mount won't copy the container contents to the host automatically, unlike a named volume"

Need clarity on a comment here:
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
docs.docker.com/compose/compose-file/#volumes
Is this accurate? If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
how did Docker persist data across container restarts before there were named volumes?
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
Is this accurate?
Close to accurate, but I can see the confusion. Host volumes, aka bind mounts, do not have an initialization feature from docker. With anonymous and named volumes, docker will initialize the volume with the contents of the image at that path. This initialization includes ownership and permissions which helps avoid permission errors. This initialization only runs when the container is created and the volume is new or empty, so subsequent containers will not pickup changes to the image made in newer image versions.
If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data
in case of a container restart)?
Reads and writes from the app in the container will continue through to the host filesystem used in the bind mount as expected. It's only the initialization step that doesn't run.
how did Docker persist data across container restarts before there were named volumes?
There were data containers, mounting volumes from other containers, but this was inflexible (all volume paths were fixed to the path in the data container) and mixed management of persistent data with ephemeral containers, and has therefore been phased out.
Volumes are used to handle data persistence between containers. A single container restarting (rather than being replaced) will still have all the container specific filesystem changes. The docker rm command deletes these filesystem changes, along with container logs and metadata/configuration of the container.
The container specific changes are the read/write top layer of an overlay filesystem used by docker. Volume mounts are all separate mounts into subdirectories of this overlay filesystem (just like /home or /var are often separate filesystem mounts in the / filesystem of a Linux host, all reads and writes to those other paths go to a separate underlying filesystem).
If you're going to mount a volume into a container, and you want that volume to reliably contain some content from the image, you need to manually copy it there at container startup time. One way to do this is with an entrypoint wrapper script:
#!/bin/sh
# Copy data into a possibly-mounted location
cp -a /app/static /var/www
# Then run the image's CMD
exec "$#"
You'd include this in your image's Dockerfile
# Must use JSON-array syntax
ENTRYPOINT ["/app/entrypoint.sh"]
CMD same as it was before
There are two important details about Docker named volumes' initialization behavior to be aware of here. The first, which you note, is that Docker only copies content into a volume for Docker named volumes; it doesn't happen for bind mounts, and it doesn't happen in other environments like Kubernetes.
The second, more subtle detail is that the initialization only happens the first time the container runs. If there's already content in a volume that you mount into a container, it will hide what was already there. In other SO questions you can see this manifest as, for example, "I added a package to my Node package.json file, but when I put the node_modules directory in a volume, it ignores the update" or "I'm using a volume to export content to an nginx proxy but it doesn't update".
I think #BMitch having the accepted answer is correct, but I will just try to add in some details with the hope of being useful.
Is this accurate? If yes, then:
Given it is my claim being scrutinised - I totally defer to #BMitch here :)!
However I would also add:
https://github.com/docker/compose/issues/4581#issuecomment-389559090
Provides a layman explanation of how named volumes / host volumes behave
My explanation needs updated to reflect the notion of 'initialization'
https://stackoverflow.com/a/40030535/3080207
This is how I would recommend setting up volumes in docker-compose at the moment, courtesy of #kaiser
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
Both host volumes and named volumes can achieve this.
I think the point of contention is what you want to happen on the:
first run of the container
subsequent runs of the container and
the location/accessibility of the volume on the host system.
Once a volume is attached to a container (be it a named volume or bind mount), whatever is stored to that volume should be persisted between restarts - that effectively comes for free. This assumes the same docker-compose config, and no manual removal of volumes.
Previously it was a bit limiting using a named volume, as you couldn't tail logs, or edit code directly from the host as easily as you could with a bind mount - but it seems that problem is resolved / has a work around now.
Bind mounts are able to persist data between restarts. I personally find that bind volumes do what I want 99% of the time, that being said, named volumes can now 'do it all' and I'd be using those moving forward.
There are differences between them though, and I'm sure they'll still bite people occasionally, requiring them to reach out to actual experts, instead of users like me :).

Docker: Handling user uploads and saving files

I have been reading about Docker, and one of the first things that I read about docker was that it runs images in a read-only manner. This has raised this question in my mind, what happens if I need users to upload files? In that case where would the file go (are they appended to the image)? or in other words, how to handle uploaded files?
Docker containers are meant to be immutable and replaceable - you should be able to stop a container and replace it with a newer version without any ill effects. It's bad practice to store any configuration or operational data inside the container.
The situation you describe with file uploads would typically be resolved with a volume, which mounts a folder from the host filesystem into the container. Any modifications performed by the container to the mounted folder would persist on the host filesystem. When the container is replaced, the folder is re-mounted when the new container is started.
It may be helpful to read up on volumes: https://docs.docker.com/storage/volumes/
docker containers use file systems similar to their underlying operating system, as it seems in your case Windows Nano Server(windows optimized to be used in a container).
so any uploads to your container will be placed on the corresponding path you provided when uploading the file.
but this data is ephemeral, this means your data will persist until the container is for whatever reason stopped.
to use persistent storage you must provide a volume for your docker container, you can think of volumes as external disks attached to a container that mount on a path inside the container. this will persist data regardless of container state

What's the point of data-only docker containers?

Instead of using a data-only container, I can ...
create a directory on the host (say /opt/shared_data)
Run every container with -v /opt/shared_data:/some/mount/point_inside/container
voila, now /opt/shared_data is effectively shared amongst all containers , correct?
If my understanding is correct, if I create a data-only container and then use "--volumes-from" when running other containers, I am stuck mounting them in the same location they were mounted, whereas, this way I get to choose which directory they are mounted as in my containers.
So why do I need "data-only" containers? Besides, the volume just points to somewhere on the host (/var/lib/docker/volumes?) which is functionally equivalent to my /opt/shared_data anyway right? Whats the advantage of the former?
Data containers have been largely deprecated in favor of named volumes. There's really no advantage to using a data container over a named volume, and includes the disadvantage of being stuck with the mount points.
To compare named volumes with host volumes (aka bind mounts), you have have a few differences:
Host volumes include permission issues, users inside the container will differ from those outside the container and files may not be easily accessed from both environments
Named volumes add the ability to use any volume driver so you can mount your data from remote locations.
Named volumes are initialized to the contents of the image at that path, including all files and any directory permissions.
The latter point is a big one for me, it means you can create an initial default value for a data folder in your image, but update it using the container and keep those changes in a named volume. With bind mounts, if the directory is empty or doesn't exist, that's also what you get when you mount it in your container.

Docker - Exposing container directories to host directories without obscuring original contents

I am still relatively new to Docker, so I haven't quite figured out all the nuances yet, so forgive me if this has already been resolved elsewhere.
I would like to share files between the container and the host.
So far I have been using volumes, and mounting a specific host directory to a container directory - but this presents an issue in that, if the host directory is messed around with, these changes are also present in the container.
Another problem I am experiencing is that if the host directory is empty, and is mounted to a pre-existing directory on the container, then the contents of the directory are made invisible. I do understand that this behaviour is consistent with mounting, so I know this is not technically an issue.
But I wonder if it would be possible to set up a volume, or an alternative solution, that:
firstly cannot be edited on the host side,
and secondly allows for the host directory contents to be merged with the container's directory?
I wonder if it would be possible to set up a volume, or an alternative solution, that
firstly cannot be edited on the host side,
That would be a data volume container (with docker create on an image with a VOLUME directive)
and secondly allows for the host directory contents to be merged with the container's directory?
Not that I know of. You would have to do some kind of copy on startup.

Resources