What's the point of data-only docker containers?

What's the point of data-only docker containers? - docker

Instead of using a data-only container, I can ...
create a directory on the host (say /opt/shared_data)
Run every container with -v /opt/shared_data:/some/mount/point_inside/container
voila, now /opt/shared_data is effectively shared amongst all containers , correct?
If my understanding is correct, if I create a data-only container and then use "--volumes-from" when running other containers, I am stuck mounting them in the same location they were mounted, whereas, this way I get to choose which directory they are mounted as in my containers.
So why do I need "data-only" containers? Besides, the volume just points to somewhere on the host (/var/lib/docker/volumes?) which is functionally equivalent to my /opt/shared_data anyway right? Whats the advantage of the former?

Data containers have been largely deprecated in favor of named volumes. There's really no advantage to using a data container over a named volume, and includes the disadvantage of being stuck with the mount points.
To compare named volumes with host volumes (aka bind mounts), you have have a few differences:
Host volumes include permission issues, users inside the container will differ from those outside the container and files may not be easily accessed from both environments
Named volumes add the ability to use any volume driver so you can mount your data from remote locations.
Named volumes are initialized to the contents of the image at that path, including all files and any directory permissions.
The latter point is a big one for me, it means you can create an initial default value for a data folder in your image, but update it using the container and keep those changes in a named volume. With bind mounts, if the directory is empty or doesn't exist, that's also what you get when you mount it in your container.

Related

Why have docker-compose volumes to be declared twice when not pointing to an actual folder on the host?

In a docker-compose.yml file, do I really need to specify the volumes twice; inside and outside a service? If yes, Why? (the docker-compose part of the doc doesn't have much information on that)
I have the feeling that, in the case shown here where the myapp volume is not explicitly a folder on the host machine, we have to set it twice, but if it actually is a folder on the host machine, specifying it only inside the frontend service block is enough.
In addition, when the volume is specified outside the service block, it's almost always written as key: without any actual value (or sometime as key: {}), which makes me confused.
Moreover, when I run docker-compose down -v it actually only applies on volumes which are not explicitly specified as a folder on the host machine, according to the doc:
-v, --volumes Remove named volumes declared in the `volumes`
section of the Compose file and anonymous volumes
attached to containers.
So maybe the declaration of a volume outside a service is for making this volume identifiable, hence 'removable'. And on the other hand, it will never be removable if it's not set outside the service?

This is a whole bunch of questions - let's try to answer them sequentially:
1. Do I really need to specify the volumes twice (inside and outside the services section)?
This is not a duplicate specification: outside you declare the volume and inside you specify how to mount it into a container. A volume has an independent life cycle from services. It can be mounted by several services and it will retain data if services are restarted.
2. When the volume is specified outside the service block, it's almost always written as key: {}
This key-only notation is the default and does not require any driver configuration. However, if you needed to e.g. connect to NFS, you would have something like:
volumes:
example:
driver_opts:
type: "nfs"
o: "addr=10.40.0.199,nolock,soft,rw"
device: ":/docker/example"
Also, please differentiate between bind mounts and regular volumes. While regular volumes are managed independently from services (and containers), e.g. with docker volumes list, bind-mounts are mere mappings between the host and the container file system. They are tied to the container they are mounted to.
3. When I run docker-compose down -v it actually only applies on volumes which are not explicitly specified as a folder on the host machine
Yes, this won't remove bind mounts, since bind mounts are mere host-container filesystem mappings and therefore Docker does not create an independent volume entity for them.
For deeper understanding, please consider this excerpt from the documentation:
Bind mounts have been around since the early days of Docker. Bind
mounts have limited functionality compared to volumes. When you use a
bind mount, a file or directory on the host machine is mounted into a
container. The file or directory is referenced by its absolute path on
the host machine. By contrast, when you use a volume, a new directory
is created within Docker’s storage directory on the host machine, and
Docker manages that directory’s contents.

"a bind mount won't copy the container contents to the host automatically, unlike a named volume"

Need clarity on a comment here:
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
docs.docker.com/compose/compose-file/#volumes
Is this accurate? If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
how did Docker persist data across container restarts before there were named volumes?

The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
Is this accurate?
Close to accurate, but I can see the confusion. Host volumes, aka bind mounts, do not have an initialization feature from docker. With anonymous and named volumes, docker will initialize the volume with the contents of the image at that path. This initialization includes ownership and permissions which helps avoid permission errors. This initialization only runs when the container is created and the volume is new or empty, so subsequent containers will not pickup changes to the image made in newer image versions.
If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data
in case of a container restart)?
Reads and writes from the app in the container will continue through to the host filesystem used in the bind mount as expected. It's only the initialization step that doesn't run.
how did Docker persist data across container restarts before there were named volumes?
There were data containers, mounting volumes from other containers, but this was inflexible (all volume paths were fixed to the path in the data container) and mixed management of persistent data with ephemeral containers, and has therefore been phased out.
Volumes are used to handle data persistence between containers. A single container restarting (rather than being replaced) will still have all the container specific filesystem changes. The docker rm command deletes these filesystem changes, along with container logs and metadata/configuration of the container.
The container specific changes are the read/write top layer of an overlay filesystem used by docker. Volume mounts are all separate mounts into subdirectories of this overlay filesystem (just like /home or /var are often separate filesystem mounts in the / filesystem of a Linux host, all reads and writes to those other paths go to a separate underlying filesystem).

If you're going to mount a volume into a container, and you want that volume to reliably contain some content from the image, you need to manually copy it there at container startup time. One way to do this is with an entrypoint wrapper script:
#!/bin/sh
# Copy data into a possibly-mounted location
cp -a /app/static /var/www
# Then run the image's CMD
exec "$#"
You'd include this in your image's Dockerfile
# Must use JSON-array syntax
ENTRYPOINT ["/app/entrypoint.sh"]
CMD same as it was before
There are two important details about Docker named volumes' initialization behavior to be aware of here. The first, which you note, is that Docker only copies content into a volume for Docker named volumes; it doesn't happen for bind mounts, and it doesn't happen in other environments like Kubernetes.
The second, more subtle detail is that the initialization only happens the first time the container runs. If there's already content in a volume that you mount into a container, it will hide what was already there. In other SO questions you can see this manifest as, for example, "I added a package to my Node package.json file, but when I put the node_modules directory in a volume, it ignores the update" or "I'm using a volume to export content to an nginx proxy but it doesn't update".

I think #BMitch having the accepted answer is correct, but I will just try to add in some details with the hope of being useful.
Is this accurate? If yes, then:
Given it is my claim being scrutinised - I totally defer to #BMitch here :)!
However I would also add:
https://github.com/docker/compose/issues/4581#issuecomment-389559090
Provides a layman explanation of how named volumes / host volumes behave
My explanation needs updated to reflect the notion of 'initialization'
https://stackoverflow.com/a/40030535/3080207
This is how I would recommend setting up volumes in docker-compose at the moment, courtesy of #kaiser
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
Both host volumes and named volumes can achieve this.
I think the point of contention is what you want to happen on the:
first run of the container
subsequent runs of the container and
the location/accessibility of the volume on the host system.
Once a volume is attached to a container (be it a named volume or bind mount), whatever is stored to that volume should be persisted between restarts - that effectively comes for free. This assumes the same docker-compose config, and no manual removal of volumes.
Previously it was a bit limiting using a named volume, as you couldn't tail logs, or edit code directly from the host as easily as you could with a bind mount - but it seems that problem is resolved / has a work around now.
Bind mounts are able to persist data between restarts. I personally find that bind volumes do what I want 99% of the time, that being said, named volumes can now 'do it all' and I'd be using those moving forward.
There are differences between them though, and I'm sure they'll still bite people occasionally, requiring them to reach out to actual experts, instead of users like me :).

How can I share data between host and container using mounts

I've been attempting to share data between my host and my container. I've been reading a lot about volumes and I believe I have misunderstood some of the fundamentals around sharing data.
Here's how I've been doing it (with Docker Compose)
version: "2"
services:
my-server:
volumes:
- type: bind
source: ./test/
target: /var/logs
The problem with this approach is that the initial creation of the mount destroys any data in the target folder. So for example if my image was built from another image that had some logs in that folder (for whatever reason), the logs would be destroyed.
This is a major problem with my use case. I need to mount a volume (a folder, basically) so that I can share data between my host and guest, similar to how a shared folder with a VM would work.
I've looked into named volumes but from what I understand, named and anonymous volumes are designed to share data between containers, and not to share data with the host (which is what I need for my use case).
So besides bind mounts, is it possible share data between the host and container?

This is not really a Docker problem. I think you'll run into this with any mount. Basically you are already using the correct mechanism for sharing data between the host and your container.
When you mount something in linux, the mount target (i.e. the path at which you mount something) is always replaced with the root of whatever you mount. It does not merge the contents of the mount target with the contents of the (in this case) bind mounted directory. I'm surprised that works with VM shared folders because you run a high risk of a collision. e.g. same file in both locations. How would it resolve that? File system mounts are not the same as a dropbox like synchronisation of files between two locations.
I suggest that you do your bind mount to somewhere else in your container which has no contents and then modify your in-container workflow to handle this. In your example it sounds like you are attempting to collect logs. It also sounds like the containers configured log directory might have some contents which you want to be copied to the host. You could achieve this by having your container init itself by configuring a new log directory before starting your services/running anything, and copying any existing logs to that location. This new location would be the bind mount. Your init script could also detect if the bind mount was already used in this fashion and not sync over the data. This is really an application specific problem.

What Is The Difference Between Binding Mounts And Volumes While Handling Persistent Data In Docker Containers?

I want to know why we have two different options to do the same thing, What are the differences between the two.

We basically have 3 types of volumes or mounts for persistent data:
Bind mounts
Named volumes
Volumes in dockerfiles
Bind mounts are basically just binding a certain directory or file from the host inside the container (docker run -v /hostdir:/containerdir IMAGE_NAME)
Named volumes are volumes which you create manually with docker volume create VOLUME_NAME. They are created in /var/lib/docker/volumes and can be referenced to by only their name. Let's say you create a volume called "mysql_data", you can just reference to it like this docker run -v mysql_data:/containerdir IMAGE_NAME.
And then there's volumes in dockerfiles, which are created by the VOLUME instruction. These volumes are also created under /var/lib/docker/volumes but don't have a certain name. Their "name" is just some kind of hash. The volume gets created when running the container and are handy to save persistent data, whether you start the container with -v or not. The developer gets to say where the important data is and what should be persistent.
What should I use?
What you want to use comes mostly down to either preference or your management. If you want to keep everything in the "docker area" (/var/lib/docker) you can use volumes. If you want to keep your own directory-structure, you can use binds.
Docker recommends the use of volumes over the use of binds, as volumes are created and managed by docker and binds have a lot more potential of failure (also due to layer 8 problems).
If you use binds and want to transfer your containers/applications on another host, you have to rebuild your directory-structure, where as volumes are more uniform on every host.

Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure of the host machine, volumes are completely managed by Docker. Volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container. More on
Differences between -v and --mount behavior
Because the -v and --volume flags have been a part of Docker for a long time, their behavior cannot be changed. This means that there is one behavior that is different between -v and --mount.
If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory.
If you use --mount to bind-mount a file or directory that does not yet exist on the Docker host, Docker does not automatically create it for you, but generates an error. More on
Docker for Windows shared folders limitation
Docker for Windows does make much of the VM transparent to the Windows host, but it is still a virtual machine. For instance, when using –v with a mongo container, MongoDB needs something else supported by the file system. There is also this issue about volume mounts being extremely slow.
More on

Bind mounts are like a superset of Volumes (named or unnamed).
Bind mounts are created by binding an existing folder in the host system (host system is native linux machine or vm (in windows or mac)) to a path in the container.
Volume command results in a new folder, created in the host system under /var/lib/docker
Volumes are recommended because they are managed by docker engine (prune, rm, etc).
A good use case for bind mount is linking development folders to a path in the container. Any change in host folder will be reflected in the container.
Another use case for bind mount is keeping the application log which is not crucial like a database.
Command syntax is almost the same for both cases:
bind mount:
note that the host path should start with '/'. Use $(pwd) for convenience.
docker container run -v /host-path:/container-path image-name
unnamed volume:
creates a folder in the host with an arbitrary name
docker container run -v /container-path image-name
named volume:
should not start with '/' as this is reserved for bind mount.
'volume-name' is not a full path here. the command will cause a folder to be created with path "/var/lib/docker/volumes/volume-name" in the host.
docker container run -v volume-name:/container-path image-name
A named volume can also be created beforehand a container is run (docker volume create). But this is almost never needed.

As a developer, we always need to do comparison among the options provided by tools or technology. For Volume & Bind mounts, I would suggest to list down what kind of application you are trying to containerize.
Following are the parameters that I would consider before choosing Volume over Bind Mounts:
Docker provide various CLI commands to Volumes easily outside containers.
For backup & restore, Volume is far easier than Bind as it depends upon the underlying host OS.
Volumes are platform-agnostic so they can work on Linux as well as on Window containers.
With Bind, you have 2 technologies to take care of. Your host machine directory structure as well as Docker.
Migration of Volumes are easier not only on local machines but on cloud machines as well.
Volumes can be easily shared among multiple containers.

How can I have shared assets (pictures, text documents, etc) between my Docker container and host system?

I have a Docker container and I am trying to make it so that all of the files in /var/www/ on the container will be saved on the host system at a location (/home/me), and vise-versa. Is it possible to have this shared space between the two?
Would you accomplish this with mount points, or is there a better method?
Thanks

You can use volumes for sharing between container and host.
docker run -v /home/me:/var/www <image>
If you have a fixed files/data, you can add to the image using dockerfile or committing after copying into container. If you want to share rw dir between host and container, you need to use the volumes. Your data will also be persisted even if you remove and recreate a new container.

There are three ways that you can do this
Use volumes. Official docs
Burn the files in your image. Basically include the creation of the files inside the Dockerfile. This means every container container from that image will have an initial state of sorts.
Use data-only containers. These are containers without a running process that contain the data that you need. This also uses volumes. But instead of mounting to the host, your containers mount on the data-only container (which in turn mounts on the host if you want to). This answer will be useful

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart