At the moment I am trying to understand some scenarios concerning data persistence.
The Data Volume Container Pattern sounds great, and there are ways to backup the attached volumes. But what if I have no current backup the DataVolumeContainer gets removed (rather than only stopped)?
From what I understand the volume is physically still present on my host system, but I can't attach it to a new container (since there's no referencing container left).
Is there a possibility to restore that volume (e.g. mount it to a new container) by referencing its volume file or the volume name? (assuming that I named the volume)
but I can't attach it to a new container (since there's no referencing container left).
Is there a possibility to restore that volume (e.g. mount it to a new container) by referencing its volume file or the volume name (assuming that I named the volume)?
Yes there is: if you can find your old volume path, you can, as I mentioned in "Docker mount dangling volume", restore its content in a new (and empty) data volume container by replacing its mounting path content with the one found in the old path.
I prefer saving that path whenever I create a new data volume container.
Its not clear from your text if you have removed the volume with the container - then its gone - otherwise and by default you can find "raw" docker volumes here: /var/lib/docker/volumes/. Check the docs for docker rm -v.
Normally you would create a volume by docker run ... -v hostpath:containerpath ... and then you have your data always at hostpath available, no matter if you remove your container or not.
Related
docker run -ti --rm -v DataVolume3:/var ubuntu
Lets say I have a volume DataVolume 3 which pulls the contents of /var in the ubuntu container
even after killing this ubuntu container the volume remains and I can use this volume DataVolume3 to mount it to other containers.
This means with the deletion of container the volume mounts are not deleted.
How does this work ?
Does that volume mount mean that it copies the contents of /var into some local directory because this does not look like a symbolic link ?
If I have the container running and I create a file in the container then the same file gets copied to the host path ?
How does this whole process of volume mount from container to host and host to container work ?
Volumes are used for persistent storage and the volumes persists independent of the lifecycle of the container.
We can go through a demo to understand it clearly.
First, let's create a container using the named volumes approach as:
docker run -ti --rm -v DataVolume3:/var ubuntu
This will create a docker volume named DataVolume3 and it can be viewed in the output of docker volume ls:
docker volume ls
DRIVER VOLUME NAME
local DataVolume3
Docker stores the information about these named volumes in the directory /var/lib/docker/volumes/ (*):
ls /var/lib/docker/volumes/
1617af4bce3a647a0b93ed980d64d97746878564b141f30b6110d0818bf32b76 DataVolume3
Next, let's write some data from the ubuntu container at the mounted path var:
echo "hello" > var/file1
root#2b67a89a0050:/# cat /var/file1
hello
We can see this data with cat even after deleting the container:
cat /var/lib/docker/volumes/DataVolume3/_data/file1
hello
Note: Although, we are able to access the volumes like shown above but it not a recommended practice to access volumes data like this.
Now, next time when another container uses the same volume then the data from the volume gets mounted at the container directory specified as part of -v flag.
(*) The location may vary based on OS as pointed by David and probably can be seen by the docker volume inspect command.
Docker has a concept of a named volume. By default the storage for this lives somewhere on your host system and you can't directly access it from outside Docker (*). A named volume has its own lifecycle, it can be independently docker volume rm'd, and if you start another container mounting the same volume, it will have the same persistent content.
The docker run -v option takes some unit of storage, either a named volume or a specific host directory, and mounts it (as in the mount(8) command) in a specific place in the container filesystem. This will hide what was originally in the image and replace it with the volume content.
As you note, if the thing you mount is an empty named volume, it will get populated from the image content at container initialization time. There are some really important caveats on this functionality:
Named volume initialization happens only if the volume is totally empty.
The contents of the named volume never automatically update.
If the volume isn't empty, the volume contents completely replace what's in the image, even if it's changed.
The initialization happens only on native Docker, and not for example in Kubernetes.
The initialization happens only on named volumes, and not for bind-mounted host directories.
With all of these caveats, I'd avoid relying on this functionality.
If you need to mount a volume into a container, assume it will be empty when your entrypoint or the main container command starts. If you need a particular directory layout or file structure there, an entrypoint script can create it; if you're expecting it to hold particular data, keep a copy of it somewhere else in your image and copy it in if it's not already there (or, perhaps, always).
(*) On native Linux you can find a filesystem location for it, but accessing this isn't a best practice. On other OSes this will be hidden inside a virtual machine or other opaque storage. If you need to directly access the data (or inject config files, or read log files) a docker run -v /host/path:/container/path bind mount is a better choice.
Volumes are part of neither the container nor the host. Well, technically everything resides in the host machine. But the docker directories are only accessible by users in "docker" group. The files in these directories are separately managed by docker.
"Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux)."
Hence volumes are like the union of files under the docker container and the host itself. Any addition on either end will be added to the volume(/var/lib/docker/volumes), not hard copy, rather something like symbol link
As volumes can be shared across different containers, deleting a container does not cascade to the volumes associated with it.
To remove unused volumes:
docker volume prune .
Need clarity on a comment here:
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
docs.docker.com/compose/compose-file/#volumes
Is this accurate? If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
how did Docker persist data across container restarts before there were named volumes?
The only 'problem' with a bind mount is that it won't copy the
container contents to the host automatically, unlike a named volume.
Is this accurate?
Close to accurate, but I can see the confusion. Host volumes, aka bind mounts, do not have an initialization feature from docker. With anonymous and named volumes, docker will initialize the volume with the contents of the image at that path. This initialization includes ownership and permissions which helps avoid permission errors. This initialization only runs when the container is created and the volume is new or empty, so subsequent containers will not pickup changes to the image made in newer image versions.
If yes, then:
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data
in case of a container restart)?
Reads and writes from the app in the container will continue through to the host filesystem used in the bind mount as expected. It's only the initialization step that doesn't run.
how did Docker persist data across container restarts before there were named volumes?
There were data containers, mounting volumes from other containers, but this was inflexible (all volume paths were fixed to the path in the data container) and mixed management of persistent data with ephemeral containers, and has therefore been phased out.
Volumes are used to handle data persistence between containers. A single container restarting (rather than being replaced) will still have all the container specific filesystem changes. The docker rm command deletes these filesystem changes, along with container logs and metadata/configuration of the container.
The container specific changes are the read/write top layer of an overlay filesystem used by docker. Volume mounts are all separate mounts into subdirectories of this overlay filesystem (just like /home or /var are often separate filesystem mounts in the / filesystem of a Linux host, all reads and writes to those other paths go to a separate underlying filesystem).
If you're going to mount a volume into a container, and you want that volume to reliably contain some content from the image, you need to manually copy it there at container startup time. One way to do this is with an entrypoint wrapper script:
#!/bin/sh
# Copy data into a possibly-mounted location
cp -a /app/static /var/www
# Then run the image's CMD
exec "$#"
You'd include this in your image's Dockerfile
# Must use JSON-array syntax
ENTRYPOINT ["/app/entrypoint.sh"]
CMD same as it was before
There are two important details about Docker named volumes' initialization behavior to be aware of here. The first, which you note, is that Docker only copies content into a volume for Docker named volumes; it doesn't happen for bind mounts, and it doesn't happen in other environments like Kubernetes.
The second, more subtle detail is that the initialization only happens the first time the container runs. If there's already content in a volume that you mount into a container, it will hide what was already there. In other SO questions you can see this manifest as, for example, "I added a package to my Node package.json file, but when I put the node_modules directory in a volume, it ignores the update" or "I'm using a volume to export content to an nginx proxy but it doesn't update".
I think #BMitch having the accepted answer is correct, but I will just try to add in some details with the hope of being useful.
Is this accurate? If yes, then:
Given it is my claim being scrutinised - I totally defer to #BMitch here :)!
However I would also add:
https://github.com/docker/compose/issues/4581#issuecomment-389559090
Provides a layman explanation of how named volumes / host volumes behave
My explanation needs updated to reflect the notion of 'initialization'
https://stackoverflow.com/a/40030535/3080207
This is how I would recommend setting up volumes in docker-compose at the moment, courtesy of #kaiser
how does one get the container's "new data" (e.g. a growing database) into the host when using a bind mount (to persist the data in case of a container restart)?
Both host volumes and named volumes can achieve this.
I think the point of contention is what you want to happen on the:
first run of the container
subsequent runs of the container and
the location/accessibility of the volume on the host system.
Once a volume is attached to a container (be it a named volume or bind mount), whatever is stored to that volume should be persisted between restarts - that effectively comes for free. This assumes the same docker-compose config, and no manual removal of volumes.
Previously it was a bit limiting using a named volume, as you couldn't tail logs, or edit code directly from the host as easily as you could with a bind mount - but it seems that problem is resolved / has a work around now.
Bind mounts are able to persist data between restarts. I personally find that bind volumes do what I want 99% of the time, that being said, named volumes can now 'do it all' and I'd be using those moving forward.
There are differences between them though, and I'm sure they'll still bite people occasionally, requiring them to reach out to actual experts, instead of users like me :).
I'm confused about common consensus that one shouldn't use data containers. I have specific use case that I want to accomplish.
I want to have docker nginx container and behind it some other container with application. To run newest version of my app I want to download ready container from my private docker registry. The application is for now purely static html, javascript something.
So my plan is to create docker image which will hold the files, and will specify a named volume in some /webapp folder. The nginx container will serve this volume. I do not see any other way how to move bunch of files to remote system the "docker containerized" way. Am I not actually creating cursed data container?
Anyway what happens during app containers exchange? When I stop the app container the volume remains accesible, as it is placed on host. When I pull and start new version of app container. The volume will be created again and prefiled with image files stored at the same location, replacing the content on host so the nginx container will server from now new version of the application.Right? What happens when I will reference volume that does not exist yet from the nginx container.
It seem that named values are not automatically filed with the content of the image. As well I'm not sure how to create named volume in docker file as this syntax taken from here doesn't work
FROM training/webapp
VOLUME webapp:/webapp
I think you might want what i have described here https://stackoverflow.com/a/41576040/3625317
The problem with volumes is, that when a container is recreated, not docker-compose down but rather docker-compose pull + up, the new container will not have your "new code stored in the volume" but rather, due to the recycled volume, still the old anon volume. The point is, you will need a anon-volume for the code anyway, since you want it redeployable, not a named volume since you want the code to be exchangeable.
On re-create the anon-volume is not removed, that said, lets say you have the image:v1 right now and you pull image:v2 and then do a docker-compose up. It will recreate your container based on image:v2 - when this finished, you will have a new container, but the code is still from the old container, which was based on image:v1, since the anon-volume has not been replaced, it was re-assigned. docker-compose down && docker-compose up will resolve that for you - but you have to keep this in mind when dealing with your idea. (down removes anon-volumes)
In general, there is a pro / con, see my other post.
Data-containers in general have a other meaning and have been replaced by so called named volumes. Data-containers have been used to establish a volume-mount which is "named" and not based on a anon-volume.
In the past, you had to create a container with a volume, and later use a container-name based mount of this volume ( the container would be the static / name part ), today, you just create a named volume name and mount by this volume-name, no need for a busybox killed after start based container-name based volume mount.
I've searched the docs but nothing came up so time to test it. But for a quick future reference...
Is the host folder populated with the container folder contents?
Is it the opposite?
Are both folder contents merged? (In that case: What happens when a file with the same name is in both folders?)
Or does it produce an error? Is the error thrown on launch or is it thrown when you try to build an image with a VOLUME pointing to an existing populated folder on the container?
Also, another thing that isn't in the docs: Do I have to define the container path as a VOLUME in the Dockerfile in order to use -v against it when launching the container or can I create volumes on the fly?
When you run a container and mount a volume from the host, all you see in the container is what is on the host - the volume mount points at the host directory, so if there was anything in the directory in the image it gets bypassed.
With an image from this Dockerfile:
FROM ubuntu
WORKDIR /vol
RUN touch /vol/from-container
VOLUME /vol
When you run it without a host mount, the image contents get copied into the volume:
> docker run vol-test ls /vol
from-container
But mount the volume from the host and you only see the host's content:
> ls $(pwd)/host
from-host
> docker run -v $(pwd)/host:/vol vol-test ls /vol
from-host
And no, you don't need the VOLUME instruction. The behaviour is the same without it.
Whenever a Docker container is created with a volume mounted on the host, e.g.:
docker run -v /path/on/host:/data container-image
Any contents that are already in /data due to the image build process are always completely discarded, and whatever is currently at /path/on/host is used in its place. (If /path/on/host does not exist, it is created as an empty directory, though I think some aspect of that behavior may currently be deprecated.)
Pre-defining a volume in the Dockerfile with VOLUME is not necessary; all VOLUME does is cause any containers run from the image to have an implicit -v /volume/path (Note lack of host mount path) argument added to their docker run command which is ignored if an explicit -v /host/path:/volume/path is used.
I recently created a mongodb docker instance running on boot2docker on windows.
Unfortunately during my experimenting with kitematic I managed to accidentally remove the volume from the mongo container and can no longer access my data.
The mongo instance seems to have created a new volume with the old volume now remaining dangling (orphaned) and not mounted in any containers.
Is there any way to recover this?
Thanks for your reply it got me on the right track, I managed to start a new mongo container using the following command
docker run -d -v 571284fbe08a3f2b675af299ec14e55550bad623f6316914d465843fa12d6f18:/data/db mongo
where 571284fbe08a3f2b675af299ec14e55550bad623f6316914d465843fa12d6f18 is the dangling volume identified by using
docker volume ls
I usually register the path (in a file) of any data volume container I create, precisely in that case. See "Docker volumes for persistent data - is it enough to pass container path only?" and my script updateDataContainerPath.
What I have seen is that:
any new data volume container comes with its own Mounts.Source path,
you can delete that new folder (which is empty)
you can replace it with the folder of your old data volume container (giving it the same name as the new one, but with the content of the old data volume container)
That will be enough for the new data volume container to give you access to your old data.
In your case, since you didn't register the path of your previous data volume container, you will have to do a search in /mnt/sda1/var/lib/docker/volumes/ for a known file.
This answer is probably just a rewrite of #VonC's, but I feel the need to sum things up a bit. Here are the steps I followed in order to put back a volume that got 'detached' after the container had been removed and recreated.
docker volume ls -f 'dangling=true' to see all detached volumes
docker volume inspect <volume_hash> on each to see where they sit (/var/lib/docker/volumes/ in my case)
Look into each volume's /_data folder to guess who's who.
Then as explained by VonC, just copy the content of the dangling volume into the new one.
(optional) when you are sure you've got back everything you need, docker volume prune will remove all dangling volumes (and therefore make a potential future search for the right dangling volume easier ^^)