Difference between data and host volume in Docker containers?

As far as I know, there are 2 options for using volume in Docker:
1. Mount a host directory as data volume:
docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=<YourStrong!Passw0rd>" -p 1433:1433 \
  -v <host directory>/data:/var/opt/mssql/data \
  -d mcr.microsoft.com/mssql/server:2019-latest
2. Use data volume containers:
docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=<YourStrong!Passw0rd>" -p 1433:1433 \
  -v sqlvolume:/var/opt/mssql -d mcr.microsoft.com/mssql/server:2019-latest
My questions:
1) In the 1st option, I think data in the /var/opt/mssql/data is completely kept on <host directory>. But in the 2nd option, where is it kept in the Docker data files?
2) Let's say I have a container currently in use where /var/opt/mssql/data is stored, and I then decide to mount it to a <host directory>. In this scenario, can I move this data directly to the host directory using the docker run command in the 2nd option? Or should I back up and restore this data?
3) When mounting data in a host directory, then deleting all the containers and uninstalling Docker, we still have the data. So, after reinstalling Docker and mounting the same database to the same host, will we get the same data as before uninstalling Docker?

1) Your bind-mounted data is kept in the bind-mounted folder on your host. Nothing really mysterious about that. For Docker volumes, in most typical situations, the data is also kept on your Docker host. The exact location depends on your configuration, more specifically on the volume driver in use. With the default local driver, /var/lib/docker/volumes is a good place to start looking under the hood. Note that some drivers support storing data over the network (e.g. vieux/sshfs...), but from your question I doubt you use any of those at the moment.
2) I'm not totally sure what you are asking here. That said, one feature available with Docker volumes (and not with bind mounts, i.e. mounting a host folder into the container) is that when a container starts with a freshly created, empty volume, the content already present at that path in the image is copied into the volume.
3) Data is data, wherever you keep it. If you have any sort of (thoroughly tested!) backup, you will always be able to restore it and feed it to your new container. For Docker beginners, the situation is usually easiest to visualize with bind mounts (i.e. back up the folder on the host, restore it and bind-mount it back into a new container). If you decide to work with volumes (which is probably a good suggestion, as explained in the volume documentation introduction), see Backup, restore, or migrate data volumes to get a first idea of how to proceed.
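For question 2, one way to move data out of a named volume and into a host directory is a throwaway container that mounts both and copies between them. The sketch below is an illustration under assumptions rather than an official procedure: the function name copy_volume_to_host, the busybox image and the /from and /to mount points are my own choices, and the SQL Server container should be stopped first so the files are not changing while they are copied.

```shell
# Copy the contents of a named volume into a host directory.
# Usage: copy_volume_to_host <volume-name> <absolute-host-dir>
copy_volume_to_host() {
  # $1 = volume name (e.g. sqlvolume), $2 = host directory (absolute path)
  # The volume is mounted read-only at /from and the host directory at /to.
  # "cp -a" keeps ownership and permissions; "/from/." includes hidden files.
  docker run --rm \
    -v "$1":/from:ro \
    -v "$2":/to \
    busybox cp -a /from/. /to/
}
```

For example, copy_volume_to_host sqlvolume /srv/mssql (a hypothetical host path) would leave the database files under /srv/mssql/data, which could then be bind-mounted as in the 1st option.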

Related

Can one recover output from a Docker container after it has been removed

I've just started learning Docker for work, and my current belief is that if one runs a Docker container with the rm option - i.e.
docker run --rm mycontainer python3 ./mycode.py --logging='true'
then whatever output is produced disappears when the container closes. However, I just came across some code documentation that said:
"--rm: removes container at end of execution. Note - since output is stored in a volume, this persists beyond the life of the container"
What does this mean?
The original command this came from had the form:
docker run -it --name p2p -d --rm \
--mount type=bind,source=/home/me/scripts,target=/scripts \
--mount source=data,target=/data \
--mount source=output,target=/output \
--gpus device=GPU-5jhfjhjhjg-jhg-jgjgjh \
my_docker_container \
python3 mycode.py --logging='true' <lots of other flags>
What does it mean "the output is stored in a volume" and how do I go about finding this volume?
Docker volumes are basically just directories on the host, usually under /var/lib/docker/volumes. It's a little trickier to get to /var/lib/docker on OS X, where Docker runs inside a virtual machine.
You can run docker volume ls to list the volumes, and docker volume inspect <name> to get the path on disk. The volume should hang around after the container is removed, unless you explicitly remove it or prune unused volumes (in recent Docker releases, docker system prune only does that when given --volumes), and it is re-attached when you run the same command again.
It looks like you didn't actually mount a volume with your command, but I've occasionally lucked out and found data under /var/lib/docker that hasn't been deleted/garbage-collected/etc yet.
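A small sketch of the lookup described above, assuming the volume is called output as in the question's --mount flags. The function name volume_mountpoint is hypothetical; --format with a Go template is a standard option of docker volume inspect.

```shell
# Print the directory on the Docker host that backs a named volume.
# Usage: volume_mountpoint <volume-name>
volume_mountpoint() {
  # .Mountpoint is the field of the inspect output that holds the host path.
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

Calling volume_mountpoint output after docker volume ls has confirmed the name prints the path. On Docker Desktop that path lives inside the virtual machine, not directly on the Mac or Windows filesystem.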
Volumes are the primary method of persisting data beyond the lifetime of a container and also for sharing data between containers so that, provided that the volume is writeable, the change one container makes is visible to another.
Think of it like a network file storage shared between two computers on a network with absolutely no hard disk of their own. Now, if the computer were to shut down and get restarted, by itself it doesn't have a hard-disk to get persisted data from but because of the network file storage, it can see the latest updates to the contents made by the other machine. Same goes with volumes and containers.
The source of a Docker volume can be any logical abstraction of persistent disk storage, whether it's a Windows drive or a Linux mount point. When mounting a volume on a container, you're basically creating a Linux mount point within the container that points to the outside logical storage, so that the container sees what the host sees and vice versa. For example, in the command you shared, the contents of the host directory /home/me/scripts/ are seen by the container as belonging to /scripts. In fact, if you enter the bash shell of the container and run rm on any file in /scripts, the file is removed from /home/me/scripts/ as well, because in reality it IS the same thing being pointed at by host and container.
Volumes are essential for running databases in containers because the container by itself is ephemeral and everything is lost when it dies. But having a volume means that if the db container is started up again with the same volume mount pointing to the host file system where db data is residing, the db state remains intact.
Most of what I said is aimed at getting the basic idea of a volume across and not at being completely accurate; I hope you get what I am saying. Here is a great article that goes deeper into Docker volumes. It's 5 years old but the concept still holds.

Docker: using a bind mount locally with swarm

Docker newcomer here.
I have a simple image of a django website with a volume defined for the app directory.
I can bind this volume to the actual folder where I do the development with this command :
docker container run --rm -p 8000:8000 --mount type=bind,src=$(pwd)/wordcount-project,target=/usr/src/app/wordcount-project wordcount-django
This works fairly well.
Now I tried to push that simple example in a swarm. Note that I have set up a local registry for the image to be available.
So to start my service I'd do :
docker service create -p 8000:8000 --mount type=bind,source=$(pwd)/wordcount-project,target=/usr/src/app/wordcount-project 127.0.0.1:5000/wordcount-django
It will work after some tries, but only because it runs on the local node (where the actual folder is) and not on a remote node (where there is no wordcount-project folder).
Any idea how to solve this so that this folder is accessible to all nodes and yet still accessible locally for development?
Thanks !
Using bind mounts in Docker swarm is not recommended, as you can read in the doc. In particular:
Important: Bind mounts can be useful but they can also cause problems. In most cases, it is recommended that you architect your application such that mounting paths from the host is unnecessary.
However, if you still want to use a bind mount, then you have two possibilities:
1. Make sure your folder exists on all the nodes. The main problem here is that you'll have to update it every time on every node.
2. Use a shared filesystem (such as sshfs for example) and mount it on a directory on each node. However, now that you have a shared filesystem, you can just use a Docker data volume and change the driver.
You can find some documentation on changing the volume data driver here
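As a rough sketch of the second possibility, the service can mount a volume backed by a network share instead of a host path, so every node resolves the same data. This assumes an NFS export is reachable from all nodes; the function name, the volume name wordcount-src, the server address and the export path are all placeholders, while the --mount keys (volume-driver, volume-opt) follow the form shown in the docker service create documentation.

```shell
# Create the service with an NFS-backed volume instead of a bind mount.
# Usage: create_wordcount_service <nfs-server-address> <exported-path>
create_wordcount_service() {
  # $1 = NFS server address, $2 = path exported by that server (hypothetical)
  # Each node creates its own local-driver volume that points at the same export.
  docker service create -p 8000:8000 \
    --mount "type=volume,source=wordcount-src,target=/usr/src/app/wordcount-project,volume-driver=local,volume-opt=type=nfs,volume-opt=device=:$2,volume-opt=o=addr=$1" \
    127.0.0.1:5000/wordcount-django
}
```

For local development the plain bind mount from the question still works; only the swarm service needs the shared volume.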

Making Docker "undeletable" volume

I have a docker named volume for database data. Now the thing is that when the database container is down and I (or anyone) run docker system prune it deletes all the unused containers, images and volumes including the one with database data. Is there a way to make the volume undeletable unless it is explicitly told to?
I suppose I can just mount a host directory to the container without making it a docker volume (and therefore without the risk of deleting it), but using docker volume seems like a cleaner way to do it.
When you run docker system prune, it wipes out stopped containers, unused networks and dangling images; in recent Docker releases, volumes are only included if you pass --volumes. A bind-mounted host directory is not touched either way. So if you do something like this:
docker run -d -p 8080:8080 -p 1521:1521 -v /Users/noname_dev/programming/oracle-database:/u01/app/oracle -e DBCA_TOTAL_MEMORY=1024 oracle-database
then /Users/noname_dev/programming/oracle-database will still be there on your local machine, but the container will naturally be gone until you create it again.
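If you would rather stay with a named volume, one possible safeguard is to label it and exclude that label whenever volumes are pruned. This is only a sketch: the label keep and both function names are arbitrary choices of mine, and it relies on the label!=<key> filter that docker volume prune documents.

```shell
# Create a volume that carries a marker label.
# Usage: create_protected_volume <volume-name>
create_protected_volume() {
  # "keep" is an arbitrary label name chosen for this example.
  docker volume create --label keep=true "$1"
}

# Prune unused volumes, skipping any volume that has the "keep" label.
# Recent Docker releases only consider anonymous volumes here unless --all is added.
prune_unprotected_volumes() {
  docker volume prune --force --filter "label!=keep"
}
```

This does not make the volume undeletable (docker volume rm still works), it only keeps it out of filtered prune runs.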

File storage options with Docker

We plan to use Docker with our new asp.net core project, and one of the requirements is that the app will upload files and we need to have them stored permanently.
We have read that Docker creates a filesystem/volume (I might be imprecise in terminology here) per container, and that if the container is recreated for whatever reason, the filesystem/volume exposed to the container is lost.
We would like to avoid storing files in our database (mongodb).
What is the usual, best-practice way to have files permanently and reliably stored with Docker?
Keeping non-ephemeral data in external storage servers is one solution. A more recent approach is to use S3 or a local equivalent like MinIO to store shared or private data that needs to outlive the lifetime of the container.
Refer to similar question
It's possible to create data volumes in the docker image/container.
$ docker run -d -P --name web -v /webapp training/webapp python app.py
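This will create a new anonymous volume mounted inside the container at /webapp. Because the volume has no name you chose, the files are hard to find again once the container is destroyed, and the volume itself is deleted with the container if you use --rm or docker rm -v.
On the other hand, we can mount a host directory into a container. The host directory will then be accessible inside the container.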
$ docker run -d -P --name web -v /src/webapp:/webapp training/webapp python app.py
This command mounts the host directory, /src/webapp, into the container at /webapp.
The files stored by the Docker container in this mounted directory will be available even if the container is destroyed. If you are planning to persist the files beyond the lifetime of the container, this will be a good option.
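A named volume gives the same persistence without depending on a specific host path. A minimal sketch, assuming a volume called uploads; the function name, the image name and the upload directory inside the container are placeholders, not something from the question.

```shell
# Start a container whose upload directory is backed by a named volume.
# Usage: run_with_upload_volume <image> <upload-dir-in-container>
run_with_upload_volume() {
  # $1 = image name, $2 = directory the app writes uploads to (both placeholders)
  # Creating the volume first is optional (docker run would create it on demand),
  # but it makes the intent explicit.
  docker volume create uploads
  docker run -d -v uploads:"$2" "$1"
}
```

A replacement container started with the same -v uploads:... mapping sees the files written by the previous one.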

Volume and data persistence

What is the best way to persist a container's data with Docker? I would like to be able to retain some data and get it back when restarting my container. I have read this interesting post but it does not exactly answer my question.
As far as I understand, I only have one option:
docker run -v /home/host/app:/home/container/app
This will mount the host folder into the container.
Is there any other option? FYI, I don't use linking containers (--link )
Using volumes is the best way of handling data which you want to keep from a container. Using the -v flag works well and you shouldn't run into issues with this.
You can also use the VOLUME instruction in the Dockerfile, which means you will not have to add any more options at run time. However, volumes created this way are quite tightly coupled to the specific container: you'd need to use docker start rather than docker run to get the data back (or, of course, -v pointing at the volume which was created in the past, likely in /var/ somewhere).
A common way of handling volumes is to create a data volume container with volumes defined by -v. Then, when you create your app container, use the --volumes-from flag. This will make your new container use the same volumes as the container you used the -v on (your data volume container). Of course this may seem like you're shifting the issue somewhere else.
This makes it quite simple to share volumes over multiple containers. Perhaps you have a container for your application, and another for logstash.
Create a volume container. This form of -v creates a volume, i.e. a directory on the host such as /var/lib/docker/volumes/d3b0d5b781b7f92771b7342824c9f136c883af321a6e9fbe9740e18b93f29b69,
which is mounted in the container at /container/path/vol:
docker run -v /container/path/vol --name volbox ubuntu
I can now use this container as my volume.
docker run -it --volumes-from volbox --name foobox ubuntu /bin/bash
root@foobox:/# ls /container/path/vol
Now, if I distribute these two containers, they will just work. The volume will always be available to foobox, regardless which host it is deployed to.
The snag of course comes if you don't want your storage to be in /var/lib/docker/volumes...
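One commonly used workaround for that snag on Linux is a named volume whose data lives in a directory you choose, created through the local driver's bind options. Treat this as a sketch rather than a recommendation: the function name is hypothetical, and the target directory has to exist already.

```shell
# Create a named volume backed by an existing host directory.
# Usage: create_volume_at <volume-name> <absolute-host-dir>
create_volume_at() {
  # $1 = volume name, $2 = existing absolute directory on the host
  # type=none with o=bind tells the local driver to bind-mount $2 as the volume.
  docker volume create --driver local \
    --opt type=none --opt o=bind --opt device="$2" "$1"
}
```

The resulting volume is used like any other, for example with -v <volume-name>:/container/path/vol.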
I suggest you take a look at some of the excellent posts by Michael Crosby
https://docs.docker.com/userguide/dockervolumes/
and the docker docs
https://docs.docker.com/userguide/dockervolumes/

Resources