In Docker, how can I share files between containers and then save them to an image?

I want to commit the data in a container's shared volume to an image, but I can't seem to do it. I get the impression this may not be possible in Docker, but that seems totally at odds with the whole philosophy of not leaving data on the host, so part of me thinks there must be a way to do this.
1. Terminal 1
Start up a container in Terminal 1 with a volume.
$ docker run -it -v /data ubuntu:14.10 /bin/bash
root@19fead4f6a68:/# echo "Hello Docker Volumes." > /data/foo.txt
2. Terminal 2
Start up a second container in Terminal 2. The file from container 1 is there, so Docker volumes are working.
$ docker run -it --volumes-from 19fead4f6a68 ubuntu:14.10 /bin/bash
root@5c7cdbfc67d8:/# cat /data/foo.txt
Hello Docker Volumes.
3. Terminal 3
My understanding is that I can only commit diffs to images, so I check the diffs on both containers. For some bizarre reason my changes don't show up!
$ docker diff 19fead4f6a68
A /data
$ docker diff 5c7cdbfc67d8
A /data
4. Back in Terminal 1
I create a file outside of the volume folder
root@19fead4f6a68:/# echo "Docker you are a very strange beast...." > /var/beast.txt
5. Back in Terminal 3
We now have some changes we can commit, although I am rather frustrated because this is not the volume data I needed to share with my other container.
$ docker diff 19fead4f6a68
A /data
C /var
A /var/beast.txt
Clearly this is by design. Does anyone have any idea why Docker doesn't let me save volume data in a commit? Is there any way at all to share files between containers and then save them to an image? I feel like I must be missing something, especially when it comes to sharing data while avoiding host dependencies.

Volumes are outside of container images. That's exactly what they are for - bringing data inside a container that isn't in the image.
From the Docker docs:
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System to provide several useful features for persistent or shared data:
Data volumes can be shared and reused between containers
Changes to a data volume are made directly
Changes to a data volume will not be included when you update an image
If you want to save some changes as part of an image, make the changes inside the image and not in a volume. If you want to share changes across multiple containers, put that data in a volume but you have to make your own arrangements for snapshots, rollback, etc., because Docker doesn't have that feature.
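Following that advice for the setup in the question, a minimal sketch: copy what is currently in the volume out through a throwaway container, then build a new image whose own layer contains those files. This is a workaround, not a way to make `docker commit` include volumes. The function name `commit_volume_data`, the `payload` build-context folder and reusing `ubuntu:14.10` as both helper and base image are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bake the current contents of a volume into a new image.
# Assumes the base image declares no VOLUME at $vol_path, so the copied files
# end up in an ordinary image layer.
commit_volume_data() {
  local src_container="$1" vol_path="$2" new_image="$3"
  local base_image="${4:-ubuntu:14.10}"   # assumption: same image as in the question
  local ctx status
  ctx="$(mktemp -d)" || return 1
  mkdir -p "$ctx/payload"

  # 1. Borrow the source container's volumes plus an empty host folder,
  #    and copy the volume contents into that folder.
  if ! docker run --rm --volumes-from "$src_container" \
      -v "$ctx/payload":/out "$base_image" cp -a "$vol_path/." /out/; then
    rm -rf "$ctx"
    return 1
  fi

  # 2. Build an image whose own filesystem layer contains the copied files.
  printf 'FROM %s\nCOPY payload/ %s/\n' "$base_image" "$vol_path" > "$ctx/Dockerfile"
  docker build -t "$new_image" "$ctx"
  status=$?
  rm -rf "$ctx"
  return "$status"
}
```

With the containers from the question, `commit_volume_data 19fead4f6a68 /data hello-volumes` should produce an image in which `docker run --rm hello-volumes cat /data/foo.txt` prints the file with no volume attached. Going through a Dockerfile and `docker build` rather than `docker commit` keeps the result reproducible from the copied files.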
Maybe you would be interested in Flocker.

It looks as though there is an open issue around adding volume layers to docker:
https://github.com/docker/docker/issues/9382

Related

Can one recover output from a Docker container after it has been removed

I've just started learning Docker for work, and my current belief is that if one runs a Docker container with the --rm option, i.e.
docker run --rm mycontainer python3 ./mycode.py --logging='true'
then whatever output is produced disappears when the container closes. However, I just came across some code documentation that said:
"--rm: removes container at end of execution. Note - since output is stored in a volume, this persists beyond the life of the container"
What does this mean?
The original command this came from had the form:
docker run -it --name p2p -d --rm \
--mount type=bind,source=/home/me/scripts,target=/scripts \
--mount source=data,target=/data \
--mount source=output,target=/output \
--gpus device=GPU-5jhfjhjhjg-jhg-jgjgjh \
my_docker_container \
python3 mycode.py --logging='true' <lots of other flags>
What does it mean "the output is stored in a volume" and how do I go about finding this volume?
Docker volumes are basically just directories on the host, usually under /var/lib/docker/volumes. It's a little trickier to get to /var/lib/docker on OS X, because there the Docker daemon runs inside a Linux VM.
You can run docker volume ls to list the volumes, and docker volume inspect <id> to get the path on disk. The volume should hang around after the container is removed unless you explicitly remove it (docker volume rm, docker volume prune, or docker system prune --volumes), and a named volume is reattached automatically when you run the same command again.
Your first command (the one without any --mount options) doesn't actually mount a volume, so its output goes away with the container, although I've occasionally lucked out and found data under /var/lib/docker that hadn't been deleted/garbage-collected yet.
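A minimal sketch of both lookups for the `output` volume from the question. The helper name `show_volume` and the use of the `alpine` image for the portable listing are assumptions; `docker volume inspect --format` and a read-only `-v name:path:ro` mount are standard CLI features.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate and list a named volume after the container
# that used it (started with --rm) is gone. Assumes the small "alpine" image
# can be pulled; any image that has `ls` would do.
show_volume() {
  local volume="$1"
  # Where the volume lives on the Docker host. On a native Linux host this is
  # typically /var/lib/docker/volumes/<name>/_data and usually needs root to read.
  docker volume inspect --format '{{ .Mountpoint }}' "$volume" || return 1
  # Portable alternative (also works when the daemon runs in a VM):
  # mount the volume read-only into a throwaway container and list it.
  docker run --rm -v "$volume":/mnt:ro alpine ls -la /mnt
}
```

Run `docker volume ls` first to see the names, then, for example, `show_volume output`. The second command inside the helper works even when /var/lib/docker is hidden inside a VM, as on OS X.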
Volumes are the primary method of persisting data beyond the lifetime of a container and also for sharing data between containers so that, provided that the volume is writeable, the change one container makes is visible to another.
Think of it like network file storage shared between two computers that have no hard disk of their own. If one of them shuts down and restarts, it has no local disk to get persisted data from, but through the network storage it can still see the latest changes the other machine made. The same goes for volumes and containers.
The source of a Docker volume can be any logical abstraction of persistent disk storage, whether it's a Windows drive or a Linux mount point. When you mount a volume into a container, you're basically creating a mount point inside the container that points at the outside storage, so the container sees what the host sees and vice versa. For example, in the command you shared, the contents of the host directory /home/me/scripts/ are seen by the container as /scripts. If you open a bash shell in the container and rm a file in /scripts, the file disappears from /home/me/scripts/ as well, because host and container are pointing at the very same files.
Volumes are essential for running databases in containers because a container's own filesystem is ephemeral: everything written there is lost when the container is removed. With a volume, if the database container is started again with the same volume mount pointing to where the database files live, the database state remains intact.
Most of what I said is aimed at conveying the basic idea of a volume rather than being completely accurate; I hope you get what I am saying. Here is a great article that goes deeper into Docker volumes. It's 5 years old, but the concept still holds.
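The database case above, sketched for PostgreSQL under stated assumptions: the official `postgres:16` image, a container named `db`, a named volume `pgdata` and a throwaway password are all hypothetical choices. The point is only that the volume, not the container, holds the state.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the database files live in the named volume "pgdata",
# so the "db" container itself can be thrown away and recreated.
start_db() {
  # -v pgdata:... creates the named volume on first use and reuses it afterwards.
  docker run -d --name db \
    -e POSTGRES_PASSWORD=example \
    -v pgdata:/var/lib/postgresql/data \
    postgres:16
}

recreate_db() {
  docker rm -f db >/dev/null || return 1   # the container is gone...
  start_db                                 # ...but pgdata is mounted again
}
```

After `start_db`, create some tables, then call `recreate_db`: the new container mounts the same `pgdata` volume, so the tables should still be there.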

Combining VOLUME + docker run -v

I was looking for an explanation on the VOLUME entry when writing a Dockerfile and came across this statement
A volume is persistent data stored in /var/lib/docker/volumes/...
You can either declare it in a Dockerfile, which means each time a container is started from the image, the volume is created (empty), even if you don't have any -v option.
You can declare it on runtime docker run -v [host-dir:]container-dir.
combining the two (VOLUME + docker run -v) means that you can mount the content of a host folder into your volume persisted by the container in /var/lib/docker/volumes/...
docker volume create creates a volume without having to define a Dockerfile and build an image and run a container. It is used to quickly allow other containers to mount said volume.
But I'm having a hard time understanding this line:
...combining the two (VOLUME + docker run -v) means that you can mount the content of a host folder into your volume persisted by the container in /var/lib/docker/volumes/...
For example, let's say I have a config file on my host machine and I run a container based on the image built from the Dockerfile I wrote. Will it copy the config file into the volume I declared in my VOLUME entry?
Would it be something like (pseudocode)
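# Dockerfile
FROM ubuntu
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server \
    && mkdir -p /var/run/mysqld && chown mysql:mysql /var/run/mysqld
VOLUME /etc/mysql/conf.d
CMD ["mysqld"]
And when I run it
docker run -it -v /path/to/config/dir:/etc/mysql/conf.d ubuntu_based_image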
Is this what they mean?
You probably don't want VOLUME in your Dockerfile. It's not needed in order to mount files or directories at runtime, and it has confusing side effects, like making subsequent RUN commands silently lose changes under that path (at least with the legacy builder; BuildKit keeps them).
If an image does have a VOLUME, and you don't mount anything else there when you start the container, Docker will create an anonymous volume and mount it for you. This can result in space leaks if you don't clean these volumes up.
You can use a docker run -v option on any container directory regardless of whether or not it's declared as a VOLUME.
If you docker run -v /host/path:/container/path, the two directories are actually the same; nothing is copied, and writes to one are (supposed to be) immediately visible on the other.
docker run -v /host/path:/container/path bind mounts aren't visible in /var/lib/docker at all.
You shouldn't usually be looking at content in /var/lib/docker (and can't if you're not on a native-Linux host). If you need to access the volume file content directly, use a bind mount rather than a named or anonymous volume.
Bind mounts like you've shown are appropriate for injecting config files into containers, and for reading log files back out. Named volumes are appropriate for stateful applications' storage, like the data for a MySQL database. Neither type of volume is appropriate for code or libraries; build these directly into Docker images instead.
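A minimal sketch of that split for the MySQL example in the question, assuming the official `mysql:8` image instead of a hand-built Ubuntu one (that image reads extra `*.cnf` files from /etc/mysql/conf.d). The helper name, container name, `mysql_data` volume name and password are placeholders. No VOLUME instruction is involved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inject a host config file with a read-only bind mount
# and keep the database files in a named volume.
run_mysql() {
  local cnf="$1"   # host path of the config file
  # -v treats a bare name like my.cnf as a named volume, so require an absolute host path.
  case "$cnf" in
    /*) ;;
    *) echo "run_mysql: need an absolute path, got '$cnf'" >&2; return 1 ;;
  esac
  docker run -d --name mysql \
    -e MYSQL_ROOT_PASSWORD=example \
    -v "$cnf":/etc/mysql/conf.d/custom.cnf:ro \
    -v mysql_data:/var/lib/mysql \
    mysql:8
}
```

For example, `run_mysql "$PWD/custom.cnf"`. The container sees the very same file as the host (nothing is copied), while the data survives `docker rm -f mysql` because it lives in `mysql_data`.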

What is the actual advantage of declaring a VOLUME in a Dockerfile? [duplicate]

First of all, I want to make it clear I've done due diligence in researching this topic. Very closely related is this SO question, which doesn't really address my confusion.
I understand that when VOLUME is specified in a Dockerfile, this instructs Docker to create an unnamed volume for the duration of the container which is mapped to the specified directory inside of it. For example:
# Dockerfile
VOLUME ["/foo"]
This would create a volume to contain any data stored in /foo inside the container. The volume (when viewed via docker volume ls) would show up as a random jumble of numbers.
Each time you do docker run, this volume is not reused. This is the key point causing confusion here. To me, the goal of a volume is to contain state persistent across all instances of an image (all containers started from it). So basically if I do this, without explicit volume mappings:
#!/usr/bin/env bash
# Run container for the first time
docker run -d --name foo foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -d --name foo foo
I expect the unnamed volume to be reused between the 2 run commands. However, this is not the case. Because I did not explicitly map a volume via the -v option, a new volume is created for each run.
Here's important part number 2: Since I'm required to explicitly specify -v to share persistent state between run commands, why would I ever specify VOLUME in my Dockerfile? Without VOLUME, I can do this (using the previous example):
#!/usr/bin/env bash
# Create a volume for state persistence
docker volume create foo_data
# Run container for the first time
docker run -d --name foo -v foo_data:/foo foo
# Kill the container and re-run it again. Note that the previous
# volume would now contain data because services running in `foo`
# would have written data to that volume.
docker container stop foo
docker container rm foo
# Run container a second time
docker run -d --name foo -v foo_data:/foo foo
Now, truly, the second container will have data mounted to /foo that was there from the previous instance. I can do this without VOLUME in my Dockerfile. From the command line, I can turn any directory inside the container into a mount to either a bound directory on the host or a volume in Docker.
So my question is: What is the point of VOLUME when you have to explicitly map named volumes to containers via commands on the host anyway? Either I'm missing something or this is just confusing and obfuscated.
Note that all of my assertions here are based on my observations of how docker behaves, as well as what I've gathered from the documentation.
Instructions like VOLUME and EXPOSE are a bit anachronistic. Named volumes as we know them today were introduced in Docker 1.9, released in November 2015.
Before Docker 1.9, running a container whose image had one or more VOLUME instructions (or using the --volume option) was the only way to create volumes for data sharing or persistence. In fact, it used to be a best practice to create data-only containers whose sole purpose was to hold one or more volumes, and then share those volumes with your application containers using the --volumes-from option. Here are some articles that describe this outdated pattern.
Docker Data Containers
Why Docker Data Containers (Volumes!) are Good
Also, check out moby/moby#17798 (Data-only containers obsolete with docker 1.9.0?) where the change from data-only containers to named volumes was discussed.
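To make the difference concrete, a sketch of the outdated data-only container pattern next to its named-volume replacement. `datastore`, `appdata` and `my/app` are placeholder names, and busybox simply serves as a small image to own the volume.

```shell
#!/usr/bin/env bash
# Before Docker 1.9: a data-only container exists just to own the volume,
# and application containers borrow it with --volumes-from.
old_pattern() {
  docker create -v /data --name datastore busybox
  docker run --rm --volumes-from datastore my/app
}

# Docker 1.9 and later: the named volume is a first-class object,
# so no placeholder container is needed.
new_pattern() {
  docker volume create appdata
  docker run --rm -v appdata:/data my/app
}
```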
Today, I consider the VOLUME instruction as an advanced tool that should only be used for specialized cases, and after careful thought. For example, the official postgres image declares a VOLUME at /var/lib/postgresql/data. This can improve the performance of postgres containers out of the box by keeping the database data out of the layered filesystem. Docker doesn't have to search through all the layers of the container image for file requests at /var/lib/postgresql/data.
However, the VOLUME instruction does come at a cost.
Users might not be aware of the unnamed volumes being created, which continue to take up storage space on their Docker host after the containers are removed.
There is no way for a downstream image to remove a volume declared in a parent image's Dockerfile. Downstream images also cannot add data to paths where volumes exist (this applies to the legacy builder; BuildKit keeps such changes).
The latter issue results in problems like these.
How to “undeclare” volumes in docker image?
GitLab on Docker: how to persist user data between deployments?
For the GitLab question, someone wants to extend the GitLab image with pre-configured data for testing purposes, but it's impossible to commit that data in a downstream image because of the VOLUME at /var/opt/gitlab in the parent image.
tl;dr: VOLUME was designed for a world before Docker 1.9. Best to just leave it out.
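To see the storage cost mentioned in the answer, a small sketch with hypothetical helper names: list volumes that no container references any more, and start throwaway containers with `--rm`, which also deletes the container's anonymous volumes when it exits.

```shell
#!/usr/bin/env bash
# Each container started from an image that declares VOLUME gets a new
# anonymous volume unless something else is mounted at that path.

# List volumes no container references any more
# (anonymous ones show up as 64-character hex IDs).
list_dangling_volumes() {
  docker volume ls -q --filter dangling=true
}

# --rm removes the container and its anonymous volumes when it exits;
# named volumes are left alone.
run_without_leaking() {
  docker run --rm "$@"
}
```

`docker volume prune` deletes unused volumes in bulk; on recent Docker releases it only removes anonymous volumes unless `--all` is given, so check the list first.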

Can I mount a Docker image as a volume in Docker?

I would like to distribute some larger static files/assets as a Docker image so that users can pull those optional files the same way they pull the app itself. But I can't find a good way to expose files from one Docker image to another. Is there a way to mount a Docker image itself (or a directory in it) as a volume in another Docker container?
I know there are volume plugins I could use, but I couldn't find any that let me do this or something similar.
It is possible to turn any directory of an image into a Docker volume, but not the full image, at least not in a pretty or simple way.
If you want to expose a directory from your image as a Docker volume, create a named volume and mount it over that directory:
docker volume create your_volume
docker run -d \
-it \
--name=yourcontainer \
-v your_volume:/dir_with_data_you_need \
your_docker_image
Because your_volume is still empty when it is first mounted, Docker copies the contents of /dir_with_data_you_need from your_docker_image into it. From this point on, your_volume holds that data and can be mounted into other containers.
The reason you cannot put the whole image in a volume is that Docker doesn't accept / as a volume path. You'll get Cannot create container for service your-srv: invalid volume spec "/": invalid volume specification: '/' even if you try with docker-compose.
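The same trick wrapped in a hypothetical helper, so the assets image only has to run long enough to fill the volume. It assumes the assets image contains a `true` binary (most distro-based images do; a `FROM scratch` image would not), and `my/assets`, `/assets` and `assets_data` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish one directory of an image through a named volume.
seed_volume_from_image() {
  local image="$1" dir="$2" volume="$3"
  # Docker copies $dir from the image into the volume only while the volume
  # is still empty; --rm discards the container but keeps the named volume.
  docker run --rm --entrypoint true -v "$volume":"$dir" "$image"
}
```

For example, `seed_volume_from_image my/assets /assets assets_data`, then `docker run -d -v assets_data:/assets:ro my/app`. If the volume already contains files, Docker leaves them alone, so re-seeding after an image update means removing the volume first (`docker volume rm assets_data`).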
I don't know of any direct way.
You can use a folder on your host as a bridge to share things; this is an indirect way to achieve it.
docker run -d -v /any_of_your_host_folder:/your_assets_folder_in_your_image_container your_image
docker run -d -v /any_of_your_host_folder:/your_folder_of_your_new_container your_container_want_to_use_assets
For your_image, add a CMD to the Dockerfile that copies the assets into your_assets_folder_in_your_image_container (the bind-mounted folder; this works because CMD runs after the volume is mounted). Keep the assets themselves at a different path in the image, because the bind mount hides whatever the image has at the mount point.
This costs some time, but only the first time the assets container starts. Once it has run, the assets have been copied to the host folder and no longer depend on the assets image, so you can delete that image and waste no space.
Your aim is just to make the assets easy for other people to use, so why not give them a script that automatically fetches the image, starts the container (whose CMD copies the files into the volume), and deletes the image and container? The assets are then already on the host, and people can use that host folder as a volume for whatever comes next.
Of course, if a container could directly use another image's files, that would be better than this solution. Anyway, this is a workable solution, although not a perfect one.
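The fetch, copy and clean-up script suggested above could look roughly like this. It assumes the assets image stores its files at /assets-src (outside the mount point) and has a CMD equivalent to `cp -a /assets-src/. /export/`; `fetch_assets`, `/assets-src` and `/export` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical end-user script for the "copy assets to a host folder" approach.
fetch_assets() {
  local image="$1" host_dir="$2"
  mkdir -p "$host_dir" || return 1
  host_dir="$(cd "$host_dir" && pwd)"   # -v needs an absolute host path
  docker pull "$image" || return 1
  # The image's CMD copies the assets into the bind-mounted folder;
  # --rm removes the container once the copy finishes.
  docker run --rm -v "$host_dir":/export "$image" || return 1
  # The files now live on the host, so the image is no longer needed.
  docker rmi "$image"
}
```

For example, `fetch_assets my/assets ./assets`, then `docker run -d -v "$PWD/assets":/your_folder_of_your_new_container your_container_want_to_use_assets`.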
You can mount the Docker socket as a volume, which allows you to start one of your Docker images from within your Docker container.
To do this, add the two following volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
- "/usr/bin/docker:/usr/bin/docker"
If you need to share files between the containers map the volume /tmp:/tmp when starting both containers.
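A sketch of the socket approach as a shell helper. It swaps the answer's `/usr/bin/docker` bind mount for the official `docker:cli` image (an assumption: a host binary may not run inside a different base image), and bind-mounts /tmp into both containers as the answer suggests. `run_sibling` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a sibling container from inside another container
# through the host's Docker socket.
run_sibling() {
  # The inner `docker run` is executed by the HOST daemon, so its -v paths
  # refer to the host filesystem; mapping /tmp:/tmp in both containers makes
  # the outer and inner container see the same host /tmp.
  docker run --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp:/tmp \
    docker:cli \
    docker run --rm -v /tmp:/tmp "$@"
}
```

For example, `run_sibling alpine ls /tmp` lists the host's /tmp from a container started by another container.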
