Programmatic access to Docker volumes from the host

I am using the docker HTTP API described here.
Suppose I get a volume ID using the GET /volumes API endpoint. Is it possible for me to inspect the contents of this volume (list files, read files)?
I understand that I could create a container that mounts this volume and then use the /containers/(id)/archive endpoint to download files from it, but this seems like a rather expensive operation when all I wish to do is inspect the contents of a single file on the volume.

I think the right approach is to execute the scripts you need inside a container with the volume mounted, but you can also just list the files and folders directly in the volume directory on the host: /var/lib/docker/volumes/.
This path can change if you have customized your Docker installation, but your volumes are always stored somewhere on the host; just go into the folder corresponding to your volume ID.
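For example, a minimal sketch assuming the default data root and a volume named "myvolume" (the file path shown is just a placeholder):

# Resolve the volume's mountpoint on the host (works for the name/ID returned by GET /volumes).
mountpoint=$(docker volume inspect --format '{{ .Mountpoint }}' myvolume)

# Inspect the contents directly; /var/lib/docker is owned by root, so sudo is usually needed.
sudo ls -la "$mountpoint"
sudo cat "$mountpoint/path/to/file.txt"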
See ya!

Related

Docker force usage of volume

I know that you can declare a volume inside the Dockerfile, but the problem I see is that the user is not required to create such a volume.
What if they forget to specify a volume and then many, possibly expensive-to-create, files are saved there, but they are not persistent because no volume was specified?
So my question is whether it is possible to force the user to create a volume for that mountpoint, or at least to check at start time (inside the container) whether a volume is mounted, so that the container can react to the missing volume.
EDIT: Given the new information that anonymous volumes are created automatically, I would also accept a user-side solution (not changing the container so that it checks for the volume, but a Docker daemon setting that warns about or prevents anonymous volumes created by mistake).
I think the VOLUME declaration is the best you can do here.
In general, a container cannot force itself to be run with any particular options. You could make a similar argument that a container "must" be run with a published port or with an attached stdin to be useful, but Docker doesn't allow an image to force these on either. (And more importantly, an image can't require direct access to the host filesystem, host networking, or privileged mode.)
As @masseyb notes in a comment, the key effect of the Dockerfile VOLUME directive is to create a new anonymous volume on the given directory if nothing else is mounted there. docker volume ls will show it and you should be able to use the volume ID directly in docker run -v options, so you won't actually lose data here. (There doesn't seem to be a command to give a name to the volume, surprisingly.)
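As a sketch of what recovering that data might look like (the image name "myimage" and the mount path /data are assumptions):

# Running without -v creates an anonymous volume for the directory declared with VOLUME.
docker run --name app1 myimage

# Find the anonymous volume's ID (a long random name) attached to the container.
docker inspect --format '{{ range .Mounts }}{{ .Name }}{{ end }}' app1

# Reuse that volume explicitly in a later run, so the data carries over.
docker run -v <volume-id>:/data myimage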
In principle it's possible to check some things in an entrypoint wrapper script, but that won't work well for this volume case. The container can't tell whether a directory is an automatically-created anonymous volume or a new empty named volume.
(Also remember that volumes, including automatically-created anonymous volumes, are never committed to images. In your Dockerfile you can't change the directory content after you declare it a VOLUME; if an end user tries to docker commit a derived image it won't include the volume data. Unless you're sure it's what you want, I usually advise against declaring VOLUME. The case you describe in the question is pretty much the one case where it's useful.)

Docker: Handling user uploads and saving files

I have been reading about Docker, and one of the first things I read was that it runs images in a read-only manner. This raised a question in my mind: what happens if I need users to upload files? In that case, where would the files go (are they appended to the image)? In other words, how do I handle uploaded files?
Docker containers are meant to be immutable and replaceable - you should be able to stop a container and replace it with a newer version without any ill effects. It's bad practice to store any configuration or operational data inside the container.
The situation you describe with file uploads would typically be resolved with a volume, which mounts a folder from the host filesystem into the container. Any modifications performed by the container to the mounted folder would persist on the host filesystem. When the container is replaced, the folder is re-mounted when the new container is started.
It may be helpful to read up on volumes: https://docs.docker.com/storage/volumes/
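As a minimal sketch of that approach (the host path, container path, and image name are placeholders):

# Bind-mount a host directory over the application's upload directory.
docker run -d \
  --name myapp \
  -v /srv/myapp/uploads:/app/uploads \
  myapp-image

# Files the container writes to /app/uploads now live in /srv/myapp/uploads
# on the host and survive container replacement.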
Docker containers use a filesystem similar to that of their underlying operating system; in your case that appears to be Windows Nano Server (Windows optimized to be used in a container).
So any uploads to your container will be written to whatever path you provided when uploading the file.
But this data is ephemeral: it only lives in the container's writable layer and is lost when the container is removed.
To use persistent storage you must provide a volume for your Docker container. You can think of volumes as external disks attached to the container and mounted on a path inside it; data written there persists regardless of the container's state.
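A sketch of the named-volume variant (the volume name "uploads-data", the container path, and the image names are assumptions):

# Create a named volume and mount it at the upload directory.
docker volume create uploads-data
docker run -d --name myapp -v uploads-data:/app/uploads myapp-image

# The uploads survive even when the container is replaced with a newer image.
docker rm -f myapp
docker run -d --name myapp -v uploads-data:/app/uploads myapp-image:newer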

How can I share data between host and container using mounts

I've been attempting to share data between my host and my container. I've been reading a lot about volumes and I believe I have misunderstood some of the fundamentals around sharing data.
Here's how I've been doing it (with Docker Compose)
version: "3.2"
services:
  my-server:
    volumes:
      - type: bind
        source: ./test/
        target: /var/logs
The problem with this approach is that the initial creation of the mount destroys any data in the target folder. So for example if my image was built from another image that had some logs in that folder (for whatever reason), the logs would be destroyed.
This is a major problem with my use case. I need to mount a volume (a folder, basically) so that I can share data between my host and guest, similar to how a shared folder with a VM would work.
I've looked into named volumes but from what I understand, named and anonymous volumes are designed to share data between containers, and not to share data with the host (which is what I need for my use case).
So besides bind mounts, is it possible to share data between the host and container?
This is not really a Docker problem. I think you'll run into this with any mount. Basically you are already using the correct mechanism for sharing data between the host and your container.
When you mount something in Linux, the mount target (i.e. the path at which you mount something) is always replaced with the root of whatever you mount. It does not merge the contents of the mount target with the contents of the (in this case) bind-mounted directory. I'm surprised that works with VM shared folders, because you run a high risk of a collision, e.g. the same file existing in both locations. How would it resolve that? Filesystem mounts are not the same as a Dropbox-like synchronisation of files between two locations.
I suggest that you do your bind mount to somewhere else in your container which has no contents, and then modify your in-container workflow to handle this. In your example it sounds like you are trying to collect logs, and that the container's configured log directory might already have some contents which you want copied to the host. You could achieve this by having your container initialize itself: configure a new log directory before starting your services or running anything, and copy any existing logs to that location. This new location would be the bind mount. Your init script could also detect whether the bind mount was already used in this fashion and skip syncing the data. This is really an application-specific problem.
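A minimal sketch of such an init/entrypoint script, assuming the baked-in log directory is /var/logs, the bind-mount target is /mnt/host-logs, and the service is started as my-server (all of these are placeholders):

#!/bin/sh
set -e

OLD_LOG_DIR=/var/logs        # directory that may contain logs baked into the image
NEW_LOG_DIR=/mnt/host-logs   # empty directory used as the bind-mount target

mkdir -p "$NEW_LOG_DIR"

# Copy existing logs only once, so restarts don't clobber newer files on the host.
if [ ! -f "$NEW_LOG_DIR/.synced" ]; then
    cp -a "$OLD_LOG_DIR/." "$NEW_LOG_DIR/" 2>/dev/null || true
    touch "$NEW_LOG_DIR/.synced"
fi

# Hand off to the real service, pointing it at the new log directory.
exec my-server --log-dir "$NEW_LOG_DIR" "$@"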

How to do host-specific customization on docker images

Let me explain what I want with a silly example:
After I do "docker pull" to download an image to my host, I want to create a file /etc/myname in this image containing the exact name of this host. As a result, all containers running this image on this host can find the hostname by reading /etc/myname.
Plus, I want the file /etc/myname to be shared across all containers on this host. I know I can easily create this file separately in each container, but that's not what I want.
(Again, this is just a silly example. I don't actually need to store the hostname. I want to store a large amount of host-specific data in a shared file, without using a shared volume).
I can do that by manually creating the file myself, where $dir is the top-most layer of the image:
dir=17024e41f8b6c958c5c9e60bffa8b6c8b2da5a1235b6e18085d5059f9602f605
echo $HOSTNAME > /var/lib/docker/aufs/diff/$dir/etc/myname
But is there a less hacky way to do this?
The easiest way to do this would be to use a shared volume, and that is in fact the only way to do it currently. I assume you know about bind mounting in Docker, but I'll show it here just in case.
With the docker run command, as well as passing -v <volume name>:<path in container>, you can also pass -v <path on host>:<path in container>. So you could keep your metadata in the same place on each host and then bind mount it into the containers.
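For example, a sketch of that idea (the host path /opt/hostdata/myname and the image name are assumptions):

# On each host, write the host-specific data to a well-known location.
mkdir -p /opt/hostdata
echo "$HOSTNAME" > /opt/hostdata/myname

# Bind-mount that single file (read-only) into every container on the host.
docker run -v /opt/hostdata/myname:/etc/myname:ro myimage cat /etc/myname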

Deploy a docker app using volume create

I have a Python app using a SQLite database (it's a data collector that runs daily by cron). I want to deploy it, probably on AWS or Google Container Engine, using Docker. I see three main steps:
1. Containerize and test the app locally.
2. Deploy and run the app on AWS or GCE.
3. Back up the DB periodically and download it back to a local archive.
Recent posts (on Docker, StackOverflow and elsewhere) say that since 1.9, Volumes are now the recommended way to handle persisted data, rather than the "data container" pattern. For future compatibility, I always like to use the preferred, idiomatic method, however Volumes seem to be much more of a challenge than data containers. Am I missing something??
Following the "data container" pattern, I can easily:
Build a base image with all the static program and config files.
From that image create a data container image and copy my DB and backup directory into it (simple COPY in the Dockerfile).
Push both images to Docker Hub.
Pull them down to AWS.
Run the data and base images, using "--volumes-from" to refer to the data.
Using "docker volume create":
I'm unclear how to copy my DB into the volume.
I'm very unclear how to get that volume (containing the DB) up to AWS or GCE... you can't PUSH/PULL a volume.
Am I missing something regarding Volumes?
Is there a good overview of using Volumes to do what I want to do?
Is there a recommended, idiomatic way to backup and download data (either using the data container pattern or volumes) as per my step 3?
When you first use an empty named volume, it receives a copy of the image's data at the mount point (unlike a host-based volume, which completely overlays the mount point with the host directory). So you can initialize the volume contents in your main image, upload that image to your registry, pull it down to your target host, create a named volume on that host, and point your image at that named volume (docker-compose makes the last two steps easy; it's really at most two commands, docker volume create <vol-name> and docker run -v <vol-name>:/mnt <image>), and the volume will be populated with your initial data.
Retrieving the data from a container-based volume or a named volume is an identical process: you mount the volume in a container and run an export/backup to your outside location. The only difference is on the command line; instead of --volumes-from <container-id> you use -v <vol-name>:/mnt. You can use this same process to import data into the volume as well, removing the need to initialize the app image with data in its volume.
The biggest advantage of the new process is that it clearly separates data from containers. You can purge all the containers on the system without fear of losing data, and any volumes listed on the system have meaningful names rather than randomly assigned ones. Lastly, named volumes can be mounted anywhere on the target, and you can pick and choose which of the volumes you'd like to mount if you have multiple data sources (e.g. config files vs databases).
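A sketch of that workflow end to end (the registry, image, volume name, and /data path are all placeholders):

# On the target host: pull the image, create a named volume, and run the app.
docker pull myregistry/collector:latest
docker volume create app-data
docker run -d --name collector -v app-data:/data myregistry/collector:latest
# Because app-data is empty on first use, it is seeded with the image's /data contents.

# Step 3, backup: mount the same volume in a throwaway container and tar it out to the host.
docker run --rm -v app-data:/data -v "$PWD":/backup alpine \
    tar czf "/backup/app-data-$(date +%F).tar.gz" -C /data .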
