Backing up docker volumes - docker

I've created a separate volume on an Ubuntu machine with the intention to store docker volumes and persist data. So far, I've created volumes on the host machine for two services (jira and postgres), which I intent to backup offsite. I am using docker-compose like so
postgres:
volumes:
- /var/dkr/pgdata:/var/lib/postgresql/data
And for jira:
volumes:
- /var/dkr/jira:/var/atlassian/jira
My thinking is that I could just rsync the /var/dkr folder to a temporary location, tar it and send it to S3. Now that I've read a bit more on the process of hosted volumes I am worried that I might end up with messed up GIDs and UIDs for the services when I restore from a backup.
My questions are - has docker resolved this problem in the newer versions (I am using the latest). Is it safe to take this approach? What would be a better way to backup my persistent volumes?

There's no magic solution to uid/gid mapping issues between containers and hosts. It would need to be implemented by the filesystem drivers in the Linux kernel, which is how NFS and some of the VM filesystem mappings work. For "bind" mounts, forcing a uid/gid is not an option from Linux, and Docker is just providing an easy to use interface on top of that.
With your backups, ensure that uid/gid is part of your backup (tar does this by default). Also ensure that the uid/gid being used in your container is defined in the image or specified to a static value in your docker run or compose file. As long as you don't depend on a host specific uid/gid, and restore preserving the uid/gid (default for tar as root), you won't have any trouble.
Worst case, you run something like find /var/dkr -uid $old_uid -exec chown $new_uid {} \; to change your UID's. The tar command also has options for change uid/gid on extract (see the man page for more details).

Related

What Is The Difference Between Binding Mounts And Volumes While Handling Persistent Data In Docker Containers?

I want to know why we have two different options to do the same thing, What are the differences between the two.
We basically have 3 types of volumes or mounts for persistent data:
Bind mounts
Named volumes
Volumes in dockerfiles
Bind mounts are basically just binding a certain directory or file from the host inside the container (docker run -v /hostdir:/containerdir IMAGE_NAME)
Named volumes are volumes which you create manually with docker volume create VOLUME_NAME. They are created in /var/lib/docker/volumes and can be referenced to by only their name. Let's say you create a volume called "mysql_data", you can just reference to it like this docker run -v mysql_data:/containerdir IMAGE_NAME.
And then there's volumes in dockerfiles, which are created by the VOLUME instruction. These volumes are also created under /var/lib/docker/volumes but don't have a certain name. Their "name" is just some kind of hash. The volume gets created when running the container and are handy to save persistent data, whether you start the container with -v or not. The developer gets to say where the important data is and what should be persistent.
What should I use?
What you want to use comes mostly down to either preference or your management. If you want to keep everything in the "docker area" (/var/lib/docker) you can use volumes. If you want to keep your own directory-structure, you can use binds.
Docker recommends the use of volumes over the use of binds, as volumes are created and managed by docker and binds have a lot more potential of failure (also due to layer 8 problems).
If you use binds and want to transfer your containers/applications on another host, you have to rebuild your directory-structure, where as volumes are more uniform on every host.
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure of the host machine, volumes are completely managed by Docker. Volumes are often a better choice than persisting data in a container’s writable layer, because a volume does not increase the size of the containers using it, and the volume’s contents exist outside the lifecycle of a given container. More on
Differences between -v and --mount behavior
Because the -v and --volume flags have been a part of Docker for a long time, their behavior cannot be changed. This means that there is one behavior that is different between -v and --mount.
If you use -v or --volume to bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory.
If you use --mount to bind-mount a file or directory that does not yet exist on the Docker host, Docker does not automatically create it for you, but generates an error. More on
Docker for Windows shared folders limitation
Docker for Windows does make much of the VM transparent to the Windows host, but it is still a virtual machine. For instance, when using –v with a mongo container, MongoDB needs something else supported by the file system. There is also this issue about volume mounts being extremely slow.
More on
Bind mounts are like a superset of Volumes (named or unnamed).
Bind mounts are created by binding an existing folder in the host system (host system is native linux machine or vm (in windows or mac)) to a path in the container.
Volume command results in a new folder, created in the host system under /var/lib/docker
Volumes are recommended because they are managed by docker engine (prune, rm, etc).
A good use case for bind mount is linking development folders to a path in the container. Any change in host folder will be reflected in the container.
Another use case for bind mount is keeping the application log which is not crucial like a database.
Command syntax is almost the same for both cases:
bind mount:
note that the host path should start with '/'. Use $(pwd) for convenience.
docker container run -v /host-path:/container-path image-name
unnamed volume:
creates a folder in the host with an arbitrary name
docker container run -v /container-path image-name
named volume:
should not start with '/' as this is reserved for bind mount.
'volume-name' is not a full path here. the command will cause a folder to be created with path "/var/lib/docker/volumes/volume-name" in the host.
docker container run -v volume-name:/container-path image-name
A named volume can also be created beforehand a container is run (docker volume create). But this is almost never needed.
As a developer, we always need to do comparison among the options provided by tools or technology. For Volume & Bind mounts, I would suggest to list down what kind of application you are trying to containerize.
Following are the parameters that I would consider before choosing Volume over Bind Mounts:
Docker provide various CLI commands to Volumes easily outside containers.
For backup & restore, Volume is far easier than Bind as it depends upon the underlying host OS.
Volumes are platform-agnostic so they can work on Linux as well as on Window containers.
With Bind, you have 2 technologies to take care of. Your host machine directory structure as well as Docker.
Migration of Volumes are easier not only on local machines but on cloud machines as well.
Volumes can be easily shared among multiple containers.

Docker temporary files strategy

My docker produces some temporary files.
Is there an encouraged strategy regarding those?
If I put those to /tmp, I'm not sure they'll get cleared. (Edit note: the link is dead. The question was, "Are default cronjobs executed in a docker container?")
Or should I expose the volume /tmp from the host machine?
I am not aware of any encouraged way to manage temporary files with Docker as it will mostly depend on how you need to handle these temporary files with your application (should they be deleted on restart? Periodically?...)
You have several possibilities depending on your needs:
Use Docker tmpfs mount
You can mount a tmpfs volume which will persist data as long as the container is running (i.e. the data in the volume will be deleted when the container stops), for example:
docker run --mount type=tmpfs,destination=/myapp/tmpdir someimage
This may be useful if you (can) restart your containers regularly and the temporary data may be recreated on container restart. However if you need to be able to clean up temporary data while the container is running, this is not a good solution as you will need to stop the container to have your temporary data cleaned.
Edit: as per #alexander-azarov coment, the tmpfs volume size is unlimited by default with the risk of the container using up all the machine memory. Using tmpfs-size flag is recommended to mitigate that risk, such as docker run --mount type=tmpfs,destination=/app,tmpfs-size=4096
Writing into the container writable layer
The writable layer of the container is where all the data will be written in the container if no volume is mounted. It will persist on container restart, but will be deleted if the container is deleted.
This way the temporary data will be deleted only when the container is deleted. It may be a good solution for short-lived containers, but not for long-lived containers.
Mounting host machine /tmp in the container with a bind mount
For example:
docker run -v /tmp/myapp-tmp-dir:/myapp/tmpdir someimage
This will cause all data to be written in the host machine /tmp/myapp-tmp-dir directory, and result will depend on how the host machine manage /tmp (in most cases, data are cleared upon machine restart)
Create and mount a volume to manage data into
You can create a volume which will contain your data, for example:
docker run --mount source=myappvol,target=/myapp/tmpdir someimage
And manage the data in the volume: mount-it in another container and cleanup the data, deleting the volume, etc.
These are the most common solutions relying (almost) solely on Docker functionalities. Another possibility would be to handle temporary files directly from your software or app running in the container, but it's more an application-related issue than a Docker-related one.

docker volume container strategy

Let's say you are trying to dockerise a database (couchdb for example).
Then there are at least two assets you consider volumes for:
database files
log files
Let's further say you want to keep the db-files private but want to expose the log-files for later processing.
As far as I undestand the documentation, you have two options:
First option
define managed volumes for both, log- and db-files within the db-image
import these in a second container (you will get both) and work with the logs
Second option
create data container with a managed volume for the logs
create the db-image with a managed volume for the db-files only
import logs-volume from data container when running db-image
Two questions:
Are both options realy valid/ possible?
What is the better way to do it?
br volker
The answer to question 1 is that, yes both are valid and possible.
My answer to question 2 is that I would consider a different approach entirely and which one to choose depends on whether or not this is a mission critical system and that data loss must be avoided.
Mission critical
If you absolutely cannot lose your data, then I would recommend that you bind mount a reliable disk into your database container. Bind mounting is essentially mounting a part of the Docker Host filesystem into the container.
So taking the database files as an example, you could image these steps:
Create a reliable disk e.g. NFS that is backed-up on a regular basis
Attach this disk to your Docker host
Bind mount this disk into my database container which then writes database files to this disk.
So following the above example, lets say I have created a reliable disk that is shared over NFS and mounted on my Docker Host at /reliable/disk. To use that with my database I would run the following Docker command:
docker run -d -v /reliable/disk:/data/db my-database-image
This way I know that the database files are written to reliable storage. Even if I lose my Docker Host, I will still have the database files and can easily recover by running my database container on another host that can access the NFS share.
You can do exactly the same thing for the database logs:
docker run -d -v /reliable/disk/data/db:/data/db -v /reliable/disk/logs/db:/logs/db my-database-image
Additionally you can easily bind mount these volumes into other containers for separate tasks. You may want to consider bind mounting them as read-only into other containers to protect your data:
docker run -d -v /reliable/disk/logs/db:/logs/db:ro my-log-processor
This would be my recommended approach if this is a mission critical system.
Not mission critical
If the system is not mission critical and you can tolerate a higher potential for data loss, then I would look at Docker Volume API which is used precisely for what you want to do: managing and creating volumes for data that should live beyond the lifecycle of a container.
The nice thing about the docker volume command is that it lets you created named volumes and if you name them well it can be quite obvious to people what they are used for:
docker volume create db-data
docker volume create db-logs
You can then mount these volumes into your container from the command line:
docker run -d -v db-data:/db/data -v db-logs:/logs/db my-database-image
These volumes will survive beyond the lifecycle of your container and are stored on the filesystem if your Docker host. You can use:
docker volume inspect db-data
To find out where the data is being stored and back-up that location if you want to.
You may also want to look at something like Docker Compose which will allow you to declare all of this in one file and then create your entire environment through a single command.

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images which states that the 2 main use cases for using a volume are:-
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume in the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents into the host rather then addressing the container. I feel it is easier to use this approach since I can - for example - just add my ssh key once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive- running the same container on a different machine may result in different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
#Dockerfile
FROM nginx
ADD . /myfiles
#docker-compose.yml
web:
build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with easy commands
docker-compose build
docker-compose up -d
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull
docker run
There's a number of ways to achieve updating data in containers. Host volumes are a valid approach and probably the simplest way to achieve making your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
You can copy files into a docker image build from your Dockerfile, which is the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
Not sure if I fully understand your request. But why you need do that to push files into Nginx container.
Manage volume in separate docker container, that's my suggestion and recommend by Docker.io
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
refer: Manage data in containers
As said, one of the main reasons to use docker is to achieve always the same result. A best practice is to use a data only container.
With docker inspect <container_name> you can know the path of the volume on the host and update data manually, but this is not recommended;
or you can retrieve data from an external source, like a git repository

How to list Docker mounted volumes from within the container

I want to list all container directories that are mounted volumes.
I.e. to be able to get similar info I get from
docker inspect --format "{{ .Volumes }}" <self>
But from within the container and without having docker installed in there.
I tried cat /proc/mounts, but I couldn't find a proper filter for it.
(EDIT - this may no longer work on Mac) If your Docker host is OS X, the mounted volumes will be type osxfs (or fuse.osxfs). You can run a
mount | grep osxfs | awk '{print $3}'
and get a list of all the mounted volumes.
If your Docker host is Linux (at least Ubuntu 14+, maybe others), the volumes appear to all be on /dev, but not on a device that is in your container's /dev filesystem. The volumes will be alongside /etc/resolv.conf, /etc/hostname, and /etc/hosts. If you do a mount | grep ^/dev to start, then filter out any of the files in ls /dev/*, then filter out the three files listed above, you should be left with host volumes.
mount | grep ^/dev/ | grep -v /etc | awk '{print $3}'
My guess is the specifics may vary from Linux to Linux. Not ideal, but at least possible to figure out.
Assuming you want to check what volumes are mounted from inside a linux based container you can look up entries beginning with "/dev" in /etc/mtab, removing the /etc entries
$ grep "^/dev" /etc/mtab | grep -v " \/etc/"
/dev/nvme0n1p1 /var/www/site1 ext4 rw,relatime,discard,data=ordered 0 0
/dev/nvme0n1p1 /var/www/site2 ext4 rw,relatime,discard,data=ordered 0 0
As you can read from many of the comments you had, a container is initially nothing but a restricted, reserved part of resources that is totally cut away from the rest of your machine. It is not aware of being a Docker, and inside the container everything behaves as if it were a separate machine. Sort of like the matrix, I guess ;)
You get access to the host machine's kernel and its resources, but yet again restricted as just a filtered out set. This is done with the awesome "cgroups" functionality that comes with Unix/Linux kernels.
Now the good news: There are multiple ways for you to provide the information to your Docker, but that is something that you are going to have to provide and build yourself.
The easiest ad most powerful way is to mount the Unix socket located on your host at /var/run/docker.sock to the inside of your container at the same location. That way, when you use the Docker client inside your container you are directly talking to the docker engine on your host.
However, with great power comes great responsibility. This is a nice setup, but it is not very secure. Once someone manages to get into your docker it has root access to your host system this way.
A better way would be to provide a list of mounts through the environment settings, or clinging on to some made-up conventions to be able to predict the mounts.
(Do you realize that there is a parameter for mounting, to give mounts an alias for inside your Docker?)
The docker exec command is probably what you are looking for.
This will let you run arbitrary commands inside an existing container.
For example:
docker exec -it <mycontainer> bash
Of course, whatever command you are running must exist in the container filesystem.
#docker cp >>>> Copy files/folders between a container and the local filesystem
docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH
docker cp [OPTIONS] SRC_PATH CONTAINER:DEST_PATH
to copy full folder:
docker cp ./src/build b081dbbb679b:/usr/share/nginx/html
Note – This will copy build directory in container’s …/nginx/html/ directory to copy only files present in folder:
docker cp ./src/build/ b081dbbb679b:/usr/share/nginx/html
Note – This will copy contents of build directory in container’s …./nginx/html/ directory
Docker Storage options:
Volumes are stored in a part of the host filesystem which is managed by Docker(/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
When you create a volume, it is stored within a directory on the Docker host. When you mount the volume into a container, this directory is what is mounted into the container. This is similar to the way that bind mounts work, except that volumes are managed by Docker and are isolated from the core functionality of the host machine.
A given volume can be mounted into multiple containers simultaneously. When no running container is using a volume, the volume is still available to Docker and is not removed automatically. You can remove unused volumes using docker volume prune.
When you mount a volume, it may be named or anonymous. Anonymous volumes are not given an explicit name when they are first mounted into a container, so Docker gives them a random name that is guaranteed to be unique within a given Docker host. Besides the name, named and anonymous volumes behave in the same ways.
Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities.
Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
Available since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full path on the host machine. The file or directory does not need to exist on the Docker host already. It is created on demand if it does not yet exist. Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available. If you are developing new Docker applications, consider using named volumes instead. You can’t use Docker CLI commands to directly manage bind mounts.
One side effect of using bind mounts, for better or for worse, is that you can change the host filesystem via processes running in a container, including creating, modifying, or deleting important system files or directories. This is a powerful ability which can have security implications, including impacting non-Docker processes on the host system.
tmpfs mounts are stored in the host system’s memory only, and are never written to the host system’s filesystem.
A tmpfs mount is not persisted on disk, either on the Docker host or within a container. It can be used by a container during the lifetime of the container, to store non-persistent state or sensitive information. For instance, internally, swarm services use tmpfs mounts to mount secrets into a service’s containers.
If you need to specify volume driver options, you must use --mount.
-v or --volume: Consists of three fields, separated by colon characters (:). The fields must be in the correct order, and the meaning of each field is not immediately obvious.
o In the case of named volumes, the first field is the name of the volume, and is unique on a given host machine. For anonymous volumes, the first field is omitted.
o The second field is the path where the file or directory will be mounted in the container.
o The third field is optional, and is a comma-separated list of options, such as ro. These options are discussed below.
• --mount: Consists of multiple key-value pairs, separated by commas and each consisting of a = tuple. The --mount syntax is more verbose than -v or --volume, but the order of the keys is not significant, and the value of the flag is easier to understand.
o The type of the mount, which can be bind, volume, or tmpfs. This topic discusses volumes, so the type will always be volume.
o The source of the mount. For named volumes, this is the name of the volume. For anonymous volumes, this field is omitted. May be specified as source or src.
o The destination takes as its value the path where the file or directory will be mounted in the container. May be specified as destination, dst, or target.
o The readonly option, if present, causes the bind mount to be mounted into the container as read-only.
o The volume-opt option, which can be specified more than once, takes a key-value pair consisting of the option name and its value.

Resources