How to take backup of docker volumes? - docker

I'm using named volumes to persist data on Host machine in the cloud.
I want to take backup of these volumes present in the docker environment so that I can reuse them on critical incidents.
Almost decided to write a python script to compress the specified directory on the host machine and push it to the AWS S3.
But I would like to know if there is any other approaches to this problem?

docker-volume-backup may be helpful. It allows you to back up your Docker volumes to an external location or to a S3 storage.
Why use a Docker container to back up a Docker volume instead of writing your own Python script? Ideally you don't want to make backups while the volume is being used, so having a container on your docker-compose able to properly stop your container before taking backups can effectively copy data without affecting the application performance or backup integrity.
There's also this alternative: volume-backup

Related

Where do docker images' new Files get saved to in GCP?

I want to create some docker images that generates text files. However, since images are pushed to Container Registry in GCP. I am not sure where the files will be generated to when I use kubectl run myImage. If I specify a path in the program, like '/usr/bin/myfiles', would they be downloaded to the VM instance where I am typing "kubectl run myImage"? I think this is probably not the case.. What is the solution?
Ideally, I would like all the files to be in one place.
Thank you
Container Registry and Kubernetes are mostly irrelevant to the issue of where a container will persist files it creates.
Some process running within a container that generates files will persist the files to the container instance's file system. Exceptions to this are stdout and stderr which are both available without further ado.
When you run container images, you can mount volumes into the container instance and this provides possible solutions to your needs. Commonly, when running Docker Engine, it's common to mount the host's file system into the container to share files between the container and the host: docker run ... --volume=[host]:[container] yourimage ....
On Kubernetes, there are many types of volumes. An seemingly obvious solution is to use gcePersistentDisk but this has a limitation in that it these disks may only be mounted for write on one pod at a time. A more powerful solution may be to use an NFS-based solution such as nfs or gluster. These should provide a means for you to consolidate files outside of the container instances.
A good solution but I'm unsure whether it is available, would be to write your files as Google Cloud Storage objects.
A tenet of containers is that they should operate without making assumptions about their environment. Your containers should not make assumptions about running on Kubernetes and should not make assumptions about non-default volumes. By this I mean, that your containers will write files to container's file system. When you run the container, you apply the configuration that e.g. provides an NFS volume mount or GCS bucket mount etc. that actually persists the files beyond the container.
HTH!

All branches of Git lab and Image lost when Docker container exits? How to save?

If my container exits then all my images got lost and their respective data as well.
Could you please answer how to save the data (Here in this case of gitlab, we have multiple branches). How to save those branches even if container exits and next time when we restart the container, I should get all my old branches back?
This question is a bit light on specific details of your workflow, but the general answer to the need for persistent data in the ephemeral container world is volumes. Without a broader understanding of your workflow and infrastructure, it could be as simple as making sure that your gitlab data is in a named local volume. E.g. something you create with docker volume or an image that everyone uses that has a VOLUME location identified in the Dockerfile and is bind mounted to a host location at container run time.
Of course once you are in the world of distributed systems and orchestrating multi-node container environments, local volumes will no longer be a viable answer and you will need to investigate shared volume capabilities from a storage vendor or self-managed with NFS or some other global filesystem capabilities. A lot of good detail is provided in the Docker volume administrative guide if you are new to the volume concept.

Deploy a docker app using volume create

I have a Python app using a SQLite database (it's a data collector that runs daily by cron). I want to deploy it, probably on AWS or Google Container Engine, using Docker. I see three main steps:
1. Containerize and test the app locally.
2. Deploy and run the app on AWS or GCE.
3. Backup the DB periodically and download back to a local archive.
Recent posts (on Docker, StackOverflow and elsewhere) say that since 1.9, Volumes are now the recommended way to handle persisted data, rather than the "data container" pattern. For future compatibility, I always like to use the preferred, idiomatic method, however Volumes seem to be much more of a challenge than data containers. Am I missing something??
Following the "data container" pattern, I can easily:
Build a base image with all the static program and config files.
From that image create a data container image and copy my DB and backup directory into it (simple COPY in the Dockerfile).
Push both images to Docker Hub.
Pull them down to AWS.
Run the data and base images, using "--volume-from" to refer to the data.
Using "docker volume create":
I'm unclear how to copy my DB into the volume.
I'm very unclear how to get that volume (containing the DB) up to AWS or GCE... you can't PUSH/PULL a volume.
Am I missing something regarding Volumes?
Is there a good overview of using Volumes to do what I want to do?
Is there a recommended, idiomatic way to backup and download data (either using the data container pattern or volumes) as per my step 3?
When you first use an empty named volume, it will receive a copy of the image's volume data where it's first used (unlike a host based volume that completely overlays the mount point with the host directory). So you can initialize the volume contents in your main image as a volume, upload that image to your registry and pull that image down to your target host, create a named volume on that host, point your image to that named volume (using docker-compose makes the last two steps easy, it's really 2 commands at most docker volume create <vol-name> and docker run -v <vol-name>:/mnt <image>), and it will be populated with your initial data.
Retrieving the data from a container based volume or a named volume is an identical process, you need to mount the volume in a container and run an export/backup to your outside location. The only difference is in the command line, instead of --volumes-from <container-id> you have -v <vol-name>:/mnt. You can use this same process to import data into the volume as well, removing the need to initialize the app image with data in it's volume.
The biggest advantage of the new process is that it clearly separates data from containers. You can purge all the containers on the system without fear of losing data, and any volumes listed on the system are clear in their name, rather than a randomly assigned name. Lastly, named volumes can be mounted anywhere on the target, and you can pick and choose which of the volumes you'd like to mount if you have multiple data sources (e.g. config files vs databases).

Appropriate use of Volumes - to push files into container?

I was reading Project Atomic's guidance for images which states that the 2 main use cases for using a volume are:-
sharing data between containers
when writing large files to disk
I have neither of these use cases in my example using an Nginx image. I intended to mount a host directory as a volume in the path of the Nginx docroot in the container. This is so that I can push changes to a website's contents into the host rather then addressing the container. I feel it is easier to use this approach since I can - for example - just add my ssh key once to the host.
My question is, is this an appropriate use of a data volume and if not can anyone suggest an alternative approach to updating data inside a container?
One of the primary reasons for using Docker is to isolate your app from the server. This means you can run your container anywhere and get the same result. This is my main use case for it.
If you look at it from that point of view, having your container depend on files on the host machine for a deployed environment is counterproductive- running the same container on a different machine may result in different output.
If you do NOT care about that, and are just using docker to simplify the installation of nginx, then yes you can just use a volume from the host system.
Think about this though...
#Dockerfile
FROM nginx
ADD . /myfiles
#docker-compose.yml
web:
build: .
You could then use docker-machine to connect to your remote server and deploy a new version of your software with easy commands
docker-compose build
docker-compose up -d
even better, you could do
docker build -t me/myapp .
docker push me/myapp
and then deploy with
docker pull
docker run
There's a number of ways to achieve updating data in containers. Host volumes are a valid approach and probably the simplest way to achieve making your data available.
You can also copy files into and out of a container from the host. You may need to commit afterwards if you are stopping and removing the running web host container at all.
docker cp /src/www webserver:/www
You can copy files into a docker image build from your Dockerfile, which is the same process as above (copy and commit). Then restart the webserver container from the new image.
COPY /src/www /www
But I think the host volume is a good choice.
docker run -v /src/www:/www webserver command
Docker data containers are also an option for mounted volumes but they don't solve your immediate problem of copying data into your data container.
If you ever find yourself thinking "I need to ssh into this container", you are probably doing it wrong.
Not sure if I fully understand your request. But why you need do that to push files into Nginx container.
Manage volume in separate docker container, that's my suggestion and recommend by Docker.io
Data volumes
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
refer: Manage data in containers
As said, one of the main reasons to use docker is to achieve always the same result. A best practice is to use a data only container.
With docker inspect <container_name> you can know the path of the volume on the host and update data manually, but this is not recommended;
or you can retrieve data from an external source, like a git repository

What's the correct way to take automatic backups of docker volume containers?

I am in the process of building a simple web application using NodeJS that persists data to a MySQL database and saves images that have been uploaded to it. With my current setup, I have 4 Docker containers - 1 for the NodeJS application, 1 for the MySQL server, 1 Volume Container for the MySQL Data and 1 Volume container for the uploaded files.
What I would like to do is come up with a mechanism where I can periodically take backups of both volume containers automatically without stopping the web application.
Is it possible to do this and if so, what's the best way?
I have looked at the Docker Documentation on Volume management that covers backing up and restoring volumes, but I'm not sure that would work while the application is still writing data to the database or saving uploaded files.
To backup your database I suggest to use mysqldump, that is more safer then a simple file copy.
For volume backup you can also run a container, link the volume and tar the contents together.
In both cases you can use additional containers or process injection via docker exec.

Resources