Docker Backup Concept. A Beginner Question - docker

Suppose there is a machine that runs various Docker projects. Each Docker container is regularly replaced/stopped/started as soon as newer versions arrive from the build system.
What does a backup concept for such a machine look like?
Looking into similar questions [1], the correct path to a working backup/restore procedure is not immediately clear to me. My current understanding is something like:
Backup
Use scripts to create images and containers. Store/Backup scripts in your favorite Version Control System. Use version tags to pull docker images. Don't use latest tag.
Exclude /var/lib/docker/overlay2 from backup (to prevent backing up dangling and temporary stuff)
Use named volumes only. Volumes can be saved and restored from backup. For database stuff extra work has to be done. Possibly consider tarring volumes to an extra folder [2].
Run docker system prune daily to remove dangling stuff
Restore
Make sure all named volumes are back in place.
Fetch scripts from version control to recreate images as needed. Use docker run to recreate containers.
Application-specific tasks: restore databases from dumps, etc.
[1] How can I backup a Docker-container with its data-volumes?
[2] https://stackoverflow.com/a/48112996/1485527

Don't use the latest tag for your images. Set proper version tags (like v0.0.1, v0.0.2, etc.) on your images, and you can keep all of your versions in a Docker registry.
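As a rough sketch of the tagging advice, a small wrapper can refuse to build or push anything tagged latest. The function name build_and_push and the repository name in the usage line are hypothetical; only docker build -t and docker push are real commands here.

```shell
#!/bin/sh
# Build the image from the Dockerfile in the current directory and push it
# under an explicit version tag.
# Usage: build_and_push <repository> <version>
build_and_push() {
    repo=$1
    version=$2
    case "$version" in
        ""|latest)
            # Reject a missing tag as well, since Docker defaults it to "latest".
            echo "refusing tag '$version': pass an explicit version such as v0.0.1" >&2
            return 1
            ;;
    esac
    docker build -t "$repo:$version" . || return 1
    docker push "$repo:$version"
}
```

Called as build_and_push registry.example.com/myapp v0.0.2 (an assumed registry path), this leaves every released version retrievable by its tag.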
Prefer stateless containers.
What about Docker volumes? You can use them: https://docs.docker.com/storage/volumes/
If you use a bind mount, you can manually save your files in an archive for backup.
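For the bind-mount case, the archive step could look something like the following. This is only a sketch: the function name backup_bind_dir and both paths in the usage line are made up, and it assumes the container is stopped or not writing to the directory while tar runs.

```shell
#!/bin/sh
# Archive a host directory that is bind-mounted into a container.
# Usage: backup_bind_dir <source-dir> <backup-dir>
# Prints the path of the archive it created.
backup_bind_dir() {
    src_dir=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/$(basename "$src_dir")-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to the source directory.
    tar -czf "$archive" -C "$src_dir" . || return 1
    printf '%s\n' "$archive"
}
```

For example, backup_bind_dir /srv/myapp/data /var/backups/myapp would write a timestamped tar.gz into /var/backups/myapp. The backup directory should live outside the source directory.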

Related

Should I use the application code inside the docker images or in volumes?

I am working on a DevOps project and want to find the best solution.
I am torn between two approaches: should I put the application code inside the Docker images or in volumes?
Your code should almost never be in volumes, developer-only setups aside (and even then). This is doubly true if you have a setup like a frequent developer-only Node setup that puts the node_modules directory into a Docker-managed anonymous volume: since Docker will refuse to update that directory on its own, the primary effect of this is to cause Docker to ignore any changes to the package.json file.
More generally, in this context, you should think of the image as a way to distribute the application code. Consider clustered environments like Kubernetes: the cluster manager knows how to pull versioned Docker images on its own, but you need to work around a lot of the standard machinery to try to push code into a volume. You should not need to both distribute a Docker image and also separately distribute the code in the image.
I'd suggest using host-directory mounts for injecting configuration files and for storing file-based logs (if the container can't be configured to log to stdout). Use either host-directory or named-volume mounts for stateful containers' data (host directories are easier to back up, named volumes are faster on non-Linux platforms). Do not use volumes at all for your application code or libraries.
(Consider that, if you're just overwriting all of the application code with volume mounts, you may as well just use the base node image and not build a custom image; and if you're doing that, you may as well use your automation system (Salt Stack, Ansible, Chef, etc.) to just install Node and ignore Docker entirely.)

Sharing docker volumes within the workplace

I have taken some time to create a useful Docker volume for use at work. It has a restored backup of one of our software databases (SQL Server) on it, and I use it for testing/debug by just attaching it to whatever Linux SQL Container I feel like running at the time.
When I make useful Docker images at work, I share them with our team using either the Azure Container Registry or the AWS Elastic Container Registry. If there's a Dockerfile I've made as part of a solution, I can store that in our Git repo for others to access.
But what about volumes? Is there a way to share these with colleagues so they don't need to go through the process I went through to build the volume in the first place? So if I've got this 'databasevolume' is there a way to source control it? Or share it as a file to other users of Docker within my team? I'm just looking to save them the time of creating a volume, downloading the .bak file from its storage location, restoring it etc.
The short answer is that there is no default Docker functionality to export the contents of a Docker volume, and docker export explicitly does not export the contents of the volumes associated with the container. You can, however, back up, restore, or migrate data volumes.
Note: if you're backing up a database, I'd suggest using the appropriate tools for that database.
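One common pattern for moving a volume's contents around is to tar them from a throwaway container, then hand the archive to a colleague who unpacks it into a fresh volume. The sketch below assumes the alpine image is available; the function names and the archive name are hypothetical, and for a live database the database's own dump tools remain the safer route.

```shell
#!/bin/sh
# Pack the contents of a named volume into an archive in the current directory.
# Usage: backup_volume <volume> <archive-file-name>
backup_volume() {
    volume=$1
    archive=$2
    docker run --rm \
        -v "$volume":/volume:ro \
        -v "$PWD":/backup \
        alpine tar -czf "/backup/$archive" -C /volume .
}

# Unpack an archive from the current directory into a named volume.
# Existing files with the same names in the volume are overwritten.
# Usage: restore_volume <volume> <archive-file-name>
restore_volume() {
    volume=$1
    archive=$2
    docker volume create "$volume" >/dev/null || return 1
    docker run --rm \
        -v "$volume":/volume \
        -v "$PWD":/backup:ro \
        alpine tar -xzf "/backup/$archive" -C /volume
}
```

With the question's volume name, that would be backup_volume databasevolume databasevolume.tar.gz on one machine and restore_volume databasevolume databasevolume.tar.gz on another, with no container using the volume at the time.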

Sharing bind volume in Docker swarm

We use the OpenJDK image to deploy our jars. Since we have multiple jars, we simply attach them using bind mounts and run them. I don't want to build separate images, since our deployment will be in air-gapped environments and I can't rebuild images each time when only the jars change.
Now we are trying to move towards swarm. Since it is a bind mount, I'm unable to spread the replicas to other nodes.
If I use volumes, how can I put these jars into that volume? One possibility is that I run a dummy alpine image, mount the volume to the host, and then share it with other containers. But is it possible to share that volume between the nodes? And is it an optimal solution? Also, if I need to update the jars, how can that be done?
I can create NFS drive but I'm trying to figure out a way of implementing without it. Since it is an isolated environment and may contain crucial data I can't use 3rd party plugins to finish the job as well.
So how can Docker Swarm be implemented in this scenario?
Use docker build. Really.
An image is supposed to be a static copy of your application and its runtime, and not the associated data. The statement "only the jars changed" means "we rebuilt the application". While you can use bind mounts to inject an application into a runtime-only container, I don't feel like it's really a best practice, and that's doubly true in a language where there's already a significant compile-time step.
If you're in an air-gapped environment, you need to figure out how you're going to provide application updates (regardless of the deployment framework). The best solution, if you can manage it, is to set up a private Docker registry on the isolated network, docker save your images (with the jars embedded), then docker load, docker tag, and docker push them into the registry. Then you can use the registry-tagged image name everywhere and not need to worry about manually pushing the images and/or jar files across.
Otherwise you need to manually distribute the image tar and docker load it, or manually push your updated jars on to each of the target systems. An automation system like Ansible works well for this; I'm partial to Ansible because it doesn't require a central server.
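The save/load/tag/push sequence might be scripted roughly as follows. This is a sketch under assumptions: the function names, the image name, and the registry address registry.internal:5000 in the usage line are all placeholders, and the archive is expected to be carried across the air gap by whatever means the site allows.

```shell
#!/bin/sh
# On the connected build machine: write an image to a tar archive.
# Usage: export_image <image:tag> <archive>
export_image() {
    image=$1
    archive=$2
    docker save -o "$archive" "$image"
}

# On the isolated network: load the archive, retag the image for the
# private registry, and push it there.
# Usage: import_image <archive> <image:tag> <registry-host[:port]>
import_image() {
    archive=$1
    image=$2
    registry=$3
    docker load -i "$archive" || return 1
    docker tag "$image" "$registry/$image" || return 1
    docker push "$registry/$image"
}
```

For instance, export_image myapp:v1.2.0 myapp.tar outside, then import_image myapp.tar myapp:v1.2.0 registry.internal:5000 inside. Swarm services can then reference registry.internal:5000/myapp:v1.2.0 on every node.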

Docker Merge Volumes between host and image

I have an application that I want to install into a Docker image. This particular application has a folder for custom user plugins. Users can put their plugins for our application there, and we will load and execute them. We also ship our application with some plugins already included. What I want is that, when I run the container with a volume mounted via the -v option, the contents already in the image are kept, as if the contents from the image were merged with the ones in the host folder. Is that possible? Is there another solution that does not involve refactoring the app to support loading from multiple folders?
You can mount them into your /plugins/customplugin1. In that case ls plugins should show
customplugin1
standardplugin
standardplugin2
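A sketch of that mount, assuming the image keeps its plugins under /plugins as in the listing above. The function name, image name, and host path are hypothetical. Because only the subdirectory is mounted, the plugins shipped in the image next to it stay visible.

```shell
#!/bin/sh
# Run a container with one host plugin directory mounted as a
# subdirectory of the image's /plugins folder.
# Usage: run_with_custom_plugin <image> <absolute-host-plugin-dir>
run_with_custom_plugin() {
    image=$1
    host_plugin_dir=$2
    plugin_name=$(basename "$host_plugin_dir")
    # The host path must be absolute; otherwise -v treats it as a volume name.
    docker run --rm \
        -v "$host_plugin_dir":"/plugins/$plugin_name" \
        "$image"
}
```

For example, run_with_custom_plugin myapp:v1 /srv/plugins/customplugin1 mounts the host folder at /plugins/customplugin1 inside the container.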

Is it a docker best practice to use volume for the code?

The VOLUME instruction should be used to expose any database storage area, configuration storage, or files/folders created by your docker container. You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.
Would you store your code in a volume?
Such as your jar files. It could be a little convenient to deploy the application without rebuilding the image.
Are there any considerations when storing code in a volume, such as performance, security, or others?
I don't recommend using a VOLUME statement inside the Dockerfile for anything with current versions of docker (current being any version of docker since the introduction of named volumes). Including a VOLUME command has multiple downsides, including:
possible inability to change contents at that location of the image with any later steps or child images (this behavior appears to be different with different scenarios and different versions of docker)
potential to create volumes with just a hash for the name that clutter up the docker volume ls output and are very difficult to find and reuse later if you needed the data inside
for your changing code, if you place it in a volume and recreate your container from a new version of the image, the volume will still have the old copy of your code unless you update that volume yourself (the key feature of volumes is persistent data that you want to keep between image versions)
I do recommend putting your data in a volume that you define on the docker run command line or inside a docker-compose.yml. Volumes defined there can have a name or map back to a path on the Docker host. And you can make any folder or file a volume without needing to define it in the Dockerfile. Volumes defined at this step don't impact the image, allowing you to extend an image without being locked out of making changes to a directory.
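Defining the volume at run time rather than in the Dockerfile could look like this minimal sketch. The function name, image, volume name, and mount point are illustrative assumptions.

```shell
#!/bin/sh
# Start a detached container with a named volume attached at a chosen path.
# Usage: run_with_named_volume <image> <volume-name> <mount-point>
run_with_named_volume() {
    image=$1
    volume=$2
    mount_point=$3
    # Creating the volume up front is optional; docker run would also
    # create a missing named volume on demand.
    docker volume create "$volume" >/dev/null || return 1
    docker run -d -v "$volume":"$mount_point" "$image"
}
```

For example, run_with_named_volume myapp:v1 appdata /var/lib/myapp keeps the data in a volume called appdata, which shows up by name in docker volume ls instead of as an anonymous hash. Passing an absolute host path in place of the volume name on the -v option would map the directory back to the Docker host instead.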
For your code, it is a common best practice to inject code with a volume if it is interpreted (e.g. javascript) or already compiled (e.g. a jar file) during application development. You would define the volume on the container (not the Dockerfile), and overlay the code or binaries that were also copied into the image using the same filenames. This allows you to rapidly iterate in development without frequently rebuilding the image. Depending on the application, you may be able to live reload the code, otherwise, a container restart should be all that's needed to see the latest change. And once development is finished, you rebuild the image with your current code and ship that to someone that can use it without needing the volume mount for the code.
I've also blogged about my concerns with volumes inside of Dockerfiles if you'd like to see more details on this.
You say:
It could be a little convenient to deploy the application without rebuilding the image.
On the contrary, there are a lot of advantages to encapsulating your application version inside an image build. You can deploy your app simply by deploying the image, whereas using a volume for app code forces you to orchestrate some other deployment method to update that volume too.
And you have to (eventually) match the jar version with the proper image version.
Regarding security or performance, I don't think that there are special considerations.
Anyway, it is not a common approach to use volumes for that. And as @BMitch says, using VOLUME inside a Dockerfile is somewhat tricky.

Resources