docker: data volume containers

So I'm still trying to figure out the best way to use Docker in my current infrastructure.
I've been thinking of creating a data-volume container that would hold the data volume for MongoDB (this seems to be a pretty popular approach).
If I do that, how would I update the container without losing the data inside it?
========== EDIT ==========
Clarification:
I want to be able to "update" the container by, basically, rebuilding it from the Dockerfile. This means that I'll need to spin up a new container, but I want to keep the volumes from the old one.

I think this might help you:
https://github.com/discordianfish/docker-backup
TL;DR: You just need to ensure that another container (such as backup_monitor) references the data volume, and you will keep the volumes no matter what. Whether you rebuild your container or not, data volumes are designed to bypass the Union File System, so they are kept as long as some container points to them.
Another approach is to share the data volumes with the Docker host (a bind mount); that way you don't have to worry about the data because it is saved directly on the host. Hope it helps.
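For illustration, a minimal sketch of the data-volume-container pattern described above (container and image names are hypothetical): a throwaway container owns the volume, and the MongoDB container can be rebuilt and replaced freely while referencing it with --volumes-from.

# create a container whose only job is to own the /data/db volume
docker create -v /data/db --name mongo-data mongo /bin/true
# run MongoDB against that volume
docker run -d --name mongo --volumes-from mongo-data mongo
# later: rebuild the image, replace the container, keep the data
docker rm -f mongo
docker run -d --name mongo --volumes-from mongo-data my-rebuilt-mongo-image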

Related

How to check if volume population has finished

Imagine the following scenario:
I have a Docker image with a lot of small files in a folder called /app. I then add a bind mount to that folder on a slow network file system (/dfs/volumes; in my case it's based on Ceph): docker run -v /dfs/volumes/app:/app .... As soon as the container starts, Docker starts populating the volume. On the host I can see how /dfs/volumes/app fills up with files while the container is running. So far so good.
However, since the container is already running and at some point my entry point /app/executable will be executed, this might be a problem because I do not know whether the volume is already fully populated.
Is there a way to delay the container startup until the volume is completely populated? Or can I somehow check from inside the container whether population is done? I could probably prepare the volume manually before I start the container, but that rather defeats the purpose of automatic volume population...
OK, I found out what's happening; it was related to the setup being distributed:
The volume is shared between two containers on different hosts. As soon as the first container starts, it begins populating the volume, and the other container "sees" an existing volume that already has some files in it, so it simply starts booting.
For that second container, the files then appear over time.
My bad.
Anyway, I'll leave this here for reference, in case someone runs into similar problems.
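If you do need an explicit readiness check, one workaround (not from the thread above; the sentinel file name is hypothetical) is to have whichever side populates the volume write a marker file last, and wrap the entry point so it waits for that marker:

#!/bin/sh
# entrypoint wrapper: block until the populating side has written the sentinel file
until [ -f /app/.populated ]; do
  echo "waiting for volume population..."
  sleep 2
done
exec /app/executable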

How to backup docker container with data and move to another server?

I'm definitely new to Docker; I started using it a while ago and now I need to move my stuff from one server to another. I thought that just creating a personal image would solve this issue, but nope :D.
So if I'm right, all the data is saved in the created volumes, right? For example, one of the containers is PostgreSQL.
So to move everything, do I also need to back up the volumes and restore them on the new server?
https://docs.docker.com/storage/volumes/#backup-restore-or-migrate-data-volumes
This is what I found in their docs.
Hope somebody can help me understand this.
By default, Docker stores images, containers, volumes, and other data in /var/lib/docker, unless customized via the file /etc/docker/daemon.json as explained here.
In order to move the whole data root to a new server you should:
Stop docker service.
Copy data root.
Restart docker service.
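As a rough sketch of those steps (assuming the default data root, a systemd host, and the same storage driver on both machines; the destination host name is hypothetical):

# on the old server:
systemctl stop docker
rsync -aHAX /var/lib/docker/ newserver:/var/lib/docker/
# on the new server:
systemctl start docker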
Regards.

Where should production-critical and non-production, non-critical data be stored?

I was asked this question in an interview and I'm not sure of the correct answer, hence I would like your suggestions.
I was asked whether we should persist production-critical data inside the Docker instance or outside of it. What would be my choice and the reasons for it?
Would your answer differ in case we have non-production, non-critical data?
Back your answers with reasons.
Most data should be managed externally to containers and container images. I tend to view data constrained to a container as temporary (intermediate, discardable) data. Otherwise, if it's being captured but it's not important to my business, why create it?
The name "container" is misleading. Containers aren't like VMs where there's a strong barrier (isolation) between VMs. When you run multiple containers on a single host, you can enumerate all their processes using ps aux on the host.
There are good arguments for maintaining separation between processes and data and running both within a single container makes it more challenging to retain this separation.
Unlike processes, files in container layers are more isolated though. Although the layers are manifest as files on the host OS, you can't simply ls a container layer's files from the host OS. This makes accessing the data in a container more complex. There's also a performance penalty for effectively running a file system atop another file system.
While it's common and trivial to move container images between machines (viz docker push and docker pull), it's less easy to move containers between machines. This isn't generally a problem for moving processes as these (config aside) are stateless and easy to move and recreate, but your data is state and you want to be able to move this data easily (for backups, recovery) and increasingly to move amongst a dynamic pool of nodes that perform processing upon it.
Less importantly but not unimportantly, it's relatively easy to perform the equivalent of a rm -rf * with Docker by removing containers (docker container rm ...) and thereby deleting the application and your data.
The two most basic considerations you should have here:
Whenever a container gets deleted, everything in the container filesystem is lost.
It's extremely common to delete containers; it's required to change many startup options or to update a container to a newer image.
So you don't really want to keep anything "in the container" as its primary data storage: it's inaccessible from outside the container, and will get lost the next time there's a critical security update and you must delete the container.
In plain Docker, I'd suggest keeping
...in the image: your actual application (the compiled binary or its interpreted source as appropriate; this does not go in a volume)
...in the container: /tmp
...in a bind-mounted host directory: configuration files you need to push into the container at startup time; directories of log files produced by the container (things where you as an operator need to directly interact with the files)
...in either a named volume or bind-mounted host directory: persistent data the container records in the filesystem
On this last point, consider trying to avoid this layer altogether; keeping data in a database running "somewhere else" (could be another container, a cloud service like RDS, ...) simplifies things like backups and simplifies running multiple replicas of the same service. A host directory is easier to back up, but in some environments (macOS) it's unacceptably slow.
My answers don't change here for "production" vs. "non-production" or "critical" vs. "non-critical", with limited exceptions you can justify by saying "it's okay if I lose this data" ("because it's not the master copy of it").
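Putting the list above on one command line, a sketch might look like this (image name, paths, and volume name are hypothetical):

# config and logs as bind mounts, persistent data as a named volume,
# the application itself baked into the image
docker run -d --name myapp \
  -v "$PWD/config:/etc/myapp:ro" \
  -v "$PWD/logs:/var/log/myapp" \
  -v myapp_data:/var/lib/myapp \
  myapp:1.2.3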

How to replace a Docker data volume with another at runtime?

I'm trying to replace a Docker data volume with another one at runtime without interrupting other containers that access data inside the data volume.
Is there presently any way to do this with Docker?
If not, what is a container strategy where I can have a separate data container that will be accessed by other containers/services but where I can swap out the data container without causing service interruptions at runtime?
I don't think you can do it without tweaking the container at runtime. You can have a look at the approach suggested here.
Depending on the type of the data, there are other alternatives.
On my side, I had a similar requirement where some static website resources were managed outside the application in a data container. My resources were "packaged" in a container, but swapping this data container to move from one version to another at runtime didn't work.
I moved to a different strategy where I have a generic side-car container which pulls the code from a separate git repo on demand. Hence I can easily upgrade the version of my data without restarting anything.
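A minimal sketch of that side-car idea (volume name, repo URL, and serving image are hypothetical; alpine/git is used here with its entrypoint overridden so we get a shell with git available):

docker volume create static-assets
# side-car: clone on the first run, pull on later runs, into the shared volume
docker run --rm -v static-assets:/assets --entrypoint sh alpine/git -c \
  'if [ -d /assets/.git ]; then cd /assets && git pull; else git clone https://example.com/my/assets.git /assets; fi'
# the serving container only reads the volume, so it never needs a restart
docker run -d --name web -v static-assets:/usr/share/nginx/html:ro nginx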

Is it "safe" to commit a running container in docker?

As the title goes, "safe" means... the proper way?
Safe = consistent, no data loss, the professional, legit way.
I hope to hear some experiences from pro Docker users.
Q: Is it safe to commit a running Docker container (with the exception of rapidly changing real-time stuff and database stuff)? Your own commentary is appreciated.
A yes or no answer is accepted, with a comment. Thanks.
All of the container's disk state is saved inside the container instance. As long as you don't use any external mounts/Docker volumes or external servers (externally connected DBs?), you should never get in trouble for stopping/restarting and committing containers. Please read on to go more in depth on this topic.
A question you might want to ask yourself first is: how does Docker store the changes a container makes to its disk at runtime? It is really worth checking out how Docker actually manages to get this working. The original state of the container's hard disk is what is given to it by the image. It can NOT write to this image. Instead of writing to the image, a diff is made of what has changed in the container's internal state compared to what is in the Docker image.
Docker uses a technology called "Union Filesystem", which creates a diff layer on top of the initial state of the docker image.
This "diff" (referenced as the writable container in the image below) is stored in memory and disappears when you delete your container. When you use docker commit, the writable container that is retained in the temporary "state" of the container is stored inside a new image, however: I don't recommend this. The state of your new docker image is not represented in a dockerfile and can not easily be regenerated from a rebuild. Making a new dockerfile should not be hard. So that is alway the way-to-go for me personally.
When your container works with mounted volumes or external servers/DBs, you might want to make sure you don't get out of sync, and temporarily stop your services inside the container. When you use a Dockerfile, you can start a bootstrap shell script inside your container that sets up connections, performs checks, and initializes the running process so your application comes up durably. Again, running a committed container makes it harder to do something like this.
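To make the contrast concrete, a quick sketch (container and image names are hypothetical):

# committing works, but the resulting image is not reproducible from a Dockerfile:
docker commit my-running-container myapp:snapshot
# the Dockerfile route recommended above keeps the image rebuildable:
docker build -t myapp:1.1 .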
