Get files back from running docker container? - docker

This idea came to me and I don't know if it's viable.
Suppose you delete a bunch of files which happen to be mounted inside a docker container.
Because the container is still running, the files are held by the container, even if they're not visible to the host anymore.
As such, the process inside the container can still work with them. It's not until the container is stopped/restarted that they'll finally be gone.
What if you wanted those files back? You realize you made a mistake and you wish to bring those files back. Can the files, as seen from inside the container, be leveraged to this end in some way?

If you mount a host directory with the -v flag and then delete files from the mount path on the host, they will no longer show up inside the container either.
As such, the process inside the container can still work with them. It's not until the container is stopped/restarted that they'll finally be gone.
Maybe the process in your case only reads the required files at startup, so in that case it keeps working for now. For example, you can delete nginx.conf and nginx keeps running, but it will fail when it restarts, so the deletion only takes effect once the process is restarted. For such cases your assumption makes sense, but as for copying the files back out, that does not really work.
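One caveat, not from the answer above but a general Linux fact: if the process inside the container actually had a deleted file open at the moment it was removed, the data remains reachable through /proc until that file descriptor is closed. A minimal sketch of that recovery trick, assuming the main process is PID 1 inside the container; the container name and the fd number 7 are placeholders:

    # list file descriptors of the container's main process that point at deleted files
    docker exec mycontainer sh -c "ls -l /proc/1/fd | grep deleted"
    # copy the still-open contents back out to the host; "7" is a hypothetical fd number
    docker exec mycontainer cat /proc/1/fd/7 > recovered-file

This only helps for files the process still had open when they were deleted; anything it merely read at startup and then closed is not recoverable this way.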

Related

How to check if volume population has finished

Imagine the following scenario:
I have a Docker image with a lot of small files in a folder called /app. I then add a bind mount for that folder on a slow, network-based file system (/dfs/volumes; in my case it's based on Ceph): docker run -v /dfs/volumes/app:/app .... As soon as the container starts, Docker starts populating the volume. On the host I can see /dfs/volumes/app fill up with files while the container is running. So far so good.
However: since the container is already running and at some point my entrypoint /app/executable will be executed, this might be a problem because I do not know whether the volume is already fully populated.
Is there a way to delay the container startup until the volume is completely populated? Or can I somehow check from inside the container whether population is done? I could probably prepare the volume manually before I start the container, but that kind of defeats the purpose of the automatic volume population...
OK, found out what's happening; this was related to the setup being distributed:
The volume is shared between two containers on different hosts. As soon as the first container starts, it starts populating the volume, and the other container "sees" an existing volume that already has some files in it, so it simply starts booting.
For the second container, the files then appear over time.
My bad.
Anyway, I leave this here for reference, in case someone runs into similar problems.
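For anyone who does need to gate startup on population finishing, one heuristic (my own sketch, not from the original post) is to wrap the entrypoint in a small script that waits until the mounted folder stops changing before starting the real binary; /app/executable is the path from the question, the interval and the stability check are arbitrary:

    #!/bin/sh
    # crude wait-until-stable loop: start only once the file count stops changing
    prev=-1
    count=$(find /app -type f | wc -l)
    while [ "$count" != "$prev" ]; do
        sleep 5
        prev=$count
        count=$(find /app -type f | wc -l)
    done
    exec /app/executable

This is only a heuristic; if you control whatever populates the volume, having it write a marker file last and waiting for that file is more reliable.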

What is the point in backing up a docker container?

I have started using Docker recently and currently running Portainer and Nginx.
Of course, I've also started looking deeper into Docker, how it works, how to back it up, and I just feel like I'm missing something.
The data, whether it be bind mount or volume, resides on the host, when all is said and done.
I followed a video showing how to back up a container by exporting an image and reimporting it, and once I reached the end I realized that you still have to run the command with all the ports, mounts, etc., as if I were simply using the original image. And since the data isn't backed up that way, you have to move it manually anyway.
What am I missing? What's the difference between backing up an image as opposed to just pulling a fresh one, running the same docker run command you used the first time, and moving the data?
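For reference, the commands such tutorials usually revolve around look roughly like the following (image, volume and file names here are placeholders, not from the question); they illustrate that the image backup and the data backup are two separate things:

    # image "backup": save a local image to a tarball and load it elsewhere
    docker save -o myimage.tar myimage:latest
    docker load -i myimage.tar

    # data backup: archive a named volume through a throwaway container
    docker run --rm -v mydata:/data -v "$PWD":/backup alpine \
        tar czf /backup/mydata-backup.tar.gz -C /data .

Which is more or less what the question is getting at: if the image is already available in a registry, saving it locally buys you little, and the part that actually needs backing up is the volume or bind-mount data.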

Why shouldn't our work inside the container modify the content of the container itself?

I am reading an article related to docker images and containers.
It says that a container is an instance of an image. Fair enough. It also says that whenever you make some changes to a container, you should create an image of it which can be used later.
But at the same time it says:
Your work inside a container shouldn’t modify the container. Like previously mentioned, files that you need to save past the end of a container’s life should be kept in a shared folder. Modifying the contents of a running container eliminates the benefits Docker provides. Because one container might be different from another, suddenly your guarantee that every container will work in every situation is gone.
What I want to know is: what is the problem with modifying a container's contents? Isn't this what containers are for, where we make our own changes and then create an image that will work every time? Even if we are talking about modifying the container's content itself and not just adding additional packages, how will it harm anything? The image created from this container will also have these changes, and other containers created from that image will inherit those changes too.
Treat the container filesystem as ephemeral. You can modify it all you want, but when you delete it, the changes you have made are gone.
This is based on a union filesystem, the most popular/recommended being overlay2 in current releases. The overlay filesystem merges together multiple lower layers of the image with an upper layer of the container. Reads will be performed through those layers until a match is found, either in the container or in the image filesystem. Writes and deletes are only performed in the container layer.
So if you install packages, and make other changes, when the container is deleted and recreated from the same image, you are back to the original image state without any of your changes, including a new/empty container layer in the overlay filesystem.
From a software development workflow, you want to package and release your changes to the application binaries and dependencies as new images, and those images should be created with a Dockerfile. Persistent data should be stored in a volume. Configuration should be injected as either a file, environment variable, or CLI parameter. And temp files should ideally be written to a tmpfs unless those files are large. When done this way, it's even possible to make the root FS of a container read-only, eliminating a large portion of attacks that rely on injecting code to run inside of the container filesystem.
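As a rough sketch of what that run-time shape can look like (the image name, volume, paths and environment variable below are illustrative, not taken from the question):

    # immutable root FS; state in a volume, scratch space on tmpfs,
    # config injected via a read-only bind mount and an environment variable
    docker run -d \
      --read-only \
      --tmpfs /tmp \
      -v app_data:/var/lib/app \
      -v "$PWD/app.conf":/etc/app/app.conf:ro \
      -e APP_ENV=production \
      myapp:1.2.3

With --read-only, any accidental write outside the volume or tmpfs fails immediately instead of silently piling up in the container layer.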
The standard Docker workflow has two parts.
First you build an image:
Check out the relevant source tree from your source control system of choice.
If necessary, run some sort of ahead-of-time build process (compile static assets, build a Java .jar file, run Webpack, ...).
Run docker build, which uses the instructions in a Dockerfile and the content of the local source tree to produce an image.
Optionally docker push the resulting image to a Docker repository (Docker Hub, something cloud-hosted, something privately-run).
Then you run a container based off that image:
docker run the image name from the build phase. If it's not already on the local system, Docker will pull it from the repository for you.
Note that you don't need the local source tree just to run the image; having the image (or its name in a repository you can reach) is enough. Similarly, there's no "get a shell" or "start the service" in this workflow, just docker run on its own should bring everything up.
(It's helpful in this sense to think of an image the same way you think of a Web browser. You don't download the Chrome source to run it, and you never "get a shell in" your Web browser; it's almost always precompiled and you don't need access to its source, or if you do, you have a real development environment to work on it.)
Now: imagine there's some critical widespread security vulnerability in some core piece of software that your application is using (OpenSSL has had a couple, for example). It's prominent enough that all of the Docker base images have already updated. If you're using this workflow, updating your application is very easy: check out the source tree, update the FROM line in the Dockerfile to something newer, rebuild, and you're done.
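Concretely, that rebuild can be as little as the following (registry, image name and tag are placeholders):

    # after editing the Dockerfile's FROM line (e.g. to a patched base image):
    docker build -t registry.example.com/myapp:2.0.1 .
    docker push registry.example.com/myapp:2.0.1
    # then redeploy by running the new tag
    docker run -d --name myapp registry.example.com/myapp:2.0.1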
Note that none of this workflow is "make arbitrary changes in a container and commit it". When you're forced to rebuild the image on a new base, you really don't want to be in a position where the binary you're running in production is something somebody produced by manually editing a container, but they've since left the company and there's no record of what they actually did.
In short: never run docker commit. While docker exec is a useful debugging tool it shouldn't be part of your core Docker workflow, and if you're routinely running it to set up containers or are thinking of scripting it, it's better to try to move that setup into the ordinary container startup instead.

How to track file changes within a Docker Container

Is there an easy way to track file changes (the files will be changed elsewhere) inside a Docker container?
I used COPY within the Dockerfile to test the functionality, but now I need to keep track of whether the copied files are changing in the background.
The changes are made by a different application (not a Docker container). This app fetches data and overwrites those files if something has changed; my container should then react to the changes and synchronize its files.
Is a simple mount enough to accomplish that?
Regards
Check an inotify Docker image:
https://github.com/pstauffer/docker-inotify
or
https://hub.docker.com/r/coppit/inotify-command/
or
https://hub.docker.com/r/coppit/inotify-command/~/dockerfile/
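As the names suggest, those images are built around Linux inotify; if you'd rather roll it yourself, the same idea is a few lines of shell using inotifywait from the inotify-tools package (the watched path and the reaction are placeholders). A bind mount is enough to make the host-side changes visible inside the container; detecting them is what inotify adds:

    #!/bin/sh
    # watch a bind-mounted directory and react whenever files are written, created or deleted
    inotifywait -m -r -e close_write,create,delete /watched |
    while read -r dir event file; do
        echo "change detected: $event $dir$file"
        # e.g. trigger your sync/reload here
    done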

How to synchronize 2 docker container's overlay2 filesystems?

I happen to use docker in a questionable way for a specific purpose:
I have a container with a few development tools and their configurations. These are typically tools found in operating systems that are configured with dotconf files usually found in home directories (like tmux, vim, git, programming languages...). This is normally handled by configuring an OS with tools and dotfiles.
But with the setup becoming more complex over the years, properly setting up a new OS is becoming very hard. On the other hand, moving a container around machines is very simple. So for that precise case, I happen to use docker in a way that goes against the "docker way". But that is really comfortable to me.
That being said, I also want to synchronize the container's filesystem with another container (which, in my understanding, is definitely not the "docker way", but still). I want to run 2 instances of the same image on different machines, and then synchronize their read-write layers that sit on top of the image, so that when a file is created, deleted or modified on one, this is replicated on the other.
I was thinking of using rsync or unison to do that, but I don't know how the overlay2 driver works. Are the directories in /var/lib/docker/overlay2/<container-id> the actual container filesystem layers? Or do they need to be mounted? I saw some people mount their container's filesystem on the host with the device mapper driver fairly easily. Would that make sense with overlay2?
I think your best option here is to use a bind mount. This changes your initial design a bit, but it will likely be the cleanest and easiest to implement.
First things first: make sure any files you want synced live in a specific folder, so rather than rsyncing the entire underlying filesystem you only sync, for example, /app/my_files inside your container, and point your application to read/write from there.
Now create your directory and set up the rsync between your machines; let's say it's at /rsync.
Lastly, run your containers with a bind mount, which, if you're just bringing up a container, would look like: docker run -d -v /rsync:/app/my_files my_image
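The host-to-host part is then plain rsync, independent of Docker; a one-way sketch with a placeholder remote host (for true two-way sync, unison is the usual tool):

    # mirror the shared directory to the other machine, e.g. from a cron job
    rsync -az --delete /rsync/ otherhost:/rsync/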
After reading this page: https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/ and experimenting with image / container creation and deletion, I have found out a few things.
When creating an image, the folder /var/lib/docker/overlay2 is populated with a new folder, named with what seems to be a hash (let's call it 123), which itself contains the image content / filesystem.
Then when creating the container from this same image, two more folders get created inside /var/lib/docker/overlay2, also named with what seems to be another hash, with one of them having -init at the end. Let's call them 456 and 456-init. They seem to contain the container layer.
When looking at the output of docker inspect <container-name>, the GraphDriver section has some info about how docker uses the overlay2. The lower dir contains the init container dir + the image dir as in: /var/lib/docker/overlay2/456-init/diff:/var/lib/docker/overlay2/123/diff. I don't fully understand how that works, but I understand that I am not interested in the lower dir since it should be the image dir in read only mode. And that is something I already have on all hosts and thus do not need to sync.
Then in my understanding the upper dir in overlay2 is the read write layer that the container uses on top of the image layer. In the GraphDriver this is found to be /var/lib/docker/overlay2/456/diff. That is the directory that gets the changes made inside the container. I could not find the documentation so I experimented a bit and found out that this upper dir never changed during the life of the container. I stopped and started it and the upper dir stayed the same. Then when removing the container this folder is deleted. And when creating the container again, a new folder with a different name is created.
So it looks like what I need to sync is this upper dir, which can be found with docker inspect. I'll experiment a bit more with that.
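For what it's worth, the upper dir can also be read straight out of docker inspect with a Go template; on an overlay2 host it is exposed under GraphDriver.Data (the container name below is a placeholder):

    # print the container's writable (upper) overlay2 layer
    docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' my-container

That saves guessing which hash-named directory under /var/lib/docker/overlay2 belongs to which container.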

Resources