Is there an easy way to track file changes (the files will be changed elsewhere) inside a docker container?
I used COPY within the Dockerfile to test the functionality, but now I need to keep track of whether the copied files change in the background.
The changes are made by a different application (not a docker container). This app fetches data and overwrites those files if something has changed; my container should then react to the changes and synchronize its files.
Is a simple MOUNT enough to establish that?
Regards
Check out an inotify docker image, for example:
https://github.com/pstauffer/docker-inotify
or
https://hub.docker.com/r/coppit/inotify-command/
or
https://hub.docker.com/r/coppit/inotify-command/~/dockerfile/
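If you'd rather roll your own, here is a minimal sketch assuming an Alpine-based container and a bind-mounted host directory (the /data/shared path and the echo placeholder are only illustrations); note that inotify events may not propagate reliably through Docker Desktop file sharing or network filesystems:

    # watch a bind-mounted directory from inside a container and react to changes
    docker run -d --name watcher \
      -v /data/shared:/watched \
      alpine sh -c 'apk add --no-cache inotify-tools &&
        inotifywait -m -e close_write,moved_to,delete /watched |
        while read dir event file; do
          echo "change detected: $event $file"   # replace with your sync command
        done'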
Related
I am reading an article related to docker images and containers.
It says that a container is an instance of an image. Fair enough. It also says that whenever you make some changes to a container, you should create an image of it which can be used later.
But at the same time it says:
Your work inside a container shouldn’t modify the container. Like
previously mentioned, files that you need to save past the end of a
container’s life should be kept in a shared folder. Modifying the
contents of a running container eliminates the benefits Docker
provides. Because one container might be different from another,
suddenly your guarantee that every container will work in every
situation is gone.
What I want to know is: what is the problem with modifying a container's contents? Isn't this what containers are for, where we make our own changes and then create an image that will work every time? Even if we are talking about modifying the container's contents itself, and not just adding additional packages, how will it harm anything? The image created from this container will also have these changes, and other containers created from that image will inherit those changes too.
Treat the container filesystem as ephemeral. You can modify it all you want, but when you delete it, the changes you have made are gone.
This is based on a union filesystem, the most popular/recommended being overlay2 in current releases. The overlay filesystem merges together multiple lower layers of the image with an upper layer of the container. Reads will be performed through those layers until a match is found, either in the container or in the image filesystem. Writes and deletes are only performed in the container layer.
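You can see where those layers live for a running container; a rough illustration using docker inspect with the overlay2 driver (the container name is a placeholder):

    # storage driver in use for this container, e.g. overlay2
    docker inspect --format '{{ .GraphDriver.Name }}' mycontainer
    # the container's writable upper layer; the read-only image layers show up under LowerDir
    docker inspect --format '{{ .GraphDriver.Data.UpperDir }}' mycontainer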
So if you install packages, and make other changes, when the container is deleted and recreated from the same image, you are back to the original image state without any of your changes, including a new/empty container layer in the overlay filesystem.
From a software development workflow, you want to package and release your changes to the application binaries and dependencies as new images, and those images should be created with a Dockerfile. Persistent data should be stored in a volume. Configuration should be injected as either a file, environment variable, or CLI parameter. And temp files should ideally be written to a tmpfs unless those files are large. When done this way, it's even possible to make the root FS of a container read-only, eliminating a large portion of attacks that rely on injecting code to run inside of the container filesystem.
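As a rough sketch of all of those pieces together (image, volume, and variable names are placeholders):

    # read-only root FS, temp files on a tmpfs, persistent data in a named volume,
    # configuration injected as an environment variable
    docker run -d --name app \
      --read-only \
      --tmpfs /tmp \
      -v appdata:/var/lib/app \
      -e APP_ENV=production \
      myorg/myapp:1.2.3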
The standard Docker workflow has two parts.
First you build an image:
Check out the relevant source tree from your source control system of choice.
If necessary, run some sort of ahead-of-time build process (compile static assets, build a Java .jar file, run Webpack, ...).
Run docker build, which uses the instructions in a Dockerfile and the content of the local source tree to produce an image.
Optionally docker push the resulting image to a Docker repository (Docker Hub, something cloud-hosted, something privately-run).
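In commands, the build half of the workflow might look something like this (the repository, registry, and tag names are made up):

    git clone https://git.example.com/myorg/myapp.git && cd myapp
    # optional ahead-of-time build step, e.g. compiling static assets or a jar
    docker build -t registry.example.com/myorg/myapp:1.0.0 .
    docker push registry.example.com/myorg/myapp:1.0.0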
Then you run a container based off that image:
docker run the image name from the build phase. If it's not already on the local system, Docker will pull it from the repository for you.
Note that you don't need the local source tree just to run the image; having the image (or its name in a repository you can reach) is enough. Similarly, there's no "get a shell" or "start the service" step in this workflow; just docker run on its own should bring everything up.
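And the run half, on any machine that can reach that registry (same made-up image name; the published port is only an example):

    # Docker pulls the image automatically if it isn't already present locally
    docker run -d --name myapp -p 8080:8080 registry.example.com/myorg/myapp:1.0.0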
(It's helpful in this sense to think of an image the same way you think of a Web browser. You don't download the Chrome source to run it, and you never "get a shell in" your Web browser; it's almost always precompiled and you don't need access to its source, or if you do, you have a real development environment to work on it.)
Now: imagine there's some critical widespread security vulnerability in some core piece of software that your application is using (OpenSSL has had a couple, for example). It's prominent enough that all of the Docker base images have already updated. If you're using this workflow, updating your application is very easy: check out the source tree, update the FROM line in the Dockerfile to something newer, rebuild, and you're done.
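For example, with a hypothetical node-based Dockerfile, the fix is a one-line change to the FROM line:

    # Dockerfile
    FROM node:18-alpine        # bumped from the older, vulnerable base tag
    WORKDIR /app
    COPY . .
    CMD ["node", "server.js"]

Then rebuild and redeploy:

    docker build -t myorg/myapp:1.0.1 .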
Note that none of this workflow is "make arbitrary changes in a container and commit it". When you're forced to rebuild the image on a new base, you really don't want to be in a position where the binary you're running in production is something somebody produced by manually editing a container, but they've since left the company and there's no record of what they actually did.
In short: never run docker commit. While docker exec is a useful debugging tool, it shouldn't be part of your core Docker workflow, and if you're routinely running it to set up containers or are thinking of scripting it, it's better to move that setup into the ordinary container startup instead.
This idea came to me and I don't know if it's viable.
Suppose you delete a bunch of files which happen to be mounted inside a docker container.
Because the container is still running, the files are held by the container, even if they're not visible to the host anymore.
As such, the process inside the container can still work with them. It's not until the container is stopped/restarted that they'll finally be gone.
What if you wanted those files back? You realize you made a mistake and you wish to bring those files back. Can the files, as seen from inside the container, be leveraged to this end in some way?
If you mount a host directory with the -v flag and delete files from the mount path on the host, they will not show up inside the container either.
As such, the process inside the container can still work with them.
It's not until the container is stopped/restarted that they'll
finally be gone.
Maybe the process in your case only reads the files it needs at startup; in that case it keeps running fine. For example, you can delete nginx.conf and nginx will keep serving, but it will fail when it restarts, so the deletion only matters once the process restarts. For such cases your assumption makes sense, but as for copying the files back out of the container, that does not work.
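A quick way to check that behaviour yourself (Linux host assumed; the paths are throwaway):

    mkdir -p /tmp/demo && echo hello > /tmp/demo/file.txt
    docker run -d --name demo -v /tmp/demo:/data alpine sleep 300
    docker exec demo cat /data/file.txt   # prints "hello"
    rm /tmp/demo/file.txt
    docker exec demo ls /data             # file.txt is gone inside the container too
    docker rm -f demo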
What are the best practices for updating a container in the following scenario?
I have images built from my web app project, and I publish new images based on updated source code about once a month.
But my web app also generates files, or updates existing files, over time while running in the container. For example, the app creates new xml files under the user folder for each web user. Another example is files uploaded by users.
I want to keep these files when running a new, updated image, without losing them.
/bin/
    /first.dll
    /second.dll
/other-sources/
    /some.cs
    /other.cs
/user/
    /user-1.xml
    /user-2.xml
/uploads/
    /images
        /image-1.jpg
/web.config
Should I use the volume feature of Docker? Is there another strategy?
Short answer, yes, you do want a volume for these directories. More specifically, two volumes: /user and /uploads.
This gets into a fundamental practice of image and container design that is best done by dividing your application into three parts:
The application code, binaries, libraries, and other runtime dependencies.
The persistent data that the application access and creates.
The configuration that modifies how the application runs, particularly in different environments with the same code.
Each of these parts should go in a different place in docker.
The first part, the code and binaries, goes in your image. This is what you ship to run your container on different nodes in docker, and what you store in a registry for later reuse.
The second part, your persistent data, gets stored in a volume. There are two main types of volumes to pick from: a named volume and a host volume (aka bind mount). A named volume has a particular feature that improves portability: it will be initialized to the contents of your image at the volume location when the volume is created for the first time. This initialization includes directory and file permissions and ownership, and can be used to seed your volume with an initial state. The host volume (bind mount) is just a directory mounted from the docker host into the container; you get exactly what was on the host, including the uid/gid of the files/directories, with no initialization procedure. The host volume is very easy for developers to access, but it lacks portability if you move to a multi-node swarm cluster, and it suffers from host uid/gids mapping to different users inside the container, since usernames inside the container can differ for the same ids.
Any files you write inside the container that are not written to a volume should be considered disposable and will be lost when you recreate the container to update to a new image. And any directories you define as a volume should be considered owned by that volume and will not receive updates from the image when you replace the container.
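The difference shows up directly in how you mount them; a quick sketch with made-up names and paths (see also the compose example further down):

    # named volume: managed by docker, seeded from the image content on first use
    docker run -d -v userdata:/app/user my-web-app
    # host volume (bind mount): you get exactly what is in the host directory
    docker run -d -v /srv/myapp/user:/app/user my-web-app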
The last piece, configuration, is often overlooked but equally important. This is anything injected into the application at startup to tell it where to connect for external data, config files that alter its behavior, and anything that needs to be separated so the same image is reusable in different environments. This is how you get portability from development to production with the same image, and how you get reusability of publicly provided images. Configuration is injected with environment variables, command line parameters, bind mounts of a config file (when you run on a single node), and configs + secrets, which are essentially the same bind mount of a config file, now stored in docker's swarm rather than locally on a single host. In your situation, the /web.config looks suspiciously like a config file that you'll want to move out of the image and inject as a bind mount or swarm config.
To put these all together, you will want a compose file that defines your image, the volumes to use, and any configs or environment variables to set.
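A minimal docker-compose.yml sketch along those lines; the image name, container paths, and environment variable are assumptions about your app:

    version: "3.8"
    services:
      web:
        image: myorg/my-web-app:2024.05
        environment:
          - ASPNETCORE_ENVIRONMENT=Production
        volumes:
          - userdata:/app/user
          - uploads:/app/uploads
          # configuration injected from outside the image
          - ./web.config:/app/web.config:ro
    volumes:
      userdata:
      uploads: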
The VOLUME instruction should be used to expose any database storage area, configuration storage, or files/folders created by your docker container. You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.
Would you store your code in a volume?
Such as your jar files. It could be a little convenient to deploy the application without rebuilding the image.
Are there any considerations when storing code in a volume, like performance, security, or others?
I don't recommend using a VOLUME statement inside the Dockerfile for anything with current versions of docker (current being any version of docker since the introduction of named volumes). Including a VOLUME command has multiple downsides, including:
possible inability to change contents at that location of the image with any later steps or child images (this behavior appears to be different with different scenarios and different versions of docker)
potential to create volumes with just a hash for the name that clutter up the docker volume ls output and are very difficult to find and reuse later if you needed the data inside
for your changing code, if you place it in a volume and recreate your container from a new version of the image, the volume will still have the old copy of your code unless you update that volume yourself (the key feature of volumes is persistent data that you want to keep between image versions)
I do recommend putting your data in a volume that you define on the docker run command line or inside a docker-compose.yml. Volumes defined there can have a name or map back to a path on the docker host. And you can make any folder or file a volume without needing to define it in the Dockerfile. Volumes defined at this step don't impact the image, allowing you to extend an image without being locked out of making changes to a directory.
For your code, it is a common best practice to inject code with a volume if it is interpreted (e.g. javascript) or already compiled (e.g. a jar file) during application development. You would define the volume on the container (not in the Dockerfile), and overlay the code or binaries that were also copied into the image, using the same filenames. This allows you to rapidly iterate in development without frequently rebuilding the image. Depending on the application, you may be able to live reload the code; otherwise, a container restart should be all that's needed to see the latest change. And once development is finished, you rebuild the image with your current code and ship it to someone who can use it without needing the volume mount for the code.
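For example, during development you might overlay a locally built jar on top of the copy baked into the image (image name and paths are assumptions):

    # the image already contains /app/app.jar via a COPY step in its Dockerfile;
    # the bind mount temporarily replaces it with the jar you just built locally
    docker run -d --name myapp-dev \
      -v "$(pwd)/target/app.jar:/app/app.jar" \
      myorg/myapp:dev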
I've also blogged about my concerns with volumes inside of Dockerfiles if you'd like to see more details on this.
You say:
It could be a little convenient to deploy the application without rebuilding the image.
Instead, encapsulating your application version inside an image build has a lot of advantages. You can easily deploy your app just by deploying the image, so using a volume for app code forces you to orchestrate some other deployment method to update that volume too.
And you have to (eventually) match the jar version with the proper image version.
Regarding security or performance, I don't think that there are special considerations.
Anyway, it is not a common approach to use volumes for that. And as @BMitch says, using VOLUME inside a Dockerfile is somewhat tricky.
I have an angular project I'm working on containerizing. It currently has enough build tooling that a front-end developer could (and this is how it currently works) just run gulp in the project root, edit source files in src/, and the build tooling handles running traceur, templates and libsass and so forth, spitting content into a build directory. That build directory is served with a minimal server in development, and handled by nginx in production.
My goal is to build a docker-based environment that replicates this workflow. My users are developers who are on different kinds of boxes, so having build dependencies frozen in a dockerfile seems to make sense.
I've gotten close enough to this: docker mounts the host volume, the developer edits the file on the local disk, and the gulp build watcher running in the docker container instance rebuilds the site (and triggers livereload, etc.).
The issue I have is with wrapping my head around how filesystem layers work. Is this process of rebuilding files in the container's build/frontend directory generating a ton of extraneous saved layers? That's not something I'd really like, because I'm not wild about monotonically growing this instance as developers tweak and rebuild and tweak and rebuild. It'd only grow locally, but having to go through the "okay, time to clean up and start over" process seems tedious.
Is this process of rebuilding files in the container's build/frontend directory generating a ton of extraneous saved layers?
Nope, the only way to stack up extra layers is to commit a container with changes to a new image, then use that new image to create the next container. Rinse, repeat.
Filesystem layers are saved when a container is committed to a new image (docker commit ...). When a container is running there will be a single read/write layer on top that contains all of the changes made to the container since it was created.
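You can list exactly what has accumulated in that read/write layer with docker diff (the container name is a placeholder):

    # lists files added (A), changed (C), or deleted (D) in the container's read/write layer
    docker diff mycontainer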
having to go through the "okay, time to clean up and start over" process seems tedious.
If you run the build container with docker run --rm ... then you'll get the cleanup for free. The build container will be created fresh from the image each time.
Also, data volumes bypass the union filesystem so there's a good chance you won't write to the container's filesystem at all.
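Putting those two points together, a one-off build run could look like this; the frontend-build image name and the gulp task are assumptions about your setup:

    # the container is removed automatically when gulp exits; source and build output
    # live on the host through the bind mount, not in the container's filesystem
    docker run --rm \
      -v "$(pwd)":/app \
      -w /app \
      frontend-build gulp build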