Dockerized executable read/write on host filesystem

I just dockerized an executable that reads from a file and creates a new file in the very directory that file came from.
I want to use Docker in that setup, so that I avoid installing numerous third-party libraries in the production environment.
My problem now: I have the file /this/is/a.file on my underlying (host) filesystem, and my executable is supposed to create /this/is/b.file.
As far as I can see, the only way to get this done is to map a volume that points to /this/is and then tell the executable where I mounted it inside the Docker container.
Am I right? Or is there a way to just run docker run mydockerizedstuff /this/is/a.file without using Docker volumes?

You're correct: you need to pass in /this/is as a volume, and the executable will write to that location through the mount.
If you want to constrain it even more, you can mount only the individual files: /this/is/a.file for input and /this/is/b.file for output. You need to create b.file beforehand (simply via touch); otherwise Docker will consider it a directory and create it as such for you. In return, you'll know the executable won't be able to create /this/is/c.file or anything else.
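A minimal sketch of the directory-mount approach, assuming the executable takes the input path as its first argument and writes b.file next to it. The wrapper (Python, for illustration) mounts the file's parent directory at an arbitrary container path, /data, and translates the host path into the path the container actually sees:

```python
import posixpath
from typing import List


def build_run_argv(host_file: str, image: str, mount_point: str = "/data") -> List[str]:
    """Build a `docker run` command that bind-mounts the directory holding
    host_file and passes the executable the container-side path."""
    host_file = posixpath.abspath(host_file)
    host_dir, name = posixpath.split(host_file)
    return [
        "docker", "run", "--rm",
        # Mount the whole directory so the executable can create files next to the input.
        "-v", f"{host_dir}:{mount_point}",
        image,
        # Inside the container the file lives under the mount point, not under /this/is.
        posixpath.join(mount_point, name),
    ]


print(" ".join(build_run_argv("/this/is/a.file", "mydockerizedstuff")))
# prints: docker run --rm -v /this/is:/data mydockerizedstuff /data/a.file
```

Because the executable only ever sees /data/a.file, it writes /data/b.file, which lands in /this/is/b.file on the host.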

Related

Docker container bind host directory

I would like to update the application's settings by changing the local profile.
I use a volume to bind-mount a local directory, for example:
docker run -v D:\test:/app
But when the container is running, all the files in /app are gone, because D:\test does not have any files.
Is there any way I can achieve my goal?
Your question is a bit unclear. I guess your problem is the following: you want to bind-mount your app directory, but the host directory is initially empty and will stay empty, since the bind mount hides everything put into /app during the build.
I usually use one of two approaches:
Put your profile into the host directory D:\test (if applicable). This is also a viable strategy for, e.g., the source code of Node.js apps.
During the build, put your profile into /app_temp. Then create an entrypoint that moves /app_temp into /app. If you want to persist the profile through multiple build/run phases, it has to be inside the build context (which is likely not D:\test) on your host.
You need to change the way your application is organized a bit. Put all the settings in their own directory and have the application read them from there. Then you can map only the settings folder to the host one.
Another option is to map the host folder to a temporary folder inside the container and have the ENTRYPOINT script update your files (by copying them over) and then run your application.
Docker was not meant to be used for the workflow you are trying to set up, and for this reason you need to do some extra work.
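A minimal sketch of such an entrypoint, written in Python for illustration. It assumes the default profile was copied to /app_temp at build time and that D:\test is mounted at /app (both paths taken from the answers). It copies only files that are missing, so edits already made on the host are never overwritten, and then replaces itself with the real application:

```python
import os
import shutil


def seed_directory(defaults_dir: str, target_dir: str) -> list:
    """Copy files from defaults_dir into target_dir, skipping any file that
    already exists in target_dir. Returns the relative paths that were copied."""
    copied = []
    for root, _dirs, files in os.walk(defaults_dir):
        rel_root = os.path.relpath(root, defaults_dir)
        for name in files:
            rel_path = os.path.normpath(os.path.join(rel_root, name))
            dest = os.path.join(target_dir, rel_path)
            if os.path.exists(dest):
                continue  # keep the version from the bind-mounted host directory
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(root, name), dest)
            copied.append(rel_path)
    return copied


def main(argv: list) -> None:
    # Defaults baked into the image, seeded into the directory the app reads.
    seed_directory("/app_temp", "/app")
    # Replace this process with the real application so it receives signals directly.
    os.execvp(argv[0], argv)

# In the image, the script would end with main(sys.argv[1:]) and be set as the ENTRYPOINT.
```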
because D:\test does not have any files.
That's the way it works. The volume type you use is a bind mount, i.e. you mount a filesystem using a mount point mapped to a host directory.
According to the documentation:
With bind mounts, we control the exact mountpoint on the host. We can use this to persist data, but it’s often used to provide additional data into containers.
You have two options here (both require the host data to exist in advance):
Bind to a folder containing the configuration data, as you showed.
Bind only the file:
docker run -v D:\test\config.json:/app/config.json
When you bind a file that does not exist beforehand, the Docker daemon assumes it is a directory and creates a directory, both in the container and on the host.
Since you mount a filesystem using a mount point mapped to a host directory, an empty host directory means the mounted filesystem is empty too.
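Because a missing host file silently turns into a directory, a launcher can check for the file before building the -v argument. A small sketch under that assumption; the read-only :ro suffix is an optional choice, not something the answer requires:

```python
import os


def file_bind_arg(host_file: str, container_file: str, read_only: bool = True) -> str:
    """Return a -v value for a single-file bind mount, refusing to proceed when
    the host file is missing (Docker would otherwise create a directory)."""
    if not os.path.isfile(host_file):
        raise FileNotFoundError(
            f"{host_file} must exist as a regular file before it is bind-mounted"
        )
    spec = f"{host_file}:{container_file}"
    # The container only reads the profile, so mount it read-only by default.
    return spec + ":ro" if read_only else spec
```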

How to control file operations made to a volume in docker?

I have a user-space filesystem that provides a set of POSIX-like interfaces in user space, like these:
open
read
write
mkdir
...
I want to create a volume on this filesystem and pass it to a Docker container. How can I control the way Docker accesses this volume so that the access is redirected to my POSIX-like interface?
Right now my filesystem can't be mounted on the host. It is a completely user-space filesystem.
I think FUSE can support this, but I don't want to go there unless I have no choice.
You don't need a volume here. If you can access your POSIX interfaces from your application running in Docker, just call them to perform read, write, etc. operations. If you really need a volume, you need to store the data in another volume and have a watchdog app sync the changes to your user-space filesystem.
Docker does not implement any file or directory access. It's simply not what docker does, as a matter of design.
All docker does when launching a container is to create a bunch of mounts in such a way that the processes inside the container can issue their regular POSIX calls. When a process inside a container calls write(), the call goes directly to the Linux kernel, without docker's knowledge or intervention.
Now, there's a missing piece in your puzzle that has to be implemented one way or another: the application calls, e.g., the POSIX write() function, and your filesystem is not able to intercept that call.
So you have a couple of options:
Option 1: Implement your userspace filesystem in a library:
The library would override the write() function.
You compile the library and put it in some directory e.g. /build/artifacts/filesystem.so.
You mount the library into the container as a volume when running it, e.g. docker run -v /build/artifacts/filesystem.so:/extralibs/filesystem.so ...
You add it as a preloaded library: docker run ... --env LD_PRELOAD=/extralibs/filesystem.so ...
This makes every dynamically linked program in the container use your library, so it should forward all calls on irrelevant files (e.g. /bin/bash, /etc/passwd) to the real filesystem.
If you have control over the images, then you can set it up such that only particular commands execute with this LD_PRELOAD.
Fair warning: implementing a library that overrides system calls and libc functions has a lot of pitfalls you'll need to work around. For example, if the program uses fprintf(), you have to override fprintf() as well, even though fprintf() calls write(): libc's internal call to write() does not go through the dynamic symbol lookup that LD_PRELOAD hooks into, so your write() never sees it.
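The core of such a library is the per-call decision of whether a path belongs to your filesystem or must be forwarded to the real one. The actual interposer would be C code exporting write(), open() and friends and forwarding to the originals obtained via dlsym(RTLD_NEXT, ...); the sketch below only models that routing decision in Python. The /userfs prefix is hypothetical, paths are normalized lexically, and symlinks are not resolved:

```python
import posixpath

# Hypothetical prefix under which the user-space filesystem should appear.
USERFS_PREFIX = "/userfs"


def route(path: str, cwd: str = "/") -> str:
    """Decide whether an intercepted call on `path` goes to the user-space
    filesystem or is forwarded to the real (kernel) filesystem."""
    # Resolve relative paths and '..' first, so "/userfs/../etc/passwd"
    # is not mistaken for a user-space file.
    full = posixpath.normpath(posixpath.join(cwd, path))
    if full == USERFS_PREFIX or full.startswith(USERFS_PREFIX + "/"):
        return "userfs"
    return "kernel"
```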
Option 2: Modify the application to just call your filesystem functions.
This is assuming you can modify the application and the docker image.
If your filesystem is a service, run it in the container and issue the appropriate RPCs.
If it needs to be shared with other containers, then the backing store for your filesystem can be a volume.
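One way to keep such an application change small is to route all file access through a narrow interface, so the same code can talk to your filesystem's client or to ordinary files on a volume. A sketch with hypothetical class names; the in-memory class only stands in for a real client of your user-space filesystem:

```python
from typing import Dict, Protocol


class FileSystem(Protocol):
    """The operations the application needs; a client for the user-space
    filesystem would implement the same methods."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, data: bytes) -> None: ...


class LocalFileSystem:
    """Fallback backed by ordinary POSIX calls (e.g. files on a Docker volume)."""

    def read(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()

    def write(self, path: str, data: bytes) -> None:
        with open(path, "wb") as f:
            f.write(data)


class InMemoryFileSystem:
    """Stand-in for the user-space filesystem client (hypothetical)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


def save_report(fs: FileSystem, path: str, text: str) -> None:
    # Application code depends only on the interface, never on open() directly.
    fs.write(path, text.encode("utf-8"))
```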
Option 3: Make your userspace filesystem available natively within the container.
This means any command can issue a write() that goes directly to the kernel, and the kernel redirects it to your filesystem.
This essentially means implementing your filesystem as a FUSE daemon, mounting it on the host (since you can't mount it inside containers), and using it as a Docker volume.
If there's a specific limitation that prevents you from mounting your filesystem on the host, then you have a lot of work to do to make option 1 work. Otherwise, I would advise you to implement your filesystem with FUSE and mount it on the host; it has the highest ROI.

How to update docker container image but keep the generated files by container app

What is the best practice for updating a container in the following scenario?
I build images from my web app project, and I publish new images based on the updated source code once a month.
But my web app generates new files or updates some files over time while running in the container. For example, the app creates new XML files under the user folder for each web user. Another example is files uploaded by users.
I want to keep these files, without losing any, after running the new, updated image.
/bin/
    /first.dll
    /second.dll
/other-sources/
    /some.cs
    /other.cs
/user/
    /user-1.xml
    /user-2.xml
/uploads/
    /images
        /image-1.jpg
/web.config
Should I use Docker's volume feature? Is there another strategy?
Short answer, yes, you do want a volume for these directories. More specifically, two volumes: /user and /uploads.
This gets into a fundamental practice of image and container design that is best done by dividing your application into three parts:
The application code, binaries, libraries, and other runtime dependencies.
The persistent data that the application accesses and creates.
The configuration that modifies how the application runs, particularly in different environments with the same code.
Each of these parts should go in a different place in docker.
The first part, the code and binaries, goes in your image. This is what you ship to run your container on different nodes in docker, and what you store in a registry for later reuse.
The second part, your persistent data, gets stored in a volume. There are two main types of volumes to pick from: a named volume and a host volume (aka bind mount). A named volume has a particular feature that improves portability: it is initialized with the contents of your image at the volume location when the volume is created for the first time. This initialization includes directory and file permissions and ownership, and can be used to seed your volume with an initial state. The host volume (bind mount) is just a directory mounted from the Docker host into the container: you get exactly what was on the host, including the uid/gid of the files and directories, with no initialization procedure. The host volume is very easy for developers to access, but it lacks portability if you move to a multi-node swarm cluster, and it suffers from uid/gid mismatches, since the same numeric ID can belong to a different user name inside the container than on the host. Any files you write inside the container that are not written to a volume should be considered disposable; they will be lost when you recreate the container to update to a new image. And any directory you define as a volume should be considered owned by that volume; it will not receive updates from the image when you replace the container.
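To see the uid/gid issue in practice, you can compare the numeric owner of a bind-mounted directory with the effective IDs the container process runs as. A small diagnostic sketch, assuming it runs inside the container on a Unix-like system; which path you check (e.g. /user) is up to you:

```python
import os


def ownership_report(path: str) -> dict:
    """Compare the numeric owner of a (bind-mounted) path with the effective
    uid/gid of the current process. Only numeric IDs cross the mount; user
    names are resolved separately on each side."""
    st = os.stat(path)
    return {
        "path_uid": st.st_uid,
        "path_gid": st.st_gid,
        "process_uid": os.geteuid(),
        "process_gid": os.getegid(),
        "uid_matches": st.st_uid == os.geteuid(),
    }
```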
The last piece, configuration, is often overlooked but equally important. This is anything injected into the application at startup to tell it where to connect for external data, config files that alter its behavior, and anything that needs to be separated to allow the same image to be reused in different environments. This is how you get portability from development to production with the same image, and how you get reusability of publicly provided images. The configuration is injected with environment variables, command-line parameters, bind mounts of a config file (when you run on a single node), and configs and secrets, which are essentially the same bind mount of a config file, but stored in Docker's swarm rather than locally on a single host. In your situation, the /web.config looks suspiciously like a config file that you'll want to move out of the image and inject as a bind mount or swarm config.
To put these all together, you will want a compose file that defines your image, the volumes to use, and any configs or environment variables to set.
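For the question's layout, the monthly update cycle with plain docker commands could look like the sketch below; a compose file would declare the same image, volumes, and config. The container name, volume names, and in-container paths (taken from the listing in the question) are hypothetical. Because the data lives in named volumes, removing the old container and starting one from the new image keeps /user and /uploads:

```python
from typing import List

# All names below are hypothetical placeholders for the question's web app.
CONTAINER = "webapp"
VOLUMES = {
    "webapp-user": "/user",        # per-user XML files
    "webapp-uploads": "/uploads",  # files uploaded by users
}


def upgrade_commands(image: str, config_on_host: str) -> List[List[str]]:
    """Commands that replace the container with one from a new image while
    the named volumes carry the persistent data over unchanged."""
    run = ["docker", "run", "-d", "--name", CONTAINER]
    for volume, target in VOLUMES.items():
        run += ["-v", f"{volume}:{target}"]
    # Configuration is injected at run time instead of being baked into the image.
    run += ["-v", f"{config_on_host}:/web.config:ro", image]
    return [
        ["docker", "pull", image],
        ["docker", "rm", "-f", CONTAINER],  # removes the container, never the named volumes
        run,
    ]
```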

Is it a docker best practice to use volume for the code?

The VOLUME instruction should be used to expose any database storage area, configuration storage, or files/folders created by your docker container. You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.
Would you store your code in a volume?
Such as your jar files. It could be a little convenient to deploy the application without rebuilding the image.
Are there any considerations when storing code in a volume, like performance, security, or others?
I don't recommend using a VOLUME statement inside the Dockerfile for anything with current versions of docker (current being any version of docker since the introduction of named volumes). Including a VOLUME command has multiple downsides, including:
possible inability to change contents at that location of the image with any later steps or child images (this behavior appears to be different with different scenarios and different versions of docker)
potential to create volumes with just a hash for the name that clutter up the docker volume ls output and are very difficult to find and reuse later if you needed the data inside
for your changing code, if you place it in a volume and recreate your container from a new version of the image, the volume will still have the old copy of your code unless you update that volume yourself (the key feature of volumes is persistent data that you want to keep between image versions)
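The stale-code effect can be illustrated with a toy model of named-volume initialization. This is not Docker code, just a Python illustration under the simplified assumption that an empty named volume is seeded from the image and a non-empty one is left alone:

```python
from typing import Dict

Files = Dict[str, str]  # path -> content, a toy model of a directory


def mount_named_volume(volume: Files, image_dir: Files) -> Files:
    """Toy model: an empty named volume is seeded from the image's directory;
    a non-empty one is used as-is."""
    if not volume:
        volume.update(image_dir)
    return volume


code_volume: Files = {}
mount_named_volume(code_volume, {"app.jar": "v1"})  # first container: seeded with v1
mount_named_volume(code_volume, {"app.jar": "v2"})  # recreated from the v2 image
print(code_volume["app.jar"])  # prints: v1
```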
I do recommend putting your data in a volume that you define on the docker run command line or inside a docker-compose.yml. Volumes defined there can have a name or map back to a path on the Docker host. And you can make any folder or file a volume without needing to define it in the Dockerfile. Volumes defined at this step don't impact the image, allowing you to extend an image without being locked out of making changes to a directory.
For your code, it is a common best practice to inject code with a volume if it is interpreted (e.g. javascript) or already compiled (e.g. a jar file) during application development. You would define the volume on the container (not the Dockerfile), and overlay the code or binaries that were also copied into the image using the same filenames. This allows you to rapidly iterate in development without frequently rebuilding the image. Depending on the application, you may be able to live reload the code, otherwise, a container restart should be all that's needed to see the latest change. And once development is finished, you rebuild the image with your current code and ship that to someone that can use it without needing the volume mount for the code.
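A sketch of the two variants, with the code volume defined on the container rather than in the Dockerfile. It assumes the image copies its jar to /app/app.jar (a hypothetical path) and that the host build produces a jar with the same name:

```python
from typing import List, Optional


def run_argv(image: str, dev_jar: Optional[str] = None) -> List[str]:
    """Build `docker run` arguments. In development, a host build output is
    bind-mounted over the jar already copied into the image (same filename);
    in production the image is run as shipped, with no code volume."""
    argv = ["docker", "run", "--rm"]
    if dev_jar is not None:
        # Hypothetical in-image location; it must match the path used by COPY in the Dockerfile.
        argv += ["-v", f"{dev_jar}:/app/app.jar:ro"]
    return argv + [image]
```

Once development is done, only the first form is needed: the rebuilt image already contains the current jar.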
I've also blogged about my concerns with volumes inside of Dockerfiles if you'd like to see more details on this.
You say:
It could be a little convenient to deploy the application without rebuilding the image.
Instead, encapsulating your application version inside an image build has a lot of advantages. You can deploy your app just by deploying the image, whereas using a volume for app code forces you to orchestrate some other deployment method to update that volume too.
And you have to (eventually) match the jar version with the proper image version.
Regarding security or performance, I don't think that there are special considerations.
Anyway, using volumes for that is not a common approach. And as @BMitch says, using VOLUME inside a Dockerfile is somewhat tricky.

How to place files on shared volume from within Dockerfile?

I have a Dockerfile which builds an image that provides a complicated tool-chain environment for compiling a project on a volume mounted from the host machine's filesystem. Another reason for this setup is that I don't have a lot of space in the image.
The Dockerfile builds my tool-chain in the OS image, and then prepares the source by downloading packages to be placed on the host's shared volume. Normally, from there I'd log into the container and execute commands to build. And this is the problem: I can download the source in the Dockerfile, but how would I then get it to the shared volume?
Basically I have ...
ADD http://.../file mydir
VOLUME /home/me/mydir
But then, of course, I get the error 'cannot mount volume over existing file ...'
Am I going about this wrong?
You're going about it wrong, but you already suspected that.
If you want the source files to reside on the host filesystem, get rid of the VOLUME directive in your Dockerfile, and don't try to download the source files at build time. This is something you want to do at run time. You probably want to provision your image with a pair of scripts:
One that downloads the files to a specific location, say /build.
Another that actually runs the build process.
With these in place, you could first download the source files to a location on the host filesystem, as in:
docker run -v /path/on/my/host:/build myimage fetch-sources
And then you can build them by running:
docker run -v /path/on/my/host:/build myimage build-sources
With this model:
You're no longer trying to muck about with volumes during the image build process. That is almost never what you want, since data stored in a volume is explicitly excluded from the image, and the build process doesn't permit you to conveniently mount host directories inside the container.
You are able to download the files into a persistent location on the host, where they will be available to you for editing, or re-building, or whatever.
You can run the build process multiple times without needing to re-download the source files every time.
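A minimal sketch of the pair of scripts, written as one Python entrypoint that dispatches on its first argument, so docker run -v /path/on/my/host:/build myimage fetch-sources passes "fetch-sources" to it. The SOURCE_URL environment variable, the single-file download, and the make build command are assumptions; the original download URL is elided. Skipping files that already exist is what lets you rebuild without re-downloading:

```python
import os
import subprocess
import sys
import urllib.request
from typing import Callable, List

BUILD_DIR = "/build"  # where the host directory is mounted (-v /path/on/my/host:/build)


def fetch_sources(build_dir: str, url: str,
                  download: Callable[[str, str], object] = urllib.request.urlretrieve) -> bool:
    """Download `url` into build_dir unless it is already there.
    Returns True if a download happened."""
    dest = os.path.join(build_dir, os.path.basename(url))
    if os.path.exists(dest):
        return False  # already on the host from an earlier run
    os.makedirs(build_dir, exist_ok=True)
    download(url, dest)
    return True


def build_sources(build_dir: str, command: List[str]) -> int:
    # Run the tool-chain against the mounted sources; returns the exit status.
    return subprocess.run(command, cwd=build_dir).returncode


def main(argv: List[str]) -> int:
    if argv[:1] == ["fetch-sources"]:
        fetch_sources(BUILD_DIR, os.environ["SOURCE_URL"])  # hypothetical env var
        return 0
    if argv[:1] == ["build-sources"]:
        return build_sources(BUILD_DIR, ["make"])  # assumed build command
    print("usage: fetch-sources | build-sources", file=sys.stderr)
    return 2

# As the image's ENTRYPOINT, the script would end with sys.exit(main(sys.argv[1:])).
```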
I think this will do pretty much what you want, but if it doesn't meet your needs, or if something is unclear, let me know.
