rbind usage on local volume mounting - docker

I have a directory that is configured to be managed by an automounter (as described here). I need to use this directory (and all directories that are mounted inside) in multiple pods as a Local Persistent Volume.
I am able to trigger the automounter within the containers, but there are some use-cases when this directory is not empty when the container starts up. This makes sub-directories appear as empty and not being able to trigger the automounter (whithin the container)
I did some investigation and discovered that when using Local PVs, there is a mount -o bind command between the source directory and some internal directory managed by the kubelet (this is the line in the source code).
What I actually do need is rbind to be used (recursive binding - here is a good explanation).
Using rbind also requires some changes to the part that unmounts the volume (recursive unmounting is needed)
I don't want to patch the kubelet and recompile it..yet.
So my question is: are there some official methods to provide to Kubernetes some custom mounter/unmounter?

Meanwhile, I did find a solution for this use-case.
Based on Kubernetes docs there is something called Out-Of-Tree Volume Plugins
The Out-of-tree volume plugins include the Container Storage Interface (CSI) and FlexVolume. They enable storage vendors to create custom storage plugins without adding them to the Kubernetes repository
Even that CSI is encouraged to be used, I chose FlexVolume to implement my custom driver. Here is a detailed documentation.
This driver is actually a py script that supports three actions: init/mount/unmount (--rbind is used to mount that directory managed by automounter and unmounts it like this). It is deployed using a DaemonSet (docs here)
And this is it!

Related

Should I use the application code inside the docker images or in volumes?

I am working on a Devops project. I want to find the perfect solution.
I have a conflict between two solutions. should I use the application code inside the docker images or in volumes?
Your code should almost never be in volumes, developer-only setups aside (and even then). This is doubly true if you have a setup like a frequent developer-only Node setup that puts the node_modules directory into a Docker-managed anonymous volume: since Docker will refuse to update that directory on its own, the primary effect of this is to cause Docker to ignore any changes to the package.json file.
More generally, in this context, you should think of the image as a way to distribute the application code. Consider clustered environments like Kubernetes: the cluster manager knows how to pull versioned Docker images on its own, but you need to work around a lot of the standard machinery to try to push code into a volume. You should not need to both distribute a Docker image and also separately distribute the code in the image.
I'd suggest using host-directory mounts for injecting configuration files and for storing file-based logs (if the container can't be configured to log to stdout). Use either host-directory or named-volume mounts for stateful containers' data (host directories are easier to back up, named volumes are faster on non-Linux platforms). Do not use volumes at all for your application code or libraries.
(Consider that, if you're just overwriting all of the application code with volume mounts, you may as well just use the base node image and not build a custom image; and if you're doing that, you may as well use your automation system (Salt Stack, Ansible, Chef, etc.) to just install Node and ignore Docker entirely.)

Mounting a large file in Kubernetes

We are running a pod in Kubernetes that needs to load a file during runtime. This file has the following properties:
It is known at build time
It should be mounted read-only by multiple pods (the same kind)
It might change (externally to the cluster) and needs to be updated
For various reasons (security being the main concern) the file cannot be inside the docker image
It is potentially quite large, theoretically up to 100 MB, but in practice between 200kB - 10MB.
We have considered various options:
Creating a persistent volume, mount the volume in a temporary pod to write (update) the file, unmount the volume, and then mount it in the service with ROX (Read-Only Multiple) claims. This solution means we need downtime during upgrade, and it is hard to automate (due to timings).
Creating multiple secrets using the secrets management of Kubernetes, and then "assemble" the file before loading it in an init-container or something similar.
Both of these solutions feels a little bit hacked - is there a better solution out there that we could utilize for solving this?
You need to use a shared filesystem that supports Read/Write Multiple Pods.
Here is a link to the CSI Drivers which can be used with Kubernetes and provide those access:
https://kubernetes-csi.github.io/docs/drivers.html
Ideally, you need a solution that is not an appliance, and can run anywhere meaning it can run in the cloud or on-prem.
The platforms that could work for you are Ceph, GlusterFS, and Quobyte (Disclaimer, I work for Quobyte)

How to create and mount common data for all pods

I have defined job in my kubernetes cluster which suppose to create some folder with some data. Now I would like to share this folder between all other pods, because they need to use this data.
Currently other pods are not running if above mentioned job is not finished.
So I think about volumes. Let's say - result of the job is mounted folder, which is accessible from other pods when job is finished.
Other pods in cluster needs only environment variable - path to this mounted folder.
Could you please how I could define this?
ps. I know this is not a very good use case, however I Have legacy monolit application with lots of dependencies.
I'm assuming that the folder you are referring to is a single folder at some disk that can be mounted by multiple clients.
Check here , or at your volume plugin documentation reference if the access mode you are requesting is supported.
Create the persistent volume claim that your pods will use, no need for the matching volume to exists yet. You can use label/expression matching to make sure that this PVC will only be satisfied by the persistent volume you will be creating at your job.
At your job add a final task that creates the persistent volume claim that satisfies the PVC.
Create your pods adding the PVC as volume. I don't think pod presets are needed, plus they are alpha, not enabled by default, and not widely used, but depending on your case you might want to take a look at them.

How to update docker container image but keep the generated files by container app

What is the best practices for the updating container for the following scenario;
I have images that build on my web app project, and I am puplishing new images based on updated source code, once in a month.
Buy my web app generates files or updates some file in time after running in container. For example, app is creating new xml files under user folder for each web user. Another example is upload files by users.
I want to keep these files after running new updated image without lose.
/bin/
/first.dll
/second.dll
/other-soruces/
/some.cs
/other.cs
/user/
/user-1.xml
/user-2.xml
/uploads/
/images
/image-1.jpg
/web.config
Should I use the volume feature of Docker ? Is there any another strategy ?
Short answer, yes, you do want a volume for these directories. More specifically, two volumes: /user and /uploads.
This gets into a fundamental practice of image and container design that is best done by dividing your application into three parts:
The application code, binaries, libraries, and other runtime dependencies.
The persistent data that the application access and creates.
The configuration that modifies how the application runs, particularly in different environments with the same code.
Each of these parts should go in a different place in docker.
The first part, the code and binaries, goes in your image. This is what you ship to run your container on different nodes in docker, and what you store in a registry for later reuse.
The second part, your persistent data, gets stored in a volume. There are two main types of volumes to pick from: a named volume and a host volume (aka bind mount). A named volume has a particular feature that improves portability, it will be initialized to the contents of your image at the volume location when the volume is created for the first time. This initialization includes directory and file permissions and ownership, and can be used to seed your volume with an initial state. The host volume (bind mount) is just a directory mount from the docker host into the container, and you get exactly what was on the host, including the uid/gid of the files/directories, along with no initialization procedure. The host volume is very easy to access for developers, but lacks portability if you move into a multi-node swarm cluster, and suffers from uid/gid on the host mapping to different users inside the container since usernames inside the container can be different for the same id's. Any files you write inside the container that are not written to a volume should be considered disposable and will be lost when you recreate the container to update to a new image. And any directories you define as a volume should be considered owned by that volume and will not receive updates from the image when you replace the container.
The last piece, configuration, is often overlooked but equally important. This is anything injected into the application at startup to tell it where to connect for external data, config files that alter it's behavior, and anything that needs to be separated to allow the same image to be reusable in different environments. This is how you get portability from development to production with the same image, and how you get reusability of publicly provided images. The configuration is injected with environment variables, command line parameters, bind mounts of a config file (when you run on a single node), and configs + secrets which are essentially the same bind mount of a config file that is now stored in docker's swarm rather than locally on a single host. In your situation, the /web.config looks suspiciously like a config file that you'll want to move out of the image and inject as a bind mount or swarm config.
To put these all together, you will want a compose file that defines your image, the volumes to use, and any configs or environment variables to set.

Is it a docker best practice to use volume for the code?

The VOLUME instruction should be used to expose any database storage area, configuration storage, or files/folders created by your docker container. You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image.
will you store your code in volume?
Such as your jar files. It could be a little convenient to deploy the application without rebuilding the image.
Are there any considerations if storing the code in volume? like performance, security or others.
I don't recommend using a VOLUME statement inside the Dockerfile for anything with current versions of docker (current being any version of docker since the introduction of named volumes). Including a VOLUME command has multiple downsides, including:
possible inability to change contents at that location of the image with any later steps or child images (this behavior appears to be different with different scenarios and different versions of docker)
potential to create volumes with just a hash for the name that clutter up the docker volume ls output and are very difficult to find and reuse later if you needed the data inside
for your changing code, if you place it in a volume and recreate your container from a new version of the image, the volume will still have the old copy of your code unless you update that volume yourself (the key feature of volumes is persistent data that you want to keep between image versions)
I do recommend putting your data in a volume that you define on the docker run command line or inside a docker-compose.yml. Volumes defined there can have a name or map back to a path on the docker host. And you can make any folder or file a volume without needing to define it in the Dockerfile. Volumes defined at this step doesn't impact the image, allowing you to extend an image without being locked out of making changes to a directory.
For your code, it is a common best practice to inject code with a volume if it is interpreted (e.g. javascript) or already compiled (e.g. a jar file) during application development. You would define the volume on the container (not the Dockerfile), and overlay the code or binaries that were also copied into the image using the same filenames. This allows you to rapidly iterate in development without frequently rebuilding the image. Depending on the application, you may be able to live reload the code, otherwise, a container restart should be all that's needed to see the latest change. And once development is finished, you rebuild the image with your current code and ship that to someone that can use it without needing the volume mount for the code.
I've also blogged about my concerns with volumes inside of Dockerfiles if you'd like to see more details on this.
You say:
It could be a little convenient to deploy the application without rebuilding the image.
Instead of that, it has a lot of advantages to encapsulate your application version inside an image build. You can easily deploy your app only deploying the image, so the fact that you use a volume for app code leads you to orchestrate some other deployment method to update that volume too.
And you have to (eventually) match the jar version with the proper image version.
Regarding security or performance, I don't think that there are special considerations.
Anyway, it is not a common approach to use volumes for that. And as #BMitch say, using VOLUME inside Dockerfile is some tricky.

Resources