Is it possible to mount Cinder volumes on docker containers in openstack?
And if it is, is there a way to encrypt data leaving the container to the cinder volume?
I was thinking of mounting the volume as a loopback device and encrypting the data as it is flushed to disk. Is this possible?
Kind regards
It is not currently possible to mount Cinder volumes inside a Docker container in OpenStack.
A fundamental problem is that Docker is filesystem-based, rather than block-device-based. Any block device -- like a Cinder volume -- would need to be formatted with a filesystem and mounted prior to starting the container. While it might be technically feasible, the necessary support for this does not yet exist.
The Manila project may be a better solution for adding storage to containers, but I haven't looked into that and I don't know if (a) the project works at all yet and (b) it works with nova-docker.
If you're not using the nova-docker driver but are instead using the Heat plugin for Docker, you can mount host volumes in a container similar to docker run -v ..., but making this work seamlessly across multiple nodes in a multi-tenant setting may be difficult or impossible.
I'm new to the area, sorry if my question sounds dumb.
What I'm looking for: I have a pod of containers, where one of the containers (Alpine-based) should read from and write to a customer-provided file. I don't want to restrict how the customer provides the file (or I at least want to support the most common ways).
The file might sometimes be huge (not sure whether that requirement makes any difference).
The more flexibility here the better.
From the initial search I found there are multiple ways to bind the volume/directory to docker's container:
Docker bind mount - sharing dir between Host and container (nice to have)
Add a docker volume to the pod (must have)
Mount AWS S3 bucket to docker's container (must have)
any other ways of supplying file access to the container? Let's say from the remote machine via sftp access?
But the main question: is it at all possible to configure this via Kubernetes?
Ideally in the same yaml file that starts the containers?
Any hints/examples are very welcome!
It surely is possible!
Just as there are volume mounts for a Docker container, there are volume mounts in Kubernetes as well.
This is achieved using a PersistentVolumeClaim (PVC): a request for storage whose lifecycle is independent of any Pod, so data written to the mounted volume survives Pod restarts.
Understand more about the concept here: https://kubernetes.io/docs/concepts/storage/persistent-volumes/
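As a minimal sketch of what this can look like in a single YAML file, the snippet below writes a manifest containing one PVC and one Pod that mounts it. The names customer-data and file-worker, the 10Gi size, and the /data mount path are placeholders chosen for illustration, and the example assumes the cluster has a default StorageClass that can satisfy the claim.

```shell
# Write a hypothetical manifest: one PVC plus a Pod that mounts it.
cat > pod-with-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: customer-data          # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi            # assumed size, adjust as needed
---
apiVersion: v1
kind: Pod
metadata:
  name: file-worker            # placeholder name
spec:
  containers:
    - name: worker
      image: alpine:3.19
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data     # where the container sees the files
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: customer-data
EOF

# Apply it to the cluster (requires kubectl and cluster access):
#   kubectl apply -f pod-with-pvc.yaml
```

Other volume types go in the same volumes: section of the Pod spec, so the choice of backing storage stays in the one manifest.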
I have seen the terms "bind mount" and "host volume" being used in various articles but none of them mention whether they are the same thing or not. But looking at their function, it looks like they are pretty much the same thing. Can anyone answer whether it is the same thing or not? If not, what is the difference?
Ref:
Docker Docs - Use bind mounts
https://blog.logrocket.com/docker-volumes-vs-bind-mounts/
They are different concepts.
As mentioned in bind mounts:
Bind mounts have been around since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its absolute path on the host machine. By contrast, when you use a volume, a new directory is created within Docker’s storage directory on the host machine, and Docker manages that directory’s contents.
And as mentioned in volumes:
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure and OS of the host machine, volumes are completely managed by Docker. Volumes have several advantages over bind mounts:
Volumes are easier to back up or migrate than bind mounts.
You can manage volumes using Docker CLI commands or the Docker API.
Volumes work on both Linux and Windows containers.
Volumes can be more safely shared among multiple containers.
Volume drivers let you store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
New volumes can have their content pre-populated by a container.
Volumes on Docker Desktop have much higher performance than bind mounts from Mac and Windows hosts.
A "bind mount" is when you let your container see and use a normal directory in a normal filesystem on your host. Changes made by programs running in the container will be visible in your host's filesystem.
A "volume" is a directory that Docker creates and manages in its own storage area on the host (/var/lib/docker/volumes/ on Linux) and mounts into the container. You normally don't work with its contents directly from the host.
I was able to figure it out.
There are 3 types of storage in Docker.
1. Bind mounts, also known as host volumes.
2. Anonymous volumes.
3. Named volumes.
So bind mount = host volume. They are the same thing. "Host volume" must be a deprecated term though, as I cannot find it in the Docker docs. But it can be seen in various articles published 1-2 years ago.
Examples for where it is referred to as "host volume":
https://docs.drone.io/pipeline/docker/syntax/volumes/host/
https://spin.atomicobject.com/2019/07/11/docker-volumes-explained/
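To make the three storage types concrete, here is a rough sketch of how the short -v syntax of docker run distinguishes them. mount_kind is a hypothetical helper written for this illustration, not part of Docker, and it deliberately ignores Windows paths, relative paths, and the longer --mount syntax.

```shell
# Classify a `docker run -v` argument (simplified, Linux-style paths only).
mount_kind() {
  case "$1" in
    /*:*) echo "bind mount" ;;        # absolute host path before the colon
    *:*)  echo "named volume" ;;      # a name before the colon
    *)    echo "anonymous volume" ;;  # container path only, no source
  esac
}

# Example specs; the paths and the volume name are placeholders.
for spec in /srv/app/config:/config /cache appdata:/data; do
  printf '%-28s -> %s\n' "-v $spec" "$(mount_kind "$spec")"
done
```

In other words, docker run -v /srv/app/config:/config ... is a bind mount (host volume), -v /cache is an anonymous volume, and -v appdata:/data is a named volume.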
The Docker docs page "Manage data in Docker" is quite helpful:
Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
I am building an application in a docker container, that in the end will have to read from a filesystem that is quite large (terabytes) in size.
The application itself will be running on another device.
I am now wondering which is better to use for connecting the container to this filesystem, a volume or a bind mount?
Only read below if you want to hear more detailed reasoning from me
The volume documentation states, if I read it correctly, that the content of the volume will be in a place on the host system that Docker has access to. This makes me think that when I use a volume, Docker will try to place a copy of the really large filesystem on the shared drive, somewhere on the device that the application will be running on.
The bind mount documentation says that the information will be stored anywhere on the host system. This seems to indicate to me that the original information will remain on the shared drive, without creating any copies. But several other questions on this site have stated that the performance of the bind mount is a lot worse than the volume.
Since you already know the location on the host system you want to use, you should use a bind mount.
docker run -v /mnt/very-large-device:/data ...
With a named volume, the storage is in space Docker controls, usually inside /var/lib/docker/volumes on a native-Linux host or inside the hidden VM on other systems. You can't (easily) control where the underlying storage actually is. (You could configure the system to have the entire Docker installation on the large disk, or use extended volume options to create a named volume that's actually a bind mount; both are more complicated than using Docker's native bind-mount option.)
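Both options can be sketched as small shell functions. This assumes the filesystem is mounted on the host at /mnt/very-large-device as in the command above; the readonly flag is an assumption based on the application only needing to read, and the image (alpine) and volume name (bigdata) are placeholders.

```shell
# Host path from the answer above.
SRC=/mnt/very-large-device

# Option 1: plain bind mount, written with the more explicit --mount syntax.
# `readonly` keeps the container from modifying the source filesystem.
run_with_bind() {
  docker run --rm --mount "type=bind,source=${SRC},target=/data,readonly" alpine ls /data
}

# Option 2: a named volume that is really a bind mount, created through
# the local driver's extended options.
create_bind_backed_volume() {
  docker volume create --driver local \
    --opt type=none --opt o=bind --opt device="${SRC}" bigdata
}
```

Calling run_with_bind lists /data from a throwaway container. After create_bind_backed_volume, the volume can be used like any other named volume, for example with -v bigdata:/data. Neither option copies the data; the container reads the original files in place.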
... the performance of the bind mount is a lot worse than the volume.
This is principally true on non-Linux systems. On native-Linux systems, bind mounts and named volumes should have identical performance, and that should be approximately the same as the container filesystem. On non-Linux systems, though, Docker needs to bridge between the Linux system inside the hidden VM and the host's filesystem, and this can be slow depending on access patterns.
As with all performance questions, the best way to determine which option will be fastest is to actually set up an experiment on your intended system and measure it.
Looking at multiple options to implement a shared storage for a Docker Swarm, I can see most of them require a special Docker plugin:
sshFs
CephFS
glusterFS
S3
and others
... but one thing that is not mentioned anywhere is just mounting a typical block storage to all VPS nodes running the Docker Swarm. Is this option impractical and thus not mentioned on the Internet? Am I missing something?
My idea is as follows:
Create a typical Block Storage (like e.g. one offered by DigitalOcean or Vultr).
Mount it to your VPS filesystem.
Mount a folder from that Block Storage as a volume in the Docker Container / Docker Worker using the "local" driver.
Sounds the simplest and most obvious to me. Why are people using more complicated setups like sshFs, CephFS etc? And most importantly, is the implementation I described viable, and if so, what are the drawbacks of doing it this way?
The principal advantage of using a volume plugin over mounted storage comes down to the ability to create storage volumes dynamically, and associate them with namespaces.
That is, with Docker managing storage for a volume via a "volumes: data:" directive in a compose file, a separate volume is created for each named stack that is deployed.
Using the local driver and mounts, you, the swarm admin, now need to ensure that no two stacks try to use /mnt/data.
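A minimal sketch of the difference, assuming a compose file in version 3 format. The service name, image, and the stack names blue and green are placeholders.

```shell
# Write a hypothetical stack file that uses a Docker-managed named volume.
cat > stack.yml <<'EOF'
version: "3.8"
services:
  app:
    image: alpine:3.19
    command: ["sleep", "3600"]
    volumes:
      - data:/data            # named volume, created per stack
      # - /mnt/data:/data     # fixed host path: every stack would share it
volumes:
  data:
EOF

# Deploying the same file under two stack names yields two separate volumes,
# because volume names are prefixed with the stack name:
#   docker stack deploy -c stack.yml blue     # volume "blue_data"
#   docker stack deploy -c stack.yml green    # volume "green_data"
```

With the commented-out host path instead, both stacks would read and write the same /mnt/data directory unless the admin assigns each stack its own subdirectory by hand.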
Once you pass that hurdle, some platforms limit the number of hosts a block storage volume can be attached to.
There's also the security angle to consider: with the volume mapped like that, a compromise of any service on any host can potentially expose all your data to an attacker, whereas a volume plugin exposes only the data mounted in that container.
All that said - docker swarm is awesome and the current plugin space is lacking - if mounting block storage is what it takes to get a workable storage solution I say do it. Hopefully the CSI support will be ready before year end however.
I'm new to Docker Swarm. As I understand it, Docker Swarm lets you abstract away from clustering. That means you don't care which hardware a container is deployed on.
On the other hand, the standard way to handle a database in Docker is to write data outside the Docker container (to avoid copy-on-write behaviour). That's achieved by mounting a Volume and writing db-related data to it. The important thing here - are Volumes machine-specific? Are Docker & Docker Swarm clever enough to mount a Volume on the machine it's needed?
Example:
I have 3 machines and 3 microservices/containers. All of them are deployed through Docker Swarm. Only one microservice/container must connect to a database. So I need to mount Volume only on one machine. But on which?
Databases and similar stateful applications are still a hard thing to deal with when it comes to Docker swarm and other orchestration frameworks. Ideally, containers should be able to run on any node in the swarm, but the problem comes when you need to persist data beyond the container's lifecycle.
Mounting a volume is the Docker way to persist data; however, this ties the container to a specific node, because volumes are created on specific nodes. There are many projects that try to solve this problem and provide some sort of distributed storage.
There was a project called Flocker that dealt with the above problem (it’s no longer maintained). There is also a newer project called REX-Ray.
Are Docker & Docker Swarm clever enough to mount a Volume on the machine it's needed?
By default, no. Docker swarm will choose one of the nodes and deploy the container on it. However, you can work around this problem:
First, you need to define a named volume in your Stackfile/Composefile under the service definition.
Second, you need to use node Placement Constraints to restrict where the database container should run.
If you do not use a distributed storage tool, then when it comes to databases and similar stateful containers that need volumes, you need to restrict the container to a specific node.
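The two steps above can be sketched as follows. The label db=true, the image, the password, and the names dbdata and mystack are all placeholders; the only assumption is that one node has been chosen to host the database volume.

```shell
# Write a hypothetical stack file: a named volume plus a placement constraint.
cat > db-stack.yml <<'EOF'
version: "3.8"
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example          # placeholder, use a secret in practice
    volumes:
      - dbdata:/var/lib/postgresql/data   # step 1: named volume
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.db == true        # step 2: pin to the labelled node
volumes:
  dbdata:
EOF

# Label the node that should hold the data, then deploy:
#   docker node update --label-add db=true <node-name>
#   docker stack deploy -c db-stack.yml mystack
```

Because the db service can only be scheduled on the labelled node, it finds the same local volume after every restart. The other services in the stack carry no constraint and can still run on any node.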