Mount rexray/ceph volume in multiple containers on Docker swarm - docker

What I have done
I have built a Docker Swarm cluster where I am running containers that have persistent data. To allow the container to move to another host in the event of failure I need resilient shared storage across the swarm. After looking into the various options I have implemented the following:
Installed a Ceph Storage Cluster across all nodes of the Swarm and created a RADOS Block Device (RBD).
http://docs.ceph.com/docs/master/start/quick-ceph-deploy/
Installed Rexray on each node and configured it to use the RBD created above. https://rexray.readthedocs.io/en/latest/user-guide/storage-providers/ceph/
Deployed a Docker stack that mounts a volume using the rexray driver, e.g.
version: '3'
services:
  test-volume:
    image: ubuntu
    volumes:
      - test-volume:/test
volumes:
  test-volume:
    driver: rexray
This solution works in that I can deploy a stack, simulate a failure on the node that is running it, and observe the stack restarted on another node with no loss of persistent data.
However, I cannot mount a rexray volume in more than one container. My reason for doing so is to use a short-lived "backup container" that simply tars the volume to a snapshot backup while the main container is still running.
My Question
Can I mount my rexray volumes into a second container?
The second container only needs read access so it can tar the volume to a snapshot backup while keeping the first container running.
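For illustration, the sort of backup step I have in mind is roughly the following (the volume name, host path and archive name are placeholders; a stack deploy actually prefixes the volume name with the stack name):

# Attach the existing volume read-only and tar it to a host directory.
# This second mount is what does not work with the rexray driver in my setup.
docker run --rm \
  -v test-volume:/test:ro \
  -v /backups:/backup \
  ubuntu \
  tar czf /backup/test-volume.tar.gz -C /test .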

Unfortunately the answer is no: in this use case rexray volumes cannot be mounted into a second container. Some information below will hopefully assist anyone heading down a similar path:
Rexray does not support multiple mounts:
Today REX-Ray was designed to actually ensure safety among many hosts that could potentially have access to the same host. This means that it forcefully restricts a single volume to only be available to one host at a time. (https://github.com/rexray/rexray/issues/343#issuecomment-198568291)
But Rexray does support a feature called pre-emption where:
..if a second host does request the volume that he is able to forcefully detach it from the original host first, and then bring it to himself. This would simulate a power-off operation of a host attached to a volume where all bits in memory on original host that have not been flushed down is lost. This would support the Swarm use case with a host that fails, and a container trying to be re-scheduled.
(https://github.com/rexray/rexray/issues/343#issuecomment-198568291)
However, pre-emption is not supported by the Ceph RBD.
(https://rexray.readthedocs.io/en/stable/user-guide/servers/libstorage/#preemption)

You could of course have a container that attaches the volume and then exports it via NFS on a dedicated swarm network; the client containers could then access the data over NFS.
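A sketch of the client side of that idea, assuming the exported share is reachable at an address the Docker hosts themselves can resolve (the server name and export path below are placeholders, since the local driver performs the NFS mount from the host):

version: '3'
services:
  backup:
    image: ubuntu
    command: sh -c "tar czf /backup/snapshot.tar.gz -C /data ."
    volumes:
      - shared-data:/data:ro        # read-only view of the exported data
      - /backups:/backup            # host directory that receives the archive
volumes:
  shared-data:
    driver: local                   # the built-in local driver can mount NFS
    driver_opts:
      type: nfs
      o: addr=nfs.example.local,ro  # placeholder NFS server address
      device: ":/export/test-volume"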

Related

Do replicated docker containers in swarm mode contain multiple copies of data?

I have recently started learning Docker. However, when studying swarm mode I see that containers can be scaled up. What I would like to know is: once you scale a container in replicated mode, will the data within the container be replicated too, or will just fresh containers be spawned?
For example, let's say I created a mysql service initially with only 1 replica. I create and update tables in that mysql container. Later I scale it to 3; will the newly spawned containers contain the same table data? And will the data continuously be replicated across the 3 docker instances?
A replicated service will use a fresh container instance per replica. Swarm does not take care of replicating persistent data stored in volumes.
Depending on the volume plugin (e.g. the local driver with remote NFS shares) you are limited to read-write-once or read-write-many. Even if your volume allows read-write-many, the service replicas might not support it; for instance, mysql will not work if you point n replicas at the same volume. You can leverage swarm service template variables, for instance, to point your volumes to different target folders of the same NFS share.
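A sketch of that template-variable approach with the docker CLI (image, NFS address and export paths are placeholders); note that each replica then runs against its own, independent data set rather than a shared one:

# {{.Task.Slot}} expands to 1, 2, 3, ... per replica, so each task gets its
# own volume (mysql-data-1, mysql-data-2, ...) backed by its own NFS sub-path.
docker service create --name mysql --replicas 3 \
  --mount 'type=volume,source=mysql-data-{{.Task.Slot}},target=/var/lib/mysql,volume-driver=local,volume-opt=type=nfs,volume-opt=o=addr=nfs.example.local,volume-opt=device=:/exports/mysql-{{.Task.Slot}}' \
  -e MYSQL_ROOT_PASSWORD=changeme \
  mysql:8.0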
Also with swarm you will want storage that is reachable from all nodes, as a container can die and be re-spawned on a different node. So you will either need a remote share based on NFS or CIFS (see example usages for NFS and CIFS), a storage cluster like Ceph or GlusterFS, or cloud native storage like Portworx. While you have to take care of HA yourself for remote-share solutions, data replication is built in for storage clusters and cloud native storage.
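For the CIFS case, a named volume in a stack file can look roughly like this (server, share and credentials are placeholders; every node that runs a task mounts the same share):

version: '3'
services:
  app:
    image: ubuntu                       # placeholder service using the share
    command: sleep infinity
    volumes:
      - app-data:/data
volumes:
  app-data:
    driver: local                       # local driver mounting a CIFS/SMB share
    driver_opts:
      type: cifs
      device: "//files.example.local/appdata"
      o: "addr=files.example.local,username=svc_docker,password=CHANGEME,vers=3.0"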
If a containerized service is itself cluster/replica aware, it is usually better not to use the swarm replica mechanism, unless all instances can be started with the same set of parameters.

Same data volume attached to multiple containers, even on different hosts

I'm able to bind a docker volume to a specific container in a swarm thanks to Flocker, but now I want to run multiple replicas of my server (to do load balancing), so I'm searching for a way to bind the same data volume to multiple replicas of a docker service.
In the Flocker documentation I have found that:
Can more than one container access the same volume? Flocker works by
creating a 1 to 1 relationship of a volume and a container. This means
you can have multiple volumes for one container, and those volumes
will always follow that container.
Flocker attaches volumes to the individual agent host (docker host)
and this can only be one host at a time because Flocker attaches
Block-based storage. Nodes on different hosts cannot access the same
volume, because it can only be attached to one node at a time.
If multiple containers on the same host want to use the same volume,
they can, but be careful because multiple containers accessing the
same storage volume can cause corruption.
Can I attach a single volume to multiple hosts? Not currently, support
from multi-attach backends like GCE in Read Only mode, or NFS-like
backends like storage, or distributed filesystems like GlusterFS would
need to be integrated. Flocker focuses mainly on block-storage uses
cases that attach a volume to a single node at a time.
So I think it is not possible to do what I want with Flocker.
I could use a different orchestrator (k8s) if that would help me, even if I have no experience with it.
I would not use NAS/NFS or any distributed filesystem.
Any suggestions?
Thanks in advance.
In k8s, you can mount a volume into different Pods at the same time if the technology that backs the volume supports shared access (a sketch of such a setup follows the lists below).
As mentioned in Kubernetes Persistent Volumes:
Access Modes
A PersistentVolume can be mounted on a host in any way
supported by the resource provider. As shown below, providers will
have different capabilities and each PV’s access modes are set to the
specific modes supported by that particular volume. For example, NFS
can support multiple read/write clients, but a specific NFS PV might
be exported on the server as read-only. Each PV gets its own set of
access modes describing that specific PV’s capabilities.
The access modes are:
ReadWriteOnce – the volume can be mounted as read-write by a single node
ReadOnlyMany – the volume can be mounted read-only by many nodes
ReadWriteMany – the volume can be mounted as read-write by many nodes
Types of volumes that support ReadOnlyMany mode:
AzureFile
CephFS
FC
FlexVolume
GCEPersistentDisk
Glusterfs
iSCSI
Quobyte
NFS
RBD
ScaleIO
Types of volumes that support ReadWriteMany mode:
AzureFile
CephFS
Glusterfs
Quobyte
RBD
PortworxVolume
VsphereVolume (works when pods are collocated)
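As a concrete sketch of a shared-access setup, an NFS-backed PersistentVolume plus a claim requesting ReadWriteMany could look like this (server, path and size are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany              # NFS supports many read-write clients
  nfs:
    server: nfs.example.local    # placeholder NFS server
    path: /exports/shared-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  storageClassName: ""           # bind to the statically provisioned PV above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Any number of Pods can then reference shared-data-pvc in their volumes section and mount it simultaneously.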

Persistent storage for WebDAV server on docker swarm?

How can I achieve a persistent storage for a WebDAV server running on several/any swarm nodes?
It's part of a docker-compose app running on my own vSphere infrastructure.
I was thinking about mounting an external NFS share from inside the containers (at the OS level, not docker volumes), but then how would that be better than having WebDAV outside the swarm cluster?
I can think of 2 options:
Glusterfs
This option is vSphere independent. You can create replicated bricks and store your volumes on them, exposing the same volume to multiple docker hosts. So in case of a node failure the container gets restarted on another node and still has its persistent storage with it. You can also mount the persistent data in multiple containers.
There is one catch: the same disk space will be consumed on multiple nodes.
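A common way to consume such a replicated GlusterFS volume from a swarm stack is to mount it at the same path on every node and bind-mount that path into the service; the image, mount point and Gluster volume name below are placeholders:

version: '3'
services:
  webdav:
    image: bytemark/webdav               # placeholder WebDAV image
    volumes:
      # /mnt/gluster/webdav must be the GlusterFS mount point on every node,
      # e.g. via an fstab entry like:
      #   gluster1:/webdav-vol  /mnt/gluster/webdav  glusterfs  defaults,_netdev  0 0
      - /mnt/gluster/webdav:/var/lib/dav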
Docker-Volume-vSphere
This option requires vSphere hosts. You can create docker volumes on VMFS datastores; they will be shared between docker hosts (virtual machines). So in case of a failure the container restarts on another node and still has its persistent data available. Multiple containers can share a single volume.
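Assuming the vSphere Docker Volume Service plugin is installed on the swarm VMs under the alias vsphere, a stack file could declare such a volume roughly like this (volume name, size and image are placeholders):

version: '3'
services:
  webdav:
    image: bytemark/webdav               # placeholder WebDAV image
    volumes:
      - dav-data:/var/lib/dav
volumes:
  dav-data:
    driver: vsphere                      # vSphere Docker Volume Service plugin
    driver_opts:
      size: 10gb                         # created on a VMFS datastore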

How to delete volumes in swarm cluster?

I have a swarm cluster with one manager and one normal node. When I create a swarm service I create it with a mount type, mount source and mount target. It creates a volume with the same name on both the manager and the node, starts the container, and my service is up.
When I remove the service, the volume created along with it is not deleted; this is still fine.
The problem I am facing is that when I delete the volume via the same endpoint, it only deletes the volume on the swarm manager; the volume created on the node while creating the service still exists.
I want the manager to delete all the volumes that were created along with the swarm service. Is there a way?
After much analysis, here is the situation:
If you instruct swarm to create a service with a volume, swarm only takes care of creating the service across the cluster, i.e. on the multiple nodes. When you pass the volume details it does create the volume as well, but when you remove the service it fails to check the worker nodes for the existence of that volume. It is a bug in Docker.
I have raised a bug against Docker for it.
As of now there is no other way than manually removing the volume from the worker nodes after removing the swarm service.
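Until that is fixed, the cleanup is a manual pass over the nodes, along these lines (node names and the volume name are placeholders):

# Run after removing the service, once per node that ran a task of it.
for node in worker1 worker2; do
  ssh "$node" docker volume rm my-service-volume
done

# Or, more bluntly, remove every unused volume on a node:
ssh worker1 docker volume prune -f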
As far as I know a volume is only created on nodes where a container is created. Is it possible that your service fails to start on one node, ends up on the other, and somehow swarm doesn't clean up? If that's the case, file an issue on GitHub.
Update (from comments):
According to the docker service create documentation:
A named volume is a mechanism for decoupling persistent data needed by your container from the image used to create the container and from the host machine. Named volumes are created and managed by Docker, and a named volume persists even when no container is currently using it. Data in named volumes can be shared between a container and the host machine, as well as between multiple containers. Docker uses a volume driver to create, manage, and mount volumes. You can back up or restore volumes using Docker commands.
So if you are using named volumes, the better question would be: why are they removed on the manager, and were they ever created there in the first place?

Docker swarm NFS volumes

I am playing with Docker 1.12 swarm mode with orchestration! But there is one issue I am not able to find an answer to:
In this case, if you're running a service like nginx or redis you don't worry about data persistence. But if you're running a service like a database, you need data persistence; if something happens to your docker instance, the manager will shuttle it to one of the available nodes, but by default docker doesn't move data volumes to other nodes. To address this problem we can use third-party plugins like Flocker (https://github.com/ClusterHQ/flocker) or Rexray (https://github.com/emccode/rexray).
But the problem with this is that when one node fails you lose the data; Flocker or Rexray does not deal with this.
We can solve this if we use something like NFS: I mount the same volume across my nodes, so in this case we don't have to move the data between two nodes. If one of the nodes fails, the replacement needs to remember the docker mount location. Can we do this? If so, can we achieve it with docker swarm's built-in orchestration?
With Rexray, the data is stored outside the docker swarm nodes (in Amazon S3, OpenStack Cinder, ...). So if you lose a node, you won't lose your persistent data. If your scheduler starts a new container that needs the data on another host, it will attach the external volume via the rexray plugin and you're good to go.
Note: your external provider needs to allow you to perform forced detach of the volume from the now unavailable old nodes.
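In REX-Ray that forced detach corresponds to the preemption option; a minimal sketch of the relevant part of /etc/rexray/config.yml (assuming a storage driver that supports preemption, which, as the accepted answer above notes, Ceph RBD does not):

libstorage:
  integration:
    volume:
      operations:
        mount:
          preempt: true   # let a new host forcefully detach the volume from a dead one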
