Persistent storage solution for Docker on AWS EC2

Persistent storage solution for Docker on AWS EC2 - docker

I want to deploy a node-red server on my AWS EC2 cluster. I got the docker image up and running without problems. Node-red stores the user flows in a folder named /data. Now when the container is destroyed the data is lost. I have red about several solutions where you can mount a local folder into a volume. What is a good way to deal with persistent data in AWS EC2?
My initial thoughts are to use a S3 volume or mount a volume in the task definition.

It is possible to use a volume driver plugin with docker that supports mapping EBS volumes.
Flocker was one of the first volume managers, it supports EBS and has evolved to support a lot of different back ends.
Cloudstor is Dockers volume plugin (It comes with Docker for AWS/Azure).
Blocker is an EBS only volume driver.
S3 doesn't work well for all file system operations as you can't update a section of an object, so updating 1 byte of a file means you have to write the entire object again. It's also not immediately consistent so a write then read might give you odd/old results.

The EBS volume can only be attached to one instance which means that you can only run your docker containers in one EC2 instance. Assuming that you would like to scale your solution in future with many containers running in ECS cluster then you need to look into EFS. It’s a shared system from AWS. The only issue is performance degradation of EFS over EBS.

The easiest way (and the most common approach) is run your docker with -v /path/to/host_folder:/path/to/container_folder option, so the container will refer to host folder and information will stay after it will be restarted or recreated. Here the detailed information about docker volume system.

I would use AWS EFS. It is like a NAS in that you can have it mounted to multiple instances at the same time.
If you are using ECS for your docker host the following guide may be helpful http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_efs.html

Related

Binding of volume to Docker Container via Kubernetes

I'm new to the area, sorry if my question sounds dumb.
What I'm looking for: I have a containers pod, where one of the containers (alpine based) should read/write from/to the customer's provided file. I don't want to limit customer on how to provide file (or at least to support most common ways).
And file's size might be huge sometimes (not sure if that requirement makes any difference).
The more flexibility here the better.
From the initial search I found there are multiple ways to bind the volume/directory to docker's container:
Docker bind mount - sharing dir between Host and container (nice to have)
Add a docker volume to the pod (must have)
Mount AWS S3 bucket to docker's container (must have)
any other ways of supplying file access to the container? Let's say from the remote machine via sftp access?
But main question - is it all possible to configure via Kubernetes?
Ideally in the same yaml file that starts the containers?
Any hints/examples are very welcome!

It surely is possible!
Like there are volume mount for a docker container, there are volume mounts in Kubernetes as well.
This is achieved using Persistent Volume Claim, PVC. These are Pod lifecycle independent storage classes to store the data in the volume mount.
Understand more about the concept here: https://kubernetes.io/docs/concepts/storage/persistent-volumes/

Is it practical to mount a block storage to multiple VPS running Docker Swarm for shared storage?

Looking at multiple options to implement a shared storage for a Docker Swarm, I can see most of them require a special Docker plugin:
sshFs
CephFS
glusterFS
S3
and others
... but one thing that is not mentioned anywhere is just mounting a typical block storage to all VPS nodes running the Docker Swarm. Is this option impractical and thus not mentioned on the Internet? Am I missing something?
My idea is as follows:
Create a typical Block Storage (like e.g. one offered by DigitalOcean or Vultr).
Mount it to your VPS filesystem.
Mount a folder from that Block Storage as a volume in the Docker Container / Docker Worker with using a "local" driver.
Sounds the simplest and most obvious to me. Why people are using more complicated setups like sshFs, CephFS etc? And most importantly, is the implementation I described viable, and if so, what are the drawbacks of doing it this way?

The principal advantage of using a volume plugin over mounted storage comes down to the ability to create storage volumes dynamically, and associate them with namespaces.
i.e. with docker managing storage for a volume via a "volumes: data:" directive in a compose file, a volume will be created for each named stack that is deployed.
Using the local driver and mounts, you the swarm admin now need to ensure that no two stacks are trying to use /mnt/data.
Once you pass that hurdle, some platforms have limitations to the number of hosts a block storage can be mounted on to.
Theres also the security angle to consider - with the volume mapped like that a compromose to any service on any host can potentially expose all your data to an attacker, where a volume plugin will expose just exactly the data mounted on that container.
All that said - docker swarm is awesome and the current plugin space is lacking - if mounting block storage is what it takes to get a workable storage solution I say do it. Hopefully the CSI support will be ready before year end however.

Share volume in docker swarm for many nodes

I'm facing a big challenge. Trying run my app on 2 VPS in docker swarm. Containers that use volumes should use shared volume between nodes.
My solution is:
Use plugin glusterFS and mount volume on every node using nfs. NFS generate single point of failure so when something go wrong my data are gone. (it's not look good maybe im wrong)
Use Azure Storage - store data as blob ( Azure Data Lake Storage Gen2 ). But my main is problem how can i connect to azure storage using docker-compose.yaml? I should declarate volume in every service that use volume and declare volume in volume section. I don't have idea how to do that.
Docker documentation about it is gone. Should be here https://docs.docker.com/docker-for-azure/persistent-data-volumes/.
Another option is use https://hub.docker.com/r/docker4x/cloudstor/tags?page=1&ordering=last_updated but last update was 2 years ago so its probably not supported anymore.
Do i have any other options and which share volume between nodes is best solution?

There are a number of ways of dealing with creating persistent volumes in docker swarm, none of them particularly satisfactory:
First up, a simple way is to use nfs, glusterfs, iscsi, or vmware to multimount the same SAN storage volume onto each docker swarm node. Services just mount volumes as /mnt/volumes/my-sql-workload
On the one hand its really simple, on the other hand there is literally no access control and you can easilly accidentally load services pointing at each others data.
Next, commercial docker volume plugins for SANs. If you are lucky and possess a Pure Storage, NetApp or other such SAN array, some of them still offer docker volume plugins. Trident for example if you have a NetApp.
Third. if you are in the cloud, the legacy swarm offerings on Azure and Aws included a built in "cloudstor" volume driver but you need to dig really deep to find it in their legacy offering.
Four, there are a number of opensource or free volume plugins that will mount volumes from nfs, glusterfs or other sources. But most are abandoned or very quiet. The most active I know of is marcelo-ochoa/docker-volume-plugins
I wasn't particularly happy with how those plugins mounted pre-existing volumes, but made operations like docker volume create hard, so I made my own, but really
Swarm Cluster Volume Support with CSI Plugins is hopefully going to drop in 2021¹. Which hopefully is a solid rebuttal to all the problems above.
¹Its now 2022 and the next version of Docker has not yet gone live with CSI support. Still we wait.

In my opinion, a good solution could be to create a GlusterFS cluster, configure a single volume and mount it in every Docker Swarm node (i.e. in /mnt/swarm-storage).
Then, for every Container that needs persistent storage, bind-mount a subdirectory of the GlusterFS volume inside the container.
Example:
services:
my-container:
...
volumes:
- type: bind
source: /mnt/swarm-storage/my-container
target: /a/path/inside/the/container
This way, every node shares the same storage, so that a given container could be instantiated indifferently on every cluster node.
You don't need any Docker plugin for a particular storage driver, because the distributed storage is transparent to the Swarm cluster.
Lastly, GlusterFS is a distributed filesystem, designed to not have a single point of failure and you can cluster it on as many node you like (contrary to NFS).

Docker volumes vs nfs

I would like to know if it is logical to use a redundant NFS/GFS share for webcontent instead of using docker volumes?
I'm trying to build a HA docker environment with the least amount of additional tooling. I would like to stick to 3 servers, each a docker swarm node.
Currently I'm looking into storage: an NFS/GFS filesystem cluster would require additional tooling for a small environment (100gb max storage). I would like to only use native docker supported configurations. So I would prefer to use volumes and share those across containers. However, those volumes are, for as far as I know, not synchronized to other swarm nodes by default.. so if the swarm node that hosts the data volume goes down it will be unavailable for each container across the swarm..

A few things, that together, should answer your question:
Volumes use a driver, and the default driver in Docker run and Swarm services is the built-in "local" driver which only supports file paths that are mounted on that host. For using shared storage with Swarm services, you'll want a 3rd party plugin driver, like REX-Ray. An official list is here: store.docker.com
What you want to look for in a volume driver is one that's "docker swarm aware" that will re-attach volumes to a new task created if old Swarm service task is killed/updated. Tools like REX-Ray are almost like a "persistent data orchestrator" that ensures volumes are attached to the proper node where they are needed.
I'm not sure what web content you're talking about, but if it's code or templates, it should be built into the image. If you're talking about user uploaded content that needs to be backed up, then yep a volume sounds like the right way.

Docker swarm NFS volumes,

I am playing with docker's 1.12 swarm with Orchestration! But there is one issue I am not able to find an answer to:
In this case if you're running a service like nginx or redis you don't worry about the data persistence,
But if you're running a service like a database we need data persistance so if something happens to your docker instance the master will shuttle the docker instance to one of the available nodes, by default docker doesn't move data volumes to other nodes to address this problem. We can use third party plugins like Flocker (https://github.com/ClusterHQ/flocker), Rexray ("https://github.com/emccode/rexray") to solve the issue.
But the problem with this is: when one node fails you lose the data. Flocker or Rexray does not deal with this.
We can solve this if we use something like NFS. I mount the same volume to across my nodes in this case we don't have to move the data between two nodes. If one of the nodes fail its need to remember the docker mount location, can we do this? If so can we achieve this with docker Swarm Built-In Orchestration!

Using Rexray, then the data is stored outside the docker swarm nodes (in Amazon S3, Openstack Cinder, ...). So If you loose a node, you won't loose your persistent data. If your scheduler mounts a new container which needs the data on another host, it will retrieve the external volume using rexray plugin and you're ok to go.
Note: your external provider needs to allow you to perform forced detach of the volume from the now unavailable old nodes.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart