How to use stateful services (databases) in Rancher?

I need to use a Postgres database in a Rancher stack (Cattle).
I made an environment for my application that contains api, frontend and database services (in one stack). I want to use multiple hosts for this environment, but if I add more hosts the database can't work properly, because the volume with its data will only exist on one host.
I think I cannot use NFS (or some other network storage) for databases because of IO speed. Am I right? Is there any workflow for using databases in Rancher?
I thought I could bind a service to only one host of the environment, but I didn't find this setting.

Rancher has Storage Services to manage Volumes.
NFS (rancher-nfs) will not work very well for a read/write Postgres database. If you're running on AWS, Rancher also supports EBS volumes via Convoy.
For vanilla Docker, local is the only volume type that will work in most places, but these are obviously not shared volumes; a local volume lives on a single Docker host. If you create an environment-scoped local volume for the database, then the database container will always be scheduled to the node holding the storage.
Other shared-storage volume types require both the storage backend and a Docker volume plugin that can manage the movement of external storage. REX-Ray supports a number of storage providers, and so does Flocker.
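As a concrete illustration of the last two points, a Cattle stack can pin the database to a labelled host and keep its data in a local volume. A minimal docker-compose.yml sketch, assuming Rancher 1.x (the db=true host label, the image tag and the password are placeholders; io.rancher.scheduler.affinity:host_label is Cattle's host-affinity label):

version: '2'
services:
  postgres:
    image: postgres:9.6
    environment:
      POSTGRES_PASSWORD: changeme               # placeholder secret
    labels:
      # only schedule this container on hosts carrying the label db=true
      io.rancher.scheduler.affinity:host_label: db=true
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
    driver: local                               # data stays on the host that runs the container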

Related

What are the differences between the concepts of storage driver and volume driver in Docker?

I'm studying Docker and I couldn't quite understand the difference between what storage drivers are used for and what volume drivers are used for.
My theory (please correct me if I'm wrong) is that storage drivers manage the way Docker deals with the writable layer underneath, and can use overlay, overlay2, aufs, zfs, btrfs and so on.
Volume drivers, however, deal with volumes underneath: a volume can be local (in which case I think it will use a storage driver) or remote (like EBS).
Am I right?
Docker uses storage drivers to store image layers and to store data in the writable layer of a container. Docker uses volume drivers for write-intensive data, data that must persist beyond the container's lifespan, and data that must be shared between containers. So I understand storage drivers are used with image and container layers, while volume drivers are used for persistent container application data. See the first three paragraphs of this Docker documentation: https://docs.docker.com/storage/storagedriver/
Docker Engine volumes enable deployments to be integrated with external storage systems such as Amazon EBS, and enable data volumes to persist beyond the lifetime of a single Docker host. Here the term 'local' in the context of Docker volume drivers means the volumes esdata1 and esdata2 are created on the same Docker host where you run your container. By using other volume plugins, e.g. --driver=flocker, you are able to create a volume on an external host and mount it to the local host, say at /data-path.
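To make the distinction concrete, here is a minimal compose sketch declaring both kinds of volumes side by side (the image tag is arbitrary and Flocker is used only because it is mentioned above; any installed volume plugin would be declared the same way):

version: '2'
services:
  elasticsearch:
    image: elasticsearch:5.6
    volumes:
      - esdata1:/usr/share/elasticsearch/data   # local volume: a directory on this Docker host
      - shared:/backups                         # plugin volume: storage managed outside the host
volumes:
  esdata1:
    driver: local        # the default; no external storage system involved
  shared:
    driver: flocker      # requires the Flocker plugin and its storage backend

In both cases the image layers and the container's writable layer are still handled by whatever storage driver (overlay2, aufs, ...) the daemon is configured with.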

Is it practical to mount a block storage to multiple VPS running Docker Swarm for shared storage?

Looking at multiple options to implement a shared storage for a Docker Swarm, I can see most of them require a special Docker plugin:
sshFs
CephFS
glusterFS
S3
and others
... but one thing that is not mentioned anywhere is just mounting a typical block storage volume to all VPS nodes running the Docker Swarm. Is this option impractical and thus not mentioned on the Internet? Am I missing something?
My idea is as follows:
Create a typical Block Storage (like e.g. one offered by DigitalOcean or Vultr).
Mount it to your VPS filesystem.
Mount a folder from that Block Storage as a volume in the Docker container / Docker worker using the "local" driver.
This sounds like the simplest and most obvious approach to me. Why are people using more complicated setups like sshFs, CephFS etc.? And most importantly, is the implementation I described viable, and if so, what are the drawbacks of doing it this way?
The principal advantage of using a volume plugin over mounted storage comes down to the ability to create storage volumes dynamically, and associate them with namespaces.
That is, with Docker managing storage for a volume via a "volumes: data:" directive in a compose file, a volume will be created for each named stack that is deployed.
Using the local driver and mounts, you, the swarm admin, now need to ensure that no two stacks are trying to use /mnt/data.
Once you pass that hurdle, some platforms limit the number of hosts a block storage volume can be attached to.
There's also the security angle to consider: with the volume mapped like that, a compromise of any service on any host can potentially expose all your data to an attacker, whereas a volume plugin will expose just exactly the data mounted into that container.
All that said, Docker Swarm is awesome and the current plugin space is lacking; if mounting block storage is what it takes to get a workable storage solution, I say do it. Hopefully the CSI support will be ready before year end, however.
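To show the trade-off in stack-file terms, a sketch of the two options (the image, paths and names are placeholders; the bind mount assumes you have already mounted the block storage at /mnt/data on every node yourself):

version: '3.7'
services:
  web:
    image: nginx:1.25
    volumes:
      # Option A: the bind-mount approach from the question. Docker just uses the
      # path; you, the admin, must keep other stacks away from /mnt/data/web.
      - /mnt/data/web:/usr/share/nginx/html:ro
      # Option B: a named volume. `docker stack deploy -c stack.yml mystack`
      # creates it as mystack_cache, so each stack gets its own; with a plugin
      # driver it can also follow the task, with local it stays per node.
      - cache:/var/cache/nginx
volumes:
  cache:
    driver: local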

Do replicated Docker containers in swarm mode contain multiple copies of data?

I have recently started learning Docker. However, when studying swarm mode I see that containers can be scaled up. What I would like to know is: once you scale a container in replicated mode, will the data within the container be replicated too, or will just fresh containers be spawned?
For example, let's say I created a mysql service initially with only 1 replica. I create and update tables in that mysql container. Later I scale it to 3; will the newly spawned containers contain the same table data? Also, will the data be continuously replicated across the 3 docker instances?
A replicated service will use a fresh container instance per replica. Swarm does not take care of replicating persistent data stored in volumes.
Depending on the volume plugin (e.g. the local driver with remote nfs shares) you are limited to read-write-once or read-write-many. Even if your volume allows read-write-many, the service replicas might not support it; for instance, mysql will not work if you point n replicas to the same volume. You can leverage swarm service template variables, for instance, to point your volumes to different target folders of the same nfs share.
Also with swarm you will want storage that is reachable from all nodes, as a container can die and be re-spawned on a different node. So you will either need a remote share based on NFS or CIFS (see example usages for nfs and cifs), a storage cluster like Ceph or GlusterFS, or cloud native storage like Portworx. While you have to take care of HA yourself for remote-share solutions, data replication is built in for storage clusters and cloud native storage.
In case a containerized service itself is cluster/replica aware, it is usually better not to use the swarm replica mechanism, unless all instances can be started with the same set of parameters.
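For the NFS case mentioned above, a named volume can be declared through the built-in local driver. A minimal stack-file sketch (the server address, export path and image are placeholders, and the export must already exist and be reachable from every node):

version: '3.7'
services:
  app:
    image: myapp:latest          # placeholder; not a database sharing one data directory
    deploy:
      replicas: 3
    volumes:
      - appdata:/var/lib/app
volumes:
  appdata:
    driver: local                # local driver mounting a remote NFS export on each node
    driver_opts:
      type: nfs
      o: "addr=10.0.0.10,rw,nfsvers=4"   # placeholder NFS server address
      device: ":/exports/appdata"

As the answer notes, every replica sees the same share here; for something like mysql each replica would still need its own volume or its own subfolder of the share.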

Docker volumes vs nfs

I would like to know whether it makes sense to use a redundant NFS/GFS share for web content instead of using docker volumes.
I'm trying to build an HA docker environment with the least amount of additional tooling. I would like to stick to 3 servers, each a docker swarm node.
Currently I'm looking into storage: an NFS/GFS filesystem cluster would require additional tooling for a small environment (100 GB max storage). I would like to only use natively docker-supported configurations, so I would prefer to use volumes and share those across containers. However, as far as I know, those volumes are not synchronized to other swarm nodes by default, so if the swarm node that hosts the data volume goes down, it will be unavailable to every container across the swarm.
A few things that, taken together, should answer your question:
Volumes use a driver, and the default driver in docker run and in Swarm services is the built-in "local" driver, which only supports file paths that are mounted on that host. For shared storage with Swarm services you'll want a 3rd-party plugin driver, like REX-Ray. An official list is here: store.docker.com
What you want to look for in a volume driver is one that is "Docker Swarm aware" and will re-attach volumes to a new task if the old Swarm service task is killed or updated. Tools like REX-Ray are almost like a "persistent data orchestrator" that ensures volumes are attached to the node where they are needed.
I'm not sure what web content you're talking about, but if it's code or templates, it should be built into the image. If you're talking about user-uploaded content that needs to be backed up, then yes, a volume sounds like the right way.
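As an example of that last point, with a swarm-aware plugin such as REX-Ray installed on every node, the uploaded content could live in a plugin-managed volume. A sketch under those assumptions (the plugin name rexray/ebs, the size option and the image are assumptions, not details from the question):

version: '3.7'
services:
  web:
    image: mywebapp:latest       # placeholder image
    volumes:
      - uploads:/var/www/uploads
volumes:
  uploads:
    driver: rexray/ebs           # assumes the rexray/ebs plugin is installed on every node
    driver_opts:
      size: 20                   # requested volume size; option name per the REX-Ray docs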

Persistent storage solution for Docker on AWS EC2

I want to deploy a node-red server on my AWS EC2 cluster. I got the docker image up and running without problems. Node-red stores the user flows in a folder named /data. Now when the container is destroyed, the data is lost. I have read about several solutions where you can mount a local folder into a volume. What is a good way to deal with persistent data on AWS EC2?
My initial thoughts are to use an S3 volume or to mount a volume in the task definition.
It is possible to use a volume driver plugin with docker that supports mapping EBS volumes.
Flocker was one of the first volume managers; it supports EBS and has evolved to support a lot of different back ends.
Cloudstor is Docker's volume plugin (it comes with Docker for AWS/Azure).
Blocker is an EBS-only volume driver.
S3 doesn't work well for all file system operations, as you can't update a section of an object, so updating 1 byte of a file means you have to write the entire object again. It's also not immediately consistent, so a write followed by a read might give you odd/old results.
An EBS volume can only be attached to one instance, which means that you can only run your docker containers on one EC2 instance. Assuming that you would like to scale your solution in future with many containers running in an ECS cluster, then you need to look into EFS. It's a shared file system from AWS. The only issue is the performance degradation of EFS compared to EBS.
The easiest way (and the most common approach) is to run your docker container with the -v /path/to/host_folder:/path/to/container_folder option, so the container will refer to the host folder and the information will survive restarts or recreation. Here is the detailed information about the docker volume system.
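The same idea expressed as a compose file, using Node-RED as in the question (the image name and host path are assumptions; a bind mount like this survives container recreation, but not the loss of the EC2 instance itself):

version: '2'
services:
  node-red:
    image: nodered/node-red              # assumed image name
    ports:
      - "1880:1880"                      # Node-RED's default UI port
    volumes:
      # host folder : container folder, the same mapping as `docker run -v ...`
      - /home/ec2-user/node-red-data:/data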
I would use AWS EFS. It is like a NAS in that you can have it mounted to multiple instances at the same time.
If you are using ECS for your docker host, the following guide may be helpful: http://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_efs.html
