Handling multiple persistant volumes for a Rethinkdb Docker Swarm - docker

I'm currently using RethinkDB across cloud servers by manually joining each server at setup. I'm interested in moving over to a Swarm approach to make scaling and failover easier. The current approach is cumbersome to scale.
In the current manual approach, I simply create a local folder on each server for RDB and mount as a volume to store its data. However, using a Swarm means that I'd need to handle volumes more dynamically. Each container will need a distinct volume to keep data separate in case of errors.
Any recommendations on how to handle this scenario? A lot of the tutorials I've seen so far mention Flocker to manage persistent storage, but I can't see that being handled dynamically.

Currently I am struggling with a situation like this. I've created a temporary fix with GlusterFS.
What you do is install GlusterFS on all the Docker nodes and mount the folders. This way the data exists on all the nodes. But this is less than ideal if you have a lot of writes. This could be slow because of the way Gluster treats your data replication to prevent data loss. It is solid, but I have some issues with the speed.
In your case I would suggest looking into Flocker. Flocker is a volume plugin that migrates your data when a container moves to another host. I haven't had any experience with it, but in my case the concept of Flocker renders useless, I need my data in multiple containers on multiple hosts (Read only) This is where Gluster came into play

Related

What purpose to ephemeral volumes serve in Kubernetes?

I'm starting to learn Kubernetes recently and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services - but they usually end there with some example microservice app built using a combination of those four. When it comes to databases they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");
fs.writeFileSync("test.txt", "blah");
const value = fs.readFileSync("test.txt", "utf8");
console.log(value);
I suspect this would properly output "blah" and not crash due to an inability to write to disk (note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images in my cluster yet -- I've only loaded images already on Docker Hub so far)
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the types of ephemeral volumes highlight two major things:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from, and where it gets mounted, and it hides any content that was in the underlying image. The underlying storage happens to not be persistent if a pod is deleted and recreated, unlike persistent volumes.
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The main need for this concept is driven from microservices and orchestration processes, and also guided by 12 factor app. But why?
Because, one major use case is when you are deploying a number of microservices (and their replicas) using containers across multiple machines in a cluster you don't want a container to be reliant on its own storage. If containers rely on their on storage, shutting them down and starting new ones affects the way your app behaves, and this is something everyone wants to avoid. Everyone wants to be able to start and stop containers whenever they want, because this allows easy scaling, updates, etc.
When you actually need a service with persistent data (like DB) you need a special type of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume, just to be sure that your data will persist even after container is stopped. But if you want to just load balance across hundreds of stateless API services, ephemeral containers is what you actually want.

Docker design: exchange data between containers or put multiple processes in one container?

In a current project I have to perform the following tasks (among others):
capture video frames from five IP cameras and stitch a panorama
run machine learning based object detection on the panorama
stream the panorama so it can be displayed in a UI
Currently, the stitching and the streaming runs in one docker container, and the object detection runs in another, reading the panorama stream as input.
Since I need to increase the input resolution for the the object detector while maintaining the stream resolution for the UI, I have to look for alternative ways of getting the stitched (full resolution) panorama (~10 MB per frame) from the stitcher container to the detector container.
My thoughts regarding potential solutions:
shared volume. Potential downside: One extra write and read per frame might be too slow?
Using a message queue or e.g. redis. Potential downside: yet another component in the architecture.
merging the two containers. Potential downside(s): Not only does it not feel right, but the two containers have completely different base images and dependencies. Plus I'd have to worry about parallelization.
Since I'm not the sharpest knife in the docker drawer, what I'm asking for are tips, experiences and best practices regarding fast data exchange between docker containers.
Usually most communication between Docker containers is over network sockets. This is fine when you're talking to something like a relational database or an HTTP server. It sounds like your application is a little more about sharing files, though, and that's something Docker is a little less good at.
If you only want one copy of each component, or are still actively developing the pipeline: I'd probably not use Docker for this. Since each container has an isolated filesystem and its own user ID space, sharing files can be unexpectedly tricky (every container must agree on numeric user IDs). But if you just run everything on the host, as the same user, pointing at the same directory, this isn't a problem.
If you're trying to scale this in production: I'd add some sort of shared filesystem and a message queueing system like RabbitMQ. For local work this could be a Docker named volume or bind-mounted host directory; cloud storage like Amazon S3 will work fine too. The setup is like this:
Each component knows about the shared storage and connects to RabbitMQ, but is unaware of the other components.
Each component reads a message from a RabbitMQ queue that names a file to process.
The component reads the file and does its work.
When it finishes, the component writes the result file back to the shared storage, and writes its location to a RabbitMQ exchange.
In this setup each component is totally stateless. If you discover that, for example, the machine-learning component of this is slowest, you can run duplicate copies of it. If something breaks, RabbitMQ will remember that a given message hasn't been fully processed (acknowledged); and again because of the isolation you can run that specific component locally to reproduce and fix the issue.
This model also translates well to larger-scale Docker-based cluster-computing systems like Kubernetes.
Running this locally, I would absolutely keep separate concerns in separate containers (especially if individual image-processing and ML tasks are expensive). The setup I propose needs both a message queue (to keep track of the work) and a shared filesystem (because message queues tend to not be optimized for 10+ MB individual messages). You get a choice between Docker named volumes and host bind-mounts as readily available shared storage. Bind mounts are easier to inspect and administer, but on some platforms are legendarily slow. Named volumes I think are reasonably fast, but you can only access them from Docker containers, which means needing to launch more containers to do basic things like backup and pruning.
Alright, Let's unpack this:
IMHO Shared Volume works just fine, but gets way too messy over time. Especially if you're handling Stateful services.
MQ: This seems like a best option in my opinion. Yes, it's another component in your architecture, but it makes sense to have it rather than maintaining messy shared Volumes or handling massive container images (if you manage to combine 2 container images)
Yes, You could potentially do this, but not a good idea. Considering your use case, I'm going to go ahead and make an assumption that you have a massive list of dependencies which could potentially lead to a conflict. Also, lot of dependencies = larger image = Larger attack surface - which from a security perspective is not a good thing.
If you really want to run multiple processes in one container, it's possible. There are multiple ways to achieve that, however I prefer supervisord.
https://docs.docker.com/config/containers/multi-service_container/

Backup-friendly Docker volumes

I want to take a holistic approach backing up multiple machines running multiple Docker containers. Some might run, for example, Postgres databases. I want to back up this system, without having to have specific backup commands for different types of volumes.
It is fine to have a custom external script that sends e.g. signals to containers or runs Docker commands, but I strongly want to avoid anything specific to a certain image or type of image. In the example of Postgres, the documentation suggests running postgres-specific commands to backup databases, which goes against the design goals for the backup solution I am trying to create.
It is OK if I have to impose restrictions on the Docker images, as long as it is reasonably easy to implement by starting from existing Docker images and extending.
Any thoughts on how to solve this?
I just want to stress that I am not looking for a solution for how to back up Postgres databases under Docker, there are already many answers explaining how to do so. I am specifically looking for a way to back up any volume, without having to know what it is or having to run specific commands for its data.
(I considered whether this question belonged on SO or Serverfault, but I believe this is a problem to be solved by developers, hence it belongs here. Happy to move it if consensus is otherwise)
EDIT: To clarify, I want do something similar to what is explained in this question
How to deal with persistent storage (e.g. databases) in docker
but using the approach in the accepted answer is not going to work with Postgres (and I am sure other database containers) according to documentation.
I'm skeptical that there is a custom solution, holistic, multi machine, multi container, application/container agnostic approach. From my point of view there is a lot of orchestration activities necessary in the first place. And I wonder if you wouldn't use something like Kubernetes anyways that - supposedly - comes with its own backup solution.
For single machine, multi container setup I suggest to store your container's data, configuration, and eventual build scripts within one directory tree (e.g. /docker/) and use a standard file based backup program to backup the root directory.
Use docker-compose to managed your containers. This lets you store the configuration and even build options in a file(s). I have an individual compose file for each service, but a single one would also work.
Have a subdirectory for each service. Mount bind-mount directories aka volumes of the container there. If you need to adapt the build process more thoroughly you can easily store scripts, sources, Dockerfiles, etc. in there as well.
Since containers are supposed to be ephemeral, all persistent data should be in bind-mount and therefore in the main docker directory.

Container with app + DB

I've been making some tests with docker and so far I'm wondering why it's considered a good practice to separate the DB and the app in two containers.
Having two containers seems to be cumbersome to manage and I don't really see the value in it.
Whereas I like the idea of having a self sustainable container per app.
One reason is the separation of data storage and application. If you you put both in their own container, you can update them independently. In my experience this is a common process, because usually the application will evolve faster than the underlying database.
It also frees you to run the containers in different places, which might be a constraint in your operations. Or to run multiple containers from the same database image with different applications.
Often it is also a good thing to be able to scale the UI from one instance to multiple instance, all connected to the same database (or cache instance or HTTP backend). This is mentioned briefly in the docker best practices.
I also understand the urge to run multiple processes in one container. That's why so many minimalist init systems/supervisors like s6 came up lately. I prefer this for demos of applications which require a couple things, like an nginx for frontend, a database and maybe a redis instance. But you could also write a basic docker-compose file and run the demo with multiple containers.
It depends on what you consider your "DB", is it the database application or the content.
The latter is easy, the content needs to be persisted outside the lifetime of the application. The convention used to be to have a "data" container, which simplified linking it with the application (e.g. using the Docker Engine create command --volumes-from parameter). With Docker 1.9 there is a new volume API which has superceded the concept of "data" containers. But you should never store your data in the overlay filesystem (if not only for persistence, but for performance).
If you are referring to a database application, you really enter a semi-religious debate with the microservices crowd. Docker is built to run single process. It is built for 12-factor apps. It is built for microservices. It is definitely possible to run more than one process in a container, but with it you have to consider the additional complexity of managing/monitoring these processes (e.g. using an init process like supervisord), dealing with logging, etc.
I've delivered both. If you are managing the container deployment (e.g. you are hosting the app), it is actually less work to use multiple containers. This allows you to use Docker's abstraction layers for networking and persistent storage. It also provides maximum portability as you scale the application (perhaps you may consider using convoy or flocker volume drivers or an overlay network for hosting containers across multiple servers). If you are developing a product for distribution, it is more convenient to deliver a single Docker Repository (with one Image). This minimizes the support costs as you guide customers through deployment.

How to share volumes across multiple hosts in docker engine swarm mode?

Can we share a common/single named volume across multiple hosts in docker engine swarm mode, what's the easiest way to do it ?
If you have an NFS server setup you can use use some nfs folder as a volume from docker compose like this:
volumes:
grafana:
driver: local
driver_opts:
type: nfs
o: addr=192.168.xxx.xx,rw
device: ":/PathOnServer"
In the grand scheme of things
The other answers are definitely correct. If you feel like you're still missing something or are coming to the conclusion that things might never really improve in this space, then you might want to reconsider the use of the typical POSIX-like hierarchical filesystem abstraction. Not all applications really need it (I might go as far as to say that few do). Maybe yours doesn't either.
In defense of filesystems
It is still very common in many circles, but usually these people know their remote/distributed filesystems very well and know how to set them up and leverage them properly (and they might be very good systems too, though often not with existing Docker volume drivers). Sometimes it's also in part because they're simply forced to (codebases that can't or shouldn't be rewritten to support other storage backends). Using, configuring or even writing arbitrary Docker volume drivers would be a secondary concern only.
Alternatives
If you have the option however, then evaluate other persistence solutions for your applications. Many implementations won't use POSIX filesystem interfaces but network interfaces instead, which pose no particular infrastructure-level difficulties in clusters such as Docker Swarm.
Solutions managed by third-parties (e.g. cloud providers)
Should you succeed in removing all dependencies to filesystems for persistent and shared data (it's still fine for transient local state), then you might claim to have fully "stateless" applications. Of course there is often always state persisted somewhere still, but the idea is that you don't handle it yourself. Many cloud providers (if that's where you're hosting things) will offer fully managed solutions for handling persistent state such that you don't have to care about it at all. If you're going this route, do consider managed services that use APIs compatible with implementations that you can use locally for testing (for example by running a Docker container based on an image for that implementation that is provided by a third-party or that you can maintain yourself).
DIY solutions
If you do want to manage persistent state yourself within a Docker Swarm cluster, then the filesystem abstraction is often inevitable (and you'd probably have more difficulties targeting block devices directly anyway). You'll want to play with node and service constraints to ensure the requirements of whatever you use to persist data are fulfilled. For certain things like a central DBMS server it could be easy ("always run the task on that specific node only"), for others it could be way more involved.
The task of setting up, scaling and monitoring such a setup is definitely not trivial, which is why many application developers are happy to let somebody else (e.g. cloud providers) do it. It's still a very cool space to explore however, though given you had to ask that question it's likely not something you should focus on if you're on a deadline.
Conclusion
As always, use the right abstraction for the job, and pause to think about what your strengths are and where to spend your resources.
From scratch, Docker does not support this by itself. You must use additional components either a docker plugin which would provide you with a new layer type for your volumes, or a sync tool directly on your FS which will sync the data for you.
From my point of view, the easiest solution is rsync or more accurately lsyncdn the daemon version of rsync. But I never tried it for docker volumes, so I can't tell if it handle it fine.
Other solutions are offered using Infinit.sh. It basically does the same thing as lsyncd does. It's a one way sync. So if your docker container are RW in their volumes it won't match your expectations. I tried this solution, and it works pretty well for RO operations. And not in production. It's still an alpha version. Infinit is also on the way to provide a docker driver. Not released yet. So I didn't even tried it. Too risky.
Other solutions I found but was unable to install (and so to try) are flocker and glusterFS. Both are designed to create FS Volume based on several HDD from several machines. But none of their repositories were working these past weeks.
Sorry for giving you only weak solutions, but I'm facing the same problem and haven't find yet a perfect solution.
Cheers,
Olivier

Resources