Docker design: exchange data between containers or put multiple processes in one container?

In a current project I have to perform the following tasks (among others):
capture video frames from five IP cameras and stitch a panorama
run machine learning based object detection on the panorama
stream the panorama so it can be displayed in a UI
Currently, the stitching and the streaming run in one Docker container, and the object detection runs in another, reading the panorama stream as input.
Since I need to increase the input resolution for the object detector while maintaining the stream resolution for the UI, I have to look for alternative ways of getting the stitched (full resolution) panorama (~10 MB per frame) from the stitcher container to the detector container.
My thoughts regarding potential solutions:
1. Shared volume. Potential downside: one extra write and read per frame might be too slow?
2. A message queue or e.g. Redis. Potential downside: yet another component in the architecture.
3. Merging the two containers. Potential downside(s): not only does it not feel right, but the two containers have completely different base images and dependencies. Plus I'd have to worry about parallelization.
Since I'm not the sharpest knife in the docker drawer, what I'm asking for are tips, experiences and best practices regarding fast data exchange between docker containers.

Usually most communication between Docker containers is over network sockets. This is fine when you're talking to something like a relational database or an HTTP server. It sounds like your application is a little more about sharing files, though, and that's something Docker is a little less good at.
If you only want one copy of each component, or are still actively developing the pipeline: I'd probably not use Docker for this. Since each container has an isolated filesystem and its own user ID space, sharing files can be unexpectedly tricky (every container must agree on numeric user IDs). But if you just run everything on the host, as the same user, pointing at the same directory, this isn't a problem.
If you're trying to scale this in production: I'd add some sort of shared filesystem and a message queueing system like RabbitMQ. For local work this could be a Docker named volume or bind-mounted host directory; cloud storage like Amazon S3 will work fine too. The setup is like this:
Each component knows about the shared storage and connects to RabbitMQ, but is unaware of the other components.
Each component reads a message from a RabbitMQ queue that names a file to process.
The component reads the file and does its work.
When it finishes, the component writes the result file back to the shared storage, and writes its location to a RabbitMQ exchange.
In this setup each component is totally stateless. If you discover that, for example, the machine-learning component of this is slowest, you can run duplicate copies of it. If something breaks, RabbitMQ will remember that a given message hasn't been fully processed (acknowledged); and again because of the isolation you can run that specific component locally to reproduce and fix the issue.
This model also translates well to larger-scale Docker-based cluster-computing systems like Kubernetes.
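The worker loop described above can be sketched in Python. This is a minimal, single-process illustration rather than a RabbitMQ client: the standard library's queue.Queue stands in for the input queue and the output exchange, a temporary directory stands in for the shared storage, and run_worker, process_frame, demo_worker and the "frame.bin" file name are all hypothetical.

```python
import queue
import tempfile
from pathlib import Path


def process_frame(data: bytes) -> bytes:
    """Placeholder for the real work (e.g. object detection); it only reverses the bytes."""
    return data[::-1]


def run_worker(shared_dir: Path, inbox: queue.Queue, outbox: queue.Queue) -> int:
    """Handle messages until the inbox is empty; return the number processed.

    Each message is a file name relative to shared_dir. The worker keeps no
    state between messages, so several copies could run side by side.
    """
    handled = 0
    while True:
        try:
            name = inbox.get_nowait()  # the message names a file to process
        except queue.Empty:
            return handled
        result = process_frame((shared_dir / name).read_bytes())
        out_name = f"{name}.result"
        (shared_dir / out_name).write_bytes(result)  # result goes back to shared storage
        outbox.put(out_name)  # announce where the result lives
        inbox.task_done()  # stand-in for a broker acknowledgement
        handled += 1


def demo_worker() -> list[str]:
    """Push one dummy frame through the worker using a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        (shared / "frame.bin").write_bytes(b"abc")
        inbox, outbox = queue.Queue(), queue.Queue()
        inbox.put("frame.bin")
        run_worker(shared, inbox, outbox)
        out_name = outbox.get_nowait()
        return [out_name, (shared / out_name).read_bytes().decode()]
```

In a real deployment the two queues would be replaced by a broker connection, and the acknowledgement would only be sent after the result file has been written, so that an interrupted message is redelivered.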
Running this locally, I would absolutely keep separate concerns in separate containers (especially if individual image-processing and ML tasks are expensive). The setup I propose needs both a message queue (to keep track of the work) and a shared filesystem (because message queues tend to not be optimized for 10+ MB individual messages). You get a choice between Docker named volumes and host bind-mounts as readily available shared storage. Bind mounts are easier to inspect and administer, but on some platforms are legendarily slow. Named volumes I think are reasonably fast, but you can only access them from Docker containers, which means needing to launch more containers to do basic things like backup and pruning.
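One practical detail when passing frames through a shared directory is that the reader should never see a half-written file. A common approach, sketched below in Python, is to write under a temporary name and then rename the file into place; on POSIX systems os.replace is atomic when both names are on the same filesystem. The helper names and the 10 MB dummy payload are assumptions for illustration, and the timing it reports depends entirely on the volume type and platform.

```python
import os
import tempfile
import time
from pathlib import Path


def publish_frame(shared_dir: Path, name: str, payload: bytes) -> Path:
    """Write payload under a temporary name, then rename it into place."""
    final_path = shared_dir / name
    tmp_path = shared_dir / f".{name}.tmp"
    with open(tmp_path, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # push the data to disk before the rename
    os.replace(tmp_path, final_path)  # readers see either no file or the whole file
    return final_path


def time_round_trip(shared_dir: Path, size_bytes: int = 10 * 1024 * 1024) -> tuple[float, int]:
    """Write and read back one dummy frame; return (seconds, bytes read)."""
    payload = bytes(size_bytes)  # zero-filled stand-in for a stitched panorama
    start = time.perf_counter()
    path = publish_frame(shared_dir, "frame.raw", payload)
    data = path.read_bytes()
    return time.perf_counter() - start, len(data)


def demo_round_trip() -> tuple[float, int]:
    """Run the measurement against a throwaway temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return time_round_trip(Path(tmp))
```

Pointing time_round_trip at the mounted volume or bind mount inside a container gives a rough per-frame figure. The read will usually be served from the page cache, so the number is best treated as a lower bound.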

Alright, let's unpack this:
1. Shared volume: IMHO this works just fine, but gets way too messy over time, especially if you're handling stateful services.
2. Message queue: this seems like the best option in my opinion. Yes, it's another component in your architecture, but it makes more sense to have it than to maintain messy shared volumes or handle massive container images (if you manage to combine the two container images).
3. Merging the containers: yes, you could potentially do this, but it's not a good idea. Considering your use case, I'm going to assume that you have a massive list of dependencies, which could potentially lead to conflicts. Also, more dependencies mean a larger image and a larger attack surface, which is not a good thing from a security perspective.
If you really want to run multiple processes in one container, it's possible. There are multiple ways to achieve that; I prefer supervisord.
https://docs.docker.com/config/containers/multi-service_container/

Related

Why might an image run differently in Kubernetes than in Docker?

I'm experiencing an issue where an image I'm running as part of a Kubernetes deployment is behaving differently from the expected and consistent behavior of the same image run with docker run <...>. My understanding of the main purpose of containerizing a project is that it will always run the same way, regardless of the host environment (ignoring the influence of the user and of outside data). Is this wrong?
Without going into too much detail about my specific problem (since I feel the solution may likely be far too specific to be of help to anyone else on SO, and because I've already detailed it here), I'm curious if someone can detail possible reasons to look into as to why an image might run differently in a Kubernetes environment than locally through Docker.
The general answer of why they're different is resources, but the real answer is that they should both be identical given identical resources.
Kubernetes uses docker for its container runtime, at least in most cases I've seen. There are some other runtimes (cri-o and rkt) that are less widely adopted, so using those may also contribute to variance in how things work.
On your local Docker it's pretty easy to mount things like directories (volumes) into the container, and you can populate the directory with some content. Doing the same thing on k8s is more difficult, and probably involves more complicated mappings, persistent volumes or an init container.
Running docker on your laptop and k8s on a server somewhere may give you different hardware resources:
different amounts of RAM
different size of hard disk
different processor features
different core counts
The last one is most likely what you're seeing: Flask is probably looking up the core count on both systems and seeing two different values, so it runs with two different thread / worker counts.
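To check whether the two environments really report different core counts, a small script like the following can be run in both. It is only a sketch: the "2 x cores + 1" rule is an assumed example of how a server might size its worker pool, not something taken from the question, and os.sched_getaffinity is only available on some platforms (Linux among them).

```python
import os


def visible_cpus() -> int:
    """CPUs reported for the machine as a whole (at least 1)."""
    return os.cpu_count() or 1


def usable_cpus() -> int:
    """CPUs this process may be scheduled on, where the OS exposes that."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return visible_cpus()


def example_worker_count(cpus: int) -> int:
    """Hypothetical sizing rule: two workers per core plus one."""
    return 2 * cpus + 1


def report() -> str:
    """One-line summary to compare between the Docker and Kubernetes runs."""
    return (
        f"visible={visible_cpus()} usable={usable_cpus()} "
        f"workers={example_worker_count(usable_cpus())}"
    )
```

Printing report() in both places makes a difference visible at a glance. A CPU limit enforced through a cgroup quota is not reflected in either number, so the values can match even when the pod is throttled.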

Backup-friendly Docker volumes

I want to take a holistic approach backing up multiple machines running multiple Docker containers. Some might run, for example, Postgres databases. I want to back up this system, without having to have specific backup commands for different types of volumes.
It is fine to have a custom external script that sends e.g. signals to containers or runs Docker commands, but I strongly want to avoid anything specific to a certain image or type of image. In the example of Postgres, the documentation suggests running Postgres-specific commands to back up databases, which goes against the design goals for the backup solution I am trying to create.
It is OK if I have to impose restrictions on the Docker images, as long as it is reasonably easy to implement by starting from existing Docker images and extending.
Any thoughts on how to solve this?
I just want to stress that I am not looking for a solution for how to back up Postgres databases under Docker, there are already many answers explaining how to do so. I am specifically looking for a way to back up any volume, without having to know what it is or having to run specific commands for its data.
(I considered whether this question belonged on SO or Serverfault, but I believe this is a problem to be solved by developers, hence it belongs here. Happy to move it if consensus is otherwise)
EDIT: To clarify, I want to do something similar to what is explained in the question "How to deal with persistent storage (e.g. databases) in docker", but according to the documentation the approach in the accepted answer is not going to work with Postgres (and, I am sure, other database containers).
I'm skeptical that there is a holistic, multi-machine, multi-container, application/container-agnostic approach. From my point of view a lot of orchestration is necessary in the first place. And I wonder whether you wouldn't use something like Kubernetes anyway, which supposedly comes with its own backup solution.
For a single-machine, multi-container setup I suggest storing your containers' data, configuration, and any build scripts within one directory tree (e.g. /docker/) and using a standard file-based backup program to back up that root directory.
Use docker-compose to manage your containers. This lets you store the configuration and even build options in one or more files. I have an individual compose file for each service, but a single one would also work.
Have a subdirectory for each service. Bind-mount the container's volumes there. If you need to adapt the build process more thoroughly you can easily store scripts, sources, Dockerfiles, etc. in there as well.
Since containers are supposed to be ephemeral, all persistent data should live in bind mounts and therefore in the main docker directory.
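With everything under one directory tree, the backup itself needs no knowledge of the individual services. As an illustration only, the Python sketch below archives such a tree with the standard library's tarfile module. The function names and the tiny example tree are hypothetical, and the sketch does not address the consistency of files that a running database is writing to while the archive is created.

```python
import tarfile
import tempfile
from pathlib import Path


def backup_tree(root: Path, archive: Path) -> list[str]:
    """Archive everything under root into a gzip-compressed tarball.

    Returns the sorted member names so the caller can check what was stored.
    """
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps archive paths relative to the tree's own directory name
        tar.add(root, arcname=root.name)
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


def demo_backup() -> list[str]:
    """Build a tiny two-service tree in a temp directory and back it up."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "docker"
        (root / "web").mkdir(parents=True)
        (root / "db" / "data").mkdir(parents=True)
        (root / "web" / "docker-compose.yml").write_text("services: {}\n")
        (root / "db" / "data" / "dump.bin").write_bytes(b"\x00" * 16)
        return backup_tree(root, Path(tmp) / "backup.tar.gz")
```

The archive is written outside the tree being backed up so that it does not end up inside itself; any standard file-based backup tool could take the place of backup_tree.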

Container with app + DB

I've been making some tests with docker and so far I'm wondering why it's considered a good practice to separate the DB and the app in two containers.
Having two containers seems to be cumbersome to manage and I don't really see the value in it.
Whereas I like the idea of having a self sustainable container per app.
One reason is the separation of data storage and application. If you put both in their own container, you can update them independently. In my experience this is a common process, because usually the application will evolve faster than the underlying database.
It also frees you to run the containers in different places, which might be a constraint in your operations. Or to run multiple containers from the same database image with different applications.
Often it is also a good thing to be able to scale the UI from one instance to multiple instances, all connected to the same database (or cache instance or HTTP backend). This is mentioned briefly in the Docker best practices.
I also understand the urge to run multiple processes in one container. That's why so many minimalist init systems/supervisors like s6 came up lately. I prefer this for demos of applications which require a couple things, like an nginx for frontend, a database and maybe a redis instance. But you could also write a basic docker-compose file and run the demo with multiple containers.
It depends on what you consider your "DB": is it the database application or the content?
The latter is easy: the content needs to be persisted beyond the lifetime of the application. The convention used to be to have a "data" container, which simplified linking it with the application (e.g. using the --volumes-from parameter of the Docker Engine create command). With Docker 1.9 there is a new volume API which has superseded the concept of "data" containers. But you should never store your data in the overlay filesystem (not only for persistence, but also for performance).
If you are referring to a database application, you really enter a semi-religious debate with the microservices crowd. Docker is built to run a single process. It is built for 12-factor apps. It is built for microservices. It is definitely possible to run more than one process in a container, but with it you have to consider the additional complexity of managing/monitoring these processes (e.g. using an init process like supervisord), dealing with logging, etc.
I've delivered both. If you are managing the container deployment (e.g. you are hosting the app), it is actually less work to use multiple containers. This allows you to use Docker's abstraction layers for networking and persistent storage. It also provides maximum portability as you scale the application (perhaps you may consider using convoy or flocker volume drivers or an overlay network for hosting containers across multiple servers). If you are developing a product for distribution, it is more convenient to deliver a single Docker Repository (with one Image). This minimizes the support costs as you guide customers through deployment.

How to share volumes across multiple hosts in docker engine swarm mode?

Can we share a common/single named volume across multiple hosts in Docker Engine swarm mode? What's the easiest way to do it?
If you have an NFS server set up, you can use an NFS folder as a volume from Docker Compose like this:
    volumes:
      grafana:
        driver: local
        driver_opts:
          type: nfs
          o: addr=192.168.xxx.xx,rw
          device: ":/PathOnServer"
In the grand scheme of things
The other answers are definitely correct. If you feel like you're still missing something or are coming to the conclusion that things might never really improve in this space, then you might want to reconsider the use of the typical POSIX-like hierarchical filesystem abstraction. Not all applications really need it (I might go as far as to say that few do). Maybe yours doesn't either.
In defense of filesystems
It is still very common in many circles, but usually these people know their remote/distributed filesystems very well and know how to set them up and leverage them properly (and they might be very good systems too, though often not with existing Docker volume drivers). Sometimes it's also in part because they're simply forced to (codebases that can't or shouldn't be rewritten to support other storage backends). Using, configuring or even writing arbitrary Docker volume drivers would be a secondary concern only.
Alternatives
If you have the option however, then evaluate other persistence solutions for your applications. Many implementations won't use POSIX filesystem interfaces but network interfaces instead, which pose no particular infrastructure-level difficulties in clusters such as Docker Swarm.
Solutions managed by third parties (e.g. cloud providers)
Should you succeed in removing all dependencies on filesystems for persistent and shared data (it's still fine for transient local state), then you might claim to have fully "stateless" applications. Of course there is almost always state persisted somewhere still, but the idea is that you don't handle it yourself. Many cloud providers (if that's where you're hosting things) will offer fully managed solutions for handling persistent state such that you don't have to care about it at all. If you're going this route, do consider managed services that use APIs compatible with implementations that you can use locally for testing (for example by running a Docker container based on an image for that implementation that is provided by a third party or that you can maintain yourself).
DIY solutions
If you do want to manage persistent state yourself within a Docker Swarm cluster, then the filesystem abstraction is often inevitable (and you'd probably have more difficulties targeting block devices directly anyway). You'll want to play with node and service constraints to ensure the requirements of whatever you use to persist data are fulfilled. For certain things like a central DBMS server it could be easy ("always run the task on that specific node only"), for others it could be way more involved.
The task of setting up, scaling and monitoring such a setup is definitely not trivial, which is why many application developers are happy to let somebody else (e.g. cloud providers) do it. It's still a very cool space to explore, though given you had to ask that question it's likely not something you should focus on if you're on a deadline.
Conclusion
As always, use the right abstraction for the job, and pause to think about what your strengths are and where to spend your resources.
Out of the box, Docker does not support this by itself. You must use an additional component: either a Docker plugin that provides a new layer type for your volumes, or a sync tool running directly on your filesystem that syncs the data for you.
From my point of view, the easiest solution is rsync or, more accurately, lsyncd, a daemon that watches directories and syncs changes using rsync. But I never tried it with Docker volumes, so I can't tell whether it handles them well.
Other solutions are offered using Infinit.sh. It basically does the same thing as lsyncd does: it's a one-way sync. So if your Docker containers write to their volumes, it won't match your expectations. I tried this solution, and it works pretty well for read-only operations, but I have not used it in production. It's still an alpha version. Infinit is also on the way to providing a Docker driver, which is not released yet, so I didn't even try it. Too risky.
Other solutions I found but was unable to install (and so to try) are Flocker and GlusterFS. Both are designed to create a filesystem volume spanning several disks on several machines. But none of their repositories were working these past weeks.
Sorry for giving you only weak solutions, but I'm facing the same problem and haven't yet found a perfect solution.
Cheers,
Olivier

Why is the Docker vfs storage backend not considered suitable for production?

The Docker vfs storage backend is in several places mentioned as not being a production backend (see for example this Docker GitHub issue comment by Michael Crosby). What makes it not suitable for production?
Project Atomic's description of storage backends says:
The vfs backend is a very simple fallback that has no copy-on-write support. Each layer is just a separate directory. Creating a new layer based on another layer is done by making a deep copy of the base layer into a new directory.
Since this backend doesn’t share diskspace use between layers, and since creating a new layer is a slow operation this is not a very practical backend. However, it still has its uses, for instance to verify other backends against, or if you need a super robust (if slow) backend that works everywhere.
According to that description it sounds like the only downside is that more disk space might be used and creating layers might be slower. But there are no mentions of downsides during runtime when accessing files, and it is even described as "robust". The disk space issue alone does not seem like a blocker for production use.
Indeed, you could use the vfs driver in production. However, be aware that because it makes a 'regular' copy, you won't benefit from the features that devicemapper or btrfs can provide, and you rely exclusively on the underlying file system.
The runtime downside is that containers are much slower to start. Once a container has started, file access performs the same as on the underlying file system.
In short, I would recommend against it because:
It was implemented first for tests and then used for volumes. It was never meant to be used as a runtime backend.
It relies on the underlying file system, so you give Docker less control over your files. It might (or might not) cause issues with future upgrades. The very purpose of Docker is to abstract the host, so you are better off delegating this kind of thing to Docker.
It takes a lot of disk space.
It takes a lot of time to run or commit.
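The disk-space and speed costs follow directly from the deep-copy behaviour in the quoted description. The Python sketch below imitates it with plain directories: each "layer" is a full copy of its parent plus one new file, so total usage grows with every layer. It is an analogy built on shutil.copytree, not Docker's actual implementation, and all names are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def tree_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def new_layer(parent: Path, child: Path, filename: str, content: bytes) -> None:
    """Create child as a deep copy of parent, then add one file to it."""
    shutil.copytree(parent, child)
    (child / filename).write_bytes(content)


def demo_layers() -> list[int]:
    """Build three layers and return the total bytes used after each one."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        base = store / "layer0"
        base.mkdir()
        (base / "base.bin").write_bytes(b"x" * 1000)
        usage = [tree_size(store)]
        new_layer(base, store / "layer1", "a.bin", b"a" * 10)
        usage.append(tree_size(store))
        new_layer(store / "layer1", store / "layer2", "b.bin", b"b" * 10)
        usage.append(tree_size(store))
        return usage
```

Here demo_layers() returns [1000, 2010, 3030]: the 1000-byte base file is stored three times. A backend that shares unchanged files between layers would need roughly 1020 bytes for the same content.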
