How do you run Docker in production with an active/active or active/standby HA setup?
Are there any guides or best practices?
I am thinking of 3 scenarios:
1) NFS - two servers, both prepped with docker-machine, mounting a shared NFS export to /var/lib/docker/ - so both Docker nodes should see the same files (using some sort of filer, like VNX, EFS, and so on).
2) Using DRBD to replicate a disk and mount it at /var/lib/docker/ - so the data is on both nodes; the active node mounts it and runs the containers, and in case of failover the other node mounts it and starts the containers.
3) Using DRBD as above, but exporting an NFS server from it and mounting that NFS export on both nodes at /var/lib/docker/ - so, as above, both nodes can mount and run containers, using Heartbeat/Pacemaker to move the virtual IP and handle the DRBD switchover.
What is the best practice for running Docker containers in production to make them highly available?
regards
Persistent storage is still somewhat the elephant in the room in the container/docker world.
I wouldn't recommend any of the approaches you're suggesting. The only exception would be putting some particular data onto a shared volume (using a volume mount), but never the entire /var/lib/docker.
There is a lot going on in the container space, and there are volume plugins that integrate directly into Docker. One of the volume plugins/solutions that has gained the most momentum is Flocker, which is worth looking into.
Once you've moved your data out of your containers, setting up an HA system becomes a lot easier, as the containers become more or less ephemeral.
You can then use something like Kubernetes, Docker Swarm, or Docker Datacenter to manage/monitor these containers.
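As an illustration, here is a minimal compose sketch of that idea - the image and the volume name appdata are placeholders, not anything prescribed above - keeping one service's state on a named volume instead of sharing /var/lib/docker itself:
version: '3'
services:
  db:
    image: postgres:13                    # example image only
    volumes:
      - appdata:/var/lib/postgresql/data  # only the data lives outside the container
volumes:
  appdata: {}                             # managed by Docker (or by a volume plugin)
Because the state sits in the volume, the container itself stays ephemeral and can be rescheduled by the orchestrator.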
Related
I'm facing a big challenge: trying to run my app on 2 VPSes in Docker Swarm. Containers that use volumes should use a volume shared between the nodes.
The solutions I'm considering are:
Use the GlusterFS plugin and mount the volume on every node using NFS. But NFS is a single point of failure, so when something goes wrong my data is gone. (That doesn't look good to me, but maybe I'm wrong.)
Use Azure Storage and keep the data as blobs (Azure Data Lake Storage Gen2). But my main problem is: how can I connect to Azure storage from docker-compose.yaml? I would have to declare the volume in every service that uses it and also declare it in the top-level volumes section, and I have no idea how to do that (a generic sketch of that compose wiring follows after this question).
The Docker documentation about this is gone. It should be here: https://docs.docker.com/docker-for-azure/persistent-data-volumes/.
Another option is https://hub.docker.com/r/docker4x/cloudstor/tags?page=1&ordering=last_updated, but the last update was 2 years ago, so it's probably not supported anymore.
Do I have any other options, and which way of sharing a volume between nodes is the best solution?
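A generic sketch of that compose wiring - the driver name and its options are placeholders, not a working Azure configuration: the named volume is declared once at the top level and then referenced from each service that needs it.
version: '3'
services:
  app:
    image: my-app:latest           # placeholder image
    volumes:
      - shared-data:/data          # reference the named volume from the service
volumes:
  shared-data:
    driver: some-volume-plugin     # hypothetical volume driver (NFS, cloud, ...)
    driver_opts:
      share: "example-share"       # driver-specific options, placeholders only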
There are a number of ways of dealing with creating persistent volumes in docker swarm, none of them particularly satisfactory:
First up, a simple way is to use NFS, GlusterFS, iSCSI, or VMware to multimount the same SAN storage volume onto each Docker Swarm node. Services then just bind-mount paths like /mnt/volumes/my-sql-workload.
On the one hand it's really simple; on the other hand there is literally no access control, and you can easily point services at each other's data by accident.
Next, commercial Docker volume plugins for SANs. If you are lucky and possess a Pure Storage, NetApp or other such SAN array, some of them still offer Docker volume plugins - Trident, for example, if you have a NetApp.
Third, if you are in the cloud, the legacy Swarm offerings on Azure and AWS included a built-in "cloudstor" volume driver, but you need to dig really deep to find it in their legacy offerings.
Fourth, there are a number of open-source or free volume plugins that will mount volumes from NFS, GlusterFS or other sources, but most are abandoned or very quiet. The most active I know of is marcelo-ochoa/docker-volume-plugins.
I wasn't particularly happy with how those plugins mounted pre-existing volumes but made operations like docker volume create hard, so I made my own - but really:
Swarm cluster volume support with CSI plugins is hopefully going to land in 2021¹, which would hopefully be a solid answer to all the problems above.
¹It's now 2022 and the next version of Docker has not yet gone live with CSI support. Still we wait.
In my opinion, a good solution could be to create a GlusterFS cluster, configure a single volume and mount it on every Docker Swarm node (e.g. at /mnt/swarm-storage).
Then, for every Container that needs persistent storage, bind-mount a subdirectory of the GlusterFS volume inside the container.
Example:
services:
  my-container:
    ...
    volumes:
      - type: bind
        source: /mnt/swarm-storage/my-container
        target: /a/path/inside/the/container
This way, every node shares the same storage, so a given container can be started on any cluster node.
You don't need any Docker plugin for a particular storage driver, because the distributed storage is transparent to the Swarm cluster.
Lastly, GlusterFS is a distributed filesystem, designed to have no single point of failure, and you can cluster it across as many nodes as you like (unlike NFS).
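As a rough sketch of the mounting step - the server name gluster1 and the volume name gv0 are just examples, not anything defined above - each Swarm node would mount the GlusterFS volume before containers are started:
# on every Swarm node (assumes the glusterfs client package is installed)
mkdir -p /mnt/swarm-storage
mount -t glusterfs gluster1:/gv0 /mnt/swarm-storage
# or persistently via /etc/fstab:
# gluster1:/gv0  /mnt/swarm-storage  glusterfs  defaults,_netdev  0 0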
docker data volume vs mounted host directory
says volumes should be preferred over bind mounts
I have a few questions regarding the issue. The post says:
When you create a volume, it is stored within a directory on the Docker host
Bear with me - I'm new to Docker, and I'm wondering what the Docker host is here.
Is it a machine where I build the image (probably not)?
Is it the machine where the image will be run? If so, what happens if I run the image on multiple machines - will it create two independent volumes?
When I have a development and a production setup, how does Docker manage two separate volumes for each environment?
Besides, it seems fairly easy to lose data by doing docker-compose down when I use data volumes; that's the first obstacle that makes me hesitate to use data volumes. Is there an obvious solution to mitigate the issue?
It's not actually a doctrine that you must avoid bind mounts. Yes, they can damage your host's filesystem if mounted carelessly (like -v /bin:/var/log), since by default you have root privileges inside the container, and they are less portable; but they make file exchange between host and container easy. When you want to provide initial configuration for your service, or put source code into a container for compilation, you will probably prefer a bind mount over creating and running a temporary container just to copy data into a volume. Also, you should use the :ro (read-only) option whenever possible to prevent data being modified from inside the container.
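For instance, a small sketch of that configuration use case - the paths and image name are made up for illustration:
# host config directory bind-mounted read-only into the container
docker run -d -v /srv/myapp/config:/etc/myapp:ro myapp:latest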
Docker host - it is the machine where the Docker daemon is running.
Is it a machine where I build the image (probably not)?
Not necessarily. You can build remotely using the Docker CLI or the Docker API.
Is it the machine where the image will be run?
Yes, images are run by the Docker daemon, so that machine is the host.
If it is so, what happens if I run the image on multiple machines,
will it create two independent volumes?
It depends. Running images on different machines can be achieved in different ways, starting with orchestrators like Kubernetes or Docker Swarm and ending with launching them manually on separate Docker daemons. With orchestrators it is possible to have the same volume shared among different hosts, but in that case you can't use bind mounts - you use volumes.
When I have developement and production setup, how docker manages two
separate volumes for each environment?
Docker doesn't - it is you who manages them.
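One common way to keep them apart (the project names below are arbitrary examples) is the compose project name, which prefixes every named volume the file declares:
docker-compose -p myapp_dev up -d     # named volumes are created as myapp_dev_<name>
docker-compose -p myapp_prod up -d    # named volumes are created as myapp_prod_<name>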
Besides, it seems fairly easy to lose data by doing docker-compose down
when I use data volumes; that's the first obstacle that makes me
hesitate to use data volumes. Is there an obvious solution to mitigate
the issue?
Volumes can easily persist between docker-compose sessions. The most explicit way to achieve that is to declare the volume in advance with
docker volume create foo
and then use it in your compose files:
version: '3'
services:
  abc:
    volumes:
      - foo:/foo
volumes:
  foo:
    external: true
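With the volume declared like that (or simply named), a plain docker-compose down leaves it in place; named volumes are only removed when you explicitly ask for it, and external volumes are never removed:
docker-compose down       # removes containers and networks, keeps volumes
docker-compose down -v    # additionally removes named volumes declared in the file (not external ones)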
Bind mounts vs. volumes, feature by feature:
What they are
- Bind mount: attaches a user-specified location on the host filesystem to a specific point in a container's file tree.
- Volume: attaches disk storage on the host filesystem, or cloud storage, managed by Docker.
Command
- Bind mount: --mount type=bind,src="",dst=""
- Volume: the Docker CLI docker volume command.
Dependency
- Bind mount: dependent on a location on the host filesystem.
- Volume: container-independent data management.
Separation of concerns
- Bind mount: no.
- Volume: yes.
Conflict with other containers
- Bind mount: yes. Example: multiple instances of Cassandra that all use the same host location as a bind mount for data storage; each instance would compete for the same set of files, and without other tools such as file locks that would likely result in corruption of the database.
- Volume: no. By default, Docker creates volumes using the local volume plugin.
When to choose
- Bind mount: (1) when the host provides a file or directory that is needed by a program running in a container, or when that containerized program produces a file or log that is processed by users or programs running outside containers; (2) appropriate for workstations and machines with specialized concerns; (3) systems with more traditional configuration-management tooling.
- Volume: working with persistent storage, e.g. (1) databases, (2) cloud storage.
When not to choose
- Bind mount: better to avoid these kinds of specific bindings in generalized platforms or hardware pools.
- Volume: (to be written)
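For reference, the two mount types side by side on the CLI - the paths, volume name and image are examples only:
# bind mount: attach an existing host directory
docker run --mount type=bind,src=/srv/app-config,dst=/etc/app my-image
# volume: let Docker manage the storage via the (default) local volume plugin
docker run --mount type=volume,src=app-data,dst=/var/lib/app my-image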
I'm new to Docker Swarm. As I understand it, Docker Swarm lets you abstract away from the cluster: you don't care on which hardware a container is deployed.
On the other hand, the standard way to handle a database in Docker is to write the data outside the Docker container (to avoid copy-on-write behaviour). That is achieved by mounting a volume and writing the db-related data to it. The important question here: are volumes machine-specific? Are Docker & Docker Swarm clever enough to mount a volume on the machine where it's needed?
Example:
I have 3 machines and 3 microservices/containers, all deployed through Docker Swarm. Only one microservice/container must connect to a database, so I need to mount the volume on only one machine. But on which one?
Databases and similar stateful applications are still a hard thing to deal with in Docker Swarm and other orchestration frameworks. Ideally, containers should be able to run on any node in the swarm, but the problem comes when you need to persist data beyond the container's lifecycle.
Mounting a volume is the Docker way to persist data; however, this ties the container to a specific node, as volumes are created on specific nodes. There are many projects that try to solve this problem and provide some sort of distributed storage.
There was a project called Flocker that dealt with the above problem (it's no longer maintained). There is also a newer project called REX-Ray.
Are Docker & Docker Swarm clever enough to mount a Volume on the machine it's needed?
By default, no. Docker Swarm will choose one of the nodes and deploy the container on it. However, you can work around this problem:
First, you need to define a named volume in your stack file/compose file under the service definition.
Second, you need to use node placement constraints to restrict where the database container should run, as sketched below.
If you do not use a distributed storage tool, then for databases and similar stateful containers that need volumes, you need to restrict the container to specific nodes.
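A minimal sketch combining both steps - the image, the volume name and the node label db=true are assumptions (you would set the label yourself, e.g. with docker node update --label-add db=true <node>):
version: '3'
services:
  database:
    image: postgres:13                   # example image
    volumes:
      - db-data:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.labels.db == true       # pin the service to the node that holds the volume
volumes:
  db-data: {}                            # local volume, created on that node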
So with the introduction of volumes we no longer use data-only containers! Nice. But right now I have this nice home-grown ZFS appliance and I want to use it as a backend for my Docker volumes (of course Docker is running on other hosts).
I can export ZFS over NFS relatively easily; what are proven (i.e. battle-tested) options for using NFS as a volume backend for Docker?
A Google search shows me the following possibilities:
using Flocker, I could use the flocker-agent-thingie on the ZFS appliance. However, with Flocker being scrapped, I am concerned...
using the local volume backend and simply mounting the NFS export on the Docker host -> does not scale, but might do the job (a compose sketch of this option follows after this question)
using a specialized volume plugin like https://github.com/ContainX/docker-volume-netshare to utilize NFS
something similar from Rancher: https://github.com/rancher/convoy
or going big and using Kubernetes with NFS as persistent storage: https://kubernetes.io/docs/concepts/storage/volumes/#nfs
I have pretty extensive single-host Docker knowledge - which option is a stable one? Performance is not that important to me (the use case is a dockerized OwnCloud/NextCloud stack; throughput is limited by the internet connection).
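Regarding the second option above, this is roughly what the local driver with NFS options looks like in compose - the server address, export path and target directory are made-up examples:
version: '3'
services:
  nextcloud:
    image: nextcloud                     # example image
    volumes:
      - nc-data:/var/www/html/data       # example target path
volumes:
  nc-data:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=zfs-appliance.local,rw,nfsvers=4"   # NFS server address (example)
      device: ":/tank/nextcloud"                   # exported ZFS dataset (example)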
I am setting up Docker on a cloud machine with two drives: a slow HA drive and an attached fast SSD drive. I would like to split containers between these two drives: IO-intensive containers on the SSD drive and less IO-intensive ones on the HA drive.
I know it is possible to change the location of all containers with the -g daemon flag.
Is it possible to change the location per container (preferably using docker-compose)?
I think a simpler approach is to define the volumes on the SSD. In general, IO-intensive data should live in a volume, so just create that volume on the SSD with docker-compose.
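For example, a compose-level sketch using the local driver to bind a volume to a directory on the SSD - the mount point /mnt/ssd and the image are assumptions:
version: '3'
services:
  db:
    image: postgres:13                  # example of an IO-intensive service
    volumes:
      - fast-data:/var/lib/postgresql/data
volumes:
  fast-data:
    driver: local
    driver_opts:
      type: none                        # bind an existing host directory
      o: bind
      device: /mnt/ssd/fast-data        # directory on the SSD (must already exist)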
Regards