I have three containers that need to run on the same Swarm node/host in order to have access to the same data volume. I don't care which host they are delegated to - since it is running on Elastic AWS instances, they will come up and down without my knowing it.
This last fact makes it tricky even though it seems like it should be fairly common. Is there a placement constraint that would allow this? Obviously node.id or node.hostname are out as those are not constant. I thought about labels - that would work, but then I have no idea how to have a "replacement" AWS instance automatically get the label.
Swarm doesn't have the feature to put containers on the same host together yet (with your requirements of not using ID or hostname). That's known as "Pods" in Kubernetes. Docker Swarm takes a more distributed approach. You could try to hack together a label assignment on new instance startup but that isn't ideal.
In Swarm the way to solve this problem today is with using a different volume driver plugin then the built-in "local" driver. Here's a list of certified ones. The key in Swarm is to not use local storage on a node for volumes. Those volumes will get lost when the node dies anyway, so it's best in Swarm to move your volumes to shared storage.
In AWS I'd suggest you try EFS as shared storage if you need multiple containers to access it at once, and use either Docker's CloudStor driver (comes with Docker for AWS template) or the REX-Ray storage orchestrator solution which ensures shared data paths (NFS, EFS, S3, etc.) are connected to the correct node for the correct Service task.
Related
Currently, I am migrating to Docker Swarm and have begun to use docker configs to offload most of the configuration files but I have one file remaining that is several GBs that is used by my tileserver. Right now, I have a 1 master / 4 workers and I am looking for a way to share that file with all nodes in the swarm to prepare for a time when the tileserver goes down.
Any ideas ?
If you want highly available data then a solution that distributes data amongst nodes (or servers).
One approach would be deploying an object storage solution onto the swarm - something like minio gives you an s3 compatible REST api and when deployed with a minimum of 4 disks in erasure coding mode tolerates 1 disk down for writing and 2 disks down for reading (assuming you have a node per disk).
If re-jigging your app to work with object storage isnt in scope then investigate something like glusterfs which you will want to install on the metal, rather than on docker. glusterfs will give you a unified filesystem with decent HA on 3 nodes, you can add disks on the fly.
Obviously with minio its expected your app would use the s3 api to access its files. With glusterfs you would need to mount gfs volumes on host locations where containers than then mount volumes to gain access to that network storage.
unless you are willing to go wandering through the world of rex-ray and other community supported docker volume drivers that either havn't seen an update in years or are literally maintained by one guy for fun which can bring some first class support for glusterfs based docker volumes to your hopefully non production docker swarm.
I work in an media organisation where we deploy all our application on monolithic VMs but now we want move to kubernetes but we have major problem we have almost 40+NFS servers from which we are consuming the data in terabytes
The major problem is how do we read all this data from containers
The solutions we tried creating a
1.Persistent Volume and Persistent Volume Claim of the NFS which according to us is not a feasible solution as the data grow we have to create a new pv and pvc and create deployment
2.Mounting volumes on Kubernetes if we do this there would be no difference between kubernetes and VMs
3.Adding docker volumes to containers we were able to add the volume but we cannot see the data in the container
How can we make the existing nfs as storage class and use it or how to mount all the 40+ NFS servers on pods
It sounds like you need some form of object storage or block storage platform to manage the disks and automatically provisions disks for you.
You could use something like Rook for deploying Ceph into your cluster.
This will enable disk management in a much more friendly way, and help to automatically provision the NFS disks into your cluster.
Take a look at this: https://docs.ceph.com/docs/mimic/radosgw/nfs/
There is also the option of creating your own implementation using CRDs to trigger PV/PVC creation on certain actions/disks being mounted in your servers.
I’m trying to understand the swarm and volumes in scale. I followed the https://docs.docker.com/get-started until the end and everything was clear and working as expected. Now I want to do the same thing (have two nodes, 1 manager, 1 worker) but on my physical devices. Let’s say I have two computers on the same network or two AWS instances.
I tried to set it up between my two computers on the local network, but it seems that the manager’s IP is responding to the manager nodes and worker’s IP to worker nodes. Also, the Redis is not working on the worker’s IP.
I think it should be possible, but I can’t find an article where they are doing a similar thing.
I mean there should be a way, I don’t think that companies are running all their Docker containers on the same server (AWS instances)
P.S.: I'm using all the same files and setups provided in the link above.
I'm using Docker I have implemented a system to deploy environments (on a single server) based on Git branches using Traefik (*.dev.domain.com) and Docker Compose templates.
I like Kubernetes and I've never switched to it since I'm limited to one single server for my infrastructure. I've only used it using local installations (Docker for Windows).
So, my question is: does it make sense to run a Kubernetes "cluster" (master and nodes) on a single server to orchestrate and route containers (in place of Traefik/Rancher/Docker Compose)?
This use is for development and staging only for the moment, so high availability is not a prerequisite.
Thanks.
If it is not a production environment, it doesn't matter how many nodes you are using. So yes, it should be just fine in this case. But make sure all the k8s features you will need in production are available in test/dev, to keep things similar and portable.
AFAIU,
I do not see a requirement for kubernetes unless we are doing below at least for single host using native docker run or docker-compose or docker engine swarm mode -
Make sure there are enough(>=2) replicas of your app in a single server and you are balancing the load across those apps docker containers.
If you want to go bit advanced, we should be able to scale up & down dynamically (docker swarm mode supports this out of the box else use jwilder nginx proxy).
Your deployment should not cause a downtime. Make sure a single container is always healthy at any instant of time while deploying.
Container should auto heal(restart automatically) in case your HTTP or TCP health check fails.
Doing all of the above will certainly put you in a better place but single host is still a single source of failure which you got to deal with at regular intervals.
Preferred : if possible try to start with docker engine swarm mode or kubernetes single master or minikube. This will automatically take care of all the above scenarios out of the box and will also allow you to further scale up anytime by adding more nodes without changing much in your YML files for docker swarm or kubernetes.
Ref -
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
https://docs.docker.com/engine/swarm/
I would use single host k8s only if I managed clusters with the same project that I would like to deploy to the said host. This enables you to reuse manifests and all the automation you've created for your clusters.
Have I had single host environments only, I would probably stick to docker-compose.
If you're looking to try it out your easiest options are probably minikube (easy to run single-node cluster locally but without some features) or using one of the free trial accounts for a managed Kubernetes service from one of the big cloud providers (fully-featured and multi-node but limited use before you have to pay).
I want to set up a Swarm with persistent and replicated volumes through Ceph. I see these options to combine both services, once both are set up:
Configure the host OS to mount a CephFS in /var/lib/docker/volumes.
Use rexray/rbd as a volume driver.
Use rexray/s3fs to access Ceph object store, which is S3-compatible.
I wonder now: which option would deliver fastest performance? Is there another better option that I'm missing?
Thanks.
In general for best performance you should go for rbd, since it provides you with direct block access to the ceph volume, whereas s3fs is quite much more machinery to be spun, which eventually result in longer response times. Having quick responses for random read/writes is especially important when you have a scenario like running a postgreSQL (or MariaDB) database with mixed read/write load.
This is only a general advice looking at Ceph rbd. But my guess is this will apply as well to docker storage drivers.