Share large file with all nodes in docker swarm - docker

Currently, I am migrating to Docker Swarm and have begun to use docker configs to offload most of the configuration files but I have one file remaining that is several GBs that is used by my tileserver. Right now, I have a 1 master / 4 workers and I am looking for a way to share that file with all nodes in the swarm to prepare for a time when the tileserver goes down.
Any ideas ?

If you want highly available data then a solution that distributes data amongst nodes (or servers).
One approach would be deploying an object storage solution onto the swarm - something like minio gives you an s3 compatible REST api and when deployed with a minimum of 4 disks in erasure coding mode tolerates 1 disk down for writing and 2 disks down for reading (assuming you have a node per disk).
If re-jigging your app to work with object storage isnt in scope then investigate something like glusterfs which you will want to install on the metal, rather than on docker. glusterfs will give you a unified filesystem with decent HA on 3 nodes, you can add disks on the fly.
Obviously with minio its expected your app would use the s3 api to access its files. With glusterfs you would need to mount gfs volumes on host locations where containers than then mount volumes to gain access to that network storage.
unless you are willing to go wandering through the world of rex-ray and other community supported docker volume drivers that either havn't seen an update in years or are literally maintained by one guy for fun which can bring some first class support for glusterfs based docker volumes to your hopefully non production docker swarm.

Related

How to Convert NFS into a Storage Class in kubernetes

I work in an media organisation where we deploy all our application on monolithic VMs but now we want move to kubernetes but we have major problem we have almost 40+NFS servers from which we are consuming the data in terabytes
The major problem is how do we read all this data from containers
The solutions we tried creating a
1.Persistent Volume and Persistent Volume Claim of the NFS which according to us is not a feasible solution as the data grow we have to create a new pv and pvc and create deployment
2.Mounting volumes on Kubernetes if we do this there would be no difference between kubernetes and VMs
3.Adding docker volumes to containers we were able to add the volume but we cannot see the data in the container
How can we make the existing nfs as storage class and use it or how to mount all the 40+ NFS servers on pods
It sounds like you need some form of object storage or block storage platform to manage the disks and automatically provisions disks for you.
You could use something like Rook for deploying Ceph into your cluster.
This will enable disk management in a much more friendly way, and help to automatically provision the NFS disks into your cluster.
Take a look at this: https://docs.ceph.com/docs/mimic/radosgw/nfs/
There is also the option of creating your own implementation using CRDs to trigger PV/PVC creation on certain actions/disks being mounted in your servers.

What's the fastest way to use Ceph volumes in Docker Swarm?

I want to set up a Swarm with persistent and replicated volumes through Ceph. I see these options to combine both services, once both are set up:
Configure the host OS to mount a CephFS in /var/lib/docker/volumes.
Use rexray/rbd as a volume driver.
Use rexray/s3fs to access Ceph object store, which is S3-compatible.
I wonder now: which option would deliver fastest performance? Is there another better option that I'm missing?
Thanks.
In general for best performance you should go for rbd, since it provides you with direct block access to the ceph volume, whereas s3fs is quite much more machinery to be spun, which eventually result in longer response times. Having quick responses for random read/writes is especially important when you have a scenario like running a postgreSQL (or MariaDB) database with mixed read/write load.
This is only a general advice looking at Ceph rbd. But my guess is this will apply as well to docker storage drivers.

Docker Swarm constraint to keep multiple containers together?

I have three containers that need to run on the same Swarm node/host in order to have access to the same data volume. I don't care which host they are delegated to - since it is running on Elastic AWS instances, they will come up and down without my knowing it.
This last fact makes it tricky even though it seems like it should be fairly common. Is there a placement constraint that would allow this? Obviously node.id or node.hostname are out as those are not constant. I thought about labels - that would work, but then I have no idea how to have a "replacement" AWS instance automatically get the label.
Swarm doesn't have the feature to put containers on the same host together yet (with your requirements of not using ID or hostname). That's known as "Pods" in Kubernetes. Docker Swarm takes a more distributed approach. You could try to hack together a label assignment on new instance startup but that isn't ideal.
In Swarm the way to solve this problem today is with using a different volume driver plugin then the built-in "local" driver. Here's a list of certified ones. The key in Swarm is to not use local storage on a node for volumes. Those volumes will get lost when the node dies anyway, so it's best in Swarm to move your volumes to shared storage.
In AWS I'd suggest you try EFS as shared storage if you need multiple containers to access it at once, and use either Docker's CloudStor driver (comes with Docker for AWS template) or the REX-Ray storage orchestrator solution which ensures shared data paths (NFS, EFS, S3, etc.) are connected to the correct node for the correct Service task.

How do I use Docker on cloud or datacenter

I couldn't have enough courage to start using docker now I'm feel like came from last century. I want to clear my doubts about docker before get started. My question is mainly for deploying/running docker images on cloud or hosting environment.
Can I build a docker image with any type of server (eg. wildfly, payara) and/or database server (eg. mysql, oracle) and will it work on docker enabled cloud/datacenter?
If it's yes how about persistent datas like database files and static storages (eg. images, uploaded documents, logs) those are stored in docker images or somewhere else? What will happen to those files when I update my application and redeploy new image?
I read posts about what is docker but I couln't find specific answer. Forgive me for not doing enough googling.
I have run docker on AWS and other cloud providers. It is really not that hard if you have some experience with system administration and or devops. Regarding cloud hosters and getting started, most providers have some sort of tutorial on how to get started using docker with their infrastructure:
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/docker-basics.html
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-dockerextension/
Can I build a docker image with any type of server (eg. wildfly,
payara) and/or database server (eg. mysql, oracle) and will it work on
docker enabled cloud/datacenter?
To get a server up and running, you just need the docker engine installed on the host, there are packages for many distros:
https://docs.docker.com/engine/installation/
After docker engine is installed, you can create dockerfiles for basically any server or service. Hopefully you do not need to, in most cases, since there are countless docker files and pre-configured, vendor maintained images already available on dockerhub (I use wildfly, elk-stack, and mysql for example). Be careful about selecting images are maintained, otherwise you end up with security issues in your images that might never get fixed! Or you have to do it yourself!
Example images:
https://hub.docker.com/r/jboss/wildfly/
https://hub.docker.com/_/mysql/
https://hub.docker.com/_/oraclelinux/
https://hub.docker.com/u/payara/
If it's yes how about persistent datas like database files and static
storages (eg. images, uploaded documents, logs) those are stored in
docker images or somewhere else? What will happen to those files when
I update my application and redeploy new image?
In general, you will want to store persistent data external to the docker image and mount it into the image as a volume:
https://docs.docker.com/engine/tutorials/dockervolumes/
Some cloud based storage providers might be easier to mount or connect to in other ways, but this volume approach is standard, IMO.
For logfiles, I actually push them to an ELK server, so having a volume for the logs is not necessarily required. However, since the ELK server is also a docker image, it does have a volume where the data is persisted.
So you have:
documentation from your cloud hoster (or docker themselves)
a host in your cloud running docker engine
0..n images that you can either grab from dockerhub or build yourself.
storage for persistent data on this host or mounted from elsewhere that you mount into your docker images on startup. this is where e.g. mysql data folders live, or where you can persist logs, etc.
Of course, it can get much more complex from there, e.g. how to transparently scale and update your environment etc., but that is something for e.g. kubernetes or docker swarm or some other solution (I've scripted a bit on my own but do not need the robustness or elastic scalability of large systems).
Regarding cluster management, it should be noted that Swarm is now included in the Docker Core. This has created some controversy in the community and even talks of a fork of the core:
https://technologyconversations.com/2015/11/04/docker-clustering-tools-compared-kubernetes-vs-docker-swarm/
https://jaxenter.com/docker-1-12-is-probably-the-most-important-release-since-1-0-129080.html
http://searchitoperations.techtarget.com/news/450303918/Docker-fork-talk-prompts-container-standardization-brawl
http://www.infoworld.com/article/3118345/cloud-computing/why-kubernetes-is-winning-the-container-war.html
I have experience running docker on Alibaba cloud and AWS as well. I did not see any difference in working with docker on both cloud providers. Docker images can be build same way on all linux platform regardless of the cloud provider. However, persistence of data need to be taken care using docker volumes. However, it is recommended to use managed service such as RDS in Alibaba cloud for databases instead of using docker.
Can I build a docker image with any type of server (eg. wildfly,
payara) and/or database server (eg. mysql, oracle) and will it work on
docker enabled cloud/datacenter?
You can build your own Docker images or use solutions that are already pre-packaged and proven by cloud providers. For example, here is an auto-clustering Docker-based implementation of GlassFish that can be run and managed on Jelastic PaaS.
If it's yes how about persistent datas like database files and static
storages (eg. images, uploaded documents, logs) those are stored in
docker images or somewhere else? What will happen to those files when
I update my application and redeploy new image?
With the above mentioned cluster, all data is kept inside containers and stays without changes after restart. As an option, you can also connect a separate data storage container if you wish to share it across other containers.

How to limit Docker filesystem space available to container(s)

The general scenario is that we have a cluster of servers and we want to set up virtual clusters on top of that using Docker.
For that we have created Dockerfiles for different services (Hadoop, Spark etc.).
Regarding the Hadoop HDFS service however, we have the situation that the disk space available to the docker containers equals to the disk space available to the server. We want to limit the available disk space on a per-container basis so that we can dynamically spawn an additional datanode with some storage size to contribute to the HDFS filesystem.
We had the idea to use loopback files formatted with ext4 and mount these on directories which we use as volumes in docker containers. However, this implies a large performance loss.
I found another question on SO (Limit disk size and bandwidth of a Docker container) but the answers are almost 1,5 years old which - regarding the speed of development of docker - is ancient.
Which way or storage backend would allow us to
Limit storage on a per-container basis
Has near bare-metal performance
Doesn't require repartitioning of the server drives
You can specify runtime constraints on memory and CPU, but not disk space.
The ability to set constraints on disk space has been requested (issue 12462, issue 3804), but isn't yet implemented, as it depends on the underlying filesystem driver.
This feature is going to be added at some point, but not right away. It's a bit more difficult to add this functionality right now because a lot of chunks of code are moving from one place to another. After this work is done, it should be much easier to implement this functionality.
Please keep in mind that quota support can't be added as a hack to devicemapper, it has to be implemented for as many storage backends as possible, so it has to be implemented in a way which makes it easy to add quota support for other storage backends.
Update August 2016: as shown below, and in issue 3804 comment, PR 24771 and PR 24807 have been merged since then. docker run now allow to set storage driver options per container
$ docker run -it --storage-opt size=120G fedora /bin/bash
This (size) will allow to set the container rootfs size to 120G at creation time.
This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers
Documentation: docker run/#Set storage driver options per container.

Resources