I work in a media organisation where we deploy all our applications on monolithic VMs, but now we want to move to Kubernetes. We have a major problem: we have almost 40+ NFS servers from which we consume terabytes of data.
The main problem is how to read all this data from containers.
The solutions we tried:
1. Creating a Persistent Volume and Persistent Volume Claim for the NFS, which in our view is not feasible: as the data grows we have to create a new PV and PVC and a new deployment.
2. Mounting the volumes on the Kubernetes nodes directly; if we do this, there would be no difference between Kubernetes and VMs.
3. Adding Docker volumes to the containers; we were able to add the volume, but we cannot see the data inside the container.
How can we expose the existing NFS servers as a StorageClass and use that, or how can we mount all 40+ NFS servers on pods?
It sounds like you need some form of object storage or block storage platform to manage the disks and automatically provision them for you.
You could use something like Rook for deploying Ceph into your cluster.
This will enable disk management in a much friendlier way and help to automatically provision storage for your cluster, including over NFS.
Take a look at this: https://docs.ceph.com/docs/mimic/radosgw/nfs/
There is also the option of creating your own implementation using CRDs to trigger PV/PVC creation when certain actions occur or disks are mounted on your servers.
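If you do want to expose the existing NFS servers as a StorageClass (your last question), a rough sketch, assuming you install the Kubernetes NFS CSI driver (nfs.csi.k8s.io) and substituting placeholder server/share values, could look like this:

```yaml
# Sketch only: assumes the NFS CSI driver (nfs.csi.k8s.io) is installed in the cluster.
# One StorageClass per existing NFS server; server and share values are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-server-01
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-01.example.internal   # placeholder: one of the 40+ NFS servers
  share: /exports/media             # placeholder: exported path on that server
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
---
# A claim against that class is then provisioned dynamically as data grows,
# instead of hand-creating PV/PVC pairs per deployment.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-server-01
  resources:
    requests:
      storage: 1Ti
```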
New to Docker/K8s. I need all the containers (across all pods) in my K8s cluster to mount a shared file system, so that they can all read from and write to files on it. The file system needs to be something residing inside of -- or at the very least accessible to -- all containers in the K8s cluster.
As far as I can tell, I have two options:
I'm guessing K8s offers some type of persistent, durable block/volume storage facility? Maybe PV or PVC?
Maybe launch a Dockerized Samba container and give my other containers access to it somehow?
Does K8s offer this type of shared file system capability or do I need to do something like a Dockerized Samba?
NFS is a common solution for providing the file-sharing facilities you describe; it is a good place to begin. Samba can be used instead if your file server is Windows-based.
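For illustration, a minimal sketch of a pod mounting an NFS export directly (the server address and export path are placeholders):

```yaml
# Sketch: a pod mounting an NFS export directly; server and path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: shared-fs-demo
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /mnt/shared   # every container mounting this volume sees the same files
  volumes:
    - name: shared
      nfs:
        server: nfs.example.internal   # placeholder NFS server
        path: /exports/shared          # placeholder exported directory
```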
You are right: you can use a shared file system in the backend with the access mode ReadWriteMany.
ReadWriteMany allows many containers to mount a single PVC and write to it.
You can also use NFS as suggested by gohm'c; for the backing storage you can set up GlusterFS or MinIO containers.
Read more about the ReadWriteMany access mode here: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
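As a hedged sketch of the ReadWriteMany approach, with a placeholder storage class name, every replica of a Deployment can share one PVC:

```yaml
# Sketch: one RWX-capable PVC shared by every replica of a Deployment.
# The storage class name is a placeholder for any backend that supports ReadWriteMany.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rwx-backend     # placeholder
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: writer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: writer
  template:
    metadata:
      labels:
        app: writer
    spec:
      containers:
        - name: writer
          image: busybox
          # Each replica appends to its own file on the shared volume.
          command: ["sh", "-c", "while true; do date >> /data/$(hostname); sleep 5; done"]
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-files
```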
We are planning to host our Artifactory on ECS (Fargate) and mount the data to EFS. We will use an ALB in front of the containers (ports 8081 and 8082). We still have some open issues:
Can we use multiple containers at the same time or will there be upload/write issues to EFS?
Is EFS a good solution or is S3 better?
What about the metadata? I read that Artifactory keeps this in some Derby database. What if we redeploy a new container? Will the data be gone? Can this data be persisted on EFS, or do we need RDS?
Can we use multiple containers at the same time or will there be upload/write issues to EFS?
Ans: Yes, you can use multiple containers to host Artifactory instances on a single host. However, it is generally recommended to use multiple hosts to avoid the 'single point of failure' scenario. I don't anticipate any read/write issues with EFS/S3.
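For illustration only (not an official JFrog template), a rough CloudFormation sketch of mounting EFS into a Fargate task; the file system ID, image tag, and sizing are placeholders, and the execution role and VPC networking are omitted:

```yaml
# Sketch: CloudFormation fragment mounting an EFS file system into a Fargate task.
# FilesystemId, image, and sizes are placeholders; execution role and networking omitted.
Resources:
  ArtifactoryTask:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: artifactory
      RequiresCompatibilities: [FARGATE]
      NetworkMode: awsvpc
      Cpu: "2048"
      Memory: "8192"
      ContainerDefinitions:
        - Name: artifactory
          Image: releases-docker.jfrog.io/jfrog/artifactory-pro:latest  # placeholder tag
          PortMappings:
            - ContainerPort: 8081
            - ContainerPort: 8082
          MountPoints:
            - SourceVolume: artifactory-data
              ContainerPath: /var/opt/jfrog/artifactory   # Artifactory data directory
      Volumes:
        - Name: artifactory-data
          EFSVolumeConfiguration:
            FilesystemId: fs-0123456789abcdef0   # placeholder EFS ID
            TransitEncryption: ENABLED
```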
Is EFS a good solution or is S3 better?
Ans: In my opinion, both S3 and EFS are better known as scalable solutions than as high-performance ones, and it completely depends on the use case. You can overcome this by enabling cache-fs in Artifactory, which will store the frequently used binaries in a defined place (like a local disk with higher read/write speeds). You can read more about cache-fs here: https://jfrog.com/knowledge-base/what-is-cache-fs-video/
What about the metadata. I read Artifactory is hosting this in some Derby database. What if we redeploy a new container? Will the data be gone? Can this data be persisted on EFS or do we need RDS?
Ans: When you are configuring more than one Artifactory node, it is mandatory to have an external database (RDS) to store the configuration/references. On a side note: Artifactory generates the metadata for the packages/artifacts and stores it in the filesystem only; the references, however, are stored in the DB.
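As a hedged sketch of pointing Artifactory at an external database, the relevant system.yaml fragment (endpoint, database name, and credentials are placeholders) would look roughly like this:

```yaml
# Sketch: Artifactory system.yaml fragment pointing at an external PostgreSQL on RDS.
# Host, database name, and credentials are placeholders.
shared:
  database:
    type: postgresql
    driver: org.postgresql.Driver
    url: jdbc:postgresql://artifactory-db.abc123.eu-west-1.rds.amazonaws.com:5432/artifactory
    username: artifactory
    password: change-me
```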
Currently, I am migrating to Docker Swarm and have begun to use docker configs to offload most of the configuration files, but I have one remaining file, several GB in size, that is used by my tileserver. Right now I have 1 manager / 4 workers, and I am looking for a way to share that file with all nodes in the swarm to prepare for a time when the tileserver goes down.
Any ideas?
If you want highly available data, then you need a solution that distributes the data amongst nodes (or servers).
One approach would be to deploy an object storage solution onto the swarm. Something like MinIO gives you an S3-compatible REST API and, when deployed with a minimum of 4 disks in erasure-coding mode, tolerates 1 disk down for writing and 2 disks down for reading (assuming you have a node per disk).
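As a rough sketch of the MinIO option, collapsed to a single service with four drives for brevity (a node-per-disk layout would instead use one service per node with placement constraints); credentials and host paths are placeholders:

```yaml
# Sketch: single MinIO service with four drives in erasure-coding mode.
# Credentials and host paths are placeholders.
version: "3.8"
services:
  minio:
    image: minio/minio
    command: server /data1 /data2 /data3 /data4
    environment:
      MINIO_ROOT_USER: admin           # placeholder
      MINIO_ROOT_PASSWORD: change-me   # placeholder
    ports:
      - "9000:9000"
    volumes:
      - /mnt/disk1:/data1
      - /mnt/disk2:/data2
      - /mnt/disk3:/data3
      - /mnt/disk4:/data4
    deploy:
      placement:
        constraints:
          - node.role == manager       # pin to one node so the host paths exist
```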
If re-jigging your app to work with object storage isn't in scope, then investigate something like GlusterFS, which you will want to install on the metal rather than in Docker. GlusterFS will give you a unified filesystem with decent HA on 3 nodes, and you can add disks on the fly.
Obviously, with MinIO it is expected that your app would use the S3 API to access its files. With GlusterFS you would need to mount GlusterFS volumes at host locations, where containers can then mount volumes to gain access to that network storage, unless you are willing to go wandering through the world of REX-Ray and other community-supported Docker volume drivers, which either haven't seen an update in years or are literally maintained by one guy for fun, and which can bring some first-class support for GlusterFS-based Docker volumes to your hopefully non-production Docker swarm.
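A rough sketch of the GlusterFS route: it assumes every swarm node already mounts the GlusterFS volume at the same host path, and the tileserver image is a placeholder:

```yaml
# Sketch: stack file bind-mounting a GlusterFS mount that is assumed to exist
# at the same path on every swarm node; image and paths are placeholders.
version: "3.8"
services:
  tileserver:
    image: maptiler/tileserver-gl      # placeholder tileserver image
    ports:
      - "8080:8080"
    volumes:
      - /mnt/gfs/tiles:/data:ro        # host path = GlusterFS mount point
    deploy:
      replicas: 2                      # any node can run it; the data follows the mount
```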
I have three containers that need to run on the same Swarm node/host in order to have access to the same data volume. I don't care which host they are delegated to - since it is running on Elastic AWS instances, they will come up and down without my knowing it.
This last fact makes it tricky even though it seems like it should be fairly common. Is there a placement constraint that would allow this? Obviously node.id or node.hostname are out as those are not constant. I thought about labels - that would work, but then I have no idea how to have a "replacement" AWS instance automatically get the label.
Swarm doesn't have the feature to put containers on the same host together yet (with your requirements of not using ID or hostname). That's known as "Pods" in Kubernetes. Docker Swarm takes a more distributed approach. You could try to hack together a label assignment on new instance startup but that isn't ideal.
In Swarm, the way to solve this problem today is to use a different volume driver plugin than the built-in "local" driver. Here's a list of certified ones. The key in Swarm is not to use local storage on a node for volumes; those volumes will get lost when the node dies anyway, so it's best in Swarm to move your volumes to shared storage.
In AWS, I'd suggest you try EFS as shared storage if you need multiple containers to access it at once, and use either Docker's CloudStor driver (which comes with the Docker for AWS template) or the REX-Ray storage orchestrator, which ensures shared data paths (NFS, EFS, S3, etc.) are connected to the correct node for the correct Service task.
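As a hedged sketch of the CloudStor route on the Docker for AWS template, with placeholder service and image names; `backing: shared` is the EFS-backed mode that any node can mount, so the three services no longer need to land on the same host:

```yaml
# Sketch: swarm stack using Docker's CloudStor plugin (Docker for AWS) so all three
# services see the same EFS-backed volume regardless of which node runs them.
# Service and image names are placeholders.
version: "3.8"
services:
  app-a:
    image: myorg/app-a          # placeholder
    volumes:
      - shared-data:/data
  app-b:
    image: myorg/app-b          # placeholder
    volumes:
      - shared-data:/data
  app-c:
    image: myorg/app-c          # placeholder
    volumes:
      - shared-data:/data
volumes:
  shared-data:
    driver: cloudstor:aws
    driver_opts:
      backing: shared           # shared = EFS-backed, mountable from any node
```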
I'm using a Kubernetes Deployment with a persistent volume to run my application, like this example:
https://github.com/kubernetes/kubernetes/tree/master/examples/mysql-wordpress-pd
but when I try to add more replicas or autoscale, all the new pods try to connect to the same volume.
How can I automatically create a new volume for each new pod, the way StatefulSets (formerly PetSets) are able to do?
The conclusion I reached for K8S 1.6 is you can't. However, you can use NFS. If, like CrateDB, your cluster can create a folder for each node under the volume mount, then you can auto-scale. So, I auto-scale CrateDB as a Deployment using this configuration:
https://github.com/erik777/kubernetes-cratedb
which relies on an nfs-server, which I deploy as an RC with PVC/PV:
SAME_BASE/kubernetes-nfs-server
It is on my TODO list to explore distributed file systems such as GlusterFS. For K8S Deployments, your choice of file system is your remedy.
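As a hedged sketch of the folder-per-pod idea with a plain Deployment: every replica mounts the same NFS-backed ReadWriteMany claim but writes under its own pod-name subdirectory via subPathExpr (which requires a newer Kubernetes than the 1.6 discussed here); the image and claim names are placeholders:

```yaml
# Sketch: all replicas share one NFS-backed RWX claim, each writing to its own
# subdirectory named after the pod. Requires subPathExpr support (newer than K8S 1.6).
# Image and claim names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cratedb-like-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cratedb-like-app
  template:
    metadata:
      labels:
        app: cratedb-like-app
    spec:
      containers:
        - name: app
          image: crate:latest              # placeholder image
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: data
              mountPath: /data
              subPathExpr: $(POD_NAME)     # per-pod folder under the shared mount
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: nfs-shared-data     # placeholder RWX claim backed by NFS
```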
You can also engage the scalability and storage SIGs in the K8S community to help prioritize this use case. Adding the capability to K8S removes the requirement for a clustering solution to handle node separation in a shared volume, as well as preventing the introduction of additional points of failure between the clustered app and the PV.
GitHub: kubernetes/community
Hopefully, we can see a K8S OTB solution by 2.0.
(NOTE: Had to change 2 of the GITHUB links because I don't have "10 reputation")