Kubernetes Deployment for stateful application - docker

I have a question about best practices for designing Deployments and/or StatefulSets for stateful applications like WordPress and the like.
My current idea was to build a fully dynamic image for one specific CMS, with the idea that I can mount the project data into it: themes, files, etc.
In the case of WordPress that would be wp-content/themes. Or is that the wrong way? Is it better to build the image with the right data already in it and not worry about it at deployment time, because the image already contains everything?
What are your experiences with stateful apps, and how did you solve these "problems"?
Thanks for any answers :)

I don't think WordPress is really stateful in this regard, and it should be deployed as a regular Deployment.
A StatefulSet is typically for things like databases that need storage. As an example, Cassandra would typically be a StatefulSet with mounted volume claims. When one instance dies, a new one is brought up with the same name, stable network identity and volume as the old one. After a short while it should be part of the cluster again.
With a Deployment, replacement Pods do not keep the same name or network identity, and you don't get per-Pod volume claim templates (although a Deployment can still mount a shared PersistentVolumeClaim).
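For illustration, a minimal StatefulSet of that kind might look like the sketch below (names, image tag and sizes are placeholders, not a tuned Cassandra setup); the volumeClaimTemplates section is what gives each Pod its own claim, which is reattached to its replacement:
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None                  # headless Service gives each Pod a stable DNS name
  selector:
    app: cassandra
  ports:
  - port: 9042                     # placeholder client port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra                  # hypothetical name
spec:
  serviceName: cassandra           # must reference the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: cassandra:3.11      # placeholder image tag
        volumeMounts:
        - name: data
          mountPath: /var/lib/cassandra
  volumeClaimTemplates:            # one PVC per Pod, kept across Pod restarts
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi            # placeholder size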

Everything the app needs in order to run (e.g. wp-content/themes) is best baked into the image.
Everything that changes at runtime (the stateful part) can be stored in a PVC.
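A rough sketch of that split for WordPress, assuming a hypothetical custom image with the themes/plugins baked in and a PVC only for content generated at runtime:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - name: wordpress
        image: registry.example.com/my-wordpress:1.0     # hypothetical image with themes baked in
        volumeMounts:
        - name: uploads
          mountPath: /var/www/html/wp-content/uploads    # only runtime-generated content lives here
      volumes:
      - name: uploads
        persistentVolumeClaim:
          claimName: wp-uploads                          # hypothetical claim, defined below
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-uploads
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi                                       # placeholder size
Only the mounted path survives Pod restarts and rollouts; everything else comes fresh from the image.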

Related

How to update in all replicas when hitting an endpoint

I have 3 replicas of the same application running in Kubernetes, and the application exposes an endpoint. Hitting the endpoint sets a boolean variable to true, which my application then uses. The issue is that when I hit the endpoint, the variable is updated in only one of the replicas. How can I make the change take effect in all replicas by hitting a single endpoint?
You need to store your data in a shared database of some kind, not locally in memory. If all you need is a temporary flag, Redis would be a popular choice; for more durable data, Postgres is the gold standard. But there's a wide and wonderful world of databases out there; explore which ones match your use case.
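As a rough sketch of running such a shared store in-cluster (names and image tag are illustrative, not a hardened setup), a small Redis Deployment plus Service might look like:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:6               # placeholder tag
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis                        # replicas reach it at redis:6379 via this DNS name
spec:
  selector:
    app: redis
  ports:
  - port: 6379
Each application replica would then read and write the flag against redis:6379 instead of keeping it in process memory.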
Seems like you're trying to solve an issue with your app using Kubernetes.
Let me elaborate:
If you have several pods behind a service, you can't reach all of them with a single request. This has been proposed here, but in my opinion it isn't best practice.
If you need to share data between your apps, you can have them communicate with each other using a cluster service.
You probably assume you can share data using Kubernetes volumes, such as gcePersistentDisk or some other sort of volume, but then again, volumes were not meant to solve such problems.
In conclusion, the way I would solve such an issue is by having the pods communicate changes with each other. Kubernetes can't solve this for you.
EDIT:
Another approach could be shared storage (for example, a single pod running MongoDB), but I would say that's a bit of an overkill for your issue, also because in order to communicate with that pod you would need in-cluster communication anyway.

using an nfs network path as a kubernetes persistent volume

I have set up a Kubernetes cluster with three nodes. All nodes are Linux CentOS machines.
I need a persistent volume to store data and I am trying to achieve this.
I was following this tutorial, but it only covers a single-node cluster:
https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/
Since my cluster consists of three nodes, I could not use a local path, so the tutorial above did not work for me.
I need a network path, and using NFS seems a reasonable solution to me. (If there is a good alternative, I would like to hear it.)
Using an NFS network mount involves two steps.
First, setting up a shared directory (the NFS export) on a network path.
Second, defining this network path as a persistent volume in Kubernetes and using it.
The second step is pretty straightforward. It is explained in the Kubernetes documentation, and there is even a sample YAML.
documentation: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes
example: https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/nfs-pv.yaml
The first part also seems straightforward. It is explained in the following document:
https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nfs-mount-on-ubuntu-16-04#step-5-%E2%80%94-creating-the-mount-points-on-the-client
/etc/exports format:
directory_to_share client(share_option1,...,share_optionN)
/etc/exports example:
/var/nfs/general 203.0.113.24(rw,sync,no_subtree_check)
/home 203.0.113.24(rw,sync,no_root_squash,no_subtree_check)
But when you export a path over NFS you have to do some configuration and give clients access rights; basically, you need the client IPs.
With Kubernetes we use abstractions such as pods, and we don't want to deal with real machines and their IP addresses. That is where the problem starts.
So I don't want to hand the node IPs to the NFS server (they might change in the first place). There should be a better solution so that all pods (on any node) can connect to the NFS network path.
Even allowing all IPs without restriction, or allowing an IP range, might solve the issue; I would like to hear if there is such a solution. But I would also like to hear what the best practice is: how does everybody else use an NFS network path from Kubernetes?
I could not find any solution yet. If you have solved a similar problem, please let me know how you solved it. Any documentation on this issue would be good too. Thanks in advance!
You asked for best practices, and from what I've found I think the best option would be white-listing the IP addresses. Since you do not want to do that, there are also some workarounds posted on SO by people who had similar issues with dynamic client IPs and NFS; you can find a link to a deployment using GlusterFS in those answers. If you want NFS with clients whose IPs can change, you can use DNS names instead of IPs. If you need dynamic provisioning, use GlusterFS.
I will add some information about volumes, as you asked; it might shed some light on the topic and help with the issue.
Since Pods are ephemeral, you need to move the volume outside the Pod, making it independent of the Pods, so that the volume persists its state in case of Pod failure. Kubernetes supports several types of volumes.
You could use a plain NFS volume (more on NFS in the previous link): after the Pod is removed, the volume gets unmounted but still exists. This is also not ideal in your situation, as the user needs to know the file system type and other details about the volume they want to connect to. When you look at the NFS YAML examples in the documentation, you will see that their kind is defined as a PersistentVolumeClaim.
That method is based on creating a series of abstractions that allow a Pod to connect to the PersistentVolume while the user does not need any backend details; in addition, your cluster can connect to many providers.
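Putting the pieces together, a statically created NFS-backed PersistentVolume and a claim that Pods can mount might look roughly like this (server address, path and sizes are placeholders for your own export):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                # NFS lets many nodes/Pods mount the same share
  nfs:
    server: 10.0.0.10              # placeholder NFS server address
    path: /var/nfs/general         # placeholder exported path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""             # bind to the statically created PV above
  resources:
    requests:
      storage: 10Gi
Pods that mount nfs-pvc all read and write the same export. Note that it is the node (the kubelet) that performs the actual NFS mount, which is why the export still has to allow the node IPs or a subnet covering them.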

Kubernetes: Managing uploaded user content

I have an idea of what I think should happen with my project, but I want to check whether it works on a theoretical level first. Basically, I am working on a Django site that runs on Kubernetes, but I am struggling a little bit with how I should set up my ReplicaSet/StatefulSet to manage uploaded content (images).
My biggest concern is figuring out how to scale and maintain uploaded content. My first idea is that I need a single volume that these files are written to, but can multiple pods write to the same volume that way while scaling?
From what I have gathered, it doesn't seem to work that way. It sounds more like each pod, or at least each node, would have its own volume. But then, would a request for an image reach the volume it is stored on? Or should I create a custom backend program to move things around so that uploads are served from an NGINX server like my other static content?
FYI, this is my first scalable project, lol. I am really just trying to find the best way to manage uploads, or a way at all. I would appreciate any explanations, thoughts, or fancy diagrams on how something like this might work!
Hello. I think you should forget Kubernetes for a moment and think about the architecture and capabilities of your Django application. I guess you have built a web app that offers some 'upload image' functionality, and then you have code that stores this image somewhere. In the simplest scenario, where you run your app on your laptop, the web app is configured to save this content to a local folder. A more advanced example is that you deploy your application to a VM or a cloud VM, e.g. an AWS EC2 instance, and your app saves the files to the local storage of that instance.
The question is twofold: what happens if you have 2 instances of your web app deployed? Can they be configured and run so that they 'share' the same folder for saving the images? I guess this is what you want; otherwise your app would not scale horizontally, since each user would have to hit one specific instance in order to upload or retrieve specific images.
So, with that in mind, this is a design decision for your application, which I am pretty sure you have already worked out: how can all instances of your web app share a folder or bucket to save files to? If you spun up 3 different VMs on any cloud, you would have to use some kind of shared storage so that all three instances point to the same physical storage location, e.g. an NFS drive or a cloud storage service like S3.
With all of the above in mind, and clearly understanding that you need to decouple your application from the notion of local storage (especially if you want it to be as stateless as it gets, whatever that means to you): if your web app is packaged as a Docker container and deployed to a Kubernetes cluster as a pod, saving files to local storage is not going to get you far, since each pod (each Docker container) will use the underlying Kubernetes worker's (VM's) storage to save files, so another instance will be saving its files on some other VM, and so on.
Kubernetes provides an abstraction for applications (pods) that want to share some storage within the Kubernetes cluster and, of course, persist it. Something I did not mention above: if you save files on the Kubernetes worker or inside the pod, you will lose that data once the VM/instance is restarted, so you want something durable.
To cut a long story short,
1) You can deploy your application/pod along with a PersistentVolumeClaim, assuming that your Kubernetes cluster supports it. What happens is that you mount into your pod some kind of folder/storage which is backed by whatever is available to your cluster, e.g. some kind of NFS store (see the sketch below). https://kubernetes.io/docs/concepts/storage/persistent-volumes/
2) You can 'outsource' the need for shared storage to an external provider, e.g. in a common case an S3 bucket, and not tackle the problem in Kubernetes at all; just keep and provision the app within Kubernetes.
I hope I gave you some basic ideas.
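As a rough illustration of option 1 (all names are hypothetical), the important detail when running several replicas is that the claim needs an access mode the backing storage supports for shared mounting, typically ReadWriteMany:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: django-media                # hypothetical claim backing MEDIA_ROOT
spec:
  accessModes:
    - ReadWriteMany                 # required if several replicas mount it at once
  resources:
    requests:
      storage: 20Gi                 # placeholder size
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django
spec:
  replicas: 3
  selector:
    matchLabels:
      app: django
  template:
    metadata:
      labels:
        app: django
    spec:
      containers:
      - name: django
        image: registry.example.com/my-django:1.0   # hypothetical app image
        volumeMounts:
        - name: media
          mountPath: /app/media                     # wherever MEDIA_ROOT points
      volumes:
      - name: media
        persistentVolumeClaim:
          claimName: django-media
Many block-storage backed classes only offer ReadWriteOnce, whereas NFS-style backends typically support ReadWriteMany, which is one reason option 2 (S3 or similar) is often the simpler route.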
Note: Kubernetes 1.14 (March 2019) now comes with Durable Local Storage Management as GA, which:
Makes locally attached (non-network attached) storage available as a persistent volume source.
Allows users to take advantage of the typically cheaper and better-performing persistent local storage. (kubernetes/kubernetes: #73525, #74391, #74769; kubernetes/enhancements: #121, KEP)
That might help in securing truly persistent storage for your case.
As noted by x-yuri in the comments:
See more with "Kubernetes 1.14: Local Persistent Volumes GA", from Michelle Au (Google), Matt Schallert (Uber), Celina Ward (Uber).
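For reference, a local PersistentVolume is declared roughly like the sketch below (path, node name and storage class are placeholders); the nodeAffinity section pins the volume to the node that physically holds the disk:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 50Gi                     # placeholder size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage     # placeholder StorageClass name
  local:
    path: /mnt/disks/ssd1             # placeholder path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1                  # placeholder node name
Note that this still ties the data to a single node, so by itself it does not solve sharing uploads across replicas running on different nodes.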
You could use IPFS: https://pypi.org/project/django-ipfs-storage/
By creating a container from this image https://hub.docker.com/r/ipfs/go-ipfs/ in the same pod, you can refer to it as 'localhost'.

docker stack with overlay network & name resolution

I'm totally new to Docker and started doing some tutorials yesterday. I want to build a small test application consisting of several different services (replicated and so on) that interact with each other, and I ran into a problem regarding 'service discovery'. I started with the get-started tutorials on docker.com, and at the moment I'm not really sure what the best practice in the Docker world is for letting the different containers in a network get to know each other...
Since this is a rather vague problem description, let me try to make it more precise. I want to use a few independent services (e.g. Postgres, MongoDB, Redis and RabbitMQ) together with a set of worker nodes to which work is assigned by a dedicated master node. Since it seems quite convenient, I wanted to use a docker-compose.yml file to define all my services and deploy them as a stack.
Moreover, I created a custom network, and since it does not seem to be possible to attach a stacked service to a bridge network, I created an attachable overlay network.
To finally get to the point: even though the services are deployed correctly, their actual container names are random, and without using some kind of service registry I'm not able to resolve their addresses.
A simple solution would be to use single containers with fixed container names; however, that does not seem to be a best-practice solution (even though it is effectively just Docker-based DNS keyed on container names rather than domain names). Another problem is the randomly generated container names that contain underscores; these names are not valid addresses that can be resolved...
Best regards
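For illustration, a stack file of the kind described might look like the sketch below (service names and images are placeholders). Services attached to the same overlay network can normally resolve each other by their service name through Docker's built-in DNS, so the randomly named containers never need to be addressed directly:
version: "3.7"
services:
  master:
    image: example/master:latest     # placeholder image
    networks:
      - backend
  worker:
    image: example/worker:latest     # placeholder image
    deploy:
      replicas: 3
    networks:
      - backend
  redis:
    image: redis:6
    networks:
      - backend
networks:
  backend:
    driver: overlay
    attachable: true
With this, the master can reach Redis at redis:6379, and worker resolves to the service's virtual IP, which load-balances across the replicas.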
Have you looked at something like Kubernetes? To quote from the home page:
It groups containers that make up an application into logical units for easy management and discovery.

How to share volumes across multiple hosts in docker engine swarm mode?

Can we share a common/single named volume across multiple hosts in Docker Engine swarm mode, and what's the easiest way to do it?
If you have an NFS server set up, you can use an NFS folder as a volume from Docker Compose like this:
volumes:
  grafana:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.xxx.xx,rw
      device: ":/PathOnServer"
In the grand scheme of things
The other answers are definitely correct. If you feel like you're still missing something or are coming to the conclusion that things might never really improve in this space, then you might want to reconsider the use of the typical POSIX-like hierarchical filesystem abstraction. Not all applications really need it (I might go as far as to say that few do). Maybe yours doesn't either.
In defense of filesystems
It is still very common in many circles, but usually these people know their remote/distributed filesystems very well and know how to set them up and leverage them properly (and they might be very good systems too, though often not with existing Docker volume drivers). Sometimes it's also in part because they're simply forced to (codebases that can't or shouldn't be rewritten to support other storage backends). Using, configuring or even writing arbitrary Docker volume drivers would be a secondary concern only.
Alternatives
If you have the option however, then evaluate other persistence solutions for your applications. Many implementations won't use POSIX filesystem interfaces but network interfaces instead, which pose no particular infrastructure-level difficulties in clusters such as Docker Swarm.
Solutions managed by third-parties (e.g. cloud providers)
Should you succeed in removing all dependencies to filesystems for persistent and shared data (it's still fine for transient local state), then you might claim to have fully "stateless" applications. Of course there is often always state persisted somewhere still, but the idea is that you don't handle it yourself. Many cloud providers (if that's where you're hosting things) will offer fully managed solutions for handling persistent state such that you don't have to care about it at all. If you're going this route, do consider managed services that use APIs compatible with implementations that you can use locally for testing (for example by running a Docker container based on an image for that implementation that is provided by a third-party or that you can maintain yourself).
DIY solutions
If you do want to manage persistent state yourself within a Docker Swarm cluster, then the filesystem abstraction is often inevitable (and you'd probably have more difficulty targeting block devices directly anyway). You'll want to play with node and service constraints to ensure that the requirements of whatever you use to persist data are fulfilled. For certain things like a central DBMS server it could be easy ("always run the task on that specific node only"); for others it could be far more involved.
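A small sketch of such a constraint in a stack file (image, hostname and volume name are hypothetical), pinning a single-replica database service to the node that holds its data:
version: "3.7"
services:
  db:
    image: postgres:13                        # placeholder image
    volumes:
      - db-data:/var/lib/postgresql/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == storage-node-1   # hypothetical node that holds the data
volumes:
  db-data:
    driver: local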
The task of setting up, scaling and monitoring such a setup is definitely not trivial, which is why many application developers are happy to let somebody else (e.g. cloud providers) do it. It's still a very cool space to explore however, though given you had to ask that question it's likely not something you should focus on if you're on a deadline.
Conclusion
As always, use the right abstraction for the job, and pause to think about what your strengths are and where to spend your resources.
Out of the box, Docker does not support this by itself. You must use additional components: either a Docker volume plugin, which provides you with a new driver for your volumes, or a sync tool running directly on your filesystem which syncs the data for you.
From my point of view, the easiest solution is rsync, or more accurately lsyncd, the daemon version of rsync. But I have never tried it for Docker volumes, so I can't tell whether it handles them well.
Another option is Infinit.sh. It basically does the same thing as lsyncd: it's a one-way sync. So if your Docker containers are read-write on their volumes, it won't match your expectations. I tried this solution and it works pretty well for read-only operations, but not in production; it's still an alpha version. Infinit is also on the way to providing a Docker driver, but it's not released yet, so I didn't even try it. Too risky.
Other solutions I found but was unable to install (and therefore to try) are Flocker and GlusterFS. Both are designed to create filesystem volumes based on several disks from several machines, but neither of their repositories was working these past weeks.
Sorry for giving you only weak solutions, but I'm facing the same problem and haven't found a perfect one yet.
Cheers,
Olivier
