How do I know when I should use a stateless pod or a stateful one?

I'm somewhat new to Kubernetes and Docker, and I've been studying the concepts of statelessness and statefulness. I understand that stateless microservices don't store data on the host, whereas stateful microservices require some kind of storage on the host that serves the requests. But if it were up to me, I would always use a stateful one, so why should I ever use a stateless pod? What is the advantage of statelessness?

A typical Kubernetes Pod will be managed by a higher-level controller like a Deployment. You might set the Deployment to have replicas: 3 so that if one of them fails the other two can pick up the load. On an update the existing Pods get deleted and recreated. If there's heavy load, you can set up a HorizontalPodAutoscaler to increase the replica count for you, creating more Pods when needed.
All of this is really straightforward if your containers are stateless, and there are no consequences to kubectl delete pod.
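As a minimal sketch of that setup (the web name, image, and thresholds below are placeholders, not taken from the question), a stateless Deployment plus a HorizontalPodAutoscaler might look like:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # three interchangeable copies; any of them can serve a request
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:1.0   # placeholder image
        resources:
          requests:
            cpu: 100m          # the autoscaler below scales on CPU relative to this request
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Because none of these Pods owns any data, kubectl delete pod, a rolling update, or a node failure just means the Deployment creates a fresh, identical replacement.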
The problem with a stateful pod is, well, the state. Kubernetes gives you some choices on where to store data, but most of them can only be used on one pod at a time; if you have multiple replicas then each generally needs its own local storage, and the application needs to know how to reconcile the multiple copies of it. (Or, if you can set up something like an NFS server, the application needs to know how to handle concurrent writes.) Operationally, you need to know how to back up and restore all of the individual little volumes that are getting created along the way.
A standard approach is to minimize the number of places where state is stored, and to use network I/O from stateless applications to put data in those places. The state doesn't even need to be in the cluster: if your application is running in AWS, you could have containers that principally store data in RDS-hosted relational databases and Amazon's S3 object store but keep nothing locally, and you can then use normal backup and management approaches for those out-of-cluster stores.

Related

What purpose do ephemeral volumes serve in Kubernetes?

I started learning Kubernetes recently and I've noticed that among the various tutorials online there's almost no mention of Volumes. Tutorials cover Pods, ReplicaSets, Deployments, and Services - but they usually end there with some example microservice app built using a combination of those four. When it comes to databases they simply deploy a pod with the "mongo" image, give it a name and a service so that other pods can see it, and leave it at that. There's no discussion of how the data is written to disk.
Because of this I'm left to assume that with no additional configuration, containers are allowed to write files to disk. I don't believe this implies files are persistent across container restarts, but if I wrote a simple NodeJS application like so:
const fs = require("fs");
fs.writeFileSync("test.txt", "blah");              // write to the container's local filesystem
const value = fs.readFileSync("test.txt", "utf8"); // read the same file straight back
console.log(value);
I suspect this would properly output "blah" and not crash due to an inability to write to disk. (Note that I haven't tested this because, as I'm still learning Kubernetes, I haven't gotten to the point where I know how to put my own custom images in my cluster yet -- I've only loaded images already on Docker Hub so far.)
When reading up on Kubernetes Volumes, however, I came upon the Ephemeral Volume -- a volume that:
get[s] created and deleted along with the Pod
The existence of ephemeral volumes leads me to one of two conclusions:
Containers can't write to disk without being granted permission (via a Volume), and so every tutorial I've seen so far is bunk because mongo will crash when you try to store data
Ephemeral volumes make no sense because you can already write to disk without them, so what purpose do they serve?
So what's up with these things? Why would someone create an ephemeral volume?
Container processes can always write to the container-local filesystem (Unix permissions permitting); but any content that goes there will be lost as soon as the pod is deleted. Pods can be deleted fairly routinely (if you need to upgrade the image, for example) or outside your control (if the node it was on is terminated).
In the documentation, the list of ephemeral volume types highlights two major use cases:
emptyDir volumes, which are generally used to share content between containers in a single pod (and more specifically to publish data from an init container to the main container); and
injecting data from a configMap, the downward API, or another data source that might be totally artificial
In both of these cases the data "acts like a volume": you specify where it comes from, and where it gets mounted, and it hides any content that was in the underlying image. The underlying storage happens to not be persistent if a pod is deleted and recreated, unlike persistent volumes.
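As a minimal sketch of the first case (the pod name, container names, and commands are made up for illustration), an init container writes into an emptyDir volume and the main container reads it; the data disappears when the Pod does:
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  initContainers:
  - name: prepare
    image: busybox
    command: ["sh", "-c", "echo generated-config > /work/config.txt"]
    volumeMounts:
    - name: scratch
      mountPath: /work
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "cat /work/config.txt && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /work         # hides whatever the image had at this path
  volumes:
  - name: scratch
    emptyDir: {}               # created with the Pod, deleted with the Pod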
Generally prepackaged versions of databases (like Helm charts) will include a persistent volume claim (or create one per replica in a stateful set), so that data does get persisted even if the pod gets destroyed.
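As an illustrative sketch (not taken from any particular chart), a StatefulSet can request one PersistentVolumeClaim per replica through volumeClaimTemplates, so the database files outlive any individual Pod:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        volumeMounts:
        - name: data
          mountPath: /data/db          # where mongod writes its files
  volumeClaimTemplates:                # one PVC per replica; survives Pod deletion
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi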
So what's up with these things? Why would someone create an ephemeral volume?
Ephemeral volumes are more of a conceptual thing. The main need for this concept is driven by microservices and orchestration processes, and is also guided by the twelve-factor app methodology. But why?
Because one major use case is this: when you are deploying a number of microservices (and their replicas) using containers across multiple machines in a cluster, you don't want a container to be reliant on its own storage. If containers rely on their own storage, shutting them down and starting new ones affects the way your app behaves, and this is something everyone wants to avoid. Everyone wants to be able to start and stop containers whenever they want, because this allows easy scaling, updates, etc.
When you actually need a service with persistent data (like a DB) you need a special type of configuration, especially if you are running on a cluster of machines. If you are running on one machine, you could use a mounted volume, just to be sure that your data will persist even after the container is stopped. But if you just want to load balance across hundreds of stateless API services, containers with only ephemeral storage are what you actually want.
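A rough sketch of that persistent-data case (the names and sizes are placeholders, and the stock postgres image is used purely as an example): claim some storage and mount it where the database writes, so the data survives the container being stopped:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: postgres
    image: postgres
    env:
    - name: POSTGRES_PASSWORD
      value: example             # placeholder; the stock image refuses to start without it
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data   # data lands on the claimed volume, not in the container layer
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: db-data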

How to update in all replicas when hitting an endpoint

I have 3 replicas of the same application running in Kubernetes, and the application exposes an endpoint. Hitting the endpoint sets a boolean variable to true, and that variable is used elsewhere in my application. But the issue is that when I hit the endpoint, the variable is updated in only one of the replicas. How do I make the change in all replicas just by hitting one endpoint?
You need to store your data in a shared database of some kind, not locally in memory. If all you need is a temporary flag, Redis would be a popular choice, or for more durable stuff Postgres is the gold standard. But there's a wide and wonderful world of databases out there, explore which match your use case.
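As a minimal sketch of the Redis option (all names here are illustrative), you would run one shared Redis behind a Service; every replica of your app then reads and writes the flag at the stable DNS name redis:6379 instead of keeping it in its own process memory:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis                    # reachable in-cluster as redis (or redis.<namespace>.svc.cluster.local)
spec:
  selector:
    app: redis
  ports:
  - port: 6379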
Seems like you're trying to solve an issue with your app using Kubernetes.
Let me elaborate:
If you have several pods behind a service, you can't access all of them using a single request. This has been proposed here, but in my opinion it isn't best practice.
If you need to share data between your apps, you can have them communicate with each other using a cluster service.
You probably assume you can share data using Kubernetes volumes, such as gcePersistentDisk or some other sort of volume, but then again, volumes were not meant to solve such problems.
In conclusion, the way I would solve such an issue is by having the pods communicate changes with each other. Kubernetes can't solve this for you.
EDIT:
Another approach could be having shared storage (for example, a single pod containing MongoDB), but I would say that's a bit of overkill for your issue, and in order to communicate with that pod you would probably need in-cluster communication anyway.

Kubernetes scaling pods using custom algorithm

Our cloud application consists of 3 tightly coupled Docker containers: Nginx, Web and Mongo. Currently we run these containers on a single machine. However, as our users are increasing, we are looking for a solution to scale. Using Kubernetes we would form a multi-container pod. If we are to replicate, we need to replicate all 3 containers as a unit. Our cloud application is consumed by mobile app users. Our app can only handle approx 30000 users per worker node, and we intend to place a single pod on a single worker node. Once a mobile device is connected to a worker node it must continue to only use that machine (unique IP address).
We plan on using Kubernetes to manage the containers. Load balancing doesn't work for our use case as a mobile device needs to be tied to a single machine once assigned and each Pod works independently with its own persistent volume. However we need a way of spinning up new Pods on worker nodes if the number of users goes over 30000 and so on.
The idea is we have some sort of custom scheduler which assigns a mobile device a Worker Node ( domain/ IPaddress) depending on the number of users on that node.
Is Kubernetes a good fit for this design, and how could we implement a custom pod scaling algorithm?
Thanks
Piggy-Backing on the answer of Jonah Benton:
While this is technically possible - your problem is not with Kubernetes, it's with your application! Let me point out the problems:
Our cloud application consists of 3 tightly coupled Docker containers, Nginx, Web, and Mongo.
Here is your first problem: if you can only deploy these three containers together and not independently, you cannot scale one without the others!
While MongoDB can be scaled to insane loads - if it's bundled with your web server and web application, it won't be able to scale on its own...
So the first step for you is to break up these three components so they can be managed independently of each other. Next:
Currently we run these containers on a single machine.
While not strictly a problem - I have serious doubts about whether you've considered what it would mean to scale your application and the challenges that come with scalability!
Once a mobile device is connected to a worker node it must continue to only use that machine (unique IP address)
Now, this IS a problem. You're looking to run an application on Kubernetes, but I do not think you understand the consequences of doing that: Kubernetes orchestrates your resources. This means it will move pods (by killing and recreating them) between nodes (and, if necessary, back onto the same node). It does this fully autonomously (which is awesome and gives you a good night's sleep). If you're relying on clients sticking to a single node's IP, you're going to get up in the middle of the night because Kubernetes tried to correct for a node failure and moved your pod, which is now gone, and your users can't connect anymore. You need to leverage the load-balancing features (Services) in Kubernetes. Only they are able to handle the dynamic changes that happen in Kubernetes clusters.
Using Kubernetes we would form a multi container pod.
And we have another winner - No! You're trying to treat Kubernetes as if it were your on-premise infrastructure! If you keep doing so you're going to fail and curse Kubernetes in the process!
Now that I've told you some of the things you're thinking about wrongly - what kind of person would I be if I did not offer some advice on how to make this work:
In Kubernetes your three applications should not run in one pod! They should run in separate pods:
your web server's work should be done by an Ingress, and since you're already familiar with nginx, an nginx-based ingress controller is probably the one you are looking for!
Your web application should be a simple Deployment and be exposed to the Ingress through a Service
your database should be a separate deployment, which you can either manage manually through a StatefulSet or (more advanced) through an operator, and it should also be exposed to the web application through a Service (see the sketch after this list)
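A rough sketch of the last two points (the names, ports, and hostname are placeholders, and a Deployment whose Pods carry the label app: web is assumed to exist already):
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                     # matches the Pods created by the web Deployment
  ports:
  - port: 80
    targetPort: 8080             # assumed container port
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  rules:
  - host: app.example.com        # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80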
Feel free to ask if you have any more questions!
Building a custom scheduler and running multiple schedulers at the same time is supported:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
That said, to the question of whether Kubernetes is a good fit for this design - my answer is: not really.
K8s can be difficult to operate, with the payoff being the level of automation and resiliency that it provides out of the box for whole classes of workloads.
This workload is not one of those. In order to gain any benefit you would have to write a scheduler to handle the edge failure and error cases this application has (what happens when you lose a node for a short period of time...) in a way that makes sense for k8s. And you would have to come up to speed with normal k8s operations.
With the information provided, I'd be hard pressed to see why one would use k8s for this workload over just running Docker on some VMs and scripting some of the automation.

Kubernetes With DPDK

I'm trying to figure out if Kubernetes will work for a certain use case. I understand the networking/clustering concept, and even the load balancing and how that can be used with things like nginx. However, assuming this is not deployed on a public cloud and things like ELB won't be available, could it still be used for a high-speed networking application using DPDK? For example, if we assume the cluster networking provided by k8s is only used for the control/management path, and the containers themselves handle the NIC directly with DPDK, is this something it's commonly used for?
Secondly, I think I understand the replication controller and pet sets features, but I'm not really clear on whether the intent of those features is high availability or not. It seems that "pod fails and the RC replaces it on a different node" isn't necessarily for HA, and there aren't really guarantees on how fast it builds a new pod. Am I incorrect?
For the second question: if the replication controller has a size larger than 1, it is highly available.
For example, if you have a service "web-svc" in front of the replication controller "web-app" with size 3, then your requests will be load balanced to one of the 3 pods:
web-svc ----> {web-app-pod1, web-app-pod2, web-app-pod3}
If some of the 3 pods fail, kubernetes will replace them with new ones.
And a pet set is similar to a replication controller, but is used for stateful applications like databases.

Kubernetes on Mesos

I Have the following setup in mind:
Kubernetes on Mesos (based on the kubernetes-mesos project) within a /16 network.
Each pod will have its own IP, and I believe this will allow for about 64,000 pods.
The idea is to provide isolation for each app, i.e. each app gets its own mysql within the same pod - the app accesses mysql on localhost (within the pod).
If an additional service were needed, I'd use kubernetes rolling updates to add the service's container to the pod, the app will be able to access this new service on localhost as well.
Each application needs as much isolation as possible.
Are there any defects to such an implementation?
Do I have to use weave?
There's an option to specify the service-ip-range while running the kubernetes-mesos install.
One hole is: how do I scale a service? Is this really viable?
Is there a better way to do this? i.e. Offering isolated services
Thanks.
PS//I'm obviously a noobie at this and I'm trying to get the best possible setup running.
A common misconception is that a Pod should manage a vertical, multi-tier stack: for example a web tier + DB tier together.
It's interesting to read the Kubernetes design intent of Pods: they're for collecting 'helper' processes rather than composing a vertical stack.
To answer your questions, I'd recommend:
Define a Pod template for the web tier only. This can be scaled to any size required, using a replication controller (questions #1 and #3).
Define another Pod for MySQL.
Use the Service abstraction to locate these components.
This sort of design will work for small applications, but you're right that it'll be tough to scale up if you suddenly want to have a couple of instances of a service hit the same mysql backend.
You may want to look into putting each service into a separate namespace. Then a service's DNS lookups will be scoped to its own namespace by default so that it won't find other services' resources unless it's explicitly looking for them. This would let you put mysql (and any other dependencies) in a separate pod so that the frontend could be scaled independently.
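A small sketch of that layout (the namespace name and port are made up): each app gets its own namespace with its own mysql Service, and a plain lookup of mysql from inside that namespace resolves only to that namespace's copy:
apiVersion: v1
kind: Namespace
metadata:
  name: app-a
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: app-a               # pods in app-a reach it simply as "mysql";
                                 # elsewhere it is mysql.app-a.svc.cluster.local
spec:
  selector:
    app: mysql
  ports:
  - port: 3306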
