I am trying to understand what a StatefulSet is, but I can't quite get it. I only know that it is like a ReplicaSet and that copies of the container will be created. I have also understood that it is appropriate for databases, but the rest I could not follow. Can anybody explain it, maybe with an example or a use case?
Thanks for your help.
As the name says, StatefulSets are for stateful applications: in Kubernetes, if you are running a stateful application, you should use a StatefulSet instead of a Deployment.
StatefulSets are used for applications that store data, keep sessions, and otherwise handle state.
For example, a StatefulSet can be useful for Elasticsearch or Redis.
Yes, it is similar to a ReplicaSet but with some extra features, like ensuring that only one pod is created at a time and allowing a templated volume config so that each pod gets its own volume (unlike an RS, where the pod template results in every pod being identical). These features are aimed specifically at stateful applications like databases, though the specifics depend on the database and are out of scope for an SO question.
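For illustration, a StatefulSet with a templated volume claim might look roughly like this (the names, image, and sizes are made up for the example):

    # Hypothetical 3-replica StatefulSet: each pod (db-0, db-1, db-2) gets its
    # own PersistentVolumeClaim from volumeClaimTemplates, unlike a ReplicaSet
    # where every pod is identical.
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      serviceName: db            # headless Service giving each pod a stable DNS name
      replicas: 3
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: db
              image: postgres:15
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data
      volumeClaimTemplates:      # one PVC is created per replica
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi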
I am somewhat new to Kubernetes and Docker, and I have been studying the concepts of statelessness and statefulness. I understand that stateless microservices don't store data on the host, whereas stateful microservices require some kind of storage on the host that serves the requests. But if it were up to me, I would always use a stateful one. Why should I ever use a stateless pod? What is the advantage of statelessness?
A typical Kubernetes Pod will be managed by a higher-level controller like a Deployment. You might set the Deployment to have replicas: 3 so that if one of them fails the other two can pick up the load. On an update the existing Pods will get deleted and recreated. If there's heavy load, you can set up a HorizontalPodAutoscaler to increase that replica count for you, which will create more pods when needed.
All of this is really straightforward if your containers are stateless, and there are no consequences to kubectl delete pod.
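A minimal sketch of that pattern, with made-up names, image, and thresholds:

    # Stateless Deployment with 3 replicas, plus an HPA that can raise the
    # replica count under CPU load.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: nginx:1.25
              resources:
                requests:
                  cpu: 100m
    ---
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 3
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80

With something like this, kubectl delete pod on any replica is harmless; the Deployment simply recreates it.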
The problem with a stateful pod is, well, the state. Kubernetes gives you some choices on where to store data, but most of them can only be used on one pod at a time; if you have multiple replicas then each generally needs its own local storage, and the application needs to know how to reconcile the multiple copies of it. (Or, if you can set up something like an NFS server, the application needs to know how to handle concurrent writes.) Operationally, you need to know how to back up and restore all of the individual little volumes that are getting created along the way.
A standard approach is to minimize the number of places where state is stored, and have stateless applications read and write that state over the network. The state doesn't even need to be in the cluster: if your application is running in AWS, you could have containers that principally store data in RDS hosted relational databases and Amazon's S3 object store but keep nothing locally, and you can then use normal backup and management approaches for those out-of-cluster stores.
I have 3 replicas of the same application running in Kubernetes, and the application exposes an endpoint. Hitting the endpoint sets a boolean variable to true, which is then used elsewhere in my application. The issue is that when the endpoint is hit, the variable is updated in only one of the replicas. How can I make the change in all replicas by hitting one endpoint?
You need to store your data in a shared database of some kind, not locally in memory. If all you need is a temporary flag, Redis would be a popular choice; for more durable data, Postgres is the gold standard. But there's a wide and wonderful world of databases out there; explore which ones match your use case.
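For instance (the names and image tag below are just placeholders), you could run one shared Redis behind a Service and have every replica SET/GET the flag there instead of keeping it in process memory:

    # Hypothetical in-cluster Redis used as the shared flag store; all app
    # replicas read and write the flag via the "redis" Service DNS name.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
            - name: redis
              image: redis:7
              ports:
                - containerPort: 6379
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: redis
    spec:
      selector:
        app: redis
      ports:
        - port: 6379

Each replica then talks to redis:6379 through the Service, so all of them see the same flag.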
Seems like you're trying to solve an issue with your app using Kubernetes.
Let me elaborate:
If you have several pods behind a service, you can't reach all of them with a single request. This has been proposed here, but in my opinion it isn't best practice.
If you need to share data between your apps, you can have them communicate with each other using a cluster service.
You might assume you can share data using Kubernetes volumes, such as gcePersistentDisk or some other kind of volume, but volumes were not meant to solve this kind of problem.
In conclusion, the way I would solve such an issue is by having the pods communicate changes to each other. Kubernetes can't solve this for you.
EDIT:
Another approach could be shared storage (for example, a single pod running MongoDB), but I would say that's a bit of an overkill for your issue, also because in order to communicate with this pod you would need in-cluster communication anyway.
I'm trying to figure out if Kubernetes will work for a certain use case. I understand the networking/clustering concept, and even the load balancing and how that can be used with things like nginx. However, assuming this is not deployed on a public cloud and things like ELB won't be available, could it still be used for a high-speed networking application using DPDK? For example, if we assume the cluster networking provided by k8s is only used for the control/management path, and the containers themselves handle the NIC directly with DPDK, is this something it's commonly used for?
Secondly, I understand the replication controller and petsets feature I think, but I'm not really clear on whether the intent of those features is for high availability or not. It seems that the "pod fails and the RC replaces it on a different node" isn't necessarily for HA, and there aren't really guarantees on how fast it builds a new pod. Am I incorrect?
For the second question: if the replication controller has a size larger than 1, it is highly available.
For example, if you have a service "web-svc" in front of the replication controller "web-app" with size 3, then your requests will be load balanced to one of the 3 pods:
web-svc ----> {web-app-pod1, web-app-pod2, web-app-pod3}
If some of the 3 pods fail, kubernetes will replace them with new ones.
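A minimal sketch of that setup, reusing the names above (the image is a placeholder):

    # ReplicationController "web-app" keeps 3 pods alive; Service "web-svc"
    # load-balances requests across whichever of them are healthy.
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: web-app
    spec:
      replicas: 3
      selector:
        app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          containers:
            - name: web-app
              image: nginx:1.25
              ports:
                - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: web-svc
    spec:
      selector:
        app: web-app
      ports:
        - port: 80
          targetPort: 80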
And a PetSet is similar to a replication controller, but it is used for stateful applications like databases.
I'm sharing the same cluster for 2 namespaces: staging and production. The only differences among the two namespaces are:
Volumes mounted to certain pods (separate persistence between staging and production, obviously!)
A couple of web-URLs for relative addressing
A couple of IPs to databases used for sophisticated persistence
I have managed to address (2) and (3) as follows, so as to maintain a single YAML file for all ReplicationControllers:
Use ConfigMaps local to a namespace to define any configuration that is passed via environment variables into the pods
Use Services with Endpoints to handle a DNS entry pointing to different internal IPs
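For the database IPs in particular, the selector-less Service plus Endpoints combination looks roughly like this in each namespace (the name "db" and the IP are placeholders):

    # Per-namespace Service + Endpoints: the app always connects to "db:5432",
    # but the IP behind it differs between staging and production.
    apiVersion: v1
    kind: Service
    metadata:
      name: db
      namespace: staging
    spec:
      ports:
        - port: 5432
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: db               # must match the Service name
      namespace: staging
    subsets:
      - addresses:
          - ip: 10.0.0.10    # staging database IP (placeholder)
        ports:
          - port: 5432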
However, I'm unable to find a satisfactory way to parameterize a gcePersistentDisk's pdName; I can't seem to use a ConfigMap for it, so I'm a little stumped. What would be the appropriate way to go about this? The best alternative seems to be maintaining 2 separate YAML files that differ only in a few strings, but that smells bad, as it violates DRY.
Also, any constructive commentary on the rest of my setup as mentioned above is highly appreciated :-)
You could probably create one PersistentVolumeClaim in each namespace. Take a look at "Can a PVC be bound to a specific PV?" for how to "pre-bind" PersistentVolumes to PersistentVolumeClaims.
It might not be an ideal solution, but it should work until PVCs support label selectors.
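A rough sketch of that pre-binding, with made-up disk and claim names (one PV per namespace, each pointing at its own pdName):

    # PersistentVolume reserved for the staging namespace's claim via claimRef;
    # a second PV would do the same for production with its own pdName.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: data-staging
    spec:
      capacity:
        storage: 10Gi
      accessModes: ["ReadWriteOnce"]
      gcePersistentDisk:
        pdName: my-disk-staging   # placeholder disk name
        fsType: ext4
      claimRef:                   # "pre-bind" this PV to a specific PVC
        namespace: staging
        name: data
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
      namespace: staging
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi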
I have the following setup in mind:
Kubernetes on Mesos (based on the kubernetes-mesos project) within a /16 network.
Each pod will have its own IP, and I believe this will allow roughly 64,000 pods.
The idea is to provide isolation for each app, i.e. each app gets its own MySQL within the same pod, and the app accesses MySQL on localhost (within the pod).
If an additional service were needed, I'd use Kubernetes rolling updates to add the service's container to the pod; the app would then be able to access this new service on localhost as well.
Each application needs as much isolation as possible.
Are there any defects to such an implementation?
Do I have to use weave?
There's an option to specify the service-ip-range while running the kubernetes-mesos install.
One open question is how I would scale a service; is this really viable?
Is there a better way to do this? i.e. Offering isolated services
Thanks.
PS: I'm obviously a newbie at this, and I'm trying to get the best possible setup running.
A common misconception is that a Pod should manage a vertical, multi-tier stack: for example a web tier + DB tier together.
It's interesting to read the Kubernetes design intent of Pods: they're for collecting 'helper' processes rather than composing a vertical stack.
To answer your questions, I'd recommend:
Define a Pod template for the web tier only. This can be scaled to any size required, using a replication controller (questions #1 and #3).
Define another Pod for MySQL.
Use the Service abstraction to locate these components.
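Very roughly, that layout could look like this (every name and image is a placeholder):

    # MySQL runs in its own pod behind a Service; the web tier, scaled
    # independently, reaches it via the DNS name "mysql".
    apiVersion: v1
    kind: Service
    metadata:
      name: mysql
    spec:
      selector:
        app: mysql
      ports:
        - port: 3306
    ---
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: web
    spec:
      replicas: 3                 # scale the web tier independently of MySQL
      selector:
        app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: my-web-app:latest   # placeholder image
              env:
                - name: MYSQL_HOST
                  value: mysql           # resolves to the Service above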
This sort of design will work for small applications, but you're right that it'll be tough to scale up if you suddenly want a couple of instances of a service to hit the same MySQL backend.
You may want to look into putting each service into a separate namespace. Then a service's DNS lookups will be scoped to its own namespace by default so that it won't find other services' resources unless it's explicitly looking for them. This would let you put mysql (and any other dependencies) in a separate pod so that the frontend could be scaled independently.