Kubernetes POD Failover

Kubernetes POD Failover - jenkins

I am toying around with Kubernetes and have managed to deploy a stateful application (a Jenkins instance) to a single node.
It uses a PVC to make sure that I can persist my jenkins data (jobs, plugins etc).
Now I would like to experiment with failover.
My cluster has 2 DigitalOcean droplets.
Currently my jenkins pod is running on just one node.
When that goes down, Jenkins becomes unavailable.
I am now looking into how to accomplish failover, in the sense that when the Jenkins pod goes down on my node, it will spin up on the other node (so a short downtime during this process is ok).
Of course it has to use the same PVC, so that my data remains intact.
From what I've read, I believe a StatefulSet can be used for this?
Any pointers are much appreciated!
Best regards

Digital Ocean's Kubernetes service only supports ReadWriteOnce access modes for PVCs (see here). This means the volume can only be attached to one node at a time.
I came across this blog post which, while focused on Jenkins on Azure, has the same situation of only supporting ReadWriteOnce. The author states:
the drawback for me though lies in the fact that the access mode for Azure Disk persistent volumes is ReadWriteOnce. This means that an Azure disk can be attached to only one cluster node at a time. In the event of a node failure or update, it could take anywhere between 1-5 minutes for the Azure disk to get detached and attached to the next available node.
Note that Pod failures and node failures are different things. Since DO only supports ReadWriteOnce, there's no benefit to trying anything more sophisticated than what you have right now in terms of tolerance to node failure. Since it's ReadWriteOnce, the volume will need to be unmounted from the failing node and re-mounted to the new node, and then a new Pod will get scheduled on the new node. Kubernetes will do this for you, and there's not much you can do to optimize it.
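The ReadWriteOnce constraint described above can be illustrated with a small toy model. This is not the Kubernetes API: RWOVolume, fail_over and the node names are hypothetical, and the real attach/detach controller has its own timeouts and force-detach behaviour that this sketch ignores. It only shows why the new Pod cannot start until the old attachment is gone.

```python
from typing import Optional


class MultiAttachError(Exception):
    """Raised when a ReadWriteOnce volume is still attached to another node."""


class RWOVolume:
    """Toy model: a ReadWriteOnce volume is attached to at most one node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attached_node: Optional[str] = None

    def attach(self, node: str) -> None:
        # Re-attaching to the same node is a no-op; any other node is refused.
        if self.attached_node is not None and self.attached_node != node:
            raise MultiAttachError(
                f"volume {self.name!r} is already attached to {self.attached_node!r}"
            )
        self.attached_node = node

    def detach(self) -> None:
        self.attached_node = None


def fail_over(volume: RWOVolume, new_node: str, old_node_detached: bool) -> bool:
    """Try to attach the volume to new_node and report whether it worked.

    old_node_detached stands in for "the detach from the failed node has
    completed"; until then the new Pod would stay in ContainerCreating."""
    if old_node_detached:
        volume.detach()
    try:
        volume.attach(new_node)
    except MultiAttachError:
        return False
    return True
```

While the volume is still attached to the failed node, fail_over(v, "node-b", False) returns False, which is the Multi-Attach situation; only once the old attachment is cleared does the move succeed.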
For Pod failure, you could use a Deployment: since you want to read and write the same data, you don't want different PVs attached to the different replicas. There may be very limited benefit to this, though, because the ReadWriteOnce volume can only be attached to one node, so all replicas of the Pod will be running on that same node. It therefore depends on how the Jenkins process scales, and whether it supports that kind of horizontal scale-out model with every replica writing to the same volume (as opposed to simply vertically scaling memory or CPU requests).
If you really want to achieve higher availability in the face of node and/or Pod failures, and the Jenkins workload you're deploying has a hard requirement on local volumes for persistent state, you will need to consider an alternative volume plugin like NFS, or moving to a different cloud provider like GKE.

Yes, you would use a Deployment or StatefulSet depending on the use case. For Jenkins, a StatefulSet would be appropriate. If the running pod becomes unavailable, the StatefulSet controller will see that and spawn a new one.

What you are describing is the default behaviour of Kubernetes for Pods that are managed by a controller, such as a Deployment.
You should deploy any application as a Deployment (or another controller) even if it consists just of a single Pod. You never really deploy Pods directly to Kubernetes. So, in this case, there's nothing special you need to do to get this behaviour.
When one of your nodes dies, the Pod dies too. This is detected by the Deployment controller, which creates a new Pod. This is in turn detected by the scheduler, which assigns the new Pod to a node. Since one of the nodes is down, it will assign the Pod to the other node that is still running. Once the Pod is assigned to this node, the kubelet of this node will run the container(s) of this Pod on this node.
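The scheduling step in that chain can be sketched as a toy function: filter out nodes that are not Ready, then pick one of the rest. The Node class, the "fewest pods" rule and the droplet names are assumptions for illustration; the real kube-scheduler runs many filter and scoring plugins. Also note that in a real cluster, pods on an unreachable node are by default only evicted after a toleration period of about 5 minutes, so the replacement is not created instantly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    ready: bool
    pods: List[str] = field(default_factory=list)


class Unschedulable(Exception):
    """No Ready node is available for the pod."""


def schedule(pod: str, nodes: List[Node]) -> Node:
    """Toy scheduler: drop nodes that are not Ready, then pick the one with
    the fewest pods (a stand-in for the real scoring plugins)."""
    candidates = [n for n in nodes if n.ready]
    if not candidates:
        raise Unschedulable(f"no Ready node for pod {pod!r}")
    chosen = min(candidates, key=lambda n: len(n.pods))
    chosen.pods.append(pod)
    return chosen
```

With one droplet down, the replacement pod can only land on the surviving droplet, which is exactly the behaviour the answer describes.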

Ok, let me try to answer my own question here.
I think Amit Kumar Gupta came the closest to what I believe is going on here.
Since I am using a Deployment and my PVC is ReadWriteOnce, I am basically stuck with one pod, running Jenkins, on one node.
weibeld's answer made me realise that I was asking about a concept that Kubernetes handles by default.
If my pod goes down (in my case I am shutting down a node on purpose by doing a hard power down to simulate a failure), the cluster (controller?) will detect this and spawn a new pod on another node.
All is fine so far, but then I noticed that my new pod was stuck in the ContainerCreating state.
Running a describe on my new pod (the one in ContainerCreating state) showed this:
Warning FailedAttachVolume 16m attachdetach-controller Multi-Attach error for volume "pvc-cb772fdb-492b-4ef5-a63e-4e483b8798fd" Volume is already used by pod(s) jenkins-deployment-6ddd796846-dgpnm
Warning FailedMount 70s (x7 over 14m) kubelet, cc-pool-bg6u Unable to mount volumes for pod "jenkins-deployment-6ddd796846-wjbkl_default(93747d74-b208-421c-afa4-8d467e717649)": timeout expired waiting for volumes to attach or mount for pod "default"/"jenkins-deployment-6ddd796846-wjbkl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[jenkins-home default-token-wd6p7]
Then it started to hit me, this makes sense.
It's a pity, but it makes sense.
Since I did a hard power down on the node, the volume is still attached to it, and as far as the cluster knows it is still in use by the old pod.
So now the controller tries to start a new pod on a new node, but it can't move the volume there, because the node it is attached to has become unreachable.
As I read more on this, I found that DigitalOcean only supports ReadWriteOnce, which now leaves me wondering: how the hell can I achieve a simple failover for a stateful application on a Kubernetes cluster on DigitalOcean that consists of just a couple of simple droplets?

Related

Run the kubernetes pod from the point of failure without restarting

I have deployed an application in Kubernetes that prints numbers from 1-20 in Kubernetes.
While it is printing numbers, there is suddenly an internet failure and the pod crashes after printing the numbers 1-10. The basic pod lifecycle says that the pod will restart and the numbers will be printed again starting from 1, but I want it to continue from where it failed, i.e. after 10.
So basically I am looking for a way to resume the application running in the pod from the point of failure, instead of starting over.
Is there a way to do it? I have read about persistent storage and volumes, but as far as I understand they are basically used to attach volumes to pods so that they can retain data and files.
Please help me with how I can achieve this and demonstrate it as a POC.
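One common way to get this behaviour is checkpointing: the application itself records its progress on persistent storage and reads it back on startup. Kubernetes does not resume a process for you; a restarted container starts fresh, and only the files on a mounted volume survive. The sketch below assumes the checkpoint file lives on a PersistentVolume mounted into the Pod; the function names and the crash_after parameter (used to simulate the failure) are hypothetical.

```python
import os
from typing import List, Optional


def read_checkpoint(path: str) -> int:
    """Return the last number recorded, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def write_checkpoint(path: str, value: int) -> None:
    # Write to a temporary file and rename it, so a crash mid-write
    # never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(value))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def print_numbers(path: str, end: int = 20, crash_after: Optional[int] = None) -> List[int]:
    """Print from (last checkpoint + 1) up to end, checkpointing each step."""
    printed = []
    for n in range(read_checkpoint(path) + 1, end + 1):
        print(n)
        printed.append(n)
        write_checkpoint(path, n)
        if crash_after is not None and n == crash_after:
            raise RuntimeError("simulated crash")  # stands in for the pod dying
    return printed
```

A first run with crash_after=10 stops after printing 10; the next run, which is what the restarted container would do, reads the checkpoint and prints 11 through 20. If the process dies between print and write_checkpoint, that one number is printed again, so this gives at-least-once rather than exactly-once output.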

Can a StatefulSet be of use here?
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
Using StatefulSets
StatefulSets are valuable for applications that require one or more of the following.
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
In the above, stable is synonymous with persistence across Pod (re)scheduling. If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas. Deployment or ReplicaSet may be better suited to your stateless needs.

what is meant by "service level" in docker?

While going through the documentation on getting started with Kubernetes on Docker Desktop, I came across the term "service level". Can anyone help me understand what a service level is?
PS: I am a beginner in Docker and Kubernetes.
thanks in advance :)

It is not entirely clear what "service level" refers to in this case.
It says in your link:
Kubernetes makes sure containers are running to keep your app at the service level you requested in the YAML file
And a little further down:
And now remove that container:
Check back on the result app in your browser at http://localhost:5001 and you’ll see it’s still working. Kubernetes saw that the container had been removed and started a replacement straight away.
Judging from the context, they are referring to the fact that the kube-controller-manager in the Kubernetes control plane continuously watches the state of the cluster and compares it to the desired state. When it discovers a difference (for example, when a pod was removed), it fixes it by adding a new pod to match the number of replicas defined in the deployment.
For example, if the deployment was configured to run N replicas and one is removed, N-1 replicas remain. The kube-controller-manager starts a new pod to achieve the desired state of N replicas.
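The comparison between desired and actual state can be sketched as one pass of a reconcile loop. This is a simplified illustration, not controller code: the reconcile function and the pod-N naming are hypothetical, and the real ReplicaSet controller ranks pods (for example, not-ready ones first) when choosing which to delete.

```python
from typing import List, Tuple


def reconcile(desired: int, running: List[str], next_id: int = 0) -> Tuple[List[str], List[str]]:
    """Compare the desired replica count with the running pods and
    return (pods_to_create, pods_to_delete)."""
    diff = desired - len(running)
    if diff > 0:
        # Too few pods (e.g. one was removed): create the missing replicas.
        return [f"pod-{next_id + i}" for i in range(diff)], []
    if diff < 0:
        # Too many pods: delete the surplus (here simply the last ones listed).
        return [], running[diff:]
    # Actual state already matches the desired state: nothing to do.
    return [], []
```

With N = 3 and one pod removed, reconcile(3, ["pod-a", "pod-b"]) asks for exactly one new pod; running it again after that pod exists returns no work, which is the "service level" being maintained.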
In this case the service level would refer to the number of replicas running, but as mentioned, it is ambiguous...

There are services in kubernetes which you can use to expose applications (containers) running on pods.
You may read through this blog to learn more
https://medium.com/@naweed.rizvi/kubernetes-setup-local-cluster-with-docker-desktop-7ead3b17bf68
You can also watch this tutorial
https://www.youtube.com/watch?v=CX8AnwTW2Zs&t=272s

HPA Implementation on single node kubernetes cluster

I am running a Kubernetes cluster on GKE. I'm running a monolithic application and am now migrating to microservices, so both are running in parallel on the cluster.
The monolithic application is a simple Python app using around 200 MB of memory.
The K8s cluster is a simple single-node GKE cluster with 15 GB of memory and 4 vCPUs.
Now I am thinking of applying HPA to my microservices and the monolithic application.
On the single node I have also installed the Graylog stack (Elasticsearch, MongoDB, Graylog pod), separated into the namespace Devops.
In another namespace, monitoring, Grafana, Prometheus and Alertmanager are running.
An ingress controller and cert-manager are also running.
In the default namespace there is another Elasticsearch for application use, plus Redis and RabbitMQ. These are all single pods, of type StatefulSet or Deployment with a volume.
Can someone suggest how to add a node pool on GKE and autoscale it? When I added a node to the pool and deleted the old node from the GCP console, the whole cluster restarted and the services went down for a while.
I am also thinking of using affinity/anti-affinity, so can someone suggest how to divide the infrastructure and implement HPA?
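For reference, the Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The function below is a minimal sketch of only that formula; the min/max defaults are illustrative, and it ignores stabilization windows, pod readiness and missing metrics. On a single-node cluster the extra replicas still share the same 4 vCPUs and 15 GB, so scaling out only helps if there is headroom or the node pool can grow as well.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HPA formula: scale proportionally to metric / target, skip tiny
    deviations within the tolerance, then clamp to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas at 90% CPU against a 50% target gives ceil(2 * 1.8) = 4 replicas, while 52% against 50% stays put because the ratio 1.04 is inside the tolerance.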

From the wording in your question, I suspect that you want to move your current workloads to the new pool without disruption.
Since this action represents a voluntary disruption, you can start by defining a PodDisruptionBudget to control the number of pods that can be evicted in this voluntary disruption operation:
A PDB limits the number of pods of a replicated application that are down simultaneously from voluntary disruptions.
The settings in the PDB depend on your application and your business needs. For a reference on the values to apply, you can check this.
After this, you can drain the nodes where your application is scheduled: the application will be "protected" by the budget, and drain uses the Eviction API instead of deleting the pods directly, which should make the evictions graceful.
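How the budget gates a drain can be sketched with integer settings only (percentages are not modelled). The names desired_healthy, disruptions_allowed and evict_in_order are hypothetical; the underlying rule, that evictions are allowed while the number of currently healthy pods exceeds the desired healthy count, follows the PDB documentation, and a refused eviction comes back from the API as HTTP 429, which kubectl drain retries.

```python
from typing import List, Optional, Tuple


def desired_healthy(expected: int, min_available: Optional[int] = None,
                    max_unavailable: Optional[int] = None) -> int:
    """Healthy pods the budget insists on (integer settings only)."""
    if min_available is not None:
        return min_available
    if max_unavailable is not None:
        return expected - max_unavailable
    raise ValueError("set min_available or max_unavailable")


def disruptions_allowed(current_healthy: int, desired: int) -> int:
    return max(0, current_healthy - desired)


def evict_in_order(pods_on_node: List[str], current_healthy: int,
                   desired: int) -> Tuple[List[str], List[str]]:
    """Evict pods one by one while the budget allows it; returns (evicted, blocked)."""
    evicted, blocked = [], []
    for pod in pods_on_node:
        if disruptions_allowed(current_healthy, desired) > 0:
            evicted.append(pod)
            current_healthy -= 1  # assume the replacement is not Ready yet
        else:
            blocked.append(pod)   # the real API answers 429; drain retries later
    return evicted, blocked
```

With 3 replicas and minAvailable: 2, only one pod can be evicted at first; the second has to wait until its predecessor's replacement is Ready elsewhere, which is what keeps the service up while you move to the new node pool.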
Regarding affinity, I'm not sure how it fits into the aforementioned goal that you're trying to achieve. However, there is an answer on this particular point in the comments.

Trying to understand what a Kubernetes Worker Node and Pod is compared to a Docker "Service"

I'm trying to learn Kubernetes so I can push my microservices solution to a managed Kubernetes service in the cloud (e.g. Azure Kubernetes Service).
As part of this, I'm trying to understand the main concepts, specifically around Pods + Workers and (in the yml file) Pods + Services. To do this, I'm trying to compare what I have inside my docker-compose file against the new concepts.
Context
I currently have a docker-compose.yml file which contains about 10 images. I've split the solution up into two 'networks': frontend and backend. The backend network contains 3 microservices and cannot be accessed at all via a browser. The frontend network contains a reverse-proxy (aka. Traefik, which is just like nginx) which is used to route all requests to the appropriate backend microservice and a simple SPA web app. All works 100% awesome.
Each backend Microservice has at least one of these:
Web API host
Background tasks host
So this means I could scale out the Web API hosts if required, but I should never scale out the background task hosts.
Here's a simple diagram of the solution:
So if the SPA app tries to request some data with the following route:
https://api.myapp.com/account/1 this will hit the reverse-proxy and match a rule to then forward onto <microservice b>/account/1
So it's from here that I'm trying to learn how to write a Kubernetes deployment file based on these docker-compose concepts.
Questions
Each 'Pod' has its own IP, so I should create a Pod per container. (Yes, a Pod can have multiple containers, and to me that's like saying 'install these software products on the same machine'.)
A 'Worker Node' is what we replicate/scale out, so we should put our Pods into a Node based on the scaling scenario. For example, the Background Task hosts should go into one Node because they shouldn't be scaled. Also, the hardware requirements for that node are really small. While the Web Api's should go into another Node so they can be replicated/scaled out
If I'm on the right path with the understanding above, then I'll have a lot of nodes and pods ... which feels ... weird?

The pod is the unit of Workload, and has one or more containers. Exactly one container is normal. You scale that workload by changing the number of Pod Replicas in a ReplicaSet (or Deployment).
A Pod is mostly an accounting construct with no direct parallel to base docker. It's similar to docker-compose's Service. A pod is mostly immutable after creation. Like every resource in kubernetes, a pod is a declaration of desired state - containers to be running somewhere. All containers defined in a pod are scheduled together and share resources (IP, memory limits, disk volumes, etc).
All Pods within a ReplicaSet are both fungible and mortal - a request can be served by any pod in the ReplicaSet, and any pod can be replaced at any time. Each pod does get its own IP, but a replacement pod will probably get a different IP. And if you have multiple replicas of a pod they'll all have different IPs. You don't want to manage or track pod IPs. Kubernetes Services provide discovery (how do I find those pods' IPs) and routing (connect to any Ready pod without caring about its identity) and load balancing (round robin over that group of Pods).
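The routing idea, send each request to any Ready pod without tracking pod identities, can be sketched as a toy round-robin picker. ToyService and the IPs are hypothetical. Also note that the answer's "round robin" is a simplification: kube-proxy in iptables mode picks a backend at random, while IPVS mode defaults to round-robin.

```python
import itertools
from typing import Dict


class ToyService:
    """Round-robins over whichever endpoints are currently Ready."""

    def __init__(self, endpoints: Dict[str, bool]) -> None:
        self.endpoints = endpoints            # pod IP -> Ready?
        self._counter = itertools.count()

    def pick(self) -> str:
        # Recompute the Ready set on every call, so replaced or unready
        # pods drop out and new pod IPs are picked up automatically.
        ready = sorted(ip for ip, ok in self.endpoints.items() if ok)
        if not ready:
            raise RuntimeError("no Ready endpoints")
        return ready[next(self._counter) % len(ready)]
```

When a pod is replaced, its old IP is simply removed from endpoints and the new one added; callers keep calling pick() and never need to know either IP.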
A Node is the compute machine (VM or Physical) running a kernel and a kubelet and a dockerd. (This is a bit of a simplification. Other container runtimes than just dockerd exist, and the virtual-kubelet project aims to turn this assumption on its head.)
All pods are scheduled on Nodes. When a pod (with its containers) is scheduled on a node, the kubelet running on that node becomes responsible for it. The kubelet talks to dockerd to start the containers.
Once scheduled on a node, a pod is not moved to another node. Nodes are fungible & mortal too, though. If a node goes down or is being decommissioned, the pod will be evicted/terminated/deleted. If that pod was created by a ReplicaSet (or Deployment) then the ReplicaSet Controller will create a new replica of that pod to be scheduled somewhere else.
You normally start many (1-100) pods+containers on the same node+kubelet+dockerd. If you have more pods than that (or they need a lot of CPU/RAM/IO), you need more nodes. So nodes are also a unit of scale, though only very indirectly with respect to the web app.
You do not normally care which Node a pod is scheduled on. You let kubernetes decide.

Is there a best practice to reboot a cluster

I followed Alex Ellis' excellent tutorial that uses kubeadm to spin-up a K8s cluster on Raspberry Pis. It's unclear to me what the best practice is when I wish to power-cycle the Pis.
I suspect sudo systemctl reboot is going to result in problems. I'd prefer not to delete and recreate the cluster each time starting with kubeadm reset.
Is there a way that I can shutdown and restart the machines without deleting the cluster?
Thanks!

This question is quite old but I imagine others may eventually stumble upon it so I thought I would provide a quick answer because there is, in fact, a best practice around this operation.
The first thing that you're going to want to ensure is that you have a highly available cluster. This consists of at least 3 masters and 3 worker nodes. Why 3? With three control-plane (etcd) members, a majority of them, a quorum, can still be formed while any one member is down, so the cluster state stays consistent and writable.
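The quorum arithmetic behind "why 3" is short enough to compute. A majority of n voting members is floor(n/2) + 1, and the cluster survives as many failures as it has members beyond that majority. This applies to the etcd members (typically one per control-plane node); worker nodes do not vote.

```python
def quorum(members: int) -> int:
    """Smallest majority of the voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority remains."""
    return members - quorum(members)


for n in range(1, 6):
    print(n, quorum(n), tolerated_failures(n))
```

This prints that 1 or 2 members tolerate no failure, 3 or 4 tolerate one, and 5 tolerate two, which is why odd sizes are recommended: a fourth member adds no extra fault tolerance over three.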
Now that you have an HA Kubernetes cluster, you're going to have to go through every single one of your application manifests and ensure that you have specified resource requests and limits. Requests ensure that a pod will never be scheduled on a node without the required resources. Furthermore, in the event that a pod has a bug that causes it to consume a highly abnormal amount of resources, the limits will prevent it from taking down your cluster.
Now that that is out of the way, you can begin the process of rebooting the cluster. The first thing you're going to do is reboot your masters. So run kubectl drain $MASTER against one of your (at least) three masters. This cordons the node, so the scheduler stops placing new pods on it, and then evicts the pods running there so that their controllers recreate them on your other nodes.
Use kubectl describe node $MASTER to monitor the node until all pods have been removed. Now you can safely connect to it and reboot it. Once it has come back up, run kubectl uncordon $MASTER and the scheduler will once again place Pods on it. Once again, use kubectl describe node $MASTER until you have confirmed that all pods are READY.
Repeat this process for all of the masters. After the masters have been rebooted, you can safely repeat this process for all three (or more) worker nodes. If you properly perform this operation you can ensure that all of your applications will maintain 100% availability provided they are using multiple pods per service and have proper Deployment Strategy configured.
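The one-node-at-a-time order above can be written down as a plan generator that only builds the command list and runs nothing. The function name and node names are hypothetical, and the reboot line is a placeholder for however you reach the Pi (for example over SSH). The kubectl subcommands used (drain with --ignore-daemonsets, wait on the node's Ready condition, uncordon) are standard, but check them against your kubectl version.

```python
from typing import List


def rolling_reboot_plan(masters: List[str], workers: List[str]) -> List[str]:
    """Build the reboot sequence: control-plane nodes first, then workers,
    strictly one node at a time."""
    plan: List[str] = []
    for node in masters + workers:
        plan += [
            f"kubectl drain {node} --ignore-daemonsets",
            f"<reboot {node} here, e.g. over SSH>",  # placeholder, not a command
            f"kubectl wait --for=condition=Ready node/{node} --timeout=10m",
            f"kubectl uncordon {node}",
        ]
    return plan


print("\n".join(rolling_reboot_plan(["master-1"], ["worker-1"])))
```

Keeping the plan as plain data makes it easy to review before executing it by hand; the important property is that a node is uncordoned and Ready again before the next one is drained.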
