I have a single-node Kubernetes cluster running on GKE. All the load is running on that single node, separated by namespaces.
Now I would like to implement auto-scaling. Is it possible to scale the microservices onto a new node while keeping one pod running on my main node only?
What I am thinking:
Main node: running everything with 1-pod availability (Redis, Elasticsearch)
Scaled-up node: scaled-up replicas of the stateless microservices only
So is there any way I can implement this using the node autoscaler or using affinity?
The issue is that right now I am running Graylog, Elasticsearch, Redis and RabbitMQ on a single node as StatefulSets backed by volumes, and I would have to redeploy everything and edit the YAML files to add affinity to all of them.
I'm not sure that I understand your question correctly, but if I do, you may try to use taints and tolerations (combined with node affinity). Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. All the details are available in the documentation here.
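As a rough sketch of that idea, assuming a hypothetical taint key dedicated=scale-out and node name scaled-up-node: taint the scaled-up node and add a matching toleration to the stateless Deployments, so only they can land there while the StatefulSets stay on the main node.

kubectl taint nodes scaled-up-node dedicated=scale-out:NoSchedule

# In the pod template of each stateless Deployment:
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "scale-out"
    effect: "NoSchedule"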
Assuming the issue you have is that the persistent volumes bound to your StatefulSets are only accessible from one node, then you can use the nodeAffinity field to constrain where the StatefulSet Pods can be scheduled. As mentioned in the documentation:
A PV can specify node affinity to define constraints that limit what
nodes this volume can be accessed from. Pods that use a PV will only
be scheduled to nodes that are selected by the node affinity.
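As an illustration only (the node name, path, and size are assumptions, and a local volume is used for simplicity), a PersistentVolume pinned to the main node could look like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: graylog-data
spec:
  capacity:
    storage: 50Gi            # assumed size
  accessModes:
  - ReadWriteOnce
  local:
    path: /mnt/disks/graylog # assumed path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - main-node        # assumed node name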
I've started learning Kubernetes with Docker and I've been thinking: what happens if the master node dies or fails? I've already read the answers here, but they don't cover the remedy for it.
Who is responsible for bringing it back? And how do you bring it back? Can there be a backup master node to avoid this? If yes, how?
Basically, I'm asking for the recommended way to handle master failure in a Kubernetes setup.
You should have multiple VMs serving as master nodes to avoid a single point of failure. An odd number of master nodes, 3 or 5, is recommended for quorum. Put a load balancer in front of all the VMs serving as master nodes; it balances the traffic, and if one master node dies the load balancer should mark that VM's IP as unhealthy and stop sending traffic to it.
Also, the etcd cluster is the brain of a Kubernetes cluster, so you should have multiple VMs serving as etcd nodes. Those VMs can be the same VMs as the master nodes, or, for a reduced blast radius, you can have separate VMs for etcd. Again, the number of VMs should be an odd number, 3 or 5. Make sure to take periodic backups of the etcd data so that you can restore the cluster state in case of a disaster.
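A rough sketch of such a backup with etcdctl (the endpoint and certificate paths are assumptions based on a default kubeadm layout):

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key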
Check the official docs on how to install an HA Kubernetes cluster using kubeadm.
In short, for Kubernetes to function properly you need the master nodes to be available at all times. There are different methods to run redundant copies of the master node so that it remains available on failure. As an example, check this - https://kubernetes.io/docs/tasks/administer-cluster/highly-available-master/
Abhishek, you can run the master node in high availability. As a first step, you should set up the control plane (aka master node) behind a load balancer. If you plan to upgrade a single control-plane kubeadm cluster to high availability, you should specify --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or the IP address of a load balancer.
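For example, a sketch of the kubeadm invocation (the DNS name and port are placeholders for your load balancer's endpoint):

kubeadm init --control-plane-endpoint "k8s-api.example.com:6443" --upload-certs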
By default, for security reasons, the master node does not host Pods. If you want to allow Pods to be scheduled on the master node, you can remove the taint with the following command (note the trailing dash, which removes the taint):
kubectl taint nodes --all node-role.kubernetes.io/master-
If you want to manually restore the master, make sure you back up the etcd data directory /var/lib/etcd. You can restore it on the new master and it should work. Read about highly available Kubernetes over here.
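A sketch of restoring from an etcdctl snapshot onto the new master (the snapshot path is an assumption, and the target data directory must be empty):

ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd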
I am toying around with Kubernetes and have managed to deploy a stateful application (a Jenkins instance) to a single node.
It uses a PVC to make sure that I can persist my jenkins data (jobs, plugins etc).
Now I would like to experiment with failover.
My cluster has 2 DigitalOcean droplets.
Currently my jenkins pod is running on just one node.
When that goes down, Jenkins becomes unavailable.
I am now looking at how to accomplish failover, in the sense that when the Jenkins pod goes down on one node, it will spin up on the other node (so a short downtime during this process is ok).
Of course it has to use the same PVC, so that my data remains intact.
I believe, from what I have read, that a StatefulSet can be used for this?
Any pointers are much appreciated!
Best regards
Digital Ocean's Kubernetes service only supports ReadWriteOnce access modes for PVCs (see here). This means the volume can only be attached to one node at a time.
I came across this blogpost which, while focused on Jenkins on Azure, has the same situation of only supporting ReadWriteOnce. The author states:
the drawback for me though lies in the fact that the access mode for Azure Disk persistent volumes is ReadWriteOnce. This means that an Azure disk can be attached to only one cluster node at a time. In the event of a node failure or update, it could take anywhere between 1-5 minutes for the Azure disk to get detached and attached to the next available node.
Note that Pod failures and node failures are different things. Since DO only supports ReadWriteOnce, there's no benefit to trying anything more sophisticated than what you have right now in terms of tolerance to node failure. Because the access mode is ReadWriteOnce, the volume will need to be unmounted from the failing node and re-mounted to the new node, and then a new Pod will get scheduled on the new node. Kubernetes will do this for you, and there's not much you can do to optimize it.
For Pod failure, you could use a Deployment: since you want to read and write the same data, you don't want different PVs attached to the different replicas. There may be very limited benefit to this, though, because all the replicas of the Pod would run on the same node, so it depends on how the Jenkins process scales and whether it can support that kind of horizontal scale-out model while all replicas write to the same volume (as opposed to simply scaling memory or CPU requests vertically).
If you really want to achieve higher availability in the face of node and/or Pod failures, and the Jenkins workload you're deploying has a hard requirement on local volumes for persistent state, you will need to consider an alternative volume plugin like NFS, or moving to a different cloud provider like GKE.
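If you do go the NFS route, a rough sketch of a PersistentVolume that can be mounted from any node (the server address and export path are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-nfs
spec:
  capacity:
    storage: 20Gi
  accessModes:
  - ReadWriteMany        # NFS allows mounting from multiple nodes
  nfs:
    server: 10.0.0.10    # placeholder NFS server address
    path: /exports/jenkins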
Yes, you would use a Deployment or StatefulSet depending on the use case. For Jenkins, a StatefulSet would be appropriate. If the running pod becomes unavailable, the StatefulSet controller will see that and spawn a new one.
What you are describing is the default behaviour of Kubernetes for Pods that are managed by a controller, such as a Deployment.
You should deploy any application as a Deployment (or another controller) even if it consists just of a single Pod. You never really deploy Pods directly to Kubernetes. So, in this case, there's nothing special you need to do to get this behaviour.
When one of your nodes dies, the Pod dies too. This is detected by the Deployment controller, which creates a new Pod. This is in turn detected by the scheduler, which assigns the new Pod to a node. Since one of the nodes is down, it will assign the Pod to the other node that is still running. Once the Pod is assigned to this node, the kubelet of this node will run the container(s) of this Pod on this node.
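As a minimal sketch of such a single-replica Deployment (names and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
      - name: jenkins
        image: jenkins/jenkins:lts   # illustrative image
        ports:
        - containerPort: 8080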
Ok, let me try to answer my own question here.
I think Amit Kumar Gupta came the closest to what I believe is going on here.
Since I am using a Deployment and my PVC is ReadWriteOnce, I am basically stuck with one pod, running Jenkins, on one node.
weibeld's answer made me realise that I was asking about behaviour that Kubernetes provides by default.
If my pod goes down (in my case I am shutting down a node on purpose with a hard power-off to simulate a failure), the cluster (controller?) will detect this and spawn a new pod on another node.
All is fine so far, but then I noticed that my new pod was stuck in the ContainerCreating state.
Running a describe on my new pod (the one in ContainerCreating state) showed this:
Warning FailedAttachVolume 16m attachdetach-controller Multi-Attach error for volume "pvc-cb772fdb-492b-4ef5-a63e-4e483b8798fd" Volume is already used by pod(s) jenkins-deployment-6ddd796846-dgpnm
Warning FailedMount 70s (x7 over 14m) kubelet, cc-pool-bg6u Unable to mount volumes for pod "jenkins-deployment-6ddd796846-wjbkl_default(93747d74-b208-421c-afa4-8d467e717649)": timeout expired waiting for volumes to attach or mount for pod "default"/"jenkins-deployment-6ddd796846-wjbkl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[jenkins-home default-token-wd6p7]
Then it started to hit me: this makes sense.
It's a pity, but it makes sense.
Since I did a hard power down on the node, the PV went down with it.
So now the controller tries to start a new pod on a new node, but it can't move the PV over, since the volume attached to the previous node became unreachable.
As I read more on this, I learned that DigitalOcean only supports ReadWriteOnce, which now leaves me wondering: how the hell can I achieve a simple failover for a stateful application on a Kubernetes cluster on DigitalOcean that consists of just a couple of simple droplets?
I have 1 Kubernetes master server and 9 nodes. Of those, I want to run the backend on 2 nodes, the frontend on 2 nodes, and the DB on 3 nodes.
For the backend, frontend, and DB I have Docker images ready.
How do I run an image using Kubernetes on only the desired nodes (2 or 3)?
Please share some ideas on how to achieve this.
Most of the time the Kubernetes scheduler will do a good job of distributing the pods across the cluster. You may want to leave that responsibility to the scheduler unless you have very specific requirements.
If you want to control this, you can use:
Node selectors
Node Affinity or Anti-Affinity
Directly specify the node name in the deployment spec
Of these three, the recommended approach is to use node affinity or anti-affinity because of its flexibility; a sketch follows below.
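For instance, a rough sketch of required node affinity in a pod template, assuming the target nodes carry a hypothetical role=backend label:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - backend   # hypothetical node label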
Run the frontend as a Deployment with the desired replica count and let Kubernetes manage it for you.
Run the backend as a Deployment with the desired number of replicas and Kubernetes will figure out how to run it. Use node selectors if you prefer specific nodes.
Run the DB as a Deployment or a StatefulSet; Kubernetes will figure out how to run it.
https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/
Use network policies to restrict traffic.
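As an illustration of that last point, a minimal NetworkPolicy sketch that only lets backend pods reach the DB pods (the app labels are assumptions):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-backend
spec:
  podSelector:
    matchLabels:
      app: db           # assumed label on the DB pods
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend  # assumed label on the backend pods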
You may use labels and nodeSelector. Here it is:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
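In short, a sketch of that approach (node and label names are placeholders): label the nodes, then add a matching nodeSelector to the pod template.

kubectl label nodes node-1 node-2 tier=backend

# In the Deployment's pod template:
spec:
  nodeSelector:
    tier: backend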
I'm new to Kubernetes and I was wondering if it is possible to have container replicas launch one at a time. In other words, if I deploy a compose file yielding a container or pod configuration with N replicas, is it possible (and if so, how) to ensure that each replica waits for the previous one to be ready before launching?
I read about readiness probes, but if I understood them correctly, they ensure pod ordering rather than replica ordering. Or did I misunderstand?
Thanks
A StatefulSet has this property: given three replicas, the second one will not start until the first one is running and ready.
(Usually "replica" and "pod" mean the same thing. If you create a Deployment or StatefulSet with 3 replicas, and run kubectl get pods once it's done, you should see 3 pods.)
If you're using Kompose to do the deployment, there's at least a hint that it doesn't support StatefulSets; you need to write native Kubernetes YAML for this.
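A minimal sketch of such a StatefulSet (names and image are illustrative); with the default OrderedReady pod management policy, pod N+1 is only created once pod N is running and ready, as judged by its readiness probe:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app                    # headless Service the StatefulSet uses
  replicas: 3
  podManagementPolicy: OrderedReady   # the default: start pods one at a time
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: nginx:1.25             # illustrative image
        readinessProbe:
          httpGet:
            path: /
            port: 80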
Kubernetes has the StatefulSet object to manage a set of replicas of a Pod. A StatefulSet differs from the default Deployment in that it provides guarantees about the ordering and uniqueness of these Pods. From the documentation:
For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.
As an example, see this blog on how to set up a StatefulSet for Elasticsearch.
I want to be able to override the gcr.io/google_containers/pause container for only a single pod. I'm having trouble finding in the documentation whether this is possible at all.
I'm trying to set up a VPN client container/pod and use its networking namespace to connect to a remote DC, but only for a single group of pods.
The closest I have found is the --pod-infra-container-image flag on the kubelet, which would modify it for all pods.
As the other answer suggests, this is not configurable per pod.
If you really want to achieve this through a custom infra container image and you have multiple nodes (and are willing to dedicate one node to this purpose), you can configure that one node to use your custom infra container image. You should then label and taint the node such that (see the sketch after this list):
The group of pods can only be scheduled onto the special node based on the node selector in the pod spec.
Other pods cannot be scheduled onto the special node because they cannot tolerate the taint.
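A rough sketch of the node setup and the matching pod spec (the node name, label, and taint key are placeholders):

kubectl label nodes vpn-node infra=custom-pause
kubectl taint nodes vpn-node infra=custom-pause:NoSchedule

# In the pod spec of the special group:
spec:
  nodeSelector:
    infra: custom-pause
  tolerations:
  - key: "infra"
    operator: "Equal"
    value: "custom-pause"
    effect: "NoSchedule"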
No, that container is designed to be uniform for all pods, and is not intended to be under the control of the API user.