I have a running cluster that was created by an ACS-Engine template deployment following this guide: https://github.com/Azure/acs-engine/blob/master/docs/swarmmode.md
I want to add extra Master nodes to my running cluster...
Say I have 3 Master nodes and I want to scale that to 5. Or similarly, say one of my 3 master VMs crashes, can't be recovered and it needs to be replaced.
What is the right way to add a new Master node VM to my cluster?
I'm learning Spark, and I'm confused about running a Docker container that contains Spark code on a Kubernetes cluster.
I read that Spark utilizes multiple nodes (servers) and can run code on different nodes, in order to complete jobs faster (and to use the memory of each node when the data is too big).
On the other hand, I read that a Kubernetes pod (which contains the containers) runs on one node.
For example, I'm running the following Spark code from a Docker container:
# sc is an existing SparkContext, e.g. created by pyspark or spark-submit
num = [1, 2, 3, 4, 5]
num_rdd = sc.parallelize(num)
double_rdd = num_rdd.map(lambda x: x * 2)
Some notes and reminders (from my understanding):
When using the map command, each value of the num array can be processed on a different Spark node (worker)
A k8s pod runs on one node
So I'm confused: how does Spark utilize multiple nodes when the pod runs on one node?
Do the Spark workers run on different nodes, and is this how the pod which runs the code above communicates with those nodes in order to utilize the Spark framework?
When you run Spark on Kubernetes, you have a few ways to set things up. The most common way is to have Spark run in client mode.
Basically, Spark can run on Kubernetes in a Pod; the application itself, given the endpoint of the Kubernetes API server, is able to spawn its own worker Pods, as long as everything is correctly configured.
What is needed for this setup is to deploy the Spark application on Kubernetes (usually with a StatefulSet, but it's not a must) along with a headless ClusterIP Service (which is required so that the worker Pods can communicate with the master application that spawned them).
You also need to give the Spark application all the correct configuration, such as the Kubernetes API server endpoint, the Pod name and other parameters.
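For example, a client-mode submission run from the driver Pod could look roughly like the sketch below; the image name, namespace, driver port and the headless Service name spark-driver-headless are placeholder assumptions, not values from this setup:

# Run from the application (driver) Pod; Spark spawns the executors as Pods.
# spark.driver.host must resolve to the headless Service so executors can call back.
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode client \
  --name my-spark-app \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.kubernetes.namespace=default \
  --conf spark.driver.host=spark-driver-headless \
  --conf spark.driver.port=7078 \
  local:///opt/spark/examples/src/main/python/pi.py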
There are other ways to set up Spark; there's no obligation to spawn worker Pods. You can run all the stages of your code locally (the configuration is easy, and if you have small jobs with small amounts of data to process you don't need workers).
Or you can execute the Spark application externally from the Kubernetes cluster, so not on a Pod, while still giving it the Kubernetes API server endpoint so that it can spawn workers on the cluster (aka cluster mode).
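As a hedged sketch of that cluster-mode variant, the submission from an external machine mainly changes the deploy mode (credentials for the API server are still needed on that machine):

# Submitted from outside the cluster; Spark spawns the work as Pods on the cluster.
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --name my-spark-app \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/src/main/python/pi.py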
You can find a lot more info in the Spark documentation, which explains almost everything needed to set things up (https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode).
You can also read about StatefulSets and their use of headless ClusterIP Services here (https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/).
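As an illustration of the headless Service mentioned above, a minimal sketch could look like this; the name, labels and port are assumed examples:

apiVersion: v1
kind: Service
metadata:
  name: spark-driver-headless
spec:
  clusterIP: None        # headless: DNS resolves directly to the driver Pod, no virtual IP
  selector:
    app: spark-driver    # must match the labels on the driver Pod / StatefulSet
  ports:
  - name: driver-rpc
    port: 7078           # matches spark.driver.port used in the submission sketch above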
I've started learning Kubernetes with Docker and I've been wondering what happens if the master node dies or fails. I've already read the answers here, but they don't cover the remedy.
Who is responsible for bringing it back, and how is it brought back? Can there be a backup master node to avoid this? If yes, how?
Basically, I'm asking for the recommended way to handle master failure in a Kubernetes setup.
You should have multiple VMs serving as master nodes to avoid a single point of failure. An odd number of master nodes, 3 or 5, is recommended for quorum. Put a load balancer in front of all the VMs serving as master nodes; it balances the traffic, and if one master node dies the load balancer should mark that VM's IP as unhealthy and stop sending traffic to it.
Also, the etcd cluster is the brain of a Kubernetes cluster, so you should have multiple VMs serving as etcd nodes. Those VMs can be the same VMs as the master nodes or, for a reduced blast radius, you can have separate VMs for etcd. Again, the number of VMs should be odd, 3 or 5. Make sure to take periodic backups of the etcd nodes' data so that you can restore the cluster to its previous state in case of a disaster.
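As a sketch of such a periodic backup, assuming a kubeadm-style layout where etcd listens on https://127.0.0.1:2379 and the certificates live under /etc/kubernetes/pki/etcd (adjust the paths for your cluster):

# Take a dated snapshot of the etcd keyspace
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%F).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key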
Check the official docs on how to install an HA Kubernetes cluster using kubeadm.
In short, for Kubernetes to function properly you need the master nodes available all the time. There are different methods to replicate the master node so that it stays available on failure. As an example, check this: https://kubernetes.io/docs/tasks/administer-cluster/highly-available-master/
Abhishek, you can run the master node in high availability; as a first step you should set up the control plane (aka master node) behind a load balancer. If you have plans to upgrade a single control-plane kubeadm cluster to high availability, you should specify --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or the IP address of a load balancer.
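As a rough sketch (the load-balancer address, token, hash and certificate key are placeholders):

# First control-plane node, pointing at the shared load-balancer endpoint
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs

# Each additional control-plane node then joins as a control plane
kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>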
By default, for security reasons, the master node does not host pods; if you want to enable scheduling pods on the master node, you can run the following command to remove the taint.
kubectl taint nodes --all node-role.kubernetes.io/master-
If you want to manually restore the master, make sure you back up the etcd data directory /var/lib/etcd. You can restore this on the new master and it should work. Read about highly available Kubernetes over here.
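As an alternative to copying the directory, a hedged sketch of doing the same with etcd snapshots (assuming a snapshot file taken beforehand with etcdctl) would be:

# Restore the snapshot into a fresh data directory on the new master,
# then point the etcd static Pod at that directory.
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir /var/lib/etcd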
I have a single-node Kubernetes cluster running on GKE. All the load is running on the single node, separated by namespaces.
Now I would like to implement auto-scaling. Is it possible to scale the microservices onto a new node while the single-replica pods keep running on my main node only?
What I am thinking:
Main node: running everything with 1 pod availability (Redis, Elasticsearch)
Scaled-up node: scaled-up replicas of the stateless microservices only
So is there any way I can implement this using the node autoscaler or using affinity?
The issue is that right now I am running Graylog, Elasticsearch, Redis and RabbitMQ on a single node as StatefulSets backed by volumes; to add affinity to all of them I would have to edit the YAML files and redeploy everything.
I'm not sure that I understand your question correctly, but if I do then you may try to use taints and tolerations (together with node affinity). Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. All the details are available in the documentation here.
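A minimal sketch of that approach; the node name and the taint key/value dedicated=stateful are assumed examples:

# Taint the main node so that only Pods with a matching toleration are scheduled there
kubectl taint nodes <main-node-name> dedicated=stateful:NoSchedule

# Then add a matching toleration to the stateful Pods, e.g. in the StatefulSet pod spec:
#   tolerations:
#   - key: "dedicated"
#     operator: "Equal"
#     value: "stateful"
#     effect: "NoSchedule"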
Assuming the issue is that the persistent volumes bound to your StatefulSets are only accessible from one node, you can use the nodeAffinity field to constrain where the StatefulSet Pods can be scheduled. As mentioned in the documentation:
A PV can specify node affinity to define constraints that limit what nodes this volume can be accessed from. Pods that use a PV will only be scheduled to nodes that are selected by the node affinity.
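As a minimal sketch of constraining one of the StatefulSets this way (the label role=stateful on the main node and the rest of the spec are assumed examples, not taken from your manifests):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role        # node label, e.g. set with: kubectl label nodes <main-node> role=stateful
                operator: In
                values:
                - stateful
      containers:
      - name: redis
        image: redis:6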
I have set up ejabberd clustering, one master node and one slave node, as described here.
I have copied .erlang.cookie and the database files from the master to the slave.
Everything is working fine.
The issue is when I stop the master node:
Then no requests get routed to the slave.
When trying to restart the slave node, it does not start once it has gone down.
I am stuck here, please help me out.
Thanks
This is the standard behaviour of Mnesia. If the node you start was not the last one that was stopped in the cluster, then it has no way to know whether it holds the latest, most up-to-date data.
The way to start a Mnesia cluster is to start the nodes in the reverse order in which they were shut down.
In case the node that was last seen in the Mnesia cluster cannot start or join the cluster, then you need to use a Mnesia command to force the cluster "master", that is, tell it that you consider this node to have the most up-to-date content. This is done with the Erlang function mnesia:set_master_nodes/1.
For example, from the ejabberd Erlang command line:
mnesia:set_master_nodes([node1@myhost]).
In most cases, Mnesia clustering handles everything automatically. When a node goes down, the other nodes are aware and automatically keep on working transparently. The only case where you need to set which node holds the reference data (with set_master_nodes/1) is when this is ambiguous for Mnesia, that is, either when starting only nodes that were down while other nodes were still running, or when there is a netsplit.
Follow the steps from the link below:
http://chadillac.tumblr.com/post/35967173942/easy-ejabberd-clustering-guide-mnesia-mysql
and call the method join_as_master(NodeName) of the easy_cluster module.
I have two instances of an app container (happens to be a Node.JS app, but that shouldn't matter) running in a Kubernetes cluster on Google Container Engine. I'd like to scale it up to three instances.
My cluster has a master and two minion nodes, with a replication controller and a load balancer service. The replication controller keeps my app container running happily on the two nodes.
I can see that there is a handy gcloud alpha container kubectl resize command which lets me change the number of replicas, but I don't see how or if I can increase the size of the cluster itself, so that it can spin up another minion node. I only see gcloud commands to create, delete, list and describe clusters; nothing to resize them.
If I can't resize my cluster, then to scale up I'd need to create a whole new cluster and kill the old one. Am I missing something?
Also, are there plans to support auto-scaling?
Update (June 2015): Kubernetes on GCE now uses managed instance groups which you can manually resize to add new nodes to your cluster.
There isn't currently a way to add nodes to your existing Google Container Engine cluster. We are currently adding support to Kubernetes to allow clusters to have nodes dynamically added but the work isn't quite finished yet. Once the feature is available in Kubernetes you can expect that it will show up in Google Container Engine shortly after the next Kubernetes release.
In the meantime, it should be possible to run more than two replicas of your node.js application on the existing two VMs.
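For reference, resizing the managed instance group mentioned in the update above is a one-liner along these lines; the group name and zone are placeholders (GKE derives the instance group name from the cluster name):

# Add a node by growing the cluster's managed instance group
gcloud compute instance-groups managed resize <instance-group-name> \
  --size 3 --zone <zone>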
This presentation: http://fr.slideshare.net/craigbox/autoscaling-kubernetes is about Kubernetes horizontal scaling on Google Cloud. I'm putting the link here to save some googling for the next person who ends up on this thread.