Adding Hypervisor back to Failover Cluster - failovercluster

Somehow I removed my test hypervisor from a two-node cluster, and now when I try to add it back to the cluster it does not work. Essentially, the hypervisor points at the CSV but cannot access it when I spin up a VM and place it on a volume in that CSV. What could I possibly be doing wrong? Also, when I try to connect to the existing failover cluster from that same hypervisor, the connection fails with an error message citing network issues.

We first have to clear the stale cluster configuration on the node by running the following command:
Clear-ClusterNode -Name nodeName -Force
After the node's cluster state is cleared, you will be able to add it back to the cluster.
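As a sketch of the full sequence (the node name HV02 and cluster name HVCLUSTER are placeholders; substitute your own), the clean-up and re-join looks roughly like this:

```powershell
# Run on the evicted node: wipe its stale cluster configuration
Clear-ClusterNode -Name HV02 -Force

# Run from a surviving node (or any machine with the FailoverClusters module):
# re-add the cleaned node to the existing cluster
Add-ClusterNode -Cluster HVCLUSTER -Name HV02

# Verify the node is Up and can see the Cluster Shared Volumes
Get-ClusterNode -Cluster HVCLUSTER
Get-ClusterSharedVolume -Cluster HVCLUSTER
```

If the node still cannot access the CSV afterwards, check the cluster networks and storage connectivity from that node before placing VMs on it.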

Related

What happens if master node dies in kubernetes? How to resolve the issue?

I've started learning Kubernetes with Docker, and I've been wondering what happens if the master node dies or fails. I've already read the answers here, but they don't cover the remedy.
Who is responsible for bringing it back, and how? Can there be a backup master node to avoid this? If yes, how?
Basically, I'm asking for the recommended way to handle master failure in a Kubernetes setup.
You should have multiple VMs serving as master nodes to avoid a single point of failure. An odd number of master nodes, 3 or 5, is recommended for quorum. Put a load balancer in front of all the master VMs; if one master dies, the load balancer should mark that VM unhealthy and stop sending traffic to it.
Also, the etcd cluster is the brain of a Kubernetes cluster, so you should have multiple VMs serving as etcd nodes. Those can be the same VMs as the masters, or, for a reduced blast radius, separate VMs dedicated to etcd. Again, use an odd number of VMs, 3 or 5. Make sure to take periodic backups of the etcd data so that you can restore the cluster to a previous state in case of a disaster.
Check the official docs on how to install an HA Kubernetes cluster using kubeadm.
In short, Kubernetes needs its master nodes functioning properly at all times. There are different methods to replicate the master node so that it remains available on failure. As an example, check this: https://kubernetes.io/docs/tasks/administer-cluster/highly-available-master/
Abhishek, you can run the master node in high availability; as a first step, put the control plane (aka master node) behind a load balancer. If you have plans to upgrade a single control-plane kubeadm cluster to high availability, you should specify --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or the IP address of a load balancer.
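As a rough sketch of that kubeadm flow (the load-balancer address lb.example.com is a placeholder, and the token, hash, and certificate key come from the output of the init command):

```shell
# First control-plane node: point all API traffic at the shared endpoint
kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs

# Additional control-plane nodes join with the --control-plane flag,
# using the values printed by the init command above
kubeadm join lb.example.com:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>
```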
By default, for security reasons, the master node does not host pods. If you want to allow pods to be scheduled on the master node, you can remove the taint with the following command (note the trailing dash, which removes the taint rather than adding it):
kubectl taint nodes --all node-role.kubernetes.io/master-
If you want to restore the master manually, make sure you back up the etcd data directory, /var/lib/etcd. You can restore it on the new master and it should work. Read about high-availability Kubernetes over here.
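A minimal etcd backup/restore sketch with etcdctl (the endpoint, certificate paths, and backup path below are typical kubeadm defaults, but treat them as assumptions and adjust to your cluster):

```shell
# Take a snapshot of the running etcd member
ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key \
    snapshot save /var/backups/etcd-snapshot.db

# Later, on the replacement master, restore into a fresh data directory
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-snapshot.db \
    --data-dir /var/lib/etcd-restored
```

After the restore, point the etcd static-pod manifest at the restored data directory.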

Why am I getting "Structure needs cleaning" message on Ceph with Kubernetes?

Sorry to ask this; I am relatively new to Kubernetes and Ceph and only have a little idea about them.
I have set up Kubernetes and Ceph using this tutorial (http://tutorial.kubernetes.noverit.com/content/ceph.html).
I had set up my cluster like this:
1 Kube-master and 2 worker nodes (each worker acts as a Ceph monitor and runs 2 OSDs)
The ceph-deploy node I used to set up the Ceph cluster is the Kube-master.
Everything was working fine. I installed my sample web application (a deployment with 5 replicas), which creates a file when its REST API is hit. The file gets copied to every node.
But after 10 minutes I created one more file using the API, and when I try to list the directory (ls -l) I get the following error:
For node1:
ls: cannot access 'previousFile.txt': Structure needs cleaning
previousFile.txt newFile.txt
For node2:
previousFile.txt
For node2, the new file was not created.
What might be the issue? I have tried many times and the same error pops up.
Any help appreciated.
This totally looks like your filesystem got corrupted. Things to check:
$ kubectl logs <ceph-pod1>
$ kubectl logs <ceph-pod2>
$ kubectl describe deployment <ceph-deployment> # did any of the pods restart?
Some info about the error message here.
Depending on what you have, you might need to start from scratch. Or you can take a look at recovering data in Ceph, but that may not work if you don't have a snapshot.
Running Ceph on Kubernetes can be very tricky: if a Ceph pod restarts and gets scheduled onto a different Kubernetes node, it can corrupt the data. So you need to make that part solid, for example by using node affinity or by pinning Ceph pods to specific Kubernetes nodes with labels.
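For example (the node name, label, and pod spec below are hypothetical), pinning an OSD pod to the node that physically owns its disks can be done with a node label plus a nodeSelector:

```shell
# Label the node that physically hosts this OSD's disks
kubectl label nodes worker-1 ceph-osd=osd-0

# Then, in that OSD's pod/deployment spec, pin it to the labelled node:
#   spec:
#     nodeSelector:
#       ceph-osd: osd-0
```

With this in place a restart of the pod always lands on the same node, so the OSD never starts against another node's disks.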

Wrong ip when setting up Redis cluster on Kubernetes "Waiting for the cluster to join..."

I am trying to build a Redis cluster with Kubernetes on Azure, and I am facing the exact same problem when running different samples: sanderp.nl/running-redis-cluster-on-kubernetes or github.com/zuxqoj/kubernetes-redis-cluster
Everything goes well until I try to have the different nodes join the cluster with the redis-trib command.
At that time I face the infamous infinite "Waiting for the cluster to join ...." message.
Trying to see what is happening, I set the log level of the Redis pods to debug. I then noticed that the pods do not seem to announce their correct IP when communicating with each other.
In fact, the last byte of the IP seems to be replaced by a zero: if pod1 has IP address 10.1.34.9, I will see in pod2's logs:
Accepted clusternode 10.1.34.0:someport
So the pods cannot communicate back, and the cluster-join process never ends.
Now, if, before running redis-trib, I enforce cluster-announce-ip by running the following on each pod:
redis-cli -h mypod-ip config set cluster-announce-ip mypod-ip
the redis-trib command then completes successfully and the cluster is up and running.
But this is not a viable solution: if a pod goes down and comes back, it may have a different IP, and I will face the same problem when it tries to rejoin the cluster.
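As a stopgap, the per-pod workaround above can at least be scripted rather than run by hand (the pod label app=redis-cluster is an assumption; adjust to your manifests):

```shell
# For every Redis pod, tell Redis to announce the pod's current IP
for ip in $(kubectl get pods -l app=redis-cluster \
    -o jsonpath='{.items[*].status.podIP}'); do
  redis-cli -h "$ip" config set cluster-announce-ip "$ip"
done
```

This still has to be re-run whenever a pod is rescheduled, which is why stable identities (see the StatefulSet answer below) are the real fix.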
Note that I do not encounter any problem when running the samples with minikube.
I am using flannel for kubernetes networking. Can the problem come from incorrect configuration of flannel ? Has anyone encountered the same issue ?
You can use StatefulSets for deploying your replicas, so each pod always has a stable, unique name.
Moreover, you will be able to use service DNS names as hosts. See the official doc DNS for Services and Pods.
The second example you shared has another part that builds the Redis cluster using StatefulSets. Try that out.
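With a StatefulSet plus a headless Service (the service name redis and namespace default below are assumptions), each replica gets a stable DNS name that survives rescheduling, so you never have to chase pod IPs:

```shell
# Stable per-pod DNS names provided by a headless service "redis":
#   redis-0.redis.default.svc.cluster.local
#   redis-1.redis.default.svc.cluster.local
# Resolve one from inside the cluster to verify:
nslookup redis-0.redis.default.svc.cluster.local
```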

Docker swarm load balancing - How to give common name to the service?

I read about the swarm routing mesh.
I created a simple service which uses a Tomcat server and listens on 8080:
docker swarm init created the manager at node 1.
docker swarm join --token <token> used the token provided by the manager on node 2 and node 3 to create workers.
docker service create <image> created the service.
docker service scale <service>=5 scaled it to 5 replicas.
docker service ps <service> shows 5 instances of my service: 3 running on node 1, 1 on node 2, and 1 on node 3.
My application uses an atomic number which is maintained at the JVM level.
If I hit http://node1:8080/service 25 times, all requests go to node 1. How does it balance across nodes?
If I hit http://node2:8080/service, they go to node 2.
Why is it not using round-robin?
Doubts:
Is anything wrong in the above steps?
Did I miss something?
I feel I am missing something, like a common service name (http://domain:8080/service) under which swarm would balance in round-robin fashion.
I would like to understand swarm mode only; I am not interested in an external load balancer for now.
How do I see swarm load balance in action?
Docker does round-robin load balancing per connection to the port. As long as a connection stays up, it will continue to go to the same instance.
HTTP allows a connection to be kept alive and reused, and browsers take advantage of this behavior to speed up later requests by leaving connections open. To test the round-robin load balancing, you'd need to either disable that keep-alive setting or switch to a command-line tool like curl or wget.
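For example, each curl invocation below opens a fresh TCP connection, so successive requests should land on different replicas (hostname and path taken from the question):

```shell
# One new connection per request -> swarm's round-robin is visible
for i in $(seq 1 10); do
  curl -s http://node1:8080/service
  echo
done
```

If the responses include the per-JVM atomic counter, you should see it advance independently across the 5 replicas.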

ejabberd clustering, Slave doesn't work when master goes down

I have setup ejabberd clustering, one is master and other is slave as described here.
I have copied .erlang.cookie and database files from master to slave.
Everything is working fine.
The issue is when I stop master node:
Then no request getting routed to slave.
When trying to restart slave node its not getting start once it down.
I get stuck here, please help me out.
Thanks
This is the standard behaviour of Mnesia. If the node you start was not the last one that was stopped in a cluster, then it does not have any way to know if it has the latest, most up to date data.
The process to start a Mnesia cluster is to start the nodes in the reverse order in which they were shut down.
In case the node that was last seen on the Mnesia cluster cannot start or join the cluster, then you need to use a Mnesia command to force the cluster "master", that is, tell Mnesia that you consider this node to have the most up-to-date content. This is done with the Erlang command mnesia:set_master_nodes/1.
For example, from ejabberd Erlang command-line:
mnesia:set_master_nodes(['node1@myhost']).
In most cases, Mnesia clustering handles everything automatically: when a node goes down, the other nodes are aware and keep working transparently. The only case where you need to designate which node holds the reference data (with set_master_nodes/1) is when the situation is ambiguous for Mnesia, that is, when starting only nodes that were down while other nodes were still running, or after a netsplit.
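For reference, from the ejabberd Erlang shell you can inspect which database nodes Mnesia knows about before forcing a master (the node name below is an example; use your actual ejabberd node name):

```erlang
%% Nodes Mnesia knows about, and which of them are currently running
mnesia:system_info(db_nodes).
mnesia:system_info(running_db_nodes).

%% Declare this node's data as the authoritative copy
mnesia:set_master_nodes(['ejabberd@myhost']).
```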
Follow the steps from the link below:
http://chadillac.tumblr.com/post/35967173942/easy-ejabberd-clustering-guide-mnesia-mysql
and call the join_as_master(NodeName) function of the easy_cluster module.
