I'm trying to build a three-node DSE cluster. I have installed DSE 4.5.6 on node1 (ip1), node2 (ip2) and node3 (ip3). I have configured cassandra.yaml as below:
cluster_name = DSE (same on all nodes)
seeds = ip1 (same on all nodes)
listen_address = the respective IP of each node
Then I started the dse services and checked nodetool status. It shows only node1 in the ring. Am I missing any configuration?
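For reference, a minimal sketch of the relevant cassandra.yaml entries, assuming the standard parameter names (the IPs are placeholders; listen_address must be each node's own IP, not the seed's):

    cluster_name: 'DSE'                # must match on all nodes
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "ip1"           # same seed list on all nodes
    listen_address: ip2                # this node's own IP (ip1/ip2/ip3 respectively)
    rpc_address: ip2                   # client-facing address of this node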
Assume that there are some pods from Deployments/StatefulSets/DaemonSets, etc. running on a Kubernetes node.
Then I restarted the node directly, and then started docker and started kubelet with the same parameters.
What would happen to those pods?
Are they recreated with metadata saved locally by kubelet? Or with info retrieved from the api-server? Or recovered from the OCI runtime, behaving as if nothing happened?
Is it that only stateless pods (no --local-data) can be recovered normally? If any of them has a local PV/dir, would it be connected back normally?
What if I did not restart the node for a long time? Would the api-server assign other nodes to recreate those pods? What is the default timeout value, and how can I configure it?
As far as I know:
apiserver
^
|(sync)
V
kubelet
^
|(sync)
V
-------------
| CRI plugin |(like api)
| containerd |(like api-server)
| runc |(low-level binary which manages container)
| c' runtime |(container runtime where containers run)
-------------
When kubelet receives a PodSpec from kube-api-server, it calls the CRI like a remote service; the steps are roughly:
create the PodSandbox (a.k.a. the 'pause' image, always 'stopped')
create the container(s)
run the container(s)
So my guess is that since the node and docker were restarted, steps 1 and 2 are already done and the containers are in 'stopped' status; then, when kubelet is restarted, it pulls the latest info from kube-api-server, finds out that the container(s) are not in the 'running' state, and calls the CRI to run the container(s), after which everything is back to normal.
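To make those three steps concrete, here is a rough sketch using the crictl debugging tool against the CRI endpoint; the JSON spec files are hypothetical placeholders, not something kubelet actually writes to disk:

    # 1. create (and start) the PodSandbox, i.e. the 'pause' container
    POD_ID=$(crictl runp pod-sandbox-config.json)
    # 2. create the application container inside that sandbox
    CTR_ID=$(crictl create "$POD_ID" container-config.json pod-sandbox-config.json)
    # 3. run the container
    crictl start "$CTR_ID"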
Please help me confirm.
Thank you in advance~
Good questions. A few things first: a Pod is not pinned to a certain node. The nodes are mostly seen as a "server farm" that Kubernetes can use to run its workload. E.g. you give Kubernetes a set of nodes, and you also give it a set of e.g. Deployments - that is, the desired state of the applications that should run on your servers. Kubernetes is responsible for scheduling these Pods and for keeping them running when something in the cluster changes.
Standalone pods are not managed by anything, so if a Pod crashes it is not recovered. You typically want to deploy your stateless apps as Deployments, which then initiate ReplicaSets that manage a set of Pods - e.g. 4 Pods - instances of your app.
Your desired state, a Deployment with e.g. replicas: 4, is saved in the etcd database within the Kubernetes control plane.
Then a set of controllers for Deployments and ReplicaSets is responsible for keeping 4 replicas of your app alive. E.g. if a node becomes unresponsive (or dies), new Pods will be created on other nodes, as long as they are managed by the controllers for ReplicaSets.
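As a minimal sketch (the name and image below are placeholders), such a Deployment could look like:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                # placeholder name
    spec:
      replicas: 4                 # desired number of Pod instances
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: nginx:1.19     # placeholder image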
The kubelet receives the PodSpecs that are scheduled to its node and then keeps these Pods alive with regular health checks.
Is it that only stateless pod(no --local-data) can be recovered normally?
Pods should be seen as ephemeral - e.g. they can disappear - but they are recovered by the controller that manages them, unless deployed as a standalone Pod. So don't store local data within the Pod.
There are also StatefulSet Pods; those are meant for stateful workloads - but distributed stateful workloads, typically e.g. 3 Pods that use Raft to replicate data. The etcd database is an example of a distributed database that uses Raft.
The correct answer: it depends.
Imagine you've got a 3-node cluster, where you created a Deployment with 3 replicas and 3-5 standalone pods.
Pods are created and scheduled to nodes.
Everything is up and running.
Let's assume that worker node node1 has got 1 deployment replica and 1 or more standalone pods.
The general sequence of the node restart process is as follows:
The node gets restarted, e.g. using sudo reboot.
After the restart, the node starts all OS processes in the order specified by the systemd dependencies.
When dockerd is started, it does nothing. At this point all previous containers are in the Exited state.
When kubelet is started, it asks the cluster apiserver for the list of Pods whose node property equals its node name.
After getting the reply from the apiserver, kubelet starts containers for all Pods described in the reply, using the Docker CRI.
When the pause container starts for each Pod from the list, it gets a new IP address configured by the CNI binary, deployed by the network addon DaemonSet's Pod.
After the kube-proxy Pod is started on the node, it updates the iptables rules to implement the Kubernetes Services' desired configuration, taking into account the new Pods' IP addresses.
Now things become a bit more complicated.
Depending on the apiserver, kube-controller-manager and kubelet configuration, they react to the fact that the node is not responding with some delay.
If the node restarts fast enough, kube-controller-manager doesn't evict the Pods and they all remain scheduled on the same node, increasing their RESTARTS number after their new containers become Ready.
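You can observe this, for example, with the following command (node1 being the restarted node; the field selector just filters Pods by the node they are scheduled on):

    # -o wide also shows the (new) Pod IPs and the node name;
    # the RESTARTS column shows how many times the containers were restarted
    kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node1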
Example 1.
The cluster is created using Kubeadm with Flannel network addon on Ubuntu 18.04 VM created in GCP.
Kubernetes version is v1.18.8
Docker version is 19.03.12
After the node is restarted, all Pods' containers are started on the node with new IP addresses. Pods keep their names and location.
If the node is stopped for a long time, the Pods on that node stay in the Running state, but connection attempts obviously time out.
If the node remains stopped, after approximately 5 minutes the Pods scheduled on that node are evicted by kube-controller-manager and terminated. If I start the node before that eviction, all Pods remain on the node.
In case of eviction, standalone Pods disappear forever; Deployments and similar controllers create the necessary number of Pods to replace the evicted ones, and kube-scheduler puts them on appropriate nodes. If a new Pod can't be scheduled on another node, e.g. due to a lack of required volumes, it remains in the Pending state until the scheduling requirements are satisfied.
On a cluster created using an Ubuntu 18.04 Vagrant box and the VirtualBox hypervisor, with a host-only adapter dedicated to Kubernetes networking, Pods on the stopped node remained Running, but with Readiness: false, even after two hours, and were never evicted. After starting the node 2 hours later, all containers were restarted successfully.
This configuration's behavior is the same all the way from Kubernetes v1.7 to the latest v1.19.2.
Example 2.
The cluster is created in Google cloud (GKE) with the default kubenet network addon:
Kubernetes version is 1.15.12-gke.20
Node OS is Container-Optimized OS (cos)
After the node is restarted (it takes around 15-20 seconds), all Pods are started on the node with new IP addresses. Pods keep their names and location. (Same as in example 1.)
If the node is stopped, after a short period of time (T1, around 30-60 seconds) all Pods on the node change their status to Terminating. A couple of minutes later they disappear from the Pods list. Pods managed by a Deployment are rescheduled on other nodes with new names and IP addresses.
If the node pool is created with Ubuntu nodes, the apiserver terminates the Pods later; T1 is around 2-3 minutes.
The examples show that the situation after a worker node gets restarted differs between clusters, and it's better to run the experiment on your specific cluster to check whether you get the expected results.
How to configure those timeouts:
How Can I Reduce Detecting the Node Failure Time on Kubernetes?
Kubernetes recreate pod if node becomes offline timeout
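For reference, the roughly 5-minute delay usually comes from the default NoExecute tolerations added to every Pod (300 seconds for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable); a sketch of overriding them per Pod, with placeholder name and image:

    apiVersion: v1
    kind: Pod
    metadata:
      name: fast-eviction-demo        # placeholder name
    spec:
      containers:
      - name: app
        image: nginx:1.19             # placeholder image
      tolerations:
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30         # evict after 30s instead of the default 300s
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 30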
When the node is restarted and there are Pods scheduled on it that are managed by a Deployment or ReplicaSet, those controllers will take care of scheduling the desired number of replicas on another, healthy node. So if you have 2 replicas running on the restarted node, they will be terminated and scheduled on another node.
Before restarting a node you should use kubectl cordon to mark the node as unschedulable and give Kubernetes time to reschedule the Pods.
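For example (node1 is a placeholder node name):

    kubectl cordon node1                       # mark the node unschedulable
    kubectl drain node1 --ignore-daemonsets    # evict the Pods running on it
    # ...reboot or maintain the node...
    kubectl uncordon node1                     # make the node schedulable again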
Standalone Pods (not managed by any controller) will not be rescheduled on any other node; they will simply be terminated.
I've started learning Kubernetes with Docker and I've been wondering what happens if the master node dies/fails. I've already read the answers here, but they don't cover the remedy for it.
Who is responsible for bringing it back? And how do you bring it back? Can there be a backup master node to avoid this? If yes, how?
Basically, I'm asking for a recommended way to handle master failure in a Kubernetes setup.
You should have multiple VMs serving as master nodes to avoid a single point of failure. An odd number of master nodes, 3 or 5, is recommended for quorum. Put a load balancer in front of all the VMs serving as master nodes; it does the load balancing, and in case one master node dies, the load balancer should mark that VM's IP as unhealthy and stop sending traffic to it.
Also, the etcd cluster is the brain of a Kubernetes cluster, so you should have multiple VMs serving as etcd nodes. Those VMs can be the same VMs as the master nodes or, for a reduced blast radius, you can have separate VMs for etcd. Again, it should be an odd number of VMs, 3 or 5. Make sure to take periodic backups of the etcd nodes' data so that you can restore the cluster state to a previous state in case of a disaster.
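A hedged example of such a backup with etcdctl; the endpoint and certificate paths below are typical kubeadm defaults and may differ in your setup:

    ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key
    # later, to restore into a fresh data directory:
    ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db --data-dir=/var/lib/etcd-restored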
Check the official doc on how to install an HA Kubernetes cluster using kubeadm.
In short, for Kubernetes you should keep the master nodes functioning properly at all times. There are different methods to make copies of the master node so that it is available on failure. As an example, check this - https://kubernetes.io/docs/tasks/administer-cluster/highly-available-master/
Abhishek, you can run the master node in high availability; you should set up the control plane, aka the master node, behind a load balancer as a first step. If you have plans to upgrade a single control-plane kubeadm cluster to high availability, you should specify the --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or an IP address of a load balancer.
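For example, when bootstrapping the first control-plane node with kubeadm (the DNS name and port are placeholders for your load balancer's endpoint):

    kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs
    # additional control-plane nodes then join with the 'kubeadm join ... --control-plane' command printed by the init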
By default, for security reasons, the master node does not host Pods; if you want to enable hosting Pods on the master node, you can run the following command to remove the taint (note the trailing '-').
kubectl taint nodes --all node-role.kubernetes.io/master-
If you want to manually restore the master, make sure you back up the etcd data directory /var/lib/etcd. You can restore it on the new master and it should work. Read about high availability Kubernetes over here.
I'd like to upgrade the Docker engine on my Docker Swarm managed nodes (both manager and worker nodes) from 18.06 to 19.03, without causing any downtime.
I see there are many tutorials online for rolling update of a Dockerized application without downtime, but nothing related to upgrading the Docker engine on all Docker Swarm managed nodes.
Is it really not possible to upgrade the Docker daemon on Docker Swarm managed nodes without downtime? If true, that would indeed be a pity.
Thanks in advance to the wonderful community at SO!
You can upgrade managers in place, one at a time. During this upgrade process, you drain the node with docker node update, run the upgrade of the Docker engine with the normal OS commands, and then return the node to active. What will not work is adding or removing nodes in the cluster while the managers have mixed versions. This means you cannot completely replace nodes with a from-scratch install at the same time as you upgrade the versions. All managers need to be the same (upgraded) version, and then you can look at rebuilding/replacing the hosts. What I've seen in the past is that such nodes do not fully join the manager quorum, and after losing enough managers you eventually lose quorum.
Once all managers are upgraded, you can upgrade the workers, either with in-place upgrades or by replacing the nodes. Until the workers have all been upgraded, do not use any new features.
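A rough per-manager sequence might look like this (manager1 is a placeholder node name; the actual engine upgrade uses your distro's package manager):

    docker node update --availability drain manager1     # stop running tasks on this manager
    # ...upgrade the Docker engine packages here...
    sudo systemctl restart docker                         # restart the upgraded engine
    docker node update --availability active manager1     # return the node to service
    docker node ls                                        # confirm the node is Ready and the manager quorum is healthy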
You can drain your node, upgrade your Docker version after that, and then make the node ACTIVE again.
Repeat these steps for all the nodes.
DRAIN availability prevents a node from receiving new tasks from the swarm manager. The manager stops tasks running on the node and launches replica tasks on a node with ACTIVE availability.
For detailed information you can refer to this link: https://docs.docker.com/engine/swarm/swarm-tutorial/drain-node/
I have just started working with DSE 4.8.4 in AWS EC2. I launched 2 m3.xlarge instances in the us-west-1a availability zone. Of course, both nodes are in the same region and the same availability zone. This is a fresh installation and does not have any user-defined keyspaces, no data, etc.
On both instances, DSE 4.8.4 was installed as per the DataStax documentation. The 'dse' service starts on both nodes individually with the default endpoint_snitch "com.datastax.bdp.snitch.DseSimpleSnitch", and I have used private IP addresses everywhere in the cassandra.yaml file on both nodes.
Now, when I changed the 2nd node's .yaml file seeds property to point to the 1st node's private IP address, the dse service on the 2nd node no longer starts, with an error indicating "Unable to find gossip ..." and a hint to fix the snitch settings.
I looked around and it seems the snitch should be Ec2Snitch.
Q1) Do both nodes need to have the same snitch?
Q2) Will the cassandra-rackdc.properties file need any changes due to us-west-1a?
Q3) Should I follow the steps described in http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeSingleDS.html ?
My objective is to build this 2-node cluster (manually, as I did not use OpsCenter) with suitable changes to the relevant config files.
I would really appreciate it if someone could point me in the right direction.
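For context, a rough sketch of the snitch-related settings the questions above refer to, assuming Ec2Snitch (the region and rack are derived from the EC2 instance metadata; the dc_suffix is a hypothetical example):

    # cassandra.yaml (same on both nodes)
    endpoint_snitch: Ec2Snitch

    # cassandra-rackdc.properties (optional with Ec2Snitch; us-west / 1a come from instance metadata)
    # dc_suffix=_dse_cluster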
I followed this website link for ejabberd clustering: http://chad.ill.ac/post/35967173942/easy-ejabberd-clustering-guide-mnesia-mysql
Everything is fine: it shows the two nodes running, and the DB and web admin also show the two nodes, master and slave. But if I shut down the master or slave node, the other node does not continue the process. What should I do so that if one node is down, the other one continues the process?
Mnesia behaves as a multi-master database, but if you bring the nodes down, the restart process should be in reverse order. If you have node1 and node2, and you kill node1 and after that you kill node2, then you should restart node2 first and then node1. That's because Mnesia considers the node that was stopped last to be the one holding the most recent data.
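A hedged illustration of that restart order with ejabberdctl, assuming node2 was the last one stopped (the node names are placeholders):

    # on node2 (stopped last), start it first
    ejabberdctl start
    ejabberdctl status        # wait until it reports that the node is started
    # then on node1 (stopped first)
    ejabberdctl start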