I'm having some trouble configuring my deployments on GKE.
I want to decrease the default (110) number of pods allowed on a node and I can't do it through the console. I read that the max number of pods can be set with kubelet --max-pods=<int32>, however I don't know how to do this with GKE.
Any help is much appreciated.
The kubelet --max-pods=<int32> approach you mentioned is actually deprecated as per the Kubernetes documentation:
This parameter should be set via the config file specified by the
Kubelet's --config flag. See
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
for more information.
However, in Google Kubernetes Engine you can modify this by going to the Developer Console > Kubernetes Engine > Clusters, clicking on your cluster's name and clicking Add Node Pool at the top. You'll be able to set the maximum number of pods per node for that new node pool (it can only be set when the node pool is created).
Once done, you'll be able to deploy your services to specific node pools by following the instructions here.
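If you prefer the command line, a minimal sketch of the same idea is below, assuming a VPC-native cluster; the cluster name, pool name and pod limit are placeholders, and you may need your usual --zone/--region flag:
gcloud container node-pools create my-pool \
    --cluster=my-cluster \
    --max-pods-per-node=32 \
    --num-nodes=3
You can then steer a workload onto that pool with a nodeSelector on GKE's node-pool label in the pod template, for example:
nodeSelector:
  cloud.google.com/gke-nodepool: my-pool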
While going through the documentation on getting started with Kubernetes on Docker Desktop, I came across the term "service level". Can anyone help me understand what a service level is?
PS: I am a beginner in Docker and Kubernetes.
Thanks in advance :)
It is not entirely clear what "service level" refers to in this case.
Your link says:
Kubernetes makes sure containers are running to keep your app at the service level you requested in the YAML file
And a little further down:
And now remove that container:
Check back on the result app in your browser at http://localhost:5001 and you’ll see it’s still working. Kubernetes saw that the container had been removed and started a replacement straight away.
Judging from the context, it refers to the kube-controller-manager in the Kubernetes control plane, which continuously watches the state of the cluster and compares it to the desired state. When it discovers a difference (for example, when a pod was removed) it fixes it by adding a new pod to match the number of replicas defined in the deployment.
For example, if the deployment was configured to run N replicas and one is removed, N-1 replicas remain. The kube-controller-manager starts a new pod to reach the desired state of N replicas again.
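As a rough illustration (all names here are made up, not taken from the tutorial), a Deployment that declares 3 replicas will be brought back to 3 if one of its pods disappears:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: result-app
spec:
  replicas: 3                 # the desired state: 3 pods at all times
  selector:
    matchLabels:
      app: result-app
  template:
    metadata:
      labels:
        app: result-app
    spec:
      containers:
      - name: result-app
        image: nginx          # placeholder image
Delete one pod with kubectl delete pod <pod-name> and kubectl get pods will show a replacement being created almost immediately.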
In this case the service level would refer to the number of replicas running, but as mentioned, it is ambiguous...
There are Services in Kubernetes which you can use to expose applications (containers) running in pods, for example:
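A minimal Service manifest (all names here are placeholders) that exposes pods labelled app: my-app inside the cluster could look like this:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app          # the pods this Service routes traffic to
  ports:
  - port: 80             # port the Service listens on
    targetPort: 8080     # port the container listens on
  type: ClusterIP        # internal only; use NodePort or LoadBalancer to expose it externally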
You may read through this blog to learn more
https://medium.com/#naweed.rizvi/kubernetes-setup-local-cluster-with-docker-desktop-7ead3b17bf68
You can also watch this tutorial:
https://www.youtube.com/watch?v=CX8AnwTW2Zs&t=272s
I'm using Airflow with Composer (GCP) to extract data from Cloud SQL to GCS, and then from GCS to BigQuery. I have some tables between 100 MB and 10 GB. My DAG has two tasks to do what I mentioned before. With the smaller tables the DAG runs smoothly, but with slightly larger tables the Cloud SQL extraction task fails after a few seconds without any logs except "Negsignal.SIGKILL". I have already tried to increase the Composer capacity, among other things, but nothing has worked yet.
I'm using the mysql_to_gcs and gcs_to_bigquery operators.
The first thing you should check when you get Negsignal.SIGKILL is your Kubernetes resources. This is very likely a problem with resource limits.
I think you should monitor your Kubernetes Cluster Nodes. Inside GCP, go to Kubernetes Engine > Clusters. You should have a cluster containing the environment that Cloud Composer uses.
Now, head to the nodes of your cluster. Each node provides you metrics about CPU, memory & disk usage. You will also see the limit for the resources that each node uses. Also, you will see the pods that each node has.
If you are not very familiar with K8s, let me explain this briefly. Airflow uses Pods inside nodes to run your Airflow tasks. These pods are called airflow-worker-[id]. That way you can identify your worker pods inside the Node.
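If you prefer the command line over the console, something like this should list them (the exact namespace Composer uses may differ per version, so the --all-namespaces form is the safe assumption):
kubectl get pods --all-namespaces | grep airflow-worker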
Check your pod list. If you have evicted airflow-worker pods, then Kubernetes is stopping your workers for some reason. Since Composer uses the CeleryExecutor, an evicted airflow-worker points to a problem. This would not be the case with the KubernetesExecutor, but that is not available yet in Composer.
If you click on an evicted pod, you will see the reason for the eviction. That should give you the answer.
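The same information is available from the command line; the pod name and namespace below are placeholders:
kubectl describe pod airflow-worker-12345 -n <namespace>
# check the Status/Reason fields and the Events section; an eviction due to
# memory pressure typically shows something like "The node was low on resource: memory"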
If you don't see a problem with your pod evictions, don't panic, you still have some options. From that point on, your best friend will be logs. Be sure to check your pod logs, node logs and cluster logs, in that order.
I've started learning Kubernetes with Docker and I've been wondering what happens if the master node dies/fails. I've already read the answers here, but they don't cover the remedy for it.
Who is responsible to bring it back? And how to bring it back? Can there be a backup master node to avoid this? If yes, how?
Basically, I'm asking a recommended way to handle master failure in kubernetes setup.
You should have multiple VMs serving as master nodes to avoid a single point of failure. An odd number of 3 or 5 master nodes is recommended for quorum. Put a load balancer in front of all the VMs serving as master nodes; it does the load balancing and, in case one master node dies, the load balancer should mark that VM as unhealthy and stop sending traffic to it.
Also, the etcd cluster is the brain of a Kubernetes cluster, so you should have multiple VMs serving as etcd nodes. Those VMs can be the same VMs as the master nodes, or, for a reduced blast radius, you can have separate VMs for etcd. Again, the number of VMs should be an odd number such as 3 or 5. Make sure to take periodic backups of the etcd data so that you can restore the cluster to its previous state in case of a disaster.
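A minimal sketch of such a backup, assuming etcdctl v3 and the certificate paths that kubeadm uses by default (adjust the endpoint and paths for your setup):
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key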
Check the official doc on how to install an HA Kubernetes cluster using kubeadm.
In short, Kubernetes needs its master nodes to function properly all the time. There are different methods to keep copies of the master node so that it remains available on failure. As an example, check this: https://kubernetes.io/docs/tasks/administer-cluster/highly-available-master/
Abhishek, you can run the master node in high availability; as a first step, you should set up the control plane (aka master node) behind a load balancer. If you have plans to upgrade a single control-plane kubeadm cluster to high availability, you should specify --control-plane-endpoint to set the shared endpoint for all control-plane nodes. Such an endpoint can be either a DNS name or an IP address of a load balancer.
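A minimal sketch of that kubeadm step, with k8s-api.example.com standing in for your load balancer's DNS name:
kubeadm init --control-plane-endpoint "k8s-api.example.com:6443" --upload-certs
Additional control-plane nodes can then be joined with the kubeadm join ... --control-plane command that kubeadm init prints.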
By default, for security reasons, the master node does not host pods. If you want to enable hosting pods on the master node, you can remove the taint by running the following command (note the trailing dash, which removes the taint):
kubectl taint nodes --all node-role.kubernetes.io/master-
If you want to manually restore the master, make sure you back up the etcd data directory /var/lib/etcd. You can restore this on the new master and it should work. Read about highly available Kubernetes over here.
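If the backup was taken with etcdctl snapshot save (as sketched above), the restore side looks roughly like this (paths are placeholders, and you then point etcd at the restored data directory):
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored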
My team is currently working on migrating a Discord chat bot to Kubernetes. We plan on using a StatefulSet for the main bot service, as each Shard (pod) should only have a single connection to the Gateway. Whenever a shard connects to said Gateway, it tells it its ID (in our case the pod's ordinal index) and how many shards we are running in total (the amount of replicas in the StatefulSet).
Having to tell the gateway the total number of shards means that in order to scale our StatefulSet up or down we'd have to stop all pods in that StatefulSet before starting new ones with the updated value.
How can I achieve that? Preferably through configuration, so I don't have to run a special command each time.
Try the kubectl rollout restart sts <sts name> command. It'll restart the pods one by one, following the RollingUpdate strategy.
Scale down the sts
kubectl scale --replicas=0 sts <sts name>
Scale up the sts
kubectl scale --replicas=<number of replicas> sts <sts name>
One way of doing this is:
First, get the YAML configuration of the StatefulSet by running the command below and save it to a file:
kubectl get statefulset NAME -o yaml > sts.yaml
Then delete the StatefulSet by running:
kubectl delete -f sts.yaml
And finally, create the StatefulSet again using the same configuration file you got in the first step:
kubectl apply -f sts.yaml
I hope this answers your query on how to delete the StatefulSet and create the new one as well.
Before any kubectl scale, since you need more control over your nodes, you might consider a kubectl drain first.
When you are ready to put the node back into service, use kubectl uncordon, which will make the node schedulable again.
By draining the node where your pods are managed, you would stop all pods, with the opportunity to scale the StatefulSet with the new value.
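A minimal drain sketch, with <nodename> as a placeholder; the extra flags are usually needed because DaemonSet pods and emptyDir volumes would otherwise block the drain (older kubectl versions call the second flag --delete-local-data):
kubectl drain <nodename> --ignore-daemonsets --delete-emptydir-data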
See also "How to Delete Pods from a Kubernetes Node" by Keilan Jackson
Start at least with kubectl cordon <nodename> to mark the node as unschedulable.
If your pods are controlled by a StatefulSet, first make sure that the pod that will be deleted can be safely deleted.
How you do this depends on the pod and your application’s tolerance for one of the stateful pods to become temporarily unavailable.
For example you might want to demote a MySQL or Redis writer to just a read-only slave, update and release application code to no longer reference the pod in question temporarily, or scale up the ReplicaSet first to handle the extra traffic that may be caused by one pod being unavailable.
Once this is done, delete the pod and wait for its replacement to appear on another node.
Trying to deploy pods in my Kubernetes cluster, and some of the pods are giving me an error about storage problems. A screenshot is given below:
I am sure the problem is with one of my worker nodes; it's not a problem with Pulsar, I think. I'll also share the YAML file here for a clearer view of what the problem is.
Link to YAML file: https://github.com/apache/pulsar/blob/master/deployment/kubernetes/generic/k8s-1-9-and-above/zookeeper.yaml
I need help tweaking the YAML file a little, so that the pods can be created with the existing setup I have on my worker nodes. I'll be happy to provide more information if you need it.
Thanks in advance
It looks like the affinity rules are preventing the pods from starting. In production, you want to make sure the Zookeeper pods (and other pod groups, like BookKeeper) don't run on the same worker node, which is why those rules are configured that way. You can either grow your Kubernetes setup to 3 worker nodes or remove the affinity rules from the various StatefulSet and Deployment files.
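For reference, the rule in question is a podAntiAffinity block in the StatefulSet's pod template and looks roughly like the sketch below (the labels here are illustrative, so check the actual zookeeper.yaml); removing or relaxing it allows several Zookeeper pods to land on the same worker:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: pulsar
          component: zookeeper
      topologyKey: kubernetes.io/hostname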
Alternatively, you can use this Helm chart (full disclosure: I am the creator) to deploy Pulsar to Kubernetes:
https://helm.kafkaesque.io
See the section "Installing Pulsar for development" for settings that will enable Pulsar to run in smaller Kubernetes setups, including disabling affinity rules.
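A rough install sketch; the repo alias, release name, chart name and namespace below are assumptions, so check the chart's README for the exact names and the development-mode values:
helm repo add kafkaesque https://helm.kafkaesque.io
helm repo update
helm install pulsar kafkaesque/pulsar --namespace pulsar --create-namespace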