Trying to deploy pods in my Kubernetes cluster, and some of the pods are giving me a storage-related error. Screenshot is given below:
I am sure the problem is with one of my worker nodes; it's not a problem with Pulsar, I think. I'll also share the YAML file here for a clearer view of what the problem is.
Link to YAML file: https://github.com/apache/pulsar/blob/master/deployment/kubernetes/generic/k8s-1-9-and-above/zookeeper.yaml
I need help tweaking the YAML file a little, so that the pods can be created with the resources I currently have on my worker nodes. I'll be happy to provide more information if needed.
Thanks in advance
It looks like the affinity rules are preventing the pods from starting. In production you want to make sure the ZooKeeper pods (and other pod groups like BookKeeper) don't run on the same worker node, which is why those rules are configured that way. You can either expand your Kubernetes setup to 3 worker nodes, or remove the affinity rules from the various StatefulSet and Deployment files; a sketch of those rules follows below.
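For reference, here is a minimal sketch of what such a rule looks like in a StatefulSet's pod template, plus a softer variant you could swap in for small clusters. The label key and values are illustrative and may not match the Pulsar manifests exactly; check the actual zookeeper.yaml:

    # Hard rule: with fewer nodes than replicas, some pods stay Pending,
    # because no node can satisfy the constraint.
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["zookeeper"]     # illustrative label
            topologyKey: kubernetes.io/hostname

    # Softer alternative: prefer spreading, but still schedule when the
    # cluster is too small (fine for dev, not ideal for production).
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values: ["zookeeper"]
              topologyKey: kubernetes.io/hostname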
Alternatively, you can use this Helm chart (full disclosure: I am the creator) to deploy Pulsar to Kubernetes:
https://helm.kafkaesque.io
See the section "Installing Pulsar for development" for settings that will enable Pulsar to run in smaller Kubernetes setups, including disabling affinity rules.
Related
I am trying to implement a CI/CD pipeline using Kubernetes and Jenkins with my private SVN repository. I am planning to use a Kubernetes cluster with 3 master and 15 worker machines/nodes, and Jenkins to deploy microservices developed using Spring Boot. When deploying with Jenkins, how can I define which microservice needs to be deployed to which node in the Kubernetes cluster? Do I need to specify it in the Pod, or in some other definition?
As said in other answers, you don't need to do this, but you can if there is any reason to, using the (deprecated) nodeSelector or, preferably, affinities. They are well worth the time to read about, since you can group pods belonging to specific services/microservices together, or keep them apart, across the available nodes, allowing for a more flexible and resilient architecture and a proper spread. This way you are helping the scheduler decide where to place what, to achieve the desired layout. For most basic needs the previously mentioned resource allocation can do the trick, but for any fine-graining you have affinity (and anti-affinity) at your disposal; a sketch follows below. Documentation detailing this is here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
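As a rough sketch (names, labels, and the image are all hypothetical), a Deployment pinned to particular nodes could look like this; in practice you would use either nodeSelector or nodeAffinity, not both:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: payment-service              # hypothetical microservice
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: payment-service
      template:
        metadata:
          labels:
            app: payment-service
        spec:
          # Option 1: simple nodeSelector; nodes must carry this label
          nodeSelector:
            disktype: ssd
          # Option 2: the more expressive node affinity
          # affinity:
          #   nodeAffinity:
          #     requiredDuringSchedulingIgnoredDuringExecution:
          #       nodeSelectorTerms:
          #         - matchExpressions:
          #             - key: role
          #               operator: In
          #               values: ["backend"]
          containers:
            - name: payment-service
              image: registry.example.com/payment-service:1.0   # hypothetical

Label the nodes accordingly first, e.g. kubectl label nodes worker-07 disktype=ssd.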
Kubernetes figures out which nodes should run which pods; you don't have to do that. You do have to indicate how much memory and CPU each pod needs, and to a first approximation k8s figures out the rest.
That said, what you do have to do is figure out how to partition the full set of workloads you need to run (say, by environment: dev/stage/prod, or by tenant: team X/team Y/team Z, or client X/client Y/client Z) into namespaces, then figure out what workflow makes sense for that partitioning, then configure the CI to satisfy that workflow.
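A minimal sketch of such a partitioning (the namespace names are illustrative):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: dev
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: stage
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: prod

The CI job for each stage then applies the microservice's manifests into the matching namespace, e.g. kubectl apply -f service.yaml -n stage.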
I have 2 pods running in my Kubernetes cluster. One is a simple WordPress application, and the second one contains a MySQL DB that WordPress communicates with.
I want to find the dependencies between pods. Is there any kubectl command, or any tool like Prometheus, with which I can find dependencies between pods inside the Kubernetes cluster?
No, there is no native Kubernetes primitive that can define dependencies between pods. An easy thing you can do is to define a label like dependsOn and attach it to the corresponding pod.
For example, your WordPress pod can have a label that says dependsOn: mysql, where mysql can be either the name or another label of your MySQL pod.
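A minimal sketch of that convention (the image tag is illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: wordpress
      labels:
        app: wordpress
        dependsOn: mysql        # purely informational; Kubernetes ignores it
    spec:
      containers:
        - name: wordpress
          image: wordpress:5.2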
But this will only help a human reader understand what the pod depends on. Kubernetes works on the principle of eventual consistency: even if MySQL doesn't start before WordPress, they will eventually start working together and the system will become consistent. The WordPress pod will simply crash while it cannot find MySQL, and Kubernetes will keep restarting crashing pods.
If you want to define dependencies between applications on Kubernetes and require deployments to happen in a particular order, you can take a look at tools like Aptomi.
I have recently started getting familiar with Kubernetes. However, while I do get the concept, I have some questions I am unable to answer clearly from Kubernetes' concepts and documentation, and some understandings that I'd like to confirm.
A Deployment is a group of one or more container images (Docker, etc.) that are deployed within a Pod, and through the Kubernetes Deployment Controller such deployments are monitored and created, updated, or deleted.
A Pod is a group of one or more containers. Are those containers from the same Deployment, or can they be from multiple deployments?
"A pod contains one or more application containers which are relatively tightly coupled." Are there any clear criteria for when to deploy containers within the same pod rather than in separate pods?
"Pods are the smallest deployable units of computing that can be created and managed in Kubernetes" (Pods, Kubernetes documentation). Does that mean that the Kubernetes API is unable to monitor and manage containers (at least directly)?
Appreciate your input.
Your question is actually too broad for Stack Overflow, but I'll quickly answer before this one is closed.
Maybe it gets clearer when you look at the API documentation, which you could read like this:
A Deployment describes a specification of the desired behavior for the contained objects.
This is done within the spec field which is of type DeploymentSpec.
A DeploymentSpec defines how the related Pods should look, with a template, through the PodTemplateSpec.
The PodTemplateSpec then holds the PodSpec with all the required parameters, and that defines how the containers within this Pod should look, through a Container definition.
This is not a punchy one-line statement, but maybe it makes it easier to see how things relate to each other.
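In YAML form, that nesting looks roughly like this minimal sketch (names and image are illustrative):

    apiVersion: apps/v1
    kind: Deployment               # the Deployment object
    metadata:
      name: example
    spec:                          # spec -> DeploymentSpec
      replicas: 2
      selector:
        matchLabels:
          app: example
      template:                    # template -> PodTemplateSpec
        metadata:
          labels:
            app: example
        spec:                      # spec -> PodSpec
          containers:              # -> a list of Container definitions
            - name: example
              image: nginx:1.17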
Regarding the criteria for what's a good size and what's too big for a Pod or a Container: this is very opinion-loaded, and the best way to figure it out is to read through the opinions on the size of microservices.
To cover your last point: Kubernetes is able to monitor and manage containers, but the "user" is not able to schedule single containers. They have to be embedded in a Pod definition. You can of course access container status and details per container, e.g. through kubectl logs <pod> -c <container> or through the metrics API.
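For example (the pod and container names are placeholders):

    kubectl logs mypod -c mycontainer      # logs of a single container in the pod
    kubectl describe pod mypod             # per-container state and restart counts
    kubectl top pod mypod --containers     # per-container CPU/memory, via the metrics API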
I hope this helps a bit and doesn't add to the confusion.
A Pod is an abstraction provided by Kubernetes, corresponding to a group of containers that share a subset of namespaces, most importantly the network namespace. For instance, the applications running in these containers can interact the way applications in the same VM would, except for the fact that they don't share the same filesystem hierarchy.
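A minimal sketch illustrating the shared network namespace (images and names are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-network-demo
    spec:
      containers:
        - name: web
          image: nginx:1.17               # listens on port 80 inside the pod
        - name: sidecar
          image: curlimages/curl:7.72.0
          # Both containers share the pod's network namespace, so the
          # sidecar reaches nginx via localhost rather than a Service IP.
          command: ["sh", "-c", "while true; do curl -s http://localhost/; sleep 5; done"]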
Workloads are run in the form of Pods, but a Pod is a lower-level abstraction. Workloads are typically scheduled as Kubernetes Deployments/Jobs/CronJobs/DaemonSets etc., which in turn create the Pods.
I'm using a Kubernetes Deployment with a persistent volume to run my application, like this example:
https://github.com/kubernetes/kubernetes/tree/master/examples/mysql-wordpress-pd
But when I try to add more replicas or autoscale, all the new pods try to connect to the same volume.
How can I automatically create new volumes for each new pod, the way StatefulSets (formerly PetSets) are able to do?
The conclusion I reached for K8s 1.6 is that you can't. However, you can use NFS. If, like CrateDB, your cluster can create a folder for each node under the volume mount, then you can auto-scale. So I auto-scale CrateDB as a Deployment, using this configuration:
https://github.com/erik777/kubernetes-cratedb
which relies on an nfs-server, which I deploy as an RC with PVC/PV:
SAME_BASE/kubernetes-nfs-server
It is on my TODO list to explore distributed file systems such as GlusterFS. For K8s Deployments, your choice of file system is your remedy.
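A minimal sketch of the shared-NFS approach (the server IP, export path, and sizes are placeholders); all replicas of a Deployment can then mount the same ReadWriteMany claim:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: shared-nfs
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteMany
      nfs:
        server: 10.0.0.10        # placeholder: the nfs-server's service IP
        path: /exports
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-nfs
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 10Gi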
You can also engage the scalability and storage SIGs in the K8s community to help prioritize this use case. Adding the capability to K8s would remove the requirement for a clustering solution to handle node separation in a shared volume, as well as prevent the introduction of additional points of failure between the clustered app and the PV.
GITHUB kubernetes/community
Hopefully, we can see an out-of-the-box K8s solution by 2.0.
(NOTE: Had to change 2 of the GITHUB links because I don't have "10 reputation")
I know that GKE is driven by Kubernetes underneath, but what I still don't get is which part is taken care of by GKE and which by k8s in the layering. The main purpose of both, as it appears to me, is to manage containers in a cluster. Basically, I am looking for a simpler explanation with an example.
GKE is a managed/hosted Kubernetes offering (i.e. it is managed for you, so you can concentrate on running your pods/container applications).
Kubernetes does handle:
Running pods, scheduling them on nodes, guaranteeing the number of replicas per the ReplicationController settings (i.e. relaunching pods if they fail, relocating them if the node fails)
Services: proxying traffic to the right pod, wherever it is located.
Jobs
In addition, there are several 'add-ons' to Kubernetes, some of which are part of what makes GKE:
DNS (you can't really live without it, even though it's an add-on)
Metrics monitoring: with InfluxDB, Grafana
Dashboard
None of these come out of the box; although they are fairly easy to set up, you need to maintain them.
There is no real 'logging' add-on, but there are various projects to do this (using Logspout, Logstash, Elasticsearch, etc.).
In short, Kubernetes does the orchestration; the rest are services that run on top of Kubernetes.
GKE brings you all these components out of the box, and you don't have to maintain them. They're set up for you, and they're more 'integrated' with the Google portal.
One important thing that everyone needs is the LoadBalancer part:
- Since Pods are ephemeral containers that can be rescheduled anywhere and at any time, they are not static, so ingress traffic needs to be managed separately.
This can be done within Kubernetes by using a DaemonSet to pin a Pod to a specific node, and using a hostPort for that Pod to bind to the node's IP (see the sketch below).
Obviously this lacks fault tolerance, so you could run multiple of these and use DNS round-robin load balancing.
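A minimal sketch of that pattern (names and the proxy image are illustrative):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: edge-proxy              # hypothetical ingress proxy
    spec:
      selector:
        matchLabels:
          app: edge-proxy
      template:
        metadata:
          labels:
            app: edge-proxy
        spec:
          containers:
            - name: proxy
              image: nginx:1.17     # stands in for any ingress proxy
              ports:
                - containerPort: 80
                  hostPort: 80      # binds directly to each node's IP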
GKE takes care of all this too with external Load Balancing.
(On AWS, it's similar, with ALB taking care of load balancing in Kubernetes)
GKE (Google Container Engine) is just a container platform that Kubernetes manages; it is not a Kubernetes-like system with "differences".
As mentioned in "Docker and Kubernetes and AppC" (May 2015; this can change):
Docker is currently the only supported runtime in GKE (Google Container Engine) our commercial containers product, and in GAE (Google App Engine), our Platform-as-a-Service product.
You can see Kubernetes used on GKE in this example: "Spinning Up Your First Kubernetes Cluster on GKE" from Rimantas Mocevicius.
The gcloud CLI still issues Kubernetes commands behind the scenes.
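For example (cluster name and zone are placeholders), creating a GKE cluster with gcloud and then talking to it with plain kubectl:

    gcloud container clusters create my-cluster --num-nodes=3 --zone=us-central1-a
    gcloud container clusters get-credentials my-cluster --zone=us-central1-a
    kubectl get nodes    # from here on it is ordinary Kubernetes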
GKE organizes its platform through the Kubernetes master:
Every container cluster has a single master endpoint, which is managed by Container Engine.
The master provides a unified view into the cluster and, through its publicly-accessible endpoint, is the doorway for interacting with the cluster.
The managed master also runs the Kubernetes API server, which services REST requests, schedules pod creation and deletion on worker nodes, and synchronizes pod information (such as open ports and location) with service information.
In short, without getting into technical details,
GKE is managed Kubernetes, similar to how Google's Cloud Composer is managed Apache Airflow and Cloud Dataflow is managed Apache Beam.
So, some of Google Cloud Platform's services (GKE, Cloud Composer, Cloud Dataflow) are managed implementations of various open source technologies (Kubernetes, Airflow, Beam).