How does Spark utilize the nodes of Kubernetes when running in a pod? - docker

I'm learning Spark, and I'm confused about running a Docker container with Spark code on a Kubernetes cluster.
I read that Spark uses multiple nodes (servers) and can run code on different nodes in order to complete jobs faster (and to use the memory of each node when the data is too big).
On the other hand, I read that a Kubernetes pod (which contains Docker containers) runs on one node.
For example, I'm running the following Spark code from Docker:
num = [1, 2, 3, 4, 5]
num_rdd = sc.parallelize(num)
double_rdd = num_rdd.map(lambda x: x * 2)
Some notes and reminders (from my understanding):
When using the map command, each value of the num array maps to a different Spark node (slave worker)
A k8s pod runs on one node
So I'm confused: how does Spark utilize multiple nodes when the pod runs on one node?
Do the Spark slave workers run on different nodes, and is this how the pod that runs the code above communicates with those nodes in order to utilize the Spark framework?

When you run Spark on Kubernetes, you have a few ways to set things up. The most common way is to set Spark to run in client-mode.
Basically, Spark can run on Kubernetes in a Pod. The application itself, given the endpoint of the k8s masters, can then spawn its own worker Pods, as long as everything is configured correctly.
This setup requires deploying the Spark application on Kubernetes (usually with a StatefulSet, but that's not a must) along with a headless ClusterIP Service, which is required so that the worker Pods can communicate with the master application that spawned them.
You also need to give the Spark application the correct configuration, such as the k8s master endpoint, the Pod name and other parameters.
There are other ways to set up Spark. There's no obligation to spawn worker Pods: you can run all the stages of your code locally (the configuration is easy, and if you only have small jobs with a small amount of data, you don't need workers).
Or you can run the Spark application outside the Kubernetes cluster, so not in a Pod, while giving it the Kubernetes master endpoint so that it can still spawn workers on the cluster. (Strictly speaking this is still client mode; in cluster mode, spark-submit asks Kubernetes to create the driver Pod as well.)
You can find a lot more info in the Spark documentation, which explains almost everything needed to set things up (https://spark.apache.org/docs/latest/running-on-kubernetes.html#client-mode)
You can also read about StatefulSets and their use of headless ClusterIP Services here (https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
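To make the client-mode setup above a bit more concrete, here is a minimal Python sketch of the Spark properties a driver Pod would typically need. The image name, the headless Service DNS name, the port number and the two helper functions are placeholders made up for illustration; only the `spark.*` property names are taken from the Spark documentation. Treat it as a starting point under those assumptions, not as a complete configuration.

```python
def client_mode_conf(api_server, image, driver_host,
                     namespace="default", executors=2, driver_port=29413):
    """Build Spark properties for a driver that runs inside a Pod (client mode).

    All argument values are supplied by the caller; nothing here is discovered
    automatically from the cluster.
    """
    return {
        # The k8s:// prefix selects Spark's Kubernetes scheduler backend.
        "spark.master": f"k8s://{api_server}",
        "spark.submit.deployMode": "client",
        "spark.kubernetes.namespace": namespace,
        # Image used for the executor (worker) Pods that the driver spawns.
        "spark.kubernetes.container.image": image,
        "spark.executor.instances": str(executors),
        # Executors connect back to the driver through this address, which is
        # why the headless Service in front of the driver Pod is needed.
        "spark.driver.host": driver_host,
        "spark.driver.port": str(driver_port),
    }


def build_session(conf, app_name="demo"):
    """Apply the properties to a SparkSession (requires pyspark to be installed)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = client_mode_conf(
    api_server="https://kubernetes.default.svc:443",       # in-cluster API address
    image="registry.example.com/spark-py:latest",          # hypothetical image
    driver_host="spark-driver.default.svc.cluster.local",  # hypothetical headless Service
)
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

With this in place, calling `build_session(conf)` inside the driver Pod would give you the session whose `sparkContext` plays the role of `sc` in the question; the `map` then runs in the executor Pods, which Kubernetes is free to schedule on other nodes.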

Related

Airflow with mysql_to_gcp negsignal.sigkill

I'm using Airflow with Composer (GCP) to extract data from Cloud SQL to GCS, and then from GCS to BigQuery. I have some tables between 100 MB and 10 GB. My DAG has two tasks to do what I mentioned before. With the smaller tables the DAG runs smoothly, but with slightly larger tables the Cloud SQL extraction task fails after a few seconds and does not produce any logs except "negsignal.sigkill". I have already tried increasing the Composer capacity, among other things, but nothing has worked yet.
I'm using the mysql_to_gcs and gcs_to_bigquery operators.
The first thing you should check when you get Negsignal.SIGKILL is your Kubernetes resources. This is very likely a problem with resource limits.
I think you should monitor your Kubernetes Cluster Nodes. Inside GCP, go to Kubernetes Engine > Clusters. You should have a cluster containing the environment that Cloud Composer uses.
Now, head to the nodes of your cluster. Each node provides you metrics about CPU, memory & disk usage. You will also see the limit for the resources that each node uses. Also, you will see the pods that each node has.
If you are not very familiar with K8s, let me explain this briefly. Airflow uses Pods inside nodes to run your Airflow tasks. These pods are called airflow-worker-[id]. That way you can identify your worker pods inside the Node.
Check your pod list. If you have evicted airflow-worker pods, then Kubernetes is stopping your workers for some reason. Since Composer uses CeleryExecutor, an evicted airflow-worker points to a problem. This is not the case if you use KubernetesExecutor, but that is not available yet in Composer.
If you click on an evicted pod, you will see the reason for the eviction. That should give you the answer.
If you don't see a problem with your pod eviction, don't panic, you still have some options. From that point on, your best friend will be logs. Be sure to check your pod logs, node logs and cluster logs, in that order.
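If you prefer a script over clicking through the console, the check for evicted workers can be sketched roughly like this. It assumes you have saved the output of `kubectl get pods -n <namespace> -o json` (the namespace depends on your environment) and relies on evicted pods carrying `status.reason` set to `Evicted`. The function name and the sample data are made up for illustration.

```python
import json


def evicted_workers(pod_list, prefix="airflow-worker"):
    """Return (name, message) pairs for evicted pods whose name starts with prefix."""
    found = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        if name.startswith(prefix) and status.get("reason") == "Evicted":
            found.append((name, status.get("message", "")))
    return found


# Made-up sample in the shape that `kubectl get pods -o json` produces.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "airflow-worker-abc"},
     "status": {"phase": "Failed", "reason": "Evicted",
                "message": "The node was low on resource: memory."}},
    {"metadata": {"name": "airflow-worker-def"},
     "status": {"phase": "Running"}},
    {"metadata": {"name": "airflow-scheduler-xyz"},
     "status": {"phase": "Running"}}
  ]
}
""")

for name, message in evicted_workers(sample):
    print(f"{name}: {message}")
```

A message that mentions memory would be consistent with the extraction task being killed because the worker ran out of memory on the larger tables.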

3 tier architecture on kubernetes

I have 1 Kubernetes master server and 9 nodes. I want to run the backend on 2 nodes, the frontend on 2 nodes, and the DB on 3 nodes.
I have Docker images ready for the backend, frontend and DB.
How can I run an image with Kubernetes on only the desired nodes (2 or 3)?
Please share some ideas on how to achieve this.
Most of the time the Kubernetes scheduler will do a good job of distributing the pods across the cluster. You may want to leave that responsibility to the scheduler unless you have very specific requirements.
If you want to control this, you can use:
Node selectors
Node Affinity or Anti-Affinity
Directly specify the node name in the deployment spec
Of these three, the recommended approach is node affinity or anti-affinity, because it is the most flexible.
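As a rough illustration of the node affinity option, the sketch below adds a required node-affinity rule to a Deployment manifest that is held as a plain Python dict and printed as JSON (which `kubectl apply -f` accepts as well as YAML). The label key `tier`, the image name and the helper function are assumptions made up for the example; the nodes would have to be labelled accordingly first.

```python
import json


def pin_to_nodes(deployment, label_key, label_values):
    """Add a required node-affinity rule to a Deployment manifest (a dict)."""
    pod_spec = deployment["spec"]["template"]["spec"]
    pod_spec["affinity"] = {
        "nodeAffinity": {
            # "required" means the pod stays Pending if no node matches.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": label_key,
                        "operator": "In",
                        "values": list(label_values),
                    }]
                }]
            }
        }
    }
    return deployment


backend = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "backend"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "backend"}},
        "template": {
            "metadata": {"labels": {"app": "backend"}},
            "spec": {
                # Hypothetical image name.
                "containers": [{"name": "backend", "image": "example/backend:1.0"}],
            },
        },
    },
}

manifest = pin_to_nodes(backend, "tier", ["backend"])
print(json.dumps(manifest, indent=2))
```

Note that this only restricts which nodes are eligible. If each replica must land on a different node, you would additionally need something like pod anti-affinity.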
Run the frontend as a Deployment with the desired replica count and let Kubernetes manage it for you.
Run the backend as a Deployment with the desired number of replicas and Kubernetes will figure out how to run it. Use node selectors if you prefer specific nodes.
Run the DB as a Deployment or a StatefulSet, and Kubernetes will figure out how to run it.
https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/
Use network policies to restrict traffic.
You can use labels and nodeSelector, as described here:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
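A minimal sketch of the labels plus nodeSelector approach could look like the following. It assumes you have labelled the three DB nodes yourself, for example with `kubectl label nodes <node-name> tier=db`; the label, the image name and the helper function are hypothetical. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f` also accepts.

```python
import json


def deployment_with_node_selector(name, image, replicas, node_labels):
    """Build a Deployment manifest whose pods may only run on matching nodes."""
    pod_labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": pod_labels},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {
                    # Only nodes carrying all of these labels are eligible.
                    "nodeSelector": dict(node_labels),
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


db = deployment_with_node_selector(
    name="db",
    image="example/db:1.0",  # hypothetical image
    replicas=3,
    node_labels={"tier": "db"},
)
print(json.dumps(db, indent=2))
```

The same helper can be reused for the frontend and backend with their own labels and replica counts.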

Trying to understand what a Kubernetes Worker Node and Pod is compared to a Docker "Service"

I'm trying to learn Kubernetes to push up my microservices solution to some Kubernetes in the Cloud (e.g. Azure Kubernetes Service, etc)
As part of this, I'm trying to understand the main concepts, specifically around Pods + Workers and (in the yml file) Pods + Services. To do this, I'm trying to compare what I have inside my docker-compose file against the new concepts.
Context
I currently have a docker-compose.yml file which contains about 10 images. I've split the solution up into two 'networks': frontend and backend. The backend network contains 3 microservices and cannot be accessed at all via a browser. The frontend network contains a reverse-proxy (aka. Traefik, which is just like nginx) which is used to route all requests to the appropriate backend microservice and a simple SPA web app. All works 100% awesome.
Each backend Microservice has at least one of these:
Web API host
Background tasks host
So this means, I could scale out the WebApi hosts, if required .. but I should never scale out the background tasks hosts.
Here's a simple diagram of the solution:
So if the SPA app tries to request some data with the following route:
https://api.myapp.com/account/1 this will hit the reverse-proxy and match a rule to then forward onto <microservice b>/account/1
So from here, I'm trying to learn how to write a Kubernetes deployment file based on these docker-compose concepts.
Questions
Each 'Pod' has its own IP so I should create a Pod per container. (Yes, a Pod can have multiple containers and to me, that's like saying 'install these software products on the same machine')
A 'Worker Node' is what we replicate/scale out, so we should put our Pods into a Node based on the scaling scenario. For example, the Background Task hosts should go into one Node because they shouldn't be scaled. Also, the hardware requirements for that node are really small. Meanwhile, the Web APIs should go into another Node so they can be replicated/scaled out.
If I'm on the right path with the understanding above, then I'll have a lot of nodes and pods ... which feels ... weird?
The pod is the unit of Workload, and has one or more containers. Exactly one container is normal. You scale that workload by changing the number of Pod Replicas in a ReplicaSet (or Deployment).
A Pod is mostly an accounting construct with no direct parallel to base docker. It's similar to docker-compose's Service. A pod is mostly immutable after creation. Like every resource in kubernetes, a pod is a declaration of desired state - containers to be running somewhere. All containers defined in a pod are scheduled together and share resources (IP, memory limits, disk volumes, etc).
All Pods within a ReplicaSet are both fungible and mortal - a request can be served by any pod in the ReplicaSet, and any pod can be replaced at any time. Each pod does get its own IP, but a replacement pod will probably get a different IP. And if you have multiple replicas of a pod they'll all have different IPs. You don't want to manage or track pod IPs. Kubernetes Services provide discovery (how do I find those pods' IPs) and routing (connect to any Ready pod without caring about its identity) and load balancing (round robin over that group of Pods).
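To illustrate why you don't track pod IPs yourself, here is a toy Python model of what a Service does for you: select pods by label, keep only the Ready ones, and rotate requests over them. This is a simplified mental model with made-up names and IPs, not how Kubernetes actually implements Services.

```python
class ToyService:
    """Toy stand-in for a Kubernetes Service: label selection plus round robin."""

    def __init__(self, selector):
        self.selector = dict(selector)
        self._next = 0

    def endpoints(self, pods):
        """Discovery: IPs of Ready pods whose labels match the selector."""
        return [
            pod["ip"]
            for pod in pods
            if pod["ready"]
            and all(pod["labels"].get(k) == v for k, v in self.selector.items())
        ]

    def route(self, pods):
        """Routing: pick the next Ready pod, whichever one it happens to be."""
        ips = self.endpoints(pods)
        if not ips:
            raise RuntimeError("no ready pods match the selector")
        ip = ips[self._next % len(ips)]
        self._next += 1
        return ip


pods = [
    {"ip": "10.0.0.1", "ready": True, "labels": {"app": "web-api"}},
    {"ip": "10.0.0.2", "ready": False, "labels": {"app": "web-api"}},
    {"ip": "10.0.0.3", "ready": True, "labels": {"app": "web-api"}},
    {"ip": "10.0.0.4", "ready": True, "labels": {"app": "db"}},
]

service = ToyService({"app": "web-api"})
print([service.route(pods) for _ in range(3)])

# A pod dies and its replacement comes up with a different IP.
pods[0] = {"ip": "10.0.0.9", "ready": True, "labels": {"app": "web-api"}}
print(service.endpoints(pods))
```

Callers only ever talk to the service; the set of IPs behind it changes underneath without them noticing.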
A Node is the compute machine (VM or Physical) running a kernel and a kubelet and a dockerd. (This is a bit of a simplification. Other container runtimes than just dockerd exist, and the virtual-kubelet project aims to turn this assumption on its head.)
All pods are scheduled on Nodes. When a pod (with its containers) is scheduled on a node, the kubelet running on that node is responsible for it: the kubelet talks to dockerd to start the containers.
Once scheduled on a node, a pod is not moved to another node. Nodes are fungible & mortal too, though. If a node goes down or is being decommissioned, the pod will be evicted/terminated/deleted. If that pod was created by a ReplicaSet (or Deployment) then the ReplicaSet Controller will create a new replica of that pod to be scheduled somewhere else.
You normally start many (1-100) pods+containers on the same node+kubelet+dockerd. If you have more pods than that (or they need a lot of CPU/RAM/IO), you need more nodes. So nodes are also a unit of scale, though only indirectly with respect to the web app.
You do not normally care which Node a pod is scheduled on. You let kubernetes decide.
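The sizing remark above can be turned into a back-of-the-envelope calculation. The sketch below is a deliberately crude estimate: it assumes identical pods and identical nodes, ignores resources reserved for the system and any bin-packing effects, and takes the "up to 100 pods per node" figure from the text above as a default. The function name and the example numbers are made up.

```python
import math


def nodes_needed(pods, cpu_per_pod, mem_per_pod_gib,
                 node_cpu, node_mem_gib, max_pods_per_node=100):
    """Estimate how many identical nodes are needed for identical pods."""
    by_cpu = math.ceil(pods * cpu_per_pod / node_cpu)
    by_mem = math.ceil(pods * mem_per_pod_gib / node_mem_gib)
    by_count = math.ceil(pods / max_pods_per_node)
    # The tightest constraint decides; always keep at least one node.
    return max(by_cpu, by_mem, by_count, 1)


# 40 web API pods, each asking for half a CPU and 1 GiB of memory,
# on nodes with 4 CPUs and 16 GiB each.
print(nodes_needed(pods=40, cpu_per_pod=0.5, mem_per_pod_gib=1.0,
                   node_cpu=4, node_mem_gib=16))
```

In this example CPU is the limiting resource, which is why scaling the web API eventually means adding nodes even though you never place pods on nodes by hand.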

Cluster level logging using Elasticsearch and Kibana on Docker for Windows

The Kubernetes documentation states it's possible to use Elasticsearch and Kibana for cluster level logging.
Is it possible to do this on the instance of Kubernetes that ships with Docker for Windows, as per the documentation? I'm not interested in third-party Kubernetes manifests or Helm charts that mimic this behavior.
Kubernetes is an open-source system for automating deployment, scaling,
and management of containerized applications.
It is a complex environment with a huge amount of information about the state of the cluster and the events
processed during the pod lifecycle and the health checking of all nodes and the whole Kubernetes
cluster.
I have no experience with Docker for Windows, so my point of view is based on Kubernetes with Linux
containers.
To collect and analyze all of this information there are tools like Fluentd and Logstash,
which are accompanied by tools such as Elasticsearch and Kibana.
This kind of cluster-level log aggregation can be implemented using the Kubernetes orchestration framework itself.
So we can expect that some running containers take care of gathering data while other containers
take care of other aspects, such as the analysis and presentation layers.
Please note that some solutions depend on features of the cloud platform where the Kubernetes environment
is running. For example, GCP offers Stackdriver Logging.
We can distinguish several layers of log collection and analysis:
Monitoring a pod. This is the most rudimentary form of viewing Kubernetes logs.
You use kubectl commands to fetch log data for each pod individually.
These logs are stored in the pod, and when the pod dies, the logs die with it.
Monitoring a node. The logs collected on each node are stored in JSON files, which can get really large.
Node-level logs are more persistent than pod-level ones.
Monitoring a cluster. Kubernetes doesn’t provide a default logging mechanism for the entire cluster, but leaves this up
to the user and third-party tools to figure out. One approach is to build on the node-level logging.
This way, you can assign an agent to collect the logs on every node and combine their output.
As you can see, there is a gap in cluster-level monitoring, so there is good reason to aggregate the logs and
offer a practical way to analyze and present the results.
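The "one agent per node, then combine the output" idea can be sketched in a few lines of Python. This is only a toy model of what a real agent does. It assumes the node-level files use Docker's json-file format, where each line is a JSON object with `log`, `stream` and `time` keys; other container runtimes write a different format. The function names, node names and log lines are made up.

```python
import json


def collect_node_logs(node_name, lines):
    """Parse json-file log lines from one node and tag each record with the node."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        records.append({
            "node": node_name,
            "time": entry["time"],
            "stream": entry["stream"],
            "message": entry["log"].rstrip("\n"),
        })
    return records


def combine(per_node_records):
    """Merge the records of all nodes into one stream ordered by timestamp."""
    merged = [record for records in per_node_records for record in records]
    # Timestamps in the same ISO 8601 format sort correctly as plain strings.
    return sorted(merged, key=lambda record: record["time"])


# Made-up log lines standing in for the files found on two nodes.
node_a = collect_node_logs("node-a", [
    r'{"log":"request served\n","stream":"stdout","time":"2024-01-01T00:00:02Z"}',
])
node_b = collect_node_logs("node-b", [
    r'{"log":"connection refused\n","stream":"stderr","time":"2024-01-01T00:00:01Z"}',
])

for record in combine([node_a, node_b]):
    print(record["time"], record["node"], record["stream"], record["message"])
```

A real agent would ship these records to a central store instead of printing them.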
For node-level logging, a popular log aggregator is Fluentd. It is deployed as a Docker container
and runs in parallel with the pod lifecycle. Fluentd does not store the logs itself.
Instead, it sends the logs to an Elasticsearch cluster that stores the log information in a replicated set of nodes.
In other words, Elasticsearch is used as the data store for the aggregated logs of the worker nodes.
This aggregator cluster consists of a pod with two instances of Elasticsearch.
The aggregated logs in the Elasticsearch cluster can be viewed using Kibana.
This presents a web interface, which provides a more convenient interactive method for querying the ingested logs.
The Kibana pods are also monitored by the Kubernetes system to ensure they are running healthily and the expected
number of replicas are present.
The lifecycle of these pods is controlled by a replication-controller specification similar in nature to how the
Elasticsearch cluster was configured.
Back to your question. I'm pretty sure that the setup described above also works with Kubernetes on Docker
for Windows. On the other hand, I think a cloud platform or an on-premises Linux environment
is the natural place for these tools to live.
This answer was inspired by the articles "Cluster-level Logging of Containers with Containers" and "Kubernetes Logging".
I also like the "Configuring centralized logging from Kubernetes" page, and I used "An Introduction
to logging in Kubernetes" when I was starting out with Kubernetes.

What is the difference between kubernetes and GKE?

I know that GKE is driven by Kubernetes underneath. But what I still don't get is which part is taken care of by GKE and which by k8s in the layering. The main purpose of both, as it appears to me, is to manage containers in a cluster. Basically, I am looking for a simpler explanation with an example.
GKE is a managed/hosted Kubernetes (i.e. it is managed for you so you can concentrate on running your pods/containers applications)
Kubernetes does handle:
Running pods, scheduling them on nodes, and guaranteeing the number of replicas per the Replication Controller settings (i.e. relaunching pods if they fail, relocating them if the node fails)
Services: proxy traffic to the right pod wherever it is located.
Jobs
In addition, there are several 'add-ons' to Kubernetes, some of which are part of what makes GKE:
DNS (you can't really live without it, even though it's an add-on)
Metrics monitoring: with InfluxDB and Grafana
Dashboard
None of these come out of the box. They are fairly easy to set up, but you need to maintain them.
There is no real 'logging' add-on, but there are various projects to do this (using Logspout, Logstash, Elasticsearch etc...)
In short Kubernetes does the orchestration, the rest are services that would run on top of Kubernetes.
GKE brings you all these components out of the box, and you don't have to maintain them. They're set up for you, and they're more 'integrated' with the Google portal.
One important thing that everyone needs is the LoadBalancer part:
Since Pods are ephemeral and can be rescheduled anywhere at any time, they are not static, so ingress traffic needs to be managed separately.
This can be done within Kubernetes by using a DaemonSet to fix a Pod on a specific node, and using a hostPort for that Pod to bind to the node's IP.
Obviously this lacks fault tolerance, so you could use multiple nodes and do DNS round-robin load balancing.
GKE takes care of all this too with external Load Balancing.
(On AWS, it's similar, with ALB taking care of load balancing in Kubernetes)
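For the do-it-yourself variant described above (a DaemonSet pinned to chosen nodes and bound to a hostPort), a manifest could be sketched as follows. It is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts. The node label `role=ingress`, the image name, the ports and the helper function are assumptions made up for the example.

```python
import json


def ingress_daemonset(name, image, container_port, host_port, node_labels):
    """Build a DaemonSet that runs one proxy pod on every node with the given labels."""
    pod_labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": pod_labels},
            "template": {
                "metadata": {"labels": pod_labels},
                "spec": {
                    # Restricts the DaemonSet to the nodes you expose to traffic.
                    "nodeSelector": dict(node_labels),
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{
                            "containerPort": container_port,
                            # hostPort binds the container port on the node itself.
                            "hostPort": host_port,
                        }],
                    }],
                },
            },
        },
    }


proxy = ingress_daemonset(
    name="edge-proxy",
    image="example/proxy:1.0",  # hypothetical image
    container_port=8080,
    host_port=80,
    node_labels={"role": "ingress"},
)
print(json.dumps(proxy, indent=2))
```

You would then point DNS at the labelled nodes yourself, which is exactly the part that a managed load balancer takes off your hands.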
GKE (Google Container Engine, since renamed Google Kubernetes Engine) is only a container platform that Kubernetes manages. It is not a Kubernetes-like product with "differences".
As mentioned in "Docker and Kubernetes and AppC" (May 2015, so this may change):
Docker is currently the only supported runtime in GKE (Google Container Engine) our commercial containers product, and in GAE (Google App Engine), our Platform-as-a-Service product.
You can see Kubernetes used on GKE in this example: "Spinning Up Your First Kubernetes Cluster on GKE" from Rimantas Mocevicius.
The gcloud API will still issue Kubernetes commands behind the scenes.
GKE organizes its platform through the Kubernetes master:
Every container cluster has a single master endpoint, which is managed by Container Engine.
The master provides a unified view into the cluster and, through its publicly-accessible endpoint, is the doorway for interacting with the cluster.
The managed master also runs the Kubernetes API server, which services REST requests, schedules pod creation and deletion on worker nodes, and synchronizes pod information (such as open ports and location) with service information.
In short, without getting into technical details,
GKE is managed Kubernetes, similar to how Google's Cloud Composer is managed Apache Airflow and Cloud Dataflow is managed Apache Beam.
So, some of Google Cloud Platform's services (GKE, Cloud Composer, Cloud Dataflow) are managed implementations of various open source technologies (Kubernetes, Airflow, Beam).

Resources