How to deploy a Kubernetes cluster on multiple physical machines in the best manner?

I recently finished a project in which I built an app consisting of several Docker containers. The purpose of the app was to collect data, save it to a database, and allow user interaction through a simple web GUI. The app was hosted on four different Raspberry Pis, and it was possible to collect data from all physical machines through an API. You could also run some simple machine learning tasks, such as detecting anomalies in the Pis' sensor data.
Now I'm trying to take the next step and use Kubernetes for load balancing and remote updates. My main goal is to remotely update all Raspberry Pis from my master node, which, in theory, would be a very handy feature. I also want to share the resources of the Pis within the cluster for calculations.
I have read a lot about Kubernetes, Minikube, k3s, kind and all the different approaches to setting up a Kubernetes cluster, but I feel like I am missing "a last puzzle piece".
From what I understand, I need an approach that lets me set up a local multi-node cluster (local because all machines are lying on my desk; no cloud is needed). My master node would ideally be my laptop, running Ubuntu in a virtual machine. My Raspberry Pis would be my slave/worker nodes. If I wanted to update my cluster, I could use the Kubernetes remote update functionality.
So my question is: does it make sense to use several Raspberry Pis as nodes in a Kubernetes cluster and to manage them from one master node (laptop), and do you have any suggestions on how to achieve this setup?
I usually don't like questions that contain no specific code or attempts by the asker, but I feel like a simple hint could speed up my project notably. If this is the wrong place, please feel free to delete this question.
Best regards

You didn't mention which rpi models you are using, but I assume you are not using rpi zeros.
My main goal is to remotely update all Raspberry Pis from my master node.
Assuming you mean updating the applications that run in the Kubernetes cluster installed on the Raspberry Pis, keep reading. Otherwise, ignore everything I wrote; what you probably need is Ansible or a similar provisioning, configuration-management, or application-deployment tool.
Now, to answer your question:
Does it make sense to use several Raspberry Pis as nodes in a Kubernetes cluster
Yes. This is why k3s was created: it makes such a setup possible with fewer resources.
and to manage them from one master node (laptop)
Assuming you will be using it for learning purposes, why not? It is possible, but be aware that when the master node goes down (e.g. when you turn off your laptop), the whole cluster goes down, or at least the API server becomes unreachable, so you won't be able to change the cluster's state. Also make sure you use a bridged network interface for your VM so that it is visible on your local network as a standalone machine.
and do you have any suggestions about the way to achieve this setup.
Installing k3s on all nodes would be the easiest option in your case. There are plenty of resources on the internet explaining how to do it.
One last thing I would like to explain is how updates work.
As for Kubernetes updates: Kubernetes doesn't update itself automatically; you need to update it explicitly. A new k8s version is released every few months, and a release sometimes "breaks" things, with no backward compatibility (so always read the changelog before updating, because a rollback may not be possible unless you backed up etcd beforehand).
As for updating applications: to run your app, all you do is send YAML files describing your application to k8s, and it handles the rest. So if you want to update your app, just change the tag on the container image to the newer version and k8s will roll out the update. The k8s documentation on update strategies covers this in more detail.
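To make the "just change the image tag" step concrete, here is a minimal sketch in Python that builds a Deployment manifest as a plain dict and produces an updated copy with a new tag. The app name `sensor-collector`, the image `example/sensor-collector` and the version numbers are made up for illustration, and the helper assumes the image reference already carries an explicit tag.

```python
import copy
import json


def make_deployment(name, image, replicas=2):
    """Return a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def with_new_tag(manifest, new_tag):
    """Return a copy of the manifest whose first container uses new_tag.

    Assumes the image reference has the form "repository:tag".
    """
    updated = copy.deepcopy(manifest)
    container = updated["spec"]["template"]["spec"]["containers"][0]
    repository = container["image"].rsplit(":", 1)[0]
    container["image"] = f"{repository}:{new_tag}"
    return updated


if __name__ == "__main__":
    # Hypothetical app name, image and versions, for illustration only.
    v1 = make_deployment("sensor-collector", "example/sensor-collector:1.0.0")
    v2 = with_new_tag(v1, "1.1.0")
    print(json.dumps(v2, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, which accepts JSON as well as YAML. Only the image field differs between the two versions; replacing the running pods is left to k8s.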

Related

What is the "proper" way to migrate from Docker Compose to Kubernetes?

My organization manages systems where each client is provisioned a VPS and then their tech stack is spun up on that system via Docker Compose.
Data is stored on-system, using Docker Compose volumes. None of the fancy named storage - just good old direct path volumes.
While this solution is workable, the problem is that this method does not scale. We can always give the VPS more CPU/Memory but that does not fix the underlying issues.
Staging / development environments must be brought up manually - and there is no service redundancy. Hot swapping is impossible with our current system.
Kubernetes has been pitched to me to solve our problems, but honestly I have no idea where to begin - most of the documentation is obtuse and I have failed to find somebody with our particular predicament.
The end goal would be to have just a few high-spec machines running Kubernetes - with redundancy, staging, and the ability to spin up new clients as necessary (without having to provision additional machines or external IPs).
What specific tools would my organization need to use to achieve this goal?
Are there any tools that would allow us to bring over our existing Docker Compose stacks into Kubernetes?
Where to begin: given what you're telling us, I would first look into my options for implementing some software-defined storage (SDS).
You're currently using local volumes, which you probably won't be able to do with Kubernetes - or at least shouldn't, if you don't want to bind your containers to a unique node.
The easiest way, while not necessarily the one I would recommend, would be to use some NFS servers. Even better: with some DRBD, pacemaker / corosync, using a VIP for failover -- or the FreeBSD way: hastd, carp, ifstated, maybe some zfs. You would probably have to deploy distinct systems as your Kubernetes cluster scales, to distribute I/O, ... a single NFS server doesn't last long without its load going over 50 and iowaits spiking ...
A better way would be to look into actual SDS solutions. One I could recommend is Ceph, though there are a lot of newer solutions I'm less familiar with ... and there's GlusterFS, which I would definitely avoid. An easy way to deploy Ceph would be to use ceph-ansible.
Depending on what corporate hardware you have at your disposal, maybe you have some NetApp or equivalent, something that can provide NFS shares and/or some iSCSI gateways.
Now, those are all solutions you could run on the side. Note that you would also find "CNS" (container-native storage) solutions, which are meant to be deployed on top of Kubernetes; Ceph clusters, for example, can be managed using Rook. These can be interesting, though in terms of maintenance and operations they require good knowledge of both the solution you operate and Kubernetes/containers in general: troubleshooting issues and fixing outages may not be as easy as with a good old bare-metal/VM setup. For a first Kubernetes experience, I would hold off. When you feel comfortable enough, go ahead.
In any case, another critical consideration before deploying your cluster is the network that will host your installation. Kubernetes should not be deployed directly on public instances: you would probably want a private VLAN, maybe an internal DNS, a local registry (which could be Kubernetes-hosted), and other tools such as an LDAP server, an SMTP relay, HTTP caches/proxies, load balancers to put in front of your API, ...
Once you've made up your mind regarding those issues, you can look into deploying a Kubernetes cluster using tools such as Kubespray (Ansible) or Kops (which targets cloud providers and thus requires some cloud API, e.g. AWS; it can optionally generate Terraform configuration). Both projects are part of the Kubernetes project and maintained by its community. Kubespray covers all scenarios (IaaS and bare metal), integrates with popular SDS out of the box, can ship with various ingress controllers, ... Overall it offers good defaults and lots of variables to customize your installation.
Start with a 3-master 2-workers cluster, make sure the resulting cluster matches what you would expect.
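One way to see why three masters is a sensible starting point: assuming the control plane stores its state in etcd, which needs a majority of members to stay available (that is my assumption here, the answer itself doesn't name the datastore), the number of master failures a cluster survives can be worked out with a few lines of Python.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master(s): quorum {quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under that assumption, two masters tolerate no more failures than one, while three tolerate a single failure, which is why odd sizes are the usual choice.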
Before going to prod, take your time to properly translate your existing configurations. Sometimes, refactoring code or images could be worth it.
When going to prod, consider adding a group of "infra" nodes if you want to host a logging solution or other internal services that are somewhat critical to users and shouldn't suffer outages caused by end-user workloads (e.g. ingress routers, monitoring, logging, integrated registry, ...).
Kubespray: https://github.com/kubernetes-sigs/kubespray/
Kops: https://github.com/kubernetes/kops
Ceph: https://ceph.com/en/discover/
Ceph Ansible: https://github.com/ceph/ceph-ansible
Rook (Ceph CNS): https://github.com/rook/rook

Does it make sense to cluster Node.js (in order to take advantage of multiple CPUs) if it will be deployed with an orchestration tool like Kubernetes?

Right now I am struggling with debugging a Node.js application that is clustered and running on Docker. I came across the following statement:
Remember, Node.js is still single-threaded in most cases, so even on a
single server you’ll likely want to spin up multiple container
replicas to take advantage of multiple CPU’s
So does this mean that clustering a Node.js app is pointless when it is meant to be deployed on Kubernetes?
EDIT: I should also say that by clustering I mean forking workers with cluster.fork(), and the goal of the application is to build a simple REST API that handles high traffic.
The short answer is yes.
Containers behave much like mini VMs (although they share the host's kernel), and Kubernetes is the orchestration tool that manages all the running containers, checking health, resource allocation, load, etc.
So, if you are running your Node application in a container with an orchestration tool like Kubernetes, then clustering is moot, as each container will be using 1 CPU or a fraction of a CPU depending on how you have it configured. Adding containers essentially just places another instance in rotation, and Kubernetes will direct traffic to each.
Clustering Node really comes into play with tools like PM2. Say you have a beefy server with 8 CPUs: Node can only use 1 per instance, so tools like PM2 set up a cluster and route traffic across the running instances.
One thing to keep in mind though is that your application needs to be cluster OR container ready. Meaning nothing should be stored on the ephemeral disk as with each container restart that data is lost OR in a cluster situation there is no guarantee the folders will be available to each running instance and if you cluster with multiple servers etc you are asking for trouble :D ( this is where an object store would come into play like S3)
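As a rough back-of-the-envelope sketch of the "one container per CPU or partial CPU" point, the Python below estimates how many single-process replicas fit on a node for a given per-container CPU request. The function name is made up, CPU is expressed in millicores (1000m = 1 core), and the estimate ignores anything the node reserves for the system, so treat the result as an upper bound.

```python
def replicas_to_fill(node_cores, cpu_request_millicores):
    """Upper bound on replicas that fit on a node by CPU request alone."""
    if cpu_request_millicores <= 0:
        raise ValueError("cpu_request_millicores must be positive")
    return (node_cores * 1000) // cpu_request_millicores


if __name__ == "__main__":
    # An 8-core server, as in the PM2 example above.
    print(replicas_to_fill(8, 1000))  # one full core per Node process
    print(replicas_to_fill(8, 500))   # half a core per Node process
```

With one full core per replica this gives the same count a PM2 cluster would use on that server; the difference is that the orchestrator, not the Node process, does the fan-out.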

Where should I put shared services for multiple kubernetes-clusters?

Our company is developing an application which runs in 3 separate Kubernetes clusters in different versions (production, staging, testing).
We need to monitor our clusters and the applications over time (metrics and logs). We also need to run a mailserver.
So basically we have 3 different environments with different versions of our application. And we have some shared services that just need to run and we do not care much about them:
Monitoring: We need to install influxdb and grafana. In every cluster there's a pre-installed heapster, that needs to send data to our tools.
Logging: We didn't decide yet.
Mailserver (https://github.com/tomav/docker-mailserver)
Independent services: Sentry, Gitlab
I am not sure where to run these external shared services. I found these options:
1. Inside each cluster
We need to install the tools 3 times for the 3 environments.
Con:
We don't have one central point to analyze our systems.
If the whole cluster is down, we cannot look at anything.
Installing the same tools multiple times does not feel right.
2. Create an additional cluster
We install the shared tools in an additional kubernetes-cluster.
Con:
Cost for an additional cluster
It's probably harder to send ongoing data to external cluster (networking, security, firewall etc.).
3. Use an additional root-server
We run docker-containers on an oldschool-root-server.
Con:
Feels contradictory to use root-server instead of cutting-edge-k8s.
Single point of failure.
We need to control the docker-containers manually (or attach the machine to rancher).
I tried to google for the problem but I cannot find anything about the topic. Can anyone give me a hint or some links on this topic?
Or is it simply not a relevant problem that a cluster might go down?
To me, the second option sounds like the lesser evil, but I cannot yet estimate whether it's hard to transfer data from one cluster to another.
The important questions are:
Is it a problem to have monitoring-data in a cluster because one cannot see the monitoring-data if the cluster is offline?
Is it common practice to have an additional cluster for shared services that should not have an impact on other parts of the application?
Is it (easily) possible to send metrics and logs from one kubernetes-cluster to another (we are running kubernetes in OpenTelekomCloud which is basically OpenStack)?
Thanks for your hints,
Marius
That is a very complex and philosophical topic, but I will give you my view on it and some facts to support it.
I think the best way is the second one, creating an additional cluster, and here is why:
You need a point which should be accessible from any of your environments. With a separate cluster, you can set the same firewall rules, routes, etc. in all your environments and it doesn't affect your current workload.
Yes, you need to pay a bit more. However, you need resources to run your shared applications, and overhead for a Kubernetes infrastructure is not high in comparison with applications.
With a separate cluster, you can set up a real HA solution, which you might not need for staging and development clusters, so you will not pay for that multiple times.
Technically, it is also OK. You can use Heapster to collect data from multiple clusters; almost any logging solution can also work with multiple clusters. All other applications can be just run on the separate cluster, and that's all you need to do with them.
Now, about your questions:
Is it a problem to have monitoring-data in a cluster because one cannot see the monitoring-data if the cluster is offline?
No, it is not a problem with a separate cluster.
Is it common practice to have an additional cluster for shared services that should not have an impact on other parts of the application?
I think, yes. At least I did it several times, and I know some other projects with similar architecture.
Is it (easily) possible to send metrics and logs from one kubernetes-cluster to another (we are running kubernetes in OpenTelekomCloud which is basically OpenStack)?
Yes, nothing complex there. Usually, it does not depend on the platform.

Kubernetes scaling pods using custom algorithm

Our cloud application consists of 3 tightly coupled Docker containers, Nginx, Web and Mongo. Currently we run these containers on a single machine. However as our users are increasing we are looking for a solution to scale. Using Kubernetes we would form a multi container pod. If we are to replicate we need to replicate all 3 containers as a unit. Our cloud application is consumed by mobile app users. Our app can only handle approx 30000 users per Worker node and we intend to place a single pod on a single worker node. Once a mobile device is connected to worker node it must continue to only use that machine ( unique IP address )
We plan on using Kubernetes to manage the containers. Load balancing doesn't work for our use case as a mobile device needs to be tied to a single machine once assigned and each Pod works independently with its own persistent volume. However we need a way of spinning up new Pods on worker nodes if the number of users goes over 30000 and so on.
The idea is we have some sort of custom scheduler which assigns a mobile device a Worker Node ( domain/ IPaddress) depending on the number of users on that node.
Is Kubernetes a good fit for this design, and how could we implement a custom pod scaling algorithm?
Thanks
Piggy-Backing on the answer of Jonah Benton:
While this is technically possible, your problem is not with Kubernetes, it's with your application! Let me point out the problems:
Our cloud application consists of 3 tightly coupled Docker containers, Nginx, Web, and Mongo.
Here is your first problem: if you can only deploy these three containers together and not independently, you cannot scale one or the other!
While MongoDB can be scaled to insane loads - if it's bundled with your web server and web application it won't be able to...
So the first step for you is to break up these three components so they can be managed independently of each other. Next:
Currently we run these containers on a single machine.
While this is not strictly a problem, I seriously doubt you have considered what it would mean to scale your application and the challenges that come with scalability!
Once a mobile device is connected to worker node it must continue to only use that machine ( unique IP address )
Now, this IS a problem. You're looking to run an application on Kubernetes, but I do not think you understand the consequences of doing that: Kubernetes orchestrates your resources. This means it will move pods (by killing and recreating them) between nodes (and, if necessary, onto the same node). It does this fully autonomously (which is awesome and gives you a good night's sleep). If you're relying on clients sticking to a single node's IP, you're going to get up in the middle of the night because Kubernetes tried to correct for a node failure and moved your pod, which is now gone, and your users can't connect anymore. You need to leverage the load-balancing features (Services) in Kubernetes. Only they are able to handle the dynamic changes that happen in Kubernetes clusters.
Using Kubernetes we would form a multi container pod.
And we have another winner - No! You're trying to treat Kubernetes as if it were your on-premise infrastructure! If you keep doing so you're going to fail and curse Kubernetes in the process!
Now that I have told you some of the things you're getting wrong, what kind of person would I be if I did not offer some advice on how to make this work:
In Kubernetes your three applications should not run in one pod! They should run in separate pods:
Your web server's work should be done by an Ingress, and since you're already familiar with nginx, this is probably the ingress you are looking for!
Your web application should be a simple Deployment, exposed to the ingress through a Service.
Your database should be a separate deployment, which you can either do manually through a StatefulSet or (more advanced) through an operator, and which is also exposed to the web application through a Service.
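As a rough illustration of the split described above, here is a sketch in Python that builds the Service and the Ingress for the web application as plain dicts. The name `web`, the host `app.example.com`, the ports and the ingress class `nginx` are all assumptions for the example (the class name presumes an nginx ingress controller registered under that name); the Deployment and the database are left out to keep it short.

```python
import json

# Hypothetical label; the web Deployment's pods would need to carry it too.
APP_LABELS = {"app": "web"}


def web_service(name="web", port=80, target_port=8080):
    """Service that gives the web pods one stable name and port."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": dict(APP_LABELS),
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def web_ingress(service_name="web", service_port=80, host="app.example.com"):
    """Ingress that routes HTTP traffic for host to the web Service."""
    backend = {"service": {"name": service_name,
                           "port": {"number": service_port}}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": f"{service_name}-ingress"},
        "spec": {
            "ingressClassName": "nginx",  # assumed controller class name
            "rules": [{
                "host": host,
                "http": {"paths": [{"path": "/",
                                    "pathType": "Prefix",
                                    "backend": backend}]},
            }],
        },
    }


if __name__ == "__main__":
    for manifest in (web_service(), web_ingress()):
        print(json.dumps(manifest, indent=2))
```

Clients only ever see the Ingress host; which pods sit behind the Service can change at any time without the clients noticing.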
Feel free to ask if you have any more questions!
Building a custom scheduler and running multiple schedulers at the same time is supported:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
That said, to the question of whether kubernetes is a good fit for this design- my answer is: not really.
K8s can be difficult to operate, with the payoff being the level of automation and resiliency that it provides out of the box for whole classes of workloads.
This workload is not one of those. In order to gain any benefit you would have to write a scheduler to handle the edge failure and error cases this application has (what happens when you lose a node for a short period of time...) in a way that makes sense for k8s. And you would have to come up to speed with normal k8s operations.
With the information provided, I am hard pressed to see why one would use k8s for this workload over just running Docker on some VMs and scripting some of the automation.
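For reference, the assignment rule from the question (pin a device to a worker, open a new worker once every existing one has reached its user limit) can be sketched in a few lines of Python, independent of Kubernetes. This is not a Kubernetes scheduler; the `Assigner` class and the `provision` callback are hypothetical names standing in for whatever scripting actually brings up a new machine, and the sketch does not handle the node-loss cases raised above.

```python
import itertools
from dataclasses import dataclass, field

CAPACITY = 30000  # users per worker node, taken from the question


@dataclass
class Worker:
    address: str
    users: set = field(default_factory=set)


class Assigner:
    """Pins each device to one worker and adds workers when all are full."""

    def __init__(self, provision, capacity=CAPACITY):
        self.provision = provision  # hypothetical: returns a new worker address
        self.capacity = capacity
        self.workers = []
        self.assignment = {}  # device id -> Worker

    def assign(self, device_id):
        # A device that is already known always gets the same worker back.
        if device_id in self.assignment:
            return self.assignment[device_id].address
        # Otherwise take the first worker with room, or provision a new one.
        for worker in self.workers:
            if len(worker.users) < self.capacity:
                break
        else:
            worker = Worker(self.provision())
            self.workers.append(worker)
        worker.users.add(device_id)
        self.assignment[device_id] = worker
        return worker.address


if __name__ == "__main__":
    counter = itertools.count(1)
    # Tiny capacity so the scale-out step is visible in the output.
    assigner = Assigner(lambda: f"worker-{next(counter)}.example.com", capacity=2)
    for device in ("a", "b", "c", "a"):
        print(device, "->", assigner.assign(device))
```

The hard part is everything this leaves out: what happens to the pinned devices when a worker disappears.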

Kubernetes With DPDK

I'm trying to figure out if Kubernetes will work for a certain use case. I understand the networking/clustering concept, and even the load balancing and how that can be used with things like nginx. However, assuming this is not deployed on a public cloud and things like ELB won't be available, could it still be used for a high-speed networking application using DPDK? For example, if we assume the cluster networking provided by k8s is only used for the control/management path, and the containers themselves handle the NIC directly with DPDK, is this something it's commonly used for?
Secondly, I understand the replication controller and petsets feature I think, but I'm not really clear on whether the intent of those features is for high availability or not. It seems that the "pod fails and the RC replaces it on a different node" isn't necessarily for HA, and there aren't really guarantees on how fast it builds a new pod. Am I incorrect?
For the second question, if the replication controller has a size larger than 1, it is highly available.
For example, if you have a service "web-svc" in front of the replication controller "web-app", with size 3, then your requests will be load balanced to one of the 3 pods:
web-svc ----> {web-app-pod1, web-app-pod2, web-app-pod3}
If some of the 3 pods fail, kubernetes will replace them with new ones.
A pet set (since renamed StatefulSet) is similar to a replication controller, but is used for stateful applications such as databases.
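The web-svc example can be mimicked with a small Python simulation: one function tops the pod list back up to the desired size after a failure, another spreads requests over the current pods. Round-robin is used purely for illustration (how a real Service balances depends on the cluster's proxy mode), and the pod dicts and function names are made up for the sketch.

```python
import itertools


def reconcile(pods, desired, new_names):
    """Drop failed pods and add replacements until there are `desired` pods."""
    healthy = [p for p in pods if p["healthy"]]
    while len(healthy) < desired:
        healthy.append({"name": next(new_names), "healthy": True})
    return healthy


def route(requests, pods):
    """Pair each request with a pod, cycling through the pods in order."""
    if not pods:
        raise ValueError("no pods available to serve requests")
    targets = itertools.cycle([p["name"] for p in pods])
    return [(request, next(targets)) for request in requests]


if __name__ == "__main__":
    pods = [
        {"name": "web-app-pod1", "healthy": True},
        {"name": "web-app-pod2", "healthy": False},  # simulated failure
        {"name": "web-app-pod3", "healthy": True},
    ]
    names = (f"web-app-pod{i}" for i in itertools.count(4))
    pods = reconcile(pods, desired=3, new_names=names)
    for request, pod in route(["req1", "req2", "req3", "req4"], pods):
        print(request, "->", pod)
```

While the replacement is being created, the remaining pods keep serving, which is where the availability comes from when the size is larger than 1.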

Resources