Kubernetes and network control - docker

I have some simple software that I need to distribute across about 20 physical servers. I'm planning on using Ansible+Docker for provisioning the servers. These tools do not quite cover my needs, though: I need the software to be running at all times. If a box goes down for any reason, another node needs to spawn a new instance. This is pretty much what any of the orchestration tools offer, but I also have some network constraints:
I need some control over where the instances run. A node cannot run all the instances.
It would be nice to prefer another node that is as close as possible to the failing node in the network
Is Kubernetes the right fit for this system?
Also, does the system work without the Kubernetes master? Can the master be redundant as well?
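In Kubernetes terms, the first constraint maps roughly onto pod anti-affinity or topology spread constraints, and the second onto preferred node affinity. The toy Python model below only illustrates the placement rule being asked for. It is not Kubernetes scheduler code. The `Node` class, the one-dimensional `position` used as a stand-in for network distance, and the `max_per_node` cap are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    position: int  # hypothetical stand-in for network location (e.g. rack index)
    instances: set = field(default_factory=set)
    healthy: bool = True


def pick_replacement(nodes, failed, max_per_node):
    """Return the healthy node closest to `failed` that still has room, or None."""
    candidates = [
        n for n in nodes
        if n.healthy and n is not failed and len(n.instances) < max_per_node
    ]
    if not candidates:
        return None
    # Prefer the smallest network distance; break ties by name for determinism.
    return min(candidates, key=lambda n: (abs(n.position - failed.position), n.name))


def fail_node(nodes, failed, max_per_node):
    """Mark `failed` as down and reschedule its instances; return {instance: new node}."""
    failed.healthy = False
    moved = {}
    for inst in sorted(failed.instances):
        target = pick_replacement(nodes, failed, max_per_node)
        if target is None:
            break  # no capacity left anywhere; remaining instances stay unscheduled
        target.instances.add(inst)
        moved[inst] = target.name
    failed.instances.clear()
    return moved
```

The per-node cap enforces "a node cannot run all the instances", and sorting candidates by distance expresses "prefer a node close to the failing one".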

Does it make sense to cluster NodeJs (in order to take advantage of multiple CPUs) if it will be deployed with an orchestration tool like Kubernetes?

Right now I am struggling with debugging a NodeJs application that is clustered and running on Docker. I found the following information on this link:
Remember, Node.js is still single-threaded in most cases, so even on a single server you’ll likely want to spin up multiple container replicas to take advantage of multiple CPU’s
So does this mean that clustering a NodeJs app is pointless when it is meant to be deployed on Kubernetes?
EDIT: I should also say that by clustering I mean forking workers with cluster.fork(), and the goal of the application is to build a simple REST API that handles high traffic.
Short answer: yes.
Containers are like mini VMs (although they share the host's kernel), and Kubernetes is the orchestration tool that manages all the running containers, checking their health, resource allocation, load, etc.
So, if you are running your Node application in a container with an orchestration tool like Kubernetes, then clustering is moot, as each container will typically be given one CPU or a fraction of one, depending on how you configure it. Running multiple containers essentially just puts more instances into rotation, and Kubernetes will direct traffic to each of them.
Clustering Node really comes into play when using tools like PM2. Let's say you have a beefy server with 8 CPUs: Node can only use one per instance, so tools like PM2 set up a cluster and route traffic across each of the running instances.
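On non-Windows platforms, Node's `cluster` module hands incoming connections to workers round-robin by default, and PM2's cluster mode is built on that module. The Python sketch below only models that routing idea (one worker per CPU, requests handed out in turn). The worker names are hypothetical and no real processes are forked.

```python
import itertools
import os


def default_worker_count():
    """One worker per CPU, mirroring the usual `cluster.fork()` loop."""
    return os.cpu_count() or 1


def round_robin(requests, workers):
    """Pair each request with the next worker in turn."""
    turn = itertools.cycle(workers)
    return [(next(turn), req) for req in requests]


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(default_worker_count())]
    for worker, req in round_robin(["GET /a", "GET /b", "GET /c"], workers):
        print(worker, "handles", req)
```

Under Kubernetes the same fan-out happens one level up: a Service spreads traffic over pod replicas, which is why in-container clustering adds little.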
One thing to keep in mind, though, is that your application needs to be cluster-ready OR container-ready. That means nothing should be stored on the ephemeral disk: with each container restart that data is lost, and in a cluster there is no guarantee that the same folders will be available to every running instance. If you cluster across multiple servers, you are asking for trouble :D (this is where an object store like S3 comes into play).

How to deploy a kubernetes cluster on multiple physical machines in the best manner?

I recently finished a project where I created an app consisting of several Docker containers. The purpose of the app was to collect some data, save it to a database, and allow user interaction through a simple web GUI. The app was hosted on four different Raspberry Pis, and it was possible to collect data from all physical machines through an API. Furthermore, you could do some simple machine learning tasks, like detecting anomalies in the sensor data of the Pis.
Now I'm trying to take the next step and use Kubernetes for load balancing and remote updates. My main goal is to remotely update all Raspberry Pis from my master node, which, in theory, would be a very handy feature. I also want to share the resources of the Pis within the cluster for calculations.
I have read a lot about Kubernetes, Minikube, K3s, Kind and all the different approaches to setting up a Kubernetes cluster, but I feel like I am missing "a last puzzle piece".
From what I understand, I need an approach that allows me to set up a local multi-node cluster (local because all machines are lying on my desk; no cloud needed). My master node would ideally be my laptop, running Ubuntu in a virtual machine. My Raspberry Pis would be my slave/worker nodes. If I want to update my cluster, I can use the Kubernetes remote update functionality.
So my question is: does it make sense to use several Raspberry Pis as nodes in a Kubernetes cluster and to manage them from one master node (laptop), and do you have any suggestions about how to achieve this setup?
I usually don't like questions that don't contain any specific code, but I feel like a simple hint could accelerate my project notably. If this is the wrong place, please feel free to delete this question.
Best regards
You didn't mention which rpi models you are using, but I assume you are not using rpi zeros.
My main goal is to remote update all raspberries from my master node.
Assuming that by this you mean updating your applications running in Kubernetes installed on the RPis, keep reading. Otherwise, ignore everything I wrote: what you probably need is Ansible or another similar provisioning/configuration-management/application-deployment tool.
Now, answering your question:
Does it make sense to use several Raspberry Pis as nodes in a Kubernetes cluster
Yes. This is why k3s was created: it makes such a setup possible using fewer resources.
and to manage them from one master node (laptop)
Assuming you will be using it for learning purposes, why not? It is possible, but be aware that when the master node goes down (e.g. when you turn off your laptop), the whole cluster goes down (or at least the API server does, so you won't be able to change the cluster's state). Also make sure you are using a bridged network interface for your VM so that it is visible on your local network as a standalone machine.
and do you have any suggestions about the way to achieve this setup.
Installing k3s on all nodes would be the easiest option in your case. There are plenty of resources online explaining how to do it.
One last thing I would like to explain is updates.
Speaking of Kubernetes updates, you need to know that Kubernetes doesn't update itself automatically; you need to update it explicitly. A new minor k8s version is released every few months, and it sometimes "breaks" things because backward compatibility is not always preserved. So always read the changelog before updating, because rollbacks may not be possible unless you have backed up your etcd cluster beforehand.
Speaking of updating applications: to run your app, all you do is send YAML files describing your application to k8s, and it handles the rest. So if you want to update your app, just update the tag on the container image to a newer version and k8s will handle the update. Read more here about update strategies in k8s.
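By default a Deployment replaces pods using the RollingUpdate strategy, governed by its `maxSurge` and `maxUnavailable` settings. The simulation below is a simplified Python model of that behaviour, assuming `maxUnavailable=0`: start new pods first, then retire the same number of old ones, so capacity never drops below the desired replica count. It does not talk to a cluster, and readiness is assumed to be instant.

```python
def rolling_update(pods, new_tag, max_surge=1):
    """Simulate replacing `pods` (a list of image tags) with `new_tag`.

    Returns the pod list after each step. Assumes maxUnavailable=0 and that
    new pods become ready immediately.
    """
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1 when maxUnavailable is 0")
    desired = len(pods)
    pods = list(pods)
    steps = []
    while any(tag != new_tag for tag in pods):
        old_count = sum(tag != new_tag for tag in pods)
        batch = min(max_surge, old_count)
        # Surge: temporarily run up to desired + max_surge pods.
        pods.extend([new_tag] * batch)
        # Then retire the same number of old pods.
        for _ in range(batch):
            pods.remove(next(tag for tag in pods if tag != new_tag))
        assert len(pods) == desired  # capacity is back to the desired count
        steps.append(list(pods))
    return steps
```

In a real cluster you only change the image tag in the Deployment (for example with `kubectl set image`) and the Deployment controller performs these steps for you.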

Kubernetes scaling pods using custom algorithm

Our cloud application consists of 3 tightly coupled Docker containers: Nginx, Web and Mongo. Currently we run these containers on a single machine. However, as our users are increasing, we are looking for a solution to scale. Using Kubernetes we would form a multi-container pod. If we are to replicate, we need to replicate all 3 containers as a unit. Our cloud application is consumed by mobile app users. Our app can only handle approx. 30000 users per worker node, and we intend to place a single pod on a single worker node. Once a mobile device is connected to a worker node, it must continue to use only that machine (unique IP address).
We plan on using Kubernetes to manage the containers. Load balancing doesn't work for our use case as a mobile device needs to be tied to a single machine once assigned and each Pod works independently with its own persistent volume. However we need a way of spinning up new Pods on worker nodes if the number of users goes over 30000 and so on.
The idea is that we have some sort of custom scheduler which assigns a mobile device a worker node (domain/IP address) depending on the number of users on that node.
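The sticky, capacity-based assignment described above could be sketched as follows. This is a hypothetical Python model, not a Kubernetes scheduler plugin: `DeviceAssigner`, the `worker-N` names and the in-memory bookkeeping are illustrative assumptions, and creating a new worker stands in for whatever scale-up action (new pod, new node) a real system would trigger.

```python
class DeviceAssigner:
    """Sticky device -> worker assignment with a per-worker user cap."""

    def __init__(self, capacity=30_000):
        self.capacity = capacity
        self.load = {}        # worker name -> number of assigned devices
        self.assignment = {}  # device id -> worker name (never changes once set)

    def _scale_up(self):
        # Placeholder for provisioning a new pod/worker; here we only name it.
        name = f"worker-{len(self.load)}"
        self.load[name] = 0
        return name

    def assign(self, device_id):
        if device_id in self.assignment:
            return self.assignment[device_id]  # sticky: reuse the earlier choice
        worker = next((w for w, n in self.load.items() if n < self.capacity), None)
        if worker is None:
            worker = self._scale_up()
        self.load[worker] += 1
        self.assignment[device_id] = worker
        return worker
```

Note what the sketch leaves out: if `worker-0` disappears, every device pinned to it has nowhere to go. That is the failure case the answers below focus on.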
Is Kubernetes a good fit for this design, and how could we implement a custom pod scaling algorithm?
Thanks
Piggy-Backing on the answer of Jonah Benton:
While this is technically possible, your problem is not with Kubernetes, it's with your application! Let me point out the problems:
Our cloud application consists of 3 tightly coupled Docker containers, Nginx, Web, and Mongo.
Here is your first problem: if you can only deploy these three containers together and not independently, you cannot scale one without the others!
While MongoDB can be scaled to insane loads, it won't be able to if it's bundled with your web server and web application...
So the first step for you is to break up these three components so they can be managed independently of each other. Next:
Currently we run these containers on a single machine.
While not strictly a problem, this makes me seriously doubt that you know what it would mean to scale your application and what challenges come with scalability!
Once a mobile device is connected to worker node it must continue to only use that machine ( unique IP address )
Now, this IS a problem. You're looking to run an application on Kubernetes, but I do not think you understand the consequences of doing that: Kubernetes orchestrates your resources. This means it will move pods (by killing and recreating them) between nodes (and, if necessary, onto the same node). It does this fully autonomously (which is awesome and gives you a good night's sleep). If you're relying on clients sticking to a single node's IP, you're going to be woken up in the middle of the night because Kubernetes tried to correct for a node failure and moved your pod, which is now gone, and your users can't connect anymore. You need to leverage the load-balancing features (Services) in Kubernetes. Only they are able to handle the dynamic changes that happen in Kubernetes clusters.
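The point about Services is indirection: clients address a stable name, and the endpoint list behind it is updated as pods are killed and recreated. The Python sketch below models only that idea; `ServiceRegistry` is hypothetical. If some stickiness is still needed, a Kubernetes Service can be configured with `sessionAffinity: ClientIP`, which the key-based pick below loosely imitates.

```python
import zlib


class ServiceRegistry:
    """Maps a stable service name to the current set of pod IPs."""

    def __init__(self):
        self._endpoints = {}

    def set_endpoints(self, service, ips):
        # Called whenever pods behind the service are created or destroyed.
        self._endpoints[service] = sorted(ips)

    def resolve(self, service, client_key):
        ips = self._endpoints.get(service)
        if not ips:
            raise LookupError(f"no ready endpoints for {service!r}")
        # Same client -> same pod, as long as the endpoint set is unchanged.
        return ips[zlib.crc32(client_key.encode()) % len(ips)]
```

Clients only ever hold the name `"web"`, so a rescheduled pod with a new IP is picked up without any client-side change.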
Using Kubernetes we would form a multi container pod.
And we have another winner - No! You're trying to treat Kubernetes as if it were your on-premise infrastructure! If you keep doing so you're going to fail and curse Kubernetes in the process!
Now that I've told you some of the things you're thinking about wrong, what kind of person would I be if I did not offer some advice on how to make this work:
In Kubernetes your three applications should not run in one pod! They should run in separate pods:
Your web server's work should be done by an Ingress, and since you're already familiar with nginx, an nginx-based ingress controller is probably what you are looking for!
Your web application should be a simple Deployment, exposed to the Ingress through a Service.
Your database should be a separate workload, which you can set up either manually through a StatefulSet or (more advanced) through an operator, and which is also exposed to the web application through a Service.
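A minimal sketch of the web-application part, written as Python dictionaries and dumped as JSON (which `kubectl apply -f` accepts as well as YAML). The image name, labels and port numbers are placeholder assumptions. The important detail is that the Service's `selector` matches the labels in the Deployment's pod template; that is how the Service finds the pods wherever they are scheduled.

```python
import json

LABELS = {"app": "web"}  # assumed label; must match between Deployment and Service


def web_deployment(image, replicas=3):
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(LABELS)},
            "template": {
                "metadata": {"labels": dict(LABELS)},
                "spec": {
                    "containers": [
                        {"name": "web", "image": image, "ports": [{"containerPort": 8080}]}
                    ]
                },
            },
        },
    }


def web_service():
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "web"},
        "spec": {
            "selector": dict(LABELS),
            "ports": [{"port": 80, "targetPort": 8080}],  # Service port -> container port
        },
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [web_deployment("example/web:1.0"), web_service()],
    }
    print(json.dumps(manifests, indent=2))
```

The database and the Ingress would be separate manifests of their own, so each component can be scaled and updated independently.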
Feel free to ask if you have any more questions!
Building a custom scheduler and running multiple schedulers at the same time is supported:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
That said, to the question of whether kubernetes is a good fit for this design- my answer is: not really.
K8s can be difficult to operate, with the payoff being the level of automation and resiliency that it provides out of the box for whole classes of workloads.
This workload is not one of those. In order to gain any benefit you would have to write a scheduler to handle the edge failure and error cases this application has (what happens when you lose a node for a short period of time...) in a way that makes sense for k8s. And you would have to come up to speed with normal k8s operations.
With the information provided, I'm hard-pressed to see why one would use k8s for this workload rather than just running Docker on some VMs and scripting some of the automation.

Container technologies: docker, rkt, orchestration, kubernetes, GKE and AWS Container Service

I'm trying to get a good understanding of container technologies but am somewhat confused. It seems like certain technologies overlap different portions of the stack and different pieces of different technologies can be used as the DevOps team sees fit (e.g., can use Docker containers but don't have to use the Docker engine, could use engine from cloud provider instead). My confusion lies in understanding what each layer of the "Container Stack" provides and who the key providers are of each solution.
Here's my layman's understanding; would appreciate any corrections and feedback on holes in my understanding
1. Containers: self-contained package including application, runtime environment, system libraries, etc.; like a mini-OS with an application
   It seems like Docker is the de-facto standard. Any others that are notable and widely used?
2. Container Clusters: groups of containers that share resources
3. Container Engine: groups containers into clusters, manages resources
4. Orchestrator: is this any different from a container engine? How?
Where do Docker Engine, rkt, Kubernetes, Google Container Engine, AWS Container Service, etc. fall between #s 2-4?
This may be a bit long and present some oversimplification but should be sufficient to get the idea across.
Physical machines
Some time ago, the best way to deploy simple applications was to simply buy a new webserver, install your favorite operating system on it, and run your applications there.
The cons of this model are:
The processes may interfere with each other (because they share CPU and file system resources), and one may affect the other's performance.
Scaling this system up/down is difficult as well, taking a lot of effort and time in setting up a new physical machine.
There may be differences in the hardware specifications, OS/kernel versions and software package versions of the physical machines, which make it difficult to manage these application instances in a hardware-agnostic manner.
Applications, being directly affected by the physical machine specifications, may need specific tweaking, recompilation, etc, which means that the cluster administrator needs to think of them as instances at an individual machine level. Hence, this approach does not scale. These properties make it undesirable for deploying modern production applications.
Virtual Machines
Virtual machines solve some of the problems of the above:
They provide isolation even while running on the same machine.
They provide a standard execution environment (the guest OS) irrespective of the underlying hardware.
They can be brought up on a different machine (replicated) quite quickly when scaling (order of minutes).
Applications typically do not need to be rearchitected for moving from physical hardware to virtual machines.
But they introduce some problems of their own:
They consume large amounts of resources in running an entire instance of an operating system.
They may not start up or shut down as fast as we want them to (we would like this to take on the order of seconds).
Even with hardware assisted virtualization, application instances may see significant performance degradation over an application running directly on the host.
(This may be an issue only for certain kinds of applications)
Packaging and distributing VM images is not as simple as it could be.
(This is not as much a drawback of the approach, as it is of the existing tooling for virtualization.)
Containers
Then, somewhere along the line, namespaces and cgroups (control groups) were added to the Linux kernel. Namespaces let us isolate processes in groups and decide what other processes and file systems they can see, while cgroups perform resource accounting and limiting at the group level.
Various container runtimes and engines came along that make it very easy to create a "container": an environment within the OS, like a namespace, with limited visibility, resources, etc. Common examples include docker, rkt, runC, LXC, etc.
Docker, for example, includes a daemon which provides interactions like creating an "image", a reusable entity that can be launched into a container instantly. It also lets one manage individual containers in an intuitive way.
The advantages of containers:
They are light-weight and run with very little overhead, as they do not have their own instance of the kernel/OS and are running on top of a single host OS.
They offer some degree of isolation between the various containers and the ability to impose limits on various resources consumed by them (using the cgroup mechanism).
The tooling around them has evolved rapidly to allow easy building of reusable units (images), repositories for storing image revisions (container registries) and so on, largely due to docker.
It is encouraged that a single container run a single application process, so that it can be maintained and distributed independently. The light-weight nature of a container makes this preferable, and it leads to faster development due to decoupling.
There are some cons as well:
The level of isolation provided is lower than that of VMs.
They are easiest to use with stateless 12-factor applications being built afresh and a slight struggle if one tries to deploy legacy applications, clustered distributed databases and so on.
They need orchestration and higher level primitives to be used effectively and at scale.
Container Orchestration
When running applications in production, as complexity grows, an application tends to have many different components, some of which scale up/down as necessary or may need to be scaled. Containers by themselves do not solve all our problems. We need a system that solves problems associated with real large-scale applications such as:
Networking between containers
Load balancing
Managing storage attached to these containers
Updating containers, scaling them, spreading them across nodes in a multi-node cluster and so on.
When we want to manage a cluster of containers, we use a container orchestration engine. Examples of these are Kubernetes, Mesos, Docker Swarm etc. They provide a host of functionality in addition to those listed above and the goal is to reduce the effort involved in dev-ops.
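Orchestrators such as Kubernetes are built around control loops that repeatedly compare the desired state with the observed state and act on the difference. The function below is a deliberately tiny Python illustration of one such comparison for replica counts. The data shapes and action tuples are assumptions for the example, not any orchestrator's API.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`.

    desired: {app name: replica count}
    actual:  {app name: [running container ids]}
    """
    actions = []
    for app, want in sorted(desired.items()):
        running = actual.get(app, [])
        if len(running) < want:
            actions += [("start", app)] * (want - len(running))
        elif len(running) > want:
            actions += [("stop", app, cid) for cid in running[want:]]
    # Anything running that is no longer desired gets stopped.
    for app, running in sorted(actual.items()):
        if app not in desired:
            actions += [("stop", app, cid) for cid in running]
    return actions
```

Running this repeatedly, applying the actions and re-observing, is what turns a one-off comparison into the self-healing behaviour described above.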
GKE (Google Container Engine, since renamed Google Kubernetes Engine) is hosted Kubernetes on Google Cloud Platform. It lets a user simply specify that they need an n-node Kubernetes cluster and exposes the cluster itself as a managed instance. Kubernetes is open source, and if one wanted to, one could also set it up on Google Compute Engine, a different cloud provider, or their own machines in their own data center.
ECS is a proprietary container management/orchestration system built and operated by Amazon and available as part of the AWS suite.
To answer your questions specifically:
Docker engine: A tool to manage the lifecycle of a docker container and docker images. Create, restart, delete docker containers. Create, rename, delete docker images.
rkt: Analogous to docker engine, but different implementation
Kubernetes: A collection of tools to manage the lifecycle of a distributed application that uses containers. Contains tooling to manage containers, groups of containers, configuration for containers, orchestrating containers, scheduling them on actual instances, tooling to help developers write and maintain other services/tools to deal with containers.
Google Container Engine: Instead of getting VMs, installing "docker-engine" on them, installing kubernetes on them and getting it all to work with things like the right permissions to your infrastructure etc. imagine if it all came together so that you can choose the types of machines and the size of your cluster that has all of this just working. Things like pulling images from your project specific docker repository (google container registry) or claiming persistent volumes, or provisioning load-balancers just work without worrying about service accounts and permissions and what not.
ECS: Analogous to Google Container Engine, but without Kubernetes.
To address the points in your understanding: you are loosely right about things (except container engine, I think). The most important thing to understand is what a container is; the rest of it is just marketing/product names. It's also important to understand that today's understanding of containers is heavily shaped by what Docker containers are and by a lot of the opinions enforced by Docker and the tooling around it. Containers have been around for a long time.
So once you understand what a (docker) container is, a container engine is just a tool to manage them, a container cluster is just a group of containers, and an orchestrator is just a tool to manage where containers run based on some parameters. IMHO, you really don't need to worry too much about what the rest of the tooling is once you understand and build a solid mental model around containers. The rest will just fit in automatically.
The best way to understand all of this? Build & deploy a decently complex application with Docker (persist data/use a database in your app) and everything will make sense.

What's the best way to run a gen_server on all nodes in an Erlang cluster?

I'm building a monitoring tool in Erlang. When run on a cluster, it should run a set of data collection functions on all nodes and record that data using RRD on a single "recorder" node.
The current version has a supervisor running on the master node (rolf_node_sup) which attempts to run a 2nd supervisor on each node in the cluster (rolf_service_sup). Each of the on-node supervisors should then start and monitor a bunch of processes which send messages back to a gen_server on the master node (rolf_recorder).
This only works locally. No supervisor is started on any remote node. I use the following code to attempt to load the on-node supervisor from the recorder node:
rpc:call(Node, supervisor, start_child, [{global, rolf_node_sup}, [Services]])
I've found a couple of people suggesting that supervisors are really only designed for local processes. E.g.
Starting processes at remote nodes
how: distributed supervision tree
What is the most OTP way to implement my requirement to have supervised code running on all nodes in a cluster?
A distributed application is suggested as one alternative to a distributed supervisor tree. This doesn't fit my use case: distributed applications provide failover between nodes, not a way of keeping code running on every node in a set.
The pool module is interesting. However, it provides for running a job on the node which is currently the least loaded, rather than on all nodes.
Alternatively, I could create a set of supervised "proxy" processes (one per node) on the master which use proc_lib:spawn_link to start a supervisor on each node. If something goes wrong on a node, the proxy process should die and then be restarted by its supervisor, which in turn should restart the remote processes. The slave module could be very useful here.
Or maybe I'm overcomplicating things. Is directly supervising nodes a bad idea? Perhaps I should instead architect the application to gather data in a more loosely coupled way: build a cluster by running the app on multiple nodes, tell one to be master, and leave it at that!
Some requirements:
The architecture should be able to cope with nodes joining and leaving the pool without manual intervention.
I'd like to build a single-master solution, at least initially, for the sake of simplicity.
I would prefer to use existing OTP facilities over hand-rolled code in my implementation.
Interesting challenge, to which there are multiple solutions. The following are just my suggestions, which will hopefully help you decide how to write your program.
As I understand your program, you want to have one master node where you start your application, which will then start the Erlang VM on the other nodes in the cluster. The pool module uses the slave module to do this, which requires key-based ssh communication in both directions. It also requires that you have working DNS.
A drawback of slave is that if the master dies, so do the slaves. This is by design, as it probably fits the original use case perfectly; in your case, however, it might be undesirable (you may still want to collect data even if the master is down, for example).
As for the OTP applications, every node may run the same application. In your code you can determine the node's role in the cluster using configuration or discovery.
I would suggest starting the Erlang VM using some OS facility, daemontools or similar. Every VM would start the same application, where one would be started as the master and the rest as slaves. This has the drawback of making it harder to "automatically" run the software on machines coming up in the cluster, as you could with slave; however, it is also much more robust.
In every application you could have a suitable supervision tree based on the role of the node. Removing inter-node supervision and spawning makes the system much simpler.
I would also suggest having all the nodes push to the master. This way the master does not really need to care about what's going on in the slave, it might even ignore the fact that the node is down. This also allows new nodes to be added without any change to the master. The cookie could be used as authentication. Multiple masters or "recorders" would also be relatively easy.
The "slave" nodes however will need to watch out for the master going down and coming up and take appropriate action, like storing the monitoring data so it can send it later when the master is back up.
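The push-and-buffer behaviour for the non-master nodes can be sketched as below. This is Python for illustration only, not OTP code: `Pusher` and its `send` callback are hypothetical, and a `ConnectionError` stands in for "master unreachable". In Erlang, a node could learn about the master going away with `erlang:monitor_node/2`, which delivers a `{nodedown, Node}` message.

```python
from collections import deque


class Pusher:
    """Push samples to the master, buffering them while it is unreachable."""

    def __init__(self, send, max_buffer=10_000):
        self._send = send
        # When full, appending silently drops the oldest sample.
        self._buffer = deque(maxlen=max_buffer)

    def push(self, sample):
        self._buffer.append(sample)
        return self.flush()

    def flush(self):
        """Send buffered samples in order; stop at the first failure."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False  # master down: keep everything buffered for later
            self._buffer.popleft()
        return True
```

Because the master only receives pushes, it never needs to know which nodes exist, which is what lets nodes join and leave without changing the master.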
I would look into riak_core. It provides a layer of infrastructure for managing distributed applications on top of the raw capabilities of erlang and otp itself. Under riak_core, no node needs to be designated as master. No node is central in an otp sense, and any node can take over other failing nodes. This is the very essence of fault tolerance. Moreover, riak_core provides for elegant handling of nodes joining and leaving the cluster without needing to resort to the master/slave policy.
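The ability of any node to take over for a failing one rests on consistent hashing: keys are hashed onto a ring that is divided among the nodes. The Python sketch below is a simplified ring with virtual nodes, assuming SHA-1 as the hash. It is not riak_core's API, and riak_core's real ring (a fixed number of partitions claimed by nodes) differs in detail. When a node is down, its keys fall through to the next live node clockwise while every other key keeps its owner.

```python
import bisect
import hashlib


def _h(key):
    """Map a string to a point on the ring (first 8 bytes of its SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each node owns several points ("virtual nodes") to even out the load.
        self._points = sorted(
            (_h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )

    def owner(self, key, down=frozenset()):
        """Walk clockwise from the key's hash to the first point owned by a live node."""
        start = bisect.bisect(self._points, (_h(key),))
        for step in range(len(self._points)):
            _, node = self._points[(start + step) % len(self._points)]
            if node not in down:
                return node
        raise LookupError("no live nodes")
```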
While this sort of "topological" decentralization is handy, distributed applications usually do need logically special nodes. For this reason, riak_core nodes can advertise that they are providing specific cluster services, e.g., as embodied by your use case, a results collector node.
Another interesting feature/architecture consequence is that riak_core provides a mechanism to maintain global state visible to cluster members through a "gossip" protocol.
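A gossip protocol spreads state by having each node periodically exchange what it knows with a randomly chosen peer until every node agrees. The sketch below assumes a simple versioned key-value view where the higher version wins. riak_core's actual ring reconciliation is more involved, so treat this only as an illustration of the general technique.

```python
import random


def merge(a, b):
    """Combine two views of {key: (version, value)}, keeping the higher version."""
    out = dict(a)
    for key, (version, value) in b.items():
        if key not in out or out[key][0] < version:
            out[key] = (version, value)
    return out


def gossip_round(views, rng):
    """Every node exchanges state with one random peer (push-pull)."""
    names = sorted(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        views[name] = views[peer] = merge(views[name], views[peer])


def converged(views):
    return len({tuple(sorted(v.items())) for v in views.values()}) == 1


if __name__ == "__main__":
    views = {"n1": {"recorder": (1, "n1")}, "n2": {}, "n3": {}, "n4": {"interval": (2, 10)}}
    rng = random.Random(42)
    rounds = 0
    while not converged(views):
        gossip_round(views, rng)
        rounds += 1
    print(f"converged after {rounds} rounds:", views["n1"])
```

A node advertising a service (for example, the "recorder" entry above) only has to tell one peer; gossip carries the fact to the rest of the cluster.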
Basically, riak_core includes a bunch of useful code to develop high performance, reliable, and flexible distributed systems. Your application sounds complex enough that having a robust foundation will pay dividends sooner than later.
otoh, there's almost no documentation yet. :(
Here's a guy who talks about an internal AOL app he wrote with riak_core:
http://www.progski.net/blog/2011/aol_meet_riak.html
Here's a note about a rebar template:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-March/003632.html
...and here's a post about a fork of that rebar template:
https://github.com/rzezeski/try-try-try/blob/7980784b2864df9208e7cd0cd30a8b7c0349f977/2011/riak-core-first-multinode/README.md
...talk on riak_core:
http://www.infoq.com/presentations/Riak-Core
...riak_core announcement:
http://blog.basho.com/2010/07/30/introducing-riak-core/
