Cluster management and service discovery

I want to introduce a service discovery / cluster management solution into my deployments. As far as I can see, Mesos is one solution, but I'm worried about how much RAM it consumes once the agents of Marathon, Chronos, Mesos, etc. are installed; my boxes have at most 512 MB of RAM.
Is it feasible to install Mesos on boxes with low resources?
Is Consul a replacement for Mesos?

Your question is really a number of questions:
Mesos is a great solution for cluster management. It is tested in production at high scale at Twitter.
Mesos doesn't provide a service discovery mechanism.
Mesos requires other components in order to provide a full solution. There is no single solution for all environments / topologies. The leading supplements are provided by Mesosphere, and include Marathon (at a minimum).
Memory requirements will vary with the number of slaves. The baseline requirement is 3 MB each for the master and the slave, which makes it feasible to install on nodes with low resources.
Consul is a service discovery component and does not replace Mesos; they are complementary. In fact, Keen Labs has modified Marathon to integrate Mesos with Consul. See: https://github.com/keenlabs/marathon/commit/290036e34337dcd6483550b7ab7d723bc4378d5f

Related

Is there a reason to run CI builds on a Kubernetes cluster?

I don't know much about Kubernetes, but as far as I know it is a system that enables you to control and manage containerized applications. So, generally speaking, the essence of the benefit we get from Kubernetes is the ability to "tell" it which containers we want running, how many of them, on which machines, among other details, and Kubernetes will take care of doing that for us. Is that correct?
If so, I just can't see the benefit of running a CI pipeline in a Kubernetes pod, as I understand some people do. Let's say you have your build tools in Docker containers instead of having them installed on a specific machine; that's great - you can just use those containers in the build process, so why Kubernetes? Is there any performance gain or something like that?
Appreciate some insights.
It is highly recommended to get a good understanding of what Kubernetes is and what it can and cannot do.
Generally, containers combined with an orchestration tool provide better management of your machines and services. This can significantly improve the reliability of your application and reduce the time and resources spent on DevOps.
Some of the features worth noting are:
Horizontal infrastructure scaling: New servers can be added or removed easily.
Auto-scaling: Automatically change the number of running containers, based on CPU utilization or other application-provided metrics.
Manual scaling: Manually scale the number of running containers through a command or the interface.
Replication controller: The replication controller makes sure your cluster has the desired number of pod replicas running. If there are too many pods, the replication controller terminates the extra pods. If there are too few, it starts more pods (see the Deployment sketch after this list).
Health checks and self-healing: Kubernetes can check the health of nodes and containers, ensuring your application doesn't run into failures. Kubernetes also offers self-healing and auto-replacement, so you don't need to worry about what happens when a container or pod fails.
Traffic routing and load balancing: Traffic routing sends requests to the appropriate containers. Kubernetes also comes with built-in load balancers so you can balance resources in order to respond to outages or periods of high traffic.
Automated rollouts and rollbacks: Kubernetes handles rollouts for new versions or updates without downtime while monitoring the containers’ health. In case the rollout doesn’t go well, it automatically rolls back.
Canary Deployments: Canary deployments enable you to test the new deployment in production in parallel with the previous version.
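To make manual scaling, the replication controller, and rolling updates concrete, here is a minimal Deployment sketch; the name web, the image example/web:1.0 and the /healthz endpoint are placeholder assumptions, not anything from the question:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                    # hypothetical application name
    spec:
      replicas: 3                  # Kubernetes keeps this many pods running
      selector:
        matchLabels:
          app: web
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1        # at most one pod down during an update
          maxSurge: 1              # at most one extra pod during an update
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:1.0     # placeholder image
            readinessProbe:            # health check consulted during rollouts
              httpGet:
                path: /healthz         # placeholder health endpoint
                port: 8080

Scaling it by hand is then a single command, e.g. kubectl scale deployment web --replicas=5.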
However, you should also know what Kubernetes is not:
Kubernetes is not a traditional, all-inclusive PaaS (Platform as a Service) system. Since Kubernetes operates at the container level rather than at the hardware level, it provides some generally applicable features common to PaaS offerings, such as deployment, scaling, and load balancing, and lets users integrate their logging, monitoring, and alerting solutions. However, Kubernetes is not monolithic, and these default solutions are optional and pluggable. Kubernetes provides the building blocks for building developer platforms, but preserves user choice and flexibility where it is important.
Especially in your use case, note that Kubernetes does not deploy source code and does not build your application. Continuous Integration, Delivery, and Deployment (CI/CD) workflows are determined by organization cultures and preferences as well as technical requirements.
The decision is yours, but keeping the main concepts above in mind will help you make it.
An important detail is that you do not tell Kubernetes which nodes a given pod should run on; it picks them itself, and if the cluster is low on resources, in many cases it can actually allocate more nodes on its own (via the cluster autoscaler).
So if your CI system is fairly busy and already uses containers for everything, it can make more sense to run an individual build as a Kubernetes Job. If you have 100 builds that all start at the same time, the cluster may be able to give itself more hardware, and the build queue will clear out faster. Particularly if you're already using Kubernetes for other tasks, this can save you some administrative effort over maintaining a dedicated pool of CI workers that need to be updated separately and will sit mostly idle until that big batch of builds arrives.
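A minimal sketch of such a one-off build Job; the job name, build image, command and resource numbers are all made-up assumptions rather than anything from a particular CI system:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: build-1234             # hypothetical name, e.g. one Job per build
    spec:
      backoffLimit: 2              # retry a failed build at most twice
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: build
            image: example/build-tools:latest   # placeholder build-tools image
            command: ["make", "test"]           # placeholder build command
            resources:
              requests:
                cpu: "1"           # the scheduler places the pod based on
                memory: 2Gi        # these requests; unsatisfiable requests
                                   # can trigger the cluster autoscaler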
Kubernetes' security options are also substantially better than Docker's. Say your CI system needs to launch containers as part of a build. In Kubernetes, it can run under a service account and be given permission to create and delete deployments in one specific namespace, and nothing else. With plain Docker, the standard approach is to give your CI system access to the host's Docker socket, but that can easily be exploited to take over the host.
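A rough sketch of such a scoped permission, assuming a namespace ci and a service account ci-runner (both made-up names):

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: ci                # hypothetical namespace for CI workloads
      name: deployment-manager
    rules:
    - apiGroups: ["apps"]
      resources: ["deployments"]
      verbs: ["create", "delete", "get", "list"]   # nothing beyond deployments
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      namespace: ci
      name: ci-runner-deployments
    subjects:
    - kind: ServiceAccount
      name: ci-runner              # hypothetical CI service account
      namespace: ci
    roleRef:
      kind: Role
      name: deployment-manager
      apiGroup: rbac.authorization.k8s.io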

Apache Hadoop Yarn vs. Kubernetes

Since Apache Hadoop 2.6, YARN can run Docker containers. Basically, it distributes the requested number of containers across a Hadoop cluster, restarts failed containers, and so on.
Kubernetes seems to do the same.
What are the major differences?
Kubernetes was developed almost from a clean slate to extend the Docker container engine into a platform, taking a bottom-up approach. It has good support for specifying per-container/pod resource requirements, but it lacks an effective global scheduler that can partition resources into logical groupings. Kubernetes' design allows multiple schedulers to run in the cluster, each managing resources within its own pods. However, a Kubernetes cluster can suffer from instability when applications demand more resources than the physical systems can handle; it works best when infrastructure capacity exceeds application demand. The Kubernetes scheduler will attempt to fill up idle nodes with incoming application requests and terminate low-priority and starving containers to improve resource utilization.

Kubernetes containers can integrate with external storage systems like S3 to provide resilience for data. The Kubernetes framework uses etcd to store cluster data. The etcd cluster nodes and the Hadoop NameNode are single points of failure in the Kubernetes and Hadoop platforms respectively. Etcd can have more replicas than the NameNode, so from a reliability point of view this seems to favor Kubernetes in theory. However, Kubernetes security is open by default unless RBAC is defined with fine-grained role bindings and the security context is set correctly for pods. If omitted, the primary group of the pod defaults to root, which can be problematic for system administrators trying to secure the infrastructure.
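For illustration, the security context mentioned above can be set explicitly on a pod; a minimal sketch with arbitrary placeholder names and IDs:

    apiVersion: v1
    kind: Pod
    metadata:
      name: secured-app            # hypothetical pod name
    spec:
      securityContext:
        runAsNonRoot: true         # refuse to start containers running as UID 0
        runAsUser: 1000            # placeholder non-root user ID
        runAsGroup: 1000           # placeholder primary group, instead of root
      containers:
      - name: app
        image: example/app:1.0     # placeholder image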
Apache Hadoop YARN was developed to run isolated Java processes for big-data workloads, and was later improved to support Docker containers. YARN provides global resource management, like capacity queues for partitioning physical resources into logical units. Each business unit can be assigned a percentage of the cluster resources. This capacity-sharing system is designed to favor guaranteed resource availability for enterprise priorities, rather than squeezing out every available physical resource.

YARN does score more points on security. There are more security features: Kerberos, access control for privileged/non-privileged containers, trusted Docker images, and placement-policy constraints. Most Docker-related security settings default to closed, and the system admin needs to manually turn on flags to grant more power to containers. Large enterprises tend to run Hadoop more than Kubernetes because securing the system costs less. There are also more distributed SQL engines built on top of YARN, including Hive, Impala, SparkSQL and IBM BigSQL. The database options make YARN an attractive option, given the ability to run online transaction processing in containers and online analytical processing as batch workloads.

Hadoop developer toolchains can be overwhelming: MapReduce, Hive, Pig, Spark and so on each have their own style of development, so the user experience is inconsistent and it takes a while to learn them all. Kubernetes feels less obstructive by comparison because it only deploys Docker containers. With the introduction of YARN Services to run Docker container workloads, YARN can feel less verbose than Kubernetes.
If your plan is to outsource IT operations to a public cloud, pick Kubernetes. If your plan is to build a private/hybrid/multi-cloud, pick Apache YARN.
While this question and answer isn't exactly what you are asking, it does touch on a number of the same points.
Last I saw, YARN was just a resource-sharing mechanism, whereas Kubernetes is an entire platform, encompassing ConfigMaps, declarative environment management, Secret management, volume mounts, a very well designed API for interacting with all of those things, and Role-Based Access Control. Kubernetes is also in widespread use, which means one can very easily find both candidates to hire and tools to buy.
A blog post I found cited a master's thesis that describes some of the fascinating trade-offs between the different schedulers' views of the world. It's a lot of words, so if you're looking for a tl;dr answer, that link may not be it, but if you're looking for actual research on the topic, it seems sound.

Does it make sense to manage Docker containers on a single host (or a few) with Kubernetes?

I'm using Docker on a bare-metal server. I'm pretty happy with docker-compose for configuring and setting up applications.
Still, some features are missing, like configuration management and monitoring. Maybe there are other solutions for these issues, but I'm a bit overwhelmed by the feature set of Kubernetes and can't judge whether it would help me here.
I'm also open to recommendations for solving these requirements separately:
Configuration / Secret management
Monitoring of my Docker hosted applications (e.g. having some kind of dashboard)
Remote container control (SSH is okay with only one server)
Being ready to scale my environment (based on multiple different Dockerized applications) to more than one server in the future - already thinking about networking / service discovery issues with a pure docker-compose setup
I'm sure Kubernetes covers some of these features, but I have the feeling that it's too focused on cloud platforms where machines are created on the fly (since I have at most a few bare-metal servers)
I hope the question's scope is not too broad; otherwise, please use the comment section and help me narrow it down.
Thanks.
I think Kubernetes absolutely matches your requirements and is what you need.
Let's go through them one by one.
I have the feeling that it's too focused on cloud platforms where machines are created on the fly (since I have at most a few bare-metal servers)
No, it is not focused on clouds. Kubernetes can be installed on almost any bare-metal platform (including ARM), and there are many tools and instructions to help you do it. Also, it is easy to deploy locally using Minikube, which will prepare a local cluster for you inside VMs or right in your OS (Linux only).
Configuration / Secret management
Kubernetes has powerful configuration and secret management based on special objects (ConfigMaps and Secrets) which can be attached to your containers. You can read more about configuration management in the official documentation.
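As a tiny illustration of the ConfigMap/Secret idea (the object names and values here are placeholders, not from the question):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-config             # hypothetical config object
    data:
      LOG_LEVEL: "info"            # plain, non-sensitive settings
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: app-secrets            # hypothetical secret object
    type: Opaque
    stringData:
      DB_PASSWORD: "changeme"      # placeholder; stored base64-encoded

Both can be exposed to a container as environment variables or as mounted files.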
Moreover, tools like Helm can give you more automation and a range of preconfigured applications, which you can install with a single command. And you can prepare your own charts for it.
Monitoring of my Docker hosted applications (e.g. having some kind of dashboard)
Kubernetes has its own dashboard where you can see many kinds of information: current application status, configuration, statistics and more. Also, Kubernetes integrates well with Heapster, which can be combined with Grafana for powerful visualization of almost anything.
Remote container control (SSH is okay with only one server)
Kubernetes' command-line tool kubectl can fetch logs and connect to containers in the cluster without any problems. For example, to get a shell in a container of the pod "myapp" you just call kubectl exec -it myapp -- sh. Also, you can reach any application inside your cluster using the kubectl port-forward command, which forwards a port you need to your PC.
Being ready to scale my environment (based on multiple different Dockerized applications) to more than one server in the future - already thinking about networking / service discovery issues with a pure docker-compose setup
Kubernetes can be scaled up to thousands of nodes, or it can have only one; it is your choice. Independent of cluster size, you get production-grade networking, service discovery and load balancing.
So don't be afraid; just try it locally with Minikube. It will make many operational tasks simpler, not more complex.

Kubernetes vs Docker Swarm

I am evaluating Kubernetes and Docker Swarm (running Docker containers in both cases) and could use your input.
If I'm looking at three nines (8.76 hours of downtime per year) or four nines (52 minutes) of reliability in a server farm of fewer than 100 servers, would Kubernetes be overkill due to its complexity? Would Docker Swarm suffice?
Docker Swarm will be able to meet your requirements. I recommend you start with Docker Swarm, as it is robust and very straightforward to use for anyone who has used Docker before.
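For a feel of how small the step from docker-compose is, a compose-format stack file like this sketch (the service name and image are placeholders) can be deployed with docker stack deploy -c stack.yml myapp:

    version: "3.8"
    services:
      web:
        image: example/web:1.0     # placeholder image
        ports:
          - "80:8080"              # published through Swarm's routing mesh
        deploy:
          replicas: 3              # Swarm keeps three tasks of this service running
          restart_policy:
            condition: on-failure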
For a Docker user, there are many new concepts you need to learn to be able to use Kubernetes. Moreover, setting up Kubernetes on premises, without using a preconfigured cloud platform, is not straightforward.
On the other hand, Kubernetes is more flexible and extensible. Kubernetes is older than Docker Swarm, and the Kubernetes community is really big.
It really depends on your actual needs; neither the Kubernetes nor the Swarm orchestrator is a silver bullet. To take real advantage of container technology, applications have to be properly designed. A design guideline for such cloud-native apps is the Twelve-Factor App methodology published by Heroku.
If you want to scale and achieve global reach, Kubernetes is a great framework for running distributed apps at scale. If you have a lot of Java apps, perhaps containerized traditional applications, then Swarm is a better option.
The business requirements can drive you to make the right choice.
Hope this helps!

AutoScaling in Docker Containers

I have been looking into Docker containerization for a while now, but a few things are still confusing to me. I understand that containers are grouped into a cluster, and cluster management tools like Docker Swarm, DC/OS, Kubernetes or Rancher can be used to manage Docker containers. I have been testing out container cluster management with DC/OS and Kubernetes, but a few questions still remain unanswered for me.
How does auto scaling in container level help us in production servers? How does the application serve traffic from multiple containers?
Suppose we have deployed a web application using containers and they have auto scaled. How does the traffic flow to the containers? How are the sessions managed?
What metrics are calculated for autoscaling containers?
Autoscaling in DC/OS (note: Mesosphere is the company, DC/OS the open source project) is described in detail in the docs. Essentially, as with Kubernetes, you can use either low-level metrics such as CPU utilization to decide when to increase the number of instances of an app, or higher-level signals such as app throughput, for example using the Microscaling approach.
Regarding your question about how routing works (how requests are forwarded to an instance, that is, a single running container): you need a load balancer, and again DC/OS provides this out of the box. And again, the options are detailed in the docs; essentially, HAProxy-based North-South or IPtables-based East-West (cluster-internal) load balancers.
Kubernetes has a concept called a Service. A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them. Kubernetes uses Services to serve traffic from multiple containers. You can read more about Services here.
AFAIK, sessions are managed outside Kubernetes, but client-IP-based session affinity can be selected by setting service.spec.sessionAffinity to "ClientIP". You can read more about Services and session affinity here.
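A sketch of such a Service with client-IP affinity turned on; the name, selector label and ports are placeholder assumptions:

    apiVersion: v1
    kind: Service
    metadata:
      name: web                    # hypothetical service name
    spec:
      selector:
        app: web                   # routes to every pod carrying this label
      sessionAffinity: ClientIP    # pin each client IP to one backend pod
      ports:
      - port: 80                   # port the Service exposes
        targetPort: 8080           # port the containers listen on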
Multiple metrics, such as CPU and memory, can be used for autoscaling containers. There is a good blog you can read about autoscaling: when and how.
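As a sketch, a CPU-based HorizontalPodAutoscaler in Kubernetes could look roughly like this; the target deployment name and the thresholds are assumptions:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web                  # hypothetical deployment to scale
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70   # add pods above 70% average CPU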
