OpenShift PaaS/Kubernetes Docker Container Monitoring and Orchestration

Kubernetes Deployments and ReplicationControllers provide self-healing by ensuring a minimum number of replicas is present.
The autoscaling feature also allows the number of replicas to be increased when a specific CPU threshold is exceeded.
Are there tools available that would provide flexibility in the auto-healing and auto-scale features?
Examples:
Auto-adjust the number of replicas during peak hours or days.
When a pod dies due to an external issue, prevent the system from re-creating the container and wait for a condition to succeed, e.g. a ping or telnet test.

You can block pod startup by waiting for external services in an entrypoint script or init container. That's the closest that exists today to waiting for external conditions.
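For example, here is a minimal init container sketch that blocks startup until a TCP connection to a dependency succeeds; the service name db, port 5432, and the image names are placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    # Loop until the (hypothetical) db service answers on TCP port 5432
    command: ['sh', '-c', 'until nc -z db 5432; do echo waiting for db; sleep 2; done']
  containers:
  - name: app
    image: myregistry/app:latest   # placeholder image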
There is no time-based autoscaler today, although it would be possible to script one fairly easily on a schedule.

In OpenShift, you can easily scale your app by running this command in a cron job.
Scale command
oc scale dc app --replicas=5
And of course, you can scale it down by changing the number of replicas.
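For example, a crontab sketch that scales up for peak hours on weekdays and back down in the evening (the schedule and replica counts are just examples):
# Scale up to 5 replicas at 08:00 on weekdays, back down to 2 at 20:00 (example values)
0 8 * * 1-5 oc scale dc app --replicas=5
0 20 * * 1-5 oc scale dc app --replicas=2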
Autoscale
This is what OpenShift for Developers writes about autoscaling:
OpenShift also supports automatic scaling, defining upper and lower thresholds for CPU usage by pod.
If the upper threshold is consistently exceeded by the running pods for your application, a new instance of your application will be started. When CPU usage drops back below the lower threshold, because your application is no longer working as hard, the number of instances will be scaled back again.
I think Kubernetes has now released version 1.3, which supports autoscaling, but it is not yet fully integrated in OpenShift.
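As a sketch, assuming a DeploymentConfig named app, the autoscaler can be created like this (the thresholds are examples):
oc autoscale dc/app --min=2 --max=10 --cpu-percent=80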
Health Check
When it comes to health checks, OpenShift has:
Readiness probe: checks the status of the test you configure before the router starts to send traffic to the pod.
Liveness probe: run periodically once traffic has been switched to an instance of your application, to ensure it is still behaving correctly. If the liveness probe fails, OpenShift will automatically shut down that instance of your application and replace it with a new one.
You can perform these kinds of tests (HTTP check, container execution check and TCP socket check).
So with these tools I guess you can create some readiness and liveness checks to ensure that your pod is running properly; if not, a new instance will be created until the readiness status comes back OK.
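A minimal sketch of how these probes look on a container spec; the paths, ports and timings are placeholders:
containers:
- name: app
  image: myregistry/app:latest   # placeholder image
  readinessProbe:                # HTTP check before the router sends traffic to the pod
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
  livenessProbe:                 # TCP socket check run periodically after startup
    tcpSocket:
      port: 8080
    periodSeconds: 10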

Related

Is there a reason to run CI builds on a Kubernetes cluster?

I don't know much about kubernetes, but as far as I know, it is a system that enables you to control and manage containerized applications. So, generally speaking, the essence of the benefit that we get from kubernetes is the ability to "tell" kubernetes what containers we want running, how many of them, on which machines, among other details, and kubernetes will take care of doing that for us. Is that correct?
If so, I just can't see the benefit of running a CI pipeline using a kubernetes pod, as I understand that some people do. Let's say you have your build tools on Docker containers instead of having them installed on a specific machine, that's great - you can just use those containers in the build process, why kubernetes? Is there any performance gain or something like this?
Appreciate some insights.
It is highly recommended to get a good understanding of what Kubernetes is and what it can and cannot do.
Generally, containers combined with an orchestration tool can provide better management of your machines and services. This can significantly improve the reliability of your application and reduce the time and resources spent on DevOps.
Some of the features worth noting are:
Horizontal infrastructure scaling: New servers can be added or removed easily.
Auto-scaling: Automatically change the number of running containers, based on CPU utilization or other application-provided metrics.
Manual scaling: Manually scale the number of running containers through a command or the interface.
Replication controller: The replication controller makes sure your cluster is running the desired number of pods. If there are too many pods, the replication controller terminates the extra pods. If there are too few, it starts more pods.
Health checks and self-healing: Kubernetes can check the health of nodes and containers ensuring your application doesn’t run into any failures. Kubernetes also offers self-healing and auto-replacement so you don’t need to worry about if a container or pod fails.
Traffic routing and load balancing: Traffic routing sends requests to the appropriate containers. Kubernetes also comes with built-in load balancers so you can balance resources in order to respond to outages or periods of high traffic.
Automated rollouts and rollbacks: Kubernetes handles rollouts for new versions or updates without downtime while monitoring the containers’ health. In case the rollout doesn’t go well, it automatically rolls back.
Canary Deployments: Canary deployments enable you to test the new deployment in production in parallel with the previous version.
However you should also know what Kubernetes is not:
Kubernetes is not a traditional, all-inclusive PaaS (Platform as a
Service) system. Since Kubernetes operates at the container level
rather than at the hardware level, it provides some generally
applicable features common to PaaS offerings, such as deployment,
scaling, load balancing, and lets users integrate their logging,
monitoring, and alerting solutions. However, Kubernetes is not
monolithic, and these default solutions are optional and pluggable.
Kubernetes provides the building blocks for building developer
platforms, but preserves user choice and flexibility where it is
important.
Especially in your use case, note that Kubernetes:
Does not deploy source code and does not build your application. Continuous Integration, Delivery, and Deployment (CI/CD) workflows are determined by organization cultures and preferences as well as technical requirements.
The decision is yours but having in mind the main concepts above will help you make it.
An important detail is that you do not tell Kubernetes what nodes a given pod should run on; it picks itself, and if the cluster is low on resources, in many cases it can actually allocate more nodes on its own (via the cluster autoscaler).
So if your CI system is fairly busy, and uses containers for everything, it could make more sense to run an individual build job as a Kubernetes Job. If you have 100 builds that all start at the same time, it's possible for the cluster to give itself more hardware, and the build queue will clear out faster. Particularly if you're using Kubernetes for other tasks, this can save you some administrative effort over maintaining a dedicated pool of CI-system workers that need to be separately updated and will sit mostly idle until that big set of builds arrives.
Kubernetes's security settings are also substantially better than Docker's. Say your CI system needs to launch containers as part of a build. In Kubernetes, it can run under a service account, and be given permissions to create and delete deployments in a specific namespace, and nothing else. In Docker the standard approach is to give your CI system access to the host's Docker socket, but this can be easily exploited to take over the host.
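As a sketch of what that looks like, here is a Role and RoleBinding that confine a CI service account to deployments in one namespace; the names ci, ci-builder, and jenkins are hypothetical:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-builder      # hypothetical Role name
  namespace: ci         # hypothetical namespace
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "delete", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-builder
  namespace: ci
subjects:
- kind: ServiceAccount
  name: jenkins         # hypothetical CI service account
  namespace: ci
roleRef:
  kind: Role
  name: ci-builder
  apiGroup: rbac.authorization.k8s.io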

On a k8s node, how does one manage the pod disk IO land-rush at node power-on?

The Problem
When one of our locally hosted bare-metal k8s (1.18) nodes is powered-on, pods are scheduled, but struggle to reach 'Ready' status - almost entirely due to a land-rush of disk IO from 30-40 pods being scheduled simultaneously on the node.
This often results in a cascade of Deployment failures:
IO requests on the node stack up in the IOWait state as pods deploy.
Pod startup times skyrocket from (normal) 10-20 seconds to minutes.
livenessProbes fail.
Pods are re-scheduled, compounding the problem as more IO requests stack up.
Repeat.
FWIW Memory and CPU are vastly over-provisioned on the nodes, even in the power-on state (<10% usage).
Although we do have application NFS volume mounts (that would normally be suspect WRT IO issues), the disk activity and restriction at pod startup is almost entirely in the local docker container filesystem.
Attempted Solutions
As disk IO is not a limitable resource, we are struggling to find a solution for this. We have tuned our docker images to write to disk as little as possible at startup, and this has helped some.
One basic solution involves lowering the number of pods scheduled per node by increasing the number of nodes in the cluster. This isn't ideal for us, as they are physical machines, and once the nodes DO start up, the cluster is significantly over-resourced.
As we are bare-metal/local we do not have an automated method to auto-provision nodes in startup situations and lower them as the cluster stabilizes.
Applying priorityClasses at first glance seemed to be a solution. We have created priorityClasses and applied them accordingly, however, as listed in the documentation:
Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible.
tldr: Pods will still all be "scheduleable" simultaneously at power-on, as no configurable resource limits are being exceeded.
Question(s)
Is there a method to limit scheduling pods on a node based on its current number of non-Ready pods? This would allow priority classes to evict non-priority pods and schedule the higher priority ones first.
Aside from increasing the number of cluster nodes, is there a method we have not thought of to manage this disk IO land-rush?
While I am also interested to see smart people answer the question, here is my probably "just OK" idea:
Configure the new node with a Taint that will prevent your "normal" pods from being scheduled to it.
Create a deployment of do-nothing pods with:
A "reasonably large" memory request, eg: 1GB.
A number of replicas high enough to "fill" the node.
A toleration for the above Taint.
Remove the Taint from the now-"full" node.
Scale down the do-nothing deployment at whatever rate you feel is appropriate so as to avoid the "land rush".
Here's a Dockerfile for the do-nothing "noop" image I use for testing/troubleshooting:
FROM alpine:3.9
CMD sh -c 'while true; do sleep 5; done'
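And a sketch of the surrounding commands, with node, taint, and deployment names as placeholders:
# Taint the freshly booted node so normal pods stay away (key/value are examples)
kubectl taint nodes node-1 startup=landrush:NoSchedule

# The do-nothing deployment carries a matching toleration so it can "fill" the node:
#   tolerations:
#   - key: "startup"
#     operator: "Equal"
#     value: "landrush"
#     effect: "NoSchedule"

# Once the node is full of placeholder pods, remove the taint...
kubectl taint nodes node-1 startup=landrush:NoSchedule-

# ...and bleed off the placeholders at whatever rate avoids the land rush
kubectl scale deployment noop --replicas=20   # then 10, then 0, and so on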
Kubernetes Startup Probes might mitigate the problem of Pods being killed due to livenessProbe timeouts: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes
If you configure them appropriately, the I/O "land rush" will still happen, but the pods will have enough time to settle instead of being killed.
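A minimal sketch of a startupProbe paired with a livenessProbe; the endpoint, port, and thresholds are placeholders:
# Allows up to 30 * 10 = 300 seconds for startup before the liveness probe takes over
startupProbe:
  httpGet:
    path: /healthz      # placeholder endpoint
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10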

Does it make sense to run multiple similar processes in a container?

A brief background to give context to the question.
Currently my team and I are in the midst of migrating our microservices to k8s to lessen the effort of having to maintain multiple deployment tools & pipelines.
One of the microservices that we are planning to migrate is an ETL worker that listens to messages on SQS and performs multi-stage processing.
It is built using PHP Laravel and we use supervisord to control how many processes to run on each worker instance on aws ec2. Each process basically executes a laravel command to poll different queues for new messages. We also periodically adjust the number of processes to maximize utilization of each instance's compute power.
So the questions are:
Is this method of deployment still feasible when moving to k8s? Is there still a need to "maximize" compute usage? Are we better off just running one process in each container, the "container way" (not sure what the tool is called; runit?)
I read from multiple sources (e.g. https://devops.stackexchange.com/questions/447/why-it-is-recommended-to-run-only-one-process-in-a-container) that it is ideal for a container to run only one process. There's also the case of recovering crashed processes and how running supervisord might interfere with how the container performs recovery. But I am not very sure if it applies to our use case.
You should absolutely restructure this to run one process per container and one container per pod. You do not typically need an init system or a process manager like supervisord or runit (there is an argument to have a dedicated init like tini that can do the special pid-1 things).
You mention two concerns here, restarting failed processes and process placement in the cluster. For both of these, Kubernetes handles these automatically for you.
If the main process in a Pod fails, Kubernetes will restart it. You don't need to do anything for this. If it fails repeatedly, it will start delaying the restarts. This functionality only works if the main process fails – if your container's main process is a supervisor process, you will never get a pod restart and you may not directly notice if a process can't start up at all.
Typically you'll run containers via Deployments that have some number of identical replica Pods. Kubernetes itself takes responsibility for deciding which node will run each pod; you don't need to manually specify this. The smaller the pods are, the easier it is to place them. Since you're controlling the number of replicas of a pod, you also want to separate concerns like Web servers vs. queue workers so you can scale these independently.
Kubernetes has some ability to auto-scale, though the typical direction is to size the cluster based on the workload: in a cloud-oriented setup, if you add a new pod that requests more CPUs than your cluster currently has available, it will provision a new node. The HorizontalPodAutoscaler is something of an advanced setup, but you can configure it so that the number of workers is a function of your queue length. Again, this works better if the only thing it's scaling is the worker pods, and not a collection of unrelated things packaged together.
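As a sketch only: an autoscaling/v2 HorizontalPodAutoscaler driven by queue length could look like this, assuming a metrics adapter already exposes the SQS queue depth as an external metric (the metric name, deployment name, and targets are all assumptions):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: etl-worker                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: etl-worker              # hypothetical worker deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: sqs_messages_visible   # assumed to be exposed by a metrics adapter
      target:
        type: AverageValue
        averageValue: "30"           # aim for roughly 30 queued messages per worker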

How to kill a multi-container pod if one container fails?

I'm using Jenkins Kubernetes Plugin which starts Pods in a Kubernetes Cluster which serve as Jenkins agents. The pods contain 3 containers in order to provide the slave logic, a Docker socket as well as the gcloud command line tool.
The usual workflow is that the slave does its job and notifies the master that it completed; then the master terminates the pod. However, if the slave container crashes due to a lost network connection, it terminates with error code 255 while the other two containers keep running, and so does the pod. This is a problem because the pods have large CPU requests, and the setup is only cheap when the slaves run just as long as they have to; having multiple machines running for 24 hours or over the weekend is noticeable financial damage.
I'm aware that starting multiple containers in the same pod is not fine Kubernetes art, but it is OK if I know what I'm doing, and I assume I do. I'm sure it's hard to solve this differently given the way the Jenkins Kubernetes Plugin works.
Can I make the pod terminate if one container fails, without it respawning? A solution with a timeout is acceptable as well, although less preferred.
Disclaimer, I have a rather limited knowledge of kubernetes, but given the question:
Maybe you can run a fourth container that exposes one simple "liveness" endpoint.
It can run ps -ef or use any other means to contact the 3 existing containers, just to make sure they're alive.
This endpoint would return "OK" only if all the containers are running, and "ERROR" if at least one of them was detected as crashed.
Then you could set up a Kubernetes liveness probe so that it would stop the pod upon the error returned from that fourth container.
Of course, if this fourth process crashes by itself for any reason (well, it shouldn't unless there is a bug or something), then the liveness probe won't respond and Kubernetes is supposed to stop the pod anyway, which is probably what you really want to achieve.
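A minimal sketch of what the liveness probe on that fourth container might look like; the /all-alive endpoint and port are hypothetical:
livenessProbe:
  httpGet:
    path: /all-alive    # hypothetical endpoint that checks the other containers
    port: 9000
  periodSeconds: 15
  failureThreshold: 2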

Is there a best practice to reboot a cluster

I followed Alex Ellis' excellent tutorial that uses kubeadm to spin-up a K8s cluster on Raspberry Pis. It's unclear to me what the best practice is when I wish to power-cycle the Pis.
I suspect sudo systemctl reboot is going to result in problems. I'd prefer not to delete and recreate the cluster each time starting with kubeadm reset.
Is there a way that I can shutdown and restart the machines without deleting the cluster?
Thanks!
This question is quite old but I imagine others may eventually stumble upon it so I thought I would provide a quick answer because there is, in fact, a best practice around this operation.
The first thing that you're going to want to ensure is that you have a highly available cluster. This consists of at least 3 masters and 3 worker nodes. Why 3? This is so that at any given time they can always form a quorum for eventual consistency.
Now that you have an HA Kubernetes cluster, you're going to have to go through every single one of your application manifests and ensure that you have specified resource requests and limits. This is so that you can ensure that a pod will never be scheduled on a node without the required resources. Furthermore, in the event that a pod has a bug that causes it to consume an abnormally high amount of resources, the limit will prevent it from taking down your cluster.
Now that that is out of the way, you can begin the process of rebooting the cluster. The first thing you're going to do is reboot your masters. So run kubectl drain $MASTER against one of your (at least) three masters. The API Server will now reject any scheduling attempts and immediately start the process of evicting any scheduled pods and migrating their workloads to your other masters.
Use kubectl describe node $MASTER to monitor the node until all pods have been removed. Now you can safely connect to it and reboot it. Once it has come back up, you can run kubectl uncordon $MASTER and the API Server will once again begin scheduling Pods to it. Once again, use kubectl describe node $MASTER until you have confirmed that all pods are READY.
Repeat this process for all of the masters. After the masters have been rebooted, you can safely repeat this process for all three (or more) worker nodes. If you properly perform this operation you can ensure that all of your applications will maintain 100% availability provided they are using multiple pods per service and have proper Deployment Strategy configured.
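A sketch of the per-node cycle, with node-1 as a placeholder name:
kubectl drain node-1 --ignore-daemonsets    # cordon the node and evict its pods
kubectl describe node node-1                # wait until the workloads are gone
# ... reboot the machine and wait for it to rejoin the cluster ...
kubectl uncordon node-1                     # allow scheduling on it again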
