There is a working k8s cluster with two nodes (master and worker) in it, with CRI-O as the container runtime. I need to switch (temporarily) from CRI-O to the Docker container runtime.
I was trying to use these commands:
kubectl cordon <node_name>
kubectl drain <node_name>
and it failed on the master node.
Here are some things to help you:
Understand that dockershim support was removed in Kubernetes v1.24. So, if your Kubernetes version is v1.24 or later, Docker as a runtime will not work through the built-in dockershim. This is a great resource for understanding the details of this.
If your version allows using Docker Engine as a runtime, then as per the docs, you need to install Docker Engine and then the cri-dockerd adapter to interface it with Kubernetes. You can find links for all of this in the linked docs.
After you're done installing and configuring your nodes, you will need to create a RuntimeClass object in your cluster. You can use this guide.
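As a rough illustration, a RuntimeClass might look like the sketch below. The handler name is an assumption; it has to match whatever runtime handler your CRI implementation actually exposes on the node, so check your runtime's configuration rather than copying this verbatim.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: docker            # the name pods will reference
handler: docker           # assumed handler name; must match the node's CRI configuration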
Now, you need to update each pod specification to add the runtimeClassName field to it, so the pod is run with the specified runtime on a matching node.
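For example, a pod that should run under the RuntimeClass sketched above could declare it like this (the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: demo-on-docker        # placeholder name
spec:
  runtimeClassName: docker    # must match the RuntimeClass metadata.name
  containers:
    - name: app
      image: nginx:1.25       # placeholder image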
Understand that there is no "temporary" switching between runtimes. You simply install, configure, and set up all the runtimes you need, in parallel, on your worker nodes, and then update all of your pod specifications to schedule them on a worker node with the required RuntimeClass.
Also, there is no point in changing a runtime of the master node. The master node pods are Kubernetes system components that are static pods and have their manifests at /etc/kubernetes/manifests directory. They are not applied through the Kubernetes API server. Any runtime changes on the node will not affect these pods unless the cluster is deleted and these pods are created again. It is HIGHLY DISCOURAGED to manipulate these manifests because any errors will not be shown anywhere and the component will simply "not work". (Hence, static pods).
Bottom line; Runtime changes only make sense for worker nodes. Do not try to change master node runtimes.
I'm trying to learn Kubernetes to push up my microservices solution to some Kubernetes in the Cloud (e.g. Azure Kubernetes Service, etc)
As part of this, I'm trying to understand the main concepts, specifically around Pods + Workers and (in the yml file) Pods + Services. To do this, I'm trying to compare what I have inside my docker-compose file against the new concepts.
Context
I currently have a docker-compose.yml file which contains about 10 images. I've split the solution up into two 'networks': frontend and backend. The backend network contains 3 microservices and cannot be accessed at all via a browser. The frontend network contains a reverse proxy (Traefik, which is similar to nginx) which is used to route all requests to the appropriate backend microservice, and a simple SPA web app. All works 100% awesome.
Each backend Microservice has at least one of these:
Web API host
Background tasks host
So this means I could scale out the Web API hosts if required ... but I should never scale out the background tasks hosts.
Here's a simple diagram of the solution:
So if the SPA app tries to request some data with the following route:
https://api.myapp.com/account/1, this will hit the reverse proxy and match a rule to then forward it on to <microservice b>/account/1
So it's from here that I'm trying to learn how to write up a Kubernetes deployment file based on these docker-compose concepts.
Questions
Each 'Pod' has its own IP, so I should create a Pod per container. (Yes, a Pod can have multiple containers, and to me that's like saying 'install these software products on the same machine'.)
A 'Worker Node' is what we replicate/scale out, so we should put our Pods into a Node based on the scaling scenario. For example, the Background Task hosts should go into one Node because they shouldn't be scaled (also, the hardware requirements for that node are really small), while the Web APIs should go into another Node so they can be replicated/scaled out.
If I'm on the right path with the understanding above, then I'll have a lot of nodes and pods ... which feels ... weird?
The pod is the unit of Workload, and has one or more containers. Exactly one container is normal. You scale that workload by changing the number of Pod Replicas in a ReplicaSet (or Deployment).
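To make that concrete, here is a minimal sketch of a Deployment that keeps three replicas of one pod running; all names and the image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api                    # placeholder name
spec:
  replicas: 3                      # scale the workload by changing this number
  selector:
    matchLabels:
      app: web-api
  template:                        # the pod template: one container per pod here
    metadata:
      labels:
        app: web-api
    spec:
      containers:
        - name: web-api
          image: myapp/web-api:1.0   # placeholder image
          ports:
            - containerPort: 8080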
A Pod is mostly an accounting construct with no direct parallel to base docker. It's similar to docker-compose's Service. A pod is mostly immutable after creation. Like every resource in kubernetes, a pod is a declaration of desired state - containers to be running somewhere. All containers defined in a pod are scheduled together and share resources (IP, memory limits, disk volumes, etc).
All Pods within a ReplicaSet are both fungible and mortal - a request can be served by any pod in the ReplicaSet, and any pod can be replaced at any time. Each pod does get its own IP, but a replacement pod will probably get a different IP. And if you have multiple replicas of a pod they'll all have different IPs. You don't want to manage or track pod IPs. Kubernetes Services provide discovery (how do I find those pods' IPs) and routing (connect to any Ready pod without caring about its identity) and load balancing (round robin over that group of Pods).
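A sketch of a Service that selects those pods and gives them one stable name; the labels and ports are assumptions matching the Deployment sketch above:

apiVersion: v1
kind: Service
metadata:
  name: web-api                 # clients connect to this name, not to pod IPs
spec:
  selector:
    app: web-api                # routes to any Ready pod carrying this label
  ports:
    - port: 80                  # port clients use
      targetPort: 8080          # container port inside the pods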
A Node is the compute machine (VM or Physical) running a kernel and a kubelet and a dockerd. (This is a bit of a simplification. Other container runtimes than just dockerd exist, and the virtual-kubelet project aims to turn this assumption on its head.)
All pods are Scheduled on Nodes. When a pod (with containers) is scheduled on a node, the kubelet responsible for (and running on) that node takes over: it talks to dockerd to start the pod's containers.
Once scheduled on a node, a pod is not moved to another node. Nodes are fungible & mortal too, though. If a node goes down or is being decommissioned, the pod will be evicted/terminated/deleted. If that pod was created by a ReplicaSet (or Deployment) then the ReplicaSet Controller will create a new replica of that pod to be scheduled somewhere else.
You normally start many (1-100) pods+containers on the same node+kubelet+dockerd. If you have more pods than that (or they need a lot of cpu/ram/io), you need more nodes. So the nodes are also a unit of scale, though very much indirectly wrt the web-app.
You do not normally care which Node a pod is scheduled on. You let kubernetes decide.
In Kubernetes, a Pod is considered the single unit of deployment and might have one or more containers, so if we scale it, all the containers in the Pod are scaled together.
If the Pod has only one container, it's easier to scale that particular Pod, so what's the purpose of packaging one or more containers inside a Pod?
From the documentation:
Pods can be used to host vertically integrated application stacks (e.g. LAMP), but their primary motivation is to support co-located, co-managed helper programs
The most common example of this is sidecar containers which contain helper applications like log shipping utilities.
A deeper dive can be found here
The reason behind using a pod rather than a container directly is that Kubernetes requires more information to orchestrate the containers, like a restart policy, a liveness probe, and a readiness probe. A liveness probe defines whether the container inside the pod is alive or not, the restart policy defines what to do with the container when it fails, and a readiness probe defines whether the container is ready to start serving.
So, instead of adding those properties to the existing container, Kubernetes decided to write a wrapper around containers with all the necessary additional information.
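As a rough sketch of where that extra information lives, a pod spec carries the restart policy at the pod level and the probes per container; the name, image, paths, and port below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: probe-demo              # placeholder name
spec:
  restartPolicy: Always         # pod-level: what to do when a container fails
  containers:
    - name: app
      image: myapp/app:1.0      # placeholder image
      livenessProbe:            # is the container still alive?
        httpGet:
          path: /healthz
          port: 8080
      readinessProbe:           # is the container ready to serve traffic?
        httpGet:
          path: /ready
          port: 8080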
Also, Kubernetes supports multi-container pods, which are mainly required for sidecar containers, mostly log or data collectors or proxies for the main container. Another advantage of a multi-container pod is that very tightly coupled application containers can run together, sharing the same data, the same network namespace, and the same IPC namespace, which would not be possible if you used containers directly without any wrapper around them.
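A minimal sketch of that sidecar pattern, assuming a hypothetical app that writes logs to a file and a hypothetical log-shipping helper that reads them from a shared volume:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar        # placeholder name
spec:
  volumes:
    - name: logs
      emptyDir: {}                  # shared scratch space for both containers
  containers:
    - name: app
      image: myapp/app:1.0          # placeholder: writes logs to /var/log/app
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper
      image: myorg/log-shipper:1.0  # placeholder sidecar: reads the same files
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true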
The following is a very nice article to give you a brief idea:
https://www.mirantis.com/blog/multi-container-pods-and-container-communication-in-kubernetes/
I am creating a Docker container (using docker run) in a Kubernetes environment by invoking a REST API.
I have mounted the host machine's docker.sock, and I am building an image and running that image from the REST API.
Now I need to connect to this container from some other container which was actually started by kubectl from a deployment.yml file.
But when I used kubectl describe pod <pod name>, my container created using the REST API is not there. So where is this container running, and how can I connect to it from some other container?
Are you running the container in the same namespace as the namespace of the deployment.yml? One option to check that would be to run:
kubectl get pods --all-namespaces
If you are not able to find the docker container there, then I would suggest performing the steps below:
docker ps -a (verify the running Docker container status)
Ensure that there are no permission errors while mounting docker.sock
If there are permission errors, escalate privileges to the appropriate level
To answer the second question, connection between two containers should be possible by referencing cluster DNS in the format below:
"<servicename>.<namespacename>.svc.cluster.local"
I would also ask you to detail the steps, code, and errors (if there are any) so I can better answer the question.
You probably shouldn't be directly accessing the Docker API from anywhere in Kubernetes. Kubernetes will be totally unaware of anything you manually docker run (or equivalent) and as you note normal administrative calls like kubectl get pods won't see it; the CPU and memory used by the pod won't be known about by the node interface and this could cause a node to become over utilized. The Kubernetes network environment is also pretty complicated, and unless you know the details of your specific CNI provider it'll be hard to make your container accessible at all, much less from a pod running on a different node.
A process running in a pod can access the Kubernetes API directly, though. That page notes that all of the official client libraries are aware of the conventions this uses. This means that you should be able to directly create a Job that launches your target pod, and a Service that connects to it, and get the normal Kubernetes features around this. (For example, servicename.namespacename.svc.cluster.local is a valid DNS name that reaches any Pod connected to the Service.)
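For illustration, the kind of Job the in-pod client would create through the API might look roughly like the sketch below; the name and image are assumptions:

apiVersion: batch/v1
kind: Job
metadata:
  name: user-task                      # placeholder name
spec:
  backoffLimit: 2                      # retry a couple of times, then give up
  template:
    spec:
      restartPolicy: Never             # let the Job track completion/failure
      containers:
        - name: user-task
          image: myorg/user-task:1.0   # placeholder user-provided image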
You should also consider whether you actually need this sort of interface. For many applications, it will work just as well to deploy some sort of message-queue system (e.g., RabbitMQ) and then launch a pool of workers that connects to it. You can control the size of the worker queue using a Deployment. This is easier to develop since it avoids a hard dependency on Kubernetes, and easier to manage since it prevents a flood of dynamic jobs from overwhelming your cluster.
I have a number of Jobs running on k8s.
These jobs run a custom agent that copies some files and sets up the environment for a user (trusted) provided container to run.
This agent runs on the side of the user container, captures the logs, waits for the container to exit and process the generated results.
To achieve this, we mount Docker's socket /var/run/docker.sock and run as a privileged container, and from within the agent, we use docker-py to interact with the user container (setup, run, capture logs, terminate).
This works almost fine, but I'd consider it a hack. Since the user container was created by calling Docker directly on a node, k8s is not aware of its existence. This has been causing trouble, since our monitoring tools interact with k8s and don't get visibility into these stand-alone user containers. It also makes pod scheduling harder to manage, since the limits (CPU/memory) for the user container are not accounted for as part of the pod's requests.
I'm aware of init containers but these don't quite fit this use case, since we want to keep the agent running and monitoring the user container until it completes.
Is it possible for a container running on a pod, to request Kubernetes to add additional containers to the same pod the agent is running? And if so, can the agent also request Kubernetes to remove the user container at will (e.g. certain custom condition was met)?
From this GitHub issue, it seems that the answer is that adding or removing containers to a pod is not possible, since the container list in the pod spec is immutable.
In kubernetes 1.16, there is an alpha feature that would allow for creation of ephemeral containers which could be "added" to running pods. Note, that this requires a feature gate to be enabled on relevant components e.g. kubelet. This may be hard to enable on control plane for cloud provider managed services such as EKS.
API Reference 1.16
Simple tutorial
I don't think you can alter a running pod like that, but you can certainly define your own pod and run it programmatically using the API.
What I mean is you should define a pod with the user container and whatever other containers you wish, and run it as a unit. It's possible you'll need to play around with liveness checks to have post-processing completed after your user container dies.
You can share data between multiple containers in a pod using shared volumes. This would let your agent container read from log files written by the user container, and drop config files into the shared volume for setup.
This way you could run the user container and the agent container as a Job with both containers in the pod. When both containers exit, the job will be completed.
You seem to indicate above that you are manually terminating the user container. That wouldn't be supported via a shared volume unless you did something like forcing users to terminate their execution upon the presence of a file on the shared volume.
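A rough sketch of that Job layout, assuming a hypothetical agent image, a hypothetical user-provided image, and a shared emptyDir volume for configs and logs:

apiVersion: batch/v1
kind: Job
metadata:
  name: agent-plus-user                  # placeholder name
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: shared
          emptyDir: {}                   # configs in, logs/results out
      containers:
        - name: agent
          image: myorg/agent:1.0         # placeholder agent image
          volumeMounts:
            - name: shared
              mountPath: /shared
        - name: user-container
          image: user/workload:latest    # placeholder user-provided image
          volumeMounts:
            - name: shared
              mountPath: /shared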
Is it possible for a container running on a pod, to request Kubernetes to add additional containers to the same pod the agent is running? And if so, can the agent also request Kubernetes to remove the user container at will (e.g. certain custom condition was met)?
I'm not aware of any way to add containers to existing Job pod definitions. There's no replicas option for Jobs so you couldn't hack it by changing replicas from 0->1 like you potentially could on a Deployment.
I'm not aware of any way to use kubectl to delete a container but not the whole pod. See kubectl delete.
If you want to kill the user container (rather than having it run to completion), you'll have to get on the host and use docker kill <sha> on the user container. Make sure to set .spec.template.spec.restartPolicy = "Never" in the Job spec or k8s will restart the container.
I'd recommend:
Having a shared volume to transfer logs to the agent and so the agent can set up the user container
Making user containers expect to exit on their own and read configs from the shared volume
I don't know what workloads you are doing or how users are making containers so that may not be possible. If you're not able to dictate how users build their containers, the above may not work.
Another option is providing a binary that acts as a command API on the user container. This binary could accept commands like "setup", "run", "terminate", "transfer logs" via RPC and it would be the main process in their docker container.
Then you could make the build process for users something like:
FROM your-container-with-binary:latest
# put whatever you want in this container
# and set ENV JOB_PATH=/path/to/executable/code (or put code in a specific location)
Lots of moving parts to this whichever way you make it happen.
You can inject containers into pods dynamically via: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. The controllers consist of the list below, are compiled into the kube-apiserver binary, and may only be configured by the cluster administrator. In that list, there are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the mutating and validating (respectively) admission control webhooks which are configured in the API.
Admission controllers may be “validating”, “mutating”, or both. Mutating controllers may modify the objects they admit; validating controllers may not.
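To use that route, you register your own mutating webhook with a configuration roughly like the sketch below; the names, namespace, path, and rules are all assumptions, and the webhook server that actually injects the sidecar has to be written and deployed separately (with its CA certificate filled into caBundle):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector                   # placeholder name
webhooks:
  - name: sidecar-injector.example.com     # placeholder qualified webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: sidecar-injector             # placeholder Service fronting your webhook server
        namespace: infra                   # placeholder namespace
        path: /mutate
      # caBundle: <base64-encoded CA cert for the webhook server>
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]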
And you can inject additional runtime requirements into pods via: https://kubernetes.io/docs/concepts/workloads/pods/podpreset/
A Pod Preset is an API resource for injecting additional runtime requirements into a Pod at creation time. You use label selectors to specify the Pods to which a given Pod Preset applies.
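A minimal PodPreset sketch, assuming the alpha settings.k8s.io/v1alpha1 API is enabled on the cluster (PodPreset has since been removed from newer Kubernetes releases) and using a hypothetical label, env var, and volume:

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: inject-agent-env             # placeholder name
spec:
  selector:
    matchLabels:
      role: user-workload            # placeholder: applies to pods carrying this label
  env:
    - name: AGENT_ENDPOINT           # placeholder env var injected into matching pods
      value: "http://agent.infra.svc.cluster.local"
  volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
    - name: shared
      emptyDir: {}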
I'm in the middle of setting up a masterless Kubernetes node, by which I mean setting up a Kubernetes node without setting up kube-apiserver.
I was successful in defining my Pods in kubelet manifest files, but I wonder: is it possible to define Services in kubelet manifests too?
I'm sorry, but no -- only pods can be created via manifest files.
Services (along with replication controllers and other API objects) require the orchestration that the master provides in order to function.
If it interests you, you can run the master components on the same node without even needing VMs by using the instructions for a "local" cluster.