Kubernetes and Docker Relationship

What is the nature of the relationship between Docker and Kubernetes? Is it safe to assume that ALL Docker operations done within a Pod will treat the Pod as if it is a normal host machine?
For example, if I were to use the Python Docker SDK, attach to the /var/run/docker.sock, and create a volume, will this volume only exist within the Pod?
My main concern is that I know a Pod is virtualized, and thus it may not play nicely if I dig a little too deep with other virtualization tools like Docker.

It's important to understand what the responsibility of each of these concepts is.
A Docker container is, in essence, an isolation boundary provided by the host OS that allows a process to run with its own filesystem and namespaces (docs).
Kubernetes is an orchestration platform for running such containers (docs).
Finally, a Pod is a Kubernetes object that describes how one or more containers are to be run (docs).
With that knowledge, we can answer some of your questions:
What is the nature of the relationship between Docker and Kubernetes?
Kubernetes can run Docker containers just like your computer can, but it is optimised for this specific goal. Kubernetes is also an abstraction (or orchestration) layer, handling resources like network capability, disk space, and CPU cycles for you.
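As a rough illustration of that resource handling, a Pod spec can declare how much CPU and memory a container needs and let the scheduler find a Node that can satisfy it. The name and image below are just placeholders:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                  # hypothetical example name
    spec:
      containers:
        - name: web
          image: nginx:1.25      # any application image would do here
          resources:
            requests:            # what the scheduler reserves on a Node
              cpu: 250m
              memory: 128Mi
            limits:              # hard caps enforced at runtime
              cpu: 500m
              memory: 256Mi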
Is it safe to assume that ALL Docker operations done within a Pod will treat the Pod as if it is a normal host machine?
A Pod is not a host in any way. It is merely a description of how a Docker container (or several) should run. The resulting containers run on whichever Kubernetes Node the Pod is scheduled to, inside the isolation that the container runtime provides.
For example, if I were to use the Python Docker SDK, attach to the /var/run/docker.sock, and create a volume, will this volume only exist within the Pod?
This is something you can do on your local machine, and while technically you could do this on your Node as well, it's not a common use case.
Note that a Docker container is isolated from external factors like a mount or a network socket (these only exist at runtime and don't change the state of the container image itself). You can, however, configure a container (using a Pod object) to recreate the same conditions on your cluster.
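For instance, a minimal sketch of a Pod that declares and mounts a volume through Kubernetes rather than through the Docker socket (the names and image are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod          # illustrative name
    spec:
      containers:
        - name: app
          image: python:3.12     # stand-in for your application image
          volumeMounts:
            - name: scratch
              mountPath: /data   # path the process sees inside the container
      volumes:
        - name: scratch
          emptyDir: {}           # ephemeral volume managed by Kubernetes, not by Docker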

If Kubernetes is running Docker (it's not guaranteed to) then that /var/run/docker.sock will be the host's Docker socket; there is not an additional layer of virtualization.
You shouldn't try to use Docker primitives in an application running in Kubernetes. The approach you describe can even lead to data loss: if you create a Docker-native volume on a node and a cluster autoscaler or some other task later destroys that node, the volume goes with it. If you need to create storage or additional containers, you can use the Kubernetes API to create PersistentVolumeClaims, Jobs, and other Kubernetes-managed objects.
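As a sketch of that approach, instead of a Docker-native volume you would request storage with a PersistentVolumeClaim and mount it into the Pod. The names, image, and size below are assumptions:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data             # illustrative name
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi           # assumed size
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: app
    spec:
      containers:
        - name: app
          image: python:3.12     # stand-in image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data  # survives Pod (and often Node) replacement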

Related

How to add docker container hostname and network in Kubernetes Deployment?

I have Docker images for the different parts of an ELK stack, and I want them to communicate with each other. I have achieved this by creating a Docker network and accessing the containers via hostname. I want to know whether we can pass these properties in Kubernetes or not.
Can we create a Docker network there? And how do we pass these properties inside the Deployment YAML?
I have created a Docker network named "elk" and then passed it in the run arguments (as docker run --network=elk -h elasticsearch ...).
I am expecting to create this network in the Kubernetes cluster and then pass these properties to the Deployment YAML.
Kubernetes does not have Docker's notion of separate per-application isolated networks. You can't reproduce this Docker setup in Kubernetes and don't need to. Also see Services, Load Balancing, and Networking in the Kubernetes documentation.
In Kubernetes you usually do not communicate directly with Pods (containers). Instead, you also create a Service matching each Deployment, and then make calls to the Service name and port.
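A minimal sketch of that pattern, assuming an Elasticsearch container listening on port 9200 (the image tag is a guess); other Pods in the same namespace can then reach it at http://elasticsearch:9200:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: elasticsearch
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: elasticsearch
      template:
        metadata:
          labels:
            app: elasticsearch
        spec:
          containers:
            - name: elasticsearch
              image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0   # assumed tag
              ports:
                - containerPort: 9200
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: elasticsearch        # this name plays the role of the docker run -h hostname
    spec:
      selector:
        app: elasticsearch
      ports:
        - port: 9200
          targetPort: 9200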
If you're currently deploying containers with docker run --net=... then you can ignore that option when migrating to Kubernetes. If you're using Compose, I'd suggest first trying to update the Compose setup to use only the Compose-provided default network, removing all of the networks: blocks.
For something like Elasticsearch, you probably want to run it in a StatefulSet which can also manage the per-replica storage. This has specific requirements around corresponding Services, and it does provide a way to connect to a specific replica when you need to. Relevantly to this question, if the StatefulSet is named elasticsearch then the Pods will be named elasticsearch-0, elasticsearch-1, and so on, and these names will also be visible as the hostname(8) inside the container, matching the docker run -h option.
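A rough sketch of that wiring, with a headless Service named elasticsearch and per-replica storage via volumeClaimTemplates (image tag, replica count, and sizes are assumptions):

    apiVersion: v1
    kind: Service
    metadata:
      name: elasticsearch
    spec:
      clusterIP: None            # headless Service required by the StatefulSet
      selector:
        app: elasticsearch
      ports:
        - port: 9200
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: elasticsearch
    spec:
      serviceName: elasticsearch # ties Pod DNS names to the headless Service
      replicas: 3                # Pods will be elasticsearch-0, elasticsearch-1, elasticsearch-2
      selector:
        matchLabels:
          app: elasticsearch
      template:
        metadata:
          labels:
            app: elasticsearch
        spec:
          containers:
            - name: elasticsearch
              image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0   # assumed tag
              ports:
                - containerPort: 9200
              volumeMounts:
                - name: data
                  mountPath: /usr/share/elasticsearch/data   # assumed data path
      volumeClaimTemplates:      # per-replica storage mentioned above
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi    # assumed size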

Difference between Minikube, Kubernetes, Docker Compose, Docker Swarm, etc

I am new to cluster container management, and this question is probably a basic one for every fresher here.
I have read some documentation, but my understanding is still not clear, so any leads to help me understand?
Somewhere it is mentioned, Minikube is used to run Kubernetes locally. So if we want to maintain cluster management in my four-node Raspberry Pi, then Minikube is not the option?
Does Minikube support only a one-node system?
Docker Compose is a set of instructions and a YAML file to configure and start multiple Docker containers. Can we use this to start containers on different hosts? Then, for simple orchestration where I need to call a container on the second host, I don't need any cluster management, right?
What is the link between Docker Swarm and Kubernetes? Both are independent cluster management tools. Is it efficient to use Kubernetes on a Raspberry Pi? Any issues? I was told that a single-node Kubernetes setup takes all the memory and CPU. Is that true?
Are there other cluster management options for Raspberry Pi?
I think answers to these 4-5 questions will help me understand better.
Presuming that your goal here is to run a set of containers over a number of different Raspberry Pi based nodes:
Minikube isn't really appropriate. It starts a single virtual machine on a Windows, macOS, or Linux machine and installs a Kubernetes cluster into it. It's generally used by developers to quickly start up a cluster on their laptops or desktops for development and testing purposes.
Docker Compose is a system for managing sets of related containers. So for example if you had a web server and database that you wanted to manage together you could put them in a single Docker Compose file.
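For example, a minimal docker-compose.yml for such a web-plus-database pair might look like this (images, ports, and credentials are placeholders); both services join the default Compose network and can reach each other by service name:

    version: "3.8"
    services:
      web:
        image: nginx:1.25              # placeholder web server image
        ports:
          - "8080:80"                  # expose the web server on the host
        depends_on:
          - db
      db:
        image: postgres:16             # placeholder database image
        environment:
          POSTGRES_PASSWORD: example   # demo-only credential
        volumes:
          - db-data:/var/lib/postgresql/data
    volumes:
      db-data:                         # named volume so the data outlives the container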
Docker Swarm is a system for managing sets of containers across multiple hosts. It's essentially an alternative to Kubernetes. It has fewer features than Kubernetes, but it is much simpler to set up.
If you want a really simple multi-node container cluster, I'd say that Docker Swarm is a reasonable choice. If you explicitly want to experiment with Kubernetes, I'd say that kubeadm is a good option here. Kubernetes in general has higher resource requirements than Docker Swarm, so it can be somewhat less suited to Raspberry Pi hardware, although I know people have successfully run Kubernetes clusters on Raspberry Pis.
Docker Compose
A utility to start multiple Docker containers on a single host using a single docker-compose up command. This makes it easier to start multiple containers at once, rather than having to run multiple docker run commands.
Docker swarm
A native container orchestrator for Docker. Docker Swarm allows you to create a cluster of Docker containers running on multiple machines. It provides features such as replication, scaling, and self-healing (i.e. starting a new container when one dies).
Kubernetes
Also a container orchestrator. Kubernetes and Docker Swarm can be considered alternatives to one another. They both handle starting and managing containers across a cluster.
Minikube
Creating a real Kubernetes cluster requires multiple machines, either on premises or on a cloud platform. This is not always convenient if someone is new to Kubernetes and just trying to learn by playing around with it. To solve that, Minikube lets you start a very basic Kubernetes cluster consisting of a single VM on your machine, which you can use to play around with Kubernetes.
Minikube is not for production or multi-node clusters. There are many tools that can be used to create a multi-node Kubernetes cluster, such as kubeadm.
Containers are the future of application deployment. A container is the smallest unit of deployment in Docker. There are three components in Docker: Docker Engine to run a single container, Docker Compose to run a multi-container application on a single host, and Docker Swarm, which is also an orchestration tool, to run a multi-container application across hosts.
In Kubernetes, the smallest unit of deployment is the Pod (which can be composed of multiple containers). Minikube is a single-node cluster that you can install locally to try, test, and get a feel for Kubernetes features, but you can't scale it to more than a single machine. Kubernetes is an orchestration tool like Docker Swarm, but more capable than Docker Swarm with respect to features, scaling, resiliency, and security.
You can do the analysis and decide which tool fits your requirements. Each one has its own pros and cons: Docker Swarm is good for, and easy to manage in, small clusters, whereas Kubernetes is much better for larger ones. There is another orchestration tool, Mesos, which is also popular and used in the largest clusters.
Check this out: Choose your own Adventure. It's just a general analogy, intended only to aid understanding, because all three technologies are evolving rapidly.
I get the impression you're mostly looking for confirmation, and I'm happy to help with that if I can.
Yes, minikube is local-only
Yes, minikube is intended to be single-node
Docker-compose isn't really an orchestration system like swarm and Kubernetes are. It helps with running related containers on a single host, but it is not used for multi-host.
Kubernetes and Docker Swarm are both container orchestration systems. These systems are good at managing scaling, but they come with some overhead, so they're better suited to multi-node setups.
I don't know the range of orchestration options for Raspberry Pi, but there are Kubernetes examples out there such as Build Your Own Cloud with Kubernetes and Some Raspberry Pi.
For Pi, you can use Docker Swarm Mode on one or more Pis. You can even run ARM emulation for testing on Docker for Windows/Mac before trying to get it all working directly on a Pi. The same goes for Kubernetes, as it's built into Docker for Windows/Mac now (no minikube needed).
Alex Ellis has a good blog on Pi and Docker and this post may help too.
I've been playing around with orchestrating Docker containers on a subnet of Raspberry Pis (3Bs).
I found Docker-swarm easiest to set up and work with, and adequate for my purposes. Guide: https://docs.docker.com/engine/swarm/swarm-tutorial/
For Kubernetes there are two main options: k3s and microk8s. Some guides:
k3s
https://bryanbende.com/development/2021/05/07/k3s-raspberry-pi-initial-setup
microk8s
https://ubuntu.com/tutorials/how-to-kubernetes-cluster-on-raspberry-pi#1-overview

Bind mount volume between host and pod containers in Kubernetes

I have a legacy application that stores some config/stats in one of the directories on the OS partition (e.g. /config/), and I am trying to run this as a stateful container in a Kubernetes cluster.
I am able to run it as a container, but due to the inherent ephemeral nature of containers, whatever data my container writes to the OS partition directory /config/ is lost when the container goes down or is destroyed.
I have the Kubernetes deployment file written in such a way that the container is brought back to life, albeit as a new instance either on the same host or on another host, but this new container has no access to the data written by the previous instance of the container.
If it was a docker container I could get this working using bind-mounts, so that whatever data the container writes to its OS partition directory is saved on the host directory, so that any new instance would have access to the data written by previous instance.
But I could not find any alternative for this in Kubernetes.
I could use hostPath provisioning, but hostPath provisioning right now works only for a single-node Kubernetes cluster.
Is there a way I could get this working in a multi-node Kubernetes cluster? Any option other than hostPath provisioning? I can get the containers talking to each other and syncing the data between nodes, but how do we bind-mount a host directory into a container?
Thanks for your help in advance!
This is what you have Volumes and VolumeMounts for in your Pod definition. Your lead about hostPath is in the right direction, but you need a different volume type when you host data in a cluster (as you have seen yourself).
Take a look at https://kubernetes.io/docs/concepts/storage/volumes/ for a list of supported storage backends. Depending on your infrastructure you might find one that suits your needs, or you might need to actually create a backing service for one (i.e. an NFS server, Gluster, Ceph, and so on).
If you want to add another abstraction layer to make a universal manifest that can work in different environments (i.e. with storage backed by a cloud provider, or manually provisioned depending on particular needs), you will want to get familiar with PVs and PVCs (https://kubernetes.io/docs/concepts/storage/persistent-volumes/). But as I said, they are essentially an abstraction over the basic volumes, so you need to crack that first issue anyway.
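As a rough sketch of that layering, assuming an NFS server at 192.168.1.10 exporting /exports/config (addresses, names, and sizes are made up): a PersistentVolume describes the backing storage, a PersistentVolumeClaim claims it, and the Pod mounts the claim at /config.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: config-pv
    spec:
      capacity:
        storage: 1Gi
      accessModes: ["ReadWriteMany"]
      nfs:                           # assumed NFS backend; swap for your storage type
        server: 192.168.1.10
        path: /exports/config
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: config-pvc
    spec:
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 1Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: legacy-app               # illustrative name
    spec:
      containers:
        - name: app
          image: legacy-app:latest   # placeholder for the legacy image
          volumeMounts:
            - name: config
              mountPath: /config     # the directory the application already writes to
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: config-pvc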

Are Docker Volumes machine-specific

I'm new to Docker Swarm. As I understand it, Docker Swarm allows you to abstract away the clustering, meaning you don't care on which hardware a container is deployed.
On the other hand, the standard way to handle a database in Docker is to write data outside the Docker container (to avoid copy-on-write behaviour). That's achieved by mounting a Volume and writing db-related data to it. The important question here: are Volumes machine-specific? Are Docker & Docker Swarm clever enough to mount a Volume on the machine where it's needed?
Example:
I have 3 machines and 3 microservices/containers. All of them are deployed through Docker Swarm. Only one microservice/container must connect to a database. So I need to mount Volume only on one machine. But on which?
Databases and similar stateful applications are still a hard thing to deal with when it comes to Docker swarm and other orchestration frameworks. Ideally, containers should be able to run on any node in the swarm, but the problem comes when you need to persist data beyond the container's lifecycle.
Mounting a volume is the Docker way to persist data; however, this ties the container to a specific node, as volumes are created on specific nodes. There are many projects that try to solve this problem and provide some sort of distributed storage.
There was a project called Flocker that dealt with the above problem (it's no longer maintained). There is also a newer project called REXRAY.
Are Docker & Docker Swarm clever enough to mount a Volume on the machine it's needed?
By default, no. Docker swarm will choose one of the nodes and deploy the container on it. However, you can work around this problem:
First, you need to define a named volume in your Stackfile/Composefile under the service definition.
Second, you need to use node Placement Constraints to restrict where the database container should run.
If you do not use a distributed storage tool, then when it comes to databases and similar stateful containers that need volumes, you need to restrict the container to a specific node.
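A minimal sketch of those two steps in a Swarm stack file (image, volume name, and hostname are placeholders):

    version: "3.8"
    services:
      db:
        image: postgres:16                       # placeholder database image
        volumes:
          - db-data:/var/lib/postgresql/data     # named volume, created on the node that runs the task
        deploy:
          replicas: 1
          placement:
            constraints:
              - node.hostname == node-1          # pin the service to the node that holds the volume
    volumes:
      db-data:

Assuming the file is saved as stack.yml, you would deploy it with docker stack deploy -c stack.yml <stack-name>; the constraint keeps the task on the node where the named volume lives.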

How do I do docker clustering or hot copy a docker container?

Is it possible to hotcopy a docker container? or some sort of clustering with docker for HA purposes?
Can someone simplify this?
How to scale Docker containers in production
Docker containers are not designed to be VMs and are not really meant for hot copies. Instead you should define your container such that it has a well-known start state. If the container goes down, the replacement should start from that well-known start state. If you need to keep track of state that the container generates at run time, this has to be done externally to Docker.
One option is to use volumes to mount the state (files) onto the host filesystem. Then use RAID, NFS, or any other means to share that file system with other physical nodes. You can then mount the same files into a second Docker container on a second host with the same state.
Depending on what you are running in your containers, you can also handle state sharing inside your containers, for example using MongoDB replica sets. To reiterate, though, containers are not as of yet designed to be migrated with their runtime state.
There is a variety of technologies around Docker that could help, depending on what you need HA-wise.
If you simply wish to start a stateless service container on a different host, you need a network overlay, such as weave.
If you wish to replicate data across for something like database failover, you need a storage solution, such as Flocker.
If you want to run multiple services with load balancing, and not care which host each container runs on as long as X instances are up, then Kubernetes is the kind of tool you need.
It is possible to make many Docker-related tools work together; we have a few stories on our blog already.

Resources