I've read the Kubernetes and minikube docs, and it's not explicit whether the minikube implementation supports automatic log rotation (periodically deleting the pod logs) to prevent the node's storage from being filled up by the logs.
I'm not talking about the various centralized logging stacks used to collect, persist and analyze logs, but the standard pod log management of minikube.
The official Kubernetes documentation states:
An important consideration in node-level logging is implementing log rotation, so that logs don’t consume all available storage on the node. Kubernetes currently is not responsible for rotating logs, but rather a deployment tool should set up a solution to address that. For example, in Kubernetes clusters, deployed by the kube-up.sh script, there is a logrotate tool configured to run each hour. You can also set up a container runtime to rotate application’s logs automatically, for example by using Docker’s log-opt. In the kube-up.sh script, the latter approach is used for COS image on GCP, and the former approach is used in any other environment. In both cases, by default rotation is configured to take place when log file exceeds 10MB.
Of course, if we're not on GCP and we don't use kube-up.sh to start the cluster (or we don't use Docker as the container runtime), but instead spin up our cluster with Minikube, what happens?
As per the implementation:
Minikube now uses systemd, which has built-in log rotation.
Refer to this issue.
I'm new to Docker and want to accomplish something, but I'm unsure how to orchestrate my Docker containers to do it.
What I want to do:
I have an API that, put simply, does a calculation on a requested file. It loads the file (around 80 MB) from disk into memory, then keeps it in memory for 2 hours (caching).
I want an architecture where, for example, a new container fires up when the existing one gets overwhelmed with requests, and the extra container shuts down once the original container has freed its memory and the requests slow down.
Is Memory and CPU Container Orchestration possible?
Thank You,
/Jeremy
Docker itself is not designed for orchestrating multiple containers. You need to use a container orchestration environment. The most popular are Kubernetes, Docker Swarm, and Apache Mesos. If you want to run in the cloud, there are also vendor-specific options such as AWS ECS.
Here's a good list of container clustering toolkits.
In all these environments it's possible to configure what you described. If you're completely new to the topic, I recommend installing Docker Desktop, which comes with built-in Kubernetes, and playing with that locally.
A container orchestration system is definitely what you want in order to manage your Docker containers efficiently.
You can find a current, complete list of solutions for production environments in this spreadsheet.
Tools like Kubernetes will give you a rich set of benefits, e.g.:
Provisioning and deployment of containers
Redundancy and availability of containers
Scaling up or removing containers to spread application load evenly across host infrastructure
Allocation of resources between containers
Load balancing of service discovery between containers
Health monitoring of containers and hosts
In Kubernetes there is a Horizontal Pod Autoscaler, which
automatically scales the number of pods in a replication controller,
deployment, replica set or stateful set based on observed CPU
utilization (or, with custom metrics support, on some other
application-provided metrics). Note that Horizontal Pod Autoscaling
does not apply to objects that can’t be scaled, for example,
DaemonSets.
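To make the scaling behaviour concrete, here is a minimal sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric). The function name and the min/max parameters are illustrative, and the real controller additionally applies a tolerance and stabilization rules that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Simplified HPA rule: ceil(current * currentMetric / targetMetric).

    min_replicas / max_replicas are hypothetical bounds used only to
    clamp the result, similar to minReplicas / maxReplicas in an HPA spec.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, raw))
```

For example, two pods averaging 90% CPU against a 50% target would be scaled to four pods, and back down again once the average utilization drops.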
To begin with, I would recommend starting with minikube.
More advanced options are to set up a cluster manually using kubeadm or to look into the cloud providers.
Please be aware that you will not have the option to modify a cloud-based control plane. More info in my related answer.
I am creating a Docker container (using docker run) in a Kubernetes environment by invoking a REST API.
I have mounted the docker.sock of the host machine, and I am building an image and running that image from the REST API.
Now I need to connect to this container from another container, which is started by kubectl from a deployment.yml file.
But when I run kubectl describe pod (Pod name), the container created through the REST API is not there. So where is this container running, and how can I connect to it from another container?
Are you running the container in the same namespace as the one used by deployment.yml? One way to check is to run:
kubectl get pods --all-namespaces
If you are not able to find the Docker container there, then I would suggest the following steps:
Run docker ps -a to verify the status of the container
Ensure that there are no permission errors when mounting docker.sock
If there are permission errors, escalate privileges to the appropriate level
To answer the second question, a connection between two containers should be possible by referencing the cluster DNS name in the following format:
"<servicename>.<namespacename>.svc.cluster.local"
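As a small illustration of that naming scheme, the sketch below assembles the name from its parts. The helper name, the example service and namespace, and the simplified label check are assumptions made for illustration; `cluster.local` is only the common default cluster domain and may differ in a given cluster.

```python
import re

# Simplified DNS-label check (lowercase alphanumerics and '-', max 63 chars).
# This is an approximation of the naming rules, not the full validation.
_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Build <servicename>.<namespacename>.svc.<cluster domain>."""
    for label in (service, namespace):
        if len(label) > 63 or not _LABEL.match(label):
            raise ValueError(f"invalid DNS label: {label!r}")
    return f"{service}.{namespace}.svc.{cluster_domain}"
```

With a hypothetical Service `my-api` in the `default` namespace this gives `my-api.default.svc.cluster.local`, which another pod could use as the host name.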
I would also ask you to share the detailed steps, code, and errors (if there are any) so that I can answer the question better.
You probably shouldn't be directly accessing the Docker API from anywhere in Kubernetes. Kubernetes will be totally unaware of anything you manually docker run (or equivalent) and as you note normal administrative calls like kubectl get pods won't see it; the CPU and memory used by the pod won't be known about by the node interface and this could cause a node to become over utilized. The Kubernetes network environment is also pretty complicated, and unless you know the details of your specific CNI provider it'll be hard to make your container accessible at all, much less from a pod running on a different node.
A process running in a pod can access the Kubernetes API directly, though. That page notes that all of the official client libraries are aware of the conventions this uses. This means that you should be able to directly create a Job that launches your target pod, and a Service that connects to it, and get the normal Kubernetes features around this. (For example, servicename.namespacename.svc.cluster.local is a valid DNS name that reaches any Pod connected to the Service.)
You should also consider whether you actually need this sort of interface. For many applications, it will work just as well to deploy some sort of message-queue system (e.g., RabbitMQ) and then launch a pool of workers that connects to it. You can control the size of the worker pool using a Deployment. This is easier to develop since it avoids a hard dependency on Kubernetes, and easier to manage since it prevents a flood of dynamic jobs from overwhelming your cluster.
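The worker-pool pattern can be sketched in a few lines. This is only an in-process stand-in: Python's `queue.Queue` plays the role of the message broker (RabbitMQ in the text above) and threads play the role of worker pods; `run_worker_pool`, `handler` and `num_workers` are hypothetical names, with `num_workers` loosely corresponding to the replica count of a Deployment.

```python
import queue
import threading


def run_worker_pool(jobs, handler, num_workers=3):
    """Process jobs with a fixed-size pool of workers reading from one queue."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = tasks.get()
            if job is None:  # sentinel: no more work for this worker
                return
            out = handler(job)
            with lock:  # protect the shared result list
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:  # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return results
```

The point of the design is that the number of workers is fixed up front, so a burst of requests lengthens the queue instead of creating an unbounded number of jobs.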
The Kubernetes documentation states it's possible to use Elasticsearch and Kibana for cluster level logging.
Is it possible to do this on the instance of Kubernetes that's shipped with Docker for Windows, as per the documentation? I'm not interested in third-party Kubernetes manifests or Helm charts that mimic this behavior.
Kubernetes is an open-source system for automating deployment, scaling,
and management of containerized applications.
It is a complex environment with a huge amount of information about the state of the cluster and the events
processed during the pod lifecycle and the health checking of all nodes and the whole Kubernetes
cluster.
I have no hands-on experience with Docker for Windows, so my point of view is based on Kubernetes with Linux
containers.
To collect and analyze all of this information there are some tools like Fluentd, Logstash
and they are accompanied by tools such as Elasticsearch and Kibana.
Such cluster-level log aggregation can be implemented using the Kubernetes orchestration framework itself.
So we can expect that some running containers take care of gathering data and other containers
take care of other aspects of abstractions like analyzing and presentation layer.
Please note that some solutions depend on features of the cloud platform where the Kubernetes environment
is running. For example, GCP offers Stackdriver Logging.
We can distinguish several layers of log collection and analysis:
Monitoring a pod is the most rudimentary form of viewing Kubernetes logs.
You use kubectl commands to fetch log data for each pod individually.
These logs are stored in the pod, and when the pod dies, the logs die with it.
Monitoring a node. The logs collected for each node are stored in a JSON file. This file can get really large.
Node-level logs are more persistent than pod-level ones.
Monitoring a cluster.
Kubernetes doesn’t provide a default logging mechanism for the entire cluster, but leaves this up
to the user and third-party tools to figure out. One approach is to build on the node-level logging.
This way, you can assign an agent to log every node and combine their output.
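To give an idea of what a node-level agent reads, here is a minimal sketch that parses one line of a node's JSON log file. It assumes the format written by Docker's json-file logging driver (one JSON object per line with `log`, `stream` and `time` keys); other container runtimes write a different format, and the sample line is made up for illustration.

```python
import json


def parse_log_line(line):
    """Return (time, stream, message) from one json-file log line."""
    record = json.loads(line)
    # The driver keeps the trailing newline of each message; strip it.
    return record["time"], record["stream"], record["log"].rstrip("\n")


# Made-up sample line in the json-file format.
sample = ('{"log":"server started\\n","stream":"stdout",'
          '"time":"2019-01-01T12:00:00.000000000Z"}')
```

An aggregator such as Fluentd does essentially this for every line, then attaches pod metadata and forwards the record to the storage backend.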
As you can see, there is a gap in cluster-level monitoring, so there is good reason to aggregate the existing logs and
offer a practical way to analyze and present the results.
For node-level logging, a popular log aggregator is Fluentd. It is implemented as a Docker container,
and it runs alongside the pods on each node. Fluentd does not store the logs itself.
Instead, it sends the logs to an Elasticsearch cluster that stores the log information in a replicated set of nodes.
It looks like Elasticsearch is used as a data store of aggregated logs of working nodes.
This aggregator cluster consists of a pod with two instances of Elasticsearch.
The aggregated logs in the Elasticsearch cluster can be viewed using Kibana.
This presents a web interface, which provides a more convenient interactive method for querying the ingested logs.
The Kibana pods are also monitored by the Kubernetes system to ensure they are running healthily and the expected
number of replicas are present.
The lifecycle of these pods is controlled by a replication-controller specification similar in nature to how the
Elasticsearch cluster was configured.
Back to your question. I'm pretty sure that the setup described above also works with Kubernetes on Docker
for Windows. On the other hand, I think a cloud platform or an on-premises Linux environment
is the more natural place for these tools to live.
This answer was inspired by the Cluster-level Logging of Containers with Containers and Kubernetes Logging articles.
I also like the Configuring centralized logging from Kubernetes page, and I used An Introduction
to logging in Kubernetes when I was starting out with Kubernetes.
I am running a managed kubernetes cluster on Google Cloud Platform with a single node for development.
However when I update Pod images too frequently, the ImagePull step fails due to insufficient disk space in the boot disk.
I noticed that images should be garbage-collected automatically according to the documentation, but I have no idea what the setting is on GKE or how to change it.
https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/#image-collection
Can I manually trigger a cleanup of unused images, using kubectl or a Google Cloud console command?
How do I check or change the GC setting above so that I won't encounter this issue in the future?
Since Garbage Collector is an automated service, there are no kubectl commands or any other commands within GCP to manually trigger Garbage Collector.
With regard to your second inquiry, Garbage Collector is handled by the Master node. The Master node is not accessible to users since it is a managed service. As such, users cannot configure Garbage Collection within GKE.
The only workaround I can offer is to create a custom cluster from scratch within Google Compute Engine. This would provide you access to the Master node of your cluster so you can have the flexibility of configuring the cluster to your liking.
Edit: If you need to delete old images, I would suggest removing them using docker commands. I have attached a GitHub article here that provides several different commands you can run at the node level to remove old images.
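For context, the image collection policy in the documentation linked in the question is driven by two disk-usage thresholds: crossing the high threshold triggers collection, and least recently used images are deleted until usage is back at the low threshold. The sketch below only illustrates that arithmetic. The function name is hypothetical, and the defaults of 85% and 80% are the upstream kubelet defaults as I understand them, which a managed node may or may not use.

```python
def bytes_to_free(capacity, used, high_percent=85, low_percent=80):
    """How much image data must be removed under the two-threshold policy.

    Returns 0 while usage is below the high threshold; otherwise the
    amount needed to bring usage down to the low threshold.
    """
    usage_percent = used * 100 // capacity
    if usage_percent < high_percent:
        return 0  # collection is not triggered yet
    target_used = capacity * low_percent // 100
    return max(0, used - target_used)
```

On a 100 GB boot disk this means nothing is cleaned up at 84 GB used, while at 90 GB used roughly 10 GB of unused images would be removed.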
How can one define log retention for kubernetes pods?
For now it seems that the log file size is not limited and can use up all of the host machine's resources.
According to Logging Architecture from kubernetes.io, there are some options:
First option
Kubernetes currently is not responsible for rotating logs, but rather
a deployment tool should set up a solution to address that. For
example, in Kubernetes clusters, deployed by the kube-up.sh script,
there is a logrotate tool configured to run each hour. You can also
set up a container runtime to rotate application’s logs automatically,
e.g. by using Docker’s log-opt. In the kube-up.sh script, the latter
approach is used for COS image on GCP, and the former approach is used
in any other environment. In both cases, by default rotation is
configured to take place when log file exceeds 10MB.
Also
Second option
Sidecar containers can also be used to rotate log files that cannot be rotated by the application itself. An example of this approach is a small container running logrotate periodically. However, it’s recommended to use stdout and stderr directly and leave rotation and retention policies to the kubelet.
You can always set the logging retention policy on your Docker nodes.
See: https://docs.docker.com/config/containers/logging/json-file/#examples
I've just got this working by changing the ExecStart line in /etc/default/docker and adding the option --log-opt max-size=10m
Please note that this will affect all containers running on a node, which makes it ideal for a Kubernetes setup (because my real-time logs are uploaded to an external ELK stack)
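An alternative to editing the command line is to put the same options into the Docker daemon configuration file, as in the examples on the Docker page linked above. The sketch below just generates that JSON. It assumes the json-file driver and the usual /etc/docker/daemon.json location; the helper name and the value of 3 rotated files are illustrative choices, not something from the original setup.

```python
import json


def docker_log_config(max_size="10m", max_file=3):
    """Build daemon.json content that caps the json-file logs per container."""
    return {
        "log-driver": "json-file",
        # daemon.json expects log-opts values as strings.
        "log-opts": {"max-size": max_size, "max-file": str(max_file)},
    }


def render_daemon_json(config):
    """Serialize the config as it would be written to daemon.json."""
    return json.dumps(config, indent=2)
```

Writing the output of `render_daemon_json(docker_log_config())` to the daemon configuration file and restarting Docker should apply the limits to newly created containers; existing containers keep the settings they were created with.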