Manually deleting unused Images on kubernetes (GKE) - docker

I am running a managed kubernetes cluster on Google Cloud Platform with a single node for development.
However when I update Pod images too frequently, the ImagePull step fails due to insufficient disk space in the boot disk.
I noticed that images should be auto GC-ed according to documentation, but I have no idea what is the setting on GKE or how to change it.
https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/#image-collection
Can I manually trigger an unused image clean up, using kubectl or Google Cloud console command?
How do I check/change the above GC setting such that I won't encounter this issue in the future?

Since Garbage Collector is an automated service, there are no kubectl commands or any other commands within GCP to manually trigger Garbage Collector.
In regards to your second inquiry, Garbage Collector is handled by the Master node. The Master node is not accessible to users since it is a managed service. As such, users cannot configure Garbage Collection within GKE.
The only workaround I can offer is to create a custom cluster from scratch within Google Compute Engine. This would provide you access to the Master node of your cluster so you can have the flexibility of configuring the cluster to your liking.
Edit: If you need to delete old images, I would suggest removing the old images using docker commands. I have attached a github article that provides several different commands that you can run on the node level to remove old images here.

Related

Airflow with mysql_to_gcp negsignal.sigkill

I'm using airflow with composer (GCP) to extract data from cloud sql for gcs and after gcs for bigquery, I have some tables between 100 Mb and 10 Gb. My dag has two tasks to do what I mentioned before. With the smaller tables the dag runs smoothly, but with slightly larger tables the cloud sql extraction task ends in a few seconds with failure, but does not bring any logs except "negsignal.sigkill". I have already tried to increase the composer capacity, among other things, but nothing has worked yet.
I'm using the mysql_to_gcs and gcs_to_bigquery operators
The first thing you should check when you get negsignal.SIGKILL is your Kubernetes resources. This is surely a problem with resource limits.
I think you should monitor your Kubernetes Cluster Nodes. Inside GCP, go to Kubernetes Engine > Clusters. You should have a cluster containing the environment that Cloud Composer uses.
Now, head to the nodes of your cluster. Each node provides you metrics about CPU, memory & disk usage. You will also see the limit for the resources that each node uses. Also, you will see the pods that each node has.
If you are not very familiar with K8s, let me explain this briefly. Airflow uses Pods inside nodes to run your Airflow tasks. These pods are called airflow-worker-[id]. That way you can identify your worker pods inside the Node.
Check your pods list. If you have evicted airflow-worker pods, then Kubernetes is stopping your workers for some reason. Since Composer uses CeleryExecutor, an evicted airflow-worker points to a problem. This is not the case if you use KubernetesExecutor, but that is not available yet in Composer.
If you click on some evicted pod, you will see the reason for eviction. That should give you the answer.
If you don't see a problem with your pod eviction, don't panic, you still have some options. From that point on, your best friend will be logs. Be sure to check your pods logs, node logs and cluster logs, in that order.
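The eviction check described in the answer above can also be run over the pod list as JSON; this is an added example, not part of the original answer. The function name `triage_workers` and the sample data are assumptions made for illustration; the fields read (`status.phase`, `status.reason`, `containerStatuses[].lastState.terminated`) are standard Kubernetes pod status fields.

```python
# Constructed sample in the shape of `kubectl get pods -n <namespace> -o json`
# (not real cluster output; pod names follow the airflow-worker-[id] pattern above).
SAMPLE_POD_LIST = {"items": [
    {"metadata": {"name": "airflow-worker-7d9f-abcde"},
     "status": {"phase": "Failed", "reason": "Evicted",
                "message": "The node was low on resource: memory."}},
    {"metadata": {"name": "airflow-worker-7d9f-fghij"},
     "status": {"phase": "Running", "containerStatuses": [
         {"name": "airflow-worker", "restartCount": 3,
          "state": {"running": {}},
          "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}]}},
    {"metadata": {"name": "airflow-scheduler-5c6b-klmno"},
     "status": {"phase": "Running", "containerStatuses": [
         {"name": "airflow-scheduler", "restartCount": 0,
          "state": {"running": {}}, "lastState": {}}]}},
]}


def triage_workers(pod_list, prefix="airflow-worker"):
    """Return (pod, finding, detail) tuples for evicted or OOM-killed worker pods."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(prefix):
            continue
        status = pod.get("status", {})
        # An evicted pod stays listed with phase Failed and reason Evicted;
        # status.message names the resource that ran short.
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            findings.append((name, "Evicted", status.get("message", "")))
        for cs in status.get("containerStatuses", []):
            # A restarted container keeps its previous exit in lastState.
            for key in ("state", "lastState"):
                term = (cs.get(key) or {}).get("terminated")
                # Exit code 137 = 128 + 9, i.e. the process received SIGKILL.
                if term and (term.get("reason") == "OOMKilled" or term.get("exitCode") == 137):
                    detail = f"container={cs['name']} restarts={cs.get('restartCount', 0)}"
                    findings.append((name, term.get("reason", "Killed"), detail))
    return findings


if __name__ == "__main__":
    for pod, finding, detail in triage_workers(SAMPLE_POD_LIST):
        print(f"{pod}: {finding} ({detail})")
```
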

Minikube log rotation

I've read kubernetes and minikube docs and it's not explicit if minikube implementation supports automatic log rotation (deleting the pod logs periodically) in order to prevent the memory to be overloaded by the logs.
I'm not talking about the various centralized logging stacks used to collect, persist and analyze logs, but the standard pod log management of minikube.
In kubernetes official documentation is specified:
An important consideration in node-level logging is implementing log rotation, so that logs don’t consume all available storage on the node. Kubernetes currently is not responsible for rotating logs, but rather a deployment tool should set up a solution to address that. For example, in Kubernetes clusters, deployed by the kube-up.sh script, there is a logrotate tool configured to run each hour. You can also set up a container runtime to rotate application’s logs automatically, for example by using Docker’s log-opt. In the kube-up.sh script, the latter approach is used for COS image on GCP, and the former approach is used in any other environment. In both cases, by default rotation is configured to take place when log file exceeds 10MB.
Of course if we're not in GCP and we don't use kube-up.sh to start the cluster (or we don't use Docker as container tool) but we spin up our Cluster with Minikube what happens?
As per the implementation
Minikube now uses systemd which has built in log rotation
Refer this issue

Upgrade Jenkins within a Kubernetes container without losing my data?

I have a container deployed in a pod by Kubernetes running Jenkins. The container is mounted with a persistent storage volume (AWS Elastic File Store) that's currently storing all of the Jenkins instance's user, configuration, job configurations, etc.
I need to update Jenkins. Normally when I do this, the process wipes out the storage, since the whole container gets re-launched. However, I need to figure out how to do this without losing the data.
How do I update Jenkins without losing the info on the storage volume attached to the container?
I ended up finding the answer I needed by first reading this article, which helped me understand the underlying concepts:
http://www.monkeylittle.com/blog/2017/02/08/adding-persistent-volumes-to-jenkins-with-kubernetes-volumes.html
Then I found this article, which shows you how to do it with EFS, specifically:
https://itnext.io/efs-persistent-volumes-on-aws-kubernetes-193e0035bbfb

Not able to connect to a container(Created via Rest API) in Kubernetes

I am creating a docker container (using docker run) in a kubernetes Environment by invoking a rest API.
I have mounted the docker.sock of the host machine and I am building an image and running that image from REST API.
Now I need to connect to this container from some other container which is actually started by Kubectl from deployment.yml file.
But when I used kubectl describe pod (Pod name), my container created using Rest API is not there. So where is this container running and how can I connect to it from some other container?
Are you running the container in the same namespace as namespace with deployment.yml? One of the options to check that would be to run -
kubectl get pods --all-namespaces
If you are not able to find the docker container there then I would suggest performing below steps -
docker ps -a {verify running docker status}
Ensuring that while mounting docker.sock there are no permission errors
If there are permission errors, escalate privileges to the appropriate level
To answer the second question, connection between two containers should be possible by referencing cluster DNS in below format -
"<servicename>.<namespacename>.svc.cluster.local"
I would also request you to detail steps, codes and errors (if there are any) for me to better answer the question.
You probably shouldn't be directly accessing the Docker API from anywhere in Kubernetes. Kubernetes will be totally unaware of anything you manually docker run (or equivalent) and as you note normal administrative calls like kubectl get pods won't see it; the CPU and memory used by the pod won't be known about by the node interface and this could cause a node to become over utilized. The Kubernetes network environment is also pretty complicated, and unless you know the details of your specific CNI provider it'll be hard to make your container accessible at all, much less from a pod running on a different node.
A process running in a pod can access the Kubernetes API directly, though. That page notes that all of the official client libraries are aware of the conventions this uses. This means that you should be able to directly create a Job that launches your target pod, and a Service that connects to it, and get the normal Kubernetes features around this. (For example, servicename.namespacename.svc.cluster.local is a valid DNS name that reaches any Pod connected to the Service.)
You should also consider whether you actually need this sort of interface. For many applications, it will work just as well to deploy some sort of message-queue system (e.g., RabbitMQ) and then launch a pool of workers that connects to it. You can control the size of the worker queue using a Deployment. This is easier to develop since it avoids a hard dependency on Kubernetes, and easier to manage since it prevents a flood of dynamic jobs from overwhelming your cluster.
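Containers started through a mounted docker.sock are invisible to Kubernetes, as the answer above notes, so it helps to know which pods have the host's Docker socket at all. The following added example (not from the original answers) lists them from pod JSON; the function names and the sample pod list are assumptions made for illustration.

```python
# Constructed sample in the shape of `kubectl get pods --all-namespaces -o json`
# (not real cluster output).
SAMPLE_PODS = {"items": [
    {"metadata": {"namespace": "default", "name": "build-api-6f7c-xyz12"},
     "spec": {
         "volumes": [{"name": "dockersock", "hostPath": {"path": "/var/run/docker.sock"}},
                     {"name": "cache", "emptyDir": {}}],
         "containers": [
             {"name": "api", "securityContext": {"privileged": True},
              "volumeMounts": [{"name": "dockersock", "mountPath": "/var/run/docker.sock"}]},
             {"name": "sidecar",
              "volumeMounts": [{"name": "cache", "mountPath": "/cache"}]}]}},
    {"metadata": {"namespace": "default", "name": "web-5d8b-abc34"},
     "spec": {"volumes": [{"name": "data", "persistentVolumeClaim": {"claimName": "web-data"}}],
              "containers": [{"name": "web",
                              "volumeMounts": [{"name": "data", "mountPath": "/data"}]}]}},
]}

DOCKER_SOCKETS = ("/var/run/docker.sock", "/run/docker.sock")


def exposes_socket(host_path):
    """True if a hostPath volume at host_path contains the Docker socket."""
    p = host_path.rstrip("/")
    # Match the socket itself or any parent directory (/var/run, /run, /).
    return any(s == p or s.startswith(p + "/") for s in DOCKER_SOCKETS)


def find_socket_mounts(pod_list):
    """Return (namespace, pod, container, hostPath, privileged) for each exposure."""
    findings = []
    for pod in pod_list.get("items", []):
        meta, spec = pod["metadata"], pod.get("spec", {})
        risky = {v["name"]: v["hostPath"]["path"] for v in spec.get("volumes", [])
                 if "hostPath" in v and exposes_socket(v["hostPath"]["path"])}
        for c in spec.get("initContainers", []) + spec.get("containers", []):
            privileged = bool((c.get("securityContext") or {}).get("privileged"))
            for m in c.get("volumeMounts", []):
                if m["name"] in risky:
                    findings.append((meta.get("namespace", "default"), meta["name"],
                                     c["name"], risky[m["name"]], privileged))
    return findings


if __name__ == "__main__":
    for finding in find_socket_mounts(SAMPLE_PODS):
        print(finding)
```
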

How do I scale up my cluster on Google Container Engine / Kubernetes?

I have two instances of an app container (happens to be a Node.JS app, but that shouldn't matter) running in a Kubernetes cluster on Google Container Engine. I'd like to scale it up to three instances.
My cluster has a master and two minion nodes, with a replication controller and a load balancer service. The replication controller keeps my app container running happily on the two nodes.
I can see that there is a handy gcloud alpha container kubectl resize command which lets me change the number of replicas, but I don't see how or if I can increase the size of the cluster itself, so that it can spin up another minion node. I only see gcloud commands to create, delete, list and describe clusters; nothing to resize them.
If I can't resize my cluster, then to scale up I'd need to create a whole new cluster and kill the old one. Am I missing something?
Also, are there plans to support auto-scaling?
Update (June 2015): Kubernetes on GCE now uses managed instance groups which you can manually resize to add new nodes to your cluster.
There isn't currently a way to add nodes to your existing Google Container Engine cluster. We are currently adding support to Kubernetes to allow clusters to have nodes dynamically added but the work isn't quite finished yet. Once the feature is available in Kubernetes you can expect that it will show up in Google Container Engine shortly after the next Kubernetes release.
In the meantime, it should be possible to run more than two replicas of your node.js application on the existing two VMs.
This presentation: http://fr.slideshare.net/craigbox/autoscaling-kubernetes is about kubernetes horizontal scaling in Google Cloud. I put this link here to save some googling to next one that ends up on this thread.
