Kubernetes - Too many open files

I'm trying to evaluate the performance of one of my Go servers running inside a pod. However, I am receiving an error saying "too many open files". Is there any way to set the ulimit in Kubernetes?
ubuntu@ip-10-0-1-217:~/ppu$ kubectl exec -it go-ppu-7b4b679bf5-44rf7 -- /bin/sh -c 'ulimit -a'
core file size (blocks) (-c) unlimited
data seg size (kb) (-d) unlimited
scheduling priority (-e) 0
file size (blocks) (-f) unlimited
pending signals (-i) 15473
max locked memory (kb) (-l) 64
max memory size (kb) (-m) unlimited
open files (-n) 1048576
POSIX message queues (bytes) (-q) 819200
real-time priority (-r) 0
stack size (kb) (-s) 8192
cpu time (seconds) (-t) unlimited
max user processes (-u) unlimited
virtual memory (kb) (-v) unlimited
file locks (-x) unlimited
Deployment file.
---
apiVersion: apps/v1
kind: Deployment # Type of Kubernetes resource
metadata:
  name: go-ppu # Name of the Kubernetes resource
spec:
  replicas: 1 # Number of pods to run at any given time
  selector:
    matchLabels:
      app: go-ppu # This deployment applies to any Pods matching the specified label
  template: # This deployment will create a set of pods using the configurations in this template
    metadata:
      labels: # The labels that will be applied to all of the pods in this deployment
        app: go-ppu
    spec: # Spec for the container which will run in the Pod
      containers:
      - name: go-ppu
        image: ppu_test:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8081 # Should match the port number that the Go application listens on
        livenessProbe: # To check the health of the Pod
          httpGet:
            path: /health
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 35
          periodSeconds: 30
          timeoutSeconds: 20
        readinessProbe: # To check if the Pod is ready to serve traffic or not
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 35
          timeoutSeconds: 20
Pods info:
ubuntu@ip-10-0-1-217:~/ppu$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
go-ppu-7b4b679bf5-44rf7   1/1     Running   0          18h
ubuntu@ip-10-0-1-217:~/ppu$ kubectl get services
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP                                                   PORT(S)          AGE
kubernetes    ClusterIP      100.64.0.1      <none>                                                        443/TCP          19h
ppu-service   LoadBalancer   100.64.171.12   74d35bb2a5f30ca13877-1351038893.us-east-1.elb.amazonaws.com   8081:32623/TCP   18h
When I used Locust to test the performance of the server, I received the following error:
# fails Method Name Type
3472 POST /supplyInkHistory ConnectionError(MaxRetryError("HTTPConnectionPool(host='74d35bb2a5f30ca13877-1351038893.us-east-1.elb.amazonaws.com', port=8081): Max retries exceeded with url: /supplyInkHistory (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x....>: Failed to establish a new connection: [Errno 24] Too many open files',))",),)

You may want to have a look at https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
Note that you need to enable a few features to make it work.
securityContext:
  sysctls:
  - name: fs.file-max
    value: "YOUR VALUE HERE"

There were a few cases regarding setting the --ulimit argument; you can find them here or check this article. This resource limit can be set by Docker during container startup. As you added the google-kubernetes-engine tag, this answer relates to the GKE environment; however, it could work similarly on other clouds.
If you would like to set the ulimit for open files, you can modify the configuration file /etc/security/limits.conf. However, please note that it will not persist across reboots.
A second option would be to edit /etc/init/docker.conf and restart the docker service. By default it has a few limits such as nofile or nproc; you can add yours there.
Another option could be to use an instance template. The instance template would include a start-up script that sets the required limit.
After that, you would need to use this new instance template for the instance group in GKE. More information here and here.
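One detail in the question is worth checking before changing anything on the cluster: the "[Errno 24] Too many open files" in the Locust report is raised by urllib3 while it opens a new connection, which suggests the limit is being hit in the load generator process rather than in the pod (the pod already reports open files 1048576). Below is a minimal Python sketch, assuming the Locust machine runs Linux, that inspects the open-files limit of the current process and raises the soft limit toward the hard limit. The function name and the 65536 default are illustrative choices, not values from the question.

```python
import resource


def raise_nofile_soft_limit(target=65536):
    """Raise this process's soft RLIMIT_NOFILE to `target`, capped by the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or new_soft <= soft:
        return soft, hard

    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

A call such as raise_nofile_soft_limit() could go at the top of the locustfile; running ulimit -n with a higher value in the shell that starts Locust should have the same effect.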

Related

Trying to understand what values to use for resources and limits of multiple container deployment

I am trying to set up a HorizontalPodAutoscaler for my app, alongside the automatic Cluster Autoscaling of DigitalOcean.
I will add my deployment YAML below; I have also deployed metrics-server as per the guide in the link above. At the moment I am struggling to figure out what values to use for my cpu and memory requests and limits fields, mainly because of the variable replica count: do I need to account for the maximum number of replicas each using their resources, or for the deployment in general? Do I plan on a per-pod basis or for each container individually?
For some context I am running this on a cluster that can have up to two nodes, each node has 1 vCPU and 2GB of memory (so total can be 2 vCPUs and 4 GB of memory).
As it is now my cluster is running one node and my kubectl top statistics for pods and nodes look as follows:
kubectl top pods
NAME CPU(cores) MEMORY(bytes)
graphql-85cc89c874-cml6j 5m 203Mi
graphql-85cc89c874-swmzc 5m 176Mi
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
skimitar-dev-pool-3cpbj 62m 6% 1151Mi 73%
I have tried various combinations of cpu and resources, but when I deploy my file my deployment is either stuck in a Pending state, or keeps restarting multiple times until it gets terminated. My horizontal pod autoscaler also reports targets as <unknown>/80%, but I believe it is due to me removing resources from my deployment, as it was not working.
Considering the deployment below, what should I look at / consider in order to determine the best values for the requests and limits of my resources?
The following YAML has been cleaned of things like env variables / services. It works as is, but results in the above-mentioned issues when the resources fields are uncommented.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: graphql
spec:
  replicas: 2
  selector:
    matchLabels:
      app: graphql
  template:
    metadata:
      labels:
        app: graphql
    spec:
      containers:
      - name: graphql-hasura
        image: hasura/graphql-engine:v1.2.1
        ports:
        - containerPort: 8080
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
        # resources:
        #   requests:
        #     memory: "150Mi"
        #     cpu: "100m"
        #   limits:
        #     memory: "200Mi"
        #     cpu: "150m"
      - name: graphql-actions
        image: my/nodejs-app:1
        ports:
        - containerPort: 4040
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: 4040
        readinessProbe:
          httpGet:
            path: /healthz
            port: 4040
        # resources:
        #   requests:
        #     memory: "150Mi"
        #     cpu: "100m"
        #   limits:
        #     memory: "200Mi"
        #     cpu: "150m"
# Disruption budget
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: graphql-disruption-budget
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: graphql
# Horizontal auto scaling
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: graphql-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: graphql
  minReplicas: 2
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80

How to determine what values to use for my cpu and memory requests and limits fields. Mainly due to variable replica count, i.e. do I need to account for maximum number of replicas each using their resources or for deployment in general, do I plan it per pod basis or for each container individually
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Both are set per container; a pod's request is the sum of its containers' requests, and every replica needs that amount free on a single node.
Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.
Limits, on the other hand, make sure a container never goes above a certain value. A container that reaches its CPU limit is throttled; one that exceeds its memory limit is OOM-killed and restarted.
The number of replicas is determined by the autoscaler acting on the Deployment (and, through it, on the ReplicaSet).
when I deploy my file my deployment is either stuck in a Pending state, or keeps restarting multiple times until it gets terminated.
A Pending state means that there are not enough resources available to schedule new pods.
Restarts may be triggered by other issues; I'd suggest debugging them after solving the scaling issues.
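To make the per-container versus per-pod question concrete: requests are declared per container, the scheduler adds them up per pod, and every replica needs that sum free on one node. Here is a rough Python sketch using the commented-out request values from the question (100m CPU and 150Mi memory per container, two containers per pod). The free capacity passed in at the end (900m CPU, 1500Mi memory) is an assumption for illustration, not a value read from the cluster, and the helper names are made up.

```python
def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as '100m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_mib(quantity):
    """Convert a memory quantity such as '150Mi' or '2Gi' to MiB (only Mi and Gi are handled)."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported unit in {quantity!r}")


def pod_request(containers):
    """Sum the per-container requests into the pod-level (cpu_millicores, memory_mib) request."""
    cpu = sum(parse_cpu_millicores(c["cpu"]) for c in containers)
    mem = sum(parse_memory_mib(c["memory"]) for c in containers)
    return cpu, mem


def pods_that_fit(free_cpu_millicores, free_memory_mib, containers):
    """How many replicas fit on a node with the given unrequested capacity."""
    cpu, mem = pod_request(containers)
    return min(free_cpu_millicores // cpu, free_memory_mib // mem)


# Requests taken from the commented-out blocks in the question.
containers = [
    {"cpu": "100m", "memory": "150Mi"},  # graphql-hasura
    {"cpu": "100m", "memory": "150Mi"},  # graphql-actions
]

print(pod_request(containers))                # (200, 300)
print(pods_that_fit(900, 1500, containers))   # 4, with the assumed free capacity
```

Keep in mind that the scheduler compares requests, not the live usage shown by kubectl top, and that system pods already running on the node have requests of their own.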
My horizontal pod autoscaler also reports targets as <unknown>/80%, but I believe it is due to me removing resources from my deployment, as it was not working.
You are correct: if you don't set a CPU request, the current utilization remains unknown and the autoscaler won't be able to scale up or down.
Here you can see the algorithm responsible for that.
Horizontal Pod Autoscaler adds pods based on usage as a percentage of the request. In this case, whenever average CPU usage across the pods exceeds 80% of the requested CPU, it will add pods up to the maximum specified.
For a good HPA example, check this link: Horizontal Pod Autoscaler Walkthrough
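The scaling rule documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to minReplicas and maxReplicas. The Python sketch below is a simplified version of that rule; it ignores the controller's tolerance band, stabilization windows, and its handling of not-ready pods, and the function name is made up for illustration.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Simplified HPA rule: scale proportionally to current/target, then clamp."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Values from the question: target 80%, minReplicas 2, maxReplicas 3.
print(desired_replicas(2, 120, 80, 2, 3))  # 3: usage at 120% of the request adds one pod
print(desired_replicas(2, 200, 80, 2, 3))  # 3: the raw result of 5 is capped at maxReplicas
print(desired_replicas(2, 5, 80, 2, 3))    # 2: never drops below minReplicas
```

Utilization here is usage divided by the CPU request, which is why the target stays unknown until a request is set.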
But how does Horizontal Pod Autoscaler work with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's number of replicas based on the current CPU load. If the load increases, HPA will create new replicas, for which there may or may not be enough space in the cluster.
If there are not enough resources, CA will try to bring up some nodes, so that the HPA-created pods have a place to run. If the load decreases, HPA will stop some of the replicas. As a result, some nodes may become underutilized or completely empty, and then CA will terminate such unneeded nodes.
NOTE: The key is to set the maximum replicas for HPA with the whole cluster in mind, according to the number of nodes (and budget) available for your app. You can start by setting a very high maximum number of replicas, monitor, and then change it according to the usage metrics and the predicted future load.
Take a look at How to Enable the Cluster Autoscaler for a DigitalOcean Kubernetes Cluster in order to properly enable it as well.
If you have any questions, let me know in the comments.

How to expose low-numbered ports in the kubernetes mini-cluster that comes with Docker Desktop

I'm using the kubernetes cluster built in to Docker Desktop to develop my application.
I would like to expose services inside the cluster as ports on localhost.
I can do so using kubectl expose deployment foobar --type=NodePort --port=30088, which creates a service like this:
apiVersion: v1
kind: Service
metadata:
  labels:
    role: web
  name: foobar
spec:
  externalTrafficPolicy: Cluster
  ports:
  - nodePort: 30088
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    role: web
  type: NodePort
But it only works for very high numbered ports. If I try something lower I get:
The Service "kafka-external" is invalid: spec.ports[0].nodePort: Invalid value: 9092: provided port is not in the valid range. The range of valid ports is 30000-32767
It seems there is a kubernetes apiserver setting called ServiceNodePortRange which would allow me to override this restriction, but I can't figure out how to set it on Docker's builtin cluster.
So my question is: how do I expose a specific, low-numbered port (like 9092) on Docker's kubernetes cluster? Is there a way to override that setting? Or a better way to expose the service than NodePort?

NodePort is intended to be a building block for load-balancers or other
ingress modes. This means it didn't matter which port you got as long as
you got one. This makes it a little clunky to use directly - you can't
have just any port. You can change the port range, but you run the risk of
conflicts with real things running on your nodes and with any pod HostPorts.
The default range is indeed 30000-32767, but it can be changed by setting the --service-node-port-range flag on the kube-apiserver. Update the file /etc/kubernetes/manifests/kube-apiserver.yaml and add the line --service-node-port-range=xxxxx-yyyyy.
The kube-apiserver.yaml file lives at /etc/kubernetes/manifests/kube-apiserver.yaml, not inside the kube-apiserver container/pod but on the master itself.
Log in to the Docker VM:
Edit the manifest and add the following line to the pod spec:
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --service-node-port-range=xxxxx-yyyyy # <-- add this line
    ...
Save and exit. The kube-apiserver pod will be restarted with the new parameters.
Exit the Docker VM (for screen: Ctrl-a,k; for container: Ctrl-d).
Check the results:
$ kubectl get pod kube-apiserver-docker-desktop -o yaml -n kube-system | less
Take a look: service-pod-range, changing pod range, changing-nodeport-range.
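The validation error in the question follows directly from the configured range. Here is a small Python sketch of that check, with the range written in the same low-high form that the --service-node-port-range flag takes. The helper name is hypothetical; the real check happens inside the API server.

```python
def node_port_allowed(port, port_range="30000-32767"):
    """Return True if `port` lies inside the inclusive 'low-high' NodePort range."""
    low, high = (int(part) for part in port_range.split("-"))
    return low <= port <= high


print(node_port_allowed(9092))                 # False with the default range
print(node_port_allowed(30088))                # True
print(node_port_allowed(9092, "9000-32767"))   # True once the range is widened
```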

minikube how to connect from one pod to another using hostnames?

I am running a cluster in default namespace with all the pods in Running state.
I have an issue: I am trying to telnet from one pod to another pod using the pod hostname 'abcd-7988b76669-lgp8l', but I am not able to connect, although it works if I use the pod's internal IP. Why is the DNS name not resolved?
I looked at
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-5lpfd 1/1 Running 0 12h
coredns-6955765f44-9cvnb 1/1 Running 0 12h
Does anybody have any idea how to connect from one pod to another using hostname resolution?

First of all, it is worth mentioning that typically you won't connect to individual Pods using their domain names. One good reason for that is their ephemeral nature. Note that typically you don't create plain Pods but a controller such as a Deployment, which manages your Pods and ensures that a specific number of Pods of a certain kind is constantly up and running. Pods may often be deleted and recreated, hence you should never rely on their domain names in your applications. Typically you will expose them to other apps, e.g. running in other Pods, via a Service.
Although using an individual Pod's domain name is not recommended, it is still possible. You can do it just for fun or for learning/experimenting purposes.
As @David already mentioned, you would help us much more in providing you a comprehensive answer if you EDIT your question and provide a few important details showing what you've tried already, such as your Pod and Service definitions in yaml format.
Answering literally to your question posted in the title:
minikube how to connect from one pod to another using hostnames?
You won't be able to connect to a Pod using simply its hostname. You can e.g. ping your backend Pods exposed via a ClusterIP Service by simply pinging the <service-name> (provided it is in the same namespace as the Pod you're pinging from).
Keep in mind however that it doesn't work for Pods: neither Pod names nor their hostnames are resolvable by cluster DNS.
You should be able to connect to an individual Pod using its fully qualified domain name (FQDN), provided you have configured everything properly. Just make sure you didn't overlook any of the steps described here:
Make sure you've created a simple Headless Service, which may look like this:
apiVersion: v1
kind: Service
metadata:
  name: default-subdomain
spec:
  selector:
    name: busybox
  clusterIP: None
Make sure that your Pods definitions didn't lack any important details:
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    name: busybox
spec:
  hostname: busybox-1
  subdomain: default-subdomain
  containers:
  - image: busybox:1.28
    command:
    - sleep
    - "3600"
    name: busybox
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox2
  labels:
    name: busybox
spec:
  hostname: busybox-2
  subdomain: default-subdomain
  containers:
  - image: busybox:1.28
    command:
    - sleep
    - "3600"
    name: busybox
Speaking about important details, pay special attention that you correctly defined hostname and subdomain in Pod specification and that labels used by Pods match the labels used by Service's selector.
Once everything is configured properly you will be able to attach to Pod busybox1 and ping Pod busybox2 by using its FQDN like in the example below:
$ kubectl exec -ti busybox1 -- /bin/sh
/ # ping busybox-2.default-subdomain.default.svc.cluster.local
PING busybox-2.default-subdomain.default.svc.cluster.local (10.16.0.109): 56 data bytes
64 bytes from 10.16.0.109: seq=0 ttl=64 time=0.051 ms
64 bytes from 10.16.0.109: seq=1 ttl=64 time=0.082 ms
64 bytes from 10.16.0.109: seq=2 ttl=64 time=0.081 ms
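The name used in the ping above follows the pattern hostname.subdomain.namespace.svc.cluster-domain, where the subdomain equals the name of the headless Service. Here is a short Python sketch that assembles it; cluster.local is the usual default cluster domain and is assumed here, and the function name is hypothetical.

```python
def pod_fqdn(hostname, subdomain, namespace="default", cluster_domain="cluster.local"):
    """Build the DNS name of a Pod that sets spec.hostname and spec.subdomain."""
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"


print(pod_fqdn("busybox-2", "default-subdomain"))
# busybox-2.default-subdomain.default.svc.cluster.local
```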
I hope this helps.

Any way to prevent k8s pod eviction?

I have a set of daemons I need to run. Generally, they do not consume much memory or CPU, and I have set their limits to cpu: 150m and memory: 150m.
Occasionally they will spike to quite a bit higher than this and this seems to be causing evictions and unstable node.
It is critical that the daemons remain running 24/7, even if they are throttled by CPU and/or memory when they spike. Is it possible to prevent their eviction and to cap their resources?
As I understand the CPU usage is throttled but over memory use results in an OOM eviction, is there any way to prevent this eviction?

As of 1.11, you can set pod priorities.
Create a priority class:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
Set the priority in the pod:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority

Sounds like you need to track the resources consumption trends with something like Prometheus + Grafana to check what sort of spikes you expect from your DaemonSets.
Then you can allocate more resources to these pods or remove this config (which, by default, will leave them in unbounded mode). But, of course, you don't want to risk a full node / host crash so you can consider tweaking your eviction threshold:
https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#eviction-thresholds
More details:
https://kubernetes-v1-4.github.io/docs/admin/limitrange/
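Alongside priorities and eviction thresholds, how readily a container is OOM-killed under node memory pressure also depends on the pod's QoS class, which Kubernetes derives from requests and limits. A pod is Guaranteed when every container has CPU and memory limits with equal requests, BestEffort when no container sets any, and Burstable otherwise. The Python sketch below is a simplified version of that classification, assuming containers are described as plain dicts. It follows the documented rules, not the actual kubelet code, and it compares quantities as strings, so "1" and "1000m" would count as different.

```python
def qos_class(containers):
    """Classify a pod as Guaranteed, Burstable, or BestEffort.

    containers: list of dicts with optional "requests" and "limits" dicts
    keyed by "cpu" and "memory".
    """
    # BestEffort: no container declares any request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"

    for c in containers:
        limits = c.get("limits", {})
        # A request that is not set explicitly defaults to the limit.
        requests = {**limits, **c.get("requests", {})}
        for res in ("cpu", "memory"):
            if res not in limits or requests.get(res) != limits[res]:
                return "Burstable"
    return "Guaranteed"


print(qos_class([{"limits": {"cpu": "150m", "memory": "150Mi"}}]))   # Guaranteed
print(qos_class([{"requests": {"cpu": "100m", "memory": "150Mi"},
                  "limits": {"cpu": "150m", "memory": "200Mi"}}]))   # Burstable
print(qos_class([{}]))                                               # BestEffort
```

For daemons that must stay up, setting requests equal to limits (Guaranteed) makes them the last candidates to be killed when the node runs short of memory, but a container that exceeds its own memory limit is still OOM-killed.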

Running kubernetes autoscaler

I have a replication controller running with the following spec:
apiVersion: v1
kind: ReplicationController
metadata:
  name: owncloud-controller
spec:
  replicas: 1
  selector:
    app: owncloud
  template:
    metadata:
      labels:
        app: owncloud
    spec:
      containers:
      - name: owncloud
        image: adimania/owncloud9-centos7
        ports:
        - containerPort: 80
        volumeMounts:
        - name: userdata
          mountPath: /var/www/html/owncloud/data
        resources:
          requests:
            cpu: 400m
      volumes:
      - name: userdata
        hostPath:
          path: /opt/data
Now I run an HPA using the autoscale command.
$ kubectl autoscale rc owncloud-controller --max=5 --cpu-percent=10
I have also started heapster using kubernetes run command.
$ kubectl run heapster --image=gcr.io/google_containers/heapster:v1.0.2 --command -- /heapster --source=kubernetes:http://192.168.0.103:8080?inClusterConfig=false --sink=log
After all this, the autoscaling never kicks in. From logs, it seems that the actual CPU utilization is not getting reported.
$ kubectl describe hpa owncloud-controller
Name: owncloud-controller
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 26 May 2016 14:24:51 +0530
Reference: ReplicationController/owncloud-controller/scale
Target CPU utilization: 10%
Current CPU utilization: <unset>
Min replicas: 1
Max replicas: 5
ReplicationController pods: 1 current / 1 desired
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
44m 8s 92 {horizontal-pod-autoscaler } Warning FailedGetMetrics failed to get CPU consumption and request: metrics obtained for 0/1 of pods
44m 8s 92 {horizontal-pod-autoscaler } Warning FailedComputeReplicas failed to get CPU utilization: failed to get CPU consumption and request: metrics obtained for 0/1 of pods
What am I missing here?

Most probably heapster is running in a wrong namespace ("default"). HPA expects heapster to be in "kube-system" namespace. Please, add --namespace=kube-system to kubectl run heapster command.
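Once metrics do arrive, the numbers in the question translate as follows: the HPA measures CPU usage as a percentage of the container's request, so with a 400m request and --cpu-percent=10 scaling starts when average usage goes above roughly 40m. A tiny Python sketch of that arithmetic, with made-up helper names:

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU usage expressed as a percentage of the requested CPU."""
    return 100.0 * usage_millicores / request_millicores


def scale_up_threshold_millicores(request_millicores, target_percent):
    """Average usage above which the HPA starts adding replicas."""
    return request_millicores * target_percent / 100.0


print(scale_up_threshold_millicores(400, 10))  # 40.0 millicores
print(cpu_utilization_percent(60, 400))        # 15.0 percent, above the 10% target
```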

I installed heapster under the namespace "kube-system" and it worked. After starting heapster, make sure it's running before you use HPA for your application.
How to run Heapster with Kubernetes cluster
I put all files here https://gitlab.com/abushoeb/kubernetes/tree/master/heapster. They are collected from the official Kubernetes Repository and made minor changes.
How to run Heapster
Go to the heapster directory, where you have grafana.yaml, heapster.yaml and influxdb.yaml, and run the following command
$ kubectl create -f .
How to stop Heapster
Go to the same heapster directory and then run the following command
$ kubectl delete -f .
How to check Heapster is running
You can access heapster metric model from the pod where heapster is running to make sure heapster is working. It can be accessed via web browser by accessing http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/. The same result can be seen by executing following command.
$ curl -L http://heapster-pod-ip:heapster-service-port/api/v1/model/metrics/
If you see the list of metrics then heapster is running correctly. You can also browse the grafana dashboard to see it (find the ip of the pod where grafana is running and then access it at http://grafana-pod-ip:grafana-service-port).
Full documentation of the Heapster Metric Model is available here.
Also just run ($ kubectl cluster-info) and see if it shows results like this:
Kubernetes master is running at https://cluster-ip:6443
Heapster is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/heapster
kubernetes-dashboard is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
monitoring-grafana is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
monitoring-influxdb is running at https://cluster-ip:6443/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb
Check influxdb
You can also check whether influxdb has data in it. Install the InfluxDB client on your local machine to connect to the influxdb database.
$ influx -host <cluster-ip> -port <influxdb-service-port>
Some Sample influxdb queries
show databases
use db-name
show measurements
select value from "cpu/node_capacity"
Reference and Help
https://github.com/kubernetes/heapster/blob/master/docs/influxdb.md
https://github.com/kubernetes/heapster/blob/master/docs/debugging.md
https://blog.kublr.com/how-to-utilize-the-heapster-influxdb-grafana-stack-in-kubernetes-for-monitoring-pods-4a553f4d36c9
http://www.dasblinkenlichten.com/installing-cadvisor-and-heapster-on-bare-metal-kubernetes/
http://blog.arungupta.me/kubernetes-monitoring-heapster-influxdb-grafana/
