I have an AKS Kubernetes cluster with 2 nodes. Each node runs about 6-7 pods with 2 containers per pod: one container is my Docker image and the other is the Istio sidecar injected for its service mesh. After about 10 hours the nodes become 'NotReady', and the node describe output shows me 2 errors:
1. container runtime is down, PLEG is not healthy: pleg was last seen active 1h32m35.942907195s ago; threshold is 3m0s.
2. rpc error: code = DeadlineExceeded desc = context deadline exceeded,
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
When I restart the node it works fine, but the node goes back to 'NotReady' after a while. I started facing this issue after adding Istio, but I could not find any documentation relating the two. My next step is to try to upgrade Kubernetes.
The node describe log:
Name: aks-agentpool-22124581-0
Roles: agent
Labels: agentpool=agentpool
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=Standard_B2s
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eastus
failure-domain.beta.kubernetes.io/zone=1
kubernetes.azure.com/cluster=MC_XXXXXXXXX
kubernetes.io/hostname=aks-XXXXXXXXX
kubernetes.io/role=agent
node-role.kubernetes.io/agent=
storageprofile=managed
storagetier=Premium_LRS
Annotations: aks.microsoft.com/remediated=3
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Thu, 25 Oct 2018 14:46:53 +0000
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 25 Oct 2018 14:49:06 +0000 Thu, 25 Oct 2018 14:49:06 +0000 RouteCreated RouteController created a route
OutOfDisk False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 19 Dec 2018 19:28:55 +0000 Thu, 25 Oct 2018 14:46:53 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletNotReady container runtime is down,PLEG is not healthy: pleg was lastseen active 1h32m35.942907195s ago; threshold is 3m0s
Addresses:
Hostname: aks-XXXXXXXXX
Capacity:
cpu: 2
ephemeral-storage: 30428648Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4040536Ki
pods: 110
Allocatable:
cpu: 1940m
ephemeral-storage: 28043041951
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3099480Ki
pods: 110
System Info:
Machine ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
System UUID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Boot ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Kernel Version: 4.15.0-1035-azure
OS Image: Ubuntu 16.04.5 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://Unknown
Kubelet Version: v1.11.3
Kube-Proxy Version: v1.11.3
PodCIDR: 10.244.0.0/24
ProviderID: azure:///subscriptions/9XXXXXXXXXXX/resourceGroups/MC_XXXXXXXXXXXXXXXXXXXXXXXXXXXX/providers/Microsoft.Compute/virtualMachines/aks-XXXXXXXXXXXX
Non-terminated Pods: (42 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default emailgistics-graph-monitor-6477568564-q98p2 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-message-handler-7df4566b6f-mh255 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-reports-aggregator-5fd96b94cb-b5vbn 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-rules-844b77f46-5lrkw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-scheduler-754884b566-mwgvp 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-subscription-token-manager-7974558985-f2t49 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default mollified-kiwi-cert-manager-665c5d9c8c-2ld59 0 (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system grafana-59b787b9b-dzdtc 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-citadel-5d8956cc6-x55vk 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-egressgateway-f48fc7fbb-szpwp 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-galley-6975b6bd45-g7lsc 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-ingressgateway-c6c4bcdbf-bbgcw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-pilot-d9b5b9b7c-ln75n 510m (26%) 0 (0%) 2Gi (67%) 0 (0%)
istio-system istio-policy-6b465cd4bf-92l57 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-b2z85 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-j59r4 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-s9pdm 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-sidecar-injector-575597f5cf-npkcz 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-9794j 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-g7gh5 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-gd88n 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-px8qb 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-xzslh 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-tracing-7596597bd7-hjtq2 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system prometheus-76db5fddd5-d6dxs 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system servicegraph-758f96bf5b-c9sqk 10m (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system addon-http-application-routing-default-http-backend-5ccb95zgfm8 10m (0%) 10m (0%) 20Mi (0%) 20Mi (0%)
kube-system addon-http-application-routing-external-dns-59d8698886-h8xds 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system addon-http-application-routing-nginx-ingress-controller-ff49qc7 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system heapster-5d6f9b846c-m4kfp 130m (6%) 130m (6%) 230Mi (7%) 230Mi (7%)
kube-system kube-dns-v20-7c7d7d4c66-qqkfm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%)
kube-system kube-dns-v20-7c7d7d4c66-wrxjm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%)
kube-system kube-proxy-2tb68 100m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-svc-redirect-d6gqm 10m (0%) 0 (0%) 34Mi (1%) 0 (0%)
kube-system kubernetes-dashboard-68f468887f-l9x46 100m (5%) 100m (5%) 50Mi (1%) 300Mi (9%)
kube-system metrics-server-5cbc77f79f-x55cs 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system omsagent-mhrqm 50m (2%) 150m (7%) 150Mi (4%) 300Mi (9%)
kube-system omsagent-rs-d688cdf68-pjpmj 50m (2%) 150m (7%) 100Mi (3%) 500Mi (16%)
kube-system tiller-deploy-7f4974b9c8-flkjm 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system tunnelfront-7f766dd857-kgqps 10m (0%) 0 (0%) 64Mi (2%) 0 (0%)
kube-systems-dev nginx-ingress-dev-controller-7f78f6c8f9-csct4 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-systems-dev nginx-ingress-dev-default-backend-95fbc75b7-lq9tw 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1540m (79%) 540m (27%)
memory 2976Mi (98%) 1790Mi (59%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ContainerGCFailed 48m (x43 over 19h) kubelet, aks-agentpool-22124581-0 rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning ImageGCFailed 29m (x57 over 18h) kubelet, aks-agentpool-22124581-0 failed to get image stats: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Warning ContainerGCFailed 2m (x237 over 18h) kubelet, aks-agentpool-22124581-0 rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
General deployment file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
name: emailgistics-pod
spec:
minReadySeconds: 10
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
annotations:
sidecar.istio.io/status: '{"version":"ebf16d3ea0236e4b5cb4d3fc0f01da62e2e6265d005e58f8f6bd43a4fb672fdd","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
creationTimestamp: null
labels:
app: emailgistics-pod
spec:
containers:
- image: xxxxxxxxxxxxxxxxxxxxx/emailgistics_pod:xxxxxx
imagePullPolicy: Always
name: emailgistics-pod
ports:
- containerPort: 80
resources: {}
- args:
- proxy
- sidecar
- --configPath
- /etc/istio/proxy
- --binaryPath
- /usr/local/bin/envoy
- --serviceCluster
- emailgistics-pod
- --drainDuration
- 45s
- --parentShutdownDuration
- 1m0s
- --discoveryAddress
- istio-pilot.istio-system:15005
- --discoveryRefreshDelay
- 1s
- --zipkinAddress
- zipkin.istio-system:9411
- --connectTimeout
- 10s
- --proxyAdminPort
- "15000"
- --controlPlaneAuthPolicy
- MUTUAL_TLS
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: ISTIO_META_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
- name: ISTIO_METAJSON_LABELS
value: |
{"app":"emailgistics-pod"}
image: docker.io/istio/proxyv2:1.0.4
imagePullPolicy: IfNotPresent
name: istio-proxy
ports:
- containerPort: 15090
name: http-envoy-prom
protocol: TCP
resources:
requests:
cpu: 10m
securityContext:
readOnlyRootFilesystem: true
runAsUser: 1337
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/certs/
name: istio-certs
readOnly: true
imagePullSecrets:
- name: ga.secretname
initContainers:
- args:
- -p
- "15001"
- -u
- "1337"
- -m
- REDIRECT
- -i
- '*'
- -x
- ""
- -b
- "80"
- -d
- ""
image: docker.io/istio/proxy_init:1.0.4
imagePullPolicy: IfNotPresent
name: istio-init
resources: {}
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
volumes:
- emptyDir:
medium: Memory
name: istio-envoy
- name: istio-certs
secret:
optional: true
secretName: istio.default
status: {}
---
Currently this is a known bug and no real fix has been released to restore normal node behavior.
See the issues below:
https://github.com/kubernetes/kubernetes/issues/45419
https://github.com/kubernetes/kubernetes/issues/61117
https://github.com/Azure/AKS/issues/102
Hopefully we will have a solution soon.
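As a stopgap rather than a fix, restarting the container runtime and the kubelet on the affected node usually brings it back to Ready for a while, just like the full node reboot described in the question. A minimal sketch, assuming you can SSH into the AKS node:
sudo systemctl restart docker
sudo systemctl restart kubelet
# then, from your workstation, watch the node come back:
kubectl get nodes -w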
I have a Kubernetes cluster at version 1.16.2. When I deploy all the services in the cluster with replicas set to 1, it works fine. Then I scaled all the services' replicas to 2 and checked, and found that some services are running normally but some are stuck in Pending.
When I kubectl describe one of the Pending pods, I get the message below:
[root@runsdata-bj-01 society-training-service-v1-0]# kcd society-resident-service-v3-0-788446c49b-rzjsx
Name: society-resident-service-v3-0-788446c49b-rzjsx
Namespace: runsdata
Priority: 0
Node: <none>
Labels: app=society-resident-service-v3-0
pod-template-hash=788446c49b
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/society-resident-service-v3-0-788446c49b
Containers:
society-resident-service-v3-0:
Image: docker.ssiid.com/society-resident-service:3.0.33
Port: 8231/TCP
Host Port: 0/TCP
Limits:
cpu: 1
memory: 4Gi
Requests:
cpu: 200m
memory: 2Gi
Liveness: http-get http://:8231/actuator/health delay=600s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:8231/actuator/health delay=30s timeout=5s period=10s #success=1 #failure=3
Environment:
spring_profiles_active: production
TZ: Asia/Hong_Kong
JAVA_OPTS: -Djgroups.use.jdk_logger=true -Xmx4000M -Xms4000M -Xmn600M -XX:PermSize=500M -XX:MaxPermSize=500M -Xss384K -XX:+DisableExplicitGC -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:log/gc.log
Mounts:
/data/storage from nfs-data-storage (rw)
/opt/security from security (rw)
/var/log/runsdata from log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from application-token-vgcvb (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
log:
Type: HostPath (bare host directory volume)
Path: /log/runsdata
HostPathType:
security:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-security-claim
ReadOnly: false
nfs-data-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-storage-claim
ReadOnly: false
application-token-vgcvb:
Type: Secret (a volume populated by a Secret)
SecretName: application-token-vgcvb
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/4 nodes are available: 4 Insufficient memory.
And from the output below, you can see that my machines have more than 2G of memory left.
[root@runsdata-bj-01 society-training-service-v1-0]# kcp |grep Pending
society-insurance-foundation-service-v2-0-7697b9bd5b-7btq6 0/1 Pending 0 60m
society-notice-service-v1-0-548b8d5946-c5gzm 0/1 Pending 0 60m
society-online-business-service-v2-1-7f897f564-phqjs 0/1 Pending 0 60m
society-operation-gateway-7cf86b77bd-lmswm 0/1 Pending 0 60m
society-operation-user-service-v1-1-755dcff964-dr9mj 0/1 Pending 0 60m
society-resident-service-v3-0-788446c49b-rzjsx 0/1 Pending 0 60m
society-training-service-v1-0-774f8c5d98-tl7vq 0/1 Pending 0 60m
society-user-service-v3-0-74865dd9d7-t9fwz 0/1 Pending 0 60m
traefik-ingress-controller-8688cccf79-5gkjg 0/1 Pending 0 60m
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
192.168.0.94 384m 9% 11482Mi 73%
192.168.0.95 399m 9% 11833Mi 76%
192.168.0.96 399m 9% 11023Mi 71%
192.168.0.97 457m 11% 10782Mi 69%
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.0.94 Ready <none> 8d v1.16.2
192.168.0.95 Ready <none> 8d v1.16.2
192.168.0.96 Ready <none> 8d v1.16.2
192.168.0.97 Ready <none> 8d v1.16.2
[root@runsdata-bj-01 society-training-service-v1-0]#
Here is the description of all 4 nodes:
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.94
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1930m (48%) 7600m (190%)
memory 9846Mi (63%) 32901376Ki (207%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.95
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1670m (41%) 6600m (165%)
memory 7196Mi (46%) 21380Mi (137%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.96
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2610m (65%) 7 (175%)
memory 9612Mi (61%) 19960Mi (128%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.97
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2250m (56%) 508200m (12705%)
memory 10940Mi (70%) 28092672Ki (176%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
And the memory of all 4 nodes:
[root@runsdata-bj-00 ~]# free -h
total used free shared buff/cache available
Mem: 15G 2.8G 6.7G 2.1M 5.7G 11G
Swap: 0B 0B 0B
[root@runsdata-bj-01 frontend]# free -h
total used free shared buff/cache available
Mem: 15G 7.9G 3.7G 2.4M 3.6G 6.8G
Swap: 0B 0B 0B
[root@runsdata-bj-02 ~]# free -h
total used free shared buff/cache available
Mem: 15G 5.0G 2.9G 3.9M 7.4G 9.5G
Swap: 0B 0B 0B
[root@runsdata-bj-03 ~]# free -h
total used free shared buff/cache available
Mem: 15G 6.5G 2.2G 2.3M 6.6G 8.2G
Swap: 0B 0B 0B
Here is the kube-scheduler log:
[root@runsdata-bj-01 log]# cat messages|tail -n 5000|grep kube-scheduler
Apr 17 14:31:24 runsdata-bj-01 kube-scheduler: E0417 14:31:24.404442 12740 factory.go:585] pod is already present in the activeQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.490310 12740 factory.go:585] pod is already present in the backoffQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.873292 12740 factory.go:585] pod is already present in the backoffQ
Apr 18 21:44:18 runsdata-bj-01 etcd: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " with result "range_response_count:1 size:440" took too long (100.521269ms) to execute
Apr 18 21:59:40 runsdata-bj-01 kube-scheduler: E0418 21:59:40.050852 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.069465 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.950254 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:08 runsdata-bj-01 kube-scheduler: E0418 22:03:08.567290 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.152812 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.344902 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:04:32 runsdata-bj-01 kube-scheduler: E0418 22:04:32.969606 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:09:51 runsdata-bj-01 kube-scheduler: E0418 22:09:51.366877 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.430976 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.441182 12740 factory.go:585] pod is already present in the activeQ
I searched Google and Stack Overflow and could not find a solution.
Who can help me?
Kubernetes prioritizes node stability over resource provisioning. The available memory is not calculated from the free -m command, as the documentation mentions:
The value for memory.available is derived from the cgroupfs instead of tools like free -m. This is important because free -m does not work in a container, and if users use the node allocatable feature, out of resource decisions are made local to the end user Pod part of the cgroup hierarchy as well as the root node. This script reproduces the same set of steps that the kubelet performs to calculate memory.available. The kubelet excludes inactive_file (i.e. # of bytes of file-backed memory on inactive LRU list) from its calculation as it assumes that memory is reclaimable under pressure.
You can use the script mentioned above to check the available memory on your nodes, and if there are not enough available resources you will need to increase the cluster size by adding a new node.
Additionally, you can check the documentation page for more information about resource limits.
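For reference, here is a minimal sketch of the calculation the linked script performs (it assumes the cgroup v1 paths the kubelet uses on these nodes):
#!/bin/bash
# total memory from /proc/meminfo
memory_capacity_in_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
# current usage and inactive file-backed pages from the root memory cgroup
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
memory_total_inactive_file=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')
# the kubelet excludes inactive_file from the working set
memory_working_set_in_bytes=$((memory_usage_in_bytes - memory_total_inactive_file))
memory_available_in_kb=$(((memory_capacity_in_bytes - memory_working_set_in_bytes) / 1024))
echo "memory.available: ${memory_available_in_kb}Ki"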
The problem I'm running into is very similar to other existing posts, except they all have the same solution, so I'm creating a new thread.
The Problem:
The Master node is still in "NotReady" status after installing Flannel.
Expected result:
Master Node becomes "Ready" after installing Flannel.
Background:
I am following this guide to install Flannel.
My concern is that I am using kubelet v1.17.2 by default, which just came out about a month ago. (Can anyone confirm whether v1.17.2 works with Flannel?)
Here is the output after running the following command on the master node: kubectl describe node machias
Name: machias
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=machias
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"be:78:65:7f:ae:6d"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.122.172
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 15 Feb 2020 01:00:01 -0500
Taints: node.kubernetes.io/not-ready:NoExecute
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: machias
AcquireTime: <unset>
RenewTime: Sat, 15 Feb 2020 13:54:56 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sat, 15 Feb 2020 13:54:52 -0500 Sat, 15 Feb 2020 00:59:54 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sat, 15 Feb 2020 13:54:52 -0500 Sat, 15 Feb 2020 00:59:54 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sat, 15 Feb 2020 13:54:52 -0500 Sat, 15 Feb 2020 00:59:54 -0500 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Sat, 15 Feb 2020 13:54:52 -0500 Sat, 15 Feb 2020 00:59:54 -0500 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
InternalIP: 192.168.122.172
Hostname: machias
Capacity:
cpu: 2
ephemeral-storage: 38583284Ki
hugepages-2Mi: 0
memory: 4030364Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 35558354476
hugepages-2Mi: 0
memory: 3927964Ki
pods: 110
System Info:
Machine ID: 20cbe0d737dd43588f4a2bccd70681a2
System UUID: ee9bc138-edee-471a-8ecc-f1c567c5f796
Boot ID: 0ba49907-ec32-4e80-bc4c-182fccb0b025
Kernel Version: 5.3.5-200.fc30.x86_64
OS Image: Fedora 30 (Workstation Edition)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.5
Kubelet Version: v1.17.2
Kube-Proxy Version: v1.17.2
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system etcd-machias 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12h
kube-system kube-apiserver-machias 250m (12%) 0 (0%) 0 (0%) 0 (0%) 12h
kube-system kube-controller-manager-machias 200m (10%) 0 (0%) 0 (0%) 0 (0%) 12h
kube-system kube-flannel-ds-amd64-rrfht 100m (5%) 100m (5%) 50Mi (1%) 50Mi (1%) 12h
kube-system kube-proxy-z2q7d 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12h
kube-system kube-scheduler-machias 100m (5%) 0 (0%) 0 (0%) 0 (0%) 12h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 650m (32%) 100m (5%)
memory 50Mi (1%) 50Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
And here is the output of the following command: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6955765f44-7nz46 0/1 Pending 0 12h
kube-system coredns-6955765f44-xk5r2 0/1 Pending 0 13h
kube-system etcd-machias.cs.unh.edu 1/1 Running 0 13h
kube-system kube-apiserver-machias.cs.unh.edu 1/1 Running 0 13h
kube-system kube-controller-manager-machias.cs.unh.edu 1/1 Running 0 13h
kube-system kube-flannel-ds-amd64-rrfht 1/1 Running 0 12h
kube-system kube-flannel-ds-amd64-t7p2p 1/1 Running 0 12h
kube-system kube-proxy-fnn78 1/1 Running 0 12h
kube-system kube-proxy-z2q7d 1/1 Running 0 13h
kube-system kube-scheduler-machias.cs.unh.edu 1/1 Running 0 13h
Thank you for your help!
I've reproduced your scenario using the same versions you are using to make sure these versions work with Flannel.
After testing it I can affirm that there is no problem with the versions you are using.
I created the cluster following these steps:
Ensure the iptables tooling does not use the nftables backend:
update-alternatives --set iptables /usr/sbin/iptables-legacy
Installing runtime
sudo yum remove docker docker-common docker-selinux docker-engine
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce-19.03.5-3.el7
sudo systemctl start docker
Installing kubeadm, kubelet and kubectl
sudo su -c "cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF"
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
sudo yum install -y kubelet-1.17.2-0 kubeadm-1.17.2-0 kubectl-1.17.2-0 --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
Note:
Setting SELinux in permissive mode by running setenforce 0 and sed ... effectively disables it. This is required to allow containers to access the host filesystem, which is needed by pod networks for example. You have to do this until SELinux support is improved in the kubelet.
Some users on RHEL/CentOS 7 have reported issues with traffic being routed incorrectly due to iptables being bypassed. You should ensure net.bridge.bridge-nf-call-iptables is set to 1 in your sysctl config, e.g.
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
Make sure that the br_netfilter module is loaded before this step. This can be done by running lsmod | grep br_netfilter. To load it explicitly call modprobe br_netfilter.
Initialize cluster with Flannel CIDR
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Add Flannel CNI
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
By default, your cluster will not schedule Pods on the control-plane node for security reasons. If you want to be able to schedule Pods on the control-plane node, e.g. for a single-machine Kubernetes cluster for development, run:
kubectl taint nodes --all node-role.kubernetes.io/master-
As can be seen below, my master node is Ready. Please follow this how-to and let me know if you can achieve your desired state.
$ kubectl describe nodes
Name: kubeadm-fedora
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=kubeadm-fedora
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"8e:7e:bf:d9:21:1e"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.128.15.200
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 17 Feb 2020 11:31:59 +0000
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: kubeadm-fedora
AcquireTime: <unset>
RenewTime: Mon, 17 Feb 2020 11:47:52 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 17 Feb 2020 11:47:37 +0000 Mon, 17 Feb 2020 11:31:51 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 17 Feb 2020 11:47:37 +0000 Mon, 17 Feb 2020 11:31:51 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 17 Feb 2020 11:47:37 +0000 Mon, 17 Feb 2020 11:31:51 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 17 Feb 2020 11:47:37 +0000 Mon, 17 Feb 2020 11:32:32 +0000 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.128.15.200
Hostname: kubeadm-fedora
Capacity:
cpu: 2
ephemeral-storage: 104844988Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7493036Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 96625140781
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7390636Ki
pods: 110
System Info:
Machine ID: 41689852cca44b659f007bb418a6fa9f
System UUID: 390D88CD-3D28-5657-8D0C-83AB1974C88A
Boot ID: bff1c808-788e-48b8-a789-4fee4e800554
Kernel Version: 3.10.0-1062.9.1.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.5
Kubelet Version: v1.17.2
Kube-Proxy Version: v1.17.2
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-6955765f44-d9fb4 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 15m
kube-system coredns-6955765f44-l7xrk 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 15m
kube-system etcd-kubeadm-fedora 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-apiserver-kubeadm-fedora 250m (12%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-controller-manager-kubeadm-fedora 200m (10%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-flannel-ds-amd64-v6m2w 100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%) 15m
kube-system kube-proxy-d65kl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15m
kube-system kube-scheduler-kubeadm-fedora 100m (5%) 0 (0%) 0 (0%) 0 (0%) 15m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 850m (42%) 100m (5%)
memory 190Mi (2%) 390Mi (5%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasSufficientMemory 16m (x6 over 16m) kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 16m (x5 over 16m) kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 16m (x5 over 16m) kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 16m kubelet, kubeadm-fedora Updated Node Allocatable limit across pods
Normal Starting 15m kubelet, kubeadm-fedora Starting kubelet.
Normal NodeHasSufficientMemory 15m kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 15m kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 15m kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 15m kubelet, kubeadm-fedora Updated Node Allocatable limit across pods
Normal Starting 15m kube-proxy, kubeadm-fedora Starting kube-proxy.
Normal NodeReady 15m kubelet, kubeadm-fedora Node kubeadm-fedora status is now: NodeReady
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubeadm-fedora Ready master 17m v1.17.2
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-6955765f44-d9fb4 1/1 Running 0 17m
kube-system coredns-6955765f44-l7xrk 1/1 Running 0 17m
kube-system etcd-kubeadm-fedora 1/1 Running 0 17m
kube-system kube-apiserver-kubeadm-fedora 1/1 Running 0 17m
kube-system kube-controller-manager-kubeadm-fedora 1/1 Running 0 17m
kube-system kube-flannel-ds-amd64-v6m2w 1/1 Running 0 17m
kube-system kube-proxy-d65kl 1/1 Running 0 17m
kube-system kube-scheduler-kubeadm-fedora 1/1 Running 0 17m
The PodCIDR value is showing as 10.244.0.0/24. For Flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.
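If the cluster was initialized without that flag, one way to correct it, as a sketch that assumes you can tear the cluster down and re-create it, is to reset and re-run the init and Flannel steps shown above:
sudo kubeadm reset
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml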
I am trying to install Knative Serving, but when I type this in the terminal:
kubectl get pods --namespace knative-serving -w
I get this:
NAME READY STATUS RESTARTS AGE
activator-69b8474d6b-jvzvs 2/2 Running 0 2h
autoscaler-6579b57774-cgmm9 2/2 Running 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
webhook-6d9568d-v4pgk 1/1 Running 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
I don't understand why controller-66cd7d99df-q59kl is still Pending.
When I tried this: kubectl describe pods -n knative-serving controller-66cd7d99df-q59kl, I got this:
Name: controller-66cd7d99df-q59kl
Namespace: knative-serving
Node: <none>
Labels: app=controller
pod-template-hash=66cd7d99df
Annotations: sidecar.istio.io/inject=false
Status: Pending
IP:
Controlled By: ReplicaSet/controller-66cd7d99df
Containers:
controller:
Image: gcr.io/knative-releases/github.com/knative/serving/cmd/controller@sha256:5a5a0d5fffe839c99fc8f18ba028375467fdcd83cbee9c7015c1a58d01ca6929
Port: 9090/TCP
Limits:
cpu: 1
memory: 1000Mi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/etc/config-logging from config-logging (rw)
/var/run/secrets/kubernetes.io/serviceaccount from controller-token-d9l64 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-logging:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: config-logging
Optional: false
controller-token-d9l64:
Type: Secret (a volume populated by a Secret)
SecretName: controller-token-d9l64
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40s (x98 over 2h) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
Please consider the comments above: you have kubectl installed correctly (it's working) and kubectl describe pod/<pod> would help...
But the information you provide appears sufficient for an answer:
FailedScheduling because of Insufficient cpu
The pod that you show (one of several) requests cpu: 100m and memory: 100Mi, with limits of:
cpu: 1
memory: 1000Mi
The cluster has insufficient capacity to deploy this pod (and apparently the others).
You should increase the number (and/or size) of the nodes in your cluster to accommodate the capacity needed by the pods.
You needn't delete these pods because, once the cluster's capacity increases, you should see these pods deploy successfully.
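For example, if the cluster happens to run on GKE (an assumption; the question does not say which platform it is), the node pool could be grown with something like:
gcloud container clusters resize <cluster-name> --num-nodes=2 --zone=<zone>
On other platforms the equivalent is adding a worker node or switching to a larger machine type.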
Please verify your CPU resources by running:
kubectl get nodes
kubectl describe node <your-node>
Also take a look at all the information related to:
Capacity:
cpu:
Allocatable:
cpu:
The CPU Requests and CPU Limits information can also be helpful.
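For example, a sketch of those checks (<node-name> is a placeholder for one of the names returned by kubectl get nodes):
kubectl get nodes
kubectl describe node <node-name> | grep -A 6 "Capacity"
kubectl describe node <node-name> | grep -A 8 "Allocated resources"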
I am practicing k8s by following the ingress chapter. I am using a Google (GKE) cluster. The specifications are as follows:
master: 1.11.7-gke.4
node: 1.11.7-gke.4
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-singh-default-pool-a69fa545-1sm3 Ready <none> 6h v1.11.7-gke.4 10.148.0.46 35.197.128.107 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-819z Ready <none> 6h v1.11.7-gke.4 10.148.0.47 35.198.217.71 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-djhz Ready <none> 6h v1.11.7-gke.4 10.148.0.45 35.197.159.75 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
master endpoint: 35.186.148.93
DNS: singh.hbot.io (master IP)
To keep my question short, I have posted my source code in the snippets linked below.
Files:
deployment.yaml
ingress.yaml
ingress-rules.yaml
Problem:
curl http://singh.hbot.io/webapp1 times out.
Description
$ kubectl get deployment -n nginx-ingress
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx-ingress 1 1 1 0 2h
nginx-ingress deployment is not available.
$ kubectl describe deployment -n nginx-ingress
Name: nginx-ingress
Namespace: nginx-ingress
CreationTimestamp: Mon, 04 Mar 2019 15:09:42 +0700
Labels: app=nginx-ingress
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-ingress","namespace":"nginx-ingress"},"s...
Selector: app=nginx-ingress
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=nginx-ingress
Service Account: nginx-ingress
Containers:
nginx-ingress:
Image: nginx/nginx-ingress:edge
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-ingress-77fcd48f4d (1/1 replicas created)
Events: <none>
pods:
$ kubectl get pods --all-namespaces=true
NAMESPACE NAME READY STATUS RESTARTS AGE
default webapp1-7d67d68676-k9hhl 1/1 Running 0 6h
default webapp2-64d4844b78-9kln5 1/1 Running 0 6h
default webapp3-5b8ff7484d-zvcsf 1/1 Running 0 6h
kube-system event-exporter-v0.2.3-85644fcdf-xxflh 2/2 Running 0 6h
kube-system fluentd-gcp-scaler-8b674f786-gvv98 1/1 Running 0 6h
kube-system fluentd-gcp-v3.2.0-srzc2 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-w2z2q 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-z7p9l 2/2 Running 0 6h
kube-system heapster-v1.6.0-beta.1-5685746c7b-kd4mn 3/3 Running 0 6h
kube-system kube-dns-6b98c9c9bf-6p8qr 4/4 Running 0 6h
kube-system kube-dns-6b98c9c9bf-pffpt 4/4 Running 0 6h
kube-system kube-dns-autoscaler-67c97c87fb-gbgrs 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-1sm3 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-819z 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-djhz 1/1 Running 0 6h
kube-system l7-default-backend-7ff48cffd7-trqvx 1/1 Running 0 6h
kube-system metrics-server-v0.2.1-fd596d746-bvdfk 2/2 Running 0 6h
kube-system tiller-deploy-57c574bfb8-xnmtj 1/1 Running 0 1h
nginx-ingress nginx-ingress-77fcd48f4d-rfwbk 0/1 CrashLoopBackOff 35 2h
describe pod
$ kubectl describe pods -n nginx-ingress
Name: nginx-ingress-77fcd48f4d-5rhtv
Namespace: nginx-ingress
Priority: 0
PriorityClassName: <none>
Node: gke-singh-default-pool-a69fa545-djhz/10.148.0.45
Start Time: Mon, 04 Mar 2019 17:55:00 +0700
Labels: app=nginx-ingress
pod-template-hash=3397804908
Annotations: <none>
Status: Running
IP: 10.48.2.10
Controlled By: ReplicaSet/nginx-ingress-77fcd48f4d
Containers:
nginx-ingress:
Container ID: docker://5d3ee9e2bf7a2060ff0a96fdd884a937b77978c137df232dbfd0d3e5de89fe0e
Image: nginx/nginx-ingress:edge
Image ID: docker-pullable://nginx/nginx-ingress@sha256:16c1c6dde0b904f031d3c173e0b04eb82fe9c4c85cb1e1f83a14d5b56a568250
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Mon, 04 Mar 2019 18:16:33 +0700
Finished: Mon, 04 Mar 2019 18:16:33 +0700
Ready: False
Restart Count: 9
Environment:
POD_NAMESPACE: nginx-ingress (v1:metadata.namespace)
POD_NAME: nginx-ingress-77fcd48f4d-5rhtv (v1:metadata.name)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-token-zvcwt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-token-zvcwt:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-token-zvcwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26m default-scheduler Successfully assigned nginx-ingress/nginx-ingress-77fcd48f4d-5rhtv to gke-singh-default-pool-a69fa545-djhz
Normal Created 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Created container
Normal Started 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Started container
Normal Pulling 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz pulling image "nginx/nginx-ingress:edge"
Normal Pulled 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Successfully pulled image "nginx/nginx-ingress:edge"
Warning BackOff 62s (x112 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Back-off restarting failed container
Fix: container terminated
Add the command below to ingress.yaml to prevent the container from finishing its run and getting terminated by k8s.
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
The ingress gets no IP address from GKE. Let me have a look at the details.
describe ingress:
$ kubectl describe ing
Name: webapp-ingress
Namespace: default
Address:
Default backend: default-http-backend:80 (10.48.0.8:8080)
Rules:
Host Path Backends
---- ---- --------
*
/webapp1 webapp1-svc:80 (<none>)
/webapp2 webapp2-svc:80 (<none>)
webapp3-svc:80 (<none>)
Annotations:
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{},"name":"webapp-ingress","namespace":"default"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"webapp1-svc","servicePort":80},"path":"/webapp1"},{"backend":{"serviceName":"webapp2-svc","servicePort":80},"path":"/webapp2"},{"backend":{"serviceName":"webapp3-svc","servicePort":80}}]}}]}}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
From this line I got the ultimate solution, from Christian Roy. Thank you very much.
Fix the ClusterIP
ClusterIP is the default value, so I have to edit my manifest file to use NodePort as follows:
apiVersion: v1
kind: Service
metadata:
name: webapp1-svc
labels:
app: webapp1
spec:
type: NodePort
ports:
- port: 80
selector:
app: webapp1
And that is it.
The answer is in your question. The output of describing your ingress shows the problem.
You did kubectl describe ing and the last part of that output was:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
The important part is:
error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
Solution
Just change all your services to be of type NodePort and it will work.
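For example, an existing Service can be switched in place (a sketch using the service names from the ingress output above):
kubectl patch service webapp1-svc -p '{"spec": {"type": "NodePort"}}'
kubectl patch service webapp2-svc -p '{"spec": {"type": "NodePort"}}'
kubectl patch service webapp3-svc -p '{"spec": {"type": "NodePort"}}'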
I also have to add a command so that the container does not stop running:
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
I created a 200GB disk with the command gcloud compute disks create --size 200GB my-disk,
then created a PersistentVolume:
apiVersion: v1
kind: PersistentVolume
metadata:
name: my-volume
spec:
capacity:
storage: 200Gi
accessModes:
- ReadWriteOnce
gcePersistentDisk:
pdName: my-disk
fsType: ext4
then created a PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Gi
then created a StatefulSet and mounted the volume at /mnt/disks, which is an existing directory. statefulset.yaml:
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
name: ...
spec:
...
spec:
containers:
- name: ...
...
volumeMounts:
- name: my-volume
mountPath: /mnt/disks
volumes:
- name: my-volume
emptyDir: {}
volumeClaimTemplates:
- metadata:
name: my-claim
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 200Gi
I ran the command kubectl get pv and saw that the disk was successfully bound for each instance:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
my-volume 200Gi RWO Retain Available 19m
pvc-17c60f45-2e4f-11e8-9b77-42010af0000e 200Gi RWO Delete Bound default/my-claim-xxx_1 standard 13m
pvc-5972c804-2e4e-11e8-9b77-42010af0000e 200Gi RWO Delete Bound default/my-claim standard 18m
pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e 200Gi RWO Delete Bound default/my-claimxxx_0 standard 18m
But when I SSH into an instance and run df -hT, I do not see the mounted volume. Below is the output:
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext2 1.2G 447M 774M 37% /
devtmpfs devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs tmpfs 1.9G 744K 1.9G 1% /run
tmpfs tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
tmpfs tmpfs 1.9G 0 1.9G 0% /tmp
tmpfs tmpfs 256K 0 256K 0% /mnt/disks
/dev/sda8 ext4 12M 28K 12M 1% /usr/share/oem
/dev/sda1 ext4 95G 3.5G 91G 4% /mnt/stateful_partition
tmpfs tmpfs 1.0M 128K 896K 13% /var/lib/cloud
overlayfs overlay 1.0M 148K 876K 15% /etc
Does anyone have any idea?
It is also worth mentioning that I'm trying to mount the disk into a Docker image which is running in Kubernetes Engine. The pod was created with the commands below:
docker build -t gcr.io/xxx .
gcloud docker -- push gcr.io/xxx
kubectl create -f statefulset.yaml
The instance I SSHed into is the one that runs the Docker image. I do not see the volume in either the instance or the Docker container.
UPDATE
I found the volume. I ran df -ahT on the instance and saw the relevant entries:
/dev/sdb - - - - - /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-xxx-cluster-c-pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-xxx-cluster-c-pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-xxx-cluster-c-pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/gke-xxx-cluster-c-pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /var/lib/kubelet/pods/61bb679b-2e4e-11e8-9b77-42010af0000e/volumes/kubernetes.io~gce-pd/pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /var/lib/kubelet/pods/61bb679b-2e4e-11e8-9b77-42010af0000e/volumes/kubernetes.io~gce-pd/pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/61bb679b-2e4e-11e8-9b77-42010af0000e/volumes/kubernetes.io~gce-pd/pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
/dev/sdb - - - - - /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/61bb679b-2e4e-11e8-9b77-42010af0000e/volumes/kubernetes.io~gce-pd/pvc-61b9daf9-2e4e-11e8-9b77-42010af0000e
Then I went into the Docker container and ran df -ahT, and got:
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 95G 3.5G 91G 4% /mnt/disks
Why am I seeing a 95G total size instead of 200G, which is the size of my volume?
More info:
kubectl describe pod
Name: xxx-replicaset-0
Namespace: default
Node: gke-xxx-cluster-default-pool-5e49501c-nrzt/10.128.0.17
Start Time: Fri, 23 Mar 2018 11:40:57 -0400
Labels: app=xxx-replicaset
controller-revision-hash=xxx-replicaset-755c4f7cff
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"StatefulSet","namespace":"default","name":"xxx-replicaset","uid":"d6c3511f-2eaf-11e8-b14e-42010af0000...
kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container xxx-deployment
Status: Running
IP: 10.52.4.5
Created By: StatefulSet/xxx-replicaset
Controlled By: StatefulSet/xxx-replicaset
Containers:
xxx-deployment:
Container ID: docker://137b3966a14538233ed394a3d0d1501027966b972d8ad821951f53d9eb908615
Image: gcr.io/sampeproject/xxxstaging:v1
Image ID: docker-pullable://gcr.io/sampeproject/xxxstaging@sha256:a96835c2597cfae3670a609a69196c6cd3d9cc9f2f0edf5b67d0a4afdd772e0b
Port: 8080/TCP
State: Running
Started: Fri, 23 Mar 2018 11:42:17 -0400
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment:
Mounts:
/mnt/disks from my-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hj65g (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
my-claim:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: my-claim-xxx-replicaset-0
ReadOnly: false
my-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-hj65g:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hj65g
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 10m (x4 over 10m) default-scheduler PersistentVolumeClaim is not bound: "my-claim-xxx-replicaset-0" (repeated 5 times)
Normal Scheduled 9m default-scheduler Successfully assigned xxx-replicaset-0 to gke-xxx-cluster-default-pool-5e49501c-nrzt
Normal SuccessfulMountVolume 9m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt MountVolume.SetUp succeeded for volume "my-volume"
Normal SuccessfulMountVolume 9m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt MountVolume.SetUp succeeded for volume "default-token-hj65g"
Normal SuccessfulMountVolume 9m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt MountVolume.SetUp succeeded for volume "pvc-902c57c5-2eb0-11e8-b14e-42010af0000e"
Normal Pulling 9m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt pulling image "gcr.io/sampeproject/xxxstaging:v1"
Normal Pulled 8m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt Successfully pulled image "gcr.io/sampeproject/xxxstaging:v1"
Normal Created 8m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt Created container
Normal Started 8m kubelet, gke-xxx-cluster-default-pool-5e49501c-nrzt Started container
It seems like it did not mount the correct volume. I ran lsblk in the Docker container:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 100G 0 disk
├─sda1 8:1 0 95.9G 0 part /mnt/disks
├─sda2 8:2 0 16M 0 part
├─sda3 8:3 0 2G 0 part
├─sda4 8:4 0 16M 0 part
├─sda5 8:5 0 2G 0 part
├─sda6 8:6 0 512B 0 part
├─sda7 8:7 0 512B 0 part
├─sda8 8:8 0 16M 0 part
├─sda9 8:9 0 512B 0 part
├─sda10 8:10 0 512B 0 part
├─sda11 8:11 0 8M 0 part
└─sda12 8:12 0 32M 0 part
sdb 8:16 0 200G 0 disk
Why is this happening?
When you use PVCs, K8s manages persistent disks for you.
The exact way PVs are defined is determined by the provisioner in the storage class. Since you use GKE, your default StorageClass uses the kubernetes.io/gce-pd provisioner (https://kubernetes.io/docs/concepts/storage/storage-classes/#gce).
In other words, a new PV is created for each pod.
If you would like to use an existing disk, you can use a Volume instead of a PVC (https://kubernetes.io/docs/concepts/storage/volumes/#gcepersistentdisk).
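A minimal sketch of that approach, reusing the disk name from the question (my-disk), replaces the PVC with a gcePersistentDisk volume directly in the pod spec:
      volumes:
      - name: my-volume
        gcePersistentDisk:
          pdName: my-disk
          fsType: ext4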
The PVC is not mounted into your container because you did not actually specify the PVC in your container's volumeMounts. Only the emptyDir volume was specified.
I actually recently modified the GKE StatefulSet tutorial. Before, some of the steps were incorrect and said to manually create the PD and PV objects. It has since been corrected to use dynamic provisioning.
Please try that out and see if the updated steps work for you.
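For completeness, a sketch of the relevant part of the StatefulSet with the mount pointing at the claim template instead of the emptyDir (names taken from the question's manifest):
        volumeMounts:
        - name: my-claim        # must match the volumeClaimTemplate name below
          mountPath: /mnt/disks
      # remove the emptyDir volume named my-volume from spec.template.spec.volumes
  volumeClaimTemplates:
  - metadata:
      name: my-claim
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 200Gi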