I have a problem when I set the kubelet parameter cluster-dns.
My OS is CentOS Linux release 7.0.1406 (Core).
Kernel: Linux master 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
kubelet config file:
KUBELET_HOSTNAME="--hostname-override=master"
#KUBELET_API_SERVER="--api-servers=http://master:8080
KUBECONFIG="--kubeconfig=/root/.kube/config-demo"
KUBELET_DNS="–-cluster-dns=10.254.0.10"
KUBELET_DOMAIN="--cluster-domain=cluster.local"
# Add your own!
KUBELET_ARGS="--cgroup-driver=systemd --fail-swap-on=false --pod_infra_container_image=177.1.1.35/library/pause:latest"
config file:
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=4"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_MASTER="--master=http://master:8080"
kubelet.service file:
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet \
$KUBE_LOGTOSTDERR \
$KUBE_LOG_LEVEL \
$KUBELET_API_SERVER \
$KUBELET_DNS \
$KUBELET_DOMAIN \
$KUBELET_ADDRESS \
$KUBELET_PORT \
$KUBELET_HOSTNAME \
$KUBE_ALLOW_PRIV \
$KUBELET_ARGS \
$KUBECONFIG
Restart=on-failure
KillMode=process
[Install]
WantedBy=multi-user.target
When I start the kubelet service, I can see that the "--cluster-dns=10.254.0.10" parameter is set correctly:
root 29705 1 1 13:24 ? 00:00:16 /usr/bin/kubelet --logtostderr=true --v=4 –-cluster-dns=10.254.0.10 --cluster-domain=cluster.local --hostname-override=master --allow-privileged=false --cgroup-driver=systemd --fail-swap-on=false --pod_infra_container_image=177.1.1.35/library/pause:latest --kubeconfig=/root/.kube/config-demo
But when I check the service with systemctl status kubelet, the cluster-dns parameter has only one "-":
systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-07-13 13:24:07 CST; 5s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 29705 (kubelet)
Memory: 30.6M
CGroup: /system.slice/kubelet.service
└─29705 /usr/bin/kubelet --logtostderr=true --v=4 -cluster-dns=10.254.0.10 --cluster-domain=cluster.local --hostname-override=master --allow-privileged=false --cgroup-driver=systemd --fail-swap-on=false --pod_infra_container_image=177.1.1.35/library/pause:latest --kubeconfig=/root/.kube/config-demo
The logs say that nothing is set in the cluster-dns flag:
Jul 13 13:24:07 master kubelet: I0713 13:24:07.680625 29705 flags.go:27] FLAG: --cluster-dns="[]"
Jul 13 13:24:07 master kubelet: I0713 13:24:07.680636 29705 flags.go:27] FLAG: --cluster-domain="cluster.local"
The pods report this error:
pod: "java-deploy-69c84746b9-b2d7j_default(ce02d183-864f-11e8-9bdb-525400c4f6bf)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
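The fallback in that message is visible from inside a pod: when --cluster-dns is actually applied, the kubelet points a ClusterFirst pod's /etc/resolv.conf nameserver at the cluster DNS IP, whereas after falling back to Default the pod inherits the node's resolver. A minimal Go sketch of that check, assuming the expected IP is 10.254.0.10 (the kube-dns clusterIP in the manifest below); usesClusterDNS is a hypothetical helper, not part of any Kubernetes tooling.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// usesClusterDNS reports whether any "nameserver" line in resolv.conf-style
// text equals the expected cluster DNS IP. Commented lines ("# ...") are
// ignored because their first field is not "nameserver".
func usesClusterDNS(resolvConf, clusterDNS string) bool {
	sc := bufio.NewScanner(strings.NewReader(resolvConf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" && fields[1] == clusterDNS {
			return true
		}
	}
	return false
}

func main() {
	// Run inside a pod; 10.254.0.10 is the clusterIP of the kube-dns Service.
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read resolv.conf:", err)
		os.Exit(1)
	}
	if usesClusterDNS(string(data), "10.254.0.10") {
		fmt.Println("resolver points at cluster DNS (ClusterFirst in effect)")
	} else {
		fmt.Println("cluster DNS not configured (kubelet likely fell back to Default)")
	}
}
```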
My kube-dns config file:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.254.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
---
#apiVersion: v1
#kind: ServiceAccount
#metadata:
# name: kube-dns
# namespace: kube-system
# labels:
# kubernetes.io/cluster-service: "true"
# addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-dns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: EnsureExists
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
# replicas: not specified here:
# 1. In order to make Addon Manager do not reconcile this replicas parameter.
# 2. Default is 1.
# 3. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
volumes:
- name: kube-dns-config
configMap:
name: kube-dns
optional: true
containers:
- name: kubedns
image: 177.1.1.35/library/kube-dns:1.14.8
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthcheck/kubedns
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-dir=/kube-dns-config
- --kube-master-url=http://177.1.1.40:8080
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
volumeMounts:
- name: kube-dns-config
mountPath: /kube-dns-config
- name: dnsmasq
image: 177.1.1.35/library/dnsmasq:1.14.8
livenessProbe:
httpGet:
path: /healthcheck/dnsmasq
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/ip6.arpa/127.0.0.1#10053
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 20Mi
volumeMounts:
- name: kube-dns-config
mountPath: /etc/k8s/dns/dnsmasq-nanny
- name: sidecar
image: 177.1.1.35/library/sidecar:1.14.8
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
- --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
- --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 20Mi
cpu: 10m
dnsPolicy: Default # Don't use cluster DNS.
#serviceAccountName: kube-dns
Recheck your kubelet config:
KUBELET_DNS="–-cluster-dns=10.254.0.10"
It looks to me like the first dash is longer than the second: it is an en dash, not a hyphen.
A copy and paste probably introduced that strange character.
Retype it and try again.
I have an AKS setup with movie-service and nginx ingress deployed:
NAME READY STATUS RESTARTS AGE
movie-service-7bbf464749-ffxh6 1/1 Running 0 45m
nginx-release-nginx-ingress-7c97fd9dd7-qdcjw 1/1 Running 0 3m38s
kubectl describe ingress output:
Name: movie-service
Labels: app.kubernetes.io/instance=movie-service
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=movie-service
app.kubernetes.io/version=1.16.0
helm.sh/chart=movie-service-0.1.0
Namespace: default
Address:
Ingress Class: nginx
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
suchait-ingress.eastus.cloudapp.azure.com
/movie-service(/|$)(.*) movie-service:8080 (10.244.0.8:8080)
Annotations: kubernetes.io/ingress.class: nginx
kubernetes.io/tls-acme: false
meta.helm.sh/release-name: movie-service
meta.helm.sh/release-namespace: default
nginx.com/health-checks: true
nginx.ingress.kubernetes.io/rewrite-target: /$1
nginx.ingress.kubernetes.io/ssl-redirect: false
nginx.ingress.kubernetes.io/use-regex: true
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal AddedOrUpdated 128m nginx-ingress-controller Configuration for default/movie-service was added or updated
Normal AddedOrUpdated 44m (x3 over 44m) nginx-ingress-controller Configuration for default/movie-service was added or updated
But when I check the path configuration inside the nginx container, I can't see anything:
kubectl exec -it nginx-release-nginx-ingress-7c97fd9dd7-qdcjw sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ ls -lrt /etc/nginx/conf.d
total 0
Observations:
1. My movie-service API works fine when I go inside the container and hit it on localhost.
2. The /nginx-health URL works fine and returns 200 OK.
**Note: I have deployed this whole setup using Helm charts.**
Ingress deployment.yml template:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "3"
meta.helm.sh/release-name: nginx-release
meta.helm.sh/release-namespace: default
creationTimestamp: "2022-10-19T17:57:16Z"
generation: 9
labels:
app.kubernetes.io/instance: nginx-release
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: nginx-release-nginx-ingress
helm.sh/chart: nginx-ingress-0.15.0
name: nginx-release-nginx-ingress
namespace: default
resourceVersion: "329441"
uid: 908f50a6-a8a5-49c2-802b-cb5a75aa0299
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-release-nginx-ingress
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "9113"
prometheus.io/scheme: http
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: nginx-release-nginx-ingress
spec:
automountServiceAccountToken: true
containers:
- args:
- -nginx-plus=false
- -nginx-reload-timeout=60000
- -enable-app-protect=false
- -enable-app-protect-dos=false
- -nginx-configmaps=$(POD_NAMESPACE)/nginx-release-nginx-ingress
- -default-server-tls-secret=$(POD_NAMESPACE)/nginx-release-nginx-ingress-default-server-tls
- -ingress-class=nginx
- -health-status=true
- -health-status-uri=/nginx-health
- -nginx-debug=true
- -v=5
- -nginx-status=true
- -nginx-status-port=8080
- -nginx-status-allow-cidrs=127.0.0.1
- -report-ingress-status
- -external-service=nginx-release-nginx-ingress
- -enable-leader-election=true
- -leader-election-lock-name=nginx-release-nginx-ingress-leader-election
- -enable-prometheus-metrics=true
- -prometheus-metrics-listen-port=9113
- -prometheus-tls-secret=
- -enable-custom-resources=true
- -enable-snippets=false
- -include-year=false
- -disable-ipv6=false
- -enable-tls-passthrough=false
- -enable-preview-policies=false
- -enable-cert-manager=false
- -enable-oidc=false
- -enable-external-dns=false
- -ready-status=true
- -ready-status-port=8081
- -enable-latency-metrics=false
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: nginx/nginx-ingress:2.4.0
imagePullPolicy: IfNotPresent
name: nginx-release-nginx-ingress
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
- containerPort: 9113
name: prometheus
protocol: TCP
- containerPort: 8081
name: readiness-port
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /nginx-ready
port: readiness-port
scheme: HTTP
periodSeconds: 1
successThreshold: 1
timeoutSeconds: 1
resources:
requests:
cpu: 100m
memory: 128Mi
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsNonRoot: false
runAsUser: 101
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: nginx-release-nginx-ingress
serviceAccountName: nginx-release-nginx-ingress
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2022-10-19T17:57:16Z"
lastUpdateTime: "2022-10-20T17:13:56Z"
message: ReplicaSet "nginx-release-nginx-ingress-7c97fd9dd7" has successfully
progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2022-10-20T17:27:50Z"
lastUpdateTime: "2022-10-20T17:27:50Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 9
readyReplicas: 1
replicas: 1
updatedReplicas: 1
I have already spent about 4 days trying multiple possible combinations; any help or suggestion would be much appreciated.
Same issue here.
I have k8s installed on VMs in VirtualBox, along with Ingress nginx-2.4.1:
https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
With path: /applike, path-based routing isn't working at all.
With path: /, only the default path works and returns a response from the specified service.
Kindly assist.
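For the regex path in the movie-service Ingress above, /movie-service(/|$)(.*), it helps to see what each capture group actually holds. A minimal Go sketch using the standard regexp package (RE2, close to but not identical to NGINX's PCRE), assuming the path is anchored with ^ and that rewrite-target placeholders are substituted the way the community ingress-nginx controller does; that assumption may not hold for the nginx/nginx-ingress image deployed here, and the sample paths are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// ingressPath is the movie-service Ingress path; the leading ^ is an
// assumption about how the controller anchors regex locations.
var ingressPath = regexp.MustCompile(`^/movie-service(/|$)(.*)`)

// rewrite applies a rewrite-target style template such as "/$1" or "/$2"
// by substituting numbered capture groups. It reports false on no match.
func rewrite(path, target string) (string, bool) {
	m := ingressPath.FindStringSubmatchIndex(path)
	if m == nil {
		return "", false
	}
	return string(ingressPath.ExpandString(nil, target, path, m)), true
}

func main() {
	// Hypothetical request paths; all of them match the pattern.
	for _, p := range []string{"/movie-service", "/movie-service/", "/movie-service/api/movies"} {
		g := ingressPath.FindStringSubmatch(p)
		r1, _ := rewrite(p, "/$1")
		r2, _ := rewrite(p, "/$2")
		fmt.Printf("%-27s $1=%q $2=%q  /$1 -> %q  /$2 -> %q\n", p, g[1], g[2], r1, r2)
	}
}
```

Under that assumption, /$1 forwards only the separator (so /movie-service/api/movies reaches the backend as //), while /$2 forwards the remainder; the ingress-nginx rewrite documentation pairs this path pattern with /$2.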
I'm stuck with an annoying issue where my pod can't access the mounted persistent volume.
Kubeadm: v1.19.2
Docker: 19.03.13
Zookeeper image: library/zookeeper:3.6
Cluster info: locally hosted, no cloud provider
K8s configuration:
apiVersion: v1
kind: Service
metadata:
name: zk-hs
labels:
app: zk
spec:
selector:
app: zk
ports:
- port: 2888
targetPort: 2888
name: server
protocol: TCP
- port: 3888
targetPort: 3888
name: leader-election
protocol: TCP
clusterIP: ""
type: LoadBalancer
---
apiVersion: v1
kind: Service
metadata:
name: zk-cs
labels:
app: zk
spec:
selector:
app: zk
ports:
- name: client
protocol: TCP
port: 2181
targetPort: 2181
type: LoadBalancer
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: zk-pdb
spec:
selector:
matchLabels:
app: zk
maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: zk
spec:
selector:
matchLabels:
app: zk
serviceName: zk-hs
replicas: 1
updateStrategy:
type: RollingUpdate
podManagementPolicy: OrderedReady
template:
metadata:
labels:
app: zk
spec:
volumes:
- name: zoo-config
configMap:
name: zoo-config
- name: datadir
persistentVolumeClaim:
claimName: zoo-pvc
containers:
- name: zookeeper
imagePullPolicy: Always
image: "library/zookeeper:3.6"
resources:
requests:
memory: "1Gi"
cpu: "0.5"
ports:
- containerPort: 2181
name: client
- containerPort: 2888
name: server
- containerPort: 3888
name: leader-election
volumeMounts:
- name: datadir
mountPath: /var/lib/zookeeper/data
- name: zoo-config
mountPath: /conf
securityContext:
fsGroup: 2000
runAsUser: 1000
runAsNonRoot: true
volumeClaimTemplates:
- metadata:
name: datadir
annotations:
volume.beta.kubernetes.io/storage-class: local-storage
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: local-storage
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: zoo-config
namespace: default
data:
zoo.cfg: |
tickTime=10000
dataDir=/var/lib/zookeeper/data
clientPort=2181
initLimit=10
syncLimit=4
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: zoo-pv
  labels:
    type: local
spec:
  storageClassName: local-storage
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/data"
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <node-name>
I've tried running the pod as root with the following security context, purely as a test; I know it's a terrible idea. This, however, caused a bunch of other issues.
securityContext:
  fsGroup: 0
  runAsUser: 0
Once the pod starts up, the logs contain the following:
Zookeeper JMX enabled by default
Using config: /conf/zoo.cfg
<log4j Warnings>
Unable to access datadir, exiting abnormally
Inspecting the pod provides the following information:
~$ kubectl describe pod/zk-0
Name: zk-0
Namespace: default
Priority: 0
Node: <node>
Start Time: Sat, 26 Sep 2020 15:48:00 +0200
Labels: app=zk
controller-revision-hash=zk-6c68989bd
statefulset.kubernetes.io/pod-name=zk-0
Annotations: <none>
Status: Running
IP: <IP>
IPs:
IP: <IP>
Controlled By: StatefulSet/zk
Containers:
zookeeper:
Container ID: docker://281e177d677394604785542c231d21b71f1666a22e74c1c10ef88491dad7a522
Image: library/zookeeper:3.6
Image ID: docker-pullable://zookeeper#sha256:6c051390cfae7958ff427834937c353fc6c34484f6a84b3e4bc8c512b53a16f6
Ports: 2181/TCP, 2888/TCP, 3888/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 3
Started: Sat, 26 Sep 2020 16:04:26 +0200
Finished: Sat, 26 Sep 2020 16:04:27 +0200
Ready: False
Restart Count: 8
Requests:
cpu: 500m
memory: 1Gi
Environment: <none>
Mounts:
/conf from zoo-config (rw)
/var/lib/zookeeper/data from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-88x56 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-zk-0
ReadOnly: false
zoo-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: zoo-config
Optional: false
default-token-88x56:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-88x56
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned default/zk-0 to <node>
Normal Pulled 17m kubelet Successfully pulled image "library/zookeeper:3.6" in 1.932381527s
Normal Pulled 17m kubelet Successfully pulled image "library/zookeeper:3.6" in 1.960610662s
Normal Pulled 17m kubelet Successfully pulled image "library/zookeeper:3.6" in 1.959935633s
Normal Created 16m (x4 over 17m) kubelet Created container zookeeper
Normal Pulled 16m kubelet Successfully pulled image "library/zookeeper:3.6" in 1.92551645s
Normal Started 16m (x4 over 17m) kubelet Started container zookeeper
Normal Pulling 15m (x5 over 17m) kubelet Pulling image "library/zookeeper:3.6"
Warning BackOff 2m35s (x71 over 17m) kubelet Back-off restarting failed container
To me, it seems like the pod has full rw access to the volume, so I'm unsure why it's still refusing to access the directory. Any help will be appreciated!
After quite some digging, I finally figured out why it wasn't working. In the end, the logs were telling me all I needed to know: the mounted PersistentVolumeClaim simply did not have the correct file permissions to read from the mounted hostPath directory /mnt/data.
To fix this, in a somewhat hacky way, I gave read, write & execute permissions to all.
chmod 777 /mnt/data
This is definitely not the most secure way of fixing the issue, and I would strongly advise against using it in any production-like environment.
A better approach would probably be the following:
sudo usermod -a -G 1000 1000
I'm running a Kubernetes cluster on Google Cloud Platform via Google Kubernetes Engine. The cluster version is 1.13.11-gke.14. The PHP application pod contains 2 containers: Nginx as a reverse proxy and php-fpm (7.2).
In Google Cloud, a TCP load balancer is used, followed by internal routing via the Nginx Ingress.
The problem:
when I upload a bigger file (17 MB), the ingress crashes with this error:
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX:
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required.
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b
E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:13.156963Z Exiting with 0
Here is the YAML configuration of the Nginx ingress. It is the default configuration created by GitLab's system, which sets up the cluster on its own.
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2019-11-24T17:35:04Z"
generation: 3
labels:
app: nginx-ingress
chart: nginx-ingress-1.22.1
component: controller
heritage: Tiller
release: ingress
name: ingress-nginx-ingress-controller
namespace: gitlab-managed-apps
resourceVersion: "2638973"
selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-ingress
release: ingress
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "10254"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: nginx-ingress
component: controller
release: ingress
spec:
containers:
- args:
- /nginx-ingress-controller
- --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
name: nginx-ingress-controller
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
resources: {}
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nginx/modsecurity/modsecurity.conf
name: modsecurity-template-volume
subPath: modsecurity.conf
- mountPath: /var/log/modsec
name: modsecurity-log-volume
- args:
- /bin/sh
- -c
- tail -f /var/log/modsec/audit.log
image: busybox
imagePullPolicy: Always
name: modsecurity-log
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/log/modsec
name: modsecurity-log-volume
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: ingress-nginx-ingress
serviceAccountName: ingress-nginx-ingress
terminationGracePeriodSeconds: 60
volumes:
- configMap:
defaultMode: 420
items:
- key: modsecurity.conf
path: modsecurity.conf
name: ingress-nginx-ingress-controller
name: modsecurity-template-volume
- emptyDir: {}
name: modsecurity-log-volume
I have no idea what else to try. I'm running the cluster on 3 nodes (2x 1 vCPU, 1.5 GB RAM and 1x preemptible 2 vCPU, 1.8 GB RAM), all of them on SSD drives.
Any time I upload the image, disk I/O goes crazy:
(Graphs: Disk IOPS, Disk I/O)
Thanks for your help.
Found the solution. The nginx-ingress pod also contained ModSecurity. All requests were analyzed by ModSecurity, and the bigger uploaded files caused those crashes. It wasn't really a crash: ModSecurity took so much CPU and I/O that health-check responses for all other pods got slower. The solution is to configure ModSecurity correctly or disable it.
I'm having a few issues getting Ambassador to work correctly. I'm new to Kubernetes and teaching myself.
I have successfully worked through the demo material Ambassador provides, e.g. the /httpbin/ endpoint works correctly, but when I try to deploy a Go service it falls over.
When I hit the 'qotm' endpoint, this is the response:
upstream request timeout
Pod status:
CrashLoopBackOff
From my research, it seems to be related to the YAML file not being configured correctly, but I'm struggling to find any documentation for this use case.
My cluster is running on AWS EKS and the images are being pushed to AWS ECR.
main.go:
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	var PORT string
	if PORT = os.Getenv("PORT"); PORT == "" {
		PORT = "3001"
	}
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Hello World from path: %s\n", r.URL.Path)
	})
	// Report listen errors instead of exiting silently.
	log.Fatal(http.ListenAndServe(":"+PORT, nil))
}
Dockerfile:
FROM golang:alpine
ADD ./src /go/src/app
WORKDIR /go/src/app
EXPOSE 3001
ENV PORT=3001
CMD ["go", "run", "main.go"]
test.yaml:
apiVersion: v1
kind: Service
metadata:
  name: qotm
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v1
      kind: Mapping
      name: qotm_mapping
      prefix: /qotm/
      service: qotm
spec:
  selector:
    app: qotm
  ports:
  - port: 80
    name: http-qotm
    targetPort: http-api
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: qotm
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: qotm
    spec:
      containers:
      - name: qotm
        image: ||REMOVED||
        ports:
        - name: http-api
          containerPort: 3001
        readinessProbe:
          httpGet:
            path: /health
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 3
        resources:
          limits:
            cpu: "0.1"
            memory: 100Mi
Pod description:
Name: qotm-7b9bf4d499-v9nxq
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: ip-192-168-89-69.eu-west-1.compute.internal/192.168.89.69
Start Time: Sun, 17 Mar 2019 17:19:50 +0000
Labels: app=qotm
pod-template-hash=3656908055
Annotations: <none>
Status: Running
IP: 192.168.113.23
Controlled By: ReplicaSet/qotm-7b9bf4d499
Containers:
qotm:
Container ID: docker://5839996e48b252ac61f604d348a98c47c53225712efd503b7c3d7e4c736920c4
Image: IMGURL
Image ID: docker-pullable://IMGURL
Port: 3001/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 17 Mar 2019 17:30:49 +0000
Finished: Sun, 17 Mar 2019 17:30:49 +0000
Ready: False
Restart Count: 7
Limits:
cpu: 100m
memory: 200Mi
Requests:
cpu: 100m
memory: 200Mi
Readiness: http-get http://:3001/health delay=30s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5bbxw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-5bbxw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5bbxw
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned default/qotm-7b9bf4d499-v9nxq to ip-192-168-89-69.eu-west-1.compute.internal
Normal Pulled 10m (x5 over 12m) kubelet, ip-192-168-89-69.eu-west-1.compute.internal Container image "IMGURL" already present on machine
Normal Created 10m (x5 over 12m) kubelet, ip-192-168-89-69.eu-west-1.compute.internal Created container
Normal Started 10m (x5 over 11m) kubelet, ip-192-168-89-69.eu-west-1.compute.internal Started container
Warning BackOff 115s (x47 over 11m) kubelet, ip-192-168-89-69.eu-west-1.compute.internal Back-off restarting failed container
In your Kubernetes deployment file, the readiness probe targets port 5000, while your application listens on port 3001. Also, when I ran the container a few times it got OOMKilled, so I increased the memory limit. The deployment file below should work:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: qotm
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: qotm
    spec:
      containers:
      - name: qotm
        image: <YOUR_IMAGE>
        imagePullPolicy: Always
        ports:
        - name: http-api
          containerPort: 3001
        readinessProbe:
          httpGet:
            path: /health
            port: 3001
          initialDelaySeconds: 30
          periodSeconds: 3
        resources:
          limits:
            cpu: "0.1"
            memory: 200Mi
I'm trying to do something very simple and it doesn't work. I must be doing something stupid but I just can't see it. I hope someone can...
When I run the rabbitmq:latest Docker image on my local Docker, I can connect to it successfully:
docker run -p 5672:5672 -d rabbitmq
telnet <dockerMachineIp> 5672
Trying x.y.z.w...
Connected to x.y.z.w.
Now I'm deploying the image into k8s:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
run: rabbitmq
name: rabbitmq
namespace: uat
spec:
replicas: 1
selector:
matchLabels:
env: uat
run: rabbitmq
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
env: uat
run: rabbitmq
spec:
containers:
- image: rabbitmq
imagePullPolicy: Always
name: rabbitmq
ports:
- containerPort: 5672
protocol: TCP
readinessProbe:
failureThreshold: 3
tcpSocket:
port: 5672
initialDelaySeconds: 15
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
livenessProbe:
failureThreshold: 3
tcpSocket:
port: 5672
initialDelaySeconds: 15
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: 100m
memory: 150Mi
requests:
cpu: 100m
memory: 150Mi
terminationMessagePath: /dev/termination-log
dnsPolicy: ClusterFirst
restartPolicy: Always
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
observedGeneration: 2
replicas: 1
updatedReplicas: 1
And I create a service for it:
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  namespace: uat
spec:
  ports:
  - name: tcp5672
    port: 5672
    protocol: TCP
    targetPort: 5672
  selector:
    run: rabbitmq
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
The image successfully deploys:
2016-08-16T06:08:15.903787400Z =INFO REPORT==== 16-Aug-2016::06:08:15 ===
2016-08-16T06:08:15.903793115Z started TCP Listener on [::]:5672
2016-08-16T06:08:15.911128257Z completed with 0 plugins.
2016-08-16T06:08:15.911479872Z
2016-08-16T06:08:15.911492347Z =INFO REPORT==== 16-Aug-2016::06:08:15 ===
2016-08-16T06:08:15.911497759Z Server startup complete; 0 plugins started.
2016-08-16T06:11:00.901609310Z
But after this, my other application, which tries to connect to tcp://rabbitmq:5672, receives a Connection Refused. When I test it myself:
kubectl run --namespace uat -i --tty busybox --image=busybox --restart=Never -- sh
/ # telnet rabbitmq 5672
Connection closed by foreign host
In the rabbitmq logs I can see:
2016-08-16T07:38:48.465296167Z =INFO REPORT==== 16-Aug-2016::07:38:48 ===
2016-08-16T07:38:48.465302171Z accepting AMQP connection <0.3666.0> (10.244.66.6:50968 -> 10.244.64.4:5672)
2016-08-16T07:38:48.465391749Z
2016-08-16T07:38:48.465408673Z =ERROR REPORT==== 16-Aug-2016::07:38:48 ===
2016-08-16T07:38:48.465414738Z closing AMQP connection <0.3666.0> (10.244.66.6:50968 -> 10.244.64.4:5672):
2016-08-16T07:38:48.465420105Z {handshake_timeout,handshake}
What I'm doing here is so simple I don't see what I missed.
EDIT
I had set this issue aside because I had no time to work on it. When I tried again a few weeks later, it just started working. I didn't change the version of k8s or make any changes to the infrastructure, so I'm afraid I don't know what happened.
5672 is the AMQP port; it is not accessible using HTTP.
The management UI uses port 15672, but you have to enable it:
rabbitmq-plugins enable rabbitmq_management
See https://www.rabbitmq.com/management.html
Then you can use: http://server-name:15672/