I'm building a jenkins cluster in Amazon EKS and am trying to register Jenkins with the AWS Load Balancer Controller. I could use a bit of advice from some more experienced folks.
Here is my values for Jenkins helm3 install (I'm still a bit new at helm):
clusterZone: "cluster.local"
renderHelmLabels: true
controller:
componentName: "jenkins-controller"
image: "jenkins/jenkins"
tag: "2.263.3"
imagePullPolicy: "Always"
adminUser: "admin"
adminPassword: "admin"
jenkinsHome: "/var/jenkins_home"
jenkinsWar: "/usr/share/jenkins/jenkins.war"
resources:
requests:
cpu: "50m"
memory: "256Mi"
limits:
cpu: "2000m"
memory: "4096Mi"
usePodSecurityContext: true
runAsUser: 1000
fsGroup: 1000
servicePort: 8080
targetPort: 8080
serviceType: NodePort
serviceAnnotations:
alb.ingress.kubernetes.io/healthcheck-path: '{{ default "" .Values.controller.jenkinsUriPrefix }}/login'
alb.ingress.kubernetes.io/group.name: "jenkins-ingress"
healthProbes: true
probes:
startupProbe:
httpGet:
path: '{{ default "" .Values.controller.jenkinsUriPrefix }}/login'
port: http
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 12
livenessProbe:
failureThreshold: 5
httpGet:
path: '{{ default "" .Values.controller.jenkinsUriPrefix }}/login'
port: http
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
path: '{{ default "" .Values.controller.jenkinsUriPrefix }}/login'
port: http
periodSeconds: 10
timeoutSeconds: 5
agentListenerPort: 50000
agentListenerHostPort:
disabledAgentProtocols:
- JNLP-connect
- JNLP2-connect
csrf:
defaultCrumbIssuer:
enabled: true
proxyCompatability: true
agentListenerServiceType: "ClusterIP"
installPlugins:
- kubernetes:1.29.0
- workflow-aggregator:2.6
- git:4.5.2
- configuration-as-code:1.47
JCasC:
defaultConfig: true
securityRealm: |-
local:
allowsSignup: false
enableCaptcha: false
users:
- id: "${chart-admin-username}"
name: "Jenkins Admin"
password: "${chart-admin-password}"
authorizationStrategy: |-
loggedInUsersCanDoAnything:
allowAnonymousRead: false
sidecars:
configAutoReload:
enabled: true
image: kiwigrid/k8s-sidecar:0.1.275
imagePullPolicy: IfNotPresent
reqRetryConnect: 10
sshTcpPort: 1044
folder: "/var/jenkins_home/casc_configs"
ingress:
enabled: true
paths:
- backend:
serviceName: >-
{{ template "jenkins.fullname" . }}
servicePort: 8080
# path: "/jenkins"
apiVersion: "extensions/v1beta1"
annotations:
alb.ingress.kubernetes.io/group.name: "jenkins-ingress"
kubernetes.io/ingress.class: "alb"
persistence:
enabled: true
existingClaim: jenkins-0-claim
rbac:
create: true
readSecrets: false
serviceAccount:
create: true
name: "jenkins"
Here is the contents of my ingress.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
annotations:
alb.ingress.kubernetes.io/group.name: jenkins-ingress
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}, {"HTTP":
8080}, {"HTTPS": 8443}]'
alb.ingress.kubernetes.io/scheme: internal
alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
kubernetes.io/ingress.class: alb
name: app-ingress
namespace: default
spec:
rules:
- http:
paths:
- backend:
serviceName: app1-nginx-nodeport-service
servicePort: 80
path: /app1/*
- backend:
serviceName: app2-nginx-nodeport-service
servicePort: 80
path: /app2/*
- backend:
serviceName: app3-nginx-nodeport-service
servicePort: 80
path: /app3/*
- backend:
serviceName: jenkins
servicePort: 8080
path: /jenkins/*
Here is the error, I suspect it is due to the namespace. Jenkins is in it's own namespace:
❯ kubectl describe ingress app-ingress
Name: app-ingress
Namespace: default
Address: internal-k8s-jenkinsingress-9f4e69d9f1-2066345703.us-west-2.elb.amazonaws.com
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
*
/app1/* app1-nginx-nodeport-service:80 (10.216.66.254:80)
/app2/* app2-nginx-nodeport-service:80 (10.216.66.248:80)
/app3/* app3-nginx-nodeport-service:80 (10.216.66.174:80)
/jenkins/* jenkins:8080 (<error: endpoints "jenkins" not found>)
Annotations: alb.ingress.kubernetes.io/group.name: jenkins-ingress
alb.ingress.kubernetes.io/listen-ports: [{"HTTP": 80}, {"HTTPS": 443}, {"HTTP": 8080}, {"HTTPS": 8443}]
alb.ingress.kubernetes.io/scheme: internal
alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
kubernetes.io/ingress.class: alb
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 35m (x16 over 37m) ingress Failed deploy model due to InvalidParameter: 1 validation error(s) found.
- minimum field value of 1, CreateTargetGroupInput.Port.
Warning FailedBuildModel 7m2s (x15 over 34m) ingress Failed build model due to ingress: default/app-ingress: Service "jenkins" not found
I was able to resolve my issue. Turns out I was defining the the jenkins path in too many places. I removed it from the primary ingress definition and altered my jenkins helm values.
I also set service type to NodePort instead of ClusterIP
Removed this from app-ingress.yaml:
- backend:
serviceName: jenkins
servicePort: 8080
path: /jenkins/*
Removed path value from jenkins helm ingress definition and set the jenkinsUriPrefix to "/jenkins".
Related
I have a setup of aks with
movie-service deployed
nginx ingress deployed
NAME READY STATUS RESTARTS AGE
movie-service-7bbf464749-ffxh6 1/1 Running 0 45m
nginx-release-nginx-ingress-7c97fd9dd7-qdcjw 1/1 Running 0 3m38s
kubectl describe ingress output
Name: movie-service
Labels: app.kubernetes.io/instance=movie-service
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=movie-service
app.kubernetes.io/version=1.16.0
helm.sh/chart=movie-service-0.1.0
Namespace: default
Address:
Ingress Class: nginx
Default backend: <default>
Rules:
Host Path Backends
---- ---- --------
suchait-ingress.eastus.cloudapp.azure.com
/movie-service(/|$)(.*) movie-service:8080 (10.244.0.8:8080)
Annotations: kubernetes.io/ingress.class: nginx
kubernetes.io/tls-acme: false
meta.helm.sh/release-name: movie-service
meta.helm.sh/release-namespace: default
nginx.com/health-checks: true
nginx.ingress.kubernetes.io/rewrite-target: /$1
nginx.ingress.kubernetes.io/ssl-redirect: false
nginx.ingress.kubernetes.io/use-regex: true
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal AddedOrUpdated 128m nginx-ingress-controller Configuration for default/movie-service was added or updated
Normal AddedOrUpdated 44m (x3 over 44m) nginx-ingress-controller Configuration for default/movie-service was added or updated
But when I check my path configuration inside nginx container I can't see anything
kubectl exec -it nginx-release-nginx-ingress-7c97fd9dd7-qdcjw sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
$ ls -lrt /etc/nginx/conf.d
total 0 ```
Observations :
1. My movie-service api is working fine - when i go inside container and hit with localhost.
2. /nginx-health url works fine and gives 200 OK response.
**Note : I have deployed this whole setup using helm charts **
ingress deployment.yml template
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "3"
meta.helm.sh/release-name: nginx-release
meta.helm.sh/release-namespace: default
creationTimestamp: "2022-10-19T17:57:16Z"
generation: 9
labels:
app.kubernetes.io/instance: nginx-release
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: nginx-release-nginx-ingress
helm.sh/chart: nginx-ingress-0.15.0
name: nginx-release-nginx-ingress
namespace: default
resourceVersion: "329441"
uid: 908f50a6-a8a5-49c2-802b-cb5a75aa0299
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-release-nginx-ingress
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "9113"
prometheus.io/scheme: http
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: nginx-release-nginx-ingress
spec:
automountServiceAccountToken: true
containers:
- args:
- -nginx-plus=false
- -nginx-reload-timeout=60000
- -enable-app-protect=false
- -enable-app-protect-dos=false
- -nginx-configmaps=$(POD_NAMESPACE)/nginx-release-nginx-ingress
- -default-server-tls-secret=$(POD_NAMESPACE)/nginx-release-nginx-ingress-default-server-tls
- -ingress-class=nginx
- -health-status=true
- -health-status-uri=/nginx-health
- -nginx-debug=true
- -v=5
- -nginx-status=true
- -nginx-status-port=8080
- -nginx-status-allow-cidrs=127.0.0.1
- -report-ingress-status
- -external-service=nginx-release-nginx-ingress
- -enable-leader-election=true
- -leader-election-lock-name=nginx-release-nginx-ingress-leader-election
- -enable-prometheus-metrics=true
- -prometheus-metrics-listen-port=9113
- -prometheus-tls-secret=
- -enable-custom-resources=true
- -enable-snippets=false
- -include-year=false
- -disable-ipv6=false
- -enable-tls-passthrough=false
- -enable-preview-policies=false
- -enable-cert-manager=false
- -enable-oidc=false
- -enable-external-dns=false
- -ready-status=true
- -ready-status-port=8081
- -enable-latency-metrics=false
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
image: nginx/nginx-ingress:2.4.0
imagePullPolicy: IfNotPresent
name: nginx-release-nginx-ingress
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
- containerPort: 9113
name: prometheus
protocol: TCP
- containerPort: 8081
name: readiness-port
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /nginx-ready
port: readiness-port
scheme: HTTP
periodSeconds: 1
successThreshold: 1
timeoutSeconds: 1
resources:
requests:
cpu: 100m
memory: 128Mi
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsNonRoot: false
runAsUser: 101
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: nginx-release-nginx-ingress
serviceAccountName: nginx-release-nginx-ingress
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2022-10-19T17:57:16Z"
lastUpdateTime: "2022-10-20T17:13:56Z"
message: ReplicaSet "nginx-release-nginx-ingress-7c97fd9dd7" has successfully
progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2022-10-20T17:27:50Z"
lastUpdateTime: "2022-10-20T17:27:50Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 9
readyReplicas: 1
replicas: 1
updatedReplicas: 1
I have spent like 4 days already by trying multiple possible combinations, any help or suggestion will be much appreciated.
Same issue out here.
have k8 installed using VMS in virtual box along with Ingress nginx-2.4.1
https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
path: /applike
path based routing isint working at all......
path: / - just the default path works and provides a response from the specified service.
Kindly assist.
Currently I am testing on Windows using Docker Desktop with Kubernetes feature on.
I want to stream RTMP data over TCP through the Ingress Controller.
I followed the NGINX controller installation guide https://kubernetes.github.io/ingress-nginx/deploy/ and I tried to configure the TCP like https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/
Please note - --tcp-services-configmap=rtmp/tcp-services
If I push data through port 1936 the connection cannot be established. If I try with 1935 it works. I would like to have the Ingress controller route the traffic to my service and get rid of the LoadBalancer since it doesn't really make sense to have one balancer after another.
With the following configuration I was expecting that sending data to 1936 would work.
Am I missing something?
apiVersion: v1
kind: Service
metadata:
name: restreamer1-service
namespace: rtmp
spec:
type: LoadBalancer
selector:
app: restreamer1-service
ports:
- protocol: TCP
port: 1935
targetPort: 1935
name: rtml-com
- protocol: TCP
port: 8080
targetPort: 8080
name: http-com
---
apiVersion: v1
kind: ConfigMap
metadata:
name: tcp-services
namespace: rtmp
data:
1936: "rtmp/restreamer1-service:1935"
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
helm.sh/chart: ingress-nginx-3.23.0
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/version: 0.44.0
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/component: controller
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
selector:
matchLabels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
revisionHistoryLimit: 10
minReadySeconds: 0
template:
metadata:
labels:
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/component: controller
spec:
dnsPolicy: ClusterFirst
containers:
- name: controller
image: k8s.gcr.io/ingress-nginx/controller:v0.44.0#sha256:3dd0fac48073beaca2d67a78c746c7593f9c575168a17139a9955a82c63c4b9a
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
exec:
command:
- /wait-shutdown
args:
- /nginx-ingress-controller
- --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
- --tcp-services-configmap=rtmp/tcp-services
- --validating-webhook=:8443
- --validating-webhook-certificate=/usr/local/certificates/cert
- --validating-webhook-key=/usr/local/certificates/key
securityContext:
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
runAsUser: 101
allowPrivilegeEscalation: true
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: LD_PRELOAD
value: /usr/local/lib/libmimalloc.so
livenessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 3
ports:
- name: http
containerPort: 80
protocol: TCP
- name: https
containerPort: 443
protocol: TCP
- name: webhook
containerPort: 8443
protocol: TCP
volumeMounts:
- name: webhook-cert
mountPath: /usr/local/certificates/
readOnly: true
resources:
requests:
cpu: 100m
memory: 90Mi
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: ingress-nginx
terminationGracePeriodSeconds: 300
volumes:
- name: webhook-cert
secret:
secretName: ingress-nginx-admission
I have a private gke cluster. It contains 3 nodes (each has 2 CPUs and 7.5GB of memory) and 3 pods' replica (it's a .NET Core application). I've noticed that my containers sometimes restart with "error 139 SIGSEGV", which says that there is problem with accessing memory:
This occurs when a program attempts to access a memory location that it’s not allowed to access, or attempts to access a memory location in a way that’s not allowed.
I don't have application logs with the error before restarting the container, therefore it's impossible to debug it.
I've added a property false
in application but it didn't solve the problem.
How can I fix this problem?
Manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: stage-deployment
namespace: stage
spec:
replicas: 3
minReadySeconds: 5
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: stage
template:
metadata:
labels:
app: stage
spec:
containers:
- name: stage-container
image: my.registry/stage/core:latest
imagePullPolicy: "Always"
ports:
- containerPort: 5000
name: http
- containerPort: 22
name: ssh
readinessProbe:
tcpSocket:
port: 5000
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
tcpSocket:
port: 5000
initialDelaySeconds: 5
periodSeconds: 20
env:
- name: POSTGRES_DB_HOST
value: 127.0.0.1:5432
- name: POSTGRES_DB_USER
valueFrom:
secretKeyRef:
name: db-credentials
key: username
- name: POSTGRES_DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
- name: DB_NAME
valueFrom:
secretKeyRef:
name: db-credentials
key: dbname
- name: cloudsql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.11
command: ["/cloud_sql_proxy",
"-instances=my-instance:us-west1:dbserver=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
volumeMounts:
- name: instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
volumes:
- name: instance-credentials
secret:
secretName: instance-credentials
imagePullSecrets:
- name: regcred
---
apiVersion: v1
kind: Service
metadata:
name: stage-service
namespace: stage
spec:
type: NodePort
selector:
app: stage
ports:
- protocol: TCP
port: 80
targetPort: 5000
name: https
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: 300m
nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
nginx.ingress.kubernetes.io/proxy-buffers-number: 4 256k
nginx.org/client-max-body-size: 1000m
name: ingress
namespace: stage
spec:
rules:
- host: my.site.com
http:
paths:
- backend:
serviceName: stage-service
servicePort: 80
tls:
- hosts:
- my.site.com
secretName: my-certs
i'm new to openshift/kubernetes/docker and i was wondering where the docker registry of openshift origin persist the images , knowing that :
1.in the deployment's yaml of the docker registry , there is only emptyDir volumes declaration
volumes:
- emptyDir: {}
name: registry-storage
2.in the machine where the pod is deployed i can't see no volume using
docker volumes ls
3.the images are still persisted even if i restart the pod
docker registry deployment's yaml :
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
creationTimestamp: '2020-04-26T18:16:50Z'
generation: 1
labels:
docker-registry: default
name: docker-registry
namespace: default
resourceVersion: '1844231'
selfLink: >-
/apis/apps.openshift.io/v1/namespaces/default/deploymentconfigs/docker-registry
uid: 1983153d-87ea-11ea-a4bc-fa163ee581f7
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
docker-registry: default
strategy:
activeDeadlineSeconds: 21600
resources: {}
rollingParams:
intervalSeconds: 1
maxSurge: 25%
maxUnavailable: 25%
timeoutSeconds: 600
updatePeriodSeconds: 1
type: Rolling
template:
metadata:
creationTimestamp: null
labels:
docker-registry: default
spec:
containers:
- env:
- name: REGISTRY_HTTP_ADDR
value: ':5000'
- name: REGISTRY_HTTP_NET
value: tcp
- name: REGISTRY_HTTP_SECRET
value:
- name: REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA
value: 'false'
- name: OPENSHIFT_DEFAULT_REGISTRY
value: 'docker-registry.default.svc:5000'
- name: REGISTRY_HTTP_TLS_CERTIFICATE
value: /etc/secrets/registry.crt
- name: REGISTRY_OPENSHIFT_SERVER_ADDR
value: 'docker-registry.default.svc:5000'
- name: REGISTRY_HTTP_TLS_KEY
value: /etc/secrets/registry.key
image: 'docker.io/openshift/origin-docker-registry:v3.11'
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 5000
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: registry
ports:
- containerPort: 5000
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 5000
scheme: HTTPS
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
requests:
cpu: 100m
memory: 256Mi
securityContext:
privileged: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /registry
name: registry-storage
- mountPath: /etc/secrets
name: registry-certificates
dnsPolicy: ClusterFirst
nodeSelector:
node-role.kubernetes.io/infra: 'true'
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: registry
serviceAccountName: registry
terminationGracePeriodSeconds: 30
volumes:
- emptyDir: {}
name: registry-storage
- name: registry-certificates
secret:
defaultMode: 420
secretName: registry-certificates
test: false
triggers:
- type: ConfigChange
status:
availableReplicas: 1
conditions:
- lastTransitionTime: '2020-04-26T18:17:12Z'
lastUpdateTime: '2020-04-26T18:17:12Z'
message: replication controller "docker-registry-1" successfully rolled out
reason: NewReplicationControllerAvailable
status: 'True'
type: Progressing
- lastTransitionTime: '2020-05-05T09:39:57Z'
lastUpdateTime: '2020-05-05T09:39:57Z'
message: Deployment config has minimum availability.
status: 'True'
type: Available
details:
causes:
- type: ConfigChange
message: config change
latestVersion: 1
observedGeneration: 1
readyReplicas: 1
replicas: 1
unavailableReplicas: 0
updatedReplicas: 1
to restart : i just delete the pod and a new one is created since i'm using a deployment
i'm creating the file in the /registry
Restarting does not mean the data is deleted, it still exist in the container top layer, suggest you get started by reading this.
Persistence is, for example in Kubernetes, when a pod is deleted and re-created on another node and still maintains the same state of a volume.
I'am running Kubernetes cluster on Google Cloud Platform via their Kubernetes Engine. Cluster version is 1.13.11-gke.14. PHP application pod contains 2 containers - Nginx as a reverse proxy and php-fpm (7.2).
In google cloud is used TCP Load Balancer and then internal routing via Nginx Ingress.
Problem is:
when I upload some bigger file (17MB), ingress is crashing with this error:
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX:
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required.
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b
E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:13.156963Z Exiting with 0
Here is yaml configuration of Nginx ingress. Configuration is default by Gitlab's system that is creating cluster on their own.
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2019-11-24T17:35:04Z"
generation: 3
labels:
app: nginx-ingress
chart: nginx-ingress-1.22.1
component: controller
heritage: Tiller
release: ingress
name: ingress-nginx-ingress-controller
namespace: gitlab-managed-apps
resourceVersion: "2638973"
selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-ingress
release: ingress
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "10254"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: nginx-ingress
component: controller
release: ingress
spec:
containers:
- args:
- /nginx-ingress-controller
- --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
name: nginx-ingress-controller
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
resources: {}
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nginx/modsecurity/modsecurity.conf
name: modsecurity-template-volume
subPath: modsecurity.conf
- mountPath: /var/log/modsec
name: modsecurity-log-volume
- args:
- /bin/sh
- -c
- tail -f /var/log/modsec/audit.log
image: busybox
imagePullPolicy: Always
name: modsecurity-log
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/log/modsec
name: modsecurity-log-volume
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: ingress-nginx-ingress
serviceAccountName: ingress-nginx-ingress
terminationGracePeriodSeconds: 60
volumes:
- configMap:
defaultMode: 420
items:
- key: modsecurity.conf
path: modsecurity.conf
name: ingress-nginx-ingress-controller
name: modsecurity-template-volume
- emptyDir: {}
name: modsecurity-log-volume
I have no Idea what else to try. I'm running cluster on 3 nodes (2x 1vCPU, 1.5GB RAM and 1x Preemptile 2vCPU, 1,8GB RAM), all of them on SSD drives.
Anytime i upload the image, disk IO will get crazy.
Disk IOPS
Disk I/O
Thanks for your help.
Found solution. Nginx-ingress pod contained modsecurity too. All requests were analyzed by mod security and bigger uploaded files caused those crashes. It wasn't crash at all but took too much CPU and I/O, that caused longer healthcheck response to all other pods. Solution is to configure correctly modsecurity or disable.