My kubernetes pod status is Init:CrashLoopBackOff using seldon core - seldon-core

I used the official Seldon Core manifest to deploy a simple hello world application:
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: outliersdetector
spec:
  name: outliersdetector
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: gcr.io/project-staging-310806/ar:latest
          name: outliersdetector
          env:
          - name: PORT
            value: 8080
          ports:
          - containerPort: 3000
          livenessProbe:
            initialDelaySeconds: 60
            failureThreshold: 100
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /
              port: 8080
              #scheme: HTTP
          readinessProbe:
            initialDelaySeconds: 60
            failureThreshold: 100
            periodSeconds: 5
            successThreshold: 1
            httpGet:
              path: /
              port: 8080
              #scheme: HTTP
    graph:
      children: []
      name: outliersdetector
      type: MODEL
      endpoint:
        type: REST
    name: outliersdetector
    replicas: 1
This is the manifest I used. After kubectl apply, the pod status is CrashLoopBackOff, and kubectl describe shows "Back-off restarting failed container". Can anyone tell me the reason behind it?
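A couple of things stand out, offered as possibilities rather than a confirmed diagnosis. The Init: prefix in the status means an init container (not the model container itself) is crash-looping, so kubectl describe pod and the init container's logs are the first place to look. Also, the manifest mixes ports: containerPort is 3000 while the PORT variable and both probes point at 8080, and the env value 8080 is an unquoted integer where a pod spec expects a string. Below is a minimal sketch of a consistent container spec, assuming the image really serves HTTP on 8080:
# Sketch only, not a verified fix: keep the serving port, the PORT env var,
# the containerPort, and both probes aligned on the same port.
# Check the failing init container first:
#   kubectl describe pod <pod-name>
#   kubectl logs <pod-name> -c <init-container-name>
containers:
- image: gcr.io/project-staging-310806/ar:latest
  name: outliersdetector
  env:
  - name: PORT
    value: "8080"          # env values must be strings in a pod spec
  ports:
  - containerPort: 8080    # assumption: the image listens on 8080, not 3000
  livenessProbe:
    httpGet:
      path: /
      port: 8080
    initialDelaySeconds: 60
    periodSeconds: 5
  readinessProbe:
    httpGet:
      path: /
      port: 8080
    initialDelaySeconds: 60
    periodSeconds: 5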

Related

NextJs deployment on kubernetes showing 404 on static resource

I have a web application written in NextJs. To deploy it on Kubernetes, I first wrote the following Dockerfile:
# pull official base image
FROM node:12-alpine as node
# set working directory
WORKDIR /app
ADD ./package.json ./
COPY . .
RUN apk add --no-cache git curl
RUN yarn
RUN yarn build
CMD ["yarn", "start"]
And the following is my Kubernetes deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: www-server
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: www-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: www-server
    spec:
      containers:
      - image: www-server:cc57243
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: server
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 1
        name: www-server
        ports:
        - containerPort: 3000
          name: server
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: server
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 300m
            memory: 500Mi
        startupProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: server
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 15
          successThreshold: 1
          timeoutSeconds: 2
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: aws-ecr-login
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
During the deployment, while browsing the site, the browser shows the following error for JS files:
_next/static/DdLeVH51NHNpJfvX3wxJT/_buildManifest.js net::ERR_ABORTED 404
What am I missing to make the deployment work with a zero-downtime configuration?
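One common cause of this symptom, offered as a possibility rather than a confirmed diagnosis: each next build produces a unique build ID, so during a rolling update pods from the old and new ReplicaSets serve different _next/static trees, and an asset request that lands on a different pod than the one that rendered the HTML returns 404. The robust fixes are serving _next/static from a CDN or shared storage, or pinning the build ID at build time. A quick YAML-level mitigation is session affinity on the Service so a browser keeps hitting the same pod; the Service below is hypothetical (the question does not show one), and ClientIP affinity only helps if the real client IP actually reaches the Service (it may not behind an ingress or proxy):
# Hypothetical Service for the www-server Deployment; name and port are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: www-server
  namespace: default
spec:
  selector:
    app: www-server
  sessionAffinity: ClientIP        # keep a given client on one pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 600          # affinity window; tune as needed
  ports:
  - name: http
    port: 80
    targetPort: server             # matches the named containerPort (3000)
    protocol: TCP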

exposing TCP traffic with nginx ingress controller

Currently I am testing on Windows using Docker Desktop with Kubernetes feature on.
I want to stream RTMP data over TCP through the Ingress Controller.
I followed the NGINX controller installation guide https://kubernetes.github.io/ingress-nginx/deploy/ and I tried to configure the TCP like https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/
Please note the --tcp-services-configmap=rtmp/tcp-services flag.
If I push data through port 1936 the connection cannot be established. If I try with 1935 it works. I would like to have the Ingress controller route the traffic to my service and get rid of the LoadBalancer since it doesn't really make sense to have one balancer after another.
With the following configuration I was expecting that sending data to 1936 would work.
Am I missing something?
apiVersion: v1
kind: Service
metadata:
  name: restreamer1-service
  namespace: rtmp
spec:
  type: LoadBalancer
  selector:
    app: restreamer1-service
  ports:
  - protocol: TCP
    port: 1935
    targetPort: 1935
    name: rtml-com
  - protocol: TCP
    port: 8080
    targetPort: 8080
    name: http-com
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: rtmp
data:
  1936: "rtmp/restreamer1-service:1935"
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    helm.sh/chart: ingress-nginx-3.23.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.44.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/component: controller
    spec:
      dnsPolicy: ClusterFirst
      containers:
      - name: controller
        image: k8s.gcr.io/ingress-nginx/controller:v0.44.0@sha256:3dd0fac48073beaca2d67a78c746c7593f9c575168a17139a9955a82c63c4b9a
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown
        args:
        - /nginx-ingress-controller
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
        - --tcp-services-configmap=rtmp/tcp-services
        - --validating-webhook=:8443
        - --validating-webhook-certificate=/usr/local/certificates/cert
        - --validating-webhook-key=/usr/local/certificates/key
        securityContext:
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
          runAsUser: 101
          allowPrivilegeEscalation: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: LD_PRELOAD
          value: /usr/local/lib/libmimalloc.so
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 1
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 1
          successThreshold: 1
          failureThreshold: 3
        ports:
        - name: http
          containerPort: 80
          protocol: TCP
        - name: https
          containerPort: 443
          protocol: TCP
        - name: webhook
          containerPort: 8443
          protocol: TCP
        volumeMounts:
        - name: webhook-cert
          mountPath: /usr/local/certificates/
          readOnly: true
        resources:
          requests:
            cpu: 100m
            memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
      - name: webhook-cert
        secret:
          secretName: ingress-nginx-admission
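One likely gap, hedged rather than certain: the --tcp-services-configmap flag only makes the controller listen on 1936 inside its own pods. Whatever exposes the controller from outside the cluster (the ingress-nginx-controller Service from the standard install manifest, a LoadBalancer on Docker Desktop, or a NodePort) must also expose port 1936, otherwise traffic never reaches the controller at all. A sketch of that Service with the extra port, assuming the names and selectors from the standard install manifest:
# Sketch only: add the TCP port to whatever Service exposes the ingress-nginx
# controller; names below follow the standard install manifest, adjust to yours.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
  ports:
  - name: http
    port: 80
    targetPort: http
    protocol: TCP
  - name: https
    port: 443
    targetPort: https
    protocol: TCP
  - name: rtmp-proxy        # the extra TCP port declared in the tcp-services ConfigMap
    port: 1936
    targetPort: 1936
    protocol: TCP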

GKE: pods (dotnet application) often restart with error 139

I have a private GKE cluster. It contains 3 nodes (each with 2 CPUs and 7.5GB of memory) and 3 pod replicas (it's a .NET Core application). I've noticed that my containers sometimes restart with "error 139 SIGSEGV", which indicates a problem with accessing memory:
This occurs when a program attempts to access a memory location that it’s not allowed to access, or attempts to access a memory location in a way that’s not allowed.
I don't have application logs with the error before restarting the container, therefore it's impossible to debug it.
I've added a property set to false in the application, but it didn't solve the problem.
How can I fix this problem?
Manifest:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: stage-deployment
  namespace: stage
spec:
  replicas: 3
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: stage
  template:
    metadata:
      labels:
        app: stage
    spec:
      containers:
      - name: stage-container
        image: my.registry/stage/core:latest
        imagePullPolicy: "Always"
        ports:
        - containerPort: 5000
          name: http
        - containerPort: 22
          name: ssh
        readinessProbe:
          tcpSocket:
            port: 5000
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 5000
          initialDelaySeconds: 5
          periodSeconds: 20
        env:
        - name: POSTGRES_DB_HOST
          value: 127.0.0.1:5432
        - name: POSTGRES_DB_USER
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
        - name: POSTGRES_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
        - name: DB_NAME
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: dbname
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=my-instance:us-west1:dbserver=tcp:5432",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: instance-credentials
        secret:
          secretName: instance-credentials
      imagePullSecrets:
      - name: regcred
---
apiVersion: v1
kind: Service
metadata:
  name: stage-service
  namespace: stage
spec:
  type: NodePort
  selector:
    app: stage
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000
    name: https
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: 300m
    nginx.ingress.kubernetes.io/proxy-buffer-size: 128k
    nginx.ingress.kubernetes.io/proxy-buffers-number: 4 256k
    nginx.org/client-max-body-size: 1000m
  name: ingress
  namespace: stage
spec:
  rules:
  - host: my.site.com
    http:
      paths:
      - backend:
          serviceName: stage-service
          servicePort: 80
  tls:
  - hosts:
    - my.site.com
    secretName: my-certs
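For what it's worth, exit code 139 is 128 + 11, i.e. the process received SIGSEGV, so this is the application (or a native dependency) faulting rather than Kubernetes killing the pod; a memory-limit kill would normally show up as 137/OOMKilled instead. Two things that usually help narrow it down: kubectl logs --previous retrieves the output of the crashed container instance, and adding resource requests/limits at least rules out node memory pressure and makes any OOM kills visible in kubectl describe. A sketch of the latter, with purely illustrative values:
      containers:
      - name: stage-container
        image: my.registry/stage/core:latest
        resources:                 # illustrative values only; size to the app's real working set
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            memory: 1Gi
        # The crashed instance's output can be retrieved with:
        #   kubectl -n stage logs <pod-name> -c stage-container --previous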

Kubernetes - nginx-ingress is crashing after file upload via php

I'm running a Kubernetes cluster on Google Cloud Platform via their Kubernetes Engine. The cluster version is 1.13.11-gke.14. The PHP application pod contains 2 containers: Nginx as a reverse proxy and php-fpm (7.2).
In Google Cloud a TCP Load Balancer is used, followed by internal routing via Nginx Ingress.
The problem is:
when I upload a bigger file (17MB), the ingress crashes with this error:
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX:
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required.
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b
E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:13.156963Z Exiting with 0
Here is the YAML configuration of the Nginx ingress. The configuration is the default created by GitLab's system that manages the cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2019-11-24T17:35:04Z"
  generation: 3
  labels:
    app: nginx-ingress
    chart: nginx-ingress-1.22.1
    component: controller
    heritage: Tiller
    release: ingress
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
  resourceVersion: "2638973"
  selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
  uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-ingress
      release: ingress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: nginx-ingress
        component: controller
        release: ingress
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 33
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/nginx/modsecurity/modsecurity.conf
          name: modsecurity-template-volume
          subPath: modsecurity.conf
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
      - args:
        - /bin/sh
        - -c
        - tail -f /var/log/modsec/audit.log
        image: busybox
        imagePullPolicy: Always
        name: modsecurity-log
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx-ingress
      serviceAccountName: ingress-nginx-ingress
      terminationGracePeriodSeconds: 60
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: modsecurity.conf
            path: modsecurity.conf
          name: ingress-nginx-ingress-controller
        name: modsecurity-template-volume
      - emptyDir: {}
        name: modsecurity-log-volume
I have no idea what else to try. I'm running the cluster on 3 nodes (2x 1 vCPU, 1.5GB RAM and 1x preemptible 2 vCPU, 1.8GB RAM), all of them on SSD drives.
Any time I upload the image, disk I/O goes crazy.
(Monitoring charts: Disk IOPS and Disk I/O spike during the upload.)
Thanks for your help.
Found the solution. The nginx-ingress pod contained ModSecurity too. All requests were analyzed by ModSecurity, and bigger uploaded files caused those crashes. It wasn't really a crash at all: it took so much CPU and I/O that health check responses for all the other pods slowed down. The solution is to configure ModSecurity correctly or disable it.
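For reference, if ModSecurity is enabled through the ingress-nginx options, it can also be switched off declaratively. The sketch below assumes this GitLab-managed chart honors the standard enable-modsecurity ConfigMap keys of the controller named in its --configmap flag; note that in this setup the same ConfigMap also carries modsecurity.conf, so these keys should be merged into it rather than replacing it wholesale:
# Sketch, assuming the chart honors the standard ingress-nginx ConfigMap keys.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
data:
  enable-modsecurity: "false"            # turn the ModSecurity engine off entirely
  enable-owasp-modsecurity-crs: "false"  # and the OWASP core rule set with it
Alternatively, keeping ModSecurity but raising its request-body limits in modsecurity.conf (the SecRequestBodyLimit family of directives) is the "configure correctly" route for large uploads.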

orientdb kubernetes readiness probe errored: gzip : invalid header

I am trying to create an OrientDB deployment on a Kubernetes cluster using the following YAML file and the orientdb:2.1.25 Docker image from Docker Hub.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: orientdb
  namespace: default
  labels:
    name: orientdb
spec:
  replicas: 2
  revisionHistoryLimit: 100
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5
  template:
    metadata:
      labels:
        service: orientdb
    spec:
      containers:
      # Custom pod name.
      - name: orientdb-node
        image: orientdb:2.1.25
        imagePullPolicy: Always
        ports:
        - name: http-port
          containerPort: 2480 # WEB port number.
        - name: binary-port
          containerPort: 2424
        livenessProbe:
          httpGet:
            path: /
            port: http-port
          initialDelaySeconds: 60
          timeoutSeconds: 30
        readinessProbe:
          httpGet:
            path: /
            port: http-port
          initialDelaySeconds: 5
          timeoutSeconds: 5
But I am getting the following messages:
Readiness probe errored: gzip: invalid header
Liveness probe errored: gzip: invalid header
How do I fix the readiness and liveness probe for orient db?
The OrientDB web application on port 2480 returns a gzipped HTTP response, so you should add custom HTTP headers to support this in your httpGet livenessProbe and readinessProbe:
livenessProbe:
  httpGet:
    path: /
    port: http-port
    httpHeaders:
    - name: Accept-Encoding
      value: gzip
  initialDelaySeconds: 60
  timeoutSeconds: 30
readinessProbe:
  httpGet:
    path: /
    port: http-port
    httpHeaders:
    - name: Accept-Encoding
      value: gzip
  initialDelaySeconds: 5
  timeoutSeconds: 5
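If tweaking headers is undesirable, a possible alternative (a sketch, not part of the original answer) is to sidestep HTTP entirely and probe the binary listener with a TCP check, which only verifies that the port is accepting connections:
# Alternative sketch: TCP checks against the binary listener instead of HTTP.
livenessProbe:
  tcpSocket:
    port: binary-port
  initialDelaySeconds: 60
  timeoutSeconds: 30
readinessProbe:
  tcpSocket:
    port: binary-port
  initialDelaySeconds: 5
  timeoutSeconds: 5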
