Liveness probe gets timed out - Jenkins

I'm using Jenkins and Kubernetes to perform these actions.
Since my load balancer needs a healthy pod, I had to add a livenessProbe to my pod.
My configuration for the pod:
apiVersion: v1
kind: Pod
metadata:
  labels:
    component: ci
spec:
  # Use service account that can deploy to all namespaces
  serviceAccountName: default
  # Use the persistent volume
  containers:
  - name: gcloud
    image: gcr.io/cloud-builders/gcloud
    command:
    - cat
    tty: true
  - name: kubectl
    image: gcr.io/cloud-builders/kubectl
    command:
    - cat
    tty: true
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
The issue is that when I want to deploy the code (CD over Jenkins), execution reaches the
touch /tmp/healthy;
command and times out.
The error response I get looks like this:
java.io.IOException: Failed to execute shell script inside container [kubectl] of pod [wobbl-mobile-label-qcd6x-13mtj]. Timed out waiting for the container to become ready!
When I type kubectl get events
I get the following response:
Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Any hints on how to solve this?
I have read this documentation for the liveness probe, and I took the config for it from there.

As can be seen from the link you are referring to, that example is meant to help you understand how the liveness probe works. In the example below, taken from that link, they have purposely removed the /tmp/healthy file after 30 seconds:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
What this does is create the /tmp/healthy file when the container starts. After 5 seconds the liveness probe kicks in and checks for the /tmp/healthy file; at this moment the container does have a /tmp/healthy file present. After 30 seconds the command deletes the file, the liveness probe fails to find /tmp/healthy, and the kubelet restarts the container. This cycle repeats, so the liveness probe fails the health check roughly every 30 seconds.
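If you want to watch this cycle happen, the standard kubectl commands are enough (assuming the pod name liveness-exec from the manifest above):

# Watch the RESTARTS counter climb as the probe keeps failing
kubectl get pod liveness-exec --watch

# The probe failures show up in the Events section
kubectl describe pod liveness-exec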
If you only use
touch /tmp/healthy
in the args (without removing the file afterwards), the liveness probe should work well.
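For example, a minimal sketch of the liveness container with the rm removed (same image and probe settings as your manifest), so the file stays in place and the probe keeps succeeding:

- name: liveness
  image: k8s.gcr.io/busybox
  args:
  - /bin/sh
  - -c
  - touch /tmp/healthy; sleep 600
  livenessProbe:
    exec:
      command:
      - cat
      - /tmp/healthy
    initialDelaySeconds: 5
    periodSeconds: 5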

Related

Kubernetes pod images ErrImageNeverPull

I have a Docker image named user-service, tagged for localhost:5001.
I have a local registry running at port 5001.
user-service is pushed to the local registry,
and I created the pod using a deploy_new.yaml file:
apiVersion: v1
kind: Pod
metadata:
  name: user-service
  labels:
    component: web
spec:
  containers:
  - name: web
    image: localhost:5001/user-service
    resources:
      limits:
        memory: 512Mi
        cpu: "1"
      requests:
        memory: 256Mi
        cpu: "0.2"
    imagePullPolicy: Never
    ports:
    - name: http
      containerPort: 4006
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /health/health
        port: 4006
      initialDelaySeconds: 3
      periodSeconds: 3
      failureThreshold: 2
    readinessProbe:
      httpGet:
        path: /health/health
        port: 4006
      initialDelaySeconds: 15
      periodSeconds: 10
But on describing the pod, I see the ErrImageNeverPull error.
Questions:
1. What is ErrImageNeverPull and how do I fix it?
2. How do I test liveness and readiness probes?
Probe APIs
1. What is ErrImageNeverPull and how to fix it?
As imagePullPolicy is set to Never, the kubelet won't fetch images but will only look for what is present locally. The error means it could not find the image locally, and it will not try to fetch it.
If the cluster can reach your local Docker registry, just change image: user-service to image: localhost:5001/user-service:latest.
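For completeness, a sketch of tagging and pushing the image to the local registry (port 5001, as in your setup):

# Tag the locally built image for the local registry
docker tag user-service localhost:5001/user-service:latest

# Push it so that nodes which can reach the registry may pull it
docker push localhost:5001/user-service:latest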
If you are using minikube, check the README on reusing the Docker daemon so you can use your image without uploading it (see the sketch after this list):
Do eval $(minikube docker-env) in each session where you need to use it.
Build the image: docker build -t user-service .
Reference the image in your Pod manifest as image: user-service.
Make sure you have imagePullPolicy: Never for your container (which you already have).
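Put together, a minimal sketch of that workflow:

# Point this shell's docker CLI at minikube's Docker daemon
eval $(minikube docker-env)

# Build the image directly inside minikube's daemon
docker build -t user-service .

# Verify the image is now visible to the cluster's container runtime
docker images | grep user-service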
2. How to test liveness and readiness probes?
I suggest you try the examples from the Kubernetes documentation; they explain really well the difference between the two and the different types of probes you can configure.
You first need to get your pod running before checking liveness and readiness probes. But in your case they will succeed as soon as the Pod starts. Just describe the pod and look at the events.
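For example, probe successes and failures are recorded as pod events:

# The Liveness/Readiness lines and the Events section show probe results
kubectl describe pod user-service

# Or watch cluster-wide events as the probes fire
kubectl get events --watch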
One more thing to note: eval $(minikube docker-env) will fail silently if you are using a non-default minikube profile, leading to the observed behavior:
$ eval $(minikube docker-env)
$ minikube docker-env
🤷 Profile "minikube" not found. Run "minikube profile list" to view all profiles.
👉 To start a cluster, run: "minikube start"
$
To address this, re-run specifying the profile you are using:
$ eval $(minikube -p my-profile docker-env)

Flask app deployment failed (Liveness probe failed) after "gcloud builds submit ..."

I am a newbie in frontend/backend/DevOps, but I need to use Kubernetes to deploy an app on Google Cloud Platform (GCP) to provide a service. I started learning by following this series of tutorials:
https://mickeyabhi1999.medium.com/build-and-deploy-a-web-app-with-react-flask-nginx-postgresql-docker-and-google-kubernetes-e586de159a4d
https://medium.com/swlh/build-and-deploy-a-web-app-with-react-flask-nginx-postgresql-docker-and-google-kubernetes-341f3b4de322
And the code of this tutorial series is here: https://github.com/abhiChakra/Addition-App
Everything was fine until the last step: using "gcloud builds submit ..." to build
1. nginx+react service
2. flask+wsgi service
3. nginx+react deployment
4. flask+wsgi deployment
on a GCP cluster.
1.~3. went well and their statuses were "OK". But the status of the flask+wsgi deployment was "Does not have minimum availability" even after many restarts.
I used "kubectl get pods" and saw the status of the flask pod was "CrashLoopBackOff".
Then I followed the debugging process suggested here:
https://containersolutions.github.io/runbooks/posts/kubernetes/crashloopbackoff/
I used "kubectl describe pod flask" to look into the problem of the flask pod. Then I found the "Exit Code" was 139 and there were messages "Liveness probe failed: Get "http://10.24.0.25:8000/health": read tcp 10.24.0.1:55470->10.24.0.25:8000: read: connection reset by peer" and "Readiness probe failed: Get "http://10.24.0.25:8000/ready": read tcp 10.24.0.1:55848->10.24.0.25:8000: read: connection reset by peer".
The complete log:
Name:           flask-676d5dd999-cf6kt
Namespace:      default
Priority:       0
Node:           gke-addition-app-default-pool-89aab4fe-3l1q/10.140.0.3
Start Time:     Thu, 11 Nov 2021 19:06:24 +0800
Labels:         app.kubernetes.io/managed-by=gcp-cloud-build-deploy
                component=flask
                pod-template-hash=676d5dd999
Annotations:    <none>
Status:         Running
IP:             10.24.0.25
IPs:
  IP:           10.24.0.25
Controlled By:  ReplicaSet/flask-676d5dd999
Containers:
  flask:
    Container ID:   containerd://5459b747e1d44046d283a46ec1eebb625be4df712340ff9cf492d5583a4d41d2
    Image:          gcr.io/peerless-garage-330917/addition-app-flask:latest
    Image ID:       gcr.io/peerless-garage-330917/addition-app-flask@sha256:b45d25ffa8a0939825e31dec1a6dfe84f05aaf4a2e9e43d35084783edc76f0de
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 12 Nov 2021 17:24:14 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    139
      Started:      Fri, 12 Nov 2021 17:17:06 +0800
      Finished:     Fri, 12 Nov 2021 17:19:06 +0800
    Ready:          False
    Restart Count:  222
    Limits:
      cpu:  1
    Requests:
      cpu:        400m
    Liveness:     http-get http://:8000/health delay=120s timeout=1s period=5s #success=1 #failure=3
    Readiness:    http-get http://:8000/ready delay=120s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s97x5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-s97x5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-s97x5
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  9m7s (x217 over 21h)    kubelet  (combined from similar events): Liveness probe failed: Get "http://10.24.0.25:8000/health": read tcp 10.24.0.1:48636->10.24.0.25:8000: read: connection reset by peer
  Warning  BackOff    4m38s (x4404 over 22h)  kubelet  Back-off restarting failed container
Following the suggestion here:
https://containersolutions.github.io/runbooks/posts/kubernetes/crashloopbackoff/#step-4
I increased initialDelaySeconds to 120, but it still failed.
Because everything worked fine on my local laptop, I think there could be some connection or authentication issue.
To be more detailed, the deployment.yaml looks like:
apiVersion: v1
kind: Service
metadata:
  name: ui
spec:
  type: LoadBalancer
  selector:
    app: react
    tier: ui
  ports:
  - port: 8080
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: flask
spec:
  type: ClusterIP
  selector:
    component: flask
  ports:
  - port: 8000
    targetPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask
spec:
  replicas: 1
  selector:
    matchLabels:
      component: flask
  template:
    metadata:
      labels:
        component: flask
    spec:
      containers:
      - name: flask
        image: gcr.io/peerless-garage-330917/addition-app-flask:latest
        imagePullPolicy: "Always"
        resources:
          limits:
            cpu: "1000m"
          requests:
            cpu: "400m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 5
        ports:
        - containerPort: 8000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: react
      tier: ui
  template:
    metadata:
      labels:
        app: react
        tier: ui
    spec:
      containers:
      - name: ui
        image: gcr.io/peerless-garage-330917/addition-app-nginx:latest
        imagePullPolicy: "Always"
        resources:
          limits:
            cpu: "1000m"
          requests:
            cpu: "400m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
        ports:
        - containerPort: 8080
docker-compose.yaml:
# we will be creating these services
services:
  flask:
    # Note that we are building from our current terminal directory where our Dockerfile is located, we use .
    build: .
    # naming our resulting container
    container_name: flask
    # publishing a port so that external services requesting port 8000 on your local machine
    # are mapped to port 8000 on our container
    ports:
      - "8000:8000"
  nginx:
    # Since our Dockerfile for the web server is located in the react-app folder, our build context is ./react-app
    build: ./react-app
    container_name: nginx
    ports:
      - "8080:8080"
Nginx Dockerfile:
# first building react project, using node base image
FROM node:10 as build-stage
# setting working dir inside container
WORKDIR /react-app
# required to install packages
COPY package*.json ./
# installing npm packages
RUN npm install
# copying over react source material
COPY src ./src
# copying over further react material
COPY public ./public
# copying over our nginx config file
COPY addition_container_server.conf ./
# creating production build to serve through nginx
RUN npm run build
# starting second, nginx build-stage
FROM nginx:1.15
# removing default nginx config file
RUN rm /etc/nginx/conf.d/default.conf
# copying our nginx config
COPY --from=build-stage /react-app/addition_container_server.conf /etc/nginx/conf.d/
# copying production build from last stage to serve through nginx
COPY --from=build-stage /react-app/build/ /usr/share/nginx/html
# exposing port 8080 on container
EXPOSE 8080
CMD ["nginx", "-g", "daemon off;"]
Nginx server config:
server {
    listen 8080;

    # location of react build files
    root /usr/share/nginx/html/;

    # index html from react build to serve
    index index.html;

    # ONLY KUBERNETES RELEVANT: endpoint for health checkup
    location /health {
        return 200 "health ok";
    }

    # ONLY KUBERNETES RELEVANT: endpoint for readiness checkup
    location /ready {
        return 200 "ready";
    }

    # html file to serve with / endpoint
    location / {
        try_files $uri /index.html;
    }

    # proxying under /api endpoint
    location /api {
        client_max_body_size 10m;
        add_header 'Access-Control-Allow-Origin' http://<NGINX_SERVICE_ENDPOINT>:8080;
        proxy_pass http://flask:8000/;
    }
}
There are two important functions in App.js:
...
insertCalculation(event, calculation){
  /*
  Making a POST request via a fetch call to Flask API with numbers of a
  calculation we want to insert into DB. Making fetch call to web server
  IP with /api/insert_nums which will be reverse proxied via Nginx to the
  Application (Flask) server.
  */
  event.preventDefault();
  fetch('http://<NGINX_SERVICE_ENDPOINT>:8080/api/insert_nums', {
    method: 'POST',
    mode: 'cors',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(calculation)
  }).then((response) => {
...
getHistory(event){
  /*
  Making a GET request via a fetch call to Flask API to retrieve calculations history.
  */
  event.preventDefault()
  fetch('http://<NGINX_SERVICE_ENDPOINT>:8080/api/data', {
    method: 'GET',
    mode: 'cors'
  }).then(response => {
...
Flask Dockerfile:
# using base image
FROM python:3.8
# setting working dir inside container
WORKDIR /addition_app_flask
# adding run.py to workdir
ADD run.py .
# adding config.ini to workdir
ADD config.ini .
# adding requirements.txt to workdir
ADD requirements.txt .
# installing flask requirements
RUN pip install -r requirements.txt
# adding in all contents from flask_app folder into a new flask_app folder
ADD ./flask_app ./flask_app
# exposing port 8000 on container
EXPOSE 8000
# serving flask backend through uWSGI server
CMD [ "python", "run.py" ]
run.py:
from gevent.pywsgi import WSGIServer
from flask_app.app import app

# As Flask's built-in server is not suitable for production, we will use
# a WSGIServer instance to serve our flask application.
if __name__ == '__main__':
    WSGIServer(('0.0.0.0', 8000), app).serve_forever()
app.py:
from flask import Flask, request, jsonify
from flask_app.storage import insert_calculation, get_calculations

app = Flask(__name__)

@app.route('/')
def index():
    return "My Addition App", 200

@app.route('/health')
def health():
    return '', 200

@app.route('/ready')
def ready():
    return '', 200

@app.route('/data', methods=['GET'])
def data():
    '''
    Function used to get calculations history
    from Postgres database and return to fetch call in frontend.
    :return: Json format of either collected calculations or error message
    '''
    calculations_history = []
    try:
        calculations = get_calculations()
        for key, value in calculations.items():
            calculations_history.append(value)
        return jsonify({'calculations': calculations_history}), 200
    except:
        return jsonify({'error': 'error fetching calculations history'}), 500

@app.route('/insert_nums', methods=['POST'])
def insert_nums():
    '''
    Function used to insert a calculation into our postgres
    DB. Operands of operation received from frontend.
    :return: Json format of either success or failure response.
    '''
    insert_nums = request.get_json()
    firstNum, secondNum, answer = insert_nums['firstNum'], insert_nums['secondNum'], insert_nums['answer']
    try:
        insert_calculation(firstNum, secondNum, answer)
        return jsonify({'Response': 'Successfully inserted into DB'}), 200
    except:
        return jsonify({'Response': 'Unable to insert into DB'}), 500
I can't tell what is going wrong. I also wonder what a better way to debug such a cloud deployment would be. In normal programs we can set breakpoints and print or log things to find the root cause in the code; in a cloud deployment, however, I've lost my sense of direction for debugging.
...Exit Code was 139...
This could mean there's a bug in your Flask app. You can start with a minimum spec instead of trying to do it all in one go:
apiVersion: v1
kind: Pod
metadata:
  name: flask
  labels:
    component: flask
spec:
  containers:
  - name: flask
    image: gcr.io/peerless-garage-330917/addition-app-flask:latest
    ports:
    - containerPort: 8000
See if your pod starts accordingly. If it does, try connecting to it with kubectl port-forward <flask pod name> 8000:8000, followed by curl localhost:8000/health. You should watch your application at all times with kubectl logs -f <flask pod name>.
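Put together, that check sequence looks like this (using the pod name flask from the minimal spec above):

# Forward local port 8000 to the pod
kubectl port-forward flask 8000:8000

# In a second terminal, hit the same endpoint the liveness probe uses
curl localhost:8000/health

# And stream the application logs while testing
kubectl logs -f flask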
Thanks to @gohm'c for the response! It is a good suggestion to isolate the different parts and start from a smaller component. As suggested, I tried deploying a single flask pod first. Then I used
kubectl port-forward flask 8000:8000
to map the port to my local machine. After using
curl localhost:8000/health
to access the port, it showed:
Forwarding from 127.0.0.1:8000 -> 8000
Forwarding from [::1]:8000 -> 8000
Handling connection for 8000
E1112 18:52:15.874759 300145 portforward.go:400] an error occurred forwarding 8000 -> 8000: error forwarding port 8000 to pod 4870b939f3224f968fd5afa4660a5af7d10e144ee85149d69acff46a772e94b1, uid : failed to execute portforward in network namespace "/var/run/netns/cni-32f718f0-1248-6da4-c726-b2a5bf1918db": read tcp4 127.0.0.1:38662->127.0.0.1:8000: read: connection reset by peer
At this moment, using
kubectl logs -f flask
returned an empty response.
So there are indeed some issues in the flask app.
The health probe hits a really simple function in app.py:
@app.route('/health')
def health():
    return '', 200
How can I know whether the route setting is wrong or not?
Is it because of the WSGIServer in run.py?
from gevent.pywsgi import WSGIServer
from flask_app.app import app

# As Flask's built-in server is not suitable for production, we will use
# a WSGIServer instance to serve our flask application.
if __name__ == '__main__':
    WSGIServer(('0.0.0.0', 8000), app).serve_forever()
If we look at the Dockerfile, it seems to expose the correct port, 8000.
If I directly run
python run.py
on my laptop, I can successfully access localhost:8000.
How can I debug this kind of problem?
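One way to narrow it down, since exit code 139 is 128 + 11 (SIGSEGV, i.e. a crash inside the process), is to run the exact image the cluster runs on your laptop and hit the same endpoint the probe uses; a sketch, assuming you have Docker locally and can pull the image:

# Run the same image the cluster uses, mapping port 8000
docker run --rm -p 8000:8000 gcr.io/peerless-garage-330917/addition-app-flask:latest

# In another terminal, exercise the endpoint the probe uses
curl -v localhost:8000/health

If the container crashes the same way locally, the bug is in the image (for example a native-extension incompatibility among the Python dependencies) rather than in the Kubernetes configuration.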

Can't delete exited Init Container

I'm using Kubernetes 1.15.7 and my issue is similar to https://github.com/kubernetes/kubernetes/issues/62362
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  # These containers are run during pod initialization
  restartPolicy: Always
  initContainers:
  - name: install
    image: busybox
    command:
    - sh
    - -c
    - sleep 60
    volumeMounts:
    - name: workdir
      mountPath: "/work-dir"
  dnsPolicy: Default
  volumes:
  - name: workdir
    emptyDir: {}
On the node where the container runs, if I issue docker container prune it removes the exited busybox (init) container, only for Kubernetes to recreate it and trigger the pod to restart too.
I found the GitHub issue above, which is similar, but without much explanation. These exited containers don't take up much space (as docker system df shows), but their presence means I can't run the prune command as a whole on the node.
Kubelet manages garbage collection of Docker images and containers, so you don't have to.
Take a look at k8s documentation for more info on this topic.
From k8s documentation:
Garbage collection is a helpful function of kubelet that will clean up unused images and unused containers. Kubelet will perform garbage collection for containers every minute and garbage collection for images every five minutes.
External garbage collection tools are not recommended as these tools can potentially break the behavior of kubelet by removing containers expected to exist
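If you need the kubelet to keep fewer exited containers around, container garbage collection can be tuned via kubelet flags instead of pruning externally; a sketch with illustrative values (verify the flag names against your kubelet version's documentation):

# --minimum-container-ttl-duration: minimum age before an exited container may be collected
# --maximum-dead-containers-per-container: old instances to retain per container
# --maximum-dead-containers: global cap on retained dead containers
kubelet \
  --minimum-container-ttl-duration=1m \
  --maximum-dead-containers-per-container=1 \
  --maximum-dead-containers=50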

Kubernetes: sharing a volume between containers in one pod

I have a question about sharing a volume between containers in one pod.
Here is my yaml, pod-volume.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: volume-pod
spec:
  containers:
  - name: tomcat
    image: tomcat
    imagePullPolicy: Never
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: app-logs
      mountPath: /usr/local/tomcat/logs
  - name: busybox
    image: busybox
    command: ["sh", "-c", "tail -f /logs/catalina.out*.log"]
    volumeMounts:
    - name: app-logs
      mountPath: /logs
  volumes:
  - name: app-logs
    emptyDir: {}
Create the pod:
kubectl create -f pod-volume.yaml
Watch the pod status:
watch kubectl get pod -n default
Finally, I got this:
NAME READY STATUS RESTARTS AGE
redis-php 2/2 Running 0 15h
volume-pod 1/2 CrashLoopBackOff 5 6m49s
Then I checked the logs of the busybox container:
kubectl logs pod/volume-pod -c busybox
tail: can't open '/logs/catalina.out*.log': No such file or directory
tail: no files
I don't know where it went wrong.
Is this about the order in which containers start in the pod? Please help me, thanks.
For this case:
the Catalina log file is catalina.$(date '+%Y-%m-%d').log, and you should not put a * into the filename in the shell command.
So please try:
command: ["sh", "-c", "tail -f /logs/catalina.$(date '+%Y-%m-%d').log"]

How to notify application about updated config file in kubernetes?

When a ConfigMap is updated, how can the application automatically be triggered to reload its parameters? The application uses POSIX signals for that.
Depending on how you are consuming the ConfigMap values, there are two ways you can reload ConfigMap updates into a running pod.
If you are consuming the configs as environment variables, you can write a controller which watches for config updates and restarts your pods with the new config whenever the config changes.
If you are consuming the ConfigMap via volumes, you can watch for file changes, notify your process in the container, and handle the update in the application. Please see https://github.com/jimmidyson/configmap-reload for an example.
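For reference, a minimal sketch of the volume-based approach (the names app-config and my-app are hypothetical); the ConfigMap keys appear as files under the mount path, and the kubelet refreshes them some time after an update (note that subPath mounts do not receive updates):

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: my-app:latest       # hypothetical image
    volumeMounts:
    - name: config
      mountPath: /etc/my-app   # ConfigMap keys appear as files here
  volumes:
  - name: config
    configMap:
      name: app-config         # updates propagate to the mounted files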
There are good solutions mentioned here, but I tried to find one that could be done without modifying existing deployment pipelines, etc.
Here is an example of a filebeat DaemonSet from a Helm chart that reloads when the filebeat config changes. The approach is not new: use the liveness probe to trigger a reload of the pod from within the pod itself. The postStart hook calculates an md5 sum of the configmap directory; the liveness probe checks it. That's all.
The '...' are just to cut out the cruft. You can see that the filebeat.yml file is mounted directly into /etc and used by filebeat itself. The configmap is mounted again, specifically for the purposes of watching the configmap contents for changes.
Once configmap is edited (or otherwise modified), it takes some time before the pod is actually restarted. You can tweak all of that separately.
#apiVersion: apps/v1
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: ...-filebeat
...
      containers:
      - name: ...-filebeat
        image: "{{ .Values.filebeat.image.url }}:{{ .Values.filebeat.image.version }}"
        imagePullPolicy: "{{ .Values.filebeat.image.pullPolicy }}"
        command: [ "filebeat" ]
        args: [
          "-c", "/etc/filebeat-config/filebeat.yml",
          "-e"
        ]
        env:
          ...
        resources:
          ...
        lifecycle:
          postStart:
            exec:
              command: ["sh", "-c", "ls -LRih /etc/filebeat-config | md5sum >> /tmp/filebeat-config-md5.txt"]
        livenessProbe:
          exec:
            # Further commands can be strung to the statement e.g. calls with curl
            command:
            - sh
            - -c
            - >
              x=$(cat /tmp/filebeat-config-md5.txt);
              y=$(ls -LRih /etc/filebeat-config | md5sum);
              if [ "$x" != "$y" ]; then exit 1; fi
          initialDelaySeconds: 60
          periodSeconds: 60
        volumeMounts:
        - name: filebeat-config
          mountPath: /etc/filebeat-config
          readOnly: true
....
