I have the following liveness probe in my service deployment.yaml
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: 9081
    scheme: HTTP
  initialDelaySeconds: 180
  timeoutSeconds: 10
  periodSeconds: 10
  successThreshold: 1
I want to test that the probe actually triggers a pod restart. What is the easiest way to make it fail?
Ideally in a programmatic way.
Update:
To clarify the question: I don't want to change the code in the application, nor pause the running container.
I was wondering if it's possible to block the endpoint/port at runtime somehow, maybe using a Kubernetes or Docker command.
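For reference, this is how I plan to verify that a restart actually happens once the probe fails (the pod name below is just a placeholder):
# watch the RESTARTS column of the pod
kubectl get pods -w
# and inspect the events for the liveness failure
kubectl describe pod <pod-name> | grep -i liveness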
You could define your liveness probe as follows:
livenessProbe:
  exec:
    command:
      - /bin/bash
      - '-c'
      - /liveness-probe.sh
  initialDelaySeconds: 10
  periodSeconds: 60
And create a shell script named liveness-probe.sh in your root path that contains:
#!/bin/bash
#exit 0 #Does not fail and does not trigger a pod restart
exit 1 #Triggers pod restart
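With this approach you can then flip the probe's behaviour at runtime from outside the application, for example with kubectl exec. This is only a sketch: the pod name is a placeholder, and it assumes the script is baked into the image, so the change is reverted once the container restarts.
# make the probe fail -> restart after failureThreshold * periodSeconds
kubectl exec <pod-name> -- /bin/bash -c "echo 'exit 1' > /liveness-probe.sh"
# make it pass again (only needed before the container restarts)
kubectl exec <pod-name> -- /bin/bash -c "echo 'exit 0' > /liveness-probe.sh"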
If you have the ability to change the underlying application's code, simply change the /health endpoint so it returns an HTTP status code of 400 or higher.
If not, you'll have to make your application fail somehow, for example by logging into the pod with kubectl exec and making changes that affect the application's health.
This is entirely dependent on your application; Kubernetes will simply do what you tell it to.
If you can get to the host where the pod is running, doing a docker pause on the container will pause all the processes in the container, which should make the liveness probe fail.
Note: I have not tried this myself, but based on the documentation of docker pause here, it sounds like it should work.
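A rough sketch of that approach, assuming you can get a shell on the node and the container runtime is Docker (the container name/ID are placeholders):
# find the container backing the pod
docker ps | grep <pod-name>
# freeze every process in it; the httpGet probe will start timing out
docker pause <container-id>
# after failureThreshold * periodSeconds (3 x 10s with the probe above) the kubelet should restart the container
# to resume instead, before that happens:
docker unpause <container-id>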
I'm working on the setup of a new Rails project, hosted with Google Kubernetes Engine. Everything was going fine until I switched my deployed server to production mode, with RAILS_ENV=production.
My Kubernetes pods no longer reach the Ready state. The readiness probe is apparently forbidden from hitting the server, since it returns a 403 code.
When I run kubectl describe pod <name> on a stuck pod, I get this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m25s default-scheduler Successfully assigned front to gke-interne-pool
Normal Pulling 5m24s kubelet Pulling image "registry/image:latest"
Normal Pulled 5m24s kubelet Successfully pulled image "registry/image:latest"
Normal Created 5m24s kubelet Created container front
Normal Started 5m24s kubelet Started container front
Warning Unhealthy 11s (x19 over 4m41s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 403
The output of kubectl logs <name> for this pod indeed shows no requests from the probe.
But when I open a console with kubectl exec -it deploy/front -- bash, I can run curl -s http://localhost:3000, which works perfectly, shows up in the logs and returns 200.
My setup works in development mode but not in production, so the Rails 6 app config is the main suspect. Something I don't understand about Rails 6's production mode is preventing my readiness probes from reaching my pod.
Just in case, here is the readiness part of deployment.yaml:
spec:
  containers:
    - name: front
      image: registry/image:latest
      ports:
        - containerPort: 3000
      readinessProbe:
        httpGet:
          path: "/"
          port: 3000
        initialDelaySeconds: 30
        periodSeconds: 15
Another potential cause of readiness 403 errors with Rails 6 is the allowed hosts list defined in config.
Typically, you'll have a line of code in config/environments/production.rb that looks something like:
config.hosts << "www.mydomain.com"
This leads to the rejection of any request whose host is something other than "www.mydomain.com". The readiness checks come from a private IP address within your cluster, so they are rejected given the above config.
One way to get around this is to add an additional hosts entry that allows traffic from 10.x.x.x private IP addresses:
config.hosts << "www.mydomain.com"
config.hosts << /\A10\.\d+\.\d+\.\d+\z/
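You can check whether host authorization is really what rejects the probe by replaying it from inside the pod with different Host headers. A sketch only: the pod IP and port below are examples, and it relies on the kubelet using the pod IP as the Host header by default.
# mimic the kubelet probe -> expect 403 if hosts filtering is the cause
kubectl exec -it deploy/front -- curl -s -o /dev/null -w "%{http_code}\n" -H "Host: 10.0.0.1:3000" http://localhost:3000/
# use the allowed host -> expect 200
kubectl exec -it deploy/front -- curl -s -o /dev/null -w "%{http_code}\n" -H "Host: www.mydomain.com" http://localhost:3000/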
I can't spot a specific error that would make your implementation fail after switching to RAILS_ENV=production. But after looking into the 403 error I found a workaround that seems to have worked for some users in their use case: try explicitly setting the scheme in your YAML, like this:
readinessProbe:
  httpGet:
    path: "/"
    port: 3000
    scheme: "HTTP"
  initialDelaySeconds: 30
  periodSeconds: 15
Even though I was not able to find an error in your deployment, the problem could be related to your credentials, so validate the route you configured in Rails and the permissions it has, and check whether these change depending on whether you are in the production or development environment.
As a last option, clarify your suspicions about the Rails app itself, because I don't see how changing the environment variable to production would affect this.
Background: My Docker container has a very long startup time, and it is hard to predict when it is done. When the health check kicks in, it may first show 'unhealthy' since the startup is sometimes not finished yet. This may cause a restart or container removal by our automation tools.
My specific question is whether I can control my Docker container so that it shows 'starting' until the setup is ready, and have the health check start immediately after that. Or is there any other recommendation on how to handle states in a good way using health checks?
Side question: I would love a reference on how transitions are made and determined during container startup and health-check initiation. I have tried googling how to determine Docker (container) states, but I can't find any good reference.
My specific question is if I can control my container so that it shows
'starting' until the setup is ready and that the health check can
somehow be started immediately after that?
I don't think that this is possible with just K8s or Docker.
Containers are not designed to communicate with the Docker daemon or Kubernetes to tell them that their internal setup is done.
If the application takes time to set up, you could play with the readiness and liveness probe options of Kubernetes.
You can indeed configure a readinessProbe to perform the initial check after a specific delay.
For example, to specify 120 seconds as the initial delay:
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 120
  periodSeconds: 5
Same thing for livenessProbe:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
      - name: Custom-Header
        value: Awesome
  initialDelaySeconds: 120
  periodSeconds: 3
For Docker "alone", while it is not as configurable, you could make this work with the --health-start-period parameter of the docker run sub-command:
--health-start-period: Start period for the container to initialize before starting health-retries countdown
For example, you could specify a generous value such as:
docker run --health-start-period=120s ...
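A slightly fuller sketch of the same idea, with placeholder values for the check command and image:
docker run \
  --health-cmd="curl -f http://localhost:8080/health || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=120s \
  my-image:latest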
Here is my workaround. First, in docker-compose set a long timeout; start_period + timeout should be greater than the maximum expected startup time, e.g.:
healthcheck:
  test: ["CMD", "python3", "appstatus.py", '500']
  interval: 60s
  timeout: 900s
  retries: 2
  start_period: 30s
and then run a script which can wait (if needed) before returning its result. In the example above it is appstatus.py. The script contains something like:
import os
import sys
import time

timeout = int(sys.argv[1])  # maximum time to wait, passed by the healthcheck command
t0 = time.time()
while True:
    time.sleep(2)
    if isReady():  # isReady() is the application-specific readiness check
        sys.exit(os.EX_OK)
    t = time.time() - t0
    if t > timeout:
        sys.exit(os.EX_SOFTWARE)
I am running 2 pods (replicas) of a particular deployment on Kubernetes with an nginx ingress. The service uses web sockets as well.
Out of the 2 pods I deleted one, so it started being recreated while the other one was in the Ready state. In the meantime, I tried to open the URL and got a 504 gateway timeout error.
As per my understanding, traffic should be diverted to the Ready pod by the Kubernetes service. Am I missing something? Please let me know.
Thanks in advance.
Here is my ingress, in case there is a mistake:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: core-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    certmanager.k8s.io/cluster-issuer: core-prod
    nginx.ingress.kubernetes.io/proxy-body-size: 50m
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/secure-backends: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/websocket-services: core
    nginx.org/websocket-services: core
spec:
  tls:
    - hosts:
        - app.wotnot.io
      secretName: core-prod
  rules:
    - host: example.io
      http:
        paths:
          - backend:
              serviceName: core
              servicePort: 80
Services do not guarantee 100% uptime, especially if there are only 2 pods. Depending on the timing of your request, one of a number of possible outcomes is occurring.
You tried to open the URL before the pod was marked as NotReady. In this case your service forwards the request to a pod that is about to terminate. Since the pod is about to terminate and the web server is shutting down, it can no longer respond, so nginx responds with 504. It is also possible that a session was already established with this pod and was interrupted by the SIGTERM.
You sent a request once the second pod was in the Terminating state. Your primary pod is being overworked handling 100% of the requests, so a response does not come fast enough and nginx returns an error.
In any scenario, your best option is to check the nginx ingress controller logs to see why a 504 is being returned, so you can debug this further.
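For example, something along these lines, assuming the controller runs in the ingress-nginx namespace (adjust the namespace and pod name to your installation):
# find the ingress controller pod
kubectl get pods -n ingress-nginx
# tail its logs and look for the 504 entries around the time of your request
kubectl logs -n ingress-nginx <ingress-controller-pod> --since=10m | grep " 504 "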
Note that, as mentioned just above, services only include pods marked as Ready; however, this does not guarantee that 100% of requests will always be served correctly. Any time a pod is taken down, for any reason, there is a chance that a 5xx error is returned. Having a greater number of pods reduces the odds of an error being returned, but rarely eliminates them completely.
The "azds up" command times out before all the steps are done. I have a large Angular app that typically takes 5 minutes+ when npm install is executed. When I execute azds up this is what I get:
Step 1/9 : FROM node
Step 2/9 : ENV PORT 80
Step 3/9 : WORKDIR /app
Step 4/9 : COPY package*.json ./
Step 5/9 : RUN npm install --silent
Waiting for container...
and then it returns to the command line.
Is there a configuration in the azds.yaml where I can tell azds/helm to wait for a longer period of time?
Thanks!
To begin with, I will give some examples that might be helpful in your case.
An example .yaml of how to do a rolling upgrade:
spec:
  minReadySeconds: 60
  progressDeadlineSeconds: 600
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 50%
minReadySeconds: the boot-up time of your application. Kubernetes waits this long before creating the next pod; for example, if minReadySeconds is 60, then after a pod becomes healthy the Deployment must wait 60 seconds before updating the next pod.
progressDeadlineSeconds: the timeout for updating one pod, in this example 10 minutes. If the rollout fails to progress within 10 minutes, the Deployment is marked as failed and finished, and no subsequent step is run.
maxSurge: controls how many extra resources can be created during the rollout (absolute value or %).
maxUnavailable: sets the maximum number of pods that can be unavailable during a rolling update (absolute value or %).
An example .yaml of liveness & readiness probes:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60
  timeoutSeconds: 1
  periodSeconds: 10
  failureThreshold: 3
The above manifest configures a livenessProbe, which confirms whether the container is running properly or not.
It probes the health check by sending an HTTP GET request to /healthz on port 8080.
The probe sets initialDelaySeconds=60, which means it will not be called until 60 seconds after all the containers in the pod are created. timeoutSeconds=1 means the probe must respond within a 1-second timeout. periodSeconds=10 means the probe is invoked every 10 seconds. If more than 3 probes fail (failureThreshold=3), the container is considered unhealthy and is restarted.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120
  timeoutSeconds: 5
  periodSeconds: 10
  failureThreshold: 3
The above readinessProbe is even more important than the liveness probe in a production environment. The readinessProbe confirms whether the service can accept traffic or not. If this probe fails, the internal load balancer never sends traffic to this pod; only once the probe succeeds does traffic to this pod start.
I am writing a Jenkins global pipeline library where I have a stage to deploy my Docker image to a K8s cluster.
After building my Docker image during the CI process, I promote (deploy) the image to multiple environments (sequentially, lower to higher).
So, to get the correct status of the deployment after running
kubectl apply -f Application-k8s-file.yaml
I used the following command in a shell step:
kubectl rollout status deployment deployment_name
Things go well if my deployment has no errors, but if my deployment has some error (perhaps a code bug, or the application does not start), then kubectl rollout status deployment <deployment name> runs indefinitely (as K8s retries again and again to redeploy) and my Jenkins job runs indefinitely (until the job timeout).
So to find a workaround I tried putting a timeout on this command, with a calculation something like this:
timeout = (number of pods * liveness probe time) + (number of pods * 10) seconds
Not sure if this calculation is correct or not.
My code snippet looks like this
sh(returnStdout: true, script: "#!/bin/sh +e\n timeout --preserve-status ${timeout_value} kubectl rollout status deployment ${deploymentName} --kubeconfig='/opt/kubernetes-secrets/${env}/kubeconfig' 2>tmpfile; echo \$? > tmpfile1")
def readThisFile = readFile "tmpfile1"
def var = readThisFile.trim().toInteger()
if (var == 0) {
    echo "deployment successful"
} else {
    echo "do something else"
}
This works well initially, but later I found that the kubectl rollout status deployment command doesn't return exit code 0 until all the new pods are scheduled and the old ones are terminated completely, which can sometimes take a while.
What I basically want is a minimal calculated timeout value.
My K8s file has parameters like this:
spec:
  minReadySeconds: 30
  livenessProbe:
    httpGet:
      path: /ping
      port: 80
    initialDelaySeconds: 45
    periodSeconds: 5
    timeoutSeconds: 60
  name: test-dummyservice
  ports:
    - containerPort: 80
  readinessProbe:
    httpGet:
      path: /health
      port: 80
    initialDelaySeconds: 60
    periodSeconds: 120
    timeoutSeconds: 60
I did not find anything specific related to this in the K8s documentation. Is anyone facing the same challenge?
You should take a look at progressDeadlineSeconds. Once the rollout exceeds this deadline, kubectl rollout status exits with an error:
kubectl rollout status deployment ng
Waiting for rollout to finish: 2 out of 7 new replicas have been updated...
error: deployment "ng" exceeded its progress deadline
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#progress-deadline-seconds
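If your deployment does not set it explicitly, you could set it on the fly; a sketch, where the 600-second deadline is just an example value:
# fail the rollout (and make `kubectl rollout status` exit non-zero) if no progress is made for 10 minutes
kubectl patch deployment deployment_name -p '{"spec":{"progressDeadlineSeconds":600}}'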
You can add the timeout flag as shown below; here is the doc: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands
kubectl rollout status deployment deployment_name --watch --timeout=5m
If you don't want to wait for the rollout to finish then you can use --watch=false.
kubectl rollout status deployment deployment_name --watch=false
You can then run this command at a specific interval for as long as you need, for example as sketched below.
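A minimal polling loop for the Jenkins shell step could look like this (the grep on the success message, the 10-second interval and the 60 attempts are assumptions to adjust):
# poll every 10 seconds, up to 10 minutes, until the rollout reports success
for i in $(seq 1 60); do
  if kubectl rollout status deployment deployment_name --watch=false | grep -q "successfully rolled out"; then
    echo "deployment successful"
    exit 0
  fi
  sleep 10
done
echo "deployment did not finish in time"
exit 1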