nginx ingress 504 timeout with running multi pod - docker

I am running 2 pod(replicas) of particular deployment on Kubernetes with nginx ingress. Service using web socket also.
Out of 2 pod I have deleted one pod so it starts creating again while 1 was in a ready state. In between this, I tried to open the URL and got an error 504 gateway timeout.
As per my understanding traffic has to divert to Ready state pod from Kubernetes service. Am I missing something please let me know?
Thanks in advance.
Here is my ingress if any mistake
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: core-ingress
annotations:
kubernetes.io/ingress.class: nginx
certmanager.k8s.io/cluster-issuer: core-prod
nginx.ingress.kubernetes.io/proxy-body-size: 50m
nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/secure-backends: "true"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/websocket-services: core
nginx.org/websocket-services: core
spec:
tls:
- hosts:
- app.wotnot.io
secretName: core-prod
rules:
- host: example.io
http:
paths:
- backend:
serviceName: core
servicePort: 80

Services do not guarantee 100% uptime, especially if there are only 2 pods. Depending on the timing of your request, one of a number of possible outcomes is occurring.
You try to open the URL before the pod is marked as notReady. What happens, in this case, is your service forwards the request to your pod which is about to terminate. Since the pod is about to terminate and the webserver is shutting down, the pod is no longer able to respond so nginx responds with 504. It is also possible that a session is already started with this pod and it is interrupted because of the sigterm.
You send a request once the second pod is in terminating state. Your primary pod is being overworked from handling 100% of the requests so a response does not come fast enough so nginx returns an error.
In any scenario, your best option is to check the nginx ingress container logs to see why 504 is being returned so you can further debug this.
Note that, as mentioned just above, services do only include pods marked as Ready, however, this does not guarantee that 100% of requests will always be served correctly. Any time a pod is taken down for any reason, there is always a chance that a 5xx error is returned. Having a greater number of pods will reduce the odds of an error being returned but it will rarely completely eliminate the odds.

Related

Trying to understand what values to use for resources and limits of multiple container deployment

I am trying to set up HorizontalPodAutoscaler autoscaler for my app, alongside automatic Cluster Autoscaling of DigitalOcean
I will add my deployment yaml below, I have also deployed metrics-server as per guide in link above. At the moment I am struggling to figure out how to determine what values to use for my cpu and memory requests and limits fields. Mainly due to variable replica count, i.e. do I need to account for maximum number of replicas each using their resources or for deployment in general, do I plan it per pod basis or for each container individually?
For some context I am running this on a cluster that can have up to two nodes, each node has 1 vCPU and 2GB of memory (so total can be 2 vCPUs and 4 GB of memory).
As it is now my cluster is running one node and my kubectl top statistics for pods and nodes look as follows:
kubectl top pods
NAME CPU(cores) MEMORY(bytes)
graphql-85cc89c874-cml6j 5m 203Mi
graphql-85cc89c874-swmzc 5m 176Mi
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
skimitar-dev-pool-3cpbj 62m 6% 1151Mi 73%
I have tried various combinations of cpu and resources, but when I deploy my file my deployment is either stuck in a Pending state, or keeps restarting multiple times until it gets terminated. My horizontal pod autoscaler also reports targets as <unknown>/80%, but I believe it is due to me removing resources from my deployment, as it was not working.
Considering deployment below, what should I look at / consider in order to determine best values for requests and limits of my resources?
Following yaml is cleaned up from things like env variables / services, it works as is, but results in above mentioned issues when resources fields are uncommented.
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: graphql
spec:
replicas: 2
selector:
matchLabels:
app: graphql
template:
metadata:
labels:
app: graphql
spec:
containers:
- name: graphql-hasura
image: hasura/graphql-engine:v1.2.1
ports:
- containerPort: 8080
protocol: TCP
livenessProbe:
httpGet:
path: /healthz
port: 8080
readinessProbe:
httpGet:
path: /healthz
port: 8080
# resources:
# requests:
# memory: "150Mi"
# cpu: "100m"
# limits:
# memory: "200Mi"
# cpu: "150m"
- name: graphql-actions
image: my/nodejs-app:1
ports:
- containerPort: 4040
protocol: TCP
livenessProbe:
httpGet:
path: /healthz
port: 4040
readinessProbe:
httpGet:
path: /healthz
port: 4040
# resources:
# requests:
# memory: "150Mi"
# cpu: "100m"
# limits:
# memory: "200Mi"
# cpu: "150m"
# Disruption budget
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: graphql-disruption-budget
spec:
minAvailable: 1
selector:
matchLabels:
app: graphql
# Horizontal auto scaling
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: graphql-autoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: graphql
minReplicas: 2
maxReplicas: 3
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 80
How to determine what values to use for my cpu and memory requests and limits fields. Mainly due to variable replica count, i.e. do I need to account for maximum number of replicas each using their resources or for deployment in general, do I plan it per pod basis or for each container individually
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory.
Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.
Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
The number of replicas will be determined by the autoscaler on the ReplicaController.
when I deploy my file my deployment is either stuck in a Pending state, or keeps restarting multiple times until it gets terminated.
pending state means that there is not resources available to schedule new pods.
restarting may be triggered by other issues, I'd suggest you to debug it after solving the scaling issues.
My horizontal pod autoscaler also reports targets as <unknown>/80%, but I believe it is due to me removing resources from my deployment, as it was not working.
You are correct, if you don't set the request limit, the % desired will remain unknown and the autoscaler won't be able to trigger scaling up or down.
Here you can see algorithm responsible for that.
Horizontal Pod Autoscaler will trigger new pods based on the request % of usage on the pod. In this case whenever the pod reachs 80% of the max request value it will trigger new pods up to the maximum specified.
For a good HPA example, check this link: Horizontal Pod Autoscale Walkthrough
But How does Horizontal Pod Autoscaler works with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's number of replicas based on the current CPU load. If the load increases, HPA will create new replicas, for which there may or may not be enough space in the cluster.
If there are not enough resources, CA will try to bring up some nodes, so that the HPA-created pods have a place to run. If the load decreases, HPA will stop some of the replicas. As a result, some nodes may become underutilized or completely empty, and then CA will terminate such unneeded nodes.
NOTE: The key is to set the maximum replicas for HPA thinking on a cluster level according to the amount of nodes (and budget) available for your app, you can start setting a very high max number of replicas, monitor and then change it according to the usage metrics and prediction of future load.
Take a look at How to Enable the Cluster Autoscaler for a DigitalOcean Kubernetes Cluster in order to properly enable it as well.
If you have any question let me know in the comments.

minikube how to connect from one pod to another using hostnames?

I am running a cluster in default namespace with all the pods in Running state.
I have an issue, I am trying to telnet from one pod to another pod using the pod hostname 'abcd-7988b76669-lgp8l' but I am not able to connect. although it works if I use pods internal ip. Why does the dns is not resolved?
I looked at
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-5lpfd 1/1 Running 0 12h
coredns-6955765f44-9cvnb 1/1 Running 0 12h
Anybody has any idea how to connect from one pod to another using hostname resolution ?
First of all it is worth mentioning that typically you won't connect to individual Pods using their domain names. One good reason for that is their ephemeral nature. Note that typically you don't create plain Pods but controller such as Deployment which manages your Pods and ensures that specific number of Pods of a certain kind is constantly up and running. Pods may be often deleted and recreated hence you should never rely on their domain names in your applications. Typically you will expose them to another apps e.g. running in other Pods via Service.
Although using invididual Pod's domain name is not recommended, it is still possible. You can do it just for fun or learning/experimenting purposes.
As #David already mentioned you would help us much more in providing you a comprehensive answer if you EDIT your question and provide a few important details, showing what you've tried already such as your Pods and Services definitions in yaml format.
Answering literally to your question posted in the title:
minikube how to connect from one pod to another using hostnames?
You won't be able to connect to a Pod using simply its hostname. You can e.g. ping your backend Pods exposed via ClusterIP Service by simply pinging the <service-name> (provided it is in the same namespace as the Pod your pinging from).
Keep in mind however that it doesn't work for Pods - neither Pods names nor their hostnames are resolvable by cluster DNS.
You should be able to connect to an individual Pod using its fully quallified domain name (FQDN) provided you have configured everything properly. Just make sure you didn't overlook any of the steps described here:
Make sure you've created a simple Headless Service which may look like this:
apiVersion: v1
kind: Service
metadata:
name: default-subdomain
spec:
selector:
name: busybox
clusterIP: None
Make sure that your Pods definitions didn't lack any important details:
apiVersion: v1
kind: Pod
metadata:
name: busybox1
labels:
name: busybox
spec:
hostname: busybox-1
subdomain: default-subdomain
containers:
- image: busybox:1.28
command:
- sleep
- "3600"
name: busybox
---
apiVersion: v1
kind: Pod
metadata:
name: busybox2
labels:
name: busybox
spec:
hostname: busybox-2
subdomain: default-subdomain
containers:
- image: busybox:1.28
command:
- sleep
- "3600"
name: busybox
Speaking about important details, pay special attention that you correctly defined hostname and subdomain in Pod specification and that labels used by Pods match the labels used by Service's selector.
Once everything is configured properly you will be able to attach to Pod busybox1 and ping Pod busybox2 by using its FQDN like in the example below:
$ kubectl exec -ti busybox1 -- /bin/sh
/ # ping busybox-2.default-subdomain.default.svc.cluster.local
PING busybox-2.default-subdomain.default.svc.cluster.local (10.16.0.109): 56 data bytes
64 bytes from 10.16.0.109: seq=0 ttl=64 time=0.051 ms
64 bytes from 10.16.0.109: seq=1 ttl=64 time=0.082 ms
64 bytes from 10.16.0.109: seq=2 ttl=64 time=0.081 ms
I hope this helps.

POST larger than 400 Kilobytes payload to a container in Kubernetes fails

I'm using EKS (Kubernetes) in AWS and I have problems with posting a payload at around 400 Kilobytes to any web server that runs in a container in that Kubernetes. I hit some kind of limit but it's not a limit in size, it seems at around 400 Kilobytes many times works but sometimes I get (testing with Python requests)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
I test this with different containers (python web server on Alpine, Tomcat server on CentOS, nginx, etc).
The more I increase the size over 400 Kilobytes, the more consistent I get: Connection reset by peer.
Any ideas?
Thanks for your answers and comments, helped me get closer to the source of the problem. I did upgrade the AWS cluster from 1.11 to 1.12 and that cleared this error when accessing from service to service within Kubernetes. However, the error still persisted when accessing from outside the Kubernetes cluster using a public dns, thus the load balancer.
So after testing some more I found out that now the problem lies in the ALB or the ALB controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/
So I switched back to a Kubernetes service that generates an older-generation ELB and the problem was fixed. The ELB is not ideal, but it's a good work-around for the moment, until the ALB controller gets fixed or I have the right button to press to fix it.
As you mentioned in this answer that the issue might be caused by ALB or the ALB controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/.
Can you check if Nginx Ingress controller can be used with ALB ?
Nginx has a default value of request size set to 1Mb. It can be changed by using this annotation: nginx.ingress.kubernetes.io/proxy-body-size.
Also are you configuring connection-keep-alive or connection timeouts anywhere ?
The connection reset by peer, even between services inside the cluster, sounds like it may be the known issue with conntrack. The fix involves running the following:
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
And you can automate this with the following DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: startup-script
labels:
app: startup-script
spec:
template:
metadata:
labels:
app: startup-script
spec:
hostPID: true
containers:
- name: startup-script
image: gcr.io/google-containers/startup-script:v1
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
env:
- name: STARTUP_SCRIPT
value: |
#! /bin/bash
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
echo done
As this answer suggests, you may try to change you kube-proxy mode of operation. To edit your kube-proxy configs:
kubectl -n kube-system edit configmap kube-proxy
Search for mode: "" and try "iptables" , "userspace" or "ipvs". Each time you change your configmap, delete your kube-proxy pod(s) to make sure it is reading the new configmap.
we had a similar issue with Azure and its firewall which prevents to send more than 128KB as patch request.
After researching and thinking about the pro/cons on this approach within the team, our solution is a complete different one.
We put our "bigger" requests into a blob storage. Afterwards we put a message onto a queue with the filename created before. The queue will receive the message with the filename, reads the blob from the storage, converts it into whatever-you-need-to-have as object and is able to apply any business logic on this big object.
After processing the message, the file will be deleted.
The biggest advantage is that our API is not blocked with a big request and its long running job.
Maybe this can be another way to solve your issue within the kubernetes container.
See ya, Leonhard

GCP Kubernetes workload "Does not have minimum availability"

Background: I'm trying to set up a Bitcoin Core regtest pod on Google Cloud Platform. I borrowed some code from https://gist.github.com/zquestz/0007d1ede543478d44556280fdf238c9, editing it so that instead of using Bitcoin ABC (a different client implementation), it uses Bitcoin Core instead, and changed the RPC username and password to both be "test". I also added some command arguments for the docker-entrypoint.sh script to forward to bitcoind, the daemon for the nodes I am running. When attempting to deploy the following three YAML files, the dashboard in "workloads" shows bitcoin has not having minimum availability. Getting the pod to deploy correctly is important so I can send RPC commands to the Load Balancer. Attached below are my YAML files being used. I am not very familiar with Kubernetes, and I'm doing a research project on scalability which entails running RPC commands against this pod. Ask for relevant logs and I will provide them in seperate pastebins. Right now, I'm only running three machines on my cluster, as I'm am still setting this up. The zone is us-east1-d, machine type is n1-standard-2.
Question: Given these files below, what is causing GCP Kubernetes Engine to respond with "Does not have minimum availability", and how can this be fixed?
bitcoin-deployment.sh
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
namespace: default
labels:
service: bitcoin
name: bitcoin
spec:
strategy:
type: Recreate
replicas: 1
template:
metadata:
labels:
service: bitcoin
spec:
containers:
- env:
- name: BITCOIN_RPC_USER
valueFrom:
secretKeyRef:
name: test
key: test
- name: BITCOIN_RPC_PASSWORD
valueFrom:
secretKeyRef:
name: test
key: test
image: ruimarinho/bitcoin-core:0.17.0
name: bitcoin
ports:
- containerPort: 18443
protocol: TCP
volumeMounts:
- mountPath: /data
name: bitcoin-data
resources:
requests:
memory: "1.5Gi"
command: ["./entrypoint.sh"]
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
restartPolicy: Always
volumes:
- name: bitcoin-data
gcePersistentDisk:
pdName: disk-bitcoincore-1
fsType: ext4
bitcoin-secrets.yml
apiVersion: v1
kind: Secret
metadata:
name: bitcoin
type: Opaque
data:
rpcuser: dGVzdAo=
rpcpass: dGVzdAo=
bitcoin-srv.yml
apiVersion: v1
kind: Service
metadata:
name: bitcoin
namespace: default
spec:
ports:
- port: 18443
targetPort: 18443
selector:
service: bitcoin
type: LoadBalancer
externalTrafficPolicy: Local
I have run into this issue several times. The solutions that I used:
Wait. Google Cloud does not have enough resource available in the Region/Zone that you are trying to launch into. In some cases this took an hour to an entire day.
Select a different Region/Zone.
An example was earlier this month. I could not launch new resources in us-west1-a. I think just switched to us-east4-c. Everything launched.
I really do not know why this happens under the covers with Google. I have personally experienced this problem three times in the last three months and I have seen this problem several times on StackOverflow. The real answer might be a simple is that Google Cloud is really started to grow faster than their infrastructure. This is a good thing for Google as I know that they are investing in major new reasources for the cloud. Personally, I really like working with their cloud.
There could be many reasons for this failure:
Insufficient resources
Liveliness probe failure
Readiness probe failure
I encountered this error within GKE.
The reason was the pod was not about to find the configmap due to name mismatch. So make sure all the resources are discoverable by the pod.
The error message you mentioned isn't directly pointing to a stockout; it's more of resources unavailable within the cluster. You can try again after adding another node to the cluster etc. Also, this troubleshooting guide suggests if your Nodes have enough resources but you still have Does not have minimum availability message, check if the Nodes have SchedulingDisabled or Cordoned status: in this case they don't accept new pods.
Please, check your logs https://console.cloud.google.com/logs you might be surprised that your app is been failing.
I faced with the same issue when my spring-boot application failed to start due to my spring-boot configuration mistake.
Also in the args you use:
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
should it be "-rpcpassport" or "-rpcpassword" ?

Kubernetes certbot standalone not working

I'm trying to generate an SSL certificate with certbot/certbot docker container in kubernetes. I am using Job controller for this purpose which looks as the most suitable option. When I run the standalone option, I get the following error:
Failed authorization procedure. staging.ishankhare.com (http-01):
urn:ietf:params:acme:error:connection :: The server could not connect
to the client to verify the domain :: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
I've made sure that this isn't due to misconfigured DNS entries by running a simple nginx container, and it resolves properly. Following is my Jobs file:
apiVersion: batch/v1
kind: Job
metadata:
#labels:
# app: certbot-generator
name: certbot
spec:
template:
metadata:
labels:
app: certbot-generate
spec:
volumes:
- name: certs
containers:
- name: certbot
image: certbot/certbot
command: ["certbot"]
#command: ["yes"]
args: ["certonly", "--noninteractive", "--agree-tos", "--staging", "--standalone", "-d", "staging.ishankhare.com", "-m", "me#ishankhare.com"]
volumeMounts:
- name: certs
mountPath: "/etc/letsencrypt/"
#- name: certs
#mountPath: "/opt/"
ports:
- containerPort: 80
- containerPort: 443
restartPolicy: "OnFailure"
and my service:
apiVersion: v1
kind: Service
metadata:
name: certbot-lb
labels:
app: certbot-lb
spec:
type: LoadBalancer
loadBalancerIP: 35.189.170.149
ports:
- port: 80
name: "http"
protocol: TCP
- port: 443
name: "tls"
protocol: TCP
selector:
app: certbot-generator
the full error message is something like this:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for staging.ishankhare.com
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)
IMPORTANT NOTES:
- The following errors were reported by the server:
Domain: staging.ishankhare.com
Type: connection
Detail: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
To fix these errors, please make sure that your domain name was
entered correctly and the DNS A/AAAA record(s) for that domain
contain(s) the right IP address. Additionally, please check that
your computer has a publicly routable IP address and that no
firewalls are preventing the server from communicating with the
client. If you're using the webroot plugin, you should also verify
that you are serving files from the webroot path you provided.
- Your account credentials have been saved in your Certbot
configuration directory at /etc/letsencrypt. You should make a
secure backup of this folder now. This configuration directory will
also contain certificates and private keys obtained by Certbot so
making regular backups of this folder is ideal.
I've also tried running this as a simple Pod but to no help. Although I still feel running it as a Job to completion is the way to go.
First, be aware your Job definition is valid, but the spec.template.metadata.labels.app: certbot-generate value does not match with your Service definition spec.selector.app: certbot-generator: one is certbot-generate, the second is certbot-generator. So the pod run by the job controller is never added as an endpoint to the service.
Adjust one or the other, but they have to match, and that might just work :)
Although, I'm not sure using a Service with a selector targeting short-lived pods from a Job controller would work, neither with a simple Pod as you tested. The certbot-randomId pod created by the job (or whatever simple pod you create) takes about 15 seconds total to run/fail, and the HTTP validation challenge is triggered after just a few seconds of the pod life: it's not clear to me that would be enough time for kubernetes proxying to be already working between the service and the pod.
We can safely assume that the Service is actually working because you mentioned that you tested DNS resolution, so you can easily ensure that's not a timing issue by adding a sleep 10 (or more!) to give more time for the pod to be added as an endpoint to the service and being proxied appropriately before the HTTP challenge is triggered by certbot. Just change your Job command and args for those:
command: ["/bin/sh"]
args: ["-c", "sleep 10 && certbot certonly --noninteractive --agree-tos --staging --standalone -d staging.ishankhare.com -m me#ishankhare.com"]
And here too, that might just work :)
That being said, I'd warmly recommend you to use cert-manager which you can install easily through its stable Helm chart: the Certificate custom resource that it introduces will store your certificate in a Secret which will make it straightforward to reuse from whatever K8s resource, and it takes care of renewal automatically so you can just forget about it all.

Resources