POST larger than 400 Kilobytes payload to a container in Kubernetes fails - docker

I'm using EKS (Kubernetes) in AWS and I have problems with posting a payload at around 400 Kilobytes to any web server that runs in a container in that Kubernetes. I hit some kind of limit but it's not a limit in size, it seems at around 400 Kilobytes many times works but sometimes I get (testing with Python requests)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
I test this with different containers (python web server on Alpine, Tomcat server on CentOS, nginx, etc).
The more I increase the size over 400 Kilobytes, the more consistent I get: Connection reset by peer.
Any ideas?

Thanks for your answers and comments, helped me get closer to the source of the problem. I did upgrade the AWS cluster from 1.11 to 1.12 and that cleared this error when accessing from service to service within Kubernetes. However, the error still persisted when accessing from outside the Kubernetes cluster using a public dns, thus the load balancer.
So after testing some more I found out that now the problem lies in the ALB or the ALB controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/
So I switched back to a Kubernetes service that generates an older-generation ELB and the problem was fixed. The ELB is not ideal, but it's a good work-around for the moment, until the ALB controller gets fixed or I have the right button to press to fix it.

As you mentioned in this answer that the issue might be caused by ALB or the ALB controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/.
Can you check if Nginx Ingress controller can be used with ALB ?
Nginx has a default value of request size set to 1Mb. It can be changed by using this annotation: nginx.ingress.kubernetes.io/proxy-body-size.
Also are you configuring connection-keep-alive or connection timeouts anywhere ?

The connection reset by peer, even between services inside the cluster, sounds like it may be the known issue with conntrack. The fix involves running the following:
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
And you can automate this with the following DaemonSet:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: startup-script
labels:
app: startup-script
spec:
template:
metadata:
labels:
app: startup-script
spec:
hostPID: true
containers:
- name: startup-script
image: gcr.io/google-containers/startup-script:v1
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
env:
- name: STARTUP_SCRIPT
value: |
#! /bin/bash
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
echo done

As this answer suggests, you may try to change you kube-proxy mode of operation. To edit your kube-proxy configs:
kubectl -n kube-system edit configmap kube-proxy
Search for mode: "" and try "iptables" , "userspace" or "ipvs". Each time you change your configmap, delete your kube-proxy pod(s) to make sure it is reading the new configmap.

we had a similar issue with Azure and its firewall which prevents to send more than 128KB as patch request.
After researching and thinking about the pro/cons on this approach within the team, our solution is a complete different one.
We put our "bigger" requests into a blob storage. Afterwards we put a message onto a queue with the filename created before. The queue will receive the message with the filename, reads the blob from the storage, converts it into whatever-you-need-to-have as object and is able to apply any business logic on this big object.
After processing the message, the file will be deleted.
The biggest advantage is that our API is not blocked with a big request and its long running job.
Maybe this can be another way to solve your issue within the kubernetes container.
See ya, Leonhard

Related

Unable to export traces to OpenTelemetry Collector on Kubernetes

I am using the opentelemetry-ruby otlp exporter for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-ruby/tree/main/exporter/otlp
The otel collector was installed as a daemonset:
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector
I am trying to get the OpenTelemetry collector to collect traces from the Rails application. Both are running in the same cluster, but in different namespaces.
We have enabled auto-instrumentation in the app, but the rails logs are currently showing these errors:
E, [2022-04-05T22:37:47.838197 #6] ERROR -- : OpenTelemetry error: Unable to export 499 spans
I set the following env variables within the app:
OTEL_LOG_LEVEL=debug
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
I can't confirm that the application can communicate with the collector pods on this port.
Curling this address from the rails/ruby app returns "Connection Refused". However I am able to curl http://<OTEL_POD_IP>:4318 which returns 404 page not found.
From inside a pod:
# curl http://localhost:4318/
curl: (7) Failed to connect to localhost port 4318: Connection refused
# curl http://10.1.0.66:4318/
404 page not found
This helm chart created a daemonset but there is no service running. Is there some setting I need to enable to get this to work?
I confirmed that otel-collector is running on every node in the cluster and the daemonset has HostPort set to 4318.
The problem is with this setting:
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
Imagine your pod as a stripped out host itself. Localhost or 0.0.0.0 of your pod, and you don't have a collector deployed in your pod.
You need to use the address from your collector. I've checked the examples available at the shared repo and for agent-and-standalone and standalone-only you also have a k8s resource of type Service.
With that you can use the full service name (with namespace) to configure your environment variable.
Also, the Environment variable now is called OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, so you will need something like this:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=<service-name>.<namespace>.svc.cluster.local:<service-port>
The correct solution is to use the Kubernetes Downward API to fetch the node IP address, which will allow you to export the traces directly to the daemonset pod within the same node:
containers:
- name: my-app
image: my-image
env:
- name: HOST_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://$(HOST_IP):4318
Note that using the deployment's service as the endpoint (<service-name>.<namespace>.svc.cluster.local) is incorrect, as it effectively bypasses the daemonset and sends the traces directly to the deployment, which makes the daemonset useless.

nginx ingress 504 timeout with running multi pod

I am running 2 pod(replicas) of particular deployment on Kubernetes with nginx ingress. Service using web socket also.
Out of 2 pod I have deleted one pod so it starts creating again while 1 was in a ready state. In between this, I tried to open the URL and got an error 504 gateway timeout.
As per my understanding traffic has to divert to Ready state pod from Kubernetes service. Am I missing something please let me know?
Thanks in advance.
Here is my ingress if any mistake
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: core-ingress
annotations:
kubernetes.io/ingress.class: nginx
certmanager.k8s.io/cluster-issuer: core-prod
nginx.ingress.kubernetes.io/proxy-body-size: 50m
nginx.ingress.kubernetes.io/proxy-read-timeout: "1800"
nginx.ingress.kubernetes.io/proxy-send-timeout: "1800"
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/secure-backends: "true"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/websocket-services: core
nginx.org/websocket-services: core
spec:
tls:
- hosts:
- app.wotnot.io
secretName: core-prod
rules:
- host: example.io
http:
paths:
- backend:
serviceName: core
servicePort: 80
Services do not guarantee 100% uptime, especially if there are only 2 pods. Depending on the timing of your request, one of a number of possible outcomes is occurring.
You try to open the URL before the pod is marked as notReady. What happens, in this case, is your service forwards the request to your pod which is about to terminate. Since the pod is about to terminate and the webserver is shutting down, the pod is no longer able to respond so nginx responds with 504. It is also possible that a session is already started with this pod and it is interrupted because of the sigterm.
You send a request once the second pod is in terminating state. Your primary pod is being overworked from handling 100% of the requests so a response does not come fast enough so nginx returns an error.
In any scenario, your best option is to check the nginx ingress container logs to see why 504 is being returned so you can further debug this.
Note that, as mentioned just above, services do only include pods marked as Ready, however, this does not guarantee that 100% of requests will always be served correctly. Any time a pod is taken down for any reason, there is always a chance that a 5xx error is returned. Having a greater number of pods will reduce the odds of an error being returned but it will rarely completely eliminate the odds.

scdf 2.1 k8s config security context non root no fs writable

I need config scdf2 skipper , scdf and app pods to run without root and no write into filesystem pod .
i made changes into config yamls
data:
application.yaml: |-
spring:
cloud:
skipper:
server:
platform:
kubernetes:
accounts:
default:
namespace: default
deploymentServiceAccountName: scdf2-server-data-flow
securityContext:
runAsUser: 2000
allowPrivilegeEscalation: false
limits:
Colla
And scdf start runs with user "2000", (there is a problem with writeable local maven repo, fixed with a pvc nfs)...
But, the app pods always starts as root user, no 2000 users.
I've change skipper-config with securitycontext, .. any clues?
TX
What you set as deploymentServiceAccountName is one of the Kubernetes deployer properties that can be used for deploying streaming applications or launching task applications.
Looks like the above configuration is not applied to your SCDF or Skipper server configuration properties as they should at least get applied when deploying applications.
For the SCDF server and Skipper servers, in your SCDF/Skipper server deployment configurations, you need to explicitly set your serviceAccountName (not as deploymentServiceAccountName as its name suggests, the deploymentServiceAccountName is internally converted into the actual serviceAccountName for the respective stream/task apps when they get deployed).
We got it. We use it into skipp/scdf deploy, not in pods deploymente.
Your request:
Into scdf / skipper cfg deployment got:
spec:
containers:
- name: {{ template "scdf.fullname" . }}-server
image: {{ .Values.server.image }}:{{ .Values.server.version }}
imagePullPolicy: {{ .Values.server.imagePullPolicy }}
volumeMounts:
...
serviceAccountName: {{ template "scdf.serviceAccountName" . }}
Do you tell me to change config map scdf/skipper to task and streams? Another property into or before config about deployment
How is it relation about "serviceaccount" and user running process into pod?
How related serviceaccount with running process user "2000"
I cant understand.
Please help, it is very important to running without root and no use local filesystem from pod excepts "tmp" files

Kubernetes certbot standalone not working

I'm trying to generate an SSL certificate with certbot/certbot docker container in kubernetes. I am using Job controller for this purpose which looks as the most suitable option. When I run the standalone option, I get the following error:
Failed authorization procedure. staging.ishankhare.com (http-01):
urn:ietf:params:acme:error:connection :: The server could not connect
to the client to verify the domain :: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
I've made sure that this isn't due to misconfigured DNS entries by running a simple nginx container, and it resolves properly. Following is my Jobs file:
apiVersion: batch/v1
kind: Job
metadata:
#labels:
# app: certbot-generator
name: certbot
spec:
template:
metadata:
labels:
app: certbot-generate
spec:
volumes:
- name: certs
containers:
- name: certbot
image: certbot/certbot
command: ["certbot"]
#command: ["yes"]
args: ["certonly", "--noninteractive", "--agree-tos", "--staging", "--standalone", "-d", "staging.ishankhare.com", "-m", "me#ishankhare.com"]
volumeMounts:
- name: certs
mountPath: "/etc/letsencrypt/"
#- name: certs
#mountPath: "/opt/"
ports:
- containerPort: 80
- containerPort: 443
restartPolicy: "OnFailure"
and my service:
apiVersion: v1
kind: Service
metadata:
name: certbot-lb
labels:
app: certbot-lb
spec:
type: LoadBalancer
loadBalancerIP: 35.189.170.149
ports:
- port: 80
name: "http"
protocol: TCP
- port: 443
name: "tls"
protocol: TCP
selector:
app: certbot-generator
the full error message is something like this:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for staging.ishankhare.com
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)
IMPORTANT NOTES:
- The following errors were reported by the server:
Domain: staging.ishankhare.com
Type: connection
Detail: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
To fix these errors, please make sure that your domain name was
entered correctly and the DNS A/AAAA record(s) for that domain
contain(s) the right IP address. Additionally, please check that
your computer has a publicly routable IP address and that no
firewalls are preventing the server from communicating with the
client. If you're using the webroot plugin, you should also verify
that you are serving files from the webroot path you provided.
- Your account credentials have been saved in your Certbot
configuration directory at /etc/letsencrypt. You should make a
secure backup of this folder now. This configuration directory will
also contain certificates and private keys obtained by Certbot so
making regular backups of this folder is ideal.
I've also tried running this as a simple Pod but to no help. Although I still feel running it as a Job to completion is the way to go.
First, be aware your Job definition is valid, but the spec.template.metadata.labels.app: certbot-generate value does not match with your Service definition spec.selector.app: certbot-generator: one is certbot-generate, the second is certbot-generator. So the pod run by the job controller is never added as an endpoint to the service.
Adjust one or the other, but they have to match, and that might just work :)
Although, I'm not sure using a Service with a selector targeting short-lived pods from a Job controller would work, neither with a simple Pod as you tested. The certbot-randomId pod created by the job (or whatever simple pod you create) takes about 15 seconds total to run/fail, and the HTTP validation challenge is triggered after just a few seconds of the pod life: it's not clear to me that would be enough time for kubernetes proxying to be already working between the service and the pod.
We can safely assume that the Service is actually working because you mentioned that you tested DNS resolution, so you can easily ensure that's not a timing issue by adding a sleep 10 (or more!) to give more time for the pod to be added as an endpoint to the service and being proxied appropriately before the HTTP challenge is triggered by certbot. Just change your Job command and args for those:
command: ["/bin/sh"]
args: ["-c", "sleep 10 && certbot certonly --noninteractive --agree-tos --staging --standalone -d staging.ishankhare.com -m me#ishankhare.com"]
And here too, that might just work :)
That being said, I'd warmly recommend you to use cert-manager which you can install easily through its stable Helm chart: the Certificate custom resource that it introduces will store your certificate in a Secret which will make it straightforward to reuse from whatever K8s resource, and it takes care of renewal automatically so you can just forget about it all.

How to run a redis cluster on a docker cluster?

Context
I am trying to setup a redis cluster so that it runs on top off a docker cluster, to achieve maximum auto-healing.
More precisely, I have a docker compose file, which defines a service that has 3 replicas. Each service replica has a redis-server running on.
Then I have a program inside each replica that listens to changes on the docker cluster and that starts the cluster when conditions are met (each 3 redis-servers know each other).
Setting up the redis cluster works has expected, the cluster is formed and all the redis-servers communicate well, but the communication between redis-servers is inside the docker cluster.
The Problem
When I try to communicate from outside the docker cluster, because of the ingress mode I am able to talk to a redis-server, however when I try to add info (eg: set foo bar) and the client is moved to another redis-server the communication hangs and eventually times out.
Code
This is the docker-compose file.
version: "3.3"
services:
redis-cluster:
image: redis-srv-instance
volumes:
- /var/run/:/var/run
deploy:
mode: replicated
#endpoint_mode: dnsrr
replicas: 3
resources:
limits:
cpus: '0.5'
memory: 512M
ports:
- target: 6379
published: 30000
protocol: tcp
mode: ingress
The flux of commands that show the problem.
Client
~ ./redis-cli -c -p 30000
127.0.0.1:30000>
Redis-server
OK
1506533095.032738 [0 10.255.0.2:59700] "COMMAND"
1506533098.335858 [0 10.255.0.2:59700] "info"
Client
127.0.0.1:30000> set ghb fki
OK
Redis-server
1506533566.481334 [0 10.255.0.2:59718] "COMMAND"
1506533571.315238 [0 10.255.0.2:59718] "set" "ghb" "fki"
Client
127.0.0.1:30000> set rte fgh
-> Redirected to slot [3830] located at 10.0.0.3:6379
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
Could not connect to Redis at 10.0.0.3:6379: Operation timed out
(150.31s)
not connected>
Any ideas? I have also tried making my one proxy/load balancer but didn't work.
Thank you! Have a nice day.
For this use case, sentinel might help. Redis on its own is not capably of high availability. Sentinel on the other side is a distributed system which can do the following for you:
Route the ingress trafic to the current Redis master.
Elect a new Redis master should the current one fail.
While I have previously done research on this topic, I have not yet managed to pull to getter a working example.
redis-cli would get the redis server ip inside the ingress network, and try to access the remote redis server by that ip directly. That is why redis-cli shows Redirected to slot [3830] located at 10.0.0.3:6379. But this internal 10.0.0.3 is not accessible to redis-cli.
One solution is to run another proxy service which attaches to the same network with redis cluster. The application sends all requests to that proxy service, and the proxy service talks with redis cluster.
Or you could create 3 swarm services that uses the bridge network and exposes the redis port to node. Your internal program needs to change accordingly.

Resources