I'm trying to generate an SSL certificate with the certbot/certbot Docker container in Kubernetes. I am using the Job controller for this purpose, which looks like the most suitable option. When I run the standalone option, I get the following error:
Failed authorization procedure. staging.ishankhare.com (http-01):
urn:ietf:params:acme:error:connection :: The server could not connect
to the client to verify the domain :: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
I've made sure that this isn't due to misconfigured DNS entries by running a simple nginx container, and it resolves properly. Following is my Job file:
apiVersion: batch/v1
kind: Job
metadata:
#labels:
# app: certbot-generator
name: certbot
spec:
template:
metadata:
labels:
app: certbot-generate
spec:
volumes:
- name: certs
containers:
- name: certbot
image: certbot/certbot
command: ["certbot"]
#command: ["yes"]
args: ["certonly", "--noninteractive", "--agree-tos", "--staging", "--standalone", "-d", "staging.ishankhare.com", "-m", "me#ishankhare.com"]
volumeMounts:
- name: certs
mountPath: "/etc/letsencrypt/"
#- name: certs
#mountPath: "/opt/"
ports:
- containerPort: 80
- containerPort: 443
restartPolicy: "OnFailure"
and my service:
apiVersion: v1
kind: Service
metadata:
name: certbot-lb
labels:
app: certbot-lb
spec:
type: LoadBalancer
loadBalancerIP: 35.189.170.149
ports:
- port: 80
name: "http"
protocol: TCP
- port: 443
name: "tls"
protocol: TCP
selector:
app: certbot-generator
the full error message is something like this:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for staging.ishankhare.com
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. staging.ishankhare.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8: Timeout during connect (likely firewall problem)
IMPORTANT NOTES:
- The following errors were reported by the server:
Domain: staging.ishankhare.com
Type: connection
Detail: Fetching
http://staging.ishankhare.com/.well-known/acme-challenge/tpumqbcDWudT7EBsgC7IvtSzZvMAuooQ3PmSPh9yng8:
Timeout during connect (likely firewall problem)
To fix these errors, please make sure that your domain name was
entered correctly and the DNS A/AAAA record(s) for that domain
contain(s) the right IP address. Additionally, please check that
your computer has a publicly routable IP address and that no
firewalls are preventing the server from communicating with the
client. If you're using the webroot plugin, you should also verify
that you are serving files from the webroot path you provided.
- Your account credentials have been saved in your Certbot
configuration directory at /etc/letsencrypt. You should make a
secure backup of this folder now. This configuration directory will
also contain certificates and private keys obtained by Certbot so
making regular backups of this folder is ideal.
I've also tried running this as a simple Pod, but to no avail. Although I still feel running it as a Job to completion is the way to go.
First, be aware your Job definition is valid, but the spec.template.metadata.labels.app: certbot-generate value does not match your Service definition's spec.selector.app: certbot-generator: one is certbot-generate, the other is certbot-generator. So the pod run by the Job controller is never added as an endpoint to the Service.
Adjust one or the other, but they have to match, and that might just work :)
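For instance, a minimal sketch of the fix in the Job, keeping your existing names and aligning the pod template label with the Service selector:
spec:
  template:
    metadata:
      labels:
        app: certbot-generator   # was: certbot-generate, now matches the Service selector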
That said, I'm not sure that using a Service whose selector targets the short-lived pods of a Job controller would work, nor with a simple Pod as you tested. The certbot-randomId pod created by the Job (or whatever simple pod you create) takes about 15 seconds in total to run/fail, and the HTTP validation challenge is triggered after just a few seconds of the pod's life: it's not clear to me that this leaves enough time for the Kubernetes proxying between the Service and the pod to be in place.
We can reasonably assume the Service itself is working, since you mentioned you tested DNS resolution against it, so you can easily rule out a timing issue by adding a sleep 10 (or more!) to give the pod more time to be added as an endpoint to the Service and be proxied appropriately before certbot triggers the HTTP challenge. Just change your Job command and args to:
command: ["/bin/sh"]
args: ["-c", "sleep 10 && certbot certonly --noninteractive --agree-tos --staging --standalone -d staging.ishankhare.com -m me#ishankhare.com"]
And here too, that might just work :)
That being said, I'd warmly recommend using cert-manager, which you can install easily through its stable Helm chart: the Certificate custom resource that it introduces stores your certificate in a Secret, which makes it straightforward to reuse from any K8s resource, and it takes care of renewal automatically so you can just forget about it all.
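For illustration, a Certificate resource looks roughly like this with a recent cert-manager release (a minimal sketch; it assumes a ClusterIssuer named letsencrypt-staging already exists, and the resource/secret names are placeholders):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: staging-ishankhare-com
spec:
  secretName: staging-ishankhare-com-tls   # cert-manager stores the certificate and key in this Secret
  dnsNames:
    - staging.ishankhare.com
  issuerRef:
    name: letsencrypt-staging   # assumed ClusterIssuer
    kind: ClusterIssuer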
Related
I am using Keycloak to implement the OAuth2 authorization code flow in a Kubernetes cluster fronted by an API gateway, Ambassador. I am using the Istio service mesh to add traceability and mTLS features to my cluster. One of those features is Jaeger, which requires all services to forward the x-request-id header in order to link spans into a single trace.
When a request is sent, Istio's proxy attached to Ambassador generates the x-request-id and forwards the request to Keycloak for authorization. When the results are sent back to Ambassador, the header is dropped, and therefore the Istio proxy of Keycloak generates a new x-request-id. The following image shows the problem:
Here is a photo of the trace where I lost the x-request-id:
Is there a way I can force Keycloak to forward the x-request-id header if passed to it?
Update
Here are the environment variables (ConfigMap) associated with Keycloak:
kind: ConfigMap
apiVersion: v1
metadata:
name: keycloak-envars
data:
KEYCLOAK_ADMIN: "admin"
KC_PROXY: "edge"
KC_DB: "postgres"
KC_DB_USERNAME: "test"
KC_DB_DATABASE: "keycloak"
PROXY_ADDRESS_FORWARDING: "true"
You may need to restart your keycloak docker container with the environment variable PROXY_ADDRESS_FORWARDING=true.
ex: docker run -e PROXY_ADDRESS_FORWARDING=true jboss/keycloak
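In Kubernetes, the equivalent is setting that variable on the Keycloak container and restarting the pod; a minimal sketch wiring in the ConfigMap from the question (the container name and image are placeholders):
containers:
  - name: keycloak
    image: quay.io/keycloak/keycloak
    envFrom:
      - configMapRef:
          name: keycloak-envars   # already contains PROXY_ADDRESS_FORWARDING: "true"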
Problem: When I run kubectl apply on both of the files below and try to open the app in the browser at http://192.168.49.2:30080/, the app does not render. I also tried running minikube service fleetman-webapp --url, but still no progress. Please help!
Additional information: minikube ip returns 192.168.49.2.
Note: I have installed the Docker Desktop app on my MacBook Air (Catalina).
Browser message: This site can’t be reached 192.168.49.2 took too long to respond.
Docker image link: https://hub.docker.com/r/richardchesterwood/k8s-fleetman-webapp-angular
first-pod.yaml file
apiVersion: v1
kind: Pod
metadata:
name: webapp
  labels:
mylabelname: webapp
spec:
containers:
- name: webapp
image: richardchesterwood/k8s-fleetman-webapp-angular:release0
webapp-services.yaml file
apiVersion: v1
kind: Service
metadata:
name: fleetman-webapp
spec:
# This defines which pods are going to be represented by this Service
# The service becomes a network endpoint for either other services
# or maybe external users to connect to (eg browser)
selector:
mylabelname: webapp
ports:
- name: http
port: 80
nodePort: 30080
type: NodePort
Try creating minikube with driver none:
$ minikube start --driver=none
The none driver allows advanced minikube users to skip VM creation, allowing minikube to be run on a user-supplied VM.
Hence you will be able to communicate with your app via your host's (i.e. the user-supplied VM's) network address.
I am using the opentelemetry-ruby otlp exporter for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-ruby/tree/main/exporter/otlp
The otel collector was installed as a daemonset:
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector
I am trying to get the OpenTelemetry collector to collect traces from the Rails application. Both are running in the same cluster, but in different namespaces.
We have enabled auto-instrumentation in the app, but the rails logs are currently showing these errors:
E, [2022-04-05T22:37:47.838197 #6] ERROR -- : OpenTelemetry error: Unable to export 499 spans
I set the following env variables within the app:
OTEL_LOG_LEVEL=debug
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
I can't confirm that the application can communicate with the collector pods on this port.
Curling this address from the rails/ruby app returns "Connection Refused". However, I am able to curl http://<OTEL_POD_IP>:4318, which returns "404 page not found".
From inside a pod:
# curl http://localhost:4318/
curl: (7) Failed to connect to localhost port 4318: Connection refused
# curl http://10.1.0.66:4318/
404 page not found
This helm chart created a daemonset but there is no service running. Is there some setting I need to enable to get this to work?
I confirmed that otel-collector is running on every node in the cluster and the daemonset has HostPort set to 4318.
The problem is with this setting:
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
Think of your pod as a stripped-down host of its own: localhost or 0.0.0.0 refers to the pod itself, and there is no collector deployed inside your pod.
You need to use the address of your collector. Looking at the examples in the repo you shared, the agent-and-standalone and standalone-only setups also include a k8s resource of type Service.
With that you can use the full service name (with namespace) to configure your environment variable.
Also, the environment variable is now called OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, so you will need something like this:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=<service-name>.<namespace>.svc.cluster.local:<service-port>
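For example, assuming the collector Service is named opentelemetry-collector and lives in an observability namespace (both names are assumptions, check them with kubectl get svc), the Rails Deployment could set:
containers:
  - name: rails-app            # placeholder container name
    image: my-rails-image      # placeholder image
    env:
      - name: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT
        value: http://opentelemetry-collector.observability.svc.cluster.local:4318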
The correct solution is to use the Kubernetes Downward API to fetch the node IP address, which will allow you to export the traces directly to the daemonset pod within the same node:
containers:
- name: my-app
image: my-image
env:
- name: HOST_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://$(HOST_IP):4318
Note that using the deployment's service as the endpoint (<service-name>.<namespace>.svc.cluster.local) is incorrect, as it effectively bypasses the daemonset and sends the traces directly to the deployment, which makes the daemonset useless.
I'm using the kubernetes cluster built in to Docker Desktop to develop my application.
I would like to expose services inside the cluster as ports on localhost.
I can do so using kubectl expose deployment foobar --type=NodePort --port=30088, which creates a service like this:
apiVersion: v1
kind: Service
metadata:
labels:
role: web
name: foobar
spec:
externalTrafficPolicy: Cluster
ports:
- nodePort: 30088
port: 80
protocol: TCP
targetPort: 80
selector:
role: web
type: NodePort
But it only works for very high numbered ports. If I try something lower I get:
The Service "kafka-external" is invalid: spec.ports[0].nodePort: Invalid value: 9092: provided port is not in the valid range. The range of valid ports is 30000-32767
It seems there is a kubernetes apiserver setting called ServiceNodePortRange which would allow me to override this restriction, but I can't figure out how to set it on Docker's builtin cluster.
So my question is: how do I expose a specific, low-numbered port (like 9092) on Docker's kubernetes cluster? Is there a way to override that setting? Or a better way to expose the service than NodePort?
NodePort is intended to be a building block for load-balancers or other ingress modes. This means it didn't matter which port you got as long as you got one. This makes it a little clunky to use directly: you can't have just any port. You can change the port range, but you run the risk of conflicts with real things running on your nodes and with any pod HostPorts.
The default range is indeed 30000-32767, but it can be changed by setting the --service-node-port-range flag: update the file /etc/kubernetes/manifests/kube-apiserver.yaml and add the line --service-node-port-range=xxxxx-yyyyy.
The kube-apiserver.yaml file lives on the master itself at /etc/kubernetes/manifests/kube-apiserver.yaml, not inside the kube-apiserver container/pod.
Login to Docker VM:
Add the following line to the pod spec:
spec:
containers:
- command:
- kube-apiserver
...
- --service-node-port-range=xxxxx-yyyyy # <-- add this line
...
Save and exit. The kube-apiserver pod will be restarted with the new parameters.
Exit the Docker VM (for screen: Ctrl-a, k; for a container: Ctrl-d).
Check the results:
$ kubectl get pod kube-apiserver-docker-desktop -o yaml -n kube-system | less
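Once the range covers it, the Service from your error message can request the low nodePort directly; a rough sketch (the selector and target port are assumptions):
apiVersion: v1
kind: Service
metadata:
  name: kafka-external
spec:
  type: NodePort
  selector:
    app: kafka           # assumed pod label
  ports:
    - port: 9092
      targetPort: 9092   # assumed container port
      nodePort: 9092     # only valid once the range includes 9092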
Take a look: service-pod-range, changing pod range, changing-nodeport-range.
I'm using EKS (Kubernetes) in AWS and I have problems with posting a payload at around 400 Kilobytes to any web server that runs in a container in that Kubernetes. I hit some kind of limit but it's not a limit in size, it seems at around 400 Kilobytes many times works but sometimes I get (testing with Python requests)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
I tested this with different containers (a Python web server on Alpine, a Tomcat server on CentOS, nginx, etc.).
The more I increase the size beyond 400 kilobytes, the more consistently I get Connection reset by peer.
Any ideas?
Thanks for your answers and comments, they helped me get closer to the source of the problem. I upgraded the AWS cluster from 1.11 to 1.12, and that cleared this error when accessing from service to service within Kubernetes. However, the error still persisted when accessing from outside the Kubernetes cluster using the public DNS name, i.e. through the load balancer.
So after testing some more I found out that now the problem lies in the ALB or the ALB controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/
So I switched back to a Kubernetes Service that generates an older-generation ELB and the problem was fixed. The ELB is not ideal, but it's a good workaround for the moment, until the ALB controller gets fixed or I find the right button to press to fix it.
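For context, the older-generation ELB here simply comes from a plain Service of type LoadBalancer, without the ALB ingress controller in front; roughly (a sketch, names and ports are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer     # on EKS this provisions a classic ELB by default
  selector:
    app: my-app          # placeholder pod label
  ports:
    - port: 80
      targetPort: 8080   # placeholder container port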
As you mentioned in this answer, the issue might be caused by the ALB or the ALB ingress controller for Kubernetes: https://kubernetes-sigs.github.io/aws-alb-ingress-controller/.
Can you check whether the Nginx Ingress controller can be used with the ALB?
Nginx has a default request size limit of 1 MB. It can be changed using this annotation: nginx.ingress.kubernetes.io/proxy-body-size.
Also, are you configuring connection keep-alive or connection timeouts anywhere?
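For reference, the annotation goes on the Ingress resource, e.g. (a sketch; the host, service name, port, and size limit are placeholders):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"   # raise the default 1m limit
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80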
The connection reset by peer, even between services inside the cluster, sounds like it may be the known issue with conntrack. The fix involves running the following:
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
And you can automate this with the following DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: startup-script
labels:
app: startup-script
spec:
  selector:
    matchLabels:
      app: startup-script
template:
metadata:
labels:
app: startup-script
spec:
hostPID: true
containers:
- name: startup-script
image: gcr.io/google-containers/startup-script:v1
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
env:
- name: STARTUP_SCRIPT
value: |
#! /bin/bash
echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal
echo done
As this answer suggests, you may try changing your kube-proxy mode of operation. To edit your kube-proxy config:
kubectl -n kube-system edit configmap kube-proxy
Search for mode: "" and try "iptables", "userspace", or "ipvs". Each time you change the configmap, delete your kube-proxy pod(s) to make sure they pick up the new configmap.
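For orientation, on a kubeadm-style cluster the relevant part of that ConfigMap looks roughly like this (a sketch; the exact layout can differ on managed distributions such as EKS):
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"   # try "iptables", "userspace" or "ipvs" here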
We had a similar issue with Azure and its firewall, which prevents sending more than 128 KB in a PATCH request.
After researching and weighing the pros and cons of this approach within the team, our solution ended up being a completely different one.
We put our "bigger" requests into blob storage. Afterwards we put a message onto a queue with the filename created before. The queue consumer receives the message with the filename, reads the blob from storage, converts it into whatever object you need, and can then apply any business logic to this big object.
After processing the message, the file will be deleted.
The biggest advantage is that our API is not blocked by a big request and its long-running job.
Maybe this can be another way to solve your issue within the kubernetes container.
See ya, Leonhard