Istio EnvoyProxy Access Logs - how to log external HTTPS calls? - azure-aks

I have the Istio service mesh configured in my AKS (Azure Kubernetes Service) cluster.
I am able to see the access logs of requests my services make to other services within the AKS cluster.
However, when my service makes a call to an external service, the details of the call are not captured in the istio-proxy (Envoy) access log.
For example, a call to the Prometheus README from inside my service container:
curl -kv https://raw.githubusercontent.com/prometheus/prometheus/main/README.md
All I can see from the istio-proxy is this -
[2022-09-05T14:18:38.038Z] "- - -" 0 - - - "-" 879 12246 45 - "-" "-" "-" "-" "185.199.108.133:443" PassthroughCluster x.x.x.x:39380 185.199.108.133:443 x.x.x.x:39378 - -
185.199.108.133 is GitHub's IP. How can I get the response code, path, and other details of the call?
I'm using the default access log format of Envoy as described here - https://istio.io/latest/docs/tasks/observability/logs/access-log/
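Worth noting: for passthrough HTTPS the sidecar only sees encrypted TCP bytes, which is why the log shows "- - -" instead of a method, path, and response code. As a sketch (hostname and port taken from the curl example above; the ServiceEntry name is arbitrary), a ServiceEntry at least registers the host so the traffic is no longer attributed to PassthroughCluster; recovering HTTP-level fields would additionally require TLS origination at the sidecar or an egress gateway, since the payload otherwise stays encrypted end to end.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: github-raw
spec:
  hosts:
    - raw.githubusercontent.com
  location: MESH_EXTERNAL
  ports:
    - number: 443
      name: https
      protocol: TLS
  resolution: DNS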

Related

Unable to export traces to OpenTelemetry Collector on Kubernetes

I am using the opentelemetry-ruby otlp exporter for auto instrumentation:
https://github.com/open-telemetry/opentelemetry-ruby/tree/main/exporter/otlp
The otel collector was installed as a daemonset:
https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector
I am trying to get the OpenTelemetry collector to collect traces from the Rails application. Both are running in the same cluster, but in different namespaces.
We have enabled auto-instrumentation in the app, but the rails logs are currently showing these errors:
E, [2022-04-05T22:37:47.838197 #6] ERROR -- : OpenTelemetry error: Unable to export 499 spans
I set the following env variables within the app:
OTEL_LOG_LEVEL=debug
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
I can't confirm that the application can communicate with the collector pods on this port.
Curling this address from the rails/ruby app returns "Connection Refused". However I am able to curl http://<OTEL_POD_IP>:4318 which returns 404 page not found.
From inside a pod:
# curl http://localhost:4318/
curl: (7) Failed to connect to localhost port 4318: Connection refused
# curl http://10.1.0.66:4318/
404 page not found
This helm chart created a daemonset but there is no service running. Is there some setting I need to enable to get this to work?
I confirmed that otel-collector is running on every node in the cluster and the daemonset has HostPort set to 4318.
The problem is with this setting:
OTEL_EXPORTER_OTLP_ENDPOINT=http://0.0.0.0:4318
Think of your pod as a stripped-down host of its own: localhost or 0.0.0.0 refers to the pod itself, and there is no collector deployed inside your pod.
You need to use the address of your collector. Looking at the examples available in the shared repo, the agent-and-standalone and standalone-only setups also create a Kubernetes resource of type Service.
With that you can use the full service name (with namespace) to configure your environment variable.
Also, the environment variable is now called OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, so you will need something like this:
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=<service-name>.<namespace>.svc.cluster.local:<service-port>
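For example, assuming the chart's Service is named opentelemetry-collector in the otel namespace and exposes OTLP over HTTP on 4318 (names are illustrative):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://opentelemetry-collector.otel.svc.cluster.local:4318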
The correct solution is to use the Kubernetes Downward API to fetch the node IP address, which will allow you to export the traces directly to the daemonset pod within the same node:
containers:
  - name: my-app
    image: my-image
    env:
      - name: HOST_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://$(HOST_IP):4318
Note that using the deployment's service as the endpoint (<service-name>.<namespace>.svc.cluster.local) is incorrect, as it effectively bypasses the daemonset and sends the traces directly to the deployment, which makes the daemonset useless.

Traefik delivering two different ports on same docker

I have a problem with traefik. The situation:
I have a docker gitlab with gitlab-registry activated, no problem with that.
My GitLab answers on port 80 and the registry on port 80 as well, and this is not a problem either (I tried it with port mapping and it worked well).
Now I'm trying to configure these in Traefik, and that's when the problems occur. With just one router and one service all is good, but when I try to add the registry I get an error:
The involved part in the docker-compose file:
labels:
  - traefik.http.routers.gitlab.rule=Host(`git.example.com`)
  - traefik.http.routers.gitlab.tls=true
  - traefik.http.routers.gitlab.tls.certresolver=lets-encrypt
  - traefik.http.routers.gitlab.service=gitlabservice
  - traefik.http.services.giltabservice.loadbalancer.server.port=80
  - traefik.http.routers.gitlab.entrypoints=websecure
  - traefik.http.routers.gitlabregistry.rule=Host(`registry.example.com`)
  - traefik.http.routers.gitlabregistry.tls=true
  - traefik.http.routers.gitlabregistry.tls.certresolver=lets-encrypt
  - traefik.http.routers.gitlabregistry.service=gitlabregistryservice
  - traefik.http.services.giltabregistryservice.loadbalancer.server.port=80
  - traefik.http.routers.gitlabregistry.entrypoints=websecure
And the errors I have in traefik:
time="2021-03-27T08:48:20Z" level=error msg="the service "gitlabservice#docker" does not exist" entryPointName=websecure routerName=gitlab#docker
time="2021-03-27T08:48:20Z" level=error msg="the service "gitlabregistryservice#docker" does not exist" entryPointName=websecure routerName=gitlabregistry#docker
Thanks in advance if you have any idea.
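Comparing the labels with the errors: the routers reference gitlabservice and gitlabregistryservice, while the service labels define giltabservice and giltabregistryservice (note the transposed letters), which matches the "does not exist" messages. A sketch of the service labels with the names aligned:
  - traefik.http.services.gitlabservice.loadbalancer.server.port=80
  - traefik.http.services.gitlabregistryservice.loadbalancer.server.port=80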

How to fix "failed to ensure load balancer" error for nginx ingress

When setting a new nginx-ingress using helm and a static ip on Azure the nginx controller never gets the static IP assigned. It always says <pending>.
I install the helm chart as follows -
helm install stable/nginx-ingress --name <my-name> --namespace <my-namespace> --set controller.replicaCount=2 --set controller.service.loadBalancerIP="<static-ip-address>"
It says it installs correctly but there is an error listed as well
E0411 06:44:17.063913 13264 portforward.go:303] error copying from
remote stream to local connection: readfrom tcp4
127.0.0.1:57881->127.0.0.1:57886: write tcp4 127.0.0.1:57881->127.0.0.1:57886: wsasend: An established connection was aborted by the software in your host machine.
I then do a kubectl get all -n <my-namespace> and everything is listed correctly just with the external IP as <pending> for the controller.
I then do a kubectl describe -n <my-namespace> service/<my-name>-nginx-ingress-controller and this error is listed under Events -
Warning CreatingLoadBalancerFailed 11s (x4 over 47s)
service-controller Error creating load balancer (will retry): failed
to ensure load balancer for service
my-namespace/my-name-nginx-ingress-controller: timed out waiting for the
condition.
Thank you kindly
For your issue, the possible reason is that your public IP is not in the same resource group and region as the AKS cluster. See the steps in Create an ingress controller with a static public IP address in Azure Kubernetes Service (AKS).
You can get the AKS node resource group with a CLI command like this:
az aks show --resource-group myResourceGroup --name myAKSCluster --query nodeResourceGroup -o tsv
If your public IP is in a different resource group or region, you will get the timeout error you are seeing.
Make sure that the public IP for your ingress is in the node resource group, and also that its SKU is Basic, not Standard.
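If the IP still needs to be created there, a sketch of creating a Basic static public IP in the node resource group (the group and IP name below are illustrative):
az network public-ip create --resource-group MC_myResourceGroup_myAKSCluster_eastus --name myAKSPublicIP --sku Basic --allocation-method static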

Docker push intermittent failure to private docker registry on kubernetes (docker-desktop)

I'm running a kubernetes cluster on docker-desktop (mac).
It has a local docker registry inside it.
I'm able to query the registry no problem via the API calls to get the list of tags.
I was able to push an image before, but it took multiple attempts to push.
I can't push new changes now. It looks like it pushes successfully for layers, but then doesn't acknowledge the layer has been pushed and then retries.
Repo is called localhost:5000 and I am correctly port forwarding as per instructions on https://blog.hasura.io/sharing-a-local-registry-for-minikube-37c7240d0615/
I'm not using SSL certs, as this is for development on my local machine.
(The port forwarding is proven to work otherwise API call would fail)
e086a4af6e6b: Retrying in 1 second
35c20f26d188: Layer already exists
c3fe59dd9556: Pushing [========================> ] 169.3MB/351.5MB
6ed1a81ba5b6: Layer already exists
a3483ce177ce: Retrying in 16 seconds
ce6c8756685b: Layer already exists
30339f20ced0: Retrying in 1 second
0eb22bfb707d: Pushing [==================================================>] 45.18MB
a2ae92ffcd29: Waiting
received unexpected HTTP status: 502 Bad Gateway
Workaround (this will suffice but is not ideal, as I have to build each container locally):
apiVersion: v1
kind: Pod
metadata:
  name: producer
  namespace: aetasa
spec:
  containers:
    - name: kafkaproducer
      image: localhost:5000/aetasa/cta-user-create-app
      imagePullPolicy: Never  # this line uses the locally built container in Docker
      ports:
        - containerPort: 5005
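For this workaround to find the image, it has to be built and tagged with the same name on the local Docker Desktop daemon beforehand, e.g. (the build context path is illustrative):
docker build -t localhost:5000/aetasa/cta-user-create-app .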
Kubectl logs for registry
10.1.0.1 - - [20/Feb/2019:19:18:03 +0000] "POST /v2/aetasa/cta-user-create-app/blobs/uploads/ HTTP/1.1" 202 0 "-" "docker/18.09.2 go/go1.10.6 git-commit/6247962 kernel/4.9.125-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.2 \x5C(darwin\x5C))" "-"
2019/02/20 19:18:03 [warn] 12#12: *293 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000011, client: 10.1.0.1, server: localhost, request: "PATCH /v2/aetasa/cta-user-create-app/blobs/uploads/16ad0e41-9af3-48c8-bdbe-e19e2b478278?_state=qjngrtaLCTal-7-hLwL9mvkmhOTHu4xvOv12gxYfgPx7Ik5hbWUiOiJhZXRhc2EvY3RhLXVzZXItY3JlYXRlLWFwcCIsIlVVSUQiOiIxNmFkMGU0MS05YWYzLTQ4YzgtYmRiZS1lMTllMmI0NzgyNzgiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMTktMDItMjBUMTk6MTg6MDMuMTU2ODYxNloifQ%3D%3D HTTP/1.1", host: "localhost:5000"
2019/02/20 19:18:03 [error] 12#12: *293 connect() failed (111: Connection refused) while connecting to upstream, client: 10.1.0.1, server: localhost, request: "PATCH /v2/aetasa/cta-user-create-app/blobs/uploads/16ad0e41-9af3-48c8-bdbe-e19e2b478278?_state=qjngrtaLCTal-7-hLwL9mvkmhOTHu4xvOv12gxYfgPx7Ik5hbWUiOiJhZXRhc2EvY3RhLXVzZXItY3JlYXRlLWFwcCIsIlVVSUQiOiIxNmFkMGU0MS05YWYzLTQ4YzgtYmRiZS1lMTllMmI0NzgyNzgiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMTktMDItMjBUMTk6MTg6MDMuMTU2ODYxNloifQ%3D%3D HTTP/1.1", upstream: "http://10.104.68.90:5000/v2/aetasa/cta-user-create-app/blobs/uploads/16ad0e41-9af3-48c8-bdbe-e19e2b478278?_state=qjngrtaLCTal-7-hLwL9mvkmhOTHu4xvOv12gxYfgPx7Ik5hbWUiOiJhZXRhc2EvY3RhLXVzZXItY3JlYXRlLWFwcCIsIlVVSUQiOiIxNmFkMGU0MS05YWYzLTQ4YzgtYmRiZS1lMTllMmI0NzgyNzgiLCJPZmZzZXQiOjAsIlN0YXJ0ZWRBdCI6IjIwMTktMDItMjBUMTk6MTg6MDMuMTU2ODYxNloifQ%3D%3D", host: "localhost:5000"
Try configuring --max-concurrent-uploads=1 for your Docker client. You are pushing quite large layers (350MB), so you are probably hitting some limit (request sizes, timeouts) somewhere. A single concurrent upload may help, but it is only a workaround. The real solution is tuning the configuration (buffer sizes, timeouts, ...) of the registry and, eventually, of the reverse proxy in front of the registry.
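For example, on Linux this setting goes in /etc/docker/daemon.json (on Docker Desktop for Mac it can be set in the daemon configuration under Preferences), followed by a restart of the Docker daemon:
{
  "max-concurrent-uploads": 1
}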
It may be a disk space issue. If you store docker images inside the Docker VM you can fill up the disk space quite fast.
By default, docker-desktop VM disk space is limited to 64 gigabytes. You can increase it up to 112Gb on the "Disk" tab in Docker Preferences.
I have encountered this issue quite a few times and unfortunately couldn't find a permanent fix.
Most likely the image has become corrupted in the registry. As a workaround, I suggest you delete the image from the registry and do a fresh push; that works, and subsequent pushes work too.
This issue is likely related to missing layers of the image. Sometimes we delete an image with the --force option; in that case it is possible that some common layers get deleted, which affects other images that share them.

Unable to push image to Docker Hub registry

I am brand new to Docker and I am trying to follow the Getting Started tutorial from Docker. I am using Docker 17.05-ce under Ubuntu 17.04. The problem appears to be network related. When I try to push I get the following results:
jonathan@poseidon:~/DockerTest$ sudo docker push jgossage/get-started:part1
The push refers to a repository [docker.io/jgossage/get-started]
1770f1c9a8cf: Pushed
61fd1d8cd138: Pushed
e0f735a5e86f: Layer already exists
1de570a07fb5: Pushed
b3640b6d4ac2: Layer already exists
08d4c9ccebfd: Pushed
007ab444b234: Retrying in 1 second
dial tcp: lookup registry-1.docker.io on 127.0.0.53:53: dial udp 127.0.0.53:53: i/o timeout
jonathan@poseidon:~/DockerTest$ sudo docker logs 58e8df0a7426
* Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
172.17.0.1 - - [20/Jun/2017 15:12:24] "GET / HTTP/1.1" 200 -
172.17.0.1 - - [20/Jun/2017 15:13:17] "GET / HTTP/1.1" 200 -
The push runs for some time with several retries before timing out.
This is on a home network with one computer connected to the router via WiFi and then normal TCP to my ISP and the Internet. What steps can I take to make Docker run reliably?
It looks like a DNS issue similar to this one: https://forums.docker.com/t/fata-0025-io-timeout-on-docker-image-push/1742/9
The suggestion is to replace your current DNS (127.0.0.53) with the Google DNS (8.8.8.8).
I'm not sure if there is an open issue concerning this problem. I couldn't find one.
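One way to apply that suggestion on Ubuntu, where 127.0.0.53 is the systemd-resolved stub resolver, is to set the upstream servers in /etc/systemd/resolved.conf:
[Resolve]
DNS=8.8.8.8 8.8.4.4
and then restart the resolver:
sudo systemctl restart systemd-resolved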
I resolved this issue by replacing the standard caching and resolving DNS server with a third-party implementation, unbound. The following web page contains complete instructions for doing this at the end of the document. As also suggested by others, it is a good idea to switch to the public Google DNS servers.
