kubectl timeout inside kube-addon-manager - docker

I was debugging an issue in my cluster: kubectl commands time out inside the kube-addon-manager pod, while the equivalent curl command works fine.
bash-4.3# kubectl get node --v 10
I1119 16:35:55.506867 54 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.10.5 (linux/amd64) kubernetes/32ac1c9" http://localhost:8080/api
I1119 16:36:25.507550 54 round_trippers.go:405] GET http://localhost:8080/api in 30000 milliseconds
I1119 16:36:25.507959 54 round_trippers.go:411] Response Headers:
I1119 16:36:25.508122 54 cached_discovery.go:124] skipped caching discovery info due to Get http://localhost:8080/api: dial tcp: i/o timeout
Equivalent curl command output
bash-4.3# curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.10.5 (linux/amd64) kubernetes/32ac1c9" http://localhost:8080/api
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /api HTTP/1.1
> Host: localhost:8080
> Accept: application/json, */*
> User-Agent: kubectl/v1.10.5 (linux/amd64) kubernetes/32ac1c9
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Mon, 19 Nov 2018 16:43:00 GMT
< Content-Length: 134
<
{"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"172.16.1.13:6443"}]}
* Connection #0 to host localhost left intact
I also tried running a Docker container in host network mode; the kubectl command still times out.
kube-addon-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-addon-manager
  namespace: kube-system
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ''
  labels:
    component: kube-addon-manager
spec:
  hostNetwork: true
  containers:
  - name: kube-addon-manager
    image: gcr.io/google-containers/kube-addon-manager:v8.6
    imagePullPolicy: IfNotPresent
    command:
    - /bin/bash
    - -c
    - /opt/kube-addons.sh
    resources:
      requests:
        cpu: 5m
        memory: 50Mi
    volumeMounts:
    - mountPath: /etc/kubernetes/
      name: addons
      readOnly: true
  volumes:
  - name: addons
    hostPath:
      path: /etc/kubernetes/

It seems that your config is trying to talk to port 8080, which is the insecure port of the kube-apiserver.
You can try starting your kube-apiserver with this option:
--insecure-port
The default for the insecure port is 8080. Note that this option is deprecated and has been removed from newer Kubernetes releases.
Also, keep in mind that the kube-addon-manager is part of the legacy add-ons.

Related

I ran the KServe example for sklearn-iris and got `302 Found`

Serving the example model
create namespace
$ kubectl create namespace kserve-test
create InferenceService
$ kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
check
$ kubectl get inferenceservices sklearn-iris -n kserve-test
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-default-00001   5h11m
check SERVICE_HOSTNAME, INGRESS_PORT, INGRESS_HOST
SERVICE_HOSTNAME
$ SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
$ echo $SERVICE_HOSTNAME
sklearn-iris.kserve-test.example.com
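The cut step can be checked offline against the URL from the output above. This is only a sketch: the url variable is introduced here for illustration and stands in for the value that kubectl returns for .status.url.

```shell
# Offline check of the hostname extraction used above.
# "url" is a hypothetical variable holding the .status.url value shown earlier.
url="http://sklearn-iris.kserve-test.example.com"

# Splitting on "/" gives: field 1 = "http:", field 2 = "" (between the two
# slashes), field 3 = the host name.
SERVICE_HOSTNAME=$(printf '%s\n' "$url" | cut -d "/" -f 3)
echo "$SERVICE_HOSTNAME"
```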
INGRESS_PORT
$ INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
$ echo $INGRESS_PORT
31018
INGRESS_HOST
$ INGRESS_HOST=192.168.219.100
192.168.219.100 : Internal IP of my device
create input
$ cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
send request
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json
* Trying 192.168.219.100:31018...
* Connected to 192.168.219.100 (192.168.219.100) port 31018 (#0)
> POST /v1/models/sklearn-iris:predict HTTP/1.1
> Host: sklearn-iris.kserve-test.example.com
> User-Agent: curl/7.71.1
> Accept: */*
> Content-Length: 76
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 76 out of 76 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< location: /dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY3MTU5MDMxOHxFd3dBRUhZek16WktlRlZJZFc1alowVlROVTA9fFynQ-3082qPF_-qUwnYllySrEQPAKGqpBuF-Pu9gcnx
< date: Wed, 21 Dec 2022 02:38:38 GMT
< x-envoy-upstream-service-time: 6
< server: istio-envoy
< content-length: 0
<
* Connection #0 to host 192.168.219.100 left intact
Got 302 Found.
As far as I can tell, this happens because Dex is asking for authentication.
What I did to try to solve this
Authentication
I tried to get the authservice_session token by following the method here: kserve:github
From CLI
$ CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.clusterIP}')
CLUSTER_IP: 10.103.239.220
$ curl -v http://${CLUSTER_IP}
* Trying 10.103.239.220:80...
* Connected to 10.103.239.220 (10.103.239.220) port 80 (#0)
> GET / HTTP/1.1
> Host: 10.103.239.220
> User-Agent: curl/7.71.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< content-type: text/html; charset=utf-8
< location: /dex/auth?client_id=kubeflow-oidc-authservice&redirect_uri=%2Flogin%2Foidc&response_type=code&scope=profile+email+groups+openid&state=MTY3MTU5MzE2MXxFd3dBRUdGM2JUSnlaMkZVYjFWUVNFa3dURGs9fEDuO8ql3cFsetSfKntLvFV0al5tEZJeh23VK-JrJubM
< date: Wed, 21 Dec 2022 03:26:01 GMT
< content-length: 269
< x-envoy-upstream-service-time: 4
< server: istio-envoy
<
Found.
* Connection #0 to host 10.103.239.220 left intact
I am stuck at this stage. I don't think I should be seeing 302 Found here.
From the browser
Copy the token content from the cookie authservice_session
$ SESSION=MTY3MTUyODQ2M3xOd3dBTkRkRVExbEdVa0kzVFRJMFMwOU9VRE5hV2pSS1VGVkNSRVJVUlRKVlVVOUlTa2hDVWpOU1RUZFRVRkJGVTFGV1N6UktXVkU9fCoQdbMu_diLBJAKLZSmF4qoqQTlINKq7A63hy-QNQcR
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Cookie: authservice_session=${SESSION}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d ./iris-input.json
got 404 Not Found
$ curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Cookie: authservice_session=${SESSION}" http://${CLUSTER_IP}/v1/models/${MODEL_NAME}:predict -d @./iris-input.json
got 404 Not Found
Added an EnvoyFilter to bypass Dex
$ vi envoyfilter.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: sklearn-iris-filter
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: VIRTUAL_HOST
    match:
      routeConfiguration:
        vhost:
          name: sklearn-iris.kserve-test.example.com:31018
    patch:
      operation: MERGE
      value:
        per_filter_config:
          envoy.ext_authz:
            disabled: true
$ kubectl apply -f envoyfilter.yaml
Note: spec.configPatches.match.routeConfiguration.vhost.name : sklearn-iris.kserve-test.example.com:31018
It did not work.
I still got 302 Found.
External Authorization
reference: https://github.com/kubeflow/kubeflow/issues/4549#issuecomment-932259673
add AuthorizationPolicy
$ vi authorizationpolicy.yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: dex-auth
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: CUSTOM
  provider:
    # The provider name must match the extension provider defined in the mesh config.
    name: dex-auth-provider
  rules:
  # The rules specify when to trigger the external authorizer.
  - to:
    - operation:
        notPaths: ["/v1*"]
$ kubectl apply -f authorizationpolicy.yaml
Note: rules.to.operation: notPaths: ["/v1*"]
Then delete the EnvoyFilter named authn-filter that originally existed:
$ kubectl delete -n istio-system envoyfilters.networking.istio.io authn-filter
Next, restart deployment/istiod:
$ kubectl rollout restart deployment/istiod -n istio-system
It did not work.
I still get 302 Found if I don't delete the original authn-filter EnvoyFilter, and the connection is blocked if I do delete it.
What I need help with:
How can I authenticate with Dex and get a connection through?
Or how can I bypass Dex if I can't authenticate with it?
Maybe my model serving example is wrong. Thanks for any advice on what's wrong.
env:
ubuntu 20.04
$ kubectl version --client && kubeadm version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.13", GitCommit:"a43c0904d0de10f92aa3956c74489c45e6453d6e", GitTreeState:"clean", BuildDate:"2022-08-17T18:28:56Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.13", GitCommit:
I installed Istio and Knative, which are included in the Kubeflow manifests, by following: kubeflow/manifests.
$ kubectl get pod -n istio-system
istio-system authservice-0 1/1 Running
istio-system cluster-local-gateway-5449f87d9b-bb4vs 1/1 Running
istio-system istio-ingressgateway-77b9d69b74-xmv98 1/1 Running
istio-system istiod-67fcb675b5-kzfvd 1/1 Running
$ kubectl get pod -n knative-eventing
knative-eventing eventing-controller-8457bd9747-855lc 1/1 Running
knative-eventing eventing-webhook-69986cfb5d-hn7tx 1/1 Running
$ kubectl get pod -n knative-serving
knative-serving activator-7c5cd78566-pz6ns 2/2 Running
knative-serving autoscaler-98487645d-vh5wk 2/2 Running
knative-serving controller-7546f544b7-mng9g 2/2 Running
knative-serving domain-mapping-5d56bfc7d-5cb9l 2/2 Running
knative-serving domainmapping-webhook-696559d49c-p8rwr 2/2 Running
knative-serving net-istio-controller-c4d469c-lt5fl 2/2 Running
knative-serving net-istio-webhook-855bcb6747-wbl4x 2/2 Running
knative-serving webhook-59f9fdd446-xsf6n 2/2 Running
And installed KServe and KServe Built-in ClusterServingRuntimes by following: kserve installation
$ kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.9.0/kserve.yaml
$ kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.9.0/kserve-runtimes.yaml
$ kubectl get pod -n kserve
kserve-controller-manager-5fc887875d-m7rlp 2/2 Running
check gateway selector
knative-local-gateway in namespace knative-serving
$ kubectl get gateways knative-local-gateway -n knative-serving -o yaml
spec:
  selector:
    app: cluster-local-gateway
    istio: cluster-local-gateway
istio-ingressgateway in namespace istio-system
$ kubectl get gateways istio-ingressgateway -n istio-system -o yaml
spec:
  selector:
    app: istio-ingressgateway
    istio: ingressgateway
cluster-local-gateway in namespace istio-system
$ kubectl get gateways cluster-local-gateway -n istio-system -o yaml
spec:
  selector:
    app: cluster-local-gateway
    istio: cluster-local-gateway
kubeflow-gateway in namespace kubeflow
$ kubectl get gateways kubeflow-gateway -n kubeflow -o yaml
spec:
  selector:
    istio: ingressgateway

Cannot access Docker container from another

Using this docker-compose file:
version: '3'
services:
  hello:
    image: nginxdemos/hello
    ports:
      - 7080:80
  tool:
    image: wbitt/network-multitool
    tty: true
networks:
  default:
    name: test-network
If I curl from the host, it works.
❯ curl -s -o /dev/null -v http://192.168.1.102:7080
* Expire in 0 ms for 6 (transfer 0x8088b0)
* Trying 192.168.1.102...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x8088b0)
* Connected to 192.168.1.102 (192.168.1.102) port 7080 (#0)
> GET / HTTP/1.1
> Host: 192.168.1.102:7080
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.23.1
< Date: Sun, 10 May 2071 00:06:00 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< Expires: Sun, 10 May 2071 00:05:59 GMT
< Cache-Control: no-cache
<
{ [6 bytes data]
* Connection #0 to host 192.168.1.102 left intact
If I try to contact another container from within the network, it fails.
❯ docker exec -it $(gdid tool) curl -s -o /dev/null -v http://hello
* Could not resolve host: hello
* Closing connection 0
Is this intended behaviour? I thought containers within the same network (and started with docker-compose) are meant to be able to reach each other by their service name?
I am bringing the containers up with docker-compose up -d

Accessing service from an Alpine-based k8s pod is throwing a DNS Resolution error

I have pod A (it's actually the kube-scheduler pod) and pod B (a pod that has a REST API that will be invoked by pod A).
For this purpose, I created a ClusterIP service.
Now, when I exec into pod A to perform the API call to pod B, I get:
curl: (6) Could not resolve host: my-svc.default.svc.cluster.local
I tried to follow the debug instructions mentioned here:
kubectl exec -i -t dnsutils -- nslookup my-svc.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: my-svc.default.svc.cluster.local
Address: 10.111.181.13
Also:
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
This seems to be working as expected. However, when I exec into pod A, I get:
kubectl exec -it kube-scheduler -n kube-system -- sh
/bin # nslookup kubernetes.default
Server: 8.8.8.8
Address: 8.8.8.8:53
** server can't find kubernetes.default: NXDOMAIN
** server can't find kubernetes.default: NXDOMAIN
Other debugging steps (inside pod A) include:
/bin # cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 172.30.0.1
And:
/bin # cat /etc/*-release
3.12.8
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.12.8
PRETTY_NAME="Alpine Linux v3.12"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
There are no useful logs from the coredns pods, either.
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
From the documentation, it seems there is a known issue with Alpine and DNS resolution (even though the version I have is greater than the version they mentioned).
Is there a workaround for this that enables accessing the service properly from the Alpine pod?
Edit providing pod A manifest:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --config=/etc/kubernetes/sched-cs.yaml
    - --port=0
    image: localhost:5000/scheduler-plugins/kube-scheduler:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/sched-cs.yaml
      name: sched-cs
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/sched-cs.yaml
      type: FileOrCreate
    name: sched-cs
status: {}
Edit 2:
Adding the following lines manually to /etc/resolv.conf of Pod A allows me to perform the curl request successfully.
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Wouldn't there be a cleaner/less manual way to achieve the same result?
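To see what the search and ndots lines contribute (the nameserver line is what points the pod at the cluster DNS in the first place), the sketch below lists the names a resolver would try for a short name. It assumes glibc-style behavior; Alpine's musl resolver differs in some details. The candidates function is a hypothetical helper, not part of any tool.

```shell
# candidates NAME NDOTS SEARCH_DOMAIN...
# Prints, in order, the names a glibc-style resolver would query.
candidates() {
  name=$1
  ndots=$2
  shift 2
  # Count the dots in the name; $(( )) strips any padding wc may add.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name"              # enough dots: tried as-is first
  fi
  for domain in "$@"; do
    echo "$name.$domain"      # each search domain is appended in turn
  done
  if [ "$dots" -lt "$ndots" ]; then
    echo "$name"              # too few dots: tried as-is last
  fi
}

# Values taken from the resolv.conf lines above.
candidates my-svc.default 5 default.svc.cluster.local svc.cluster.local cluster.local
```

With these values, the second candidate is my-svc.default.svc.cluster.local, which is the name the dnsutils lookup resolved earlier.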
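Try setting the dnsPolicy for pod A (or for whatever Deployment, StatefulSet, etc. defines its template) to ClusterFirst or ClusterFirstWithHostNet. Since pod A runs with hostNetwork: true, ClusterFirstWithHostNet is the one that applies: for host-network pods, ClusterFirst falls back to the behavior of the Default policy.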
The behavior of this setting depends on how your cluster and kubelet are set up, but in most default configurations this will make the kubelet set resolv.conf inside the pod to use the kube-dns service that you manually set in your edit (10.96.0.10), which will forward lookups outside the cluster to the nameservers for the host.
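A minimal sketch of what that looks like for a host-network pod follows. The pod name dns-check, the busybox image, and the scratch directory are placeholders chosen for illustration, not taken from the question; the kubectl commands are left as comments because they need a live cluster.

```shell
# Write a minimal example manifest to a scratch directory.
# "dns-check" and the busybox image are placeholders, not from the question.
workdir=$(mktemp -d)
cat <<'EOF' > "$workdir/dns-check.yaml"
apiVersion: v1
kind: Pod
metadata:
  name: dns-check
spec:
  hostNetwork: true
  # With hostNetwork: true, ClusterFirst alone falls back to the node's
  # resolv.conf, so the WithHostNet variant is set explicitly.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: dns-check
    image: busybox
    command: ["sleep", "3600"]
EOF
cat "$workdir/dns-check.yaml"

# To try it on a cluster (not run here):
#   kubectl apply -f "$workdir/dns-check.yaml"
#   kubectl exec dns-check -- cat /etc/resolv.conf
```

For pod A, the equivalent change would be a single dnsPolicy line under spec, at the same level as hostNetwork: true.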
K8s docs
The error curl: (6) Could not resolve host mainly occurs because of an incorrect DNS setup or bad settings on the server. You can find an explanation of this problem.
If you want to apply a custom DNS configuration you can do so according to this documentation:
If a Pod's dnsPolicy is set to default, it inherits the name resolution configuration from the node that the Pod runs on. The Pod's DNS resolution should behave the same as the node. But see Known issues.
If you don't want this, or if you want a different DNS config for pods, you can use the kubelet's --resolv-conf flag. Set this flag to "" to prevent Pods from inheriting DNS. Set it to a valid file path to specify a file other than /etc/resolv.conf for DNS inheritance.
Another solution is to build your own image that already contains the values you need.

Errors from my k8s CronJob: Pod errors: Error with exit code 127

I have a backend service deployed on a private GKE cluster, and I want to execute this CronJob, but every time I get the following error: Pod errors: Error with exit code 127
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: call-callendar-api-demo
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            env: demo
          containers:
          - name: call-callendar-api-demo
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", 'curl -X POST "curl -X POST "https://x.x.x/api/v1/cal/add_link" -H "accept: application/json" -d "" >/dev/null 2>&1" -H "accept: application/json" -d "" >/dev/null 2>&1']
          restartPolicy: Never
Any suggestions why this CronJob, which is deployed in the same namespace as my backend service, is giving me this error? Also, there are no logs in the container :( By the way, I have basic auth; could that be a reason?
Edit: Logs from the pod after removing >/dev/null/:
textPayload: "curl: (3) URL using bad/illegal format or missing URL
textPayload: "
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: application
"
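The command was wrong, and the image did not include curl: exit code 127 means the shell could not find the command, and busybox does not ship curl. I switched to an image that includes curl. It is supposed to look like this: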
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: demo
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            env: demo
          containers:
          - name: -demo
            image: curlimages/curl # changed the image
            command: ["/bin/sh"]
            args: ["-c", 'curl -X POST "https://x.x.x/api/v1/cal/addy_link" -H "accept: application/json" -d "" >/dev/null 2>&1']
          restartPolicy: Never
It solved my problem.
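Both symptoms can be reproduced locally without a cluster. The sketch below first shows the exit status a shell reports for a command that is not installed, then shows how nested double quotes split into the wrong words, which matches the curl: (3) and Could not resolve host: application messages in the logs. The show_words helper and the shortened URL are made up for illustration.

```shell
# 1. A POSIX shell reports status 127 when the command does not exist, which
#    is what /bin/sh -c 'curl ...' does in an image that has no curl.
missing_status=0
sh -c 'no-such-command-xyz' 2>/dev/null || missing_status=$?
echo "missing command -> exit status $missing_status"

# 2. Double quotes do not nest. show_words is a hypothetical helper that
#    prints each argument it receives in brackets, one per line.
show_words() {
  for w in "$@"; do
    printf '[%s]\n' "$w"
  done
}

# Shortened version of the broken pattern: the inner quotes end the outer
# string early, so the pieces are glued together in unexpected ways.
broken=$(show_words "outer "https://x.x.x/api" -H "accept: application/json" end")
# Intended form: one quoted string per value.
fixed=$(show_words "https://x.x.x/api" -H "accept: application/json")

printf 'broken:\n%s\n' "$broken"
printf 'fixed:\n%s\n' "$fixed"
```

In the broken form the first word contains spaces and the header name, so curl would treat it as a malformed URL, and the second word starts with application/json, which curl would then try to resolve as a host.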

Liveness probe with http post

I'm running a web service whose specification I cannot change. I want to use a liveness probe with HTTP POST on Kubernetes, but I couldn't find anything available. All of my efforts with busybox and netcat have failed.
Is there a solution? Is it possible to build a custom liveness probe from any Linux distribution?
Kubernetes Probes only support HTTP GET, TCP & Command.
If you must check something over HTTP POST you could use a command approach and just curl -XPOST ..
An example would be:
...
containers:
- name: k8-byexamples-spring-rest
  image: gcr.io/matthewdavis-byexamples/k8-byexamples-spring-rest:1d4c1401c9485ef61322d9f2bb33157951eb351f
  ports:
  - containerPort: 8080
    name: http
  livenessProbe:
    exec:
      command:
      # Each list item is passed as one argument, so -X and POST are separate entries.
      - curl
      - -X
      - POST
      - http://localhost/test123
    initialDelaySeconds: 5
    periodSeconds: 5
...
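An exec probe passes or fails on the command's exit status, and plain curl exits 0 even when the server answers with an HTTP error. The sketch below is one way to make the POST check strict. It assumes curl is present in the container image; post_probe is a hypothetical helper name, and the 3-second timeout is an arbitrary choice.

```shell
# post_probe URL
# Sends an empty POST and returns curl's exit status: 0 only when the server
# answered with a status below 400, non-zero otherwise (including timeouts
# and refused connections).
post_probe() {
  curl --fail --silent --show-error --max-time 3 --output /dev/null \
       -X POST "$1"
}

# Example call with the URL from the manifest above.
if post_probe "http://localhost/test123"; then
  echo "probe ok"
else
  echo "probe failed"
fi
```

Inside a manifest, the same flags could be listed one per line under command, or passed as a single string to /bin/sh -c if the image has a shell.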
For more explanation see: https://matthewdavis.io/kubernetes-health-checks-demystified/.
Hope that helps!

Resources