Issues with Kubernetes multi-master using kubeadm on premises - docker

Following the Kubernetes v1.11 documentation, I managed to set up Kubernetes high availability using kubeadm with stacked control plane nodes: 3 masters running on-premises on CentOS 7 VMs. With no load balancer available, I used Keepalived to set a failover virtual IP (10.171.4.12) for the apiserver, as described in the Kubernetes v1.10 documentation. As a result, the "kubeadm-config.yaml" used to bootstrap the control planes had the following header:
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.0
apiServerCertSANs:
- "10.171.4.12"
api:
  controlPlaneEndpoint: "10.171.4.12:6443"
etcd:
  ...
The configuration went fine, apart from the following warning that appeared when bootstrapping all 3 masters:
[endpoint] WARNING: port specified in api.controlPlaneEndpoint
overrides api.bindPort in the controlplane address
And this Warning when joining Workers:
[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will
not be used, because the following required kernel modules are not
loaded: [ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh] or no builtin kernel ipvs
support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{}
nf_conntrack_ipv4:{}] you can solve this problem with following
methods:
1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support
Afterwards, basic tests succeed:
When stopped, Keepalived is failing over to another Master and apiserver is always accessible (all kubectl commands succeed).
When stopping the main Master (with highest Keepalived preference), the deployment of apps is successful (tested with Kubernetes bootcamp) and everything syncs properly with the main Master when it is back online.
Kubernetes bootcamp application runs successfully, and all master & worker nodes respond properly when the service exposing bootcamp with NodePort is curled.
Successfully deployed docker-registry as per https://github.com/kubernetes/ingress-nginx/tree/master/docs/examples/docker-registry
But then come these issues:
The Nginx Ingress Controller pod fails to run and enters the CrashLoopBackOff state (see the events below).
After installing helm and tiller on any master, all "helm install" and "helm list" commands fail (see the command outputs below).
I am running Kubernetes v1.11.1 but kubeadm-config.yaml mentions 1.11.0; is this something I should worry about?
Should I stop following the official documentation and go for alternatives such as the one described at: https://medium.com/@bambash/ha-kubernetes-cluster-via-kubeadm-b2133360b198
Note: same issue with new Kubernetes HA installation using the latest version 1.11.2 (three masters + one worker) and deployed nginx latest ingress controller release 0.18.0.
-- Nginx controller pod events & logs:
Normal Pulled 28m (x38 over 2h) kubelet, node3.local Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.17.1" already present on machine
Warning Unhealthy 7m (x137 over 2h) kubelet, node3.local Liveness probe failed: Get http://10.240.3.14:10254/healthz: dial tcp 10.240.3.14:10254: connect: connection refused
Warning BackOff 2m (x502 over 2h) kubelet, node3.local Back-off restarting failed container
nginx version: nginx/1.13.12
W0809 14:05:46.171066 5 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0809 14:05:46.171748 5 main.go:191] Creating API client for https://10.250.0.1:443
-- helm command outputs:
# helm install ...
Error: no available release name found
# helm list
Error: Get https://10.250.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.250.0.1:443: i/o timeout
-- kubernetes service & endpoints:
# kubectl describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.250.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443
Session Affinity: None
Events: <none>
# kubectl get endpoints --all-namespaces
NAMESPACE NAME ENDPOINTS AGE
default bc-svc 10.240.3.27:8080 6d
default kubernetes 10.171.4.10:6443,10.171.4.8:6443,10.171.4.9:6443 7d
ingress-nginx default-http-backend 10.240.3.24:8080 4d
kube-system kube-controller-manager <none> 7d
kube-system kube-dns 10.240.2.4:53,10.240.2.5:53,10.240.2.4:53 + 1 more... 7d
kube-system kube-scheduler <none> 7d
kube-system tiller-deploy 10.240.3.25:44134 5d

The problems were solved when I switched my pod network from Flannel to Calico.
(tested on Kubernetes 1.11.0; will repeat tests tomorrow on latest k8s version 1.11.2)

As you can see in the Kubernetes client-go code, IP address and port are read from environment variables inside a container:
host, port := os.Getenv("KUBERNETES_SERVICE_HOST"), os.Getenv("KUBERNETES_SERVICE_PORT")
You can check these variables by running the following command against any healthy pod:
$ kubectl exec <healthy-pod-name> -- printenv | grep SERVICE
I think the cause of the problem is that KUBERNETES_SERVICE_HOST:KUBERNETES_SERVICE_PORT is set to 10.250.0.1:443 instead of 10.171.4.12:6443.
Could you confirm it by checking these variables in your cluster?
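The lookup quoted above can be sketched in shell, assuming a POSIX shell is available inside the container. The function name in_cluster_api_url is hypothetical, and unlike client-go this sketch does not bracket IPv6 addresses; it only shows which address an in-cluster client would dial.

```shell
# Hypothetical helper (not part of client-go): assemble the API server URL
# from the same two environment variables read by the Go line above.
in_cluster_api_url() {
  host="${KUBERNETES_SERVICE_HOST:-}"
  port="${KUBERNETES_SERVICE_PORT:-}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "KUBERNETES_SERVICE_HOST or KUBERNETES_SERVICE_PORT is not set" >&2
    return 1
  fi
  printf 'https://%s:%s\n' "$host" "$port"
}
```

Run inside a pod, the result can be compared with the address shown in the helm and ingress controller errors (https://10.250.0.1:443 in the question).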

Important Additional Notes:
After running a couple of labs, I got the same issue with:
- a new Kubernetes HA installation using the latest version 1.11.2 (three masters + one worker) and the latest nginx ingress controller release 0.18.0.
- a standalone Kubernetes master with a few workers using version 1.11.1 (one master + two workers) and the latest nginx ingress controller release 0.18.0.
- but with a standalone Kubernetes master on version 1.11.0 (one master + two workers), nginx ingress controller 0.17.1 worked with no complaints, while 0.18.0 complained that the readiness probe failed, although the pod still went into the Running state.
=> As a result, I think the issue is related to Kubernetes releases 1.11.1 and 1.11.2, perhaps in the way they interpret the health probes.

Can not ping to pod's ip of worker node in kubernetes

My cluster includes 1 master and 2 worker nodes. I created a pod using a deployment yaml. The pod is running successfully on worker node 1. I can ping the pod's IP from the worker nodes, but I can't ping it from the master. I tried disabling firewalld and restarting docker, without success. Please see my commands:
[root@k8s-master ~]# kubectl get pods -o wide | grep qldv
qldv-liberty-8499dfcf67-55njr 1/1 Running 0 6m42s 10.40.0.2 worker-node1 <none> <none>
[root@k8s-master ~]# ping 10.40.0.2
PING 10.40.0.2 (10.40.0.2) 56(84) bytes of data.
From 10.32.0.1 icmp_seq=1 Destination Host Unreachable
From 10.32.0.1 icmp_seq=2 Destination Host Unreachable
From 10.32.0.1 icmp_seq=3 Destination Host Unreachable
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 43d v1.15.0
worker-node1 Ready <none> 42d v1.15.0
worker-node2 Ready <none> 42d v1.15.0
[root@k8s-master ~]# kubectl describe pod qldv-liberty-8499dfcf67-55njr
Name: qldv-liberty-8499dfcf67-55njr
Namespace: default
Priority: 0
Node: worker-node1/192.168.142.130
Start Time: Sat, 17 Aug 2019 20:05:57 +0700
Labels: app=qldv-liberty
pod-template-hash=8499dfcf67
Annotations: <none>
Status: Running
IP: 10.40.0.2
Controlled By: ReplicaSet/qldv-liberty-8499dfcf67
Containers:
qldv-liberty:
Container ID: docker://03636fb62d4cca0e41f4ad9f5a94b50cf371089ab5a0813ed802d02f4ac4b07a
Image: qldv-liberty
Image ID: docker://sha256:bd0d7ce1c07da5b9d398131b17da7a6931a9b7ae0673d19a6ec0c409416afc69
Port: 9080/TCP
Host Port: 0/TCP
State: Running
Started: Sat, 17 Aug 2019 20:06:23 +0700
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-vtphv (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-vtphv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-vtphv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 119s default-scheduler Successfully assigned default/qldv-liberty-8499dfcf67-55njr to worker-node1
Normal Pulled 96s kubelet, worker-node1 Container image "qldv-liberty" already present on machine
Normal Created 95s kubelet, worker-node1 Created container qldv-liberty
Normal Started 91s kubelet, worker-node1 Started container qldv-liberty
I have another app that also has a pod running on worker node 1, and I can ping that pod's IP from the master. I don't know why it is impossible in the case above.
Please help me!
I doubt that the cluster still exists, therefore I'd better share some troubleshooting tips:
Check the status of all control plane components and the node status. Ensure that the kube-proxy and network add-on (flannel/calico/weave/etc.) Pods exist on each node and are in the Ready state.
kubectl get deployments,daemonsets,pods,svc -A -o wide
There are several requirements for a Kubernetes cluster, and it is worth checking that they are satisfied.
Some useful information can be found in the control plane component logs using
kubectl logs kube-component-name-pod -n kube-system
or in the kubelet logs using
journalctl -u kubelet
It's better to use well-known images like nginx or mendhak/http-https-echo. They can be configured to listen on any desired port and provide detailed information about requests in logs or in the HTTP reply. This helps to rule out application- or image-related issues.
Check connectivity to the Pod IP and the Service ClusterIP within the same node first.
If the worker node OS doesn't have the necessary troubleshooting tools (e.g. container-optimized images or CoreOS), a Pod with an Ubuntu or Busybox image can be used instead. Creating a Deployment or DaemonSet can help to schedule it on all nodes. Note that firewall or network issues can block kubectl exec connections to those Pods.
If everything works fine within the same node but a connection to the Pod can't be established from another node, it is worth checking the network add-on status and the nodes' firewall configuration. Native Linux firewall helpers can interfere with the iptables rules created by kube-proxy and block the connection.
Clusters created in public clouds may require additional routing, peering, cloud firewall or security group configuration to allow full IPIP connectivity between cluster nodes, especially if they are created in different VPCs.
The next thing worth checking is coredns/kube-dns health. They are supposed to resolve cluster Service names like servicename.namespacename.svc.cluster.local to the correct IP address when queried via their Pod IP addresses or via the kube-dns Service (it usually has the IP address 10.96.0.10 in the default kubeadm cluster configuration).
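For the DNS check, a small sketch of how the Service name is assembled may help. The helper name svc_fqdn is hypothetical, and cluster.local is assumed as the default cluster domain; the commented lookup needs a real cluster and uses the kube-dns Service IP mentioned above.

```shell
# Hypothetical helper: build the fully qualified name of a Service.
# $1 = Service name, $2 = namespace, $3 = cluster domain (assumed default: cluster.local)
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Example lookup from a troubleshooting Pod (requires a running cluster):
#   nslookup "$(svc_fqdn kubernetes default)" 10.96.0.10
```

If the lookup fails against the kube-dns Service IP but succeeds against a coredns Pod IP directly, the problem is more likely in Service routing (kube-proxy) than in DNS itself.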
Solutions for each problem can be found in other answers on StackExchange sites. The official documentation is another great source of information and also contains good examples.

kubernetes 1.12.2 failed to load Kubelet config file /var/lib/kubelet/config.yaml

Environment:
Kubernetes 1.12.2
Docker 18.9.0
microk8s.kubectl
$ k get all
NAME READY STATUS RESTARTS AGE
pod/mysql-0 1/1 Running 0 72s
pod/nginx-ingress-microk8s-controller-c2pgz 0/1 CrashLoopBackOff 129 22h
pod/web-0 1/1 Running 0 78s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 70m
service/mysql-service ClusterIP None <none> 3306/TCP 72s
service/nginx-service ClusterIP None <none> 80/TCP 78s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nginx-ingress-microk8s-controller 1 1 0 1 0 <none> 2d22h
NAME DESIRED CURRENT AGE
statefulset.apps/mysql 1 1 72s
statefulset.apps/web 1 1 78s
/var/log/syslog:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Error syncing pod f0ab0f74-e6f2-11e8-8410-482ae31e6a94 ("nginx-ingress-microk8s-controller-c2pgz_default(f0ab0f74-e6f2-11e8-8410-482ae31e6a94)"), skipping: failed to "StartContainer" for "nginx-ingress-microk8s" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=nginx-ingress-microk8s pod=nginx-ingress-microk8s-controller-c2pgz_default(f0ab0f74-e6f2-11e8-8410-482ae31e6a94)"
What is nginx-ingress-microk8s-controller-c2pgz? Who started it?
You mentioned in the comments that the cause is related to kubeadm init failing.
The /var/lib/kubelet/config.yaml config file is populated only after:
A successful cluster initialization (kubeadm init) on the master node.
In the worker node - after a successful joining to the cluster (kubeadm join).
So if the problem is with kubeadm init you should check the command's output (also great if you could paste it in the question).
Make sure you don't run kubeadm init with the --ignore-preflight-errors=all flag.
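A minimal sketch for confirming that symptom on a node, assuming a POSIX shell. The function name check_kubelet_config is hypothetical; the default path is the one from the error message.

```shell
# Hypothetical helper: report whether the kubelet config file exists.
# $1 = path to check (defaults to the path from the error message)
check_kubelet_config() {
  cfg="${1:-/var/lib/kubelet/config.yaml}"
  if [ -f "$cfg" ]; then
    echo "present: $cfg"
  else
    echo "missing: $cfg (expected after a successful kubeadm init or kubeadm join)"
    return 1
  fi
}
```

A "missing" result on a master points back at kubeadm init, and on a worker at kubeadm join, as described above.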
I'm not familiar with your specific error, but in order for the answer to be more helpful - I'll try to give some possible solutions:
Make sure all requirements for kubeadm are in place.
Check the firewall rules - make sure you don't block egress traffic and that port 6443 ingress rule is open for the worker node (relevant for the joining phase).
Make sure that the required ports are not occupied.
Try restarting Kubelet with systemctl restart kubelet and check latest logs with: sudo journalctl -u kubelet -n 100 --no-pager.
Check whether Docker can be updated to a newer stable version.
Try running kubeadm reset, and make sure you re-run kubeadm init with the latest version or with a specific stable version by adding --kubernetes-version=X.Y.Z.
As RtmY says, this works only when cluster initialization (kubeadm init) completes correctly.
After running the following
kubeadm init --pod-network-cidr=192.168.0.0/16
it worked successfully.
After I updated kubelet, I could not find /var/lib/kubelet/config.yaml.
"systemctl status kubelet" and "journalctl -xe" reported:
failed to load Kubelet config file /var/lib/kubelet/config.yaml
As per the link below, I copied config.yaml from another working worker node and it worked!
https://github.com/kubernetes/kubernetes/issues/65863#issuecomment-403003592

Kubernetes pods hanging in Init state

I am facing a weird issue with my pods. I am launching around 20 pods in my environment, and every time 3-4 random pods hang with Init:0/1 status. When I check the pod status, the init container shows as running (it should terminate once its task is finished) and the app container shows Waiting/PodInitializing. The same init container image and specs are used across all 20 pods, but the issue hits random pods every time. When I terminate these stuck pods, they get stuck in the Terminating state. If I ssh to the node where the pod was launched and run docker ps, it shows the init container as running, but docker exec fails with an error that the container doesn't exist. The init container pulls configs from a Consul server; checking the volume (found via docker inspect), I saw that it had pulled all the key-value pairs correctly and saved them under the defined file name. I have checked resources on all the nodes, and more than enough is available on each.
Below is a detailed example of one of the pods behaving like this.
Kubectl Version :
kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Pods :
kubectl get pods -n dev1|grep -i session-service
session-service-app-75c9c8b5d9-dsmhp 0/1 Init:0/1 0 10h
session-service-app-75c9c8b5d9-vq98k 0/1 Terminating 0 11h
Pods Status :
kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1
Name: session-service-app-75c9c8b5d9-dsmhp
Namespace: dev1
Node: ip-192-168-44-18.ap-southeast-1.compute.internal/192.168.44.18
Start Time: Fri, 27 Apr 2018 18:14:43 +0530
Labels: app=session-service-app
pod-template-hash=3175746185
release=session-service-app
Status: Pending
IP: 100.96.4.240
Controlled By: ReplicaSet/session-service-app-75c9c8b5d9
Init Containers:
initpullconsulconfig:
Container ID: docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02
Image: shr-u-nexus-01.myops.de:8082/utils/app-init:1.0
Image ID: docker-pullable://shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd
Port: <none>
Args:
-consul-addr=consul-server.consul.svc.cluster.local:8500
State: Running
Started: Fri, 27 Apr 2018 18:14:44 +0530
Ready: False
Restart Count: 0
Environment:
CONSUL_TEMPLATE_VERSION: 0.19.4
POD: sand
SERVICE: session-service-app
ENV: dev1
Mounts:
/var/lib/app from shared-volume-sidecar (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Containers:
session-service-app:
Container ID:
Image: shr-u-nexus-01.myops.de:8082/sand-images/sessionservice-init:sitv12
Image ID:
Port: 8080/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/appenv from shared-volume-sidecar (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Conditions:
Type Status
Initialized False
Ready False
PodScheduled True
Volumes:
shared-volume-sidecar:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-bthkv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bthkv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Container Status on Node :
sudo docker ps|grep -i session
c658d5999563 shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd "/usr/bin/consul-t..." 10 hours ago Up 10 hours k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
da120abd3dbb gcr.io/google_containers/pause-amd64:3.0 "/pause" 10 hours ago Up 10 hours k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
f53d48c7d6ec shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd "/usr/bin/consul-t..." 10 hours ago Up 10 hours k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
c26415458d39 gcr.io/google_containers/pause-amd64:3.0 "/pause" 10 hours ago Up 10 hours k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
On running Docker exec (same result with kubectl exec) :
sudo docker exec -it c658d5999563 bash
rpc error: code = 2 desc = containerd: container not found
A Pod can be stuck in the Init status for many reasons.
PodInitializing or an Init status means that the Pod contains an init container that hasn't finished (init containers are specialized containers that run before the app containers in a Pod; they can contain utilities or setup scripts). A Pod status of Init:0/1 means that its one init container has not finished; Init:N/M means the Pod has M init containers, and N have completed so far.
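The Init:N/M reading described above can be sketched as a small shell function. The name init_progress is hypothetical; the input is the STATUS column value as printed by kubectl get pods.

```shell
# Hypothetical helper: interpret an "Init:N/M" pod status string.
# $1 = STATUS column value, e.g. "Init:0/1"
init_progress() {
  status="$1"
  case "$status" in
    Init:*/*) counts="${status#Init:}" ;;
    *) echo "not an Init:N/M status: $status" >&2; return 1 ;;
  esac
  done_n="${counts%%/*}"   # N: init containers completed so far
  total="${counts##*/}"    # M: init containers defined in the Pod
  # Reject anything that is not purely numeric.
  for n in "$done_n" "$total"; do
    case "$n" in
      ''|*[!0-9]*) echo "not an Init:N/M status: $status" >&2; return 1 ;;
    esac
  done
  echo "$((total - done_n)) of $total init containers still pending"
}
```

For the pod in the question, the input would be Init:0/1, i.e. the single init container has not completed.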
Gathering information
For these scenarios, the best approach is to gather information, as the root cause can be different for every PodInitializing issue.
kubectl describe pods pod-XXX gives a lot of information about the pod, and you can check whether there are any meaningful events as well. Note the init container name.
kubectl logs pod-XXX prints the logs for a container in a pod or specified resource.
kubectl logs pod-XXX -c init-container-xxx is the most precise, as it prints the logs of the init container itself. You can get the init container name by describing the pod and then replace "init-container-xxx" with it, for example "copy-default-config".
The output of kubectl logs pod-XXX -c init-container-xxx can reveal meaningful information about the issue.
In the example this answer originally illustrated, the root cause was that the init container couldn't download the plugins from Jenkins (timeout); from there you can check the connection config, proxy and DNS, or just modify the yaml to deploy the container without the plugins.
Additional:
kubectl describe node node-XXX: describing the pod gives you the name of its node, which you can also inspect with this command.
kubectl get events lists the cluster events.
journalctl -xeu kubelet | tail -n 10 shows the kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for docker).
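The information-gathering commands above can be collected into one place. This is only a sketch: the function name init_debug_cmds is hypothetical, and it prints the commands instead of running them, so the output can be reviewed first and then pasted or piped into sh on a machine with cluster access.

```shell
# Hypothetical helper: print the kubectl commands for inspecting a stuck init container.
# $1 = pod name, $2 = init container name, $3 = namespace (defaults to "default")
init_debug_cmds() {
  pod="$1"
  ctr="$2"
  ns="${3:-default}"
  echo "kubectl describe pod $pod -n $ns"
  echo "kubectl logs $pod -c $ctr -n $ns"
  echo "kubectl get events -n $ns"
}

# Example with the names from the question:
#   init_debug_cmds session-service-app-75c9c8b5d9-dsmhp initpullconsulconfig dev1
```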
Solutions
The solution depends on the information gathered, once the root cause is found.
When you find a log that hints at the root cause, you can investigate that specific cause.
Some examples:
1 > Here this happened when the init container was deleted; it can be fixed by deleting the pod so that it is recreated, or by redeploying it. Same scenario in 1.1.
2 > If you find "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is to run /opt/kubernetes/bin/kube-restart.sh.
3 > Here a sh file was not found; the solution is to modify the yaml file or remove the container if it is unnecessary.
4 > A FailedSync was found, and it was solved by restarting docker on the node.
In general you can modify the yaml (for example to avoid using an outdated URL), try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. However, the specific solution will depend on the specific root cause.
My problem was related to the ebs-csi-controller (AWS EKS 1.24)
The EBS add-on needs access to a role, and in my case the role's trust relationship was broken. It uses OIDC, so I had to add my cluster's OIDC provider manually in the IAM identity provider section.
kubectl logs deployment/ebs-csi-controller -n kube-system -c ebs-plugin
helped diagnose this, as well as
https://aws.amazon.com/premiumsupport/knowledge-center/eks-troubleshoot-ebs-volume-mounts/

Where is kube-apiserver located

Base question: When I try to use kube-apiserver on my master node, I get a "command not found" error. How can I install/configure kube-apiserver? Any link to an example will help.
$ kube-apiserver --enable-admission-plugins DefaultStorageClass
-bash: kube-apiserver: command not found
Details: I am new to Kubernetes and Docker and was trying to create StatefulSet with volumeClaimTemplates. My problem is that the automatic PVs are not created and I get this message in the PVC log: "persistentvolume-controller waiting for a volume to be created". I am not sure if I need to define DefaultStorageClass and so needed kube-apiserver to define it.
Name: nfs
Namespace: default
StorageClass: example-nfs
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner=example.com/nfs
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 3m (x2401 over 10h) persistentvolume-controller waiting for a volume to be created, either by external provisioner "example.com/nfs" or manually created by system administrator
Here is get pvc result:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs Pending example-nfs 10h
And get storageclass:
$ kubectl describe storageclass example-nfs
Name: example-nfs
IsDefaultClass: No
Annotations: <none>
Provisioner: example.com/nfs
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
How can I troubleshoot this issue (e.g. logs for why the storage was not created)?
You are asking two different questions here, one about kube-apiserver configuration, one about troubleshooting your StorageClass.
Here's an answer for your first question:
kube-apiserver is running as a Docker container on your master node. Therefore, the binary is within the container, not on your host system. It is started by the master's kubelet from a file located at /etc/kubernetes/manifests. kubelet is watching this directory and will start any Pod defined here as "static pods".
To configure kube-apiserver command line arguments you need to modify /etc/kubernetes/manifests/kube-apiserver.yaml on your master.
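Before editing the manifest, it can help to see which flags are already set. The sketch below assumes the usual layout in which each flag is a YAML list item such as "- --secure-port=6443"; the function name list_apiserver_flags is hypothetical.

```shell
# Hypothetical helper: list the command line flags found in a static pod manifest.
# $1 = manifest path (defaults to the kube-apiserver manifest named above)
list_apiserver_flags() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  # Keep only list items whose value starts with "--" and print the flag itself.
  sed -n 's/^[[:space:]]*-[[:space:]][[:space:]]*\(--[^[:space:]]*\).*$/\1/p' "$manifest"
}
```

Piping the output through grep admission would show whether an admission plugin flag is already present before another one is added.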
I'll refer to the question regarding the location of the api-server.
Basic answer (specific to the question title):
The kube apiserver is located on the master node (known as the control plane).
It can be executed:
1 ) Via the host's init system (like systemd).
2 ) As a pod (I'll explain below).
In both cases it will be located on the control plane:
If it's running under systemd, you can run systemctl status api-server to see the path to the configuration (drop-in) file.
If it is running as a pod, you can view it in the kube-system namespace with all the other control plane components (plus kube-proxy and possibly a network solution such as weave, as below):
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-lpdlc 1/1 Running 1 2d22h
coredns-f9fd979d6-vcs7g 1/1 Running 1 2d22h
etcd-my-master 1/1 Running 1 2d22h
kube-apiserver-my-master 1/1 Running 1 2d22h #<----Here
kube-controller-manager-my-master 1/1 Running 1 2d22h
kube-proxy-kh2lc 1/1 Running 1 2d22h
kube-scheduler-my-master 1/1 Running 1 2d22h
weave-net-59r5b 2/2 Running 3 2d22h
You can run:
kubectl describe pod/kube-apiserver-my-master -n kube-system
In order to get more details regarding the pod.
A bit more advanced answer:
(regarding the location of /etc/kubernetes/manifests)
Let's say we have no idea where to find the relevant path for the kube-api-server config file.
But we need to remember two important things:
1 ) The kube-api-server is running on the master node.
2 ) The Kubelet isn't running as pod and when the control plane components (plus kube-proxy) are executed as static pods - it is done by the Kubelet on the master node.
So we can start our journey for reaching the manifests path by investigating the Kubelet logs.
If the Kubelet has been running for a long time, the log will be very large and we'll need to dump it somewhere and go to the beginning; if the Kubelet was started 5 minutes ago, we can run:
sudo journalctl -u kubelet --since -5m >> kubelet_5_minutes.log
A quick search for "api-server" will bring us to the 2 lines below, where the path of the manifests is mentioned:
my-master kubelet[71..]: 00:03:21 kubelet.go:261] Adding pod path: /etc/kubernetes/manifests
my-master kubelet[71..]: 00:03:21 kubelet.go:273] Watching apiserver
And also we can see that the Kubelet is trying to create the kube-apiserver pod under my-master node and inside the kube-system namespace:
my-master kubelet[71..]: 00:03:29.05 kubelet.go:1576] ..
Creating a mirror pod for "kube-apiserver-my-master_kube-system
To make the storage class "example-nfs" default, you need to run the below command:
kubectl patch storageclass example-nfs -p '{"metadata":
{"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

Flannel fails in kubernetes cluster due to failure of subnet manager

I am running etcd, kube-apiserver, kube-scheduler, and kube-controller-manager on a master node, as well as kubelet and kube-proxy on a minion node, as follows (all kube binaries are from kubernetes 1.7.4):
# [master node]
./etcd
./kube-apiserver --logtostderr=true --etcd-servers=http://127.0.0.1:2379 --service-cluster-ip-range=10.10.10.0/24 --insecure-port 8080 --secure-port=0 --allow-privileged=true --insecure-bind-address 0.0.0.0
./kube-scheduler --address=0.0.0.0 --master=http://127.0.0.1:8080
./kube-controller-manager --address=0.0.0.0 --master=http://127.0.0.1:8080
# [minion node]
./kubelet --logtostderr=true --address=0.0.0.0 --api_servers=http://$MASTER_IP:8080 --allow-privileged=true
./kube-proxy --master=http://$MASTER_IP:8080
After this, if I execute kubectl get all --all-namespaces and kubectl get nodes, I get
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default svc/kubernetes 10.10.10.1 <none> 443/TCP 27m
NAME STATUS AGE VERSION
minion-1 Ready 27m v1.7.4+793658f2d7ca7
Then, I apply flannel as follows:
kubectl apply -f kube-flannel-rbac.yml -f kube-flannel.yml
Now, I see a pod is created, but with error:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-p8tcb 1/2 CrashLoopBackOff 4 2m
When I check the logs inside the failed container in the minion node, I see the following error:
Failed to create SubnetManager: unable to initialize inclusterconfig: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
My question is: how to resolve this? Is this a SSL issue? What step am I missing in setting up my cluster?
Maybe something is wrong in your flannel yaml file.
You can try installing flannel this way.
First check for an old flannel link:
ip link
If it shows flannel, delete it:
ip link delete flannel.1
Then install flannel; its default pod network CIDR is 10.244.0.0/16:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.0/Documentation/kube-flannel.yml
You could try passing --etcd-prefix=/your/prefix and --etcd-endpoints=address to flanneld instead of --kube-subnet-mgr, so that flannel gets net-conf from the etcd server and not from the API server.
Keep in mind that you must push net-conf to the etcd server.
UPDATE
The problem (/var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory) can appear when the apiserver is executed without --admission-control=...,ServiceAccount,... or if kubelet runs inside a container (e.g. hyperkube); the latter was my case. If you want to run k8s components inside a container, you need to pass the 'shared' option to the kubelet volume:
/var/lib/kubelet/:/var/lib/kubelet:rw,shared
Furthermore, enable the same option for docker in docker.service:
MountFlags=shared
Now the question is: is there a security hole with a shared mount?
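Independently of the mount question, the original symptom can be confirmed from inside a container with a small check. This is a sketch assuming a POSIX shell in the container; the function name has_sa_token is hypothetical, and the default directory is the one from the flannel error message.

```shell
# Hypothetical helper: check whether the service account token was mounted.
# $1 = directory to inspect (defaults to the path from the flannel error)
has_sa_token() {
  dir="${1:-/var/run/secrets/kubernetes.io/serviceaccount}"
  if [ -s "$dir/token" ]; then
    echo "token present in $dir"
  else
    echo "token missing in $dir"
    return 1
  fi
}
```

A "missing" result in a freshly created pod matches both causes described above: either no token was issued for the pod, or the kubelet's mount did not propagate into the container.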