First of all, let me thank you for this amazing guide. I'm very new to Kubernetes, and having a guide like this to follow helps a lot when trying to set up my first cluster!
That said, I'm having some issues with creating deployments, as there are two pods that aren't being created and remain stuck in the ContainerCreating state.
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 25h v1.24.0
node1 Ready <none> 24h v1.24.0
node2 Ready <none> 24h v1.24.0
[root@master ~]# kubectl cluster-info
Kubernetes control plane is running at https://192.168.3.200:6443
CoreDNS is running at https://192.168.3.200:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The problem:
[root@master ~]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-6d4b75cb6d-v5pvk 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-q6nwq 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-sg4wn 0/1 ContainerCreating 0 114m
kube-system pod/etcd-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-apiserver-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-controller-manager-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-ftxzx 1/1 Running 2 (3h11m ago) 24h
kube-system pod/kube-proxy-pcl8q 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-q7dpw 1/1 Running 2 (3h23m ago) 24h
kube-system pod/kube-scheduler-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/weave-net-2p47z 2/2 Running 5 (3h23m ago) 24h
kube-system pod/weave-net-k5529 2/2 Running 4 (3h11m ago) 24h
kube-system pod/weave-net-tq4bs 2/2 Running 7 (3h26m ago) 25h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 25h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 25h
kube-system daemonset.apps/weave-net 3 3 3 3 3 <none> 25h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 25h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-6d4b75cb6d 1 1 0 25h
kube-system replicaset.apps/coredns-7599c5f99f 2 2 0 116m
Note that the first three pods, from coredns, fail to start.
[root@master ~]# kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
93m Warning FailedCreatePodSandBox pod/nginx-deploy-99976564d-s4shk (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67": failed to find network info for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67"
[root@master ~]# kubectl describe pod coredns-6d4b75cb6d-v5pvk -n kube-system
Name: coredns-6d4b75cb6d-v5pvk
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: node2/192.168.3.202
Start Time: Thu, 12 May 2022 19:45:58 +0000
Labels: k8s-app=kube-dns
pod-template-hash=6d4b75cb6d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-6d4b75cb6d
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns/coredns:v1.8.6
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4bpvz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-4bpvz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 93s (x393 over 124m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967": failed to find network info for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967"
Versions:
[root@master ~]# docker --version
Docker version 20.10.15, build fd82621
[root@master ~]# kubelet --version
Kubernetes v1.24.0
[root@master ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:44:24Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
I have no idea where to go from here. I googled keywords like "rpc error weave k8s" and "Failed to create pod sandbox: rpc error", but none of the results I found solved my problem. I saw some issues mentioning Weave Net; could this be the problem? Maybe I got something wrong, but I'm fairly sure I followed the instructions closely.
Any help would be greatly appreciated!
Looks like you got pretty far! Support for Docker as a container runtime (dockershim) was dropped in 1.24.0. I can't tell if that is what you are using or not, but if you are, that could be your problem.
https://kubernetes.io/blog/2022/05/03/kubernetes-1-24-release-announcement/
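A quick way to check what each node is actually using:
kubectl get nodes -o wide
The CONTAINER-RUNTIME column there will report something like docker://20.10.15 or containerd://1.6.x.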
You could switch to containerd as your container runtime, but for the purposes of learning you could try the latest 1.23.x version of Kubernetes. Get that to work, then circle back and tackle containerd with Kubernetes v1.24.0.
You can still use Docker on your laptop/desktop, but on the k8s servers you will not be able to use Docker as the runtime on 1.24.x or later.
Hope that helps and good luck!
When setting up an ingress in my Kubernetes project, I can't seem to get it to work. I already checked the following questions:
Enable Ingress controller on Docker Desktop with WLS2
Docker Desktop + k8s plus https proxy multiple external ports to pods on http in deployment?
How can I access nginx ingress on my local?
But I still can't get it to work. When testing the service via NodePort (http://kubernetes.docker.internal:30090/ or localhost:30090) it works without any problem, but when using http://kubernetes.docker.internal/ I get "kubernetes.docker.internal didn't send any data. ERR_EMPTY_RESPONSE".
This is my YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  minReadySeconds: 30
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: gcr.io/google-samples/hello-app:2.0
        env:
        - name: "PORT"
          value: "3000"
---
apiVersion: v1
kind: Service
metadata:
  name: webapp-service
spec:
  selector:
    app: webapp
  ports:
  - name: http
    port: 3000
    nodePort: 30090 # only for NodePort > 30,000
  type: NodePort # ClusterIP inside cluster
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp-ingress
spec:
  defaultBackend:
    service:
      name: webapp-service
      port:
        number: 3000
  rules:
  - host: kubernetes.docker.internal
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapp-service
            port:
              number: 3000
I also used the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.45.0/deploy/static/provider/cloud/deploy.yaml
The output of kubectl get all -A is as follows (indicating that the ingress controller is running):
NAMESPACE NAME READY STATUS RESTARTS AGE
default pod/webapp-78d8b79b4f-7whzf 1/1 Running 0 13m
ingress-nginx pod/ingress-nginx-admission-create-gwhbq 0/1 Completed 0 11m
ingress-nginx pod/ingress-nginx-admission-patch-bxv9v 0/1 Completed 1 11m
ingress-nginx pod/ingress-nginx-controller-6f5454cbfb-s2w9p 1/1 Running 0 11m
kube-system pod/coredns-f9fd979d6-6xbxs 1/1 Running 0 19m
kube-system pod/coredns-f9fd979d6-frrrv 1/1 Running 0 19m
kube-system pod/etcd-docker-desktop 1/1 Running 0 18m
kube-system pod/kube-apiserver-docker-desktop 1/1 Running 0 18m
kube-system pod/kube-controller-manager-docker-desktop 1/1 Running 0 18m
kube-system pod/kube-proxy-mfwlw 1/1 Running 0 19m
kube-system pod/kube-scheduler-docker-desktop 1/1 Running 0 18m
kube-system pod/storage-provisioner 1/1 Running 0 18m
kube-system pod/vpnkit-controller 1/1 Running 0 18m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 19m
default service/webapp-service NodePort 10.111.167.112 <none> 3000:30090/TCP 13m
ingress-nginx service/ingress-nginx-controller LoadBalancer 10.106.21.69 localhost 80:32737/TCP,443:32675/TCP 11m
ingress-nginx service/ingress-nginx-controller-admission ClusterIP 10.105.208.234 <none> 443/TCP 11m
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 19m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 19m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
default deployment.apps/webapp 1/1 1 1 13m
ingress-nginx deployment.apps/ingress-nginx-controller 1/1 1 1 11m
kube-system deployment.apps/coredns 2/2 2 2 19m
NAMESPACE NAME DESIRED CURRENT READY AGE
default replicaset.apps/webapp-78d8b79b4f 1 1 1 13m
ingress-nginx replicaset.apps/ingress-nginx-controller-6f5454cbfb 1 1 1 11m
kube-system replicaset.apps/coredns-f9fd979d6 2 2 2 19m
NAMESPACE NAME COMPLETIONS DURATION AGE
ingress-nginx job.batch/ingress-nginx-admission-create 1/1 1s 11m
ingress-nginx job.batch/ingress-nginx-admission-patch 1/1 3s 11m
I already tried debugging, and when exec'ing into the nginx controller:
kubectl exec service/ingress-nginx-controller -n ingress-nginx -it -- sh
I can do the following curl: curl -H "host:kubernetes.docker.internal" localhost and it returns the correct content. So to me it seems like my LoadBalancer service is not used when opening http://kubernetes.docker.internal in the browser. I also tried the same curl from my terminal, but that gave the same 'empty response' result.
I know this is quite an outdated thread, but I think my answer can help later visitors.
Answer: You have to install an ingress controller, for example the ingress-nginx controller,
either using helm:
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace
or kubectl:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/cloud/deploy.yaml
you can find additional info here
don't forget to add the defined host to the /etc/hosts file, e.g.
127.0.0.1 your.defined.host
then access the defined host as usual.
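As a quick check that the ingress has been admitted (assuming the webapp-ingress from the question is applied) and that the controller answers for that host:
kubectl get ingress webapp-ingress
curl -H "Host: kubernetes.docker.internal" http://127.0.0.1/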
I have my own Kubernetes env with 3 nodes. I'm trying to deploy a basic Ubuntu Docker image from Docker Hub, but it gives an error like the one below.
First, I followed the instructions at the link below, so I assume that part is OK.
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#create-a-secret-in-the-cluster-that-holds-your-authorization-token
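For reference, the linked page creates the image pull secret with a command along these lines (all values here are placeholders):
kubectl create secret docker-registry regcred \
  --docker-server=<your-registry-server> \
  --docker-username=<your-name> \
  --docker-password=<your-pword> \
  --docker-email=<your-email>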
Then I created a basic Ubuntu YAML file:
apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
  - name: private-reg-container
    image: ubuntu:21.04
  imagePullSecrets:
  - name: regcred
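For reference, assuming the pull secret from the linked page was created as regcred, its presence can be verified with:
kubectl get secret regcred --output=yaml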
But after I apply it, the pod's status is "ContainerCreating" and it gives an error like the one below:
[root@user]# kubectl describe pod private-reg
Name: private-reg
Namespace: kube-system
Priority: 0
Node: server3/10.100.2.183
Start Time: Tue, 08 Dec 2020 14:44:41 +0200
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
private-reg-container:
Container ID:
Image: ubuntu:21.04
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-57296 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-57296:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-57296
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18s default-scheduler Successfully assigned kube-system/private-reg to server3
Warning FailedCreatePodSandBox 74s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "aac36b2a8b1af982eeea04bd914c3614ffda31c7881822c2d5dd335780335cf0" network for pod "private-reg": networkPlugin cni failed to set up pod "private-reg_kube-system" network: error getting ClusterInformation: Get "https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "aac36b2a8b1af982eeea04bd914c3614ffda31c7881822c2d5dd335780335cf0" network for pod "private-reg": networkPlugin cni failed to teardown pod "private-reg_kube-system" network: error getting ClusterInformation: Get "https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
Normal SandboxChanged 60s (x2 over 74s) kubelet Pod sandbox changed, it will be killed and re-created.
The pods look like this:
[root@server ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-gsksj 1/1 Running 3 35d
coredns-f9fd979d6-r5gr8 1/1 Running 3 35d
etcd-serverkube1 1/1 Running 4 35d
kube-apiserver-serverkube1 1/1 Running 5 35d
kube-controller-manager-srvkube1 1/1 Running 4 35d
kube-proxy-8ngxt 1/1 Running 1 22d
kube-proxy-lv5b4 1/1 Running 4 35d
kube-proxy-xfswt 1/1 Running 2 22d
kube-scheduler-srvkube1 1/1 Running 4 35d
ubuntu 1/1 Running 5 26d
weave-net-749h4 2/2 Running 2 22d
weave-net-cpj5h 2/2 Running 11 35d
weave-net-fpjqp 2/2 Running 5 22d
How can I solve this problem? Thanks!
New to K8s. So far I have the following:
docker-ce-19.03.8
docker-ce-cli-19.03.8
containerd.io-1.2.13
kubelet-1.18.5
kubeadm-1.18.5
kubectl-1.18.5
etcd-3.4.10
Use Flannel for Pod Overlay Net
Performed all of the host-level work (SELinux permissive, swapoff, etc.)
All CentOS 7 in an on-prem vSphere environment (6.7U3)
I've built all my configs and currently have:
a 3-node external/stand-alone etcd cluster with peer-to-peer and client-server encrypted transmissions
a 3-node control plane cluster -- kubeadm init is bootstrapped with x509s and targets to the 3 etcds (so stacked etcd never happens)
HAProxy and Keepalived are installed on two of the etcd cluster members, load-balancing access to the API server endpoints on the control plane (TCP6443)
6 worker nodes
Storage configured with the in-tree Vmware Cloud Provider (I know it's deprecated)--and yes, this is my DEFAULT SC
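A quick way to confirm which StorageClass is actually the default (it is flagged with a (default) marker in the output):
kubectl get storageclass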
Status Checks:
kubectl cluster-info reports:
[me@km-01 pods]$ kubectl cluster-info
Kubernetes master is running at https://k8snlb:6443
KubeDNS is running at https://k8snlb:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubectl get all --all-namespaces reports:
[me@km-01 pods]$ kubectl get all --all-namespaces -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ag1 pod/mssql-operator-68bcc684c4-rbzvn 1/1 Running 0 86m 10.10.4.133 kw-02.bogus.local <none> <none>
kube-system pod/coredns-66bff467f8-k6m94 1/1 Running 4 20h 10.10.0.11 km-01.bogus.local <none> <none>
kube-system pod/coredns-66bff467f8-v848r 1/1 Running 4 20h 10.10.0.10 km-01.bogus.local <none> <none>
kube-system pod/kube-apiserver-km-01.bogus.local 1/1 Running 8 10h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-controller-manager-km-01.bogus.local 1/1 Running 2 10h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-7l76c 1/1 Running 0 10h x.x.x..30 kw-01.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-8kft7 1/1 Running 0 10h x.x.x..33 kw-04.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-r5kqv 1/1 Running 0 10h x.x.x..34 kw-05.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-t6xcd 1/1 Running 0 10h x.x.x..35 kw-06.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-vhnx8 1/1 Running 0 10h x.x.x..32 kw-03.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-xdk2n 1/1 Running 0 10h x.x.x..31 kw-02.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-z4kfk 1/1 Running 4 20h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-proxy-49hsl 1/1 Running 0 10h x.x.x..35 kw-06.bogus.local <none> <none>
kube-system pod/kube-proxy-62klh 1/1 Running 0 10h x.x.x..34 kw-05.bogus.local <none> <none>
kube-system pod/kube-proxy-64d5t 1/1 Running 0 10h x.x.x..30 kw-01.bogus.local <none> <none>
kube-system pod/kube-proxy-6ch42 1/1 Running 4 20h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-proxy-9css4 1/1 Running 0 10h x.x.x..32 kw-03.bogus.local <none> <none>
kube-system pod/kube-proxy-hgrx8 1/1 Running 0 10h x.x.x..33 kw-04.bogus.local <none> <none>
kube-system pod/kube-proxy-ljlsh 1/1 Running 0 10h x.x.x..31 kw-02.bogus.local <none> <none>
kube-system pod/kube-scheduler-km-01.bogus.local 1/1 Running 5 20h x.x.x..25 km-01.bogus.local <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
ag1 service/ag1-primary NodePort 10.104.183.81 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30405/TCP 85m role.ag.mssql.microsoft.com/ag1=primary,type=sqlservr
ag1 service/ag1-secondary NodePort 10.102.52.31 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30713/TCP 85m role.ag.mssql.microsoft.com/ag1=secondary,type=sqlservr
ag1 service/mssql1 NodePort 10.96.166.108 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:32439/TCP 86m name=mssql1,type=sqlservr
ag1 service/mssql2 NodePort 10.109.146.58 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30636/TCP 86m name=mssql2,type=sqlservr
ag1 service/mssql3 NodePort 10.101.234.186 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30862/TCP 86m name=mssql3,type=sqlservr
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 20h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/kube-flannel-ds-amd64 7 7 7 7 7 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-amd64 app=flannel
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-arm app=flannel
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-arm64 app=flannel
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-ppc64le app=flannel
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-s390x app=flannel
kube-system daemonset.apps/kube-proxy 7 7 7 7 7 kubernetes.io/os=linux 20h kube-proxy k8s.gcr.io/kube-proxy:v1.18.7 k8s-app=kube-proxy
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
ag1 deployment.apps/mssql-operator 1/1 1 1 86m mssql-operator mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu app=mssql-operator
kube-system deployment.apps/coredns 2/2 2 2 20h coredns k8s.gcr.io/coredns:1.6.7 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
ag1 replicaset.apps/mssql-operator-68bcc684c4 1 1 1 86m mssql-operator mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu app=mssql-operator,pod-template-hash=68bcc684c4
kube-system replicaset.apps/coredns-66bff467f8 2 2 2 20h coredns k8s.gcr.io/coredns:1.6.7 k8s-app=kube-dns,pod-template-hash=66bff467f8
On to the problem: there are a number of articles about a SQL 2019 HA build. It appears that every single one, however, is in the cloud, whereas mine is on-prem in a vSphere env. They appear to be very simple: run 3 scripts in this order: operator.yaml, sql.yaml, and ag-service.yaml.
My YAMLs are based on: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files
Per the blogs that actually screenshot the environment afterward, there should be at least 7 pods (1 operator, 3 SQL init, 3 SQL). If you look at my aforementioned get all --all-namespaces output, I have everything (and in a running state), but no pods other than the running operator.
I actually broke the control plane back to a single node just to try to isolate the logs. /var/log/container/* and /var/log/pod/* contain nothing of value to indicate a problem with storage or any other reason the pods are non-existent. It's probably also worth noting that I started with the latest SQL 2019 label, 2019-latest, but when I got the same behavior there, I decided to try the old bits since so many blogs are based on CTP 2.1.
I can create PVs and PVCs using the VCP storage provider. I have my Secrets and can see them in the Secrets store.
I'm at a loss to explain why the pods are missing or where to look next. After checking journalctl, the daemons themselves, and /var/log, I don't see any indication there's even an attempt to create them -- the kubectl apply -f mssql-server2019.yaml that I adapted runs to completion without error, indicating 3 SQL objects and 3 SQL services get created. But here is the file anyway, targeting CTP 2.1:
cat << EOF > mssql-server2019.yaml
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql1, type: sqlservr}
  name: mssql1
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql1, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql1, type: sqlservr}
  type: NodePort
  externalIPs:
  - x.x.x.30
  - x.x.x.31
  - x.x.x.32
  - x.x.x.33
  - x.x.x.34
  - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql2, type: sqlservr}
  name: mssql2
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql2, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql2, type: sqlservr}
  type: NodePort
  externalIPs:
  - x.x.x.30
  - x.x.x.31
  - x.x.x.32
  - x.x.x.33
  - x.x.x.34
  - x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
  labels: {name: mssql3, type: sqlservr}
  name: mssql3
  namespace: ag1
spec:
  acceptEula: true
  agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
  availabilityGroups: [ag1]
  instanceRootVolumeClaimTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests: {storage: 5Gi}
    storageClass: default
  saPassword:
    secretKeyRef: {key: sapassword, name: sql-secrets}
  sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql3, namespace: ag1}
spec:
  ports:
  - {name: tds, port: 1433}
  selector: {name: mssql3, type: sqlservr}
  type: NodePort
  externalIPs:
  - x.x.x.30
  - x.x.x.31
  - x.x.x.32
  - x.x.x.33
  - x.x.x.34
  - x.x.x.35
---
EOF
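For reference, the objects that the apply claims to create, along with any events in the ag1 namespace, can be listed with something like:
kubectl get sqlservers.mssql.microsoft.com -n ag1
kubectl get services -n ag1
kubectl get events -n ag1 --sort-by='.lastTimestamp'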
Edit1: kubectl logs -n ag mssql-operator-*
[sqlservers] 2020/08/14 14:36:48 Creating custom resource definition
[sqlservers] 2020/08/14 14:36:48 Created custom resource definition
[sqlservers] 2020/08/14 14:36:48 Waiting for custom resource definition to be available
[sqlservers] 2020/08/14 14:36:49 Watching for resources...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql1 in namespace ag1 ...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap ag1
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error creating ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql2 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql3 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
I've looked over my operator and mssql2019.yamls (specifically around the kind: SqlServer, since that seems to be where it's failing) and can't identify any glaring inconsistencies or differences.
So your operator is running:
ag1 pod/mssql-operator-68bcc684c4-rbzvn 1/1 Running 0 86m 10.10.4.133 kw-02.bogus.local <none> <none>
I would start by looking at the logs there:
kubectl -n ag1 logs pod/mssql-operator-68bcc684c4-rbzvn
Most likely it needs to interact with the cloud provider (e.g. Azure) and VMware is not supported, but check what the logs say 👀.
Update:
Based on the logs you posted it looks like you are using K8s 1.18 and the operator is incompatible. It's trying to create a ConfigMap with a spec that the kube-apiserver is rejecting.
✌️
YAMLs mine are based off of: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files
Run 3 scripts in this order: operator.yaml, sql.yaml, and ag-service.yaml.
I have just run it on my GKE cluster and got a similar result when running only these 3 files.
If you run it without preparing the PV and PVC ( .././sample-deployment-script/templates/pv*.yaml ):
$ git clone https://github.com/microsoft/sql-server-samples.git
$ cd sql-server-samples/samples/features/high\ availability/Kubernetes/sample-manifest-files/
$ kubectl create -f operator.yaml
namespace/ag1 created
serviceaccount/mssql-operator created
clusterrole.rbac.authorization.k8s.io/mssql-operator-ag1 created
clusterrolebinding.rbac.authorization.k8s.io/mssql-operator-ag1 created
deployment.apps/mssql-operator created
$ kubectl create -f sqlserver.yaml
sqlserver.mssql.microsoft.com/mssql1 created
service/mssql1 created
sqlserver.mssql.microsoft.com/mssql2 created
service/mssql2 created
sqlserver.mssql.microsoft.com/mssql3 created
service/mssql3 created
$ kubectl create -f ag-services.yaml
service/ag1-primary created
service/ag1-secondary created
You'll have:
kubectl get pods -n ag1
NAME READY STATUS RESTARTS AGE
mssql-initialize-mssql1-js4zc 0/1 CreateContainerConfigError 0 6m12s
mssql-initialize-mssql2-72d8n 0/1 CreateContainerConfigError 0 6m8s
mssql-initialize-mssql3-h4mr9 0/1 CreateContainerConfigError 0 6m6s
mssql-operator-658558b57d-6xd95 1/1 Running 0 6m33s
mssql1-0 1/2 CrashLoopBackOff 5 6m12s
mssql2-0 1/2 CrashLoopBackOff 5 6m9s
mssql3-0 0/2 Pending 0 6m6s
I see that the failed mssql<N> pods are part of statefulset.apps/mssql<N>, and the mssql-initialize-mssql<N> pods are part of job.batch/mssql-initialize-mssql<N>.
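For illustration only, a pre-created PersistentVolume of the kind those pv*.yaml templates provide might look roughly like this (the hostPath, name, and size here are assumptions for the sketch; the repo's templates are the authoritative version):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mssql-pv-1
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: default
  hostPath:
    path: /mnt/data/mssql-pv-1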
Upon adding the PV and PVC it looks the following way:
$ kubectl get all -n ag1
NAME READY STATUS RESTARTS AGE
mssql-operator-658558b57d-pgx74 1/1 Running 0 20m
And 3 sqlservers.mssql.microsoft.com objects
$ kubectl get sqlservers.mssql.microsoft.com -n ag1
NAME AGE
mssql1 64m
mssql2 64m
mssql3 64m
That is why it looks exactly as it is specified in the abovementioned files.
However, if you run:
sql-server-samples/samples/features/high availability/Kubernetes/sample-deployment-script/$ ./deploy-ag.py deploy --dry-run
the configs will be generated automatically.
Without --dry-run, with those configs (and with a properly set PV+PVC), it gives us 7 pods.
It'll be useful to compare the auto-generated configs with the ones you have (and to compare running only the subset of 3 files vs. what deploy-ag.py does).
P.S.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15+" GitVersion:"v1.15.11-dispatcher"
Server Version: version.Info{Major:"1", Minor:"15+" GitVersion:"v1.15.12-gke.2"
Environment (on premise):
K8s-Master Server: Standalone Physical server.
OS - CentOS Linux release 8.2.2004 (Core)
K8s Client Version: v1.18.6
K8s Server Version: v1.18.6
K8S-Worker node OS – CentOS Linux release 8.2.2004 (Core) - Standalone Physical server.
I have installed Kubernetes as per the instructions at the link below.
https://www.tecmint.com/install-a-kubernetes-cluster-on-centos-8/
But the weave-net pods show CrashLoopBackOff:
[root@K8S-Master ~]# kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default test-pod 1/1 Running 0 156m 10.88.0.83 k8s-worker-1 <none> <none>
kube-system coredns-66bff467f8-99dww 1/1 Running 1 2d23h 10.88.0.5 K8S-Master <none> <none>
kube-system coredns-66bff467f8-ghk5g 1/1 Running 2 2d23h 10.88.0.6 K8S-Master <none> <none>
kube-system etcd-K8S-Master 1/1 Running 1 2d23h 100.101.102.103 K8S-Master <none> <none>
kube-system kube-apiserver-K8S-Master 1/1 Running 1 2d23h 100.101.102.103 K8S-Master <none> <none>
kube-system kube-controller-manager-K8S-Master 1/1 Running 1 2d23h 100.101.102.103 K8S-Master <none> <none>
kube-system kube-proxy-btgqb 1/1 Running 1 2d23h 100.101.102.103 K8S-Master <none> <none>
kube-system kube-proxy-mqg85 1/1 Running 1 2d23h 100.101.102.104 k8s-worker-1 <none> <none>
kube-system kube-scheduler-K8S-Master 1/1 Running 2 2d23h 100.101.102.103 K8S-Master <none> <none>
kube-system weave-net-2nxpk 1/2 CrashLoopBackOff 848 2d23h 100.101.102.104 k8s-worker-1 <none> <none>
kube-system weave-net-n8wv9 1/2 CrashLoopBackOff 846 2d23h 100.101.102.103 K8S-Master <none> <none>
[root@K8S-Master ~]# kubectl logs weave-net-2nxpk -c weave --namespace=kube-system
ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
[root@K8S-Master ~]# kubectl logs weave-net-n8wv9 -c weave --namespace=kube-system
ipset v7.2: Set cannot be destroyed: it is in use by a kernel component
[root@K8S-Master ~]# kubectl describe pod/weave-net-n8wv9 -n kube-system
Name: weave-net-n8wv9
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: K8S-Master/100.101.102.103
Start Time: Mon, 03 Aug 2020 10:56:12 +0530
Labels: controller-revision-hash=6768fc7ccf
name=weave-net
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 100.101.102.103
IPs:
IP: 100.101.102.103
Controlled By: DaemonSet/weave-net
Containers:
weave:
Container ID: docker://efeb277639ac8262c8864f2d598606e19caadbb65cdda4645d67589eab13d109
Image: docker.io/weaveworks/weave-kube:2.6.5
Image ID: docker-pullable://weaveworks/weave-kube@sha256:703a045a58377cb04bc85d0f5a7c93356d5490282accd7e5b5d7a99fe2ef09e2
Port: <none>
Host Port: <none>
Command:
/home/weave/launch.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 06 Aug 2020 20:18:21 +0530
Finished: Thu, 06 Aug 2020 20:18:21 +0530
Ready: False
Restart Count: 971
Requests:
cpu: 10m
Readiness: http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-p9ltl (ro)
/weavedb from weavedb (rw)
weave-npc:
Container ID: docker://33f9fed68c452187490e261830a283c1ec9361aba01b86b60598dcc871ca1b11
Image: docker.io/weaveworks/weave-npc:2.6.5
Image ID: docker-pullable://weaveworks/weave-npc@sha256:0f6166e000faa500ccc0df53caae17edd3110590b7b159007a5ea727cdfb1cef
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 06 Aug 2020 17:13:35 +0530
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 06 Aug 2020 11:38:19 +0530
Finished: Thu, 06 Aug 2020 17:08:40 +0530
Ready: True
Restart Count: 4
Requests:
cpu: 10m
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-p9ltl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
weavedb:
Type: HostPath (bare host directory volume)
Path: /var/lib/weave
HostPathType:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
HostPathType:
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
HostPathType:
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
HostPathType:
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
HostPathType:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
weave-net-token-p9ltl:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-p9ltl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 2m3s (x863 over 3h7m) kubelet, K8S-Master Back-off restarting failed container
[root@K8S-Master ~]# journalctl -u kubelet
Aug 06 20:36:56 K8S-Master kubelet[2647]: I0806 20:36:56.549156 2647 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: f873d>
Aug 06 20:36:56 K8S-Master kubelet[2647]: E0806 20:36:56.549515 2647 pod_workers.go:191] Error syncing pod e7511fe6-c60f-4833-bfb6-d59d6e8720e3 ("wea>
Is some configuration wrong in my K8s installation? Could someone provide a workaround to solve this problem?
I have re-installed K8s with the Calico network; now all pods in all namespaces are running as expected.
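(For anyone landing here later: at the time, Calico was typically installed by applying its standard manifest, roughly:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
check the Calico docs for the current URL and version.)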