First of all, let me thank you for this amazing guide. I'm very new to kubernetes and having a guide like this to follow helps a lot when trying to setup my first cluster!
That said, I'm having some issues with creating deploytments, as there are two pods that aren't being created, and remain stuck in the state: ContainerCreating
[root#master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 25h v1.24.0
node1 Ready <none> 24h v1.24.0
node2 Ready <none> 24h v1.24.0
[root#master ~]# kubectl cluster-info
Kubernetes control plane is running at https://192.168.3.200:6443
CoreDNS is running at https://192.168.3.200:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The problem:
[root#master ~]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-6d4b75cb6d-v5pvk 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-q6nwq 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-sg4wn 0/1 ContainerCreating 0 114m
kube-system pod/etcd-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-apiserver-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-controller-manager-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-ftxzx 1/1 Running 2 (3h11m ago) 24h
kube-system pod/kube-proxy-pcl8q 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-q7dpw 1/1 Running 2 (3h23m ago) 24h
kube-system pod/kube-scheduler-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/weave-net-2p47z 2/2 Running 5 (3h23m ago) 24h
kube-system pod/weave-net-k5529 2/2 Running 4 (3h11m ago) 24h
kube-system pod/weave-net-tq4bs 2/2 Running 7 (3h26m ago) 25h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 25h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 25h
kube-system daemonset.apps/weave-net 3 3 3 3 3 <none> 25h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 25h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-6d4b75cb6d 1 1 0 25h
kube-system replicaset.apps/coredns-7599c5f99f 2 2 0 116m
Note that the first three pods, from coredns, fail to start.
[root#master ~]# kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
93m Warning FailedCreatePodSandBox pod/nginx-deploy-99976564d-s4shk (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67": failed to find network info for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67"
.
[root#master ~]# kubectl describe pod coredns-6d4b75cb6d-v5pvk -n kube-system
Name: coredns-6d4b75cb6d-v5pvk
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: node2/192.168.3.202
Start Time: Thu, 12 May 2022 19:45:58 +0000
Labels: k8s-app=kube-dns
pod-template-hash=6d4b75cb6d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-6d4b75cb6d
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns/coredns:v1.8.6
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4bpvz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-4bpvz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 93s (x393 over 124m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967": failed to find network info for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967"
Versions:
[root#master ~]# docker --version
Docker version 20.10.15, build fd82621
[root#master ~]# kubelet --version
Kubernetes v1.24.0
[root#master ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:44:24Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
I have no idea where to go from here. I googled keywords like "rpc error weave k8s" and "Failed to create pod sandbox: rpc error" but none of the solutions I found had a solution to my problem. I saw some problems mentioning weaving net, could this be the problem? Maybe I got it wrong, but I'm sure I followed the instructions very well.
Any help would be greatly appreciated!
Looks like you got pretty far! Support for docker as a container runtime was dropped in 1.24.0. I can't tell if that is what you are using or not but if you are that could be your problem.
https://kubernetes.io/blog/2022/05/03/kubernetes-1-24-release-announcement/
You could switch to containerd for your container runtime but for the purposes of learning you could try the latest 1.23.x version of kubernetes. Get that to work then circle back and tackle containerd with kubernetes v1.24.0
You can still use docker on your laptop/desktop but on the k8s servers you will not be able to use docker on 1.24.x or later.
Hope that helps and good luck!
When setting up an ingress in my kubernetes project I can't seem to get it to work. I already checked following questions:
Enable Ingress controller on Docker Desktop with WLS2
Docker Desktop + k8s plus https proxy multiple external ports to pods on http in deployment?
How can I access nginx ingress on my local?
But I can't get it to work. When testing the service via NodePort (http://kubernetes.docker.internal:30090/ or localhost:30090) it works without any problem, but when using http://kubernetes.docker.internal/ I get kubernetes.docker.internal didn’t send any data. ERR_EMPTY_RESPONSE.
This is my yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
minReadySeconds: 30
selector:
matchLabels:
app: webapp
replicas: 1
template:
metadata:
labels:
app: webapp
spec:
containers:
- name: webapp
image: gcr.io/google-samples/hello-app:2.0
env:
- name: "PORT"
value: "3000"
---
apiVersion: v1
kind: Service
metadata:
name: webapp-service
spec:
selector:
app: webapp
ports:
- name: http
port: 3000
nodePort: 30090 # only for NotPort > 30,000
type: NodePort #ClusterIP inside cluster
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: webapp-ingress
spec:
defaultBackend:
service:
name: webapp-service
port:
number: 3000
rules:
- host: kubernetes.docker.internal
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: webapp-service
port:
number: 3000
I also used following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.45.0/deploy/static/provider/cloud/deploy.yaml
The output of kubectl get all -A is as follows (indicating that the ingress controller is running):
NAMESPACE NAME READY STATUS RESTARTS AGE
default pod/webapp-78d8b79b4f-7whzf 1/1 Running 0 13m
ingress-nginx pod/ingress-nginx-admission-create-gwhbq 0/1 Completed 0 11m
ingress-nginx pod/ingress-nginx-admission-patch-bxv9v 0/1 Completed 1 11m
ingress-nginx pod/ingress-nginx-controller-6f5454cbfb-s2w9p 1/1 Running 0 11m
kube-system pod/coredns-f9fd979d6-6xbxs 1/1 Running 0 19m
kube-system pod/coredns-f9fd979d6-frrrv 1/1 Running 0 19m
kube-system pod/etcd-docker-desktop 1/1 Running 0 18m
kube-system pod/kube-apiserver-docker-desktop 1/1 Running 0 18m
kube-system pod/kube-controller-manager-docker-desktop 1/1 Running 0 18m
kube-system pod/kube-proxy-mfwlw 1/1 Running 0 19m
kube-system pod/kube-scheduler-docker-desktop 1/1 Running 0 18m
kube-system pod/storage-provisioner 1/1 Running 0 18m
kube-system pod/vpnkit-controller 1/1 Running 0 18m
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 19m
default service/webapp-service NodePort 10.111.167.112 <none> 3000:30090/TCP 13m
ingress-nginx service/ingress-nginx-controller LoadBalancer 10.106.21.69 localhost 80:32737/TCP,443:32675/TCP 11m
ingress-nginx service/ingress-nginx-controller-admission ClusterIP 10.105.208.234 <none> 443/TCP 11m
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 19m
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 19m
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
default deployment.apps/webapp 1/1 1 1 13m
ingress-nginx deployment.apps/ingress-nginx-controller 1/1 1 1 11m
kube-system deployment.apps/coredns 2/2 2 2 19m
NAMESPACE NAME DESIRED CURRENT READY AGE
default replicaset.apps/webapp-78d8b79b4f 1 1 1 13m
ingress-nginx replicaset.apps/ingress-nginx-controller-6f5454cbfb 1 1 1 11m
kube-system replicaset.apps/coredns-f9fd979d6 2 2 2 19m
NAMESPACE NAME COMPLETIONS DURATION AGE
ingress-nginx job.batch/ingress-nginx-admission-create 1/1 1s 11m
ingress-nginx job.batch/ingress-nginx-admission-patch 1/1 3s 11m
I already tried debugging, and when doing an exec to the nginx service:
kubectl exec service/ingress-nginx-controller -n ingress-nginx -it -- sh
I can do the following curl: curl -H "host:kubernetes.docker.internal" localhost and it returns the correct content. So to me this seems like my loadbalancer service is not used when opening http://kubernetes.docker.internal via the browser. I also tried using the same curl from my terminal but that had the same 'empty response' result.
I knew this is quite outdated thread, but i think my answer can help later visitors
Answer: You have to install ingress controller. For exam: ingress-nginx controller,
either using helm:
helm upgrade --install ingress-nginx ingress-nginx \
--repo https://kubernetes.github.io/ingress-nginx \
--namespace ingress-nginx --create-namespace
or kubectl:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/cloud/deploy.yaml
you can find addional info here
don't forget to add defined host to /etc/hosts file. e.g
127.0.0.1 your.defined.host
then access defined host as usual
I have applied a elasticsearch on k8s in my Mac (a minikube cluster). The elasticsearch configure file is:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: quickstart
spec:
version: 7.10.0
nodeSets:
- name: default
count: 1
config:
node.store.allow_mmap: false
podTemplate:
spec:
containers:
- name: elasticsearch
env:
- name: ES_JAVA_OPTS
value: -Xms2g -Xmx2g
resources:
requests:
memory: 4Gi
cpu: 4
limits:
memory: 4Gi
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
storageClassName: standard
resources:
requests:
storage: 1Gi
after I run kubectl apply -f es.yaml, the pod and services are created but the pod is pending.
$kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 28h
quickstart-es-default ClusterIP None <none> 9200/TCP 21m
quickstart-es-http ClusterIP 10.103.177.195 <none> 9200/TCP 21m
quickstart-es-transport ClusterIP None <none> 9300/TCP 21m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
quickstart-es-default-0 0/1 Pending 0 21m
the output of describe pods is:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m (x2 over 22m) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 40s (x17 over 22m) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
It seems that I don't have enough memory in my pod. How can I allocate more memories to my pod?
Minikube itself starts with DefaultMemory = 2048, you are hitting this limit.
Using minikube you should think in advance how much resources your pods/replicas use in total, so that you can use --memory flag to allocated need amount of RAM.
You already got an answer in separate question. In addition to that I would add that you should always do minikube delete prior to start minikube with --memory= option, eg minikube start --memory=8192
You can always check you current memory configuration by kubectl describe node in Capacity section, e.g.
Capacity:
cpu: 1
...
memory: 3785984Ki
pods: 110
New to K8s. So far I have the following:
docker-ce-19.03.8
docker-ce-cli-19.03.8
containerd.io-1.2.13
kubelet-1.18.5
kubeadm-1.18.5
kubectl-1.18.5
etcd-3.4.10
Use Flannel for Pod Overlay Net
Performed all of the host-level work (SELinux permissive, swapoff, etc.)
All Centos7 in an on-prem Vsphere envioronment (6.7U3)
I've built all my configs and currently have:
a 3-node external/stand-alone etcd cluster with peer-to-peer and client-server encrypted transmissions
a 3-node control plane cluster -- kubeadm init is bootstrapped with x509s and targets to the 3 etcds (so stacked etcd never happens)
HAProxy and Keepalived are installed on two of the etcd cluster members, load-balancing access to the API server endpoints on the control plane (TCP6443)
6-worker nodes
Storage configured with the in-tree Vmware Cloud Provider (I know it's deprecated)--and yes, this is my DEFAULT SC
Status Checks:
kubectl cluster-info reports:
[me#km-01 pods]$ kubectl cluster-info
Kubernetes master is running at https://k8snlb:6443
KubeDNS is running at https://k8snlb:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubectl get all --all-namespaces reports:
[me#km-01 pods]$ kubectl get all --all-namespaces -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ag1 pod/mssql-operator-68bcc684c4-rbzvn 1/1 Running 0 86m 10.10.4.133 kw-02.bogus.local <none> <none>
kube-system pod/coredns-66bff467f8-k6m94 1/1 Running 4 20h 10.10.0.11 km-01.bogus.local <none> <none>
kube-system pod/coredns-66bff467f8-v848r 1/1 Running 4 20h 10.10.0.10 km-01.bogus.local <none> <none>
kube-system pod/kube-apiserver-km-01.bogus.local 1/1 Running 8 10h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-controller-manager-km-01.bogus.local 1/1 Running 2 10h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-7l76c 1/1 Running 0 10h x.x.x..30 kw-01.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-8kft7 1/1 Running 0 10h x.x.x..33 kw-04.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-r5kqv 1/1 Running 0 10h x.x.x..34 kw-05.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-t6xcd 1/1 Running 0 10h x.x.x..35 kw-06.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-vhnx8 1/1 Running 0 10h x.x.x..32 kw-03.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-xdk2n 1/1 Running 0 10h x.x.x..31 kw-02.bogus.local <none> <none>
kube-system pod/kube-flannel-ds-amd64-z4kfk 1/1 Running 4 20h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-proxy-49hsl 1/1 Running 0 10h x.x.x..35 kw-06.bogus.local <none> <none>
kube-system pod/kube-proxy-62klh 1/1 Running 0 10h x.x.x..34 kw-05.bogus.local <none> <none>
kube-system pod/kube-proxy-64d5t 1/1 Running 0 10h x.x.x..30 kw-01.bogus.local <none> <none>
kube-system pod/kube-proxy-6ch42 1/1 Running 4 20h x.x.x..25 km-01.bogus.local <none> <none>
kube-system pod/kube-proxy-9css4 1/1 Running 0 10h x.x.x..32 kw-03.bogus.local <none> <none>
kube-system pod/kube-proxy-hgrx8 1/1 Running 0 10h x.x.x..33 kw-04.bogus.local <none> <none>
kube-system pod/kube-proxy-ljlsh 1/1 Running 0 10h x.x.x..31 kw-02.bogus.local <none> <none>
kube-system pod/kube-scheduler-km-01.bogus.local 1/1 Running 5 20h x.x.x..25 km-01.bogus.local <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
ag1 service/ag1-primary NodePort 10.104.183.81 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30405/TCP 85m role.ag.mssql.microsoft.com/ag1=primary,type=sqlservr
ag1 service/ag1-secondary NodePort 10.102.52.31 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30713/TCP 85m role.ag.mssql.microsoft.com/ag1=secondary,type=sqlservr
ag1 service/mssql1 NodePort 10.96.166.108 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:32439/TCP 86m name=mssql1,type=sqlservr
ag1 service/mssql2 NodePort 10.109.146.58 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30636/TCP 86m name=mssql2,type=sqlservr
ag1 service/mssql3 NodePort 10.101.234.186 x.x.x..30,x.x.x..31,x.x.x..32,x.x.x..33,x.x.x..34,x.x.x..35 1433:30862/TCP 86m name=mssql3,type=sqlservr
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 20h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/kube-flannel-ds-amd64 7 7 7 7 7 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-amd64 app=flannel
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-arm app=flannel
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-arm64 app=flannel
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-ppc64le app=flannel
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 <none> 20h kube-flannel quay.io/coreos/flannel:v0.12.0-s390x app=flannel
kube-system daemonset.apps/kube-proxy 7 7 7 7 7 kubernetes.io/os=linux 20h kube-proxy k8s.gcr.io/kube-proxy:v1.18.7 k8s-app=kube-proxy
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
ag1 deployment.apps/mssql-operator 1/1 1 1 86m mssql-operator mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu app=mssql-operator
kube-system deployment.apps/coredns 2/2 2 2 20h coredns k8s.gcr.io/coredns:1.6.7 k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
ag1 replicaset.apps/mssql-operator-68bcc684c4 1 1 1 86m mssql-operator mcr.microsoft.com/mssql/ha:2019-CTP2.1-ubuntu app=mssql-operator,pod-template-hash=68bcc684c4
kube-system replicaset.apps/coredns-66bff467f8 2 2 2 20h coredns k8s.gcr.io/coredns:1.6.7 k8s-app=kube-dns,pod-template-hash=66bff467f8
To the problem: There are a number of articles talking about a SQL2019 HA build. It appears that every single one however, is in the cloud whereas mine is on-prem in a Vsphere env. They appear to be very simple: Run 3 scripts in this order: operator.yaml, sql.yaml, and ag-service.yaml.
My YAML's are based on: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files
For the blogs that actually screenshot the environment afterward, there should be at least 7 pods (1 Operator, 3 SQL Init, 3 SQL). If you look at my aforementioned all --all-namespaces output, I have everything (and in a running state) but no pods other than the running Operator...???
I actually broke the control plane back to a single-node just to try to isolate the logs. /var/log/container/* and /var/log/pod/* contain nothing of value to indicate a problem with storage or any other reason the the Pods are non-existent. It's probably also worth noting that I started using the latest sql2019 label: 2019-latest but when I got the same behavior there, I decided to try to use the old bits since so many blogs are based on CTP 2.1.
I can create PVs and PVCs using the VCP storage provider. I have my Secrets and can see them in the Secrets store.
I'm at a loss as to explain why pods are missing or where to look after checking journalctl, the daemons themselves, and /var/log and I don't see any indication there's even an attempt to create them -- the kubectl apply -f mssql-server2019.yaml that I adapted runs to completion and without error indicating 3 sql objects and 3 sql services get created. But here is the file anyway targeting CTP2.1:
cat << EOF > mssql-server2019.yaml
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
labels: {name: mssql1, type: sqlservr}
name: mssql1
namespace: ag1
spec:
acceptEula: true
agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
availabilityGroups: [ag1]
instanceRootVolumeClaimTemplate:
accessModes: [ReadWriteOnce]
resources:
requests: {storage: 5Gi}
storageClass: default
saPassword:
secretKeyRef: {key: sapassword, name: sql-secrets}
sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql1, namespace: ag1}
spec:
ports:
- {name: tds, port: 1433}
selector: {name: mssql1, type: sqlservr}
type: NodePort
externalIPs:
- x.x.x.30
- x.x.x.31
- x.x.x.32
- x.x.x.33
- x.x.x.34
- x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
labels: {name: mssql2, type: sqlservr}
name: mssql2
namespace: ag1
spec:
acceptEula: true
agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
availabilityGroups: [ag1]
instanceRootVolumeClaimTemplate:
accessModes: [ReadWriteOnce]
resources:
requests: {storage: 5Gi}
storageClass: default
saPassword:
secretKeyRef: {key: sapassword, name: sql-secrets}
sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql2, namespace: ag1}
spec:
ports:
- {name: tds, port: 1433}
selector: {name: mssql2, type: sqlservr}
type: NodePort
externalIPs:
- x.x.x.30
- x.x.x.31
- x.x.x.32
- x.x.x.33
- x.x.x.34
- x.x.x.35
---
apiVersion: mssql.microsoft.com/v1
kind: SqlServer
metadata:
labels: {name: mssql3, type: sqlservr}
name: mssql3
namespace: ag1
spec:
acceptEula: true
agentsContainerImage: mcr.microsoft.com/mssql/ha:2019-CTP2.1
availabilityGroups: [ag1]
instanceRootVolumeClaimTemplate:
accessModes: [ReadWriteOnce]
resources:
requests: {storage: 5Gi}
storageClass: default
saPassword:
secretKeyRef: {key: sapassword, name: sql-secrets}
sqlServerContainer: {image: 'mcr.microsoft.com/mssql/server:2019-CTP2.1'}
---
apiVersion: v1
kind: Service
metadata: {name: mssql3, namespace: ag1}
spec:
ports:
- {name: tds, port: 1433}
selector: {name: mssql3, type: sqlservr}
type: NodePort
externalIPs:
- x.x.x.30
- x.x.x.31
- x.x.x.32
- x.x.x.33
- x.x.x.34
- x.x.x.35
---
EOF
Edit1: kubectl logs -n ag mssql-operator-*
[sqlservers] 2020/08/14 14:36:48 Creating custom resource definition
[sqlservers] 2020/08/14 14:36:48 Created custom resource definition
[sqlservers] 2020/08/14 14:36:48 Waiting for custom resource definition to be available
[sqlservers] 2020/08/14 14:36:49 Watching for resources...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql1 in namespace ag1 ...
[sqlservers] 2020/08/14 14:37:08 Creating ConfigMap ag1
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error creating ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql2 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
[sqlservers] 2020/08/14 14:37:08 Updating ConfigMap sql-operator
[sqlservers] 2020/08/14 14:37:08 Updating mssql3 in namespace ag1 ...
[sqlservers] ERROR: 2020/08/14 14:37:08 could not process update request: error getting ConfigMap ag1: v1.ConfigMap: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, parsing 627 ...:{},"k:{\"... at {"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"ag1","namespace":"ag1","selfLink":"/api/v1/namespaces/ag1/configmaps/ag1","uid":"33af6232-4464-4290-bb14-b21e8f72e361","resourceVersion":"314186","creationTimestamp":"2020-08-14T14:37:08Z","ownerReferences":[{"apiVersion":"mssql.microsoft.com/v1","kind":"ReplicationController","name":"mssql1","uid":"e71a7246-2776-4d96-9735-844ee136a37d","controller":false}],"managedFields":[{"manager":"mssql-server-k8s-operator","operation":"Update","apiVersion":"v1","time":"2020-08-14T14:37:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"e71a7246-2776-4d96-9735-844ee136a37d\"}":{".":{},"f:apiVersion":{},"f:controller":{},"f:kind":{},"f:name":{},"f:uid":{}}}}}}]}}
I've looked over my operator and mssql2019.yamls (specifically around the kind: SqlServer, since that seems to be where it's failing) and can't identify any glaring inconsistencies or differences.
So your operator is running:
ag1 pod/pod/mssql-operator-68bcc684c4-rbzvn 1/1 Running 0 86m 10.10.4.133 kw-02.bogus.local <none> <none>
I would start by looking at the logs there:
kubectl -n ag1 logs pod/mssql-operator-68bcc684c4-rbzvn
Most likely it needs to interact with the cloud provider (i.e Azure) and VMware is not supported but check what the logs say 👀.
Update:
Based on the logs you posted it looks like you are using K8s 1.18 and the operator is incompatible. It's trying to create a ConfigMap with a spec that the kube-apiserver is rejecting.
✌️
YAMLs mine are based off of: https://github.com/microsoft/sql-server-samples/tree/master/samples/features/high%20availability/Kubernetes/sample-manifest-files
Run 3 scripts in this order: operator.yaml, sql.yaml, and ag-service.yaml.
I have just ran it on my GKE cluster and got similar result if I try running only these 3 files.
If you ran it without preparing PV and PVC ( .././sample-deployment-script/templates/pv*.yaml )
$ git clone https://github.com/microsoft/sql-server-samples.git
$ cd sql-server-samples/samples/features/high\ availability/Kubernetes/sample-manifest-files/
$ kubectl create -f operator.yaml
namespace/ag1 created
serviceaccount/mssql-operator created
clusterrole.rbac.authorization.k8s.io/mssql-operator-ag1 created
clusterrolebinding.rbac.authorization.k8s.io/mssql-operator-ag1 created
deployment.apps/mssql-operator created
$ kubectl create -f sqlserver.yaml
sqlserver.mssql.microsoft.com/mssql1 created
service/mssql1 created
sqlserver.mssql.microsoft.com/mssql2 created
service/mssql2 created
sqlserver.mssql.microsoft.com/mssql3 created
service/mssql3 created
$ kubectl create -f ag-services.yaml
service/ag1-primary created
service/ag1-secondary created
You'll have:
kubectl get pods -n ag1
NAME READY STATUS RESTARTS AGE
mssql-initialize-mssql1-js4zc 0/1 CreateContainerConfigError 0 6m12s
mssql-initialize-mssql2-72d8n 0/1 CreateContainerConfigError 0 6m8s
mssql-initialize-mssql3-h4mr9 0/1 CreateContainerConfigError 0 6m6s
mssql-operator-658558b57d-6xd95 1/1 Running 0 6m33s
mssql1-0 1/2 CrashLoopBackOff 5 6m12s
mssql2-0 1/2 CrashLoopBackOff 5 6m9s
mssql3-0 0/2 Pending 0 6m6s
I see that the failed mssql<N> pods are parts of statefulset.apps/mssql<N> and mssql-initialize-mssql<N> are parts of job.batch/mssql-initialize-mssql<N>
Upon adding PV and PVC it looks in a following way:
$ kubectl get all -n ag1
NAME READY STATUS RESTARTS AGE
mssql-operator-658558b57d-pgx74 1/1 Running 0 20m
And 3 sqlservers.mssql.microsoft.com objects
$ kubectl get sqlservers.mssql.microsoft.com -n ag1
NAME AGE
mssql1 64m
mssql2 64m
mssql3 64m
That is why it looks exactly as it is specified in the abovementioned files.
Any assistance would be greatly appreciated.
However, if you run:
sql-server-samples/samples/features/high availability/Kubernetes/sample-deployment-script/$ ./deploy-ag.py deploy --dry-run
configs will be generated automatically.
without dry-run and that configs (and with properly set PV+PVC) it gives us 7 pods.
You'll have configs generated. It'll be useful to compare auto-generated configs with the one's you have (and compare running only subset 3 files vs. stuff from deploy-ag.py )
P.S.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15+" GitVersion:"v1.15.11-dispatcher"
Server Version: version.Info{Major:"1", Minor:"15+" GitVersion:"v1.15.12-gke.2"
I am practicing the k8s by following the ingress chapter. I am using Google Cluster. Specification are as follows
master: 1.11.7-gke.4
node: 1.11.7-gke.4
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-singh-default-pool-a69fa545-1sm3 Ready <none> 6h v1.11.7-gke.4 10.148.0.46 35.197.128.107 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-819z Ready <none> 6h v1.11.7-gke.4 10.148.0.47 35.198.217.71 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-djhz Ready <none> 6h v1.11.7-gke.4 10.148.0.45 35.197.159.75 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
master endpoint: 35.186.148.93
DNS: singh.hbot.io (master IP)
To keep my question short. I post my source code in the snippet and links back to here.
Files:
deployment.yaml
ingress.yaml
ingress-rules.yaml
Problem:
curl http://singh.hbot.io/webapp1 got timed out
Description
$ kubectl get deployment -n nginx-ingress
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx-ingress 1 1 1 0 2h
nginx-ingress deployment is not available.
$ kubectl describe deployment -n nginx-ingress
Name: nginx-ingress
Namespace: nginx-ingress
CreationTimestamp: Mon, 04 Mar 2019 15:09:42 +0700
Labels: app=nginx-ingress
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-ingress","namespace":"nginx-ingress"},"s...
Selector: app=nginx-ingress
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=nginx-ingress
Service Account: nginx-ingress
Containers:
nginx-ingress:
Image: nginx/nginx-ingress:edge
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-ingress-77fcd48f4d (1/1 replicas created)
Events: <none>
pods:
$ kubectl get pods --all-namespaces=true
NAMESPACE NAME READY STATUS RESTARTS AGE
default webapp1-7d67d68676-k9hhl 1/1 Running 0 6h
default webapp2-64d4844b78-9kln5 1/1 Running 0 6h
default webapp3-5b8ff7484d-zvcsf 1/1 Running 0 6h
kube-system event-exporter-v0.2.3-85644fcdf-xxflh 2/2 Running 0 6h
kube-system fluentd-gcp-scaler-8b674f786-gvv98 1/1 Running 0 6h
kube-system fluentd-gcp-v3.2.0-srzc2 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-w2z2q 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-z7p9l 2/2 Running 0 6h
kube-system heapster-v1.6.0-beta.1-5685746c7b-kd4mn 3/3 Running 0 6h
kube-system kube-dns-6b98c9c9bf-6p8qr 4/4 Running 0 6h
kube-system kube-dns-6b98c9c9bf-pffpt 4/4 Running 0 6h
kube-system kube-dns-autoscaler-67c97c87fb-gbgrs 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-1sm3 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-819z 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-djhz 1/1 Running 0 6h
kube-system l7-default-backend-7ff48cffd7-trqvx 1/1 Running 0 6h
kube-system metrics-server-v0.2.1-fd596d746-bvdfk 2/2 Running 0 6h
kube-system tiller-deploy-57c574bfb8-xnmtj 1/1 Running 0 1h
nginx-ingress nginx-ingress-77fcd48f4d-rfwbk 0/1 CrashLoopBackOff 35 2h
describe pod
$ kubectl describe pods -n nginx-ingress
Name: nginx-ingress-77fcd48f4d-5rhtv
Namespace: nginx-ingress
Priority: 0
PriorityClassName: <none>
Node: gke-singh-default-pool-a69fa545-djhz/10.148.0.45
Start Time: Mon, 04 Mar 2019 17:55:00 +0700
Labels: app=nginx-ingress
pod-template-hash=3397804908
Annotations: <none>
Status: Running
IP: 10.48.2.10
Controlled By: ReplicaSet/nginx-ingress-77fcd48f4d
Containers:
nginx-ingress:
Container ID: docker://5d3ee9e2bf7a2060ff0a96fdd884a937b77978c137df232dbfd0d3e5de89fe0e
Image: nginx/nginx-ingress:edge
Image ID: docker-pullable://nginx/nginx-ingress#sha256:16c1c6dde0b904f031d3c173e0b04eb82fe9c4c85cb1e1f83a14d5b56a568250
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Mon, 04 Mar 2019 18:16:33 +0700
Finished: Mon, 04 Mar 2019 18:16:33 +0700
Ready: False
Restart Count: 9
Environment:
POD_NAMESPACE: nginx-ingress (v1:metadata.namespace)
POD_NAME: nginx-ingress-77fcd48f4d-5rhtv (v1:metadata.name)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-token-zvcwt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-token-zvcwt:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-token-zvcwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26m default-scheduler Successfully assigned nginx-ingress/nginx-ingress-77fcd48f4d-5rhtv to gke-singh-default-pool-a69fa545-djhz
Normal Created 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Created container
Normal Started 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Started container
Normal Pulling 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz pulling image "nginx/nginx-ingress:edge"
Normal Pulled 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Successfully pulled image "nginx/nginx-ingress:edge"
Warning BackOff 62s (x112 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Back-off restarting failed container
Fix container terminated
Add to the command to ingress.yaml to prevent container finish running and get terminated by k8s.
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
Ingress has no IP address from GKE. Let me have a look in details
describe ingress:
$ kubectl describe ing
Name: webapp-ingress
Namespace: default
Address:
Default backend: default-http-backend:80 (10.48.0.8:8080)
Rules:
Host Path Backends
---- ---- --------
*
/webapp1 webapp1-svc:80 (<none>)
/webapp2 webapp2-svc:80 (<none>)
webapp3-svc:80 (<none>)
Annotations:
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{},"name":"webapp-ingress","namespace":"default"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"webapp1-svc","servicePort":80},"path":"/webapp1"},{"backend":{"serviceName":"webapp2-svc","servicePort":80},"path":"/webapp2"},{"backend":{"serviceName":"webapp3-svc","servicePort":80}}]}}]}}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
From this line I got all the ultimate solution from Christian Roy Thank you very much.
Fix the ClusterIP
It is default value then I have to edit my manifest file using NodePort as follow
apiVersion: v1
kind: Service
metadata:
name: webapp1-svc
labels:
app: webapp1
spec:
type: NodePort
ports:
- port: 80
selector:
app: webapp1
And that is.
The answer is in your question. The describe of your ingress shows the problem.
You did kubectl describe ing and the last part of that output was:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
The important part is:
error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
Solution
Just change all your services to be of type NodePort and it will work.
I have to add command in order to let the container not finish working.
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]