I've accidentally drained/uncordoned all nodes in Kubernetes (even master) and now I'm trying to bring it back by connecting to the ETCD and manually change some keys in there. I successfuly bashed into etcd container:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8fbcb67da963 quay.io/coreos/etcd:v3.3.10 "/usr/local/bin/etcd" 17 hours ago Up 17 hours etcd1
a0d6426df02a cd48205a40f0 "kube-controller-man…" 17 hours ago Up 17 hours k8s_kube-controller-manager_kube-controller-manager-node1_kube-system_0441d7804a7366fd957f8b402008efe5_16
5fa8e47441a0 6bed756ced73 "kube-scheduler --au…" 17 hours ago Up 17 hours k8s_kube-scheduler_kube-scheduler-node1_kube-system_6f33d7866b72ca1b13c79edd42fa8dc6_14
2c8e07cf499f gcr.io/google_containers/pause-amd64:3.1 "/pause" 17 hours ago Up 17 hours k8s_POD_kube-scheduler-node1_kube-system_6f33d7866b72ca1b13c79edd42fa8dc6_3
2ca43282ea1c gcr.io/google_containers/pause-amd64:3.1 "/pause" 17 hours ago Up 17 hours k8s_POD_kube-controller-manager-node1_kube-system_0441d7804a7366fd957f8b402008efe5_3
9473644a3333 gcr.io/google_containers/pause-amd64:3.1 "/pause" 17 hours ago Up 17 hours k8s_POD_kube-apiserver-node1_kube-system_93ff1a9840f77f8b2b924a85815e17fe_3
and then I run:
docker exec -it 8fbcb67da963 /bin/sh
and then I try to run the following:
ETCDCTL_API=3 etcdctl --endpoints https://172.16.16.111:2379 --cacert /etc/ssl/etcd/ssl/ca.pem --key /etc/ssl/etcd/ssl/member-node1-key.pem --cert /etc/ssl/etcd/ssl/member-node1.pem get / --prefix=true -w json --debug
and here is the result I get:
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/member-node1.pem
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://172.16.16.111:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=json
INFO: 2020/06/24 15:44:07 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc420246c00
INFO: 2020/06/24 15:44:07 dialing to target with scheme: ""
INFO: 2020/06/24 15:44:07 could not get resolver for scheme: ""
INFO: 2020/06/24 15:44:07 balancerWrapper: is pickfirst: false
INFO: 2020/06/24 15:44:07 balancerWrapper: got update addr from Notify: [{172.16.16.111:2379 <nil>}]
INFO: 2020/06/24 15:44:07 ccBalancerWrapper: new subconn: [{172.16.16.111:2379 0 <nil>}]
INFO: 2020/06/24 15:44:07 balancerWrapper: handle subconn state change: 0xc4201708d0, CONNECTING
INFO: 2020/06/24 15:44:07 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420246c00
Error: context deadline exceeded
Here is my etcd.env:
# Environment file for etcd v3.3.10
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://172.16.16.111:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.16.16.111:2380
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://172.16.16.111:2379,https://127.0.0.1:2379
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://172.16.16.111:2380
ETCD_NAME=etcd1
ETCD_PROXY=off
ETCD_INITIAL_CLUSTER=etcd1=https://172.16.16.111:2380,etcd2=https://172.16.16.112:2380,etcd3=https://172.16.16.113:2380
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000
# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=True
Update 1:
Here is my kubeadm-config.yaml:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 172.16.16.111
bindPort: 6443
certificateKey: d73faece88f86e447eea3ca38f7b07e0a1f0bbb886567fee3b8cf8848b1bf8dd
nodeRegistration:
name: node1
taints: []
criSocket: /var/run/dockershim.sock
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
clusterName: cluster.local
etcd:
external:
endpoints:
- https://172.16.16.111:2379
- https://172.16.16.112:2379
- https://172.16.16.113:2379
caFile: /etc/ssl/etcd/ssl/ca.pem
certFile: /etc/ssl/etcd/ssl/node-node1.pem
keyFile: /etc/ssl/etcd/ssl/node-node1-key.pem
dns:
type: CoreDNS
imageRepository: docker.io/coredns
imageTag: 1.6.0
networking:
dnsDomain: cluster.local
serviceSubnet: 10.233.0.0/18
podSubnet: 10.233.64.0/18
kubernetesVersion: v1.16.6
controlPlaneEndpoint: 172.16.16.111:6443
certificatesDir: /etc/kubernetes/ssl
imageRepository: gcr.io/google-containers
apiServer:
extraArgs:
anonymous-auth: "True"
authorization-mode: Node,RBAC
bind-address: 0.0.0.0
insecure-port: "0"
apiserver-count: "1"
endpoint-reconciler-type: lease
service-node-port-range: 30000-32767
kubelet-preferred-address-types: "InternalDNS,InternalIP,Hostname,ExternalDNS,ExternalIP"
profiling: "False"
request-timeout: "1m0s"
enable-aggregator-routing: "False"
storage-backend: etcd3
runtime-config:
allow-privileged: "true"
extraVolumes:
- name: usr-share-ca-certificates
hostPath: /usr/share/ca-certificates
mountPath: /usr/share/ca-certificates
readOnly: true
certSANs:
- kubernetes
- kubernetes.default
- kubernetes.default.svc
- kubernetes.default.svc.cluster.local
- 10.233.0.1
- localhost
- 127.0.0.1
- node1
- lb-apiserver.kubernetes.local
- 172.16.16.111
- node1.cluster.local
timeoutForControlPlane: 5m0s
controllerManager:
extraArgs:
node-monitor-grace-period: 40s
node-monitor-period: 5s
pod-eviction-timeout: 5m0s
node-cidr-mask-size: "24"
profiling: "False"
terminated-pod-gc-threshold: "12500"
bind-address: 0.0.0.0
configure-cloud-routes: "false"
scheduler:
extraArgs:
bind-address: 0.0.0.0
extraVolumes:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes:
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig:
qps: 5
clusterCIDR: 10.233.64.0/18
configSyncPeriod: 15m0s
conntrack:
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: False
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: node1
iptables:
masqueradeAll: False
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
excludeCIDRs: []
minSyncPeriod: 0s
scheduler: rr
syncPeriod: 30s
strictARP: False
metricsBindAddress: 127.0.0.1:10249
mode: ipvs
nodePortAddresses: []
oomScoreAdj: -999
portRange:
udpIdleTimeout: 250ms
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 169.254.25.10
Update 2:
Contents of /etc/kubernetes/manigests/kube-apiserver.yaml:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-apiserver
tier: control-plane
name: kube-apiserver
namespace: kube-system
spec:
containers:
- command:
- kube-apiserver
- --advertise-address=172.16.16.111
- --allow-privileged=true
- --anonymous-auth=True
- --apiserver-count=1
- --authorization-mode=Node,RBAC
- --bind-address=0.0.0.0
- --client-ca-file=/etc/kubernetes/ssl/ca.crt
- --enable-admission-plugins=NodeRestriction
- --enable-aggregator-routing=False
- --enable-bootstrap-token-auth=true
- --endpoint-reconciler-type=lease
- --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem
- --etcd-certfile=/etc/ssl/etcd/ssl/node-node1.pem
- --etcd-keyfile=/etc/ssl/etcd/ssl/node-node1-key.pem
- --etcd-servers=https://172.16.16.111:2379,https://172.16.16.112:2379,https://172.16.16.113:2379
- --insecure-port=0
- --kubelet-client-certificate=/etc/kubernetes/ssl/apiserver-kubelet-client.crt
- --kubelet-client-key=/etc/kubernetes/ssl/apiserver-kubelet-client.key
- --kubelet-preferred-address-types=InternalDNS,InternalIP,Hostname,ExternalDNS,ExternalIP
- --profiling=False
- --proxy-client-cert-file=/etc/kubernetes/ssl/front-proxy-client.crt
- --proxy-client-key-file=/etc/kubernetes/ssl/front-proxy-client.key
- --request-timeout=1m0s
- --requestheader-allowed-names=front-proxy-client
- --requestheader-client-ca-file=/etc/kubernetes/ssl/front-proxy-ca.crt
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --requestheader-group-headers=X-Remote-Group
- --requestheader-username-headers=X-Remote-User
- --runtime-config=
- --secure-port=6443
- --service-account-key-file=/etc/kubernetes/ssl/sa.pub
- --service-cluster-ip-range=10.233.0.0/18
- --service-node-port-range=30000-32767
- --storage-backend=etcd3
- --tls-cert-file=/etc/kubernetes/ssl/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/ssl/apiserver.key
image: gcr.io/google-containers/kube-apiserver:v1.16.6
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 172.16.16.111
path: /healthz
port: 6443
scheme: HTTPS
initialDelaySeconds: 15
timeoutSeconds: 15
name: kube-apiserver
resources:
requests:
cpu: 250m
volumeMounts:
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/ca-certificates
name: etc-ca-certificates
readOnly: true
- mountPath: /etc/ssl/etcd/ssl
name: etcd-certs-0
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: k8s-certs
readOnly: true
- mountPath: /usr/local/share/ca-certificates
name: usr-local-share-ca-certificates
readOnly: true
- mountPath: /usr/share/ca-certificates
name: usr-share-ca-certificates
readOnly: true
hostNetwork: true
priorityClassName: system-cluster-critical
volumes:
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /etc/ca-certificates
type: DirectoryOrCreate
name: etc-ca-certificates
- hostPath:
path: /etc/ssl/etcd/ssl
type: DirectoryOrCreate
name: etcd-certs-0
- hostPath:
path: /etc/kubernetes/ssl
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /usr/local/share/ca-certificates
type: DirectoryOrCreate
name: usr-local-share-ca-certificates
- hostPath:
path: /usr/share/ca-certificates
type: ""
name: usr-share-ca-certificates
status: {}
I used kubespray to install the cluster.
How can I connect to the etcd? Any help would be appreciated.
This context deadline exceeded generally happens because of
Using wrong certificates. You could be using peer certificates instead of client certificates. You need to check the Kubernetes API Server parameters which will tell you where are the client certificates located because Kubernetes API Server is a client to ETCD. Then you can use those same certificates in the etcdctl command from the node.
The etcd cluster is not operational anymore because peer members are down.
Related
i'm new to openshift/kubernetes/docker and i was wondering where the docker registry of openshift origin persist the images , knowing that :
1.in the deployment's yaml of the docker registry , there is only emptyDir volumes declaration
volumes:
- emptyDir: {}
name: registry-storage
2.in the machine where the pod is deployed i can't see no volume using
docker volumes ls
3.the images are still persisted even if i restart the pod
docker registry deployment's yaml :
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
creationTimestamp: '2020-04-26T18:16:50Z'
generation: 1
labels:
docker-registry: default
name: docker-registry
namespace: default
resourceVersion: '1844231'
selfLink: >-
/apis/apps.openshift.io/v1/namespaces/default/deploymentconfigs/docker-registry
uid: 1983153d-87ea-11ea-a4bc-fa163ee581f7
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
docker-registry: default
strategy:
activeDeadlineSeconds: 21600
resources: {}
rollingParams:
intervalSeconds: 1
maxSurge: 25%
maxUnavailable: 25%
timeoutSeconds: 600
updatePeriodSeconds: 1
type: Rolling
template:
metadata:
creationTimestamp: null
labels:
docker-registry: default
spec:
containers:
- env:
- name: REGISTRY_HTTP_ADDR
value: ':5000'
- name: REGISTRY_HTTP_NET
value: tcp
- name: REGISTRY_HTTP_SECRET
value:
- name: REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA
value: 'false'
- name: OPENSHIFT_DEFAULT_REGISTRY
value: 'docker-registry.default.svc:5000'
- name: REGISTRY_HTTP_TLS_CERTIFICATE
value: /etc/secrets/registry.crt
- name: REGISTRY_OPENSHIFT_SERVER_ADDR
value: 'docker-registry.default.svc:5000'
- name: REGISTRY_HTTP_TLS_KEY
value: /etc/secrets/registry.key
image: 'docker.io/openshift/origin-docker-registry:v3.11'
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 5000
scheme: HTTPS
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: registry
ports:
- containerPort: 5000
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 5000
scheme: HTTPS
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources:
requests:
cpu: 100m
memory: 256Mi
securityContext:
privileged: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /registry
name: registry-storage
- mountPath: /etc/secrets
name: registry-certificates
dnsPolicy: ClusterFirst
nodeSelector:
node-role.kubernetes.io/infra: 'true'
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: registry
serviceAccountName: registry
terminationGracePeriodSeconds: 30
volumes:
- emptyDir: {}
name: registry-storage
- name: registry-certificates
secret:
defaultMode: 420
secretName: registry-certificates
test: false
triggers:
- type: ConfigChange
status:
availableReplicas: 1
conditions:
- lastTransitionTime: '2020-04-26T18:17:12Z'
lastUpdateTime: '2020-04-26T18:17:12Z'
message: replication controller "docker-registry-1" successfully rolled out
reason: NewReplicationControllerAvailable
status: 'True'
type: Progressing
- lastTransitionTime: '2020-05-05T09:39:57Z'
lastUpdateTime: '2020-05-05T09:39:57Z'
message: Deployment config has minimum availability.
status: 'True'
type: Available
details:
causes:
- type: ConfigChange
message: config change
latestVersion: 1
observedGeneration: 1
readyReplicas: 1
replicas: 1
unavailableReplicas: 0
updatedReplicas: 1
to restart : i just delete the pod and a new one is created since i'm using a deployment
i'm creating the file in the /registry
Restarting does not mean the data is deleted, it still exist in the container top layer, suggest you get started by reading this.
Persistence is, for example in Kubernetes, when a pod is deleted and re-created on another node and still maintains the same state of a volume.
I'am running Kubernetes cluster on Google Cloud Platform via their Kubernetes Engine. Cluster version is 1.13.11-gke.14. PHP application pod contains 2 containers - Nginx as a reverse proxy and php-fpm (7.2).
In google cloud is used TCP Load Balancer and then internal routing via Nginx Ingress.
Problem is:
when I upload some bigger file (17MB), ingress is crashing with this error:
W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX:
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required.
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory
E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:12.314526990Z 10.132.0.25 - [10.132.0.25] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 10.102.2.4:80 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b
E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused
I 2019-12-01T14:26:13.156963Z Exiting with 0
Here is yaml configuration of Nginx ingress. Configuration is default by Gitlab's system that is creating cluster on their own.
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2019-11-24T17:35:04Z"
generation: 3
labels:
app: nginx-ingress
chart: nginx-ingress-1.22.1
component: controller
heritage: Tiller
release: ingress
name: ingress-nginx-ingress-controller
namespace: gitlab-managed-apps
resourceVersion: "2638973"
selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-ingress
release: ingress
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
prometheus.io/port: "10254"
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app: nginx-ingress
component: controller
release: ingress
spec:
containers:
- args:
- /nginx-ingress-controller
- --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
- --election-id=ingress-controller-leader
- --ingress-class=nginx
- --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
name: nginx-ingress-controller
ports:
- containerPort: 80
name: http
protocol: TCP
- containerPort: 443
name: https
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 10254
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
resources: {}
securityContext:
allowPrivilegeEscalation: true
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
runAsUser: 33
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/nginx/modsecurity/modsecurity.conf
name: modsecurity-template-volume
subPath: modsecurity.conf
- mountPath: /var/log/modsec
name: modsecurity-log-volume
- args:
- /bin/sh
- -c
- tail -f /var/log/modsec/audit.log
image: busybox
imagePullPolicy: Always
name: modsecurity-log
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/log/modsec
name: modsecurity-log-volume
readOnly: true
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: ingress-nginx-ingress
serviceAccountName: ingress-nginx-ingress
terminationGracePeriodSeconds: 60
volumes:
- configMap:
defaultMode: 420
items:
- key: modsecurity.conf
path: modsecurity.conf
name: ingress-nginx-ingress-controller
name: modsecurity-template-volume
- emptyDir: {}
name: modsecurity-log-volume
I have no Idea what else to try. I'm running cluster on 3 nodes (2x 1vCPU, 1.5GB RAM and 1x Preemptile 2vCPU, 1,8GB RAM), all of them on SSD drives.
Anytime i upload the image, disk IO will get crazy.
Disk IOPS
Disk I/O
Thanks for your help.
Found solution. Nginx-ingress pod contained modsecurity too. All requests were analyzed by mod security and bigger uploaded files caused those crashes. It wasn't crash at all but took too much CPU and I/O, that caused longer healthcheck response to all other pods. Solution is to configure correctly modsecurity or disable.
I have a problem when I set kubelet parameter cluster-dns
My OS is CentOS Linux release 7.0.1406 (Core)
Kernel:Linux master 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
kubelet config file:
KUBELET_HOSTNAME="--hostname-override=master"
#KUBELET_API_SERVER="--api-servers=http://master:8080
KUBECONFIG="--kubeconfig=/root/.kube/config-demo"
KUBELET_DNS="–-cluster-dns=10.254.0.10"
KUBELET_DOMAIN="--cluster-domain=cluster.local"
# Add your own!
KUBELET_ARGS="--cgroup-driver=systemd --fail-swap-on=false --pod_infra_container_image=177.1.1.35/library/pause:latest"
config file:
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=4"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_MASTER="--master=http://master:8080"
kubelet.service file:
[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet \
$KUBE_LOGTOSTDERR \
$KUBE_LOG_LEVEL \
$KUBELET_API_SERVER \
$KUBELET_DNS \
$KUBELET_DOMAIN \
$KUBELET_ADDRESS \
$KUBELET_PORT \
$KUBELET_HOSTNAME \
$KUBE_ALLOW_PRIV \
$KUBELET_ARGS \
$KUBECONFIG
Restart=on-failure
KillMode=process
[Install]
WantedBy=multi-user.target
When I start the kubelet service I can see the "--cluster-dns=10.254.0.10" parameter is correct set:
root 29705 1 1 13:24 ? 00:00:16 /usr/bin/kubelet --logtostderr=true --v=4 –-cluster-dns=10.254.0.10 --cluster-domain=cluster.local --hostname-override=master --allow-privileged=false --cgroup-driver=systemd --fail-swap-on=false --pod_infra_container_image=177.1.1.35/library/pause:latest --kubeconfig=/root/.kube/config-demo
But when I use systemctl status kubelet check the service the cluster-domain parameter just have only on "-" like:
systemctl status kubelet -l
● kubelet.service - Kubernetes Kubelet Server
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-07-13 13:24:07 CST; 5s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 29705 (kubelet)
Memory: 30.6M
CGroup: /system.slice/kubelet.service
└─29705 /usr/bin/kubelet --logtostderr=true --v=4 -cluster-dns=10.254.0.10 --cluster-domain=cluster.local --hostname-override=master --allow-privileged=false --cgroup-driver=systemd --fail-swap-on=false --pod_infra_container_image=177.1.1.35/library/pause:latest --kubeconfig=/root/.kube/config-demo
In the logs say there is nothing set in cluster-dns flag:
Jul 13 13:24:07 master kubelet: I0713 13:24:07.680625 29705 flags.go:27] FLAG: --cluster-dns="[]"
Jul 13 13:24:07 master kubelet: I0713 13:24:07.680636 29705 flags.go:27] FLAG: --cluster-domain="cluster.local"
The Pods with errors:
pod: "java-deploy-69c84746b9-b2d7j_default(ce02d183-864f-11e8-9bdb-525400c4f6bf)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
My kube-dns config file:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.254.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
---
#apiVersion: v1
#kind: ServiceAccount
#metadata:
# name: kube-dns
# namespace: kube-system
# labels:
# kubernetes.io/cluster-service: "true"
# addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kube-dns
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: EnsureExists
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
# replicas: not specified here:
# 1. In order to make Addon Manager do not reconcile this replicas parameter.
# 2. Default is 1.
# 3. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
volumes:
- name: kube-dns-config
configMap:
name: kube-dns
optional: true
containers:
- name: kubedns
image: 177.1.1.35/library/kube-dns:1.14.8
resources:
# TODO: Set memory limits when we've profiled the container for large
# clusters, then set request = limit to keep this container in
# guaranteed class. Currently, this container falls into the
# "burstable" category so the kubelet doesn't backoff from restarting it.
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthcheck/kubedns
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
# we poll on pod startup for the Kubernetes master service and
# only setup the /readiness HTTP server once that's available.
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-dir=/kube-dns-config
- --kube-master-url=http://177.1.1.40:8080
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
volumeMounts:
- name: kube-dns-config
mountPath: /kube-dns-config
- name: dnsmasq
image: 177.1.1.35/library/dnsmasq:1.14.8
livenessProbe:
httpGet:
path: /healthcheck/dnsmasq
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- -v=2
- -logtostderr
- -configDir=/etc/k8s/dns/dnsmasq-nanny
- -restartDnsmasq=true
- --
- -k
- --cache-size=1000
- --no-negcache
- --log-facility=-
- --server=/cluster.local/127.0.0.1#10053
- --server=/in-addr.arpa/127.0.0.1#10053
- --server=/ip6.arpa/127.0.0.1#10053
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 20Mi
volumeMounts:
- name: kube-dns-config
mountPath: /etc/k8s/dns/dnsmasq-nanny
- name: sidecar
image: 177.1.1.35/library/sidecar:1.14.8
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
- --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
- --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 20Mi
cpu: 10m
dnsPolicy: Default # Don't use cluster DNS.
#serviceAccountName: kube-dns
Recheck your kubelet config:
KUBELET_DNS="–-cluster-dns=10.254.0.10"
It seems to me that the first dash is longer than the second.
Maybe a copy&paste you made causes that strange character.
Retype it and retry.
I'm pretty new in kubernetes, I just install kubernetes via kubeadm and run dashboard UI but can't config access to it. Following docs I add line --basic-auth-file=/etc/kubernetes/auth.csv to /etc/kubernetes/manifests/kube-apiserver.yaml, create file and put in one string like pass,admin,admin. But after that api server crashed and back to normal after deleting this string and reboot the server. How I can pass this parametr to api server without api server crashing, and maybe something else need add or remove from this file? Here is my
kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: kube-apiserver
tier: control-plane
name: kube-apiserver
namespace: kube-system
spec:
containers:
- command:
- kube-apiserver
- --admission-control=Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota
- --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- --secure-port=6443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --requestheader-allowed-names=front-proxy-client
- --service-account-key-file=/etc/kubernetes/pki/sa.pub
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --enable-bootstrap-token-auth=true
- --allow-privileged=true
- --requestheader-username-headers=X-Remote-User
- --advertise-address=236.273.51.124
- --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
- --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
- --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
- --insecure-port=0
- --requestheader-group-headers=X-Remote-Group
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --service-cluster-ip-range=10.96.0.0/12
- --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
- --authorization-mode=Node,RBAC
- --etcd-servers=http://127.0.0.1:2379
image: gcr.io/google_containers/kube-apiserver-amd64:v1.8.0
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 6443
scheme: HTTPS
initialDelaySeconds: 15
timeoutSeconds: 15
name: kube-apiserver
resources:
requests:
cpu: 250m
volumeMounts:
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/pki
name: ca-certs-etc-pki
Your file for basic authentication /etc/kubernetes/auth.csv is not available inside kube-apiserver pod's container. It should be mounted to pod's container as well as certificate folders. Just add it to volumes and volumeMounts sections:
volumeMounts:
- mountPath: /etc/kubernetes/auth.csv
name: kubernetes-dashboard
readOnly: true
volumes:
- hostPath:
path: /etc/kubernetes/auth.csv
name: kubernetes-dashboard
I've setup Kubernetes with the steps below. Everything looks fine - but it's running on a single node/server.
Now I want to take the next step for running on multiple nodes. I wonder where should I configure my physical servers ip's so I could create the pod in more than one physical server.
I run:
hack/local-up-cluster.sh
then (In another terminal):
cluster/kubectl.sh config set-cluster local --server=http://127.0.0.1:8080 --insecure-skip-tls-verify=true
cluster/kubectl.sh config set-context local --cluster=local
cluster/kubectl.sh config use-context local
And:
cluster/kubectl.sh create -f run-aii.yaml
my run-aii.yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: aii
spec:
replicas: 1
template:
metadata:
labels:
run: aii
spec:
containers:
- name: aii
image: localhost:5000/dev/aii
ports:
- containerPort: 5144
env:
- name: KAFKA_IP
value: kafka
volumeMounts:
- mountPath: /root/script
name: scripts-data
readOnly: true
- mountPath: /home/aii/core
name: core-aii
readOnly: false
- mountPath: /home/aii/genome
name: genome-aii
readOnly: true
- mountPath: /home/aii/main
name: main-aii
readOnly: false
- name: kafka
image: localhost:5000/dev/kafkazoo
volumeMounts:
- mountPath: /root/script
name: scripts-data
readOnly: true
- mountPath: /root/config
name: config-data
readOnly: true
volumes:
- name: scripts-data
hostPath:
path: /home/aii/general/infra/script
- name: config-data
hostPath:
path: /home/aii/general/infra/config
- name: core-aii
hostPath:
path: /home/aii/general/core
- name: genome-aii
hostPath:
path: /home/aii/general/genome
- name: main-aii
hostPath:
path: /home/aii/general/main
- mountPath: /home/aii/main
name: main-aii
readOnly: false
- name: kafka
image: localhost:5000/dev/kafkazoo
volumeMounts:
- mountPath: /root/script
name: scripts-data
readOnly: true
- mountPath: /root/config
name: config-data
readOnly: true
volumes:
- name: scripts-data
hostPath:
path: /home/aii/general/infra/script
- name: config-data
hostPath:
path: /home/aii/general/infra/config
- name: core-aii
hostPath:
path: /home/aii/general/core
- name: genome-aii
hostPath:
path: /home/aii/general/genome
- name: main-aii
hostPath:
path: /home/aii/general/main
Additional info:
[aii#localhost kubernetes]$ cluster/kubectl.sh get pod
NAME READY STATUS RESTARTS AGE
aii-3934754246-yilg3 2/2 Running 0 59s
[aii#localhost kubernetes]$ cluster/kubectl.sh describe pod aii-3934754246-yilg3
Name: aii-3934754246-yilg3
Namespace: default
Node: 127.0.0.1/127.0.0.1
Start Time: Sun, 29 May 2016 16:58:20 +0300
Labels: pod-template-hash=3934754246,run=aii
Status: Running
IP: 172.17.0.4
Controllers: ReplicaSet/aii-3934754246
Containers:
aii:
Container ID: docker://71609cfd8e33c01a81a36770d12d884443a12b4c2969b95e3042d9dee4fb455b
Image: localhost:5000/dev/aii
Image ID: docker://sha256:7e70fbb724962b2f23c9439a1c00356deb551fd96ffd27a8afa6340fc903e735
Port: 5144/TCP
QoS Tier:
memory: BestEffort
cpu: BestEffort
State: Running
Started: Sun, 29 May 2016 16:58:23 +0300
Ready: True
Restart Count: 0
Environment Variables:
KAFKA_IP: kafka
kafka:
Container ID: docker://6eb891e5968cf1106b26a9f3f7db881683a8e15dd59b1858435715580c90656c
Image: localhost:5000/dev/kafkazoo
Image ID: docker://sha256:b78e60582cbc8d3c4946807baf59552d110c7802c8204157e6fba509b96bc11c
Port:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Running
Started: Sun, 29 May 2016 16:58:24 +0300
Ready: True
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready True
Volumes:
scripts-data:
Type: HostPath (bare host directory volume)
Path: /home/aii/general/infra/script
config-data:
Type: HostPath (bare host directory volume)
Path: /home/aii/general/infra/config
core-aii:
Type: HostPath (bare host directory volume)
Path: /home/aii/general/core
genome-aii:
Type: HostPath (bare host directory volume)
Path: /home/aii/general/genome
main-aii:
Type: HostPath (bare host directory volume)
Path: /home/aii/general/main
default-token-5z9rd:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5z9rd
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {default-scheduler } Normal Scheduled Successfully assigned aii-3934754246-yilg3 to 127.0.0.1
1m 1m 1 {kubelet 127.0.0.1} spec.containers{aii} Normal Pulling pulling image "localhost:5000/dev/aii"
1m 1m 1 {kubelet 127.0.0.1} spec.containers{aii} Normal Pulled Successfully pulled image "localhost:5000/dev/aii"
1m 1m 1 {kubelet 127.0.0.1} spec.containers{aii} Normal Created Created container with docker id 71609cfd8e33
1m 1m 1 {kubelet 127.0.0.1} spec.containers{aii} Normal Started Started container with docker id 71609cfd8e33
1m 1m 1 {kubelet 127.0.0.1} spec.containers{kafka} Normal Pulling pulling image "localhost:5000/dev/kafkazoo"
1m 1m 1 {kubelet 127.0.0.1} spec.containers{kafka} Normal Pulled Successfully pulled image "localhost:5000/dev/kafkazoo"
1m 1m 1 {kubelet 127.0.0.1} spec.containers{kafka} Normal Created Created container with docker id 6eb891e5968c
1m 1m 1 {kubelet 127.0.0.1} spec.containers{kafka} Normal Started Started container with docker id 6eb891e5968c
Sounds like you want to setup a multi-node cluster, there are numerous ways to do that (http://kubernetes.io/docs/getting-started-guides/).
If you want a local solution on your machine, the vagrant way or the docker way is pretty straight forward.
If looking for a cloud, then GCE is the next easiest (https://cloud.google.com/container-engine/).
Once you have your multi-node cluster setup, then when you deploy your pod, it will be scheduled to a node in the cluster.
The only gotcha to be careful of given your manifest above, is you are using HostPaths for all your volume mounts. This is fine when you know what machine you are running the pod on, however, you should be abstracting yourself away from that.
To better solve you'll need to look into some persistent volumes that are not host specific. BUT for now, you can get it working. =)