After installing, this is what my running pods look like:
NAME READY STATUS RESTARTS AGE
elk-elasticsearch-client-5ffc974f8-987zv 1/1 Running 0 21m
elk-elasticsearch-curator-1582107120-4f2wm 0/1 Completed 0 19m
elk-elasticsearch-data-0 0/1 Pending 0 21m
elk-elasticsearch-exporter-84ff9b656d-t8vw2 1/1 Running 0 21m
elk-elasticsearch-master-0 1/1 Running 0 21m
elk-elasticsearch-master-1 1/1 Running 0 20m
elk-filebeat-4sxn9 0/2 Init:CrashLoopBackOff 9 21m
elk-kibana-77b97d7c69-d4jzz 1/1 Running 0 21m
elk-logstash-0 0/2 Pending 0 21m
So filebeat refuses to start.
Getting the logs from this pod, I get:
Exiting: Couldn't connect to any of the configured Elasticsearch hosts. Errors: [Error connection to Elasticsearch http://elk-elasticsearch-client.elk.svc:9200: Get http://elk-elasticsearch-client.elk.svc:9200: lookup elk-elasticsearch-client.elk.svc on 10.96.0.10:53: no such host]
Also, when trying to access the Kibana pod (the only one I can reach over HTTP), I get a response saying it is not ready.
kubectl get pv:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-9b9b13d8-48d2-4a79-a10c-8d1278554c75 4Gi RWO Delete Bound default/data-elk-elasticsearch-master-0 standard 113m
pvc-d8b361d7-8e04-4300-a0f8-c79f7cea7e44 4Gi RWO Delete Bound default/data-elk-elasticsearch-master-1 standard 112m
I'm running minikube with the none vm-driver, which it tells me does not respect the memory or CPU flags, but I don't see it complaining about resources.
kubectl version 1.17
docker version is 19.03.5, build 633a0ea838
minikube version 1.6.2
The elk stack was installed using helm.
I have the following versions:
elasticsearch-1.32.2.tgz
elasticsearch-curator-2.1.3.tgz
elasticsearch-exporter-2.2.0.tgz
filebeat-4.0.0.tgz
kibana-3.2.6.tgz
logstash-2.4.0.tgz
Running on Ubuntu 18.04.
Tearing everything down and then installing the required components from other Helm charts solved the issues. It may be that the charts I was using were not intended to run locally on minikube.
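For anyone who wants to debug the DNS error itself instead: the message shows filebeat resolving elk-elasticsearch-client in the elk namespace, while the PVCs above are bound in default, so it is worth confirming which namespace the service actually lives in. A rough check (the busybox image is just a convenient choice for nslookup):
kubectl get svc --all-namespaces | grep elasticsearch-client
# test resolution from inside the cluster, using the exact name from the error
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup elk-elasticsearch-client.elk.svc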
I feel I have created an abomination. The goal of what I am doing is to run a Docker image, start the AWX web application, and be able to use AWX on my local machine. The issue with this is that AWX uses Kubernetes to run. I have created an image that is able to run Kubernetes and the AWX application inside a container. The final output after running my bash script in the container to start AWX looks like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
awx-operator-system awx-demo-586bd67d59-vj79v 4/4 Running 0 3m14s
awx-operator-system awx-demo-postgres-0 1/1 Running 0 4m11s
awx-operator-system awx-operator-controller-manager-5b4fdf998d-7tzgh 2/2 Running 0 5m4s
ingress-nginx ingress-nginx-admission-create-pfcqs 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-admission-patch-8rghp 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-controller-755dfbfc65-f7vm7 1/1 Running 0 5m33s
kube-system coredns-6d4b75cb6d-4lnvw 1/1 Running 0 5m33s
kube-system etcd-minikube 1/1 Running 0 5m46s
kube-system kube-apiserver-minikube 1/1 Running 0 5m45s
kube-system kube-controller-manager-minikube 1/1 Running 0 5m45s
kube-system kube-proxy-ddnh7 1/1 Running 0 5m34s
kube-system kube-scheduler-minikube 1/1 Running 0 5m45s
kube-system storage-provisioner 1/1 Running 1 (5m33s ago) 5m43s
go to http://192.168.49.2:30085 , the username is admin and the password is XL8aBJPy16ziBau84v63QJLNVw2JGmnb
So I believe that it is running and starting properly. The IP address 192.168.49.2 is the IP of one of the Kubernetes pods. I have been struggling to forward the info coming from this pod to my local machine. I have been trying to go from Kubernetes pod -> docker localhost -> local machine localhost.
I have tried using kubectl proxy, curl against host.docker.internal, and a few others, with no success. However, I might be using these incorrectly.
I understand that Docker containers run in a very isolated environment, so is it possible to forward this information from the pod to my local machine?
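For reference, the kind of chain I have in mind looks roughly like this (the port numbers, image name, and service name are mine; kubectl get svc -n awx-operator-system shows the real service name):
# start the outer container with a port published to the host
docker run -d -p 8080:8080 --name awx-dev my-awx-image    # plus whatever flags the image already needs
# then, inside the container, forward the AWX service and listen on all interfaces,
# not just the container's 127.0.0.1
kubectl -n awx-operator-system port-forward svc/awx-demo-service 8080:80 --address 0.0.0.0
After that, http://localhost:8080 on the local machine should reach the pod through the container.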
Thanks for your time!
I have built a small, single user, internal service that stores data in a single JSON blob on disk (it uses tinydb) -- thus the service is not designed to be run on multiple nodes to ensure data consistency. Unfortunately, when I send API requests I get back inconsistent results -- it appears the API is writing to different on-disk files and thus returning inconsistent results (if I call the API twice for a list of objects, it will return one of two different versions).
I deployed the service to Google Cloud (put it into a container, pushed to gcr.io). I created a cluster with a single node and deployed the docker image to the cluster. I then created a service to expose port 80. (Followed the tutorial here: https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app)
I confirmed that only a single node and a single pod were running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
XXXXX-2-69db8f8765-8cdkd 1/1 Running 0 28m
kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-4f369c90-XXXX Ready <none> 28m v1.14.10-gke.24
I also tried to check if multiple containers might be running in the pod, but only one container of my app seems to be running (my app is the first one, with the XXXX):
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default XXXXX-69db8f8765-8cdkd 1/1 Running 0 31m
kube-system event-exporter-v0.2.5-7df89f4b8f-x6v9p 2/2 Running 0 31m
kube-system fluentd-gcp-scaler-54ccb89d5-p9qgl 1/1 Running 0 31m
kube-system fluentd-gcp-v3.1.1-bmxnh 2/2 Running 0 31m
kube-system heapster-gke-6f86bf7b75-pvf45 3/3 Running 0 29m
kube-system kube-dns-5877696fb4-sqnw6 4/4 Running 0 31m
kube-system kube-dns-autoscaler-8687c64fc-nm4mz 1/1 Running 0 31m
kube-system kube-proxy-gke-cluster-1-default-pool-4f369c90-7g2h 1/1 Running 0 31m
kube-system l7-default-backend-8f479dd9-9jsqr 1/1 Running 0 31m
kube-system metrics-server-v0.3.1-5c6fbf777-vqw5b 2/2 Running 0 31m
kube-system prometheus-to-sd-6rgsm 2/2 Running 0 31m
kube-system stackdriver-metadata-agent-cluster-level-7bd5779685-nbj5n 2/2 Running 0 30m
Any thoughts on how to fix this? I know "use a real database" is a simple answer, but the app is pretty lightweight and does not need that complexity. Our company uses GCloud + Kubernetes so I want to stick with this infrastructure.
Files written inside the container (i.e. not to a persistent volume of some kind) will disappear when the container is restarted for any reason. In fact, you should really have the file permissions set up to prevent writing to files in the image, except maybe /tmp or similar. You should use a GCE persistent disk volume and it will probably work better :)
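A minimal sketch of that, assuming the default GKE storage class (the claim name and mount path below are made up):
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tinydb-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
Then, in the Deployment spec, add a volume referencing claimName: tinydb-data and a volumeMount at the directory where tinydb writes its JSON file; the data will then survive container restarts (on a single node, ReadWriteOnce is fine).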
I have a Kubernetes cluster with 1 master and 3 worker nodes.
Calico v3.7.3, Kubernetes v1.16.0, installed via kubespray: https://github.com/kubernetes-sigs/kubespray
Before this, I was able to deploy all my pods without any problems.
Now I can't start a few pods (Ceph):
kubectl get all --namespace=ceph
NAME READY STATUS RESTARTS AGE
pod/ceph-cephfs-test 0/1 Pending 0 162m
pod/ceph-mds-665d849f4f-fzzwb 0/1 Pending 0 162m
pod/ceph-mon-744f6dc9d6-jtbgk 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-744f6dc9d6-mqwgb 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-744f6dc9d6-zthpv 0/1 CrashLoopBackOff 24 162m
pod/ceph-mon-check-6f474c97f-gjr9f 1/1 Running 0 162m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ceph-mon ClusterIP None <none> 6789/TCP 162m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/ceph-osd 0 0 0 0 0 node-type=storage 162m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ceph-mds 0/1 1 0 162m
deployment.apps/ceph-mon 0/3 3 0 162m
deployment.apps/ceph-mon-check 1/1 1 1 162m
NAME DESIRED CURRENT READY AGE
replicaset.apps/ceph-mds-665d849f4f 1 1 0 162m
replicaset.apps/ceph-mon-744f6dc9d6 3 3 0 162m
replicaset.apps/ceph-mon-check-6f474c97f 1 1 1 162m
But the other namespace (kube-system) is OK:
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6d57b44787-xlj89 1/1 Running 19 24d
calico-node-dwm47 1/1 Running 310 19d
calico-node-hhgzk 1/1 Running 15 24d
calico-node-tk4mp 1/1 Running 309 19d
calico-node-w7zvs 1/1 Running 312 19d
coredns-74c9d4d795-jrxjn 1/1 Running 0 2d23h
coredns-74c9d4d795-psf2v 1/1 Running 2 18d
dns-autoscaler-7d95989447-7kqsn 1/1 Running 10 24d
kube-apiserver-master 1/1 Running 4 24d
kube-controller-manager-master 1/1 Running 3 24d
kube-proxy-9bt8m 1/1 Running 2 19d
kube-proxy-cbrcl 1/1 Running 4 19d
kube-proxy-stj5g 1/1 Running 0 19d
kube-proxy-zql86 1/1 Running 0 19d
kube-scheduler-master 1/1 Running 3 24d
kubernetes-dashboard-7c547b4c64-6skc7 1/1 Running 591 24d
nginx-proxy-worker1 1/1 Running 2 19d
nginx-proxy-worker2 1/1 Running 0 19d
nginx-proxy-worker3 1/1 Running 0 19d
nodelocaldns-6t92x 1/1 Running 2 19d
nodelocaldns-kgm4t 1/1 Running 0 19d
nodelocaldns-xl8zg 1/1 Running 0 19d
nodelocaldns-xwlwk 1/1 Running 12 24d
tiller-deploy-8557598fbc-7f2w6 1/1 Running 0 131m
I use CentOS 7:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
The error log:
Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host
Maybe someone has come across this and can help me? I will provide any additional information needed.
Logs from the pending pods:
Warning FailedScheduling 98s (x125 over 3h1m) default-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.
It seems that a firewall is blocking ingress traffic to port 10250 on the 10.2.67.203 node.
You can open it by running the commands below (I'm assuming firewalld is installed; otherwise run the equivalent commands for your firewall):
sudo firewall-cmd --add-port=10250/tcp --permanent
sudo firewall-cmd --reload
sudo firewall-cmd --list-all # you should see that port `10250` is updated
tl;dr: It looks like your cluster itself is fairly broken and should be repaired before looking at Ceph specifically.
Get https://10.2.67.203:10250/containerLogs/ceph/ceph-mon-744f6dc9d6-mqwgb/ceph-mon?tailLines=5000&timestamps=true: dial tcp 10.2.67.203:10250: connect: no route to host
10250 is the port that the Kubernetes API server uses to connect to a node's Kubelet to retrieve the logs.
This error indicates that the Kubernetes API server is unable to reach the node. This has nothing to do with your containers, pods or even your CNI network. no route to host indicates one of the following:
The host is unavailable
A network segmentation has occurred
The Kubelet is unable to answer the API server
Before addressing issues with the Ceph pods I would investigate why the Kubelet isn't reachable from the API server.
After you have solved the underlying network connectivity issues, I would address the crash-looping Calico pods (you can see the logs of the previous container instance by running kubectl logs -n kube-system calico-node-dwm47 -p).
Once you have both the underlying network and the pod network sorted, I would address the Kubernetes Dashboard crash-looping and, finally, start to investigate why you are having issues deploying Ceph.
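A rough starting point for that connectivity check, using the node IP from the error above (run the first two from the control-plane node, the rest on 10.2.67.203 itself):
ping -c 3 10.2.67.203
nc -vz 10.2.67.203 10250          # is the kubelet port reachable at all?
systemctl status kubelet          # on the node: is the kubelet actually running?
ss -tlnp | grep 10250             # is anything listening on 10250?
firewall-cmd --list-all           # is firewalld dropping the traffic? (CentOS 7 default)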
I'm using Kubeadm to create a cluster of 3 nodes
One Master
Two Workers
I'm using Weave as the pod network.
The status of my cluster is this:
NAME STATUS ROLES AGE VERSION
darthvader Ready <none> 56m v1.12.3
jarjar Ready master 60m v1.12.3
palpatine Ready <none> 55m v1.12.3
And I tried to initialize Helm and Tiller in my cluster:
helm init
The result was this:
$HELM_HOME has been configured at /home/ubuntu/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!
And the status of my pods is this:
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-8q6j7 1/1 Running 0 54m
coredns-576cbf47c7-kkvd8 1/1 Running 0 54m
etcd-jarjar 1/1 Running 0 54m
kube-apiserver-jarjar 1/1 Running 0 54m
kube-controller-manager-jarjar 1/1 Running 0 53m
kube-proxy-2lwgd 1/1 Running 0 49m
kube-proxy-jxwqq 1/1 Running 0 54m
kube-proxy-mv7vh 1/1 Running 0 50m
kube-scheduler-jarjar 1/1 Running 0 54m
tiller-deploy-845cffcd48-bqnht 0/1 ContainerCreating 0 12m
weave-net-5h5hw 2/2 Running 0 51m
weave-net-jv68s 2/2 Running 0 50m
weave-net-vsg2f 2/2 Running 0 49m
The problem is that Tiller is stuck in the ContainerCreating state.
And I ran
kubectl describe pod tiller-deploy -n kube-system
to check the status of Tiller, and I found the following errors:
Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Pod sandbox changed, it will be killed and re-created.
How can I create the tiller-deploy pod successfully? I don't understand why the pod sandbox is failing.
Maybe the problem is in the way you deployed Tiller. I just recreated this and had no issues using Weave and Compute Engine instances on GCP.
You should retry with a different method of installing Helm, as maybe there was some issue (you did not provide details on how you installed it).
Reset Helm and delete the Tiller pod:
helm reset --force (if Tiller persists, check the name of its ReplicaSet with kubectl get all --all-namespaces and remove it with kubectl delete rs/<name>)
Now try deploying Helm and Tiller using a different method, for example by running it through the install script, as explained here.
You can also run Helm without Tiller.
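If you do stay with Tiller, a common way to redeploy it is to give it its own service account first (a sketch for a kubeadm cluster with RBAC; cluster-admin is the quick-start binding from the Helm docs, so scope it down for anything serious):
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --upgrade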
It looks like you are running into this.
Most likely your node cannot pull a container image because of a network connectivity problem: either an image like gcr.io/kubernetes-helm/tiller:v2.3.1, or the pause container gcr.io/google_containers/pause (unlikely, if your other pods are running). You can try logging into your nodes (darthvader, palpatine) and debugging manually with:
$ docker pull gcr.io/kubernetes-helm/tiller:v2.3.1 <= Use the version on your tiller pod spec or deployment (tiller-deploy)
$ docker pull gcr.io/google_containers/pause
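To see which tag the deployment actually wants before pulling it, you can read it off the spec, e.g.:
kubectl -n kube-system get deployment tiller-deploy -o jsonpath='{.spec.template.spec.containers[0].image}'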
I have created a k8s cluster on RHEL7 with Kubernetes packages GitVersion:"v1.8.1". I'm trying to deploy WordPress on my custom cluster, but pod creation is always stuck in the ContainerCreating state.
[phani@k8s-master]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-766d75457d-zlvdn 0/1 ContainerCreating 0 11m
kube-system etcd-k8s-master 1/1 Running 0 1h
kube-system kube-apiserver-k8s-master 1/1 Running 0 1h
kube-system kube-controller-manager-k8s-master 1/1 Running 0 1h
kube-system kube-dns-545bc4bfd4-bb8js 3/3 Running 0 1h
kube-system kube-proxy-bf4zr 1/1 Running 0 1h
kube-system kube-proxy-d7zvg 1/1 Running 0 34m
kube-system kube-scheduler-k8s-master 1/1 Running 0 1h
kube-system weave-net-92zf9 2/2 Running 0 34m
kube-system weave-net-sh7qk 2/2 Running 0 1h
Docker version: 1.13.1
Pod status from the describe command:
Normal Scheduled 18m default-scheduler Successfully assigned wordpress-766d75457d-zlvdn to worker1
Normal SuccessfulMountVolume 18m kubelet, worker1 MountVolume.SetUp succeeded for volume "default-token-tmpcm"
Warning DNSSearchForming 18m kubelet, worker1 Search Line limits were exceeded, some dns names have been omitted, the applied search line is: default.svc.cluster.local svc.cluster.local cluster.local
Warning FailedCreatePodSandBox 14m kubelet, worker1 Failed create pod sandbox.
Warning FailedSync 25s (x8 over 14m) kubelet, worker1 Error syncing pod
Normal SandboxChanged 24s (x8 over 14m) kubelet, worker1 Pod sandbox changed, it will be killed and re-created.
From the kubelet log I observed the error below on the worker:
error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
But the kubelet is stable; no other problems are seen on the worker.
How do I solve this problem?
I checked for a CNI failure, but couldn't find anything:
~]# ls /opt/cni/bin
bridge cnitool dhcp flannel host-local ipvlan loopback macvlan noop ptp tuning weave-ipam weave-net weave-plugin-2.3.0
In the journal logs, the messages below appear repeatedly. It seems like the kubelet keeps trying to kill and re-create the pod sandbox.
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421184 14339 remote_runtime.go:115] StopPodSandbox "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421212 14339 kuberuntime_manager.go:780] Failed to stop sandbox {"docker" "47da29873230d830f0ee21adfdd3b06ed0c653a0001c29289fe78446d27d2304"}
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421247 14339 kuberuntime_manager.go:580] killPodWithSyncResult failed: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Jun 08 11:25:22 worker1 kubelet[14339]: E0608 11:25:22.421262 14339 pod_workers.go:182] Error syncing pod 7f1c6bf1-6af3-11e8-856b-fa163e3d1891 ("wordpress-766d75457d-spdrb_default(7f1c6bf1-6af3-11e8-856b-fa163e3d1891)"), skipping: failed to "KillPodSandbox" for "7f1c6bf1-6af3-11e8-856b-fa163e3d1891" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Failed create pod sandbox.
... is almost always a CNI failure; I would check on the node that all the weave containers are happy, and that /opt/cni/bin is present (or its weave equivalent)
You may have to check both journalctl -u kubelet.service and the docker logs of any running containers to discover the full scope of the error on the node.
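Concretely, those checks might look like this (kubectl from the master, the rest on the affected worker; pod and container names are taken from the listing above and the stock Weave daemonset):
kubectl get pods -n kube-system -o wide | grep weave
kubectl logs -n kube-system weave-net-92zf9 -c weave
ls /opt/cni/bin /etc/cni/net.d
journalctl -u kubelet.service --since "30 min ago" | grep -i -E 'cni|sandbox'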
It seems to work after removing $KUBELET_NETWORK_ARGS from /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.
I removed $KUBELET_NETWORK_ARGS, restarted the worker node, and then the pods got deployed successfully.
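For completeness, the change amounts to roughly this (note that dropping the network args makes the kubelet fall back to Docker's default bridge networking, so this bypasses Weave rather than fixing it):
# remove $KUBELET_NETWORK_ARGS from the ExecStart line in the drop-in
sudo vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
sudo systemctl daemon-reload
sudo systemctl restart kubelet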
As Matthew said, it's most likely a CNI failure.
First, find the node this pod is running on:
kubectl get po wordpress-766d75457d-zlvdn -o wide
Next, on the node where the pod is located, check /etc/cni/net.d; if there is more than one .conf file, you can delete one and restart the node.
Source: https://github.com/kubernetes/kubeadm/issues/578 (note this is only one of the possible solutions).
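As a sketch (the leftover filename here is hypothetical):
ls /etc/cni/net.d
# e.g. if both 10-weave.conflist and a stale 10-flannel.conf are present,
# remove the one that does not belong to your current CNI, then restart
sudo rm /etc/cni/net.d/10-flannel.conf
sudo systemctl restart kubelet     # or reboot the node, as suggested above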
While hopefully it's no one else's problem, for me, this happened when part of my filesystem was full.
I had pods stuck in ContainerCreating only on one node in my cluster. I also had a bunch of pods which I expected to shut down, but hadn't. Someone recommended running
sudo systemctl status kubelet -l
which showed me a bunch of lines like
Jun 18 23:19:56 worker01 kubelet[1718]: E0618 23:19:56.461378 1718 kuberuntime_manager.go:647] createPodSandbox for pod "REDACTED(2c681b9c-cf5b-11eb-9c79-52540077cc53)" failed: mkdir /var/log/pods/2c681b9c-cf5b-11eb-9c79-52540077cc53: no space left on device
I confirmed that I was out of space with
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 189G 0 189G 0% /dev
tmpfs 189G 0 189G 0% /sys/fs/cgroup
/dev/mapper/vg01-root 20G 7.0G 14G 35% /
/dev/mapper/vg01-tmp 4.0G 34M 4.0G 1% /tmp
/dev/mapper/vg01-home 4.0G 72M 4.0G 2% /home
/dev/mapper/vg01-varlog 10G 10G 20K 100% /var/log
/dev/mapper/vg01-varlogaudit 2.0G 68M 2.0G 4% /var/log/audit
I just had to clear out that directory (and did some manual cleanup of all the pending pods and the pods that were stuck running).
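A rough sketch of that kind of cleanup (paths are from the df output above; the angle-bracket names are placeholders):
sudo du -sh /var/log/* | sort -h | tail          # find what is filling the volume
sudo rm -rf /var/log/pods/<uid-of-a-pod-that-no-longer-exists>
sudo journalctl --vacuum-size=500M               # if the journal is the culprit
kubectl delete pod <stuck-pod> --grace-period=0 --force   # clear out the stuck pods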