How to debug Kubernetes system pod elections on masters? - docker

I'm trying to make SSL work with Kubernetes, but am stuck on a leader election problem. I think I should be seeing scheduler and controller system pods somewhere, but all I have is this:
kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-10.255.12.200 1/1 Running 0 18h
kube-apiserver-10.255.16.111 1/1 Running 0 20h
kube-apiserver-10.255.17.12 1/1 Running 0 20h
scheduler-master-10.255.12.200 2/2 Running 0 20h
scheduler-master-10.255.16.111 2/2 Running 0 20h
scheduler-master-10.255.17.12 2/2 Running 0 20h
For comparison, on my other clusters I can see more pods:
kubectl get po --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-10.255.0.248 1/1 Running 1 30d
kube-apiserver-10.255.1.112 1/1 Running 1 30d
kube-apiserver-10.255.1.216 1/1 Running 1 30d
kube-controller-manager-10.255.1.216 1/1 Running 3 30d
kube-scheduler-10.255.1.216 1/1 Running 1 30d
scheduler-master-10.255.0.248 2/2 Running 2 30d
scheduler-master-10.255.1.112 2/2 Running 2 30d
scheduler-master-10.255.1.216 2/2 Running 2 30d
Does anybody know how to debug this? Pod logs don't show much, and my pods are stuck in the Pending state.

Several things could be happening:
Were the two clusters created the same way? Note that, depending on the cluster, some services like the scheduler or the controller could be running on bare metal. If that is the case, then you should check the logs inside the node using, for instance, systemctl status <name-of-the-service>.service (in case it is using systemd).
If these components are meant to run as pods, then I would advise you to go to the master node and check /etc/kubernetes/manifests. You should find the manifests of the services you are looking for there. If they are not there, then that is the reason why you do not find the pods in the system. If the manifests are there, then check whether the arguments are properly set (especially the --leader-elect one).
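For instance, a quick check on a master node could look like this (the .yaml extension is an assumption about how the manifests are named in your setup):
# list the static pod manifests the kubelet will start (path taken from the answer above)
ls /etc/kubernetes/manifests
# check whether leader election is enabled in the controller-manager/scheduler manifests
grep -e '--leader-elect' /etc/kubernetes/manifests/*.yaml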

As mentioned by Javier Salmeron, check the manifest folder. If the manifests do exist there, then check the log output of the kubelet. If the missing pods try to start, you can also check their logs with "docker logs".
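For instance (the component filter below is just an illustration; adjust it to whichever pod is missing):
journalctl -u kubelet --since "1 hour ago"   # kubelet logs, assuming a systemd-based node
docker ps -a | grep controller-manager       # find the container, even if it already exited
docker logs <container-id>                   # see why it failed to start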

Related

How to connect from my local machine to an application that uses Kubernetes and is running in a Docker container?

I feel I have created an abomination. The goal of what I am doing is to run a Docker image, start the AWX web application, and be able to use AWX on my local machine. The issue with this is that AWX uses Kubernetes to run. I have created an image that is able to run Kubernetes and the AWX application inside a container. The final output after running my bash script in the container to start AWX looks like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
awx-operator-system awx-demo-586bd67d59-vj79v 4/4 Running 0 3m14s
awx-operator-system awx-demo-postgres-0 1/1 Running 0 4m11s
awx-operator-system awx-operator-controller-manager-5b4fdf998d-7tzgh 2/2 Running 0 5m4s
ingress-nginx ingress-nginx-admission-create-pfcqs 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-admission-patch-8rghp 0/1 Completed 0 5m33s
ingress-nginx ingress-nginx-controller-755dfbfc65-f7vm7 1/1 Running 0 5m33s
kube-system coredns-6d4b75cb6d-4lnvw 1/1 Running 0 5m33s
kube-system etcd-minikube 1/1 Running 0 5m46s
kube-system kube-apiserver-minikube 1/1 Running 0 5m45s
kube-system kube-controller-manager-minikube 1/1 Running 0 5m45s
kube-system kube-proxy-ddnh7 1/1 Running 0 5m34s
kube-system kube-scheduler-minikube 1/1 Running 0 5m45s
kube-system storage-provisioner 1/1 Running 1 (5m33s ago) 5m43s
go to http://192.168.49.2:30085 , the username is admin and the password is XL8aBJPy16ziBau84v63QJLNVw2JGmnb
So I believe that it is running and starting properly. The IP address 192.168.49.2 is the IP of one of the Kubernetes pods. I have been struggling to forward the traffic coming from this pod to my local machine. I have been trying to go from the Kubernetes pod -> the Docker container's localhost -> my local machine's localhost.
I have tried kubectl proxy, curl against host.docker.internal, and a few other approaches with no success. However, I might be using them incorrectly.
I understand that Docker containers run in a very isolated environment, so is it possible to forward this traffic from the pod to my local machine?
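Roughly, the kind of forwarding I have been attempting looks like this (the image name, service name, and ports are placeholders):
# start the container with a published port
docker run -p 8080:8080 my-awx-image
# inside the container, forward the AWX service onto that port on all interfaces
kubectl port-forward svc/awx-demo-service -n awx-operator-system --address 0.0.0.0 8080:80
# then on my local machine I would expect http://localhost:8080 to reach AWX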
Thanks for your time!

Kubernetes service seems to go to multiple containers, despite only one container running

I have built a small, single-user, internal service that stores data in a single JSON blob on disk (it uses tinydb); to ensure data consistency, the service is not designed to run on multiple nodes. Unfortunately, when I send API requests I get back inconsistent results: it appears the API is writing to different on-disk files, so if I call the API twice for a list of objects, it returns one of two different versions.
I deployed the service to Google Cloud (put it into a container, pushed to gcr.io). I created a cluster with a single node and deployed the docker image to the cluster. I then created a service to expose port 80. (Followed the tutorial here: https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app)
I confirmed that only a single node and single pod was running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
XXXXX-2-69db8f8765-8cdkd 1/1 Running 0 28m
kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-4f369c90-XXXX Ready <none> 28m v1.14.10-gke.24
I also tried to check if multiple containers might be running in the pod, but only one container of my app seems to be running (my app is the first one, with the XXXX):
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default XXXXX-69db8f8765-8cdkd 1/1 Running 0 31m
kube-system event-exporter-v0.2.5-7df89f4b8f-x6v9p 2/2 Running 0 31m
kube-system fluentd-gcp-scaler-54ccb89d5-p9qgl 1/1 Running 0 31m
kube-system fluentd-gcp-v3.1.1-bmxnh 2/2 Running 0 31m
kube-system heapster-gke-6f86bf7b75-pvf45 3/3 Running 0 29m
kube-system kube-dns-5877696fb4-sqnw6 4/4 Running 0 31m
kube-system kube-dns-autoscaler-8687c64fc-nm4mz 1/1 Running 0 31m
kube-system kube-proxy-gke-cluster-1-default-pool-4f369c90-7g2h 1/1 Running 0 31m
kube-system l7-default-backend-8f479dd9-9jsqr 1/1 Running 0 31m
kube-system metrics-server-v0.3.1-5c6fbf777-vqw5b 2/2 Running 0 31m
kube-system prometheus-to-sd-6rgsm 2/2 Running 0 31m
kube-system stackdriver-metadata-agent-cluster-level-7bd5779685-nbj5n 2/2 Running 0 30m
Any thoughts on how to fix this? I know "use a real database" is a simple answer, but the app is pretty lightweight and does not need that complexity. Our company uses GCloud + Kubernetes so I want to stick with this infrastructure.
Files written inside the container (i.e. not to a persistent volume of some kind) will disappear when the container is restarted for any reason. In fact, you should really set up the file permissions to prevent writing to files in the image, except maybe /tmp or similar. You should use a GCE persistent disk backed PersistentVolume and it will probably work better :)
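A minimal sketch of that, assuming the app keeps its tinydb JSON under /data and the deployment is called myapp (both names are placeholders, not taken from the question):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi   # on GKE the default StorageClass provisions a GCE persistent disk
EOF
# then mount the claim in the deployment (e.g. via kubectl edit deployment myapp) by adding
# a volume that references claimName myapp-data and a volumeMount with mountPath /data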

Is there some way to manage Kubernetes images offline?

I'm new to Kubernetes. Recently I managed Kubernetes successfully on an online server. But when I moved to an isolated area (an offline server), I couldn't deploy an image with kubectl. The rest of my environment is running well, and I got stuck on this. The only difference is the internet connection.
Currently, I can't deploy the Kubernetes dashboard and some other images on the offline server. This is an example of my kubectl commands on the offline server (I downloaded the tar file on the online server):
# docker load < nginx.tar
# kubectl create deployment test-nginx --image=nginx
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default test-nginx-7d97ffc85d-2s4lh 0/1 ImagePullBackOff 0 50s
kube-system coredns-6955765f44-2s54f 1/1 Running 1 26h
kube-system coredns-6955765f44-wmtq9 1/1 Running 1 26h
kube-system etcd-devkubeapp01 1/1 Running 1 26h
kube-system kube-apiserver-devkubeapp01 1/1 Running 1 26h
kube-system kube-controller-manager-devkubeapp01 1/1 Running 1 26h
kube-system kube-flannel-ds-amd64-czn8z 1/1 Running 0 26h
kube-system kube-flannel-ds-amd64-d58x4 1/1 Running 0 26h
kube-system kube-flannel-ds-amd64-z9w9x 1/1 Running 0 26h
kube-system kube-proxy-9wxj2 1/1 Running 0 26h
kube-system kube-proxy-mr76b 1/1 Running 1 26h
kube-system kube-proxy-w5pvm 1/1 Running 0 26h
kube-system kube-scheduler-devkubeapp01 1/1 Running 1 26h
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
devkubeapp01 Ready master 26h v1.17.2
devkubeapp02 Ready minion1 26h v1.17.2
devkubeapp03 Ready minion2 25h v1.17.2
# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nginx latest 5ad3bd0e67a9 6 days ago 127MB
k8s.gcr.io/kube-proxy v1.17.2 cba2a99699bd 10 days ago 116MB
k8s.gcr.io/kube-apiserver v1.17.2 41ef50a5f06a 10 days ago 171MB
k8s.gcr.io/kube-controller-manager v1.17.2 da5fd66c4068 10 days ago 161MB
k8s.gcr.io/kube-scheduler v1.17.2 f52d4c527ef2 10 days ago 94.4MB
k8s.gcr.io/coredns 1.6.5 70f311871ae1 2 months ago 41.6MB
k8s.gcr.io/etcd 3.4.3-0 303ce5db0e90 3 months ago 288MB
quay.io/coreos/flannel v0.11.0-amd64 ff281650a721 12 months ago 52.6MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 2 years ago 742kB
My pod can't run properly; its status goes from ContainerCreating to ImagePullBackOff (I also tried this on the online server with the internet disconnected, and the status was the same: ImagePullBackOff). Can anyone help solve this? Does Kubernetes support deploying images in an offline environment?
Thanks.
As already stated in my previous comment:
I suspect that your imagePullPolicy might be misconfigured.
and further confirmed by the logs you have provided:
Error from server (BadRequest): container "nginx" in pod
"test-nginx-7d97ffc85d-2s4lh" is waiting to start: trying and failing
to pull image
the problem lies within the imagePullPolicy configuration.
As stated in the official documentation:
Pre-pulled Images
By default, the kubelet will try to pull each image from the specified
registry. However, if the imagePullPolicy property of the container
is set to IfNotPresent or Never, then a local image is used
(preferentially or exclusively, respectively).
If you want to rely on pre-pulled images as a substitute for registry
authentication, you must ensure all nodes in the cluster have the same
pre-pulled images.
So, as already mentioned by Eduardo, you need to make sure that you have the same images on all nodes and that your imagePullPolicy is correctly configured.
To make sure the container always uses the same version of the image, you can specify its digest, for example sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2. The digest uniquely identifies a specific version of the image, so it is never updated by Kubernetes unless you change the digest value.
This way you also avoid similar issues in the future, as keeping the exact same version of the image cluster-wide is the biggest trap in this scenario.
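A small sketch of pinning by digest, using the test-nginx deployment from the question (the digest shown is just the example value quoted above, not your image's real digest):
docker images --digests nginx   # note: images loaded from a tar may show <none> here; digests normally come from a registry
kubectl set image deployment/test-nginx nginx=nginx@sha256:45b23dee08af5e43a7fea6c4cf9c25ccf269ee113168c19722f87876677c5cb2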
I hope this helps and expands on the previous answer (which is correct) as well as proves my point from the very beginning.
In an offline environment, you need to pre-load the Docker images on all your nodes and make sure to use the proper imagePullPolicy to prevent Kubernetes from downloading container images.
You need to:
Run docker load < nginx.tar on all nodes.
Make sure the deployment is using imagePullPolicy with value IfNotPresent or Never (the default value is Always, which might be your problem).
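A minimal sketch of the second step, applied to the test-nginx deployment from the question:
# switch the pull policy so the kubelet uses the locally loaded image
kubectl patch deployment test-nginx --type=json -p '[
  {"op": "add", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "IfNotPresent"}
]'
# use "Never" instead if the image should only ever come from "docker load"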

Pods are not getting created and are not running even after a long time

kubectl get pods
NAME READY STATUS RESTARTS AGE
cassandra-0 0/1 Pending 0 9h
cd-jenkins-7fb5d96d69-v9svc 1/1 Running 0 9h
hello-1571555340-872pt 0/1 Completed 0 2m17s
hello-1571555400-5wzrk 0/1 Completed 0 77s
hello-1571555460-spjm6 0/1 Completed 0 16s
webpod 0/2 ContainerCreating 0 10h
wordpress-557bfb4d8b-bcbs7 0/1 CreateContainerConfigError 0 9h
I want to know the exact reason why these pods are not running.
I tried executing kubectl describe pods, but I am not finding the exact reason.
I also tried deleting other pods forcefully, but that is not working either.
Please help me in running these pods.
Check
kubectl get pods -o wide
It will tell you the node where each pod is scheduled. Then check
kubectl get nodes
to make sure that your VMs are Ready; if any of them is not, turn it on.
Pods stay in the Pending state when they don't get adequate resources (CPU/memory) to be scheduled.
If you have only one VM, you need to resize it to have more memory and CPU.
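For example, to see why a pod stays Pending and what the nodes can actually offer:
kubectl describe pod cassandra-0                           # the Events section usually shows "Insufficient cpu" or "Insufficient memory"
kubectl describe nodes | grep -A 7 "Allocated resources"   # compare requests against node capacity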

Initializing Tiller for Helm with Kubeadm - Kubernetes

I'm using Kubeadm to create a cluster of 3 nodes
One Master
Two Workers
I'm using Weave as the pod network
The status of my cluster is this:
NAME STATUS ROLES AGE VERSION
darthvader Ready <none> 56m v1.12.3
jarjar Ready master 60m v1.12.3
palpatine Ready <none> 55m v1.12.3
Then I tried to initialize Helm and Tiller in my cluster:
helm init
The result was this:
$HELM_HOME has been configured at /home/ubuntu/.helm.
Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Happy Helming!
And the status of my pods is this:
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-8q6j7 1/1 Running 0 54m
coredns-576cbf47c7-kkvd8 1/1 Running 0 54m
etcd-jarjar 1/1 Running 0 54m
kube-apiserver-jarjar 1/1 Running 0 54m
kube-controller-manager-jarjar 1/1 Running 0 53m
kube-proxy-2lwgd 1/1 Running 0 49m
kube-proxy-jxwqq 1/1 Running 0 54m
kube-proxy-mv7vh 1/1 Running 0 50m
kube-scheduler-jarjar 1/1 Running 0 54m
tiller-deploy-845cffcd48-bqnht 0/1 ContainerCreating 0 12m
weave-net-5h5hw 2/2 Running 0 51m
weave-net-jv68s 2/2 Running 0 50m
weave-net-vsg2f 2/2 Running 0 49m
The problem is that Tiller is stuck in the ContainerCreating state.
And I ran
kubectl describe pod tiller-deploy -n kube-system
to check the status of Tiller, and I found the following errors:
Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Pod sandbox changed, it will be killed and re-created.
How can I create the tiller-deploy pod successfully? I don't understand why the pod sandbox is failing.
Maybe the problem is in the way you deployed Tiller. I just recreated this and had no issues using Weave and Compute Engine instances on GCP.
You should retry with a different method of installing Helm, as maybe there was some issue (you did not provide details on how you installed it).
Reset Helm and delete the Tiller pod:
helm reset --force (if Tiller persists, find the name of its ReplicaSet with kubectl get all --all-namespaces and delete it with kubectl delete rs/<name>)
Now try deploying Helm and Tiller using a different method, for example by running it through the installer script, as explained here.
You can also run Helm without Tiller.
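A rough sketch of the reset-and-retry sequence described above (the label selector and the service account flag are assumptions about a default Tiller install):
helm reset --force
kubectl -n kube-system get all | grep tiller               # check for leftover Tiller objects
kubectl -n kube-system delete rs -l app=helm,name=tiller   # assumes the default Tiller labels
helm init --service-account tiller                         # the service account must already exist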
It looks like you are running into this.
Most likely your node cannot pull a container image because of a network connectivity problem: either the Tiller image, e.g. gcr.io/kubernetes-helm/tiller:v2.3.1, or the pause container gcr.io/google_containers/pause (unlikely, since your other pods are running). You can try logging into your nodes (darthvader, palpatine) and debugging manually with:
$ docker pull gcr.io/kubernetes-helm/tiller:v2.3.1   # use the version from your tiller pod spec or deployment (tiller-deploy)
$ docker pull gcr.io/google_containers/pause
