Pods can't mount disks on a managed AKS cluster

I tried the simple PVC example from here, with nginx claiming an azure-managed-disk, and I am getting an 'unable to mount' error (see below). I also can't remove the created PV with 'kubectl delete pv pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988'.
$ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
10m Warning FailedMount Pod MountVolume.WaitForAttach failed for volume "pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988" : azureDisk - WaitForAttach failed within timeout node (aks-agentpool-10844952-2) diskId:(kubernetes-dynamic-pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988) lun:(1)
22s Warning FailedMount Pod Unable to mount volumes for pod "nginx_default(bd16b9c8-97b2-11e9-9018-eaa2ea1705c5)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx". list of unmounted volumes=[volume]. list of unattached volumes=[volume default-token-92rj6]
My managed AKS cluster is running v1.12.8, and the SP has the Contributor role (the Owner role doesn't help either). The storage class is 'managed-premium', as in the YAML from my simple nginx example (link provided).

There are not enough details here to pin down the exact cause, so I will just list the possible ones.
1. A transient failure in the API call to Azure. If so, you just need to delete the affected resources and create them again.
2. The node the pod runs on already has too many Azure disks attached. If so, you need to schedule the pod on another node that has fewer disks attached.
3. The Azure disk cannot be unmounted or detached from the old node. This means the PV is still in use and attached to another node. If so, you need to create another dynamic PV that is not in use for your pod.
Check your setup against these causes. In my opinion the third is the most likely, but of course it all depends on the actual situation. For more details about similar errors, see How to Understand & Resolve “Warning Failed Attach Volume” and “Warning Failed Mount” Errors in Kubernetes on Azure.
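To narrow down which of these causes applies, a few read-only kubectl queries are usually enough. The following is only a sketch: the function name inspect_volume is made up for illustration, and the pod, PVC and node names are whatever your cluster uses.

```shell
# Hypothetical helper (name and arguments are illustrative, not from the original post).
# Usage: inspect_volume <pod> <pvc> <node>
inspect_volume() {
  # Events for the pod, including FailedMount / FailedAttachVolume warnings
  kubectl describe pod "$1"
  # Binding status of the claim and of the PVs in the cluster
  kubectl get pvc "$2"
  kubectl get pv
  # Pods scheduled on the node, to judge how many disk-backed pods it already hosts
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$3"
}
```

For the events in the question this could be called as inspect_volume nginx <pvc name> aks-agentpool-10844952-2.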

Related

How to check failed container logs in Kubernetes

My pods fail and are removed by Jenkins before I can check their logs.
How can I check the logs of pods that have been removed?
Is there a simple way to save logs in Kubernetes?
I don't have any logging system for my Kubernetes cluster.
The pods are created and deleted within a fraction of a second because of some error, and I want to find out what that error is. By the time I check the logs, the container name has already changed.
Thanks,
You most probably meant "pods are failing and removed by Kubernetes and I am unable to see the logs." It is Kubernetes itself that manages API objects, not Jenkins.
To answer your question directly: you cannot fetch any logs from a container once its pod has been deleted. Deleting a pod wipes all of its containers together with their data, so the logs were deleted the moment your pod was terminated.
By default, if a container restarts, the kubelet keeps one terminated
container with its logs. If a pod is evicted from the node, all
corresponding containers are also evicted, along with their logs.
If your pod were still alive, you could use the --previous flag to check the logs of the previous container, but unfortunately that is not your case.
There are a lot of similar questions, and the main suggestion is always to set up a log aggregation system that stores logs separately. That way you won't lose them and will at least be able to check them.
Logging at the node level
Cluster-level logging architectures
How to see logs of terminated pods
How to access Logs of Pods in Kubernetes after its deletion
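For a pod that keeps restarting but has not been deleted yet, something like the following could copy its logs to disk before they disappear. This is a sketch under assumptions: the function name save_pod_logs and the file names are invented here, and a pod with several containers would additionally need the -c <container> option.

```shell
# Hypothetical helper: save current and previous container logs of a pod to files.
# Usage: save_pod_logs <pod> [output-dir]
save_pod_logs() {
  pod="$1"
  dir="${2:-.}"
  # Logs of the container instance that is running right now
  kubectl logs "$pod" > "$dir/$pod.current.log"
  # Logs of the previous (terminated) instance; this fails if the container never restarted
  kubectl logs "$pod" --previous > "$dir/$pod.previous.log" ||
    echo "no previous container logs for $pod" >&2
}
```

This only helps while the pod object still exists. Once the pod is deleted, only logs that were shipped elsewhere beforehand survive.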

Stop all Pods in a StatefulSet before scaling it up or down

My team is currently working on migrating a Discord chat bot to Kubernetes. We plan on using a StatefulSet for the main bot service, as each Shard (pod) should only have a single connection to the Gateway. Whenever a shard connects to said Gateway, it tells it its ID (in our case the pod's ordinal index) and how many shards we are running in total (the amount of replicas in the StatefulSet).
Having to tell the gateway the total number of shards means that in order to scale our StatefulSet up or down we'd have to stop all pods in that StatefulSet before starting new ones with the updated value.
How can I achieve that? Preferably through configuration, so I don't have to run a special command each time.
Try the kubectl rollout restart sts <sts name> command. It restarts the pods one by one, in a RollingUpdate fashion.
Scale down the sts
kubectl scale --replicas=0 sts <sts name>
Scale up the sts
kubectl scale --replicas=<number of replicas> sts <sts name>
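The two scale commands can be chained so that the new pods only start after every old pod is gone. The sketch below assumes a kubectl recent enough to support kubectl wait --for=jsonpath (v1.23 or later, as far as I know); the function name resize_statefulset and the 300s timeout are made up for illustration.

```shell
# Hypothetical helper: stop every pod of a StatefulSet, then start it with a new size.
# Usage: resize_statefulset <name> <replicas>
resize_statefulset() {
  sts="$1"
  replicas="$2"
  # Scale to zero, then block until the controller reports that no pods are left
  kubectl scale --replicas=0 "statefulset/$sts" &&
  kubectl wait --for=jsonpath='{.status.replicas}'=0 "statefulset/$sts" --timeout=300s &&
  # Bring the StatefulSet back with the new replica count
  kubectl scale --replicas="$replicas" "statefulset/$sts"
}
```

For example, resize_statefulset <sts name> 4 would stop all shards and then start four new ones.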
One way of doing this is as follows.
First, get the YAML configuration of the StatefulSet and save it to a file:
kubectl get statefulset NAME -o yaml > sts.yaml
Then delete the StatefulSet:
kubectl delete -f sts.yaml
Finally, create the StatefulSet again from the configuration file saved in the first step:
kubectl apply -f sts.yaml
I hope this answers your question about deleting the StatefulSet and creating a new one.
Before any kubectl scale, since you need more control over your nodes, you might consider running kubectl drain first.
When you are ready to put the node back into service, use kubectl uncordon, which makes the node schedulable again.
By draining the node where your pods are managed, you stop all of its pods, which gives you the opportunity to scale the StatefulSet to the new value.
See also "How to Delete Pods from a Kubernetes Node" by Keilan Jackson
Start at least with kubectl cordon <nodename> to mark the node as unschedulable.
If your pods are controlled by a StatefulSet, first make sure that the pod that will be deleted can be safely deleted.
How you do this depends on the pod and your application’s tolerance for one of the stateful pods to become temporarily unavailable.
For example you might want to demote a MySQL or Redis writer to just a read-only slave, update and release application code to no longer reference the pod in question temporarily, or scale up the ReplicaSet first to handle the extra traffic that may be caused by one pod being unavailable.
Once this is done, delete the pod and wait for its replacement to appear on another node.
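The cordon, drain and uncordon steps mentioned above could be wrapped roughly as follows. This is a sketch only: the function names are hypothetical, and note that draining evicts every evictable pod on the node, not just the pods of one StatefulSet.

```shell
# Hypothetical helpers; names are illustrative.

# Mark the node unschedulable, then evict its pods (DaemonSet pods are left alone).
# Usage: drain_node <node>
drain_node() {
  kubectl cordon "$1" &&
  kubectl drain "$1" --ignore-daemonsets
}

# Make the node schedulable again once the maintenance is done.
# Usage: restore_node <node>
restore_node() {
  kubectl uncordon "$1"
}
```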

Kubernetes POD Failover

I am toying around with Kubernetes and have managed to deploy a stateful application (a Jenkins instance) to a single node.
It uses a PVC to make sure that I can persist my jenkins data (jobs, plugins etc).
Now I would like to experiment with failover.
My cluster has 2 digital ocean droplets.
Currently my jenkins pod is running on just one node.
When that goes down, Jenkins becomes unavailable.
I am now looking at how to accomplish failover in the sense that, when the Jenkins pod goes down on my node, it will spin up on the other node (a short downtime during this process is OK).
Of course it has to use the same PVC, so that my data remains intact.
From what I have read, I believe a StatefulSet can be used for this?
Any pointers are much appreciated!
Best regards
Digital Ocean's Kubernetes service only supports ReadWriteOnce access modes for PVCs (see here). This means the volume can only be attached to one node at a time.
I came across this blogpost which, while focused on Jenkins on Azure, has the same situation of only supporting ReadWriteOnce. The author states:
the drawback for me though lies in the fact that the access mode for Azure Disk persistent volumes is ReadWriteOnce. This means that an Azure disk can be attached to only one cluster node at a time. In the event of a node failure or update, it could take anywhere between 1-5 minutes for the Azure disk to get detached and attached to the next available node.
Note, Pod failure and node failures are different things. Since DO only supports ReadWriteOnce, there's no benefit to trying anything more sophisticated than what you have right now in terms of tolerance to node failure. Since it's ReadWriteOnce the volume will need to be unmounted from the failing node and re-mounted to the new node, and then a new Pod will get scheduled on the new node. Kubernetes will do this for you, and there's not much you can do to optimize it.
For Pod failure, you could use a Deployment: since you want to read and write the same data, you don't want different PVs attached to the different replicas. The benefit may be very limited, though. You would have multiple replicas of the Pod all running on the same node, so it depends on how the Jenkins process scales and whether it supports that kind of horizontal scale-out model with all replicas writing to the same volume (as opposed to simply scaling vertically through memory or CPU requests).
If you really want to achieve higher availability in the face of node and/or Pod failures, and the Jenkins workload you're deploying has a hard requirement on local volumes for persistent state, you will need to consider an alternative volume plugin like NFS, or moving to a different cloud provider like GKE.
Yes, you would use a Deployment or StatefulSet depending on the use case. For Jenkins, a StatefulSet would be appropriate. If the running pod becomes unavailable, the StatefulSet controller will see that and spawn a new one.
What you are describing is the default behaviour of Kubernetes for Pods that are managed by a controller, such as a Deployment.
You should deploy any application as a Deployment (or another controller) even if it consists just of a single Pod. You never really deploy Pods directly to Kubernetes. So, in this case, there's nothing special you need to do to get this behaviour.
When one of your nodes dies, the Pod dies too. This is detected by the Deployment controller, which creates a new Pod. This is in turn detected by the scheduler, which assigns the new Pod to a node. Since one of the nodes is down, it will assign the Pod to the other node that is still running. Once the Pod is assigned to this node, the kubelet of this node will run the container(s) of this Pod on this node.
OK, let me try to answer my own question here.
I think Amit Kumar Gupta came the closest to what I believe is going on here.
Since I am using a Deployment and my PVC is ReadWriteOnce, I am basically stuck with one pod, running Jenkins, on one node.
weibeld's answer made me realise that I was asking about something that Kubernetes does by default.
If my pod goes down (in my case I am shutting down a node on purpose with a hard power-down to simulate a failure), the cluster (controller?) will detect this and spawn a new pod on another node.
All is fine so far, but then I noticed that my new pod was stuck in the ContainerCreating state.
Running a describe on my new pod (the one in the ContainerCreating state) showed this:
Warning FailedAttachVolume 16m attachdetach-controller Multi-Attach error for volume "pvc-cb772fdb-492b-4ef5-a63e-4e483b8798fd" Volume is already used by pod(s) jenkins-deployment-6ddd796846-dgpnm
Warning FailedMount 70s (x7 over 14m) kubelet, cc-pool-bg6u Unable to mount volumes for pod "jenkins-deployment-6ddd796846-wjbkl_default(93747d74-b208-421c-afa4-8d467e717649)": timeout expired waiting for volumes to attach or mount for pod "default"/"jenkins-deployment-6ddd796846-wjbkl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[jenkins-home default-token-wd6p7]
Then it hit me: this makes sense.
It's a pity, but it makes sense.
Since I did a hard power-down on the node, the PV went down with it.
So now the controller tries to start a new pod on a new node, but it can't transfer the PV, since the one on the previous pod became unreachable.
As I read more on this, I learned that DigitalOcean only supports ReadWriteOnce, which now leaves me wondering: how the hell can I achieve a simple failover for a stateful application on a Kubernetes cluster on DigitalOcean that consists of just a couple of simple droplets?
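To see where the volume is still considered attached after such a power-down, the following read-only queries may help, and the old pod can be removed by hand if it hangs in Terminating. This is a sketch, not something from the thread: the function names are hypothetical, and whether a forced pod deletion actually lets the volume move depends on the storage driver and the Kubernetes version.

```shell
# Hypothetical helpers; names are illustrative and not from the original posts.

# Show node status, which node each volume is attached to, and where pods are scheduled.
# Usage: show_attachments
show_attachments() {
  kubectl get nodes
  kubectl get volumeattachments
  kubectl get pods -o wide
}

# Remove the API object of a pod that is stuck on a dead node.
# Assumption: the node really is powered off, so nothing is still writing to the volume.
# Usage: force_delete_pod <pod>
force_delete_pod() {
  kubectl delete pod "$1" --grace-period=0 --force
}
```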

Why am I getting "Structure needs cleaning" message on Ceph with Kubernetes?

Sorry to ask this; I am relatively new to Kubernetes and Ceph and only have a rough idea of how they work.
I have set up Kubernetes and Ceph using this tutorial (http://tutorial.kubernetes.noverit.com/content/ceph.html).
I set up my cluster like this:
1 Kube-Master and 2 worker nodes (each worker also acts as a Ceph monitor and has 2 OSDs)
The ceph-deploy I used to set up the Ceph cluster is on the Kube-Master.
Everything was working fine. I installed my sample web application (a Deployment) with 5 replicas, which creates a file when the REST API is hit. The file is copied to every node.
But 10 minutes later I created one more file using the API, and when I try to list the files (ls -l) I get the following error:
For node1:
ls: cannot access 'previousFile.txt': Structure needs cleaning
previousFile.txt newFile.txt
For node2:
previousFile.txt
On node2 the new file is not created.
What might be the issue? I have tried many times and the same error still pops up.
Any help appreciated.
This totally looks like your filesystem got corrupted. Things to check:
$ kubectl logs <ceph-pod1>
$ kubectl logs <ceph-pod2>
$ kubectl describe deployment <ceph-deployment> # did any of the pods restart?
Some info about the error message here.
Depending on what you have, you might need to start from scratch. Or you can take a look at recovering data in Ceph, but that may not work if you don't have a snapshot.
Running Ceph on Kubernetes can be very tricky, because a Ceph pod that restarts on a different Kubernetes node might corrupt the data. You need to make sure that part is solid, possibly by using node affinity or by running the Ceph pods on specific Kubernetes nodes selected by labels.
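Pinning pods to labelled nodes could look roughly like this. It is a sketch under assumptions: the label storage=ceph and the function name are invented for illustration, and it presumes the Ceph pods are managed by a Deployment, as the describe command above suggests.

```shell
# Hypothetical helper: label the given nodes and restrict a Deployment to them.
# Usage: pin_to_labelled_nodes <deployment> <node> [<node> ...]
pin_to_labelled_nodes() {
  deploy="$1"
  shift
  # Put the (made-up) label storage=ceph on every listed node
  for node in "$@"; do
    kubectl label nodes "$node" storage=ceph --overwrite || return 1
  done
  # Add a nodeSelector so the pods are only scheduled on nodes carrying that label
  kubectl patch deployment "$deploy" \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"storage":"ceph"}}}}}'
}
```

A nodeSelector is the simplest form of this constraint; node affinity rules express the same idea with more flexibility.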

Kubernetes pod is stuck in ContainerCreating state after image upgrade

During an image upgrade, a few of the pods get stuck in the ContainerCreating state.
kubectl get events shows the following error: FailedSync kubelet,
10.102.10.34 Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod
"default"/"ob-service-1124355621-1th47". list of unattached/unmounted
volumes=[timezone default-token-3x1x9]
Docker logs:
ERRO[240242] Handler for DELETE /v1.22/containers/749d05b355e2b80bffb90d207232d37e3ebc5ff57942c46ce0a2b4ca5950ed0e returned error: Driver devicemapper failed to remove root filesystem 749d05b355e2b80bffb90d207232d37e3ebc5ff57942c46ce0a2b4ca5950ed0e: Device is Busy
ERRO[240242] Error saving dying container to disk: open /var/lib/docker/containers/5d01db2c31a3073cc7fb68f2be5acc45c34583d5f2ae0c0879ec064f90da6943/config.v2.json: no such file or directory
ERRO[240263] Error removing mounted layer 5d01db2c31a3073cc7fb68f2be5acc45c34583d5f2ae0c0879ec064f90da6943: Device is Busy
It's a bit hard to debug with just the information you provided, but the general direction to look in is the resources of your cluster.
FailedSync usually means the pods can't fit on any of the workers (adding more may help). From your error, though, it looks like you're trying to attach volumes that are busy and can't accept the connection, which fails the pod.
Again, details are lacking, but let's assume you're on AWS and you have a volume that didn't unmount, and now you're trying to attach it again. The above would pretty much be the result, so you'll need to detach the volume so the new pod can attach it.
If some pods with the same image are fine, it means you don't have enough volumes and/or some of the current volumes are not available to accept a new connection (maybe they didn't unmount properly when the old pods were deleted).
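To find out which pods are stuck and on which nodes, a query along these lines may be a starting point. It is a sketch with a hypothetical function name; it relies on the fact that a pod shown as ContainerCreating is still in the Pending phase.

```shell
# Hypothetical helper: list pods that have not started yet, plus recent warnings.
# Usage: list_stuck_pods
list_stuck_pods() {
  # Pods in ContainerCreating are still Pending; -o wide adds the node column
  kubectl get pods -o wide --field-selector status.phase=Pending
  # Warning events in the current namespace, oldest first
  kubectl get events --field-selector type=Warning --sort-by=.lastTimestamp
}
```

If all stuck pods sit on the same node, that points at a volume or Docker problem on that node rather than at the image.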
