Multi-Attach error for volume "XXXXXXXXXXXX" Volume is already used by pod

A rolling update for a Deployment that uses a ReadWriteOnce volume fails because the second Deployment's pod gets started on a second node.
I followed dynamic provisioning of persistent volumes using this doc: https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/
The pods are created fine during the first deployment, but when the image is replaced, the second deployment shows the following error while the new pod is being created on another node.
Multi-Attach error for volume "xxxxxxxxxxxxxxxxxxx" Volume is already used by pod(s) "xxxxxxxxxxxx"
I have tried the Recreate strategy, but that means downtime, as the new pod is created only after the previous pod terminates.
Reference: Updating a deployment that uses a ReadWriteOnce volume will fail on mount
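For reference, here is a minimal sketch of the Recreate approach I tried (the name, image, and claim are placeholders, not my actual manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  replicas: 1
  strategy:
    type: Recreate             # old pod is terminated first, releasing the RWO volume,
                               # but this causes downtime until the new pod is ready
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: my-app:v2       # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-pvc    # placeholder claim name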
I have also tried using ReadWriteMany together with an NFS persistent volume, but this approach is not suitable for our use case, as the solution needs to be multi-cloud.
Is there any approach with access mode ReadWriteOnce that can be used together with a rolling update, with no downtime, on GCP?

Related

Kubernetes POD Failover

I am toying around with Kubernetes and have managed to deploy a stateful application (a Jenkins instance) to a single node.
It uses a PVC to make sure that I can persist my Jenkins data (jobs, plugins, etc.).
Now I would like to experiment with failover.
My cluster has 2 digital ocean droplets.
Currently my jenkins pod is running on just one node.
When that goes down, Jenkins becomes unavailable.
I am now looking at how to accomplish failover, in the sense that when the Jenkins pod goes down on one node, it will spin up on the other node (a short downtime during this process is OK).
Of course it has to use the same PVC, so that my data remains intact.
I believe, from what I have read, that a StatefulSet can be used for this?
Any pointers are much appreciated!
Best regards
Digital Ocean's Kubernetes service only supports ReadWriteOnce access modes for PVCs (see here). This means the volume can only be attached to one node at a time.
I came across this blogpost which, while focused on Jenkins on Azure, has the same situation of only supporting ReadWriteOnce. The author states:
the drawback for me though lies in the fact that the access mode for Azure Disk persistent volumes is ReadWriteOnce. This means that an Azure disk can be attached to only one cluster node at a time. In the event of a node failure or update, it could take anywhere between 1-5 minutes for the Azure disk to get detached and attached to the next available node.
Note that Pod failures and node failures are different things. Since DO only supports ReadWriteOnce, there's no benefit to trying anything more sophisticated than what you have right now in terms of tolerance to node failure. Since it's ReadWriteOnce, the volume will need to be unmounted from the failing node and re-mounted to the new node, and then a new Pod will get scheduled on the new node. Kubernetes will do this for you, and there's not much you can do to optimize it.
For Pod failure, you could use a Deployment. Since you want to read and write the same data, you don't want different PVs attached to the different replicas. There may be very limited benefit to this: all the replicas of the Pod will be running on the same node, so it depends on how the Jenkins process scales and whether it can support that kind of horizontal scale-out model while all replicas write to the same volume (as opposed to simply scaling memory or CPU requests vertically).
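As a minimal sketch of that setup (all names here are hypothetical; a podAffinity rule is one way to keep the replicas on the same node so they can all attach the ReadWriteOnce volume):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      affinity:
        podAffinity:           # co-locate replicas on one node for the RWO volume
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: jenkins
            topologyKey: kubernetes.io/hostname
      containers:
      - name: jenkins
        image: jenkins/jenkins:lts
        volumeMounts:
        - name: jenkins-home
          mountPath: /var/jenkins_home
      volumes:
      - name: jenkins-home
        persistentVolumeClaim:
          claimName: jenkins-pvc   # hypothetical existing RWO claim shared by the replicas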
If you really want to achieve higher availability in the face of node and/or Pod failures, and the Jenkins workload you're deploying has a hard requirement on local volumes for persistent state, you will need to consider an alternative volume plugin like NFS, or moving to a different cloud provider like GKE.
Yes, you would use a Deployment or StatefulSet depending on the use case. For Jenkins, a StatefulSet would be appropriate. If the running pod becomes unavailable, the StatefulSet controller will see that and spawn a new one.
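A minimal sketch of such a StatefulSet (the names and storage size are assumptions; with volumeClaimTemplates each replica gets its own PVC, which follows the pod when it is recreated):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jenkins
spec:
  serviceName: jenkins         # requires a matching headless Service
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
      - name: jenkins
        image: jenkins/jenkins:lts
        volumeMounts:
        - name: jenkins-home
          mountPath: /var/jenkins_home
  volumeClaimTemplates:        # one ReadWriteOnce PVC per replica
  - metadata:
      name: jenkins-home
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi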
What you are describing is the default behaviour of Kubernetes for Pods that are managed by a controller, such as a Deployment.
You should deploy any application as a Deployment (or another controller) even if it consists of just a single Pod. You never really deploy Pods directly to Kubernetes. So, in this case, there's nothing special you need to do to get this behaviour.
When one of your nodes dies, the Pod dies too. This is detected by the Deployment controller, which creates a new Pod. This is in turn detected by the scheduler, which assigns the new Pod to a node. Since one of the nodes is down, it will assign the Pod to the other node that is still running. Once the Pod is assigned to this node, the kubelet of this node will run the container(s) of this Pod on this node.
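As a minimal sketch (the image is just an example), even a single-Pod application is declared through a Deployment, and the rescheduling described above then happens automatically:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
spec:
  replicas: 1                  # the controller keeps exactly one Pod running
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      containers:
      - name: jenkins
        image: jenkins/jenkins:lts   # example image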
Ok, let me try to answer my own question here.
I think Amit Kumar Gupta came the closest to what I believe is going on here.
Since I am using a Deployment and my PVC is ReadWriteOnce, I am basically stuck with one pod, running Jenkins, on one node.
weibeld's answer made me realise that I was asking about a concept that Kubernetes performs by default.
If my pod goes down (in my case, I am shutting down a node on purpose with a hard power down to simulate a failure), the cluster (the controller?) will detect this and spawn a new pod on another node.
All was fine so far, but then I noticed that my new pod was stuck in the ContainerCreating state.
Running a describe on my new pod (the one in ContainerCreating state) showed this
Warning FailedAttachVolume 16m attachdetach-controller Multi-Attach error for volume "pvc-cb772fdb-492b-4ef5-a63e-4e483b8798fd" Volume is already used by pod(s) jenkins-deployment-6ddd796846-dgpnm
Warning FailedMount 70s (x7 over 14m) kubelet, cc-pool-bg6u Unable to mount volumes for pod "jenkins-deployment-6ddd796846-wjbkl_default(93747d74-b208-421c-afa4-8d467e717649)": timeout expired waiting for volumes to attach or mount for pod "default"/"jenkins-deployment-6ddd796846-wjbkl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[jenkins-home default-token-wd6p7]
Then it started to hit me: this makes sense.
It's a pity, but it makes sense.
Since I did a hard power down on the node, the PV went down with it.
So now the controller tries to start a new pod on a new node, but it can't transfer the PV, since the one attached to the previous pod became unreachable.
As I read more on this, I learned that DigitalOcean only supports ReadWriteOnce, which now leaves me wondering: how the hell can I achieve a simple failover for a stateful application on a Kubernetes cluster on DigitalOcean that consists of just a couple of simple droplets?

Pods can't mount disks on managed AKS cluster

I tried the simple PVC example from here, with nginx claiming azure-managed-disk, and I am getting an 'unable to mount' error; see below. Also, I can't remove the created PV with 'kubectl delete pv pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988'.
$kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
10m Warning FailedMount Pod MountVolume.WaitForAttach failed for volume "pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988" : azureDisk - WaitForAttach failed within timeout node (aks-agentpool-10844952-2) diskId:(kubernetes-dynamic-pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988) lun:(1)
22s Warning FailedMount Pod Unable to mount volumes for pod "nginx_default(bd16b9c8-97b2-11e9-9018-eaa2ea1705c5)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx". list of unmounted volumes=[volume]. list of unattached volumes=[volume default-token-92rj6]
My managed AKS cluster is running v1.12.8, and the SP has the contributor role (the owner role doesn't help either). The storage class 'managed-premium' is in the yaml from my simple nginx example (link provided).
For your issue, there are not enough details to judge the exact reason, so I'll just list the possible reasons here:
1. It's a simple transient error: the API call to Azure failed. If so, you just need to delete the resources and recreate them.
2. The node that the pod runs on already has too many Azure disks attached. If so, you need to schedule the pod on another node that does not have as many disks attached.
3. The Azure disk cannot be unmounted or detached from the old node. This means that the PV is in use and attached to another node. If so, you need to create another dynamic PV, one that is not in use, for your pod.
You can check carefully again against these reasons. In my opinion, the third reason is the most likely one. Of course, it all depends on the actual situation. For more details about similar errors, see How to Understand & Resolve "Warning Failed Attach Volume" and "Warning Failed Mount" Errors in Kubernetes on Azure.
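To narrow down which of these cases applies, a few standard kubectl checks can help (a sketch; the pod and PV names are taken from the question above):

# Which node is the pod on, and what do its events say?
kubectl describe pod nginx

# Is the PV still bound, and to which claim and node?
kubectl describe pv pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988

# How are pods (and therefore attached disks) spread across the nodes?
kubectl get pods --all-namespaces -o wide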

In Kubernetes, how to update pods to use an updated ConfigMap

I'm running more than one replica of a pod with a Kubernetes Deployment,
and I'd like to update the replicas to use an updated ConfigMap in a rolling way, the same way a rolling update works,
so that Kubernetes will terminate a pod and start sending traffic to the newly updated pods one at a time, until all pods are updated.
Can I use rolling-update with a Deployment?
Applying a change to the Deployment object will trigger a rolling-update. From the docs:
A Deployment’s rollout is triggered if and only if the Deployment’s pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.
So if you want to trigger a rolling update when you update your ConfigMap, I would suggest you update a metadata label in the pod template, perhaps a CONFIG_VER key.
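For example, a sketch of bumping a hypothetical CONFIG_VER label on the pod template with kubectl patch (any change to .spec.template triggers the rollout):

# bump the pod-template label so the Deployment rolls its pods
kubectl patch deployment my-app \
  -p '{"spec":{"template":{"metadata":{"labels":{"CONFIG_VER":"2"}}}}}'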
To automatically perform a rolling update of a Deployment on a ConfigMap update, you can also use a tool that my team has built and open-sourced: Reloader, which we are also using in our customers' production clusters.
Reloader watches for changes in ConfigMaps and Secrets and updates the associated Deployments, DaemonSets and StatefulSets, based on the configured update strategy.
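For example, a sketch based on Reloader's documented auto-reload annotation (the Deployment name is hypothetical):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                            # hypothetical name
  annotations:
    reloader.stakater.com/auto: "true"    # roll the Deployment whenever a ConfigMap or Secret it references changes
spec:
  # ... rest of the Deployment spec unchanged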

Kubernetes pod is stuck in ContainerCreating state after image upgrade

During an image upgrade of the pods, a few of the pods are stuck in the ContainerCreating state.
kubectl get events has below error: FailedSync kubelet,
10.102.10.34 Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod
"default"/"ob-service-1124355621-1th47". list of unattached/unmounted
volumes=[timezone default-token-3x1x9]
Docker logs:
ERRO[240242] Handler for DELETE /v1.22/containers/749d05b355e2b80bffb90d207232d37e3ebc5ff57942c46ce0a2b4ca5950ed0e returned error: Driver devicemapper failed to remove root filesystem 749d05b355e2b80bffb90d207232d37e3ebc5ff57942c46ce0a2b4ca5950ed0e: Device is Busy
ERRO[240242] Error saving dying container to disk: open /var/lib/docker/containers/5d01db2c31a3073cc7fb68f2be5acc45c34583d5f2ae0c0879ec064f90da6943/config.v2.json: no such file or directory
ERRO[240263] Error removing mounted layer 5d01db2c31a3073cc7fb68f2be5acc45c34583d5f2ae0c0879ec064f90da6943: Device is Busy
It's a bit hard to debug with just the information you provided, but the general direction you should be looking in is the resources of your cluster.
'Failed to sync' usually means the pods can't be fitted onto any of the workers (maybe adding more will help), or, judging from your error, you're trying to attach volumes that are busy and can't accept the connection, which fails the pod.
Again, details are lacking, but let's assume you're on AWS and you have a volume that didn't unmount and is now being attached again; the above would pretty much be the result, so you'll need to detach the volume so that the new pod can attach it.
If some pods with the same image are okay, it means you don't have enough volumes and/or some of the current volumes are not available to accept a new connection (maybe they didn't unmount properly during the deletion of the old pods).
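In that AWS scenario, a sketch of inspecting and force-detaching the stuck volume with the AWS CLI (the volume ID is hypothetical):

# see which instance the volume is still attached to
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0

# force-detach it so it can be attached to the new node
aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force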

Kubernetes: Send a Pod Id and Deployment Id to Docker

My Kube deployment
Consider a cluster with a few Kubernetes deployments of Docker containers,
one per campaign. Each deployment scales quite frequently.
For logging purposes, I want each pod to write to a separate directory on the host, which includes both its unique Pod Id and the shared deployment id, e.g.:
/var/logs/campaigns/campaign1/pod1/
/var/logs/campaigns/campaign1/pod2/
/var/logs/campaigns/campaign2/pod3/
/var/logs/campaigns/campaign2/pod4/
This way no pod can access logs of another campaign or another pod. As a side note, I use FileBeat to collect the logs from the base directory and send them to ElasticSearch.
My question
How can Kubernetes pass the Pod Id and Deployment Id to Docker so that it can create and mount a directory on the host containing these values?
For the deployment ID, you do not need anything from inside Kubernetes: all you need to do is define a proper env variable in your deployment manifest (e.g., set it identical to the deployment name).
As for the Pod ID, there is the Downward API.
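Putting the two together, here is a minimal sketch of the relevant part of a pod template (the env names, image, and paths are assumptions; subPathExpr, available on newer clusters, expands the env variables into a per-pod directory under the host path):

containers:
- name: app
  image: my-campaign-app:latest     # hypothetical image
  env:
  - name: CAMPAIGN_ID               # the "deployment id": set statically per deployment
    value: campaign1
  - name: POD_NAME                  # the unique pod id, via the Downward API
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  volumeMounts:
  - name: logs
    mountPath: /var/log/app
    subPathExpr: $(CAMPAIGN_ID)/$(POD_NAME)   # e.g. campaign1/pod1
volumes:
- name: logs
  hostPath:
    path: /var/logs/campaigns
    type: DirectoryOrCreate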
