GCP Kubernetes workload "Does not have minimum availability" - docker

Background: I'm trying to set up a Bitcoin Core regtest pod on Google Cloud Platform. I borrowed some code from https://gist.github.com/zquestz/0007d1ede543478d44556280fdf238c9, editing it so that it uses Bitcoin Core instead of Bitcoin ABC (a different client implementation), and changed the RPC username and password to both be "test". I also added some command arguments for the docker-entrypoint.sh script to forward to bitcoind, the daemon for the nodes I am running. When I attempt to deploy the following three YAML files, the "Workloads" dashboard shows bitcoin as not having minimum availability. Getting the pod to deploy correctly matters because I need to send RPC commands to the load balancer. My YAML files are attached below. I am not very familiar with Kubernetes, and I'm doing a research project on scalability that entails running RPC commands against this pod. Ask for relevant logs and I will provide them in separate pastebins. Right now I'm only running three machines in my cluster, as I'm still setting this up. The zone is us-east1-d and the machine type is n1-standard-2.
Question: Given these files below, what is causing GCP Kubernetes Engine to respond with "Does not have minimum availability", and how can this be fixed?
bitcoin-deployment.sh
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: default
  labels:
    service: bitcoin
  name: bitcoin
spec:
  strategy:
    type: Recreate
  replicas: 1
  template:
    metadata:
      labels:
        service: bitcoin
    spec:
      containers:
      - env:
        - name: BITCOIN_RPC_USER
          valueFrom:
            secretKeyRef:
              name: test
              key: test
        - name: BITCOIN_RPC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: test
              key: test
        image: ruimarinho/bitcoin-core:0.17.0
        name: bitcoin
        ports:
        - containerPort: 18443
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: bitcoin-data
        resources:
          requests:
            memory: "1.5Gi"
        command: ["./entrypoint.sh"]
        args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
      restartPolicy: Always
      volumes:
      - name: bitcoin-data
        gcePersistentDisk:
          pdName: disk-bitcoincore-1
          fsType: ext4
bitcoin-secrets.yml
apiVersion: v1
kind: Secret
metadata:
  name: bitcoin
type: Opaque
data:
  rpcuser: dGVzdAo=
  rpcpass: dGVzdAo=
bitcoin-srv.yml
apiVersion: v1
kind: Service
metadata:
  name: bitcoin
  namespace: default
spec:
  ports:
  - port: 18443
    targetPort: 18443
  selector:
    service: bitcoin
  type: LoadBalancer
  externalTrafficPolicy: Local

I have run into this issue several times. The solutions that I used:
Wait. Google Cloud does not have enough resources available in the Region/Zone that you are trying to launch into. In some cases this took anywhere from an hour to an entire day.
Select a different Region/Zone.
An example was earlier this month: I could not launch new resources in us-west1-a, so I just switched to us-east4-c and everything launched.
I really do not know why this happens under the covers at Google. I have personally experienced this problem three times in the last three months, and I have seen it several times on StackOverflow. The real answer might simply be that Google Cloud is growing faster than its infrastructure. That is a good thing for Google, and I know they are investing major new resources in the cloud. Personally, I really like working with their cloud.

There could be many reasons for this failure:
Insufficient resources
Liveness probe failure
Readiness probe failure
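A quick way to narrow down which of these applies is to look at the pod and node state; a rough sketch (the pod name is a placeholder, the label comes from the question's manifests):
$ kubectl get pods -l service=bitcoin     # Pending, CrashLoopBackOff, or Running but 0/1 Ready?
$ kubectl describe pod <pod-name>         # the Events section names scheduling problems and probe failures
$ kubectl top nodes                       # rough CPU/memory headroom (requires metrics-server)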

I encountered this error within GKE.
The reason was that the pod was not able to find the ConfigMap due to a name mismatch. So make sure all the resources the pod references are discoverable by the pod.
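For what it's worth, the manifests in the question above have exactly this kind of mismatch: the Deployment's secretKeyRef asks for a Secret named test with key test, while the Secret manifest creates one named bitcoin with keys rpcuser and rpcpass. A rough way to check that the references line up (resource names taken from the question's files):
$ kubectl get secret bitcoin -o yaml      # does the Secret exist, and which data keys does it have?
$ kubectl get deployment bitcoin -o yaml  # compare against the secretKeyRef name/key the container uses
$ kubectl describe pod <bitcoin-pod>      # a missing Secret or key typically shows up as CreateContainerConfigError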

The error message you mentioned isn't directly pointing to a stockout; it's more about resources being unavailable within the cluster. You can try again after adding another node to the cluster, etc. Also, this troubleshooting guide suggests that if your Nodes have enough resources but you still get the "Does not have minimum availability" message, you should check whether the Nodes have a SchedulingDisabled or Cordoned status: in that case they don't accept new pods.
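A minimal sketch of that check (the node name is a placeholder):
$ kubectl get nodes                  # a cordoned node shows STATUS Ready,SchedulingDisabled
$ kubectl describe node <node-name>  # look at the Unschedulable and Taints fields
$ kubectl uncordon <node-name>       # re-enables scheduling if the node was cordoned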

Please check your logs at https://console.cloud.google.com/logs; you might be surprised to find that your app has been failing.
I faced the same issue when my spring-boot application failed to start due to a spring-boot configuration mistake.
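If you prefer the command line over Cloud Logging, a rough equivalent (the pod name is a placeholder):
$ kubectl get pods                    # find the failing pod and its restart count
$ kubectl logs <pod-name>             # stdout/stderr of the current container
$ kubectl logs <pod-name> --previous  # output of the last crashed container, usually the interesting part
$ kubectl describe pod <pod-name>     # Events show image pull, probe, and OOM problems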
Also in the args you use:
args: ["-server", "-daemon", "-regtest", "-rpcbind=127.0.0.1", "-rpcallowip=0.0.0.0/0", "-rpcport=18443", "-rpcuser=test", "-rpcpassport=test"]
should it be "-rpcpassport" or "-rpcpassword"?

Related

Why am I getting `0/1 nodes are available` when running Docker Desktop?

I'm running Docker Desktop with Kubernetes.
I can ssh to the node and I have other pods running on the node.
However, when I apply a StatefulSet to the cluster I get:
0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
The StatefulSet is here:
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#components
kubectl get no
NAME STATUS ROLES AGE VERSION
docker-desktop Ready control-plane 6d2h v1.24.1
If you are applying the manifest defined here as it is, the problem is in the below snippet, particularly with the storageClassName. Likely, your cluster does not have a storage class called my-storage-class.
volumeClaimTemplates:
- metadata:
    name: www
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "my-storage-class"
    resources:
      requests:
        storage: 1Gi
To get the definitive error statement, you can run the following command:
kubectl describe pvc www-web-0
you will notice something like:
storageclass.storage.k8s.io "my-storage-class" not found
Solution:
You can run the following command to get your cluster's available storage classes and replace the value in your yaml file.
kubectl get sc
Alternatively, you can delete the storageClassName and let the default storage class do the magic. However, for this to work, you must have a default sc present in your cluster.
If you have no storage class present, you need to create one. Check this out.
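A sketch of both options, assuming Docker Desktop's built-in hostpath class mentioned in the next answer (substitute whatever kubectl get sc shows on your cluster):
$ kubectl get sc        # the default class is marked "(default)"
$ kubectl patch storageclass hostpath -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'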
If you're using k8s locally with Docker Desktop, ensure that the storageClassName is set to "hostpath". Below is one of my volumeClaimTemplates for a local Redis cluster. The comment has saved me a few times when getting the "0/1 nodes are available" message, which is a confusing error.
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    ##############################
    ##  this will catch you out  ##
    # for Docker Desktop (Local K8s Cluster) set to -> storageClassName: "hostpath"
    ##############################
    storageClassName: "hostpath"
    resources:
      requests:
        storage: 250Mi

Best practice for JSON logging in Kubernetes

I'm developing an app whose logs contain custom fields for metric purposes.
Therefore, we produce the logs in JSON format and send them to an Elasticsearch cluster.
We're currently working on migrating the app from a local Docker node to our organization's Kubernetes cluster.
Our cluster uses Fluentd as a DaemonSet, to output logs from all pods to our Elasticsearch cluster.
The setup is similar to this: https://medium.com/kubernetes-tutorials/cluster-level-logging-in-kubernetes-with-fluentd-e59aa2b6093a
I'm trying to figure out what's the best practice to send logs from our app. My two requirements are:
That the logs are formatted correctly in JSON format. I don't want them to be nested in the msg field of the persisted document.
That I can run kubectl logs -f <pod> and view the logs in readable text format.
Currently, if I don't do anything and let the DaemonSet send the logs, it'll fail both requirements.
The best solution I've come up with is to ask the administrators of our Kubernetes cluster to replace the Fluentd logging with Fluentbit.
Then I can configure my deployment like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  labels:
    app: example-app
  annotations:
    fluentbit.io/parser-example-app: json
    fluentbit.io/exclude-send-logs: "true"
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: myapp:1.0.0
        volumeMounts:
        - name: app-logs
          mountPath: "/var/log/app"
      - name: tail-logs
        image: busybox
        args: [/bin/sh, -c, 'tail -f /var/log/example-app.log']
        volumeMounts:
        - name: app-logs
          mountPath: "/var/log/app"
      volumes:
      - name: app-logs
        emptyDir: {}
Then the logs are sent to Elasticsearch in the correct JSON format, and I can run kubectl logs -f example-app -c tail-logs to view them in a readable format.
Is this the best practice though? Am I missing a simpler solution?
Is there an alternative supported by Fluentd?
I'll be glad to hear your opinion :)
There isn't really a good option here that isn't going to chew massive amounts of CPU. The closest thing I can suggest, other than the solution you mentioned above, is inverting it: the main output stream is unformatted and you run Fluent* (usually Bit) as a sidecar on a secondary file stream. That's no better, though.
Really most of us just make the output be in JSON format and on the rare occasions we need to manually poke at logs outside of the normal UI (Kibana, Grafana, whatever), we just deal with the annoyance.
You could also theoretically make your "human" format sufficiently machine parsable to allow for querying. The usual choice there is "logfmt", aka key=value pairs. So my log lines on logfmt-y services look like timestamp=2021-05-15T03:48:05.171973Z level=info event="some message" otherkey=1 foo="bar baz". That's simple enough to read by hand but also can be parsed efficiently.

Openshift: any deployment resulted in Application is not available

First time deploying to OpenShift (actually Minishift on my Windows 10 Pro machine). Every sample application I deployed successfully still resulted in "Application is not available".
From the Web Console I see a weird message, "Build #1 is pending", although I saw that it was successful from PowerShell.
I found someone fixing a similar issue by changing to 0.0.0.0, but I gave it a try and it isn't the solution in my case.
Here are the full logs and how I am deploying
PS C:\to_learn\docker-compose-to-minishift\first-try> oc new-app https://github.com/openshift/nodejs-ex warning: Cannot check if git requires authentication.
--> Found image 93de123 (16 months old) in image stream "openshift/nodejs" under tag "10" for "nodejs"
Node.js 10.12.0
---------------
Node.js available as docker container is a base platform for building and running various Node.js applications and frameworks. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.
Tags: builder, nodejs, nodejs-10.12.0
* The source repository appears to match: nodejs
* A source build using source code from https://github.com/openshift/nodejs-ex will be created
* The resulting image will be pushed to image stream tag "nodejs-ex:latest"
* Use 'start-build' to trigger a new build
* WARNING: this source repository may require credentials.
Create a secret with your git credentials and use 'set build-secret' to assign it to the build config.
* This image will be deployed in deployment config "nodejs-ex"
* Port 8080/tcp will be load balanced by service "nodejs-ex"
* Other containers can access this service through the hostname "nodejs-ex"
--> Creating resources ...
imagestream.image.openshift.io "nodejs-ex" created
buildconfig.build.openshift.io "nodejs-ex" created
deploymentconfig.apps.openshift.io "nodejs-ex" created
service "nodejs-ex" created
--> Success
Build scheduled, use 'oc logs -f bc/nodejs-ex' to track its progress.
Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
'oc expose svc/nodejs-ex'
Run 'oc status' to view your app.
PS C:\to_learn\docker-compose-to-minishift\first-try> oc get bc/nodejs-ex -o yaml
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2020-02-20T20:10:38Z
  labels:
    app: nodejs-ex
  name: nodejs-ex
  namespace: samplepipeline
  resourceVersion: "1123211"
  selfLink: /apis/build.openshift.io/v1/namespaces/samplepipeline/buildconfigs/nodejs-ex
  uid: 1003675e-541d-11ea-9577-080027aefe4e
spec:
  failedBuildsHistoryLimit: 5
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: nodejs-ex:latest
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    git:
      uri: https://github.com/openshift/nodejs-ex
    type: Git
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: nodejs:10
        namespace: openshift
    type: Source
  successfulBuildsHistoryLimit: 5
  triggers:
  - github:
      secret: c3FoC0RRfTy_76WEOTNg
    type: GitHub
  - generic:
      secret: vlKqJQ3ZBxfP4HWce_Oz
    type: Generic
  - type: ConfigChange
  - imageChange:
      lastTriggeredImageID: 172.30.1.1:5000/openshift/nodejs#sha256:3cc041334eef8d5853078a0190e46a2998a70ad98320db512968f1de0561705e
    type: ImageChange
status:
  lastVersion: 1

How to scrape Jenkins metrics using Prometheus Operator

I'm using Kube-prometheus with Prometheus-Operator to monitor my K8s cluster. I've deployed Jenkins on my cluster and want to start to get metrics here using ServiceMonitor.
I've installed the Prometheus plugin, which exposes the metrics at /prometheus or at /metrics/API_KEY/metrics. This works fine if I create a new static job; however, if I try to use a ServiceMonitor, it does not work.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: jenkins
  name: jenkins
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http
    path: /metrics/y1H6G16T-DhqpHdW9XwHWnP9FWAXMMfy4XnXVnyoIOEV3-gPJZKN284OFUcVkPxL/metrics
  selector:
    matchLabels:
      jenkins: main
I don't know about ServiceMonitor, but I monitor my Jenkins instance without any problem using annotations on Jenkins' service:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/prometheus"
I'm using kube-prometheus-stack v12.8.0 (formerly known as the prometheus-operator helm chart).
To make prometheus-operator detect external serviceMonitors (like the one provided by Jenkins), you'll have to configure two things:
configure it to scan other namespaces:
serviceMonitorNamespaceSelector:
  matchLabels:
    prometheus: please-scan-this-namespace-too
note: alternatively, you can leave it to {} so that all namespaces are scanned
configure it to also select the serviceMonitors detected in these other namespaces:
serviceMonitorSelector:
  matchLabels:
    release: prometheus-operator
note: even though the documentation states that if you leave serviceMonitorSelector to {}, it will select all serviceMonitors, it does not seem to work.
And finally, you'd still need to add these labels to 1) the namespaces and 2) serviceMonitors that you want prometheus to adopt.
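Putting it together, a hedged sketch of the Helm values and the labels (in kube-prometheus-stack these selectors usually live under prometheus.prometheusSpec; verify the exact path against your chart version):
prometheus:
  prometheusSpec:
    serviceMonitorNamespaceSelector:
      matchLabels:
        prometheus: please-scan-this-namespace-too
    serviceMonitorSelector:
      matchLabels:
        release: prometheus-operator
and then label the objects from the question, for example:
$ kubectl label namespace monitoring prometheus=please-scan-this-namespace-too
$ kubectl label servicemonitor jenkins -n monitoring release=prometheus-operator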

Kubernetes Pod Warning: 1 node(s) had volume node affinity conflict

I'm trying to set up a Kubernetes cluster. I have a Persistent Volume, a Persistent Volume Claim and a Storage Class all set up and running, but when I want to create a pod from a deployment, the pod is created but it hangs in the Pending state. After a describe I only get this warning: "1 node(s) had volume node affinity conflict." Can somebody tell me what I am missing in my volume configuration?
apiVersion: v1
kind: PersistentVolume
metadata:
  creationTimestamp: null
  labels:
    io.kompose.service: mariadb-pv0
  name: mariadb-pv0
spec:
  volumeMode: Filesystem
  storageClassName: local-storage
  local:
    path: "/home/gtcontainer/applications/data/db/mariadb"
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 2Gi
  claimRef:
    namespace: default
    name: mariadb-claim0
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/cvl-gtv-42.corp.globaltelemetrics.eu
          operator: In
          values:
          - master
status: {}
The error "volume node affinity conflict" happens when the persistent volume claims that the pod is using, are scheduled on different zones, rather than on one zone, and so the actual pod was not able to be scheduled because it cannot connect to the volume from another zone. To check this, you can see the details of all the Persistent Volumes.
To check that, first get your PVCs:
$ kubectl get pvc -n <namespace>
Then get the details of the Persistent Volumes (not Volume claims)
$ kubectl get pv
Find the PVs, that correspond to your PVCs and describe them
$ kubectl describe pv <pv1> <pv2>
You can check the Source.VolumeID for each of the PVs; most likely they will be in different availability zones, and so your pod gives the affinity error.
To fix this, create a storageclass for a single zone and use that storageclass in your PVC.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: region1storageclass
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  encrypted: "true" # if encryption required
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - eu-west-2b # this is the availability zone, will depend on your cloud provider
    # multi-az can be added, but that defeats the purpose in our scenario
0. If you didn't find the solution in other answers...
In our case the error happened on an AWS EKS cluster freshly provisioned with Pulumi (see the full source here). The error drove me nuts, since I didn't change anything; I just created a PersistentVolumeClaim as described in the Buildpacks Tekton docs:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildpacks-source-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
I didn't change anything else from the default EKS configuration and also didn't add/change any PersistentVolume or StorageClass (in fact I didn't even know how to do that). As the default EKS setup seems to rely on 2 nodes, I got the error:
0/2 nodes are available: 2 node(s) had volume node affinity conflict.
Reading through Sownak Roy's answer I got a first clue about what to do, but didn't know how to do it. So for the folks interested, here are all my steps to resolve the error:
1. Check EKS nodes failure-domain.beta.kubernetes.io labels
As described in the section "Stateful applications" in this post, the two nodes are provisioned in different AWS availability zones than the persistent volume (PV), which is created by applying our PersistentVolumeClaim described above.
To check that, you need to look into/describe your nodes with kubectl get nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-31-10-186.eu-central-1.compute.internal Ready <none> 2d16h v1.21.5-eks-bc4871b
ip-172-31-20-83.eu-central-1.compute.internal Ready <none> 2d16h v1.21.5-eks-bc4871b
and then have a look at the Label section using kubectl describe node <node-name>:
$ kubectl describe node ip-172-77-88-99.eu-central-1.compute.internal
Name: ip-172-77-88-99.eu-central-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t2.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-central-1
failure-domain.beta.kubernetes.io/zone=eu-central-1b
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-172-77-88-99.eu-central-1.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t2.medium
topology.kubernetes.io/region=eu-central-1
topology.kubernetes.io/zone=eu-central-1b
Annotations: node.alpha.kubernetes.io/ttl: 0
...
In my case the node ip-172-77-88-99.eu-central-1.compute.internal has failure-domain.beta.kubernetes.io/region defined as eu-central-1 and the AZ defined with failure-domain.beta.kubernetes.io/zone as eu-central-1b.
And the other node defines the failure-domain.beta.kubernetes.io/zone AZ as eu-central-1a:
$ kubectl describe nodes ip-172-31-10-186.eu-central-1.compute.internal
Name: ip-172-31-10-186.eu-central-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t2.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eu-central-1
failure-domain.beta.kubernetes.io/zone=eu-central-1a
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-172-31-10-186.eu-central-1.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t2.medium
topology.kubernetes.io/region=eu-central-1
topology.kubernetes.io/zone=eu-central-1a
Annotations: node.alpha.kubernetes.io/ttl: 0
...
2. Check PersistentVolume's topology.kubernetes.io field
Now we should check the PersistentVolume automatically provisioned after we manually applied our PersistentVolumeClaim. Use kubectl get pv:
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-93650993-6154-4bd0-bd1c-6260e7df49d3 1Gi RWO Delete Bound default/buildpacks-source-pvc gp2 21d
followed by kubectl describe pv <pv-name>
$ kubectl describe pv pvc-93650993-6154-4bd0-bd1c-6260e7df49d3
Name: pvc-93650993-6154-4bd0-bd1c-6260e7df49d3
Labels: topology.kubernetes.io/region=eu-central-1
topology.kubernetes.io/zone=eu-central-1c
Annotations: kubernetes.io/createdby: aws-ebs-dynamic-provisioner
...
The PersistentVolume was configured with the label topology.kubernetes.io/zone in AZ eu-central-1c, which makes our Pods complain about not finding their volume, since they are in a completely different AZ!
3. Add allowedTopologies to StorageClass
As stated in the Kubernetes docs, one solution to the problem is to add an allowedTopologies configuration to the StorageClass. If, like me, you already provisioned an EKS cluster, you need to retrieve your already defined StorageClass with
kubectl get storageclasses gp2 -o yaml
Save it to a file called storage-class.yml and add an allowedTopologies section that matches your node's failure-domain.beta.kubernetes.io labels like this:
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - eu-central-1a
    - eu-central-1b
The allowedTopologies configuration defines that the failure-domain.beta.kubernetes.io/zone of the PersistentVolume must be either eu-central-1a or eu-central-1b, not eu-central-1c!
The full storage-class.yml looks like this:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - eu-central-1a
    - eu-central-1b
Apply the enhanced StorageClass configuration to your EKS cluster with
kubectl apply -f storage-class.yml
4. Delete PersistentVolumeClaim, add storageClassName: gp2 to it and re-apply it
In order to get things working again, we need to delete the PersistentVolumeClaim first.
To map the PersistentVolumeClaim to our previously defined StorageClass, we need to add storageClassName: gp2 to the PersistentVolumeClaim definition in our pvc.yml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildpacks-source-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: gp2
Finally re-apply the PersistentVolumeClaim with kubectl apply -f pvc.yml. This should resolve the error.
There are a few things that can cause this error:
The node isn't labeled properly. I had this issue on AWS when my worker node didn't have the appropriate labels (the master had them though) like these:
failure-domain.beta.kubernetes.io/region=us-east-2
failure-domain.beta.kubernetes.io/zone=us-east-2c
After patching the node with the labels, the "1 node(s) had volume node affinity conflict" error was gone, so the PV and PVC, along with a pod, were deployed successfully.
The value of these labels is cloud-provider specific. Basically, it is the job of the cloud provider (with the --cloud-provider option defined in the kube-controller-manager, API server, and kubelet) to set those labels. If the appropriate labels aren't set, then check that your CloudProvider integration is correct. I used kubeadm, which is cumbersome to set up, but with other tools, kops for instance, it works right away.
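A sketch of the node patch mentioned above (the node name is a placeholder, and the zone/region values must match where the volume actually lives):
$ kubectl label node <worker-node-name> \
    failure-domain.beta.kubernetes.io/region=us-east-2 \
    failure-domain.beta.kubernetes.io/zone=us-east-2c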
Based on your PV definition and the usage of the nodeAffinity field, you are trying to use a local volume (read the local volume description here, official docs). In that case, make sure that you set the nodeAffinity field like this (it worked in my case on AWS):
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - my-node # it must be the name of your node (kubectl get nodes)
So after creating the resource and running describe on it, it will show up like this:
Required Terms:
Term 0: kubernetes.io/hostname in [your node name]
The StorageClass definition (named local-storage, which is not posted here) must be created with volumeBindingMode set to WaitForFirstConsumer for local storage to work properly. Refer to the local storage class example in the official docs to understand the reason behind that.
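For reference, a minimal local-storage class along those lines, matching the storageClassName used in the question's PV (this mirrors the example in the official docs):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer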
The "1 node(s) had volume node affinity conflict" error is created by the scheduler because it can't schedule your pod to a node that conforms with the persistenvolume.spec.nodeAffinity field in your PersistentVolume (PV).
In other words, you say in your PV that a pod using this PV must be scheduled to a node with a label of kubernetes.io/cvl-gtv-42.corp.globaltelemetrics.eu = master, but this isn't possible for some reason.
There may be various reason that your pod can't be scheduled to such a node:
The pod has node affinities, pod affinities, etc. that conflict with the target node
The target node is tainted
The target node has reached its "max pods per node" limit
There exists no node with the given label
The place to start looking for the cause is the definition of the node and the pod.
Great answer by Sownak Roy. I've had the same case of a PV being created in a different zone compared to the node that was supposed to use it. The solution I applied was based on Sownak's answer, only in my case it was enough to specify the storage class without the "allowedTopologies" list, like this:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cloud-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
After some headache-inducing investigation, there are a few things that need to be checked:
Azure:
Does your cluster have more than one zone selected? (zone 1, 2, 3)
Does your default storage class have the correct storage provider?
(ZRS Zone-Redundant-Storage)
If not:
change the storage class to use the correct provider
create backup of PV data
stop the deployment that is using the PVC (set replicas to 0)
delete the PVC and confirm that the associated PV is deleted.
re-apply the PVC config yaml (without reference to the old storageclass name)
start the deployment that is using the PVC (set replicas to 1)
manually import backupdata
Example storageclass for AKS:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zone-redundant-storage
parameters:
  skuname: StandardSSD_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
GKE:
Does your cluster have more than one zone selected? (Zone A, B, C)
Does your default storage class have replication-type parameter? (replication-type: regional-pd)
If not:
change the storage class to use the correct parameters
create backup of PV data
stop the deployment that is using the PVC (set replicas to 0)
delete the PVC and confirm that the associated PV is deleted.
re-apply the PVC config yaml (without reference to the old storageclass name)
start the deployment that is using the PVC (set replicas to 1)
manually import backupdata
Example storageclass for GKE:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard-regional-pd-storage
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-standard
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
After that, the PVs will have redundancy across the selected zones, allowing a pod to access the PV from nodes in different zones.
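The delete/re-create cycle from the checklists above looks roughly like this on the command line (names are placeholders, and back up the data first):
$ kubectl scale deployment <app> --replicas=0      # stop the consumer of the PVC
$ kubectl delete pvc <claim-name>
$ kubectl get pv                                   # confirm the bound PV is gone
$ kubectl apply -f storage-class.yaml -f pvc.yaml  # re-create with the zone-redundant class
$ kubectl scale deployment <app> --replicas=1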
In my case, the root cause was that the persistent volume was in us-west-2c while the new worker nodes were relaunched into us-west-2a and us-west-2b. The solution is to either have more worker nodes so they cover more zones, or to remove/widen the node affinity for the application so that more worker nodes qualify to be bound to the persistent volume.
Make sure the kubernetes node has the required label. You can verify the node labels using:
kubectl get nodes --show-labels
One of the kubernetes nodes should show you the name/label of the persistent volume, and your pod should be scheduled on the same node.
Make sure the requested size in the PersistentVolumeClaim matches the size of the PersistentVolume. If the sizes do not match, either correct resources.requests.storage in the PersistentVolumeClaim or delete the old PersistentVolume and create a new one with the correct size.
Verification steps:
Describe your persistent volume:
kubectl describe pv postgres-br-proxy-pv-0
Output:
...
Node Affinity:
Required Terms:
Term 0: postgres-br-proxy in [postgres-br-proxy-pv-0]
...
Show node labels:
kubectl get nodes --show-labels
Output:
NAME STATUS ROLES AGE VERSION LABELS
node3 Ready <none> 19d v1.17.6 postgres-br-proxy=postgres-br-proxy-pv-0
If the node that your pod is using does not carry the persistent volume's label, the pod won't get scheduled.
For me, this happened on GKE after upgrading to k8s v1.25. In my case, none of the above worked, so I looked into cloning the volume as I didn't want to lose the data.
This post led me to enable the Compute Engine persistent disk CSI Driver, which, once enabled, fixed my issue.
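For reference, enabling that driver on an existing cluster looks roughly like this with gcloud (cluster name and zone are placeholders; double-check the flag against the current GKE docs):
$ gcloud container clusters update <cluster-name> \
    --update-addons=GcePersistentDiskCsiDriver=ENABLED \
    --zone=<zone>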
A different case, from GCP GKE. Assume that you are using a regional cluster and you created two PVCs. Both were created in different zones (and you didn't notice).
In the next step you try to run a pod that mounts both PVCs. You have to schedule that pod to a specific node in a specific zone, but because your volumes are in different zones, k8s won't be able to schedule it, and you will receive the following problem.
For example, two simple PVCs on the regional cluster (nodes in different zones):
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-a
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-b
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
Next, a simple pod:
apiVersion: v1
kind: Pod
metadata:
  name: debug
spec:
  containers:
  - name: debug
    image: pnowy/docker-tools:latest
    command: [ "sleep" ]
    args: [ "infinity" ]
    volumeMounts:
    - name: disk-a
      mountPath: /disk-a
    - name: disk-b
      mountPath: /disk-b
  volumes:
  - name: disk-a
    persistentVolumeClaim:
      claimName: disk-a
  - name: disk-b
    persistentVolumeClaim:
      claimName: disk-b
Finally, as a result, it could happen that k8s won't be able to schedule the pod because the volumes are in different zones.
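One way to avoid this, consistent with the WaitForFirstConsumer advice elsewhere in this thread, is to provision both claims from a storage class that delays binding until the pod is scheduled, so both disks land in the pod's zone. A sketch (the class name is hypothetical; reference it via storageClassName in disk-a and disk-b):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: late-binding-ssd   # hypothetical name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer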
On AWS EKS, you may also get this problem if you forget to install the aws-ebs-csi-driver EKS addon prior to upgrading your Kubernetes cluster from 1.22 to 1.23.
You can also install the addon after the upgrade (although with some service interruption).
Make sure to check the AWS FAQ on this: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi-migration-faq.html
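A rough sketch of adding the addon with eksctl (cluster name and role ARN are placeholders; see the linked FAQ for the IAM prerequisites):
$ eksctl create addon --name aws-ebs-csi-driver \
    --cluster <cluster-name> \
    --service-account-role-arn arn:aws:iam::<account-id>:role/AmazonEKS_EBS_CSI_DriverRole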
Almost the same problem is described here:
https://github.com/kubernetes/kubernetes/issues/61620
"If you're using local volumes, and the node crashes, your pod cannot be rescheduled to a different node. It must be scheduled to the same node. That is the caveat of using local storage, your Pod becomes bound forever to one specific node."
Most likely you just reduced the number of nodes in your kubernetes cluster and some "regions" are not available anymore...
Something worth mentioning: if your pod ends up in a different zone than its persistent volume, then:
your disk access times will drop significantly (your local persistent storage is not local anymore; even with Amazon's/Google's hyper-fast fiber links it's still traffic across data centers)
you will be paying for "cross-regional network" traffic (on your AWS bill it is something that goes into "EC2-other", and only after drilling down into the AWS bill can you spot it)
One cause of this is when you have a definition like the one below (Kafka ZooKeeper in this example) which uses multiple PVCs for one container. If they land on different nodes, you will get something like "...volume node affinity conflict". The solution here is to use one PVC definition and use subPath on the volumeMount.
Problem
...
volumeMounts:
- mountPath: /data
  name: kafka-zoo-data
- mountPath: /datalog
  name: kafka-zoo-datalog
restartPolicy: Always
volumes:
- name: kafka-zoo-data
  persistentVolumeClaim:
    claimName: "zookeeper-data"
- name: kafka-zoo-datalog
  persistentVolumeClaim:
    claimName: "zookeeper-datalog"
Resolved
...
volumeMounts:
- mountPath: /data
  subPath: data
  name: kafka-zoo-data
- mountPath: /datalog
  subPath: datalog
  name: kafka-zoo-data
restartPolicy: Always
volumes:
- name: kafka-zoo-data
  persistentVolumeClaim:
    claimName: "zookeeper-data"
In my case, I was working with minikube on Docker Desktop on Windows, and my example was using only the docker-desktop value as the node name, so this setup is pretty important.
I have added minikube since I was using a single node; there might be more values if additional nodes are added, such as minikube-m02.
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - minikube
kubectl get node should be enough to give node names.
In my case I just deleted the PersistentVolumeClaim associated with the conflicting Pod and then recreated the pod.
Another reason for this error to occur is if you have a mix of nodes utilising taints. In some releases the DaemonSet component of the EBS CSI driver does not tolerate all taints by default; if you're trying to schedule a Pod onto a node with a taint and because of that taint it doesn't have the ebs-csi-node Pod running, you get this error.
