GKE can't pull image from GCR - docker

This one is a real head-scratcher, because everything had worked fine for years until yesterday. I have a google cloud account and the billing is set up correctly. I have private images in my GCR registry which I can 'docker pull' and 'docker push' from my laptop (MacBook Pro with Big Sur 11.4) with no problems.
The problem I detail here started happening yesterday after I deleted a project in the google cloud console, then created it again from scratch with the same name. The previous project had no problem pulling GCR images, the new one couldn't pull the same images. I have now used the cloud console to create new, empty test projects with a variety of names, with new clusters using default GKE values. But this new problem persists with all of them.
When I use kubectl to create a deployment on GKE that uses any of the GCR images in the same project, I get ErrImagePull errors. When I 'describe' the pod that won't load the image, the error (with project id obscured) is:
Failed to pull image "gcr.io/test-xxxxxx/test:1.0.0": rpc error: code
= Unknown desc = failed to pull and unpack image "gcr.io/test-xxxxxx/test:1.0.0": failed to resolve reference
"gcr.io/test-xxxxxx/test:1.0.0": unexpected status code [manifests
1.0.0]: 401 Unauthorized.
This happens when I use kubectl from my laptop (including after wiping out and creating a new .kube/config file with proper credentials), but happens exactly the same when I use the cloud console to set up a deployment by choosing 'Deploy to GKE' for the GCR image... no kubectl involved.
If I ssh into a node in any of these new clusters and try to 'docker pull' a GCR image (in the same project), I get a similar error:
Error response from daemon: unauthorized: You don't have the needed
permissions to perform this operation, and you may have invalid
credentials. To authenticate your request, follow the steps in:
https://cloud.google.com/container-registry/docs/advanced-authentication
My understanding from numerous articles is that no special authorization needs to be set up for GKE to pull GCR images from within the same project, and I've NEVER had this issue in the past.
I hope I'm not the only one on this deserted island. Thanks in advance for your help!

I tried implementing the setup and faced the same error both on the GKE cluster and on the cluster's nodes. This was caused by access to the Cloud Storage API being "Disabled" on the cluster nodes, which can be verified in the node (VM instance) details under the "Cloud API access scopes" section.
We can rectify this by changing the "Access scopes" to "Set access for each API" and modifying access to the specific API under Node Pools -> default-pool -> Security when creating the cluster. In our case we need at least "Read Only" access to the Storage API, which enables access to Cloud Storage, where the image is stored. See "Changing the service account and access scopes for an instance" for more information.
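For example, here is a minimal sketch with gcloud of creating a node pool whose nodes carry the read-only Storage scope (the cluster, pool, and zone names are placeholders, not from the original post):

gcloud container node-pools create default-pool \
    --cluster=test-cluster \
    --zone=us-central1-a \
    --scopes=https://www.googleapis.com/auth/devstorage.read_only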

Related

Error pulling docker image from GCR into GKE "Failed to pull image .... 403 Forbidden"

Background:
I have a GKE cluster which has suddenly stopped being able to pull my docker images from GCR; both are in the same GCP project. It had been working well for several months, with no issues pulling images, and has now started throwing errors even though I haven't made any changes.
(NB: I'm generally the only one on my team who accesses Google Cloud, though it's entirely possible that someone else on my team may have made changes / inadvertently made changes without realising).
I've seen a few other posts on this topic, but the solutions offered in them haven't helped. Two of these posts stood out to me in particular, as they were both posted around the same day my issues started, ~13/14 days ago. Whether this is coincidence or not, who knows...
This post has the same issue as mine; I'm unsure whether the posted comments helped the author resolve it, but they haven't fixed it for me. This post seemed to be the same issue too, but the poster says it resolved by itself after waiting some time.
The Issue:
I first noticed the issue on the cluster a few days ago, when I went to deploy a new image by pushing the image to GCR and then bouncing the pods with kubectl rollout restart deployment.
The pods all then came back with ImagePullBackOff, saying that they couldn't get the image from GCR:
kubectl get pods:
XXX-XXX-XXX 0/1 ImagePullBackOff 0 13d
XXX-XXX-XXX 0/1 ImagePullBackOff 0 13d
XXX-XXX-XXX 0/1 ImagePullBackOff 0 13d
...
kubectl describe pod XXX-XXX-XXX:
Normal BackOff 20s kubelet Back-off pulling image "gcr.io/<GCP_PROJECT>/XXX:dev-latest"
Warning Failed 20s kubelet Error: ImagePullBackOff
Normal Pulling 8s (x2 over 21s) kubelet Pulling image "gcr.io/<GCP_PROJECT>/XXX:dev-latest"
Warning Failed 7s (x2 over 20s) kubelet Failed to pull image "gcr.io/<GCP_PROJECT>/XXX:dev-latest": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/<GCP_PROJECT>/XXX:dev-latest": failed to resolve reference "gcr.io/<GCR_PROJECT>/XXX:dev-latest": unexpected status code [manifests dev-latest]: 403 Forbidden
Warning Failed 7s (x2 over 20s) kubelet Error: ErrImagePull
Troubleshooting steps followed from other posts:
I know that the image definitely exists in GCR -
I can pull the image to my own machine (also removed all docker images from my machine to confirm it was really pulling)
I can see the tagged image if I look on the GCR UI on chrome.
I've SSH'd into one of the cluster nodes and tried to docker pull manually, with no success:
docker pull gcr.io/<GCP_PROJECT>/XXX:dev-latest
Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
(Also did a docker pull of a public mongodb image to confirm that was working, and it's specific to GCR).
So this leads me to believe it's an issue with the service account not having the correct permissions, as in the cloud docs under the 'Error 400/403' section. This seems to suggest that the service account has either been deleted, or edited manually.
During my troubleshooting, I tried to find out exactly which service account GKE was using to pull from GCR. In the steps outlined in the docs, it says that: The name of your Google Kubernetes Engine service account is as follows, where PROJECT_NUMBER is your project number:
service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com
I found the service account and checked the policies - it did have one for roles/container.serviceAgent, but nothing specifically mentioning Kubernetes as I would expect from the description in the docs... 'the Kubernetes Engine Service Agent role' (unless that is the one they're describing, in which case I'm no better off than before anyway..).
It must not have had the correct roles, so I then followed the steps to re-enable it (disable then enable the Kubernetes Engine API). Running gcloud projects get-iam-policy <GCP_PROJECT> again and diffing the two outputs (before/after), the only difference is that a service account for '@cloud-filer...' has been deleted.
Thinking maybe the error was something else, I thought I would try spinning up a new cluster. Same error - can't pull images.
Send help..
I've been racking my brains to try to troubleshoot, but I'm now out of ideas! Any and all help much appreciated!
I believe the correct solution is to add the "roles/artifactregistry.reader" role to the service account that the node pool is configured to use.
In Terraform that can be done with:
resource "google_project_iam_member" "allow_image_pull" {
  project = var.project_id
  role    = "roles/artifactregistry.reader"
  member  = "serviceAccount:${var.service_account_email}"
}
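If you are not using Terraform, the same binding can be granted with gcloud (project ID and service account email are placeholders):

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SA_EMAIL" \
    --role="roles/artifactregistry.reader"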
I don't know if it still helps, but I had the same issue and managed to fix it.
In my case I was deploying GKE through Terraform and did not specify the oauth_scopes property for the node pool, as shown in the example. As I understand it, you need to make the GCP APIs available here so that the nodes are able to use them; a sketch follows below.
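For reference, a minimal sketch of what that looks like (the resource names and service account variable are illustrative, not from the original post); the oauth_scopes list in the node pool's node_config controls which Google APIs the node VMs may call:

resource "google_container_node_pool" "default_pool" {
  name    = "default-pool"
  cluster = google_container_cluster.primary.name

  node_config {
    service_account = var.service_account_email
    oauth_scopes = [
      # Read-only access to Cloud Storage, which backs GCR image layers
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}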
Have now solved this.
The service account had the correct roles/permissions, but for whatever reason stopped working.
I manually created a key for that service account, added that secret into the kube cluster, and set the service account to use that key.
Still at a loss as to why it wasn't already doing this, or why it stopped working in the first place all of a sudden, but it's working...
Fix was from this guide, from the section starting 'Create & use GCR credentials'.
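For reference, the steps in that guide boil down to something like the following sketch (the service account, secret, and file names are examples, not from the original post):

# Create a JSON key for the service account
gcloud iam service-accounts keys create gcr-key.json \
    --iam-account=SA_NAME@PROJECT_ID.iam.gserviceaccount.com

# Store it in the cluster as a docker-registry secret
kubectl create secret docker-registry gcr-json-key \
    --docker-server=gcr.io \
    --docker-username=_json_key \
    --docker-password="$(cat gcr-key.json)" \
    --docker-email=any@example.com

# Make the default Kubernetes service account use it for pulls
kubectl patch serviceaccount default \
    -p '{"imagePullSecrets": [{"name": "gcr-json-key"}]}'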
According to the docs, it is the Compute Engine default service account that accesses Container Registry to pull images, not the Kubernetes Engine service account. You can go to the node pool and check the service account name in the Security section. Check the access logs for that service account to see the errors, and then grant it the necessary permissions.
In my case, re-adding (i.e. deleting and then adding) the "Artifact Registry Reader" role for the service account used by the cluster worked.

pushing signed docker images to GCR

Somewhat of a GCR newbie question.
I have not been able to find any documentation on whether it is possible to push signed docker images to GCR, so I attempted it, but it fails with the error below.
I first built a docker image, then tagged it to point to my project in GCR with "docker tag image-name gcr.io/my-project/image-name:tag"
Then attempted signing using
"docker trust sign gcr.io/my-project/image-name:tag"
Error: error contacting notary server: denied: Token exchange failed for project 'gcr.io:my-project'. Please enable Google Container Registry API in Cloud Console at https://console.cloud.google.com/apis/api/containerregistry.googleapis.com/overview?project=gcr.io:my-project before performing this operation.
GCR API for my project is enabled and I have permissions to push to it.
Do I need to something more in my project in GCP to be able to push signed images OR it is just not supported?
If the latter, how does one (as an image consumer) verify the integrity of the image?
thanks,
J
This is currently not supported in Google Cloud Platform.
You can file a feature request to request its implementation here.
To verify an image's integrity, use image digests. Basically they are cryptographic hashes associated with the image. You can compare the digest of the image you pulled with the digest you are expecting. Command reference here.
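For example (the image name and digest are placeholders):

# Show the digest of a local image
docker images --digests gcr.io/my-project/image-name

# Pull by digest instead of tag to pin the exact image content
docker pull gcr.io/my-project/image-name@sha256:<digest>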
Google now implements the concept of Binary Authorization and "attestations" based off of Kritis. The intention is for this to be used within your CI/CD pipeline to ensure images have been created and validated correctly.
Full docs are here but the process basically consists of signing an image via a PKIX signature and then using the gcloud tool to create an attestation.
You then specify a Binary Authorization policy on your GKE cluster to enforce which attestations are required before an image is allowed to be used within the cluster.
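As a rough sketch of the attestation step (the attestor, project, and file names are placeholders; see the Binary Authorization docs for the full flow):

gcloud container binauthz attestations create \
    --artifact-url="gcr.io/my-project/image-name@sha256:<digest>" \
    --attestor="projects/my-project/attestors/my-attestor" \
    --signature-file=signature.pgp \
    --public-key-id="<key-id>"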

Getting "unauthorized: authentication required" when pulling ACR images from Azure Kubernetes Service

I followed the guide here (Grant AKS access to ACR), but am still getting "unauthorized: authentication required" when a Pod is attempting to pull an image from ACR.
The bash script executed without any errors. I have tried deleting my Deployment and creating it from scratch with kubectl apply -f ..., with no luck.
I would like to avoid using the 2nd approach of using a secret.
The link you posted in the question gives the correct steps to authenticate with Azure Container Registry from Azure Kubernetes Service. I have tried it before and it works well.
So I suggest you check that the service-principal-ID and service-principal-password are correct in the command kubectl create secret docker-registry acr-auth --docker-server <acr-login-server> --docker-username <service-principal-ID> --docker-password <service-principal-password> --docker-email <email-address>. Also check that the secret referenced in your YAML file matches the secret you created.
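For reference, a minimal sketch of where that secret name goes in the Deployment YAML (the app name and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      imagePullSecrets:
        - name: acr-auth
      containers:
        - name: my-app
          image: <acr-login-server>/my-app:tag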
Jeff & Charles - I also experienced this issue, but found that the actual cause was that AKS was trying to pull an image tag from the container registry that didn't exist (e.g. latest). When I updated this to a tag that was available (e.g. 9), the deployment to Azure Kubernetes Service (AKS) worked successfully.
I've commented on the product feedback for the guide to request the error message context be improved to reflect this root cause.
Hope this helps! :)
In my case, I was having this problem because my clock was out of sync. I run on Windows Subsystem for Linux, so running sudo hwclock -s fixed my issue.
See this GitHub thread for longer discussion.
In my case, the Admin User was not enabled in the Azure Container Registry.
I had to enable it:
Go to "Container registries" page > Open your Registry > In the side pannel under Settings open Access keys and switch Admin user on. This generates a Username, a Password, and a Password2.

GCP container push not working - "denied: Please enable Google Container Registry API in Cloud Console at ..."

I'm having trouble uploading my docker image to GCP Container registry. I was following the instructions here.
As you can see below (screenshots omitted), I've:
Logged into my google cloud shell and built a docker image via a dockerfile
Tagged my image correctly (I think)
Tried to push the image using the correct command (I think)
However, I'm getting this error:
denied:
Please enable Google Container Registry API in Cloud Console at https://console.cloud.google.com/apis/api/containerregistry.googleapis.com/overview?project=bentestproject-184220 before performing this operation.
When I follow that link, it takes me to the wrong project.
When I select my own project, I can see that "Google Container Registry API" is indeed enabled.
How do I upload my docker images?
It seems that you mistyped your project ID. Your project name is BensTestsProject, but its ID is bentestproject-184220.
I had the same issue and solved it. In my case the project name in the image tag was wrong. You must re-check that "bentestproject-184220" in your image tag is correctly your project ID; an example follows below.
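For example, with the project ID from this question, the tag and push would look like this (the image name is a placeholder):

docker tag my-image gcr.io/bentestproject-184220/my-image:latest
docker push gcr.io/bentestproject-184220/my-image:latest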

Google Cloud Container Registry Issues while pushing docker images

I have a Google project of which I am one of the owners. It was created by another developer who added me as an owner. Within that project I created a VM instance, on which I installed docker. After installing docker, I created an image of my node.js application by providing the git repository as the argument.
However, after setting the gcloud config parameters, it's giving me a 500 error while trying to push that docker image:
Error: Status 500 trying to push repository <project-id>/<image-name>: "Internal Error."
My gcloud and docker version information:
Google Cloud SDK 0.9.71
Docker version 1.7.1, build 786b29d
You were probably hit by the Google Cloud Storage outage that was going on last night: https://status.cloud.google.com/incident/storage/16027
would you mind trying again?
Sorry for the inconvenience!
Jeffrey van Gogh
Google Container Registry Team
