Forbidden error using partitioned job with Spring Cloud Data Flow on Kubernetes - spring-cloud-dataflow

I want to implement a remote partitioned job using Spring Cloud Data Flow on Kubernetes. The Skipper server is not installed because I just need to run tasks and jobs.
I modified the partitioned batch job sample project to use spring-cloud-deployer-kubernetes instead of the local deployer, as suggested here.
When the master job tries to launch a worker I get the "Forbidden" error below in the pod logs:
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.43.0.1/api/v1/namespaces/svi-scdf-poc/pods/partitionedbatchjobtask-39gvq3p8ok. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "partitionedbatchjobtask-39gvq3p8ok" is forbidden: User "system:serviceaccount:svi-scdf-poc:default" cannot get resource "pods" in API group "" in the namespace "svi-scdf-poc".
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:589) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:526) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:492) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:890) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:233) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:187) ~[kubernetes-client-4.10.3.jar:na]
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:79) ~[kubernetes-client-4.10.3.jar:na]
at org.springframework.cloud.deployer.spi.kubernetes.KubernetesTaskLauncher.getPodByName(KubernetesTaskLauncher.java:411) ~[spring-cloud-deployer-kubernetes-2.5.0.jar:2.5.0]
at org.springframework.cloud.deployer.spi.kubernetes.KubernetesTaskLauncher.buildPodStatus(KubernetesTaskLauncher.java:350) ~[spring-cloud-deployer-kubernetes-2.5.0.jar:2.5.0]
at org.springframework.cloud.deployer.spi.kubernetes.KubernetesTaskLauncher.buildTaskStatus(KubernetesTaskLauncher.java:345) ~[spring-cloud-deployer-kubernetes-2.5.0.jar:2.5.0]
In my understanding, it is expected that the master job pod tries to deploy the worker pod, so it seems to be just a permission problem, or is the Skipper server required?
If my assumptions are correct, should I just configure SCDF to assign a specific service account to the master pod?

I ran into the same issue with partitioned-batch-job, but found options in the official documentation to specify the service account at the app level and at the server level. I tried the app-level one (via the SCDF dashboard task launch properties) and it worked; I just specified the service account created by the SCDF Helm deployment. It made me wonder why this is not used by default, though, and why I had to specify it again when launching the app (i.e. shouldn't the server-level service account default to that?). The pod logs showed the Kubernetes 'default' service account was being used when launching.
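For reference, a rough sketch of what this looked like on my side; the namespace, role, and service account names are illustrative, and the property name below is the Kubernetes deployer's deploymentServiceAccountName as I understand it, so double-check it against your SCDF version:
# Assumption: a service account (here "scdf-sa", e.g. the one created by the
# SCDF Helm chart) needs RBAC to manage the worker pods the partitioned job launches.
kubectl create role scdf-task-role \
  --namespace svi-scdf-poc \
  --verb=get,list,watch,create,delete \
  --resource=pods,pods/log,jobs

kubectl create rolebinding scdf-task-rolebinding \
  --namespace svi-scdf-poc \
  --role=scdf-task-role \
  --serviceaccount=svi-scdf-poc:scdf-sa

# Then, when launching the task (dashboard launch properties or SCDF shell), point
# the deployer at that service account with an app-level deployment property, e.g.:
#   deployer.partitionedbatchjobtask.kubernetes.deploymentServiceAccountName=scdf-sa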

Related

How does Dataflow authenticate the worker service account?

I created a service account in project A to use as the worker service account for Dataflow.
I specified the worker service account in Dataflow's options.
I've looked for a Dataflow option to specify service account keys for the worker service account, but can't find one.
I ran the job with the following program arguments and it worked fine. The account I launched it with is different from the worker service account that exists in project A.
--project=projectA --serviceAccount=my-service-account-name@projectA.iam.gserviceaccount.com
I didn't load the JSON credentials file for the worker service account in my Apache Beam application.
And I haven't specified a service account key for the worker service account in the Dataflow options.
How does Dataflow authenticate the worker service account?
Please take a look at Dataflow security and permissions -> Security and permissions for pipelines on Google Cloud.
It uses the project's Compute Engine default service account as the worker service account by default.
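In other words, the workers obtain tokens for the worker service account from the Compute Engine metadata server, so no key file is needed. What the launching identity does need is permission to act as that service account. A minimal sketch with placeholder names (the launcher email is illustrative):
# Allow the identity that launches the Dataflow job to act as the worker
# service account; this is what lets --serviceAccount take effect.
gcloud iam service-accounts add-iam-policy-binding \
  my-service-account-name@projectA.iam.gserviceaccount.com \
  --member="user:job-launcher@example.com" \
  --role="roles/iam.serviceAccountUser"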

How to determine service account used to run Dataflow job?

My Dataflow job fails when it tries to access a secret:
"Exception in thread "main" com.google.api.gax.rpc.PermissionDeniedException: io.grpc.StatusRuntimeException: PERMISSION_DENIED: Permission 'secretmanager.versions.access' denied for resource 'projects/REDACTED/secrets/REDACTED/versions/latest' (or it may not exist)."
I launch the job using gcloud dataflow flex-template run. I am able to view the secret in the console. The same code works when I run it on my laptop. As I understand it, when I submit a job with the above command, it runs under a service account that may have different permissions. How do I determine which service account the job runs under?
Since Dataflow creates workers, it creates Compute Engine instances. You can check this in Logging:
Open the GCP console
Open Logging -> Logs Explorer (make sure you are not using the "Legacy Logs Viewer")
In the query builder, type protoPayload.serviceName="compute.googleapis.com"
Click Run Query
Expand the entry for v1.compute_instances.create or any other resource used by compute.googleapis.com
You should be able to see the service account used for creating the instance. This service account is used for anything related to running the Dataflow job.
Note that I tested this using the official Dataflow quickstart.
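If you prefer the CLI, roughly the same query can be run with gcloud; the filter mirrors the console steps above, and the format expression is an assumption that works for Admin Activity audit log entries:
# Print the principal that created the Compute Engine instances, i.e. the
# account used when Dataflow spun up its worker VMs.
gcloud logging read 'protoPayload.serviceName="compute.googleapis.com"' \
  --limit=20 \
  --format='value(protoPayload.authenticationInfo.principalEmail)'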
By default the Dataflow worker nodes run with the Compute Engine default service account (YOUR_PROJECT_NUMBER-compute@developer.gserviceaccount.com), which lacks the "Secret Manager Secret Accessor" role.
Either add that role to the service account, or specify a different service account in the pipeline options:
gcloud dataflow flex-template run ... --parameters service_account_email="your-service-account-name@YOUR_PROJECT_NUMBER.iam.gserviceaccount.com"
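If you go the first route, a minimal sketch of granting the role (project, secret, and account names are placeholders):
# Give the Compute Engine default service account (used by the Dataflow
# workers unless overridden) access to the secret.
gcloud secrets add-iam-policy-binding MY_SECRET \
  --project=MY_PROJECT_ID \
  --member="serviceAccount:YOUR_PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"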

MetricsExtension.Native.Exe is not running when deploying the Geneva monitoring agent on Service Fabric

The problems are:
Received Service Fabric metrics should be under the new namespace, but they are still under the old namespace;
A measure metric didn't appear.
Config:

Airflow: Could not send worker log to S3

I deployed the Airflow webserver, scheduler, worker, and flower on my Kubernetes cluster using Docker images.
The Airflow version is 1.8.0.
Now I want to send worker logs to S3, so I:
Created an S3 connection in Airflow from the Admin UI (just set S3_CONN as the conn id and s3 as the type; because my Kubernetes cluster is running on AWS and all nodes have S3 access roles, that should be sufficient)
Set the Airflow config as follows:
remote_base_log_folder = s3://aws-logs-xxxxxxxx-us-east-1/k8s-airflow
remote_log_conn_id = S3_CONN
encrypt_s3_logs = False
First I tried creating a DAG that just raises an exception immediately after it starts running. This works, and the log can be seen on S3.
Then I modified it so that the DAG creates an EMR cluster and waits for it to be ready (waiting status). To do this, I restarted all 4 Airflow Docker containers.
Now the DAG appears to work: a cluster is started and, once it's ready, the DAG is marked as a success. But I can see no logs on S3.
There are no related errors in the worker or web server logs, so I can't even tell what may be causing this issue. The logs are just not sent.
Does anyone know if there is some restriction for remote logging of Airflow, other than this description in the official documentation?
https://airflow.incubator.apache.org/configuration.html#logs
In the Airflow Web UI, local logs take precedence over remote logs. If
local logs can not be found or accessed, the remote logs will be
displayed. Note that logs are only sent to remote storage once a task
completes (including failure). In other words, remote logs for running
tasks are unavailable.
I didn't expect it, but are the logs not sent to remote storage on success?
The boto version that is installed with Airflow is 2.46.1, and that version doesn't use IAM instance roles.
Instead, you will have to add an access key and secret for an IAM user that has access, in the extra field of your S3_CONN configuration.
Like so:
{"aws_access_key_id":"123456789","aws_secret_access_key":"secret12345"}

Google Cloud Jenkins gcloud push access denied

I'm trying to push an image to the container registry via Jenkins. It was working at first, but now I get "access denied":
docker -- push gcr.io/xxxxxxx-yyyyy-138623/myApp:master.1
The push refers to a repository [gcr.io/xxxxxxx-yyyyy-138623/myApp]
bdc3ba7fdb96: Preparing
5632c278a6dc: Waiting
denied: Access denied.
The Jenkinsfile looks like:
sh("gcloud docker --authorize-only")
sh("docker -- push gcr.io/xxxxxxx-yyyyy-138623/hotelpro4u:master.1")
Remarks:
Jenkins is running in Google Cloud
If I try from Google Cloud Shell or from my computer, it works
I followed this tutorial : https://github.com/GoogleCloudPlatform/continuous-deployment-on-kubernetes
I've been stuck for 12 hours. I need help.
That error means that the GKE node is not authorized to push to the GCS bucket that is backing your repository.
This could be because:
The cluster does not have the correct scopes to authenticate to GCS. Did you create the cluster w/ --scopes storage-rw?
The service account that the cluster is running as does not have permissions on the bucket. Check the IAM & Admin section on your project to make sure that the service account has the necessary role.
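For the second case, a hedged example with placeholder names (for gcr.io the backing bucket is typically artifacts.PROJECT_ID.appspot.com; regional registries use a different bucket name):
# Grant the node service account write access to the Container Registry bucket.
gsutil iam ch \
  serviceAccount:NODE_SA@PROJECT_ID.iam.gserviceaccount.com:objectAdmin \
  gs://artifacts.PROJECT_ID.appspot.com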
Building on @cj-cullen's answer above, you have two options:
Destroy the node pool and then, from the CLI, recreate it with the missing https://www.googleapis.com/auth/projecthosting,storage-rw scopes (a sketch follows below). The GKE console does not let you change the default scopes when creating a node pool.
Stop each instance in your cluster. In the console, click the edit button for the instance. You should now be able to add the appropriate https://www.googleapis.com/auth/projecthosting,storage-rw scope.
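A sketch of option 1 with illustrative cluster and zone names:
# Recreate the node pool with the missing scopes so new nodes can push to GCS.
gcloud container node-pools delete default-pool \
  --cluster=my-cluster --zone=us-central1-a
gcloud container node-pools create default-pool \
  --cluster=my-cluster --zone=us-central1-a \
  --scopes=https://www.googleapis.com/auth/projecthosting,storage-rw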
