Enabling Scheduler for Spring Cloud Data Flow server in PCF - spring-cloud-dataflow

We are using PCF to run our applications. To build data pipelines we thought of leveraging the Spring Cloud Data Flow server, which is offered as a service inside PCF.
We created a Data Flow server by supplying the SQL Server and Maven repo details. For the scheduler we didn't provide any extra parameters while creating the service, so it is disabled by default.
I found some information here on how to enable the scheduler: https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#_enabling_scheduling
So I tried updating the existing Data Flow service with the command below:
cf update-service my-service -c '{"spring.cloud.dataflow.features.schedules-enabled":true}'
The Data Flow server restarted, but the scheduler is still not enabled to schedule jobs.
When I check the GET /about endpoint of the Data Flow server, I still get
"schedulesEnabled": false
in the response body.

I am not sure why the SCDF service isn't picking up the schedules-enabled property even after the service update (it is expected to enable it). Regardless of that, you can try setting the following as an environment property on the SCDF service instance as well:
SPRING_CLOUD_DATAFLOW_FEATURES_SCHEDULES_ENABLED: true
Once scheduling is enabled, make sure the following properties are also set correctly:
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: <all-the-services-for-tasks-along-with-the-scheduler-service-instance>
SPRING_CLOUD_SCHEDULER_CLOUDFOUNDRY_SCHEDULER_URL: <scheduler-url>
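If your Data Flow server runs as an app you can set environment variables on (for example, if you pushed the server yourself rather than relying purely on the tile-managed instance), one way to apply the same settings is a sketch like the following, where my-dataflow-server, my-scheduler and the scheduler URL are placeholders for your own app name, scheduler service instance and foundation URL:
cf set-env my-dataflow-server SPRING_CLOUD_DATAFLOW_FEATURES_SCHEDULES_ENABLED true
cf set-env my-dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES my-scheduler
cf set-env my-dataflow-server SPRING_CLOUD_SCHEDULER_CLOUDFOUNDRY_SCHEDULER_URL https://scheduler.sys.example.com
cf restage my-dataflow-server
After the restage, GET /about should report "schedulesEnabled": true.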

Related

Cloud Scheduler has Permission Denied when attempting to run a Cloud Run job

I have created a simple Cloud Run job. I am able to trigger this code via a curl command:
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" https://sync-<magic>.a.run.app
(Obviously <magic> is actually something else)
Cloud Run is configured with ingress set to "Allow all traffic" and with authentication required.
I followed this documentation: https://cloud.google.com/run/docs/triggering/using-scheduler
I created a service account, granted it the Cloud Run Invoker role, and then set up an HTTP scheduled job to GET the same URL I tested with curl. I have Add OIDC token selected, and I provide the service account created above and the Audience, which is the same URL I used with curl.
When I attempt to trigger this job (or when it triggers based on its cron schedule), it fails with:
{ "status": "PERMISSION_DENIED", "#type": "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished", "targetType": "HTTP", "jobName": "projects/<project>/locations/<region>/jobs/sync", "url": "https://sync-<magic>.a.run.app/" }
Again <project>, <region> and <magic> have real values.
I tried using service-YOUR_PROJECT_NUMBER@gcp-sa-cloudscheduler.iam.gserviceaccount.com, with YOUR_PROJECT_NUMBER updated appropriately, as the service account that runs the scheduled job. It has the same error.
Any advice on how to debug this would be greatly appreciated!
Here is what I did that solved the issue altogether; I now get the success flag when running a secure Cloud Run service via a Cloud Scheduler job:
Create your service on Cloud Run - let's call it "hello" - and secure it by removing the "allUsers" principal from its list of permissions. Going to the endpoint should then return an error such as: Error: Forbidden
Your client does not have permission to get URL / from this server.
Create an IAM service account for Cloud Scheduler - let's call it "cloud-scheduler" - which gives you cloud-scheduler@project-ID.iam.gserviceaccount.com. Now comes the important part:
Give your SA the ability to run scheduler jobs by adding the Cloud Run Invoker and Cloud Scheduler Job Runner roles.
Create your Cloud Scheduler job and add the new SA to it according to the Google procedure:
Auth header: Add OIDC token
Service account: cloud-scheduler@project-id.iam.gserviceaccount.com
Audience: https://Service.url.from.cloud.run.service/
Add an additional principal to your Cloud Run service so that your SA has the Cloud Run Invoker role.
Run your scheduler and voila - all green!
Enjoy
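For reference, a rough gcloud sketch of the same steps (hello, cloud-scheduler, PROJECT_ID, REGION, the schedule and the <magic> URL suffix are all placeholders to adapt to your project):
gcloud iam service-accounts create cloud-scheduler --display-name="Cloud Scheduler invoker"
gcloud run services add-iam-policy-binding hello --region=REGION --member="serviceAccount:cloud-scheduler@PROJECT_ID.iam.gserviceaccount.com" --role="roles/run.invoker"
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:cloud-scheduler@PROJECT_ID.iam.gserviceaccount.com" --role="roles/cloudscheduler.jobRunner"
gcloud scheduler jobs create http hello-job --location=REGION --schedule="*/15 * * * *" --http-method=GET --uri="https://hello-<magic>.a.run.app/" --oidc-service-account-email="cloud-scheduler@PROJECT_ID.iam.gserviceaccount.com" --oidc-token-audience="https://hello-<magic>.a.run.app/"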
I tried creating a new service account and giving it the Cloud Run Invoker role, and disabling and re-enabling the Cloud Scheduler API.
The only thing that worked for me was changing the Auth header from Add OIDC token to None.
For some reason Cloud Scheduler changes None back to Add OIDC token and then triggers Cloud Run normally.

Spring Cloud Data Flow Stream Deployment to Cloud Foundry

I am new to Spring Cloud Data Flow. I am trying to build a simple stream with an HTTP source and a RabbitMQ sink using the SCDF stream apps. The stream should be deployed on OSCF (Cloud Foundry). Once deployed, the stream should be able to receive an HTTP POST request and send the request data to RabbitMQ.
So far, I have downloaded the Data Flow server using the link below and pushed it to Cloud Foundry. I am using the Shell application from my local machine.
https://dataflow.spring.io/docs/installation/cloudfoundry/cf-cli/.
I also have an HTTP source and a RabbitMQ sink application deployed in CF, and the RabbitMQ service is bound to the sink application.
My question: how can I create a stream using applications already deployed in CF? Registering an app requires an HTTP/File/Maven URI, but I am not sure how an app already deployed on CF can be registered.
Appreciate your help. Please let me know if more details are needed.
Thanks
If you're using the out-of-the-box apps that we ship, the relevant Maven repo configuration is already set within SCDF, so you can deploy the http app right away; SCDF resolves and pulls it from the Spring Maven repository and then deploys that application to CF.
However, if you're building custom apps, you can configure your internal/private Maven repositories in SCDF/Skipper and then register your apps using the coordinates from your internal repo.
If Maven is not a viable solution for you on CF, I have seen customers resolve artifacts from s3 buckets and persistent-volume services in CF.
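As a sketch of the custom-app path (the repo name, URL and Maven coordinates below are placeholders), you would point the Data Flow/Skipper servers at your internal repository, for example via
maven.remote-repositories.internal.url=https://nexus.example.com/repository/maven-releases
and then register the app from the Data Flow shell using those coordinates:
dataflow:>app register --name http --type source --uri maven://com.example:http-source-rabbit:1.0.0
Once registered, the app can be used in a stream definition just like the out-of-the-box ones.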

How to determine service account used to run Dataflow job?

My Dataflow job fails when it tries to access a secret:
"Exception in thread "main" com.google.api.gax.rpc.PermissionDeniedException: io.grpc.StatusRuntimeException: PERMISSION_DENIED: Permission 'secretmanager.versions.access' denied for resource 'projects/REDACTED/secrets/REDACTED/versions/latest' (or it may not exist)."
I launch the job using gcloud dataflow flex-template run. I am able to view the secret in the console. The same code works when I run it on my laptop. As I understand it, when I submit a job with the above command, it runs under a service account that may have different permissions. How do I determine which service account the job runs under?
Dataflow creates workers, which in turn create Compute Engine instances. You can check this in Logging:
Open GCP console
Open Logging -> Logs Explorer (make sure you are not using the "Legacy Logs Viewer")
At the query builder type in protoPayload.serviceName="compute.googleapis.com"
Click Run Query
Expand the entry for v1.compute_instances.create or any other resource used by compute.googleapis.com
You should be able to see the service account used for creating the instance. This service account is used for everything related to running the Dataflow job.
Take note that I tested this using the official Dataflow quick start.
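If you prefer the CLI, a rough equivalent of that query (the limit and output format are just examples) is:
gcloud logging read 'protoPayload.serviceName="compute.googleapis.com"' --limit=5 --format=json
Expanding the returned entries should show the same service account details as the console.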
By default the Dataflow worker nodes run with the Compute Engine default service account (YOUR_PROJECT_NUMBER-compute@developer.gserviceaccount.com), which lacks the "Secret Manager Secret Accessor" role.
Either add that role to the service account, or specify a different service account in the pipeline options:
gcloud dataflow flex-template run ... --parameters service_account_email="your-service-account-name@YOUR_PROJECT_NUMBER.iam.gserviceaccount.com"
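For the first option, a minimal sketch of granting the default worker service account access to secrets (the project ID and number are placeholders):
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID --member="serviceAccount:YOUR_PROJECT_NUMBER-compute@developer.gserviceaccount.com" --role="roles/secretmanager.secretAccessor"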

Duplicated port of child tasks in Spring Cloud Data Flow

When I launch a new task (Spring Batch job) using Spring Cloud Data Flow, I see that SCDF auto-initializes Tomcat on what look like "random" ports, but I do not know whether these ports are chosen randomly or follow some rule of the framework.
As a result, I sometimes hit the error "Web server failed to start. Port 123456 was already in use".
In conclusion, my questions are:
1) How does the framework choose the ports it initializes (randomly or by some principle)?
2) Is there any way to launch tasks reliably without duplicated ports (a fixed configuration, or a way to pick an unused port at launch time)?
I don't think SCDF has anything to do with the port assignment.
It is your task application that gets launched, so you need to decide whether you really need the web dependency that brings Tomcat into your application.
Assuming you use Spring Boot, you can either exclude the web starter from your dependencies or pass the command-line argument server.port=<?> with a specific port when launching the task (if you really need this task app to be a web app).
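For example (a sketch assuming a registered task application named my-task), you can either let Spring Boot pick a free port or disable the embedded web server entirely when launching:
dataflow:>task launch my-task --arguments "--server.port=0"
dataflow:>task launch my-task --arguments "--spring.main.web-application-type=none"
server.port=0 asks Boot to bind to any available port, which avoids clashes when several task instances land on the same host.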

Configure Spring Cloud Task to use the Kafka of the Spring Cloud Data Flow server

I have a Spring Cloud Data Flow (SCDF) server running on Kubernetes cluster with Kafka as the message broker. Now I am trying to launch a Spring Cloud Task (SCT) that writes to a topic in Kafka. I would like the SCT to use the same Kafka that SCDF is using. This brings up two questions that I have and hope they can be answered:
How to configure the SCT to use the same Kafka as SCDF?
Is it possible to configure the SCT so that the Kafka server URI is passed to it automatically when it launches, similar to the data source properties that get passed to the SCT at launch?
As I could not find any examples on how to achieve this, help is very appreciated.
Edit: My own answer
This is how I got it working for my case. My SCT requires spring.kafka.bootstrap-servers to be supplied. From SCDF's shell, I provide it as the argument --spring.kafka.bootstrap-servers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}, where KAFKA_SERVICE_HOST and KAFKA_SERVICE_PORT are environment variables created by SCDF's k8s setup script.
This is how to launch the task within SCDF's shell
dataflow:>task launch --name sample-task --arguments "--spring.kafka.bootstrap-servers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}"
You may want to review the Spring Cloud Task Events section in the reference guide.
The expectation is that you'd choose the binder of choice and pack that library in the Task application's classpath. With that dependency, you'd then configure the application with Spring Cloud Stream's Kafka binder properties such as the spring.cloud.stream.kafka.binder.brokers and others that are relevant to connect to the existing Kafka cluster.
Upon launching the Task application (from SCDF) with these configurations, you'd be able to publish or receive events in your Task app.
Alternatively, with the Kafka binder on the classpath of the Task application, you can define the Kafka binder properties for all the Tasks launched by SCDF via global configuration. See Common Application Properties in the reference guide for more information. In this model, you don't have to configure each Task application with the Kafka properties explicitly; instead, SCDF propagates them automatically when it launches the Tasks. Keep in mind that these properties will be supplied to all Task launches.
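As a sketch of that global approach (the broker address below is a placeholder), the common task property could be set on the SCDF server, either as a property:
spring.cloud.dataflow.applicationProperties.task.spring.cloud.stream.kafka.binder.brokers=kafka-broker.default.svc.cluster.local:9092
or as the equivalent environment variable on the server deployment:
SPRING_CLOUD_DATAFLOW_APPLICATIONPROPERTIES_TASK_SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=kafka-broker.default.svc.cluster.local:9092
Every task launched by that server would then receive the Kafka binder broker setting automatically.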
