Configure Spring Cloud Task to use the Kafka of the Spring Cloud Data Flow server - spring-cloud-dataflow

I have a Spring Cloud Data Flow (SCDF) server running on a Kubernetes cluster with Kafka as the message broker. Now I am trying to launch a Spring Cloud Task (SCT) that writes to a topic in Kafka. I would like the SCT to use the same Kafka that SCDF is using. This brings up two questions that I hope can be answered:
How to configure the SCT to use the same Kafka as SCDF?
Is it possible to configure the SCT so that the Kafka server URI is passed to it automatically when it launches, similar to
the data source properties that get passed to the SCT at launch?
As I could not find any examples of how to achieve this, any help is much appreciated.
Edit: My own answer
This is how I got it working for my case. My SCT requires spring.kafka.bootstrap-servers to be supplied. From SCDF's shell, I provide it as an argument --spring.kafka.bootstrap-servers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}, where KAFKA_SERVICE_HOST and KAFKA_SERVICE_PORT are environment variables created by SCDF's k8s setup script.
This is how to launch the task within SCDF's shell:
dataflow:>task launch --name sample-task --arguments "--spring.kafka.bootstrap-servers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}"

You may want to review the Spring Cloud Task Events section in the reference guide.
The expectation is that you'd choose the binder of your choice and package that library in the Task application's classpath. With that dependency, you'd then configure the application with Spring Cloud Stream's Kafka binder properties, such as spring.cloud.stream.kafka.binder.brokers and others that are relevant to connect to the existing Kafka cluster.
Upon launching the Task application (from SCDF) with these configurations, you'd be able to publish or receive events in your Task app.
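For example, launching the task with the binder broker address passed as a command-line argument might look roughly like this from the SCDF shell (a sketch only; the task name is illustrative and the broker address reuses the environment variables mentioned above):
dataflow:>task launch --name sample-task --arguments "--spring.cloud.stream.kafka.binder.brokers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}"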
Alternatively, with the Kafka binder in the classpath of the Task application, you can define the Kafka binder properties for all the Tasks launched by SCDF via global configuration. See Common Application Properties in the ref. guide for more information. In this model, you don't have to configure each Task application with Kafka properties explicitly; instead, SCDF propagates them automatically when it launches the Tasks. Keep in mind that these properties would be supplied to all Task launches.
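A minimal sketch of such a global setting, using SCDF's common-application-properties prefix for tasks (the broker value is illustrative; this would be set as a configuration property or environment variable on the SCDF server itself):
spring.cloud.dataflow.applicationProperties.task.spring.cloud.stream.kafka.binder.brokers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}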

Related

Comparison StreamPipes vs Spring Cloud Dataflow

I'm comparing Apache StreamPipes and SCDF (Spring Cloud Dataflow).
I found out that there are some similarities:
Components of the stream are executed as microservices via wrappers (Flink/standalone)
Internally uses a message broker to automatically create the required topics and connect pipeline components through them
I found nothing about support for using Kubernetes as an execution engine. Is something planned for the future? Does anyone know of other differences/similarities?

Spring Cloud Data Flow Stream Deployment to Cloud Foundry

I am new to Spring Cloud Data Flow. I am trying to build a simple HTTP source and RabbitMQ sink stream using SCDF stream apps. The stream should be deployed on OSCF (Cloud Foundry). Once deployed, the stream should be able to receive HTTP POST requests and send the request data to RabbitMQ.
So far, I have downloaded the Data Flow server using the link below and pushed it to Cloud Foundry. I am using the Shell application from my local machine.
https://dataflow.spring.io/docs/installation/cloudfoundry/cf-cli/.
I also have the HTTP source and RabbitMQ sink applications deployed in CF. The RabbitMQ service is also bound to the sink application.
My question - how can I create a stream using applications deployed in CF? Registering an app requires an HTTP/File/Maven URI, but I am not sure how an app already deployed on CF can be registered.
Appreciate your help. Please let me know if more details are needed.
Thanks
If you're using the out-of-the-box apps that we ship, the relevant Maven repo configuration is already set within SCDF, so you can deploy the http app right away, and SCDF would resolve and pull it from the Spring Maven repository and then deploy that application to CF.
However, if you're building custom apps, you can configure your internal/private Maven repositories in SCDF/Skipper and then register your apps using the coordinates from your internal repo.
If Maven is not a viable solution for you on CF, I have seen customers resolve artifacts from s3 buckets and persistent-volume services in CF.
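For illustration, registering an out-of-the-box app or a custom app from a Maven repo looks roughly like this from the SCDF shell (the coordinates and versions are illustrative; the custom sink is hypothetical):
dataflow:>app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-rabbit:2.1.0.RELEASE
dataflow:>app register --name my-sink --type sink --uri maven://com.example:my-rabbit-sink:0.0.1-SNAPSHOT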

Enabling Scheduler for spring cloud data flow server in pcf

We are using PCF to run our applications. To build data pipelines, we thought of leveraging the Spring Cloud Data Flow server, which is offered as a service inside PCF.
We created a Data Flow server by providing SQL Server and Maven repo details. For the scheduler, we didn't provide any extra parameters while creating the service, so by default it is disabled.
I got some info from here on how to enable the scheduler: https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#_enabling_scheduling
So I tried updating the existing Data Flow service with the below command:
cf update-service my-service -c '{"spring.cloud.dataflow.features.schedules-enabled":true}'
The Data Flow server restarted, but the scheduler is still not enabled to schedule jobs.
When I check the GET /about endpoint of the Data Flow server, I still get
"schedulesEnabled": false
in the response body.
I am not sure why the SCDF service isn't updated with the schedules-enabled property even after you update the service (it is expected to be enabled).
Irrespective of that, you can try setting the following as an environment property on the SCDF service instance as well:
SPRING_CLOUD_DATAFLOW_FEATURES_SCHEDULES_ENABLED: true
Once scheduling is enabled, you need to make sure that you have the following properties set correctly as well:
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: <all-the-services-for-tasks-along-with-the-scheduler-service-instance>
SPRING_CLOUD_SCHEDULER_CLOUDFOUNDRY_SCHEDULER_URL: <scheduler-url>
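For example, if the Data Flow server were pushed as a regular application (here hypothetically named dataflow-server) rather than provisioned as a managed service, the property could be applied roughly like this:
cf set-env dataflow-server SPRING_CLOUD_DATAFLOW_FEATURES_SCHEDULES_ENABLED true
cf restage dataflow-server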

Duplicated port of child tasks in Spring Cloud Data Flow

When I launch a new task (Spring Batch job) using Spring Cloud Data Flow, I see that SCDF auto-initializes Tomcat with some "random" ports, but I do not know whether these ports are chosen randomly or follow some rule of the framework.
Therefore, I sometimes run into the error "Web server failed to start. Port 123456 was already in use".
In conclusion, my questions are:
1) How does the framework choose ports at initialization (randomly or by some rule)?
2) Is there any way to launch tasks without duplicated ports (a fixed configuration, or a method for choosing an unused port at launch time)?
I don't think SCDF has anything to do with the port assignment.
It is your task application that gets launched. You need to decide whether you really need the web dependency that brings Tomcat into your application.
Assuming you use Spring Boot, you can either exclude the web starter from your dependencies or pass the command-line argument server.port=<?> with a specific port when launching the task (if you really need this task app to be a web app).
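For example, from the SCDF shell you could pin the port explicitly, or pass 0 so Spring Boot picks a free port at startup (the task name and port value are illustrative):
dataflow:>task launch --name my-batch-task --arguments "--server.port=0"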

scdf 1.7.3 docker k8s function-runner does not start

Trying to deploy a function-runner into a stream on SCDF Kubernetes:
http --server.port=9001 | f-run: function-runner --function.className=com.example.functions.CharCounter --class-name=com.example.functions.CharCounter --location="maven://io.spring.sample:function-sample:jar:1.0.2" | log
I've created a Docker image using function-runner-kafka 1.1.0.M1.
I always get:
***************************
APPLICATION FAILED TO START
***************************
Description:
Binding to target org.springframework.cloud.stream.app.function.app.FunctionProperties#264f218 failed:
Property: function.className
Value: null
Reason: may not be empty
Action:
Update your application's configuration
In the stream definition, how can I set the Maven URI to the function jar?
I want to run function-runner with the jar code on SCDF Kubernetes.
The function-runner model is deprecated in favor of native Spring Cloud Function integration available in Spring Cloud Stream.
You can simply build a Spring Cloud Stream application with just function @Beans and have them be part of the functional flow as a chain, or resolved as individual functions at runtime.
See ref. guide for more details.
Once you have an application with multiple function @Beans, you also have the ability to compose them with other App Starters and use them in the SCDF DSL, too.
Refer to this blog on this subject for more background.
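As a rough sketch of the functional model (class and function names are illustrative; it assumes spring-cloud-stream and the Kafka binder are on the classpath so the function gets bound to broker destinations at runtime):

import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class CharCounterApplication {

    public static void main(String[] args) {
        SpringApplication.run(CharCounterApplication.class, args);
    }

    // Acts as a processor: counts the characters of each incoming payload
    // and emits the count downstream.
    @Bean
    public Function<String, Integer> charCounter() {
        return payload -> payload.length();
    }
}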
