Spring Cloud Dataflow: how to persist stream definitions

I am using the local server of spring cloud dataflow. Each time I restart the server, all deployed apps and stream definitions are lost. How can I persist my stream definitions so that they survive server restarts?

As of RC1, the stream/task/job definitions, among other metadata, can be configured to persist in an RDBMS, and there's support for many of the commonly used databases. If nothing is provided, the default embedded H2 database is used, which is in-memory and recommended only for development purposes.
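For example, a minimal sketch of starting the local server against MySQL instead of the embedded H2, using standard Spring Boot datasource properties (the jar version, database host, and credentials below are illustrative placeholders, and the driver class assumes a MariaDB/MySQL driver is available on the server's classpath):
# point the Data Flow server at an external database so definitions survive restarts
java -jar spring-cloud-dataflow-server-local-<version>.jar \
  --spring.datasource.url=jdbc:mysql://localhost:3306/dataflow \
  --spring.datasource.username=scdf \
  --spring.datasource.password=secret \
  --spring.datasource.driver-class-name=org.mariadb.jdbc.Driver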

Related

Comparison StreamPipes vs Spring Cloud Dataflow

I'm comparing Apache StreamPipes and SCDF (Spring Cloud Dataflow).
I found out that there are some similarities:
Components of the stream are executed as microservices via wrappers (Flink/standalone).
Both internally use a message broker to automatically create the required topics and connect the pipeline components through them.
I found nothing about support for using Kubernetes as an execution engine. Is something planned for the future? Does anyone know of other differences/similarities?

Spring Cloud Data Flow Stream Deployment to Cloud Foundry

I am new to Spring Cloud Data Flow. I am trying to build a simple HTTP source and RabbitMQ sink stream using SCDF stream apps. The stream should be deployed on OSCF (Cloud Foundry). Once deployed, the stream should be able to receive HTTP POST requests and send the request data to RabbitMQ.
So far, I have downloaded the Data Flow Server using the link below and pushed it to Cloud Foundry. I am using the Shell application from my local machine.
https://dataflow.spring.io/docs/installation/cloudfoundry/cf-cli/.
I also have the HTTP source and RabbitMQ sink applications deployed in CF. The RabbitMQ service is also bound to the sink application.
My question: how can I create a stream using applications deployed in CF? Registering an app requires an HTTP/file/Maven URI, but I am not sure how an app already deployed on CF can be registered.
Appreciate your help. Please let me know if more details are needed.
Thanks
If you're using the out-of-the-box apps that we ship, the relevant Maven repo configuration is already set within SCDF, so you can deploy the http app right away; SCDF will resolve and pull it from the Spring Maven repository and then deploy that application to CF.
However, if you're building custom apps, you can configure your internal/private Maven repositories in SCDF/Skipper and then register your apps using the coordinates from your internal repo.
If Maven is not a viable solution for you on CF, I have seen customers resolve artifacts from S3 buckets and persistent-volume services in CF.
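As a minimal sketch of the Maven-based registration flow from the Data Flow shell (the app names, artifact coordinates, and versions are illustrative; substitute the ones matching your binder and release):
dataflow:>app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-rabbit:1.3.1.RELEASE
dataflow:>app register --name rabbit --type sink --uri maven://org.springframework.cloud.stream.app:rabbit-sink-rabbit:1.3.1.RELEASE
dataflow:>stream create --name http-to-rabbit --definition "http | rabbit" --deploy
Once deployed, the http app exposes an endpoint on CF that accepts POST requests and forwards the payload to RabbitMQ via the sink.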

Duplicated port of child tasks in Spring Cloud Data Flow

When I launch a new task (Spring Batch Job) using Spring Cloud Data Flow, I see that SCDF automatically initializes Tomcat on some "random" port, but I do not know whether these ports are chosen randomly or follow some rule of the framework.
As a result, I sometimes run into the error "Web server failed to start. Port 123456 was already in use".
In conclusion, my questions are:
1) How does the framework choose the port to initialize on (randomly or by some principle)?
2) Is there any way to launch tasks without duplicated ports (a fixed configuration, or a method for choosing an unused port at launch time)?
I don't think SCDF has anything to do with the port assignment.
It is your task application that gets launched. You need to decide whether you really need the web dependency that brings Tomcat into your application.
Assuming you use Spring Boot, you can either exclude the web starter from your dependencies or pass the command-line argument server.port=<?> to pick a specific port when launching the task (if you really need this task app to be a web app); see the sketch below.
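A minimal sketch of the second option from the Data Flow shell (the task name is illustrative); server.port=0 asks Spring Boot to bind a random free port, which avoids the clash, or you can pin a port you know is free:
dataflow:>task launch --name my-batch-task --arguments "--server.port=0"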

Stream apps not using the buildpack provided in SCDF server environment variable (SCDF ver 2.1.2)

Recently, I upgraded from SCDF 1.7.3 to SCDF 2.1.2 for Cloud Foundry. I am also using Skipper (I have to with 2.x). There are two main problems I am facing:
The buildpack given as a property in the SCDF server environment is not being used to deploy stream applications. This is the env key I am using:
SPRING_CLOUD_DATAFLOW_STREAM_PLATFORM_CLOUDFOUNDRY_ACCOUNTS[xxx]_DEPLOYMENT_BUILDPACK. This has no effect at all.
Even though I set SPRING_CLOUD_DATAFLOW_STREAM_PLATFORM_CLOUDFOUNDRY_ACCOUNTS[xxx]_DEPLOYMENT_ENABLE_RANDOM_APP_NAME_PREFIX to false, Skipper still generates a random prefix for these applications.
I am not sure what I am doing wrong. Any advice will be of great help.
There are no stream platform properties with the prefix SPRING_CLOUD_DATAFLOW_STREAM_PLATFORM_CLOUDFOUNDRY in Spring Cloud Data Flow as the stream deployments are managed by Spring Cloud Skipper. Hence, you need to use the Skipper properties for stream deployment-related configurations.
The correct properties to use in this case are:
SPRING_CLOUD_SKIPPER_SERVER_PLATFORM_CLOUDFOUNDRY_ACCOUNTS[xxx]_DEPLOYMENT_ENABLERANDOMAPPNAMEPREFIX: false
SPRING_CLOUD_SKIPPER_SERVER_PLATFORM_CLOUDFOUNDRY_ACCOUNTS[xxx]_DEPLOYMENT_BUILDPACK:
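For example, a sketch of applying these to a Skipper server pushed with the cf CLI (the application name and buildpack value are illustrative placeholders):
# set the Skipper-scoped deployment properties and restage so they take effect
cf set-env skipper-server "SPRING_CLOUD_SKIPPER_SERVER_PLATFORM_CLOUDFOUNDRY_ACCOUNTS[xxx]_DEPLOYMENT_BUILDPACK" java_buildpack_offline
cf set-env skipper-server "SPRING_CLOUD_SKIPPER_SERVER_PLATFORM_CLOUDFOUNDRY_ACCOUNTS[xxx]_DEPLOYMENT_ENABLERANDOMAPPNAMEPREFIX" false
cf restage skipper-server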

Configure Spring Cloud Task to use the Kafka of the Spring Cloud Data Flow server

I have a Spring Cloud Data Flow (SCDF) server running on a Kubernetes cluster with Kafka as the message broker. Now I am trying to launch a Spring Cloud Task (SCT) that writes to a topic in Kafka. I would like the SCT to use the same Kafka that SCDF is using. This brings up two questions that I hope can be answered:
How to configure the SCT to use the same Kafka as SCDF?
Is it possible to configure the SCT so that the Kafka server URI is passed to the SCT automatically when it launches, similar to the data source properties that get passed to the SCT at launch?
As I could not find any examples on how to achieve this, help is very appreciated.
Edit: My own answer
This is how I get it working for my case. My SCT requires spring.kafka.bootstrap-servers to be supplied. From SCDF's shell, I provide it as an argument --spring.kafka.bootstrap-servers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}, where KAFKA_SERVICE_HOST and KAFKA_SERVICE_PORT are environment variables created by SCDF's k8s setup script.
This is how to launch the task within SCDF's shell
dataflow:>task launch --name sample-task --arguments "--spring.kafka.bootstrap-servers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}"
You may want to review the Spring Cloud Task Events section in the reference guide.
The expectation is that you'd choose the binder of your choice and package that library on the Task application's classpath. With that dependency, you'd then configure the application with Spring Cloud Stream's Kafka binder properties, such as spring.cloud.stream.kafka.binder.brokers and others relevant to connecting to the existing Kafka cluster.
Upon launching the Task application (from SCDF) with these configurations, you'd be able to publish or receive events in your Task app.
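A minimal sketch of that per-app configuration in the Task application's own application.properties, assuming the Spring Cloud Stream Kafka binder is on its classpath (the placeholders resolve from the KAFKA_SERVICE_HOST/KAFKA_SERVICE_PORT environment variables that the Kubernetes setup injects, as in the shell example above):
# application.properties of the Task app: point the Kafka binder at the existing cluster
spring.cloud.stream.kafka.binder.brokers=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT}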
Alternatively, with the Kafka binder in the classpath of the Task application, you can define the Kafka binder properties for all the Tasks launched by SCDF via global configuration. See Common Application Properties in the ref. guide for more information. In this model, you don't have to configure each Task application with Kafka properties explicitly; instead, SCDF would propagate them automatically when it launches the Tasks. Keep in mind that these properties would be supplied to all the Task launches.
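As a sketch of that global approach on Kubernetes (the deployment name and broker address are illustrative, and the task-scoped common-property prefix is the one described in the Common Application Properties section of the reference guide):
# add a server-level common property so every launched task receives the broker list
kubectl set env deployment/scdf-server \
  SPRING_CLOUD_DATAFLOW_APPLICATIONPROPERTIES_TASK_SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=kafka:9092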
