Spring Cloud Dataflow Task database configuration

I'm trying to understand the expected behavior when running Batch tasks via Spring Cloud Dataflow with respect to datasource configuration.
Is the idea that the Spring Batch database tables (BATCH_JOB_EXECUTION, etc.) would be in the SCDF database itself? There appears to be some magic happening when launching a task via SCDF: it creates those tables in the SCDF database and appears to use them. Is it injecting the SCDF datasource into my application?
I'm currently running the local server, version 2.0.1. Streams work as expected; they use the datasource configured in application.properties.

Is the idea that the Spring Batch database tables (BATCH_JOB_EXECUTION, etc.) would be in the SCDF database itself?
Correct. Spring Batch, Spring Cloud Task, and SCDF must share a common datasource if you are interested in tracking and managing the lifecycle of batch jobs using the SCDF Shell/Dashboard.
If you include a batch job in the Task application, it is the application itself that creates the Batch and Task schemas when it starts. SCDF doesn't inject datasource credentials unless you intentionally ask it to do so when it launches the Task.
SCDF simply shares the same datasource, so it can in turn query the execution/status tables and show them in the Dashboard.
There's more background on this in the reference guide.
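As a rough illustration (the task name, MySQL coordinates, and credentials below are hypothetical), the launched task can be pointed at the same database SCDF uses by passing standard Spring Boot datasource properties as command-line arguments from the SCDF shell:

# task name and datasource coordinates below are illustrative
dataflow:> task create my-batch-task --definition "my-batch-job"
dataflow:> task launch my-batch-task --arguments "--spring.datasource.url=jdbc:mysql://localhost:3306/dataflow --spring.datasource.username=root --spring.datasource.password=secret --spring.datasource.driver-class-name=com.mysql.jdbc.Driver"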

Related

Spring Cloud Data Flow - rabbitmq source sink example

I am looking at the documentation for Spring Cloud Data Flow.
https://dataflow.spring.io/docs/recipes/rabbitmq/rabbit-source-sink/
This example, which uses RabbitMQ as source and sink, is built with the Spring Cloud Stream framework, which is fine. But it doesn't show how these three apps (source, processor, and sink) can be deployed to Spring Cloud Data Flow (SCDF); it simply runs three jars locally, and they talk to each other via RabbitMQ queues.
I am not sure how this shows the use of SCDF; there's no involvement of SCDF here. A proper example that shows how to deploy these jars as apps inside SCDF needs to be provided. Am I missing anything? I am hoping somebody else has tried this and can share their feedback about my concern.
The documentation here covers the SCDF side of how to manage those source, processor, and sink applications.
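For reference, registering and deploying such applications from the SCDF shell looks roughly like the following (the app names and Maven coordinates are placeholders; the register/create/deploy flow is the SCDF-specific part):

# app names and Maven coordinates below are placeholders
dataflow:> app register --name rabbit-source --type source --uri maven://com.example:rabbit-source:1.0.0
dataflow:> app register --name my-processor --type processor --uri maven://com.example:my-processor:1.0.0
dataflow:> app register --name rabbit-sink --type sink --uri maven://com.example:rabbit-sink:1.0.0
dataflow:> stream create --name rabbit-pipeline --definition "rabbit-source | my-processor | rabbit-sink"
dataflow:> stream deploy rabbit-pipeline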

Permanently register app in Spring Data flow application

I have integrated Spring Data Flow and uploaded the application jar through the dashboard. However, whenever I restart the Data Flow application, I lose the app's mapping to the JAR. How can I make the registration permanent in Spring Data Flow?
I tried various places to register the app permanently but all in vain.
Thanks,
Dhruv
You need to add a datasource configuration to the Spring Data Flow application.
By default, it uses the embedded H2 database, and hence the app registrations are lost on restart.
Once I added the DB configuration, the issue was resolved.
Add the following lines to application.properties for MySQL:
server.port=8081
spring.datasource.url=jdbc:mysql://localhost:3306/app_batch
spring.datasource.username=root
spring.datasource.password=
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.hibernate.ddl-auto=none
SCDF requires a persistent RDBMS such as MySQL or Oracle for production deployments.
The app registry (i.e., a registry of app coordinates), task/batch execution history, stream/task definitions, audit trails, and other metadata about everything you deploy via SCDF are tracked in this persistent database.
If you don't provide one, SCDF uses H2, an in-memory database, by default. Though it allows you to bootstrap rapidly, it should not be used in production deployments: if the server restarts or crashes, the in-memory footprint goes away and a fresh session is created. That's why persistent storage is a requirement, so the data survives independently even when SCDF restarts.
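The same override can also be supplied on the command line when starting the server instead of editing application.properties; a minimal sketch, assuming a local MySQL instance (jar name, version, and credentials are illustrative):

# jar name/version and credentials below are illustrative
java -jar spring-cloud-dataflow-server-2.0.1.RELEASE.jar \
  --spring.datasource.url=jdbc:mysql://localhost:3306/scdf \
  --spring.datasource.username=root \
  --spring.datasource.password=secret \
  --spring.datasource.driver-class-name=com.mysql.jdbc.Driver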

Spring Cloud Dataflow REST API: deploying Spring Batch-specific REST API and Console standalone?

I need a Spring Batch Admin-like application to embed in my own Spring Batch-powered Spring Boot application.
The Spring website says it is deprecated and has been moved to the Spring Attic. They recommend making use of the Spring Cloud Data Flow Console.
I investigated this, and it appears there is a lot of additional functionality I don't need; all I want to do is inspect and retry batch job executions.
Is there a way of getting only this functionality, short of carving the Jobs controllers out of the REST API implementation and building my own admin screens?
Yes, it is possible; however, you'd still have to use SCDF to gain access to the REST APIs.
Once you have SCDF running, you have access to the Task/Batch-job-specific REST endpoints, which you can use in your custom dashboard tooling.
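As a rough sketch of what those endpoints look like (assuming SCDF is running locally on its default port 9393; the execution id is illustrative):

# SCDF default port 9393; execution id 1 is illustrative
curl http://localhost:9393/jobs/executions                          # list batch job executions
curl http://localhost:9393/jobs/executions/1                        # inspect a single execution
curl -X PUT "http://localhost:9393/jobs/executions/1?restart=true"  # restart a failed execution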

Spring Cloud data flow does not show Spring cloud task execution details

The Spring Cloud Data Flow documentation mentions:
When executing tasks externally (i.e. command line) and you wish for Spring Cloud Data Flow to show the TaskExecutions in its UI, be sure that common datasource settings are shared among the both. By default Spring Cloud Task will use a local H2 instance and the execution will not be recorded to the database used by Spring Cloud Data Flow.
I am new to Spring Cloud Data Flow and Spring Cloud Task. Can somebody help me set up a common datasource for both? For development I'm using the embedded H2 database. Can I use the embedded one and still see task execution details in Spring Flo/Dashboard?
A common "datasource" must be shared between Spring Cloud Data Flow (SCDF) and your Spring Cloud Task (SCT) applications in order to track and monitor task executions. If the datasource is not shared, both SCDF and SCT applications by default use a individual H2 database. And because they are on different databases, the task-executions in SCDF won't have visibility to independent execution history of SCT microservice applications.
Make sure to supply common DB properties to both. In your case, you can supply the same H2 DB properties. It is as simple as Spring Boot DB property overrides.
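As a minimal sketch, assuming you keep the embedded H2 that the local SCDF server exposes over TCP (the port and database name below reflect the local server's defaults, but verify them against your setup), the task application's application.properties could point at it like this:

# defaults of the local SCDF server's embedded H2; verify against your setup
spring.datasource.url=jdbc:h2:tcp://localhost:19092/mem:dataflow
spring.datasource.username=sa
spring.datasource.password=
spring.datasource.driver-class-name=org.h2.Driver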

Running multiple spring cloud task jobs within Spring Cloud data flow container on PCF

I am trying to execute multiple Spring Cloud Task jobs within the Spring Cloud Data Flow container on PCF. These jobs read a raw file from an HTTP source, parse it, and write the results to a MySQL DB. These jobs are written in plain Java, not with Spring Batch.
I have bound a MySQL DB to the SCDF container on PCF. I believe Spring Cloud Task will use the MySQL DB to store the execution status of these jobs. I want the actual output records to go into MySQL as well.
My question is: how will the output records for each of these jobs get stored in the MySQL DB? Will it use a different schema for each of these parser jobs? If not, how can I configure it to do so?
Please share your thoughts if you have encountered this scenario.
Thanks!
Nilanjan
To orchestrate Tasks in SCDF, you have to supply an RDBMS, and it looks like you've already done that. The task repository is primarily used to persist Task executions as a historical record, so you can drill into the entire history of executions via the GUI/Shell.
You'd configure the task repository at the server level; see this CF server's manifest.yml sample (under the services: section) for reference.
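A sketch of the relevant part of such a manifest.yml (the application name, jar path, and service-instance name are illustrative; the services: block is what binds the MySQL instance backing the task repository):

# names and paths below are illustrative
applications:
- name: data-flow-server
  memory: 2g
  path: spring-cloud-dataflow-server-cloudfoundry.jar
  services:
    - mysql-scdf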
How will the output records for each of these jobs get stored in the MySQL DB?
If you'd like to use the same datastore for all the tasks as well, it can be configured via the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES environment variable. Anything supplied via this property is automatically propagated to all the Task applications.
However, it is your responsibility to make sure the right database driver is on the classpath of your Task application. In your case, you'd need one of the MySQL drivers.
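A sketch of wiring that up with the CF CLI (the server app name and MySQL service-instance name are illustrative):

# server app name and MySQL service-instance name are illustrative
cf set-env data-flow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES mysql-tasks
cf restage data-flow-server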
Will it use a different schema for each of these parser jobs?
That is up to your business requirements. Whether it is a different schema or a different set of tables, you'd have to determine what your requirements call for and make sure it exists and is set up before binding the Task application via SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES.
If not, how can I configure it to do so?
If you have to use a different datasource, you can supply a different MySQL binding for the Task application that includes your requirement-specific schema/table changes. Review this section to learn how autoconfiguration kicks in on PCF.
As an alternative, you can selectively supply a different MySQL binding for each application, too; here's some documentation on that, and a sketch follows below.
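A sketch of that per-application approach from the SCDF shell (task and service-instance names are illustrative, and the deployer.<app>.cloudfoundry.services deployment property is assumed here as the per-app counterpart of the global TASK_SERVICES setting):

# task names and MySQL service-instance names are illustrative
dataflow:> task launch parser-task-a --properties "deployer.parser-task-a.cloudfoundry.services=mysql-parser-a"
dataflow:> task launch parser-task-b --properties "deployer.parser-task-b.cloudfoundry.services=mysql-parser-b"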
