Permanently register app in Spring Cloud Data Flow application - spring-cloud-dataflow

I have integrated Spring Cloud Data Flow and uploaded an application JAR via the dashboard. However, whenever I restart the Data Flow application I lose the app-to-JAR mapping. How can I make the registration permanent in spring-cloud-dataflow?
I tried registering the app in various places, but all in vain.
Thanks,
Dhruv

You need to add a datasource configuration to the spring-cloud-dataflow application.
By default, it uses the embedded H2 database, which is why the registrations get lost.
Once I added the DB configuration, the issue was resolved.
Add the following lines to application.properties for MySQL:
server.port=8081
spring.datasource.url=jdbc:mysql://localhost:3306/app_batch
spring.datasource.username=root
spring.datasource.password=
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.jpa.hibernate.ddl-auto=none

SCDF requires a persistent RDBMS such as MySQL, Oracle, or others for production deployments.
The app registry (i.e., the registry of app coordinates), task/batch execution history, stream/task definitions, audit trails, and other metadata about everything you deploy via SCDF are tracked in this persistent database.
If you don't provide one, SCDF defaults to H2, an in-memory database. Though it lets you bootstrap rapidly, it should not be used in production deployments: if the server restarts or crashes, the in-memory footprint goes away and a new session is created. That's why persistent storage is a requirement, so the data survives independently even when SCDF restarts.
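For example, the same datasource settings can also be supplied as command-line arguments when starting the SCDF server, instead of editing application.properties (a minimal sketch; the server JAR name, database name, and credentials are placeholders for your own setup):
java -jar spring-cloud-dataflow-server-<version>.jar \
  --spring.datasource.url=jdbc:mysql://localhost:3306/app_batch \
  --spring.datasource.username=root \
  --spring.datasource.password=<password> \
  --spring.datasource.driver-class-name=com.mysql.jdbc.Driver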

Related

Neo4J upgrade: why are the user roles not transferred?

When upgrading from Neo4j 3.3.3 Community to Enterprise (or even between versions), I noticed that the users, user roles, and permissions are not transferred.
Is this normal?
Do I have to set up the users manually every time there's an upgrade because they are stored in a separate DB?
In Neo4j 3.x.x, user and authentication data is stored in various directories at NEO4J_HOME/data/dbms, so you need to make sure you copy this over when you perform an upgrade within 3.x.x, and to sync this yourself if you're using a cluster.
In Neo4j 4.x.x, we introduced the concept of the system database to hold user, database, and security data, and this is automatically synced to a cluster. For backup/restore/upgrade you will need to include the system database when you perform these operations.
So we didn't have the concept of a separate database for this before 4.0; the data only lived within discrete files.
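As a rough sketch of what that looks like in practice (the paths and backup directory are placeholders, and the exact neo4j-admin options depend on your 4.x version):
# Neo4j 3.x.x: copy the auth/role data to the new installation before starting it
cp -r $OLD_NEO4J_HOME/data/dbms $NEW_NEO4J_HOME/data/
# Neo4j 4.x.x (Enterprise): back up the system database alongside your user database
neo4j-admin backup --backup-dir=/backups --database=system
neo4j-admin backup --backup-dir=/backups --database=neo4j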

Spring Cloud Dataflow Task database configuration

I'm trying to understand the expected behavior when running Batch tasks via Spring Cloud Data Flow with respect to datasource configuration.
Is the idea that the Spring Batch database tables (BATCH_JOB_EXECUTION, etc.) would be in the SCDF database itself? There appears to be some magic happening when launching a task via SCDF where it creates those tables in the SCDF database and appears to use them. It seems to be injecting the SCDF datasource into my application?
I'm currently running on the localhost server version 2.0.1. Streams are working as expected, they use the datasource configured in application.properties.
Is the idea that the Spring Batch database tables (BATCH_JOB_EXECUTION, etc.) would be in the SCDF database itself?
Correct. It is required that Spring Batch, Spring Cloud Task, and SCDF share a common datasource if you are interested in tracking and managing the lifecycle of batch jobs using the SCDF Shell/Dashboard.
If you include a batch job in the Task application, it is the application itself that creates the Batch and Task schemas when it starts. SCDF doesn't inject datasource credentials unless you intentionally request it to do so when it launches the Tasks.
SCDF simply shares the same datasource, so it can in turn query the execution/status tables to show them in the Dashboard.
There's some more background in the reference guide.
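As a minimal sketch of how that shared datasource is typically wired, the task application can point at the same database in its own application.properties, or the properties can be passed from the SCDF shell at launch time (the task name my-task, the database name, and the credentials are hypothetical):
# in the task application's application.properties
spring.datasource.url=jdbc:mysql://localhost:3306/scdf
spring.datasource.username=root
spring.datasource.password=<password>
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
# or supplied from the SCDF shell when launching the task
task launch my-task --properties "app.my-task.spring.datasource.url=jdbc:mysql://localhost:3306/scdf,app.my-task.spring.datasource.username=root"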

Running multiple Spring Cloud Task jobs within a Spring Cloud Data Flow container on PCF

I am trying to execute multiple Spring Cloud Task jobs within a Spring Cloud Data Flow container on PCF. These jobs read a raw file from an HTTP source, parse it, and write the results to a MySQL DB. The jobs are written in plain Java, not with Spring Batch.
I have bound a MySQL DB to the SCDF container on PCF. I believe Spring Cloud Task will use the MySQL DB to store the execution status of these jobs. I want the actual output records to go into MySQL as well.
My question is: how will the output records for each of these jobs get stored in the MySQL DB? Will it use a different schema for each of these parser jobs? If not, how can I configure it to do so?
Please share your thoughts if you have encountered this scenario.
Thanks!
Nilanjan
To orchestrate Tasks in SCDF, you'd have to supply an RDBMS and it looks like you've already done that. The task repository is primarily used to persist Task executions as a historical representation, so you can drill into the entire history of executions via the GUI/Shell.
You'd configure the task repository at the server level; see this CF server's manifest.yml sample (under the services: section) for reference.
How will the output records for each of these jobs get stored in the MySQL DB?
If you'd like to also use the same datastore for all the tasks, it can be configured via the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES env var. Any service supplied via this token will be automatically propagated to all the Task applications.
However, it is your responsibility to make sure the right database driver is on the classpath of your Task application. In your case, you'd need one of the MySQL drivers.
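For illustration, the relevant part of the CF manifest might look roughly like this (a sketch only; the app name, the service instance name mysql, and the other settings are placeholders for your environment):
applications:
- name: data-flow-server
  memory: 2G
  path: spring-cloud-dataflow-server-<version>.jar
  services:
    - mysql
  env:
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: mysql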
Will it use a different schema for each of these parser jobs?
That is up to your business requirements. Whether it's a different schema or a different set of tables, you'd have to determine what's needed and make sure it exists and is set up before binding the Task application via SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES.
If not, how can I configure it to do so?
If you have to use a different datasource, you can supply a different MySQL binding for the Task application that includes your requirement-specific schema/table changes. Review this section to learn how autoconfiguration kicks in on PCF.
As an alternative, you can selectively supply a different MySQL binding for each application, too (sketched below) - here are some docs on that.
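A rough sketch of what the per-application binding could look like from the SCDF shell, assuming the CloudFoundry deployer's deployer.<app>.cloudfoundry.services deployment property and hypothetical task/service names (verify the exact property name against the deployer docs for your version):
task launch parser-task-1 --properties "deployer.parser-task-1.cloudfoundry.services=mysql-parser1"
task launch parser-task-2 --properties "deployer.parser-task-2.cloudfoundry.services=mysql-parser2"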

Is all WSO2 API Manager's configuration saved in the database?

Say one implements a WSO2 API Manager Docker instance connecting to a separate database (like MySQL) which is not dockerized. Say some API configuration is made within the API Manager (like referencing a Swagger file on GitHub).
Say someone rebuilds the WSO2 API Manager Docker image (to modify CSS files for example), will the past configuration still be available from the separate database? Or does one have to reconfigure everything in the new Docker instance?
To put it in another way, if one needs to reconfigure everything, is there an easy way to do it? Something automatic?
All the configurations are stored in the database. (Some are stored in the internal registry, but the registry persists its data to the database in the end.)
API artifacts (Synapse files) are saved on the file system [1]. You can use API Manager's API import/export tool to migrate API artifacts (and all other related files such as Swagger definitions, images, sequences, etc.) from one server to another.
[1] <APIM_HOME>/repository/deployment/server/synapse-configs/default/api/
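For reference, with the apictl/apimcli command-line tool the migration is roughly along these lines (a sketch only; the API name, version, and environment names are hypothetical, and the exact command and flag names vary between tool versions, so check the tool's help):
apictl export api -n PizzaShackAPI -v 1.0.0 -e dev
apictl import api -f PizzaShackAPI_1.0.0.zip -e prod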

Elmah, what is the most efficient persistent mechanism for errors?

I have an asp.net MVC application in Azure web apps which connects to SQL Azure.
Currently I store Elmah errors in App_data. These can build up. Also I feel writing these files is inefficient. In addition when you download the "Next 50" errors, there can be a hit on the server.
How can I improve my persistence strategy? I suspect it may be to use a database. Would this be a separate database from the application database, or the same one?
I am also testing Application Insights. At present I suspect that Elmah has a role alongside Application Insights, but I might be wrong.
Thanks.
As mentioned in the previous answer, you can store the logs in a SQL Azure database, or you can go with the cheaper option of Azure Table Storage, which is a NoSQL data store. There is a provider available for that:
https://github.com/MisinformedDNA/Elmah.AzureTableStorage
https://www.nuget.org/packages/WindowsAzure.ELMAH.Tables/
Or, if you are looking more for a data dump of your logs (say, in XML format) and don't really need them in a queryable format, you can opt for the much cheaper Azure Blob Storage:
https://github.com/dampee/Blob-Elmah
The Elmah log can be kept in a separate Azure DB so it doesn't consume your "business" database's DTUs and, consequently, never affects its performance if you want to log a lot of things.
On the one hand, Elmah can take care of functional (error) logs; on the other, Application Insights can do telemetry and monitoring. Besides that, you can enable server and application logs in the Azure Portal to get automatic logging to a storage account; here is an overview of those server and application logs.
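If you go the SQL route, the wiring is the standard Elmah web.config change along these lines (a sketch; the connection string name and the Azure SQL server/database are placeholders, and the <elmah> section declarations are normally added for you by the Elmah NuGet package):
<connectionStrings>
  <add name="ElmahDb"
       connectionString="Server=tcp:yourserver.database.windows.net,1433;Database=ElmahLogs;User ID=your-user;Password=your-password;Encrypt=True;" />
</connectionStrings>
<elmah>
  <!-- point Elmah's error log at the separate SQL Azure database -->
  <errorLog type="Elmah.SqlErrorLog, Elmah" connectionStringName="ElmahDb" />
</elmah>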
