If I kill a Spring Cloud Task jar, it is not reflected in the SCDF dashboard - spring-cloud-dataflow

We triggered a Spring Cloud Task from SCDF. If I abnormally kill the running jar that the launch started, the SCDF UI still shows it in the running state. Is there any way to get the failed status reflected in SCDF?

Related

Spring Cloud Data Flow Kubernetes

I have a basic question about deploying Spring tasks/batch jobs on SCDF for Kubernetes. If I deploy SCDF on Kubernetes and then schedule a batch job, which Kubernetes cluster is the batch job deployed on? Where is the pod created? The same cluster where the SCDF server is running?
By default the apps are deployed in the same cluster and namespace as the SCDF server, but this is configurable. You can configure any number of target “platforms”. Each platform is essentially a set of deployment properties keyed to a logical name, and you pass the platform name as a parameter when each task is launched. This is described here.
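For illustration, here is a minimal sketch of what two Kubernetes task platform accounts could look like in the Data Flow server's application.yaml, assuming the spring.cloud.dataflow.task.platform.kubernetes.accounts property structure; the account names, namespaces, and limits are placeholders, not recommendations:

    # Hypothetical SCDF server configuration: each entry under "accounts"
    # becomes a platform name that can be selected when a task is launched.
    spring:
      cloud:
        dataflow:
          task:
            platform:
              kubernetes:
                accounts:
                  default:
                    namespace: default        # tasks run next to the SCDF server
                  batch:
                    namespace: batch-jobs     # hypothetical second namespace
                    limits:
                      memory: 4096Mi

Tasks launched without a platform name go to the default platform; passing the logical name (here "batch") at launch time sends the task to the other set of deployment properties.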

How to handle flink management and k8s management

I'm considering deploying Flink with K8s. I'm a newbie on Flink and have a simple question:
Say that I use K8s to manage the Docker containers and deploy the TaskManagers into those containers.
As I understand it, a container can be restarted by K8s when it fails, and a Task can be restarted by Flink when it fails.
If a Task is running in a container and the container suddenly fails for some reason, then from Flink's point of view a Task has failed and should be restarted, while from K8s' point of view a container has failed and should be restarted. In this case, should we worry about a conflict between the two kinds of restart?
I think you want to read up on the official kubernetes setup guide here: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html
It describes 3 ways of getting it to work:
Session Cluster: This involves spinning up the two Deployments described in the guide's appendix and requires you to submit your Flink job manually, or via a script, at the beginning. This is very similar to a local standalone cluster when you are developing, except it now lives in your Kubernetes cluster.
Job Cluster: By deploying Flink as a k8s job, you would be able to eliminate the job submission step.
Helm chart: By the look of it, the project has not updated this for 2 years, so your mileage may vary.
I have had success with a Session Cluster, but I would eventually like to try the "proper" way, which looks to be deploying it as a Kubernetes job using the second method.
Depending on your Flink source and the kind of failure, your Flink job will fail differently. You shouldn't worry about the "conflict": either Kubernetes is going to restart the container, or Flink is going to handle the errors it can handle. After a certain number of retries it will cancel the job, depending on how you configured this; see the Configuration documentation for more details. If the container exited with a non-zero code, Kubernetes will try to restart it. However, it may or may not resubmit the job, depending on whether you deployed the job in a Job Cluster or whether you had an initialization script in the image you used. In a Session Cluster, this can be problematic depending on whether the job submission is done through the task manager or the job manager. If the job was submitted through the task manager, then you need to cancel the existing failed job so the resubmitted job can start.
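To make the retry behaviour concrete, here is a minimal sketch using Flink's programmatic restart-strategy API (the attempt count and delay are arbitrary example values; the equivalent restart-strategy.* keys can also be set in flink-conf.yaml):

    import java.util.concurrent.TimeUnit;

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RestartStrategyExample {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Restart a failed job up to 3 times, waiting 10 seconds between attempts;
            // once the attempts are exhausted, Flink cancels the job (the "after a
            // certain number of retries it will cancel" behaviour described above).
            env.setRestartStrategy(
                    RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));

            // ... define sources/operators/sinks here, then call env.execute("my-job")
        }
    }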
Note: if you go with the Session Cluster and use a file-system-based state backend (i.e., not RocksDB) for checkpoints, you will need to figure out a way for the job manager and task managers to share a checkpoint directory.
If the task managers use a checkpoint directory that is inaccessible to the job manager, their checkpoint data builds up and eventually causes some kind of out-of-disk-space error. This may not be a problem if you decide to go with RocksDB and enable incremental checkpoints.
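As a sketch of the shared-checkpoint-directory point, the snippet below configures a filesystem state backend whose checkpoint URI has to resolve to storage reachable by both the job manager and every task manager; the path is a placeholder for a shared volume, NFS mount, or an object-store/HDFS location:

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SharedCheckpointExample {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Take a checkpoint every 60 seconds.
            env.enableCheckpointing(60_000);

            // The checkpoint directory must be visible to the job manager AND the
            // task managers. A node-local path breaks down when the processes run
            // in separate pods; "file:///shared/checkpoints" is a placeholder for a
            // mounted shared volume or an s3://... / hdfs://... URI.
            env.setStateBackend(new FsStateBackend("file:///shared/checkpoints"));
        }
    }

With RocksDB instead, incremental checkpoints can be enabled (for example via the RocksDBStateBackend constructor flag), which is the case the note above says may avoid the problem.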

Can Spring Cloud Task's Partitioned Job be executed in Spring Cloud Data Flow?

I am trying to set up and execute the Spring Cloud Task sample of a partitioned batch job (https://github.com/spring-cloud/spring-cloud-task/tree/master/spring-cloud-task-samples/partitioned-batch-job) on the Spring Cloud Data Flow server.
But for some reason there are errors in the partitioned job tasks:
A job execution for this job is already running: JobInstance: id=2, version=0, Job=[partitionedJob]
Is the partitioned job incompatible with the Spring Cloud Data Flow server?
Yes, the sample partitioned batch job is compatible with the Spring Cloud Data Flow server and works out of the box so long as:
The datasource is either H2 or MySQL.
And you are using the Spring Cloud Data Flow Server Local.
But it is difficult to diagnose the issue without knowing which Data Flow server you are using and which database. Also, were there any exceptions?
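For reference, a minimal sketch of the kind of MySQL datasource settings implied above, using standard Spring Boot properties; the host, schema, credentials, and driver are placeholders, and the same database normally backs both the Data Flow server and the launched task so that task and job executions are recorded in one place:

    # Placeholder datasource settings (application.yaml) when using MySQL
    # instead of the embedded H2 database.
    spring:
      datasource:
        url: jdbc:mysql://localhost:3306/dataflow    # placeholder host/schema
        username: dataflow                           # placeholder credentials
        password: secret
        driver-class-name: com.mysql.cj.jdbc.Driver  # assumes MySQL Connector/J on the classpath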

How to persist jobs in Jenkins Mesos framework?

I have started the Jenkins scheduler (framework) as a Marathon app. Now if the Jenkins scheduler dies somehow, Marathon will restart it, but all the jobs and settings will be gone. How can I persist jobs in the Jenkins Mesos framework if it dies and is started again?
The Jenkins plugin for Mesos does not yet support scheduler HA. To do so, the scheduler would need to persist the frameworkId remotely somewhere (ZK?) and try to reregister with the same frameworkId when it restarts. We'd also need to set the failover_timeout to a sufficient duration. Bonus points: persist task state and perform task reconciliation on reregistration.
I filed a new github issue for this: https://github.com/jenkinsci/mesos-plugin/issues/147
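Purely as an illustration of the idea above (not something the plugin currently does), here is a minimal sketch of persisting the frameworkId in ZooKeeper with Curator and re-registering with it alongside a failover timeout; the ZK path, connection string, and timeout value are all hypothetical:

    import java.nio.charset.StandardCharsets;

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;
    import org.apache.mesos.Protos.FrameworkID;
    import org.apache.mesos.Protos.FrameworkInfo;

    public class FrameworkIdPersistenceSketch {

        private static final String ZK_PATH = "/jenkins-mesos/frameworkId"; // hypothetical ZK node

        public static void main(String[] args) throws Exception {
            CuratorFramework zk = CuratorFrameworkFactory.newClient(
                    "zk-host:2181", new ExponentialBackoffRetry(1000, 3));  // hypothetical connect string
            zk.start();

            FrameworkInfo.Builder framework = FrameworkInfo.newBuilder()
                    .setUser("")                        // let Mesos pick the current user
                    .setName("Jenkins Scheduler")
                    // Give Mesos time to keep tasks running while the scheduler restarts;
                    // one week is an arbitrary example value for failover_timeout.
                    .setFailoverTimeout(7 * 24 * 3600);

            // Re-register with the previously persisted frameworkId, if one exists.
            if (zk.checkExists().forPath(ZK_PATH) != null) {
                String savedId = new String(zk.getData().forPath(ZK_PATH), StandardCharsets.UTF_8);
                framework.setId(FrameworkID.newBuilder().setValue(savedId));
            }

            // The FrameworkInfo is then handed to the MesosSchedulerDriver; once Mesos
            // assigns (or confirms) the frameworkId, persist it back, e.g.:
            // zk.create().creatingParentsIfNeeded()
            //   .forPath(ZK_PATH, assignedId.getBytes(StandardCharsets.UTF_8));
        }
    }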

The Spring Cloud Dataflow UI is not refreshing the status correctly

The deploy status shows "undeployed" in the latest Spring Cloud Data Flow UI even though it is deployed.
Name: spring-cloud-dataflow-server
Version: 2.5.0.BUILD-SNAPSHOT
