Spring Cloud Dataflow Task Execution Fails on subsequent runs - spring-cloud-dataflow

Name: spring-cloud-dataflow-server
Version: 2.5.0.BUILD-SNAPSHOT
I have created a very simple task. The first run always completes with no issues, but if the task is run again it fails with the following error.
Subsequent launches of the same task fail with the exception below, even though each one is a fresh run started after the previous execution has fully completed. If a task has run once, can't it be run again?
(log from Task Execution Details - Execution ID: 246)
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={-spring.cloud.data.flow.taskappname=composed-task-runner, -spring.cloud.task.executionid=246, -graph=threetasks-t1 && threetasks-t2 && threetasks-t3, -spring.datasource.username=root, -spring.cloud.data.flow.platformname=default, -dataflow-server-uri=http://10.104.227.49:9393, -management.metrics.export.prometheus.enabled=true, -management.metrics.export.prometheus.rsocket.host=prometheus-proxy, -spring.datasource.url=jdbc:mysql://10.110.89.91:3306/mysql, -spring.datasource.driverClassName=org.mariadb.jdbc.Driver, -spring.datasource.password=manager, -management.metrics.export.prometheus.rsocket.port=7001, -management.metrics.export.prometheus.rsocket.enabled=true, -spring.cloud.task.name=threetasks}. If you want to run this job again, change the parameters.

In Spring Batch, a Job Instance is identified by the job name plus its job parameters, so launching a job again with the same parameters after that instance has completed is rejected. This is by design.
Since you are using a Composed Task, you can add the property --increment-instance-enabled=true to the composed task definition to handle this. This property ensures that each Job Instance gets a unique set of Job parameters.
The full list of properties supported by the Composed Task Runner is available in its documentation.
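To see why a relaunch fails and why an incrementing parameter fixes it, here is a toy Python model of the rule. It is not Spring Batch code: ToyJobRepository, InstanceAlreadyCompleteError and next_parameters are hypothetical names. The incrementer mimics Spring Batch's RunIdIncrementer, which adds a run.id parameter one greater than the previous run's; the assumption that increment-instance-enabled behaves along these lines is not confirmed by the answer above.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple


class InstanceAlreadyCompleteError(Exception):
    """Hypothetical stand-in for JobInstanceAlreadyCompleteException."""


class ToyJobRepository:
    """Toy model, not the Spring Batch API: a job instance is keyed by the
    job name plus its parameters, and a completed instance cannot be rerun."""

    def __init__(self) -> None:
        self._completed: Set[Tuple[str, FrozenSet[Tuple[str, str]]]] = set()

    def launch(self, job_name: str, params: Dict[str, str]) -> None:
        key = (job_name, frozenset(params.items()))
        if key in self._completed:
            raise InstanceAlreadyCompleteError(
                f"A job instance already exists and is complete for parameters={params}"
            )
        # The real job would run here; the model only records completion.
        self._completed.add(key)


def next_parameters(base: Dict[str, str],
                    previous: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Copy the base parameters and add run.id = previous run.id + 1,
    starting at 1 when there is no previous run (RunIdIncrementer-style)."""
    last = int(previous.get("run.id", "0")) if previous else 0
    return {**base, "run.id": str(last + 1)}
```

Relaunching with identical parameters raises, while parameters produced by next_parameters always differ from the previous run's, so each launch creates a new instance.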

Related

How do I restart Docker containers on Windows reboot?

This may sound like a known issue, but the problem is that when the system reboots, the containers don't start and remain in the Exited status. We're using docker-compose to start the containers (about 10 in total, launched from a PowerShell script).
The Docker documentation says to use restart_policy, but that mainly deals with container crashes: https://docs.docker.com/compose/compose-file.
The restart: always option is also set in the config file and doesn't seem to help. I have also tried setting up Task Scheduler, but the issue is the same.
Is there a way to have the containers start gracefully, or could this be set up in Task Scheduler?
You could first create a task, scheduled at system startup, that stops the containers, and then create a second task that is triggered by the event logged when the first task completes successfully.
The important part of the second task is its event filter: edit the new event filter in XML format so that it refers to the original task, the one whose successful completion should trigger the new task.
<QueryList>
  <Query Id="0" Path="Microsoft-Windows-TaskScheduler/Operational">
    <Select Path="Microsoft-Windows-TaskScheduler/Operational">*[System[Provider[@Name='Microsoft-Windows-TaskScheduler'] and Task = 102]]</Select>
  </Query>
</QueryList>
You need to edit the query manually, replacing the following line in the XML filter:
*[System[Provider[@Name='Microsoft-Windows-TaskScheduler'] and Task = 102]]
with:
*[EventData[@Name='TaskSuccessEvent'][Data[@Name='TaskName']='\Original\Task']]
The event filter details for the new task are as follows:
Events Logs: Microsoft-Windows-TaskScheduler/Operational
Event source: TaskScheduler
Task category: Task completed (status 102)
The completion event raised by the original task (status code 102) has these properties:
EventID: 102
Provider-Name: Microsoft-Windows-TaskScheduler
Channel: Microsoft-Windows-TaskScheduler/Operational
TaskName: \Original\Task
Finally, add the action details (the program's executable path, with the script or command passed as its argument), make sure the task is set to run with the highest privileges, and save your changes.
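If you need the same filter for a different task, the edited XML can be generated instead of typed by hand. This is a minimal Python sketch; task_success_filter is a hypothetical helper, it uses the XPath attribute syntax @Name, and it assumes the task path contains no single quote (it rejects such paths rather than escaping them). It only produces the text you paste into the XML tab of the trigger's custom event filter.

```python
from xml.sax.saxutils import escape


def task_success_filter(task_path: str) -> str:
    """Return a Task Scheduler custom event filter (QueryList XML) that
    matches the success event of the task at task_path, where task_path is
    the task's full path in the Task Scheduler Library."""
    # The path is embedded in a single-quoted XPath literal, so a single
    # quote in the path would break the query; reject it instead.
    if "'" in task_path:
        raise ValueError("task path must not contain a single quote")
    channel = "Microsoft-Windows-TaskScheduler/Operational"
    xpath = (
        "*[EventData[@Name='TaskSuccessEvent']"
        f"[Data[@Name='TaskName']='{task_path}']]"
    )
    # escape() encodes &, < and > so the XPath is valid XML element text.
    return (
        "<QueryList>\n"
        f'  <Query Id="0" Path="{channel}">\n'
        f'    <Select Path="{channel}">{escape(xpath)}</Select>\n'
        "  </Query>\n"
        "</QueryList>"
    )


print(task_success_filter(r"\Original\Task"))
```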

Self-delete Jenkins builds after they're finished

tl;dr: I'd like to delete builds from within their execution, or rather, in a post statement (though the specifics shouldn't matter).
Background: In a project I'm working on, there is a "gateway" job of sorts that aggregates all new job triggers into one launch as long as certain other jobs are still running. For this purpose, this job aborts itself such that there is only ever one instance running (which is often not the latest build).
Unfortunately, this means that in the job preview, the job is often shown as aborted, which is undesirable (ending the job as "successful" or some other status wouldn't improve anything). Thus, I have two options:
Change the abort logic so that the newest build survives and older ones are aborted. This is technically possible, but it has other drawbacks due to some internal logic, which is why I'd like to avoid this solution.
Delete the aborted builds once they're finished
However, this is apparently not as easy as just calling the "doDelete" REST API inside the job, and the build discarder can't be set to store 0 builds (it needs to be a positive integer). This is what I tried code-wise (MWE):
steps {
    script {
        currentBuild.result = 'ABORTED'
        error("abort")
    }
}
post {
    always {
        withCredentials([string(credentialsId: 'x', variable: 'TOKEN')]) {
            sh "curl -X POST \"https://jenkins.example.com/etc/job/jobname/${env.BUILD_NUMBER}/doDelete\" -i -H 'Authorization: Bearer $TOKEN'"
        }
    }
}
This code deletes some job information (for instance, the console log is empty), but not the build itself. Thus, my question remains:
How can I make a job delete itself?

How can I retrieve the execution status of parallel triggered child jobs to a pipeline script

I have a pipeline script that executes child jobs in parallel.
Say I have 5 data items (a, b, c, d, e) that have to be processed by 3 jobs (J1, J2, J3).
My pipeline script has the following format:
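def data = ['a', 'b', 'c', 'd', 'e']  // the items to process
def branches = [:]
for (int i = 0; i < data.size(); i++) {
    def index = i  // per-iteration copy so each closure sees its own value
    branches["branch${i}"] = {
        build job: 'SampleJob', parameters: [
            // double quotes are required for Groovy interpolation
            string(name: 'param1', value: "${data[index]}"),
            string(name: 'dummy', value: "${index}")
        ]
    }
}
parallel branches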
My problem: say Job 1 is running with data items 1, 2, 3, 4, 5. If the execution for item 3 fails on Job 1, then processing of item 3 should stop there and should not happen in the subsequent parallel executions on Jobs 2 and 3.
Is there any way to read the status of the jobs running in parallel from the pipeline script, so that I can block item 3 from executing on Jobs 2 and 3?
I have been blocked on this for a long time, so any help from the community would be appreciated. Thanks a lot in advance.
In summary, it sounds like you want to
run multiple jobs in parallel against different pieces of data. I will call the set of related jobs the "batch".
avoid starting a queued job if any of the jobs in the batch have failed
automatically abort a running job if any of the jobs in the batch have failed
The jobs need some way to communicate their failure to the others. Use a shared storage location to store the "failure flag". If the file exists, then one or more of the jobs have failed.
For example, a shared NFS path: /shared/jenkins/jobstate/<BATCH_ID>/failed
At the start of the job, check for the existence of this path. Exit if it does. The file doesn't necessarily need to contain any data - its presence is enough.
Since you need running jobs to abort early if the failure flag exists, you will need to poll that location periodically. For example, after each unit of work. Again, if the file exists then exit early.
If you don't use NFS, that's ok. You could also use an object storage bucket. The important thing is that the state is accessible to all the relevant build jobs.
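A minimal Python sketch of that protocol, assuming a shared directory that every job in the batch can see (the root path stands in for the NFS mount or bucket). BatchFailureFlag and run_job are hypothetical names; in a Jenkins setup the same check-at-start, set-on-failure and poll-after-each-unit steps would live in each job's own script.

```python
from pathlib import Path
from typing import Callable, Iterable


class BatchFailureFlag:
    """Shared failure flag for one batch: the file's presence (not its
    content) means at least one job in the batch has failed."""

    def __init__(self, root: Path, batch_id: str) -> None:
        # Mirrors the <root>/<BATCH_ID>/failed layout described above.
        self.path = root / batch_id / "failed"

    def is_failed(self) -> bool:
        return self.path.exists()

    def mark_failed(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.touch(exist_ok=True)


def run_job(flag: BatchFailureFlag, items: Iterable[str],
            work: Callable[[str], None]) -> str:
    """Process items while honouring the batch failure flag."""
    # Don't start at all if another job in the batch already failed.
    if flag.is_failed():
        return "skipped"
    for item in items:
        try:
            work(item)
        except Exception:
            # Raise the flag so queued and running jobs stop early.
            flag.mark_failed()
            return "failed"
        # Poll after each unit of work; abort if another job failed.
        if flag.is_failed():
            return "aborted"
    return "succeeded"
```

Polling between units of work keeps the check cheap; a job only notices another job's failure at its next unit boundary, not instantly.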

Use BlockingDataflowPipelineRunner and post-processing code for Dataflow template

I'd like to run some code after my pipeline finishes all processing, so I'm using BlockingDataflowPipelineRunner and placing code after pipeline.run() in main.
This works properly when I run the job from the command line using BlockingDataflowPipelineRunner. The code under pipeline.run() runs after the pipeline has finished processing.
However, it does not work when I try to run the job as a template. I deployed the job as a template (with TemplatingDataflowPipelineRunner), and then tried to run the template in a Cloud Function like this:
dataflow.projects.templates.create({
  projectId: 'PROJECT ID HERE',
  resource: {
    parameters: {
      runner: 'BlockingDataflowPipelineRunner'
    },
    jobName: `JOB NAME HERE`,
    gcsPath: 'GCS TEMPLATE PATH HERE'
  }
}, function(err, response) {
  if (err) {
    // etc
  }
  callback();
});
The runner setting does not seem to take effect: I can put gibberish under runner and the job still runs.
The code I had under pipeline.run() does not run when each job runs; it runs only when I deploy the template.
Is it expected that the code under pipeline.run() in main would not run each time the job runs? Is there a solution for executing code after the pipeline is finished?
(For context, the code after pipeline.run() moves a file from one Cloud Storage bucket to another. It's archiving the file that was just processed by the job.)
Yes, this is expected behavior. A template represents the pipeline itself and allows (re-)executing the pipeline by launching the template. Since the template doesn't include any of the code from the main() method, it can't do anything after the pipeline execution.
Similarly, the dataflow.projects.templates.create API is just the API to launch the template.
The blocking runner accomplished this by getting the job ID from the created pipeline and periodically polling to observe when the job had completed. For your use case, you'll need to do the same:
Execute dataflow.projects.templates.create(...) to create the Dataflow job. This should return the job ID.
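Periodically (every 5-10 s, for instance) poll dataflow.projects.jobs.get(...) to retrieve the job with the given ID and check its state. Once it reaches a terminal state such as JOB_STATE_DONE, run your post-processing code (in your case, archiving the processed file).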
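The polling loop can be sketched in Python. wait_for_job is a hypothetical helper and get_state is an assumption: it stands for whatever call you use to fetch the job (for example a wrapper around projects.jobs.get that returns the job's currentState field). The terminal states listed are values of the Dataflow JobState enum; the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Terminal values of the Dataflow JobState enum (a Job's currentState).
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})


def wait_for_job(job_id: str,
                 get_state: Callable[[str], str],
                 interval_s: float = 10.0,
                 timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state(job_id) until the job is in a terminal state and
    return that state; raise TimeoutError after roughly timeout_s."""
    waited = 0.0
    while True:
        state = get_state(job_id)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still {state} after {waited:.0f}s")
        # sleep is injectable so the loop can be tested without waiting.
        sleep(interval_s)
        waited += interval_s
```

After wait_for_job returns, check for JOB_STATE_DONE before moving the file, so a failed or cancelled job does not archive input that was never fully processed.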

Quartz jobs failing after MySQL db errors

On a working Grails 2.2.5 system, we occasionally lose the connection to the MySQL database, for reasons that are not relevant here. Most of the system recovers perfectly well from the outage, but Quartz jobs (using Quartz plugin 0.4.2) typically fail to run again afterwards. This is a typical message that appears in the log at the point when the job should run:
2015-02-26 16:30:45,304 [quartzScheduler_Worker-9] ERROR core.ErrorLogger - Unable to notify JobListener(s) of Job to be executed: (Job will NOT be executed!). trigger= GRAILS_JOBS.quickQuoteCleanupJob job= GRAILS_JOBS.com.aire.QuickQuoteCleanupJob
org.quartz.SchedulerException: JobListener 'sessionBinderListener' threw exception: Already value [org.springframework.orm.hibernate3.SessionHolder@593a9498] for key [org.hibernate.impl.SessionFactoryImpl@c8488d7] bound to thread [quartzScheduler_Worker-9] [See nested exception: java.lang.IllegalStateException: Already value [org.springframework.orm.hibernate3.SessionHolder@593a9498] for key [org.hibernate.impl.SessionFactoryImpl@c8488d7] bound to thread [quartzScheduler_Worker-9]]
at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1868)
at org.quartz.core.JobRunShell.notifyListenersBeginning(JobRunShell.java:338)
at org.quartz.core.JobRunShell.run(JobRunShell.java:176)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: java.lang.IllegalStateException: Already value [org.springframework.orm.hibernate3.SessionHolder@593a9498] for key [org.hibernate.impl.SessionFactoryImpl@c8488d7] bound to thread [quartzScheduler_Worker-9]
at org.quartz.core.QuartzScheduler.notifyJobListenersToBeExecuted(QuartzScheduler.java:1866)
... 3 more
What do I need to do to make things more robust, so that the Quartz jobs recover as well?
By default, a Quartz job gets a Hibernate session bound to it. Disable that session binding and let your service handle the transaction and session. That's what we do, and when our DB connections come back up, the jobs still work.
To disable session binding in your job, add:
def sessionRequired = false
