Spring Cloud Data Flow Java DSL for Task / Batch - spring-cloud-dataflow

I am wondering if there is any Java DSL support for registering Spring Batch applications as SCDF tasks? So far I have only been able to find Java DSL support for streams.
Any link in this direction would be very helpful. Also, how can Spring Batch applications be automatically deployed as SCDF tasks in a production environment without any manual intervention?
Thank you.

There's Java DSL support for Tasks. This was recently introduced in 2.4 as part of SCDF's IT test suite, but it is not yet promoted to the SCDF REST client that we ship as a library (see: spring-cloud/spring-cloud-dataflow#3949 / spring-io/dataflow.spring.io#242). Feel free to contribute the migration if you have cycles.
That said, we have end-to-end IT tests that leverage this capability [see: DataFlowIT.java#L771-L906], which we run internally against every commit. You could certainly follow them as a pattern to automate the creation and launch of tasks and batch jobs.
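For illustration, here is a minimal sketch of that pattern, assuming the Task DSL classes from the IT suite (later promoted into spring-cloud-dataflow-rest-client) are on the classpath; the server URI and task name are made up, and package names and method signatures may differ across SCDF versions:

```java
import java.net.URI;

import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;
import org.springframework.cloud.dataflow.rest.client.dsl.task.Task;
import org.springframework.cloud.dataflow.rest.resource.TaskExecutionStatus;

public class TaskLaunchAutomation {

    public static void main(String[] args) throws Exception {
        // Point the REST client at a running SCDF server; the URI is an assumption.
        DataFlowTemplate dataFlow = new DataFlowTemplate(new URI("http://localhost:9393"));

        // Create a task definition from an already-registered app and launch it.
        // "timestamp" is the out-of-the-box sample; substitute your Spring Batch app.
        try (Task task = Task.builder(dataFlow)
                .name("my-batch-task")            // hypothetical definition name
                .definition("timestamp")          // task DSL definition
                .description("launched from CI")
                .build()) {

            long executionId = task.launch();

            // Poll until the task execution reaches a terminal state.
            TaskExecutionStatus status;
            do {
                Thread.sleep(5_000);
                status = task.executionStatus(executionId);
            } while (status != TaskExecutionStatus.COMPLETE && status != TaskExecutionStatus.ERROR);
        } // try-with-resources destroys the definition, as the IT tests do per run
    }
}
```

In a production automation flow you would typically keep the definition around and re-launch it, rather than creating and destroying it on every run as the IT tests do.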

Related

Spring Cloud Data Flow preconfigured tasks

Spring Cloud Data Flow is a great solution, and currently I'm trying to find a way to preconfigure tasks in order to trigger them manually.
The use case is very simple:
as a DevOps engineer, I should be able to preconfigure tasks, which includes creating the execution graph along with the application and deployment parameters, and saving the task with all parameters needed for execution.
as a user with the role ROLE_DEPLOY, I should be able to start, stop, restart, and monitor the execution of the preconfigured tasks.
A preconfigured task is a task with all parameters needed for execution already defined.
Is it possible to have such functionality?
Is it possible to have such functionality?
Thank you.
You may want to review the continuous deployment section of the reference guide. SCDF has built-in lifecycle semantics for orchestrating tasks.
Assuming you have the task applications already built, and that the DevOps persona is familiar with how the applications work together, they can build a composed-task definition and the task/job parameters in SCDF, either interactively or programmatically.
The above step persists the task definition in SCDF's database. From then on, the same definition can be launched manually or on a cron schedule; a rough sketch of the programmatic route follows.
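For concreteness, here is what "programmatically build a composed-task definition" could look like with the Java DSL mentioned in the related answer above; the server URI, app names, definition name, and launch parameters are all hypothetical, and the launch overloads vary slightly across SCDF versions:

```java
import java.net.URI;
import java.util.List;
import java.util.Map;

import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;
import org.springframework.cloud.dataflow.rest.client.dsl.task.Task;

public class PreconfiguredTask {

    public static void main(String[] args) throws Exception {
        // SCDF server URI is an assumption; adjust for your environment.
        DataFlowTemplate dataFlow = new DataFlowTemplate(new URI("http://localhost:9393"));

        // DevOps persona: persist a composed-task definition (the execution graph).
        // "import-data" and "process-data" are hypothetical, already-registered task apps.
        Task preconfigured = Task.builder(dataFlow)
                .name("nightly-batch")
                .definition("import: import-data && process: process-data")
                .description("preconfigured composed task")
                .build();

        // Later, a ROLE_DEPLOY user launches the persisted definition with the
        // deployment properties and task arguments that were agreed up front.
        preconfigured.launch(
                Map.of("deployer.*.memory", "2048m"),      // deployment property
                List.of("--importer.chunk-size=100"));     // hypothetical task argument
    }
}
```

The definition created by build() stays in SCDF's database, so the launch can happen much later and from a different process or user.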
If nothing changes in the definition, the app versions, or the task parameters, then yes, anyone with the ROLE_DEPLOY role can interact with it.
You may also find the CD for Tasks guide useful reference material. Perhaps repeat it locally (with the desired security configuration) in your environment to get a feel for how it works.

Google Cloud Dataflow - Apache Beam - Pipeline Shutdown Hook

Wondering if there is some kind of "hook" to place a piece of code that would be executed when an Apache Beam pipeline is being shut down (for whatever reason: crash, cancel).
I need to delete a subscription on a Pub/Sub topic each time the Dataflow job is stopped.
Apache Beam is not naturally suited to this sort of flow. For this you may want to look at an orchestration engine, such as Apache Airflow.
With Airflow you should be able to schedule any sort of script to run after a Beam pipeline finishes, fails, or is cancelled. Take a look at it!
There are some examples of waiting for pipelines to finish, and indeed of managing Pub/Sub topics/subscriptions, in the ExampleUtils class in the examples folder of the apache/beam repository here. See if there is anything you can use in the waitUntilFinish and tearDown methods.
This is Java code; not sure if that is the language you use.
(In the long run, @Pablo's suggestion to separate this further from the pipeline code may be best; it perhaps depends on your exact goal here.)
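For reference, here is a minimal sketch of that waitUntilFinish-then-clean-up pattern, assuming the google-cloud-pubsub admin client is on the classpath; the project and subscription names are made up. Note the caveat that motivates the orchestration advice above: the cleanup only runs while the launching JVM itself stays alive, so a crash of the launcher will skip it.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.pubsub.v1.ProjectSubscriptionName;

public class PipelineWithCleanup {

    public static void main(String[] args) throws Exception {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline pipeline = Pipeline.create(options);
        // ... build your pipeline here ...

        PipelineResult result = pipeline.run();
        try {
            // Block until the pipeline finishes, fails, or is cancelled.
            result.waitUntilFinish();
        } finally {
            // Runs on normal completion, failure, or cancellation of the pipeline,
            // but only as long as this launcher JVM survives.
            try (SubscriptionAdminClient admin = SubscriptionAdminClient.create()) {
                admin.deleteSubscription(
                        ProjectSubscriptionName.of("my-project", "my-subscription")); // hypothetical names
            }
        }
    }
}
```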

Jenkinsfile - Mutual exclusivity across multiple pipelines

I'm looking for a way to make multiple declaratively written Jenkinsfiles run only exclusively and block each other. They consume test instances that are terminated after each run, which causes problems when PRs are tested as they come in.
I cannot find an option to make the BuildBlocker plugin do this: none of the Jenkinsfiles that use this plugin run under our plugin/Jenkins version scheme, and it seems as if the [$class: <some.java.expression>] strings exported from the syntax generator don't work here anyway.
I cannot find a way to apply these locks across all the steps involved in the pipeline.
I could hack together a file lock, but this won't help me with multi-node builds.
The Lockable Resources plugin could help you here: it allows you to lock resources you've declared previously, so that if a resource is currently locked, any other job that requires the same resource will wait until it is released.
https://plugins.jenkins.io/lockable-resources/
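A minimal declarative sketch, assuming the Lockable Resources plugin is installed; the resource and script names are made up. At the time of writing the lock can be taken per stage, while wrapping several stages in one lock is the subject of the JIRA issue mentioned in the next answer:

```groovy
pipeline {
    agent any
    stages {
        stage('Integration tests') {
            options {
                // Wait until 'shared-test-instance' is free, then hold it for
                // the duration of this stage; the resource name is an example.
                lock(resource: 'shared-test-instance')
            }
            steps {
                sh './run-integration-tests.sh' // placeholder for your test steps
            }
        }
    }
}
```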
Since you say you want declarative pipelines, you should probably wait for the currently-in-review "Allow locking multiple stages in declarative pipeline" JIRA issue to be completed. You can also vote for it and watch it.
And if you can't wait, this is your opportunity to learn Go (or whatever language you want to learn) by implementing a microservice that holds these locks and that you call from your pipeline scripts. :D
The Job DSL plugin is for configuring Jenkins execution policies, including blocking on other jobs, and it can call pipeline code.
https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.blockOn describes the blockOn method, which the Build Blocker plugin also uses.
This is the usage tutorial: https://github.com/jenkinsci/job-dsl-plugin/wiki/Tutorial---Using-the-Jenkins-Job-DSL, and this is the API reference: https://github.com/jenkinsci/job-dsl-plugin/wiki/Job-DSL-Commands.
A tutorial on automating Jenkins job configuration using Job DSL: https://www.digitalocean.com/community/tutorials/how-to-automate-jenkins-job-configuration-using-job-dsl
It should also be possible to use https://github.com/jenkinsci/job-dsl-plugin/wiki/Dynamic-DSL, but I have found no good usage example yet. A sketch of a seed job using blockOn is below.
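A rough seed-job sketch of the blockOn approach, assuming both the Job DSL and Build Blocker plugins are installed; the job names, regex, and build step are made up:

```groovy
// Seed-job script for the Job DSL plugin; blockOn requires the Build Blocker plugin.
job('pr-integration-tests') {              // hypothetical job name
    blockOn(['pr-integration-.*']) {       // regex of jobs that must not run concurrently
        blockLevel('GLOBAL')               // consider builds on all nodes
        scanQueueFor('ALL')                // also consider queued builds
    }
    steps {
        shell('./run-tests.sh')            // placeholder build step
    }
}
```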

CI / CD: Principles for deployment of environments

I am not a developer, but I am reading about CI/CD at the moment, and I am wondering about good practices for automated code deployment. So far I have read a lot about deploying code to a pre-existing environment.
My question now is whether it is also good practice to use, e.g., a Jenkins workflow to deploy an environment from scratch whenever a new build is created: for example, deploying an environment to test a newly created build and deleting the environment again after testing.
I know that there are various plugins for interacting with AWS, Azure, etc. that could be used to build a job for deploying a virtual machine.
There are also plugins to trigger Puppet to deploy infrastructure (as code), and there are plugins to invoke an infrastructure orchestrator.
So everything is available to deploy the infrastructure and middleware before deploying the code (with some extra effort, of course).
Is this something that is used in real life? How is it done?
The background of my question is my interest in full automation of development with as few clicks as possible, and in cost savings in a pay-per-use model by not having idle machines.
My question now is whether it is also good practice to use e.g. a Jenkins workflow to deploy an environment from scratch when a new build is created
Yes, it is good practice to deploy an environment from scratch. Like you say, Jenkins and Jenkins Pipelines can certainly help kick off and orchestrate that process, depending on your specific requirements; a sketch of such a pipeline follows below. Deploying a full environment from scratch is one of the hardest things to automate, and if that is automated, it implies that a lot of other things are also automated: infrastructure, application deployments, application configuration, and so on.
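As a sketch of what that can look like (Terraform and the shell scripts here are stand-ins; Puppet, ARM templates, or any other IaC tool works just as well), a Jenkins declarative pipeline that provisions, tests, and always tears down might be shaped like this:

```groovy
pipeline {
    agent any
    environment {
        ENV_NAME = "test-${env.BUILD_NUMBER}" // one throwaway environment per build
    }
    stages {
        stage('Provision environment') {
            steps {
                // Terraform is just one option; substitute your IaC tooling.
                sh 'terraform init && terraform apply -auto-approve -var "env_name=$ENV_NAME"'
            }
        }
        stage('Deploy and test') {
            steps {
                sh './deploy.sh "$ENV_NAME"'    // hypothetical deployment script
                sh './run-tests.sh "$ENV_NAME"' // hypothetical test script
            }
        }
    }
    post {
        always {
            // Tear the environment down whether the tests passed or failed.
            sh 'terraform destroy -auto-approve -var "env_name=$ENV_NAME"'
        }
    }
}
```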
Is this something that is used in real life?
Yes, definitely. A lot of shops do this. The simpler your environments, the easier it is; a startup with one backend app would have relatively little trouble achieving this Valhalla state. But even the creation of the most complex environments, with hundreds of interdependent applications, can be fully automated; it just takes more time and effort.
The background of my question is my interest in full automation of development with as few clicks as possible, and cost saving in a pay-per-use model by not having idle machines
Yes, definitely. The "spin up and destroy" strategy benefits all hosting models (since, after full automation, no one ever has to wait for someone to manually provision an environment), but those using public clouds see even larger benefits in terms of cost (vs always leaving AWS environments running, for example).
I appreciate your thoughts.
Not a problem. I will advise that this question doesn't fit Stack Overflow's question-and-answer sweet spot super well, since it is quite general. In the future, I would recommend chatting with your developers, finding folks who are excited about this sort of thing, and formulating more specific questions when you all get stuck in the weeds on something. Welcome to Stack Overflow!
All of these are used in various combinations; the objective is to deliver continuous value to the end user. My two cents:
Build & Release
It depends on what you are using. I personally recommend using what is available with your tool set. For example, VSTS (Visual Studio Team Services) offers a complete CI/CD pipeline. But if you have a unique need that can only be served by Jenkins, then you should use that, and VSTS offers Jenkins integration out of the box.
IAC (Infrastructure as code)
In addition to Puppet and the like, you can take advantage of Azure ARM (Azure Resource Manager) templates to build and destroy an environment. Again, see what is available out of the box with the tool set you have.
Pay-per-use
What I have personally used is Azure DevTest Labs, with code deployed to the lab via a CI/CD pipeline. You can then set up a shutdown policy on the VM so that it auto-starts and auto-shuts-down at the configured times. This is a great feature for saving cost on the resources being used and for replicating environments.
For example, a UAT environment might not be needed until QA has signed off. But using IaC you can quickly spin up the environment automatically and then have a one-click deployment set up to push code to UAT.

Does spring-cloud-dataflow provide support for scheduling applications defined as tasks?

I have been looking at using projects built with spring-cloud-task within spring-cloud-dataflow. Having looked at the example projects and the documentation, the indication seems to be that tasks are launched manually through the dashboard or the shell. Does spring-cloud-dataflow provide any way of scheduling task definitions so that they can run, for example, on a cron schedule? That is, can you create a spring-cloud-task app which itself has no knowledge of a schedule, but deploy it to the dataflow server and configure the scheduling there?
Among the posts and blogs I have looked at I noticed the following:
https://spring.io/blog/2016/01/27/introducing-spring-cloud-task
Some of the Q&A afterwards hints at this being a possibility, with references to triggers, but I think this was discussed before the release.
Any advice would be greatly appreciated, many thanks.
There are a few ways you could launch Tasks in Spring Cloud Data Flow. The following options are available today.
Launch it using TriggerTask; with this you could choose to launch it either with a fixed delay or via a cron expression (example here).
Launch it via an event in a streaming pipeline. Imagine a use case where you would want to create a "thumbnail" as and when there's a new image (an event) in an S3 bucket or in a file-system directory; the "thumbnail" operation could be a task in this case (example here).
Lastly, in upcoming releases, we will port the "scheduler" functionality from Spring XD over to Spring Cloud Data Flow.
Yes, Spring Cloud Data Flow does provide a scheduling option. To enable it, you need to add the argument below when starting the server:
--spring.cloud.dataflow.features.schedules-enabled=true
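As a hedged sketch, scheduling can also be driven from the REST client's SchedulerOperations; the exact schedule() signature and the "scheduler.cron.expression" property key may vary by SCDF version (later versions add a platform argument), and scheduling requires a platform with scheduler support, such as Kubernetes or Cloud Foundry:

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;

import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;

public class ScheduleTask {

    public static void main(String[] args) throws Exception {
        // SCDF server URI is an assumption; adjust for your environment.
        DataFlowTemplate dataFlow = new DataFlowTemplate(new URI("http://localhost:9393"));

        // Create a cron-based schedule for an existing task definition.
        // The schedule and definition names are hypothetical; the cron format
        // depends on the underlying platform scheduler.
        dataFlow.schedulerOperations().schedule(
                "my-nightly-schedule",                            // schedule name
                "my-batch-task",                                  // existing task definition
                Map.of("scheduler.cron.expression", "0 2 * * *"), // run at 02:00 daily
                Collections.emptyList());                         // task arguments
    }
}
```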
