Composed task custom condition - spring-cloud-dataflow

Is it possible to create a custom exit status in Spring Cloud Data Flow?
Let's say I have the following:
I saw examples for FAILED and UNKNOWN, so I've created two custom conditions, Worked and Generated.
Assuming this approach is possible, how do I pass those strings from inside the task? Or do they need to be passed from somewhere else?
If not, then why can I write any string I want in the "Properties for TRANSITION" modal?

Other than providing the UI option to wire the exit-code to map to a particular downstream step, there's nothing dynamically influenced by SCDF. In other words, SCDF doesn't interfere with whatever is happening internally in each Task application.
The custom transitions require the desired exit-codes to be returned/handled within the Task application itself.
In your example above, if the Timestamp's business logic returned "Worked" as the exit-code, then the transition would result in executing Bar application. Likewise, if the exit-code is "Generated", you'd see Foo running.
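For illustration, here is a minimal sketch of how a task might surface such a status, assuming the task wraps a Spring Batch job and that the transition keys off the job's ExitStatus (the listener class and the generatedFile key below are made up for the example):

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.stereotype.Component;

@Component
public class CustomExitStatusListener implements JobExecutionListener {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // nothing to do before the job starts
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Hypothetical business check; replace with your own logic.
        boolean generated = jobExecution.getExecutionContext().containsKey("generatedFile");
        jobExecution.setExitStatus(new ExitStatus(generated ? "Generated" : "Worked"));
    }
}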

Related

Can I define new mustache template variables in swagger-codegen?

I have developed a REST API client (in Java) customised to the needs of my product. I wanted to generate tests for my REST API client using swagger-codegen modules, based on a YAML file.
I have already extended DefaultCodegenConfig and even tried implementing the CodegenConfig interface to build my custom jar. I have customised the api.mustache and api_test.mustache files and pass them in the constructor and processOpts() method of my CustomCodeGen, which extends DefaultCodegenConfig.
However, I want to use the custom/new mustache template variables that I have added in my customised api.mustache.
For example, if we refer to the standard api.mustache, the template variables it typically uses are
- {{classname}}
- {{#operation}}
- {{#contents}}
- {{#parameters}}
etc.
Now, I want to introduce a new template variable, let's say {{custom_param}}, but I am not clear how to integrate this new template variable with the implementation.
Judging from the Mustache-Template-Variables page published here, it looks like swagger-codegen does not allow adding new template variables, and perhaps we are restricted to only the variables mentioned on that page.
So, is there some way to make new template variables work?
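For context, the kind of customization described in the question might look roughly like this (a sketch, assuming swagger-codegen 3.x; the CustomCodeGen class name comes from the question, the template paths and values are illustrative, and putting a value into additionalProperties is only one way a global-level value can reach the templates):

import io.swagger.codegen.v3.CodegenType;
import io.swagger.codegen.v3.generators.DefaultCodegenConfig;

public class CustomCodeGen extends DefaultCodegenConfig {

    public CustomCodeGen() {
        super();
        // Register the customised mustache templates (paths are illustrative).
        templateDir = "custom-templates";
        apiTemplateFiles.put("api.mustache", ".java");
        apiTestTemplateFiles.put("api_test.mustache", ".java");
    }

    @Override
    public void processOpts() {
        super.processOpts();
        // Global values placed here are handed to the template engine,
        // e.g. available as {{custom_param}} at the top level.
        additionalProperties.put("custom_param", "some-value");
    }

    @Override
    public CodegenType getTag() { return CodegenType.CLIENT; }

    @Override
    public String getName() { return "custom-java"; }

    @Override
    public String getHelp() { return "Generates a customised Java client and its tests."; }

    @Override
    public String getDefaultTemplateDir() { return "custom-templates"; }
}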
Some time ago I added the uniqueItems parameter for bean validation, as it was not getting processed by the engine even though it was part of the implemented JSR.
So I believe the codebase needs to be updated to support your own variable, which is only possible if you fork the code.
In case it helps, these two were the PRs:
For query parameters: https://github.com/swagger-api/swagger-codegen/pull/10154.
For body parameters: https://github.com/swagger-api/swagger-codegen/pull/10490.

TFS build custom conditions for running a task - check if specific previous task has failed

TFS build allows you to specify conditions for running a task: reference.
The condition I would like to define is: a specific task [addressed by name or other mean] has failed.
This is similar to Only when a previous task has failed, but I want to specify which previous task that is.
Looking at the examples, I don't see any condition that addresses a specific task's outcome, only the entire build status.
Is it possible? Any workaround to achieve this?
It doesn't seem like there's an out-of-the-box solution for this requirement, but I can offer an (ugly :)) workaround.
Suppose your specific task (the one you examine in regards to its status) is called A. The goal is to call another build task (let's say B) only in case A fails.
You can do the following:
- Define a custom build variable, call it task.A.status, and set it to success.
- Create another build task, e.g. C, and schedule it right after A; condition it to run only if A fails (there's a standard condition for that).
- Task C should do only one thing: set the task.A.status build variable to 'failure' (like this, if we are talking PowerShell: Write-Host "##vso[task.setvariable variable=task.A.status]failure").
- Finally, task B is scheduled some time after C and is conditioned to run if task.A.status equals failure, like this: eq(variables['task.A.status'], 'failure').
I might be incorrect in syntax details, but you should get the general idea. Hope it helps.

Dataflow/Beam Templates, Productionization, Initialization, and ValueProviders

I have an Apache Beam job running on Google Cloud Dataflow, and as part of its initialization it needs to run some basic sanity/availability checks on services, pub/sub subscriptions, GCS blobs, etc. It's a streaming pipeline intended to run ad infinitum that processes hundreds of thousands of pub/sub messages.
Currently it needs a whole heap of required, variable parameters: which Google Cloud project it needs to run in, which bucket and directory prefix it's going to be storing files in, which pub/sub subscriptions it needs to read from, and so on. It does some work with these parameters before pipeline.run is called - validation, string splitting, and the like. In its current form, in order to start a job we've been passing these parameters to a PipelineOptionsFactory and issuing a new compile every single time, but it seems like there should be a better way. I've set up the parameters to be ValueProvider objects, but because they're being called outside of pipeline.run, Maven complains at compile time that ValueProvider.get() is being called outside of a runtime context (which, yes, it is.)
I've tried using NestedValueProviders as in the Google "Creating Templates" document, but my IDE complains if I try to use NestedValueProvider.of to return a string as shown in the document. The only way I've been able to get NestedValueProviders to compile is as follows:
NestedValueProvider<String, String> pid = NestedValueProvider.of(
        pipelineOptions.getDataflowProjectId(),
        (SerializableFunction<String, String>) s -> s
);
(String pid = NestedValueProvider.of(...) results in the following error: "incompatible types: no instance(s) of type variable(s) T,X exist so that org.apache.beam.sdk.options.ValueProvider.NestedValueProvider conforms to java.lang.String")
I have the following in my pipelineOptions:
ValueProvider<String> getDataflowProjectId();
void setDataflowProjectId(ValueProvider<String> value);
Because of the volume of messages we're going to be processing, adding these checks at the front of the pipeline for every message that comes through isn't really practical; we'll hit daily account administrative limits on some of these calls pretty quickly.
Are templates the right approach for what I want to do? How do I go about actually productionizing this? Should (can?) I compile with maven into a jar, then just run the jar on a local dev/qa/prod box with my parameters and just not bother with ValueProviders at all? Or is it possible to provide a default to a ValueProvider and override it as part of the options passed to the template?
Any advice on how to proceed would be most appreciated. Thanks!
The way templates are currently implemented, there is no hook for performing initialization/validation that happens after template creation but before pipeline start.
All of the existing validation executes during template creation. If the validation detects that the values aren't available (because they are ValueProviders), the validation is skipped.
In some cases it is possible to approximate validation by adding runtime checks, either as part of the initial splitting of a custom source or as part of the @Setup method of a DoFn. In the latter case, the @Setup method will run once for each instance of the DoFn that is created. If the pipeline is a batch pipeline, after 4 failures for a specific instance it will fail the pipeline.
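As an illustration, a runtime check in @Setup might look roughly like this (a sketch; the ValidatingDoFn class and the bucket option are made up for the example):

import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch: validate a runtime-provided value once per DoFn instance.
class ValidatingDoFn extends DoFn<String, String> {

    private final ValueProvider<String> bucket;  // hypothetical option

    ValidatingDoFn(ValueProvider<String> bucket) {
        this.bucket = bucket;
    }

    @Setup
    public void setup() {
        // ValueProvider.get() is safe here because @Setup runs at execution time.
        String b = bucket.get();
        if (b == null || !b.startsWith("gs://")) {
            throw new IllegalArgumentException("Expected a GCS path, got: " + b);
        }
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        c.output(c.element());
    }
}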
Another option for productionizing pipelines is to build the JAR that runs the pipeline, and have a production process that runs that JAR to initiate the pipeline.
Regarding the compile error you received -- the NestedValueProvider returns a ValueProvider -- it isn't possible to get a String out of that. You could, however, put the validation code into the SerializableFunction that is run within the NestedValueProvider.
Although I believe this will currently re-run the validation every time the value is accessed, it wouldn't be unreasonable to have the NestedValueProvider cache the translated value.
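For example, the validation could be folded into the snippet from the question roughly like this (a sketch; the check and the error message are illustrative):

NestedValueProvider<String, String> pid = NestedValueProvider.of(
        pipelineOptions.getDataflowProjectId(),
        (SerializableFunction<String, String>) s -> {
            // Runs at execution time, when the templated value is available.
            if (s == null || s.isEmpty()) {
                throw new IllegalArgumentException("dataflowProjectId must be set");
            }
            return s;
        }
);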

Perform action after Dataflow pipeline has processed all data

Is it possible to perform an action once a batch Dataflow job has finished processing all data? Specifically, I'd like to move the text file that the pipeline just processed to a different GCS bucket. I'm not sure where to place that in my pipeline to ensure it executes once after the data processing has completed.
I don't see why you need to do this post pipeline execution. You could use side outputs to write the file to multiple buckets, and save yourself the copy after the pipeline finishes.
If that's not going to work for you (for whatever reason), then you can simply run your pipeline in blocking execution mode i.e. use pipeline.run().waitUntilFinish(), and then just write the rest of your code (which does the copy) after that.
[..]
// do some stuff before the pipeline runs
Pipeline pipeline = ...
pipeline.run().waitUntilFinish();
// do something after the pipeline finishes here
[..]
A little trick I got from reading the source code of Apache Beam's PassThroughThenCleanup.java.
Right after your reader, create a side input that 'combines' the entire collection (in the source code, it is the View.asIterable() PTransform) and connect its output to a DoFn. This DoFn will be called only after the reader has finished reading ALL elements.
P.S. The code literally names the operation cleanupSignalView, which I found really clever.
Note that you can achieve the same effect using Combine.globally() (Java) or beam.CombineGlobally() (Python). For more info, check out section 4.2.4.3 here.
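A rough sketch of that trick, assuming Beam's Java SDK, a text source, and a Pipeline named pipeline that is already set up (the input path and the cleanup body are illustrative):

import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

PCollection<String> lines = pipeline.apply(TextIO.read().from("gs://my-bucket/input.txt"));

// The whole collection as a side input: it only becomes ready once the
// reader has produced every element, so the consuming DoFn doubles as a
// "reading finished" signal.
final PCollectionView<Iterable<String>> cleanupSignal =
        lines.apply(View.<String>asIterable());

pipeline
        .apply(Create.of("cleanup"))
        .apply(ParDo.of(new DoFn<String, Void>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                c.sideInput(cleanupSignal);  // the side input is only ready after the read completes
                // e.g. move the processed file to another bucket here
            }
        }).withSideInputs(cleanupSignal));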
I think two options can help you here:
1) Use TextIO to write to the bucket or folder you want, specifying the exact GCS path (e.g. gs://sandbox/other-bucket)
2) Use Object Change Notifications in combination with Cloud Functions. You can find a good primer on doing this here and the SDK for GCS in JS here. What you would do in this option is basically set up a trigger for when something drops in a certain bucket, and move it to another one using a Cloud Function you write yourself.

Jenkins MultiJob Phases share ${BUILD_TIMESTAMP}

I have a parent jenkins multijob that calls 3 children jobs, passing to the children the same parameters the parent was built with.
Each child needs to use the same timestamp as it is a unique identifier that each child needs to search for on a webpage.
My problem is this:
When the parent is built, the "name" parameter is set to ${BUILD_TIMESTAMP}; let's call this "02201200", short for Feb 20, 12:00. Each child is called with "pass current job parameters". However, instead of each child receiving 02201200, they each receive ${BUILD_TIMESTAMP} and fetch this value again (e.g. 02201204).
How do I force the parent to evaluate ${BUILD_TIMESTAMP} and pass its evaluation to the children instead of the variable itself?
One possible solution would be to write the value of this time stamp to a file. Then you could reference that value in subsequent jobs via the "Parameters from properties file" option. Obviously you would just keep overwriting this file every time your job sequence runs.
I used this method and also ended up saving all metadata generally (system/environment variables, Jenkins parameters, build properties, etc.) into a properties file, and even archived it. This approach simplifies or works around many problems I had. Now every build has its metadata archived; for downstream jobs or later reference, I can get all the necessary information from this one file, and no extra parameters need to be passed around.
Furthermore, if anything goes wrong, the metadata is very helpful for investigating. I would recommend this simple strategy, as it has proven extremely useful to me and my team.

Resources