Window Duration within PipelineOptions in Cloud Dataflow - google-cloud-dataflow

I've been trying to dig into how I could do this, but I keep getting the same error:
An exception occured while executing the Java class. Value only available at runtime, but accessed from a non-runtime context:
I solved this issue before by only calling .get() once the Pipeline was instantiated and configured, inside a custom DoFn where it was needed.
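For reference, that workaround looked roughly like this (a minimal sketch with an illustrative field name and element type, not my exact code):

import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch: reading the ValueProvider at runtime, inside a DoFn.
class AppendDurationFn extends DoFn<String, String> {
    private final ValueProvider<Long> windowDuration;

    AppendDurationFn(ValueProvider<Long> windowDuration) {
        this.windowDuration = windowDuration;
    }

    @ProcessElement
    public void processElement(ProcessContext c) {
        // .get() is allowed here because this code only runs once the pipeline is executing.
        c.output(c.element() + " (duration: " + windowDuration.get() + " min)");
    }
}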
My problem now is when defining the Duration of a Window within the Pipeline, which isn't a custom object like the previously mentioned DoFn.
@Description("Defaults to 5 (minutes).")
@Default.Long(5)
ValueProvider<Long> getWindowDuration();
void setWindowDuration(ValueProvider<Long> value);
I can't wrap my head around how to access that value once the Pipeline has been deployed, or whether the current Window objects support PipelineOptions in any of their constructors...
.apply(
    options.getWindowDuration() + "min Window",
    Window.<GenericRecord>into(
        FixedWindows.of(
            Duration.standardMinutes(options.getWindowDuration().get())
            /** [Hardcoded so I can debug] Duration.standardMinutes(5) **/))
        .triggering(AfterProcessingTime.pastFirstElementInPane()
(...)

Window duration has to be specified when defining the pipeline (not during execution). So you should set it directly in the window object (for example, FixedWindows). The value you set does not necessarily have to come from a pipeline option.
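For example, a sketch assuming you are willing to declare the option as a plain long instead of a ValueProvider (the setter below is an assumption, mirroring the getter shown in the question):

// In the options interface:
@Description("Defaults to 5 (minutes).")
@Default.Long(5)
long getWindowDuration();
void setWindowDuration(long value);

// At construction time the value is available immediately, so no .get() is needed:
.apply(options.getWindowDuration() + "min Window",
    Window.<GenericRecord>into(
        FixedWindows.of(Duration.standardMinutes(options.getWindowDuration()))))

The duration is still configurable per launch; it just cannot vary for an already-constructed pipeline or template, which is exactly what the error message is pointing out.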

Related

How to access trigger event properties in a Jenkins pipeline script

I have a Jenkins job config that uses the "Build whenever the specified event is seen" trigger (supported by the CloudBees Notification API plugin) and specifies a Jmespath Query (e.g. ref=='refs/heads/master') and runs a pipeline script. I want to access other properties in the trigger event (e.g. repository.full_name) from within the pipeline script. How can I do this?
Found the answer. The data I was looking for is in the com.cloudbees.jenkins.plugins.pipeline.events.EventTriggerCause instance of the build causes. For example, the following code finds all the commits:
def newCommits = currentBuild.rawBuild.getCauses().findAll {
    it instanceof com.cloudbees.jenkins.plugins.pipeline.events.EventTriggerCause
}.collect {
    it.getEvent().commits
}

Get console Logger (or TaskListener) from Pipeline script method

If I have a Pipeline script method in a Pipeline script (Jenkinsfile), in my Global Pipeline Library's vars/, or in a src/ class, how can I obtain the OutputStream for the console log? I want to write directly to the console log.
I know I can echo or println, but for this purpose I need to write without the extra output that yields. I also need to be able to pass the OutputStream to something else.
I know I can call TaskListener.getLogger() if I can get the TaskListener (really hudson.util.StreamTaskListener) instance, but how?
I tried:
I've looked into manager.listener.logger (from the Groovy Postbuild plugin), but in the early-build context I'm calling from, it doesn't yield an OutputStream that writes to the job's Console Log.
echo "listener is a ${manager.listener} - ${manager.listener.getClass().getName()} from ${manager} and has a ${manager.listener.logger} of class ${manager.listener.logger.getClass().getName()}"
prints
listener is a hudson.util.LogTaskListener@420c55c4 - hudson.util.LogTaskListener from org.jvnet.hudson.plugins.groovypostbuild.GroovyPostbuildRecorder$BadgeManager@58ac0c55 and has a java.io.PrintStream@715b9f99 of class java.io.PrintStream
I know you can get it from a StepContext via context.get(TaskListener.class) but I'm not in a Step, I'm in a CpsScript (i.e. WorkflowScript i.e. Jenkinsfile).
I've also tried finding it from a CpsFlowExecution obtained from the DSL instance registered as the steps script property, but I couldn't work out how to discover the TaskListener that's passed to it when it's created.
How is it this hard? What am I missing? There's so much indirect magic I find it incredibly hard to navigate the system.
BTW, I'm aware direct access is blocked by Script Security, but I can create @Whitelisted methods, and anything in a global library's vars/ is always whitelisted anyway.
You can access the build object from the Jenkins root object:
def listener = Jenkins.get()
.getItemByFullName(env.JOB_NAME)
.getBuildByNumber(Integer.parseInt(env.BUILD_NUMBER))
.getListener()
def logger = listener.getLogger() as PrintStream
logger.println("Listener: ${listener} Logger: ${logger}")
Result:
Listener: CloseableTaskListener[org.jenkinsci.plugins.workflow.log.BufferedBuildListener@6e9e6a16 / org.jenkinsci.plugins.workflow.log.BufferedBuildListener@6e9e6a16] Logger: java.io.PrintStream@423efc01
After banging my head against this problem for a couple days I think I have a solution:
CpsThreadGroup.current().execution.owner.listener
It's ugly, and I don't know if it's correct or if there's a better way, but it seems to work.

Google dataflow: AvroIO read from file in google storage passed as runtime parameter

I want to read Avro files in my Dataflow pipeline using Java SDK 2.
I have scheduled my Dataflow job using a Cloud Function that is triggered when files are uploaded to the bucket.
Following is the code for options:
ValueProvider <String> getInputFile();
void setInputFile(ValueProvider<String> value);
I am trying to read this input file using following code:
PCollection<user> records = p.apply(
AvroIO.read(user.class)
.from(String.valueOf(options.getInputFile())));
I get following error while running the pipeline:
java.lang.IllegalArgumentException: Unable to find any files matching RuntimeValueProvider{propertyName=inputFile, default=gs://test_bucket/user.avro, value=null}
The same code works fine with TextIO.
How can we read an Avro file whose upload triggers the Cloud Function, which in turn triggers the Dataflow pipeline?
Please try ...from(options.getInputFile())) without converting it to a string.
For simplicity, you could even define your option as a simple String:
String getInputFile();
void setInputFile(String value);
You simply need to use from(options.getInputFile()): AvroIO explicitly supports reading from a ValueProvider.
Currently the code takes options.getInputFile(), which is a ValueProvider, calls the Java toString() method on it (which gives a human-readable debug string, "RuntimeValueProvider{propertyName=inputFile, default=gs://test_bucket/user.avro, value=null}"), and passes that as a filename for AvroIO to read. Of course this string is not a valid filename, which is why the code currently doesn't work.
Also note that the whole point of ValueProvider is that it is a placeholder for a value that is not known while constructing the pipeline and will be supplied later (potentially the pipeline will be executed several times with different values), so extracting the value of a ValueProvider at pipeline construction time is impossible by design, because there is no value yet. At runtime, though (e.g. in a DoFn), you can extract the value by calling .get() on it.
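A sketch of the corrected read, reusing the user class and options from the question:

PCollection<user> records = p.apply(
    AvroIO.read(user.class)
        .from(options.getInputFile()));  // pass the ValueProvider<String> itself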

Can Dataflow sideInput be updated per window by reading a gcs bucket?

I'm currently creating a PCollectionView by reading filtering information from a GCS bucket and passing it as a side input to different stages of my pipeline in order to filter the output. If the file in the GCS bucket changes, I want the currently running pipeline to use the new filter info. Is there a way to update this PCollectionView on each new window of data if my filter changes? I thought I could do it in a startBundle, but I can't figure out how, or whether it's possible. Could you give an example if it is possible?
PCollectionView<Map<String, TagObject>> tagMapView = pipeline
    .apply(TextIO.Read.named("TagListTextRead")
        .from("gs://tag-list-bucket/tag-list.json"))
    .apply(ParDo.named("TagsToTagMap").of(new Tags.BuildTagListMapFn()))
    .apply("MakeTagMapView", View.asSingleton());

PCollection<String> windowedData = pipeline
    .apply(PubsubIO.Read.topic("myTopic"))
    .apply(Window.<String>into(
        SlidingWindows.of(Duration.standardMinutes(15))
            .every(Duration.standardSeconds(31))));

PCollection<MY_DATA> lineData = windowedData
    .apply(ParDo.named("ExtractJsonObject")
        .withSideInputs(tagMapView)
        .of(new ExtractJsonObjectFn()));
You probably want something like "use at most a 1-minute-old version of the filter as a side input" (since in theory the file can change frequently, unpredictably, and independently from your pipeline, there's really no way to completely synchronize changes of the file with the behavior of the pipeline).
Here's a (granted, rather clumsy) solution I was able to come up with. It relies on the fact that side inputs are implicitly also keyed by window. In this solution we're going to create a side input windowed into 1-minute fixed windows, where each window will contain a single value of the tag map, derived from the filter file as of some moment inside that window.
PCollection<Long> ticks = p
    // Produce 1 "tick" per second
    .apply(CountingInput.unbounded().withRate(1, Duration.standardSeconds(1)))
    // Window the ticks into 1-minute windows
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
    // Use an arbitrary per-window combiner to reduce to 1 element per window
    .apply(Count.globally());

// Produce a collection of tag maps, 1 per each 1-minute window
PCollectionView<TagMap> tagMapView = ticks
    .apply(MapElements.via((Long ignored) -> {
        ... manually read the json file as a TagMap ...
    }))
    .apply(View.asSingleton());
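For completeness, the view would then be consumed just like in the pipeline from the question, and read inside the DoFn per window via c.sideInput(...). A sketch, reusing the names from the question:

PCollection<MY_DATA> lineData = windowedData
    .apply(ParDo.named("ExtractJsonObject")
        .withSideInputs(tagMapView)
        .of(new ExtractJsonObjectFn()));

// Inside ExtractJsonObjectFn's processElement(ProcessContext c):
//   TagMap tags = c.sideInput(tagMapView);  // the map from the matching 1-minute window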
This pattern (joining against slowly changing external data as a side input) comes up repeatedly, and the solution I'm proposing here is far from perfect; I wish we had better support for this in the programming model. I've filed a BEAM JIRA issue to track this.

Why doesn't the default attribute for number fields work for Jenkins jelly configurations?

I'm working on a Jenkins plugin where we make a call out to a remote service using Spring's RestTemplate. To configure the timeout values, I'm setting up some fields in the global configuration using the global.jelly file for Jenkins plugins using a number field as shown here:
<f:entry title="Read Timeout" field="readTimeout" description="Read timeout in ms.">
<f:number default="3000"/>
</f:entry>
Now, this works to save and retrieve the values no problem, so it looks like everything is set up correctly for my BuildStepDescriptor. However, when I first install the update to a Jenkins instance, instead of getting 3000 in the field by default as I would expect, I am getting 0. This is the same for all the fields that I'm using.
Given that the Jelly tag reference library says this attribute should be the default value, why do I keep seeing 0 when I first install the plugin?
Is there some more Java code that needs to be added to my plugin to tie the default in Jelly back to the global configuration?
I would think that when Jenkins starts, it goes to load the plugin configuration XML, fails to find a value, and sets it to a default of 0.
I have got around this in the past by setting a default in the descriptor (in Groovy); this value will then be saved into the global config the first time through and will also be available if the user never visits the config page.
@Extension
static class DescriptorImpl extends AxisDescriptor {
    final String displayName = 'Selenium Capability Axis'
    String server = 'http://localhost:4444'
    Boolean sauceLabs = false
    String sauceLabsName
    Secret sauceLabsPwd
    String sauceLabsAPIURL =
        'http://saucelabs.com/rest/v1/info/platforms/webdriver'
    String sauceLabsURL = 'http://ondemand.saucelabs.com:80'
from here
