What do these warnings mean when running our pipeline?
1497 [main] WARN com.google.cloud.dataflow.sdk.Pipeline - Transform
AsIterable2 does not have a stable unique name. In the future, this
will prevent reloading streaming pipelines
The warning in question indicates that the specified transform -- AsIterable2 -- isn't uniquely named. A likely cause of this is that there are two applications of an AsIterable transform at the top-level.
You can get rid of the warning by using the PTransform#setName method on the transform in question.
We attempt to infer names from the class names being applied. The only times it should be necessary to set an explicit name are:
1. When using an anonymous PTransform or DoFn.
2. When the same named transform is used multiple times within the same PTransform.
Specifically, requirement #2 means that if you use a named PTransform multiple times within an outer PTransform you need to make sure that each application has a different name. For instance:
input.apply(someTransform)
    .apply(View.<TableRow>asIterable().withName("iterable1"));
input.apply(someOtherTransform)
    .apply(View.<TableRow>asIterable().withName("iterable2"));
instead of:
View.AsIterable<TableRow> iterable = View.<TableRow>asIterable().setName("aName");
input.apply(someTransform).apply(iterable);
input.apply(someOtherTransform).apply(iterable);
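For case #1, here is a minimal sketch of explicitly naming an application of an anonymous DoFn, assuming the Dataflow 1.x SDK's ParDo.named and an existing PCollection<String> called input (all names here are made up):

PCollection<String> upper = input.apply(
    ParDo.named("UppercaseWords").of(new DoFn<String, String>() {
      @Override
      public void processElement(ProcessContext c) {
        // The explicit name "UppercaseWords" keeps this application stable
        // even though the DoFn itself is anonymous.
        c.output(c.element().toUpperCase());
      }
    }));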
Let's say I have a rule like this.
foo(
    name = "helloworld",
    myarray = [
        ":bar",
        "//path/to:qux",
    ],
)
In this case, myarray is static.
However, I want it to be given via the CLI, like:
bazel run //:helloworld --myarray=":bar,//path/to:qux,:baz,:another"
How is this possible?
Thanks
To get exactly what you're asking for, Bazel would need to support LABEL_LIST in Starlark-defined command line flags, which are documented here:
https://docs.bazel.build/versions/2.1.0/skylark/lib/config.html
and here: https://docs.bazel.build/versions/2.1.0/skylark/config.html
Unfortunately that's not implemented at the moment.
If you don't actually need a list of labels (i.e., to create dependencies between targets), then maybe STRING_LIST will work for you.
If you do need a list of labels, and the different possible values are known, then you can use --define, config_setting(), and select():
https://docs.bazel.build/versions/2.1.0/configurable-attributes.html
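As a rough sketch of that last approach, assuming foo's myarray is an ordinary configurable label-list attribute (the config_setting name and the define value "myarray=extended" are made up for illustration):

config_setting(
    name = "extra_deps",
    define_values = {"myarray": "extended"},
)

foo(
    name = "helloworld",
    myarray = select({
        ":extra_deps": [":bar", "//path/to:qux", ":baz", ":another"],
        "//conditions:default": [":bar", "//path/to:qux"],
    }),
)

The extended list would then be chosen with bazel run //:helloworld --define myarray=extended.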
The question is what you're really after. Passing a variable or array into a bazel build/run isn't really possible as such, and not (mostly) without very likely unwanted side effects. Aren't you perhaps really just looking to pass arguments directly to what is being run, i.e. to the executable itself rather than to bazel?
There are a few ways you could sneak data in (in most cases you'd also need to come up with a syntax to pass the data on the CLI and unpack the array in a rule), but many come at a relatively substantial price.
You can define your array in a .bzl file and load it from where the rule uses it. You can then dump new .bzl content to rewrite your build/run configuration (which also makes the change obvious and traceable) and load the values from the rule (affecting only the rules that load and use the variable). E.g., BUILD file:
load(":myarray.bzl", "myarray")
foo(
name = "helloworld",
myarray = myarray,
],
)
And you can then call your build:
$ echo 'myarray=[":bar", "//path/to:qux", ":baz", ":another"]' > myarray.bzl
$ bazel run //:helloworld
Which you can of course put in a single wrapper script. If this really needs to be a bazel array, this one is probably the cleanest way to do that.
--workspace_status_command: you can collect information about your environment, add either or both of the resulting status files as a dependency of your rule (use the volatile or the stable status file depending on whether the inputs are meant to invalidate the rule's results), and process the incoming file in whatever the rule executes (at which point one wonders why not pass the data to it as command line arguments directly). Note that if you use the stable status file, every other rule depending on it is also invalidated by any change.
You can do a similar thing using --action_env. From within the executable/tool/script underpinning the rule, you can directly access the defined environment variable. However, this also means the environment of every rule is affected (not just the one you're targeting); and again, why should it parse the information out of the environment rather than accept command line arguments?
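For illustration only (the variable name MYARRAY and its comma-separated encoding are invented, and the tool behind the rule would have to split the string itself):

$ bazel build //:helloworld --action_env=MYARRAY=":bar,//path/to:qux,:baz"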
There is also --define, but you would not really get direct access to its value; rather, you could select() a choice out of known possible options, as sketched above.
I use Kapacitor's load directory for delivering TICKscripts to all envs: https://docs.influxdata.com/kapacitor/v1.4/guides/load_directory/
One requirement: you need to set "dbrp"; otherwise you get the error:
failed to create task: must specify dbrp
At the same time I want to debug/modify (and see the log of) this alert in the Chronograf web interface (http://****:8888/sources/1/tickscript/), but I cannot, because of the error message:
cannot specify dbrp in implicitly and explicitly
since Chronograf provides one more "select database" control.
Does anyone know whether it is possible to debug a pre-loaded TICKscript in the Chronograf UI?
In https://docs.influxdata.com/kapacitor/v1.5/tick/syntax/#declarations
the following paragraph is instructive:
A database declaration begins with the keyword dbrp and is followed by two strings separated by a period. The first string declares the default database, with which the script will be used. The second string declares its retention policy. Note that the database and retention policy can also be declared using the flag -dbrp when defining the task with the command kapacitor define on the command-line, so this statement is optional. ...
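For reference, the explicit form is a single declaration at the top of the TICKscript; this in-script form is exactly what collides with Chronograf's own "select database" control:

dbrp "telegraf"."autogen"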
Since it is optional in the TICKscript, the database declaration can instead be set from the command line when you load the script, e.g.
kapacitor define load_1 -tick ~/tick/telegraf-autogen/load_1.tick -dbrp "telegraf"."autogen"
Defined this way, the dbrp is considered implicitly set, since it's not defined in the TICKscript. If you define it in the TICKscript, it is explicitly set. This small detail unlocks the conundrum: define dbrp on the command line when loading the script, not in the TICKscript itself.
Coded this way, if you later save the TICKscript in the Chronograf TICKscript editor, you won't get this error, since dbrp is not explicitly set in the TICKscript.
Yes, you have to track two pieces of code: the TICKscript and the command line you use to load it into Kapacitor. Suggestion: a comment hint in the TICKscript will help reduce confusion about the intended dbrp. Also, grouping TICKscripts in subdirectories by dbrp (as shown above), along with the load script in each directory, keeps things clean.
I have an Apache Beam job running on Google Cloud Dataflow, and as part of its initialization it needs to run some basic sanity/availability checks on services, pub/sub subscriptions, GCS blobs, etc. It's a streaming pipeline intended to run ad infinitum that processes hundreds of thousands of pub/sub messages.
Currently it needs a whole heap of required, variable parameters: which Google Cloud project it needs to run in, which bucket and directory prefix it's going to be storing files in, which pub/sub subscriptions it needs to read from, and so on. It does some work with these parameters before pipeline.run is called - validation, string splitting, and the like. In its current form, in order to start a job we've been passing these parameters to a PipelineOptionsFactory and issuing a new compile every single time, but it seems like there should be a better way. I've set up the parameters to be ValueProvider objects, but because they're being called outside of pipeline.run, Maven complains at compile time that ValueProvider.get() is being called outside of a runtime context (which, yes, it is.)
I've tried using NestedValueProviders as in the Google "Creating Templates" document, but my IDE complains if I try to use NestedValueProvider.of to return a string as shown in the document. The only way I've been able to get NestedValueProviders to compile is as follows:
NestedValueProvider<String, String> pid = NestedValueProvider.of(
    pipelineOptions.getDataflowProjectId(),
    (SerializableFunction<String, String>) s -> s
);
(String pid = NestedValueProvider.of(...) results in the following error: "incompatible types: no instance(s) of type variable(s) T,X exist so that org.apache.beam.sdk.options.ValueProvider.NestedValueProvider conforms to java.lang.String")
I have the following in my pipelineOptions:
ValueProvider<String> getDataflowProjectId();
void setDataflowProjectId(ValueProvider<String> value);
Because of the volume of messages we're going to be processing, adding these checks at the front of the pipeline for every message that comes through isn't really practical; we'll hit daily account administrative limits on some of these calls pretty quickly.
Are templates the right approach for what I want to do? How do I go about actually productionizing this? Should (can?) I compile with maven into a jar, then just run the jar on a local dev/qa/prod box with my parameters and just not bother with ValueProviders at all? Or is it possible to provide a default to a ValueProvider and override it as part of the options passed to the template?
Any advice on how to proceed would be most appreciated. Thanks!
The way templates are currently implemented, there is no point at which you can perform "post-template creation" but "pre-pipeline start" initialization/validation.
All of the existing validation executes during template creation. If the validation detects that the values aren't available (due to being a ValueProvider), the validation is skipped.
In some cases it is possible to approximate validation by adding runtime checks, either as part of the initial splitting of a custom source or in the @Setup method of a DoFn. In the latter case, the @Setup method will run once for each instance of the DoFn that is created. If the pipeline is a batch pipeline, it will fail after 4 failures for a specific instance.
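A minimal sketch of that @Setup pattern (Beam Java SDK); the checkSubscriptionExists helper is a hypothetical stand-in for whatever availability check you actually need:

import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;

class ValidatingFn extends DoFn<String, String> {
  private final ValueProvider<String> subscription;

  ValidatingFn(ValueProvider<String> subscription) {
    this.subscription = subscription;
  }

  @Setup
  public void setup() {
    // ValueProvider.get() is legal here: @Setup runs at pipeline execution time.
    String sub = subscription.get();
    if (!checkSubscriptionExists(sub)) {  // hypothetical helper, defined below
      throw new IllegalStateException("Subscription unavailable: " + sub);
    }
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element());
  }

  private boolean checkSubscriptionExists(String sub) {
    // Stand-in for a real Pub/Sub admin lookup.
    return sub != null && !sub.isEmpty();
  }
}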
Another option for productionizing pipelines is to build the JAR that runs the pipeline, and have a production process that runs that JAR to initiate the pipeline.
Regarding the compile error you received -- the NestedValueProvider returns a ValueProvider -- it isn't possible to get a String out of that. You could, however, put the validation code into the SerializableFunction that is run within the NestedValueProvider.
Although I believe this will currently re-run the validation every time the value is accessed, it wouldn't be unreasonable to have the NestedValueProvider cache the translated value.
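For example, a sketch that reuses the snippet from the question with the validation folded into the translation function (same pipelineOptions as above):

NestedValueProvider<String, String> pid = NestedValueProvider.of(
    pipelineOptions.getDataflowProjectId(),
    (SerializableFunction<String, String>) s -> {
      // Runs when the value is first accessed at pipeline execution time.
      if (s == null || s.isEmpty()) {
        throw new IllegalArgumentException("dataflowProjectId must be set");
      }
      return s;
    });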
I've been using the option Restrict matrix execution to a subset from the Parameterized Trigger Plugin to pass on a combination filter to a rather large Matrix Project where all test execution is made. As the number of tests grow, so does the combination filter (which is dynamically built up) and I seemed to hit the cap. The following job gets this error message:
FATAL: Invalid method Code length 69871 in class file Script1
java.lang.ClassFormatError: Invalid method Code length 69871 in class file Script1
After reading about this problem in the JVM documentation, it seems to be a JVM constraint:
The value of the code_length item must be less than 65536.
I get the impression that this is not something I can (or even should) tinker with in Jenkins.
My second idea to get around this problem was to create the combination filter and then pass it as a String parameter to the following Matrix Project, then use the Combination Filter option and expand the variable to achieve the same result.
Unfortunately I get this exception when trying to save my Matrix Project with a String parameter as the combination filter:
javax.servlet.ServletException: groovy.lang.MissingPropertyException: No such property: $COMBINATION_FILTER for class: groovy.lang.Binding
I guess this is because the variable needs to be available in the configuration when saving but I want to inject it when starting the Matrix Project.
I am running out of ideas to solve this problem. Any ideas?
You could try the Matrix Groovy Execution Strategy, which is like a super combination filter.
If I can quote myself:
A plugin to decide the execution order and valid combinations of matrix projects. This uses a user defined groovy script to arrange the order, which will then be executed.
Disclaimer: I built this plugin
I use Java with saxonee-9.5.1.6.jar included on the build path; when running, I get these errors at different times:
Error at xsl:import-schema on line 6 column 169 of stylesheet.xslt:
XTSE1650: net.sf.saxon.trans.LicenseException: Requested feature (xsl:import-schema)
requires Saxon-EE
Error on line 1 column 1
SXXP0003: Error reported by XML parser: Content is not allowed in prolog.
javax.xml.transform.TransformerConfigurationException: Failed to compile stylesheet. 1 error detected.
I opened the .xslt file in a hex editor and don't see any unexpected characters at the beginning. Also, I use a TransformerFactory in a different project and don't get any errors there.
Check what the implementation class of tFactory is. My guess is it is probably net.sf.saxon.TransformerFactoryImpl - which is basically the Saxon-HE version.
When you use JAXP like this, you're very exposed to configuration problems, because it loads whatever it finds sitting around on the classpath, or is affected by system property settings which could be set in parts of the application you know nothing about.
If your application depends on particular features, it's best to load a specific TransformerFactory, e.g. tFactory = new com.saxonica.config.EnterpriseTransformerFactory().
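A minimal sketch of that, assuming Saxon-EE (with a valid license) is on the classpath; the file names are hypothetical:

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExplicitFactory {
  public static void main(String[] args) throws Exception {
    // Instantiate the Saxon-EE factory directly; no JAXP classpath lookup involved.
    TransformerFactory tFactory = new com.saxonica.config.EnterpriseTransformerFactory();
    Transformer t = tFactory.newTransformer(new StreamSource("stylesheet.xslt"));
    t.transform(new StreamSource("input.xml"), new StreamResult("output.xml"));
  }
}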
I don't know whether your stylesheet expects the source document to be validated against the schema, but if it does, note that this isn't automatic: you can set properties on the factory to make it happen.
I would recommend using Saxon's s9api interface rather than JAXP for this kind of thing. The JAXP interface was designed for XSLT 1.0, and it's a real stretch to use it for some of the new 2.0 features like schema-awareness: it can be done, but you keep running into limitations.
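As a comparable sketch via s9api (again with hypothetical file names; new Processor(true) requests the licensed EE features):

import java.io.File;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
import net.sf.saxon.s9api.XsltTransformer;

public class S9apiTransform {
  public static void main(String[] args) throws SaxonApiException {
    Processor proc = new Processor(true);  // true = request Saxon-EE features
    XsltCompiler compiler = proc.newXsltCompiler();
    XsltExecutable exec = compiler.compile(new StreamSource(new File("stylesheet.xslt")));
    XsltTransformer transform = exec.load();
    transform.setSource(new StreamSource(new File("input.xml")));
    transform.setDestination(proc.newSerializer(new File("output.xml")));
    transform.transform();
  }
}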