Will other rule results impact the correctness of the policy? - open-policy-agent

I am working on a rego policy that has multiple conditions. Each condition needs to make a call to the same endpoint of a REST api.
I have managed to retain the result of the service call by storing it as the result of another rule.
The result of the Evaluate Package command (VS Code with OPA extension) on the policy looks like this now:
[
  [
    {
      "allow": false,
      "someProp": [
        "N/A"
      ]
    }
  ]
]
Will this affect the correctness of the policy once it gets deployed to the server?
EDIT
Before my changes, the policy only returned allow.

The "evaluate package" feature of the VS Code plugin does pretty much just that :) i.e. it evaluates the whole package including all rules. You'll see the same default behavior in the Rego Playground, since it's a pretty good way of learning what's going on to see what all rules evaluate to.
When running OPA as a server (or, for that matter, with opa eval), you'll commonly query only the specific rule you're interested in. So if in your case you had defined your policy in a package called mypolicy, a request to the OPA REST API might be sent to the /v1/data/mypolicy/allow endpoint to evaluate only the allow rule rather than the whole package (which could be queried at /v1/data/mypolicy).
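For example, assuming OPA serves its default REST API on localhost:8181 and the package is named mypolicy, a query for just the allow rule could look roughly like this (the input document here is made up):
$ curl -s -H 'Content-Type: application/json' \
    -d '{"input": {"user": "alice"}}' \
    localhost:8181/v1/data/mypolicy/allow
{"result": false}
Only the value of allow comes back; other rules in the package (someProp in your case) are evaluated only if allow depends on them, and they are not part of the response.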

Related

Bazel - best documentation for which providers are used by any given rule?

I am writing a custom rule that takes inputs from cc_library, cc_binary, apple_static_library, and a few other platform-specific rules. I'd like to view each API given to me via referencing ctx.attr.foo inside the custom rule's implementation function.
There is a list of providers here https://docs.bazel.build/versions/master/skylark/lib/skylark-provider.html but it doesn't say which rules are using them.
Is there a best practice for viewing what these rules are providing me, or does it require going through the source for each one?
This is how you get all providers and output groups from a target:
bazel cquery my_target --output=starlark --starlark:expr="providers(target)"
You can get a list of providers for a given target with dir. Something like this is helpful for debugging:
def _print_attrs_impl(ctx):
    for target in ctx.attr.targets:
        print("%s: %s" % (target.label, dir(target)))

print_attrs = rule(_print_attrs_impl, attrs = {"targets": attr.label_list()})
Printing from inside a rule you're developing is often helpful too, to verify targets are actually what you expect them to be.
You can also apply dir to the providers themselves, to see what fields they have.
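For instance, a hypothetical variant of the rule above that inspects the fields of one specific provider (here CcInfo, assuming the targets actually provide it):
def _print_cc_info_impl(ctx):
    for target in ctx.attr.targets:
        if CcInfo in target:
            # dir() on the provider instance lists its field names
            print("%s: %s" % (target.label, dir(target[CcInfo])))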

How to pass an array from Bazel cli to rules?

Let's say I have a rule like this.
foo(
    name = "helloworld",
    myarray = [
        ":bar",
        "//path/to:qux",
    ],
)
In this case, myarray is static.
However, I want it to be given by cli, like
bazel run //:helloworld --myarray=":bar,//path/to:qux,:baz,:another"
How is this possible?
Thanks
To get exactly what you're asking for, Bazel would need to support LABEL_LIST in Starlark-defined command line flags, which are documented here:
https://docs.bazel.build/versions/2.1.0/skylark/lib/config.html
and here: https://docs.bazel.build/versions/2.1.0/skylark/config.html
Unfortunately that's not implemented at the moment.
If you don't actually need a list of labels (i.e., to create dependencies between targets), then maybe STRING_LIST will work for you.
If you do need a list of labels, and the different possible values are known, then you can use --define, config_setting(), and select():
https://docs.bazel.build/versions/2.1.0/configurable-attributes.html
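A rough sketch of that approach, with a made-up define name and value, could look like this in the BUILD file:
config_setting(
    name = "extra_targets",
    define_values = {"myarray": "extra"},
)

foo(
    name = "helloworld",
    myarray = select({
        ":extra_targets": [":bar", "//path/to:qux", ":baz", ":another"],
        "//conditions:default": [":bar", "//path/to:qux"],
    }),
)
and would be invoked as bazel run //:helloworld --define=myarray=extra.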
The question is what you are really after. Passing a variable or array into bazel build/run isn't really possible, at least not as such and mostly not without (very likely unwanted) side effects. Are you perhaps really just looking to pass arguments directly to what is being executed by run, i.e. to the executable itself rather than to bazel?
There are a few ways you could sneak data in (in most cases you'd also need to come up with a syntax for passing it on the CLI and for unpacking the array in a rule), but many of them come at a relatively substantial price.
You can define your array in a .bzl file and load it from wherever the rule uses it. You can then rewrite that .bzl file to change your build/run configuration (which also keeps the change obvious and traceable), and only the rules that load and use the variable are affected. E.g. in the BUILD file:
load(":myarray.bzl", "myarray")
foo(
    name = "helloworld",
    myarray = myarray,
)
And you can then call your build:
$ echo 'myarray=[":bar", "//path/to:qux", ":baz", ":another"]' > myarray.bzl
$ bazel run //:helloworld
Which you can of course put in a single wrapper script. If this really needs to be a bazel array, this one is probably the cleanest way to do that.
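A minimal sketch of such a wrapper (the script name is hypothetical; it takes the labels as a single comma-separated argument):
#!/usr/bin/env bash
# run_helloworld.sh: regenerate myarray.bzl from a comma-separated list of labels, then run the target
set -euo pipefail
IFS=',' read -ra labels <<< "$1"
{
  printf 'myarray = ['
  for label in "${labels[@]}"; do printf '"%s", ' "$label"; done
  printf ']\n'
} > myarray.bzl
bazel run //:helloworld
invoked as ./run_helloworld.sh ':bar,//path/to:qux,:baz,:another'.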
--workspace_status_command: you can collect information about your environment and add either or both of the resulting status files as a dependency of your rule (depending on whether the inputs are meant to invalidate the rule results or not, you could use the volatile or the stable status file), then process the incoming file in whatever is being executed by the rule (at which point one would wonder why not just pass the data as command line arguments directly). Note that if you use the stable status file, any change to it also invalidates every other rule depending on it.
You can do a similar thing using --action_env. The executable/tool/script underpinning the rule can then read the defined environment variable directly. However, this also affects the environment of every rule (not just the one you're targeting); and again, why parse the information out of the environment rather than accept it as command line arguments?
There is also --define, but you would not really get direct access to its value; rather, you could select() a choice out of a set of predefined options.

Dataflow/Beam Templates, Productionization, Initialization, and ValueProviders

I have an Apache Beam job running on Google Cloud Dataflow, and as part of its initialization it needs to run some basic sanity/availability checks on services, pub/sub subscriptions, GCS blobs, etc. It's a streaming pipeline intended to run ad infinitum that processes hundreds of thousands of pub/sub messages.
Currently it needs a whole heap of required, variable parameters: which Google Cloud project it needs to run in, which bucket and directory prefix it's going to be storing files in, which pub/sub subscriptions it needs to read from, and so on. It does some work with these parameters before pipeline.run is called - validation, string splitting, and the like. In its current form, in order to start a job we've been passing these parameters to a PipelineOptionsFactory and issuing a new compile every single time, but it seems like there should be a better way. I've set up the parameters to be ValueProvider objects, but because they're being called outside of pipeline.run, Maven complains at compile time that ValueProvider.get() is being called outside of a runtime context (which, yes, it is.)
I've tried using NestedValueProviders as in the Google "Creating Templates" document, but my IDE complains if I try to use NestedValueProvider.of to return a string as shown in the document. The only way I've been able to get NestedValueProviders to compile is as follows:
NestedValueProvider<String, String> pid = NestedValueProvider.of(
    pipelineOptions.getDataflowProjectId(),
    (SerializableFunction<String, String>) s -> s
);
(String pid = NestedValueProvider.of(...) results in the following error: "incompatible types: no instance(s) of type variable(s) T,X exist so that org.apache.beam.sdk.options.ValueProvider.NestedValueProvider conforms to java.lang.String")
I have the following in my pipelineOptions:
ValueProvider<String> getDataflowProjectId();
void setDataflowProjectId(ValueProvider<String> value);
Because of the volume of messages we're going to be processing, adding these checks at the front of the pipeline for every message that comes through isn't really practical; we'll hit daily account administrative limits on some of these calls pretty quickly.
Are templates the right approach for what I want to do? How do I go about actually productionizing this? Should (can?) I compile with maven into a jar, then just run the jar on a local dev/qa/prod box with my parameters and just not bother with ValueProviders at all? Or is it possible to provide a default to a ValueProvider and override it as part of the options passed to the template?
Any advice on how to proceed would be most appreciated. Thanks!
The way templates are currently implemented, there is no point at which to perform initialization/validation that happens after template creation but before pipeline start.
All of the existing validation executes during template creation. If the validation detects that the values aren't available (because they are ValueProviders), the validation is skipped.
In some cases it is possible to approximate validation by adding runtime checks, either as part of the initial splitting of a custom source or as part of the @Setup method of a DoFn. In the latter case, the @Setup method will run once for each instance of the DoFn that is created. If the pipeline is a batch pipeline, after 4 failures for a specific instance it will fail the pipeline.
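A rough sketch of the @Setup variant (the class, field name, and check are all placeholders):
class CheckedDoFn extends DoFn<String, String> {
  private final ValueProvider<String> subscription;

  CheckedDoFn(ValueProvider<String> subscription) {
    this.subscription = subscription;
  }

  @Setup
  public void setup() {
    // placeholder validation; it runs once per DoFn instance at runtime,
    // not once per element, so it won't hit per-message API quotas
    if (subscription.get() == null || subscription.get().isEmpty()) {
      throw new IllegalStateException("subscription must be provided at runtime");
    }
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    c.output(c.element());
  }
}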
Another option for productionizing pipelines is to build the JAR that runs the pipeline, and have a production process that runs that JAR to initiate the pipeline.
Regarding the compile error you received: NestedValueProvider.of returns a ValueProvider, so it isn't possible to get a String out of it at construction time. You could, however, put the validation code into the SerializableFunction that is run within the NestedValueProvider.
Although I believe this will currently re-run the validation every time the value is accessed, it wouldn't be unreasonable to have the NestedValueProvider cache the translated value.
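A rough sketch of that idea (the option name and the check itself are placeholders):
ValueProvider<String> projectId = NestedValueProvider.of(
    pipelineOptions.getDataflowProjectId(),
    (SerializableFunction<String, String>) id -> {
      // placeholder validation; runs lazily when get() is called at runtime
      if (id == null || id.isEmpty()) {
        throw new IllegalArgumentException("dataflowProjectId must be set");
      }
      return id;
    });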

Machine parseable error messages

(From https://groups.google.com/d/msg/bazel-discuss/cIBIP-Oyzzw/caesbhdEAAAJ)
What is the recommended way for rules to export information about failures such that downstream tools can include them in UIs?
Example use case:
I run bazel test //my:target, and one of the actions for //my:target fails because there is an unknown variable "usrname" in my/target.foo at line 7, column 10. The checker would also like to report that "username" is a valid variable and that this is a possible misspelling, and thus wants to suggest adding an "e" character.
One way I have thought of doing this is to have my action produce a separate file, //my:target.errors, in a separate output group, and have it write machine-parseable data there in addition to human-readable data on stdout.
I can then find all of these files and parse the data in them in downstream tools.
Is there any prior work on this, or does everything just try to parse the human readable output?
I recommend running the error checkers as extra actions.
I don't think Bazel currently has hooks for custom error handlers like you describe. Please consider opening a feature request: https://github.com/bazelbuild/bazel/issues/new
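For reference, the separate-output-group approach described in the question could look roughly like this in Starlark (the rule, attribute names, and checker tool are all hypothetical):
def _checked_rule_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    errors = ctx.actions.declare_file(ctx.label.name + ".errors")
    ctx.actions.run(
        outputs = [out, errors],
        inputs = ctx.files.srcs,
        executable = ctx.executable._checker,
        arguments = [f.path for f in ctx.files.srcs] + [out.path, errors.path],
    )
    return [
        DefaultInfo(files = depset([out])),
        # the machine-parseable error file lives in its own output group
        OutputGroupInfo(errors = depset([errors])),
    ]

checked_rule = rule(
    implementation = _checked_rule_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "_checker": attr.label(default = "//tools:checker", executable = True, cfg = "exec"),
    },
)
Downstream tools could then request the error files with bazel build //my:target --output_groups=+errors and parse them.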

Requiring workflow class in workflow starter code

I have a simple question regarding the architecture of my Amazon Simple Workflow / AWS Flow for Ruby app. For background, I have a simple workflow with one activity running in an AWS Flow for Ruby layer on Opsworks. I have a separate REST API running in a Rails App Server layer on Opsworks from which I would like to kick off the workflow.
The code in the REST API that kicks off the workflow:
1: domain = AWS::SimpleWorkflow.new.domains['my_domain']
2: workflow_client = AWS::Flow::workflow_client(domain.client, domain) {{from_class: MyWorkflowClass}}
3: workflow_client.start_execution(input_1: @input1, input_2: @input2)
My assumption is that my workflow and REST API code bases could be separate and that the only common component would be the aws-flow Ruby gem and require 'aws/decider'. However, I'm finding that my REST API also needs to have require 'PATH_TO_MY_WORKFLOW_CLASS'. When I remove that line of code from the code file in my REST API that kicks off the workflow, I get the following error:
undefined method `_options' for nil:NilClass; ["/Users/MyName/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-2.2.1/lib/aws/decider/utilities.rb:183:in `interpret_block_for_options'", "/Users/MyName/.rvm/gems/ruby-2.0.0-p247/gems/aws-flow-2.2.1/lib/aws/decider/implementation.rb:73:in `workflow_client'"
(error at line 2 above)
Am I mistaken? Do I really need to require MyWorkflowClass in my workflow starter app (i.e. my REST API) or am I doing something wrong? I've scoured the documentation and could not find a clear answer to this. All the samples that I can find do indeed have the workflow class included in the workflow starter code, but I'm not sure if it's because they are bundled as a simple sample or if it's because it's the way it's supposed to be. The reason why I am not taking the samples at face value is because requiring the workflow class in the workflow starter code does not make any sense to me. It binds the two apps way too tightly for my taste.
I posted an issue on the aws-flow-ruby SDK and got an answer from an Amazon engineer. In short, you can use the :from_class option, or the :prefix_name and :execution_method options together.
There are two ways of starting the workflow in code
1) Using the aws sdk directly.
In this case, your code doesn't need to know anything about the workflow class. You just need the domain, workflow type (name and version) and the workflow id.
It will look something like -
require 'aws-sdk-v1'

swf = AWS::SimpleWorkflow.new.client
swf.start_workflow_execution(
  domain: "HelloWorld",
  workflow_type: {
    name: "HelloWorldWorkflow",
    version: "1.0"
  },
  workflow_id: "foo",
  input: ....,
  ....other options (optional)...
)
As you can see above, this doesn't require the workflow class at all.
2) Using the aws-flow gem (which is what you are doing above).
There are two ways of using the workflow client provided by the aws-flow gem to start an execution. You can either use the client as a generic client and not tie it to any workflow class or you can use the :from_class option to fetch options from a particular workflow class. To use the from_class option, you need to have the class in the ObjectSpace (hence you need to require the workflow file).
With from_class -
require 'aws/decider'
domain = AWS::SimpleWorkflow.new.domains['my_domain']
workflow_client = AWS::Flow::workflow_client(domain.client, domain) {{from_class: "MyWorkflowClass"}}
workflow_client.start_execution(input_1: @input1, input_2: @input2)
Without from_class -
require 'aws/decider'
domain = AWS::SimpleWorkflow.new.domains['my_domain']
workflow_client = AWS::Flow::workflow_client(domain.client, domain) {{
  prefix_name: "YourClassName",
  execution_method: "workflow_method_name",
  version: "1.0",
  ...other options...
}}
workflow_client.start_execution(input_1: @input1, input_2: @input2)
The recommended way to start a workflow execution is to use aws-flow WorkflowClient instead of using the SDK directly.
Additional notes with respect to the input accepted by a workflow:
The SDK and the console will only take strings as input. This can be a free form string but if your workflow is written using ruby flow, this string should be a serialized form of your input so that the WorkflowWorker can deserialize the input when it picks up the task and convert it into ruby objects (in this case a hash).
When you use the ruby flow WorkflowClient, the client will automatically serialize your input hash (or any other input) into a string before sending it to SWF. aws-flow by default uses a YAML based data converter to do this (It can be overridden).
If you just want to see what your input hash will look like as a string, you can do the following -
AWS::Flow::FlowConstants.default_data_converter.dump(input_hash)
You can then use this serialized input to start a workflow using the SDK or the console.
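For example, a rough sketch combining the two (assuming the require lines shown earlier; the input hash, ids, and workflow type name are made up, and the type name assumes aws-flow's usual "Prefix.method" registration):
input_hash = { input_1: "foo", input_2: "bar" }
serialized = AWS::Flow::FlowConstants.default_data_converter.dump(input_hash)

swf = AWS::SimpleWorkflow.new.client
swf.start_workflow_execution(
  domain: "my_domain",
  workflow_type: {
    name: "MyWorkflowClass.workflow_method_name",
    version: "1.0"
  },
  workflow_id: "foo",
  input: serialized
)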
