I've been trying to wrap my head around this issue for a while. I have a JSON input which contains an array, say something like this:
{
  "array": [
    {"foo": "bar"},
    {"foo": "buzz"},
    {"misbehaving": "object"}
  ]
}
My goal is to verify that all of the objects in the array satisfy the condition of having a field named foo (the actual use case is to make sure that all resources in a cloud deployment have tags). My issue is that standard Rego expressions are evaluated as "at least one" and not "all", which means that expressions like:
all_have_foo_field {
    input.array[_].foo
}
are always returning true, even though some objects do not satisfy the condition. I've looked at this, but evaluating a regex returns true or false, while my policy checks whether a field exists, meaning that if it does not, I get a 'var_is_unsafe' error.
Any ideas?
There are two ways to say "all elements in X must satisfy these conditions" (FOR ALL).
TLDR:
all_have_foo_field {
    # use negation and a helper rule
    not any_missing_foo_field
}

any_missing_foo_field {
    some i
    input.array[i]
    not input.array[i].foo
}
OR
all_have_foo_field {
    # use a comprehension
    having_foo := {i | input.array[i].foo}
    count(having_foo) == count(input.array)
}
The approach depends on the use case. If you want to know which elements do not satisfy the conditions, the comprehension is nice because you can use set arithmetic, e.g., {i | input.array[i]} - {i | input.array[i].foo} produces the set of array indices that do not have the field "foo". You probably want to assign these expressions to local variables for readability. See this section in the docs for more detail: https://www.openpolicyagent.org/docs/latest/policy-language/#universal-quantification-for-all.
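For example, a minimal sketch of that set arithmetic (the rule name deny and the message format are my own, not from the question):

deny[msg] {
    all_indices := {i | input.array[i]}
    foo_indices := {i | input.array[i].foo}
    missing := all_indices - foo_indices
    count(missing) > 0
    msg := sprintf("array elements at indices %v are missing 'foo'", [missing])
}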
In this case (as opposed to the answer you linked to) we don't have to use regex or anything like that, since references to missing/undefined fields result in undefined, and undefined propagates outward to the expression, query, rule, etc. This is covered to some extent in the Introduction.
All we have to do then is refer to the field in question. Note that, technically, not input.array[i].foo would also be TRUE if the "foo" field's value were false; however, in many cases undefined and false can be treated as interchangeable (they're not quite the same--false is a valid JSON value whereas undefined represents the lack of a value). If you need to match only undefined, you have to assign the result of the reference to a local variable. In the comprehension case we can write:
# the set will contain every index i where the field "foo" exists,
# regardless of its value
{i | _ = input.array[i].foo}
In the negation case we need an additional helper rule since not _ = input.array[i].foo would be "unsafe". We can write:
exists(value, key) { value[key] = _ }
And now not exists(input.array[i], "foo") is only TRUE when the field "foo" is missing.
Note, differentiating between undefined and false is often not worth it--I recommend only doing so when necessary.
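Putting those pieces together, a sketch of the strict variant (treating only a truly absent field as missing) could look like:

all_have_foo_field {
    not any_missing_foo_field
}

any_missing_foo_field {
    some i
    input.array[i]
    not exists(input.array[i], "foo")
}

exists(value, key) { value[key] = _ }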
The following is my sample code: https://play.openpolicyagent.org/p/oyY1GOsYaf
Here, when I try to evaluate the names array, it shows:
error occurred: 1:1: rego_unsafe_var_error: var names is unsafe
But when I define the same comprehension outside the allow rule definition (https://play.openpolicyagent.org/p/Xv0cF7FM8b), I am able to evaluate the selection:
[
  "smoke",
  "dev"
]
Could someone help me point out the difference? And if I want to define the comprehension inside the rule, is there any syntax I need to follow? Thanks in advance.
Note: I am getting the final output as expected in both cases; the only issue is with evaluating the names array.
The way the Rego Playground generates a query when evaluating a selection is much more simplistic than one might assume. A query is generated from your selected text without taking into account where in the document that text was selected. This means that even if you select a local variable inside a rule body, the query will simply contain that variable name (names, in your case), which is then treated as a reference to a top-level variable in the document's body, even though a rule-local variable was selected. This is why your first sample returns an error (there is no top-level variable names in that document), whereas the second sample does have one and therefore succeeds.
You can test this quirk by selecting and evaluating the word hello on line 3 here: https://play.openpolicyagent.org/p/n5OPoFnlhx.
package play

# hello

hello {
    m := input.message
    m == "world"
}
Even though it's just part of a comment, it'll evaluate just as if you had selected the rule name on line 5.
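A practical workaround, if you want to inspect such intermediate values in the Playground, is to hoist the comprehension into its own top-level rule; selecting that rule's name then evaluates cleanly. A minimal sketch (the input shape here is invented, not taken from the linked playgrounds):

package play

# selecting "names" in the Playground now works, because it is top-level
names := [n | n := input.stages[_].name]

allow {
    count(names) > 0
}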
I am looking to change the ending of the user name based on the use case (in the language the system will operate in, name endings change depending on how the name is used).
So I need to define all the possible name endings and the replacement for each of them.
It was suggested to use gsub with a regular expression to search and replace in a string:
Changing text based on the final letter of user name
"name surname".gsub(/e\b/, 'ai')
this will replace a trailing e with ai, so "name surname" becomes "namai surnamai".
How can it be used for more options like: "e = ai, us = mi, i = as" on the same record?
thanks
You can use String#gsub with a block. The docs say:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
So you can use a regex with concatenation of all substrings to be replaced and then replace it in the block, e.g. using a hash that maps matches to replacements.
Full example:
replacements = {'e' => 'ai', 'us' => 'mi', 'i' => 'as'}

['surname', 'surnamus', 'surnami'].map do |s|
  s.gsub(/(e|us|i)$/) { |p| replacements[p] }
end
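For the record, this returns ["surnamai", "surnammi", "surnamas"].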
@Sundeep makes an important observation in a comment on the question. If, for example, the substitutions were given by the following hash:
g = {'e'=>'ai', 's'=>'es', 'us'=>'mi', 'i' => 'as'}
#=> {"e"=>"ai", "s"=>"es", "us"=>"mi", "i"=>"as"}
'surnamus' could be converted (incorrectly) to 'surnamues' if each substitution were applied one at a time in hash order, merely because 's'=>'es' precedes 'us'=>'mi' in g. (With the single \z-anchored union pattern built below, the leftmost match already selects the longest matching suffix, so the ordering is a defensive measure rather than a strict necessity.) That situation may not exist at present, but it may be prudent to allow for it in future, particularly because it is so simple to do so:
h = g.sort_by { |k,_| -k.size }.to_h
#=> {"us"=>"mi", "e"=>"ai", "s"=>"es", "i"=>"as"}
arr = ['surname', 'surnamus', 'surnami', 'surnamo']
The substitutions can be done using the form of String#sub that employs a hash as its second argument.
r = /#{Regexp.union(h.keys)}\z/
#=> /(?-mix:us|e|s|i)\z/
arr.map { |s| s.sub(r,h) }
#=> ["surnamai", "surnammi", "surnamas", "surnamo"]
See also Regexp::union.
Incidentally, though key-insertion order has been guaranteed for hashes since Ruby v1.9, there is a continuing debate as to whether that property should be made use of in Ruby code, mainly because there was no concept of key order when hashes were first used in computer programs. This answer provides a good example of the benefit of exploiting key order.
Say I have a scope like this:
scope :by_templates, ->(t) { joins(:template).where('templates.label ~* ?', t) }
How can I retrieve multiple templates with t like so?
Document.first.by_templates(%w[email facebook])
This code raises the following error:
PG::DatatypeMismatch: ERROR: argument of AND must be type boolean, not type record
LINE 1: ...template_id" WHERE "documents"."user_id" = $1 AND (templates...
PostgreSQL allows you to apply a boolean valued operator to an entire array of values using the op any(array_expr) construct:
9.23.3. ANY/SOME (array)
expression operator ANY (array expression)
expression operator SOME (array expression)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the case where the array has zero elements).
PostgreSQL also supports the array constructor syntax for creating arrays:
array[value, value, ...]
Conveniently, ActiveRecord will expand a placeholder as a comma-delimited list when the value is an array.
Putting these together gives us:
scope :by_templates, ->(templates) { joins(:template).where('templates.label ~* any(array[?])', templates) }
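Assuming the models from the question, the scope now matches every pattern in a single query; the generated WHERE fragment looks roughly like this (illustrative, not copied from a real log):

Document.by_templates(%w[email facebook])
# WHERE ... templates.label ~* any(array['email','facebook'])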
As an aside, if you're using the case-insensitive regex operator (~*) as a case-insensitive comparison (i.e. no real regex pattern matching going on) then you might want to use upper instead:
# Yes, this class method is still a scope.
def self.by_templates(templates)
  joins(:template).where('upper(templates.label) = any(array[?])', templates.map(&:upcase))
end
Then you could add an index to templates on upper(label) to speed things up and avoid possible issues with stray regex metacharacters in the templates. I tend to use upper case for this sort of thing because of oddities like 'ß'.upcase being 'SS' but 'SS'.downcase being 'ss'.
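For reference, such an expression index could be created with a migration along these lines (a sketch; the class name, Rails version, and index name are placeholders):

class IndexTemplatesOnUpperLabel < ActiveRecord::Migration[6.0]
  def change
    # Expression index so upper(templates.label) = any(...) can use an index scan.
    add_index :templates, 'upper(label)', name: 'index_templates_on_upper_label'
  end
end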
unregister_name({local,Name}) ->
    _ = (catch unregister(Name));
unregister_name({global,Name}) ->
    _ = global:unregister_name(Name);
unregister_name({via, Mod, Name}) ->
    _ = Mod:unregister_name(Name);
unregister_name(Pid) when is_pid(Pid) ->
    Pid.
This is from gen_server.erl. If _ always matches and the match always evaluates to the right hand side expression, what are the _ = expression() lines doing here?
Typically _ = ... matches are used to quiet dialyzer warnings about unmatched function return values when its -Wunmatched_returns option is used. As the documentation explains:
-Wunmatched_returns
Include warnings for function calls which ignore a structured return value or
do not match against one of many possible return value(s).
By explicitly matching the return value against the _ "don't care" variable, you can use this useful dialyzer option without having to see warnings for return values you don't care about.
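For illustration, here is a minimal sketch of the pattern (the module and function are invented); file:close/1 returns ok | {error, Reason}, so silently ignoring it is exactly what -Wunmatched_returns complains about:

-module(quiet).
-export([close_quietly/1]).

%% Deliberately discard file:close/1's ok | {error, Reason} result;
%% the _ = match documents the intent and silences -Wunmatched_returns.
close_quietly(IoDevice) ->
    _ = file:close(IoDevice),
    ok.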
In Erlang, the last expression of a function is its return value, so someone might be tempted to check what global:unregister_name/1 or Mod:unregister_name(Name) returns and try to pattern match on it.
The _ = expression() doesn't do anything in particular, but it hints that the return value should be ignored (for example, because it is not documented and might be subject to change). In the last clause, however, Pid is returned explicitly. This means that you can pattern match like this:
case unregister_name(Something) of
    Pid when is_pid(Pid) -> foo();
    _ -> bar()
end.
To sum up: those lines aren't doing anything there, but when someone else reads the source code, they show the original programmer's intent.
Unfortunately, this particular function is not exported and is never used in a pattern match in the original module, so I don't have an example to back this up :)
And I'll note that I've since come across this:
The Power of Ten – Rules for Developing Safety Critical Code
Gerard J. Holzmann
NASA/JPL Laboratory for Reliable Software, Pasadena, CA 91109
[...]
Rule: The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.

Rationale: This is possibly the most frequently violated rule, and therefore somewhat more suspect as a general rule. In its strictest form, this rule means that even the return value of printf statements and file close statements must be checked. One can make a case, though, that if the response to an error would rightfully be no different than the response to success, there is little point in explicitly checking a return value. This is often the case with calls to printf and close. In cases like these, it can be acceptable to explicitly cast the function return value to (void) – thereby indicating that the programmer explicitly and not accidentally decides to ignore a return value. In more dubious cases, a comment should be present to explain why a return value is irrelevant. In most cases, though, the return value of a function should not be ignored, especially if error return values must be propagated up the function call chain. Standard libraries famously violate this rule with potentially grave consequences. See, for instance, what happens if you accidentally execute strlen(0), or strcat(s1, s2, -1) with the standard C string library – it is not pretty. By keeping the general rule, we make sure that exceptions must be justified, with mechanical checkers flagging violations. Often, it will be easier to comply with the rule than to explain why noncompliance might be acceptable.
I want to use erlang datetime values in the standard format {{Y,M,D},{H,Min,Sec}} in a MNESIA table for logging purposes and be able to select log entries by comparing with constant start and end time tuples.
It seems that the matchspec guard compiler somehow confuses tuple values with guard sub-expressions. Evaluating ets:match_spec_compile(MatchSpec) fails for
MatchSpec = [
    {
        {'_','$1','$2'},
        [
            {'==','$2',{1,2}}
        ],
        ['$_']
    }
]
but succeeds when I compare $2 with any non-tuple value.
Is there a restriction that match guards cannot compare tuple values?
I believe the answer is to use double braces when using tuples (see the Variables and Literals section of http://www.erlang.org/doc/apps/erts/match_spec.html#id69408). So to use a tuple in a matchspec expression, surround that tuple with an extra pair of braces, as in:
{'==','$2',{{1,2}}}
So, if I understand your example correctly, you would have
22> M=[{{'_','$1','$2'},[{'==','$2',{{1,2}}}],['$_']}].
[{{'_','$1','$2'},[{'==','$2',{{1,2}}}],['$_']}]
23> ets:match_spec_run([{1,1,{1,2}}],ets:match_spec_compile(M)).
[{1,1,{1,2}}]
24> ets:match_spec_run([{1,1,{2,2}}],ets:match_spec_compile(M)).
[]
EDIT: (sorry to edit your answer but this was the easiest way to get my comment in a readable form)
Yes, this is how it must be done. An easier way to get the match-spec is to use the (pseudo) function ets:fun2ms/1 which takes a literal fun as an argument and returns the match-spec. So
10> ets:fun2ms(fun ({A,B,C}=X) when C == {1,2} -> X end).
[{{'$1','$2','$3'},[{'==','$3',{{1,2}}}],['$_']}]
The shell recognises ets:fun2ms/1. For more information see the ETS documentation. Mnesia uses the same match-specs as ETS.
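Applied to the datetime-range use case from the question, a sketch might look like this (the table name and record shape are invented for illustration):

-module(log_query).
-include_lib("stdlib/include/ms_transform.hrl").
-export([in_range/2]).

%% Select {log_entry, {{Y,M,D},{H,Min,Sec}}, Msg} tuples whose timestamp
%% lies between Start and End (inclusive). fun2ms emits the double-braced
%% tuple constants for us.
in_range(Start, End) ->
    MS = ets:fun2ms(fun({log_entry, T, _Msg} = Rec)
                          when T >= Start, T =< End -> Rec end),
    mnesia:dirty_select(log_entry, MS).

Erlang's term ordering compares tuples element by element, so {{Y,M,D},{H,Min,Sec}} timestamps compare chronologically.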