Z3 model for correct Dafny method

For a correct method, can Z3 find a model for the method's verification condition?
I had thought not, but here is an example where the method is correct
yet verification finds a model.
This was with Dafny 1.9.7.

What Malte says is correct (and I found it nicely explained as well).
Dafny is sound, in the sense that it will only verify correct programs. In other words, if a program is incorrect, the Dafny verifier will never say that it is correct. However, the underlying decision problems are in general undecidable. Therefore, unavoidably, there will be cases where a program meets its specifications and the verifier still gives an error message. Indeed, in such cases, the verifier may even show a purported counterexample. It may be a false counterexample (as in the example above) -- it simply means that, as far as the verifier can tell, this is a counterexample. If the verifier just spent a little more time or if it was clever enough to unroll more function definitions, apply induction hypotheses, or do a host of other good-things-to-do, it may be possible to determine that the counterexample is bogus. So, any error message you get (including any counterexample that may accompany such an error message) should be interpreted as a possible error (and possible counterexample).
Similar situations frequently occur if you're trying to verify the correctness of a loop and you don't supply a strong enough loop invariant. The Dafny verifier may then show some values of variables on entry to the loop that can never occur in actuality. The counterexample is then trying to give you an idea of how to strengthen your loop invariant appropriately.
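As a small illustration of the loop-invariant case, here is a minimal sketch (the method CountTo is hypothetical and not part of the question). With the invariant shown, the postcondition follows from the invariant and the negated loop guard. If the invariant is removed, the verifier knows only that i >= n after the loop, and it may report a counterexample such as i == n + 1, a value that can never actually occur.

```dafny
// Hypothetical example: counts up to n.
method CountTo(n: nat) returns (i: nat)
  ensures i == n
{
  i := 0;
  while i < n
    // Without this invariant the verifier cannot rule out i > n on loop exit,
    // and it reports that the postcondition might not hold.
    invariant i <= n
  {
    i := i + 1;
  }
}
```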
Finally, let me add two notes to what Malte said.
First, there's at least one other source of incompleteness involved in this example, namely non-linear arithmetic. It can sometimes be difficult to navigate around.
Second, the trick of using function Dummy can be simplified. It suffices (at least in this example) to mention the Pow call, for example like this:
lemma EvenPowerLemma(a: int, b: nat)
  requires Even(b)
  ensures Pow(a, b) == Pow(a*a, b/2)
{
  if b != 0 {
    var dummy := Pow(a, b - 2);
  }
}
Still, I like the other two manual proofs better, because they do a better job of explaining to the user what the proof is.
Rustan

Dafny fails to prove the lemma due to a combination of two possible sources of incompleteness: recursive definitions (here Pow) and induction. The proof effectively fails because of too little information, i.e. because the problem is underconstrained, which in turn explains why a counterexample can be found.
Induction
Automating induction is difficult because it requires computing an induction hypothesis, which is not always possible. However, Dafny has some heuristics for applying induction (which might or might not work), and these can be switched off, as in the following code:
lemma {:induction false} EvenPowerLemma_manual(a: int, b: nat)
  requires Even(b);
  ensures Pow(a, b) == Pow(a*a, b/2);
{
  if (b != 0) {
    EvenPowerLemma_manual(a, b - 2);
  }
}
With the heuristics switched off, you need to manually "call" the lemma, i.e. use the induction hypothesis (here, only in the case where b >= 2), in order to get the proof through.
In your case, the heuristics were activated, but they were not "good enough" to get the proof done. I'll explain why next.
Recursive definitions
Reasoning statically about recursive definitions by unfolding them is prone to infinite descent because it is in general undecidable when to stop. Hence, Dafny by default unrolls function definitions only once. In your example, unrolling the definition of Pow only once is not enough to get the induction heuristics to work because the induction hypothesis must be applied to Pow(a, b-2), which does not "appear" in the proof (since unrolling once only gets you to Pow(a, b - 1)). Explicitly mentioning Pow(a, b-2) in the proof, even in an otherwise meaningless formula, triggers the induction heuristics, however:
function Dummy(a: int): bool
{ true }
lemma EvenPowerLemma(a: int, b: nat)
  requires Even(b);
  ensures Pow(a, b) == Pow(a*a, b/2);
{
  if (b != 0) {
    assert Dummy(Pow(a, b - 2));
  }
}
The Dummy function is there to make sure that the assertion provides no information beyond syntactically including Pow(a, b-2). A less odd-looking assertion would be assert Pow(a, b) == a * a * Pow(a, b - 2).
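For concreteness, here is a sketch of the lemma using that less odd-looking assertion. The definitions of Even and Pow are not shown in this thread, so the ones below are assumptions (the usual recursive definitions), and the lemma name EvenPowerLemma_assert is made up for illustration. Whether the automatic induction goes through may depend on the Dafny version and on how the solver copes with the non-linear arithmetic.

```dafny
// Assumed definitions; the originals are not shown in this thread.
predicate Even(n: nat)
{
  n % 2 == 0
}

function Pow(a: int, n: nat): int
{
  if n == 0 then 1 else a * Pow(a, n - 1)
}

// Hypothetical name; same statement as EvenPowerLemma above.
lemma EvenPowerLemma_assert(a: int, b: nat)
  requires Even(b)
  ensures Pow(a, b) == Pow(a*a, b/2)
{
  if b != 0 {
    // The assertion mentions Pow(a, b - 2), which gives the induction
    // heuristics a term to which the induction hypothesis can be applied.
    assert Pow(a, b) == a * a * Pow(a, b - 2);
  }
}
```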
Calculational Proof
FYI: You can also make the proof steps explicit and have Dafny check them:
lemma {:induction false} EvenPowerLemma_manual(a: int, b: nat)
  requires Even(b);
  ensures Pow(a, b) == Pow(a*a, b/2);
{
  if (b != 0) {
    calc {
      Pow(a, b);
      == a * Pow(a, b - 1);
      == a * a * Pow(a, b - 2);
      == {EvenPowerLemma_manual(a, b - 2);}
      a * a * Pow(a*a, (b-2)/2);
      == Pow(a*a, (b-2)/2 + 1);
      == Pow(a*a, b/2);
    }
  }
}

Related

assertion violation when verifying Max function in Dafny?

The following program results in an assertion violation on assert v==40. Why? The program can be verified when the array a contains only one element.
method Max(a:array<int>) returns(max:int)
  requires 1<=a.Length
  ensures forall j:int :: 0<=j< a.Length ==> max >= a[j]
  ensures exists j:int :: 0<=j< a.Length && max == a[j]
{
  max:=a[0];
  var i :=1;
  while(i < a.Length)
    invariant 1<=i<=a.Length
    decreases a.Length-i
    invariant forall j:int :: 0<=j<i ==> max >= a[j]
    invariant exists j:int :: 0<=j<i && max == a[j]
  {
    if(a[i] >= max){max := a[i];}
    i := i + 1;
  }
}
method Test(){
  var a := new int[2];
  a[0],a[1] := 40,10;
  var v:int:=Max(a);
  assert v==40;
}
This is indeed strange! It boils down to the way Dafny handles quantifiers.
Let's start with a human-level proof that the assertion is actually valid. From the postconditions of Max, we know two things about v: (1) it is at least as big as every element in a, and (2) it is equal to some element of a. By (2), v is either 40 or 10, and by (1), v is at least 40 (because it's at least as big as a[0], which is 40). Since 10 is not at least 40, v can't be 10, so it must be 40.
Now, why does Dafny fail to understand this automatically? It's because of the forall quantifier in (1). Dafny (really Z3) internally uses "triggers" to approximate the behavior of universal quantifiers. (Without any approximation, reasoning with quantifiers is undecidable in general, so some restriction like this is required.) The way triggers work is that for each quantifier in the program, a syntactic pattern called the trigger is inferred. Then, that quantifier is completely ignored unless the trigger matches some expression in the context.
In this example, fact (1) will have a trigger of a[j]. (You can see what triggers are inferred in Visual Studio or VSCode or emacs by hovering over the quantifier. Or on the command line, by passing the option /printTooltips and looking for the line number.) That means that the quantifier will be ignored unless there is some expression of the form a[foo] in the context, for any expression foo. Then (1) will be instantiated with foo for j, and we'll learn max >= a[foo].
Since your Test method's assertion doesn't mention any expression of the form a[foo], Dafny will not be able to use fact (1) at all, which results in the spurious assertion violation.
One way to fix your Test method is to add the assertion
assert v >= a[0];
just before the other assertion. This is the key consequence of fact (1) that we needed in our human level proof, and it contains the expression a[0], which matches the trigger, allowing Dafny to instantiate the quantifier. The rest of the proof then goes through automatically.
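Put together, the repaired test might look like the sketch below. To keep the snippet self-contained, Max is declared here with its specification only (no body); that is an assumption made for illustration, since the caller relies only on the postconditions. The name TestFixed is likewise hypothetical.

```dafny
// Specification of Max only; the body is omitted in this sketch because
// callers see just the pre- and postconditions.
method Max(a: array<int>) returns (max: int)
  requires 1 <= a.Length
  ensures forall j: int :: 0 <= j < a.Length ==> max >= a[j]
  ensures exists j: int :: 0 <= j < a.Length && max == a[j]

method TestFixed()
{
  var a := new int[2];
  a[0], a[1] := 40, 10;
  var v := Max(a);
  // a[0] matches the trigger a[j], so the forall is instantiated with j := 0.
  assert v >= a[0];
  assert v == 40;
}
```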
For more information about triggers in general and how to write them manually, see this answer.

Surprising Dafny failure to verify boundedness of set comprehension

Dafny has no problem with this definition of a set intersection function.
function method intersection(A: set<int>, B: set<int>): (r: set<int>)
{
  set x | x in A && x in B
}
But when it comes to union, Dafny complains, "a set comprehension must produce a finite set, but Dafny's heuristics can't figure out how to produce a bounded set of values for 'x'". A and B are finite, and so, clearly the union is, too.
function method union(A: set<int>, B: set<int>): (r: set<int>)
{
  set x | x in A || x in B
}
What explains this, to-a-beginner seemingly discrepant, behavior?
This is indeed potentially surprising!
First, let me note that in practice, Dafny has built-in operators for intersection and union that it knows preserve finiteness. So you don't need to use set comprehensions to express these ideas. Instead you could just say A * B and A + B respectively.
However, my guess is that you're running into a more complicated example where you're using a set comprehension with a disjunction and are confused about why Dafny can't prove it finite.
Dafny uses syntactic heuristics to determine whether a set comprehension is finite. Unfortunately, these heuristics are not well documented anywhere. For purposes of this question, the key point is that the heuristics either depend on the type of the comprehension's bound variables, or look for a conjunct that constrains elements to be bounded in some other way. For example, Dafny can prove
set x: int | 0 <= x < 10 && ...
finite, as well as
set x:A | x in S && ...
In both cases, it is essential that the relevant bounds be conjuncts. Dafny has no syntactic heuristic for proving a bound for disjunctions, although one could imagine adding one. That is why Dafny cannot prove your union function finite.
As an aside, another workaround would be to use potentially infinite sets (written iset in Dafny). If you don't need to use the cardinality of the sets, then these might work better.
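The two finite-set options discussed above can be sketched as follows. The names UnionBuiltin, UnionBounded and UnionBoundedSpec are made up for illustration. The second function keeps the disjunction but adds a conjunct of the form x in S, which is the shape the heuristics look for; this assumes the heuristics accept A + B as the bounding set.

```dafny
// Built-in union: Dafny knows the result is finite.
function UnionBuiltin(A: set<int>, B: set<int>): set<int>
{
  A + B
}

// Comprehension with an explicit bounding conjunct in front of the disjunction.
function UnionBounded(A: set<int>, B: set<int>): set<int>
{
  set x | x in A + B && (x in A || x in B)
}

// The bounded comprehension contains exactly the elements of A or B.
lemma UnionBoundedSpec(A: set<int>, B: set<int>)
  ensures forall x :: x in UnionBounded(A, B) <==> x in A || x in B
{
}
```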

Dafny : lemmas without bodies

I have been using abstract lemmas and functions (without bodies) in a modeling task. In this example,
lemma py(c : int) returns (a: int, b :int)
  ensures a*a + b*b == c*c
lemma main(c : int) returns (a: int, b :int)
  ensures a*a + b*b == c*c
{
  a, b := py(c);
}
calling py in main ensures that the post-condition is true irrespective of how py is implemented. I have 2 questions:
Is it safe to use abstract lemmas/functions in Dafny? The following modification to the above code is verified by Dafny:
lemma py(c : int) returns (a: int, b :int)
  ensures a*a + b*b == c*c
  ensures a*a + b*b != c*c
while I think that maybe Dafny should have thrown an error.
Should I say lemma {:axiom} py(...)? I haven't observed a difference on including {:axiom} or {:imported}.
As James mentions, a lemma without a body can be useful when modeling behavior that you're not implementing. If you don't give a body, the verifier does not attempt to verify the correctness of the lemma. Therefore, it is essentially an axiom.
Even without the /noCheating flag that James mentions, the Dafny compiler will complain about body-less lemmas (and methods and functions). Note that the compiler does not kick in until after the verifier is satisfied. The {:axiom} attribute says you're willing to accept responsibility for the truth of the lemma. For a body-less non-ghost method, you can also use the {:extern} attribute, which lets you link with code written in other languages. Again, you assume responsibility for the correctness of that external code, since the Dafny verifier won't check it.
Rustan
Methods and functions without bodies are indeed useful when modeling parts of a system that you're not implementing.
However, one has to be careful when giving such methods and functions postconditions, because those become trusted, and are not checked by Dafny. In other words, it is potentially not safe to use lemmas or functions without bodies if they have postconditions.
That said, such methods and functions are indispensable for modeling, so the fact that they are potentially unsafe does not mean you shouldn't use them. Instead, you should just be extra careful when writing down the postconditions, because they will not be checked.
If you pass Dafny the /noCheating:1 flag, it will complain about any methods or functions without bodies that have postconditions, and force you to use the {:axiom} attribute. This can be helpful even when not passing /noCheating:1, just to mark the fact that the postcondition is trusted. It's up to you whether to pass /noCheating:1 or whether to use the attribute anyways.
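The following sketch shows both sides of this trade-off. All lemma names are hypothetical. The first body-less lemma states something true and is marked {:axiom} to make the trust explicit. The second one has an unsatisfiable postcondition, which the verifier accepts just as readily, and any caller can then "prove" anything.

```dafny
// A trusted fact: the verifier does not check this postcondition.
lemma {:axiom} SquareNonNegative(x: int)
  ensures x * x >= 0

lemma UseAxiom(y: int)
  ensures y * y + 1 > 0
{
  SquareNonNegative(y);
}

// An inconsistent trusted postcondition is accepted as well.
lemma {:axiom} Inconsistent()
  ensures false

// Once the inconsistent lemma is called, anything follows.
lemma Absurd()
  ensures 1 == 2
{
  Inconsistent();
}
```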

Sum of the elements of a sequence of integers: loop invariant might not be maintained by the loop

After reading Getting Started with Dafny: A Guide, I decided to create my first program: given a sequence of integers, compute the sum of its elements. However, I am having a hard time in getting Dafny to verify the program.
function G(a: seq<int>): int
  decreases |a|
{
  if |a| == 0 then 0 else a[0] + G(a[1..])
}

method sum(a: seq<int>) returns (r: int)
  ensures r == G(a)
{
  r := 0;
  var i: int := 0;
  while i < |a|
    invariant 0 <= i <= |a|
    invariant r == G(a[0..i])
  {
    r := r + a[i];
    i := i + 1;
  }
}
I get
stdin.dfy(12,2): Error BP5003: A postcondition might not hold on this return path.
stdin.dfy(8,12): Related location: This is the postcondition that might not hold.
stdin.dfy(14,16): Error BP5005: This loop invariant might not be maintained by the loop.
I suspect that Dafny needs some "help" in order to verify the program (lemmas maybe?) but I do not know where to start.
Here is a version of your program that verifies.
There were two things to fix: the proof that the postcondition follows after the loop, and the proof that the loop invariant is preserved.
The postcondition
Dafny needs a hint that it might be helpful to try to prove a == a[..|a|]. Asserting that equality is enough to finish this part of the proof: Dafny automatically proves the equality and uses it to prove the postcondition from the loop invariant.
This is a common pattern. You can try to see what is bothering Dafny by doing the proof "by hand" in Dafny by making various assertions that you would use to prove it to yourself on paper.
The loop invariant
This one is a bit more complicated. We need to show that updating r and incrementing i preserves r == G(a[..i]). To do this, I used a calc statement, which lets one prove an equality via a sequence of intermediate steps. (It is always possible to prove such things without calc, if you prefer, by asserting all the relevant equalities as well as any assertions inside the calc. But I think calc is nicer.)
I placed the calc statement before the updates to r and i occur. I know that after the updates occur, I will need to prove r == G(a[..i]) for the updated values of r and i. Thus, before the updates occur, it suffices to prove r + a[i] == G(a[..i+1]) for the un-updated values. My calc statement starts with r + a[i] and works toward G(a[..i+1]).
First, by the loop invariant on entry to the loop, we know that r == G(a[..i]) for the current values.
Next, we want to bring the a[i] inside the G. This fact is not entirely trivial, so we need a lemma. I chose to prove something slightly more general than necessary, which is that G(a + b) == G(a) + G(b) for any integer sequences a and b. I call this lemma G_append. Its proof is discussed below. For now, we just use it to bring the a[i] inside as a singleton sequence.
The last step in this calc is to combine a[0..i] + [a[i]] into a[0..i+1]. This is another sequence extensionality fact, and thus needs to be asserted explicitly.
That completes the calc, which proves the invariant is preserved.
The lemma
The proof of G_append proceeds by induction on a. The base case where a == [] is handled automatically. In the inductive case, we need to show G(a + b) == G(a) + G(b), assuming the induction hypothesis for any subsequences of a. I use another calc statement for this.
Beginning with G(a + b), we first expand the definition of G. Next, we note that (a + b)[0] == a[0] since a != []. Similarly, we have that (a + b)[1..] == a[1..] + b, but since this is another sequence extensionality fact, it must be explicitly asserted. Finally, we can use the induction hypothesis (automatically invoked by Dafny) to show that G(a[1..] + b) == G(a[1..]) + G(b).
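The code listing for this answer is not shown here, so the following is a sketch reconstructed from the description above rather than the original program. G, sum and G_append are the names used in the text. The explicit base-case assertion, the explicit recursive call and the decreases clause on the lemma are additions made for robustness; the text says Dafny handles the base case and the induction hypothesis automatically. Details may need adjusting for a particular Dafny version.

```dafny
function G(a: seq<int>): int
  decreases |a|
{
  if |a| == 0 then 0 else a[0] + G(a[1..])
}

lemma G_append(a: seq<int>, b: seq<int>)
  decreases |a|
  ensures G(a + b) == G(a) + G(b)
{
  if a == [] {
    // Base case, stated explicitly here.
    assert a + b == b;
  } else {
    calc {
      G(a + b);
      == // definition of G, and (a + b)[0] == a[0] since a != []
      a[0] + G((a + b)[1..]);
      == { assert (a + b)[1..] == a[1..] + b; } // extensionality fact
      a[0] + G(a[1..] + b);
      == { G_append(a[1..], b); } // induction hypothesis
      a[0] + G(a[1..]) + G(b);
      == // definition of G
      G(a) + G(b);
    }
  }
}

method sum(a: seq<int>) returns (r: int)
  ensures r == G(a)
{
  r := 0;
  var i: int := 0;
  while i < |a|
    invariant 0 <= i <= |a|
    invariant r == G(a[..i])
  {
    // Proved before the updates, for the un-updated r and i.
    calc {
      r + a[i];
      == // loop invariant
      G(a[..i]) + a[i];
      == { G_append(a[..i], [a[i]]); }
      G(a[..i] + [a[i]]);
      == { assert a[..i] + [a[i]] == a[..i+1]; } // extensionality fact
      G(a[..i+1]);
    }
    r := r + a[i];
    i := i + 1;
  }
  // Hint needed to get the postcondition from the invariant.
  assert a == a[..|a|];
}
```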

Boogie strange assert(false) behavior

I am working with Boogie and I have come across some behaviors I do not understand.
I have been using assert(false) as a way to check if the previous assume statements are absurd.
For instance in the following case, the program is verified without errors...
type T;
const t1, t2: T;
procedure test()
{
  assume (t1 == t2);
  assume (t1 != t2);
  assert(false);
}
...as t1 == t2 && t1 != t2 is an absurd statement.
On the other hand if I have something like
type T;
var map: [T]bool;
const t1, t2: T;
procedure test()
{
  assume(forall a1: T, a2: T :: !map[a1] && map[a2]);
  //assert(!map[t1]);
  assert(false);
}
The assert(false) only fails when the commented line is uncommented. Why is the commented assert changing the result of the assert(false)?
Gist: the SMT solver underlying Boogie will not instantiate the quantifier if you don't mention a ground instance of map[...] in your program.
Here is why: SMT solvers (that use e-matching) typically use syntactic heuristics to decide when to instantiate a quantifier. Consider the following quantifier:
forall i: int :: f(i)
This quantifier admits infinitely many instantiations (since i ranges over an unbounded domain), so trying them all would result in non-termination. Instead, SMT solvers expect syntactic hints telling them for which i the quantifier should be instantiated. These hints are called patterns or triggers. In Boogie, they can be written as follows:
forall i: int :: {f(i)} f(i)
This trigger instructs the SMT solver to instantiate the quantifier for each i for which f(i) is mentioned in the program (or rather, current proof search). E.g., if you assume f(5), then the quantifier will be instantiated with 5 substituted for i.
In your example, you don't provide a pattern explicitly, so the SMT solver might pick one for you by inspecting the quantifier body. It will most likely pick {map[a1], map[a2]} (multiple function applications are allowed, and patterns must cover all quantified variables). If you uncomment the assert, the ground term map[t1] becomes available, and the SMT solver can instantiate the quantifier with a1, a2 mapped to t1, t1. Hence, the contradiction is obtained.
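The same effect can be reproduced in Dafny, where triggers are written with the {:trigger} attribute. This is only an analogue of the Boogie program, under the assumption that a body-less predicate M (a hypothetical name) can stand in for the map. The lemma's precondition plays the role of the assume, and the assertion supplies the ground term M(0) that the multi-term trigger needs.

```dafny
// Uninterpreted stand-in for the Boogie map (hypothetical name).
predicate M(t: int)

lemma Contradiction()
  requires forall a1: int, a2: int {:trigger M(a1), M(a2)} :: !M(a1) && M(a2)
  ensures false
{
  // Mentioning M(0) lets the solver instantiate a1, a2 := 0, 0,
  // which yields !M(0) && M(0). Without this line there is no ground
  // term matching the trigger, and the postcondition is likely to fail.
  assert !M(0);
}
```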
See the Z3 guide for more details on patterns. More involved texts about patterns can be found, e.g., in this paper, in this paper, or in this paper.

Resources