how does the dafny invariant cope with datatypes - dafny

The example code will seem artificial as it is the smallest that I can find to illustrates my problem.
datatype Twee = Node(value : int, left : Twee, right : Twee) | Empty
method containsI(t : Twee, s : int) returns (r : bool)
{
var working :Twee := t;
if (working.Node?) {
r:= (working.value == s);
assert r==true ==> (working.value == s);
while working.Node?
decreases working
invariant r==true ==> (working.value == s)
{ //assert r==true ==> (working.value == s);
r:=false;
working:= working.right;
assert r==true ==> (working.value == s);
}
}
r:=false;
assert r==true ==> (working.value == s);
}
Dafny complains about working.value in the invariant. Stating that it can only be applied when working is a Node event though Dafny reports no problems when the invariant is commented out. Hence Dafny seems to know that working is a Node.
Any corrections to my understanding much appreciated.

a loop invariant needs to hold before and after each iteration of the loop; in particular, it should hold on loop termination, when the loop condition is returning false.
So for example, anything in the body of the loop is guarded by working.Node? (like the line working:= working.right;) which is why Dafny reports no problems there. However, Dafny reports a problem with the line
invariant r==true ==> (working.value == s) because this expression may need to be evaluated in a context where working.Node? does not hold.
Perhaps what you mean to do is write something like
invariant r==true ==> working.Node? ==> (working.value == s)
It's a little hard to tell what your intent is.
By the way, you may be wondering why the two lines at the end do not fail...
r:=false;
assert r==true ==> (working.value == s);
This is because the assertion is a vacuous predicate here (r==true always evaluates to false)

Related

how to prove that turning a set into a sequence and back is an identity in dafny

Hi Relatively new to Dafny and have defined methods set2Seq and seq2Set for conversion between sets and seqs. But can only find how to write a function fseq2Set from sets to sequences.
I can not find how to define fseq2Set. As Lemmas can not reference methods this makes proving the identity beyond me. Any help much appreciated?
Code:
function method fseq2Set(se: seq<int>) :set<int>
{ set x:int | x in se :: x }
method seq2Set(se: seq<int>) returns (s:set<int>)
{ s := set x:int | x in se :: x; }
method set2Seq(s: set<int>) returns (se:seq<int>)
requires s != {}
ensures s == fseq2Set(se)
decreases |s|
{
var y :| y in s;
var tmp ;
if (s=={y}) {tmp := [];} else {tmp := set2Seq(s-{y});}
se := [y] + tmp;
assert (s-{y}) + {y} == fseq2Set([y] + tmp);
}
/* below fails */
function fset2Seq(s:set<int>):seq<int>
decreases s { var y :| y in s ; [y] + fset2Seq(s-{y}) }
lemma cycle(s:set<int>) ensures forall s:set<int> :: fseq2Set(fset2Seq(s)) == s { }
Ok, there is kind of a lot going on here. First of all, I'm not sure if you intended to do anything with the methods seq2Set and set2Seq, but they don't seem to be relevant to your failure, so I'm just going to ignore them and focus on the functions.
Speaking of functions, Danfy reports an error on your definition of fset2Seq, because s might be empty. In that case, we should return the empty sequence, so I adjusted your definition to:
function fset2Seq(s:set<int>):seq<int>
decreases s
{
if s == {} then []
else
var y := Pick(s);
[y] + fset2Seq(s - {y})
}
function Pick(s: set<int>): int
requires s != {}
{
var x :| x in s; x
}
which fixes that error. Notice that I also wrapped the let-such-that operator :| in a function called Pick. This is essential, but hard to explain. Just trust me for now.
Now on to the lemma. Your lemma is stated a bit weirdly, because it takes a parameter s, but then the ensures clause doesn't mention s. (Instead it mentions a completely different variable, also called s, that is bound by the forall quantifier!) So I adjusted it to get rid of the quantifier, as follows:
lemma cycle(s:set<int>)
ensures fseq2Set(fset2Seq(s)) == s
Next, I follow My Favorite Heuristic™ in program verification, which is that the structure of the proof follows the structure of the program. In this case, the "program" in question is fseq2Set(fset2Seq(s)). Starting from our input s, it first gets processed recursively by fset2Seq and then through the set comprehension in fseq2Set. So, I expect a proof by induction on s that follows the structure of fset2Seq. That structure is to branch on whether s is empty, so let's do that in the lemma too:
lemma cycle(s:set<int>)
ensures fseq2Set(fset2Seq(s)) == s
{
if s == {} {
} else {
}
...
Dafny reports an error on the else branch but not on the if branch. In other words, Dafny has proved the base case, but it needs help with the inductive case. The next thing that fset2Seq(s) does is call Pick(s). Let's do that too.
lemma cycle(s:set<int>)
ensures fseq2Set(fset2Seq(s)) == s
{
if s == {} {
} else {
var y := Pick(s);
...
Now we know from its definition that fset2Seq(s) is going to return [y] + fset2Seq(s - {y}), so we can copy-paste our ensures clause and manually substitute this expression.
lemma cycle(s:set<int>)
ensures fseq2Set(fset2Seq(s)) == s
{
if s == {} {
} else {
var y := Pick(s);
assert fseq2Set([y] + fset2Seq(s - {y})) == s;
...
Dafny reports an error on this assertion, which is not surprising, since it's just a lightly edited version of the ensures clause we're trying to prove. But importantly, Dafny no longer reports an error on the ensures clause itself. In other words, if we can prove this assertion, we are done.
Looking at this assert, we can see that fseq2Set is applied to two lists appended together. And we would expect that to be equivalent to separately converting the two lists to sets, and then taking their union. We could prove a lemma to that effect, or we could just ask Dafny if it already knows this fact, like this:
lemma cycle(s:set<int>)
ensures fseq2Set(fset2Seq(s)) == s
{
if s == {} {
} else {
var y := Pick(s);
assert fseq2Set([y] + fset2Seq(s - {y})) == fseq2Set([y]) + fseq2Set(fset2Seq(s - {y}));
assert fseq2Set([y] + fset2Seq(s - {y})) == s;
...
(Note that the newly added assertion is before the last one.)
Dafny now accepts our lemma. We can clean up a little by deleting the the base case and the final assertion that was just a copy-pasted version of our ensures clause. Here is the polished proof.
lemma cycle(s:set<int>)
ensures fseq2Set(fset2Seq(s)) == s
{
if s != {} {
var y := Pick(s);
assert fseq2Set([y] + fset2Seq(s - {y})) == fseq2Set([y]) + fseq2Set(fset2Seq(s - {y}));
}
}
I hope this explains how to prove the lemma and also gives you a little bit of an idea about how to make progress when you are stuck.
I did not explain Pick. Basically, as a rule of thumb, you should just always wrap :| in a function whenever you use it. To understand why, see the Dafny power user posts on iterating over collecion and functions over set elements. Also, see Rustan's paper Compiling Hilbert's epsilon operator.

Dafny: Postcondition error in constructor

The following constructor does not work and fails at
parent !in Repr
Why can't Dafny proof the postcondition, that parent is not part of the Repr set?
constructor Init(x: HashObj, parent:Node?)
ensures Valid() && fresh(Repr - {this, data})
ensures Contents == {x.get_hash()}
ensures Repr == {this, data};
ensures left == null;
ensures right == null;
ensures data == x;
ensures parent != null ==> parent !in Repr;
ensures this in Repr;
{
data := x;
left := null;
right := null;
Contents := {x.get_hash()};
Repr := {this} + {data};
}
I'm guessing that HashObj is a trait? (If it's a class, then your example verifies for me.) The verification fails because the verifier thinks x might equal parent.
The verifier ought to know that Node is not a HashObj (unless, of course, your class Node really does extend HashObj), but it doesn't. You may file this as an Issue on https://github.com/dafny-lang/dafny to get that corrected.
In the meantime, you can write a precondition that says x and parent are different. Here, there's a wrinkle, too. You'd like to write
requires x != parent
but (unless Node really does extend HashObj) this does not type check. So, you would want to cast parent to object?. There's no direct syntax for such an up-cast, but you can do it with a let expression:
requires x != var o: object? := parent; o

Dafny iterator on heap fails to ensure Valid()

I have a class that allocates an iterator on the heap and calls MoveNext() on that iterator. I want to prove that if MoveNext() returns true, Valid() holds for the iterator.
iterator Iter()
{}
class MyClass
{
var iter : Iter;
constructor ()
{
iter := new Iter();
}
method next() returns (more : bool)
requires iter.Valid();
modifies iter, iter._new, iter._modifies;
{
more := iter.MoveNext();
// This assertion fails:
assert more ==> iter.Valid();
}
}
I took a look at the output of /rprint, and the method MoveNext() contains ensures more ==> this.Valid(), which seems to imply my desired assertion. If I change iter to be locally allocated within the method next(), then Dafny verifies the assertion.
The problem is that nothing is known about what's in iter._new and iter._modifies. If this is in one of those sets, then this.iter may have changed during the call to MoveNext().
This body for next() confirms what I just said:
var i := iter;
more := iter.MoveNext();
assert i == iter; // this assertion fails
assert more ==> iter.Valid();
So, under the assumption more, Valid() does still hold of the iterator, but the iterator may no longer referenced by iter:
more := iter.MoveNext();
assert more ==> old(iter).Valid(); // this holds
Probably the way you want to solve this problem is to add a precondition to next():
method next() returns (more : bool)
requires iter.Valid()
requires this !in iter._new + iter._modifies // add this precondition
modifies iter, iter._new, iter._modifies
{
more := iter.MoveNext();
assert more ==> iter.Valid(); // this now verifies
}
Rustan

Grammar and top down parsing

For a while I've been trying to learn how LL parsers work and, if I understand this correctly, when writing a top-down recursive descent parser by hand I should create a function for every non-terminal symbol. So for this example:
S -> AB
A -> aA|ε
B -> bg|dDe
D -> ab|cd
I'd have to create a function for each S, A, B and D like this:
Token B()
{
if (Lookahead(1) == 'b')
{
Eat('b');
Eat('g');
}
else if (Lookahead(1) == 'd')
{
Eat('d');
D();
Eat('e');
}
else
{
Error();
}
return B_TOKEN;
}
But then I try to do the same thing with the following grammar that I created to generate the same language as (a|b|c)*a regular expression:
S -> Ma
M -> aM|bM|cM|ε
That gives me following functions:
Token S()
{
char Ch = Lookahead(1);
if (Ch == 'a' || Ch == 'b' || Ch == 'c')
{
M();
Eat('a');
}
else
{
Error();
}
return S_TOKEN;
}
Token M()
{
char Ch = Lookahead(1);
if (Ch == 'a' || Ch == 'b' || Ch == 'c')
{
Eat(ch);
M();
}
return M_TOKEN;
}
And it doesn't seem to be good because for input 'bbcaa' M will consume everything, and after that S won't find the last 'a' and report an error, even though the grammar accepts it. It feels like M is missing the ε case, or maybe it is handled in the wrong way, but I'm not sure how to deal with it. Could anyone please help?
The behaviour of a top-down predictive parser is exactly as you note in your question. In other words, your second grammar is not suitable for top-down parsing (with one-token lookahead). Many grammars are not; that includes any grammar in which it is not possible to predict which production to use based on a finite lookahead.
In this case, if you were to lookahead two tokens instead of one, you could solve the conflict; M should predict the ε production on lookahead aEND, and the aM production on all other two-token lookaheads where the first token is a.

Recursive descent parser implementation

I am looking to write some pseudo-code of a recursive descent parser. Now, I have no experience with this type of coding. I have read some examples online but they only work on grammar that uses mathematical expressions. Here is the grammar I am basing the parser on.
S -> if E then S | if E then S else S | begin S L | print E
L -> end | ; S L
E -> i
I must write methods S(), L() and E() and return some error messages, but the tutorials I have found online have not helped a lot. Can anyone point me in the right direction and give me some examples?.
I would like to write it in C# or Java syntax, since it is easier for me to relate.
Update
public void S() {
if (currentToken == "if") {
getNextToken();
E();
if (currentToken == "then") {
getNextToken();
S();
if (currentToken == "else") {
getNextToken();
S();
Return;
}
} else {
throw new IllegalTokenException("Procedure S() expected a 'then' token " + "but received: " + currentToken);
} else if (currentToken == "begin") {
getNextToken();
S();
L();
return;
} else if (currentToken == "print") {
getNextToken();
E();
return;
} else {
throw new IllegalTokenException("Procedure S() expected an 'if' or 'then' or else or begin or print token " + "but received: " + currentToken);
}
}
}
public void L() {
if (currentToken == "end") {
getNextToken();
return;
} else if (currentToken == ";") {
getNextToken();
S();
L();
return;
} else {
throw new IllegalTokenException("Procedure L() expected an 'end' or ';' token " + "but received: " + currentToken);
}
}
public void E() {
if (currentToken == "i") {
getNextToken();
return;
} else {
throw new IllegalTokenException("Procedure E() expected an 'i' token " + "but received: " + currentToken);
}
}
Basically in recursive descent parsing each non-terminal in the grammar is translated into a procedure, then inside each procedure you check to see if the current token you are looking at matches what you would expect to see on the right hand side of the non-terminal symbol corresponding to the procedure, if it does then continue applying the production, if it doesn't then you have encountered an error and must take some action.
So in your case as you mentioned above you will have procedures: S(), L(), and E(), I will give an example of how L() could be implemented and then you can try and do S() and E() on your own.
It is also important to note that you will need some other program to tokenize the input for you, then you can just ask that program for the next token from your input.
/**
* This procedure corresponds to the L non-terminal
* L -> 'end'
* L -> ';' S L
*/
public void L()
{
if(currentToken == 'end')
{
//we found an 'end' token, get the next token in the input stream
//Notice, there are no other non-terminals or terminals after
//'end' in this production so all we do now is return
//Note: that here we return to the procedure that called L()
getNextToken();
return;
}
else if(currentToken == ';')
{
//we found an ';', get the next token in the input stream
getNextToken();
//Notice that there are two more non-terminal symbols after ';'
//in this production, S and L; all we have to do is call those
//procedures in the order they appear in the production, then return
S();
L();
return;
}
else
{
//The currentToken is not either an 'end' token or a ';' token
//This is an error, raise some exception
throw new IllegalTokenException(
"Procedure L() expected an 'end' or ';' token "+
"but received: " + currentToken);
}
}
Now you try S() and E(), and post back.
Also as Kristopher points out your grammar has something called a dangling else, meaing that you have a production that starts with the same thing up to a point:
S -> if E then S
S -> if E then S else S
So this begs the question if your parser sees an 'if' token, which production should it choose to process the input? The answer is it has no idea which one to choose because unlike humans the compiler can't look ahead into the input stream to search for an 'else' token. This is a simple problem to fix by applying a rule known as Left-Factoring, very similar to how you would factor an algebra problem.
All you have to do is create a new non-terminal symbol S' (S-prime) whose right hand side will hold the pieces of the productions that aren't common, so your S productions no becomes:
S -> if E then S S'
S' -> else S
S' -> e
(e is used here to denote the empty string, which basically means there is no
input seen)
This is not the easiest grammar to start with because you have an unlimited amount of lookahead on your first production rule:
S -> if E then S | if E then S else S | begin S L | print E
consider
if 5 then begin begin begin begin ...
When do we determine this stupid else?
also, consider
if 5 then if 4 then if 3 then if 2 then print 2 else ...
Now, was that else supposed to bind to the if 5 then fragment? If not, that's actually cool, but be explicit.
You can rewrite your grammar (possibly, depending on else rule) equivalently as:
S -> if E then S (else S)? | begin S L | print E
L -> end | ; S L
E -> i
Which may or may not be what you want. But the pseudocode sort of jumps out from this.
define S() {
if (peek()=="if") {
consume("if")
E()
consume("then")
S()
if (peek()=="else") {
consume("else")
S()
}
} else if (peek()=="begin") {
consume("begin")
S()
L()
} else if (peek()=="print") {
consume("print")
E()
} else {
throw error()
}
}
define L() {
if (peek()=="end") {
consume("end")
} else if (peek==";")
consume(";")
S()
L()
} else {
throw error()
}
}
define E() {
consume_token_i()
}
For each alternate, I created an if statement that looked at the unique prefix. The final else on any match attempt is always an error. I consume keywords and call the functions corresponding to production rules as I encounter them.
Translating from pseudocode to real code isn't too complicated, but it isn't trivial. Those peeks and consumes probably don't actually operate on strings. It's far easier to operate on tokens. And simply walking a sentence and consuming it isn't quite the same as parsing it. You'll want to do something as you consume the pieces, possibly building up a parse tree (which means these functions probably return something). And throwing an error might be correct at a high level, but you'd want to put some meaningful information into the error. Also, things get more complex if you do require lookahead.
I would recommend Language Implementation Patterns by Terence Parr (the guy who wrote antlr, a recursive descent parser generator) when looking at these kinds of problems. The Dragon Book (Aho, et al, recommended in a comment) is good, too (it is still probably the dominant textbook in compiler courses).
I taught (really just, helped) the parsing section of a PL class last semester. I really recommend looking at the parsing slides from our page: here. Basically, for recursive descent parsing, you ask yourself the following question:
I've parsed a little bit of a nonterminal, now I'm at a point where I can make a choice as to what I'm supposed to parse next. What I see next will make decisions as to what nonterminal I'm in.
Your grammar -- by the way -- exhibits a very common ambiguity called the "dangling else," which has been around since the Algol days. In shift reduce parsers (typically produced by parser generators) this generates a shift / reduce conflict, where you typically choose to arbitrarily shift instead of reduce, giving you the common "maximal much" principal. (So, if you see "if (b) then if (b2) S1 else S2" you read it as "if (b) then { if (b2) { s1; } else { s2; } }")
Let's throw this out of your grammar, and work with a slightly simpler grammar:
T -> A + T
| A - T
| A
A -> NUM * A
| NUM / A
| NUM
we're also going to assume that NUM is something you get from the lexer (i.e., it's just a token). This grammar is LL(1), that is, you can parse it with a recursive descent parser implemented using a naive algorithm. The algorithm works like this:
parse_T() {
a = parse_A();
if (next_token() == '+') {
next_token(); // advance to the next token in the stream
t = parse_T();
return T_node_plus_constructor(a, t);
}
else if (next_token() == '-') {
next_token(); // advance to the next token in the stream
t = parse_T();
return T_node_minus_constructor(a, t);
} else {
return T_node_a_constructor(a);
}
}
parse_A() {
num = token(); // gets the current token from the token stream
next_token(); // advance to the next token in the stream
assert(token_type(a) == NUM);
if (next_token() == '*') {
a = parse_A();
return A_node_times_constructor(num, a);
}
else if (next_token() == '/') {
a = parse_A();
return A_node_div_constructor(num, a);
} else {
return A_node_num_constructor(num);
}
}
Do you see the pattern here:
for each nonterminal in your grammar: parse the first thing, then see what you have to look at to decide what you should parse next. Continue this until you're done!
Also, please note that this type of parsing typically won't produce what you want for arithmetic. Recursive descent parsers (unless you use a little trick with tail recursion?) won't produce leftmost derivations. In particular, you can't write left recursive rules like "a -> a - a" where the leftmost associativity is really necessary! This is why people typically use fancier parser generator tools. But the recursive descent trick is still worth knowing about and playing around with.

Resources