z3 PROOF=true option namespaces & quantified variables

I'm using the TRACE=true and PROOF=true options to generate logs.
Without the PROOF=true option, identifiers that are never declared are used for quantified variables in pattern terms for quantifier triggers. For example, for the quantifier ∀x. {f(x)} f(x) = x I would get a log similar to:
[mk-app] #3 f #2
[mk-app] #4 pattern #3
[mk-app] #5 = #3 #2
[mk-quant] #6 quantifier_name #4 #5
where there is no [mk-app] line for #2 in the log. Is this understanding of how the logs work correct?
With the PROOF=true option enabled, the identifier that should represent the quantified variable is instead used for proof steps. For the same quantifier as above I might get a log like:
[mk-app] #1 true
[mk-app] #2 asserted #1
[mk-app] #3 f #2
[mk-app] #4 pattern #3
[mk-app] #5 = #3 #2
[mk-quant] #6 quantifier_name #4 #5
My first guess was that the identifiers for proof steps and regular terms should be in separate namespaces such that #2 would still be undefined in the "terms namespace" in the log above.
This interpretation does not, however, explain another kind of issue I'm seeing with the PROOF=true option enabled, where I get a log similar to the following for the same quantifier as before:
[mk-app] #1 some_term
[mk-app] #2 iff #1 #1
[mk-app] #3 refl #2
[mk-app] #4 f #2
[mk-app] #5 pattern #4
[mk-app] #6 = #4 #2
[mk-quant] #7 quantifier_name #5 #6
i.e., the quantified variable is #2, which is a regular term.
How does one figure out which term is supposed to be a quantified variable in such logs? I've uploaded an SMT file and the corresponding log file I generated with z3 version 4.6.1 (64-bit). The first issue I described happens for the quantifier on l. 50 of the log file, with the quantified variable being #40 (l. 32); the second issue happens for the quantifier on l. 63 of the log file, with the quantified variable #58 (l. 52).
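For reference, a minimal plain-Python sketch of how one might index log lines of the exact shapes shown above into an id-to-term map (the function name index_trace is my own; the real z3 trace format has more line kinds and fields than this handles):
def index_trace(lines):
    # "#N" -> (head, [argument ids]); only the two line shapes shown above are handled.
    terms = {}
    for line in lines:
        parts = line.split()
        if parts and parts[0] in ("[mk-app]", "[mk-quant]"):
            terms[parts[1]] = (parts[2], parts[3:])
    return terms
log = ["[mk-app] #3 f #2",
       "[mk-app] #4 pattern #3",
       "[mk-app] #5 = #3 #2",
       "[mk-quant] #6 quantifier_name #4 #5"]
terms = index_trace(log)
undefined = {a for _, args in terms.values() for a in args if a not in terms}
print(undefined)  # {'#2'} -- an id that is referenced but never declared
With a map like this, the identifiers that are referenced as arguments but never declared are exactly the ones the question describes as quantified variables in the non-PROOF case.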

Related

Top down parsing - Compute FIRST and FOLLOW

Given the following grammar:
S -> S + S | S S | (S) | S* | a
S -> S S + | S S * | a
For the life of me I can't seem to figure out how to compute the FIRST and FOLLOW for the above grammar. The recursion on the non-terminal S confuses me. Does that mean I have to factor the grammar first before computing the FIRST and FOLLOW sets?
The general rule for computing FIRST sets in CFGs without ε productions is the following:
Initialize FIRST(A) as follows: for each production A → tω, where t is a terminal, add t to FIRST(A).
Repeatedly apply the following until nothing changes: for each production of the form A → Bω, where B is a nonterminal, set FIRST(A) = FIRST(A) ∪ FIRST(B).
We could follow the above rules as written, but there's something interesting to notice here. Your grammar only has a single nonterminal, so the second rule - which imports elements into the FIRST set of one nonterminal from the FIRST set of another nonterminal - won't actually do anything. In other words, we can compute the FIRST set just by applying the initial rule. And that's not too bad here - we just look at all the productions that start with a terminal and get FIRST(S) = { a, ( }.
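As a concrete illustration of the two rules above, here is a small Python sketch (the helper name first_sets and the grammar encoding are mine) that computes FIRST sets for an ε-free grammar given as a dict of productions; for the single-nonterminal grammar in the question it returns { a, ( }:
def first_sets(grammar):
    # grammar: nonterminal -> list of productions, each production a list of symbols
    nonterminals = set(grammar)
    first = {nt: set() for nt in grammar}
    # Rule 1: A -> t w (t a terminal) puts t into FIRST(A).
    for nt, prods in grammar.items():
        for prod in prods:
            if prod[0] not in nonterminals:
                first[nt].add(prod[0])
    # Rule 2: A -> B w (B a nonterminal) imports FIRST(B) into FIRST(A), until nothing changes.
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                if prod[0] in nonterminals and not first[prod[0]] <= first[nt]:
                    first[nt] |= first[prod[0]]
                    changed = True
    return first
g = {"S": [["S", "+", "S"], ["S", "S"], ["(", "S", ")"], ["S", "*"], ["a"]]}
print(first_sets(g))  # {'S': {'(', 'a'}}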

Building parse trees with shift-reduce parsing

I'm experimenting with parsing in my free time, and I wanted to implement a shift-reduce parser for a very, very simple grammar. I've read many online articles but I'm still confused about how to create parse trees. Here's an example of what I want to do:
Grammar:
Expr -> Expr TKN_Op Expr
Expr -> TKN_Num
Here's an example input:
1 + 1 + 1 + 1
That, after tokenization, becomes:
TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num
I understand that:
Shifting means pushing the first input token on the stack and removing it from the input
Reducing means substituting one or more elements on the stack with a grammar element
So, basically, this should happen:
Step 1:
Stack:
Input: TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num
What: Stack is empty. Shift.
Step 2:
Stack: TKN_Num
Input: TKN_Op TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num
What: TKN_Num can be reduced to Expr. Reduce.
Step 3:
Stack: Expr
Input: TKN_Op TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num
What: Cannot reduce. Shift.
Step 4:
Stack: Expr TKN_Op
Input: TKN_Num TKN_Op TKN_Num TKN_Op TKN_Num
What: Cannot reduce. Shift.
Step 5:
Stack: Expr TKN_Op TKN_Num
Input: TKN_Op TKN_Num TKN_Op TKN_Num
What: TKN_Num can be reduced to Expr. Reduce.
// What should I check for reduction?
// Should I try to reduce incrementally using
// only the top of the stack first,
// then adding more stack elements if I couldn't
// reduce the top alone?
Step 6:
Stack: Expr TKN_Op Expr
Input: TKN_Op TKN_Num TKN_Op TKN_Num
What: Expr TKN_Op Expr can be reduced to Expr. Reduce.
Step 7:
Stack: Expr
Input: TKN_Op TKN_Num TKN_Op TKN_Num
What: ...
// And so on...
Apart from the "what should I reduce?" question, I have no clue how to correctly build a parse tree. The tree should probably look like this:
1 + o
|
1 + o
|
1 + 1
Should I create a new node on reduction?
And when should I add children to the newly created node / when should I create a new root node?
The simple and obvious thing to do is to create a tree node on every reduction, and attach the tree nodes of the grammar elements that were reduced as children of that new node.
This is easily managed with a node stack that runs in parallel to the "shift token" stack that the raw parser uses. For every reduction by a rule of length N, the shift-token stack is shortened by N, and the nonterminal token is pushed onto the shift stack. At the same time, shorten the node stack by removing the top N nodes, create a node for the nonterminal, attach the removed N nodes as children, and push that node onto the node stack.
This policy even works with rules that have zero-length right hand side: create a tree node and attach the empty set of children to it (e.g., create a leaf node).
If you think of a "shift" on a terminal node as a reduction (of the characters forming the terminal) to the terminal node, then terminal node shifts fit right in. Create a node for the terminal, and push it onto the stack.
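Here is a small Python sketch of that parallel node stack idea for the grammar in the question (Expr -> Expr TKN_Op Expr | TKN_Num). The names Node, RULES and parse are mine, and the "can I reduce?" check is a naive match of the stack top against each rule rather than a proper LR automaton, so treat it only as an illustration of the node-per-reduction bookkeeping:
class Node:
    def __init__(self, symbol, children=()):
        self.symbol, self.children = symbol, list(children)
    def __repr__(self):
        return self.symbol if not self.children else "%s(%s)" % (self.symbol, ", ".join(map(repr, self.children)))
RULES = [("Expr", ["Expr", "TKN_Op", "Expr"]), ("Expr", ["TKN_Num"])]
def parse(tokens):
    symbols, nodes, tokens = [], [], list(tokens)   # parallel symbol and node stacks
    while tokens or symbols != ["Expr"]:
        for lhs, rhs in RULES:                      # reduce if the top of the stack matches a rule
            if symbols[-len(rhs):] == rhs:
                children = nodes[-len(rhs):]
                del symbols[-len(rhs):], nodes[-len(rhs):]
                symbols.append(lhs)
                nodes.append(Node(lhs, children))   # one tree node per reduction
                break
        else:                                       # otherwise shift: the token becomes a leaf node
            if not tokens:
                raise SyntaxError("cannot reduce and no input left")
            tok = tokens.pop(0)
            symbols.append(tok)
            nodes.append(Node(tok))
    return nodes[0]
print(parse(["TKN_Num", "TKN_Op", "TKN_Num", "TKN_Op", "TKN_Num"]))
# Expr(Expr(Expr(TKN_Num), TKN_Op, Expr(TKN_Num)), TKN_Op, Expr(TKN_Num))
Note that this greedy "reduce whenever possible" policy happens to group the operators to the left; the ambiguity behind that choice is exactly what the next answer addresses.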
If you do this, you get a "concrete syntax/parse tree" that matches the grammar isomorphically. (We do this for a commercial tool I offer.) There are lots of folks that don't like such concrete trees, because they contain nodes for keywords, etc., which don't really add much value. True, but such trees are supremely easy to construct, and supremely easy to understand because the grammar is the tree structure. When you have 2500 rules (as we do for a full COBOL parser), this matters. This is also convenient because all the mechanism can be built completely into the parsing infrastructure. The grammar engineer simply writes rules, the parser runs, voila, a tree. It is also easy to change the grammar: just change it, and voila, you still get parse trees.
However, if you don't want a concrete tree, e.g., you want an "abstract syntax tree", then what you have to do is let the grammar engineer control which reductions generate nodes, usually by adding some procedural attachment (code) to each grammar rule, to be executed on a reduction step. If such a procedural attachment produces a node, it is retained on a node stack, and any procedural attachment that produces a node must attach the nodes produced by the right-hand-side elements. This is what yacc/Bison and most of the other shift-reduce parser engines do. Go read about yacc or Bison and examine a grammar. This scheme gives you a lot of control, at the price of insisting that you take that control. (For what we do, we don't want this much engineering effort in building a grammar.)
In the case of producing CSTs, it is conceptually straightforward to remove "useless" nodes from trees; we do that in our tool. The result is a lot like an AST, without the manual effort to write all those procedural attachments.
The reason for your trouble is that you have a shift/reduce conflict in your grammar:
expr: expr OP expr
| number
You can resolve this in 2 ways:
expr: expr OP number
| number
for left associative operators, or
expr: number OP expr
| number
for right associative ones. This should also determine the shape of your tree.
Reduction is usually done when one clause is detected as complete. In the right-associative case, you would start in a state 1 that expects a number, pushes it onto the value stack, and shifts to state 2. In state 2, if the token is not an OP, you can reduce a number to an expr. Otherwise, push the operator and shift to state 1. Once state 1 is complete, you can reduce the number, operator and expression to another expression. Note that you need a mechanism to "return" after a reduction. The overall parser would then start in state 0, say, which immediately goes to state 1 and accepts after reduction.
Note that tools like yacc or bison make this kind of stuff much easier because they bring all the low level machinery and the stacks.
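To make the associativity point concrete, here is a small hand-written Python sketch (not generated by yacc/bison; parse_expr is my own name) of the right-associative variant expr: number OP expr, where recursion plays the role of the "return after a reduction" mechanism mentioned above and the result is a right-leaning tree:
def parse_expr(tokens, pos=0):
    # "state 1": expect a number
    num = tokens[pos]
    pos += 1
    # "state 2": on an operator, recurse for the right operand; otherwise reduce number -> expr
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_expr(tokens, pos + 1)
        return (num, "+", right), pos   # reduce number OP expr -> expr
    return num, pos
tree, _ = parse_expr(["1", "+", "1", "+", "1", "+", "1"])
print(tree)  # ('1', '+', ('1', '+', ('1', '+', '1'))) -- right-leaning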

ini-option CASE_SPLIT produces strange model

While working on my master's thesis with z3 I found something strange I can't understand.
I hope you can help me. :)
The SMT file I wrote looks like this:
(set-logic QF_UF)
(set-info :smt-lib-version 2.0)
;Declare sort Node and its objects.
(declare-sort Node 0)
(declare-fun n0 () Node)
(declare-fun n1 () Node)
;Predicate
(declare-fun dead_0 (Node) Bool)
;Abbreviation
(declare-fun I () Bool)
;initial configuration
(assert(= I (and
(not(= n0 n1))
(not(dead_0 n0))
(dead_0 n1))))
;Predicate
(declare-fun dead_1 (Node) Bool)
;variable
(declare-fun m0_1 () Node)
;Abbreviation for Transformation
(declare-fun TL1_1 () Bool)
;Transformation1neuerKnoten1
(assert(or (= m0_1 n0)(= m0_1 n1)))
;Is the whole formula satisfiable?
(assert(= (and I TL1_1) true))
(check-sat)
(get-model)
Everything works quite well while using z3 as a command line tool with default-options.
The generated model contains:
;; universe for Node:
;; Node!val!0 Node!val!1
;; -----------
and
(define-fun n0 () Node
Node!val!0)
(define-fun n1 () Node
Node!val!1)
(define-fun m0_1 () Node
Node!val!0)
So my variable m0_1 is bound to the node n0.
Then I used z3 with an ini-file only containing CASE_SPLIT=5.
The result was a strange model. In my opinion the difference is basically
that my variable m0_1 is NOT bound to any of my nodes n0 or n1.
The produced model contains:
;; universe for Node:
;; Node!val!2 Node!val!0 Node!val!1
;; -----------
and
(define-fun n0 () Node
Node!val!0)
(define-fun n1 () Node
Node!val!1)
(define-fun m0_1 () Node
Node!val!2)
So my question is this: why did z3 create another node (Node!val!2) and why is my variable m0_1 bound to this new node? I thought that one of my assertions ((assert(or (= m0_1 n0)(= m0_1 n1)))) would forbid this.
Thanks in advance! :)
Z3 has a feature called "relevancy propagation". This feature is very effective for problems containing quantifiers, but it is usually overhead for quantifier-free problems. The command line option RELEVANCY=0 disables relevancy propagation, and RELEVANCY=2 or RELEVANCY=1 enables it.
The option CASE_SPLIT=5 assumes that relevancy propagation is enabled.
If we provide CASE_SPLIT=5 RELEVANCY=0, then Z3 will generate a warning message
WARNING: relevacy must be enabled to use option CASE_SPLIT=3, 4 or 5
and ignores the option.
Moreover, by default, Z3 uses an "automatic configuration" feature. This feature scans the input formula and adjusts the Z3 configuration for the given instance.
So, in your example, the following happens:
You provide the option CASE_SPLIT=5
When Z3 validates the command line options, relevancy propagation is still enabled, so no warning message is generated.
Z3 then runs the auto configuration procedure, and since your example is quantifier-free, it disables relevancy propagation (RELEVANCY=0). Now an inconsistent configuration is used, and Z3 produces the wrong result.
To avoid this problem, if you use CASE_SPLIT=5, then you should also use AUTO_CONFIG=false (disables auto configuration) and RELEVANCY=2 (enables relevancy propagation). So, the command line should be:
z3 CASE_SPLIT=5 AUTO_CONFIG=false RELEVANCY=2 file.smt2
In the next release (Z3 4.2), I will make Z3 display the warning message if the user tries to set CASE_SPLIT=5 when auto configuration is enabled.

Satisfying Models under Tseitin Encoding

I am using the following code fragment in z3 4.0 to convert a formula to CNF.
(set-logic QF_UF)
(set-option :produce-models true)
; ------ snip -------
;
; declarations,
; and assert statement
; of "original" formula
; here.
;
; ------ snap -------
(apply
  (then (! simplify :elim-and true)
        tseitin-cnf))
I get something like the following:
(goals
(goal
; ------ snip -------
;
; Lots of lines here
;
; ------ snap -------
:precision precise :depth 2)
)
I was assuming that each of the expressions that follows goal is one clause of the CNF, i.e., all those expressions should be conjuncted to yield the actual formula. I will refer to this conjunction as the "encoded" formula.
Obviously, the original formula and the encoded formula are not equivalent, as the encoded formula contains new variables k!0, k!1, ... which do the Tseitin encoding. However, I was expecting that they are equisatisfiable, or actually that they are satisfied by the same models (when disregarding the k!i variables).
I.e., I was expecting that (encoded formula) AND (NOT original formula) is unsatisfiable. Unfortunately, this does not seem to be the case; I have a counterexample where this check actually returns sat.
Is this a bug in z3, am I using it wrong, or are any of my assumptions not valid?
This is a bug in the new tseitin-cnf tactic. I fixed the bug, and the fix will be available in the next release (Z3 4.1). In the meantime, you can work around the bug by using two rounds of simplification.
That is, use
(apply
(then (! simplify :elim-and true)
(! simplify :elim-and true)
tseitin-cnf))
instead of
(apply
(then (! simplify :elim-and true)
tseitin-cnf))
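For what it's worth, here is a small z3py sketch (the Python bindings; the formula named original is just a stand-in example, since the asker's formula is not shown) of the check described in the question, i.e. that (encoded formula) AND (NOT original formula) is unsatisfiable, using the two-round workaround from this answer:
from z3 import Bools, And, Or, Not, Goal, Then, With, Tactic, Solver, unsat
a, b, c = Bools("a b c")
original = And(Or(a, And(b, c)), Or(Not(b), c))   # stand-in for the asker's formula
g = Goal()
g.add(original)
t = Then(With("simplify", elim_and=True),
         With("simplify", elim_and=True),
         Tactic("tseitin-cnf"))
encoded = t(g)[0].as_expr()   # conjunction of the clauses in the resulting goal
s = Solver()
s.add(encoded, Not(original))                     # the k!i auxiliaries occur only in encoded
print(s.check() == unsat)     # expected True: every model of the encoding satisfies the original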

Why does a query result change if I comment out an intermediate `(check-sat)` call?

While debugging an UNSAT query, I noticed an interesting difference in the query status. The query structure is:
assert(...)
(push) ; commenting any of these two calls
(check-sat) ; makes the whole query UNSAT, otherwise it is SAT
assert(...)
(check-sat) ; SAT or UNSAT depending on existence of previous call
(exit)
There are no pop calls in the query. The query that triggers this behaviour is here.
Ideas why?
Note: I don't actually need incrementality, it is for debugging purposes only. Z3 version is 3.2.
This is a bug in one of the quantifier reasoning engines. This bug will be fixed. In the meantime, you can avoid the bug by using datatypes instead of uninterpreted sorts + cardinality constraints. That is, you declare Q and T as:
(declare-datatypes () ((Q q_accept_S13 q_T0_init q_accept_S7
q_accept_S6 q_accept_S5 q_accept_S4 q_T0_S3 q_accept_S12 q_accept_S10
q_accept_S9 q_accept_all)))
(declare-datatypes () ((T t_0 t_1 t_2 t_3 t_4 t_5 t_6 t_7)))
The declarations above are essentially defining two "enumeration" types.
With these declarations, you will get a consistent answer for the second query.
