Z3 prover for solving an abstract syntax tree (AST)

I want to convert an AST (abstract syntax tree) into input constraints for Z3. How can I do that? This process is for test pattern generation.

Related

How can LR parsers generate parse trees?

Suppose I have a grammar:
S -> Aaa
A -> a | ε
Clearly, this grammar generates only the sequences aa and aaa. A simple LR(1) parser (or even LL) can parse these when the grammar is transformed into an equivalent one:
S -> aaA
A -> a | ε
Although these grammars are equivalent, their generated parse trees are different. Consider, for the sequence aaa:
      S               S
     / \             / \
    A   aa         aa   A
    |                   |
    a                   a
Grammars determine whether a sequence belongs to a language; they do not by themselves provide the parse tree that represents it in the language. The untransformed grammar cannot parse the sequence (without greater look-ahead), while the transformed grammar can parse it but builds an invalid parse tree.
How would one go about building a parse tree for a sequence whose (context-free) grammar, untransformed, cannot be handled by an LR parser?
If a grammar has no LR(1) parser, you need to use a different parsing algorithm. In this case, you could use an LR(3) parser. Or you could (in general) use an Earley or GLR parser, which have no lookahead limitations.
I think your question has to do with recovering the original parse from the results of a parse with a transformed grammar. This will depend on the transformation algorithm.
In the example you provide, I think you're using the left-recursion-elimination transformation; this procedure does not preserve derivations, so as far as I know there is no algorithm for recovering the original parse.
There is a different transformation which can be used to construct an LR(1) grammar from an LR(k) grammar if the value of k is known. That transformation is reversible. However, it's not usually considered practical because it effectively encodes the LR(k) machine into the grammar rules, leading to a massive blow-up of the grammar. It would be equivalent to use a real LR(k) parser, which also has a huge automaton.
First, I would say, "Grammars determine whether a sequence is a sentence of a language." You then say that the transformed grammar builds an invalid parse tree. I would say that it builds a different parse tree, which may or may not be useful. But, to your question about building a parse tree with a non-LR grammar: consider the following grammar, which is not LR(k) for any k because it is ambiguous:
E -> E + E | E * E | number
For example:
7 * 4 + 3
There are two distinct parse trees you can build for this sentence, precisely because of the ambiguity in the grammar (indeed, this is the definition of an ambiguous grammar). So the answer to your question is that I wouldn't know how to do it in the general case.
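To make the ambiguity concrete, here is a minimal sketch (not from the answer) representing the two parse trees of 7 * 4 + 3 as nested tuples and evaluating each; the two trees yield different values, which is exactly why the choice of tree matters:

```python
def evaluate(tree):
    """Evaluate a parse tree given as a nested tuple (op, left, right) or a plain number."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == '+' else a * b

# The two parse trees of "7 * 4 + 3" under E -> E + E | E * E | number:
tree1 = ('+', ('*', 7, 4), 3)   # (7 * 4) + 3
tree2 = ('*', 7, ('+', 4, 3))   # 7 * (4 + 3)

print(evaluate(tree1))  # 31
print(evaluate(tree2))  # 49
```

Both trees derive the same sentence, so the grammar alone cannot tell a parser which one to build.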

Understanding and Writing Parsers

I'm writing a program that requires me to create my first real, somewhat complicated parser. I would like to understand what parsing algorithms exist, as well as how to create a "grammar". So my question(s) are as follows:
1) How does one create a formal grammar that a parser can understand? What are the basic components of a grammar?
2) What parsing algorithms exist, and what kind of input does each excel at parsing?
3) In light of the broad nature of the questions above, what are some good references I can read through to understand the answer to questions 1 and 2?
I'm looking for more of a broad overview with the keywords/topic areas I need so I can look into the details myself. Thanks everybody!
You generally write a context-free grammar G that describes a certain formal language L (e.g. the set of all syntactically valid C programs) which is simply a set of strings over a certain alphabet (think of all well-formed C programs; or of all well-formed HTML documents; or of all well-formed MARKDOWN posts; all of these are sets of finite strings over certain subsets of the ASCII character set). After that you come up with a parser for the given grammar---that is, an algorithm that, given a string w, decides whether the string w can be derived by the grammar G. (For example, the grammar of the C11 language describes the set of all well-formed C programs.)
Some types of grammars admit simple-to-implement parsers. One class of grammars often used in practice is the LL grammars. A special subset, the LL(1) grammars, have parsers that run in linear time (linear in the length of the string being parsed).
There are more general parsing algorithms---most notably the Earley parser and the CYK algorithm---that take as input a string w and a grammar G and decide in time O(|w|^3) whether the string w is derivable by the grammar G. (Notice how cool this is: the algorithm takes the grammar as an argument. But I don't think this is used in practice.)
I implemented the Earley parser in Java some time ago. If you're interested, the code is available on GitHub.
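To illustrate the grammar-as-argument idea, here is a hedged sketch of the CYK algorithm in Python. The toy grammar is my own example (not from the answer): it is in Chomsky normal form and generates { a^n b^n : n >= 1 } via S -> A T | A B, T -> S B, A -> a, B -> b.

```python
# Grammar in Chomsky normal form, given as data (the algorithm's "argument"):
binary = {('A', 'T'): {'S'}, ('A', 'B'): {'S'}, ('S', 'B'): {'T'}}
terminal = {'a': {'A'}, 'b': {'B'}}

def cyk(w):
    """Decide membership of w in O(|w|^3) time by dynamic programming."""
    n = len(w)
    if n == 0:
        return False
    # table[j][i] = set of nonterminals deriving the substring of length j+1 starting at i
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[0][i] = set(terminal.get(ch, ()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            for k in range(1, length):      # split point
                for b in table[k - 1][i]:
                    for c in table[length - k - 1][i + k]:
                        table[length - 1][i] |= binary.get((b, c), set())
    return 'S' in table[n - 1][0]

print(cyk("aabb"))   # True
print(cyk("aab"))    # False
```

Swapping in a different CNF grammar only requires changing the `binary` and `terminal` dictionaries; the algorithm itself is unchanged.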
For a concrete example of the whole process, consider the language of all balanced strings of parenthesis (), (()), ((()))()(())(), etc. We can describe them with the following context-free grammar:
S -> (S) | SS | eps
where eps is the empty production. For example, we can derive the string (())() as follows: S => SS => (S)S => ((S))S => (())S => (())(S) => (())(). We can easily implement a parser for this grammar (left as exercise :-).
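The exercise can be sketched as follows (my own minimal solution, assuming the alphabet contains only parentheses): since the language is exactly the balanced-parenthesis strings, a single linear scan with a depth counter suffices as a recognizer.

```python
def is_balanced(s):
    """Return True iff s is in the language generated by S -> (S) | SS | eps."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
        else:
            return False         # alphabet is only '(' and ')'
    return depth == 0            # every '(' must be closed

print(is_balanced("(())()"))   # True
print(is_balanced("(()"))      # False
```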
A very good reference is the so-called dragon book: Compilers: Principles, Techniques, and Tools by Aho et al. It covers all the essential topics. Another good reference is the classic book Introduction to Automata Theory, Languages, and Computation by Hopcroft et al.

Custom theory solver for order theory?

My program, a bounded synthesizer of reactive finite-state systems, produces SMT queries to annotate a product automaton of the (uninterpreted) system and a specification. Essentially it is model checking with uninterpreted functions. If the annotation exists, then the model found by Z3 satisfies the spec. The queries contain:
a datatype (to encode states of the system and of the specification automaton)
>= (greater-or-equal) and > (strictly greater) (to specify a ranking function over the states of the automaton system*spec, which is used to search for lassos with bad states; in other words, an ordering of the states of that automaton)
uninterpreted functions with Boolean domain and range
all clauses are Horn clauses
An example is https://dl.dropboxusercontent.com/u/444947/posts/full_arbiter2.smt2
('forall' are used to encode "don't care" inputs to functions)
Currently the queries use the strictly-greater > operator from integer arithmetic (that is, the ranking function has range Int).
Question: is it worth developing a custom theory solver in Z3 for such queries? It could exploit a DFS-based search for lassos, which might be faster than the integer theory solver (or the diff-neg tactic).
Or does Z3 already handle this efficiently? (Efficiently means "comparable to a graph-based search for lassos".)
Arithmetic is not the bottleneck of your benchmark.
We can check that by running
valgrind --tool=callgrind z3 full_arbiter2.smt2
and inspecting the output with kcachegrind.
Valgrind and kcachegrind are available in most Linux distros.
So, I don't think you will get a significant performance improvement if you implement a solver for order theory.
One bottleneck is the datatype theory. You may get a performance boost if you encode the types Q and T using bit-vectors. Another bottleneck is quantifier reasoning. Have you tried expanding the quantifiers before invoking Z3?
In Z3, the qe (quantifier elimination) tactic will essentially expand Boolean quantifiers.
I got a small speedup by replacing
(check-sat)
with
(check-sat-using (then qe smt))

Used Premises/Axioms in Z3 TPTP proofs

When using Z3 on TPTP files (e.g. http://www.cs.miami.edu/~tptp/cgi-bin/SeeTPTP?Category=Problems&Domain=SYN&File=SYN054+1.p) is there a way to find out which axioms were used to prove the conjecture?
Alternatively, can Z3 produce TPTP proofs?
Cheers
Z3 includes limited TPTP support.
It does not track axiom names or produce proofs in the TPTP format.
Z3 offers rich support for the SMT-LIB2 format, and it produces proofs in a format that can be digested by SMT-LIB2 parsers.

Is it possible to use an FsYacc parser developed for one language as part of the parsing process for another language?

I'm implementing parsing and expression evaluation for two languages L1 and L2.
The important thing is that L1 can be used as a separate language or as part of L2, and L2 adds only a few keywords that are absent from L1.
I've already implemented the lexing -> parsing -> AST production -> AST processing pipeline for L1, using F# with the FsLex and FsYacc utilities.
Is it possible to reuse the already developed parsing process (I mean the tokens and AST production defined in the L1 parser) when parsing the other language, L2?
AST: the AST of L1 will be used as part of the AST for L2, and the same AST processing will be applied.
FsLex lexer: it could possibly be common to both languages; I would just need to add L2's few extra keywords to the L1 lexer. But if it is possible to have separate lexers for L1 and L2, with the L2 lexer referring to the L1 lexer, that would be excellent.
FsYacc parser: I would not like to copy-paste all the L1 parser code into L2.
Is there a way to reference, in my L2 parser, the tokens and AST production defined in the L1 parser?
Thanks in advance
Here is an interesting article which discusses the difficulties of grammar composition. The short story is that you can't do what you want using yacc-like parser generators. That doesn't mean you can't achieve code reuse with some macro-based system, but it will remain a hack.
