This is a right-recursive grammar:
<assign> -> <id> = <exp>
<id> -> A | B | C
<exp> -> <term> + <exp> | <term>
<term> -> <factor> * <term> | <factor>
<factor> -> ( <exp> ) | <id>
This is a left-recursive grammar:
<assign> -> <id> = <exp>
<id> -> A | B | C
<exp> -> <exp> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> ( <exp> ) | <id>
Will these grammars produce the same parse tree for the string B + C + A?
The following picture is for the left-recursive grammar.
However, when I draw the parse tree for the right-recursive grammar, the positions of the nodes come out a bit different. I don't know if what I am doing is correct, so I wonder whether left recursion and right recursion produce two different parse trees or should produce the same tree. Please help clarify this. Thanks.
Left and right recursion will not produce identical trees.
You can see easily from the grammars that A+B+C will, at the top level, have <term> <op> <exp> or <exp> <op> <term> (the "exp" being "B+C" in one case and "A+B" in the other).
The trees will only be identical in trivial cases where all productions yield a direct match.
(E.g. A by itself (skipping <assign>) would be <exp> --> <term> --> <factor> --> <id>.)
Since the two grammars are different, I would expect different parse trees. In the left-recursive grammar the node at the top expands as <exp> -> <exp> + <term>; the right-recursive grammar would expand it as <exp> -> <term> + <exp>, so both children would be swapped.
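For B + C + A the difference already shows up at the first expansion of <exp>; sketching just the top of each derivation with the grammars above (parentheses only show the grouping):
Right recursion: <exp> => <term> + <exp> => B + (<term> + <exp>) => B + (C + A)
Left recursion:  <exp> => <exp> + <term> => (<exp> + <term>) + A => (B + C) + A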
If you are trying to write a compiler for mathematical expressions, a thing that matters more is operator precedence, but I'm not sure what you are up to.
Considering the following grammar for propositional logic:
<A> ::= <B> <-> <A> | <B>
<B> ::= <C> -> <B> | <C>
<C> ::= <D> \/ <C> | <D>
<D> ::= <E> /\ <D> | <E>
<E> ::= <F> | -<F>
<F> ::= <G> | <H>
<G> ::= (<A>)
<H> ::= p | q | r | ... | z
Precedence for the connectives is: -, /\, \/, ->, <->.
Associativity is also considered; for example, p\/q\/r should be the same as p\/(q\/r). The same goes for the other connectives.
I intend to make a predictive top-down parser in Java. I don't see any ambiguity or direct left recursion here, but I'm not sure whether that is all I need for this to be an LL(1) grammar. Maybe indirect left recursion?
If this is not an LL(1) grammar, what would be the steps required to transform it for my purposes?
It's not LL(1). Here's why:
The first rule of an LL(1) grammar is:
A grammar G is LL(1) if and only if whenever A --> C | D are two distinct productions of G, the following conditions hold:
For no terminal a do both C and D derive strings beginning with a.
This rule exists so that there are no conflicts while parsing. When the parser encounters a (, it won't know which production to use.
Your grammar violates this first rule. All the non-terminals on the right-hand side of the same production, that is, all your Cs and Ds, eventually derive G and H, so all of them derive at least one string beginning with (.
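To make the conflict concrete: in <D> ::= <E> /\ <D> | <E>, both alternatives begin with whatever <E> begins with (ultimately ( or one of p..z), so one token of lookahead cannot choose between them. The usual fix is to left-factor the common prefix, e.g. <D> ::= <E> [ "/\" <D> ], which a recursive descent method can implement directly. A minimal sketch in Java, assuming hypothetical helpers Node, BinOp, parseE(), peek() and expect() that are not part of the question:
// Parses the left-factored rule <D> ::= <E> ( "/\" <D> )?
Node parseD() {
    Node left = parseE();            // common prefix: both alternatives start with <E>
    if (peek().equals("/\\")) {      // one token of lookahead now suffices
        expect("/\\");               // consume the /\ connective
        return new BinOp("/\\", left, parseD());  // right recursion keeps /\ right-associative
    }
    return left;
}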
How can I interpret this as an EBNF grammar?
<assign>--> <id> = <expr>
<id>--> A | B | C
<expr> --> <expr> * <expr>
<expr> --> <expr> + <expr>
| <id> + <expr>
| ( <expr> )
| <id>
I can make a parse tree and a derivation of any statement using this grammar, but I am having trouble with EBNF.
<assign>--> <id> = <expr>
An assign is the sequence: id equals-sign expr.
<id>--> A | B | C
An id is one of A, B or C
<expr> --> <expr> * <expr>
<expr> --> <expr> + <expr>
| <id> + <expr>
| ( <expr> )
| <id>
An expression can be:
The product of two expressions (infix notation)
The addition of two expressions (infix notation)
The addition of an identifier and an expression (which is a particular case of the addition of two expressions, where the first expression is just <id>)
A parenthesized expression.
An identifier.
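For example, one possible EBNF rendering of the same grammar, using grouping and quoted terminals (the <id> + <expr> alternative is dropped here because, as noted above, it is already covered by <expr> + <expr>):
<assign> --> <id> '=' <expr>
<id> --> 'A' | 'B' | 'C'
<expr> --> <expr> ('*' | '+') <expr> | '(' <expr> ')' | <id>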
We have little snippets of VB6 code (they only use a subset of features) that get written by non-programmers. These are called rules. For the people writing them they are hard to debug, so somebody wrote a kind of ad hoc parser to be able to evaluate the subexpressions and thereby show better where the problem is.
This ad hoc parser is very bad and does not really work well. So I'm trying to write a real parser; because I'm writing it by hand (there's no parser generator with VB6 backends that I could understand), I want to go with a recursive descent parser. I had to reverse-engineer the grammar because I couldn't find anything. (Eventually I found something, http://www.notebar.com/GoldParserEngine.html, but it's LALR and way bigger than I need.)
Here is the grammar for the subset of VB.
<Rule> ::= expr rule | e
<Expr> ::= ( expr )
| Not_List CompareExpr <and_or> expr
| Not_List CompareExpr
<and_or> ::= Or | And
<Not_List> ::= Not Not_List | e
<CompareExpr> ::= ConcatExpr comp CompareExpr
                | ConcatExpr
<ConcatExpr> ::= term term_tail & ConcatExpr
               | term term_tail
<term> ::= factor factor_tail
<term_tail> ::= add_op term term_tail | e
<factor> ::= add_op Value | Value
<factor_tail> ::= multi_op factor factor_tail | e
<Value> ::= ConstExpr | function | expr
<ConstExpr> ::= <bool> | number | string | Nothing
<bool> ::= True | False
<Nothing> ::= Nothing | Null | Empty
<function> ::= id | id ( ) | id ( arg_list )
<arg_list> ::= expr , arg_list | expr
<add_op> ::= + | -
<multi_op> ::= * | /
<comp> ::= > | < | <= | => | =< | >= | = | <>
All in all it works pretty well; here are some simple examples:
my_function(1, 2 , 3)
looks like
(Programm
(rule
(expr
(Not_List)
(CompareExpr
(ConcatExpr
(term
(factor
(value
(function
my_function
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 1))) (term_tail))))
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 2))) (term_tail))))
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 3))) (term_tail))))
(arg_list))))))))
(term_tail))))
(rule)))
Now, what's my problem?
If the code looks like this: (( true OR false ) AND true), I get infinite recursion, but the real problem is that (true OR false) AND true (after the first ( expr )) is understood as only (true OR false).
Here is the parse tree:
So how do I solve this? Should I change the grammar somehow, or use some implementation hack?
Here is a harder example in case you need it.
(( f1 OR f1 ) AND (( f3="ALL" OR f4="test" OR f5="ALL" OR f6="make" OR f9(1, 2) ) AND ( f7>1 OR f8>1 )) OR f8 <> "")
You have several issues that I see.
You are treating OR and AND as equal-precedence operators. You should have separate rules for OR and for AND. Otherwise you will get the wrong precedence (and therefore evaluation) for the expression A OR B AND C.
So as a first step, I'd revise your rules as follows:
<Expr> ::= ( expr )
| Not_List AndExpr Or Expr
| Not_List AndExpr
<AndExpr> ::= CompareExpr And AndExpr
            | Not_List CompareExpr
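With these rules, A OR B AND C groups the AND more tightly; sketching just the top of the derivation (eliding the empty Not_Lists, parentheses only show the grouping):
Expr => Not_List AndExpr Or Expr
     => A Or AndExpr
     => A Or (CompareExpr And AndExpr)
     => A Or (B And C)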
The next problem is that you have ( expr ) at the top level of your Expr rule. What if I write:
A AND (B OR C)
To fix this, change these two rules:
<Expr> ::= Not_List AndExpr Or Expr
| Not_List AndExpr
<Value> ::= ConstExpr | function | ( expr )
I think your treatment of Not is not appropriate. Not is an operator,
just with one operand, so its "tree" should have a Not node and a child which
is the expression being negated. What you have is a list of Nots with no operands.
Try this instead:
<Expr> ::= AndExpr Or Expr
| AndExpr
<Value> ::= ConstExpr | function | ( expr ) | Not Value
I haven't looked, but I think VB6 expressions have other messy things in them.
If you notice, the style of Expr and AndExpr I have written uses right recursion to avoid left recursion. You should change your Concat, Sum, and Factor rules to follow a similar style; what you have is pretty complicated and hard to follow.
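As a sketch of how those restructured rules map onto a hand-written recursive descent parser (shown in Java only for illustration; Not is left to the Value rule as in the last version above, and Node, BinOp, match() and parseCompareExpr() are assumed helpers, not part of the question):
// <Expr>    ::= AndExpr [ Or Expr ]
// <AndExpr> ::= CompareExpr [ And AndExpr ]
Node parseExpr() {
    Node left = parseAndExpr();                      // And binds tighter than Or
    if (match("Or")) {                               // match() consumes the token when it matches
        return new BinOp("Or", left, parseExpr());   // right recursion, no left recursion
    }
    return left;
}

Node parseAndExpr() {
    Node left = parseCompareExpr();
    if (match("And")) {
        return new BinOp("And", left, parseAndExpr());
    }
    return left;
}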
If they are just creating snippets then perhaps VB5 is "good enough" for creating them. And if VB5 is good enough, the free VB5 Control Creation Edition might be worth tracking down for them to use:
http://www.thevbzone.com/vbcce.htm
You could have them start from a "test harness" project they add snippets to, and they can even test them out.
With a little orientation this will probably prove much more practical than hand crafting a syntax analyzer, and a lot more useful since they can test for more than correct syntax.
Where VB5 is lacking you might include a static module in the "test harness" that provides a rough and ready equivalent of Split(), Replace(), etc:
http://support.microsoft.com/kb/188007
Some compiler books / articles / papers talk about design of a grammar and the relation of its operator's associativity. I'm a big fan of top-down, especially recursive descent, parsers and so far most (if not all) compilers I've written use the following expression grammar:
Expr ::= Term { ( "+" | "-" ) Term }
Term ::= Factor { ( "*" | "/" ) Factor }
Factor ::= INTEGER | "(" Expr ")"
which is an EBNF representation of this BNF:
Expr ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor ::= INTEGER | "(" Expr ")"
According to what I have read, some regard this grammar as "wrong" because it changes the operators' associativity (left-to-right for those four operators), as shown by the parse tree growing to the right instead of to the left. For a parser implemented through an attribute grammar this might be true, since an L-attributed value requires that the value be created first and then passed to child nodes. However, when implementing a normal recursive descent parser, it's up to me whether to construct this node first and then pass it to the child nodes (top-down), or to let the child nodes be created first and then add the returned values as children of this node, passed in this node's constructor (bottom-up). There must be something I'm missing here, because I don't agree with the statement that this grammar is "wrong"; this grammar has been used in many languages, especially Wirthian ones. Usually (or always?) the readings that make this claim promote LR parsing instead of LL.
I think the issue here is that a language has an abstract syntax which is just like:
E ::= E + E | E - E | E * E | E / E | Int | (E)
but this is actually implemented via a concrete syntax which is used to specify associativity and precedence. So, if you're writing a recursive descent parser, you're implicitly writing the concrete syntax into it as you go along, and that's fine, though it may be good to specify it exactly as a phrase-structured grammar as well!
There are a couple of issues with your grammar if it is to be a fully-fledged concrete grammar. First of all, you need to add productions to just 'go to the next level down', so relaxing your syntax a bit:
Expr ::= Term + Term | Term - Term | Term
Term ::= Factor * Factor | Factor / Factor | Factor
Factor ::= INTEGER | (Expr)
Otherwise there's no way to derive valid sentences starting from the start symbol (in this case Expr). For example, how would you derive '1 * 2' without those extra productions?
Expr -> Term
-> Factor * Factor
-> 1 * Factor
-> 1 * 2
We can see the other grammar handles this in a slightly different way:
Expr -> Term Expr'
-> Factor Term' Expr'
-> 1 Term' Expr'
-> 1 * Factor Term' Expr'
-> 1 * 2 Term' Expr'
-> 1 * 2 ε Expr'
-> 1 * 2 ε ε
= 1 * 2
but this achieves the same effect.
Your parser is actually non-associative. To see this ask how E + E + E would be parsed and find that it couldn't. Whichever + is consumed first, we get E on one side and E + E on the other, but then we're trying to parse E + E as a Term which is not possible. Equivalently, think about deriving that expression from the start symbol, again not possible.
Expr -> Term + Term
-> ? (can't get another + in here)
The other grammar is left-associative because an arbitrarily long string of E + E + ... + E can be derived.
So anyway, to sum up, you're right that when writing the RDP, you can implement whatever concrete version of the abstract syntax you like and you probably know a lot more about that than me. But there are these issues when trying to produce the grammar which describes your RDP precisely. Hope that helps!
To get associative trees, you really need to have the trees formed with the operator as the subtree root node, with children having similar roots.
Your implementation grammar:
Expr ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor ::= INTEGER | "(" Expr ")"
must make that awkward; if you implement recursive descent on this, the Expr' routine has no access to the "left child" and so can't build the tree. You can always patch this up by passing around pieces (in this case, passing tree parts up the recursion) but that just seems awkward. You could have chosen this instead as a grammar:
Expr ::= Term ( ("+"|"-") Term )*;
Term ::= Factor ( ( "*" | "/" ) Factor )* ;
Factor ::= INTEGER | "(" Expr ")"
which is just as easy (easier?) to code recursive descent-wise, but now you can form the trees you need without trouble.
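A sketch of what that buys you: with the looped rule, the routine for Expr can fold each new operand into the left side of the tree as it goes (Java only for illustration; Node, BinOp, peek(), next() and parseTerm() are assumed helpers):
// Expr ::= Term ( ("+"|"-") Term )*
// Each iteration uses the tree built so far as the new left child,
// so a + b - c comes out shaped as ((a + b) - c).
Node parseExpr() {
    Node left = parseTerm();
    while (peek().equals("+") || peek().equals("-")) {
        String op = next();                  // consume the operator
        Node right = parseTerm();
        left = new BinOp(op, left, right);   // left-leaning shape
    }
    return left;
}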
This doesn't really get you associativity; it just shapes the trees so that it could be allowed. Associativity means that the tree (+ (+ a b) c) means the same thing as (+ a (+ b c)); it's actually a semantic property (it certainly doesn't hold for "-", but the grammar as posed can't distinguish).
We have a tool (the DMS Software Reengineering Toolkit) that includes parsers and term-rewriting (using source-to-source transformations) in which the associativity is explicitly expressed. We'd write your grammar:
Expr ::= Term ;
[Associative Commutative] Expr ::= Expr "+" Term ;
Expr ::= Expr "-" Term ;
Term ::= Factor ;
[Associative Commutative] Term ::= Term "*" Factor ;
Term ::= Term "/" Factor ;
Factor ::= INTEGER ;
Factor ::= "(" Expr ")" ;
The grammar seems longer and clumsier this way, but it in fact allows us to break out the special cases and mark them as needed. In particular, we can now distinguish operators that are associative from those that are not, and mark them accordingly. With that semantic marking, our tree-rewrite engine automatically accounts for associativity and commutativity. You can see a full example of such DMS rules being used to symbolically simplify high-school algebra using explicit rewrite rules over a typical expression grammar that don't have to account for such semantic properties. That is built into the rewrite engine.
The following simple "calculator expression" grammar (BNF) can be easily parsed with a trivial recursive-descent parser, which is predictive LL(1):
<expr> := <term> + <term>
| <term> - <term>
| <term>
<term> := <factor> * <factor>
        | <factor> / <factor>
        | <factor>
<factor> := <number>
| <id>
| ( <expr> )
<number> := \d+
<id> := [a-zA-Z_]\w+
Because it is always enough to see the next token in order to know the rule to pick. However, suppose that I add the following rule:
<command> := <expr>
| <id> = <expr>
For the purpose of interacting with the calculator on the command line, with variables, like this:
calc> 5+5
=> 10
calc> x = 8
calc> 6 * x + 1
=> 49
Is it true that I cannot use a simple LL(1) predictive parser to parse the <command> rule? I tried to write the parser for it, but it seems that I need to look more tokens ahead. Is the solution to use backtracking, or can I just implement LL(2) and always look two tokens ahead?
How do RD parser generators handle this problem (ANTLR, for instance)?
The problem with
<command> := <expr>
| <id> = <expr>
is that when you "see" <id> you can't tell whether it's the beginning of an assignment (second rule) or a <factor>. You will only know when you read the next token.
AFAIK ANTLR is LL(*) (and it is also able to generate packrat parsers, if I'm not mistaken), so it will probably handle this grammar by considering two tokens at once.
If you can play with the grammar, I would suggest either adding a keyword for the assignment (e.g. let x = 8):
<command> := <expr>
| "let" <id> "=" <expr>
or use the = to signify evaluation:
<command> := "=" <expr>
| <id> "=" <expr>
I think there are two ways to solve this with a recursive descent parser: either by using (more) lookahead or by backtracking.
Lookahead
command() {
if (currentToken() == id && lookaheadToken() == '=') {
return assignment();
} else {
return expr();
}
}
Backtracking
command() {
savedLocation = scanLocation();
if (accept( id )) {
identifier = acceptedTokenValue();
if (!accept( '=' )) {
setScanLocation( savedLocation );
return expr();
}
return new assignment( identifier, expr() );
} else {
return expr();
}
}
The problem is that the grammar:
<command> := <expr>
| <id> = <expr>
cannot be mapped directly onto a mutually-recursive procedure. For a recursive descent parser you will need to determine a non-recursive equivalent.
rdentato's post shows how to fix this, assuming you can play with the grammar. This PowerPoint spells out the problem in a bit more detail and shows how to correct it:
http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fxml.cs.nccu.edu.tw%2Fcourses%2Fcompiler%2Fcp2006%2Fslides%2Flec3-Parsing%26TopDownParsing.ppt&ei=-YLaSPrWGaPwhAK5ydCqBQ&usg=AFQjCNGAFrODJxoxkgJEwDMQ8A8594vn0Q&sig2=nlYKQVfakmqy_57137XzrQ
ANTLR 3 uses an "LL(*)" parser as opposed to an LL(k) parser, so it will look ahead until it reaches the end of the input if it has to, without backtracking, using a specially optimized deterministic finite automaton (DFA).