Convert C- grammar to LL(1) - parsing

I am building a compiler for C- and am currently working on the parser. For some reason, I can't seem to resolve the FIRST-set collision (on the terminal id) originating from the EXPRESSION production. Below is a subset of the grammar I have now. Could someone point me in the right direction as to how to resolve the collision (or convert it to an equivalent LL(1)-parseable grammar)?
EXPRESSION -> id VAR eq EXPRESSION | SIMPLEEXPRESSION
VAR -> lbracket EXPRESSION rbracket | empty
SIMPLEEXPRESSION -> ADDITIVEEXPRESSION FADDITIVEEXPRESSION
FADDITIVEEXPRESSION -> RELOP ADDITIVEEXPRESSION | empty
RELOP -> ltoreq | lt | gt | gtoreq | doubleeq | noteq
ADDITIVEEXPRESSION -> TERM ADDITIVEEXPRESSION1
ADDITIVEEXPRESSION1 -> ADDOP TERM ADDITIVEEXPRESSION1 | empty
ADDOP -> plus | minus
TERM -> FACTOR TERM1
TERM1 -> MULOP FACTOR TERM1 | empty
MULOP -> times | divide
FACTOR -> lparen EXPRESSION rparen | id FACTOR1 | num
FACTOR1 -> a | b

C does not lend itself very well to LL(1) parsing, so what you are trying to do here could be quite difficult to achieve and may not even be possible for the full grammar.
But for the problem at hand, for the top-level production
EXPRESSION -> id VAR eq EXPRESSION | SIMPLEEXPRESSION
it's easy to see that id can be the start of either alternative, so an LL(1) parser will not know which alternative to pick.
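The collision can also be checked mechanically. The following is a minimal sketch, assuming the grammar subset exactly as posted in the question (including the placeholder-looking FACTOR1 -> a | b); the helper names first_of_sequence and compute_first are made up for illustration. It computes FIRST sets by fixed-point iteration and intersects the FIRST sets of the two EXPRESSION alternatives:

```python
# Grammar subset from the question; "empty" is the epsilon marker.
EPS = "empty"
GRAMMAR = {
    "EXPRESSION": [["id", "VAR", "eq", "EXPRESSION"], ["SIMPLEEXPRESSION"]],
    "VAR": [["lbracket", "EXPRESSION", "rbracket"], [EPS]],
    "SIMPLEEXPRESSION": [["ADDITIVEEXPRESSION", "FADDITIVEEXPRESSION"]],
    "FADDITIVEEXPRESSION": [["RELOP", "ADDITIVEEXPRESSION"], [EPS]],
    "RELOP": [["ltoreq"], ["lt"], ["gt"], ["gtoreq"], ["doubleeq"], ["noteq"]],
    "ADDITIVEEXPRESSION": [["TERM", "ADDITIVEEXPRESSION1"]],
    "ADDITIVEEXPRESSION1": [["ADDOP", "TERM", "ADDITIVEEXPRESSION1"], [EPS]],
    "ADDOP": [["plus"], ["minus"]],
    "TERM": [["FACTOR", "TERM1"]],
    "TERM1": [["MULOP", "FACTOR", "TERM1"], [EPS]],
    "MULOP": [["times"], ["divide"]],
    "FACTOR": [["lparen", "EXPRESSION", "rparen"], ["id", "FACTOR1"], ["num"]],
    "FACTOR1": [["a"], ["b"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a symbol string, given the current FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        # A terminal's FIRST set is just the terminal itself.
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the string can derive empty
    return result


def compute_first():
    """Iterate until no FIRST set grows any more."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first()
alt1, alt2 = (first_of_sequence(alt, FIRST) for alt in GRAMMAR["EXPRESSION"])
print(sorted(alt1 & alt2))  # the terminals on which the two alternatives collide
```

Any non-empty intersection means a one-token lookahead cannot choose between the alternatives.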
One solution to the immediate problem would be to split the EXPRESSION production into two alternatives, one that always starts with an id terminal, and one that never does:
EXPRESSION -> EXPRESSION_id | EXPRESSION_non_id
For the id alternative, we would require the id terminal up front and then create id-only versions of the productions that follow:
EXPRESSION_id -> id (VAR eq EXPRESSION | SIMPLEEXPRESSION_id)
Similarly, for the non-id side, we would create non-id versions of the productions that follow:
EXPRESSION_non_id -> SIMPLEEXPRESSION_non_id
The required sub-productions to complete the grammar would look something like this:
SIMPLEEXPRESSION_id -> ADDITIVEEXPRESSION_id FADDITIVEEXPRESSION
ADDITIVEEXPRESSION_id -> TERM_id ADDITIVEEXPRESSION1
TERM_id -> FACTOR_id TERM1
FACTOR_id -> FACTOR1
SIMPLEEXPRESSION_non_id -> ADDITIVEEXPRESSION_non_id FADDITIVEEXPRESSION
ADDITIVEEXPRESSION_non_id -> TERM_non_id ADDITIVEEXPRESSION1
TERM_non_id -> FACTOR_non_id TERM1
FACTOR_non_id -> lparen EXPRESSION rparen | num
You can make similar transformations for other conflicts, but the resulting grammar can become quite unwieldy.

Related

How to make this grammar LL(1)

I want to know if it would be possible to transform this grammar to LL(1). This is the grammar:
A -> B
| C
B -> a
| a ';'
C -> a D
| a D ';'
D -> ';' a
| D ';' a
Since this language is regular ( a;? | a(;a)+;? ), then yes, it would be possible.
Not sure if I'm using the right syntax, but the language is basically a or a; (using A->B), or any string that starts with an a, followed by one or more ;a pairs, optionally adding another ; on the end.
This is the same grammar but simpler:
A -> a | a ';' | a ';' A
It is still not LL(1), but after left factoring it is:
A -> a B
B -> ε | ';' C
C -> ε | A
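As a rough illustration of why the left-factored grammar is LL(1), here is a minimal recursive-descent recognizer sketch with one function per nonterminal. Each function decides which alternative to take by looking at a single token. The function name accepts and the use of "$" as an end marker are assumptions made for this example:

```python
def accepts(tokens):
    """Return True if the token list can be derived from A."""
    pos = 0

    def peek():
        # "$" stands for end of input.
        return tokens[pos] if pos < len(tokens) else "$"

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r} at {pos}")
        pos += 1

    def parse_A():  # A -> a B
        eat("a")
        parse_B()

    def parse_B():  # B -> ε | ';' C
        if peek() == ";":
            eat(";")
            parse_C()
        # any other lookahead: take the ε alternative

    def parse_C():  # C -> ε | A
        if peek() == "a":
            parse_A()
        # any other lookahead: take the ε alternative

    try:
        parse_A()
    except SyntaxError:
        return False
    return pos == len(tokens)  # all input must be consumed


for text in ["a", "a;", "a;a", "a;a;", "aa", "a;;"]:
    print(repr(text), accepts(list(text)))
```

No function ever needs to backtrack or look more than one token ahead, which is the property the left factoring was meant to achieve.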

This context-free grammar is ambiguous and I'm not sure why. The SLR(1) compiler I'm building doesn't work the way I expect it to

I'm building a syntax parser. It is meant to be SLR(1), but I believe there are some shift/reduce conflicts (or some other kind of conflict) that make the parser reject strings too early. Here is the grammar:
Note: I did left factor the grammar to see if that was the problem, but that doesn't get rid of the ambiguity. This, however, is the original grammar without left factoring:
P'' -> P'$
P' -> P
P -> C | C;D
D -> R | RD
R -> pu{P}
C -> I | I;C
I -> h | O | A | R | Z
O -> i(V) | z(V)
Y -> u
V -> S | N
S -> u
N -> u
A -> S=s | S=S | N=X
X -> N | b | L
L -> d(X,X) | s(X,X) | m(X,X)
R -> f(B)t{C} | f(B)t{C}1{C}
B -> e(V,V) | (N<N) | (N>N) | nB | a(B,B) | o(B,B)
Z -> w(B){C} | r(N=0;N<N;N=a(N,1)){C}
I understand this grammar is quite big, but if you could help me here you would be a life saver. Thank you in advance!
Having recognized an I, and with ; as the next symbol, there's a shift-reduce conflict:
The production C -> I;C says to shift the ;.
The production P -> C;D puts ; in FOLLOW(C), which says to reduce via C -> I.
So the grammar is not SLR(1).
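The conflict can be confirmed by computing FOLLOW(C). The sketch below is a simplification, not the full grammar: it keeps only the four productions involved and treats I and D as if they were terminals, and it assumes there are no empty productions (true for this fragment). The function names are made up for illustration:

```python
# Fragment of the grammar; I and D are treated as terminals here (an assumption).
PRODUCTIONS = [
    ("P", ["C"]),
    ("P", ["C", ";", "D"]),
    ("C", ["I"]),
    ("C", ["I", ";", "C"]),
]
NONTERMINALS = {"P", "C"}


def first_sets():
    """FIRST sets for a grammar without empty productions."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in PRODUCTIONS:
            sym = body[0]
            new = first[sym] if sym in NONTERMINALS else {sym}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets(start="P"):
    """FOLLOW sets by fixed-point iteration; "$" marks end of input."""
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in PRODUCTIONS:
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(body):
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # sym ends the body, so it inherits FOLLOW(head)
                    new = follow[head]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = follow_sets()
# The state reached after I contains both "C -> I ." and "C -> I . ; C".
# SLR(1) reduces by C -> I on every token in FOLLOW(C) and shifts on ";".
print("shift/reduce conflict on ';':", ";" in FOLLOW["C"])
```

Since ; is both shiftable and in FOLLOW(C), the SLR(1) table has two actions for the same entry.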

How to fix this grammar so that it could evaluate postfix correctly

I'm writing a grammar that should convert infix to postfix. Our teacher told us to change this grammar:
E -> TT'
T -> FF'
T'-> +T | -T | nil
F -> (E) | id | num
F' -> *F | /F | nil
Note: tokens are +, -, *, /, ^ (pow). The problem is the power operator. I don't know how to change the grammar so that it can parse powers too.
Thanks in advance.

F# pattern matching: how to match a set of possible types that share the same parameters?

I'm new to F# and not quite familiar with the whole pattern matching idea.
I tried to search for a better solution to my problem but I fear I can't even express the problem properly – I hope the question title is at least somewhat accurate.
What I want to do is extract 2 "parameters" from listMethod.
listMethod is of one of several types that have a string and an Expression "parameter" (I suspect parameter is the wrong term):
let (varDecl, listExpr) =
    match listMethod with
    | Select (var, expr) -> (var, expr)
    | Where (var, expr) -> (var, expr)
    | Sum (var, expr) -> (var, expr)
    | Concat (var, expr) -> (var, expr)
Then I continue to work with varDecl and at the end have a similar match expression with the actual listMethod code that makes use of several temporary variables I created based on varDecl.
My question now is: How can I make the above code more compact?
I want to match all those types that have 2 parameters (of type string and Expression) without listing them all myself, which is kinda ugly and hard to maintain.
The ListMethod type is declared as follows (the whole thing is a FsLex/FsYacc project):
type ListMethod =
    | Select of string * Expr
    | Where of string * Expr
    | Sum of string * Expr
    | Concat of string * Expr
    | ...
    | SomethingElse of Expr
(as of now I only have types of the form string * Expr, but that will change).
I reckon that this is a fairly dumb question for anyone with some experience, but as I've said I'm new to F# and couldn't find a solution myself.
Thanks in advance!
Edit: I'd really like to avoid listing all possible types of listMethod twice. If there's no way I can use wildcards or placeholders in the match expressions, perhaps I can modify the listMethod type to make things cleaner.
One option that comes to mind would be creating only 1 type of listMethod and to create a third parameter for the concrete type (Select, Where, Sum).
Or is there a better approach?
This is probably the standard way:
let (varDecl, listExpr) =
    match listMethod with
    | Select (var, expr)
    | Where (var, expr)
    | Sum (var, expr)
    | Concat (var, expr) -> (var, expr)
The | sign means or, so if one of these match, the result will be returned. Just make sure that every case has exactly the same names (and types).
As Chuck commented, this is an even better solution:
let (Select (varDecl, expr)
    | Where (varDecl, expr)
    | Sum (varDecl, expr)
    | Concat (varDecl, expr)) = listMethod
"I reckon that this is a fairly dumb question for anyone with some experience, but as I've said I'm new to F# and couldn't find a solution myself."
On the contrary, this is a very good question and actually relatively untrodden ground because F# differs from other languages in this regard (e.g. you might solve this problem using polymorphic variants in OCaml).
As Ankur wrote, the best solution is always to change your data structure to make it easier to do what you need to do if that is possible. KVB's solution of using active patterns is not only valuable but also novel because that language feature is uncommon in other languages. Ramon's suggestion to combine your match cases using or-patterns is also good but you don't want to write incomplete pattern matches.
Perhaps the most common example of this problem arising in practice is in operators:
type expr =
    | Add of expr * expr
    | Sub of expr * expr
    | Mul of expr * expr
    | Div of expr * expr
    | Pow of expr * expr
    | ...
where you might restructure your type as follows:
type binOp = Add | Sub | Mul | Div | Pow
type expr =
    | BinOp of binOp * expr * expr
    | ...
Then tasks like extracting subexpressions:
let subExprs = function
    | Add(f, g)
    | Sub(f, g)
    | Mul(f, g)
    | Div(f, g)
    | Pow(f, g) -> [f; g]
    | ...
can be performed more easily:
let subExprs = function
    | BinOp(_, f, g) -> [f; g]
    | ...
Finally, don't forget that you can augment F# types (such as union types) with OOP constructs such as implementing shared interfaces. This can also be used to express commonality, e.g. if you have two overlapping requirements on two types then you might make them both implement the same interface in order to expose this commonality.
If you are OK with adjusting your data structure, then something like the following will ease the pattern matching.
type ListOperations =
    Select | Where | Sum | Concat
type ListMethod =
    | ListOp of ListOperations * string * Expr
    | SomethingElse of int
let test t =
    match t with
    | ListOp (a,b,c) -> (b,c)
    | _ -> ....
A data structure should be designed by keeping in mind the operation you want to perform on it.
If there are times when you will want to treat all of your cases the same and other times where you will want to treat them differently based on whether you are processing a Select, Where, Sum, etc., then one solution would be to use an active pattern:
let (|OperatorExpression|_|) = function
    | Select(var, expr) -> Some(Select, var, expr)
    | Where (var, expr) -> Some(Where, var, expr)
    | Sum (var, expr) -> Some(Sum, var, expr)
    | Concat (var, expr) -> Some(Concat, var, expr)
    | _ -> None
Now you can still match normally if you need to treat the cases individually, but you can also match using the active pattern:
let varDecl, listExp =
    match listMethod with
    | OperatorExpression(_, v, e) -> v, e
    | _ -> // whatever you do for other cases...

Practical solution to fix a Grammar Problem

We have little snippets of VB6 code (they only use a subset of features) that get written by non-programmers. These are called rules. For the people writing them they are hard to debug, so somebody wrote a kind of ad hoc parser to be able to evaluate the subexpressions and thereby show better where the problem is.
This ad hoc parser is very bad and does not really work well. So I'm trying to write a real parser. Because I'm writing it by hand (I found no parser generator with VB6 backends that I could understand), I want to go with a recursive descent parser. I had to reverse-engineer the grammar because I couldn't find anything. (Eventually I found something, http://www.notebar.com/GoldParserEngine.html, but it's LALR and it's way bigger than I need.)
Here is the grammar for the subset of VB.
<Rule> ::= expr rule | e
<Expr> ::= ( expr )
| Not_List CompareExpr <and_or> expr
| Not_List CompareExpr
<and_or> ::= Or | And
<Not_List> ::= Not Not_List | e
<CompareExpr> ::= ConcatExpr comp CompareExpr
|ConcatExpr
<ConcatExpr> ::= term term_tail & ConcatExpr
|term term_tail
<term> ::= factor factor_tail
<term_tail> ::= add_op term term_tail | e
<factor> ::= add_op Value | Value
<factor_tail> ::= multi_op factor factor_tail | e
<Value> ::= ConstExpr | function | expr
<ConstExpr> ::= <bool> | number | string | Nothing
<bool> ::= True | False
<Nothing> ::= Nothing | Null | Empty
<function> ::= id | id ( ) | id ( arg_list )
<arg_list> ::= expr , arg_list | expr
<add_op> ::= + | -
<multi_op> ::= * | /
<comp> ::= > | < | <= | => | =< | >= | = | <>
All in all it works pretty well. Here are some simple examples:
my_function(1, 2 , 3)
looks like
(Programm
(rule
(expr
(Not_List)
(CompareExpr
(ConcatExpr
(term
(factor
(value
(function
my_function
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 1))) (term_tail))))
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 2))) (term_tail))))
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 3))) (term_tail))))
(arg_list))))))))
(term_tail))))
(rule)))
Now what's my problem?
If you have code that looks like this (( true OR false ) AND true) I get an infinite recursion, but the real problem is that (true OR false) AND true is understood as only (true OR false): everything after the first ( expr ) is ignored.
Here is the parse tree:
So how do I solve this? Should I change the grammar somehow or use some implementation hack?
A harder example in case you need it:
(( f1 OR f1 ) AND (( f3="ALL" OR f4="test" OR f5="ALL" OR f6="make" OR f9(1, 2) ) AND ( f7>1 OR f8>1 )) OR f8 <> "")
You have several issues that I see.
You are treating OR and AND as equal-precedence operators. You should have separate rules for OR and for AND. Otherwise you will get the wrong precedence (and therefore the wrong evaluation) for the expression A OR B AND C.
So as a first step, I'd revise your rules as follows:
<Expr> ::= ( expr )
| Not_List AndExpr Or Expr
| Not_List AndExpr
<AndExpr> ::= CompareExpr And AndExpr
| Not_List CompareExpr
Next problem is that you have ( expr ) at the top level of your list. What if I write:
A AND (B OR C)
To fix this, change these two rules:
<Expr> ::= Not_List AndExpr Or Expr
| Not_List AndExpr
<Value> ::= ConstExpr | function | ( expr )
I think your implementation of Not is not appropriate. Not is an operator, just with one operand, so its "tree" should have a Not node and a child which is the expression being Notted. What you have is a list of Nots with no operands.
Try this instead:
<Expr> ::= AndExpr Or Expr
| AndExpr
<Value> ::= ConstExpr | function | ( expr ) | Not Value
I haven't looked, but I think VB6 expressions have other messy things in them.
If you notice, the style of Expr and AndExpr I have written use right recursion to avoid left recursion. You should change your Concat, Sum, and Factor rules to follow a similar style; what you have is pretty complicated and hard to follow.
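To make the right-recursive style concrete, here is a minimal recursive-descent sketch in Python rather than VB6. It is not the full rule grammar: as an assumption for brevity, Value is cut down to True, False, ( Expr ) and Not Value, and AndExpr works directly on Value instead of CompareExpr. The names tokenize, Parser and parse are made up for this example.

```python
import re


def tokenize(text):
    # Words and parentheses only; keywords are lower-cased since VB ignores case.
    return [t.lower() for t in re.findall(r"[A-Za-z]+|[()]", text)]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):      # Expr ::= AndExpr Or Expr | AndExpr
        left = self.and_expr()
        if self.peek() == "or":
            self.eat("or")
            return ("or", left, self.expr())
        return left

    def and_expr(self):  # AndExpr ::= Value And AndExpr | Value
        left = self.value()
        if self.peek() == "and":
            self.eat("and")
            return ("and", left, self.and_expr())
        return left

    def value(self):     # Value ::= True | False | ( Expr ) | Not Value
        tok = self.peek()
        if tok in ("true", "false"):
            self.pos += 1
            return tok == "true"
        if tok == "(":
            self.eat("(")
            inner = self.expr()
            self.eat(")")
            return inner
        if tok == "not":
            self.eat("not")
            return ("not", self.value())
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return tree


print(parse("(( true OR false ) AND true)"))
print(parse("true OR false AND true"))
```

Because the parenthesized expression lives in value, the parser returns to and_expr and expr after the closing parenthesis, so the trailing AND true is picked up instead of being dropped, and AND binds tighter than OR.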
If they are just creating snippets then perhaps VB5 is "good enough" for creating them. And if VB5 is good enough, the free VB5 Control Creation Edition might be worth tracking down for them to use:
http://www.thevbzone.com/vbcce.htm
You could have them start from a "test harness" project they add snippets to, and they can even test them out.
With a little orientation this will probably prove much more practical than hand crafting a syntax analyzer, and a lot more useful since they can test for more than correct syntax.
Where VB5 is lacking you might include a static module in the "test harness" that provides a rough and ready equivalent of Split(), Replace(), etc:
http://support.microsoft.com/kb/188007
