Transform grammar into LL(1) - parsing

I have the following grammar:
START -> STM $
STM -> VAR = EXPR
STM -> EXPR
EXPR -> VAR
VAR -> id
VAR -> * EXPR
With this firstand follow sets:
First set Follow set
START id, * $
STM id, * $
EXPR id, * $, =
VAR id, * $, =
I've created the parsing table that follows:
$ = id * $
START START → STM $ START → STM $
STM STM → VAR = EXPR STM → VAR = EXPR
STM → EXPR STM → EXPR
EXPR EXPR → VAR EXPR → VAR
VAR VAR → id VAR → id
VAR → * EXPR VAR → * EXPR
From here I can see that this is not LL(1).
How can I modify this grammar so that it becomes LL(1)?

If you think about what sorts of strings can be generated by this particular grammar, it's all the strings of one of the following forms:
***....**id
***....**id = ***...**id
With this in mind, you can design an LL(1) grammar for this language by essentially building a new grammar for the language from scratch. Here's one way to do this:
Start → Statement $
Statement → StarredID OptExpr
StarredID → * StarredID | id
OptExpr → ε | = StarredID
Here, the FIRST sets are given as follows:
FIRST(Start) = {*, id}
FIRST(Statement) = {*, id}
FIRST(StarredID) = {*, id}
FIRST(OptExpr) = {ε, *, id}
FOLLOW(Statement) = {$}
FOLLOW(StarredID) = {=, $}
FOLLOW(OptExpr) = {$}
The parse table is then shown here:
* | id | = $
---------------+-------------------+-------------------+-------------+-----------
Start | Statement$ | Statement$ | |
Statement | StarredID OptExpr | StarredID OptExpr | |
StarredID | * StarredID | id | |
OptExpr | | | = StarredID | epsilon
So this grammar is LL(1).

Related

How do parentheses work with custom data types?

Currently, I am working on a problem of parsing and showing expressions in Haskell.
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
This is the code of my data type Expr and this is how i define show function:
instance Show Expr where
show (Val x) = show x
show (Var y) = y
show (p :+: q) = par (show p ++ "+" ++ show q)
show (p :-: q) = par (show p ++ "-" ++ show q)
show (p :/: q) = par (show p ++ "/" ++ show q)
show (p :*: q) = par (show p ++ "*" ++ show q)
show (p :%: q) = par (show p ++ "%" ++ show q)
par :: String -> String
par s = "(" ++ s ++ ")"
Later i tried to transform string input into the expression but i encounter the following problem: I don't understand how parentheses in the second case are implemented in Haskell.
*Main> Val 2 :*:Val 2 :+: Val 3
((2*2)+3)
*Main> Val 2 :*:(Val 2 :+: Val 3)
(2*(2+3))
Because of that, i am a bit confused regarding how should i transform parentheses from my string into the expression. Currently i am using the following function for parsing, but for now, it just ignores parentheses which is not intended behavior:
toExpr :: String -> Expr
toExpr str = f (lexer str) (Val 0)
where
f [] expr = expr
f (c:cs) expr
|isAlpha (head c) = f cs (Var c)
|isDigit (head c) = f cs (Val (read c))
|c == "+" = (expr :+: f cs (Val 0))
|c == "-" = (expr :-: f cs (Val 0))
|c == "/" = (expr :/: f cs (Val 0))
|c == "*" = (expr :*: f cs (Val 0))
|c == "%" = (expr :%: f cs (Val 0))
|otherwise = f cs expr
Edit: few grammar mistakes
I don't understand how parentheses in the second case are implemented in Haskell.
The brackets just give precedence to a certain part of the expression to parse. The problem is not with the parenthesis you render. I think the problem is that you did not assign precedence to your operators. This thus means that, unless you specify brackets, Haskell will consider all operators to have the same precedence, and parse these left-to-right. This thus means that x ⊕ y ⊗ z is parsed as (x ⊕ y) ⊗ z.
You can define the precedence of your :+:, :*, etc. operators with infixl:
infixl 7 :*:, :/:, :%:
infixl 5 :+:, :-:
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
As for your parser (the toExpr), you will need a parsing mechanism like a LALR parser [wiki] that stores results on a stack, and thus makes proper operations.
This was my final parser which gave me the result I needed. To get the result i wanted proper grammar was added and i wrote a parses according to he grammar.
Thanks, everyone for the help.
{-
parser for the following grammar:
E -> T E'
E' -> + T E' | - T E' | <empty string>
T -> F T'
T' -> * F T' | / F T' | % F T' | <empty string>
F -> (E) | <integer> | <identifier>
-}
parseExpr :: String -> (Expr,[String])
parseExpr tokens = parseE (lexer tokens)
parseE :: [String] -> (Expr,[String])
parseE tokens = parseE' acc rest where (acc,rest) = parseT tokens
parseE' :: Expr -> [String] -> (Expr,[String])
parseE' accepted ("+":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :+: acc) rest
parseE' accepted ("-":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :-: acc) rest
parseE' accepted tokens = (accepted,tokens)
parseT :: [String] -> (Expr,[String])
parseT tokens = let (acc,rest) = parseF tokens in parseT' acc rest
parseT' :: Expr -> [String] -> (Expr,[String])
parseT' accepted ("*":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :*: acc) rest
parseT' accepted ("/":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :/: acc) rest
parseT' accepted ("%":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :%: acc) rest
parseT' accepted tokens = (accepted,tokens)
parseF :: [String] -> (Expr,[String])
parseF ("(":tokens) = (e, tail rest) where (e,rest) = parseE tokens
parseF (t:tokens)
| isAlpha (head t) = (Var t,tokens)
| isDigit (head t) = (Val (read t),tokens)
| otherwise = error ""
parseF [] = error ""
lexer :: String -> [String]
lexer [] = []
lexer (c:cs)
| elem c " \t\n" = lexer cs
| elem c "=+-*/%()" = [c]:(lexer cs)
| isAlpha c = (c:takeWhile isAlpha cs):lexer(dropWhile isAlpha cs)
| isDigit c = (c:takeWhile isDigit cs):lexer(dropWhile isDigit cs)
| otherwise = error ""

How Do I left factor and eliminate left recursion?

My Production rules are as follows:
S → id = Exp
S → id (Arglist)
Arglist → Arglist , Exp
Arglist → Exp
Exp → id (Arglist)
Exp → id
This is my first attempt:
S -> id S'
S' -> ϵ | = EXP | (Arglist)
Arglist -> Arglist'
Arglist' -> ϵ | ,Exp Arglist'
Exp -> id Exp'
Exp' -> ϵ | (Arglist)
My problem is with the Arglist production rule, I am wrong.
You just need to change Arglist to being right-recursive, which will recognise the same language (with a slightly different parse tree):
Arglist → Exp , Arglist
Arglist → Exp
Then left-factor:
Arglist → Exp Arglist'
Arglist' → ε | , Exp Arglist'

Calculating first and follow set of grammar

below is the grammar that i am using for a calculator language and my attempt at finding the follow set and the first set of the grammar.
I would love help in figuring out what i am doing wrong when trying to figure out these sets because I feel like i am not doing them correctly at all (at least for the follow sets)
Grammar
program → stmt_list $$$
stmt_list → stmt stmt_list | ε
stmt → id = expr | input id | print expr
expr → term term_tail
term_tail → add op term term_tail | ε
term → factor fact_tail
fact_tail → mult_op fact fact_tail | ε
factor → ( expr ) | number | id
add_op → + | -
mult_op → * | / | // | %
First set
first(p) = {id, input, print}
first(stmt_list) = {id, input, print, e}
first(s) = {id, input, print}
first(expr) = {(, id, number}
first(term_tail) = {+, -, e}
first(term) = {(, id, number}
first(fact_tail) = {, /, //, %, e}
first(factor) = {(, id, number}
first(add_op) = {+, -}
first(mult_op) = {, /, //, %}
Follow Set
follow(p) = {$}
follow(stmt_list) = {$}
follow(stmt) = {id, input, print}
follow(expr) = {(, id, number, ), input, print, , /, //, %}
follow(term_tail) = {), (, id, number, print, input}
follow(term) = {+, -}
follow(factor) = {, /, //, %}
follow(add_op) = {}
follow(mult_op) = {}
follow(fact_tail) = {*, /, //, %, +, -}
You have certain mistakes in First as well
first(p) = {id, input, print,e}
it will include epsilon
* is missing in the next two -
first(fact_tail) = { *,/, //, %, e} first(mult_op) = {*, /, //, %}
fact_tail → mult_op fact fact_tail | ε
Iam assuming here you actually mean
fact_tail → mult_op factor fact_tail | ε
Follow
follow(stmt) = {id, input, print,$}
if you refer to
stmt_list → stmt stmt_list | ε
then stmt is followed by first of stmt_list which includes e so string generated will end, hence stmt is followed by $
follow(expr) = {(, id, number, ), input, print, , /, //, %}
I don't know how you got this, follow of expr is equal to follow of stmt and )
follow(expr) = {id, ), input, print,$}
follow(term_tail) is equal to follow(expr)
follow(term) = {+,-,),id,input,print,$}
follow(fact_tail) is equal to follow(term)
follow(factor) = first(fact_tail)
follow(add_op) = first(term)
follow(mult_op) = first(factor)

Assignment as expression in Antlr grammar

I'm trying to extend the grammar of the Tiny Language to treat assignment as expression. Thus it would be valid to write
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
a = 1 = 2; // invalid
Assignment differs from other operators in two aspects. It's right associative (not a big deal), and its left-hand side is has to be a variable. So I changed the grammar like this
statement: assignmentExpr | functionCall ...;
assignmentExpr: Identifier indexes? '=' expression;
expression: assignmentExpr | condExpr;
It doesn't work, because it contains a non-LL(*) decision. I also tried this variant:
assignmentExpr: Identifier indexes? '=' (expression | condExpr);
but I got the same error. I am interested in
This specific question
Given a grammar with a non-LL(*) decision, how to find the two paths that cause the problem
How to fix it
I think you can change your grammar like this to achieve the same, without using syntactic predicates:
statement: Expr ';' | functionCall ';'...;
Expr: Identifier indexes? '=' Expr | condExpr ;
condExpr: .... and so on;
I altered Bart's example with this idea in mind:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
}
parse
: stat+ EOF -> ^(ROOT stat+)
;
stat
: expr ';'
;
expr
: Id Assign expr -> ^(Assign Id expr)
| add
;
add
: mult (('+' | '-')^ mult)*
;
mult
: atom (('*' | '/')^ atom)*
;
atom
: Id
| Num
| '('! expr ')' !
;
Assign : '=' ;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
And for the input:
a=b=4;
a = 2 * (b = 1);
you get following parse tree:
The key here is that you need to "assure" the parser that inside an expression, there is something ahead that satisfies the expression. This can be done using a syntactic predicate (the ( ... )=> parts in the add and mult rules).
A quick demo:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
ASSIGN;
}
parse
: stat* EOF -> ^(ROOT stat+)
;
stat
: expr ';' -> expr
;
expr
: add
;
add
: mult ((('+' | '-') mult)=> ('+' | '-')^ mult)*
;
mult
: atom ((('*' | '/') atom)=> ('*' | '/')^ atom)*
;
atom
: (Id -> Id) ('=' expr -> ^(ASSIGN Id expr))?
| Num
| '(' expr ')' -> expr
;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
which will parse the input:
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
into the following AST:

How to write a recursive descent parser from scratch?

As a purely academic exercise, I'm writing a recursive descent parser from scratch -- without using ANTLR or lex/yacc.
I'm writing a simple function which converts math expressions into their equivalent AST. I have the following:
// grammar
type expr =
| Lit of float
| Add of expr * expr
| Mul of expr * expr
| Div of expr * expr
| Sub of expr * expr
// tokens
type tokens =
| Num of float
| LParen | RParen
| XPlus | XStar | XMinus | XSlash
let tokenize (input : string) =
Regex.Matches(input.Replace(" ", ""), "\d+|[+/*\-()]")
|> Seq.cast<Match>
|> Seq.map (fun x -> x.Value)
|> Seq.map (function
| "+" -> XPlus
| "-" -> XMinus
| "/" -> XSlash
| "*" -> XStar
| "(" -> LParen
| ")" -> RParen
| num -> Num(float num))
|> Seq.to_list
So, tokenize "10 * (4 + 5) - 1" returns the following token stream:
[Num 10.0; XStar; LParen; Num 4.0; XPlus; Num 5.0; RParen; XMinus; Num 1.0]
At this point, I'd like to map the token stream to its AST with respect to operator precedence:
Sub(
Mul(
Lit 10.0
,Add(Lit 4.0, Lit 5.0)
)
,Lit 1.0
)
However, I'm drawing a blank. I've never written a parser from scratch, and I don't know even in principle how to begin.
How do I convert a token stream its representative AST?
Do you know about language grammars?
Assuming yes, you have a grammar with rules along the lines
...
addTerm := mulTerm addOp addTerm
| mulTerm
addOp := XPlus | XMinus
mulTerm := litOrParen mulOp mulTerm
| litOrParen
...
which ends up turning into code like (writing code in browser, never compiled)
let rec AddTerm() =
let mulTerm = MulTerm() // will parse next mul term (error if fails to parse)
match TryAddOp with // peek ahead in token stream to try parse
| None -> mulTerm // next token was not prefix for addOp rule, stop here
| Some(ao) -> // did parse an addOp
let rhsMulTerm = MulTerm()
match ao with
| XPlus -> Add(mulTerm, rhsMulTerm)
| XMinus -> Sub(mulTerm, rhsMulTerm)
and TryAddOp() =
let next = tokens.Peek()
match next with
| XPlus | XMinus ->
tokens.ConsumeNext()
Some(next)
| _ -> None
...
Hopefully you see the basic idea. This assumes a global mutable token stream that allows both 'peek at next token' and 'consume next token'.
If I remember from college classes the idea was to build expression trees like:
<program> --> <expression> <op> <expression> | <expression>
<expression> --> (<expression>) | <constant>
<op> --> * | - | + | /
<constant> --> <constant><constant> | [0-9]
then once you have construction your tree completely so you get something like:
exp
exp op exp
5 + and so on
then you run your completed tree through another program that recursively descents into the tree calculating expressions until you have an answer. If your parser doesn't understand the tree, you have a syntax error. Hope that helps.

Resources