Unambiguous grammar for expressions with let and addition - parsing

What is an unambiguous grammar equivalent to to the following ambiguous grammar for a language of expressions with let and addition?
E ⇒ let id = E in E
E ⇒ E + E
E ⇒ num
The ambiguity should be solved so that:
addition is left associative
addition has higher precedence than let expressions when it appears on the right
addition has lower precedence than let expressions when it appears on the left
Using braces to show the grouping of sub-expressions, the following illustrates how expressions should be interpreted:
num + num + num => { num + num } + num
let id = num in num + num => let id = num in { num + num }
num + let id = num in num => num + { let id = num in num }

Consider the expression
E1 + E2
E1 cannot have the form let ID = E3 because let ID = E3 + E2 must be parsed as let ID = (E3 + E2). This restriction is recursive: it also cannot have the form E4 + let ID = E3.
E2 can have the form let ID = E3 but it cannot have the form E3 + E4 (because E1 + E3 + E4 must be parsed as (E1 + E3) + E4). Only E1 can have the form E3 + E4.
It's straight-forward (but repetitive) to translate these restrictions to BNF:
Expr ⇒ Sum
Sum ⇒ SumNoLet '+' Atom
| Atom
SumNoLet ⇒ SumNoLet '+' AtomNoLet
| AtomNoLet
AtomNoLet ⇒ num
| id
| '(' Expr ')'
Atom ⇒ AtomNoLet
| 'let' id '=' Expr
To make the pattern clearer, we can add the * operator:
Expr ⇒ Sum
Sum ⇒ SumNoLet '+' Prod
| Prod
SumNoLet ⇒ SumNoLet '+' ProdNoLet
| ProdNoLet
Prod ⇒ ProdNoLet '*' Atom
| Atom
ProdNoLet ⇒ ProdNoLet '*' AtomNoLet
| AtomNoLet
AtomNoLet ⇒ num
| id
| '(' Expr ')'
Atom ⇒ AtomNoLet
| 'let' id '=' Expr
It is possible to implement this in bison (or other similar parser generators) using precedence declarations. But the precedence solution is harder to reason about, and can be confusing to incorporate into more complicated grammars.

Related

Expression Evaluation using combinators in Haskell

I'm trying to make an expression evaluator in Hakell:
data Parser i o
= Success o [i]
| Failure String [i]
| Parser
{parse :: [i] -> Parser i o}
data Operator = Add | Sub | Mul | Div | Pow
data Expr
= Op Operator Expr Expr
| Val Double
expr :: Parser Char Expr
expr = add_sub
where
add_sub = calc Add '+' mul_div <|> calc Sub '-' mul_div <|> mul_div
mul_div = calc Mul '*' pow <|> calc Div '/' pow <|> pow
pow = calc Pow '^' factor <|> factor
factor = parens <|> val
val = Val <$> parseDouble
parens = parseChar '(' *> expr <* parseChar ')'
calc c o p = Op c <$> (p <* parseChar o) <*> p
My problem is that when I try to evaluate an expression with two operators with same priority (e.g. 1+1-1) the parser will fail.
How can I say that an add_sub can be an operation between two other add_subs without creating an infinite loop?
As explained by #chi the problem is that calc was using p twice which doesn't allow for patterns like muldiv + .... | muldiv - ... | ...
I just changed the definition of calc to :
calc c o p p2 = Op c <$> (p <* parseChar o) <*> p2
where p2 is the current priority (mul_div in the definition of mul_div)
it works much better but the order of calulations is backwards:
2/3/4 is parsed as 2/(3/4) instead of (2/3)/4

Is there a way to fix an expression with operators in it after parsing, using a table of associativities and precedences?

I'm currently working on a parser for a simple programming language written in Haskell. I ran into a problem when I tried to allow for binary operators with differing associativities and precedences. Normally this wouldn't be an issue, but since my language allows users to define their own operators, the precedence of operators isn't known by the compiler until the program has already been parsed.
Here are some of the data types I've defined so far:
data Expr
= Var String
| Op String Expr Expr
| ..
data Assoc
= LeftAssoc
| RightAssoc
| NonAssoc
type OpTable =
Map.Map String (Assoc, Int)
At the moment, the compiler parses all operators as if they were right-associative with equal precedence. So if I give it an expression like a + b * c < d the result will be Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d"))).
I'm trying to write a function called fixExpr which takes an OpTable and an Expr and rearranges the Expr based on the associativities and precedences listed in the OpTable. For example:
operators :: OpTable
operators =
Map.fromList
[ ("<", (NonAssoc, 4))
, ("+", (LeftAssoc, 6))
, ("*", (LeftAssoc, 7))
]
expr :: Expr
expr = Op "+" (Var "a") (Op "*" (Var "b") (Op "<" (Var "c") (Var "d")))
fixExpr operators expr should evaluate to Op "<" (Op "+" (Var "a") (Op "*" (Var "b") (Var "c"))) (Var "d").
How do I define the fixExpr function? I've tried multiple solutions and none of them have worked.
An expression e may be an atomic term n (e.g. a variable or literal), a parenthesised expression, or an application of an infix operator ○.
e ⩴ n | (e​) | e1 ○ e2
We need the parentheses to know whether the user entered a * b + c, which we happen to associate as a * (b + c) and need to reassociate as (a * b) + c, or if they entered a * (b + c) literally, which should not be reassociated. Therefore I’ll make a small change to the data type:
data Expr
= Var String
| Group Expr
| Op String Expr Expr
| …
Then the method is simple:
The rebracketing of an expression ⟦e⟧ applies recursively to all its subexpressions.
⟦n⟧ = n
⟦(e)⟧ = (⟦e⟧)
⟦e1 ○ e2⟧ = ⦅⟦e1⟧ ○ ⟦e2⟧⦆
A single reassociation step ⦅e⦆ removes redundant parentheses on the right, and reassociates nested operator applications leftward in two cases: if the left operator has higher precedence, or if the two operators have equal precedence, and are both left-associative. It leaves nested infix applications alone, that is, associating rightward, in the opposite cases: if the right operator has higher precedence, or the operators have equal precedence and right associativity. If the associativities are mismatched, then the result is undefined.
⦅e ○ n⦆ = e ○ n
⦅e1 ○ (e2)⦆ = ⦅e1 ○ e2⦆
⦅e1 ○ (e2 ● e3)⦆ =
⦅e1 ○ e2⦆ ● e3, if:
a. P(○) > P(●); or
b. P(○) = P(●) and A(○) = A(●) = L
e1 ○ (e2 ● e3), if:
a. P(○) < P(●); or
b. P(○) = P(●) and A(○) = A(●) = R
undefined otherwise
NB.: P(o) and A(o) are respectively the precedence and associativity (L or R) of operator o.
This can be translated fairly literally to Haskell:
fixExpr operators = reassoc
where
-- 1.1
reassoc e#Var{} = e
-- 1.2
reassoc (Group e) = Group (reassoc e)
-- 1.3
reassoc (Op o e1 e2) = reassoc' o (reassoc e1) (reassoc e2)
-- 2.1
reassoc' o e1 e2#Var{} = Op o e1 e2
-- 2.2
reassoc' o e1 (Group e2) = reassoc' o e1 e2
-- 2.3
reassoc' o1 e1 r#(Op o2 e2 e3) = case compare prec1 prec2 of
-- 2.3.1a
GT -> assocLeft
-- 2.3.2a
LT -> assocRight
EQ -> case (assoc1, assoc2) of
-- 2.3.1b
(LeftAssoc, LeftAssoc) -> assocLeft
-- 2.3.2b
(RightAssoc, RightAssoc) -> assocRight
-- 2.3.3
_ -> error $ concat
[ "cannot mix ‘", o1
, "’ ("
, show assoc1
, " "
, show prec1
, ") and ‘"
, o2
, "’ ("
, show assoc2
, " "
, show prec2
, ") in the same infix expression"
]
where
(assoc1, prec1) = opInfo o1
(assoc2, prec2) = opInfo o2
assocLeft = Op o2 (Group (reassoc' o1 e1 e2)) e3
assocRight = Op o1 e1 r
opInfo op = fromMaybe (notFound op) (Map.lookup op operators)
notFound op = error $ concat
[ "no precedence/associativity defined for ‘"
, op
, "’"
]
Note the recursive call in assocLeft: by reassociating the operator applications, we may have revealed another association step, as in a chain of left-associative operator applications like a + b + c + d = (((a + b) + c) + d).
I insert Group constructors in the output for illustration, but they can be removed at this point, since they’re only necessary in the input.
This hasn’t been tested very thoroughly at all, but I think the idea is sound, and should accommodate modifications for more complex situations, even if the code leaves something to be desired.
An alternative that I’ve used is to parse expressions as “flat” sequences of operators applied to terms, and then run a separate parsing pass after name resolution, using e.g. Parsec’s operator precedence parser facility, which would handle these details automatically.

Calculating first and follow set of grammar

below is the grammar that i am using for a calculator language and my attempt at finding the follow set and the first set of the grammar.
I would love help in figuring out what i am doing wrong when trying to figure out these sets because I feel like i am not doing them correctly at all (at least for the follow sets)
Grammar
program → stmt_list $$$
stmt_list → stmt stmt_list | ε
stmt → id = expr | input id | print expr
expr → term term_tail
term_tail → add op term term_tail | ε
term → factor fact_tail
fact_tail → mult_op fact fact_tail | ε
factor → ( expr ) | number | id
add_op → + | -
mult_op → * | / | // | %
First set
first(p) = {id, input, print}
first(stmt_list) = {id, input, print, e}
first(s) = {id, input, print}
first(expr) = {(, id, number}
first(term_tail) = {+, -, e}
first(term) = {(, id, number}
first(fact_tail) = {, /, //, %, e}
first(factor) = {(, id, number}
first(add_op) = {+, -}
first(mult_op) = {, /, //, %}
Follow Set
follow(p) = {$}
follow(stmt_list) = {$}
follow(stmt) = {id, input, print}
follow(expr) = {(, id, number, ), input, print, , /, //, %}
follow(term_tail) = {), (, id, number, print, input}
follow(term) = {+, -}
follow(factor) = {, /, //, %}
follow(add_op) = {}
follow(mult_op) = {}
follow(fact_tail) = {*, /, //, %, +, -}
You have certain mistakes in First as well
first(p) = {id, input, print,e}
it will include epsilon
* is missing in the next two -
first(fact_tail) = { *,/, //, %, e} first(mult_op) = {*, /, //, %}
fact_tail → mult_op fact fact_tail | ε
Iam assuming here you actually mean
fact_tail → mult_op factor fact_tail | ε
Follow
follow(stmt) = {id, input, print,$}
if you refer to
stmt_list → stmt stmt_list | ε
then stmt is followed by first of stmt_list which includes e so string generated will end, hence stmt is followed by $
follow(expr) = {(, id, number, ), input, print, , /, //, %}
I don't know how you got this, follow of expr is equal to follow of stmt and )
follow(expr) = {id, ), input, print,$}
follow(term_tail) is equal to follow(expr)
follow(term) = {+,-,),id,input,print,$}
follow(fact_tail) is equal to follow(term)
follow(factor) = first(fact_tail)
follow(add_op) = first(term)
follow(mult_op) = first(factor)

Changing associativity schema in a grammar

I'm trying to use SableCC to generate a Parser for models, which I call LAM. LAM in itself are simple, and a simple grammar (where I omit a lot of things) for these is:
L := 0 | (x,y) | F(x1,...,xn) | L || L | L ; L
I wrote this grammar:
Helpers
number = ['0' .. '9'] ;
letter = ['a' .. 'z'] ;
uletter = ['A' .. 'Z'] ;
Tokens
zero = '0' ;
comma = ',' ;
parallel = '||' ;
point = ';' ;
lpar = '(' ;
rpar = ')' ;
identifier = letter+ number* ;
uidentifier = uletter+ number* ;
Productions
expr = {term} term |
{parallel} expr parallel term |
{point} expr point term;
term = {parenthesis} lpar expr rpar |
{zero} zero |
{invk} uidentifier lpar paramlist rpar |
{pair} lpar [left]:identifier comma [right]:identifier rpar ;
paramlist = {list} list |
{empty} ;
list = {var} identifier |
{com} identifier comma list ;
This basically works, but there is a side effect: it is left associative. For example, if I have
L = L1 || L2 ; L3 || L4
Then it is parsed like:
L = ((L1 || L2) ; L3) || L4
I want to give all precedence to the ";" operator, and so have L parsed like
L = (L1 || L2) ; (L3 || L4)
(other things, like "||", could remains left-associative)
My questions are:
There are tips to do such conversions in a "automated" way?
How could be a grammar with all the precedence on the ";" ?
It is accepted also "RTFM link" :-D
Thank you all
You need to create a hierarchy of rules that matches the desired operator precedence.
expr = {subexp} subexp |
{parallel} subexp parallel expr ;
subexp = {term} term |
{point} term point subexp;
Note that I also changed the associativity.

How to write a recursive descent parser from scratch?

As a purely academic exercise, I'm writing a recursive descent parser from scratch -- without using ANTLR or lex/yacc.
I'm writing a simple function which converts math expressions into their equivalent AST. I have the following:
// grammar
type expr =
| Lit of float
| Add of expr * expr
| Mul of expr * expr
| Div of expr * expr
| Sub of expr * expr
// tokens
type tokens =
| Num of float
| LParen | RParen
| XPlus | XStar | XMinus | XSlash
let tokenize (input : string) =
Regex.Matches(input.Replace(" ", ""), "\d+|[+/*\-()]")
|> Seq.cast<Match>
|> Seq.map (fun x -> x.Value)
|> Seq.map (function
| "+" -> XPlus
| "-" -> XMinus
| "/" -> XSlash
| "*" -> XStar
| "(" -> LParen
| ")" -> RParen
| num -> Num(float num))
|> Seq.to_list
So, tokenize "10 * (4 + 5) - 1" returns the following token stream:
[Num 10.0; XStar; LParen; Num 4.0; XPlus; Num 5.0; RParen; XMinus; Num 1.0]
At this point, I'd like to map the token stream to its AST with respect to operator precedence:
Sub(
Mul(
Lit 10.0
,Add(Lit 4.0, Lit 5.0)
)
,Lit 1.0
)
However, I'm drawing a blank. I've never written a parser from scratch, and I don't know even in principle how to begin.
How do I convert a token stream its representative AST?
Do you know about language grammars?
Assuming yes, you have a grammar with rules along the lines
...
addTerm := mulTerm addOp addTerm
| mulTerm
addOp := XPlus | XMinus
mulTerm := litOrParen mulOp mulTerm
| litOrParen
...
which ends up turning into code like (writing code in browser, never compiled)
let rec AddTerm() =
let mulTerm = MulTerm() // will parse next mul term (error if fails to parse)
match TryAddOp with // peek ahead in token stream to try parse
| None -> mulTerm // next token was not prefix for addOp rule, stop here
| Some(ao) -> // did parse an addOp
let rhsMulTerm = MulTerm()
match ao with
| XPlus -> Add(mulTerm, rhsMulTerm)
| XMinus -> Sub(mulTerm, rhsMulTerm)
and TryAddOp() =
let next = tokens.Peek()
match next with
| XPlus | XMinus ->
tokens.ConsumeNext()
Some(next)
| _ -> None
...
Hopefully you see the basic idea. This assumes a global mutable token stream that allows both 'peek at next token' and 'consume next token'.
If I remember from college classes the idea was to build expression trees like:
<program> --> <expression> <op> <expression> | <expression>
<expression> --> (<expression>) | <constant>
<op> --> * | - | + | /
<constant> --> <constant><constant> | [0-9]
then once you have construction your tree completely so you get something like:
exp
exp op exp
5 + and so on
then you run your completed tree through another program that recursively descents into the tree calculating expressions until you have an answer. If your parser doesn't understand the tree, you have a syntax error. Hope that helps.

Resources