Calculating first and follow set of grammar - parsing

Below is the grammar that I am using for a calculator language, along with my attempt at finding its first and follow sets.
I would love help figuring out what I am doing wrong, because I feel like I am not computing these sets correctly at all (at least the follow sets).
Grammar
program → stmt_list $$$
stmt_list → stmt stmt_list | ε
stmt → id = expr | input id | print expr
expr → term term_tail
term_tail → add_op term term_tail | ε
term → factor fact_tail
fact_tail → mult_op fact fact_tail | ε
factor → ( expr ) | number | id
add_op → + | -
mult_op → * | / | // | %
First set
first(p) = {id, input, print}
first(stmt_list) = {id, input, print, e}
first(s) = {id, input, print}
first(expr) = {(, id, number}
first(term_tail) = {+, -, e}
first(term) = {(, id, number}
first(fact_tail) = {, /, //, %, e}
first(factor) = {(, id, number}
first(add_op) = {+, -}
first(mult_op) = {, /, //, %}
Follow Set
follow(p) = {$}
follow(stmt_list) = {$}
follow(stmt) = {id, input, print}
follow(expr) = {(, id, number, ), input, print, , /, //, %}
follow(term_tail) = {), (, id, number, print, input}
follow(term) = {+, -}
follow(factor) = {, /, //, %}
follow(add_op) = {}
follow(mult_op) = {}
follow(fact_tail) = {*, /, //, %, +, -}

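You have some mistakes in the first sets as well.
first(p) = {id, input, print, $}
stmt_list can derive e, so a program can also begin directly with the end marker; first(p) would contain e itself only if the end marker were left out of the grammar.
* is missing in the next two:
first(fact_tail) = {*, /, //, %, e}
first(mult_op) = {*, /, //, %}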
fact_tail → mult_op fact fact_tail | ε
I am assuming that here you actually mean
fact_tail → mult_op factor fact_tail | ε
Follow
follow(stmt) = {id, input, print,$}
if you refer to
stmt_list → stmt stmt_list | ε
then stmt is followed by first(stmt_list). Since that includes e, the stmt_list after stmt can be empty, so stmt can also be followed by whatever follows stmt_list, which is $.
follow(expr) = {(, id, number, ), input, print, , /, //, %}
I don't know how you got this. expr only appears at the end of a stmt or inside parentheses, so follow(expr) is follow(stmt) plus ):
follow(expr) = {id, ), input, print,$}
follow(term_tail) is equal to follow(expr)
follow(term) = {+,-,),id,input,print,$}
follow(fact_tail) is equal to follow(term)
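follow(factor) = first(fact_tail) without e, plus follow(fact_tail) because fact_tail can be empty = {*, /, //, %, +, -, ), id, input, print, $}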
follow(add_op) = first(term)
follow(mult_op) = first(factor)
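If you want to check sets like these mechanically, here is a minimal Python sketch of the usual fixed-point computation. It is not part of the original answer. It assumes fact_tail uses factor (as guessed above), writes the end marker as "$$", and treats it as an ordinary terminal inside the program rule, so first(program) contains the end marker rather than e and follow(program) comes out empty. The names GRAMMAR, first_of_seq, compute_first and compute_follow are made up for this example.

```python
EPS = "ε"    # empty string (written e in the sets above)
END = "$$"   # end marker, treated here as an ordinary terminal (assumption)

# Each nonterminal maps to a list of right-hand sides; [] is the ε production.
GRAMMAR = {
    "program":   [["stmt_list", END]],
    "stmt_list": [["stmt", "stmt_list"], []],
    "stmt":      [["id", "=", "expr"], ["input", "id"], ["print", "expr"]],
    "expr":      [["term", "term_tail"]],
    "term_tail": [["add_op", "term", "term_tail"], []],
    "term":      [["factor", "fact_tail"]],
    "fact_tail": [["mult_op", "factor", "fact_tail"], []],
    "factor":    [["(", "expr", ")"], ["number"], ["id"]],
    "add_op":    [["+"], ["-"]],
    "mult_op":   [["*"], ["/"], ["//"], ["%"]],
}


def first_of_seq(symbols, first):
    """FIRST of a symbol sequence; contains EPS only if every symbol is nullable."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have FOLLOW sets
                    tail = first_of_seq(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)

for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

The step to look at is the `if EPS in tail` branch: whenever everything to the right of a symbol can be empty, the follow set of the left-hand side is added too. That is the step that puts $ into follow(stmt) and the follow of term into follow(factor).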

Related

Using OCaml Menhir, is there a way to access something before it is processed?

I am writing a parser to parse and compute function derivatives in a calculator.
I have a problem implementing the product and quotient rules:
for a product the derivation formula is (u*v)' = u'v+uv', so I need both u and u' in the final output. With the parser I currently have, by the time I need to write u it has already been replaced by u', and I don't know how to save its value, or whether that is even possible.
Here's the parser:
%token <string> VAR FUNCTION CONST
%token LEFT_B RIGHT_B PLUS MINUS TIMES DIV
%token DERIV
%token EOL
%start<string> main
%%
main:
  t = toDeriv; EOL {t}
;
toDeriv:
  DERIV; LEFT_B; e = expr; RIGHT_B {e}
;
expr:
  u = expr; PLUS v = hat_funct {u^"+"^v}
| u = expr; MINUS; v = hat_funct {u^"-"^v}
| u = hat_funct {u}
;
hat_funct:
  u = hat_funct TIMES v = funct {Printf.sprintf "%s * %s + %s * %s" u (A way to save v) v (A way to save u)}
| u = hat_funct DIV v = funct {Printf.sprintf "(%s * %s - %s * %s)/%s^2" u (A way to save v) (A way to save u) v (A way to save v)}
| u = funct {u}
;
funct:
  f = func; LEFT_B; c = content; RIGHT_B {Derivatives.deriv_func f c}
;
content:
  e = expr {e}
| x = VAR {x}
| k = CONST {k}
func:
  f = FUNCTION {f}
| k = CONST {k}
;
P.S : I know it might not be the greatest grammar definition at all, it's still a work in progress
To answer your question directly: yes, you can keep track of what has already been processed, but that is not how this is usually done. The idiomatic solution is to write a parser that turns the input into an abstract syntax tree, and then a separate solver that takes this tree as input and computes the result. The parser shouldn't do any of the computation itself; it is a simple automaton and should not have side effects.
To make this less abstract: what you want from the parser is a function string -> expr, where the expr type is defined something like
type expr =
  | Var of string
  | Const of string
  | Binop of binop * expr * expr
and binop = Add | Mul | Sub | Div
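To show why the tree makes the product rule easy, here is a rough Python sketch of the "solver" half (Python only to keep it short; the same shape carries over to OCaml pattern matching). It is an illustration, not the asker's code. The class names mirror the type above, while deriv and show are hypothetical helpers, and it assumes a single variable, so the derivative of any Var is 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: str


@dataclass(frozen=True)
class Binop:
    op: str        # one of "+", "-", "*", "/"
    left: object
    right: object


def deriv(e):
    """Differentiate a tree; assumes every Var is the differentiation variable."""
    if isinstance(e, Var):
        return Const("1")
    if isinstance(e, Const):
        return Const("0")
    u, v = e.left, e.right            # the untouched subtrees are still here
    du, dv = deriv(u), deriv(v)
    if e.op in ("+", "-"):
        return Binop(e.op, du, dv)
    if e.op == "*":                   # (u*v)' = u'v + uv'
        return Binop("+", Binop("*", du, v), Binop("*", u, dv))
    if e.op == "/":                   # (u/v)' = (u'v - uv') / (v*v)
        top = Binop("-", Binop("*", du, v), Binop("*", u, dv))
        return Binop("/", top, Binop("*", v, v))
    raise ValueError(f"unknown operator {e.op!r}")


def show(e):
    """Render a tree back to a fully parenthesized string."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return e.value
    return f"({show(e.left)} {e.op} {show(e.right)})"


product = Binop("*", Var("x"), Var("x"))
print(show(deriv(product)))
```

Because deriv receives the whole tree, u and v are still available after u' and v' have been computed, which is exactly what the string-building semantic actions could not give you.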

Unambiguous grammar for expressions with let and addition

What is an unambiguous grammar equivalent to the following ambiguous grammar for a language of expressions with let and addition?
E ⇒ let id = E in E
E ⇒ E + E
E ⇒ num
The ambiguity should be solved so that:
addition is left associative
addition has higher precedence than let expressions when it appears on the right
addition has lower precedence than let expressions when it appears on the left
Using braces to show the grouping of sub-expressions, the following illustrates how expressions should be interpreted:
num + num + num => { num + num } + num
let id = num in num + num => let id = num in { num + num }
num + let id = num in num => num + { let id = num in num }
Consider the expression
E1 + E2
E1 cannot have the form let ID = E3 because let ID = E3 + E2 must be parsed as let ID = (E3 + E2). This restriction is recursive: it also cannot have the form E4 + let ID = E3.
E2 can have the form let ID = E3 but it cannot have the form E3 + E4 (because E1 + E3 + E4 must be parsed as (E1 + E3) + E4). Only E1 can have the form E3 + E4.
It's straightforward (but repetitive) to translate these restrictions to BNF:
Expr      ⇒ Sum
Sum       ⇒ SumNoLet '+' Atom
          | Atom
SumNoLet  ⇒ SumNoLet '+' AtomNoLet
          | AtomNoLet
AtomNoLet ⇒ num
          | id
          | '(' Expr ')'
Atom      ⇒ AtomNoLet
          | 'let' id '=' Expr
To make the pattern clearer, we can add the * operator:
Expr      ⇒ Sum
Sum       ⇒ SumNoLet '+' Prod
          | Prod
SumNoLet  ⇒ SumNoLet '+' ProdNoLet
          | ProdNoLet
Prod      ⇒ ProdNoLet '*' Atom
          | Atom
ProdNoLet ⇒ ProdNoLet '*' AtomNoLet
          | AtomNoLet
AtomNoLet ⇒ num
          | id
          | '(' Expr ')'
Atom      ⇒ AtomNoLet
          | 'let' id '=' Expr
It is possible to implement this in bison (or other similar parser generators) using precedence declarations. But the precedence solution is harder to reason about, and can be confusing to incorporate into more complicated grammars.
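For comparison, the same grouping falls out of a small hand-written recursive-descent parser. The Python sketch below is an illustration and not a translation of the BNF above. It handles only + (no *), uses the question's full let id = E in E form (the BNF above writes the let production without the in part), and the helper names tokenize and parse are made up.

```python
def tokenize(text):
    # Pad punctuation so a plain split() yields one token per symbol.
    for sym in "()+=":
        text = text.replace(sym, f" {sym} ")
    return text.split()


def parse(text):
    """Return a nested tuple: ("+", l, r), ("let", name, bound, body), or a token."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def atom():
        nonlocal pos
        tok = peek()
        if tok == "let":
            pos += 1
            name = peek()
            pos += 1
            expect("=")
            bound = expr()
            expect("in")
            body = expr()   # the body swallows everything to its right
            return ("let", name, bound, body)
        if tok == "(":
            pos += 1
            inner = expr()
            expect(")")
            return inner
        if tok is None or tok in {")", "+", "=", "in"}:
            raise SyntaxError(f"unexpected {tok!r}")
        pos += 1
        return tok          # num or id

    def expr():
        nonlocal pos
        left = atom()
        while peek() == "+":   # the loop builds a left-associative chain
            pos += 1
            left = ("+", left, atom())
        return left

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse("num + let id = num in num + num"))
```

A let can only start where an operand is expected, and its body keeps consuming + operands, so a let never ends up as the left operand of + unless it is parenthesized. That is the same restriction the NoLet nonterminals encode.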

Transform grammar into LL(1)

I have the following grammar:
START -> STM $
STM -> VAR = EXPR
STM -> EXPR
EXPR -> VAR
VAR -> id
VAR -> * EXPR
With these first and follow sets:
        First set   Follow set
START   id, *       $
STM     id, *       $
EXPR    id, *       $, =
VAR     id, *       $, =
I've created the parsing table that follows:
        =   id                 *                  $
START       START → STM $      START → STM $
STM         STM → VAR = EXPR   STM → VAR = EXPR
            STM → EXPR         STM → EXPR
EXPR        EXPR → VAR         EXPR → VAR
VAR         VAR → id           VAR → id
            VAR → * EXPR       VAR → * EXPR
From here I can see that this is not LL(1).
How can I modify this grammar so that it becomes LL(1)?
If you think about what sorts of strings can be generated by this particular grammar, it's all the strings of one of the following forms:
***....**id
***....**id = ***...**id
With this in mind, you can design an LL(1) grammar for this language by essentially building a new grammar for the language from scratch. Here's one way to do this:
Start → Statement $
Statement → StarredID OptExpr
StarredID → * StarredID | id
OptExpr → ε | = StarredID
Here, the FIRST and FOLLOW sets are as follows:
FIRST(Start) = {*, id}
FIRST(Statement) = {*, id}
FIRST(StarredID) = {*, id}
FIRST(OptExpr) = {ε, *, id}
FOLLOW(Statement) = {$}
FOLLOW(StarredID) = {=, $}
FOLLOW(OptExpr) = {$}
The parse table is then shown here:
          | *                 | id                | =           | $
----------+-------------------+-------------------+-------------+---------
Start     | Statement $       | Statement $       |             |
Statement | StarredID OptExpr | StarredID OptExpr |             |
StarredID | * StarredID       | id                |             |
OptExpr   |                   |                   | = StarredID | ε
Every cell holds at most one production, so this grammar is LL(1).
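To see the table in action, here is a minimal Python sketch of a table-driven LL(1) recognizer for this grammar. The dictionary encoding and the names TABLE, NONTERMINALS and accepts are assumptions made for this illustration. The input is a list of tokens without the end marker, which the function appends itself.

```python
# (nonterminal, lookahead) -> right-hand side, copied from the parse table above.
TABLE = {
    ("Start", "*"): ["Statement", "$"],
    ("Start", "id"): ["Statement", "$"],
    ("Statement", "*"): ["StarredID", "OptExpr"],
    ("Statement", "id"): ["StarredID", "OptExpr"],
    ("StarredID", "*"): ["*", "StarredID"],
    ("StarredID", "id"): ["id"],
    ("OptExpr", "="): ["=", "StarredID"],
    ("OptExpr", "$"): [],            # the ε production
}
NONTERMINALS = {"Start", "Statement", "StarredID", "OptExpr"}


def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    tokens = list(tokens) + ["$"]
    stack = ["Start"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:          # empty table cell: syntax error
                return False
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == look:            # terminal on the stack matches the input
            pos += 1
        else:
            return False
    return pos == len(tokens)


print(accepts(["*", "*", "id", "=", "*", "id"]))
```

The parser never backtracks: with one token of lookahead, each (nonterminal, token) pair selects at most one production, which is what the single-entry cells guarantee.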

Create discriminated union data from file/database

I have a discriminated union for expressions like this one (EQ =; GT >; etc)
(AND (OR (EQ X 0)
         (GT X 10))
     (OR (EQ Y 0)
         (GT Y 10)))
I want to create instances of the DU from such expressions saved in a file or database.
How do I do it? If it is not feasible, what is the best way to approach it in F#?
Daniel: these expressions are saved in prefix format (as above) as text and will be parsed in F#. Thanks.
If you just want to know how to model these expressions using DUs, here's one way:
type BinaryOp =
    | EQ
    | GT
type Expr =
    | And of Expr * Expr
    | Or of Expr * Expr
    | Binary of BinaryOp * Expr * Expr
    | Var of string
    | Value of obj
let expr =
    And(
        Or(
            Binary(EQ, Var("X"), Value(0)),
            Binary(GT, Var("X"), Value(10))),
        Or(
            Binary(EQ, Var("Y"), Value(0)),
            Binary(GT, Var("Y"), Value(10))))
Now, this may be too "loose," i.e., it permits expressions like And(Value(1), Value(2)), which may not be valid according to your grammar. But this should give you an idea of how to approach it.
There are also some good examples in the F# Programming wikibook.
If you need to parse these expressions, I highly recommend FParsec.
Daniel's answer is good. Here's a similar approach, along with a simple top-down parser built with active patterns:
type BinOp = | And | Or
type Comparison = | Gt | Eq
type Expr =
    | BinOp of BinOp * Expr * Expr
    | Comp of Comparison * string * int
module private Parsing =
    // recognize and strip a leading literal
    let (|Lit|_|) lit (s:string) =
        if s.StartsWith(lit) then Some(s.Substring lit.Length)
        else None
    // strip leading whitespace
    let (|NoWs|) (s:string) =
        s.TrimStart(' ', '\t', '\r', '\n')
    // parse a binary operator
    let (|BinOp|_|) = function
        | Lit "AND" r -> Some(And, r)
        | Lit "OR" r -> Some(Or, r)
        | _ -> None
    // parse a comparison operator
    let (|Comparison|_|) = function
        | Lit "GT" r -> Some(Gt, r)
        | Lit "EQ" r -> Some(Eq, r)
        | _ -> None
    // parse a variable (alphabetical characters only)
    let (|Var|_|) s =
        let m = System.Text.RegularExpressions.Regex.Match(s, "^[a-zA-Z]+")
        if m.Success then
            Some(m.Value, s.Substring m.Value.Length)
        else
            None
    // parse an integer
    let (|Int|_|) s =
        let m = System.Text.RegularExpressions.Regex.Match(s, @"^-?\d+")
        if m.Success then
            Some(int m.Value, s.Substring m.Value.Length)
        else
            None
    // parse an expression
    let rec (|Expr|_|) = function
        | NoWs (Lit "(" (BinOp (b, Expr(e1, Expr(e2, Lit ")" rest))))) ->
            Some(BinOp(b, e1, e2), rest)
        | NoWs (Lit "(" (Comparison (c, NoWs (Var (v, NoWs (Int (i, Lit ")" rest))))))) ->
            Some(Comp(c, v, i), rest)
        | _ -> None
let parse = function
    | Parsing.Expr(e, "") -> e
    | s -> failwith (sprintf "Not a valid expression: %s" s)
let e = parse @"
(AND (OR (EQ X 0)
         (GT X 10))
     (OR (EQ Y 0)
         (GT Y 10)))"

Changing associativity schema in a grammar

I'm trying to use SableCC to generate a parser for models that I call LAM. LAMs themselves are simple, and a simple grammar for them (where I omit a lot of things) is:
L := 0 | (x,y) | F(x1,...,xn) | L || L | L ; L
I wrote this grammar:
Helpers
number = ['0' .. '9'] ;
letter = ['a' .. 'z'] ;
uletter = ['A' .. 'Z'] ;
Tokens
zero = '0' ;
comma = ',' ;
parallel = '||' ;
point = ';' ;
lpar = '(' ;
rpar = ')' ;
identifier = letter+ number* ;
uidentifier = uletter+ number* ;
Productions
expr = {term} term |
{parallel} expr parallel term |
{point} expr point term;
term = {parenthesis} lpar expr rpar |
{zero} zero |
{invk} uidentifier lpar paramlist rpar |
{pair} lpar [left]:identifier comma [right]:identifier rpar ;
paramlist = {list} list |
{empty} ;
list = {var} identifier |
{com} identifier comma list ;
This basically works, but there is a side effect: it is left-associative. For example, if I have
L = L1 || L2 ; L3 || L4
then it is parsed as:
L = ((L1 || L2) ; L3) || L4
I want to give all precedence to the ";" operator, so that L is parsed as
L = (L1 || L2) ; (L3 || L4)
(other things, like "||", could remain left-associative)
My questions are:
Are there tips for doing such conversions in an "automated" way?
What would a grammar with all the precedence on ";" look like?
An "RTFM" link is also an acceptable answer :-D
Thank you all
You need to create a hierarchy of rules that matches the desired operator precedence.
expr = {subexp} subexp |
{parallel} subexp parallel expr ;
subexp = {term} term |
{point} term point subexp;
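Note that I also changed the associativity: both operators are now right-associative. In a hierarchy like this, the operator in the lower rule (subexp, here ";") binds tighter. If you want ";" to be applied last, as in (L1 || L2) ; (L3 || L4), swap parallel and point between the two rules.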
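To check what grouping a given hierarchy produces, here is a small Python sketch of a recursive-descent parser with one function call per rule level. It is an illustration, not SableCC output: terms are single tokens (parentheses and F(...) are left out), and parse and levels are made-up names. Each level is right-recursive, like the two rules above.

```python
def parse(tokens, levels):
    """Parse with binary operators listed from loosest- to tightest-binding."""
    pos = 0

    def level(i):
        nonlocal pos
        if i == len(levels):           # bottom of the hierarchy: a plain term
            tok = tokens[pos]
            pos += 1
            return tok
        left = level(i + 1)            # operand comes from the tighter level
        if pos < len(tokens) and tokens[pos] == levels[i]:
            op = tokens[pos]
            pos += 1
            return (op, left, level(i))   # right-recursive, so right-associative
        return left

    tree = level(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


tokens = "L1 || L2 ; L3 || L4".split()

# expr handles "||" and subexp handles ";", as in the two rules above.
print(parse(tokens, ["||", ";"]))

# Roles swapped: ";" is handled at the top level and "||" below it.
print(parse(tokens, [";", "||"]))
```

The rule closest to the start symbol ends up at the root of the tree, so whichever operator lives there is applied last; the second call is the one that yields the (L1 || L2) ; (L3 || L4) grouping.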

Resources