I have to check the syntax for a simple boolean expression such as
(X = 100 and Y < 100), I wrote the grammar and tried to check if this was correct so I am using this online tool http://smlweb.cpsc.ucalgary.ca/start.html. it is saying the grammar is wrong.
can someone point out the issue here ? thanks in advance.
boolean -> bool_term | boolean OR bool_term
bool_term -> bool_factor | bool_term AND bool_factor
bool_factor -> bool_primary | NOT bool_primary
bool_primary -> predicate | ( boolean )
predicate -> expr comp_op expr
expr -> string | number.
comp_op -> = | >
It is mostly about the stupid syntax of the service. For example, characters = and > are unsupported and there is no way to escape them.
This grammar works:
BOOLEAN -> BOOLTERM | BOOLEAN or BOOLTERM .
BOOLTERM -> BOOLFACTOR | BOOLTERM and BOOLFACTOR .
BOOLFACTOR -> BOOLPRIMARY | not BOOLPRIMARY .
BOOLPRIMARY -> PREDICATE | ( BOOLEAN ) .
PREDICATE -> EXPR COMPOP EXPR .
EXPR -> string | number .
COMPOP -> eq | gt .
Related
I've written a typical evaluator for simple math expressions (arithmetic with some custom functions) in F#. While it seems to be working correctly, some expressions don't evaluate as expected, for example, these work fine:
eval "5+2" --> 7
eval "sqrt(25)^2" --> 25
eval "1/(sqrt(4))" --> 0.5
eval "1/(2^2+2)" --> 1/6 ~ 0.1666...
but these don't:
eval "1/(sqrt(4)+2)" --> evaluates to 1/sqrt(6) ~ 0.408...
eval "1/(sqrt 4 + 2)" --> will also evaluate to 1/sqrt(6)
eval "1/(-1+3)" --> evaluates to 1/(-4) ~ -0.25
the code works as follows, tokenization (string as input) -> to rev-polish-notation (RPN) -> evalRpn
I thought that the problem seems to occur somewhere with the unary functions (functions accepting one operator), these are the sqrt function and the negation (-) function. I don't really see what's going wrong in my code. Can someone maybe point out what I am missing here?
this is my implementation in F#
open System.Collections
open System.Collections.Generic
open System.Text.RegularExpressions
type Token =
| Num of float
| Plus
| Minus
| Star
| Hat
| Sqrt
| Slash
| Negative
| RParen
| LParen
let hasAny (list: Stack<'T>) =
list.Count <> 0
let tokenize (input:string) =
let tokens = new Stack<Token>()
let push tok = tokens.Push tok
let regex = new Regex(#"[0-9]+(\.+\d*)?|\+|\-|\*|\/|\^|\)|\(|pi|e|sqrt")
for x in regex.Matches(input.ToLower()) do
match x.Value with
| "+" -> push Plus
| "*" -> push Star
| "/" -> push Slash
| ")" -> push LParen
| "(" -> push RParen
| "^" -> push Hat
| "sqrt" -> push Sqrt
| "pi" -> push (Num System.Math.PI)
| "e" -> push (Num System.Math.E)
| "-" ->
if tokens |> hasAny then
match tokens.Peek() with
| LParen -> push Minus
| Num v -> push Minus
| _ -> push Negative
else
push Negative
| value -> push (Num (float value))
tokens.ToArray() |> Array.rev |> Array.toList
let isUnary = function
| Negative | Sqrt -> true
| _ -> false
let prec = function
| Hat -> 3
| Star | Slash -> 2
| Plus | Minus -> 1
| _ -> 0
let toRPN src =
let output = new ResizeArray<Token>()
let stack = new Stack<Token>()
let rec loop = function
| Num v::tokens ->
output.Add(Num v)
loop tokens
| RParen::tokens ->
stack.Push RParen
loop tokens
| LParen::tokens ->
while stack.Peek() <> RParen do
output.Add(stack.Pop())
stack.Pop() |> ignore // pop the "("
loop tokens
| op::tokens when op |> isUnary ->
stack.Push op
loop tokens
| op::tokens ->
if stack |> hasAny then
if prec(stack.Peek()) >= prec op then
output.Add(stack.Pop())
stack.Push op
loop tokens
| [] ->
output.AddRange(stack.ToArray())
output
(loop src).ToArray()
let (#) op tok =
match tok with
| Num v ->
match op with
| Sqrt -> Num (sqrt v)
| Negative -> Num (v * -1.0)
| _ -> failwith "input error"
| _ -> failwith "input error"
let (##) op toks =
match toks with
| Num v,Num u ->
match op with
| Plus -> Num(v + u)
| Minus -> Num(v - u)
| Star -> Num(v * u)
| Slash -> Num(u / v)
| Hat -> Num(u ** v)
| _ -> failwith "input error"
| _ -> failwith "inpur error"
let evalRPN src =
let stack = new Stack<Token>()
let rec loop = function
| Num v::tokens ->
stack.Push(Num v)
loop tokens
| op::tokens when op |> isUnary ->
let result = op # stack.Pop()
stack.Push result
loop tokens
| op::tokens ->
let result = op ## (stack.Pop(),stack.Pop())
stack.Push result
loop tokens
| [] -> stack
if loop src |> hasAny then
match stack.Pop() with
| Num v -> v
| _ -> failwith "input error"
else failwith "input error"
let eval input =
input |> (tokenize >> toRPN >> Array.toList >> evalRPN)
Before answering your specific question, did you notice you have another bug? Try eval "2-4" you get 2.0 instead of -2.0.
That's probably because along these lines:
match op with
| Plus -> Num(v + u)
| Minus -> Num(v - u)
| Star -> Num(v * u)
| Slash -> Num(u / v)
| Hat -> Num(u ** v)
u and v are swapped, in commutative operations you don't notice the difference, so just revert them to u -v.
Now regarding the bug you mentioned, the cause seems obvious to me, by looking at your code you missed the precedence of those unary operations:
let prec = function
| Hat -> 3
| Star | Slash -> 2
| Plus | Minus -> 1
| _ -> 0
I tried adding them this way:
let prec = function
| Negative -> 5
| Sqrt -> 4
| Hat -> 3
| Star | Slash -> 2
| Plus | Minus -> 1
| _ -> 0
And now it seems to be fine.
Edit: meh, seems I was late, Gustavo posted the answer while I was wondering about the parentheses. Oh well.
Unary operators have the wrong precedence. Add the primary case | a when isUnary a -> 4 to prec.
The names of LParen and RParen are consistently swapped throughout the code. ( maps to RParen and ) to LParen!
It runs all tests from the question properly for me, given the appropriate precedence, but I haven't checked the code for correctness.
I've defined an expression tree structure in F# as follows:
type Num = int
type Name = string
type Expr =
| Con of Num
| Var of Name
| Add of Expr * Expr
| Sub of Expr * Expr
| Mult of Expr * Expr
| Div of Expr * Expr
| Pow of Expr * Expr
| Neg of Expr
I wanted to be able to pretty-print the expression tree so I did the following:
let (|Unary|Binary|Terminal|) expr =
match expr with
| Add(x, y) -> Binary(x, y)
| Sub(x, y) -> Binary(x, y)
| Mult(x, y) -> Binary(x, y)
| Div(x, y) -> Binary(x, y)
| Pow(x, y) -> Binary(x, y)
| Neg(x) -> Unary(x)
| Con(x) -> Terminal(box x)
| Var(x) -> Terminal(box x)
let operator expr =
match expr with
| Add(_) -> "+"
| Sub(_) | Neg(_) -> "-"
| Mult(_) -> "*"
| Div(_) -> "/"
| Pow(_) -> "**"
| _ -> failwith "There is no operator for the given expression."
let rec format expr =
match expr with
| Unary(x) -> sprintf "%s(%s)" (operator expr) (format x)
| Binary(x, y) -> sprintf "(%s %s %s)" (format x) (operator expr) (format y)
| Terminal(x) -> string x
However, I don't really like the failwith approach for the operator function since it's not compile-time safe. So I rewrote it as an active pattern:
let (|Operator|_|) expr =
match expr with
| Add(_) -> Some "+"
| Sub(_) | Neg(_) -> Some "-"
| Mult(_) -> Some "*"
| Div(_) -> Some "/"
| Pow(_) -> Some "**"
| _ -> None
Now I can rewrite my format function beautifully as follows:
let rec format expr =
match expr with
| Unary(x) & Operator(op) -> sprintf "%s(%s)" op (format x)
| Binary(x, y) & Operator(op) -> sprintf "(%s %s %s)" (format x) op (format y)
| Terminal(x) -> string x
I assumed, since F# is magic, that this would just work. Unfortunately, the compiler then warns me about incomplete pattern matches, because it can't see that anything that matches Unary(x) will also match Operator(op) and anything that matches Binary(x, y) will also match Operator(op). And I consider warnings like that to be as bad as compiler errors.
So my questions are: Is there a specific reason why this doesn't work (like have I left some magical annotation off somewhere or is there something that I'm just not seeing)? Is there a simple workaround I could use to get the type of safety I want? And is there an inherent problem with this type of compile-time checking, or is it something that F# might add in some future release?
If you code the destinction between ground terms and complex terms into the type system, you can avoid the runtime check and make them be complete pattern matches.
type Num = int
type Name = string
type GroundTerm =
| Con of Num
| Var of Name
type ComplexTerm =
| Add of Term * Term
| Sub of Term * Term
| Mult of Term * Term
| Div of Term * Term
| Pow of Term * Term
| Neg of Term
and Term =
| GroundTerm of GroundTerm
| ComplexTerm of ComplexTerm
let (|Operator|) ct =
match ct with
| Add(_) -> "+"
| Sub(_) | Neg(_) -> "-"
| Mult(_) -> "*"
| Div(_) -> "/"
| Pow(_) -> "**"
let (|Unary|Binary|) ct =
match ct with
| Add(x, y) -> Binary(x, y)
| Sub(x, y) -> Binary(x, y)
| Mult(x, y) -> Binary(x, y)
| Div(x, y) -> Binary(x, y)
| Pow(x, y) -> Binary(x, y)
| Neg(x) -> Unary(x)
let (|Terminal|) gt =
match gt with
| Con x -> Terminal(string x)
| Var x -> Terminal(string x)
let rec format expr =
match expr with
| ComplexTerm ct ->
match ct with
| Unary(x) & Operator(op) -> sprintf "%s(%s)" op (format x)
| Binary(x, y) & Operator(op) -> sprintf "(%s %s %s)" (format x) op (format y)
| GroundTerm gt ->
match gt with
| Terminal(x) -> x
also, imo, you should avoid boxing if you want to be type-safe. If you really want both cases, make two pattern. Or, as done here, just make a projection to the type you need later on. This way you avoid the boxing and instead you return what you need for printing.
I think you can make operator a normal function rather than an active pattern. Because operator is just a function which gives you an operator string for an expr, where as unary, binary and terminal are expression types and hence it make sense to pattern match on them.
let operator expr =
match expr with
| Add(_) -> "+"
| Sub(_) | Neg(_) -> "-"
| Mult(_) -> "*"
| Div(_) -> "/"
| Pow(_) -> "**"
| Var(_) | Con(_) -> ""
let rec format expr =
match expr with
| Unary(x) -> sprintf "%s(%s)" (operator expr) (format x)
| Binary(x, y) -> sprintf "(%s %s %s)" (format x) (operator expr) (format y)
| Terminal(x) -> string x
I find the best solution is to restructure your original type defintion:
type UnOp = Neg
type BinOp = Add | Sub | Mul | Div | Pow
type Expr =
| Int of int
| UnOp of UnOp * Expr
| BinOp of BinOp * Expr * Expr
All sorts of functions can then be written over the UnOp and BinOp types including selecting operators. You may even want to split BinOp into arithmetic and comparison operators in the future.
For example, I used this approach in the (non-free) article "Language-oriented programming: The Term-level Interpreter
" (2008) in the F# Journal.
I'm trying to extend the grammar of the Tiny Language to treat assignment as expression. Thus it would be valid to write
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
a = 1 = 2; // invalid
Assignment differs from other operators in two aspects. It's right associative (not a big deal), and its left-hand side is has to be a variable. So I changed the grammar like this
statement: assignmentExpr | functionCall ...;
assignmentExpr: Identifier indexes? '=' expression;
expression: assignmentExpr | condExpr;
It doesn't work, because it contains a non-LL(*) decision. I also tried this variant:
assignmentExpr: Identifier indexes? '=' (expression | condExpr);
but I got the same error. I am interested in
This specific question
Given a grammar with a non-LL(*) decision, how to find the two paths that cause the problem
How to fix it
I think you can change your grammar like this to achieve the same, without using syntactic predicates:
statement: Expr ';' | functionCall ';'...;
Expr: Identifier indexes? '=' Expr | condExpr ;
condExpr: .... and so on;
I altered Bart's example with this idea in mind:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
}
parse
: stat+ EOF -> ^(ROOT stat+)
;
stat
: expr ';'
;
expr
: Id Assign expr -> ^(Assign Id expr)
| add
;
add
: mult (('+' | '-')^ mult)*
;
mult
: atom (('*' | '/')^ atom)*
;
atom
: Id
| Num
| '('! expr ')' !
;
Assign : '=' ;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
And for the input:
a=b=4;
a = 2 * (b = 1);
you get following parse tree:
The key here is that you need to "assure" the parser that inside an expression, there is something ahead that satisfies the expression. This can be done using a syntactic predicate (the ( ... )=> parts in the add and mult rules).
A quick demo:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
ASSIGN;
}
parse
: stat* EOF -> ^(ROOT stat+)
;
stat
: expr ';' -> expr
;
expr
: add
;
add
: mult ((('+' | '-') mult)=> ('+' | '-')^ mult)*
;
mult
: atom ((('*' | '/') atom)=> ('*' | '/')^ atom)*
;
atom
: (Id -> Id) ('=' expr -> ^(ASSIGN Id expr))?
| Num
| '(' expr ')' -> expr
;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
which will parse the input:
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
into the following AST:
Say I've got some code like this
match exp with
| Addition(lhs,rhs,_) -> Addition(fix lhs,fix rhs)
| Subtraction(lhs,rhs,_) -> Subtraction(fix lhs,fix rhs)
is there any way that would allow me to do something like
match exp with
| Addition(lhs,rhs,_)
| Subtraction(lhs,rhs,_) -> X(fix lhs,fix rhs)
where X be based on the actual pattern being matched
I like #kvb's answer.
This does suggest that you may want to redefine the DU, though:
type Op = | Add | Sub
type Expr = | Binary of Op * Expr * Expr
You can use an active pattern:
let (|Binary|_|) = function
| Addition(e1,e2) -> Some(Addition, e1, e2)
| Subtraction(e1,e2) -> Some(Subtraction, e1, e2)
| _ -> None
let rec fix = function
| Binary(con,lhs,rhs) -> con(fix lhs, fix rhs)
| _ -> ...
As a purely academic exercise, I'm writing a recursive descent parser from scratch -- without using ANTLR or lex/yacc.
I'm writing a simple function which converts math expressions into their equivalent AST. I have the following:
// grammar
type expr =
| Lit of float
| Add of expr * expr
| Mul of expr * expr
| Div of expr * expr
| Sub of expr * expr
// tokens
type tokens =
| Num of float
| LParen | RParen
| XPlus | XStar | XMinus | XSlash
let tokenize (input : string) =
Regex.Matches(input.Replace(" ", ""), "\d+|[+/*\-()]")
|> Seq.cast<Match>
|> Seq.map (fun x -> x.Value)
|> Seq.map (function
| "+" -> XPlus
| "-" -> XMinus
| "/" -> XSlash
| "*" -> XStar
| "(" -> LParen
| ")" -> RParen
| num -> Num(float num))
|> Seq.to_list
So, tokenize "10 * (4 + 5) - 1" returns the following token stream:
[Num 10.0; XStar; LParen; Num 4.0; XPlus; Num 5.0; RParen; XMinus; Num 1.0]
At this point, I'd like to map the token stream to its AST with respect to operator precedence:
Sub(
Mul(
Lit 10.0
,Add(Lit 4.0, Lit 5.0)
)
,Lit 1.0
)
However, I'm drawing a blank. I've never written a parser from scratch, and I don't know even in principle how to begin.
How do I convert a token stream its representative AST?
Do you know about language grammars?
Assuming yes, you have a grammar with rules along the lines
...
addTerm := mulTerm addOp addTerm
| mulTerm
addOp := XPlus | XMinus
mulTerm := litOrParen mulOp mulTerm
| litOrParen
...
which ends up turning into code like (writing code in browser, never compiled)
let rec AddTerm() =
let mulTerm = MulTerm() // will parse next mul term (error if fails to parse)
match TryAddOp with // peek ahead in token stream to try parse
| None -> mulTerm // next token was not prefix for addOp rule, stop here
| Some(ao) -> // did parse an addOp
let rhsMulTerm = MulTerm()
match ao with
| XPlus -> Add(mulTerm, rhsMulTerm)
| XMinus -> Sub(mulTerm, rhsMulTerm)
and TryAddOp() =
let next = tokens.Peek()
match next with
| XPlus | XMinus ->
tokens.ConsumeNext()
Some(next)
| _ -> None
...
Hopefully you see the basic idea. This assumes a global mutable token stream that allows both 'peek at next token' and 'consume next token'.
If I remember from college classes the idea was to build expression trees like:
<program> --> <expression> <op> <expression> | <expression>
<expression> --> (<expression>) | <constant>
<op> --> * | - | + | /
<constant> --> <constant><constant> | [0-9]
then once you have construction your tree completely so you get something like:
exp
exp op exp
5 + and so on
then you run your completed tree through another program that recursively descents into the tree calculating expressions until you have an answer. If your parser doesn't understand the tree, you have a syntax error. Hope that helps.