Create discriminated union data from file/database - f#

I have a discriminated union for expressions like this one (EQ =; GT >; etc)
(AND (OR (EQ X 0)
(GT X 10))
(OR (EQ Y 0)
(GT Y 10)))
I want to create instances of DU from such expressions saved in file/database.
How do i do it? If it is not feasible, what is the best way to approach it in F#?
Daniel: these expressions are saved in prefix format (as above) as text and will be parsed in F#. Thanks.

If you just want to know how to model these expressions using DUs, here's one way:
type BinaryOp =
| EQ
| GT
type Expr =
| And of Expr * Expr
| Or of Expr * Expr
| Binary of BinaryOp * Expr * Expr
| Var of string
| Value of obj
let expr =
And(
Or(
Binary(EQ, Var("X"), Value(0)),
Binary(GT, Var("X"), Value(10))),
Or(
Binary(EQ, Var("Y"), Value(0)),
Binary(GT, Var("Y"), Value(10))))
Now, this may be too "loose," i.e., it permits expressions like And(Value(1), Value(2)), which may not be valid according to your grammar. But this should give you an idea of how to approach it.
There are also some good examples in the F# Programming wikibook.
If you need to parse these expressions, I highly recommend FParsec.

Daniel's answer is good. Here's a similar approach, along with a simple top-down parser built with active patterns:
type BinOp = | And | Or
type Comparison = | Gt | Eq
type Expr =
| BinOp of BinOp * Expr * Expr
| Comp of Comparison * string * int
module private Parsing =
// recognize and strip a leading literal
let (|Lit|_|) lit (s:string) =
if s.StartsWith(lit) then Some(s.Substring lit.Length)
else None
// strip leading whitespace
let (|NoWs|) (s:string) =
s.TrimStart(' ', '\t', '\r', '\n')
// parse a binary operator
let (|BinOp|_|) = function
| Lit "AND" r -> Some(And, r)
| Lit "OR" r -> Some(Or, r)
| _ -> None
// parse a comparison operator
let (|Comparison|_|) = function
| Lit "GT" r -> Some(Gt, r)
| Lit "EQ" r -> Some(Eq, r)
| _ -> None
// parse a variable (alphabetical characters only)
let (|Var|_|) s =
let m = System.Text.RegularExpressions.Regex.Match(s, "^[a-zA-Z]+")
if m.Success then
Some(m.Value, s.Substring m.Value.Length)
else
None
// parse an integer
let (|Int|_|) s =
let m = System.Text.RegularExpressions.Regex.Match(s, #"^-?\d+")
if m.Success then
Some(int m.Value, s.Substring m.Value.Length)
else
None
// parse an expression
let rec (|Expr|_|) = function
| NoWs (Lit "(" (BinOp (b, Expr(e1, Expr(e2, Lit ")" rest))))) ->
Some(BinOp(b, e1, e2), rest)
| NoWs (Lit "(" (Comparison (c, NoWs (Var (v, NoWs (Int (i, Lit ")" rest))))))) ->
Some(Comp(c, v, i), rest)
| _ -> None
let parse = function
| Parsing.Expr(e, "") -> e
| s -> failwith (sprintf "Not a valid expression: %s" s)
let e = parse #"
(AND (OR (EQ X 0)
(GT X 10))
(OR (EQ Y 0)
(GT Y 10)))"

Related

How do parentheses work with custom data types?

Currently, I am working on a problem of parsing and showing expressions in Haskell.
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
This is the code of my data type Expr and this is how i define show function:
instance Show Expr where
show (Val x) = show x
show (Var y) = y
show (p :+: q) = par (show p ++ "+" ++ show q)
show (p :-: q) = par (show p ++ "-" ++ show q)
show (p :/: q) = par (show p ++ "/" ++ show q)
show (p :*: q) = par (show p ++ "*" ++ show q)
show (p :%: q) = par (show p ++ "%" ++ show q)
par :: String -> String
par s = "(" ++ s ++ ")"
Later i tried to transform string input into the expression but i encounter the following problem: I don't understand how parentheses in the second case are implemented in Haskell.
*Main> Val 2 :*:Val 2 :+: Val 3
((2*2)+3)
*Main> Val 2 :*:(Val 2 :+: Val 3)
(2*(2+3))
Because of that, i am a bit confused regarding how should i transform parentheses from my string into the expression. Currently i am using the following function for parsing, but for now, it just ignores parentheses which is not intended behavior:
toExpr :: String -> Expr
toExpr str = f (lexer str) (Val 0)
where
f [] expr = expr
f (c:cs) expr
|isAlpha (head c) = f cs (Var c)
|isDigit (head c) = f cs (Val (read c))
|c == "+" = (expr :+: f cs (Val 0))
|c == "-" = (expr :-: f cs (Val 0))
|c == "/" = (expr :/: f cs (Val 0))
|c == "*" = (expr :*: f cs (Val 0))
|c == "%" = (expr :%: f cs (Val 0))
|otherwise = f cs expr
Edit: few grammar mistakes
I don't understand how parentheses in the second case are implemented in Haskell.
The brackets just give precedence to a certain part of the expression to parse. The problem is not with the parenthesis you render. I think the problem is that you did not assign precedence to your operators. This thus means that, unless you specify brackets, Haskell will consider all operators to have the same precedence, and parse these left-to-right. This thus means that x ⊕ y ⊗ z is parsed as (x ⊕ y) ⊗ z.
You can define the precedence of your :+:, :*, etc. operators with infixl:
infixl 7 :*:, :/:, :%:
infixl 5 :+:, :-:
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
As for your parser (the toExpr), you will need a parsing mechanism like a LALR parser [wiki] that stores results on a stack, and thus makes proper operations.
This was my final parser which gave me the result I needed. To get the result i wanted proper grammar was added and i wrote a parses according to he grammar.
Thanks, everyone for the help.
{-
parser for the following grammar:
E -> T E'
E' -> + T E' | - T E' | <empty string>
T -> F T'
T' -> * F T' | / F T' | % F T' | <empty string>
F -> (E) | <integer> | <identifier>
-}
parseExpr :: String -> (Expr,[String])
parseExpr tokens = parseE (lexer tokens)
parseE :: [String] -> (Expr,[String])
parseE tokens = parseE' acc rest where (acc,rest) = parseT tokens
parseE' :: Expr -> [String] -> (Expr,[String])
parseE' accepted ("+":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :+: acc) rest
parseE' accepted ("-":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :-: acc) rest
parseE' accepted tokens = (accepted,tokens)
parseT :: [String] -> (Expr,[String])
parseT tokens = let (acc,rest) = parseF tokens in parseT' acc rest
parseT' :: Expr -> [String] -> (Expr,[String])
parseT' accepted ("*":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :*: acc) rest
parseT' accepted ("/":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :/: acc) rest
parseT' accepted ("%":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :%: acc) rest
parseT' accepted tokens = (accepted,tokens)
parseF :: [String] -> (Expr,[String])
parseF ("(":tokens) = (e, tail rest) where (e,rest) = parseE tokens
parseF (t:tokens)
| isAlpha (head t) = (Var t,tokens)
| isDigit (head t) = (Val (read t),tokens)
| otherwise = error ""
parseF [] = error ""
lexer :: String -> [String]
lexer [] = []
lexer (c:cs)
| elem c " \t\n" = lexer cs
| elem c "=+-*/%()" = [c]:(lexer cs)
| isAlpha c = (c:takeWhile isAlpha cs):lexer(dropWhile isAlpha cs)
| isDigit c = (c:takeWhile isDigit cs):lexer(dropWhile isDigit cs)
| otherwise = error ""

How to insert an indentation check into the operator precedence parser?

I am working on the parsing stage for the language I am making and am having difficulty with the following.
let test2 = // I'd like this to be an error.
"""
2
+ 2
"""
let result = run (spaces >>. expr) test2
val result : ParserResult<CudaExpr,unit> =
Success: Add (LitInt32 2,LitInt32 2)
I already managed to make the following example when the terms are indented incorrectly
2 +
2
give me an error, but not when the operator is on the wrong indentation level. I need something like a before-parse check.
let operators expr i =
let f expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f expr
f opp.ExpressionParser
The above function is how the operators phase is structured and as you can see, the term parsers get wrapped in a function that does the indentation check, but the last line is faulty.
Here is a simplified example of the full parser.
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsecCS.dll"
#r "../../packages/FParsec.1.0.2/lib/net40-client/FParsec.dll"
open FParsec
type Expr =
| V of string
| Add of Expr * Expr
let identifier = many1Satisfy2L isAsciiLetter (fun x -> isAsciiLetter x || isDigit x || x = ''') "identifier" .>> spaces |>> V
let indentations expressions (s: CharStream<_>) =
let i = s.Column
let expr_indent expr (s: CharStream<_>) =
let expr (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
many1 expr s
expr_indent (expressions i) s
let expr =
let opp = new OperatorPrecedenceParser<_,_,_>()
opp.AddOperator(InfixOperator("+", spaces, 6, Associativity.Left, fun x y -> Add(x,y)))
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
let rec expr s = indentations (operators identifier) s
expr
let test2 = // I'd like this to be an error.
"""
a
+
b
"""
let result = run (spaces >>. expr) test2
The full parser so far can be found here.
let operators expr i =
let f (s: CharStream<_>) = if i <= s.Column then expr s else pzero s
opp.TermParser <- f
f opp.ExpressionParser
I did not realize it 2.5 weeks ago, but what happens when a new block gets opened and expr s gets called is that the term parser gets overwritten with the new indentation and there is no way to back it up and restore it on exit. I did a bit of looking around and managed to adapt the Pratt top down parsing method for my purposes.
Here is a talk by Douglas Crockford on the method.
let poperator: Parser<_,_> =
let f c = (isAsciiIdContinue c || isAnyOf [|' ';'\t';'\n';'\"';'(';')';'{';'}';'[';']'|] c) = false
(many1Satisfy f .>> spaces)
>>= fun token ->
match dict_operator.TryGetValue token with
| true, x -> preturn x
| false, _ -> fail "unknown operator"
let rec led poperator term left (prec,asoc,m) =
match asoc with
| Associativity.Left | Associativity.None -> tdop poperator term prec |>> m left
| Associativity.Right -> tdop poperator term (prec-1) |>> m left
| _ -> failwith "impossible"
and tdop poperator term rbp =
let rec f left =
poperator >>= fun (prec,asoc,m as v) ->
if rbp < prec then led poperator term left v >>= loop
else pzero
and loop left = attempt (f left) <|>% left
term >>= loop
let operators expr i (s: CharStream<_>) =
let expr_indent expr (s: CharStream<_>) = expr_indent i (<=) expr s
let op s = expr_indent poperator s
let term s = expr_indent expr s
tdop op term 0 s
The led and tdop functions which do the actual precedence parsing are 10 lines long. The above is just a snippet of the full parser for the language I am making - in terms of syntax it is similar to F# and is indentation sensitive. Here is a more straightforward F# translation of Douglas Crockford's Javascript example.

How to write a arithmetic expression using a discriminated union in F#?

Recently, I started learning F# and I am a bit struggling with discriminated unions and function signatures.
I am trying to define an arithmetic expression (for sum, product, multiply, average etc) as a discriminate union and write a function that evaluates it.
However, I can't figure out what I am doing wrong. Could anyone point me in the right direction? This what I've tried so far:
Attempt 1
type Expr =
| Sum of int * int
| Avg of int * int
| Mul of int * int
let Evaluate (input : Expr) =
match input with
| Sum(a,b) -> a + b
printfn "%A" (Sum(5,10))
My output is :
Sum (5,10)
Attempt 2
I also tried something like :
type Expr = int -> int -> int
let evaluate (a : Expr) (b : Expr) =
match a,b with
| a,b -> (fun a -> a + b)
printfn "%A" (evaluate 5 10)
since all common arithmetic expressions (like sum, product, multiply, average etc) take two integer inputs and output a single integer.
I am getting errors for: 'This expression was expected to have type Expr but here has type Int'.
edit 1
let evaluate (input : Expr) =
match input with
| Sum(a,b) -> a + b
| Avg(a,b) -> a + b / 2
| Mul(a,b) -> a * b
let evaluate = function
| Sum(a,b) -> a + b
| Avg(a,b) -> a + b / 2
| Mul(a,b) -> a * b
Your first attempt comes close.
type Expr =
| Sum of int * int
| Avg of int * int
| Mul of int * int
let evaluate = function
| Sum(a,b) -> a + b
| Avg(a,b) -> a + b / 2
| Mul(a,b) -> a * b
As commented you left out the evaluation of the expression.
Also there's no reason for using printfn "%A", since we know the return type.
Sum(5,10) // Expr
|> evaluate // int
|> printfn "%i" // unit
In your second attempt you're mixing things up a bit.
type Expr = int -> int -> int
As Lee (again) correctly points out, this signature represents a function with 2 arguments.
let add : Expr = fun a b -> a + b
Or shorthand
let add : Expr = (+)
As Expr represents an arithmetic operator, following evaluate function combines them.
let evaluate (a : Expr) (b : Expr) =
match a,b with
| a,b -> (fun a -> a + b)
If you're intent was to sum two input integers, a and b should've been of type int.
If you want to combine multiple expressions, FuneSnabel's proposal is the way to go.
For example: (1 + 2) * (3 + 4)
type Expr =
| Cte of int
| Add of Expr * Expr
| Mul of Expr * Expr
let rec eval = function
| Cte(i) -> i
| Add(a,b) -> eval a + eval b
| Mul(a,b) -> eval a * eval b
Mul( Add(Cte 1,Cte 2) , Add(Cte 3,Cte 4) )
|> eval
|> printfn "%i"
Attempt 1 looks about right but limited as it doesn't allow you to create expressions like 1+2+3.
Instead you could try something like this:
type Expr =
| Constant of int
| Add of Expr*Expr
| Sub of Expr*Expr
| Mul of Expr*Expr
| Div of Expr*Expr
A bit more flexible and with support for bindings like x, y and so on:
type Expr =
| Constant of int
| Binding of string
| BinaryOp of string*(int -> int -> int)*Expr*Expr
| UnaryOp of string*(int -> int)*Expr

Incomplete match with AND patterns

I've defined an expression tree structure in F# as follows:
type Num = int
type Name = string
type Expr =
| Con of Num
| Var of Name
| Add of Expr * Expr
| Sub of Expr * Expr
| Mult of Expr * Expr
| Div of Expr * Expr
| Pow of Expr * Expr
| Neg of Expr
I wanted to be able to pretty-print the expression tree so I did the following:
let (|Unary|Binary|Terminal|) expr =
match expr with
| Add(x, y) -> Binary(x, y)
| Sub(x, y) -> Binary(x, y)
| Mult(x, y) -> Binary(x, y)
| Div(x, y) -> Binary(x, y)
| Pow(x, y) -> Binary(x, y)
| Neg(x) -> Unary(x)
| Con(x) -> Terminal(box x)
| Var(x) -> Terminal(box x)
let operator expr =
match expr with
| Add(_) -> "+"
| Sub(_) | Neg(_) -> "-"
| Mult(_) -> "*"
| Div(_) -> "/"
| Pow(_) -> "**"
| _ -> failwith "There is no operator for the given expression."
let rec format expr =
match expr with
| Unary(x) -> sprintf "%s(%s)" (operator expr) (format x)
| Binary(x, y) -> sprintf "(%s %s %s)" (format x) (operator expr) (format y)
| Terminal(x) -> string x
However, I don't really like the failwith approach for the operator function since it's not compile-time safe. So I rewrote it as an active pattern:
let (|Operator|_|) expr =
match expr with
| Add(_) -> Some "+"
| Sub(_) | Neg(_) -> Some "-"
| Mult(_) -> Some "*"
| Div(_) -> Some "/"
| Pow(_) -> Some "**"
| _ -> None
Now I can rewrite my format function beautifully as follows:
let rec format expr =
match expr with
| Unary(x) & Operator(op) -> sprintf "%s(%s)" op (format x)
| Binary(x, y) & Operator(op) -> sprintf "(%s %s %s)" (format x) op (format y)
| Terminal(x) -> string x
I assumed, since F# is magic, that this would just work. Unfortunately, the compiler then warns me about incomplete pattern matches, because it can't see that anything that matches Unary(x) will also match Operator(op) and anything that matches Binary(x, y) will also match Operator(op). And I consider warnings like that to be as bad as compiler errors.
So my questions are: Is there a specific reason why this doesn't work (like have I left some magical annotation off somewhere or is there something that I'm just not seeing)? Is there a simple workaround I could use to get the type of safety I want? And is there an inherent problem with this type of compile-time checking, or is it something that F# might add in some future release?
If you code the destinction between ground terms and complex terms into the type system, you can avoid the runtime check and make them be complete pattern matches.
type Num = int
type Name = string
type GroundTerm =
| Con of Num
| Var of Name
type ComplexTerm =
| Add of Term * Term
| Sub of Term * Term
| Mult of Term * Term
| Div of Term * Term
| Pow of Term * Term
| Neg of Term
and Term =
| GroundTerm of GroundTerm
| ComplexTerm of ComplexTerm
let (|Operator|) ct =
match ct with
| Add(_) -> "+"
| Sub(_) | Neg(_) -> "-"
| Mult(_) -> "*"
| Div(_) -> "/"
| Pow(_) -> "**"
let (|Unary|Binary|) ct =
match ct with
| Add(x, y) -> Binary(x, y)
| Sub(x, y) -> Binary(x, y)
| Mult(x, y) -> Binary(x, y)
| Div(x, y) -> Binary(x, y)
| Pow(x, y) -> Binary(x, y)
| Neg(x) -> Unary(x)
let (|Terminal|) gt =
match gt with
| Con x -> Terminal(string x)
| Var x -> Terminal(string x)
let rec format expr =
match expr with
| ComplexTerm ct ->
match ct with
| Unary(x) & Operator(op) -> sprintf "%s(%s)" op (format x)
| Binary(x, y) & Operator(op) -> sprintf "(%s %s %s)" (format x) op (format y)
| GroundTerm gt ->
match gt with
| Terminal(x) -> x
also, imo, you should avoid boxing if you want to be type-safe. If you really want both cases, make two pattern. Or, as done here, just make a projection to the type you need later on. This way you avoid the boxing and instead you return what you need for printing.
I think you can make operator a normal function rather than an active pattern. Because operator is just a function which gives you an operator string for an expr, where as unary, binary and terminal are expression types and hence it make sense to pattern match on them.
let operator expr =
match expr with
| Add(_) -> "+"
| Sub(_) | Neg(_) -> "-"
| Mult(_) -> "*"
| Div(_) -> "/"
| Pow(_) -> "**"
| Var(_) | Con(_) -> ""
let rec format expr =
match expr with
| Unary(x) -> sprintf "%s(%s)" (operator expr) (format x)
| Binary(x, y) -> sprintf "(%s %s %s)" (format x) (operator expr) (format y)
| Terminal(x) -> string x
I find the best solution is to restructure your original type defintion:
type UnOp = Neg
type BinOp = Add | Sub | Mul | Div | Pow
type Expr =
| Int of int
| UnOp of UnOp * Expr
| BinOp of BinOp * Expr * Expr
All sorts of functions can then be written over the UnOp and BinOp types including selecting operators. You may even want to split BinOp into arithmetic and comparison operators in the future.
For example, I used this approach in the (non-free) article "Language-oriented programming: The Term-level Interpreter
" (2008) in the F# Journal.

Parsing grammars using OCaml

I have a task to write a (toy) parser for a (toy) grammar using OCaml and not sure how to start (and proceed with) this problem.
Here's a sample Awk grammar:
type ('nonterm, 'term) symbol = N of 'nonterm | T of 'term;;
type awksub_nonterminals = Expr | Term | Lvalue | Incrop | Binop | Num;;
let awksub_grammar =
(Expr,
function
| Expr ->
[[N Term; N Binop; N Expr];
[N Term]]
| Term ->
[[N Num];
[N Lvalue];
[N Incrop; N Lvalue];
[N Lvalue; N Incrop];
[T"("; N Expr; T")"]]
| Lvalue ->
[[T"$"; N Expr]]
| Incrop ->
[[T"++"];
[T"--"]]
| Binop ->
[[T"+"];
[T"-"]]
| Num ->
[[T"0"]; [T"1"]; [T"2"]; [T"3"]; [T"4"];
[T"5"]; [T"6"]; [T"7"]; [T"8"]; [T"9"]]);;
And here's some fragments to parse:
let frag1 = ["4"; "+"; "3"];;
let frag2 = ["9"; "+"; "$"; "1"; "+"];;
What I'm looking for is a rulelist that is the result of the parsing a fragment, such as this one for frag1 ["4"; "+"; "3"]:
[(Expr, [N Term; N Binop; N Expr]);
(Term, [N Num]);
(Num, [T "3"]);
(Binop, [T "+"]);
(Expr, [N Term]);
(Term, [N Num]);
(Num, [T "4"])]
The restriction is to not use any OCaml libraries other than List... :/
Here is a rough sketch - straightforwardly descend into the grammar and try each branch in order. Possible optimization : tail recursion for single non-terminal in a branch.
exception Backtrack
let parse l =
let rules = snd awksub_grammar in
let rec descend gram l =
let rec loop = function
| [] -> raise Backtrack
| x::xs -> try attempt x l with Backtrack -> loop xs
in
loop (rules gram)
and attempt branch (path,tokens) =
match branch, tokens with
| T x :: branch' , h::tokens' when h = x ->
attempt branch' ((T x :: path),tokens')
| N n :: branch' , _ ->
let (path',tokens) = descend n ((N n :: path),tokens) in
attempt branch' (path', tokens)
| [], _ -> path,tokens
| _, _ -> raise Backtrack
in
let (path,tail) = descend (fst awksub_grammar) ([],l) in
tail, List.rev path
Ok, so the first think you should do is write a lexical analyser. That's the
function that takes the ‘raw’ input, like ["3"; "-"; "("; "4"; "+"; "2"; ")"],
and splits it into a list of tokens (that is, representations of terminal symbols).
You can define a token to be
type token =
| TokInt of int (* an integer *)
| TokBinOp of binop (* a binary operator *)
| TokOParen (* an opening parenthesis *)
| TokCParen (* a closing parenthesis *)
and binop = Plus | Minus
The type of the lexer function would be string list -> token list and the ouput of
lexer ["3"; "-"; "("; "4"; "+"; "2"; ")"]
would be something like
[ TokInt 3; TokBinOp Minus; TokOParen; TokInt 4;
TBinOp Plus; TokInt 2; TokCParen ]
This will make the job of writing the parser easier, because you won't have to
worry about recognising what is a integer, what is an operator, etc.
This is a first, not too difficult step because the tokens are already separated.
All the lexer has to do is identify them.
When this is done, you can write a more realistic lexical analyser, of type string -> token list, that takes a actual raw input, such as "3-(4+2)" and turns it into a token list.
I'm not sure if you specifically require the derivation tree, or if this is a just a first step in parsing. I'm assuming the latter.
You could start by defining the structure of the resulting abstract syntax tree by defining types. It could be something like this:
type expr =
| Operation of term * binop * term
| Term of term
and term =
| Num of num
| Lvalue of expr
| Incrop of incrop * expression
and incrop = Incr | Decr
and binop = Plus | Minus
and num = int
Then I'd implement a recursive descent parser. Of course it would be much nicer if you could use streams combined with the preprocessor camlp4of...
By the way, there's a small example about arithmetic expressions in the OCaml documentation here.

Resources