Writing a parser in Racket #lang plai - parsing

I need to write a parser in racket for this abstract syntax:
;; <DE> ::= <num>
;; | {distribution <num>*}
;; | {uniform <num> <num>}
;; | {+ <DE> <DE>}
;; | {- <DE> <DE>}
;; | {* <DE> <DE>}
;; | {with {<id> <DE>} <DE>}
;; | <id>
(define-type Binding
[binding (name symbol?) (named-expr DE?)])
(define-type DE
[distribution (values (listof number?))]
[id (name symbol?)]
[binop (op procedure?) (lhs DE?) (rhs DE?)]
[with (b Binding?) (body DE?)])
What I have so far is:
(define (parse sexp)
(match sexp
[(? symbol?) (id sexp)]
[sexp (distribution (values (list sexp)))]
[(list '+ l r) (binop + (parse l) (parse r))]
[(list '- l r) (binop - (parse l) (parse r))]
[(list '* l r) (binop * (parse l) (parse r))]))
Note that {uniform a b}, is a discrete uniform distribution from a to b (inclusive). If a > b, then it's empty.
This is not working as expected and I can't get it right. There aren't resources on the web to help me, or at least I couldn't find them.
Is anyone able to explain me where I'm wrong and what the solution could be? I really have no clue. Thanks!

Related

What's causing my OCaml S-expression parser to fail?

I am working on making a Lisp interpreter in OCaml. I naturally started with the front-end. So far I have an S-expression parsing algorithm that works most of the time. For both simple S-expressions like (a b) and ((a b) (c d)) my function, ast_as_str, shows that the output list structure is incorrect. I've documented that below. After trying countless variations on parse nothing seems to work. Does someone adept in writing parsers in OCaml have a suggestion as to how I can fix my code?
type s_expression = Nil | Atom of string | Pair of s_expression * s_expression
let rec parse tokens =
match tokens with
| [] -> Nil
| token :: rest ->
match token with
| "(" -> parse rest
| ")" -> Pair(Nil, parse rest)
| atom -> Pair(Atom atom, parse rest)
let rec ast_as_str ast =
match ast with
| Nil -> "nil"
| Atom a -> Printf.sprintf "%s" a
| Pair(a, b) -> Printf.sprintf "(%s %s)" (ast_as_str a) (ast_as_str b);;
let check_output test = print_endline (ast_as_str (parse test));;
(*
Input:
(a b)
Output:
(a (b (nil nil)))
Almost correct...
*)
check_output ["("; "a"; "b"; ")"];;
(*
Input:
((w x) (y z))
Output:
(w (x (nil (y (z (nil (nil nil)))))))
Incorrect.
*)
check_output ["("; "("; "w"; "x"; ")"; "("; "y"; "z"; ")"; ")"]
I'm going to assume this isn't homework. If it is, I will change the answer to some less specific hints.
A recursive descent parser works by recognizing the beginning token of a construct, then parsing the contents of the construct, then (very often) recognizing the ending token of the construct. S-expressions have just one construct, the parenthesized list. Your parser isn't doing the recognizing of the end of the construct.
If you assume your parser works correctly, then encountering a right parenthesis ) is a syntax error. There shouldn't be any unmatched right parentheses, and matching right parentheses are parsed as part of the parenthesized list construct (as I described above).
If you swear that this is just a personal project I'd be willing to write up a parser. But you should try writing up something as described above.
Note that when you see an atom, you aren't seeing a pair. It's not correct to return Pair (Atom xyz, rest) when seeing an atom.
Update
The way to make things work in a functional setting is to have the parsing functions return not only the construct that they saw, but also the remaining tokens that haven't been parsed yet.
The following code works for your examples and is probably pretty close to correct:
let rec parse tokens =
match tokens with
| [] -> failwith "Syntax error: end of input"
| "(" :: rest ->
(match parselist rest with
| (sexpr, ")" :: rest') -> (sexpr, rest')
| _ -> failwith "Syntax error: unmatched ("
)
| ")" :: _ -> failwith "Syntax error: unmatched )"
| atom :: rest -> (Atom atom, rest)
and parselist tokens =
match tokens with
| [] | ")" :: _ -> (Nil, tokens)
| _ ->
let (sexpr1, rest) = parse tokens in
let (sexpr2, rest') = parselist rest in
(Pair (sexpr1, sexpr2), rest')
You can define check_output like this:
let check_output test =
let (sexpr, toks) = parse test in
if toks <> [] then
Printf.printf "(extra tokens in input)\n";
print_endline (ast_as_str sexpr)
Here's what I see for your two test cases:
# check_output ["("; "a"; "b"; ")"];;
(a (b nil))
- : unit = ()
# check_output ["("; "("; "w"; "x"; ")"; "("; "y"; "z"; ")"; ")"];;
((w (x nil)) ((y (z nil)) nil))
- : unit = ()
I think these are the right results.

Haskell: If-Block Evaluator works correctly within GHCI but not when its evaluated through a parsed expression

I'm trying to write an evaluator for my language that takes a parsed if-block, evaluates all conditions and filters out the False branches and then picks one branch to take at random. I'm at the stage wherein I can filter out the branches and be left with my possible paths but only within GHCI. When I attempt to pass an if-block into my program through the command line, what occurs is if the first branch is True, that's the only branch that is taken, otherwise an empty list is returned regardless if there are any sub-branches or not. My evaluator for binary expressions works so I know that my error is with my logic in my if evaluator or parser.
Here is the behaviour through the command line:
~/Desktop/Olivia > ./Main "if (1 > 3)-> 1 + 1 [] (1 < 2)-> 2 + 2"
[]
~/Desktop/Olivia > ./Main "if (1 < 3)-> 1 + 1 [] (1 < 2)-> 2 + 2"
[2]
Here is the behaviour through GHCI:
λ > b = SubIf (Expr (HInteger 2) Less (HInteger 1)) [(Expr (HInteger 45) Add (HInteger 45)), (Expr (HInteger 6) Add (HInteger 6))]
λ > c = SubIf (Expr (HInteger 2) Less (HInteger 3)) [(Expr (HInteger 5) Add (HInteger 5)), (Expr (HInteger 6) Add (HInteger 6))]
λ > a = If (Expr (HInteger 3) Less (HInteger 2)) [(Expr (HInteger 1) Add (HInteger 100)), (Expr (HInteger 2) Add (HInteger 2))] [b, c]
λ > eval b -- All False branches return a value of 1
1
λ > eval c
[10 12]
λ > eval a
[[10 12]]
This is my expected result. In the case of n number of sub if expressions, I will pick one list of the overall list as my branch to take before I go back to evaluate the loop guard and begin the next iteration of the program.
My data type:
data HVal
= HInteger Integer
| HBool Bool
| HString String
| HList [HVal]
| Expr HVal Op HVal
| EqExpr HVal Op HVal
| Neg HVal
| Assign HVal HVal
| Do HVal [HVal]
| If HVal [HVal] [HVal]
| SubIf HVal [HVal]
| Load String
deriving (Eq, Read)
My Parsers:
parseIf :: Parser HVal
parseIf = do
_ <- string "if"
spaces
_ <- string "("
cond <- (parseExpr <|> parseEqExpr <|> parseBool)
_ <- string ")->"
expr <- spaces *> many (parseExpression <* spaces)
expr' <- spaces *> many (parseExpression <* spaces)
return $ If cond expr expr'
parseSubIf :: Parser HVal
parseSubIf = do
_ <- string "[]"
spaces
_ <- string "("
cond <- (parseExpr <|> parseEqExpr <|> parseBool)
_ <- string ")->"
expr <- spaces *> many (parseExpression <* spaces)
return $ SubIf cond expr
My Evaluator:
eval :: HVal -> HVal
---------- EVALUATING PRIMITIVES ----------
eval val#(HString _) = val
eval val#(HInteger _) = val
eval val#(HBool _) = val
eval val#(HList _) = val
eval (Expr x op y) = evalExpr x op y
eval (If cond expr expr') = evalIf cond expr expr'
eval (SubIf cond expr) = evalSubIf cond expr
evalIf :: HVal -> [HVal] -> [HVal] -> HVal
evalIf cond expr expr' = if ((eval cond) == (HBool True))
then HList $ map eval expr
else HList $ (filter (/= (HInteger 1)) (map eval expr'))
evalSubIf :: HVal -> [HVal] -> HVal
evalSubIf cond expr = if ((eval cond) == (HBool True))
then HList $ map eval expr
else (HInteger 1)
I think that the error may be with my parser for the if-block. My thinking behind it was that the if block contains the conditional for the first branch and what it evaluates to and then a list of sub expressions wherein each element of the list contains the branch conditional and what it evaluates to.

How do parentheses work with custom data types?

Currently, I am working on a problem of parsing and showing expressions in Haskell.
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
This is the code of my data type Expr and this is how i define show function:
instance Show Expr where
show (Val x) = show x
show (Var y) = y
show (p :+: q) = par (show p ++ "+" ++ show q)
show (p :-: q) = par (show p ++ "-" ++ show q)
show (p :/: q) = par (show p ++ "/" ++ show q)
show (p :*: q) = par (show p ++ "*" ++ show q)
show (p :%: q) = par (show p ++ "%" ++ show q)
par :: String -> String
par s = "(" ++ s ++ ")"
Later i tried to transform string input into the expression but i encounter the following problem: I don't understand how parentheses in the second case are implemented in Haskell.
*Main> Val 2 :*:Val 2 :+: Val 3
((2*2)+3)
*Main> Val 2 :*:(Val 2 :+: Val 3)
(2*(2+3))
Because of that, i am a bit confused regarding how should i transform parentheses from my string into the expression. Currently i am using the following function for parsing, but for now, it just ignores parentheses which is not intended behavior:
toExpr :: String -> Expr
toExpr str = f (lexer str) (Val 0)
where
f [] expr = expr
f (c:cs) expr
|isAlpha (head c) = f cs (Var c)
|isDigit (head c) = f cs (Val (read c))
|c == "+" = (expr :+: f cs (Val 0))
|c == "-" = (expr :-: f cs (Val 0))
|c == "/" = (expr :/: f cs (Val 0))
|c == "*" = (expr :*: f cs (Val 0))
|c == "%" = (expr :%: f cs (Val 0))
|otherwise = f cs expr
Edit: few grammar mistakes
I don't understand how parentheses in the second case are implemented in Haskell.
The brackets just give precedence to a certain part of the expression to parse. The problem is not with the parenthesis you render. I think the problem is that you did not assign precedence to your operators. This thus means that, unless you specify brackets, Haskell will consider all operators to have the same precedence, and parse these left-to-right. This thus means that x ⊕ y ⊗ z is parsed as (x ⊕ y) ⊗ z.
You can define the precedence of your :+:, :*, etc. operators with infixl:
infixl 7 :*:, :/:, :%:
infixl 5 :+:, :-:
type Name = String
data Expr = Val Integer
| Var Name
| Expr :+: Expr
| Expr :-: Expr
| Expr :*: Expr
| Expr :/: Expr
| Expr :%: Expr
As for your parser (the toExpr), you will need a parsing mechanism like a LALR parser [wiki] that stores results on a stack, and thus makes proper operations.
This was my final parser which gave me the result I needed. To get the result i wanted proper grammar was added and i wrote a parses according to he grammar.
Thanks, everyone for the help.
{-
parser for the following grammar:
E -> T E'
E' -> + T E' | - T E' | <empty string>
T -> F T'
T' -> * F T' | / F T' | % F T' | <empty string>
F -> (E) | <integer> | <identifier>
-}
parseExpr :: String -> (Expr,[String])
parseExpr tokens = parseE (lexer tokens)
parseE :: [String] -> (Expr,[String])
parseE tokens = parseE' acc rest where (acc,rest) = parseT tokens
parseE' :: Expr -> [String] -> (Expr,[String])
parseE' accepted ("+":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :+: acc) rest
parseE' accepted ("-":tokens) = let (acc,rest) = parseT tokens in parseE' (accepted :-: acc) rest
parseE' accepted tokens = (accepted,tokens)
parseT :: [String] -> (Expr,[String])
parseT tokens = let (acc,rest) = parseF tokens in parseT' acc rest
parseT' :: Expr -> [String] -> (Expr,[String])
parseT' accepted ("*":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :*: acc) rest
parseT' accepted ("/":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :/: acc) rest
parseT' accepted ("%":tokens) = let (acc,rest) = parseF tokens in parseT' (accepted :%: acc) rest
parseT' accepted tokens = (accepted,tokens)
parseF :: [String] -> (Expr,[String])
parseF ("(":tokens) = (e, tail rest) where (e,rest) = parseE tokens
parseF (t:tokens)
| isAlpha (head t) = (Var t,tokens)
| isDigit (head t) = (Val (read t),tokens)
| otherwise = error ""
parseF [] = error ""
lexer :: String -> [String]
lexer [] = []
lexer (c:cs)
| elem c " \t\n" = lexer cs
| elem c "=+-*/%()" = [c]:(lexer cs)
| isAlpha c = (c:takeWhile isAlpha cs):lexer(dropWhile isAlpha cs)
| isDigit c = (c:takeWhile isDigit cs):lexer(dropWhile isDigit cs)
| otherwise = error ""

Parsing an Indexed Type

I just started exploring the possibilities of data types à la carte in combination with indexed types. My current experiment is a bit too large to include here, but can be found here. My example is mixing together an expression from different ingredients (arithmetic, functions, ...). The goal is to enforce only well-typed expressions. That is why an index is added to the expressions (the Sort type).
I can build expressions like:
-- define expressions over variables and arithmetic (+, *, numeric constants)
type Lia = IFix (VarF :+: ArithmeticF)
-- expression of integer type/sort
t :: Lia IntegralSort
t = var "c" .+. cnst 1
This is all good as long as I construct only fixed (static) expressions.
Is there a way to read an expression from string/other representation (that obviously has to encode the sort) and produce a dynamic value that gets represented by these functors?
For example, I would like to read ((c : Int) + (1 : Int)) and represent it somehow with VarF and ArithmeticF. Here I realize I cannot obtain a value of static type Lia IntegralSort. But suppose I have in addition:
data EqualityF a where
Equals :: forall s. a s -> a s -> EqualityF a BoolSort
I could expect there being a function that can read String into Maybe (IFix (EqualityF :+: VarF :+: ...)). Such a function would attempt to build representations for the LHS and RHS and if the sorts matched it could produce a result of statically known type IFix (EqualityF :+: ...) BoolSort. The problem is that the representation of LHS (and RHS) has no fixed static sort. Is what I am trying to do impossible with this representation I chose?
(.=.) :: EqualityF :<: f => IFix f s -> IFix f s -> IFix f BoolSort
(.=.) a b = inject (Equals a b)
You can use a GADT to hide the sort, allowing you to return values of sorts depending on the input. Pattern matching then allows you to recover the sort.
data Expr (f :: (Sort -> *) -> (Sort -> *)) where
BoolExpr :: IFix f BoolSort -> Expr f
IntExpr :: IFix f IntegralSort -> Expr f
Here is a simplistic parser of postfix expressions involving + and =.
parse :: (EqualityF :<: f, ArithmeticF :<: f) => String -> [Expr f] -> Maybe (Expr f)
parse (c : s) stack | isDigit c =
parse s (IntExpr (cnst (digitToInt c)) : stack)
parse ('+' : s) (IntExpr e1 : IntExpr e2 : stack) =
parse s (IntExpr (e1 .+. e2) : stack)
parse ('=' : s) (IntExpr e1 : IntExpr e2 : stack) =
parse s (BoolExpr (e1 .=. e2) : stack)
parse ('=' : s) (BoolExpr e1 : BoolExpr e2 : stack) =
parse s (BoolExpr (e1 .=. e2) : stack)
parse [] [e] = Just e
parse _ _ = Nothing
You might not like the duplicate cases for =. A more general framework is Typeable, allowing you to just test for the type equalities you need.
data SomeExpr (f :: (Sort -> *) -> Sort -> *) where
SomeExpr :: Typeable s => IFix f s -> SomeExpr f
parseSome :: forall f. (EqualityF :<: f, ArithmeticF :<: f) => String -> [SomeExpr f] -> Maybe (Expr f)
parseSome (c : s) stack | isDigit c =
parseSome s (SomeExpr (cnst (digitToInt c)) : stack)
parseSome ('+' : s) (SomeExpr e1 : SomeExpr e2 : stack) = do
e1 <- gcast e1
e2 <- gcast e2
parseSome s (SomeExpr (e1 .+. e2) : stack)
parseSome ('=' : s) (SomeExpr (e1 :: IFix f s1) : SomeExpr (e2 :: IFix f s2) : stack) = do
Refl <- eqT :: Maybe (s1 :~: s2)
parseSome s (SomeExpr (e1 .=. e2) : stack)
parseSome [] [e] = Just e
parseSome _ _ = Nothing
Edit
To parse sorts, you want to track them at the type level. Again, use an existential type.
data SomeSort where
SomeSort :: Typeable (s :: Sort) => proxy s -> SomeSort
You can construct the sort of arrays this way:
-- \i e -> array i e
arraySort :: SomeSort -> SomeSort -> SomeSort
arraySort (SomeSort (Proxy :: Proxy i)) (SomeSort (Proxy :: Proxy e)) =
SomeSort (Proxy :: Proxy (ArraySort i e))
A potential problem with Typeable here is that it only allows you to test equality of types, when you may want only to check the head constructor: you can't ask "is this type an ArraySort?", but only "is this type equal to ArraySort IntSort BoolSort?" or some other full type.
In that case you need a GADT that reflects the structure of a sort.
-- "Singleton type"
data SSort (s :: Sort) where
SIntSort :: SSort IntSort
SBoolSort :: SSort BoolSort
SArraySort :: SSort i -> SSort e -> SSort (ArraySort i e)
data SomeSort where
SomeSort :: SSort s -> SomeSort
array :: SomeSort -> SomeSort -> SomeSort
array (SomeSort i) (SomeSort e) = SomeSort (SArraySort i e)
The singleton package provides various facilities for defining and working with these singleton types, though it may be overkill for your use case.

Parsing non binary operators with Parsec

Traditionally, arithmetic operators are considered to be binary (left or right associative), thus most tools are dealing only with binary operators.
Is there an easy way to parse arithmetic operators with Parsec, which can have an arbitrary number of arguments?
For example, the following expression should be parsed into the tree
(a + b) + c + d * e + f
Yes! The key is to first solve a simpler problem, which is to model + and * as tree nodes with only two children. To add four things, we'll just use + three times.
This is a great problem to solve since there's a Text.Parsec.Expr module for just this problem. Your example is actually parseable by the example code in the documentation. I've slightly simplified it here:
module Lib where
import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
data Expr =
Identifier String
| Multiply Expr Expr
| Add Expr Expr
instance Show Expr where
show (Identifier s) = s
show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell
-- Infix parser.
infix_ operator func =
Expr.Infix (reserved operator >> return func) Expr.AssocLeft
parser =
Expr.buildExpressionParser table term <?> "expression"
where
table = [[infix_ "*" Multiply], [infix_ "+" Add]]
term =
parens parser
<|> (Identifier <$> identifier)
<?> "term"
Running this in GHCi:
λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)
There are lots of ways of converting this tree to the desired form. Here's a hacky gross slow one:
data Expr' =
Identifier' String
| Add' [Expr']
| Multiply' [Expr']
deriving (Show)
collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | (f e == False) = [e]
collect e#(Add l r) f =
collect l f ++ collect r f
collect e#(Multiply l r) f =
collect l f ++ collect r f
isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False
isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False
optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e#(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e#(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
I will note, however, that almost always Expr is Good Enough™ for the purposes of a parser or compiler.

Resources