Expression Evaluation using combinators in Haskell - parsing

I'm trying to make an expression evaluator in Haskell:
data Parser i o
  = Success o [i]
  | Failure String [i]
  | Parser
      {parse :: [i] -> Parser i o}
data Operator = Add | Sub | Mul | Div | Pow
data Expr
  = Op Operator Expr Expr
  | Val Double
expr :: Parser Char Expr
expr = add_sub
  where
    add_sub = calc Add '+' mul_div <|> calc Sub '-' mul_div <|> mul_div
    mul_div = calc Mul '*' pow <|> calc Div '/' pow <|> pow
    pow = calc Pow '^' factor <|> factor
    factor = parens <|> val
    val = Val <$> parseDouble
    parens = parseChar '(' *> expr <* parseChar ')'
    calc c o p = Op c <$> (p <* parseChar o) <*> p
My problem is that when I try to evaluate an expression with two operators with same priority (e.g. 1+1-1) the parser will fail.
How can I say that an add_sub can be an operation between two other add_subs without creating an infinite loop?

As explained by @chi, the problem is that calc was using p twice, which doesn't allow for patterns like mul_div + ... | mul_div - ... | ...
I changed the definition of calc to:
calc c o p p2 = Op c <$> (p <* parseChar o) <*> p2
where p2 is the parser for the current priority level (mul_div in the definition of mul_div).
It works much better, but the order of calculations is backwards:
2/3/4 is parsed as 2/(3/4) instead of (2/3)/4.
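One common way to get left associativity without left recursion is to parse "operand (operator operand)*" and fold the results to the left, which is what a chainl1-style combinator does. The sketch below is not the parser type from the question: it assumes ReadP from base as a stand-in, integer-only numbers, and the hypothetical helpers binary and parseAll.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Operator = Add | Sub | Mul | Div | Pow deriving (Show, Eq)

data Expr
  = Op Operator Expr Expr
  | Val Double
  deriving (Show, Eq)

-- Hypothetical helper: parse one operator character and return the
-- constructor that combines the two operands.
binary :: Char -> Operator -> ReadP (Expr -> Expr -> Expr)
binary c o = Op o <$ char c

expr :: ReadP Expr
expr = addSub
  where
    -- chainl1 parses "p (op p)*" and folds to the left,
    -- so 2/3/4 becomes (2/3)/4.
    addSub = mulDiv `chainl1` (binary '+' Add +++ binary '-' Sub)
    mulDiv = pow `chainl1` (binary '*' Mul +++ binary '/' Div)
    -- chainr1 folds to the right, the usual convention for '^'.
    pow    = factor `chainr1` binary '^' Pow
    factor = parens +++ val
    -- Simplification: only unsigned integer literals are accepted.
    val    = Val . read <$> munch1 isDigit
    parens = char '(' *> expr <* char ')'

-- Hypothetical helper: keep only a parse that consumes the whole input.
parseAll :: String -> Maybe Expr
parseAll s = case [e | (e, "") <- readP_to_S (expr <* eof) s] of
  (e : _) -> Just e
  []      -> Nothing
```

With this shape, parseAll "2/3/4" yields Op Div (Op Div (Val 2.0) (Val 3.0)) (Val 4.0). The same idea should carry over to a hand-written Parser type once it has a chainl1 equivalent.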

Related

How to parse a bool expression in Haskell

I am trying to parse a bool expression in Haskell. This line is giving me an error: BoolExpr <$> parseBoolOp <*> (n : ns). This is the error:
• Couldn't match type ‘[]’ with ‘Parser’
Expected type: Parser [Expr]
Actual type: [Expr]
-- define the expression types
data Expr
  = BoolExpr BoolOp [Expr]
  deriving (Show, Eq)
-- define the type for bool value
data Value
  = BoolVal Bool
  deriving (Show, Eq)
-- many x = Parser.some x <|> pure []
-- some x = (:) <$> x <*> Parser.many x
kstar :: Alternative f => f a -> f [a]
kstar x = kplus x <|> pure []
kplus :: Alternative f => f a -> f [a]
kplus x = (:) <$> x <*> kstar x
symbol :: String -> Parser String
symbol xs = token (string xs)
-- a bool expression is the operator followed by one or more expressions that we have to parse
-- TODO: add bool expressions
boolExpr :: Parser Expr
boolExpr = do
  n <- parseExpr
  ns <- kstar (symbol "," >> parseExpr)
  BoolExpr <$> parseBoolOp <*> (n : ns)
-- an atom is a literalExpr, which can be an actual literal or some other things
parseAtom :: Parser Expr
parseAtom =
  do
    literalExpr
-- the main parsing function which alternates between all the options you have
parseExpr :: Parser Expr
parseExpr =
  do
    parseAtom
  <|> parseParens boolExpr
  <|> parseParens parseExpr
-- implement parsing bool operations, these are 'and' and 'or'
parseBoolOp :: Parser BoolOp
parseBoolOp =
  do symbol "and" >> return And
    <|> do symbol "or" >> return Or
The boolExpr is expecting a Parser [Expr] but I am returning only an [Expr]. Is there a way to fix this or do it in another way? When I try pure (n:ns), evalStr "(and true (and false true) true)" returns Left (ParseError "'a' didn't match expected character") instead of Right (BoolVal False)
The expression (n : ns) is a list. Therefore the compiler thinks that the applicative operators <*> and <$> should be used in the context [], while you want Parser instead.
I would guess you need pure (n : ns) instead.
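Note that pure (n : ns) only fixes the type error. The ParseError on 'a' most likely comes from the order of the parsers: boolExpr tries to read the operands before the operator, while the input puts "and" first. Here is a minimal sketch of the operator-first order, assuming ReadP from base in place of the question's Parser, whitespace-separated operands as in the sample input, and a hypothetical BoolLit constructor for literals:

```haskell
import Text.ParserCombinators.ReadP

data BoolOp = And | Or deriving (Show, Eq)

data Expr
  = BoolLit Bool            -- hypothetical literal constructor
  | BoolExpr BoolOp [Expr]
  deriving (Show, Eq)

-- Run a parser, then skip trailing whitespace.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

symbol :: String -> ReadP String
symbol = token . string

parseBoolOp :: ReadP BoolOp
parseBoolOp = (And <$ symbol "and") <++ (Or <$ symbol "or")

literal :: ReadP Expr
literal = (BoolLit True <$ symbol "true") <++ (BoolLit False <$ symbol "false")

-- The operator is consumed first, then one or more operands.
boolExpr :: ReadP Expr
boolExpr = BoolExpr <$> parseBoolOp <*> many1 parseExpr

parseExpr :: ReadP Expr
parseExpr = literal <++ between (symbol "(") (symbol ")") boolExpr

eval :: Expr -> Bool
eval (BoolLit b)       = b
eval (BoolExpr And es) = all eval es
eval (BoolExpr Or es)  = any eval es

-- Returns Nothing when the input does not parse completely.
evalStr :: String -> Maybe Bool
evalStr s =
  case [e | (e, "") <- readP_to_S (skipSpaces *> parseExpr <* eof) s] of
    (e : _) -> Just (eval e)
    []      -> Nothing
```

Under these assumptions, evalStr "(and true (and false true) true)" gives Just False.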

Best ADT representation of AST

I have the following grammar for expressions that I'm trying to represent as a Haskell ADT:
Expr = SimpleExpr [OPrelation SimpleExpr]
SimpleExpr = [OPunary] Term {OPadd Term}
Term = Factor {OPmult Factor}
where:
{} means 0 or more
[] means 0 or 1
OPmult, OPadd, OPrelation, OPunary are classes of operators
Note that this grammar does get precedence right.
Here's something I tried:
data Expr = Expr SimpleExpr (Maybe OPrelation) (Maybe SimpleExpr)
data SimpleExpr = SimpleExpr (Maybe OPunary) Term [OPadd] [Term]
data Term = Term Factor [OPmult] [Factor]
which in hindsight I think is awful, especially the [OPadd] [Term] and [OPmult] [Factor] parts. Because, for example, in the parse tree for 1+2+3 it would put [+, +] in one branch and [2, 3] in another, meaning they're decoupled.
What would be a good representation that'll play nice later in the next stages of compilation?
Decomposing { } and [ ] into more data types seems like an overkill
Using lists seems not quite right as it would no longer be a tree (Just a node that's a list)
Maybe for { }. A good idea ?
And finally, I'm assuming that after parsing I'll have to pass over the parse tree and reduce it to an AST? Or should the whole grammar be modified to be less complex? Or maybe it's abstract enough?
The AST does not need to be that close to the grammar. The grammar is structured into multiple levels to encode precedence and uses repetition to avoid left-recursion while still being able to correctly handle left-associative operators. The AST does not need to worry about such things.
Instead I'd define the AST like this:
data Expr = BinaryOperation BinaryOperator Expr Expr
          | UnaryOperation UnaryOperator Expr
          | Literal LiteralValue
          | Variable Id
data BinaryOperator = Add | Sub | Mul | Div
data UnaryOperator = Not | Negate
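To connect this AST to the repetition in the grammar: a parser for Term {OPadd Term} naturally produces a first term plus a list of (operator, term) pairs, and a left fold turns that into nested binary nodes. The following is only a sketch. The Chain type and fromChain are hypothetical names, and Literal carries a plain Int here instead of LiteralValue.

```haskell
import Data.List (foldl')

data BinaryOperator = Add | Sub | Mul | Div deriving (Show, Eq)

data Expr
  = BinaryOperation BinaryOperator Expr Expr
  | Literal Int               -- simplified: the answer uses LiteralValue
  deriving (Show, Eq)

-- What a parser for "Term {OPadd Term}" would yield:
-- the first term plus a list of (operator, term) pairs.
type Chain = (Expr, [(BinaryOperator, Expr)])

-- Fold the pairs from the left so that 1+2-3 becomes (1+2)-3.
fromChain :: Chain -> Expr
fromChain (first, rest) = foldl' step first rest
  where
    step acc (op, rhs) = BinaryOperation op acc rhs
```

This keeps each operator coupled to its right operand during parsing, and the list disappears before later compiler stages ever see it.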
Here's an additional answer that might help you. I don't want to spoil your fun, so here's a very simple example grammar:
-- Expr = Term ['+' Term]
-- Term = Factor ['*' Factor]
-- Factor = number | '(' Expr ')'
-- number = one or more digits
Using a CST
As one approach, we can represent this grammar as a concrete syntax tree (CST):
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)
A Parsec-based parser to turn the concrete syntax into a CST might look like this:
expr :: Parser Expr
expr = do
  t1 <- term
  (PlusE t1 <$ symbol "+" <*> term)
    <|> pure (TermE t1)
term :: Parser Term
term = do
  f1 <- factor
  (TimesT f1 <$ symbol "*" <*> factor)
    <|> pure (FactorT f1)
factor :: Parser Factor
factor = NumberF . read <$> lexeme (many1 (satisfy isDigit))
  <|> ParenF <$> between (symbol "(") (symbol ")") expr
with helper functions for whitespace processing:
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
and main entry point:
parseExpr :: String -> Expr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
  Right e -> e
  Left err -> error $ show err
after which we can run:
> parseExpr "1+1*(3+4)"
PlusE (FactorT (NumberF 1)) (TimesT (NumberF 1) (ParenF (PlusE
(FactorT (NumberF 3)) (FactorT (NumberF 4)))))
>
To convert this into the following AST:
data AExpr -- Abstract Expression
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
we could write:
aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)
aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)
afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e
To interpret the AST, we could use:
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
and then write:
calc :: String -> Int
calc = interp . aexpr . parseExpr
after which we have a crude little calculator:
> calc "1 + 2 * (6 + 3)"
19
>
Skipping the CST
As an alternative approach, we could replace the parser with one that parses directly into an AST of type AExpr:
expr :: Parser AExpr
expr = do
  t1 <- term
  (PlusA t1 <$ symbol "+" <*> term)
    <|> pure t1
term :: Parser AExpr
term = do
  f1 <- factor
  (TimesA f1 <$ symbol "*" <*> factor)
    <|> pure f1
factor :: Parser AExpr
factor = NumberA . read <$> lexeme (many1 (satisfy isDigit))
  <|> between (symbol "(") (symbol ")") expr
You can see how little the structure of these parsers changes. All that's disappeared is the distinction between expressions, terms, and factors at the type level, and constructors like TermE, FactorT, and ParenF whose only purpose is to allow embedding of these types within each other.
In more complex scenarios, the CST and AST parsers might exhibit bigger differences. (For example, in a grammar that allowed 1 + 2 + 3, this might be represented as a single constructor data Expr = ... | PlusE [Term] | ... in the CST but with a nested series of binary PlusA constructors in the same AExpr AST type as above.)
After redefining parseExpr to return an AExpr and dropping the aexpr step from calc, everything else stays the same, and we still have:
> calc "1 + 2 * (6 + 3)"
19
>
Programs for Reference
Here's the full program using an intermediate CST:
-- Calc1.hs, using a CST
{-# OPTIONS_GHC -Wall #-}
module Calc1 where
import Data.Char
import Text.Parsec
import Text.Parsec.String
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
expr :: Parser Expr
expr = do
  t1 <- term
  (PlusE t1 <$ symbol "+" <*> term)
    <|> pure (TermE t1)
term :: Parser Term
term = do
  f1 <- factor
  (TimesT f1 <$ symbol "*" <*> factor)
    <|> pure (FactorT f1)
factor :: Parser Factor
factor = NumberF . read <$> lexeme (many1 (satisfy isDigit))
  <|> ParenF <$> between (symbol "(") (symbol ")") expr
parseExpr :: String -> Expr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
  Right e -> e
  Left err -> error $ show err
data AExpr -- Abstract Expression
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)
aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)
afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
calc :: String -> Int
calc = interp . aexpr . parseExpr
and here's the full program for the more traditional solution that skips an explicit CST representation:
-- Calc2.hs, with direct parsing to AST
{-# OPTIONS_GHC -Wall #-}
module Calc2 where
import Data.Char
import Text.Parsec
import Text.Parsec.String
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
expr :: Parser AExpr
expr = do
  t1 <- term
  (PlusA t1 <$ symbol "+" <*> term)
    <|> pure t1
term :: Parser AExpr
term = do
  f1 <- factor
  (TimesA f1 <$ symbol "*" <*> factor)
    <|> pure f1
factor :: Parser AExpr
factor = NumberA . read <$> lexeme (many1 (satisfy isDigit))
  <|> between (symbol "(") (symbol ")") expr
parseExpr :: String -> AExpr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
  Right e -> e
  Left err -> error $ show err
data AExpr -- Abstract Expression
  = NumberA Int
  | PlusA AExpr AExpr
  | TimesA AExpr AExpr
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
calc :: String -> Int
calc = interp . parseExpr
Okay, so Buhr's answer is quite nice. Here's how I did it, though (no CST), inspired by sepp2k's response:
The AST:
data OP = OPplus | OPminus | OPstar | OPdiv
        | OPidiv | OPmod | OPand | OPeq | OPneq
        | OPless | OPgreater | OPle | OPge
        | OPin | OPor
data Expr =
    Relation Expr OP Expr -- > < == >= etc..
  | Unary OP Expr -- + -
  | Mult Expr OP Expr -- * / div mod and
  | Add Expr OP Expr -- + - or
  | FactorInt Int | FactorReal Double
  | FactorStr String
  | FactorTrue | FactorFalse
  | FactorNil
  | FactorDesig Designator -- identifiers
  | FactorNot Expr
  | FactorFuncCall FuncCall deriving (Show)
The parsers:
parseExpr :: Parser Expr
parseExpr = (try $ Relation <$>
        parseSimpleExpr <*> parseOPrelation <*> parseSimpleExpr)
    <|> parseSimpleExpr
parseSimpleExpr :: Parser Expr
parseSimpleExpr = (try simpleAdd)
    <|> (try $ Unary <$> parseOPunary <*> simpleAdd)
    <|> (try $ Unary <$> parseOPunary <*> parseSimpleExpr)
    <|> parseTerm
  where simpleAdd = Add <$> parseTerm <*> parseOPadd <*> parseSimpleExpr
parseTerm :: Parser Expr
parseTerm = (try $ Mult <$>
        parseFactor <*> parseOPmult <*> parseTerm)
    <|> parseFactor
parseFactor :: Parser Expr
parseFactor =
      (parseKWnot >> FactorNot <$> parseFactor)
  <|> (exactTok "true" >> return FactorTrue)
  <|> (exactTok "false" >> return FactorFalse)
  <|> (parseNumber)
  <|> (FactorStr <$> parseString)
  <|> (betweenCharTok '(' ')' parseExpr)
  <|> (FactorDesig <$> parseDesignator)
  <|> (FactorFuncCall <$> parseFuncCall)
I didn't include basic parsers like parseOPadd as those are what you'd expect and are easy to build.
I still parsed according to the grammar but tweaked it slightly to match my AST.
You could check out the full source which is a compiler for Pascal here.

Parsing an expression grammar having function application with parser combinators (left-recursion)

As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so on). Its syntax features the following constructs:
integer numbers
identifiers ([a-zA-Z]+)
arithmetic expressions with + and * and parenthesis
structure access with . (eg foo.bar.buz)
tuples (eg (1, foo, bar.buz)) (to remove ambiguity one-tuples are written as (x,))
function application (eg foo(1, bar, buz()))
functions are first class so they can also be returned from other functions and directly be applied (eg foo()() is legal because foo() might return a function)
So a fairly complex program in this language is
(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)
the associativity is supposed to be
( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )
I'm currently using the very nice uu-parsinglib an applicative parser combinator library.
The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr)) is left-recursive. But I could solve that problem using the pChainl combinator (see parseExpr in the example below).
The remaining problem (hence this question) is function application with functions returned from other functions (f()()). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list ). Any ideas how I can solve this problem elegantly using uu-parsinglib? (the problem should directly apply to parsec, attoparsec and other parser combinators as well I guess).
See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
module TestExprGrammar
  (
  ) where
import Data.Foldable (asum)
import Data.List (intercalate)
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
data Node =
    NumberLiteral Integer
  | Identifier String
  | Tuple [Node]
  | MemberAccess Node Node
  | FunctionCall Node [Node]
  | BinaryOperation String Node Node
parseFunctionCall :: Parser Node
parseFunctionCall =
  FunctionCall <$>
    parseIdentifier {- `parseExpr' would be correct but left-recursive -}
    <*> parseParenthesisedNodeList 0
operators :: [[(Char, Node -> Node -> Node)]]
operators = [ [('+', BinaryOperation "+")]
            , [('*' , BinaryOperation "*")]
            , [('.', MemberAccess)]
            ]
samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]
parseExpr :: Parser Node
parseExpr =
  foldr pChainl
        (parseIdentifier
         <|> parseNumber
         <|> parseTuple
         <|> parseFunctionCall
         <|> pParens parseExpr
        )
        (map samePrio operators)
parseNodeList :: Int -> Parser [Node]
parseNodeList n =
  case n of
    _ | n < 0 -> parseNodeList 0
    0 -> pListSep (pSymbol ",") parseExpr
    n -> (:) <$>
           parseExpr
           <* pSymbol ","
           <*> parseNodeList (n-1)
parseParenthesisedNodeList :: Int -> Parser [Node]
parseParenthesisedNodeList n = pParens (parseNodeList n)
parseIdentifier :: Parser Node
parseIdentifier = Identifier <$> pSome pLetter <* pSpaces
parseNumber :: Parser Node
parseNumber = NumberLiteral <$> pNatural
parseTuple :: Parser Node
parseTuple =
  Tuple <$> parseParenthesisedNodeList 1
  <|> Tuple [] <$ pSymbol "()"
instance Show Node where
  show n =
    let showNodeList ns = intercalate ", " (map show ns)
        showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
    in case n of
         Identifier i -> i
         Tuple ns -> showParenthesisedNodeList ns
         NumberLiteral n -> show n
         FunctionCall f args -> show f ++ showParenthesisedNodeList args
         MemberAccess f g -> show f ++ "." ++ show g
         BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"
Looking briefly at the list-like combinators for uu-parsinglib (I'm more familiar with parsec), I think you can solve this by folding over the result of the pSome combinator:
-- foldl' comes from Data.List: import Data.List (foldl')
parseFunctionCall :: Parser Node
parseFunctionCall =
  foldl' FunctionCall <$>
    parseIdentifier {- `parseExpr' would be correct but left-recursive -}
    <*> pSome (parseParenthesisedNodeList 0)
This is also equivalent to the Alternative some combinator, which should indeed apply to the other parsing libs you mentioned.
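To make the folding idea concrete without depending on uu-parsinglib, here is a small self-contained sketch using ReadP from base. It is an approximation rather than a drop-in replacement: the Node type is cut down to two constructors, whitespace is not handled, the helper names are hypothetical, and it uses many (zero or more argument lists), so a bare identifier is accepted as well.

```haskell
import Data.Char (isAlpha)
import Data.List (foldl')
import Text.ParserCombinators.ReadP

data Node
  = Identifier String
  | FunctionCall Node [Node]
  deriving (Show, Eq)

identifier :: ReadP Node
identifier = Identifier <$> munch1 isAlpha

-- A parenthesised, comma-separated argument list: "()", "(a)", "(a,b())".
argList :: ReadP [Node]
argList = between (char '(') (char ')') (call `sepBy` char ',')

-- An identifier followed by zero or more argument lists.
-- foldl' nests the calls to the left: f(a)(b) is (f(a))(b).
call :: ReadP Node
call = foldl' FunctionCall <$> identifier <*> many argList

-- Hypothetical helper: keep only a parse that consumes the whole input.
parseCall :: String -> Maybe Node
parseCall s = case [n | (n, "") <- readP_to_S (call <* eof) s] of
  (n : _) -> Just n
  []      -> Nothing
```

For example, parseCall "foo()()" gives FunctionCall (FunctionCall (Identifier "foo") []) [], which is the left-nested shape the question asks for.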
I don't know this library but can show you how to remove left recursion. The standard right recursive expression grammar is
E -> T E'
E' -> + T E' | eps
T -> F T'
T' -> * F T' | eps
F -> NUMBER | ID | ( E )
To add function application you must decide its level of precedence. In most languages I've seen it is highest. So you'd add another layer of productions for function application.
E -> T E'
E' -> + T E' | eps
T -> A T'
T' -> * A T' | eps
A -> F A'
A' -> ( E ) A' | eps
F -> NUMBER | ID | ( E )
Yes this is a hairy-looking grammar and bigger than the left recursive one. That's the price of top-down predictive parsing. If you want simpler grammars use a bottom up parser generator a la yacc.
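For what it's worth, this grammar maps almost mechanically onto a hand-written predictive parser, one function per nonterminal. The sketch below is only an illustration under a few assumptions: no whitespace handling, a single argument per call (as in the A' production), and a made-up Node type. The primed functions take the left operand as an accumulator, which is what keeps + , * and application left-associative even though the grammar is right-recursive.

```haskell
import Data.Char (isAlpha, isDigit)

data Node
  = Num Int
  | Ident String
  | Bin Char Node Node
  | App Node Node
  deriving (Show, Eq)

-- A parser consumes a prefix of the input and returns the rest.
type P a = String -> Maybe (a, String)

-- E -> T E'
parseE :: P Node
parseE s = parseT s >>= uncurry parseE'

-- E' -> + T E' | eps   (left operand threaded through as an accumulator)
parseE' :: Node -> P Node
parseE' l ('+' : s) = parseT s >>= \(r, s') -> parseE' (Bin '+' l r) s'
parseE' l s         = Just (l, s)

-- T -> A T'
parseT :: P Node
parseT s = parseA s >>= uncurry parseT'

-- T' -> * A T' | eps
parseT' :: Node -> P Node
parseT' l ('*' : s) = parseA s >>= \(r, s') -> parseT' (Bin '*' l r) s'
parseT' l s         = Just (l, s)

-- A -> F A'
parseA :: P Node
parseA s = parseF s >>= uncurry parseA'

-- A' -> ( E ) A' | eps
parseA' :: Node -> P Node
parseA' f ('(' : s) = case parseE s of
  Just (arg, ')' : s') -> parseA' (App f arg) s'
  _                    -> Nothing
parseA' f s = Just (f, s)

-- F -> NUMBER | ID | ( E )
parseF :: P Node
parseF ('(' : s) = case parseE s of
  Just (e, ')' : s') -> Just (e, s')
  _                  -> Nothing
parseF s@(c : _)
  | isDigit c = let (ds, rest) = span isDigit s in Just (Num (read ds), rest)
  | isAlpha c = let (cs, rest) = span isAlpha s in Just (Ident cs, rest)
parseF _ = Nothing
```

Here parseE "f(1)(x)+2*3" returns the tree Bin '+' (App (App (Ident "f") (Num 1)) (Ident "x")) (Bin '*' (Num 2) (Num 3)) together with an empty remainder.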

How do I create a Parser data?

I am trying to learn how can I do a parser for expressions in Haskell and I found this code (below), but I don't even know how to use it.
I tried with: expr (Add (Num 5) (Num 2)) , but it needs a "Parser" data type.
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Expr
import Text.Parsec.Token
import Text.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr | Sub Expr Expr | Mul Expr Expr | Div Expr Expr deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
    <?> "expression"
table = [[op "*" Mul AssocLeft, op "/" Div AssocLeft],
         [op "+" Add AssocLeft, op "-" Sub AssocLeft]]
  where
    op s f assoc = Infix (do{ string s; return f}) assoc
factor = do{ char '('
           ; x <- expr
           ; char ')'
           ; return x}
     <|> number
     <|> variable
     <?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
           ; return (Num (read ds))}
     <?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
             ; return (Var ds)}
     <?> "variable"
Solution: readExpr input = parse expr "name for error messages" input
and use readExpr.
You can use the function parse, which runs a Parser on an input string and returns an Either ParseError Expr. Below is a simple usage example where I turn the ParseError into a string and pass it along:
readExpr :: String -> Either String Expr
readExpr input = case parse expr "name for error messages" input of
  Left err -> Left $ "Oh noes parsers are failing: " ++ show err -- Handle error
  Right a -> Right a -- Handle success
There are a few other functions, such as parseFromFile, that provide shorthand for common patterns. To find them, check out the parsec Haddock documentation.

Parsing function in haskell

I'm new to Haskell and I am trying to parse expressions. I found out about Parsec and I also found some articles but I don't seem to understand what I have to do. My problem is that I want to give an expression like "x^2+2*x+3" and the result to be a function that takes an argument x and returns a value. I am very sorry if this is an easy question but I really need some help. Thanks! The code I inserted is from the article that you can find on this link.
import Control.Monad(liftM)
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Token
import Text.ParserCombinators.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr
          | Sub Expr Expr | Mul Expr Expr | Div Expr Expr
          | Pow Expr Expr
          deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
    <?> "expression"
table = [[op "^" Pow AssocRight],
         [op "*" Mul AssocLeft, op "/" Div AssocLeft],
         [op "+" Add AssocLeft, op "-" Sub AssocLeft]]
  where
    op s f assoc
      = Infix (do{ string s; return f}) assoc
factor = do{ char '('
           ; x <- expr
           ; char ')'
           ; return x}
     <|> number
     <|> variable
     <?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
           ; return (Num (read ds))}
     <?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
             ; return (Var ds)}
     <?> "variable"
This is just a parser for expressions with variables. Actually interpreting the expression is an entirely separate matter.
You should create a function that takes an already parsed expression and values for variables, and returns the result of evaluating the expression. Pseudocode:
evaluate :: Expr -> Map String Int -> Int
evaluate (Num n) _ = n
evaluate (Var x) vars = {- Look up the value of x in vars -}
evaluate (Add e f) vars = {- Evaluate e and f, and return their sum -}
...
I've deliberately omitted some details; hopefully by exploring the missing parts, you learn more about Haskell.
As a next step, you should probably look at the Reader monad for a convenient way to pass the variable map vars around, and using Maybe or Error to signal errors, e.g. referencing a variable that is not bound in vars, or division by zero.
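If you get stuck, here is one possible way to fill in the missing parts, as a sketch rather than the definitive version. It makes a few assumptions of its own: the variable bindings are an association list instead of a Map (to stay within base), errors are signalled with Maybe, and toFunction is a hypothetical helper that fixes the single variable "x".

```haskell
data Expr
  = Num Int | Var String
  | Add Expr Expr | Sub Expr Expr | Mul Expr Expr | Div Expr Expr
  | Pow Expr Expr
  deriving Show

-- Variable bindings as an association list (an assumption made here to
-- avoid extra dependencies; a Map would work the same way).
type Env = [(String, Int)]

-- Nothing signals an unbound variable, a division by zero,
-- or a negative exponent.
evaluate :: Expr -> Env -> Maybe Int
evaluate (Num n)   _    = Just n
evaluate (Var x)   vars = lookup x vars
evaluate (Add a b) vars = (+) <$> evaluate a vars <*> evaluate b vars
evaluate (Sub a b) vars = (-) <$> evaluate a vars <*> evaluate b vars
evaluate (Mul a b) vars = (*) <$> evaluate a vars <*> evaluate b vars
evaluate (Div a b) vars = do
  x <- evaluate a vars
  y <- evaluate b vars
  if y == 0 then Nothing else Just (x `div` y)
evaluate (Pow a b) vars = do
  x <- evaluate a vars
  y <- evaluate b vars
  if y < 0 then Nothing else Just (x ^ y)

-- Hypothetical helper: turn a parsed expression into a function of "x".
toFunction :: Expr -> Int -> Maybe Int
toFunction e x = evaluate e [("x", x)]
```

With the tree that the parser should produce for "x^2+2*x+3", namely Add (Add (Pow (Var "x") (Num 2)) (Mul (Num 2) (Var "x"))) (Num 3), toFunction applied to 2 gives Just 11.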
