This question already has an answer here:
Parse expression right-to-left
(1 answer)
Closed 3 years ago.
I'm having a hard time defining a left-to-right arithmetic expression parser in Haskell. What I have done so far is defining a right-to-left parser, following the "G. Hutton, Programming in Haskell" book.
-- the aexpr integer parser
aexpr_int :: Parser Int
aexpr_int = do
a1 <- aterm_int
s <- sign -- return 1 for "+" and -1 for "-"
a2 <- aexpr_int
return (a1 + (s * a2))
<|>
aterm_int
-- the aterm integer parser
aterm_int :: Parser Int
aterm_int = do
a1 <- aterm_int
char '*'
a2 <- afactor_int
return (a1 * a2);
<|>
do
a1 <- afactor_int
char '/'
a2 <- aterm_int
return (div a1 a2)
<|>
afactor_int
-- afactor_int
afactor_int :: Parser Int
afactor_int = do
token (char '(')
e <- aexpr_int
token (char ')')
return e
<|>
do
s <- sign
ic <- aexpr_int
return (s * ic)
<|>
token int_const
So this parses 1 - 2 - 3 - 4 as 1 - (2 - (3 - 4)), but I want it to parse as ((1 - 2) - 3) - 4. How can I reach this result?
The trick here is to define an aexpr_int parser that first parses an aterm_int, and then recursively, using a helper function on an accumulating expression, checks for multiple additional occurrences of sign >> aterm_int, adding the additional terms to the accumulator. It might look something like this:
-- the aexpr integer parser
aexpr_int :: Parser Int
aexpr_int = do
a1 <- aterm_int
go a1
where go expr =
do
s <- sign
a2 <- aterm_int
go (expr + (s * a2))
<|> return expr
Together with the following:
-- the aterm integer parser
aterm_int :: Parser Int
aterm_int = do
a1 <- afactor_int
go a1
where go expr =
do
char '*'
a2 <- afactor_int
go (expr * a2)
<|>
do
char '/'
a2 <- afactor_int
go (div expr a2)
<|> return expr
-- afactor_int
afactor_int :: Parser Int
afactor_int = do
token (char '(')
e <- aexpr_int
token (char ')')
return e
<|>
do
s <- sign
ic <- afactor_int
return (s * ic)
<|>
token int_const
this seems to work correctly:
> parseTest aexpr_int "1-2-3-4"
-8
Note that, if you're using a real parser library instead of trying to write your own parser code for learning purposes, you'll want to use the library's built-in expression parser or combinators with names like chainl or sepBy to accomplish this.
Related
For the MVE code below it outputs [] rather than the expected Not (Oper Eq 2 2)) for the input parseString "2+2" which is supposed to call pOper. My guess is that pOper would expect three arguments for the anonymous function to work. That is 3 strings. However due to partial call of a function only one argument is passed. Is there a way to work around to preserve the type signature of pOper while dealing with the Not and at the same time not changing the type definitions?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
data Value =
IntVal Int
deriving (Eq, Show, Read)
data Exp =
Const Value
| Oper Op Exp Exp
| Not Exp
deriving (Eq, Show, Read)
data Op = Plus | Minus | Eq
deriving (Eq, Show, Read)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
space1 :: Parser String
space1 = many1 space
symbol :: String -> Parser String
symbol = token . string
token :: Parser a -> Parser a
token combinator = (do spaces
combinator)
parseString input = readP_to_S (do
e <- pExpr
token eof
return e) input
pExpr :: Parser Exp
pExpr = chainl1 pTerm pOper
pTerm :: Parser Exp
pTerm =
(do
pv <- numConst
skipSpaces
return pv)
numConst :: Parser Exp
numConst =
(do
skipSpaces
y <- munch isDigit
return (Const (IntVal (read y)))
)
-- Parser for an operator
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = symbol "+" >> return (Oper Plus)
<|> (symbol "-" >> return (Oper Minus))
<|> (symbol "=" >> return (Oper Eq))
<|> (symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2)))
There's nothing wrong with your parser for !=. Rather, your parser for operators in general is broken: it only parses the first operator correctly. A simpler version of your pOper would be
pOper = a >> b
<|> (c >> d)
But because of precedence, this isn't the same as (a >> b) <|> (c >> d). Actually, it's a >> (b <|> (c >> d))! So the symbol your first alternative parses is accidentally mandatory. It would parse 2+!=2 instead.
So, you could fix this by just adding in the missing parentheses. But if, like me, you find it a little tacky to rely so much on operator precedence for semantic meaning, consider something that's more obviously safe, using the type system to separate the clauses from the delimiters:
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = asum [ symbol "+" >> return (Oper Plus)
, symbol "-" >> return (Oper Minus)
, symbol "=" >> return (Oper Eq)
, symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2))
]
This way, you have a list of independent parsers, not a single parser built with alternation. asum (from Control.Applicative) does the work of combining that list into alternatives. It means the same thing, of course, but it means you don't have to learn any operator precedence tables, because , can only be a list item separator.
The best way I can think of to solve the problem is by creating these to modificatoins: 1) this alternative in the expression
pExpr :: Parser Exp
pExpr =
(do pv <- chainl1 pTerm pOper
pv2 <- pOper2 pv
return pv2)
<|> chainl1 pTerm pOper
And 2) this helper function to deal with infix patterns
pOper2 :: Exp -> Parser Exp
pOper2 e1 = (do
symbol "!="
e2 <- numConst
return (Not (Oper Eq e1 e2)))
This is the output, althought I don't know if there will be problems if other operations such as / and * which has different associativety are to be taken into account as well.
parseString "2+4+6"
[(Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6)),"")]
ghci> parseString "2+4+6 != 2"
[(Not (Oper Eq (Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6))) (Const (IntVal 2))),"")]
ghci> parseString "2 != 4"
[(Not (Oper Eq (Const (IntVal 2)) (Const (IntVal 4))),"")]
I am trying to parse a bool expression in Haskell. This line is giving me an error: BoolExpr <$> parseBoolOp <*> (n : ns). This is the error:
• Couldn't match type ‘[]’ with ‘Parser’
Expected type: Parser [Expr]
Actual type: [Expr]
-- define the expression types
data Expr
= BoolExpr BoolOp [Expr]
deriving (Show, Eq)
-- define the type for bool value
data Value
= BoolVal Bool
deriving (Show, Eq)
-- many x = Parser.some x <|> pure []
-- some x = (:) <$> x <*> Parser.many x
kstar :: Alternative f => f a -> f [a]
kstar x = kplus x <|> pure []
kplus :: Alternative f => f a -> f [a]
kplus x = (:) <$> x <*> kstar x
symbol :: String -> Parser String
symbol xs = token (string xs)
-- a bool expression is the operator followed by one or more expressions that we have to parse
-- TODO: add bool expressions
boolExpr :: Parser Expr
boolExpr = do
n <- parseExpr
ns <- kstar (symbol "," >> parseExpr)
BoolExpr <$> parseBoolOp <*> (n : ns)
-- an atom is a literalExpr, which can be an actual literal or some other things
parseAtom :: Parser Expr
parseAtom =
do
literalExpr
-- the main parsing function which alternates between all the options you have
parseExpr :: Parser Expr
parseExpr =
do
parseAtom
<|> parseParens boolExpr
<|> parseParens parseExpr
-- implement parsing bool operations, these are 'and' and 'or'
parseBoolOp :: Parser BoolOp
parseBoolOp =
do symbol "and" >> return And
<|> do symbol "or" >> return Or
The boolExpr is expecting a Parser [Expr] but I am returning only an [Expr]. Is there a way to fix this or do it in another way? When I try pure (n:ns), evalStr "(and true (and false true) true)" returns Left (ParseError "'a' didn't match expected character") instead of Right (BoolVal False)
The expression (n : ns) is a list. Therefore the compiler thinks that the applicative operators <*> and <$> should be used in the context [], while you want Parser instead.
I would guess you need pure (n : ns) instead.
It is true that Parsec has chainl and chainr to parse chains of either left-associative or right-associative operations (i.e. a -> a -> a). So I could quite easily parse something like x + y + z in a ((a + y) + z) or (a + (y + z)) manner.
However,
there is no standard way to parse a -> b -> c functions and specific case when a = b: a -> a -> c, for example a = b = c thought as a comparison function (a -> a -> Bool);
there is no standard way to implement "importance" of an operation: for example a + b = b + a should be parsed as ((a + b) = (b + a)) and not (((a + b) = b) + a)).
I am kind of new to parsing problems, so it would be great to get answers for both questions.
Okay, here's a long answer that might help. First, these are the imports I'm using, if you want to follow along:
{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Applicative (some)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String
Why a -> a -> a isn't so bad...
The operator type signature a -> a -> a is less restrictive and makes more sense than you might at first think. One key point is that usually when we're parsing expressions, we don't write a parser to evaluate them directly but rather parse them into some intermediate abstract syntax tree (AST) that is later evaluated. For example, consider a simple untyped AST with addition, subtraction, equality, and boolean connectives:
data Expr
= IntE Int -- integer literals
| FalseE | TrueE -- boolean literals (F, T)
| AddE Expr Expr -- x + y
| SubE Expr Expr -- x - y
| EqE Expr Expr -- x = y
| OrE Expr Expr -- x | y
| AndE Expr Expr -- x & y
deriving (Show)
If we want to write a parser to treat all these operators as left associative at the same precedence level, we can write a chainl-based parser like so. (For simplicity, this parser doesn't permit whitespace.)
expr :: Parser Expr
expr = chainl1 term op
where op = AddE <$ char '+'
<|> SubE <$ char '-'
<|> EqE <$ char '='
<|> OrE <$ char '|'
<|> AndE <$ char '&'
term :: Parser Expr
term = IntE . read <$> some digit
<|> FalseE <$ char 'F' <|> TrueE <$ char 'T'
<|> parens expr
parens :: Parser a -> Parser a
parens = between (char '(') (char ')')
and we get:
> parseTest expr "1+2+3"
AddE (AddE (IntE 1) (IntE 2)) (IntE 3)
> parseTest expr "1=2=F"
EqE (EqE (IntE 1) (IntE 2)) FalseE
>
We'd then leave it up to the interpreter to deal with the types (i.e., to type check the program):
data Value = BoolV Bool | IntV Int deriving (Eq, Show)
eval :: Expr -> Value
eval (IntE x) = IntV x
eval FalseE = BoolV False
eval TrueE = BoolV True
eval (AddE e1 e2)
= let IntV v1 = eval e1 -- pattern match ensures right type
IntV v2 = eval e2
in IntV (v1 + v2)
eval (SubE e1 e2)
= let IntV v1 = eval e1
IntV v2 = eval e2
in IntV (v1 - v2)
eval (EqE e1 e2) = BoolV (eval e1 == eval e2) -- equal if same type and value
eval (OrE e1 e2)
= let BoolV v1 = eval e1
BoolV v2 = eval e2
in BoolV (v1 || v2)
eval (AndE e1 e2)
= let BoolV v1 = eval e1
BoolV v2 = eval e2
in BoolV (v1 && v2)
evalExpr :: String -> Value
evalExpr str = let Right e = parse expr "<evalExpr>" str in eval e
giving:
> evalExpr "1+2+3"
IntV 6
> evalExpr "1=2=F"
BoolV True
>
Note that even though the type of the "=" operator is something like Eq a => a -> a -> Bool (or actually a -> b -> Bool, as we allow comparison of unequal types), it's represented in the AST as the constructor EqE of type Expr -> Expr -> Expr, so the a -> a -> a type makes sense.
Even if we were to combine the parser and evaluator above into a single function, we'd probably find it easiest to use a dynamic Value type, so all operators would be of type Value -> Value -> Value which fits into the a -> a -> a pattern:
expr' :: Parser Value
expr' = chainl1 term' op
where op = add <$ char '+'
<|> sub <$ char '-'
<|> eq <$ char '='
<|> or <$ char '|'
<|> and <$ char '&'
add (IntV x) (IntV y) = IntV $ x + y
sub (IntV x) (IntV y) = IntV $ x - y
eq v1 v2 = BoolV $ v1 == v2
or (BoolV x) (BoolV y) = BoolV $ x || y
and (BoolV x) (BoolV y) = BoolV $ x && y
term' :: Parser Value
term' = IntV . read <$> some digit
<|> BoolV False <$ char 'F' <|> BoolV True <$ char 'T'
<|> parens expr'
This works too, with the parser directly evaluating the expression
> parseTest expr' "1+2+3"
IntV 6
> parseTest expr' "1=2=F"
BoolV True
>
You may find this use of dynamic typing during parsing and evaluation a little unsatifactory, but see below.
Operator Precedence
The standard way of adding operator precedence is to define multiple expression "levels" that work with a subset of the operators. If we want a precedence ordering from highest to lowest of addition/subtraction, then equality, then boolean "and", then boolean "or", we could replace expr' with the following. Note that each chainl1 call uses as "terms" the next (higher-precedence) expression level:
expr0 :: Parser Value
expr0 = chainl1 expr1 op
where op = or <$ char '|'
or (BoolV x) (BoolV y) = BoolV $ x || y
expr1 :: Parser Value
expr1 = chainl1 expr2 op
where op = and <$ char '&'
and (BoolV x) (BoolV y) = BoolV $ x && y
expr2 :: Parser Value
expr2 = chainl1 expr3 op
where op = eq <$ char '='
eq v1 v2 = BoolV $ v1 == v2
expr3 :: Parser Value
expr3 = chainl1 term'' op
where op = add <$ char '+' -- two operators at same precedence
<|> sub <$ char '-'
add (IntV x) (IntV y) = IntV $ x + y
sub (IntV x) (IntV y) = IntV $ x - y
term'' :: Parser Value
term'' = IntV . read <$> some digit
<|> BoolV False <$ char 'F' <|> BoolV True <$ char 'T'
<|> parens expr0
After which:
> parseTest expr0 "(1+5-6=2-3+1&2+2=4)=(T|F)"
BoolV True
>
As this can be tedious, Parsec provides a Text.Parsec.Expr that makes this easier. The following replaces expr0 through expr3 above:
expr0' :: Parser Value
expr0' = buildExpressionParser table term''
where table = [ [binary '+' add, binary '-' sub]
, [binary '=' eq]
, [binary '&' and]
, [binary '|' or]
]
binary c op = Infix (op <$ char c) AssocLeft
add (IntV x) (IntV y) = IntV $ x + y
sub (IntV x) (IntV y) = IntV $ x - y
eq v1 v2 = BoolV $ v1 == v2
and (BoolV x) (BoolV y) = BoolV $ x && y
or (BoolV x) (BoolV y) = BoolV $ x || y
Typed Parsing
You may find it strange above that we use an untyped AST (i.e., everything's an Expr) and dynamically typed Value instead of using Haskell's type system in the parsing. It is possible to design a parser where the operators actually have expected Haskell types. In the language above, equality causes a bit of an issue, but if we permit integer equality only, it's possible to write a typed parser/evaluator as follows. Here bexpr and iexpr are for boolean-valued and integer-values expressions respectively.
bexpr0 :: Parser Bool
bexpr0 = chainl1 bexpr1 op
where op = (||) <$ char '|'
bexpr1 :: Parser Bool
bexpr1 = chainl1 bexpr2 op
where op = (&&) <$ char '&'
bexpr2 :: Parser Bool
bexpr2 = False <$ char 'F' <|> True <$ char 'T'
<|> try eqexpr
<|> parens bexpr0
where eqexpr = (==) <$> iexpr3 <* char '=' <*> iexpr3 -- this can't chain now
iexpr3 :: Parser Int
iexpr3 = chainl1 iterm op
where op = (+) <$ char '+'
<|> (-) <$ char '-'
iterm :: Parser Int
iterm = read <$> some digit
<|> parens iexpr3
Note that we're still able to use chainl1, but there's a boundary between the integer and boolean types enforced by precedence, so we only ever chain Int -> Int -> Int or Bool -> Bool -> Bool operators, and we don't let the Int -> Int -> Bool integer equality operator chain.
This also means we need to use a different parser to parse a boolean versus an integer expression:
> parseTest bexpr0 "1+2=3"
True
> parseTest iexpr3 "1+2-3" -- iexpr3 is top-most integer expression parser
0
>
Note here that if you wanted integer equality to chain as a set of equalities so that 1+1=2=3-1 would check that all three terms are equal, you could do this with chainl1 using some trickery with lists and singleton values, but it's easier to use sepBy1 and replace eqexpr above with the definition:
eqexpr' = do
x:xs <- sepBy1 iexpr3 (char '=')
return $ all (==x) xs
giving:
> parseTest bexpr0 "1+1=2=3-1"
True
The whole program
To summarize, here's all the code:
{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Applicative (some)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String
-- * Untyped parser to AST
data Expr
= IntE Int -- integer literals
| FalseE | TrueE -- boolean literals (F, T)
| AddE Expr Expr -- x + y
| SubE Expr Expr -- x - y
| EqE Expr Expr -- x = y
| OrE Expr Expr -- x | y
| AndE Expr Expr -- x & y
deriving (Show)
expr :: Parser Expr
expr = chainl1 term op
where op = AddE <$ char '+'
<|> SubE <$ char '-'
<|> EqE <$ char '='
<|> OrE <$ char '|'
<|> AndE <$ char '&'
term :: Parser Expr
term = IntE . read <$> some digit
<|> FalseE <$ char 'F' <|> TrueE <$ char 'T'
<|> parens expr
parens :: Parser a -> Parser a
parens = between (char '(') (char ')')
-- * Interpreter
data Value = BoolV Bool | IntV Int deriving (Eq, Show)
eval :: Expr -> Value
eval (IntE x) = IntV x
eval FalseE = BoolV False
eval TrueE = BoolV True
eval (AddE e1 e2)
= let IntV v1 = eval e1 -- pattern match ensures right type
IntV v2 = eval e2
in IntV (v1 + v2)
eval (SubE e1 e2)
= let IntV v1 = eval e1
IntV v2 = eval e2
in IntV (v1 - v2)
eval (EqE e1 e2) = BoolV (eval e1 == eval e2) -- equal if same type and value
eval (OrE e1 e2)
= let BoolV v1 = eval e1
BoolV v2 = eval e2
in BoolV (v1 || v2)
eval (AndE e1 e2)
= let BoolV v1 = eval e1
BoolV v2 = eval e2
in BoolV (v1 && v2)
-- * Combined parser/interpreter with no intermediate AST
expr' :: Parser Value
expr' = chainl1 term' op
where op = add <$ char '+'
<|> sub <$ char '-'
<|> eq <$ char '='
<|> or <$ char '|'
<|> and <$ char '&'
add (IntV x) (IntV y) = IntV $ x + y
sub (IntV x) (IntV y) = IntV $ x - y
eq v1 v2 = BoolV $ v1 == v2
or (BoolV x) (BoolV y) = BoolV $ x || y
and (BoolV x) (BoolV y) = BoolV $ x && y
term' :: Parser Value
term' = IntV . read <$> some digit
<|> BoolV False <$ char 'F' <|> BoolV True <$ char 'T'
<|> parens expr'
-- * Parser/interpreter with operator precendence
expr0 :: Parser Value
expr0 = chainl1 expr1 op
where op = or <$ char '|'
or (BoolV x) (BoolV y) = BoolV $ x || y
expr1 :: Parser Value
expr1 = chainl1 expr2 op
where op = and <$ char '&'
and (BoolV x) (BoolV y) = BoolV $ x && y
expr2 :: Parser Value
expr2 = chainl1 expr3 op
where op = eq <$ char '='
eq v1 v2 = BoolV $ v1 == v2
expr3 :: Parser Value
expr3 = chainl1 term'' op
where op = add <$ char '+' -- two operators at same precedence
<|> sub <$ char '-'
add (IntV x) (IntV y) = IntV $ x + y
sub (IntV x) (IntV y) = IntV $ x - y
term'' :: Parser Value
term'' = IntV . read <$> some digit
<|> BoolV False <$ char 'F' <|> BoolV True <$ char 'T'
<|> parens expr0
-- * Alternate implementation using buildExpressionParser
expr0' :: Parser Value
expr0' = buildExpressionParser table term''
where table = [ [binary '+' add, binary '-' sub]
, [binary '=' eq]
, [binary '&' and]
, [binary '|' or]
]
binary c op = Infix (op <$ char c) AssocLeft
add (IntV x) (IntV y) = IntV $ x + y
sub (IntV x) (IntV y) = IntV $ x - y
eq v1 v2 = BoolV $ v1 == v2
and (BoolV x) (BoolV y) = BoolV $ x && y
or (BoolV x) (BoolV y) = BoolV $ x || y
-- * Typed parser/interpreter with separate boolean and integer expressions
bexpr0 :: Parser Bool
bexpr0 = chainl1 bexpr1 op
where op = (||) <$ char '|'
bexpr1 :: Parser Bool
bexpr1 = chainl1 bexpr2 op
where op = (&&) <$ char '&'
bexpr2 :: Parser Bool
bexpr2 = False <$ char 'F' <|> True <$ char 'T'
<|> try eqexpr
<|> parens bexpr0
where eqexpr = (==) <$> iexpr3 <* char '=' <*> iexpr3 -- this can't chain now
iexpr3 :: Parser Int
iexpr3 = chainl1 iterm op
where op = (+) <$ char '+'
<|> (-) <$ char '-'
iterm :: Parser Int
iterm = read <$> some digit
<|> parens iexpr3
-- * Alternate definition of eqexpr to allow 4=2+2=1+3
eqexpr' = do
x:xs <- sepBy1 iexpr3 (char '=')
return $ all (==x) xs
I have the following language I am trying to parse.
formula ::= true
| false
| var
| formula & formula
| ∀ var . formula
| (formula)
var ::= letter { letter | digit }*
I have been following this article on the Haskell wiki and have used Parsec combinators to create the following parser.
module LogicParser where
import System.IO
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token
-- Data Structures
data Formula = Var String
| TT
| FF
| Con Formula Formula
| Forall String Formula
deriving (Show)
-- Language Definition
lang :: LanguageDef st
lang =
emptyDef{ Token.identStart = letter
, Token.identLetter = alphaNum
, Token.opStart = oneOf "&."
, Token.opLetter = oneOf "&."
, Token.reservedOpNames = ["&", "."]
, Token.reservedNames = ["tt", "ff", "forall"]
}
-- Lexer for langauge
lexer =
Token.makeTokenParser lang
-- Trivial Parsers
identifier = Token.identifier lexer
keyword = Token.reserved lexer
op = Token.reservedOp lexer
roundBrackets = Token.parens lexer
whiteSpace = Token.whiteSpace lexer
-- Main Parser, takes care of trailing whitespaces
formulaParser :: Parser Formula
formulaParser = whiteSpace >> formula
-- Parsing Formulas
formula :: Parser Formula
formula = conFormula
<|> formulaTerm
<|> forallFormula
-- Term in a Formula
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> varFormula
-- Conjunction
conFormula :: Parser Formula
conFormula =
buildExpressionParser [[Infix (op "&" >> return Con) AssocLeft]] formula
-- Truth
ttFormula :: Parser Formula
ttFormula = keyword "tt" >> return TT
-- Falsehood
ffFormula :: Parser Formula
ffFormula = keyword "ff" >> return FF
-- Variable
varFormula :: Parser Formula
varFormula =
do var <- identifier
return $ Var var
-- Universal Statement
forallFormula :: Parser Formula
forallFormula =
do keyword "forall"
x <- identifier
op "."
phi <- formulaTerm
return $ Forall x phi
-- For running runghc
calculate :: String -> String
calculate s =
case ret of
Left e -> "Error: " ++ (show e)
Right n -> "Interpreted as: " ++ (show n)
where
ret = parse formulaParser "" s
main :: IO ()
main = interact (unlines . (map calculate) . lines)
The problem is the & operator, I am trying to model it on how the article parses expressions, using the Infix function and passing it a list of operators and a parser to parse terms. However, the parser is not behaving as desired and I can't work out why. Here are some examples of desired behaviour:
true -- parsing --> TT
true & false -- parsing --> Con TT FF
true & (true & false) -- parsing --> Con TT (Con TT FF)
forall x . false & true -- parsing --> Con (Forall "x" FF) TT
However currently the parser is producing no output. I appreciate any assistance.
I think I managed to fix the problem by rephrasing the grammar, and conjunction is still left-associative:
formula :: Parser Formula
formula = conFormula
<|> formulaTerm
formulaTerm :: Parser Formula
formulaTerm = roundBrackets formula
<|> ttFormula
<|> ffFormula
<|> varFormula
<|> forallFormula
conFormula :: Parser Formula
conFormula =
buildExpressionParser [[Infix (op "&" >> return Con) AssocLeft]] formulaTerm
This parsed the following successfully.
The problem in your code is, that you try to parse a formula for which the first attempt is to parse a formulaCon. This then first tries to parse a formula, i.e. you create an infinite recursion without consuming any input.
To resolve this, you have to structure your grammar. Define your terms like this (note that all these terms consume some input before doing a recursion back to formula):
formulaTerm = ttFormula
<|> ffFormula
<|> varFormula
<|> forallFormula
<|> roundBrackets formula
forallFormula = do
keyword "forall"
x <- identifier
op "."
phi <- formula
return $ Forall x phi
A formula is then either a single term or a conjunction consisting of term, an operator, and another formula. To ensure that all input is parsed, first try to parse a conjunction and - if that fails - parse a single term:
formula = (try formulaCon) <|> formulaTerm
Finally a formulaCon can be parsed like this:
formulaCon = do
f1 <- formulaTerm
op "&"
f2 <- formula
return $ Con f1 f2
The drawback with this solution is that conjunctions are now right associative.
I have the following grammar for expressions that I'm trying to represent as a Haskell ADT:
Expr = SimpleExpr [OPrelation SimpleExpr]
SimpleExpr = [OPunary] Term {OPadd Term}
Term = Factor {OPmult Factor}
where:
{} means 0 or more
[] means 0 or 1
OPmult, OPadd, OPrelation, OPunary are classes of operators
Note that this grammar does get precedence right.
Here's something I tried:
data Expr = Expr SimpleExpr (Maybe OPrelation) (Maybe SimpleExpr)
data SimpleExpr = SimpleExpr (Maybe OPunary) Term [OPadd] [Term]
data Term = Term Factor [OPmult] [Factor]
which in hindsight I think is awful, especially the [OPadd] [Term] and [OPmult] [Factor] parts. Because, for example, in the parse tree for 1+2+3 it would put [+, +] in one branch and [2, 3] in another, meaning they're decoupled.
What would be a good representation that'll play nice later in the next stages of compilation?
Decomposing { } and [ ] into more data types seems like an overkill
Using lists seems not quite right as it would no longer be a tree (Just a node that's a list)
Maybe for { }. A good idea ?
And finally, I'm assuming after parsing I'll have to pass over the Parse Tree and reduce it to an AST? or should the whole grammar be modified to be less complex? or maybe it's abstract enough?
The AST does not need to be that close to the grammar. The grammar is structured into multiple levels to encode precedence and uses repetition to avoid left-recursion while still being able to correctly handle left-associative operators. The AST does not need to worry about such things.
Instead I'd define the AST like this:
data Expr = BinaryOperation BinaryOperator Expr Expr
| UnaryOperation UnaryOperator Expr
| Literal LiteralValue
| Variable Id
data BinaryOperator = Add | Sub | Mul | Div
data UnaryOperator = Not | Negate
Here's an additional answer that might help you. I don't want to spoil your fun, so here's a very simple example grammar:
-- Expr = Term ['+' Term]
-- Term = Factor ['*' Factor]
-- Factor = number | '(' Expr ')'
-- number = one or more digits
Using a CST
As one approach, we can represent this grammar as a concrete syntax tree (CST):
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)
A Parsec-based parser to turn the concrete syntax into a CST might look like this:
expr :: Parser Expr
expr = do
t1 <- term
(PlusE t1 <$ symbol "+" <*> term)
<|> pure (TermE t1)
term :: Parser Term
term = do
f1 <- factor
(TimesT f1 <$ symbol "*" <*> factor)
<|> pure (FactorT f1)
factor :: Parser Factor
factor = NumberF . read <$> lexeme (many1 (satisfy isDigit))
<|> ParenF <$> between (symbol "(") (symbol ")") expr
with helper functions for whitespace processing:
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
and main entry point:
parseExpr :: String -> Expr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
Right e -> e
Left err -> error $ show err
after which we can run:
> parseExpr "1+1*(3+4)"
PlusE (FactorT (Number 1)) (TimesT (Number 1) (ParenF (PlusE
(FactorT (Number 3)) (FactorT (Number 4)))))
>
To convert this into the following AST:
data AExpr -- Abstract Expression
= NumberA Int
| PlusA AExpr AExpr
| TimesA AExpr AExpr
we could write:
aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)
aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)
afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e
To interpret the AST, we could use:
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
and then write:
calc :: String -> Int
calc = interp . aexpr . parseExpr
after which we have a crude little calculator:
> calc "1 + 2 * (6 + 3)"
19
>
Skipping the CST
As an alternative approach, we could replace the parser with one that parses directly into an AST of type AExpr:
expr :: Parser AExpr
expr = do
t1 <- term
(PlusA t1 <$ symbol "+" <*> term)
<|> pure t1
term :: Parser AExpr
term = do
f1 <- factor
(TimesA f1 <$ symbol "*" <*> factor)
<|> pure f1
factor :: Parser AExpr
factor = NumberA . read <$> lexeme (many1 (satisfy isDigit))
<|> between (symbol "(") (symbol ")") expr
You can see how little the structure of these parsers changes. All that's disappeared is the distinction between expressions, terms, and factors at the type level, and constructors like TermE, FactorT, and ParenF whose only purpose is to allow embedding of these types within each other.
In more complex scenarios, the CST and AST parsers might exhibit bigger differences. (For example, in a grammar that allowed 1 + 2 + 3, this might be represented as a single constructor data Expr = ... | PlusE [Term] | ... in the CST but with a nested series of binary PlusA constructors in the same AExpr AST type as above.)
After redefining parseExpr to return an AExpr and dropping the aexpr step from calc, everything else stays the same, and we still have:
> calc "1 + 2 * (6 + 3)"
19
>
Programs for Reference
Here's the full program using an intermediate CST:
-- Calc1.hs, using a CST
{-# OPTIONS_GHC -Wall #-}
module Calc1 where
import Data.Char
import Text.Parsec
import Text.Parsec.String
data Expr = TermE Term | PlusE Term Term deriving (Show)
data Term = FactorT Factor | TimesT Factor Factor deriving (Show)
data Factor = NumberF Int | ParenF Expr deriving (Show)
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
expr :: Parser Expr
expr = do
t1 <- term
(PlusE t1 <$ symbol "+" <*> term)
<|> pure (TermE t1)
term :: Parser Term
term = do
f1 <- factor
(TimesT f1 <$ symbol "*" <*> factor)
<|> pure (FactorT f1)
factor :: Parser Factor
factor = NumberF . read <$> lexeme (many1 (satisfy isDigit))
<|> ParenF <$> between (symbol "(") (symbol ")") expr
parseExpr :: String -> Expr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
Right e -> e
Left err -> error $ show err
data AExpr -- Abstract Expression
= NumberA Int
| PlusA AExpr AExpr
| TimesA AExpr AExpr
aexpr :: Expr -> AExpr
aexpr (TermE t) = aterm t
aexpr (PlusE t1 t2) = PlusA (aterm t1) (aterm t2)
aterm :: Term -> AExpr
aterm (FactorT f) = afactor f
aterm (TimesT f1 f2) = TimesA (afactor f1) (afactor f2)
afactor :: Factor -> AExpr
afactor (NumberF n) = NumberA n
afactor (ParenF e) = aexpr e
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
calc :: String -> Int
calc = interp . aexpr . parseExpr
and here's the full program for the more traditional solution that skips an explicit CST representation:
-- Calc2.hs, with direct parsing to AST
{-# OPTIONS_GHC -Wall #-}
module Calc where
import Data.Char
import Text.Parsec
import Text.Parsec.String
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
symbol :: String -> Parser String
symbol = lexeme . string
expr :: Parser AExpr
expr = do
t1 <- term
(PlusA t1 <$ symbol "+" <*> term)
<|> pure t1
term :: Parser AExpr
term = do
f1 <- factor
(TimesA f1 <$ symbol "*" <*> factor)
<|> pure f1
factor :: Parser AExpr
factor = NumberA . read <$> lexeme (many1 (satisfy isDigit))
<|> between (symbol "(") (symbol ")") expr
parseExpr :: String -> AExpr
parseExpr pgm = case parse (spaces *> expr) "(string)" pgm of
Right e -> e
Left err -> error $ show err
data AExpr -- Abstract Expression
= NumberA Int
| PlusA AExpr AExpr
| TimesA AExpr AExpr
interp :: AExpr -> Int
interp (NumberA n) = n
interp (PlusA e1 e2) = interp e1 + interp e2
interp (TimesA e1 e2) = interp e1 * interp e2
calc :: String -> Int
calc = interp . parseExpr
Okay so Buhr's answer is quite nice. Here's how I did though (no CST) inspired by sepp2k's response:
The AST:
data OP = OPplus | OPminus | OPstar | OPdiv
| OPidiv | OPmod | OPand | OPeq | OPneq
| OPless | OPgreater | OPle | OPge
| OPin | OPor
data Expr =
Relation Expr OP Expr -- > < == >= etc..
| Unary OP Expr -- + -
| Mult Expr OP Expr -- * / div mod and
| Add Expr OP Expr -- + - or
| FactorInt Int | FactorReal Double
| FactorStr String
| FactorTrue | FactorFalse
| FactorNil
| FactorDesig Designator -- identifiers
| FactorNot Expr
| FactorFuncCall FuncCall deriving (Show)
The parsers:
parseExpr :: Parser Expr
parseExpr = (try $ Relation <$>
parseSimpleExpr <*> parseOPrelation <*> parseSimpleExpr)
<|> parseSimpleExpr
parseSimpleExpr :: Parser Expr
parseSimpleExpr = (try simpleAdd)
<|> (try $ Unary <$> parseOPunary <*> simpleAdd)
<|> (try $ Unary <$> parseOPunary <*> parseSimpleExpr)
<|> parseTerm
where simpleAdd = Add <$> parseTerm <*> parseOPadd <*> parseSimpleExpr
parseTerm :: Parser Expr
parseTerm = (try $ Mult <$>
parseFactor <*> parseOPmult <*> parseTerm)
<|> parseFactor
parseFactor :: Parser Expr
parseFactor =
(parseKWnot >> FactorNot <$> parseFactor)
<|> (exactTok "true" >> return FactorTrue)
<|> (exactTok "false" >> return FactorFalse)
<|> (parseNumber)
<|> (FactorStr <$> parseString)
<|> (betweenCharTok '(' ')' parseExpr)
<|> (FactorDesig <$> parseDesignator)
<|> (FactorFuncCall <$> parseFuncCall)
I didn't include basic parsers like parseOPadd as those are what you'd expect and are easy to build.
I still parsed according to the grammar but tweaked it slightly to match my AST.
You could check out the full source which is a compiler for Pascal here.