Using chainl1 correctly with infix in Haskell - parsing

For the MVE code below it outputs [] rather than the expected Not (Oper Eq 2 2)) for the input parseString "2+2" which is supposed to call pOper. My guess is that pOper would expect three arguments for the anonymous function to work. That is 3 strings. However due to partial call of a function only one argument is passed. Is there a way to work around to preserve the type signature of pOper while dealing with the Not and at the same time not changing the type definitions?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
data Value =
IntVal Int
deriving (Eq, Show, Read)
data Exp =
Const Value
| Oper Op Exp Exp
| Not Exp
deriving (Eq, Show, Read)
data Op = Plus | Minus | Eq
deriving (Eq, Show, Read)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
space1 :: Parser String
space1 = many1 space
symbol :: String -> Parser String
symbol = token . string
token :: Parser a -> Parser a
token combinator = (do spaces
combinator)
parseString input = readP_to_S (do
e <- pExpr
token eof
return e) input
pExpr :: Parser Exp
pExpr = chainl1 pTerm pOper
pTerm :: Parser Exp
pTerm =
(do
pv <- numConst
skipSpaces
return pv)
numConst :: Parser Exp
numConst =
(do
skipSpaces
y <- munch isDigit
return (Const (IntVal (read y)))
)
-- Parser for an operator
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = symbol "+" >> return (Oper Plus)
<|> (symbol "-" >> return (Oper Minus))
<|> (symbol "=" >> return (Oper Eq))
<|> (symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2)))

There's nothing wrong with your parser for !=. Rather, your parser for operators in general is broken: it only parses the first operator correctly. A simpler version of your pOper would be
pOper = a >> b
<|> (c >> d)
But because of precedence, this isn't the same as (a >> b) <|> (c >> d). Actually, it's a >> (b <|> (c >> d))! So the symbol your first alternative parses is accidentally mandatory. It would parse 2+!=2 instead.
So, you could fix this by just adding in the missing parentheses. But if, like me, you find it a little tacky to rely so much on operator precedence for semantic meaning, consider something that's more obviously safe, using the type system to separate the clauses from the delimiters:
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = asum [ symbol "+" >> return (Oper Plus)
, symbol "-" >> return (Oper Minus)
, symbol "=" >> return (Oper Eq)
, symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2))
]
This way, you have a list of independent parsers, not a single parser built with alternation. asum (from Control.Applicative) does the work of combining that list into alternatives. It means the same thing, of course, but it means you don't have to learn any operator precedence tables, because , can only be a list item separator.

The best way I can think of to solve the problem is by creating these to modificatoins: 1) this alternative in the expression
pExpr :: Parser Exp
pExpr =
(do pv <- chainl1 pTerm pOper
pv2 <- pOper2 pv
return pv2)
<|> chainl1 pTerm pOper
And 2) this helper function to deal with infix patterns
pOper2 :: Exp -> Parser Exp
pOper2 e1 = (do
symbol "!="
e2 <- numConst
return (Not (Oper Eq e1 e2)))
This is the output, althought I don't know if there will be problems if other operations such as / and * which has different associativety are to be taken into account as well.
parseString "2+4+6"
[(Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6)),"")]
ghci> parseString "2+4+6 != 2"
[(Not (Oper Eq (Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6))) (Const (IntVal 2))),"")]
ghci> parseString "2 != 4"
[(Not (Oper Eq (Const (IntVal 2)) (Const (IntVal 4))),"")]

Related

How to parse a bool expression in Haskell

I am trying to parse a bool expression in Haskell. This line is giving me an error: BoolExpr <$> parseBoolOp <*> (n : ns). This is the error:
• Couldn't match type ‘[]’ with ‘Parser’
Expected type: Parser [Expr]
Actual type: [Expr]
-- define the expression types
data Expr
= BoolExpr BoolOp [Expr]
deriving (Show, Eq)
-- define the type for bool value
data Value
= BoolVal Bool
deriving (Show, Eq)
-- many x = Parser.some x <|> pure []
-- some x = (:) <$> x <*> Parser.many x
kstar :: Alternative f => f a -> f [a]
kstar x = kplus x <|> pure []
kplus :: Alternative f => f a -> f [a]
kplus x = (:) <$> x <*> kstar x
symbol :: String -> Parser String
symbol xs = token (string xs)
-- a bool expression is the operator followed by one or more expressions that we have to parse
-- TODO: add bool expressions
boolExpr :: Parser Expr
boolExpr = do
n <- parseExpr
ns <- kstar (symbol "," >> parseExpr)
BoolExpr <$> parseBoolOp <*> (n : ns)
-- an atom is a literalExpr, which can be an actual literal or some other things
parseAtom :: Parser Expr
parseAtom =
do
literalExpr
-- the main parsing function which alternates between all the options you have
parseExpr :: Parser Expr
parseExpr =
do
parseAtom
<|> parseParens boolExpr
<|> parseParens parseExpr
-- implement parsing bool operations, these are 'and' and 'or'
parseBoolOp :: Parser BoolOp
parseBoolOp =
do symbol "and" >> return And
<|> do symbol "or" >> return Or
The boolExpr is expecting a Parser [Expr] but I am returning only an [Expr]. Is there a way to fix this or do it in another way? When I try pure (n:ns), evalStr "(and true (and false true) true)" returns Left (ParseError "'a' didn't match expected character") instead of Right (BoolVal False)
The expression (n : ns) is a list. Therefore the compiler thinks that the applicative operators <*> and <$> should be used in the context [], while you want Parser instead.
I would guess you need pure (n : ns) instead.

Haskell: Parsing a file finishes after first expression despite more input in file

The following is an example program of a language in which I'm writing a parser.
n := 1
Do (1)-> -- The 1 in brackets is a placeholder for a Boolean or relational expression.
n := 1 + 1
Od
When the program looks like this, the parseFile functions ends after the assignment on the first line however when the assignment is removed, it parses as expected. Below is how it's called in GHCI, first with the first line present then removed:
λ > parseFile "example.hnry"
Assign "n" (HInteger 1)
λ > parseFile "example.hnry"
Do (HInteger 1) (Assign "n" (AExpr (HInteger 1) Add (HInteger 1)))
The expected output would look similar to this:
λ > parseFile "example.hnry"
Assign "n" (HInteger 1) Do (HInteger 1) (Assign "n" (AExpr (HInteger 1) Add (HInteger 1)))
I first assumed it was something to do with the the assignment parser but in the body of the loop, there exists an assignment which parses as expected so I was able to rule that out. I believe that the issue is within the parseFile function itself. The following is the parseFile function and the other functions that make up the parseExpression function that I'm using to parse a program.
I think that the error is within parseFile because it parses an expression only once and doesn't "loop" for the want of a better word to itself to check if there's more input left the parse. I think that's the error but I'm not quite sure.
parseFile :: String -> IO HVal
parseFile file =
do program <- readFile file
case parse parseExpression "" program of
Left err -> fail "Parse Error"
Right parsed -> return $ parsed
parseExpression :: Parser HVal
parseExpression = parseAExpr <|> parseDo <|> parseAssign
parseDo :: Parser HVal
parseDo = do
_ <- string "Do "
_ <- char '('
x <- parseHVal -- Will be changed to a Boolean expression
_ <- string ")->"
spaces
y <- parseExpression
spaces
_ <- string "Od"
return $ Do x y
parseAExpr :: Parser HVal
parseAExpr = do
x <- parseInteger
spaces
op <- parseOp
spaces
y <- parseInteger <|> do
_ <- char '('
z <- parseAExpr
_ <- char ')'
return $ z
return $ AExpr x op y
parseAssign :: Parser HVal
parseAssign = do
var <- oneOf ['a'..'z'] <|> oneOf ['A'..'Z']
spaces
_ <- string ":="
spaces
val <- parseHVal <|> do
_ <- char '('
z <- parseAExpr
_ <- char ')'
return $ z
return $ Assign [var] val
As you note, your parseFile function parses a single expression (though maybe "statement" would be a better name) using the parseExpression parser. You probably want to introduce a new parser for a "program" or sequence of expressions/statements:
parseProgram :: Parser [HVal]
parseProgram = spaces *> many (parseExpression <* spaces)
and then in parseFile, replace parseExpression with parseProgram:
parseFile :: String -> IO [HVal]
parseFile file =
do program <- readFile file
case parse parseProgram "" program of
Left err -> fail "Parse Error"
Right parsed -> return $ parsed
Note that I've had to change the type here from HVal to [HVal] to reflect the fact that a program, being a sequence of expressions each of type HVal, needs to be represented as some sort of data type capable of combining multiple HVals together, and a list [HVal] is one way of doing so.
If you want a program to be an HVal instead of an [HVal], then you need to introduce a new constructor in your HVal type that's capable of representing programs. One method is to use a constructor to directly represent a block of statements:
data HVal = ... | Block [HVal]
Another is to add a constructor represent a sequence of two statements:
data HVal = ... | Seq HVal HVal
Both methods are used in real parsers. (Note that you'd normally pick one; you wouldn't use both.) To represent a sequence of three assignment statements, for example, the block method would do it directly as a list:
Block [Assign "a" (HInteger 1), Assign "b" (HInteger 2), Assign "c" (HInteger 3)]
while the two-statement sequence method would build a sort of nested tree:
Seq (Assign "a" (HInteger 1)) (Seq (Assign "b" (HInteger 2)
(Assign "c" (HInteger 3))
The appropriate parsers for these two alternatives, both of which return a plain HVal, might be:
-- use blocks
parseProgram1 :: Parser HVal
parseProgram1 = do
spaces
xs <- many (parseExpression <* spaces)
return $ Block xs
parseProgram2 :: Parser HVal
parseProgram2 = do
spaces
x <- parseExpression
spaces
(do xs <- parseProgram2
return $ Seq x xs)
<|> return x

Haskell : Non-Exhaustive Pattern In Function Prevents Another Function From Executing Even Though Its Not Used

I'm trying to implement car, cdr, and cons functionality into a toy language I'm writing however when I try to execute my car function through main, I get the following error:
./parser "car [1 2 3]"
parser: parser.hs:(48,27)-(55,45): Non-exhaustive patterns in case
The function on lines 48-55 is the following:
parseOp :: Parser HVal
parseOp = (many1 letter <|> string "+" <|> string "-" <|> string "*" <|> string "/" <|> string "%" <|> string "&&" <|> string "||") >>=
(\x -> return $ case x of
"&&" -> Op And
"||" -> Op Or
"+" -> Op Add
"-" -> Op Sub
"*" -> Op Mult
"/" -> Op Div
"%" -> Op Mod)
I'm really unsure why the error message points to this function because it has nothing to do with the list functionality. The car function is working however because I was able to successfully execute it through GHCI. I know my problem is due to parsing but I don't see where it is. The following are the functions that relate to lists. I can't see from them how they are influenced by parseOp.
data HVal = Number Integer
| String String
| Boolean Bool
| List [HVal]
| Op Op
| Expr Op HVal HVal
| Car [HVal]
deriving (Read)
car :: [HVal] -> HVal
car xs = head xs
parseListFunctions :: Parser HVal
parseListFunctions = do
_ <- string "car "
_ <- char '['
x <- parseList
_ <- char ']'
return $ Car [x]
parseExpr :: Parser HVal
parseExpr = parseNumber
<|> parseOp
<|> parseBool
<|> parseListFunctions
<|> do
_ <- char '['
x <- parseList
_ <- char ']'
return x
<|> do
_ <- char '('
x <- parseExpression
_ <- char ')'
return x
eval :: HVal -> HVal
eval val#(Number _) = val
eval val#(String _) = val
eval val#(Boolean _) = val
eval val#(List _) = val -- Look at list eval NOT WORKING
eval val#(Op _) = val
eval (Expr op x y) = eval $ evalExpr (eval x) op (eval y)
eval (Car xs) = eval $ car xs
The removal of many1 letter in parseOp transfers the same error to the following function parseBool:
parseBool :: Parser HVal
parseBool = many1 letter >>= (\x -> return $ case x of
"True" -> Boolean True
"False" -> Boolean False)
You write
parseExpr = ... <|> parseOp <|> ... <|> parseListFunctions <|> ...
and so
car ...
is passed to parseOp first, then parseListFunctions. The parseOp parser succeeds in the
many1 letter
branch, and so in the \x -> return $ case x of ..., x is bound to "car". Because parseOp succeeds (and returns an error value with an embedded, not-yet-evaluated inexhaustive case error!), parseListFunctions is never tried.
You will need to modify your grammar to reduce the ambiguity in it, so that these conflicts where multiple branches may match do not arise.

Custom ADT vs. Tree for parser return value

I'm using Parsec to build a simple Lisp parser.
What are the (dis)advantages of using a custom ADT for the parser types versus using a standard Tree (i.e. Data.Tree)?
After trying both ways, I've come up with a couple points for custom ADTs (i.e. Parser ASTNode):
seems to be much clearer and simpler
others have done it this way(including Tiger, which is/was bundled with Parsec)
and one against (i.e. Parser (Tree ASTNode):
Data.Tree already has Functor, Monad, etc. instances, which will be very helpful for semantic analysis, evaluation, calculating code statistics
For example:
custom ADT
import Text.ParserCombinators.Parsec
data ASTNode
= Application ASTNode [ASTNode]
| Symbol String
| Number Float
deriving (Show)
int :: Parser ASTNode
int = many1 digit >>= (return . Number . read)
symbol :: Parser ASTNode
symbol = many1 (oneOf ['a'..'z']) >>= (return . Symbol)
whitespace :: Parser String
whitespace = many1 (oneOf " \t\n\r\f")
app :: Parser ASTNode
app =
char '(' >>
sepBy1 expr whitespace >>= (\(e:es) ->
char ')' >>
(return $ Application e es))
expr :: Parser ASTNode
expr = symbol <|> int <|> app
example use:
ghci> parse expr "" "(a 12 (b 13))"
Right
(Application
(Symbol "a")
[Number 12.0, Application
(Symbol "b")
[Number 13.0]])
Data.Tree
import Text.ParserCombinators.Parsec
import Data.Tree
data ASTNode
= Application (Tree ASTNode)
| Symbol String
| Number Float
deriving (Show)
int :: Parser (Tree ASTNode)
int = many1 digit >>= (\x -> return $ Node (Number $ read x) [])
symbol :: Parser (Tree ASTNode)
symbol = many1 (oneOf ['a' .. 'z']) >>= (\x -> return $ Node (Symbol x) [])
whitespace :: Parser String
whitespace = many1 (oneOf " \t\n\r\f")
app :: Parser (Tree ASTNode)
app =
char '(' >>
sepBy1 expr whitespace >>= (\(e:es) ->
char ')' >>
(return $ Node (Application e) es))
expr :: Parser (Tree ASTNode)
expr = symbol <|> int <|> app
and example use:
ghci> parse expr "" "(a 12 (b 13))"
Right
(Node
(Application
(Node (Symbol "a") []))
[Node (Number 12.0) [],
Node
(Application
(Node (Symbol "b") []))
[Node (Number 13.0) []]])
(sorry for the formatting -- hopefully it's clear)
I'd absolutely go for the AST, because interpretation/compilation/language analysis in general is very much driven by the structure of your language. The AST will simply and naturally represent and respect that structure, while Tree will do neither.
For example, a common form of language implementation technique is to implement some complex features by translation: translate programs that involve those features or constructs into programs in a subset of the a language that does not use them (Lisp macros, for example, are all about this). If you use an AST, the type system will, for example, often forbid you from producing illegal translations as output. Whereas a Tree type that doesn't understand your program will not help there.
Your AST doesn't look very complicated, so writing utility functions for it should not be hard. Take this one for example:
foldASTNode :: (r -> [r] -> r) -> (String -> r) -> (Float -> r) -> r
foldASTNode app sym num node =
case node of
Application f args -> app (subfold f) (map subfold args)
Symbol str -> sym str
Number n -> num n
where subfold = foldASTNode app sym num
And in any case, what sort of Functor do you wish to have on your AST? There's no type parameter on it...

Using Parsec to parse regular expressions

I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:
EXP : EXP *
| LIT EXP
| LIT
I've tried to implement this in Haskell as:
expr = try star
<|> try litE
<|> lit
litE = do c <- noneOf "*"
rest <- expr
return (c : rest)
lit = do c <- noneOf "*"
return [c]
star = do content <- expr
char '*'
return (content ++ "*")
There are some infinite loops here though (e.g. expr -> star -> expr without consuming any tokens) which makes the parser loop forever. I'm not really sure how to fix it though, because the very nature of star is that it consumes its mandatory token at the end.
Any thoughts?
You should use Parsec.Expr.buildExprParser; it is ideal for this purpose. You simply describe your operators, their precedence and associativity, and how to parse an atom, and the combinator builds the parser for you!
You probably also want to add the ability to group terms with parens so that you can apply * to more than just a single literal.
Here's my attempt (I threw in |, +, and ? for good measure):
import Control.Applicative
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
data Term = Literal Char
| Sequence [Term]
| Repeat (Int, Maybe Int) Term
| Choice [Term]
deriving ( Show )
term :: Parser Term
term = buildExpressionParser ops atom where
ops = [ [ Postfix (Repeat (0, Nothing) <$ char '*')
, Postfix (Repeat (1, Nothing) <$ char '+')
, Postfix (Repeat (0, Just 1) <$ char '?')
]
, [ Infix (return sequence) AssocRight
]
, [ Infix (choice <$ char '|') AssocRight
]
]
atom = msum [ Literal <$> lit
, parens term
]
lit = noneOf "*+?|()"
sequence a b = Sequence $ (seqTerms a) ++ (seqTerms b)
choice a b = Choice $ (choiceTerms a) ++ (choiceTerms b)
parens = between (char '(') (char ')')
seqTerms (Sequence ts) = ts
seqTerms t = [t]
choiceTerms (Choice ts) = ts
choiceTerms t = [t]
main = parseTest term "he(llo)*|wor+ld?"
Your grammar is left-recursive, which doesn’t play nice with try, as Parsec will repeatedly backtrack. There are a few ways around this. Probably the simplest is just making the * optional in another rule:
lit :: Parser (Char, Maybe Char)
lit = do
c <- noneOf "*"
s <- optionMaybe $ char '*'
return (c, s)
Of course, you’ll probably end up wrapping things in a data type anyway, and there are a lot of ways to go about it. Here’s one, off the top of my head:
import Control.Applicative ((<$>))
data Term = Literal Char
| Sequence [Term]
| Star Term
expr :: Parser Term
expr = Sequence <$> many term
term :: Parser Term
term = do
c <- lit
s <- optionMaybe $ char '*' -- Easily extended for +, ?, etc.
return $ if isNothing s
then Literal c
else Star $ Literal c
Maybe a more experienced Haskeller will come along with a better solution.

Resources