I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)
Related
I'm studying functional programming using Haskell language. And as an exercise I need to implement a function parsing a primitive arithmetic expression from String. The function must be able to handle double literals, operations +, -, *, / with the usual precedence and parentheses.
parseExpr :: String -> Except ParseError Expr
with next defined data types:
data ParseError = ErrorAtPos Natural
deriving Show
newtype Parser a = P (ExceptState ParseError (Natural, String) a)
deriving newtype (Functor, Applicative, Monad)
data Prim a
= Add a a
| Sub a a
| Mul a a
| Div a a
| Abs a
| Sgn a
deriving Show
data Expr
= Val Double
| Op (Prim Expr)
deriving Show
Where ExceptState is a modified State monad, allowing to throw exception pointing at the error position.
data Annotated e a = a :# e
deriving Show
infix 0 :#
data Except e a = Error e | Success a
deriving Show
data ExceptState e s a = ES { runES :: s -> Except e (Annotated s a) }
Also ExceptState has defined Functor, Applicative and Monad instances, which were thoroughly tested earlier, so I am positive in their correctness.
instance Functor (ExceptState e s) where
fmap func ES{runES = runner} = ES{runES = \s ->
case (runner s) of
Error err -> Error err
Success ans -> Success (mapAnnotated func $ ans) }
instance Applicative (ExceptState e s) where
pure arg = ES{runES = \s -> Success (arg :# s)}
p <*> q = Control.Monad.ap p q
instance Monad (ExceptState e s) where
m >>= f = joinExceptState (fmap f m)
where
joinExceptState :: ExceptState e s (ExceptState e s a) -> ExceptState e s a
joinExceptState ES{runES = runner} = ES{runES = \s ->
case (runner s) of
Error err -> Error err
Success (ES{runES = runner2} :# s2) ->
case (runner2 s2) of
Error err -> Error err
Success (res :# s3) -> Success (res :# s3) }
To implement the function parseExpr I used basic parser combinators:
pChar :: Parser Char
pChar = P $ ES $ \(pos, s) ->
case s of
[] -> Error (ErrorAtPos pos)
(c:cs) -> Success (c :# (pos + 1, cs))
parseError :: Parser a
parseError = P $ ES $ \(pos, _) -> Error (ErrorAtPos pos)
instance Alternative Parser where
empty = parseError
(<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
P $ ES $ \(pos, s) ->
case runnerP (pos, s) of
Error _ -> runnerQ (pos, s)
Success res -> Success res
instance MonadPlus Parser
which were used to construct more complex ones:
-- | elementary parser not consuming a character, failing if input doesn't
-- reach its end
pEof :: Parser ()
pEof = P $ ES $ \(pos, s) ->
case s of
[] -> Success (() :# (pos, []))
_ -> Error $ ErrorAtPos pos
-- | parses a single digit value
parseVal :: Parser Expr
parseVal = Val <$> (fromIntegral . digitToInt) <$> mfilter isDigit pChar
-- | parses an expression inside parenthises
pParenth :: Parser Expr
pParenth = do
void $ mfilter (== '(') pChar
expr <- parseAddSub
(void $ mfilter (== ')') pChar) <|> parseError
return expr
-- | parses the most prioritised operations
parseTerm :: Parser Expr
parseTerm = pParenth <|> parseVal
parseAddSub :: Parser Expr
parseAddSub = do
x <- parseTerm
ys <- many parseSecond
return $ foldl (\acc (sgn, y) -> Op $
(if sgn == '+' then Add else Sub) acc y) x ys
where
parseSecond :: Parser (Char, Expr)
parseSecond = do
sgn <- mfilter ((flip elem) "+-") pChar
y <- parseTerm <|> parseError
return (sgn, y)
-- | Parses the whole expression. Begins from parsing on +, - level and
-- successfully consuming the whole string.
pExpr :: Parser Expr
pExpr = do
expr <- parseAddSub
pEof
return expr
-- | More convinient way to run 'pExpr' parser
parseExpr :: String -> Except ParseError Expr
parseExpr = runP pExpr
As a result, at this point function works as intended if given String expression is valid:
ghci> parseExpr "(2+3)-1"
Success (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0)))
ghci> parseExpr "(2+3-1)-1"
Success (Op (Sub (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0))) (Val 1.0)))
Otherwise ErrorAtPos does not point at the necessary position:
ghci> parseExpr "(2+)-1"
Error (ErrorAtPos 1)
ghci> parseExpr "(2+3-)-1"
Error (ErrorAtPos 1)
What am I doing wrong here? Thank you in advance.
My main assumption was that something wrong was with function (<|>) of Alternative Parser and it incorrectly changed pos variable.
(<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
P $ ES $ \(pos, s) ->
case runnerP (pos, s) of
-- Error _ -> runnerQ (pos, s)
Error (ErrorAtPos pos') -> runnerQ (pos' + pos, s)
Success res -> Success res
But it led to more strange results:
ghci> parseExpr "(5+)-3"
Error (ErrorAtPos 84)
ghci> parseExpr "(5+2-)-3"
Error (ErrorAtPos 372)
Then more doubts were aimed at joinExceptState function of instance Monad (ExceptState e s) in spite of everything I've run it through, doubts that it wasn't working on s of (Natural, String) type as I indented in this case. But then I can't really change it for this concrete type only.
Excellent question, although it would have been even better if it really included all your code. I filled in the missing pieces:
mapAnnotated :: (a -> b) -> Annotated s a -> Annotated s b
mapAnnotated f (a :# e) = (f a) :# e
runP :: Parser a -> String -> Except ParseError a
runP (P (ES {runES = p})) s = case p (0, s) of
Error e -> Error e
Success (a :# e) -> Success a
Why is parseExpr "(5+)-3" equal to Error (ErrorAtPos 1)? Here's what happens: we call parseExpr which (ultimately) calls parseTerm which is just pParenth <|> parseVal. pParenth fails, of course, so we look at the definition of <|> to work out what to do. That definition says: if the thing on the left fails, try the thing on the right. So we try the thing on the right (i. e. parseVal), which also fails, and we report the second error, which is in fact at position 1.
To see this more clearly, you can just replace pParenth <|> parseVal with parseVal <|> pParenth and observe that you get ErrorAtPos 2 instead.
This is almost certainly not the behaviour you want. The documentation of Megaparsec's p <|> q, here, says:
If [parser] p fails without consuming any input, parser q is tried.
(emphasis in original, meaning: parser q is not tried in other cases). This is a more useful thing to do. If you got reasonably far trying to parse a parenthesised expression and then got an error, probably you want to report that error rather than complaining that '(' isn't a digit.
Since you say this is an exercise, I'm not going to tell you how to fix the problem. I'll tell you some other stuff, though.
First, this is not your only issue with error reporting. Above we see that parseVal "(1" reports an error at position 1 (after the problematic character, which is at position 0) whereas pParenth "(5+)-3" reports an error at position 2 (before the problematic character, which is at position 3). Ideally, both should give the position of the problematic character itself. (Of course, it'd be even better if the parser stated what character it expected, but that's more difficult to do.)
Second, the way I found the problem was to import Debug.Trace, replace your definition of pChar with
pChar :: Parser Char
pChar = P $ ES $ \(pos, s) -> traceShow (pos, s) $
case s of
[] -> Error (ErrorAtPos pos)
(c:cs) -> Success (c :# (pos + 1, cs))
and mull over the output for a bit. Debug.Trace is sometimes less useful than one hopes, because of lazy evaluation, but for a program like this it can help a lot.
Third, if you modify your definition of <|> to match Megaparsec's does, you might need Megaparsec's try combinator. (Not for the grammar you're trying to parse now, but maybe later.) try solves the issue that
(singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')
fails on the string "pr" with Megaparsec's <|>.
Fourth, you sometimes write someParser <|> parseError, which I think is equivalent to someParser for both your definition of <|> and Megaparsec's.
Fifth, you don't need void; just ignore the result, it's the same thing.
Sixth, your Except seems to just be Either.
I am completely new to haskell and seen examples online of how to add error handling but I'm not sure how to incorporate it in my context. Below is an example of the code which works before trying to handle errors.
expr'::Parser Double
expr' = term' `chainl1'` addop
term'::Parser Double
term' = factor' `chainl1` mulop
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> pure a
chainl1 ::Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
where
rest a = (do
f <- op
b <- p
rest (f a b)) <|> pure a
addop, mulop :: Parser (Double -> Double -> Double)
I've since expanded this to let addop and mulop return error messages if something irregular is found. This causes the function definition to change to:
addop, mulop :: Parser (Either String (Double -> Double -> Double))
In other programming languages I would check if f <- op is a String and return the string. However I'm not sure how to go about this in Haskell. The idea is that this error message returns all the way back to term'. Hence its function definition also needs to change eventually. This is all in the attempt to build a Monadic Parser.
If you're using parsec then you can make your code more general to work with the ParsecT monad transformer:
import Text.Parsec hiding (chainl1)
import Control.Monad.Trans.Class (lift)
expr' :: ParsecT String () (Either String) Double
expr' = term' `chainl1` addop
term' :: ParsecT String () (Either String) Double
term' = factor' `chainl1` mulop
factor' :: ParsecT String () (Either String) Double
factor' = read <$> many1 digit
chainl1 :: Monad m => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
chainl1 p op = p >>= rest
where
rest a = (do
f <- op
b <- p
rest (f a b))
<|> pure a
addop, mulop :: ParsecT String () (Either String) (Double -> Double -> Double)
addop = (+) <$ char '+' <|> (-) <$ char '-'
mulop = ((*) <$ char '*' <* lift (Left "error")) <|> (/) <$ char '/' <|> (**) <$ char '^'
I don't know what kind of errors you would want to return, so I've just made an error if an '*' is encountered in the input.
You can run the parser like this:
ghci> runParserT (expr' <* eof) () "buffer" "1+2+3"
Right (Right 6.0)
ghci> runParserT (expr' <* eof) () "buffer" "1+2*3"
Left "error"
The answer based on parsec implementation.
Actually the operator <|> is what you need. It handles any parsing errors. In expression a <|> b if the parser a fails then the parser b will be run (expect if the parser a consume some input before fails; for handle this case you can use combinator try like this: try a <|> b).
But if you want to handle error depending to the kind of error then you should do like #Noughtmare answered. But then I recomend you to do that:
Define your type for errors. It will be bugless to handle errors.
data MyError
= ME_DivByZero
| ...
You can simplify type signature if you define type alias for your parser.
type MyParser = ParsecT String () (Either MyError)
Then signatires will look like this:
expr' :: MyParser Double
addop, mulop :: MyParser (Double -> Double -> Double)
Use throwError to throw your errors and catchError to handle your errors, that will be more idiomatic. So it's look like this:
f <- catchError op $ \case
ME_DivByZero -> ...
ME_... -> ...
err -> throwError err -- rethrow error
My problem is how to combine the recursive, F-algebra-style recursive type definitions, with monadic/applicative-style parsers, in way that would scale to a realistic programming language.
I have just started with the Expr definition below:
data ExprF a = Plus a a |
Val Integer deriving (Functor,Show)
data Rec f = In (f (Rec f))
type Expr = Rec ExprF
and I am trying to combine it with a parser which uses anamorphisms:
ana :: Functor f => (a -> f a) -> a -> Rec f
ana psi x = In $ fmap (ana psi) (psi x)
parser = ana psi
where psi :: String -> ExprF String
psi = ???
as far as I could understand, in my example, psi should either parse just an integer, or it should decide that the string is a <expr> + <expr> and then (by recursively calling fmap (ana psi)), it should parse the left-hand side and the right-hand side expressions.
However, (monadic/applicative) parsers don't work like that:
they first attempt parsing the left-hand expression,
the +,
and the right-hand expression
One solution that I see, is to change the type definition for Plus a a to Plus Integer a, such that it reflects the parsing process, however this doesn't seem like the best avenue.
Any suggestions (or reading directions) would be welcome!
If you need a monadic parser, you need a monad in your unfold:
anaM :: (Traversable f, Monad m) => (a -> m (f a)) -> a -> m (Rec f)
anaM psiM x = In <$> (psiM x >>= traverse (anaM psiM))
Then you can write something that parses just one level of an ExprF like this:
parseNum :: Parser Integer
parseNum = -- ...
char :: Char -> Parser Char
char c = -- ...
parseExprF :: Maybe Integer -> Parser (ExprF (Maybe Integer))
parseExprF (Just n) = pure (Val n)
parseExprF Nothing = do
n <- parseNum
empty
<|> (Plus (Just n) Nothing <$ char '+')
<|> (pure (Val n))
Given that, you now have your recursive Expr parser:
parseExpr :: Parser Expr
parseExpr = anaM parseExprF Nothing
You will need to have instances of Foldable and Traversable for ExprF, of course, but the compiler can write these for you and they are not themselves recursive.
I'm given the following parsers
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
pure a = Parser $ \s -> Just (a,s)
f <*> a = Parser $ \s ->
case parse f s of
Just (g,s') -> parse (fmap g a) s'
Nothing -> Nothing
instance Alternative Parser where
empty = Parser $ \s -> Nothing
l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
case parse parser s of
Nothing -> Nothing
Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
where f [] = Just (Nothing,[])
f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
where f [] = Nothing
f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only using the given parsers and the
Functor/Applicative/Alternative instances above. I'm stuck on this as I don't have experience in combining parsers. Can anyone help me out ?
To understand it easier, we can write it in an equational pseudocode, while we substitute and simplify the definitions, using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only working for any MonadPlus type, not just []; while corresponding closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
parse (pure a) s = pure (a, s)
parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
, (a, s'') <- parse pa s' ]
instance Alternative Parser where
parse empty s = empty
parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s#(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo-)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to use produces the () as its result in such case (and otherwise it produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)
I'm trying to write a Parser for S Expressions from Prof. Yorgey's 2013 homework.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Given the following definitions, presented in the homework:
type Ident = String
-- An "atom" is either an integer value or an identifier.
data Atom = N Integer | I Ident
deriving Show
-- An S-expression is either an atom, or a list of S-expressions.
data SExpr = A Atom
| Comb [SExpr]
deriving Show
I wrote a parser for Parser Atom and Parser SExpr for A Atom.
parseAtom :: Parser Atom
parseAtom = alt n i
where n = (\_ z -> N z) <$> spaces <*> posInt
i = (\ _ z -> I z) <$> spaces <*> ident
parseAAtom :: Parser SExpr
parseAAtom = fmap (\x -> A x) parseAtom
Then, I attempted to write a parser to handle a Parser SExpr for the Comb ... case:
parseComb :: Parser SExpr
parseComb = (\_ _ x _ _ _ -> x) <$> (zeroOrMore spaces) <*> (char '(') <*>
(alt parseAAtom parseComb) <*> (zeroOrMore spaces)
<*> (char ')') <*> (zeroOrMore spaces)
Assuming that parseComb was right, I could simply make usage of oneOrMore for Parser [SExpr].
parseCombElements :: Parser [SExpr]
parseCombElements = oneOrMore parseComb
So, my two last functions compile, but running runParser parseComb "( foo )" never terminates.
What's wrong with my parseComb definition? Please don't give me the whole answer, but rather a hint - for my own learning.
I am very suspicious of zeroOrMore spaces, because spaces is usually a parser which itself parses zero or more spaces. Which means that it can parse the empty string if there aren't any spaces at that point. In particular, the spaces parser always succeeds.
But when you apply zeroOrMore to a parser that always succeeds, the combined parser will never stop - because zeroOrMore only stops trying again once its parser argument fails.
As an aside, Applicative expressions like (\_ _ x _ _ _ -> x) <$> ... <*> ... <*> ...... which only use a single of the subparsers can usually be written more succinctly with the *> and <* combinators:
... *> ... *> x_parser_here <* ... <* ...