My problem is how to combine the recursive, F-algebra-style recursive type definitions, with monadic/applicative-style parsers, in way that would scale to a realistic programming language.
I have just started with the Expr definition below:
data ExprF a = Plus a a |
Val Integer deriving (Functor,Show)
data Rec f = In (f (Rec f))
type Expr = Rec ExprF
and I am trying to combine it with a parser which uses anamorphisms:
ana :: Functor f => (a -> f a) -> a -> Rec f
ana psi x = In $ fmap (ana psi) (psi x)
parser = ana psi
where psi :: String -> ExprF String
psi = ???
as far as I could understand, in my example, psi should either parse just an integer, or it should decide that the string is a <expr> + <expr> and then (by recursively calling fmap (ana psi)), it should parse the left-hand side and the right-hand side expressions.
However, (monadic/applicative) parsers don't work like that:
they first attempt parsing the left-hand expression,
the +,
and the right-hand expression
One solution that I see, is to change the type definition for Plus a a to Plus Integer a, such that it reflects the parsing process, however this doesn't seem like the best avenue.
Any suggestions (or reading directions) would be welcome!
If you need a monadic parser, you need a monad in your unfold:
anaM :: (Traversable f, Monad m) => (a -> m (f a)) -> a -> m (Rec f)
anaM psiM x = In <$> (psiM x >>= traverse (anaM psiM))
Then you can write something that parses just one level of an ExprF like this:
parseNum :: Parser Integer
parseNum = -- ...
char :: Char -> Parser Char
char c = -- ...
parseExprF :: Maybe Integer -> Parser (ExprF (Maybe Integer))
parseExprF (Just n) = pure (Val n)
parseExprF Nothing = do
n <- parseNum
empty
<|> (Plus (Just n) Nothing <$ char '+')
<|> (pure (Val n))
Given that, you now have your recursive Expr parser:
parseExpr :: Parser Expr
parseExpr = anaM parseExprF Nothing
You will need to have instances of Foldable and Traversable for ExprF, of course, but the compiler can write these for you and they are not themselves recursive.
Related
I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)
TL;DR
I'm trying to understand how this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
where
p (c:cs) | pred c = Just (cs, c)
p _ = Nothing
Is equivalent to this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = do
c <- anyChar
if pred c then return c else empty
Context
This is a snippet from some lecture notes on Haskell parsing, which I'm trying to understand:
import Control.Applicative
import Data.Char
import Data.Functor
import Data.List
newtype Parser a = PsrOf (String -> Maybe (String, a))
-- Function from input string to:
--
-- * Nothing, if failure (syntax error);
-- * Just (unconsumed input, answer), if success.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p
-- Monadic Parsing in Haskell uses [] instead of Maybe to support ambiguous
-- grammars and multiple answers.
-- | Use a parser on an input string.
runParser :: Parser a -> String -> Maybe a
runParser (PsrOf p) inp = case p inp of
Nothing -> Nothing
Just (_, a) -> Just a
-- OR: fmap (\(_,a) -> a) (p inp)
-- | Read a character and return. Failure if input is empty.
anyChar :: Parser Char
anyChar = PsrOf p
where
p "" = Nothing
p (c:cs) = Just (cs, c)
-- | Read a character and check against the given character.
char :: Char -> Parser Char
-- char wanted = PsrOf p
-- where
-- p (c:cs) | c == wanted = Just (cs, c)
-- p _ = Nothing
char wanted = satisfy (\c -> c == wanted) -- (== wanted)
-- | Read a character and check against the given predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
where
p (c:cs) | pred c = Just (cs, c)
p _ = Nothing
-- Could also be:
-- satisfy pred = do
-- c <- anyChar
-- if pred c then return c else empty
instance Monad Parser where
-- return :: a -> Parser a
return = pure
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
PsrOf p1 >>= k = PsrOf q
where
q inp = case p1 inp of
Nothing -> Nothing
Just (rest, a) -> dePsr (k a) rest
I understand everything up until the last bit of the Monad definition, specifically I don't understand how the following line returns something of type Parser b as is required by the (>>=) definition:
Just (rest, a) -> dePsr (k a) rest
It's difficult for me grasp what the Monad definition means without an example. Thankfully, we have one in the alternate version of the satisfy function, which uses do-notation (which of course means the Monad is being called). I really don't understand do-notation yet, so here's the desugared version of satisfy:
satisfy pred = do
anyChar >>= (c ->
if pred c then return c else empty)
So based on the first line of our (>>=)definition, which is
PsrOf p1 >>= k = PsrOf q
We have anyChar as our PsrOf p1 and (c -> if pred c then return c else empty) as our k. What I don't get is how in dePsr (k a) rest that (k a) returns a Parser (at least it shold, otherwise calling dePsr on it wouldn't make sense). This is made more confusing by the presence of rest. Even if (k a) returned a Parser, calling dePsr would extract the underlying function from the returned Parser and pass rest to it as an input. This is definitely doesn't return something of type Parser b as required by the definition of (>>=). Clearly I'm misunderstanding something somewhere.
Ok, Maybe this will help. Let's start by puting some points back into dePsr.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) rest = p rest
And let's also write out return: (NB I'm putting in all the points for clarity)
return :: a -> Parser a
return a = PsrOf (\rest -> Just (rest, a))
And now from the Just branch of the (>>=) definition
Just (rest, a) -> dePsr (k a) rest
Let's make sure we agree on what every thing is:
rest the string remaining unparsed after p1 is applied
a the result of applying p1
k :: a -> Parser b takes the result of the previous parser and makes a new parser
dePsr unwraps a Parser a back into a function `String -> Maybe (String, a)
Remember we will wrap this back into a parser again at the top of the function: PsrOf q
So in English bind (>>=) take a parser in a and a function from a to a parser in b and returns a parser in b. The resulting parser is made by wrapping q :: String -> Maybe (String, b) in the Parser constructor PsrOf. Then q, the combined parser, take a String called inp and applies the function p1 :: String -> Maybe (String,a) that we got from pattern matching against the first parser, and pattern matches on the result. For an error we propagate Nothing (easy). If the first parser had a result we have tow pieces of information, the still unparsed string called rest and the result a. We give a to k, the second parser combinator, and get a Parser b which we need to unwrap with dePsr to get a function (String -> Maybe (String,b) back. That function can be applied to rest for the final result of the combined parsers.
I think the hardest part about reading this is that sometimes we curry the parser function which obscures what is actually happening.
Ok for the satisfy example
satisfy pred
= anyChar >>= (c -> if pred c then return c else empty)
empty comes from the alternative instance and is PsrOf (const Nothing) so a parser that always fails.
Lets look at only the successful branches. By substitution of only the successful part:
PsrOf (\(c:cs) ->Just (cs, c)) >>= (\c -> PsrOf (\rest -> Just (rest, c)))
So in the bind (>>=) definition
p1 = \(c:cs -> Just (cs, c))
k = (\c -> PsrOf (\rest -> Just (rest, c)))
q inp = let Just (rest,a) = p1 inp in dePsr (k a) rest again only successful branch
Then q becomes
q inp =
let Just (rest, a) = (\(c:cs) -> Just (cs, c)) inp
in dePsr (\c -> PsrOf (\rest -> Just (rest, c))) a rest
Doing a little β-reduction
q inp =
let (c:cs) = inp
rest = cs
a = c
in dePsr (PsdOf (\rest -> Just (rest, a))) rest -- dePsr . PsrOf = id
Finally cleaning up some more
q (c:cs) = Just (cs, c)
So if pred is successful we reduce satisfy back to exactly anyChar which is exactly what we expect, and exactly what we find in the first example of the question. I will leave it as and exersize to the reader (read: I'm lazy) to prove that if either inp = "" or pred c = False that the outcome is Nothing as in the first satisfy example.
NOTE: If you are doing anything other than a class assignment, you will save yourself hours of pain and frustration by starting with error handling from the beginning make your parser String -> Either String (String,a) it is easy to make the error type more general later, but a PITA to change everything from Maybe to Either.
Question: "[C]ould you explain how you arrived at return a = PsrOf (\rest -> Just (rest, a)) from return = pure after you put "points" back into return?
Answer: First off, it is pretty unfortunate to give the Monad instance definition without the Functor and Applicative definitions. The pure and return functions must be identical (It is part of the Monad Laws), and they would be called the same thing except Monad far predates Applicative in Haskell history. In point of fact, I don't "know" what pure looks like, but I know what it has to be because it is the only possible definition. (If you want to understand the the proof of that statement ask, I have read the papers, and I know the results, but I'm not into typed lambda calculus quite enough to be confident in reproducing the results.)
return must wrap a value in the context without altering the context.
return :: Monad m => a -> m a
return :: a -> Parser a -- for our Monad
return :: a -> PsrOf(\str -> Maybe (rest, value)) -- substituting the constructor (PSUDO CODE)
A Parser is a function that takes a string to be parsed and returns Just the value along with any unparsed portion of the original string or Nothing on failure, all wrapped in the constructorPsrOf. The context is the string to be parsed, so we cannot change that. The value is of course what was passed toreturn`. The parser always succeeds so we must return Just a value.
return a = PsrOf (\rest -> Just (rest, a))
rest is the context and it is passed through unaltered.
a is the value we put into the Monad context.
For completeness here is also the only reasonable definition of fmap from Functor.
fmap :: Functor f => (a->b) -> f a -> f b
fmap :: (a -> b) -> Parser a -> Parser b -- for Parser Monad
fmap f (PsrOf p) = PsrOf q
where q inp = case p inp of
Nothing -> Nothing
Just (rest, a) -> Just (rest, f a)
-- better but less instructive definition of q
-- q = fmap (\(rest,a) -> (rest, f a)) . p
I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }
instance Functor Parser where
fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])
instance Applicative Parser where
pure = Parser (\s -> [(a,s)])
(Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])
instance Alternative Parser where
empty = Parser $ \s -> []
p <|> q = Parser $ \s ->
case parse p s of
[] -> parse q s
r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
case s of
[] -> []
(c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
case s of
[] -> []
(c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?
First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type indicates fmap, but we can rule that out. All we can provide is some function a -> b, but there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
case parse p s of -- < item will be used here
[(c, cs)] -> if isDigit c then [(c, cs)] else []
_ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now, there is some kind of sequence implied: item has to succeed before digit can do it's work.
Alternatively, we could have tried this way:
digit' c :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy = Parser $ \s ->
case s of
(c:cs) | p c -> [(c, cs)]
_ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.
Functors allow you to act on somethings values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure. map can not change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values can not change influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content can not change the structure. We can't choose to fail on some content and not other.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).
EDITED for more complete problem:
I'd like to create a parser (I'm using uu-parsinglib) that takes the result of a previous parser, and conditionally fails if the result contains a certain constructor:
I now realise this must be a monadic parser.
I have a grammar which contains non-direct left recursive. Below illustrates the problem, the reality is slightly more convoluted:
data Field =
Field_A A
Field_B B
Field_C C
Field_D String
data A =
A Prefix String
data B =
B Prefix String
data C =
C Prefix String
data Prefix =
Prefix Field
Most of the time I'm only interested in Field, and in the interests of minimising backtracking, its best to focus on that case.
I've defined an operator to help
(<..>) :: IsParser p => p (a -> b) -> p (b -> c) -> p (a -> c)
g <..> f = (.) <$> f <*> g
And I approach the problem as:
pField :: Parser Field
pField =
( Field_D <$> pString ) <??>
pChainl' ( pReturn (helper) <*> pPrefix' ) ( pA' <<|> pB' <<|> pC' )
where pChainl' :: IsParser p => p (f -> (pre -> f) -> f) ->
p (pre -> f) ->
p (f -> f)
pChainl' op x = must_be_non_empties "pChainl'" op x (
flip f <$> pList1 (flip <$> op <*> x)
)
f x [] = x
f x (func:rest) = f (func x) rest
helper :: (Field -> Prefix) ->
Field ->
(Prefix -> Field) ->
Field
helper p i n = n $ p i
Note I've defined a variant of pChainl that allows the initial field to be passed in, whilst keeping left association.
pA' :: Parser (Prefix -> Field)
pA' = ( (flip A) <$> pString ) <..> pReturn Field_A
pB' :: Parser (Prefix -> Field)
pB' = ( (flip B) <$> pString ) <..> pReturn Field_B
pC' :: Parser (Prefix -> Field)
pC' = ( (flip C) <$> pString ) <..> pReturn Field_C
-- This consumes no input
pPrefix' :: Parser (Field -> Prefix)
pPrefix' = pReturn Prefix
The question
I'd like to define
pA :: Parser A
in terms of pField, with a post filter to fail if the rightmost Field constructor is not Field_A. As has rightly been pointed out, this is a monadic parse. I can't find any compelling examples of using uu-parsinglib as a monadic parser, so what would your suggested approach be?
If I'm barking up the wrong tree, please let me know also.
It seems like you could make a generalized conditional parser that only succeeds if the value returned by the parser passes some test. This uses the monad capabilities of course. I am not sure if this is a good thing to do with uu-parsinglib however. It seems to work fine in my testing, with one exception: when the conditional fails and no other parsers are available to consume input, the library throws an exception. (something along the lines of no correcting steps given...)
pConditional :: Parser a -> (a -> Bool) -> Parser a
pConditional p test = p >>= (\result -> case (test result) of
True -> pure result
False -> empty)
I would also like to know of other pitfalls that would arise from liberal use of such a conditional parser. (if any.)
I think I've found a solution. I'm still interested in hearing thoughts on the best way to parse such indirect left recursion.
The proposed solution is
pA :: Parser A
pA = do
a <- pField
case a of
(Field_A r) -> return r
otherwise -> pFail
I'm using Parsec to build a simple Lisp parser.
What are the (dis)advantages of using a custom ADT for the parser types versus using a standard Tree (i.e. Data.Tree)?
After trying both ways, I've come up with a couple points for custom ADTs (i.e. Parser ASTNode):
seems to be much clearer and simpler
others have done it this way(including Tiger, which is/was bundled with Parsec)
and one against (i.e. Parser (Tree ASTNode):
Data.Tree already has Functor, Monad, etc. instances, which will be very helpful for semantic analysis, evaluation, calculating code statistics
For example:
custom ADT
import Text.ParserCombinators.Parsec
data ASTNode
= Application ASTNode [ASTNode]
| Symbol String
| Number Float
deriving (Show)
int :: Parser ASTNode
int = many1 digit >>= (return . Number . read)
symbol :: Parser ASTNode
symbol = many1 (oneOf ['a'..'z']) >>= (return . Symbol)
whitespace :: Parser String
whitespace = many1 (oneOf " \t\n\r\f")
app :: Parser ASTNode
app =
char '(' >>
sepBy1 expr whitespace >>= (\(e:es) ->
char ')' >>
(return $ Application e es))
expr :: Parser ASTNode
expr = symbol <|> int <|> app
example use:
ghci> parse expr "" "(a 12 (b 13))"
Right
(Application
(Symbol "a")
[Number 12.0, Application
(Symbol "b")
[Number 13.0]])
Data.Tree
import Text.ParserCombinators.Parsec
import Data.Tree
data ASTNode
= Application (Tree ASTNode)
| Symbol String
| Number Float
deriving (Show)
int :: Parser (Tree ASTNode)
int = many1 digit >>= (\x -> return $ Node (Number $ read x) [])
symbol :: Parser (Tree ASTNode)
symbol = many1 (oneOf ['a' .. 'z']) >>= (\x -> return $ Node (Symbol x) [])
whitespace :: Parser String
whitespace = many1 (oneOf " \t\n\r\f")
app :: Parser (Tree ASTNode)
app =
char '(' >>
sepBy1 expr whitespace >>= (\(e:es) ->
char ')' >>
(return $ Node (Application e) es))
expr :: Parser (Tree ASTNode)
expr = symbol <|> int <|> app
and example use:
ghci> parse expr "" "(a 12 (b 13))"
Right
(Node
(Application
(Node (Symbol "a") []))
[Node (Number 12.0) [],
Node
(Application
(Node (Symbol "b") []))
[Node (Number 13.0) []]])
(sorry for the formatting -- hopefully it's clear)
I'd absolutely go for the AST, because interpretation/compilation/language analysis in general is very much driven by the structure of your language. The AST will simply and naturally represent and respect that structure, while Tree will do neither.
For example, a common form of language implementation technique is to implement some complex features by translation: translate programs that involve those features or constructs into programs in a subset of the a language that does not use them (Lisp macros, for example, are all about this). If you use an AST, the type system will, for example, often forbid you from producing illegal translations as output. Whereas a Tree type that doesn't understand your program will not help there.
Your AST doesn't look very complicated, so writing utility functions for it should not be hard. Take this one for example:
foldASTNode :: (r -> [r] -> r) -> (String -> r) -> (Float -> r) -> r
foldASTNode app sym num node =
case node of
Application f args -> app (subfold f) (map subfold args)
Symbol str -> sym str
Number n -> num n
where subfold = foldASTNode app sym num
And in any case, what sort of Functor do you wish to have on your AST? There's no type parameter on it...