Making attoparsec parsers recursive - parsing

I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
| Choice2 Expression
data Expression = ConsExpr Segment Expression
| Token String
| EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.

Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
where many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because this pattern allows parser argument to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.

Related

Applicative parser stuck in infinite loop

I'm trying to implement my own Applicative parser, here's the code I use:
{-# LANGUAGE ApplicativeDo, LambdaCase #-}
module Parser where
-- Implementation of an Applicative Parser
import Data.Char
import Control.Applicative (some, many, empty, (<*>), (<$>), (<|>), Alternative)
data Parser a = Parser { runParser :: String -> [(a, String)] }
instance Functor Parser where
-- fmap :: (a -> b) -> (Parser a -> Parser b)
fmap f (Parser p) = Parser (\s -> [(f a, s') | (a,s') <- p s])
instance Applicative Parser where
-- pure :: a -> Parser a
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
pure x = Parser $ \s -> [(x, s)]
(Parser pf) <*> (Parser p) = Parser $ \s ->
[(f a, s'') | (f, s') <- pf s, (a, s'') <- p s']
instance Alternative Parser where
-- empty :: Parser a
-- <|> :: Parser a -> Parser a -> Parser a
empty = Parser $ \_s -> []
(Parser p1) <|> (Parser p2) = Parser $ \s ->
case p1 s of [] -> p2 s
xs -> xs
char :: Char -> Parser Char
char c = Parser $ \case (c':cs) | c == c' -> [(c,cs)] ; _ -> []
main = print $ runParser (some $ char 'A') "AAA"
When I run it, it gets stuck and never returns. After digging into the problem I pinpointed the root cause to be my implementation of the <|> method. If I use the following implementation then everything goes as expected:
instance Alternative Parser where
empty = Parser $ \_s -> []
p1 <|> p2 = Parser $ \s ->
case runParser p1 s of [] -> runParser p2 s
xs -> xs
These two implementations are, in my understanding, quite equivalent. What I guess is that this may have something to do with Haskell's lazy evaluation scheme. Can someone explain what's going on?
Fact "star": in your implementation of (<*>):
Parser p1 <*> Parser p2 = ...
...we must compute enough to know that both arguments are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equation.
Fact "pipe strict": in this implementation:
Parser p1 <|> Parser p2 = ...
...we must compute enough to know that both parsers are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equals sign.
Fact "pipe lazy": in this implementation:
p1 <|> p2 = Parser $ ...
...we may proceed to the right-hand side of the equals sign without doing any computation on p1 or p2.
This is important, because:
some v = some_v where
some_v = pure (:) <*> v <*> (some_v <|> pure [])
Let's take your first implementation, the one about which we know the "pipe strict" fact. We want to know if some_v is an application of Parser to something. Thanks to fact "star", we must therefore know whether pure (:), v, and some_v <|> pure [] are applications of Parser to something. To know this last one, by fact "pipe strict", we must know whether some_v and pure [] are applications of Parser to something. Whoops! We just showed that to know whether some_v is an application of Parser to something, we need to know whether some_v is an application of Parser to something -- an infinite loop!
On the other hand, with your second implementation, to check whether some_v is a Parser _, we still must check pure (:), v, and some_v <|> pure [], but thanks to fact "pipe lazy", that's all we need to check -- we can be confident that some_v <|> pure [] is a Parser _ without first checking recursively that some_v and pure [] are.
(And next, you will learn about newtype -- and be confused yet again when changing from data to newtype makes both implementation work!)

Implementation differences between parser combinators and packrat algorithm

In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (\dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail [] = Parser (\dvs -> NoParse (nullError dvs))
fail msg = Parser (\dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
[] -> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat(). Its implementation is almost a drop-in replacement for pyparsing's _parse method (renamed to _parseNoCache), with a _parseCache method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat(), or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.

Implement `Applicative Parser`'s Apply Function

From Brent Yorgey's 2013 Penn class, after getting help on defining a Functor Parser, I'm attempting to make an Applicative Parser:
--p1 <*> p2 represents the parser which first runs p1 (which will
--consume some input and produce a function), then passes the
--remaining input to p2 (which consumes more input and produces
--some value), then returns the result of applying the function to the
--value
Here's my attempt:
instance Applicative (Parser) where
pure x = Parser $ \_ -> Just (x, [])
(Parser f) <*> (Parser g) = case (\ys -> f ys) of Nothing -> Parser Nothing
Just (_, xs) -> Parser $ g xs
However, I'm getting compile-time errors on the apply (<*>) definition.
Intuitively, I believe that using <*> achieves AND functionality.
If I have a parser for foo and a parser for bar, then I should be able to use apply <*> to say: foo followed by bar. In other words, input of foobar should successfully match, whereas foobip would not. It would fail on the second parser.
However, I believe that the types are:
Parser (a -> b) -> Parser a -> Parser b
So, that makes me think that my intuition is not entirely correct.
Please give me a tip to guide me towards understanding how to implement apply.
Your code is predicated on a misunderstanding of what a Parser is. Don't worry, virtually everybody makes this mistake.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Lets break down what this means.
String -> Maybe (a, String)
[1] [2] [3] [4]
[1]: I take a string and return Maybe (a, String)
[2]: I might not succeed in parsing the input into the desired datatype
[3]: The desired type I am parsing the String into
[4]: Remaining input after having consumed the amount of data required to parse a
Parser is a function of text input to Maybe a tuple of a value and the rest of the text. Parser is emphatically not a tuple, otherwise you wouldn't have a parser. Just data in a tuple.
I'm not going to tell you how to implement <*> and nobody else should either as it would deprive you of the experience.
However, I'll give you pure so you understand the basic pattern:
pure a = Parser (\s -> Just (a, s))
See? It's a function of s -> Maybe (a, s). I intentionally mimicked the type variables in my terms to make it more obvious.

Conditional looping in an Applicative Functor

Suppose that Parser x is a parser that parses an x. This parser probably possesses a many combinator, that parses zero or more occurrences of something (stopping when the item parser fails).
I can see how one might implement that if Parser forms a monad. I can't figure out how to do it if Parser is only an Applicative Functor. There doesn't seem to be any way to check the previous result and decide what to do next (precisely the notion that monads add). What am I missing?
The Alternative type class provides the many combinator:
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
many :: f a -> f [a]
some :: f a -> f [a]
some = some'
many = many'
many' a = some' a <|> pure []
some' a = (:) <$> a <*> many' a
The many a combinator means “zero or more” a.
The some a combinator means “one or more” a.
Hence:
The some a combinator returns a list of one a followed by many a (i.e. 1 + (0 or more)).
The many a combinator returns either some a or an empty list (i.e. (1 or more) | 0).
The many combinator depends upon the (<|>) operator which can be viewed as the default operator in languages like JavaScript. For example, consider the Alternative instance of Maybe:
instance Alternative Maybe where
empty = Nothing
Nothing <|> r = r
l <|> _ = l
Essentially the (<|>) should return the left hand side value if it's truthy. Otherwise it should return the right hand side value.
A Parser is a data structure which is defined similarly to Maybe (the idea of applicative lexer combinators and parser combinators is essentially the same):
data Lexer a = Fail | Ok (Maybe a) (Vec (Lexer a))
If parsing fails, the Fail value is returned. Otherwise an Ok value is returned. Since Fail <|> pure [] is pure [], this is how the many combinator knows when to stop and return an empty list.
It can't be done just by using what is provided by Applicative. But Alternative has a function that gives you power beyond Applicative:
(<|>) :: f a -> f a -> f a
This function lets you "combine" two Alternatives without any restriction whatsoever on the a. But how? Something intrinsic to the particular functor f must give you a means to do that.
Typically, Alternatives require some notion of failure or emptiness. Like for parsers, where (<|>) means "try to parse this, if it fails, try this other thing". But this "dependence on a previous value" is hidden in the machinery implementing (<|>). It is not available to the external interface, so to speak.
From (<|>), one can implement a zero-or-one combinator:
optional :: Alternative f => f a -> f (Maybe a)
optional v = Just <$> v <|> pure Nothing
The definitions of some an many are similar but they require mutually recursive functions.
Notice that there are Applicatives that aren't Alternatives. You can't make the Identity functor an Alternative, for example. How would you implement empty?
many is a class method of the Alternative class (link) which suggests that an general applicative functor does not always have a many implementation.

Haskell: Lifting a reads function to a parsec parser

As part of the 4th exercise here
I would like to use a reads type function such as readHex with a parsec Parser.
To do this I have written a function:
liftReadsToParse :: Parser String -> (String -> [(a, String)]) -> Parser a
liftReadsToParse p f = p >>= \s -> if null (f s) then fail "No parse" else (return . fst . head ) (f s)
Which can be used, for example in GHCI, like this:
*Main Numeric> parse (liftReadsToParse (many1 hexDigit) readHex) "" "a1"
Right 161
Can anyone suggest any improvement to this approach with regard to:
Will the term (f s) be memoised, or evaluated twice in the case of a null (f s) returning False?
Handling multiple successful parses, i.e. when length (f s) is greater than one, I do not know how parsec deals with this.
Handling the remainder of the parse, i.e. (snd . head) (f s).
This is a nice idea. A more natural approach that would make
your ReadS parser fit in better with Parsec would be to
leave off the Parser String at the beginning of the type:
liftReadS :: ReadS a -> String -> Parser a
liftReadS reader = maybe (unexpected "no parse") (return . fst) .
listToMaybe . filter (null . snd) . reader
This "combinator" style is very idiomatic Haskell - once you
get used to it, it makes function definitions much easier
to read and understand.
You would then use liftReadS like this in the simple case:
> parse (many1 hexDigit >>= liftReadS readHex) "" "a1"
(Note that listToMaybe is in the Data.Maybe module.)
In more complex cases, liftReadS is easy to use inside any
Parsec do block.
Regarding some of your other questions:
The function reader is applied only once now, so there is nothing to "memoize".
It is common and accepted practice to ignore all except the first parse in a ReadS parser in most cases, so you're fine.
To answer the first part of your question, no (f s) will not be memoised, you would have to do that manually:
liftReadsToParse p f = p >>= \s -> let fs = f s in if null fs then fail "No parse"
else (return . fst . head ) fs
But I'd use pattern matching instead:
liftReadsToParse p f = p >>= \s -> case f s of
[] -> fail "No parse"
(answer, _) : _ -> return answer

Resources