Applicative parser stuck in infinite loop - parsing

I'm trying to implement my own Applicative parser, here's the code I use:
{-# LANGUAGE ApplicativeDo, LambdaCase #-}
module Parser where
-- Implementation of an Applicative Parser
import Data.Char
import Control.Applicative (some, many, empty, (<*>), (<$>), (<|>), Alternative)
data Parser a = Parser { runParser :: String -> [(a, String)] }
instance Functor Parser where
-- fmap :: (a -> b) -> (Parser a -> Parser b)
fmap f (Parser p) = Parser (\s -> [(f a, s') | (a,s') <- p s])
instance Applicative Parser where
-- pure :: a -> Parser a
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
pure x = Parser $ \s -> [(x, s)]
(Parser pf) <*> (Parser p) = Parser $ \s ->
[(f a, s'') | (f, s') <- pf s, (a, s'') <- p s']
instance Alternative Parser where
-- empty :: Parser a
-- <|> :: Parser a -> Parser a -> Parser a
empty = Parser $ \_s -> []
(Parser p1) <|> (Parser p2) = Parser $ \s ->
case p1 s of [] -> p2 s
xs -> xs
char :: Char -> Parser Char
char c = Parser $ \case (c':cs) | c == c' -> [(c,cs)] ; _ -> []
main = print $ runParser (some $ char 'A') "AAA"
When I run it, it gets stuck and never returns. After digging into the problem I pinpointed the root cause to be my implementation of the <|> method. If I use the following implementation then everything goes as expected:
instance Alternative Parser where
empty = Parser $ \_s -> []
p1 <|> p2 = Parser $ \s ->
case runParser p1 s of [] -> runParser p2 s
xs -> xs
These two implementations are, in my understanding, quite equivalent. What I guess is that this may have something to do with Haskell's lazy evaluation scheme. Can someone explain what's going on?

Fact "star": in your implementation of (<*>):
Parser p1 <*> Parser p2 = ...
...we must compute enough to know that both arguments are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equation.
Fact "pipe strict": in this implementation:
Parser p1 <|> Parser p2 = ...
...we must compute enough to know that both parsers are actually applications of the Parser constructor to something before we may proceed to the right-hand side of the equals sign.
Fact "pipe lazy": in this implementation:
p1 <|> p2 = Parser $ ...
...we may proceed to the right-hand side of the equals sign without doing any computation on p1 or p2.
This is important, because:
some v = some_v where
some_v = pure (:) <*> v <*> (some_v <|> pure [])
Let's take your first implementation, the one about which we know the "pipe strict" fact. We want to know if some_v is an application of Parser to something. Thanks to fact "star", we must therefore know whether pure (:), v, and some_v <|> pure [] are applications of Parser to something. To know this last one, by fact "pipe strict", we must know whether some_v and pure [] are applications of Parser to something. Whoops! We just showed that to know whether some_v is an application of Parser to something, we need to know whether some_v is an application of Parser to something -- an infinite loop!
On the other hand, with your second implementation, to check whether some_v is a Parser _, we still must check pure (:), v, and some_v <|> pure [], but thanks to fact "pipe lazy", that's all we need to check -- we can be confident that some_v <|> pure [] is a Parser _ without first checking recursively that some_v and pure [] are.
(And next, you will learn about newtype -- and be confused yet again when changing from data to newtype makes both implementation work!)

Related

Haskell Parser Seperator

I am using Parsec to write a parser for a logfile. Every line of that logfile follows a common structure A:B:C:D with the components A, B, C and D following simple rules. I've already written parsers for each of the components and I would like to combine them into a single parser. My current approach works, but I feel there has to be a nicer solution. One immediate drawback is that it would not scale very well for logfiles with more than 4 components.
parser :: (a -> b -> c -> d -> e) -> Parser a -> Parser b -> Parser c -> Parser d -> Parser e
parser f pa pb pc pd = f <$> pa <* (char ':') <*> pb <* (char ':') <*> pc <* (char ':') <*> pd
I searched for a fitting parser combinator, but the only combinator coming close is sepBy, which does not work for this use case. Any help is appreciated!
I think the best option is to introduce your own operator, e.g.:
infixl 4 <:>
p <:> q = p <* char ':' <*> q
Then you don't need to define a separate parse function, it is just as easy to just write the implementation:
myParser = f <$> pa <:> pb <:> pc <:> pd
This is easily extended:
myParser2 = g <$> pa <:> pb <:> pc <:> pd <:> pe

Can not understand 'pMany' and 'pMany1' in the parser combinator

I am trying to understand the pMany and pMany1 functions in the parser combinator
newtype Parser s t = P([s] -> [(t, [s])])
pMany, pMany1 :: Parser s a → Parser s [a]
pMany p =(:) <$> p <*> pMany p `opt` []
pMany1 p = (:) <$> p <*> pMany p
As I understand, pMany runs the parser as many times as possible and collects the final result in [a]. What I don't understand is how it keeps track of the results between each run. The applicative combination is context-free and should not remember the state in between. Is that correct?
Thanks a lot!
Let's break it down;
pMany p = (:) <$> p <*> pMany p `opt` []
basically means
pMany p = (fmap (:) p <*> pMany p) `opt` []
This expression consists of two parts:
Parse fmap (:) p <*> pMany p
If above fails, be okay with empty list as result
The idea here is to try to parse "an element and try to parse more" or "don't parse anything" if the previous step didn't succeed. I assume the second part is understandable, let's focus on the first.
What we will need here is to understand how do the fmap and <*> work:
fmap is pretty simple: it takes a function a -> b, a parser Parser s a and returns Parser s b. This allows us to explicitly manipulate the results of the parser without actually running it.
<*> works exactly the same as fmap with a little difference that the function is itself a result of a parsing. In (subjectively) most sane implementation it:
Runs the left side parser (consuming input) that returns a function
Runs the right side parser (on the remaining input) that returns an argument
Combines the above into a single parser that returns function applied to mentioned argument
So what happens in this mysterious fmap (:) p <*> pMany p:
First we parse some object of type a using parser p.
Then, inside of the parsing context, we apply function (:) :: a -> [a] -> [a] to it. So if we have parsed, let's say, an int 2137, we now have (:) 2137 which is the same as \rest -> 2137:rest. At this point we have parser of type Parser s ([a] -> [a]).
Next step is to parse the right side of the <*> operator which is recursive call to pMany. We can read this as "proceed with the same algorithm". Effectively, we parse the rest of the elements. This yields (according to the type of pMany) Parser s [a].
Finally, we apply the previous result over the last one getting a parser that attaches the left element (parsed with p) to the following ones (parsed with pMany p). From the type of <*> we can infer that the resulting type will be Parser s [a] as expected.
This code is semantically equivalent to
pMany p = (do
someElem <- p
restElems <- pMany p
return (someElem : restElems)
) `opt` []
pMany1 does the same trick, but fails if it can't parse the first element. Note that it calls pMany after that which doesn't have this property. As a result we force it to parse at least one thing ("parse one and then parse any number").

How do I implement an Applicative instance for a parser without assuming Monad?

I can't figure out how to implement an Applicative instance for this parser:
newtype Parser m s a = Parser { getParser :: [s] -> m ([s], a) }
without assuming Monad m. I expected to only have to assume Applicative m, since the Functor instance only has to assume Functor m. I finally ended up with:
instance Functor m => Functor (Parser m s) where
fmap f (Parser g) = Parser (fmap (fmap f) . g)
instance Monad m => Applicative (Parser m s) where
pure a = Parser (\xs -> pure (xs, a))
Parser f <*> Parser x = Parser h
where
h xs = f xs >>= \(ys, f') ->
x ys >>= \(zs, x') ->
pure (zs, f' x')
How do I do this? I tried substituting in for >>= by hand, but always wound up getting stuck trying to reduce a join -- which would also require Monad.
I also consulted Parsec, but even that wasn't much help:
instance Applicative.Applicative (ParsecT s u m) where
pure = return
(<*>) = ap
My reasons for asking this question are purely self-educational.
It's not possible. Look at the inside of your newtype:
getParser :: [s] -> m ([s], a)
Presumably, you want to pass [s] to the input of y in x <*> y. This is exactly the difference between Monad m and Applicative m:
In Monad you can use the output of one computation as the input to another.
In Applicative, you cannot.
It's possible if you do a funny trick:
Parser x <*> Parser y = Parser $
\s -> (\(_, xv) (s', yv) -> (s', xv yv)) <$> x s <*> y s
However, this is almost certainly not the definition that you want, since it parses x and y in parallel.
Fixes
Your ParserT can be Applicative quite easily:
newtype ParserT m s a = ParserT { runParser :: [s] -> m ([s], a) }
-- or, equvalently
newtype ParserT m s a = ParserT (StateT [s] m a)
instance Monad m => Applicative (ParserT m s) where
...
Note that ParserT m s is not an instance of Monad as long as you don't define the Monad instance.
You can move the leftover characters outside the parser:
newtype ParserT m s a = ParserT { runParser :: [s] -> ([s], m a) }
instance Applicative m => Applicative (ParserT m s) where
ParserT x <*> ParserT y = ParserT $ \s ->
let (s', x') = x s
(s'', y') = y s'
in x' <*> y'
...
Full marks for aiming to use Applicative as much as possible - it's much cleaner.
Headline: Your parser can stay Applicative, but your collection of possible parses need to be stored in a Monad. Internal structure: uses a monad. External structure: is applicative.
You're using m ([s],a) to represent a bunch of possible parses. When you parse the next input, you want it to depend on what's already been parsed, but you're using m because there's potentially less than or more than one possible parse; you want to do \([s],a) -> ... and work with that to make a new m ([s],a). That process is called binding and uses >>= or equivalent, so your container is definitely a Monad, no escape.
It's not all that bad using a monad for your container - it's just a container you're keeping some stuff in after all. There's a difference between using a monad internally and being a monad. Your parsers can be applicative whilst using a monad inside.
See What are the benefits of applicative parsing over monadic parsing?.
If your parsers are applicative, they're simpler, so in theory you can do some optimisation when you combine them, by keeping static information about what they do instead of keeping their implementation. For example,
string "Hello World!" <|> string "Hello Mum!"
== (++) <$> string "Hello " <*> (string "World" <|> string "Mum!")
The second version is better than the first because it does no backtracking.
If you do a lot of this, it's like when a regular expression is compiled before it's run, creating a graph (finite state automaton) and simplifying it as much as possible and eliminating a whole load of inefficient backtracking.

Making attoparsec parsers recursive

I've been coding up an attoparsec parser and have been hitting a pattern where I want to turn parsers into recursive parsers (recursively combining them with the monad bind >>= operator).
So I created a function to turn a parser into a recursive parser as follows:
recursiveParser :: (a -> A.Parser a) -> a -> A.Parser a
recursiveParser parser a = (parser a >>= recursiveParser parser) <|> return a
Which is useful if you have a recursive data type like
data Expression = ConsExpr Expression Expression | EmptyExpr
parseRHS :: Expression -> Parser Expression
parseRHS e = ConsExpr e <$> parseFoo
parseExpression :: Parser Expression
parseExpression = parseLHS >>= recursiveParser parseRHS
where parseLHS = parseRHS EmptyExpr
Is there a more idiomatic solution? It almost seems like recursiveParser should be some kind of fold... I also saw sepBy in the docs, but this method seems to suit me better for my application.
EDIT: Oh, actually now that I think about it should actually be something similar to fix... Don't know how I forgot about that.
EDIT2: Rotsor makes a good point with his alternative for my example, but I'm afraid my AST is actually a bit more complicated than that. It actually looks something more like this (although this is still simplified)
data Segment = Choice1 Expression
| Choice2 Expression
data Expression = ConsExpr Segment Expression
| Token String
| EmptyExpr
where the string a -> b brackets to the right and c:d brackets to the left, with : binding more tightly than ->.
I.e. a -> b evaluates to
(ConsExpr (Choice1 (Token "a")) (Token "b"))
and c:d evaluates to
(ConsExpr (Choice2 (Token "d")) (Token "c"))
I suppose I could use foldl for the one and foldr for the other but there's still more complexity in there. Note that it's recursive in a slightly strange way, so "a:b:c -> e:f -> :g:h ->" is actually a valid string, but "-> a" and "b:" are not. In the end fix seemed simpler to me. I've renamed the recursive method like so:
fixParser :: (a -> A.Parser a) -> a -> A.Parser a
fixParser parser a = (parser a >>= fixParser parser) <|> pure a
Thanks.
Why not just parse a list and fold it into whatever you want later?
Maybe I am missing something, but this looks more natural to me:
consChain :: [Expression] -> Expression
consChain = foldl ConsExpr EmptyExpr
parseExpression :: Parser Expression
parseExpression = consChain <$> many1 parseFoo
And it's shorter too.
As you can see, consChain is now independent from parsing and can be useful somewhere else. Also, if you separate out the result folding, the somewhat unintuitive recursive parsing simplifies down to many or many1 in this case.
You may want to take a look at how many is implemented too:
many :: (Alternative f) => f a -> f [a]
many v = many_v
where many_v = some_v <|> pure []
some_v = (:) <$> v <*> many_v
It has a lot in common with your recursiveParser:
some_v is similar to parser a >>= recursiveParser parser
many_v is similar to recursiveParser parser
You may ask why I called your recursive parser function unintuitive. This is because this pattern allows parser argument to affect the parsing behaviour (a -> A.Parser a, remember?), which may be useful, but not obviously (I don't see a use case for this yet). The fact that your example does not use this feature makes it look redundant.

Haskell: Lifting a reads function to a parsec parser

As part of the 4th exercise here
I would like to use a reads type function such as readHex with a parsec Parser.
To do this I have written a function:
liftReadsToParse :: Parser String -> (String -> [(a, String)]) -> Parser a
liftReadsToParse p f = p >>= \s -> if null (f s) then fail "No parse" else (return . fst . head ) (f s)
Which can be used, for example in GHCI, like this:
*Main Numeric> parse (liftReadsToParse (many1 hexDigit) readHex) "" "a1"
Right 161
Can anyone suggest any improvement to this approach with regard to:
Will the term (f s) be memoised, or evaluated twice in the case of a null (f s) returning False?
Handling multiple successful parses, i.e. when length (f s) is greater than one, I do not know how parsec deals with this.
Handling the remainder of the parse, i.e. (snd . head) (f s).
This is a nice idea. A more natural approach that would make
your ReadS parser fit in better with Parsec would be to
leave off the Parser String at the beginning of the type:
liftReadS :: ReadS a -> String -> Parser a
liftReadS reader = maybe (unexpected "no parse") (return . fst) .
listToMaybe . filter (null . snd) . reader
This "combinator" style is very idiomatic Haskell - once you
get used to it, it makes function definitions much easier
to read and understand.
You would then use liftReadS like this in the simple case:
> parse (many1 hexDigit >>= liftReadS readHex) "" "a1"
(Note that listToMaybe is in the Data.Maybe module.)
In more complex cases, liftReadS is easy to use inside any
Parsec do block.
Regarding some of your other questions:
The function reader is applied only once now, so there is nothing to "memoize".
It is common and accepted practice to ignore all except the first parse in a ReadS parser in most cases, so you're fine.
To answer the first part of your question, no (f s) will not be memoised, you would have to do that manually:
liftReadsToParse p f = p >>= \s -> let fs = f s in if null fs then fail "No parse"
else (return . fst . head ) fs
But I'd use pattern matching instead:
liftReadsToParse p f = p >>= \s -> case f s of
[] -> fail "No parse"
(answer, _) : _ -> return answer

Resources