Defining a new primitive combinator in Parsec - parsing

Problem context
I am writing a toy parser (for Scheme), that should be able to distinguish between - and + as identifiers and as a part of a number (e.g. +i, -2.4, +5). I would like to try-parse anything starting with + or - as a number, but with a catch: if the parses consumes the sign, but not any characters after the sign, then I would like it to act as if it was wrapped in a try, but if any input was consumed after the sign then I'd like it to fail, getting a nice contextual error message and line number/position for the offending character; wrapping the entire parser in a try would always backtrack, which is not what I want.
New combinator: followedBy
To this end, I want to create a combinator, which I dubbed followedBy¹, that mirrors bind (>>=) and takes a parser m and a function k that returns a parser,
followedBy :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
m `followedBy` k = ...
and acts as >>= except when m consumes ok and k empty fails, in which case I want the combinator to empty fail. This combinator can then be used as such:
(char '+') `followedBy` (\sign -> parseUnsignedNumber sign)
to try to parse as a number, fail if it is partially a number but malformed, and empty fail if what follows + is not at all like a number.
Low-level implementation
I have adjusted the parserBind code from Text.Parsec.Prim to do exactly this², but because it uses symbols that are not exported by Parsec I can't compile it. Is there a way to write the same thing using higher-level constructs that are exported by Parsec?
My (uncompilable) implementation for followedBy is:
import Text.Parsec.Prim
followedBy :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
-- Code adjusted from parserBind in Text.Parsec.Prim. Changed is that mcok.peerr uses eerr instead.
followedBy m k
= ParsecT $ \s cok cerr eok eerr ->
let
-- consumed-okay case for m
mcok x s' err =
let
-- if (k x) consumes, those go straight up
pcok = cok
pcerr = cerr
-- if (k x) doesn't consume input, but is okay,
-- we still return in the consumed continuation
peok x s err' = cok x s (mergeError err err')
-- if (k x) doesn't consume input, but errors,
-- then we return the empty error (not consuming m or k)
peerr err' = eerr (mergeError err err')
in unParser (k x) s pcok pcerr peok peerr
-- empty-ok case for m
meok x s err =
let
-- in these cases, (k x) can return as empty
pcok = cok
peok x s err' = eok x s (mergeError err err')
pcerr = cerr
peerr err' = eerr (mergeError err err')
in unParser (k x) s pcok pcerr peok peerr
-- consumed-error case for m
mcerr = cerr
-- empty-error case for m
meerr = eerr
in unParser m s mcok mcerr meok meerr
Possible alternative
I know that there is also an alternative, namely to try-parse + as an identifier, along the lines of char '+' >> parseDelimiter, and I will take this route if the followedBy does not work and another (elegant) solution does not present itself, but I am really curious if followedBy can be implemented and, if so, how.
¹) Anyone having a better name for it is welcome to comment.
²) I have not been able to test it, so I don't know for sure if my code works correctly.

Related

Using monad for simple Haskell parser

TL;DR
I'm trying to understand how this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
where
p (c:cs) | pred c = Just (cs, c)
p _ = Nothing
Is equivalent to this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = do
c <- anyChar
if pred c then return c else empty
Context
This is a snippet from some lecture notes on Haskell parsing, which I'm trying to understand:
import Control.Applicative
import Data.Char
import Data.Functor
import Data.List
newtype Parser a = PsrOf (String -> Maybe (String, a))
-- Function from input string to:
--
-- * Nothing, if failure (syntax error);
-- * Just (unconsumed input, answer), if success.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p
-- Monadic Parsing in Haskell uses [] instead of Maybe to support ambiguous
-- grammars and multiple answers.
-- | Use a parser on an input string.
runParser :: Parser a -> String -> Maybe a
runParser (PsrOf p) inp = case p inp of
Nothing -> Nothing
Just (_, a) -> Just a
-- OR: fmap (\(_,a) -> a) (p inp)
-- | Read a character and return. Failure if input is empty.
anyChar :: Parser Char
anyChar = PsrOf p
where
p "" = Nothing
p (c:cs) = Just (cs, c)
-- | Read a character and check against the given character.
char :: Char -> Parser Char
-- char wanted = PsrOf p
-- where
-- p (c:cs) | c == wanted = Just (cs, c)
-- p _ = Nothing
char wanted = satisfy (\c -> c == wanted) -- (== wanted)
-- | Read a character and check against the given predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
where
p (c:cs) | pred c = Just (cs, c)
p _ = Nothing
-- Could also be:
-- satisfy pred = do
-- c <- anyChar
-- if pred c then return c else empty
instance Monad Parser where
-- return :: a -> Parser a
return = pure
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
PsrOf p1 >>= k = PsrOf q
where
q inp = case p1 inp of
Nothing -> Nothing
Just (rest, a) -> dePsr (k a) rest
I understand everything up until the last bit of the Monad definition, specifically I don't understand how the following line returns something of type Parser b as is required by the (>>=) definition:
Just (rest, a) -> dePsr (k a) rest
It's difficult for me grasp what the Monad definition means without an example. Thankfully, we have one in the alternate version of the satisfy function, which uses do-notation (which of course means the Monad is being called). I really don't understand do-notation yet, so here's the desugared version of satisfy:
satisfy pred = do
anyChar >>= (c ->
if pred c then return c else empty)
So based on the first line of our (>>=)definition, which is
PsrOf p1 >>= k = PsrOf q
We have anyChar as our PsrOf p1 and (c -> if pred c then return c else empty) as our k. What I don't get is how in dePsr (k a) rest that (k a) returns a Parser (at least it shold, otherwise calling dePsr on it wouldn't make sense). This is made more confusing by the presence of rest. Even if (k a) returned a Parser, calling dePsr would extract the underlying function from the returned Parser and pass rest to it as an input. This is definitely doesn't return something of type Parser b as required by the definition of (>>=). Clearly I'm misunderstanding something somewhere.
Ok, Maybe this will help. Let's start by puting some points back into dePsr.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) rest = p rest
And let's also write out return: (NB I'm putting in all the points for clarity)
return :: a -> Parser a
return a = PsrOf (\rest -> Just (rest, a))
And now from the Just branch of the (>>=) definition
Just (rest, a) -> dePsr (k a) rest
Let's make sure we agree on what every thing is:
rest the string remaining unparsed after p1 is applied
a the result of applying p1
k :: a -> Parser b takes the result of the previous parser and makes a new parser
dePsr unwraps a Parser a back into a function `String -> Maybe (String, a)
Remember we will wrap this back into a parser again at the top of the function: PsrOf q
So in English bind (>>=) take a parser in a and a function from a to a parser in b and returns a parser in b. The resulting parser is made by wrapping q :: String -> Maybe (String, b) in the Parser constructor PsrOf. Then q, the combined parser, take a String called inp and applies the function p1 :: String -> Maybe (String,a) that we got from pattern matching against the first parser, and pattern matches on the result. For an error we propagate Nothing (easy). If the first parser had a result we have tow pieces of information, the still unparsed string called rest and the result a. We give a to k, the second parser combinator, and get a Parser b which we need to unwrap with dePsr to get a function (String -> Maybe (String,b) back. That function can be applied to rest for the final result of the combined parsers.
I think the hardest part about reading this is that sometimes we curry the parser function which obscures what is actually happening.
Ok for the satisfy example
satisfy pred
= anyChar >>= (c -> if pred c then return c else empty)
empty comes from the alternative instance and is PsrOf (const Nothing) so a parser that always fails.
Lets look at only the successful branches. By substitution of only the successful part:
PsrOf (\(c:cs) ->Just (cs, c)) >>= (\c -> PsrOf (\rest -> Just (rest, c)))
So in the bind (>>=) definition
p1 = \(c:cs -> Just (cs, c))
k = (\c -> PsrOf (\rest -> Just (rest, c)))
q inp = let Just (rest,a) = p1 inp in dePsr (k a) rest again only successful branch
Then q becomes
q inp =
let Just (rest, a) = (\(c:cs) -> Just (cs, c)) inp
in dePsr (\c -> PsrOf (\rest -> Just (rest, c))) a rest
Doing a little β-reduction
q inp =
let (c:cs) = inp
rest = cs
a = c
in dePsr (PsdOf (\rest -> Just (rest, a))) rest -- dePsr . PsrOf = id
Finally cleaning up some more
q (c:cs) = Just (cs, c)
So if pred is successful we reduce satisfy back to exactly anyChar which is exactly what we expect, and exactly what we find in the first example of the question. I will leave it as and exersize to the reader (read: I'm lazy) to prove that if either inp = "" or pred c = False that the outcome is Nothing as in the first satisfy example.
NOTE: If you are doing anything other than a class assignment, you will save yourself hours of pain and frustration by starting with error handling from the beginning make your parser String -> Either String (String,a) it is easy to make the error type more general later, but a PITA to change everything from Maybe to Either.
Question: "[C]ould you explain how you arrived at return a = PsrOf (\rest -> Just (rest, a)) from return = pure after you put "points" back into return?
Answer: First off, it is pretty unfortunate to give the Monad instance definition without the Functor and Applicative definitions. The pure and return functions must be identical (It is part of the Monad Laws), and they would be called the same thing except Monad far predates Applicative in Haskell history. In point of fact, I don't "know" what pure looks like, but I know what it has to be because it is the only possible definition. (If you want to understand the the proof of that statement ask, I have read the papers, and I know the results, but I'm not into typed lambda calculus quite enough to be confident in reproducing the results.)
return must wrap a value in the context without altering the context.
return :: Monad m => a -> m a
return :: a -> Parser a -- for our Monad
return :: a -> PsrOf(\str -> Maybe (rest, value)) -- substituting the constructor (PSUDO CODE)
A Parser is a function that takes a string to be parsed and returns Just the value along with any unparsed portion of the original string or Nothing on failure, all wrapped in the constructorPsrOf. The context is the string to be parsed, so we cannot change that. The value is of course what was passed toreturn`. The parser always succeeds so we must return Just a value.
return a = PsrOf (\rest -> Just (rest, a))
rest is the context and it is passed through unaltered.
a is the value we put into the Monad context.
For completeness here is also the only reasonable definition of fmap from Functor.
fmap :: Functor f => (a->b) -> f a -> f b
fmap :: (a -> b) -> Parser a -> Parser b -- for Parser Monad
fmap f (PsrOf p) = PsrOf q
where q inp = case p inp of
Nothing -> Nothing
Just (rest, a) -> Just (rest, f a)
-- better but less instructive definition of q
-- q = fmap (\(rest,a) -> (rest, f a)) . p

how to tell whether Parsec parser uses constant heap space in Haskell

In a recent question, I asked about the following
parsec parser:
manyLength
:: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
where
go :: Int -> ParsecT s u m Int
go !i = (p *> go (i + 1)) <|> pure i
This function is similar to many. However, instead of returning [a], it
returns the number of times it was able to successfully run p.
This works well, except for one problem. It doesn't run in constant heap
space.
In the linked question, Li-yao
Xia gives an alternative way of
writing manyLength that uses constant heap space:
manyLengthConstantHeap
:: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
where
go :: Int -> ParsecT s u m Int
go !i =
((p *> pure True) <|> pure False) >>=
\success -> if success then go (i+1) else pure i
This is a significant improvement, but I don't understand why
manyLengthConstantHeap uses constant heap space, while my original manyLength doesn't.
If you inline (<|>) in manyLength, it looks somewhat like this:
manyLengthInline
:: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthInline p = go 0
where
go :: Int -> ParsecT s u m Int
go !i =
ParsecT $ \s cok cerr eok eerr ->
let meerr :: ParserError -> m b
meerr err =
let neok :: Int -> State s u -> ParserError -> m b
neok y s' err' = eok y s' (mergeError err err')
neerr :: ParserError -> m b
neerr err' = eerr $ mergeError err err'
in unParser (pure i) s cok cerr neok neerr
in unParser (p *> go (i + 1)) s cok cerr eok meerr
If you inline (>>=) in manyLengthConstantHeap, it looks somewhat like this:
manyLengthConstantHeapInline
:: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeapInline p = go 0
where
go :: Int -> ParsecT s u m Int
go !i =
ParsecT $ \s cok cerr eok eerr ->
let mcok :: Bool -> State s u -> ParserError -> m b
mcok success s' err =
let peok :: Int -> State s u -> ParserError -> m b
peok int s'' err' = cok int s'' (mergeError err err')
peerr :: ParserError -> m b
peerr err' = cerr (mergeError err err')
in unParser
(if success then go (i + 1) else pure i)
s'
cok
cerr
peok
peerr
meok :: Bool -> State s u -> ParserError -> m b
meok success s' err =
let peok :: Int -> State s u -> ParserError -> m b
peok int s'' err' = eok int s'' (mergeError err err')
peerr :: ParserError -> m b
peerr err' = eerr (mergeError err err')
in unParser
(if success then go (i + 1) else pure i)
s'
cok
pcerr
peok
peerr
in unParser ((p *> pure True) <|> pure False) s mcok cerr meok eerr
Here is the ParsecT constructor for completeness:
newtype ParsecT s u m a = ParsecT
{ unParser
:: forall b .
State s u
-> (a -> State s u -> ParseError -> m b) -- consumed ok
-> (ParseError -> m b) -- consumed err
-> (a -> State s u -> ParseError -> m b) -- empty ok
-> (ParseError -> m b) -- empty err
-> m b
}
Why does manyLengthConstantHeap run with constant heap space, while
manyLength does not? It doesn't look like the recursive call to go is in
the tail-call position for either manyLengthConstantHeap or manyLength.
When writing parsec parsers in the future, how can I know the space
requirements for a given parser? How did Li-yao Xia know that
manyLengthConstantHeap would be okay?
I don't feel like I have any confidence in predicting which parsers will use a
lot of memory on a large input.
Is there an easy way to figure out whether a given function will be
tail-recursive in Haskell without running it? Or better yet, without compiling
it?

parsec produce strange errors when trying to handle Maybe as ParseError

If I have this code :
import Text.Parsec
ispositive a = if (a<0) then Nothing else (Just a)
f a b = a+b
parserfrommaybe :: String -> (Maybe c) -> Parsec a b c
parserfrommaybe msg Nothing = fail msg
parserfrommaybe _ (Just res) = return res
(<!>) :: Parsec a b (Maybe c) -> String -> Parsec a b c
(<!>) p1 msg = p1 >>= (parserfrommaybe msg)
integermaybeparser = (ispositive <$> integer) <!> "negative numbers are not allowed"
testparser = f <$> (integermaybeparser <* whiteSpace) <*> integermaybeparser
when I test testparser with input like this "-1 3" it gives :
Left (line 1, column 4):
unexpected "3"
negative numbers are not allowed
I expected it to give error on Column 1 and give the error message without the sentence "unexpected 3" but it seems parsec continued parsing.
Why did this happen ? and how to make parsec give the error message I expect ?
I have found the solution, the cause of is that the first parser gets run and consumes input even when failing.
The solution was to use lookAhead like this:
(<!>) :: (Monad m,Stream a m t) => ParsecT a b m (Maybe c) -> String -> ParsecT a b m c
(<!>) p1 msg = ((lookAhead p1) >>= (parserfrommaybe msg)) *> (p1 >>= (parserfrommaybe msg))
if lookAhead p1 returns Nothing then the first argument of *> would fail without consuming input because of lookAhead, now if lookAhead p1 returns Just res then it would succeed again without consuming input and the result would be obtained from the second argument of *>.
ofcourse I had to change parserfrommaybe type annotation to (Monad m) => String -> (Maybe c) -> ParsecT a b m c to satisfy ghc.

uu-parsinglib parsing with conditional fail

EDITED for more complete problem:
I'd like to create a parser (I'm using uu-parsinglib) that takes the result of a previous parser, and conditionally fails if the result contains a certain constructor:
I now realise this must be a monadic parser.
I have a grammar which contains non-direct left recursive. Below illustrates the problem, the reality is slightly more convoluted:
data Field =
Field_A A
Field_B B
Field_C C
Field_D String
data A =
A Prefix String
data B =
B Prefix String
data C =
C Prefix String
data Prefix =
Prefix Field
Most of the time I'm only interested in Field, and in the interests of minimising backtracking, its best to focus on that case.
I've defined an operator to help
(<..>) :: IsParser p => p (a -> b) -> p (b -> c) -> p (a -> c)
g <..> f = (.) <$> f <*> g
And I approach the problem as:
pField :: Parser Field
pField =
( Field_D <$> pString ) <??>
pChainl' ( pReturn (helper) <*> pPrefix' ) ( pA' <<|> pB' <<|> pC' )
where pChainl' :: IsParser p => p (f -> (pre -> f) -> f) ->
p (pre -> f) ->
p (f -> f)
pChainl' op x = must_be_non_empties "pChainl'" op x (
flip f <$> pList1 (flip <$> op <*> x)
)
f x [] = x
f x (func:rest) = f (func x) rest
helper :: (Field -> Prefix) ->
Field ->
(Prefix -> Field) ->
Field
helper p i n = n $ p i
Note I've defined a variant of pChainl that allows the initial field to be passed in, whilst keeping left association.
pA' :: Parser (Prefix -> Field)
pA' = ( (flip A) <$> pString ) <..> pReturn Field_A
pB' :: Parser (Prefix -> Field)
pB' = ( (flip B) <$> pString ) <..> pReturn Field_B
pC' :: Parser (Prefix -> Field)
pC' = ( (flip C) <$> pString ) <..> pReturn Field_C
-- This consumes no input
pPrefix' :: Parser (Field -> Prefix)
pPrefix' = pReturn Prefix
The question
I'd like to define
pA :: Parser A
in terms of pField, with a post filter to fail if the rightmost Field constructor is not Field_A. As has rightly been pointed out, this is a monadic parse. I can't find any compelling examples of using uu-parsinglib as a monadic parser, so what would your suggested approach be?
If I'm barking up the wrong tree, please let me know also.
It seems like you could make a generalized conditional parser that only succeeds if the value returned by the parser passes some test. This uses the monad capabilities of course. I am not sure if this is a good thing to do with uu-parsinglib however. It seems to work fine in my testing, with one exception: when the conditional fails and no other parsers are available to consume input, the library throws an exception. (something along the lines of no correcting steps given...)
pConditional :: Parser a -> (a -> Bool) -> Parser a
pConditional p test = p >>= (\result -> case (test result) of
True -> pure result
False -> empty)
I would also like to know of other pitfalls that would arise from liberal use of such a conditional parser. (if any.)
I think I've found a solution. I'm still interested in hearing thoughts on the best way to parse such indirect left recursion.
The proposed solution is
pA :: Parser A
pA = do
a <- pField
case a of
(Field_A r) -> return r
otherwise -> pFail

How to restrict backtracking in a monad transformer parser combinator

tl;dr, How do I implement parsers whose backtracking can be restricted, where the parsers are monad transformer stacks?
I haven't found any papers, blogs, or example implementations of this approach; it seems the typical approach to restricting backtracking is a datatype with additional constructors, or the Parsec approach where backtracking is off by default.
My current implementation -- using a commit combinator, see below -- is wrong; I'm not sure about the types, whether it belongs in a type class, and my instances are less generic than it feels like they should be.
Can anyone describe how to do this cleanly, or point me to resources?
I've added my current code below; sorry for the post being so long!
The stack:
StateT
MaybeT/ListT
Either e
The intent is that backtracking operates in the middle layer -- a Nothing or an empty list wouldn't necessarily yield an error, it'd just mean that a different branch should be tried -- whereas the bottom layer is for errors (with some contextual information) that immediately abort the parsing.
{-# LANGUAGE NoMonomorphismRestriction, FunctionalDependencies,
FlexibleInstances, UndecidableInstances #-}
import Control.Monad.Trans.State (StateT(..))
import Control.Monad.State.Class (MonadState(..))
import Control.Monad.Trans.Maybe (MaybeT(..))
import Control.Monad.Trans.List (ListT(..))
import Control.Monad (MonadPlus(..), guard)
type Parser e t mm a = StateT [t] (mm (Either e)) a
newtype DParser e t a =
DParser {getDParser :: Parser e t MaybeT a}
instance Monad (DParser e t) where
return = DParser . return
(DParser d) >>= f = DParser (d >>= (getDParser . f))
instance MonadPlus (DParser e t) where
mzero = DParser (StateT (const (MaybeT (Right Nothing))))
mplus = undefined -- will worry about later
instance MonadState [t] (DParser e t) where
get = DParser get
put = DParser . put
A couple of parsing classes:
class (Monad m) => MonadParser t m n | m -> t, m -> n where
item :: m t
parse :: m a -> [t] -> n (a, [t])
class (Monad m, MonadParser t m n) => CommitParser t m n where
commit :: m a -> m a
Their instances:
instance MonadParser t (DParser e t) (MaybeT (Either e)) where
item =
get >>= \xs -> case xs of
(y:ys) -> put ys >> return y;
[] -> mzero;
parse = runStateT . getDParser
instance CommitParser t (DParser [t] t) (MaybeT (Either [t])) where
commit p =
DParser (
StateT (\ts -> MaybeT $ case runMaybeT (parse p ts) of
Left e -> Left e;
Right Nothing -> Left ts;
Right (Just x) -> Right (Just x);))
And a couple more combinators:
satisfy f =
item >>= \x ->
guard (f x) >>
return x
literal x = satisfy (== x)
Then these parsers:
ab = literal 'a' >> literal 'b'
ab' = literal 'a' >> commit (literal 'b')
give these results:
> myParse ab "abcd"
Right (Just ('b',"cd")) -- succeeds
> myParse ab' "abcd"
Right (Just ('b',"cd")) -- 'commit' doesn't affect success
> myParse ab "acd"
Right Nothing -- <== failure but not an error
> myParse ab' "acd"
Left "cd" -- <== error b/c of 'commit'
The answer appears to be in the MonadOr type class (which unfortunately for me is not part of the standard libraries):
class MonadZero m => MonadOr m where
morelse :: m a -> m a -> m a
satisfying Monoid and Left Catch:
morelse mzero b = b
morelse a mzero = a
morelse (morelse a b) c = morelse a (morelse b c)
morelse (return a) b = return a

Resources