How to tell whether a Parsec parser uses constant heap space in Haskell

In a recent question, I asked about the following
parsec parser:
manyLength
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLength p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i = (p *> go (i + 1)) <|> pure i
This function is similar to many. However, instead of returning [a], it
returns the number of times it was able to successfully run p.
This works well, except for one problem. It doesn't run in constant heap
space.
In the linked question, Li-yao Xia gives an alternative way of writing
manyLength that uses constant heap space:
manyLengthConstantHeap
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ((p *> pure True) <|> pure False) >>=
        \success -> if success then go (i+1) else pure i
This is a significant improvement, but I don't understand why
manyLengthConstantHeap uses constant heap space, while my original manyLength doesn't.
If you inline (<|>) in manyLength, it looks somewhat like this:
manyLengthInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let meerr :: ParseError -> m b
            meerr err =
              let neok :: Int -> State s u -> ParseError -> m b
                  neok y s' err' = eok y s' (mergeError err err')
                  neerr :: ParseError -> m b
                  neerr err' = eerr $ mergeError err err'
              in unParser (pure i) s cok cerr neok neerr
        in unParser (p *> go (i + 1)) s cok cerr eok meerr
If you inline (>>=) in manyLengthConstantHeap, it looks somewhat like this:
manyLengthConstantHeapInline
  :: forall s u m a. Monad m => ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeapInline p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ParsecT $ \s cok cerr eok eerr ->
        let mcok :: Bool -> State s u -> ParseError -> m b
            mcok success s' err =
              let peok :: Int -> State s u -> ParseError -> m b
                  peok int s'' err' = cok int s'' (mergeError err err')
                  peerr :: ParseError -> m b
                  peerr err' = cerr (mergeError err err')
              in unParser
                   (if success then go (i + 1) else pure i)
                   s'
                   cok
                   cerr
                   peok
                   peerr
            meok :: Bool -> State s u -> ParseError -> m b
            meok success s' err =
              let peok :: Int -> State s u -> ParseError -> m b
                  peok int s'' err' = eok int s'' (mergeError err err')
                  peerr :: ParseError -> m b
                  peerr err' = eerr (mergeError err err')
              in unParser
                   (if success then go (i + 1) else pure i)
                   s'
                   cok
                   cerr
                   peok
                   peerr
        in unParser ((p *> pure True) <|> pure False) s mcok cerr meok eerr
Here is the ParsecT constructor for completeness:
newtype ParsecT s u m a = ParsecT
  { unParser
      :: forall b .
         State s u
      -> (a -> State s u -> ParseError -> m b) -- consumed ok
      -> (ParseError -> m b)                   -- consumed err
      -> (a -> State s u -> ParseError -> m b) -- empty ok
      -> (ParseError -> m b)                   -- empty err
      -> m b
  }
Why does manyLengthConstantHeap run with constant heap space, while
manyLength does not? It doesn't look like the recursive call to go is in
the tail-call position for either manyLengthConstantHeap or manyLength.
When writing parsec parsers in the future, how can I know the space
requirements for a given parser? How did Li-yao Xia know that
manyLengthConstantHeap would be okay?
I don't feel like I have any confidence in predicting which parsers will use a
lot of memory on a large input.
Is there an easy way to figure out whether a given function will be
tail-recursive in Haskell without running it? Or better yet, without compiling
it?
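As an empirical complement to reasoning about the continuations, heap behaviour can be measured directly. The following is a minimal sketch, assuming the parsec package is installed; the input size and the choice of char 'a' as the element parser are illustrative assumptions, not part of the original question.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main (main) where

import Text.Parsec (ParsecT, char, parse, (<|>))

-- The constant-heap variant from the question, unchanged.
manyLengthConstantHeap
  :: forall s u m a. ParsecT s u m a -> ParsecT s u m Int
manyLengthConstantHeap p = go 0
  where
    go :: Int -> ParsecT s u m Int
    go !i =
      ((p *> pure True) <|> pure False) >>= \success ->
        if success then go (i + 1) else pure i

main :: IO ()
main = do
  -- Hypothetical test input: one million 'a' characters, produced lazily.
  let input = replicate 1000000 'a'
  -- Should print: Right 1000000
  print (parse (manyLengthConstantHeap (char 'a')) "" input)
```

Compiling with ghc -O2 -rtsopts and running the program with +RTS -s prints a summary that includes the maximum residency; running it with a small heap cap such as +RTS -M16m makes a parser that needs linear heap fail quickly. Swapping in the original manyLength gives a side-by-side comparison. This only measures a given parser on a given input; it does not predict space usage without running the code.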

Related

Implementing a lexer using the Free Monad

I am thinking about a use case of the free monad which would be a simple lexing DSL. So far I came up with some primitive operations:
data LexF r where
  POP :: (Char -> r) -> LexF r
  PEEK :: (Char -> r) -> LexF r
  FAIL :: LexF r
  ...
instance Functor LexF where
  ...
type Lex = Free LexF
The problem I encounter is that I would like to have a CHOICE primitive that would describe an operation of trying to execute one parser and, in case of failure, falling back to another. Something like CHOICE :: LexF r -> LexF r -> (r -> r) -> LexF r...
...and here the trouble begins. Since r is present in a contravariant position, it is impossible (is it?) to create a valid Functor instance for LexF. I came up with another idea, which was to generalize over the type of the alternative lexers, so CHOICE :: LexF a -> LexF a -> (a -> r) -> LexF r. Now it works as a Functor, but the problem is lifting it into Free, as I would normally do with liftF:
choice :: Lex a -> Lex a -> Lex a
choice c1 c2 = liftF $ CHOICE _ _ id -- how to fill the holes :: LexF a ?
I am really running out of ideas. This of course generalizes to nearly all other combinators; I just find CHOICE a good minimal case. How to tackle it? I am okay to hear that this example is totally broken and that it just won't work with Free as I would like it to. But in that case, does it even make sense to write lexers/parsers in this manner?
As a general rule when working with free monads, you don't want to introduce primitives for "monadic control". For example, a SEQUENCE primitive would be ill-advised, because the free monad itself provides sequencing. Likewise, a CHOICE primitive is ill-advised because this should be provided by a free MonadPlus.
Now, there is no free MonadPlus in modern versions of free because equivalent functionality is provided by a free monad transformer over a list base monad, namely FreeT f []. So, what you probably want is to define:
data LexF r where
  POP :: (Char -> r) -> LexF r
  PEEK :: (Char -> r) -> LexF r
deriving instance Functor LexF
type Lex = FreeT LexF []
pop :: (Char -> a) -> Lex a
pop f = liftF $ POP f
peek :: (Char -> a) -> Lex a
peek f = liftF $ PEEK f
but no FAIL or CHOICE primitives.
If you were to define fail and choice combinators, they would be defined by means of the list base monad using transformer magic:
fail :: Lex a
fail = empty
choice :: Lex a -> Lex a -> Lex a
choice = (<|>)
though there's no actual reason to define these.
SPOILERS follow... Anyway, you can now write things like:
anyChar :: Lex Char
anyChar = pop id
char :: Char -> Lex Char
char c = do
  c' <- anyChar
  guard $ c == c'
  return c'
a_or_b :: Lex Char
a_or_b = char 'a' <|> char 'b'
With an interpreter for your monad primitives, in this case interpreting them to the StateT String [] AKA String -> [(a,String)] monad:
type Parser = StateT String []
runLex :: Lex a -> Parser a
runLex = iterTM go
  where
    go :: LexF (Parser a) -> Parser a
    go (POP f) = StateT pop' >>= f
      where
        pop' (c:cs) = [(c,cs)]
        pop' _ = []
    go (PEEK f) = StateT peek' >>= f
      where
        peek' (c:cs) = [(c,c:cs)]
        peek' _ = []
parse :: Lex a -> String -> [(a, String)]
parse = runStateT . runLex
you can then:
main :: IO ()
main = do
  let test = parse a_or_b
  print $ test "abc"
  print $ test "bca"
  print $ test "cde"
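To see what the interpreter's target monad does on its own, here is a minimal sketch that writes the same a_or_b parser directly in StateT String [], without the free layer. It assumes only the transformers package; the names anyChar, char and a_or_b mirror the ones above but are redefined here for this monad.

```haskell
module Main (main) where

import Control.Applicative ((<|>))
import Control.Monad (guard)
import Control.Monad.Trans.State (StateT (..))

-- The target monad of the interpreter: String -> [(a, String)].
type Parser = StateT String []

-- Take one character off the input; the empty list means failure.
anyChar :: Parser Char
anyChar = StateT $ \s -> case s of
  (c : cs) -> [(c, cs)]
  []       -> []

char :: Char -> Parser Char
char c = do
  c' <- anyChar
  guard (c == c')
  return c'

a_or_b :: Parser Char
a_or_b = char 'a' <|> char 'b'

main :: IO ()
main = do
  print (runStateT a_or_b "abc")  -- [('a',"bc")]
  print (runStateT a_or_b "bca")  -- [('b',"ca")]
  print (runStateT a_or_b "cde")  -- []
```

Failure and choice come entirely from the list layer here (empty list and concatenation), which is why the free-monad version needs no FAIL or CHOICE primitives.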
The full example:
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE GADTs #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad (guard) -- newer mtl versions no longer re-export guard
import Control.Monad.State
import Control.Applicative
import Control.Monad.Trans.Free
data LexF r where
  POP :: (Char -> r) -> LexF r
  PEEK :: (Char -> r) -> LexF r
deriving instance Functor LexF
type Lex = FreeT LexF []
pop :: (Char -> a) -> Lex a
pop f = liftF $ POP f
peek :: (Char -> a) -> Lex a
peek f = liftF $ PEEK f
anyChar :: Lex Char
anyChar = pop id
char :: Char -> Lex Char
char c = do
  c' <- anyChar
  guard $ c == c'
  return c'
a_or_b :: Lex Char
a_or_b = char 'a' <|> char 'b'
type Parser = StateT String []
runLex :: Lex a -> Parser a
runLex = iterTM go
  where
    go :: LexF (Parser a) -> Parser a
    go (POP f) = StateT pop' >>= f
      where
        pop' (c:cs) = [(c,cs)]
        pop' _ = []
    go (PEEK f) = StateT peek' >>= f
      where
        peek' (c:cs) = [(c,c:cs)]
        peek' _ = []
parse :: Lex a -> String -> [(a, String)]
parse = runStateT . runLex
main :: IO ()
main = do
  let test = parse a_or_b
  print $ test "abc"
  print $ test "bca"
  print $ test "cde"

How does the Haskell `do` notation know which value to take when it isn't defined by a return?

I have this monadic object.
data Parser a = Parser (String -> Maybe (a, String))
instance Functor Parser where
  -- fmap :: (a -> b) -> Parser a -> Parser b
  fmap f (Parser pa) = Parser $ \input -> case pa input of
    Nothing -> Nothing
    Just (a, rest) -> Just (f a, rest)
instance Applicative Parser where
  pure = return
  (<*>) = ap
instance Monad Parser where
  --return :: a -> Parser a
  return a = Parser $ \input -> Just (a, input)
  --(>>=) :: Parser a -> (a -> Parser b) -> Parser b
  (Parser pa) >>= f = Parser $ \input -> case pa input of
    Nothing -> Nothing
    Just (a,rest) -> parse (f a) rest
And I have this definition of an item which I am told "reads in a character" but I don't really see any reading going on.
item :: Parser Char
item = Parser $ \ input -> case input of "" -> Nothing
                                         (h:t) -> Just (h, t)
But ok, fine, maybe I should just relax about how literal to take the word "read" and jibe with it. Moving on, I have
failParse :: Parser a
failParse = Parser $ \ input -> Nothing
sat :: (Char -> Bool) -> Parser Char
sat p = do c <- item
           if p c
             then return c
             else failParse
And this is where I get pretty confused. What is getting stored in the variable c? Since item is a Parser with parameter Char, my first guess is that c is storing such an object. But after a second of thought I know that's not how the do notation works: you don't get the monad, you get the contents of the monad. Great, but then that tells me c is the function
\ input -> case input of "" -> Nothing
                         (h:t) -> Just (h, t)
But clearly that's wrong since the next line of the definition of sat treats c like a character. Not only is that not what I expect, but it's about three levels of structure down from what I expected! It's not the function, it's not the Maybe object, and it's not the tuple, but it's the left coordinate of the Just tuple buried inside the function! How is that little character working all that way outside? What is instructing the <- to extract this part of the monad?
As a comment mentioned, <- is just do-notation syntactic sugar, and the body of sat is equivalent to:
item >>= (\c -> if p c
                  then return c
                  else failParse)
So what is c? Consider the type of (>>=):
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
or, in a more readable form:
Parser a >>= (a -> Parser b)
Now match it against the expression above, item >>= (\c -> if p c then return c else failParse), which gives:
Parser a = item
and
(a -> Parser b) = (\c -> if p c then return c else failParse)
and item has type:
item :: Parser Char
so we can now replace a in (>>=) by Char, which gives
Parser Char >>= (Char -> Parser b)
so \c -> if p c then return c else failParse also has type Char -> Parser b,
and therefore c is a Char. The whole expression can be expanded to:
sat p =
  item >>= (\c -> ...) =
  Parser pa >>= (\c -> ...) = Parser $ \input -> case pa input of
    Nothing -> Nothing
    Just (a,rest) -> parse (f a) rest
  where f c = if p c
                then return c
                else failParse
        pa input = case input of "" -> Nothing
                                 (h:t) -> Just (h, t)
TL;DR: In general, by Monad laws,
do { item }
is the same as
do { c <- item
; return c
}
so it is defined by a return, in a sense. Details follow.
It does take one character out from the input string which is being "read", so in this sense it "reads" that character:
item :: Parser Char
item = Parser $ \ input ->                 -- input :: [Char]
  case input of { ""    -> Nothing
                ; (h:t) -> Just (h, t)     -- (h:t) :: [Char]
                }                          -- h :: Char   t :: [Char]
and I bet there's a definition
parse (Parser pa) input = pa input
defined there somewhere; so
parse item input = case input of { ""    -> Nothing
                                 ; (h:t) -> Just (h, t) }
Next, what does (>>=) mean? It means that
parse (Parser pa >>= f) input = case (parse (Parser pa) input) of
    Nothing -> Nothing
    Just (a, leftovers) -> parse (f a) leftovers
i.e.
parse (item >>= f) input
  = case (parse item input) of
      Nothing -> Nothing
      Just (a, leftovers) -> parse (f a) leftovers
  = case (case input of { ""    -> Nothing
                        ; (h:t) -> Just (h, t)
                        }) of
      Nothing -> Nothing
      Just (a, leftovers) -> parse (f a) leftovers
  = case input of
      ""    -> Nothing
      (h:t) -> case Just (h, t) of {
                 Just (a, leftovers) -> parse (f a) leftovers }
  = case input of
      ""    -> Nothing
      (h:t) -> parse (f h) t
Now,
-- sat p: a "satisfies `p`" parser
sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item           -- sat p :: Parser Char
           ; if p c              -- item :: Parser Char, c :: Char
               then return c     -- return c :: Parser Char
               else failParse    -- failParse :: Parser Char
           }
      = item >>= (\ c ->
                    if p c then return c else failParse)
(by unraveling the do syntax), and so
parse (sat p) input
  = parse (item >>= (\ c ->
                  if p c then return c else failParse)) input
    -- parse (item >>= f) input
    --  = case input of { "" -> Nothing ; (h:t) -> parse (f h) t }
  = case input of
      ""    -> Nothing
      (h:t) -> parse ((\ c -> if p c then (return c)
                                     else failParse) h) t
  = case input of
      ""    -> Nothing
      (c:t) -> parse (if p c then (return c)
                             else failParse) t
  = case input of
      ""    -> Nothing
      (c:t) -> if p c then parse (return c) t
                      else parse failParse t
  = case input of
      ""    -> Nothing
      (c:t) -> if p c then Just (c, t)
                      else Nothing
Now the meaning of sat p should be clear: for c produced by item (which is the first character in the input, if input is non-empty), if p c holds, c is accepted and the parse succeeds, otherwise the parse fails:
sat p = for c from item:           -- do { c <- item
          if p c                   --    ; if p c
            then return c          --        then return c
            else failParse         --        else failParse }
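The equivalence between the do form and the explicit (>>=) form can be checked by running both. The sketch below is self-contained; parse is defined the way the answer guesses it is, and the names satDo and satBind are hypothetical, introduced here only to hold the two spellings of sat side by side.

```haskell
module Main (main) where

import Control.Monad (ap, liftM)
import Data.Char (isDigit)

data Parser a = Parser (String -> Maybe (a, String))

-- Assumed definition of parse: unwrap the function and apply it.
parse :: Parser a -> String -> Maybe (a, String)
parse (Parser pa) input = pa input

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure a = Parser $ \input -> Just (a, input)
  (<*>) = ap

instance Monad Parser where
  Parser pa >>= f = Parser $ \input -> case pa input of
    Nothing        -> Nothing
    Just (a, rest) -> parse (f a) rest

item :: Parser Char
item = Parser $ \input -> case input of
  ""      -> Nothing
  (h : t) -> Just (h, t)

failParse :: Parser a
failParse = Parser $ \_ -> Nothing

-- sat written with do notation.
satDo :: (Char -> Bool) -> Parser Char
satDo p = do
  c <- item
  if p c then return c else failParse

-- The same parser after desugaring the do block by hand.
satBind :: (Char -> Bool) -> Parser Char
satBind p = item >>= \c -> if p c then return c else failParse

main :: IO ()
main = do
  print (parse (satDo isDigit) "1a")    -- Just ('1',"a")
  print (parse (satBind isDigit) "1a")  -- Just ('1',"a")
  print (parse (satDo isDigit) "a1")    -- Nothing
  print (parse (satBind isDigit) "")    -- Nothing
```

In both versions c is bound by the lambda that (>>=) applies to the first component of the Just tuple, which is why it is a Char rather than a function or a Maybe.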

Parser Error Reporting deriving the right instances

I am trying to build an error reporting parser in haskell. Currently I have been looking at a tutorial and this is what I have so far.
type Position = (Int, Int)
type Err = (String, Position)
newtype Parser1 a = Parser1
  { parse1 :: StateT String (StateT Position (MaybeT (Either Err))) a }
  deriving (Monad, MonadState String, Applicative, Functor)
runParser :: Parser1 a -> String -> Either Err (Maybe ((a, String), Position))
runParser p ts = runMaybeT $ runStateT (runStateT (parse1 p) ts) (0, 0)
basicItem = Parser1 $ do
  state <- get
  case state of
    (x:xs) -> do {put xs; return x}
    [] -> empty
item = Parser1 $ do
  c <- basicItem
  pos <- lift get
  lift (put (f pos))
  return c
f :: Char -> Position -> Position
f d (ln, c) = (ln + 1, 0)
f _ (ln, c) = (ln , c + 1)
This piece of code does not compile. I think it has to do with my item parser and the fact that I am trying to access the inner state, namely the position. How do I make Haskell derive the instances for both states in my parser type in the deriving clause, so that I can access the inner state?
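For reference, one possible way to reach the inner Position state without a second derived MonadState instance is to stay inside the newtype and use lift for the inner layer. This is only a sketch under assumptions: it uses the get and put from transformers rather than the mtl class, it folds basicItem into item, and it renames f to a hypothetical advance that treats '\n' as the line break.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main (main) where

import Control.Applicative (Alternative (..))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.State (StateT (..), get, put)

type Position = (Int, Int)
type Err = (String, Position)

newtype Parser1 a = Parser1
  { parse1 :: StateT String (StateT Position (MaybeT (Either Err))) a }
  deriving (Functor, Applicative, Monad, Alternative)

runParser :: Parser1 a -> String -> Either Err (Maybe ((a, String), Position))
runParser p ts = runMaybeT (runStateT (runStateT (parse1 p) ts) (0, 0))

-- Hypothetical position update: a newline starts a new line at column 0.
advance :: Char -> Position -> Position
advance '\n' (ln, _) = (ln + 1, 0)
advance _    (ln, c) = (ln, c + 1)

item :: Parser1 Char
item = Parser1 $ do
  input <- get                 -- outer state: the remaining input
  case input of
    []       -> empty
    (x : xs) -> do
      put xs
      pos <- lift get          -- inner state: the position
      lift (put (advance x pos))
      return x

main :: IO ()
main = do
  print (runParser (item *> item) "a\nb")  -- Right (Just (('\n',"b"),(1,0)))
  print (runParser item "")                -- Right Nothing
```

The outer get and put address the String layer, and each lift moves exactly one layer down to the Position layer, so no extra MonadState instance is needed for this approach.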
Edit 1:
I initially tried declaring basicItem as:
basicItem :: (MonadState String m, Alternative m) => m t
basicItem = do
  state <- get
  case state of
    (x:xs) -> do {put xs; return x}
    [] -> empty
However, I kept getting the error:
I was wondering why it cannot deduce context of get from MonadState String m,
when in my deriving clause I have MonadState String.
The error for my initial question is here:

Defining a new primitive combinator in Parsec

Problem context
I am writing a toy parser (for Scheme) that should be able to distinguish between - and + as identifiers and as part of a number (e.g. +i, -2.4, +5). I would like to try-parse anything starting with + or - as a number, but with a catch: if the parser consumes the sign but no characters after it, I would like it to act as if it were wrapped in a try; if any input is consumed after the sign, I would like it to fail with a nice contextual error message and the line number/position of the offending character. Wrapping the entire parser in a try would always backtrack, which is not what I want.
New combinator: followedBy
To this end, I want to create a combinator, which I dubbed followedBy¹, that mirrors bind (>>=) and takes a parser m and a function k that returns a parser,
followedBy :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
m `followedBy` k = ...
and acts as >>= except when m consumes ok and k empty fails, in which case I want the combinator to empty fail. This combinator can then be used as such:
(char '+') `followedBy` (\sign -> parseUnsignedNumber sign)
to try to parse as a number, fail if it is partially a number but malformed, and empty fail if what follows + is not at all like a number.
Low-level implementation
I have adjusted the parserBind code from Text.Parsec.Prim to do exactly this², but because it uses symbols that are not exported by Parsec I can't compile it. Is there a way to write the same thing using higher-level constructs that are exported by Parsec?
My (uncompilable) implementation for followedBy is:
import Text.Parsec.Prim
followedBy :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
-- Code adjusted from parserBind in Text.Parsec.Prim. The change is that mcok's peerr uses eerr instead of cerr.
followedBy m k
  = ParsecT $ \s cok cerr eok eerr ->
    let
        -- consumed-okay case for m
        mcok x s' err =
            let
                -- if (k x) consumes, those go straight up
                pcok = cok
                pcerr = cerr
                -- if (k x) doesn't consume input, but is okay,
                -- we still return in the consumed continuation
                peok x s err' = cok x s (mergeError err err')
                -- if (k x) doesn't consume input, but errors,
                -- then we return the empty error (not consuming m or k)
                peerr err' = eerr (mergeError err err')
            in  unParser (k x) s' pcok pcerr peok peerr
        -- empty-ok case for m
        meok x s err =
            let
                -- in these cases, (k x) can return as empty
                pcok = cok
                peok x s err' = eok x s (mergeError err err')
                pcerr = cerr
                peerr err' = eerr (mergeError err err')
            in  unParser (k x) s pcok pcerr peok peerr
        -- consumed-error case for m
        mcerr = cerr
        -- empty-error case for m
        meerr = eerr
    in  unParser m s mcok mcerr meok meerr
Possible alternative
I know that there is also an alternative, namely to try-parse + as an identifier, along the lines of char '+' >> parseDelimiter, and I will take this route if the followedBy does not work and another (elegant) solution does not present itself, but I am really curious if followedBy can be implemented and, if so, how.
¹) Anyone having a better name for it is welcome to comment.
²) I have not been able to test it, so I don't know for sure if my code works correctly.
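For comparison, the same behaviour can be sketched with mkPT and runParsecT, assuming a parsec version whose Text.Parsec.Prim exports them together with the Consumed and Reply constructors. This sketch drops the error merging that parserBind performs, and the signedOrIdent parser is a hypothetical test case, not taken from the question.

```haskell
module Main (main) where

import Text.Parsec.Char (char, digit, string)
import Text.Parsec.Combinator (many1)
import Text.Parsec.Prim
  ( Consumed (..), ParsecT, Reply (..), mkPT, parse, runParsecT, (<|>) )
import Text.Parsec.String (Parser)

-- Like (>>=), except that "m consumed ok, k failed empty" becomes an
-- empty failure. Simplification: earlier error messages are not merged.
followedBy
  :: Monad m => ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
followedBy m k = mkPT $ \s -> do
  consumedM <- runParsecT m s
  case consumedM of
    Empty replyM -> do
      r <- replyM
      case r of
        Error e   -> return (Empty (return (Error e)))
        Ok x s' _ -> runParsecT (k x) s'
    Consumed replyM -> do
      r <- replyM
      case r of
        Error e   -> return (Consumed (return (Error e)))
        Ok x s' _ -> do
          consumedK <- runParsecT (k x) s'
          case consumedK of
            Consumed replyK -> return (Consumed replyK)
            Empty replyK    -> do
              r' <- replyK
              return $ case r' of
                Ok y s'' e -> Consumed (return (Ok y s'' e))
                Error e    -> Empty (return (Error e))

-- Hypothetical use: a signed number, falling back to the literal "+x".
signedOrIdent :: Parser String
signedOrIdent =
  (char '+' `followedBy` \_ -> many1 digit) <|> string "+x"

main :: IO ()
main = do
  print (parse signedOrIdent "" "+12")  -- Right "12"
  print (parse signedOrIdent "" "+x")   -- Right "+x"
```

With plain (>>=) in place of followedBy, the second input would fail, because the consumed '+' would stop (<|>) from trying the alternative.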

How to restrict backtracking in a monad transformer parser combinator

tl;dr, How do I implement parsers whose backtracking can be restricted, where the parsers are monad transformer stacks?
I haven't found any papers, blogs, or example implementations of this approach; it seems the typical approach to restricting backtracking is a datatype with additional constructors, or the Parsec approach where backtracking is off by default.
My current implementation -- using a commit combinator, see below -- is wrong; I'm not sure about the types, whether it belongs in a type class, and my instances are less generic than it feels like they should be.
Can anyone describe how to do this cleanly, or point me to resources?
I've added my current code below; sorry for the post being so long!
The stack:
StateT
MaybeT/ListT
Either e
The intent is that backtracking operates in the middle layer -- a Nothing or an empty list wouldn't necessarily yield an error, it'd just mean that a different branch should be tried -- whereas the bottom layer is for errors (with some contextual information) that immediately abort the parsing.
{-# LANGUAGE NoMonomorphismRestriction, FunctionalDependencies,
FlexibleInstances, UndecidableInstances #-}
import Control.Monad.Trans.State (StateT(..))
import Control.Monad.State.Class (MonadState(..))
import Control.Monad.Trans.Maybe (MaybeT(..))
import Control.Monad.Trans.List (ListT(..))
import Control.Monad (MonadPlus(..), guard)
type Parser e t mm a = StateT [t] (mm (Either e)) a
newtype DParser e t a =
  DParser {getDParser :: Parser e t MaybeT a}
instance Monad (DParser e t) where
  return = DParser . return
  (DParser d) >>= f = DParser (d >>= (getDParser . f))
instance MonadPlus (DParser e t) where
  mzero = DParser (StateT (const (MaybeT (Right Nothing))))
  mplus = undefined -- will worry about later
instance MonadState [t] (DParser e t) where
  get = DParser get
  put = DParser . put
A couple of parsing classes:
class (Monad m) => MonadParser t m n | m -> t, m -> n where
  item :: m t
  parse :: m a -> [t] -> n (a, [t])
class (Monad m, MonadParser t m n) => CommitParser t m n where
  commit :: m a -> m a
Their instances:
instance MonadParser t (DParser e t) (MaybeT (Either e)) where
  item =
    get >>= \xs -> case xs of
      (y:ys) -> put ys >> return y;
      [] -> mzero;
  parse = runStateT . getDParser
instance CommitParser t (DParser [t] t) (MaybeT (Either [t])) where
  commit p =
    DParser (
      StateT (\ts -> MaybeT $ case runMaybeT (parse p ts) of
        Left e -> Left e;
        Right Nothing -> Left ts;
        Right (Just x) -> Right (Just x);))
And a couple more combinators:
satisfy f =
  item >>= \x ->
  guard (f x) >>
  return x
literal x = satisfy (== x)
Then these parsers:
ab = literal 'a' >> literal 'b'
ab' = literal 'a' >> commit (literal 'b')
give these results:
> myParse ab "abcd"
Right (Just ('b',"cd")) -- succeeds
> myParse ab' "abcd"
Right (Just ('b',"cd")) -- 'commit' doesn't affect success
> myParse ab "acd"
Right Nothing -- <== failure but not an error
> myParse ab' "acd"
Left "cd" -- <== error b/c of 'commit'
The answer appears to be in the MonadOr type class (which unfortunately for me is not part of the standard libraries):
class MonadZero m => MonadOr m where
morelse :: m a -> m a -> m a
satisfying Monoid and Left Catch:
morelse mzero b = b
morelse a mzero = a
morelse (morelse a b) c = morelse a (morelse b c)
morelse (return a) b = return a
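The Left Catch law is what separates committed choice from full backtracking. As a small illustration, using (<|>) from base as a stand-in for morelse (an assumption, since MonadOr is not in the standard libraries), Maybe satisfies the law while the list monad does not:

```haskell
module Main (main) where

import Control.Applicative ((<|>))

main :: IO ()
main = do
  -- Maybe: the first success wins, so return a <|> b equals return a.
  print (return 1 <|> Just 2 :: Maybe Int)  -- Just 1
  -- List: both results are kept, so Left Catch does not hold.
  print (return 1 <|> [2] :: [Int])         -- [1,2]
```

A middle layer with Maybe-like choice therefore never revisits the second branch once the first has succeeded, whereas a list-like layer keeps it available for later backtracking.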

Resources