recursive parser combinator (string) - parsing

I'm reading Programming in Haskell by G. Hutton and got confused in chapter 8 (Parsers). I can't understand how the string parser works recursively. Here's the code (I've rewritten it explicitly using >>= because do notation abstracts away even more).
import Prelude hiding (return, (>>=))  -- the book's own return and >>= replace the Prelude ones
type Parser a = String -> [(a, String)]
failure :: Parser a
failure = \ inp -> []
return :: a -> Parser a
return v = \ inp -> [(v, inp)]
item :: Parser Char
item = \ inp -> case inp of
  [] -> []
  (x : xs) -> [(x, xs)]
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= f = \ inp -> case parse p inp of
  [] -> []
  [(v, out)] -> parse (f v) out
parse :: Parser a -> String -> [(a, String)]
parse p inp = p inp
char :: Char -> Parser Char
char x = sat (== x)
sat :: (Char -> Bool) -> Parser Char
sat p = item >>= \ ch -> if p ch then return ch else failure
This is clear to me so far, but I don't see how string works recursively.
string :: String -> Parser String
string [] = return []
string (x:xs) = (char x) >>= \ _ ->
                string xs >>= \ _ ->
                return (x:xs)
Can someone write out every step of the recursion on a simple input, e.g. string "he" "hell", which should return [("he","ll")]? Here's how I started:
string ('h':['e']) "hell"
((char 'h') >>= \ _ -> string ['e'] >>= \ _ -> return ('h':['e'])) "hell"
(\ inp -> case parse (char 'h') inp of
    [] -> []
    [(v, out)] -> parse ((\ _ -> string ['e'] >>= \ _ -> return ('h':['e'])) v) out) "hell"
parse ((\ _ -> string ['e'] >>= \ _ -> return ('h':['e'])) 'h') "ell"
(string ['e'] >>= \ _ -> return ('h':['e'])) "ell"
(\ inp -> case parse (string ['e']) inp of
    [] -> []
    [(v, out)] -> parse ((\ _ -> return ('h':['e'])) v) out) "ell"
parse ((\ _ -> return "he") "e") "ll"
return "he" "ll"
[("he", "ll")]
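The trace above takes the recursive call for granted: it assumes string ['e'] "ell" evaluates to [("e", "ll")]. The sketch below writes every rewriting step, including that inner call and its base case string [], as its own Haskell expression so GHCi can confirm they all evaluate to the same list. It assumes the definitions from the question, with Prelude's return and >>= hidden and an assumed infixl 1 fixity for the book's >>=; the names step0 to step7 are hypothetical labels for the steps.

```haskell
import Prelude hiding (return, (>>=))

-- Hutton's list-of-successes parser type, as in the question.
type Parser a = String -> [(a, String)]

-- Assumed fixity for the book's >>= (it mirrors the Prelude's).
infixl 1 >>=

return :: a -> Parser a
return v = \inp -> [(v, inp)]

failure :: Parser a
failure = \_ -> []

item :: Parser Char
item = \inp -> case inp of
  []       -> []
  (x : xs) -> [(x, xs)]

parse :: Parser a -> String -> [(a, String)]
parse p inp = p inp

-- Like the book, this only handles zero or one result, which is all that
-- item, sat, char and string ever produce.
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= f = \inp -> case parse p inp of
  []         -> []
  [(v, out)] -> parse (f v) out

sat :: (Char -> Bool) -> Parser Char
sat p = item >>= \ch -> if p ch then return ch else failure

char :: Char -> Parser Char
char x = sat (== x)

string :: String -> Parser String
string []       = return []
string (x : xs) = char x >>= \_ ->
                  string xs >>= \_ ->
                  return (x : xs)

-- Each step rewrites the previous one; all of them evaluate to [("he","ll")].
step0, step1, step2, step3, step4, step5, step6, step7 :: [(String, String)]
-- The call from the question.
step0 = string "he" "hell"
-- Unfold string (x:xs) with x = 'h', xs = "e".
step1 = (char 'h' >>= \_ -> string "e" >>= \_ -> return "he") "hell"
-- char 'h' "hell" = [('h', "ell")], so the rest runs on "ell".
step2 = (string "e" >>= \_ -> return "he") "ell"
-- The recursive call: unfold string "e" with x = 'e', xs = "".
step3 = ((char 'e' >>= \_ -> string "" >>= \_ -> return "e")
          >>= \_ -> return "he") "ell"
-- char 'e' "ell" = [('e', "ll")]; the recursion reaches the base case
-- string [] = return [].
step4 = ((return "" >>= \_ -> return "e") >>= \_ -> return "he") "ll"
-- return "" "ll" = [("", "ll")]: the base case consumes nothing.
step5 = (return "e" >>= \_ -> return "he") "ll"
-- So the inner call string "e" "ell" has produced [("e", "ll")].
step6 = return "he" "ll"
step7 = [("he", "ll")]
```

Loaded into GHCi, string "he" "hell" gives [("he","ll")], and each of step0 to step6 evaluates to that same list.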

Related

Combining parsers in Haskell

I'm given the following parsers:
import Control.Applicative (Alternative (..))
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
  fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
  pure a = Parser $ \s -> Just (a,s)
  f <*> a = Parser $ \s ->
    case parse f s of
      Just (g,s') -> parse (fmap g a) s'
      Nothing -> Nothing
instance Alternative Parser where
  empty = Parser $ \s -> Nothing
  l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
  case parse parser s of
    Nothing -> Nothing
    Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
  where f [] = Just (Nothing,[])
        f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where f [] = Nothing
        f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does, but is built only from the given parsers and the Functor/Applicative/Alternative instances above. I'm stuck on this, as I don't have experience combining parsers. Can anyone help me out?
To make it easier to understand, we can write it as equational pseudocode, substituting and simplifying the definitions, and using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, but work for any MonadPlus type, not just []; they correspond closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
  parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
  parse (pure a) s = pure (a, s)
  parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
                                     , (a, s'') <- parse pa s' ]
instance Alternative Parser where
  parse empty s = empty
  parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s@(c:_) = pure (Just c, s)
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way, thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
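A small check of the claim that a monad comprehension and the corresponding do block are the same thing, here for Maybe. It is a sketch assuming GHC's MonadComprehensions extension; firstChar, viaComprehension, viaDo and evenOnly are hypothetical names used only for illustration, with firstChar standing in for parse p.

```haskell
{-# LANGUAGE MonadComprehensions #-}

-- Hypothetical stand-in for `parse p`: reads one character into a Maybe.
firstChar :: String -> Maybe (Char, String)
firstChar []      = Nothing
firstChar (c : s) = Just (c, s)

-- The fmap step written as a monad comprehension over Maybe ...
viaComprehension :: (Char -> b) -> String -> Maybe (b, String)
viaComprehension f s = [ (f a, s') | (a, s') <- firstChar s ]

-- ... and the same step written with do notation.
viaDo :: (Char -> b) -> String -> Maybe (b, String)
viaDo f s = do { (a, s') <- firstChar s ; return (f a, s') }

-- A guard in a monad comprehension needs MonadPlus/Alternative;
-- for Maybe, a failed guard gives Nothing.
evenOnly :: Maybe Int -> Maybe Int
evenOnly m = [ x | x <- m, even x ]
```

For example, viaComprehension succ "ab" and viaDo succ "ab" both give Just ('b',"b"), and evenOnly (Just 3) gives Nothing.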
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good to us here... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to us produces () as its result in that case (and otherwise fails).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that always does this, ignoring its argument:
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)
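For reference, here is one possible way to fill in the blanks, as a sketch rather than the only answer: ensure isNothing lookahead succeeds only at the end of the input (and, since lookahead never consumes anything, leaves the empty remainder as it is), and fmap alwaysUnit turns that Nothing into (). isNothing comes from Data.Maybe; ensure (== Nothing) would work equally well. The given parsers are repeated so the block loads on its own, and neverSucceeds is a hypothetical name for the hint about ensure id (pure False).

```haskell
import Control.Applicative (Alternative (..))
import Data.Maybe (isNothing)

newtype Parser a = Parser { parse :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f p = Parser $ \s -> (\(a, c) -> (f a, c)) <$> parse p s

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  f <*> a = Parser $ \s ->
    case parse f s of
      Just (g, s') -> parse (fmap g a) s'
      Nothing      -> Nothing

instance Alternative Parser where
  empty   = Parser $ \_ -> Nothing
  l <|> r = Parser $ \s -> parse l s <|> parse r s

ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
  case parse parser s of
    Nothing      -> Nothing
    Just (a, s') -> if p a then Just (a, s') else Nothing

lookahead :: Parser (Maybe Char)
lookahead = Parser f
  where
    f []      = Just (Nothing, [])
    f (c : s) = Just (Just c, c : s)

eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((), []) else Nothing

alwaysUnit :: a -> ()
alwaysUnit _ = ()

-- lookahead never consumes input; ensure keeps its result only when it is
-- Nothing (end of input), and fmap replaces that Nothing with ().
eof' :: Parser ()
eof' = fmap alwaysUnit (ensure isNothing lookahead)

-- The hint above: ensure id (pure False) never succeeds, so it acts like empty.
neverSucceeds :: Parser Bool
neverSucceeds = ensure id (pure False)
```

In GHCi, parse eof' "" gives Just ((),"") and parse eof' "abc" gives Nothing, just like eof.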

Monad Parser - Couldn't match expected type ‘[(b, String)]’ with actual type ‘Parser b’

I'm studying Haskell using "Programming in Haskell" by G. Hutton, following chapter 13 on monadic parsers.
First, I define a type Parser:
newtype Parser a = P (String -> [(a, String)])
Then, a parse function:
parse:: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp
I make the Parser a Monad
instance Monad Parser where
  --return :: a -> Parser a
  return v = P(\inp -> [(v, inp)])
  --(>>=) :: Parser a -> (a -> Parser b) -> Parser b
  p >>= g = P (\inp -> case parse p inp of
                 [] -> []
                 [(v, out)] -> parse (g v) out)
My problem is with the last line of the Monad instance.
Why is it this
[(v, out)] -> parse (g v) out)
and not this?
[(v, out)] -> (g v))
>>= returns a Parser b, not a [(b, String)]. In fact, g v is a Parser.
I know I'm wrong, but I don't understand why.
>>= returns a Parser b, not a [(b, String)]. In fact, g v is a Parser.
That is correct, but we are constructing a Parser with the outer P. Indeed:
p >>= g = P (\inp -> case parse p inp of
                 [] -> []
                 [(v, out)] -> parse (g v) out)
Notice the P immediately after the =. The lambda expression \inp -> … therefore has to have type String -> [(b, String)], not Parser b. We run the parser g v with parse, which acts as a "getter" that takes the function out of g v.
However, your implementation of >>= is not complete. This is a backtracking parser, so the list may contain no elements (no parse), one element, or several elements, and the case expression only handles the first two: a list with several results makes it fail with a non-exhaustive pattern error. We should thus map over all results:
p >>= g = P (
\inp -> concatMap (\(v, out) -> parse (g v) out) (parse p inp)
)
or we can make use of the bind operator >>= defined for lists:
p >>= g = P (
\inp -> parse p inp >>= \(v, out) -> parse (g v) out
)
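A sketch of the complete concatMap version, assuming a modern GHC where a Monad instance also needs Functor and Applicative instances (supplied here via liftM and ap from Control.Monad). The combinator both and the parser twoWays are hypothetical, added only to produce more than one result: with the question's single-element pattern [(v, out)], the bind in twoWays would hit the missing case, while the concatMap version keeps every alternative.

```haskell
import Control.Monad (ap, liftM)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  (<*>) = ap

instance Monad Parser where
  -- Run p, then run g on every (value, leftover) pair it produced.
  p >>= g = P (\inp -> concatMap (\(v, out) -> parse (g v) out) (parse p inp))

item :: Parser Char
item = P f
  where
    f []       = []
    f (x : xs) = [(x, xs)]

-- Hypothetical combinator: keep the results of both parsers.
both :: Parser a -> Parser a -> Parser a
both p q = P (\inp -> parse p inp ++ parse q inp)

-- Either consume one character or nothing, then read one more character.
twoWays :: Parser String
twoWays = do
  a <- both (fmap (: []) item) (pure "")
  b <- item
  return (a ++ [b])
```

Here parse twoWays "xyz" returns [("xy","z"),("x","yz")]: one result for each way the first step succeeded.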

How does the Haskell `do` notation know which value to take when it isn't defined by a return?

I have this monadic type:
import Control.Monad (ap)
data Parser a = Parser (String -> Maybe (a, String))
instance Functor Parser where
  -- fmap :: (a -> b) -> Parser a -> Parser b
  fmap f (Parser pa) = Parser $ \input -> case pa input of
    Nothing -> Nothing
    Just (a, rest) -> Just (f a, rest)
instance Applicative Parser where
  pure = return
  (<*>) = ap
instance Monad Parser where
  --return :: a -> Parser a
  return a = Parser $ \input -> Just (a, input)
  --(>>=) :: Parser a -> (a -> Parser b) -> Parser b
  (Parser pa) >>= f = Parser $ \input -> case pa input of
    Nothing -> Nothing
    Just (a,rest) -> parse (f a) rest
And I have this definition of item, which I am told "reads in a character", but I don't really see any reading going on.
item :: Parser Char
item = Parser $ \ input -> case input of
  "" -> Nothing
  (h:t) -> Just (h, t)
But OK, fine, maybe I should just relax about how literally to take the word "read" and go with it. Moving on, I have
failParse :: Parser a
failParse = Parser $ \ input -> Nothing
sat :: (Char -> Bool) -> Parser Char
sat p = do c <- item
           if p c
             then return c
             else failParse
And this is where I get pretty confused. What is getting stored in the variable c? Since item is a Parser with parameter Char, my first guess is that c is storing such an object. But after a second of thought I know that's not how do notation works: you don't get the monad, you get the contents of the monad. Great, but then that tells me c is the function
\ input -> case input of
  "" -> Nothing
  (h:t) -> Just (h, t)
But clearly that's wrong, since the next line of the definition of sat treats c like a character. Not only is that not what I expected, but it's about three levels of structure down from what I expected! It's not the function, it's not the Maybe object, and it's not the tuple; it's the left component of the Just tuple buried inside the function! How does that little character make it all the way out? What tells the <- to extract this part of the monad?
As a comment mentioned, <- is just do-notation syntactic sugar, and the do block is equivalent to:
item >>= (\c->if p c
               then return c
               else failParse)
Okay, let's see what c is. Consider the type of (>>=):
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
or, written more readably:
Parser a >>= (a -> Parser b)
Now, matching it against the expression above, item >>= (\c->if p c then return c else failParse), gives:
Parser a = item
and
(a->Parser b) = (\c->if p c then return c else failParse)
and item has type:
item :: Parser Char
so we can now replace a in (>>=) with Char, which gives
Parser Char >>= (Char -> Parser b)
and now \c->if p c then return c else failParse also has type (Char -> Parser b),
so c is a Char, and the whole expression can be expanded to:
sat p =
  item >>= (\c->...) =
  Parser pa >>= (\c->...) = Parser $ \input -> case pa input of
      Nothing -> Nothing
      Just (a,rest) -> parse (f a) rest
    where f c = if p c
                  then return c
                  else failParse
          pa input = case input of
            "" -> Nothing
            (h:t) -> Just (h, t)
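To check that c really is a Char, and that the do block and the explicit >>= version are the same parser, here is a sketch built on the question's definitions. The question does not show parse, so parse (Parser pa) input = pa input is an assumption; pure is defined directly instead of via return; satDesugared and satTyped are hypothetical names for the desugared form and for a variant whose continuation has its type written out.

```haskell
import Control.Monad (ap)

data Parser a = Parser (String -> Maybe (a, String))

-- Assumed runner (the question uses parse without showing it).
parse :: Parser a -> String -> Maybe (a, String)
parse (Parser pa) input = pa input

instance Functor Parser where
  fmap f (Parser pa) = Parser $ \input -> case pa input of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \input -> Just (a, input)
  (<*>) = ap

instance Monad Parser where
  (Parser pa) >>= f = Parser $ \input -> case pa input of
    Nothing        -> Nothing
    Just (a, rest) -> parse (f a) rest

item :: Parser Char
item = Parser $ \input -> case input of
  ""      -> Nothing
  (h : t) -> Just (h, t)

failParse :: Parser a
failParse = Parser $ \_ -> Nothing

-- The question's version, using do notation.
sat :: (Char -> Bool) -> Parser Char
sat p = do
  c <- item
  if p c then return c else failParse

-- The desugared version: c is simply the argument of the lambda.
satDesugared :: (Char -> Bool) -> Parser Char
satDesugared p = item >>= (\c -> if p c then return c else failParse)

-- Writing the continuation's type out: it type-checks with c :: Char.
satTyped :: (Char -> Bool) -> Parser Char
satTyped p = item >>= k
  where
    k :: Char -> Parser Char
    k c = if p c then return c else failParse
```

All three agree; for instance, parse (sat (== 'a')) "abc" gives Just ('a',"bc") and parse (sat (== 'a')) "ba" gives Nothing.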
TL;DR: In general, by the Monad laws,
do { item }
is the same as
do { c <- item
   ; return c
   }
so it is defined by a return, in a sense. Details follow.
It does take one character off the input string that is being "read", so in this sense it does "read" that character:
item :: Parser Char
item = Parser $ \ input ->                 -- input :: [Char]
         case input of { "" -> Nothing
                       ; (h:t) -> Just (h, t)   -- (h:t) :: [Char]
                       }                        -- h :: Char   t :: [Char]
and I bet there's a definition
parse (Parser pa) input = pa input
defined there somewhere; so
parse item input = case input of { "" -> Nothing
                                 ; (h:t) -> Just (h, t) }
Next, what does (>>=) mean? It means that
parse (Parser pa >>= f) input = case (parse (Parser pa) input) of
    Nothing -> Nothing
    Just (a, leftovers) -> parse (f a) leftovers
i.e.
parse (item >>= f) input
  = case (parse item input) of
      Nothing -> Nothing
      Just (a, leftovers) -> parse (f a) leftovers
  = case (case input of { "" -> Nothing
                        ; (h:t) -> Just (h, t)
                        }) of
      Nothing -> Nothing
      Just (a, leftovers) -> parse (f a) leftovers
  = case input of
      "" -> Nothing
      (h:t) -> case Just (h, t) of {
                 Just (a, leftovers) -> parse (f a) leftovers }
  = case input of
      "" -> Nothing
      (h:t) -> parse (f h) t
Now,
-- sat p: a "satisfies `p`" parser
sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item            -- sat p :: Parser Char
           ; if p c               -- item :: Parser Char, c :: Char
               then return c      -- return c :: Parser Char
               else failParse     -- failParse :: Parser Char
           }
      = item >>= (\ c ->
           if p c then return c else failParse)
(by unraveling the do syntax), and so
parse (sat p) input
  = parse (item >>= (\ c ->
             if p c then return c else failParse)) input
    -- parse (item >>= f) input
    --   = case input of { "" -> Nothing ; (h:t) -> parse (f h) t }
  = case input of
      "" -> Nothing
      (h:t) -> parse ((\ c -> if p c then (return c)
                                     else failParse) h) t
  = case input of
      "" -> Nothing
      (c:t) -> parse (if p c then (return c)
                             else failParse) t
  = case input of
      "" -> Nothing
      (c:t) -> if p c then parse (return c) t
                      else parse failParse t
  = case input of
      "" -> Nothing
      (c:t) -> if p c then Just (c, t)
                      else Nothing
Now the meaning of sat p should be clear: for c produced by item (which is the first character in the input, if input is non-empty), if p c holds, c is accepted and the parse succeeds, otherwise the parse fails:
sat p = for c from item:     -- do { c <- item
          if p c             --    ; if p c
            then return c    --        then return c
            else failParse   --        else failParse }
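The derivation can be spot-checked in GHCi. This sketch restates the question's parser (again assuming parse (Parser pa) input = pa input), then writes down the last line of each derivation as the hypothetical functions itemBindClosed and satClosed, plus itemViaReturn for the TL;DR, so each can be compared against the monadic definition on a few inputs.

```haskell
import Control.Monad (ap)

data Parser a = Parser (String -> Maybe (a, String))

-- Assumed runner, as guessed in the answer above.
parse :: Parser a -> String -> Maybe (a, String)
parse (Parser pa) input = pa input

instance Functor Parser where
  fmap f (Parser pa) = Parser $ \input -> case pa input of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \input -> Just (a, input)
  (<*>) = ap

instance Monad Parser where
  (Parser pa) >>= f = Parser $ \input -> case pa input of
    Nothing        -> Nothing
    Just (a, rest) -> parse (f a) rest

item :: Parser Char
item = Parser $ \input -> case input of
  ""      -> Nothing
  (h : t) -> Just (h, t)

failParse :: Parser a
failParse = Parser $ \_ -> Nothing

sat :: (Char -> Bool) -> Parser Char
sat p = do
  c <- item
  if p c then return c else failParse

-- Last line of the derivation of parse (item >>= f) input.
itemBindClosed :: (Char -> Parser b) -> String -> Maybe (b, String)
itemBindClosed f input = case input of
  ""      -> Nothing
  (h : t) -> parse (f h) t

-- Last line of the derivation of parse (sat p) input.
satClosed :: (Char -> Bool) -> String -> Maybe (Char, String)
satClosed p input = case input of
  ""      -> Nothing
  (c : t) -> if p c then Just (c, t) else Nothing

-- The TL;DR: do { c <- item ; return c } behaves exactly like item.
itemViaReturn :: Parser Char
itemViaReturn = do
  c <- item
  return c
```

On inputs such as "", "a", "ab" and "ba", parse (sat (== 'a')) and satClosed (== 'a') return the same results, as do parse itemViaReturn and parse item.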

Sequencing operator in Haskell Parser

We are trying to build a Parser, but I can't understand how the function p works, even though I understand the >>>= operator. How does the p function work?
type Parser a = String -> [(a, String)]
returnb :: a -> Parser a
returnb v = \input -> [(v, input)]
failure :: Parser a
failure = \input -> []
item :: Parser Char
item = \input -> case input of
  [] -> []
  (x:xs) -> [(x, xs)]
parse :: Parser a -> String -> [(a, String)]
parse parser input = parser input
-- Sequencing operator
(>>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>>= f = \input -> case parse p input of
  [] -> []
  [(v, out)] -> parse (f v) out
p :: Parser (Char, Char)
p = item >>>= \x ->
    item >>>= \z ->
    item >>>= \y ->
    returnb (x,y)
As far as I can see, it reads three chars and gives back the first and the last one as a pair, wrapped in a Parser.
If p specifically is the problem, it works similarly to this pseudo-program, which is made up but hopefully illustrates the flow:
p input = let [(x, rest1)] = item input
              [(z, rest2)] = item rest1
              [(y, rest3)] = item rest2
          in
            [((x, y), rest3)]
That is, it parses three characters and produces a pair consisting of the first and the third characters.
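A loadable version of the question's code, assuming an infixl 1 fixity for >>>= (the question does not declare one) and ignoring the middle character with \_ instead of \z. It also turns the made-up pseudo-program into a total function, pByHand (a hypothetical name), that matches three characters at once.

```haskell
type Parser a = String -> [(a, String)]

returnb :: a -> Parser a
returnb v = \input -> [(v, input)]

item :: Parser Char
item = \input -> case input of
  []       -> []
  (x : xs) -> [(x, xs)]

parse :: Parser a -> String -> [(a, String)]
parse parser input = parser input

-- Assumed fixity; it mirrors the Prelude's >>=.
infixl 1 >>>=

-- As in the question: handles zero or one result, which is all item produces.
(>>>=) :: Parser a -> (a -> Parser b) -> Parser b
m >>>= f = \input -> case parse m input of
  []         -> []
  [(v, out)] -> parse (f v) out

-- Read three characters and keep the first and the third.
p :: Parser (Char, Char)
p = item >>>= \x ->
    item >>>= \_ ->
    item >>>= \y ->
    returnb (x, y)

-- The answer's pseudo-program, made total with a single pattern match.
pByHand :: Parser (Char, Char)
pByHand input = case input of
  (x : _ : y : rest) -> [((x, y), rest)]
  _                  -> []
```

For example, p "abcd" gives [(('a','c'),"d")], and p "ab" gives [] because the third item fails.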

Example of simple parsers in haskell

I have the following parser functions:
import Data.Char
type Parser a = String -> [(a,String)]
return' :: a -> Parser a
return' v = \inp -> [(v,inp)]
failure :: Parser a
failure = \inp -> []
item :: Parser Char
item = \inp -> case inp of
  [] -> []
  (x:xs) -> [(x,xs)]
parse :: Parser a -> String -> [(a,String)]
parse p inp = p inp
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case parse p inp of
  [] -> parse q inp
  [(v,out)] -> [(v,out)]
So far so good; they can be loaded into ghci. But when I add the following function:
sat :: (Char -> Bool) -> Parser Char
sat p = do x <- item
           if p x then return x else failure
I get an error. Could you tell me what's going on?
The type of x in the do block of sat is [(Char, String)]. This is because item has the type String -> [(Char, String)] or (->) String [(Char, String)] and you are using the Monad instance for (->) String, so what is "contained" in a (->) String [(Char, String)] is a [(Char, String)].
{-# LANGUAGE ScopedTypeVariables #-}
sat :: (Char -> Bool) -> Parser Char
sat p = do (x :: [(Char, String)]) <- item
           if p x then return' x else failure
p is a function from a Char to Bool; it doesn't accept a list of possible parsing results. The only sensible thing to do with p is filter the results based on whether the Char matches p. This still results in [] when the result doesn't match p.
sat :: (Char -> Bool) -> Parser Char
sat p = do (x :: [(Char, String)]) <- item
           return $ filter (p . fst) x
If we drop ScopedTypeVariables, we also need to drop the type annotation I added for illustration.
sat :: (Char -> Bool) -> Parser Char
sat p = do x <- item
           return $ filter (p . fst) x
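The filter-based version loads without ScopedTypeVariables. Here is a minimal sketch that keeps only the question's Parser alias and item; the do block is interpreted in the Monad instance for functions, (->) String, exactly as described above, so no Monad instance for Parser is needed.

```haskell
type Parser a = String -> [(a, String)]

item :: Parser Char
item = \inp -> case inp of
  []       -> []
  (x : xs) -> [(x, xs)]

-- This do block runs in the Monad instance for functions, (->) String:
-- x is the whole result list that item produces for the input, and
-- return wraps the filtered list back into a function of the input.
sat :: (Char -> Bool) -> Parser Char
sat p = do
  x <- item
  return (filter (p . fst) x)
```

For instance, sat (== 'a') "abc" gives [('a',"bc")], while sat (== 'a') "xbc" and sat (== 'a') "" both give [].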
Use of do blocks requires a Monad instance. Since your Parser type is only an alias for the function type String -> [(a, String)], the do block picks up the Monad instance for functions ((->) String) rather than the parser sequencing you intended, so x is the whole result list instead of a Char, and this is why you get the error. You could write your function without using do notation:
sat :: (Char -> Bool) -> Parser Char
sat p [] = []
sat p (c:cs) = if p c then [(c, cs)] else []

Resources