I'm doing data61's course: https://github.com/data61/fp-course. In the parser exercise, the following implementation makes parse (list1 (character *> valueParser 'v')) "abc" overflow the stack.
Existing code:
data List t =
Nil
| t :. List t
deriving (Eq, Ord)
-- Right-associative
infixr 5 :.
type Input = Chars
data ParseResult a =
UnexpectedEof
| ExpectedEof Input
| UnexpectedChar Char
| UnexpectedString Chars
| Result Input a
deriving Eq
instance Show a => Show (ParseResult a) where
show UnexpectedEof =
"Unexpected end of stream"
show (ExpectedEof i) =
stringconcat ["Expected end of stream, but got >", show i, "<"]
show (UnexpectedChar c) =
stringconcat ["Unexpected character: ", show [c]]
show (UnexpectedString s) =
stringconcat ["Unexpected string: ", show s]
show (Result i a) =
stringconcat ["Result >", hlist i, "< ", show a]
instance Functor ParseResult where
_ <$> UnexpectedEof =
UnexpectedEof
_ <$> ExpectedEof i =
ExpectedEof i
_ <$> UnexpectedChar c =
UnexpectedChar c
_ <$> UnexpectedString s =
UnexpectedString s
f <$> Result i a =
Result i (f a)
-- Function to determine if a parse result is an error.
isErrorResult ::
ParseResult a
-> Bool
isErrorResult (Result _ _) =
False
isErrorResult UnexpectedEof =
True
isErrorResult (ExpectedEof _) =
True
isErrorResult (UnexpectedChar _) =
True
isErrorResult (UnexpectedString _) =
True
-- | Runs the given function on a successful parse result. Otherwise return the same failing parse result.
onResult ::
ParseResult a
-> (Input -> a -> ParseResult b)
-> ParseResult b
onResult UnexpectedEof _ =
UnexpectedEof
onResult (ExpectedEof i) _ =
ExpectedEof i
onResult (UnexpectedChar c) _ =
UnexpectedChar c
onResult (UnexpectedString s) _ =
UnexpectedString s
onResult (Result i a) k =
k i a
data Parser a = P (Input -> ParseResult a)
parse ::
Parser a
-> Input
-> ParseResult a
parse (P p) =
p
-- | Produces a parser that always fails with @UnexpectedChar@ using the given character.
unexpectedCharParser ::
Char
-> Parser a
unexpectedCharParser c =
P (\_ -> UnexpectedChar c)
--- | Return a parser that always returns the given parse result.
---
--- >>> isErrorResult (parse (constantParser UnexpectedEof) "abc")
--- True
constantParser ::
ParseResult a
-> Parser a
constantParser =
P . const
-- | Return a parser that succeeds with a character off the input or fails with an error if the input is empty.
--
-- >>> parse character "abc"
-- Result >bc< 'a'
--
-- >>> isErrorResult (parse character "")
-- True
character ::
Parser Char
character = P p
where p Nil = UnexpectedString Nil
p (a :. as) = Result as a
-- | Parsers can map.
-- Write a Functor instance for a @Parser@.
--
-- >>> parse (toUpper <$> character) "amz"
-- Result >mz< 'A'
instance Functor Parser where
(<$>) ::
(a -> b)
-> Parser a
-> Parser b
f <$> P p = P p'
where p' input = f <$> p input
-- | Return a parser that always succeeds with the given value and consumes no input.
--
-- >>> parse (valueParser 3) "abc"
-- Result >abc< 3
valueParser ::
a
-> Parser a
valueParser a = P p
where p input = Result input a
-- | Return a parser that tries the first parser for a successful value.
--
-- * If the first parser succeeds then use this parser.
--
-- * If the first parser fails, try the second parser.
--
-- >>> parse (character ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') ""
-- Result >< 'v'
--
-- >>> parse (character ||| valueParser 'v') "abc"
-- Result >bc< 'a'
--
-- >>> parse (constantParser UnexpectedEof ||| valueParser 'v') "abc"
-- Result >abc< 'v'
(|||) ::
Parser a
-> Parser a
-> Parser a
P a ||| P b = P c
where c input
| isErrorResult resultA = b input
| otherwise = resultA
where resultA = a input
infixl 3 |||
My code:
instance Monad Parser where
(=<<) ::
(a -> Parser b)
-> Parser a
-> Parser b
f =<< P a = P p
where p input = onResult (a input) (\i r -> parse (f r) i)
instance Applicative Parser where
(<*>) ::
Parser (a -> b)
-> Parser a
-> Parser b
P f <*> P a = P b
where b input = onResult (f input) (\i f' -> f' <$> a i)
list ::
Parser a
-> Parser (List a)
list p = list1 p ||| pure Nil
list1 ::
Parser a
-> Parser (List a)
list1 p = (:.) <$> p <*> list p
However, if I change list to not use list1, or use =<< in list1, it works fine. It also works if <*> uses =<<. I feel like it might be an issue with tail recursion.
UPDATE:
If I use lazy pattern matching here
P f <*> ~(P a) = P b
where b input = onResult (f input) (\i f' -> f' <$> a i)
It works fine. Pattern matching here is the problem. I don't understand this... Please help!
If I use lazy pattern matching P f <*> ~(P a) = ... then it works fine. Why?
This very issue was discussed recently. You could also fix it by using newtype instead of data: newtype Parser a = P (Input -> ParseResult a).(*)
The definition of list1 wants to know both parser arguments to <*>, but when the first one fails (when the input is exhausted) we don't actually need to know the second! Since <*> forces it anyway, it will force its own second argument, and that one will force its second parser, ad infinitum.(**) That is, p will fail when input is exhausted, but we have list1 p = (:.) <$> p <*> list p, which forces list p even though it won't run when the preceding p fails. That's the reason for the infinite looping, and why your fix with the lazy pattern works.
What is the difference between data and newtype in terms of laziness?
(*) A newtype'd type always has exactly one data constructor, and pattern matching on it does not actually force the value, so it is implicitly like a lazy pattern. Try newtype P = P Int, then let foo (P i) = 42 in foo undefined, and see that it works.
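To make the footnote concrete, here is a small sketch contrasting the three cases. The type and function names (DataBox, NewBox, fromData, fromNewtype, fromDataLazy) are made up for illustration and are not part of the course code.

```haskell
-- Hypothetical names, for illustration only.
data    DataBox = DataBox Int   -- a real constructor exists at run time
newtype NewBox  = NewBox  Int   -- the constructor is erased at run time

-- Matching on a `data` constructor forces the argument to WHNF,
-- so `fromData undefined` throws.
fromData :: DataBox -> Int
fromData (DataBox _) = 42

-- Matching on a `newtype` constructor forces nothing,
-- so `fromNewtype undefined` returns 42.
fromNewtype :: NewBox -> Int
fromNewtype (NewBox _) = 42

-- A lazy (irrefutable) pattern gives `data` the same behaviour here.
fromDataLazy :: DataBox -> Int
fromDataLazy ~(DataBox _) = 42
```

In GHCi, fromNewtype undefined and fromDataLazy undefined both return 42, while fromData undefined raises the Prelude.undefined exception.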
(**) This happens while the parser is still being prepared (composed), before the combined parser even gets to run on the actual input. This means there's yet another, third way to fix the problem: define
list1 p = (:.) <$> p <*> P (\s -> parse (list p) s)
This should work regardless of the laziness of <*> and whether data or newtype was used.
Intriguingly, the above definition means that the parser will be actually created during run time, depending on the input, which is the defining characteristic of Monad, not Applicative which is supposed to be known statically, in advance. But the difference here is that the Applicative depends on the hidden state of input, and not on the "returned" value.
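For anyone who wants to reproduce this outside the course repository, here is a minimal sketch of the same situation using only the Prelude. It is not the course's code: the result type is simplified to Maybe, and the names runP, orElse, item, many0 and many1 are stand-ins I chose for parse, |||, character, list and list1.

```haskell
-- A deliberately `data`-declared parser, as in the question.
data Parser a = P (String -> Maybe (String, a))

runP :: Parser a -> String -> Maybe (String, a)
runP (P p) = p

instance Functor Parser where
  fmap f (P p) = P (\s -> fmap (fmap f) (p s))

instance Applicative Parser where
  pure a = P (\s -> Just (s, a))
  -- The lazy pattern ~(P a) delays forcing the second parser until it is
  -- actually run. Writing (P a) instead makes `many1` below loop forever
  -- while the parser is being built.
  P f <*> ~(P a) = P (\s -> case f s of
    Nothing      -> Nothing
    Just (s', g) -> fmap (fmap g) (a s'))

-- Stand-in for (|||): try the first parser, fall back to the second.
orElse :: Parser a -> Parser a -> Parser a
orElse (P a) (P b) = P (\s -> maybe (b s) Just (a s))

-- Stand-in for `character`.
item :: Parser Char
item = P (\s -> case s of
  []     -> Nothing
  (c:cs) -> Just (cs, c))

-- Mutually recursive, like `list` and `list1` in the question.
many0 :: Parser a -> Parser [a]
many0 p = many1 p `orElse` pure []

many1 :: Parser a -> Parser [a]
many1 p = (:) <$> p <*> many0 p
```

With the lazy pattern in place, runP (many1 item) "abc" evaluates to Just ("","abc"). Changing the declaration to newtype Parser a = P (...) should work just as well, even with the strict-looking pattern.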
I am studying parsers in Haskell following definitions from G. Hutton, E. Meijer - Monadic Parsing in Haskell.
data Parser a = Parser { parseWith :: String -> [(a, String)] }
instance Functor Parser where
fmap f (Parser p) = Parser $ \s -> [(f a, rest) | (a, rest) <- p s]
instance Applicative Parser where
pure x = Parser $ \s -> [(x, s)]
(Parser p1) <*> (Parser p2) = Parser $ \s -> [(f x, r2) | (f, r1) <- p1 s, (x, r2) <- p2 r1]
instance Monad Parser where
return = pure
p >>= f = Parser $ \s -> concatMap (\(x, r) -> parseWith (f x) r) $ parseWith p s
instance Alternative Parser where
empty = failure
p1 <|> p2 = Parser $ \s ->
case parseWith p1 s of
[] -> parseWith p2 s
res -> res
Essentially I have a (parsed :: a, remaining :: String) context.
As a simple application, I defined the following ADT to parse:
data Arr = Arr Int [Int] -- len [values]
and a parser that can construct Array values from strings, e.g.:
"5|12345" -> Arr 5 [1,2,3,4,5]
First, in order to parse n such Array values (the string input contains n on the first position), e.g.:
"2 3|123 4|9876 2|55" -> [Arr 3 [1,2,3], Arr 4 [9,8,7,6]]
I can do the following:
arrayParse :: Parser Arr
arrayParse = do
len <- digitParse
vals <- exactly len digitParse
return $ Arr len vals
nArraysParse :: Parser [Arr]
nArraysParse = do
n <- digitParse
exactly n arrayParse
where exactly n p constructs a new parser by applying p n times.
Next, I want to parse a different scheme.
Suppose the first character denotes the length of the sub-string defining the arrays, e.g.:
"9 3|123 4|9876 2|55" -> [Arr 3 [1,2,3], Arr 4 [9,8,7,6]]
Meaning that I have to apply arrayParse on the first n chars (excluding | and whitespace) to get the first 2 arrays:
3|123 -> 4 chars (excluding | and whitespace)
4|9876 -> 5 chars (excluding | and whitespace)
So, it's straightforward to apply a parser n times:
exactly :: Int -> Parser a -> Parser [a]
exactly 0 _ = pure []
exactly n p = do
v <- p -- apply parser p once
v' <- exactly (n-1) p -- apply parser p n-1 times
return (v:v')
but how can I express the intent of applying a parser on the first n characters?
My initial approach was something like this:
foo :: Parser [Arr]
foo = do
n <- digitParse
substring <- consume n
-- what to do with substring?
-- can I apply arrayParse on it?
How should I approach this?
Following @jlwoodwa's advice, I managed to achieve the following:
innerParse :: Parser a -> String -> Parser a
innerParse p s = case parseWith p s of
[(arr, "")] -> return arr
_ -> failure
substringParse :: Parser [Arr]
substringParse = do
n <- digitParse
substring <- consume n
innerParse (zeroOrMore arrayParse) substring
which works for my use-case.
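For completeness, here is a self-contained sketch of the same idea that can be loaded into GHCi. The definitions of failure, consume and digitParse were not shown above, so the versions below are my assumptions about what they look like, and twoDigitsInPrefix is a made-up example parser rather than the array parser from the question.

```haskell
import Data.Char (isDigit, digitToInt)

newtype Parser a = Parser { parseWith :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> [(f a, r) | (a, r) <- p s]

instance Applicative Parser where
  pure x = Parser $ \s -> [(x, s)]
  Parser pf <*> Parser pa =
    Parser $ \s -> [(f a, r2) | (f, r1) <- pf s, (a, r2) <- pa r1]

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> concatMap (\(a, r) -> parseWith (f a) r) (p s)

-- Assumed helper: a parser that always fails.
failure :: Parser a
failure = Parser (const [])

-- Assumed helper: take exactly n characters, failing if fewer remain.
consume :: Int -> Parser String
consume n = Parser $ \s -> [(take n s, drop n s) | length s >= n]

-- Assumed helper: parse a single decimal digit.
digitParse :: Parser Int
digitParse = Parser $ \s -> case s of
  (c:cs) | isDigit c -> [(digitToInt c, cs)]
  _                  -> []

-- Run p on an already-consumed substring; succeed only if p uses all of it.
innerParse :: Parser a -> String -> Parser a
innerParse p s = case parseWith p s of
  [(a, "")] -> pure a
  _         -> failure

-- Hypothetical example: the first digit gives the length of a prefix that
-- must consist of exactly two digits.
twoDigitsInPrefix :: Parser (Int, Int)
twoDigitsInPrefix = do
  n   <- digitParse
  sub <- consume n
  innerParse ((,) <$> digitParse <*> digitParse) sub
```

Here parseWith twoDigitsInPrefix "2345" gives [((3,4),"5")], whereas "3345" gives [] because the inner parser leaves "5" unconsumed inside the three-character prefix.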
I'm practicing writing parsers. I'm using Tsoding's JSON Parser video as a reference. I'm trying to extend it to parse arithmetic of arbitrary length, and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions are my attempt at implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it has to do with sOp, mainly because the other functions work as intended and the rest of parseOp is a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason I have sOp, however, is that if it were replaced with, say, Op, I would get an error that Op :: String -> Op doesn't exist. My inclination was that the string coming from the parsed expression would be passed into this function, where I could return the appropriate operator. This, however, is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)
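If you want to check the threading of the input in isolation, the following sketch can be loaded on its own. The Functor instance was not shown in the question, so the one below is an assumption, and quotedPlus is a made-up parser used only for this demonstration.

```haskell
newtype Parser a = Parser { runParser :: String -> Maybe (String, a) }

-- Assumed Functor instance (not shown in the question).
instance Functor Parser where
  fmap f (Parser p) = Parser $ \input -> do
    (rest, a) <- p input
    Just (rest, f a)

instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  Parser p <*> Parser q = Parser $ \input -> do
    (rest, f)  <- p input
    (rest', a) <- q rest      -- q sees what p left over, not the original input
    Just (rest', f a)

charP :: Char -> Parser Char
charP x = Parser f
  where
    f (y:ys)
      | y == x    = Just (ys, x)
      | otherwise = Nothing
    f [] = Nothing

-- Hypothetical example: a '+' between double quotes.
quotedPlus :: Parser Char
quotedPlus = charP '"' *> charP '+' <* charP '"'
```

With this instance, runParser (charP 'a' *> charP 'b') "abc" returns Just ("c",'b'). With the original instance the same expression returns Nothing, because charP 'b' is run against "abc" again.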
I'm given the following parsers
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
pure a = Parser $ \s -> Just (a,s)
f <*> a = Parser $ \s ->
case parse f s of
Just (g,s') -> parse (fmap g a) s'
Nothing -> Nothing
instance Alternative Parser where
empty = Parser $ \s -> Nothing
l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
case parse parser s of
Nothing -> Nothing
Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
where f [] = Just (Nothing,[])
f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
where f [] = Nothing
f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only using the given parsers and the Functor/Applicative/Alternative instances above. I'm stuck on this as I don't have experience in combining parsers. Can anyone help me out?
To understand it easier, we can write it in an equational pseudocode, while we substitute and simplify the definitions, using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only working for any MonadPlus type, not just []; while corresponding closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
parse (pure a) s = pure (a, s)
parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
, (a, s'') <- parse pa s' ]
instance Alternative Parser where
parse empty s = empty
parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s@(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo-)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
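As a small runnable illustration of that last point, here is a sketch of one such definition written once against MonadPlus and then used at both Maybe and []. It relies on GHC's MonadComprehensions extension; the names satisfyIn, asMaybe and asList are hypothetical and only serve this example.

```haskell
{-# LANGUAGE MonadComprehensions #-}
import Control.Monad (MonadPlus, mzero)
import Data.Char (isDigit)

-- The body of `satisfy` from above, abstracted over the result monad.
satisfyIn :: MonadPlus m => (Char -> Bool) -> String -> m (Char, String)
satisfyIn _ []     = mzero
satisfyIn p (x:xs) = [ (x, xs) | p x ]   -- a monad comprehension with a guard

-- The same definition at two different result types.
asMaybe :: Maybe (Char, String)
asMaybe = satisfyIn isDigit "1a"         -- Just ('1',"a")

asList :: [(Char, String)]
asList = satisfyIn isDigit "1a"          -- [('1',"a")]
```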
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to us produces the () as its result in such case (and otherwise it produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)
I have this monadic object.
data Parser a = Parser (String -> Maybe (a, String))
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap f (Parser pa) = Parser $ \input -> case pa input of
Nothing -> Nothing
Just (a, rest) -> Just (f a, rest)
instance Applicative Parser where
pure = return
(<*>) = ap
instance Monad Parser where
--return :: a -> Parser a
return a = Parser $ \input -> Just (a, input)
--(>>=) :: Parser a -> (a -> Parser b) -> Parser b
(Parser pa) >>= f = Parser $ \input -> case pa input of
Nothing -> Nothing
Just (a,rest) -> parse (f a) rest
And I have this definition of an item which I am told "reads in a character" but I don't really see any reading going on.
item :: Parser Char
item = Parser $ \ input -> case input of "" -> Nothing
(h:t) -> Just (h, t)
But ok, fine, maybe I should just relax about how literal to take the word "read" and jibe with it. Moving on, I have
failParse :: Parser a
failParse = Parser $ \ input -> Nothing
sat :: (Char -> Bool) -> Parser Char
sat p = do c <- item
if p c
then return c
else failParse
And this is where I get pretty confused. What is getting stored in the variable c? Since item is a Parser with parameter Char, my first guess is that c is storing such an object. But after a second of thought I know that's not how the do notation works, you don't get the monad, you get the contents of the monad. Great, but then that tells me c is then the function
\ input -> case input of "" -> Nothing
(h:t) -> Just (h, t)
But clearly that's wrong since the next line of the definition of sat treats c like a character. Not only is that not what I expect, but it's about three levels of structure down from what I expected! It's not the function, it's not the Maybe object, and it's not the tuple, but it's the left coordinate of the Just tuple buried inside the function! How is that little character working all that way outside? What is instructing the <- to extract this part of the monad?
As the comment mentioned, <- is just syntactic sugar in do notation, and the definition is equivalent to:
item >>= (\c->if p c
then return c
else failParse)
Okay, so what is c? Consider the type of (>>=):
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
or, in a more readable way:
Parser a >>= (a -> Parser b)
Now, matching it with the above expression item >>= (\c->if p c then return c else failParse) gives:
Parser a = item
and
(a->Parser b) = (\c->if p c then return c else failParse)
and item has type:
item :: Parser Char
so we can now replace a in (>>=) by Char, which gives
Parser Char >>= (Char -> Parser b)
and now \c->if p c then return c else failParse also has the type (Char -> Parser b),
and so c is a Char, and the whole expression can be expanded to:
sat p =
item >>= (\c->...) =
Parser pa >>= (\c->...) = Parser $ \input -> case pa input of
Nothing -> Nothing
Just (a,rest) -> parse (f a) rest
where f c = if p c
then return c
else failParse
pa input = case input of "" -> Nothing
(h:t) -> Just (h, t)
TL;DR: In general, by Monad laws,
do { item }
is the same as
do { c <- item
; return c
}
so it is defined by a return, in a sense. Details follow.
It does take one character out from the input string which is being "read", so in this sense it "reads" that character:
item :: Parser Char
item = Parser $ \ input -> -- input :: [Char]
case input of { "" -> Nothing
; (h:t) -> Just (h, t) -- (h:t) :: [Char]
} -- h :: Char t :: [Char]
and I bet there's a definition
parse (Parser pa) input = pa input
defined there somewhere; so
parse item input = case input of { "" -> Nothing
; (h:t) -> Just (h, t) }
Next, what does (>>=) mean? It means that
parse (Parser pa >>= f) input = case (parse (Parser pa) input) of
Nothing -> Nothing
Just (a, leftovers) -> parse (f a) leftovers
i.e.
parse (item >>= f) input
= case (parse item input) of
Nothing -> Nothing
Just (a, leftovers) -> parse (f a) leftovers
= case (case input of { "" -> Nothing
; (h:t) -> Just (h, t)
}) of
Nothing -> Nothing
Just (a, leftovers) -> parse (f a) leftovers
= case input of
"" -> Nothing
(h:t) -> case Just (h, t) of {
Just (a, leftovers) -> parse (f a) leftovers }
= case input of
"" -> Nothing
(h:t) -> parse (f h) t
Now,
-- sat p: a "satisfies `p`" parser
sat :: (Char -> Bool) -> Parser Char
sat p = do { c <- item -- sat p :: Parser Char
; if p c -- item :: Parser Char, c :: Char
then return c -- return c :: Parser Char
else failParse -- failParse :: Parser Char
}
= item >>= (\ c ->
if p c then return c else failParse)
(by unraveling the do syntax), and so
parse (sat p) input
= parse (item >>= (\ c ->
if p c then return c else failParse)) input
-- parse (item >>= f) input
-- = case input of { "" -> Nothing ; (h:t) -> parse (f h) t }
= case input of
"" -> Nothing
(h:t) -> parse ((\ c -> if p c then (return c)
else failParse) h) t
= case input of
"" -> Nothing
(c:t) -> parse (if p c then (return c)
else failParse) t
= case input of
"" -> Nothing
(c:t) -> if p c then parse (return c) t
else parse failParse t
= case input of
"" -> Nothing
(c:t) -> if p c then Just (c, t)
else Nothing
Now the meaning of sat p should be clear: for c produced by item (which is the first character in the input, if input is non-empty), if p c holds, c is accepted and the parse succeeds, otherwise the parse fails:
sat p = for c from item: -- do { c <- item
if p c -- ; if p c
then return c -- then return c
else failParse -- else failParse }
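To check the derivation in GHCi, here is a self-contained sketch of the whole thing. The parse function is the one I guessed at above; the Functor and Applicative instances are written in the canonical style (pure defined directly, fmap via liftM), which is an assumption on my part and differs slightly from the question's layout.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isDigit)

data Parser a = Parser (String -> Maybe (a, String))

-- The assumed `parse`: just unwrap and apply.
parse :: Parser a -> String -> Maybe (a, String)
parse (Parser pa) input = pa input

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure a = Parser $ \input -> Just (a, input)
  (<*>)  = ap

instance Monad Parser where
  Parser pa >>= f = Parser $ \input -> case pa input of
    Nothing        -> Nothing
    Just (a, rest) -> parse (f a) rest   -- `a` is what `c <- item` binds

item :: Parser Char
item = Parser $ \input -> case input of
  ""    -> Nothing
  (h:t) -> Just (h, t)

failParse :: Parser a
failParse = Parser $ \_ -> Nothing

sat :: (Char -> Bool) -> Parser Char
sat p = do
  c <- item
  if p c then return c else failParse
```

For example, parse (sat isDigit) "1ab" gives Just ('1',"ab"), while parse (sat isDigit) "ab" and parse (sat isDigit) "" both give Nothing, matching the final case expression derived above.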
My question is about Graham Hutton's book Programming in Haskell 1st Ed.
There is a parser created in section 8.4, and I am assuming anyone answering has the book or can see the link to slide 8 in the link above.
A basic parser called item is described as:
type Parser a = String -> [(a, String)]
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x,xs)]
which is used with do to define another parser p (the do parser)
p :: Parser (Char, Char)
p = do x <- item
item
y <- item
return (x,y)
the relevant bind definition is:
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= f = \inp -> case parse p inp of
[] -> []
[(v,out)] -> parse (f v) out
return is defined as:
return :: a -> Parser a
return v = \inp -> [(v,inp)]
parse is defined as:
parse :: Parser a -> String -> [(a,String)]
parse p inp = p inp
The program (the do parser) takes a string, selects the 1st and 3rd characters, and returns them in a tuple paired with the remainder of the string, all inside a list, e.g., "abcdef" produces [(('a','c'), "def")].
I want to know how the
(f v) out
in
[(v,out)] -> parse (f v) out
returns a parser which is then applied to out.
f in the do parser is item and item taking a character 'c' returns [('c',[])]?
How can that be a parser and how can it take out as an argument?
Perhaps I am just not understanding what (f v) does.
Also how does the do parser 'drop' the returned values each time to operate on the rest of the input string when item is called again?
What is the object that works its way through the do parser, and how is it altered at each step, and by what means is it altered?
f v produces a Parser b because f is a function of type a -> Parser b and v is a value of type a. So then you're calling parse with this Parser b and the string out as arguments.
F in the 'do' parser is item
No, it's not. Let's consider a simplified (albeit now somewhat pointless) version of your parser:
p = do x <- item
return x
This will desugar to:
p = item >>= \x -> return x
So the right operand of >>=, i.e. f, is \x -> return x, not item.
Also how does the 'do' parser 'drop' the returned values each time to operate on the rest of the input string when item is called again? What is the object that works its way through the 'do' parser and how is it altered at each step and by what means is it altered?
When you apply a parser it returns a tuple containing the parsed value and a string representing the rest of the input. If you look at item for example, the second element of the tuple will be xs which is the tail of the input string (i.e. a string containing all characters of the input string except the first). This second part of the tuple will be what's fed as the new input to subsequent parsers (as per [(v,out)] -> parse (f v) out), so that way each successive parser will take as input the string that the previous parser produced as the second part of its output tuple (which will be a suffix of its input).
In response to your comments:
When you write "p = item >>= \x -> return x", is that the equivalent of just the first line "p = do x <- item"?
No, it's equivalent to the entire do-block (i.e. do {x <- item; return x}). You can't translate do-blocks line-by-line like that. do { x <- foo; rest } is equivalent to foo >>= \x -> do {rest}, so you'll always have the rest of the do-block as part of the right operand of >>=.
but not how that reduces to simply making 'out' available as the input for the next line. What is parse doing if the next line of the 'do' parser is the item parser?
Let's walk through an example where we invoke item twice (this is like your p, but without the middle item). In the below I'll use === to denote that the expressions above and below the === are equivalent.
do x <- item
y <- item
return (x, y)
=== -- Desugaring do
item >>= \x -> item >>= \y -> return (x, y)
=== -- Inserting the definition of >>= for outer >>=
\inp -> case parse item inp of
[] -> []
[(v,out)] -> parse (item >>= \y -> return (v, y)) out
Now let's apply this to the input "ab":
case parse item "ab" of
[] -> []
[(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== Insert definition of `parse`
case item "ab" of
[] -> []
[(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== Insert definition of item
case [('a', "b")] of
[] -> []
[(v,out)] -> parse (item >>= \y -> return (v, y)) out
===
parse (item >>= \y -> return ('a', y)) "b"
Now we can expand the second >>= the same way we did the first and eventually end up with [(('a','b'), "")].
The relevant advice is, Don't panic (meaning, don't rush it; or, take it slow), and, Follow the types.
First of all, Parsers
type Parser a = String -> [(a,String)]
are functions from String to lists of pairings of result values of type a and the leftover Strings (because type defines type synonyms, not new types like data or newtype do).
That leftovers string will be used as input for the next parsing step. That's the main thing about it here.
You are asking, in
p >>= f = \inp -> case (parse p inp) of
[] -> []
[(v,out)] -> parse (f v) out
how the (f v) in [(v,out)] -> parse (f v) out returns a parser which is then applied to out?
The answer is, f's type says that it does so:
(>>=) :: Parser a -> (a -> Parser b) -> Parser b -- or, the equivalent
(>>=) :: Parser a -> (a -> Parser b) -> (String -> [(b,String)])
-- p f inp
We have f :: a -> Parser b, so that's just what it does: applied to a value of type a it returns a value of type Parser b. Or equivalently,
f :: a -> (String -> [(b,String)]) -- so that
f (v :: a) :: String -> [(b,String)] -- and,
f (v :: a) (out :: String) :: [(b,String)]
So whatever is the value that parse p inp produces, it must be what f is waiting for to proceed. The types must "fit":
p :: Parser a -- m a
f :: a -> Parser b -- a -> m b
f <$> p :: Parser ( Parser b ) -- m ( m b )
f =<< p :: Parser b -- m b
or, equivalently,
p :: String -> [(a, String)]
-- inp v out
f :: a -> String -> [(b, String)]
-- v out
p >>= f :: String -> [(b, String)] -- a combined Parser
-- inp v2 out2
So this also answers your second question,
How can that be a parser and how can it take out as an argument?
The real question is, what kind of f is it, that does such a thing? Where does it come from? And that's your fourth question.
And the answer is, your example in do-notation,
p :: Parser (Char, Char)
p = do x <- item
_ <- item
y <- item
return (x,y)
by Monad laws is equivalent to the nested chain
p = do { x <- item
; do { _ <- item
; do { y <- item
; return (x,y) }}}
which is a syntactic sugar for the nested chain of Parser bind applications,
p :: Parser (Char, Char) -- ~ String -> [((Char,Char), String)]
p = item >>= (\ x -> -- item :: Parser Char ~ String -> [(Char,String)]
item >>= (\ _ -> -- x :: Char
item >>= (\ y -> -- y :: Char
return (x,y) )))
and it is because the functions are nested that the final return has access to both y and x there; and it is precisely the Parser bind that arranges for the output leftovers string to be used as input to the next parsing step:
p = item >>= f -- :: String -> [((Char,Char), String)]
where
{ f x = item >>= f2
where { f2 _ = item >>= f3
where { f3 y = return (x,y) }}}
i.e. (under the assumption that inp is a string of length two or longer),
parse p inp -- assume that `inp`'s
= (item >>= f) inp -- length is at least 2 NB.
=
let [(v, left)] = item inp -- by the def of >>=
in
(f v) left
=
let [(v, left)] = item inp
in
let x = v -- inline the definition of `f`
in (item >>= f2) left
=
let [(v, left)] = item inp
in let x = v
in let [(v2, left2)] = item left -- by the def of >>=, again
in (f2 v2) left2
=
..........
=
let [(x,left1)] = item inp -- x <- item
[(_,left2)] = item left1 -- _ <- item
[(y,left3)] = item left2 -- y <- item
in
[((x,y), left3)]
=
let (x:left1) = inp -- inline the definition
(_:left2) = left1 -- of `item`
(y:left3) = left2
in
[((x,y), left3)]
=
let (x:_:y:left3) = inp
in
[((x,y), left3)]
after a few simplifications.
And this answers your third question.
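If you want to run this in a current GHC, note that a plain type synonym cannot be given its own Monad instance, so the sketch below wraps the function in a newtype. That wrapper (and the constructor name P, the name firstAndThird, and the Functor and Applicative instances) are my assumptions, not the book's first-edition code; the bind itself keeps the book's shape, except that the second case alternative accepts any non-empty list so the match is total.

```haskell
-- A newtype-wrapped variant of the book's parser type.
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P (\inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)])

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v    = P (\inp -> [(v, inp)])
  pg <*> pv = P (\inp -> [(g v, out2) | (g, out1) <- parse pg inp
                                      , (v, out2) <- parse pv out1])

instance Monad Parser where
  -- Run p, then feed the leftover `out` to the parser produced by `f v`.
  p >>= f = P (\inp -> case parse p inp of
    []           -> []
    (v, out) : _ -> parse (f v) out)

-- The do parser from the question, with the middle result ignored explicitly.
firstAndThird :: Parser (Char, Char)
firstAndThird = do
  x <- item
  _ <- item
  y <- item
  return (x, y)
```

Here parse firstAndThird "abcdef" gives [(('a','c'),"def")], and parse firstAndThird "ab" gives [] because the third item has nothing left to consume.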
I am having similar problems reading the syntax, because it's not what we are used to.
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= f = \inp -> case parse p inp of
[] -> []
[(v,out)] -> parse (f v) out
so for the question:
I want to know how the (f v) out in [(v,out)] -> parse (f v) out returns a parser which is then applied to out.
It does because that's the signature of the second argument (the f): (>>=) :: Parser a -> (a -> Parser b) -> Parser b. f takes an a and produces a Parser b. A Parser b takes a String, which here is out, hence (f v) out.
But the output of this should not be mixed up with the output of the function we are writing, >>=.
We are outputting a parser: (>>=) :: Parser a -> (a -> Parser b) -> Parser b.
The Parser we are outputting has the job of wrapping and chaining the first two arguments.
A parser is a function that takes one argument. It is constructed right after the first =, by returning an (anonymous) function: p >>= f = \inp -> ... So inp refers to the input string of the Parser we are building.
What is left is to define what that constructed function should do. Note that we are not implementing any of the input parsers, just chaining them together. So the output Parser function should:
apply the input parser (p) to its input (inp): p >>= f = \inp -> case parse p inp of
take the output of that parse, [(v, out)], where v is the result and out is what remains of the input
apply the input function (f, of type a -> Parser b) to the parsed result (v)
(f v) produces a Parser b (a function that takes one argument)
so apply that output parser to the remainder of the input after the first parser (out)
For me the understanding lies in the use of destructuring and the realization that we are constructing a function that glues together the execution of other functions, simply by considering their interfaces.
Hope that helps ... it helped me to write it :-)