TL;DR
I'm trying to understand how this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
  where
    p (c:cs) | pred c = Just (cs, c)
    p _ = Nothing
Is equivalent to this:
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = do
  c <- anyChar
  if pred c then return c else empty
Context
This is a snippet from some lecture notes on Haskell parsing, which I'm trying to understand:
import Control.Applicative
import Data.Char
import Data.Functor
import Data.List

newtype Parser a = PsrOf (String -> Maybe (String, a))
-- Function from input string to:
--
-- * Nothing, if failure (syntax error);
-- * Just (unconsumed input, answer), if success.

dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p

-- Monadic Parsing in Haskell uses [] instead of Maybe to support ambiguous
-- grammars and multiple answers.

-- | Use a parser on an input string.
runParser :: Parser a -> String -> Maybe a
runParser (PsrOf p) inp = case p inp of
  Nothing -> Nothing
  Just (_, a) -> Just a
  -- OR: fmap (\(_,a) -> a) (p inp)

-- | Read a character and return. Failure if input is empty.
anyChar :: Parser Char
anyChar = PsrOf p
  where
    p "" = Nothing
    p (c:cs) = Just (cs, c)

-- | Read a character and check against the given character.
char :: Char -> Parser Char
-- char wanted = PsrOf p
--   where
--     p (c:cs) | c == wanted = Just (cs, c)
--     p _ = Nothing
char wanted = satisfy (\c -> c == wanted) -- (== wanted)

-- | Read a character and check against the given predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy pred = PsrOf p
  where
    p (c:cs) | pred c = Just (cs, c)
    p _ = Nothing
-- Could also be:
-- satisfy pred = do
--   c <- anyChar
--   if pred c then return c else empty

instance Monad Parser where
  -- return :: a -> Parser a
  return = pure
  -- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
  PsrOf p1 >>= k = PsrOf q
    where
      q inp = case p1 inp of
        Nothing -> Nothing
        Just (rest, a) -> dePsr (k a) rest
I understand everything up until the last bit of the Monad definition. Specifically, I don't understand how the following line returns something of type Parser b, as required by the type of (>>=):
Just (rest, a) -> dePsr (k a) rest
It's difficult for me to grasp what the Monad definition means without an example. Thankfully, we have one in the alternate version of the satisfy function, which uses do-notation (which means the Monad instance is being used). I really don't understand do-notation yet, so here's the desugared version of satisfy:
satisfy pred =
  anyChar >>= (\c ->
    if pred c then return c else empty)
So based on the first line of our (>>=) definition, which is
PsrOf p1 >>= k = PsrOf q
We have anyChar as our PsrOf p1 and (\c -> if pred c then return c else empty) as our k. What I don't get is how, in dePsr (k a) rest, (k a) returns a Parser (at least it should, otherwise calling dePsr on it wouldn't make sense). This is made more confusing by the presence of rest. Even if (k a) returned a Parser, calling dePsr would extract the underlying function from the returned Parser and pass rest to it as an input. This definitely doesn't return something of type Parser b as required by the definition of (>>=). Clearly I'm misunderstanding something somewhere.
OK, maybe this will help. Let's start by putting some points back into dePsr.
dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) rest = p rest
And let's also write out return (NB: I'm putting in all the points for clarity):
return :: a -> Parser a
return a = PsrOf (\rest -> Just (rest, a))
And now from the Just branch of the (>>=) definition
Just (rest, a) -> dePsr (k a) rest
Let's make sure we agree on what everything is:
rest: the string remaining unparsed after p1 is applied
a: the result of applying p1
k :: a -> Parser b: takes the result of the previous parser and makes a new parser
dePsr: unwraps a Parser a back into a function String -> Maybe (String, a)
Remember that we wrap this back into a parser again at the top of the function: PsrOf q
So in English: bind (>>=) takes a parser in a and a function from a to a parser in b, and returns a parser in b. The resulting parser is made by wrapping q :: String -> Maybe (String, b) in the Parser constructor PsrOf. Then q, the combined parser, takes a String called inp, applies the function p1 :: String -> Maybe (String, a) that we got from pattern matching against the first parser, and pattern matches on the result. For an error we propagate Nothing (easy). If the first parser had a result, we have two pieces of information: the still-unparsed string called rest, and the result a. We give a to k, the function that builds the second parser, and get a Parser b, which we need to unwrap with dePsr to get a function String -> Maybe (String, b) back. That function can be applied to rest for the final result of the combined parsers. So the case expression produces a Maybe (String, b), not a Parser b; it is the outer PsrOf q that makes the whole thing a Parser b.
I think the hardest part about reading this is that we sometimes write the parser functions point-free (leaving off the string argument), which obscures what is actually happening.
OK, now for the satisfy example:
satisfy pred
  = anyChar >>= (\c -> if pred c then return c else empty)
empty comes from the Alternative instance and is PsrOf (const Nothing), so it is a parser that always fails.
Let's look at only the successful branches. Substituting only the successful parts:
PsrOf (\(c:cs) -> Just (cs, c)) >>= (\c -> PsrOf (\rest -> Just (rest, c)))
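To see the types at each step, here is a self-contained sketch of the lecture's Maybe-based parser. The Functor, Applicative and Alternative instances are assumptions (the notes don't show them); they are the natural choices for this type, not necessarily the ones in the course. The comments in (>>=) mark where the Parser b actually comes from: the case expression only produces a Maybe (String, b), and the outer PsrOf q turns that into a parser. satisfyDirect and satisfyBind are hypothetical names for the two versions of satisfy in the question.

```haskell
import Control.Applicative (Alternative (..))

newtype Parser a = PsrOf (String -> Maybe (String, a))

dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p

-- Assumed instance: not shown in the lecture notes.
instance Functor Parser where
  fmap f (PsrOf p) = PsrOf $ \inp -> case p inp of
    Nothing        -> Nothing
    Just (rest, a) -> Just (rest, f a)

-- Assumed instance: run the function parser, then the argument parser on the leftover input.
instance Applicative Parser where
  pure a = PsrOf (\rest -> Just (rest, a))
  PsrOf pf <*> PsrOf pa = PsrOf $ \inp -> case pf inp of
    Nothing        -> Nothing
    Just (rest, f) -> case pa rest of
      Nothing         -> Nothing
      Just (rest', a) -> Just (rest', f a)

instance Monad Parser where
  PsrOf p1 >>= k = PsrOf q                 -- the Parser b is built HERE, by PsrOf
    where
      q inp = case p1 inp of               -- q :: String -> Maybe (String, b)
        Nothing        -> Nothing
        Just (rest, a) ->
          let next = k a                   -- next :: Parser b
          in dePsr next rest               -- :: Maybe (String, b), not a Parser

-- Assumed instance: empty always fails, (<|>) tries the second parser on failure.
instance Alternative Parser where
  empty = PsrOf (const Nothing)
  PsrOf p <|> PsrOf q = PsrOf $ \inp -> case p inp of
    Nothing -> q inp
    success -> success

anyChar :: Parser Char
anyChar = PsrOf p
  where
    p ""     = Nothing
    p (c:cs) = Just (cs, c)

-- The direct definition from the question.
satisfyDirect :: (Char -> Bool) -> Parser Char
satisfyDirect ok = PsrOf p
  where
    p (c:cs) | ok c = Just (cs, c)
    p _             = Nothing

-- The desugared do-notation definition.
satisfyBind :: (Char -> Bool) -> Parser Char
satisfyBind ok = anyChar >>= \c -> if ok c then return c else empty
```

For example, dePsr (satisfyBind (== 'a')) "abc" gives Just ("bc",'a'), the same as satisfyDirect, while a non-matching first character gives Nothing.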
So in the bind (>>=) definition
p1 = \(c:cs) -> Just (cs, c)
k = \c -> PsrOf (\rest -> Just (rest, c))
q inp = let Just (rest, a) = p1 inp in dePsr (k a) rest -- again, only the successful branch
Then q becomes
q inp =
  let Just (rest, a) = (\(c:cs) -> Just (cs, c)) inp
  in dePsr ((\c -> PsrOf (\rest -> Just (rest, c))) a) rest
Doing a little β-reduction
q inp =
  let (c:cs) = inp
      rest = cs
      a = c
  in dePsr (PsrOf (\rest -> Just (rest, a))) rest -- dePsr . PsrOf = id
Finally cleaning up some more
q (c:cs) = Just (cs, c)
So if pred is successful, satisfy reduces back to exactly anyChar, which is what we expect, and exactly what we find in the first example of the question. I will leave it as an exercise to the reader (read: I'm lazy) to prove that if either inp = "" or pred c = False, the outcome is Nothing, as in the first satisfy example.
NOTE: If you are doing anything other than a class assignment, you will save yourself hours of pain and frustration by starting with error handling from the beginning: make your parser String -> Either String (String, a). It is easy to make the error type more general later, but a PITA to change everything from Maybe to Either.
Question: "[C]ould you explain how you arrived at return a = PsrOf (\rest -> Just (rest, a)) from return = pure after you put "points" back into return?"
Answer: First off, it is pretty unfortunate to give the Monad instance definition without the Functor and Applicative definitions. The pure and return functions must be identical (the laws relating Applicative and Monad require it), and they would be called the same thing, except that Monad far predates Applicative in Haskell history. In point of fact, I don't "know" what pure looks like, but I know what it has to be, because it is the only possible definition. (If you want to understand the proof of that statement, ask. I have read the papers and I know the results, but I'm not into typed lambda calculus quite enough to be confident in reproducing them.)
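For readers who would rather let GHC do the exercise, here is a rough check, not a proof: it compares the two satisfy definitions on a few inputs, including the two failure cases (inp = "" and pred c = False). Deriving Functor and Applicative from the Monad instance via liftM and ap is my own assumption about how to fill in the missing instances; agreeOn is a hypothetical helper.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (ap, liftM)

newtype Parser a = PsrOf (String -> Maybe (String, a))

dePsr :: Parser a -> String -> Maybe (String, a)
dePsr (PsrOf p) = p

-- Assumed boilerplate: Functor and Applicative derived from the Monad instance.
instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure a = PsrOf (\rest -> Just (rest, a))
  (<*>) = ap

instance Monad Parser where
  PsrOf p1 >>= k = PsrOf $ \inp -> case p1 inp of
    Nothing        -> Nothing             -- e.g. anyChar on "": k is never called
    Just (rest, a) -> dePsr (k a) rest

instance Alternative Parser where
  empty = PsrOf (const Nothing)           -- what k returns when pred c == False
  PsrOf p <|> PsrOf q = PsrOf $ \inp -> maybe (q inp) Just (p inp)

anyChar :: Parser Char
anyChar = PsrOf p
  where
    p ""     = Nothing
    p (c:cs) = Just (cs, c)

satisfyDirect, satisfyBind :: (Char -> Bool) -> Parser Char
satisfyDirect ok = PsrOf p
  where
    p (c:cs) | ok c = Just (cs, c)
    p _             = Nothing
satisfyBind ok = anyChar >>= \c -> if ok c then return c else empty

-- True when both definitions give the same result on this input.
agreeOn :: (Char -> Bool) -> String -> Bool
agreeOn ok inp = dePsr (satisfyDirect ok) inp == dePsr (satisfyBind ok) inp
```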
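A minimal sketch of what that change could look like, assuming error messages are plain Strings. failWith and the extra description argument to satisfy are hypothetical additions, not part of the lecture notes.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical variant of the lecture's parser: Left carries an error message
-- instead of a bare Nothing.
newtype Parser a = PsrOf (String -> Either String (String, a))

runP :: Parser a -> String -> Either String (String, a)
runP (PsrOf p) = p

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure a = PsrOf (\rest -> Right (rest, a))
  (<*>) = ap

instance Monad Parser where
  PsrOf p1 >>= k = PsrOf $ \inp -> case p1 inp of
    Left err        -> Left err            -- propagate the first error unchanged
    Right (rest, a) -> runP (k a) rest

-- Hypothetical helper: a parser that always fails with the given message.
failWith :: String -> Parser a
failWith msg = PsrOf (const (Left msg))

anyChar :: Parser Char
anyChar = PsrOf p
  where
    p ""     = Left "unexpected end of input"
    p (c:cs) = Right (cs, c)

-- satisfy now also takes a description used in the error message.
satisfy :: String -> (Char -> Bool) -> Parser Char
satisfy what ok = do
  c <- anyChar
  if ok c then return c else failWith ("expected " ++ what ++ ", got " ++ show c)
```

With this, running satisfy "digit" on "ab" reports which character was rejected instead of just Nothing.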
return must wrap a value in the context without altering the context.
return :: Monad m => a -> m a
return :: a -> Parser a -- for our Monad
return :: a -> PsrOf(\str -> Maybe (rest, value)) -- substituting the constructor (PSEUDOCODE)
A Parser is a function that takes a string to be parsed and returns either Just the value along with any unparsed portion of the original string, or Nothing on failure, all wrapped in the constructor PsrOf. The context is the string to be parsed, so we cannot change that. The value is, of course, what was passed to return. The parser always succeeds, so we must return Just a value.
return a = PsrOf (\rest -> Just (rest, a))
rest is the context and it is passed through unaltered.
a is the value we put into the Monad context.
For completeness here is also the only reasonable definition of fmap from Functor.
fmap :: Functor f => (a -> b) -> f a -> f b
fmap :: (a -> b) -> Parser a -> Parser b -- for Parser Monad
fmap f (PsrOf p) = PsrOf q
  where
    q inp = case p inp of
      Nothing -> Nothing
      Just (rest, a) -> Just (rest, f a)
    -- better but less instructive definition of q
    -- q = fmap (\(rest,a) -> (rest, f a)) . p
I am reading the book Programming in Haskell by Graham Hutton and I have some trouble understanding how <*> and partial application can be used to parse a string.
I know that pure (+1) <*> Just 2 produces Just 3, because pure (+1) produces Just (+1), and then Just (+1) <*> Just 2 produces Just (2+1), which is Just 3.
But in a more complex case like this:
-- Define a new type containing a parser function
newtype Parser a = P (String -> [(a,String)])

-- This function applies the parser p to inp
parse :: Parser a -> String -> [(a,String)]
parse (P p) inp = p inp

-- A parser which returns a tuple with the first char and the remaining string
item :: Parser Char
item = P (\inp -> case inp of
                    [] -> []
                    (x:xs) -> [(x,xs)])

-- A parser is a functor
instance Functor Parser where
  fmap g p = P (\inp -> case parse p inp of
                          [] -> []
                          [(v, out)] -> [(g v, out)])

-- A parser is also an applicative functor
instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px = P (\inp -> case parse pg inp of
                           [] -> []
                           [(g, out)] -> parse (fmap g px) out)
So, when I do:
parse (pure (\x y -> (x,y)) <*> item <*> item) "abc"
The answer is:
[(('a','b'),"c")]
But I don't understand what exactly happens.
First:
pure (\x y -> (x,y)) => P (\inp1 -> [(\x y -> (x,y), inp1)])
I have now a parser with one parameter.
Then:
P (\inp1 -> [(\x y -> (x,y), inp1)]) <*> item
=> P (\inp2 -> case parse (\inp1 -> [(\x y -> (x,y), inp1)]) inp2 of ???
I really don't understand what happens here.
Can someone explain, step by step, what happens from here until the end, please?
Let's evaluate pure (\x y -> (x,y)) <*> item. The second application of <*> will be easy once we've seen the first:
P (\inp1 -> [(\x y -> (x,y), inp1)]) <*> item
We replace the <*> expression with its definition, substituting the expression's operands for the definition's parameters.
P (\inp2 -> case parse (P (\inp1 -> [(\x y -> (x,y), inp1)])) inp2 of
              [] -> []
              [(g, out)] -> parse (fmap g item) out)
Then we do the same for the fmap expression.
P (\inp2 -> case parse (P (\inp1 -> [(\x y -> (x,y), inp1)])) inp2 of
              [] -> []
              [(g, out)] -> parse (P (\inp -> case parse item inp of
                                                [] -> []
                                                [(v, out)] -> [(g v, out)])) out)
Now we can reduce the first two parse expressions (we'll leave parse item out for later since it's basically primitive).
P (\inp2 -> case [(\x y -> (x,y), inp2)] of
              [] -> []
              [(g, out)] -> case parse item out of
                              [] -> []
                              [(v, out)] -> [(g v, out)])
So much for pure (\x y -> (x,y)) <*> item. Since you created the first parser by lifting a binary function of type a -> b -> (a, b), the single application to a parser of type Parser Char represents a parser of type Parser (b -> (Char, b)).
We can run this parser through the parse function with input "abc". Since the parser has type Parser (b -> (Char, b)), this should reduce to a value of type [(b -> (Char, b), String)]. Let's evaluate that expression now.
parse (P (\inp2 -> case [(\x y -> (x,y), inp2)] of
                     [] -> []
                     [(g, out)] -> case parse item out of
                                     [] -> []
                                     [(v, out)] -> [(g v, out)])) "abc"
By the definition of parse this reduces to
case [(\x y -> (x,y), "abc")] of
  [] -> []
  [(g, out)] -> case parse item out of
                  [] -> []
                  [(v, out)] -> [(g v, out)]
Clearly, the patterns don't match in the first case, but they do in the second case. We substitute the matches for the patterns in the second expression.
case parse item "abc" of
  [] -> []
  [(v, out)] -> [((\x y -> (x,y)) v, out)]
Now we finally evaluate that last parse expression. parse item "abc" clearly reduces to [('a', "bc")] from the definition of item.
case [('a', "bc")] of
  [] -> []
  [(v, out)] -> [((\x y -> (x,y)) v, out)]
Again, the second pattern matches and we do substitution
[((\x y -> (x,y)) 'a', "bc")]
which reduces to
[(\y -> ('a', y), "bc")] :: [(b -> (Char, b), String)] -- the expected type
If you apply this same process to evaluate the second <*> application, and put the result in the parse (result) "abc" expression, you'll see that the expression indeed reduces to [(('a','b'),"c")].
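The walkthrough above can be reproduced in GHC using the question's own definitions. Since the intermediate result is a function, which GHC cannot print, this sketch probes it by applying it to a test character; step1, step2 and probeStep1 are names made up for the intermediate parsers.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P (\inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)])

instance Functor Parser where
  fmap g p = P (\inp -> case parse p inp of
    []         -> []
    [(v, out)] -> [(g v, out)])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px = P (\inp -> case parse pg inp of
    []         -> []
    [(g, out)] -> parse (fmap g px) out)

-- After the first <*>: the result is the partially applied pair
-- constructor \y -> ('a', y), with "bc" left over.
step1 :: Parser (b -> (Char, b))
step1 = pure (\x y -> (x, y)) <*> item

-- Functions cannot be printed, so we probe the result by applying it to 'z'.
probeStep1 :: [((Char, Char), String)]
probeStep1 = [(g 'z', rest) | (g, rest) <- parse step1 "abc"]

-- After the second <*>: the full parser from the question.
step2 :: Parser (Char, Char)
step2 = step1 <*> item
```

probeStep1 evaluates to [(('a','z'),"bc")], matching [(\y -> ('a', y), "bc")] above, and parse step2 "abc" gives [(('a','b'),"c")].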
What helped me a lot while learning these things was to focus on the types of the values and functions involved. It's all about applying a function to a value (or in your case applying a function to two values).
($) :: (a -> b) -> a -> b
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
So with a Functor we apply a function to a value inside a "container/context" (e.g. Maybe, List, ...), and with an Applicative the function we want to apply is itself inside a "container/context".
The function you want to apply is (,), and the values you want to apply the function to are inside a container/context (in your case Parser a).
Using pure we lift the function (,) into the same "context/container" our values are in (note that we can use pure to lift the function into any Applicative: Maybe, List, Parser, ...):
(,) :: a -> b -> (a, b)
pure (,) :: Parser (a -> b -> (a, b))
Using <*> we can apply the function (,) that is now inside the Parser context to a value that is also inside the Parser context. One difference to the example you provided with +1 is that (,) has two arguments. Therefore we have to use <*> twice:
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
x :: Parser Int
y :: Parser Char
let p1 = pure (,) <*> x :: Parser (b -> (Int, b))
let v1 = (,) 1 :: b -> (Int, b)
let p2 = p1 <*> y :: Parser (Int, Char)
let v2 = v1 'a' :: (Int, Char)
We have now created a new parser (p2) that we can use just like any other parser!
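Here is a runnable version of the p1/p2 example. The answer leaves x and y abstract, so the concrete parsers below are hypothetical stand-ins (x reads one decimal digit, y reads any character). The instances are written with list comprehensions so they handle any number of results, which differs slightly from the question's code.

```haskell
import Data.Char (digitToInt, isDigit)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px = P (\inp -> [(g v, out2) | (g, out1) <- parse pg inp, (v, out2) <- parse px out1])

item :: Parser Char
item = P (\inp -> case inp of
  []     -> []
  (c:cs) -> [(c, cs)])

-- Hypothetical x: reads one decimal digit as an Int.
x :: Parser Int
x = P (\inp -> case inp of
  (c:cs) | isDigit c -> [(digitToInt c, cs)]
  _                  -> [])

-- Hypothetical y: any single character.
y :: Parser Char
y = item

p1 :: Parser (b -> (Int, b))   -- (,) applied to one parsed Int
p1 = pure (,) <*> x

v1 :: b -> (Int, b)            -- the plain-function counterpart
v1 = (,) 1

p2 :: Parser (Int, Char)       -- (,) applied to both parsed values
p2 = p1 <*> y

v2 :: (Int, Char)
v2 = v1 'a'
```

The parallel is exact: p2 on "1a!" yields the pair (1,'a') with "!" left over, just as v2 is (1,'a').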
... and there is more!
Have a look at this convenience function:
(<$>) :: Functor f => (a -> b) -> f a -> f b
<$> is just fmap but you can use it to write the combinators more beautifully:
data User = User {name :: String, year :: Int}
nameParser :: Parser String
yearParser :: Parser Int
let userParser = User <$> nameParser <*> yearParser -- :: Parser User
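A sketch of how the User example could actually run. nameParser and yearParser are only declared in the answer; the definitions here (letters followed by optional spaces; a run of digits) are assumptions for illustration.

```haskell
import Data.Char (isAlpha, isDigit, isSpace)

newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px = P (\inp -> [(g v, out2) | (g, out1) <- parse pg inp, (v, out2) <- parse px out1])

data User = User {name :: String, year :: Int}
  deriving (Eq, Show)

-- Hypothetical: one or more letters, then skips any following spaces.
nameParser :: Parser String
nameParser = P (\inp -> case span isAlpha inp of
  ("", _)    -> []
  (nm, rest) -> [(nm, dropWhile isSpace rest)])

-- Hypothetical: one or more decimal digits, read as an Int.
yearParser :: Parser Int
yearParser = P (\inp -> case span isDigit inp of
  ("", _)        -> []
  (digits, rest) -> [(read digits, rest)])

userParser :: Parser User
userParser = User <$> nameParser <*> yearParser
```

parse userParser "Ada 1815" gives [(User {name = "Ada", year = 1815},"")]; if either field parser fails, the whole parse fails.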
OK, this answer got longer than I expected! Well, I hope it helps. Maybe also have a look at the Typeclassopedia, which I found invaluable while learning Haskell, which is an endless, beautiful process... :)
TL;DR: When you said you "[now] have a parser with one parameter" inp1, you got confused: inp1 is an input string to a parser, but the function (\x y -> (x,y)) - which is just (,) - is being applied to the output value(s), produced by parsing the input string. The sequence of values produced by your interim parsers is:
-- by (pure (,)):
(,) -- a function expecting two arguments
-- by the first <*> combination with (item):
(,) x -- a partially applied (,) function expecting one more argument
-- by the final <*> combination with another (item):
((,) x) y == (x,y) -- the final result, a pair of `Char`s taken off the
-- input string, first (`x`) by an `item`,
-- and the second (`y`) by another `item` parser
Working by equational reasoning can oftentimes be easier:
-- pseudocode definition of `fmap`:
parse (fmap g p) inp = case (parse p inp) of       -- g :: a -> b , p :: Parser a
                         [] -> []                    -- fmap g p :: Parser b
                         [(v, out)] -> [(g v, out)]  -- v :: a , g v :: b
(apparently this assumes any parser can only produce 0 or 1 results, as the case of a longer list isn't handled at all -- which is usually frowned upon, and with good reason);
-- pseudocode definition of `pure`:
parse (pure v) inp = [(v, inp)] -- v :: a , pure v :: Parser a
(parsing with pure v produces the v without consuming the input);
-- pseudocode definition of `item`:
parse (item) inp = case inp of                     -- inp :: [Char]
                     [] -> []
                     (x:xs) -> [(x,xs)]              -- item :: Parser Char
(parsing with item means taking one Char off the head of the input String, if possible); and,
-- pseudocode definition of `(<*>)`:
parse (pg <*> px) inp = case (parse pg inp) of     -- px :: Parser a
                          [] -> []
                          [(g, out)] -> parse (fmap g px) out  -- g :: a -> b
(<*> combines two parsers with types of results that fit, producing a new, combined parser which uses the first parser to parse the input, then uses the second parser to parse the leftover string, combining the two results to produce the result of the new, combined parser);
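On the remark above that fmap only handles 0 or 1 results: given a parser that returns two alternatives, the question's fmap would hit a non-exhaustive pattern at runtime. A total version can use a list comprehension, as in this sketch; fmapAll and the ambiguous parser oneOrTwo are hypothetical names.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

-- fmap that handles any number of results, not just zero or one.
fmapAll :: (a -> b) -> Parser a -> Parser b
fmapAll g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

-- Hypothetical ambiguous parser: consumes either one or two characters.
oneOrTwo :: Parser String
oneOrTwo = P (\inp -> [(take n inp, drop n inp) | n <- [1, 2], length inp >= n])
```

parse (fmapAll length oneOrTwo) "abc" keeps both alternatives: [(1,"bc"),(2,"c")].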
Now, <*> associates to the left, so what you ask about is
parse ( pure (\x y -> (x,y)) <*> item <*> item ) "abc"
= parse ( (pure (,) <*> item1) <*> item2 ) "abc" -- item_i = item
the rightmost <*> is the topmost, so we expand it first, leaving the nested expression as is for now,
 = case (parse (pure (,) <*> item1) "abc") of          -- by definition of <*>
     [] -> []
     [(g2, out2)] -> parse (fmap g2 item2) out2
                   = case (parse item out2) of          -- by definition of fmap
                       [] -> []
                       [(v, out)] -> [(g2 v, out)]
                   = case out2 of                       -- by definition of item
                       [] -> []
                       (y:ys) -> [(g2 y, ys)]
Similarly, the nested expression is simplified as
parse (pure (,) <*> item1) "abc"
 = case (parse (pure (\x y -> (x,y))) "abc") of         -- by definition of <*>
     [] -> []
     [(g1, out1)] -> parse (fmap g1 item1) out1
                   = case (parse item out1) of ....
                   = case out1 of
                       [] -> []
                       (x:xs) -> [(g1 x, xs)]
 = case [((,), "abc")] of                               -- by definition of pure
     [(g1, out1)] -> case out1 of
                       [] -> []
                       (x:xs) -> [(g1 x, xs)]
 = let { out1 = "abc"
       ; g1 = (,)
       ; (x:xs) = out1
       }
   in [(g1 x, xs)]
 = [( (,) 'a', "bc")]
and thus we get
 = case [( (,) 'a', "bc")] of
     [(g2, out2)] -> case out2 of
                       [] -> []
                       (y:ys) -> [(g2 y, ys)]
I think you can see now why the result will be [( ((,) 'a') 'b', "c")].
First, I want to emphasize one thing. I found that the crux of understanding lies in noticing the separation between the Parser itself and running the parser with parse.
In running the parser you give the Parser and input string to parse and it will give you the list of possible parses. I think that's probably easy to understand.
You will pass parse a Parser, which may be built using glue, <*>. Try to understand that when you pass parse the Parser, a, or the Parser, f <*> a <*> b, you will be passing it the same type of thing, i.e. something equivalent to (String -> [(a,String)]). I think this is probably easy to understand as well, but still it takes a while to "click".
That said, I'll talk a little about the nature of this applicative glue, <*>. An applicative value of type f a is a computation that yields data of type a. You can think of a term such as
... f <*> g <*> h
as a series of computations which return some data, say a then b then c. In the context of Parser, the computation involves f looking for a in the current string, then passing the remainder of the string to g, etc. If any of the computations/parses fails, then so does the whole term.
It's interesting to note that an applicative expression like this can always be written with a pure function at the beginning to collect all those emitted values, so we can generally write,
pure3ArgFunction <$> f <*> g <*> h
I personally find the mental model of emitting and collecting helpful.
So, with that long preamble over, onto the actual explanation. What does
parse (pure (\x y -> (x,y)) <*> item <*> item) "abc"
do? Well, parse (p :: Parser (Char,Char)) "abc" applies the parser (which I renamed p) to "abc", yielding [(('a','b'),"c")]. This is a successful parse with the return value ('a','b') and the leftover string "c".
Ok, that's not the question though. Why does the parser work this way? Starting with:
.. <*> item <*> item
item takes the next character from the string, yields it as a result and passes the unconsumed input. The next item does the same. The beginning can be rewritten as:
fmap (\x y -> (x,y)) item <*> item
or
(\x y -> (x,y)) <$> item <*> item
which is my way of showing that the pure function does not do anything to the input string; it just collects the results. When looked at in this light, I think the parser should be easy to understand. Very easy. Too easy. I mean that in all seriousness. It's not that the concept is so hard, but our normal frame of looking at programming is just too foreign for it to make much sense at first.
Other answers here do a great job of walking step by step through how the computation produces the final result, so I won't replicate that here.
What I think is that you really need to deeply understand Functor and Applicative Functor. Once you understand these topics, the others will be as easy as one, two, three (well, most of them).
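A quick way to check the claim that pure only collects results: run both spellings on the same input, and run pure on its own to see that it consumes nothing. This sketch uses list-comprehension instances (an assumption; on these single-result inputs they behave like the question's); viaPure, viaFmap and justPure are illustrative names.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px = P (\inp -> [(g v, out2) | (g, out1) <- parse pg inp, (v, out2) <- parse px out1])

item :: Parser Char
item = P (\inp -> case inp of
  []     -> []
  (c:cs) -> [(c, cs)])

-- pure-then-<*> style, as in the question.
viaPure :: Parser (Char, Char)
viaPure = pure (\x y -> (x, y)) <*> item <*> item

-- fmap-then-<*> style: no pure needed.
viaFmap :: Parser (Char, Char)
viaFmap = (\x y -> (x, y)) <$> item <*> item

-- pure on its own leaves the input untouched.
justPure :: Parser Char
justPure = pure 'q'
```

Both viaPure and viaFmap give [(('a','b'),"c")] on "abc", and parse justPure "abc" returns the whole input unconsumed.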
So: what is Functor, Applicative Functor and their applications in your problem?
Best tutorials on these:
Chapter 11 of "Learn You a Haskell for a great good": http://learnyouahaskell.com/functors-applicative-functors-and-monoids.
More visual "Functors, Applicatives, And Monads in Pictures": http://adit.io/posts/2013-04-17-functors,_applicatives,_and_monads_in_pictures.html.
First, when you think about Functor, Applicative Functor, think about "values in contexts": the values are important, and the computational contexts are important too. You have to deal with both of them.
The definitions of the types:
-- Define a new type containing a parser function
newtype Parser a = P (String -> [(a,String)])
-- This function apply the parser p on inp
parse :: Parser a -> String -> [(a,String)]
parse (P p) inp = p inp
The value here is the value of type a, the first element of the tuple in the list.
The context here is the function, or the eventual value. You have to supply an input to get the final value.
Parser is a function wrapped in a P data constructor. So if you got a value b :: Parser Char, and you want to apply it to some input, you have to unwrap the inner function in b. That's why we have the function parse, it unwraps the inner function and applies it to the input value.
And, if you want to create Parser value, you have to use P data constructor wraps around a function.
Second, Functor: something that can be "mapped" over, specified by the function fmap:
fmap :: (a -> b) -> f a -> f b
I often call g :: a -> b a "normal" function because, as you see, no context wraps around it. So, to be able to apply g to f a, we have to extract the a from f a somehow, so that g can be applied to a alone. That "somehow" depends on the specific Functor and is the context you are working in:
instance Functor Parser where
  fmap g p = P (\inp -> case parse p inp of
                          [] -> []
                          [(v, out)] -> [(g v, out)])
g is the function of type (a -> b), p is of type f a.
To unwrap p, that is, to get the value out of the context, we have to pass some input value in: parse p inp. The value is then the 1st element of the tuple. Apply g to that value to get a value of type b.
The result of fmap is of type f b, so we have to wrap the whole result in the same context; that's why we have fmap g p = P (\inp -> ...).
At this point, you might wonder whether you could have an implementation of fmap in which the result, instead of [(g v, out)], is [(g v, inp)]. It would type-check, but it would not be an appropriate Functor: the implementation must obey the Functor laws, and with [(g v, inp)], fmap id p no longer behaves like p, because it stops consuming input. The laws are the way we derive the implementations of those functions (http://mvanier.livejournal.com/4586.html). The implementation must satisfy the 2 Functor laws:
fmap id = id.
fmap (f . g) = fmap f . fmap g.
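The laws can be spot-checked on concrete inputs. This sketch compares the question's fmap with the [(g v, inp)] variant discussed above. Spot checks like these can reveal a violation but cannot prove that a law holds; fmapGood, fmapBad, identityHolds and compositionHolds are hypothetical names.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P (\inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)])

-- The fmap from the question.
fmapGood :: (a -> b) -> Parser a -> Parser b
fmapGood g p = P (\inp -> case parse p inp of
  []         -> []
  [(v, out)] -> [(g v, out)])

-- The variant discussed above: it returns the ORIGINAL input instead of out.
fmapBad :: (a -> b) -> Parser a -> Parser b
fmapBad g p = P (\inp -> case parse p inp of
  []       -> []
  [(v, _)] -> [(g v, inp)])

-- Law 1, fmap id = id, checked on one input.
identityHolds :: ((Char -> Char) -> Parser Char -> Parser Char) -> String -> Bool
identityHolds fm inp = parse (fm id item) inp == parse item inp

-- Law 2, fmap (f . g) = fmap f . fmap g, checked on one input.
compositionHolds :: String -> Bool
compositionHolds inp =
  parse (fmapGood (succ . fromEnum) item) inp
    == parse (fmapGood succ (fmapGood fromEnum item)) inp
```

On "abc", fmapBad id item returns [('a',"abc")] while item returns [('a',"bc")], so the identity law fails for the variant.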
fmap is often written as the infix operator <$>. When you see it, look at the 2nd operand to determine which Functor you are working with.
Third, Applicative Functor: you apply a wrapped function to a wrapped value to get another wrapped value:
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
Unwrap the inner function.
Unwrap 1st value.
Apply the function and wrap the result.
In your case:
instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pg <*> px = P (\inp -> case parse pg inp of
                           [] -> []
                           [(g, out)] -> parse (fmap g px) out)
pg is of type f (a -> b), px is of type f a.
Unwrap g from pg by parse pg inp, g is the 1st of the tuple.
Unwrap px and apply g to the value by using fmap g px. Note that the resulting parser is applied only to out, which in some cases is "bc", not "abc".
Wrap the whole result: P (\inp -> ...).
Like Functor, an implementation of Applicative Functor must obey Applicative Functor laws (in the tutorials above).
Fourth, apply to your problem:
parse (pure (\x y -> (x,y)) <*> item <*> item) "abc"
-- f1 = pure (\x y -> (x,y)), f2 = item, f3 = item
Unwrap f1 <*> f2 by passing "abc" to it:
Unwrap f1 by passing "abc" to it, we get [(g, "abc")].
Then fmap g on f2 and passing out="abc" to it:
Unwrap f2 get [('a', "bc")].
Apply g on 'a' get a result: [(\y -> ('a', y), "bc")].
Then fmap 1st element of the result on f3 and passing out="bc" to it:
Unwrap f3 get [('b', "c")].
Apply the function on 'b' get final result: [(('a', 'b'), "c")].
In conclusion:
Take some time for the ideas to sink in. In particular, the laws drive the implementations.
Next time, design your data structures to be easier to understand.
Haskell is one of my favorite languages and I think it will be yours soon, so be patient: there is a learning curve, and then you're off!
Happy Haskell hacking!
Hmm, I am not experienced with Haskell, but my attempt at writing Functor and Applicative instances for the Parser type would be as follows:
-- Define a new type containing a parser function
newtype Parser a = P (String -> [(a,String)])

-- This function applies the parser p to inp
parse :: Parser a -> String -> [(a,String)]
parse (P p) inp = p inp

-- A parser which returns a tuple with the first char and the remaining string
item :: Parser Char
item = P (\inp -> case inp of
                    [] -> []
                    (x:xs) -> [(x,xs)])

-- A parser is a functor
instance Functor Parser where
  fmap g (P f) = P (\str -> map (\(x,y) -> (g x, y)) $ f str)

-- A parser is also an applicative functor
instance Applicative Parser where
  pure v = P (\str -> [(v, str)])
  -- run the function parser g first, then the argument parser f on the leftover string s1
  (P g) <*> (P f) = P (\str -> g str >>= \(g',s1) -> f s1 >>= \(v,s2) -> [(g' v,s2)])
(Many thanks to @Will Ness for help with <*>.)
Accordingly...
*Main> parse (P (\s -> [((+3), s)]) <*> pure 2) "test"
[(5,"test")]
*Main> parse (P (\s -> [((,), s ++ " altered")]) <*> pure 2 <*> pure 4) "test"
[((2,4),"test altered")]
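Why the order inside <*> matters: the examples above only use pure on the right-hand side, which never consumes input, so they cannot tell apart running the function parser first or the argument parser first. With item on both sides they can. apFunFirst and apArgFirst are hypothetical names for the two orders.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P (\inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)])

-- The answer's fmap.
instance Functor Parser where
  fmap g (P f) = P (\str -> map (\(x, y) -> (g x, y)) $ f str)

-- Function parser first, then the argument parser on the leftover string.
apFunFirst :: Parser (a -> b) -> Parser a -> Parser b
apFunFirst (P g) (P f) = P (\str -> g str >>= \(g', s1) -> f s1 >>= \(v, s2) -> [(g' v, s2)])

-- The reversed order, for comparison: argument parser first.
apArgFirst :: Parser (a -> b) -> Parser a -> Parser b
apArgFirst (P g) (P f) = P (\str -> f str >>= \(v, s1) -> g s1 >>= \(g', s2) -> [(g' v, s2)])

pairFunFirst, pairArgFirst :: Parser (Char, Char)
pairFunFirst = fmap (,) item `apFunFirst` item
pairArgFirst = fmap (,) item `apArgFirst` item
```

On "abc", pairFunFirst gives [(('a','b'),"c")], whereas pairArgFirst gives [(('b','a'),"c")]: the characters end up swapped because the right-hand parser consumed 'a'.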
I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }

instance Functor Parser where
  fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])

instance Applicative Parser where
  pure a = Parser (\s -> [(a,s)])
  (Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])

instance Alternative Parser where
  empty = Parser $ \s -> []
  p <|> q = Parser $ \s ->
    case parse p s of
      [] -> parse q s
      r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
  case s of
    [] -> []
    (c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
  case s of
    [] -> []
    (c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?
First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type suggests fmap, but we can rule that out: all we can provide is some function a -> b, and there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char.
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
  case parse p s of -- < item will be used here
    [(c, cs)] -> if isDigit c then [(c, cs)] else []
    _ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now there is some kind of sequence implied: item has to succeed before digit can do its work.
Alternatively, we could have tried this way:
digit' :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
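To make the "join-like function" concrete, here is a sketch of one for this Parser type. joinP is a hypothetical name; it is essentially the Monad's join for this type, which is exactly the extra power the answer says Applicative lacks.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

newtype Parser a = Parser { parse :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])

instance Applicative Parser where
  pure a = Parser (\s -> [(a, s)])
  (Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])

instance Alternative Parser where
  empty = Parser (const [])
  p <|> q = Parser $ \s -> case parse p s of
    [] -> parse q s
    r  -> r

item :: Parser Char
item = Parser $ \s -> case s of
  []     -> []
  (c:cs) -> [(c, cs)]

digit' :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty

-- Hypothetical joinP: run the outer parser, then run the parser it returned
-- on the leftover input. The inner parser depends on the parsed value.
joinP :: Parser (Parser a) -> Parser a
joinP pp = Parser (\s -> [(a, s2) | (p, s1) <- parse pp s, (a, s2) <- parse p s1])

digitViaJoin :: Parser Char
digitViaJoin = joinP (fmap digit' item)
```

digitViaJoin accepts "7x" (result '7', leftover "x") and rejects "x7", but only because joinP lets a parsed value choose the next parser.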
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s ->
  case s of
    (c:cs) | p c -> [(c, cs)]
    _ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.
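A runnable sketch of the satisfy approach, reusing the question's Functor and Applicative instances. digitThenChar is a hypothetical example showing that the resulting parsers combine with <$> and <*> alone.

```haskell
import Data.Char (isDigit)

newtype Parser a = Parser { parse :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])

instance Applicative Parser where
  pure a = Parser (\s -> [(a, s)])
  (Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])

-- The primitive: the predicate is checked inside the parser itself.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
  (c:cs) | p c -> [(c, cs)]
  _            -> []

item :: Parser Char
item = satisfy (const True)

digit :: Parser Char
digit = satisfy isDigit

-- Purely applicative combination: a digit followed by any character.
digitThenChar :: Parser (Char, Char)
digitThenChar = (,) <$> digit <*> item
```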
Functors allow you to act on something's values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure: map cannot change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values cannot influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content cannot change the structure. We can't choose to fail on some content and not on other content.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).
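A small illustration of the structure-versus-content point, using the list Applicative rather than Parser (chosen here only because list structure is easy to measure): fmap never changes the length, and the length of fs <*> xs depends only on the lengths of fs and xs, never on their values. The helper names are hypothetical.

```haskell
-- fmap keeps the list's length, whatever the function does to the elements.
fmapLengthKept :: [Int] -> Bool
fmapLengthKept xs = length (fmap (* 2) xs) == length xs

-- The shape of fs <*> xs depends only on the shapes of fs and xs:
-- one result per (function, value) pair, regardless of the values.
apLength :: [Int -> Int] -> [Int] -> Int
apLength fs xs = length (fs <*> xs)
```

For instance, [(+ 1), (* 10)] <*> [1, 2, 3] is [2,3,4,10,20,30]: six results for 2 functions and 3 values, whatever those values are.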
I am working my way through the functional pearl paper Monadic parsing in Haskell (after recommendation at haskellforall.com to read that paper to understand parsing). I wrote an implementation until section 4 on page 3 as below:
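import Control.Monad (ap, liftM)

newtype Parser a = Parser (String -> [(a,String)])
parse (Parser p) = p

-- Functor and Applicative are superclasses of Monad in modern GHC;
-- the paper's return a = Parser (\cs -> [(a,cs)]) becomes pure here.
instance Functor Parser where
  fmap = liftM
instance Applicative Parser where
  pure a = Parser (\cs -> [(a,cs)])
  (<*>) = ap

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
                        "" -> []
                        (c:cs) -> [(c,cs)])

p :: Parser (Char,Char)
p = do { a <- item; item; b <- item; return (a,b)}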
According to the paper, p is a parser that consumes three characters, skips the middle one, and returns a pair of the first and third. What I can't figure out is how the modified input string is passed to the 2nd and 3rd invocations of item in p. We are not passing the result of the first parser to the second parser, and so on (because ;, syntactic sugar for >>, is used, which discards the result, as shown by the type signature (>>) :: Monad m => m a -> m b -> m b). I would appreciate an explanation of how the modified string is being passed to the last two invocations of item in p.
Another thing that confuses me is the handling of cs in item: it doesn't seem to return a (head, tail) pair. Shouldn't it be redefined as follows, since the item parser consumes one character according to the paper:
item :: Parser Char
item = Parser (\cs -> case cs of
                        "" -> []
                        (c:cs') -> [(c,cs')]) -- redefinition - use cs' to denote tail
The syntax ; is not always syntactic sugar for >>.
Rather, we have:
do m ; n = m >> n
do x<-m ; n = m >>= \x -> n
(The above translation is simplified; the full gory details can be found in the Haskell Report.)
So, your definition for p is equivalent to:
p = item >>= \a -> ( item >> (item >>= \b -> return (a,b) ))
Here, you can see that the first and third items do not have their results discarded (because >>= binds them to a and b respectively), while the middle item does.
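To check the desugaring, here is a self-contained sketch that puts the question's parser next to its hand-desugared form. The Functor and Applicative instances are an assumption added so that current GHC accepts the Monad instance; pDo and pDesugared are hypothetical names.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Functor/Applicative instances are needed by current GHC;
-- they are not part of the paper's code.
instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [(f a, cs2) | (f, cs1) <- parse pf cs, (a, cs2) <- parse pa cs1])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
  ""      -> []
  (c:cs') -> [(c, cs')])

-- p as written with do-notation in the question
pDo :: Parser (Char, Char)
pDo = do { a <- item; item; b <- item; return (a, b) }

-- The same parser after desugaring by hand
pDesugared :: Parser (Char, Char)
pDesugared = item >>= \a -> (item >> (item >>= \b -> return (a, b)))
```

Both should yield [(('a','c'),"d")] on "abcd" and [] on any input shorter than three characters.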
Also note that the code
\cs -> case cs of
  "" -> []
  (c:cs) -> [(c,cs)]
is misleading, since it defines the variable cs twice: once in \cs and once in the pattern (c:cs). It is equivalent to
\cs -> case cs of
  "" -> []
  (x:xs) -> [(x,xs)]
This makes it clear that the String in the output is not the original cs, but its tail xs.
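A quick way to convince yourself that the shadowing version already returns the tail is to put both versions side by side. A sketch; itemShadowed and itemRenamed are illustrative names, and only the Parser type and parse from the question are needed.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- As in the question: the pattern's cs shadows the lambda's cs,
-- so on the right-hand side cs already means the tail.
itemShadowed :: Parser Char
itemShadowed = Parser (\cs -> case cs of
  ""     -> []
  (c:cs) -> [(c, cs)])

-- Renamed variables, same behaviour
itemRenamed :: Parser Char
itemRenamed = Parser (\cs -> case cs of
  ""     -> []
  (x:xs) -> [(x, xs)])
```

Both give [('a',"bc")] on "abc", so the first character is consumed in either version.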
In a comment, the poster wondered why the three uses of item do not return the same result, i.e., why in return (a,b) the character a is not equal to b. This is due to the >>= operator, which in this Parser monad automatically feeds the output string xs of each item occurrence to the next one. Indeed, the whole point of this monad is to feed the "leftover" output of each parser in as the "to-be-consumed" input of the next one. This has two advantages: it frees the programmer from writing code to pass this string around, and it ensures that the string is not accidentally "rewound" to a previous state. To illustrate the latter point, here's some wrong code:
let [(c1,s1)] = someParser someInitialString
    [(c2,s2)] = anotherParser1 s1
    [(c3,s3)] = anotherParser2 s2
    [(c4,s4)] = anotherParser3 s3
    [(c5,s5)] = anotherParser4 s2 -- Whoops! Should have been s4
in [c1,c2,c3,c4,c5]
In the last step the string, after having been consumed multiple times, is wrongly rolled back to a previous state, as if the parsers anotherParser2 and anotherParser3 did not consume anything at all. This error is prevented by composing parsers through >>= instead.
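The rewinding mistake can be reproduced with item itself. This sketch threads the leftover strings by hand with a list comprehension (instead of the let above), deliberately rewinds once, and compares both with the monadic version. All three names are hypothetical, and the Functor/Applicative instances are added so current GHC accepts the code.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Instances needed by current GHC (not in the paper)
instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [(f a, cs2) | (f, cs1) <- parse pf cs, (a, cs2) <- parse pa cs1])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
  ""      -> []
  (c:cs') -> [(c, cs')])

-- Threading the leftover string by hand: every step must use
-- the previous step's leftover.
threeByHand :: String -> [(String, String)]
threeByHand s0 =
  [ ([c1, c2, c3], s3)
  | (c1, s1) <- parse item s0
  , (c2, s2) <- parse item s1
  , (c3, s3) <- parse item s2 ]

-- The same kind of slip as above: the last step rewinds to s1 instead of s2.
threeRewound :: String -> [(String, String)]
threeRewound s0 =
  [ ([c1, c2, c3], s3)
  | (c1, s1)  <- parse item s0
  , (c2, _s2) <- parse item s1
  , (c3, s3)  <- parse item s1 ]

-- With >>= the leftover is passed along automatically,
-- so there is no state variable to get wrong.
threeMonadic :: Parser String
threeMonadic = do
  c1 <- item
  c2 <- item
  c3 <- item
  return [c1, c2, c3]
```

On "abcd", threeByHand and threeMonadic both give [("abc","d")], while threeRewound gives [("abb","cd")]: the 'b' is read twice.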
I'll try shedding some more light regarding >>.
As you can see in the other answer, you should desugar the do blocks into >>= to better understand what's going on.
Let's, for example, write a parser that parses two chars and returns them.
twoChars :: Parser (Char,Char)
twoChars = do
  i <- item
  j <- item
  return (i,j)
Now, desugar the do syntax:
twoChars :: Parser (Char,Char)
twoChars =
  item >>= (\i ->
    item >>= (\j ->
      return (i,j) ) )
I added parentheses for clarity. As you can see, the result of the first item parser is passed to the anonymous function, which binds it to i and then runs the second item. The >>= function takes a parser and a function, and returns a parser. The best way to understand it is to plug it into the definition:
f = \i -> item >>= \j -> return (i,j)
twoChars = item >>= f
twoChars = Parser (\cs -> concat [parse (f a) cs' | (a,cs') <- parse item cs])
So we got back a new Parser. Try to imagine what it will do on an input "abc". cs is bound to "abc", and the item Parser is used to get back [('a',"bc")]. Now, we apply f to 'a', to get back the new parser:
item >>= \j -> return ('a',j)
This parser is passed the rest of the string left to process ("bc"), and it uses the item parser to get out the 'b', binding the j in \j above to 'b'. We then get return ('a','b'), a parser that consumes no input and just returns ('a','b'); the remaining "c" is passed through unchanged, so the overall result is [(('a','b'),"c")].
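The walkthrough can be replayed step by step. A sketch, assuming the question's Parser with Functor and Applicative instances added for current GHC; f and twoChars are the names used in this answer.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Instances needed by current GHC (not in the paper)
instance Functor Parser where
  fmap g p = Parser (\cs -> [(g a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a = Parser (\cs -> [(a, cs)])
  pg <*> pa = Parser (\cs -> [(g a, cs2) | (g, cs1) <- parse pg cs, (a, cs2) <- parse pa cs1])

instance Monad Parser where
  p >>= k = Parser (\cs -> concat [parse (k a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
  ""      -> []
  (c:cs') -> [(c, cs')])

-- The continuation: given the first character, parse one more
-- and pair them up.
f :: Char -> Parser (Char, Char)
f = \i -> item >>= \j -> return (i, j)

twoChars :: Parser (Char, Char)
twoChars = item >>= f
```

parse item "abc" gives [('a',"bc")], parse (f 'a') "bc" gives [(('a','b'),"c")], and parse twoChars "abc" gives the same [(('a','b'),"c")].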
I hope this clears up how the information flow happens. Now, suppose that you want to ignore a character. You could do it like this.
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>= \j ->
  item >>= \k ->
  return (i,k)
It's fine that j is bound to 'b' for the example "abc": we never use it. So we can replace j with _.
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>= \_ ->
  item >>= \k ->
  return (i,k)
But we also know that (>>) :: Monad m => m a -> m b -> m b can be defined as:
p >> q = p >>= \_ -> q
So we are left with
twoChars :: Parser (Char,Char)
twoChars =
  item >>= \i ->
  item >>
  item >>= \k ->
  return (i,k)
Finally, you can sugar this back into do notation. The use of >> simply becomes a single-line statement with no binding. The result is:
twoChars :: Parser (Char,Char)
twoChars = do
  i <- item
  item
  k <- item
  return (i,k)
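A sketch to confirm that the three spellings are the same parser (withLambda, withThen and withDo are hypothetical names; the instances are added so current GHC accepts the code). Note that in withThen, >> and >>= are both infixl 1, so item >> item >>= \k -> ... parses as (item >> item) >>= \k -> ..., and k is still bound to the third character.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Instances needed by current GHC (not in the paper)
instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [(f a, cs2) | (f, cs1) <- parse pf cs, (a, cs2) <- parse pa cs1])

-- (>>) is not defined here, so the class default p >>= \_ -> q is used
instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
  ""      -> []
  (c:cs') -> [(c, cs')])

-- Middle result bound to _ and ignored
withLambda :: Parser (Char, Char)
withLambda =
  item >>= \i ->
  item >>= \_ ->
  item >>= \k ->
  return (i, k)

-- Middle result discarded with >>
withThen :: Parser (Char, Char)
withThen =
  item >>= \i ->
  item >>
  item >>= \k ->
  return (i, k)

-- Sugared back into do notation
withDo :: Parser (Char, Char)
withDo = do
  i <- item
  item
  k <- item
  return (i, k)
```

All three give [(('a','c'),"d")] on "abcd".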
Hope this cleared some things up.
The more uniform translation of your
p3 = do { a <- item; item; b <- item; return (a,b)}
-- do { a <- item; z <- item; b <- item; return (a,b)} -- z is ignored
is
p3 = item >>= (\a ->
       item >>= (\z ->
         item >>= (\b ->
           return (a,b)))) -- z is unused
The key observation here is that the functions are nested, which means that
-- parse (return a) cs = [(a,cs)]
-- parse (p >>= f)  cs = [r | (a,cs1) <- parse p cs,      -- concat
--                            r <- parse (f a) cs1]       -- inlined!
parse p3 cs
  = [ r | (a,cs1) <- parse item cs,
          r <- [ r | (z,cs2) <- parse item cs1,
                     r <- [ r | (b,cs3) <- parse item cs2,
                                r <- -- parse (return (a,b)) cs3
                                     [((a,b),cs3)]]]]           -- z is unused
  = [ ((a,b),cs3) | (a,cs1) <- parse item cs,
                    (_,cs2) <- parse item cs1,
                    (b,cs3) <- parse item cs2]
So you see, "the input string" does change: first it's cs, then cs1, then cs2, and the final leftover is cs3.
That is the simple, real computation behind all the Parser tags and do syntax. In the end, it's all just about chaining inputs and outputs in nested loops:
parse p3 cs =
  for each (a,cs1) in (parse item cs):
    for each (z,cs2) in (parse item cs1):
      for each (b,cs3) in (parse item cs2):
        yield ((a,b),cs3)
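The nested loops can be written directly in Haskell as the flattened list comprehension from the derivation, bypassing the Monad instance, and compared with the monadic p3. A sketch; p3Direct is a hypothetical name and the Functor/Applicative instances are added for current GHC.

```haskell
newtype Parser a = Parser (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (Parser p) = p

-- Instances needed by current GHC (not in the paper)
instance Functor Parser where
  fmap f p = Parser (\cs -> [(f a, cs') | (a, cs') <- parse p cs])

instance Applicative Parser where
  pure a = Parser (\cs -> [(a, cs)])
  pf <*> pa = Parser (\cs -> [(f a, cs2) | (f, cs1) <- parse pf cs, (a, cs2) <- parse pa cs1])

instance Monad Parser where
  p >>= f = Parser (\cs -> concat [parse (f a) cs' | (a, cs') <- parse p cs])

item :: Parser Char
item = Parser (\cs -> case cs of
  ""      -> []
  (c:cs') -> [(c, cs')])

-- The nested-lambda version from the answer (z is unused)
p3 :: Parser (Char, Char)
p3 = item >>= (\a -> item >>= (\z -> item >>= (\b -> return (a, b))))

-- The flattened comprehension: one generator per loop,
-- each one reading the previous leftover string.
p3Direct :: String -> [((Char, Char), String)]
p3Direct cs =
  [ ((a, b), cs3)
  | (a, cs1) <- parse item cs
  , (_, cs2) <- parse item cs1
  , (b, cs3) <- parse item cs2 ]
```

On "abcd" both give [(('a','c'),"d")]; on "ab" the innermost loop has nothing to iterate over, so both give [].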