Example of simple parsers in Haskell - parsing

I have the following parser functions:
import Data.Char
type Parser a = String -> [(a,String)]
return' :: a -> Parser a
return' v = \inp -> [(v,inp)]
failure :: Parser a
failure = \inp -> []
item :: Parser Char
item = \inp -> case inp of
  [] -> []
  (x:xs) -> [(x,xs)]
parse :: Parser a -> String -> [(a,String)]
parse p inp = p inp
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case parse p inp of
  [] -> parse q inp
  [(v,out)] -> [(v,out)]
So far so good; they can be loaded into GHCi. But when I add the following function:
sat :: (Char -> Bool) -> Parser Char
sat p = do x <- item
           if p x then return x else failure
I get an error. Could you tell me what's going on?

The type of x in the do block of sat is [(Char, String)]. This is because item has the type String -> [(Char, String)] or (->) String [(Char, String)] and you are using the Monad instance for (->) String, so what is "contained" in a (->) String [(Char, String)] is a [(Char, String)].
{-# LANGUAGE ScopedTypeVariables #-}
sat :: (Char -> Bool) -> Parser Char
sat p = do (x :: [(Char, String)]) <- item
           if p x then return' x else failure
p is a function from Char to Bool; it doesn't accept a list of possible parsing results. The only sensible thing to do with p is to filter the results by whether their Char satisfies p. This still yields [] when no result satisfies p.
sat :: (Char -> Bool) -> Parser Char
sat p = do (x :: [(Char, String)]) <- item
           return $ filter (p . fst) x
If we drop ScopedTypeVariables, we need to drop the type signature I added for illustration.
sat :: (Char -> Bool) -> Parser Char
sat p = do x <- item
           return $ filter (p . fst) x
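Putting the filter-based version together, a minimal self-contained sketch. item and sat are from the question; satDigitExamples is a hypothetical name used only to show results, and isDigit from Data.Char is an arbitrary predicate chosen for illustration:

```haskell
import Data.Char (isDigit)

-- Parser is only a synonym for a function type, so there is no
-- Parser-specific Monad instance; do blocks use the instance for ((->) String).
type Parser a = String -> [(a, String)]

item :: Parser Char
item inp = case inp of
  []     -> []
  (x:xs) -> [(x, xs)]

-- In the function monad, x is bound to item applied to the whole input,
-- so x :: [(Char, String)]; we keep only results whose Char satisfies p.
sat :: (Char -> Bool) -> Parser Char
sat p = do
  x <- item
  return (filter (p . fst) x)

-- Hypothetical helper: sat isDigit applied to a few sample inputs.
satDigitExamples :: [[(Char, String)]]
satDigitExamples = map (sat isDigit) ["1a", "a1", ""]
```

Here satDigitExamples should evaluate to [[('1',"a")],[],[]]: only an input whose first character is a digit yields a result.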

Use of do blocks requires a Monad instance. Since your Parser type is only an alias for the function type String -> [(a, String)], there is no Monad instance specific to Parser: the do block uses the standard Monad instance for functions, ((->) String), in which x is bound to the whole result list rather than to a Char, so the types don't fit and you get the error. You could write your function without using do notation:
sat :: (Char -> Bool) -> Parser Char
sat p [] = []
sat p (c:cs) = if p c then [(c, cs)] else []
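To see concretely why a do block over the type synonym does not behave like a parser, here is a small sketch. itemTwice is a hypothetical name; it relies only on the standard Monad instance for functions. In that monad every <- applies its action to the same original input, so no leftover string is passed along, whereas the hand-written sat needs no monad at all:

```haskell
type Parser a = String -> [(a, String)]

item :: Parser Char
item []     = []
item (x:xs) = [(x, xs)]

-- The hand-written sat from the answer: plain pattern matching, no do.
sat :: (Char -> Bool) -> Parser Char
sat _ []     = []
sat p (c:cs) = if p c then [(c, cs)] else []

-- Function (reader) monad: both binds see the SAME input string;
-- the leftover "b" from the first item is never fed to the second.
itemTwice :: String -> ([(Char, String)], [(Char, String)])
itemTwice = do
  a <- item
  b <- item
  return (a, b)
```

For example, itemTwice "ab" gives ([('a',"b")],[('a',"b")]): both calls to item saw "ab", not "ab" and then "b".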

Related

Combining parsers in Haskell

I'm given the following parsers:
import Control.Applicative (Alternative (..))
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
  fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
  pure a = Parser $ \s -> Just (a,s)
  f <*> a = Parser $ \s ->
    case parse f s of
      Just (g,s') -> parse (fmap g a) s'
      Nothing -> Nothing
instance Alternative Parser where
  empty = Parser $ \s -> Nothing
  l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
  case parse parser s of
    Nothing -> Nothing
    Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
  where f [] = Just (Nothing,[])
        f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
  where f [] = Nothing
        f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only from the given parsers and the Functor/Applicative/Alternative instances above. I'm stuck on this, as I don't have experience combining parsers. Can anyone help me out?
To make it easier to understand, we can write it in equational pseudocode, substituting and simplifying the definitions and using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only they work for any Monad (guards additionally need MonadPlus), not just []; they correspond closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
  parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
  parse (pure a) s = pure (a, s)
  parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
                                     , (a, s'') <- parse pa s' ]
instance Alternative Parser where
  parse empty s = empty
  parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s@(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way, thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo)code will work with a different type, such as [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
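To try the monad-comprehension notation and the Maybe/[] point outside the pseudocode, here is a small sketch. The helper names mapResultComp, mapResultDo and keepIf are hypothetical; they work on raw parse results (m (a, String)) rather than on the Parser newtype, and the code needs GHC's MonadComprehensions extension:

```haskell
{-# LANGUAGE MonadComprehensions #-}

import Control.Monad (MonadPlus)

-- The fmap step written as a monad comprehension; works for any Monad m.
mapResultComp :: Monad m => (a -> b) -> m (a, String) -> m (b, String)
mapResultComp f r = [ (f a, s') | (a, s') <- r ]

-- The same step in do notation, for comparison.
mapResultDo :: Monad m => (a -> b) -> m (a, String) -> m (b, String)
mapResultDo f r = do { (a, s') <- r ; return (f a, s') }

-- The ensure step: a guard needs a way to fail, hence MonadPlus
-- (satisfied by both Maybe and []).
keepIf :: MonadPlus m => (a -> Bool) -> m (a, String) -> m (a, String)
keepIf ok r = [ (a, s') | (a, s') <- r, ok a ]
```

With Maybe, keepIf even (Just (3, "")) is Nothing; with lists, the same keepIf keeps just the entries that pass, e.g. keepIf even [(1,"a"),(2,"b")] is [(2,"b")].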
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll get a useless Nothing result when the input string is in fact empty, whereas the eof parser given to us produces () as its result in that case (and otherwise produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)

Monad Parser - Couldn't match expected type ‘[(b, String)]’ with actual type ‘Parser b’

I'm studying Haskell using "Programming in Haskell" by G. Hutton, and I'm following chapter 13, Monadic parsing.
First, I define a type Parser:
newtype Parser a = P (String -> [(a, String)])
Then, a parse function
parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp
I make Parser a Monad:
instance Monad Parser where
  --return :: a -> Parser a
  return v = P(\inp -> [(v, inp)])
  --(>>=) :: Parser a -> (a -> Parser b) -> Parser b
  p >>= g = P (\inp -> case parse p inp of
                 [] -> []
                 [(v, out)] -> parse (g v) out)
My problem is with the last line of the Monad instance.
Why this
[(v, out)] -> parse (g v) out)
and not this
[(v, out)] -> (g v))
>>= returns a Parser b, not a [(b, String)]. In fact, g v is a Parser.
I know I'm wrong, but I don't understand why.
>>= returns a Parser b, not a [(b, String)]. In fact, g v is a Parser.
That is correct, but we are constructing a Parser with the outer P. Indeed:
p >>= g = P (\inp -> case parse p inp of
               [] -> []
               [(v, out)] -> parse (g v) out)
Notice the P here immediately after the =. The lambda expression \inp -> … thus has to have as type String -> [(b, String)], not Parser. We evaluate the parser with parse, since that acts as a "getter" to get the function out of the g v.
Your implementation of >>= is however not complete. Indeed, this is a backtracking parser, and it is possible that the list contains no elements (no options), one element, or multiple elements. We thus should perform a mapping like:
p >>= g = P (
    \inp -> concatMap (\(v, out) -> parse (g v) out) (parse p inp)
  )
or we can make use of the bind operator >>= defined for lists:
p >>= g = P (
    \inp -> parse p inp >>= \(v, out) -> parse (g v) out
  )
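A runnable sketch of the newtype version with the concatMap-based bind, assuming a recent GHC, where a Monad instance also requires Functor and Applicative instances. Those two instances, firstAndThird and twoWays are additions for illustration, not part of the question:

```haskell
newtype Parser a = P (String -> [(a, String)])

-- "Getter" that takes the function back out of the P wrapper.
parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P (\inp -> case inp of
                    []     -> []
                    (x:xs) -> [(x, xs)])

instance Functor Parser where
  -- apply g to every result, keeping each leftover string
  fmap g p = P (\inp -> [ (g v, out) | (v, out) <- parse p inp ])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  -- run pg, then run px on whatever pg left over, for every combination
  pg <*> px = P (\inp -> [ (g v, out2) | (g, out1) <- parse pg inp
                                       , (v, out2) <- parse px out1 ])

instance Monad Parser where
  -- handles zero, one or many results of p, as in the concatMap version
  p >>= g = P (\inp -> concatMap (\(v, out) -> parse (g v) out) (parse p inp))

-- Hypothetical parser with two results that consumes no input.
twoWays :: Parser Char
twoWays = P (\inp -> [('x', inp), ('y', inp)])

firstAndThird :: Parser (Char, Char)
firstAndThird = do
  x <- item
  _ <- item
  y <- item
  return (x, y)
```

parse firstAndThird "abcdef" should give [(('a','c'),"def")]. Because twoWays produces two results, parse (twoWays >>= \c -> fmap (\d -> [c, d]) item) "ab" gives [("xa","b"),("ya","b")], a case the single-result pattern [(v, out)] would not handle.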

Haskell: Graham Hutton Book Parsing (Ch-8): What does `parse (f v) out` do, and how does it do it?

My question is about Graham Hutton's book Programming in Haskell, 1st edition.
There is a parser created in section 8.4, and I am assuming anyone answering has the book or can see slide 8 of the accompanying slides.
A basic parser called item is described as:
type Parser a = String -> [(a, String)]
item :: Parser Char
item = \inp -> case inp of
  [] -> []
  (x:xs) -> [(x,xs)]
which is used with do to define another parser p (the do parser)
p :: Parser (Char, Char)
p = do x <- item
       item
       y <- item
       return (x,y)
the relevant bind definition is:
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= f = \inp -> case parse p inp of
  [] -> []
  [(v,out)] -> parse (f v) out
return is defined as:
return :: a -> Parser a
return v = \inp -> [(v,inp)]
parse is defined as:
parse :: Parser a -> String -> [(a,String)]
parse p inp = p inp
The program (the do parser) takes a string, selects the 1st and 3rd characters, and returns them in a tuple together with the remainder of the string, e.g., "abcdef" produces [(('a','c'), "def")].
I want to know how the (f v) out in [(v,out)] -> parse (f v) out returns a parser which is then applied to out.
f in the do parser is item and item taking a character 'c' returns [('c',[])]?
How can that be a parser and how can it take out as an argument?
Perhaps I am just not understanding what (f v) does.
Also how does the do parser 'drop' the returned values each time to operate on the rest of the input string when item is called again?
What is the object that works its way through the do parser, and how is it altered at each step, and by what means is it altered?
f v produces a Parser b because f is a function of type a -> Parser b and v is a value of type a. So then you're calling parse with this Parser b and the string out as arguments.
f in the 'do' parser is item
No, it's not. Let's consider a simplified (albeit now somewhat pointless) version of your parser:
p = do x <- item
       return x
This will desugar to:
p = item >>= \x -> return x
So the right operand of >>=, i.e. f, is \x -> return x, not item.
Also how does the 'do' parser 'drop' the returned values each time to operate on the rest of the input string when item is called again? What is the object that works its way through the 'do' parser, and how is it altered at each step, and by what means is it altered?
When you apply a parser it returns a tuple containing the parsed value and a string representing the rest of the input. If you look at item for example, the second element of the tuple will be xs which is the tail of the input string (i.e. a string containing all characters of the input string except the first). This second part of the tuple will be what's fed as the new input to subsequent parsers (as per [(v,out)] -> parse (f v) out), so that way each successive parser will take as input the string that the previous parser produced as the second part of its output tuple (which will be a suffix of its input).
In response to your comments:
When you write "p = item >>= \x -> return x", is that the equivalent of just the first line "p = do x <- item"?
No, it's equivalent to the entire do-block (i.e. do {x <- item; return x}). You can't translate do-blocks line-by-line like that. do { x <- foo; rest } is equivalent to foo >>= \x -> do {rest}, so you'll always have the rest of the do-block as part of the right operand of >>=.
but not how that reduces to simply making 'out' available as the input for the next line. What is parse doing if the next line of the 'do' parser is the item parser?
Let's walk through an example where we invoke item twice (this is like your p, but without the middle item). In the below I'll use === to denote that the expressions above and below the === are equivalent.
do x <- item
   y <- item
   return (x, y)
=== -- Desugaring do
item >>= \x -> item >>= \y -> return (x, y)
=== -- Inserting the definition of >>= for outer >>=
\inp -> case parse item inp of
          [] -> []
          [(v,out)] -> parse (item >>= \y -> return (v, y)) out
Now let's apply this to the input "ab":
case parse item "ab" of
  [] -> []
  [(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== -- Insert definition of parse
case item "ab" of
  [] -> []
  [(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== -- Insert definition of item
case [('a', "b")] of
  [] -> []
  [(v,out)] -> parse (item >>= \y -> return (v, y)) out
=== -- Take the matching alternative, with v = 'a' and out = "b"
parse (item >>= \y -> return ('a', y)) "b"
Now we can expand the second >>= the same way we did the first, and eventually end up with [(('a', 'b'), "")].
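The walkthrough can be checked by running it. Because a type synonym cannot have its own Monad instance, this sketch spells the book's bind as an ordinary function under the hypothetical names bindP and returnP; twoItems is the desugared do block from above:

```haskell
type Parser a = String -> [(a, String)]

parse :: Parser a -> String -> [(a, String)]
parse p inp = p inp

item :: Parser Char
item inp = case inp of
  []     -> []
  (x:xs) -> [(x, xs)]

-- Hypothetical name for the book's return.
returnP :: a -> Parser a
returnP v = \inp -> [(v, inp)]

-- Hypothetical name for the book's bind. Like the original, it only
-- expects zero or one result: the leftover `out` feeds the next parser.
bindP :: Parser a -> (a -> Parser b) -> Parser b
bindP p f = \inp -> case parse p inp of
  []         -> []
  [(v, out)] -> parse (f v) out

-- item >>= \x -> item >>= \y -> return (x, y), written with bindP.
twoItems :: Parser (Char, Char)
twoItems = item `bindP` \x -> item `bindP` \y -> returnP (x, y)
```

With this, twoItems "ab" evaluates to [(('a','b'),"")], matching the hand expansion, and twoItems "a" gives [] because the second item fails.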
The relevant advice is, Don't panic (meaning, don't rush it; or, take it slow), and, Follow the types.
First of all, Parsers
type Parser a = String -> [(a,String)]
are functions from String to lists of pairings of result values of type a and the leftover Strings (because type defines type synonyms, not new types like data or newtype do).
That leftover string will be used as input for the next parsing step. That's the main thing about it here.
You are asking, in
p >>= f = \inp -> case (parse p inp) of
  [] -> []
  [(v,out)] -> parse (f v) out
how the (f v) in [(v,out)] -> parse (f v) out returns a parser which is then applied to out?
The answer is, f's type says that it does so:
(>>=) :: Parser a -> (a -> Parser b) -> Parser b   -- or, the equivalent
(>>=) :: Parser a -> (a -> Parser b) -> (String -> [(b,String)])
--       p           f                   inp
We have f :: a -> Parser b, so that's just what it does: applied to a value of type a it returns a value of type Parser b. Or equivalently,
f :: a -> (String -> [(b,String)]) -- so that
f (v :: a) :: String -> [(b,String)] -- and,
f (v :: a) (out :: String) :: [(b,String)]
So whatever value parse p inp produces must be what f is waiting for in order to proceed. The types must "fit":
p :: Parser a -- m a
f :: a -> Parser b -- a -> m b
f <$> p :: Parser ( Parser b ) -- m ( m b )
f =<< p :: Parser b -- m b
or, equivalently,
p        :: String -> [(a, String)]
--          inp         v  out
f        :: a -> String -> [(b, String)]
--          v    out
p >>= f  :: String -> [(b, String)]   -- a combined Parser
--          inp         v2 out2
So this also answers your second question,
How can that be a parser and how can it take out as an argument?
The real question is, what kind of f is it, that does such a thing? Where does it come from? And that's your fourth question.
And the answer is, your example in do-notation,
p :: Parser (Char, Char)
p = do x <- item
       _ <- item
       y <- item
       return (x,y)
by Monad laws is equivalent to the nested chain
p = do { x <- item
       ; do { _ <- item
            ; do { y <- item
                 ; return (x,y) }}}
which is a syntactic sugar for the nested chain of Parser bind applications,
p :: Parser (Char, Char) -- ~ String -> [((Char,Char), String)]
p = item >>= (\ x ->       -- item :: Parser Char ~ String -> [(Char,String)]
    item >>= (\ _ ->       -- x :: Char
    item >>= (\ y ->       -- y :: Char
    return (x,y) )))
and it is because the functions are nested that the final return has access to both y and x there; and it is precisely the Parser bind that arranges for the output leftover string to be used as input to the next parsing step:
p = item >>= f -- :: String -> [((Char,Char), String)]
  where
  { f x = item >>= f2
      where { f2 _ = item >>= f3
                where { f3 y = return (x,y) }}}
i.e. (under the assumption that inp is a string of length three or longer),
parse p inp                                -- assume that `inp`'s
  = (item >>= f) inp                       -- length is at least 3 NB.
  =
    let [(v, left)] = item inp             -- by the def of >>=
    in
      (f v) left
  =
    let [(v, left)] = item inp
    in
      let x = v                            -- inline the definition of `f`
      in (item >>= f2) left
  =
    let [(v, left)] = item inp
    in let x = v
       in let [(v2, left2)] = item left    -- by the def of >>=, again
          in (f2 v2) left2
  =
    ..........
  =
    let [(x,left1)] = item inp             -- x <- item
        [(_,left2)] = item left1           -- _ <- item
        [(y,left3)] = item left2           -- y <- item
    in
      [((x,y), left3)]
  =
    let (x:left1) = inp                    -- inline the definition
        (_:left2) = left1                  -- of `item`
        (y:left3) = left2
    in
      [((x,y), left3)]
  =
    let (x:_:y:left3) = inp
    in
      [((x,y), left3)]
after a few simplifications.
And this answers your third question.
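As a sanity check on the derivation, the sketch below compares the nested chain of binds with the final simplified form. bindP, returnP and pSimplified are hypothetical names; pSimplified is only meaningful for inputs of at least three characters, matching the assumption in the derivation:

```haskell
type Parser a = String -> [(a, String)]

item :: Parser Char
item []     = []
item (x:xs) = [(x, xs)]

-- Hypothetical names for the book's return and (single-result) bind.
returnP :: a -> Parser a
returnP v inp = [(v, inp)]

bindP :: Parser a -> (a -> Parser b) -> Parser b
bindP q f inp = case q inp of
  []         -> []
  [(v, out)] -> f v out   -- the leftover `out` becomes the next input

-- The nested chain of binds from the answer.
p :: Parser (Char, Char)
p = item `bindP` (\x ->
    item `bindP` (\_ ->
    item `bindP` (\y ->
    returnP (x, y))))

-- The end result of the derivation (assumes at least 3 characters).
pSimplified :: String -> [((Char, Char), String)]
pSimplified inp = let (x:_:y:left3) = inp in [((x, y), left3)]
```

On "abcd", both p and pSimplified give [(('a','c'),"d")]. On "ab", p returns [] while pSimplified would raise a pattern-match failure once its result is inspected, which is why the length assumption is needed.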
I am having similar problems reading the syntax, because it's not what we are used to.
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>= f = \inp -> case parse p inp of
  [] -> []
  [(v,out)] -> parse (f v) out
so for the question:
I want to know how the (f v) out in [(v,out)] -> parse (f v) out returns a parser which is then applied to out.
It does because that's the signature of the 2nd arg, f: (>>=) :: Parser a -> (a -> Parser b) -> Parser b. f takes an a and produces a Parser b, and a Parser b takes a String, which here is out: (f v) out.
But the output of this should not be mixed up with the output of the function we are writing: >>=
We are outputting a parser: (>>=) :: Parser a -> (a -> Parser b) -> Parser b.
The Parser we are outputting has the job of wrapping and chaining the first 2 args.
A parser is a function that takes 1 arg. This is constructed right after the first =, i.e. by returning an (anonymous) function: p >>= f = \inp -> ..., so inp refers to the input string of the Parser we are building.
So what is left is to define what that constructed function should do. NOTE: we are not implementing any of the input parsers, just chaining them together. So the output Parser function should:
1. apply the input parser (p) to its input (inp): p >>= f = \inp -> case parse p inp of
2. take the output of that parse, [(v, out)], where v is the result and out is what remains of the input
3. apply the input function (f :: a -> Parser b) to the parsed result (v)
4. note that (f v) produces a Parser b (a function that takes 1 arg)
5. so apply that output parser to the remainder of the input after the first parser (out)
For me the understanding lies in the use of destructuring and the realization that we are constructing a function that glues together the execution of other functions, simply by considering their interfaces.
Hope that helps ... it helped me to write it :-)

Sequencing operator in Haskell Parser

We are trying to build a Parser, but I can't understand the way the function p works, even though I understand the >>>= operator. How does the p function work?
type Parser a = String -> [(a, String)]
returnb :: a -> Parser a
returnb v = \input -> [(v, input)]
failure :: Parser a
failure = \input -> []
item :: Parser Char
item = \input -> case input of
  [] -> []
  (x:xs) -> [(x, xs)]
parse :: Parser a -> String -> [(a, String)]
parse parser input = parser input
-- Sequencing operator
(>>>=) :: Parser a -> (a -> Parser b) -> Parser b
p >>>= f = \input -> case parse p input of
  [] -> []
  [(v, out)] -> parse (f v) out
p :: Parser (Char, Char)
p = item >>>= \x ->
    item >>>= \z ->
    item >>>= \y ->
    returnb (x,y)
As far as I can see, it reads three characters and gives back the first and the last one as a pair wrapped in a Parser.
If p specifically is the problem, it's working similarly to this pseudo-program, which is completely made up but hopefully illustrates the flow:
p input = let [(x, rest1)] = item input
              [(z, rest2)] = item rest1
              [(y, rest3)] = item rest2
          in
            [((x, y), rest3)]
That is, it parses three characters and produces a pair consisting of the first and the third characters.

Simple Parse in Haskell with do construct

I'm trying to write a simple Parser; all the declarations are listed below, but when I try to compile this module it fails.
I'm following the Haskell lessons suggested by the official site, specifically a video by Dr. Erik Meijer (the lesson on parsers with the "do" construct).
The problem is that I thought the "do" construct was able to "concatenate" outputs from a previous function in a descending way, but the way this p function should work seems like magic to me. What's the right implementation?
-- Generic Parser.
type Parser a = String -> [(a, String)]
-- A simple Parser that captures the first char of the string, puts it in
-- the first position of the couple and then puts the rest of the string into
-- the second place of the couple inside the singleton list.
item :: Parser Char
item = \inp -> case inp of
  [] -> []
  (x:xs) -> [(x, xs)]
-- Simple parser that always fails.
failure :: Parser a
failure = \inp -> []
-- Returns the type of the parser without operating on the input.
return1 :: a -> Parser a
return1 v = \inp -> [(v, inp)]
-- Explicit call to parse.
parse :: Parser a -> String -> [(a, String)]
parse p inp = p inp
-- Some kind of "or" operator that if the first parser fails (returning an empty list) it
-- parses the second parser.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case p inp of
  [] -> parse q inp
  [(v, out)] -> [(v, out)]
-- The function within I'm having troubles.
p :: Parser (Char,Char)
p = do
  x <- item
  item
  y <- item
  return1 (x, y)
[Image: Dr. Meijer's explanation of the do-based parser p]
[Image: how p should work]
Your Parser is just a type synonym for a function. The friendly parsers you've seen in use are all proper types of their own, with Functor and Applicative instances, along with (in most cases) Alternative, Monad, and MonadPlus instances. You probably want something that looks like the following (untested, never compiled) version.
import Control.Monad (ap, liftM, MonadPlus)
import Control.Applicative (Alternative (..))
newtype Parser a = Parser
  { runParser :: String -> [(a, String)] }
instance Functor Parser where
  fmap = liftM
instance Applicative Parser where
  pure v = Parser $ \inp -> [(v, inp)]
  (<*>) = ap
instance Monad Parser where
  -- The next line isn't required for
  -- recent GHC versions
  -- return = pure
  Parser m >>= f = Parser $ \s ->
    [(r, s'') | (x, s') <- m s
              , (r, s'') <- runParser (f x) s']
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser $ \inp -> case runParser p inp of
  [] -> runParser q inp
  [(v, out)] -> [(v, out)]
failure :: Parser a
failure = Parser $ \inp -> []
instance Alternative Parser where
  (<|>) = (+++)
  empty = failure
instance MonadPlus Parser
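A self-contained sketch of that approach in use, with the bind as above plus the question's item and p. The Alternative instance is a variation, not the answer's exact code: it keeps the answer's left-biased choice but returns all results of the first parser instead of pattern-matching on exactly one, so it cannot crash on multiple results:

```haskell
import Control.Monad (ap, liftM, MonadPlus)
import Control.Applicative (Alternative (..))

newtype Parser a = Parser
  { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap = liftM

instance Applicative Parser where
  pure v = Parser $ \inp -> [(v, inp)]
  (<*>) = ap

instance Monad Parser where
  -- run m, then feed each leftover string s' to the parser f x
  Parser m >>= f = Parser $ \s ->
    [(r, s'') | (x, s') <- m s
              , (r, s'') <- runParser (f x) s']

instance Alternative Parser where
  empty = Parser (const [])
  -- left-biased choice: use r only if l produced no results
  l <|> r = Parser $ \inp -> case runParser l inp of
    [] -> runParser r inp
    rs -> rs

instance MonadPlus Parser

-- The question's item and p, now on the newtype.
item :: Parser Char
item = Parser $ \inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)]

p :: Parser (Char, Char)
p = do
  x <- item
  _ <- item
  y <- item
  return (x, y)
```

runParser p "abcdef" should give [(('a','c'),"def")], and runParser (item <|> pure '?') "" gives [('?',"")] because item fails on empty input.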

Resources