Combining parsers in Haskell - parsing

I'm given the following parsers
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
pure a = Parser $ \s -> Just (a,s)
f <*> a = Parser $ \s ->
case parse f s of
Just (g,s') -> parse (fmap g a) s'
Nothing -> Nothing
instance Alternative Parser where
empty = Parser $ \s -> Nothing
l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
case parse parser s of
Nothing -> Nothing
Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
where f [] = Just (Nothing,[])
f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
where f [] = Nothing
f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only using the given parsers and the
Functor/Applicative/Alternative instances above. I'm stuck on this as I don't have experience in combining parsers. Can anyone help me out ?

To understand it easier, we can write it in an equational pseudocode, while we substitute and simplify the definitions, using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only working for any MonadPlus type, not just []; while corresponding closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
parse (pure a) s = pure (a, s)
parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
, (a, s'') <- parse pa s' ]
instance Alternative Parser where
parse empty s = empty
parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s#(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo-)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to use produces the () as its result in such case (and otherwise it produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)

Related

Apply parser n times vs on the first n characters (Haskell)

I am studying parsers in Haskell following definitions from G. Hutton, E. Meijer - Monadic Parsing in Haskell.
data Parser a = Parser { parseWith :: String -> [(a, String)] }
instance Functor Parser where
fmap f (Parser p) = Parser $ \s -> [(f a, rest) | (a, rest) <- p s]
instance Applicative Parser where
pure x = Parser $ \s -> [(x, s)]
(Parser p1) <*> (Parser p2) = Parser $ \s -> [(f x, r2) | (f, r1) <- p1 s, (x, r2) <- p2 r1]
instance Monad Parser where
return = pure
p >>= f = Parser $ \s -> concatMap (\(x, r) -> parseWith (f x) r) $ parseWith p s
instance Alternative Parser where
empty = failure
p1 <|> p2 = Parser $ \s ->
case parseWith p1 s of
[] -> parseWith p2 s
res -> res
Essentially I have a (parsed :: a, remaining :: String) context.
As a simple application, I defined the following ADT to parse:
data Arr = Arr Int [Int] -- len [values]
and a parser that can construct Array values from strings, e.g.:
"5|12345" -> Arr 5 [1,2,3,4,5]
First, in order to parse n such Array values (the string input contains n on the first position), e.g.:
"2 3|123 4|9876 2|55" -> [Arr 3 [1,2,3], Arr 4 [9,8,7,6]]
I can do the following:
arrayParse :: Parser Arr
arrayParse = do
len <- digitParse
vals <- exactly len digitParse
return $ Arr len vals
nArraysParse :: Parser [Arr]
nArraysParse = do
n <- digitParse
exactly n arrayParse
where exactly n p constructs a new parser by applying p n times.
Next, I want to parse a different scheme.
Suppose the first character denotes the length of the sub-string defining the arrays, e.g.:
"9 3|123 4|9876 2|55" -> [Arr 3 [1,2,3], Arr 4 [9,8,7,6]]
Meaning that I have to apply arrayParse on the first n chars (excluding | and whitespace) to get the first 2 arrays:
3|123 -> 4 chars (excluding | and whitespace)
4|9876 -> 5 chars (excluding | and whitespace)
So, it's straightforward to apply a parser n times:
exactly :: Int -> Parser a -> Parser [a]
exactly 0 _ = pure []
exactly n p = do
v <- p -- apply parser p once
v' <- exactly (n-1) p -- apply parser p n-1 times
return (v:v')
but how can I express the intent of applying a parser on the first n characters?
My initial approach was something like this:
foo :: Parser [Arr]
foo = do
n <- digitParse
substring <- consume n
-- what to do with substring?
-- can I apply arrayParse on it?
How should I approach this?
Following #jlwoodwa's advice, I managed to achieve the following:
innerParse :: Parser a -> String -> Parser a
innerParse p s = case parseWith p s of
[(arr, "")] -> return arr
_ -> failure
substringParse :: Parser [Arr]
substringParse = do
n <- digitParse
substring <- consume n
innerParse (zeroOrMore arrayParse) substring
which works for my use-case.

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

Cascading Parsers in Haskell

My parser type is
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
I have two parsers :
1) a = (satisfy isAlpha) that knows how to match the first alpha numeric character in a string.
Running parse a "k345" gives Just ('k',"345")
2) b = many (satisfy isDigit) that knows how to match any number of digits. Running parse b "1234 abc" gives Just ("1234"," abc")
Now I want to combine those two parsers and match a singe alphanumeric character followed by any number of digits.
I tried:
parse (a *> b) "k1234 7" and got Just ("1234"," 7 "). Looks like the 'k' matched by the first parser a is gone from the output. How do I fix this problem ?
Thanks!
For a toy parser, look the following code:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
module Parse where
import Data.Char
import Data.List
newtype Parser a = Parser
{ parse :: String -> Maybe (a, String) }
satisfy :: (Char -> Bool) -> Parser Char
satisfy cond = Parser $ \s ->
case s of
"" -> Nothing
(c:cs) -> if cond c then Just (c, cs) else Nothing
many :: Parser a -> Parser [a]
many p = Parser $ \s ->
case parse p s of
Nothing -> Just ([], s)
Just (c, cs) -> let Just (cc, cs') = parse (many p) cs
in Just (c:cc, cs')
string :: String -> Parser String
string str = Parser $ \s -> if isPrefixOf str s
then Just (str, drop (length str) s)
else Nothing
instance Functor Parser where
fmap f (Parser g) = Parser $ \s ->
case g s of
Nothing -> Nothing
Just (r, remain) -> Just (f r, remain)
instance Applicative Parser where
pure a = Parser $ \s -> Just (a, s)
-- (<*>) :: f (a -> b) -> f a -> f b
(Parser f) <*> (Parser g) = Parser $ \s ->
case f s of
Nothing -> Nothing
Just (ab, remain) -> case g remain of
Nothing -> Nothing
Just (r, remain1) -> Just (ab r, remain1)
instance Semigroup a => Semigroup (Parser a) where
(Parser p1) <> (Parser p2) = Parser $ \s ->
case p1 s of
Nothing -> Nothing
Just (r1, s1) -> case p2 s1 of
Nothing -> Nothing
Just (r2, s2) -> Just (r1 <> r2, s2)
instance (Monoid a, Semigroup (Parser a))=> Monoid (Parser a) where
mempty = Parser $ \s -> Just (mempty, s)
mappend = (<>)
a = satisfy isAlpha
b = many (satisfy isDigit)
λ> parse a "k345"
Just ('k',"345")
λ> parse b "12345 abc"
Just ("12345"," abc")
λ> parse (a *> b) "k1234 7"
Just ("1234"," 7")
λ> parse (string "k" <> b) "k1234 7"
Just ("k1234"," 7")
So maybe you should find some tutorials and try to be familiar with Functor, Applicative, and Monad. See, you can implement the instance of a Monoid for your Parser type, and then you can use (<>) to combine your parsed results together.
It looks like this is working fine :
parse (fmap (:) (satisfy isAlpha) <*> many (satisfy isDigit)) "k1234 7"
And gives back what I wanted
Just ("k1234"," 7")

Simple Parse in Haskell with do construct

I'm trying to write a simple Parser, so all the declarations are listed in the image below, but when I try to compile this module it fails.
I'm following the tutorial provided by this source ->
Haskell lessons suggested by official site and specifically this video by Dr. Erik Meijer (Lesson on Parser with "do" construct).
The problem is that I thought that the "do" construct was able to "concatenate" outputs from a previous function in a descending way, but the way that this p function should work seems to be magic to me. What's the right implementation?
-- Generic Parser.
type Parser a = String -> [(a, String)]
-- A simple Parser that captures the first char of the string, puts it in
-- the first position of the couple and then puts the rest of the string into
-- the second place of the couple inside the singleton list.
item :: Parser Char
item = \inp -> case inp of
[] -> []
(x:xs) -> [(x, xs)]
-- Simple parser that always fails.
failure :: Parser a
failure = \inp -> []
-- Returns the type of the parser without operating on the input.
return1 :: a -> Parser a
return1 v = \inp -> [(v, inp)]
-- Explicit call to parse.
parse :: Parser a -> String -> [(a, String)]
parse p inp = p inp
-- Some kind of "or" operator that if the first parser fails (returning an empty list) it
-- parses the second parser.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = \inp -> case p inp of
[] -> parse q inp
[(v, out)] -> [(v, out)]
-- The function within I'm having troubles.
p :: Parser (Char,Char)
p = do
x <- item
item
y <- item
return1 (x, y)
This is how it's explained by Dr. Meijer:
And this is how it should work:
Your Parser is just a type synonym for a function. The friendly parsers you've seen in use are all proper types of their own, with Functor and Applicative instances, along with (in most cases) Alternative, Monad, and MonadPlus instances. You probably want something that looks like the following (untested, never compiled) version.
import Control.Monad (ap, liftM)
import Control.Applicative (Alternative (..))
newtype Parser a = Parser
{ runParser :: String -> [(a, String)] }
instance Functor Parser where
fmap = liftM
instance Applicative Parser where
pure v = Parser $ \inp -> [(v, inp)]
(<*>) = ap
instance Monad Parser where
-- The next line isn't required for
-- recent GHC versions
-- return = pure
Parser m >>= f = Parser $ \s ->
[(r, s'') | (x, s') <- m s
, (r, s'') <- runParser (f r) s']
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser $ \inp -> case runParser p inp of
[] -> runParser q inp
[(v, out)] -> [(v, out)]
failure :: Parser a
failure = Parser $ \inp -> []
instance Alternative Parser where
(<|>) = (+++)
empty = failure
instance MonadPlus Parser

Removing Left Recursion in a Basic Expression Parser

As an exercise, I'm implementing a parser for an exceedingly simple language defined in Haskell using the following GADT (the real grammar for my project involves many more expressions, but this extract is sufficient for the question):
data Expr a where
I :: Int -> Expr Int
Add :: [Expr Int] -> Expr Int
The parsing functions are as follows:
expr :: Parser (Expr Int)
expr = foldl1 mplus
[ lit
, add
]
lit :: Parser (Expr Int)
lit = I . read <$> some digit
add :: Parser (Expr Int)
add = do
i0 <- expr
is (== '+')
i1 <- expr
is <- many (is (== '+') *> expr)
pure (Add (i0:i1:is))
Due to the left-recursive nature of the expression grammar, when I attempt to parse something as simple as 1+1 using the expr parser, the parser get stuck in an infinite loop.
I've seen examples of how to factor out left recursion across the web using a transformation from something like:
S -> S a | b
Into something like:
S -> b T
T -> a T
But I'm struggling with how to apply this to my parser.
For completeness, here is the code that actually implements the parser:
newtype Parser a = Parser
{ runParser :: String -> [(a, String)]
}
instance Functor Parser where
fmap f (Parser p) = Parser $ \s ->
fmap (\(a, r) -> (f a, r)) (p s)
instance Applicative Parser where
pure a = Parser $ \s -> [(a, s)]
(<*>) (Parser f) (Parser p) = Parser $ \s ->
concat $ fmap (\(f', r) -> fmap (\(a, r') -> (f' a, r')) (p r)) (f >
instance Alternative Parser where
empty = Parser $ \s -> []
(<|>) (Parser a) (Parser b) = Parser $ \s ->
case a s of
(r:rs) -> (r:rs)
[] -> case b s of
(r:rs) -> (r:rs)
[] -> []
instance Monad Parser where
return = pure
(>>=) (Parser a) f = Parser $ \s ->
concat $ fmap (\(r, rs) -> runParser (f r) rs) (a s)
instance MonadPlus Parser where
mzero = empty
mplus (Parser a) (Parser b) = Parser $ \s -> a s ++ b s
char = Parser $ \case (c:cs) -> [(c, cs)]; [] -> []
is p = char >>= \c -> if p c then pure c else empty
digit = is isDigit
Suppose you want to parse non-parenthesized expressions involving literals, addition, and multiplication. You can do this by cutting down the list by precedence. Here's one way to do it in attoparsec, which should be pretty similar to what you'd do with your parser. I'm no parsing expert, so there might be some errors or infelicities.
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
expr :: Parser (Expr Int)
expr = choice [add, mul, lit] <* skipSpace
-- choice is in Data.Attoparsec.Combinators, but is
-- actually a general Alternative operator.
add :: Parser (Expr Int)
add = Add <$> addList
addList :: Parser [Expr Int]
addList = (:) <$> addend <* skipSpace <* char '+' <*> (addList <|> ((:[]) <$> addend))
addend :: Parser (Expr Int)
addend = mul <|> multiplicand
mul :: Parser (Expr Int)
mul = Mul <$> mulList
mulList :: Parser [Expr Int]
mulList = (:) <$> multiplicand <* skipSpace <* char '*' <*> (mulList <|> ((:[]) <$> multiplicand))
multiplicand :: Parser (Expr Int)
multiplicand = lit
lit :: Parser (Expr Int)
lit = I <$> (skipSpace *> decimal)

Resources