Cascading Parsers in Haskell - parsing

My parser type is
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
I have two parsers :
1) a = (satisfy isAlpha) that knows how to match the first alpha numeric character in a string.
Running parse a "k345" gives Just ('k',"345")
2) b = many (satisfy isDigit) that knows how to match any number of digits. Running parse b "1234 abc" gives Just ("1234"," abc")
Now I want to combine those two parsers and match a singe alphanumeric character followed by any number of digits.
I tried:
parse (a *> b) "k1234 7" and got Just ("1234"," 7 "). Looks like the 'k' matched by the first parser a is gone from the output. How do I fix this problem ?
Thanks!

For a toy parser, look the following code:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
module Parse where
import Data.Char
import Data.List
newtype Parser a = Parser
{ parse :: String -> Maybe (a, String) }
satisfy :: (Char -> Bool) -> Parser Char
satisfy cond = Parser $ \s ->
case s of
"" -> Nothing
(c:cs) -> if cond c then Just (c, cs) else Nothing
many :: Parser a -> Parser [a]
many p = Parser $ \s ->
case parse p s of
Nothing -> Just ([], s)
Just (c, cs) -> let Just (cc, cs') = parse (many p) cs
in Just (c:cc, cs')
string :: String -> Parser String
string str = Parser $ \s -> if isPrefixOf str s
then Just (str, drop (length str) s)
else Nothing
instance Functor Parser where
fmap f (Parser g) = Parser $ \s ->
case g s of
Nothing -> Nothing
Just (r, remain) -> Just (f r, remain)
instance Applicative Parser where
pure a = Parser $ \s -> Just (a, s)
-- (<*>) :: f (a -> b) -> f a -> f b
(Parser f) <*> (Parser g) = Parser $ \s ->
case f s of
Nothing -> Nothing
Just (ab, remain) -> case g remain of
Nothing -> Nothing
Just (r, remain1) -> Just (ab r, remain1)
instance Semigroup a => Semigroup (Parser a) where
(Parser p1) <> (Parser p2) = Parser $ \s ->
case p1 s of
Nothing -> Nothing
Just (r1, s1) -> case p2 s1 of
Nothing -> Nothing
Just (r2, s2) -> Just (r1 <> r2, s2)
instance (Monoid a, Semigroup (Parser a))=> Monoid (Parser a) where
mempty = Parser $ \s -> Just (mempty, s)
mappend = (<>)
a = satisfy isAlpha
b = many (satisfy isDigit)
λ> parse a "k345"
Just ('k',"345")
λ> parse b "12345 abc"
Just ("12345"," abc")
λ> parse (a *> b) "k1234 7"
Just ("1234"," 7")
λ> parse (string "k" <> b) "k1234 7"
Just ("k1234"," 7")
So maybe you should find some tutorials and try to be familiar with Functor, Applicative, and Monad. See, you can implement the instance of a Monoid for your Parser type, and then you can use (<>) to combine your parsed results together.

It looks like this is working fine :
parse (fmap (:) (satisfy isAlpha) <*> many (satisfy isDigit)) "k1234 7"
And gives back what I wanted
Just ("k1234"," 7")

Related

Combining parsers in Haskell

I'm given the following parsers
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
pure a = Parser $ \s -> Just (a,s)
f <*> a = Parser $ \s ->
case parse f s of
Just (g,s') -> parse (fmap g a) s'
Nothing -> Nothing
instance Alternative Parser where
empty = Parser $ \s -> Nothing
l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
case parse parser s of
Nothing -> Nothing
Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
where f [] = Just (Nothing,[])
f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
where f [] = Nothing
f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only using the given parsers and the
Functor/Applicative/Alternative instances above. I'm stuck on this as I don't have experience in combining parsers. Can anyone help me out ?
To understand it easier, we can write it in an equational pseudocode, while we substitute and simplify the definitions, using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only working for any MonadPlus type, not just []; while corresponding closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
parse (pure a) s = pure (a, s)
parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
, (a, s'') <- parse pa s' ]
instance Alternative Parser where
parse empty s = empty
parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s#(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo-)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to use produces the () as its result in such case (and otherwise it produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)

Minimal Purely Applicative Parser

I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }
instance Functor Parser where
fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])
instance Applicative Parser where
pure = Parser (\s -> [(a,s)])
(Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])
instance Alternative Parser where
empty = Parser $ \s -> []
p <|> q = Parser $ \s ->
case parse p s of
[] -> parse q s
r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
case s of
[] -> []
(c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
case s of
[] -> []
(c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?
First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type indicates fmap, but we can rule that out. All we can provide is some function a -> b, but there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
case parse p s of -- < item will be used here
[(c, cs)] -> if isDigit c then [(c, cs)] else []
_ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now, there is some kind of sequence implied: item has to succeed before digit can do it's work.
Alternatively, we could have tried this way:
digit' c :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy = Parser $ \s ->
case s of
(c:cs) | p c -> [(c, cs)]
_ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.
Functors allow you to act on somethings values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure. map can not change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values can not change influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content can not change the structure. We can't choose to fail on some content and not other.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).

Purely Applicative Parser using Alternative

In a previous post, a user offered an implementation of a purely applicative parser for Haskell (code originally from here). Below is the partial implementation of that parser:
{-# LANGUAGE Rank2Types #-}
import Control.Applicative (Alternative(..))
import Data.Foldable (asum, traverse_)
The type:
newtype Parser a = Parser {run :: forall f. Alternative f => (Char -> f ()) -> f a}
The instances:
instance Functor Parser where
fmap f (Parser cont) = Parser $ \char -> f <$> cont char
instance Applicative Parser where
pure a = Parser $ \char -> pure a
(Parser contf) <*> (Parser cont) = Parser $ \char -> (contf char) <*> (cont char)
instance Alternative Parser where
empty = Parser $ \char -> empty
(Parser cont) <|> (Parser cont') = Parser $ \char -> (cont char) <|> (cont' char)
some (Parser cont) = Parser $ \char -> some $ cont char
many (Parser cont) = Parser $ \char -> many $ cont char
Some example parsers:
item = Parser $ \char -> asum $ map (\c -> c <$ char c) ['A'..'z']
digit = Parser $ \char -> asum $ map (\c -> c <$ char (head $ show c)) [0..9]
string s = Parser $ \char -> traverse_ char s
Unfortunately, I'm having a hard time trying to understand how I might use this parser implementation. In particular, I do not understand what Char -> f () should/could be and how I could use this to do simple parsing, e.g. to extra a digit out of an input string. I'd like a concrete example if possible. Could someone please shed some light?
In forall f. Alternative f => (Char -> f ()) -> f a, the Char -> f () is something that you are provided with. Your mission, should you choose to accept it, is to then turn that into an f a using only these two bits:
The Char -> f () function (i.e. a way to parse a single character: if the next character matches the argument, the parsing succeeds; otherwise it doesn't.)
The Alternative instance of f
So how would you parse a single digit into an Int? It would have to be of the form
digit :: Parser Int
digit = Parser $ \parseChar -> _
In _, we have to create an f Int using the kit parseChar :: Char -> f () and Alternative f. We know how to parse a single '0' character: parseChar '0' succeds iff the next character is '0'. We can turn it into a value of Int via f's Functor instance, arriving at
digit0 :: Parser Int
digit0 = Parser $ \parseChar -> fmap (const 0) (parseChar '0')
But f is not just Functor, it is also Alternative, so we can write digit in long-form as
digit :: Parser Int
digit = Parser $ \parseChar -> fmap (const 0) (parseChar '0') <|>
fmap (const 1) (parseChar '1') <|>
fmap (const 2) (parseChar '2') <|>
fmap (const 3) (parseChar '3') <|>
fmap (const 4) (parseChar '4') <|>
fmap (const 5) (parseChar '5') <|>
fmap (const 6) (parseChar '6') <|>
fmap (const 7) (parseChar '7') <|>
fmap (const 8) (parseChar '8') <|>
fmap (const 9) (parseChar '9')
And from here, it is merely a matter of pedestrian Haskell programming to cut down on the cruft, arriving at something like
digit :: Parser Int
digit = Parser $ \parseChar -> asum [fmap (const d) (parseChar c) | d <- [0..9], let [c] = show d]
which we can further simplify by noting that fmap (const x) f can be written as x <$ f, giving
digit :: Parser Int
digit = Parser $ \parseChar -> asum [d <$ parseChar c | d <- [0..9], let [c] = show d]
The Char -> f () part represents matching on a single character. Namely, if you do char 'c', it will match on 'c' and fail on everything else.
To use it, you can convert it to, say Parsec:
convert :: Parser a -> Parsec a
convert p = run p anyChar
p is essentially of the type forall f. Alternative f => (Char -> f ()) -> f a, which specializes to (Char -> Parsec ()) -> Parsec a. We pass in anyChar, and it will produce a Parsec a value by using anyChar and any Alternative operations.
Basically, a Parser a it is a function that, given away to match on a single character, and an Alternative instance, it will produce an Alternative value.

Removing Left Recursion in a Basic Expression Parser

As an exercise, I'm implementing a parser for an exceedingly simple language defined in Haskell using the following GADT (the real grammar for my project involves many more expressions, but this extract is sufficient for the question):
data Expr a where
I :: Int -> Expr Int
Add :: [Expr Int] -> Expr Int
The parsing functions are as follows:
expr :: Parser (Expr Int)
expr = foldl1 mplus
[ lit
, add
]
lit :: Parser (Expr Int)
lit = I . read <$> some digit
add :: Parser (Expr Int)
add = do
i0 <- expr
is (== '+')
i1 <- expr
is <- many (is (== '+') *> expr)
pure (Add (i0:i1:is))
Due to the left-recursive nature of the expression grammar, when I attempt to parse something as simple as 1+1 using the expr parser, the parser get stuck in an infinite loop.
I've seen examples of how to factor out left recursion across the web using a transformation from something like:
S -> S a | b
Into something like:
S -> b T
T -> a T
But I'm struggling with how to apply this to my parser.
For completeness, here is the code that actually implements the parser:
newtype Parser a = Parser
{ runParser :: String -> [(a, String)]
}
instance Functor Parser where
fmap f (Parser p) = Parser $ \s ->
fmap (\(a, r) -> (f a, r)) (p s)
instance Applicative Parser where
pure a = Parser $ \s -> [(a, s)]
(<*>) (Parser f) (Parser p) = Parser $ \s ->
concat $ fmap (\(f', r) -> fmap (\(a, r') -> (f' a, r')) (p r)) (f >
instance Alternative Parser where
empty = Parser $ \s -> []
(<|>) (Parser a) (Parser b) = Parser $ \s ->
case a s of
(r:rs) -> (r:rs)
[] -> case b s of
(r:rs) -> (r:rs)
[] -> []
instance Monad Parser where
return = pure
(>>=) (Parser a) f = Parser $ \s ->
concat $ fmap (\(r, rs) -> runParser (f r) rs) (a s)
instance MonadPlus Parser where
mzero = empty
mplus (Parser a) (Parser b) = Parser $ \s -> a s ++ b s
char = Parser $ \case (c:cs) -> [(c, cs)]; [] -> []
is p = char >>= \c -> if p c then pure c else empty
digit = is isDigit
Suppose you want to parse non-parenthesized expressions involving literals, addition, and multiplication. You can do this by cutting down the list by precedence. Here's one way to do it in attoparsec, which should be pretty similar to what you'd do with your parser. I'm no parsing expert, so there might be some errors or infelicities.
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
expr :: Parser (Expr Int)
expr = choice [add, mul, lit] <* skipSpace
-- choice is in Data.Attoparsec.Combinators, but is
-- actually a general Alternative operator.
add :: Parser (Expr Int)
add = Add <$> addList
addList :: Parser [Expr Int]
addList = (:) <$> addend <* skipSpace <* char '+' <*> (addList <|> ((:[]) <$> addend))
addend :: Parser (Expr Int)
addend = mul <|> multiplicand
mul :: Parser (Expr Int)
mul = Mul <$> mulList
mulList :: Parser [Expr Int]
mulList = (:) <$> multiplicand <* skipSpace <* char '*' <*> (mulList <|> ((:[]) <$> multiplicand))
multiplicand :: Parser (Expr Int)
multiplicand = lit
lit :: Parser (Expr Int)
lit = I <$> (skipSpace *> decimal)

Infinite loop when implementing "zero or more" in Haskell Arrow parser

I am learning how to use arrows in Haskell and have implemented the following parser.
All tests work fine except for the last two tests:
test (pZeroOrMore pDigit) "x123abc"
test (pZeroOrMore pDigit) "123abc"
Those tests get stuck in an infinite loop. The question is why? As far as I can see it should work OK?
{-# LANGUAGE Arrows #-}
module Code.ArrowParser where
import Control.Arrow
import Control.Category
import Data.Char
import Prelude hiding (id,(.))
---------------------------------------------------------------------
data Parser a b = Parser { runParser :: (a,String) -> Either (String,String) (b,String) }
---------------------------------------------------------------------
instance Category Parser where
id = Parser Right
(Parser bc) . (Parser ab) = Parser $ \a ->
case ab a of
Left es -> Left es
Right b -> bc b
---------------------------------------------------------------------
instance Arrow Parser where
arr ab = Parser $ \(a,s) -> Right (ab a,s)
first (Parser ab) = Parser $ \((a,c),as) ->
case ab (a,as) of
Left es -> Left es
Right (b,bs) -> Right ((b,c),bs)
---------------------------------------------------------------------
pChar :: Char -> Parser a Char
pChar c =
pMatch (== c) ("'" ++ [c] ++ "' expected")
---------------------------------------------------------------------
pConst :: a -> Parser x a
pConst a = arr (\_ -> a)
---------------------------------------------------------------------
pDigit :: Parser a Int
pDigit =
pMatch isDigit ("Digit expected") >>> arr (\c -> ord c - ord '0')
---------------------------------------------------------------------
pError :: String -> Parser a ()
pError e = Parser $ \(_,s) -> Left (e,s)
---------------------------------------------------------------------
pIf :: Parser a b -> Parser b c -> Parser a c -> Parser a c
pIf (Parser pc) (Parser pt) (Parser pf) = Parser $ \(a,as) ->
case pc (a,as) of
Right (b,bs) -> pt (b,bs)
Left _ -> pf (a,as)
---------------------------------------------------------------------
pMatch :: (Char -> Bool) -> String -> Parser a Char
pMatch f e = Parser $ \(_,s) ->
if s /= [] && f (head s) then
Right (head s,tail s)
else
Left (e, s)
---------------------------------------------------------------------
pMaybe :: (Char -> Maybe b) -> String -> Parser a b
pMaybe f e = Parser $ \(_,s) ->
if s == [] then
Left (e, s)
else
case f (head s) of
Nothing -> Left (e,s)
Just b -> Right (b,tail s)
---------------------------------------------------------------------
pZeroOrMore :: Parser () b -> Parser () [b]
pZeroOrMore p =
pIf p (arr (\b -> [b])) (pConst [])
>>> arr ((,) ())
>>> first (pZeroOrMore p)
>>> arr (\(b1,b0) -> b0 ++ b1)
---------------------------------------------------------------------
test :: Show a => Parser () a -> String -> IO ()
test p s =
print $ runParser p ((),s)
---------------------------------------------------------------------
arMain :: IO ()
arMain = do
test (pChar 'a') "abcdef"
test (pChar 'b') "abcdef"
test pDigit "54321"
test (pIf (pChar 'a') (pChar 'b') (pChar 'c')) "abc"
test (pIf (pChar 'a') (pChar 'b') (pChar 'c')) "bc"
test (pIf (pChar 'a') (pChar 'b') (pChar 'c')) "c"
test (pError "Error!" >>> pChar 'a') "abc"
test (pZeroOrMore pDigit) "x123abc"
test (pZeroOrMore pDigit) "123abc"
There's no stoppage condition on your pZeroOrMore function. The line pIf p (arr (\b -> [b])) (pConst []) is always returning Right ... even if there's no parse. Which means that the recursive call first (pZeroOrMore p) is always executed, thus the infinite loop.

Resources