Removing Left Recursion in a Basic Expression Parser

Removing Left Recursion in a Basic Expression Parser - parsing

As an exercise, I'm implementing a parser for an exceedingly simple language defined in Haskell using the following GADT (the real grammar for my project involves many more expressions, but this extract is sufficient for the question):
data Expr a where
I :: Int -> Expr Int
Add :: [Expr Int] -> Expr Int
The parsing functions are as follows:
expr :: Parser (Expr Int)
expr = foldl1 mplus
[ lit
, add
]
lit :: Parser (Expr Int)
lit = I . read <$> some digit
add :: Parser (Expr Int)
add = do
i0 <- expr
is (== '+')
i1 <- expr
is <- many (is (== '+') *> expr)
pure (Add (i0:i1:is))
Due to the left-recursive nature of the expression grammar, when I attempt to parse something as simple as 1+1 using the expr parser, the parser get stuck in an infinite loop.
I've seen examples of how to factor out left recursion across the web using a transformation from something like:
S -> S a | b
Into something like:
S -> b T
T -> a T
But I'm struggling with how to apply this to my parser.
For completeness, here is the code that actually implements the parser:
newtype Parser a = Parser
{ runParser :: String -> [(a, String)]
}
instance Functor Parser where
fmap f (Parser p) = Parser $ \s ->
fmap (\(a, r) -> (f a, r)) (p s)
instance Applicative Parser where
pure a = Parser $ \s -> [(a, s)]
(<*>) (Parser f) (Parser p) = Parser $ \s ->
concat $ fmap (\(f', r) -> fmap (\(a, r') -> (f' a, r')) (p r)) (f >
instance Alternative Parser where
empty = Parser $ \s -> []
(<|>) (Parser a) (Parser b) = Parser $ \s ->
case a s of
(r:rs) -> (r:rs)
[] -> case b s of
(r:rs) -> (r:rs)
[] -> []
instance Monad Parser where
return = pure
(>>=) (Parser a) f = Parser $ \s ->
concat $ fmap (\(r, rs) -> runParser (f r) rs) (a s)
instance MonadPlus Parser where
mzero = empty
mplus (Parser a) (Parser b) = Parser $ \s -> a s ++ b s
char = Parser $ \case (c:cs) -> [(c, cs)]; [] -> []
is p = char >>= \c -> if p c then pure c else empty
digit = is isDigit

Suppose you want to parse non-parenthesized expressions involving literals, addition, and multiplication. You can do this by cutting down the list by precedence. Here's one way to do it in attoparsec, which should be pretty similar to what you'd do with your parser. I'm no parsing expert, so there might be some errors or infelicities.
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
expr :: Parser (Expr Int)
expr = choice [add, mul, lit] <* skipSpace
-- choice is in Data.Attoparsec.Combinators, but is
-- actually a general Alternative operator.
add :: Parser (Expr Int)
add = Add <$> addList
addList :: Parser [Expr Int]
addList = (:) <$> addend <* skipSpace <* char '+' <*> (addList <|> ((:[]) <$> addend))
addend :: Parser (Expr Int)
addend = mul <|> multiplicand
mul :: Parser (Expr Int)
mul = Mul <$> mulList
mulList :: Parser [Expr Int]
mulList = (:) <$> multiplicand <* skipSpace <* char '*' <*> (mulList <|> ((:[]) <$> multiplicand))
multiplicand :: Parser (Expr Int)
multiplicand = lit
lit :: Parser (Expr Int)
lit = I <$> (skipSpace *> decimal)

Related

How to parse a bool expression in Haskell

I am trying to parse a bool expression in Haskell. This line is giving me an error: BoolExpr <$> parseBoolOp <*> (n : ns). This is the error:
• Couldn't match type ‘[]’ with ‘Parser’
Expected type: Parser [Expr]
Actual type: [Expr]
-- define the expression types
data Expr
= BoolExpr BoolOp [Expr]
deriving (Show, Eq)
-- define the type for bool value
data Value
= BoolVal Bool
deriving (Show, Eq)
-- many x = Parser.some x <|> pure []
-- some x = (:) <$> x <*> Parser.many x
kstar :: Alternative f => f a -> f [a]
kstar x = kplus x <|> pure []
kplus :: Alternative f => f a -> f [a]
kplus x = (:) <$> x <*> kstar x
symbol :: String -> Parser String
symbol xs = token (string xs)
-- a bool expression is the operator followed by one or more expressions that we have to parse
-- TODO: add bool expressions
boolExpr :: Parser Expr
boolExpr = do
n <- parseExpr
ns <- kstar (symbol "," >> parseExpr)
BoolExpr <$> parseBoolOp <*> (n : ns)
-- an atom is a literalExpr, which can be an actual literal or some other things
parseAtom :: Parser Expr
parseAtom =
do
literalExpr
-- the main parsing function which alternates between all the options you have
parseExpr :: Parser Expr
parseExpr =
do
parseAtom
<|> parseParens boolExpr
<|> parseParens parseExpr
-- implement parsing bool operations, these are 'and' and 'or'
parseBoolOp :: Parser BoolOp
parseBoolOp =
do symbol "and" >> return And
<|> do symbol "or" >> return Or
The boolExpr is expecting a Parser [Expr] but I am returning only an [Expr]. Is there a way to fix this or do it in another way? When I try pure (n:ns), evalStr "(and true (and false true) true)" returns Left (ParseError "'a' didn't match expected character") instead of Right (BoolVal False)

The expression (n : ns) is a list. Therefore the compiler thinks that the applicative operators <*> and <$> should be used in the context [], while you want Parser instead.
I would guess you need pure (n : ns) instead.

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)

The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

Combining parsers in Haskell

I'm given the following parsers
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
fmap f p = Parser $ \s -> (\(a,c) -> (f a, c)) <$> parse p s
instance Applicative Parser where
pure a = Parser $ \s -> Just (a,s)
f <*> a = Parser $ \s ->
case parse f s of
Just (g,s') -> parse (fmap g a) s'
Nothing -> Nothing
instance Alternative Parser where
empty = Parser $ \s -> Nothing
l <|> r = Parser $ \s -> parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
ensure p parser = Parser $ \s ->
case parse parser s of
Nothing -> Nothing
Just (a,s') -> if p a then Just (a,s') else Nothing
lookahead :: Parser (Maybe Char)
lookahead = Parser f
where f [] = Just (Nothing,[])
f (c:s) = Just (Just c,c:s)
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser f
where f [] = Nothing
f (x:xs) = if p x then Just (x,xs) else Nothing
eof :: Parser ()
eof = Parser $ \s -> if null s then Just ((),[]) else Nothing
eof' :: Parser ()
eof' = ???
I need to write a new parser eof' that does exactly what eof does but is built only using the given parsers and the
Functor/Applicative/Alternative instances above. I'm stuck on this as I don't have experience in combining parsers. Can anyone help me out ?

To understand it easier, we can write it in an equational pseudocode, while we substitute and simplify the definitions, using Monad Comprehensions for clarity and succinctness.
Monad Comprehensions are just like List Comprehensions, only working for any MonadPlus type, not just []; while corresponding closely to do notation, e.g. [ (f a, s') | (a, s') <- parse p s ] === do { (a, s') <- parse p s ; return (f a, s') }.
This gets us:
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
instance Functor Parser where
parse (fmap f p) s = [ (f a, s') | (a, s') <- parse p s ]
instance Applicative Parser where
parse (pure a) s = pure (a, s)
parse (pf <*> pa) s = [ (g a, s'') | (g, s') <- parse pf s
, (a, s'') <- parse pa s' ]
instance Alternative Parser where
parse empty s = empty
parse (l <|> r) s = parse l s <|> parse r s
ensure :: (a -> Bool) -> Parser a -> Parser a
parse (ensure pred p) s = [ (a, s') | (a, s') <- parse p s, pred a ]
lookahead :: Parser (Maybe Char)
parse lookahead [] = pure (Nothing, [])
parse lookahead s#(c:_) = pure (Just c, s )
satisfy :: (Char -> Bool) -> Parser Char
parse (satisfy p) [] = mzero
parse (satisfy p) (x:xs) = [ (x, xs) | p x ]
eof :: Parser ()
parse eof s = [ ((), []) | null s ]
eof' :: Parser ()
eof' = ???
By the way thanks to the use of Monad Comprehensions and the more abstract pure, empty and mzero instead of their concrete representations in terms of the Maybe type, this same (pseudo-)code will work with a different type, like [] in place of Maybe, viz. newtype Parser a = Parser { parse :: String -> [(a,String)] }.
So we have
ensure :: (a -> Bool) -> Parser a -> Parser a
lookahead :: Parser (Maybe Char)
(satisfy is no good for us here .... why?)
Using that, we can have
ensure ....... ...... :: Parser (Maybe Char)
(... what does ensure id (pure False) do? ...)
but we'll have a useless Nothing result in case the input string was in fact empty, whereas the eof parser given to use produces the () as its result in such case (and otherwise it produces nothing).
No fear, we also have
fmap :: ( a -> b ) -> Parser a -> Parser b
which can transform the Nothing into () for us. We'll need a function that will always do this for us,
alwaysUnit nothing = ()
which we can use now to arrive at the solution:
eof' = fmap ..... (..... ..... ......)

Cascading Parsers in Haskell

My parser type is
newtype Parser a = Parser { parse :: String -> Maybe (a,String) }
I have two parsers :
1) a = (satisfy isAlpha) that knows how to match the first alpha numeric character in a string.
Running parse a "k345" gives Just ('k',"345")
2) b = many (satisfy isDigit) that knows how to match any number of digits. Running parse b "1234 abc" gives Just ("1234"," abc")
Now I want to combine those two parsers and match a singe alphanumeric character followed by any number of digits.
I tried:
parse (a *> b) "k1234 7" and got Just ("1234"," 7 "). Looks like the 'k' matched by the first parser a is gone from the output. How do I fix this problem ?
Thanks!

For a toy parser, look the following code:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}
module Parse where
import Data.Char
import Data.List
newtype Parser a = Parser
{ parse :: String -> Maybe (a, String) }
satisfy :: (Char -> Bool) -> Parser Char
satisfy cond = Parser $ \s ->
case s of
"" -> Nothing
(c:cs) -> if cond c then Just (c, cs) else Nothing
many :: Parser a -> Parser [a]
many p = Parser $ \s ->
case parse p s of
Nothing -> Just ([], s)
Just (c, cs) -> let Just (cc, cs') = parse (many p) cs
in Just (c:cc, cs')
string :: String -> Parser String
string str = Parser $ \s -> if isPrefixOf str s
then Just (str, drop (length str) s)
else Nothing
instance Functor Parser where
fmap f (Parser g) = Parser $ \s ->
case g s of
Nothing -> Nothing
Just (r, remain) -> Just (f r, remain)
instance Applicative Parser where
pure a = Parser $ \s -> Just (a, s)
-- (<*>) :: f (a -> b) -> f a -> f b
(Parser f) <*> (Parser g) = Parser $ \s ->
case f s of
Nothing -> Nothing
Just (ab, remain) -> case g remain of
Nothing -> Nothing
Just (r, remain1) -> Just (ab r, remain1)
instance Semigroup a => Semigroup (Parser a) where
(Parser p1) <> (Parser p2) = Parser $ \s ->
case p1 s of
Nothing -> Nothing
Just (r1, s1) -> case p2 s1 of
Nothing -> Nothing
Just (r2, s2) -> Just (r1 <> r2, s2)
instance (Monoid a, Semigroup (Parser a))=> Monoid (Parser a) where
mempty = Parser $ \s -> Just (mempty, s)
mappend = (<>)
a = satisfy isAlpha
b = many (satisfy isDigit)
λ> parse a "k345"
Just ('k',"345")
λ> parse b "12345 abc"
Just ("12345"," abc")
λ> parse (a *> b) "k1234 7"
Just ("1234"," 7")
λ> parse (string "k" <> b) "k1234 7"
Just ("k1234"," 7")
So maybe you should find some tutorials and try to be familiar with Functor, Applicative, and Monad. See, you can implement the instance of a Monoid for your Parser type, and then you can use (<>) to combine your parsed results together.

It looks like this is working fine :
parse (fmap (:) (satisfy isAlpha) <*> many (satisfy isDigit)) "k1234 7"
And gives back what I wanted
Just ("k1234"," 7")

Purely Applicative Parser using Alternative

In a previous post, a user offered an implementation of a purely applicative parser for Haskell (code originally from here). Below is the partial implementation of that parser:
{-# LANGUAGE Rank2Types #-}
import Control.Applicative (Alternative(..))
import Data.Foldable (asum, traverse_)
The type:
newtype Parser a = Parser {run :: forall f. Alternative f => (Char -> f ()) -> f a}
The instances:
instance Functor Parser where
fmap f (Parser cont) = Parser $ \char -> f <$> cont char
instance Applicative Parser where
pure a = Parser $ \char -> pure a
(Parser contf) <*> (Parser cont) = Parser $ \char -> (contf char) <*> (cont char)
instance Alternative Parser where
empty = Parser $ \char -> empty
(Parser cont) <|> (Parser cont') = Parser $ \char -> (cont char) <|> (cont' char)
some (Parser cont) = Parser $ \char -> some $ cont char
many (Parser cont) = Parser $ \char -> many $ cont char
Some example parsers:
item = Parser $ \char -> asum $ map (\c -> c <$ char c) ['A'..'z']
digit = Parser $ \char -> asum $ map (\c -> c <$ char (head $ show c)) [0..9]
string s = Parser $ \char -> traverse_ char s
Unfortunately, I'm having a hard time trying to understand how I might use this parser implementation. In particular, I do not understand what Char -> f () should/could be and how I could use this to do simple parsing, e.g. to extra a digit out of an input string. I'd like a concrete example if possible. Could someone please shed some light?

In forall f. Alternative f => (Char -> f ()) -> f a, the Char -> f () is something that you are provided with. Your mission, should you choose to accept it, is to then turn that into an f a using only these two bits:
The Char -> f () function (i.e. a way to parse a single character: if the next character matches the argument, the parsing succeeds; otherwise it doesn't.)
The Alternative instance of f
So how would you parse a single digit into an Int? It would have to be of the form
digit :: Parser Int
digit = Parser $ \parseChar -> _
In _, we have to create an f Int using the kit parseChar :: Char -> f () and Alternative f. We know how to parse a single '0' character: parseChar '0' succeds iff the next character is '0'. We can turn it into a value of Int via f's Functor instance, arriving at
digit0 :: Parser Int
digit0 = Parser $ \parseChar -> fmap (const 0) (parseChar '0')
But f is not just Functor, it is also Alternative, so we can write digit in long-form as
digit :: Parser Int
digit = Parser $ \parseChar -> fmap (const 0) (parseChar '0') <|>
fmap (const 1) (parseChar '1') <|>
fmap (const 2) (parseChar '2') <|>
fmap (const 3) (parseChar '3') <|>
fmap (const 4) (parseChar '4') <|>
fmap (const 5) (parseChar '5') <|>
fmap (const 6) (parseChar '6') <|>
fmap (const 7) (parseChar '7') <|>
fmap (const 8) (parseChar '8') <|>
fmap (const 9) (parseChar '9')
And from here, it is merely a matter of pedestrian Haskell programming to cut down on the cruft, arriving at something like
digit :: Parser Int
digit = Parser $ \parseChar -> asum [fmap (const d) (parseChar c) | d <- [0..9], let [c] = show d]
which we can further simplify by noting that fmap (const x) f can be written as x <$ f, giving
digit :: Parser Int
digit = Parser $ \parseChar -> asum [d <$ parseChar c | d <- [0..9], let [c] = show d]

The Char -> f () part represents matching on a single character. Namely, if you do char 'c', it will match on 'c' and fail on everything else.
To use it, you can convert it to, say Parsec:
convert :: Parser a -> Parsec a
convert p = run p anyChar
p is essentially of the type forall f. Alternative f => (Char -> f ()) -> f a, which specializes to (Char -> Parsec ()) -> Parsec a. We pass in anyChar, and it will produce a Parsec a value by using anyChar and any Alternative operations.
Basically, a Parser a it is a function that, given away to match on a single character, and an Alternative instance, it will produce an Alternative value.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Removing Left Recursion in a Basic Expression Parser - parsing

Related

How to parse a bool expression in Haskell

Haskell : Operator Parser keeps going to undefined rather than inputs

Combining parsers in Haskell

Cascading Parsers in Haskell

Purely Applicative Parser using Alternative

Categories

Resources