haskell recursive descent parser - parsing

How can i create a recursive decent parser without the use of parsec or any library, for this grammar?
The output should be an error message if the string does not belong in this grammar?
parse::String -> AST
Re -> Sq | Sq + Re
Sq -> Ba | Ba Sq
Ba -> El | Ba*
El -> lower-or-digit | (Re)
lower-or-digit are just lowercase letters or digits

First, you need to define your abstract syntax tree, likely as some declared data types. Then you want to define your basic parsing action. For instance,
type ParseResult = Either String AST
type ParseState = (ParseResult, String)
Your parse action is straightforward:
re, sq, ba, el :: ParseState -> ParseState
where re is the top level parser action.
The concrete parsing step might look like this:
el (_, ('(':restOfInput)) = case re (Right restOfInput) of
err#(Left error, s) -> err
(result, ')':s) -> (El result, s)
(_, s) -> (Left "no closing parens", s)
el (_, input#(c:restOfInput)) = if lowerOrDigit c
then (El c, restOfInput)
else (Left "bad character", "")
Where a parsing library buys you a lot of traction is in handling all of the parsing state and propagating errors up the call stack.

Related

How do I generalize a Parser which extracts values from tokens?

I'm creating a parser in Haskell for a simple language. The parser takes a list of tokens, generated by a separate tokenizer, and turns it into a structured tree. While I was writing this Parser, I came across a reoccurring pattern that I would like to generalize. Here are the relevant definitions:
--Parser Definition
data Parser a b = Parser { runParser :: [a] -> Maybe ([a], b) }
--Token Definition
data Token = TokOp { getTokOp :: Operator }
| TokInt { getTokInt :: Int }
| TokIdent { getTokIdent :: String }
| TokAssign
| TokLParen
| TokRParen
deriving (Eq, Show)
The parser also has instances defined for MonadPlus and all of its super classes. Here are two examples of the reoccurring Pattern I am trying to generalize:
-- Identifier Parser
identP :: Parser Token String
identP = Parser $ \input -> case input of
TokIdent s : rest -> Just (rest, s)
_ -> Nothing
--Integer Parser
intP :: Parser Token Int
intP = Parser $ \input -> case input of
TokInt n : rest -> Just (rest, n)
_ -> Nothing
As you can see, these two examples are incredibly similar, yet I see no way generalize it. Ideally I would want some function of type extractToken :: ?? -> Parser Token a where a is the value contained by the token. I am guessing the solution involves >>=, but I am not experienced enough with Monads to figure it out. This also may be impossible, I am not sure.
It seems difficult to avoid at least some boilerplate. One simple approach is to manually define the field selectors to return Maybes:
{-# LANGUAGE LambdaCase #-}
data Token = TokOp Operator
| TokInt Int
| TokIdent String
| TokAssign
| TokLParen
| TokRParen
deriving (Eq, Show)
getTokOp = \case { TokOp x -> Just x ; _ -> Nothing }
getTokInt = \case { TokInt x -> Just x ; _ -> Nothing }
getTokIdent = \case { TokIdent x -> Just x ; _ -> Nothing }
and then the rest is:
fieldP :: (Token -> Maybe a) -> Parser Token a
fieldP sel = Parser $ \case tok:rest -> (,) rest <$> sel tok
[] -> Nothing
opP = fieldP getTokOp
identP = fieldP getTokIdent
intP = fieldP getTokInt
You could derive the getXXX selectors using Template Haskell or generics, though it's likely not worth it.

Parsing a Complex Number from Scratch

Given a data type data CI = CI Int Int, representing a complex number, I want to build a parser for CI that can convert "a" to CI a 0 and "(a,b)" to CI a b. For example, I want a function parseCI such runParser parseCI "(1,2)" returns the value [(CI 1 2, "")] (ideally, but something similar is fine). I also want to make CI an instance of read.
I would like to do this using functions and definitions from the code below (basically, without anything advanced, like Parsec), but I'm not sure where to start. Some starting code to set me on the right track and/or a hint would be helpful. I'm not looking for a full answer, as I'd like to figure that out myself.
module Parser where
import Control.Applicative
import Control.Monad
newtype Parser a = Parser { runParser :: String -> [(a,String)] }
satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser $ \s -> case s of
[] -> []
a:as -> [(a,as) | f a]
char :: Char -> Parser Char
char = satisfy . (==)
string :: String -> Parser String
string str = Parser $ \s -> [(t,u) | let (t,u) = splitAt (length str) s, str == t]
instance Functor Parser where
fmap f p = Parser $ \s ->
[ (f a,t)
| (a,t) <- runParser p s
]
instance Applicative Parser where
pure a = Parser $ \s -> [(a,s)]
af <*> aa = Parser $ \s ->
[ (f a,u)
| (f,t) <- runParser af s
, (a,u) <- runParser aa t
]
instance Alternative Parser where
empty = Parser $ \s -> []
p1 <|> p2 = Parser $ (++) <$> runParser p1 <*> runParser p2`
instance Monad Parser where
return = pure
ma >>= f = Parser $ \s ->
[ (b,u)
| (a,t) <- runParser ma s
, (b,u) <- runParser (f a) t
]
instance MonadPlus Parser where
mzero = empty
mplus = (<|>)
You've probably already seen it, but in case you haven't: Monadic Parsing in Haskell sets up parsing like this.
Since you have two different ways of parsing CI, you might want to approach this as two problems: make one parser parseCI1 that parses "a" to CI a 0 and make another parser parseCI2 that parses "(a,b)" to CI a b. Then, you can combine these into one with
parseCI = parseCI1 <|> parseCI2
For both of these subparsers, you will need some way of parsing integers: parseInt :: Parser Int. When making parseInt, you will likely want to use some combination of satisfy, isDigit, read, and possibly some (depending on how you go about solving this).
Making CI an instance of read is a bit more straightforward once you have parseCI done:
instance Read CI where
readsPrec _ = runParser parseCI

parsec produce strange errors when trying to handle Maybe as ParseError

If I have this code :
import Text.Parsec
ispositive a = if (a<0) then Nothing else (Just a)
f a b = a+b
parserfrommaybe :: String -> (Maybe c) -> Parsec a b c
parserfrommaybe msg Nothing = fail msg
parserfrommaybe _ (Just res) = return res
(<!>) :: Parsec a b (Maybe c) -> String -> Parsec a b c
(<!>) p1 msg = p1 >>= (parserfrommaybe msg)
integermaybeparser = (ispositive <$> integer) <!> "negative numbers are not allowed"
testparser = f <$> (integermaybeparser <* whiteSpace) <*> integermaybeparser
when I test testparser with input like this "-1 3" it gives :
Left (line 1, column 4):
unexpected "3"
negative numbers are not allowed
I expected it to give error on Column 1 and give the error message without the sentence "unexpected 3" but it seems parsec continued parsing.
Why did this happen ? and how to make parsec give the error message I expect ?
I have found the solution, the cause of is that the first parser gets run and consumes input even when failing.
The solution was to use lookAhead like this:
(<!>) :: (Monad m,Stream a m t) => ParsecT a b m (Maybe c) -> String -> ParsecT a b m c
(<!>) p1 msg = ((lookAhead p1) >>= (parserfrommaybe msg)) *> (p1 >>= (parserfrommaybe msg))
if lookAhead p1 returns Nothing then the first argument of *> would fail without consuming input because of lookAhead, now if lookAhead p1 returns Just res then it would succeed again without consuming input and the result would be obtained from the second argument of *>.
ofcourse I had to change parserfrommaybe type annotation to (Monad m) => String -> (Maybe c) -> ParsecT a b m c to satisfy ghc.

Minimal Purely Applicative Parser

I'm trying to figure out how to build a "purely applicative parser" based on a simple parser implementation. The parser would not use monads in its implementation. I asked this question previously but mis-framed it so I'm trying again.
Here is the basic type and its Functor, Applicative and Alternative implementations:
newtype Parser a = Parser { parse :: String -> [(a,String)] }
instance Functor Parser where
fmap f (Parser cs) = Parser (\s -> [(f a, b) | (a, b) <- cs s])
instance Applicative Parser where
pure = Parser (\s -> [(a,s)])
(Parser cs1) <*> (Parser cs2) = Parser (\s -> [(f a, s2) | (f, s1) <- cs1 s, (a, s2) <- cs2 s1])
instance Alternative Parser where
empty = Parser $ \s -> []
p <|> q = Parser $ \s ->
case parse p s of
[] -> parse q s
r -> r
The item function takes a character off the stream:
item :: Parser Char
item = Parser $ \s ->
case s of
[] -> []
(c:cs) -> [(c,cs)]
At this point, I want to implement digit. I can of course do this:
digit = Parser $ \s ->
case s of
[] -> []
(c:cs) -> if isDigit c then [(c, cs)] else []
but I'm replicating the code of item. I'd like to implement digit based on item.
How do I go about implementing digit, using item to take a character off the stream and then checking to see if the character is a digit without bringing monadic concepts into the implementation?
First, let us write down all the tools we currently have at hand:
-- Data constructor
Parser :: (String -> [(a, String)]) -> Parser a
-- field accessor
parse :: Parser a -> String -> [(a, String)]
-- instances, replace 'f' by 'Parser'
fmap :: Functor f => (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
pure :: Applicative f => a -> f a
-- the parser at hand
item :: Parser Char
-- the parser we want to write with item
digit :: Parser Char
digit = magic item
-- ?
magic :: Parser Char -> Parser Char
The real question at hand is "what is magic"? There are only so many things we can use. Its type indicates fmap, but we can rule that out. All we can provide is some function a -> b, but there is no f :: Char -> Char that makes fmap f indicate a failure.
What about (<*>), can this help? Well, again, the answer is no. The only thing we can do here is to take the (a -> b) out of the context and apply it; whatever that means in the context of the given Applicative. We can rule pure out.
The problem is that we need to check the Char that item might parse and change the context. We need something like Char -> Parser Char
But we didn't rule Parser or parse out!
magic p = Parser $ \s ->
case parse p s of -- < item will be used here
[(c, cs)] -> if isDigit c then [(c, cs)] else []
_ -> []
Yes, I know, it's duplicate code, but now it's using item. It's using item before inspecting the character. That's the only way we can use item here. And now, there is some kind of sequence implied: item has to succeed before digit can do it's work.
Alternatively, we could have tried this way:
digit' c :: Char -> Parser Char
digit' c = if isDigit c then pure c else empty
But then fmap digit' item would have the type Parser (Parser Char), which can only get collapsed with a join-like function. That's why monads are more powerful than applicative.
That being said, you can get around all of the monad requirements if you use a more general function first:
satisfy :: (Char -> Bool) -> Parser Char
satisfy = Parser $ \s ->
case s of
(c:cs) | p c -> [(c, cs)]
_ -> []
You can then define both item and digit in terms of satisfy:
item = satisfy (const True)
digit = satisfy isDigit
That way digit does not have to inspect the result of a previous parser.
Functors allow you to act on somethings values. For example, if you have a list [1,2,3], you can change the contents. Note that Functors do not allow changing structure. map can not change the length of a list.
Applicatives allow you to combine structure, and the content is mushed together somehow. But the values can not change influence the structure.
Namely, given an item, we can change its structure, and we can change its content, but the content can not change the structure. We can't choose to fail on some content and not other.
If anyone knows how to state this more formally and provably, I'm all ears (it probably has to do with free theorems).

uu-parsinglib parsing with conditional fail

EDITED for more complete problem:
I'd like to create a parser (I'm using uu-parsinglib) that takes the result of a previous parser, and conditionally fails if the result contains a certain constructor:
I now realise this must be a monadic parser.
I have a grammar which contains non-direct left recursive. Below illustrates the problem, the reality is slightly more convoluted:
data Field =
Field_A A
Field_B B
Field_C C
Field_D String
data A =
A Prefix String
data B =
B Prefix String
data C =
C Prefix String
data Prefix =
Prefix Field
Most of the time I'm only interested in Field, and in the interests of minimising backtracking, its best to focus on that case.
I've defined an operator to help
(<..>) :: IsParser p => p (a -> b) -> p (b -> c) -> p (a -> c)
g <..> f = (.) <$> f <*> g
And I approach the problem as:
pField :: Parser Field
pField =
( Field_D <$> pString ) <??>
pChainl' ( pReturn (helper) <*> pPrefix' ) ( pA' <<|> pB' <<|> pC' )
where pChainl' :: IsParser p => p (f -> (pre -> f) -> f) ->
p (pre -> f) ->
p (f -> f)
pChainl' op x = must_be_non_empties "pChainl'" op x (
flip f <$> pList1 (flip <$> op <*> x)
)
f x [] = x
f x (func:rest) = f (func x) rest
helper :: (Field -> Prefix) ->
Field ->
(Prefix -> Field) ->
Field
helper p i n = n $ p i
Note I've defined a variant of pChainl that allows the initial field to be passed in, whilst keeping left association.
pA' :: Parser (Prefix -> Field)
pA' = ( (flip A) <$> pString ) <..> pReturn Field_A
pB' :: Parser (Prefix -> Field)
pB' = ( (flip B) <$> pString ) <..> pReturn Field_B
pC' :: Parser (Prefix -> Field)
pC' = ( (flip C) <$> pString ) <..> pReturn Field_C
-- This consumes no input
pPrefix' :: Parser (Field -> Prefix)
pPrefix' = pReturn Prefix
The question
I'd like to define
pA :: Parser A
in terms of pField, with a post filter to fail if the rightmost Field constructor is not Field_A. As has rightly been pointed out, this is a monadic parse. I can't find any compelling examples of using uu-parsinglib as a monadic parser, so what would your suggested approach be?
If I'm barking up the wrong tree, please let me know also.
It seems like you could make a generalized conditional parser that only succeeds if the value returned by the parser passes some test. This uses the monad capabilities of course. I am not sure if this is a good thing to do with uu-parsinglib however. It seems to work fine in my testing, with one exception: when the conditional fails and no other parsers are available to consume input, the library throws an exception. (something along the lines of no correcting steps given...)
pConditional :: Parser a -> (a -> Bool) -> Parser a
pConditional p test = p >>= (\result -> case (test result) of
True -> pure result
False -> empty)
I would also like to know of other pitfalls that would arise from liberal use of such a conditional parser. (if any.)
I think I've found a solution. I'm still interested in hearing thoughts on the best way to parse such indirect left recursion.
The proposed solution is
pA :: Parser A
pA = do
a <- pField
case a of
(Field_A r) -> return r
otherwise -> pFail

Resources