Expressing basic number parser using >>= - parsing

I am working through the WikiBook "Write Yourself A Scheme in 48 Hours."
The Haskell library Parsec is being used to parse basic expressions, such as numbers (as shown in the code below).
import Lib
import Text.ParserCombinators.Parsec hiding (spaces)
import System.Environment
import Control.Monad
import Data.Typeable ( typeOf )
import Debug.Trace
data LispVal = Atom String
| List [LispVal]
| DottedList [LispVal] LispVal
| Number Integer
| String String
| Bool Bool
deriving Show
-- ...
parseNumber :: Parser LispVal
parseNumber = do
x <- many1 digit
return $ Number (read x)
One of the exercises in the book asks the reader to rewrite parseNumber using >>= notation instead. However, I keep running into scary-looking type mismatch errors. Can someone please show me how to rewrite the function using >>= notation? Or at least give me a hint?

The Haskell report has a section on do notation and how to "desugar" these do blocks.
If you write a do block as:
parseNumber = do
x <- many1 digit
return (Number (read x))
Then this is syntactically equivalent to:
parseNumber :: Parser LispVal
parseNumber = many1 digit >>= \x -> return (Number (read x))
or more elegant:
parseNumber :: Parser LispVal
parseNumber = many1 digit >>= return . Number . read
We however do not need to work with >>=. Indeed, if we want to apply a function on the item that will be constructed with the Parser, then we can use fmap :: Functor f => (a -> b) -> f a -> f b or (<$>) :: Functor f => (a -> b) -> f a -> f b for this:
parseNumber :: Parser LispVal
parseNumber = Number . read <$> many1 digit

Related

Haskell monadic parser with anamorphisms

My problem is how to combine the recursive, F-algebra-style recursive type definitions, with monadic/applicative-style parsers, in way that would scale to a realistic programming language.
I have just started with the Expr definition below:
data ExprF a = Plus a a |
Val Integer deriving (Functor,Show)
data Rec f = In (f (Rec f))
type Expr = Rec ExprF
and I am trying to combine it with a parser which uses anamorphisms:
ana :: Functor f => (a -> f a) -> a -> Rec f
ana psi x = In $ fmap (ana psi) (psi x)
parser = ana psi
where psi :: String -> ExprF String
psi = ???
as far as I could understand, in my example, psi should either parse just an integer, or it should decide that the string is a <expr> + <expr> and then (by recursively calling fmap (ana psi)), it should parse the left-hand side and the right-hand side expressions.
However, (monadic/applicative) parsers don't work like that:
they first attempt parsing the left-hand expression,
the +,
and the right-hand expression
One solution that I see, is to change the type definition for Plus a a to Plus Integer a, such that it reflects the parsing process, however this doesn't seem like the best avenue.
Any suggestions (or reading directions) would be welcome!
If you need a monadic parser, you need a monad in your unfold:
anaM :: (Traversable f, Monad m) => (a -> m (f a)) -> a -> m (Rec f)
anaM psiM x = In <$> (psiM x >>= traverse (anaM psiM))
Then you can write something that parses just one level of an ExprF like this:
parseNum :: Parser Integer
parseNum = -- ...
char :: Char -> Parser Char
char c = -- ...
parseExprF :: Maybe Integer -> Parser (ExprF (Maybe Integer))
parseExprF (Just n) = pure (Val n)
parseExprF Nothing = do
n <- parseNum
empty
<|> (Plus (Just n) Nothing <$ char '+')
<|> (pure (Val n))
Given that, you now have your recursive Expr parser:
parseExpr :: Parser Expr
parseExpr = anaM parseExprF Nothing
You will need to have instances of Foldable and Traversable for ExprF, of course, but the compiler can write these for you and they are not themselves recursive.

Haskell : Operator Parser keeps going to undefined rather than inputs

I'm practicing writing parsers. I'm using Tsodings JSON Parser video as reference. I'm trying to add to it by being able to parse arithmetic of arbitrary length and I have come up with the following AST.
data HVal
= HInteger Integer -- No Support For Floats
| HBool Bool
| HNull
| HString String
| HChar Char
| HList [HVal]
| HObj [(String, HVal)]
deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
= Add
deriving (Show, Read)
newtype Parser a = Parser {
runParser :: String -> Maybe (String, a)
}
The following functions is my attempt of implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed therefore my assumption was that the only difference was looking specifically for an operator rather than anything that's not a number between quotation marks. It does seemingly begin to parse correctly but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused as to where the error is occurring. I'm assuming it's to do with sOp mainly due to how the other functions work as intended as the rest of parseOp being a translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason why I have sOp however is that if it was replaced with say Op, I would get the error that the following doesn't exist Op :: String -> Op. When I say this my inclination was that the string coming from the parsed expression would be passed into this function wherein I could return the appropriate operator. This however is incorrect and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
where f (y:ys)
| y == x = Just (ys, x)
| otherwise = Nothing
f [] = Nothing
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input', a) <- q input
Just (input', f a)
The implementation of (<*>) is the culprit. You did not use input' in the next call to q, but used input instead. As a result you pass the string to the next parser without "eating" characters. You can fix this with:
instance Applicative Parser where
pure x = Parser $ \input -> Just (input, x)
(Parser p) <*> (Parser q) = Parser $ \input -> do
(input', f) <- p input
(input'', a) <- q input'
Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)

Writing Parser for S Expressions

I'm trying to write a Parser for S Expressions from Prof. Yorgey's 2013 homework.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Given the following definitions, presented in the homework:
type Ident = String
-- An "atom" is either an integer value or an identifier.
data Atom = N Integer | I Ident
deriving Show
-- An S-expression is either an atom, or a list of S-expressions.
data SExpr = A Atom
| Comb [SExpr]
deriving Show
I wrote a parser for Parser Atom and Parser SExpr for A Atom.
parseAtom :: Parser Atom
parseAtom = alt n i
where n = (\_ z -> N z) <$> spaces <*> posInt
i = (\ _ z -> I z) <$> spaces <*> ident
parseAAtom :: Parser SExpr
parseAAtom = fmap (\x -> A x) parseAtom
Then, I attempted to write a parser to handle a Parser SExpr for the Comb ... case:
parseComb :: Parser SExpr
parseComb = (\_ _ x _ _ _ -> x) <$> (zeroOrMore spaces) <*> (char '(') <*>
(alt parseAAtom parseComb) <*> (zeroOrMore spaces)
<*> (char ')') <*> (zeroOrMore spaces)
Assuming that parseComb was right, I could simply make usage of oneOrMore for Parser [SExpr].
parseCombElements :: Parser [SExpr]
parseCombElements = oneOrMore parseComb
So, my two last functions compile, but running runParser parseComb "( foo )" never terminates.
What's wrong with my parseComb definition? Please don't give me the whole answer, but rather a hint - for my own learning.
I am very suspicious of zeroOrMore spaces, because spaces is usually a parser which itself parses zero or more spaces. Which means that it can parse the empty string if there aren't any spaces at that point. In particular, the spaces parser always succeeds.
But when you apply zeroOrMore to a parser that always succeeds, the combined parser will never stop - because zeroOrMore only stops trying again once its parser argument fails.
As an aside, Applicative expressions like (\_ _ x _ _ _ -> x) <$> ... <*> ... <*> ...... which only use a single of the subparsers can usually be written more succinctly with the *> and <* combinators:
... *> ... *> x_parser_here <* ... <* ...

Parsing an expression grammar having function application with parser combinators (left-recursion)

As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so). Its syntax features the following construct:
integer numbers
identifiers ([a-zA-Z]+)
arithmetic expressions with + and * and parenthesis
structure access with . (eg foo.bar.buz)
tuples (eg (1, foo, bar.buz)) (to remove ambiguity one-tuples are written as (x,))
function application (eg foo(1, bar, buz()))
functions are first class so they can also be returned from other functions and directly be applied (eg foo()() is legal because foo() might return a function)
So a fairly complex program in this language is
(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)
the associativity is supposed to be
( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )
I'm currently using the very nice uu-parsinglib an applicative parser combinator library.
The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr) is left-recursive. But I could solve that problem using the the pChainl combinator (see parseExpr in the example below).
The remaining problem (hence this question) is function application with functions returned from other functions (f()()). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list ). Any ideas how I can solve this problem elegantly using uu-parsinglib? (the problem should directly apply to parsec, attoparsec and other parser combinators as well I guess).
See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
module TestExprGrammar
(
) where
import Data.Foldable (asum)
import Data.List (intercalate)
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
data Node =
NumberLiteral Integer
| Identifier String
| Tuple [Node]
| MemberAccess Node Node
| FunctionCall Node [Node]
| BinaryOperation String Node Node
parseFunctionCall :: Parser Node
parseFunctionCall =
FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> parseParenthesisedNodeList 0
operators :: [[(Char, Node -> Node -> Node)]]
operators = [ [('+', BinaryOperation "+")]
, [('*' , BinaryOperation "*")]
, [('.', MemberAccess)]
]
samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]
parseExpr :: Parser Node
parseExpr =
foldr pChainl
(parseIdentifier
<|> parseNumber
<|> parseTuple
<|> parseFunctionCall
<|> pParens parseExpr
)
(map samePrio operators)
parseNodeList :: Int -> Parser [Node]
parseNodeList n =
case n of
_ | n < 0 -> parseNodeList 0
0 -> pListSep (pSymbol ",") parseExpr
n -> (:) <$>
parseExpr
<* pSymbol ","
<*> parseNodeList (n-1)
parseParenthesisedNodeList :: Int -> Parser [Node]
parseParenthesisedNodeList n = pParens (parseNodeList n)
parseIdentifier :: Parser Node
parseIdentifier = Identifier <$> pSome pLetter <* pSpaces
parseNumber :: Parser Node
parseNumber = NumberLiteral <$> pNatural
parseTuple :: Parser Node
parseTuple =
Tuple <$> parseParenthesisedNodeList 1
<|> Tuple [] <$ pSymbol "()"
instance Show Node where
show n =
let showNodeList ns = intercalate ", " (map show ns)
showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
in case n of
Identifier i -> i
Tuple ns -> showParenthesisedNodeList ns
NumberLiteral n -> show n
FunctionCall f args -> show f ++ showParenthesisedNodeList args
MemberAccess f g -> show f ++ "." ++ show g
BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"
Looking briefly at the list-like combinators for uu-parsinglib (I'm more familiar with parsec), I think you can solve this by folding over the result of the pSome combinator:
parseFunctionCall :: Parser Node
parseFunctionCall =
foldl' FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> pSome (parseParenthesisedNodeList 0)
This is also equivalent to the Alternative some combinator, which should indeed apply to the other parsing libs you mentioned.
I don't know this library but can show you how to remove left recursion. The standard right recursive expression grammar is
E -> T E'
E' -> + TE' | eps
T -> F T'
T' -> * FT' | eps
F -> NUMBER | ID | ( E )
To add function application you must decide its level of precedence. In most languages I've seen it is highest. So you'd add another layer of productions for function application.
E -> T E'
E' -> + TE' | eps
T -> AT'
T' -> * A T' | eps
A -> F A'
A' -> ( E ) A' | eps
F -> NUMBER | ID | ( E )
Yes this is a hairy-looking grammar and bigger than the left recursive one. That's the price of top-down predictive parsing. If you want simpler grammars use a bottom up parser generator a la yacc.

Custom ADT vs. Tree for parser return value

I'm using Parsec to build a simple Lisp parser.
What are the (dis)advantages of using a custom ADT for the parser types versus using a standard Tree (i.e. Data.Tree)?
After trying both ways, I've come up with a couple points for custom ADTs (i.e. Parser ASTNode):
seems to be much clearer and simpler
others have done it this way(including Tiger, which is/was bundled with Parsec)
and one against (i.e. Parser (Tree ASTNode):
Data.Tree already has Functor, Monad, etc. instances, which will be very helpful for semantic analysis, evaluation, calculating code statistics
For example:
custom ADT
import Text.ParserCombinators.Parsec
data ASTNode
= Application ASTNode [ASTNode]
| Symbol String
| Number Float
deriving (Show)
int :: Parser ASTNode
int = many1 digit >>= (return . Number . read)
symbol :: Parser ASTNode
symbol = many1 (oneOf ['a'..'z']) >>= (return . Symbol)
whitespace :: Parser String
whitespace = many1 (oneOf " \t\n\r\f")
app :: Parser ASTNode
app =
char '(' >>
sepBy1 expr whitespace >>= (\(e:es) ->
char ')' >>
(return $ Application e es))
expr :: Parser ASTNode
expr = symbol <|> int <|> app
example use:
ghci> parse expr "" "(a 12 (b 13))"
Right
(Application
(Symbol "a")
[Number 12.0, Application
(Symbol "b")
[Number 13.0]])
Data.Tree
import Text.ParserCombinators.Parsec
import Data.Tree
data ASTNode
= Application (Tree ASTNode)
| Symbol String
| Number Float
deriving (Show)
int :: Parser (Tree ASTNode)
int = many1 digit >>= (\x -> return $ Node (Number $ read x) [])
symbol :: Parser (Tree ASTNode)
symbol = many1 (oneOf ['a' .. 'z']) >>= (\x -> return $ Node (Symbol x) [])
whitespace :: Parser String
whitespace = many1 (oneOf " \t\n\r\f")
app :: Parser (Tree ASTNode)
app =
char '(' >>
sepBy1 expr whitespace >>= (\(e:es) ->
char ')' >>
(return $ Node (Application e) es))
expr :: Parser (Tree ASTNode)
expr = symbol <|> int <|> app
and example use:
ghci> parse expr "" "(a 12 (b 13))"
Right
(Node
(Application
(Node (Symbol "a") []))
[Node (Number 12.0) [],
Node
(Application
(Node (Symbol "b") []))
[Node (Number 13.0) []]])
(sorry for the formatting -- hopefully it's clear)
I'd absolutely go for the AST, because interpretation/compilation/language analysis in general is very much driven by the structure of your language. The AST will simply and naturally represent and respect that structure, while Tree will do neither.
For example, a common form of language implementation technique is to implement some complex features by translation: translate programs that involve those features or constructs into programs in a subset of the a language that does not use them (Lisp macros, for example, are all about this). If you use an AST, the type system will, for example, often forbid you from producing illegal translations as output. Whereas a Tree type that doesn't understand your program will not help there.
Your AST doesn't look very complicated, so writing utility functions for it should not be hard. Take this one for example:
foldASTNode :: (r -> [r] -> r) -> (String -> r) -> (Float -> r) -> r
foldASTNode app sym num node =
case node of
Application f args -> app (subfold f) (map subfold args)
Symbol str -> sym str
Number n -> num n
where subfold = foldASTNode app sym num
And in any case, what sort of Functor do you wish to have on your AST? There's no type parameter on it...

Resources