Parsing scheme dottedlist/list with Haskell - parsing

I am following this guide on writing a scheme interpreter. In trying to left-factor the grammar for DottedList/List, I came up with this:
E -> (H T)
H -> E H'
H' -> <space> H
H' -> <term>
T -> <term>
T -> <space> . <space> E
--
spaces :: Parser ()
spaces = skipMany1 (space <?> "spaces")
parseExpr :: Parser LispVal
parseExpr = (... omitted ...) <|>
do char '('
h <- sepBy parseExpr spaces
t <- optionMaybe ((spaces' >> char '.' >> spaces' >> parseExpr) <?> "parseDotExpr failed")
z <- if isJust t then return $ DottedSuffix $ fromJust t else return Tail
z' <- case z of Tail -> return $ List x
DottedSuffix s -> return $ DottedList x s
char ')'
return z'
Unfortunately this doesn't handle the basic dottedlists:
test/Spec.hs:23:
1) test eval 1 evals DottedList
expected: "(1 2 . 1)"
but got: "Parse error at \"lisp\" (line 1, column 7):\nunexpected \".\"\nexpecting spaces' or parseExpr!"
test/Spec.hs:26:
2) test eval 1 evals DottedList (quoted)
expected: "((1 2) . 1)"
but got: "Parse error at \"lisp\" (line 1, column 15):\nunexpected \".\"\nexpecting spaces' or parseExpr!"
test/Spec.hs:29:
3) test eval 1 evals DottedList (sugared)
expected: "((1 2) . 1)"
but got: "Parse error at \"lisp\" (line 1, column 9):\nunexpected \".\"\nexpecting spaces' or parseExpr!"
Update:
From #pat's response, I got my tests to pass using:
parseExpr :: Parser LispVal
parseExpr = {- omitted -}
<|> do char '('
x <- many1 (do e <- parseExpr; spaces'; return e)
{- omitted -}

The sepBy parser is seeing the space before the dot, and committing to parse another expression, which fails.
You should have lexemes consume and discard trailing spaces (see parsec's lexeme) and change the sepBy to just many1. The optionMaybe can then commit after seeing a dot, which would otherwise have required a try.

Related

Using chainl1 correctly with infix in Haskell

For the MVE code below it outputs [] rather than the expected Not (Oper Eq 2 2)) for the input parseString "2+2" which is supposed to call pOper. My guess is that pOper would expect three arguments for the anonymous function to work. That is 3 strings. However due to partial call of a function only one argument is passed. Is there a way to work around to preserve the type signature of pOper while dealing with the Not and at the same time not changing the type definitions?
import Data.Char
import Text.ParserCombinators.ReadP
import Control.Applicative ((<|>))
type Parser a = ReadP a
data Value =
IntVal Int
deriving (Eq, Show, Read)
data Exp =
Const Value
| Oper Op Exp Exp
| Not Exp
deriving (Eq, Show, Read)
data Op = Plus | Minus | Eq
deriving (Eq, Show, Read)
space :: Parser Char
space = satisfy isSpace
spaces :: Parser String
spaces = many space
space1 :: Parser String
space1 = many1 space
symbol :: String -> Parser String
symbol = token . string
token :: Parser a -> Parser a
token combinator = (do spaces
combinator)
parseString input = readP_to_S (do
e <- pExpr
token eof
return e) input
pExpr :: Parser Exp
pExpr = chainl1 pTerm pOper
pTerm :: Parser Exp
pTerm =
(do
pv <- numConst
skipSpaces
return pv)
numConst :: Parser Exp
numConst =
(do
skipSpaces
y <- munch isDigit
return (Const (IntVal (read y)))
)
-- Parser for an operator
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = symbol "+" >> return (Oper Plus)
<|> (symbol "-" >> return (Oper Minus))
<|> (symbol "=" >> return (Oper Eq))
<|> (symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2)))
There's nothing wrong with your parser for !=. Rather, your parser for operators in general is broken: it only parses the first operator correctly. A simpler version of your pOper would be
pOper = a >> b
<|> (c >> d)
But because of precedence, this isn't the same as (a >> b) <|> (c >> d). Actually, it's a >> (b <|> (c >> d))! So the symbol your first alternative parses is accidentally mandatory. It would parse 2+!=2 instead.
So, you could fix this by just adding in the missing parentheses. But if, like me, you find it a little tacky to rely so much on operator precedence for semantic meaning, consider something that's more obviously safe, using the type system to separate the clauses from the delimiters:
pOper :: ReadP (Exp -> Exp -> Exp)
pOper = asum [ symbol "+" >> return (Oper Plus)
, symbol "-" >> return (Oper Minus)
, symbol "=" >> return (Oper Eq)
, symbol "!=" >> return (\e1 e2 -> Not (Oper Eq e1 e2))
]
This way, you have a list of independent parsers, not a single parser built with alternation. asum (from Control.Applicative) does the work of combining that list into alternatives. It means the same thing, of course, but it means you don't have to learn any operator precedence tables, because , can only be a list item separator.
The best way I can think of to solve the problem is by creating these to modificatoins: 1) this alternative in the expression
pExpr :: Parser Exp
pExpr =
(do pv <- chainl1 pTerm pOper
pv2 <- pOper2 pv
return pv2)
<|> chainl1 pTerm pOper
And 2) this helper function to deal with infix patterns
pOper2 :: Exp -> Parser Exp
pOper2 e1 = (do
symbol "!="
e2 <- numConst
return (Not (Oper Eq e1 e2)))
This is the output, althought I don't know if there will be problems if other operations such as / and * which has different associativety are to be taken into account as well.
parseString "2+4+6"
[(Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6)),"")]
ghci> parseString "2+4+6 != 2"
[(Not (Oper Eq (Oper Plus (Oper Plus (Const (IntVal 2)) (Const (IntVal 4))) (Const (IntVal 6))) (Const (IntVal 2))),"")]
ghci> parseString "2 != 4"
[(Not (Oper Eq (Const (IntVal 2)) (Const (IntVal 4))),"")]

Parsec custom while loop parser only parses one statement in loop body

I'm trying to write a parser to parse a loop in the following form:
(:= x 0)
Do ((< x 10))->
(:= x (+ x 1))
print(x)
Od
What's occurring however is that my parser can only work for a loop whose body contains only one statement. To parse more than one statement, the body above would have to be written in the following way:
(:= x (+ x 1))(:= x 20)
I have tried to use delimiters such as semi-colons to try and force the parser to allow for loop body parsing to be taken line by line the above behaviour persists such that it would have to be written like: (:= x (+ x 1));(:= x 20) instead of on separate lines.
Please find my parsers below:
parsersHStatement :: Parser HStatement
parsersHStatement = try (parsePrint) <|> try (parseDo) <|> try (parseEval)
parseLoopBody :: Parser [HStatement]
parseLoopBody = many1 $ parsersHStatement
parseDo :: Parser HStatement
parseDo = do
spaces
_ <- string "Do"
spaces
_ <- string "("
p <- try (parseExpr) <|> try (parseBool)
_ <- string ")->"
spaces
q <- parseLoopBody <* spaces
spaces
_ <- string "Od"
return $ Do p q
parseEval :: Parser HStatement
parseEval = liftM Eval $ parsersHVal
parsersHVal :: Parser HVal
parsersHVal = try (parseAssign) <|> try (parsePrimitiveValue) <|> try (parseExpr)
parsePrint :: Parser HStatement
parsePrint = string "print(" *> parsersHVal <* string ")" >>= (return . Print)
parseExpr :: Parser HVal
parseExpr = do
char '('
spaces
op <- try (parseOperation)
spaces
x <- try (sepBy (parseExpr <|> parseVarOrInt) spaces)
spaces
char ')'
return $ Expr op x
parseBool :: Parser HVal
parseBool = classifyBool <$> ( (string "True") <|> (string "False") )
where
classifyBool "True" = Bool True
classifyBool "False" = Bool False
Within parseLoopBody, I tried 'feeding' spaces (many1 $ spaces *> ...) but nothing would parse then.
The following is the ADT:
data HVal
= Integer Integer
| Var String
| Bool Bool
| List [HVal]
| Expr Operation [HVal]
| Assign HVal HVal
deriving (Show, Eq, Read)
data HStatement
= Eval HVal -- Bridge between HVal and HStatement
| Print HVal
| Do HVal [HStatement]
deriving (Show, Eq, Read)
parseDo was altered to the following :
parseDo :: Parser HStatement
parseDo = do
string "Do"
spaces
string "("
p <- try (parseExpr) <|> try (parseBool)
string ")->"
spaces
q <- many1 $ parsersHStatement
spaces
string "Od"
return $ Do p q
This allows for the parsing of two statements but the second statement breaks the loop.
After a lot of fiddling around, it seemed that the error lay responsible on the function parseEval. This was changed to:
parseEval :: Parser HStatement
parseEval = do
x <- try (parseAssign) <|> try (parseExpr)
spaces
return $ Eval x
Furthermore my parseDo function was changed to:
parseDo :: Parser HStatement
parseDo = do
string "Do"
spaces
string "("
p <- try (parseExpr) <|> try (parseBool)
string ")->"
spaces
q <- many1 $ parsersHStatement
spaces
string "Od"
return $ Do p q

Haskell : Non-Exhaustive Pattern In Function Prevents Another Function From Executing Even Though Its Not Used

I'm trying to implement car, cdr, and cons functionality into a toy language I'm writing however when I try to execute my car function through main, I get the following error:
./parser "car [1 2 3]"
parser: parser.hs:(48,27)-(55,45): Non-exhaustive patterns in case
The function on lines 48-55 is the following:
parseOp :: Parser HVal
parseOp = (many1 letter <|> string "+" <|> string "-" <|> string "*" <|> string "/" <|> string "%" <|> string "&&" <|> string "||") >>=
(\x -> return $ case x of
"&&" -> Op And
"||" -> Op Or
"+" -> Op Add
"-" -> Op Sub
"*" -> Op Mult
"/" -> Op Div
"%" -> Op Mod)
I'm really unsure why the error message points to this function because it has nothing to do with the list functionality. The car function is working however because I was able to successfully execute it through GHCI. I know my problem is due to parsing but I don't see where it is. The following are the functions that relate to lists. I can't see from them how they are influenced by parseOp.
data HVal = Number Integer
| String String
| Boolean Bool
| List [HVal]
| Op Op
| Expr Op HVal HVal
| Car [HVal]
deriving (Read)
car :: [HVal] -> HVal
car xs = head xs
parseListFunctions :: Parser HVal
parseListFunctions = do
_ <- string "car "
_ <- char '['
x <- parseList
_ <- char ']'
return $ Car [x]
parseExpr :: Parser HVal
parseExpr = parseNumber
<|> parseOp
<|> parseBool
<|> parseListFunctions
<|> do
_ <- char '['
x <- parseList
_ <- char ']'
return x
<|> do
_ <- char '('
x <- parseExpression
_ <- char ')'
return x
eval :: HVal -> HVal
eval val#(Number _) = val
eval val#(String _) = val
eval val#(Boolean _) = val
eval val#(List _) = val -- Look at list eval NOT WORKING
eval val#(Op _) = val
eval (Expr op x y) = eval $ evalExpr (eval x) op (eval y)
eval (Car xs) = eval $ car xs
The removal of many1 letter in parseOp transfers the same error to the following function parseBool:
parseBool :: Parser HVal
parseBool = many1 letter >>= (\x -> return $ case x of
"True" -> Boolean True
"False" -> Boolean False)
You write
parseExpr = ... <|> parseOp <|> ... <|> parseListFunctions <|> ...
and so
car ...
is passed to parseOp first, then parseListFunctions. The parseOp parser succeeds in the
many1 letter
branch, and so in the \x -> return $ case x of ..., x is bound to "car". Because parseOp succeeds (and returns an error value with an embedded, not-yet-evaluated inexhaustive case error!), parseListFunctions is never tried.
You will need to modify your grammar to reduce the ambiguity in it, so that these conflicts where multiple branches may match do not arise.

Haskell, Parsec - Parsing application in lambda calculus

I am writing a simple parser/interpreter for lambda calculus in Haskell.
I am using the syntax "/(x.x)" for a lambda function, "x" for a name and "f y" for application, such that "/(x.x) y" is the application of identity to y and should return y.
My strategy is to have a parser for names, lambda functions and applications. The name and function parsers work, but not the application parser.
When I try to parse "/(x.x) y" it fails. It should try to parse the source as a name, as a lambda function, and after not being able to use both (because it would not parse all the source) try to parse it as an application and then succeed.
My code is:
module Main where
import Text.ParserCombinators.Parsec
data Expression =
Var String
| Lambda Expression Expression
| Application Expression Expression
deriving Eq
instance Show Expression where
show (Var v) = v
show (Lambda v e) = "/(" ++ show v ++ "." ++ show e ++ ")"
show (Application e1 e2) = "" ++ show e1 ++ " " ++ show e2 ++ ""
-- Parser
parseSource :: String -> Expression
parseSource source =
case parse (parseExpression <* eof) "" source of
Right e -> e
_ -> error "Parsing error"
parseExpression :: Parser Expression
parseExpression =
parseVar <|> parseLambda <|> parseApplication
parseVar :: Parser Expression
parseVar = Var <$> many1 letter
parseLambda :: Parser Expression
parseLambda = do
_ <- char '/'
_ <- char '('
arg <- parseVar
_ <- char '.'
body <- parseExpression
_ <- char ')'
return $ Lambda arg body
parseApplication :: Parser Expression
parseApplication = do
e1 <- parseExpression
_ <- spaces
e2 <- parseExpression
return $ Application e1 e2

Haskell Parsec - error messages are less helpful while using custom tokens

I'm working on seperating lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using some tokens other than Parsec's Char tokens.
Here are some examples of Parsec's error messages while using Char tokens:
ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong"
parse error at (line 1, column 7):
unexpected "w"
expecting space or "ok"
ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
parse error at (line 1, column 1):
unexpected "w"
expecting "ok" or "nop"
So, string parser shows what string is expected when found an unexpected string, and choice parser shows what are alternatives.
But when I use same combinators with my tokens:
ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf "
parse error at "test" (line 1, column 1):
unexpected end of input
In this case, it doesn't print what was expected.
ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf "
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 1))
And when I use choice, it doesn't print alternatives.
I expect this behavior to be related with combinator functions, and not with tokens, but seems like I'm wrong. How can I fix this?
Here's the full lexer + parser code:
Lexer:
module Lexer
( Token(..)
, TokenPos(..)
, tokenize
) where
import Text.ParserCombinators.Parsec hiding (token, tokens)
import Control.Applicative ((<*), (*>), (<$>), (<*>))
data Token = Ide String
| Number String
| Bool String
| LBrack
| RBrack
| LBrace
| RBrace
| Keyword String
deriving (Show, Eq)
type TokenPos = (Token, SourcePos)
ide :: Parser TokenPos
ide = do
pos <- getPosition
fc <- oneOf firstChar
r <- optionMaybe (many $ oneOf rest)
spaces
return $ flip (,) pos $ case r of
Nothing -> Ide [fc]
Just s -> Ide $ [fc] ++ s
where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
rest = firstChar ++ ['0'..'9']
parsePos p = (,) <$> p <*> getPosition
lbrack = parsePos $ char '[' >> return LBrack
rbrack = parsePos $ char ']' >> return RBrack
lbrace = parsePos $ char '{' >> return LBrace
rbrace = parsePos $ char '}' >> return RBrace
token = choice
[ ide
, lbrack
, rbrack
, lbrace
, rbrace
]
tokens = spaces *> many (token <* spaces)
tokenize :: SourceName -> String -> Either ParseError [TokenPos]
tokenize = runParser tokens ()
Parser:
module Parser where
import Text.Parsec as P
import Control.Monad.Identity
import Lexer
parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
case tokenize "test" s of
Left e -> putStrLn $ show e
Right ts' -> P.parseTest p ts'
tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = token show snd test
where test (t', _) = case t == t' of
False -> Nothing
True -> Just t
SOLUTION:
Ok, after fp4me's answer and reading Parsec's Char source more carefully, I ended up with this:
{-# LANGUAGE FlexibleContexts #-}
module Parser where
import Text.Parsec as P
import Control.Monad.Identity
import Lexer
parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
case tokenize "test" s of
Left e -> putStrLn $ show e
Right ts' -> P.parseTest p ts'
type Parser a = Parsec [TokenPos] () a
advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos
satisfy :: (TokenPos -> Bool) -> Parser Token
satisfy f = tokenPrim show
advance
(\c -> if f c then Just (fst c) else Nothing)
tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = (Parser.satisfy $ (== t) . fst) <?> show t
Now I'm getting same error messages:
ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) " asdf"
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 3))
expecting Ide "ok" or Ide "nop"
A beginning of solution can be to define your choice function in the Parser,
use a specific unexpected function to override unexpected error and finally
use the <?> operator to override the expecting message:
mychoice [] = mzero
mychoice (x:[]) = (tok x <|> myUnexpected) <?> show x
mychoice (x:xs) = ((tok x <|> mychoice xs) <|> myUnexpected) <?> show (x:xs)
myUnexpected = do
input <- getInput
unexpected $ (id $ first input )
where
first [] = "eof"
first (x:xs) = show $ fst x
and call your parser like that :
ghci> Parser.parseTest (mychoice [Ide "ok", Ide "nop"]) "asdf "
parse error at (line 1, column 1):
unexpected Ide "asdf"
expecting [Ide "ok",Ide "nop"]

Resources