Currently, I have the following code:
import Control.Applicative ((<|>))
import Text.Parsec (ParseError, endBy, sepBy, try)
import Text.Parsec.String (Parser)
import qualified Data.Char as Char
import qualified Text.Parsec as Parsec
data Operation = Lt | Gt deriving (Show)
data Value =
Raw String
| Op Operation
deriving (Show)
sampleStr :: String
sampleStr = unlines
[ "#BEGIN#"
, "x <- 3.14 + 2.72;"
, "x < 10;"
]
gtParser :: Parser Value
gtParser = do
Parsec.string "<"
return $ Op Gt
ltParser :: Parser Value
ltParser = do
Parsec.string ">"
return $ Op Lt
opParser :: Parser Value
opParser = gtParser <|> ltParser
rawParser :: Parser Value
rawParser = do
str <- Parsec.many1 $ Parsec.satisfy $ not . Char.isSpace
return $ Raw str
valueParser :: Parser Value
valueParser = try opParser <|> rawParser
eolParser :: Parser Char
eolParser = try (Parsec.char ';' >> Parsec.endOfLine)
<|> Parsec.endOfLine
lineParser :: Parser [Value]
lineParser = sepBy valueParser $ Parsec.many1 $ Parsec.char ' '
fileParser :: Parser [[Value]]
fileParser = endBy lineParser eolParser
parse :: String -> Either ParseError [[Value]]
parse = Parsec.parse fileParser "fail..."
main :: IO ()
main = print $ parse sampleStr
This will fail with the message
Left "fail..." (line 2, column 4):
unexpected "-"
expecting " ", ";" or new-line
To my understanding, since I have try opParser, after Parsec sees that the token <- cannot be parsed by opParser, it should go to rawParser. (It is essentially a lookahead).
What is my misunderstanding, and how do I fix this error?
You can replicate the problem with the smaller test case:
> Parsec.parse fileParser "foo" "x <- 3.14"
The problem is that fileParser first calls lineParser, which successfully parses "x <" into [Raw "x", Op Gt] and leaves "- 3.14" yet to be parsed. Unfortunately, fileParser now expects to parse something with eolParser, but eolParser can't parse "- 3.14" because it starts with neither a semicolon nor an endOfLine.
Your try opParser has no effect here because opParser successfully parses <, so there's nothing to backtrack from.
There are many ways you might fix the problem. If <- is the only case where a < might be misparsed, you could exclude this case with notFollowedBy:
gtParser :: Parser Value
gtParser = do
Parsec.string "<"
notFollowedBy $ Parsec.string "-"
return $ Op Gt
I was playing around with Haskell's parsec library. I was trying to parse a hexadecimal string of the form "#x[0-9A-Fa-f]*" into an integer. This the code I thought would work:
module Main where
import Control.Monad
import Numeric
import System.Environment
import Text.ParserCombinators.Parsec hiding (spaces)
parseHex :: Parser Integer
parseHex = do
string "#x"
x <- many1 hexDigit
return (fst (head (readHex x)))
testHex :: String -> String
testHex input = case parse parseHex "lisp" input of
Left err -> "Does not match " ++ show err
Right val -> "Matched" ++ show val
main :: IO ()
main = do
args <- getArgs
putStrLn (testHex (head args))
And then I tried testing the testHex function in Haskell's repl:
GHCi, version 8.6.5: http://www.haskell.org/ghc/ :? for help
[1 of 1] Compiling Main ( src/Main.hs, interpreted )
Ok, one module loaded.
*Main> testHex "#xcafebeef"
"Matched3405692655"
*Main> testHex "#xnothx"
"Does not match \"lisp\" (line 1, column 3):\nunexpected \"n\"\nexpecting hexadecimal digit"
*Main> testHex "#xcafexbeef"
"Matched51966"
The first and second try work as intended. But in the third one, the string is matching upto the invalid character. I do not want the parser to do this, but rather not match if any digit in the string is not a valid string. Why is this happening, and how do if fix this?
Thank you!
You need to place eof at the end.
parseHex :: Parser Integer
parseHex = do
string "#x"
x <- many1 hexDigit
eof
return (fst (head (readHex x)))
Alternatively, you can compose it with eof where you use it if you want to reuse parseHex in other places.
testHex :: String -> String
testHex input = case parse (parseHex <* eof) "lisp" input of
Left err -> "Does not match " ++ show err
Right val -> "Matched" ++ show val
I am writing a simple parser/interpreter for lambda calculus in Haskell.
I am using the syntax "/(x.x)" for a lambda function, "x" for a name and "f y" for application, such that "/(x.x) y" is the application of identity to y and should return y.
My strategy is to have a parser for names, lambda functions and applications. The name and function parsers work, but not the application parser.
When I try to parse "/(x.x) y" it fails. It should try to parse the source as a name, as a lambda function, and after not being able to use both (because it would not parse all the source) try to parse it as an application and then succeed.
My code is:
module Main where
import Text.ParserCombinators.Parsec
data Expression =
Var String
| Lambda Expression Expression
| Application Expression Expression
deriving Eq
instance Show Expression where
show (Var v) = v
show (Lambda v e) = "/(" ++ show v ++ "." ++ show e ++ ")"
show (Application e1 e2) = "" ++ show e1 ++ " " ++ show e2 ++ ""
-- Parser
parseSource :: String -> Expression
parseSource source =
case parse (parseExpression <* eof) "" source of
Right e -> e
_ -> error "Parsing error"
parseExpression :: Parser Expression
parseExpression =
parseVar <|> parseLambda <|> parseApplication
parseVar :: Parser Expression
parseVar = Var <$> many1 letter
parseLambda :: Parser Expression
parseLambda = do
_ <- char '/'
_ <- char '('
arg <- parseVar
_ <- char '.'
body <- parseExpression
_ <- char ')'
return $ Lambda arg body
parseApplication :: Parser Expression
parseApplication = do
e1 <- parseExpression
_ <- spaces
e2 <- parseExpression
return $ Application e1 e2
I need to write a code that parses some language. I got stuck on parsing variable name - it can be anything that is at least 1 char long, starts with lowercase letter and can contain underscore '_' character. I think I made a good start with following code:
identToken :: Parser String
identToken = do
c <- letter
cs <- letdigs
return (c:cs)
where letter = satisfy isLetter
letdigs = munch isLetter +++ munch isDigit +++ munch underscore
num = satisfy isDigit
underscore = \x -> x == '_'
lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?
ident :: Parser Ident
ident = do
_ <- skipSpaces
s <- identToken
skipSpaces; return $ s
idents :: Parser Command
idents = do
skipSpaces; ids <- many1 ident
...
This function however gives me a weird results. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left InvalidParse
multipleRes -> Left (AmbiguousIdents multipleRes)
where
prog :: Parser [Ident]
prog = do
result <- many ident
eof
return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser is just synonym for ReadP a.
I also want to encode in the parser that variable names should start with a lowercase character.
Thank you for your help.
Part of the problem is with your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP
type Parser a = ReadP a
type Ident = String
identToken :: Parser String
identToken = do c <- satisfy lowerCase
cs <- letdigs
return (c:cs)
where lowerCase = \x -> x `elem` ['a'..'z']
underscore = \x -> x == '_'
letdigs = munch (\c -> isLetter c || isDigit c || underscore c)
ident :: Parser Ident
ident = do _ <- skipSpaces
s <- identToken
skipSpaces
return s
test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left "Invalid parse"
multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
where prog :: Parser [Ident]
prog = do result <- many ident
eof
return result
main = print $ test_parseIdents "test_1349_zefz"
So what went wrong:
+++ imposes an order on its arguments, and allows for multiple alternatives to succeed (symmetric choice). <++ is left-biased so only the left-most option succeeds -> this would remove the ambiguity in the parse, but still leaves the next problem.
Your parser was looking for letters first, then digits, and finally underscores. Digits after underscores failed, for example. The parser had to be modified to munch characters that were either letters, digits or underscores.
I also removed some functions that were unused and made an educated guess for the definition of your datatypes.
I'm working on seperating lexing and parsing stages of a parser. After some tests, I realized error messages are less helpful when I'm using some tokens other than Parsec's Char tokens.
Here are some examples of Parsec's error messages while using Char tokens:
ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong"
parse error at (line 1, column 7):
unexpected "w"
expecting space or "ok"
ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
parse error at (line 1, column 1):
unexpected "w"
expecting "ok" or "nop"
So, string parser shows what string is expected when found an unexpected string, and choice parser shows what are alternatives.
But when I use same combinators with my tokens:
ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf "
parse error at "test" (line 1, column 1):
unexpected end of input
In this case, it doesn't print what was expected.
ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf "
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 1))
And when I use choice, it doesn't print alternatives.
I expect this behavior to be related with combinator functions, and not with tokens, but seems like I'm wrong. How can I fix this?
Here's the full lexer + parser code:
Lexer:
module Lexer
( Token(..)
, TokenPos(..)
, tokenize
) where
import Text.ParserCombinators.Parsec hiding (token, tokens)
import Control.Applicative ((<*), (*>), (<$>), (<*>))
data Token = Ide String
| Number String
| Bool String
| LBrack
| RBrack
| LBrace
| RBrace
| Keyword String
deriving (Show, Eq)
type TokenPos = (Token, SourcePos)
ide :: Parser TokenPos
ide = do
pos <- getPosition
fc <- oneOf firstChar
r <- optionMaybe (many $ oneOf rest)
spaces
return $ flip (,) pos $ case r of
Nothing -> Ide [fc]
Just s -> Ide $ [fc] ++ s
where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
rest = firstChar ++ ['0'..'9']
parsePos p = (,) <$> p <*> getPosition
lbrack = parsePos $ char '[' >> return LBrack
rbrack = parsePos $ char ']' >> return RBrack
lbrace = parsePos $ char '{' >> return LBrace
rbrace = parsePos $ char '}' >> return RBrace
token = choice
[ ide
, lbrack
, rbrack
, lbrace
, rbrace
]
tokens = spaces *> many (token <* spaces)
tokenize :: SourceName -> String -> Either ParseError [TokenPos]
tokenize = runParser tokens ()
Parser:
module Parser where
import Text.Parsec as P
import Control.Monad.Identity
import Lexer
parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
case tokenize "test" s of
Left e -> putStrLn $ show e
Right ts' -> P.parseTest p ts'
tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = token show snd test
where test (t', _) = case t == t' of
False -> Nothing
True -> Just t
SOLUTION:
Ok, after fp4me's answer and reading Parsec's Char source more carefully, I ended up with this:
{-# LANGUAGE FlexibleContexts #-}
module Parser where
import Text.Parsec as P
import Control.Monad.Identity
import Lexer
parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
parseTest p s =
case tokenize "test" s of
Left e -> putStrLn $ show e
Right ts' -> P.parseTest p ts'
type Parser a = Parsec [TokenPos] () a
advance :: SourcePos -> t -> [TokenPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos
satisfy :: (TokenPos -> Bool) -> Parser Token
satisfy f = tokenPrim show
advance
(\c -> if f c then Just (fst c) else Nothing)
tok :: Token -> ParsecT [TokenPos] () Identity Token
tok t = (Parser.satisfy $ (== t) . fst) <?> show t
Now I'm getting same error messages:
ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) " asdf"
parse error at (line 1, column 1):
unexpected (Ide "asdf","test" (line 1, column 3))
expecting Ide "ok" or Ide "nop"
A beginning of solution can be to define your choice function in the Parser,
use a specific unexpected function to override unexpected error and finally
use the <?> operator to override the expecting message:
mychoice [] = mzero
mychoice (x:[]) = (tok x <|> myUnexpected) <?> show x
mychoice (x:xs) = ((tok x <|> mychoice xs) <|> myUnexpected) <?> show (x:xs)
myUnexpected = do
input <- getInput
unexpected $ (id $ first input )
where
first [] = "eof"
first (x:xs) = show $ fst x
and call your parser like that :
ghci> Parser.parseTest (mychoice [Ide "ok", Ide "nop"]) "asdf "
parse error at (line 1, column 1):
unexpected Ide "asdf"
expecting [Ide "ok",Ide "nop"]