I'm practicing writing parsers, using Tsoding's JSON parser video as a reference. I'm trying to extend it to parse arithmetic expressions of arbitrary length, and I have come up with the following AST.
data HVal
  = HInteger Integer -- No Support For Floats
  | HBool Bool
  | HNull
  | HString String
  | HChar Char
  | HList [HVal]
  | HObj [(String, HVal)]
  deriving (Show, Eq, Read)
data Op -- There's only one operator for the sake of brevity at the moment.
  = Add
  deriving (Show, Read)
newtype Parser a = Parser {
  runParser :: String -> Maybe (String, a)
}
The following functions are my attempt at implementing the operator parser.
ops :: [Char]
ops = ['+']
isOp :: Char -> Bool
isOp c = elem c ops
spanP :: (Char -> Bool) -> Parser String
spanP f = Parser $ \input -> let (token, rest) = span f input
                             in Just (rest, token)
opLiteral :: Parser String
opLiteral = spanP isOp
sOp :: String -> Op
sOp "+" = Add
sOp _ = undefined
parseOp :: Parser Op
parseOp = sOp <$> (charP '"' *> opLiteral <* charP '"')
The logic above is similar to how strings are parsed, so my assumption was that the only difference is looking specifically for an operator, rather than for arbitrary characters, between the quotation marks. It does seem to begin parsing correctly, but it then gives me the following error:
λ > runParser parseOp "\"+\""
Just ("+\"",*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
undefined, called at /DIRECTORY/parser.hs:110:11 in main:Main
I'm confused about where the error occurs. I assume it has to do with sOp, mainly because the other functions work as intended and the rest of parseOp is a direct translation of the parseString function:
stringLiteral :: Parser String
stringLiteral = spanP (/= '"')
parseString :: Parser HVal
parseString = HString <$> (charP '"' *> stringLiteral <* charP '"')
The only reason I have sOp at all is that if I replaced it with, say, Op, I would get an error saying that Op :: String -> Op doesn't exist. My expectation was that the string produced by the parsed expression would be passed into sOp, where I could return the appropriate operator. This, however, does not work, and I'm not sure how to proceed.
charP and Applicative Instance
charP :: Char -> Parser Char
charP x = Parser $ f
  where f (y:ys)
          | y == x = Just (ys, x)
          | otherwise = Nothing
        f [] = Nothing
instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  (Parser p) <*> (Parser q) = Parser $ \input -> do
    (input', f) <- p input
    (input', a) <- q input
    Just (input', f a)
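The implementation of (<*>) is the culprit. You did not pass input' to q; you passed the original input instead. As a result, every parser in the chain starts from the beginning of the string, without "eating" the characters the previous parser consumed. Here opLiteral runs on the unconsumed input, matches zero operator characters in front of the leading quote, and hands the empty string to sOp, which falls through to undefined. You can fix this with: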
instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  (Parser p) <*> (Parser q) = Parser $ \input -> do
    (input', f) <- p input
    (input'', a) <- q input'
    Just (input'', f a)
With the updated instance for Applicative, we get:
*Main> runParser parseOp "\"+\""
Just ("",Add)
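For completeness, here is a minimal self-contained sketch of the pieces above with the corrected instance. The Functor instance is an assumption on my part (it is not shown in the question), and addP is a hypothetical helper I introduce so that an unknown operator makes the parser fail with Nothing instead of reaching undefined. It is one possible approach, not the only one.

```haskell
newtype Parser a = Parser { runParser :: String -> Maybe (String, a) }

-- Assumed Functor instance: apply f to the parsed value, keep the leftover input.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \input -> do
    (rest, a) <- p input
    Just (rest, f a)

instance Applicative Parser where
  pure x = Parser $ \input -> Just (input, x)
  Parser p <*> Parser q = Parser $ \input -> do
    (input', f)  <- p input
    (input'', a) <- q input'   -- q continues where p stopped
    Just (input'', f a)

data Op = Add deriving (Show, Eq)

charP :: Char -> Parser Char
charP x = Parser f
  where
    f (y:ys) | y == x = Just (ys, x)
    f _               = Nothing

-- Hypothetical helper: succeed with Add only when the next character is '+'.
addP :: Parser Op
addP = Add <$ charP '+'

parseOp :: Parser Op
parseOp = charP '"' *> addP <* charP '"'
```

With this version, runParser parseOp "\"+\"" gives Just ("",Add), while an input such as "\"-\"" gives Nothing rather than an exception.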
I'm learning some techniques for building a very simple Haskell parser that evaluates arithmetic (addition, subtraction and other trivial operations). The library I use is Parsec. Although I have some understanding of binary operations, I find it hard to write a parser for a unary operator, for example negation (~). Here is the code snippet I use to parse multiplication:
import Text.Parsec hiding(digit)
import Data.Functor
type Parser a = Parsec String () a
digit :: Parser Char
digit = oneOf ['0'..'9']
number :: Parser Integer
number = read <$> many1 digit
applyMany :: a -> [a -> a] -> a
applyMany x [] = x
applyMany x (h:t) = applyMany (h x) t
multiplication :: Parser Integer
multiplication = do
  lhv <- number
  spaces
  char '*'
  spaces
  rhv <- number
  return $ lhv * rhv
Switching to a unary operation, my code for factorial is as follows:
fact :: Parser Integer
fact = do
  spaces
  char '!'
  rhv <- number
  spaces
  return $ factorial rhv
factorial :: Parser Integer -> Parser Integer
factorial n
  | n == 0 || n == 1 = 1
  | otherwise = n * factorial (n-1)
And once the module is loaded, the following error message appears:
Couldn't match type `Integer'
with `ParsecT String () Data.Functor.Identity.Identity Integer'
Expected type: Parser Integer
Actual type: Integer
I can't work out what I'm misunderstanding about unary operators compared with binary ones. Any help fixing this would be appreciated.
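factorial doesn't define a parser; it computes a factorial. Its type should just be Integer -> Integer, not Parser Integer -> Parser Integer: inside fact, rhv <- number binds a plain Integer, and return lifts the computed result back into the parser.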
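A sketch of the corrected code, assuming the Parser synonym and number from the question. For brevity it uses Parsec's own digit instead of the hand-written one, and it treats any n <= 1 as the base case, which is my choice rather than something from the question.

```haskell
import Text.Parsec

type Parser a = Parsec String () a

number :: Parser Integer
number = read <$> many1 digit

-- A pure function on numbers: no Parser anywhere in its type.
factorial :: Integer -> Integer
factorial n
  | n <= 1    = 1
  | otherwise = n * factorial (n - 1)

-- The parser reads "!<digits>" and applies the pure function to the result.
fact :: Parser Integer
fact = do
  spaces
  _ <- char '!'
  rhv <- number      -- rhv :: Integer
  spaces
  return $ factorial rhv
```

For example, parse fact "" "!5" should produce Right 120.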
I have been reading a tutorial about parser combinators and I came across a function that I would like some help understanding.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = item `bind` \c ->
  if p c
  then unit c
  else (Parser (\cs -> []))
char :: Char -> Parser Char
char c = satisfy (c ==)
natural :: Parser Integer
natural = read <$> some (satisfy isDigit)
string :: String -> Parser String
string [] = return []
string (c:cs) = do { char c; string cs; return (c:cs)}
My question is how the string function works, or rather how it terminates. Say I did something like:
let while_parser = string "while"
and then used it to parse a string, for example
parse while_parser "while if", it will correctly parse the "while" for me.
However, if I try something like parse while_parser "test", it will return [].
How does it fail? What happens when char c returns an empty list?
Let's say your Parser is defined like this:
newtype Parser a = Parser { runParser :: String -> [(a,String)] }
Then your Monad instance would be defined something like this:
instance Monad Parser where
  return x = Parser $ \input -> [(x, input)]
  p >>= f = Parser $ \input -> concatMap (\(x,s) -> runParser (f x) s) (runParser p input)
You're wondering what happens when char c fails in this line of code:
string (c:cs) = do { char c; string cs; return (c:cs) }
First, let's desugar it:
string (c:cs) = char c >>= \_ -> string cs >>= \_ -> return (c:cs)
Now the part of interest is char c >>= \_ -> string cs. From the definition of char and subsequently the definition of satisfy we see that ultimately runParser (char c) input will evaluate to [] when char c fails. Look at the definition of >>= when p is char c. concatMap won't have any work to do because the list will be empty! Thus any calls to >>= from then on will just encounter an empty list and pass it along.
One of the wonderful things about referential transparency is that you can write down your expression and evaluate it by substituting definitions and doing the function applications by hand.
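To try that evaluation yourself, here is a small self-contained sketch of a list-based parser in the spirit of the definitions above. The Functor and Applicative instances and the body of item are my assumptions, since the tutorial code for them is not shown here, and I use >>= and pure in place of the tutorial's bind and unit.

```haskell
newtype Parser a = Parser { runParser :: String -> [(a, String)] }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \input -> [(f x, s) | (x, s) <- p input]

instance Applicative Parser where
  pure x = Parser $ \input -> [(x, input)]
  pf <*> px = Parser $ \input ->
    [(f x, s2) | (f, s1) <- runParser pf input, (x, s2) <- runParser px s1]

instance Monad Parser where
  -- An empty result list from p means the continuation f is never run.
  p >>= f = Parser $ \input ->
    concatMap (\(x, s) -> runParser (f x) s) (runParser p input)

-- Assumed definition: consume one character, or fail on empty input.
item :: Parser Char
item = Parser $ \input -> case input of
  []     -> []
  (c:cs) -> [(c, cs)]

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = item >>= \c -> if p c then pure c else Parser (const [])

char :: Char -> Parser Char
char c = satisfy (c ==)

string :: String -> Parser String
string []     = pure []
string (c:cs) = char c >> string cs >> pure (c:cs)
```

Here runParser (string "while") "while if" yields [("while"," if")], and runParser (string "while") "test" yields [] because the very first char 'w' already produces the empty list.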
I'm implementing the precedence climbing algorithm in Haskell, but for a reason unknown to me it does not work. I think that Parsec state info is lost at some point, but I don't even know whether that is the source of the error:
module PrecedenceClimbing where
import Text.Parsec
import Text.Parsec.Char
{-
Algorithm
compute_expr(min_prec):
  result = compute_atom()
  while cur token is a binary operator with precedence >= min_prec:
    prec, assoc = precedence and associativity of current token
    if assoc is left:
      next_min_prec = prec + 1
    else:
      next_min_prec = prec
    rhs = compute_expr(next_min_prec)
    result = compute operator(result, rhs)
  return result
-}
type Precedence = Int
data Associativity = LeftAssoc
                   | RightAssoc
                   deriving (Eq, Show)
data OperatorInfo = OPInfo Precedence Associativity (Int -> Int -> Int)
mkOperator :: Char -> OperatorInfo
mkOperator = \c -> case c of
  '+' -> OPInfo 1 LeftAssoc (+)
  '-' -> OPInfo 1 LeftAssoc (-)
  '*' -> OPInfo 2 LeftAssoc (*)
  '/' -> OPInfo 2 LeftAssoc div
  '^' -> OPInfo 3 RightAssoc (^)
getPrecedence :: OperatorInfo -> Precedence
getPrecedence (OPInfo prec _ _) = prec
getAssoc :: OperatorInfo -> Associativity
getAssoc (OPInfo _ assoc _) = assoc
getFun :: OperatorInfo -> (Int -> Int -> Int)
getFun (OPInfo _ _ fun) = fun
number :: Parsec String () Int
number = do
  spaces
  fmap read $ many1 digit
operator :: Parsec String () OperatorInfo
operator = do
  spaces
  fmap mkOperator $ oneOf "+-*/^"
computeAtom = do
  spaces
  number
loop minPrec res = (do
  oper <- operator
  let prec = getPrecedence oper
  if prec >= minPrec
    then do
      let assoc = getAssoc oper
          next_min_prec = if assoc == LeftAssoc
                            then prec + 1
                            else prec
      rhs <- computeExpr(next_min_prec)
      loop minPrec $ getFun oper res rhs
    else return res) <|> (return res)
computeExpr :: Int -> Parsec String () Int
computeExpr minPrec = (do
  result <- computeAtom
  loop minPrec result) <|> (computeAtom)
getResult minPrec = parse (computeExpr minPrec) ""
For some reason my program only processes the first operation, or the first operand depending on the case, and does not go any further.
GHCi session:
*PrecedenceClimbing> getResult 1 "46+10"
Right 56
*PrecedenceClimbing> getResult 1 "46+10+1"
Right 56
I'm not sure exactly what's wrong with your code but I'll offer these comments:
(1) These statements are not equivalent:
Generic Imperative: rhs = compute_expr(next_min_prec)
Haskell: rhs <- computeExpr(next_min_prec)
The imperative call to compute_expr will always return. The Haskell call may fail, in which case the code following it never runs.
(2) You are really working against Parsec's strengths by trying to parse tokens one at a time in sequence. To see the "Parsec way" of generically parsing expressions with operators of various precedences and associativities, have a look at:
buildExpressionParser (Text.Parsec.Expr)
Parsec and Expression Printing
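As a rough illustration of point (2), here is a minimal sketch using buildExpressionParser from Text.Parsec.Expr with the same five operators as in the question. The names lexeme, table, expr and evalExpr are mine, and the table lists operators from highest to lowest precedence.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr

type Parser a = Parsec String () a

-- Hypothetical helper: run p, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

number :: Parser Int
number = lexeme (read <$> many1 digit)

-- Rows are ordered from highest precedence to lowest.
table :: OperatorTable String () Identity Int
table =
  [ [binary '^' (^) AssocRight]
  , [binary '*' (*) AssocLeft, binary '/' div AssocLeft]
  , [binary '+' (+) AssocLeft, binary '-' (-) AssocLeft]
  ]
  where
    binary c f assoc = Infix (f <$ lexeme (char c)) assoc

expr :: Parser Int
expr = buildExpressionParser table number

evalExpr :: String -> Either ParseError Int
evalExpr = parse (spaces *> expr <* eof) ""
```

With this sketch, evalExpr "46+10+1" should give Right 57, and evalExpr "2^3^2" should give Right 512 because ^ is declared right-associative.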
Update
I've posted a solution to http://lpaste.net/165651
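Independently of the linked paste (this is not a copy of it), here is a sketch of how the hand-written loop could be repaired. From tracing the question's code, one likely cause is that operator consumes the operator character before the precedence check, so when the check fails inside the recursive call the character is already gone and the outer loop never sees it. The sketch wraps the read and the check in try so that a rejected operator is left in the input. operatorAtLeast and getResult are hypothetical names, and the data types are simplified from the question.

```haskell
import Control.Monad (guard)
import Text.Parsec

type Parser a = Parsec String () a

data Associativity = LeftAssoc | RightAssoc deriving (Eq, Show)

data OperatorInfo = OPInfo Int Associativity (Int -> Int -> Int)

mkOperator :: Char -> OperatorInfo
mkOperator c = case c of
  '+' -> OPInfo 1 LeftAssoc (+)
  '-' -> OPInfo 1 LeftAssoc (-)
  '*' -> OPInfo 2 LeftAssoc (*)
  '/' -> OPInfo 2 LeftAssoc div
  _   -> OPInfo 3 RightAssoc (^)  -- '^' is the only other character oneOf lets through

number :: Parser Int
number = spaces *> (read <$> many1 digit)

-- Accept an operator only if its precedence is high enough;
-- on rejection, try rewinds so the operator stays in the input.
operatorAtLeast :: Int -> Parser OperatorInfo
operatorAtLeast minPrec = try $ do
  spaces
  op@(OPInfo prec _ _) <- mkOperator <$> oneOf "+-*/^"
  guard (prec >= minPrec)
  return op

computeExpr :: Int -> Parser Int
computeExpr minPrec = number >>= loop
  where
    loop res = step res <|> return res
    step res = do
      OPInfo prec assoc f <- operatorAtLeast minPrec
      let nextMinPrec = if assoc == LeftAssoc then prec + 1 else prec
      rhs <- computeExpr nextMinPrec
      loop (f res rhs)

getResult :: String -> Either ParseError Int
getResult = parse (computeExpr 1 <* spaces <* eof) ""
```

Under these assumptions, getResult "46+10+1" should evaluate to Right 57 instead of stopping after the first addition.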
I need to write code that parses a language. I got stuck on parsing variable names: a name can be anything that is at least one character long, starts with a lowercase letter, and may contain the underscore character '_'. I think I made a good start with the following code:
identToken :: Parser String
identToken = do
  c <- letter
  cs <- letdigs
  return (c:cs)
  where letter = satisfy isLetter
        letdigs = munch isLetter +++ munch isDigit +++ munch underscore
        num = satisfy isDigit
        underscore = \x -> x == '_'
        lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?
ident :: Parser Ident
ident = do
  _ <- skipSpaces
  s <- identToken
  skipSpaces; return $ s
idents :: Parser Command
idents = do
  skipSpaces; ids <- many1 ident
  ...
This function, however, gives me weird results. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
  case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left InvalidParse
    multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
      result <- many ident
      eof
      return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser a is just a synonym for ReadP a.
I also want to encode in the parser that variable names should start with a lowercase character.
Thank you for your help.
Part of the problem is with your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP
type Parser a = ReadP a
type Ident = String
identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)
ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s
test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result
main = print $ test_parseIdents "test_1349_zefz"
So what went wrong:
+++ is symmetric choice: it tries both of its arguments and keeps every alternative that succeeds, which is where the ambiguous results come from. <++ is left-biased, so only the left-most successful option is kept. That would remove the ambiguity in the parse, but it still leaves the next problem.
Your letdigs accepted a run of letters, or a run of digits, or a run of underscores, but never a mix of them. Digits after underscores failed, for example. The parser had to be modified to munch characters that are letters, digits or underscores.
I also removed some functions that were unused and made an educated guess for the definition of your datatypes.
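To see the difference between the two choice operators in isolation, here is a small sketch. The names symmetric and leftBiased are mine and exist only for illustration.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Symmetric choice: results from both alternatives are kept.
symmetric :: ReadP String
symmetric = munch isLetter +++ munch isDigit

-- Left-biased choice: the right side is used only if the left side yields nothing.
leftBiased :: ReadP String
leftBiased = munch isLetter <++ munch isDigit
```

On the input "ab1", readP_to_S symmetric returns two results (the letters "ab", and the empty run of digits that munch isDigit happily accepts), whereas readP_to_S leftBiased returns only ("ab","1"). Note that munch always succeeds, possibly with an empty string, so with <++ the left alternative always wins; this is why a single munch with a combined predicate is the simpler fix.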