How to parse a simple imperative language using Parsec?

I have a simple language with the following grammar:
Expr -> Var | Int | Expr Op Expr
Op -> + | - | * | / | % | == | != | < | > | <= | >= | && | ||
Stmt -> Skip | Var := Expr | Stmt ; Stmt | write Expr | read Expr | while Expr do Stmt | if Expr then Stmt else Stmt
I am writing a simple parser for this language using Haskell's Parsec library, and I am stuck on two things.
When I try to parse the statement skip ; skip I get only the first Skip, but I want to get something like Colon Skip Skip.
Also, when I try to parse an assignment, I get infinite recursion. For example, when I try to parse x := 1, my computer hangs for a long time.
Here is the full source code of my parser. Thanks for any help!
module Parser where
import Control.Monad
import Text.Parsec.Language
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token
type Id = String
data Op = Add
        | Sub
        | Mul
        | Div
        | Mod
        | Eq
        | Neq
        | Gt
        | Geq
        | Lt
        | Leq
        | And
        | Or deriving (Eq, Show)
data Expr = Var Id
          | Num Integer
          | BinOp Op Expr Expr deriving (Eq, Show)
data Stmt = Skip
          | Assign Expr Expr
          | Colon Stmt Stmt
          | Write Expr
          | Read Expr
          | WhileLoop Expr Stmt
          | IfCond Expr Stmt Stmt deriving (Eq, Show)
languageDef =
  emptyDef { Token.commentStart    = ""
           , Token.commentEnd      = ""
           , Token.commentLine     = ""
           , Token.nestedComments  = False
           , Token.caseSensitive   = True
           , Token.identStart      = letter
           , Token.identLetter     = alphaNum
           , Token.reservedNames   = [ "skip"
                                     , ";"
                                     , "write"
                                     , "read"
                                     , "while"
                                     , "do"
                                     , "if"
                                     , "then"
                                     , "else"
                                     ]
           , Token.reservedOpNames = [ "+"
                                     , "-"
                                     , "*"
                                     , "/"
                                     , ":="
                                     , "%"
                                     , "=="
                                     , "!="
                                     , ">"
                                     , ">="
                                     , "<"
                                     , "<="
                                     , "&&"
                                     , "||"
                                     ]
           }
lexer = Token.makeTokenParser languageDef
identifier = Token.identifier lexer
reserved = Token.reserved lexer
reservedOp = Token.reservedOp lexer
semi = Token.semi lexer
parens = Token.parens lexer
integer = Token.integer lexer
whiteSpace = Token.whiteSpace lexer
ifStmt :: Parser Stmt
ifStmt = do
  reserved "if"
  cond <- expression
  reserved "then"
  action1 <- statement
  reserved "else"
  action2 <- statement
  return $ IfCond cond action1 action2
whileStmt :: Parser Stmt
whileStmt = do
  reserved "while"
  cond <- expression
  reserved "do"
  action <- statement
  return $ WhileLoop cond action
assignStmt :: Parser Stmt
assignStmt = do
  var <- expression
  reservedOp ":="
  expr <- expression
  return $ Assign var expr
skipStmt :: Parser Stmt
skipStmt = do
  reserved "skip"
  return Skip
colonStmt :: Parser Stmt
colonStmt = do
  s1 <- statement
  reserved ";"
  s2 <- statement
  return $ Colon s1 s2
readStmt :: Parser Stmt
readStmt = do
  reserved "read"
  e <- expression
  return $ Read e
writeStmt :: Parser Stmt
writeStmt = do
  reserved "write"
  e <- expression
  return $ Write e
statement :: Parser Stmt
statement = colonStmt
        <|> assignStmt
        <|> writeStmt
        <|> readStmt
        <|> whileStmt
        <|> ifStmt
        <|> skipStmt
expression :: Parser Expr
expression = buildExpressionParser operators term
term = fmap Var identifier
   <|> fmap Num integer
   <|> parens expression
operators = [ [Infix (reservedOp "==" >> return (BinOp Eq)) AssocNone,
               Infix (reservedOp "!=" >> return (BinOp Neq)) AssocNone,
               Infix (reservedOp ">" >> return (BinOp Gt)) AssocNone,
               Infix (reservedOp ">=" >> return (BinOp Geq)) AssocNone,
               Infix (reservedOp "<" >> return (BinOp Lt)) AssocNone,
               Infix (reservedOp "<=" >> return (BinOp Leq)) AssocNone,
               Infix (reservedOp "&&" >> return (BinOp And)) AssocNone,
               Infix (reservedOp "||" >> return (BinOp Or)) AssocNone]
            , [Infix (reservedOp "*" >> return (BinOp Mul)) AssocLeft,
               Infix (reservedOp "/" >> return (BinOp Div)) AssocLeft,
               Infix (reservedOp "%" >> return (BinOp Mod)) AssocLeft]
            , [Infix (reservedOp "+" >> return (BinOp Add)) AssocLeft,
               Infix (reservedOp "-" >> return (BinOp Sub)) AssocLeft]
            ]
parser :: Parser Stmt
parser = whiteSpace >> statement
parseString :: String -> Stmt
parseString str =
  case parse parser "" str of
    Left e -> error $ show e
    Right r -> r

This is a common problem with parsers built from parser combinators: statement is left-recursive, because its first alternative is colonStmt, and the first thing colonStmt does is try to parse a statement again. Parser combinators are well known not to terminate in this case, since the parser calls itself without ever consuming any input.
After I removed the colonStmt alternative from the statement parser, the other parts worked as expected:
> parseString "if (1 == 1) then skip else skip"
< IfCond (BinOp Eq (Num 1) (Num 1)) Skip Skip
> parseString "x := 1"
< Assign (Var "x") (Num 1)
The solution is fully described in this repo. There is no license file, so I don't know whether it is safe to reproduce the code, but the general idea is to add another layer of parser when parsing any statement:
statement :: Parser Stmt
statement = do
  ss <- sepBy1 statement' (reserved ";")
  if length ss == 1
    then return $ head ss
    else return $ foldr1 Colon ss
statement' :: Parser Stmt
statement' = assignStmt
         <|> writeStmt
         <|> readStmt
         <|> whileStmt
         <|> ifStmt
         <|> skipStmt
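For reference, here is a minimal, self-contained sketch of the same two-layer idea. It is not the code from the repo mentioned above: the AST is cut down to three constructors, assignments only accept an integer literal, and the helpers lexeme, symbol, simpleStmt and program are names invented for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A cut-down AST (assumption): assignments take an integer literal only.
data Stmt
  = Skip
  | Assign String Integer
  | Colon Stmt Stmt
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Match a fixed token; 'try' rewinds if only a prefix matched.
symbol :: String -> Parser String
symbol = lexeme . try . string

skipStmt :: Parser Stmt
skipStmt = Skip <$ lexeme (try (string "skip" <* notFollowedBy alphaNum))

assignStmt :: Parser Stmt
assignStmt =
  Assign <$> lexeme (many1 letter) <* symbol ":=" <*> (read <$> lexeme (many1 digit))

-- Inner layer: no alternative starts by calling 'statement',
-- so there is no left recursion.
simpleStmt :: Parser Stmt
simpleStmt = skipStmt <|> assignStmt

-- Outer layer: one or more simple statements separated by ';'.
-- foldr1 returns a lone statement unchanged, so no length check is needed.
statement :: Parser Stmt
statement = foldr1 Colon <$> sepBy1 simpleStmt (symbol ";")

-- Whole-input parser: leading whitespace, a statement, then end of input.
program :: Parser Stmt
program = spaces *> statement <* eof
```

With these definitions, parse program "" "skip ; skip" should produce Colon Skip Skip, and longer sequences nest to the right because of foldr1.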

Related

Expression Evaluation using combinators in Haskell

I'm trying to make an expression evaluator in Haskell:
data Parser i o
  = Success o [i]
  | Failure String [i]
  | Parser
      {parse :: [i] -> Parser i o}
data Operator = Add | Sub | Mul | Div | Pow
data Expr
  = Op Operator Expr Expr
  | Val Double
expr :: Parser Char Expr
expr = add_sub
  where
    add_sub = calc Add '+' mul_div <|> calc Sub '-' mul_div <|> mul_div
    mul_div = calc Mul '*' pow <|> calc Div '/' pow <|> pow
    pow = calc Pow '^' factor <|> factor
    factor = parens <|> val
    val = Val <$> parseDouble
    parens = parseChar '(' *> expr <* parseChar ')'
    calc c o p = Op c <$> (p <* parseChar o) <*> p
My problem is that when I try to evaluate an expression with two operators of the same priority (e.g. 1+1-1), the parser fails.
How can I say that an add_sub can be an operation between two other add_subs without creating an infinite loop?
As explained by @chi, the problem is that calc was using p twice, which doesn't allow for patterns like mul_div + ... | mul_div - ... | ...
I just changed the definition of calc to:
calc c o p p2 = Op c <$> (p <* parseChar o) <*> p2
where p2 is the current priority level (mul_div in the definition of mul_div).
It works much better, but the order of calculations is backwards:
2/3/4 is parsed as 2/(3/4) instead of (2/3)/4.
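One common way to get left associativity without left recursion is to parse one operand followed by zero or more (operator, operand) pairs, and then fold the pairs from the left. The sketch below shows that idea using Parsec rather than the hand-written Parser type from the question, so it is an illustration of the technique and not a drop-in fix. The helper names leftAssoc and binary are invented here, Pow is left out, and numbers are whole numbers only. Parsec's own chainl1 combinator follows the same pattern.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Operator = Add | Sub | Mul | Div deriving (Eq, Show)

data Expr
  = Op Operator Expr Expr
  | Val Double
  deriving (Eq, Show)

-- Parse "operand (op operand)*" and fold the pieces from the left,
-- so "a op b op c" becomes ((a op b) op c).
leftAssoc :: Parser Expr -> Parser (Expr -> Expr -> Expr) -> Parser Expr
leftAssoc operand op = do
  first <- operand
  rest <- many ((,) <$> op <*> operand)
  return (foldl (\acc (f, x) -> f acc x) first rest)

-- A single-character binary operator that builds an Op node.
binary :: Char -> Operator -> Parser (Expr -> Expr -> Expr)
binary c o = Op o <$ char c

expr :: Parser Expr
expr = leftAssoc term (binary '+' Add <|> binary '-' Sub)

term :: Parser Expr
term = leftAssoc factor (binary '*' Mul <|> binary '/' Div)

factor :: Parser Expr
factor = between (char '(') (char ')') expr <|> number

-- Whole numbers only, to keep the sketch short (assumption).
number :: Parser Expr
number = Val . read <$> many1 digit
```

Because the fold runs from the left, 2/3/4 comes out as Op Div (Op Div (Val 2) (Val 3)) (Val 4), and no parser ever calls itself before consuming input.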

Record parsing in Haskell

I am building a parser using Megaparsec and I don't know which is the best approach to parse a structure like
names a b c
surnames d e f g
where names and surnames are keywords followed by a list of strings, and each of the two lines is optional. This means that also
names a b c
and
surnames d e f g
are valid.
I can parse every line with something like
maybeNames <- optional $ do
  constant "names"
  many identifier
where identifier parses a valid non-reserved string.
Now, I'm not sure how to express that each line is optional while still retrieving its value when it is present.
Start with writing the context free grammar for your format:
program ::= lines
lines ::= line | line lines
line ::= names | surnames
names ::= NAMES ids
surnames ::= SURNAMES ids
ids ::= id | id ids
id ::= STRING
Where upper case names are for terminals,
and lower case names are for non terminals.
You could then easily use Alex + Happy to parse your text file.
You can do something similar to what appears in this guide and use <|> to select optional arguments. Here is the essence:
whileParser :: Parser Stmt
whileParser = between sc eof stmt
stmt :: Parser Stmt
stmt = f <$> sepBy1 stmt' semi
  where
    -- if there's only one stmt return it without using ‘Seq’
    f l = if length l == 1 then head l else Seq l
stmt' = ifStmt
  <|> whileStmt
  <|> skipStmt
  <|> assignStmt
  <|> parens stmt
ifStmt :: Parser Stmt
ifStmt = do
  rword "if"
  cond <- bExpr
  rword "then"
  stmt1 <- stmt
  rword "else"
  stmt2 <- stmt
  return (If cond stmt1 stmt2)
whileStmt :: Parser Stmt
whileStmt = do
  rword "while"
  cond <- bExpr
  rword "do"
  stmt1 <- stmt
  return (While cond stmt1)
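For the original question about two optional lines, a more direct sketch is to wrap each line parser in an "optional" combinator and pair the results. The example below uses Parsec's optionMaybe instead of Megaparsec's optional, purely so that it is self-contained; the shape should carry over. It assumes that names, when present, comes before surnames, that identifiers are plain alphanumeric words, and that each line ends at a newline or at the end of input. The names lexeme, recordLine and record are made up for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip spaces and tabs only, so a newline still ends a record line.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipMany (oneOf " \t")

-- Assumption: identifiers are plain alphanumeric words.
identifier :: Parser String
identifier = lexeme (many1 alphaNum)

-- One "keyword id id ..." line, ended by a newline or the end of input.
-- 'try' makes the keyword fail without consuming input when it is absent,
-- which is what lets optionMaybe fall back to Nothing.
recordLine :: String -> Parser [String]
recordLine kw =
  lexeme (try (string kw <* notFollowedBy alphaNum))
    *> many identifier
    <* ((() <$ endOfLine) <|> eof)

-- Each line is optional; a missing line yields Nothing.
record :: Parser (Maybe [String], Maybe [String])
record =
  (,) <$> optionMaybe (recordLine "names")
      <*> optionMaybe (recordLine "surnames")
      <* eof
```

Stopping the whitespace skipper at newlines is the design choice that matters here: if it also skipped newlines, many identifier on the names line would swallow the word surnames from the next line.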

Parsing issue with parens. Parsec - Haskell

This is my code:
expr :: Parser Integer
expr = buildExpressionParser table factor <?> "expression"
table :: [[ Operator Char st Integer ]]
table = [
[ op "*" (*) AssocLeft],
[ op "+" (+) AssocLeft]
]
where
op s f assoc = Infix (do { string s ; return f }) assoc
factor = do { char '(' ; x <- expr ; char ')' ; return x }
<|> number
<?> "simple expression"
number :: Parser Integer
number = do { ds <- many1 digit; return (read ds) } <?> "number"
This works perfectly with expressions like this: (10+10) * 10.
But I have a problem with the following: 10 +10)
This has to return a parsing error (just 1 parenthesis at the end), but instead it returns 20.
How to fix this?
Thanks!
You'll need to use the eof parser to make sure you've read the whole input:
myParser = do
parsed <- expr
eof
return parsed
If you're using Control.Applicative this can be simplified to:
myParser = expr <* eof
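To see the difference, here is a small self-contained sketch written against the current Text.Parsec modules (the question uses the older Operator Char st type). The name strictExpr is invented for this example, and whitespace handling is left out, so the inputs contain no spaces.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- Operator table: '*' binds tighter than '+', both left-associative.
table :: OperatorTable String () Identity Integer
table =
  [ [Infix ((*) <$ char '*') AssocLeft]
  , [Infix ((+) <$ char '+') AssocLeft]
  ]

number :: Parser Integer
number = read <$> many1 digit

factor :: Parser Integer
factor = between (char '(') (char ')') expr <|> number

-- Succeeds as soon as a complete expression has been read,
-- even if unparsed input remains.
expr :: Parser Integer
expr = buildExpressionParser table factor

-- Same grammar, but the whole input must be consumed.
strictExpr :: Parser Integer
strictExpr = expr <* eof
```

Running parse expr "" "10+10)" gives Right 20 because the parser simply stops before the stray parenthesis, while parse strictExpr "" "10+10)" gives a Left with a parse error at that position.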

How do I create a Parser data?

I am trying to learn how can I do a parser for expressions in Haskell and I found this code (below), but I don't even know how to use it.
I tried with: expr (Add (Num 5) (Num 2)) , but it needs a "Parser" data type.
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Expr
import Text.Parsec.Token
import Text.Parsec.Language
data Expr = Num Int | Var String | Add Expr Expr | Sub Expr Expr | Mul Expr Expr | Div Expr Expr deriving Show
expr :: Parser Expr
expr = buildExpressionParser table factor
<?> "expression"
table = [[op "*" Mul AssocLeft, op "/" Div AssocLeft],
[op "+" Add AssocLeft, op "-" Sub AssocLeft]]
where
op s f assoc = Infix (do{ string s; return f}) assoc
factor = do{ char '('
; x <- expr
; char ')'
; return x}
<|> number
<|> variable
<?> "simple expression"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (Num (read ds))}
<?> "number"
variable :: Parser Expr
variable = do{ ds<- many1 letter
; return (Var ds)}
<?> "variable"
Solution: readExpr input = parse expr "name for error messages" input
and use readExpr.
You can use the function parse, which runs a Parser on an input string and returns an Either ParseError Expr. Below is a simple usage example where I turn the ParseError into a String and pass it along:
readExpr :: String -> Either String Expr
readExpr input = case parse expr "name for error messages" input of
  Left err -> Left $ "Oh noes parsers are failing: " ++ show err -- Handle error
  Right a -> Right a -- Handle success
There are a few other functions, such as parseFromFile, that provide shorthand for common patterns. To find them, check out the Parsec Haddock documentation.

parser in haskell error

I am supposed to make a parser for a language with the following grammar:
Program ::= Stmts "return" Expr ";"
Stmts ::= Stmt Stmts
| ε
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= Choice Choices
| Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Shift "<<" integer
| Shift ">>" integer
| Term
Term ::= Term "+" Prod
| Term "-" Prod
| Prod
Prod ::= Prod "*" Prim
| Prim
Prim ::= ident
| integer
| "(" Expr ")"
With the following data type for Expr:
data Expr = Var Ident
| Val Int
| Lshift Expr Int
| Rshift Expr Int
| Plus Expr Expr
| Minus Expr Expr
| Mult Expr Expr
deriving (Eq, Show, Read)
My problem is implementing the Shift operator, because I get the following error when I encounter a left or right shift:
unexpected ">"
expecting operator or ";"
Here is the code I have for Expr:
expr = try (exprOp)
   <|> exprShift
exprOp = buildExpressionParser arithmeticalOps prim <?> "arithmetical expression"
prim :: Parser Expr
prim = new_ident <|> new_integer <|> pE <?> "primitive expression"
  where
    new_ident = do {i <- ident; return $ Var i }
    new_integer = do {i <- first_integer; return $ Val i }
    pE = parens expr
arithmeticalOps = [ [binary "*" Mult AssocLeft],
                    [binary "+" Plus AssocLeft, binary "-" Minus AssocLeft]
                  ]
binary name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
exprShift =
  do
    e <- expr
    a <- aShift
    i <- first_integer
    return $ a e i
aShift = (reservedOp "<<" >> return Lshift)
     <|> (reservedOp ">>" >> return Rshift)
I suspect the problem is concerning lookahead, but I can't seem to figure it out.
Here's a grammar with left recursion eliminated (untested). Stmts and Choices can be simplified with Parsec's many and many1. The other recursive productions have to be expanded:
Program ::= Stmts "return" Expr ";"
Stmts ::= #many# Stmt
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= #many1# Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Term ShiftRest
ShiftRest ::= <empty>
| "<<" integer
| ">>" integer
Term ::= Prod TermRest
TermRest ::= <empty>
| "+" Term
| "-" Term
Prod ::= Prim ProdRest
ProdRest ::= <empty>
| "*" Prod
Prim ::= ident
| integer
| "(" Expr ")"
Edit - "Part Two"
"empty" (in angles) is the empty production, you were using epsilon in the original post, but I don't know its Unicode code point and didn't think to copy-paste it.
Here's an example of how I would code the grammar. Note that, unlike in the grammar I posted, the empty alternatives must always be the last choice, to give the other productions a chance to match. Also, your datatypes and constructors for the abstract syntax tree probably differ from the guesses I've made, but it should be fairly clear what's going on. The code is untested; hopefully any errors are obvious:
shift :: Parser Expr
shift = do
  t <- term
  leftShift t <|> rightShift t <|> emptyShift t
-- Note - this gets an Expr passed in - it is the "prefix"
-- of the shift production.
--
leftShift :: Expr -> Parser Expr
leftShift t = do
  reservedOp "<<"
  i <- int
  return (LShift t i)
-- Again this gets an Expr passed in.
--
rightShift :: Expr -> Parser Expr
rightShift t = do
  reservedOp ">>"
  i <- int
  return (RShift t i)
-- The empty version does no parsing.
-- Usually I would change the definition of "shift"
-- and not bother defining "emptyShift", the last
-- line of "shift" would then be:
--
-- > leftShift t <|> rightShift t <|> return t
--
emptyShift :: Expr -> Parser Expr
emptyShift t = return t
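The rewritten grammar above accepts at most one shift per expression, while the original left-recursive rule Shift ::= Shift "<<" integer also allows chains such as x << 2 >> 1. A possible way to keep that, sketched below under some assumptions, is to parse one term and then apply zero or more shift suffixes from left to right. The sketch uses plain Text.Parsec helpers (lexeme, symbol, int and shiftSuffix are names invented here), reuses the constructor names from the question, and only implements "+" at the term level to stay short.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr
  = Var String
  | Val Int
  | Lshift Expr Int
  | Rshift Expr Int
  | Plus Expr Expr
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Match a fixed token; 'try' rewinds if only a prefix matched.
symbol :: String -> Parser String
symbol = lexeme . try . string

int :: Parser Int
int = read <$> lexeme (many1 digit)

prim :: Parser Expr
prim =
  Var <$> lexeme (many1 letter)
    <|> Val <$> int
    <|> between (symbol "(") (symbol ")") shift

-- Left-associative "+" via chainl1 (only "+" in this sketch).
term :: Parser Expr
term = prim `chainl1` (Plus <$ symbol "+")

-- One "<< n" or ">> n" suffix, returned as a function awaiting its left operand.
shiftSuffix :: Parser (Expr -> Expr)
shiftSuffix =
  flip Lshift <$> (symbol "<<" *> int)
    <|> flip Rshift <$> (symbol ">>" *> int)

-- Shift ::= Term (("<<" | ">>") integer)*, applied left to right.
shift :: Parser Expr
shift = do
  t <- term
  suffixes <- many shiftSuffix
  return (foldl (\e f -> f e) t suffixes)
```

Since many shiftSuffix may match nothing, a plain term is still accepted, which plays the role of the empty alternative in the grammar.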
Parsec is still Greek to me, but my vague guess is that aShift should use try.
The parsec docs on Hackage have an example explaining the use of try with <|> that might help you out.
