I have made the following parser to try to parse BNF:
type Literal = Literal of string
type RuleName = RuleName of string
type Term = Literal of Literal
| RuleName of RuleName
type List = List of Term list
type Expression = Expression of List list
type Rule = Rule of RuleName * Expression
type BNF = Syntax of Rule list
let pBFN : Parser<BNF, unit> =
let pWS = skipMany (pchar ' ')
let pLineEnd = skipMany1 (pchar ' ' >>. newline)
let pLiteral =
let pL c = between (pchar c) (pchar c) (manySatisfy (isNoneOf ("\n" + string c)))
(pL '"') <|> (pL '\'') |>> Literal.Literal
let pRuleName = between (pchar '<') (pchar '>') (manySatisfy (isNoneOf "\n<>")) |>> RuleName.RuleName
let pTerm = (pLiteral |>> Term.Literal) <|> (pRuleName |>> Term.RuleName)
let pList = sepBy1 pTerm pWS |>> List.List
let pExpression = sepBy1 pList (pWS >>. (pchar '|') .>> pWS) |>> Expression.Expression
let pRule = pWS >>. pRuleName .>> pWS .>> pstring "::=" .>> pWS .>>. pExpression .>> pLineEnd |>> Rule.Rule
many1 pRule |>> BNF.Syntax
For testing, I'm running it on BNF's BNF as per Wikipedia:
<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression> ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | "<" <rule-name> ">"
<literal> ::= '"' <text> '"' | "'" <text> "'"
But it always fails with this error:
Error in Ln: 1 Col: 21
<syntax> ::= <rule> | <rule> <syntax>
^
Expecting: ' ', '"', '\'' or '<'
What am I doing wrong?
Edit
The function I'm using to test:
let test =
let text = "<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> \"<\" <rule-name> \">\" <opt-whitespace> \"::=\" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= \" \" <opt-whitespace> | \"\"
<expression> ::= <list> | <list> <opt-whitespace> \"|\" <opt-whitespace> <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | \"<\" <rule-name> \">\"
<literal> ::= '\"' <text> '\"' | \"'\" <text> \"'\""
run pBNF text
Your first problem is with pList: sepBy1 is greedily grabbing trailing spaces, but once it does that it then expects an additional term to follow rather than the end of the list. The simplest way to fix this is to use sepEndBy1 instead.
This will expose your next problem: pEndLine isn't faithfully implemented because you're always looking for exactly one space followed by a newline, when you should be looking for any number of spaces instead (that is, you want pWS >>. newline in the interior, rather than pchar ' ' >>. newline).
Finally, note that your definition requires each rule to end with a newline, so you won't be able to parse your string as given (you'll need to append an empty line to the end). Instead you might want to pull newline out of your definition of pRule and define the main parser as sepBy1 pRule pLineEnd |>> BNF.Syntax.
Related
I have the following grammar rules:
%precedence KW2
%left "or"
%left "and"
%left "==" "!=" ">=" ">" "<=" "<"
%left "-" "+"
%left "/" "*"
%start statement1
%%
param
: id
| id ":" expr // Conflict is caused by this line
| id "=" expr
;
param_list
: param_list "," param
| param
;
defparam
: param_list "," "/"
| param_list "," "/" ","
;
param_arg_list
: defparam param_list
| param_list
;
statement1
: KEYWORD1 "(" param_arg_list ")" ":" expr {}
expression1
: KEYWORD2 param_arg_list ":" expr %prec KW2 {} // This causes shift/reduce conflicts
expr
: id
| expr "+" expr
| expr "-" expr
| expr "*" expr
| expr "/" expr
| expr "==" expr
| expr "!=" expr
| expr "<" expr
| expr "<=" expr
| expr ">" expr
| expr ">=" expr
| expr "and" expr
| expr "or" expr
| expression1
id
: TK_NAME {}
.output
State 33
12 param: id . [":", ",", ")"]
13 | id . ":" expr
14 | id . "=" expr
":" shift, and go to state 55
"=" shift, and go to state 56
":" [reduce using rule 12 (param)]
$default reduce using rule 12 (param)
The problem here is that, For the expression1, id ":" expr rule in param is not required, so If I remove id ":" expr, the conflicts are resolved. But, I can not remove id ":" expr rule in param, because statement1 requires it.
I wanted to use para_arg_list for statement1 and expression1 is that, it simplifies the grammar rules by not allowing to use the grammar rules again and again.
My question is that is there any other way to resolve the conflict?
I have a simple language with following grammar
Expr -> Var | Int | Expr Op Expr
Op -> + | - | * | / | % | == | != | < | > | <= | >= | && | ||
Stmt -> Skip | Var := Expr | Stmt ; Stmt | write Expr | read Expr | while Expr do Stmt | if Expr then Stmt else Stmt
I am writing simple parser for this language using Haskell's Parsec library and i am stuck with some things
When i try to parse statement skip ; skip i get only first Skip, however i want go get something like Colon Skip Skip
Also when i try to parse the assignment, i get an infinite recursion. For example, when i try to parse x := 1 my computer hangs up for long time.
Here is full source code of my parser. Thanks for any help!
module Parser where
import Control.Monad
import Text.Parsec.Language
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token as Token
type Id = String
data Op = Add
| Sub
| Mul
| Div
| Mod
| Eq
| Neq
| Gt
| Geq
| Lt
| Leq
| And
| Or deriving (Eq, Show)
data Expr = Var Id
| Num Integer
| BinOp Op Expr Expr deriving (Eq, Show)
data Stmt = Skip
| Assign Expr Expr
| Colon Stmt Stmt
| Write Expr
| Read Expr
| WhileLoop Expr Stmt
| IfCond Expr Stmt Stmt deriving (Eq, Show)
languageDef =
emptyDef { Token.commentStart = ""
, Token.commentEnd = ""
, Token.commentLine = ""
, Token.nestedComments = False
, Token.caseSensitive = True
, Token.identStart = letter
, Token.identLetter = alphaNum
, Token.reservedNames = [ "skip"
, ";"
, "write"
, "read"
, "while"
, "do"
, "if"
, "then"
, "else"
]
, Token.reservedOpNames = [ "+"
, "-"
, "*"
, "/"
, ":="
, "%"
, "=="
, "!="
, ">"
, ">="
, "<"
, "<="
, "&&"
, "||"
]
}
lexer = Token.makeTokenParser languageDef
identifier = Token.identifier lexer
reserved = Token.reserved lexer
reservedOp = Token.reservedOp lexer
semi = Token.semi lexer
parens = Token.parens lexer
integer = Token.integer lexer
whiteSpace = Token.whiteSpace lexer
ifStmt :: Parser Stmt
ifStmt = do
reserved "if"
cond <- expression
reserved "then"
action1 <- statement
reserved "else"
action2 <- statement
return $ IfCond cond action1 action2
whileStmt :: Parser Stmt
whileStmt = do
reserved "while"
cond <- expression
reserved "do"
action <- statement
return $ WhileLoop cond action
assignStmt :: Parser Stmt
assignStmt = do
var <- expression
reservedOp ":="
expr <- expression
return $ Assign var expr
skipStmt :: Parser Stmt
skipStmt = do
reserved "skip"
return Skip
colonStmt :: Parser Stmt
colonStmt = do
s1 <- statement
reserved ";"
s2 <- statement
return $ Colon s1 s2
readStmt :: Parser Stmt
readStmt = do
reserved "read"
e <- expression
return $ Read e
writeStmt :: Parser Stmt
writeStmt = do
reserved "write"
e <- expression
return $ Write e
statement :: Parser Stmt
statement = colonStmt
<|> assignStmt
<|> writeStmt
<|> readStmt
<|> whileStmt
<|> ifStmt
<|> skipStmt
expression :: Parser Expr
expression = buildExpressionParser operators term
term = fmap Var identifier
<|> fmap Num integer
<|> parens expression
operators = [ [Infix (reservedOp "==" >> return (BinOp Eq)) AssocNone,
Infix (reservedOp "!=" >> return (BinOp Neq)) AssocNone,
Infix (reservedOp ">" >> return (BinOp Gt)) AssocNone,
Infix (reservedOp ">=" >> return (BinOp Geq)) AssocNone,
Infix (reservedOp "<" >> return (BinOp Lt)) AssocNone,
Infix (reservedOp "<=" >> return (BinOp Leq)) AssocNone,
Infix (reservedOp "&&" >> return (BinOp And)) AssocNone,
Infix (reservedOp "||" >> return (BinOp Or)) AssocNone]
, [Infix (reservedOp "*" >> return (BinOp Mul)) AssocLeft,
Infix (reservedOp "/" >> return (BinOp Div)) AssocLeft,
Infix (reservedOp "%" >> return (BinOp Mod)) AssocLeft]
, [Infix (reservedOp "+" >> return (BinOp Add)) AssocLeft,
Infix (reservedOp "-" >> return (BinOp Sub)) AssocLeft]
]
parser :: Parser Stmt
parser = whiteSpace >> statement
parseString :: String -> Stmt
parseString str =
case parse parser "" str of
Left e -> error $ show e
Right r -> r`
It's a common problem of parsers based on parser combinator: statement is left-recursive as its first pattern is colonStmt, and the first thing colonStmt will do is try parsing a statement again. Parser combinators are well-known won't terminate in this case.
Removed the colonStmt pattern from statement parser and the other parts worked appropriately:
> parseString "if (1 == 1) then skip else skip"
< IfCond (BinOp Eq (Num 1) (Num 1)) Skip Skip
> parseString "x := 1"
< Assign (Var "x") (Num 1)
The solution is fully described in this repo, there's no license file so I don't really know if it's safe to refer to the code, the general idea is to add another layer of parser when parsing any statement:
statement :: Parser Stmt
statement = do
ss <- sepBy1 statement' (reserved ";")
if length ss == 1
then return $ head ss
else return $ foldr1 Colon ss
statement' :: Parser Stmt
statement' = assignStmt
<|> writeStmt
<|> readStmt
<|> whileStmt
<|> ifStmt
<|> skipStmt
I'm writing a GoldParser VBScript grammar. In my grammar array assignment statements such as id(1) = 2 are not parsed as assignment statements, but as call statements id ((1) = 2) (the = symbol can be both the assignment operator and the comparison operator). How can I change the following grammar to correctly parse array assignment statements?
<CallStmt> ::= 'Call' <CallExpr>
| '.' <CallPath>
| <CallPath>
<AssignStmt> ::= <CallExpr> '=' <Expr>
| 'Set' <CallExpr> '=' <Expr>
| 'Set' <CallExpr> '=' 'New' <CtorPath>
<CtorPath> ::= IDDot <CtorPath>
| <Member>
<CallPath> ::= <MemberDot> <CallPath>
| ID '(' ')'
| ID <ParameterList>
<CallExpr> ::= '.' <MemberPath>
| <MemberPath>
<MemberPath> ::= <MemberDot> <MemberPath>
| <Member>
<Member> ::= ID
| ID '(' <ParameterList> ')'
<MemberDot> ::= IDDot
| ID '(' <ParameterList> ').'
!VBScript allows to skip parameters a(1,,2)
<ParameterList> ::= <Expr> ',' <ParameterList>
| ',' <ParameterList>
| <Expr>
|
! Value can be reduced from <Expr>
<Value> ::= NumberLiteral
| StringLiteral
| <CallExpr>
| '(' <Expr> ')'
!--- The rest of the grammar ---
"Start Symbol" = <Start>
{WS} = {Whitespace} - {CR} - {LF}
{ID Head} = {Letter} + [_]
{ID Tail} = {Alphanumeric} + [_]
{String Chars} = {Printable} + {HT} - ["]
Whitespace = {WS}+
NewLine = {CR}{LF} | {CR} | {LF}
ID = {ID Head}{ID Tail}*
IDDot = {ID Head}{ID Tail}* '.'
StringLiteral = ('"' {String Chars}* '"')+
NumberLiteral = {Number}+ ('.' {Number}+ )?
<nl> ::= NewLine <nl> !One or more
| NewLine
<nl Opt> ::= NewLine <nl Opt> !Zero or more
| !Empty
<Start> ::= <nl opt> <StmtList>
<StmtList> ::= <CallStmt> <nl> <StmtList>
| <AssignStmt> <nl> <StmtList>
|
<Expr> ::= <Compare Exp>
<Compare Exp> ::= <Compare Exp> '=' <Add Exp>
| <Add Exp>
<Add Exp> ::= <Add Exp> '+' <Mult Exp>
| <Add Exp> '-' <Mult Exp>
| <Mult Exp>
<Mult Exp> ::= <Mult Exp> '*' <Negate Exp>
| <Mult Exp> '/' <Negate Exp>
| <Negate Exp>
<Negate Exp> ::= '-' <Value>
| <Value>
Note: I added the IDDot terminal to parse statements within With correctly, e.g: .obj.sub .obj.par1.
I am writing a simple parser in bison. The parser checks whether a program has any syntax errors with respect to my following grammar:
%{
#include <stdio.h>
void yyerror (const char *s) /* Called by yyparse on error */
{
printf ("%s\n", s);
}
%}
%token tNUM tINT tREAL tIDENT tINTTYPE tREALTYPE tINTMATRIXTYPE
%token tREALMATRIXTYPE tINTVECTORTYPE tREALVECTORTYPE tTRANSPOSE
%token tIF tENDIF tDOTPROD tEQ tNE tGTE tLTE tGT tLT tOR tAND
%left "(" ")" "[" "]"
%left "<" "<=" ">" ">="
%right "="
%left "+" "-"
%left "*" "/"
%left "||"
%left "&&"
%left "==" "!="
%% /* Grammar rules and actions follow */
prog: stmtlst ;
stmtlst: stmt | stmt stmtlst ;
stmt: decl | asgn | if;
decl: type vars "=" expr ";" ;
type: tINTTYPE | tINTVECTORTYPE | tINTMATRIXTYPE | tREALTYPE | tREALVECTORTYPE
| tREALMATRIXTYPE ;
vars: tIDENT | tIDENT "," vars ;
asgn: tIDENT "=" expr ";" ;
if: tIF "(" bool ")" stmtlst tENDIF ;
expr: tIDENT | tINT | tREAL | vectorLit | matrixLit | expr "+" expr| expr "-" expr
| expr "*" expr | expr "/" expr| expr tDOTPROD expr | transpose ;
transpose: tTRANSPOSE "(" expr ")" ;
vectorLit: "[" row "]" ;
matrixLit: "[" row ";" rows "]" ;
row: value | value "," row ;
rows: row | row ";" rows ;
value: tINT | tREAL | tIDENT ;
bool: comp | bool tAND bool | bool tOR bool ;
comp: expr relation expr ;
relation: tGT | tLT | tGTE | tLTE | tNE | tEQ ;
%%
int main ()
{
if (yyparse()) {
// parse error
printf("ERROR\n");
return 1;
}
else {
// successful parsing
printf("OK\n");
return 0;
}
}
The code may look long and complicated, but i think what i am going to ask does not need the full code, but in any case i preferred to write the code. I am sure my grammar is correct, but ambiguous. When i try to create the executable of the program by writing "bison -d filename.y", i get an error saying that conflicts: 13 shift/reduce. I defined the precedence of the operators at the beginning of this file, and i tried a lot of combinations of these precedences, but i still get this error. How can i remove this ambiguity? Thank you
tOR, tAND, and tDOTPROD need to have their precedence specified as well.
I am supposed to make a parser for a language with the following grammar:
Program ::= Stmts "return" Expr ";"
Stmts ::= Stmt Stmts
| ε
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= Choice Choices
| Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Shift "<<" integer
| Shift ">>" integer
| Term
Term ::= Term "+" Prod
| Term "-" Prod
| Prod
Prod ::= Prod "*" Prim
| Prim
Prim ::= ident
| integer
| "(" Expr ")"
With the following data type for Expr:
data Expr = Var Ident
| Val Int
| Lshift Expr Int
| Rshift Expr Int
| Plus Expr Expr
| Minus Expr Expr
| Mult Expr Expr
deriving (Eq, Show, Read)
My problem is implementing the Shift operator, because I get the following error when I encounter a left or right shift:
unexpected ">"
expecting operator or ";"
Here is the code I have for Expr:
expr = try (exprOp)
<|> exprShift
exprOp = buildExpressionParser arithmeticalOps prim <?> "arithmetical expression"
prim :: Parser Expr
prim = new_ident <|> new_integer <|> pE <?> "primitive expression"
where
new_ident = do {i <- ident; return $ Var i }
new_integer = do {i <- first_integer; return $ Val i }
pE = parens expr
arithmeticalOps = [ [binary "*" Mult AssocLeft],
[binary "+" Plus AssocLeft, binary "-" Minus AssocLeft]
]
binary name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
exprShift =
do
e <- expr
a <- aShift
i <- first_integer
return $ a e i
aShift = (reservedOp "<<" >> return Lshift)
<|> (reservedOp ">>" >> return Rshift)
I suspect the problem is concerning lookahead, but I can't seem to figure it out.
Here's a grammar with left recursion eliminated (untested). Stmts and Choices can be simplified with Parsec's many and many1. The other recursive productions have to be expanded:
Program ::= Stmts "return" Expr ";"
Stmts ::= #many# Stmt
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= #many1# Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Term ShiftRest
ShiftRest ::= <empty>
| "<<" integer
| ">>" integer
Term ::= Prod TermRest
TermRest ::= <empty>
| "+" Term
| "-" Term
Prod ::= Prim ProdRest
ProdRest ::= <empty>
| "*" Prod
Prim ::= ident
| integer
| "(" Expr ")"
Edit - "Part Two"
"empty" (in angles) is the empty production, you were using epsilon in the original post, but I don't know its Unicode code point and didn't think to copy-paste it.
Here's an example of how I would code the grammar. Note - unlike the grammar I posted empty versions must always be the last choice to give the other productions chance to match. Also your datatypes and constructors for the Abstract Syntax Tree probably differ to the the guesses I've made, but it should be fairly clear what's going on. The code is untested - hopefully any errors are obvious:
shift :: Parser Expr
shift = do
t <- term
leftShift t <|> rightShift <|> emptyShift t
-- Note - this gets an Expr passed in - it is the "prefix"
-- of the shift production.
--
leftShift :: Expr -> Parser Expr
leftShift t = do
reservedOp "<<"
i <- int
return (LShift t i)
-- Again this gets an Expr passed in.
--
rightShift :: Expr -> Parser Expr
rightShift t = do
reservedOp ">>"
i <- int
return (RShift t i)
-- The empty version does no parsing.
-- Usually I would change the definition of "shift"
-- and not bother defining "emptyShift", the last
-- line of "shift" would then be:
--
-- > leftShift t <|> rightShift t <|> return t
--
emptyShift :: Expr -> Parser Expr
emptyShift t = return t
Parsec is still Greek to me, but my vague guess is that aShift should use try.
The parsec docs on Hackage have an example explaining the use of try with <|> that might help you out.