ANTLR 3 bug, mismatched input, but what's wrong? - parsing

I have the following problem:
My ANTLR 3 grammar compiles, but my simple test program doesn't work. The grammar is as follows:
grammar Rietse;
options {
k=1;
language=Java;
output=AST;
}
tokens {
COLON = ':' ;
SEMICOLON = ';' ;
OPAREN = '(' ;
CPAREN = ')' ;
COMMA = ',' ;
OCURLY = '{' ;
CCURLY = '}' ;
SINGLEQUOTE = '\'' ;
// operators
BECOMES = '=' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
MODULO = '%' ;
EQUALS = '==' ;
LT = '<' ;
LTE = '<=' ;
GT = '>' ;
GTE = '>=' ;
UNEQUALS = '!=' ;
AND = '&&' ;
OR = '||' ;
NOT = '!' ;
// keywords
PROGRAM = 'program' ;
COMPOUND = 'compound' ;
UNARY = 'unary' ;
DECL = 'decl' ;
SDECL = 'sdecl' ;
STATIC = 'static' ;
PRINT = 'print' ;
READ = 'read' ;
IF = 'if' ;
THEN = 'then' ;
ELSE = 'else' ;
DO = 'do' ;
WHILE = 'while' ;
// types
INTEGER = 'int' ;
CHAR = 'char' ;
BOOLEAN = 'boolean' ;
TRUE = 'true' ;
FALSE = 'false' ;
}
@lexer::header {
package Eindopdracht;
}
@header {
package Eindopdracht;
}
// Parser rules
program
: program2 EOF
-> ^(PROGRAM program2)
;
program2
: (declaration* statement)+
;
declaration
: STATIC type IDENTIFIER SEMICOLON -> ^(SDECL type IDENTIFIER)
| type IDENTIFIER SEMICOLON -> ^(DECL type IDENTIFIER)
;
type
: INTEGER
| CHAR
| BOOLEAN
;
statement
: assignment_expr SEMICOLON!
| while_stat SEMICOLON!
| print_stat SEMICOLON!
| if_stat SEMICOLON!
| read_stat SEMICOLON!
;
while_stat
: WHILE^ OPAREN! or_expr CPAREN! OCURLY! statement+ CCURLY! // while (expression) {statement+}
;
print_stat
: PRINT^ OPAREN! or_expr (COMMA! or_expr)* CPAREN! // print(expression)
;
read_stat
: READ^ OPAREN! IDENTIFIER (COMMA! IDENTIFIER)+ CPAREN! // read(expression)
;
if_stat
: IF^ OPAREN! or_expr CPAREN! comp_expr (ELSE! comp_expr)? // if (expression) compound else compound
;
assignment_expr
: or_expr (BECOMES^ or_expr)*
;
or_expr
: and_expr (OR^ and_expr)*
;
and_expr
: compare_expr (AND^ compare_expr)*
;
compare_expr
: plusminus_expr ((LT|LTE|GT|GTE|EQUALS|UNEQUALS)^ plusminus_expr)?
;
plusminus_expr
: timesdivide_expr ((PLUS | MINUS)^ timesdivide_expr)*
;
timesdivide_expr
: unary_expr ((TIMES | DIVIDE | MODULO)^ unary_expr)*
;
unary_expr
: operand
| PLUS operand -> ^(UNARY PLUS operand)
| MINUS operand -> ^(UNARY MINUS operand)
| NOT operand -> ^(UNARY NOT operand)
;
operand
: TRUE
| FALSE
| charliteral
| IDENTIFIER
| NUMBER
| OPAREN! or_expr CPAREN!
;
comp_expr
: OCURLY program2 CCURLY -> ^(COMPOUND program2)
;
// Lexer rules
charliteral
: SINGLEQUOTE! LETTER SINGLEQUOTE!
;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
NUMBER
: DIGIT+
;
COMMENT
: '//' .* '\n'
{ $channel=HIDDEN; }
;
WS
: (' ' | '\t' | '\f' | '\r' | '\n')+
{ $channel=HIDDEN; }
;
fragment DIGIT : ('0'..'9') ;
fragment LOWER : ('a'..'z') ;
fragment UPPER : ('A'..'Z') ;
fragment LETTER : LOWER | UPPER ;
// EOF
I then use the following Java file to test programs:
package Package;
import java.io.FileInputStream;
import java.io.InputStream;
import org.antlr.runtime.ANTLRInputStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.tree.BufferedTreeNodeStream;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.runtime.tree.TreeNodeStream;
import org.antlr.stringtemplate.StringTemplate;
public class Rietse {
public static void main (String[] args)
{
String inputFile = args[0];
try {
InputStream in = inputFile == null ? System.in : new FileInputStream(inputFile);
RietseLexer lexer = new RietseLexer(new ANTLRInputStream(in));
CommonTokenStream tokens = new CommonTokenStream(lexer);
RietseParser parser = new RietseParser(tokens);
RietseParser.program_return result = parser.program();
} catch (RietseException e) {
System.err.print("ERROR: RietseException thrown by compiler: ");
System.err.println(e.getMessage());
} catch (RecognitionException e) {
System.err.print("ERROR: recognition exception thrown by compiler: ");
System.err.println(e.getMessage());
e.printStackTrace();
} catch (Exception e) {
System.err.print("ERROR: uncaught exception thrown by compiler: ");
System.err.println(e.getMessage());
e.printStackTrace();
}
}
}
And finally, the test program itself:
print('a');
Now when I run this, I get the following errors:
line 1:7 mismatched input 'a' expecting LETTER
line 1:9 mismatched input ')' expecting LETTER
I have no idea what causes these errors. I have tried several changes, but nothing fixed them. Does anyone here know what's wrong with my code and how I can fix it?
Every bit of help is greatly appreciated, thanks in advance.
Greetings,
Rien

Using a lexer rule:
CHARLITERAL
: SINGLEQUOTE (LETTER | DIGIT) SINGLEQUOTE
;
and changing operand to:
operand
: TRUE
| FALSE
| CHARLITERAL
| IDENTIFIER
| NUMBER
| OPAREN! or_expr CPAREN!
;
will fix the problem. The single quotes then end up in the text of the AST node, but you can strip them by changing the node's text with the setText(String) method.

Turn charliteral into a lexer rule (rename it to CHARLITERAL). Right now, the string 'a' is tokenized like this: SINGLEQUOTE IDENTIFIER SINGLEQUOTE, so you're getting an IDENTIFIER instead of a LETTER.
I wonder how this code can compile at all given that you're using a fragment (LETTER) from a parser rule.
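The tokenization described above can be sketched in a few lines of Python. This is only a simplified longest-match model, not ANTLR's actual prediction algorithm; the rule tables and the tokenize helper are made up for illustration and cover just the tokens needed for the test program.

```python
import re

# Hypothetical, cut-down rule tables; the names mirror the grammar in the question.
WITHOUT_CHARLITERAL = [
    ("IDENTIFIER", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("SINGLEQUOTE", re.compile(r"'")),
    ("OPAREN", re.compile(r"\(")),
    ("CPAREN", re.compile(r"\)")),
    ("SEMICOLON", re.compile(r";")),
]
# Same table plus the CHARLITERAL lexer rule suggested in the answers.
WITH_CHARLITERAL = [("CHARLITERAL", re.compile(r"'[A-Za-z0-9]'"))] + WITHOUT_CHARLITERAL

KEYWORDS = {"print": "PRINT"}


def tokenize(text, rules):
    """Split text into (token_name, lexeme) pairs, longest match wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        lexeme = text[pos:pos + best_len]
        if best_name == "IDENTIFIER":
            best_name = KEYWORDS.get(lexeme, best_name)
        tokens.append((best_name, lexeme))
        pos += best_len
    return tokens


before = [name for name, _ in tokenize("print('a');", WITHOUT_CHARLITERAL)]
after = [name for name, _ in tokenize("print('a');", WITH_CHARLITERAL)]
print(before)  # the letter between the quotes comes out as IDENTIFIER
print(after)   # the whole 'a' comes out as one CHARLITERAL token
```

Without the CHARLITERAL rule the parser never sees a LETTER token, because the lexer has no rule that emits one; with it, the quoted character arrives as a single token.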

Related

Parser (Yacc) seems like it ignores tokens in grammar

While parsing the C-like example code below, I ran into the following issue: some tokens, such as identifiers, seem to be ignored by the grammar, causing a syntax error for no apparent reason.
Parser code :
%{
#include <stdio.h>
#include <stdlib.h>
int yylex();
void yyerror (char const *);
%}
%token T_MAINCLASS T_ID T_PUBLIC T_STATIC T_VOID T_MAIN T_PRINTLN T_INT T_FLOAT T_FOR T_WHILE T_IF T_ELSE T_EQUAL T_SMALLER T_BIGGER T_NOTEQUAL T_NUM T_STRING
%left '(' ')'
%left '+' '-'
%left '*' '/'
%left '{' '}'
%left ';' ','
%left '<' '>'
%%
PROGRAM : T_MAINCLASS T_ID '{' T_PUBLIC T_STATIC T_VOID T_MAIN '(' ')' COMP_STMT '}'
;
COMP_STMT : '{' STMT_LIST '}'
;
STMT_LIST : /* nothing */
| STMT_LIST STMT
;
STMT : ASSIGN_STMT
| FOR_STMT
| WHILE_STMT
| IF_STMT
| COMP_STMT
| DECLARATION
| NULL_STMT
| T_PRINTLN '(' EXPR ')' ';'
;
DECLARATION : TYPE ID_LIST ';'
;
TYPE : T_INT
| T_FLOAT
;
ID_LIST : T_ID ',' ID_LIST
|
;
NULL_STMT : ';'
;
ASSIGN_STMT : ASSIGN_EXPR ';'
;
ASSIGN_EXPR : T_ID '=' EXPR
;
EXPR : ASSIGN_EXPR
| RVAL
;
FOR_STMT : T_FOR '(' OPASSIGN_EXPR ';' OPBOOL_EXPR ';' OPASSIGN_EXPR ')' STMT
;
OPASSIGN_EXPR : /* nothing */
| ASSIGN_EXPR
;
OPBOOL_EXPR : /* nothing */
| BOOL_EXPR
;
WHILE_STMT : T_WHILE '(' BOOL_EXPR ')' STMT
;
IF_STMT : T_IF '(' BOOL_EXPR ')' STMT ELSE_PART
;
ELSE_PART : /* nothing */
| T_ELSE STMT
;
BOOL_EXPR : EXPR C_OP EXPR
;
C_OP : T_EQUAL | '<' | '>' | T_SMALLER | T_BIGGER | T_NOTEQUAL
;
RVAL : RVAL '+' TERM
| RVAL '-' TERM
| TERM
;
TERM : TERM '*' FACTOR
| TERM '/' FACTOR
| FACTOR
;
FACTOR : '(' EXPR ')'
| T_ID
| T_NUM
;
%%
void yyerror (const char * msg)
{
fprintf(stderr, "C-like : %s\n", msg);
exit(1);
}
int main ()
{
if(!yyparse()){
printf("Compiled !!!\n");
}
}
Part of Lexical Scanner code :
{Empty}+ { printf("EMPTY ") ; /* nothing */ }
"mainclass" { printf("MAINCLASS ") ; return T_MAINCLASS ; }
"public" { printf("PUBLIC ") ; return T_PUBLIC; }
"static" { printf("STATIC ") ; return T_STATIC ; }
"void" { printf("VOID ") ; return T_VOID ; }
"main" { printf("MAIN ") ; return T_MAIN ; }
"println" { printf("PRINTLN ") ; return T_PRINTLN ; }
"int" { printf("INT ") ; return T_INT ; }
"float" { printf("FLOAT ") ; return T_FLOAT ; }
"for" { printf("FOR ") ; return T_FOR ; }
"while" { printf("WHILE ") ; return T_WHILE ; }
"if" { printf("IF ") ; return T_IF ; }
"else" { printf("ELSE ") ; return T_ELSE ; }
"==" { printf("EQUAL ") ; return T_EQUAL ; }
"<=" { printf("SMALLER ") ; return T_SMALLER ; }
">=" { printf("BIGGER ") ; return T_BIGGER ; }
"!=" { printf("NOTEQUAL ") ; return T_NOTEQUAL ; }
{id} { printf("ID ") ; return T_ID ; }
{num} { printf("NUM ") ; return T_NUM ; }
{string} { printf("STRING ") ; return T_STRING ; }
{punct} { printf("PUNCT ") ; return yytext[0] ; }
<<EOF>> { printf("EOF ") ; return T_EOF; }
. { yyerror("lexical error"); exit(1); }
Example :
mainclass Example {
public static void main ( )
{
int c;
float x, sum, mo;
c=0;
x=3.5;
sum=0.0;
while (c<5)
{
sum=sum+x;
c=c+1;
x=x+1.5;
}
mo=sum/5;
println (mo);
}
}
Running all of this produced the following output:
C-like : syntax error
MAINCLASS EMPTY ID
It seems the ID is in the wrong position, although the grammar has:
PROGRAM : T_MAINCLASS T_ID '{' T_PUBLIC T_STATIC T_VOID T_MAIN '(' ')' COMP_STMT '}'
Based on the "solution" proposed in OP's self answer, it's pretty clear that the original problem was that the generated header used to compile the scanner was not the same as the header generated by bison/yacc from the parser specification.
The generated header includes definitions of all the token types as small integers; in order for the scanner to communicate with the parser, it must identify each token with the correct token type. So the parser generator (bison/yacc) produces a header based on the parser specification (the .y file), and that header must be #included into the generated scanner so that scanner actions can use symbolic token type names.
If the scanner was compiled with a header file generated from some previous version of the parser specification, it is quite possible that the token numbers no longer correspond with what the parser is expecting.
The easiest way to avoid this problem is to use a build system like make, which will automatically recompile the scanner if necessary.
The easiest way to detect this problem is to use bison's built-in trace facility. Enabling tracing requires only a couple of lines of code, and saves you from having to scatter printf statements throughout your scanner and parser. The bison trace will show you exactly what is going on, so not only is it less work than adding printfs, it is also more precise. In particular, it reports every token which is passed to the parser (and, with a little more effort, you can get it to report the semantic values of those tokens as well). So if the parser is getting the wrong token code, you'll see that right away.
After many potentially helpful changes, the parser finally worked once I changed the order of these tokens.
From
%token T_MAINCLASS T_ID T_PUBLIC T_STATIC T_VOID T_MAIN T_PRINTLN T_INT T_FLOAT T_FOR T_WHILE T_IF T_ELSE T_EQUAL T_SMALLER T_BIGGER T_NOTEQUAL T_NUM T_STRING
To
%token T_MAINCLASS T_PUBLIC T_STATIC T_VOID T_MAIN T_PRINTLN T_INT T_FLOAT T_FOR T_WHILE T_IF T_EQUAL T_ID T_NUM T_SMALLER T_BIGGER T_NOTEQUAL T_ELSE T_STRING
It looked as if the parser was reading an else token, although the lexer had actually returned an id. Somehow this modification was the solution.
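A small Python sketch shows how a stale header can produce exactly this symptom. It assumes tokens are numbered consecutively in declaration order, starting at 258 as bison usually does. The contents of the stale header are not known from the question, so the reordered list stands in for it purely as an illustration. The helper names are made up.

```python
FIRST_TOKEN_NUMBER = 258  # assumption: bison's usual first user token number

# The two %token orderings quoted in the question.
ORIGINAL_ORDER = (
    "T_MAINCLASS T_ID T_PUBLIC T_STATIC T_VOID T_MAIN T_PRINTLN T_INT T_FLOAT "
    "T_FOR T_WHILE T_IF T_ELSE T_EQUAL T_SMALLER T_BIGGER T_NOTEQUAL T_NUM T_STRING"
).split()
REORDERED = (
    "T_MAINCLASS T_PUBLIC T_STATIC T_VOID T_MAIN T_PRINTLN T_INT T_FLOAT T_FOR "
    "T_WHILE T_IF T_EQUAL T_ID T_NUM T_SMALLER T_BIGGER T_NOTEQUAL T_ELSE T_STRING"
).split()


def number_tokens(names, start=FIRST_TOKEN_NUMBER):
    """Mimic a generated header: map each token name to an integer code."""
    return {name: start + offset for offset, name in enumerate(names)}


def decode(code, table):
    """Return the token name that a given table associates with code."""
    for name, value in table.items():
        if value == code:
            return name
    return None


# Hypothetical scenario: the scanner was built against a header that numbers
# tokens like REORDERED, while the parser was generated from ORIGINAL_ORDER.
scanner_header = number_tokens(REORDERED)
parser_table = number_tokens(ORIGINAL_ORDER)

code_sent = scanner_header["T_ID"]
seen_by_parser = decode(code_sent, parser_table)
print(code_sent, seen_by_parser)
```

Under these assumptions the scanner's T_ID arrives at the parser as T_ELSE, which matches the observation that the parser appeared to read an else where the lexer returned an id. Reordering the %token line only hides the mismatch; regenerating the header and recompiling the scanner removes it.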

Need help starting with Tatsu to parse grammar

I am getting a Tatsu error
"tatsu.exceptions.FailedExpectingEndOfText: (1:1) Expecting end of text"
running a test, using a grammar I supplied - it is not clear what the problem is.
In essence, the statement calling the parser is:
ast = parse(GRAMMAR, '(instance ?FIFI Dog)')
The whole python file follows:
GRAMMAR = """
@@grammar::SUOKIF
KIF = {KIFexpression}* $ ;
WHITESPACE = /\s+/ ;
StringLiteral = /['"'][A-Za-z]+['"']/ ;
NumericLiteral = /[0-9]+/ ;
Identifier = /[A-Za-z]+/ ;
LPAREN = "(" ;
RPAREN = ")" ;
QUESTION = "?" ;
MENTION = "#" ;
EQUALS = "=" ;
RARROW = ">" ;
LARROW = "<" ;
NOT = "not"|"NOT" ;
OR = "or"|"OR" ;
AND = "and"|"AND" ;
FORALL = "forall"|"FORALL" ;
EXISTS = "exists"|"EXISTS" ;
STRINGLITERAL = {StringLiteral} ;
NUMERICLITERAL = {NumericLiteral} ;
IDENTIFIER = {Identifier} ;
KIFexpression
= Word
| Variable
| String
| Number
| Sentence
;
Sentence = Equation
| RelSent
| LogicSent
| QuantSent
;
LogicSent
= Negation
| Disjunction
| Conjunction
| Implication
| Equivalence
;
QuantSent
= UniversalSent
| ExistentialSent
;
Word = IDENTIFIER ;
Variable = ( QUESTION | MENTION ) IDENTIFIER ;
String = STRINGLITERAL ;
Number = NUMERICLITERAL ;
ArgumentList
= {KIFexpression}*
;
VariableList
= {Variable}+
;
Equation = LPAREN EQUALS KIFexpression KIFexpression RPAREN ;
RelSent = LPAREN ( Variable | Word ) ArgumentList RPAREN ;
Negation = LPAREN NOT KIFexpression RPAREN ;
Disjunction
= LPAREN OR ArgumentList RPAREN
;
Conjunction
= LPAREN AND ArgumentList RPAREN
;
Implication
= LPAREN EQUALS RARROW KIFexpression KIFexpression RPAREN
;
Equivalence
= LPAREN LARROW EQUALS RARROW KIFexpression KIFexpression RPAREN
;
UniversalSent
= LPAREN FORALL LPAREN VariableList RPAREN KIFexpression RPAREN
;
ExistentialSent
= LPAREN EXISTS LPAREN VariableList RPAREN KIFexpression RPAREN
;
"""
if __name__ == '__main__':
import pprint
import json
from tatsu import parse
from tatsu.util import asjson
ast = parse(GRAMMAR, '(instance ?FIFI Dog)')
print('# PPRINT')
pprint.pprint(ast, indent=2, width=20)
print()
print('# JSON')
print(json.dumps(asjson(ast), indent=2))
print()
Can anyone help me with a fix?
Thanks.
Colin Goldberg
I can see two problems with that grammar.
As the TatSu documentation notes, rule names that start with an uppercase character have a special meaning: such rules do not skip whitespace before they start parsing. Change all the rule names to lower case.
Also let's review IDENTIFIER rule:
IDENTIFIER = {Identifier} ;
This means that the identifier can occur multiple times or be missing altogether. Remove the closure by defining IDENTIFIER directly:
IDENTIFIER = /[A-Za-z]+/ ;
You can do the same for NUMERICLITERAL and STRINGLITERAL.
When I did those steps, the expression could be parsed.
You need to pass the name of the "start" symbol to parse().
You can also define:
start = KIF ;
in the grammar.

ANTLR4 precedence of operator

This is my grammar:
grammar FOOL;
@header {
import java.util.ArrayList;
}
@lexer::members {
public ArrayList<String> lexicalErrors = new ArrayList<>();
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
prog : exp SEMIC #singleExp
| let exp SEMIC #letInExp
| (classdec)+ SEMIC (let)? exp SEMIC #classExp
;
classdec : CLASS ID ( EXTENDS ID )? (LPAR (vardec ( COMMA vardec)*)? RPAR)? (CLPAR ((fun SEMIC)+)? CRPAR)?;
let : LET (dec SEMIC)+ IN ;
vardec : type ID ;
varasm : vardec ASM exp ;
fun : type ID LPAR ( vardec ( COMMA vardec)* )? RPAR (let)? exp ;
dec : varasm #varAssignment
| fun #funDeclaration
;
type : INT
| BOOL
| ID
;
exp : left=term (operator=(PLUS | MINUS) right=term)*
;
term : left=factor (operator=(TIMES | DIV) right=factor)*
;
factor : left=value (operator=(EQ | LESSEQ | GREATEREQ | GREATER | LESS | AND | OR ) right=value)*
;
value : MINUS?INTEGER #intVal
| (NOT)? ( TRUE | FALSE ) #boolVal
| LPAR exp RPAR #baseExp
| IF cond=exp THEN CLPAR thenBranch=exp CRPAR (ELSE CLPAR elseBranch=exp CRPAR)? #ifExp
| MINUS?ID #varExp
| THIS #thisExp
| funcall #funExp
| (ID | THIS) DOT funcall #methodExp
| NEW ID ( LPAR (exp (COMMA exp)* )? RPAR)? #newExp
| PRINT ( exp ) #print
;
/* PRINT LPAR exp RPAR */
funcall
: ID ( LPAR (exp (COMMA exp)* )? RPAR )
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
SEMIC : ';' ;
COLON : ':' ;
COMMA : ',' ;
EQ : '==' ;
ASM : '=' ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIV : '/' ;
TRUE : 'true' ;
FALSE : 'false' ;
LPAR : '(' ;
RPAR : ')' ;
CLPAR : '{' ;
CRPAR : '}' ;
IF : 'if' ;
THEN : 'then' ;
ELSE : 'else' ;
PRINT : 'print' ;
LET : 'let' ;
IN : 'in' ;
VAR : 'var' ;
FUN : 'fun' ;
INT : 'int' ;
BOOL : 'bool' ;
CLASS : 'class' ;
EXTENDS : 'extends' ;
THIS : 'this' ;
NEW : 'new' ;
DOT : '.' ;
LESSEQ : ('<=' | '=<') ;
GREATEREQ : ('>=' | '=>') ;
GREATER: '>' ;
LESS : '<' ;
AND : '&&' ;
OR : '||' ;
NOT : '!' ;
//Numbers
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+;
//IDs
fragment CHAR : 'a'..'z' |'A'..'Z' ;
ID : CHAR (CHAR | DIGIT)* ;
//ESCAPED SEQUENCES
WS : (' '|'\t'|'\n'|'\r')-> skip;
LINECOMENTS : '//' (~('\n'|'\r'))* -> skip;
BLOCKCOMENTS : '/*'( ~('/'|'*')|'/'~'*'|'*'~'/'|BLOCKCOMENTS)* '*/' -> skip;
ERR_UNKNOWN_CHAR
: . { lexicalErrors.add("UNKNOWN_CHAR " + getText()); }
;
I think there is a problem in the grammar concerning operator precedence.
In particular, this one
let
int x = (5-2)+4;
in
print x;
prints 7, while this one:
let
int x = 5-2+4;
in
print x;
prints 9.
Why does the first one work? How can I make the second one work by changing only the grammar?
I think there is something to change in exp, term or factor.
This is the first parse tree http://it.tinypic.com/r/2nj8tqw/9 .
This is the second parse tree http://it.tinypic.com/r/2iv02z6/9 .
exp : left=term (operator=(PLUS | MINUS) right=exp)?
This rule produces the parse tree that causes the problem. Simply put, 5 - 2 + 4 will be parsed as:
term MINUS exp
5    term PLUS exp
     2    term
          4
This should help, although you'll have to change the evaluation logic:
exp : left=term (operator=(PLUS | MINUS) right=term)*
Same for factor and any other possible binary operations.
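The difference between the two rule shapes can be sketched in Python. The two functions below are hypothetical stand-ins for an evaluator that walks the parse tree; they work on a flat token list instead of a real ANTLR tree and handle only + and -.

```python
def eval_right_recursive(tokens):
    """Evaluate as  exp : term ((PLUS | MINUS) exp)?  would group the input."""
    left = tokens[0]
    if len(tokens) == 1:
        return left
    operator, rest = tokens[1], tokens[2:]
    right = eval_right_recursive(rest)  # everything after the operator is one exp
    return left + right if operator == "+" else left - right


def eval_iterative(tokens):
    """Evaluate as  exp : term ((PLUS | MINUS) term)*  folded left to right."""
    value = tokens[0]
    for index in range(1, len(tokens), 2):
        operator, operand = tokens[index], tokens[index + 1]
        value = value + operand if operator == "+" else value - operand
    return value


expression = [5, "-", 2, "+", 4]
right_grouped = eval_right_recursive(expression)  # 5 - (2 + 4)
left_grouped = eval_iterative(expression)         # (5 - 2) + 4
print(right_grouped, left_grouped)
```

With the iterative rule, the evaluation code has to loop over all the operator and operand pairs in order, which is the change to the evaluation logic mentioned above.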

Error generating files in ANTLR

So I'm trying to write a parser in ANTLR, this is my first time using it and I'm running into a problem that I can't find a solution for, apologies if this is a very simple problem. Anyway, the error I'm getting is:
"(100): Expr.g:1:13:syntax error: antlr: MismatchedTokenException(74!=52)"
The code I'm currently using is:
grammar Expr.g;
options{
output=AST;
}
tokens{
MAIN = 'main';
OPENBRACKET = '(';
CLOSEBRACKET = ')';
OPENCURLYBRACKET = '{';
CLOSECURLYBRACKET = '}';
COMMA = ',';
SEMICOLON = ';';
GREATERTHAN = '>';
LESSTHAN = '<';
GREATEROREQUALTHAN = '>=';
LESSTHANOREQUALTHAN = '<=';
NOTEQUAL = '!=';
ISEQUALTO = '==';
WHILE = 'while';
IF = 'if';
ELSE = 'else';
READ = 'read';
OUTPUT = 'output';
PRINT = 'print';
RETURN = 'return';
READC = 'readc';
OUTPUTC = 'outputc';
PLUS = '+';
MINUS = '-';
DIVIDE = '/';
MULTIPLY = '*';
PERCENTAGE = '%';
}
@header {
//package test;
import java.util.HashMap;
}
@lexer::header {
//package test;
}
@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog: stat+ ;
stat: expr NEWLINE {System.out.println($expr.value);}
| ID '=' expr NEWLINE
{memory.put($ID.text, new Integer($expr.value));}
| NEWLINE
;
expr returns [int value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += $e.value;}
| '-' e=multExpr {$value -= $e.value;}
)*
;
multExpr returns [int value]
: e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
;
atom returns [int value]
: INT {$value = Integer.parseInt($INT.text);}
| ID
{
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ) $value = v.intValue();
else System.err.println("undefined variable "+$ID.text);
}
| '(' e=expr ')' {$value = $e.value;}
;
IDENT : ('a'..'z'^|'A'..'Z'^)+ ; : .;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
Thanks for any help.
EDIT: Well, I'm an idiot, it's just a formatting error. Thanks for the responses from those who helped out.
You have some illegal characters after your IDENT token:
IDENT : ('a'..'z'^|'A'..'Z'^)+ ; : .;
The trailing : .; is invalid there. You're also using the tree-rewrite operator ^ inside a lexer rule, which is illegal: remove both occurrences. Lastly, you've named the rule IDENT, while your parser rules refer to ID.
It should be:
ID : ('a'..'z' | 'A'..'Z')+ ;

How to turn this into a parser

If I just add on to the following yacc file, will it turn into a parser?
/* C-Minus BNF Grammar */
%token ELSE
%token IF
%token INT
%token RETURN
%token VOID
%token WHILE
%token ID
%token NUM
%token LTE
%token GTE
%token EQUAL
%token NOTEQUAL
%%
program : declaration_list ;
declaration_list : declaration_list declaration | declaration ;
declaration : var_declaration | fun_declaration ;
var_declaration : type_specifier ID ';'
| type_specifier ID '[' NUM ']' ';' ;
type_specifier : INT | VOID ;
fun_declaration : type_specifier ID '(' params ')' compound_stmt ;
params : param_list | VOID ;
param_list : param_list ',' param
| param ;
param : type_specifier ID | type_specifier ID '[' ']' ;
compound_stmt : '{' local_declarations statement_list '}' ;
local_declarations : local_declarations var_declaration
| /* empty */ ;
statement_list : statement_list statement
| /* empty */ ;
statement : expression_stmt
| compound_stmt
| selection_stmt
| iteration_stmt
| return_stmt ;
expression_stmt : expression ';'
| ';' ;
selection_stmt : IF '(' expression ')' statement
| IF '(' expression ')' statement ELSE statement ;
iteration_stmt : WHILE '(' expression ')' statement ;
return_stmt : RETURN ';' | RETURN expression ';' ;
expression : var '=' expression | simple_expression ;
var : ID | ID '[' expression ']' ;
simple_expression : additive_expression relop additive_expression
| additive_expression ;
relop : LTE | '<' | '>' | GTE | EQUAL | NOTEQUAL ;
additive_expression : additive_expression addop term | term ;
addop : '+' | '-' ;
term : term mulop factor | factor ;
mulop : '*' | '/' ;
factor : '(' expression ')' | var | call | NUM ;
call : ID '(' args ')' ;
args : arg_list | /* empty */ ;
arg_list : arg_list ',' expression | expression ;
That is only the grammar of a programming language.
To make it a parser, you need to add some code (actions) to it, as described here: http://dinosaur.compilertools.net/yacc/index.html
Look at chapter 2, Actions.
You will also need a lexical analyzer; see chapter 3, Lexical Analysis.
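To give an idea of what the lexical analyzer has to do, here is a rough sketch in Python rather than lex. It produces the token names declared with %token in the question and passes single-character punctuation through as itself, as yacc expects for character literals. The identifier pattern (letters only) and the tokenize helper are assumptions made for this example.

```python
import re

KEYWORDS = {
    "else": "ELSE", "if": "IF", "int": "INT",
    "return": "RETURN", "void": "VOID", "while": "WHILE",
}

# Two-character operators come before PUNCT so that "<=" is not split into "<" and "=".
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<NUM>[0-9]+)
  | (?P<ID>[A-Za-z]+)
  | (?P<LTE><=)
  | (?P<GTE>>=)
  | (?P<EQUAL>==)
  | (?P<NOTEQUAL>!=)
  | (?P<PUNCT>[-+*/<>=;,()\[\]{}])
""", re.VERBOSE)


def tokenize(source):
    """Return a list of (token_type, lexeme) pairs for a C-Minus fragment."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "WS":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ID":
            kind = KEYWORDS.get(text, "ID")  # reserved words win over identifiers
        elif kind == "PUNCT":
            kind = text  # single characters are their own token type
        tokens.append((kind, text))
    return tokens


result = tokenize("if (x <= 10) return x;")
print(result)
```

In a real yacc setup this job is done by a yylex function, usually generated by lex or flex, which returns one token code per call instead of a whole list.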
