Grammar parser suggestion (ANTLR)

I'm trying to write a simple QBasic grammar in ANTLR 4. The ELSE IF branch doesn't parse correctly: after THEN it is automatically treated as an assigncommand. Could you also review my grammar and suggest improvements?
How do I write a string as a regular expression (including Cyrillic letters)?
Should I write keywords such as 'PRINT' and 'IF' inline, or define lexer rules for them (like PRINTKEY : 'PRINT';)?
grammar Hello3;
// AssignCommand; MainCommand; FlowCommand
prog : (assigncommand | maincommand | flowcommand)+;
// AssignInt; AssignString
// MyAge = PreviousAge + 1
// MyName$ = FirstName$ + MiddleName$ + LastName$
assigncommand : assignint | assignstring;
assignint : IDINT '=' (IDINT | INT) (OPERATORMATH (IDINT | INT))* '\n'+;
assignstring : IDSTRING '=' (IDSTRING | STRING) ('+' (IDSTRING | STRING))* '\n'+;
//PrintCommand, InputCommand
//PRINT MyName$, MyAge, "Hello", 123
//INPUT "What is your name?", yourname$
//(or)INPUT yourname$
maincommand : printcommand | inputcommand;
printcommand : 'PRINT' (',' (IDINT | IDSTRING | STRING | INT))+ '\n'+;
inputcommand : 'INPUT' (IDINT | IDSTRING | STRING)? ',' (IDINT | IDSTRING) '\n'+;
//If-ElseFlow; WhileFlow
//If-Else-Add; Else-Add
//
//IF a > 3 THEN
//PRINT a
//a = a -1
//ELSE IF a = 1 THEN
//b = a
//END IF
//
//WHILE a > 3
//a = a - 1
//PRINT a
//WEND
flowcommand : ifelseflow | whileflow;
ifelseflow : 'IF' conditionflow 'THEN' '\n' ifelseadd* elseadd* 'END' 'IF' '\n'+;
whileflow : 'WHILE' conditionflow '\n' (assigncommand | maincommand | flowcommand)* 'WEND' '\n'+;
conditionflow : ((INT | IDINT) OPERATORBOOL (INT | IDINT)) | ((STRING | IDSTRING) '=' (STRING | IDSTRING));
ifelseadd : 'ELSEIF' conditionflow 'THEN' '\n' ((assigncommand | maincommand | flowcommand) '\n')+;
elseadd : 'ELSE' '\n' ((assigncommand | maincommand | flowcommand) '\n')+;
//Lexers
INT : [0-9]+;
STRING : '"' [a-zA-Z\u0400-\u04FF\0-9' ''?'':']+ '"';
IDINT : [a-zA-Z]([a-zA-Z0-9]*); //MyAge
IDSTRING : [a-zA-Z]([a-zA-Z0-9]*)'$'; //MyName$
OPERATORMATH : '+'|'-'|'*'|'/';
OPERATORBOOL : '='|'>'|'<'|'>='|'<=';
WS : [ \t\r]+ -> skip;

Like you, I found if..else constructs in a BASIC-like language a real challenge to implement. I found some good resources online. Please take a look at my grammar snippet:
ifstmt
: IF condition_block (ELSE IF condition_block)* (ELSE stmt_block)?
;
condition_block
: expr stmt_block
;
stmt_block
: OBRACE statement+ CBRACE
| statement
;
And my implementation (in C# visitor pattern):
public override MuValue VisitIfstmt(LISBASICParser.IfstmtContext context)
{
LISBASICParser.Condition_blockContext[] conditions = context.condition_block();
bool evaluatedBlock = false;
foreach (LISBASICParser.Condition_blockContext condition in conditions)
{
MuValue evaluated = Visit(condition.expr());
if (evaluated.AsBoolean())
{
evaluatedBlock = true;
Visit(condition.stmt_block());
break;
}
}
if (!evaluatedBlock && context.stmt_block() != null)
{
Visit(context.stmt_block());
}
return MuValue.Void;
}
I borrowed the MuValue idea from Bart Kiers's excellent implementation of his Mu language. Lots of great ideas in that project of his.
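For readers working in Java rather than C#, the control flow of that visitor can be sketched without any ANTLR types. This is a minimal sketch, not the author's code: ConditionBlock, runIf and classify are made-up names, and the lambdas stand in for Visit(condition.expr()) and Visit(condition.stmt_block()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

class IfChainDemo {
    // Hypothetical stand-in for a parsed condition_block: a test plus a body.
    static final class ConditionBlock {
        final BooleanSupplier test;
        final Runnable body;

        ConditionBlock(BooleanSupplier test, Runnable body) {
            this.test = test;
            this.body = body;
        }
    }

    // Runs the first block whose test is true; if none is true, runs elseBody
    // when one was given. Returns true when some branch ran.
    static boolean runIf(List<ConditionBlock> blocks, Runnable elseBody) {
        for (ConditionBlock block : blocks) {
            if (block.test.getAsBoolean()) {
                block.body.run();
                return true; // same effect as the 'break' in the C# loop
            }
        }
        if (elseBody != null) {
            elseBody.run();
            return true;
        }
        return false;
    }

    // Models: IF a > 3 ... ELSE IF a = 1 ... ELSE ...
    static String classify(int a) {
        StringBuilder log = new StringBuilder();
        List<ConditionBlock> blocks = new ArrayList<>();
        blocks.add(new ConditionBlock(() -> a > 3, () -> log.append("big")));
        blocks.add(new ConditionBlock(() -> a == 1, () -> log.append("one")));
        runIf(blocks, () -> log.append("other"));
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(classify(5)); // big
        System.out.println(classify(1)); // one
        System.out.println(classify(2)); // other
    }
}
```

The point of the shape is that the ELSE IF branches are just more entries in one list, so the evaluator never needs special nesting logic for them.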

Related

antlr4 line 2:0 mismatched input 'if' expecting {'if', OTHER}

I am having a bit of difficulty in my g4 file. Below is my grammar:
// Define a grammar called Hello
grammar GYOO;
program : 'begin' block+ 'end';
block
: statement+
;
statement
: assign
| print
| add
| ifstatement
| OTHER {System.err.println("unknown char: " + $OTHER.text);}
;
assign
: 'let' ID 'be' expression
;
print
: 'print' (NUMBER | ID)
;
ifstatement
: 'if' condition_block (ELSE IF condition_block)* (ELSE stat_block)?
;
add
: (NUMBER | ID) OPERATOR (NUMBER | ID) ASSIGN ID
;
stat_block
: OBRACE block CBRACE
| statement
;
condition_block
: expression stat_block
;
expression
: NOT expression //notExpr
| expression (MULT | DIV | MOD) expression //multiplicationExpr
| expression (PLUS | MINUS) expression //additiveExpr
| expression (LTEQ | GTEQ | LT | GT) expression //relationalExpr
| expression (EQ | NEQ) expression //equalityExpr
| expression AND expression //andExpr
| expression OR expression //orExpr
| atom //atomExpr
;
atom
: (NUMBER | FLOAT) //numberAtom
| (TRUE | FALSE) //booleanAtom
| ID //idAtom
| STRING //stringAtom
| NULL //nullAtom
;
ID : [a-z]+ ;
NUMBER : [0-9]+ ;
OPERATOR : '+' | '-' | '*' | '/';
ASSIGN : '=';
WS : (' ' | '\t' | '\r' | '\n') + -> skip;
OPAR : '(';
CPAR : ')';
OBRACE : '{';
CBRACE : '}';
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
AND : 'and';
EQ : 'is'; //'=='
NEQ : 'is not'; //'!='
GT : 'greater'; //'>'
LT : 'lower'; //'<'
GTEQ : 'is greater'; //'>='
LTEQ : 'is lower'; //'<='
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
MOD : '%';
POW : '^';
NOT : 'not';
FLOAT
: [0-9]+ '.' [0-9]*
| '.' [0-9]+
;
STRING
: '"' (~["\r\n] | '""')* '"'
;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN)
;
LINE_COMMENT
: '//' ~[\r\n]* -> channel(HIDDEN)
;
OTHER
: .
;
When I try to view the tree with -gui, ANTLR shows me this error:
line 2:3 missing OPERATOR at 'a'
The error comes from this input:
begin
let a be true
if a is true
print a
end
Basically, it does not recognize the ifstatement beginning with the IF token 'if', and it shows the tree as if I were making an assignment.
How can I fix this?
P.S. I also tried reordering my statements. I also tried removing all statements except ifstatement, and the same thing happens.
Thanks

There is at least one issue:
ID : [a-z]+ ;
...
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
...
NOT : 'not';
Since ID is placed before TRUE .. NOT, those tokens will never be created since ID has precedence over them (and ID matches these tokens as well).
Start by moving ID beneath the NOT token.
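To see why the order matters, here is a small Java model of the lexer's tie-breaking rule (the longest match wins; on a tie, the rule listed first wins). It is a toy sketch built on java.util.regex, not the ANTLR runtime, and tokenize and rules are hypothetical helpers; only the rule names ID, IF and TRUE come from the grammar above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer rule priority: longest match first, then rule order.
class LexerPriorityDemo {

    // rules: rule name -> regex, in grammar order (LinkedHashMap keeps the order).
    static List<String> tokenize(String input, Map<String, String> rules) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // plays the role of WS -> skip
                continue;
            }
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // Strictly greater: an equally long later rule never
                // replaces an earlier one.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at " + pos);
            }
            out.add(bestRule + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return out;
    }

    // Builds an ordered rule table from name/regex pairs.
    static Map<String, String> rules(String... nameThenRegex) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < nameThenRegex.length; i += 2) {
            m.put(nameThenRegex[i], nameThenRegex[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> idFirst = rules("ID", "[a-z]+", "IF", "if", "TRUE", "true");
        Map<String, String> idLast = rules("IF", "if", "TRUE", "true", "ID", "[a-z]+");
        System.out.println(tokenize("if a true", idFirst)); // [ID(if), ID(a), ID(true)]
        System.out.println(tokenize("if a true", idLast));  // [IF(if), ID(a), TRUE(true)]
    }
}
```

With ID listed first, every keyword comes out as an ID, so the parser never sees an IF token. With ID listed last, the keywords win their ties and ordinary names still fall through to ID.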

ANTLR 'or' regular expression

I have a serious problem with the | (alternative) operator.
My grammar contains rules like this:
...ifelse : 'IF' condition 'THEN' dosomething+ 'ENDIF'
...dosomething : assign | print | input;
but dosomething seems to become fixed to one alternative. For example:
IF a > 3 THEN
PRINT "HEllo"
b = a
ENDIF
Here the first dosomething is a print, and after that the grammar can't read an assign or an input.
If the statements look like this, it works correctly:
IF a > 3 THEN
PRINT "HEllo"
PRINT myName
ENDIF
In other words, an 'or' group ( | | )+ seems to get locked to whichever alternative occurred first.
grammar hellog;
prog : command+;
command : maincommand
| expressioncommand
| flowcommand
;
//main
maincommand : printcommand
| inputcommand
;
printcommand : 'PRINT' (IDINT | IDSTR | STRING) NL
| 'PRINT' (IDINT | IDSTR | STRING) (',' (IDINT | IDSTR | STRING))* NL
;
inputcommand : 'INPUT' (IDINT | IDSTR) NL
| 'INPUT' STRING? (IDINT | IDSTR) NL
;
//expression
expressioncommand : intexpression
| strexpression
;
intexpression : IDINT '=' (IDINT | INT) NL
| IDINT '=' (IDINT | INT) (OPERATORMATH (IDINT | INT))* NL
;
strexpression : IDSTR '=' (IDSTR | STRING) NL
| IDSTR '=' (IDSTR | STRING) ('+' (IDSTR | STRING))* NL
;
//flow
flowcommand : ifelseflow
| whileflow
;
ifelseflow : 'IF' conditionflow 'THEN' NL dosomething+ ('ELSEIF' conditionflow 'THEN' NL dosomething+)* ('ELSE' NL dosomething+)? 'ENDIF' NL;
whileflow : 'WHILE' conditionflow NL (dosomething)+ 'WEND' NL;
dosomething : command;
conditionflow : (INT | IDINT) OPERATORBOOL (INT | IDINT)
| (STRING | IDSTR) '=' (STRING | IDSTR)
;
INT : [0-9]+;
STRING : '"' .*? '"';
IDINT : [a-zA-Z]+;
IDSTR : [a-zA-Z]+'$';
NL : '\n';
WS : [ \t\r]+ -> skip;
OPERATORMATH : '+' | '-' | '*' | '/';
OPERATORBOOL : '=' | '>' | '<' | '>=' | '<=';
I just need a grammar that can handle these statements:
PRINT "Your name"
INPUT name
PRINT "HELLO" name
a = 6
IF a > 3 THEN
PRINT a
a = a -1
END IF
WHILE b = 3
PRINT b
a = b
WEND

My answer isn't exactly about the | alternatives, but please keep reading, because like you, I found if..else constructs in a BASIC-like language a real challenge to implement. I found some good resources online. When I got it right, many problems disappeared all at once and it just started to work. Please take a look at my grammar snippet:
ifstmt
: IF condition_block (ELSE IF condition_block)* (ELSE stmt_block)?
;
condition_block
: expr stmt_block
;
stmt_block
: OBRACE statement+ CBRACE
| statement
;
And my implementation (in C# visitor pattern):
public override MuValue VisitIfstmt(LISBASICParser.IfstmtContext context)
{
LISBASICParser.Condition_blockContext[] conditions = context.condition_block();
bool evaluatedBlock = false;
foreach (LISBASICParser.Condition_blockContext condition in conditions)
{
MuValue evaluated = Visit(condition.expr());
if (evaluated.AsBoolean())
{
evaluatedBlock = true;
Visit(condition.stmt_block());
break;
}
}
if (!evaluatedBlock && context.stmt_block() != null)
{
Visit(context.stmt_block());
}
return MuValue.Void;
}
Much borrowed from Bart Kiers's excellent implementation of his Mu demonstration language. Lots of great ideas in that project of his. It really showed me the light and this code I've shown handles if statements great, nested arbitrarily deep if you need that. This is production code running a critical domain-specific language.

Assignment as expression in Antlr grammar

I'm trying to extend the grammar of the Tiny Language to treat assignment as expression. Thus it would be valid to write
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
a = 1 = 2; // invalid
Assignment differs from other operators in two respects. It's right-associative (not a big deal), and its left-hand side has to be a variable. So I changed the grammar like this:
statement: assignmentExpr | functionCall ...;
assignmentExpr: Identifier indexes? '=' expression;
expression: assignmentExpr | condExpr;
It doesn't work, because it contains a non-LL(*) decision. I also tried this variant:
assignmentExpr: Identifier indexes? '=' (expression | condExpr);
but I got the same error. I am interested in three things: this specific question; given a grammar with a non-LL(*) decision, how to find the two paths that cause the problem; and how to fix it.

I think you can change your grammar like this to achieve the same, without using syntactic predicates:
statement: expr ';' | functionCall ';'...;
expr: Identifier indexes? '=' expr | condExpr ;
condExpr: .... and so on;
I altered Bart's example with this idea in mind:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
}
parse
: stat+ EOF -> ^(ROOT stat+)
;
stat
: expr ';'
;
expr
: Id Assign expr -> ^(Assign Id expr)
| add
;
add
: mult (('+' | '-')^ mult)*
;
mult
: atom (('*' | '/')^ atom)*
;
atom
: Id
| Num
| '('! expr ')' !
;
Assign : '=' ;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
And for the input:
a=b=4;
a = 2 * (b = 1);
you get the following parse tree:
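To check the shape of such a tree without running ANTLR, here is a hand-written recursive-descent sketch in Java of the same expr / add / mult / atom rules. It is only an illustration under my own assumptions: AssignExprDemo and its methods are made-up names, the tokenizer is deliberately crude, and the tree is printed as a parenthesized string rather than built as an ANTLR AST.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of:
//   expr : Id '=' expr | add ;
//   add  : mult (('+'|'-') mult)* ;
//   mult : atom (('*'|'/') atom)* ;
//   atom : Id | Num | '(' expr ')' ;
class AssignExprDemo {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    AssignExprDemo(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else {
                i++; // single-character operator or punctuation
            }
            tokens.add(src.substring(start, i));
        }
    }

    private String peek(int k) { return pos + k < tokens.size() ? tokens.get(pos + k) : ""; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private static boolean isId(String t) { return !t.isEmpty() && Character.isLetter(t.charAt(0)); }

    // Two tokens of lookahead pick the assignment alternative; recursing on
    // the right-hand side is what makes '=' right-associative.
    String expr() {
        if (isId(peek(0)) && peek(1).equals("=")) {
            String id = next();
            next(); // consume '='
            return "(= " + id + " " + expr() + ")";
        }
        return add();
    }

    private String add() {
        String left = mult();
        while (peek(0).equals("+") || peek(0).equals("-")) {
            String op = next();
            left = "(" + op + " " + left + " " + mult() + ")";
        }
        return left;
    }

    private String mult() {
        String left = atom();
        while (peek(0).equals("*") || peek(0).equals("/")) {
            String op = next();
            left = "(" + op + " " + left + " " + atom() + ")";
        }
        return left;
    }

    private String atom() {
        String t = next();
        if (t.equals("(")) {
            String inner = expr();
            if (!next().equals(")")) throw new IllegalStateException("expected ')'");
            return inner;
        }
        if (!Character.isLetterOrDigit(t.charAt(0))) throw new IllegalStateException("unexpected '" + t + "'");
        return t;
    }

    // Parses one statement: expr ';'
    static String parse(String src) {
        AssignExprDemo p = new AssignExprDemo(src);
        String tree = p.expr();
        if (!p.peek(0).equals(";")) throw new IllegalStateException("expected ';'");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=b=4;"));           // (= a (= b 4))
        System.out.println(parse("a = 2 * (b = 1);")); // (= a (* 2 (= b 1)))
    }
}
```

Because the assignment alternative only fires when an identifier is directly followed by '=', an input such as a = 1 = 2; is rejected: after parsing 1 the parser expects ';' and finds '=' instead.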

The key here is that you need to "assure" the parser that inside an expression, there is something ahead that satisfies the expression. This can be done using a syntactic predicate (the ( ... )=> parts in the add and mult rules).
A quick demo:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
ASSIGN;
}
parse
: stat* EOF -> ^(ROOT stat*)
;
stat
: expr ';' -> expr
;
expr
: add
;
add
: mult ((('+' | '-') mult)=> ('+' | '-')^ mult)*
;
mult
: atom ((('*' | '/') atom)=> ('*' | '/')^ atom)*
;
atom
: (Id -> Id) ('=' expr -> ^(ASSIGN Id expr))?
| Num
| '(' expr ')' -> expr
;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
which will parse the input:
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
into the following AST:

Unusual ANTLR error when attempting to reorganize grammar into two files

I am reorganizing my grammar into two files in order to accommodate a tree grammar: Lua.g and LuaGrammar.g. Lua.g will have all of my lexer rules, and LuaGrammar.g will have all of my tree grammar and parser rules. However, when I try to compile LuaGrammar.g I get the following error:
[00:28:37] error(10): internal error: C:\Users\RCIX\Desktop\AguaLua\Project\trunk\AguaLua\AguaLua\ANTLR Data\LuaGrammar.g : java.lang.IllegalArgumentException: Can't find template ruleRefBang.st; group hierarchy is [CSharp2]
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:507)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:484)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:484)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.grammar.v2.CodeGenTreeWalker.getRuleElementST(CodeGenTreeWalker.java:152)
org.antlr.grammar.v2.CodeGenTreeWalker.atom(CodeGenTreeWalker.java:1986)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1708)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1556)
org.antlr.grammar.v2.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:1306)
org.antlr.grammar.v2.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1081)
org.antlr.grammar.v2.CodeGenTreeWalker.ebnf(CodeGenTreeWalker.java:1871)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1704)
org.antlr.grammar.v2.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:1306)
org.antlr.grammar.v2.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1081)
org.antlr.grammar.v2.CodeGenTreeWalker.rule(CodeGenTreeWalker.java:797)
org.antlr.grammar.v2.CodeGenTreeWalker.rules(CodeGenTreeWalker.java:588)
org.antlr.grammar.v2.CodeGenTreeWalker.grammarSpec(CodeGenTreeWalker.java:530)
org.antlr.grammar.v2.CodeGenTreeWalker.grammar(CodeGenTreeWalker.java:336)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:432)
org.antlr.Tool.generateRecognizer(Tool.java:641)
org.antlr.Tool.process(Tool.java:454)
org.antlr.works.generate.CodeGenerate.generate(CodeGenerate.java:104)
org.antlr.works.generate.CodeGenerate.run(CodeGenerate.java:185)
java.lang.Thread.run(Unknown Source)
I'm also getting the following error:
[00:34:58] error(100): C:\Users\RCIX\Desktop\AguaLua\Project\trunk\AguaLua\AguaLua\ANTLR Data\Lua.g:0:0: syntax error: codegen: <AST>:0:0: unexpected end of subtree
when attempting to generate Lua.g. Why am I getting these errors, and how can I fix them? (I'm using ANTLR v3 and can provide the grammar files.)
Update: here is the grammar file I am trying to compile.
tree grammar LuaGrammar;
options {
backtrack=true;
language=CSharp2;
output=AST;
tokenVocab=Lua;
filter=true;
ASTLabelType=CommonTree;
}
assignment
:
^('=' left=NAME right=NAME) {Ast. };
/*
chunk : (stat (';'!)?)* (laststat (';'!)?)?;
block : chunk;
stat : varlist1 '='^ explist1 |
functioncall |
doblock |
'while'^ exp doblock |
'repeat'^ block untilrule |
'if'^ exp thenchunk elseifchunk* elsechunk? 'end'! |
'for'^ forinitializer doblock |
'for'^ namelist inlist doblock |
'function'^ funcname funcbody |
'local' 'function' NAME funcbody |
'local'^ namelist localstat? ;
localstat
: '='^ explist1;
untilrule
: 'until'^ exp;
elseifchunk
: 'elseif'^ exp thenchunk;
thenchunk
: 'then'^ block;
elsechunk
: 'else'^ block;
forinitializer
: NAME '='^ exp ','! exp (','! exp)?;
doblock
: 'do'^ block 'end'!;
inlist
: 'in'^ explist1;
laststat : 'return'^ (explist1)? | 'break';
dotname : '.'! funcname;
colonname
: ':' NAME;
funcname : NAME^ (dotname | colonname)?;
varlist1 : var (','! var)*;
namelist : NAME (','! NAME)*;
explist1 : (exp ','!)* exp;
*/
/*
exp : expelement (binop^ exp)* ;
expelement
: ('nil' | 'false' | 'true' | number | stringrule | '...' | /*function |*\ prefixexp | tableconstructor | unop exp);
var: (namevar | dotvar | expvar | arrayvar)?;
namevar
: NAME^ var;
dotvar
: '.'! var;
expvar
: '('^ exp ')'! var;
arrayvar
: '['^ var ']'! var;
varSuffix: nameAndArgs* ('[' exp ']' | '.' NAME);
prefixexp: varOrExp nameAndArgs*;
functioncall: varOrExp nameAndArgs+;
varOrExp: var | '('! exp ')'!;
nameAndArgs: (':' NAME)? argsrule;
argsrule : '(' (explist1)? ')' | tableconstructor | stringrule ;
function : 'function' funcbody;
funcbody : funcparams funcblock;
funcblock
: ')'^ block 'end'!;
funcparams
: '('^ parlist1? ;
parlist1 : namelist (','! '...')? | '...';
tableconstructor : '{'^ (fieldlist)? '}'!;
fieldlist : field (fieldsep! field)* (fieldsep!)?;
field : '['! exp ']'! '='^ exp | NAME '='^ exp | exp;
*/
fieldsep : ',' | ';';
binop : '+' | '-' | '*' | '/' | '^' | '%' | '..' |
'<' | '<=' | '>' | '>=' | '==' | '~=' |
'and' | 'or';
unop : '-' | 'not' | '#';
number : INT | FLOAT | EXP | HEX;
stringrule : NORMALSTRING | CHARSTRING | LONGSTRING;
Lua.g:
/*
* Lua 5.1 grammar
*
* Nicolai Mainiero
* May 2007
*
* This is a Lua (http://www.lua.org) grammar for the version 5.1 for ANTLR 3.
* I tested it with basic and extended examples and it worked fine. It is also used
* for LunarEclipse (http://lunareclipse.sf.net) a Lua editor based on Eclipse.
*
* Thanks to Johannes Luber and Gavin Lambert who helped me with some mutually left recursion.
*
*/
grammar Lua;
options {
backtrack=true;
language=CSharp2;
//output=AST;
//ASTLabelType=CommonTree;
}
@lexer::namespace{AguaLua}
chunk : (stat (';'!)?)* (laststat (';'!)?)?;
block : chunk;
stat : varlist1 '='^ explist1 |
functioncall |
doblock |
'while'^ exp doblock |
'repeat'^ block untilrule |
'if'^ exp thenchunk elseifchunk* elsechunk? 'end'! |
'for'^ forinitializer doblock |
'for'^ namelist inlist doblock |
'function'^ funcname funcbody |
'local' 'function' NAME funcbody |
'local'^ namelist localstat? ;
localstat
: '='^ explist1;
untilrule
: 'until'^ exp;
elseifchunk
: 'elseif'^ exp thenchunk;
thenchunk
: 'then'^ block;
elsechunk
: 'else'^ block;
forinitializer
: NAME '='^ exp ','! exp (','! exp)?;
doblock
: 'do'^ block 'end'!;
inlist
: 'in'^ explist1;
laststat : 'return'^ (explist1)? | 'break';
dotname : '.'! funcname;
colonname
: ':' NAME;
funcname : NAME^ (dotname | colonname)?;
varlist1 : var (','! var)*;
namelist : NAME (','! NAME)*;
explist1 : (exp ','!)* exp;
exp : expelement (binop^ exp)* ;
expelement
: ('nil' | 'false' | 'true' | number | stringrule | '...' | function | prefixexp | tableconstructor | unop exp);
var: (namevar | dotvar | expvar | arrayvar)?;
namevar
: NAME^ var;
dotvar
: '.'! var;
expvar
: '('^ exp ')'! var;
arrayvar
: '['^ var ']'! var;
varSuffix: nameAndArgs* ('[' exp ']' | '.' NAME);
prefixexp: varOrExp nameAndArgs*;
functioncall: varOrExp nameAndArgs+;
varOrExp: var | '('! exp ')'!;
nameAndArgs: (':' NAME)? argsrule;
argsrule : '(' (explist1)? ')' | tableconstructor | stringrule ;
function : 'function' funcbody;
funcbody : funcparams funcblock;
funcblock
: ')'^ block 'end'!;
funcparams
: '('^ parlist1? ;
parlist1 : namelist (','! '...')? | '...';
tableconstructor : '{'^ (fieldlist)? '}'!;
fieldlist : field (fieldsep! field)* (fieldsep!)?;
field : '['! exp ']'! '='^ exp | NAME '='^ exp | exp;
fieldsep : ',' | ';';
binop : '+' | '-' | '*' | '/' | '^' | '%' | '..' |
'<' | '<=' | '>' | '>=' | '==' | '~=' |
'and' | 'or';
unop : '-' | 'not' | '#';
number : INT | FLOAT | EXP | HEX;
stringrule : NORMALSTRING | CHARSTRING | LONGSTRING;
// LEXER
NAME :('a'..'z'|'A'..'Z'|'_')(options{greedy=true;}: 'a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
INT : ('0'..'9')+;
FLOAT :INT '.' INT ;
EXP : (INT| FLOAT) ('E'|'e') ('-')? INT;
HEX :'0x' ('0'..'9'| 'a'..'f')+ ;
NORMALSTRING
: '"' ( EscapeSequence | ~('\\'|'"') )* '"'
;
CHARSTRING
: '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
;
LONGSTRING
: '['('=')*'[' ( EscapeSequence | ~('\\'|']') )* ']'('=')*']'
;
fragment
EscapeSequence
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UnicodeEscape
| OctalEscape
;
fragment
OctalEscape
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
COMMENT
: '--[[' ( options {greedy=false;} : . )* ']]' {Skip();}
;
LINE_COMMENT : '--' (~ NEWLINE)* {Skip();};
fragment NEWLINE : '\r'|'\n' | '\r\n' ;
WS : (' '|'\t'|'\u000C') {Skip();};
(Both are based on a grammar produced by Nicolai Mainiero and available at ANTLR's site, Lua 5.1 grammar.)
If I uncomment any more than this, it comes up with the error above.

Okay, a 'Can't find template ruleRefBang.st' error has something to do with the illegal use of the "tree exclude" operator, !. Usually, it is a contradicting rewrite rule: somewhere you have a ! and then rewrite the rule using -> but use that ignored token anyway. Since I cannot see a -> in your grammar, that can't be the case (unless you simplified the tree grammar to post here and removed some rewrite rules?).
Anyway, I'd start by removing all ! operators in your tree grammar and, if your grammar then works, put them back in one by one. Then you should be able to pinpoint the place in your grammar that houses the illegal !.
Good luck!

Parsing string interpolation in ANTLR

I'm working on a simple string manipulation DSL for internal purposes, and I would like the language to support string interpolation as it is used in Ruby.
For example:
name = "Bob"
msg = "Hello ${name}!"
print(msg) # prints "Hello Bob!"
I'm attempting to implement my parser in ANTLRv3, but I'm pretty inexperienced with using ANTLR so I'm unsure how to implement this feature. So far, I've specified my string literals in the lexer, but in this case I'll obviously need to handle the interpolation content in the parser.
My current string literal grammar looks like this:
STRINGLITERAL : '"' ( StringEscapeSeq | ~( '\\' | '"' | '\r' | '\n' ) )* '"' ;
fragment StringEscapeSeq : '\\' ( 't' | 'n' | 'r' | '"' | '\\' | '$' | ('0'..'9')) ;
Moving the string literal handling into the parser seems to make everything else stop working as it should. Cursory web searches didn't yield any information. Any suggestions as to how to get started on this?

I'm no ANTLR expert, but here's a possible grammar:
grammar Str;
parse
: ((Space)* statement (Space)* ';')+ (Space)* EOF
;
statement
: print | assignment
;
print
: 'print' '(' (Identifier | stringLiteral) ')'
;
assignment
: Identifier (Space)* '=' (Space)* stringLiteral
;
stringLiteral
: '"' (Identifier | EscapeSequence | NormalChar | Space | Interpolation)* '"'
;
Interpolation
: '${' Identifier '}'
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
;
EscapeSequence
: '\\' SpecialChar
;
SpecialChar
: '"' | '\\' | '$'
;
Space
: (' ' | '\t' | '\r' | '\n')
;
NormalChar
: ~SpecialChar
;
As you can see, there are a couple of (Space)* occurrences inside the example grammar. This is because stringLiteral is a parser rule instead of a lexer rule. Therefore, when tokenizing the source file, the lexer cannot know whether a white space is part of a string literal or is just a space in the source file that can be ignored.
I tested the example with a little Java class and all worked as expected:
/* the same grammar, but now with a bit of Java code in it */
grammar Str;
@parser::header {
package antlrdemo;
import java.util.HashMap;
}
@lexer::header {
package antlrdemo;
}
@parser::members {
HashMap<String, String> vars = new HashMap<String, String>();
}
parse
: ((Space)* statement (Space)* ';')+ (Space)* EOF
;
statement
: print | assignment
;
print
: 'print' '('
( id=Identifier {System.out.println("> "+vars.get($id.text));}
| st=stringLiteral {System.out.println("> "+$st.value);}
)
')'
;
assignment
: id=Identifier (Space)* '=' (Space)* st=stringLiteral {vars.put($id.text, $st.value);}
;
stringLiteral returns [String value]
: '"'
{StringBuilder b = new StringBuilder();}
( id=Identifier {b.append($id.text);}
| es=EscapeSequence {b.append($es.text);}
| ch=(NormalChar | Space) {b.append($ch.text);}
| in=Interpolation {b.append(vars.get($in.text.substring(2, $in.text.length()-1)));}
)*
'"'
{$value = b.toString();}
;
Interpolation
: '${' i=Identifier '}'
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
;
EscapeSequence
: '\\' SpecialChar
;
SpecialChar
: '"' | '\\' | '$'
;
Space
: (' ' | '\t' | '\r' | '\n')
;
NormalChar
: ~SpecialChar
;
And a class with a main method to test it all:
package antlrdemo;
import org.antlr.runtime.*;
public class ANTLRDemo {
public static void main(String[] args) throws RecognitionException {
String source = "name = \"Bob\"; \n"+
"msg = \"Hello ${name}\"; \n"+
"print(msg); \n"+
"print(\"Bye \\${for} now!\"); ";
ANTLRStringStream in = new ANTLRStringStream(source);
StrLexer lexer = new StrLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
StrParser parser = new StrParser(tokens);
parser.parse();
}
}
which produces the following output:
> Hello Bob
> Bye \${for} now!
Again, I am no expert, but this (at least) gives you a way to solve it.
HTH.
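If you only need the substitution step (for example, as a post-processing pass over a string the lexer has already matched), it can be sketched in plain Java with no ANTLR involved. This is my own assumption about one possible design, not part of the grammar above: interpolate is a hypothetical helper, and unlike the output shown above, which keeps the backslash, this sketch drops the backslash of an escape such as \$.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the interpolation step only.
// Assumptions: "${name}" looks a variable up in vars, and a backslash makes
// the next character literal, so "\${x}" is not interpolated.
class InterpolationDemo {

    static String interpolate(String body, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                // Escape: drop the backslash, keep the escaped character.
                out.append(body.charAt(i + 1));
                i += 2;
            } else if (c == '$' && i + 1 < body.length() && body.charAt(i + 1) == '{') {
                int close = body.indexOf('}', i + 2);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated ${ at " + i);
                }
                String name = body.substring(i + 2, close);
                // Unknown names append the text "null", as vars.get does in the grammar actions.
                out.append(vars.get(name));
                i = close + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("name", "Bob");
        System.out.println(interpolate("Hello ${name}!", vars));    // Hello Bob!
        System.out.println(interpolate("Bye \\${for} now!", vars)); // Bye ${for} now!
    }
}
```

Keeping the string literal as a single lexer token and interpolating afterwards avoids the (Space)* bookkeeping in the parser, at the cost of only supporting plain variable names inside ${...} rather than full expressions.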
