I'm trying to create sort of css preprocessor for my study.
This is my grammar. It does a little for now, but it already doesn't work.
grammar CSSS;
#header {
package antlr;
}
program
: member*
;
member
: element
;
element
: selector '{' property* '}'
;
selector
: selector_atom (('>'|'+'|'~') selector_atom)*
;
selector_atom
: ('#'|'.')? NAME
| '*'
;
property
: prop_name ':' prop_value+ ';'
;
prop_name
: NAME
;
prop_value
: (CSSLETTER | NUMBER)+
;
NAME: CSSLETTER+;
NUMBER: '0'..'9';
CSSLETTER: 'a'..'z' | 'A'..'Z' | '-' | '_';
WS: (' '|'\n'|'\r'|'\t'|'\f')+ { skip(); };
Problem is when i try to feed it with this input
body {
font-size: 19px;
...
}
parse tree outputs this strange dot symbol where it should skip whitespace.
Can someone please explain what is this thing and how to make everything work fine.
Related
This is freaking me out, I just can't find a solution to it. I have a grammar for search queries and would like to match any searchterm in a query composed out of printable letters except for special characters "(", ")". Strings enclosed in quotes are handled separately and work.
Here is a somewhat working grammar:
/* ANTLR Grammar for Minidb Query Language */
grammar Mdb;
start
: searchclause EOF
;
searchclause
: table expr
;
expr
: fieldsearch
| searchop fieldsearch
| unop expr
| expr relop expr
| lparen expr relop expr rparen
;
lparen
: '('
;
rparen
: ')'
;
unop
: NOT
;
relop
: AND
| OR
;
searchop
: NO
| EVERY
;
fieldsearch
: field EQ searchterm
;
field
: ID
;
table
: ID
;
searchterm
:
| STRING
| ID+
| DIGIT+
| DIGIT+ ID+
;
STRING
: '"' ~('\n'|'"')* ('"' )
;
AND
: 'and'
;
OR
: 'or'
;
NOT
: 'not'
;
NO
: 'no'
;
EVERY
: 'every'
;
EQ
: '='
;
fragment VALID_ID_START
: ('a' .. 'z') | ('A' .. 'Z') | '_'
;
fragment VALID_ID_CHAR
: VALID_ID_START | ('0' .. '9')
;
ID
: VALID_ID_START VALID_ID_CHAR*
;
DIGIT
: ('0' .. '9')
;
/*
NOT_SPECIAL
: ~(' ' | '\t' | '\n' | '\r' | '\'' | '"' | ';' | '.' | '=' | '(' | ')' )
; */
WS
: [ \r\n\t] + -> skip
;
The problem is that searchterm is too restricted. It should match any character that is in the commented out NOT_SPECIAL, i.e., valid queries would be:
Person Name=%
Person Address=^%Street%%%$^&*#^
But whenever I try to put NOT_SPECIAL in any way into the definition of searchterm it doesn't work. I have tried putting it literally into the rule, too (commenting out NOT_SPECIAL) and many others things, but it just doesn't work. In most of my attempts the grammar just complained about extraneous input after "=" and said it was expecting EOF. But I also cannot put EOF into NOT_SPECIAL.
Is there any way I can simply parse every text after "=" in rule fieldsearch until there is a whitespace or ")", "("?
N.B. The STRING rule works fine, but the user ought not be required to use quotes every time, because this is a command line tool and they'd need to be escaped.
Target language is Go.
You could solve that by introducing a lexical mode that you'll enter whenever you match an EQ token. Once in that lexical mode, you either match a (, ) or a whitespace (in which case you pop out of the lexical mode), or you keep matching your NOT_SPECIAL chars.
By using lexical modes, you must define your lexer- and parser rules in their own files. Be sure to use lexer grammar ... and parser grammar ... instead of the grammar ... you use in a combined .g4 file.
A quick demo:
lexer grammar MdbLexer;
STRING
: '"' ~[\r\n"]* '"'
;
OPAR
: '('
;
CPAR
: ')'
;
AND
: 'and'
;
OR
: 'or'
;
NOT
: 'not'
;
NO
: 'no'
;
EVERY
: 'every'
;
EQ
: '=' -> pushMode(NOT_SPECIAL_MODE)
;
ID
: VALID_ID_START VALID_ID_CHAR*
;
DIGIT
: [0-9]
;
WS
: [ \r\n\t]+ -> skip
;
fragment VALID_ID_START
: [a-zA-Z_]
;
fragment VALID_ID_CHAR
: [a-zA-Z_0-9]
;
mode NOT_SPECIAL_MODE;
OPAR2
: '(' -> type(OPAR), popMode
;
CPAR2
: ')' -> type(CPAR), popMode
;
WS2
: [ \t\r\n] -> skip, popMode
;
NOT_SPECIAL
: ~[ \t\r\n()]+
;
Your parser grammar would start like this:
parser grammar MdbParser;
options {
tokenVocab=MdbLexer;
}
start
: searchclause EOF
;
// your other parser rules
My Go is a bit rusty, but a small Java test:
String source = "Person Address=^%Street%%%$^&*#^()";
MdbLexer lexer = new MdbLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
for (Token t : tokens.getTokens()) {
System.out.printf("%-15s %s\n", MdbLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
print the following:
ID Person
ID Address
EQ =
NOT_SPECIAL ^%Street%%%$^&*#^
OPAR (
CPAR )
EOF <EOF>
Im writing chrome DEPS file parser. How to match one of either following grammar rule defintion of rightexpr. My grammar is like
the following one:
grammar Depsgrammar;
prog: expr+ EOF;
expr: varline
;
varline:
ID EQ rightexpr
;
rightexpr :
basicvalue | bentukonejsonval| bentuktwojsonval
;
bentukonejsonval :
'[' string? (COMMA string )* COMMA? ']'
;
bentuktwojsonval :
'{' singledictexpr? (COMMA singledictexpr )* COMMA? '}'
;
singledictexpr :
string ':' basicvalue
;
basicvalue :
True
| False
| string
| NUM
| varfunc
;
varfunc :
Var '(' string ')'
;
string :
SIMPLESTRINGEXPRDOUBLEQUOTE
| SIMPLESTRINGEXPRSINGLEQUOTE
;
Var : 'Var' ;
COMMA : ',' ;
NUM : [0-9]+;
ID : [a-zA-Z0-9_]+;
True : [tT] [Rr] [Uu] [Ee];
False: [Ff] [Aa] [Ll] [Ss] [Ee];
fragment SIMPLESTRINGEXPRDOUBLEQUOTEBASE : ~ ( '\n' | '\r' | '"' )* ;
SIMPLESTRINGEXPRDOUBLEQUOTE: '"' SIMPLESTRINGEXPRDOUBLEQUOTEBASE '"' ;
fragment SIMPLESTRINGEXPRSINGLEQUOTEBASE : ~ ( '\n' | '\r' | '\'' )* ;
SIMPLESTRINGEXPRSINGLEQUOTE : '\'' SIMPLESTRINGEXPRSINGLEQUOTEBASE '\'' ;
EQ : '=';
COMMENT:
'#' ~ ( '\n' | '\r' )* '\n' -> skip ;
WS : [ \n\t\r]+ -> skip ;
I want user could enter this input
#adas21 #FS;SFD33
_as= Var('das') # somelongth comment
_as_0= FALSE # somelongth comment
_as_0= 'as!' # somelongth comment
gclient_gn_args = [
#ad as!~;
'checkout_libaom',
'checkout_nacl',
'"{cros_board}" == "amd64-generic"',
'checkout_oculus_sdk',
]
vars = {
'checkout_libaom':1,
'checkout_nacl': "SS",
'checkout_oculus_sdk': FalSe,
'checkout_oculus_sdk':'',
}
s=[
]
whenever I enter simple syntax in grun
sa=true
always give me line 1:3 mismatched input 'true' expecting blah..(rightexpr def). I'm missing in understanding of basic antlr4 choice matching decision. Could you please teach me?
Thanks
Whenever you have an error where the list of expected tokens seemingly includes the unexpected token, it is a good idea to list the generated tokens. You can do that by passing the -tokens option to grun. If you do this for your input, you'll see that true is interpreted as an ID token, not a True token.
The reason for that is that when multiple lexer rules would match on the current input and produce a match of the same size, the one that's defined earlier in the grammar is chosen. So because ID is defined before True, it takes precedence. Generally all keywords should be defined before the ID rule to prevent exactly this issue.
In other words, moving the True and False rules before ID will solve your issue.
I want to create a Grammar that will parse the input statement
myvar is 43+23
and
otherVar of myvar is "hallo"
But the parser doesn't recognize anything here.
(sorry, I am not allowed to post images :( imagine a statement node with the Tokens
[myvar] [is] [43] [+] [23] as children all marked red. Same goes for the other statement)
I'm getting error messages that confuse me:
line 2:7 no viable alternative at input 'myvaris'
line 3:19 no viable alternative at input 'otherVarofmyvaris'
Where are the spaces gone? I assume, It's something with my lexer, but I can't see what the problem is. Just in case here is the grammar for these statements:
statement
: envCall #call_Environment_Function
| identifier IS expression # assignment_statement // This one should be used
| loopHeader statement_block # loop_statement
etc...
expression
: '(' expression ')' #bracket_Expression
| mathExpression #math_Expression
| identifier #identifier_Expression // this one should be used
| objectExpression #object_Expression
etc ...
identifier //both of these should be used
: selector=IDENTIFIER OF object=expression #ofIdentifier
| selector=IDENTIFIER #idLocal
;
here are all the Lexer rules I have so far:
IdentifierNamespace: IDENTIFIER '.' IDENTIFIER;
FromIn: FROM | IN;
OPENBLOCK: NEWLINE? '{';
CLOSEBLOCK: '}' NEWLINE;
NEWLINE: ['\n''\t']+;
NUMBER: INT | FLOAT;
INT: [0-9]+;
FLOAT: [0-9]* '.' [0-9]+;
IsAre: IS | ARE;
OF: 'of';
IS: 'is';
ARE: 'are';
DO: 'do';
FROM: 'from';
IN: 'in';
IDENTIFIER : [a-zA-Z]+ ;
//WHITESPACE: [ \t]+ -> skip;
fragment UNICODE : 'u' HEX HEX HEX HEX ;
fragment HEX : [0-9a-fA-F] ;
fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
STRING : '"' (ESC | ~["\\])* '"' ;
END: 'END'[.]* EOF;
WHITESPACE : ( '\t' | ' ' )+ -> skip ;
Ok, found it. There was a compOP defined for the parser, and it was messing up the treegeneration.
compOP: '<'
| '>'
| '=' // the programmers '=='
| '>='
| '<='
| '<>'
| '!='
| 'in'
| 'not' 'in'
| 'is' <- removed this one and it works now
;
So: never assign the same keyword to Parser and Lexer, I guess.
I am trying to adapt the STRING part of Pair in Object to a CamelString, but it fails. and report "No viable alternative at input".
I have tried to used my CamelString as an independent grammar, it works well. I think it means there is ambiguity in my grammar, but I can not understand why.
For the test input
{
'BaaaBcccCdddd':'cc'
}
Ther error is
line 2:2 no viable alternative at input '{'BaaaBcccCdddd''
The following is my grammar. It's almost the same with the standard JSON grammar for ANTLR 4.
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar Json;
json: object
| array
;
object
: '{' pair (',' pair)* '}'
| '{' '}' // empty object
;
pair : camel_string ':' value;
camel_string : '\'' (camel_body)*? '\'';
STRING
: '\'' (ESC | ~['\\])* '\'';
camel_body: CAMEL_BODY;
CAMEL_START: [a-z] ALPHA_NUM_LOWER*;
CAMEL_BODY: [A-Z] ALPHA_NUM_LOWER*;
CAMEL_END: [A-Z]+;
fragment ALPHA_NUM_LOWER : [0-9a-z];
array
: '[' value (',' value)* ']'
| '[' ']' // empty array
;
value
: STRING
| NUMBER
| object // recursion
| array // recursion
| 'true' // keywords
| 'false'
| 'null'
;
fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
fragment UNICODE : 'u' HEX HEX HEX HEX ;
fragment HEX : [0-9a-fA-F] ;
NUMBER
: '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;
fragment INT : '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
WS : [ \t\n\r]+ -> skip ;
Background
I have been using ANTLRWorks (V 1.4.3) for a few days now and trying to write a simple Boolean parser. The combined lexer/parser grammar below works well for most of the requirements including support for quoted white-spaced text as operands for a Boolean expression.
Problem
I would like the grammar to work for white-spaced operands without the need of quotes.
Example
For example, expression-
"left right" AND center
should have the same parse tree even after dropping the quotes-
left right AND center.
I have been learning about backtracking, predicates etc but can't seem to find a solution.
Code
Below is the grammar I have got so far. Any feedback on the foolish mistakes is appreciated :).
Lexer/Parser Grammar
grammar boolean_expr;
options {
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
}
#modifier{public}
#ctorModifier{public}
#lexer::namespace{Org.CSharp.Parsers}
#parser::namespace{Org.CSharp.Parsers}
public
evaluator
: expr EOF
;
public
expr
: orexpr
;
public
orexpr
: andexpr (OR^ andexpr)*
;
public
andexpr
: notexpr (AND^ notexpr)*
;
public
notexpr
: (NOT^)? atom
;
public
atom
: word | LPAREN! expr RPAREN!
;
public
word
: QUOTED_TEXT | TEXT
;
/*
* Lexer Rules
*/
LPAREN
: '('
;
RPAREN
: ')'
;
AND
: 'AND'
;
OR
: 'OR'
;
NOT
: 'NOT'
;
WS
: ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;}
;
QUOTED_TEXT
: '"' (LETTER | DIGIT | ' ' | ',' | '-')+ '"'
;
TEXT
: (LETTER | DIGIT)+
;
/*
Fragment lexer rules can be used by other lexer rules, but do not return tokens by themselves
*/
fragment DIGIT
: ('0'..'9')
;
fragment LOWER
: ('a'..'z')
;
fragment UPPER
: ('A'..'Z')
;
fragment LETTER
: LOWER | UPPER
;
Simply let TEXT in your atom rule match once or more: TEXT+. When it matches a TEXT token more than once, you'll also want to create a custom root node for these TEXT tokens (I added an imaginary token called WORD in the grammar below).
grammar boolean_expr;
options {
output=AST;
}
tokens {
WORD;
}
evaluator
: expr EOF
;
...
word
: QUOTED_TEXT
| TEXT+ -> ^(WORD TEXT+)
;
...
Your input "left right AND center" would now be parsed as follows: