I'm trying to write a grammar for a Prolog interpreter. When I run grun from the command line on input like "father(john,mary).", I get a message saying "no viable input at 'father(john,'" and I don't know why. I've tried rearranging the rules in my grammar, using different entry points, etc., but I still get the same error. I'm not even sure whether it's caused by my grammar or by something else, like ANTLR itself. Can someone point out what is wrong with my grammar, or suggest what else could be the cause?
The commands I ran are:
antlr4 -no-listener -visitor Expr.g4
javac *.java
grun antlr.Expr start tests/test.txt -gui
And this is the resulting parse tree (image not included):
Here is my grammar:
grammar Expr;
@header {
package antlr;
}
//start rule
start : (program | query) EOF
;
program : (rule_ '.')*
;
query : conjunction '?'
;
rule_ : compound
| compound ':-' conjunction
;
conjunction : compound
| compound ',' conjunction
;
compound : Atom '(' elements ')'
| '.(' elements ')'
;
list : '[]'
| '[' element ']'
| '[' elements ']'
;
element : Term
| list
| compound
;
elements : element
| element ',' elements
;
WS : [ \t\r\n]+ -> skip ;
Atom : [a-z]([a-z]|[A-Z]|[0-9]|'_')*
| '0'
;
Var : [A-Z]([a-z]|[A-Z]|[0-9]|'_')*
;
Term : Atom
| Var
;
The lexer will always produce the same tokens for a given input. It does not "listen" to what the parser is trying to match. The rules the lexer applies are quite simple:
1. try to match as many characters as possible
2. when 2 or more lexer rules match the same number of characters, let the rule defined first "win"
Because of the 2nd rule, the Term rule will never be matched: anything it could match is matched equally well by Atom or Var, which are defined earlier. And moving the Term rule above Var and Atom would cause those rules to never be matched. The solution: "promote" the Term rule to a parser rule:
start : (program | query) EOF
;
program : (rule_ '.')*
;
query : conjunction '?'
;
rule_ : compound (':-' conjunction)?
;
conjunction : compound (',' conjunction)?
;
compound : Atom '(' elements ')'
| '.' '(' elements ')'
;
list : '[' elements? ']'
;
element : term
| list
| compound
;
elements : element (',' element)*
;
term : Atom
| Var
;
WS : [ \t\r\n]+ -> skip ;
Atom : [a-z] [a-zA-Z0-9_]*
| '0'
;
Var : [A-Z] [a-zA-Z0-9_]*
;
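To see the two lexer rules in action, here is a minimal sketch in plain Java. It is a toy model, not ANTLR's actual implementation: the rule names mirror the original grammar, PUNCT is a hypothetical stand-in for the literal tokens such as '(' and ',', and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerPriorityDemo {
    // Rules in definition order, as in the original grammar: Term comes after Atom and Var.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        RULES.put("Atom", Pattern.compile("[a-z][a-zA-Z0-9_]*|0"));
        RULES.put("Var", Pattern.compile("[A-Z][a-zA-Z0-9_]*"));
        RULES.put("Term", Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*|0"));
        RULES.put("PUNCT", Pattern.compile("[(),.]")); // assumption: stands in for the literals
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Rule 1: a strictly longer match wins.
                // Rule 2: on a tie, the earlier rule is kept because we never replace it.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestRule.equals("WS")) { // WS is skipped, like "-> skip"
                tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("father(john,Mary)."));
    }
}
```

In this model no input ever yields a Term token, because Atom or Var always ties with it and was defined first. That would explain the original error: the parser rule element asks for a Term, but the token it receives for john is an Atom.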
I've written the following arithmetic grammar:
grammar Calc;
program
: expressions
;
expressions
: expression (NEWLINE expression)*
;
expression
: '(' expression ')' // parenExpression has highest precedence
| expression MULDIV expression // then multDivExpression
| expression ADDSUB expression // then addSubExpression
| OPERAND // finally the operand itself
;
MULDIV
: [*/]
;
ADDSUB
: [-+]
;
// 12 or .12 or 2. or 2.38
OPERAND
: [0-9]+ ('.' [0-9]*)?
| '.' [0-9]+
;
NEWLINE
: '\n'
;
And I've noticed that regardless of how I space the tokens I get the same result, for example:
1+2
2+3
Or:
1 +2
2+3
Both give me the same result. I've also noticed that adding the following rule makes no difference:
WS
: [ \r\n\t]+ -> skip
;
Which makes me wonder whether skipping whitespace is the default behavior of ANTLR4?
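No, ANTLR4 has no default that ignores whitespace. What you are seeing is error recovery: the lexer reports a token recognition error for a character that no lexer rule matches (a space, in your grammar) and drops it, and an ANTLR4 based parser can likewise skip over a single unwanted token, or carry on past a single missing one, and continue parsing if possible, which is the case here. You always have to specify a whitespace rule which either skips whitespace or puts it on a hidden channel.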
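Here is a rough model of what I believe happens to the spaces in the Calc example. It assumes that a lexer with no whitespace rule reports each character it cannot match and then drops it. This is a plain-Java sketch under that assumption, not ANTLR code; the class and method names are made up, and the error text only imitates the kind of message ANTLR prints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnmatchedCharDemo {
    // The four token rules of the Calc grammar; there is deliberately no WS rule.
    static final Pattern TOKEN = Pattern.compile(
        "(?<MULDIV>[*/])|(?<ADDSUB>[-+])"
        + "|(?<OPERAND>[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)|(?<NEWLINE>\\n)");
    static final String[] NAMES = {"MULDIV", "ADDSUB", "OPERAND", "NEWLINE"};

    /** Returns the tokens; every character no rule matches is reported in errors and dropped. */
    static List<String> lex(String input, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (m.lookingAt()) {
                for (String name : NAMES) {
                    if (m.group(name) != null) {
                        tokens.add(name + ":" + m.group(name).replace("\n", "\\n"));
                    }
                }
                pos = m.end();
            } else {
                // Assumed recovery: complain about one character, then carry on.
                errors.add("token recognition error at: '" + input.charAt(pos) + "'");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(lex("1 +2\n2+3", errors));
        System.out.println(errors);
    }
}
```

In this model "1+2" and "1 +2" produce the same token list; the only difference is the reported error. So the result looks the same, but the space was not silently accepted, and the console output is worth checking for such messages.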
I am defining a grammar in ANTLR for expressions that combine logical operators and parentheses.
Here is the grammar:
grammar simpleGrammar;
/* This will be the entry point of the parser. */
parse
:
expression EOF
;
expression
:
expression binOp expression | ID | unOp (expression) | '(' expression ')'
;
binOp
:
('AND' | 'OR')
;
unOp
:
'NOT'
;
ID :
('a'..'z' | 'A'..'Z')+
;
The grammar produces a parse tree for input without parentheses, but when I give it input with parentheses, for example (Apple OR Bananana)AND Orange, it throws a MismatchedTokenException.
I would really appreciate it if someone could explain how to define the grammar so that it handles parentheses.
You forgot to tell ANTLR what to do with whitespace. For example:
WS : [ \t\r\n] -> skip;
Add this and your grammar will work.
As a side note, your grammar gives the AND and OR operators the same precedence, and both bind more tightly than NOT. As this goes against the conventional rules, I'd advise you to write your expression rule like this instead:
expression
: '(' expression ')' # parenExp
| 'NOT' expression # notExpr
| expression 'AND' expression # andExpr
| expression 'OR' expression # orExpr
| ID # atomExpr
;
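To make the intended precedence concrete (NOT binds tightest, then AND, then OR), here is a small hand-written recursive-descent sketch in plain Java. It is not generated by ANTLR and not how ANTLR implements left recursion. It only shows how input should group under that ordering; the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoolPrecedenceDemo {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private BoolPrecedenceDemo(String input) {
        // Whitespace never matches the pattern, so it is skipped (like WS -> skip).
        Matcher m = Pattern.compile("[A-Za-z]+|[()]").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    /** Returns the input with every operator application fully parenthesized. */
    static String parse(String input) {
        BoolPrecedenceDemo p = new BoolPrecedenceDemo(input);
        String tree = p.orExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        }
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private boolean accept(String t) {
        if (peek().equals(t)) { pos++; return true; }
        return false;
    }

    // OR binds loosest, so it sits at the top of the call chain.
    private String orExpr() {
        String left = andExpr();
        while (accept("OR")) left = "(" + left + " OR " + andExpr() + ")";
        return left;
    }

    private String andExpr() {
        String left = notExpr();
        while (accept("AND")) left = "(" + left + " AND " + notExpr() + ")";
        return left;
    }

    // NOT binds tighter than both binary operators.
    private String notExpr() {
        if (accept("NOT")) return "(NOT " + notExpr() + ")";
        return primary();
    }

    private String primary() {
        if (accept("(")) {
            String inner = orExpr();
            if (!accept(")")) throw new IllegalArgumentException("expected ')' but found " + peek());
            return inner;
        }
        String t = peek();
        boolean keyword = t.equals("AND") || t.equals("OR") || t.equals("NOT");
        if (t.matches("[A-Za-z]+") && !keyword) { pos++; return t; }
        throw new IllegalArgumentException("expected identifier but found " + t);
    }

    public static void main(String[] args) {
        System.out.println(parse("(Apple OR Bananana)AND Orange")); // ((Apple OR Bananana) AND Orange)
        System.out.println(parse("NOT a AND b OR c"));              // (((NOT a) AND b) OR c)
    }
}
```

Each precedence level gets its own method, and a level only ever calls the next tighter one for its operands. That is the hand-written equivalent of ordering the alternatives in the ANTLR rule.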
I would like to be able to write a "meta-rule" in ANTLR4 that takes a rule as an input argument and performs a set modification to that rule. Here's an example grammar:
grammar G;
WS: [ \t\n\r] + -> skip;
CHAR: [a-z];
term: (CHAR)+;
sum: term ('+' term)+;
pterm: '(' term ')' | '(' pterm ')';
psum: '(' sum ')' | '(' psum ')';
expr: term | sum | pterm | psum;
The rules for pterm and psum perform the same action on term and sum, enclosing them in possibly nested parentheses. I would like to be able to replace the last three lines above with something like the following:
enclose[rule]: '(' rule ')' | '(' enclose(rule) ')';
expr: term | sum | enclose(term) | enclose(sum);
Is there a way to construct a meta-rule like this?
The short answer is, no.
Better to resolve by refactoring the grammar and identifying the structurally significant terms:
expr: sum | LPAREN expr RPAREN ;
sum : term ('+' term)* ; // changed to Kleene star
term: CHAR+ ;
LPAREN : '(' ;
RPAREN : ')' ;
CHAR : [a-z] ;
WS : [ \t\n\r]+ -> skip ;
The sum rule will consume all terms, so the expr rule only needs to handle sums.
I'm following the example given here-
https://datapsyche.wordpress.com/2014/10/23/back-to-learning-grammar-with-antlr/
which basically has following grammar-
grammar Simpleql;
statement : expr command* ;
expr : expr ('AND' | 'OR' | 'NOT') expr # expopexp
| expr expr # expexp
| predicate # predicexpr
| text # textexpr
| '(' expr ')' # exprgroup
;
predicate : text ('=' | '!=' | '>=' | '<=' | '>' | '<') text ;
command : '| show' text* # showcmd
| '| show' text (',' text)* # showcsv
;
text : NUMBER # numbertxt
| QTEXT # quotedtxt
| UQTEXT # unquotedtxt
;
AND : 'AND' ;
OR : 'OR' ;
NOT : 'NOT' ;
EQUALS : '=' ;
NOTEQUALS : '!=' ;
GREQUALS : '>=' ;
LSEQUALS : '<=' ;
GREATERTHAN : '>' ;
LESSTHAN : '<' ;
NUMBER : DIGIT+
| DIGIT+ '.' DIGIT+
| '.' DIGIT+
;
QTEXT : '"' (ESC|.)*? '"' ;
UQTEXT : ~[ ()=,<>!\r\n]+ ;
fragment
DIGIT : [0-9] ;
fragment
ESC : '\\"' | '\\\\' ;
WS : [ \t\r\n]+ -> skip ;
When I pass input like this-
Abishek AND (country=India OR city=NY) LOGIN 404 | show name city
I get error- line 1:65 no viable alternative at input '<EOF>'
I went through a couple of SO posts related to the error but can't seem to be able to figure out what is wrong with the grammar.
I tried running your example and got a number of errors in ANTLRWorks 2. However, I was able to run it without any errors in the test rig, getting the following output:
(statement (expr (expr (expr (text Abishek)) AND (expr ( (expr (expr (predicate (text country) = (text India))) OR (expr (predicate (text city) = (text NY)))) ))) (expr (expr (text LOGIN)) (expr (text 404)))) (command | show (text name) (text city)))
That is the same tree as the one shown on the website.
My guess is that the problem is your actual input. I've had problems in the past with ANTLR reading text from a file that was not saved in the encoding the parser expected (ASCII, ANSI, UTF-8, or whatever your OS uses by default). I ran into this when I saved a file on Linux with a Linux text editor and then tried to parse it on Windows with the same generated parser. So my recommendation is to re-save your input text, 'Abishek AND (country=India OR city=NY) LOGIN 404 | show name city', trying a different encoding each time, in case this is the cause.
Note that you can also specify the encoding explicitly, for example like this (CharStreams is available from ANTLR 4.7 on):
CharStream charStream = CharStreams.fromStream(inputStream, StandardCharsets.UTF_8);
An encoding mismatch does not stop the parse: the input is decoded with the wrong encoding anyway, so the lexer sees different characters and the rules may fail to match.
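As a small illustration of why the encoding matters, the following plain-Java sketch (no ANTLR involved; the class and method names are made up) saves a fragment of the input in one encoding and reads it back assuming another. UTF-16LE and UTF-8 are just one example pair; your files may differ in some other way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    /** Encodes text the way it was saved, then decodes it the way a reader with another charset would. */
    static String reinterpret(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        String input = "city=NY";
        String seen = reinterpret(input, StandardCharsets.UTF_16LE, StandardCharsets.UTF_8);
        System.out.println(input.length() + " characters saved, " + seen.length() + " characters read");
        System.out.println("contains NUL characters: " + (seen.indexOf('\u0000') >= 0));
    }
}
```

The 7 characters come back as 14, with a NUL character after each original one, so a lexer fed this text would be tokenizing something quite different from what the editor showed.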
Let me know if it works after saving the file in a few different encodings and I'll try to help further. Hope this helps.
I want to create a grammar that will parse a text file and create a tree of levels according to configurable "segmentors". This is what I have created so far, it kind of works, but will halt when a "segmentor" appears in the beginning of a text. For example, text "and location" will fail to parse. Any ideas?
Also, I'm pretty certain that the grammar could be greatly improved, so any suggestions are welcome.
grammar DocSegmentor;
@header {
package segmentor.antlr;
}
// PARSER RULES
levelOne: (levelTwo LEVEL1_SEG*)+ ;
levelTwo: (levelThree+ LEVEL2_SEG?)+ ;
levelThree: (levelFour+ LEVEL3_SEG?)+ ;
levelFour: (levelFive+ LEVEL4_SEG?)+ ;
levelFive: tokens;
tokens: (DELIM | PAREN | TEXT | WS)+ ;
// LEXER RULES
LEVEL1_SEG : '\r'? '\n'| EOF ;
LEVEL2_SEG : '.' ;
LEVEL3_SEG : ',' ;
LEVEL4_SEG : 'and' | 'or' ;
DELIM : '`' | '"' | ';' | '/' | ':' | '’' | '‘' | '=' | '?' | '-' | '_';
PAREN : '(' | ')' | '[' | ']' | '{' | '}' ;
TEXT : (('a'..'z') | ('A'..'Z') | ('0'..'9'))+ ;
WS : [ \t]+ ;
I'd definitely go with a Scala parser combinator library.
https://lihaoyi.github.io/fastparse/
https://github.com/scala/scala-parser-combinators
Those are just two examples; a parser combinator library is also something you can write by hand with little effort and tune to whatever you need. I should mention that if you're writing a parser monad on your own, you should go with Scalaz (https://github.com/scalaz/scalaz).
I wouldn't use a parser at all for that task. All you need is keyword spotting.
It's much easier and more flexible to just scan your text for the "segmentors" by walking over the input. This also lets you handle text of any size (e.g. by using memory-mapped files), while parsers usually (ANTLR for sure) load the entire text into memory and tokenize it fully before parsing starts.
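A minimal sketch of that keyword-spotting idea in plain Java, assuming the four segmentor levels from the question's grammar (line break, '.', ',', and the words 'and'/'or'). The class and method names are invented, the segmentors themselves are dropped rather than kept in the tree, and the whole text is held in a String here, so it does not show the memory-mapped variant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SegmentSpotter {
    // Segmentors from coarsest to finest, mirroring LEVEL1_SEG .. LEVEL4_SEG.
    static final List<Pattern> LEVELS = List.of(
        Pattern.compile("\\r?\\n"),
        Pattern.compile("\\."),
        Pattern.compile(","),
        Pattern.compile("\\b(?:and|or)\\b")); // whole words only, so "band" is not split

    /** Returns a trimmed String at the finest level, otherwise a List of child segments. */
    static Object segment(String text, int level) {
        if (level == LEVELS.size()) return text.trim();
        List<Object> children = new ArrayList<>();
        for (String part : LEVELS.get(level).split(text)) {
            // A segmentor at the very start leaves an empty piece; just ignore it.
            if (!part.trim().isEmpty()) children.add(segment(part, level + 1));
        }
        return children;
    }

    public static void main(String[] args) {
        System.out.println(segment("and location", 0));
        System.out.println(segment("red, green and blue. done", 0));
    }
}
```

Because empty pieces are simply skipped, a segmentor at the beginning of the text (as in "and location") needs no special handling. The list of levels is ordinary data, so making the segmentors configurable is just a matter of passing in a different list.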