UnsupportedOperationException when I try to translate an expression - rascal

I defined a syntax for a language of expressions. The real one is more complex, but I simplified it to post here. I also defined some functions to translate these expressions to Java; I'm not using the Java syntax, I just translate to strings. I defined some constants in the syntax (I don't know if that is the correct name for them) called "MAXINT" and "MININT", and some functions, all named translateExp, that translate each kind of expression defined in the syntax to a string. Most expressions I try to translate work. But when "MAXINT" or "MININT" appears in the expression, for example "MAXINT - 1", it throws UnsupportedOperationException and I don't know why. Does anybody know why and can help me?
Another problem that also throws UnsupportedOperationException is when I try to translate an expression that has more than one plus (+) or minus (-), like "1+1+1". Again, does anybody know why?
My module with the syntax and the functions:
module ExpSyntax
import String;
import ParseTree;
layout Whitespaces = [\t\n\ \r\f]*;
lexical Ident = [a-z A-Z 0-9 _] !<< [a-z A-Z][a-z A-Z 0-9 _]* !>> [a-z A-Z 0-9 _] \ Keywords;
lexical Integer_literal = [0-9]+;
keyword Keywords = "MAXINT" | "MININT";
start syntax Expression
= Expressions_primary
| Expressions_arithmetical;
syntax Expressions_primary
= Data: Ident+ id
| bracket Expr_bracketed: "(" Expression e ")"
;
syntax Expressions_arithmetical
= Integer_lit
| left Addition: Expression e1 "+" Expression e2
| left Difference: Expression e1 "-" Expression e2
;
syntax Integer_lit
= Integer_literal il
| MAX_INT: "MAXINT"
| MIN_INT: "MININT"
;
public str translate(str txt) = translateExp(parse(#Expression, txt));
public str translateExp((Expression) `<Integer_literal i>`) = "<i>";
public str translateExp((Expression) `MAXINT`) = "java.lang.Integer.MAX_VALUE";
public str translateExp((Expression) `MININT`) = "java.lang.Integer.MIN_VALUE";
public str translateExp((Expression) `<Expression e1>+<Expression e2>`) = "<translateExp(e1)> + <translateExp(e2)>";
public str translateExp((Expression) `<Expression e1>-<Expression e2>`) = "<translateExp(e1)> - <translateExp(e2)>";
public str translateExp((Expression) `<Ident id>`) = "<id>";
public str translateExp((Expression) `(<Expression e>)`) = "(<translateExp(e)>)";
This is what happens:
rascal>import ExpSyntax;
ok
rascal>translate("(test + 1) - test2");
str: "(test + 1) - test2"
rascal>translate("MAXINT - 1");
java.lang.UnsupportedOperationException(internal error) at $shell$(|main://$shell$|)
java.lang.UnsupportedOperationException
at org.rascalmpl.ast.Expression.getKeywordArguments(Expression.java:214)
at org.rascalmpl.interpreter.matching.NodePattern.<init>(NodePattern.java:84)
at org.rascalmpl.semantics.dynamic.Tree$Amb.buildMatcher(Tree.java:351)
at org.rascalmpl.ast.AbstractAST.getMatcher(AbstractAST.java:173)
at org.rascalmpl.interpreter.result.RascalFunction.prepareFormals(RascalFunction.java:503)
at org.rascalmpl.interpreter.result.RascalFunction.call(RascalFunction.java:365)
at org.rascalmpl.interpreter.result.OverloadedFunction.callWith(OverloadedFunction.java:327)
at org.rascalmpl.interpreter.result.OverloadedFunction.call(OverloadedFunction.java:305)
at org.rascalmpl.semantics.dynamic.Expression$CallOrTree.interpret(Expression.java:486)
at org.rascalmpl.semantics.dynamic.Statement$Expression.interpret(Statement.java:355)
at org.rascalmpl.semantics.dynamic.Statement$Return.interpret(Statement.java:773)
at org.rascalmpl.interpreter.result.RascalFunction.runBody(RascalFunction.java:467)
at org.rascalmpl.interpreter.result.RascalFunction.call(RascalFunction.java:413)
at org.rascalmpl.interpreter.result.OverloadedFunction.callWith(OverloadedFunction.java:327)
at org.rascalmpl.interpreter.result.OverloadedFunction.call(OverloadedFunction.java:305)
at org.rascalmpl.semantics.dynamic.Expression$CallOrTree.interpret(Expression.java:486)
at org.rascalmpl.semantics.dynamic.Statement$Expression.interpret(Statement.java:355)
at org.rascalmpl.interpreter.Evaluator.eval(Evaluator.java:936)
at org.rascalmpl.semantics.dynamic.Command$Statement.interpret(Command.java:115)
at org.rascalmpl.interpreter.Evaluator.eval(Evaluator.java:1147)
at org.rascalmpl.interpreter.Evaluator.eval(Evaluator.java:1107)
at org.rascalmpl.eclipse.console.RascalScriptInterpreter.execCommand(RascalScriptInterpreter.java:446)
at org.rascalmpl.eclipse.console.RascalScriptInterpreter.run(RascalScriptInterpreter.java:239)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
rascal>translate("1+1+1");
java.lang.UnsupportedOperationException(internal error) at $shell$(|main://$shell$|)
java.lang.UnsupportedOperationException
at org.rascalmpl.ast.Expression.getKeywordArguments(Expression.java:214)
at org.rascalmpl.interpreter.matching.NodePattern.<init>(NodePattern.java:84)
at org.rascalmpl.semantics.dynamic.Tree$Amb.buildMatcher(Tree.java:351)
at org.rascalmpl.ast.AbstractAST.getMatcher(AbstractAST.java:173)
at org.rascalmpl.interpreter.result.RascalFunction.prepareFormals(RascalFunction.java:503)
at org.rascalmpl.interpreter.result.RascalFunction.call(RascalFunction.java:365)
at org.rascalmpl.interpreter.result.OverloadedFunction.callWith(OverloadedFunction.java:327)
at org.rascalmpl.interpreter.result.OverloadedFunction.call(OverloadedFunction.java:305)
at org.rascalmpl.semantics.dynamic.Expression$CallOrTree.interpret(Expression.java:486)
at org.rascalmpl.semantics.dynamic.Statement$Expression.interpret(Statement.java:355)
at org.rascalmpl.semantics.dynamic.Statement$Return.interpret(Statement.java:773)
at org.rascalmpl.interpreter.result.RascalFunction.runBody(RascalFunction.java:467)
at org.rascalmpl.interpreter.result.RascalFunction.call(RascalFunction.java:413)
at org.rascalmpl.interpreter.result.OverloadedFunction.callWith(OverloadedFunction.java:327)
at org.rascalmpl.interpreter.result.OverloadedFunction.call(OverloadedFunction.java:305)
at org.rascalmpl.semantics.dynamic.Expression$CallOrTree.interpret(Expression.java:486)
at org.rascalmpl.semantics.dynamic.Statement$Expression.interpret(Statement.java:355)
at org.rascalmpl.interpreter.Evaluator.eval(Evaluator.java:936)
at org.rascalmpl.semantics.dynamic.Command$Statement.interpret(Command.java:115)
at org.rascalmpl.interpreter.Evaluator.eval(Evaluator.java:1147)
at org.rascalmpl.interpreter.Evaluator.eval(Evaluator.java:1107)
at org.rascalmpl.eclipse.console.RascalScriptInterpreter.execCommand(RascalScriptInterpreter.java:446)
at org.rascalmpl.eclipse.console.RascalScriptInterpreter.run(RascalScriptInterpreter.java:239)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)

The error message is itself a bug, since it should report something clearer. Still, it appears there is an ambiguity in your syntax definition. The run-time is trying to build a pattern matcher for an ambiguous concrete-syntax pattern in the formal parameters of translateExp, i.e. a Tree$Amb (org.rascalmpl.semantics.dynamic.Tree$Amb.buildMatcher). We have not implemented that and will not implement it.
From looking at the definition (I did not try it), it seems that the cause is this rule:
lexical Ident = [a-z A-Z 0-9 _] !<< [a-z A-Z][a-z A-Z 0-9 _]* !>> [a-z A-Z 0-9 _] \ Keywords;
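Because !>> and \ bind more strongly than juxtaposition (sequencing), the \ keyword reservation is applied only to the tail of an Ident and not to the whole. As a result, "MAXINT" and "MININT" can still be parsed as identifiers, which clashes with the MAX_INT and MIN_INT alternatives. Please add parentheses (the sequence operator) to disambiguate: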
lexical Ident = [a-z A-Z 0-9 _] !<< ([a-z A-Z][a-z A-Z 0-9 _]*) !>> [a-z A-Z 0-9 _] \ Keywords;
This should get you a step further.
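To make the effect of the bracketing concrete, here is a minimal Python sketch contrasting a keyword check on the whole identifier with one applied only to its tail. It is plain Python, not Rascal: regular expressions stand in for the lexical rule, and the helper names are hypothetical.

```python
import re

KEYWORDS = {"MAXINT", "MININT"}
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def ident_whole(text):
    """Keyword reservation applied to the whole lexeme, as in the bracketed rule."""
    return IDENT.fullmatch(text) is not None and text not in KEYWORDS


def ident_tail_only(text):
    """Reservation applied only to the part after the first letter,
    a rough model of what the unbracketed rule does."""
    return IDENT.fullmatch(text) is not None and text[1:] not in KEYWORDS


for word in ("test2", "MAXINT", "MININT"):
    print(word, ident_whole(word), ident_tail_only(word))
```

Under the tail-only model, MAXINT is accepted as an identifier and also matches the MAX_INT alternative. That is how an input such as MAXINT - 1 can end up with more than one parse.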
Then your expression grammar is still ambiguous and can be simplified to this:
start syntax Expression
= Data: Ident+ id
| Integer: Integer_lit
| bracket bracketed: "(" Expression e ")"
> left
( Addition: Expression e1 "+" Expression e2
| Difference: Expression e1 "-" Expression e2
)
;
syntax Integer_lit
= Integer_literal il
| MAX_INT: "MAXINT"
| MIN_INT: "MININT"
;
Short explanation: left only works on directly recursive non-terminals. Since you defined Expressions_arithmetical in terms of Expression, there was no direct recursion. A future release will support indirect recursion, but this grammar does not need it.
Also, I made + and - mutually left-associative by putting them in one group; otherwise 1+1-1 would have remained ambiguous.
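The following minimal Python sketch shows what this ambiguity means in practice. It is plain Python, not Rascal, and the function names are hypothetical. A rule of the form Expression op Expression without effective associativity admits every bracketing of a chain such as 1+1-1. A single left-associative group admits exactly one.

```python
def all_trees(operands, operators):
    """Every bracketing allowed by an ambiguous rule E ::= E op E | atom."""
    if len(operands) == 1:
        return [operands[0]]
    trees = []
    # Pick which operator is applied last, i.e. becomes the root of the tree.
    for i, op in enumerate(operators):
        for left in all_trees(operands[:i + 1], operators[:i]):
            for right in all_trees(operands[i + 1:], operators[i + 1:]):
                trees.append(f"({left} {op} {right})")
    return trees


def left_assoc(operands, operators):
    """The single tree that a left-associative group of + and - selects."""
    tree = operands[0]
    for op, operand in zip(operators, operands[1:]):
        tree = f"({tree} {op} {operand})"
    return tree
```

For example, all_trees(["1", "1", "1"], ["+", "-"]) returns the two trees "(1 + (1 - 1))" and "((1 + 1) - 1)", and left_assoc keeps only the second. With four operands there are already five trees.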

Related

Exponent operator does not work when no space is added? What's wrong with my grammar?

I am trying to write an expression evaluator, and I want to add the underscore _ as a reserved word that denotes a certain constant value.
Here is my grammar. It successfully parses 5 ^ _ but fails to parse _^ 5 (without a space). It only behaves this way for the ^ operator.
COMPILER Formula
CHARACTERS
digit = '0'..'9'.
letter = 'A'..'z'.
TOKENS
number = digit {digit}.
identifier = letter {letter|digit}.
self = '_'.
IGNORE '\r' + '\n'
PRODUCTIONS
Formula = Term{ ( '+' | '-') Term}.
Term = Factor {( '*' | "/" |'%' | '^' ) Factor}.
Factor = number | Self.
Self = self.
END Formula.
What am I missing? I am using Coco/R compiler generator.
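Your current definition of the token letter causes this issue. The range 'A'..'z' also covers the ASCII characters between 'Z' and 'a', which include _ and ^. Since an identifier is a letter followed by letters or digits, _^ is scanned as one identifier token instead of self followed by '^'. Defining letter = 'A'..'Z' + 'a'..'z'. avoids this.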
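Python's re character ranges are ordered by code point, just like Coco/R's 'A'..'z'. That makes it possible to write a short sketch that lists the non-letters inside that range and shows the identifier pattern swallowing _^. It is illustrative only and not Coco/R itself.

```python
import re

# Non-letter characters inside the code-point range 'A'..'z' (they sit between 'Z' and 'a').
between = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(between)

broken_ident = re.compile(r"[A-z][A-z0-9]*")      # letter = 'A'..'z'
fixed_ident = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # letter = 'A'..'Z' + 'a'..'z'

print(broken_ident.match("_^ 5").group())  # "_^" is taken as a single identifier
print(fixed_ident.match("_^ 5"))           # None, because '_' is no longer a letter
```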
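If your parser generator accepts left recursion, you can also rewrite the Formula and Term rules like this:
Formula = Formula ( '+' | '-') Term | Term
Term = Term ( '*' | "/" |'%' | '^' ) Factor | Factor
e.g. https://metacpan.org/pod/distribution/Marpa-R2/pod/Marpa_R2.pod#Synopsis
Coco/R itself generates LL(1) recursive-descent parsers, which cannot handle left-recursive rules. With Coco/R, keep the iterative { ... } form from the question.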
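Coco/R turns an EBNF repetition { ... } into a loop in the generated recursive-descent code. If the semantic actions fold each new operand into the result built so far, the grouping is left-associative, the same one the left-recursive rules describe. Below is a minimal, hand-written Python sketch of that idea. It handles integers only, the function names are hypothetical, and it is not Coco/R output.

```python
import re


def tokenize(text):
    # Integers and the single-character operators used in the question's grammar.
    return re.findall(r"\d+|[-+*/%^]", text)


def parse_formula(tokens):
    """Formula = Term { ('+' | '-') Term }: each loop turn folds into the tree so far."""
    tree = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        tree = (op, tree, parse_term(tokens))
    return tree


def parse_term(tokens):
    """Term = Factor { ('*' | '/' | '%' | '^') Factor }, folded the same way."""
    tree = parse_factor(tokens)
    while tokens and tokens[0] in ("*", "/", "%", "^"):
        op = tokens.pop(0)
        tree = (op, tree, parse_factor(tokens))
    return tree


def parse_factor(tokens):
    """Factor = number (the Self alternative is left out of this sketch)."""
    return int(tokens.pop(0))
```

With this sketch, parse_formula(tokenize("8 - 3 - 2")) groups as ((8 - 3) - 2). Because ^ shares the loop with * and /, 2^3^2 is also grouped from the left. That follows the question's grammar but differs from the usual mathematical convention.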

Parsing and implementing a lambda calculus in Rascal

I am trying to implement a lambda calculus inside of Rascal but am having trouble getting the precedence and parsing to work the way I would like it to. Currently I have a grammar that looks something like:
keyword Keywords= "if" | "then" | "else" | "end" | "fun";
lexical Ident = [a-zA-Z] !>> [a-zA-Z]+ !>> [a-zA-Z0-9] \ Keywords;
lexical Natural = [0-9]+ !>> [0-9];
lexical LAYOUT = [\t-\n\r\ ];
layout LAYOUTLIST = LAYOUT* !>> [\t-\n\r\ ];
start syntax Prog = prog: Exp LAYOUTLIST;
syntax Exp =
var: Ident
| nat: Natural
| bracket "(" Exp ")"
> left app: Exp Exp
> right func: "fun" Ident "-\>" Exp
When I parse a program of the form:
(fun x -> fun y -> x) 1 2
The resulting tree is:
prog(app(
app(
func(
"x",
func(
"y",
var("x")
nat(1),
nat(2))))))
Where really I am looking for something like this (I think):
prog(app(
  func(
    "x",
    app(
      func(
        "y",
        var("x")),
      nat(2))),
  nat(1)))
I've tried a number of variations of the precedence in the grammar, I've tried wrapping the app rule in parentheses, and a number of other variations. There seems to be something going on here that I don't understand. Any help would be much appreciated. Thanks.
I've used the following grammar. It removes the extra LAYOUTLIST and the ineffective right annotation on func, but neither change should make a difference. It seems to work as you want when I use the generic implode function:
keyword Keywords= "if" | "then" | "else" | "end" | "fun";
lexical Ident = [a-zA-Z] !>> [a-zA-Z]+ !>> [a-zA-Z0-9] \ Keywords;
lexical Natural = [0-9]+ !>> [0-9];
lexical LAYOUT = [\t-\n\r\ ];
layout LAYOUTLIST = LAYOUT* !>> [\t-\n\r\ ];
start syntax Prog = prog: Exp;
syntax Exp =
var: Ident
| nat: Natural
| bracket "(" Exp ")"
> left app: Exp Exp
> func: "fun" Ident "-\>" Exp
;
Then calling the parser and imploding to an untyped AST (I've removed the location annotations for readability):
rascal>import ParseTree;
ok
rascal>implode(#node, parse(#start[Prog], "(fun x -\> fun y -\> x) 1 2"))
node: "prog"("app"(
  "app"(
    "func"(
      "x",
      "func"(
        "y",
        "var"("x"))),
    "nat"("1")),
  "nat"("2")))
So I am guessing you got the grammar right for the shape of tree you want. How do you go from the concrete parse tree to the abstract AST? Perhaps something funny is going on there.
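To see why (fun x -> fun y -> x) 1 2 yields the nested app shape above, here is a small Python recursive-descent sketch of the same priorities. It is not Rascal's generalized parser, and the tuple-based tree format is an assumption. Application is a loop that folds atoms to the left, and fun, having the lowest priority, extends its body as far to the right as possible.

```python
import re

# '->' arrow, parentheses, words, and natural numbers; leading blanks are skipped.
TOKEN = re.compile(r"\s*(->|[()]|[A-Za-z]+|[0-9]+)")
KEYWORDS = {"if", "then", "else", "end", "fun"}


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def starts_atom(tok):
    return tok == "(" or tok.isdigit() or (tok.isalpha() and tok not in KEYWORDS)


def parse_exp(tokens):
    """func has the lowest priority, so its body extends as far right as possible."""
    if tokens and tokens[0] == "fun":
        tokens.pop(0)
        if len(tokens) < 2 or tokens[1] != "->":
            raise SyntaxError("expected: fun Ident -> Exp")
        name = tokens.pop(0)
        tokens.pop(0)  # the '->'
        return ("func", name, parse_exp(tokens))
    return parse_app(tokens)


def parse_app(tokens):
    """Application is juxtaposition folded to the left: a b c == (a b) c."""
    tree = parse_atom(tokens)
    while tokens and starts_atom(tokens[0]):
        tree = ("app", tree, parse_atom(tokens))
    return tree


def parse_atom(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse_exp(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if tok.isdigit():
        return ("nat", int(tok))
    if tok.isalpha() and tok not in KEYWORDS:
        return ("var", tok)
    raise SyntaxError(f"unexpected {tok!r}")


def parse_prog(src):
    tokens = tokenize(src)
    tree = parse_exp(tokens)
    if tokens:
        raise SyntaxError(f"unexpected trailing tokens {tokens}")
    return ("prog", tree)
```

parse_prog("(fun x -> fun y -> x) 1 2") gives the same shape as the implode output: app(app(func(x, func(y, var x)), nat 1), nat 2).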

Where is this shift/reduce conflict coming from in Bison?

I am trying to get the hang of parsing by defining a very simple language in Jison (a JavaScript parser generator). It accepts the same, or very similar, syntax as Bison.
Here is my grammar:
%token INT TRUE FALSE WHILE DO IF THEN ELSE LOCATION ASSIGN EOF DEREF
%left "+"
%left ">="
/* Define Start Production */
%start Program
/* Define Grammar Productions */
%%
Program
: Statement EOF
;
Statement
: Expression
| WHILE BoolExpression DO Statement
| LOCATION ASSIGN IntExpression
;
Expression
: IntExpression
| BoolExpression
;
IntExpression
: INT IntExpressionRest
| IF BoolExpression THEN Statement ELSE Statement
| DEREF LOCATION
;
IntExpressionRest
: /* epsilon */
| "+" IntExpression
;
BoolExpression
: TRUE
| FALSE
| IntExpression ">=" IntExpression
;
%%
I am getting one shift/reduce conflict. The output of Jison is here:
Conflict in grammar: multiple actions possible when lookahead token is >= in state 6
- reduce by rule: Expression -> IntExpression
- shift token (then go to state 17)
States with conflicts:
State 6
Expression -> IntExpression . #lookaheads= EOF >= THEN DO ELSE
BoolExpression -> IntExpression .>= IntExpression #lookaheads= EOF DO THEN ELSE >=
Your shift/reduce conflict is detected because >= is in the follow set of the Expression non-terminal. The cause is that a Statement can be an Expression, and an IntExpression can end with a Statement. Consider the input IF c THEN S1 ELSE S2 >= 42. Written with parentheses to disambiguate, it could be read either as (IF c THEN S1 ELSE S2) >= 42 or as IF c THEN S1 ELSE (S2 >= 42). Since shift is preferred over reduce, the latter will be chosen.
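The follow-set argument can be checked mechanically. Below is a minimal Python sketch of the textbook FIRST/FOLLOW fixed-point computation, applied to the grammar from the question. The grammar encoding as a dict of symbol lists is an assumption of this sketch, not Jison's format. The computed FOLLOW(Expression) contains >=, together with EOF, THEN, DO and ELSE, which matches the lookahead set Jison reports for Expression -> IntExpression.

```python
# The question's grammar: nonterminal -> list of alternatives (symbol lists).
# Any symbol that is not a key is treated as a terminal; [] is the epsilon rule.
GRAMMAR = {
    "Program": [["Statement", "EOF"]],
    "Statement": [["Expression"],
                  ["WHILE", "BoolExpression", "DO", "Statement"],
                  ["LOCATION", "ASSIGN", "IntExpression"]],
    "Expression": [["IntExpression"], ["BoolExpression"]],
    "IntExpression": [["INT", "IntExpressionRest"],
                      ["IF", "BoolExpression", "THEN", "Statement", "ELSE", "Statement"],
                      ["DEREF", "LOCATION"]],
    "IntExpressionRest": [[], ["+", "IntExpression"]],
    "BoolExpression": [["TRUE"], ["FALSE"],
                       ["IntExpression", ">=", "IntExpression"]],
}


def first_and_follow(grammar):
    """Textbook fixed-point computation of FIRST and FOLLOW sets."""
    nullable = set()
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}

    def first_of(symbols):
        # FIRST of a symbol sequence, and whether the whole sequence can derive epsilon.
        result = set()
        for sym in symbols:
            if sym not in grammar:
                result.add(sym)
                return result, False
            result |= first[sym]
            if sym not in nullable:
                return result, False
        return result, True

    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                rhs_first, rhs_nullable = first_of(rhs)
                if not rhs_first <= first[lhs]:
                    first[lhs] |= rhs_first
                    changed = True
                if rhs_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    # What may follow sym here: FIRST of the rest, plus FOLLOW(lhs)
                    # when the rest can vanish.
                    rest_first, rest_nullable = first_of(rhs[i + 1:])
                    new = rest_first | (follow[lhs] if rest_nullable else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


first, follow = first_and_follow(GRAMMAR)
print(sorted(follow["Expression"]))
```

The chain the answer describes is visible in the rules. From BoolExpression, >= is added to FOLLOW(IntExpression). The rule ending in ELSE Statement passes FOLLOW(IntExpression) on to FOLLOW(Statement), and Statement -> Expression passes it on to FOLLOW(Expression).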
Your issue comes from
IF BoolExpression THEN Statement ELSE Statement
If the Statement after THEN contains an IF, how do you know if the ELSE belongs to the first or second IF? See here for more information: http://www.gnu.org/software/bison/manual/html_node/Shift_002fReduce.html
The only 100% unambiguous fix is to require some sort of delimiter around your if/else statements (many languages use the braces "{" and "}"). For example:
IF BoolExpression THEN '{' Statement '}' ELSE '{' Statement '}'

How to use a ParseKit grammar to parse a simple boolean expression language

I'm trying to build a grammar that will parse simple boolean expressions.
I am running into an issue when there are multiple expressions.
I need to be able to parse 1..n and/or'ed expressions.
Each example below is a complete expression:
(myitem.isavailable("1234") or myitem.ispresent("1234")) and myitem.isready("1234")
myitem.value > 4 and myitem.value < 10
myitem.value = yes or myotheritem.value = no
Grammar:
#start = conditionalexpression* | expressiontypes;
conditionalexpression = condition expressiontypes;
expressiontypes = expression | functionexpression;
expression = itemname dot property condition value;
functionexpression = itemname dot functionproperty;
itemname = Word;
propertytypes = property | functionproperty;
property = Word;
functionproperty = Word '(' value ')';
value = Word | QuotedString | Number;
condition = textcondition;
dot = '.';
textcondition = 'or' | 'and' | '<' | '>' | '=';
Developer of ParseKit here.
Here is a ParseKit grammar that matches your example input:
#start = expr;
expr = orExpr;
orExpr = andExpr orTerm*;
orTerm = 'or' andExpr;
// 'and' should bind more tightly than 'or'
andExpr = relExpr andTerm*;
andTerm = 'and' relExpr;
// relational expressions should bind more tightly than 'and'/'or'
relExpr = callExpr relTerm*;
relTerm = relOp callExpr;
// func calls should bind more tightly than relational expressions
callExpr = primaryExpr ('(' argList ')')?;
argList = Empty | atom (',' atom)*;
primaryExpr = atom | '(' expr ')';
atom = obj | literal;
// member access should bind most tightly
obj = id member*;
member = ('.' id);
id = Word;
literal = Number | QuotedString | bool;
bool = 'yes' | 'no';
relOp = '<' | '>' | '=';
To give you an idea of how I arrived at this grammar:
I realized that your language is a simple, composable expression language.
I remembered that XPath 1.0 is also a relatively simple expression language, with an easily available and readable grammar.
I visited the XPath 1.0 spec online and quickly scanned the XPath basic language grammar. That provided a quick jumping-off point for designing your language grammar. If you ignore the path-expression part of XPath, XPath is a very good template for a basic expression language.
My grammar above successfully parses all of your example inputs (see below). Hope this helps.
[foo, ., bar, (, "hello", ), or, (, bar, or, baz, >, bat, )]foo/./bar/(/"hello"/)/or/(/bar/or/baz/>/bat/)^
[myitem, ., value, >, 4, and, myitem, ., value, <, 10]myitem/./value/>/4/and/myitem/./value/</10^
[myitem, ., value, =, yes, or, myotheritem, ., value, =, no]myitem/./value/=/yes/or/myotheritem/./value/=/no^
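To illustrate how the layered rules make 'and' bind more tightly than 'or', here is a hand-written recursive-descent sketch in Python that mirrors the grammar above rule for rule. This is not ParseKit's API. The tokenizer regex and the tuple tree format are assumptions of the sketch.

```python
import re

# Quoted strings, numbers, words, and single-character punctuation.
TOKEN = re.compile(r'\s*("[^"]*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|[().,<>=])')
WORD = re.compile(r"[A-Za-z_]\w*")


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """One method per grammar rule; each level calls the next, tighter one."""

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):                     # expr = orExpr
        tree = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def or_expr(self):                   # orExpr = andExpr ('or' andExpr)*
        left = self.and_expr()
        while self.peek() == "or":
            self.take()
            left = ("or", left, self.and_expr())
        return left

    def and_expr(self):                  # andExpr = relExpr ('and' relExpr)*
        left = self.rel_expr()
        while self.peek() == "and":
            self.take()
            left = ("and", left, self.rel_expr())
        return left

    def rel_expr(self):                  # relExpr = callExpr (relOp callExpr)*
        left = self.call_expr()
        while self.peek() in ("<", ">", "="):
            op = self.take()
            left = (op, left, self.call_expr())
        return left

    def call_expr(self):                 # callExpr = primaryExpr ('(' argList ')')?
        callee = self.primary()
        if self.peek() != "(":
            return callee
        self.take("(")
        args = []
        if self.peek() != ")":
            args.append(self.atom())
            while self.peek() == ",":
                self.take()
                args.append(self.atom())
        self.take(")")
        return ("call", callee, args)

    def primary(self):                   # primaryExpr = atom | '(' expr ')'
        if self.peek() == "(":
            self.take("(")
            inner = self.or_expr()
            self.take(")")
            return inner
        return self.atom()

    def atom(self):                      # atom = obj | literal
        tok = self.take()
        if tok.startswith('"') or tok[0].isdigit() or tok in ("yes", "no"):
            return tok
        if not WORD.fullmatch(tok):
            raise SyntaxError(f"unexpected {tok!r}")
        name = tok                       # obj = id ('.' id)*
        while self.peek() == ".":
            self.take()
            name += "." + self.take()
        return name
```

For instance, Parser('myitem.value > 4 and myitem.value < 10').parse() puts 'and' at the root with the two comparisons as its children.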

Antlr parser for and/or logic - how to get expressions between logic operators?

I am using ANTLR to create an and/or parser+evaluator. Expressions will have a format like:
x eq 1 && y eq 10
(x lt 10 && x gt 1) OR x eq -1
I was reading this post on logic expressions in ANTLR, "Looking for advice on project. Parsing logical expression", and I found the grammar posted there a good start:
grammar Logic;
parse
: expression EOF
;
expression
: implication
;
implication
: or ('->' or)*
;
or
: and ('||' and)*
;
and
: not ('&&' not)*
;
not
: '~' atom
| atom
;
atom
: ID
| '(' expression ')'
;
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
However, while getting a tree from the parser works for expressions where the variables are just one character (i.e., "(A || B) AND C"), I am having a hard time adapting this to my case. In the example "x eq 1 && y eq 10" I'd expect one "AND" parent with two children, "x eq 1" and "y eq 10" (see the test case below).
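import static org.junit.Assert.assertEquals;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.tree.CommonTree;
import org.junit.Test;

@Test
public void simpleAndEvaluation() throws RecognitionException {
    // Expect '&&' as the root, with the two comparisons as its children.
    String src = "1 eq 1 && a neq a";
    LogicLexer lexer = new LogicLexer(new ANTLRStringStream(src));
    LogicParser parser = new LogicParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree) parser.parse().getTree();
    assertEquals("&&", tree.getText());
    assertEquals("1 eq 1", tree.getChild(0).getText());
    assertEquals("a neq a", tree.getChild(1).getText());
}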
I believe this is related to the ID rule. What would the correct syntax be?
For those interested, I made some improvements to my grammar file (see below).
Current limitations:
only works with &&/||, not AND/OR (not very problematic)
you can't have spaces between the parentheses and the &&/|| (I work around that by replacing " (" with "(" and ") " with ")" in the source String before feeding it to the lexer)
grammar Logic;
options {
output = AST;
}
tokens {
AND = '&&';
OR = '||';
NOT = '~';
}
// parser/production rules start with a lower case letter
parse
: expression EOF! // omit the EOF token
;
expression
: or
;
or
: and (OR^ and)* // make `||` the root
;
and
: not (AND^ not)* // make `&&` the root
;
not
: NOT^ atom // make `~` the root
| atom
;
atom
: ID
| '('! expression ')'! // omit both `(` and `)`
;
// lexer/terminal rules start with an upper case letter
ID
:
(
'a'..'z'
| 'A'..'Z'
| '0'..'9' | ' '
| SYMBOL
)+
;
SYMBOL
:
('+'|'-'|'*'|'/'|'_')
;
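A quick Python sketch shows what this ID rule does to the example inputs. A regular expression stands in for the ANTLR lexer; ANTLR's generated lexer is not regex-based, but for these inputs it should split the text the same way. Spaces end up inside the ID tokens, and a lone space between ')' and '||' becomes an extra ID.

```python
import re

# Alternatives mirror the tokens block and the ID/SYMBOL rules above;
# note that ID accepts ' ', so spaces become part of identifiers.
LEXER = re.compile(r"&&|\|\||~|\(|\)|[a-zA-Z0-9 +\-*/_]+")


def tokens(src):
    # findall returns the matched lexemes left to right (unmatched characters are skipped).
    return LEXER.findall(src)


print(tokens("x eq 1 && y eq 10"))
print(tokens("(x lt 10 && x gt 1) || x eq -1"))
```

The extra ' ' ID after ')' is a token the parser does not expect there. That is consistent with the workaround of stripping the spaces around the parentheses.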
ID : ('a'..'z' | 'A'..'Z')+;
states that an identifier is a sequence of one or more letters, but does not allow any digits. Try
ID : ('a'..'z' | 'A'..'Z' | '0'..'9')+;
which will allow e.g. abc, 123, 12ab, and ab12. If you don't want the latter types, you'll have to restructure the rule a little bit (left as a challenge...)
In order to accept arbitrarily many identifiers, you could define atom as ID+ instead of ID.
Also, you will likely need to specify AND, OR, -> and ~ as tokens. That way, as @Bart Kiers says, the first two won't get classified as ID, and the latter two will get recognized at all.
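Here is a rough Python model of the suggested alternative; it is not ANTLR code, and the token rules are assumptions. Whitespace is skipped, as the hidden Space rule does, ID no longer contains ' ', and atom is ID+. Each comparison is therefore grouped into one atom between the && and || operators, with its words joined by spaces for readability.

```python
import re

# Whitespace is skipped, like the hidden Space rule, and ID no longer contains ' '.
TOKEN = re.compile(r"\s*(&&|\|\||~|[()]|[a-zA-Z0-9_+\-*/]+)")
OPERATORS = {"&&", "||", "~", "(", ")"}


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(src):
    tokens = tokenize(src)

    def peek():
        return tokens[0] if tokens else None

    def or_expr():                       # or : and (OR^ and)*
        left = and_expr()
        while peek() == "||":
            tokens.pop(0)
            left = ("||", left, and_expr())
        return left

    def and_expr():                      # and : not (AND^ not)*
        left = not_expr()
        while peek() == "&&":
            tokens.pop(0)
            left = ("&&", left, not_expr())
        return left

    def not_expr():                      # not : NOT^ atom | atom
        if peek() == "~":
            tokens.pop(0)
            return ("~", atom())
        return atom()

    def atom():                          # atom : ID+ | '(' expression ')'
        if peek() == "(":
            tokens.pop(0)
            inner = or_expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            tokens.pop(0)
            return inner
        words = []
        while peek() is not None and peek() not in OPERATORS:
            words.append(tokens.pop(0))
        if not words:
            raise SyntaxError(f"expected an identifier, got {peek()!r}")
        return " ".join(words)

    tree = or_expr()
    if tokens:
        raise SyntaxError(f"unexpected trailing tokens {tokens}")
    return tree
```

With this model, parse("x eq 1 && y eq 10") has && at the root and "x eq 1" and "y eq 10" as its children, which is the shape the test case in the question expects.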
