I am using bison to write a parser to a C-like grammar, but I'm having problem in variable declaration.
My variables can be simples variable, arrays or structs, so I can have a variable like a.b[3].c.
I have the following rule:
var : NAME /* NAME is a string */
| var '[' expr ']'
| var '.' var
;
which are giving me a shift/reduce conflict, but I can't think of a way to solve this.
How can I rewrite the grammar or use Bison precedence to resolve the problem?
I don't know why you want the . and [] operators to be associative, however something like this might inspire you.
y
%union{}
%token NAME
%start var
%left '.'
%left '['
%%
var:
NAME
| var '[' expr ']'
| var '.' var
Also, as var probably appears in expr, there may be other problems with the grammar.
It's certainly possible to resolve this shift-reduce conflict with a precedence declaration, but in my opinion that just makes the grammar harder to read. It's better to make the grammar reflect the semantics of the language.
A var (what some might call an lvalue or a reference) is either the name of a simple scalar variable or an expression built up iteratively from a var naming a complex value and a selector, either a member name or an array index. In other words:
var : NAME
| var selector
;
selector: '.' NAME
| '[' expression ']'
;
This makes it clear what the meaning of a.b[3].c is, for example; the semantics are described by the parse tree:
a.b[3].c
+----+----+
| |
a.b[3] .c
+---+--+
| |
a.b [3]
+-+-+
| |
a .b
It's not necessary to create the selector rule; it would be trivial to wrap the two rules together. If you don't feel that it is clearer as written above, the following will work just as well:
var: NAME
| var '.' NAME
| var '[' expression ']'
;
Related
I'm trying to learn bison by writing a simple math parser and evaluator. I'm currently implementing variables. A variable can be part of a expression however I'd like do something different when one enters only a single variable name as input, which by itself is also a valid expression and hence the shift reduce conflict. I've reduced the language to this:
%token <double> NUM
%token <const char*> VAR
%nterm <double> exp
%left '+'
%precedence TWO
%precedence ONE
%%
input:
%empty
| input line
;
line:
'\n'
| VAR '\n' %prec ONE
| exp '\n' %prec TWO
;
exp:
NUM
| VAR %prec TWO
| exp '+' exp { $$ = $1 + $3; }
;
%%
As you can see, I've tried solving this by adding the ONE and TWO precedences manually to some rules, however it doesn't seem to work, I always get the exact same conflict. The goal is to prefer the line: VAR '\n' rule for a line consisting of nothing but a variable name, otherwise parse it as expression.
For reference, the conflicting state:
State 4
4 line: VAR . '\n'
7 exp: VAR . ['+', '\n']
'\n' shift, and go to state 8
'\n' [reduce using rule 7 (exp)]
$default reduce using rule 7 (exp)
Precedence comparisons are always, without exception, between a production and a token. (At least, on Yacc/Bison). So you can be sure that if your precedence level list does not contain a real token, it will have no effect whatsoever.
For the same reason, you cannot resolve reduce-reduce conflicts with precedence. That doesn't matter in this case, since it's a shift-reduce conflict, but all the same it's useful to know.
To be even more specific, the precedence comparison is between a reduction (using the precedence of the production to be reduced) and that of the incoming lookahead token. In this case, the lookahead token is \n and the reduction is exp: VAR. The precedence level of that production is the precedence of VAR, since that is the last terminal symbol in the production. So if you want the shift to win out over the reduction, you need to declare your precedences so that the shift is higher:
%precedence VAR
%precedence '\n'
No pseudotokens (or %prec modifiers) are needed.
This will not change the parse, because Bison always prefers shift if there are no applicable precedence rules. But it will suppress the warning.
I m trying to create a list in xtext, If anybody can help me to create a grammar for that, it will be really helpful. I tried writing this but its not the xtext format so i getting errors on that.
List:
'List' name=ID type = Nlist;
Nlist:
Array | Object
;
Array:
{Array} "[" values*=Value[','] "]"
;
Value:
STRING | FLOAT | BOOL | Object | Array | "null"
;
Object:
"{" members*=Member[','] "}"
;
Member:
key=STRING ':' value=Value
I m new to this one, Any help will be appreciated.
Thank you.
the default syntax for comma separated lists is e.g.
MyList: '#[' (elements+=Element (',' elements+=Element )*)? ']';
I have made a grammar that will be used with ANTLR4 with the following definition for expressions:
// Expressions
Expr : Integer # Expr_Integer
| Float # Expr_Float
| Double # Expr_Double
| String # Expr_String
| Variable # Expr_Variable
| FuncCall # Expr_FuncCall
| Expr Op_Infix Expr # Expr_Infix
| Op_Prefix Expr # Expr_Prefix
| Expr Op_Postfix # Expr_Postfix
| Expr 'is' Id # Expr_Is
| 'this' # Expr_This
| Expr '?' Expr ':' Expr # Expr_Ternary
| '(' Expr ')' # Expr_Bracketed
;
I added the labels so that I could easily differentiate between the different expression types when analysing the generated syntax tree. However, ANTLR4 throws the following error for every single one of the above lines (excluding the one with the comment):
error(50): Ash.g4:88:19: syntax error: '#' came as a complete surprise to me while looking for lexer rule element
Line 88 is the final rule alternative ( '(' Expr ')' )
I have look through the documentation and various online examples and my syntax seems correct.
What could be causing the error to be thrown?
In Antlr, rules beginning with an uppercase letter are lexer rules, and those beginning with an lowercase letter are parser rules. Antlr uses these definitions a lot to define what you can and cannot do. Usually, the lexer is faster to proccess but less powerful than the parser.
In your case, Expr should definitely be a parser rule, as basically every other rule you have referenced there. Changing it to expr should match the expected behavior.
As a rule of thumb, lexer rules are to be used only when there is no context, it doesn't matter what is next to the generated token. Things like numeric constants, string constants, identifiers and such.
I am trying to implement an interpreter for a programming language, and ended up stumbling upon a case where I would need to backtrack, but my parser generator (ply, a lex&yacc clone written in Python) does not allow that
Here's the rules involved:
'var_access_start : super'
'var_access_start : NAME'
'var_access_name : DOT NAME'
'var_access_idx : OPSQR expression CLSQR'
'''callargs : callargs COMMA expression
| expression
| '''
'var_access_metcall : DOT NAME LPAREN callargs RPAREN'
'''var_access_token : var_access_name
| var_access_idx
| var_access_metcall'''
'''var_access_tokens : var_access_tokens var_access_token
| var_access_token'''
'''fornew_var_access_tokens : var_access_tokens var_access_name
| var_access_tokens var_access_idx
| var_access_name
| var_access_idx'''
'type_varref : var_access_start fornew_var_access_tokens'
'hard_varref : var_access_start var_access_tokens'
'easy_varref : var_access_start'
'varref : easy_varref'
'varref : hard_varref'
'typereference : NAME'
'typereference : type_varref'
'''expression : new typereference LPAREN callargs RPAREN'''
'var_decl_empty : NAME'
'var_decl_value : NAME EQUALS expression'
'''var_decl : var_decl_empty
| var_decl_value'''
'''var_decls : var_decls COMMA var_decl
| var_decl'''
'statement : var var_decls SEMIC'
The error occurs with statements of the form
var x = new SomeGuy.SomeOtherGuy();
where SomeGuy.SomeOtherGuy would be a valid variable that stores a type (types are first class objects) - and that type has a constructor with no arguments
What happens when parsing that expression is that the parser constructs a
var_access_start = SomeGuy
var_access_metcall = . SomeOtherGuy ( )
and then finds a semicolon and ends in an error state - I would clearly like the parser to backtrack, and try constructing an expression = new typereference(SomeGuy .SomeOtherGuy) LPAREN empty_list RPAREN and then things would work because the ; would match the var statement syntax all right
However, given that PLY does not support backtracking and I definitely do not have enough experience in parser generators to actually implement it myself - is there any change I can make to my grammar to work around the issue?
I have considered using -> instead of . as the "method call" operator, but I would rather not change the language just to appease the parser.
Also, I have methods as a form of "variable reference" so you can do
myObject.someMethod().aChildOfTheResult[0].doSomeOtherThing(1,2,3).helloWorld()
but if the grammar can be reworked to achieve the same effect, that would also work for me
Thanks!
I assume that your language includes expressions other than the ones you've included in the excerpt. I'm also going to assume that new, super and var are actually terminals.
The following is only a rough outline. For readability, I'm using bison syntax with quoted literals, but I don't think you'll have any trouble converting.
You say that "types are first-class values" but your syntax explicitly precludes using a method call to return a type. In fact, it also seems to preclude a method call returning a function, but that seems odd since it would imply that methods are not first-class values, even though types are. So I've simplified the grammar by allowing expressions like:
new foo.returns_method_which_returns_type()()()
It's easy enough to add the restrictions back in, but it makes the exposition harder to follow.
The basic idea is that to avoid forcing the parser to make a premature decision; once new is encountered, it is only possible to distinguish between a method call and a constructor call from the lookahead token. So we need to make sure that the same reductions are used up to that point, which means that when the open parenthesis is encountered, we must still retain both possibilities.
primary: NAME
| "super"
;
postfixed: primary
| postfixed '.' NAME
| postfixed '[' expression ']'
| postfixed '(' call_args ')' /* PRODUCTION 1 */
;
expression: postfixed
| "new" postfixed '(' call_args ')' /* PRODUCTION 2 */
/* | other stuff not relevant here */
;
/* Your callargs allows (,,,3). This one doesn't */
call_args : /* EMPTY */
| expression_list
;
expression_list: expression
| expression_list ',' expression
;
/* Another slightly simplified production */
var_decl: NAME
| NAME '=' expression
;
var_decl_list: var_decl
| var_decl_list ',' var_decl
;
statement: "var" var_decl_list ';'
/* | other stuff not relevant here */
;
Now, take a look at PRODUCTION 1 and PRODUCTION 2, which are very similar. (Marked with comments.) These are basically the ambiguity for which you sought backtracking. However, in this grammar, there is no issue, since once a new has been encountered, the reduction of PRODUCTION 2 can only be performed when the lookahead token is , or ;, while PRODUCTION 1 can only be performed with lookahead tokens ., ( and [.
(Grammar tested with bison, just to make sure there are no conflicts.)
In ANTLR v3, syntactic predicates could be used to solve ambiguitites, i.e., to explicitly tell ANTLR which alternative should be chosen. ANTLR4 seems to simply accept grammars with similar ambiguities, but during parsing it reports these ambiguities. It produces a parse tree, despite these ambiguities (by chosing the first alternative, according to the documentation). But what can I do, if I want it to chose some other alternative? In other words, how can I explicitly resolve ambiguities?
(For the simple case of the dangling else problem see: What to use in ANTLR4 to resolve ambiguities (instead of syntactic predicates)?)
A more complex example:
If I have a rule like this:
expr
: expr '[' expr? ']'
| ID expr
| '[' expr ']'
| ID
| INT
;
This will parse foo[4] as (expr foo (expr [ (expr 4) ])). But I may want to parse it as (expr (expr foo) [ (expr 4) ]). (I. e., always take the first alternative if possible. It is the first alternative, so according to the documentation, it should have higher precedence. So why it builds this tree?)
If I understand correctly, I have 2 solutions:
Basically implement the syntactic predicate with a semantic predicate (however, I'm not sure how, in this case).
Restructure the grammar.
For example, replace expr with e:
e : expr | pe
;
expr
: expr '[' expr? ']'
| ID expr
| ID
| INT
;
pe : '[' expr ']'
;
This seems to work, although the grammar became more complex.
I may misunderstood some things, but both of these solutions seem less elegant and more complicated than syntactic predicates. Although, I like the solution for the dangling else problem with the ?? operator. But I'm not sure how to use in this case. Is it possible?
You may be able to resolve this by placing the ID alternative above ID expr. When left-recursion is eliminated, all of your alternatives which are not left recursive are parsed before your alternatives which are left recursive.
For your example, the first non-left-recursive alternative ID expr matches the entire expression, so there is nothing left to parse afterwards.
To get this expression (expr (expr foo) [ (expr 4) ]), you can use
top : expr EOF;
expr : expr '[' expr? ']' | ID | INT ;