Here is a Lua 5.2.2 transcript, showing the declaration and indexing of a table:
> mylist = {'foo', 'bar'}
> print(mylist[1])
foo
Why isn't the following statement legal?
> print({'foo', 'bar'}[1])
stdin:1: ')' expected near '['
I can't think of any other language where a literal can't be substituted for a reference (except, of course, when an lvalue is needed).
FWIW, parenthesizing the table literal makes the statement legal:
> print(({'foo', 'bar'})[1])
foo
It is also related to the fact that in Lua this syntax is valid:
myfunc { 1, 2, 3 }
and it is equivalent to:
myfunc( { 1, 2, 3 } )
Therefore an expression such as:
myfunc { 1, 2, 3 } [2]
is parsed as:
myfunc( { 1, 2, 3 } )[2]
so first the function call is evaluated, then the indexing takes place.
If {1,2,3}[2] could be parsed as a valid indexing operation it could lead to ambiguities in the previous expression that would require more lookahead. The Lua team chose to make the Lua bytecode compiler fast by making it a single pass compiler, so it scans source code only once with a minimum of lookahead. This message to lua mailing list from Roberto Ierusalimschy (lead Lua developer) points in that direction.
The same problem exists for string literals and method calls. This is invalid:
"my literal":sub(1)
but this is valid:
("my literal"):sub(1)
Again, Lua allows this:
func "my literal"
as equivalent to this:
func( "my literal" )
Going of the grammar defined here, the reason that the non parenthesized version is invalid yet the parenthesized is, is because the syntax tree takes a different path and expects a closing bracket ) because there should be no other symbol in that context.
In the first case:
functioncall ::= prefixexp args | prefixexp `:´ Name args
prefixexp =>
prefixexp ::= var | functioncall | `(´ exp `)´
var -> print
THEREFORE prefixexp -> print
args =>
args ::= `(´ [explist] `)´ | tableconstructor | String
match '('
explist =>
explist ::= {exp `,´} exp
exp =>
exp ::= nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp
tableconstructor =>
explist-> {'foo','bar'}
THEREFORE explist = {'foo','bar'}
match ')' ERROR!!! found '[' expected ')'
On the other hand with parenthesis:
functioncall ::= prefixexp args | prefixexp `:´ Name args
prefixexp =>
prefixexp ::= var | functioncall | `(´ exp `)´
var -> print
THEREFORE prefixexp -> print
args =>
args ::= `(´ [explist] `)´ | tableconstructor | String
match '('
explist =>
explist ::= {exp `,´} exp
exp =>
exp ::= nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp
prefixexp =>
prefixexp ::= var | functioncall | `(´ exp `)´
var =>
var ::= Name | prefixexp `[´ exp `]´ | prefixexp `.´ Name
prefixexp =>
prefixexp ::= var | functioncall | `(´ exp `)´
match '('
exp =>
exp ::= nil | false | true | Number | String | `...´ | function | prefixexp | tableconstructor | exp binop exp | unop exp
tableconstructor =>
THEREFORE exp = {'foo','bar'}
match ')'
THEREFORE prefixexp = ({'foo','bar'})
match '['
exp => Number = 1
match ']'
THEREFORE VAR = ({'foo','bar'})[1]
THEREFORE prefixexp = VAR
THEREFOER exp = VAR
THEREFORE explist = VAR
match ')'
THEREFORE args = (VAR)
=> print(VAR)
Related
Currently, I've just defined simple rules in ANTLR4:
// Recognizer Rules
program : (class_dcl)+ EOF;
class_dcl: 'class' ID ('extends' ID)? '{' class_body '}';
class_body: (const_dcl|var_dcl|method_dcl)*;
const_dcl: ('static')? 'final' PRIMITIVE_TYPE ID '=' expr ';';
var_dcl: ('static')? id_list ':' type ';';
method_dcl: PRIMITIVE_TYPE ('static')? ID '(' para_list ')' block_stm;
para_list: (para_dcl (';' para_dcl)*)?;
para_dcl: id_list ':' PRIMITIVE_TYPE;
block_stm: '{' '}';
expr: <assoc=right> expr '=' expr | expr1;
expr1: term ('<' | '>' | '<=' | '>=' | '==' | '!=') term | term;
term: ('+'|'-') term | term ('*'|'/') term | term ('+'|'-') term | fact;
fact: INTLIT | FLOATLIT | BOOLLIT | ID | '(' expr ')';
type: PRIMITIVE_TYPE ('[' INTLIT ']')?;
id_list: ID (',' ID)*;
// Lexer Rules
KEYWORD: PRIMITIVE_TYPE | BOOLLIT | 'class' | 'extends' | 'if' | 'then' | 'else'
| 'null' | 'break' | 'continue' | 'while' | 'return' | 'self' | 'final'
| 'static' | 'new' | 'do';
SEPARATOR: '[' | ']' | '{' | '}' | '(' | ')' | ';' | ':' | '.' | ',';
OPERATOR: '^' | 'new' | '=' | UNA_OPERATOR | BIN_OPERATOR;
UNA_OPERATOR: '!';
BIN_OPERATOR: '+' | '-' | '*' | '\\' | '/' | '%' | '>' | '>=' | '<' | '<='
| '==' | '<>' | '&&' | '||' | ':=';
PRIMITIVE_TYPE: 'integer' | 'float' | 'bool' | 'string' | 'void';
BOOLLIT: 'true' | 'false';
FLOATLIT: [0-9]+ ((('.'[0-9]* (('E'|'e')('+'|'-')?[0-9]+)? ))|(('E'|'e')('+'|'-')? [0-9]+));
INTLIT: [0-9]+;
STRINGLIT: '"' ('\\'[bfrnt\\"]|~[\r\t\n\\"])* '"';
ILLEGAL_ESC: '"' (('\\'[bfrnt\\"]|~[\n\\"]))* ('\\'(~[bfrnt\\"]))
{if (true) throw new bkool.parser.IllegalEscape(getText());};
UNCLOSED_STRING: '"'('\\'[bfrnt\\"]|~[\r\t\n\\"])*
{if (true) throw new bkool.parser.UncloseString(getText());};
COMMENT: (BLOCK_COMMENT|LINE_COMMENT) -> skip;
BLOCK_COMMENT: '(''*'(('*')?(~')'))*'*'')';
LINE_COMMENT: '#' (~[\n])* ('\n'|EOF);
ID: [a-zA-z_]+ [a-zA-z_0-9]* ;
WS: [ \t\r\n]+ -> skip ;
ERROR_TOKEN: . {if (true) throw new bkool.parser.ErrorToken(getText());};
I opened the parse tree, and tried to test:
class abc
{
final integer x=1;
}
It returned errors:
BKOOL::program:3:8: mismatched input 'integer' expecting PRIMITIVE_TYPE
BKOOL::program:3:17: mismatched input '=' expecting {':', ','}
I still haven't got why. Could you please help me why it didn't recognize rules and tokens as I expected?
Lexer rules are exclusive. The longest wins, and the tiebreaker is the grammar order.
In your case; integer is a KEYWORD instead of PRIMITIVE_TYPE.
What you should do here:
Make one distinct token per keyword instead of an all-catching KEYWORD rule.
Turn PRIMITIVE_TYPE into a parser rule
Same for operators
Right now, your example:
class abc
{
final integer x=1;
}
Gets converted to lexemes such as:
class ID { final KEYWORD ID = INTLIT ; }
This is thanks to the implicit token typing, as you've used definitions such as 'class' in your parser rules. These get converted to anonymous tokens such as T_001 : 'class'; which get the highest priority.
If this weren't the case, you'd end up with:
KEYWORD ID SEPARATOR KEYWORD KEYWORD ID OPERATOR INTLIT ; SEPARATOR
And that's... not quite easy to parse ;-)
That's why I'm telling you to breakdown your tokens properly.
I'm writing an ANTLR lexer/parser for context free grammar.
This is what I have now:
statement
: assignment_statement
;
assignment_statement
: IDENTIFIER '=' expression ';'
;
term
: IDENT
| '(' expression ')'
| INTEGER
| STRING_LITERAL
| CHAR_LITERAL
| IDENT '(' actualParameters ')'
;
negation
: 'not'* term
;
unary
: ('+' | '-')* negation
;
mult
: unary (('*' | '/' | 'mod') unary)*
;
add
: mult (('+' | '-') mult)*
;
relation
: add (('=' | '/=' | '<' | '<=' | '>=' | '>') add)*
;
expression
: relation (('and' | 'or') relation)*
;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
fragment DIGIT : '0'..'9';
fragment LETTER : ('a'..'z' | 'A'..'Z');
So my assignment statement is identified by the form
IDENTIFIER = expression;
However, assignment statement should also take into account cases when the right hand side is a function call (the return value of the statement). For example,
items = getItems();
What grammar rule should I add for this? I thought of adding a function call to the "expression" rule, but I wasn't sure if function call should be regarded as expression..
Thanks
This grammar looks fine to me. I am assuming that IDENT and IDENTIFIER are the same and that you have additional productions for the remaining terminals.
This production seems to define a function call.
| IDENT '(' actualParameters ')'
You need a production for the actual parameters, something like this.
actualParameters : nothing | expression ( ',' expression )*
I have a 26 rule grammar for a sub-grammar of Mini Java. This grammar is supposed to be non-object-oriented. Anyway, I've been trying to left-factor it and remove left-recursion. However I test it with JFLAP, though, it tells me it is not LL(1). I have followed every step of the algorithm in the Aho-Sethi book.
Could you please give me some tips?
Goal ::= MainClass $
MainClass ::= class <IDENTIFIER> { MethodDeclarations public static void main ( ) {
VarDeclarations Statements } }
VarDeclarations ::= VarDeclaration VarDeclarations | e
VarDeclaration ::= Type <IDENTIFIER> ;
MethodDeclarations ::= MethodDeclaration MethodDeclarations | e
MethodDeclaration ::= public static Type <IDENTIFIER> ( Parameters ) {
VarDeclarations Statements return GenExpression ; }
Parameters ::= Type <IDENTIFIER> Parameter | e
Parameter ::= , Type <IDENTIFIER> Parameter | e
Type ::= boolean | int
Statements ::= Statement Statements | e
Statement ::= { Statements }
| if ( GenExpression ) Statement else Statement
| while ( GenExpression ) Statement
| System.out.println ( GenExpression ) ;
| <IDENTIFIER> = GenExpression ;
GenExpression ::= Expression | RelExpression
Expression ::= Term ExpressionRest
ExpressionRest ::= e | + Term ExpressionRest | - Term ExpressionRest
Term ::= Factor TermRest
TermRest ::= e | * Factor TermRest
Factor ::= ( Expression )
| true
| false
| <INTEGER-LITERAL>
| <IDENTIFIER> ArgumentList
ArgumentList ::= e | ( Arguments )
RelExpression ::= RelTerm RelExpressionRest
RelExpressionRest ::= e | && RelTerm RelExpressionEnd
RelExpressionEnd ::= e | RelExpressionRest
RelTerm ::= Term RelTermRest
RelTermRest ::= == Expression | < Expression | ExpressionRest RelTermEnding
RelTermEnding ::= == Expression | < Expression
Arguments ::= Expression Argument | RelExpression Argument | e
Argument ::= , GenExpression Argument | e
Each <IDENTIFIER> is a valid Java identifier, and <INTEGER-LITERAL> is a simple integer. Each e production stands for an epsilon production, and the $ in the first rule is the end-of-file marker.
I think I spotted two problems (there might be more):
Problem #1
In MainClass you have
MethodDeclarations public static void main
And a MethodDeclaration is
public static Type | e
That's not LL(1) since when the parser sees "public" it cannot tell if it's a MethodDeclaration or the "public static void main" method.
Problem #2
Arguments ::= Expression Argument | RelExpression Argument | e
Both Expression:
Expression ::= Term ExpressionRest
... and RelExpression:
RelExpression ::= RelTerm RelExpressionRest
RelTerm ::= Term RelTermRest
... start with "Term" so that's not LL(1) either.
I'd just go for LL(k) or LL(*) because they allow you to write much more maintainable grammars.
Is there anything to prevent IDENTIFIER being the same as one of your reserved words? if not then your grammar would be ambiguous. I don't see anything else though.
If all else fails, I'd remove all but the last line of the grammar, and test that. If that passes I'd add each line one at a time until I found the problem line.
I am supposed to make a parser for a language with the following grammar:
Program ::= Stmts "return" Expr ";"
Stmts ::= Stmt Stmts
| ε
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= Choice Choices
| Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Shift "<<" integer
| Shift ">>" integer
| Term
Term ::= Term "+" Prod
| Term "-" Prod
| Prod
Prod ::= Prod "*" Prim
| Prim
Prim ::= ident
| integer
| "(" Expr ")"
With the following data type for Expr:
data Expr = Var Ident
| Val Int
| Lshift Expr Int
| Rshift Expr Int
| Plus Expr Expr
| Minus Expr Expr
| Mult Expr Expr
deriving (Eq, Show, Read)
My problem is implementing the Shift operator, because I get the following error when I encounter a left or right shift:
unexpected ">"
expecting operator or ";"
Here is the code I have for Expr:
expr = try (exprOp)
<|> exprShift
exprOp = buildExpressionParser arithmeticalOps prim <?> "arithmetical expression"
prim :: Parser Expr
prim = new_ident <|> new_integer <|> pE <?> "primitive expression"
where
new_ident = do {i <- ident; return $ Var i }
new_integer = do {i <- first_integer; return $ Val i }
pE = parens expr
arithmeticalOps = [ [binary "*" Mult AssocLeft],
[binary "+" Plus AssocLeft, binary "-" Minus AssocLeft]
]
binary name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
exprShift =
do
e <- expr
a <- aShift
i <- first_integer
return $ a e i
aShift = (reservedOp "<<" >> return Lshift)
<|> (reservedOp ">>" >> return Rshift)
I suspect the problem is concerning lookahead, but I can't seem to figure it out.
Here's a grammar with left recursion eliminated (untested). Stmts and Choices can be simplified with Parsec's many and many1. The other recursive productions have to be expanded:
Program ::= Stmts "return" Expr ";"
Stmts ::= #many# Stmt
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= #many1# Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Term ShiftRest
ShiftRest ::= <empty>
| "<<" integer
| ">>" integer
Term ::= Prod TermRest
TermRest ::= <empty>
| "+" Term
| "-" Term
Prod ::= Prim ProdRest
ProdRest ::= <empty>
| "*" Prod
Prim ::= ident
| integer
| "(" Expr ")"
Edit - "Part Two"
"empty" (in angles) is the empty production, you were using epsilon in the original post, but I don't know its Unicode code point and didn't think to copy-paste it.
Here's an example of how I would code the grammar. Note - unlike the grammar I posted empty versions must always be the last choice to give the other productions chance to match. Also your datatypes and constructors for the Abstract Syntax Tree probably differ to the the guesses I've made, but it should be fairly clear what's going on. The code is untested - hopefully any errors are obvious:
shift :: Parser Expr
shift = do
t <- term
leftShift t <|> rightShift <|> emptyShift t
-- Note - this gets an Expr passed in - it is the "prefix"
-- of the shift production.
--
leftShift :: Expr -> Parser Expr
leftShift t = do
reservedOp "<<"
i <- int
return (LShift t i)
-- Again this gets an Expr passed in.
--
rightShift :: Expr -> Parser Expr
rightShift t = do
reservedOp ">>"
i <- int
return (RShift t i)
-- The empty version does no parsing.
-- Usually I would change the definition of "shift"
-- and not bother defining "emptyShift", the last
-- line of "shift" would then be:
--
-- > leftShift t <|> rightShift t <|> return t
--
emptyShift :: Expr -> Parser Expr
emptyShift t = return t
Parsec is still Greek to me, but my vague guess is that aShift should use try.
The parsec docs on Hackage have an example explaining the use of try with <|> that might help you out.
I have this grammar of a C# like language, and I want to make a parser for it, but when I put the grammar it tells me about Shift/Reduce conflicts. I tried to fix some but I can't seem to find another way to improve this grammar. Any help would be greatly appreciated :D Here's the grammar:
Program: Decl
| Program Decl
;
Decl: VariableDecl
| FunctionDecl
| ClassDecl
| InterfaceDecl
;
VariableDecl: Variable SEMICOLON
;
Variable: Type IDENTIFIER
;
Type: TOKINT
| TOKDOUBLE
| TOKBOOL
| TOKSTRING
| IDENTIFIER
| Type BRACKETS
;
FunctionDecl: Type IDENTIFIER OPARENS Formals CPARENS StmtBlock
| TOKVOID IDENTIFIER OPARENS Formals CPARENS StmtBlock
;
Formals: VariablePlus
| /* epsilon */
;
VariablePlus: Variable
| VariablePlus COMMA Variable
;
ClassDecl: TOKCLASS IDENTIFIER OptExtends OptImplements OBRACE ListaField CBRACE
;
OptExtends: TOKEXTENDS IDENTIFIER
| /* epsilon */
;
OptImplements: TOKIMPLEMENTS ListaIdent
| /* epsilon */
;
ListaIdent: ListaIdent COMMA IDENTIFIER
| IDENTIFIER
;
ListaField: ListaField Field
| /* epsilon */
;
Field: VariableDecl
| FunctionDecl
;
InterfaceDecl: TOKINTERFACE IDENTIFIER OBRACE ListaProto CBRACE
;
ListaProto: ListaProto Prototype
| /* epsilon */
;
Prototype: Type IDENTIFIER OPARENS Formals CPARENS SEMICOLON
| TOKVOID IDENTIFIER OPARENS Formals CPARENS SEMICOLON
;
StmtBlock: OBRACE ListaOptG CBRACE
;
ListaOptG: /* epsilon */
| VariableDecl ListaOptG
| Stmt ListaOptG
;
Stmt: OptExpr SEMICOLON
| IfStmt
| WhileStmt
| ForStmt
| BreakStmt
| ReturnStmt
| PrintStmt
| StmtBlock
;
OptExpr: Expr
| /* epsilon */
;
IfStmt: TOKIF OPARENS Expr CPARENS Stmt OptElse
;
OptElse: TOKELSE Stmt
| /* epsilon */
;
WhileStmt: TOKWHILE OPARENS Expr CPARENS Stmt
;
ForStmt: TOKFOR OPARENS OptExpr SEMICOLON Expr SEMICOLON OptExpr CPARENS Stmt
;
ReturnStmt: TOKRETURN OptExpr SEMICOLON
;
BreakStmt: TOKBREAK SEMICOLON
;
PrintStmt: TOKPRINT OPARENS ListaExprPlus CPARENS SEMICOLON
;
ListaExprPlus: Expr
| ListaExprPlus COMMA Expr
;
Expr: LValue LOCATION Expr
| Constant
| LValue
| TOKTHIS
| Call
| OPARENS Expr CPARENS
| Expr PLUS Expr
| Expr MINUS Expr
| Expr TIMES Expr
| Expr DIVIDED Expr
| Expr MODULO Expr
| MINUS Expr
| Expr LESSTHAN Expr
| Expr LESSEQUALTHAN Expr
| Expr GREATERTHAN Expr
| Expr GREATEREQUALTHAN Expr
| Expr EQUALS Expr
| Expr NOTEQUALS Expr
| Expr AND Expr
| Expr OR Expr
| NOT Expr
| TOKNEW OPARENS IDENTIFIER CPARENS
| TOKNEWARRAY OPARENS Expr COMMA Type CPARENS
| TOKREADINTEGER OPARENS CPARENS
| TOKREADLINE OPARENS CPARENS
| TOKMALLOC OPARENS Expr CPARENS
;
LValue: IDENTIFIER
| Expr PERIOD IDENTIFIER
| Expr OBRACKET Expr CBRACKET
;
Call: IDENTIFIER OPARENS Actuals CPARENS
| Expr PERIOD IDENTIFIER OPARENS Actuals CPARENS
| Expr PERIOD LibCall OPARENS Actuals CPARENS
;
LibCall: TOKGETBYTE OPARENS Expr CPARENS
| TOKSETBYTE OPARENS Expr COMMA Expr CPARENS
;
Actuals: ListaExprPlus
| /* epsilon */
;
Constant: INTCONSTANT
| DOUBLECONSTANT
| BOOLCONSTANT
| STRINGCONSTANT
| TOKNULL
;
The old Bison version on my school's server says you have 241 shift/reduce conflicts. One is the dangling if/else statement. Putting "OptElse" does NOT solve it. You should just write out the IfStmt and an IfElseStmt and then use %nonassoc and %prec options in bison to fix it.
Your expressions are the issue of almost all of the other 240 conflicts. What you need to do is either force precedence rules (messy and a terrible idea) or break your arithmetic expressions into stuff like:
AddSubtractExpr: AddSubtractExpr PLUS MultDivExpr | ....
;
MultDivExpr: MultiDivExpr TIMES Factor | ....
;
Factor: Variable | LPAREN Expr RPAREN | call | ...
;
Since Bison produces a bottom up parser, something like this will give you correct order of operations. If you have a copy of the first edition of the Dragon Book, you should look at the grammar in Appendix A. I believe the 2nd edition also has similar rules for simple expressions.
conflicts (shift/reduce or reduce/reduce) mean that your grammar is not LALR(1) so can't be handled by bison directly without help. There are a number of immediately obvious problems:
expression ambiguity -- there's no precedence in the grammar, so things like a + b * c are ambiguous. You can fix this by adding precedence rules, or by splitting the Expr rule into separate AdditiveExpr, MultiplicativeExpr, ConditionalExpr etc rules.
dangling else ambiguity -- if (a) if (b) x; else y; -- the else could be matched with either if. You can either ignore this if the default shift is correct (it usually is for this specific case, but ignoring errors is always dangerous) or split the Stmt rule
There are many books on grammars and parsing that will help with this.