Problem
When I use do, a lua keyword as a table's key it gives following error
> table.newKey = { do = 'test' }
stdin:1: unexpected symbol near 'do'
>
I need to use do as key. What should I do ?
sometable.somekey is syntactic sugar for sometable['somekey'],
similarly { somekey = somevalue } is sugar for { ['somekey'] = somevalue }
Information like this can be found in this very good resource:
For such needs, there is another, more general, format. In this format, we explicitly write the index to be initialized as an expression, between square brackets:
opnames = {["+"] = "add", ["-"] = "sub",
["*"] = "mul", ["/"] = "div"}
-- Programming in Lua: 3.6 – Table Constructors
Use this syntax:
t = { ['do'] = 'test' }
or t['do'] to get or set a value.
I need to use do as key. What should I do ?
Read the Lua 5.4 Reference Manual and understand that something like t = { a = b} or t.a = b only works if a is a valid Lua identifier.
3.4.9 - Table constructors
The general syntax for constructors is
tableconstructor ::= ‘{’ [fieldlist] ‘}’
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp
fieldsep ::= ‘,’ | ‘;’
A field of the form name = exp is equivalent to ["name"] = exp.
So why does this not work for do?
3.1 - Lexical Conventsions
Names (also called identifiers) in Lua can be any string of Latin
letters, Arabic-Indic digits, and underscores, not beginning with a
digit and not being a reserved word. Identifiers are used to name
variables, table fields, and labels.
The following keywords are reserved and cannot be used as names:
and break do else elseif end
false for function goto if in
local nil not or repeat return
do is not a name so you need to use the syntax field ::= ‘[’ exp ‘]’ ‘=’ exp
which in your example is table.newKey = { ['do'] = 'test' }
Related
This is a demo code
label:
var id
let id = 10
goto label
If allowed keyword as identifier will be
let:
var var
let var = 10
goto let
This is totally legal code. But it seems very hard to do this in antlr.
AFAIK, If antlr match a token let, will never fallback to id token. so for antlr it will see
LET_TOKEN :
VAR_TOKEN <missing ID_TOKEN>VAR_TOKEN
LET_TOKEN <missing ID_TOKEN>VAR_TOKEN = 10
although antlr allowed predicate, I have to control ever token match and problematic. grammar become this
grammar Demo;
options {
language = Go;
}
#parser::members{
var _need = map[string]bool{}
func skip(name string,v bool){
_need[name] = !v
fmt.Println("SKIP",name,v)
}
func need(name string)bool{
fmt.Println("NEED",name,_need[name])
return _need[name]
}
}
proj#init{skip("inst",false)}: (line? NL)* EOF;
line
: VAR ID
| LET ID EQ? Integer
;
NL: '\n';
VAR: {need("inst")}? 'var' {skip("inst",true)};
LET: {need("inst")}? 'let' {skip("inst",true)};
EQ: '=';
ID: ([a-zA-Z] [a-zA-Z0-9]*);
Integer: [0-9]+;
WS: [ \t] -> skip;
Looks so terrible.
But this is easy in peg, test this in pegjs
Expression = (Line? _ '\n')* ;
Line
= 'var' _ ID
/ 'let' _ ID _ "=" _ Integer
Integer "integer"
= [0-9]+ { return parseInt(text(), 10); }
ID = [a-zA-Z] [a-zA-Z0-9]*
_ "whitespace"
= [ \t]*
I actually done this in peggo and javacc.
My question is how to handle these grammars in antlr4.6, I was so excited about the antlr4.6 go target, but seems I choose the wrong tool for my grammar ?
The simplest way is to define a parser rule for identifiers:
id: ID | VAR | LET;
VAR: 'var';
LET: 'let';
ID: [a-zA-Z] [a-zA-Z0-9]*;
And then use id instead of ID in your parser rules.
A different way is to use ID for identifiers and keywords, and use predicates for disambiguation. But it's less readable, so I'd use the first way instead.
I'm stuck on this problem for a while now, hope you can help. I've got the following (shortened) language grammar:
lexical Id = [a-zA-Z][a-zA-Z]* !>> [a-zA-Z] \ MyKeywords;
lexical Natural = [1-9][0-9]* !>> [0-9];
lexical StringConst = "\"" ![\"]* "\"";
keyword MyKeywords = "value" | "Male" | "Female";
start syntax Program = program: Model* models;
syntax Model = Declaration;
syntax Declaration = decl: "value" Id name ':' Type t "=" Expression v ;
syntax Type = gender: "Gender";
syntax Expression = Terminal;
syntax Terminal = id: Id name
| constructor: Type t '(' {Expression ','}* ')'
| Gender;
syntax Gender = male: "Male"
| female: "Female";
alias ASLId = str;
data TYPE = gender();
public data PROGRAM = program(list[MODEL] models);
data MODEL = decl(ASLId name, TYPE t, EXPR v);
data EXPR = constructor(TYPE t, list[EXPR] args)
| id(ASLId name)
| male()
| female();
Now, I'm trying to parse:
value mannetje : Gender = Male
This parses fine, but fails on implode, unless I remove the id: Id name and it's constructor from the grammar. I expected that the /MyKeywords would prevent this, but unfortunately it doesn't. Can you help me fix this, or point me in the right direction to how to debug? I'm having some trouble with debugging the Concrete and Abstract syntax.
Thanks!
It does not seem to be parsing at all (I get a ParseError if I try your example).
One of the problems is probably that you don't define Layout. This causes the ParseError with you given example. One of the easiest fixes is to extend the standard Layout in lang::std::Layout. This layout defines all the default white spaces (and comment) characters.
For more information on nonterminals see here.
I took the liberty in simplifying your example a bit further so that parsing and imploding works. I removed some unused nonterminals to keep the parse tree more concise. You probably want more that Declarations in your Program but I leave that up to you.
extend lang::std::Layout;
lexical Id = ([a-z] !<< [a-z][a-zA-Z]* !>> [a-zA-Z]) \ MyKeywords;
keyword MyKeywords = "value" | "Male" | "Female" | "Gender";
start syntax Program = program: Declaration* decls;
syntax Declaration = decl: "value" Id name ':' Type t "=" Expression v ;
syntax Type = gender: "Gender";
syntax Expression
= id: Id name
| constructor: Type t '(' {Expression ','}* ')'
| Gender
;
syntax Gender
= male: "Male"
| female: "Female"
;
data PROGRAM = program(list[DECL] exprs);
data DECL = decl(str name, TYPE t, EXPR v);
data EXPR = constructor(TYPE t, list[EXPR] args)
| id(str name)
| male()
| female()
;
data TYPE = gender();
Two things:
The names of the ADTs should correspond to the nonterminal names (you have difference cases and EXPR is not Expression). That is the only way implode can now how to do its work. Put the data decls in their own module and implode as follows: implode(#AST::Program, pt) where pt is the parse tree.
The grammar was ambiguous: the \ MyKeywords only applied to the tail of the identifier syntax. Use the fix: ([a-zA-Z][a-zA-Z]* !>> [a-zA-Z]) \ MyKeywords;.
Here's what worked for me (grammar unchanged except for the fix):
module AST
alias ASLId = str;
data Type = gender();
public data Program = program(list[Model] models);
data Model = decl(ASLId name, Type t, Expression v);
data Expression = constructor(Type t, list[Expression] args)
| id(ASLId name)
| male()
| female();
If I write grammar file in Yacc/Bison like this:
Module
:ModuleName "=" Functions
{ $$ = Builder::concat($1, $2, ","); }
Functions
:Functions Function
{ $$ = Builder::concat($1, $2, ","); }
| Function
{ $$ = $1; }
Function
: DEF ID ARGS BODY
{
/** Lacks module name to do name mangling for the function **/
/** How can I obtain the "parent" node's module name here ?? **/
module_name = ; //????
$$ = Builder::def_function(module_name, $ID, $ARGS, $BODY);
}
And this parser should parse codes like this:
main_module:
def funA (a,b,c) { ... }
In my AST, the name "funA" should be renamed as main_module.funA. But I can't get the module's information while the parser is processing Function node !
Is there any Yacc/Bison facilities can help me to handle this problem, or should I change my parsing style to avoid such embarrassing situations ?
There is a bison feature, but as the manual says, use it with care:
$N with N zero or negative is allowed for reference to tokens and groupings on the stack before those that match the current rule. This is a very risky practice, and to use it reliably you must be certain of the context in which the rule is applied. Here is a case in which you can use this reliably:
foo: expr bar '+' expr { ... }
| expr bar '-' expr { ... }
;
bar: /* empty */
{ previous_expr = $0; }
;
As long as bar is used only in the fashion shown here, $0 always refers to the expr which precedes bar in the definition of foo.
More cleanly, you could use a mid-rule action (in Module) to push the module name on a name stack (which would have to be part of the parsing context). You would then pop the stack at the end of the rule.
For more information and examples of mid-rules actions, see the manual.
I am developing Pascal language parser in Haskell using Parsec library and I need to re-define some tokens defined in Parsec.Token class.
Speeking of it, here is my case:
I need to change how stringLiteral token is matched. In default definition, it is something between char '"' (see this), but I need it to be between '\'' (apostrophes). How can I do this modification to Parsec behavior?
Thanks!!!
You are talking about adjusting the field of a data type named GenTokenParser. It looks like you are using a function that automatically fills in the data type with sensible defaults and you just want to adjust one thing, here you go:
myMakeTokenParser langDef =
let default = makeTokenParser langDef
in default { stringLiteral = newStringLit }
where
newStringLit = lexeme (
do{ str <- between (char '\'')
(char '\'' <?> "end of string")
(many stringChar)
; return (foldr (maybe id (:)) "" str)
}
<?> "literal string")
I am using Scala combinatorial parser by extending scala.util.parsing.combinator.syntactical.StandardTokenParser. This class provides following methods
def ident : Parser[String] for parsing identifiers and
def numericLit : Parser[String] for parsing a number (decimal I suppose)
I am using scala.util.parsing.combinator.lexical.Scannersfrom scala.util.parsing.combinator.lexical.StdLexicalfor lexing.
My requirement is to parse a hexadecimal number (without the 0x prefix) which can be of any length. Basically a grammar like: ([0-9]|[a-f])+
I tried integrating Regex parser but there are type issues there. Other ways to extend the definition of lexer delimiter and grammar rules lead to token not found!
As I thought the problem can be solved by extending the behavior of Lexer and not the Parser. The standard lexer takes only decimal digits, so I created a new lexer:
class MyLexer extends StdLexical {
override type Elem = Char
override def digit = ( super.digit | hexDigit )
lazy val hexDigits = Set[Char]() ++ "0123456789abcdefABCDEF".toArray
lazy val hexDigit = elem("hex digit", hexDigits.contains(_))
}
And my parser (which has to be a StandardTokenParser) can be extended as follows:
object ParseAST extends StandardTokenParsers{
override val lexical:MyLexer = new MyLexer()
lexical.delimiters += ( "(" , ")" , "," , "#")
...
}
The construction of the "number" from digits is taken care by StdLexical class:
class StdLexical {
...
def token: Parser[Token] =
...
| digit~rep(digit)^^{case first ~ rest => NumericLit(first :: rest mkString "")}
}
Since StdLexical gives just the parsed number as a String it is not a problem for me, as I am not interested in numeric value either.
You can use the RegexParsers with an action associated to the token in question.
import scala.util.parsing.combinator._
object HexParser extends RegexParsers {
val hexNum: Parser[Int] = """[0-9a-f]+""".r ^^
{ case s:String => Integer.parseInt(s,16) }
def seq: Parser[Any] = repsep(hexNum, ",")
}
This will define a parser that reads comma separated hex number with no prior 0x. And it will actually return a Int.
val result = HexParser.parse(HexParser.seq, "1, 2, f, 10, 1a2b34d")
scala> println(result)
[1.21] parsed: List(1, 2, 15, 16, 27439949)
Not there is no way to distinguish decimal notation numbers. Also I'm using the Integer.parseInt, this is limited to the size of your Int. To get any length you may have to make your own parser and use BigInteger or arrays.