How to parse subnodes that depend on parents' information?

If I write a grammar file in Yacc/Bison like this:
Module
    : ModuleName "=" Functions
        { $$ = Builder::concat($1, $3, ","); }
    ;
Functions
    : Functions Function
        { $$ = Builder::concat($1, $2, ","); }
    | Function
        { $$ = $1; }
    ;
Function
    : DEF ID ARGS BODY
        {
            /** Lacks the module name to do name mangling for the function **/
            /** How can I obtain the "parent" node's module name here ?? **/
            module_name = ; // ????
            $$ = Builder::def_function(module_name, $ID, $ARGS, $BODY);
        }
    ;
And this parser should parse code like this:
main_module:
    def funA (a, b, c) { ... }
In my AST, the name "funA" should be renamed to main_module.funA, but I can't get the module's information while the parser is processing the Function node!
Are there any Yacc/Bison facilities that can help me handle this problem, or should I change my parsing style to avoid such awkward situations?

There is a Bison feature, but as the manual says, use it with care:
$N with N zero or negative is allowed for reference to tokens and groupings on the stack before those that match the current rule. This is a very risky practice, and to use it reliably you must be certain of the context in which the rule is applied. Here is a case in which you can use this reliably:
foo:  expr bar '+' expr { ... }
    | expr bar '-' expr { ... }
    ;
bar:  /* empty */
    { previous_expr = $0; }
    ;
As long as bar is used only in the fashion shown here, $0 always refers to the expr which precedes bar in the definition of foo.
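Applied to your grammar, that trick would look something like the sketch below. Here current_module is an illustrative variable in your parsing context, and depending on your %union you may need an explicit type tag such as $<sval>0 instead of plain $0:
Module
    : ModuleName saved_module "=" Functions
        { $$ = Builder::concat($1, $4, ","); }   /* saved_module is $2, so Functions is $4 */
    ;
saved_module
    : /* empty */
        { current_module = $0; }   /* $0 is ModuleName, as long as saved_module only appears here */
    ;
Function
    : DEF ID ARGS BODY
        { $$ = Builder::def_function(current_module, $ID, $ARGS, $BODY); }
    ;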
More cleanly, you could use a mid-rule action (in Module) to push the module name on a name stack (which would have to be part of the parsing context). You would then pop the stack at the end of the rule.
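A minimal sketch of that approach, assuming hypothetical push_module_name / pop_module_name / current_module_name helpers that maintain the name stack in your parsing context:
Module
    : ModuleName { push_module_name($1); }   /* mid-rule action: runs before Functions is parsed */
      "=" Functions
        {
            /* the mid-rule action counts as $2, so "=" is $3 and Functions is $4 */
            $$ = Builder::concat($1, $4, ",");
            pop_module_name();
        }
    ;
Function
    : DEF ID ARGS BODY
        { $$ = Builder::def_function(current_module_name(), $ID, $ARGS, $BODY); }
    ;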
For more information and examples of mid-rule actions, see the manual.

How do I use lua keyword as a table key?

Problem
When I use do, a Lua keyword, as a table's key, it gives the following error:
> table.newKey = { do = 'test' }
stdin:1: unexpected symbol near 'do'
>
I need to use do as a key. What should I do?
sometable.somekey is syntactic sugar for sometable['somekey']; similarly, { somekey = somevalue } is sugar for { ['somekey'] = somevalue }.
Information like this can be found in this very good resource:
For such needs, there is another, more general, format. In this format, we explicitly write the index to be initialized as an expression, between square brackets:
opnames = {["+"] = "add", ["-"] = "sub",
["*"] = "mul", ["/"] = "div"}
-- Programming in Lua: 3.6 – Table Constructors
Use this syntax:
t = { ['do'] = 'test' }
or t['do'] to get or set a value.
I need to use do as a key. What should I do?
Read the Lua 5.4 Reference Manual and understand that something like t = { a = b } or t.a = b only works if a is a valid Lua identifier.
3.4.9 - Table constructors
The general syntax for constructors is
tableconstructor ::= ‘{’ [fieldlist] ‘}’
fieldlist ::= field {fieldsep field} [fieldsep]
field ::= ‘[’ exp ‘]’ ‘=’ exp | Name ‘=’ exp | exp
fieldsep ::= ‘,’ | ‘;’
A field of the form name = exp is equivalent to ["name"] = exp.
So why does this not work for do?
3.1 - Lexical Conventions
Names (also called identifiers) in Lua can be any string of Latin
letters, Arabic-Indic digits, and underscores, not beginning with a
digit and not being a reserved word. Identifiers are used to name
variables, table fields, and labels.
The following keywords are reserved and cannot be used as names:
and break do else elseif end
false for function goto if in
local nil not or repeat return
then true until while
do is not a Name, so you need to use the field ::= ‘[’ exp ‘]’ ‘=’ exp form,
which in your example is table.newKey = { ['do'] = 'test' }

Is $1 a usable token in Lex?

If I have:
if {
    yylval = $1;
}
Is this legal? If not, is there another way to say that I want to reference what I put in?
(Please don't say yylval = 'if'; it's not dynamic, and I want to use it in some more complicated scenarios.)
No. $1 and friends are non-terminal or terminal symbols in the grammar. I don't know what you're trying to do exactly, but normally you would have a set of rules like this:
"if" { return IF; }
"else" { return ELSE; }
[0-9]+ { yylval.intValue = atoi(yytext); return INTEGER; }
etc., where IF and ELSE are defined in y.tab.h as a result of being declared in your .y file via the %token directive.
please don't say yylval = 'if', it's not dynamic
Neither is a lex rule. Your purpose remains obscure.
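If the goal is simply to hand the matched text to the parser, note that the matched lexeme is available inside a lex action as yytext, and you can copy it into yylval. A minimal sketch, assuming the parser's %union declares a strValue member (that name is illustrative):
%{
#include <string.h>
#include "y.tab.h"   /* IF, IDENT and yylval's union type come from the .y file */
%}
%%
"if"                    { yylval.strValue = strdup(yytext); return IF; }
[A-Za-z_][A-Za-z0-9_]*  { yylval.strValue = strdup(yytext); return IDENT; }
%%
For a literal pattern like "if" the copied text is of course always "if", which is the point of the answer above; yytext only buys you something for patterns that can match more than one lexeme.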

PEG grammar to accept late definition

I want to write a PEG parser with PackCC (but peg/leg or other libraries are also possible) which is able to calculate some fields with variables at arbitrary positions.
The first simplified approach is the following grammar:
%source {
    int vars[256];
}
statement <- e:term EOL       { printf("answer=%d\n", e); }
term      <- l:primary
             ( '+' r:primary  { l += r; }
             / '-' r:primary  { l -= r; }
             )*               { $$ = l; }
           / i:var '=' s:term { $$ = vars[i] = s; }
           / e:primary        { $$ = e; }
primary   <- < [0-9]+ >       { $$ = atoi($1); }
           / i:var !'='       { $$ = vars[i]; }
var       <- < [a-z] >        { $$ = $1[0]; }
EOL       <- '\n' / ';'
%%
When testing with sequential order, it works fine:
a=42;a+1
answer=42
answer=43
But when the variable definition comes after the usage, it fails:
a=42;a+b;b=1
answer=42
answer=42
answer=1
And even deeper chained late definitions should work, like:
a=42;a+b;b=c;c=1
answer=42
answer=42
answer=0
answer=1
Let's think of the input not as a sequential programming language, but more as an Excel-like spreadsheet, e.g.:
A1: 42
A2: =A1+A3
A3: 1
Is it possible to parse and handle this kind of text with a PEG grammar?
Is a two-pass or multi-pass approach an option here?
Or do I need to switch over to old-style lex/yacc or flex/bison?
I'm not familiar with PEG per se, but it looks like what you have is an attributed grammar where you perform the execution logic directly within the semantic action.
That won't work if a variable is used before it is defined.
You can use the same parser generator but you'll probably have to define some sort of abstract syntax tree to capture the semantics and postpone evaluation until you've parsed all input.
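A minimal sketch of that idea in plain C (the names are illustrative, and nothing here is PackCC-specific): the semantic actions only build nodes and record assignments, and evaluation happens after the whole input has been parsed, so it no longer matters where in the text a definition appears.

typedef enum { NUM, VAR, ADD, SUB } Kind;

typedef struct Node {
    Kind kind;
    int value;              /* NUM */
    char name;              /* VAR */
    struct Node *lhs, *rhs; /* ADD / SUB */
} Node;

/* one definition slot per variable name, filled in by the assignment actions */
static Node *defs[256];

/* Spreadsheet-style evaluation: a variable evaluates whatever expression
 * was assigned to it, regardless of textual order. (No cycle detection.) */
static int eval(const Node *n) {
    if (!n) return 0;   /* undefined variable */
    switch (n->kind) {
    case NUM: return n->value;
    case VAR: return eval(defs[(unsigned char)n->name]);
    case ADD: return eval(n->lhs) + eval(n->rhs);
    case SUB: return eval(n->lhs) - eval(n->rhs);
    }
    return 0;
}

The actions that currently print "answer=..." would instead append the expression node to a list of statements, and a loop after parsing would evaluate and print them.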
Yes, it is possible to parse this with a PEG grammar. PEG is effectively greedy LL(*) with infinite lookahead. Expressions like this are easy.
But the grammar you have written is left recursive, which is not PEG. Although some PEG parsers can handle left recursion, until you're an expert it's best to avoid it, and use only right recursion if needed.

ANTLR4: semantic predicate depending on state does not work

In order to have the ANTLR4 lexer recognize different kinds of tokens in one rule, I use a semantic predicate. This predicate evaluates a static field of a helper class. Have a look at some grammar excerpts:
// very simplified
@header {
    import static ParserAndLexerState.*;
}
@members {
    private boolean fooAllowed() {
        System.out.println(fooAllowed);
        return fooAllowed;
    }
}
...
methodField
    : t = type
      { fooAllowed = false; }
      id = Identifier
      { fooAllowed = true; /* do something with t and id */ }
...
fragment CHAR_NO_OUT_1 : [a-eg-zA-Z_] ;
fragment CHAR_NO_OUT_2 : [a-nq-zA-Z_0-9] ;
fragment CHAR_NO_OUT_3 : [a-nq-zA-Z_0-9] ;
fragment CHAR_1 : [a-zA-Z_] ;
fragment CHAR_N : CHAR_1 | [0-9] ;
Identifier
    // returns every possible identifier
    : { fooAllowed() }? (CHAR_1 CHAR_N*)
    // returns everything but 'foo'
    | { !fooAllowed() }? CHAR_NO_OUT_1 (CHAR_NO_OUT_2 (CHAR_NO_OUT_3 CHAR_N*)?)? ;
Identifier now always behaves as if fooAllowed still had the initial value from its definition in ParserAndLexerState. So if that value was true, Identifier only ever uses the first alternative of the rule, otherwise always the second. This is weird behavior, especially considering that fooAllowed() prints the right values to the console.
Is there anything in ANTLR4 that discourages using global state from within semantic predicates? How can I avoid this behavior?
ANTLR 4 uses unbounded lookahead with non-deterministic termination conditions for the prediction process. While the TokenStream implementations do call TokenSource.nextToken lazily, it is not safe to ever assume that the number of tokens consumed so far is bounded.
In other words, the actual semantics of using a parser action to change the behavior of the lexer are undefined. Different versions of ANTLR 4, or even subtle changes in the input you give it, could produce completely different results.

How to embed Scala code inside a specially defined syntax?

I don't know if this info is relevant to the question, but I am learning Scala parser combinators.
Using some examples (in this master's thesis) I was able to write a simple functional (in the sense that it is non-imperative) programming language.
Is there a way to improve my parser/evaluator such that it could allow/evaluate input like this:
<%
import scala.<some package / classes>
import weka.<some package / classes>
%>
some DSL code (lambda calculus)
<%
System.out.println("asdasd");
J48 j48 = new J48();
%>
as input written in the guest language (DSL)?
Should I use reflection or something similar to evaluate such input?
Is there some source code you would recommend studying (maybe the Groovy sources?)?
Maybe runtime compilation is something similar, but I am not sure it is the best alternative.
EDIT
A complete answer is given below using "{" and "}". Maybe "{{" would be better.
The question is what the meaning of such import statements should be.
Perhaps start by allowing references to Java methods in your language (the lambda calculus, I guess?).
For example:
java.lang.System.out.println "foo"
If you have that, you can then add resolution of unqualified names like
println "foo"
But here comes the first problem: println exists in System.out and System.err; or, to be more correct, it is a method of PrintStream, and both System.err and System.out are PrintStreams.
Hence you would need some notion of Objects, Classes, Types, and so on to do it right.
I managed to run Scala code embedded in my interpreted DSL.
Inserting DSL vars into the Scala code and recovering the return value come as a bonus. :)
Minimal relevant code, from parsing and interpreting through to run-time execution of the embedded Scala code (Main, Parser, AST, and Interpreter):
object Main extends App {
  val ast = Parser1 parse "some dsl code here"
  Interpreter eval ast
}

object Parser1 extends RegexParsers with ImplicitConversions {
  import AST._

  val separator = ";"

  def parse(input: String): Expr = parseAll(program, input).get

  type P[+T] = Parser[T]

  def program = rep1sep(expr, separator) <~ separator ^^ Sequence

  def expr: Parser[Expr] = (assign /*more calls here*/)

  def scalacode: P[Expr] = "{" ~> rep(scala_text) <~ "}" ^^ { case l => Scalacode(l.flatten) }

  def scala_text = text_no_braces ~ "$" ~ ident ~ text_no_braces ^^ { case a ~ b ~ c ~ d => List(a, b + c, d) }

  //more rules here

  def assign = ident ~ ("=" ~> atomic_expr) ^^ Assign

  //more rules here

  def atomic_expr = (
      ident ^^ Var
      //more calls here
    | "(" ~> expr <~ ")"
    | scalacode
    | failure("expression expected")
  )

  def text_no_braces = """[a-zA-Z0-9\"\'\+\-\_!##%\&\(\)\[\]\/\?\:;\.\>\<\,\|= \*\\\n]*""".r //| fail("Scala code expected")

  def ident = """[a-zA-Z]+[a-zA-Z0-9]*""".r
}

object AST {
  sealed abstract class Expr
  // more classes here
  case class Scalacode(items: List[String]) extends Expr
  case class Literal(v: Any) extends Expr
  case class Var(name: String) extends Expr
}

object Interpreter {
  import AST._

  val env = collection.immutable.Map[VarName, VarValue]()

  def run(code: String) = {
    val code2 = "val res_1 = (" + code + ")"
    interpret.interpret(code2)
    val res = interpret.valueOfTerm("res_1")
    if (res == None) Literal() else Literal(res.get)
  }

  class Context(private var env: Environment = initEnv) {
    def eval(e: Expr): Any = e match {
      case Scalacode(l: List[String]) => {
        val r = l map { x =>
          if (x.startsWith("$")) {
            eval(Var(x.drop(1)))
          } else {
            x
          }
        }
        eval(run(r.mkString))
      }
      case Assign(id, expr) => env += (id -> eval(expr))
      //more pattern matching here
      case Literal(v) => v
      case Var(id) => {
        env getOrElse(id, sys.error("Undefined " + id))
      }
    }
  }
}
