How do I get the generated AST from lemon?

How do I get the root node of the AST (abstract syntax tree) from lemon? I have tried using %extra_argument { Node *rootNode } and using the following code to return the root node object.
program ::= statements(A). { rootNode = A; }
But the root node in the main parse function stays empty.
Here is the main parse function.
Node parse()
{
void* parser = ParseAlloc(malloc);
int token;
Node astRoot;
while (token = yylex())
{
Parse(parser, token, yytext, &astRoot);
}
Parse(parser, 0, NULL, &astRoot);
ParseFree(parser, free);
return astRoot;
}
Can anyone help? Thanks in advance.

rootNode is a pointer, and the action only sees a local copy of it. Assigning to rootNode updates that local variable, not the Node it points to. Try dereferencing it when you copy:
program ::= statements(A). { *rootNode = *A; }

Related

designated Initialization of a Node with a pointer (C++)

When creating a new node in a linked list, is it legal to use designated initializers to initialize the members of the node as mentioned below?
Is there any repercussion in doing so and what would be a better way to achieve the same result?
(lang : C++)
Node *temp = new Node{.data = value, .next = NULL};
struct Node
{
int data;
Node *next;
};
I think you can use a function as a constructor.
Node* newFunction(int data) {
// malloc returns void*, which C++ does not convert implicitly, so cast it
Node* newNode = static_cast<Node*>(malloc(sizeof(Node)));
newNode->data=data;
newNode->next=NULL;
return newNode;
}
After that, you can use it in main like this:
Node* newNode = newFunction(5);

Why pointer argv is not updating?

Can somebody help me understand why the pointer head is not updated after new() call?
expected: val:0 # call new(), update l0.val to 0
actual: val:253784 # why is l0.val not updated through the pointer?
https://www.edaplayground.com/x/54Nz
#include <stdio.h>
#include <stdlib.h>
typedef struct _node {
int val;
struct _node *next;
} node;
//construct the struct
void new(node *head) {
//malloc return a pointer, type casting to (node*)
node *head_l = (node*)malloc(sizeof(node));
if(!head_l) {
printf("Create Fail!\n");
exit(1);
}
head_l->val = 0;
head_l->next = NULL;
printf("head_l->val:%0d\n",head_l->val);
//why head = head_l doesn't work??
head = head_l;
//The line below works
//*head = *head_l;
}
int main() {
node l0;
new(&l0);
printf("val:%0d\n",l0.val);
}
Function parameters receive only the value they are passed, not any reference or other connection to the argument. When the function is called, the parameter head is set to the value of a pointer to l0. Changing head does not change l0.
By referring to the post "Having a function change the value a pointer represents in C", I was able to find the root cause.
Let's say head holds the address [0x0000_0010], which is where the caller's node l0 lives.
head_l holds the address [0x0003_DF58], which is the newly allocated node with val = 0.
head = head_l; only changes the local parameter head from 0x0000_0010 to 0x0003_DF58. The caller's l0 is untouched.
*head = *head_l; copies the node stored at [0x0003_DF58] (what head_l points to) into the node stored at [0x0000_0010] (what head points to).
Only the latter overwrites the contents of l0 with the new values (val = 0, next = NULL).
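If the goal is to hand a newly allocated node back to the caller rather than copy into an existing one, two common options are to pass the address of the caller's pointer, or to return the pointer. Here is a minimal sketch of both. The names node_new_out and node_new are made up for illustration and are not from the question.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node;

/* Variant 1 (hypothetical name): the caller passes the address of its own
   pointer, so the function can redirect that pointer to the new node.
   Returns 0 on success, -1 if allocation fails. */
int node_new_out(node **head)
{
    assert(head != NULL);
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->val = 0;
    n->next = NULL;
    *head = n;              /* writes through to the caller's variable */
    return 0;
}

/* Variant 2 (hypothetical name): return the new pointer and let the
   caller assign it. Returns NULL if allocation fails. */
node *node_new(void)
{
    node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->val = 0;
        n->next = NULL;
    }
    return n;
}
```

With variant 1 the caller would declare node *l0 = NULL; and call node_new_out(&l0);. With variant 2 it would write node *l0 = node_new();. In both cases the caller owns the node and should free it.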

Code substitution for DSL using ANTLR

The DSL I'm working on allows users to define a 'complete text substitution' variable. When parsing the code, we then need to look up the value of the variable and start parsing again from that code.
The substitution can be very simple (single constants) or entire statements or code blocks.
This is a mock grammar which I hope illustrates my point.
grammar a;
entry
: (set_variable
| print_line)*
;
set_variable
: 'SET' ID '=' STRING_CONSTANT ';'
;
print_line
: 'PRINT' ID ';'
;
STRING_CONSTANT: '\'' ('\'\'' | ~('\''))* '\'' ;
ID: [a-z][a-zA-Z0-9_]* ;
VARIABLE: '&' ID;
BLANK: [ \t\n\r]+ -> channel(HIDDEN) ;
Then the following statements executed consecutively should be valid;
SET foo = 'Hello world!';
PRINT foo;
SET bar = 'foo;'
PRINT &bar // should be interpreted as 'PRINT foo;'
SET baz = 'PRINT foo; PRINT'; // one complete statement and one incomplete statement
&baz foo; // should be interpreted as 'PRINT foo; PRINT foo;'
Any time the & variable token is discovered, we immediately switch to interpreting the value of that variable instead. As above, this means you can set up the code in a way that is invalid on its own, full of half-statements that are only completed when the value is just right. The variables can be redefined at any point in the text.
Strictly speaking the current language definition doesn't disallow nesting &vars inside each other, but the current parsing doesn't handle this and I would not be upset if it wasn't allowed.
Currently I'm building an interpreter using a visitor, but this one I'm stuck on.
How can I build a lexer/parser/interpreter which will allow me to do this? Thanks for any help!
So I have found one solution to the issue. I think it could be better - as it potentially does a lot of array copying - but at least it works for now.
EDIT: I was wrong before, and my solution would consume ANY & that it found, including those in valid locations such as inside string constants. This seems like a better solution:
First, I extended the InputStream so that it is able to rewrite the input stream when a & is encountered. This unfortunately involves copying the array, which I can maybe resolve in the future:
MacroInputStream.java
package preprocessor;
import org.antlr.v4.runtime.ANTLRInputStream;
import java.util.HashMap;
public class MacroInputStream extends ANTLRInputStream {
private HashMap<String, String> map;
public MacroInputStream(String s, HashMap<String, String> map) {
super(s);
this.map = map;
}
public void rewrite(int startIndex, int stopIndex, String replaceText) {
int length = stopIndex-startIndex+1;
char[] replData = replaceText.toCharArray();
if (replData.length == length) {
for (int i = 0; i < length; i++) data[startIndex+i] = replData[i];
} else {
char[] newData = new char[data.length+replData.length-length];
System.arraycopy(data, 0, newData, 0, startIndex);
System.arraycopy(replData, 0, newData, startIndex, replData.length);
System.arraycopy(data, stopIndex+1, newData, startIndex+replData.length, data.length-(stopIndex+1));
data = newData;
n = data.length;
}
}
}
Secondly, I extended the Lexer so that when a VARIABLE token is encountered, the rewrite method above is called:
MacroGrammarLexer.java
package language;
import language.DSL_GrammarLexer;
import preprocessor.MacroInputStream;
import org.antlr.v4.runtime.Token;
import java.util.HashMap;
// Extends the generated lexer; a class cannot extend itself
public class MacroGrammarLexer extends DSL_GrammarLexer {
private HashMap<String, String> map;
public MacroGrammarLexer(MacroInputStream input, HashMap<String, String> map) {
super(input);
this.map = map;
// TODO Auto-generated constructor stub
}
private MacroInputStream getInput() {
return (MacroInputStream) _input;
}
@Override
public Token nextToken() {
Token t = super.nextToken();
if (t.getType() == VARIABLE) {
System.out.println("Encountered token " + t.getText()+" ===> rewriting!!!");
getInput().rewrite(t.getStartIndex(), t.getStopIndex(),
map.get(t.getText().substring(1)));
getInput().seek(t.getStartIndex()); // reset input stream to previous
return super.nextToken();
}
return t;
}
}
Lastly, I modified the generated parser to set the variables at the time of parsing:
DSL_GrammarParser.java
...
...
HashMap<String, String> map; // same map as before, passed as a new argument.
...
...
public final SetContext set() throws RecognitionException {
SetContext _localctx = new SetContext(_ctx, getState());
enterRule(_localctx, 130, RULE_set);
try {
enterOuterAlt(_localctx, 1);
{
String vname = null; String vval = null; // set up variables
setState(1215); match(SET);
setState(1216); vname = variable_name().getText(); // set vname
setState(1217); match(EQUALS);
setState(1218); vval = string_constant().getText(); // set vval
System.out.println("Found SET " + vname +" = " + vval+";");
map.put(vname, vval);
}
}
catch (RecognitionException re) {
_localctx.exception = re;
_errHandler.reportError(this, re);
_errHandler.recover(this, re);
}
finally {
exitRule();
}
return _localctx;
}
...
...
Unfortunately this method is final so this will make maintenance a bit more difficult, but it works for now.
The standard pattern for handling your requirements is to implement a symbol table. The simplest form is a key:value store. In your visitor, add variable declarations as they are encountered, and read out the values as variable references are encountered.
As described, your DSL does not define a scoping requirement on the variables declared. If you do require scoped variables, then use a stack of key:value stores, pushing and popping on scope entry and exit.
See this related StackOverflow answer.
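To make the stack-of-stores idea concrete, here is a small sketch in C (a Java visitor would more likely use a Deque of HashMaps, but the shape is the same). Everything here is an assumption for illustration: the symtab_* names, the fixed-size limits, and the linear search are not from the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCOPES 8    /* illustrative limits, not from the original post */
#define MAX_VARS   16
#define MAX_LEN    64

typedef struct { char name[MAX_LEN]; char value[MAX_LEN]; } Binding;
typedef struct { Binding vars[MAX_VARS]; int count; } Scope;
typedef struct { Scope scopes[MAX_SCOPES]; int depth; } SymbolTable;

/* Start with a single global scope. */
void symtab_init(SymbolTable *t) { t->depth = 1; t->scopes[0].count = 0; }

/* Enter a nested scope; returns -1 if the stack is full. */
int symtab_push(SymbolTable *t)
{
    if (t->depth >= MAX_SCOPES) return -1;
    t->scopes[t->depth++].count = 0;
    return 0;
}

/* Leave the innermost scope; the global scope is never popped. */
void symtab_pop(SymbolTable *t)
{
    assert(t->depth > 1);
    t->depth--;
}

/* Define or redefine a variable in the innermost scope. */
int symtab_set(SymbolTable *t, const char *name, const char *value)
{
    Scope *s = &t->scopes[t->depth - 1];
    if (strlen(name) >= MAX_LEN || strlen(value) >= MAX_LEN) return -1;
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->vars[i].name, name) == 0) {
            strcpy(s->vars[i].value, value);
            return 0;
        }
    }
    if (s->count >= MAX_VARS) return -1;
    strcpy(s->vars[s->count].name, name);
    strcpy(s->vars[s->count].value, value);
    s->count++;
    return 0;
}

/* Look a variable up from the innermost scope outwards; NULL if unknown. */
const char *symtab_get(const SymbolTable *t, const char *name)
{
    for (int d = t->depth - 1; d >= 0; d--) {
        const Scope *s = &t->scopes[d];
        for (int i = 0; i < s->count; i++)
            if (strcmp(s->vars[i].name, name) == 0)
                return s->vars[i].value;
    }
    return NULL;
}
```

For the unscoped DSL in the question, symtab_init, symtab_set and symtab_get alone would be enough: SET calls symtab_set, and a variable reference calls symtab_get.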
Separately, since your strings may contain commands, you can simply parse the contents as part of your initial parse. That is, expand your grammar with a rule that includes the full set of valid contents:
set_variable
: 'SET' ID '=' stringLiteral ';'
;
stringLiteral:
Quote Quote? (
( set_variable
| print_line
| VARIABLE
| ID
)
| STRING_CONSTANT // redefine without the quotes
)
Quote
;

Values in $1, $2 .. variables always NULL

I am trying to create a parser with Bison (GNU bison 2.4.1) and flex (2.5.35) on my Ubuntu OS. I have something like this:
sql.h:
typedef struct word
{
char *val;
int length;
} WORD;
struct yword
{
struct word v;
int o;
...
};
sql1.y
%{
..
#include "sql.h"
..
%}
%union yystype
{
struct tree *t;
struct yword b;
...
}
%token <b> NAME
%%
...
table:
NAME { add_table(root, $1.v); }
;
...
Trouble is that whatever string I give to it, when it comes to resolve this, v always has values (NULL, 0) even if the input string should have some table name. (I chose to skip unnecessary other details/snippets, but can provide more if it helps resolve this.)
I wrote the grammar which is complete and correct, but I can't get it to build the parse tree due to this problem.
Any inputs would be quite appreciated.
Your trouble seems to come from missing or buggy code in the lexical analyzer.
Check your lexical analyzer first.
If it does not return the token properly, the parser cannot handle the values correctly.
Write a basic test that prints the token value.
Never mind the C style; the principle is what matters:
int main(void) {
int token;
while( token = yylex() ) {
switch( token) {
case NAME:
printf("name '%s'\n", yylval.b.v.val );
break;
...
}
}
}
Run this on some input. If the names come out empty, the lexical analyzer is not setting yylval when it returns NAME, and it is then expected that val is empty in the parser.
If your flex file has a pattern such as:
[a-z]+ { return NAME; }
that is incorrect. You have to set the value through the union member declared for NAME (b), like this:
[a-z]+ {
yylval.b.v.val = strdup(yytext);
yylval.b.v.length = yyleng;
return NAME; }
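The contract itself can be shown without flex or bison. The sketch below is a hand-written stand-in for the [a-z]+ rule: it fills in the semantic value before returning the token code, which is the step the parser depends on. The function next_token, the token code 258 and the trimmed-down structs are assumptions for illustration; a real scanner would be generated by flex and would write to the global yylval.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down versions of the types from the question (elided fields omitted). */
struct word  { char *val; int length; };
struct yword { struct word v; int o; };
union yystype { struct yword b; };

enum { NAME = 258 };   /* hypothetical token code */

/* Stand-in for the flex rule [a-z]+ (hypothetical helper).
   Skips spaces, then reads a run of lowercase letters.
   On a match it stores a heap copy of the text and its length in *lval
   and returns NAME. It returns 0 at end of input, on any other
   character, or if allocation fails. The caller frees lval->b.v.val. */
int next_token(const char **cursor, union yystype *lval)
{
    assert(cursor != NULL && *cursor != NULL && lval != NULL);
    const char *p = *cursor;
    while (*p == ' ')
        p++;
    const char *start = p;
    while (islower((unsigned char)*p))
        p++;
    size_t len = (size_t)(p - start);
    *cursor = p;
    if (len == 0)
        return 0;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, start, len);
    copy[len] = '\0';
    lval->b.v.val = copy;          /* the step a bare "return NAME;" skips */
    lval->b.v.length = (int)len;
    return NAME;
}
```

If the two assignments to lval are removed, the caller still sees NAME but reads whatever happened to be in the union, which matches the (NULL, 0) symptom described in the question.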

goto statement in antlr?

I would really appreciate it if someone could give me advice, or point me to a tutorial or sample implementation, anything that could help me implement a basic goto statement in ANTLR.
Thanks for any help
edit. ver2 of question:
Say I have this tree structure:
(BLOCK (PRINT 1) (PRINT 2) (PRINT 3) (PRINT 4) )
Now, I'm interested to know is there a way to
select, say, node (PRINT 2) and all nodes that follow
that node ((PRINT 2) (PRINT 3) (PRINT 4)) ?
I'm asking this because I'm trying to implement
basic goto mechanism.
I have print statement like this:
i=LABEL print
{interpreter.store($i.text, $print.tree);} //stores in hash table
-> print
However $print.tree just ignores later nodes,
so in input:
label: print 1
print 2
goto label
would print 121!
(What I would like is infinite loop 1212...)
I've also tried taking token
address of print statement with
getTokenStartIndex() and setting
roots node with setTokenStartIndex
but that just looped whatever was first node over and over.
My question is, how does one implement goto statement in antlr ?
Maybe my approach is wrong, as I have overlooked something?
I would really appreciate any help.
ps. even more detail, it is related to pattern 25 - Language Implementation patterns, I'm trying to add on to examples from that pattern.
Also, I've searched quite a bit on the web, looks like it is very hard to find goto example
... anything that could help me implement basic goto statement in ANTLR?
Note that it isn't ANTLR that implements this. With ANTLR you merely describe the language you want to parse to get a lexer, parser and possibly a tree-walker. After that, it's up to you to manipulate the tree and evaluate it.
Here's a possible way. Please don't look too closely at the code. It's a quick hack: there's a bit of code duplication and I am passing package-protected variables around, which isn't how it should be done. The grammar also forces you to start your input source with a label, but this is just a small demo of how you could solve it.
You need the following files:
Goto.g - the combined grammar file
GotoWalker.g - the tree walker grammar file
Main.java - the main class including the Node-model classes of the language
test.goto - the test input source file
antlr-3.3.jar - the ANTLR JAR (could also be another 3.x version)
Goto.g
grammar Goto;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
FILE;
BLOCK;
}
@members {
java.util.Map<String, CommonTree[]> labels = new java.util.HashMap<String, CommonTree[]>();
}
parse
: block EOF -> block
;
block
: ID ':' stats b=block? {labels.put($ID.text, new CommonTree[]{$stats.tree, $b.tree});} -> ^(BLOCK stats $b?)
;
stats
: stat*
;
stat
: Print Number -> ^(Print Number)
| Goto ID -> ^(Goto ID)
;
Goto : 'goto';
Print : 'print';
Number : '0'..'9'+;
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
GotoWalker.g
tree grammar GotoWalker;
options {
tokenVocab=Goto;
ASTLabelType=CommonTree;
}
tokens {
FILE;
BLOCK;
}
@members {
java.util.Map<String, CommonTree[]> labels = new java.util.HashMap<String, CommonTree[]>();
}
walk returns [Node n]
: block {$n = $block.n;}
;
block returns [Node n]
: ^(BLOCK stats b=block?) {$n = new BlockNode($stats.n, $b.n);}
;
stats returns [Node n]
@init{List<Node> nodes = new ArrayList<Node>();}
: (stat {nodes.add($stat.n);})* {$n = new StatsNode(nodes);}
;
stat returns [Node n]
: ^(Print Number) {$n = new PrintNode($Number.text);}
| ^(Goto ID) {$n = new GotoNode($ID.text, labels);}
;
Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
import java.util.*;
public class Main {
public static void main(String[] args) throws Exception {
GotoLexer lexer = new GotoLexer(new ANTLRFileStream("test.goto"));
GotoParser parser = new GotoParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
GotoWalker walker = new GotoWalker(new CommonTreeNodeStream(tree));
walker.labels = parser.labels;
Node root = walker.walk();
root.eval();
}
}
interface Node {
public static final Node VOID = new Node(){public Object eval(){throw new RuntimeException("VOID.eval()");}};
public static final Node BREAK = new Node(){public Object eval(){throw new RuntimeException("BREAK.eval()");}};
Object eval();
}
class BlockNode implements Node {
Node stats;
Node child;
BlockNode(Node ns, Node ch) {
stats = ns;
child = ch;
}
public Object eval() {
Object o = stats.eval();
if(o != VOID) {
return o;
}
if(child != null) {
o = child.eval();
if(o != VOID) {
return o;
}
}
return VOID;
}
}
class StatsNode implements Node {
List<Node> nodes;
StatsNode(List<Node> ns) {
nodes = ns;
}
public Object eval() {
for(Node n : nodes) {
Object o = n.eval();
if(o != VOID) {
return o;
}
}
return VOID;
}
}
class PrintNode implements Node {
String text;
PrintNode(String txt) {
text = txt;
}
public Object eval() {
System.out.println(text);
return VOID;
}
}
class GotoNode implements Node {
String label;
Map<String, CommonTree[]> labels;
GotoNode(String lbl, Map<String, CommonTree[]> lbls) {
label = lbl;
labels = lbls;
}
public Object eval() {
CommonTree[] toExecute = labels.get(label);
try {
Thread.sleep(1000L);
GotoWalker walker = new GotoWalker(new CommonTreeNodeStream(toExecute[0]));
walker.labels = this.labels;
Node root = walker.stats();
Object o = root.eval();
if(o != VOID) {
return o;
}
walker = new GotoWalker(new CommonTreeNodeStream(toExecute[1]));
walker.labels = this.labels;
root = walker.block();
o = root.eval();
if(o != VOID) {
return o;
}
} catch(Exception e) {
e.printStackTrace();
}
return BREAK;
}
}
test.goto
root:
print 1
A:
print 2
B:
print 3
goto A
C:
print 4
To run the demo, do the following:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool Goto.g
java -cp antlr-3.3.jar org.antlr.Tool GotoWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
or:
Windows
java -cp antlr-3.3.jar org.antlr.Tool Goto.g
java -cp antlr-3.3.jar org.antlr.Tool GotoWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
which will print:
1
2
3
2
3
2
3
2
3
...
Note that the 2 and 3 are repeated until you terminate the app manually.
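A different way to get goto semantics, independent of ANTLR, is to flatten the statements into one list and keep a program counter: a label is just an index into the list, and goto sets the counter to that index. This is a sketch of that alternative, not the tree-walker approach above. The Stmt layout, run_program, and the step limit (added so the demo terminates) are all assumptions for illustration. Printed numbers are recorded in an array instead of being written to stdout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { OP_PRINT, OP_GOTO } OpKind;

typedef struct {
    const char *label;   /* label attached to this statement, or NULL */
    OpKind kind;
    int number;          /* operand of OP_PRINT */
    const char *target;  /* operand of OP_GOTO */
} Stmt;

/* Returns the index of the statement carrying the label, or -1. */
static int find_label(const Stmt *prog, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (prog[i].label != NULL && strcmp(prog[i].label, name) == 0)
            return (int)i;
    return -1;
}

/* Executes prog for at most max_steps statements (hypothetical helper).
   Each print stores its number in out; prints beyond cap are dropped.
   Returns how many numbers were stored, or -1 on an unknown label. */
int run_program(const Stmt *prog, size_t n, int *out, size_t cap,
                size_t max_steps)
{
    assert(prog != NULL && out != NULL);
    size_t pc = 0, written = 0;
    for (size_t step = 0; pc < n && step < max_steps; step++) {
        const Stmt *s = &prog[pc];
        if (s->kind == OP_PRINT) {
            if (written < cap)
                out[written++] = s->number;
            pc++;                    /* fall through to the next statement */
        } else {
            int t = find_label(prog, n, s->target);
            if (t < 0)
                return -1;
            pc = (size_t)t;          /* goto: jump to the labelled statement */
        }
    }
    return (int)written;
}
```

For the input from the question (label: print 1, print 2, goto label), the recorded sequence is 1 2 1 2 ... until the step limit is reached, which is the looping behaviour the question asked for.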
