In a JvmModelInferrer, when constructing the body of a method or constructor, how do you insert both an XExpression from the grammar
body = op.body
and additional "boilerplate" code, for example
body = [
append(
'''
System.out.println("BOILERPLATE");
'''
)
]
I can achieve either but not both.
For a minimal working example, consider the following canonical Xbase grammar,
grammar org.example.xbase.entities.Entities with org.eclipse.xtext.xbase.Xbase
generate entities "http://www.example.org/xbase/entities/Entities"
Model:
importSection=XImportSection?
entities+=Entity*;
Entity:
'entity' name=ID ('extends' superType=JvmParameterizedTypeReference)? '{'
attributes += Attribute*
operations += Operation*
'}';
Attribute:
'attr' (type=JvmTypeReference)? name=ID ('=' initexpression=XExpression)? ';';
Operation:
'op' (type=JvmTypeReference)? name=ID
'(' (params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
body=XBlockExpression;
and JvmModelInferrer,
package org.example.xbase.entities.jvmmodel
import com.google.inject.Inject
import org.eclipse.xtext.xbase.jvmmodel.AbstractModelInferrer
import org.eclipse.xtext.xbase.jvmmodel.IJvmDeclaredTypeAcceptor
import org.eclipse.xtext.xbase.jvmmodel.JvmTypesBuilder
import org.example.xbase.entities.entities.Entity
class EntitiesJvmModelInferrer extends AbstractModelInferrer {
@Inject extension JvmTypesBuilder
def dispatch void infer(Entity entity, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
acceptor.accept(entity.toClass("entities."+entity.name)) [
documentation = entity.documentation
if (entity.superType != null)
superTypes += entity.superType.cloneWithProxies
entity.attributes.forEach[
a |
val type = a.type ?: a.initexpression?.inferredType
members += a.toField(a.name, type) [
documentation = a.documentation
if (a.initexpression != null)
initializer = a.initexpression
]
members += a.toGetter(a.name, type)
members += a.toSetter(a.name, type)
]
entity.operations.forEach[
op |
members += op.toMethod(op.name, op.type ?: inferredType) [
documentation = op.documentation
for (p : op.params) {
parameters += p.toParameter(p.name, p.parameterType)
}
// body = [
// append(
// '''
// System.out.println("BOILERPLATE");
// '''
// )
// ]
body = op.body
]
]
]
}
}
As the comments suggest, I would like to insert "boilerplate" code into the body of the method, before the XExpression itself. While I can insert the boilerplate, or the expression, I cannot work out how to do both.
This does not work; the only thing you can do is to infer two methods:
methodWithBoilerplate() {
    // pre
    methodWithoutBoilerplate();
    // post
}
methodWithoutBoilerplate() {
    // user code goes here
}
For the first, use body = '''code here'''; for the second, use body = op.body.
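A minimal sketch of that two-method pattern inside the operations loop of the inferrer above. The Impl suffix and the delegating call are illustrative, return values are ignored for brevity, and it assumes each op declares its return type (a string body gives the inferrer nothing to infer a type from):
entity.operations.forEach [ op |
    // wrapper: boilerplate first, then delegate to the generated implementation
    members += op.toMethod(op.name, op.type) [
        for (p : op.params) {
            parameters += p.toParameter(p.name, p.parameterType)
        }
        body = '''
            System.out.println("BOILERPLATE");
            «op.name»Impl(«FOR p : op.params SEPARATOR ', '»«p.name»«ENDFOR»);
        '''
    ]
    // implementation: carries the user's XExpression unchanged
    members += op.toMethod(op.name + "Impl", op.type ?: inferredType) [
        for (p : op.params) {
            parameters += p.toParameter(p.name, p.parameterType)
        }
        body = op.body
    ]
]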
Related
I'm writing a recursive descent parser in Go for a simple made-up language, so I'm designing the grammar as I go. My parser works, but I wanted to ask if there are any best practices for how I should lay out my code, or when I should put code in its own function, etc., to make it more readable.
I've been building the parser by following the simple rules I've learned so far, i.e. each non-terminal is its own function. Even though my code works, I think it looks really messy and unreadable.
I've included the code for the assignment non-terminal and the grammar above the function.
I've taken out most of the error handling to keep the function smaller.
Here's some examples of what that code can parse:
a = 10
a,b,c = 1,2,3
a int = 100
a,b string = "hello", "world"
Can anyone give me some advice as to how I can make my code more readable please?
// assignment : variable_list '=' expr_list
// | variable_list type
// | variable_list type '=' expr_list
func (p *Parser) assignment() ast.Noder {
assignment := &ast.AssignmentNode{}
assignment.Left = p.variable_list()
// This if-statement deals with rule 2 or 3
if p.currentToken.Type != token.ASSIGN {
// Static variable declaration
// Could be a declaration or an assignment
// Only static variables can be declared without providing a value
assignment.IsStatic = true
assignment.Type = p.var_type().Value
assignment.Right = nil
p.nextToken()
// Rule 2 is finished at this point in the code
// This if-statement is for rule 3
if p.currentToken.Type == token.ASSIGN {
assignment.Operator = p.currentToken
p.nextToken()
assignment.Right = p.expr_list()
}
} else {
// This deals with rule 1
assignment.Operator = p.currentToken
p.nextToken()
assignment.Right = p.expr_list()
}
if assignment.Right == nil {
for i := 0; i < len(assignment.Left); i++ {
assignment.Right = append(assignment.Right, nil)
}
}
if len(assignment.Left) != len(assignment.Right) {
p.FoundError(p.syntaxError("variable mismatch, " + strconv.Itoa(len(assignment.Left)) + " on left but " + strconv.Itoa(len(assignment.Right)) + " on right,"))
}
return assignment
}
How can I make my code more readable?
For readability, a prerequisite for correct, maintainable code, restructure the function to mirror the grammar:
// assignment : variable_list '=' expr_list
// | variable_list type
// | variable_list type '=' expr_list
func (p *Parser) assignment() ast.Noder {
assignment := &ast.AssignmentNode{}
// variable_list
assignment.Left = p.variable_list()
// type
if p.currentToken.Type != token.ASSIGN {
// Static variable declaration
// Could be a declaration or an assignment
// Only static variables can be declared without providing a value
assignment.IsStatic = true
assignment.Type = p.var_type().Value
p.nextToken()
}
// '=' expr_list
if p.currentToken.Type == token.ASSIGN {
assignment.Operator = p.currentToken
p.nextToken()
assignment.Right = p.expr_list()
}
// variable_list [expr_list]
if assignment.Right == nil {
for i := 0; i < len(assignment.Left); i++ {
assignment.Right = append(assignment.Right, nil)
}
}
if len(assignment.Left) != len(assignment.Right) {
p.FoundError(p.syntaxError(fmt.Sprintf(
"variable mismatch, %d on left but %d on right,",
len(assignment.Left), len(assignment.Right),
)))
}
return assignment
}
Note: this is likely inefficient and overly complicated:
for i := 0; i < len(assignment.Left); i++ {
assignment.Right = append(assignment.Right, nil)
}
What is the type of assignment.Right?
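If it is a slice, that loop can be a single allocation. A sketch, assuming Right is declared as []ast.Noder (its declaration isn't shown in the question):
// make yields a nil-filled slice of the required length in one step.
assignment.Right = make([]ast.Noder, len(assignment.Left))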
As far as how to make your code more readable, there is not always a cut-and-dried answer. I personally find that code is more readable when you can use function names in place of comments in the code. A lot of people like to recommend the book "Clean Code" by Robert C. Martin. He pushes this throughout the book: small functions that have one purpose and are self-documenting (via the function name).
Of course, as I said before this is a subjective topic. I took a crack at it, and came up with the code below, which I personally feel is more readable. It also uses the function names to document what is going on. That way the reader doesn't necessarily need to dig into every single statement in the code, but rather just the high level function names if they don't need all of the details.
// assignment : variable_list '=' expr_list
// | variable_list type
// | variable_list type '=' expr_list
func (p *Parser) assignment() ast.Noder {
assignment := &ast.AssignmentNode{}
assignment.Left = p.variable_list()
// This if-statement deals with rule 2 or 3
if p.currentToken.Type != token.ASSIGN {
// Static variable declaration
// Could be a declaration or an assignment
// Only static variables can be declared without providing a value
p.parseStaticStatement(assignment)
} else {
p.parseVariableAssignment(assignment)
}
if assignment.Right == nil {
assignment.appendDefaultValues()
}
p.checkForUnbalancedAssignment(assignment)
return assignment
}
func (p *Parser) parseStaticStatement(assignment *ast.AssignmentNode) {
assignment.IsStatic = true
assignment.Type = p.var_type().Value
assignment.Right = nil
p.nextToken()
// Rule 2 is finished at this point in the code
// This if-statement is for rule 3
if p.currentToken.Type == token.ASSIGN {
p.parseStaticAssignment(assignment)
}
}
func (p *Parser) parseStaticAssignment(assignment *ast.AssignmentNode) {
assignment.Operator = p.currentToken
p.nextToken()
assignment.Right = p.expr_list()
}
func (p *Parser) parseVariableAssignment(assignment *ast.AssignmentNode) {
// This deals with rule 1
assignment.Operator = p.currentToken
p.nextToken()
assignment.Right = p.expr_list()
}
// Note: to compile, a method on AssignmentNode must be declared inside
// the ast package itself; it is shown here alongside the parser for readability.
func (a *ast.AssignmentNode) appendDefaultValues() {
for i := 0; i < len(a.Left); i++ {
a.Right = append(a.Right, nil)
}
}
func (p *Parser) checkForUnbalancedAssignment(assignment *ast.AssignmentNode) {
if len(assignment.Left) != len(assignment.Right) {
p.FoundError(p.syntaxError("variable mismatch, " + strconv.Itoa(len(assignment.Left)) + " on left but " + strconv.Itoa(len(assignment.Right)) + " on right,"))
}
}
I hope that you find this helpful. I am more than willing to answer any further questions that you may have if you leave a comment on my response.
I'm trying to write a parser for a simple DSL which has a good dozen statements of the form <statementName> <param1> <param2> ... ;, where the number of parameters varies. As the structure of the statements is very similar (all match the statement name string followed by a series of tokens given by name) and the structure of the made results is very similar (all store the statement name and a hash of the parameters), I'd like to know how I could specify the desired result structure without having to repeat myself for each statement action.
Pseudo-code of an action class that would help me to specify such a result structure:
class FooActions {
method *_stmt ($/) {
#result[0] = make string of statement name $/[0];
#result[1] = make hash of $/[1..] with the keys being the name of the rule
at index (i.e. '"var"' for `<var=identifier>` and `"type"` for `<type>`, etc.) and
values being the `.made` results for the rules at index (see below);
return #result;
}
method identifier ($/) { return ~$/ }
method number ($/) { return +$/ }
method type ($/) { return ~$/ }
}
Test file:
use v6;
use Test;
use Foo;
my $s;
$s = 'GoTo 2 ;';
is-deeply Foo::FooGrammar.parse($s).made, ('GoTo', {pos => 2});
$s = 'Set foo 3 ;';
is-deeply Foo::FooGrammar.parse($s).made, ('Set', {var => 'foo', target => 3});
$s = 'Get bar Long ;';
is-deeply Foo::FooGrammar.parse($s).made, ('Get', {var => 'bar', type => 'Long'});
$s = 'Set foo bar ;';
is-deeply Foo::FooGrammar.parse($s).made, ('Set', {var => 'foo', target => 'bar'});
Grammar:
use v6;
unit package Foo;
grammar FooGrammar is export {
rule TOP { <stmt> ';' }
rule type { 'Long' | 'Int' }
rule number { \d+ }
rule identifier { <alpha> \w* }
rule numberOrIdentifier { <number> || <identifier> }
rule goto_stmt { 'GoTo' <pos=number> }
rule set_stmt { 'Set' <var=identifier> <target=numberOrIdentifier> }
rule get_stmt { 'Get' <var=identifier> <type> }
rule stmt { <goto_stmt> || <set_stmt> || <get_stmt> }
}
This approach represents each statement type as a proto-regex and uses <sym> to avoid repeating the statement keywords (GoTo etc.).
Individual statements don't have action methods. These are handled at the next level (TOP), which uses the caps method on the match to convert it to a hash.
The <sym> capture is used to extract the keyword; the remainder of the line is converted to a hash. Solution follows:
Grammar and Actions:
use v6;
unit package Foo;
grammar Grammar is export {
rule TOP { <stmt> ';' }
token type { 'Long' | 'Int' }
token number { \d+ }
token identifier { <alpha>\w* }
rule numberOrIdentifier { <number> || <identifier> }
proto rule stmt {*}
rule stmt:sym<GoTo> { <sym> <pos=.number> }
rule stmt:sym<Set> { <sym> <var=.identifier> <target=.numberOrIdentifier> }
rule stmt:sym<Get> { <sym> <var=.identifier> <type> }
}
class Actions {
method number($/) { make +$/ }
method identifier($/) { make ~$/ }
method type($/) { make ~$/ }
method numberOrIdentifier($/) { make ($<number> // $<identifier>).made }
method TOP($/) {
my %caps = $<stmt>.caps;
my $keyw = .Str given %caps<sym>:delete;
my %args = %caps.pairs.map: {.key => .value.made};
make ($keyw,%args, );
}
}
Tests:
use v6;
use Test;
use Foo;
my $actions = Foo::Actions.new;
my $s;
$s = 'GoTo 2 ;';
is-deeply Foo::Grammar.parse($s, :$actions).made, ('GoTo', {pos => 2});
$s = 'Set foo 3;';
is-deeply Foo::Grammar.parse($s, :$actions).made, ('Set', {var => 'foo', target => 3});
$s = 'Get bar Long ;';
is-deeply Foo::Grammar.parse($s, :$actions).made, ('Get', {var => 'bar', type => 'Long'});
$s = 'Set foo bar ;';
is-deeply Foo::Grammar.parse($s, :$actions).made, ('Set', {var => 'foo', target => 'bar'});
This is the code that I have:
%lex
%options flex
%{
// Used to store the parsed data
if (!('regions' in yy)) {
yy.regions = {
settings: {},
tables: [],
relationships: []
};
}
%}
text [a-zA-Z][a-zA-Z0-9]*
%%
\n\s* return 'NEWLINE';
[^\S\n]+ ; // ignore whitespace other than newlines
"." return '.';
"," return ',';
"-" return '-';
"=" return '=';
"=>" return '=>';
"<=" return '<=';
"[" return '[';
"settings]" return 'SETTINGS';
"tables]" return 'TABLES';
"relationships]" return 'RELATIONSHIPS';
"]" return ']';
{text} return 'TEXT';
<<EOF>> return 'EOF';
/lex
%left ','
%start source
%%
source
: content EOF
{
console.log(yy.regions);
console.log("\n" + JSON.stringify(yy.regions));
return yy.regions;
}
| NEWLINE content EOF
{
console.log(yy.regions);
console.log("\n" + JSON.stringify(yy.regions));
return yy.regions;
}
| NEWLINE EOF
| EOF
;
content
: '[' section content
| '[' section
;
section
: SETTINGS NEWLINE settings_content
| TABLES NEWLINE tables_content
| RELATIONSHIPS NEWLINE relationships_content
;
settings_content
: settings_line NEWLINE settings_content
| settings_line NEWLINE
| settings_line
;
settings_line
: text '=' text
{ yy.regions.settings[$1] = $3; }
;
tables_content
: tables_line NEWLINE tables_content
| tables_line NEWLINE
| tables_line
;
tables_line
: table_name
{ yy.regions.tables.push({ name: $table_name, fields: [] }); }
| field_list
{
var tableCount = yy.regions.tables.length;
var tableIndex = tableCount - 1;
yy.regions.tables[tableIndex].fields.push($field_list);
}
;
table_name
: '-' text
{ $$ = $text; }
;
field_list
: text
{ $$=[]; $$.push($text); }
| field_list ',' text
{ $field_list.push($text); $$ = $field_list; }
;
relationships_content
: relationships_line NEWLINE relationships_content
| relationships_line NEWLINE
| relationships_line
;
relationships_line
: relationship_key '=>' relationship_key
{
yy.regions.relationships.push({
pkTable: $1,
fkTable: $3
});
}
| relationship_key '<=' relationship_key
{
yy.regions.relationships.push({
pkTable: $3,
fkTable: $1
});
}
;
relationship_key
: text '.' text
{ $$ = { name: $1, field: $3 }; }
| text
{ $$ = { name: $1 }; }
;
text
: TEXT
{ $$ = $TEXT; }
;
It's used to parse this kind of code:
[settings]
DefaultFieldType = string
[tables]
-table1
id, int, PK
username, string, NULL
password, string
-table2
id, int, PK
itemName, string
itemCount, int
[relationships]
table1 => table2
foo.test => bar.test2
Into this kind of JSON:
{ settings: { DefaultFieldType: 'string' },
tables:
[ { name: 'table1', fields: [Object] },
{ name: 'table2', fields: [Object] } ],
relationships:
[ { pkTable: [Object], fkTable: [Object] },
{ pkTable: [Object], fkTable: [Object] } ] }
However, I don't get a proper syntax error. When I go to the Jison demo and try to parse 5*PI 3^2, I get the following error:
Parse error on line 1:
5*PI 3^2
-----^
Expecting 'EOF', '+', '-', '*', '/', '^', ')', got 'NUMBER'
which is expected. But when I change the last line of the code which I wish to parse from:
foo.test => bar.test2
to something like
foo.test => a bar.test2
I get the following error:
throw new _parseError(str, hash);
^
TypeError: Function.prototype.toString is not generic
I traced this to the generated parser code which looks like this:
if (hash.recoverable) {
this.trace(str);
} else {
function _parseError (msg, hash) {
this.message = msg;
this.hash = hash;
}
_parseError.prototype = Error;
throw new _parseError(str, hash);
}
So this leads me to believe that there is something wrong in how I structured my code and how I handled parsing but I have no idea what that might be.
It seems like it might have something to do with error recovery. If that is correct, how is that supposed to be used? Am I supposed to add the 'error' rule upwards to every element all the way to the source root?
Your grammar seems to work as expected on the Jison demo page, at least with the browser I'm using (Firefox 46.0.1). From the amount of activity in the git repository around the code you cite, I suspect that the version of Jison you are using is affected by one of these bugs:
https://github.com/zaach/jison/issues/328
https://github.com/zaach/jison/issues/318
I think the jison version on the demo page is older, not newer, so if grabbing the current code from github doesn't work, you could try using an older version.
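In the meantime, a possible workaround is to supply your own parseError, since the generated parser checks yy.parseError before falling back to its built-in one, so the malformed _parseError class is never constructed. A sketch (the module path is illustrative):
// Override the error hook on the generated parser.
var parser = require('./grammar').parser;
parser.yy.parseError = function (str, hash) {
    throw new Error(str); // surface a plain syntax error instead
};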
I want to change every entry in a CSV file to 'BlahBlah'.
For that I have the following ANTLR grammar:
grammar CSV;
file : hdr row* row1;
hdr : row;
row : field (',' value1=field)* '\r'? '\n'; // '\r' is optional at the end of a row of CSV file ..
row1 : field (',' field)* '\r'? '\n'?;
field
: TEXT
{
$setText("BlahBlah");
}
| STRING
|
;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""' | ~'"')* '"' ;
But when I run this through ANTLR4, I get:
error(63): CSV.g4:13:3: unknown attribute reference setText in $setText
make: *** [run] Error 1
Why is setText not supported in ANTLR4, and is there an alternative way to replace the text?
A couple of problems here:
First, you have to identify the receiver of the setText method. You probably want
field : TEXT { $TEXT.setText("BlahBlah"); }
| STRING
;
Second, setText is not defined in the Token interface.
Typically, you create your own token class extending CommonToken and a corresponding token factory class, then set the TokenLabelType (in the options block) to your token class name. The setText method inherited from CommonToken will then be visible.
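A minimal sketch of the idea, using CommonToken itself as the label type since it already defines setText (a custom subclass plus token factory would be wired in exactly the same way); the explicit label t keeps the typed token visible in the action:
grammar CSV;
options { TokenLabelType = CommonToken; } // token labels are now typed as CommonToken
field
    : t=TEXT { $t.setText("BlahBlah"); } // setText inherited via CommonToken
    | STRING
    |
    ;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""' | ~'"')* '"' ;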
tl;dr:
Given the following grammar (derived from the original CSV.g4 sample and the OP's grammar attempt (cf. question)):
grammar CSVBlindText;
@header {
import java.util.*;
}
/** Derived from rule "file : hdr row+ ;" */
file
locals [int i=0]
: hdr ( rows+=row[$hdr.text.split(",")] {$i++;} )+
{
System.out.println($i+" rows");
for (RowContext r : $rows) {
System.out.println("row token interval: "+r.getSourceInterval());
}
}
;
hdr : row[null] {System.out.println("header: '"+$text.trim()+"'");} ;
/** Derived from rule "row : field (',' field)* '\r'? '\n' ;" */
row[String[] columns] returns [Map<String,String> values]
locals [int col=0]
@init {
$values = new HashMap<String,String>();
}
@after {
if ($values!=null && $values.size()>0) {
System.out.println("values = "+$values);
}
}
// rule row cont'd...
: field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
( ',' field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
)* '\r'? '\n'
;
field
: TEXT
| STRING
|
;
TEXT : ~[',\n\r"]+ {setText( "BlahBlah" );} ;
STRING : '"' ('""'|~'"')* '"' ; // quote-quote is an escaped quote
One has:
$> antlr4 -no-listener CSVBlindText.g4
$> grep setText CSVBlindText*java
CSVBlindTextLexer.java: setText( "BlahBlah" );
Compiling it works flawlessly:
$> javac CSVBlindText*.java
Test data (the users.csv file, just renamed):
$> cat blinded_by_grammar.csv
User, Name, Dept
parrt, Terence, 101
tombu, Tom, 020
bke, Kevin, 008
Running the test yields:
$> grun CSVBlindText file blinded_by_grammar.csv
header: 'BlahBlah,BlahBlah,BlahBlah'
values = {BlahBlah=BlahBlah}
values = {BlahBlah=BlahBlah}
values = {BlahBlah=BlahBlah}
3 rows
row token interval: 6..11
row token interval: 12..17
row token interval: 18..23
So it looks as if the setText() should be injected before the semicolon of a production and not between alternatives (wild guessing here ;-)
Previous iterations below:
Just guessing, as I 1) have no working antlr4 available currently and 2) have not written ANTLR4 grammars for quite some time now: maybe without the dollar sign ($)?
grammar CSV;
file : hdr row* row1;
hdr : row;
row : field (',' value1=field)* '\r'? '\n'; // '\r' is optional at the end of a row of CSV file ..
row1 : field (',' field)* '\r'? '\n'?;
field
: TEXT
{
setText("BlahBlah");
}
| STRING
|
;
TEXT : ~[,\n\r"]+ ;
STRING : '"' ('""' | ~'"')* '"' ;
Update: now that antlr 4.5.2 (at least via brew) instead of 4.5.3 is available, I dug into this, answering some comment below from the OP: the setText() will be generated in the lexer Java module if the grammar is well defined. Unfortunately, debugging antlr4 grammars for a dilettante like me is ... but it is nevertheless a very nice language construction kit IMO.
Sample session:
$> antlr4 -no-listener CSV.g4
$> grep setText CSVLexer.java
setText( String.valueOf(getText().charAt(1)) );
The grammar used:
(hacked up from example code retrieved via:
curl -O http://media.pragprog.com/titles/tpantlr2/code/tpantlr2-code.tgz )
grammar CSV;
@header {
import java.util.*;
}
/** Derived from rule "file : hdr row+ ;" */
file
locals [int i=0]
: hdr ( rows+=row[$hdr.text.split(",")] {$i++;} )+
{
System.out.println($i+" rows");
for (RowContext r : $rows) {
System.out.println("row token interval: "+r.getSourceInterval());
}
}
;
hdr : row[null] {System.out.println("header: '"+$text.trim()+"'");} ;
/** Derived from rule "row : field (',' field)* '\r'? '\n' ;" */
row[String[] columns] returns [Map<String,String> values]
locals [int col=0]
@init {
$values = new HashMap<String,String>();
}
@after {
if ($values!=null && $values.size()>0) {
System.out.println("values = "+$values);
}
}
// rule row cont'd...
: field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
( ',' field
{
if ($columns!=null) {
$values.put($columns[$col++].trim(), $field.text.trim());
}
}
)* '\r'? '\n'
;
field
: TEXT
| STRING
| CHAR
|
;
TEXT : ~[',\n\r"]+ ;
STRING : '"' ('""'|~'"')* '"' ; // quote-quote is an escaped quote
/** Convert 3-char 'x' input sequence to string x */
CHAR: '\'' . '\'' {setText( String.valueOf(getText().charAt(1)) );} ;
Compiling works:
$> javac CSV*.java
Now test with a matching weird csv file:
a,b
"y",'4'
As:
$> grun CSV file foo.csv
line 1:0 no viable alternative at input 'a'
line 1:2 no viable alternative at input 'b'
header: 'a,b'
values = {a="y", b=4}
1 rows
row token interval: 4..7
So in conclusion, I suggest reworking the logic of the grammar (I presume inserting "BlahBlah" was not essential but a mere debugging hack).
And citing http://www.antlr.org/support.html :
ANTLR Discussions
Please do not start discussions at stackoverflow. They have asked us to
steer discussions (i.e., non-questions/answers) away from Stackoverflow; we
have a discussion forum at Google specifically for that:
https://groups.google.com/forum/#!forum/antlr-discussion
We can discuss ANTLR project features, direction, and generally argue about
whatever we want at the google discussion forum.
I hope this helps.
I have a Groovy file containing a bunch of simple functions like so:
// useful functions
def myFunc1(String arg) {
println("Hello " + arg)
}
def myFunc2(String arg) {
println("Goodbye " + arg)
}
I'd like to obtain from this:
the method name
the arguments
the body code of the function
(All as simple strings, I don't need to run anything yet.)
I was about to resort to some Regexing, but since I'm using a JVM language (Scala) I figured I might be able to use some of the Groovy compiler's stuff to do this a "nicer" way.
There seems to be a fair bit of information on loading Groovy code dynamically and running it, but not so much on introspecting the source. Any ideas?
(Failing a "nice" way, I'll also accept some Scala-foo to parse the information in a succinct fashion.)
This works, and demonstrates the token types required to find each node of importance in the AST. Hope it makes sense... By using lots of Groovy dynamism, I hope I haven't made it too hard for a port to Scala :-(
import org.codehaus.groovy.antlr.*
import org.codehaus.groovy.antlr.parser.*
import static org.codehaus.groovy.antlr.parser.GroovyTokenTypes.*
def code = '''
// useful functions
def myFunc1(String arg) {
println("Hello " + arg)
}
def myFunc2(arg, int arg2) {
println("Goodbye " + arg)
}
public String stringify( int a ) {
"$a"
}
'''
def lines = code.split( '\n' )
// Generate a GroovyRecognizer, compile an AST and assign it to 'ast'
def ast = new SourceBuffer().with { buff ->
new UnicodeEscapingReader( new StringReader( code ), buff ).with { read ->
read.lexer = new GroovyLexer( read )
GroovyRecognizer.make( read.lexer ).with { parser ->
parser.sourceBuffer = buff
parser.compilationUnit()
parser.AST
}
}
}
// Walks the ast looking for types
def findByPath( ast, types, multiple=false ) {
[types.take( 1 )[ 0 ],types.drop(1)].with { head, tail ->
if( tail ) {
findByPath( ast*.childrenOfType( head ).flatten(), tail, multiple )
}
else {
ast*.childrenOfType( head ).with { ret ->
multiple ? ret[ 0 ] : ret.head()[0]
}
}
}
}
// Walk through the returned ast
while( ast ) {
def methodModifier = findByPath( ast, [ MODIFIERS ] ).firstChild?.toStringTree() ?: 'public'
def returnType = findByPath( ast, [ TYPE, IDENT ] ) ?: 'Object'
def methodName = findByPath( ast, [ IDENT ] )
def body = findByPath( ast, [ SLIST ] )
def parameters = findByPath( ast, [ PARAMETERS, PARAMETER_DEF ], true ).collect { param ->
[ type: findByPath( param, [ TYPE ] ).firstChild?.toStringTree() ?: 'Object',
name: findByPath( param, [ IDENT ] ) ]
}
def (y1,y2,x1,x2) = [ body.line - 1, body.lineLast - 1, body.column - 1, body.columnLast ]
// Grab the text from the original string
def snip = [ lines[ y1 ].drop( x1 ), // First line prefix stripped
*lines[ (y1+1)..<y2 ], // Mid lines
lines[ y2 ].take( x2 ) ].join( '\n' ) // End line suffix stripped
println '------------------------------'
println "modifier: $methodModifier"
println "returns: $returnType"
println "name: $methodName"
println "params: $parameters"
println "$snip\n"
// Step to next branch and repeat
ast = ast.nextSibling
}
It prints out:
------------------------------
modifier: public
returns: Object
name: myFunc1
params: [[type:String, name:arg]]
{
println("Hello " + arg)
}
------------------------------
modifier: public
returns: Object
name: myFunc2
params: [[type:Object, name:arg], [type:int, name:arg2]]
{
println("Goodbye " + arg)
}
------------------------------
modifier: public
returns: String
name: stringify
params: [[type:int, name:a]]
{
"$a"
}
Hope it helps, or points you in the right direction :-)