Getting info on Groovy functions (name, signature, body code) - parsing

I have a Groovy file containing a bunch of simple functions like so:
// useful functions
def myFunc1(String arg) {
    println("Hello " + arg)
}

def myFunc2(String arg) {
    println("Goodbye " + arg)
}
I'd like to obtain from this:
the method name
the arguments
the body code of the function
(All as simple strings, I don't need to run anything yet.)
I was about to resort to some Regexing, but since I'm using a JVM language (Scala) I figured I might be able to use some of the Groovy compiler's stuff to do this a "nicer" way.
There seems to be a fair bit of information on loading Groovy code dynamically and running it, but not so much on introspecting the source. Any ideas?
(Failing a "nice" way, I'll also accept some Scala-foo to parse the information in a succinct fashion.)

This works, and demonstrates the token types required to find each node of importance in the AST. Hope it makes sense... By using lots of Groovy dynamism, I hope I haven't made it too hard for a port to Scala :-(
import org.codehaus.groovy.antlr.*
import org.codehaus.groovy.antlr.parser.*
import static org.codehaus.groovy.antlr.parser.GroovyTokenTypes.*
def code = '''
// useful functions
def myFunc1(String arg) {
    println("Hello " + arg)
}
def myFunc2(arg, int arg2) {
    println("Goodbye " + arg)
}
public String stringify( int a ) {
    "$a"
}
'''
def lines = code.split( '\n' )
// Generate a GroovyRecognizer, compile an AST and assign it to 'ast'
def ast = new SourceBuffer().with { buff ->
    new UnicodeEscapingReader( new StringReader( code ), buff ).with { read ->
        read.lexer = new GroovyLexer( read )
        GroovyRecognizer.make( read.lexer ).with { parser ->
            parser.sourceBuffer = buff
            parser.compilationUnit()
            parser.AST
        }
    }
}
// Walks the ast looking for types
def findByPath( ast, types, multiple=false ) {
    [types.take( 1 )[ 0 ],types.drop(1)].with { head, tail ->
        if( tail ) {
            findByPath( ast*.childrenOfType( head ).flatten(), tail, multiple )
        }
        else {
            ast*.childrenOfType( head ).with { ret ->
                multiple ? ret[ 0 ] : ret.head()[0]
            }
        }
    }
}
// Walk through the returned ast
while( ast ) {
    def methodModifier = findByPath( ast, [ MODIFIERS ] ).firstChild?.toStringTree() ?: 'public'
    def returnType = findByPath( ast, [ TYPE, IDENT ] ) ?: 'Object'
    def methodName = findByPath( ast, [ IDENT ] )
    def body = findByPath( ast, [ SLIST ] )
    def parameters = findByPath( ast, [ PARAMETERS, PARAMETER_DEF ], true ).collect { param ->
        [ type: findByPath( param, [ TYPE ] ).firstChild?.toStringTree() ?: 'Object',
          name: findByPath( param, [ IDENT ] ) ]
    }
    def (y1,y2,x1,x2) = [ body.line - 1, body.lineLast - 1, body.column - 1, body.columnLast ]
    // Grab the text from the original string
    def snip = [ lines[ y1 ].drop( x1 ),          // First line prefix stripped
                 *lines[ (y1+1)..<y2 ],           // Mid lines
                 lines[ y2 ].take( x2 ) ].join( '\n' ) // End line suffix stripped
    println '------------------------------'
    println "modifier: $methodModifier"
    println "returns: $returnType"
    println "name: $methodName"
    println "params: $parameters"
    println "$snip\n"
    // Step to next branch and repeat
    ast = ast.nextSibling
}
It prints out:
------------------------------
modifier: public
returns: Object
name: myFunc1
params: [[type:String, name:arg]]
{
    println("Hello " + arg)
}
------------------------------
modifier: public
returns: Object
name: myFunc2
params: [[type:Object, name:arg], [type:int, name:arg2]]
{
    println("Goodbye " + arg)
}
------------------------------
modifier: public
returns: String
name: stringify
params: [[type:int, name:a]]
{
    "$a"
}
Hope it helps, or points you in the right direction :-)

Related

Groovy multiline string interpolation whitespace

I am trying to generate some generic Groovy code for Jenkins, but I seem to have trouble with multi-line strings and extra whitespace. I've tried everything I could find by Googling, but I can't seem to get it working.
My issue isn't related to simple multi-line strings. I managed to trim whitespace by using the stripIndent() and stripMargin() methods for simple cases. My issue is caused by having interpolated methods inside my strings.
Groovy info: Groovy Version: 3.0.2 JVM: 13.0.2 Vendor: Oracle Corporation OS: Mac OS X
String method2(String tier, String jobName) {
return """
Map downstreamJobs = [:]
stage ("${jobName}-${tier}-\${region}_${jobName}") {
test
}
""".stripIndent().stripMargin()
}
static String simpleLog() {
return """
script {
def user = env.BUILD_USER_ID
}
""".stripIndent().stripMargin()
}
static String method1() {
return """\
import jenkins.model.Jenkins
currentBuild.displayName = "name"
${simpleLog()}
""".stripIndent().stripMargin()
}
String generateFullDeploymentPipelineCode() {
return """Text here
${method1()}
${method2("test1", "test2")}
""".stripIndent().stripMargin()
}
println(generateFullDeploymentPipelineCode())
This is what it prints (or writes to disk):
Text here
import jenkins.model.Jenkins
currentBuild.displayName = "name"
script {
def user = env.BUILD_USER_ID
}
Map downstreamJobs = [:]
stage ("test2-test1-${region}_test2") {
test
}
Why the extra space around the import lines? I know stripIndent() is supposed to trim whitespace based on the line with the fewest leading spaces, which is why we use the trailing backslash (example here: https://stackoverflow.com/a/19882917/7569335).
That works for simple strings, but it breaks down once you start using interpolation; not with regular variables, just when you interpolate an entire method.
As a variant, use just stripMargin(), and only once, on the final string:
String method2(String tier, String jobName) {
return """\
|Map downstreamJobs = [:]
|stage ("${jobName}-${tier}-\${region}_${jobName}") {
| test
|}
"""
}
static String simpleLog() {
return """\
|script {
| def user = env.BUILD_USER_ID
|}
"""
}
static String method1() {
return """\
|import jenkins.model.Jenkins
|currentBuild.displayName = "name"
${simpleLog()}
"""
}
String generateFullDeploymentPipelineCode() {
return """\
|Text here
${method1()}
${method2("test1", "test2")}
""".stripIndent().stripMargin()
}
println(generateFullDeploymentPipelineCode())
result:
Text here
import jenkins.model.Jenkins
currentBuild.displayName = "name"
script {
def user = env.BUILD_USER_ID
}
Map downstreamJobs = [:]
stage ("test2-test1-${region}_test2") {
test
}
Another variant, with trim() and stripIndent():
def method2(String tier, String jobName) {
return """
Map downstreamJobs = [:]
stage ("${jobName}-${tier}-\${region}_${jobName}") {
test
}
""".trim()
}
def simpleLog() {
return """
script {
def user = env.BUILD_USER_ID
}
""".trim()
}
def method1() {
return """
import jenkins.model.Jenkins
currentBuild.displayName = "name"
${simpleLog()}
""".trim()
}
def generateFullDeploymentPipelineCode() {
return """\
Text here
${method1()}
${method2("test1", "test2")}
""".stripIndent()
}
println(generateFullDeploymentPipelineCode())
When you insert a string through interpolation you only indent the first line of it. The following lines of the inserted string will be indented differently, which messes everything up.
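A minimal way to see this mechanism (hypothetical inner/outer values):
def inner = "line1\nline2"
def outer = """\
    before
    ${inner}
    after
""".stripIndent()
print outer
// Prints with the indentation kept: "line2" carries no leading spaces, so
// stripIndent() finds a zero-width minimum indent and strips nothing:
//     before
//     line1
// line2
//     after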
Using some lesser-known members of GString (namely .strings[] and .values[]), we can align the indentation of all lines of each interpolated value.
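As a quick, hypothetical illustration of what those two properties expose:
def name = 'World'
def g = "Hello ${name}!"                        // a GString, not a plain String
assert g.strings.toList() == ['Hello ', '!']    // the literal fragments
assert g.values.toList()  == ['World']          // the interpolated values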
String method2(String tier, String jobName) {
indented """
Map downstreamJobs = [:]
stage ("${jobName}-${tier}-\${region}_${jobName}") {
test
}
"""
}
String simpleLog() {
indented """\
script {
def user = env.BUILD_USER_ID
}
"""
}
String method1() {
indented """\
import jenkins.model.Jenkins
currentBuild.displayName = "name"
${simpleLog()}
"""
}
String generateFullDeploymentPipelineCode() {
indented """\
Text here
${method1()}
${method2("test1", "test2")}
"""
}
println generateFullDeploymentPipelineCode()
//---------- Move the following code into its own script ----------
// Function to adjust the indentation of interpolated values so that all lines
// of a value match the indentation of the first line.
// Finally stripIndent() will be called before returning the string.
String indented( GString templ ) {
    // Iterate over the interpolated values of the GString template.
    templ.values.eachWithIndex{ value, i ->
        // Get the string preceding the current value. Always defined, even
        // when the value is at the beginning of the template.
        def beforeValue = templ.strings[ i ]
        // RegEx to match any indent substring before the value.
        // Special case for the first string, which doesn't necessarily contain '\n'.
        def regexIndent = i == 0
            ? /(?:^|\n)([ \t]+)$/
            : /\n([ \t]+)$/
        def matchIndent = ( beforeValue =~ regexIndent )
        if( matchIndent ) {
            def indent = matchIndent[ 0 ][ 1 ]
            def lines = value.readLines()
            def linesNew = [ lines.head() ]  // The 1st line is already indented.
            // Insert the indentation from the 1st line into all subsequent lines.
            linesNew += lines.tail().collect{ indent + it }
            // Finally replace the value with the reformatted lines.
            templ.values[ i ] = linesNew.join('\n')
        }
    }
    return templ.stripIndent()
}

// Fallback in case the input string is not a GString (when it doesn't contain expressions)
String indented( String templ ) {
    return templ.stripIndent()
}
Live Demo at codingground
Output:
Text here
import jenkins.model.Jenkins
currentBuild.displayName = "name"
script {
def user = env.BUILD_USER_ID
}
Map downstreamJobs = [:]
stage ("test2-test1-${region}_test2") {
test
}
Conclusion:
The indented function gives a clean Groovy syntax for generating code from GString templates.
This was quite a learning experience. I first tried to do it completely differently, using the evaluate function, which turned out to be too complicated and not flexible enough. Then I randomly browsed through some posts from the mrhaki blog (always a good read!) until I discovered "Groovy Goodness: Get to Know More About a GString", which was the key to implementing this solution.

How to write a map to a YAML file in Dart

I have a map of key value pairs in Dart. I want to convert it to YAML and write into a file.
I tried using the yaml package from the Dart libraries, but it only provides methods to load YAML data from a file; nothing is mentioned about how to write it back to a YAML file.
Here is an example:
void main() {
  var map = {
    "name": "abc",
    "type": "unknown",
    "internal": {
      "name": "xyz"
    }
  };
  print(map);
}
Expected output:
example.yaml
name: abc
type: unknown
internal:
  name: xyz
How to convert the dart map to YAML and write it to a file?
It's a bit of a late response, but for anyone else looking at this question, I have written this class. It may not be perfect, but it works for what I'm doing and I haven't found anything wrong with it yet. I might make it a package eventually, after writing tests.
class YamlWriter {
  /// The amount of spaces for each level.
  final int spaces;

  /// Initialize the writer with the amount of [spaces] per level.
  YamlWriter({
    this.spaces = 2,
  });

  /// Write a dart structure to a YAML string. [yaml] should be a [Map] or [List].
  String write(dynamic yaml) {
    return _writeInternal(yaml).trim();
  }

  /// Write a dart structure to a YAML string. [yaml] should be a [Map] or [List].
  String _writeInternal(dynamic yaml, { int indent = 0 }) {
    String str = '';

    if (yaml is List) {
      str += _writeList(yaml, indent: indent);
    } else if (yaml is Map) {
      str += _writeMap(yaml, indent: indent);
    } else if (yaml is String) {
      str += "\"${yaml.replaceAll("\"", "\\\"")}\"";
    } else {
      str += yaml.toString();
    }

    return str;
  }

  /// Write a list to a YAML string.
  /// Pass the list in as [yaml] and indent it to the [indent] level.
  String _writeList(List yaml, { int indent = 0 }) {
    String str = '\n';

    for (var item in yaml) {
      str += "${_indent(indent)}- ${_writeInternal(item, indent: indent + 1)}\n";
    }

    return str;
  }

  /// Write a map to a YAML string.
  /// Pass the map in as [yaml] and indent it to the [indent] level.
  String _writeMap(Map yaml, { int indent = 0 }) {
    String str = '\n';

    for (var key in yaml.keys) {
      var value = yaml[key];
      str += "${_indent(indent)}${key.toString()}: ${_writeInternal(value, indent: indent + 1)}\n";
    }

    return str;
  }

  /// Create an indented string for the level with the spaces config.
  /// [indent] is the level of indent whereas [spaces] is the
  /// amount of spaces that the string should be indented by.
  String _indent(int indent) {
    return ''.padLeft(indent * spaces, ' ');
  }
}
Usage:
final writer = YamlWriter();

String yaml = writer.write({
  'string': 'Foo',
  'int': 1,
  'double': 3.14,
  'boolean': true,
  'list': [
    'Item One',
    'Item Two',
    true,
    'Item Four',
  ],
  'map': {
    'foo': 'bar',
    'list': ['Foo', 'Bar'],
  },
});

File file = File('/path/to/file.yaml');
file.createSync();
file.writeAsStringSync(yaml);
Output:
string: "Foo"
int: 1
double: 3.14
boolean: true
list:
- "Item One"
- "Item Two"
- true
- "Item Four"
map:
foo: "bar"
list:
- "Foo"
- "Bar"
package:yaml does not have YAML writing features. You may have to look for another package that does that, or write your own.
As a stopgap, remember that JSON is valid YAML, so you can always write out JSON to a .yaml file and it should work with any YAML parser.
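For illustration, a rough sketch of that stopgap in Groovy, the language used elsewhere on this page (in Dart, jsonEncode from dart:convert plays the same role):
import groovy.json.JsonOutput

def map = [name: 'abc', type: 'unknown', internal: [name: 'xyz']]
new File('example.yaml').text = JsonOutput.prettyPrint(JsonOutput.toJson(map))
// example.yaml now contains pretty-printed JSON, which any YAML 1.2 parser accepts.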
I ran into the same issue and ended up hacking together a simple writer:
// Save the updated configuration settings to the config file
void saveConfig() {
  var file = _configFile;

  // truncate existing configuration
  file.writeAsStringSync('');

  // Write out new YAML document from JSON map
  final config = configToJson();
  config.forEach((key, value) {
    if (value is Map) {
      file.writeAsStringSync('\n$key:\n', mode: FileMode.writeOnlyAppend);
      value.forEach((subkey, subvalue) {
        file.writeAsStringSync(' $subkey: $subvalue\n',
            mode: FileMode.writeOnlyAppend);
      });
    } else {
      file.writeAsStringSync('$key: $value\n',
          mode: FileMode.writeOnlyAppend);
    }
  });
}

How to build a JvmModelInferrer method body from an XExpression and append boilerplate code

In a JvmModelInferrer, when constructing the body of a method or constructor, how do you insert both an XExpression from the grammar
body = op.body
and additional "boilerplate" code, for example
body = [
    append(
        '''
        System.out.println("BOILERPLATE");
        '''
    )
]
I can achieve either but not both.
For a minimal working example, consider the following canonical Xbase grammar,
grammar org.example.xbase.entities.Entities with org.eclipse.xtext.xbase.Xbase

generate entities "http://www.example.org/xbase/entities/Entities"

Model:
    importSection=XImportSection?
    entities+=Entity*;

Entity:
    'entity' name=ID ('extends' superType=JvmParameterizedTypeReference)? '{'
        attributes += Attribute*
        operations += Operation*
    '}';

Attribute:
    'attr' (type=JvmTypeReference)? name=ID ('=' initexpression=XExpression)? ';';

Operation:
    'op' (type=JvmTypeReference)? name=ID
    '(' (params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
    body=XBlockExpression;
and JvmModelInferrer,
package org.example.xbase.entities.jvmmodel
import com.google.inject.Inject
import org.eclipse.xtext.xbase.jvmmodel.AbstractModelInferrer
import org.eclipse.xtext.xbase.jvmmodel.IJvmDeclaredTypeAcceptor
import org.eclipse.xtext.xbase.jvmmodel.JvmTypesBuilder
import org.example.xbase.entities.entities.Entity
class EntitiesJvmModelInferrer extends AbstractModelInferrer {

    @Inject extension JvmTypesBuilder

    def dispatch void infer(Entity entity, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
        acceptor.accept(entity.toClass("entities."+entity.name)) [
            documentation = entity.documentation
            if (entity.superType != null)
                superTypes += entity.superType.cloneWithProxies
            entity.attributes.forEach [ a |
                val type = a.type ?: a.initexpression?.inferredType
                members += a.toField(a.name, type) [
                    documentation = a.documentation
                    if (a.initexpression != null)
                        initializer = a.initexpression
                ]
                members += a.toGetter(a.name, type)
                members += a.toSetter(a.name, type)
            ]
            entity.operations.forEach [ op |
                members += op.toMethod(op.name, op.type ?: inferredType) [
                    documentation = op.documentation
                    for (p : op.params) {
                        parameters += p.toParameter(p.name, p.parameterType)
                    }
                    // body = [
                    //     append(
                    //         '''
                    //         System.out.println("BOILERPLATE");
                    //         '''
                    //     )
                    // ]
                    body = op.body
                ]
            ]
        ]
    }
}
As the comments suggest, I would like to insert "boilerplate" code into the body of the method, before the XExpression itself. While I can insert the boilerplate, or the expression, I cannot work out how to do both.
This does not work. The only thing you can do is to infer two methods:
methodWithBoilerplate() {
    // pre
    methodWithoutBoilerplate()
    // post
}

methodWithoutBoilerplate() {
    // user code goes here
}
For the first, use body = '''code here'''; for the second, use body = exp.body.

Group and sum collection in Groovy

I have a collection of objects that I want to group by month and name and sum total:
def things = [
    [id:1, name:"fred", total:10, date: "2012-01-01"],
    [id:2, name:"fred", total:10, date: "2012-01-03"],
    [id:3, name:"jane", total:10, date: "2012-01-04"],
    [id:4, name:"fred", total:10, date: "2012-02-11"],
    [id:5, name:"jane", total:10, date: "2012-01-01"],
    [id:6, name:"ted", total:10, date: "2012-03-21"],
    [id:7, name:"ted", total:10, date: "2012-02-09"]
];
I would like the output to be:
[
    "fred": [ [total:20, month:"January"], [total:10, month:"February"] ],
    "jane": [ [total:20, month:"January"] ],
    "ted" : [ [total:10, month:"February"], [total:10, month:"March"] ]
]
or something along those lines. What is the best way to accomplish this using Groovy/Grails?
The following lines
things.inject([:].withDefault { [:].withDefault { 0 } } ) { map, v ->
    map[v.name][Date.parse('yyyy-MM-dd', v.date).format('MMMM')] += v.total; map
}
will give you this result:
[fred:[January:20, February:10], jane:[January:20], ted:[March:10, February:10]]
(works with Groovy >= 1.8.7 and 2.0)
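In case Map.withDefault is unfamiliar, a tiny standalone illustration (made-up data):
// Missing keys spring into existence via the supplied default, so the
// nested [name][month] totals can be accumulated without null checks.
def totals = [:].withDefault { [:].withDefault { 0 } }
totals['fred']['January'] += 10
totals['fred']['January'] += 10
assert totals.fred.January == 20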
I ended up with
things.collect {
    // get the map down to name, total and month
    it.subMap( ['name', 'total' ] ) << [ month: Date.parse( 'yyyy-MM-dd', it.date ).format( 'MMMM' ) ]
    // Then group by name first and month second
}.groupBy( { it.name }, { it.month } ).collectEntries { k, v ->
    // Then for the names, collect
    [ (k):v.collectEntries { k2, v2 ->
        // For each month, the sum of the totals
        [ (k2): v2.total.sum() ]
    } ]
}
To get the same result as Andre's much shorter, much better answer ;-)
Edit: a bit shorter, but still not as good...
things.groupBy( { it.name }, { Date.parse( 'yyyy-MM-dd', it.date ).format( 'MMMM' ) } ).collectEntries { k, v ->
    [ (k):v.collectEntries { k2, v2 ->
        [ (k2): v2.total.sum() ]
    } ]
}
Here's a solution to do the same thing as the other solutions, but in parallel using GPars. There may be a tighter solution, but this one does work with the test input.
#Grab(group='org.codehaus.gpars', module='gpars', version='1.0.0')
import static groovyx.gpars.GParsPool.*
//def things = [...]
withPool {
    def mapInner = { entrylist ->
        withPool {
            entrylist.getParallel()
                     .map{ [Date.parse('yyyy-MM-dd', it.date).format('MMMM'), it.total] }
                     .combine(0) { acc, v -> acc + v }
        }
    }

    // for dealing with bug when only 1 list item
    def collectSingle = { entrylist ->
        def first = entrylist[0]
        return [(Date.parse('yyyy-MM-dd', first.date).format('MMMM')) : first.total]
    }

    def result = things.parallel
                       .groupBy{ it.name }.getParallel()
                       .map{ [(it.key) : (it.value?.size()) > 1 ? mapInner.call(it.value) : collectSingle.call(it.value)] }
                       .reduce([:]) { a, b -> a + b }

    println "result = $result"
}

PEG for Python style indentation

How would you write a Parsing Expression Grammar in any of the following parser generators (PEG.js, Citrus, Treetop) that can handle Python/Haskell/CoffeeScript-style indentation:
Examples of a not-yet-existing programming language:
square x =
    x * x

cube x =
    x * square x

fib n =
    if n <= 1
        0
    else
        fib(n - 2) + fib(n - 1)  # some cheating allowed here with brackets
Update:
Don't try to write an interpreter for the examples above. I'm only interested in the indentation problem. Another example might be parsing the following:
foo
    bar = 1
    baz = 2
tap
    zap = 3

# should yield (ruby style hashmap):
# {:foo => { :bar => 1, :baz => 2}, :tap => { :zap => 3 } }
Pure PEG cannot parse indentation.
But peg.js can.
I did a quick-and-dirty experiment (being inspired by Ira Baxter's comment about cheating) and wrote a simple tokenizer.
For a more complete solution (a complete parser) please see this question: Parse indentation level with PEG.js
/* Initializations */
{
  function start(first, tail) {
    var done = [first[1]];
    for (var i = 0; i < tail.length; i++) {
      done = done.concat(tail[i][1][0])
      done.push(tail[i][1][1]);
    }
    return done;
  }

  var depths = [0];

  function indent(s) {
    var depth = s.length;
    if (depth == depths[0]) return [];
    if (depth > depths[0]) {
      depths.unshift(depth);
      return ["INDENT"];
    }
    var dents = [];
    while (depth < depths[0]) {
      depths.shift();
      dents.push("DEDENT");
    }
    if (depth != depths[0]) dents.push("BADDENT");
    return dents;
  }
}
/* The real grammar */
start = first:line tail:(newline line)* newline? { return start(first, tail) }
line = depth:indent s:text { return [depth, s] }
indent = s:" "* { return indent(s) }
text = c:[^\n]* { return c.join("") }
newline = "\n" {}
depths is a stack of indentations. indent() gives back an array of indentation tokens and start() unwraps the array to make the parser behave somewhat like a stream.
peg.js produces for the text:
alpha
  beta
  gamma
    delta
epsilon
    zeta
  eta
theta
  iota
these results:
[
"alpha",
"INDENT",
"beta",
"gamma",
"INDENT",
"delta",
"DEDENT",
"DEDENT",
"epsilon",
"INDENT",
"zeta",
"DEDENT",
"BADDENT",
"eta",
"theta",
"INDENT",
"iota",
"DEDENT",
"",
""
]
This tokenizer even catches bad indents.
I think an indentation-sensitive language like that is context-sensitive. I believe PEG can only do context-free languages.
Note that, while nalply's answer is certainly correct that PEG.js can do it via external state (ie the dreaded global variables), it can be a dangerous path to walk down (worse than the usual problems with global variables). Some rules can initially match (and then run their actions) but parent rules can fail thus causing the action run to be invalid. If external state is changed in such an action, you can end up with invalid state. This is super awful, and could lead to tremors, vomiting, and death. Some issues and solutions to this are in the comments here: https://github.com/dmajda/pegjs/issues/45
So what we are really doing here with indentation is creating something like C-style blocks, which often have their own lexical scope. If I were writing a compiler for a language like that, I think I would have the lexer keep track of the indentation: every time the indentation increases it could insert a '{' token, and every time it decreases it could insert a '}' token. Writing an expression grammar with explicit curly braces to represent lexical scope then becomes more straightforward.
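That lexer idea is easy to prototype. Here is a rough, hypothetical sketch in Groovy (the language of the first question on this page), purely to show indentation being rewritten into explicit braces; it is not tied to PEG.js or Treetop:
// Rewrite indentation changes into '{' / '}' lines so that a plain
// brace-based grammar can parse the result. Assumes space indentation.
def braceify = { String src ->
    def levels = [0]                      // stack of indentation widths
    def out = new StringBuilder()
    src.readLines().findAll { it.trim() }.each { line ->
        int depth = line.indexOf(line.trim())
        if (depth > levels[-1]) {         // deeper: open a block
            levels << depth
            out << '{\n'
        }
        while (depth < levels[-1]) {      // shallower: close blocks
            levels.remove(levels.size() - 1)
            out << '}\n'
        }
        out << line.trim() << '\n'
    }
    (levels.size() - 1).times { out << '}\n' }   // close any blocks left open
    out.toString()
}

print braceify('''foo
    bar = 1
    baz = 2
tap
    zap = 3
''')
// foo
// {
// bar = 1
// baz = 2
// }
// tap
// {
// zap = 3
// }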
You can do this in Treetop by using semantic predicates. In this case you need a semantic predicate that detects closing a white-space indented block due to the occurrence of another line that has the same or lesser indentation. The predicate must count the indentation from the opening line, and return true (block closed) if the current line's indentation has finished at the same or shorter length. Because the closing condition is context-dependent, it must not be memoized.
Here's the example code I'm about to add to Treetop's documentation. Note that I've overridden Treetop's SyntaxNode inspect method to make it easier to visualise the result.
grammar IndentedBlocks
rule top
# Initialise the indent stack with a sentinel:
&{|s| #indents = [-1] }
nested_blocks
{
def inspect
nested_blocks.inspect
end
}
end
rule nested_blocks
(
# Do not try to extract this semantic predicate into a new rule.
# It will be memo-ized incorrectly because @indents.last will change.
!{|s|
# Peek at the following indentation:
save = index; i = _nt_indentation; index = save
# We're closing if the indentation is less or the same as our enclosing block's:
closing = i.text_value.length <= @indents.last
}
block
)*
{
def inspect
elements.map{|e| e.block.inspect}*"\n"
end
}
end
rule block
indented_line # The block's opening line
&{|s| # Push the indent level to the stack
level = s[0].indentation.text_value.length
#indents << level
true
}
nested_blocks # Parse any nested blocks
&{|s| # Pop the indent stack
# Note that under no circumstances should "nested_blocks" fail, or the stack will be mis-aligned
@indents.pop
true
}
{
def inspect
indented_line.inspect +
(nested_blocks.elements.size > 0 ? (
"\n{\n" +
nested_blocks.elements.map { |content|
content.block.inspect+"\n"
}*'' +
"}"
)
: "")
end
}
end
rule indented_line
indentation text:((!"\n" .)*) "\n"
{
def inspect
text.text_value
end
}
end
rule indentation
' '*
end
end
Here's a little test driver program so you can try it easily:
require 'polyglot'
require 'treetop'
require 'indented_blocks'
parser = IndentedBlocksParser.new
input = <<END
def foo
  here is some indented text
    here it's further indented
    and here the same
      but here it's further again
    and some more like that
  before going back to here
    down again
back twice
and start from the beginning again
  with only a small block this time
END
parse_tree = parser.parse input
p parse_tree
I know this is an old thread, but I just wanted to add some PEG.js code to the answers. This code will parse a piece of text and "nest" it into a sort of AST-ish structure. It only goes one level deep and it looks ugly; furthermore, it does not really use the return values to build the structure, but keeps an in-memory tree of your syntax and returns that at the end. This might well become unwieldy and cause some performance issues, but at least it does what it's supposed to.
Note: Make sure you have tabs instead of spaces!
{
var indentStack = [],
rootScope = {
value: "PROGRAM",
values: [],
scopes: []
};
function addToRootScope(text) {
// Here we wiggle with the form and append the new
// scope to the rootScope.
if (!text) return;
if (indentStack.length === 0) {
rootScope.scopes.unshift({
text: text,
statements: []
});
}
else {
rootScope.scopes[0].statements.push(text);
}
}
}
/* Add some grammar */
start
= lines: (line EOL+)*
{
return rootScope;
}
line
= line: (samedent t:text { addToRootScope(t); }) &EOL
/ line: (indent t:text { addToRootScope(t); }) &EOL
/ line: (dedent t:text { addToRootScope(t); }) &EOL
/ line: [ \t]* &EOL
/ EOF
samedent
= i:[\t]* &{ return i.length === indentStack.length; }
{
console.log("s:", i.length, " level:", indentStack.length);
}
indent
= i:[\t]+ &{ return i.length > indentStack.length; }
{
indentStack.push("");
console.log("i:", i.length, " level:", indentStack.length);
}
dedent
= i:[\t]* &{ return i.length < indentStack.length; }
{
for (var j = 0; j < i.length + 1; j++) {
indentStack.pop();
}
console.log("d:", i.length + 1, " level:", indentStack.length);
}
text
= numbers: number+ { return numbers.join(""); }
/ txt: character+ { return txt.join(""); }
number
= $[0-9]
character
= $[ a-zA-Z->+]
__
= [ ]+
_
= [ ]*
EOF
= !.
EOL
= "\r\n"
/ "\n"
/ "\r"
