How can I access alternate labels in ANTLR4 while generically traversing a parse tree? - pretty-print

How can I access alternate labels in ANTLR4 while generically traversing a parse tree? Or alternatively, is there any way of replicating the functionality of the ^ operator of ANTLR3, as that would do the trick.
I'm trying to write an AST pretty printer for any ANTLR4 grammar adhering to a simple methodology (like naming productions with alternate labels). I'd like to be able to pretty print a term like 3 + 5 as (int_expression (plus (int_literal 3) (int_literal 5))), or something similar, given a grammar like the following:
int_expression
: int_expression '+' int_expression # plus
| int_expression '-' int_expression # minus
| raw_int # int_literal
;
raw_int
: Int
;
Int : [0-9]+ ;
I am unable to effectively give names to the plus and minus productions, because pulling them out into their own production causes the tool to complain that the rules are mutually left-recursive. If I can't pull them out, how can I give these productions names?
Note 1: I was able to get rid of the + argument methodologically by putting "good" terminals (e.g., the Int above) in special productions (productions starting with a special prefix, like raw_). Then I could print only those terminals whose parent productions are named "raw_..." and elide all others. This worked great for getting rid of +, while keeping 3 and 5 in the output. This could be done with a ! in ANTLR3.
Note 2: I understand that I could write a specialized pretty printer or use actions for each production of a given language, but I'd like to use ANTLR4 to parse and generate ASTs for a variety of languages, and it seems like I should be able to write such a simple pretty printer generically. Said another way, I only care about getting ASTs, and I'd rather not have to encumber each grammar with a tailored pretty printer just to get an AST. Perhaps I should just go back to ANTLR3?

I suggest implementing the pretty printer as a listener implementation with a nested visitor class to get the names of the various context objects.
private MyParser parser; // you'll have to assign this field
private StringBuilder builder = new StringBuilder();
#Override
public void enterEveryRule(#NotNull ParserRuleContext ctx) {
if (!builder.isEmpty()) {
builder.append(' ');
}
builder.append('(');
}
#Override
public void visitTerminalNode(#NotNull TerminalNode node) {
// TODO: print node text to builder
}
#Override
public void visitErrorNode(#NotNull TerminalNode node) {
// TODO: print node text to builder
}
#Override
public void exitEveryRule(#NotNull ParserRuleContext ctx) {
builder.append(')');
}
protected String getContextName(#NotNull ParserRuleContext ctx) {
return new ContextNameVisitor().visit(ctx);
}
protected class ContextNameVisitor extends MyParserBaseVisitor<String> {
#Override
public String visitChildren() {
return parser.getRuleNames()[ctx.getRuleIndex()];
}
#Override
public String visitPlus(#NotNull PlusContext ctx) {
return "plus";
}
#Override
public String visitMinus(#NotNull MinusContext ctx) {
return "minus";
}
#Override
public String visitInt_literal(#NotNull MinusContext ctx) {
return "int_literal";
}
}

The API doesn't contain a method to access the alternate labels.
However there is a workaround. ANTLR4 uses the alternate labels to generate java class names and those java classes can be accessed at run time.
For example: to access alternate labels in ANTLR4 while generically traversing a parse tree (with a listener) you can use the following function:
// Return the embedded alternate label between
// "$" and "Context" from the class name
String getCtxName(ParserRuleContext ctx) {
String str = ctx.getClass().getName();
str = str.substring(str.indexOf("$")+1,str.lastIndexOf("Context"));
str = str.toLowerCase();
return str;
}
Example use:
#Override
public void exitEveryRule(ParserRuleContext ctx) {
System.out.println(getCtxName(ctx));
}

Related

Saxon Extension function in Java return Node

I am trying to implement custom function using Saxon as defined here-> https://specifications.xbrl.org/registries/functions-registry-1.0/80132%20xfi.identifier/80132%20xfi.identifier%20function.html
public class IdentifierFunction implements ExtensionFunction {
public QName getName() {
return new QName("http://www.xbrl.org/2005/function/instance", "identifier");
}
public SequenceType getResultType() {
return SequenceType.makeSequenceType(ItemType.STRING, OccurrenceIndicator.ONE);
}
public net.sf.saxon.s9api.SequenceType[] getArgumentTypes() {
return new SequenceType[] { SequenceType.makeSequenceType(ItemType.STRING, OccurrenceIndicator.ONE) };
}
public XdmValue call(XdmValue[] arguments) throws SaxonApiException {
String arg = ((XdmAtomicValue) arguments[0].itemAt(0)).getStringValue();
String newExpression="(//xbrli:xbrl/xbrli:context[#id=("+arg+"/#contextRef"+")])[1]/xbrli:entity/xbrli:identifier";
String nodeString=this.getxPathResolver().resolveNode(this.getXbrl(),newExpression);
return new XdmAtomicValue(nodeString);
}
}
resolveNode() is above code is implemented as follows
public String resolveNode(byte[] xbrlBytes, String expressionValue) {
// 1. Instantiate an XPathFactory.
javax.xml.xpath.XPathFactory factory = new XPathFactoryImpl();
// 2. Use the XPathFactory to create a new XPath object
javax.xml.xpath.XPath xpath = factory.newXPath();
NamespaceContext ctx = new NamespaceContext() {
#Override
public String getNamespaceURI(String aPrefix) {
if (aPrefix.equals("xfi"))
return "http://www.xbrl.org/2005/function/instance";
else if (aPrefix.equals("xs"))
return "http://www.w3.org/2001/XMLSchema";
else if (aPrefix.equals("xbrli"))
return "http://www.xbrl.org/2003/instance";
else
return null;
}
#Override
public Iterator getPrefixes(String val) {
throw new UnsupportedOperationException();
}
#Override
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
};
xpath.setNamespaceContext(ctx);
try {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
Document someXML = documentBuilder.parse(new InputSource(new StringReader(new String(xbrlBytes))));
// 3. Compile an XPath string into an XPathExpression
javax.xml.xpath.XPathExpression expression = xpath.compile(expressionValue);
Object result = expression.evaluate(someXML, XPathConstants.NODE);
// 4. Evaluate the XPath expression on an input document
Node nodes = (Node) result;
return nodeToString(nodes);
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
When I evaluate xfi:identifier(args) , i get String like below:
<xbrli:identifier xmlns:xbrli="http://www.xbrl.org/2003/instance"
xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
xmlns:jenv-bw2-dim="http://www.nltaxonomie.nl/nt13/jenv/20181212/dictionary/jenv-bw2-axes"
xmlns:jenv-bw2-dm="http://www.nltaxonomie.nl/nt13/jenv/20181212/dictionary/jenv-bw2-domains"
xmlns:jenv-bw2-i="http://www.nltaxonomie.nl/nt13/jenv/20181212/dictionary/jenv-bw2-data"
xmlns:kvk-i="http://www.nltaxonomie.nl/nt13/kvk/20181212/dictionary/kvk-data"
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:nl-cd="http://www.nltaxonomie.nl/nt13/sbr/20180301/dictionary/nl-common-data"
xmlns:rj-i="http://www.nltaxonomie.nl/nt13/rj/20181212/dictionary/rj-data"
xmlns:rj-t="http://www.nltaxonomie.nl/nt13/rj/20181212/dictionary/rj-tuples"
xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
xmlns:xlink="http://www.w3.org/1999/xlink"
scheme="http://www.kvk.nl/kvk-id">62394207</xbrli:identifier>
However, I want to evaluate function number(xfi:identifier(args))
This results in NaN which is obvious because complete node string cannot be converted to number. I think, I need to change my function so that it returns Node. However, I am not sure how to do that. I tried google and also looked at Saxon documentation, but no luck yet.
Can someone help me? Basically, custom function should return an element node as per definition. and when I use number(xfi:identifier) it should give me 62394207 in this case.
regards,
Venky
Firstly, the XBRL spec for the function seems to imply that the function expects a node as argument and returns a node as its result, but in your implementation getArgumentTypes() and getResultType() define the type as xs:string - so this needs to change.
And the function should return an XdmNode, which is a subclass of XdmValue.
Next, it's very inefficient to be creating a DocumentBuilderFactory and XPathFactory, constructing an XML document tree, and compiling an XPath expression, every time your function is executed. I strongly suspect none of this is necessary.
Instead of having this.getXbrl() return a raw lexical document as byte[], have it return a prebuilt XdmNode representing the document tree. And then I would suggest that rather than selecting within that tree using XPath, you use Saxon's linq-like navigation API. If this XdmNode is in variable "root", then the XPath expression
//xbrli:xbrl/xbrli:context[#id=("+arg+"/#contextRef"+")
translates into something like
root.select(descendant("xbrl").then(child("context)).where(attributeEq("id", arg))
(except that I'm not quite sure what you're passing as arg to make your XPath expression make sense).
But you can use XPath if you prefer; just use Saxon's s9api interfaces for XPath and make sure the XPath expression is only compiled once and used repeatedly. It's straightforward then to get an XdmNode as the result of your XPath expression, which can be returned directly as the result of your extension function.

Updating AST in jjtree/javacc

I have a small c like language for which I have generated a parser and created a printVisitor using jjtree. Now, I want to be able to change the AST and print the new modified AST.
For example I have a production for a construct called "taint" which wraps definitions inside it. I want while parsing to change that AST to point directly to the declaration and ignoring the taint constructs.
That this
taint(int x) ====becomes after parsing===> int x
how is that possible? I am not sure if I need a visitor for that or can I change the jjtree directly to include that.
Below is the code for the production. Your help is much appreciated.!
void Taint() #Taint: {Token t; String varname; }
{
t = <TAINT>< LPAREN> varname=Primitive()< RPAREN>
{
ST.put(varname, new STC(varname, true, true));
jjtThis.value=t.image;
}
}
String Primitive() #Primitive: { String type; String name;}
{
type=Type() name=Id()
{
ST.put(name, new STC(name, false, true) );
return name;
}
}

Custom embedded language netbeans

I'm developing plugin for Netbeans IDE that will provide support for new custom language. I have created parser and lexer for my custom language using ANTLR features. Besides, my language contains some "SQL-like" queries that are very complicated, so i decided to write separate grammar for "SQL-like" queries. Consequently, I had to make parser and lexer for my "SQL-like" language. As a result, I have two languages where "SQL-like" language is an embedded language.
Netbeans provides class EmbeddingProvider which is responsible for embedding languages. Here is my EmbeddingProvider:
#EmbeddingProvider.Registration(mimeType = "text/x-lorx", targetMimeType = "text/x-sqll")
public class LorxEmbeddingProvider extends EmbeddingProvider {
#Override
public List<Embedding> getEmbeddings(Snapshot snapshot) {
TokenHierarchy th = snapshot.getTokenHierarchy();
TokenSequence<LorxTokenId> ts = th.tokenSequence(LorxTokenId.getLanguage());
List<Embedding> embeddings = new ArrayList<>();
while(ts.moveNext()) {
Token currToken = ts.token();
if(currToken.id().ordinal() == LorxTokenType.SqllLiteral.id) {
embeddings.add(snapshot.create(currToken.text(), "text/x-sqll"));
}
}
return embeddings;
}
#Override
public int getPriority() {
return 140;
}
#Override
public void cancel() {
}
}
Annotation is used for determining top-level language("text/x-lorx") and it's embedded language ("text/x-sqll").
Method getEmbeddings(Snapshot snapshot) executes when we open some file in the editor or just move caret to another position. I use Snapshot class to get token sequence of the current opened file. In this code sample i am iterating tokens in search of SqllLiteral token(It's like [select * from ...]). If i find this token, i create new Embedding.
public class SqllParserFactory extends ParserFactory {
#Override
public Parser createParser(Collection<Snapshot> snapshots) {
return new SqllNBParser();
}
}
After finishing getEmbeddings(Snapshot snapshot) method, SqllParserFactory of an embedded language creates new parser for sqll language and then happens nothing. I would like to know if i'm on the right way and also i would be happy if someone gave me an advise how split embedded language text into tokens.
Instead of using EmbeddingProvider you can try to use LanguageProvider.
#ServiceProvider(service = LanguageProvider.class)
public class MyEmbeddingLanguageProvider extends LanguageProvider {
#Override
public LanguageEmbedding<?> findLanguageEmbedding(Token<?> token,
LanguagePath languagePath, InputAttributes inputAttributes) {
Language embeddedLanguage = MimeLookup.getLookup("text/sqll").lookup(Language.class);
if (embeddedLanguage != null && languagePath.mimePath().equals("text/x-lorx")) {
if (token.id().ordinal() == LorxTokenType.SqllLiteral.id) {
return LanguageEmbedding.create(embeddedLanguage, 0, 0, true);
}
}
return null;
}
#Override
public Language<?> findLanguage(String mimeType) {
return null;
}
}
It should work for tokenizing both languages (which can result in nice syntax coloring). Unfortunately parsing of embedded language may not work.
The problem has been solved by overriding method in class LanguageHierarchy. I was trying to override this method before, but I used it incorrectly. The problem was in passing wrong params to method LanguageEmbedding.create(SqllTokenId.getLanguage(), 0, 0);. Instead of passing zeros, I passed length of my token, so it was skipping my token, because the second param is startSkipLength and the third is endSkipLength. The code below is correct. The LorxTokenType.SqllLiteral is token of my "sql-like" query. It is resolved as an embedded language now and processing by another lexer and parser (in my case by Sqll lexer and parser).
#Override
protected LanguageEmbedding<?> embedding(Token<LorxTokenId> token, LanguagePath languagePath, InputAttributes inputAttributes) {
if(token.id().ordinal() == LorxTokenType.SqllLiteral.id) {
return LanguageEmbedding.create(SqllTokenId.getLanguage(), 0, 0);
}
return null;
}

What is the idiomatic way to run a Specification over multitple implementations with Spock?

Let's say I have a Joiner interface:
interface Joiner {
String join(List<String> l);
}
And two implementations of it:
class Java8Joiner implements Joiner {
#Override String join(List<String> l) { String.join("", l); }
}
class GroovyJoiner implements Joiner {
#Override String join(List<String> l) { l.join('') }
}
What is the best way to check that theses two implementation, or more, pass the same tests?
Basically, I want to run all the tests defined by JoinerSpecification with a Java8Joiner, then a GroovyJoiner, and so on...
class JoinerSpecification extends Specification {
#Subject
final def joiner = // Joiner instance here
def "Joining #list -> #expectedString"(List<String> list, String expectedString) {
expect:
joiner.join(list) == expectedString
where:
list | expectedString
[] | ''
['a'] | 'a'
['a', 'b'] | 'ab'
}
}
It is perfectly fine to refactor JoinerSpecification if needed.
My primary goals are:
Maintainable test code (with as little boilerplate as possible)
Clear error messages (which test of which implementation is failing)
Easy to run from the IDE
Edit 1 (15/06/2015)
Reworded my question & added some details to make it clearer.
Opal's proposal is interesting but is impractical. Not being able to use then blocks is a showstopper.
Lets try a self-answer using another approach. Since the goal is to execute the same specification on several implementations of the same interface, why not try to use good old inheritance ?
abstract class AbstractJoinerSpec extends Specification {
abstract Joiner getSubject()
#Unroll
def "Joining #list -> #expectedString"(List<String> list, String expectedString) {
given:
def joiner = getSubject()
expect:
joiner.join(list) == expectedString
where:
list | expectedString
[] | ''
['a'] | 'a'
['a', 'b'] | 'ab'
}
}
class Java8JoinerSpec extends AbstractJoinerSpec {
#Override
Joiner getSubject() { new Java8Joiner() }
}
class GroovyJoinerSpec extends AbstractJoinerSpec {
#Override
Joiner getSubject() { new Java8Joiner() }
}
Basically, the only abstract method is getSubject and all the inherited test cases are executed by Spock (I have not been able to find this feature in the documentation but it does work).
Pros
It only requires creating a tiny class for each implementation and the specification code remains clean.
The full power of Spock can be used
Error messages and reports are clear (you know which test failed for which implementation)
Cons
Running a given test for a given implementation from the IDE is not easy. You have to copy/paste the test case into the non abstract specification or remove the abstract modifiers and implement the getSubject method.
I thing the trade-off acceptable since you don't manually run/debug an implementation that often. Keeping the code clean, and having meaningful report worth this inconvenience.
What would be very useful here is comparison chaining but, as far as I know it's not implemented in groovy, see here. Since Subject purpose according to the docs is pure informational I've come up with the following ideas:
#Grab('org.spockframework:spock-core:0.7-groovy-2.0')
#Grab('cglib:cglib-nodep:3.1')
import spock.lang.*
class Test extends Specification {
def 'attempt 1'() {
given:
def joiners = [new GroovyJoiner(), new Java8Joiner()]
expect:
joiners.every { it.join(list) == expectedString }
where:
list | expectedString
[] | ''
['a'] | 'a'
['a', 'b'] | 'ab'
}
def 'attempt 2'() {
given:
def gJoiner = new GroovyJoiner()
def jJoiner = new Java8Joiner()
expect:
[gJoiner.join(list), jJoiner.join(list), expectedString].toSet().size() == 1
where:
list | expectedString
[] | ''
['a'] | 'a'
['a', 'b'] | 'ab'
}
}
interface Joiner {
String join(List<String> l);
}
class Java8Joiner implements Joiner {
#Override String join(List<String> l) { String.join("", l); }
}
class GroovyJoiner implements Joiner {
#Override String join(List<String> l) { l.join('') }
}
In the attempt 1 I'm just checking if every element on the list is equal to expectedString. It looks nice however on error it's not verbose enough. Any element may fail and there's no trace which one.
The attempt 2 method makes use of Set. Since all the results should be equal there'll be only one element left in the collection. In prints verbose info on test failure.
What also comes to my head is guava's ComparisonChain but no idea if it would be useful as well as it's another project dependency.
Unfortunately, there is no way to create a cartesian product of two lists declaratively in Spock. You have to define your own Iterable that will provide the values for your variables.
There is a more readable (using Spock tabular data definition) way to define the data, if you are willing to sacrifice terseness for readability. Let me know if that interests you. Otherwise here is a solution that lets you define everything declaratively, without duplication, but you lose the nice tabular definitions of Spock.
If your implementation is stateless, i.e. it does not keep state, you can get away with using #Shared instances of Joiner:
import spock.lang.Shared
import spock.lang.Specification
import spock.lang.Unroll
interface Joiner {
String join(List<String> l);
}
class Java8Joiner implements Joiner {
#Override
String join(List<String> l) { String.join("", l); }
}
class GroovyJoiner implements Joiner {
#Override
String join(List<String> l) { l.join('') }
}
class S1Spec extends Specification {
#Shared
Joiner java8Joiner = new Java8Joiner()
#Shared
Joiner groovyJoiner = new GroovyJoiner()
List<Map> transpose(Map<List> mapOfLists) {
def listOfMaps = [].withDefault { [:] }
mapOfLists.each { k, values ->
[values].flatten().eachWithIndex { value, index ->
listOfMaps[index][k] = value
}
}
listOfMaps
}
Iterable distribute(List<Joiner> joiners, Map<String, List> data) {
def varsForEachIteration = transpose(data)
new Iterable() {
#Override
Iterator iterator() {
[joiners, varsForEachIteration].combinations().iterator()
}
}
}
#Unroll
def "Joining with #joiner #list -> #expectedString"() {
expect:
joiner.join(list) == expectedString
where:
[joiner, data] << distribute([java8Joiner, groovyJoiner], [
list : [[], ['a'], ['a', 'b']],
expectedString: ['', 'a', 'ab']
])
and:
list = data.list
expectedString = data.expectedString
}
}

Test that either one thing holds or another in AssertJ

I am in the process of converting some tests from Hamcrest to AssertJ. In Hamcrest I use the following snippet:
assertThat(list, either(contains(Tags.SWEETS, Tags.HIGH))
.or(contains(Tags.SOUPS, Tags.RED)));
That is, the list may be either that or that. How can I express this in AssertJ? The anyOf function (of course, any is something else than either, but that would be a second question) takes a Condition; I have implemented that myself, but it feels as if this should be a common case.
Edited:
Since 3.12.0 AssertJ provides satisfiesAnyOf which succeeds is one of the given assertion succeeds,
assertThat(list).satisfiesAnyOf(
listParam -> assertThat(listParam).contains(Tags.SWEETS, Tags.HIGH),
listParam -> assertThat(listParam).contains(Tags.SOUPS, Tags.RED)
);
Original answer:
No, this is an area where Hamcrest is better than AssertJ.
To write the following assertion:
Set<String> goodTags = newLinkedHashSet("Fine", "Good");
Set<String> badTags = newLinkedHashSet("Bad!", "Awful");
Set<String> tags = newLinkedHashSet("Fine", "Good", "Ok", "?");
// contains is statically imported from ContainsCondition
// anyOf succeeds if one of the conditions is met (logical 'or')
assertThat(tags).has(anyOf(contains(goodTags), contains(badTags)));
you need to create this Condition:
import static org.assertj.core.util.Lists.newArrayList;
import java.util.Collection;
import org.assertj.core.api.Condition;
public class ContainsCondition extends Condition<Iterable<String>> {
private Collection<String> collection;
public ContainsCondition(Iterable<String> values) {
super("contains " + values);
this.collection = newArrayList(values);
}
static ContainsCondition contains(Collection<String> set) {
return new ContainsCondition(set);
}
#Override
public boolean matches(Iterable<String> actual) {
Collection<String> values = newArrayList(actual);
for (String string : collection) {
if (!values.contains(string)) return false;
}
return true;
};
}
It might not be what you if you expect that the presence of your tags in one collection implies they are not in the other one.
Inspired by this thread, you might want to use this little repo I put together, that adapts the Hamcrest Matcher API into AssertJ's Condition API. Also includes a handy-dandy conversion shell script.

Resources