Custom embedded language netbeans - parsing

I'm developing plugin for Netbeans IDE that will provide support for new custom language. I have created parser and lexer for my custom language using ANTLR features. Besides, my language contains some "SQL-like" queries that are very complicated, so i decided to write separate grammar for "SQL-like" queries. Consequently, I had to make parser and lexer for my "SQL-like" language. As a result, I have two languages where "SQL-like" language is an embedded language.
Netbeans provides class EmbeddingProvider which is responsible for embedding languages. Here is my EmbeddingProvider:
#EmbeddingProvider.Registration(mimeType = "text/x-lorx", targetMimeType = "text/x-sqll")
public class LorxEmbeddingProvider extends EmbeddingProvider {
#Override
public List<Embedding> getEmbeddings(Snapshot snapshot) {
TokenHierarchy th = snapshot.getTokenHierarchy();
TokenSequence<LorxTokenId> ts = th.tokenSequence(LorxTokenId.getLanguage());
List<Embedding> embeddings = new ArrayList<>();
while(ts.moveNext()) {
Token currToken = ts.token();
if(currToken.id().ordinal() == LorxTokenType.SqllLiteral.id) {
embeddings.add(snapshot.create(currToken.text(), "text/x-sqll"));
}
}
return embeddings;
}
#Override
public int getPriority() {
return 140;
}
#Override
public void cancel() {
}
}
Annotation is used for determining top-level language("text/x-lorx") and it's embedded language ("text/x-sqll").
Method getEmbeddings(Snapshot snapshot) executes when we open some file in the editor or just move caret to another position. I use Snapshot class to get token sequence of the current opened file. In this code sample i am iterating tokens in search of SqllLiteral token(It's like [select * from ...]). If i find this token, i create new Embedding.
public class SqllParserFactory extends ParserFactory {
#Override
public Parser createParser(Collection<Snapshot> snapshots) {
return new SqllNBParser();
}
}
After finishing getEmbeddings(Snapshot snapshot) method, SqllParserFactory of an embedded language creates new parser for sqll language and then happens nothing. I would like to know if i'm on the right way and also i would be happy if someone gave me an advise how split embedded language text into tokens.

Instead of using EmbeddingProvider you can try to use LanguageProvider.
#ServiceProvider(service = LanguageProvider.class)
public class MyEmbeddingLanguageProvider extends LanguageProvider {
#Override
public LanguageEmbedding<?> findLanguageEmbedding(Token<?> token,
LanguagePath languagePath, InputAttributes inputAttributes) {
Language embeddedLanguage = MimeLookup.getLookup("text/sqll").lookup(Language.class);
if (embeddedLanguage != null && languagePath.mimePath().equals("text/x-lorx")) {
if (token.id().ordinal() == LorxTokenType.SqllLiteral.id) {
return LanguageEmbedding.create(embeddedLanguage, 0, 0, true);
}
}
return null;
}
#Override
public Language<?> findLanguage(String mimeType) {
return null;
}
}
It should work for tokenizing both languages (which can result in nice syntax coloring). Unfortunately parsing of embedded language may not work.

The problem has been solved by overriding method in class LanguageHierarchy. I was trying to override this method before, but I used it incorrectly. The problem was in passing wrong params to method LanguageEmbedding.create(SqllTokenId.getLanguage(), 0, 0);. Instead of passing zeros, I passed length of my token, so it was skipping my token, because the second param is startSkipLength and the third is endSkipLength. The code below is correct. The LorxTokenType.SqllLiteral is token of my "sql-like" query. It is resolved as an embedded language now and processing by another lexer and parser (in my case by Sqll lexer and parser).
#Override
protected LanguageEmbedding<?> embedding(Token<LorxTokenId> token, LanguagePath languagePath, InputAttributes inputAttributes) {
if(token.id().ordinal() == LorxTokenType.SqllLiteral.id) {
return LanguageEmbedding.create(SqllTokenId.getLanguage(), 0, 0);
}
return null;
}

Related

Equivalent of tuples in Dart [duplicate]

Is there a way to return several values in a function return statement (other than returning an object) like we can do in Go (or some other languages)?
For example, in Go we can do:
func vals() (int, int) {
return 3, 7
}
Can this be done in Dart? Something like this:
int, String foo() {
return 42, "foobar";
}
Dart doesn't support multiple return values.
You can return an array,
List foo() {
return [42, "foobar"];
}
or if you want the values be typed use a Tuple class like the package https://pub.dartlang.org/packages/tuple provides.
See also either for a way to return a value or an error.
I'd like to add that one of the main use-cases for multiple return values in Go is error handling which Dart handle's in its own way with Exceptions and failed promises.
Of course this leaves a few other use-cases, so let's see how code looks when using explicit tuples:
import 'package:tuple/tuple.dart';
Tuple2<int, String> demo() {
return new Tuple2(42, "life is good");
}
void main() {
final result = demo();
if (result.item1 > 20) {
print(result.item2);
}
}
Not quite as concise, but it's clean and expressive code. What I like most about it is that it doesn't need to change much once your quick experimental project really takes off and you start adding features and need to add more structure to keep on top of things.
class FormatResult {
bool changed;
String result;
FormatResult(this.changed, this.result);
}
FormatResult powerFormatter(String text) {
bool changed = false;
String result = text;
// secret implementation magic
// ...
return new FormatResult(changed, result);
}
void main() {
String draftCode = "print('Hello World.');";
final reformatted = powerFormatter(draftCode);
if (reformatted.changed) {
// some expensive operation involving servers in the cloud.
}
}
So, yes, it's not much of an improvement over Java, but it works, it is clear, and reasonably efficient for building UIs. And I really like how I can quickly hack things together (sometimes starting on DartPad in a break at work) and then add structure later when I know that the project will live on and grow.
Create a class:
import 'dart:core';
class Tuple<T1, T2> {
final T1 item1;
final T2 item2;
Tuple({
this.item1,
this.item2,
});
factory Tuple.fromJson(Map<String, dynamic> json) {
return Tuple(
item1: json['item1'],
item2: json['item2'],
);
}
}
Call it however you want!
Tuple<double, double>(i1, i2);
or
Tuple<double, double>.fromJson(jsonData);
You can create a class to return multiple values
Ej:
class NewClass {
final int number;
final String text;
NewClass(this.number, this.text);
}
Function that generates the values:
NewClass buildValues() {
return NewClass(42, 'foobar');
}
Print:
void printValues() {
print('${this.buildValues().number} ${this.buildValues().text}');
// 42 foobar
}
The proper way to return multiple values would be to store those values in a class, whether your own custom class or a Tuple. However, defining a separate class for every function is very inconvenient, and using Tuples can be error-prone since the members won't have meaningful names.
Another (admittedly gross and not very Dart-istic) approach is try to mimic the output-parameter approach typically used by C and C++. For example:
class OutputParameter<T> {
T value;
OutputParameter(this.value);
}
void foo(
OutputParameter<int> intOut,
OutputParameter<String>? optionalStringOut,
) {
intOut.value = 42;
optionalStringOut?.value = 'foobar';
}
void main() {
var theInt = OutputParameter(0);
var theString = OutputParameter('');
foo(theInt, theString);
print(theInt.value); // Prints: 42
print(theString.value); // Prints: foobar
}
It certainly can be a bit inconvenient for callers to have to use variable.value everywhere, but in some cases it might be worth the trade-off.
you can use dartz package for Returning multiple data types
https://www.youtube.com/watch?v=8yMXUC4W1cc&t=110s
Dart is finalizing records, a fancier tuple essentially.
Should be in a stable release a month from the time of writing.
I'll try to update, it's already available with experiments flags.
you can use Set<Object> for returning multiple values,
Set<object> foo() {
return {'my string',0}
}
print(foo().first) //prints 'my string'
print(foo().last) //prints 0
In this type of situation in Dart, an easy solution could return a list then accessing the returned list as per your requirement. You can access the specific value by the index or the whole list by a simple for loop.
List func() {
return [false, 30, "Ashraful"];
}
void main() {
final list = func();
// to access specific list item
var item = list[2];
// to check runtime type
print(item.runtimeType);
// to access the whole list
for(int i=0; i<list.length; i++) {
print(list[i]);
}
}

Saxon Extension function in Java return Node

I am trying to implement custom function using Saxon as defined here-> https://specifications.xbrl.org/registries/functions-registry-1.0/80132%20xfi.identifier/80132%20xfi.identifier%20function.html
public class IdentifierFunction implements ExtensionFunction {
public QName getName() {
return new QName("http://www.xbrl.org/2005/function/instance", "identifier");
}
public SequenceType getResultType() {
return SequenceType.makeSequenceType(ItemType.STRING, OccurrenceIndicator.ONE);
}
public net.sf.saxon.s9api.SequenceType[] getArgumentTypes() {
return new SequenceType[] { SequenceType.makeSequenceType(ItemType.STRING, OccurrenceIndicator.ONE) };
}
public XdmValue call(XdmValue[] arguments) throws SaxonApiException {
String arg = ((XdmAtomicValue) arguments[0].itemAt(0)).getStringValue();
String newExpression="(//xbrli:xbrl/xbrli:context[#id=("+arg+"/#contextRef"+")])[1]/xbrli:entity/xbrli:identifier";
String nodeString=this.getxPathResolver().resolveNode(this.getXbrl(),newExpression);
return new XdmAtomicValue(nodeString);
}
}
resolveNode() is above code is implemented as follows
public String resolveNode(byte[] xbrlBytes, String expressionValue) {
// 1. Instantiate an XPathFactory.
javax.xml.xpath.XPathFactory factory = new XPathFactoryImpl();
// 2. Use the XPathFactory to create a new XPath object
javax.xml.xpath.XPath xpath = factory.newXPath();
NamespaceContext ctx = new NamespaceContext() {
#Override
public String getNamespaceURI(String aPrefix) {
if (aPrefix.equals("xfi"))
return "http://www.xbrl.org/2005/function/instance";
else if (aPrefix.equals("xs"))
return "http://www.w3.org/2001/XMLSchema";
else if (aPrefix.equals("xbrli"))
return "http://www.xbrl.org/2003/instance";
else
return null;
}
#Override
public Iterator getPrefixes(String val) {
throw new UnsupportedOperationException();
}
#Override
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
};
xpath.setNamespaceContext(ctx);
try {
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
Document someXML = documentBuilder.parse(new InputSource(new StringReader(new String(xbrlBytes))));
// 3. Compile an XPath string into an XPathExpression
javax.xml.xpath.XPathExpression expression = xpath.compile(expressionValue);
Object result = expression.evaluate(someXML, XPathConstants.NODE);
// 4. Evaluate the XPath expression on an input document
Node nodes = (Node) result;
return nodeToString(nodes);
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
When I evaluate xfi:identifier(args) , i get String like below:
<xbrli:identifier xmlns:xbrli="http://www.xbrl.org/2003/instance"
xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
xmlns:jenv-bw2-dim="http://www.nltaxonomie.nl/nt13/jenv/20181212/dictionary/jenv-bw2-axes"
xmlns:jenv-bw2-dm="http://www.nltaxonomie.nl/nt13/jenv/20181212/dictionary/jenv-bw2-domains"
xmlns:jenv-bw2-i="http://www.nltaxonomie.nl/nt13/jenv/20181212/dictionary/jenv-bw2-data"
xmlns:kvk-i="http://www.nltaxonomie.nl/nt13/kvk/20181212/dictionary/kvk-data"
xmlns:link="http://www.xbrl.org/2003/linkbase"
xmlns:nl-cd="http://www.nltaxonomie.nl/nt13/sbr/20180301/dictionary/nl-common-data"
xmlns:rj-i="http://www.nltaxonomie.nl/nt13/rj/20181212/dictionary/rj-data"
xmlns:rj-t="http://www.nltaxonomie.nl/nt13/rj/20181212/dictionary/rj-tuples"
xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
xmlns:xlink="http://www.w3.org/1999/xlink"
scheme="http://www.kvk.nl/kvk-id">62394207</xbrli:identifier>
However, I want to evaluate function number(xfi:identifier(args))
This results in NaN which is obvious because complete node string cannot be converted to number. I think, I need to change my function so that it returns Node. However, I am not sure how to do that. I tried google and also looked at Saxon documentation, but no luck yet.
Can someone help me? Basically, custom function should return an element node as per definition. and when I use number(xfi:identifier) it should give me 62394207 in this case.
regards,
Venky
Firstly, the XBRL spec for the function seems to imply that the function expects a node as argument and returns a node as its result, but in your implementation getArgumentTypes() and getResultType() define the type as xs:string - so this needs to change.
And the function should return an XdmNode, which is a subclass of XdmValue.
Next, it's very inefficient to be creating a DocumentBuilderFactory and XPathFactory, constructing an XML document tree, and compiling an XPath expression, every time your function is executed. I strongly suspect none of this is necessary.
Instead of having this.getXbrl() return a raw lexical document as byte[], have it return a prebuilt XdmNode representing the document tree. And then I would suggest that rather than selecting within that tree using XPath, you use Saxon's linq-like navigation API. If this XdmNode is in variable "root", then the XPath expression
//xbrli:xbrl/xbrli:context[#id=("+arg+"/#contextRef"+")
translates into something like
root.select(descendant("xbrl").then(child("context)).where(attributeEq("id", arg))
(except that I'm not quite sure what you're passing as arg to make your XPath expression make sense).
But you can use XPath if you prefer; just use Saxon's s9api interfaces for XPath and make sure the XPath expression is only compiled once and used repeatedly. It's straightforward then to get an XdmNode as the result of your XPath expression, which can be returned directly as the result of your extension function.

Updating AST in jjtree/javacc

I have a small c like language for which I have generated a parser and created a printVisitor using jjtree. Now, I want to be able to change the AST and print the new modified AST.
For example I have a production for a construct called "taint" which wraps definitions inside it. I want while parsing to change that AST to point directly to the declaration and ignoring the taint constructs.
That this
taint(int x) ====becomes after parsing===> int x
how is that possible? I am not sure if I need a visitor for that or can I change the jjtree directly to include that.
Below is the code for the production. Your help is much appreciated.!
void Taint() #Taint: {Token t; String varname; }
{
t = <TAINT>< LPAREN> varname=Primitive()< RPAREN>
{
ST.put(varname, new STC(varname, true, true));
jjtThis.value=t.image;
}
}
String Primitive() #Primitive: { String type; String name;}
{
type=Type() name=Id()
{
ST.put(name, new STC(name, false, true) );
return name;
}
}

Test that either one thing holds or another in AssertJ

I am in the process of converting some tests from Hamcrest to AssertJ. In Hamcrest I use the following snippet:
assertThat(list, either(contains(Tags.SWEETS, Tags.HIGH))
.or(contains(Tags.SOUPS, Tags.RED)));
That is, the list may be either that or that. How can I express this in AssertJ? The anyOf function (of course, any is something else than either, but that would be a second question) takes a Condition; I have implemented that myself, but it feels as if this should be a common case.
Edited:
Since 3.12.0 AssertJ provides satisfiesAnyOf which succeeds is one of the given assertion succeeds,
assertThat(list).satisfiesAnyOf(
listParam -> assertThat(listParam).contains(Tags.SWEETS, Tags.HIGH),
listParam -> assertThat(listParam).contains(Tags.SOUPS, Tags.RED)
);
Original answer:
No, this is an area where Hamcrest is better than AssertJ.
To write the following assertion:
Set<String> goodTags = newLinkedHashSet("Fine", "Good");
Set<String> badTags = newLinkedHashSet("Bad!", "Awful");
Set<String> tags = newLinkedHashSet("Fine", "Good", "Ok", "?");
// contains is statically imported from ContainsCondition
// anyOf succeeds if one of the conditions is met (logical 'or')
assertThat(tags).has(anyOf(contains(goodTags), contains(badTags)));
you need to create this Condition:
import static org.assertj.core.util.Lists.newArrayList;
import java.util.Collection;
import org.assertj.core.api.Condition;
public class ContainsCondition extends Condition<Iterable<String>> {
private Collection<String> collection;
public ContainsCondition(Iterable<String> values) {
super("contains " + values);
this.collection = newArrayList(values);
}
static ContainsCondition contains(Collection<String> set) {
return new ContainsCondition(set);
}
#Override
public boolean matches(Iterable<String> actual) {
Collection<String> values = newArrayList(actual);
for (String string : collection) {
if (!values.contains(string)) return false;
}
return true;
};
}
It might not be what you if you expect that the presence of your tags in one collection implies they are not in the other one.
Inspired by this thread, you might want to use this little repo I put together, that adapts the Hamcrest Matcher API into AssertJ's Condition API. Also includes a handy-dandy conversion shell script.

How can I access alternate labels in ANTLR4 while generically traversing a parse tree?

How can I access alternate labels in ANTLR4 while generically traversing a parse tree? Or alternatively, is there any way of replicating the functionality of the ^ operator of ANTLR3, as that would do the trick.
I'm trying to write an AST pretty printer for any ANTLR4 grammar adhering to a simple methodology (like naming productions with alternate labels). I'd like to be able to pretty print a term like 3 + 5 as (int_expression (plus (int_literal 3) (int_literal 5))), or something similar, given a grammar like the following:
int_expression
: int_expression '+' int_expression # plus
| int_expression '-' int_expression # minus
| raw_int # int_literal
;
raw_int
: Int
;
Int : [0-9]+ ;
I am unable to effectively give names to the plus and minus productions, because pulling them out into their own production causes the tool to complain that the rules are mutually left-recursive. If I can't pull them out, how can I give these productions names?
Note 1: I was able to get rid of the + argument methodologically by putting "good" terminals (e.g., the Int above) in special productions (productions starting with a special prefix, like raw_). Then I could print only those terminals whose parent productions are named "raw_..." and elide all others. This worked great for getting rid of +, while keeping 3 and 5 in the output. This could be done with a ! in ANTLR3.
Note 2: I understand that I could write a specialized pretty printer or use actions for each production of a given language, but I'd like to use ANTLR4 to parse and generate ASTs for a variety of languages, and it seems like I should be able to write such a simple pretty printer generically. Said another way, I only care about getting ASTs, and I'd rather not have to encumber each grammar with a tailored pretty printer just to get an AST. Perhaps I should just go back to ANTLR3?
I suggest implementing the pretty printer as a listener implementation with a nested visitor class to get the names of the various context objects.
private MyParser parser; // you'll have to assign this field
private StringBuilder builder = new StringBuilder();
#Override
public void enterEveryRule(#NotNull ParserRuleContext ctx) {
if (!builder.isEmpty()) {
builder.append(' ');
}
builder.append('(');
}
#Override
public void visitTerminalNode(#NotNull TerminalNode node) {
// TODO: print node text to builder
}
#Override
public void visitErrorNode(#NotNull TerminalNode node) {
// TODO: print node text to builder
}
#Override
public void exitEveryRule(#NotNull ParserRuleContext ctx) {
builder.append(')');
}
protected String getContextName(#NotNull ParserRuleContext ctx) {
return new ContextNameVisitor().visit(ctx);
}
protected class ContextNameVisitor extends MyParserBaseVisitor<String> {
#Override
public String visitChildren() {
return parser.getRuleNames()[ctx.getRuleIndex()];
}
#Override
public String visitPlus(#NotNull PlusContext ctx) {
return "plus";
}
#Override
public String visitMinus(#NotNull MinusContext ctx) {
return "minus";
}
#Override
public String visitInt_literal(#NotNull MinusContext ctx) {
return "int_literal";
}
}
The API doesn't contain a method to access the alternate labels.
However there is a workaround. ANTLR4 uses the alternate labels to generate java class names and those java classes can be accessed at run time.
For example: to access alternate labels in ANTLR4 while generically traversing a parse tree (with a listener) you can use the following function:
// Return the embedded alternate label between
// "$" and "Context" from the class name
String getCtxName(ParserRuleContext ctx) {
String str = ctx.getClass().getName();
str = str.substring(str.indexOf("$")+1,str.lastIndexOf("Context"));
str = str.toLowerCase();
return str;
}
Example use:
#Override
public void exitEveryRule(ParserRuleContext ctx) {
System.out.println(getCtxName(ctx));
}

Resources