I would like to parse an Xtext grammar with Xtext. To do so, I took the Xtext grammar from GitHub and adapted it slightly. Everything works fine except importing grammars and declaring reused grammars with "with".
So when I create an Xtext file to be parsed, e.g.:
grammar org.xtext.example.mydsl.Expression with org.eclipse.xtext.common.Terminals
import "http://www.xtext.org/example/mydsl/MyDsl" as mydsl
generate expression "http://www.xtext.org/example/mydsl/Expression"
I get the following errors:
Line 1: Couldn't resolve reference to Grammar 'org.eclipse.xtext.common.Terminals'. (Even if I change the feature name to importURI or importedNamespace of the root rule and use a grammar defined in the same workspace!)
Line 3/4: Couldn't resolve reference to EPackage 'http://www.xtext.org/example/mydsl/...'.
However, I need the complete grammar for my further work, and in particular the reused grammars (such as Terminals, Xbase or any other grammar in the workspace), because the grammar could contain rules that reference rules from the reused one.
Is there a way to resolve the grammar? I thought already about Scoping but failed in understanding how I could use it in my case.
BTW, is there a way to parse files with the extension .xtext? I get a warning that two content parsers are registered for the same file extension, and my model is parsed in the normal Xtext manner. Is there a way to switch to my content parser?
You may follow the discussion at https://www.eclipse.org/forums/index.php/t/1067192/; it is about using fragments to change the workflow.
To parse Xtext files programmatically (ONLY PARSING!!), I wrote a few lines of code:
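import java.io.InputStreamReader;
import java.io.Reader;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.xtext.XtextRuntimeModule;
import org.eclipse.xtext.parser.IParseResult;
import org.eclipse.xtext.parser.IParser;
import org.eclipse.xtext.util.StringInputStream;
import com.google.inject.Guice;
import com.google.inject.Injector;

public static void ParseGrammar()
{
    // the grammar to parse, given inline as a string
    String t = "grammar org.xtext.example.Entity with org.eclipse.xtext.common.Terminals\n" +
        "generate entity \"http://www.xtext.org/example/Entity\"\n" +
        "Model:\n" +
        " (types+=Type)*;\n" +
        "Type:\n" +
        " TypeDef | Entity;\n" +
        "TypeDef:\n" +
        " \"typedef\" name=ID (\"mapsto\" mappedType=JAVAID)?;\n" +
        "JAVAID:\n" +
        " name=ID(\".\" ID)*;\n" +
        "Entity:\n" +
        " \"entity\" name=ID (\"extends\" superEntity=[Entity])?\n" +
        " \"{\"\n" +
        " (attributes+=Attribute)*\n" +
        " \"}\";\n" +
        "Attribute:\n" +
        " type=[Type] (many?=\"*\")? name=ID;";
    new org.eclipse.emf.mwe.utils.StandaloneSetup().setPlatformUri("../");
    // create an injector for the Xtext grammar language itself
    Injector injector = Guice.createInjector(new XtextRuntimeModule());
    Reader reader = new InputStreamReader(new StringInputStream(t));
    // the parser only builds the AST; it does not link cross-references
    IParser parser = injector.getInstance(IParser.class);
    IParseResult result = parser.parse(reader);
    boolean err = result.hasSyntaxErrors();
    EObject eRoot = result.getRootASTElement();
}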
As you can see, it uses "... with org.eclipse.xtext.common.Terminals ...". It ran without any errors and produced an AST. Because this only parses, cross-references such as the reused grammar are not resolved at this stage.
I am trying to add instrumentation (e.g. logging some information) to methods in a Java file. I am using the following Rascal code which seems to work mostly:
import ParseTree;
import lang::java::\syntax::Java15;
// .. more imports
// project is a loc
M3 model = createM3FromEclipseProject(project);
set[loc] projectFiles = { file | file <- files(model)} ;
for (pFile <- projectFiles) {
CompilationUnit cunit = parse(#CompilationUnit, pFile);
cUnitNew = visit(cunit) {
case (MethodBody) `{<BlockStm* post>}`
=> (MethodBody) `{
'System.out.println(new Throwable().getStackTrace()[0]);
'<BlockStm* post>
'}`
}
writeFile(pFile, cUnitNew);
}
I am running into two issues regarding whitespace, which might be unrelated.
The line of code that I am inserting does not preserve whitespace that was there previously. If there was a tab character, it will now be removed. The same is true for the line directly following the line I am inserting and the closing brace. How can I 'capture' whitespace in my pattern?
Example before transforming (all lines start with a tab character, line 2 and 3 with two):
void beforeFirst() throws Exception {
rowIdx = -1;
rowSource.beforeFirst();
}
Example after transforming:
void beforeFirst() throws Exception {
System.out.println(new Throwable().getStackTrace()[0]);
rowIdx = -1;
rowSource.beforeFirst();
}
An additional whitespace issue: if a file ends with a newline character, the parse function throws a ParseError without further details. Removing this newline from the original source fixes the issue, but I'd rather not have to fix code 'manually' before parsing. How can I circumvent this issue?
Alas, capturing whitespace with a concrete pattern is not a feature of the current version of Rascal. We used to have it, but now it's back on the TODO list. I can point you to papers about the topic if you are interested. So for now you have to deal with this "damage" later.
You could write a Tree-to-Tree transformation on the generic level (see ParseTree.rsc) to fix indentation issues in a parse tree after your transformation, or to re-insert the comments that you lost. This is about matching the Tree data-type and its appl constructors. The Tree format is a form of reflection on the parse trees of Rascal that allows any kind of transformation, including on whitespace and comments.
The parse error you talked about is caused by not using the start non-terminal. If you use parse(#start[CompilationUnit], ...) then whitespace and comments before and after the CompilationUnit are accepted.
I'm trying to re-create Tijs' CurryOn16 example "TrafoFields" scraping the code from the video, but using the Java18.rsc grammar instead of his Java15.rsc. I've parsed the Example.java successfully in the repl, like he did in the video, yielding a var pt. I then try to do the transformation with trafoFields(pt). The response I get is:
|project://Rascal-Test/src/TrafoFields.rsc|(235,142,<12,9>,<16,11>): Syntax error: concrete syntax fragment
My TrafoFields.rsc looks like this:
module TrafoFields
import lang::java::\syntax::Java18;
/**
* - Make public fields private
* - add getters and setters
*/
start[CompilationUnit] trafoFields(start[CompilationUnit] cu) {
return innermost visit (cu) {
case (ClassBody)`{
' <ClassBodyDeclaration* cs1>
' public <Type t> <ID f>;
' <ClassBodyDeclaration* cs2>
'}`
=> (ClassBody)`{
' <ClassBodyDeclaration* cs1>
' private <Type t> <ID f>;
' public void <ID setter>(<Type t> x) {
' this.<ID f> = x;
' }
' public <Type t> <ID getter>() {
' return this.<ID f>;
' }
' <ClassBodyDeclaration* cs2>
'}`
when
ID setter := [ID]"set<f>",
ID getter := [ID]"get<f>"
}
}
The only deviation from Tijs' code is that I've changed ClassBodyDec* to ClassBodyDeclaration*, as the grammar has this as a non-terminal. Any hint what else could be wrong?
UPDATE
More non-terminal re-writing adapting to Java18 grammar:
Id => ID
Ah yes, that is the Achilles' heel of concrete syntax usability: parse errors.
Note that a generalized parser (such as the GLL parser which Rascal uses) simulates "unlimited lookahead", so a parse error may be reported a few characters or even a few lines after the actual cause (but never before!). So shortening the example (delta debugging) will help localize the cause.
My way of working here is:
First replace all pattern holes by concrete Java snippets. I know Java, so I should be able to write a correct fragment that would have matched the holes.
If there is still a parse error, check the top non-terminal. Is it the one you needed? Also make sure there is no extra whitespace before the start and after the end of the fragment inside the backquotes. Still a parse error? Write a shorter fragment for a sub-non-terminal first.
Parse error solved? This means one of the pattern holes was not syntactically correct. The type of the hole is leading here: it should be one of the non-terminals used in the grammar, spelled literally, and of course at the right spot in the fragment. Add the holes back in one by one until you hit the error again. Then you know the cause and probably also the fix.
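The shortening step can also be mechanised. Here is a minimal sketch in plain Java (not Rascal), assuming a hypothetical stillFails predicate that stands in for "parsing this fragment still raises the parse error". It greedily drops one line at a time, which is a simplification of delta debugging and not the full algorithm; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FragmentShrinker {
    /**
     * Repeatedly removes single lines as long as the remaining fragment
     * still fails according to the (hypothetical) stillFails predicate.
     * Returns the input unchanged if no line can be removed.
     */
    static List<String> shrink(List<String> lines, Predicate<List<String>> stillFails) {
        List<String> current = new ArrayList<>(lines);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < current.size(); i++) {
                List<String> candidate = new ArrayList<>(current);
                candidate.remove(i);
                if (stillFails.test(candidate)) {
                    // the error survives without line i, so drop it and start over
                    current = candidate;
                    changed = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> fragment = Arrays.asList(
            "{",
            "  <ClassBodyDec* cs1>",
            "  public <Type t> <ID f>;",
            "}");
        // stand-in for a real parser call: pretend the unknown non-terminal is the culprit
        Predicate<List<String>> stillFails =
            ls -> ls.stream().anyMatch(l -> l.contains("ClassBodyDec*"));
        System.out.println(shrink(fragment, stillFails));
    }
}
```

In practice the predicate would call the real parser on the joined lines and report whether it throws; the smaller the surviving fragment, the closer you are to the offending hole.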
I have text that is already tokenized, sentence-split, and POS-tagged.
I would like to use CoreNLP to additionally annotate lemmas (lemma), named entities (ner), constituency and dependency parses (parse), and coreferences (dcoref).
Is there a combination of commandline options and option file specifications that makes this possible from the command line?
According to this question, I can ask the parser to view whitespace as delimiting tokens, and newlines as delimiting sentences by adding this to my properties file:
tokenize.whitespace = true
ssplit.eolonly = true
This works well, so all that remains is to specify to CoreNLP that I would like to provide POS tags too.
When using the Stanford Parser standing alone, it seems to be possible to have it use existing POS tags, but copying that syntax to the invocation of CoreNLP doesn't seem to work. For example, this does not work:
java -cp *:./* -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props my-properties-file -outputFormat xml -outputDirectory my-output-dir -sentences newline -tokenized -tagSeparator / -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerMethod newCoreLabelTokenizerFactory -file my-annotated-text.txt
While this question covers programmatic invocation, I'm invoking CoreNLP from the command line as part of a larger system, so I'm really asking whether this can be achieved with command-line options.
I don't think this is possible with command line options.
If you want, you could go the route of writing a custom annotator and including it in your pipeline.
Here is some sample code:
package edu.stanford.nlp.pipeline;
import edu.stanford.nlp.util.logging.Redwood;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.concurrent.MulticoreWrapper;
import edu.stanford.nlp.util.concurrent.ThreadsafeProcessor;
import java.util.*;
public class ProvidedPOSTaggerAnnotator {
public String tagSeparator;
public ProvidedPOSTaggerAnnotator(String annotatorName, Properties props) {
tagSeparator = props.getProperty(annotatorName + ".tagSeparator", "_");
}
public void annotate(Annotation annotation) {
for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
// split once on the literal separator (Pattern.quote avoids treating it as a regex)
String[] parts = token.word().split(java.util.regex.Pattern.quote(tagSeparator));
// the last part is the POS tag, everything before it is the word
String posTag = parts[parts.length - 1];
String[] wordParts = Arrays.copyOfRange(parts, 0, parts.length - 1);
String tokenString = String.join(tagSeparator, wordParts);
// set the word with the POS tag removed
token.set(CoreAnnotations.TextAnnotation.class, tokenString);
// set the POS
token.set(CoreAnnotations.PartOfSpeechAnnotation.class, posTag);
}
}
}
This should work if you provide your tokens with their POS tags separated by "_". You can change the separator with the forcedpos.tagSeparator property.
If you add customAnnotatorClass.forcedpos = edu.stanford.nlp.pipeline.ProvidedPOSTaggerAnnotator
to the properties file, include the above class in your CLASSPATH, and then include "forcedpos" in your list of annotators after "tokenize", you should be able to pass in your own POS tags.
I may clean this up some more and actually include it in future releases for people!
I have not had time to actually test this code. If you try it out and find errors, please let me know and I'll fix it!
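The word/tag split at the heart of annotate can be tried in isolation, without CoreNLP on the classpath. The following is only a sketch: the class and method names are made up for illustration, it splits at the last occurrence of the separator with plain string operations, and returning an empty tag for a token without a separator is my assumption, not something the annotator above does.

```java
class TaggedTokenSplitter {
    /**
     * Splits a token such as "well_known_JJ" at the LAST occurrence of the
     * separator and returns {word, tag}. Earlier separators stay in the word.
     */
    static String[] splitWordAndTag(String token, String tagSeparator) {
        int idx = token.lastIndexOf(tagSeparator);
        if (idx < 0) {
            // assumption: no separator means "no tag provided"
            return new String[] { token, "" };
        }
        String word = token.substring(0, idx);
        String tag = token.substring(idx + tagSeparator.length());
        return new String[] { word, tag };
    }

    public static void main(String[] args) {
        String[] parts = splitWordAndTag("well_known_JJ", "_");
        System.out.println(parts[0] + " / " + parts[1]); // prints "well_known / JJ"
    }
}
```

Because lastIndexOf works on literal text, separators such as "|" or "." behave the same way as "_" here.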
I'd like to run through a simple Rascal MPL parsing example, and am trying to follow Listing 1 from the Rascal Language Workbench (18531D.pdf) of May 3rd 2011. I've downloaded the current Rascal MPL version 0.5.1, and notice that a few module paths have changed. The following shows the content of my Entities.rsc:
module tut1::Entities
extend lang::std::Layout;
extend lang::std::Id;
extend Type;
start syntax Entities
= entities: Entity* entities;
syntax Entity
= @Foldable entity: "entity" Id name "{" Field* "}";
syntax Field
= field: Symbol Id name;
I'm assuming here that what was Name and Ident is now Id; and what was Type is now Symbol. I then continue as follows:
rascal>import tut1::Entities;
ok
rascal>import ParseTree;
ok
However, when I attempt to execute the crucial parse function, I receive the errors listed below. Where am I going wrong? (Despite the message I note that I can declare a Symbol variable at the Rascal prompt.)
rascal>parse(#Entities, "entity Person { string name integer age }");
Extending again?? ParseTree
Extending again?? Type
expanding parameterized symbols
generating stubs for regular
generating literals
establishing production set
generating item allocations
computing priority and associativity filter
printing the source code of the parser class
|prompt:///|(22,43,<1,22>,<1,65>): Java("Undeclared non-terminal: Symbol, in class: class org.rascalmpl.java.parser.object.$shell$")
org.rascalmpl.parser.gtd.SGTDBF.invokeExpects(SGTDBF.java:139)
org.rascalmpl.parser.gtd.SGTDBF.expandStack(SGTDBF.java:864)
org.rascalmpl.parser.gtd.SGTDBF.expand(SGTDBF.java:971)
org.rascalmpl.parser.gtd.SGTDBF.parse(SGTDBF.java:1032)
org.rascalmpl.parser.gtd.SGTDBF.parse(SGTDBF.java:1089)
org.rascalmpl.parser.gtd.SGTDBF.parse(SGTDBF.java:1082)
org.rascalmpl.interpreter.Evaluator.parseObject(Evaluator.java:493)
org.rascalmpl.interpreter.Evaluator.parseObject(Evaluator.java:544)
org.rascalmpl.library.Prelude.parse(Prelude.java:1644)
org.rascalmpl.library.Prelude.parse(Prelude.java:1637)
sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-2)
somewhere in: $shell$
The example is out-of-date. Something like this would work better:
module tut1::Entities
extend lang::std::Layout; // for spaces and such
extend lang::std::Id; // for the Id non-terminal
start syntax Entities
= entities: Entity* entities;
syntax Entity
= @Foldable entity: "entity" Id name "{" Field* "}";
syntax Field
= field: Id symbol Id name; // now Id is used instead of Symbol and "symbol" is just the name of a slot in the rule
Some explanation:
the extend of Type has gone; it was unnecessary and might be confusing
the definition of the Symbol non-terminal was missing. I don't know what it was supposed to do; it should have been defined with syntax Symbol = ..., but since it did not make sense to me, I reused Id to define the type of a field instead.
the type checker (under development) would have warned you before you used the parse function.
I've written a lexer and parser using scala.util.parsing.combinators.Parsers. I have a bug in at least one of my productions, but I have so many of them that it is difficult to eyeball them to determine the problem.
What I need is a log of every attempt my Parser makes to match the input with any production; logging all the Success and Failure objects when they are instantiated would be lovely. Unfortunately, the only way I can see to do this is to extend a lot of the basic classes provided by the library, and then rewrite my massive parser to extend the new classes.
Is there an easy way to get this logging behavior?
You could use the log combinator to wrap productions of your grammar. Here's the definition in Parsers.scala:
def log[T](p: => Parser[T])(name: String): Parser[T] = Parser{ in =>
println("trying "+ name +" at "+ in)
val r = p(in)
println(name +" --> "+ r)
r
}
Otherwise, I think you should be able to override success and failure, but it would be quite uninformative, since you won't know what production called them.
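To make the wrapping idea concrete without pulling in the Scala library, here is the same pattern sketched in plain Java. Everything in it (the Parser interface, literal, the TRACE list) is a made-up stand-in and not part of scala.util.parsing.combinator; the only point is that log takes a parser and a name and returns a parser that records each attempt and its outcome. It collects the trace in a list where the Scala version prints it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class LoggingParserSketch {
    /** Minimal stand-in for a parser: returns the remaining input on success, empty on failure. */
    interface Parser {
        Optional<String> parse(String in);
    }

    /** Collected trace lines (the Scala log combinator uses println instead). */
    static final List<String> TRACE = new ArrayList<>();

    /** Wraps p so that every attempt and its outcome are recorded under the given name. */
    static Parser log(Parser p, String name) {
        return in -> {
            TRACE.add("trying " + name + " at " + in);
            Optional<String> r = p.parse(in);
            TRACE.add(name + " --> " + (r.isPresent() ? "success, rest=" + r.get() : "failure"));
            return r;
        };
    }

    /** Parser that accepts exactly the given literal prefix. */
    static Parser literal(String s) {
        return in -> {
            if (in.startsWith(s)) {
                return Optional.of(in.substring(s.length()));
            }
            return Optional.empty();
        };
    }

    public static void main(String[] args) {
        Parser entity = log(literal("entity"), "entityKeyword");
        entity.parse("entity Person");
        entity.parse("typedef X");
        TRACE.forEach(System.out::println);
    }
}
```

Since the name is supplied at the wrapping site, the trace tells you which production was tried, which is exactly what overriding success and failure alone would not give you.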