Converting Stanford dependency relation to dot format - parsing

I am a newbie to this field. I have dependency relation in this form:
amod(clarity-2, sound-1)
nsubj(good-6, clarity-2)
cop(good-6, is-3)
advmod(good-6, also-4)
neg(good-6, not-5)
root(ROOT-0, good-6)
nsubj(ok-10, camera-8)
cop(ok-10, is-9)
ccomp(good-6, ok-10)
As mentioned in the linked answer, we have to convert these dependency relations to dot format and then use Graphviz to draw a 'dependency tree'. I am not able to understand how to pass these dependency relations to the toDotFormat() function of edu.stanford.nlp.semgraph.SemanticGraph. When I give the string 'amod(clarity-2, sound-1)' as input to toDotFormat(), I get output of the form digraph amod(clarity-2, sound-1) { }.
I am trying the solution given here: How to get a dependency tree with Stanford NLP parser.

You need to call toDotFormat on an entire dependency tree. How have you generated these dependency trees in the first place?
If you're using the StanfordCoreNLP pipeline, adding in the toDotFormat call is easy:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, depparse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "This is a sentence I want to parse.";
Annotation document = new Annotation(text);
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // this is the Stanford dependency graph of the current sentence
    SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
    System.out.println(dependencies.toDotFormat());
}
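To actually render the graphs, you can dump each sentence's dot output to a file and run Graphviz on it. A minimal sketch (the helper method and file name are mine, not part of the original answer):
import java.io.PrintWriter;
// Write one sentence's dependency graph to a .dot file, then render it
// from the shell, e.g. with "dot -Tpng deps.dot -o deps.png".
static void writeDot(SemanticGraph dependencies, String fileName) throws Exception {
    try (PrintWriter out = new PrintWriter(fileName, "UTF-8")) {
        out.println(dependencies.toDotFormat());
    }
}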

Related

Saxon - s9api - setParameter as node and access in transformation

We are trying to add parameters to a transformation at runtime. The only way we have found so far is to set every single parameter as an atomic value, not as a node. We don't know yet how to create a node to pass to setParameter.
Current setParameter:
trans.setParameter(new QName("TEST"), new XdmAtomicValue(24));
Expected setParameter:
<TempNode> <local>Value1</local> </TempNode>
We searched and tried to create an XdmNode and an XdmItem.
If you want to create an XdmNode by parsing XML, the best way to do it is:
DocumentBuilder db = processor.newDocumentBuilder();
XdmNode node = db.build(new StreamSource(
        new StringReader("<doc><elem/></doc>")));
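The resulting XdmNode can then be supplied as the parameter value. A short sketch tying this back to the question (the parameter name TempNode is taken from the question):
XdmNode node = db.build(new StreamSource(
        new StringReader("<TempNode><local>Value1</local></TempNode>")));
trans.setParameter(new QName("TempNode"), node);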
You could also pass a string containing lexical XML as the parameter value, and then convert it to a tree by calling the XPath parse-xml() function.
If you want to construct the XdmNode programmatically, there are a number of options:
DocumentBuilder.newBuildingStreamWriter() gives you an instance of BuildingStreamWriter, which extends XMLStreamWriter; you can create the document by writing events to it using methods such as writeStartElement, writeCharacters, and writeEndElement. At the end, call getDocumentNode() on the BuildingStreamWriter, which gives you an XdmNode. This has the advantage that XMLStreamWriter is a standard API, though not actually a very nice one, because the documentation isn't very good and as a result implementations vary in their behaviour.
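A minimal sketch of that approach, assuming the processor and DocumentBuilder from above (the element name and text are illustrative):
BuildingStreamWriter writer = processor.newDocumentBuilder().newBuildingStreamWriter();
// build <doc>abc</doc> by writing events, then retrieve the tree
writer.writeStartElement("doc");
writer.writeCharacters("abc");
writer.writeEndElement();
XdmNode node = writer.getDocumentNode();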
Another event-based API is Saxon's Push class; this differs from most push-based event APIs in that rather than having a flat sequence of methods like:
builder.startElement('x');
builder.characters('abc');
builder.endElement();
you have a nested sequence:
Element x = Document.elem('x');
x.text('abc');
x.close();
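With the real s9api Push interface (available since Saxon 9.9) that nested style looks roughly as follows; the exact method names here are from memory, so treat them as an assumption:
// Sketch: build a small document with the nested Push API and capture it as an XdmNode.
XdmDestination dest = new XdmDestination();
Push push = processor.newPush(dest);
Push.Document doc = push.document(true); // true = expect a single element root
Push.Element x = doc.element("x");
x.text("abc");
doc.close();
XdmNode node = dest.getXdmNode();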
As mentioned by Martin, there is the "sapling" API: Saplings.doc().withChild(elem(...).withChild(elem(...))), etc. This API is rather radically different from anything you might be familiar with (though it's influenced by the LINQ API for tree construction on .NET), but once you've got used to it, it reads very well. The Sapling API constructs a very lightweight tree in memory (hence the name), and converts it to a fully-fledged XDM tree with a final call of SaplingDocument.toXdmNode().
If you're familiar with DOM, JDOM2, or XOM, you can construct a tree using any of those libraries and then convert it for use by Saxon. That's a bit convoluted and only really intended for applications that are already using a third-party tree model heavily (or for users who love these APIs and prefer them to anything else).
In the Saxon Java s9api, you can construct temporary trees as SaplingNode/SaplingElement/SaplingDocument, see https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingDocument.html and https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingElement.html.
To give you a simple example constructing from a Map, as you seem to want to do:
Processor processor = new Processor();
Map<String, String> xsltParameters = new HashMap<>();
xsltParameters.put("foo", "value 1");
xsltParameters.put("bar", "value 2");
SaplingElement saplingElement = new SaplingElement("Test");
for (Map.Entry<String, String> param : xsltParameters.entrySet())
{
    saplingElement = saplingElement.withChild(new SaplingElement(param.getKey()).withText(param.getValue()));
}
XdmNode paramNode = saplingElement.toXdmNode(processor);
System.out.println(paramNode);
outputs e.g. <Test><bar>value 2</bar><foo>value 1</foo></Test>.
So the key is to understand that withChild() returns a new SaplingElement.
The code can be compacted using streams e.g.
XdmNode paramNode2 = Saplings.elem("root").withChild(
        xsltParameters
            .entrySet()
            .stream()
            .map(p -> Saplings.elem(p.getKey()).withText(p.getValue()))
            .toArray(SaplingElement[]::new))
    .toXdmNode(processor);
System.out.println(paramNode2);

Storing line number in ANTLR Parse Tree

Is there any way of storing line numbers in the created parse tree, using ANTLR 4? I came across this article, which does it, but I think it's for an older ANTLR version, because
parser.setASTFactory(factory);
does not seem to be applicable to ANTLR 4.
I am thinking of having something like treenode.getLine(), in the same way we have treenode.getChild().
With ANTLR 4, you normally implement either a listener or a visitor.
Both give you a context in which you can find the location of the tokens.
For example (with a visitor), I want to keep the location of an assignment defined by an uppercase identifier (UCASE_ID in my token definition).
The bit you're interested in is ...
ctx.UCASE_ID().getSymbol().getLine()
The visitor looks like ...
static class TypeAssignmentVisitor extends ASNBaseVisitor<TypeAssignment> {
    @Override
    public TypeAssignment visitTypeAssignment(TypeAssignmentContext ctx) {
        String reference = ctx.UCASE_ID().getText();
        int line = ctx.UCASE_ID().getSymbol().getLine();
        int column = ctx.UCASE_ID().getSymbol().getCharPositionInLine() + 1;
        Type type = ctx.type().accept(new TypeVisitor());
        TypeAssignment typeAssignment = new TypeAssignment();
        typeAssignment.setReference(reference);
        typeAssignment.setReferenceToken(new Token(line, column));
        typeAssignment.setType(type);
        return typeAssignment;
    }
}
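The listener route looks much the same. A sketch, reusing the (grammar-specific) context name from the visitor above; getStart() is how any rule context exposes its first token:
static class TypeAssignmentListener extends ASNBaseListener {
    @Override
    public void enterTypeAssignment(TypeAssignmentContext ctx) {
        // fully qualified to avoid clashing with the custom Token class above
        org.antlr.v4.runtime.Token start = ctx.getStart();
        int line = start.getLine();
        int column = start.getCharPositionInLine() + 1;
        System.out.println("TypeAssignment at " + line + ":" + column);
    }
}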
I was new to Antlr4 and found this useful to get started with listeners and visitors ...
https://github.com/JakubDziworski/AntlrListenerVisitorComparison/

Produce a document and return a scalar with one XsltTransformer object?

I have a method that transforms an XML document into an HTML document.
Processor saxProc = ...
XsltTransformer trans = ...
XdmNode source = saxProc.newDocumentBuilder().build(new StreamSource(xmlFile));
trans.setInitialContextNode(source);
Serializer out = saxProc.newSerializer(htmlFile);
out.setOutputProperty(Serializer.Property.METHOD, "html");
trans.setDestination(out);
trans.transform();
I now need this method to make available a new class member whose scalar value is the result of an XPath expression evaluated against the same source XML file.
Perhaps the best thing to do is create an additional XsltTransformer to return the scalar value?
But after reading the doc for setDestination and Destination, I'm wondering, should I investigate the possibility of defining an additional destination that can receive the scalar value during the existing transform?
If you want to use XPath against your input document, use an XPathSelector (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathSelector.html): call newXPathCompiler() on your Processor (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/Processor.html#newXPathCompiler--), compile the expression (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html#compile-java.lang.String-), and load the resulting executable (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathExecutable.html#load--).
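Put together, a sketch against the code in the question (the XPath expression itself is a placeholder):
// Evaluate an XPath expression against the already-built source document.
XPathCompiler xpc = saxProc.newXPathCompiler();
XPathSelector selector = xpc.compile("string(/doc/@title)").load();
selector.setContextItem(source);
XdmItem item = selector.evaluateSingle();
String scalar = item.getStringValue();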

CoreNLP: provide POS tags

I have text that is already tokenized, sentence-split, and POS-tagged.
I would like to use CoreNLP to additionally annotate lemmas (lemma), named entities (ner), constituency and dependency parses (parse), and coreference (dcoref).
Is there a combination of commandline options and option file specifications that makes this possible from the command line?
According to this question, I can ask the parser to view whitespace as delimiting tokens, and newlines as delimiting sentences by adding this to my properties file:
tokenize.whitespace = true
ssplit.eolonly = true
This works well, so all that remains is to specify to CoreNLP that I would like to provide POS tags too.
When using the Stanford Parser on its own, it seems to be possible to have it use existing POS tags, but copying that syntax to the invocation of CoreNLP doesn't seem to work. For example, this does not work:
java -cp *:./* -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props my-properties-file -outputFormat xml -outputDirectory my-output-dir -sentences newline -tokenized -tagSeparator / -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerMethod newCoreLabelTokenizerFactory -file my-annotated-text.txt
While this question covers programmatic invocation, I'm invoking CoreNLP from the command line as part of a larger system, so I'm really asking whether it is possible to achieve this with command-line options.
I don't think this is possible with command line options.
If you want, you can make a custom annotator and include it in your pipeline.
Here is some sample code:
package edu.stanford.nlp.pipeline;

import edu.stanford.nlp.util.logging.Redwood;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.util.concurrent.MulticoreWrapper;
import edu.stanford.nlp.util.concurrent.ThreadsafeProcessor;
import java.util.*;

public class ProvidedPOSTaggerAnnotator {

    public String tagSeparator;

    public ProvidedPOSTaggerAnnotator(String annotatorName, Properties props) {
        tagSeparator = props.getProperty(annotatorName + ".tagSeparator", "_");
    }

    public void annotate(Annotation annotation) {
        for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
            // the POS tag is the final tagSeparator-delimited field of the token
            int tagSeparatorSplitLength = token.word().split(tagSeparator).length;
            String posTag = token.word().split(tagSeparator)[tagSeparatorSplitLength - 1];
            String[] wordParts = Arrays.copyOfRange(token.word().split(tagSeparator), 0, tagSeparatorSplitLength - 1);
            String tokenString = String.join(tagSeparator, wordParts);
            // set the word with the POS tag removed
            token.set(CoreAnnotations.TextAnnotation.class, tokenString);
            // set the POS
            token.set(CoreAnnotations.PartOfSpeechAnnotation.class, posTag);
        }
    }
}
This should work if you provide your tokens with the POS tags separated by "_". You can change the separator with the forcedpos.tagSeparator property.
If you set customAnnotatorClass.forcedpos = edu.stanford.nlp.pipeline.ProvidedPOSTaggerAnnotator in the property file, include the above class in your CLASSPATH, and then include "forcedpos" in your list of annotators after "tokenize", you should be able to pass in your own POS tags.
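For example, the properties file might then contain something like this (a sketch; the exact annotator list beyond forcedpos is an assumption based on the question):
tokenize.whitespace = true
ssplit.eolonly = true
annotators = tokenize, forcedpos, ssplit, lemma, ner, parse, dcoref
customAnnotatorClass.forcedpos = edu.stanford.nlp.pipeline.ProvidedPOSTaggerAnnotator
forcedpos.tagSeparator = /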
I may clean this up some more and actually include it in future releases for people!
I have not had time to actually test this code out, so if you try it and find errors, please let me know and I'll fix it!

ELKI DBSCAN R* tree index

In the MiniGUI, I can see db.index. How do I set it to tree.spatial.rstarvariants.rstar.RStarTreeFactory via Java code?
I have implemented:
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID, tree.spatial.rstarvariants.rstar.RStarTreeFactory);
For the second parameter of the addParameter() call, the tree.spatial...RStarTreeFactory class is not found.
// Setup parameters:
ListParameterization params = new ListParameterization();
params.addParameter(
    FileBasedDatabaseConnection.Parameterizer.INPUT_ID,
    fileLocation);
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID,
    RStarTreeFactory.class);
I am getting NullPointerException. Did I use RStarTreeFactory.class correctly?
The ELKI command line (and the MiniGUI, which is a command line builder) allows you to specify shorthand class names, leaving out the package prefix of the implemented interface.
The full command line documentation yields:
-db.index <object_1|class_1,...,object_n|class_n>
Database indexes to add.
Implementing de.lmu.ifi.dbs.elki.index.IndexFactory
Known classes (default package de.lmu.ifi.dbs.elki.index.):
-> tree.spatial.rstarvariants.rstar.RStarTreeFactory
-> ...
I.e. for this parameter, the class prefix de.lmu.ifi.dbs.elki.index. may be omitted.
The full class name thus is:
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeFactory
or you can just type RStarTreeFactory, and let Eclipse auto-repair the import:
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID,
RStarTreeFactory.class);
// Bulk loading static data yields much better trees and is much faster, too.
params.addParameter(RStarTreeFactory.Parameterizer.BULK_SPLIT_ID,
SortTileRecursiveBulkSplit.class);
// Page size should fit your dimensionality.
// For 2-dimensional data, use page sizes less than 1000.
// Rule of thumb: 15...20 * (dim * 8 + 4) is usually reasonable
// (for in-memory bulk-loaded trees)
params.addParameter(AbstractPageFileFactory.Parameterizer.PAGE_SIZE_ID, 300);
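For completeness, here is how such a parameterization is typically turned into a database instance, following the pattern of the ELKI tutorials (StaticArrayDatabase is an assumption; pick the database class you actually use):
// Instantiate the database from the parameters; the indexes are built
// when the database is initialized.
Database db = ClassGenericsUtil.parameterizeOrAbort(StaticArrayDatabase.class, params);
db.initialize();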
See also: Geo Indexing example in the tutorial folder.
