Produce a document and return a scalar with one XsltTransformer object? - saxon

I have a method that transforms an XML document into an HTML document.
Processor saxProc = ...
XsltTransformer trans = ...
XdmNode source = saxProc.newDocumentBuilder().build(new StreamSource(xmlFile));
trans.setInitialContextNode(source);
Serializer out = saxProc.newSerializer(htmlFile);
out.setOutputProperty(Serializer.Property.METHOD, "html");
trans.setDestination(out);
trans.transform();
I now need this method to expose a new class member whose scalar value is the result of an XPath expression evaluated against the same source XML file.
Perhaps the best thing to do is create an additional XsltTransformer to return the scalar value?
But after reading the doc for setDestination and Destination, I'm wondering, should I investigate the possibility of defining an additional destination that can receive the scalar value during the existing transform?

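If you want to use XPath against your input document, use an XPathSelector (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathSelector.html). To get one, call newXPathCompiler() on your Processor (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/Processor.html#newXPathCompiler--), then compile(String) on the resulting XPathCompiler (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html#compile-java.lang.String-), and finally load() on the resulting XPathExecutable (http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/XPathExecutable.html#load--). You can then set the XdmNode you already built as the context item with setContextItem(source) and read the scalar result with evaluateSingle().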

Related

Find function/method body explicit dependency types using Dart analyzer package

I would like to understand how I can analyze method and function bodies to find the types that are explicitly referenced in them. I have had success analyzing method declarations (return type, parameter types, etc.), but I have no idea how to do the same for the body.
Assuming following function:
String someFunction(int param) {
final list = <String>['a', 'b', 'c']; // -> DartTypes: String, List<String>
final myClass = MyClass<Arg>(); // -> DartTypes: Arg, MyClass<Arg>
final functionCall = anotherFunction<FunctionArg<Arg>>(); // -> DartTypes: Arg, FunctionArg<Arg>
return 'result';
}
// At this point I would like to know that my function depends on
// String, List<String>, Arg, MyClass<Arg>, FunctionArg<Arg>
// in terms of DartType instances with proper typeArguments.
I tried getting AstNode for method element described here: https://stackoverflow.com/a/57043177/2033394
However I could not get elements from nodes to figure out their types. Their declaredElement values are always null. So I can not get back to Element API from AST API.
If you've used the exact snippet from the answer you've referenced, the problem is likely in getParsedLibraryByElement(). This method only parses the referenced library - meaning that you'll get an AST that doesn't necessarily have semantic references (like the declaredElement of AST nodes) set.
Instead, you'll want to use getResolvedLibraryByElement. The AST returned by that method will have its types and references fully resolved.
With the resolved AST, you could visit the body of the method with a custom visitor to find type references. Your definition of "referenced types" isn't really exact - but perhaps you can collect types in visitNamedType for type references and visitVariableDeclaration to collect the types of variables.

Saxon - s9api - setParameter as node and access in transformation

We are trying to add parameters to a transformation at runtime. The only way we have found so far is to set every parameter individually rather than passing a node. We don't yet know how to create a node for setParameter.
Current setParameter:
QName TEST XdmAtomicValue 24
Expected setParameter:
<TempNode> <local>Value1</local> </TempNode>
We searched and tried to create an XdmNode and an XdmItem.
If you want to create an XdmNode by parsing XML, the best way to do it is:
DocumentBuilder db = processor.newDocumentBuilder();
XdmNode node = db.build(new StreamSource(
new StringReader("<doc><elem/></doc>")));
You could also pass a string containing lexical XML as the parameter value, and then convert it to a tree by calling the XPath parse-xml() function.
If you want to construct the XdmNode programmatically, there are a number of options:
DocumentBuilder.newBuildingStreamWriter() gives you an instance of BuildingStreamWriter, which extends XMLStreamWriter, and you can create the document by writing events to it using methods such as writeStartElement, writeCharacters, writeEndElement; at the end call getDocumentNode() on the BuildingStreamWriter, which gives you an XdmNode. This has the advantage that XMLStreamWriter is a standard API, though it's not actually a very nice one, because the documentation isn't very good and as a result implementations vary in their behaviour.
Another event-based API is Saxon's Push class; this differs from most push-based event APIs in that rather than having a flat sequence of methods like:
builder.startElement('x');
builder.characters('abc');
builder.endElement();
you have a nested sequence:
Element x = Document.elem('x');
x.text('abc');
x.close();
As mentioned by Martin, there is the "sapling" API: Saplings.doc().withChild(elem(...).withChild(elem(...))) etc. This API is rather radically different from anything you might be familiar with (though it's influenced by the LINQ API for tree construction on .NET) but once you've got used to it, it reads very well. The Sapling API constructs a very light-weight tree in memory (hence the name), and converts it to a fully-fledged XDM tree with a final call of SaplingDocument.toXdmNode().
If you're familiar with DOM, JDOM2, or XOM, you can construct a tree using any of those libraries and then convert it for use by Saxon. That's a bit convoluted and only really intended for applications that are already using a third-party tree model heavily (or for users who love these APIs and prefer them to anything else).
In the Saxon Java s9api, you can construct temporary trees as SaplingNode/SaplingElement/SaplingDocument, see https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingDocument.html and https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingElement.html.
To give you a simple example constructing from a Map, as you seem to want to do:
Processor processor = new Processor();
Map<String, String> xsltParameters = new HashMap<>();
xsltParameters.put("foo", "value 1");
xsltParameters.put("bar", "value 2");
SaplingElement saplingElement = new SaplingElement("Test");
for (Map.Entry<String, String> param : xsltParameters.entrySet())
{
saplingElement = saplingElement.withChild(new SaplingElement(param.getKey()).withText(param.getValue()));
}
XdmNode paramNode = saplingElement.toXdmNode(processor);
System.out.println(paramNode);
outputs e.g. <Test><bar>value 2</bar><foo>value 1</foo></Test>.
So the key is to understand that withChild() returns a new SaplingElement.
The code can be compacted using streams e.g.
XdmNode paramNode2 = Saplings.elem("root").withChild(
xsltParameters
.entrySet()
.stream()
.map(p -> Saplings.elem(p.getKey()).withText(p.getValue()))
.collect(Collectors.toList())
.toArray(SaplingElement[]::new))
.toXdmNode(processor);
System.out.println(paramNode2);

How can I write FlowFile attributes to Avro metadata inside the FlowFile's content?

I am creating FlowFiles that are manipulated and split downstream after being emitted by an ExecuteSql processor. I have populated the FlowFiles' attributes with data that I want to put into the Avro metadata contained within each FlowFile's content.
How can I do this?
I've tried using an UpdateRecord processor configured with an AvroReader and AvroRecordSetWriter and a property with a key of /canary that should be writing a FlowFile attribute to that key somewhere in the Avro document. It does not appear anywhere in the output, though.
It would be acceptable to move the records in the Avro data to a subkey and have a metadata section be a part of the record data. I would prefer not to do this, though, because it does not seem like the correct solution and because it sounds much more complex than simply modifying the Avro metadata.
The record-aware processors (and the Readers/Writers) are not metadata-aware, meaning they cannot currently (as of NiFi 1.5.0) act on metadata in any way (inspect, create, delete, etc.), so UpdateRecord won't work for metadata per se. With your /canary property key, it will try to insert a field named canary at the top level of your Avro record, with the value you specify. However, I believe your output schema needs to have the canary field added at the top level, or it may be ignored (I'm not positive about this; you can check the output schema to see if it is added automatically).
There is currently no NiFi processor that can update Avro metadata explicitly (MergeContent handles some metadata when merging Avro files together, but it doesn't let you set a value of your choosing). However, I have an unpolished Groovy script you could use in ExecuteScript to add metadata to Avro files in NiFi 1.5.0+. In ExecuteScript, set the language to Groovy and use the following as the Script Body, then add user-defined (aka "dynamic") properties to ExecuteScript, where the key will be the metadata key and the evaluated value (the properties support Expression Language) will be the value:
@Grab('org.apache.avro:avro:1.8.1')
import org.apache.avro.*
import org.apache.avro.file.*
import org.apache.avro.generic.*
def flowFile = session.get()
if(!flowFile) return
try {
// Save off dynamic property values for metadata key/values later
def metadata = [:]
context.properties.findAll {e -> e.key.dynamic}.each {k,v -> metadata.put(k.name, context.getProperty(k).evaluateAttributeExpressions(flowFile).value.bytes)}
flowFile = session.write(flowFile, {inStream, outStream ->
DataFileStream<GenericRecord> reader = new DataFileStream<>(inStream, new GenericDatumReader<GenericRecord>())
DataFileWriter<GenericRecord> writer = new DataFileWriter<>(new GenericDatumWriter<GenericRecord>())
def schema = reader.schema
def inputCodec = reader.getMetaString(DataFileConstants.CODEC) ?: DataFileConstants.NULL_CODEC
// Forward the existing metadata to the output
reader.metaKeys.each { key ->
if (!DataFileWriter.isReservedMeta(key)) {
byte[] metadatum = reader.getMeta(key)
writer.setMeta(key, metadatum)
}
}
// For each dynamic property, set the key/value pair as Avro metadata
metadata.each {k,v -> writer.setMeta(k,v)}
writer.setCodec(CodecFactory.fromString(inputCodec))
writer.create(schema, outStream)
writer.appendAllFrom(reader, false)
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
} catch(e) {
log.error('Error adding Avro metadata, penalizing flow file and routing to failure', e)
flowFile = session.penalize(flowFile)
session.transfer(flowFile, REL_FAILURE)
}
Note that this script can work with versions of NiFi previous to 1.5.0, but the @Grab at the top is not supported until 1.5.0, so you'd have to download Avro and its dependencies into a flat folder, and point to that in the Module Directory property of ExecuteScript.

Converting Stanford dependency relation to dot format

I am a newbie to this field. I have dependency relation in this form:
amod(clarity-2, sound-1)
nsubj(good-6, clarity-2)
cop(good-6, is-3)
advmod(good-6, also-4)
neg(good-6, not-5)
root(ROOT-0, good-6)
nsubj(ok-10, camera-8)
cop(ok-10, is-9)
ccomp(good-6, ok-10)
As mentioned in the links, we have to convert this dependency relation to dot format and then use Graphviz to draw a 'dependency tree'. I don't understand how to pass this dependency relation to the toDotFormat() function of edu.stanford.nlp.semgraph.SemanticGraph. When I give the string 'amod(clarity-2, sound-1)' as input to toDotFormat(), I get output of the form digraph amod(clarity-2, sound-1) { }.
I am trying the solution given in "How to get a dependency tree with Stanford NLP parser".
You need to call toDotFormat on an entire dependency tree. How have you generated these dependency trees in the first place?
If you're using the StanfordCoreNLP pipeline, adding in the toDotFormat call is easy:
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, depparse");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String text = "This is a sentence I want to parse.";
Annotation document = new Annotation(text);
pipeline.annotate(document);
// these are all the sentences in this document
// a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
// this is the Stanford dependency graph of the current sentence
SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
System.out.println(dependencies.toDotFormat());
}
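If all you have is the printed relations from the question (and no SemanticGraph to call toDotFormat() on), another route is to build the dot text by hand. The following is only a minimal sketch using plain string processing, not CoreNLP: the class name DepsToDot and the method toDot are hypothetical, and it assumes every line has the form relation(governor-N, dependent-M) as in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Stanford CoreNLP.
public class DepsToDot {
    // Matches lines such as "amod(clarity-2, sound-1)": relation(governor, dependent).
    private static final Pattern DEP =
            Pattern.compile("^\\s*([^\\s(]+)\\((.+), (\\S+)\\)\\s*$");

    // Escapes backslashes and double quotes so a token is safe inside a quoted dot ID.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds a Graphviz digraph with one labelled edge per relation; non-matching lines are skipped. */
    public static String toDot(List<String> lines) {
        StringBuilder sb = new StringBuilder("digraph deps {\n");
        for (String line : lines) {
            Matcher m = DEP.matcher(line);
            if (!m.matches()) {
                continue; // not a relation line
            }
            sb.append("  \"").append(esc(m.group(2))).append("\" -> \"")
              .append(esc(m.group(3))).append("\" [label=\"")
              .append(esc(m.group(1))).append("\"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Relations copied from the question.
        System.out.print(toDot(Arrays.asList(
                "amod(clarity-2, sound-1)",
                "nsubj(good-6, clarity-2)",
                "cop(good-6, is-3)",
                "root(ROOT-0, good-6)")));
    }
}
```

The printed text can be saved to a file and rendered with Graphviz's dot tool. Edges point from governor to dependent, which is an assumption about how you want the tree drawn.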

Saxon XPath API returns TinyElementImpl instead of org.w3c.dom.Node

I have the following code:
// xpath evaluates to net.sf.saxon.xpath.XPathEvaluator
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expression = xpath.compile("/foo/bar");
Object evaluate = expression.evaluate(someXML, XPathConstants.NODE);
Object evaluate2 = expression.evaluate(someXML, XPathConstants.NODESET);
System.out.println(evaluate!=null?evaluate.getClass():"null");
System.out.println(evaluate2!=null?evaluate2.getClass():"null2");
System.out.println(evaluate instanceof Node);
System.out.println(evaluate2 instanceof NodeList);
and this is the result...
class net.sf.saxon.tinytree.TinyElementImpl
class java.util.ArrayList
false
false
Just to clarify, if I do this:
org.w3c.dom.Node node = (org.w3c.dom.Node)evaluate;
or
org.w3c.dom.NodeList node = (org.w3c.dom.NodeList)evaluate2;
I get a ClassCastException
How can that be? According to Sun's Java 1.5 API, NODE and NODESET should map to org.w3c.dom.Node and org.w3c.dom.NodeList respectively.
Just to clarify further: yes, I know Node is an interface, and that getClass() returns a concrete class.
Ok I figured it out!
If the evaluate method receives an InputSource the above error occurs.
e.g.
InputSource someXML = new InputSource(new StringReader("<someXML>...</someXML>"));
Object result = expression.evaluate(someXML, XPathConstants.NODE);
Node node = (Node) result; // ClassCastException
In that case, result is a TinyElementImpl, which does not implement org.w3c.dom.Node.
But if evaluate receives a Node (or a Document):
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = builderFactory.newDocumentBuilder();
Document someXML = documentBuilder.parse(new InputSource(new StringReader("<someXML>...</someXML>")));
Object result = expression.evaluate(someXML, XPathConstants.NODE);
Node node = (Node) result; // works
It works, but still, this is weird...
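For reference, the DOM-input variant can be written as a complete program. This is only a sketch: the class name DomXPathDemo, the method selectNode, and the sample XML are made up for illustration, and it has only been reasoned through against the JDK's built-in XPath implementation (no Saxon on the classpath).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample XML are illustrative only.
public class DomXPathDemo {
    /** Parses the XML into a DOM Document first, then evaluates the XPath against that Document. */
    public static Node selectNode(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The context item is a DOM node, so the cast to org.w3c.dom.Node is expected to succeed.
            return (Node) xpath.compile(path).evaluate(doc, XPathConstants.NODE);
        } catch (Exception e) {
            // Wrap parser and XPath checked exceptions to keep the sketch short.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Node bar = selectNode("<foo><bar>hello</bar></foo>", "/foo/bar");
        System.out.println(bar.getNodeName());
        System.out.println(bar.getTextContent());
    }
}
```

Passing an InputSource instead leaves it to the XPath implementation to choose its own tree model, which is where the TinyElementImpl in the question comes from.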
Try this code:
Object evaluate = expression.evaluate(someXML, XPathConstants.NODE);
System.out.println(evaluate instanceof Node);
System.out.println(NodeOverNodeInfo.wrap((NodeInfo) evaluate) instanceof Node);
It prints:
false
true
The returned object is of type NodeInfo, so you need to wrap it as a real Node before you can access its methods:
Node n = NodeOverNodeInfo.wrap((NodeInfo) evaluate);
System.out.println(n.getNodeName());
System.out.println(n.getTextContent());
It's a bit odd, this one. The Saxon javadoc says that TinyElementImpl doesn't implement any of the org.w3c.dom interfaces, and yet you're getting them back from the XPath evaluation.
My guess is that Saxon eschews the standard DOM model in favour of its own one. I suspect that the XPathConstants.NODE that you pass to evaluate is really just a hint. It's permitted for XPath expressions to return any old thing (for example, Apache JXPath uses XPath expressions to query java objects graphs), so it's permitted for Saxon to return its own DOM types rather than org.w3c standard ones.
Solution: either use the Saxon DOM types as returned, or don't use Saxon.
Node is an interface. You have to have a concrete class for implementation. And getClass() returns that concrete class.
Edit in response to comment:
Sorry, I didn't pay attention to the instanceof. Looking at the source code, it appears that TinyNodeImpl doesn't implement org.w3c.dom.Node. And looking at the JDK docs, it appears that it doesn't have to: the doc for javax.xml.xpath.XPath refers you to XPathConstants for the result type, and that in turn refers to "the XPath 1.0 NodeSet data type" (which, if you look at the XPath 1.0 spec, is not defined).
So, it seems that returns from the XPath API are only required to be consistent when used within that API. Not exactly what you wanted to hear, I'm sure. Can you use the built-in JDK implementation? I know that it returns org.w3c.dom objects.
kdgregory is correct that Node is just an interface, and TinyElementImpl implements that interface. expression.evaluate() can't return an instance of Node; it has to return a concrete class which implements Node.
It might be useful to point out that you can use an instance of TinyElementImpl as a Node, and you can easily cast instances of TinyElementImpl to Node.
For example, this should work just fine:
Node result = (Node) expression.evaluate(someXML, XPathConstants.NODE);
You can then use result by calling any of the methods of Node, and by passing it to any method which accepts a Node.

Resources