I would like to:
1. add/remove/update elements/attributes/values in the "subTree"
2. be able to save the updated "targetDoc" back to the "target" file location
3. determine which tree model would be best for this XPath + tree-modification procedure
I thought I should somehow be able to get a MutableNodeInfo object, but I don't know how to do this. I tried processor.setConfigurationProperty(FeatureKeys.TREE_MODEL, Builder.LINKED_TREE), but this still gives me an underlying node of TinyElementImpl. I require XPath 2.0 to avoid having to enter default namespaces, which is why I am using Saxon s9api instead of Java's default DOM model. I would also like to avoid XSLT/XQuery if possible, because these tree modifications are built dynamically, which would make XSLT/XQuery more complicated in my situation.
public static void main(String[] args) {
// XML file namespace URIs
Hashtable<String, String> namespaceURIs = new Hashtable<>();
namespaceURIs.put("def", "http://www.cdisc.org/ns/def/v2.0");
namespaceURIs.put("xmlns", "http://www.cdisc.org/ns/odm/v1.3");
namespaceURIs.put("xsi", "http://www.w3.org/2001/XMLSchema-instance");
namespaceURIs.put("xlink", "http://www.w3.org/1999/xlink");
namespaceURIs.put("", "http://www.cdisc.org/ns/odm/v1.3");
// The source/target xml document
String target = "Path to file.xml";
// An xpath string
String xpath = "/ODM/Study/MetaDataVersion/ItemGroupDef[@OID/string()='IG.TA']";
Processor processor = new Processor(true);
// I thought this tells the processor to use something other than
// TinyTree
processor.setConfigurationProperty(FeatureKeys.TREE_MODEL,
Builder.LINKED_TREE);
DocumentBuilder builder = processor.newDocumentBuilder();
XPathCompiler xpathCompiler = processor.newXPathCompiler();
for (Entry<String, String> entry : namespaceURIs.entrySet()) {
xpathCompiler.declareNamespace(entry.getKey(), entry.getValue());
}
try {
XdmNode targetDoc = builder.build(Paths.get(target).toFile());
XPathSelector selector = xpathCompiler.compile(xpath).load();
selector.setContextItem(targetDoc);
XdmNode subTree = (XdmNode) selector.evaluateSingle();
// The following prints: class
// net.sf.saxon.tree.tiny.TinyElementImpl
System.out.println(subTree.getUnderlyingNode().getClass());
/*
* Here, is where I would like to modify subtree and save modified doc
*/
} catch (SaxonApiException e) {
e.printStackTrace();
}
}
I think you can supply a DOM node to Saxon and run XPath against it, but in that case you don't use the DocumentBuilder for Saxon's native trees; instead you build a DOM using javax.xml.parsers.DocumentBuilder, and once you have a W3C DOM node you can supply it to Saxon using the wrap method of a Saxon DocumentBuilder. Here is sample code taken from the file S9APIExamples.java in the Saxon 9.6 resources file:
// Build the DOM document
File file = new File("data/books.xml");
DocumentBuilderFactory dfactory = DocumentBuilderFactory.newInstance();
dfactory.setNamespaceAware(true);
javax.xml.parsers.DocumentBuilder docBuilder;
try {
docBuilder = dfactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
throw new SaxonApiException(e);
}
Document doc;
try {
doc = docBuilder.parse(new InputSource(file.toURI().toString()));
} catch (SAXException e) {
throw new SaxonApiException(e);
} catch (IOException e) {
throw new SaxonApiException(e);
}
// Compile the XPath Expression
Processor proc = new Processor(false);
DocumentBuilder db = proc.newDocumentBuilder();
XdmNode xdmDoc = db.wrap(doc);
XPathCompiler xpath = proc.newXPathCompiler();
XPathExecutable xx = xpath.compile("//ITEM/TITLE");
// Run the XPath Expression
XPathSelector selector = xx.load();
selector.setContextItem(xdmDoc);
for (XdmItem item : selector) {
XdmNode node = (XdmNode) item;
org.w3c.dom.Node element = (org.w3c.dom.Node) node.getExternalNode();
System.out.println(element.getTextContent());
}
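From there, since getExternalNode() hands back the mutable org.w3c.dom.Node, you could modify the DOM in place (for example in the loop body above) and then serialize the whole document back to its file using only JDK classes. A minimal sketch, with placeholder element/attribute names that are not taken from any real schema:
// Inside the loop: the matched node is a live, mutable DOM element
org.w3c.dom.Element matched = (org.w3c.dom.Element) node.getExternalNode();
matched.setAttribute("Comment", "updated");                  // placeholder attribute
org.w3c.dom.Element child = doc.createElement("NewChild");   // placeholder element
child.setTextContent("some value");
matched.appendChild(child);
// After the loop: write the modified DOM back out to the original location
javax.xml.transform.Transformer transformer =
        javax.xml.transform.TransformerFactory.newInstance().newTransformer();
transformer.transform(new javax.xml.transform.dom.DOMSource(doc),
        new javax.xml.transform.stream.StreamResult(file));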
There are also samples showing how to use Saxon with JDOM and other mutable tree implementations but I think you need Saxon PE or EE to have direct support for those.
The MutableNodeInfo interface in Saxon is designed very specifically to meet the needs of XQuery Update, and I would advise against trying to use it directly from Java; the implementation isn't likely to be robust when handling method calls other than those made by XQuery Update.
In fact, it's generally true that the Saxon NodeInfo interface is designed as a target for XPath rather than for user-written Java code. I would therefore suggest using a third-party tree model; the ones I like best are JDOM2 and XOM. Both of these allow you to mix direct Java navigation and update with XPath 2.0 navigation using Saxon.
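To make that concrete, here is a rough sketch of the JDOM2 route, assuming a Saxon setup in which the JDOM2 object model is available to s9api's wrap() (per the note above, that may mean Saxon-PE/EE or the optional net.sf.saxon.option.jdom2 classes); the attribute modification at the end is just an illustrative placeholder:
// Parse the file with JDOM2 so the tree stays mutable
org.jdom2.Document jdomDoc = new org.jdom2.input.SAXBuilder().build(new java.io.File(target));
// Wrap it for Saxon and run the XPath 2.0 expression
Processor proc = new Processor(false);
XdmNode wrapped = proc.newDocumentBuilder().wrap(jdomDoc);
XPathCompiler xp = proc.newXPathCompiler();
xp.declareNamespace("", "http://www.cdisc.org/ns/odm/v1.3");
XPathSelector sel = xp.compile("/ODM/Study/MetaDataVersion/ItemGroupDef[@OID = 'IG.TA']").load();
sel.setContextItem(wrapped);
XdmNode hit = (XdmNode) sel.evaluateSingle();
// Drop back to the mutable JDOM2 element, update it, and save the document
org.jdom2.Element element = (org.jdom2.Element) hit.getExternalNode();
element.setAttribute("Comment", "updated");   // placeholder modification
try (java.io.FileOutputStream out = new java.io.FileOutputStream(target)) {
    new org.jdom2.output.XMLOutputter(org.jdom2.output.Format.getPrettyFormat()).output(jdomDoc, out);
}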
I want to parse any complex Swagger API document (swagger.json) into Java objects, perhaps a List<Map<String, Object>>.
What are the available options?
I am trying io.swagger.parser.SwaggerParser, but I want to make sure I know the other available options and that I use a parser that can handle any complex document.
Currently we are trying the following:
public List<Map<String,Object>> parse(String swaggerDocString) throws SwaggerParseException{
try{
Swagger swagger = new SwaggerParser().parse(swaggerDocString);
return processSwagger(swagger);
}catch(Exception ex){
String exceptionRefId=OSGUtil.getExceptionReferenceId();
logger.error("exception ref id " + exceptionRefId + " : Error while loading swagger file " + ex);
throw new SwaggerParseException("", ex.getLocalizedMessage(),exceptionRefId);
}
}
public List<Map<String,Object>> processSwagger(Swagger swagger){
List<Map<String,Object>> finalResult=new ArrayList<>();
Map<String, Model> definitions = swagger.getDefinitions();
// loop all the available paths of the swagger
if (swagger.getPaths() != null && swagger.getPaths().keySet() != null && swagger.getPaths().keySet().size() > 0) {
swagger.getPaths().keySet().forEach(group->{
//get the path
Path path=swagger.getPath(group);
//list all the operations of the path
Map<HttpMethod,Operation> mapList=path.getOperationMap();
mapList.forEach((httpMethod,operation)->{
processPathData(finalResult,operation,path,group,httpMethod,definitions,group);
});
});
}
return finalResult;
}
What are the differences between swagger-compat-spec-parser and swagger-parser?
Swagger has implementations for all major technologies:
https://swagger.io/tools/open-source/open-source-integrations/
Details on parsing Swagger into Java are here:
https://github.com/swagger-api/swagger-parser/tree/v1
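For reference, a minimal sketch of reading a specification with swagger-parser v1; the file name swagger.json is just an assumed local path:
import io.swagger.models.Swagger;
import io.swagger.parser.SwaggerParser;

public class ParserDemo {
    public static void main(String[] args) {
        // read() accepts a URL or a file system path
        Swagger swagger = new SwaggerParser().read("swagger.json");
        if (swagger != null) {
            // list the paths declared in the document
            swagger.getPaths().keySet().forEach(System.out::println);
        }
    }
}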
Is there an interface available in Apache Jena similar to ModelBuilder in RDF4J?
I can see ModelMaker in Jena, but that does not seem to be the same kind of builder.
The following function, written using RDF4J, is what needs to be implemented in Jena:
public static org.eclipse.rdf4j.model.Model convertGraph2RDFModel(Graph graph, String label) {
ModelBuilder builder = new ModelBuilder();
GraphTraversalSource t = graph.traversal();
GraphTraversal<Vertex, Vertex> hasLabel = t.V().hasLabel(label);
Vertex s;
if(hasLabel.hasNext()){
s = hasLabel.next();
extractModelFromVertex(builder, s);
}
return builder.build();
}
private static void extractModelFromVertex(ModelBuilder builder, Vertex s) {
builder.subject(s.label());
Iterator<VertexProperty<String>> propertyIter = s.properties();
while (propertyIter.hasNext()){
VertexProperty<String> property = propertyIter.next();
builder.add(property.label(), property.value());
}
Iterator<Edge> edgeIter = s.edges(Direction.OUT);
Edge edge;
Stack<Vertex> vStack = new Stack<Vertex>();
while(edgeIter.hasNext()){
edge = edgeIter.next();
s = edge.inVertex();
builder.add(edge.label(), s.label());
vStack.push(s);
}
Iterator<Vertex> vIterator = vStack.iterator();
while(vIterator.hasNext()){
s = vIterator.next();
extractModelFromVertex(builder,s);
}
}
I don't know if Jena has similar functionality, but you could of course just continue using the RDF4J ModelBuilder, serialize its output Model to, say, a Turtle or TriG string (or file), then use Jena to read it in again.
org.eclipse.rdf4j.model.Model m = ...; // RDF4J Model built by the ModelBuilder
java.io.Writer writer = new StringWriter();
org.eclipse.rdf4j.rio.Rio.write(m, writer, RDFFormat.TRIG);
String serialized = writer.toString();
// Use Jena's parser to read the string back in.
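To make that last step concrete, a minimal sketch of the Jena side (an assumption on my part: the ModelBuilder only wrote default-graph statements, so the TriG output contains no named graphs and can be read straight into a Jena Model):
// Parse the serialized string with Jena; "TRIG" is one of Jena's registered language names
org.apache.jena.rdf.model.Model jenaModel = org.apache.jena.rdf.model.ModelFactory.createDefaultModel();
jenaModel.read(new java.io.StringReader(serialized), null, "TRIG");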
Or alternatively just iterate over the RDF4J model and convert each statement directly (without serializing and deserializing in between):
org.eclipse.rdf4j.model.Model rdf4jModel = ...; // RDF4J Model built by the ModelBuilder
org.apache.jena.rdf.model.Model jenaModel = ...; // (empty) Jena model to receive converted rdf4j model
rdf4jModel.forEach(stmt -> jenaModel.add(convert(stmt)));
...
public org.apache.jena.rdf.model.Statement convert(
org.eclipse.rdf4j.model.Statement stmt) {
... // create a Jena statement from the RDF4J one.
}
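A possible sketch of that convert method, assuming the graph only contains IRIs and plain literals (blank nodes, language tags, and datatypes would need extra handling):
public org.apache.jena.rdf.model.Statement convert(org.eclipse.rdf4j.model.Statement stmt) {
    // Subject and predicate: assume IRIs
    org.apache.jena.rdf.model.Resource s =
            org.apache.jena.rdf.model.ResourceFactory.createResource(stmt.getSubject().stringValue());
    org.apache.jena.rdf.model.Property p =
            org.apache.jena.rdf.model.ResourceFactory.createProperty(stmt.getPredicate().stringValue());
    // Object: literal or IRI
    org.eclipse.rdf4j.model.Value obj = stmt.getObject();
    org.apache.jena.rdf.model.RDFNode o = (obj instanceof org.eclipse.rdf4j.model.Literal)
            ? org.apache.jena.rdf.model.ResourceFactory.createPlainLiteral(((org.eclipse.rdf4j.model.Literal) obj).getLabel())
            : org.apache.jena.rdf.model.ResourceFactory.createResource(obj.stringValue());
    return org.apache.jena.rdf.model.ResourceFactory.createStatement(s, p, o);
}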
I'll admit it's probably easier to settle on using a single framework in most applications, but there's no fundamental reason you can't use bits of RDF4J and Jena in combination.
This is my scenario: we are building a routing system using Neo4j and the spatial plugin. We start from an OSM file, read it, and import nodes and relationships into our graph (a custom graph model).
Without the Neo4j batch inserter, importing a compressed OSM file (around 140 MB compressed, around 2 GB uncompressed) takes around 3 days on a dedicated server (CentOS 6.5 64-bit, quad core, 8 GB RAM). Please note that most of that time goes into creating the Neo4j nodes and relationships; if we read the same file without doing anything with Neo4j, it is read in around 7 minutes (I am sure about this because our process reads the file twice: first to store the correct OSM node IDs, and then again to create the Neo4j graph).
Obviously we need to improve the import process, so we are trying to use the BatchInserter. So far, so good (I still need to measure how much faster the BatchInserter is, but I expect an improvement); the first thing I did was try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using Osmosis to read the OSM file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
private static File basePath = new File("/home/angelo/Scrivania/neo4j");
private static File dbPath = new File(basePath, "db");
private GraphDatabaseService graphDb;
private BatchInserter batchInserter;
// private BatchInserterIndexProvider batchIndexService;
private SpatialDatabaseService spatialDb;
private SimplePointLayer spl;
static
{
NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
NEO4J_CFG.put( "dump_configuration", "true" );
}
@Override
public void initialize(Map<String, Object> arg0)
{
batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
spatialDb = new SpatialDatabaseService(graphDb);
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
//batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
}
@Override
public void complete()
{
// TODO Auto-generated method stub
}
@Override
public void release()
{
// TODO Auto-generated method stub
}
@Override
public void process(EntityContainer ec)
{
Entity entity = ec.getEntity();
if (entity instanceof Node) {
Node osmNodo = (Node)entity;
org.neo4j.graphdb.Node graphNode = graphDb.createNode();
graphNode.setProperty("osmId", osmNodo.getId());
graphNode.setProperty("latitudine", osmNodo.getLatitude());
graphNode.setProperty("longitudine", osmNodo.getLongitude());
spl.add(graphNode);
} else if (entity instanceof Way) {
//do something with the way
} else if (entity instanceof Relation) {
//do something with the relation
}
}
}
Then I wrote the following test case:
public class BatchInserterTest
{
private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());
@Test
public void batchInserter()
{
File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
try
{
boolean pbf = false;
CompressionMethod compression = CompressionMethod.None;
if (file.getName().endsWith(".pbf"))
{
pbf = true;
}
else if (file.getName().endsWith(".gz"))
{
compression = CompressionMethod.GZip;
}
else if (file.getName().endsWith(".bz2"))
{
compression = CompressionMethod.BZip2;
}
RunnableSource reader;
if (pbf)
{
reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
}
else
{
reader = new XmlReader(file, false, compression);
}
reader.setSink(new BatchInserterSinkTest());
Thread readerThread = new Thread(reader);
readerThread.start();
while (readerThread.isAlive())
{
try
{
readerThread.join();
}
catch (InterruptedException e)
{
/* do nothing */
}
}
}
catch (Exception e)
{
logger.error("Errore nella creazione di neo4j con batchInserter", e);
}
}
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
at java.lang.Thread.run(Thread.java:744)
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the BatchInserter in my case? I have to add the created nodes to the SimplePointLayer, so how can I create that layer using the BatchInserter graph database service?
Is there any simple sample?
Any tip is really appreciated.
cheers
Angelo
The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.
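A rough sketch of that two-pass approach, assuming the Neo4j 2.0.x APIs used in the question and the same property names; transaction batching, relationship creation, and error handling are omitted, the osmNodes collection is a placeholder for the nodes parsed by Osmosis, and the layer.add(Node) call mirrors the one already used in the question's Sink:
// Pass 1: create the raw nodes with the batch inserter (no spatial indexing yet)
BatchInserter inserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
List<Long> createdNodeIds = new ArrayList<>();
for (Node osmNode : osmNodes) {                       // osmNodes: Osmosis Node entities (placeholder)
    Map<String, Object> props = new HashMap<>();
    props.put("osmId", osmNode.getId());
    props.put("latitudine", osmNode.getLatitude());
    props.put("longitudine", osmNode.getLongitude());
    createdNodeIds.add(inserter.createNode(props));
}
inserter.shutdown();
// Pass 2: reopen the store with the embedded API and add the nodes to the spatial layer
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(dbPath.getAbsolutePath());
try (Transaction tx = db.beginTx()) {
    SpatialDatabaseService spatial = new SpatialDatabaseService(db);
    SimplePointLayer layer = spatial.createSimplePointLayer("testBatch", "latitudine", "longitudine");
    for (long id : createdNodeIds) {
        layer.add(db.getNodeById(id));
    }
    tx.success();
}
db.shutdown();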
I use Lucene.NET for my site, and it indexes some words fine, but it doesn't index some words like "الله"!
I have inspected the index with Luke, and it shows that "الله" is not indexed.
I have used ArabicAnalyzer for indexing.
You can see my site at www.qoranic.com; if you search for "مریم" it works, but if you search for "الله" it finds nothing.
Any idea is appreciated.
The ArabicAnalyzer applies some transformations to that input; it will transform the input الله to له. This is due to the use of the ArabicStemFilter (and ArabicStemmer), which is documented with ...
Stemming is defined as:
Removal of attached definite article, conjunction, and prepositions.
Stemming of common suffixes.
This shouldn't be an issue, since you should be passing the user-provided query through the same analyzer when searching, producing the same tokens.
Here's the sample code I used to see what terms an analyzer produced from a given input.
using System;
using Lucene.Net.Analysis.AR;
using Lucene.Net.Analysis.Tokenattributes;
using System.IO;
namespace ConsoleApplication {
public static class Program {
public static void Main() {
var luceneVersion = Lucene.Net.Util.Version.LUCENE_30;
var input = "الله";
var analyzer = new ArabicAnalyzer(luceneVersion);
var inputReader = new StringReader(input);
var stream = analyzer.TokenStream("fieldName", inputReader);
var termAttribute = stream.GetAttribute<ITermAttribute>();
while(stream.IncrementToken()) {
Console.WriteLine("Term: {0}", termAttribute.Term);
}
Console.WriteLine("Done.");
Console.ReadLine();
}
}
}
You can overcome this behavior (remove the stemming) by writing a custom Analyzer which uses the ArabicNormalizationFilter, just as ArabicAnalyzer does, but without the call to ArabicStemFilter.
public class CustomAnalyzer : Analyzer {
public override TokenStream TokenStream(String fieldName, TextReader reader) {
TokenStream result = new ArabicLetterTokenizer(reader);
result = new LowerCaseFilter(result);
result = new ArabicNormalizationFilter(result);
return result;
}
}
We are using JAXB in conjunction with the StAX XMLEventReader API to parse and extract data from XML retrieved by making a REST call.
InputStream responseStream = response.getEntityInputStream();
if (responseStream != null)
{
XMLInputFactory xmlif = XMLInputFactory.newInstance();
// stax API
XMLEventReader xmler = xmlif.createXMLEventReader(new InputStreamReader(responseStream));
EventFilter filter = new EventFilter() {
public boolean accept(XMLEvent event) {
return event.isStartElement();
}
};
XMLEventReader xmlfer = xmlif.createFilteredReader(xmler, filter);
xmlfer.nextEvent();
// use jaxb
JAXBContext ctx = JAXBContext.newInstance(Summary.class);
Unmarshaller um = ctx.createUnmarshaller();
while (xmlfer.peek() != null) {
JAXBElement<CustomObject> se = um.unmarshal(xmler,
CustomObject.class);
CustomObject customObject = se.getValue();
}
responseStream.close();
} else {
logger.error("InputStream response from API is null. No data to process");
}
response.close();
}
So basically we parse using StAX first, then unmarshal the content using JAXB, which unmarshals it into the CustomObject type. We do other things with this CustomObject later.
However, we ran into an issue when this chunk of code executes on JBoss AS 6.1.0.Final.
We get an exception saying: The declaration for the entity "HTML.version" must end with '>'.
It appears that either StAX or JAXB is validating against a DTD/XSD. The XSD is defined on the same server to which the REST call is made.
Our understanding is that because we are using the Sun StAX implementation and not Woodstox, there is no inherent DTD/XSD validation, so the error cannot come from the StAX call.
Is that correct?
If the issue is not a validation failure in StAX, it has to be JAXB.
However I cannot do the following:
um.setValidating(false);
because setValidating is a deprecated method.
Any ideas/suggestions on how to go about this? Is our hypothesis correct? Is this a known JBoss issue, perhaps?
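For what it's worth, the StAX factory does expose DTD-related switches, so one way to test the hypothesis is to turn off DTD support on the XMLInputFactory before creating the reader. A hedged sketch using the standard javax.xml.stream properties (not a confirmed fix for the JBoss behaviour):
XMLInputFactory xmlif = XMLInputFactory.newInstance();
// Ask the parser not to process DTDs or resolve external entities
xmlif.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
xmlif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
XMLEventReader xmler = xmlif.createXMLEventReader(new InputStreamReader(responseStream));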