Jena Fuseki: Adding temporary triples to execution context during SPARQL-query with property functions - jena

I've been following the ARQ guide on Property Functions. The section on Graph Operations concludes with "New Triples or Graphs can therefore be created as part of the Property Function", and I've been hoping to use this as a way to add triples to the current query execution context (without persisting them) so that they are accessible for the remainder of the query.
I've been trying the code snippets in that section of the guide:
DatasetGraph datasetGraph = execCxt.getDataset();
Node otherGraphNode = NodeFactory.createURI("http://example.org/otherGraph");
Graph newGraph = new SimpleGraphMaker().createGraph();
Triple triple = ...
newGraph.add(triple);
datasetGraph.addGraph(otherGraphNode, newGraph);
but I'm running into issues, seemingly with the read-lock.
org.apache.jena.dboe.transaction.txn.TransactionException: Can't become a write transaction
at org.apache.jena.dboe.transaction.txn.Transaction.ensureWriteTxn(Transaction.java:251) ~[fuseki-server.jar:4.2.0]
at org.apache.jena.tdb2.store.StorageTDB.ensureWriteTxn(StorageTDB.java:200) ~[fuseki-server.jar:4.2.0]
at org.apache.jena.tdb2.store.StorageTDB.add(StorageTDB.java:81) ~[fuseki-server.jar:4.2.0]
at org.apache.jena.dboe.storage.system.DatasetGraphStorage.add(DatasetGraphStorage.java:181) ~[fuseki-server.jar:4.2.0]
at org.apache.jena.dboe.storage.system.DatasetGraphStorage.lambda$addGraph$1(DatasetGraphStorage.java:194) ~[fuseki-server.jar:4.2.0]
Is there any way to add triples to the execution context during a SPARQL query?

Yes, and SPARQL Anything uses that capability to triplify non-RDF data at query time and makes it available in the execution context's DatasetGraph.
Here is an example of that being done:
if (this.execCxt.getDataset().isEmpty()) {
// we only need to call getDatasetGraph() if we have an empty one
// otherwise we could triplify the same data multiple times
dg = getDatasetGraph(p, opBGP);
} else {
dg = this.execCxt.getDataset();
}
These lines are from the SPARQL Anything source. This answer might not address your particular need (adding individual triples), but hopefully some of the code in the project can serve as an example of what you are looking for.
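To show the add in isolation: the stack trace in the question comes from the TDB2 storage refusing an update inside the query's read transaction, not from addGraph() as such. A minimal standalone sketch (placeholder URIs; DatasetGraphFactory.createGeneral() chosen here as an in-memory, non-transactional dataset, and GraphFactory used in place of SimpleGraphMaker):
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphFactory;
import org.apache.jena.sparql.graph.GraphFactory;

public class AddGraphSketch {
    public static void main(String[] args) {
        // General-purpose in-memory dataset; accepts addGraph() without a TDB2 write transaction.
        DatasetGraph datasetGraph = DatasetGraphFactory.createGeneral();

        Node otherGraphNode = NodeFactory.createURI("http://example.org/otherGraph");
        Graph newGraph = GraphFactory.createDefaultGraph();

        // Example triple with placeholder URIs.
        newGraph.add(Triple.create(
                NodeFactory.createURI("http://example.org/s"),
                NodeFactory.createURI("http://example.org/p"),
                NodeFactory.createURI("http://example.org/o")));

        datasetGraph.addGraph(otherGraphNode, newGraph);
        System.out.println("Named graphs: " + datasetGraph.size());
    }
}
Whether the same add is possible against the dataset Fuseki hands to the property function depends on that dataset's storage and the transaction it is running in, which is exactly what the exception in the question is about.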

Related

Saxon - s9api - setParameter as node and access in transformation

We are trying to add parameters to a transformation at runtime. The only way we have found so far is to set every single parameter individually rather than as a node. We don't know yet how to create a node for setParameter.
Current setParameter:
QName TEST XdmAtomicValue 24
Expected setParameter:
<TempNode> <local>Value1</local> </TempNode>
We searched and tried to create an XdmNode and an XdmItem.
If you want to create an XdmNode by parsing XML, the best way to do it is:
DocumentBuilder db = processor.newDocumentBuilder();
XdmNode node = db.build(new StreamSource(
new StringReader("<doc><elem/></doc>")));
You could also pass a string containing lexical XML as the parameter value, and then convert it to a tree by calling the XPath parse-xml() function.
If you want to construct the XdmNode programmatically, there are a number of options:
DocumentBuilder.newBuildingStreamWriter() gives you an instance of BuildingStreamWriter, which extends XMLStreamWriter; you can create the document by writing events to it using methods such as writeStartElement, writeCharacters, and writeEndElement, and at the end call getDocumentNode() on the BuildingStreamWriter, which gives you an XdmNode. This has the advantage that XMLStreamWriter is a standard API, though it's not actually a very nice one, because the documentation isn't very good and as a result implementations vary in their behaviour.
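A short sketch of that first option, reusing the TempNode/local element names from the question (method names as per the s9api javadoc; the surrounding class and exception handling are mine):
import javax.xml.stream.XMLStreamException;

import net.sf.saxon.s9api.BuildingStreamWriter;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

public class BuildWithStreamWriter {
    public static void main(String[] args) throws SaxonApiException, XMLStreamException {
        Processor processor = new Processor(false);
        DocumentBuilder db = processor.newDocumentBuilder();

        // Write events; the writer accumulates them into an XDM tree.
        BuildingStreamWriter writer = db.newBuildingStreamWriter();
        writer.writeStartDocument();
        writer.writeStartElement("TempNode");
        writer.writeStartElement("local");
        writer.writeCharacters("Value1");
        writer.writeEndElement();
        writer.writeEndElement();
        writer.writeEndDocument();

        XdmNode paramNode = writer.getDocumentNode();
        System.out.println(paramNode);
    }
}
The resulting XdmNode can then be supplied as the value in setParameter on the transformer.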
Another event-based API is Saxon's Push class; this differs from most push-based event APIs in that rather than having a flat sequence of methods like:
builder.startElement("x");
builder.characters("abc");
builder.endElement();
you have a nested sequence:
Element x = Document.elem("x");
x.text("abc");
x.close();
As mentioned by Martin, there is the "sapling" API: Saplings.doc().withChild(elem(...).withChild(elem(...)) etc. This API is rather radically different from anything you might be familiar with (though it's influenced by the LINQ API for tree construction on .NET) but once you've got used to it, it reads very well. The Sapling API constructs a very light-weight tree in memory (hence the name), and converts it to a fully-fledged XDM tree with a final call of SaplingDocument.toXdmNode().
If you're familiar with DOM, JDOM2, or XOM, you can construct a tree using any of those libraries and then convert it for use by Saxon. That's a bit convoluted and only really intended for applications that are already using a third-party tree model heavily (or for users who love these APIs and prefer them to anything else).
In the Saxon Java s9api, you can construct temporary trees as SaplingNode/SaplingElement/SaplingDocument, see https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingDocument.html and https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/sapling/SaplingElement.html.
To give you a simple example constructing from a Map, as you seem to want to do:
Processor processor = new Processor();
Map<String, String> xsltParameters = new HashMap<>();
xsltParameters.put("foo", "value 1");
xsltParameters.put("bar", "value 2");
SaplingElement saplingElement = new SaplingElement("Test");
for (Map.Entry<String, String> param : xsltParameters.entrySet()) {
    saplingElement = saplingElement.withChild(
            new SaplingElement(param.getKey()).withText(param.getValue()));
}
XdmNode paramNode = saplingElement.toXdmNode(processor);
System.out.println(paramNode);
outputs e.g. <Test><bar>value 2</bar><foo>value 1</foo></Test>.
So the key is to understand that withChild() returns a new SaplingElement.
The code can be compacted using streams e.g.
XdmNode paramNode2 = Saplings.elem("root").withChild(
        xsltParameters
            .entrySet()
            .stream()
            .map(p -> Saplings.elem(p.getKey()).withText(p.getValue()))
            .collect(Collectors.toList())
            .toArray(SaplingElement[]::new))
    .toXdmNode(processor);
System.out.println(paramNode2);

Jena read hook not invoked upon duplicate import read

My problem will probably be explained better with code.
Consider the snippet below:
// First read
OntModel m1 = ModelFactory.createOntologyModel();
RDFDataMgr.read(m1,uri0);
m1.loadImports();
// Second read (from the same URI)
OntModel m2 = ModelFactory.createOntologyModel();
RDFDataMgr.read(m2,uri0);
m2.loadImports();
where uri0 points to a valid RDF file describing an ontology model with n imports.
and the following custom ReadHook (which has been set in advance):
@Override
public String beforeRead(Model model, String source, OntDocumentManager odm) {
    System.out.println("BEFORE READ CALLED: " + source);
    return source; // pass the source through unchanged (the original snippet omitted the return)
}
Global FileManager and OntDocumentManager are used with the following settings:
processImports = true;
caching = true;
If I run the snippet above, the model will be read from uri0 and beforeRead will be invoked exactly n times (once for each import).
However, in the second read, beforeRead won't be invoked even once.
How, and what should I reset in order for Jena to invoke beforeRead in the second read as well?
What I have tried so far:
At first I thought it was due to caching being on, but turning it off or clearing it between the first and second read didn't do anything.
I have also tried removing all ignoredImport records from m1. Nothing changed.
I finally got to solve this. The problem was in ModelFactory.createOntologyModel(). Ultimately, this gets translated to ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RDFS_INF, null).
All ontology models created with the static OntModelSpec.OWL_MEM_RDFS_INF share its ImportsModelMaker and some of its other objects, which results in shared state. Apparently, this state prevented the read hook from being invoked a second time for the same imports.
This can be prevented by creating a custom, independent and non-static OntModelSpec instance and using it when creating an OntModel, for example:
new OntModelSpec( ModelFactory.createMemModelMaker(), new OntDocumentManager(), RDFSRuleReasonerFactory.theInstance(), ProfileRegistry.OWL_LANG );
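A sketch of how that spec might be wired up, reusing uri0 and the custom ReadHook from the question (the variable myReadHook is mine; setReadHook and setProcessImports are the OntDocumentManager setters as I recall them, so double-check against the javadoc):
// Independent document manager: no state shared with the global one or with OWL_MEM_RDFS_INF.
OntDocumentManager docMgr = new OntDocumentManager();
docMgr.setProcessImports(true);
docMgr.setReadHook(myReadHook); // the custom ReadHook shown in the question

OntModelSpec spec = new OntModelSpec(
        ModelFactory.createMemModelMaker(),
        docMgr,
        RDFSRuleReasonerFactory.theInstance(),
        ProfileRegistry.OWL_LANG);

OntModel m1 = ModelFactory.createOntologyModel(spec);
RDFDataMgr.read(m1, uri0);
m1.loadImports(); // beforeRead fires for each import

// Building a second spec (and document manager) the same way for the second read
// keeps its import state independent, so the hook fires again.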

Is it possible to keep track of the state when processing logs by DataFlow?

Is it possible to have the Dataflow process maintain state? There are log-processing tools that allow for this by providing fast-access (proprietary / in-memory) files that the real-time process can use to keep track of state on the logs while processing them.
A use case example would be tracking registration steps taken by users. The registration steps would come in different logs, and the data from those logs would be assembled by the real-time process into one final database record (for each registered user) that is written to a database.
Can my Dataflow code keep track of the many registration steps (streaming input) per user, and once a user's registration steps are complete, have the Dataflow process write the record to the database (one record per user)?
I don't know much about the Dataflow architecture. It must be using some (proprietary / in-memory NoSQL) data storage to keep track of the things it needs to track (e.g. when it computes the top 100 customers). Is that fast-access data storage also available to the Dataflow processes to use?
Thanks
As danielm said, state is not yet exposed. The good news is you may not need it for your use case.
If you have a PCollection<KV<UserId, LogEvent>> you can use a CombineFn and Combine.perKey to take all of the LogEvents for a specific UserId and combine them into a single output. The CombineFn tells Dataflow how to create an accumulator, update it by incorporating input elements, and then extract a final output. Transforms like Top actually use a CombineFn (with a Heap as the accumulator) rather than an actual state API.
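A rough sketch of such a CombineFn (Apache Beam package name used here; LogEvent, RegistrationRecord, and the idea of a set of completed steps are hypothetical stand-ins, and coder registration is omitted):
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

import org.apache.beam.sdk.transforms.Combine;

// Hypothetical event/record types for the registration use case.
class LogEvent implements Serializable {
    String step;
}

class RegistrationRecord implements Serializable {
    Set<String> completedSteps = new HashSet<>();
}

// Accumulates all registration steps seen for one user into a single record.
class AssembleRegistration
        extends Combine.CombineFn<LogEvent, RegistrationRecord, RegistrationRecord> {

    @Override
    public RegistrationRecord createAccumulator() {
        return new RegistrationRecord();
    }

    @Override
    public RegistrationRecord addInput(RegistrationRecord acc, LogEvent event) {
        acc.completedSteps.add(event.step);
        return acc;
    }

    @Override
    public RegistrationRecord mergeAccumulators(Iterable<RegistrationRecord> accs) {
        RegistrationRecord merged = new RegistrationRecord();
        for (RegistrationRecord acc : accs) {
            merged.completedSteps.addAll(acc.completedSteps);
        }
        return merged;
    }

    @Override
    public RegistrationRecord extractOutput(RegistrationRecord acc) {
        return acc; // write this to the database in a downstream ParDo
    }
}
It would be applied with Combine.perKey(new AssembleRegistration()) on the PCollection<KV<UserId, LogEvent>>, and a downstream ParDo could write each completed record to the database.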
If your events are of different types, you can still do something like this. For instance, if you have two logs, you can do:
PCollection<KV<UserId, LogEvent1>> events1 = ...;
PCollection<KV<UserId, LogEvent2>> events2 = ...;

// Create tuple tags for the value types in each collection.
final TupleTag<LogEvent1> tag1 = new TupleTag<LogEvent1>();
final TupleTag<LogEvent2> tag2 = new TupleTag<LogEvent2>();

// Merge collection values into a CoGbkResult collection.
PCollection<KV<UserId, CoGbkResult>> coGbkResultCollection =
    KeyedPCollectionTuple.of(tag1, events1)
        .and(tag2, events2)
        .apply(CoGroupByKey.<UserId>create());

// Access results and do something.
PCollection<T> finalResultCollection =
    coGbkResultCollection.apply(ParDo.of(
        new DoFn<KV<UserId, CoGbkResult>, T>() {
          @Override
          public void processElement(ProcessContext c) {
            KV<UserId, CoGbkResult> e = c.element();
            // Get all LogEvent1 values.
            Iterable<LogEvent1> event1s = e.getValue().getAll(tag1);
            // There will only be one LogEvent2.
            LogEvent2 event2 = e.getValue().getOnly(tag2);
            ... Do Something to compute T ....
            c.output(...some T...);
          }
        }));
The above example was adapted from the docs on CoGroupByKey, which have more information.
Dataflow does not currently expose the underlying state mechanism that it uses. However, this is definitely on the radar for a future update.

ELKI DBSCAN R* tree index

In the MiniGUI, I can see db.index. How do I set it to tree.spatial.rstarvariants.rstar.RStarTreeFactory via Java code?
I have implemented:
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID,tree.spatial.rstarvariants.rstar.RStarTreeFactory);
For the second parameter of the addParameter() function, the tree.spatial...RStarTreeFactory class is not found.
// Setup parameters:
ListParameterization params = new ListParameterization();
params.addParameter(
FileBasedDatabaseConnection.Parameterizer.INPUT_ID,
fileLocation);
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID,
RStarTreeFactory.class);
I am getting a NullPointerException. Did I use RStarTreeFactory.class correctly?
The ELKI command line (and the MiniGUI, which is a command-line builder) allows you to specify shorthand class names, leaving out the package prefix of the implemented interface.
The full command line documentation yields:
-db.index <object_1|class_1,...,object_n|class_n>
Database indexes to add.
Implementing de.lmu.ifi.dbs.elki.index.IndexFactory
Known classes (default package de.lmu.ifi.dbs.elki.index.):
-> tree.spatial.rstarvariants.rstar.RStarTreeFactory
-> ...
I.e. for this parameter, the class prefix de.lmu.ifi.dbs.elki.index. may be omitted.
The full class name thus is:
de.lmu.ifi.dbs.elki.index.tree.spatial.rstarvariants.rstar.RStarTreeFactory
or you just type RStarTreeFactory, and let Eclipse auto-repair the import:
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID,
RStarTreeFactory.class);
// Bulk loading static data yields much better trees and is much faster, too.
params.addParameter(RStarTreeFactory.Parameterizer.BULK_SPLIT_ID,
SortTileRecursiveBulkSplit.class);
// Page size should fit your dimensionality.
// For 2-dimensional data, use page sizes less than 1000.
// Rule of thumb: 15...20 * (dim * 8 + 4) is usually reasonable
// (for in-memory bulk-loaded trees)
params.addParameter(AbstractPageFileFactory.Parameterizer.PAGE_SIZE_ID, 300);
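For completeness, here is roughly how such a parameterization gets turned into an initialized database in the ELKI tutorials (StaticArrayDatabase and ClassGenericsUtil as used there; package paths are from the 0.6/0.7 line, so adjust for your version). Note that the indexes are only built once initialize() is called:
import de.lmu.ifi.dbs.elki.database.Database;
import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase;
import de.lmu.ifi.dbs.elki.utilities.ClassGenericsUtil;

// Instantiate the database from the parameters above and build the indexes.
Database db = ClassGenericsUtil.parameterizeOrAbort(StaticArrayDatabase.class, params);
db.initialize();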
See also: Geo Indexing example in the tutorial folder.

Not able to access iterator() from RestTraverser, it gives exception java.lang.IllegalAccessError

I am implementing the Traversal Framework using the neo4j java-rest-binding project.
Code is as follows:
RestAPI db = new RestAPIFacade("http://localhost:7474/db/data");
RestNode n21 = db.getNodeById(21);
Map<String,Object> traversalDesc = new HashMap<String, Object>();
traversalDesc.put("order", "breadth_first");
traversalDesc.put("uniqueness", "node_global");
traversalDesc.put("uniqueness", "relationship_global");
traversalDesc.put("returnType", "fullpath");
traversalDesc.put("max_depth", 2);
RestTraverser traverser = db.traverse(n21, traversalDesc);
Iterable<Node> nodes = traverser.nodes();
System.out.println("All Nodes:"); // First Task
for(Node n:nodes){
System.out.println(n.getId());
}
Iterable<Relationship> rels = traverser.relationships();
System.out.println("All Relations:"); // Second Task
for(Relationship r:rels){
System.out.println(r.getId());
}
Iterator<Path> paths = traverser.iterator(); // Third Task
while(paths.hasNext()){
System.out.println(paths.next());
}
I need to do 3 tasks as commented in the code:
Print all the node IDs related to node no. 21
Print all the relation IDs related to node no. 21
Traverse all the paths related to node no. 21
Tasks 1 & 2 are working fine.
But when I try to do traverser.iterator() in 3rd task it throws an Exception saying:
java.lang.IllegalAccessError: tried to access class org.neo4j.helpers.collection.WrappingResourceIterator from class org.neo4j.rest.graphdb.traversal.RestTraverser
Can anyone please check why this is happening, or, if I am doing something wrong, tell me the right way to do it?
Thanks in advance.
I don't believe using the Neo4j Traversal Framework via the REST DB binding is properly supported, nor is it advisable. If you traverse via REST, each node and each relationship will be retrieved over the network as the traversal proceeds, incurring a tremendous overhead for the traversal.
Edit: The above is not true, the REST traverser is smarter than I thought.
In general, it will be faster to use Cypher, and access the Neo4j Server using JDBC. Read more about JDBC here: https://github.com/neo4j-contrib/neo4j-jdbc
If you really want to use the Traversal Framework, you should use Server Extensions, which allow you to design a traversal to run on the server itself, and then only move the result of the traversal over the network. Read more about server extensions here: http://docs.neo4j.org/chunked/stable/server-unmanaged-extensions.html
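A rough sketch of the Cypher-over-JDBC route for the first two tasks (the driver class, URL scheme, and {1} parameter syntax are taken from the neo4j-jdbc README as I remember it for that era, so treat them as assumptions and check the README for your driver version):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CypherOverJdbc {
    public static void main(String[] args) throws Exception {
        Class.forName("org.neo4j.jdbc.Driver"); // assumed driver class per the neo4j-jdbc README
        try (Connection con = DriverManager.getConnection("jdbc:neo4j://localhost:7474/")) {
            // Nodes and relationships around node 21, fetched in one round trip.
            String cypher = "MATCH (n)-[r]-(m) WHERE id(n) = {1} "
                    + "RETURN id(m) AS nodeId, id(r) AS relId";
            try (PreparedStatement stmt = con.prepareStatement(cypher)) {
                stmt.setLong(1, 21); // node no. 21 from the question
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println("node " + rs.getLong("nodeId")
                                + " via relationship " + rs.getLong("relId"));
                    }
                }
            }
        }
    }
}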
