Retrieving statement in Jena by its unique ID - jena

I'm building a REST API which will serve information about statements stored in my Jena TDB.
It would be great if each statement has its unique ID so I can use this ID in GET request to retrieve information about particular statement. Is there something like that in Jena?
I know I can retrieve statement(s) by providing appropriate subject/predicate/object identifiers to model.listStatements method, but it would be quite ugly to add these parameters to API GET requests.

In RDF, a triple is defined by its subject, object and predicate. If you have two triples with the same S/P/O, it is really the same triple (value-equality, not instance equality). An RDF graph is a set of triples; if you add a triple twice, the set has only one instance. There is no triple id concept in RDF, and there isn't internally in TDB.

So you could use unique identifiers, say a string of length 4, for every S, every P and every O. Just save them all as key/value (id/resource, id/property) pairs. Then you will have a string of length 12 as unique identifier of your statement.
Even if a statements is deleted and added again, leading to a different id when tagging every statement with an id, this method will yield the same statement every time.

Related

How to handle cypher query common stanzas

I'm writing a bunch of queries in order to build a tree inside Neo4j, but in order to add different types of new data, I'm writing the same opening stanzas for each of my queries.
Example: I want to be able to add Root(identifier=Root1)->A(identifier=1)->B(identifier=2)... without modifying the trees pointed to by other roots.
All of my queries start off with
Match
(root:`Root` {identifier=$identifier})
Create
(root)-[:`someRel`]->(a:`A` {identifier=$a_identifier})
Then some time passes and A needs a child:
Match
(root:`Root` {identifier=$identifier})
-[:`someRel`]->
(a:`A` {identifier=$a_identifier})
Create
(a)-[:`someOtherRel`]->(b:`B` {identifier=$b_identifier})
Then some other time passes and maybe B needs a child, and I have to use the same opening stanza to get to A and then add another one to get the correct B.
Is there some functionality that I'm missing that will allow me to not have to build up those opening stanzas every time I want to get to the correct B, (or C or D) or do I just need to do this using string concatenation?
String concatenation example: (python)
MATCH
{ROOT_LOOKUP_STANZA},
{A_LOOKUP_STANZA},
{B_LOOKUP_STANZA},
CREATE
(b)-[:`c_relationship`]->(c:`C` {...})
Some additional notes:
Root Nodes have to be uniquely identified
The rest of the nodes have to be uniquely identified with their parents. So the following is valid:
Root(root)->A(a)->B(b)
Root(root)->A(a1)->B(b)
In this case B(b) references two different nodes because their parents are different.
So your main problem is that children do not have unique ids, only the root nodes have unique ids. Neo4j does not have have any mechanic (yet) to carry the final context of one query into the start of another, and makes no guarantee that a nodes internal id will be the same between queries. So for your data as is, you must match the whole chain to be sure you match the correct node to append to. There are a few things you can do make this not necessary though.
Add a UUID
By adding a universally unique id to each node (, and indexing that property,) you will be able to match on that id with the guarantee that there are no collisions and it will be the same across queries. Any time using a nodes internal ID would be useful, that is a good sign you could use UUID's in your data. (Also helps if the data is mirrored to other databases)
Store the path as a Unique ID
It's possible you don't know the UUID assigned in Neo4j (because it's not in the source data), but in a tree you can create a unique ID in the format of <parent-ID>_<index><sorted-labels><source-id>. The idea here is that the parent is guaranteed to have a unique id, and you combine that id with the info that makes this child unique to that parent. This allows you do generate a deterministic unique ID. (Requires a tree data structure, with a unique root id) In most cases, you can probably leave the index part out (that is for cases of lists/arrays in the source data). In essence, you are storing the path from the root node to this node as the nodes unique id. (Again, you will want an index on this id)
Batch the job
If this is all part of just one job, another option is pool the changes you want to make, and generate one cypher that will do all of them while Neo4j already has everything fetched.

How to match nodes only by value? (not defining specific property)

I want to search nodes by value only, which can be in any of node properties. I know that this is an expensive operation, but nodes will be cut off by some relationship conditions.
I want something like this:
MATCH (n: {*:"Search value"})
RETURN n
Where * imply "any property".
Is there a way to do this?
interesting tidbits can be found in this abstract regarding this topic and why it might not be implemented
https://docs.google.com/document/d/1FPfGkgzhcRXVkleBLBsA92U94Mx4yafu3nO-Xf-NzsE/edit#heading=h.pyvdg2rbofq
Semantics of dynamic key expressions
Using a dynamic key expression like <mapExpr>[<keyExpr>] requires that <mapExpr> evaluates to a map or an entity, and that <keyExpr> evaluates to a string. If this is not the case, a type error is produced either at compile time or at runtime.
If this is given, evaluating <mapExpr>[<keyExpr>] first evaluates keyExpr to a string value (the key), and then evaluates <mapExpr> to a map-like value (the map). Finally the result of <mapExpr>[<keyExpr>] is computed by performing a lookup of the key in the map. If the key is found, the associated value becomes the result. If the key is not found, <mapExpr>[<keyExpr>] evaluates to NULL.
Thus the result of evaluating <mapExpr>[<keyExpr>] can be any value (including NULL
Caveats
Dynamic property lookup might entice users to encode information in property key names. This is bad practice as it interferes with planning, leads to unnatural data models, and might lead to exhausting the available property key id space. This is addressed by issuing a warning when a query uses a dynamic property lookup with a dynamic property key name.
To my knowledge, no. Seems to me that what you really are trying to do would be better achieved by creating a search index over the graph using something like elasticsearch or solr. This would give you the ability to search over all properties. Your choice of analyzer whilst indexing would give you the option of exact or partial value matches.

Specifying the primitive type of a property in a Cypher CREATE clause

Contrary to what's possible with the Java API, there doesn't seem to be a way to specify whether a numeric property is a byte, short, int or long:
CREATE (n:Test {value: 1}) RETURN n
always seems to create a long property. I've tried toInt(), but it is obviously understood in the mathematical sense of "integer" more than in the computer data type sense.
Is there some way I'm overlooking to actually force the type?
We have defined a model and want to insert test data using Cypher statements, but the code using the data then fails with a ClassCastException since the types don't match.
If you run your cypher queries with the embedded API then
you can provide parameters in a hashmap with the correctly typed values.
For remote users it doesn't really matter as it goes through JSON serialization back and forth which looses the type information anyway. So it is just "numeric".
Why do you care about the numeric type?
you can also just use ((Number)n.getProperty("value")).xxxValue() (xxx = int,long,byte)

Combining two neo4j cypher calls into one call

I have the following two cypher calls that I'd like to combine into one;
start r=relationship:link("key:\"foo\" and value:\"bar\"") return r.guid
This returns a relationship that contains a guid that I need based on a key value pair (in this case key:foo and value:bar).
Lets assume r.guid above returns 12345.
I then need all the property relationships for the object in question based on the returned guid and a property type key;
start r=relationship:properties("to:\"12345\" and key:\"baz\"") return r
This returns several relationships which have the values I need, in this case all property types baz that belong to guid 12345.
How do I combine these two calls into one? I'm sure its simple but I'm stumbling..
The answer I've gotten is that there is no way to perform an index lookup in the middle of a Cypher query, or to use a variable you have declared to perform the lookup.
Perhaps in later version of Cypher, as this ability should be standard especially with the dense node issue and the suggested solution of indexing.

How best to map this structure onto amazon SimpleDB

So simpledb has a kind of spreadsheet data model.
I have an app that simply needs to store keys against values. Except that a single key can have multiple values.
There will be multiple clients. Each client has an id with it's own set of keys.
I'd like to stick with a single domain if I can at this stage.
How can I map this onto simpleDB?
I was thinking
domain = mydomain
item = clientid
attribute.n.name = key_1 ... key_n
attribute.n.value = val1 ... valn
That would satisfy the ability to store multiple values for the same key.
But then I found that I need to either get ALL attributes in my select or know example
how many attributes I have. I will not know this up front.
Also I allow deleting a specific value from a key (or attribute). I will have to search for it first. It seems that in the select there is no attributeName() function, just the itemName() function.
Would it perhaps be better to make the item name a combination of id + key + _n ?
e.g. if the id is 'myid' and the key is 'boots' then the item name would be
'myidboots_1'
And then have a single attribute per item called say 'keyval'.
and I can do a
select 'keyval' where itemName like 'myidboots_%' ?
Still kindof cumbersome compared to a normal sql database.
Maybe I should try encoding the values like a comma separated list?
Except that it's probably more cumbersome and also I've read that there is a 1000 character limit.
Any other suggestions?
I'm not sure I totally follow your question, but I think it might be helpful to point out that SimpleDB lets you do classic SQL style queries like:
select * from foo where bar = '1'
This will return all the attributes/values for the resulting records.

Resources