SPARQL property path queries with arbitrary properties

SPARQL property path queries of arbitrary length require using specific properties. I want to query and find any path starting from a resource and ending in another resource. For example:
SELECT ?p
WHERE { :startNode ?p* :endNode }
where ?p* specifies a path. Is there a way of doing this?

You're right that you can't use variables in property path expressions. There are a few things that you can do, though, that might help you.
A wildcard to check whether a path exists
You can use a wildcard by taking the disjunction of it and its negation, so you can do a simple query that checks whether there is a path connecting two resources:
<source> (<>|!<>)* <target>
If you have a : prefix defined, that can be even shorter, since : is a valid IRI:
<source> (:|!:)* <target>
If there is a path (or multiple paths) between two nodes, you can split it up using wildcard paths joined by ?p, and so find all the ?ps that are on the path:
<source> (:|!:)* ?x .
?x ?p ?y .
?y (:|!:)* <target> .
You can make that even shorter, I think, by using blank nodes instead of ?x and ?y:
<source> (:|!:)* [ ?p [ (:|!:)* <target> ] ]
(That might not work, though. I seem to recall the grammar actually disallowing property paths in some places within blank nodes. I'm not sure.)
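To make concrete what the wildcard path (:|!:)* checks, here is a minimal Python sketch of the same idea: reachability over forward edges, ignoring which predicate labels each edge. The function name path_exists and the toy triples are made up for illustration; a real SPARQL engine evaluates this internally and may do so quite differently.

```python
from collections import deque

def path_exists(triples, source, target):
    """True if target is reachable from source via zero or more forward edges."""
    # Index subject -> objects and drop the predicate: that is the "wildcard".
    successors = {}
    for s, _p, o in triples:
        successors.setdefault(s, set()).add(o)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical data: a single chain a -> b -> c -> d.
triples = [("a", "p1", "b"), ("b", "p2", "c"), ("c", "p3", "d")]
print(path_exists(triples, "a", "d"))  # True
print(path_exists(triples, "d", "a"))  # False: edges are only followed forwards
```

Like *, the sketch treats a node as reachable from itself by a zero-length path.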
For a single path, get properties and positions, then group_concat
Now, in the case that there is just one path between two resources, you can even get the properties along that path, along with their positions. You could order by those positions, and then use a group by to concatenate the properties in order into a single string. This is probably easiest to see with an example. Suppose you've got the following data which has a single path from :a to :d:
@prefix : <urn:ex:> .
:a :p1 :b .
:b :p2 :c .
:c :p3 :d .
Then you can use a query like this to get each property in the path and its position. (This only works if there's a single path, though. See my answer to "Is it possible to get the position of an element in an RDF Collection in SPARQL?" for a bit more about how this works.)
prefix : <urn:ex:>
select ?p (count(?mid) as ?pos) where {
  :a (:|!:)* ?mid .
  ?mid (:|!:)* ?x .
  ?x ?p ?y .
  ?y (:|!:)* :d
}
group by ?x ?p ?y
-------------
| p   | pos |
=============
| :p2 | 2   |
| :p1 | 1   |
| :p3 | 3   |
-------------
Now, if you order those results by ?pos and wrap that query in another, then you can use group_concat on ?p to get a single string of the properties in order. (The order being preserved isn't guaranteed, but it's pretty common behavior. See my answer to "obtain the matrix in protege" for another example of how this technique works, and my answer to "Ordering in GROUP_CONCAT in SPARQL 1.1" for discussion about why it is not guaranteed.)
prefix : <urn:ex:>
select (group_concat(concat('<',str(?p),'>');separator=' ') as ?path) {
  select ?p (count(?mid) as ?pos) where {
    :a (:|!:)* ?mid .
    ?mid (:|!:)* ?x .
    ?x ?p ?y .
    ?y (:|!:)* :d
  }
  group by ?x ?p ?y
  order by ?pos
}
-----------------------------------------
| path                                  |
=========================================
| "<urn:ex:p1> <urn:ex:p2> <urn:ex:p3>" |
-----------------------------------------
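The counting trick can be easier to follow outside SPARQL. The Python sketch below mirrors it under the same single-path assumption: for each edge on the path, the position is the number of nodes ?mid that lie between the source and the edge's subject (inclusive). The helper names reachable and ordered_path are hypothetical, and the short strings stand in for the :a, :p1, ... IRIs.

```python
def reachable(triples, start):
    """All nodes reachable from start via zero or more forward edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for s, _p, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def ordered_path(triples, source, target):
    """Properties along the single path source -> target, in path order."""
    from_source = reachable(triples, source)
    rows = []
    for x, p, y in triples:
        # Keep only edges that lie on a path from source to target.
        if x in from_source and target in reachable(triples, y):
            # count(?mid): nodes reachable from source that can still reach x.
            pos = sum(1 for mid in from_source if x in reachable(triples, mid))
            rows.append((pos, p))
    # "order by ?pos", then concatenate in that order.
    return [p for _pos, p in sorted(rows)]

triples = [("a", "p1", "b"), ("b", "p2", "c"), ("c", "p3", "d")]
print(" ".join(ordered_path(triples, "a", "d")))  # p1 p2 p3
```

Unlike group_concat, the explicit sort here fixes the order. With more than one path between the two nodes, the positions stop being meaningful, just as in the query.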

Related

Cypher: How to match node based on property and then set or add new nodes accordingly?

Suppose I have nodes with properties prop1 and prop2.
I would like to add a new node according to the following logic:
If a node n satisfies n.prop1=123 then set n.prop2 to be 789;
If no such node exists then add a Node with {prop1: 123, prop2: 789}.
The problem with doing something like Merge (n:Node {prop1:123, prop2:789}) is that if some node (m:Node {prop1:123, prop2:11111}) exists, for example, then we will end up with two nodes satisfying prop1=123.
On the other hand, if I do Match (n:Node) Where n.prop1=123 Set n.prop2=789 then this will do nothing if no node with prop1=123 exists.
How can I accomplish this?
MERGE is still the answer. Merge on the first property only, then set the second value after the node has been matched or created:
MERGE (n:Node {prop1: 123})
SET n.prop2 = 789
Whenever a MERGE query runs, a node is either matched or created. With the ON CREATE and ON MATCH clauses, you can set properties that record which of the two happened.
Syntax
Following is the syntax of the ON CREATE and ON MATCH clauses.
MERGE (node:label {properties . . . . . . . . . . .})
ON CREATE SET node.isCreated = "true"
ON MATCH SET node.isFound = "true"
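As a rough mental model of the match-or-create behaviour described above, here is a Python sketch over a plain list of dicts. The function merge_nodes and its on_create / on_match parameters are hypothetical stand-ins for the Cypher clauses; the sketch ignores labels, transactions, and uniqueness constraints.

```python
def merge_nodes(nodes, key, on_create=None, on_match=None):
    """Return the nodes matching every key/value pair in `key`, creating one if none match."""
    matched = [n for n in nodes if all(k in n and n[k] == v for k, v in key.items())]
    if matched:
        for n in matched:
            n.update(on_match or {})   # ON MATCH SET ...
        return matched
    created = {**key, **(on_create or {})}  # ON CREATE SET ...
    nodes.append(created)
    return [created]

nodes = [{"prop1": 123, "prop2": 11111}]

# MERGE (n:Node {prop1: 123}) SET n.prop2 = 789  -> updates the existing node
for n in merge_nodes(nodes, {"prop1": 123}):
    n["prop2"] = 789

# The same pattern with a value that is not present yet -> creates a node
for n in merge_nodes(nodes, {"prop1": 456}):
    n["prop2"] = 789

print(nodes)  # [{'prop1': 123, 'prop2': 789}, {'prop1': 456, 'prop2': 789}]
```

Matching on prop1 alone is what avoids the duplicate: prop2 is never part of the lookup key, so an existing node with a different prop2 is still found.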

Can Cypher filter results based on an attribute of the first encountered node of a given type?

I'm using Neo4J and learning Cypher, and have a question about filtering results based on an attribute of the first encountered node of a given type (in the OPTIONAL MATCH line of the example code below).
My query is as follows:
MATCH
(a:Word),
(b:Word)
WHERE a.lemma IN [ "enjoy" ]
AND b.lemma IN [ "control", "achievement" ]
OPTIONAL MATCH p = shortestPath((a)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
RETURN
a.lemma as From, b.lemma as To,
length(
filter(n in nodes(p) WHERE 'Word' in labels(n))
) - 1 as Shortest_Number_of_Hops_Only_Counting_Words,
length(p) as Shortest_Number_of_Hops_Counting_All_Nodes
Two general types of paths might occur in the database:
(a:Word) <-[IS_A_FORM_OF]- (Morph) -[IS_A_FORM_OF]-> (Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (b:Word)
and
(a:Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (Word) -[IS_DEFINED_AS]-> (Synset) <-[IS_DEFINED_AS]- (b:Word)
There might be any number of hops (currently capped at 15 in the query above) between a and b.
I've tried to give a very specific example above, but my question really is a very general one about using Cypher: I would like to filter for paths in which the first Synset node encountered contains a certain attribute (for example, {part_of_speech: 'verb'}). I've been reading the Cypher refcard and am wondering whether the head() expression should be used to somehow select the first Synset node in the path, but I'm unsure how to do it. Is there a straightforward way to add this to the MATCH / WHERE statement?
You can match a Synset node by its property like this:
MATCH (verb:Synset {part_of_speech: 'verb'})
RETURN verb
Then variable verb will match only Synset nodes whose part_of_speech property is "verb".
You can use this variable further on in your request. For example you can write essentially the same request restricting the value of node's property in WHERE section:
MATCH (verb:Synset)
WHERE verb.part_of_speech = 'verb'
RETURN verb
Applying to your request you might rewrite it like this:
MATCH
(a:Word) -[:IS_DEFINED_AS]-> (verb:Synset {part_of_speech: "verb"}),
(b:Word)
WHERE a.lemma IN [ "enjoy" ]
AND b.lemma IN [ "control", "achievement" ]
OPTIONAL MATCH p = shortestPath((a)-[:IS_DEFINED_AS]-(verb)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
RETURN
a.lemma as From, b.lemma as To,
length(
filter(n in nodes(p) WHERE 'Word' in labels(n))
) - 1 as Shortest_Number_of_Hops_Only_Counting_Words,
length(p) as Shortest_Number_of_Hops_Counting_All_Nodes
@oleg-kurbatov's answer does work, but only if (a:Word) is immediately connected to a Synset. It doesn't account for instances where (a:Word) must travel through a node of type Morph, etc., before getting to a Synset (as in my first example path in the original question). Additionally, the adding-paths-together approach seems more computationally intensive: 802ms for my original query vs 2364ms using a slightly modified version of Oleg's suggested implementation (modified because Cypher/Neo4J doesn't allow specifying more than one specific hop when using shortestPath()):
MATCH
(a:Word),
(b:Word)
WHERE a.lemma IN [ "enjoy" ]
AND b.lemma IN [ "control", "achievement" ]
MATCH p1 = (a)-[:IS_DEFINED_AS]-> (initial_synset:Synset{pos: 'v'})
OPTIONAL MATCH p2 = shortestPath((initial_synset)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
RETURN
a.lemma as From, b.lemma as To,
length(
filter(n in nodes(p2) WHERE 'Word' in labels(n))
) as Shortest_Number_of_Hops_Only_Counting_Words,
length(p1) + length(p2) as Shortest_Number_of_Hops_Counting_All_Nodes
Taking Oleg's suggestion as a starting point, though, I did figure out one way to filter shortestPath() so that it only settles on a path where the first encountered Synset node has a 'pos' attribute of 'v', without increasing query execution time: I amended the OPTIONAL MATCH line in my original question to read:
OPTIONAL MATCH p = shortestPath((a)-[:IS_DEFINED_AS|IS_A_FORM_OF*..15]-(b))
WHERE head(filter(x in nodes(p) WHERE x:Synset)).pos = 'v'
As I understand, filter(x in nodes(p) WHERE x:Synset) gets a list of all Synset-type nodes in the path being considered. head(...) gets the first node from that list, and .pos = 'v' checks that that node's "pos" attribute is "v".
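The head(filter(...)) predicate can be read as "take the first Synset along the path and test its pos". A small Python sketch of that logic follows; it assumes path nodes are represented as dicts with a "labels" list, which is a made-up representation for illustration and not how Neo4j exposes nodes.

```python
def first_synset_is_verb(path_nodes):
    """True when the first Synset node along the path has pos == 'v'."""
    # filter(x in nodes(p) WHERE x:Synset)
    synsets = [n for n in path_nodes if "Synset" in n["labels"]]
    if not synsets:
        # No Synset on the path: the predicate cannot be satisfied.
        return False
    # head(...).pos = 'v'
    return synsets[0].get("pos") == "v"

# Hypothetical path: Word -> Synset(v) -> Word
path = [
    {"labels": ["Word"], "lemma": "enjoy"},
    {"labels": ["Synset"], "pos": "v"},
    {"labels": ["Word"], "lemma": "control"},
]
print(first_synset_is_verb(path))  # True
```

Because only the first Synset is inspected, a path that passes through a Morph node before reaching a Synset is handled the same way as one that reaches a Synset directly.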

FOREACH with collection in cypher

I have a collection of relationships, created using this:
MATCH (u:user)-[i:INTEREST]->(t:term)
WITH COLLECT([i,t]) AS its
RETURN its
and it returns the array of rels and nodes correctly.
see also http://console.neo4j.org/r/cw7saq
Now I want to set the properties of the relationship, but don't see how I can access the rels in the array. Tried this,
MATCH (u:user)-[i:INTEREST]->(t:term)
WITH COLLECT([i,t]) AS its
FOREACH (it IN its |
SET it[0].testprop=89292" )
but it returns an error
Error: Invalid input '[': expected an identifier character, node labels, a property map, a relationship pattern, '(', '.' or '=' (line 4, column 16)
" SET it[0].testprop=89292" )"
Does anyone know the right syntax to do this?
Anyone encountering an error like the one mentioned by the OP can resolve it by wrapping the indexed expression in parentheses:
MATCH (u:user)-[i:INTEREST]->(t:term)
WITH COLLECT([i,t]) AS its
FOREACH (it IN its |
SET (it[0]).testprop=89292 )
There's no need to collect the term nodes as well. Just do it as follows:
MATCH path=(u:user)-[i:INTEREST]->(t:term)
FOREACH (n IN rels(path) | set n.testprop=89292)

Arbitrary path length query in SPARQL

Is it possible to do arbitrary-length path queries in SPARQL? Let's say I have a neo4j store which has a graph that only represents PARENT_OF relationships (consider a family tree, for example). A Cypher query to get all ancestors of a person would look like
start n (some node from index query) match n<-[:PARENT_OF*]-k return k
What would this query look like in SPARQL if this neo store were represented as an RDF-based triple store? Is this even possible?
If you have data like this:
@prefix : <http://stackoverflow.com/q/22210295/1281433/> .
:a :parentOf :b .
:b :parentOf :c .
:c :parentOf :d .
then you can use a query like this, using SPARQL 1.1's property paths:
prefix : <http://stackoverflow.com/q/22210295/1281433/>
select ?ancestor ?descendent where {
?ancestor :parentOf+ ?descendent
}
to get results like this:
-------------------------
| ancestor | descendent |
=========================
| :a       | :b         |
| :a       | :c         |
| :a       | :d         |
| :b       | :c         |
| :b       | :d         |
| :c       | :d         |
-------------------------
Note that using * permits zero occurrences of the relation, and relates each node to itself. If you want each thing to be an ancestor of itself, then you could replace + with * in my query.
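The difference between + and * can be sketched in Python as a transitive closure with and without the reflexive pairs. The function names one_or_more and zero_or_more are invented for this illustration, and the edge list stands in for the :parentOf triples above.

```python
def one_or_more(edges):
    """Pairs (x, y) such that x reaches y in at least one step (like parentOf+)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in edges:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def zero_or_more(edges):
    """Like parentOf*: the same pairs plus every node paired with itself."""
    nodes = {n for edge in edges for n in edge}
    return one_or_more(edges) | {(n, n) for n in nodes}

edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(sorted(one_or_more(edges)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
print(len(zero_or_more(edges)))  # 10: the six pairs above plus four self-pairs
```

The sketch only pairs up nodes that occur in the edge list, which is a simplification of how SPARQL treats zero-length paths when both ends are variables.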

Cypher: Return path where begin and end may be equal

I have a taxonomy as a Neo4j graph. The basic structure is like this:
taxonomyName -HAS_ROOT_TERM-> root -IS_BROADER_THAN-> term -IS_BROADER_THAN-> term'-IS_BROADER_THAN-> term'' - ...
Now I want for a given term - e.g. term'' - its path from the taxonomy root (or multiple paths; please note that there may be multiple taxonomies with multiple eligible roots, the structure is actually a poly-hierarchy):
START n=node:index("id:term''Id")
MATCH p = taxonomy-[:HAS_ROOT_TERM]->r-[:IS_BROADER_THAN*]->n
RETURN TAIL(EXTRACT(n in NODES(p) : n.id))
The TAIL excludes the first node so that I don't get back the taxonomy node itself. This works fine, except when I directly query for a root term. Then nothing is returned. Of course: I search for a path with at least three elements: the taxonomy node, a root node, and any descendant of the root. Now I'd need to express that r and n may be equal. I tried to make the IS_BROADER_THAN relationship optional, but then just null is returned because the pattern cannot be found.
So how do I restrict my query to paths including a root term and allowing paths of length one, only containing a root term?
Thank you!
Typical "RTFM" case, I'm afraid ;-)
The documentation at http://docs.neo4j.org/chunked/stable/query-match.html#match-zero-length-paths tells us that
... root -[:IS_BROADER_THAN*0..]-> term ...
does the trick. Only specifying the asterisk assumes a 1.. range. With 0.., start and end node may be the same, i.e. the relationship may not have been traversed at all.
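To see why the lower bound matters, here is a small Python sketch that enumerates root-to-term paths with a configurable minimum number of hops. The function root_paths, the narrower mapping, and the node names are all hypothetical, and the sketch assumes the hierarchy has no cycles.

```python
def root_paths(roots, narrower, term, min_hops=0):
    """All paths [root, ..., term] along IS_BROADER_THAN edges with at least min_hops hops."""
    results = []

    def walk(node, path):
        # len(path) - 1 is the number of relationships traversed so far.
        if node == term and len(path) - 1 >= min_hops:
            results.append(path)
        for child in narrower.get(node, ()):
            walk(child, path + [child])

    for root in roots:
        walk(root, [root])
    return results

# Hypothetical taxonomy: root -> t1 -> t2
narrower = {"root": ["t1"], "t1": ["t2"]}
print(root_paths(["root"], narrower, "t2"))                # [['root', 't1', 't2']]
print(root_paths(["root"], narrower, "root", min_hops=1))  # []  (plain *, i.e. 1..)
print(root_paths(["root"], narrower, "root", min_hops=0))  # [['root']]  (*0..)
```

With a minimum of one hop, the root can never be its own end node, which matches the empty result described in the question.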

Resources