Reusing path in multiple MATCH UNION queries cypher neo4j - neo4j

I'd like to pull and combine data from several different paths that share a path at the beginning, not all of which might exist. For example, I'd like to do something like this:
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:FETCHING]->(data)
RETURN data.attribute
UNION ALL
MATCH (s)-[:OPTIONAL]->(o:OtherData)
RETURN o.attribute;
so that it doesn't retrace the path up to s. I can't actually do this, though, because UNION separates queries and the (s)-[:OPTIONAL] in the second part will match anything with an outgoing OPTIONAL relation; the s is a loose handle.
Is there a better way of doing this than repeating the path:
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:FETCHING]->(data)
RETURN data.attribute
UNION ALL
MATCH (:Complex)-[:PATH]->(s:Somewhere)-[:OPTIONAL]->(o:OtherData)
RETURN o.attribute;
I made a few attempts using WITH, but they all either caused the query to return nothing if any part failed, or I could not get them to line up into a single column and instead got rows with redundant data, or (with multiple, nested WITHs, which I'm not sure about the scoping of) just fetching everything.

Have you looked at the semantics of an optional match? So you can match to s, beyond s and your optional component. Something like:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
RETURN data.attribute, otherData.attribute
Sorry I missed the importance of a single column, is it really important?
You can gather the vaues into a single collection :
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
RETURN [data.attribute] + COLLECT(otherData.attribute)
But doesn't this work for a single column:
MATCH (:Complex)-[:PATH]->(s:Somewhere)
MATCH (s)-[:FETCHING]->(data)
OPTIONAL MATCH (s)-[:OPTIONAL]->(otherData)
WITH [data.attribute] + COLLECT(otherData.attribute) as col
RETURN UNWIND col AS val

Related

Multiple Match queries in one query

I have the following records in my neo4j database
(:A)-[:B]->(:C)-[:D]->(:E)
(:C)-[:D]->(:E)
I want to get all the C Nodes and all the relations and related Nodes. If I do the query
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
I get the first to match if I do
Match (i:C)-[u:D]->(y:E)
Return i,u,y
I get the second to match.
But I want both of them in one query. How do I do that?
The easiest way is to UNION the queries, and pad unused variables with null (because all cyphers UNION'ed must have the same return columns
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
UNION
Match (i:C)-[u:D]->(y:E)
Return NULL as p, NULL as o,i,u,y
In your example though, the second match actually matches the last half of the first chain as well, so maybe you actually want something more direct like...
MATCH (c:C)
OPTIONAL MATCH (connected)
WHERE (c)-[*..20]-(connected)
RETURN c, COLLECT(connected) as connected
It looks like you're being a bit too specific in your query. If you just need, for all :C nodes, the connected nodes and relationships, then this should work:
MATCH (c:C)-[r]-(n)
RETURN c, r, n

Combining collection results of multiple MATCH cypher queries

I have a graph in which, there can exist three patters of paths between (:srcType) and (:destType):
Pattern 1
(:srcType)<-[]-()<-[]-(srcParent)<-[]-(center)-[]->(destParent)-[]->()-[]->(:destType)
Notice that, here the direction of relationships reverses as path goes through (center):<-[]-(center)-[]->
Pattern 2
In this pattern (srcParent) it self is a center. Thus direction of relationships reverses across (srcParent):
(:srcType)<-[]-()<-[]-(srcParent)-[]->(destParent)-[]->()-[]->(:destType)
Pattern 3
In this pattern (destParent) it self is a center. Thus direction of relationships reverses across (destParent):
(:srcType)<-[]-()<-[]-(srcParent)<-[]-(destParent)-[]->()-[]->(:destType)
I am giving id of (:srcType) and trying to obtain all (:destType) nodes. Note that given one (:srcType) it can have one (:destType) node associated with it following first pattern, another following 2nd pattern and few more following third pattern. I am trying to retrieve single collection containing all these (:destType) nodes. So I have combined above queries as follows:
MATCH (src:srcType)<-[]-()<-[]-(srcParent)<-[]-(center)-[]->(destParent)-[]->()-[]->(dest1:destType)
WHERE id(src)=3
WITH dest1
MATCH (src:srcType)<-[]-()<-[]-(srcParent)-[]->(destParent)-[]->()-[]->(dest2:destType)
WHERE id(src)=3
WITH dest1, dest2
MATCH (src:srcType)<-[]-()<-[]-(srcParent)<-[]-(destParent)-[]->()-[]->(dest3:destType)
WHERE id(src)=3
RETURN dest1, dest2, dest3
So here I am matching each pattern one by one in MATCH clauses and feeding (:destType)s output of one MATCH to next one using WITH clause. At the end I am returning all destTypes.
Q1. But this is not executing. When I run one of the pattern (single WITH), it correctly returns whichever (:destType) that matches the path. But with above query it returns 0 rows. Why is it so?
Q2. Also instead of returning all destTypes, I want to return single collection containing elements of all of them. Knowing that collections can be merged using +, is it possible to return something like below?
RETURN destType1+destType2+destType2
Note
I will need to add different filters for each pattern afterwards. So the future query may look something like this:
MATCH (src:srcType)<-[]-()<-[]-(srcParent)<-[]-(center)-[]->(destParent)-[]->()-[]->(dest1:destType)
WHERE id(src)=3 AND srcParent.prop1='a'
WITH dest1
MATCH (src:srcType)<-[]-()<-[]-(srcParent)-[]->(destParent)-[]->()-[]->(dest2:destType)
WHERE id(src)=3 AND destParent.prop2='b'
WITH dest1, dest2
MATCH (src:srcType)<-[]-()<-[]-(srcParent)<-[]-(destParent)-[]->()-[]->(dest3:destType)
WHERE id(src)=3 AND srcParent.prop3='c'
RETURN dest1, dest2, dest3
Given that these patterns may or may not be present, and that you want a collection of all results at the end, a good approach would be to match on the src node first, then use OPTIONAL MATCHes, and collect the results along the way, adding new ones in.
If we modify your last query, it may look something like this:
MATCH (src:srcType)
WHERE id(src) = 3
OPTIONAL MATCH (src)<-[]-()<-[]-(srcParent)<-[]-(center)-[]->(destParent)-[]->()-[]->(dest1:destType)
WHERE srcParent.prop1='a'
WITH src, COLLECT(dest1) as dests
OPTIONAL MATCH (src)<-[]-()<-[]-(srcParent)-[]->(destParent)-[]->()-[]->(dest2:destType)
WHERE destParent.prop2='b'
WITH src, dests + COLLECT(dest2) as dests
OPTIONAL MATCH (src)<-[]-()<-[]-(srcParent)<-[]-(destParent)-[]->()-[]->(dest3:destType)
WHERE srcParent.prop3='c'
RETURN dests + COLLECT(dest3) as dests

How to query for multiple OR'ed Neo4j paths?

Anyone know of a fast way to query multiple paths in Neo4j ?
Lets say I have movie nodes that can have a type that I want to match (this is psuedo-code)
MATCH
(m:Movie)<-[:TYPE]-(g:Genre { name:'action' })
OR
(m:Movie)<-[:TYPE]-(x:Genre)<-[:G_TYPE*1..3]-(g:Genre { name:'action' })
(m)-[:SUBGENRE]->(sg:SubGenre {name: 'comedy'})
OR
(m)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
The problem is, the first "m:Movie" nodes to be matched must match one of the paths specified, and the second SubGenre is depenedent on the first match.
I can make a query that works using MATCH and WHERE, but its really slow (30 seconds with a small 20MB dataset).
The problem is, I don't know how to OR match in Neo4j with other OR matches hanging off of the first results.
If I use WHERE, then I have to declare all the nodes used in any of the statements, in the initial MATCH which makes the query slow (since you cannot introduce new nodes in a WHERE)
Anyone know an elegant way to solve this ?? Thanks !
You can try a variable length path with a minimal length of 0:
MATCH
(m:Movie)<-[:TYPE|:SUBGENRE*0..4]-(g)
WHERE g:Genre and g.name = 'action' OR g:SubGenre and g.name='comedy'
For the query to use an index to find your genre / subgenre I recommend a UNION query though.
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' })
RETURN distinct m
UNION
(m:Movie)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
RETURN distinct m
Perhaps the OPTIONAL MATCH clause might help here. OPTIONAL MATCH beavior is similar to the MATCH statement, except that instead of an all-or-none pattern matching approach, any elements of the pattern that do not match the pattern specific in the statement are bound to null.
For example, to match on a movie, its genre and a possible sub-genre:
OPTIONAL MATCH (m:Movie)-[:IS_GENRE]->(g:Genre)<-[:IS_SUBGENRE]-(sub:Genre)
WHERE m.title = "The Matrix"
RETURN m, g, sub
This will return the movie node, the genre node and if it exists, the sub-genre. If there is no sub-genre then it will return null for sub. You can use variable length paths as you have above as well with OPTIONAL MATCH.
[EDITED]
The following MATCH clause should be equivalent to your pseudocode. There is also a USING INDEX clause that assumes you have first created an index on :SubGenre(name), for efficiency. (You could use an index on :Genre(name) instead, if Genre nodes are more numerous than SubGenre nodes.)
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' }),
(m)-[:SUBGENRE]->()<-[:SUB_TYPE*0..3]-(sg:SubGenre { name: 'comedy' })
USING INDEX sg:SubGenre(name)
Here is a console that shows the results for some sample data.

variable length matching over paths in Cypher

Is it possible to allow variable length matching over multiple paths in Cypher? By a path I mean something like two KNOWS traversals or three BLOCKS traversals, where KNOWS and BLOCKS are relationship types in my graph. For example, I'd like to be able to write something like:
MATCH (s)-[r*]->(t)
WHERE ALL (x in type(r) WHERE x=KNOWS/KNOWS OR x= BLOCKS/BLOCKS/BLOCKS)
RETURN s, t
where by "KNOWS/KNOWS" I mean something like (a)-[:KNOWS]->(b)-[:KNOWS]->(c). I want to do this without changing the data itself, by adding relationships such as KNOWS/KNOWS, but rather just as a cypher query.
Yes, you can do this. It's actually much easier than you think:
MATCH p=(s)-[r:KNOWS|BLOCKS*]->(t)
RETURN s, t;
When you specify the r, with a colon you can indicate which types you want to traverse, and separate them by a pipe for OR. The asterisk just operates the way you expect.
If you only want one type of relationship at a time you can do this:
OPTIONAL MATCH p1=(s1)-[r1:KNOWS*]->(t1)
OPTIONAL MATCH p2=(s2)-[r2:BLOCKS*]->(t2)
RETURN p1, p2;
If you want exactly two knows and 3 blocks, then it's:
OPTIONAL MATCH p1=(s1)-[:KNOWS]->()-[:KNOWS]->(t1)
OPTIONAL MATCH p2=(s2)-[:BLOCKS]->()-[:BLOCKS]->()-[:BLOCKS]->(t2)
RETURN p1, p2;

Difference between START n = node(*) and MATCH (n)

In Neo4j 2.0 this query:
MATCH (n) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
returns different results than this query:
START n = node(*) WHERE n.username = 'blevine'
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
The first query works as expected returning a single Node for 'blevine' and the associated Nodes mentioned in the OPTIONAL MATCH clauses. The second query returns many more Nodes which do not even have a username property. I realize that start n = node(*) is not recommended and that START is not even required in 2.0. But the second form (with OPTIONAL MATCH replaced with question marks on the relationship type) worked prior to 2.0. In the second form, why is 'n' not being constrained to the single 'blevine' node by the first WHERE clause?
To run the second query as expected you would just need to add WITH n. In your query you would need to filter the result and pass it for optional match which is to be done using WITH
START n = node(*) WHERE n.username = 'blevine'
WITH n
OPTIONAL MATCH n-[:Person]->person
OPTIONAL MATCH n-[:UserLink]->role
RETURN n AS user,person,collect(role) AS roles
From the documentation
WHERE defines the MATCH patterns in more detail. The predicates are part of the
pattern description, not a filter applied after the matching is done.
This means that WHERE should always be put together with the MATCH clause it belongs to.
when you do start n=node(*) where n.name="xyz" you need to pass the result explicitly into your next optional matches. But when you do MATCH (n) WHERE n.name="xyz" this tells graph specifically what node to start looking into.
EDIT
Here is the thing. The documentation says Optional Match returns null if a pattern is not found so in your first case, it includes all those results too where n.username property is null or cases where n doesnt even have a relationship suggested in the OPTIONAL MATCH pattern. So when you do a WITH n , the graph is explicitly told to use only n.
Excerpt from the documentation (link : here)
OPTIONAL MATCH matches patterns against your graph database, just like MATCH does.
The difference is that if no matches are found, OPTIONAL MATCH will use NULLs for
missing parts of the pattern. OPTIONAL MATCH could be considered the Cypher
equivalent of the outer join in SQL.
Either the whole pattern is matched, or nothing is matched. Remember that
WHERE is part of the pattern description, and the predicates will be
considered while looking for matches, not after. This matters especially
in the case of multiple (OPTIONAL) MATCH clauses, where it is crucial to
put WHERE together with the MATCH it belongs to.
Also few more things to note about the behaviour of WHERE clause: here
Excerpts:
WHERE is not a clause in it’s own right — rather, it’s part of MATCH,
OPTIONAL MATCH, START and WITH.
In the case of WITH and START, WHERE simply filters the results.
For MATCH and OPTIONAL MATCH on the other hand, WHERE adds constraints
to the patterns described. It should not be seen as a filter after the
matching is finished.

Resources