Cypher query with diverging paths - neo4j

I would like to query for the following subgraph in my Neo4J database:
(a)-->(b)-->(c)-->(d)
|
| -->(e)
Note: a, b, c, d, e are attribute values (non-unique values) for each of the nodes. There are thousands for these nodes with similar attribute values (a to e) but they are randomly connected to one another.
How can I write the Cyper query to specifically find the particular subgraph (akin to subgraph isomorphism problem) I seek and return (a)? I've tried the following Cyper query but other subgraphs pop up:
START n1=node:SomeIndex(AttrVal="a")
MATCH n1-[]->n2-[]->n3-[]->n4
WHERE n2.AttrVal="b" AND n3.AttrVal="c" and n4.AttrVal="d"
WITH n1, n2
MATCH n2-[]->n5
WHERE n5.AttrVal="e"
RETURN n1
Am I using the WITH and 2nd MATCH clause wrongly?
Thanks!

You can use the comma to combine multiple paths in a single match clause:
START n1=node:SomeIndex(AttrVal="a")
MATCH n1-[]->n2-[]->n3-[]->n4, n2-[]->n5
WHERE n2.AttrVal="b" AND n3.AttrVal="c" and n4.AttrVal="d" and n5.attrVal='e'
RETURN n1
Side note 1:
you can also refactor the statement like this:
START n1=node:SomeIndex(AttrVal="a"), n2=node:SomeIndex(AttrVal="b")
n3=node:SomeIndex(AttrVal="c"), n4=node:SomeIndex(AttrVal="d"),
n5=node:SomeIndex(AttrVal="e")
MATCH n1-[]->n2-[]->n3-[]->n4, n2-[]->n5
RETURN n1
Depending on the structure of your graph the second might be faster.
Side note 2:
When matching an arbitrary relationship type as you did in n1-[]->n2 you can use a shorter and more readable notation: n1-->n2

Related

Remove automorphisms of a cypher query output

When doing a Cypher query to retrieve a specific subgraph with automorphisms, let's say
MATCH (a)-[:X]-(b)-[:X]-(c),
RETURN a, b, c
It seems that the default behaviour is to return every retrieved subgraph and all their automorphisms.
In that exemple, if (u)-[:X]-(v)-[:X]-(w) is a graph matching the pattern, the output will be u,v,w but also w,v,u, which consist in the same graph.
Is there a way to retrieve each subgraph only once ?
EDIT: It would be great if Cypher have a feature to do that in the search, using some kind of symmetry breaking condition as it would reduce the computing time. If that is not the case, how would you post-process to find the desired output ?
In the query you are making, (a)-[r:X]-(b) and (a)-[t:X]-(c) refer to a similar pattern. Since (b) and (c) can be interchanged. What is the need to repeat matching twice? MATCH (a)-[r:X]-(b) RETURN a, r, b returns all the subgraphs you are looking for.
EDIT
You can do something as follows to find the nodes, which are having two relations of type X.
MATCH (a)-[r:X]-(b) WHERE size((a)-[:X]-()) = 2 RETURN a, r, b
For these kind of mirrored patterns, we can add a restriction on the internal graph ids so only one of the two paths is kept:
MATCH (a)-[:X]-(b)-[:X]-(c)
WHERE id(a) < id(c)
RETURN a, b, c
This will also prevent the case where a = c.

Optimize Cypher query on multiple unique labels

In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?
To illustrate, I want all nodes that are labeled with one of the labels in label_list:
label_list = ['label_1', 'label_2', ...]
Which would be faster in an application (this is pseudo-code):
for label in label_list:
run.query("MATCH (n:{label}) return n")
or
run.query("MATCH (n) WHERE (n:label_1 or n:label_2 or ...)")
EDIT:
Actually, I just realized that the best option might be to run multiple NodeByLabelScan in a single query, with something looking like this:
MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b ..] as foo
RETURN foo
Could someone speak to it?
Yes, it would be better to run multiple NodeByLabelScans in a single query.
For example:
OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n
Notes on the query:
It uses OPTIONAL MATCH so that the query can proceed even if a wanted label is not found in the DB.
It uses multiple aggregation steps to avoid cartesian products (also see this).
And it uses UNWIND so that it can useDISTINCT to return distinct nodes (since a node can have multiple labels).

traversing neo4j graph with conditional edge types

i have a fixed database that has the nodes connecting people and edges with six different types of relationships. to make it simple, I call in this post the types of relationships A, B, C, D, E and F. None of the relationships are directional. new at the syntax so thank you for help.
I need to get sets of relationships that traverses the graph based on a conditional path A to (B or C.D) to E to F. So this means that I first need the relationship that links two nodes ()-[:A]-(), but then I am confused about how to express a conditional relationship. To get to the next node, I need either B or C then D so that it is ()-[:B]-() OR ()-[:C]-()-[:D]-(). How to express this conditional traversal in the MATCH syntax?
Tried all of these and got syntax errors:
(node2:Node)-[rel2:B|rel3:C]-(node3:Node)
(node2:Node)-[rel2:B]OR[rel3:C]-(node3:Node)
This pure-Cypher query should return all matching paths:
MATCH p=()-[:A]-()-[r:B|C|D*1..2]-()-[:E]-()-[:F]-()
WHERE (SIZE(r) = 1 AND TYPE(r[0]) = 'B') OR
(SIZE(r) = 2 AND TYPE(r[0]) = 'C' AND TYPE(r[1]) = 'D')
RETURN p
The [r:B|C|D*1..2] pattern matches 1 or 2 relationships that have the types B, C, and/or D (which can include subpaths that you don't want); and the WHERE clause filters out subpaths that you don't want.
This isn't something that can really be expressed with Cypher, when the number of hops to traverse isn't the same.
The easiest way to do this would probably be to use apoc.cypher.run() from APOC Procedures to execute a UNION query to cover both paths, then work with the result of the call:
//assume `node2` is in scope
CALL apoc.cypher.run("MATCH (node2)-[:B]-(node3:Node) RETURN node3
UNION
MATCH (node2)-[:C]-()-[:D]-(node3:Node) RETURN node3",
{node2:node2}) YIELD value
WITH value.node3 as node3 // , <whatever else you want in scope>
...

Multiple Match queries in one query

I have the following records in my neo4j database
(:A)-[:B]->(:C)-[:D]->(:E)
(:C)-[:D]->(:E)
I want to get all the C Nodes and all the relations and related Nodes. If I do the query
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
I get the first to match if I do
Match (i:C)-[u:D]->(y:E)
Return i,u,y
I get the second to match.
But I want both of them in one query. How do I do that?
The easiest way is to UNION the queries, and pad unused variables with null (because all cyphers UNION'ed must have the same return columns
Match (p:A)-[o:B]->(i:C)-[u:D]->(y:E)
Return p,o,i,u,y
UNION
Match (i:C)-[u:D]->(y:E)
Return NULL as p, NULL as o,i,u,y
In your example though, the second match actually matches the last half of the first chain as well, so maybe you actually want something more direct like...
MATCH (c:C)
OPTIONAL MATCH (connected)
WHERE (c)-[*..20]-(connected)
RETURN c, COLLECT(connected) as connected
It looks like you're being a bit too specific in your query. If you just need, for all :C nodes, the connected nodes and relationships, then this should work:
MATCH (c:C)-[r]-(n)
RETURN c, r, n

match a branching path of variable length

I have a graph which looks like this:
Here is the link to the graph in the neo4j console:
http://console.neo4j.org/?id=av3001
Basically, you have two branching paths, of variable length. I want to match the two paths between orange node and yellow nodes. I want to return one row of data for each path, including all traversed nodes. I also want to be able to include different WHERE clauses on different intermediate nodes.
At the end, i need to have a table of data, like this:
a - b - c - d
neo - morpheus - null - leo
neo - morpheus - trinity - cypher
How could i do that?
I have tried using OPTIONAL MATCH, but i can't get the two rows separately.
I have tried using variable length path, which returns the two paths but doesn't allow me to access and filter intermediate nodes. Plus it returns a list, and not a table of data.
I've seen this question:
Cypher - matching two different possible paths and return both
It's on the same subject but the example is very complex, a more generic solution to this simpler problem is what i'm looking for.
You can define what your end node by using WHERE statement. So in your case end node has no outgoing relationship. Not sure why you expect a null on return as you said neo - morpheus - null - leo
MATCH p=(n:Person{name:"Neo"})-[*]->(end) where not (end)-->()
RETURN extract(x IN nodes(p) | x.name)
Edit:
may not the the best option as I am not sure how to do this programmatically. If I use UNWIND I get back only one row. So this is a dummy solution
MATCH p=(n{name:"Neo"})-[*]->(end) where not (end)-->()
with nodes(p) as list
return list[0].name,list[1].name,list[2].name,list[3].name
You can use Cypher to match a path like this MATCH p=(:a)-[*]->(:d) RETURN p, and p will be a list of nodes/relationships in the path in the order it was traversed. You can apply WHERE to filter the path just like with node matching, and apply any list functions you need to it.
I will add these examples too
// Where on path
MATCH p=(:a)-[*]-(:d) WHERE NONE(n in NODES(p) WHERE n.name="Trinity") WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Spit path into columns
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Match path, filter on label
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN FILTER(n in p WHERE "a" in LABELS(n)) as a, FILTER(n in p WHERE "b" in LABELS(n)) as b, FILTER(n in p WHERE "c" in LABELS(n)) as c, FILTER(n in p WHERE "d" in LABELS(n)) as d
Unfortunately, you HAVE to explicitly set some logic for each column. You can't make dynamic columns (that I know of). In your table example, what is the rule for which column gets 'null'? In the last example, I set each column to be the set of nodes of a label.
I.m.o. you're asking for extensive post-processing of the results of a simply query (give me all the paths starting from Neo). I say this because :
You state you need to be able to specify specific WHERE clauses for each path (but you don't specify which clauses for which path ... indicating this might be a dynamic thing ?)
You don't know the size of the longest path beforehand ... but you still want the result to be a same-size-for-all-results table. And would any null columns then always be just before the end node ? Why (for that makes no real sense other then convenience) ?
...
Therefore (and again i.m.o.) you need to process the results in a (Java or whatever you prefer) program. There you'll have full control over the resultset and be able to slice and dice as you wish. Cypher (exactly like SQL in fact) can only do so much and it seems that you're going beyond that.
Hope this helps,
Regards,
Tom
P.S. This may seem like an easy opt-out, but look at how simple your query is as compared to the constructs that have to be wrought trying to answer your logic. So ... separate the concerns.

Resources