How to find all nodes between two nodes in graph database? - neo4j

I`m using Cypher query language and I need to find nodes between node A and E. (A->B->C->D->E)
Next query returns all nodes including A and E, but i need to exclude them, to have B, C, D nodes. How can I filter my query result?
MATCH p= (A:City{name: 'City1'})-[:LINKED*]->(E:City{name: "City5"}) return nodes(p)

There are a number of ways you might do this, but a simple option is to just index into the node list using:
return nodes(p)[1..-1]

Related

To get nodes and relationships between two specified nodes for review

I have a database containing millions of nodes and edge data and I want to get all the nodes and relationships data between two specified nodes.
Below is the sample data for the graph which has 7 nodes and 7 relationships.
To traverse from 1st node to 7th node I can use the variable length relationship approach and can get the nodes and relationships in between the first and 7th nodes (but in this approach we need to know the number of relationships and nodes between 1st and 7th node).
For using variable length relationship approach we have to specify the number where we will get the end node and it traverses in one direction.
But in my case I know the start and end node and don't know how many relationships and nodes are in between them. Please suggest how I can write a Cypher query for this case.
I have used the APOC spanning tree procedure where it returns ‘path’ from the 1st and 7th element, but it does not return the nodes and relationships. Can I get nodes and relationships data in return using the spanning tree procedure and how?
Is there any other way to get all nodes and relations between two nodes without using the APOC procedure?
Here is query with apoc procedure:
MATCH (start:temp {Name:"Joel"}), (end:temp {Name:"Jack"}) CALL apoc.path.spanningTree(start,{terminatorNodes:[end]}) YIELD path return path
Note: In our graph database nodes can have multi direction relations.
[Sample nodes and relationships snapshot]
: https://i.stack.imgur.com/nN9hk.png
I assume you do not want to have duplicates in your result, so my approach would be this
MATCH (start:temp {Name:"Joel"}), (end:temp {Name:"Jack"})
MATCH p=shortestPath((start)-[*]->(end))
UNWIND nodes(p) AS node
UNWIND relationships(p) AS rel
RETURN COLLECT(DISTINCT node) as nodes, COLLECT(DISTINCT rel) as rels
Might be better to use shortestPath operator to find the shortest path between two nodes.
MATCH (start:temp {Name:"Joel"}), (end:temp {Name:"Jack"})
MATCH p=shortestPath((start)-[*]->(end))
RETURN nodes(p) as nodes, relationships(p) as rels

Remove automorphisms of a cypher query output

When doing a Cypher query to retrieve a specific subgraph with automorphisms, let's say
MATCH (a)-[:X]-(b)-[:X]-(c),
RETURN a, b, c
It seems that the default behaviour is to return every retrieved subgraph and all their automorphisms.
In that exemple, if (u)-[:X]-(v)-[:X]-(w) is a graph matching the pattern, the output will be u,v,w but also w,v,u, which consist in the same graph.
Is there a way to retrieve each subgraph only once ?
EDIT: It would be great if Cypher have a feature to do that in the search, using some kind of symmetry breaking condition as it would reduce the computing time. If that is not the case, how would you post-process to find the desired output ?
In the query you are making, (a)-[r:X]-(b) and (a)-[t:X]-(c) refer to a similar pattern. Since (b) and (c) can be interchanged. What is the need to repeat matching twice? MATCH (a)-[r:X]-(b) RETURN a, r, b returns all the subgraphs you are looking for.
EDIT
You can do something as follows to find the nodes, which are having two relations of type X.
MATCH (a)-[r:X]-(b) WHERE size((a)-[:X]-()) = 2 RETURN a, r, b
For these kind of mirrored patterns, we can add a restriction on the internal graph ids so only one of the two paths is kept:
MATCH (a)-[:X]-(b)-[:X]-(c)
WHERE id(a) < id(c)
RETURN a, b, c
This will also prevent the case where a = c.

Optimize Cypher query on multiple unique labels

In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?
To illustrate, I want all nodes that are labeled with one of the labels in label_list:
label_list = ['label_1', 'label_2', ...]
Which would be faster in an application (this is pseudo-code):
for label in label_list:
run.query("MATCH (n:{label}) return n")
or
run.query("MATCH (n) WHERE (n:label_1 or n:label_2 or ...)")
EDIT:
Actually, I just realized that the best option might be to run multiple NodeByLabelScan in a single query, with something looking like this:
MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b ..] as foo
RETURN foo
Could someone speak to it?
Yes, it would be better to run multiple NodeByLabelScans in a single query.
For example:
OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n
Notes on the query:
It uses OPTIONAL MATCH so that the query can proceed even if a wanted label is not found in the DB.
It uses multiple aggregation steps to avoid cartesian products (also see this).
And it uses UNWIND so that it can useDISTINCT to return distinct nodes (since a node can have multiple labels).

Neo4j Cypher find two disjoint nodes

I'm using Neo4j to try to find any node that is not connected to a specific node "a". The query that I have so far is:
MATCH p = shortestPath((a:Node {id:"123"})-[*]-(b:Node))
WHERE p IS NULL
RETURN b.id as b
So it tries to find the shortest path between a and b. If it doesn't find a path, then it returns that node's id. However, this causes my query to run for a few minutes then crashes when it runs out of memory. I was wondering if this method would even work, and if there is a more efficient way? Any help would be greatly appreciated!
edit:
MATCH (a:Node {id:"123"})-[*]-(b:Node),
(c:Node)
WITH collect(b) as col, a, b, c
WHERE a <> b AND NOT c IN col
RETURN c.id
So col (collect(b)) contains every node connected to a, therefore if c is not in col then c is not connected to a?
For one, you're giving this MATCH an impossible predicate to fulfill, so it will never find the shortest path.
WHERE clauses are associated with MATCH, OPTIONAL MATCH, and WITH clauses, so your query is asking for the shortest path where the path doesn't exist. That will never return anything.
Also, the shortestPath will start at the node you DON'T want to be connected, so this has no way of finding the nodes that aren't connected to it.
Probably the easiest way to approach this is to MATCH to all nodes connected to your node in question, then MATCH to all :Nodes checking for those that aren't in the connected set. That means you won't have to do a shortestPath from every single node in the db, just a membership check in a collection.
You'll need APOC Procedures for this, as it has the fastest means of matching to nodes within a subgraph.
MATCH (a:Node {id:"123"})
CALL apoc.path.subgraphNodes(a, {}) YIELD node
WITH collect(node) as subgraph
MATCH (b:Node)
WHERE NOT b in subgraph
RETURN b.id as b
EDIT
Your edited query is likely to blow up, that's going to generate a huge result set (the query will build a result set of every node reachable from your start node by a unique path in a cartesian product with every :Node).
Instead, go step by step, collect the distinct nodes (because otherwise you'll get multiples of the same nodes that can be reached via different paths), and then only after you have your collection should you start your match for nodes that aren't in the list.
MATCH (:Node {id:"123"})-[*0..]-(b:Node)
WITH collect(DISTINCT b) as col
MATCH (a:Node)
WHERE NOT a IN col
RETURN a.id

Cypher query with diverging paths

I would like to query for the following subgraph in my Neo4J database:
(a)-->(b)-->(c)-->(d)
|
| -->(e)
Note: a, b, c, d, e are attribute values (non-unique values) for each of the nodes. There are thousands for these nodes with similar attribute values (a to e) but they are randomly connected to one another.
How can I write the Cyper query to specifically find the particular subgraph (akin to subgraph isomorphism problem) I seek and return (a)? I've tried the following Cyper query but other subgraphs pop up:
START n1=node:SomeIndex(AttrVal="a")
MATCH n1-[]->n2-[]->n3-[]->n4
WHERE n2.AttrVal="b" AND n3.AttrVal="c" and n4.AttrVal="d"
WITH n1, n2
MATCH n2-[]->n5
WHERE n5.AttrVal="e"
RETURN n1
Am I using the WITH and 2nd MATCH clause wrongly?
Thanks!
You can use the comma to combine multiple paths in a single match clause:
START n1=node:SomeIndex(AttrVal="a")
MATCH n1-[]->n2-[]->n3-[]->n4, n2-[]->n5
WHERE n2.AttrVal="b" AND n3.AttrVal="c" and n4.AttrVal="d" and n5.attrVal='e'
RETURN n1
Side note 1:
you can also refactor the statement like this:
START n1=node:SomeIndex(AttrVal="a"), n2=node:SomeIndex(AttrVal="b")
n3=node:SomeIndex(AttrVal="c"), n4=node:SomeIndex(AttrVal="d"),
n5=node:SomeIndex(AttrVal="e")
MATCH n1-[]->n2-[]->n3-[]->n4, n2-[]->n5
RETURN n1
Depending on the structure of your graph the second might be faster.
Side note 2:
When matching an arbitrary relationship type as you did in n1-[]->n2 you can use a shorter and more readable notation: n1-->n2

Resources