Find all leaves of a selected subgraph with Neo4j/Cypher

Initial Situation
Large Neo4j 3.4.6 graph with a tree-like structure (10 levels deep, 10 million nodes).
Without exception, all nodes are connected to each other. All nodes share one type, and so do all relationships.
Exactly one central root node.
Reduced and simplified example:
Graphic representation
CREATE (Root:CustomType {name: 'Root'})
CREATE (NodeA:CustomType {name: 'NodeA'})
CREATE (NodeB:CustomType {name: 'NodeB'})
CREATE (NodeC:CustomType {name: 'NodeC'})
CREATE (NodeD:CustomType {name: 'NodeD'})
CREATE (NodeE:CustomType {name: 'NodeE'})
CREATE (NodeF:CustomType {name: 'NodeF'})
CREATE (NodeG:CustomType {name: 'NodeG'})
CREATE (NodeH:CustomType {name: 'NodeH'})
CREATE (NodeI:CustomType {name: 'NodeI'})
CREATE (NodeJ:CustomType {name: 'NodeJ'})
CREATE (NodeK:CustomType {name: 'NodeK'})
CREATE (NodeL:CustomType {name: 'NodeL'})
CREATE (NodeM:CustomType {name: 'NodeM'})
CREATE (NodeN:CustomType {name: 'NodeN'})
CREATE (NodeO:CustomType {name: 'NodeO'})
CREATE (NodeP:CustomType {name: 'NodeP'})
CREATE (NodeQ:CustomType {name: 'NodeQ'})
CREATE
(Root)-[:CONTAINS]->(NodeA),
(Root)-[:CONTAINS]->(NodeB),
(Root)-[:CONTAINS]->(NodeC),
(NodeA)-[:CONTAINS]->(NodeD),
(NodeA)-[:CONTAINS]->(NodeE),
(NodeA)-[:CONTAINS]->(NodeF),
(NodeE)-[:CONTAINS]->(NodeG),
(NodeE)-[:CONTAINS]->(NodeH),
(NodeF)-[:CONTAINS]->(NodeI),
(NodeF)-[:CONTAINS]->(NodeJ),
(NodeF)-[:CONTAINS]->(NodeK),
(NodeI)-[:CONTAINS]->(NodeL),
(NodeI)-[:CONTAINS]->(NodeM),
(NodeJ)-[:CONTAINS]->(NodeN),
(NodeK)-[:CONTAINS]->(NodeO),
(NodeK)-[:CONTAINS]->(NodeP),
(NodeM)-[:CONTAINS]->(NodeQ);
Challenge to be solved
Using a MATCH-WITH-UNWIND Cypher query, I can successfully select a subtree and bind it to a path. Let’s say the subtree spans the nodes A, E, F, I and J.
Based on this path I need all leaves of the selected subtree, not the leaves of the complete tree.
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'}) /* simplified */
WITH
nodes(path) as selectedPath
/* here: necessary magic to identify the leaf nodes of the subtree */
RETURN
leafNode;
Among other things I tried to solve the requirement with a WHERE NOT(node-->()) approach, but realized this works for leaves of the complete tree only. Unfortunately I was not able to convince the WHERE NOT(node-->()) clause to respect the selected subtree boundaries.
So, how can I find all leaves of a selected subgraph with Cypher and Neo4j? Can you please give me some advice on how to solve this? Many thanks in advance for pointing me in the right direction!

As you noted, checking for nodes without children only finds the leaves of the entire tree. Instead, go through all relationships in the subtree and pick every node that occurs as the end of a relationship but never as the start of one:
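MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'})
// one row per relationship on the selected path(s)
UNWIND relationships(path) AS r
// nodes that occur at the end / at the start of any selected relationship
WITH collect(DISTINCT endNode(r)) AS endNodes,
collect(DISTINCT startNode(r)) AS startNodes
UNWIND endNodes AS leaf
// a subtree leaf is never the start of a selected relationship
WITH leaf WHERE NOT leaf IN startNodes
RETURN leaf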
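The set logic behind this query can be checked outside the database. The following is a minimal plain-Python sketch, not Neo4j driver code: it models the example tree as (start, end) pairs and assumes the selected subtree is the one spanning A, E, F, I and J from the question. The function name `subtree_leaves` is made up for illustration.

```python
# Plain-Python model of the query logic (not Neo4j driver code).
# Each tuple is one CONTAINS relationship: (start node, end node).
tree = [
    ("Root", "NodeA"), ("Root", "NodeB"), ("Root", "NodeC"),
    ("NodeA", "NodeD"), ("NodeA", "NodeE"), ("NodeA", "NodeF"),
    ("NodeE", "NodeG"), ("NodeE", "NodeH"),
    ("NodeF", "NodeI"), ("NodeF", "NodeJ"), ("NodeF", "NodeK"),
    ("NodeI", "NodeL"), ("NodeI", "NodeM"),
    ("NodeJ", "NodeN"), ("NodeK", "NodeO"), ("NodeK", "NodeP"),
    ("NodeM", "NodeQ"),
]


def subtree_leaves(relationships):
    """Return end nodes of the given relationships that are never a start node."""
    start_nodes = {start for start, _ in relationships}
    end_nodes = {end for _, end in relationships}
    return sorted(end_nodes - start_nodes)


# Assumed selection: the subtree spanning A, E, F, I and J.
selected = {"NodeA", "NodeE", "NodeF", "NodeI", "NodeJ"}
subtree = [(s, e) for s, e in tree if s in selected and e in selected]

print(subtree_leaves(subtree))  # ['NodeE', 'NodeI', 'NodeJ']
print(subtree_leaves(tree))     # leaves of the complete tree, for comparison
```

NodeE, NodeI and NodeJ all have children in the complete tree, which is why a check against outgoing relationships of the whole graph misses them.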

Related

Neo4J cypher query to find similar graphs

I have several separated graphs in a single database and I am currently searching for a way to get a list of all similar graphs.
For instance, I have the following three graphs:
As you can see, graph 1 and 2 are similar and graph 3 is different, because the last node of graph 3 has Label_4 and not Label_3 (as it is the case for 1 and 2).
Therefore, I would like to get as a result of the query something like:
[a1->b1->c1,a2->b2->c2],[a3->b3->d3]
where a1->b1->c1 is graph 1, a2->b2->c2 is graph 2, and a3->b3->d3 is graph 3.
Is there a way to achieve this with Cypher? The representation of the result can also be different, as long as it groups similar graphs (e.g., a list of node IDs, or only the start node IDs, is also fine).
For the creation of the example I used the following commands:
CREATE (a1:Label_1 {name: "Label_1"})
CREATE (b1:Label_2 {name: "Label_2"})
CREATE (c1:Label_3 {name: "Label_3"})
CREATE (a2:Label_1 {name: "Label_1"})
CREATE (b2:Label_2 {name: "Label_2"})
CREATE (c2:Label_3 {name: "Label_3"})
CREATE (a3:Label_1 {name: "Label_1"})
CREATE (b3:Label_2 {name: "Label_2"})
CREATE (d3:Label_4 {name: "Label_4"})
CREATE (a1)-[:FOLLOWS]->(b1)
CREATE (b1)-[:FOLLOWS]->(c1)
CREATE (a2)-[:FOLLOWS]->(b2)
CREATE (b2)-[:FOLLOWS]->(c2)
CREATE (a3)-[:FOLLOWS]->(b3)
CREATE (b3)-[:FOLLOWS]->(d3)
If you are: (A) trying to group complete directed graphs (i.e., directed graphs that start at a root node and end at a leaf node), and (B) only interested in using one of the (possibly many) labels for each node, this should work (but, due to the unbounded variable-length relationship, it could take a very long time or run out of memory in large DBs):
MATCH p = (n)-[*]->(m)
WHERE NOT ()-->(n) AND NOT (m)-->()
RETURN [x IN NODES(p) | LABELS(x)[0]] as labelPath, COLLECT(p)
You can remove the (A) constraint by removing the WHERE clause, but then you'd have a much bigger result set (and increase the time to completion and the risk of running out of memory).
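The grouping that the RETURN clause performs (the list of first labels acts as the grouping key, and COLLECT gathers the paths per key) can be sketched in plain Python. This is only a model of the idea, not Neo4j code: the `paths` list is a hand-written stand-in for the root-to-leaf paths the MATCH would find in the example data, and `group_by_label_path` is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-written stand-in for the root-to-leaf paths of the example data.
# Each path is a list of (node name, labels) pairs.
paths = [
    [("a1", ["Label_1"]), ("b1", ["Label_2"]), ("c1", ["Label_3"])],
    [("a2", ["Label_1"]), ("b2", ["Label_2"]), ("c2", ["Label_3"])],
    [("a3", ["Label_1"]), ("b3", ["Label_2"]), ("d3", ["Label_4"])],
]


def group_by_label_path(paths):
    """Group paths by the sequence of each node's first label."""
    groups = defaultdict(list)
    for path in paths:
        label_path = tuple(labels[0] for _, labels in path)
        groups[label_path].append([name for name, _ in path])
    return dict(groups)


for label_path, members in group_by_label_path(paths).items():
    print(label_path, members)
```

Graphs 1 and 2 end up under the same key, while graph 3 gets its own group.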

Neo4j's Cypher query language - reducing nodes in a match

Relatively new to Neo4j. I realize the way I originally posted this it was too ambiguous. Below is hopefully a better explanation.
//Subgraph 1
Create (p1:Person {name: 'Person1'})
Create (p2:Person {name: 'Person2'})
Create (a1:Address {street: 'Suspicious'})
Create (p1)-[:Resides]->(a1)
Create (p2)-[:Resides]->(a1)
//Subgraph 2
Create (p3:Person {name: 'Person3'})
Create (p4:Person {name: 'Person4'})
Create (a2:Address {street: 'Double'})
Create (p3)-[:Resides]->(a2)
Create (p4)-[:Resides]->(a2)
Create (p3)-[:Knows]->(p4)
//Subgraph 3
Create (p5:Person {name: 'Person5'})
Create (a3:Address {street: 'Single'})
Create (p5)-[:Resides]->(a3)
What I would like to write is a query to detect the following:
- All addresses (and people) that have 2 or more People residing there that do not know each other.
This means that only Subgraph1 should be found.
Subgraph2 would not be found because there are 2 people that reside there but they know each other.
Subgraph3 would not be found because there is only 1 person residing there.
Again, thanks for the help.
This Cypher query should work:
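MATCH (n1)-[:Resides]->()<-[:Resides]-(n2)
WHERE NOT exists((n1)-[:Knows]-(n2))
RETURN n1, n2
Start by matching nodes that have a Resides relationship to the same node, then filter out pairs that are connected by a Knows relationship. The relationship types are written as Resides and Knows because types are case-sensitive and must match the ones created in the question.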
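To check the expected outcome against the three subgraphs from the question, the filter can be modelled in plain Python. This is a sketch of the logic only, not Neo4j code; the dictionaries mirror the sample data, and `strangers_at_same_address` is a name invented for this example. It assumes "do not know each other" is meant pairwise and ignores the direction of the relationship.

```python
from itertools import combinations

# Sample data from the question: person -> street, plus the single "knows" pair.
resides = {
    "Person1": "Suspicious",
    "Person2": "Suspicious",
    "Person3": "Double",
    "Person4": "Double",
    "Person5": "Single",
}
knows = {("Person3", "Person4")}


def strangers_at_same_address(resides, knows):
    """Map each street to the pairs of residents who do not know each other."""
    by_address = {}
    for person, street in resides.items():
        by_address.setdefault(street, []).append(person)

    result = {}
    for street, people in by_address.items():
        pairs = [
            (a, b)
            for a, b in combinations(sorted(people), 2)
            if (a, b) not in knows and (b, a) not in knows
        ]
        if pairs:
            result[street] = pairs
    return result


print(strangers_at_same_address(resides, knows))
# {'Suspicious': [('Person1', 'Person2')]}
```

Only the first subgraph is reported: the second has a pair that know each other, and the third has a single resident.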

CREATE UNIQUE in neo4j produces duplicate nodes

According to the neo4j documentation:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
This sounds great, but CREATE UNIQUE doesn't seem to follow the 'least possible change' rule. e.g., here is some Cypher to create two people:
CREATE (n:Person {name: 'Alice'})
CREATE (n:Person {name: 'Bob'})
CREATE INDEX ON :Person(name)
and here are two CREATE UNIQUE statements that create a relationship between those people. Since both people already exist in the graph, only the relationships should be newly created:
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)-[:knows]->(b:Person {name: 'Bob'})
RETURN a
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)<-[:knows]-(b:Person {name: 'Bob'})
RETURN a
After this, the graph should look like
(Alice)<---KNOWS--->(Bob).
But when you run a MATCH query:
MATCH (a:Person)
RETURN a
it seems that the graph now looks like
(Bob)
(Bob)--KNOWS-->(Alice)--KNOWS-->(Bob);
two extra Bobs have been created.
I looked a bit through the other Cypher commands, but none of them seem intended for this use case: create a link between existing node A and existing node B if B exists, and otherwise create a link between existing node A and a newly created node B. How can this problem best be solved within the Cypher framework?
This query should do what you want (if you always want to end up with a single knows relationship between the 2 nodes):
MATCH (a:Person {name: 'Alice'})
MERGE (b:Person {name: 'Bob'})
MERGE (a)-[:knows]->(b)
RETURN a;
Here is how you can do it with CREATE UNIQUE
MATCH (a:Person {name: 'Alice'}), (b:Person {name:'Bob'})
CREATE UNIQUE (a)-[:knows]->(b), (b)-[:knows]->(a)
You need to match both nodes first (two patterns in the MATCH clause); otherwise CREATE UNIQUE creates a new node rather than reusing the existing one.

neo4j cypher joining 2 nodes merge

I have 2 node labels: User, Tag.
Let's say that I have a user node that exists.
Is it possible to match that node,
and then if the tag exists merge between them,
and if the tag doesn't exist create the tag.
I tried:
MATCH (n:User {name: "user"}) MERGE (n)-[r:follow]->(tag:Tag {name: "notexist"})
In the above example it creates the node "notexist" and the relationship.
But if a node named "notexist" already exists, it doesn't merge; instead it creates another tag named "notexist".
Thank you
Lee,
Here's how to do this.
MATCH(n:User {name: 'user'})
WITH n
MERGE (t:Tag {name: 'notexist'})
WITH n, t
MERGE (n)-[r:follow]->(t);
Grace and peace,
Jim
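Why the single MERGE in the question duplicates the tag while the two-step version does not can be illustrated with a toy model. The sketch below is plain Python, not Neo4j code, and rests on one assumption about MERGE: a pattern is either matched as a whole or created as a whole, with only already-bound nodes being reused. The `Graph` class and its method names are hypothetical.

```python
class Graph:
    """Toy in-memory graph; a node is (label, name), its list index is its id."""

    def __init__(self):
        self.nodes = []
        self.rels = set()  # (start id, relationship type, end id)

    def create_node(self, label, name):
        self.nodes.append((label, name))
        return len(self.nodes) - 1

    def merge_node(self, label, name):
        """Model of MERGE (:label {name}): reuse a matching node or create one."""
        for node_id, node in enumerate(self.nodes):
            if node == (label, name):
                return node_id
        return self.create_node(label, name)

    def merge_rel(self, start, rel_type, end):
        """Model of MERGE (start)-[:rel_type]->(end) with both ends bound."""
        self.rels.add((start, rel_type, end))

    def merge_pattern(self, start, rel_type, label, name):
        """Model of MERGE (start)-[:rel_type]->(:label {name}), end node unbound."""
        for s, t, e in self.rels:
            if s == start and t == rel_type and self.nodes[e] == (label, name):
                return e  # the whole pattern already exists
        end = self.create_node(label, name)  # otherwise create all unbound parts
        self.rels.add((start, rel_type, end))
        return end

    def count(self, label, name):
        return self.nodes.count((label, name))


# One MERGE over the whole pattern: the existing tag is not connected yet,
# so the pattern does not match and a second tag is created.
g1 = Graph()
user = g1.create_node("User", "user")
g1.create_node("Tag", "notexist")
g1.merge_pattern(user, "follow", "Tag", "notexist")
print(g1.count("Tag", "notexist"))  # 2

# Two steps: merge the tag on its own, then merge the relationship.
g2 = Graph()
user = g2.create_node("User", "user")
g2.create_node("Tag", "notexist")
tag = g2.merge_node("Tag", "notexist")
g2.merge_rel(user, "follow", tag)
print(g2.count("Tag", "notexist"))  # 1
```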

CREATE UNIQUE with distinct nodes

I'm trying to put Java packages like org.somepackage.parser and org.otherpackage.parser into a neo4j database. I tried to solve the problem with the following cypher queries.
MATCH (root:package {isRoot: true})
CREATE UNIQUE (root)
<-[:subpackage]-(:package {name: 'org'})
<-[:subpackage]-(:package {name: 'somepackage'})
<-[:subpackage]-(:package {name: 'parser'})
MATCH (root:package {isRoot: true})
CREATE UNIQUE (root)
<-[:subpackage]-(:package {name: 'org'})
<-[:subpackage]-(:package {name: 'otherpackage'})
<-[:subpackage]-(:package {name: 'parser'})
Using the queries above, the single parser node just gets two relationships, one to somepackage and one to otherpackage. I know that this is the expected behavior, but is there some way to get 2 different parser nodes, one linked to somepackage and the other linked to otherpackage?
Either add an id property to your parser nodes and generate its value with Java's UUID.randomUUID(), or, if you don't want to do that, simply name the parser nodes by their fully qualified package names:
name: "org.otherpackage.parser" and name: "org.somepackage.parser"
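The second suggestion can be sketched as follows. This is a plain-Python illustration rather than Cypher: it assumes every node along the chain is named by its fully qualified prefix, so that two packages share a node only where their prefixes agree. The helper names `package_chain` and `subpackage_edges` are invented for this example.

```python
def package_chain(qualified_name):
    """Split 'org.a.b' into its prefixes: ['org', 'org.a', 'org.a.b']."""
    parts = qualified_name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def subpackage_edges(qualified_names):
    """Return (child, parent) pairs, one per subpackage relationship."""
    edges = set()
    for name in qualified_names:
        chain = package_chain(name)
        # pair every prefix with the shorter prefix before it
        edges.update(zip(chain[1:], chain))
    return edges


edges = subpackage_edges(["org.somepackage.parser", "org.otherpackage.parser"])
for child, parent in sorted(edges):
    print(child, "-[:subpackage]->", parent)
```

The shared `org` node appears once, while each `parser` node carries a distinct name and therefore stays separate.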
