How to find specific subgraph in Neo4j using where clause - neo4j

I have a large graph where some of the relationships have properties that I want to use to effectively prune the graph as I create a subgraph. For example, if I have a property called 'relevance score' and I want to start at one node and sprawl out, collecting all nodes and relationships but pruning wherever a relationship has the above property.
My attempt to do so netted this query:
start n=node(15) match (n)-[r*]->(x) WHERE NOT HAS(r.relevance_score) return x, r
My attempt has two issues I cannot resolve:
1) Reflecting I believe this will not result in a pruned graph but rather a collection of disjoint graphs. Additionally:
2) I am getting the following error from what looks to be a correctly formed cypher query:
Type mismatch: expected Any, Map, Node or Relationship but was Collection<Relationship> (line 1, column 52 (offset: 51))
"start n=node(15) match (n)-[r*]->(x) WHERE NOT HAS(r.relevance_score) return x, r"

You should be able to use the ALL() function on the collection of relationships to enforce that for all relationships in the path, the property in question is null.
Using Gabor's sample graph, this query should work.
MATCH p = (n {name: 'n1'})-[rs1*]->()
WHERE ALL(rel in rs1 WHERE rel.relevance_score is null)
RETURN p

One solution that I can think of is to go through all relationships (with rs*), filter the the ones without the relevance_score property and see if the rs "path" is still the same. (I quoted "path" as technically it is not a Neo4j path).
I created a small example graph:
CREATE
(n1:Node {name: 'n1'}),
(n2:Node {name: 'n2'}),
(n3:Node {name: 'n3'}),
(n4:Node {name: 'n4'}),
(n5:Node {name: 'n5'}),
(n1)-[:REL {relevance_score: 0.5}]->(n2)-[:REL]->(n3),
(n1)-[:REL]->(n4)-[:REL]->(n5)
The graph contains a single relevant edge, between nodes n1 and n2.
The query (note that I used {name: 'n1'} to get the start node, you might use START node=...):
MATCH (n {name: 'n1'})-[rs1*]->(x)
UNWIND rs1 AS r
WITH n, rs1, x, r
WHERE NOT exists(r.relevance_score)
WITH n, rs1, x, collect(r) AS rs2
WHERE rs1 = rs2
RETURN n, x
The results:
╒══════════╤══════════╕
│n │x │
╞══════════╪══════════╡
│{name: n1}│{name: n4}│
├──────────┼──────────┤
│{name: n1}│{name: n5}│
└──────────┴──────────┘
Update: see InverseFalcon's answer for a simpler solution.

Related

Retrieve relationships between child nodes using neo4j cypher

I have a following network result when I run this query in neo4j browser:
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item) Return n1,r,n2
At the bottom of the graph, it says: Displaying 6 nodes, 7 relationships.
But when I look on the table in the neo4j browser, I only have 5 records
n1,r,n2
A,A->B,B
A,A->C,C
A,A->D,D
A,A->E,E
A,A->F,F
So in the java code, when I get the list of records using the code below:
List<Record> records = session.run(query).list();
I only get 5 records, so I only get the 5 relationships.
But I want to get all 7 relationships including the 2 below:
B->C
C->F
How can i achieve that using the cypher query?
This should work:
MATCH (n:Item {name: 'A'})-[r1]-(n2:Item)
WITH n, COLLECT(r1) AS rs, COLLECT(n2) as others
UNWIND others AS n2
OPTIONAL MATCH (n2)-[r2]-(x)
WHERE x IN others
RETURN n, others, rs + COLLECT(r2) AS rs
Unlike #FrantišekHartman's first approach, this query uses UNWIND to bind n2 (which is not specified in the WITH clause and therefore becomes unbound) to the same n2 nodes found in the MATCH clause. This query also combines all the relationships into a single rs list.
There are many ways to achieve this. One ways is to travers to 2nd level and check that the 2nd level node is in the first level as well
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item)
WITH n1,collect(r) AS firstRels,collect(n2) AS firstNodes
OPTIONAL MATCH (n2)-[r2]-(n3:Item)
WHERE n3 IN firstNodes
RETURN n1,firstRels,firstNodes,collect(r2) as secondRels
Or you could do a Cartesian product between the first level nodes and match:
MATCH (n1:Item {name: 'A'})-[r]-(n2:Item)
WITH n1,collect(r) AS firstRels,collect(n2) as firstNodes
UNWIND firstNodes AS x
UNWIND firstNodes AS y
OPTIONAL MATCH (x)-[r2]-(y)
RETURN n1,firstRels,firstNodes,collect(r2) as secondRels
Depending on on cardinality of firstNodes and secondRels and other existing relationships one might be faster than the other.

Is there a way to Traverse neo4j Path using Cypher for different relationships

I want to Traverse a PATH in neo4j (preferably using Cypher, but I can write neo4j managed extensions).
Problem -
For any starting node (:Person) I want to traverse hierarchy like
(me:Person)-[:FRIEND|:KNOWS*]->(newPerson:Person)
if the :FRIEND outgoing relationship is present then the path should traverse that, and ignore any :KNOWS outgoing relationships, if :FRIEND relationship does not exist but :KNOWS relationship is present then the PATH should traverse that node.
Right now the problem with above syntax is that it returns both the paths with :FRIEND and :KNOWS - I am not able to filter out a specific direction based on above requirement.
1. Example data set
For the ease of possible further answers and solutions I note my graph creating statement:
CREATE
(personA:Person {name:'Person A'})-[:FRIEND]->(personB:Person {name: 'Person B'}),
(personB)-[:FRIEND]->(personC:Person {name: 'Person C'}),
(personC)-[:FRIEND]->(personD:Person {name: 'Person D'}),
(personC)-[:FRIEND]->(personE:Person {name: 'Person E'}),
(personE)-[:FRIEND]->(personF:Person {name: 'Person F'}),
(personA)-[:KNOWS]->(personG:Person {name: 'Person G'}),
(personA)-[:KNOWS]->(personH:Person {name: 'Person H'}),
(personH)-[:KNOWS]->(personI:Person {name: 'Person I'}),
(personI)-[:FRIEND]->(personJ:Person {name: 'Person J'});
2. Scenario "Optional Match"
2.1 Solution
MATCH (startNode:Person {name:'Person A'})
OPTIONAL MATCH friendPath = (startNode)-[:FRIEND*]->(:Person)
OPTIONAL MATCH knowsPath = (startNode)-[:KNOWS*]->(:Person)
RETURN friendPath, knowsPath;
If you do not need every path to all nodes of the entire path, but only the whole, I recommend using shortestPath() for performance reasons.
2.1 Result
Note the missing node 'Person J', because it owns a FRIENDS relationship to node 'Person I'.
3. Scenario "Expand paths"
3.1 Solution
Alternatively you could use the Expand paths functions of the APOC user library. Depending on the next steps of your process you can choose between the identification of nodes, relationships or both.
MATCH (startNode:Person {name:'Person A'})
CALL apoc.path.subgraphNodes(startNode,
{maxLevel: -1, relationshipFilter: 'FRIEND>', labelFilter: '+Person'}) YIELD node AS friendNodes
CALL apoc.path.subgraphNodes(startNode,
{maxLevel: -1, relationshipFilter: 'KNOWS>', labelFilter: '+Person'}) YIELD node AS knowsNodes
WITH
collect(DISTINCT friendNodes.name) AS friendNodes,
collect(DISTINCT knowsNodes.name) AS knowsNodes
RETURN friendNodes, knowsNodes;
3.2 Explanation
line 1: defining your start node based on the name
line 2-3: Expand from the given startNode following the given relationships (relationshipFilter: 'FRIEND>') adhering to the label filter (labelFilter: '+Person').
line 4-5: Expand from the given startNode following the given relationships (relationshipFilter: 'KNOWS>') adhering to the label filter (labelFilter: '+Person').
line 7: aggregates all nodes by following the FRIEND relationship type (omit the .name part if you need the complete node)
line 8: aggregates all nodes by following the KNOWS relationship type (omit the .name part if you need the complete node)
line 9: render the resulting groups of nodes
3.3 Result
╒═════════════════════════════════════════════╤═════════════════════════════════════════════╕
│"friendNodes" │"knowsNodes" │
╞═════════════════════════════════════════════╪═════════════════════════════════════════════╡
│["Person A","Person B","Person C","Person E",│["Person A","Person H","Person G","Person I"]│
│"Person D","Person F"] │ │
└─────────────────────────────────────────────┴─────────────────────────────────────────────┘
MATCH p = (me:Person)-[:FRIEND|:KNOWS*]->(newPerson:Person)
WITH p, extract(r in relationships(p) | type(r)) AS types
RETURN p ORDER BY types asc LIMIT 1
This is a matter of interrogating the types of outgoing relationships for each node and then making a prioritized decision on which relationships to retain leveraging some nested case logic.
Using the small graph above
MATCH path = (a)-[r:KNOWS|FRIEND]->(b)
WITH a, COLLECT([type(r),a,r,b]) AS rels
WITH a,
rels,
CASE WHEN filter(el in rels WHERE el[0] = "FRIEND") THEN filter(el in rels WHERE el[0] = "FRIEND")
ELSE CASE WHEN filter(el in rels WHERE el[0] = "KNOWS") THEN filter(el in rels WHERE el[0] = "KNOWS") ELSE [''] END END AS search
UNWIND search AS s
RETURN s[1] AS a, s[2] AS r, s[3] AS b
I believe this returns your expected result:
Based on your logic, there should be no traversal to Person G or Person H from Person A, as there is a FRIEND relationship from Person A to Person B that takes precedence.
However there is a traversal from Person H to Person I because of the existence of the singular KNOWS relationship, and then a subsequent traversal from Person I to Person J.

Cypher - how to walk graph while computing

I'm just starting studying Cypher here..
How would would I specify a Cypher query to return the node connected, from 1 to 3 hops away of the initial node, which has the highest average of weights in the path?
Example
Graph is:
(I know I'm not using the Cypher's notation here..)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the return statement to see what the path looks like. I created an example graph based on your question with below query with nodes named A, B, C etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
1) To select the possible paths with length from one to three - use match with variable length relationships:
MATCH p = (A)-[*1..3]->(T)
2) And then use the reduce function to calculate the average weight. And then sorting and limits to get one value:
MATCH p = (A)-[*1..3]->(T)
WITH p, T,
reduce(s=0, r in rels(p) | s + r.weight)/length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1

Cypher - Only show node name, not full node in path variable

In Cypher I have the following query:
MATCH p=(n1 {name: "Node1"})-[r*..6]-(n2 {name: "Node2"})
RETURN p, reduce(cost = 0, x in r | cost + x.cost) AS cost
It is working as expected. However, it prints the full n1 node, then the full r relationship (with all its attributes), and then full n2.
What I want instead is to just show the value of the name attribute of n1, the type attribute of r and again the name attribute of n2.
How could this be possible?
Thank you.
The tricky part of your request is the type attribute of r, as r is a collection of relationships of the path, not a single relationship. We can use EXTRACT to produce a list of relationship types for all relationships in your path. See if this will work for you:
MATCH (n1 {name: "Node1"})-[r*..6]-(n2 {name: "Node2"})
RETURN n1.name, EXTRACT(rel in r | TYPE(rel)) as types, n2.name, reduce(cost = 0, x in r | cost + x.cost) AS cost
You also seem to be calculating a cost for the path. Have you looked at the shortestPath() function?

Neo4J/Cypher Filter nodes based on multiple relationships

Using Neo4J and Cypher:
Given the diagram below, I want to be able to start at node 'A' and get all the children that have a 'ChildOf' relationship with 'A', but not an 'InactiveChildOf' relationship. So, in this example, I would get back A, C and G. Also, a node can get a new parent ('H' in the diagram) and if I ask for the children of 'H', I should get B, D and E.
I have tried
match (p:Item{name:'A'}) -[:ChildOf*]-(c:Item) where NOT (p)-[:InactiveChildOf]-(c) return p,c
however, that also returns D and E.
Also tried:
match (p:Item{name:'A'}) -[rels*]-(c:Item) where None (r in rels where type(r) = 'InactiveChildOf') return p,c
But that returns all.
Hopefully, this is easy for Neo4J and I am just missing something obvious. Appreciate the help!
Example data: MERGE (a:Item {name:'A'}) MERGE (b:Item {name:'B'}) MERGE (c:Item {name:'C'}) MERGE (d:Item {name:'D'}) MERGE (e:Item {name:'E'}) MERGE (f:Item {name:'F'}) MERGE (g:Item {name:'G'}) MERGE (h:Item {name:'H'}) MERGE (b)-[:ChildOf]->(a) MERGE (b)- [:InactiveChildOf] ->(a) MERGE (c)-[:ChildOf]->(a) MERGE (d)-[:ChildOf]->(b) MERGE (e)-[:ChildOf]->(b) MERGE (f)-[:ChildOf]->(c) MERGE (f)- [:InactiveChildOf] ->(c) MERGE (g)-[:ChildOf]->(c) MERGE (b)-[:ChildOf]->(h)
Note, I understand that I could simply put an "isActive" property on the ChildOf relationship or remove the relationship, but I am exploring options and trying to understand if this concept would work.
If a query interpreted as: find all the nodes, the path to which passes through the nodes unrelated by InactiveChildOf to the previous node, the request might be something like this:
match path = (p:Item{name:'A'})<-[:ChildOf*]-(c:Item)
with nodes(path) as nds
unwind range(0,size(nds)-2) as i
with nds,
nds[i] as i1,
nds[i+1] as i2
where not (i1)-[:InactiveChildOf]-(i2)
with nds,
count(i1) as test
where test = size(nds)-1
return head(nds),
last(nds)
Update: I think that this version is better (check that between two nodes there is no path that will contain at least one non-active type of relationship):
match path = (p:Item {name:'A'})<-[:ChildOf|InactiveChildOf*]-(c)
with p, c,
collect( filter( r in rels(path)
where type(r) = 'InactiveChildOf'
)
) as test
where all( t in test where size(t) = 0 )
return p, c
By reading and examining the graph, correct me if I'm wrong but the actual text representation of the cypher query should be
Find me nodes in a path to A, all nodes in that path cannot have an outgoing
InactiveChildOf relationship.
So, in Cypher it would be :
MATCH p=(i:Item {name:"A"})<-[:ChildOf*]-(x)
WHERE NONE( x IN nodes(p) WHERE (x)-[:InactiveChildOf]->() )
UNWIND nodes(p) AS n
RETURN distinct n
Which returns

Resources