Excluding Nodes that are unconnected - neo4j

I have a Graph connected as:
-->(D)-->(E)-->(F)
/
(A)-->(B)
\
-->(C)
The Graph is a tree with root = A with a directed relationship from parent to child through a :HAS_CHILD
What I want to do is to exclude nodes for a given property for instance:
MATCH (n:Node)
WHERE n.name <> "D"
return n
Which would give me the a subgraph:
(E)-->(F)
(A)-->(B)
\
-->(C)
Where E and F is not reachable from the root node. How do I exclude such subtrees?
Preferred result would be:
(A)-->(B)
\
-->(C)

I think we do not have the full picture of your data and what you really want here.
My guess is that your data model is a tree. It seems to me that you're trying to define a node to exclude, which also excludes all branches beneath that node (so in your example, you may have a rich and complex subtree beneath D, and you want to exclude all of that). This assumes a directed relationship down from parents to children in your tree.
If so, you can try the following query. I'm assuming the relationship from parent to child as :HAS_CHILD, since that wasn't included in your description.
MATCH (excluded:Node {name: "D"})
WITH excluded
MATCH (n:Node)
WHERE n <> excluded
AND NOT (excluded)-[:HAS_CHILD*]->(n)
RETURN n
Or, an alternative, which may perform better if your tree is large and the subtree beneath your excluded node is comparatively smaller than the entire tree:
MATCH (excludedRoot:Node {name: "D"})-[:HAS_CHILD*0..]->(excluded)
WITH COLLECT(excluded) as excludedNodes
MATCH (n:Node)
WHERE NOT n IN excludedNodes
RETURN n

So you want all the nodes that are neither D nor only connected to D:
MATCH (excluded:Node {name: "D"})
MATCH (n:Node)
WHERE n <> excluded
OPTIONAL MATCH (n)--(n2:Node)
WHERE n2 <> excluded
WITH n, collect(n2) AS nodes
WHERE size(nodes) > 0
RETURN n
This supposes that there's only one excluded node, as it will exclude the connected nodes for each excluded.
Should there be more than one, this modified query should work:
MATCH (excluded:Node {name: "D"})
WITH collect(excluded) AS excluded
MATCH (n:Node)
WHERE NOT n IN excluded
OPTIONAL MATCH (n)--(n2:Node)
WHERE NOT n2 IN excluded
WITH n, collect(n2) AS nodes
WHERE size(nodes) > 0
RETURN n

Related

Cypher - how to walk graph while computing

I'm just starting studying Cypher here..
How would would I specify a Cypher query to return the node connected, from 1 to 3 hops away of the initial node, which has the highest average of weights in the path?
Example
Graph is:
(I know I'm not using the Cypher's notation here..)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the return statement to see what the path looks like. I created an example graph based on your question with below query with nodes named A, B, C etc.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
1) To select the possible paths with length from one to three - use match with variable length relationships:
MATCH p = (A)-[*1..3]->(T)
2) And then use the reduce function to calculate the average weight. And then sorting and limits to get one value:
MATCH p = (A)-[*1..3]->(T)
WITH p, T,
reduce(s=0, r in rels(p) | s + r.weight)/length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the statement nodes(path) + nodes(optionalPath) is an operation adding null and I get no records. This is true even the nodes(path) term does contain nodes.
What's the best way to get around this problem?
You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones) and always the perform the second MATCH query, and only eliminate unwanted paths afterwards. (But, it is actually not clear if you even need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.
It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (hospital in between), Than I'd adjust like this
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but just says exactly what you want (and also adds the start node to the hospital check list)

Cypher: Quantifying over zero or more node-then-relations

I want to return all nodes a and b, where b is not downstream of a via any path that begins with relation rel. I keep finding myself having to write one condition for the case where a is linked directly to b via rel, and one for the indirect case, leading to something like this:
//Semi-pseudo-code.
match (a)-[*]->(b)
optional match dir=(a)-[:rel]->(b)
optional match indir=(a)-[:rel]-()-[*]->(b)
where length(dir)=0
and length(indir)=0
return a,b
Is there any easier way? Really I want something like this, where the bare quantifier means "zero or more nodes-then-relations":
match (a)-[*]->(b)
match not (a)-[:rel]-*->(b)
return a,b
Note: I suspect this may at root be the same as my last question: Cypher: Matching nodes at arbitrary depth via a strictly alternating set of relations
We can use WHERE NOT to formulate negative conditions, in a similar fashion to your second semi-pseudocode:
MATCH (a)-[*]->(b)
WHERE NOT ((a)-[:rel]->()-[*1..]->(b))
RETURN a, b
Of course, this will be anything but efficient, so you should at least try to restrict the labels of a and b and the relationships between them, e.g. (a:Label1)-[:rel1|rel2*]->(b:Label2)
An example:
CREATE
(n1:N {name: "n1"}),
(n2:N {name: "n2"}),
(n3:N {name: "n3"}),
(n4:N {name: "n4"}),
(n5:N {name: "n5"}),
(n1)-[:x]->(n2),
(n3)-[:rel]->(n4),
(n4)-[:x]->(n5)
The query results in:
╒══════════╤══════════╕
│a │b │
╞══════════╪══════════╡
│{name: n1}│{name: n2}│
├──────────┼──────────┤
│{name: n4}│{name: n5}│
└──────────┴──────────┘
As you can see, it does not include n3 and n5, as it starts with a :rel relationship.
This should work:
MATCH (a)-[rs*]->(b)
WHERE TYPE(rs[0]) <> 'rel'
RETURN a, b;
However, the query below should be much more performant, as it filters out all unwanted path beginnings before it does the very expensive variable-length path search. The *0.. syntax makes the variable-length search use a lower bound of 0 for the length (so x will also be returnable as b).
MATCH (a)-[r]->(x)
WHERE TYPE(r) <> 'rel'
MATCH (x)-[*0..]->(b)
RETURN a, b;

Neo4J/Cypher Filter nodes based on multiple relationships

Using Neo4J and Cypher:
Given the diagram below, I want to be able to start at node 'A' and get all the children that have a 'ChildOf' relationship with 'A', but not an 'InactiveChildOf' relationship. So, in this example, I would get back A, C and G. Also, a node can get a new parent ('H' in the diagram) and if I ask for the children of 'H', I should get B, D and E.
I have tried
match (p:Item{name:'A'}) -[:ChildOf*]-(c:Item) where NOT (p)-[:InactiveChildOf]-(c) return p,c
however, that also returns D and E.
Also tried:
match (p:Item{name:'A'}) -[rels*]-(c:Item) where None (r in rels where type(r) = 'InactiveChildOf') return p,c
But that returns all.
Hopefully, this is easy for Neo4J and I am just missing something obvious. Appreciate the help!
Example data: MERGE (a:Item {name:'A'}) MERGE (b:Item {name:'B'}) MERGE (c:Item {name:'C'}) MERGE (d:Item {name:'D'}) MERGE (e:Item {name:'E'}) MERGE (f:Item {name:'F'}) MERGE (g:Item {name:'G'}) MERGE (h:Item {name:'H'}) MERGE (b)-[:ChildOf]->(a) MERGE (b)- [:InactiveChildOf] ->(a) MERGE (c)-[:ChildOf]->(a) MERGE (d)-[:ChildOf]->(b) MERGE (e)-[:ChildOf]->(b) MERGE (f)-[:ChildOf]->(c) MERGE (f)- [:InactiveChildOf] ->(c) MERGE (g)-[:ChildOf]->(c) MERGE (b)-[:ChildOf]->(h)
Note, I understand that I could simply put an "isActive" property on the ChildOf relationship or remove the relationship, but I am exploring options and trying to understand if this concept would work.
If a query interpreted as: find all the nodes, the path to which passes through the nodes unrelated by InactiveChildOf to the previous node, the request might be something like this:
match path = (p:Item{name:'A'})<-[:ChildOf*]-(c:Item)
with nodes(path) as nds
unwind range(0,size(nds)-2) as i
with nds,
nds[i] as i1,
nds[i+1] as i2
where not (i1)-[:InactiveChildOf]-(i2)
with nds,
count(i1) as test
where test = size(nds)-1
return head(nds),
last(nds)
Update: I think that this version is better (check that between two nodes there is no path that will contain at least one non-active type of relationship):
match path = (p:Item {name:'A'})<-[:ChildOf|InactiveChildOf*]-(c)
with p, c,
collect( filter( r in rels(path)
where type(r) = 'InactiveChildOf'
)
) as test
where all( t in test where size(t) = 0 )
return p, c
By reading and examining the graph, correct me if I'm wrong but the actual text representation of the cypher query should be
Find me nodes in a path to A, all nodes in that path cannot have an outgoing
InactiveChildOf relationship.
So, in Cypher it would be :
MATCH p=(i:Item {name:"A"})<-[:ChildOf*]-(x)
WHERE NONE( x IN nodes(p) WHERE (x)-[:InactiveChildOf]->() )
UNWIND nodes(p) AS n
RETURN distinct n
Which returns

Cypher Optional Match

I have a graph in that contains two types of nodes (objects and pieces) and two types of links (similarTo and contains). Some pieces are made of the pieces.
I would like to extract the path to each piece starting from a set of objects.
MATCH (o:Object)
WITH o
OPTIONAL MATCH path = (p:Piece) <-[:contains*]- (o) -[:similarTo]- (:Object)
RETURN path
The above query only returns part of the pieces. In the returned graph, some objects do not directly connect to any pieces, the latter are not returned, although they actually do!
I can change the query to:
MATCH (o:Object) -[:contains*]-> (p:Piece)
OPTIONAL MATCH (o) –[:similarTo]- (:Object)
However, I did not manage to return the whole path for that query, which I need to return collection of nodes and links with:
WITH rels(path) as relations , nodes(path) as nodes
UNWIND relations as r unwind nodes as n
RETURN {nodes: collect(distinct n), links: collect(distinct {source: id(startNode(r)), target: id(endNode(r))})}
I'd be grateful to any recommendation.
Would something like this do the trick ?
I created a small graph representing objects and pieces here : http://console.neo4j.org/r/abztz4
Execute distinct queries with UNION ALL
Here you'll combine the two use cases in one set of paths :
MATCH (o:Object)
WITH o
OPTIONAL MATCH p=(o)-[:CONTAINS]->(piece)
RETURN p
UNION ALL
MATCH (o:Object)
WITH o
OPTIONAL MATCH p=(o)-[:SIMILAR_TO]-()-[:CONTAINS]->(piece)
RETURN p

Resources