Find connected groups over levels - neo4j

A node A has 3 connected Nodes B1, B2, B3. Those Bx Nodes have again connected Nodes C1,C2,C3 and C4. Also Node A have 2 connected nodes C5 and C6.
Starting with node A I want to collect all C-nodes. I did a query for the A node, collect the two C-Nodes, then a query for the B-nodes, collect again all C-nodes and merge both arrays. Work but is not very clever.
I tried (Pseudocode)
MATCH (g)<-[:IS_SUBGROUP_OF*1]-(i)-[:HAS_C_NODES]->(c) WHERE g = A.uuid RETURN C_NODES
But I get either all c-nodes for A or for the B-nodes
How would I do a query that collects all C-Nodes starting with Node A?
* edited *
Here is some example data:
CREATE (a:A), (b1:B1), (b2:B2), (b3:B3), (c1:C1), (c2:C2), (c3:C3), (c4:C4), (a)-[r:HAS]->(c4), (a)-[r1:HAS]->(b1), (a)-[r2:HAS]->(b2), (a)-[r3:HAS]->(b3), (b1)-[r4:HAS]->(c1), (b1)-[r5:HAS]->(c2), (b2)-[r6:HAS]->(c3)
A query should return all nodes starting with C, no matter to which node they are connected (A or B).

You can add multiple labels for each node. You should use this to your advantage and segregate all the B and C nodes into a second label.
Eg:
CREATE (a:A), (b1:B1:BType), (b2:B2:BType), (b3:B3:BType), (c1:C1:CType), (c2:C2:CType), (c3:C3:CType), (c4:C4:CType), (a)-[r:HAS]->(c4), (a)-[r1:HAS]->(b1), (a)-[r2:HAS]->(b2), (a)-[r3:HAS]->(b3), (b1)-[r4:HAS]->(c1), (b1)-[r5:HAS]->(c2), (b2)-[r6:HAS]->(c3)
I have modified your create statement to group all the B nodes as :BType label and all the C nodes as :CType label.
You can simply use the optional match keyword to selectively traverse through the relationships if they exist and obtain the results you want.
match (a:A)-[:HAS]->(b:BType)-[:HAS]->(c:CType) optional match (a:A)-[:HAS]->(xc:CType) return c,xc
If you would like both sets of nodes to be grouped together you could try this statement instead which uses collect().
match (a:A)-[:HAS]->(b:BType)-[:HAS]->(c:CType) with a,collect (distinct c) as set1 optional match (a:A)-[:HAS]->(xc:CType) return set1 + collect (distinct xc) as output

Related

Neo4J: How can I find if a path traversing multiple nodes given in a list exist?

I have a graph of nodes with a relationship NEXT with 2 properties sequence (s) and position (p). For example:
N1-[NEXT{s:1, p:2}]-> N2-[NEXT{s:1, p:3}]-> N3-[NEXT{s:1, p:4}]-> N4
A node N might have multiple outgoing Next relationships with different property values.
Given a list of node names, e.g. [N2,N3,N4] representing a sequential path, I want to check if the graph contains the nodes and that the nodes are connected with relationship Next in order.
For example, if the list contains [N2,N3,N4], then check if there is a relationship Next between nodes N2,N3 and between N3,N4.
In addition, I want to make sure that the nodes are part of the same sequence, thus the property s is the same for each relationship Next. To ensure that the order maintained, I need to verify if the property p is incremental. Meaning, the value of p in the relationship between N2 -> N3 is 3 and the value p between N3->N4 is (3+1) = 4 and so on.
I tried using APOC to retrieve the possible paths from an initial node N using python (library: neo4jrestclient) and then process the paths manually to check if a sequence exists using the following query:
q = "MATCH (n:Node) WHERE n.name = 'N' CALL apoc.path.expandConfig(n {relationshipFilter:'NEXT>', maxLevel:4}) YIELD path RETURN path"
results = db.query(q,data_contents=True)
However, running the query took some time that I eventually stopped the query. Any ideas?
This one is a bit tough.
First, pre-match to the nodes in the path. We can use the collected nodes here to be a whitelist for nodes in the path
Assuming the start node is included in the list, a query might go like:
UNWIND $names as name
MATCH (n:Node {name:name})
WITH collect(n) as nodes
WITH nodes, nodes[0] as start, tail(nodes) as tail, size(nodes)-1 as depth
CALL apoc.path.expandConfig(start, {whitelistNodes:nodes, minLevel:depth, maxLevel:depth, relationshipFilter:'NEXT>'}) YIELD path
WHERE all(index in range(0, size(nodes)-1) WHERE nodes[index] = nodes(path)[index])
// we now have only paths with the given nodes in order
WITH path, relationships(path)[0].s as sequence
WHERE all(rel in tail(relationships(path)) WHERE rel.s = sequence)
// now each path only has relationships of common sequence
WITH path, apoc.coll.pairsMin([rel in relationships(path) | rel.p]) as pairs
WHERE all(pair in pairs WHERE pair[0] + 1 = pair[1])
RETURN path

Selecting one of two nodes in Cypher query

all Cypher masters!
I can't figure out how to query all B nodes while choosing either B1 or B2 and B4 or B5. There is no constraint on which of them, only that one is chosen. As in the image, there's a relation (B1,B2) and (B4,B5).
In other words - I want to MATCH all nodes of type B connected to some node with type A, but excluding either B1 or B2 and B4 or B5 (using the relation between them) in the result. The nodes of type B can only be pairwise connected - that is, no (B1,B2), (B2,B3) will exists simultaneously. Although, there can be more than one pair as the image shows.
Any ideas are more than welcome!
I think this is a simple additional condition:
MATCH (A:A)--(B:B)
OPTIONAL MATCH (B)--(BT:B)--(A)
WITH B WHERE BT IS NULL OR id(B) > id(BT)
RETURN B
So for this, it will be faster to use APOC Procedures, as there are some helpful collection functions, and a procedure we'll want to easily get the relationships that exist between a group of nodes.
The idea here is that we'll match to the connected :B nodes, use the cover() procedure to get all relationships among these nodes, collect those relationships and from those take one of the nodes for those relationships (we'll use the startnode here), and then we'll just subtract those chosen nodes from our list leaving us with the :B nodes we want:
MATCH (a:A)--(b:B)
WITH collect(b) as bNodes
CALL apoc.algo.cover(bNodes) YIELD rel
WITH bNodes, [r in collect(rel) | startNode(r)] as toRemove
RETURN apoc.coll.subtract(bNodes, toRemove) as nodes
If you don't have (or don't want to use) APOC, here's a Cypher-only version:
MATCH (a:A)--(b:B)
WITH collect(b) as bNodes
UNWIND bNodes as b
OPTIONAL MATCH (b)-[r]-(other)
WHERE other IN bNodes
WITH bNodes, collect(DISTINCT startNode(r)) as toRemove
RETURN [b in bNodes WHERE NOT b in toRemove] as nodes

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
Then for each connected node (b) I want to return the SUM of the weights that connect to the query subset
For example. If node b1 has 4 edges that connect to nodes in the query subset I would like to RETURN SUM(r2.weight) AS totalWeight for b2. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important as I have 30,000 nodes and 1.5M edges if you have any suggestions regarding this please throw them into the mix.
Many thanks
Matt
Why do you need so many Match statements? You can specify a nodes and b nodes in single Match statement and select only those who have a relationship between them.
After that just return b nodes and sum of the weights. b nodes will automatically be acting as a group by if it is returned along with aggregation function such as sum.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).

Neo4j - Intersect two node lists using Cypher

Having the following graphs:
node g1 with child nodes (a, b)
node g2 with child nodes (b, c)
using the query
MATCH (n)-[]-(m) WHERE ID(m) = id RETURN n
being id the id of the node g1, I get a and b, and vice-versa when using the id of g2. What I would like to understand is how can I get the intersection of those two results, in this case having the first return (a, b) and the second return (b, c) getting as final result (b).
I tried using the WITH cause but I wasn't able to achieve the desired result. Keep in mind that I'm new to Neo4j and only came here after a few failed attempts, research on Neo4j Documentation, general google search and
Stackoverflow.
Edit1 (one of my tries):
MATCH (n)-[]->(m)
WHERE ID(m) = 750
WITH n
MATCH (o)-[]->(b)
WHERE ID(b) = 684 and o = n
RETURN o
Edit2:
The node (b), that I represented as being the same on both graphs are in fact two different nodes on the db, each one relating to a different graph (g1 and g2). Representatively they are the same as they have the exactly same info (labels and attributes), but on the database thy are not. I'm sorry since it was my fault for not being more explicit on this matter :(
Edit3:
Why I don't using a single node (b) for both graphs
Using the graphs above as example, imagine that I have yet another layer so: on g1 the child node (b) as a child (e), while on g2 the child node (b) as a child (f). If I had (b) as a single node, when I create (e) and (f) I only could add it to (b) loosing the hierarchy, becoming impossible to distinguish which of them, (e) or (f), belonged to g1 ou g2.
This should work (assuming you pass id1 and id2 as parameters):
MATCH (a)--(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN n;
[UPDATED, based on new info from comments]
If you have multiple "clones" of the "same" node and you want to quickly determine which clones are related without having to perform a lot of (slow) property comparisons, you can add a relationship (say, typed ":CLONE") between clones. That way, a query like this would work:
MATCH (a)--(m)-[:CLONE]-(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN m, n;
You can find the duplicity of the node, by using this query -
[1]
Duplicity with single node -
MATCH pathx =(n)-[:Relationship]-(find) WHERE find.name = "action" RETURN pathx;
[2]
or for two nodes giving only immediate parent node
MATCH pathx =(n)-[:Relationship]-(find), pathy= (p)-[:Relationship]
-(seek) WHERE find.name = "action" AND seek.name="requestID" RETURN pathx,
pathy;
[3]
or to find the entire network i.e. all the nodes connected -
MATCH pathx =(n)--()-[:Relationship]-(find), pathy= (p)--()-[:Relationship]-
(seek) WHERE find.name = "action"
AND seek.name="requestID" RETURN pathx, pathy;

Get nodes connected only to other nodes that have a property in a range

I have 4 types of nodes: S, G, R and C
S nodes have an idStr property that identifies them.
Every node of type G uses just a S node: (:G)-[:USES]->(:S)
Every node of type C may be connected to multiple R or G nodes: (:C)-[:CONNECTED_TO]->(:R|:G)
Every node of type R may be connected to multiple R or G nodes: (:R)-[:CONNECTED_TO]->(:R|:G)
Question:
Given an idStr range, I want to get all R and C nodes that are connected (directly or indirectly) only to G nodes that use S nodes with an idStr in that range.
The closest approach I have achieved is:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs
MATCH p=(n)-[:CONNECTED_TO*]->(c:G)
WITH FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
but still some nodes that are connected to G nodes that use S nodes not in the range are being returned. [Neo4j Console Test]
What am I trying to do?
First match is used to get two things: G nodes that use S nodes with idStr in the given range (GroupGs) and the C nodes that are connected to those G nodes.
Once we get that, we have to check if those C nodes are connected to more G nodes (directly or through R nodes). That is the second match.
Now we have to check for each C node if all the G nodes connected to it (directly or through R nodes) are in the GroupGs range. If it is so, that C node (and the R nodes in the paths to the G nodes) are a match, and that is what I am trying to get.
Second approach (suggested by #FrobberOfBits)
Trying to use just one match, so we are sure the n node is the same in the matching:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C), p=(n)-[:CONNECTED_TO*]->(c:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs, FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
The result is the same. [Neo4j Console Test]
Third approach (suggested by #FrobberOfBits)
Giving semantics to the problem, C may be an endpoint in a network, R a repeater, G a gateway and S a Sim card.
Sim nodes have an iccid property that identifies them.
Every node of type Gateway uses just a Sim node: (:Gateway)-[:USES]->(:Sim)
Every node of type Endpoint may be connected to multiple Repeater or Gateway nodes: (:Endpoint)-[:CONNECTED_TO]->(:Repeater|:Gateway)
Every node of type Repeater may be connected to multiple Repeater or Gateway nodes: (:Repeater)-[:CONNECTED_TO]->(:Repeater|:Gateway)
I am trying to get all the Repeater and Endpoint nodes that are just connected to Gateway nodes that are using Sim nodes whose iccid are in a range.
Any idea about what am I doing wrong?
Your query is really confusing things with the variables you choose -- binding "a" to label S's, and "b" to label G's? Later binding "c" to "G's" in the second match clause? This query is going to be hard to debug in the future, and makes it hard to see what's going on; consider binding label "G" to "g", or "gs", or similar, and so on.
I think your problem is the second match clause. The (c:G) in the second match clause doesn't relate to anything in the first (which is (b:G)). This means that the path via a set of CONNECTED_TO* relationships from some node to some (c:G) has nothing to do with the complex match on the first line of the query. This second match matches anything labeled G, not just the things you specify in the first match.
That second match is bad because of the requirement you stated:
only to G nodes that use S nodes with an idStr in that range
I don't have your test data, so I can't verify that this works. But here's something to try instead:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C),
p=(n)-[:CONNECTED_TO*]->(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs,
FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
Apologies if the syntax edited here isn't perfect; this is a complex query and is going to take some fiddling, but I think the placement and mis-labeling of that second MATCH is your issue. My solution may not be perfect and may require tinkering, but should get you there.
I think I finally got it:
MATCH (a:S)<-[:USES]-(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(b) AS GroupGs
MATCH (c)-[:CONNECTED_TO*]->(d:G)
WHERE NOT d IN GroupGs
WITH COLLECT(c) AS badCandidates,GroupGs
MATCH (e)-[:CONNECTED_TO*]->(f:G)
WHERE NOT e IN badCandidates AND f IN GroupGs
RETURN e
First I get GroupGs: all the G nodes that use a S node with an idStr property in the given range.
Now I collect all the C and R nodes that are connected to a G node not in the GroupGs and I call them badCandidates.
Finally, I get all the C and R nodes that are not in the badCandidates collection and are connected to a G node in the GroupGs.
Here you have an example: [Neo4j Console Test]
I hope this helps someone.

Resources