Get nodes connected only to other nodes that have a property in a range - neo4j

I have 4 types of nodes: S, G, R and C
S nodes have an idStr property that identifies them.
Every node of type G uses exactly one S node: (:G)-[:USES]->(:S)
Every node of type C may be connected to multiple R or G nodes: (:C)-[:CONNECTED_TO]->(:R|:G)
Every node of type R may be connected to multiple R or G nodes: (:R)-[:CONNECTED_TO]->(:R|:G)
Question:
Given an idStr range, I want to get all R and C nodes that are connected (directly or indirectly) only to G nodes that use S nodes with an idStr in that range.
The closest approach I have achieved is:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs
MATCH p=(n)-[:CONNECTED_TO*]->(c:G)
WITH FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
but still some nodes that are connected to G nodes that use S nodes not in the range are being returned. [Neo4j Console Test]
What am I trying to do?
First match is used to get two things: G nodes that use S nodes with idStr in the given range (GroupGs) and the C nodes that are connected to those G nodes.
Once we get that, we have to check if those C nodes are connected to more G nodes (directly or through R nodes). That is the second match.
Now we have to check for each C node if all the G nodes connected to it (directly or through R nodes) are in the GroupGs range. If it is so, that C node (and the R nodes in the paths to the G nodes) are a match, and that is what I am trying to get.
Second approach (suggested by #FrobberOfBits)
Trying to use just one match, so we are sure the n node is the same in the matching:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C), p=(n)-[:CONNECTED_TO*]->(c:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs, FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
The result is the same. [Neo4j Console Test]
Third approach (suggested by #FrobberOfBits)
Giving semantics to the problem, C may be an endpoint in a network, R a repeater, G a gateway and S a Sim card.
Sim nodes have an iccid property that identifies them.
Every node of type Gateway uses just a Sim node: (:Gateway)-[:USES]->(:Sim)
Every node of type Endpoint may be connected to multiple Repeater or Gateway nodes: (:Endpoint)-[:CONNECTED_TO]->(:Repeater|:Gateway)
Every node of type Repeater may be connected to multiple Repeater or Gateway nodes: (:Repeater)-[:CONNECTED_TO]->(:Repeater|:Gateway)
I am trying to get all the Repeater and Endpoint nodes that are just connected to Gateway nodes that are using Sim nodes whose iccid are in a range.
Any idea what I am doing wrong?

Your query is really confusing things with the variables you choose -- binding "a" to label S's, and "b" to label G's? Later binding "c" to "G's" in the second match clause? This query is going to be hard to debug in the future, and makes it hard to see what's going on; consider binding label "G" to "g", or "gs", or similar, and so on.
I think your problem is the second match clause. The (c:G) in the second match clause doesn't relate to anything in the first (which is (b:G)). This means that the path via a set of CONNECTED_TO* relationships from some node to some (c:G) has nothing to do with the complex match on the first line of the query. This second match matches anything labeled G, not just the things you specify in the first match.
That second match is bad because of the requirement you stated:
only to G nodes that use S nodes with an idStr in that range
I don't have your test data, so I can't verify that this works. But here's something to try instead:
MATCH (a:S)<-[:USES]-(b:G)<-[:CONNECTED_TO*]-(n:C),
p=(n)-[:CONNECTED_TO*]->(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(DISTINCT b) AS GroupGs,
FILTER(x IN NODES(p) WHERE NOT x:G) AS cs,GroupGs,COLLECT(c) AS gs
WHERE ALL(x IN gs WHERE x IN GroupGs)
RETURN cs
Apologies if the syntax edited here isn't perfect; this is a complex query and is going to take some fiddling, but I think the placement and mis-labeling of that second MATCH is your issue. My solution may not be perfect and may require tinkering, but should get you there.

I think I finally got it:
MATCH (a:S)<-[:USES]-(b:G)
WHERE a.idStr IN ['1a','b2','something']
WITH COLLECT(b) AS GroupGs
MATCH (c)-[:CONNECTED_TO*]->(d:G)
WHERE NOT d IN GroupGs
WITH COLLECT(c) AS badCandidates,GroupGs
MATCH (e)-[:CONNECTED_TO*]->(f:G)
WHERE NOT e IN badCandidates AND f IN GroupGs
RETURN e
First I get GroupGs: all the G nodes that use an S node with an idStr property in the given range.
Now I collect all the C and R nodes that are connected to a G node not in the GroupGs and I call them badCandidates.
Finally, I get all the C and R nodes that are not in the badCandidates collection and are connected to a G node in the GroupGs.
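The three steps above can be checked outside the database. The sketch below is plain Python over a small made-up graph, not Neo4j code: the node names, the toy data, and the helpers reachable_gs and good_candidates are all hypothetical. It assumes a node qualifies when it reaches at least one G node and every G node it reaches is in GroupGs, which is what "not in badCandidates and connected to a G in GroupGs" amounts to.

```python
from collections import deque

# Hypothetical toy data: CONNECTED_TO edges and the idStr of the S node each G uses.
CONNECTED_TO = {
    "c1": ["r1", "g1"],
    "r1": ["g2"],
    "c2": ["r2"],
    "r2": ["g2", "g3"],
}
USES = {"g1": "1a", "g2": "b2", "g3": "zz"}  # G node -> idStr of its S node


def reachable_gs(start):
    """Return every G node reachable from start via one or more CONNECTED_TO hops."""
    seen, found = set(), set()
    queue = deque(CONNECTED_TO.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if node in USES:
            found.add(node)
        queue.extend(CONNECTED_TO.get(node, []))
    return found


def good_candidates(id_range):
    """Nodes that reach at least one G node and only G nodes inside GroupGs."""
    group_gs = {g for g, id_str in USES.items() if id_str in id_range}
    result = set()
    for node in CONNECTED_TO:  # every node with an outgoing CONNECTED_TO
        gs = reachable_gs(node)
        # A bad candidate reaches some G outside group_gs; skip those.
        if gs and gs <= group_gs:
            result.add(node)
    return result
```

With this toy data, c2 and r2 are rejected because they also reach g3, whose S node is outside the range.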
Here you have an example: [Neo4j Console Test]
I hope this helps someone.

Related

Select paths based on information of linked nodes

I have following problem:
I want to select paths through nodes of type A. But I do not want all paths, only ones with specific properties. The problem is that in our data model these properties are stored in a separate node of type AD. For the start and end points everything works fine, and I think I have also worked out the general structure, as this query works perfectly fine.
MATCH (n:A)-->(ad:AD) WHERE ad.name='AD0'
WITH n AS start
MATCH (n:A)-->(ad:AD) WHERE ad.name='AD3'
WITH n AS end, start
MATCH p = (start) -[:L*0..10]-> (end)
WHERE ALL (x in nodes(p) [1..-1] WHERE ( (x.name STARTS WITH 'ad1' OR x.name STARTS WITH 'ad2')))
return p
The problem here is that I get the property for the intermediate nodes out of the nodes of type A, which will not be possible in our final model. For testing I added a property to A containing the information normally stored in AD.
The result should only contain nodes of type A linked to nodes of type AD and AD.name should be AD0... AD3, but I want to exclude nodes of type A linked to AD nodes with AD.name='AD4' for example.
For this I tried the following query, but it only returns path containing nodes A linked to nodes AD with AD.name = AD0 or AD3.
MATCH (n:A)-->(ad:AD) WHERE ad.name='AD0'
WITH n AS start
MATCH (n:A)-->(ad:AD) WHERE ad.name='AD3'
WITH n AS end, start
MATCH (n:AD) WITH n AS ad, end, start //somehow needed otherwise I cannot use AD in the where clause
MATCH p = (start) -[:L*0..]-> (end)
WHERE ALL (
x in nodes(p) [1..-1] WHERE (
((x)-->(ad:AD))
AND
(ad.name ='AD1' OR ad.name='AD2')
)
)
return p
Any idea why paths containing only nodes of type A linked to nodes of type AD with AD.name =AD1 or AD2 are not returned?
I was able to solve this. No idea if there is a better way, but I had to put my intermediate nodes in a separate list; otherwise I cannot use them in the WHERE part of the WHERE ALL clause. The working code looks like this:
MATCH (n:A)-->(ad:AD) WHERE ad.name='AD0'
WITH n AS start
MATCH (n:A)-->(ad:AD) WHERE ad.name='AD3'
WITH n AS end, start
MATCH (n:A)-->(ad:AD) WHERE (ad.name IN ['AD1', 'AD2'])
WITH collect(n) AS intermediates, start, end
MATCH p = (start) -[:L*0..]-> (end)
WHERE ALL (
x IN nodes(p) [1..-1] WHERE (
x IN intermediates
)
)
return p
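The idea of collecting the allowed intermediate nodes first and then testing each path's interior against that collection can be sketched in plain Python. Everything here is hypothetical (the node names, the toy edges, and the simple_paths helper), and the sketch only enumerates paths that never revisit a node, which is stricter than Cypher's rule of not reusing a relationship.

```python
# Hypothetical toy data: :L edges between A nodes and the AD name linked to each A node.
L_EDGES = {"a0": ["a1", "a4", "a2"], "a1": ["a3"], "a2": ["a3"], "a4": ["a3"]}
AD_NAME = {"a0": "AD0", "a1": "AD1", "a2": "AD2", "a3": "AD3", "a4": "AD4"}


def simple_paths(edges, start, end, path=()):
    """Yield every path from start to end that visits no node twice."""
    path = path + (start,)
    if start == end:
        yield list(path)
        return
    for nxt in edges.get(start, []):
        if nxt not in path:
            yield from simple_paths(edges, nxt, end, path)


# Step 1: collect the allowed intermediate nodes once, up front.
intermediates = {n for n, ad in AD_NAME.items() if ad in ("AD1", "AD2")}

# Step 2: keep only paths whose interior nodes are all in that collection.
paths = [
    p for p in simple_paths(L_EDGES, "a0", "a3")
    if all(x in intermediates for x in p[1:-1])
]
```

The path through a4 is dropped because a4 is linked to AD4, which is not in the allowed list.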

Neo4j Cypher find two disjoint nodes

I'm using Neo4j to try to find any node that is not connected to a specific node "a". The query that I have so far is:
MATCH p = shortestPath((a:Node {id:"123"})-[*]-(b:Node))
WHERE p IS NULL
RETURN b.id as b
So it tries to find the shortest path between a and b. If it doesn't find a path, then it returns that node's id. However, this causes my query to run for a few minutes then crashes when it runs out of memory. I was wondering if this method would even work, and if there is a more efficient way? Any help would be greatly appreciated!
edit:
MATCH (a:Node {id:"123"})-[*]-(b:Node),
(c:Node)
WITH collect(b) as col, a, b, c
WHERE a <> b AND NOT c IN col
RETURN c.id
So col (collect(b)) contains every node connected to a, therefore if c is not in col then c is not connected to a?
For one, you're giving this MATCH an impossible predicate to fulfill, so it will never find the shortest path.
WHERE clauses are associated with MATCH, OPTIONAL MATCH, and WITH clauses, so your query is asking for the shortest path where the path doesn't exist. That will never return anything.
Also, the shortestPath will start at the node you DON'T want to be connected, so this has no way of finding the nodes that aren't connected to it.
Probably the easiest way to approach this is to MATCH to all nodes connected to your node in question, then MATCH to all :Nodes checking for those that aren't in the connected set. That means you won't have to do a shortestPath from every single node in the db, just a membership check in a collection.
You'll need APOC Procedures for this, as it has the fastest means of matching to nodes within a subgraph.
MATCH (a:Node {id:"123"})
CALL apoc.path.subgraphNodes(a, {}) YIELD node
WITH collect(node) as subgraph
MATCH (b:Node)
WHERE NOT b in subgraph
RETURN b.id as b
EDIT
Your edited query is likely to blow up because it will generate a huge result set: every node reachable from your start node by a unique path, in a cartesian product with every :Node.
Instead, go step by step, collect the distinct nodes (because otherwise you'll get multiples of the same nodes that can be reached via different paths), and then only after you have your collection should you start your match for nodes that aren't in the list.
MATCH (:Node {id:"123"})-[*0..]-(b:Node)
WITH collect(DISTINCT b) as col
MATCH (a:Node)
WHERE NOT a IN col
RETURN a.id
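The "collect the connected set first, then do a membership check" approach is easy to see in a small Python sketch. The adjacency data and the not_connected helper below are hypothetical; the traversal is an ordinary breadth-first search over an undirected adjacency map, standing in for the variable-length match.

```python
from collections import deque

# Hypothetical undirected adjacency map keyed by node id.
NEIGHBORS = {
    "123": ["a"],
    "a": ["123", "b"],
    "b": ["a"],
    "x": ["y"],
    "y": ["x"],
    "z": [],
}


def not_connected(neighbors, start):
    """Return the ids of all nodes that cannot be reached from start."""
    reached = {start}  # the start node counts as reached, like [*0..]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors.get(node, ()):
            if other not in reached:  # each node is visited once
                reached.add(other)
                queue.append(other)
    # Membership check against the collected set, done once at the end.
    return set(neighbors) - reached
```

Each node is visited once here, whereas the cartesian-product query enumerates every path.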

Neo4j - Intersect two node lists using Cypher

Having the following graphs:
node g1 with child nodes (a, b)
node g2 with child nodes (b, c)
using the query
MATCH (n)-[]-(m) WHERE ID(m) = id RETURN n
where id is the id of node g1, I get a and b, and vice versa when using the id of g2. What I would like to understand is how I can get the intersection of those two results: with the first query returning (a, b) and the second returning (b, c), the final result should be (b).
I tried using the WITH clause but I wasn't able to achieve the desired result. Keep in mind that I'm new to Neo4j and only came here after a few failed attempts, research in the Neo4j documentation, general Google searches and Stack Overflow.
Edit1 (one of my tries):
MATCH (n)-[]->(m)
WHERE ID(m) = 750
WITH n
MATCH (o)-[]->(b)
WHERE ID(b) = 684 and o = n
RETURN o
Edit2:
The node (b), which I represented as being the same in both graphs, is in fact two different nodes in the db, each one belonging to a different graph (g1 and g2). Representatively they are the same, as they have exactly the same info (labels and attributes), but in the database they are not. I'm sorry, it was my fault for not being more explicit on this matter :(
Edit3:
Why I am not using a single node (b) for both graphs
Using the graphs above as an example, imagine that I have yet another layer: on g1 the child node (b) has a child (e), while on g2 the child node (b) has a child (f). If I had (b) as a single node, then when I create (e) and (f) I could only attach them to (b), losing the hierarchy and making it impossible to tell which of them, (e) or (f), belongs to g1 or g2.
This should work (assuming you pass id1 and id2 as parameters):
MATCH (a)--(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN n;
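In set terms, that pattern returns the nodes adjacent to both endpoints. A minimal Python sketch using the g1/g2 example from the question (the dictionary layout and the common_neighbors helper are hypothetical):

```python
# Neighbor sets taken from the question: g1 has children (a, b), g2 has (b, c).
NEIGHBORS = {"g1": {"a", "b"}, "g2": {"b", "c"}}


def common_neighbors(neighbors, id1, id2):
    """Return the nodes adjacent to both id1 and id2."""
    return neighbors.get(id1, set()) & neighbors.get(id2, set())
```

This only works when (b) is one shared node; if it is stored as two separate nodes, the intersection is empty.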
[UPDATED, based on new info from comments]
If you have multiple "clones" of the "same" node and you want to quickly determine which clones are related without having to perform a lot of (slow) property comparisons, you can add a relationship (say, typed ":CLONE") between clones. That way, a query like this would work:
MATCH (a)--(m)-[:CLONE]-(n)--(c)
WHERE ID(a) = {id1} AND ID(c) = {id2}
RETURN m, n;
You can find the duplicity of the node, by using this query -
[1]
Duplicity with single node -
MATCH pathx =(n)-[:Relationship]-(find) WHERE find.name = "action" RETURN pathx;
[2]
or for two nodes giving only immediate parent node
MATCH pathx = (n)-[:Relationship]-(find), pathy = (p)-[:Relationship]-(seek)
WHERE find.name = "action" AND seek.name = "requestID"
RETURN pathx, pathy;
[3]
or to find the entire network i.e. all the nodes connected -
MATCH pathx = (n)--()-[:Relationship]-(find), pathy = (p)--()-[:Relationship]-(seek)
WHERE find.name = "action" AND seek.name = "requestID"
RETURN pathx, pathy;

Filtering out nodes on two cypher paths

I have a simplified Neo4j graph (old version 2.x) as the image with 'defines' and 'same' edges. Assume the number on the define edge is a property on the edge
The queries I would like to run are:
1) Find nodes defined by both A and B -- Required result: C, C, D
START A=node(885), B=node(996) MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
The above works and returns C and D. But I want C twice since it's defined twice. Without the DISTINCT on x, however, it returns all the paths from A to B.
2)Find nodes that are NOT (defined by both A,B OR are defined by both A,B but connected via a same edge) -- Required result: G
Something like:
R1: MATCH (A-[:define]->(x)<-[:define]-B) RETURN DISTINCT x
R2: MATCH (A-[:define]->(e)-(:similar)-(f)<-[:define]-B) RETURN e,f
(Nodes defined by A - (R1+R2) )
3) Find 'middle' nodes that do not have matching calls from both A and B --Required result: C,G
I want to output C due to the 1 define(either 45/46) that does not have a matching define from B.
Also output G because there's no define to G from B.
Appreciate any help on this!
Your syntax is a bit strange to me, so I'm going to assume you're using an older version of Neo4j. We should be able to use the same approaches, though.
For #1, your proposed match without DISTINCT really should be working. The only change I can see is adding the missing parentheses around the A and B node variables.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
RETURN x
Also, I'm not sure what you mean by "returns all paths from A to B." Can you clarify that, and provide an example of the output?
As for #2, we'll need several parts to this query, separating them with WITH accordingly.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)<-[:define]-(B)
WITH A, B, COLLECT(DISTINCT x) as exceptions
OPTIONAL MATCH (A)-[:define]->(x)-[:same]-(y)<-[:define]-(B)
WHERE NOT x IN exceptions AND NOT y IN exceptions
WITH A, B, exceptions + COLLECT(DISTINCT x) + COLLECT(DISTINCT y) as allExceptions
MATCH (aNode)
WHERE NOT aNode IN allExceptions AND aNode <> A AND aNode <> B
RETURN aNode
Also, you should really be using labels on your nodes. The final match will match all nodes in your graph and will have to filter down otherwise.
EDIT
Regarding your #3 requirement, the SIZE() function will be very helpful here, as you can get the size of a pattern match, and it will tell you the number of occurrences of that pattern.
The approach for this query is to first get the collection of nodes defined by A or B, then filter down to the nodes where the number of :define relationships from A is not equal to the number of :define relationships from B.
We would like something like a UNION WITH to combine the nodes defined by A with the nodes defined by B, but Neo4j's UNION support is weak right now: it doesn't let you do any additional operations after the UNION. Instead, we add both sets of nodes to the same collection and then unwind them back into rows.
START A=node(885), B=node(996)
MATCH (A)-[:define]->(x)
WITH A, B, COLLECT(x) as middleNodes
MATCH (B)-[:define]->(x)
WITH A, B, middleNodes + COLLECT(x) as allMiddles
UNWIND allMiddles as middle
WITH DISTINCT A, B, middle
WHERE SIZE((A)-[:define]->(middle)) <> SIZE((B)-[:define]->(middle))
RETURN middle
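The count comparison can be sketched in plain Python to show what the final WHERE is checking. The edge list below is hypothetical toy data (A defines C twice, B defines C once, both define D, only A defines G), and unmatched_middles is an illustrative helper, not a Neo4j function.

```python
from collections import Counter

# Hypothetical edge list: one (source, target) tuple per :define relationship.
DEFINES = [
    ("A", "C"), ("A", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"),
    ("A", "G"),
]


def unmatched_middles(edges, a, b):
    """Return nodes whose number of defines from a differs from that from b."""
    from_a = Counter(target for source, target in edges if source == a)
    from_b = Counter(target for source, target in edges if source == b)
    # Union of both sets of defined nodes, like concatenating the two collections.
    middles = set(from_a) | set(from_b)
    # A Counter returns 0 for a missing key, so one-sided nodes are caught too.
    return {m for m in middles if from_a[m] != from_b[m]}
```

With this data the function returns C and G: C has two defines from A but one from B, and G has none from B.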

How to determine a set of nodes based on the incoming relationship of another set of nodes and some special conditions

I've got a Cypher query that gets a set of nodes 'n' of type 't', say (it works its way through a number of different node types in the graph to reach this point).
If we assume the following:
The rest of type t nodes are the set 'm', so no intersect between m and n.
Type t nodes have multiple types of relationships between them.
I have a specific relationship 'r' that I'm interested in. In this specific case I know the following to be true:
Type t nodes can have 0 or more of these r relationships, incoming/outgoing.
The nodes within set n have no outgoing r relationships to set m
The nodes within set m may have outgoing r relationships to set m or n.
I have set n, I'm trying to determine the nodes from set m that meet the following conditions:
Have 0 r relationships
OR
Only have r relationships to set n, but not to any node in set m.
Some example data:
Type t nodes:
n1, n2, n3
m1, m2, m3
Type r relationships
m1 (no r relationships)
m2->n1, m2->n2
m3->n3, m3->m2
The results should return m1 and m2, but not m3.
I'm quite new to Cypher, so feel free to point to relevant documentation as required. Also, if you can explain the process you go through to determine the answer, I'd appreciate that as I suspect I'm just not quite understanding something simple here.
Your example is more model than data. You may know how to tell the m's and n's apart, but I can't write a query on the identifiers alone; there must be some actual data or structure to discriminate. For instance, assume all nodes in the graph are type t, let sets n, m be distinguished by labels :N, :M, let the identifiers you use be values for property uid (to make the query results map with your question), and let type r relationship be [:R], then create your graph with
CREATE
(n1:N{uid:"n1"}), (n2:N{uid:"n2"}), (n3:N{uid:"n3"})
,(m1:M{uid:"m1"}), (m2:M{uid:"m2"}), (m3:M{uid:"m3"})
, (m2)-[:R]->(n1), (m2)-[:R]->(n2)
, (m3)-[:R]->(n3), (m3)-[:R]->(m2)
The query could then look something like
MATCH (n:N) // bind each node in the set n
WITH collect(n) AS nn // collect and treat them as a set nn
MATCH (m:M) // grab each node in the set m
OPTIONAL MATCH (m)-[:R]->(x) // optionally expand from m to unknown by r
WITH nn, m, collect(x) AS xx // collect unknown per m as xx where
WHERE ALL (x IN xx // all unknown nodes are in the nn set
WHERE x IN nn) // (if m has no -[:R]-> then the set xx is empty
// and the condition is true–i.e.
// either m has no outgoing r or
// the other node is in nn)
RETURN m
Result
m
(3:M {uid:"m1"})
(4:M {uid:"m2"})
You can try the query here.
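The key point, that ALL over an empty collection is true, behaves the same way in Python's all(). A small sketch using the example data from the question (the variable names are hypothetical):

```python
# Outgoing r relationships from the question; m1 has none.
R = {"m2": ["n1", "n2"], "m3": ["n3", "m2"]}
N = {"n1", "n2", "n3"}
M = ["m1", "m2", "m3"]

# all() over an empty list is True, so m1 passes without a separate "0 relationships" case.
result = [m for m in M if all(x in N for x in R.get(m, []))]
```

Here m3 is excluded because one of its targets, m2, is not in set n.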

Resources