Cypher: Exclude shared nodes? - neo4j

I have what feels like a fairly simple Cypher question.
I have the following data: two A nodes and three B nodes, with b1 related to a1, b2 related to a2, and b3 shared and related to both a1 and a2. My goal is to write a Cypher statement that, given a particular A node, returns the B nodes that are connected only to it and not to any other A node. For example, given node a1 the query should return b1, and given node a2 it should return b2. The b3 node, which is related to both a1 and a2, should never be returned, regardless of which A node is specified. Said differently, I am trying to find the B nodes that are unique to a given A node.
This example data will (hopefully) make my goal clearer:
CREATE (:A { code: 'a1' })
CREATE (:A { code: 'a2' })
CREATE (:B { code: 'b1' })
CREATE (:B { code: 'b2' })
CREATE (:B { code: 'b3' })
match (a:A), (b:B) where a.code = 'a1' and b.code = 'b1' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a2' and b.code = 'b2' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a1' and b.code = 'b3' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a2' and b.code = 'b3' create (a)<-[r:A_AND_B]-(b) return a, r, b
If I were willing to include the shared b3 node, the query would be straightforward:
match (a:A)-[r:A_AND_B]-(b:B) where a.code = 'a1' return b
This returns b1 and b3. However, given that I do not want to include any B nodes that are related to a different A node (b3 should not be returned in this case), I'm struggling to come up with the right approach and syntax.
I've explored Cypher's WITH and OPTIONAL MATCH, so far with no luck. I am also able to accomplish what I want if I use two separate queries, but that is a bit of a cheat and sidesteps the learning opportunity.
Can someone provide a boost?

How about something like this:
match (a:A)-[:A_AND_B]-(b:B)
where a.code = 'a1'
match (b)-[r:A_AND_B]-(:A)
with b, count(r) as c
where c = 1
return b
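Another way to express the same condition, sketched here on the assumption that you are on Neo4j 4.0 or later (where existential subqueries are available), is to keep only the B nodes for which no other related A node exists. The variable name other is purely illustrative:

```cypher
// Start from the given A node and the B nodes directly related to it
MATCH (a:A {code: 'a1'})-[:A_AND_B]-(b:B)
// Keep b only if it has no A_AND_B relationship to a different A node
WHERE NOT EXISTS {
  MATCH (b)-[:A_AND_B]-(other:A)
  WHERE other <> a
}
RETURN b
```

With the sample data from the question, this should return only b1 for a1. Unlike the count-based query above, it does not depend on there being exactly one A_AND_B relationship between a given pair of nodes.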

Related

Query to write hops and return all the properties from the middle nodes or a better way to do it and skip hops?

Node A is connected to Node E through different nodes B (B can be repeating), C and D etc as given below.
(A)--(C)--(D)--(E)
(A)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(B)--(C)--(D)--(E)
There could be up to 7 B nodes between A and C or no B node at all (like the first case above).
Question: How to get all the E1, E2, E3, E4 connected to A1 with a single query and return properties from all A, B, C, D and E nodes? I could not return the properties using the hops.
MATCH (A {Id:30})-[*1..6]-(E) RETURN DISTINCT A.Name, E.Name;
But we want to return B.Name (If there are multiple B nodes in the middle their names too), C.Name and D.Name too. Happy to skip hopping completely if required. Help please? Thanks in advance.
Try this:
// assuming there is always an A, C, D and E on the path, you can look for
// paths with length 3 to 6
MATCH path=(A {Id:30})-[*3..6]-(E)
// return the name of each node, in path order
RETURN [n IN nodes(path) | n.Name] AS nodeNames
Assuming A through E are node labels, this query should get all paths that match your pattern (with 0 to 7 B nodes between the A and C nodes), and return distinct lists of node Name values:
MATCH p=(:A)-[*..8]-(:C)--(:D)--(:E)
WHERE ALL(n IN NODES(p)[1..-3] WHERE 'B' IN LABELS(n))
RETURN DISTINCT [m IN NODES(p) | m.Name] AS names
In general, the query would be more efficient if you could also specify the relationship types and their directionality.

How do you find all nodes with exactly N relationships of a single type?

I have some nodes A connected to nodes B via a relationship rel. What query will I write to return all B which are related to exactly 3 A or exactly 4 A or exactly n A via the relation rel? I achieved this for 2 by using the following match clause:
MATCH (a1:A)<-[:rel]-(b:B)-[:rel]->(a2:A)
I also need to know which 3 A nodes are connected to that B, because I need to count the occurrences of that set of 3 A nodes.
You can aggregate by COUNT:
MATCH (b:B)-[:rel]->(a:A)
WITH b, count(a) AS cnt WHERE cnt = 3 // or for example WHERE cnt IN [3, 4]
RETURN b
Upd: Try to use the COLLECT and SIZE functions if you need to return nodes connected to (:B):
MATCH (b:B)-[:rel]->(a:A)
WITH b,
collect(a) AS nds WHERE size(nds) = 3
RETURN b, nds
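If the end goal is to count how often each set of three A nodes occurs, one possible follow-up (a sketch, not part of the original answer) is to sort the A nodes before collecting them, so that the same set always yields the same list, and then group by that list. The alias occurrences is made up for illustration, id(a) serves only as a stable sort key, and the query relies on collect() keeping the row order established by ORDER BY:

```cypher
MATCH (b:B)-[:rel]->(a:A)
// Sort first so collect() builds each list in a consistent order
WITH b, a ORDER BY id(a)
WITH b, collect(a) AS nds
WHERE size(nds) = 3
// Grouping by the list counts how many B nodes share the same three A nodes
RETURN nds, count(b) AS occurrences
```

This assumes each B has at most one rel relationship to any given A; otherwise the same A could appear twice in a list.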
I guess it can be shorter:
MATCH (b:B)
WHERE size((b)-[:rel]->(:A)) = 3
RETURN b,[(b)-[:rel]->(a:A) | a ] AS nds
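Note that size() applied directly to a pattern, as in the query above, is no longer accepted as of Neo4j 5. Assuming you are on Neo4j 5 or later, a count subquery gives a roughly equivalent sketch:

```cypher
MATCH (b:B)
// COUNT { ... } counts the matches of the pattern for each b
WHERE COUNT { (b)-[:rel]->(:A) } = 3
// The pattern comprehension still collects the related A nodes
RETURN b, [(b)-[:rel]->(a:A) | a] AS nds
```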

In Neo4J, how to match all nodes that are related to nodes that all relate to a specific node?

Suppose I have Red, Blue and Green nodes (R, B, G) and the relationships might look like this:
As you can see, R points to B and G also points to B. I want to match all R nodes where all the B nodes they point to are also related to a specific G node. How would I do this?
You can set this up on your own database by running something like this:
CREATE
(R1:Test_R),
(B1:Test_B),
(G:Test_G),
(R2:Test_R),
(B2:Test_B),
(R1)-[:TEST_LINK]->(B1),
(R1)-[:TEST_LINK]->(B2),
(R2)-[:TEST_LINK]->(B1),
(G)-[:TEST_LINK]->(B1)
RETURN
R1, R2, B1, B2, G
You can then query them by running something like this:
MATCH
(R:Test_R)-[:TEST_LINK]->(B:Test_B)
OPTIONAL MATCH
(B)<-[:TEST_LINK]-(G:Test_G)
RETURN
R,B,G
You can do it using a query something like the following:
MATCH
(R:Test_R)-[:TEST_LINK]->(B:Test_B)
WITH
{R: R, B: COLLECT(B) } AS d
MATCH
(G:Test_G)
WHERE
ID(G) = 5770 // Match our specific G node
AND ALL(b IN d.B WHERE (b)<-[:TEST_LINK]-(G) )
RETURN d.R
The ALL function will return true if all items in the list return true for the predicate specified:
ALL(<variable> IN <list> WHERE <predicate>)
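To see how ALL behaves in isolation, here is a small standalone sketch on literal lists; the aliases are just illustrative names:

```cypher
RETURN all(x IN [1, 2, 3] WHERE x > 0) AS allPositive,   // true
       all(x IN [1, 2, 3] WHERE x > 1) AS allAboveOne,   // false: 1 fails the predicate
       all(x IN [] WHERE x > 0) AS emptyList             // true: no element fails
```

The empty-list case is worth keeping in mind: the query above only considers R nodes that point to at least one B, because it starts from a MATCH on that relationship.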

Cypher : Return Nodes that matched along with Nodes that didn't match

Given labels A, B, and Z, where A and B each have their own relationships to Z, I have the following query:
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z
RETURN DISTINCT a, matched_z
This returns the A nodes and all the Z nodes that have a relationship to both A and B.
I'm stuck on trying to ALSO return a separate array of the Z Nodes that B has with Z but not with A (i.e. missing_z). I am attempting to do an initial query to return all the relationships between B & Z
results = MATCH (b:B { uuid: {id} })
MATCH (b)-[:rel2]->(z:Z)
RETURN DISTINCT COLLECT(z.uuid) AS z
MATCH (a:A)
MATCH (b:B { uuid: {id} })
MATCH (a)-[:rel1]->(z:Z)<-[:rel2]-(b)
WITH a, COLLECT(z) AS matched_z, z
RETURN DISTINCT a, matched_z, filter(skill IN z.array WHERE NOT z.uuid IN {results}) AS missing_z
The results seem to have nil for missing_z where one would assume it should be populated. I'm not sure whether filter is the correct way to handle a WHERE NOT / IN scenario. Can the above two queries be combined into one?
The hard part here, in my opinion, is that any failed match will drop everything you have matched so far. But your starting point seems to be "all Z related to the B with the given uuid", so start by collecting that and filter/copy from there.
Use WITH + aggregation functions to copy and filter columns.
Use OPTIONAL MATCH if a failure to match shouldn't drop already collected rows.
If I understand what you are trying to do well enough, this Cypher should do the job; just adjust it as needed (let me know if you need help understanding or adapting any part of it):
// Match base set
MATCH (z:Z)<-[:rel2]-(b:B { uuid: {id} })
// Collect into single list
WITH COLLECT(z) as zs
// Match all A (ignore relation to Zs)
MATCH (a:A)
// For each a, return a, the sub-list of Zs related to a, and the sub-list of Zs not related to a
RETURN a as a, FILTER(n in zs WHERE (a)-[:rel1]->(n)) as matched, FILTER(n in zs WHERE NOT (a)-[:rel1]->(n)) as unmatched
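The FILTER function and the {id} parameter syntax used above were removed in Neo4j 4.0. Assuming a 4.x or later server, the same idea can be sketched with list comprehensions and a $id parameter:

```cypher
// Match base set: all Z nodes related to the given B
MATCH (z:Z)<-[:rel2]-(b:B {uuid: $id})
WITH collect(z) AS zs
// Match all A nodes, regardless of their relation to the Z nodes
MATCH (a:A)
// Split zs into the Z nodes that a relates to and those it does not
RETURN a,
       [n IN zs WHERE (a)-[:rel1]->(n)] AS matched,
       [n IN zs WHERE NOT (a)-[:rel1]->(n)] AS unmatched
```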
This query might do what you want:
MATCH (z:Z)<-[:rel2]-(b:B { uuid: {id} })
WITH COLLECT(z) as all_zs
UNWIND all_zs AS z
MATCH (:A)-[:rel1]->(z)
WITH all_zs, COLLECT(DISTINCT z) AS matched_zs
RETURN matched_zs, apoc.coll.subtract(all_zs, matched_zs) AS missing_zs;
It first stores in the all_zs variable all the Z nodes that have a rel2 relationship from b. This collection's contents remain unaffected even if the second MATCH clause matches a subset of those Z nodes.
It then stores in matched_zs the distinct all_zs nodes that have a rel1 relationship from any A node.
Finally, it returns:
the matched_zs collection, and
the unique nodes from all_zs that are not also in matched_zs, as missing_zs.
The query uses the convenient APOC function apoc.coll.subtract to generate the latter return value.

Neo4j Cypher path using several times the same edge

Let us consider a simple example with two types of nodes: Company and Worker. For any pair of companies c1 and c2 (which satisfy some conditions that I will ignore here), I need to know how many workers they have in common, how many workers c1 has, and how many workers c2 has.
My first guess was :
MATCH (w_c1:Worker)--(c1:Company)--(w_common)--(c2:Company)--(w_c2:Worker)
WHERE <something>
RETURN c1, c2, COUNT(DISTINCT w_common), COUNT(DISTINCT w_c1), COUNT(DISTINCT w_c2)
The problem with that request is that, if I have only one link between any pair of connected nodes, COUNT(DISTINCT w_c1) (and likewise for w_c2) only counts the workers of c1 that are not shared with c2. But if I have several relationships between some nodes, the result is sometimes "correct". It looks like the path in the match does not "come back": (w_common)--(c2:Company)--(w_c2:Worker) will not match ("worker1")--("company2")--("worker1") (which may make sense, to avoid infinite loops).
My second guess was to split the request in two parts:
MATCH (c1:Company)--(w_common)--(c2:Company)
MATCH (c1)--(w_c1:Worker), (c2)--(w_c2:Worker)
WHERE <something>
RETURN c1, c2, COUNT(DISTINCT w_common), COUNT(DISTINCT w_c1), COUNT(DISTINCT w_c2)
With this, the result is correct, but I get a warning about Cartesian products and, indeed, on a big dataset my request does not complete even after hours. I tried adding a "WITH c1, w_common, c2" between the two matches, but I still get the warning.
How should I proceed?
One thing that will help you is the SIZE() function, which can tell you the number of occurrences of a pattern, such as the number of :Workers per :Company.
This query may work for you, assuming that a Worker working for a Company only has one relationship to that Company:
MATCH (c1:Company)--(w_common:Worker)--(c2:Company)
WHERE <your criteria for matching on a specific c1 and c2>
RETURN COUNT(w_common) as inCommonCount, SIZE( (c1)--(:Worker) ) as c1Count, SIZE( (c2)--(:Worker) ) as c2Count
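One thing to watch in the query above is that its only non-aggregated return values are the two SIZE() results, so rows are grouped by those counts rather than by the company pair. A possible variant, sketched under the same one-relationship-per-worker assumption, groups by c1 and c2 explicitly and returns them. The id(c1) < id(c2) filter is an added assumption to report each unordered pair once, and size() over a pattern comprehension is used so the sketch does not depend on size() accepting a bare pattern:

```cypher
MATCH (c1:Company)--(w_common:Worker)--(c2:Company)
// Assumed filter: report each unordered pair of companies only once
WHERE id(c1) < id(c2)
// Group explicitly by the pair before computing per-company totals
WITH c1, c2, count(DISTINCT w_common) AS inCommonCount
RETURN c1, c2, inCommonCount,
       size([(c1)--(:Worker) | 1]) AS c1Count,
       size([(c2)--(:Worker) | 1]) AS c2Count
```

The per-company totals count relationships, so they equal the number of workers only if each worker has a single relationship to each company.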
You can use sub-totals:
OPTIONAL MATCH (C1:Company {name: 'c1'})
OPTIONAL MATCH (C2:Company {name: 'c2'})
WITH C1, C2
MATCH (C:Company)<-[:workto]-(W:Worker) WHERE C = C1 OR C = C2
WITH C1, C2, W,
sum(CASE WHEN C = C1 THEN 1 ELSE 0 END) as tmp1,
sum(CASE WHEN C = C2 THEN 1 ELSE 0 END) as tmp2
RETURN C1, C2,
sum(tmp1) as cc1, sum(tmp2) as cc2,
sum(tmp1 * tmp2 ) as common
