Neo4j: How to fix one of the nodes in a MATCH clause?

For clauses like MATCH (a:Address)-[:BelongTo]->(w1:Wallet), (a)-[r0:BelongTo]->(w2:Wallet) WHERE ID(w1)>ID(w2) WITH w1, w2..., is it possible to make sure that, e.g., w1 is always a fixed node? If so, is it possible to choose that node as, e.g., the one with the minimum value of a certain property among all the nodes that could also be w1?
More concretely, suppose an address a belongs to wallets a, b, and c, with a>b>c in terms of ID. Normally these result rows are returned:
w1 w2
--------
a b
b c
a c
I only want these two rows of result to be returned:
w1 w2
--------
a b
a c
Note: I want the query to return every pair of wallets that an address belongs to. Every address that belongs to two or more wallets should be included in the result if a is returned.
So, for example, if there are two addresses that each belong to three different wallets, what would the query you posted do?
More concretely, if addresses a1 and a2 belong to b1, c1, d1 and b2, c2, d2 respectively (with b1 > c1 > d1 > b2 > c2 > d2 in terms of ID),
I want it to return:
a w1 w2
-----------
a1 b1 c1
a1 b1 d1
a2 b2 c2
a2 b2 d2
Is it possible?

Yes, you can do this by finding, for each a:Address, the :Wallet with the minimum id. After you match to this :Wallet, you can match to all the other :Wallets of the same address.
MATCH (a:Address)-[:BelongTo]->(w1:Wallet)
WITH a, min(id(w1)) as minId
// since we have the minId, we can do a fast lookup of the node
MATCH (minW:Wallet)
WHERE id(minW) = minId
// now get all the others
MATCH (a)-[:BelongTo]->(w2:Wallet)
WHERE minW <> w2
...
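To see which rows this approach produces, here is a minimal sketch that models the query in plain Python rather than Cypher. The wallet IDs and the function name pairs_with_fixed_wallet are made up for illustration; only the structure (two addresses with three wallets each) comes from the question.

```python
# Hypothetical data: address -> IDs of the wallets it belongs to.
belongs_to = {
    "a1": [30, 20, 10],
    "a2": [6, 5, 4],
}

def pairs_with_fixed_wallet(belongs_to):
    """For each address, fix the wallet with the minimum ID and pair it with the rest."""
    rows = []
    for address, wallet_ids in sorted(belongs_to.items()):
        min_id = min(wallet_ids)           # WITH a, min(id(w1)) AS minId
        for w2 in sorted(wallet_ids):      # MATCH (a)-[:BelongTo]->(w2:Wallet)
            if w2 != min_id:               # WHERE minW <> w2
                rows.append((address, min_id, w2))
    return rows

for row in pairs_with_fixed_wallet(belongs_to):
    print(row)
# ('a1', 10, 20)
# ('a1', 10, 30)
# ('a2', 4, 5)
# ('a2', 4, 6)
```

Each address contributes one row per non-fixed wallet, so three wallets give two rows instead of three. Using max instead of min (in the Cypher query as well) would fix the highest-ID wallet, which is the layout shown in the question.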
If you don't care how the fixed node is taken, and if it only matters for the duration of the query, it may be easier to collect all the :Wallet nodes, take the first node in the collection, and then UNWIND the rest into rows and continue the query:
MATCH (a:Address)-[:BelongTo]->(w:Wallet)
WITH a, collect(w) as wallets
WITH a, head(wallets) as w1, wallets
UNWIND tail(wallets) as w2
...

Related

Selecting one of two nodes in Cypher query

all Cypher masters!
I can't figure out how to query all B nodes while choosing either B1 or B2 and B4 or B5. There is no constraint on which of them, only that one is chosen. As in the image, there's a relation (B1,B2) and (B4,B5).
In other words, I want to MATCH all nodes of type B connected to some node of type A, but exclude either B1 or B2 and either B4 or B5 (using the relation between them) from the result. The nodes of type B can only be connected pairwise; that is, (B1,B2) and (B2,B3) will never exist simultaneously. There can, however, be more than one pair, as the image shows.
Any ideas are more than welcome!
I think this is a simple additional condition:
MATCH (A:A)--(B:B)
OPTIONAL MATCH (B)--(BT:B)--(A)
WITH B WHERE BT IS NULL OR id(B) > id(BT)
RETURN B
For this, it will be faster to use APOC Procedures, which provide some helpful collection functions and a procedure that easily gets the relationships that exist between a group of nodes.
The idea here is that we'll match to the connected :B nodes, use the cover() procedure to get all relationships among these nodes, collect those relationships and from those take one of the nodes for those relationships (we'll use the startnode here), and then we'll just subtract those chosen nodes from our list leaving us with the :B nodes we want:
MATCH (a:A)--(b:B)
WITH collect(b) as bNodes
CALL apoc.algo.cover(bNodes) YIELD rel
WITH bNodes, [r in collect(rel) | startNode(r)] as toRemove
RETURN apoc.coll.subtract(bNodes, toRemove) as nodes
If you don't have (or don't want to use) APOC, here's a Cypher-only version:
MATCH (a:A)--(b:B)
WITH collect(b) as bNodes
UNWIND bNodes as b
OPTIONAL MATCH (b)-[r]-(other)
WHERE other IN bNodes
WITH bNodes, collect(DISTINCT startNode(r)) as toRemove
RETURN [b in bNodes WHERE NOT b in toRemove] as nodes
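The "collect, find the internal relationships, subtract the start nodes" idea can be modelled in a few lines of plain Python. This is only a sketch: the node names follow the question, but the relationship directions (B1 to B2, B4 to B5) and the function name drop_one_per_pair are assumptions.

```python
# Hypothetical graph: the B nodes attached to A, and directed relationships among them.
b_nodes = ["B1", "B2", "B3", "B4", "B5"]
pair_rels = [("B1", "B2"), ("B4", "B5")]  # (start node, end node); directions are assumed

def drop_one_per_pair(b_nodes, pair_rels):
    members = set(b_nodes)
    # Keep only relationships whose two ends are both in the matched set,
    # then mark each relationship's start node for removal.
    to_remove = {start for start, end in pair_rels
                 if start in members and end in members}
    return [b for b in b_nodes if b not in to_remove]

print(drop_one_per_pair(b_nodes, pair_rels))
# ['B2', 'B3', 'B5']
```

Which member of each pair survives depends only on the relationship direction, which is fine here because the question does not care which one is chosen.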

Neo4j Cypher path using several times the same edge

Let us consider a simple example with two types of nodes: Company and Worker. For any pair of companies c1 and c2 (which satisfy some conditions that I will ignore here), I need to know how many workers they have in common, how many workers c1 has, and how many workers c2 has.
My first guess was:
MATCH (w_c1:Worker)--(c1:Company)--(w_common)--(c2:Company)--(w_c2:Worker)
WHERE <something>
RETURN c1, c2, COUNT(DISTINCT w_common), COUNT(DISTINCT w_c1), COUNT(DISTINCT w_c2)
The problem with that query is that, if I have only one relationship between any pair of connected nodes, COUNT(DISTINCT w_c1) (likewise for w_c2) only counts the workers of c1 that are not shared with c2. But if I have several relationships between some nodes, the result is sometimes "correct". It seems that the path in the match does not "come back": (w_common)--(c2:Company)--(w_c2:Worker) will not match ("worker1")--("company2")--("worker1") (which may make sense to avoid infinite loops).
My second guess was to split the query into two parts:
MATCH (c1:Company)--(w_common)--(c2:Company)
MATCH (c1)--(w_c1:Worker), (c2)--(w_c2:Worker)
WHERE <something>
RETURN c1, c2, COUNT(DISTINCT w_common), COUNT(DISTINCT w_c1), COUNT(DISTINCT w_c2)
This time the results are correct, but I get a warning about Cartesian products, and indeed, on a big dataset, my query does not complete even after hours. I tried adding "WITH c1, w_common, c2" between the two matches, but I still get the warning.
How should I proceed?
One thing that will help you is the SIZE() function, which can tell you the number of occurrences of a pattern, such as the number of :Workers per :Company.
This query may work for you, assuming that a Worker working for a Company only has one relationship to that Company:
MATCH (c1:Company)--(w_common:Worker)--(c2:Company)
WHERE <your criteria for matching on a specific c1 and c2>
RETURN COUNT(w_common) as inCommonCount, SIZE( (c1)--(:Worker) ) as c1Count, SIZE( (c2)--(:Worker) ) as c2Count
You can use sub-totals:
OPTIONAL MATCH (C1:Company {name: 'c1'})
OPTIONAL MATCH (C2:Company {name: 'c2'})
WITH C1, C2
MATCH (C:Company)<-[:workto]-(W:Worker) WHERE C = C1 OR C = C2
WITH C1, C2, W,
sum(CASE WHEN C = C1 THEN 1 ELSE 0 END) as tmp1,
sum(CASE WHEN C = C2 THEN 1 ELSE 0 END) as tmp2
RETURN C1, C2,
sum(tmp1) as cc1, sum(tmp2) as cc2,
sum(tmp1 * tmp2 ) as common
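The sub-totals trick is easier to follow on a small example. The sketch below mirrors it in plain Python: each worker gets one counter per company, and the product of the two counters is non-zero only for shared workers. The worker and company names and the function name company_overlap are invented for illustration, and it assumes at most one relationship per worker and company.

```python
# Hypothetical data: one (worker, company) pair per relationship.
works_for = [
    ("w1", "c1"), ("w2", "c1"), ("w3", "c1"),
    ("w2", "c2"), ("w3", "c2"), ("w4", "c2"), ("w5", "c2"),
]

def company_overlap(works_for, c1, c2):
    per_worker = {}  # worker -> [links to c1, links to c2], i.e. tmp1 and tmp2
    for worker, company in works_for:
        if company == c1:
            per_worker.setdefault(worker, [0, 0])[0] += 1
        elif company == c2:
            per_worker.setdefault(worker, [0, 0])[1] += 1
    cc1 = sum(t1 for t1, _ in per_worker.values())
    cc2 = sum(t2 for _, t2 in per_worker.values())
    # tmp1 * tmp2 is 1 only when the worker is linked to both companies.
    common = sum(t1 * t2 for t1, t2 in per_worker.values())
    return {"cc1": cc1, "cc2": cc2, "common": common}

print(company_overlap(works_for, "c1", "c2"))
# {'cc1': 3, 'cc2': 4, 'common': 2}
```

Because every relationship is visited once, the work grows with the number of relationships of the two companies, not with the product of their worker counts as in the Cartesian-product version.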

Cypher: Exclude shared nodes?

I have what feels like a fairly simple Cypher question.
I have the following data, where I have two A nodes and three B nodes, with b1 related to a1, b2 related to a2, and b3 shared and related to both a1 and a2. My goal is to write a Cypher statement that, given a particular A node, returns the B nodes that are connected only to it and are not related to any other A node. For example, given node a1 the query should return b1, and given node a2 it should return b2. The b3 node, which is related to both a1 and a2, should never be returned by this query regardless of which A node is specified. Said differently, I am trying to find the B nodes that are unique to a given A node, in that the resulting B nodes are not related to any A node other than the one specified in my match.
This example data will (hopefully) make my goal more clear:
CREATE (n:A { code: 'a1' })
CREATE (n:A { code: 'a2' })
CREATE (n:B { code: 'b1' })
CREATE (n:B { code: 'b2' })
CREATE (n:B { code: 'b3' })
match (a:A), (b:B) where a.code = 'a1' and b.code = 'b1' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a2' and b.code = 'b2' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a1' and b.code = 'b3' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a2' and b.code = 'b3' create (a)<-[r:A_AND_B]-(b) return a, r, b
If I were willing to include the shared b3 node, the query would be straightforward:
match (a:A)-[r:A_AND_B]-(b:B) where a.code = 'a1' return b
This returns b1 and b3. However, given that I do not want to include any B nodes that are related to a different A node (b3 should not be returned in this case), I'm struggling to come up with the right approach and syntax.
I've explored Cypher's WITH and OPTIONAL MATCH, so far with no luck. I am also able to accomplish what I want if I use two separate queries, which is a bit of a cheat and sidesteps the learning opportunity.
Can someone provide a boost?
How about something like this:
match (a:A)-[:A_AND_B]-(b:B)
where a.code = 'a1'
match (b)-[r:A_AND_B]-(:A)
with b, count(r) as c
where c = 1
return b
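As a sanity check, the same "count the A_AND_B relationships per b and keep those with exactly one" logic can be run against the question's sample data in plain Python. The function name exclusive_b_nodes is made up for this sketch; the pairs are the four relationships created in the question.

```python
# The A_AND_B relationships from the question's sample data, as (b, a) pairs.
a_and_b = [("b1", "a1"), ("b2", "a2"), ("b3", "a1"), ("b3", "a2")]

def exclusive_b_nodes(a_and_b, a_code):
    # First MATCH: the B nodes related to the given A node.
    candidates = [b for b, a in a_and_b if a == a_code]
    result = []
    for b in candidates:
        # Second MATCH + count(r): all A_AND_B relationships of this b.
        rel_count = sum(1 for other_b, _ in a_and_b if other_b == b)
        if rel_count == 1:  # WHERE c = 1
            result.append(b)
    return result

print(exclusive_b_nodes(a_and_b, "a1"))  # ['b1']
print(exclusive_b_nodes(a_and_b, "a2"))  # ['b2']
```

The shared node b3 has two relationships, so it is filtered out whichever A node is given.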

Find connected groups over levels

A node A has 3 connected nodes B1, B2, B3. Those Bx nodes in turn have connected nodes C1, C2, C3 and C4. Node A also has 2 directly connected nodes, C5 and C6.
Starting with node A, I want to collect all C nodes. I ran a query for the A node and collected its two C nodes, then ran a query for the B nodes, collected their C nodes, and merged both arrays. It works but is not very clever.
I tried (Pseudocode)
MATCH (g)<-[:IS_SUBGROUP_OF*1]-(i)-[:HAS_C_NODES]->(c) WHERE g = A.uuid RETURN C_NODES
But I get either all C nodes for A or all C nodes for the B nodes, never both.
How would I do a query that collects all C-Nodes starting with Node A?
* edited *
Here is some example data:
CREATE (a:A), (b1:B1), (b2:B2), (b3:B3), (c1:C1), (c2:C2), (c3:C3), (c4:C4), (a)-[r:HAS]->(c4), (a)-[r1:HAS]->(b1), (a)-[r2:HAS]->(b2), (a)-[r3:HAS]->(b3), (b1)-[r4:HAS]->(c1), (b1)-[r5:HAS]->(c2), (b2)-[r6:HAS]->(c3)
A query should return all nodes starting with C, no matter to which node they are connected (A or B).
You can add multiple labels for each node. You should use this to your advantage and segregate all the B and C nodes into a second label.
Eg:
CREATE (a:A), (b1:B1:BType), (b2:B2:BType), (b3:B3:BType), (c1:C1:CType), (c2:C2:CType), (c3:C3:CType), (c4:C4:CType), (a)-[r:HAS]->(c4), (a)-[r1:HAS]->(b1), (a)-[r2:HAS]->(b2), (a)-[r3:HAS]->(b3), (b1)-[r4:HAS]->(c1), (b1)-[r5:HAS]->(c2), (b2)-[r6:HAS]->(c3)
I have modified your create statement to group all the B nodes as :BType label and all the C nodes as :CType label.
You can simply use the optional match keyword to selectively traverse through the relationships if they exist and obtain the results you want.
match (a:A)-[:HAS]->(b:BType)-[:HAS]->(c:CType) optional match (a:A)-[:HAS]->(xc:CType) return c,xc
If you would like both sets of nodes to be grouped together you could try this statement instead which uses collect().
match (a:A)-[:HAS]->(b:BType)-[:HAS]->(c:CType) with a,collect (distinct c) as set1 optional match (a:A)-[:HAS]->(xc:CType) return set1 + collect (distinct xc) as output
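The combined result can be checked against the example data with a small plain-Python model. In this sketch the name prefix ("b" or "c") stands in for the :BType and :CType labels, and the function names are invented for illustration.

```python
# HAS relationships from the example data, as parent -> children.
has = {
    "a": ["c4", "b1", "b2", "b3"],
    "b1": ["c1", "c2"],
    "b2": ["c3"],
}

def is_b_type(name):
    return name.startswith("b")  # stand-in for the :BType label

def is_c_type(name):
    return name.startswith("c")  # stand-in for the :CType label

def collect_c_nodes(has, root):
    # (a)-[:HAS]->(b:BType)-[:HAS]->(c:CType)
    via_b = {c for b in has.get(root, []) if is_b_type(b)
             for c in has.get(b, []) if is_c_type(c)}
    # (a)-[:HAS]->(xc:CType)
    direct = {c for c in has.get(root, []) if is_c_type(c)}
    return sorted(via_b | direct)

print(collect_c_nodes(has, "a"))
# ['c1', 'c2', 'c3', 'c4']
```

The union of the two sets corresponds to set1 + collect(distinct xc) in the query above.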

How to determine a set of nodes based on the incoming relationship of another set of nodes and some special conditions

I've got a Cypher query that gets a set of nodes 'n' of type 't', say (it works its way through a number of different node types in the graph to reach this point).
If we assume the following:
The rest of type t nodes are the set 'm', so no intersect between m and n.
Type t nodes have multiple types of relationships between them.
I have a specific relationship 'r' that I'm interested in. In this specific case I know the following to be true:
Type t nodes can have 0 or more of these r relationships, incoming/outgoing.
The nodes within set n have no outgoing r relationships to set m
The nodes within set m may have outgoing r relationships to set m or n.
I have set n, I'm trying to determine the nodes from set m that meet the following conditions:
Have 0 r relationships
OR
Only have r relationships to set n, but not to any node in set m.
Some example data:
Type t nodes:
n1, n2, n3
m1, m2, m3
Type r relationships
m1 (no r relationships)
m2->n1, m2->n2
m3->n3, m3->m2
The results should return m1 and m2, but not m3.
I'm quite new to Cypher, so feel free to point to relevant documentation as required. Also, if you can explain the process you go through to determine the answer, I'd appreciate that as I suspect I'm just not quite understanding something simple here.
Your example is more model than data. You may know how to tell the m's and n's apart, but I can't write a query on the identifiers alone; there must be some actual data or structure to discriminate on. For instance, assume all nodes in the graph are type t, let sets n and m be distinguished by labels :N and :M, let the identifiers you use be values of a property uid (so that the query results map to your question), and let the type r relationship be [:R]. Then create your graph with
CREATE
(n1:N{uid:"n1"}), (n2:N{uid:"n2"}), (n3:N{uid:"n3"})
,(m1:M{uid:"m1"}), (m2:M{uid:"m2"}), (m3:M{uid:"m3"})
, (m2)-[:R]->(n1), (m2)-[:R]->(n2)
, (m3)-[:R]->(n3), (m3)-[:R]->(m2)
The query could then look something like
MATCH (n:N) // bind each node in the set n
WITH collect(n) AS nn // collect and treat them as a set nn
MATCH (m:M) // grab each node in the set m
OPTIONAL MATCH (m)-[:R]->(x) // optionally expand from m to unknown by r
WITH nn, m, collect(x) AS xx // collect unknown per m as xx where
WHERE ALL (x IN xx // all unknown nodes are in the nn set
WHERE x IN nn) // (if m has no -[:R]-> then the set xx is empty
// and the condition is true–i.e.
// either m has no outgoing r or
// the other node is in nn)
RETURN m
Result
m
(3:M {uid:"m1"})
(4:M {uid:"m2"})
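The core of the query is the ALL predicate over an optionally empty collection. A minimal plain-Python sketch of the same check on the example data follows; the function name m_nodes_pointing_only_into_n is made up, and Python's all() plays the role of Cypher's ALL, including returning true for an empty collection.

```python
# Example data from the question: outgoing r relationships per node.
n_set = {"n1", "n2", "n3"}
m_set = ["m1", "m2", "m3"]
r_out = {"m2": ["n1", "n2"], "m3": ["n3", "m2"]}

def m_nodes_pointing_only_into_n(m_set, n_set, r_out):
    # Keep m when every outgoing r target is in n; all() of an empty
    # list is True, so nodes without r relationships are kept as well.
    return [m for m in m_set if all(x in n_set for x in r_out.get(m, []))]

print(m_nodes_pointing_only_into_n(m_set, n_set, r_out))
# ['m1', 'm2']
```

m3 is rejected because one of its targets, m2, is not in the n set.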
