Neo4j Cypher path using the same edge several times

Let us consider a simple example with two types of nodes: Company and Worker. For any pair of companies c1 and c2 (which satisfy some conditions that I will ignore here), I need to know how many workers they have in common, how many workers c1 has, and how many workers c2 has.
My first guess was:
MATCH (w_c1:Worker)--(c1:Company)--(w_common)--(c2:Company)--(w_c2:Worker)
WHERE <something>
RETURN c1, c2, COUNT(DISTINCT w_common), COUNT(DISTINCT w_c1), COUNT(DISTINCT w_c2)
The problem with this query is that, if there is only one relationship between any pair of connected nodes, COUNT(DISTINCT w_c1) (likewise for w_c2) only counts the workers of c1 that are not shared with c2. But if there are several relationships between some nodes, the result is sometimes "correct". It seems that the path in the MATCH does not "come back": (w_common)--(c2:Company)--(w_c2:Worker) will not match ("worker1")--("company2")--("worker1") (which may make sense to avoid infinite loops).
My second guess was to split the query into two parts:
MATCH (c1:Company)--(w_common)--(c2:Company)
MATCH (c1)--(w_c1:Worker), (c2)--(w_c2:Worker)
WHERE <something>
RETURN c1, c2, COUNT(DISTINCT w_common), COUNT(DISTINCT w_c1), COUNT(DISTINCT w_c2)
This time the result is correct, but I get a warning about Cartesian products and, indeed, on a big dataset my query does not complete even after hours. I tried adding "WITH c1, w_common, c2" between the two MATCH clauses, but I still get the warning.
How should I proceed?

One thing that will help you is the SIZE() function, which can tell you the number of occurrences of a pattern, such as the number of :Workers per :Company.
This query may work for you, assuming that a Worker working for a Company only has one relationship to that Company:
MATCH (c1:Company)--(w_common:Worker)--(c2:Company)
WHERE <your criteria for matching on a specific c1 and c2>
RETURN COUNT(w_common) as inCommonCount, SIZE( (c1)--(:Worker) ) as c1Count, SIZE( (c2)--(:Worker) ) as c2Count
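The undercount described in the question comes from the fact that Cypher does not traverse the same relationship twice within a single MATCH pattern. The Python sketch below is only a toy model of that rule, not Neo4j code: the graph data and the names `EDGES`, `single_pattern_counts` and `set_counts` are made up for illustration, and the graph is assumed to have exactly one shared worker and one relationship per worker/company pair. It compares what the single five-node pattern counts with the per-company totals the question asks for.

```python
from itertools import product

# Hypothetical toy graph: one relationship per (worker, company) pair.
EDGES = [("w1", "c1"), ("w2", "c1"), ("w2", "c2"), ("w3", "c2")]


def workers_of(company):
    """Workers that have a relationship to the given company."""
    return {w for w, c in EDGES if c == company}


def single_pattern_counts(c1, c2):
    """Model of (w_c1)--(c1)--(w_common)--(c2)--(w_c2) in one MATCH:
    the four relationships on a matched path must all be different."""
    seen_common, seen_c1, seen_c2 = set(), set(), set()
    common = workers_of(c1) & workers_of(c2)
    for a, m, b in product(workers_of(c1), common, workers_of(c2)):
        path_edges = [(a, c1), (m, c1), (m, c2), (b, c2)]
        if len(set(path_edges)) == 4:  # relationship uniqueness
            seen_c1.add(a)
            seen_common.add(m)
            seen_c2.add(b)
    return len(seen_common), len(seen_c1), len(seen_c2)


def set_counts(c1, c2):
    """The counts the question asks for: common, all of c1, all of c2."""
    w1, w2 = workers_of(c1), workers_of(c2)
    return len(w1 & w2), len(w1), len(w2)


print(single_pattern_counts("c1", "c2"))  # (1, 1, 1)
print(set_counts("c1", "c2"))             # (1, 2, 2)
```

In this toy graph the shared worker w2 can never be bound to w_c1 or w_c2, because that would reuse the relationship already taken by w_common, so the single pattern reports one worker per company instead of two.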

You can use sub-totals:
OPTIONAL MATCH (C1:Company {name: 'c1'})
OPTIONAL MATCH (C2:Company {name: 'c2'})
WITH C1, C2
MATCH (C:Company)<-[:workto]-(W:Worker) WHERE C = C1 OR C = C2
WITH C1, C2, W,
sum(CASE WHEN C = C1 THEN 1 ELSE 0 END) as tmp1,
sum(CASE WHEN C = C2 THEN 1 ELSE 0 END) as tmp2
RETURN C1, C2,
sum(tmp1) as cc1, sum(tmp2) as cc2,
sum(tmp1 * tmp2 ) as common
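To make the two aggregation steps of the sub-totals query easier to follow, here is a small Python model of the same arithmetic. It is a sketch on made-up data, not Neo4j code: `WORKTO` and `sub_totals` are hypothetical names, and the product `tmp1 * tmp2` only equals the number of shared workers under the assumption that a worker has at most one relationship to each company.

```python
from collections import defaultdict

# Hypothetical (worker, company) pairs standing in for :workto relationships.
WORKTO = [("w1", "c1"), ("w2", "c1"), ("w2", "c2"), ("w3", "c2")]


def sub_totals(c1, c2):
    # First aggregation: per worker, count relationships to c1 and to c2
    # (the tmp1 and tmp2 columns of the query).
    per_worker = defaultdict(lambda: [0, 0])
    for worker, company in WORKTO:
        if company == c1:
            per_worker[worker][0] += 1
        if company == c2:
            per_worker[worker][1] += 1
    # Second aggregation: sum the per-worker values over all workers.
    cc1 = sum(t1 for t1, _ in per_worker.values())
    cc2 = sum(t2 for _, t2 in per_worker.values())
    common = sum(t1 * t2 for t1, t2 in per_worker.values())
    return cc1, cc2, common


print(sub_totals("c1", "c2"))  # (2, 2, 1)
```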

Related

Query to write hops and return all the properties from the middle nodes or a better way to do it and skip hops?

Node A is connected to node E through intermediate nodes B (which can repeat), C and D, as shown below.
(A)--(C)--(D)--(E)
(A)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(C)--(D)--(E)
(A)--(B)--(B)--(B)--(C)--(D)--(E)
There can be up to 7 B nodes between A and C, or no B node at all (as in the first case above).
Question: How can I get all the E nodes (E1, E2, E3, E4) connected to A1 with a single query and return properties from all the A, B, C, D and E nodes? I could not return the properties using the hops.
MATCH (A {Id:30})-[*1..6]-(E) RETURN DISTINCT A.Name, E.Name;
But we also want to return B.Name (and, if there are multiple B nodes in the middle, their names too), C.Name and D.Name. Happy to skip hopping completely if required. Help please? Thanks in advance.
Try this:
// in case there is always A,C and E, you can look for
// paths with length 3 to 6
MATCH path=(A)-[*3..6]-(E)
// return the name of each node in the same order
RETURN [n IN nodes(path) | n.Name] AS nodeNames
Assuming A through E are node labels, this query should get all paths that match your pattern (with 0 to 7 B nodes between the A and C nodes), and return distinct lists of node Name values:
MATCH p=(:A)-[*..8]-(:C)--(:D)--(:E)
WHERE ALL(n IN NODES(p)[1..-3] WHERE 'B' IN LABELS(n))
RETURN DISTINCT [m IN NODES(p) | m.Name] AS names
In general, the query would be more efficient if you could also specify the relationship types and their directionality.
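As a rough illustration of what the second query enumerates, the following Python sketch walks a toy graph from an A node through zero to seven B nodes and then through C, D and E, never using the same relationship twice. It is a model of the pattern under stated assumptions, not Neo4j code: the graph, the node names and the helpers `neighbours` and `paths` are all hypothetical.

```python
# Hypothetical toy graph: node name -> label, plus undirected edges.
LABELS = {"a1": "A", "b1": "B", "b2": "B",
          "c1": "C", "d1": "D", "e1": "E",
          "c2": "C", "d2": "D", "e2": "E"}
EDGES = [("a1", "c1"), ("c1", "d1"), ("d1", "e1"),
         ("a1", "b1"), ("b1", "b2"), ("b2", "c2"),
         ("c2", "d2"), ("d2", "e2")]


def neighbours(node):
    """Yield (other_node, edge) for every edge touching node."""
    for u, v in EDGES:
        if u == node:
            yield v, (u, v)
        elif v == node:
            yield u, (u, v)


def paths(start, max_b=7):
    """Yield node lists shaped A, B*, C, D, E with at most max_b B nodes."""
    tail = ["C", "D", "E"]

    def walk(path, used, b_count, tail_pos):
        if tail_pos == len(tail):
            yield list(path)
            return
        for nxt, edge in neighbours(path[-1]):
            if edge in used:  # no relationship is traversed twice
                continue
            label = LABELS[nxt]
            if label == tail[tail_pos]:
                yield from walk(path + [nxt], used | {edge},
                                b_count, tail_pos + 1)
            elif label == "B" and tail_pos == 0 and b_count < max_b:
                yield from walk(path + [nxt], used | {edge},
                                b_count + 1, 0)

    yield from walk([start], frozenset(), 0, 0)


names = sorted(paths("a1"))
print(names)
```

On this toy graph the sketch finds two paths, one with no B node and one with two, which mirrors the first and third shapes listed in the question.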

Selecting one of two nodes in Cypher query

all Cypher masters!
I can't figure out how to query all B nodes while choosing either B1 or B2 and B4 or B5. There is no constraint on which of them, only that one is chosen. As in the image, there's a relation (B1,B2) and (B4,B5).
In other words, I want to MATCH all nodes of type B connected to some node of type A, but exclude either B1 or B2 and either B4 or B5 (using the relation between them) from the result. The nodes of type B can only be connected pairwise, that is, (B1,B2) and (B2,B3) will never exist simultaneously. There can, however, be more than one pair, as the image shows.
Any ideas are more than welcome!
I think this is a simple additional condition:
MATCH (A:A)--(B:B)
OPTIONAL MATCH (B)--(BT:B)--(A)
WITH B, BT WHERE BT IS NULL OR id(B) > id(BT)
RETURN B
For this, it will be faster to use APOC Procedures, as there are some helpful collection functions, and a procedure that makes it easy to get the relationships that exist between a group of nodes.
The idea here is that we'll match the connected :B nodes, use the cover() procedure to get all relationships among these nodes, collect those relationships and take one node from each (we'll use the start node here), and then subtract those chosen nodes from our list, leaving us with the :B nodes we want:
MATCH (a:A)--(b:B)
WITH collect(b) as bNodes
CALL apoc.algo.cover(bNodes) YIELD rel
WITH bNodes, [r in collect(rel) | startNode(r)] as toRemove
RETURN apoc.coll.subtract(bNodes, toRemove) as nodes
If you don't have (or don't want to use) APOC, here's a Cypher-only version:
MATCH (a:A)--(b:B)
WITH collect(b) as bNodes
UNWIND bNodes as b
OPTIONAL MATCH (b)-[r]-(other)
WHERE other IN bNodes
WITH bNodes, collect(DISTINCT startNode(r)) as toRemove
RETURN [b in bNodes WHERE NOT b in toRemove] as nodes
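Both versions above follow the same three steps: collect the B nodes, find the relationships whose two ends are both in that collection, and drop the start node of each such relationship. A minimal Python model of those steps, on hypothetical data (`B_NODES`, `PAIR_RELS` and `pick_one_per_pair` are illustrative names, not part of Neo4j or APOC), might look like this:

```python
# Hypothetical data: the B nodes connected to the A node, and the
# relationships among them written as (start, end) pairs.
B_NODES = ["b1", "b2", "b3", "b4", "b5"]
PAIR_RELS = [("b1", "b2"), ("b4", "b5")]


def pick_one_per_pair(b_nodes, rels):
    in_set = set(b_nodes)
    # Start node of every relationship whose two ends are both in the set.
    to_remove = {s for s, e in rels if s in in_set and e in in_set}
    # Subtract the chosen nodes, keeping the original order.
    return [b for b in b_nodes if b not in to_remove]


print(pick_one_per_pair(B_NODES, PAIR_RELS))  # ['b2', 'b3', 'b5']
```

Because the question guarantees that B nodes are only connected in pairs, removing the start node of each relationship always leaves exactly one node per pair.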

Return multiple sums of relationship weights using cypher

I have a graph with one node type 'nodeName' and one relationship type 'relName'. Each node pair has 0-1 'relName' relationships with each other but each node can be connected to many nodes.
Given an initial list of nodes (I'll refer to this list as the query subset) I want to:
1. Find all the nodes that connect to the query subset
I'm currently doing this (which may be overly convoluted):
MATCH (a: nodeName)-[r:relName]-()
WHERE (a.name IN ['query list'])
WITH a
MATCH (b: nodeName)-[r2:relName]-()
WHERE NOT (b.name IN ['query list'])
WITH a, b
MATCH (a)--(b)
RETURN DISTINCT b
2. Then, for each connected node (b), return the SUM of the weights that connect it to the query subset
For example, if node b1 has 4 edges that connect to nodes in the query subset, I would like to RETURN SUM(r2.weight) AS totalWeight for b1. I actually need a list of all the b nodes ordered by totalWeight.
No. 2 is where I'm stuck. I've been reading the docs about FOREACH and reduce() but I'm not sure how to apply them here.
Speed is important as I have 30,000 nodes and 1.5M edges; if you have any suggestions regarding this, please throw them into the mix.
Many thanks
Matt
Why do you need so many MATCH statements? You can specify the a nodes and b nodes in a single MATCH statement and select only those that have a relationship between them.
After that, just return the b nodes and the sum of the weights. The b nodes automatically act as a group-by key when they are returned along with an aggregation function such as sum.
MATCH (a:nodeName)-[r:relName]-(b:nodeName)
WHERE (a.name IN ['query list']) AND NOT((b.name IN ['query list']))
RETURN b.name, sum(r.weight) as weightSum order by weightSum
I think we can simplify that query a bit.
MATCH (a: nodeName)
WHERE (a.name IN ['query list'])
WITH collect(a) as subset
UNWIND subset as a
MATCH (a)-[r:relName]-(b)
WHERE NOT b in subset
RETURN b, sum(r.weight) as totalWeight
ORDER BY totalWeight ASC
Since sum() is an aggregating function, it will make the non-aggregation variables the grouping key (in this case b), so the sum is per b node, then we order them (switch to DESC if needed).
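The grouping behaviour described above can be mimicked in a few lines of Python. This is only a sketch of the aggregation on made-up data, not a replacement for the query: `RELS` and `total_weights` are hypothetical names, and relationships are treated as undirected, as in the pattern `(a)-[r:relName]-(b)`.

```python
from collections import defaultdict

# Hypothetical weighted relName relationships as (node, node, weight).
RELS = [("q1", "x", 2.0), ("q2", "x", 3.0), ("q1", "y", 1.0),
        ("q1", "q2", 9.0), ("x", "y", 7.0)]


def total_weights(query_names, rels):
    subset = set(query_names)
    totals = defaultdict(float)
    for u, v, w in rels:
        # Keep relationships with exactly one end in the query subset.
        if (u in subset) != (v in subset):
            outside = v if u in subset else u
            totals[outside] += w  # group by the outside node b
    # Equivalent of ORDER BY totalWeight ASC.
    return sorted(totals.items(), key=lambda kv: kv[1])


print(total_weights(["q1", "q2"], RELS))  # [('y', 1.0), ('x', 5.0)]
```

Relationships inside the subset (q1 to q2) and relationships entirely outside it (x to y) contribute nothing, which matches the `WHERE NOT b in subset` filter.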

Cypher: Exclude shared nodes?

I have what feels like a fairly simple Cypher question.
I have the following data, where I have two A nodes and three B nodes, with b1 related to a1, b2 related to a2, and b3 shared and related to both a1 and a2. My goal is to write a Cypher statement that, given a particular A node, will return the B nodes that are connected only to it and are not related to any other A node. For example, when given node a1 the query should return b1, and when given node a2, b2 should be returned. The b3 node, which is related to both a1 and a2, should never be returned from this query regardless of which A node is specified. Said differently, I am trying to find the B nodes that are unique to a given A node, in that the resulting B nodes are not related to any A node other than the one specified in my match.
This example data will (hopefully) make my goal more clear:
CREATE (:A { code: 'a1' })
CREATE (:A { code: 'a2' })
CREATE (:B { code: 'b1' })
CREATE (:B { code: 'b2' })
CREATE (:B { code: 'b3' })
match (a:A), (b:B) where a.code = 'a1' and b.code = 'b1' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a2' and b.code = 'b2' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a1' and b.code = 'b3' create (a)<-[r:A_AND_B]-(b) return a, r, b
match (a:A), (b:B) where a.code = 'a2' and b.code = 'b3' create (a)<-[r:A_AND_B]-(b) return a, r, b
If I were willing to include the shared b3 node, the query would be straightforward:
match (a:A)-[r:A_AND_B]-(b:B) where a.code = 'a1' return b
This returns b1 and b3. However, given that I do not want to include any B nodes that are related to a different A node (b3 should not be returned in this case), I'm struggling to come up with the right approach and syntax.
I've explored Cypher's WITH and OPTIONAL MATCH with no luck so far. I am also able to accomplish what I want if I use two separate queries, which is a bit of a cheat and sidesteps the learning opportunity.
Can someone provide a boost?
How about something like this:
match (a:A)-[:A_AND_B]-(b:B)
where a.code = 'a1'
match (b)-[r:A_AND_B]-(:A)
with b, count(r) as c
where c = 1
return b
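The logic of this query (find the B nodes of the given A node, then keep those with exactly one A_AND_B relationship) can be checked against the example data with a short Python model. This is a sketch, not Neo4j code, and `exclusive_b` is a hypothetical helper name. Like the query, it counts relationships rather than distinct A nodes, so a B node with two relationships to the same A node would also be excluded.

```python
from collections import Counter

# The A_AND_B relationships from the example data, as (a, b) pairs.
A_AND_B = [("a1", "b1"), ("a2", "b2"), ("a1", "b3"), ("a2", "b3")]


def exclusive_b(a_code, rels):
    # Number of A_AND_B relationships per B node (count(r) in the query).
    degree = Counter(b for _, b in rels)
    # B nodes of the given A node whose only relationship is that one.
    return sorted(b for a, b in rels if a == a_code and degree[b] == 1)


print(exclusive_b("a1", A_AND_B))  # ['b1']
print(exclusive_b("a2", A_AND_B))  # ['b2']
```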

Count duplicated

Imagine that I have a graph in which, for every pair of nodes m, n of type Nod1, there can be a node k of type Nod2 that connects them through relationships of type Rel; that is, there can be multiple patterns of the kind p=(m:Nod1)-[r:Rel]-(k:Nod2)-[s:Rel]-(n:Nod1). For a given node m (satisfying, for example, m.key="whatever"), how can I find the node n that maximizes the number of nodes k that connect m to n? For example, imagine that there are 3 nodes k that connect m to n1 satisfying n1.key="hello" and 10 nodes k that connect m to n2 satisfying n2.key="world"; how do I build a query that retrieves the node n2? :)
The title of the question is "Count duplicated" because I think the problem is solved if I can count all "duplicated" patterns for each node n (that is, all patterns that have n as the "end node")!! :)
Start by matching your m, then match the pattern you want, group by the distinct n nodes, and count the number of k nodes that lead to each n, and you should be there.
MATCH (m:Nod1 { key: "whatever" })
WITH m
MATCH (m)-[r:Rel]-(k:Nod2)-[s:Rel]-(n:Nod1)
RETURN distinct(n), count(k) as x
ORDER BY x DESC;
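The query returns every n with its count, sorted so that the best candidate comes first. To illustrate the counting step and the selection of the top row (which in Cypher would correspond to adding LIMIT 1), here is a minimal Python model on made-up data; `RELS` and `best_partner` are hypothetical names, and each (Nod1, Nod2) pair is assumed to have at most one Rel relationship.

```python
from collections import Counter

# Hypothetical Rel relationships, written as (nod1, nod2) pairs.
RELS = [("m", "k1"), ("n1", "k1"),
        ("m", "k2"), ("n2", "k2"),
        ("m", "k3"), ("n2", "k3")]


def best_partner(m, rels):
    # Nod2 nodes directly connected to m.
    ks = {k for n, k in rels if n == m}
    # For every other Nod1 node, count the connecting k nodes it shares with m.
    counts = Counter(n for n, k in rels if k in ks and n != m)
    # Highest count first; None when m has no partner at all.
    return counts.most_common(1)[0] if counts else None


print(best_partner("m", RELS))  # ('n2', 2)
```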
