Cypher multiple OPTIONAL MATCH - Pattern Comprehension - COUNT DISTINCT

I have read a lot of comments about OPTIONAL MATCH and pattern comprehension, but I can't find a solution for my case.
I have a node (Account) in my Neo4j database and I'd like to count the nodes that belong to each account.
The following code works with one or two optional matches, but with this many OPTIONAL MATCH clauses it produces a cross product and times out.
// Account
MATCH (a:Account{billingCountry: "DE", isDeleted: false})
WHERE a.id IS NOT NULL
// User
MATCH (a)<-[:CREATED]-(u:User)
// Contact
OPTIONAL MATCH (a) <-[:CONTACT_OF]- (c:Contact{isDeleted: false})
// Opportunity
OPTIONAL MATCH (a) <-[:OPPORTUNITY_OF]- (o:Opportunity{isDeleted: false, s4sMarked_For_Deletion__C: false})
// Open Opportunity
OPTIONAL MATCH (a)<-[:OPPORTUNITY_OF]-(open:Opportunity{isClosed: false, isDeleted: false})
// Attribute
OPTIONAL MATCH (a) <-[:ATTRIBUTE_OF]- (aa:Attribute_Assignment{isDeleted: false})
// Sales Planning
OPTIONAL MATCH (a) <-[:SALESPLAN_OF]- (s:Sales_Planning)
// Task
OPTIONAL MATCH (a) <-[:TASK_OF]- (t:Task{isDeleted: false})
// Event
OPTIONAL MATCH (a) <-[:EVENT_OF]- (e:Event{isDeleted: false})
// Contract
OPTIONAL MATCH (a) <-[:CONTRACT_OF]- (ct:Contract{isDeleted: false})
RETURN
a.id,
u.name AS User_Name,
u.department AS User_Department,
COUNT(DISTINCT c.id) AS Contact_Count,
COUNT(DISTINCT o.id) AS Opportunity_Count,
COUNT(DISTINCT open.id) AS OpenOpp_Count,
COUNT(DISTINCT aa.id) AS Attribute_Count,
COUNT(DISTINCT s.timeYear) AS Sales_Plan_Count,
COUNT(DISTINCT t.id) AS Task_Count,
COUNT(DISTINCT e.id) AS Event_Count,
COUNT(DISTINCT ct.id) AS Contract_Count
I can rewrite the query with a pattern comprehension, but then I just get back the non-distinct ids in arrays.
Is there a way to count the distinct values inside the arrays, or another way to count the values in a pattern comprehension?
MATCH (a:Account{billingCountry: "DE", isDeleted: false})
WHERE a.id IS NOT NULL
RETURN a.id,
[
[(a)<-[:CONTACT_OF]- (c:Contact{isDeleted: false}) | c.id],
[(a)<-[:OPPORTUNITY_OF]- (o:Opportunity{isDeleted: false, s4sMarked_For_Deletion__C: false}) | o.id],
[(a)<-[:OPPORTUNITY_OF]-(open:Opportunity{isClosed: false, isDeleted: false}) | open.id],
[(a) <-[:ATTRIBUTE_OF]- (aa:Attribute_Assignment{isDeleted: false}) | aa.id],
[(a) <-[:SALESPLAN_OF]- (s:Sales_Planning) | s.timeYear],
[(a) <-[:TASK_OF]- (t:Task{isDeleted: false}) | t.id],
[(a) <-[:EVENT_OF]- (e:Event{isDeleted: false}) | e.id],
[(a) <-[:CONTRACT_OF]- (ct:Contract{isDeleted: false}) | ct.id]
]
If I made a formal mistake in my first Stack Overflow post, I would appreciate feedback :)

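The problem lies in the RETURN statement: because all the counts are calculated at the very end, Neo4j first has to build the Cartesian product of every OPTIONAL MATCH. If you aggregate each count right after its own OPTIONAL MATCH, the intermediate result collapses back to one row per account and user before the next match runs, which is much cheaper. Like this: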
MATCH (a:Account{billingCountry: "DE", isDeleted: false})
WHERE a.id IS NOT NULL
MATCH (a)<-[:CREATED]-(u:User)
OPTIONAL MATCH (a) <-[:CONTACT_OF]- (c:Contact{isDeleted: false})
WITH a, u, COUNT(DISTINCT c.id) AS Contact_Count
OPTIONAL MATCH (a) <-[:OPPORTUNITY_OF]- (o:Opportunity{isDeleted: false, s4sMarked_For_Deletion__C: false})
WITH a, u, Contact_Count, COUNT(DISTINCT o.id) AS Opportunity_Count
OPTIONAL MATCH (a)<-[:OPPORTUNITY_OF]-(open:Opportunity{isClosed: false, isDeleted: false})
WITH a, u, Contact_Count, Opportunity_Count, COUNT(DISTINCT open.id) AS OpenOpp_Count
OPTIONAL MATCH (a) <-[:ATTRIBUTE_OF]- (aa:Attribute_Assignment{isDeleted: false})
WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, COUNT(DISTINCT aa.id) AS Attribute_Count
OPTIONAL MATCH (a) <-[:SALESPLAN_OF]- (s:Sales_Planning)
WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, Attribute_Count,COUNT(DISTINCT s.timeYear) AS Sales_Plan_Count
OPTIONAL MATCH (a) <-[:TASK_OF]- (t:Task{isDeleted: false})
WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, Attribute_Count, Sales_Plan_Count, COUNT(DISTINCT t.id) AS Task_Count
OPTIONAL MATCH (a) <-[:EVENT_OF]- (e:Event{isDeleted: false})
WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, Attribute_Count, Sales_Plan_Count, Task_Count, COUNT(DISTINCT e.id) AS Event_Count
OPTIONAL MATCH (a) <-[:CONTRACT_OF]- (ct:Contract{isDeleted: false})
RETURN
a.id, u.name AS User_Name, u.department AS User_Department, Contact_Count,
Opportunity_Count, OpenOpp_Count, Attribute_Count, Sales_Plan_Count,
Task_Count, Event_Count, COUNT(DISTINCT ct.id) AS Contract_Count
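
As for the pattern comprehension variant from the question: wrapping each comprehension in size() turns the collected list into a count, and since every comprehension is evaluated per account, no rows are multiplied. The following is only a sketch showing two of the eight counts. It assumes the APOC library is installed for apoc.coll.toSet(), which is needed only where values can repeat (timeYear here), and it assumes that each Contact has its own id, so the contact list needs no deduplication.

```cypher
MATCH (a:Account {billingCountry: "DE", isDeleted: false})
WHERE a.id IS NOT NULL
RETURN
  a.id AS Account_Id,
  // size() counts the list elements; assumes ids are unique per Contact node
  size([(a)<-[:CONTACT_OF]-(c:Contact {isDeleted: false}) | c.id]) AS Contact_Count,
  // toSet() drops duplicate years before counting (requires APOC);
  // the WHERE filter skips nulls, as COUNT(DISTINCT ...) would
  size(apoc.coll.toSet(
    [(a)<-[:SALESPLAN_OF]-(s:Sales_Planning) WHERE s.timeYear IS NOT NULL | s.timeYear]
  )) AS Sales_Plan_Count
```

The remaining counts would follow the same two shapes, depending on whether the collected property can repeat.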

Related

Combine two cypher queries

Currently this is the data stored in the database
Org Name Org ID
A 1
B 2
C 5
D 9
I'm trying to combine these 2 queries:
MATCH (n:Org)
WHERE n.id in [1,2]
RETURN n.name as group1_name, n.id as group1_id
MATCH (n:Org)
WHERE n.id in [5,9]
RETURN n.name as group2_name, n.id as group2_id
I need the result to be shown like this:
group1_id group1_name group2_id group2_name
1 A 5 C
2 B 9 D

Assuming the two id lists are always the same size (in your example, 2), here is one approach (assuming you also want the id values sorted in ascending order):
MATCH (n:Org)
WHERE n.id in [1, 2]
WITH n ORDER BY n.id
WITH COLLECT(n) AS ns
MATCH (m:Org)
WHERE m.id in [5, 9]
WITH ns, m ORDER BY m.id
WITH ns, COLLECT(m) AS ms
UNWIND [i IN RANGE(0, SIZE(ns)-1) | {a: ns[i], b: ms[i]}] AS row
RETURN
row.a.id as group1_id, row.a.name as group1_name,
row.b.id as group2_id, row.b.name as group2_name
And here is a simpler approach:
WITH [1, 2] AS xs, [5, 9] AS ys
UNWIND RANGE(0, SIZE(xs)-1) AS i
MATCH (n:Org), (m:Org)
WHERE n.id = xs[i] AND m.id = ys[i]
RETURN n.id as group1_id, n.name as group1_name, m.id as group2_id, m.name as group2_name
And finally, if the xs and ys lists are passed to the query as parameters:
UNWIND RANGE(0, SIZE($xs)-1) AS i
MATCH (n:Org), (m:Org)
WHERE n.id = $xs[i] AND m.id = $ys[i]
RETURN n.id as group1_id, n.name as group1_name, m.id as group2_id, m.name as group2_name

Is there a way I can return all the nodes, their relationships and their properties for the following query

I want to get a list of all the distinct nodes and relationships that this query returns.
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
Return a,b,c,d,r
limit 10

This should work:
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
WITH * LIMIT 10
RETURN
COLLECT(DISTINCT a) AS aList,
COLLECT(DISTINCT b) AS bList,
COLLECT(DISTINCT c) AS cList,
COLLECT(DISTINCT r) AS rList,
COLLECT(DISTINCT d) AS dList

Neo4j: skip nodes that have even just a single relationship matching a query

The scenario is the following:
I have a set of nodes of type x that are linked to nodes of type y.
I want to match all x nodes except those that are linked to a y node that has an attribute equal to a particular value.
Example input:
CREATE (a:x {name: 'a'}), (b:x {name: 'b'}), (c:x {name: 'c'});
CREATE (d:y {name: 'd', attrib: 1}), (e:y {name: 'e', attrib: 2}),
(f:y {name: 'f', attrib: 3}), (g:y {name: 'g', attrib: 4}),
(h:y {name: 'h', attrib: 5}), (i:y {name: 'i', attrib: 6});
MATCH (a), (d), (e) WHERE a.name = 'a' AND d.name = 'd' AND e.name = 'e'
CREATE (a)-[r:z]->(d), (a)-[s:z]->(e) RETURN *;
MATCH (b), (f), (g) WHERE b.name = 'b' AND f.name = 'f' AND g.name = 'g'
CREATE (b)-[r:z]->(f), (b)-[s:z]->(g) RETURN *;
MATCH (c), (h), (i) WHERE c.name = 'c' AND h.name = 'h' AND i.name = 'i'
CREATE (c)-[r:z]->(h), (c)-[s:z]->(i) RETURN *;
Here I want to return all the x nodes except those that are linked to a y node that has attrib = 5.
Here's what I tried:
MATCH (n:x)-[]-(m:y) WHERE NOT m.attrib = 5 RETURN n
From this query I get all x nodes, that is: a, b and c. I would like to exclude c, because it's linked to h, which has h.attrib = 5.
Edit:
I found a query that does the job:
MATCH (n:x), (m:x)-[]-(o:y)
WHERE o.attrib = 5
WITH collect(n) as all_x_nodes, collect(m) as bad_x_nodes
RETURN [n IN all_x_nodes WHERE NOT n IN bad_x_nodes]
The problem is that it's not efficient. Any better alternative?

This simple query should do exactly what you asked for: "return all the x nodes except those that are linked to a y node that has attrib = 5."
MATCH (n:x)
WHERE NOT (n)--(:y {attrib: 5})
RETURN n;

A better approach is to find all :x nodes that you want to exclude (that are connected to the :y node with the specific attribute), collect those x nodes, then match to all :x nodes that aren't in the collection:
MATCH (exclude:x)--(:y{attrib:5})
WITH collect(distinct exclude) as excluded
MATCH (n:x)
WHERE NOT n in excluded
RETURN collect(n) as result
An alternate approach using APOC Procedures is to get both collections, and subtract the excluded collection from the other:
MATCH (exclude:x)--(:y{attrib:5})
WITH collect(distinct exclude) as excluded
MATCH (n:x)
WITH excluded, collect(n) as nodes
RETURN apoc.coll.subtract(nodes, excluded) as result
In either case, it would help to have an index on :y(attrib). In this data set it doesn't matter. On much larger sets it will.
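
A minimal sketch of creating such an index, assuming Neo4j 4.0 or later. The index name y_attrib is a made-up example; older 3.x releases use the form CREATE INDEX ON :y(attrib) instead.

```cypher
// Index on the attrib property of :y nodes, so that the lookup
// (:y {attrib: 5}) does not have to scan every :y node
CREATE INDEX y_attrib FOR (n:y) ON (n.attrib);
```

Prefixing the exclusion query with PROFILE should show whether the planner actually picks the index up.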

Neo4j Cypher query and index of element in the collection

I'm trying to find the index number of a Decision by {decisionGroupId}, {decisionId} and {criteriaIds}.
This is my current Cypher query:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion)
WHERE c.id IN {criteriaIds}
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
ORDER BY weight DESC, totalVotes DESC
WITH COLLECT(childD) AS ps
RETURN REDUCE(ix = -1, i IN RANGE(0, SIZE(ps)-1)
| CASE ps[i].id WHEN {decisionId} THEN i ELSE ix END) AS ix
I have only 3 Decisions in the database, but this query returns the following indices:
2
3
4
while I am expecting something like this (starting from 0, and -1 if not found):
0
1
2
What is wrong with my query and how to fix it?
UPDATED
This query is working fine with COLLECT(DISTINCT childD) AS ps:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion)
WHERE c.id IN {criteriaIds}
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
ORDER BY weight DESC, totalVotes DESC
WITH COLLECT(DISTINCT childD) AS ps
RETURN REDUCE(ix = -1, i IN RANGE(0, SIZE(ps)-1)
| CASE ps[i].id WHEN {decisionId} THEN i ELSE ix END) AS ix
Please help me to refactor this query and get rid of heavy REDUCE.

Let's try to get the reduce part right with a simpler query:
WITH ['a', 'b', 'c'] AS ps
RETURN
reduce(ix = -1, i IN RANGE(0, SIZE(ps)-1) |
CASE ps[i] WHEN 'b' THEN i ELSE ix END) AS ix
As I stated in the comments, it is usually better to avoid reduce if possible. So, to express the same using a list comprehension, use WHERE for filtering.
WITH ['a', 'b', 'c'] AS ps
RETURN [i IN RANGE(0, SIZE(ps)-1) WHERE ps[i] = 'b'][0]
The list comprehension results in a list with a single element, and we will use the [0] indexer to select that element.
After adapting this to your query, we'll get something like this:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion)
WHERE c.id IN {criteriaIds}
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
ORDER BY weight DESC, totalVotes DESC
WITH COLLECT(DISTINCT childD) AS ps
RETURN [i IN RANGE(0, SIZE(ps)-1) WHERE ps[i].id = {decisionId}][0]
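
One difference from the reduce version: when nothing matches, the list comprehension yields an empty list, and indexing it with [0] gives null rather than -1. If the -1 convention from the question matters, a small sketch like the following, which wraps the expression in coalesce(), may help. The value 'z' is just an example of an element that is not in the list.

```cypher
WITH ['a', 'b', 'c'] AS ps
// 'z' is not in ps, so the comprehension is empty and [0] yields null;
// coalesce() then falls back to -1
RETURN coalesce([i IN range(0, size(ps)-1) WHERE ps[i] = 'z'][0], -1) AS ix
```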

If you have APOC installed, you can also use the function:
return apoc.coll.indexOf([1,2,3],2)

Sum up counts in Neo4j

I have directors who have directed films. These films have genres and some actors starring in them. I want to find the films by a director, sorted by the sum of the number of genres of the film and the number of actors starring in it.
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc
ORDER BY gc DESC
This works but now I want to sum gc and sc and order films by the result. How does one do that?

I think you can just add the sum you want in your RETURN statement and then order results by it:
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc, gc+sc AS S
ORDER BY S DESC
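
One caveat, in case the counts look too high: the MATCH on genres and the OPTIONAL MATCH on starring multiply each other's rows, so a film with 2 genres and 3 starring nodes produces 6 rows, and both count(r) and count(r2) come out as 6. A possible variant, shown only as a sketch, counts distinct relationships instead (the alias total is an arbitrary name):

```cypher
MATCH (n)--(f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f)-[r]->(g:Genre)
OPTIONAL MATCH (f)-[r2]->(s:Starring)
// DISTINCT undoes the row multiplication between genres and starring nodes
WITH n, f, count(DISTINCT r) AS gc, count(DISTINCT r2) AS sc
RETURN n, f, gc, sc, gc + sc AS total
ORDER BY total DESC
```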
