Very slow this cypher query, is there any optimization? - neo4j

Using Neo4J 2.1.5.
Data:
2000 persons
Goal: For each person, calculate the total of friends, friends' friends, friends' friends' friends.
Result is like follows:
Person FullName | Friends total | Friends-2 total | Friends-3 total | global total
MATCH (person:Person)
WITH person
OPTIONAL MATCH person-[:KNOWS]-(p2:Person)
WITH person, count(p2) as f1
OPTIONAL MATCH path = shortestPath(person-[:KNOWS*..2]-(f2:Person))
WHERE length(path) = 2
WITH count(nodes(path)[-1]) AS f2, person, f1
OPTIONAL MATCH path = shortestPath(person-[:KNOWS*..3]-(f3:Person))
WHERE length(path) = 3
WITH count(nodes(path)[-1]) AS f3, person, f2, f1
RETURN person._firstName + " " + person._lastName, f1, f2, f3, f1+f2+f3 AS total
The tricks is to avoid wrong calculations with cylic graph; that's why I use shortestPath.
However, this query lasts long: 60 seconds!
Any optimization possible?

[EDITED]
Does this work for you?
MATCH (person:Person)
OPTIONAL MATCH (person)-[:KNOWS]-(p1:Person)
WITH person, COALESCE(COLLECT(p1),[]) AS p1s
WITH person, CASE p1s WHEN [] THEN [NULL] ELSE p1s END AS p1s
UNWIND p1s AS p1
OPTIONAL MATCH (p1)-[:KNOWS]-(p2:Person)
WHERE NOT ((p2 = person) OR (p2 IN p1s))
WITH person, p1s, COALESCE(COLLECT(DISTINCT p2),[]) AS p2s
WITH person, p1s, CASE p2s WHEN [] THEN [NULL] ELSE p2s END AS p2s UNWIND p2s AS p2
OPTIONAL MATCH (p2)-[:KNOWS]-(p3:Person)
WHERE NOT ((p3 = person) OR (p3 IN p1s) OR (p3 IN p2s))
WITH person,
CASE p1s WHEN [NULL] THEN 0 ELSE SIZE(p1s) END AS f1,
CASE p2s WHEN [NULL] THEN 0 ELSE SIZE(p2s) END AS f2,
COUNT(DISTINCT p3) AS f3
RETURN person.firstName + " " + person.lastName, f1, f2, f3, f1+f2+f3 AS total;
Each friend is only counted once.
Here is an explanation of some of the more obscure tactics used. The query has to to replace empty p1s and p2s collections with [NULL] so that UNWIND will not abort the rest of the query. Then, when counting the size of the collections, we need give [NULL] collections a count of 0.

Related

Is there a way i can return all the nodes their relationship and it's properties for the following query

I want to get all the list of distinct nodes and relationship that I am getting through this query.
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
Return a,b,c,d,r
limit 10
This should work:
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
WITH * LIMIT 10
RETURN
COLLECT(DISTINCT a) AS aList,
COLLECT(DISTINCT b) AS bList,
COLLECT(DISTINCT c) AS cList,
COLLECT(DISTINCT r) AS rList,
COLLECT(DISTINCT d) AS dList

Invalid use of aggregating function max

match(n)-[r:LIKES]->(m) with count(n) as cnt, m where cnt = max(cnt) return m
Above query results in following error:
Invalid use of aggregating function max(...) in this context (line 1,
column 61 (offset: 60))
This query should return a collection of the m nodes that have the maximum count. It only needs to perform a single MATCH operation, and should be relatively performant.
MATCH (n)-[:LIKES]->(m)
WITH m, COUNT(n) AS cnt
WITH COLLECT({m: m, cnt: cnt}) AS data, MAX(cnt) AS maxcnt
RETURN REDUCE(ms = [], d IN data | CASE WHEN d.cnt = maxcnt THEN ms + d.m ELSE ms END) AS ms;
If you're just trying to find a single node that has the most LIKES relationships leading to it, then you can add an ORDER BY and LIMIT:
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
ORDER BY cnt DESC
LIMIT 1
RETURN m
However, that query has the limitation in that if more than one node has the maximum number of inbound relationships, then those tied nodes won't be returned. To achieve that result, you might try something like this:
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
WITH MAX(cnt) AS maxcnt
MATCH (o)
WHERE size((o)<-[:LIKES]-()) = maxcnt
RETURN o

Sum up counts in Neo4j

I have directors who have directed films. These films have genres and some actors starring. I want to find the films by a directed sorted by the sum of (no of genres of the film, no of actors starring in the film).
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc
ORDER BY gc DESC
This works but now I want to sum gc and sc and order films by the result. How does one do that?
I think you can just add the sum you want in your RETURN statement and then order results by it:
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc, gc+sc AS S
ORDER BY S DESC

cyhper combine two columns into a single

I couldn't find a similar post, so if you already know one or if my question is not the proper one, please let me know.
I have this query
MATCH
(t:Taxi {name:'Taxi1813'})<-[:ASSIGNED]-(u2:User)-[rd2:DROP_OFF]->
(g2:Grid)-[r:TO*1..2]-(g:Grid)<-[rd:DROP_OFF]-(u:User)-[:ASSIGNED]->(t)
WHERE ID(u2) < ID(u) AND rd2.time >= '04:38' AND rd2.time <= '04:42'
WITH DISTINCT u2, g2, u, g, rd2, rd
MATCH p=shortestPath((g2)-[r:TO*1..2]-(g))
WITH rd2, rd,u2, g2, u, g, p, REDUCE(totalTime = 0, x IN RELATIONSHIPS(p) | totalTime + x.time) AS totalTime
WHERE totalTime <= 4
RETURN u2.name, u.name
So at the end i got two columns
u2.name u.name
User179 UserTest
User177 User179
Is there is a way or function to merge both columns into a single one and remove duplicates
Users
User179
User177
UserTest
Any suggestions? Thank you
You can combine the two collections into a single collection and then return just the distinct items.
WITH ['User179', 'User177'] AS list1
, ['UserTest', 'User179'] AS list2
UNWIND list1 + list2 AS item
RETURN DISTINCT item
Alternatively, if you are using APOC you could use apoc.coll.union() instead.
WITH ['User179', 'User177'] AS list1
, ['UserTest', 'User179'] AS list2
RETURN apoc.coll.union(list1,list2)
u2.name + u.name combines the list
you could make something like "where u2.name is not equal to u.name(not right syntax)"

Neo4j collection function error?

I am running the following query that is meant to compare two collections nodes set1 and set2. All nodes in set2 are in set1, and I would like to identify all the nodes in set1 that are NOT in set2. However, the query returns a set of nodes that includes some of the nodes in set1. I am running this query on v2.1.7. Suggestions?
Query:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with nodes(p) as set1, p
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with nodes(q) as set2,set1, p
WHERE ALL(x in set2 WHERE NOT x in set1)
with nodes(p) as pneumo
UNWIND pneumo AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
Alternative query, same result:
Query:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with nodes(p) as set1, p
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with nodes(q) as set2,set1, p
WHERE NONE(x in set2 WHERE x in set1)
with nodes(p) as pneumo
UNWIND pneumo AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
Your matches don't return just one row as you might expect but many rows,
and your comparison is done between the cross product of those many row combinations. You probably want to create a set for each of your two subtrees first with a combination of unwind + collect(distinct)
The code below will not be as fast, as cypher internally doesn't have a Set concept yet.
try this
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
unwind nodes(p) as n
with collect(distinct n) as set1
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
unwind nodes(q) as m
with collect(distinct m) as set2
WHERE NONE(x in set2 WHERE x in set1)
UNWIND set1 AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
The following query was successful, and addresses Michael's discussion regarding cross products (above).
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with distinct nodes(p) as set1
UNWIND set1 as x1
with collect(DISTINCT x1) as set11
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with distinct nodes(q) as set2,set11
UNWIND set2 as x2
with collect(distinct x2) as set22,set11
with REDUCE(pneumo=[],x in set11|case when x in set22 then pneumo else pneumo
+ [x] END) AS pneumo
return pneumo

Resources