Sum up counts in Neo4j - neo4j

I have directors who have directed films. These films have genres and some actors starring in them. I want to find the films by a given director, sorted by the sum of the number of genres of the film and the number of actors starring in it.
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc
ORDER BY gc DESC
This works but now I want to sum gc and sc and order films by the result. How does one do that?

I think you can just add the sum you want in your RETURN statement and then order results by it:
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc, gc+sc AS S
ORDER BY S DESC
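One caveat worth checking on real data: because the genre and starring patterns are matched one after the other, each film produces one row per (genre, starring) combination, so count(r) and count(r2) may both report the product of the two counts rather than the individual counts. A minimal sketch that counts distinct relationships instead, assuming the same labels and properties as in the question (the alias total is just an illustrative name):

```cypher
MATCH (n)--(f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f)-[r]->(g:Genre)
OPTIONAL MATCH (f)-[r2]->(s:Starring)
// DISTINCT keeps each relationship from being counted once per combined row
WITH n, f, count(DISTINCT r) AS gc, count(DISTINCT r2) AS sc
RETURN n, f, gc, sc, gc + sc AS total
ORDER BY total DESC
```

If a film has no Starring nodes, r2 is null on every row and count(DISTINCT r2) is 0, so the sum is still defined.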

Related

Get nodes sorted by proximity and at the same level sort by date

I'm pretty new to Neo4j and I'm having trouble getting the right result from my query. I have the following model:
Player <- HAS_PLAYERS - Game
Node Player: playerId, name,...etc
Node Game: gameId, gameDate
Rel. HAS_PLAYERS: result
Note that a Game could have 1-4 players.
I would like to write a query that suggests future opponents for a player, ordered as follows:
Previous opponents ordered by gameDate (most recent first), and then opponents-of-opponents ordered by gameDate.
For example:
PlayerA <- 2021/02/01 -> PlayerB*
PlayerA <- 2021/02/01 -> PlayerC*
PlayerA <- 2021/02/11 -> PlayerB
PlayerB <- 2021/02/04 -> PlayerC
PlayerB <- 2021/02/20 -> PlayerD
PlayerC <- 2021/02/15 -> PlayerD
PlayerC <- 2021/12/01 -> PlayerE
PlayerD <- 2021/02/07 -> PlayerE
PlayerD <- 2021/02/23 -> PlayerF
* = Same game
The result would be:
PlayerB
PlayerC
PlayerE
PlayerD
Explanation:
PlayerB and PlayerC have both been opponents before, but PlayerB comes first because their last game was more recent than PlayerC's.
PlayerE and PlayerD are opponents-of-opponents, and PlayerE comes first because their latest game date is in December.
I have the following query, but my problem is that it returns duplicate nodes:
// Getting direct opponents
MATCH (p:Player {userId: "PlayerA"})<-[:HAS_PLAYERS]-(g:Game)-[:HAS_PLAYERS]->(o:Player)
WITH p, o, g ORDER BY g.gameDate DESC
WITH p, COLLECT(o) AS opponents
// Getting opponents-of-opponents (ops)
MATCH (p)-[:HAS_PLAYERS*3]-(gops:Game)--(ops:Player)
WHERE p.userId <> ops.userId AND NOT ops IN opponents
// Trying to remove duplicated nodes
WITH DISTINCT ops, opponents, gops
WITH opponents, ops, gops ORDER BY gops.gameDate DESC
// Concat both lists: opponents and opponents-of-opponents
WITH REDUCE(s = opponents, o2 IN COLLECT(ops) | s + o2) as listAllOpponents
UNWIND listAllOpponents as opPlayer
RETURN opPlayer
It returns something like:
PlayerB
PlayerC
PlayerD
PlayerE
PlayerD
Any help would be appreciated.
Aggregating nodes with COLLECT does not remove duplicates, so adding the DISTINCT keyword fixes it. Instead of COLLECT(o) and COLLECT(ops), use COLLECT(DISTINCT o) AS opponents and COLLECT(DISTINCT ops).
// Getting direct opponents
MATCH (p:Player {userId: "34618"})<-[:HAS_PLAYERS]-(g:Game)-[:HAS_PLAYERS]->(o:Player)
WITH p, o, g ORDER BY g.gameDate DESC
WITH p, COLLECT(DISTINCT o) AS opponents
// Getting opponents-of-opponents (ops)
MATCH (p)-[:HAS_PLAYERS*3]-(gops:Game)--(ops:Player)
WHERE p.userId <> ops.userId AND NOT ops IN opponents
// Trying to remove duplicated nodes
WITH DISTINCT ops, opponents, gops
WITH opponents, ops, gops ORDER BY gops.gameDate DESC
// Concat both lists: opponents and opponents-of-opponents
WITH REDUCE(s = opponents, o2 IN COLLECT(DISTINCT ops) | s + o2) as listAllOpponents
UNWIND listAllOpponents as opPlayer
RETURN opPlayer
Result:
PlayerB
PlayerC
PlayerE
PlayerD
This is my solution:
// Getting direct opponents
MATCH (p:Player {userId: "PlayerA"})<-[:HAS_PLAYERS]-(g:Game)-[:HAS_PLAYERS]->(o:Player)
WITH p, o, max(g.gameDate) as maxDate
WITH p, o ORDER BY maxDate DESC
WITH p, COLLECT(o) AS opponents
// Getting opponents-of-opponents (ops)
OPTIONAL MATCH (p)-[:HAS_PLAYERS*3]-(gops:Game)--(ops:Player)
WHERE p.userId <> ops.userId AND NOT ops IN opponents
WITH opponents, ops, max(gops.gameDate) as maxDate
WITH opponents, ops ORDER BY maxDate DESC
WITH opponents, COLLECT(ops) AS opponentsOfOpponents
// Concat both lists: opponents and opponents-of-opponents
UNWIND (opponents + opponentsOfOpponents) AS player
RETURN player

Is there a way I can return all the nodes, their relationships, and their properties for the following query?

I want to get the list of all distinct nodes and relationships that this query matches.
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
Return a,b,c,d,r
limit 10
This should work:
MATCH (a:Protein{name:'9606.ENSP00000005995'})-[r:ON_INTERACTION_WITH]-(b:Protein)-[d:ON_INTERACTION_WITH]-(c:Protein)
WITH * LIMIT 10
RETURN
COLLECT(DISTINCT a) AS aList,
COLLECT(DISTINCT b) AS bList,
COLLECT(DISTINCT c) AS cList,
COLLECT(DISTINCT r) AS rList,
COLLECT(DISTINCT d) AS dList

Invalid use of aggregating function max

match(n)-[r:LIKES]->(m) with count(n) as cnt, m where cnt = max(cnt) return m
The above query results in the following error:
Invalid use of aggregating function max(...) in this context (line 1, column 61 (offset: 60))
This query should return a collection of the m nodes that have the maximum count. It only needs to perform a single MATCH operation, and should be relatively performant.
MATCH (n)-[:LIKES]->(m)
WITH m, COUNT(n) AS cnt
WITH COLLECT({m: m, cnt: cnt}) AS data, MAX(cnt) AS maxcnt
RETURN REDUCE(ms = [], d IN data | CASE WHEN d.cnt = maxcnt THEN ms + d.m ELSE ms END) AS ms;
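The REDUCE with a CASE expression above is effectively a filter followed by a projection, which a list comprehension can express more directly. A minimal sketch of the same query in that form, assuming the same LIKES relationships and the same variable names as above:

```cypher
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
WITH collect({m: m, cnt: cnt}) AS data, max(cnt) AS maxcnt
// keep only the entries whose count equals the maximum, then project the node
RETURN [d IN data WHERE d.cnt = maxcnt | d.m] AS ms
```

As with the REDUCE version, all tied nodes end up in the returned list.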
If you're just trying to find a single node that has the most LIKES relationships leading to it, then you can add an ORDER BY and LIMIT:
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
ORDER BY cnt DESC
LIMIT 1
RETURN m
However, that query has a limitation: if more than one node has the maximum number of inbound relationships, the tied nodes won't all be returned. To get all of them, you might try something like this:
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
WITH MAX(cnt) AS maxcnt
MATCH (o)
WHERE size((o)<-[:LIKES]-()) = maxcnt
RETURN o
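On newer Neo4j releases (5.x, to my knowledge), size() no longer accepts a pattern expression like the one above, and a COUNT subquery expression is the usual replacement. A sketch of the same query in that form, assuming a 5.x server and not tested against the asker's data:

```cypher
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
WITH max(cnt) AS maxcnt
MATCH (o)
// COUNT { ... } counts the matches of the pattern for each candidate node o
WHERE COUNT { (o)<-[:LIKES]-() } = maxcnt
RETURN o
```

Like the original, this scans every node in the second MATCH, so adding a label to (o) is worth considering if the liked nodes share one.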

This Cypher query is very slow; is there any optimization?

Using Neo4J 2.1.5.
Data:
2000 persons
Goal: For each person, calculate the total of friends, friends' friends, friends' friends' friends.
The result should look as follows:
Person FullName | Friends total | Friends-2 total | Friends-3 total | global total
MATCH (person:Person)
WITH person
OPTIONAL MATCH person-[:KNOWS]-(p2:Person)
WITH person, count(p2) as f1
OPTIONAL MATCH path = shortestPath(person-[:KNOWS*..2]-(f2:Person))
WHERE length(path) = 2
WITH count(nodes(path)[-1]) AS f2, person, f1
OPTIONAL MATCH path = shortestPath(person-[:KNOWS*..3]-(f3:Person))
WHERE length(path) = 3
WITH count(nodes(path)[-1]) AS f3, person, f2, f1
RETURN person._firstName + " " + person._lastName, f1, f2, f3, f1+f2+f3 AS total
The trick is to avoid wrong calculations in a cyclic graph; that's why I use shortestPath.
However, this query takes a long time: 60 seconds!
Is any optimization possible?
[EDITED]
Does this work for you?
MATCH (person:Person)
OPTIONAL MATCH (person)-[:KNOWS]-(p1:Person)
WITH person, COALESCE(COLLECT(p1),[]) AS p1s
WITH person, CASE p1s WHEN [] THEN [NULL] ELSE p1s END AS p1s
UNWIND p1s AS p1
OPTIONAL MATCH (p1)-[:KNOWS]-(p2:Person)
WHERE NOT ((p2 = person) OR (p2 IN p1s))
WITH person, p1s, COALESCE(COLLECT(DISTINCT p2),[]) AS p2s
WITH person, p1s, CASE p2s WHEN [] THEN [NULL] ELSE p2s END AS p2s
UNWIND p2s AS p2
OPTIONAL MATCH (p2)-[:KNOWS]-(p3:Person)
WHERE NOT ((p3 = person) OR (p3 IN p1s) OR (p3 IN p2s))
WITH person,
CASE p1s WHEN [NULL] THEN 0 ELSE SIZE(p1s) END AS f1,
CASE p2s WHEN [NULL] THEN 0 ELSE SIZE(p2s) END AS f2,
COUNT(DISTINCT p3) AS f3
RETURN person.firstName + " " + person.lastName, f1, f2, f3, f1+f2+f3 AS total;
Each friend is only counted once.
Here is an explanation of some of the more obscure tactics used. The query has to replace empty p1s and p2s collections with [NULL] so that UNWIND will not abort the rest of the query. Then, when counting the size of the collections, we need to give [NULL] collections a count of 0.
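The reason for the [NULL] replacement can be seen in isolation with two small standalone statements; this is only an illustration of UNWIND's behaviour, not part of the query above:

```cypher
// Produces no rows at all: unwinding an empty list leaves nothing
// for the following clauses to work on.
UNWIND [] AS x
RETURN x;

// Produces exactly one row, with x = null, so the following clauses
// still run once for this person.
UNWIND [null] AS x
RETURN x;
```

This is why a person with no friends would otherwise disappear from the result instead of being reported with counts of 0.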

Neo4j collection function error?

I am running the following query, which is meant to compare two collections of nodes, set1 and set2. All nodes in set2 are in set1, and I would like to identify all the nodes in set1 that are NOT in set2. However, the query returns a set of nodes that includes some of the nodes in set2. I am running this query on v2.1.7. Suggestions?
Query:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with nodes(p) as set1, p
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with nodes(q) as set2,set1, p
WHERE ALL(x in set2 WHERE NOT x in set1)
with nodes(p) as pneumo
UNWIND pneumo AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
Alternative query, same result:
Query:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with nodes(p) as set1, p
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with nodes(q) as set2,set1, p
WHERE NONE(x in set2 WHERE x in set1)
with nodes(p) as pneumo
UNWIND pneumo AS pneumolist
RETURN distinct pneumolist.FSN,pneumolist.sctid
Your matches don't return just one row, as you might expect, but many rows, and your comparison is performed across the cross product of those row combinations. You probably want to build a set for each of your two subtrees first, using a combination of UNWIND and collect(DISTINCT ...).
The code below will not be especially fast, as Cypher internally doesn't have a Set concept yet.
Try this:
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
UNWIND nodes(p) AS n
WITH collect(DISTINCT n) AS set1
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
UNWIND nodes(q) AS m
// carry set1 forward; otherwise it is out of scope after this WITH
WITH set1, collect(DISTINCT m) AS set2
UNWIND set1 AS pneumolist
// keep only the members of set1 that are not in set2
WITH pneumolist, set2
WHERE NOT pneumolist IN set2
RETURN DISTINCT pneumolist.FSN, pneumolist.sctid
The following query was successful, and addresses Michael's discussion regarding cross products (above).
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
with distinct nodes(p) as set1
UNWIND set1 as x1
with collect(DISTINCT x1) as set11
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
with distinct nodes(q) as set2,set11
UNWIND set2 as x2
with collect(distinct x2) as set22,set11
with REDUCE(pneumo=[], x in set11 | case when x in set22 then pneumo else pneumo + [x] END) AS pneumo
return pneumo
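The REDUCE at the end of that query computes a set difference, which a filtering list comprehension can also express. A minimal sketch of that variant, assuming the same sctid values and set11/set22 names as above, and that the Cypher version in use supports list comprehensions:

```cypher
MATCH p=(a:ObjectConcept{sctid:233604007})<-[:ISA*]-(b:ObjectConcept)
UNWIND nodes(p) AS x1
WITH collect(DISTINCT x1) AS set11
MATCH q=(a:ObjectConcept{sctid:34020007})<-[:ISA*]-(b:ObjectConcept)
UNWIND nodes(q) AS x2
WITH set11, collect(DISTINCT x2) AS set22
// keep the members of set11 that do not appear in set22
RETURN [x IN set11 WHERE NOT x IN set22] AS pneumo
```

Collecting each set once before comparing avoids the row-by-row cross product described earlier.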
