Cypher query doesn't behave as expected with multiple match blocks? - neo4j

I've got the following query:
MATCH (u:User) WHERE u.username = "ben"
OPTIONAL MATCH (u)-[:HAS]->(pl)
//MATCH (u)-[r1:IS_AT|PREFERS|DESIRES|VALUES]->()<-[]-(fp:FitnessProgram) WHERE NOT (fp)-[:LIMITED_BY]-(pl)
//WITH u, pl, fp, coalesce(r1.importance, 0.5) AS importance
//WITH u, pl, fp, collect({name: fp.name, importance: importance}) AS fpTraits
//WITH u, pl, reduce(s = 0, t IN fpTraits | s + t.importance) AS fpScore order by fpScore
MATCH (u)-[r2:IS_AT|PREFERS|DESIRES|VALUES]->()<-[]-(ns:NutritionalSupplement) WHERE NOT (ns)-[:LIMITED_BY]-(pl)
WITH u, ns, coalesce(r2.importance, 0.5) AS importance
WITH u, ns, collect({name: ns.name, importance: importance}) AS nsTraits
WITH u, ns, reduce(s = 0, t IN nsTraits | s + t.importance) AS nsScore order by nsScore desc limit 5
return u, ns.name, nsScore
As it is, with the 4 lines commented out, it works correctly and gives me the top 5 nutritional supplements as expected.
If I comment out the bottom block and uncomment the top block, that one works as expected too.
If I uncomment both, as below, neither block works: I get a bunch of dupes and the scores are all crazy. It seems like the two matches get combined in some way I don't understand yet (I'm new to Neo4j).
MATCH (u:User) WHERE u.username = "ben"
OPTIONAL MATCH (u)-[:HAS]->(pl)
MATCH (u)-[r1:IS_AT|PREFERS|DESIRES|VALUES]->()<-[]-(fp:FitnessProgram) WHERE NOT (fp)-[:LIMITED_BY]-(pl)
WITH u, pl, fp, coalesce(r1.importance, 0.5) AS importance
WITH u, pl, fp, collect({name: fp.name, importance: importance}) AS fpTraits
WITH u, pl, fp, reduce(s = 0, t IN fpTraits | s + t.importance) AS fpScore order by fpScore desc limit 5
MATCH (u)-[r2:IS_AT|PREFERS|DESIRES|VALUES]->()<-[]-(ns:NutritionalSupplement) WHERE NOT (ns)-[:LIMITED_BY]-(pl)
WITH u, fp, fpScore, ns, coalesce(r2.importance, 0.5) AS importance
WITH u, fp, fpScore, ns, collect({name: ns.name, importance: importance}) AS nsTraits
WITH u, fp, fpScore, ns, reduce(s = 0, t IN nsTraits | s + t.importance) AS nsScore order by nsScore desc limit 5
return u, fp.name, fpScore, ns.name, nsScore

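What values of fp do you expect to have in the last block? A MATCH runs once for every row it receives, so your second MATCH runs once for each of the (up to) 5 fp rows the first block leaves behind. Because fp and fpScore are also carried through the WITH clauses, every aggregation in the second block is grouped per (fp, ns) pair, and the final limit 5 picks from those pairs. That is where the duplicates come from.
You do not need to keep declaring fp in your WITH statements. Collect the top programs into a single list first, then run the supplement block (this replaces everything after the first block's limit 5):
WITH u, pl, collect({name: fp.name, score: fpScore}) AS programs
MATCH (u)-[r2:IS_AT|PREFERS|DESIRES|VALUES]->()<-[]-(ns:NutritionalSupplement)
WHERE NOT (ns)-[:LIMITED_BY]-(pl)
WITH u, programs, ns, coalesce(r2.importance, 0.5) AS importance
WITH u, programs, ns, collect({name: ns.name, importance: importance}) AS nsTraits
WITH u, programs, ns, reduce(s = 0, t IN nsTraits | s + t.importance) AS nsScore order by nsScore desc limit 5
return u, programs, ns.name, nsScore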
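In Cypher, a MATCH runs once per incoming row and WITH groups by every non-aggregated variable. A plain-Python model of that behavior (not Neo4j code; the program names, supplement names and importance values are made up) shows what happens to the supplement scores in three cases:

```python
from collections import defaultdict

# Hypothetical rows left after the first block: one row per top fitness program
fp_rows = [{"fp": "Yoga", "fpScore": 2.0}, {"fp": "Running", "fpScore": 1.5}]

# Hypothetical (supplement, importance) pairs found by the second MATCH
ns_matches = [("Protein", 1.0), ("Protein", 0.5), ("Creatine", 0.5)]


def run_second_block(rows, group_by_fp):
    """Re-run the second MATCH once per incoming row and sum importance per group."""
    scores = defaultdict(float)
    for row in rows:
        for ns, importance in ns_matches:
            key = (row["fp"], ns) if group_by_fp else ns
            scores[key] += importance
    return dict(scores)


# 1. fp kept in the WITH clauses: one score per (fp, ns) pair, so supplements repeat
with_fp = run_second_block(fp_rows, group_by_fp=True)

# 2. fp dropped but one row per program still present: scores multiplied by len(fp_rows)
dropped_fp = run_second_block(fp_rows, group_by_fp=False)

# 3. programs collapsed into one list first (a single incoming row): scores are correct
collapsed_rows = [{"fp": [r["fp"] for r in fp_rows]}]
collapsed_scores = run_second_block(collapsed_rows, group_by_fp=False)

print(with_fp)           # four (fp, ns) entries
print(dropped_fp)        # Protein: 3.0, Creatine: 1.0
print(collapsed_scores)  # Protein: 1.5, Creatine: 0.5
```

Only the third case matches running the supplement block on its own, which is why the programs need to be reduced to one row before the second MATCH.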

Related

Neo4j Cypher query and index of element in the collection

I'm trying to find the index of a Decision by {decisionGroupId}, {decisionId} and {criteriaIds}.
This is my current Cypher query:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion)
WHERE c.id IN {criteriaIds}
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
ORDER BY weight DESC, totalVotes DESC
WITH COLLECT(childD) AS ps
RETURN REDUCE(ix = -1, i IN RANGE(0, SIZE(ps)-1)
| CASE ps[i].id WHEN {decisionId} THEN i ELSE ix END) AS ix
I have only 3 Decision nodes in the database, but this query returns the following indices:
2
3
4
while I expect something like this (starting from 0, and -1 if not found):
0
1
2
What is wrong with my query and how to fix it?
UPDATED
This query is working fine with COLLECT(DISTINCT childD) AS ps:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion)
WHERE c.id IN {criteriaIds}
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
ORDER BY weight DESC, totalVotes DESC
WITH COLLECT(DISTINCT childD) AS ps
RETURN REDUCE(ix = -1, i IN RANGE(0, SIZE(ps)-1)
| CASE ps[i].id WHEN {decisionId} THEN i ELSE ix END) AS ix
Please help me refactor this query and get rid of the heavy REDUCE.
Let's try to get the reduce part right with a simpler query:
WITH ['a', 'b', 'c'] AS ps
RETURN
reduce(ix = -1, i IN RANGE(0, SIZE(ps)-1) |
CASE ps[i] WHEN 'b' THEN i ELSE ix END) AS ix
As I said in the comments, it is usually better to avoid reduce when possible. To express the same thing with a list comprehension, use WHERE for filtering:
WITH ['a', 'b', 'c'] AS ps
RETURN [i IN RANGE(0, SIZE(ps)-1) WHERE ps[i] = 'b'][0]
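The list comprehension produces a list with a single element (assuming the value occurs only once), and the [0] indexer selects that element. If there is no match, the list is empty and [0] returns null rather than -1; wrap the expression in coalesce(..., -1) if you need the -1 behavior.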
After adapting this to your query, we'll get something like this:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = {decisionGroupId}
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion)
WHERE c.id IN {criteriaIds}
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
ORDER BY weight DESC, totalVotes DESC
WITH COLLECT(DISTINCT childD) AS ps
RETURN [i IN RANGE(0, SIZE(ps)-1) WHERE ps[i].id = {decisionId}][0]
If you have APOC installed, you can also use the function:
return apoc.coll.indexOf([1,2,3],2)
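The reduce and list-comprehension lookups can be compared side by side in plain Python (a model of the Cypher expressions, not Neo4j code): functools.reduce plays the role of Cypher's reduce, and the -1 fallback in the comprehension version stands in for wrapping the Cypher expression in coalesce(..., -1).

```python
from functools import reduce

ps = ["a", "b", "c"]


def index_via_reduce(items, target):
    """Like reduce(ix = -1, i IN range(...) | CASE items[i] WHEN target THEN i ELSE ix END)."""
    return reduce(lambda ix, i: i if items[i] == target else ix, range(len(items)), -1)


def index_via_comprehension(items, target):
    """Like [i IN range(...) WHERE items[i] = target][0], falling back to -1 instead of null."""
    matches = [i for i in range(len(items)) if items[i] == target]
    return matches[0] if matches else -1


print(index_via_reduce(ps, "b"), index_via_comprehension(ps, "b"))  # 1 1
print(index_via_reduce(ps, "z"), index_via_comprehension(ps, "z"))  # -1 -1
```

With duplicates, the reduce form keeps the last matching index while the comprehension takes the first: on ['a', 'b', 'b', 'c'] they return 2 and 1. That is one reason collecting DISTINCT nodes matters before computing an index.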

Invalid use of aggregating function max

match(n)-[r:LIKES]->(m) with count(n) as cnt, m where cnt = max(cnt) return m
Above query results in following error:
Invalid use of aggregating function max(...) in this context (line 1, column 61 (offset: 60))
This query should return a collection of the m nodes that have the maximum count. It only needs to perform a single MATCH operation, and should be relatively performant.
MATCH (n)-[:LIKES]->(m)
WITH m, COUNT(n) AS cnt
WITH COLLECT({m: m, cnt: cnt}) AS data, MAX(cnt) AS maxcnt
RETURN REDUCE(ms = [], d IN data | CASE WHEN d.cnt = maxcnt THEN ms + d.m ELSE ms END) AS ms;
If you're just trying to find a single node that has the most LIKES relationships leading to it, then you can add an ORDER BY and LIMIT:
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
ORDER BY cnt DESC
LIMIT 1
RETURN m
However, that query has a limitation: if more than one node has the maximum number of inbound relationships, only one of the tied nodes is returned. To get all of them, you might try something like this:
MATCH (n)-[:LIKES]->(m)
WITH m, count(n) AS cnt
WITH MAX(cnt) AS maxcnt
MATCH (o)
WHERE size((o)<-[:LIKES]-()) = maxcnt
RETURN o
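The difference between the LIMIT 1 version and the tie-preserving versions can be modeled in plain Python (not Neo4j code); the people and node names below are made-up sample data, and collections.Counter stands in for Cypher's count() grouping.

```python
from collections import Counter

# Hypothetical LIKES relationships as (liker, liked) pairs; x and y tie with 2 likes each
likes = [("ann", "x"), ("bob", "x"), ("ann", "y"), ("cat", "y"), ("bob", "z")]

# WITH m, count(n) AS cnt
counts = Counter(liked for _liker, liked in likes)

# ORDER BY cnt DESC LIMIT 1: only one of the tied nodes survives
top_one = counts.most_common(1)

# MAX(cnt) first, then keep every node whose count equals it
max_cnt = max(counts.values())
all_tied = sorted(m for m, cnt in counts.items() if cnt == max_cnt)

print(top_one)   # a single (node, count) pair
print(all_tied)  # ['x', 'y']
```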

Sum up counts in Neo4j

I have directors who have directed films. These films have genres and some actors starring. I want to find the films by a director, sorted by the sum of the number of genres of the film and the number of actors starring in it.
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(r) as gc, count(r2) as sc
RETURN n, f, gc, sc
ORDER BY gc DESC
This works but now I want to sum gc and sc and order films by the result. How does one do that?
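I think you can just add the sum you want in your RETURN statement and then order the results by it. Note that the MATCH and the OPTIONAL MATCH multiply rows (one row per genre/starring combination for each film), so count both relationships with DISTINCT; otherwise gc and sc are both inflated:
MATCH(n) -- (f:Film)
WHERE n.name = "Steven Spielberg"
MATCH (f) - [r] -> (g:Genre)
OPTIONAL MATCH (f) - [r2] -> (s:Starring)
WITH n, f, count(DISTINCT r) as gc, count(DISTINCT r2) as sc
RETURN n, f, gc, sc, gc+sc AS S
ORDER BY S DESC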
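Why plain count() overcounts here can be seen with a plain-Python model of the rows Cypher builds for a single film. The relationship names are made up, and the model ignores the null rows OPTIONAL MATCH produces for a film with no starring relationships.

```python
from itertools import product

# Hypothetical film with 2 genre relationships and 3 starring relationships
genre_rels = ["g1", "g2"]
starring_rels = ["s1", "s2", "s3"]

# MATCH (f)-[r]->(g:Genre) then OPTIONAL MATCH (f)-[r2]->(s:Starring):
# one row per (genre relationship, starring relationship) combination
rows = list(product(genre_rels, starring_rels))

gc = sum(1 for _r, _r2 in rows)             # count(r): counts rows, not genres
sc = sum(1 for _r, _r2 in rows)             # count(r2): counts rows, not actors
gc_distinct = len({r for r, _r2 in rows})   # count(DISTINCT r)
sc_distinct = len({r2 for _r, r2 in rows})  # count(DISTINCT r2)

print(gc, sc)                    # 6 6
print(gc_distinct, sc_distinct)  # 2 3
```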

Very slow this cypher query, is there any optimization?

Using Neo4J 2.1.5.
Data:
2000 persons
Goal: For each person, calculate the total of friends, friends' friends, friends' friends' friends.
The result should look like this:
Person FullName | Friends total | Friends-2 total | Friends-3 total | global total
MATCH (person:Person)
WITH person
OPTIONAL MATCH person-[:KNOWS]-(p2:Person)
WITH person, count(p2) as f1
OPTIONAL MATCH path = shortestPath(person-[:KNOWS*..2]-(f2:Person))
WHERE length(path) = 2
WITH count(nodes(path)[-1]) AS f2, person, f1
OPTIONAL MATCH path = shortestPath(person-[:KNOWS*..3]-(f3:Person))
WHERE length(path) = 3
WITH count(nodes(path)[-1]) AS f3, person, f2, f1
RETURN person._firstName + " " + person._lastName, f1, f2, f3, f1+f2+f3 AS total
The trick is to avoid miscounting in a cyclic graph; that's why I use shortestPath.
However, this query takes a long time: 60 seconds!
Any optimization possible?
[EDITED]
Does this work for you?
MATCH (person:Person)
OPTIONAL MATCH (person)-[:KNOWS]-(p1:Person)
WITH person, COALESCE(COLLECT(p1),[]) AS p1s
WITH person, CASE p1s WHEN [] THEN [NULL] ELSE p1s END AS p1s
UNWIND p1s AS p1
OPTIONAL MATCH (p1)-[:KNOWS]-(p2:Person)
WHERE NOT ((p2 = person) OR (p2 IN p1s))
WITH person, p1s, COALESCE(COLLECT(DISTINCT p2),[]) AS p2s
WITH person, p1s, CASE p2s WHEN [] THEN [NULL] ELSE p2s END AS p2s
UNWIND p2s AS p2
OPTIONAL MATCH (p2)-[:KNOWS]-(p3:Person)
WHERE NOT ((p3 = person) OR (p3 IN p1s) OR (p3 IN p2s))
WITH person,
CASE p1s WHEN [NULL] THEN 0 ELSE SIZE(p1s) END AS f1,
CASE p2s WHEN [NULL] THEN 0 ELSE SIZE(p2s) END AS f2,
COUNT(DISTINCT p3) AS f3
RETURN person._firstName + " " + person._lastName, f1, f2, f3, f1+f2+f3 AS total;
Each friend is only counted once.
Here is an explanation of some of the more obscure tactics used. The query has to replace empty p1s and p2s collections with [NULL], because UNWIND on an empty collection produces no rows and would drop that person from the rest of the query. Then, when computing the sizes of the collections, we need to give [NULL] collections a count of 0.
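Excluding everyone already found at a shorter distance, as the query does with p1s and p2s, amounts to a breadth-first search in which each person is counted at the first distance they are reached. A plain-Python sketch of that idea on a made-up graph (not Neo4j code, and not a replacement for the query):

```python
from collections import deque


def friend_layers(graph, start, max_depth=3):
    """Count people at exactly distance 1..max_depth from start, each counted once (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_depth:
            continue  # no need to expand beyond the last layer
        for friend in graph.get(person, ()):
            if friend not in dist:  # already reached at a shorter distance: skip
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return [sum(1 for d in dist.values() if d == k) for k in range(1, max_depth + 1)]


# Hypothetical undirected KNOWS graph: a cycle a-b-c-a plus a chain c-d-e
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
graph = {}
for x, y in edges:
    graph.setdefault(x, set()).add(y)
    graph.setdefault(y, set()).add(x)

f1, f2, f3 = friend_layers(graph, "a")
print(f1, f2, f3, f1 + f2 + f3)  # 2 1 1 4
```

The cycle through a, b and c doesn't inflate the counts, because b and c are marked as seen at distance 1 before anyone tries to reach them again.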

Calculating similarity index between two movies (Neo4j, Cypher)

This is a follow-up to this question:
Multiple relationships in Match Cypher
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH DISTINCT sm, h
RETURN sm, collect(h.weight)
I'm having trouble getting the distinct values of h1, h2, H and h all at the same time.
I want to calculate the similarity index between any two movies, which depends on h1, h2, h and H: (h1 · h2) / (|h| |H|).
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH sum(h1.weight*h2.weight) as num, sm, H, m, h
WITH DISTINCT m, sqrt(sum(h.weight^2)) as den1, sm, H, num
WITH DISTINCT sm, sqrt(sum(H.weight^2)) as den2, den1, num
RETURN num/(den1*den2)
This is all messed up, but I can't figure out the right way to solve it. Please help.
This works and gives the correct answer...
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag)<-[h2:HAS_TAG]-(sm)
WHERE m <> sm
WITH SUM(h1.weight * h2.weight) AS num,
SQRT(REDUCE(xDot = 0.0, a IN COLLECT(h1)| xDot + a.weight^2)) AS xLength,
SQRT(REDUCE(yDot = 0.0, b IN COLLECT(h2)| yDot + b.weight^2)) AS yLength, m, sm
RETURN num, xLength, yLength
Take a look at this example I generated using the Neo4j Console:
http://console.neo4j.org/?id=aq6cb3
The query should be:
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH m, sm,
collect(DISTINCT h) AS h,
collect(DISTINCT H) AS H,
sum(h1.weight*h2.weight) AS num
WITH m, sm, num,
sqrt(reduce(s = 0.0, x IN h | s +(x.weight^2))) AS den1,
sqrt(reduce(s = 0.0, x IN H | s +(x.weight^2))) AS den2
RETURN m.title, sm.title, (num/(den1*den2)) AS similarity
Which results in the following:
+---------------------------------------------------------------+
| m.title | sm.title | similarity |
+---------------------------------------------------------------+
| "The Matrix" | "The Matrix: Revolutions" | 3.859767091086958 |
| "The Matrix" | "The Matrix: Reloaded" | 1.4380667053087486 |
+---------------------------------------------------------------+
I used the reduce function to aggregate the relationship values from a distinct collection and perform the similarity index calculation.
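Reading (h1 · h2) / (|h| |H|) as cosine similarity (an interpretation; the question doesn't name it), the score can never exceed 1 by the Cauchy–Schwarz inequality. So values above 1 suggest the numerator is being over-counted: the four comma-separated patterns produce one row for every combination of h1/h2, h and H, and the sum runs over all of them. A plain-Python model with made-up weights (assuming at most one HAS_TAG relationship per movie/tag pair; not Neo4j code) compares the two numerators:

```python
import math

# Hypothetical HAS_TAG weights: movie -> {tag: weight}
m_tags = {"action": 0.9, "scifi": 0.8, "romance": 0.1}
sm_tags = {"action": 0.7, "scifi": 0.9, "drama": 0.4}


def dot_shared(a, b):
    """Numerator: sum of weight products over the tags both movies share, each tag once."""
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def norm(a):
    """Length of a movie's full tag-weight vector."""
    return math.sqrt(sum(w * w for w in a.values()))


def cross_product_dot(a, b):
    """Numerator as the four comma-separated patterns build it.

    Each shared tag t is repeated for every other tag h of the first movie and every
    other tag H of the second (one MATCH never binds the same relationship twice).
    """
    shared = a.keys() & b.keys()
    return sum(a[t] * b[t] for t in shared for h in a if h != t for H in b if H != t)


similarity = dot_shared(m_tags, sm_tags) / (norm(m_tags) * norm(sm_tags))
inflated = cross_product_dot(m_tags, sm_tags) / (norm(m_tags) * norm(sm_tags))
print(similarity, inflated)  # the first stays within [0, 1], the second does not
```

In this model the inflation factor is (number of tags of the first movie - 1) × (number of tags of the second movie - 1), so it grows with how heavily each movie is tagged.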
