Neo4j Cypher query and complex sorting

I have the following Cypher query:
MATCH (t:Tenant) WHERE ID(t) in {tenantIds}
OR t.isPublic
WITH COLLECT(t) as tenants
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
AND (not (parentD)-[:BELONGS_TO]-(:Tenant)
OR any(t in tenants WHERE (parentD)-[:BELONGS_TO]-(t)))
AND (not (childD)-[:BELONGS_TO]-(:Tenant)
OR any(t in tenants WHERE (childD)-[:BELONGS_TO]-(t)))
MATCH (childD)<-[:SET_FOR]-(filterValue630:Value)-[:SET_ON]->(filterCharacteristic630:Characteristic)
WHERE id(filterCharacteristic630) = 630
WITH filterValue630, childD, ru, u
WHERE (filterValue630.value <= 799621200000)
OPTIONAL MATCH (childD)<-[:SET_FOR]->(sortValue631:Value)-[:SET_ON]->(sortCharacteristic631:Characteristic)
WHERE id(sortCharacteristic631) = 631
RETURN ru, u, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD)
| {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD)
| {characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
ORDER BY sortValue631.value ASC, childD.createDate DESC
SKIP 0 LIMIT 100
Executing this query returns 15 records, each with correctly populated commentGroups, weightedCriteria and valuedCharacteristics collections.
But when I change the query to the following one (adding a sort condition by criteria weight):
MATCH (t:Tenant)
WHERE ID(t) in {tenantIds}
OR t.isPublic
WITH COLLECT(t) as tenants
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
AND (not (parentD)-[:BELONGS_TO]-(:Tenant)
OR any(t in tenants WHERE (parentD)-[:BELONGS_TO]-(t)))
AND (not (childD)-[:BELONGS_TO]-(:Tenant)
OR any(t in tenants WHERE (childD)-[:BELONGS_TO]-(t)))
MATCH (childD)<-[:SET_FOR]-(filterValue630:Value)-[:SET_ON]->(filterCharacteristic630:Characteristic)
WHERE id(filterCharacteristic630) = 630
WITH filterValue630, childD, ru, u
WHERE (filterValue630.value <= 799621200000)
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN {criteriaIds}
WITH c, childD, ru, u, (vg.avgVotesWeight * (CASE WHEN c IS NOT NULL THEN coalesce({criteriaCoefficients}[toString(id(c))], 1.0) ELSE 1.0 END)) as weight, vg.totalVotes as totalVotes
OPTIONAL MATCH (childD)<-[:SET_FOR]->(sortValue631:Value)-[:SET_ON]->(sortCharacteristic631:Characteristic)
WHERE id(sortCharacteristic631) = 631
RETURN ru, u, childD AS decision, toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes, sortValue631,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD)
| {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD)
| {characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
ORDER BY weight DESC, totalVotes ASC, sortValue631.value ASC, childD.createDate DESC
SKIP 0 LIMIT 100
The query runs without errors and returns the same result set of 15 records, but the commentGroups, weightedCriteria and valuedCharacteristics collections are populated only where weight > 0. For the rest of the records they are null.
This is not what I expect. The commentGroups, weightedCriteria and valuedCharacteristics collections should be populated for every record in the result set, as they were after the first query.
Right now I don't understand why the following part of the new Cypher query prevents these collections from being populated:
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN {criteriaIds}
WITH c, childD, ru, u, (vg.avgVotesWeight * (CASE WHEN c IS NOT NULL THEN coalesce({criteriaCoefficients}[toString(id(c))], 1.0) ELSE 1.0 END)) as weight, vg.totalVotes as totalVotes
What am I doing wrong in the new query, and how can I fix it?
UPDATED
This is the query that reproduces the issue:
MATCH (t:Tenant) WHERE ID(t) in []
OR t.isPublic
WITH COLLECT(t) as tenants
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = 60565
AND (not (parentD)-[:BELONGS_TO]-(:Tenant)
OR any(t in tenants WHERE (parentD)-[:BELONGS_TO]-(t)))
AND (not (childD)-[:BELONGS_TO]-(:Tenant)
OR any(t in tenants WHERE (childD)-[:BELONGS_TO]-(t)))
MATCH (childD)<-[:SET_FOR]-(filterValue60639:Value)-[:SET_ON]->(filterCharacteristic60639:Characteristic)
WHERE id(filterCharacteristic60639) = 60639
WITH filterValue60639, childD, ru, u
WHERE (filterValue60639.value <= 799621200000)
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [60581, 60575]
WITH childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
OPTIONAL MATCH (childD)<-[:SET_FOR]->(sortValue60640:Value)-[:SET_ON]->(sortCharacteristic60640:Characteristic)
WHERE id(sortCharacteristic60640) = 60640
RETURN ru, u, childD AS decision, toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes, sortValue60640,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD)
| {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD)
| {characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
ORDER BY weight DESC, totalVotes ASC, sortValue60640.value ASC, childD.createDate DESC
SKIP 0 LIMIT 100
For some reason
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [60581, 60575]
prevents the commentGroups, weightedCriteria and valuedCharacteristics collections from being populated for every childD that does not match this expression. How can I fix this?

Okay, this is a rather odd thing. I found something that should work, though at the moment I can't tell why it works; I only know that it involves calculating weight and totalVotes before your RETURN.
Take the first line of your RETURN and replace it with the following. It adds a WITH clause that calculates weight and totalVotes first, and then performs the RETURN:
WITH ru, u, childD, toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes, sortValue60640
RETURN ru, u, childD AS decision, weight, totalVotes, sortValue60640,
One other thing to note: you can save some unnecessary operations by performing the ORDER BY, SKIP, and LIMIT before the pattern comprehensions, so that the comprehensions are evaluated only for the rows that survive the LIMIT:
WITH ru, u, childD, toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes, sortValue60640
ORDER BY weight DESC, totalVotes ASC, sortValue60640.value ASC, childD.createDate DESC
SKIP 0 LIMIT 100
RETURN ru, u, childD AS decision, weight, totalVotes, sortValue60640,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD)
| {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD)
| {characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
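
The reordering above can be modelled outside Cypher. The following is a minimal Python sketch, not Neo4j code: aggregate_then_page and build_details are hypothetical names, the input rows stand in for the output of the OPTIONAL MATCH (with None where no VoteGroup matched), and build_details plays the role of the pattern comprehensions. It sums per decision first, then sorts and limits, and only then builds the per-row collections.

```python
from collections import defaultdict


def aggregate_then_page(rows, limit, build_details):
    """Model of: WITH ... sum(...) ORDER BY ... LIMIT, then RETURN with collections.

    rows: iterable of (decision, weight, total_votes) tuples; weight and
    total_votes are None when the optional match found nothing.
    build_details: hypothetical callable standing in for the pattern
    comprehensions evaluated in RETURN.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for decision, weight, total_votes in rows:
        entry = totals[decision]       # the group exists even when values are None
        entry[0] += weight or 0.0      # like sum(), null values are skipped
        entry[1] += total_votes or 0

    # ORDER BY weight DESC, totalVotes ASC
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    page = ranked[:limit]              # SKIP 0 LIMIT n

    # The expensive projections run only for rows that survived the limit.
    return [
        {"decision": d, "weight": w, "totalVotes": v, "details": build_details(d)}
        for d, (w, v) in page
    ]
```

Because build_details is called only for the rows that survive the limit, the expensive part of the work scales with the page size rather than with the number of matched decisions.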

Related

Neo4j Cypher count query performance optimization

I have the following Neo4j Cypher count() query:
MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = 1
MATCH (childD)-[relationshipValueRel4:HAS_VALUE_ON]-(filterCharacteristic4:Characteristic)
WHERE filterCharacteristic4.id = 4
WITH relationshipValueRel4, childD, dg
WHERE (ANY (id IN [5, 25, 106] WHERE id IN relationshipValueRel4.optionIds ))
WITH childD, dg
RETURN count(childD) as total
Right now this query is pretty slow:
Cypher version: CYPHER 3.3, planner: COST, runtime: INTERPRETED. 3380782 total db hits in 2991 ms.
This is PROFILE output:
How can I optimize the performance of this query?
P.S.
The corresponding main query is pretty fast:
PROFILE MATCH (dg:DecisionGroup)-[:CONTAINS]->(childD:Decision)
WHERE dg.id = 1
MATCH (childD)-[relationshipValueRel4:HAS_VALUE_ON]-(filterCharacteristic4:Characteristic)
WHERE filterCharacteristic4.id = 4
WITH relationshipValueRel4, childD, dg
WHERE (ANY (id IN [5, 25, 106]
WHERE id IN relationshipValueRel4.optionIds ))
WITH childD, dg
SKIP 0 LIMIT 10
WITH *
MATCH (childD)-[ru:CREATED_BY]->(u:User)
OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
RETURN ru, u, rup, up, childD AS decision,
[ (dg)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: toInt(entity.id), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (dg)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (dg)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-())
| {characteristicId: toInt(ch1.id), optionIds: v1.optionIds, valueIds: v1.valueIds, value: v1.value, available: v1.available, totalHistoryValues: v1.totalHistoryValues, totalFlags: v1.totalFlags, description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
Cypher version: CYPHER 3.3, planner: COST, runtime: INTERPRETED. 11725 total db hits in 8 ms
Please help me optimize the performance of the count() query as well.

Neo4j Cypher pattern comprehension with optional match

I have the following pattern comprehension:
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria
Now I need to add an additional optional match to this comprehension, something like this:
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(v1:Vote)-[:VOTED_ON]->(c1)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria
but it fails with a Cypher error: org.neo4j.driver.v1.exceptions.ClientException: Invalid input 'P': expected 'r/R'
Please show me how to add this optional match correctly.
UPDATED
I need something like this:
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)<-[:VOTED_FOR*0..1]-(v1:Vote)-[:VOTED_ON]->(c1)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes), userVotes: v1} ] AS weightedCriteria
In other words, I need the list of v1 to become a sublist under weightedCriteria.userVotes (SDN @QueryResult), but right now my test fails on this new query: the assertion expects 3 records, but the query returns 13.
This is a Neo4j sandbox:
https://10-0-1-12-35256.neo4jsandbox.com/browser/
Username: neo4j
Password: probe-jumps-lick
This is my old query:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE parentD.id = 1
WITH childD , parentD
ORDER BY childD.createDate DESC
SKIP 0 LIMIT 100
WITH *
MATCH (childD)-[ru:CREATED_BY]->(u:User)
OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
RETURN ru, u, rup, up, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: toInt(entity.id), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-())
| {characteristicId: toInt(ch1.id), value: v1.value, available: v1.available, totalHistoryValues: v1.totalHistoryValues, totalFlags: v1.totalFlags, description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
This is the new Cypher query:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE parentD.id = 1
WITH childD , parentD
ORDER BY childD.createDate DESC
SKIP 0 LIMIT 100
WITH *
MATCH (childD)-[ru:CREATED_BY]->(u:User)
OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
RETURN ru, u, rup, up, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: toInt(entity.id), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)<-[:VOTED_FOR*0..1]-(v1:Vote)-[:VOTED_ON]->(c1)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes), userVotes: v1} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-())
| {characteristicId: toInt(ch1.id), value: v1.value, available: v1.available, totalHistoryValues: v1.totalHistoryValues, totalFlags: v1.totalFlags, description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
I believe this query works as written and produces a JOIN between Criteria and Votes, which is why I see 13 records instead of the 3 from my previous query.
It would be great if you could show me how to return the 3 rows (with Criterion info at the root level and a nested sublist of Votes) instead of 13 separate records.
I need a solution that produces the 3 original records (as in the first query) with nested info rather than JOINs, because I project the query result into my own object model and need the sublist of Votes as a property of weightedCriteria.
Also, inside this pattern comprehension I have to filter Votes by User, something like this: (v1)-[ru:CREATED_BY]->(u:User) WHERE u.id = {userId}
Is this possible to implement?
Instead of OPTIONAL MATCH, you can use a variable-length pattern on the :VOTED_FOR relationship. A variable length from zero to one makes that hop optional, similar to an OPTIONAL MATCH:
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)<-[:VOTED_FOR*0..1]-(v1:Vote)-[:VOTED_ON]->(c1)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria
[:VOTED_FOR*0..1] makes this relationship optional in the pattern.
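
To make the target shape from the question concrete: the goal is one map per criterion with the matching votes nested inside it, rather than one row per (criterion, vote) pair. The following Python sketch models that shape only. It is not Cypher, and weighted_criteria as well as the record fields (id, avgVotesWeight, criterionId, userId) are assumptions made for illustration.

```python
def weighted_criteria(criteria, votes, user_id):
    """Return one entry per criterion, with that user's votes nested inside.

    criteria: list of dicts with hypothetical keys "id" and "avgVotesWeight".
    votes: list of dicts with hypothetical keys "id", "criterionId", "userId".
    A criterion without matching votes still appears, with an empty list.
    """
    return [
        {
            "criterionId": c["id"],
            "weight": c["avgVotesWeight"],
            # Inner comprehension: the nested sublist, filtered by user.
            "userVotes": [
                v["id"]
                for v in votes
                if v["criterionId"] == c["id"] and v["userId"] == user_id
            ],
        }
        for c in criteria
    ]
```

The outer list always has as many entries as there are criteria, no matter how many votes exist, which is the difference between nesting and joining.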

SDN4/OGM Cypher query and duplicates at Result

I have the following Cypher query:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE parentD.id = 1
OPTIONAL MATCH (childD)-[sortValue1:HAS_VALUE_ON]->(sortCharacteristic1:Characteristic)
WHERE sortCharacteristic1.id = 1
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)
OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
WITH ru, u, rup, up, childD , sortValue1
ORDER BY sortValue1.value ASC SKIP 0 LIMIT 100
RETURN ru, u, rup, up, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: toInt(entity.id), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-())
| {characteristicId: toInt(ch1.id), value: v1.value, available: v1.available, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
which correctly returns 3 Decisions:
I have introduced a new Tag node and associated it with Decision in the following manner:
(d:Decision)-[rdt:BELONGS_TO]->(t:Tag)
I have updated the first query to return each Decision with its Tags:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE parentD.id = 1
OPTIONAL MATCH (childD)-[sortValue1:HAS_VALUE_ON]->(sortCharacteristic1:Characteristic)
WHERE sortCharacteristic1.id = 1
WITH *
MATCH (childD)-[ru:CREATED_BY]->(u:User)
OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
OPTIONAL MATCH (childD)-[rdt:BELONGS_TO]->(t:Tag)
WITH ru, u, rup, up, rdt, t, childD , sortValue1
ORDER BY sortValue1.value ASC SKIP 0 LIMIT 100
RETURN ru, u, rup, up, rdt, t, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: toInt(entity.id), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD)
| {criterionId: toInt(c1.id), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-())
| {characteristicId: toInt(ch1.id), value: v1.value, available: v1.available, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
I have added 2 Tags to the Neo4j decision.
The query works fine and correctly returns the same 3 decisions, where the Neo4j decision has 2 Tags:
But I ran into the following issue. I use SDN4/OGM to convert the query Result to my model:
Result queryResult = session.query(cypherQuery.toString(), parameters);
// Each element of the Result is one row, keyed by the RETURN aliases.
for (Map<String, Object> row : queryResult) {
    Decision decision = (Decision) row.get("decision");
}
Instead of 3 Decisions, the Result contains 4:
Redis(null tags)
MongoDB(null tags)
Neo4j(tag1 and tag 2)
Neo4j(tag1 and tag 2)
As you can see, the result contains the same Neo4j decision twice.
What am I doing wrong, and how can I tell OGM/SDN 4 to place Neo4j (tag1 and tag2) in the result set only once?
By the way, I added OPTIONAL MATCH (childD)-[rdt:BELONGS_TO]->(t:Tag) in order to initialize the Tags within Decision. If this can be done in some other way, please let me know.
UPDATED
I have created a Neo4j Sandbox to test the mentioned queries:
http://52.87.220.140:33853/browser/
Username: neo4j
Password: idea-chocks-payroll
I have debugged the OGM/Neo4j internals. The following data comes with org.neo4j.graphdb.Result:
(scala.collection.convert.Wrappers$MapWrapper<A,B>) {rup=null, commentGroups=[], up=null, t=Node[6677], u=Node[6667], decision=Node[6678], weightedCriteria=[], ru=(6678)-[CREATED_BY,22875]->(6667), valuedCharacteristics=[], rdt=(6678)-[BELONGS_TO,22876]->(6677)}
(scala.collection.convert.Wrappers$MapWrapper<A,B>) {rup=null, commentGroups=[], up=null, t=Node[6676], u=Node[6667], decision=Node[6678], weightedCriteria=[], ru=(6678)-[CREATED_BY,22875]->(6667), valuedCharacteristics=[], rdt=(6678)-[BELONGS_TO,22877]->(6676)}
(scala.collection.convert.Wrappers$MapWrapper<A,B>) {rup=null, commentGroups=[], up=null, t=null, u=Node[6667], decision=Node[6684], weightedCriteria=[], ru=(6684)-[CREATED_BY,22895]->(6667), valuedCharacteristics=[{totalHistoryValues=0, description=null, valueType=INTEGER, characteristicId=1, available=null, visualMode=INTEGERRANGESLIDER, value=25}], rdt=null}
(scala.collection.convert.Wrappers$MapWrapper<A,B>) {rup=null, commentGroups=[], up=null, t=null, u=Node[6667], decision=Node[6681], weightedCriteria=[], ru=(6681)-[CREATED_BY,22886]->(6667), valuedCharacteristics=[{totalHistoryValues=0, description=Integer value, valueType=INTEGER, characteristicId=1, available=true, visualMode=INTEGERRANGESLIDER, value=10}], rdt=null}
There are 4 objects (rows) in the org.neo4j.graphdb.Result model.
Do I need to handle duplicates in my application code, or is it possible to change the Cypher query or SDN4/OGM to prevent duplicates in the Result? Ideally I need a single row with decision=Node[6678] and 2 Tags inside (t=Node[6676] and t=Node[6677]) instead of 2 separate rows.
For example, is it possible to change the following Cypher RETURN statement so that the tags (t) are placed inside decision, instead of having both at the same level: RETURN ru, u, rup, up, rdt, t, childD AS decision?
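
The desired shape, one row per decision with its tags nested, is what an aggregation over the tag column produces (in Cypher, collect() is the aggregating function for this, and it skips null values). The Python sketch below only models that grouping on the four rows from the debug output above. collapse_by_decision is a hypothetical helper, not part of SDN/OGM, and the rows are reduced to the decision and t columns.

```python
def collapse_by_decision(rows):
    """Merge rows that share a decision, gathering their tags into a list.

    rows: list of dicts with keys "decision" and "t" (None when the
    optional tag match found nothing). Output order follows first appearance.
    """
    merged = {}
    for row in rows:
        entry = merged.setdefault(
            row["decision"], {"decision": row["decision"], "tags": []}
        )
        if row["t"] is not None:   # nulls are not collected
            entry["tags"].append(row["t"])
    return list(merged.values())


# Node ids taken from the debug output in the question.
rows = [
    {"decision": 6678, "t": 6677},
    {"decision": 6678, "t": 6676},
    {"decision": 6684, "t": None},
    {"decision": 6681, "t": None},
]
collapsed = collapse_by_decision(rows)
```

Here collapsed holds three entries, and the entry for decision 6678 carries both tag ids. The same grouping could also be done in application code while iterating over the Result, if changing the query is not an option.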

Neo4j Cypher query structure and performance optimization

I have created a dynamic Cypher query builder. For complex cases this builder produces quite big queries, for example:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
MATCH (childD)<-[:SET_FOR]-(filterValue415431:Value)-[:SET_ON]->(filterCharacteristic415431:Characteristic)
WHERE id(filterCharacteristic415431) = 415431
WITH filterValue415431, childD, ru, u
WHERE ({filterValue4154311} IN filterValue415431.value )
OR ({filterValue4154312} IN filterValue415431.value )
OR ({filterValue4154313} IN filterValue415431.value )
OR ({filterValue4154314} IN filterValue415431.value )
OR ({filterValue4154315} IN filterValue415431.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415441:Value)-[:SET_ON]->(filterCharacteristic415441:Characteristic)
WHERE id(filterCharacteristic415441) = 415441
WITH filterValue415441, childD, ru, u
WHERE ({filterValue4154416} IN filterValue415441.value )
OR ({filterValue4154417} IN filterValue415441.value )
OR ({filterValue4154418} IN filterValue415441.value )
OR ({filterValue4154419} IN filterValue415441.value )
OR ({filterValue41544110} IN filterValue415441.value )
OR ({filterValue41544111} IN filterValue415441.value )
OR ({filterValue41544112} IN filterValue415441.value )
OR ({filterValue41544113} IN filterValue415441.value )
OR ({filterValue41544114} IN filterValue415441.value )
OR ({filterValue41544115} IN filterValue415441.value )
OR ({filterValue41544116} IN filterValue415441.value )
OR ({filterValue41544117} IN filterValue415441.value )
MATCH (childD)<-[:SET_FOR]-(filterValue416273:Value)-[:SET_ON]->(filterCharacteristic416273:Characteristic)
WHERE id(filterCharacteristic416273) = 416273
WITH filterValue416273, childD, ru, u
WHERE (filterValue416273.value >= {filterValue41627318})
AND (filterValue416273.value <= {filterValue41627319})
MATCH (childD)<-[:SET_FOR]-(filterValue417410:Value)-[:SET_ON]->(filterCharacteristic417410:Characteristic)
WHERE id(filterCharacteristic417410) = 417410
WITH filterValue417410, childD, ru, u
MATCH (childD)<-[:SET_FOR]-(filterValue416423:Value)-[:SET_ON]->(filterCharacteristic416423:Characteristic)
WHERE id(filterCharacteristic416423) = 416423
WITH filterValue416423, childD, ru, u
WHERE ({filterValue41642320} IN filterValue416423.value )
OR ({filterValue41642321} IN filterValue416423.value )
OR ({filterValue41642322} IN filterValue416423.value )
OR ({filterValue41642323} IN filterValue416423.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415673:Value)-[:SET_ON]->(filterCharacteristic415673:Characteristic)
WHERE id(filterCharacteristic415673) = 415673
WITH filterValue415673, childD, ru, u
WHERE ({filterValue41567324} IN filterValue415673.value )
OR ({filterValue41567325} IN filterValue415673.value )
OR ({filterValue41567326} IN filterValue415673.value )
OR ({filterValue41567327} IN filterValue415673.value )
OR ({filterValue41567328} IN filterValue415673.value )
OR ({filterValue41567329} IN filterValue415673.value )
OR ({filterValue41567330} IN filterValue415673.value )
OR ({filterValue41567331} IN filterValue415673.value )
OR ({filterValue41567332} IN filterValue415673.value )
OR ({filterValue41567333} IN filterValue415673.value )
OR ({filterValue41567334} IN filterValue415673.value )
OR ({filterValue41567335} IN filterValue415673.value )
OR ({filterValue41567336} IN filterValue415673.value )
OR ({filterValue41567337} IN filterValue415673.value )
OR ({filterValue41567338} IN filterValue415673.value )
OR ({filterValue41567339} IN filterValue415673.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN {criteriaIds}
WITH childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD) |
{characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
Right now I'm not very happy with the performance. For example, a call to this query takes ~500 ms.
Could you please take a look and tell me whether there is a chance to improve this query?
UPDATED
This is pretty much the same query but with different parameters:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423:Value)-[:SET_ON]->(filterCharacteristic416423:Characteristic)
WHERE id(filterCharacteristic416423) = 416423
WITH filterValue416423, childD, ru, u
WHERE ('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value )
MATCH (childD)<-[:SET_FOR]-(filterValue416273:Value)-[:SET_ON]->(filterCharacteristic416273:Characteristic)
WHERE id(filterCharacteristic416273) = 416273 WITH filterValue416273, childD, ru, u
WHERE (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
MATCH (childD)<-[:SET_FOR]-(filterValue415431:Value)-[:SET_ON]->(filterCharacteristic415431:Characteristic)
WHERE id(filterCharacteristic415431) = 415431 WITH filterValue415431, childD, ru, u
WHERE ('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415441:Value)-[:SET_ON]->(filterCharacteristic415441:Characteristic)
WHERE id(filterCharacteristic415441) = 415441 WITH filterValue415441, childD, ru, u
WHERE ('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD) |
{characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 646192 total db hits in 390 ms.
UPDATED
This is the output of :schema
Indexes
ON :Characteristic(lowerName) ONLINE
ON :CharacteristicGroup(lowerName) ONLINE
ON :Criterion(lowerName) ONLINE
ON :CriterionGroup(lowerName) ONLINE
ON :Decision(lowerName) ONLINE
ON :FlagType(name) ONLINE (for uniqueness constraint)
ON :HistoryValue(originalValue) ONLINE
ON :Permission(code) ONLINE (for uniqueness constraint)
ON :Role(name) ONLINE (for uniqueness constraint)
ON :User(email) ONLINE (for uniqueness constraint)
ON :User(username) ONLINE (for uniqueness constraint)
ON :Value(value) ONLINE
Constraints
ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
ON ( role:Role ) ASSERT role.name IS UNIQUE
ON ( user:User ) ASSERT user.email IS UNIQUE
ON ( user:User ) ASSERT user.username IS UNIQUE
UPDATED
I have optimized the query as suggested in the answer below:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423)-[:SET_ON]->(filterCharacteristic416423)
WHERE id(filterCharacteristic416423) = 416423
WITH DISTINCT filterValue416423, childD
WHERE ('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value )
MATCH (childD)<-[:SET_FOR]-(filterValue416273)-[:SET_ON]->(filterCharacteristic416273)
WHERE id(filterCharacteristic416273) = 416273
WITH DISTINCT childD, filterValue416273
WHERE (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
MATCH (childD)<-[:SET_FOR]-(filterValue415431)-[:SET_ON]->(filterCharacteristic415431)
WHERE id(filterCharacteristic415431) = 415431
WITH DISTINCT childD, filterValue415431
WHERE ('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415441)-[:SET_ON]->(filterCharacteristic415441)
WHERE id(filterCharacteristic415441) = 415441
WITH DISTINCT childD, filterValue415441
WHERE ('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH DISTINCT * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH DISTINCT childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH DISTINCT ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1)-[:SET_FOR]->(childD) |
{characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
PROFILE output:
With DISTINCT childD the query is pretty slow; without it, it is much better but still far from perfect.
One more try:
PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423)-[:SET_ON]->(filterCharacteristic416423)
USING JOIN ON childD
WHERE id(filterCharacteristic416423) = 416423
AND ('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value )
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue416273)-[:SET_ON]->(filterCharacteristic416273)
USING JOIN ON childD
WHERE id(filterCharacteristic416273) = 416273 AND (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415431)-[:SET_ON]->(filterCharacteristic415431)
USING JOIN ON childD
WHERE id(filterCharacteristic415431) = 415431
AND ('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value )
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415441)-[:SET_ON]->(filterCharacteristic415441)
USING JOIN ON childD
WHERE id(filterCharacteristic415441) = 415441
AND ('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH DISTINCT * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH DISTINCT childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH DISTINCT ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN childD
The main problem with your query is that you are doing a lot of checks while the number of rows runs wild. Here are some tips to reduce how many rows you generate at each MATCH.
1) Unless you NEED duplicates, use WITH DISTINCT instead of plain WITH. WITH can leave duplicate rows behind (you only cut off a column, so rows that differed only in that column become identical), and every duplicate row you process is wasted time and extra DB hits. In particular, every filter column you drop can add duplicate rows.
2) :Value.value is overloaded. It has no semantic meaning, and the value isn't even guaranteed to be of any particular type. That means every :Value check has to go out and touch a bunch of :Value nodes that have nothing to do with what you're searching for, so the more :Value nodes are attached, the more expensive it becomes to find the right one. This would be cheaper if the value could be indexed, so that Cypher could find the right :Value directly and see what it is connected to. That doesn't help if you can't change the schema you're working with (by schema, I mean how your data and relationships are set up).
3) Only check what you need to check. It might seem more efficient to say (a:A)-[:TO]->(b:B), but if all [:TO] are from :A to :B, Neo4j now has to verify that the first node is an :A and the second node is a :B. Cypher doesn't know what is implicitly true, so it has to do the check, but each of these redundant checks has to go out and hit the DB for every row. So it is better to say (a)-[:TO]->(b).
4) Limit variable scope. Here, you match -[ru:CREATED_BY]->(u:User) at the beginning but then don't use it until the end, and never filter on it. This multiplies the number of rows by the number of -[ru:CREATED_BY]->(u:User) matches on each decision, and ALL of those rows have to be checked in the later matches. Unless -[ru:CREATED_BY]->(u:User) somehow greatly limits the matched decisions (or there can only be one per decision), match this supporting information at the end.
5) Order your filters from strongest to weakest (if you can), to cut as many rows as early as possible.
6) Tricks to minimize rows. Every row pulled up makes the following steps of the query work that much harder, so minimize rows in queries. If you are using OR to combine unrelated but similar queries (like all orgs with condition A or orgs with condition B), and the work of each half just makes things more expensive for the other half, it might be better to use UNION to combine the results of smaller, faster queries (and UNION can run in parallel up to the point where the results are merged). Note that simple queries like WHERE org.id in [1,2,3] are still faster than UNION, since the work can all be done in one lookup.
Aside from UNION, if you are carrying nodes that you don't filter on, you can use collect(column) to reduce the 'duplicates' down to 1 row, and then UNWIND column AS column at the end of the query to get your rows back (column here refers to the variable name).
7) Doing a lot of filters on one node? Cypher has USING hints for that. The hint USING JOIN ON column tells Cypher that it will probably be more efficient to do this match from more starting leaves and join them. So adding USING JOIN ON childD to each match tells Cypher to do all the filters in parallel and keep the overlapping rows of all of them. Note that a USING hint is just you telling Cypher "trust me, this should go faster if we try doing this", which can actually make the query worse if you are wrong. USING JOIN should still be useful for making large queries more parallel.
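As a rough illustration of tips 1 and 4 together, here is a minimal sketch that filters first, keeps only childD in scope, and matches the creator at the very end. The parameters {characteristicId} and {maxValue} are placeholders invented for this example; the labels and relationship types are taken from the question.

```cypher
// Filter first: only childD stays in scope, so later steps see one row per decision.
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = {decisionId}
MATCH (childD)<-[:SET_FOR]-(filterValue)-[:SET_ON]->(filterCharacteristic)
WHERE id(filterCharacteristic) = {characteristicId}   // hypothetical parameter
  AND filterValue.value <= {maxValue}                 // hypothetical parameter
WITH DISTINCT childD
// Only now fetch the supporting data that is returned but never filtered on.
MATCH (childD)-[ru:CREATED_BY]->(u:User)
RETURN ru, u, childD AS decision
```

Whether this actually reduces DB hits depends on the data, so it is worth comparing the PROFILE output of both orderings.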
UPDATE:
First, a note on node.id = "constant" AND node.value = "constant" OR node.id = "constant2" AND node.value = "constant2" vs node.value = map[node.id]. The first query can filter nodes during the node lookup, while the latter has to filter through all of the nodes that were already looked up. Without previous filtering on that lookup, the map form has to pull in all nodes. While the map offers some (arguable) simplicity and flexibility, it is one of the least efficient ways to filter nodes.
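To make the two forms concrete, here is a small sketch of each, written as two separate statements. It assumes that id is an ordinary string property on :Value nodes (not the internal node ID) and uses made-up constants; neither assumption comes from the question.

```cypher
// Form 1: constants spelled out in the predicate.
MATCH (node:Value)
WHERE (node.id = 'color' AND node.value = 'red')
   OR (node.id = 'size' AND node.value = 'large')
RETURN node;

// Form 2: the same condition expressed as a map lookup keyed by node.id.
// A key that is missing from the map yields null, so that node is filtered out.
WITH {color: 'red', size: 'large'} AS expected
MATCH (node:Value)
WHERE node.value = expected[node.id]
RETURN node;
```

Both statements should return the same nodes under these assumptions; running each with PROFILE shows how differently they are planned.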
Second, the big problem with your query now is that :Value is super overloaded, and you aren't finding it by ID. :Value should be a relationship, or have an indexed ID field, so that you don't have to touch ALL <-[:SET_FOR]- and -[:SET_ON]-> relationships. I think the join hint will at least give SET_FOR higher priority, which appears to be the more efficient of the two.
Here is my attempt to rewrite the PROFILE query more efficiently. (v1)
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423)-[:SET_ON]->(filterCharacteristic416423)
USING JOIN ON childD
WHERE id(filterCharacteristic416423) = 416423
AND (('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value ))
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue416273)-[:SET_ON]->(filterCharacteristic416273)
USING JOIN ON childD
WHERE id(filterCharacteristic416273) = 416273 AND (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415431)-[:SET_ON]->(filterCharacteristic415431)
USING JOIN ON childD
WHERE id(filterCharacteristic415431) = 415431
AND (('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value ))
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415441)-[:SET_ON]->(filterCharacteristic415441)
USING JOIN ON childD
WHERE id(filterCharacteristic415441) = 415441
AND (('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value ))
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH DISTINCT * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH DISTINCT childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH DISTINCT ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1)-[:SET_FOR]->(childD) |
{characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics

Neo4j Cypher query sort order

In my Neo4j/SDN project I have the following model:
A Decision entity contains child Decision and Characteristic entities.
Each pair of child Decision and Characteristic can have a Value node assigned.
I have created 3 child Decision nodes, for example
childDecision1
childDecision2
childDecision3
and one Characteristic:
characteristic1
I have assigned the following values to the following pairs:
childDecision2 + characteristic1 = Value(Integer 10)
childDecision3 + characteristic1 = Value(Integer 25)
I'm executing the following Cypher query (with ORDER BY sortValue88.value ASC):
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
OPTIONAL MATCH (childD)<-[:SET_FOR]->(sortValue88:Value)-[:SET_ON]->(sortCharacteristic88:Characteristic)
WHERE id(sortCharacteristic88) = 88
WITH ru, u, childD , sortValue88
ORDER BY sortValue88.value ASC SKIP 0 LIMIT 100
RETURN ru, u, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD)
| {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD)
| {characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
As the result I have:
childDecision2 (Value = 10)
childDecision3 (Value = 25)
childDecision1 (no value provided)
So far everything works fine.
Now I change the sort direction from ASC to DESC:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
OPTIONAL MATCH (childD)<-[:SET_FOR]->(sortValue88:Value)-[:SET_ON]->(sortCharacteristic88:Characteristic)
WHERE id(sortCharacteristic88) = 88
WITH ru, u, childD , sortValue88
ORDER BY sortValue88.value DESC SKIP 0 LIMIT 100
RETURN ru, u, childD AS decision,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD)
| {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD)
| {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD)
| {characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
As the result I have:
childDecision1 (no value provided)
childDecision3 (Value = 25)
childDecision2 (Value = 10)
I don't understand why childDecision1 takes the first place; I expected childDecision3 there instead.
Could you please help explain or fix this behavior?
That is because, when sorting the result set, null will always come at the end of the result set for ascending sorting, and first when doing descending sort.
So you need to know the minimum possible value and substitute something smaller for null when sorting. For example, if no value is less than zero, replace null with -1:
WITH [1, 0, 2, NULL, 4] AS CS
UNWIND RANGE(0, size(CS)-1) as i
RETURN i,
CASE WHEN CS[i] IS NULL THEN -1 ELSE CS[i] END AS sortValue
ORDER BY sortValue DESC
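Applied to the query from the question, the same idea can be written with coalesce(), which returns its first non-null argument. This is only a sketch: it assumes, as above, that no stored value is below zero, it writes SET_FOR with a single direction as in the list comprehensions of the question, and it returns a reduced set of columns for brevity.

```cypher
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
OPTIONAL MATCH (childD)<-[:SET_FOR]-(sortValue88:Value)-[:SET_ON]->(sortCharacteristic88:Characteristic)
WHERE id(sortCharacteristic88) = 88
WITH ru, u, childD, sortValue88
// Decisions without a value get -1 as sort key (assumed to be below every real value),
// so they end up last when sorting in descending order.
ORDER BY coalesce(sortValue88.value, -1) DESC
SKIP 0 LIMIT 100
RETURN ru, u, childD AS decision, sortValue88.value AS value
```

For the sample data in the question this should give childDecision3, childDecision2, childDecision1.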

Resources