Neo4j Cypher query does many DB hits - neo4j

I have the following query:
MATCH (dg:DecisionGroup {id: -3})-[rdgd:CONTAINS]->(childD:Vacancy ) -[:REQUIRES]->(ceNode:Requirable)
WHERE ceNode.id in [2, 4, 8, 9]
WITH childD , collect(ceNode) as ceNodes with childD,
apoc.coll.toSet(reduce(ceNodeLabels = [], n IN ceNodes | ceNodeLabels + labels(n))) as ceNodeLabels WHERE all(x IN ['Employment', 'Location'] WHERE x IN ceNodeLabels)
WITH childD
WHERE ( (childD.`hourlyRateUsd` >= 35) OR (childD.`salaryUsd` >= 5000) ) AND (childD.`active` = true)
WITH childD MATCH (childD)-[:CONTAINS]->(childDStat:JobableStatistic)
MATCH (childD)-[:HAS_VOTE_ON]->(vc:Criterion)
WHERE vc.id IN [64, 65, 67, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 63]
WITH childD, childDStat, collect(DISTINCT vc.id) as vacancyCriterionIds
WHERE ALL(id IN childDStat.detailedCriterionIds WHERE id IN [64, 65, 67, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 63])
UNWIND childDStat.detailedCriterionIds AS mCId
WITH childD, childDStat, mCId
WHERE (childDStat['criterionAvgVoteWeights.' + mCId] = 0 OR childDStat['criterionAvgVoteWeights.' + mCId] <= {`80`:1.4, `84`:2.8, `72`:3.0, `83`:1.4, `82`:1.4, `71`:5.0, `81`:4.2, `77`:0.0, `76`:5.0, `65`:2.0, `64`:4.0, `75`:3.0, `74`:4.0, `85`:2.8, `63`:4.0, `79`:5.0, `68`:0.0, `78`:2.8, `67`:1.0}[toString(mCId)]) AND (childDStat['criterionExperienceMonths.' + mCId] = 0 OR childDStat['criterionExperienceMonths.' + mCId] <= {`80`:0, `84`:0, `72`:48, `83`:0, `82`:0, `71`:36, `81`:0, `77`:0, `76`:0, `65`:7, `64`:96, `75`:36, `74`:72, `85`:0, `63`:60, `79`:0, `68`:0, `78`:0, `67`:4}[toString(mCId)])
WITH childD, collect(mCId) as mCIds, childDStat WHERE size(mCIds) >= size(childDStat.detailedCriterionIds)
WITH childD, childDStat
UNWIND childDStat.criterionIds AS cId
WITH childD, sum(childDStat['criterionCoefficients.' + cId] * {`80`:1.4, `84`:2.8, `72`:3.0, `83`:1.4, `82`:1.4, `71`:5.0, `81`:4.2, `77`:0.0, `76`:5.0, `65`:2.0, `64`:4.0, `75`:3.0, `74`:4.0, `85`:2.8, `63`:4.0, `79`:5.0, `68`:0.0, `78`:2.8, `67`:1.0}[toString(cId)]) as weight, sum({`80`:1, `84`:1, `72`:1, `83`:1, `82`:1, `71`:1, `81`:1, `77`:1, `76`:1, `65`:1, `64`:1, `75`:1, `74`:1, `85`:1, `63`:1, `79`:1, `68`:1, `78`:1, `67`:1}[toString(cId)]) as totalVotes, sum(childDStat['criterionCoefficients.' + cId]) as criterionCoefficientSum WITH childD, weight, totalVotes, criterionCoefficientSum
MATCH (dg:DecisionGroup {id: -3})-[rdgd:CONTAINS]->(childD)
OPTIONAL MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH childD, dg, rdgd, u, ru, weight, totalVotes , criterionCoefficientSum
ORDER BY weight / criterionCoefficientSum DESC, childD.createdAt DESC
SKIP 0 LIMIT 10
OPTIONAL MATCH (jobable:Decision:Profile {id: 35463})
RETURN childD AS decision, dg, rdgd, u, ru, weight, totalVotes,
[ (jobable)-[vg1:HAS_VOTE_ON]->(c1:Criterion)<-[:HAS_VOTE_ON]-(childD) | {criterion: c1, relationship: vg1} ] AS jobableWeightedCriteria ,
[(jobable)-[:HAS_VOTE_ON]->(c1:Criterion)<-[vg1:HAS_VOTE_ON]-(childD) | {criterion: c1, relationship: vg1} ] AS weightedCriteria ,
[ (childD)-[:REQUIRES]->(ce:CompositeEntity) | {entity: ce} ] AS decisionCompositeEntities,
[ (childD)-[:REQUIRES]->(ce:CompositeEntity)-[:CONTAINS]->(trans:Translation) WHERE trans.iso6391 = 'uk' | {entityId: toInteger(id(ce)), translation: trans} ] AS decisionCompositeEntitiesTranslations,
[ (childD)-[:CONTAINS]->(trans:Translation) WHERE trans.iso6391 = 'uk' | {entityId: toInteger(childD.id), translation: trans} ] AS decisionTranslations
I pass every single value to the query as parameters. This is just the output for debugging.
Most of the query works fine except the part at the beginning:
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 781495 total db hits in 577 ms.
Please advise how to optimize/refactor this part of the query in order to reduce DB hits. Thanks!

Related

Neo4 cypher: Check if node with relationships to a list of node IDs exists

I have the following node structure: (:Patch)-[:INCLUDES]->(:Roster)-[:HAS]->(:PublicList)-[:INCLUDES]->(u:Unit)
Then I have an array of :Unit ids: [197, 196, 19, 20, 191, 171, 3, 174, 194, 185]
I would like to check whether a :PublicList that has the :INCLUDES relationship to all the :Unit ids in the list already exists.
I tried writing a COUNT and MATCH query like this, but it seems like an error-prone, long-winded approach:
MATCH (p:Patch)-[:INCLUDES]->(r:Roster)-[:HAS]-(d:PublicList)
WITH COLLECT(d) as drafts
UNWIND drafts as draft
WITH draft
UNWIND [197, 196, 19, 20, 191, 171, 3, 174, 194, 185] as unitID
MATCH (draft)-[:INCLUDES]->(u:Unit)
WHERE id(u) = unitID
WITH count(DISTINCT u) as draftUnits
WITH COLLECT(draftUnits) as matchCounts
RETURN matchCounts
Can someone help me write this so it returns a boolean indicating whether a :PublicList has an :INCLUDES relationship to all the IDs in the list?
I suggest first matching the units, collecting them into a list, and then using the ALL predicate to check that the PublicList has a connection to all units.
MATCH (n:Unit) WHERE id(n) IN [197, 196, 19, 20, 191, 171, 3, 174, 194, 185]
WITH collect(n) AS units
MATCH (p:Patch)-[:INCLUDES]->(r:Roster)-[:HAS]-(d:PublicList)
WHERE ALL(x IN units WHERE (d)-[:INCLUDES]->(x))
RETURN count(*) AS matchCount
If you want to return each PublicList along with a boolean telling whether it matches all of the units, you can adjust the query slightly like this:
MATCH (n:Unit) WHERE id(n) IN [197, 196, 19, 20, 191, 171, 3, 174, 194, 185]
WITH collect(n) AS units
MATCH (p:Patch)-[:INCLUDES]->(r:Roster)-[:HAS]-(d:PublicList)
RETURN d, ALL(x IN units WHERE (d)-[:INCLUDES]->(x)) as matchAll
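The ALL predicate in the queries above amounts to a containment check: a PublicList matches only when every requested unit is among the units it includes. A minimal Python sketch of the same logic follows. The `includes` mapping, the list names and their contents are illustrative assumptions, not data from the question.

```python
# Hypothetical in-memory stand-in for (d:PublicList)-[:INCLUDES]->(u:Unit).
# Keys are made-up PublicList names, values are the unit ids each list includes.
includes = {
    "draft_a": {197, 196, 19, 20, 191, 171, 3, 174, 194, 185},
    "draft_b": {197, 196, 19},
    "draft_c": {197, 196, 19, 20, 191, 171, 3, 174, 194, 185, 500},
}

wanted = [197, 196, 19, 20, 191, 171, 3, 174, 194, 185]


def match_all(unit_ids, wanted_ids):
    """Mirror ALL(x IN units WHERE (d)-[:INCLUDES]->(x))."""
    return all(x in unit_ids for x in wanted_ids)


# One boolean per list, like RETURN d, ALL(...) AS matchAll.
result = {name: match_all(units, wanted) for name, units in includes.items()}

# A single boolean: does any list include every wanted unit?
exists = any(result.values())
```

Note that a list with extra units (`draft_c` here) still matches, because the predicate only requires that the wanted units are all present.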
Your query looks good but can be improved. Just to fix it, you need to use
u.id = unitID
instead of
WHERE id(u) = unitID
The latter calls the internal id() function, which returns the identifier Neo4j assigns to each node in the database, while the former reads an ordinary property named id.

Neo4j Cypher query execution plan optimization

I have the following Cypher query:
MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile )
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment )
WHERE root2.id IN ([1]) WITH DISTINCT childD, dg, rdgd
MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3:Location)
WHERE (descendant3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]) OR root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]))
UNWIND nodes(path3) AS pathNode3 WITH childD, dg, rdgd, COLLECT(DISTINCT pathNode3) AS pathNodes3
MATCH (childD)-[:LOCATED_IN]->(pathNode3) WHERE pathNode3 IN pathNodes3 WITH DISTINCT childD, dg, rdgd WHERE (childD.`active` = true) AND (childD.`experienceMonths` >= 129) AND ( (childD.`minSalaryUsd` <= 8883) OR (childD.`minHourlyRateUsd` <= 126) )
MATCH (childD)-[criterionRelationship8:HAS_VOTE_ON]->(c:Criterion {id: 2}) WHERE (criterionRelationship8.`properties.experienceMonths` >= 1) WITH DISTINCT childD, dg, rdgd
MATCH (childD)-[criterionRelationship10:HAS_VOTE_ON]->(c:Criterion {id: 36}) WHERE (criterionRelationship10.`avgVotesWeight` >= 1.0) AND (criterionRelationship10.`properties.experienceMonths` >= 1) WITH DISTINCT childD, dg, rdgd
MATCH (childD)-[criterionRelationship13:HAS_VOTE_ON]->(c:Criterion {id: 4}) WHERE (criterionRelationship13.`properties.experienceMonths` >= 0) WITH DISTINCT childD, dg, rdgd
MATCH (childD)-[criterionRelationship15:HAS_VOTE_ON]->(c:Criterion {id: 22}) WHERE (criterionRelationship15.`avgVotesWeight` >= 1.0) AND (criterionRelationship15.`properties.experienceMonths` >= 1) WITH DISTINCT childD, dg, rdgd
OPTIONAL MATCH (childD)-[ru:CREATED_BY]->(u:User) WITH childD, u, ru, dg, rdgd
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion) WHERE c.id IN [2, 36, 4, 22] WITH c, childD, u, ru, dg, rdgd, (vg.avgVotesWeight * (CASE WHEN c IS NOT NULL THEN coalesce({`22`:1.2236918603185925, `2`:2.9245935245152226, `36`:0.2288013749943646, `4`:3.9599506966378435}[toString(c.id)], 1.0) ELSE 1.0 END)) as weight, vg.totalVotes as totalVotes
WITH childD, u, ru , dg, rdgd , toFloat(sum(weight)) as weight, toInteger(sum(totalVotes)) as totalVotes
ORDER BY weight DESC , childD.createdAt DESC
SKIP 0 LIMIT 20
WITH * OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
RETURN rdgd, ru, u, rup, up, childD AS decision, weight, totalVotes, [ (c1)<-[vg1:HAS_VOTE_ON]-(childD) WHERE c1.id IN [2, 36, 4, 22] | {criterion: c1, relationship: vg1} ] AS weightedCriteria
This query is automatically generated by my Cypher query builder. Right now, with 1000 Profiles, the query takes about 8 seconds to execute.
Looks like this part of the query causes most of the issues:
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment )
WHERE root2.id IN ([1]) WITH DISTINCT childD, dg, rdgd
MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3:Location)
WHERE (descendant3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]) OR root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]))
UNWIND nodes(path3) AS pathNode3 WITH childD, dg, rdgd, COLLECT(DISTINCT pathNode3) AS pathNodes3
MATCH (childD)-[:LOCATED_IN]->(pathNode3) WHERE pathNode3 IN pathNodes3 WITH DISTINCT childD, dg, rdgd
Is there a way to optimize this?
This is PROFILE output:
UPDATED
I reimplemented the initial part of the query as follows:
WITH [] as ceNodeList MATCH (root2:Employment )
WHERE root2.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
WITH ceNodeList, root2, COLLECT(root2) AS listRoot2
WITH apoc.coll.unionAll(ceNodeList, listRoot2) AS ceNodeList
WITH apoc.coll.toSet(ceNodeList) as ceNodeList
MATCH (root3:Location )
WHERE root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73])
WITH ceNodeList, root3, COLLECT(root3) AS listRoot3
OPTIONAL MATCH (root3)-[:CONTAINS*0..]->(descendant3:Location)
OPTIONAL MATCH (ascendant3:Location)-[:CONTAINS*0..]->(root3)
WITH ceNodeList, listRoot3, COLLECT( DISTINCT ascendant3) AS listAscendant3, COLLECT( DISTINCT descendant3) AS listDescendant3
WITH listRoot3, listAscendant3, apoc.coll.unionAll(ceNodeList, apoc.coll.unionAll(listDescendant3, apoc.coll.unionAll(listRoot3, listAscendant3))) AS ceNodeList
WITH apoc.coll.toSet(ceNodeList) as ceNodeList
UNWIND ceNodeList AS ceNode
WITH DISTINCT ceNode MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile ) -[:REQUIRES]->(ceNode)
WITH DISTINCT childD, dg, rdgd, collect(ceNode) as ceNodes
WITH childD, dg, rdgd, ceNodes, reduce(ceNodeLabels = [], n IN ceNodes | ceNodeLabels + labels(n)) as ceNodeLabels
WHERE all(x IN ['Employment', 'Location']
WHERE x IN ceNodeLabels) WITH childD, dg, rdgd return count(childD)
Now it works several times faster, but still not perfect. Is there something I may do in order to improve this?
UPDATED1
WITH [] as ceNodeList
MATCH (root2:Location )
WHERE root2.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100])
WITH ceNodeList, root2
OPTIONAL MATCH (root2)-[:CONTAINS*0..]->(descendant2:Location)
OPTIONAL MATCH (ascendant2:Location)-[:CONTAINS*0..]->(root2)
WITH ceNodeList, COLLECT(root2) AS listRoot2, COLLECT( DISTINCT ascendant2) AS listAscendant2, COLLECT( DISTINCT descendant2) AS listDescendant2
WITH apoc.coll.union(ceNodeList, apoc.coll.union(listDescendant2, apoc.coll.union(listRoot2, listAscendant2))) AS ceNodeList
WITH ceNodeList MATCH (root3:Employment )
WHERE root3.id IN ([101, 102, 103, 104, 105])
WITH ceNodeList, COLLECT(root3) AS listRoot3
WITH apoc.coll.union(ceNodeList, listRoot3) AS ceNodeList
WITH ceNodeList
UNWIND ceNodeList as seNode
WITH collect(seNode.id) as seNodeIds with apoc.coll.toSet(seNodeIds) as seNodeIds
MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Profile ) -[:REQUIRES]->(ceNode)
WHERE ceNode.id in seNodeIds
WITH DISTINCT childD, dg, rdgd, collect(ceNode) as ceNodes
WITH childD, dg, rdgd, ceNodes, reduce(ceNodeLabels = [], n IN ceNodes | ceNodeLabels + labels(n)) as ceNodeLabels
WHERE all(x IN ['Employment', 'Location']
WHERE x IN ceNodeLabels)
WITH childD, dg, rdgd
Try this:
WITH [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35] AS ids
WITH reduce(idsMap = {}, x IN ids | apoc.map.setEntry(idsMap, toString(x), true)) AS idsMap
MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile )
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment )
WHERE root2.id = 1
WITH DISTINCT childD, dg, rdgd, idsMap
MATCH (descendant3:Location) WHERE apoc.map.get(idsMap, toString(descendant3.id), false) = true
MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3)
WHERE apoc.map.get(idsMap, toString(root3.id), false) = true
UNWIND nodes(path3) AS pathNode3 WITH childD, dg, rdgd, COLLECT(DISTINCT pathNode3) AS pathNodes3
MATCH (childD)-[:LOCATED_IN]->(pathNode3) WHERE pathNode3 IN pathNodes3 WITH DISTINCT childD, dg, rdgd WHERE (childD.`active` = true) AND (childD.`experienceMonths` >= 129) AND ( (childD.`minSalaryUsd` <= 8883) OR (childD.`minHourlyRateUsd` <= 126) )
MATCH (childD)-[criterionRelationship8:HAS_VOTE_ON]->(c:Criterion {id: 2}) WHERE (criterionRelationship8.`properties.experienceMonths` >= 1) WITH DISTINCT childD, dg, rdgd
MATCH (childD)-[criterionRelationship10:HAS_VOTE_ON]->(c:Criterion {id: 36}) WHERE (criterionRelationship10.`avgVotesWeight` >= 1.0) AND (criterionRelationship10.`properties.experienceMonths` >= 1) WITH DISTINCT childD, dg, rdgd
MATCH (childD)-[criterionRelationship13:HAS_VOTE_ON]->(c:Criterion {id: 4}) WHERE (criterionRelationship13.`properties.experienceMonths` >= 0) WITH DISTINCT childD, dg, rdgd
MATCH (childD)-[criterionRelationship15:HAS_VOTE_ON]->(c:Criterion {id: 22}) WHERE (criterionRelationship15.`avgVotesWeight` >= 1.0) AND (criterionRelationship15.`properties.experienceMonths` >= 1) WITH DISTINCT childD, dg, rdgd
OPTIONAL MATCH (childD)-[ru:CREATED_BY]->(u:User) WITH childD, u, ru, dg, rdgd
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c:Criterion) WHERE c.id IN [2, 36, 4, 22] WITH c, childD, u, ru, dg, rdgd, (vg.avgVotesWeight * (CASE WHEN c IS NOT NULL THEN coalesce({`22`:1.2236918603185925, `2`:2.9245935245152226, `36`:0.2288013749943646, `4`:3.9599506966378435}[toString(c.id)], 1.0) ELSE 1.0 END)) as weight, vg.totalVotes as totalVotes
WITH childD, u, ru , dg, rdgd , toFloat(sum(weight)) as weight, toInteger(sum(totalVotes)) as totalVotes
ORDER BY weight DESC , childD.createdAt DESC
SKIP 0 LIMIT 20
WITH * OPTIONAL MATCH (childD)-[rup:UPDATED_BY]->(up:User)
RETURN rdgd, ru, u, rup, up, childD AS decision, weight, totalVotes, [ (c1)<-[vg1:HAS_VOTE_ON]-(childD) WHERE c1.id IN [2, 36, 4, 22] | {criterion: c1, relationship: vg1} ] AS weightedCriteria
Here, I have created a map from the given ids and then used it instead of the IN operator.
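The idea behind the map is to replace a list scan with a keyed lookup. Whether this actually reduces db hits depends on how the planner handles the original IN predicate, so it is worth confirming with PROFILE. A small Python sketch of the lookup pattern itself, where `candidate_ids` is a made-up list of Location ids:

```python
ids = list(range(1, 36))  # the ids 1..35 from the query above

# Build a map keyed by the stringified id, as the reduce over
# apoc.map.setEntry does in the query.
ids_map = {str(x): True for x in ids}


def in_ids(node_id):
    """Mirror apoc.map.get(idsMap, toString(id), false) = true."""
    return ids_map.get(str(node_id), False) is True


candidate_ids = [0, 1, 17, 35, 36]  # hypothetical Location ids
kept = [i for i in candidate_ids if in_ids(i)]
```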
Update:
I think your new query can be simplified a bit. The combination of apoc.coll.unionAll and apoc.coll.toSet can be replaced with a single call to apoc.coll.union. Try this:
MATCH (root2:Employment)
WHERE root2.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
WITH COLLECT(root2) AS ceNodeList
MATCH (root3:Location)
WHERE root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73])
WITH ceNodeList, root3, COLLECT(root3) AS listRoot3
OPTIONAL MATCH (root3)-[:CONTAINS*0..]-(descendant3:Location)
WITH ceNodeList, listRoot3, COLLECT(DISTINCT descendant3) AS listDescendant3
WITH apoc.coll.union(ceNodeList, apoc.coll.union(listDescendant3, listRoot3)) AS ceNodeList
UNWIND ceNodeList AS ceNode
WITH DISTINCT ceNode MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile)-[:REQUIRES]->(ceNode)
WITH DISTINCT childD, dg, rdgd, collect(ceNode) as ceNodes
WITH childD, dg, rdgd, ceNodes, reduce(ceNodeLabels = [], n IN ceNodes | ceNodeLabels + labels(n)) as ceNodeLabels
WHERE all(x IN ['Employment', 'Location']
WHERE x IN ceNodeLabels) WITH childD, dg, rdgd return count(childD)

Neo4j Cypher group by a column in a list of rows for aggregation

I have the following Neo4j Cypher query:
MATCH (v:Vacancy {deleted: false})-[vv:HAS_VOTE_ON]->(c:Criterion)<-[vp:HAS_VOTE_ON]-(p:Profile {id: 703, deleted: false})
WHERE vv.avgVotesWeight > 0 AND vv.avgVotesWeight <= vp.avgVotesWeight
WITH v, p
MATCH (v)-[vv1:HAS_VOTE_ON]->(cv:Criterion)
OPTIONAL MATCH (p)-[vp1:HAS_VOTE_ON]->(cv)
WITH v.id as vacancyId, cv.id as criterionId, coalesce(vv1.`properties.skillCoefficient`, 1.0) as vacancyCriterionCoefficient, coalesce(vp1.avgVotesWeight, 0) as profileCriterionVoteWeight, coalesce(vp1.totalVotes, 0) as profileCriterionTotalVotes
RETURN vacancyId, criterionId, vacancyCriterionCoefficient, profileCriterionVoteWeight, profileCriterionTotalVotes
which returns the following values:
Now, for each Vacancy (rows with the same vacancyId) I need to calculate totalProfileCriterionVoteWeight as the SUM over all criteria of the following formula:
vacancyCriterionCoefficient * profileCriterionVoteWeight
For this purpose, I need to group somehow the rows by vacancyId.
Could you please show how to do this in Cypher?
You can replace your last line with:
WITH distinct(vacancyId) as vacancyId, sum(vacancyCriterionCoefficient * profileCriterionVoteWeight) as totalProfileCriterionVoteWeight
RETURN vacancyId, totalProfileCriterionVoteWeight
which, for the data shown in the picture, will return:
╒═══════════╤═════════════════════════════════╕
│"vacancyId"│"totalProfileCriterionVoteWeight"│
╞═══════════╪═════════════════════════════════╡
│704 │22 │
├───────────┼─────────────────────────────────┤
│706 │16 │
└───────────┴─────────────────────────────────┘
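Explanation: Cypher groups rows implicitly by the non-aggregated expressions in the WITH clause, so vacancyId acts as the grouping key and sum() is the "accumulator" applied to the other fields for each key. The distinct() around vacancyId is therefore not strictly required here.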
In order to test it, I used sample data:
MERGE (a:Node{vacancyId:704, criterionId: 6907, vacancyCriterionCoefficient: 1, profileCriterionVoteWeight: 1, profileCriterionTotalVotes: 1})
MERGE (b:Node{vacancyId:704, criterionId: 6909, vacancyCriterionCoefficient: 3, profileCriterionVoteWeight: 5, profileCriterionTotalVotes: 1})
MERGE (c:Node{vacancyId:704, criterionId: 6908, vacancyCriterionCoefficient: 2, profileCriterionVoteWeight: 3, profileCriterionTotalVotes: 1})
MERGE (d:Node{vacancyId:706, criterionId: 6909, vacancyCriterionCoefficient: 1, profileCriterionVoteWeight: 5, profileCriterionTotalVotes: 1})
MERGE (e:Node{vacancyId:706, criterionId: 6908, vacancyCriterionCoefficient: 3, profileCriterionVoteWeight: 3, profileCriterionTotalVotes: 1})
MERGE (f:Node{vacancyId:706, criterionId: 6907, vacancyCriterionCoefficient: 2, profileCriterionVoteWeight: 1, profileCriterionTotalVotes: 1})
And query:
MATCH (n)
WITH n.vacancyId as vacancyId, n.criterionId as criterionId, n.vacancyCriterionCoefficient as vacancyCriterionCoefficient, n.profileCriterionVoteWeight as profileCriterionVoteWeight, n.profileCriterionTotalVotes as profileCriterionTotalVotes
WITH distinct(vacancyId) as vacancyId, sum(vacancyCriterionCoefficient * profileCriterionVoteWeight) as totalProfileCriterionVoteWeight
//return vacancyId, criterionId, vacancyCriterionCoefficient, profileCriterionVoteWeight, profileCriterionTotalVotes
RETURN vacancyId, totalProfileCriterionVoteWeight
which provides the results above.
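The aggregation can be traced by hand on the sample data above. The Python sketch below is only a model of the grouping (vacancyId as the key, the product summed per key), not part of the Cypher solution:

```python
from collections import defaultdict

# Rows as (vacancyId, criterionId, vacancyCriterionCoefficient,
# profileCriterionVoteWeight), copied from the sample data above.
rows = [
    (704, 6907, 1, 1),
    (704, 6909, 3, 5),
    (704, 6908, 2, 3),
    (706, 6909, 1, 5),
    (706, 6908, 3, 3),
    (706, 6907, 2, 1),
]

totals = defaultdict(int)
for vacancy_id, _criterion_id, coefficient, vote_weight in rows:
    # vacancy_id is the grouping key; the product is summed per key.
    totals[vacancy_id] += coefficient * vote_weight

total_profile_criterion_vote_weight = dict(totals)
```

For vacancy 704 this is 1*1 + 3*5 + 2*3 = 22, and for vacancy 706 it is 1*5 + 3*3 + 2*1 = 16.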

How to filter a number of records and get only the outer most records from a postgres ltree structure?

I have database records arranged in an ltree structure (Postgres ltree extension).
I want to filter these items down to the outer most ancestors of the current selection.
Test cases:
[11, 111, 1111, 2, 22, 222, 2221, 2222] => [11, 2];
[1, 11, 111, 1111, 1112, 2, 22, 222, 2221, 2222, 3, 4, 5] => [1, 2, 3, 4, 5];
[1111, 1112, 2221, 2222] => [1111, 1112, 2221, 2222];
1
|_1.1
|  |_1.1.1
|     |_1.1.1.1
|     |_1.1.1.2
2
|_2.2
|  |_2.2.2
|     |_2.2.2.1
|     |_2.2.2.2
3
|
4
|
5
I have implemented this in Ruby like so.
def fetch_outer_most_items(identifiers)
  # With path DESC ordering, pop (which takes the last element) yields ancestors before their descendants.
  ordered_items = Item.where(id: identifiers).order("path DESC")
  items_array = ordered_items.to_a
  outer_most_item_ids = []
  while items_array.size > 0
    item = items_array.pop
    outer_most_item_ids.push(item.id)
    # <@ is the ltree "is a descendant of (or equal to)" operator; <# is not an ltree operator.
    duplicate_ids = ordered_items.where("items.path <@ '#{item.path}'").pluck(:id)
    if duplicate_ids.any?
      items_array = items_array.select { |i| !duplicate_ids.include?(i.id) }
    end
  end
  ordered_items.where(id: outer_most_item_ids)
end
I have eliminated descendants as duplicates via recursion. I'm pretty sure there is an SQL way of doing this which will be the preferred solution as this one triggers n+1 queries. Ideally I would add this function as a named scope for the Item model.
Any pointers please?
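The filtering rule itself is: keep an item only if none of its proper ancestors is also in the selection. A minimal in-memory Python sketch of that rule follows. The id-to-path mapping is an assumption inferred from the tree and test cases above (for example, id 111 at path 1.1.1), and this only illustrates the logic, not the SQL solution being asked for.

```python
# Assumed id -> ltree path mapping, inferred from the tree and test cases above.
PATHS = {
    1: "1", 11: "1.1", 111: "1.1.1", 1111: "1.1.1.1", 1112: "1.1.1.2",
    2: "2", 22: "2.2", 222: "2.2.2", 2221: "2.2.2.1", 2222: "2.2.2.2",
    3: "3", 4: "4", 5: "5",
}


def outer_most(ids):
    """Keep ids whose path has no proper ancestor among the selected paths."""
    selected = {PATHS[i] for i in ids}

    def has_selected_ancestor(path):
        labels = path.split(".")
        # Every proper prefix of the label list is an ancestor path.
        return any(".".join(labels[:n]) in selected for n in range(1, len(labels)))

    return [i for i in ids if not has_selected_ancestor(PATHS[i])]
```

Applied to the three test cases above, `outer_most` returns `[11, 2]`, `[1, 2, 3, 4, 5]` and `[1111, 1112, 2221, 2222]` respectively.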

Attempting a transpose by performing multiple joins of table on subsets of same table in Hive

I'm attempting to transpose the date column by performing multiple joins of my table data_A on subsets of the same table.
Here's the code to create my test dataset, which contains duplicate records for every value of count:
create table database.data_A (member_id string, x1 int, x2 int, count int, date date);
insert into table database.data_A
select 'A0001',1, 10, 1, '2017-01-01'
union all
select 'A0001',1, 10, 2, '2017-07-01'
union all
select 'A0001',2, 20, 1, '2017-01-01'
union all
select 'A0001',2, 20, 2, '2017-07-01'
union all
select 'B0001',3, 50, 1, '2017-03-01'
union all
select 'C0001',4, 100, 1, '2017-04-01'
union all
select 'D0001',5, 200, 1, '2017-10-01'
union all
select 'D0001',5, 200, 2, '2017-11-01'
union all
select 'D0001',5, 200, 3, '2017-12-01'
union all
select 'D0001',6, 500, 1, '2017-10-01'
union all
select 'D0001',6, 500, 2, '2017-11-01'
union all
select 'D0001',6, 500, 3, '2017-12-01'
union all
select 'D0001',7, 1000, 1, '2017-10-01'
union all
select 'D0001',7, 1000, 2, '2017-11-01'
union all
select 'D0001',7, 1000, 3, '2017-12-01';
I'd like to transpose the data into this:
member_id x1 x2 date1 date2 date3
'A0001', 1, 10, '2017-01-01' '2017-07-01' .
'A0001', 2, 20, '2017-01-01' '2017-07-01' .
'B0001', 3, 50, '2017-03-01' . .
'C0001', 4, 100, '2017-04-01' . .
'D0001', 5, 200, '2017-10-01' '2017-11-01' '2017-12-01'
'D0001', 6, 500, '2017-10-01' '2017-11-01' '2017-12-01'
'D0001', 7, 1000, '2017-10-01' '2017-11-01' '2017-12-01'
My first program (which was not successful):
create table database.data_B as
select a.member_id, a.x1, a.x2, a.date_1, b.date_2, c.date_3
from (select member_id, x1, x2, date as date_1 from database.data_A where count=1) as a
left join
(select member_id, date as date_2 from database.data_A where count=2) as b
on (a.member_id=b.member_id)
left join
(select member_id, date as date_3 from database.data_A where count=3) as c
on (a.member_id=c.member_id);
The query below will do the job:
select
member_id,
x1,
x2,
max(case when count=1 then date else '.' end) as date1,
max(case when count=2 then date else '.' end) as date2,
max(case when count=3 then date else '.' end) as date3
from data_A
group by member_id,x1, x2
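The query above is a conditional aggregation: rows are grouped by (member_id, x1, x2), and each count value selects which output column receives the date. A Python sketch of the same pivot on a subset of the test rows, with '.' standing in for a missing date as in the desired output:

```python
# A subset of the test rows: (member_id, x1, x2, count, date).
rows = [
    ("A0001", 1, 10, 1, "2017-01-01"),
    ("A0001", 1, 10, 2, "2017-07-01"),
    ("B0001", 3, 50, 1, "2017-03-01"),
    ("D0001", 5, 200, 1, "2017-10-01"),
    ("D0001", 5, 200, 2, "2017-11-01"),
    ("D0001", 5, 200, 3, "2017-12-01"),
]

pivot = {}
for member_id, x1, x2, count, date in rows:
    # One output row per (member_id, x1, x2) group, with three date slots.
    slots = pivot.setdefault((member_id, x1, x2), [".", ".", "."])
    slots[count - 1] = date  # count = 1, 2, 3 selects date1, date2, date3

result = [key + tuple(slots) for key, slots in pivot.items()]
```

Grouping on x1 and x2 as well as member_id is what keeps the rows for the same member separate, which a join on member_id alone does not do.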
