I have this Cypher query:
CALL db.index.fulltext.queryNodes("names","John Snow") YIELD node, score
WITH node, score MATCH (node)-[c:ACTIVE]->() WHERE c.is_active = 'True'
RETURN DISTINCT node, score ORDER BY score DESC LIMIT 10
I would like to filter the results by score percentile, so I probably need a percentileDisc() aggregation (as pd) followed by a WHERE score > pd clause. How do I apply it here? percentileDisc(score, 0.5) always returns the score itself.
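percentileDisc(score, 0.5) presumably returns the score itself because node and score are grouping keys in the same WITH, so every group contains exactly one score. Compute the percentile over all rows at once, collect the rows next to it, and then unwind them to filter. This should work: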
CALL db.index.fulltext.queryNodes("names","John Snow") YIELD node, score
WHERE EXISTS ((node)-[:ACTIVE {is_active: 'True'}]->())
WITH COLLECT({node: node, score: score}) AS data, percentileDisc(score, 0.5) AS p
UNWIND data AS d
WITH p, d
WHERE d.score > p
RETURN p, d.node AS node, d.score AS score
ORDER BY score DESC LIMIT 10
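Here is a plain-Python sketch of the same collect, percentile, unwind, filter flow on hypothetical (node, score) pairs. `percentile_disc` uses the nearest-rank method, which I assume matches how percentileDisc() picks one of the input values. It illustrates the idea and is not Neo4j's implementation. The last line reproduces the pitfall from the question: a group that holds a single score has that score as its percentile.

```python
import math


def percentile_disc(values, p):
    """Nearest-rank discrete percentile: the result is always one of the inputs."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank, at least 1
    return ordered[rank - 1]


def top_above_percentile(rows, p=0.5, limit=10):
    """rows: (node, score) pairs, like the YIELD of the full-text index call."""
    # COLLECT + percentileDisc in one WITH: aggregate over *all* rows at once
    cutoff = percentile_disc([score for _, score in rows], p)
    # UNWIND data AS d ... WHERE d.score > p
    kept = [(node, score) for node, score in rows if score > cutoff]
    # ORDER BY score DESC LIMIT 10
    kept.sort(key=lambda row: row[1], reverse=True)
    return cutoff, kept[:limit]


rows = [("a", 4.0), ("b", 1.0), ("c", 3.0), ("d", 2.0)]
print(top_above_percentile(rows))  # (2.0, [('a', 4.0), ('c', 3.0)])

# Grouping by (node, score) gives one score per group, so the
# percentile of each group is just that score:
print([percentile_disc([score], 0.5) for _, score in rows])  # [4.0, 1.0, 3.0, 2.0]
```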
I'm working with the Flight Analyzer database (https://neo4j.com/graphgist/flight-analyzer).
It has a few node and relationship types.
Nodes:
Airport
(SEA:Airport { name:'SEA' })
Flight
(f0:Flight { date:'11/30/2015 04:24:12', duration:218, distance:1721, airline:'19977' })
Ticket
(t1f0:Ticket { class:'economy', price:1344.75 })
Relationships
Destination
(f0)-[:DESTINATION]->(ORD)
Origin
(f0)-[:ORIGIN]->(SEA)
Assign
(t1f0)-[:ASSIGN]->(f0)
Now I need to find paths, and I'm having trouble with the ORIGIN - FLIGHT - DESTINATION connection.
I need to find all airports that are connected to the LAX airport where the sum of ticket prices is < 3000.
I tried
MATCH path = (origin:Airport { name:"LAX" })<-[r:ORIGIN|DESTINATION*..5]->(destination:Airport)
WHERE REDUCE(s = 0, n IN [x IN NODES(path) WHERE 'Flight' IN LABELS(x)] |
s + [(n)<-[:ASSIGN]-(ticket) | ticket.price][0]
) < 3000
RETURN path
But with this query, LAX can be both an ORIGIN and a DESTINATION. I only want to choose paths that always follow the same order: airport1 <- ORIGIN - flight1 - DESTINATION -> airport2 <- ORIGIN - flight2 - DESTINATION -> airport3, etc.
I also need to take departure and arrival times into account, so that
flight1 date + duration < flight2 date, then flight2 date + duration < flight3 date, etc.
[UPDATED]
This query should check that:
matched paths have alternating ORIGIN/DESTINATION relationships, and
every departing flight lands at least 30 minutes before the next departing flight (if any), and
the sum of the ticket prices of the Flight nodes (every other node, starting with the second one) is < 3000
MATCH p = (origin:Airport {name: 'LAX'})-[:ORIGIN|DESTINATION*..5]-(destination:Airport)
WHERE
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE
TYPE(RELATIONSHIPS(p)[i]) = ['ORIGIN', 'DESTINATION'][i % 2] AND
(i % 2 <> 1 OR (i + 2) > LENGTH(p) OR
(apoc.date.parse(NODES(p)[i].date, 'm', 'MM/dd/yyyy HH:mm:ss') + NODES(p)[i].duration + 30) < apoc.date.parse(NODES(p)[i+2].date, 'm', 'MM/dd/yyyy HH:mm:ss'))
) AND
REDUCE(s = 0, n IN [k IN RANGE(1, LENGTH(p), 2) | NODES(p)[k]] |
s + [(n)<-[:ASSIGN]-(ticket) | ticket.price][0]
) < 3000
RETURN p
The query uses the apoc.date.parse function to convert each date into the number of epoch minutes, so that a duration (assumed to also be in minutes) can be added to it.
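As a cross-check of the WHERE logic, here is a plain-Python sketch that applies the three conditions to one already-matched path. The dict-based nodes, the `ticket_price` mapping (one price per flight, like the `[...][0]` in the query) and the name `is_valid_itinerary` are assumptions for illustration. `datetime.strptime` with a 24-hour `%H` stands in for `apoc.date.parse`, and the flight data in the usage lines is made up.

```python
from datetime import datetime, timedelta

# 24-hour pattern matching dates such as '11/30/2015 04:24:12'
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"


def is_valid_itinerary(nodes, rel_types, ticket_price, min_gap=30, budget=3000):
    """Apply the query's three WHERE checks to one matched path.

    nodes: alternating airport/flight dicts, starting and ending with an airport.
    rel_types: relationship types along the path, in order.
    ticket_price: hypothetical flight id -> price of one assigned ticket.
    """
    # 1. Types must alternate ORIGIN, DESTINATION, ORIGIN, DESTINATION, ...
    for i, rel_type in enumerate(rel_types):
        if rel_type != ("ORIGIN", "DESTINATION")[i % 2]:
            return False
    # 2. Flights sit at the odd node positions; each one must land at least
    #    min_gap minutes before the next one departs (duration is in minutes).
    flights = nodes[1::2]
    for cur, nxt in zip(flights, flights[1:]):
        departs = datetime.strptime(cur["date"], DATE_FORMAT)
        ready = departs + timedelta(minutes=cur["duration"] + min_gap)
        if not ready < datetime.strptime(nxt["date"], DATE_FORMAT):
            return False
    # 3. Total ticket price over all flights must stay under the budget
    return sum(ticket_price[f["id"]] for f in flights) < budget


lax, sea, ord_ = {"name": "LAX"}, {"name": "SEA"}, {"name": "ORD"}
f1 = {"id": "f1", "date": "11/30/2015 04:24:12", "duration": 218}
f2 = {"id": "f2", "date": "11/30/2015 09:00:00", "duration": 120}
types = ["ORIGIN", "DESTINATION", "ORIGIN", "DESTINATION"]
prices = {"f1": 1344.75, "f2": 900.0}
print(is_valid_itinerary([lax, f1, sea, f2, ord_], types, prices))  # True
```

Here f1 lands at 08:02:12, so with the 30-minute gap the next flight must leave after 08:32:12.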
I believe you should create new relationships, such as FLY_TO, directly from one airport to another, carrying the ticket price and ticket class. That can be useful,
because then you can find flights more easily:
MATCH
(a:Airport)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(b:Airport),
(f)-[:ASSIGN]-(t:Ticket)
CREATE (a)-[r:FLY_TO {price: t.price, Class: t.class}]->(b)
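To see what that CREATE produces, here is a plain-Python sketch of the same projection over hypothetical in-memory flights and tickets (only the SEA to ORD economy ticket comes from the dataset description; the LAX flight and its tickets are made up). Because the MATCH yields one row per origin, flight, destination and ticket combination, the query creates one FLY_TO relationship per ticket, so an airport pair served by several tickets gets several parallel FLY_TO relationships. `cheap_destinations` is a hypothetical helper showing the simpler lookup this enables.

```python
def fly_to_edges(flights, tickets):
    """One (origin, destination, props) edge per ticket, like MATCH ... CREATE."""
    by_id = {f["id"]: f for f in flights}
    edges = []
    for t in tickets:
        f = by_id[t["flight"]]  # the flight this ticket is ASSIGNed to
        props = {"price": t["price"], "Class": t["class"]}
        edges.append((f["origin"], f["destination"], props))
    return edges


def cheap_destinations(edges, origin, max_price):
    """Airports reachable from `origin` with a single FLY_TO under `max_price`."""
    return sorted({dst for src, dst, p in edges if src == origin and p["price"] < max_price})


flights = [
    {"id": "f0", "origin": "SEA", "destination": "ORD"},
    {"id": "f1", "origin": "LAX", "destination": "SEA"},
]
tickets = [
    {"flight": "f0", "class": "economy", "price": 1344.75},
    {"flight": "f1", "class": "economy", "price": 420.0},
    {"flight": "f1", "class": "business", "price": 3100.0},
]
edges = fly_to_edges(flights, tickets)
print(len(edges))                               # 3
print(cheap_destinations(edges, "LAX", 3000))   # ['SEA']
```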
I use the following Cypher query:
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv
ORDER BY hv.createDate DESC
WITH count(hv) as count, ceil(toFloat(count(hv)) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
This query works fine and does exactly what I need.
Right now I'm concerned about two possible issues in this query, from the point of view of performance and Cypher best practices.
First, as you can see, I use the same count(hv) function twice. Will this cause performance problems during execution, or are Cypher and Neo4j smart enough to optimize it? If not, please show how to fix it.
The second concern is the CASE expression inside the range() function. Same question here: will this CASE expression be evaluated only once, or on every iteration over my range? Please show how to fix it if needed.
UPDATED
I tried using a separate WITH for count, but the query doesn't return any results (it returns an empty result):
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv ORDER BY hv.createDate DESC
WITH u, hv, count(hv) as count
WITH u, hv, count, ceil(toFloat(count) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
1 MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
2 WHERE v.id = {valueId}
3 OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
4 WHERE {fetchCreateUsers}
5 WITH u, hv
6 ORDER BY hv.createDate DESC
7 WITH count(hv) as count, ceil(toFloat(count(hv)) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
8 RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
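(1) You need to pass hv through in line 5, because its values are collected in line 7. That said, you can still compute the count only once by collecting the rows and unwinding them again. The count must not be grouped by u here, or it becomes a per-user count. Line 6 stays unchanged, and line 7 reuses count instead of calling count(hv) again:
5 WITH collect({u: u, hv: hv}) AS rows, count(hv) AS count
UNWIND rows AS row
WITH row.u AS u, row.hv AS hv, count
7 WITH count, ceil(toFloat(count) / {maxResults}) AS step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
However, this is not very elegant and probably not worth doing.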
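(2) You can compute the CASE expression once, in an additional WITH right after line 7, and use the result in line 8. Either way, the arguments of range() are evaluated once before the list is built, not once per element, so this is mainly a readability change:
WITH count, data, step, CASE step WHEN 0 THEN 1 ELSE step END AS stepFlag
8 RETURN REDUCE(s = [], i IN RANGE(0, count - 1, stepFlag) | s + data[i]) AS result, step, count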
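For reference, here is a plain-Python sketch of what the query computes, assuming `items` is already sorted by createDate, newest first. It computes the count once and the guarded step once, which is the shape both fixes aim for. `sample_history` is a hypothetical name. Cypher's range() includes its end bound, so range(0, count - 1, step) yields the same indices as Python's range(0, count, step).

```python
import math


def sample_history(items, max_results):
    """Pick at most max_results evenly spaced items (items sorted newest first)."""
    count = len(items)                     # count(hv), computed once
    step = math.ceil(count / max_results)  # ceil(toFloat(count) / {maxResults})
    safe_step = step if step != 0 else 1   # CASE step WHEN 0 THEN 1 ELSE step END
    # Cypher's range(0, count - 1, safe_step) includes its end bound, so it
    # yields the same indices as Python's range(0, count, safe_step).
    result = [items[i] for i in range(0, count, safe_step)]
    return result, step, count


print(sample_history(list(range(10)), 3))  # ([0, 4, 8], 4, 10)
print(sample_history([], 5))               # ([], 0, 0)
```

The step-0 guard only matters when there is no history at all, because ceil(0 / maxResults) is 0 and a zero step is not a valid range step.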
The structure of my database is:
( :node ) -[:give { money: some_int_value } ]-> ( :Org )
One node can have multiple relations.
I need to find the top 3 nodes with the greatest number of :give relationships whose money property satisfies vx <= money <= vy.
Using ORDER BY and LIMIT should solve your problem:
MATCH (n:node)-[r:give {money: some_int_value}]->(:Org)
RETURN n, count(r) AS num
ORDER BY num DESC //Order by the number of relations each node has
LIMIT 3 //We only want the top 3 nodes
Instead of using the label 'node', maybe use something more descriptive, like Person, so the data model is clearer:
MATCH (p:Person)-[r:give]->(o:Org)
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;
I'm not sure what you mean by "their property money holding: vx <= money <= vy". If you could clarify, I can update my answer accordingly. You can calculate the total of the money properties using the sum() function.
Edit
To only include relationships whose money value is between 10 and 25 (inclusive):
MATCH (p:Person)-[r:give]->(o:Org)
WHERE r.money >= 10 AND r.money <= 25
WITH count(r) AS num, sum(r.money) AS total, p
RETURN p, num, total ORDER BY num DESC LIMIT 3;
I am trying to write a Cypher query that finds a path between nodes a and b such that, at each step, the relationship taken has the maximum timestamp among all available alternatives whose timestamp is less than 15.
Here is my query so far; it does everything except select the maximum possible timestamp at each step. How do I express this condition?
MATCH path=(a:NODE)-[rs:PARENT*]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE' AND ALL (r IN rs
WHERE r.timestamp < 15)
RETURN path
This is just awful pseudocode, but I think it expresses what I am looking for:
MATCH path=(a:NODE)-[rs:PARENT*]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE' AND ALL (r IN rs
WHERE r.timestamp < 15 AND r.timestamp = max(allPossibleRsForThisStep))
RETURN path
Can this kind of query be written in cypher?
It won't be fast in Cypher, but it is possible: first compute the maximum timestamp for each step position, then compare each relationship's timestamp with the maximum stored for its position in the list.
Something like this (not sure if it works)
MATCH (a:NODE)-[rs:PARENT*..10]->(b:NODE)
WHERE a.name = 'SOME_VALUE' AND b.name = 'SOME_OTHER_VALUE'
// for every step position, find the largest timestamp < 15 seen at that position
UNWIND range(0, size(rs) - 1) AS idx
WITH a, b, idx, max(CASE WHEN rs[idx].timestamp < 15 THEN rs[idx].timestamp END) AS max_ts
ORDER BY idx
// wrap each value in a map so that a null maximum still keeps its slot in the list
WITH a, b, collect({ts: max_ts}) AS max_vals
MATCH path=(a)-[rs:PARENT*..10]->(b)
WHERE ALL (idx IN range(0, size(rs) - 1) WHERE rs[idx].timestamp < 15 AND rs[idx].timestamp = max_vals[idx].ts)
RETURN path
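Here is a plain-Python sketch of the same two-pass idea on a small hypothetical graph: enumerate the a-to-b paths, record the largest timestamp below 15 at each step position, then keep only the paths that take that maximum at every position. It assumes a small graph; the walk stops at b and is bounded by `max_len`, which simplifies Cypher's variable-length matching. Note that "maximum per position across all paths" is this answer's interpretation, not a per-node choice among outgoing relationships.

```python
from collections import defaultdict


def all_paths(edges, start, end, max_len=10):
    """Directed paths start -> end (up to max_len hops) as lists of (src, dst, ts)."""
    out = defaultdict(list)
    for edge in edges:
        out[edge[0]].append(edge)
    paths = []

    def walk(node, path):
        if node == end and path:
            paths.append(list(path))
            return
        if len(path) == max_len:
            return
        for edge in out[node]:
            path.append(edge)
            walk(edge[1], path)
            path.pop()

    walk(start, [])
    return paths


def max_per_position(paths, limit=15):
    """Largest timestamp below `limit` seen at each step index across all paths."""
    best = {}
    for path in paths:
        for idx, (_src, _dst, ts) in enumerate(path):
            if ts < limit and ts > best.get(idx, float("-inf")):
                best[idx] = ts
    return best


def paths_taking_max(edges, start, end, limit=15, max_len=10):
    paths = all_paths(edges, start, end, max_len)
    best = max_per_position(paths, limit)
    return [
        path for path in paths
        if all(ts < limit and best.get(idx) == ts for idx, (_s, _d, ts) in enumerate(path))
    ]


edges = [("a", "x", 10), ("a", "y", 12), ("x", "b", 5), ("y", "b", 14), ("a", "b", 20)]
print(paths_taking_max(edges, "a", "b"))  # [[('a', 'y', 12), ('y', 'b', 14)]]
```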
I want to iterate through the relationships between the "beginning node" and the "end node".
Here is my Cypher query:
MATCH (ar1:Article)-[:PART_OF]->()-[:SERIES]->(s1),
(ar2:Article)-[:PART_OF]->()-[:SERIES]->(s2),
(ar1)-[:CREATOR]->(au1:Author),
(ar2)-[:CREATOR]->(au1:Author),
p1 = (au1)-[:CONTRIBUTOR*]->(au2:Author)
WITH REDUCE (weight = 0, edge IN relationships(p1) | weight + 1.0/edge.fdegree) AS
strength_au1_au2_p1,ar1 AS ar1,s1 AS s1,ar2 AS ar2,s2 AS s2,au1 AS au1,au2 AS au2
WHERE s1.name='WWW' AND s2.name='Pods' AND ar2.year >2010.0 AND ar1.year >2010.0
AND strength_au1_au2_p1<5.0
RETURN ar1,s1,ar2,s2,au1,au2,ar1.year AS calc_fuzzy_ar1_year_recent,ar2.year AS
calc_fuzzy_ar2_year_recent,strength_au1_au2_p1 AS calc_fuzzy_length_p1_short
Now I want to iterate through the CONTRIBUTOR* relationships in p1, read each one's fdegree, and return the minimum fdegree among them.
Thank you all
Try this:
MATCH (au1:Author)<-[:CREATOR]-(ar1:Article)-[:PART_OF]->()-[:SERIES]->(s1),
(au2:Author)<-[:CREATOR]-(ar2:Article)-[:PART_OF]->()-[:SERIES]->(s2)
WHERE s1.name='WWW' AND s2.name='Pods' AND ar2.year >2010.0 AND ar1.year >2010.0
WITH au1,au2,ar1,ar2,s1,s2
MATCH (au1)-[rels:CONTRIBUTOR*]->(au2:Author)
WHERE REDUCE (weight = 0, edge IN rels | weight + 1.0/edge.fdegree) < 5.0
RETURN au1,au2,ar1,ar2,s1,s2,
REDUCE (weight = 1000000, edge IN rels |
case when weight < edge.fdegree then weight else edge.fdegree end) as min_degree
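Both REDUCE expressions can be checked in plain Python, with each path represented only by its list of fdegree values (a hypothetical simplification). The sketch uses 1.0 / fdegree because Cypher's / on two integers is integer division, so 1/edge.fdegree would give 0 for any integer fdegree above 1. The CASE-based REDUCE is simply a minimum with a large starting value. In Python that is functools.reduce, or just min().

```python
from functools import reduce


def path_strength(fdegrees):
    # REDUCE(weight = 0, edge IN rels | weight + 1.0 / edge.fdegree)
    return sum(1.0 / d for d in fdegrees)


def min_degree(fdegrees, start=1000000):
    # REDUCE(weight = 1000000, edge IN rels |
    #        CASE WHEN weight < edge.fdegree THEN weight ELSE edge.fdegree END)
    return reduce(lambda w, d: w if w < d else d, fdegrees, start)


def qualifying_paths(paths, max_strength=5.0):
    """paths: one list of fdegree values per CONTRIBUTOR* path."""
    return [(p, min_degree(p)) for p in paths if path_strength(p) < max_strength]


paths = [[3, 1.5, 4], [0.1, 0.5]]
print(qualifying_paths(paths))  # [([3, 1.5, 4], 1.5)]
```

The second path is dropped because its strength is 1/0.1 + 1/0.5 = 12, which is not below 5.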