I am trying to write a Cypher query that finds a path between nodes a and b such that, at each step, the relationship taken has the largest timestamp below 15 among all available alternatives.
Here is my query so far. It does everything except select the maximum possible timestamp at each step. How do I express this condition?
MATCH path=(a:NODE)-[rs:PARENT*]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE' AND ALL (r IN rs
WHERE r.timestamp < 15)
RETURN path
This is just awful pseudocode, but I think it expresses what I am looking for:
MATCH path=(a:NODE)-[rs:PARENT*]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE' AND ALL (r IN rs
WHERE r.timestamp < 15 AND r.timestamp = max(allPossibleRsForThisStep))
RETURN path
Can this kind of query be written in Cypher?
It won't be fast in Cypher, but it is possible: compute the maximum value for each step first, store those values in a list, and then compare each relationship's timestamp with the maximum at the same position.
Something like this (not sure if it works)
WITH range(1,10) as max_vals // a list with 10 values (actual values are not that important)
MATCH (a:NODE)-[rs:PARENT*..10]->(b:NODE)
WHERE a.name = 'SOME_VALUE' and b.name = 'SOME_OTHER_VALUE'
WITH a,b,
map(idx in range(0,size(rs)) |
max_vals[idx] = case when max_vals[idx]<rs[idx].timestamp then rs[idx].timestamp else max_vals[idx] end ), max_vals
MATCH path=(a)-[rs:PARENT*..10]->(b)
WHERE ALL (idx IN range(0, size(rs) - 1) WHERE rs[idx].timestamp < 15 AND rs[idx].timestamp = max_vals[idx])
RETURN path
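To make the per-step rule concrete outside Cypher, here is a minimal Python sketch. It assumes the graph is held in a hypothetical adjacency dict of (next node, timestamp) pairs, and it reads "all available alternatives" as the outgoing PARENT relationships of the current node, which may not be exactly what the question intends. The names `edges` and `greedy_path` and the sample data are illustrative only.

```python
def greedy_path(edges, start, goal, limit=15, max_hops=10):
    """Follow, at each node, the outgoing edge with the largest timestamp < limit.

    edges: dict mapping node -> list of (next_node, timestamp) pairs
           (a hypothetical in-memory layout, not a Neo4j API).
    Returns the list of visited nodes if goal is reached, else None.
    """
    path = [start]
    node = start
    for _ in range(max_hops):
        if node == goal:
            return path
        # Keep only the alternatives below the limit, then take the maximum.
        candidates = [(ts, nxt) for nxt, ts in edges.get(node, []) if ts < limit]
        if not candidates:
            return None
        _, node = max(candidates)
        path.append(node)
    return path if node == goal else None


# Illustrative data: from "a" the best edge under 15 has timestamp 12 (to "y").
edges = {
    "a": [("x", 9), ("y", 12), ("z", 20)],
    "y": [("b", 14), ("x", 3)],
    "x": [("b", 1)],
}
print(greedy_path(edges, "a", "b"))
```

Because the walk commits to one edge per step, it either reaches b along the single greedy route or fails; it does not backtrack to try a second-best edge.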
I want to see if a path exists for a graph, given a list of sequential properties to search for. The list can be of variable length.
This is my most recent attempt:
WITH ['a', 'b', 'c', 'd'] AS search_list // can be any list of strings
// FOREACH (i IN range(search_list) |
// MATCH (a:Node {prop:i})-->(b:Node {prop:i+1}))
// RETURN true if all relationships exist, false if not
This solution doesn't work because you can't use MATCH in a FOREACH. What should I do instead?
You can build the query string for the entire path manually and execute it with the apoc.cypher.run procedure:
WITH ['a', 'b', 'c', 'd'] AS search_list
WITH search_list,
'MATCH path = ' +
REDUCE(c = '', i in range(0, size(search_list) - 2) |
c + '(:Node {prop: $props[' + i + ']})-->'
) +
'(:Node {prop: $props[' + (size(search_list) - 1) +']}) ' +
'RETURN count(path) as pathCount' AS cypherQuery
CALL apoc.cypher.run(cypherQuery, {props: search_list}) YIELD value
RETURN CASE WHEN value.pathCount > 0
THEN true
ELSE false
END AS pathExists
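To see what string the REDUCE expression above assembles, here is a small Python sketch that builds the same text for a list of a given length. The function name `build_path_query` is hypothetical; only the generated Cypher text mirrors the answer.

```python
def build_path_query(n):
    """Build the Cypher text for a chain of n nodes matched against $props[0..n-1]."""
    if n < 1:
        raise ValueError("need at least one property value")
    # One "(:Node {prop: $props[i]})-->" fragment for every node except the last.
    hops = "".join(f"(:Node {{prop: $props[{i}]}})-->" for i in range(n - 1))
    last = f"(:Node {{prop: $props[{n - 1}]}}) "
    return "MATCH path = " + hops + last + "RETURN count(path) as pathCount"


print(build_path_query(4))
```

The property values themselves never enter the string; they stay in the $props parameter, so the list contents do not need escaping.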
Assuming you pass the list of property values in a $props parameter and the length of that list is 4, this query will first search for all paths with 4 nodes (that is, 3 relationships) that have the desired start and end nodes (to narrow down the candidate paths), and then filter on the interior nodes of the paths:
MATCH p=(a:Node {prop: $props[0]})-[*3]->(b:Node {prop: $props[-1]})
WITH p, NODES(p)[1..-1] AS midNodes
WHERE ALL(i IN RANGE(1, SIZE(midNodes)) WHERE midNodes[i-1].prop = $props[i])
RETURN p;
To increase efficiency, you should create an index on :Node(prop) as well.
If this query returns nothing, then there are no matching paths.
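The same existence check can be sketched in plain Python, which may help when reasoning about what the query should return. This assumes a hypothetical in-memory graph (`edges` as an adjacency dict, `props_of` mapping node id to its prop value) and a non-empty list of wanted values; unlike Cypher, it does not enforce relationship uniqueness within a path.

```python
def path_exists(edges, props_of, wanted):
    """Return True if some directed path visits nodes whose props equal `wanted` in order."""
    # Start from every node carrying the first wanted value.
    frontier = {n for n, p in props_of.items() if p == wanted[0]}
    for target in wanted[1:]:
        # Advance one hop, keeping only neighbours with the next wanted value.
        frontier = {
            m for n in frontier for m in edges.get(n, ()) if props_of[m] == target
        }
        if not frontier:
            return False
    return bool(frontier)


# Illustrative data: 1(a) -> 2(b) -> 3(c) -> 4(d), plus a dead-end 2 -> 5(x).
props_of = {1: "a", 2: "b", 3: "c", 4: "d", 5: "x"}
edges = {1: [2], 2: [3, 5], 3: [4]}
print(path_exists(edges, props_of, ["a", "b", "c", "d"]))
```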
I use the following Cypher query:
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv
ORDER BY hv.createDate DESC
WITH count(hv) as count, ceil(toFloat(count(hv)) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
This query works fine and does exactly what I need.
Right now I'm concerned about two possible issues in this query, from the point of view of performance and Cypher best practices.
First, as you can see, I use the same count(hv) function twice. Will this cause performance problems during execution, or are Cypher and Neo4j smart enough to optimize it? If not, please show how to fix it.
The second is the CASE expression inside the range() function. The same question applies: will this CASE expression be evaluated only once, or on every iteration over my range? Please show how to fix it if needed.
UPDATED
I tried adding a separate WITH for count, but the query returns an empty result:
MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
WHERE v.id = {valueId}
OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
WHERE {fetchCreateUsers}
WITH u, hv ORDER BY hv.createDate DESC
WITH u, hv, count(hv) as count
WITH u, hv, count, ceil(toFloat(count) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
1 MATCH (v:Value)-[:CONTAINS]->(hv:HistoryValue)
2 WHERE v.id = {valueId}
3 OPTIONAL MATCH (hv)-[:CREATED_BY]->(u:User)
4 WHERE {fetchCreateUsers}
5 WITH u, hv
6 ORDER BY hv.createDate DESC
7 WITH count(hv) as count, ceil(toFloat(count(hv)) / {maxResults}) as step, COLLECT({userId: u.id, historyValueId: hv.id, historyValue: hv.originalValue, historyValueCreateDate: hv.createDate}) AS data
8 RETURN REDUCE(s = [], i IN RANGE(0, count - 1, CASE step WHEN 0 THEN 1 ELSE step END) | s + data[i]) AS result, step, count
(1) You need to pass hv in line 5, because its values are collected in line 7. That said, you can still do something like this:
5 WITH u, collect(hv) AS hvs, count(hv) as count
UNWIND hvs AS hv
However, this is not very elegant and probably not worth doing.
(2) You can calculate the CASE expression in line 7:
7 WITH count, data, step, CASE step WHEN 0 THEN 1 ELSE step END AS stepFlag
8 RETURN REDUCE(s = [], i IN RANGE(0, count - 1, stepFlag) | s + data[i]) AS result, step, count
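For readers less familiar with the REDUCE/RANGE idiom, the sampling logic of lines 7 and 8 can be sketched in Python as follows. The function name `sample_every_step` is hypothetical, and `data` stands in for the collected list; the step and the zero guard are each computed once, before the iteration.

```python
import math


def sample_every_step(data, max_results):
    """Pick every step-th element so that at most max_results items are returned."""
    count = len(data)
    step = math.ceil(count / max_results)
    # Guard computed once, outside the loop: a step of 0 would be invalid.
    step_flag = step if step != 0 else 1
    result = data[0:count:step_flag]
    return result, step, count


print(sample_every_step(list(range(10)), 4))
```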
Based on the previous questions:
Neo4j Cypher query structure and performance optimization
Neo4j Cypher node filtering by pattern comprehension
I have finally refactored my query to the following:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = {decisionId}
MATCH (childD)<-[:SET_FOR]-(equalFilterValue)-[:SET_ON]->(equalFilterCharacteristic)
WHERE ALL(key IN keys({equalFilters}) WHERE id(equalFilterCharacteristic) = toInt(key) AND equalFilterValue.value = ({equalFilters}[key]))
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(rangeFilterValue)-[:SET_ON]->(rangeFilterCharacteristic)
WHERE ALL(key IN keys({rangeFilters}) WHERE id(rangeFilterCharacteristic) = toInt(key) AND ({rangeFilters}[key])[0] <= rangeFilterValue.value <= ({rangeFilters}[key])[1])
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)
RETURN ru, u, childD AS decision
SKIP 0 LIMIT 100
This query works fine if each *filter type(map) has only one key, for example:
queries.add(new InQuery(integerCharacteristic.getId(), 30));
or
queries.add(new InQuery(stringCharacteristic.getId(), "Two"));
but fails when I add 2 or more conditions, for example:
queries.add(new InQuery(integerCharacteristic.getId(), 30));
queries.add(new InQuery(stringCharacteristic.getId(), "Two"));
The following query doesn't work as expected and my test assertions fail:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = {decisionId}
MATCH (childD)<-[:SET_FOR]-(inFilterValue)-[:SET_ON]->(inFilterCharacteristic)
WHERE ALL(key IN keys({inFilters}) WHERE id(inFilterCharacteristic) = toInt(key) AND ({inFilters}[key]) IN inFilterValue.value)
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)
RETURN ru, u, childD AS decision
SKIP 0 LIMIT 100
The parameter:
inFilters = {3153=30, 3151=Two}
Why doesn't it work when the inFilters map contains 2 or more keys, and how can I make it work?
To see why this only works with 1 key, let's go over what WHERE ALL does when there are 2 keys.
WITH inFilters = {3153=30, 3151=Two}
....
WHERE ALL(key IN keys({equalFilters})
WHERE id(equalFilterCharacteristic) = toInt(key)
AND equalFilterValue.value = ({equalFilters}[key]))
Is equivalent to
WHERE id(equalFilterCharacteristic) = toInt(3153)
AND equalFilterValue.value = ({equalFilters}[3153])
AND id(equalFilterCharacteristic) = toInt(3151)
AND equalFilterValue.value = ({equalFilters}[3151])
The problem is that this requires the node id of equalFilterCharacteristic to equal 3153 AND 3151 at the same time, for each and every equalFilterCharacteristic. Since a node has exactly one id, the predicate reduces to an expensive WHERE false whenever there is more than 1 key, so WHERE ALL can never be true in the above case. WHERE ANY, however, evaluates to true if at least 1 check group is true, and would be equivalent to
WHERE (
id(equalFilterCharacteristic) = toInt(3153)
AND equalFilterValue.value = ({equalFilters}[3153])
)
OR
(
id(equalFilterCharacteristic) = toInt(3151)
AND equalFilterValue.value = ({equalFilters}[3151])
)
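Of course, since the key you are trying to match is the characteristic's own id, you can skip the ALL and look the value up directly. The map keys are strings (which is why the original query needs toInt(key)), while id() returns a long, so convert the id with toString() for the lookup:
WHERE equalFilterValue.value = ({equalFilters}[toString(id(equalFilterCharacteristic))])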
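The difference between ALL and ANY for a single (characteristic, value) pair can be reproduced in a few lines of Python. This is only an illustration of the boolean logic, using the ids and values from the question's parameter; the function names are hypothetical.

```python
filters = {"3153": 30, "3151": "Two"}


def matches_all(char_id, value, filters):
    # Mirrors WHERE ALL(...): one node would need to equal every key at once.
    return all(char_id == int(k) and value == v for k, v in filters.items())


def matches_any(char_id, value, filters):
    # Mirrors WHERE ANY(...): one matching (key, value) group is enough.
    return any(char_id == int(k) and value == v for k, v in filters.items())


print(matches_all(3153, 30, filters), matches_any(3153, 30, filters))
```

With a single key in the map the two predicates agree, which is why the original query appears to work until a second filter is added.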
Suppose I have a node with a collection in a property, say
START x = node(17) SET x.c = [ 4, 6, 2, 3, 7, 9, 11 ];
and somewhere (i.e. from .csv file) I get another collection of values, say
c1 = [ 11, 4, 5, 8, 1, 9 ]
I'm treating my collections as just sets; the order of elements does not matter. What I need is to merge x.c with c1 with some magic operation so that the resulting x.c will contain only distinct elements from both. The following idea comes to mind (as yet untested):
LOAD CSV FROM "file:///tmp/additives.csv" as row
START x=node(TOINT(row[0]))
MATCH c1 = [ elem IN SPLIT(row[1], ':') | TOINT(elem) ]
SET
x.c = [ newxc IN x.c + c1 WHERE (newx IN x.c AND newx IN c1) ];
This won't work, it will give an intersection but not a collection of distinct items.
More RTFM gives another idea: use REDUCE() ? but how?
How can Cypher be extended with a new built-in function UNIQUE() that accepts a collection and returns a collection cleaned of duplicates?
UPD. Seems that FILTER() function is something close but intersection again :(
x.c = FILTER( newxc IN x.c + c1 WHERE (newx IN x.c AND newx IN c1) )
WBR,
Andrii
How about something like this...
with [1,2,3] as a1
, [3,4,5] as a2
with a1 + a2 as all
unwind all as a
return collect(distinct a) as unique
Add two collections and return the collection of distinct elements.
dec 15, 2014 - here is an update to my answer...
I started with a node in the neo4j database...
//create a node in the DB with a collection of values on it
create (n:Node {name:"Node 01",values:[4,6,2,3,7,9,11]})
return n
I created a csv sample file with two columns...
Name,Coll
"Node 01","11,4,5,8,1,9"
I created a LOAD CSV statement...
LOAD CSV
WITH HEADERS FROM "file:///c:/Users/db/projects/coll-merge/load_csv_file.csv" as row
// find the matching node
MATCH (x:Node)
WHERE x.name = row.Name
// merge the collections
WITH x.values + split(row.Coll,',') AS combo, x
// process the individual values
UNWIND combo AS value
// use toInt as the values from the csv come in as string
// may be a better way around this but i am a little short on time
WITH toInt(value) AS value, x
// might as well sort 'em so they are all purdy
ORDER BY value
WITH collect(distinct value) AS values, x
SET x.values = values
You could use reduce like this:
with [1,2,3] as a, [3,4,5] as b
return reduce(r = [], x in a + b | case when x in r then r else r + [x] end)
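The reduce approach translates almost one to one into Python, which makes it easy to check by hand. A minimal sketch, with `union_distinct` as a hypothetical helper name; like the Cypher version it keeps the first occurrence of each element and preserves order.

```python
from functools import reduce


def union_distinct(a, b):
    """Concatenate two lists and drop repeated elements, keeping first occurrences."""
    # Accumulator r grows only when x has not been seen yet.
    return reduce(lambda r, x: r if x in r else r + [x], a + b, [])


print(union_distinct([4, 6, 2, 3, 7, 9, 11], [11, 4, 5, 8, 1, 9]))
```

The membership test makes this quadratic in the combined length, which is fine for short property lists but worth keeping in mind for large ones.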
Since Neo4j 3.0, with APOC Procedures you can easily solve this with apoc.coll.union(). In 3.1+ it's a function, and can be used like this:
...
WITH apoc.coll.union(list1, list2) as unionedList
...
I want to iterate through the relationships between the "beginning node" and the "end node".
Here is my Cypher query:
MATCH (ar1:Article)-[:PART_OF]->()-[:SERIES]->(s1),
(ar2:Article)-[:PART_OF]->()-[:SERIES]->(s2),
(ar1)-[:CREATOR]->(au1:Author),
(ar2)-[:CREATOR]->(au1:Author),
p1 = (au1)-[:CONTRIBUTOR*]->(au2:Author)
WITH REDUCE (weight = 0, edge IN relationships(p1)|weight + 1/edge.fdegree) AS
strength_au1_au2_p1,ar1 AS ar1,s1 AS s1,ar2 AS ar2,s2 AS s2,au1 AS au1,au2 AS au2
WHERE s1.name='WWW' AND s2.name='Pods' AND ar2.year >2010.0 AND ar1.year >2010.0
AND strength_au1_au2_p1<5.0
RETURN ar1,s1,ar2,s2,au1,au2,ar1.year AS calc_fuzzy_ar1_year_recent,ar2.year AS
calc_fuzzy_ar2_year_recent,strength_au1_au2_p1 AS calc_fuzzy_length_p1_short
Now I want to iterate through CONTRIBUTOR* relationships (in p1) and get each of its 'fdegree' and return the minimum value(fdegree) of relationships in p1.
Thank you all
Try this:
MATCH (au1:Author)<-[:CREATOR]-(ar1:Article)-[:PART_OF]->()-[:SERIES]->(s1),
(au2:Author)<-[:CREATOR]-(ar2:Article)-[:PART_OF]->()-[:SERIES]->(s2)
WHERE s1.name='WWW' AND s2.name='Pods' AND ar2.year >2010.0 AND ar1.year >2010.0
WITH au1,au2,ar1,ar2,s1,s2
MATCH (au1)-[rels:CONTRIBUTOR*]->(au2:Author)
WHERE REDUCE (weight = 0, edge IN rels | weight + 1/edge.fdegree) < 5.0
RETURN au1,au2,ar1,ar2,s1,s2,
REDUCE (weight = 1000000, edge IN rels |
case when weight < edge.fdegree then weight else edge.fdegree end) as min_degree
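The two REDUCE expressions in this answer (the summed weight and the running minimum) can be sketched in Python to show what each accumulator does. The function names are hypothetical, and `fdegrees` stands in for the list of fdegree values along one path; the sentinel 1000000 is taken from the query above.

```python
from functools import reduce


def path_strength(fdegrees):
    """Sum of 1/fdegree over the relationships of a path, starting from 0."""
    return reduce(lambda weight, d: weight + 1 / d, fdegrees, 0)


def min_fdegree(fdegrees, sentinel=1000000):
    """Running minimum; the sentinel is returned unchanged for an empty path."""
    return reduce(lambda weight, d: weight if weight < d else d, fdegrees, sentinel)


degrees = [0.5, 0.25, 0.8]
print(path_strength(degrees), min_fdegree(degrees))
```

The sentinel only works if every real fdegree is below it, so it is an assumption about the data rather than a general-purpose minimum.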