How to build a correlated subquery? - neo4j

I have something like:
create (:ex {name: "x", ver: 1.0});
create (:ex {name: "x", ver: 1.1});
create (:ex {name: "y", ver: 0.9});
and want to return the latest version for any given name i.e. x-1.1 and y-0.9. I've tried this:
match (n:ex), (m:ex) where m.name = n.name and m.ver = max(n.ver) return m
but neo hates me with:
Invalid use of aggregating function max(...) in this context (line 1, column (x) (offset: 61))
what's the correct approach here?
* Edit I *
I did also try stringing my versions together:
create (:ex {name: "x", ver: 1.0})-[:PrecededBy]->(:ex {name: "x", ver: 1.1});
match (n:ex {name: "x", ver: 1.1}) create (n)-[:PrecededBy]->(:ex {name: "x", ver: 1.2});
create (:ex {name: "y", ver: 0.9});
thinking I could use endNode() but that doesn't seem to work at all:
match (n:ex)-[r]-() return endNode(r)
returns 3 nodes!
* Edit II *
I thought something like this might work:
match p=(:ex)-[*]->(:ex) return last(nodes(p))
but clearly I don't understand last()

When you use aggregation functions like MAX() and COLLECT() (which are only valid in WITH and RETURN clauses), you can also specify one or more non-aggregating "grouping keys" in the same clause.
For example, to get the maximum version (max_ver) for every distinct name (the "grouping key"):
MATCH (n:ex)
RETURN n.name AS name, MAX(n.ver) AS max_ver;
[UPDATED]
On the other hand, if you want to get the node with the maximum version for each name, here is one way to do that:
MATCH (n:ex)
WITH n
ORDER BY n.ver DESC
WITH n.name AS name, COLLECT(n) AS ns
RETURN name, ns[0] AS latest;
This query orders all the nodes by descending version number, collects the nodes with the same name (maintaining the order), and returns a row with each name and the node having that name with the highest version number.
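To see why ordering before collecting works, here is a rough plain-Python model of the same steps. It is only a sketch: the `rows` list and the `latest_per_name` helper are made up for illustration and stand in for the three `:ex` nodes from the question.

```python
from operator import itemgetter

# Hypothetical stand-ins for the three :ex nodes in the question.
rows = [
    {"name": "x", "ver": 1.0},
    {"name": "x", "ver": 1.1},
    {"name": "y", "ver": 0.9},
]

def latest_per_name(nodes):
    """Mimic ORDER BY ver DESC, then COLLECT per name, then take ns[0]."""
    ordered = sorted(nodes, key=itemgetter("ver"), reverse=True)
    collected = {}
    for node in ordered:
        # Each list keeps the sorted order, so index 0 is the highest version.
        collected.setdefault(node["name"], []).append(node)
    return {name: ns[0] for name, ns in collected.items()}

latest = latest_per_name(rows)
print(latest)
```

The grouping key here is the name, just as `n.name` is the only non-aggregated column next to `COLLECT(n)` in the query.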

Related

Cypher: List declared before UNWIND-ing a second list becomes null after UNWIND-ing the second list and executing a MATCH which returns no results

I have the following scenario in a neo4j db:
There are tasks which can be assigned to different users based on some criteria. There's an optional criterion (some tasks have a filter for user's location, some don't).
I need to find all tasks for a user (if they have a location filter, I need to check user's location as well, if they don't I match only by the rest of the criteria).
I've tried to collect the tasks matching the mandatory criteria, then filter those which don't require the optional filter, then filter those which require the optional filter and match the current user and eventually merge the two lists.
Could you also suggest a more efficient way to do this please?
Here's a minimal example (of course, I have more complex matches after UNWIND)
WITH [{a: 'test'}, {a: 'a', b: 'b'}] AS initialList
WITH [i IN initialList WHERE i.b IS NULL] AS itemsWithoutB, initialList
UNWIND initialList AS item
MATCH (item) WHERE item.a IS NULL
RETURN COLLECT(item) + itemsWithoutB
I would expect here to have the content of itemsWithoutB returned, but I get no records (Response: []).
Note that if the MATCH done after UNWIND does actually return some records, then the content of itemsWithoutB is returned as well.
For example:
WITH [{a: 'test'}, {a: 'a', b: 'b'}] AS initialList
WITH [i IN initialList WHERE i.b IS NULL] AS itemsWithoutB, initialList
UNWIND initialList AS item
MATCH (item) WHERE item.a IS NOT NULL
RETURN COLLECT(item) + itemsWithoutB
this returns:
╒═════════════════════════════════════════════╕
│"COLLECT(item) + itemsWithoutB"              │
╞═════════════════════════════════════════════╡
│[{"a":"test"},{"a":"a","b":"b"},{"a":"test"}]│
└─────────────────────────────────────────────┘
Neo4j version: enterprise 3.5.6
What am I missing here, please?
---EDITED---
I'm adding here a more complex example, closer to the real scenario:
Generate initial data:
MERGE (d:Device {code: 'device1', latitude:90.5, longitude: 90.5})-[:USED_BY]->(u:User {name: 'user1'})-[:WORKS_IN]->(c:Country {code: 'RO'})<-[targets:TARGETS]-(:Task {name: 'task1', latitude: 90.5, longitude: 90.5, maxDistance: 1000, maxMinutesAfterLastInRange: 99999})<-[:IN_RANGE {timestamp: datetime()}]-(d)
MERGE (c)<-[:TARGETS]-(:Task {name: 'task2'})
MERGE (c)<-[:TARGETS]-(:Task {name: 'task4', latitude: 10.5, longitude: 10.5, maxDistance: 1, maxMinutesAfterLastInRange: 99999})
CREATE (:User {name: 'user2'})-[:WORKS_IN]->(:Country {code: 'GB'})<-[:TARGETS]-(:Task {name: 'task3'})
Here's a neo4j console link for this example.
I want to be able to use the same query to find the tasks for any user (task1 and task2 should be returned for user1, task3 for user2, and task4 shouldn't be returned for either of them).
The following query works for user1, but doesn't work if I change the user name filter to "user2":
MATCH (user:User {name: "user1"})-[:WORKS_IN]->(country)
OPTIONAL MATCH (device:Device)-[:USED_BY]->(user)
WITH country, device
MATCH (task:Task)-[:TARGETS]->(country)
WITH COLLECT(task) AS filteredTasks, device
WITH [t IN filteredTasks WHERE t.latitude IS NULL OR t.longitude IS NULL] AS matchedTasksWithoutLocationFilter, filteredTasks, device
UNWIND filteredTasks AS task
MATCH (device)-[inRange:IN_RANGE]->(task)
WHERE task.maxMinutesAfterLastInRange IS NOT NULL
AND duration.between(datetime(inRange.timestamp), datetime()).minutes <= task.maxMinutesAfterLastInRange
RETURN COLLECT(task) + matchedTasksWithoutLocationFilter AS matchedTasks
Updated answer based on new information
I think you can do this in one shot and not need list comprehensions.
MATCH (user: User {name: "user1" })-[:WORKS_IN]->(country)<-[:TARGETS]-(task: Task)
OPTIONAL MATCH (task)<-[inRange: IN_RANGE]-(device: Device)-[:USED_BY]->(user)
WITH task, inRange
MATCH (task)
WHERE (task.latitude IS NULL OR task.longitude IS NULL)
OR (inRange IS NOT NULL AND
task.maxMinutesAfterLastInRange IS NOT NULL AND
duration.between(datetime(inRange.timestamp), datetime()).minutes <= task.maxMinutesAfterLastInRange)
RETURN task
For user1:
╒══════════════════════════════════════════════════════════════════════╕
│"task"                                                                │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"task2"}                                                      │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"task1","maxDistance":1000,"maxMinutesAfterLastInRange":99999,│
│"latitude":90.5,"longitude":90.5}                                     │
└──────────────────────────────────────────────────────────────────────┘
For user2:
╒════════════════╕
│"task"          │
╞════════════════╡
│{"name":"task3"}│
└────────────────┘
Original answer
When your MATCH doesn't return any nodes (in that example, all nodes have an a property), the rest of the query's got no work to do - sort of like a failed inner join in a traditional SQL database.
If you switch to an OPTIONAL MATCH then you'll see results from itemsWithoutB irrespective of whether the MATCH worked. I know your example's synthetic so I'm not sure if that's what you're after - in your example the COLLECT(item) is going to be working off the item from UNWIND, and the result of the OPTIONAL MATCH is basically irrelevant. Still, imagine that these are real nodes with real queries:
WITH [{a: 'test'}, {a: 'a', b: 'b'}] AS initialList
WITH [i IN initialList WHERE i.b IS NULL] AS itemsWithoutB, initialList
UNWIND initialList AS item
OPTIONAL MATCH (item) WHERE item.a IS NULL
RETURN COLLECT(item) + itemsWithoutB
You may need to do some further work to de-duplicate the results.
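The inner-join comparison can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not how Neo4j works internally: the `match` and `optional_match` helpers and the `find_nothing` lookup are hypothetical, and each dict stands for one row flowing through the query.

```python
def match(rows, find):
    """Like MATCH: a row with no hits is dropped entirely."""
    return [dict(row, hit=h) for row in rows for h in find(row)]

def optional_match(rows, find):
    """Like OPTIONAL MATCH: a row with no hits survives with hit=None."""
    out = []
    for row in rows:
        hits = find(row) or [None]
        out.extend(dict(row, hit=h) for h in hits)
    return out

items_without_b = [{"a": "test"}]
rows = [
    {"item": {"a": "test"}, "itemsWithoutB": items_without_b},
    {"item": {"a": "a", "b": "b"}, "itemsWithoutB": items_without_b},
]

# Hypothetical lookup that never finds anything, like a MATCH with no results.
def find_nothing(row):
    return []

strict_rows = match(rows, find_nothing)
optional_rows = optional_match(rows, find_nothing)
print(len(strict_rows), len(optional_rows))
```

With no rows left after the strict version, there is nothing for a later aggregation to attach itemsWithoutB to, which is why the whole result comes back empty.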

Cypher - how to walk graph while computing

I'm just starting to study Cypher here.
How would I write a Cypher query that returns the node, between 1 and 3 hops away from an initial node, whose path from that node has the highest average weight?
Example
Graph is:
(I know I'm not using Cypher notation here.)
A-[2]-B-[4]-C
A-[3.5]-D
It would return D, because 3.5 > (2+4)/2
And with Graph:
A-[2]-B-[4]-C
A-[3.5]-D
A-[2]-B-[4]-C-[20]-E
A-[2]-B-[4]-C-[20]-E-[80]-F
It would return E, because (2+4+20)/3 > 3.5
and F is more than 3 hops away
One way to write the query, which has the benefit of being easy to read, is
MATCH p=(A {name: 'A'})-[*1..3]-(x)
UNWIND [r IN relationships(p) | r.weight] AS weight
RETURN x.name, avg(weight) AS avgWeight
ORDER BY avgWeight DESC
LIMIT 1
Here we extract the weights in the path into a list, and unwind that list. Try inserting a RETURN there to see what the results look like at that point. Because we unwind we can use the avg() aggregation function. By returning not only the avg(weight), but also the name of the last path node, the aggregation will be grouped by that node name. If you don't want to return the weight, only the node name, then change RETURN to WITH in the query, and add another return clause which only returns the node name.
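The unwind-then-aggregate step can be imitated in plain Python. This is a sketch only: the `paths` list is hand-written for the second example graph in the question (the paths within 3 hops of A), not produced by a query.

```python
from collections import defaultdict

# Hypothetical rows, one per matched path from A:
# (name of the last node, weights along the path).
paths = [
    ("B", [2]),
    ("D", [3.5]),
    ("C", [2, 4]),
    ("E", [2, 4, 20]),
]

# UNWIND: one row per (end node, weight) pair.
unwound = [(end, w) for end, weights in paths for w in weights]

# avg(weight), grouped by the non-aggregated column (the end node name).
groups = defaultdict(list)
for end, w in unwound:
    groups[end].append(w)
averages = {end: sum(ws) / len(ws) for end, ws in groups.items()}

# ORDER BY avgWeight DESC LIMIT 1
best = max(averages, key=averages.get)
print(best, averages[best])
```

Because the grouping is by end node name, weights from several paths that reach the same node would be pooled into one average. In this tree-shaped example each node is reached by exactly one path, so that does not matter.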
You can also add something like [n IN nodes(p) | n.name] AS nodesInPath to the return statement to see what the path looks like. I created an example graph based on your question, with nodes named A, B, C etc., using the query below.
CREATE (A {name: 'A'}),
(B {name: 'B'}),
(C {name: 'C'}),
(D {name: 'D'}),
(E {name: 'E'}),
(F {name: 'F'}),
(A)-[:R {weight: 2}]->(B),
(B)-[:R {weight: 4}]->(C),
(A)-[:R {weight: 3.5}]->(D),
(C)-[:R {weight: 20}]->(E),
(E)-[:R {weight: 80}]->(F)
1) To select the possible paths of length one to three, use MATCH with a variable-length relationship:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
2) Then use the reduce function to calculate the average weight, and sort and limit to get a single result:
MATCH p = (A {name: 'A'})-[*1..3]->(T)
WITH p, T,
// s starts at 0.0 so that the division is not integer division
reduce(s = 0.0, r IN relationships(p) | s + r.weight) / length(p) AS weight
RETURN T ORDER BY weight DESC LIMIT 1
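For comparison, here is the same per-path calculation as a small Python sketch. The `edges` dict is copied by hand from the CREATE statement above, and `paths_from` and `avg_weight` are hypothetical helpers. The walk does not track already-used relationships, which is only safe because this example graph has no cycles.

```python
from functools import reduce

# Directed, weighted edges copied from the CREATE statement above.
edges = {
    "A": [("B", 2), ("D", 3.5)],
    "B": [("C", 4)],
    "C": [("E", 20)],
    "E": [("F", 80)],
}

def paths_from(start, max_hops):
    """Yield (end node, weights) for every directed path of 1..max_hops hops."""
    stack = [(start, [])]
    while stack:
        node, weights = stack.pop()
        if weights:
            yield node, weights
        if len(weights) < max_hops:
            for nxt, w in edges.get(node, []):
                stack.append((nxt, weights + [w]))

def avg_weight(weights):
    # Same shape as the reduce(...) / length(p) expression in the query.
    return reduce(lambda s, w: s + w, weights, 0.0) / len(weights)

ends = [end for end, _ in paths_from("A", 3)]
best_node, best_weights = max(paths_from("A", 3), key=lambda p: avg_weight(p[1]))
print(best_node, avg_weight(best_weights))
```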

Neo4j Cypher match doesn't find node that it should

Having added nodes with properties "id" and "name"
CREATE (s:subsystem {id: 12, name:"InjectEolCurrentCellSOCs"})
CREATE (s:subsystem {id: 13, name:"InjectEolCellCapacities"})
CREATE (s:subsystem {id: 14, name:"InjectEolCellResistances"})
CREATE (s:subsystem {id: 15, name:"InjectEolCellSOCs"})
This command works/finds the node and returns the requested value:
match(n {id:13}) return (n.name);
But this command does not find a match:
match(n {name:"InjectEolCellCapacities"}) return (n);
Could this be related to the fact that "InjectEolCellCapacities" and "InjectEolCellResistances" have the same first 13 characters ?
If you look at your first image, you will see that you have saved the data as "InjectEolCellCapacities " (there is a space at the end).
So if you want to match it, you should use this query: MATCH (n:subsystem { name:"InjectEolCellCapacities "}) RETURN n
You can also search for all subsystem nodes whose name property starts with InjectEolCellCapacities, like this: MATCH (n:subsystem) WHERE n.name STARTS WITH 'InjectEolCellCapacities' RETURN n
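The effect of the trailing space is easy to reproduce outside the database. A minimal Python illustration, where `stored` stands for the value saved on the node and `wanted` for the value typed into the query:

```python
stored = "InjectEolCellCapacities "   # note the trailing space
wanted = "InjectEolCellCapacities"

exact = stored == wanted               # comparable to {name: "..."}
prefix = stored.startswith(wanted)     # comparable to STARTS WITH
trimmed = stored.strip() == wanted     # comparable to comparing trim(n.name)

print(exact, prefix, trimmed)
```

Only the exact comparison fails, which matches what the original query showed.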

How to match all paths that end with nodes with common properties in Neo4j?

I would like to match all paths from one given node.
        -->(c: {name:"*Tom*"})
       /
(a)-->(b)-->(d: {name:"*Tom*"})
       \
        -->(e: {name:"*Tom*"})
These paths have a specific structure:
- the names of all children of the second-to-last node (b) should contain the substring "Tom".
How do I write the correct Cypher?
Let's recreate the dataset:
CREATE
(a:Person {name: 'Start'}),
(b:Person),
(c:Person {name: 'Tommy Lee Jones'}),
(d:Person {name: 'Tom Hanks'}),
(e:Person {name: 'Tom the Cat'}),
(a)-[:FRIEND]->(b),
(b)-[:FRIEND]->(c),
(b)-[:FRIEND]->(d),
(b)-[:FRIEND]->(e)
As you said in the comment, all requires a list. To get a list, you should use the collect function on the neighbours of b:
MATCH (:Person)-[:FRIEND]->(b:Person)-[:FRIEND]->(bn:Person)
WITH b, collect(bn) AS bns
WHERE all(bn in bns where bn.name =~ '.*Tom.*')
RETURN b, bns
We call each of b's neighbours bn and collect them into the list bns.
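The collect-then-all check has a close analogue in Python. This is a sketch with a hand-written list standing in for bns. `re.fullmatch` is used because Cypher's =~ has to match the whole string, which is why the pattern needs the leading and trailing `.*`.

```python
import re

# Hypothetical stand-in for the collected neighbours of b.
friends_of_b = ["Tommy Lee Jones", "Tom Hanks", "Tom the Cat"]
pattern = re.compile(r".*Tom.*")

# Comparable to: WHERE all(bn IN bns WHERE bn.name =~ '.*Tom.*')
all_match = all(pattern.fullmatch(name) is not None for name in friends_of_b)

# Adding one neighbour without "Tom" makes the whole check fail.
mixed = all(pattern.fullmatch(name) is not None for name in friends_of_b + ["Jerry"])

print(all_match, mixed)
```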

Neo4j Passing distinct nodes through WITH in Cypher

I have the following query, where there are 3 MATCHES, connected with WITH, searching through 3 paths.
MATCH (:File {name: 'A'})-[:FILE_OF]->(:Fun {name: 'B'})-->(entry:CFGEntry)-[:Flows*]->()-->(expr:CallExpr {name: 'C'})-->()-[:IS_PARENT]->(Callee {name: 'd'})
WITH expr, entry
MATCH (expr)-->(:Arg {chNum: '1'})-->(id:Id)
WITH id, entry
MATCH (entry)-[:Flows*]->(:IdDecl)-[:Def]->(sym:Sym)
WHERE id.name = sym.name
RETURN id.name
The query returns two distinct id nodes, one distinct entry node, and 7 distinct sym nodes.
The problem is that since I pass "WITH id, entry" after the second MATCH, and two distinct id nodes were found, two instances of entry are passed to the third MATCH instead of one, and the run time of the third MATCH is unnecessarily at least doubled.
I am wondering if anyone knows how I should write this query so that it uses just one instance of entry.
Your best bet will be to aggregate id, but then you'll need to adjust your logic in the third part of your query accordingly:
MATCH (:File {name: 'A'})-[:FILE_OF]->(:Fun {name: 'B'})-->(entry:CFGEntry)-[:Flows*]->()-->(expr:CallExpr {name: 'C'})-->()-[:IS_PARENT]->(Callee {name: 'd'})
WITH expr, entry
MATCH (expr)-->(:Arg {chNum: '1'})-->(id:Id)
WITH collect(id.name) AS names, entry
MATCH (entry)-[:Flows*]->(:IdDecl)-[:Def]->(sym:Sym)
WHERE sym.name IN names
RETURN sym.name
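Why aggregating helps can be seen by counting rows. In this Python sketch the `rows` list is a made-up picture of what reaches the last MATCH in the original query: the same entry node once per id.

```python
# Hypothetical rows after the second MATCH: two ids, the same entry twice.
rows = [
    {"id": "p", "entry": "entry1"},
    {"id": "q", "entry": "entry1"},
]

# Comparable to: WITH collect(id.name) AS names, entry
# The non-aggregated column (entry) becomes the grouping key.
collapsed = {}
for row in rows:
    collapsed.setdefault(row["entry"], []).append(row["id"])

print(len(rows), "rows before,", len(collapsed), "row after")
```

The expensive variable-length MATCH then runs once per remaining row, so once for the single entry rather than once per id.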

Resources