I copied one of the example queries from the Neo4j Cypher aggregation class:
https://www.youtube.com/watch?v=wfMTg0ujVjk
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), sum(x)/NumActors
However, in the Neo4j web console provided by the class, I am getting this error:
Aggregation column contains implicit grouping expressions. For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b'. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause. Illegal expression(s): NumActors (line 6, column 8 (offset: 183))
"Return sum(x), sum(x)/NumActors"
^
Apparently, this query works:
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x)
So it complains when sum(x) is combined with NumActors, which it treats as an implicit grouping key. How can I update the query to achieve the same goal (computing both the sum and the average)? My initial query syntax looks perfectly fine to me...
Use the avg() function for this purpose:
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), avg(x)
Sample result:
╒══════════╤═════════╕
│"sum(x)" │"avg(x)" │
╞══════════╪═════════╡
│P276Y5M14D│P69Y1M11D│
└──────────┴─────────┘
reference:
https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-avg-duration
As for the explanation of your error: Neo4j only allows an aggregation expression if it meets the requirements stated in this guideline: https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#grouping-keys
If you insist on using your original query, this is the fix suggested in the Neo4j error message: use WITH to compute sum() first, then use that result to compute the average.
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
UNWIND Ages as x
WITH sum(x) as total, NumActors
RETURN total, total/NumActors as avg
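To try the two-step WITH pattern without the class dataset, here is a minimal sketch that swaps the Actor nodes for a literal list. The values 30, 40 and 50 are made up for illustration, and plain integers stand in for the durations:

```cypher
// Stand-in data (an assumption): three integer "ages" instead of Actor nodes
UNWIND [30, 40, 50] AS age
WITH count(age) AS NumActors, collect(age) AS Ages
UNWIND Ages AS x
// Aggregate here, carrying NumActors along as an explicit grouping key
WITH sum(x) AS total, NumActors
// No aggregation is left in RETURN, so mixing total and NumActors is fine
RETURN total, toFloat(total) / NumActors AS mean
// total = 120, mean = 40.0
```

The toFloat() call is only there because the stand-in values are integers; dividing a duration by a number, as in the query above, does not need it.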
This is a bit tricky, so I will try to simplify it as much as possible. Let's try to understand how aggregation works for this query:
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x)
We first match actors whose name starts with 'Tom' and who have a birth date. Let's say we have 10 actors after the MATCH and WHERE clauses execute. Now comes the statement:
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
This performs two aggregate operations, count and collect, but note that we have not specified any grouping key. Hence the output of this statement is a single row with two columns, NumActors and Ages. Next we unwind the Ages list, so we have 10 rows again. Finally, we return the sum; since no grouping key is specified, the sum is calculated over all the rows. Hence it works.
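The row counts at each stage can be observed with a small self-contained sketch. The literal list below is a stand-in for the matched actors (an assumption for illustration only):

```cypher
// Stand-in data (an assumption): three literal values instead of Actor nodes
UNWIND [30, 40, 50] AS age
// No grouping key: both aggregates collapse the three rows into one row
WITH count(age) AS NumActors, collect(age) AS Ages
// UNWIND expands that single row back into three rows
UNWIND Ages AS x
RETURN x, NumActors
```

This returns three rows (x = 30, 40, 50), and each of them carries NumActors = 3, so after the UNWIND the NumActors column is repeated on every row.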
Let's consider your query:
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), sum(x)/NumActors
Everything up to the UNWIND stage is the same as above. Now, notice the RETURN statement:
Return sum(x), sum(x)/NumActors
Here, too, no explicit grouping key is specified. So sum(x) includes all the rows, but the term sum(x)/NumActors is erroneous: NumActors is not specified as a grouping key, so Neo4j is unable to determine which row's value of NumActors to use. That's why you get the error. The message:
Aggregation column contains implicit grouping expressions. For example, in 'RETURN n.a, n.a + n.b + count(*)' the aggregation expression 'n.a + n.b + count(*)' includes the implicit grouping key 'n.b'. It may be possible to rewrite the query by extracting these grouping/aggregation expressions into a preceding WITH clause. Illegal expression(s): NumActors (line 6, column 8 (offset: 183))
"Return sum(x), sum(x)/NumActors"
clearly says that n.b acts as an implicit grouping key, so it can't be used inside an aggregation expression. Similarly here, NumActors acts as an implicit grouping key, so we can't use it in the aggregation expression. Try this:
Match (a:Actor)
Where a.born is not null
And a.name starts with 'Tom'
with count(a) as NumActors, collect(duration.between(date(a.born), date())) as Ages
Unwind Ages AS x
Return sum(x), sum(x)/COLLECT(DISTINCT NumActors)[0]
The error is caused by the fact that you used NumActors in the RETURN clause without making it an explicit grouping key.
OPTION 1
Specify NumActors as an explicit grouping key:
...
RETURN sum(x), sum(x)/NumActors, NumActors
OPTION 2
Don't do aggregation in the RETURN clause:
MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
WITH COUNT(a) AS numActors, SUM(duration.between(DATE(a.born), DATE())) as sumAges
RETURN sumAges, sumAges/numActors
NOTE: The code above also simplifies (and speeds up) the query by doing the SUM in the first (and now, only) WITH clause. You don't need the wasteful COLLECT -> UNWIND -> SUM sequence.
OPTION 3
If you don't need to have the sum of all the ages returned, just directly calculate the average age:
MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
RETURN AVG(duration.between(DATE(a.born), DATE())) as avgAge
ADDENDUM
There is another performance improvement that could be made to all the solutions, and it would also remove a possible error.
We should calculate the DATE() value once, put it in a variable, and use that variable to calculate all ages. Not only would it avoid repeated calls to DATE(), but it would guarantee that all ages are calculated using the same current date.
Without this tweak, it is possible for more than one current date to be used during the calculation, which would give us an inconsistent result.
So, for instance, Option 3 could be turned into this:
WITH DATE() AS now
MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
RETURN AVG(duration.between(DATE(a.born), now)) as avgAge
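The same tweak can be applied to Option 2. The following is a sketch that assumes the same Actor data as above; the variable name now is arbitrary:

```cypher
// Evaluate the current date once and reuse it for every actor
WITH date() AS now
MATCH (a:Actor)
WHERE a.born IS NOT NULL AND a.name STARTS WITH 'Tom'
// now is used only inside the aggregate's argument, so it does not act as a grouping key
WITH count(a) AS numActors, sum(duration.between(date(a.born), now)) AS sumAges
RETURN sumAges, sumAges / numActors AS avgAge
```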
I am working with email data and would like to extend an existing query. I would like to add a condition that lastseen (which is stored as a DateTime) falls within the last 15 minutes, without typing the full datetime range into the Cypher statement. Something like (lastseen - (datetime() - 15MM)).
Below are the properties of a sample Sender node:
<id>:12662 domain:corp.com firstseen:"2020-01-14T06:02:33Z" lastseen:"2020-01-14T06:25:45Z" name:person#corp.com timesseen:300
The following is the query into which I would like to incorporate the time condition:
MATCH path = (s:Sender)-->(a:Attachment)-->(:Recipient)
WITH s, COUNT(DISTINCT a) AS cnt, COLLECT(path) AS paths
WHERE cnt >= 2
return paths
You could use the duration function to get your range.
The duration string PT900S is 900 seconds (15 minutes).
WITH datetime() AS end
WITH end, end - duration("PT900S") AS start
RETURN start, end, duration.between(end, start)
Incorporated in your query it might look something like this...
MATCH path = (s:Sender)-->(a:Attachment)-->(:Recipient)
WHERE datetime() - duration("PT900S") <= s.lastseen <= datetime()
WITH s, COUNT(DISTINCT a) AS cnt, COLLECT(path) AS paths
WHERE cnt >= 2
RETURN paths
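A variant of the same idea, sketched under the assumption that lastseen really is a DateTime property: compute the window once up front and spell the 15 minutes with the map form of duration(). The names now and cutoff are illustrative only:

```cypher
// Compute both ends of the window once, before matching
WITH datetime() AS now
WITH now, now - duration({minutes: 15}) AS cutoff
MATCH path = (s:Sender)-->(a:Attachment)-->(:Recipient)
// Keep only senders seen in the last 15 minutes
WHERE cutoff <= s.lastseen <= now
WITH s, count(DISTINCT a) AS cnt, collect(path) AS paths
WHERE cnt >= 2
RETURN paths
```

If lastseen is actually stored as an ISO 8601 string, as the quoted sample values might suggest, comparing datetime(s.lastseen) instead of s.lastseen would probably be needed.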
When creating a query in Google Sheets, I'm finding that hardcoding is working fine but using several references isn't working correctly.
Cell A2 = 0.75 (from a formula =(mround(Estimator!$C$4/57.2958,0.25)), type = number)
Cell B2 = 0.9 (from a formula =(mround(Estimator!$C$5+100,0.1)-100) type = number)
Specifically, the query below works:
=query(Time_Data, "SELECT N, O, P WHERE A=0.75 AND B=0.9")
And the query below works:
=query(Time_Data, "SELECT N, O, P WHERE A="&$A$2&" AND B=0.9")
But this query does not work:
=query(Time_Data, "SELECT N, O, P WHERE A="&0.75&" AND B="&$B$2)
And most importantly, this query does not work:
=query(Time_Data, "SELECT N, O, P WHERE A="&$A$2&" AND B="&$B$2)
Any suggestions about how to get this reference to work?
It would help if we could see your data. But maybe try FILTER() and see if that works?
=FILTER(N:P, A:A=A2, B:B=B2)
This is the correct syntax:
=QUERY(Time_Data; "SELECT N, O, P WHERE A matches '"&$A$2&"' AND B matches '"&$B$2&"'")
and if by any chance it wouldn't work try:
=QUERY(Time_Data; "SELECT N, O, P WHERE A matches '"&INDIRECT("A2")&"'
AND B matches '"&INDIRECT("B2")&"'")
I called Google support, and they advised the following: change the formula in cell B2 from
=(mround(Estimator!$C$5+100,0.1)-100)
to
=Value((mround(Estimator!$C$5+100,0.1)-100))
This resolved the issue.
I need to get nodes of one label that either have no relationship to nodes of another label, or have one that satisfies a condition on that other label. I found the following answer:
MATCH (n:Label1)
WHERE NOT (n)-[:REL_1]-(:Label2)
OR (n)-[:REL_1]-(e:Label2 {id:1})
RETURN count(DISTINCT(n))
But what I need is for all data with id >= 5 to appear in the result.
If I perform a query like:
MATCH (n:Label1)
WHERE NOT (n)-[:REL_1]-(:Label2)
OR (n)-[:REL_1]-(e:Label2)
WHERE e.id >= 5
RETURN count(DISTINCT(n))
It produces this error:
Invalid input 'H': expected 'i/I' (line 1, column 94 (offset: 93))
[UPDATED]
A Cypher query cannot have 2 WHERE clauses in a row. In addition, you have not defined the e identifier.
This query should work (I assume you only want to count n if it does not have such a relationship OR if it has at least one in which e.id is at least 5):
MATCH (n:Label1)
OPTIONAL MATCH (n)-[:REL_1]-(e:Label2)
WITH n, e
WHERE e IS NULL OR e.id >= 5
RETURN count(DISTINCT n);
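As an alternative that stays closer to the shape of the original query, here is a sketch using an existential subquery. It assumes Neo4j 4.0 or later, where EXISTS { ... } is available, and the same interpretation of the requirement as above:

```cypher
MATCH (n:Label1)
// Keep n if it has no REL_1 relationship to any Label2 node...
WHERE NOT (n)-[:REL_1]-(:Label2)
   // ...or if at least one related Label2 node has id >= 5
   OR EXISTS { MATCH (n)-[:REL_1]-(e:Label2) WHERE e.id >= 5 }
RETURN count(n)
```

Because the filter lives inside WHERE, each n produces at most one row, so DISTINCT is not needed in the count.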
You can nest WHERE filters, you just have to give up syntactic sugar to do it.
MATCH (n:Label1)
WHERE ALL(p IN (n) - [:REL_1] - (:Label2) WHERE LAST(NODES(p))['id'] >= 5)
RETURN COUNT(n)
You can reconstruct any match-filter construct you can dream of with the ANY, ALL, NONE toolset, which lets you apply filters internally as well and nest the iterable component (between IN and WHERE) to multiple depths.
I'm trying to write a Cypher query that uses aggregation to pull back the most relevant paths. My desired path is described in the MATCH clause below
MATCH p=(a:TYP1)--(b:TYP2)--(c:TYP3)--(d:TYP4)
RETURN a, count(distinct d) as cntTYP4
ORDER BY cntTYP4 DESC
This produces a list of nodes of TYP1 sorted in descending order by the number of TYP4 nodes that they link to in the MATCH clause. What I would like to do is return all paths p where cntTYP4 > 5 (for example). My attempts to structure a query this far have been unsuccessful. Hopefully I'm missing something obvious!
You can use WITH to do this. Something like:
MATCH p=(a:TYP1)--(b:TYP2)--(c:TYP3)--(d:TYP4)
WITH a, count(distinct d) as cntTYP4
WHERE cntTYP4 > 5
RETURN a, cntTYP4
ORDER BY cntTYP4 DESC
HTH,
Andrés
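If the paths themselves are needed rather than just the TYP1 nodes, one possible extension of the WITH approach is to collect the paths alongside the count and unwind them after filtering. This is a sketch; the variable name paths is illustrative:

```cypher
MATCH p=(a:TYP1)--(b:TYP2)--(c:TYP3)--(d:TYP4)
// Group by a, keeping every matched path for that a
WITH a, count(DISTINCT d) AS cntTYP4, collect(p) AS paths
WHERE cntTYP4 > 5
// Expand the surviving groups back into one row per path
UNWIND paths AS p
RETURN p, cntTYP4
ORDER BY cntTYP4 DESC
```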