aggregate count changes when multiple returns - neo4j

I am sending a Cypher query through PHP:
match (n:person)-[:watched]->(m:movie)
where m.Title in $mycollection
return count(distinct n.id);
This returns the number of people who have watched movies in my collection.
I actually want to return the list of names, and return n.name works fine.
When I try to return n.name and count(distinct n.id) at the same time, I lose the total count and get a count per row instead:
match (n:person)-[:watched]->(m:movie)
where m.Title in $mycollection
return n.name, count(distinct n.id);
This does not work: the count column shows 1 for each row.
As I'm using PHP, I've also tried:
$count = $result->getNodesCount();
to no avail. So I'm using PHP to count the array, but it feels like Cypher should be able to do it, right?

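return n.name, count(distinct n.name) means "return each distinct n.name value together with the number of distinct n.name values in its group". That number must always be 1, because n.name is the grouping key and each group therefore contains exactly one distinct name. The same happens with count(distinct n.id), as in your query, as long as each name belongs to a single person.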
If you are actually looking for the number of times each person had an outgoing relationship to a movie whose title is in $mycollection, do this instead (where count(*) counts the number of times a given n.name was matched):
MATCH (n:person)-->(m:movie)
WHERE m.Title in $mycollection
RETURN n.name, count(*);
Note that the above query omits the [watched] pattern found in your query, since that syntax (with no colon before watched) does no filtering at all. It merely assigns the relationship to a variable named watched, but that variable is not otherwise used, and is therefore superfluous.
If you had intended to use watched as the relationship type, then do this instead:
MATCH (n:person)-[:watched]->(m:movie)
WHERE m.Title in $mycollection
RETURN n.name, count(*);
This modified query returns the number of times each person watched a movie whose title is in $mycollection.
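To see why the count collapses to 1, it can help to model Cypher's implicit grouping outside the database. The sketch below is plain Python, not Neo4j or PHP client code, and the sample rows and function names are made up for illustration. It assumes only the rule described above: every non-aggregated return column acts as a grouping key.

```python
from collections import defaultdict

# Hypothetical matched rows (person id, person name, movie title), standing in
# for the rows produced by MATCH (n:person)-[:watched]->(m:movie).
rows = [
    (1, "Ann", "Alien"),
    (1, "Ann", "Heat"),
    (2, "Bob", "Alien"),
    (3, "Cid", "Heat"),
]

def count_distinct_ids_by_name(rows):
    """Model of RETURN n.name, count(DISTINCT n.id): n.name is the grouping key."""
    groups = defaultdict(set)
    for person_id, name, _title in rows:
        groups[name].add(person_id)
    return {name: len(ids) for name, ids in groups.items()}

def count_rows_by_name(rows):
    """Model of RETURN n.name, count(*): counts the matched rows per name."""
    counts = defaultdict(int)
    for _person_id, name, _title in rows:
        counts[name] += 1
    return dict(counts)

def names_and_total(rows):
    """Model of aggregating with no grouping key: one list plus its size."""
    names = sorted({name for _person_id, name, _title in rows})
    return names, len(names)

per_name_distinct = count_distinct_ids_by_name(rows)
per_name_rows = count_rows_by_name(rows)
names, total = names_and_total(rows)
```

In Cypher, the last shape roughly corresponds to aggregating the names into a single list, for example with collect(DISTINCT n.name), and then taking size() of that list, so that the names and the total come back in one row.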

Related

Return count of relationships instead of all

I wonder if anyone can advise how to adjust the following query so that it returns one relationship with a count of the number of actual relationships rather than every relationship? I have some nodes with many relationships and it's killing the graph's performance.
MATCH (p:Provider{countorig: "XXXX"})-[r:supplied]-(i:Importer)
RETURN p, i limit 100
Many thanks
To return the relationship name along with a count, change your "return" statement, like this:
MATCH (p:Provider{countorig: "XXXX"})-[r:supplied]-(i:Importer)
RETURN type(r), count(r)
Using type(r) will return the type of the relationship, which looks to be "supplied" in your example. And then count(r) is just using the built-in function to count the number of occurrences of that relationship in the query.

Neo4j Cypher- With clause query

I'm working through some exercises on Neo4j's movies dataset. The question was:
Retrieve the actors who have acted in exactly five movies, returning the name of the actor, and the list of movies for that actor.
I wrote the following query, but I'm not getting any result; it shows "no changes, no records":
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a,m, count(m) AS numMovies
WHERE numMovies = 5
RETURN a.name,collect(m.title) AS movies
However, when I wrote the following query for the same statement, this time putting "collect(m.title) AS movies" in the WITH clause, I got the desired result:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, movies
My question is: why does the result differ when I put "collect(m.title) AS movies" in the RETURN clause?
Your first query's WITH clause contains a, m, count(m). Because a and m are both grouping keys, count(m) is computed per (a, m) pair and is 1 for every row, so WHERE numMovies = 5 filters out everything.
You can check this by returning the same expressions directly:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a, m, count(m) AS numMovies
The solution is to remove the separate m variable from the WITH clause as shown in your second query.
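The difference between the two WITH clauses can be reproduced with a small grouping model. This is a plain-Python sketch with made-up data and function names, not something run against the movies dataset; it only assumes that non-aggregated columns in WITH form the grouping key.

```python
from collections import defaultdict

# Hypothetical (actor, movie title) pairs standing in for the MATCH result:
# "Ann" acted in five movies, "Bob" in two.
pairs = [("Ann", f"Film {i}") for i in range(1, 6)]
pairs += [("Bob", "Film 1"), ("Bob", "Film 2")]

def group_by_actor_and_movie(pairs):
    """Model of WITH a, m, count(m): the grouping key is the (a, m) pair."""
    counts = defaultdict(int)
    for actor, title in pairs:
        counts[(actor, title)] += 1
    return dict(counts)

def group_by_actor(pairs):
    """Model of WITH a, count(m), collect(m.title): the grouping key is a."""
    titles_by_actor = defaultdict(list)
    for actor, title in pairs:
        titles_by_actor[actor].append(title)
    return {actor: (len(titles), titles) for actor, titles in titles_by_actor.items()}

# WHERE numMovies = 5 applied to each grouping.
first_query = {key: n for key, n in group_by_actor_and_movie(pairs).items() if n == 5}
second_query = {a: titles for a, (n, titles) in group_by_actor(pairs).items() if n == 5}
```

With the (actor, movie) key every count is 1, so the filter leaves nothing; with the actor key alone, the count and the collected list describe the same group.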

Why is DISTINCT needed in this Cypher query?

The query below is taken from the Neo4j movie review dataset sandbox:
MATCH (u:User {name: "Some User"})-[r:RATED]->(m:Movie)
WITH u, avg(r.rating) AS mean
MATCH (u)-[r:RATED]->(m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE r.rating > mean
WITH u, g, COUNT(*) AS score
MATCH (g)<-[:IN_GENRE]-(rec:Movie)
WHERE NOT EXISTS((u)-[:RATED]->(rec))
RETURN rec.title AS recommendation, rec.year AS year, COLLECT(DISTINCT g.name) AS genres, SUM(score) AS sscore
ORDER BY sscore DESC LIMIT 10
What I cannot understand is why the DISTINCT keyword is required in the query's RETURN clause, because the expected result of the last MATCH clause is something like this:
g1,x
g1,y
...
g2,z
g2,v
g2,m
...
gn,m
gn,b
gn,x
where g1, g2, ..., gn are the genres and x, y, z, v, m, b, ... are movies (the user and score columns are omitted for readability).
So, as I understand it, the query returns, for each movie, its genres and the sum of their scores.
Assumptions:
Every Movie has a unique title. (This is required for the query to work as is.)
Every Genre has a unique name.
Every Movie has at most one IN_GENRE relationship to each distinct Genre.
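Given the above assumptions, you are correct that the DISTINCT is not necessary. That is because the RETURN clause is using rec.title as one of the aggregation grouping keys, so each result row aggregates over a single movie, and each genre appears at most once among that movie's rows.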

Get the count of multiple properties in neo4j

I am trying to combine two Cypher queries into one for performance but have not succeeded.
I need to get the counts for multiple properties, independently of each other, in the same query.
EX 1:
Match (n)
RETURN n.foo, count(*) AS count
EX 2:
Match (n)
RETURN n.bar, count(*) AS count
I was hoping I could just run both:
Match (n)
RETURN n.foo, count(*) AS fooCount, n.bar, count(*) AS barCount
But this returns the same count for both as it is finding where they both match. Not what I want.
So was looking for a way to group them to be unique like:
Match (n)
RETURN {n.foo, count(*) AS fooCount}, {n.bar, count(*) AS barCount}
Obviously this is not valid syntax but shows what I am trying to do.
Any assistance on this is of course appreciated.
It's best to run these back to back. Doing it all in one aggregation isn't a good idea for this kind of query, because every non-aggregated column becomes a grouping key, so the counts would be per combination of foo and bar.
You could try this:
MATCH (n)
WITH n.bar as bar, count(*) AS count
WITH collect({bar:bar, count:count}) as barCounts
MATCH (n)
WITH barCounts, n.foo as foo, count(*) AS count
WITH barCounts, collect({foo:foo, count:count}) as fooCounts
RETURN barCounts, fooCounts
Since you are trying to aggregate separate query results, you can also use UNION as a quick and easy way to return both at the same time.
Match (n)
RETURN "foo" as type, n.foo as value, count(*) AS count
UNION ALL
Match (n)
RETURN "bar" as type, n.bar as value, count(*) AS count
Just a few notes: both RETURN clauses of a UNION must have the same column names.
Also, the "type" column in the example isn't necessary, but it shows how you can add filler if both queries don't have the same number of return columns. (Or if you want to tell which query the result is from.) If there is a "foo" and a "bar" with the same value+count, UNION ALL will keep both, and UNION will drop the duplicate (if you remove the type column).
This may be outdated, but in case someone needs it, I've found another approach using an APOC function, which avoids running the same MATCH (n) multiple times. In your case, it could be something like:
MATCH (n)
WITH collect(n.bar) as bars, collect(n.foo) as foos
WITH apoc.coll.frequenciesAsMap(bars) as barCounts, apoc.coll.frequenciesAsMap(foos) as fooCounts
RETURN barCounts, fooCounts
Single MATCH, multiple counts.
Hope this helps someone!
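The single-pass idea behind the APOC approach can be sketched in plain Python. This is only a model of "scan the nodes once, keep one frequency table per property"; it does not call Neo4j or APOC, and the sample nodes and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical nodes represented as property dictionaries.
nodes = [
    {"foo": "a", "bar": 1},
    {"foo": "a", "bar": 2},
    {"foo": "b", "bar": 2},
    {"foo": "b"},  # no "bar" property on this node
]

def property_frequencies(nodes, keys):
    """Scan the nodes once and keep an independent counter per property."""
    counters = {key: Counter() for key in keys}
    for node in nodes:
        for key in keys:
            # Skip missing properties, similar to how collect() ignores nulls.
            if key in node:
                counters[key][node[key]] += 1
    return {key: dict(counter) for key, counter in counters.items()}

frequencies = property_frequencies(nodes, ["foo", "bar"])
```

Each property gets its own table, so the foo counts are not affected by which bar values happen to occur on the same node.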

Limit the results of a union cypher query

Let's say we have the example query from the documentation:
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
I know that if I do that:
MATCH (n:Actor)
RETURN n.name AS name
LIMIT 5
UNION
MATCH (n:Movie)
RETURN n.title AS name
LIMIT 5
I can reduce the returned results of each sub query to 5. How can I LIMIT the total results of the union query?
This is not yet possible, but there is already an open neo4j issue that requests the ability to do post-UNION processing, which includes what you are asking about. You can add a comment to that neo4j issue if you support having it resolved.
This can be done using UNION post processing by rewriting the query using the COLLECT function and the UNWIND clause.
First we turn the columns of a result into a map (struct, hash, dictionary), to retain its structure. For each partial query we use the COLLECT to aggregate these maps into a list, which also reduces our row count (cardinality) to one (1) for the following MATCH. Combining the lists is a simple list concatenation with the “+” operator.
Once we have the complete list, we use UNWIND to transform it back into rows of maps. After this, we use the WITH clause to deconstruct the maps into columns again and perform operations like sorting, pagination, filtering or any other aggregation or operation.
The rewritten query will be as below:
MATCH (n:Actor)
// Actors are identified by name, movies by title
WITH collect({name: n.name}) AS actorRows
MATCH (n:Movie)
WITH actorRows, collect({name: n.title}) AS movieRows
UNWIND actorRows + movieRows AS row
WITH row.name AS name
RETURN name LIMIT 5
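The collect, concatenate, unwind and limit steps described above can be modelled with ordinary lists. The following is a plain-Python sketch with made-up data and a hypothetical function name, not a Neo4j driver example. The distinct flag is an addition for illustration: plain list concatenation keeps duplicates, which matches UNION ALL rather than UNION.

```python
# Hypothetical actor and movie records standing in for the two MATCH results.
actors = [{"name": "A1"}, {"name": "A2"}, {"name": "Shared"}]
movies = [{"title": "Shared"}, {"title": "M1"}]

def union_then_limit(actors, movies, limit, distinct=False):
    """Model of collect + list concatenation + UNWIND + LIMIT."""
    actor_rows = [{"name": a["name"]} for a in actors]    # collect({name: n.name})
    movie_rows = [{"name": m["title"]} for m in movies]   # collect({name: n.title})
    combined = actor_rows + movie_rows                    # "+" list concatenation
    names = [row["name"] for row in combined]             # UNWIND, then row.name AS name
    if distinct:
        # Drop duplicates while keeping first-seen order, as UNION would dedupe.
        names = list(dict.fromkeys(names))
    return names[:limit]                                  # LIMIT applies to the whole result

limited_all = union_then_limit(actors, movies, 4)
limited_distinct = union_then_limit(actors, movies, 4, distinct=True)
```

Because the limit is applied after the two lists are combined, it caps the total number of rows rather than the rows of each part.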
This is possible as of Neo4j 4.0.0:
CALL {
MATCH (p:Person) RETURN p
UNION
MATCH (p:Person) RETURN p
}
RETURN p.name, p.age ORDER BY p.name
Read more about post-union processing here: https://neo4j.com/docs/cypher-manual/4.0/clauses/call-subquery/
