Let's say we have the example query from the documentation:
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
I know that if I do that:
MATCH (n:Actor)
RETURN n.name AS name
LIMIT 5
UNION
MATCH (n:Movie)
RETURN n.title AS name
LIMIT 5
I can reduce the returned results of each sub query to 5.How can I LIMIT the total results of the union query?
This is not yet possible, but there is already an open neo4j issue that requests the ability to do post-UNION processing, which includes what you are asking about. You can add a comment to that neo4j issue if you support having it resolved.
This can be done using UNION post processing by rewriting the query using the COLLECT function and the UNWIND clause.
First we turn the columns of a result into a map (struct, hash, dictionary), to retain its structure. For each partial query we use the COLLECT to aggregate these maps into a list, which also reduces our row count (cardinality) to one (1) for the following MATCH. Combining the lists is a simple list concatenation with the “+” operator.
Once we have the complete list, we use UNWIND to transform it back into rows of maps. After this, we use the WITH clause to deconstruct the maps into columns again and perform operations like sorting, pagination, filtering or any other aggregation or operation.
The rewritten query will be as below:
MATCH (n:Actor)
with collect ({name: n.title}) as row
MATCH (n:Movie)
with row + collect({name: n.title}) as rows
unwind rows as row
with row.name as name
return name LIMIT 5
This is possible in 4.0.0
CALL {
MATCH (p:Person) RETURN p
UNION
MATCH (p:Person) RETURN p
}
RETURN p.name, p.age ORDER BY p.name
Read more about Post-union processing here https://neo4j.com/docs/cypher-manual/4.0/clauses/call-subquery/
Related
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.
I am sending a cypher query through php.
match (n:person)-[:watched]->(m:movie)
where m.Title in $mycollection
return count(distinct n.id);
this returns the number of people who have watched movies in my collection.
I actually want to return the list of names, and return n.name works fine.
When I try to return n.name and count(distinct n.id) at the same time, I lose the total count and get the count per row.
match (n:person)-[:watched]->(m:movie)
where m.Title in $mycollection
return n.name, count(distinct n.id);
does not work. The count column appears as 1 for each row.
As I'm using php, I've also tried:
$count = $result->getNodesCount();
to no avail. So I'm using php to count the array. But it feels like Cypher should be able to do it, right?
return n.name, count(distinct n.name) means "return each distinct n.name value and its number of distinct values". The number must always be 1, since a distinct value is, obviously, distinct.
If you are actually looking for the number of times each person had an outgoing relationship to a movie whose title is in $mycollection, do this instead (where count(*) counts the number of times a given n.name was matched):
MATCH (n:person)-->(m:movie)
WHERE m.Title in $mycollection
RETURN n.name, count(*);
Note that the above query omits the [watched] pattern found in your query, since that syntax (with no colon before watched) does no filtering at all. It merely assigns the relationship to a variable named watched, but that variable is not otherwise used, and is therefore superfluous.
If you had intended to use watched as the relationship type, then do this instead:
MATCH (n:person)-[:watched]->(m:movie)
WHERE m.Title in $mycollection
RETURN n.name, count(*);
This modified query returns the number of times each person watched a movie whose title is in $mycollection
I am trying to combine 2 cyphers into one for performance but have not succeeded.
I need to get the count of multiple properties unique to eachother in the same cypher.
EX 1:
Match (n)
RETURN n.foo, count(*) AS count
EX 2:
Match (n)
RETURN n.bar, count(*) AS count
I was hoping I could just run both:
Match (n)
RETURN n.foo, count(*) AS fooCount, n.bar, count(*) AS barCount
But this returns the same count for both as it is finding where they both match. Not what I want.
So was looking for a way to group them to be unique like:
Match (n)
RETURN {n.foo, count(*) AS fooCount}, {n.bar, count(*) AS barCount}
Obviously this is not valid syntax but shows what I am trying to do.
Any assistance on this is of course appreciated.
It's best to do this back to back, all at once isn't a good idea for this kind of query, as aggregation won't work in your favor.
You could try this:
MATCH (n)
WITH n.bar as bar, count(*) AS count
WITH collect({bar:bar, count:count}) as barCounts
MATCH (n)
WITH barCounts, n.foo as foo, count(*) AS count
WITH barCounts, collect({foo:foo, count:count}) as fooCounts
RETURN barCounts, fooCounts
Since you are trying to aggregate separate query results, you can also use UNION as a quick and easy way to return both at the same time.
Match (n)
RETURN "foo" as type, n.foo as value, count(*) AS count
UNION ALL
Match (n)
RETURN "bar" as type, n.bar as value, count(*) AS count
Just a few notes, both returns for a UNION must have the same column names.
Also, the "type" column in the example isn't necessary, but it shows how you can add filler if both queries don't have the same number of return columns. (Or if you want to tell which query the result is from.) If there is a "foo" and a "bar" with the same value+count, UNION ALL will keep both, and UNION will drop the duplicate (if you remove the type column).
Maybe it's outdated, but just in case someone needs it, I've found another approach using an APOC function which avoids running multiple times the same MATCH (n). In your case, it could be something like:
MATCH (n)
WITH collect(n.bar) as bars, collect(n.foo) as foos
WITH apoc.coll.frequenciesAsMap(bars) as barCounts, apoc.coll.frequenciesAsMap(foos) as fooCounts
RETURN barCounts, fooCounts
Single MATCH, multiple Counts.
Wish it could help someone!
I want to create a map projection with node properties and some additional information.
Also I want to collect some ids in a collection and use this later in the query to filter out nodes (where ID(n) in ids...).
The map projection is created in an apoc call which includes several union matches.
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
MATCH (n)-[:has]->({"Vacation"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u, nodeWithInfo
This does not return anything because, when the where part is evaluated it doesn´t check "nodesWithAdditionalInfosIds" as a flat list but instead only gets one id per row.
The problem only exists because I am passing the ids (nodesWithAdditionalInfosIds) AND the nodeProjection (nodeWithInfo) on in the WITH clause.
If I instead only use the id collection and don´t use the nodeWithInfo projection the following adjustement works and returns my only the nodes which ids are in the id collection:
...
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
MATCH (n)-[:has]->({"Urlaub"})
MATCH (u)-[:is]->({"Out of Order"})
WHERE ID(n) in nodesWithAdditionalInfosIds and ID(u) in nodesWithAdditionalInfosIds
return n, u
If I just return the collection "nodesWithAdditionalInfosIds" directly after the WITH clause in both examples this gets obvious. Since the first one generates a flat list in one result row and the second one gives me one id per row.
I have the feeling that I am missing a crucial knowledge about neo4js With clause.
Is there a way I can pass on my listOfIds and use it as a flat list without the need to have an exclusive WITH clause for the collection?
edit:
Right now I am using the following workaround:
After I do the check on the ID of "n" and "u" I don´t return but instead keep the filtered "n" and "u" nodes and start a second apoc call that returns "nodeWithInfo" like before.
WITH n, u
call apoc.cypher.run('MATCH (n)-[:IS_A]->({name: "User"}) MATCH (add)-[:IS_A]->({name: "AdditionalInformationForUser"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo UNION MATCH (n)-[:IS_A]->({Department}) MATCH (add)-[:IS_A]->({"AdditionalInformationForDepartment"}) RETURN n{.*, info: collect(add.name), id: ID(n)} as nodeWithInfo', NULL) YIELD value
WITH (value.nodeWithInfo) AS nodeWithInfo, n, u
WHERE nodeWithInfo.id = ID(n) OR nodeWithInfo.id = ID(u)
RETURN nodeWithInfo, n, u
This way I can return the nodes n, u and the additional information (to one of the nodes) per row. But I am sure there must be a better way.
I know ids in neo4j have to be used with care, if at all. In this case I only need them to be valid inside this query, so it doesn´t matter if the next time the same node has another id.
The problem is stripped down to the core problem (in my opinion), the original query is a little bigger with several UNION MATCH inside apoc and the actual match on nodes which ids are contained in my collection is checking for some more restrictions instead of asking for any node.
Aggregating functions, like COLLECT(), aggregate over a set of "grouping keys".
In the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds, nodeWithInfo
the grouping key is nodeWithInfo. Therefore, each nodesWithAdditionalInfosIds would always be a list containing one value.
And in the following clause:
WITH collect(nodeWithInfo.id) as nodesWithAdditionalInfosIds
there is no grouping key. Therefore, in this situation, nodesWithAdditionalInfosIds will contain all the nodeWithInfo.id values.
I have a Cypher query which I'd like to expand to be summed up over a list of matching nodes.
My query looks like this:
MATCH (u:User {name: {input} })-[r:USES]-(t) RETURN SUM(t.weight)
This matches one User node, and I'd like to adjust it to match a list of User nodes and then perform the aggregation.
My current implementation just calls the query in a loop and performs the aggregation outside of Cypher. This results in slightly inaccurate results and a lot of API calls.
Is there a way to evaluate the Cypher query against a list of elements (strings in my case)?
I'm using Neo4j 2.1 or 2.2.
Cheers
You can use the IN operator and pass in an array of usernames:
MATCH (u:User)-[r:USES]-(t)
WHERE u.name in ['John','Jim','Jack']
RETURN u.name, SUM(t.weight)
Instead of the array here you can use an parameter holding and array value as well:
MATCH (u:User)-[r:USES]-(t)
WHERE u.name in {userNames}
RETURN u.name, SUM(t.weight)
userNames = ['John','Jim','Jack']