I'm starting studying the Neo4j and the Cypher Queries, and I need some help to understand and get some results.
I have this MATCH query:
MATCH (p:EntidadePessoa {id: 168750})-[rd:SE_RELACIONA]->(d:Documento)<--(p2)
where p2:EntidadePessoa or p2:EntidadeOrganizacao
return p,d,p2
That results this:
Commencing with EntidadePessoa id:168750, I want the count of EntidadeDocumento directcly linked to EntidadePessoa id:168750, is this case 2, and the count of each Entidade* that is linked to EntidadeDocumento, in this case 4 for each EntidadeDocumento.
I tried some queries, but none give me the results I wanted, the count number is never the numbers I wanted.
Could you help with that?
A query to get the count of d nodes connected to p (id: 168750).
MATCH (p:EntidadePessoa {id: 168750})-[rd:SE_RELACIONA]->(d:Documento)
RETURN p.id as `nodeId`, count(d) as `connectedCount`
output:
nodeId connectedCount
168750 2
A query with a subquery to look at just d:Documento, incoming links from p2, omitting (p:EntidadePessoa {id: 168750}) and counting p2s for each d node:
CALL
{MATCH (p)-[rd:SE_RELACIONA]->(d:Documento)
WHERE p.id=168750
RETURN d,p.id as `topid`}
MATCH (p2)-->(d)
WHERE (p2:EntidadePessoa or p2:EntidadeOrganizacao) AND p2.id<>topid
//p2.id<>topid ensures p is not included in count(p2)
RETURN d.id as `nodeId`, count(p2) as `connectedCount`
output:
nodeId connectedCount
164532 4
164552 4
Combine both these results with a UNION:
MATCH (p:EntidadePessoa {id: 168750})-[rd:SE_RELACIONA]->(d:Documento)
RETURN p.id as `nodeId`, count(d) as `connectedCount`UNION
CALL
{MATCH (p)-[rd:SE_RELACIONA]->(d:Documento)
WHERE p.id=168750
RETURN d,p.id as `topid`}
MATCH (p2)-->(d)
WHERE (p2:EntidadePessoa or p2:EntidadeOrganizacao) AND p2.id<>topid
//p2.id<>topid ensures p is not included in count(p2)
RETURN d.id as `nodeId`, count(p2) as `connectedCount`
output:
nodeId connectedCount
168750 2
164552 4
164552 4
This will give you the result. This is similar to count and group_by in sql.
MATCH (p:EntidadePessoa {id: 168750})-[rd:SE_RELACIONA]->(d:Documento)<--(p2)
where p2:EntidadePessoa or p2:EntidadeOrganizacao
With p, count(distinct d) as cnt_d, count(p2) as cnt_p2
return p, cnt_d, cnt_p2
Related
So I have a query along the lines of MATCH (g:Gene)-[r]-() RETURN DISTINCT type(r), count(r) which returns a breakdown table of the number of in/out-going edges from a gene node, per relationship type.
I want to do this on a number of nodes and instead of doing it one table at a time, it would be awesome to just return a table with relationship types in one column and counts per gene on the subsequent ones.
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r), count(r) as g2
Doesn't work due to syntax error: All sub queries in an UNION must have the same column names (line 3, column 1 (offset: 108)). This is likely due to the fact that some genes don't have all the relationship types that others have.
If I do the following:
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN DISTINCT type(r), null as g2, count(r) as g1
UNION ALL MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r), null as g1, count(r) as g2
then I get duplicate rows for relationship types, where it's null for g1 in one and for g2 in the other.
What am I misunderstanding here?
The UNION error is because the columns do not match- changing it to
MATCH (g:Gene {name: "G1"})-[r]-(n)
RETURN DISTINCT type(r) as type, count(r) as count
UNION ALL
MATCH (g:Gene {name: "G2"})-[r]-(n)
RETURN DISTINCT type(r) as type , count(r) as count
will work.
You can use UNWIND. and you need to return the node (g in here).
UNWIND ["G1", "G2"] AS name
MATCH (g:Gene {name: name})-[r]-(n)
RETURN g , DISTINCT type(r) as type, count(r) as count
I am writing a Cypher query in Neo4j 2.0.4 that attempts to get the total number of inbound and outbound relationships for a selected node. I can do this easily when I only use this query one-node-at-a-time, like so:
MATCH (g1:someIndex{name:"name1"})
MATCH g1-[r1]-()
RETURN count(r1);
//Returns 305
MATCH (g2:someIndex{name:"name2"})
MATCH g2-[r2]-()
RETURN count(r2);
//Returns 2334
But when I try to run the query with 2 nodes together (i.e. get the total number of relationships for both g1 and g2), I seem to get a bizarre result.
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
For some reason, the number is much much greater than the total of 305+2334.
It seems like other Neo4j users have run into strange issues when using multiple MATCH clauses, so I read through Michael Hunger's explanation at https://groups.google.com/d/msg/neo4j/7ePLU8y93h8/8jpuopsFEFsJ, which advised Neo4j users to pipe the results of one match using WITH to avoid "identifier uniqueness". However, when I run the following query, it simply times out:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
I suspect this query doesn't return because there's a lot of records returned by r1. In this case, how would I operate my "get-number-of-relationships" query on 2 nodes? Am I just using some incorrect syntax, or is there some fundamental issue with the logic of my "2 node at a time" query?
Your first problem is that you are returning a Cartesian product when you do this:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
If there are 305 instances of r1 and 2334 instances of r2, you're returning (305 * 2334) == 711870 rows, and because you are summing this (count(r1)+count(r2)) you're getting a total of 711870 + 711870 == 1423740.
Your second problem is that you are not carrying over g2 in the WITH clause of this query:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
You match on g2 in the first MATCH clause, but then you leave it behind when you only carry over r1 in the WITH clause at line 3. Then, in line 4, when you match on g2-[r2]-() you are matching literally everything in your graph, because g2 has been unbound.
Let me walk through a solution with the movie dataset that ships with the Neo4j browser, as you have not provided sample data. Let's say I want to get the total count of relationships attached to Tom Hanks and Hugo Weaving.
As separate queries:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
RETURN COUNT(r)
=> 13
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN COUNT(r)
=> 5
If I try to do it your way, I'll get (13 * 5) * 2 == 90, which is incorrect:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(r1) + COUNT(r2)
=> 90
Again, this is because I've matched on all combinations of r1 and r2, of which there are 65 (13 * 5 == 65) and then summed this to arrive at a total of 90 (65 + 65 == 90).
The solution is to use DISTINCT:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2)
=> 18
Clearly, the DISTINCT modifier only counts the distinct instances of each entity.
You can also accomplish this with WITH if you wanted:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
WITH COUNT(r) AS r1
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN r1 + COUNT(r)
=> 18
TL;DR - Beware of Cartesian products. DISTINCT is your friend:
MATCH (:someIndex{name:"name1"})-[r1]-(),
(:someIndex{name:"name2"})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2);
The explosion of results you're seeing can be easily explained:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
In the 2nd line every combination of any relationship from g1 is combined with any relationship of g2, this explains the number since 1423740 = 305 * 2334 * 2. So you're evaluating basically a cross product here.
The right way to calculate the sum of all relationships for name1 and name2 is:
MATCH (g:someIndex)-[r]-()
WHERE g.name in ["name1", "name2"]
RETURN count(r)
My query is:
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
RETURN n,p
The result is on the screenshot below.
How to count the total nodes?
I need 14 as a text result. Something like RETURN COUNT(n)+COUNT(p) but it shows 24.
The following request doesn't work correctly:
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
RETURN count(n)
Returns me 12, which is the number of relationships pairs as on the picture, not nodes.
MATCH (n)-[:NT]-(p)
WHERE ...some properties filters...
RETURN count(n)
Returns 24.
How to count toward that two nodes (in this example) that have outgoing ONLY arrows? Should be 14 at once.
UPD:
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN DISTINCT FILTER(x in n.myID WHERE NOT x in p.myID)
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN DISTINCT FILTER(x in p.myID WHERE NOT x in n.myID)
The COUNT of DISTINCT UNION of myID gives me the result.
I don't know how to make it with cypher.
Or the DISTINCT UNION of collections:
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN collect(DISTINCT p.myID), collect(DISTINCT n.myID)
The result is:
collect(DISTINCT p.myID)
26375, 26400, 21636, 29939, 20454, 26543, 19089, 4483, 26607, 30375, 26608, 26605
collect(DISTINCT n.myID)
11977, 19478, 20454
Which is 15 items. One is common. If you UNION or DISTINCT the 20454 the total COUNT would be 14. The actual number of nodes on the picture.
I can not achieve this simple pattern.
Your original queries are working correctly.
If you want to get a count of distinct n nodes, your queries should RETURN COUNT(DISTINCT n).
To count the number of nodes that only have outgoing relationships:
MATCH (n)-->()
WHERE NOT ()-->(n)
COUNT(DISTINCT n);
To count the number of distinct nodes that are directly involved in an :NT relationship:
MATCH (n)-[:NT]-()
COUNT(DISTINCT n);
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
WITH collect(DISTINCT p.myID) AS set1
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
WITH collect(DISTINCT n.myID) AS set2, set1
WITH set1 + set2 AS BOTH
UNWIND BOTH AS res
RETURN COUNT(DISTINCT res);
I'm playing with cypher and I have some simple aggregation going on for me.
MATCH (p:Person)-[:HAS_CAR]->(n:Car)
RETURN n, count(p)
MATCH (p:Person)-[:HAS_APARTMENT]->(n:Apartment)
RETURN n, count(p)
MATCH (p:Person)-[:HAS_HOUSE]->(n:House)
RETURN n, count(p)
The problem is that I have to make 3 trips to the database to get all those results together. The problematic thing about that is that those queries are the last MATCH statement in a much bigger chain. Like this:
MATCH (:City { Id: 10})<-[:LIVES_IN]-(p:Person)
WITH p
MATCH ...
WITH p
MATCH ...
WITH p
MATCH ...
WITH p
MATCH ...
WITH p
MATCH p-[:HAS_CAR]->(n:Car)
RETURN n, count(p)
After all those MATCH ... WITH statements, only a few person nodes are matched so the last part of the query is very fast, but the initial part is not. I can't help but think that this could be improved because all three queries share a lot of statements.
I came up with this:
...
MATCH p-[:HAS_CAR|HAS_APARTMENT|HAS_HOUSE]->(n)
RETURN n, labels(n), count(p)
And I can work with that. But what if I wanted to mix in something like this:
MATCH p-[:KNOWS]->(:Person)-[:HAS_BIKE]->(n:Bike)
RETURN n, count(p)
Or even:
MATCH p-[:KNOWS]->(:Person)-[:HAS_BIKE|HAS_BOAT]->(n)
RETURN n, labels(n), count(p)
Can all of this be done in a single query and how?
Sometimes you need to use collections instead of rows to merge aggregation queries together and pass them along. This strategy might help... For example:
MATCH (p:Person)-[:HAS_CAR]->(car:Car)
WITH car, count(p) carCount
WITH collect({car:car, count:carCount}) as carCounts
MATCH (p:Person)-[:HAS_APARTMENT]->(n:Apartment)
WITH n, count(p) as apartmentCount, carCounts
RETURN collect({apartment:n, count:apartmentCount}) as apartmentCounts, carCounts
Update (see comments)--this lets you pass along the results of a filter and do a quick id lookup to find them again:
MATCH (p:Person)
WHERE p.name = "John" // or whatever else you need to filter on
WITH collect(id(p)) as pids
MATCH (p)-[:HAS_CAR]->(car:Car)
WHERE id(p) IN pids
WITH car, count(p) carCount, pids
WITH collect({car:car, count:carCount}) as carCounts, pids
MATCH (p)-[:HAS_APARTMENT]->(n:Apartment)
WHERE id(p) IN pids
WITH n, count(p) as apartmentCount, carCounts
RETURN collect({apartment:n, count:apartmentCount}) as apartmentCounts, carCounts
I currently have this query:
START n=node(*)
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN p, count(p)
LIMIT 5
Say there are 42 people in FooManGroup, I want to return 5 of these people, with a count of 42.
Is this possible to do in one query?
Running this now returns 5 rows, which is fine, but a count of 104, which is the total number of nodes of any type in my DB.
Any suggestions?
You can use a WITH clause to do the counting of the persons, followed by an identical MATCH clause to do the matching of each person. Notice that you need to START on the p nodes and not just some n that will match any node in the graph:
MATCH (p:Person )-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
WITH count(p) as personsInGroup
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN p, personsInGroup
LIMIT 5
It may not be the best or most elegant way to this, but it works. If you use cypher 2.0 it may be a bit more compact like this:
MATCH (p:Person)-[:is_member]->(g:Group {name: 'FooManGroup'})
WITH count(p) as personsInGroup
MATCH (p:Person)-[:is_member]->(g:Group {name: 'FooManGroup'})
RETURN p, personsInGroup
LIMIT 5
Relationship types are always uppercased in cypher, so :is_member should be :IS_MEMBER which I think is more readable:
MATCH (p:Person)-[:IS_MEMBER]->(g:Group {name: 'FooManGroup'})
WITH count(p) as personsInGroup
MATCH (p:Person)-[:IS_MEMBER]->(g:Group {name: 'FooManGroup'})
RETURN p, personsInGroup
LIMIT 5
Try this:
MATCH (p:Person)-[:is_member]->(g:Group)
WHERE g.name ='FooManGroup'
RETURN count(p), collect(p)[0..5]