I am trying to understand the output of the following queries in Cypher.
start n=node:node_auto_index(name="root_node")
match n-[:SC]->c, b<-[:CB]-c-[:CB]->b1
where (b.days_in_number - b1.days_in_number <= 7) AND (b.days_in_number > b1.days_in_number)
and c.name = "C16659"
with distinct n,c, b, b1
match n-[:SC]->c-[:CB]->b-[:CT]->i1, n-[:SC]->c-[:CB]->b1-[:CT]->i2
with b.name as bname,b1.name as b1name,i1.name as i1name,i2.name as i2name
return bname,b1name,i1name,i2name
order by bname,b1name,i1name,i2name;
returns 3680 rows
start n=node:node_auto_index(name="root_node")
match n-[:SC]->c, b<-[:CB]-c-[:CB]->b1
where (b.days_in_number - b1.days_in_number <= 7) AND (b.days_in_number > b1.days_in_number)
and c.name = "C16659"
with distinct n,c, b, b1
match b-[:CT]->i1, b1-[:CT]->i2
with b.name as bname,b1.name as b1name,i1.name as i1name,i2.name as i2name
return bname,b1name,i1name,i2name
order by bname,b1name,i1name,i2name;
returns 184 rows
Query 1 seems to be producing a Cartesian product, but I am unable to understand why. Can anyone please explain?
c-[:CB]->b is a 1:n relationship.
UPDATE:
When I run the following query I get the correct 184 results:
start n=node:node_auto_index(name="root_node")
match n-[:SC]->c, b<-[:CB]-c-[:CB]->b1
where (b.days_in_number - b1.days_in_number <= 7) AND (b.days_in_number > b1.days_in_number)
and c.name = "C16659"
with distinct n,c, b, b1
match c-[:CB]->b-[:CT]->i1, c-[:CB]->b1-[:CT]->i2
with n.name as nname,c.name as cname, b.name as bname,b1.name as b1name,i1.name as i1name,i2.name as i2name
return nname,cname,bname,b1name,i1name,i2name
order by nname,cname,bname,b1name,i1name,i2name;
This suggests that putting n back into the pattern leads to the Cartesian product.
n-[:SC]->c is a 1:1 relationship. Why is this happening?
Have you verified that the relationship between n and c is really unique? Maybe looking at the paths will help. Try
match p1=n-[:SC]->c-[:CB]->b-[:CT]->i1, p2=n-[:SC]->c-[:CB]->b1-[:CT]->i2
return p1,p2
to see where the additional results differ from each other.
I'm trying to write a Cypher query that returns a list of nodes from a "multi match", as follows:
MATCH (N1:Node)-[r1:write]->(P1:Node),
(P1:Node)-[r2:access]->(P2:Node)<-[r3:create]-(P1)
WHERE r1.Time <r3.Time and r1.ID = r2.ID and r3.Time < r2.Time
return nodes(*)
I expect the query to return all nodes of the result, but Cypher doesn't support nodes(*).
I know that there is a way to work around this, like this:
MATCH G1=(N1:Node)-[r1:write]->(P1:Node),
G2=(P1:Node)-[r2:access]->(P2:Node)<-[r3:create]-(P1)
WHERE r1.Time <r3.Time and r1.ID = r2.ID and r3.Time < r2.Time
return nodes(G1), nodes(G2)
But the match part could change frequently, so I want to know how to get the nodes from a multi-match without handling a variable for every match.
Is it possible?
Your specific example is easy to consolidate into a single path, as in:
MATCH p=(:Node)-[r1:write]->(p1:Node)-[r2:access]->(:Node)<-[r3:create]-(p1)
WHERE r1.Time < r3.Time < r2.Time AND r1.ID = r2.ID
RETURN NODES(p)
If you want to get distinct nodes, you can replace RETURN NODES(p) with UNWIND NODES(p) AS n RETURN DISTINCT n.
But, in general, you might be able to use the UNION clause to join the results of multiple disjoint statements, as in:
MATCH p=(:Node)-[r1:write]->(p1:Node)-[r2:access]->(:Node)<-[r3:create]-(p1)
WHERE r1.Time < r3.Time < r2.Time AND r1.ID = r2.ID
UNWIND NODES(p) AS n
RETURN n
UNION
MATCH p=(q0:Node)-[s1:create]->(q1:Node)-[s2:read]->(q0)
WHERE s2.Time > s1.Time AND s1.ID = s2.ID
UNWIND NODES(p) AS n
RETURN n
The UNION clause would combine the 2 results (after removing duplicates). UNION ALL can be used instead to keep all the duplicate results. The drawback of using UNION (ALL) is that variables cannot be shared between the sub-statements.
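The difference between UNION and UNION ALL can be illustrated outside the database. The following is a minimal sketch in plain Python rather than Cypher; the node names are made up for illustration and stand in for the rows returned by the two sub-statements.

```python
# Hypothetical rows returned by each sub-statement (names are invented).
first_result = ["a", "b", "c"]
second_result = ["b", "c", "d"]

# UNION ALL keeps every row from both sub-statements, duplicates included.
union_all = first_result + second_result

# UNION removes duplicate rows from the combined result.
union = list(dict.fromkeys(union_all))

print(len(union_all))  # 6
print(union)           # ['a', 'b', 'c', 'd']
```

The two input lists never see each other's variables, which mirrors the drawback mentioned above: each sub-statement is evaluated on its own and only the final rows are combined.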
There are three node types: A, B and C.
I need all the A's and B's and only the C's that participate in exactly one relationship.
match (n)
where n:A or n:B or (n:C)-[]-()
with count(n) as countOfRels
where countOfRels > 0
return n
Not close, I know. I'm not sure where to go from here.
It's a bit strange that A, B and C do not seem to be related ... but here's how you could solve your question for C :
MATCH (n:C)
WHERE size((n)-[]-()) = 1
RETURN n
UNION
MATCH (n:A)
RETURN n
UNION
MATCH (n:B)
RETURN n;
Hope this helps.
Regards,
Tom
You can use this:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
WITH n, count(r) AS countOfRels
WHERE n:A OR n:B OR (n:C AND countOfRels = 1)
RETURN n
Hope this helps.
You can do MATCH (a)--() WHERE NOT ()--(a)--() to match "nodes with only one relation". After that, you can use UNION or COLLECT()+UNWIND to combine the separate queries into a single result set.
// using Union
MATCH (n:C)--()
WHERE NOT ()--(n)--()
RETURN n
UNION
MATCH (n:A)
RETURN n
UNION
MATCH (n:B)
RETURN n;
// Using collect
OPTIONAL MATCH (a:A)
OPTIONAL MATCH (b:B)
OPTIONAL MATCH (c:C)--() WHERE NOT ()--(c)--()
WITH COLLECT(a)+COLLECT(b)+COLLECT(c) as nodez
UNWIND nodez as n
RETURN DISTINCT n
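The selection rule that these queries implement (every A, every B, and only the C nodes with exactly one relationship) can be checked on a toy graph. This is a sketch in plain Python under invented data: the node ids, labels and edges below are hypothetical and do not come from the question.

```python
from collections import Counter

# Hypothetical toy graph: node id -> label, plus undirected edges.
labels = {"a1": "A", "b1": "B", "c1": "C", "c2": "C", "c3": "C"}
edges = [("a1", "c1"), ("a1", "c2"), ("b1", "c2")]

# Count how many relationships touch each node, ignoring direction.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Keep all A and B nodes, and only C nodes with exactly one relationship.
selected = sorted(
    node
    for node, label in labels.items()
    if label in ("A", "B") or (label == "C" and degree[node] == 1)
)

print(selected)  # ['a1', 'b1', 'c1']
```

Here c2 is dropped because it has two relationships and c3 because it has none.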
I'm starting with Neo4j and using graphs, and I'm trying to get the following:
I have to find the difference between the number of users (each user is a node) and the number of distinct names they have. I have 16 nodes, and each one has its own name (Name is one of its properties), but some of them share the same name. For example, node A has (Name:Amanda, City:Roma) and node B has (Name:Amanda, City:Paris), so the count of names will be lower than the count of nodes because some names are repeated.
I have tried this:
match (n) with n, count(n) as c return sum(c)
That gives me the number of nodes. And then I tried this
match (n) with n, count(n) as nodeC with n, count( distinct n.Name) as
nameC return sum(nodeC) as sumN, sum(nameC) as sumC, sumN-sumC
But it doesn't work (I'm not even sure I'm getting the names correctly, because when I try each part separately, it doesn't work either).
I think this is what you are looking for:
MATCH (n)
RETURN COUNT(n) - COUNT(DISTINCT n.Name) AS diff;
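The same subtraction can be sketched in plain Python to see what the two counts measure. The two Amanda entries come from the question; the third user is a hypothetical addition so that the counts differ from each other.

```python
# Sample users: the two Amandas are from the question, Bruno is invented.
users = [
    {"Name": "Amanda", "City": "Roma"},
    {"Name": "Amanda", "City": "Paris"},
    {"Name": "Bruno", "City": "Lisboa"},
]

# Equivalent of COUNT(n): every node counts once.
node_count = len(users)

# Equivalent of COUNT(DISTINCT name): repeated names count once,
# and missing names are skipped, as aggregation skips null values.
distinct_names = len({u["Name"] for u in users if u.get("Name") is not None})

diff = node_count - distinct_names
print(node_count, distinct_names, diff)  # 3 2 1
```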
I am writing a Cypher query in Neo4j 2.0.4 that attempts to get the total number of inbound and outbound relationships for a selected node. I can do this easily when I only use this query one-node-at-a-time, like so:
MATCH (g1:someIndex{name:"name1"})
MATCH g1-[r1]-()
RETURN count(r1);
//Returns 305
MATCH (g2:someIndex{name:"name2"})
MATCH g2-[r2]-()
RETURN count(r2);
//Returns 2334
But when I try to run the query with 2 nodes together (i.e. get the total number of relationships for both g1 and g2), I seem to get a bizarre result.
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
For some reason, the number is much much greater than the total of 305+2334.
It seems like other Neo4j users have run into strange issues when using multiple MATCH clauses, so I read through Michael Hunger's explanation at https://groups.google.com/d/msg/neo4j/7ePLU8y93h8/8jpuopsFEFsJ, which advised Neo4j users to pipe the results of one match using WITH to avoid "identifier uniqueness". However, when I run the following query, it simply times out:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
I suspect this query doesn't return because there are a lot of records returned for r1. In this case, how would I run my "get-number-of-relationships" query on 2 nodes? Am I just using some incorrect syntax, or is there some fundamental issue with the logic of my "2 nodes at a time" query?
Your first problem is that you are returning a Cartesian product when you do this:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
If there are 305 instances of r1 and 2334 instances of r2, you're returning (305 * 2334) == 711870 rows, and because you are summing this (count(r1)+count(r2)) you're getting a total of 711870 + 711870 == 1423740.
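The arithmetic can be reproduced without a database. The sketch below is plain Python, not Cypher; the relationship labels are made-up stand-ins, and only the counts 305 and 2334 come from the question.

```python
from itertools import product

# Hypothetical stand-ins for the relationships attached to g1 and g2.
r1_rels = [f"g1_rel_{i}" for i in range(305)]
r2_rels = [f"g2_rel_{i}" for i in range(2334)]

# Two independent patterns in one MATCH yield every (r1, r2) pairing.
rows = list(product(r1_rels, r2_rels))

# count(r1) and count(r2) each count one value per row.
count_r1 = sum(1 for r1, _ in rows)
count_r2 = sum(1 for _, r2 in rows)
total = count_r1 + count_r2

print(len(rows), total)  # 711870 1423740
```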
Your second problem is that you are not carrying over g2 in the WITH clause of this query:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
You match on g2 in the first MATCH clause, but then you leave it behind when you only carry over r1 in the WITH clause at line 3. Then, in line 4, when you match on g2-[r2]-() you are matching literally everything in your graph, because g2 has been unbound.
Let me walk through a solution with the movie dataset that ships with the Neo4j browser, as you have not provided sample data. Let's say I want to get the total count of relationships attached to Tom Hanks and Hugo Weaving.
As separate queries:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
RETURN COUNT(r)
=> 13
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN COUNT(r)
=> 5
If I try to do it your way, I'll get (13 * 5) * 2 == 130, which is incorrect:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(r1) + COUNT(r2)
=> 130
Again, this is because I've matched on all combinations of r1 and r2, of which there are 65 (13 * 5 == 65), and then summed the two counts to arrive at a total of 130 (65 + 65 == 130).
The solution is to use DISTINCT:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2)
=> 18
Clearly, the DISTINCT modifier only counts the distinct instances of each entity.
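To see why DISTINCT recovers the expected total even though the rows are still a cross product, here is a small sketch in plain Python. The relationship labels are invented placeholders; only the counts 13 and 5 come from the example above.

```python
from itertools import product

# Hypothetical stand-ins for the 13 and 5 relationships in the example.
tom_rels = [f"tom_rel_{i}" for i in range(13)]
hugo_rels = [f"hugo_rel_{i}" for i in range(5)]

# The combined pattern still produces every (r1, r2) combination.
rows = list(product(tom_rels, hugo_rels))

# COUNT(DISTINCT ...) collapses the repeats within each column.
distinct_r1 = len({r1 for r1, _ in rows})
distinct_r2 = len({r2 for _, r2 in rows})

print(len(rows))                  # 65
print(distinct_r1 + distinct_r2)  # 18
```

Note that the intermediate result still holds all 65 rows, so DISTINCT fixes the count but not the amount of work done.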
You can also accomplish this with WITH if you want:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
WITH COUNT(r) AS r1
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN r1 + COUNT(r)
=> 18
TL;DR - Beware of Cartesian products. DISTINCT is your friend:
MATCH (:someIndex{name:"name1"})-[r1]-(),
(:someIndex{name:"name2"})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2);
The explosion of results you're seeing can be easily explained:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
In the 2nd line, every relationship of g1 is combined with every relationship of g2. This explains the number, since 1423740 = 305 * 2334 * 2. So you're basically evaluating a cross product here.
The right way to calculate the sum of all relationships for name1 and name2 is:
MATCH (g:someIndex)-[r]-()
WHERE g.name in ["name1", "name2"]
RETURN count(r)
My query is:
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
RETURN n,p
The result is on the screenshot below.
How do I count the total number of nodes?
I need 14 as a text result. I tried something like RETURN COUNT(n)+COUNT(p), but it shows 24.
The following request doesn't work correctly:
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
RETURN count(n)
It returns 12, which is the number of relationships (node pairs) in the picture, not the number of nodes.
MATCH (n)-[:NT]-(p)
WHERE ...some properties filters...
RETURN count(n)
Returns 24.
How do I also count the two nodes (in this example) that have ONLY outgoing arrows? The total should be 14.
UPD:
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN DISTINCT FILTER(x in n.myID WHERE NOT x in p.myID)
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN DISTINCT FILTER(x in p.myID WHERE NOT x in n.myID)
The COUNT of the DISTINCT UNION of the myID values gives me the result.
I don't know how to do that in Cypher.
Or the DISTINCT UNION of collections:
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN collect(DISTINCT p.myID), collect(DISTINCT n.myID)
The result is:
collect(DISTINCT p.myID)
26375, 26400, 21636, 29939, 20454, 26543, 19089, 4483, 26607, 30375, 26608, 26605
collect(DISTINCT n.myID)
11977, 19478, 20454
That is 15 items, one of which is common to both lists. If you UNION the lists or apply DISTINCT so that 20454 is counted once, the total COUNT is 14, which is the actual number of nodes in the picture.
I cannot work out how to express this simple pattern.
Your original queries are working correctly.
If you want to get a count of distinct n nodes, your queries should RETURN COUNT(DISTINCT n).
To count the number of nodes that only have outgoing relationships:
MATCH (n)-->()
WHERE NOT ()-->(n)
RETURN COUNT(DISTINCT n);
To count the number of distinct nodes that are directly involved in an :NT relationship:
MATCH (n)-[:NT]-()
RETURN COUNT(DISTINCT n);
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
WITH collect(DISTINCT p.myID) AS set1
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
WITH collect(DISTINCT n.myID) AS set2, set1
WITH set1 + set2 AS BOTH
UNWIND BOTH AS res
RETURN COUNT(DISTINCT res);
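The collect, concatenate, UNWIND and COUNT(DISTINCT ...) steps of this query can be mirrored in plain Python using the myID values listed in the question. This is only a sketch of the set logic, not of how Neo4j executes the query.

```python
# myID values copied from the question's output.
p_ids = [26375, 26400, 21636, 29939, 20454, 26543,
         19089, 4483, 26607, 30375, 26608, 26605]  # collect(DISTINCT p.myID)
n_ids = [11977, 19478, 20454]                      # collect(DISTINCT n.myID)

# set1 + set2: list concatenation keeps the shared id twice.
both = p_ids + n_ids

# UNWIND ... COUNT(DISTINCT res): each id is counted once.
distinct_ids = set(both)

print(len(both), len(distinct_ids))  # 15 14
print(set(p_ids) & set(n_ids))       # {20454}
```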