How to achieve the following in Cypher query? - neo4j

I tried the following query # http://goo.gl/Ou2GZG
START s=node(1), t=node(4)
MATCH p=s-[*]-pt--t
WHERE SINGLE (n1 IN nodes(p)
WHERE id(n1)=id(t))
WITH DISTINCT pt AS pts, t
MATCH p=t-[*]-pfn
WHERE NONE (n IN nodes(p)
WHERE id(n)=3 OR id(n)=7)
RETURN DISTINCT pfn AS pf
but I don't want to hard code 3 and 7 in the penultimate line where 3 and 7 are the nodes contained in (pts). I tried the following but I am getting "Unclosed parenthesis" error
START s=node(1), t=node(4)
MATCH p=s-[*]-pt--t
WHERE SINGLE (n1 IN nodes(p)
WHERE id(n1)=id(t))
WITH DISTINCT pt AS pts, t
MATCH p=t-[*]-pfn FOREACH(pt in pts :
WHERE NONE (n IN nodes(p)
WHERE id(n)=id(pt)))
RETURN DISTINCT pfn AS pf

I think you can use the ALL predicate to ensures that for each node n in the path p there doesn't exist a node in pt that has the same id as the node n,
START s=node(1), t=node(4)
MATCH p=s-[*]-pt--t
WHERE SINGLE (n1 IN nodes(p)
WHERE id(n1)=id(t))
WITH DISTINCT collect(id(pt)) AS pts, t
MATCH p=t-[*]-pfn
WHERE ALL (n IN nodes(p)
WHERE NONE (pt IN pts
WHERE id(n)= pt))
RETURN DISTINCT pfn AS pf

Related

cypher multiple match is very slow

I want to match some node which don't satisfy the condition, but it's very slow. How to optimize it? There are about 70000 nodes in the database.
match (n:A)--(p:B)--(q:C)
with collect(n) as nc
match m where not m in nc
return count(m)
There is no need to first get all the nodes that match an initial pattern. It is enough to go through all the nodes and check them:
MATCH (m) WHERE NOT (m)--(:B)--(:C)
RETURN count(m)
Or if the condition on the label 'A' is important:
MATCH (m)
OPTIONAL MATCH p = (m)--(:B)--(:C)
WITH m WHERE (p IS NULL) OR (NOT p IS NULL AND NOT 'A' IN LABELS(m))
RETURN count(m)

Get the edges for 1st and 2nd level neighbors limited by their degree

I would like to make a query that contains all 1st and second level neighbors from a given node. However I only want to retain a neighbor if it has at least two edges within the query (so that any node that is only connected to the rest with one edge is left out.
As a result i would obtain list of all the edges present.
I have tried the code below, which yields me the nodes im interrested in. But how do i get the edges? I tried to add them at the WITH command, but that didnt work.
MATCH (g:User)-[r:KNOWS*0..2]-(p)
WHERE id(g)=410
WITH p as p, count(r) as rC
WHERE rC>=2
RETURN p, rC
I used this way of generating test data
WITH ["Andres","Wes","Rik","Mark","Peter","Kenny","Michael","Stefan","Max","Chris"] AS names
FOREACH (r IN range(0,800) | CREATE (:User {id:r, name:names[r % size(names)]+" "+r}));
match (u:User),(p:User)
with u,p
limit 10000
where rand() < 0.1
create (u)-[:KNOWS]->(p);
You can collect relationships of the path:
MATCH path = (g:User)-[r:KNOWS*0..2]-(p)
WHERE id(g)=410
WITH p,
count(r) as rC,
collect( relationships(path) ) as edges
WHERE rC>=2
RETURN p, edges, rC
Update:
MATCH path = (g:User)-[r:KNOWS*0..2]-(p) WHERE g.id=410
WITH p,
count(r) as rC,
collect( EXTRACT(x in relationships(path) | id(x)) ) as edges
WHERE rC>=2
RETURN p, edges, rC

Get nodes BETWEEN two nodes

How can I get the nodes on a path of variable length between two nodes, exclusive, in Neo4j?
Example
N1 -RELATIONSHIP-> N2 -RELATIONSHIP-> N3 -RELATIONSHIP-> N4
I'd like to get N2 and N3
I do not know the length of the path beforehand, I only know the starting node
Match p= (n1)-[r:RELATIONSHIP*]->(n4) return filter(x IN nodes(p)
WHERE x<>n1 AND x<>n4) AS pathNodes
try this
You can get all the nodes with in the path like this
MATCH p=(n1)-->(b)-->(n4)
RETURN filter(x IN nodes(p)
WHERE id(x) <> id(n1) AND id(x) <>id(n4)) AS allNodes
Here is the reference doc http://neo4j.com/docs/stable/query-functions-collection.html

Using Match with Multiple Clauses Causes Odd Results

I am writing a Cypher query in Neo4j 2.0.4 that attempts to get the total number of inbound and outbound relationships for a selected node. I can do this easily when I only use this query one-node-at-a-time, like so:
MATCH (g1:someIndex{name:"name1"})
MATCH g1-[r1]-()
RETURN count(r1);
//Returns 305
MATCH (g2:someIndex{name:"name2"})
MATCH g2-[r2]-()
RETURN count(r2);
//Returns 2334
But when I try to run the query with 2 nodes together (i.e. get the total number of relationships for both g1 and g2), I seem to get a bizarre result.
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
For some reason, the number is much much greater than the total of 305+2334.
It seems like other Neo4j users have run into strange issues when using multiple MATCH clauses, so I read through Michael Hunger's explanation at https://groups.google.com/d/msg/neo4j/7ePLU8y93h8/8jpuopsFEFsJ, which advised Neo4j users to pipe the results of one match using WITH to avoid "identifier uniqueness". However, when I run the following query, it simply times out:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
I suspect this query doesn't return because there's a lot of records returned by r1. In this case, how would I operate my "get-number-of-relationships" query on 2 nodes? Am I just using some incorrect syntax, or is there some fundamental issue with the logic of my "2 node at a time" query?
Your first problem is that you are returning a Cartesian product when you do this:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
If there are 305 instances of r1 and 2334 instances of r2, you're returning (305 * 2334) == 711870 rows, and because you are summing this (count(r1)+count(r2)) you're getting a total of 711870 + 711870 == 1423740.
Your second problem is that you are not carrying over g2 in the WITH clause of this query:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
You match on g2 in the first MATCH clause, but then you leave it behind when you only carry over r1 in the WITH clause at line 3. Then, in line 4, when you match on g2-[r2]-() you are matching literally everything in your graph, because g2 has been unbound.
Let me walk through a solution with the movie dataset that ships with the Neo4j browser, as you have not provided sample data. Let's say I want to get the total count of relationships attached to Tom Hanks and Hugo Weaving.
As separate queries:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
RETURN COUNT(r)
=> 13
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN COUNT(r)
=> 5
If I try to do it your way, I'll get (13 * 5) * 2 == 90, which is incorrect:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(r1) + COUNT(r2)
=> 90
Again, this is because I've matched on all combinations of r1 and r2, of which there are 65 (13 * 5 == 65) and then summed this to arrive at a total of 90 (65 + 65 == 90).
The solution is to use DISTINCT:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2)
=> 18
Clearly, the DISTINCT modifier only counts the distinct instances of each entity.
You can also accomplish this with WITH if you wanted:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
WITH COUNT(r) AS r1
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN r1 + COUNT(r)
=> 18
TL;DR - Beware of Cartesian products. DISTINCT is your friend:
MATCH (:someIndex{name:"name1"})-[r1]-(),
(:someIndex{name:"name2"})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2);
The explosion of results you're seeing can be easily explained:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
In the 2nd line every combination of any relationship from g1 is combined with any relationship of g2, this explains the number since 1423740 = 305 * 2334 * 2. So you're evaluating basically a cross product here.
The right way to calculate the sum of all relationships for name1 and name2 is:
MATCH (g:someIndex)-[r]-()
WHERE g.name in ["name1", "name2"]
RETURN count(r)

Neo4j Cypher Count displaying nodes, not relationships pairs Or Union Sets

My query is:
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
RETURN n,p
The result is on the screenshot below.
How to count the total nodes?
I need 14 as a text result. Something like RETURN COUNT(n)+COUNT(p) but it shows 24.
The following request doesn't work correctly:
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
RETURN count(n)
Returns me 12, which is the number of relationships pairs as on the picture, not nodes.
MATCH (n)-[:NT]-(p)
WHERE ...some properties filters...
RETURN count(n)
Returns 24.
How to count toward that two nodes (in this example) that have outgoing ONLY arrows? Should be 14 at once.
UPD:
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN DISTINCT FILTER(x in n.myID WHERE NOT x in p.myID)
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN DISTINCT FILTER(x in p.myID WHERE NOT x in n.myID)
The COUNT of DISTINCT UNION of myID gives me the result.
I don't know how to make it with cypher.
Or the DISTINCT UNION of collections:
MATCH (n)-[:NT]->(p)
WHERE ...
RETURN collect(DISTINCT p.myID), collect(DISTINCT n.myID)
The result is:
collect(DISTINCT p.myID)
26375, 26400, 21636, 29939, 20454, 26543, 19089, 4483, 26607, 30375, 26608, 26605
collect(DISTINCT n.myID)
11977, 19478, 20454
Which is 15 items. One is common. If you UNION or DISTINCT the 20454 the total COUNT would be 14. The actual number of nodes on the picture.
I can not achieve this simple pattern.
Your original queries are working correctly.
If you want to get a count of distinct n nodes, your queries should RETURN COUNT(DISTINCT n).
To count the number of nodes that only have outgoing relationships:
MATCH (n)-->()
WHERE NOT ()-->(n)
COUNT(DISTINCT n);
To count the number of distinct nodes that are directly involved in an :NT relationship:
MATCH (n)-[:NT]-()
COUNT(DISTINCT n);
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
WITH collect(DISTINCT p.myID) AS set1
MATCH (n)-[:NT]->(p)
WHERE ...some properties filters...
WITH collect(DISTINCT n.myID) AS set2, set1
WITH set1 + set2 AS BOTH
UNWIND BOTH AS res
RETURN COUNT(DISTINCT res);

Resources