Cypher: Graph a node's connections 3 deep - neo4j

Given a node, I want to use D3 to graph it and its neighborhood 3 deep.
The best strategy I can come up with is:
Query1: MATCH (n)-[r]-(m) WHERE id(n) IN [501] RETURN n, r, m
Then, from the results, in my app, collect all of m's IDs, put those new IDs in the IN clause (removing the ones I've already done), and repeat the query.
Query2: MATCH (n)-[r]-(m) WHERE id(n) IN [502,511,1111] RETURN n, r, m
Query3: MATCH (n)-[r]-(m) WHERE id(n) IN [512,519,1116,1130] RETURN n, r, m
Note: we don't know the IDs for the 2nd query until after the 1st has run, and so on.
But this means running 3 queries and a lot of I/O shuffling.
Is there a better way to do this? I feel like I'm doing too much work in my app when it should be done in Cypher. I looked in the D3 examples, but didn't see this kind of query.
Thanks!
Mike

You can run the following query or something similar; the concept should be easy to understand:
MATCH (n)-[r*1..3]-(m) WHERE id(n) IN [501] RETURN n, r, m
Here *1..3 means: match nodes that are 1 to 3 relationships away from the n node.
This way it is a single query, which should be significantly faster than running three separate queries.

Does this work?
MATCH (n)-[r]-(m)-[s]-(o) WHERE id(n) IN [501] RETURN n, r, m, s, o

Does this work for you?
MATCH (n)-[r1]-(m1)-[r2]-(m2)-[r3]-(m3)
WHERE ID(n) = 501 AND
ID(m2) IN [502,511,1111] AND
ID(m3) IN [512,519,1116,1130]
RETURN n, r1, m1, r2, m2, r3, m3;

Why not just combine all 3:
MATCH (n)-[r]-(m)
WHERE id(n) IN [501]
WITH m
MATCH (m)-[s]-(o)
WITH o
MATCH (o)-[t]-(p)
RETURN o,t,p
The difference between this and the other answers is that this will revisit relationships, especially because no direction is specified, if that's what you want.
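To see what "revisit relationships" means here, a minimal Python sketch (not Cypher; the toy graph and the helper names are made up for illustration) can mimic chained undirected single-hop matches. Each MATCH clause is evaluated on its own, so nothing stops the second hop from walking back over the relationship the first hop used, whereas a single pattern uses each relationship at most once per match:

```python
# Hypothetical toy graph: relationship id -> (start node, end node).
RELS = {"r1": ("a", "b"), "r2": ("b", "c")}

def expand(node):
    """Return (rel_id, neighbor) pairs for one undirected hop from node."""
    hops = []
    for rel_id, (start, end) in RELS.items():
        if start == node:
            hops.append((rel_id, end))
        elif end == node:
            hops.append((rel_id, start))
    return hops

def chained_two_hops(start):
    """Two independent hops, like MATCH ... WITH m MATCH ...: no uniqueness check."""
    return [(r, m, s, o) for r, m in expand(start) for s, o in expand(m)]

def single_pattern_two_hops(start):
    """Two hops in one pattern: the same relationship cannot be used twice."""
    return [row for row in chained_two_hops(start) if row[0] != row[2]]

print(chained_two_hops("a"))         # includes the walk a -r1- b -r1- a
print(single_pattern_two_hops("a"))  # only a -r1- b -r2- c
```

For graphing, the revisits only produce duplicate rows, so they can be dropped on the client.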

How about this, with "resultDataContents":["graph"] set in the request:
MATCH path = (n)-[*..3]-(m)
WHERE id(n) IN [501]
RETURN path
or if you want to save bandwidth
MATCH path = (n)-[*..3]-(m)
WHERE id(n) IN [501]
RETURN [x in nodes(path) | id(x)] as node_ids, [x in rels(path) | id(x)] as rel_ids
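On the client, rows of node_ids and rel_ids from overlapping paths contain many repeats, so they usually need merging before they go to D3. A minimal Python sketch, assuming each row arrives as a dict with those two keys (the row shape and the sample IDs are made up for illustration):

```python
def merge_path_rows(rows):
    """Collapse per-path ID lists into unique node IDs and relationship links."""
    nodes = set()
    links = {}  # rel id -> (node id, node id) in path order, not necessarily stored direction
    for row in rows:
        ids = row["node_ids"]
        nodes.update(ids)
        # The i-th relationship of a path connects the i-th and (i+1)-th node.
        for i, rel_id in enumerate(row["rel_ids"]):
            links[rel_id] = (ids[i], ids[i + 1])
    return sorted(nodes), links

# Hypothetical result rows: two paths that share their first hop.
rows = [
    {"node_ids": [501, 502], "rel_ids": [9001]},
    {"node_ids": [501, 502, 512], "rel_ids": [9001, 9002]},
]
nodes, links = merge_path_rows(rows)
print(nodes)
print(links)
```

Because the pattern is undirected, the pair stored for each link follows the traversal order of the path; if arrow direction matters in the drawing, it would have to be returned separately.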

Thanks for the answers, but they didn't work out for me.
I did this:
Set idsTodo to the initial node's ID.
Then ran this:
MATCH (n)-[r]->(m) WHERE id(n) IN {idsTodo} RETURN n, r, m
Added idsTodo to idsDone.
Then added the IDs of m to idsTodo and subtracted idsDone.
Then ran the query again, repeating 2 more times.
By the end, I had every node and its relationships.
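The loop described above can be sketched in Python roughly as follows. This is only an illustration: OUTGOING and run_query are hypothetical stand-ins for the database and for the Cypher query, and the IDs are invented.

```python
# Hypothetical stand-in for the database: node id -> ids reached by outgoing relationships.
OUTGOING = {501: [502, 511], 502: [512], 511: [501, 519], 512: [530], 519: [], 530: [531]}

def run_query(ids_todo):
    """Mimic MATCH (n)-[r]->(m) WHERE id(n) IN {idsTodo}, returning (n, m) pairs."""
    return [(n, m) for n in sorted(ids_todo) for m in OUTGOING.get(n, [])]

def neighborhood(start_id, depth=3):
    """Expand the frontier `depth` times, never querying the same node twice."""
    ids_todo, ids_done, edges = {start_id}, set(), []
    for _ in range(depth):
        if not ids_todo:
            break
        rows = run_query(ids_todo)
        edges.extend(rows)
        ids_done |= ids_todo                         # add idsTodo to idsDone
        ids_todo = {m for _, m in rows} - ids_done   # new ids minus the ones already done
    return ids_done | ids_todo, edges

nodes, edges = neighborhood(501)
print(sorted(nodes))
print(edges)
```

Node 531 is four hops from 501 in this toy data, so it is not reached after three rounds.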

Related

How to eliminate a path where exists a relationship outside of the path, but between nodes in the path?

I have modified the 'vanilla' initial query in this console, and added one relationship type 'LOCKED' between the 'Morpheus' and 'Cypher' nodes.
How can I modify the existing (first-run) query, which is a variable length path so that it no longer reaches the Agent Smith node due to the additional Locked relationship I've added?
First-run query:
MATCH (n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
RETURN n AS Neo,r,m
I have tried this kind of thing:
MATCH p=(n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
AND none(rel IN rels(p) WHERE EXISTS (StartNode(rel)-[:LOCKED]->EndNode(rel)))
RETURN n AS Neo,r,m
...but it doesn't recognize the pattern inside the none() function.
I'm using Community 2.2.1
Thanks for reading
I'm pretty sure you can't use a function in a MATCHy type clause like that (though it's clever). What about this?
MATCH path=(neo:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
WHERE neo.name='Neo'
AND NONE(rel IN rels(path) WHERE type(rel) = 'LOCKED')
RETURN neo,r,m
EDIT:
Oops, looks like Dave might have beat me to the punch. Here's the solution I came up with anyway ;)
MATCH p=(neo:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE neo.name='Neo'
WITH p, neo, m
UNWIND rels(p) AS rel
MATCH (a)-[rel]->(b)
OPTIONAL MATCH a-[locked_rel:LOCKED]->b
WITH neo, m, collect(locked_rel) AS locked_rels
WHERE none(locked_rel IN locked_rels WHERE ()-[locked_rel]->())
RETURN neo, m
OK, this is a little convoluted, but I think it works. The approach is to take all of the paths and find the last known good nodes (ones that have LOCKED relationships leaving them). Then use those nodes as new end points and return the paths.
match p=(n:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
where n.name='Neo'
with n, relationships(p) as rels
unwind rels as r
with n
, case
when type(r) = 'LOCKED' then startNode(r)
else null
end as last_good_node
with n
, (collect( distinct last_good_node)) as last_good_nodes
unwind last_good_nodes as g
match p=n-[r:KNOWS|LOVES*]->g
return p
I think this would be simpler if there was a locked: true property on the KNOWS and LOVES relationships.
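As a rough illustration of why a locked flag on the relationship itself would simplify things, here is a minimal Python sketch (not Cypher). The edge list is made up, loosely modeled on the names in the question, and paths_from is a hypothetical helper. With the flag on each edge, the traversal can simply refuse to step over a locked edge:

```python
# Hypothetical edges loosely modeled on the question: (start, end, locked flag).
EDGES = [
    ("Neo", "Morpheus", False),
    ("Morpheus", "Cypher", True),   # the relationship marked as locked
    ("Cypher", "Agent Smith", False),
]

def paths_from(start, min_len, max_len, allow_locked=False):
    """Enumerate directed paths of min_len..max_len hops, optionally skipping locked edges."""
    found = []

    def walk(node, path):
        if min_len <= len(path) <= max_len:
            found.append(list(path))
        if len(path) == max_len:
            return
        for edge in EDGES:
            src, dst, locked = edge
            # Use each edge at most once per path, and skip locked edges unless allowed.
            if src == node and edge not in path and (allow_locked or not locked):
                walk(dst, path + [edge])

    walk(start, [])
    return found

print(len(paths_from("Neo", 2, 4, allow_locked=True)))  # paths that cross the locked edge
print(paths_from("Neo", 2, 4))                          # nothing gets past Morpheus
```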

Neo4j : how to match nodes that have a common value in a property array

I have nodes with an "id" property array:
node 1: {id:[1,2,3]}
node 2: {id:[3,4,5]}
node 4: {id:[6,7,8]}
I want a query to match the node pairs that have at least one common value in the ID property array; for example the query I'm looking for would return only node 1, node 2 (they have the value "3" in common).
I've tried this, but it didn't work for me:
MATCH (n), (m) where FILTER(x IN n.id WHERE x IN m.id) return n,m;
Thanks!
Actually, your original query should have returned some results.
Here is an improved version of that query:
MATCH (n), (m)
WHERE ID(n) < ID(m) AND ANY(x IN n.id WHERE x IN m.id)
RETURN n, m;
It avoids duplicate results by ordering the nodes by ID.
It uses the ANY function, which stops as soon as a match is found.
See this console.
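The same logic can be sketched in Python to make the two points concrete. This is only an illustration of the idea, not of how Neo4j evaluates the query; the data comes from the question's example, and pairs_with_common_value is a made-up helper name.

```python
from itertools import combinations

# Node id -> "id" property array, taken from the question's example.
NODES = {1: [1, 2, 3], 2: [3, 4, 5], 4: [6, 7, 8]}

def pairs_with_common_value(nodes):
    """Return (n, m) pairs with n < m whose arrays share at least one value."""
    result = []
    # combinations() yields each unordered pair once, like ID(n) < ID(m).
    for n, m in combinations(sorted(nodes), 2):
        m_values = set(nodes[m])
        # any() stops at the first shared value, like ANY(x IN n.id WHERE x IN m.id).
        if any(x in m_values for x in nodes[n]):
            result.append((n, m))
    return result

print(pairs_with_common_value(NODES))
```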
This is a bit convoluted, but it seems to work:
MATCH n, m
WHERE ID(n)< ID(m)
WITH n, n.id AS n_ids, m, m.id AS m_ids
UNWIND n_ids AS n_id
UNWIND m_ids AS m_id
WITH n, m, n_id, m_id
WHERE n_id = m_id
RETURN n, m
If that doesn't make sense to you, I'd suggest you try changing each WITH to a RETURN and removing everything afterwards to see the results at each step.
EDIT: You can also make this a bit shorter thusly:
MATCH n, m
WHERE ID(n)< ID(m)
WITH n, n.id AS n_ids, m, m.id AS m_ids
UNWIND n_ids AS n_id
WITH n, m, n_id, m_ids
WHERE n_id IN m_ids
RETURN n, m
(You might need a DISTINCT in there at the end for a larger dataset)

Using Match with Multiple Clauses Causes Odd Results

I am writing a Cypher query in Neo4j 2.0.4 that attempts to get the total number of inbound and outbound relationships for a selected node. I can do this easily when I only use this query one-node-at-a-time, like so:
MATCH (g1:someIndex{name:"name1"})
MATCH g1-[r1]-()
RETURN count(r1);
//Returns 305
MATCH (g2:someIndex{name:"name2"})
MATCH g2-[r2]-()
RETURN count(r2);
//Returns 2334
But when I try to run the query with 2 nodes together (i.e. get the total number of relationships for both g1 and g2), I seem to get a bizarre result.
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
For some reason, the number is much much greater than the total of 305+2334.
It seems like other Neo4j users have run into strange issues when using multiple MATCH clauses, so I read through Michael Hunger's explanation at https://groups.google.com/d/msg/neo4j/7ePLU8y93h8/8jpuopsFEFsJ, which advised Neo4j users to pipe the results of one match using WITH to avoid "identifier uniqueness". However, when I run the following query, it simply times out:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
I suspect this query doesn't return because there's a lot of records returned by r1. In this case, how would I operate my "get-number-of-relationships" query on 2 nodes? Am I just using some incorrect syntax, or is there some fundamental issue with the logic of my "2 node at a time" query?
Your first problem is that you are returning a Cartesian product when you do this:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
If there are 305 instances of r1 and 2334 instances of r2, you're returning (305 * 2334) == 711870 rows, and because you are summing this (count(r1)+count(r2)) you're getting a total of 711870 + 711870 == 1423740.
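The arithmetic can be checked with a short Python sketch that builds the same cross product. This is just a model of the row counting, with a made-up function name, not anything Neo4j runs:

```python
from itertools import product

def combined_count(n_r1, n_r2):
    """Mimic count(r1) + count(r2) over the cross product of two relationship sets."""
    count_r1 = count_r2 = 0
    # One row per (r1, r2) combination; both counts see every row.
    for _r1, _r2 in product(range(n_r1), range(n_r2)):
        count_r1 += 1
        count_r2 += 1
    return count_r1 + count_r2

print(combined_count(305, 2334))
```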
Your second problem is that you are not carrying over g2 in the WITH clause of this query:
MATCH (g1:gene{name:"SV422_HUMAN"}),(g2:gene{name:"BRCA1_HUMAN"})
MATCH g1-[r1]-()
WITH r1
MATCH g2-[r2]-()
RETURN count(r1)+count(r2);
You match on g2 in the first MATCH clause, but then you leave it behind when you only carry over r1 in the WITH clause at line 3. Then, in line 4, when you match on g2-[r2]-() you are matching literally everything in your graph, because g2 has been unbound.
Let me walk through a solution with the movie dataset that ships with the Neo4j browser, as you have not provided sample data. Let's say I want to get the total count of relationships attached to Tom Hanks and Hugo Weaving.
As separate queries:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
RETURN COUNT(r)
=> 13
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN COUNT(r)
=> 5
If I try to do it your way, I'll get (13 * 5) * 2 == 130, which is incorrect:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(r1) + COUNT(r2)
=> 130
Again, this is because I've matched on all combinations of r1 and r2, of which there are 65 (13 * 5 == 65), and then summed the two counts to arrive at a total of 130 (65 + 65 == 130).
The solution is to use DISTINCT:
MATCH (:Person {name:'Tom Hanks'})-[r1]-(),
(:Person {name:'Hugo Weaving'})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2)
=> 18
Clearly, the DISTINCT modifier only counts the distinct instances of each entity.
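A small Python sketch of the same effect, assuming 13 and 5 relationships as in the counts above (the relationship ids are invented): the cross product repeats every relationship many times, and taking distinct values per column recovers the real totals.

```python
from itertools import product

# Hypothetical relationship ids: 13 for one person, 5 for the other.
tom_rels = [f"t{i}" for i in range(13)]
hugo_rels = [f"h{i}" for i in range(5)]

# What the two-pattern MATCH produces: one row per (r1, r2) combination.
rows = list(product(tom_rels, hugo_rels))

distinct_r1 = {r1 for r1, _ in rows}  # plays the role of COUNT(DISTINCT r1)
distinct_r2 = {r2 for _, r2 in rows}  # plays the role of COUNT(DISTINCT r2)

print(len(rows), len(distinct_r1) + len(distinct_r2))
```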
You can also accomplish this with WITH if you wanted:
MATCH (:Person {name:'Tom Hanks'})-[r]-()
WITH COUNT(r) AS r1
MATCH (:Person {name:'Hugo Weaving'})-[r]-()
RETURN r1 + COUNT(r)
=> 18
TL;DR - Beware of Cartesian products. DISTINCT is your friend:
MATCH (:someIndex{name:"name1"})-[r1]-(),
(:someIndex{name:"name2"})-[r2]-()
RETURN COUNT(DISTINCT r1) + COUNT(DISTINCT r2);
The explosion of results you're seeing can be easily explained:
MATCH (g1:someIndex{name:"name1"}), (g2:someIndex{name:"name2"})
MATCH g1-[r1]-(), g2-[r2]-()
RETURN count(r1)+count(r2);
//Returns 1423740
In the 2nd line, every relationship of g1 is combined with every relationship of g2, which explains the number, since 1423740 = 305 * 2334 * 2. So you're basically evaluating a cross product here.
The right way to calculate the sum of all relationships for name1 and name2 is:
MATCH (g:someIndex)-[r]-()
WHERE g.name in ["name1", "name2"]
RETURN count(r)

Count node depth in neo4j

I have this query in Neo4j:
MATCH (sentence:Sentence)-[r*]->(n:Word )
WITH n, COUNT(r) AS c
RETURN n, c
My graph is a linguistic database containing words and dependency relations between them.
This query should return the depth of the nodes, but COUNT(r) always returns 1.
When I omit the COUNT function and write just
WITH n, r AS c
instead (trying it in the Neo4j web browser interface), Neo4j returns multiple relationships for each word node "n", as expected.
Can you please help me see what I am doing wrong? How do I count the length of the path between the sentence node and the word node? Thanks.
I think COUNT(r) returns 1 because the query groups by n and counts the rows for each n (one per matched path), not the number of relationships inside r.
Try this:
MATCH (sentence:Sentence)-[r*]->(n:Word )
WITH n, LENGTH(r) AS depth
RETURN n, depth
This gives you the depth.
Or try this:
MATCH p = (sentence:Sentence)-[*]->(n:Word)
RETURN n, length(p) AS depth
http://docs.neo4j.org/chunked/stable/query-functions-scalar.html#functions-length
Finally found the solution myself - it is cypher's LENGTH function:
MATCH (sentence:Sentence)-[r*]->(n:Word )
WITH n, LENGTH(r) AS c
RETURN n, c
found in this useful cheat sheet: http://assets.neo4j.org/download/Neo4j_CheatSheet_v3.pdf
In version 4.x, you should use the SIZE function:
MATCH (sentence:Sentence)-[r*]->(n:Word )
WITH n, SIZE(r) AS depth
RETURN n, depth
https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-size
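What these queries compute is the number of relationships on the path from the sentence node to each word. A minimal Python sketch of that idea, assuming the dependency graph is a tree so each word has exactly one path from the sentence node (the sample tree and the depths helper are made up for illustration):

```python
from collections import deque

# Hypothetical dependency tree: parent -> children, rooted at a sentence node.
CHILDREN = {"sentence": ["saw"], "saw": ["I", "dog"], "dog": ["the"]}

def depths(root):
    """Return node -> number of relationships on the path from root (breadth-first)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in CHILDREN.get(node, []):
            depth[child] = depth[node] + 1  # one more relationship than the parent
            queue.append(child)
    return depth

print(depths("sentence"))
```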

Query nodes from multiple specific paths in cypher

I'm playing with cypher and I have some simple aggregation going on for me.
MATCH (p:Person)-[:HAS_CAR]->(n:Car)
RETURN n, count(p)
MATCH (p:Person)-[:HAS_APARTMENT]->(n:Apartment)
RETURN n, count(p)
MATCH (p:Person)-[:HAS_HOUSE]->(n:House)
RETURN n, count(p)
The problem is that I have to make 3 trips to the database to get all those results together. The problematic thing about that is that those queries are the last MATCH statement in a much bigger chain. Like this:
MATCH (:City { Id: 10})<-[:LIVES_IN]-(p:Person)
WITH p
MATCH ...
WITH p
MATCH ...
WITH p
MATCH ...
WITH p
MATCH ...
WITH p
MATCH p-[:HAS_CAR]->(n:Car)
RETURN n, count(p)
After all those MATCH ... WITH statements, only a few person nodes are matched so the last part of the query is very fast, but the initial part is not. I can't help but think that this could be improved because all three queries share a lot of statements.
I came up with this:
...
MATCH p-[:HAS_CAR|HAS_APARTMENT|HAS_HOUSE]->(n)
RETURN n, labels(n), count(p)
And I can work with that. But what if I wanted to mix in something like this:
MATCH p-[:KNOWS]->(:Person)-[:HAS_BIKE]->(n:Bike)
RETURN n, count(p)
Or even:
MATCH p-[:KNOWS]->(:Person)-[:HAS_BIKE|HAS_BOAT]->(n)
RETURN n, labels(n), count(p)
Can all of this be done in a single query and how?
Sometimes you need to use collections instead of rows to merge aggregation queries together and pass them along. This strategy might help... For example:
MATCH (p:Person)-[:HAS_CAR]->(car:Car)
WITH car, count(p) AS carCount
WITH collect({car:car, count:carCount}) as carCounts
MATCH (p:Person)-[:HAS_APARTMENT]->(n:Apartment)
WITH n, count(p) as apartmentCount, carCounts
RETURN collect({apartment:n, count:apartmentCount}) as apartmentCounts, carCounts
Update (see comments)--this lets you pass along the results of a filter and do a quick id lookup to find them again:
MATCH (p:Person)
WHERE p.name = "John" // or whatever else you need to filter on
WITH collect(id(p)) as pids
MATCH (p)-[:HAS_CAR]->(car:Car)
WHERE id(p) IN pids
WITH car, count(p) AS carCount, pids
WITH collect({car:car, count:carCount}) as carCounts, pids
MATCH (p)-[:HAS_APARTMENT]->(n:Apartment)
WHERE id(p) IN pids
WITH n, count(p) as apartmentCount, carCounts
RETURN collect({apartment:n, count:apartmentCount}) as apartmentCounts, carCounts
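The shape of the result may be easier to see in a small Python sketch of the same idea: aggregate each kind of item separately, collapse each aggregation into a single list of maps, and return the lists side by side. The ownership rows and the collect_counts helper are made up for illustration.

```python
from collections import Counter

# Hypothetical ownership rows: (person, item) pairs per relationship type.
HAS_CAR = [("ann", "car1"), ("bob", "car1"), ("cy", "car2")]
HAS_APARTMENT = [("ann", "apt1"), ("bob", "apt2")]

def collect_counts(rows, key):
    """Group by item, count owners, then collect everything into one list of maps."""
    counts = Counter(item for _person, item in rows)
    return [{key: item, "count": n} for item, n in sorted(counts.items())]

# One "row" holding both collections, like the final RETURN.
result = {
    "carCounts": collect_counts(HAS_CAR, "car"),
    "apartmentCounts": collect_counts(HAS_APARTMENT, "apartment"),
}
print(result)
```

Collapsing each aggregation into one list before the next MATCH is what keeps the row count at one, so the second aggregation is not multiplied by the first.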

Resources