Neo4j Cypher find loop from specific node - neo4j

I want to find all loops that originate and terminate with a specific node in a Neo4j database. I tried:
START n=node:Event(time=",timestamp,")
MATCH p=(n)-[:LINKED_TO*1..5]->(n)
WHERE NONE (n IN nodes(p) WHERE size(filter(x IN nodes(p) WHERE n = x))> 2)
RETURN p, length(p)
This is the best I can mashup from what is on the web. There are two things I don't like about this:
1. it crashes
2. the count threshold must be ">2" to allow for the start+termination node. That means that loops that visit the same intermediate node twice will be included, which I wish was not the case.
I'm not interested in the shortest path. I want to know all loops that return to my starting node.
Thank you in advance!

This query should return all loops that start and end at the specified node and have no other repeated nodes:
START n=node:Event(time=",timestamp,")
MATCH p=(n)-[:LINKED_TO*1..5]->(n)
UNWIND TAIL(NODES(p)) AS m
WITH p, COUNT(DISTINCT m) AS cm
WHERE LENGTH(p)-1 = cm
RETURN p, LENGTH(p);

Thank you, cybersam! That was helpful. As typed, it gave a few errors and warned me that "START" is deprecated. I found the following modifications worked:
MATCH (n:Event{time:1458238060505007})
MATCH p=(n)-[:LINKED_TO*1..5]->(n)
UNWIND TAIL(NODES(p)) AS m WITH p RETURN p
The only problem with this is that it appears to give all paths that go through the desired start node, n. Is that true? If so, is there a way to correct this?

This what finally worked for me. It is very close to what cybersam suggested. Apologies for doing this "the wrong way". I'm sure cybersam will yell at me, again, but adding code via Comment is not very easy to read.
MATCH p=(n:Event{time:",timestamp,"})-[:LINKED_TO*1..5]->(n)
UNWIND TAIL (NODES(p)) AS m
WITH p,COUNT(DISTINCT m) AS cm
WHERE LENGTH(p) = cm
RETURN p
As I noted earlier, one sticking point was the use of "START", which is deprecated and causes errors (for example, when using RNeo4j in R, which I'm using). The new way appears to be to use MATCH and specify your starting node in the path pattern. The other confusing thing for me was the use of "LENGTH(p)-1" instead of "LENGTH(p)". For one node connecting to another, the path has a length of 2, not 3 and there are only 2 distinct nodes. For my application, "LENGTH(p)=cm" worked.
Finally, if you want the nodes in the paths, do NOT try to use "WITH m,..." because this messes up the "COUNT(DISTINCT(m))" computation for some reason that I do not understand.

Related

NEO4j Cypher query to return paths with a condition on the number of connected nodes

I would like to clean up my graph database a bit by removing unnecessary nodes. In one case, unnecessary nodes are nodes B between nodes A and C where B has the same name as node C and NO OTHER incoming relationships. I am having trouble coming up with a Cypher query that restricts the number of incoming edges.
The first part was easy:
MATCH (n1:TypeA)<-[r1:Inside]-(n2:TypeB)<-[r2:Inside]-(n3:TypeC)
WHERE n2.name = n3.name
Based on other SE questions (especially this one) I then tried doing something like:
WITH n2, collect(r2) as rr
WHERE length(rr) = 1
RETURN n2
but this also returned nodes with more than one incoming edge. It seems my WHERE clause on the length is not filtering the returned n2 nodes. I tried a few other things I found online, but they either returned nothing or were no
longer syntactically correct in the current version.
After I find the n2 nodes that match the pattern, I'll want to connect n3 directly to n1 and DETACH DELETE n2. Again, I was easily able to do that part when I didn't need the restriction on the number of incoming edges to n2. That previous question has FOREACH (r IN rr | DELETE r), but I want to detach delete the n2 nodes, not just those edges. I don't know how to correctly adapt this to operating on the nodes attached to the rs and I certainly want to be sure it's finding the correct nodes before deleting anything since Neo4j lacks basic undo functionality (but you can't put a RETURN command inside a FOREACH for some crazy reason).
How do I filter nodes on a path by the number of incoming edges using Cypher?
I think I can do this in py2neo by first collecting all the n1,n2,n3 triples matching the pattern, then going through each returned record and add them to a list if n2 has only one incoming edge. Then go through that list and perform the trimming operation, but if this can be done in pure Cypher, then I'd like to know how because I have a number of similar adjustments to make.
You need to pass along path in your WITH statement.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH path, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
RETURN path
Or shorter like this:
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
AND size((n2)<-[:PARTOF]-()) = 1
RETURN path
Borrowing some insight from this answer I came up with a kludge that seems to work.
MATCH path = (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
WHERE n2.name = n3.name
WITH n2, size((n2)<-[:PARTOF]-()) as degree
WHERE degree = 1
MATCH (n1:Ward)<-[r1:PARTOF]-(n2:Unknown)<-[r2:PARTOF]-(n3:Chome)
RETURN n1,n2,n3
I expect that not all parts of that are needed and it isn't an efficient solution, but I don't know enough yet to improve upon it.
For example, I define the first line as path, but I can't use MATCH path in the penultimate line, and I have no idea why.
Also, if I write WITH size((n2)<-[:PARTOF]-()) as degree (dropping the n2, after WITH) it returns not only the n2 with degree > 1, but also all the nodes connected to them (even more than the n3 nodes). I don't know why it behaves like this and the Neo4j documentation for WITH has no examples or description to help me understand why the n2 is necessary here. Any improvements to my Cypher query or explanation of how or why it works would be greatly appreciated.

Getting first root of a leaf in neo4j

I have follwing simple graph:
CREATE (:leaf)<-[:rel]-(:nonleaf)<-[:rel]-(:nonleaf)<-[:rel]-(:nonleaf)<-[:rel]-(r1:nonleaf{root:true})<-[:rel]-(r2:nonleaf{root:true})
I want to get the first ancestor starting from (:leaf) with root:true set on it. That is I want to get r1. For this I wrote following cypher:
MATCH (:leaf)<-[*]-(m)<-[*]-(r:nonleaf{root:true}) WHERE m.root<>true OR NOT exists(m.root)
RETURN r
But it returned both (r1) and (r2). Same happened for following cypher:
MATCH shortestPath((l:leaf)<-[*]-(r:nonleaf{root:true}))
RETURN r
Whats going on here?
Update
Ok after thinking more, it clicked to my mind that (r2) is also returned because on path from (:leaf) to (r2), there are nodes with no root property set on them (this should have clicked to me earlier, pretty much obvious, subtle interpretation mistake). In other words, it returns (:nonleaf{root:true}) if "for at least one m" following condition is true: m.root<>true OR NOT exists(m.root). The requirement here is that the condition should be valid for "all ms" on the path, "not at least one m". Now it remains to figure out how to put this in cypher and my grip on cypher isnt that tight...
You can enforce that there is a single root node on the matched path with the help of the single() predicate function:
MATCH p=(:leaf)<-[*]-(r:nonleaf{root:true})
WHERE SINGLE(m IN nodes(p) WHERE exists(m.root) AND m.root=true )
RETURN r
You just need to adjust your where condition a little so that it says "and the :nonleaf node before the root :nonleaf node matched is not itself marked as a root node". I think this will satisfy your needs.
MATCH (l:leaf)<-[*]-(r:nonleaf {root: true})
WHERE NOT (:nonleaf {root: true})<--(r)
RETURN r
UPDATED
Reading the updated example in the comments, I thought of another way to solve your problem using the APOC procedure apoc.path.expandConfig.
It does require a slight change to your data. Each root: true node should have a :root label set on it. Here is an update statement...
MATCH (n:nonleaf {root:true})
SET n:root
RETURN n
And here is the updated query
MATCH (leaf:leaf {name: 'leaf'})
WITH leaf
CALL apoc.path.expandConfig(leaf, {relationshipFilter:'<rel',labelFilter:'/root'} ) yield path
RETURN last(nodes(path))

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk edges and find all vertexes that is connected to any of start vertex.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I write the algorithm by myself it would be a simple depth / breadth first search, and for every edge visited if the filter condition is true, output this edge. The complexity is O(V+E)
But I can't express this algorithm in Cypher since it's very different language.
Then i wrote this query:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not good as I thought. There are index on :Col(schema)
I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexSet1 is not empty)
{
Vertex0 = VertexQueue1.pop();
VisitedVertexSet.add(Vertex0);
foreach (Edge0 starting from Vertex0)
{
Vertex1 = endingVertex(Edge0);
if (Vertex1.schema <> Vertex0.schema)
{
EdgeSet1.put(Edge0);
}
if (VisitedVertexSet.notContains(Vertex1)
and VertexQueue1.notContains(Vertex1))
{
VertexQueue1.push(Vertex1);
}
}
}
return EdgeSet1;
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row number, it seems that Cypher exec engine returns all paths but I want distint edge list only.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is browsing a particular path it stops which it finds that it's looped back onto some node (EDIT relationship, not node) in the path matched so far, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First limiting to DISTINCT path isn't going to help because every path that it finds is distinct. What you want is the distinct set of pairs, I think, which I think could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
you might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;

How to eliminate a path where exists a relationship outside of the path, but between nodes in the path?

I have modified the 'vanilla' initial query in this console, and added one relationship type 'LOCKED' between the 'Morpheus' and 'Cypher' nodes.
How can I modify the existing (first-run) query, which is a variable length path so that it no longer reaches the Agent Smith node due to the additional Locked relationship I've added?
First-run query:
MATCH (n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
RETURN n AS Neo,r,m
I have tried this kind of thing:
MATCH p=(n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
AND none(rel IN rels(p) WHERE EXISTS (StartNode(rel)-[:LOCKED]->EndNode(rel)))
RETURN n AS Neo,r,m
..but it doesn't recognize the pattern inside the none() function.
I'm using Community 2.2.1
Thanks for reading
I'm pretty sure you can't use a function in a MATCHy type clause like that (though it's clever). What about this?
MATCH path=(neo:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
WHERE neo.name='Neo'
AND NOT('LOCKED' IN rels(path))
RETURN neo,r,m
EDIT:
Oops, looks like Dave might have beat me to the punch. Here's the solution I came up with anyway ;)
MATCH p=(neo:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE neo.name='Neo'
WITH p, neo, m
UNWIND rels(p) AS rel
MATCH (a)-[rel]->(b)
OPTIONAL MATCH a-[locked_rel:LOCKED]->b
WITH neo, m, collect(locked_rel) AS locked_rels
WHERE none(locked_rel IN locked_rels WHERE ()-[locked_rel]->())
RETURN neo, m
Ok, this is a little convoluted but i think it works. The approach is to take all of the paths and find the last known good nodes (ones that have LOCKED relationships leaving them). Then use that node(s) as a new ending point(s) and return the paths.
match p=(n:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
where n.name='Neo'
with n, relationships(p) as rels
unwind rels as r
with n
, case
when type(r) = 'LOCKED' then startNode(r)
else null
end as last_good_node
with n
, (collect( distinct last_good_node)) as last_good_nodes
unwind last_good_nodes as g
match p=n-[r:KNOWS|LOVES*]->g
return p
I think this would be simpler if there was a locked: true property on the KNOWS and LOVES relationships.

cypher query to get middle objects in traversal and shortest path

Can't seem to figure out, and I'm not sure it entirely possible. I have a graph like so
a-[:granted]->b-[:granted]->...x-[granted_source]>s
where b and x are of interest. While I already know a and s, the end points, which are defined in START clause.
Note that b and c could be one ( a->b->s ) or more then one ( a->b->c->x->s ) and the goal is to find the shortest path returning only the nodes that are pointed to by a 'granted' relationship.
The closest I've got is:
start s=node(21), p=node(2)
match paths=shortestPath(p-[:granted|granted_source*]->s)
return NODES(paths)
Which gives all the nodes, including start (p) and end (s). But I can't seem to filter out, or better would be to not return them at all, only the nodes that are pointed to by a granted relationship and in the order from (s) if possible. I'm on Neo4j 2.0b and I'm wondering if Labels, which I have no issue using, would be the better way to go? Any help would be very appreciated.
So, you want to chop the head and tail off of a collection of nodes? (Am I understanding that right?) How about:
start s=node(21), p=node(2)
match paths=shortestPath(p-[:granted|granted_source*]->s)
return NODES(paths)[1..-1]
I think I resolved it using a WITH, I think this is probably the best performance given that first the p-... are fetched, then all ...->s are fetched and then using the shortestPath() is used to get the 'in between' nodes. The results appear correct.
start s=node(21), p=node(2)
match p-[:granted]-x, y-[:granted_source]->s
with x, y
match paths=shortestPath(x-[:granted*]->y)
return NODES(paths)

Resources