Getting number of unique paths traversed while traversing using Neo4j traversal API - neo4j

I have created the following basic graph:
CREATE (:NodeType1 {prop1:'value1'})-[:RelType1 {prop2:'value2'}]->(:NodeType2 {prop3:'value3'})-[:RelType2 {prop4:'value4'}]->(:NodeType3 {prop5:'value5'})-[:RelType3 {prop6:'value6'}]->(:NodeType4 {prop7:'value7'})
CREATE (:NodeType1 {prop1:'value8'})-[:RelType1 {prop2:'value9'}]->(:NodeType2 {prop3:'value10'})-[:RelType2 {prop4:'value11'}]->(:NodeType3 {prop5:'value12'})-[:RelType3 {prop6:'value13'}]->(:NodeType4 {prop7:'value14'})
MATCH path=(n:NodeType1 {prop1:'value1'})-[*]->(m:NodeType4 {prop7:'value7'})
CREATE (n)-[:RelType1 {prop2:'value15'}]->(:NodeType2 {prop3:'value16'})-[:RelType2 {prop4:'value17'}]->(:NodeType3 {prop5:'value18'})-[:RelType3 {prop6:'value19'}]->(m)
The graph looks like this:
When I run the following Cypher:
MATCH path=(a:NodeType1 {prop1:"value1"})-[:RelType1]->(b)-[:RelType2]->(c)-[:RelType3]->(d)
RETURN count(nodes(path))
I get 2 as the output. It seems that count(nodes(path)) doesn't return the number of nodes in the path but simply the number of rows in the returned result, since if I return the path:
MATCH path=(a:NodeType1 {prop1:"value1"})-[:RelType1]->(b)-[:RelType2]->(c)-[:RelType3]->(d)
RETURN path
I get two rows in the returned result:
Now I am wondering how I can get the same output when traversing with the Neo4j traversal API. I get the number of unique nodes in Cypher as follows:
MATCH path=(a:NodeType1 {prop1:"value1"})-[:RelType1]->(b)-[:RelType2]->(c)-[:RelType3]->(d)
RETURN size(collect(distinct a))+size(collect(distinct b))+size(collect(distinct c))+size(collect(distinct d))
The above query correctly returns 6.
The same can be done in the traversal API by keeping a static counter inside the path expander that is incremented each time expand() is called. (Is there a better approach for this?)
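import java.util.Collections;
import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Path;
import org.neo4j.graphdb.PathExpander;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.traversal.BranchState;

// Assumes the Neo4j 3.x embedded API; MyLabels and MyRelations are my own enums.
public class MyPathExpander implements PathExpander<Object> {
    static int nodeCount = 0;

    @Override
    public Iterable<Relationship> expand(Path path, BranchState<Object> state) {
        Node lastNode = path.endNode();
        nodeCount++; // increment the count of nodes visited (once per expand() call)
        if (lastNode.hasLabel(MyLabels.NodeType1))
            return lastNode.getRelationships(MyRelations.RelType1, Direction.OUTGOING);
        else if (lastNode.hasRelationship(MyRelations.RelType1, Direction.INCOMING))
            return lastNode.getRelationships(MyRelations.RelType2, Direction.OUTGOING);
        else if (lastNode.hasRelationship(MyRelations.RelType2, Direction.INCOMING))
            return lastNode.getRelationships(MyRelations.RelType3, Direction.OUTGOING);
        // End of the pattern (incoming RelType3) or no match: stop expanding.
        return Collections.emptyList();
    }

    @Override
    public PathExpander<Object> reverse() {
        throw new UnsupportedOperationException("reverse traversal is not needed here");
    }
}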
However, I cannot think of a way to get the number of unique paths followed during the traversal when using the traversal API (the equivalent of RETURN count(nodes(path)) above). How can I do this? Is it not possible with the traversal API?
PS: By unique path I mean a unique permutation of the order of nodes visited while traversing. For example, a-b-c, a-c-b and a-b-c-d are all unique.

This query does not return the count of unique paths, but the count of all paths returned by the query:
MATCH path=(a:NodeType1 {prop1:"value1"})-[:RelType1]->(b)
-[:RelType2]->(c)-[:RelType3]->(d)
RETURN count(nodes(path))
If you want to count the number of unique nodes matched by the query, you can do it like this:
MATCH path=(a:NodeType1 {prop1:"value1"})-[:RelType1]->(b)
-[:RelType2]->(c)-[:RelType3]->(d)
UNWIND nodes(path) AS N
RETURN count(distinct N)
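The same idea can be sketched outside Cypher. The snippet below is only an illustration in plain Java, not Neo4j API code: each path is modelled as a list of hypothetical node ids, and the class and method names are made up for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctNodeCount {
    // Each path is modelled as the ordered list of node ids it visits (hypothetical ids).
    static int countDistinctNodes(List<List<Long>> paths) {
        Set<Long> seen = new HashSet<>();
        for (List<Long> path : paths) {
            seen.addAll(path); // plays the role of UNWIND nodes(path) plus count(distinct N)
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Two paths that share start node 0 and end node 3, like the sample graph.
        List<List<Long>> paths = List.of(
                List.of(0L, 1L, 2L, 3L),
                List.of(0L, 4L, 5L, 3L));
        System.out.println(countDistinctNodes(paths)); // prints 6
    }
}
```

With the traversal API, the node ids of each returned path would be fed into the set in the same way.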
One way to count the number of unique paths is to calculate a unique fingerprint for each path (this should not be difficult to reproduce with the traversal API):
MATCH path=(a:NodeType1 {prop1:"value1"})-[:RelType1]->(b)
-[:RelType2]->(c)-[:RelType3]->(d)
WITH path,
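     // the "," separator keeps id sequences such as (1, 23) and (12, 3) from producing the same string
     REDUCE(acc="", x in nodes(path) | acc + id(x) + ",") +
     REDUCE(acc="", x in rels(path) | acc + id(x) + ",") as uniID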
RETURN count(distinct uniID)
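As a rough sketch of how the fingerprint idea might carry over to a hand-written traversal, the plain Java below walks a small in-memory copy of the sample graph and collects one fingerprint per completed path. It does not use the Neo4j API; the edge table, the ids, and the names PathFingerprints, walk and countUniquePaths are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PathFingerprints {
    // Each edge is {relationshipId, startNodeId, endNodeId}; all ids are hypothetical.
    static final long[][] EDGES = {
        {10, 0, 1}, {11, 1, 2}, {12, 2, 3},   // first branch  a -> b1 -> c1 -> d
        {13, 0, 4}, {14, 4, 5}, {15, 5, 3}    // second branch a -> b2 -> c2 -> d
    };

    // Depth-first walk; records a fingerprint each time a path reaches the requested length.
    static void walk(long node, int hopsLeft, String fingerprint, List<String> out) {
        if (hopsLeft == 0) {
            out.add(fingerprint);
            return;
        }
        for (long[] e : EDGES) {
            if (e[1] == node) {
                // Separators between ids avoid collisions between different id sequences.
                walk(e[2], hopsLeft - 1, fingerprint + "-[" + e[0] + "]->" + e[2], out);
            }
        }
    }

    static int countUniquePaths(long start, int hops) {
        List<String> fingerprints = new ArrayList<>();
        walk(start, hops, String.valueOf(start), fingerprints);
        Set<String> unique = new HashSet<>(fingerprints);
        return unique.size();
    }

    public static void main(String[] args) {
        System.out.println(countUniquePaths(0, 3)); // prints 2
    }
}
```

In a real traversal the fingerprint would be built from the node and relationship ids of each Path the traverser returns, and the set size read once the traversal has finished.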

Related

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides Explain / Profile Match?
If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.
Cypher has a COUNT() function that counts the number of elements. For example, in this query:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
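You can find more information in the Cypher manual, under the aggregating functions.
The following Cypher snippet should return the number of distinct nodes found by a given MATCH clause, together with the total number of relationships summed over all matched paths (a relationship shared by several paths is counted once per path). Just replace <your code here> with your MATCH pattern, bound to a path variable p.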
MATCH <your code here>
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;
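The difference between summing relationship counts per path and counting distinct relationships can be seen in a small plain Java sketch. It models each path as a list of hypothetical relationship ids; the class and method names are invented for this illustration and nothing here is Neo4j API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RelationshipCounts {
    // Sum of per-path relationship counts, like SUM(SIZE(RELATIONSHIPS(p))).
    static int total(List<List<Long>> relIdsPerPath) {
        int total = 0;
        for (List<Long> rels : relIdsPerPath) {
            total += rels.size();
        }
        return total;
    }

    // Number of different relationships that appear in at least one path.
    static int distinct(List<List<Long>> relIdsPerPath) {
        Set<Long> seen = new HashSet<>();
        for (List<Long> rels : relIdsPerPath) {
            seen.addAll(rels);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Two paths that both start with relationship 10.
        List<List<Long>> paths = List.of(List.of(10L, 11L), List.of(10L, 12L));
        System.out.println(total(paths));    // prints 4
        System.out.println(distinct(paths)); // prints 3
    }
}
```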

neo4j how to use count(distinct()) over the nodes of path

I am searching for the longest path in my graph and I want to count the number of distinct nodes on this longest path.
I want to use count(distinct()).
I tried two queries.
The first is:
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But if I try the query
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is
count(distinct(primero))
2
How can I use count(distinct()) over the node primero?
The node primero has a field called id.
You should bind at least one of those nodes, add a direction, and also consider a path-length limit; otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
return distinct n;
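The approach in the query above (take the single longest path, then look at its distinct nodes) might be sketched in plain Java as follows. Paths are modelled as lists of hypothetical node ids, and the names used are assumptions for this example only.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

class LongestPathNodes {
    // Picks the path with the most nodes, then counts how many different nodes it contains.
    static int distinctNodesOnLongestPath(List<List<Long>> paths) {
        List<Long> longest = Collections.emptyList();
        for (List<Long> path : paths) {
            if (path.size() > longest.size()) {
                longest = path; // like ORDER BY length(p) DESC LIMIT 1
            }
        }
        return new HashSet<>(longest).size(); // like counting the rows of RETURN DISTINCT n
    }

    public static void main(String[] args) {
        // The second path passes through node 2 twice, so it has 5 entries but 4 distinct nodes.
        List<List<Long>> paths = List.of(
                List.of(1L, 2L),
                List.of(1L, 2L, 3L, 2L, 4L));
        System.out.println(distinctNodesOnLongestPath(paths)); // prints 4
    }
}
```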

Get nodes from LENGTH(r)

This code
MATCH (n { name: 'Create node' })<-[r*]-(s { name: ';' })
WITH n,s, LENGTH(r) AS depth
RETURN n,s, depth
will return the number of relationships between first and last nodes. Is it possible to get the nodes that are in between those relationships?
Bonus question: is it possible to get them in order?
http://console.neo4j.org/r/z1iafh
(This code does not work in the console, only on localhost. Query to create the nodes:
create
(_0 {name:"CREATE"}),
(_1 {name:"("}),
(_2 {name:"node_name"}),
(_3 {name:")"}),
(_4 {name:";"}),
_1-[:CREATE_NODE_COMMAND]->_0,
_2-[:CREATE_NODE_COMMAND]->_1,
_3-[:CREATE_NODE_COMMAND]->_2,
_4-[:CREATE_NODE_COMMAND]->_3
)
You could augment your MATCH statement to match the entire path from n to s, and then use the nodes() function on the path to return the collection of nodes in order (from n to s). If you want just the nodes between the start and end nodes, you could return the collection from the second to the second-last element only.
MATCH p=(n { name: 'Create node' })<-[r*]-(s { name: ';' })
WITH n,s, size(r) as depth, length(p) as depth2, nodes(p) as nodes
RETURN n,s, depth, depth2, nodes[1..size(nodes)-1]
size() can be used to return the number of elements in a collection whereas length() should only be used to return the length of a path or a string. Its use on other objects (collections and patterns) may be deprecated in future neo4j versions; currently supported for backwards compatibility.
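The slice that drops the first and last node has a direct counterpart on an ordinary list. The plain Java sketch below is only an analogy for nodes[1..size(nodes)-1]; the class and method names are hypothetical and the node names are taken from the question's sample data.

```java
import java.util.Collections;
import java.util.List;

class InteriorNodes {
    // Returns the elements strictly between the first and the last one, in order.
    static <T> List<T> interior(List<T> nodes) {
        if (nodes.size() <= 2) {
            return Collections.emptyList(); // nothing lies between the endpoints
        }
        return nodes.subList(1, nodes.size() - 1); // end index is exclusive, as in the Cypher slice
    }

    public static void main(String[] args) {
        List<String> path = List.of("CREATE", "(", "node_name", ")", ";");
        System.out.println(interior(path)); // prints [(, node_name, )]
    }
}
```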

How does count(nodes(p)) work in Cypher, Neo4j

I'm looking for an explanation of how this works and why it doesn't return the number of nodes in a path. Suppose I matched a path p. Now:
WITH p, count(nodes(p)) AS L1 RETURN L1
returns 1.
Once this is clear, how do I count a path's nodes properly?
count() is an aggregate function. When using any aggregate function, result rows are grouped by whatever is included in the WITH or RETURN clause that is not an aggregate function. In this case, result rows will be grouped by p and the aggregated value will be count(nodes(p)).
nodes(p) returns a list of nodes, so count(nodes(p)) counts the lists in each group; with one row per distinct path p, that is always 1.
To return the number of nodes in the path, you should use size(nodes(p)).
If you're just interested in the length of a path and not particularly in the nodes that are included in it, I would encourage you to use length(p). This will return the length in rels for a given path, without having to manipulate/access the nodes.
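The grouping behaviour described above can be imitated with Java streams. This is only an analogy under the assumption that each path is a list of hypothetical node ids; the names CountVersusSize, rowsPerPath and nodesInPath are invented for the sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CountVersusSize {
    // Imitates count(nodes(p)) grouped by p: one entry per distinct path, value = rows in that group.
    static Map<List<Long>, Long> rowsPerPath(List<List<Long>> paths) {
        return paths.stream().collect(Collectors.groupingBy(p -> p, Collectors.counting()));
    }

    // Imitates size(nodes(p)): the number of nodes on one path.
    static int nodesInPath(List<Long> path) {
        return path.size();
    }

    public static void main(String[] args) {
        List<Long> p = List.of(0L, 1L, 2L, 3L);
        System.out.println(rowsPerPath(List.of(p)).get(p)); // prints 1
        System.out.println(nodesInPath(p));                 // prints 4
    }
}
```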

How to find out connection between a set of nodes?

I have a scenario where I know the IDs of a list of nodes.
I need to get the connections (if any exist) between these nodes, given their IDs.
Is there any way to achieve this?
Update:
I am using a node id property, not Neo4j's internal ID (for example, match (n:Person{id:3})).
You can use the IN operator to select from a list of values:
MATCH (n)-[r*..2]-(m)
WHERE ID(n) IN [0,1,2] AND ID(m) IN [2,3,4]
RETURN r
I've limited the path length to 2 hops of indeterminate relationship type here, and arbitrarily picked some IDs.
To return the path instead:
MATCH p=(n)-[r*..2]-(m)
WHERE ID(n) IN [0,1,2] AND ID(m) IN [2,3,4]
RETURN p
Alternatively, using the legacy START syntax from older Neo4j versions:
START n=node(1,2,3,4,5,6) //your IDs of a list of nodes
MATCH p=n-[r]-m //the connection for 1 hop. for multiple hops do n-[r*]-m
WHERE Id(m) in [1,2,3,4,5,6] //your IDs of a list of nodes
RETURN p
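The one-hop case boils down to keeping only the relationships whose two endpoints both belong to the known id set. A minimal plain Java sketch of that filter follows; the edge table, the ids and the names are assumptions for illustration, and it does not touch the Neo4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ConnectionsWithinSet {
    // Edges are {startNodeId, endNodeId}; direction is ignored, as in the undirected pattern n-[r]-m.
    static List<long[]> connectionsWithin(long[][] edges, Set<Long> ids) {
        List<long[]> result = new ArrayList<>();
        for (long[] e : edges) {
            if (ids.contains(e[0]) && ids.contains(e[1])) {
                result.add(e); // both endpoints are in the known id set
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] edges = {{1, 2}, {2, 7}, {3, 4}, {8, 9}};
        Set<Long> ids = Set.of(1L, 2L, 3L, 4L);
        System.out.println(connectionsWithin(edges, ids).size()); // prints 2
    }
}
```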
