Match Only Full Paths in Neo4J with Cypher (not sub-paths)

Match Only Full Paths in Neo4J with Cypher (not sub-paths) - neo4j

If I have a graph like the following (where the nesting could go on for an arbitrary number of nodes):
(a)-[:KNOWS]->(b)-[:KNOWS]->(c)-[:KNOWS]->(d)-[:KNOWS]->(e)
| |
| (i)-[:KNOWS]->(j)
|
(f)-[:KNOWS]->(g)-[:KNOWS]->(h)-[:KNOWS]->(n)
|
(k)-[:KNOWS]->(l)-[:KNOWS]->(m)
How can I retrieve all of the full-length paths (in this case, from (a)-->(m), (a)-->(n) (a)-->(j) and (a)-->(e)? The query should also be able to return the nodes with no relationships of the given type.
So far I am just doing the following (I only want the id property):
MATCH path=(a)-[:KNOWS*]->(b)
RETURN collect(extract(n in nodes(path) | n.id)) as paths
I need the paths so that in the programming language (in this case clojure) I can create a nested map like this:
{"a" {"b" {"f" {"g" {"k" {"l" {"m" nil}}
"h" {"n" nil}}}
"c" {"d" {"e" nil}
"i" {"j" nil}}}}}
Is it possible to generate the map directly with the query?

Just had to do something similar, this worked on your example, finds all nodes which do not have outgoing [:KNOWS]:
match p=(a:Node {name:'a'})-[:KNOWS*]->(b:Node)
optional match (b)-[v:KNOWS]->()
with p,v
where v IS NULL
return collect(extract(n in nodes(p) | n.id)) as paths

Here is one query that will get you started. This query will return just the longest chain of nodes when there is a single chain without forks. It matches all of the paths like yours does but only returns the longest one by using limit to reduce the result.
MATCH p=(a:Node {name:'a'})-[:KNOWS*]->(:Node)
WITH length(p) AS size, p
ORDER BY size DESC
LIMIT 1
RETURN p AS Longest_Path
I think this gets the second part of your question where there are multiple paths. It looks for paths where the last node does not have an outbound :KNOWS relationship and where the starting node does not have an inbound :KNOWS relationship.
MATCH p=(a:Node {name:'a'})-[:KNOWS*]->(x:Node)
WHERE NOT x-[:KNOWS]->()
AND NOT ()-[:KNOWS]->(a)
WITH length(p) AS size, p
ORDER BY size DESC
RETURN reduce(node_ids = [], n IN nodes(p) | node_ids + [id(n)])

Related

How Many Nodes Are Involved in a Match

How can I know how many nodes and edges are involved in a MATCH? Is there another way besides Explain / Profile Match?

If you mean how many nodes are matched in a path, such as a variable-length path, then you can assign a path variable for this:
MATCH p = (k:Person {name:'Keanu Reeves'})-[*..8]-(t:Person {name:'Tom Hanks'})
WITH p LIMIT 1
RETURN p, length(p) as pathLength, length(p) + 1 as numberOfNodesInPath
You can also use nodes(p) and relationships(p) to get the collection of nodes and relationships that make up the path, and you can use size() on those collections to get their size.

There exists the COUNT() function of Cypher that allows you to count the number of elements. As for example in this query:
MATCH (n)
RETURN COUNT(n);
This query will count all nodes in your database.
You can find more information in the cypher manual, under the aggregating functions. Check it out.

The following Cypher snippet should return the number of distinct nodes and relationships found by any given MATCH clause. Just replace <your code here> with your MATCH pattern.
MATCH <your code here>
WITH COLLECT(NODES(p)) AS ns, SUM(SIZE(RELATIONSHIPS(p))) AS relCount
UNWIND ns AS nodeList
UNWIND nodeList AS node
RETURN COUNT(DISTINCT node) AS nodeCount, relCount;

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

Background
Hi all, I am currently trying to write a cypher statement that allows me to find a set of paths on a map from a starting point. I want my search result to always return connecting streets within 5 nodes. Optionally, if there's a nearby hospital, I would like my search pattern to also indicate nearby hospitals.
Main Problem
Because there isn't always a nearby hospital to the current street, sometimes my optional match search pattern comes back as null. Here's the current cypher statement I'm using:
MATCH path=(a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WHERE ALL (x IN nodes(path) WHERE (x:Street))
WITH DISTINCT nodes(path) + nodes(optionalPath) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
However, this syntax only works if optionalPath contains nodes. If it doesn't, the statement nodes(path) + nodes(optionalPath) is an operation adding null and I get no records. This is true even the nodes(path) term does contain nodes.
What's the best way to get around this problem?

You can use COALESCE to replace a NULL with some other value. For example:
MATCH path=(:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
WHERE ALL (x IN nodes(path) WHERE x:Street)
OPTIONAL MATCH optionalPath=(b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodes(path) + COALESCE(nodes(optionalPath), []) as n
UNWIND n as nodes
RETURN DISTINCT nodes;
I have also made a few other improvements:
The WHERE clause was moved up right after the first MATCH. This eliminates the unwanted path values immediately. Your original query would get all path values (even unwanted ones) and always the perform the second MATCH query, and only eliminate unwanted paths afterwards. (But, it is actually not clear if you even need the WHERE clause at all; for example, if the CONNECTED_TO relationship is only used between Street nodes.)
The DISTINCT in your WITH clause would have prevented duplicate n collections, but the collections internally could have had duplicate paths. This was probably not what you wanted.

It seems you don't really want the path, just all the street nodes within 5 steps, plus any connected hospitals. So I would simplify your query to just that, and then condense the 3 columns down to 1.
MATCH (a:Street {id: 123})-[:CONNECTED_TO*..5]-(b:Street)
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH collect(a) + collect(b) + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
If Streets can be indirectly connected (hospital in between), Than I'd adjust like this
MATCH (a:Street {id: 123})-[:CONNECTED_TO]-(b:Street)
WITH a as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
MATCH (a)-[:CONNECTED_TO]-(b:Street)
WITH nodez+collect(b) as nodez, b as a
OPTIONAL MATCH (b)-[:CONNECTED_TO]->(hospital:Hospital)
WITH nodez + collect(hospital) as n
UNWIND n as nodez
RETURN DISTINCT nodez;
It's a bit more verbose, but just says exactly what you want (and also adds the start node to the hospital check list)

neo4j how to use count(distinct()) over the nodes of path

I search the longest path of my graph and I want to count the number of distinct nodes of this longest path.
I want to use count(distinct())
I tried two queries.
First is
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But if I tried the query
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is
count(distinct(primero))
2
How can I use count(distinct()) over the node primero.
Node Primero has a field called id.

You should bind at least one of those nodes, add a direction and also consider a path-limit otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
return distinct n;

collect distinct nodes and relations from multiple rows

I'm writing a query to retrieve nodes and relations from multiple paths :
MATCH path=(p:Label)-[*..100]->()
RETURN [n in nodes(path) | ID(n)] as nodeIds,
[n in nodes(path)] as nodes,
[r in relationships(path) | ID(r)] as relationshipIds,
[r in relationships(path) | type(r)] as relationshipTypes,
[r in relationships(path)] as relationships
However I have multiple rows (corresponding to each path) with possibly the same data.
I'd like to have one row containing all the distinct nodeIds, relationshipIds, ...
Thank you !

When I run this query, I don't get duplicate data. But I can see why you might think there's duplicate data. First, you could try this:
MATCH path=(p:Label)-[*..100]->()
WITH DISTINCT(path) as path
RETURN [n in nodes(path) | ID(n)] as nodeIds,
[n in nodes(path)] as nodes,
[r in relationships(path) | ID(r)] as relationshipIds,
[r in relationships(path) | type(r)] as relationshipTypes,
[r in relationships(path)] as relationships
ORDER BY length(path) LIMIT 1;
This would ensure that all paths were distinct, which would mean you couldn't have repeated data, but I think that should already be the case. Ordering by path length means longest paths go first, and limit 1 means only the longest path.
Anyway, the duplication you're probably seeing is due to paths and path fragments. Let's say I have a->b->c. Your query will report three paths:
a->b->c
a->b
b->c
Note that's the correct answer. But in terms of node IDs and relationship IDs, you're going to see a lot of duplication in the result set, because every single node ID will occur at least twice in the results.

neo4j collecting nodes and relations type b-->a<--c,a<--d

I am extending maxdemarzi's excellent graph visualisation example (http://maxdemarzi.com/2013/07/03/the-last-mile/) using VivaGraph backed by neo4j.
I want to display relationships of the type
a-->b<--c,b<--d
I tried the query
MATCH p = (a)--(b:X)--(c),(b:X)--(d)
RETURN EXTRACT(n in nodes(p) | {id:ID(n), name:COALESCE(n.name, n.title, ID(n)), type:LABELS(n)}) AS nodes,
EXTRACT(r in relationships(p)| {source:ID(startNode(r)) , target:ID(endNode(r))}) AS rels
It looks like the named query picks up only a-->b<--c pattern and omits the b<--d patterns.
Am i missing something... can i not add multiple patterns in a named query?

The most immediate problem is that the comma in the MATCH clause separates the first pattern from the second. The variable 'p' only stores the first pattern. This is why you aren't getting the results you desire. Independent of that, you are at risk of having a 'loose binding' by putting a label on both of your nodes named 'b' in the two patterns. The second 'b' node should not have a label.
So here is a version of your query that should work.
MATCH p1=(a)-->(b:X)<--(c), p2=(b)<--(d)
WITH nodes(p1) + d AS ns, relationships(p1) + relationships(p2) AS rs
RETURN EXTRACT(n IN ns | {id:ID(n), name:COALESCE(n.name, n.title, ID(n)), type:LABELS(n)}) AS nodes,
EXTRACT(r in rs| {source:ID(startNode(r)) , target:ID(endNode(r))}) AS rels
Capture both paths, then build collections from the nodes and relationships of both paths. The collection of nodes actually only extracts the nodes from p1 and adds the 'd' node. You could write that part as
nodes(p1) + nodes(p2) as ns
but then the 'b' node will appear in the list twice.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Match Only Full Paths in Neo4J with Cypher (not sub-paths) - neo4j

Just had to do something similar, this worked on your example, finds all nodes which do not have outgoing [:KNOWS]: match p=(a:Node {name:'a'})-[:KNOWS*]->(b:Node) optional match (b)-[v:KNOWS]->() with p,v where v IS NULL return collect(extract(n in nodes(p) | n.id)) as paths

Related

How Many Nodes Are Involved in a Match

How to write cypher statement to combine nodes when an OPTIONAL MATCH is null?

neo4j how to use count(distinct()) over the nodes of path

collect distinct nodes and relations from multiple rows

neo4j collecting nodes and relations type b-->a<--c,a<--d

Categories

Resources