Cypher match path with intermediate nodes - neo4j

I have the following graph with Stop (red) and Connection (green) nodes.
I want to find the shortest path from A to C using a cost property on Connection.
I would like to avoid making Connection a relationship because then I lose the CONTAINS relationship of Foo.
I can match a single hop like this
MATCH p=(:Stop {name:'A'})<-[:BEGINS_AT]-(:Connection)-[:ENDS_AT]->(:Stop {name:'B'}) RETURN p
but this does not work with an arbitrary number of Connections like it would with relationships and [*].
I also tried to make a projection down to simple relationships, but it seems I cannot do anything with it without GDS.
MATCH (s1:Stop)<-[:BEGINS_AT]-(c:Connection)-[:ENDS_AT]->(s2:Stop) RETURN id(s1) AS source, id(s2) AS target, c.cost AS cost
Note that the connection is unidirectional, so it must not be possible to go from C to A.
Is there a way to do this without any Neo4j plugins?

This should get all usable paths (without plugins):
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=(a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop)
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
To get the shortest path:
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=(a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop)
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p
ORDER BY LENGTH(p)
LIMIT 1;
or
WITH ['BEGINS_AT', 'ENDS_AT'] AS types
MATCH p=shortestpath((a:Stop)-[:BEGINS_AT|ENDS_AT*]-(b:Stop))
WHERE a.name = 'A' AND b.name = 'B' AND
ALL(i IN RANGE(0, LENGTH(p)-1) WHERE TYPE(RELATIONSHIPS(p)[i]) = types[i%2])
RETURN p

If you want to calculate the weighted shortest path, the easiest option is to use the GDS or even the APOC plugin. You could probably write a weighted shortest path query in plain Cypher, but it would not be optimized. The only approach I can think of is to find all paths between the two nodes, sum the weights along each path, and then keep the path with the minimum total weight. This would not scale well, though.
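If a plugin is not an option, one way to get a weighted result is to run the projection query from the question, pull the (source, target, cost) rows into the client, and run Dijkstra there. The following is a minimal Python sketch under that assumption. The sample rows and stop names are hypothetical, the rows are assumed to fit in memory, and this is client-side code rather than a Neo4j feature.

```python
import heapq
from collections import defaultdict


def cheapest_path(rows, start, goal):
    """Dijkstra over directed (source, target, cost) rows; costs must be non-negative."""
    graph = defaultdict(list)
    for source, target, cost in rows:
        graph[source].append((target, cost))  # one direction only

    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost_so_far, node = heapq.heappop(queue)
        if node == goal:
            # Walk the predecessor links back to the start.
            path = [node]
            while node != start:
                node = previous[node]
                path.append(node)
            return cost_so_far, path[::-1]
        if cost_so_far > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, cost in graph[node]:
            new_cost = cost_so_far + cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    return None  # goal not reachable from start


# Hypothetical rows, shaped like the output of the projection query.
rows = [("A", "B", 4), ("B", "C", 1), ("A", "C", 7)]
print(cheapest_path(rows, "A", "C"))
print(cheapest_path(rows, "C", "A"))
```

Because edges are stored in one direction only, the reverse lookup from C to A finds nothing, which matches the requirement that connections are unidirectional.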
As for the second part of your question, I would need more information, as I don't know exactly what you want.

Related

Neo4j -shortest path where intermediate nodes have a relationship [duplicate]

I have a graph database that consists of nodes (bus stations) with a property called “is_in_operation” which is set to “true” if the bus station is operational; otherwise it is set to “false”.
There is a relationship created between two nodes if a bus travels between the two stations.
I would like to find the path with the fewest stops between two nodes where all nodes in the path are operational.
There is an example in the database where there are 2 paths between 2 specified nodes. The “is_in_operation” property is set to ‘true’ for all nodes in both paths. When I run the following query, I get the correct answer
START d=node(1), e=node(5)
MATCH p = shortestPath( d-[*..15]->e ) where all (x in nodes(p) where x.is_in_operation='true')
RETURN p;
When I set the ‘is_in_operation’ property to ‘false’ for one of the intermediate nodes in the shortest path and rerun the query, I expect it to return the other path. However, I get no answer at all.
Is the query incorrect? If so, how should I specify the query?
The problem is that shortestPath can't take the WHERE clause into account, so you're matching the shortest path and then filtering it out with your WHERE.
How about this one? It might not be as efficient as shortestPath, but it should return a result if one exists:
START d=node(1), e=node(5)
MATCH p = d-[*..15]->e
where all (x in nodes(p) where x.is_in_operation='true')
RETURN p
order by length(p)
limit 1;
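The difference between filtering after the search and filtering during the search can be seen outside Neo4j. The following is a minimal Python sketch, assuming a hypothetical in-memory graph with the same is_in_operation idea (stored here as booleans rather than the 'true'/'false' strings used in the question). It only illustrates the principle and is not how Neo4j implements shortestPath.

```python
from collections import deque


def shortest_operational_path(edges, operational, start, goal):
    """Breadth-first search that never steps onto a non-operational node."""
    if not (operational.get(start) and operational.get(goal)):
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Rebuild the path by following parent links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in edges.get(node, []):
            if neighbour not in parents and operational.get(neighbour):
                parents[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical data: two routes from station 1 to station 5.
edges = {1: [2, 3], 2: [5], 3: [4], 4: [5]}
operational = {1: True, 2: False, 3: True, 4: True, 5: True}
print(shortest_operational_path(edges, operational, 1, 5))
```

With station 2 out of operation, the search falls back to the longer route through stations 3 and 4 instead of returning nothing.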

neo4j -- Find all shortest paths between more than 2 nodes

For example, I want to query allShortestPaths between 3 nodes (A, B, C), meaning I want to query:
1. the allShortestPaths between A and B
2. the allShortestPaths between C and B
3. the allShortestPaths between A and C
But I can only find an allShortestPaths query that returns all shortest paths between two nodes,
as follows:
MATCH (node1:E { eid:"a9c2f114-796f-4934-a2d0-04bb3345e1d2" }),
(node2:E { eid:"01968dd2-1ed6-472d-82e9-be7701036b3b" }),
p = allShortestPaths((node1)-[*]-(node2))
RETURN p LIMIT 25
I am wondering whether there is an allShortestPaths query that supports more than 2 input nodes.
Right now, to search 3 nodes, I have to invoke allShortestPaths three times, as follows:
MATCH (node1:E { eid:"b73ade90-dfa1-4b94-bd0f-c16fd93bd680" }),
(node2:E { eid:"ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7" }),
(node3:E { eid:"0398b081-6676-4a91-856b-abbabaee5e70" }) ,
p = allShortestPaths((node1)-[*]-(node2)),
q = allShortestPaths((node3)-[*]-(node2)),
m = allShortestPaths((node3)-[*]-(node1))
RETURN p,q,m LIMIT 10
What I want to do is search allShortestPaths between an arbitrary number of nodes.
So far, I intend to write a user-defined procedure, but that will cost more time. I am wondering whether anyone can offer better advice.
I want to search allShortestPaths between several nodes,
such as: allShortestPaths((a)-[*]-(b)-[*]-(c)-[*]-(a))
I want to get all shortest paths between a and b, b and c, and c and a in one query.
You need nested loops:
// Array of id
WITH ["b73ade90-dfa1-4b94-bd0f-c16fd93bd680",
"ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7",
"0398b081-6676-4a91-856b-abbabaee5e70"] as IDS
UNWIND IDS as vid
// Looking for the desired nodes
MATCH (N:E {eid: vid})
WITH collect(N) as NS
// Nested loops
UNWIND RANGE(0, size(NS)-2) as i1
UNWIND RANGE(i1+1, size(NS)-1) as i2
WITH NS[i1] as N1,
NS[i2] as N2
// Get paths
MATCH ps = allShortestPaths((N1)-[*]-(N2))
RETURN ps
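The two UNWIND ... RANGE loops above visit each unordered pair of nodes exactly once, which gives n*(n-1)/2 pairs for n nodes. The following is a small Python sketch of the same index arithmetic, using the ids from the question. The helper name unordered_pairs is hypothetical, and Cypher's RANGE is inclusive at both ends, hence the shifted upper bounds.

```python
from itertools import combinations

ids = [
    "b73ade90-dfa1-4b94-bd0f-c16fd93bd680",
    "ddb5c52d-7002-4ac7-87d5-0f727f2ab3e7",
    "0398b081-6676-4a91-856b-abbabaee5e70",
]


def unordered_pairs(items):
    """Mirror of the nested UNWIND loops: i1 in 0..n-2, i2 in i1+1..n-1."""
    pairs = []
    for i1 in range(0, len(items) - 1):
        for i2 in range(i1 + 1, len(items)):
            pairs.append((items[i1], items[i2]))
    return pairs


pairs = unordered_pairs(ids)
print(len(pairs))
# The standard library produces the same pairs in the same order.
print(pairs == list(combinations(ids, 2)))
```

Each pair then corresponds to one allShortestPaths call in the query above.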
Neo4j doesn't provide a version of allShortestPaths taking multiple patterns, which is what you want:
allShortestPaths((node1)-[*]-(node2), (node1)-[*]-(node3), (node2)-[*]-(node3))
You wish to optimize the traversals by piggy-backing on the first one to do the second one at the same time, but there's no such thing out of the box, and it wouldn't do the third one either. It's a really specific use case.
You either have to call allShortestPaths n*(n-1)/2 times (once per unordered pair, for n nodes) in Cypher, or try implementing it yourself server-side in a procedure using the Traversal framework.
Here is a sample Cypher query:
MATCH (n:Entity) where n.name IN {names}
WITH collect(n) as nodes
UNWIND nodes as n
UNWIND nodes as m
WITH * WHERE id(n) < id(m)
MATCH path = allShortestPaths( (n)-[*..4]-(m) )
RETURN path
See https://neo4j.com/developer/kb/all-shortest-paths-between-set-of-nodes/ for more.

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk edges and find all vertexes that are connected to any of the start vertexes.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I wrote the algorithm myself, it would be a simple depth-first or breadth-first search, and for every edge visited, if the filter condition is true, output the edge. The complexity is O(V+E).
But I can't express this algorithm in Cypher, since it is a very different language.
Then I wrote this query:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not as good as I expected. There is an index on :Col(schema).
I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexQueue1 is not empty)
{
    Vertex0 = VertexQueue1.pop();
    VisitedVertexSet.add(Vertex0);
    foreach (Edge0 starting from Vertex0)
    {
        Vertex1 = endingVertex(Edge0);
        if (Vertex1.schema <> Vertex0.schema)
        {
            EdgeSet1.put(Edge0);
        }
        if (VisitedVertexSet.notContains(Vertex1)
            and VertexQueue1.notContains(Vertex1))
        {
            VertexQueue1.push(Vertex1);
        }
    }
}
return EdgeSet1;
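For reference, the pseudocode above can be written as runnable Python. This is a minimal sketch over a hypothetical in-memory adjacency map, not something that runs inside Neo4j. A single seen set stands in for the combined "visited or already queued" check, so every vertex is dequeued once and every outgoing edge is examined once.

```python
from collections import deque


def cross_schema_edges(out_edges, schema, starts):
    """Collect every reachable edge whose endpoints have different schemas."""
    unique_starts = list(dict.fromkeys(starts))  # drop duplicate start vertexes
    queue = deque(unique_starts)
    seen = set(unique_starts)  # visited or already queued
    result = []
    while queue:
        vertex = queue.popleft()
        for target in out_edges.get(vertex, []):
            if schema[vertex] != schema[target]:
                result.append((vertex, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return result


# Hypothetical graph containing the cycle a -> b -> c -> a.
out_edges = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
schema = {"a": "S1", "b": "S1", "c": "S2", "d": "S2"}
print(cross_schema_edges(out_edges, schema, ["a"]))
```

The cycle does not cause repeated work, because a vertex that is already in the seen set is never queued again.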
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row counts, it seems that the Cypher execution engine returns all paths, but I only want the distinct edge list.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"}),
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is traversing a particular path, it stops when it finds that it has looped back onto some relationship (EDIT: relationship, not node) in the path matched so far, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First, limiting to DISTINCT paths isn't going to help, because every path that it finds is distinct. What you want is the distinct set of pairs, I think, which could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"}),
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINCT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
You might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;

Cypher path needs to exclude a certain relation

I have this graph:
A-[:X]->B-> a whole tree of badness
A-[:Y]->C-> a whole tree of goodness
I would like to know how to specify a path starting with A that excludes the :X relationship.
In this case "Y" could be any one of a number of different edge types. I do not want to specify them explicitly.
How do I write a path statement that includes A-[*]-B where * is not :X but can be anything else?
Solution for a fixed number of relationships between A and B
You can exclude a relationship type by matching all relationships from A to B and then filtering out a specific type with WHERE NOT:
MATCH p = (a:Label1)-[]-(b:Label2)
WHERE NOT (a)-[:X]-(b)
RETURN p
Solution for a variable length path between A and B
If you have a variable length path between A and B you cannot put the exact pattern in the WHERE NOT. Instead, you can use a NONE predicate on the path:
MATCH p = (a:Label1)-[*]-(b:Label2)
// this WHERE makes sure that none of the relationships in the
// returned path fulfill the criterion type(relationship) = 'X'
WHERE NONE (r in relationships(p) WHERE type(r) = 'X')
RETURN p
This Cypher query is simpler than the variable-length path query from @MartinPreusse, as it avoids using the RELATIONSHIPS function. Profiling shows that its execution plan is also a bit simpler, so it might be faster.
MATCH p=(a:Label1)-[rels*]-(b:Label2)
WHERE NONE (r IN rels WHERE type(r)= 'X')
RETURN p
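To make the effect of the NONE predicate concrete, here is a minimal Python sketch that enumerates paths on a hypothetical three-edge graph while refusing to follow an excluded relationship type. It assumes undirected traversal and uses each edge at most once per path, which is how the variable-length patterns above behave. The function name and sample data are invented for illustration.

```python
def paths_without_type(edges, start, goal, excluded):
    """Enumerate node sequences from start to goal that never use the excluded type."""
    found = []

    def walk(node, used, nodes):
        if node == goal and used:
            found.append(list(nodes))
        for index, (source, rel_type, target) in enumerate(edges):
            if index in used or rel_type == excluded:
                continue  # edge already on this path, or of the forbidden type
            if node == source:
                following = target
            elif node == target:
                following = source
            else:
                continue  # edge does not touch the current node
            walk(following, used | {index}, nodes + [following])

    walk(start, frozenset(), [start])
    return found


# Hypothetical graph: A-[:X]-B, A-[:Y]-C, C-[:Z]-B.
edges = [("A", "X", "B"), ("A", "Y", "C"), ("C", "Z", "B")]
print(paths_without_type(edges, "A", "B", "X"))
```

Only the route through C survives, because the direct A to B edge has type X. Passing None as the excluded type returns both routes.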

Neo4j cypher query efficiency and syntax

I am attempting to query an ontology of health represented as an acyclic, directed graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes subsumed by a disease concept and caused by a particular bacteria or any of the bacteria subtypes as follows:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in < 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept)
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to cypher coding, but I have to imagine there is a better way to write this query to be more efficient. How would Collections improve this?
Thanks
I already answered that on the google group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name) ?
I am working with an acyclic, directed graph (an ontology) that models
human health and am needing to identify certain diseases (example:
Pneumonia) that are infectious but NOT caused by certain bacteria
(staph or streptococcus). All concepts are Nodes defined as
ObjectConcepts. ObjectConcepts are connected by relationships such as
[ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
this already returns a number of paths, potentially millions, can you check that with
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concept Genus Staph and Genus Strep) as follows. Note:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
this is then the cross product of the paths from "p", "q" and "h", e.g. if all 3 of them return 1000 paths, you're at 1bn paths !!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
You don't need the WITH or even the MATCH (b),(c),(d),(e),(f).
What connections are there between b and the other nodes? Do you have concrete ones? The first pattern is also missing a direction.
The WHERE clause can be a problem. In general, this query is perhaps better expressed as a UNION of simpler matches,
e.g.
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:Strep}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:Staph}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Strep}) return b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Staph}) return b.name
Another option would be to use the shortestPath() function to find one or all shortest path(s) between Pneumonia and the bacteria with certain rel-types and direction.
Perhaps you can share the dataset and the expected result.
The query was successfully accomplished using UNION as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
The query runs in under 20 seconds instead of 20 minutes, because it reduces the cardinality of the objects being queried.
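The arithmetic behind this speed-up can be sketched in a few lines of Python. The path counts are hypothetical (the 1000 figure is borrowed from the example in the answer above, not measured on the real data), and the real query planner may behave differently. The point is only that independent patterns in one MATCH multiply, while UNION branches add.

```python
from math import prod


def combined_rows(path_counts):
    """Rows produced when independent path patterns share one MATCH (cross product)."""
    return prod(path_counts)


# Hypothetical: each of the patterns p, q and t matches 1000 paths.
single_match = combined_rows([1000, 1000, 1000])          # p x q x t in one query
union_branches = combined_rows([1000, 1000]) * 2          # (p x q) UNION (p x t)

print(single_match)
print(union_branches)
print(single_match // union_branches)
```

Under these assumed counts, the UNION form handles 500 times fewer intermediate rows than the single three-pattern MATCH.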

Resources