Neo4j: Customize Path Traversal - neo4j

I am pretty new to Neo4j. I have implemented an example use case with the following setup:
acyclic directed graph
nodes have a property called externalID
Nodes:
Node Type S (Start Node)
Node Type E (End Node)
Node Type I (Intermediate Node)
Relations:
Node Type S can only have outgoing relations to Nodes of Type I
Node Type I can have ingoing relations from I and S
Node Type I can have outgoing relations to I and E
Node Type E can only have incomming relations from I
All relations have a weight property assigned which can be any number
With the help of stackoverflow and several tutorials I was able to formulate a Cypher query which gets me all paths from any start node with one externalID to the matching end node with the same externalID.
MATCH p=(a:S)-[r*]->(b:E)
WHERE a.externalID=b.externalID
WITH p, relationships(p) as rcoll
RETURN p
The query works more or less good so far ...
However, I have no idea how to change the behavior on how the graph is scanned for possible paths. Actually I only need a subset of all possible paths. Such paths fulfill the following requirement:
The path traversal is started at a Start Node S with a given capacity C.
if a relationship is traversed the weight property of this relationship is subtracted from the current capacity C (that means negative weights are added)
if the capacity gets negative the path up to this point is invalid (the path up to the previous node is still valid and may continue with other relationships)
if the capacity is still positive continue with another relationship from this point and use the result of C - weight as new C
Can I somehow adjust the query or is there any other possibility with Neo4j to get all paths using the strategy above?
Thanks a lot for your help in advance.

This Cypher query might be suitable for your use case:
MATCH p = (a:S)-[r*]->(b:E)
WHERE a.externalID = b.externalID
WITH
p,
REDUCE(c = a.capacity, r IN RELATIONSHIPS(p) |
CASE WHEN c < 0 THEN -1 ELSE c - r.weight END) AS residual
WHERE residual >= 0
RETURN p;
The REDUCE clause will set residual to a negative value if the capacity is ever reduced below 0, even if subsequent weights would normally cause it to go positive.

Related

Getting a subgraph not crossing through a certain node

I have a huge (non cyclic) graph and want to find all nodes reachable by relation X from a given node. However, I do not want to cross a node having a certain attribute {attr:'donotcross'} as this represents a choke point I do not want to cross (i.e. this is the only node leading to an adjacent subgraph).
Currently I do breadth first search myself using a trivial Cypher query to isolate neighboring nodes and python, stopping the recursion as soon as I reach that specific node. This, however, is really slow and I think that using pure Cypher to isolate those nodes could be faster.
What does the Cypher query look like returning all connected nodes via X but not traversing a node with property attr:'donotcross'?
My intuition would be
MATCH (n)-[:X*]->(inter)-[:X*]->(m) WHERE NOT inter.attr = 'donotcross' RETURN m
With n being the start node. However, this does not work as this pattern can match a path with a forbidden node if there are more than the forbidden node in between the start and target node.
Using Cypher alone, you can use the following approach:
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE inter.attr = 'donotcross')
RETURN DISTINCT m
Keep in mind you should at least be using labels for your starting node n, if you aren't able to look them up by an indexed property for a specific label.
Also, if there are relatively few of these donotcross nodes, and if there is an index on the label of these nodes on attr, then it may be faster to first match on these nodes, collect them, then filter based on that:
MATCH (x) // best to use a label and index lookup!
WHERE x.attr = 'donotcross'
WITH collect(x) as excluded
MATCH path = (n)-[:X*]->(m) // best to use a label!
WHERE none(node in nodes(path) WHERE node in excluded)
RETURN DISTINCT m

Find a set of (n) nodes where relationship weight between each pair of node is greater than a value(w)

I have a database where each node is connected to all other nodes with a relationship, and each relationship has a weight. I need a query where given a weight w and a number of nodes n, I want all n nodes where each pair of relationship has a weight greater than w.
Any help on this would be great
It depends on what you would like your result set to look like. Something as simple as this query would return all paths that fall under your criteria:
MATCH p=()-[r:my_rel]->() WHERE r.weight > w RETURN p;
This would return all such paths.
If you would like the two nodes only (and not the entire pattern's results), you can return only those two nodes:
MATCH (n1)-[r:my_rel]->(n2) WHERE r.weight > w RETURN n1,n2;
Do note that due to Neo4J's storage internals, performing a search based on the properties of a relationship tends to not perform as well as those based on properties of a node.

Find all relations starting with a given node

In a graph where the following nodes
A,B,C,D
have a relationship with each nodes successor
(A->B)
and
(B->C)
etc.
How do i make a query that starts with A and gives me all nodes (and relationships) from that and outwards.
I do not know the end node (C).
All i know is to start from A, and traverse the whole connected graph (with conditions on relationship and node type)
I think, you need to use this pattern:
(n)-[*]->(m) - variable length path of any number of relationships from n to m. (see Refcard)
A sample query would be:
MATCH path = (a:A)-[*]->()
RETURN path
Have also a look at the path functions in the refcard to expand your cypher query (I don't know what exact conditions you'll need to apply).
To get all the nodes / relationships starting at a node:
MATCH (a:A {id: "id"})-[r*]-(b)
RETURN a, r, b
This will return all the graphs originating with node A / Label A where id = "id".
One caveat - if this graph is large the query will take a long time to run.

Neo4J find route thru more points

I am creating simple graph db for tranportation between few cities. My structure is:
Station = physical station
Stop = each station has several stops, depend on time and line ID
Ride = connection between stops
I need to find route from city A to city C, but i has no direct stopconnection, but they are connected thru city B. see picture please, as new user i cant post images to question.
How can I get router from City A with STOP 1 connect RIDE 1 to STOP 2 then
STOP 2 connected by same City B to STOP3 and finnaly from STOP3 by RIDE2 to STOP4 (City C)?
Thank you.
UPDATE
Solution from Vince is ok, but I need set filter to STOP nodes for departure time, something like
MATCH p=shortestPath((a:City {name:'A'})-[*{departuretime>xxx}]-(c:City {name:'C'})) RETURN p
Is possible to do without iterations all matches collection? because its to slow.
If you are simply looking for a single route between two nodes, this Cypher query will return the shortest path between two City nodes, A and C.
MATCH p=shortestPath((a:City {name:'A'})-[*]-(c:City {name:'C'})) RETURN p
In general if you have a lot of potential paths in your graph, you should limit the search depth appropriately:
MATCH p=shortestPath((a:City {name:'A'})-[*..4]-(c:City {name:'C'})) RETURN p
If you want to return all possible paths you can omit the shortestPath clause:
MATCH p=(a:City {name:'A'})-[*]-(c:City) {name:'C'}) RETURN p
The same caveats apply. See the Neo4j documentation for full details
Update
After your subsequent comment.
I'm not sure what the exact purpose of the time property is here, but it seems as if you actually want to create the shortest weighted path between two nodes, based on some minimum time cost. This is different of course to shortestPath, because that minimises on the number of edges traversed only, not the cost of those edges.
You'd normally model the traversal cost on edges, rather than nodes, but your graph has time only on the STOP nodes (and not for example on the RIDE edges, or the CITY nodes). To make a shortest weighted path query work here, we'd need to also model time as a property on all nodes and edges. If you make this change, and set the value to 0 for all nodes / edges where it isn't relevant then the following Cypher query does what I think you need.
MATCH p=(a:City {name: 'A'})-[*]-(c:City {name:'C'})
RETURN p AS shortestPath,
reduce(time=0, n in nodes(p) | time + n.time) AS m,
reduce(time=0, r in relationships(p) | time + r.time) as n
ORDER BY m + n ASC
LIMIT 1
In your example graph this produces a least cost path between A and C:
(A)->(STOP1)-(STOP2)->(B)->(STOP5)->(STOP6)->(C)
with a minimum time cost of 230.
This path includes two stops you have designated "bad", though I don't really understand why they're bad, because their traversal costs are less than other stops that are not "bad".
Or, use Dijkstra
This simple Cypher will probably not be performant on densely connected graphs. If you find that performance is a problem, you should use the REST API and the path endpoint of your source node, and request a shortest weighted path to the target node using Dijkstra's algorithm. Details here
Ah ok, if the requirement is to find paths through the graph where the departure time at every stop is no earlier than the departure time of the previous stop, this should work:
MATCH p=(:City {name:'A'})-[*]-(:City {name:'C'})
MATCH (a:Stop) where a in nodes(p)
MATCH (b:Stop) where b in nodes(p)
WITH p, a, b order by b.time
WITH p as ps, collect(distinct a) as as, collect(distinct b) as bs
WHERE as = bs
WITH ps, last(as).time - head(as).time as elapsed
RETURN ps, elapsed ORDER BY elapsed ASC
This query works by matching every possible path, and then collecting all the stops on each matched path twice over. One of these collections of stops is ordered by departure time, while the other is not. Only if the two collections are equal (i.e. number and order) is the path admitted to the results. This step evicts invalid routes. Finally, the paths themselves are ordered by least elapsed time between the first and last stop, so the quickest route is first in the list.
Normal warnings about performance, etc. apply :)

finding the farthest node using Neo4j (node without any incoming relation)

I have created a graph db in Neo4j and want to use it for generalization purposes.
There are about 500,000 nodes (20 distinct labels) and 2.5 million relations (50 distinct types) between them.
In an example path : a -> b -> c-> d -> e
I want to find out the node without any incoming relations (which is 'a').
And I should do this for all the nodes (finding the nodes at the beginning of all possible paths that have no incoming relations).
I have tried several Cypher codes without any success:
match (a:type_A)-[r:is_a]->(b:type_A)
with a,count (r) as count
where count = 0
set a.isFirst = 'true'
or
match (a:type_A), (b:type_A)
where not (a)<-[:is_a*..]-(b)
set a.isFirst = 'true'
Where is the problem?!
Also, I have to create this code in neo4jClient, too.
Your first query will only match paths where there is a relationship [r:is_a], so counting r can never be 0. Your second query will return any arbitrary pair of nodes labeled :typeA that aren't transitively related by [:is_a]. What you want is to filter on a path predicate. For the general case try
MATCH (a)
WHERE NOT ()-->a
This translates roughly "any node that does not have incoming relationships". You can specify the pattern with types, properties or labels as needed, for instance
MATCH (a:type_A)
WHERE NOT ()-[:is_a]->a
If you want to find all nodes that have no incoming relationships, you can find them using OPTIONAL MATCH:
START n=node(*)
OPTIONAL MATCH n<-[r]-()
WITH n,r
WHERE r IS NULL
RETURN n

Resources