neo4j shortest with connector node and multiple options - neo4j

I have Cities, Roads and Transporters in my database.
A Road is connected to two (different) Cities with a From and a To relationship. Each road also has a distance property (in kilometers).
Multiple Transporters can have a relationship to a Road. Every Transporter has a price (per kilometer).
Now my question: I want the cheapest option to get a packet from city A to city B. There could be a direct road, or else we have to go via other cities and transporters. And I explicitly want to use the Dijkstra algorithm for this.
Can this query be done in Cypher? And if not, how can it be done using the Neo4j Java API?

Based on your sample dataset, I think there is a modelling problem that may make things difficult, certainly for matching on directed relationships.
However, this is how you can already find the paths with the lowest total distance (to turn that into a cost, the transporter's price per kilometer still has to be factored in):
MATCH (a:City { name:'CityA' }),(d:City { name:'CityD' })
MATCH p=(a)-[*]-(d)
WITH p, [x IN nodes(p) WHERE 'Road' IN labels(x)] AS roads
WITH p, reduce(dist = 0, x IN roads | dist + x.distance) AS totalDistance
RETURN p, totalDistance
ORDER BY totalDistance
LIMIT 5
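Since the question explicitly asks for Dijkstra and for price rather than distance, here is a minimal Python sketch of that cost model outside of Neo4j. It assumes (hypothetically) that a road's cost is its distance times the cheapest transporter serving it, that the packet may switch transporter at any city, and that roads can be travelled in both directions. ROADS, build_graph and cheapest_route are illustrative names, not part of any Neo4j API.

```python
import heapq
from collections import defaultdict

# Hypothetical sample data mirroring the question's model:
# (from_city, to_city, distance_km, [price_per_km of each transporter on the road])
ROADS = [
    ("A", "B", 100, [0.5, 0.8]),
    ("B", "D", 50, [1.0]),
    ("A", "D", 120, [2.0]),
    ("A", "C", 30, []),  # no transporter serves this road, so it is unusable
    ("C", "D", 20, [0.1]),
]


def build_graph(roads):
    """Adjacency list where each road costs distance * cheapest price per km."""
    graph = defaultdict(list)
    for a, b, km, prices in roads:
        if not prices:
            continue  # nobody transports on this road
        cost = km * min(prices)
        # Assumption: roads can be used in both directions.
        graph[a].append((b, cost))
        graph[b].append((a, cost))
    return graph


def cheapest_route(graph, source, target):
    """Plain Dijkstra; returns (total_cost, [cities]) or (inf, []) if unreachable."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, city = heapq.heappop(heap)
        if city == target:
            path = [city]
            while city in prev:  # walk back to the source
                city = prev[city]
                path.append(city)
            return cost, path[::-1]
        if cost > best.get(city, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for nxt, edge_cost in graph[city]:
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = city
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []


print(cheapest_route(build_graph(ROADS), "A", "D"))  # (100.0, ['A', 'B', 'D'])
```

Taking the cheapest transporter per road is only valid if the packet may change transporter at every city; if it has to stay with one transporter, the search would instead need to run once per transporter.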

Related

"Query Optimization" : Neo4j Query builds a cartesian product between disconnected patterns -

I need to build a graph of multiple nodes (more than 2) with their relationships at the first, second and third degree.
For that, I am currently using this query:
WITH ["1258311979208519680","3294971891","1176078684270333952","117607868427845"] as ids
MATCH (n1:Target),(n2:Target) WHERE n1.id in ids and n2.id in ids and n1.id<>n2.id and n1.uid=103 and n2.uid=103
MATCH p = ((n1)-[*..3]-(n2)) RETURN p limit 30
Here the IDs of 4 nodes are listed in the WITH [ ] clause, and [*..3] is used to draw the third-degree graph between the selected nodes.
WHAT THE ABOVE QUERY DOES
With second degree [*..2], the query returns the mutual nodes whenever any 2 of the selected nodes share a mutual relationship.
WHAT I WANT
1) First of all, I want to optimize the query: it takes a lot of time, and it causes a Cartesian product, which slows the query down.
2) The query above returns data whenever any 2 nodes have a mutual relationship. What I WANT is for the query to return only the mutual nodes attached to all of the selected nodes: every returned node must have a relationship to all selected target nodes.
Any suggestions on how to modify the query to optimize it?
If you are looking to avoid the Cartesian product issue with the given query,
I suggest using this one instead:
WITH ["1258311979208519680","3294971891","1176078684270333952","117607868427845"] AS ids
MATCH (node1:Target) WHERE node1.id IN ids AND node1.uid = 103
MATCH (node2:Target) WHERE node2.id IN ids AND node2.uid = 103 AND node1.id <> node2.id
MATCH p=(node1)-[*..3]-(node2)
RETURN p LIMIT 30
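It should remove the Cartesian product warning (pairing each selected node with every other selected node is still inherent to the query, so keep the ID list short).
Try this.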
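The query above deals with the Cartesian product but not with requirement 2) (only nodes related to all selected targets). The set logic behind that requirement can be sketched in Python: collect each target's neighbourhood up to k hops, then intersect the sets. The edge list and node names below are hypothetical stand-ins for the Target nodes.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets, since the query uses -[*..3]- (no direction)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def within_hops(adj, start, max_hops):
    """Nodes reachable from start in 1..max_hops steps (breadth-first search)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    del depth[start]
    return set(depth)


def common_nodes(adj, targets, max_hops):
    """Non-target nodes within max_hops of *every* target node."""
    reach = [within_hops(adj, t, max_hops) for t in targets]
    return set.intersection(*reach) - set(targets)


# Hypothetical graph: t1, t2, t3 stand in for the selected Target nodes.
EDGES = [("t1", "m"), ("t2", "m"), ("t3", "m"), ("t1", "x"), ("x", "y"), ("t2", "y")]
ADJ = build_adjacency(EDGES)
print(common_nodes(ADJ, ["t1", "t2", "t3"], 1))  # {'m'}
print(common_nodes(ADJ, ["t1", "t2", "t3"], 3))
```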

Merging Nodes on a Path and Calculating a (Compound) Property of the Relationship

(Using Neo4j 3.x and the neo4j.v1 Python driver)
I have a track consisting of a linked list of nodes, each one representing a (lon, lat) pair of coordinates.
(A)-[:NEXT]->(B)-[:NEXT]->(C) etc.
with properties lon, lat on each node
Question 1
The direct distance between the coordinates of two neighboring nodes, e.g. (0,0) and (1,1), could be added as a "distance" property of the relationship -[:NEXT {distance: 1.41421}]-> between the two neighboring nodes. How could you do that given that I have thousands of nodes like this?
Question 2
Whole segments of this linked list of nodes could be replaced by a single -[:NEXT]-> relationship with the "distance" property as the sum of all the distances between adjacent nodes of the original list. How could this be done efficiently for, again, thousands or more nodes?
(A)-[:NEXT {distance: 1}]->(B)-...->(D)-[:NEXT {distance: 1}]->(E)
would become
(A)-[:NEXT {distance: 4}]->(E)
Thank you for your guidance and hints.
Part 1: You're using lat/lon. Neo4j 3.0 has native support for points and distance; make sure to use latitude and longitude as property keys. You can then set this property on the relationship with the following:
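MATCH (start:YourNodeLabel)-[r:NEXT]->(end)
// distance() returns metres for WGS-84 points, hence the division by 1000 to get km
SET r.distance = distance(point({latitude: start.latitude, longitude: start.longitude}), point({latitude: end.latitude, longitude: end.longitude})) / 1000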
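To sanity-check the values written by distance() (or to compute them client-side), here is a haversine sketch in Python. It assumes a spherical Earth with a mean radius of 6371 km, so results may differ slightly from Neo4j's own implementation. Note that for the question's (0,0) to (1,1) example this gives roughly 157 km, not the planar 1.41421.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def segment_distances(track):
    """One distance per NEXT hop for a track given as [(lon, lat), ...]."""
    return [haversine_km(*p, *q) for p, q in zip(track, track[1:])]


track = [(0.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
print(segment_distances(track))  # roughly [157.25, 111.19]
```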
Part 2: If you know the start and end nodes of the path, you can then create this NEXT relationship by reducing over the distance properties of the relationships:
MATCH (start:YourNodeLabel {name:"A"}), (end:YourNodeLabel {name:"E"})
MATCH (start)-[r:NEXT*]->(end)
CREATE (start)-[newrel:NEXT]->(end)
SET newrel.distance = reduce(d=0.0, x IN r | d + x.distance)
Be careful with this, however: there could be more than one path from start to end. In that case, if you want to find the shortest distance from start to end, for example, you will need to calculate the total distance of each path and take the lowest one:
MATCH (start:YourNodeLabel {name:"A"}), (end:YourNodeLabel {name:"E"})
MATCH p=(start)-[:NEXT*]->(end)
WITH p, start, end, reduce(d=0.0, x IN relationships(p) | d + x.distance) AS totalDistance
ORDER BY totalDistance ASC
LIMIT 1
CREATE (start)-[newRel:NEXT]->(end)
SET newRel.distance = totalDistance
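The ORDER BY totalDistance ... LIMIT 1 logic can be mirrored in Python to see what the query computes: enumerate every directed path from start to end, sum the per-hop distances, and keep the minimum. The NEXT dictionary below is hypothetical. Like the Cypher version, this enumerates all paths, so the cost grows quickly as the graph branches.

```python
def all_paths(next_rels, start, end, path=None):
    """Yield every simple directed path from start to end as a list of nodes."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in next_rels.get(start, {}):
        if nxt not in path:  # skip nodes already on this path (no loops)
            yield from all_paths(next_rels, nxt, end, path)


def shortest_total_distance(next_rels, start, end):
    """Minimum summed distance over all start->end paths, or None if unreachable."""
    totals = [
        sum(next_rels[a][b] for a, b in zip(p, p[1:]))
        for p in all_paths(next_rels, start, end)
    ]
    return min(totals) if totals else None


# Hypothetical NEXT relationships: node -> {next node: distance}
NEXT = {
    "A": {"B": 1.0},
    "B": {"C": 1.0, "E": 5.0},
    "C": {"D": 1.0},
    "D": {"E": 1.0},
}
print(shortest_total_distance(NEXT, "A", "E"))  # 4.0
```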
If you don't have the distance properties on the relationships, you can also calculate the geo distance on the fly in the reduce function:
MATCH (start:YourNodeLabel {name:"A"}), (end:YourNodeLabel {name:"E"})
MATCH p=(start)-[:NEXT*]->(end)
WITH start, end, [n IN nodes(p) | point({latitude: n.latitude, longitude: n.longitude})] AS pts
WITH start, end, reduce(d=0.0, i IN range(1, size(pts)-1) | d + distance(pts[i-1], pts[i]) / 1000) AS totalDistance
ORDER BY totalDistance ASC
LIMIT 1
CREATE (start)-[newRel:NEXT]->(end)
SET newRel.distance = totalDistance
As general advice, I wouldn't use the same relationship type for the shortcut relationship; something like CONNECT_TO or REACH_POINT may be better suited, so that it doesn't interfere with the NEXT relationships in other queries.

Neo4j: find a route through multiple points

I am creating a simple graph DB for transportation between a few cities. My structure is:
Station = physical station
Stop = each station has several stops, depend on time and line ID
Ride = connection between stops
I need to find a route from city A to city C. There is no direct stop connection between them, but they are connected through city B. Please see the picture (as a new user I can't post images in the question).
How can I get the route from City A, going from STOP1 via RIDE1 to STOP2, then from STOP2 to STOP3 (both in City B), and finally from STOP3 via RIDE2 to STOP4 (City C)?
Thank you.
UPDATE
The solution from Vince is OK, but I need to filter the STOP nodes on departure time, something like:
MATCH p=shortestPath((a:City {name:'A'})-[*{departuretime>xxx}]-(c:City {name:'C'})) RETURN p
Is it possible to do this without iterating over the whole collection of matches? That is too slow.
If you are simply looking for a single route between two nodes, this Cypher query will return the shortest path between two City nodes, A and C.
MATCH p=shortestPath((a:City {name:'A'})-[*]-(c:City {name:'C'})) RETURN p
In general if you have a lot of potential paths in your graph, you should limit the search depth appropriately:
MATCH p=shortestPath((a:City {name:'A'})-[*..4]-(c:City {name:'C'})) RETURN p
If you want to return all possible paths you can omit the shortestPath clause:
MATCH p=(a:City {name:'A'})-[*]-(c:City {name:'C'}) RETURN p
The same caveats apply. See the Neo4j documentation for full details.
Update
After your subsequent comment.
I'm not sure what the exact purpose of the time property is here, but it seems as if you actually want to find the shortest weighted path between two nodes, based on some minimum time cost. This is of course different from shortestPath, which only minimises the number of edges traversed, not the cost of those edges.
You'd normally model the traversal cost on edges rather than nodes, but your graph has time only on the STOP nodes (and not, for example, on the RIDE edges or the CITY nodes). To make a shortest weighted path query work here, we'd also need to model time as a property on all nodes and edges. If you make this change, and set the value to 0 for all nodes / edges where it isn't relevant, then the following Cypher query does what I think you need.
MATCH p=(a:City {name: 'A'})-[*]-(c:City {name:'C'})
RETURN p AS path,
reduce(t=0, x IN nodes(p) | t + x.time) AS nodeTime,
reduce(t=0, r IN relationships(p) | t + r.time) AS relTime
ORDER BY nodeTime + relTime ASC
LIMIT 1
In your example graph this produces a least cost path between A and C:
(A)->(STOP1)-(STOP2)->(B)->(STOP5)->(STOP6)->(C)
with a minimum time cost of 230.
This path includes two stops you have designated "bad", though I don't really understand why they're bad, because their traversal costs are less than other stops that are not "bad".
Or, use Dijkstra
This simple Cypher will probably not be performant on densely connected graphs. If you find that performance is a problem, you should use the REST API and the path endpoint of your source node, and request a shortest weighted path to the target node using Dijkstra's algorithm (see the Neo4j REST API documentation for details).
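As a rough illustration of what a Dijkstra search does when costs sit on both nodes and edges (as in the modelling suggested above), here is a Python sketch; it is not the REST endpoint itself. Entering a node adds its cost, which is valid as long as all costs are non-negative. The stop names and times are made up, not taken from the asker's graph.

```python
import heapq


def least_cost_path(adj, node_cost, source, target):
    """Dijkstra where both nodes and edges carry a non-negative cost.

    adj maps node -> {neighbour: edge_cost}; node_cost maps node -> cost,
    with missing nodes treated as 0. Entering a node adds its cost.
    """
    start_cost = node_cost.get(source, 0)
    best = {source: start_cost}
    heap = [(start_cost, source, [source])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if cost > best[node]:
            continue  # a cheaper way to this node was already found
        for nbr, edge_cost in adj.get(node, {}).items():
            new_cost = cost + edge_cost + node_cost.get(nbr, 0)
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr, path + [nbr]))
    return float("inf"), []


# Hypothetical graph: times on RIDE edges and on STOP nodes (0 elsewhere).
ADJ = {
    "A": {"S1": 0}, "S1": {"S2": 60}, "S2": {"B": 0},
    "B": {"S3": 0, "S5": 0},
    "S3": {"S4": 50}, "S4": {"C": 0},
    "S5": {"S6": 20}, "S6": {"C": 0},
}
NODE_COST = {"S1": 10, "S2": 5, "S3": 30, "S4": 5, "S5": 15, "S6": 5}
print(least_cost_path(ADJ, NODE_COST, "A", "C"))
```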
Ah ok, if the requirement is to find paths through the graph where the departure time at every stop is no earlier than the departure time of the previous stop, this should work:
MATCH p=(:City {name:'A'})-[*]-(:City {name:'C'})
MATCH (a:Stop) where a in nodes(p)
MATCH (b:Stop) where b in nodes(p)
WITH p, a, b order by b.time
WITH p as ps, collect(distinct a) as stops, collect(distinct b) as sortedStops
WHERE stops = sortedStops
WITH ps, last(stops).time - head(stops).time as elapsed
RETURN ps, elapsed ORDER BY elapsed ASC
This query works by matching every possible path, and then collecting all the stops on each matched path twice over. One of these collections of stops is ordered by departure time, while the other is not. Only if the two collections are equal (i.e. number and order) is the path admitted to the results. This step evicts invalid routes. Finally, the paths themselves are ordered by least elapsed time between the first and last stop, so the quickest route is first in the list.
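The same validity rule can be expressed more directly by checking stop times in the order they appear on the path, rather than by comparing two collected lists. A Python sketch, assuming each candidate route is an ordered node list and that times maps Stop nodes to hypothetical departure times in minutes:

```python
def elapsed_if_valid(path, times):
    """Elapsed time between first and last stop, or None if the route is invalid.

    path is one candidate route as an ordered node list; times maps Stop
    nodes to departure times, and nodes missing from it (cities) are skipped.
    """
    stop_times = [times[n] for n in path if n in times]
    if not stop_times:
        return None
    if any(later < earlier for earlier, later in zip(stop_times, stop_times[1:])):
        return None  # a later stop departs before an earlier one
    return stop_times[-1] - stop_times[0]


def quickest_routes(paths, times):
    """Valid routes only, ordered by elapsed time (quickest first)."""
    scored = [(elapsed_if_valid(p, times), p) for p in paths]
    return sorted((s for s in scored if s[0] is not None), key=lambda s: s[0])


# Hypothetical departure times in minutes since midnight.
TIMES = {"STOP1": 480, "STOP2": 540, "STOP3": 570, "STOP4": 630,
         "STOP5": 530, "STOP6": 600, "STOP7": 500, "STOP8": 560}
PATHS = [
    ["A", "STOP1", "STOP2", "B", "STOP3", "STOP4", "C"],  # valid, 150 min
    ["A", "STOP1", "STOP2", "B", "STOP5", "STOP6", "C"],  # STOP5 departs too early
    ["A", "STOP7", "STOP8", "C"],                         # valid, 60 min
]
for elapsed, route in quickest_routes(PATHS, TIMES):
    print(elapsed, route)
```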
Normal warnings about performance, etc. apply :)

neo4j cartesian product performance improvement

I have a graph database with over 2 million nodes. I have an application that takes a social graph and does some inference on it. As one step of the algorithm, I have to get all possible combinations of the [:friend] neighbours of two given nodes. Currently, I have a query that looks like:
match (a)-[:friend]-(c), (b)-[:friend]-(d) where id(a)={ida} and id(b)={idb} return distinct c as first, d as second
So, I already know the nodes a and b and I want to get all the possible pairs that can be made from friends of a and b.
This is obviously a very slow operation. I was wondering if there is a more efficient way of getting the same result in neo4j. Perhaps adding indexes might help? Any ideas / clues are welcome!
Example
Node a has friends: x, y
Node b has friends: g, h, i
Then the result should be:
x,g
x,h
x,i
y,g
y,h
y,i
If you are not already doing so, you should use labels to speed up your query, which might then look like:
MATCH (p1:Person)-[:FRIEND]->(p3:Person),(p2:Person)-[:FRIEND]->(p4:Person)
WHERE ID(p1) = 6 AND ID(p2) = 7
RETURN p3 as first, p4 as second
Obviously that will rely on you having created your nodes with a :Person label.
How many friends does the average node have?
I wouldn't use two patterns, but just one with the IN operator:
MATCH (p:Person)-[:FRIEND]->(friend:Person)
WHERE id(p) IN [1,2,3]
RETURN p, collect(friend) as friends
Then you have no cross product, and you can also return the friends neatly as one collection per person.
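With one row per person, building the pairs can move to the client. A Python sketch, assuming you have already turned each returned row into a list of friend names (friends_by_person is a hypothetical name):

```python
from itertools import product


def friend_pairs(friends_by_person, a, b):
    """Every (friend of a, friend of b) combination, built client-side."""
    return list(product(friends_by_person[a], friends_by_person[b]))


# Hypothetical result of the IN query: one collection of friends per person.
friends_by_person = {"a": ["x", "y"], "b": ["g", "h", "i"]}
for first, second in friend_pairs(friends_by_person, "a", "b"):
    print(first, second)
```

The number of pairs is still the product of the two friend counts, but the database only has to return two rows.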

Get all node and relationship properties within a closed circle

Let's assume that John sells goods to Met, Met sells goods to both Bob and Alen,
and Alen sells goods back to John.
What I need is a Cypher query that returns all the closed circles like the one in this example:
John..Met..Alen, because Alen sells goods to John again, which makes it a closed circle. It should also display the lowest value of the relationship property (amount). How do I do this for the entire database, i.e. get all the closed circles and their minimum amounts? Thanks!
Starting with Stefan's answer: for the minimum, you would want to take the lengths of the paths into account.
start n=node(*)
match p=n-[:SELLS_TO*1..5]->n
return p, length(p)
To get just the shortest path length per node:
start n=node(*)
match p=n-[:SELLS_TO*1..5]->n
return n, min(length(p))
If you want to get the shortest path:
start n=node(*)
match p=n-[:SELLS_TO*1..5]->n
with n, collect(nodes(p)) as nodes, min(length(nodes(p))) as l
return n, head(filter(p in nodes : length(p) = l)) as shortest_circle,l
See the Neo4j console for an example: http://console.neo4j.org/r/wrm522
Something you'll notice there: if you scan the whole graph, you will get the same circle multiple times, once for each node in the circle.
This uses the nodes, length, collect, head and filter functions and the min aggregate.
see: http://docs.neo4j.org/chunked/milestone/query-function.html
As Stefan already said, scanning over all nodes is very probably quite expensive.
You could do a query like:
start n=node(*)
match p=n-[:SELLS_TO*1..5]->n
return p
where 5 is the maximum depth for a loop.
See an example in the Neo4j console. However, using "node(*)" triggers a global query, which scales linearly with the size of your graph.
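To report each circle only once, together with its minimum amount, one option is to start every cycle from its smallest node. A Python sketch over an in-memory copy of the SELLS_TO relationships, with hypothetical amounts for the example in the question (seller -> {buyer: amount}); the max_len parameter plays the role of the *1..5 bound:

```python
def closed_circles(sells_to, max_len=5):
    """Directed SELLS_TO cycles of 1..max_len hops, each reported once.

    sells_to maps seller -> {buyer: amount}. Each cycle is only started from
    its smallest node name, so rotations of the same circle are skipped.
    Returns a list of (nodes_in_cycle, minimum_amount) tuples.
    """
    found = []

    def extend(origin, node, path, amounts):
        for buyer, amount in sells_to.get(node, {}).items():
            if buyer == origin:
                found.append((list(path), min(amounts + [amount])))
            elif buyer > origin and buyer not in path and len(path) < max_len:
                path.append(buyer)
                extend(origin, buyer, path, amounts + [amount])
                path.pop()

    for origin in sorted(sells_to):
        extend(origin, origin, [origin], [])
    return found


# Hypothetical amounts for the example in the question.
SALES = {
    "John": {"Met": 100},
    "Met": {"Bob": 50, "Alen": 70},
    "Alen": {"John": 30},
}
print(closed_circles(SALES))  # [(['Alen', 'John', 'Met'], 30)]
```

This still visits every node as a potential starting point, so like the node(*) query it scales with the size of the graph.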
