Merging Nodes on a Path and Calculating a (Compound) Property of the Relationship - neo4j

(Using Neo4j 3.x and the neo4j.v1 Python driver)
I have a track consisting of a linked list of nodes, each one representing a (lon, lat) pair of coordinates:
(A)-[:NEXT]->(B)-[:NEXT]->(C) etc.
with lon and lat properties on each node.
Question 1
The direct distance between the coordinates of two neighboring nodes, e.g. (0,0) and (1,1), could be added as a "distance" property of the relationship -[:NEXT {distance: 1.41421}]-> between the two neighboring nodes. How could you do that given that I have thousands of nodes like this?
Question 2
Whole segments of this linked list of nodes could be replaced by a single -[:NEXT]-> relationship with the "distance" property as the sum of all the distances between adjacent nodes of the original list. How could this be done efficiently for, again, thousands or more nodes?
(A)-[:NEXT {distance: 1}]->(B)-...->(D)-[:NEXT {distance: 1}]->(E)
(A)-[:NEXT {distance: 4}]->(E)
Thank you for your guidance and hints.

Part 1: You're using lat/lon. Neo4j 3.0 has native support for points and distance; make sure to use latitude and longitude as the property keys. You can then set the distance (in kilometers, since distance() returns meters for geographic points) on every NEXT relationship with the following:
MATCH (start:YourNodeLabel)-[r:NEXT]->(end)
SET r.distance = distance(point(start), point(end)) / 1000
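To make the per-relationship computation concrete, here is a plain-Python sketch of the great-circle (haversine) distance for each pair of neighboring track points. It is an assumption that this is close to what Neo4j's distance() computes for WGS-84 points, and the Earth radius used (6,371 km) is also an assumption, so expect small differences. The function names are hypothetical.

```python
import math

# Assumption: spherical Earth with a mean radius of 6,371 km. Neo4j's
# distance() may use a slightly different constant, so values can differ a bit.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def next_distances_km(track):
    """Return one distance in km per NEXT relationship of a track.

    `track` is a hypothetical list of (latitude, longitude) tuples in NEXT order,
    so neighboring entries correspond to (start)-[:NEXT]->(end) pairs.
    """
    return [haversine_m(a[0], a[1], b[0], b[1]) / 1000 for a, b in zip(track, track[1:])]


track = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print([round(d, 1) for d in next_distances_km(track)])  # → [111.2, 111.2]
```

One degree of latitude or longitude at the equator is roughly 111 km, which is why both hops come out the same.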
Part 2: If you know the start and end nodes of the path, you can create the shortcut NEXT relationship by reducing over the distance properties of the relationships:
MATCH (start:YourNodeLabel {name:"A"}), (end:YourNodeLabel {name:"E"})
MATCH (start)-[r:NEXT*]->(end)
CREATE (start)-[newrel:NEXT]->(end)
SET newrel.distance = reduce(d=0.0, x IN r | d + x.distance)
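Cypher's reduce() folds a list into a single value, and Python's functools.reduce has the same shape. A sketch of the shortcut distance for the question's A to E example (the list of dicts is a hypothetical stand-in for the matched relationships):

```python
from functools import reduce

# NEXT relationships of the segment A->B->C->D->E from the question,
# each carrying a distance of 1 (as in the question's example).
segment_rels = [{"distance": 1.0} for _ in range(4)]

# Same shape as Cypher's reduce(d = 0.0, x IN r | d + x.distance):
# the accumulator starts at 0.0 and each relationship's distance is added.
shortcut_distance = reduce(lambda d, x: d + x["distance"], segment_rels, 0.0)
print(shortcut_distance)  # → 4.0
```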
Be careful with this, however: there could be more than one path from start to end, and the query above then creates one relationship per path. If, for example, you want the shortest distance from start to end, you need to calculate the total distance of each path and keep the lowest one:
MATCH (start:YourNodeLabel {name:"A"}), (end:YourNodeLabel {name:"E"})
MATCH p=(start)-[:NEXT*]->(end)
WITH p, start ,end, reduce(d=0.0, x IN rels(p) | d + x.distance) as totalDistance
ORDER BY totalDistance ASC
LIMIT 1
CREATE (start)-[newRel:NEXT]->(end)
SET newRel.distance = totalDistance
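Conceptually, the ORDER BY totalDistance ASC LIMIT 1 step takes a minimum over all matched paths. A small Python sketch of that idea on a hypothetical in-memory graph; enumerating only simple paths (no repeated nodes) is an assumption here, since Cypher's variable-length matching enforces relationship uniqueness rather than node uniqueness.

```python
# Hypothetical graph: node -> [(next_node, distance), ...] for each NEXT relationship.
graph = {
    "A": [("B", 1.0), ("E", 5.0)],
    "B": [("C", 1.0)],
    "C": [("D", 1.0)],
    "D": [("E", 1.0)],
    "E": [],
}


def all_paths(graph, start, end, path=None, total=0.0):
    """Yield (nodes, total_distance) for every simple directed path start -> end."""
    path = (path or []) + [start]
    if start == end:
        yield path, total
        return
    for nxt, dist in graph.get(start, []):
        if nxt not in path:  # keep paths simple (no repeated nodes)
            yield from all_paths(graph, nxt, end, path, total + dist)


# Equivalent of ORDER BY totalDistance ASC LIMIT 1.
best_path, best_total = min(all_paths(graph, "A", "E"), key=lambda pt: pt[1])
print(best_path, best_total)  # → ['A', 'B', 'C', 'D', 'E'] 4.0
```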
If you don't have the distance properties on the relationships, you can also calculate the geographic distance on the fly inside the reduce function:
MATCH (start:YourNodeLabel {name:"A"}), (end:YourNodeLabel {name:"E"})
MATCH p=(start)-[:NEXT*]->(end)
WITH p, start, end,
reduce(d=0.0, x IN range(1, size(nodes(p))-1) | d + distance(point(nodes(p)[x-1]), point(nodes(p)[x])) / 1000) as distance
ORDER BY distance ASC
LIMIT 1
CREATE (start)-[newRel:NEXT]->(end)
SET newRel.distance = distance
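The index arithmetic in the reduce above (pairing nodes(p)[x-1] with nodes(p)[x]) can be sketched in Python. Cypher's range() is inclusive while Python's is exclusive, so range(1, n) in Python covers the same indices as range(1, size(nodes(p))-1) in Cypher. Planar Euclidean distance is an assumption made here to match the question's (0,0) to (1,1) = 1.41421 example, rather than geographic distance.

```python
import math


def path_length(coords):
    """Sum the distance between consecutive nodes of a path.

    coords: list of (x, y) tuples in path order (hypothetical stand-in for nodes(p)).
    Assumption: planar Euclidean distance, not Neo4j's geographic distance().
    """
    n = len(coords)
    # Python's range(1, n) stops at n - 1, like Cypher's inclusive range(1, n - 1).
    return sum(math.dist(coords[x - 1], coords[x]) for x in range(1, n))


print(round(path_length([(0, 0), (1, 1), (2, 2)]), 5))  # → 2.82843
```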
As general advice, I wouldn't reuse the NEXT relationship type for the shortcut; a type such as CONNECT_TO or REACH_POINT may be better suited, so that it doesn't interfere with the NEXT relationships in other queries.

Related

Neo4j: How to find for each node its next neighbour by distance and create a relationship

I imported a large set of nodes (>16 000) where each node contains information about a location (longitude/latitude geo-data). All nodes have the same label. There are no relationships in this scenario. Now I want to identify, for each node, its nearest neighbour by distance and create a relationship between these nodes.
This brute-force way worked well for sets containing about 1000 nodes: (1) I first created relationships between all nodes containing the distance information. (2) Then I set the property "mindist=false" on all relationships. (3) After that I identified the nearest neighbour by looking at the distance information for each relationship and set the "mindist" property to "true" where the relationship represents the shortest distance. (4) Finally I deleted all relationships with "mindist=false".
(1)
match (n1:XXX),(n2:XXX)
where id(n1) <> id(n2)
with n1,n2,distance(n1.location,n2.location) as dist
create(n1)-[R:DISTANCE{dist:dist}]->(n2)
Return R
(2)
match (n1:XXX)-[R:DISTANCE]->(n2:XXX)
set R.mindist=false return R.mindist
(3)
match (n1:XXX)-[R:DISTANCE]->(n2:XXX)
with n1, min(R.dist) as mindist
match (o1:XXX)-[r:DISTANCE]->(o2:XXX)
where o1.name=n1.name and r.dist=mindist
Set r.mindist=TRUE
return r
(4)
match (n)-[R:DISTANCE]->()
where R.mindist=false
delete R return n
With sets containing about 16000 nodes this solution didn't work (memory problems...). I am sure there is a smarter way to solve this problem (but at this point I am still short on experience with neo4j/cypher). ;-)
You can find the closest neighbor for each node one by one, in batches, using APOC. (This is also brute force, but it runs faster.) It takes around 75 seconds for 7322 nodes.
CALL apoc.periodic.iterate("MATCH (n1:XXX)
RETURN n1", "
WITH n1
MATCH (n2:XXX)
WHERE id(n1) <> id(n2)
WITH n1, n2, distance(n1.location,n2.location) as dist ORDER BY dist LIMIT 1
CREATE (n1)-[r:DISTANCE{dist:dist}]->(n2)", {batchSize:1, parallel:true, concurrency:10})
NOTE: batchSize should always be 1 in this query. You can change concurrency to experiment.
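For comparison, the per-node brute force that the APOC inner statement performs (scan every other node, keep the closest, i.e. ORDER BY dist LIMIT 1) looks like this in plain Python. It illustrates the O(n^2) idea, not APOC itself, and the function name is hypothetical.

```python
import math


def nearest_neighbours(points):
    """Brute-force nearest neighbour for every point (n * (n - 1) distance checks).

    points: list of (x, y) tuples; assumes at least two points.
    Returns {index: (neighbour_index, distance)}.
    """
    result = {}
    for i, p in enumerate(points):
        # Scan every other point and keep the closest one.
        result[i] = min(
            ((j, math.dist(p, q)) for j, q in enumerate(points) if j != i),
            key=lambda jd: jd[1],
        )
    return result


pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
print(nearest_neighbours(pts))
# → {0: (1, 1.0), 1: (0, 1.0), 2: (3, 1.0), 3: (2, 1.0)}
```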
Our options within Cypher are, I think, limited to a naive O(n^2) brute-force check of the distance from every node to every other node. If you were to write some custom Java to do it (which you could expose as a Neo4j plugin), you could do the check much faster.
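As an illustration of the kind of speed-up custom code can get, here is a sort-and-sweep nearest-neighbour search, sketched in Python rather than Java. This is a swapped-in technique, not something Neo4j or APOC provides: points are sorted by x, and the search in each direction stops once the x-gap alone is at least the best distance found so far. The function name is hypothetical.

```python
import math
import random


def nearest_neighbours_sweep(points):
    """Exact nearest neighbour for every point via sort-and-sweep on x.

    Swapped-in technique (not a Neo4j/APOC feature). Walking away from a point
    in x order, |dx| never decreases, so once |dx| >= best distance no farther
    point in that direction can be closer. Worst case is still O(n^2), but on
    spread-out data far fewer pairs are checked. Assumes at least two points.
    Returns {index: (neighbour_index, distance)}.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    result = {}
    for pos, i in enumerate(order):
        px, py = points[i]
        best_j, best_d = None, math.inf
        for step in (-1, 1):  # walk left, then right, in x order
            k = pos + step
            while 0 <= k < len(order):
                j = order[k]
                qx, qy = points[j]
                if abs(qx - px) >= best_d:
                    break  # every further point this way is at least as far in x
                d = math.hypot(qx - px, qy - py)
                if d < best_d:
                    best_j, best_d = j, d
                k += step
        result[i] = (best_j, best_d)
    return result


random.seed(42)
sample = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(1000)]
nn = nearest_neighbours_sweep(sample)
print(nn[0])
```

A k-d tree or a grid index would be other reasonable choices; the sweep is shown because it is short and easy to verify against the brute force.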
Still, you can do it with arbitrary numbers of nodes in the graph without blowing out the heap if you use APOC to split the query up into multiple transactions. Note: you'll need to add the APOC plugin to your install.
Let's first create 20,000 points of test data:
WITH range(0, 20000) as ids
WITH [x in ids | { id: x, loc: point({ x: rand() * 100, y: rand() * 100 }) }] as points
UNWIND points as pt
CREATE (p: Point { id: pt.id, location: pt.loc })
We'll probably want a couple of indexes too:
CREATE INDEX ON :Point(id)
CREATE INDEX ON :Point(location)
In general, the following query (don't run it yet...) would, for each Point node, create a list containing the ID of and distance to every other Point node in the graph, sort that list so the nearest one is at the top, pluck the first item from the list and create the corresponding relationship.
MATCH (p: Point)
MATCH (other: Point) WHERE other.id <> p.id
WITH p, [x in collect(other) | { id: x.id, dist: distance(p.location, x.location) }] AS dists
WITH p, head(apoc.coll.sortMaps(dists, '^dist')) AS closest
MATCH (closestPoint: Point { id: closest.id })
MERGE (p)-[:CLOSEST_TO]->(closestPoint)
However, the first two lines there cause a cartesian product of nodes in the graph: for us, that's 400 million rows (20,000 * 20,000) flowing into the rest of the query, all of it held in memory; hence the blow-up. Instead, let's use APOC and apoc.periodic.iterate to split the query in two:
CALL apoc.periodic.iterate(
"
MATCH (p: Point)
RETURN p
",
"
MATCH (other: Point) WHERE other.id <> p.id
WITH p, [x in collect(other) | { id: x.id, dist: distance(p.location, x.location) }]
AS dists
WITH p, head(apoc.coll.sortMaps(dists, '^dist')) AS closest
MATCH (closestPoint: Point { id: closest.id })
MERGE (p)-[:CLOSEST_TO]->(closestPoint)
", { batchSize: 100 })
The first query just returns all Point nodes. apoc.periodic.iterate will then take the 20,000 nodes from that query and split them up into batches of 100 before running the inner query on each of the nodes in each batch. We'll get a commit after each batch, and our memory usage is constrained to whatever it costs to run the inner query.
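The batching behaviour described above can be modeled with a toy Python sketch. This is not APOC's implementation; batches() and iterate() are hypothetical helpers that only show why memory is bounded by one batch at a time and how many commits 20,000 nodes with batchSize 100 produce.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Split a stream of items into lists of at most batch_size elements."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def iterate(outer: Iterable[T], inner: Callable[[T], None], batch_size: int) -> int:
    """Toy model of the periodic-iterate pattern (hypothetical, not APOC code):
    run `inner` for every item and 'commit' after each batch.
    Returns the number of commits."""
    commits = 0
    for batch in batches(outer, batch_size):
        for item in batch:
            inner(item)
        commits += 1  # stand-in for one transaction commit per batch
    return commits


processed = []
print(iterate(range(20_000), processed.append, batch_size=100))  # → 200
```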
It's not quick, but it does complete. On my machine it processes about 12 nodes a second on a graph with 20,000 nodes, but the total cost grows quadratically with the number of nodes in the graph (each node is compared with every other node), so you'll rapidly hit the point where this approach just doesn't scale well enough.
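A quick check of the growth rate: a brute-force pass makes n(n-1) distance computations, so doubling the number of nodes roughly quadruples the work.

```python
def pair_comparisons(n: int) -> int:
    """Distance computations in a brute-force nearest-neighbour pass:
    each of the n nodes is compared with the other n - 1 nodes."""
    return n * (n - 1)


for n in (10_000, 20_000, 40_000):
    print(n, pair_comparisons(n))
# 10000 99990000
# 20000 399980000
# 40000 1599960000
```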

Compute the distances between two nodes and their lowest common ancestor (LCA)

I need to compute the distances that separate two nodes A and B from their lowest common ancestor in a graph. I use the following query to find the LCA:
match p1 = (A:Category {idCat: "Main_topic"}) -[*0..]-> (common:Category) <-[*0..]- (B:Category {idCat: "Heat_transfer"})
return common, p1
Is there any function in Neo4j that returns the respective distances d(A, common) and d(B, common)?
Thank you for your help.
If I understand the lowest common ancestor correctly, this comes down to finding the shortest path between A and B with at least one node in between, which you can do with the query below. The condition that the length of p is larger than 1 forces at least one node between the two. The example uses the IMDB toy database and returns the movie Avatar.
match p=shortestPath((n:Person {name:'Zoe Saldana'})-[r*1..15]-(n1:Person {name:'James Cameron'})) where length(p) > 1 return nodes(p)[1]
Basically, you can choose any element from the nodes in the path except the first and last ones (since those will be A and B).
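The question asks for d(A, common) and d(B, common). As a sketch outside Neo4j, assuming relationships point from a category to its parent (the direction implied by the question's -[*0..]-> pattern), a breadth-first search from each node gives hop counts to every ancestor, and the common ancestor that minimises the sum gives both distances. In Cypher the analogous numbers would come from length() on each half of the path. The category hierarchy below is hypothetical.

```python
from collections import deque


def bfs_distances(parents, start):
    """Hop counts from `start` to every ancestor reachable by following
    child -> parent edges (assumed direction, as in -[*0..]->)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def lca_with_distances(parents, a, b):
    """Return (common, d(a, common), d(b, common)) for the common ancestor that
    minimises d(a, common) + d(b, common), or None if there is none."""
    da = bfs_distances(parents, a)
    db = bfs_distances(parents, b)
    common = set(da) & set(db)
    if not common:
        return None
    best = min(common, key=lambda c: (da[c] + db[c], c))  # tie-break by name
    return best, da[best], db[best]


# Hypothetical hierarchy: child -> [parents]
parents = {
    "Heat_transfer": ["Thermodynamics"],
    "Thermodynamics": ["Physics"],
    "Optics": ["Physics"],
    "Physics": ["Main_topic"],
}
print(lca_with_distances(parents, "Heat_transfer", "Optics"))  # → ('Physics', 2, 1)
print(lca_with_distances(parents, "Main_topic", "Heat_transfer"))  # → ('Main_topic', 0, 3)
```

The second call mirrors the question's case, where A is itself an ancestor of B, so d(A, common) is 0 thanks to the *0.. lower bound.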

How to search the shortest distance between a set of nodes using Cypher

I'm trying to figure out if there's some way, using Cypher in Neo4j, to get the shortest distance between a group of nodes.
Some notes to take into account for this search:
- Distance is a property of the relationships between nodes. Distance values are in meters.
- All nodes have a relationship between them with a given distance.
- The start and end nodes to follow must be the same node.
This is the kind of input I want:
MATCH
(root) -[root_p1:PATH_TO]-> (p1), (root) -[root_p2:PATH_TO]-> (p2), (root) -[root_p3:PATH_TO]-> (p3),
(p1) -[p1_root:PATH_TO]-> (root), (p1) -[p1_p2:PATH_TO]-> (p2), (p1) -[p1_p3:PATH_TO]-> (p3),
(p2) -[p2_root:PATH_TO]-> (root), (p2) -[p2_p1:PATH_TO]-> (p1), (p2) -[p2_p3:PATH_TO]-> (p3),
(p3) -[p3_root:PATH_TO]-> (root), (p3) -[p3_p1:PATH_TO]-> (p1), (p3) -[p3_p2:PATH_TO]-> (p2)
WHERE ID(root) = 10 AND ID(p1) = 1 AND ID(p2) = 2 AND ID(p3) = 3
...
The result should then be the correct sequence of nodes that gives the shortest possible path.
This query might suit your needs:
MATCH p=(n)-[rels:PATH_TO*]->(n)
WITH p, REDUCE(s = 0, x IN rels | s + x.distance) AS d
ORDER BY d
LIMIT 1
RETURN RELATIONSHIPS(p), d;
It finds all directed cyclic paths with PATH_TO relationships; calculates the total distance of each path; gets one path (out of potentially many) with the shortest total distance; and returns all of its relationships, along with the total distance.
Note: This query can take a very long time for large graphs. If so, you can try to put a reasonable upper bound on the variable-length pattern. For example, replace [rels:PATH_TO*] with [rels:PATH_TO*..5].
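The query above returns the shortest cycle but does not require that every node is visited. If the goal is a round trip from the root through every node exactly once, that is the travelling-salesman problem; a brute-force Python sketch for a handful of nodes (the distances below are made up) is:

```python
from itertools import permutations


def shortest_tour(dist, root):
    """Brute-force shortest closed tour that starts and ends at `root` and
    visits every other node exactly once (a tiny travelling-salesman instance).

    dist: {(u, v): distance} for each PATH_TO relationship u -> v.
    Runs in O((n-1)!) time, so it is only practical for a few nodes.
    """
    others = sorted({u for u, _ in dist} - {root})
    best_order, best_len = None, float("inf")
    for order in permutations(others):
        stops = (root, *order, root)
        length = sum(dist[(u, v)] for u, v in zip(stops, stops[1:]))
        if length < best_len:
            best_order, best_len = stops, length
    return best_order, best_len


# Made-up symmetric distances between root (10) and p1, p2, p3 (1, 2, 3).
edges = {(10, 1): 1, (1, 2): 1, (2, 3): 1, (3, 10): 1, (10, 2): 5, (1, 3): 5}
dist = {**edges, **{(v, u): w for (u, v), w in edges.items()}}
print(shortest_tour(dist, 10))  # → ((10, 1, 2, 3, 10), 4)
```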

neo4j shortest path with connector node and multiple options

I have Cities, Roads and Transporters in my database.
A Road is connected to two (different) Cities with a From and a To relationship. Each road also has a distance property (in kilometers).
Multiple Transporters could have a relationship to Roads. Every Transporter has a price (per kilometer).
Now my question: I want the cheapest option to get a packet from city A to B. There could be a direct road, or else we have to go via other cities and transporters. I explicitly want to use the Dijkstra algorithm for this.
Can this query be done in Cypher? And if not, how can it be done using the Neo4J Java API?
Based on your sample dataset, I think there is a modeling problem that may make things difficult, certainly when matching on directed relationships.
However, this is how you can already find the lowest-cost path, measured here by total road distance (transporter prices are not included in this query):
MATCH (a:City { name:'CityA' }),(d:City { name:'CityD' })
MATCH p=(a)-[*]-(d)
WITH p, filter(x IN nodes(p)
WHERE 'Road' IN labels(x)) AS roads
WITH p, reduce(dist = 0, x IN roads | dist + x.distance) AS totalDistance
RETURN p, totalDistance
ORDER BY totalDistance
LIMIT 5
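Since the question explicitly asks for Dijkstra with transporter prices, here is a plain-Python sketch (not the Neo4j Java API) on a hypothetical in-memory model. The assumptions: each road's weight is its distance multiplied by the price per kilometer of the cheapest transporter on it, roads are undirected, and the transporter can change at each city.

```python
import heapq
from collections import defaultdict


def cheapest_route(roads, prices, source, target):
    """Dijkstra over cities; a road costs distance_km * cheapest price_per_km.

    roads:  {road_id: (city_a, city_b, distance_km)}, assumed undirected
    prices: {road_id: [price_per_km, ...]}; roads without a transporter are skipped
    Returns (total_cost, [(road_id, city_reached), ...]) or None if unreachable.
    """
    graph = defaultdict(list)
    for road_id, (a, b, km) in roads.items():
        if prices.get(road_id):
            cost = km * min(prices[road_id])  # assumption: cheapest transporter wins
            graph[a].append((b, cost, road_id))
            graph[b].append((a, cost, road_id))

    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, city = heapq.heappop(heap)
        if city == target:
            break
        if cost > best.get(city, float("inf")):
            continue  # stale heap entry
        for nxt, step, road_id in graph[city]:
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = (city, road_id)
                heapq.heappush(heap, (new_cost, nxt))

    if target not in best:
        return None
    route, city = [], target
    while city != source:  # walk back along the recorded predecessors
        city_before, road_id = prev[city]
        route.append((road_id, city))
        city = city_before
    return best[target], route[::-1]


roads = {
    "R1": ("CityA", "CityD", 100),
    "R2": ("CityA", "CityB", 40),
    "R3": ("CityB", "CityD", 50),
}
prices = {"R1": [2.0], "R2": [1.0, 1.5], "R3": [1.0]}
print(cheapest_route(roads, prices, "CityA", "CityD"))
# → (90.0, [('R2', 'CityB'), ('R3', 'CityD')])
```

Here the detour via CityB (40 + 50 = 90) beats the direct road (100 km at 2.0 per km = 200), which a distance-only query would not reveal.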

Limiting a Neo4j cypher query results by sum of relationship property

Is there a way to limit a cypher query by the sum of a relationship property?
I'm trying to create a cypher query that returns nodes that are within a distance of 100 of the start node. All the relationships have a distance set, the sum of all the distances in a path is the total distance from the start node.
If the WHERE clause could handle aggregate functions, what I'm looking for might look like this:
START n=node(1)
MATCH path = n-[rel:street*]-x
WHERE SUM( rel.distance ) < 100
RETURN x
Is there a way that I can sum the distances of the relationships in the path for the where clause?
Sure; what you want is like a HAVING clause in a SQL query.
In Cypher you can chain query segments with WITH and use the results of earlier parts in the next part; see the manual.
For your example, one would assume:
START n=node(1)
MATCH n-[rel:street*]-x
WITH SUM(rel.distance) as distance
WHERE distance < 100
RETURN x
Unfortunately, sum doesn't work with collections yet, so I tried to do it differently (for variable-length paths):
START n=node(1)
MATCH n-[rel:street*]-x
WITH collect(rel.distance) as distances
WITH head(distances) + head(tail(distances)) + head(tail(tail(distances))) as distance
WHERE distance < 100
RETURN x
Unfortunately, head of an empty list doesn't return null (which could be coalesced to 0); it just fails. So this approach would only work for fixed-length paths; I don't know if that works for you.
I've come across the same problem recently. In more recent versions of Neo4j this can be solved with the extract and reduce functions. You could write:
START n=node(1)
MATCH path = (n)-[rel:street*..100]-(x)
WITH extract(r IN rel | r.distance) AS distances, x
WITH reduce(res = 0, d IN distances | res + d) AS distance, x
WHERE distance < 100
RETURN x
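A note on scaling the distance limit: with non-negative distances, a node is reachable by some path with total distance below 100 exactly when its shortest-path distance is below 100, so a Dijkstra search that stops expanding at the limit finds the same nodes without enumerating every path. A Python sketch on a hypothetical street graph (unlike the Cypher above, it does not cap the number of hops):

```python
import heapq


def within_distance(adjacency, start, limit):
    """Nodes whose shortest-path distance from `start` is below `limit`.

    adjacency: {node: [(neighbour, distance), ...]}; list both directions for
    undirected streets. Distances must be non-negative.
    Returns {node: shortest_distance}, excluding `start` itself.
    """
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best[node]:
            continue  # stale heap entry
        for nxt, w in adjacency.get(node, ()):
            nd = d + w
            # Stop expanding once the running total reaches the limit.
            if nd < limit and nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return {n: d for n, d in best.items() if n != start}


streets = {
    1: [(2, 30), (3, 80)],
    2: [(1, 30), (3, 40)],
    3: [(1, 80), (2, 40), (4, 50)],
    4: [(3, 50)],
}
print(within_distance(streets, 1, 100))  # → {2: 30.0, 3: 70.0}
```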
I don't know about a limitation in the WHERE clause, but you can simply specify it in the MATCH clause:
START n=node(1)
MATCH path = n-[rel:street*..100]-x
RETURN x
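Note that *..100 limits the number of relationships (hops) in the path, not the sum of their distance properties; see http://docs.neo4j.org/chunked/milestone/query-match.html#match-variable-length-relationships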
