Keep relationships to closest neighbors - neo4j

I have a graph with structure (:Person)-[:KNOW]->(:Person).
Now a Person node has a latitude and longitude. One Person is connected to 10 other Persons.
For each Person, I want to keep only the relationships to the 5 closest Persons. Since the graph is very large, I'm thinking about using apoc.periodic.iterate.
Here is what I have now, but I don't know how to delete the relationships to the 5 farthest Persons:
CALL apoc.periodic.iterate("MATCH (n:Person) RETURN n",
"WITH n
MATCH (n)-[r:KNOW]->(m:Person)
WITH point({longitude: TOFLOAT(n.long), latitude: TOFLOAT(n.lat)}) AS p1, point({longitude: TOFLOAT(m.long), latitude: TOFLOAT(m.lat)}) AS p2, r
WITH point.distance(p1, p2) AS Distance, r ORDER BY Distance",
{batchSize:10000, parallel:false})
Could you suggest a solution?

After sorting by distance (ORDER BY), you can collect the relationships and take the 6th through the last item of the list. Those are the relationships to the Persons farthest from n, and they can then be deleted.
CALL apoc.periodic.iterate("MATCH (n:Person) RETURN n",
"WITH n
MATCH (n)-[r:KNOW]->(m:Person)
WITH point({longitude: TOFLOAT(n.long), latitude: TOFLOAT(n.lat)}) AS p1, point({longitude: TOFLOAT(m.long), latitude: TOFLOAT(m.lat)}) AS p2, r, n
WITH point.distance(p1, p2) AS Distance, n, r ORDER BY n, Distance
WITH n, collect(r)[5..] as farthest_dist
FOREACH (farthest_r in farthest_dist|DELETE farthest_r)",
{batchSize:10000, parallel:false})
I added n to the sort because batchSize is 10000, so each batch contains up to 10,000 Person nodes n. The notation collect(r)[5..] puts all relationships in a list and returns the 6th item, the 7th item, and so on through the last item. If you prefer, you can use UNWIND instead of FOREACH in the last statement:
UNWIND farthest_dist as farthest_r
DELETE farthest_r
Before you remove the relationships, I would suggest that you back up your database first so that you can restore your data if needed.
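One way to see what would be removed before running the delete is a read-only preview. The following is a minimal sketch that assumes the same lat and long properties as in the question. It reports, per person, how many KNOW relationships fall outside the 5 nearest. The LIMIT of 25 is an arbitrary choice, and on a very large graph the query may still take a while because it scans every relationship.

```cypher
// Read-only preview: nothing is deleted here.
MATCH (n:Person)-[r:KNOW]->(m:Person)
WITH n, r,
     point.distance(
       point({longitude: toFloat(n.long), latitude: toFloat(n.lat)}),
       point({longitude: toFloat(m.long), latitude: toFloat(m.lat)})
     ) AS dist
ORDER BY n, dist
// collect(r)[5..] skips the 5 nearest and keeps the rest
WITH n, collect(r)[5..] AS farthest
WHERE size(farthest) > 0
RETURN n, size(farthest) AS relationshipsToDelete
LIMIT 25
```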

Related

Neo4J variable alias not recognized

I have three Person nodes with various relationships. Each node has a latitude and longitude.
First, I find the combinations of pairs of nodes:
MATCH (p1: Person)-[]->(p2: Person)
RETURN p1.name, p2.name
My output is correct:
Next, I attempt to find the distances between the pairs of nodes:
MATCH (p1:Person)-[]->(p2:Person)
WITH point({longitude: p1.longitude, latitude: p1.latitude}) AS p1Point,
point({longitude: p2.longitude, latitude: p2.latitude}) AS p2Point
RETURN (distance(p1Point, p2Point)) AS distance
My output is here:
Finally, I want to put it all together. I want to list the names of each node pair and the associated distance between them:
MATCH (p1:Person)-[]->(p2:Person)
WITH point({longitude: p1.longitude, latitude: p1.latitude}) AS p1Point,
point({longitude: p2.longitude, latitude: p2.latitude}) AS p2Point
RETURN p1.name, p2.name, (distance(p1Point, p2Point)) AS distance
I get an error that p1 is not defined.
What is the problem here? There is similar syntax described here:
https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/all-pairs-shortest-path/
The WITH clause unbinds all variables except for the ones it carries forward.
This should work:
MATCH (p1:Person)-->(p2:Person)
WITH p1, p2,
point({longitude: p1.longitude, latitude: p1.latitude}) AS p1Point,
point({longitude: p2.longitude, latitude: p2.latitude}) AS p2Point
RETURN p1.name, p2.name, distance(p1Point, p2Point) AS distance
This should also work (since WITH is not really needed):
MATCH (p1:Person)-->(p2:Person)
RETURN p1.name, p2.name,
distance(
point({longitude: p1.longitude, latitude: p1.latitude}),
point({longitude: p2.longitude, latitude: p2.latitude})) AS distance
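The scoping rule can be seen without any data. This is a small illustration only; a, b and total are hypothetical names that do not come from the question.

```cypher
// Data-free illustration of WITH scoping.
WITH 1 AS a, 2 AS b
WITH a, a + b AS total   // b is not listed, so it is out of scope from here on
RETURN a, total          // adding b to this RETURN would fail: variable b is not defined
```

Listing p1 and p2 in the WITH clause, as in the first query above, is what keeps them available to RETURN.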
[UPDATED]
By the way, with an always-bidirectional relationship like RELATED_TO, you should just use a single undirected relationship instead of two directed relationships pointing in opposite directions. Note, though, that the CREATE clause only supports creating directed relationships, so just pick any arbitrary direction -- it does not matter which. Later, when you do a search, just use an undirected search (like MATCH (p1:Person)--(p2:Person) ...). Also, you should look into whether you should use MERGE instead, since it does allow an undirected relationship pattern.
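A minimal sketch of that suggestion, assuming two hypothetical people named Ann and Bob and a client that accepts several statements separated by semicolons (such as cypher-shell):

```cypher
// Hypothetical people; MERGE picks a direction itself when it has to create the relationship.
MERGE (a:Person {name: 'Ann'})
MERGE (b:Person {name: 'Bob'})
MERGE (a)-[:RELATED_TO]-(b);

// Undirected search; id(p1) < id(p2) keeps each pair from being returned twice.
MATCH (p1:Person)-[:RELATED_TO]-(p2:Person)
WHERE id(p1) < id(p2)
RETURN p1.name, p2.name;
```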

How to sum up property value of node type different from starting and ending node with Cypher in Neo4j

I have Neo4j community 3.5.5, where I built a graph data model with rail station and line sections between stations. Stations and line sections are nodes and connect is the relationship linking them.
I would like to name a starting station and an ending station and sum up the lengths of all rail sections in between. I tried the Cypher query below, but Neo4j does not recognize Line_Section as a node type.
match (n1:Station)-[:Connect]-(n2:Station)
where n1.Name='Station1' and n2.Name='Station3'
return sum(Line_Section.length)
I know it is possible to do the summing with Neo4j Traversal API. Is it possible to do this in Cypher?
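First capture the path from the start node to the end node in a variable, then reduce the length property over its nodes. The pattern has to be variable-length so that the path includes the line section nodes, and COALESCE treats any node without a length property (presumably the stations) as 0.
MATCH path=(n1:Station {Name: 'Station1'})-[:Connect*]-(n2:Station {Name: 'Station3'})
RETURN REDUCE(totalLength = 0, node IN nodes(path) | totalLength + COALESCE(node.length, 0)) AS totalDistance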
Assuming the line section nodes have the label Line_Section, you can use a variable-length relationship pattern to get the entire path, list comprehension to get a list of the line section nodes, UNWIND to get the individual nodes, and then use aggregation to SUM all the line section lengths:
MATCH p = (n1:Station)-[:Connect*]-(n2:Station)
WHERE n1.Name='Station1' AND n2.Name='Station3'
UNWIND [n IN NODES(p) WHERE 'Line_Section' in LABELS(n)] AS ls
RETURN p, SUM(ls.length) AS distance
Or, you could use REDUCE in place of UNWIND and SUM:
MATCH p = (n1:Station)-[:Connect*]-(n2:Station)
WHERE n1.Name='Station1' AND n2.Name='Station3'
RETURN p, REDUCE(s = 0, ls IN [n IN NODES(p) WHERE 'Line_Section' in LABELS(n)] |
s + ls.length) AS distance
[UPDATED]
Note: a variable-length relationship with an unbounded number of hops is expensive, and can take a long time or run out of memory.
If you only want the distance for the shortest path, then this should be faster:
MATCH
(n1:Station {Name: 'Station1'}),
(n2:Station {Name: 'Station3'}),
p = shortestPath((n1)-[*]-(n2))
WHERE ALL(r IN RELATIONSHIPS(p) WHERE TYPE(r) = 'Connect')
RETURN p, REDUCE(s = 0, ls IN [n IN NODES(p) WHERE 'Line_Section' in LABELS(n)] |
s + ls.length) AS distance
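If the unbounded pattern is too expensive but the shortest path alone is not enough, an upper bound on the number of hops is one way to limit the search. This is a sketch only: the bound of 20 is an arbitrary assumption to adjust to your network, and the Name property follows the question.

```cypher
// *..20 caps the path at 20 Connect hops (assumed bound).
MATCH p = (n1:Station {Name: 'Station1'})-[:Connect*..20]-(n2:Station {Name: 'Station3'})
RETURN p, reduce(s = 0, ls IN [n IN nodes(p) WHERE n:Line_Section] | s + ls.length) AS distance
```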

Is it possible to iterate through a property of a relationship in Cypher

This is related to this question: How to store properties of a neo4j node as an array?
I would like to iterate over a property of a relationship, find its maximum value, create a new relationship between node1 and node2, remove node1 from the pool, and move on to the next one. In other words, in the context of my previous question: how can I assign an employee to a position based on max(r.score), and then move on to the employee who has the maximum r.score for another position? Thanks.
I have this basic query, which assigns a position to the employee who has the maximum r.score for that position and removes him from the pool of candidates. However, I have to run it again manually for the second position. Ideally, I want something that checks the number of available positions, fills each position with the max(r.score) candidate, and stops when all positions are filled, perhaps returning a report of hired employees.
MATCH (e:Employee)-[r:FUTURE_POSITION]->(p:Position)
WITH MAX(r.score) as s
MATCH (e)-[r]->(p) WHERE r.score = s
CREATE (e)-[r2:YOUAREHIRED]->(p)
DELETE r
RETURN e.name, s
This query may work for you:
MATCH (:Employee)-[r:FUTURE_POSITION]->(p:Position)
WITH p, COLLECT(r) AS rs
WITH p, REDUCE(t = rs[0], x IN rs[1..] |
CASE WHEN x.score > t.score THEN x ELSE t END) AS maxR
WITH p, maxR, maxR.score AS maxScore, STARTNODE(maxR) AS e
CREATE (e)-[:YOUAREHIRED]->(p)
DELETE maxR
RETURN p, e.name AS name, maxScore;
The first WITH clause collects all the FUTURE_POSITION relationships for each p.
The second WITH clause obtains, for each p, the relationship with the maximum score.
The third WITH clause extracts the variables needed by subsequent clauses.
The CREATE clause creates the YOUAREHIRED relationship between e (the employee with the highest score for a given p) and p.
The DELETE clause deletes the FUTURE_POSITION relationship between e and p.
The RETURN clause returns each p, along with the name of the employee who was just hired for p and his score, maxScore.
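The REDUCE step in the second WITH clause can be tried in isolation. In this sketch a list of hypothetical maps stands in for the collected relationships rs, so it runs without any data in the graph:

```cypher
// Hypothetical candidates standing in for the collected relationships rs.
WITH [{name: 'a', score: 3}, {name: 'b', score: 7}, {name: 'c', score: 5}] AS rs
RETURN reduce(t = rs[0], x IN rs[1..] |
         CASE WHEN x.score > t.score THEN x ELSE t END) AS maxR
// maxR is the map with score 7
```

The accumulator t starts as the first item and is replaced whenever a later item has a strictly higher score, so ties keep the earlier item.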
[UPDATE]
If you want to delete all FUTURE_POSITION relationships of each p node that gets a YOUAREHIRED relationship, you can use this slightly different query:
MATCH (:Employee)-[r:FUTURE_POSITION]->(p:Position)
WITH p, COLLECT(r) AS rs
WITH p, rs, REDUCE(t = rs[0], x IN rs[1..] |
CASE WHEN x.score > t.score THEN x ELSE t END) AS maxR
WITH p, rs, maxR.score AS maxScore, STARTNODE(maxR) AS e
CREATE (e)-[:YOUAREHIRED]->(p)
FOREACH(x IN rs | DELETE x)
RETURN p, e.name AS name, maxScore;

Neo4j: optimum path search

I have a graph of people who like rated movies, and I would like to extract the highest-rated movie for each pair of people. I'm using the following query, which requires sorting the movies by rating for each pair of people.
MATCH (p1:People) -[:LIKES]-> (m:Movie) <-[:LIKES]- (p2:People) WHERE id(p1) < id(p2)
WITH p1, p2, m ORDER BY m.Rating desc
RETURN p1, p2, head(collect(m)) as best
I can put the movie rating (as 1/rating or maxRating-rating) on the :LIKES relationships, which would let me identify the movie that is top-rated for both people.
MATCH (p1:People), (p2:People) call apoc.algo.dijkstra(p1, p2, 'LIKES', 'rating') YIELD path as path, weight as weight return path, weight
Is there a way to use a Dijkstra-like algorithm which would find the allOptimumPath through highest scored nodes to improve the performance of my first query and return paths rather than their starting, middle and ending nodes ?
Many thanks in advance.
Here is an alternate solution which preserves the path rather than reporting extracted nodes.
MATCH path=(p1:People)-[:LIKES]->(m:Movie)<-[:LIKES]-(p2:People)
WHERE id(p1) < id(p2)
WITH p1, p2, path, m
ORDER BY m.Rating DESC
WITH p1, p2, head(collect(path)) AS optPath
RETURN optPath
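The query relies on sorting first and then taking the head of each collected group. A data-free sketch of that idiom, using hypothetical rows in place of the matched paths:

```cypher
// Hypothetical rows standing in for (pair of people, movie rating).
UNWIND [{pair: 'A-B', rating: 3}, {pair: 'A-B', rating: 9}, {pair: 'A-C', rating: 5}] AS row
WITH row
ORDER BY row.rating DESC
// After sorting, the first collected item per pair is the highest-rated one.
RETURN row.pair AS pair, head(collect(row.rating)) AS best
```

This should yield 9 for the pair A-B and 5 for the pair A-C.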

Merging tracks in Neo4j

(Using Neo4j 3.x and neo4j.v1 Python driver)
I have two tracks T1 and T2 to the same target. Somewhere before reaching the target, the two tracks meet at node X and become one until the target is reached.
Track T1: T----------X-----------A
Track T2: '-----Q
I use the following Cypher query to generate each one of the tracks:
UNWIND {coords} AS coordinates
UNWIND {pax} AS pax
CREATE (n:Node)
SET n = coordinates
SET n.pax = pax
RETURN n
using the parameter list, e.g. {'pax': 'A', 'coords': [{'id': 0, 'lon': '8.553095', 'lat': '47.373146'}, etc.]}
and then link the nodes using the id only for the purpose of keeping the sequence of the trackpoints:
UNWIND {pax} AS pax
MATCH (n:Node {pax: pax})
WITH n
ORDER BY n.id
WITH COLLECT(n) AS nodes
UNWIND RANGE(0, SIZE(nodes) - 2) AS idx
WITH nodes[idx] AS n1, nodes[idx+1] AS n2
MERGE (n1)-[:NEXT]->(n2)
From the (unknown) point X (CS1 in the picture above) on, both tracks have identical trackpoints. I can match those using:
MATCH (n:Node), (m:Node)
WHERE m <> n AND n.id < m.id AND n.lat = m.lat AND n.lon = m.lon
MERGE (n)-[:IS]->(m)
with lat, lon being the (identical) coordinates. This is just my clumsy way to determine the first joint trackpoint. What I really need is to have one (linked) track from point X onward with the pax property updated, e.g. as ['A', 'B']
Question 1 (generalized):
How can I merge two nodes with a relationship into one node with an updated property? C3 and S3 merge into a new node CS3.
Question 2:
How can I do this if I have two linked lists with a set of pairwise identical properties?
(Ax)-[:NEXT]-> (A1)-[:NEXT]->(A2)-[:NEXT]->(A3)
(Bx)-[:NEXT]-> (B1)-[:NEXT]->(B2)-[:NEXT]->(B3)
where Ax.x <> Bx.x but A1.x = B1.x and A2.x = B2.x etc.
Thank you all for your hints and helpful ideas.
