How can I break my Cypher query to 2 subqueries? - neo4j

I want a query that starts with 2 given nodes, takes up to 5 related nodes for each of them (via relationship R1), and then looks for the shortestPath between those 10 nodes (5 from the 1st original node and 5 from the 2nd).
I can't manage to "break" my query into 2 parts, where each part computes its 5 nodes and a final MATCH then finds the path between the two sets.
My query so far is:
MATCH (n1:MyNode {node_id:12345})-[:R1]-(r1:RelatedNode)
WITH r1 LIMIT 5
MATCH (n2:MyNode {node_id:98765})-[:R1]-(r2:RelatedNode)
WITH r1,r2 LIMIT 5
MATCH p=shortestPath( (r1)-[*1..10]-(r2) )
RETURN p
The problem is that the second subquery is not really separated from the first: it still carries r1 along, which makes the LIMIT wrong.
I want to run the first part, then run the second part (with only r2), and only after r1 and r2 have been calculated separately, match the shortest path. Can that be done?
Thanks!

I'm not sure I understood your requirement correctly. I assume you want to find the shortest path between any of the first five neighbors of n1 and any of the first five neighbors of n2.
I think you have to pass the limited result through as a collection and UNWIND it later on:
MATCH (n1:MyNode {node_id:12345})-[:R1]-(r1:RelatedNode)
WITH r1 LIMIT 5
WITH collect(r1) as startNodes
MATCH (n2:MyNode {node_id:98765})-[:R1]-(r2:RelatedNode)
WITH r2, startNodes LIMIT 5
WITH collect(r2) as endNodes, startNodes
UNWIND startNodes as s UNWIND endNodes as e
MATCH p=shortestPath( (s)-[*1..10]-(e) )
RETURN p, length(p) ORDER BY length(p) ASC LIMIT 1
Be aware that the two UNWINDs create a cross product, so you're calculating 5*5 = 25 shortest paths. We then sort them by length and pick the first one.
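The same "limit each side separately, then combine" idea can also be sketched with CALL subqueries. This is only a sketch and assumes Neo4j 4.0 or later, where CALL { ... } is available; the labels and properties are taken from the question. Each subquery applies its own LIMIT, so r1 no longer interferes with the limit on r2. The r1 <> r2 filter is an added assumption: depending on configuration, shortestPath() may reject a start and end node that are identical.

```cypher
// First subquery: up to 5 neighbors of the first node
CALL {
  MATCH (:MyNode {node_id: 12345})-[:R1]-(r1:RelatedNode)
  RETURN r1 LIMIT 5
}
// Second subquery: up to 5 neighbors of the second node.
// It runs once per r1 row, so the result is still a 5 x 5 cross product.
CALL {
  MATCH (:MyNode {node_id: 98765})-[:R1]-(r2:RelatedNode)
  RETURN r2 LIMIT 5
}
// Assumption: skip pairs where both sides are the same node
WITH r1, r2 WHERE r1 <> r2
MATCH p = shortestPath((r1)-[*1..10]-(r2))
RETURN p, length(p) AS len
ORDER BY len ASC
LIMIT 1
```

As with the collect/UNWIND version, up to 25 shortest paths are computed before the shortest one is picked.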

Related

Neo4J finding two nodes such that the shortest path between them is of length n

I would like to know if there is a way to find two nodes such that the shortest path between them is of a specific length, say, 10.
All my nodes have the same label, "n1", and the shortest path can go through any edge type.
So far I have been doing this manually: finding the shortest path between node n and node m, changing n and m each time, and stopping when I find a path of length 10.
Here is the Cypher query:
match sp = shortestpath((startNode)-[*]->(endNode)) where id(startNode) = 1 and id(endNode) = 2 return sp
Note that I do not specify the node label, since I only have one label in the graph.
So I just keep changing the start and end nodes and rerunning the query until I find a path of the desired length.
I'm sure there is an easier way to do this, but since I am a Neo4j beginner I am struggling to figure it out.
I have also tried this:
MATCH (n1), (n2)
WHERE n1 <> n2 and shortestPath((n1)-[*]-(n2)) = 5
RETURN n1, n2
LIMIT 2
However, I don't believe this is correct, because shortest paths of length 5 are very common in my graph, and the query takes a long time to execute...
[UPDATED]
This query should be more performant. It avoids using a cartesian product, places an upper bound on the variable-length relationship pattern, and does not even use shortestpath.
MATCH p=(n1)-[*10]->(n2)
WHERE n1 <> n2 AND NOT (n1)-[*..9]->(n2)
RETURN n1, n2
LIMIT 1
This has worked for me!
MATCH (n1), (n2)
WHERE n1 <> n2 and length(shortestPath((n1)-[*]->(n2))) = 10
RETURN n1, n2
LIMIT 1

Neo4j - Find movies that share most tags with a particular movie

This Cypher query I wrote to find the top 10 movies that share the most number of tags with "Explorers (1985)" is not returning the desired result. In fact, it runs for a very long time and then stops because there isn't enough memory to complete the computation.
I'm relatively new to Cypher. I would appreciate any help someone could offer.
MATCH (m1:Movie {title:"Explorers (1985)"})-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(m2:Movie)
WITH size((m2)-[:HAS_TAG]->(t)) as cnt,
m2
RETURN m2, cnt
ORDER BY cnt DESC LIMIT 10
You can simplify your match a bit, since we're traversing the same relationship type twice. As for the tag count, you can either count(t), or simply count the times m2 occurs in the results, since it will have a row for each tag in common.
Give this one a try:
MATCH (:Movie {title:"Explorers (1985)"})-[:HAS_TAG*2]-(m2:Movie)
WITH m2, count(m2) as cnt
RETURN m2, cnt
ORDER BY cnt DESC LIMIT 10
I can't confirm on a large data set, but I think you can minimize loaded nodes by breaking up the query into steps like this...
// Get tags on base movie
MATCH (m1:Movie {title:"Explorers (1985)"})-[:HAS_TAG]->(t:Tag)
// Reduce tags to 1 row
WITH m1, COLLECT(id(t)) as tags
// Find only valid Movie-HAS->Tag
MATCH (m2:Movie)-[:HAS_TAG]->(t:Tag)
WHERE id(t) in tags AND NOT m2.title = "Explorers (1985)"
RETURN m2, COUNT(t) as cnt
ORDER BY cnt DESC LIMIT 10

neo4j indegree outdegree union

I want to compute indegree and outdegree and return a graph that connects the top 5 indegree nodes with the top 5 outdegree nodes. I have written the following query:
match (a:Port1)<-[r]-()
return a.id as NodeIn, count(r) as Indegree
order by Indegree DESC LIMIT 5
union
match (n:Port1)-[r]->()
return n.id as NodeOut, count(r) as Outdegree
order by Outdegree DESC LIMIT 5
union
match p=(u:Port1)-[:LinkTo*1..]->(t:Port1)
where u.id in NodeIn and t.id in NodeOut
return p
I get this error:
All sub queries in an UNION must have the same column names (line 4, column 1 (offset: 99)) "union"
What are the changes that I need to do to the code?
There are a few things we can improve.
The matches you're doing aren't the most efficient way to get the incoming and outgoing degrees of a node.
Also, UNION can only combine query results with identical columns. In this case we don't even need UNION: we can use WITH to pipe results from one part of the query to the next, and COLLECT() the nodes we need in between.
Try this query:
match (a:Port1)
with a, size((a)<--()) as Indegree
order by Indegree DESC LIMIT 5
with collect(a) as NodesIn
match (a:Port1)
with NodesIn, a, size((a)-->()) as Outdegree
order by Outdegree DESC LIMIT 5
with NodesIn, collect(a) as NodesOut
unwind NodesIn as NodeIn
unwind NodesOut as NodeOut
// we now have a cartesian product between both lists
match p=(NodeIn)-[:LinkTo*1..]->(NodeOut)
return p
Be aware that this performs two NodeLabelScans of :Port1 nodes and does a cross product of the top 5 of each, so there are 25 variable-length path matches. That can be expensive, as it generates all possible paths from each NodeIn to each NodeOut.
If you only want the shortest connection between each pair, you might try replacing your variable-length match with a shortestPath() call, which returns only the shortest path found between each two nodes:
...
match p = shortestPath((NodeIn)-[:LinkTo*1..]->(NodeOut))
return p
Also, make sure the direction is what you want. You're matching the nodes with the highest indegree and following outgoing paths from them to the nodes with the highest outdegree. That seems like it might be backwards to me, but you know your requirements best.
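On newer servers the size((a)<--()) form may not be accepted. A minimal sketch of the same degree ranking, assuming Neo4j 5 or later where the COUNT { ... } subquery expression is available, could look like this. The label and property names are the ones from the question.

```cypher
// Top 5 :Port1 nodes by indegree, using a COUNT subquery
// instead of size() over a pattern (assumes Neo4j 5+)
MATCH (a:Port1)
WITH a, COUNT { (a)<--() } AS Indegree
ORDER BY Indegree DESC LIMIT 5
RETURN a.id AS NodeIn, Indegree
```

The outdegree ranking is the same query with the pattern (a)-->() in place of (a)<--().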

With Cypher, how can I get the path with the fewest transfers in my DB?

I created a bus route database with Neo4j; you can download it here: https://www.dropbox.com/s/zamkyh2aaw3voe6/data.rar?dl=0
I want to get the path with the fewest transfers, and this is what I am doing:
MATCH path=allShortestPaths((start:潍坊_STATION {name:'寒亭一村'})-[rels*..50]->(end:潍坊_STATION {name:'火车站'}))
RETURN NODES(path) AS stations,relationships(path) AS path,
length(FILTER(index IN RANGE(1, length(rels)-1) WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count
ORDER BY transfer_count
LIMIT 10
But the result is not correct. Can anyone help me?
Does this give you what you want:
MATCH path=allShortestPaths((start:潍坊_STATION {name:'寒亭一村'})-[rels*..50]->(end:潍坊_STATION {name:'火车站'}))
RETURN NODES(path) AS stations,relationships(path) AS route,
length(path) AS transfer_count
ORDER BY transfer_count ASC
LIMIT 10
This should return 10 paths connecting the two stations, ordered by path length. The ordering isn't actually needed here: since we are using the allShortestPaths function, all paths found will have the same length (22 in the case of the data you shared).
You mention transfer time in your question, but I don't see the time as a property in your data. If you have the time stored you could use the reduce function to sum the travel time and order by that.
Edit
Use the extract function to collect the bus names from the relationships in the path:
MATCH
path=allShortestPaths((start:潍坊_STATION {name:'寒亭一村'})-[rels*..50]->(end:潍坊_STATION {name:'火车站'}))
RETURN
NODES(path) AS stations, relationships(path) AS route,
length(path) AS transfer_count, extract(x in rels | x.bus) AS buses
ORDER BY transfer_count ASC
LIMIT 10
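To count actual bus changes rather than path length, one option is to compare the bus property on consecutive relationships, as the question's query attempts to do. The following is a sketch using a list comprehension instead of FILTER and extract, which are assumed to be unavailable on newer Neo4j versions; the bus property and the labels come from the question. Note that allShortestPaths only considers paths with the fewest hops, so this ranks transfers among those paths only; a longer route with fewer transfers would not be found.

```cypher
MATCH path = allShortestPaths(
  (start:潍坊_STATION {name: '寒亭一村'})-[*..50]->(end:潍坊_STATION {name: '火车站'})
)
WITH path, relationships(path) AS rels
RETURN nodes(path) AS stations,
       [r IN rels | r.bus] AS buses,
       // a transfer is any step whose bus differs from the previous step's bus
       size([i IN range(1, size(rels) - 1)
             WHERE (rels[i]).bus <> (rels[i - 1]).bus]) AS transfer_count
ORDER BY transfer_count ASC
LIMIT 10
```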

Neo4j Cypher query: How to get unique nodes within two levels, with their depth value

I am using a Cypher query in Neo4j.
My requirement is:
I need to get the unique friends within two levels, each with its shortest depth value.
Graph looks like,
a-[:frnd]->b, b-[:frnd]->a
b-[:frnd]->c, c-[:frnd]->b
c-[:frnd]->d, d-[:frnd]->c
a-[:frnd]->c, c-[:frnd]->a
I tried this:
START n=node(8) match p=n-[:frnd*1..2]->(x) return x.email, length(p)
My output is,
b 1 <--length(p)
a 2
c 2
c 1
d 2
a 2 and so on.
My required output,
My parent node (a) should not be listed.
I need (c) only once, with its shortest length, 1.
c with length 2 should not be repeated.
Please help me solve this.
(EDITED. Finding n via START n=node(8) causes problems with other variables later on. So, below we find n in the MATCH statement.)
MATCH p = shortestPath((n {email:"a"})-[:frnd*..2]->(x))
WHERE n <> x AND length(p) > 0
RETURN x.email, length(p)
ORDER BY length(p)
LIMIT 1
If there are multiple "closest friends", this returns one of them.
Also, the shortestPath() function does not support a minimal path length, so "1..2" had to become "..2", and the WHERE clause needed to specify length(p) > 0.
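If the goal is to list every friend within two levels exactly once, together with its smallest depth, an aggregation may be simpler than shortestPath(). This is a sketch under the same assumption as the query above, namely that the starting node can be found by email "a".

```cypher
// One row per friend: min() collapses the duplicate paths to the same node
MATCH p = (n {email: "a"})-[:frnd*1..2]->(x)
WHERE x <> n                      // exclude the starting node itself
RETURN x.email AS email, min(length(p)) AS depth
ORDER BY depth
```

For the sample graph in the question, this should list b and c at depth 1 and d at depth 2, with a excluded.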

Resources