How do we find out paths in neo4j which will cover all the selected set of nodes.
This will be used for solving a tsp of sorts. to find path which covers all the given set of cities, later will sort on the basis of total distance covered by the path.
You can find all paths between two nodes like this.
match p=(:startNode)-->(:endNode)
return p;
or you can try like
match p=(:startNode)-->(:endNode)
return nodes(p) as node, relationships(p) as rels
You can get all paths like this
match p = ()-->()
return p
limit 100 // Apply limit according to nodes
Related
I have a graph database that consists of nodes (bus stations) with a property called “is_in_operation” which is set to “true” if the bus station is operational; otherwise it is set to “false”.
There is a relationship created between two nodes if a bus travels between the two stations.
I would like to find the path with the shortest number of stops between two nodes where all nodes in the path are operational.
There is an example in the database where there are 2 paths between 2 specified nodes. The “is_in_operation” property is set to ‘true’ for all nodes in both paths. When I run the following query, I get the correct answer
START d=node(1), e=node(5)
MATCH p = shortestPath( d-[*..15]->e ) where all (x in nodes(p) where x.is_in_operation='true')
RETURN p;
When I set the ‘is_in_operation’ property to ‘false’ for one of the intermediate nodes in the shortest path and rerun the query, I expect it to return the other path. However, I get no answer at all.
Is the query incorrect? If so, how should I specify the query?
The problem is that shortestPath can't take into account the where clause, so you're matching the shortest path, and then filtering it out with your where.
How about this one--it might not be as efficient as shortestPath, but it should return a result, if one exists:
START d=node(1), e=node(5)
MATCH p = d-[*..15]->e
where all (x in nodes(p) where x.is_in_operation='true')
RETURN p
order by len(p)
limit 1;
Can anybody send the Cypher query to find LONGEST PATH between two nodes using neo4j 3.1.0 version.
The graph algorithm to find the longest path is not implemented.
Here is a Cypher query that gets all paths and sorts them by size:
// get the nodes
MATCH (a:Node), (b:Node)
WHERE ID(a) = 14 AND ID(b) = 7
WITH a,b
// match all paths
MATCH p=(a)-[*]-(b)
// return the longest
RETURN p, length(p) ORDER BY length(p) DESC LIMIT 1
However, without any restrictions on the query, this might not work on large graphs. Finding the longest path in an undirected graph is expensive: https://en.wikipedia.org/wiki/Longest_path_problem
And without restrictions on the query (direction and relationship type) the query will look for all undirected paths.
You should restrict your path query or try two solve your problem without the longest path.
If you are looking for the longest path between nodes where each node in the chain has the same relationship to the next then see Achieving longestPath Using Cypher which shows:
MATCH p=(parent:Entity)-[:HAS_CHILD*1..10]->(child:Person)
WHERE size((child)-[:HAS_CHILD]->()) = 0
RETURN child;
However I am using:
MATCH (parent:Entity)-[:HAS_CHILD*1..10]->(child:Person)
WHERE NOT (child)-[r:HAS_CHILD]->()
RETURN child;
If anyone can comment on performance or any other aspect please do!
I also discovered an undocumented feature which will return the child when it is also the parent (as in only one node):
MATCH (parent:Entity)-[:HAS_CHILD*0..]->(child:Person)
WHERE NOT (child)-[r:HAS_CHILD]->()
RETURN child;
I search the longest path of my graph and I want to count the number of distinct nodes of this longest path.
I want to use count(distinct())
I tried two queries.
First is
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return nodes(p1)
The query result is a graph with the path nodes.
But if I tried the query
match p=(primero)-[:ResponseTo*]-(segundo)
with max(length(p)) as lengthPath
match p1=(primero)-[:ResponseTo*]-(segundo)
where length(p1) = lengthPath
return count(distinct(primero))
The result is
count(distinct(primero))
2
How can I use count(distinct()) over the node primero.
Node Primero has a field called id.
You should bind at least one of those nodes, add a direction and also consider a path-limit otherwise this is an extremely expensive query.
match p=(primero)-[:ResponseTo*..30]-(segundo)
with p order by length(p) desc limit 1
unwind nodes(p) as n
return distinct n;
I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk edges and find all vertexes that is connected to any of start vertex.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I write the algorithm by myself it would be a simple depth / breadth first search, and for every edge visited if the filter condition is true, output this edge. The complexity is O(V+E)
But I can't express this algorithm in Cypher since it's very different language.
Then i wrote this query:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not good as I thought. There are index on :Col(schema)
I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexSet1 is not empty)
{
Vertex0 = VertexQueue1.pop();
VisitedVertexSet.add(Vertex0);
foreach (Edge0 starting from Vertex0)
{
Vertex1 = endingVertex(Edge0);
if (Vertex1.schema <> Vertex0.schema)
{
EdgeSet1.put(Edge0);
}
if (VisitedVertexSet.notContains(Vertex1)
and VertexQueue1.notContains(Vertex1))
{
VertexQueue1.push(Vertex1);
}
}
}
return EdgeSet1;
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row number, it seems that Cypher exec engine returns all paths but I want distint edge list only.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is browsing a particular path it stops which it finds that it's looped back onto some node (EDIT relationship, not node) in the path matched so far, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First limiting to DISTINCT path isn't going to help because every path that it finds is distinct. What you want is the distinct set of pairs, I think, which I think could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
you might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
If I have a linked series of nodes: A<--B-->C-->D
match (n)-[r]-(m) return n,r,m will give me [A,B],[B,A],[B,C],[C,B],[C,D],[D,C].
What if I want to return only one of each pair: [A,B], [B,C], [C,D]?
How do I discard the 'duplicates' that come from ignoring the path direction? The nature of this data is such that path directions are fluid and unpredictable. I can get the answer in code with the duplicated results but I'm wondering if there is a way to get neo4j to do the work in advance.
Use the direction arrow:
match (n)-[r]->(m) return n,r,m
if you want to have undirected paths, the pair will be returned twice
you can limit that by imposing order.
match (n)-[r]->(m) where id(n) < id(m) return n,r,m