Neo4j Cypher query running endlessly - neo4j

I have rerun the query below multiple times for the last two days and the Neo4j interface says it's running but it seems like it is running endlessly. I have run other queries which have all return an output. I left the query running for 9 hours and it was still running after 9 hr. I'm not sure what the issue is but would appreciate any help.
I'm running Neo4j-community-2.3.12 which is an older version but it should work as I am following a tutorial and the rest of the queries work fine.
Cypher script - which is very basic:
match p=(ione)-[:ResponseTo*]->(itwo)
where length(p)=9 with p
match (u)-[:CreateChat]->(i)
where i in nodes(p)
return count(distinct u);
Image of query running endlessly:

This query looks like an endless loop.
I would suggest instead of getting all the paths and checking length later get the paths of the desired length(9).
Also, consider adding labels in path query.
match p=(ione)-[:ResponseTo*9]->(itwo)
with p
match (u)-[:CreateChat]->(i)
where i in nodes(p)
return count(distinct u);

As Raj noted, you will want to use labels in this, as right now this is doing an all nodes scan which isn't performant.
We can also make sure the second match is more performant by ensuring we start i with the previously matched nodes, rather than applying that as a filter after the match:
match p=(ione)-[:ResponseTo*9]->(itwo)
unwind nodes(p) as i
with DISTINCT i
match (u)-[:CreateChat]->(i)
return count(distinct u);

Related

Unexpected relation and node COUNT in Neo4j

Whenever I am using a query to get the count of a specific node, I always get the number greater than 1 even though there is only one distinct type of that node existing.
Sample query:
MATCH (p)-[rel]->(v:myDistinctNode) RETURN COUNT(v)
Output: 80
MATCH (p)-[rel]->(v:myDistinctNode) RETURN COUNT(DISTINCT v)
Output: 1
I see different results while using DISTINCT, but I cannot use DISTINCT all the time. Why I am seeing this and how can I avoid it? Thanks!
Neo4j Kernel-Version: 3.5.14
The short answer is that you need to use a collect statement to make it work.
MATCH (p)<-[rel]-(v:myDistinctNode) WITH collect(v) AS nodes RETURN count(nodes)
This should return one.
I'm not a cypher expert, but I believe the reason it doesn't work is that the cypher result seems more like a table where in one row you have p, another row you have r, and the last row you have v. Even though v is a unique entity, there are still 80 rows that have v.

Finding nodes not matching query in cypher

We have build up a NEO4j database with sample network data in it. We are currently trying to find a query to find NOT matching nodes.
We have 10 nodes in database and we get 8 of them by using the following query:
match(n:nodeType)-[r:binded]-(m.nodetype2)-[z:binded]-(t:nodeType) where n.workingMode='Working' and t.workingMode='Protection' return n
What we want is finding the two nodes which does not satify above condition.
I have found some entires mentioning the usage of NOT x IN y and tried a few solutions including
match(a) where a.workingMode ='Working'
match(n:nodeType)-[r:binded]-(m.nodetype2)-[z:binded]-(t:nodeType) where n.workingMode='Working' and t.workingMode='Protection'
and NOT a.id IN n.id
return a
but that returns 10 and others i have tried provided either 10 or 0 results.
Thanks in advance
You can pass the result of your first statement to the second statement of the query as a list and then use NOT IN to exclude these nodes:
MATCH(n:nodeType)-[r:binded]-(m.nodetype2)-[z:binded]-(t:nodeType)
WHERE n.workingMode='Working' AND t.workingMode='Protection'
WITH collect(n) as ns
MATCH (a:nodeType)
WHERE a.workingMode ='Working' AND NOT a IN ns
RETURN a

simple match query taking ages

I have a simple query
MATCH (n:TYPE {id:123})<-[:CONNECTION*]<-(m:TYPE) RETURN m
and when executing the query "manually" (i.e. using the browser interface to follow edges) I only get a single node as a result as there are no further connections. Checking this with the query
MATCH (n:TYPE {id:123})<-[:CONNECTION]<-(m:TYPE)<-[n:CONNECTION]-(o:TYPE) RETURN m,o
shows no results and
MATCH (n:TYPE {id:123})<-[:CONNECTION]<-(m:TYPE) RETURN m
shows a single node so I have made no mistake doing the query manually.
However, the issue is that the first question takes ages to finish and I do not understand why.
Consequently: What is the reason such trivial query takes so long even though the maximum result would be one?
Bonus: How to fix this issue?
As Tezra mentioned, the variable-length pattern match isn't in the same category as the other two queries you listed because there's no restrictions given on any of the nodes in between n and m, they can be of any type. Given that your query is taking a long time, you likely have a fairly dense graph of :CONNECTION relationships between nodes of different types.
If you want to make sure all nodes in your path are of the same label, you need to add that yourself:
MATCH path = (n:TYPE {id:123})<-[:CONNECTION*]-(m:TYPE)
WHERE all(node in nodes(path) WHERE node:TYPE)
RETURN m
Alternately you can use APOC Procedures, which has a fairly efficient means of finding connected nodes (and restricting nodes in the path by label):
MATCH (n:TYPE {id:123})
CALL apoc.path.subgraphNodes(n, {labelFilter:'TYPE', relationshipFilter:'<CONNECTION'}) YIELD node
RETURN node
SKIP 1 // to avoid returning `n`
MATCH (n:TYPE {id:123})<-[:CONNECTION]<-(m:TYPE)<-[n:CONNECTION]-(o:TYPE) RETURN m,o Is not a fair test of MATCH (n:TYPE {id:123})<-[:CONNECTION*]<-(m:TYPE) RETURN m because it excludes the possibility of MATCH (n:TYPE {id:123})<-[:CONNECTION]<-(m:ANYTHING_ELSE)<-[n:CONNECTION]-(o:TYPE) RETURN m,o.
For your main query, you should be returning DISTINCT results MATCH (n:TYPE {id:123})<-[:CONNECTION*]<-(m:TYPE) RETURN DISTINCT m.
This is for 2 main reasons.
Without distinct, each node needs to be returned the number of times for each possible path to it.
Because of the previous point, that is a lot of extra work for no additional meaningful information.
If you use RETURN DISTINCT, it gives the cypher planner the choice to do a pruning search instead of an exhaustive search.
You can also limit the depth of the exhaustive search using ..# so that it doesn't kill your query if you run against a much older version of Neo4j where the Cypher Planner hasn't learned pruning search yet. Example use MATCH (n:TYPE {id:123})<-[:CONNECTION*..10]<-(m:TYPE) RETURN m

Cypher query fails with variable length paths when trying to find all paths with unique node occurences

I have a highly interconnected graph where starting from a specific node
i want to find all nodes connected to it regardless of the relation type, direction or length. What i am trying to do is to filter out paths that include a node more than 1 times. But what i get is a
Neo.DatabaseError.General.UnknownError: key not found: UNNAMED27
I have managed to create a much simpler database
in neo4j sandbox and get the same message again using the following data:
CREATE (n1:Person { pid:1, name: 'User1'}),
(n2:Person { pid:2, name: 'User2'}),
(n3:Person { pid:3, name: 'User3'}),
(n4:Person { pid:4, name: 'User4'}),
(n5:Person { pid:5, name: 'User5'})
With the following relationships:
MATCH (n1{pid:1}),(n2{pid:2}),(n3{pid:3}),(n4{pid:4}),(n5{pid:5})
CREATE (n1)-[r1:RELATION]->(n2),
(n5)-[r2:RELATION]->(n2),
(n1)-[r3:RELATION]->(n3),
(n4)-[r4:RELATION]->(n3)
The Cypher Query that causes this issue in the above model is
MATCH p= (n:Person{pid:1})-[*0..]-(m)
WHERE ALL(c IN nodes(p) WHERE 1=size(filter(d in nodes(p) where c.pid = d.pid)) )
return m
Can anybody see what is wrong with this query?
The error seems like a bug to me. There is a closed neo4j issue that seems similar, but it was supposed to be fixed in version 3.2.1. You should probably create a new issue for it, since your comments state you are using 3.2.5.
Meanwhile, this query should get the results you seem to want:
MATCH p=(:Person{pid:1})-[*0..]-(m)
WITH m, NODES(p) AS ns
UNWIND ns AS n
WITH m, ns, COUNT(DISTINCT n) AS cns
WHERE SIZE(ns) = cns
return m
You should strongly consider putting a reasonable upper bound on your variable-length path search, though. If you do not do so, then with any reasonable DB size your query is likely to take a very long time and/or run out of memory.
When finding paths, Cypher will never visit the same node twice in a single path. So MATCH (a:Start)-[*]-(b) RETURN DISTINCT b will return all nodes connected to a. (DISTINCT here is redundant, but it can affect query performance. Use PROFILE on your version of Neo4j to see if it cares and which is better)
NOTE: This works starting with Neo4j 3.2 Cypher planner. For previous versions of
the Cypher planner, the only performant way to do this is with APOC, or add a -[:connected_to]-> relation from start node to all children so that path doesn't have to be explored.)

Optimizing Cypher Query Neo4j

I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk edges and find all vertexes that is connected to any of start vertex.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I write the algorithm by myself it would be a simple depth / breadth first search, and for every edge visited if the filter condition is true, output this edge. The complexity is O(V+E)
But I can't express this algorithm in Cypher since it's very different language.
Then i wrote this query:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not good as I thought. There are index on :Col(schema)
I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexSet1 is not empty)
{
Vertex0 = VertexQueue1.pop();
VisitedVertexSet.add(Vertex0);
foreach (Edge0 starting from Vertex0)
{
Vertex1 = endingVertex(Edge0);
if (Vertex1.schema <> Vertex0.schema)
{
EdgeSet1.put(Edge0);
}
if (VisitedVertexSet.notContains(Vertex1)
and VertexQueue1.notContains(Vertex1))
{
VertexQueue1.push(Vertex1);
}
}
}
return EdgeSet1;
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row number, it seems that Cypher exec engine returns all paths but I want distint edge list only.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is browsing a particular path it stops which it finds that it's looped back onto some node (EDIT relationship, not node) in the path matched so far, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First limiting to DISTINCT path isn't going to help because every path that it finds is distinct. What you want is the distinct set of pairs, I think, which I think could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
you might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;

Resources