I'm using Neo4j to try to find any node that is not connected to a specific node "a". The query that I have so far is:
MATCH p = shortestPath((a:Node {id:"123"})-[*]-(b:Node))
WHERE p IS NULL
RETURN b.id as b
So it tries to find the shortest path between a and b. If it doesn't find a path, then it returns that node's id. However, this causes my query to run for a few minutes then crashes when it runs out of memory. I was wondering if this method would even work, and if there is a more efficient way? Any help would be greatly appreciated!
edit:
MATCH (a:Node {id:"123"})-[*]-(b:Node),
(c:Node)
WITH collect(b) as col, a, b, c
WHERE a <> b AND NOT c IN col
RETURN c.id
So col (collect(b)) contains every node connected to a, therefore if c is not in col then c is not connected to a?
For one, you're giving this MATCH an impossible predicate to fulfill, so it will never find the shortest path.
WHERE clauses are associated with MATCH, OPTIONAL MATCH, and WITH clauses, so your query is asking for the shortest path where the path doesn't exist. That will never return anything.
Also, the shortestPath will start at the node you DON'T want to be connected, so this has no way of finding the nodes that aren't connected to it.
Probably the easiest way to approach this is to MATCH to all nodes connected to your node in question, then MATCH to all :Nodes checking for those that aren't in the connected set. That means you won't have to do a shortestPath from every single node in the db, just a membership check in a collection.
You'll need APOC Procedures for this, as it has the fastest means of matching to nodes within a subgraph.
MATCH (a:Node {id:"123"})
CALL apoc.path.subgraphNodes(a, {}) YIELD node
WITH collect(node) as subgraph
MATCH (b:Node)
WHERE NOT b in subgraph
RETURN b.id as b
EDIT
Your edited query is likely to blow up, that's going to generate a huge result set (the query will build a result set of every node reachable from your start node by a unique path in a cartesian product with every :Node).
Instead, go step by step, collect the distinct nodes (because otherwise you'll get multiples of the same nodes that can be reached via different paths), and then only after you have your collection should you start your match for nodes that aren't in the list.
MATCH (:Node {id:"123"})-[*0..]-(b:Node)
WITH collect(DISTINCT b) as col
MATCH (a:Node)
WHERE NOT a IN col
RETURN a.id
Related
I have a graph of nodes with a relationship NEXT with 2 properties sequence (s) and position (p). For example:
N1-[NEXT{s:1, p:2}]-> N2-[NEXT{s:1, p:3}]-> N3-[NEXT{s:1, p:4}]-> N4
A node N might have multiple outgoing Next relationships with different property values.
Given a list of node names, e.g. [N2,N3,N4] representing a sequential path, I want to check if the graph contains the nodes and that the nodes are connected with relationship Next in order.
For example, if the list contains [N2,N3,N4], then check if there is a relationship Next between nodes N2,N3 and between N3,N4.
In addition, I want to make sure that the nodes are part of the same sequence, thus the property s is the same for each relationship Next. To ensure that the order maintained, I need to verify if the property p is incremental. Meaning, the value of p in the relationship between N2 -> N3 is 3 and the value p between N3->N4 is (3+1) = 4 and so on.
I tried using APOC to retrieve the possible paths from an initial node N using python (library: neo4jrestclient) and then process the paths manually to check if a sequence exists using the following query:
q = "MATCH (n:Node) WHERE n.name = 'N' CALL apoc.path.expandConfig(n {relationshipFilter:'NEXT>', maxLevel:4}) YIELD path RETURN path"
results = db.query(q,data_contents=True)
However, running the query took some time that I eventually stopped the query. Any ideas?
This one is a bit tough.
First, pre-match to the nodes in the path. We can use the collected nodes here to be a whitelist for nodes in the path
Assuming the start node is included in the list, a query might go like:
UNWIND $names as name
MATCH (n:Node {name:name})
WITH collect(n) as nodes
WITH nodes, nodes[0] as start, tail(nodes) as tail, size(nodes)-1 as depth
CALL apoc.path.expandConfig(start, {whitelistNodes:nodes, minLevel:depth, maxLevel:depth, relationshipFilter:'NEXT>'}) YIELD path
WHERE all(index in range(0, size(nodes)-1) WHERE nodes[index] = nodes(path)[index])
// we now have only paths with the given nodes in order
WITH path, relationships(path)[0].s as sequence
WHERE all(rel in tail(relationships(path)) WHERE rel.s = sequence)
// now each path only has relationships of common sequence
WITH path, apoc.coll.pairsMin([rel in relationships(path) | rel.p]) as pairs
WHERE all(pair in pairs WHERE pair[0] + 1 = pair[1])
RETURN path
I have 3 labels, A, B, and Z. A & B both have a relationship to Z. I want to find all the A nodes that do not have share any of nodes Z in common with B
Currently, doing a normal query where the relationship DOES exist, works.
MATCH (a:A)-[:rel1]->(z:Z)<-[:rel2]-(b:B { uuid: {<SOME ID>} })
RETURN DISTINCT a
But when I do
MATCH (a:A)
WHERE NOT (a)-[:rel1]->(z:Z)<-[:rel2]-(b:B { uuid: {<SOME ID>} }))
RETURN DISTINCT a
It throws an error
Neo4j::Server::CypherResponse::ResponseError: z not defined
Not sure if the syntax for this is incorrect, I tried WHERE NOT EXIST() but no luck.
The query is part of a larger one called through a rails app using neo4jrb / (Neo4j::Session.query)
This is a problem to do with the scope of your query. When you describe a node in a MATCH clause like the below
MATCH (n:SomeLabel)
You're telling cypher to look for a node with the label SomeLabel, and assign it to the variable n in the rest of the query, and at the end of the query, you can return the values stored in this node using RETURN n (unless you drop n by not including it in a WITH clause).
Later on in you query, if you want to MATCH another node, you can do it in reference to n, so for example:
MATCH (m:SomeOtherLabel)-[:SOME_RELATIONSHIP]-(n)
Will match a variable connected (in any direction) to the node n, with a label SomeOtherLabel, and assign it to the variable m for the rest of the query.
You can only assign nodes to variables like this in MATCH, OPTIONAL MATCH, MERGE, CREATE and (sort of) in WITH and UNWIND clauses (someone correct me here if I've missed one, I suppose you also do this in list comprehensions and FOREACH clauses).
In your second query, you are trying to find a node with the label A, which is not connected to a node with the label Z. However, the way you have written the query means that you are actually saying find a node with label A which is not connected via a rel1 relationship to the node stored as z. This will fail (and as shown, neo complains that z is not defined), because you can't create a new variable like this in the WHERE clause.
To correct your error, you need to remove the reference to the variable z, and ensure you have also defined the variable b containing your node before the WHERE clause. Now, you keep the label in the query, like the below.
MATCH (a:A)
MATCH (b:B { uuid: {<SOME ID>} })
WHERE NOT (a)-[:rel1]->(:Z)<-[:rel2]-(b) // changed this line
RETURN DISTINCT a
And with a bit of luck, this will now work.
You get the error because z is the identifier of a node that you are using in a where clause that you have not yet identified.
Since you know b already I would match it first and then use it in your where clause. You don't need to assign :Z an identifier, simply using the node label will suffice.
MATCH (b:B { uuid: {<SOME ID>} })
WITH b
MATCH (a:A)
WHERE NOT (a)-[:rel1]->(:Z)<-[:rel2]-(b)
RETURN DISTINCT a
I want to write a query in Cypher and run it on Neo4j.
The query is:
Given some start vertexes, walk edges and find all vertexes that is connected to any of start vertex.
(start)-[*]->(v)
for every edge E walked
if startVertex(E).someproperty != endVertex(E).someproperty, output E.
The graph may contain cycles.
For example, in the graph above, vertexes are grouped by "group" property. The query should return 7 rows representing the 7 orange colored edges in the graph.
If I write the algorithm by myself it would be a simple depth / breadth first search, and for every edge visited if the filter condition is true, output this edge. The complexity is O(V+E)
But I can't express this algorithm in Cypher since it's very different language.
Then i wrote this query:
find all reachable vertexes
(start)-[*]->(v), reachable = start + v.
find all edges starting from any of reachable. if an edge ends with any reachable vertex and passes the filter, output it.
match (reachable)-[]->(n) where n in reachable and reachable.someprop != n.someprop
so the Cypher code looks like this:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
The performance of this query is not good as I thought. There are index on :Col(schema)
I am running neo4j 2.3.0 docker image from dockerhub on my windows laptop. Actually it runs on a linux virtual machine on my laptop.
My sample data is a small dataset that contains 0.1M vertexes and 0.5M edges. For some starting nodes it takes 60 or more seconds to complete this query. Any advice for optimizing or rewriting the query? Thanks.
The following code block is the logic I want:
VertexQueue1 = (starting vertexes);
VisitedVertexSet = (empty);
EdgeSet1 = (empty);
While (VertexSet1 is not empty)
{
Vertex0 = VertexQueue1.pop();
VisitedVertexSet.add(Vertex0);
foreach (Edge0 starting from Vertex0)
{
Vertex1 = endingVertex(Edge0);
if (Vertex1.schema <> Vertex0.schema)
{
EdgeSet1.put(Edge0);
}
if (VisitedVertexSet.notContains(Vertex1)
and VertexQueue1.notContains(Vertex1))
{
VertexQueue1.push(Vertex1);
}
}
}
return EdgeSet1;
EDIT:
The profile result shows that expanding all paths has a high cost. Looking at the row number, it seems that Cypher exec engine returns all paths but I want distint edge list only.
LEFT one:
match (start:Col {table:"F_XXY_DSMK_ITRPNL_IDX_STAT_W"})
,(start)-[*0..]->(prev:Col)-->(node:Col)
where prev.schema<>node.schema
return distinct prev,node
RIGHT one:
MATCH (n:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
WITH n MATCH (n:Col)-[*]->(m:Col)
WITH collect(distinct n) + collect(distinct m) AS c1
UNWIND c1 AS rn
MATCH (rn:Col)-[]->(xn:Col) WHERE rn.schema<>xn.schema and xn in c1
RETURN rn,xn
I think Cypher lets this be much easier than you're expecting it to be, if I'm understanding the query. Try this:
MATCH (start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WHERE start.schema <> node.schema
RETURN start, node
Though I'm not sure why you're comparing the schema property on the nodes. Isn't the schema for the start node fixed by the value that you pass in?
I might not be understanding the query though. If you're looking for more than just the nodes connected to the start node, you could do:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN prev, node
That open-ended variable length relationship specification might be slow, though.
Also note that when Cypher is browsing a particular path it stops which it finds that it's looped back onto some node (EDIT relationship, not node) in the path matched so far, so cycles aren't really a problem.
Also, is the DWMDATA value that you're passing in interpolated? If so, you should think about using parameters for security / performance:
http://neo4j.com/docs/stable/cypher-parameters.html
EDIT:
Based on your comment I have a couple of thoughts. First limiting to DISTINCT path isn't going to help because every path that it finds is distinct. What you want is the distinct set of pairs, I think, which I think could be achieved by just adding DISTINCT to the query:
MATCH
(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})
(start)-[*0..]->(prev:Col)-->(node:Col)
WHERE prev.schema <> node.schema
RETURN DISTINT prev, node
Here is another way to go about it which may or may not be more efficient, but might at least give you an idea for how to go about things differently:
MATCH
path=(start:Col {schema:"${DWMDATA}",table:"CHK_P_T80_ASSET_ACCT_AMT_DD"})-->(node:Col)
WITH rels(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel
WITH startNode(rel) AS start_node, endNode(rel) AS end_node
WHERE start_node.schema <> end_node.schema
RETURN start_node, end_node
I can't say that this would be faster, but here's another way to try:
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH collect(ID(node)) AS node_ids
MATCH (:Col)-[r]->(node:Col)
WHERE ID(node) IN node_ids
WITH DISTINCT r
RETURN startNode(r) AS start_node, endNode(r) AS end_node
I suspect that the problem in all cases is with the open-ended variable length path. I've actually asked on the Slack group to try to get a better understanding of how it works. In the meantime, for all the queries that you try I would suggest prefixing them with the PROFILE keyword to get a report from Neo4j on what parts of the query are slow.
// this is very inefficient!
MATCH (start:Col)-[*]->(node:Col)
WHERE start.property IN {property_values}
WITH distinct node
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
you might be better off with this:
MATCH (start:Col)
WHERE start.property IN {property_values}
MATCH (node:Col)
WHERE shortestPath((start)-[*]->(node)) IS NOT NULL
MATCH (prev)-[r]->(node)
RETURN distinct prev, node;
I'm struggling with a problem despite having read a lot of documentation... I'm trying to find my graph root node (or nodes, they may be several top nodes) and counting their immediate children (all relations are typed :BELONGS_TO)
My graph looks like this (cf. attached screenshot). I have been trying the following query which works as long as the root node only has ONE incomming relationship, and it doesn not when it has more than one. (i'm not realy familiar with the cyhper language yet).
MATCH (n:Somelabel) WHERE NOT (()-[:BELONGS_TO]->(n:Somelabel)) RETURN n
Any help would be much appreciated ! (i haven't even tried to count the root nodes immediate children yet...which would be "2" according to my graph)
Correct query was given by cybersam
MATCH (n:Somelabel) WHERE NOT (n)-[:BELONGS_TO]->() RETURN n;
MATCH (n:Somelabel)<-[:BELONGS_TO]-(c:Somelabel)
WHERE NOT (n)-[:BELONGS_TO]->() RETURN n, count(c);
Based on your diagram, it looks like you are actually looking for "leaf" nodes. This query will search for all Somelabel nodes that have no outgoing relationships, and return each such node along with a count of the number of distinct nodes that have a relationship pointing to that node.
MATCH (n:Somelabel)
WHERE NOT (n)-[:BELONGS_TO]->()
OPTIONAL MATCH (m)-[:BELONGS_TO]->(n)
RETURN n, COUNT(DISTINCT m);
If you are actually looking for all "root" nodes, your original query would have worked.
As a sanity check, if you have a specific node that you believe is a "leaf" node (let's say it has an id value of 123), this query should return a single row with null values for r and m. If you get non-null results, then you actually have outgoing relationships.
MATCH (n {id:123})
OPTIONAL MATCH (n)-[r]->(m)
RETURN r, m
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't be acceptable to make several queries ? In your case you'd automatically generate 6 queries with all the possible combinations (factorial on the number of variables)
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.