Is there any way to index relationship existence? - neo4j

Recently i faced with the problem in creating chain of nodes using next query in loop
MATCH (p: Node) WHERE NOT (p)-[:RELATIONSHIP]->()
WITH p LIMIT 1000
MATCH (q: Node{id: p.id}) WITH p, max(id(q)) as tail
MATCH (t: Node) where id(t) = tail
WITH p, t
CREATE (p)-[:RELATIONSHIP]->(t)
The problem appears after creating chain with first ~1 000 000 nodes. Query
MATCH (p: Node) WHERE NOT (p)-[:RELATIONSHIP]->()
works very slow because it looks through first 1 000 000 and checks if they don't have a relationship, but they all have. At some amount of nodes query ends with "Unknown error". To get around with it I tried next queries.
MATCH (p: Node) with p skip 1000000
Match (p) WHERE NOT (p)-[:RELATIONSHIP]->()
or
MATCH (p: Node) with p order by id(p) desc
MATCH (p) WHERE NOT (p)-[:RELATIONSHIP]->()
But i wonder if there more elegant way to solve this problem like "indexing relationship existence"?

You can index relationship properties using "legacy indexing," which isn't exactly recommended anymore, but this won't index the absence of relationships so it wouldn't do you any good. I'd probably try to find a way to mark nodes in need of relationships through either a label or an index on a property. Start your match from there, it'll be much faster.

Related

How could I optimize this cypher query?

When I used this cypher query
match p=(n)-[r*8]-(n)
where id(n)=548
with p
where ALL(x IN nodes(p)[1..length(p)] WHERE SINGLE(y IN nodes(p)[1..length(p)] WHERE x=y))
return count(p)
it took 51922 ms to return the result; it is really a long time. How could I optimize this cypher query? Any help would be appreciated.
Looks like you want a simple circuit with no repeating nodes (except the start and end node).
There's an APOC Procedure to get all simple paths between two nodes, with a maximum path length. It doesn't currently work when the start and end nodes are the same, but if we set the end node as any adjacent node to your start node, and filter to only keep paths of length 7 (since the paths exclude the last hop back to the start node), then we should be able to get the right answer extremely fast.
match (n)--[m]
with distinct n, m
call apoc.algo.allSimplePaths(n, m, '', 7) YIELD path
with path
where length(path) = 7
return count(path)

Neo4j Cypher find two disjoint nodes

I'm using Neo4j to try to find any node that is not connected to a specific node "a". The query that I have so far is:
MATCH p = shortestPath((a:Node {id:"123"})-[*]-(b:Node))
WHERE p IS NULL
RETURN b.id as b
So it tries to find the shortest path between a and b. If it doesn't find a path, then it returns that node's id. However, this causes my query to run for a few minutes then crashes when it runs out of memory. I was wondering if this method would even work, and if there is a more efficient way? Any help would be greatly appreciated!
edit:
MATCH (a:Node {id:"123"})-[*]-(b:Node),
(c:Node)
WITH collect(b) as col, a, b, c
WHERE a <> b AND NOT c IN col
RETURN c.id
So col (collect(b)) contains every node connected to a, therefore if c is not in col then c is not connected to a?
For one, you're giving this MATCH an impossible predicate to fulfill, so it will never find the shortest path.
WHERE clauses are associated with MATCH, OPTIONAL MATCH, and WITH clauses, so your query is asking for the shortest path where the path doesn't exist. That will never return anything.
Also, the shortestPath will start at the node you DON'T want to be connected, so this has no way of finding the nodes that aren't connected to it.
Probably the easiest way to approach this is to MATCH to all nodes connected to your node in question, then MATCH to all :Nodes checking for those that aren't in the connected set. That means you won't have to do a shortestPath from every single node in the db, just a membership check in a collection.
You'll need APOC Procedures for this, as it has the fastest means of matching to nodes within a subgraph.
MATCH (a:Node {id:"123"})
CALL apoc.path.subgraphNodes(a, {}) YIELD node
WITH collect(node) as subgraph
MATCH (b:Node)
WHERE NOT b in subgraph
RETURN b.id as b
EDIT
Your edited query is likely to blow up, that's going to generate a huge result set (the query will build a result set of every node reachable from your start node by a unique path in a cartesian product with every :Node).
Instead, go step by step, collect the distinct nodes (because otherwise you'll get multiples of the same nodes that can be reached via different paths), and then only after you have your collection should you start your match for nodes that aren't in the list.
MATCH (:Node {id:"123"})-[*0..]-(b:Node)
WITH collect(DISTINCT b) as col
MATCH (a:Node)
WHERE NOT a IN col
RETURN a.id

Create association between nodes if one doesnt exist using cypher

Say there are 2 labels P and M. M has nodes with names M1,M2,M3..M10. I need to associate 50 nodes of P with each Node of M. Also no node of label P should have 2 association with node of M.
This is the cypher query I could come up with, but doesn't seem to work.
MATCH (u:P), (r:M{Name:'M1'}),(s:M)
where not (s)-[:OWNS]->(u)
with u limit 50
CREATE (r)-[:OWNS]->(u);
This way I would run for all 10 nodes of M. Any help in correcting the query is appreciated.
You can utilize apoc.periodic.* library for batching. More info in documentation
call apoc.periodic.commit("
MATCH (u:P), (r:M{Name:'M1'}),(s:M) where not (s)-[:OWNS]->(u)
with u,r limit {limit}
CREATE (r)-[:OWNS]->(u)
RETURN count(*)
",{limit:10000})
If there will always be just one (r)-[:OWNS]->(u) relationship, I would change my first match to include
call apoc.periodic.commit("
MATCH (u:P), (r:M{Name:'M1'}),(s:M) where not (s)-[:OWNS]->(u) and not (r)-[:OWNS]->(u)
with u,r limit {limit}
CREATE (r)-[:OWNS]->(u)
RETURN count(*)
",{limit:10000})
So there is no way the procedure will fall into a loop
This query should be a fast and easy-to-understand. It is fast because it avoids Cartesian products:
MATCH (u:P)
WHERE not (:M)-[:OWNS]->(u)
WITH u LIMIT 50
MATCH (r:M {Name:'M1'})
CREATE (r)-[:OWNS]->(u);
It first matches 50 unowned P nodes. It then finds the M node that is supposed to be the "owner", and creates an OWNS relationship between it and each of the 50 P nodes.
To make this query even faster, you can first create an index on :M(Name) so that the owning M node can be found quickly (without scanning all M nodes):
CREATE INDEX ON :M(Name);
This worked for me.
MATCH (u:P), (r:M{Name:'M1'}),(s:M)
where not (s)-[:OWNS]->(u)
with u,r limit 50
CREATE (r)-[:OWNS]->(u);
Thanks for Thomas for mentioning limit on u and r.
I think one way to connect all 10 nodes :M in one query
MATCH (m:M)
WITH collect(m) as nodes
UNWIND nodes as node
MATCH (p:P) where not ()-[:OWNS]->(p)
WITH node,p limit 50
CREATE (node)-[:OWNS]->(p)
Although I am not really sure if we need to collect and unwind, could just simplify it to:
MATCH (m:M)
MATCH (p:P) where not ()-[:OWNS]->(p)
WITH m,p limit 50
CREATE (node)-[:OWNS]->(p)

How to eliminate a path where exists a relationship outside of the path, but between nodes in the path?

I have modified the 'vanilla' initial query in this console, and added one relationship type 'LOCKED' between the 'Morpheus' and 'Cypher' nodes.
How can I modify the existing (first-run) query, which is a variable length path so that it no longer reaches the Agent Smith node due to the additional Locked relationship I've added?
First-run query:
MATCH (n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
RETURN n AS Neo,r,m
I have tried this kind of thing:
MATCH p=(n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
AND none(rel IN rels(p) WHERE EXISTS (StartNode(rel)-[:LOCKED]->EndNode(rel)))
RETURN n AS Neo,r,m
..but it doesn't recognize the pattern inside the none() function.
I'm using Community 2.2.1
Thanks for reading
I'm pretty sure you can't use a function in a MATCHy type clause like that (though it's clever). What about this?
MATCH path=(neo:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
WHERE neo.name='Neo'
AND NOT('LOCKED' IN rels(path))
RETURN neo,r,m
EDIT:
Oops, looks like Dave might have beat me to the punch. Here's the solution I came up with anyway ;)
MATCH p=(neo:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE neo.name='Neo'
WITH p, neo, m
UNWIND rels(p) AS rel
MATCH (a)-[rel]->(b)
OPTIONAL MATCH a-[locked_rel:LOCKED]->b
WITH neo, m, collect(locked_rel) AS locked_rels
WHERE none(locked_rel IN locked_rels WHERE ()-[locked_rel]->())
RETURN neo, m
Ok, this is a little convoluted but i think it works. The approach is to take all of the paths and find the last known good nodes (ones that have LOCKED relationships leaving them). Then use that node(s) as a new ending point(s) and return the paths.
match p=(n:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
where n.name='Neo'
with n, relationships(p) as rels
unwind rels as r
with n
, case
when type(r) = 'LOCKED' then startNode(r)
else null
end as last_good_node
with n
, (collect( distinct last_good_node)) as last_good_nodes
unwind last_good_nodes as g
match p=n-[r:KNOWS|LOVES*]->g
return p
I think this would be simpler if there was a locked: true property on the KNOWS and LOVES relationships.

Neo4j cypher - Counting immediate children of root nodes

I'm struggling with a problem despite having read a lot of documentation... I'm trying to find my graph root node (or nodes, they may be several top nodes) and counting their immediate children (all relations are typed :BELONGS_TO)
My graph looks like this (cf. attached screenshot). I have been trying the following query which works as long as the root node only has ONE incomming relationship, and it doesn not when it has more than one. (i'm not realy familiar with the cyhper language yet).
MATCH (n:Somelabel) WHERE NOT (()-[:BELONGS_TO]->(n:Somelabel)) RETURN n
Any help would be much appreciated ! (i haven't even tried to count the root nodes immediate children yet...which would be "2" according to my graph)
Correct query was given by cybersam
MATCH (n:Somelabel) WHERE NOT (n)-[:BELONGS_TO]->() RETURN n;
MATCH (n:Somelabel)<-[:BELONGS_TO]-(c:Somelabel)
WHERE NOT (n)-[:BELONGS_TO]->() RETURN n, count(c);
Based on your diagram, it looks like you are actually looking for "leaf" nodes. This query will search for all Somelabel nodes that have no outgoing relationships, and return each such node along with a count of the number of distinct nodes that have a relationship pointing to that node.
MATCH (n:Somelabel)
WHERE NOT (n)-[:BELONGS_TO]->()
OPTIONAL MATCH (m)-[:BELONGS_TO]->(n)
RETURN n, COUNT(DISTINCT m);
If you are actually looking for all "root" nodes, your original query would have worked.
As a sanity check, if you have a specific node that you believe is a "leaf" node (let's say it has an id value of 123), this query should return a single row with null values for r and m. If you get non-null results, then you actually have outgoing relationships.
MATCH (n {id:123})
OPTIONAL MATCH (n)-[r]->(m)
RETURN r, m

Resources