I want to delete all nodes of a given type and their relations. In total there are 1.4 million nodes of this type.
When I run MATCH (n:Type) DETACH DELETE n, Neo4j hangs after a few minutes and has to be restarted.
Is there a better way to delete a large number of nodes? Can I delete them in chunks somehow (LIMIT is not supported with DELETE)?
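Try this:
MATCH (n:Type)
// OPTIONAL MATCH keeps nodes that have no relationships, so they are deleted too
OPTIONAL MATCH (n)-[r]-()
DELETE r, n
If you want to delete them in chunks, the query would look like this:
MATCH (n:Type) WITH n LIMIT 1000
OPTIONAL MATCH (n)-[r]-()
DELETE r, n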
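As a sketch of another chunked variant: since Neo4j 2.3, DETACH DELETE can follow a WITH ... LIMIT, which removes each node together with its relationships and also covers nodes that have none. The batch size of 10000 is an assumption, not a recommendation from the question; tune it to what your heap tolerates.

```cypher
// Run this statement repeatedly until `deleted` comes back as 0.
// The LIMIT sits on WITH, not on DELETE, which is why it is accepted.
MATCH (n:Type)
WITH n LIMIT 10000
DETACH DELETE n
RETURN count(*) AS deleted;
```

Each run is its own transaction, so memory use is bounded by the batch size rather than by all 1.4 million nodes.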
I have created a large graph in Neo4j and have an empty node that is connected via 11 million relationships in the graph that I need to remove. I know that if I just delete the node I will leave behind all the hanging relationships, but I have been unsuccessful in my attempts to remove them. I have tried the following Cypher commands, but they hang and fail to complete:
MATCH (n:Label {uid: ''}) DETACH DELETE n;
and
MATCH (n:Label {uid: ''})-[r]-() DELETE r;
I'm working under the assumption that there are not enough resources to load the 11 million relationship subgraph in memory in order to detach and delete the node. Is there any way to loop over the relationships in order to lower the required system resources?
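1) You can use the apoc.periodic.commit procedure from the APOC library (on Neo4j 4.0 and later, write the parameters as $uid and $limit, because the {param} syntax was removed):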
call apoc.periodic.commit(
'MATCH (n:Label {uid: {uid}})-[r]->()
WITH r LIMIT {limit}
DELETE r
RETURN COUNT(r)', {
limit: 1000,
uid: ...
})
2) You can delete the node and then create it again with the apoc.create.node procedure:
MATCH (n:Label {uid: 2})
WITH n, {labels: labels(n), properties: properties(n)} AS data
DETACH DELETE n
WITH data
CALL apoc.create.node(data.labels, data.properties) yield node AS newNode
RETURN newNode
You could delete the relationships in batches and then delete the node:
MATCH (n:Label {uid: ''})-[r]-()
WITH r
LIMIT 1000
DELETE r;
If you run that repeatedly, you will delete the relationships in small batches. Play with the limit to see what your running system will tolerate resource-wise.
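A minimal sketch of the full sequence, assuming the same Label and uid as in the question: returning a count from each batch tells you when to stop, and the node itself is deleted only after no relationships remain. The batch size of 1000 is just a starting point.

```cypher
// Step 1: run repeatedly until `deleted` is 0.
MATCH (n:Label {uid: ''})-[r]-()
WITH r LIMIT 1000
DELETE r
RETURN count(*) AS deleted;

// Step 2: once no relationships remain, remove the node itself.
MATCH (n:Label {uid: ''})
DELETE n;
```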
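If you are on Neo4j 4.4 or above (the version that introduced CALL { ... } IN TRANSACTIONS), then this is the best way. In Neo4j Browser, prefix the query with :auto so that it runs in an implicit transaction: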
MATCH (n:Label {uid: ''})-[r]-()
CALL {
WITH r
DELETE r
} IN TRANSACTIONS OF 1000 ROWS;
I'm starting out with Neo4j and graphs, and I'm trying to get the following:
I have to find the difference between the number of users (each user is a node) and the number of distinct names they have. I have 16 nodes, each with its own name (Name is one of its properties), but some of them share the same name. For example, node A has (Name:Amanda, City:Roma) and node B has (Name:Amanda, City:Paris), so the count of names will be lower than the count of nodes because some names are repeated.
I have tried this:
match (n) with n, count(n) as c return sum(c)
That gives me the number of nodes. And then I tried this
match (n) with n, count(n) as nodeC with n, count( distinct n.Name) as
nameC return sum(nodeC) as sumN, sum(nameC) as sumC, sumN-sumC
But it doesn't work (I'm not even sure I'm getting the names correctly, because when I try that part separately, it doesn't work either).
I think this is what you are looking for (property names are case-sensitive, so use n.Name as in your data):
MATCH (n)
RETURN COUNT(n) - COUNT(DISTINCT n.Name) AS diff;
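To see how the two aggregates behave, here is a small self-contained sketch that does not touch the graph at all. The four names are hypothetical sample data, not the 16 nodes from the question.

```cypher
// Hypothetical sample data: four users, three distinct names.
UNWIND ['Amanda', 'Amanda', 'Bruno', 'Chiara'] AS name
RETURN count(name) AS users,
       count(DISTINCT name) AS names,
       count(name) - count(DISTINCT name) AS diff;
// users = 4, names = 3, diff = 1
```

Both aggregates are evaluated over the same set of rows in a single RETURN, so no intermediate WITH is needed.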
Recently I ran into a problem while creating a chain of nodes by running the following query in a loop:
MATCH (p: Node) WHERE NOT (p)-[:RELATIONSHIP]->()
WITH p LIMIT 1000
MATCH (q: Node{id: p.id}) WITH p, max(id(q)) as tail
MATCH (t: Node) where id(t) = tail
WITH p, t
CREATE (p)-[:RELATIONSHIP]->(t)
The problem appears after creating a chain from the first ~1 000 000 nodes. The query
MATCH (p: Node) WHERE NOT (p)-[:RELATIONSHIP]->()
becomes very slow because it scans the first 1 000 000 nodes and checks whether each one lacks a relationship, but they all have one. At some number of nodes, the query fails with "Unknown error". To get around this, I tried the following queries.
MATCH (p: Node) with p skip 1000000
Match (p) WHERE NOT (p)-[:RELATIONSHIP]->()
or
MATCH (p: Node) with p order by id(p) desc
MATCH (p) WHERE NOT (p)-[:RELATIONSHIP]->()
But I wonder if there is a more elegant way to solve this problem, such as "indexing relationship existence"?
You can index relationship properties using "legacy indexing," which isn't exactly recommended anymore, but this won't index the absence of relationships so it wouldn't do you any good. I'd probably try to find a way to mark nodes in need of relationships through either a label or an index on a property. Start your match from there, it'll be much faster.
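The label approach could look roughly like the sketch below. The label name NeedsLink is hypothetical, and the loop body simply reuses the matching logic from the question; ideally the label would be set when the nodes are first created, so the one-off tagging scan is not needed.

```cypher
// One-off setup: tag every node that still lacks an outgoing relationship.
// On a very large graph this statement itself may need to run in batches.
MATCH (p:Node) WHERE NOT (p)-[:RELATIONSHIP]->()
SET p:NeedsLink;

// Loop body: start from the tagged nodes only, then untag them.
MATCH (p:NeedsLink)
WITH p LIMIT 1000
MATCH (q:Node {id: p.id})
WITH p, max(id(q)) AS tail
MATCH (t:Node) WHERE id(t) = tail
CREATE (p)-[:RELATIONSHIP]->(t)
REMOVE p:NeedsLink;
```

Because a label scan only visits nodes that still carry NeedsLink, each iteration no longer has to walk past the nodes that are already linked.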
I'm playing with 2.0 M6 neo4j server (oracle jdk7 on win7 64).
I'm trying to delete a node and its relationships using a single cypher query over the REST API.
The query I create (which works if I run it in the browser UI) looks like:
START n = node( 1916 ) MATCH n-[r]-() DELETE n, r
Which by the time I put it through gson comes out as:
{"query":"START n \u003d node( 1916 ) MATCH n-[r]-() DELETE n, r"}
Which when sent to the server gets the response:
{
"columns" : [ ],
"data" : [ ]
}
My test fails because the node can still be found in neo4j server by its id...
If I simplify my query to just delete a node (that has no relationships), so it's:
START n = node( 1920 ) DELETE n
Which becomes
{"query":"START n \u003d node( 1920 ) DELETE n"}
Then the node is deleted.
Have I missed something?
Thanks, Andy
For neo4j 2.0 you would do
START n=node(1916)
OPTIONAL MATCH n-[r]-()
DELETE r, n;
MATCH n-[r]-() will only match the node if there is at least one relationship attached to it.
You want to make the relationship match optional: MATCH n-[r?]-()
Also, you need to delete the relationships before the node.
So, your full query is:
START n=node(1916)
MATCH n-[r?]-()
DELETE r, n
Both START and the [r?] syntax are being phased out. It's also usually not advised to directly use internal ids. Try something like:
match (n{some_field:"some_val"})
optional match (n)-[r]-()
delete n,r
(see http://docs.neo4j.org/refcard/2.1/)
The question mark (?) is not supported in Neo4j 2.0.3, so the answer is to use OPTIONAL MATCH:
START n=node(nodeid)
OPTIONAL MATCH n-[r]-()
DELETE r, n;
And again there is a pretty syntax change. Neo4j 2.3 introduces the following:
MATCH (n {id: 1916})
DETACH DELETE n
Detach automatically deletes all in- and outgoing relationships.
Based on the latest documentation, and I have also tested it:
START n=node(1578)
MATCH (n)-[r]-()
DELETE n,r
We have to put () around n, and there is no need for the ? in [r?].
It even works without OPTIONAL, as long as the node has at least one relationship.
Using Cypher how can I get all nodes in a graph? I am running some testing against the graph and I have some nodes without relationships so am having trouble crafting a query.
The reason I want to get them all is that I want to delete all the nodes in the graph at the start of every test.
So, this gives you all nodes:
MATCH (n)
RETURN n;
If you want to delete everything from a graph, you can do something like this:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r;
Updated for 2.0+
Edit:
Now in 2.3 they have DETACH DELETE, so you can do something like:
MATCH (n)
DETACH DELETE n;
Would this work for you?
START a=node:index_name('*:*')
Assuming you have an index with these orphaned nodes in them.
This just works fine in 2.0:
MATCH n RETURN n
If you need to delete a large number of objects from the graph, be careful not to build up a single transaction so large that it triggers a Java out-of-heap-memory error.
If your nodes have more than 100 relationships each ((100+1)*10k = 1010k deletes per batch of 10k nodes), reduce the batch size or see the recommendations at the bottom.
With 4.4 and newer versions you can use the CALL {} IN TRANSACTIONS syntax:
MATCH (n:Foo) where n.foo='bar'
CALL { WITH n
DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS;
With 3.x and later, using APOC:
call apoc.periodic.iterate("MATCH (n:Foo) where n.foo='bar' return id(n) as id", "MATCH (n) WHERE id(n) = id DETACH DELETE n", {batchSize:10000})
yield batches, total return batches, total
For best practices around deleting large amounts of data in Neo4j, follow these guidelines.