I'm using this Cypher query to create relationships between two sets of nodes in Neo4j:
MATCH (first:FIRSTNODE)
with first
MATCH (second:SECONDNODE)
WHERE first.ID = second.ID
CREATE (first)-[:RELATION]->(second)
FIRSTNODE has 100,000 nodes and SECONDNODE has 1,100,000 nodes.
I imported the CSV files and then created an index on each of the two labels, but when I run the query that creates the relationships, Neo4j gets stuck and stops responding.
I noticed that CPU usage goes to 100% when this happens.
I'm working with an 8x4.0 GHz CPU, 10 GB of RAM, and an SSD.
Do you know of anything that could help me resolve this problem?
EDIT 1:
Using apoc.periodic.commit, it works. But if I then run a second query like this:
call apoc.periodic.commit("
MATCH (third:THIRDNODE)
WHERE NOT (third)-[:RELATION2]->()
WITH third LIMIT {limit}
MATCH (second:SECONDNODE)
WHERE third.ID = second.ID2
CREATE (third)-[:RELATION2]->(second)
RETURN count(*)
", {limit:10000})
it gets stuck again.
You can try using apoc.periodic.commit from APOC Procedures. The docs for this procedure say:
apoc.periodic.commit(statement,params) - runs the given statement in
separate transactions until it returns 0
Install APOC Procedures and try it:
call apoc.periodic.commit("
MATCH (first:FIRSTNODE)
WHERE NOT (first)-[:RELATION]->()
WITH first LIMIT {limit}
MATCH (second:SECONDNODE)
WHERE first.ID = second.ID
CREATE (first)-[:RELATION]->(second)
RETURN count(*)
", {limit:10000})
Remember to install the APOC Procedures release that matches the version of Neo4j you are using. Take a look at the version compatibility matrix.
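To make the "until it returns 0" behaviour concrete, here is a minimal Python sketch of the loop that apoc.periodic.commit is described as running. It is a simulation only, not APOC's implementation: `periodic_commit` and `run_statement` are hypothetical names, and the in-memory lists stand in for nodes that still lack a relationship.

```python
from typing import Callable


def periodic_commit(run_statement: Callable[[int], int], limit: int) -> int:
    """Call run_statement(limit) repeatedly until it reports 0 rows.

    Each call stands in for one separately committed transaction.
    Returns the number of calls that reported at least one row.
    """
    batches = 0
    while True:
        processed = run_statement(limit)  # plays the role of RETURN count(*)
        if processed == 0:
            return batches
        batches += 1


# Toy stand-in for the graph: IDs of nodes that still need a relationship.
pending = list(range(25))
linked = []


def link_batch(limit: int) -> int:
    """Link at most `limit` pending nodes and report how many were linked."""
    batch = pending[:limit]
    del pending[:limit]
    linked.extend(batch)
    return len(batch)


batches_run = periodic_commit(link_batch, limit=10)
print(batches_run, len(linked))  # prints: 3 25
```

The WHERE NOT (first)-[:RELATION]->() filter in the statement plays the role of `pending` here: it is what makes each run pick up nodes that have not been handled yet, so that the count eventually reaches 0.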
I have a neo4j database with ~260000 (EDIT: Incorrect by order of magnitude previously, missing 0) nodes of genes, something along the lines of:
example_nodes: sourceId, targetId
with an index on both sourceId and targetId
I am trying to build the relationships between all the nodes but am constantly running into OOM issues. I've increased my JVM heap size to -Xmx4096m and dbms.memory.pagecache.size=16g on a system with 16G of RAM.
I am assuming I need to optimize my query because it simply cannot complete in any of its current forms. However, I have tried the following three to no avail:
MATCH (start:example_nodes),(end:example_nodes) WHERE start.targetId = end.sourceId CREATE (start)-[r:CONNECT]->(end) RETURN r
(On a subset of 5000 nodes, the query above completes in only a matter of seconds. It does of course warn: This query builds a cartesian product between disconnected patterns.)
MATCH (start:example_nodes) WITH start MATCH (end:example_nodes) WHERE start.targetId = end.sourceId CREATE (start)-[r:CONNECT]->(end) RETURN r
OPTIONAL MATCH (start:example_nodes) WITH start MATCH (end:example_nodes) WHERE start.targetId = end.sourceId CREATE (start)-[r:CONNECT]->(end) RETURN r
Any ideas how this query could be optimized to succeed would be much appreciated.
--
Edit
In a lot of ways I feel that while the apoc library does indeed solve the memory issues, the function could be optimized if it were to run along the lines of this incredibly simple pseudocode:
for each start_gene
create relationship to end_gene where start_gene.targetId = end_gene.sourceId
move on to next once relationship has been created
But I am unsure how to achieve this in cypher.
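For comparison, the pseudocode above corresponds to a lookup join: index the end nodes by sourceId once, then probe that index for each start node. The Python sketch below shows this on plain dictionaries. It is an illustration of the idea rather than anything Neo4j runs, and `connect_by_lookup` and the sample records are made up for the example.

```python
from collections import defaultdict


def connect_by_lookup(nodes):
    """Return (start, end) name pairs where start.targetId == end.sourceId.

    Builds one index over sourceId, then does a single lookup per start
    node instead of comparing every pair of nodes.
    """
    by_source = defaultdict(list)
    for node in nodes:
        by_source[node["sourceId"]].append(node["name"])

    pairs = []
    for start in nodes:
        for end_name in by_source.get(start["targetId"], []):
            pairs.append((start["name"], end_name))
    return pairs


# Hypothetical sample data shaped like the example_nodes in the question.
genes = [
    {"name": "a", "sourceId": 1, "targetId": 2},
    {"name": "b", "sourceId": 2, "targetId": 3},
    {"name": "c", "sourceId": 3, "targetId": 1},
    {"name": "d", "sourceId": 2, "targetId": 9},
]

pairs = connect_by_lookup(genes)
print(pairs)
```

With n nodes this does n lookups instead of n × n comparisons; at roughly 260,000 nodes that is the difference between about 260,000 probes and about 67.6 billion pair checks.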
You can use apoc library for batching.
call apoc.periodic.commit("
MATCH (start:example_nodes),(end:example_nodes) WHERE not (start)-[:CONNECT]->(end) and id(start) > id(end) AND start.targetId = end.sourceId
with start,end limit {limit}
CREATE (start)-[:CONNECT]->(end)
RETURN count(*)
",{limit:5000})
What is the best way to clean up the graph by removing all nodes and relationships via Cypher?
At http://neo4j.com/docs/stable/query-delete.html#delete-delete-a-node-and-connected-relationships the example
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r
has the note:
This query isn’t for deleting large amounts of data
So, is the following better?
MATCH ()-[r]-() DELETE r
and
MATCH (n) DELETE n
Or is there another way that is better for large graphs?
As you've mentioned, the easiest way is to stop Neo4j, drop the data/graph.db folder, and restart it.
Deleting a large graph via Cypher will always be slower, but it is still doable if you use a proper transaction size to prevent memory issues (remember that transactions are built up in memory before they get committed). Typically 50-100k atomic operations per transaction is a good idea. You can add a limit to your deletion statement to control transaction sizes and report back how many nodes have been deleted. Rerun this statement until a value of 0 is returned:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
WITH n,r LIMIT 50000
DELETE n,r
RETURN count(n) as deletedNodesCount
According to the official documentation:
MATCH (n)
DETACH DELETE n
However, it also notes that this query isn't for deleting large amounts of data, so it is better to use it with a limit:
match (n)
with n limit 10000
DETACH DELETE n;
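The limited delete needs to be rerun until nothing is left. A rough Python sketch of that loop follows, assuming the official neo4j Python driver's `session.run(...)` and `result.single()`. The function name `delete_all_in_batches`, the `deleted` alias, and the default batch size are choices made for this example, and the added `RETURN count(*)` is not part of the query above.

```python
DELETE_BATCH = (
    "MATCH (n) "
    "WITH n LIMIT $limit "
    "DETACH DELETE n "
    "RETURN count(*) AS deleted"
)


def delete_all_in_batches(session, batch_size=10000):
    """Run the batched DETACH DELETE until a batch deletes nothing.

    `session` is expected to behave like a neo4j driver session, where
    each session.run(...) call is committed as its own transaction.
    Returns the total number of nodes reported as deleted.
    """
    total = 0
    while True:
        record = session.run(DELETE_BATCH, limit=batch_size).single()
        deleted = record["deleted"]
        if deleted == 0:
            return total
        total += deleted


# Usage sketch (connection details are placeholders):
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver(uri, auth=(user, password)) as driver:
#       with driver.session() as session:
#           print(delete_all_in_batches(session))
```

Keeping the batch size as a parameter makes it easy to lower it when nodes have many relationships.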
Wrote this little script, added it in my NEO/bin folder.
Tested on v3.0.6 community
#!/bin/sh
echo Stopping neo4j
./neo4j stop
echo Erasing ALL data
rm -rf ../data/databases/graph.db
./neo4j start
echo Done
I use it when my LOAD CSV imports are crappy.
Hope it helps
What is the best way to clean up the graph from all nodes and relationships via Cypher?
I've outlined four options below that are current as of July 2022:
Option 1: MATCH (x) DETACH DELETE x
Option 2: CALL {} IN TRANSACTIONS
Option 3: delete data directories
Option 4: delete in code
Option 1: MATCH (x) DETACH DELETE x - works only with small data sets
As you posted in your question, the following works fine, but only if there aren't too many nodes and relationships:
MATCH (x) DETACH DELETE x
If the number of nodes and/or relationships is high enough, this won't work. Here's what "not working" looks like against http://localhost:7474/browser/:
There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.
And here's what shows up in the neo4j console output (or in the logs, if you have that enabled):
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.Scheduler-1"
Option 2: CALL {} IN TRANSACTIONS - does not work as of July 2022
An alternative, available since 4.4 according to neo4j docs, is to use a new CALL {} IN TRANSACTIONS feature:
With 4.4 and newer versions you can utilize the CALL {} IN TRANSACTIONS syntax [...] to delete subsets of the matched records in batches until the full delete is complete
Unfortunately, this doesn't work in my tests. Here's an example attempting to delete relationships only:
MATCH ()-[r]-()
CALL { WITH r DELETE r }
IN TRANSACTIONS OF 1000 ROWS
Running that in browser results in this error:
A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.
In code, it produces the same result. Here's an attempt connecting via bolt in Java:
session.executeWrite(tx -> tx.run("MATCH (x) " +
"CALL { WITH x DETACH DELETE x } " +
"IN TRANSACTIONS OF 10000 ROWS"));
which results in this error, identical to what the browser showed:
org.neo4j.driver.exceptions.DatabaseException: A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.
at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:111)
at org.neo4j.driver.internal.InternalTransaction.run(InternalTransaction.java:58)
at org.neo4j.driver.internal.AbstractQueryRunner.run(AbstractQueryRunner.java:34)
Looking at the documentation for Transactions, it states: "Transactions can be either explicit or implicit." What's the difference? From that same doc:
Explicit transactions:
Are opened by the user.
Can execute multiple Cypher queries in sequence.
Are committed, or rolled back, by the user.
Implicit transactions, sometimes called auto-commit transactions or :auto transactions:
Are opened automatically.
Can execute a single Cypher query.
Are committed automatically when the query finishes successfully.
I can't determine from docs or experimentation how to open an implicit transaction (and thus, to be able to use 'CALL { ... } IN TRANSACTIONS' structure), so this is apparently a dead end.
In a recent Neo4j AuraDB Office Hours posted May 31, 2022, they tried using this same feature in AuraDB. It didn't work for them either, though the behavior was different from what I've observed in Neo4j Community. I'm guessing they'll address this at some point, since it feels like a bug, but at least for now it's another confirmation that 'CALL { ... } IN TRANSACTIONS' is not the way forward.
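One avenue that may be worth checking before giving up on this option: the Neo4j documentation describes the `:auto` prefix in Browser and the drivers' `session.run(...)` as ways to run a query in an implicit (auto-commit) transaction, whereas `executeWrite` opens an explicit, managed one. The sketch below shows the `session.run` route in Python rather than Java. `delete_in_transactions` is a hypothetical helper, and this is a sketch under those assumptions, not a confirmed fix for the behaviour reported above.

```python
def delete_in_transactions(session, rows=10000):
    """Submit a CALL { ... } IN TRANSACTIONS delete via session.run.

    session.run(...) is the auto-commit entry point of the neo4j driver,
    so the query is not wrapped in an explicit transaction.
    """
    rows = int(rows)  # interpolated into the query text, so force an integer
    query = (
        "MATCH (x) "
        "CALL { WITH x DETACH DELETE x } "
        f"IN TRANSACTIONS OF {rows} ROWS"
    )
    # consume() waits for the query to finish and returns its summary.
    return session.run(query).consume()
```

In Neo4j Browser, the corresponding form would be the same query prefixed with `:auto`.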
Option 3: delete data directories - works with any size data set
This is the easiest, most straightforward mechanism that actually works:
stop the server
manually delete data directories
restart the server
Here's what that looks like:
% ./bin/neo4j stop
% rm -rf data/databases data/transactions
% ./bin/neo4j start
This is pretty simple. You could write a script to capture this as a single command.
Option 4: delete in code - works with any size data set
Below is a minimal Java program that handles deletion of all nodes and relationships, regardless of how many.
The manual-delete option works fine, but I needed a way to delete all nodes and relationships in code.
This works in Neo4j Community 4.4.3, and since I'm using only basic functionality (no extensions), I assume this would work across a range of other Neo4j versions, and probably AuraDB, too.
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;

public class DeleteAll {
    public static void main(String[] args) {
        String boltUri = "...";
        String user = "...";
        String password = "...";
        // try-with-resources closes the session and the driver when done
        try (Driver driver = GraphDatabase.driver(boltUri, AuthTokens.basic(user, password));
             Session session = driver.session()) {
            long count = 1;
            while (count > 0) {
                // delete one batch of up to 1000 nodes together with their relationships
                session.executeWrite(tx -> tx.run("MATCH (x) WITH x LIMIT 1000 DETACH DELETE x").consume());
                // count what is left; the loop ends once nothing remains
                count = session.executeRead(tx -> tx.run("MATCH (x) RETURN count(x)").single().get(0).asLong());
            }
        }
    }
}
optional match (n)-[p:owner_real_estate_relation]->() with n,p LIMIT 1000 delete p
In test run, deleted 50000 relationships, completed after 589 ms.
I performed several tests and the best combination was
call apoc.periodic.iterate("MATCH p=()-[r]->() RETURN r,p LIMIT 5000000;","DELETE r;", {batchSize:10000, parallel: true})
(this code deleted 300,000,000 relationships in 3251s)
It is worth noting that using the "parallel" parameter drastically reduces the time.
This was on Neo4j 4.4.1, with the following setup:
AWS EC2: m5.xlarge
neo4j:
resources:
memory: 29000Mi
configs:
dbms.memory.heap.initial_size: "20G"
dbms.memory.heap.max_size: "20G"
dbms.memory.pagecache.size: "5G"
I'm a beginner with Neo4j and I'm evaluating Neo4j version 2.0.0 RC1 Community Edition.
I tried to delete one node out of one million nodes using the browser interface (i.e. host:7474/browser/).
Even though a MATCH query without a DELETE clause works fine, a MATCH query with DELETE returns "Unknown error".
The following query works fine and responds quickly:
match (u:User{uid:'3282'}) return u
The delete query returns "Unknown error":
match (u:User{uid:'3282'}) delete u return u
The User label contains one million nodes, so I guessed the unknown error was caused by slow performance.
Also, a query that sets a property returns the same unknown error.
Is this Neo4j's usual write performance? Is there a way to resolve the problem?
Thanks
I believe the issue is that you're trying to return the node that you just deleted. You can delete without the return, which should work fine:
match (u:User{uid:'3282'}) delete u;
Using Cypher how can I get all nodes in a graph? I am running some testing against the graph and I have some nodes without relationships so am having trouble crafting a query.
The reason I want to get them all is that I want to delete all the nodes in the graph at the start of every test.
So, this gives you all nodes:
MATCH (n)
RETURN n;
If you want to delete everything from a graph, you can do something like this:
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n, r;
Updated for 2.0+
Edit:
Now in 2.3 they have DETACH DELETE, so you can do something like:
MATCH (n)
DETACH DELETE n;
Would this work for you?
START a=node:index_name('*:*')
This assumes you have an index that contains these orphaned nodes.
This just works fine in 2.0:
MATCH n RETURN n
If you need to delete a large number of objects from the graph, be careful not to build up a single transaction so large that the JVM runs out of heap.
If your nodes have more than 100 relationships per node ((100+1)*10k=>1010k deletes) reduce the batch size or see the recommendations at the bottom.
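The arithmetic behind that warning can be sketched as follows: deleting a node also deletes each of its relationships, so one batch performs roughly batch size × (relationships per node + 1) deletes. The helper names and the 100,000-operation budget below are illustrative assumptions, not Neo4j settings.

```python
def deletes_per_batch(batch_size, rels_per_node):
    """Approximate deletes in one batch: each node plus its relationships."""
    return batch_size * (rels_per_node + 1)


def batch_size_for_budget(budget, rels_per_node):
    """Largest batch size whose deletes stay within `budget` (at least 1)."""
    return max(1, budget // (rels_per_node + 1))


print(deletes_per_batch(10_000, 100))       # 1010000, the "1010k" above
print(batch_size_for_budget(100_000, 100))  # 990
```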
With 4.4 and newer versions you can utilize the CALL {} IN TRANSACTIONS syntax.
MATCH (n:Foo) where n.foo='bar'
CALL { WITH n
DETACH DELETE n
} IN TRANSACTIONS OF 10000 ROWS;
With 3.x forward and using APOC
call apoc.periodic.iterate("MATCH (n:Foo) where n.foo='bar' return id(n) as id", "MATCH (n) WHERE id(n) = id DETACH DELETE n", {batchSize:10000})
yield batches, total return batches, total
For best practices around deleting huge data in neo4j, follow these guidelines.