I am running Neo4j 3.0.4 Enterprise in HA and am encountering an extremely mysterious issue. On running a very basic Cypher query from our Spring Data Neo4j application (the following is taken from the Neo4j DB query.log; the trailing "- {key: 123456}" is the parameter map that query.log appends, not part of the query itself):
MATCH (n:`Node`)
WHERE n.`key` = { `key` }
WITH n
MATCH p=(n)-[*0..1]-(m)
RETURN p, ID(n) - {key: 123456}
I get an exception related to the page cache:
java.lang.NullPointerException
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCursor.assertPagedFileStillMappedAndGetIdOfLastPage(MuninnPageCursor.java:369)
at org.neo4j.io.pagecache.impl.muninn.MuninnReadPageCursor.next(MuninnReadPageCursor.java:55)
at org.neo4j.io.pagecache.impl.muninn.MuninnPageCursor.next(MuninnPageCursor.java:121)
at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:1039)
at org.neo4j.kernel.impl.store.CommonAbstractStore.access$000(CommonAbstractStore.java:64)
at org.neo4j.kernel.impl.store.CommonAbstractStore$1.next(CommonAbstractStore.java:1179)
at org.neo4j.kernel.impl.api.store.StoreSingleNodeCursor.next(StoreSingleNodeCursor.java:64)
at org.neo4j.kernel.impl.api.StateHandlingStatementOperations.nodeCursorById(StateHandlingStatementOperations.java:137)
at org.neo4j.kernel.impl.api.ConstraintEnforcingEntityOperations.nodeCursorById(ConstraintEnforcingEntityOperations.java:422)
at org.neo4j.kernel.impl.api.OperationsFacade.nodeGetProperty(OperationsFacade.java:333)
at org.neo4j.cypher.internal.spi.v3_0.TransactionBoundQueryContext$NodeOperations.getProperty(TransactionBoundQueryContext.scala:316)
From here on out, the same query fails intermittently and is logged as a one-liner:
java.lang.NullPointerException
On the app server side, we get the following generic error:
org.neo4j.ogm.exception.CypherException: Error executing Cypher "Neo.DatabaseError.Statement.ExecutionFailed"
Have any Neo4j experts seen this issue before?
Moving the master in my cluster (i.e., restarting) seems to fix it, but the problem resurfaces whenever any significant load is placed on the server. The page cache should be fairly large: I have an 8 GB database and have left both the heap size and the page cache size unset, so they take their default values.
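For reference, one way to take the defaults out of the equation is to size the page cache explicitly in neo4j.conf. A minimal sketch; the 10g value is an assumption chosen to comfortably hold the 8 GB store, not a figure from this thread:

# Pin the page cache explicitly instead of relying on the RAM-derived default
dbms.memory.pagecache.size=10g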
The logs show a large number of concurrent queries right before this exception (all within the same second, and quite a few querying for exactly the same thing). Could there be some kind of race condition in how the page cache works? What is a realistic limit on concurrent reads?
Any advice at all is greatly appreciated!
I am using Neo4j community 4.2.1, playing with graph databases. I plan to operate on lots of data and want to get familiar with indexes and stuff.
However, I'm stuck at a very basic level because in the browser Neo4j reports query runtimes which have nothing to do with reality.
I'm executing the following query in the browser at http://localhost:7687/:
match (m:Method), (o:Method)
where m.name = o.name and m.name <> '<init>'
  and m.signature = o.signature and toInteger(o.access) % 8 in [1,4]
return m, o
The DB has ~5000 nodes with the Method label.
The browser returns data after about 30 seconds. However, Neo4j reports:
Started streaming 93636 records after 1 ms and completed after 42 ms, displaying first 1000 rows.
Well, 42 ms and 30 seconds are nowhere near each other! What am I supposed to make of this message? Did the query take only milliseconds, and the remaining 30 seconds were spent rendering the results in the browser? What is going on here? How can I improve my query if I cannot even tell how long it really ran?
I modified the query to return count(m) + count(o) instead of m, o, which changed things: now the runtime is about 2 seconds and Neo4j reports roughly the same figure.
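For reference, a sketch of that modified query, assuming the same predicates as above; the aggregation forces the server to consume the whole result instead of streaming it lazily to the browser, so the reported time reflects the actual work:

match (m:Method), (o:Method)
where m.name = o.name and m.name <> '<init>'
  and m.signature = o.signature and toInteger(o.access) % 8 in [1,4]
// count() forces full result consumption on the server side
return count(m) + count(o)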
Can somebody tell me how to get realistic runtime figures for my queries without using the stopwatch on my phone?
Working on a pretty small graph of 5000 nodes with low density (mean connectivity < 5), I get the following error, which I never got before upgrading to Neo4j 3.3.0. The graph contains 900 molecules and their scaffold hierarchy, down to 5 levels:
(:Molecule)<-[:substructureOf*1..5]-(:Scaffold)
Neo.TransientError.General.StackOverFlowError
There is not enough stack size to perform the current task. This is generally considered to be a database error, so please contact Neo4j support. You could try increasing the stack size: for example to set the stack size to 2M, add `dbms.jvm.additional=-Xss2M' to in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation just add -Xss2M as command line flag.
The query is actually very simple; I use DISTINCT because several paths may lead to a single scaffold.
match (m:Molecule) <-[:substructureOf*3]- (s:Scaffold) return distinct s limit 20
This query returns the above error message, whereas the next query does work.
match (m:Molecule) <-[:substructureOf*3]- (s:Scaffold) return s limit 20
Interestingly, the query works on a much larger database; in this small one the deepest hierarchy happens to be 2, so the result of the last query is "(no changes, no records)".
How come adding DISTINCT to the query fails with that memory error? Is there a way to avoid it? I cannot guess the depth of the hierarchy, which can be different for each molecule.
I tried the following heap settings, as suggested in other posts.
#dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=512m
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=4096m
dbms.memory.heap.initial_size=4096m
dbms.memory.heap.max_size=4096m
None of these addressed the issue.
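It may be worth noting that the error message above points at the JVM thread stack, which the heap settings do not control, so heap changes alone would not be expected to help. A minimal sketch of the line the message suggests adding to conf/neo4j.conf:

# Thread stack size (separate from dbms.memory.heap.*), as suggested by the error text
dbms.jvm.additional=-Xss2M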
Thanks in advance for any help or clues.
Thanks for the additional info. I was able to replicate this on Neo4j 3.3.0 and 3.3.1. This likely has to do with the pruning-var-expand operation (meant to help when using variable-length expansions with distinct results) that was introduced in 3.2.x, and it occurs only when using an exact number of expansions (not a range). Neo4j engineering will be looking into this.
In the meantime, your requirement is such that we can use a different kind of query to get the results you want while avoiding this operation. Give this one a try:
match (s:Scaffold)
where (s)-[:substructureOf*3]->(:Molecule)
return distinct s limit 20
And if you do need to run queries that may produce this error, you may be able to circumvent it by prepending your query with CYPHER 3.1, which will execute it with a plan produced by an older version of Cypher that doesn't use the pruning var expand operation.
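For example, a sketch of the original failing query with that prefix:

CYPHER 3.1
match (m:Molecule) <-[:substructureOf*3]- (s:Scaffold) return distinct s limit 20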
We're loading data into a Neo4j server; the data mainly represents (almost) k-ary trees, with k between 2 and 10 in most cases. We have about 50 possible node types, and about the same number of relationship types.
The server is online and data can be loaded from several instances (so, unfortunately, we can't use neo4j-import).
We experience very slow loading: about 100,000 nodes and relationships take about 6 minutes to load on a good machine. Sometimes loading the same data takes 40 minutes! Looking at the Neo4j process, it is sometimes doing nothing...
In those cases, we get messages like:
WARN [o.n.k.g.TimeoutGuard] Transaction timeout. (Overtime: 1481 ms).
Apart from that, we have no problems with queries, which execute quickly despite very complex structures.
We load data as follows. A Cypher file is loaded like this:
neo4j-shell -host localhost -v -port 1337 -file myGraph.cypher
The Cypher file contains several sections:
Constraint creation:
CREATE CONSTRAINT ON (p:MyNodeType) ASSERT p.uid IS UNIQUE;
Indexes on a very small set of node types (10 at most):
We select these carefully to avoid counterproductive performance behaviour.
CREATE INDEX ON :MyNodeType1(uid);
Node creation:
USING PERIODIC COMMIT 4000
LOAD CSV WITH HEADERS FROM "file:////tmp/my.csv" AS csvLine
CREATE (p:MyNodeType1 {Prop1: csvLine.prop1, mySupUUID: toInt(csvLine.uidFonctionEnglobante),
  lineNum: toInt(csvLine.lineNum), uid: toInt(csvLine.uid), name: csvLine.name,
  projectID: csvLine.projectID, vValue: csvLine.vValue});
Relationship creation:
LOAD CSV WITH HEADERS FROM "file:////tmp/RelsInfixExpression-vLeftOperand-SimpleName_javaouille-normal-b11695.csv" AS csvLine
MATCH (n1:MyNodeType1) WHERE n1.uid = toInt(csvLine.uidFather)
WITH n1, csvLine
MATCH (n2:MyNodeType2) WHERE n2.uid = toInt(csvLine.uidSon)
MERGE (n1)-[:vOperandLink]-(n2);
Question 1
We sometimes experienced OOMs in the Neo4j server while loading data, difficult to reproduce even with the same data. But since we recently added USING PERIODIC COMMIT 1000 to the relationship-loading commands, we have not reproduced this problem. Could this be the solution to the OOM problem?
Question 2
Is the periodic commit batch size a good value?
Is there another way to speed up data loading, i.e. another strategy for writing the data-loading script?
Question 3
Are there ways to prevent timeouts? Another way to write the data-loading script, or maybe some JVM tuning?
Question 4
Some months ago we split the Cypher script into 2 or 3 parts to run them concurrently, but we stopped doing that because the server frequently messed up the data and became unusable. Is there a way to split the script "cleanly" and run the parts concurrently?
Question 1: Yes, USING PERIODIC COMMIT is the first thing to try when LOAD CSV causes OOM errors.
Question 2&3: The "sweet spot" for periodic commit batch size depends on your Cypher query, your data characteristics, and how your neo4j server is configured (all of which can change over time). You do not want the batch size to be too high (to avoid occasional OOMs), nor too low (to avoid slowing down the import). And you should tune the server's memory configuration as well. But you will have to do your own experimentation to discover the best batch size and server configuration, and adjust them as needed.
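As a starting point for that memory tuning, here is a minimal neo4j.conf sketch, assuming a 3.x server; the sizes are placeholder assumptions to be adjusted for your machine and store, not values from this thread:

# Heap: query execution and transaction state (large commits live here)
dbms.memory.heap.initial_size=4g
dbms.memory.heap.max_size=4g
# Page cache: keeps the store files in memory; size it near the store size if RAM allows
dbms.memory.pagecache.size=4g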
Question 4: Concurrent write operations that touch the same nodes and/or relationships must be avoided, as they can cause errors (like deadlocks and constraint violations). If you can split up your operations so that they act on completely disjoint subgraphs, then they should be able to run concurrently without these kinds of errors.
Also, you should PROFILE your queries to see how the server will actually execute them. For example, even if both :MyNodeType1(uid) and :MyNodeType2(uid) are indexed (or have uniqueness constraints), that does not mean that the Cypher planner will automatically use those indexes when it executes your last query. If the profile of that query shows that it is not using the indexes, you can add hints to the query to make the planner (more likely to) use them:
LOAD CSV WITH HEADERS FROM "file:////tmp/RelsInfixExpression-vLeftOperand-SimpleName_javaouille-normal-b11695.csv" AS csvLine
MATCH (n1:MyNodeType1) USING INDEX n1:MyNodeType1(uid)
WHERE n1.uid = TOINT(csvLine.uidFather)
MATCH (n2:MyNodeType2) USING INDEX n2:MyNodeType2(uid)
WHERE n2.uid = TOINT(csvLine.uidSon)
MERGE (n1)-[:vOperandLink]-(n2);
In addition, if it is OK to store the uid values as strings, you can remove the uses of TOINT(). This will speed things up to some extent.
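A sketch of that variant, assuming the uid values are then stored (and indexed) as strings on both node types:

LOAD CSV WITH HEADERS FROM "file:////tmp/RelsInfixExpression-vLeftOperand-SimpleName_javaouille-normal-b11695.csv" AS csvLine
MATCH (n1:MyNodeType1) USING INDEX n1:MyNodeType1(uid)
WHERE n1.uid = csvLine.uidFather // plain string comparison, no TOINT() conversion
MATCH (n2:MyNodeType2) USING INDEX n2:MyNodeType2(uid)
WHERE n2.uid = csvLine.uidSon
MERGE (n1)-[:vOperandLink]-(n2);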
Just upgraded from 2.2.5 to 2.3.2, and a query that previously executed immediately now takes considerable time. It seems linked to the depth, as reducing it from 5 to 3 makes it quicker.
More details below.
The following is the Neo4j query used to search for recommended restaurants for the given user_id, where the depth/degree of the search is kept at 5:
MATCH (u:User {user_id:"bf9203fba4484f96b4983152e9ee859a"})-[r*1..5]-(place:Place)
WHERE ALL (rel in r WHERE rel.rating >= 0)
RETURN DISTINCT place.place_id, place.title, length(r) as LoR ORDER BY LoR, place.title
The old server instance has Neo4j 2.2.5, where the result is displayed instantly, but on the new VM with Neo4j 2.3.2 it takes quite a long time to return the result.
If we decrease the search depth to 2 or 3, queries run faster.
Anyone else experiencing this?
How do the query times compare when running completely non-warmed-up, e.g. just starting the server and running this query? I suspect property loading may be the biggest problem, due to the removal of the object cache in the 2.3 series.
Do the Place nodes and the relationships with the rating property have many properties each?
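One standard way to narrow this down (my suggestion, not something from this thread) is to PROFILE the same query on both versions and compare where the db hits explode:

PROFILE
MATCH (u:User {user_id:"bf9203fba4484f96b4983152e9ee859a"})-[r*1..5]-(place:Place)
WHERE ALL (rel IN r WHERE rel.rating >= 0)
RETURN DISTINCT place.place_id, place.title, length(r) AS LoR
ORDER BY LoR, place.title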
I have a basic Neo4j database with 10 nodes (no relationships). I am running in embedded mode and starting the server/webadmin as follows:
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.kernel.GraphDatabaseAPI;
import org.neo4j.server.WrappingNeoServer;

GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder("/path/to/data/directory").newGraphDatabase();
WrappingNeoServer srv = new WrappingNeoServer((GraphDatabaseAPI) graphDb);
srv.start();
Once I create the nodes, query performance is fine, but when I restart the server, query performance for basic cypher queries becomes slow. The following query takes about 1-3 seconds:
MATCH (n) RETURN count(n);
Before a restart (immediately after the nodes are created), this query is less than 100ms.
Here is a link to the data directory I am using: https://drive.google.com/file/d/0B1pENwDgk7SQTFkxU1BGd2poeUU/edit?usp=sharing
I am running version 2.0.1.
What could be causing this slow performance?
The database has to be initialized and memory mapping set up.
Nodes have to be loaded from disk.
The Cypher query parser has to build up its rules first.
Your query has to be parsed, built, and put in the cache.
And much more.
That's why you should never trust the first 100-1000 queries for performance measurements.
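A common warm-up trick, sketched here as a suggestion rather than something from this thread, is to touch every node and relationship once after startup so later queries hit warm caches:

MATCH (n)
OPTIONAL MATCH (n)-[r]->()
// touching every node and relationship pulls the store into the caches
RETURN count(n), count(r);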