I am setting a timeout for Neo4j queries in the configuration, but it does not seem to work to a meaningful degree of accuracy. Is there a way to get Neo4j to timeout a query close (i.e., within ~10% of) the specified timeout?
I have tried setting dbms.transaction.timeout, which seems to somewhat work, just not accurately.
For instance, if I set it to 2s, the query stops executing after around 8s. I have also set it to 10s, and had a query stop after 30s. Not a big deal, except on my real dataset I set timeout=3600s and I have a query that has been running for 2+ hours that ran for 5 hours=18000s.
I also saw a post about using, e.g.,
unsupported.dbms.executiontime_limit.enabled=true
unsupported.dbms.executiontime_limit.time=2s
But same issue as above.
Are there any other ways to get Neo4j to timeout a query with a little more accuracy?
Related
I'm using Neo4j 3.5.14 Enterprise (cypher over http/bolt). I'm seeing an issue where randomly a cypher query would be stuck never to be back again which takes out a worker thread. Eventually, if the service is not redeployed, all worker threads would be stuck and the service is no longer doing its job.
I tried using apoc.cypher.runTimeboxed but that appears to cause my queries to not return until the time limit is over (20000 ms in this case) even though in some cases it can return faster than that. I'm actually not sure that runTimeboxed would work because I believe it is actually stuck forever which might not respond to time limit anyway depending on how that's implemented.
My question is - how would you end a runaway query like that? Any tricks?
In our application we occasionally add around 10,000 nodes and 100,000 relationships to a Neo4J graph over the course of a few minutes, and then DETACH DELETE many of them a few minutes later. Previously the delete query was very quick (<100ms), but after a small change to our data model and some of our other queries (which are not running at the time), it now often blocks for minutes before completing.
While this blocking is happening there are no other queries running, and I have an export from Halin showing all the transactions that are happening at the time. It's difficult to reproduce here, but in summary there are exactly two transactions going on, one of which is my delete query. The delete query is stated to be blocked by the other one, which has 7 locks out, is in the Running state, and has no attached query or client at all. I imagine this means that it's an internal Neo4J process. It has 0 cpu time, and its entire 180s runtime is accounted for by idle time. There's no other information given.
What could be causing this transaction to lock the nodes that I want to delete for such a long time with no queries running?
What I've tried:
Using apoc.periodic.iterate and apoc.periodic.commit to split the query into smaller chunks - the inner queries end up locked
Looking in the query logs - difficult to be sure but I can't see any evidence of the internal transaction
Looking in the debug logs - records of garbage collections (always around 300ms) and some graph algorithms running, but never while this query is blocked, and nothing else relevant
Other info:
Neo4J version: 3.5.18-enterprise (docker)
Cluster mode: HA cluster with 2 nodes (also reproduced with only 1 node)
It turned out that there was a query a few minutes before that had been set going and then the client disconnected (missing await in C#). I still don't quite understand why this caused the observations, but my guess is that Neo4j put the query into a weird state after the client disconnected, and then some part of it ended up waiting for the transaction timeout before releasing its locks.
So I received a dated schema that used to work well at the beginning but it's experiencing some scaling issues.
Among of them, the space used by the indexes is catching my attention so I would like to know if they are being used, how many times, etc.
Other that explaining/profiling queries, is there anything else I could use to have this kind of information?
The information you are looking for would be under metrics monitoring, but index accesses is not one of the available metrics Neo4j provides. (Neo4j supports Prometheus, but I don't know if Prometheus captures that info either)
But there are some indirect ways you can get this data.
Assuming you have a test server that replicates production, with appropriate load tests, you can try removing the index and seeing how it affects the load tests. (This way is a bit cumbersome, but probably gives the most accurate measure of how varies DB changes affect performance, but only if the load tests accurately reflect production use.)
Alternatively, for a more static analysis, you should only be executing pre-defined, parameterized cyphers. So you can Profile/Explain those Cyphers against the DB at different scales, and compare those notes to the Cypher logs (either calling end, or using Neo4j metrics monitoring) to get an idea of how often each one is called.
I'd like to set a query timeout in neo4j.conf for Neo4j 3.0.1. Any query taking longer than the timeout should get killed. I'm primarily concerned with setting the timeout for queries originating from the Neo4j Browser.
It looks like this was possible in the past with:
execution_guard_enabled=true
org.neo4j.server.webserver.limit.executiontime=20000
However, this old method doesn't work for me. I see Neo4j 3.0 has a dbms.transaction_timeout option defined as a "timeout for idle transactions". However, this setting also doesn't seem to do the trick.
Thanks to #stdob for the comment explaining a solution.
In Neo4j 3.0.1 Community, I verified that the following addition to neo4j.conf enabled a query timeout of 1 second for Browser queries:
unsupported.dbms.executiontime_limit.enabled=true
unsupported.dbms.executiontime_limit.time=1s
I did not check whether the timeout applies to queries oustide of Neo4j Browser, but I assume so. I did find some documentation in the Neo4j codebase for unsupported.dbms.executiontime_limit.time:
If execution time limiting is enabled in the database, this configures the maximum request execution time.
I believe dbms.transaction.timeout is the current way of limiting execution time
Whenever I try to run cypher queries in Neo4j browser 2.0 on large (anywhere from 3 to 10GB) batch-imported datasets, I receive an "Unknown Error." Then Neo4j server stops responding, and I need to exit out using Task Manager. Prior to this operation, the server shuts down quickly and easily. I have no such issues with smaller batch-imported datasets.
I work on a Win 7 64bit computer, using the Neo4j browser. I have adjusted the .properties file to allow for much larger memory allocations. I have configured my JVM heap to 12g, which should be fine for 64bit JDK. I just recently doubled my RAM, which I thought would fix the issue.
My CPU usage is pegged. I have the logs enabled but I don't know where to find them.
I really like the visualization capabilities of the 2.0.4 browser, does anyone know what might be going wrong?
Your query is taking a long time, and the web browser interface reports "Unknown Error" after a certain timeout period. The query is still running, but you won't see the results in the browser. This drove me nuts too when it first happened to me. If you run the query in the neo4j shell you can verify whether or not this is the problem, because the shell won't time out.
Once this timeout occurs, you can find that the whole system becomes quite non-responsive, especially if you re-run the query, because now you have two extremely long queries running in parallel!
Depending on the type of query, you may be able to improve performance. Sometimes it's as simple as limiting the number of returned nodes (in cases where you only need to find one node or path).
Hope this helps.
Grace and peace,
Jim