I have some really long-running queries. Just as background information: I am crawling my graph for all instances of a specific meta path, for example, counting all instances of a specific meta path found in the graph.
MATCH (a:Content)-[:isTaggedWith]->(t:Term)<-[:isTaggedWith]-(b:Content) RETURN count(*)
First of all, I want to measure the runtimes. Is there any possibility to do so, especially in the Community Edition?
Furthermore, I have the problem that I do not know whether a query is still running in Neo4j or whether it has already been terminated. I issue the query from a REST client, but I am open to other options if necessary. For example, I queried Neo4j with a REST client and set the (client-side) read timeout to 2 days. The problem is that I can't verify whether the query is still running or whether the client is simply waiting for a Neo4j answer that will never arrive, because the query may already have been killed in the backend. Is there really no way to check from the browser or another client which queries are currently running, ideally with an option to terminate them as well?
Thanks in advance!
Measuring Query Performance
To answer your first question, there are two main options for measuring the performance of a query. The first is to use PROFILE: put it in front of a query (like PROFILE MATCH (a:Content)-[:isTaggedWith]->(t:Term)...), and it will execute the query and display the execution plan used, including the native API calls, the number of results from each operation, the total number of database hits, and the total execution time.
The downside is that PROFILE actually executes the query, so if it is an operation that writes to the database, the changes are persisted. To inspect a query without executing it, use EXPLAIN instead of PROFILE. This shows the query plan and the native operations that would be used to execute the query, along with the estimated total database hits, but it does not actually run the query, so the numbers are only estimates.
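To complement PROFILE's server-side numbers with a client-side measurement, here is a minimal sketch in Python. It assumes nothing about your driver: `run_query` is a placeholder for whatever callable actually sends the query (official driver session, REST client, etc.), and the helper only prefixes the Cypher text and times the call.

```python
import time

def with_plan(query: str, mode: str = "PROFILE") -> str:
    """Prefix a Cypher query with PROFILE or EXPLAIN (idempotent:
    an already-prefixed query is returned unchanged)."""
    if mode not in ("PROFILE", "EXPLAIN"):
        raise ValueError("mode must be PROFILE or EXPLAIN")
    stripped = query.lstrip()
    upper = stripped.upper()
    if upper.startswith("PROFILE") or upper.startswith("EXPLAIN"):
        return stripped
    return f"{mode} {stripped}"

def timed(run_query, query: str):
    """Run `run_query(query)` and return (result, elapsed_seconds)
    measured with a monotonic wall clock."""
    start = time.perf_counter()
    result = run_query(query)
    return result, time.perf_counter() - start
```

Note that the client-side elapsed time includes network and serialization overhead, so it will always be a little larger than the execution time PROFILE reports.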
Checking Long Running Queries (Enterprise only)
Checking for running queries can be accomplished using Cypher in Enterprise Edition: CALL dbms.listQueries;. You must be logged in as an admin user to run it. If you want to stop a long-running query, use CALL dbms.killQuery('queryId'), passing the ID of the query you wish to terminate.
Note that apart from manually killing a query and the configured query timeout, queries should in general not be getting killed on the backend unless you have something else set up to kill long runners. With the method above, however, you can double-check your assumption that the queries are indeed still executing after you send them.
These are available only in Enterprise Edition; there is no way that I am aware of to use these functions or replicate their behavior in Community.
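To make the procedure above concrete, here is a hedged sketch of how you might post-process its output. The field names `queryId` and `elapsedTimeMillis` match what `CALL dbms.listQueries` yields in Neo4j 3.x; actually fetching the records and executing the kill statements is left to whatever client you use.

```python
def long_running(queries, threshold_ms):
    """Given records from `CALL dbms.listQueries` (dicts containing at
    least 'queryId' and 'elapsedTimeMillis'), return the IDs of queries
    that have been running longer than threshold_ms."""
    return [q["queryId"] for q in queries
            if q.get("elapsedTimeMillis", 0) > threshold_ms]

def kill_statements(query_ids):
    """Build the Cypher statements that would terminate each query."""
    return [f"CALL dbms.killQuery('{qid}')" for qid in query_ids]
```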
For measuring long-running queries I figured out the following approach:
Use a tmux terminal session (see any tmux crash course), which is really easy. This way you can start your query, close the terminal, and get the session back later on.
New session: tmux new -s *sessionName*
Detach from current session (within session): tmux detach
List sessions: tmux ls
Re-attach to session: tmux a -t *sessionName*
Within the tmux session, execute the query via cypher-shell, either directly in the shell or by piping the command into it. The latter approach is preferable because you can use the Unix command time to actually measure the runtime as follows:
time cat query.cypher | cypher-shell -u neo4j -p n > result.txt
The file query.cypher simply contains the regular query, including the terminating semicolon at the end. The result of the query will be piped into result.txt, and the runtime of the execution will be displayed in the terminal.
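If you prefer to script this measurement instead of reading it off the terminal, the same idea can be sketched in Python with the standard library; the commented-out call mirrors the cypher-shell command line above and assumes the same username/password.

```python
import subprocess
import time

def timed_run(cmd, stdin_text=None):
    """Run a command, feeding stdin_text to it if given; return
    (returncode, stdout, elapsed_seconds), like `time` but capturable."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, input=stdin_text,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout, time.perf_counter() - start

# Mirrors: time cat query.cypher | cypher-shell -u neo4j -p n > result.txt
# with open("query.cypher") as f:
#     rc, out, secs = timed_run(["cypher-shell", "-u", "neo4j", "-p", "n"],
#                               stdin_text=f.read())
```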
Moreover, as correctly stated by #rebecca, listing the running queries is possible only in the Enterprise Edition.
Related
I'm using Neo4j 3.5.14 Enterprise (Cypher over HTTP/Bolt). I'm seeing an issue where, at random, a Cypher query gets stuck and never comes back, which takes out a worker thread. Eventually, if the service is not redeployed, all worker threads are stuck and the service no longer does its job.
I tried using apoc.cypher.runTimeboxed, but that appears to cause my queries to not return until the time limit is over (20000 ms in this case), even though in some cases they could return faster than that. I'm also not sure runTimeboxed would work here, because I believe the query is stuck forever and might not respond to a time limit at all, depending on how that is implemented.
My question is - how would you end a runaway query like that? Any tricks?
In our application we occasionally add around 10,000 nodes and 100,000 relationships to a Neo4j graph over the course of a few minutes, and then DETACH DELETE many of them a few minutes later. Previously the delete query was very quick (<100 ms), but after a small change to our data model and some of our other queries (which are not running at the time), it now often blocks for minutes before completing.
While this blocking is happening there are no other queries running, and I have an export from Halin showing all the transactions active at the time. It's difficult to reproduce here, but in summary there are exactly two transactions going on, one of which is my delete query. The delete query is stated to be blocked by the other one, which holds 7 locks, is in the Running state, and has no attached query or client at all. I imagine this means it's an internal Neo4j process. It has 0 CPU time, and its entire 180 s runtime is accounted for by idle time. There's no other information given.
What could be causing this transaction to lock the nodes that I want to delete for such a long time with no queries running?
What I've tried:
Using apoc.periodic.iterate and apoc.periodic.commit to split the query into smaller chunks - the inner queries end up locked
Looking in the query logs - difficult to be sure but I can't see any evidence of the internal transaction
Looking in the debug logs - records of garbage collections (always around 300ms) and some graph algorithms running, but never while this query is blocked, and nothing else relevant
Other info:
Neo4j version: 3.5.18-enterprise (Docker)
Cluster mode: HA cluster with 2 nodes (also reproduced with only 1 node)
It turned out that a query a few minutes earlier had been started and then the client disconnected (a missing await in C#). I still don't quite understand why this caused what I observed, but my guess is that Neo4j put the query into a strange state after the client disconnected, and some part of it then waited for the transaction timeout before releasing its locks.
I've been using Redis as a queue to communicate between distributed python scripts. At any moment, some nodes push and some nodes pop values from a list.
I've run into a problem, however. At a certain point an LPUSH will make the server run out of memory. As I understand it, the virtual memory feature that existed in Redis until version 2.4 is considered deprecated (and thus advised against).
The problem I have is that a strategy that discards any key is not acceptable. As such, the server is configured with noeviction (values are not evicted and an error should be returned).
What I would need is a way to find out that the command failed from redis-py so I can make a particular node wait until there is space to push items into the list. I've looked through the code and redis-py itself throws no exceptions (it doesn't use exceptions as a design choice).
LPUSH itself returns the number of records in that particular list, however, since the list is accessed from different nodes, the value itself will tell me nothing.
Any ideas how I can achieve this?
Please tell me if any additional information on the nature of the problem would help clarify it.
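One possible direction, sketched here as an assumption worth verifying against your redis-py version: in current redis-py releases, a server error reply (such as the OOM error under noeviction) does surface as a `redis.exceptions.ResponseError`. If that holds for your setup, a generic retry-with-backoff wrapper around the push is enough; the wrapper below takes any push callable, so the Redis-specific parts are confined to the commented usage example with assumed names.

```python
import time

class OutOfMemory(Exception):
    """Stand-in for a server 'OOM' error reply under noeviction
    (with redis-py, this would be a redis.exceptions.ResponseError)."""

def push_with_backoff(push, value, retries=5, delay=0.01,
                      is_oom=lambda exc: True):
    """Call push(value); on an error that is_oom classifies as an
    out-of-memory reply, sleep with exponential backoff and retry.
    Returns push's result, or re-raises after `retries` attempts."""
    for attempt in range(retries):
        try:
            return push(value)
        except Exception as exc:
            if not is_oom(exc) or attempt == retries - 1:
                raise
            time.sleep(delay * (2 ** attempt))

# With a real client (assumed names, not tested here):
# r = redis.Redis()
# push_with_backoff(lambda v: r.lpush("jobs", v), payload,
#                   is_oom=lambda exc: "OOM" in str(exc))
```

A blocking consumer (BRPOP) on the other side, draining the list fast enough, would then let the stalled producers make progress on retry.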
I understand that Neo4j supports different options for running Cypher queries: the web browser, the neo4j shell, and the REST API.
Is there a difference in performance when using the shell and the API?
I'm working on a dataset that has around 10 million objects (nodes + edges).
Thanks!
The web browser uses the REST API in the backend. The shell connects directly to Neo4j.
So yes, you will see performance differences; the shell will generally be faster. However, using the shell will be slower than connecting to the REST API from your application, because in the shell you can't pass parameters.
In your application, passing parameters allows execution plans to be cached (after the warm-up).
Also, if you have bad indexes and bad queries, running them on a 10-million-object dataset will simply not be performant in the shell, in the browser, or in your application.
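To illustrate the point about parameters: a parameterized request to the transactional HTTP endpoint looks roughly like this. This is a sketch; the endpoint path `/db/data/transaction/commit` and the `$name` parameter syntax match Neo4j 3.x, and only the payload construction is exercised here, not the HTTP call itself.

```python
import json

def tx_payload(statement, parameters):
    """Build the JSON body for Neo4j's transactional HTTP endpoint.
    Keeping the statement text fixed and varying only `parameters`
    is what lets the server reuse the cached execution plan."""
    return json.dumps({"statements": [
        {"statement": statement, "parameters": parameters}
    ]})

# POST this body to http://localhost:7474/db/data/transaction/commit (3.x)
body = tx_payload(
    "MATCH (a:Content)-[:isTaggedWith]->(t:Term) "
    "WHERE t.name = $name RETURN count(*)",
    {"name": "graphs"},
)
```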
Whenever I try to run cypher queries in Neo4j browser 2.0 on large (anywhere from 3 to 10GB) batch-imported datasets, I receive an "Unknown Error." Then Neo4j server stops responding, and I need to exit out using Task Manager. Prior to this operation, the server shuts down quickly and easily. I have no such issues with smaller batch-imported datasets.
I work on a Win 7 64bit computer, using the Neo4j browser. I have adjusted the .properties file to allow for much larger memory allocations. I have configured my JVM heap to 12g, which should be fine for 64bit JDK. I just recently doubled my RAM, which I thought would fix the issue.
My CPU usage is pegged. I have the logs enabled but I don't know where to find them.
I really like the visualization capabilities of the 2.0.4 browser, does anyone know what might be going wrong?
Your query is taking a long time, and the web browser interface reports "Unknown Error" after a certain timeout period. The query is still running, but you won't see the results in the browser. This drove me nuts too when it first happened to me. If you run the query in the neo4j shell you can verify whether or not this is the problem, because the shell won't time out.
Once this timeout occurs, you can find that the whole system becomes quite non-responsive, especially if you re-run the query, because now you have two extremely long queries running in parallel!
Depending on the type of query, you may be able to improve performance. Sometimes it's as simple as limiting the number of returned nodes (in cases where you only need to find one node or path).
Hope this helps.
Grace and peace,
Jim