How to check the progress of Cypher query execution (Neo4j)

Is there any function to check the progress of query execution, or to estimate the time until the query result is returned?
Almost the same question was asked three years ago:
(Is there any way of checking progress of Cypher query execution?)
At that time, there was no such function.

Sadly, there is no way to see the progress of a query.
Neo4j comes with the procedure CALL dbms.listQueries(), which shows some information about your running queries (execution time, CPU, locks, ...) but not their progress (you can also type :queries in the Neo4j browser).
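A minimal sketch of inspecting running queries; the yielded field names (queryId, query, elapsedTimeMillis, status) are as I recall them from Neo4j 3.x and may differ in other versions, so check with YIELD *:

```cypher
// List currently running queries with their elapsed time.
// Field names assumed from Neo4j 3.x; verify with CALL dbms.listQueries() YIELD *
CALL dbms.listQueries()
YIELD queryId, query, elapsedTimeMillis, status
RETURN queryId, query, elapsedTimeMillis, status
ORDER BY elapsedTimeMillis DESC
```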
Generally, what I do to see the progress of a write query with periodic commit (e.g. a LOAD CSV) is to write a read query that counts the number of updated/created nodes.
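As a sketch of that technique, assuming a hypothetical :Person label being loaded, poll this read query from a second session and compare the count against the CSV line count:

```cypher
// While LOAD CSV creates :Person nodes in another session, repeatedly
// run this to estimate progress (created nodes / total CSV lines).
MATCH (p:Person)
RETURN count(p) AS nodesCreatedSoFar
```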
Cheers.

Related

Neo4j Cypher optimization of complex paginated query

I have a rather long and complex paginated query that I'm trying to optimize. In the worst case, I first have to execute the data query in one call to Neo4j, and then I have to execute pretty much the same query for the count. Of course, I do everything in one transaction. Anyway, I don't like the overall execution time, so I extracted the most common part of both the data and count queries and execute it on the first call. This common query returns the IDs of nodes, which I then pass as parameters to the rest of the data and count queries. Now everything works much faster. One thing I don't like is that the common query can sometimes return quite a large set of IDs: it can be 20k to 50k Long IDs.
So my question is: because I'm doing this in one transaction, is there a way to preserve such a set of IDs somewhere in Neo4j between the common query and the data/count query calls, and just refer to them somehow in the subsequent data/count queries without moving them between the app JVM and Neo4j?
Also, am I crazy for doing this, or is this a good approach to optimize a complex paginated query?
Only with a custom procedure.
Otherwise you'd need to return them.
But it's uncommon to provide both counts and data (even Google doesn't provide "real" counts).
One way is to just stream the results with the reactive driver as long as the user scrolls.
Otherwise I would just query for pageSize+1 and return "more than pageSize results".
If you just stream the IDs back (and don't collect them into an aggregation), you can start using the IDs already received to issue your new queries (even in parallel).
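The pageSize+1 idea can be sketched as follows; the :Item label and the $offset/$limit parameters are hypothetical, with $limit computed client-side as pageSize + 1:

```cypher
// $limit is pageSize + 1: if $limit rows come back, there is a next page,
// and the UI can report "more than pageSize results" instead of a real count.
MATCH (i:Item)
RETURN i
ORDER BY i.id
SKIP $offset
LIMIT $limit
```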

Jena Query Execution time

I wonder if, for SPARQL queries, there is some logging that can be activated to provide the query execution time, or is this just something that must be done as part of the code that calls the query?
Not as such.
It's also important to remember that Jena uses a streaming query engine, so when you call QueryExecution.execSelect() it is not executing the full query; rather, it is preparing an iterator that can answer the query. Only when you iterate the results does the query actually get executed, so calling code must take this into account when looking to take timings.
Exact behaviour differs with the query kind:
SELECT
execSelect() returns a ResultSet backed by an iterator that evaluates the query as it is iterated; exhaust the ResultSet by iterating it to actually execute the query. So time the full iterator exhaustion to time the query execution.
ASK
execAsk() creates the iterator and calls hasNext() on it to get the boolean result, so you only need to time execAsk().
CONSTRUCT/DESCRIBE
execConstruct()/execDescribe() fully evaluates the query and returns the resulting model, so you can just time this call.
Alternatively, execConstructTriples()/execDescribeTriples() just prepares an iterator; exhaust the iterator to actually execute the query.
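The streaming behaviour (and why timing only iterator creation gives misleading numbers) can be mimicked without Jena at all, using a lazy Python generator to stand in for the streaming ResultSet; the delay value is an arbitrary stand-in for per-row evaluation cost:

```python
import time

def lazy_results(n, delay=0.01):
    """Mimics a streaming query engine: work happens per row during
    iteration, not when the iterator is created."""
    for i in range(n):
        time.sleep(delay)  # simulate per-row evaluation cost
        yield i

# Creating the iterator is nearly instant (like execSelect()).
t0 = time.perf_counter()
results = lazy_results(20)
setup_time = time.perf_counter() - t0

# Only exhausting the iterator pays the real cost (like iterating the ResultSet).
t0 = time.perf_counter()
rows = list(results)
exec_time = time.perf_counter() - t0

print(f"setup: {setup_time:.4f}s, execution: {exec_time:.4f}s")
```

Timing only the first call would attribute almost none of the work to the query; the second measurement is the one that reflects execution.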
You might want to take a look at a tool like SPARQL Query Benchmarker if you are just looking to benchmark specific queries on your data (or to see examples of how to do this kind of timing).
Disclaimer - This was a tool developed and released as OSS as part of my $dayjob some years back. It's using quite outdated versions of Jena but the core techniques still apply.

neo4j browser reports completely unrealistic runtime

I am using Neo4j community 4.2.1, playing with graph databases. I plan to operate on lots of data and want to get familiar with indexes and stuff.
However, I'm stuck at a very basic level because in the browser Neo4j reports query runtimes which have nothing to do with reality.
I'm executing the following query in the browser at http://localhost:7687/:
match (m:Method),(o:Method) where m.name=o.name and m.name <> '<init>' and
m.signature=o.signature and toInteger(o.access)%8 in [1,4]
return m,o
The DB has ~5000 nodes with the Method label.
The browser returns data after about 30sec. However, Neo4j reports
Started streaming 93636 records after 1 ms and completed after 42 ms, displaying first 1000 rows.
Well, 42ms and 30sec are really far apart! What am I supposed to do with this message? Did the query take only milliseconds, with the remaining 30sec spent rendering the results in the browser? What is going on here? How can I improve my query if I cannot even tell how long it really ran?
I modified the query, returning count(m) + count(o) instead of m,o, which changed things: now the runtime is about 2sec and Neo4j reports about the same amount.
Can somebody tell me how I can get realistic runtime figures of my queries without using the stop watch of my cell?
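One way to separate query time from rendering time, along the lines of what the asker already tried, is to aggregate on the server so the browser has almost nothing to draw:

```cypher
// count(*) forces full evaluation but returns a single row, so the
// browser's reported time is dominated by the query, not by rendering.
MATCH (m:Method), (o:Method)
WHERE m.name = o.name AND m.name <> '<init>'
  AND m.signature = o.signature
  AND toInteger(o.access) % 8 IN [1, 4]
RETURN count(*)
```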

Why Neo4j index not working with order by?

Why is Neo4j ORDER BY very slow for a large database? :(
here is the example query:
PROFILE MATCH (n:Item) RETURN n ORDER BY n.name Desc LIMIT 25
As a result it reads all records, even though I already have an index on the name property.
It reads all nodes, which is a real mess for a large number of records.
Is there any solution for this?
Or is Neo4j just not a good choice for us? :(
Also, is there any way to get the last record from the nodes?
Your question and problem are not very clear.
1) Are you sure that you added the index correctly?
CREATE INDEX ON :Item(name)
In the Neo4j browser execute :schema to see all your indexes.
2) How many Items does your database hold and what running time are you expecting and achieving?
3) What do you mean by 'last record from nodes'?
Indexes are currently only used to find entry points into the graph, but not for other uses including ordering of results.
Index-backed ORDER BY operations have been a highly requested feature for a while, and while we've been tracking and prioritizing this, several other features took priority over the work.
I believe index-backed ORDER BY operations are currently scheduled very soon, for our 3.5 release coming in the last few months of 2018.
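For readers on Neo4j 3.5 or later: index-backed ORDER BY did ship, but as far as I recall the planner only uses the index for ordering when the query also carries an index-compatible predicate on that property. A sketch worth verifying with PROFILE on your own version:

```cypher
// The exists() predicate lets the planner read the :Item(name) index in
// order, avoiding a full label scan followed by a Sort. Check the plan
// with PROFILE to confirm on your version.
PROFILE MATCH (n:Item)
WHERE exists(n.name)
RETURN n ORDER BY n.name DESC LIMIT 25
```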

First neo4j query taking more than expected time

I have an API in Django and its structure is something like -
FetchData():
    run cypher query1
    run cypher query2
    run cypher query3
    return
When I run these queries in the Neo4j query window, each takes around 100ms. But when I call this API, query1 takes 1s and the other two take the expected 100ms to execute. This pattern is repeated every time I call the API.
Can anyone explain what should be done here to run the first query in the expected time?
Neo4j tries to cache the graph in RAM. On the first invocation the caches are not warmed up yet, so the IO operations take longer. Subsequent invocations don't hit IO and read directly from RAM.
That sounds weird. The cache should only need to be warmed if the server or DB is shut down, not after each of your API calls. Are you using parameterized queries? The only thing I can think of is that maybe each set of queries is different, causing them to have to be re-parsed and re-planned.
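The parameterized-query point can be illustrated without a database: string-built queries produce a new query text (and thus a new plan-cache key) on every call, while a parameterized query keeps one constant text that the planner can cache. The Person label and $name parameter here are hypothetical:

```python
# Hypothetical Cypher texts; no database needed to show the caching effect.

def build_interpolated(name: str) -> str:
    """String interpolation: a different query text for every value."""
    return f"MATCH (p:Person {{name: '{name}'}}) RETURN p"

# Parameterized: the text is constant; values travel separately as $name.
PARAMETERIZED = "MATCH (p:Person {name: $name}) RETURN p"

names = ["Alice", "Bob", "Carol"]
interpolated_texts = {build_interpolated(n) for n in names}
parameterized_texts = {PARAMETERIZED for _ in names}

# 3 distinct texts to parse and plan vs. 1 reusable plan-cache key.
print(len(interpolated_texts), len(parameterized_texts))
```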
