neo4j - query DB while updating graph

I have a script that runs in the background and "fixes" nodes, which means it removes and creates a lot of relationships.
While this script is running, I try to run the following Cypher query:
MATCH (pr:Property)-[r2:SIMILAR*0..1]-()<-[r]-(it:Item)
WHERE pr.name in ["BLACK","BLACK2"] and toFloat(it.crawler) >= 3.8
return pr.name, type(r),it
I run it a few times. Sometimes I get an answer and sometimes I get something like:
Unable to load RELATIONSHIP with id 9765815.
Neo.ClientError.Statement.EntityNotFound
Of course the 'id' changes all the time.
I understand that in the middle of the computation, some of the relationships change. But I thought Neo4j knew how to handle that and would return the last "true" results (CRUD).
Is there a way to ignore the changes and return the current results?
I'm running neo4j-enterprise 2.0.3.
EDIT:
I'm running the query both from the browser and from the nodejs neo4j agent.

Neo4j is fully ACID, the "I" meaning isolation, i.e. transactions should not see each other's uncommitted modifications. Therefore, you're right in saying that Neo4j should handle it.
Here's what I think is happening. The Cypher query, since it executes in a single transaction, always succeeds, no problem there. But something happens afterwards: the "browser" that ships with Neo4j always displays all relationships between nodes that it is displaying, no matter if you asked for them or not. I'm assuming that loading these is part of a different transaction than your original Cypher query. In that case, it can encounter relationships that it thinks exist, but have been deleted in the meantime by your script.
This is an assumption, but should be easy to verify. If it's correct, then:
if you change it (the whole node) to it.someProperty, then the issue should never occur (see the sketch below)
if you run the original cypher query from (let's say) the command line or via REST, but not from the browser, the issue should never occur
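For the first suggestion, a minimal sketch of a property-only variant of the query from the question (it returns it.crawler simply because that property already appears in the WHERE clause; substitute whatever properties you actually need):
MATCH (pr:Property)-[r2:SIMILAR*0..1]-()<-[r]-(it:Item)
WHERE pr.name in ["BLACK","BLACK2"] and toFloat(it.crawler) >= 3.8
RETURN pr.name, type(r), it.crawler
Since no node or relationship objects are returned, the browser has nothing to decorate with extra relationships, so, if the assumption above is correct, the error should disappear.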
Please let us know your findings. Cheers

Related

apoc.periodic.iterate fails the batch if there is duplicate data in the parameter

I am using an apoc.periodic.iterate query to store millions of records. Since the data may contain duplicates, I am using the MERGE action to create nodes, but unfortunately whenever the data is duplicated the whole batch fails with an error like this:
"LockClient[200] can't wait on resource RWLock[NODE(14), hash=1645803399] since => LockClient[200] <-[:HELD_BY]- RWLock[NODE(101)"
Setting parallel to false works fine.
Removing the duplicates beforehand also makes the query pass.
But both of the above solutions take more time, since I am dealing with millions of records. Is there an alternative, such as making it wait for the lock?
You cannot use parallel:true because you are creating relationships in your query. Every time you want to add a relationship to a node, the Cypher engine takes a write lock on that node, and other processes can't write to it until the lock is released. That is why you get the write-lock exception. There is not much you can do except run it with the parallel:false setting.
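For illustration, a hedged sketch of what the non-parallel call could look like; the labels, properties and batch size below are made up, since the original statement isn't shown:
CALL apoc.periodic.iterate(
  // driving statement: stream the items to process (RawRecord is a hypothetical label)
  "MATCH (r:RawRecord) RETURN r",
  // per-batch action: MERGE keeps duplicates from creating extra nodes (hypothetical schema)
  "MERGE (p:Person {id: r.personId}) MERGE (c:Company {id: r.companyId}) MERGE (p)-[:WORKS_AT]->(c)",
  // parallel: false runs the batches one after another, so two batches never compete for the same node locks
  {batchSize: 10000, parallel: false}
)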
To avoid deadlocks, concurrent requests that update the DB should avoid touching the same nodes or relationships (including the nodes on both ends of those relationships). One way to achieve this is to figure out a way to have the concurrent requests work on disjoint subgraphs.
Or, you can retry queries that throw a DeadlockDetectedException. The docs show an example of how to do that.

Can "DISTINCT" in a CYPHER query be responsible of a memory error when the query returns no result?

Working on a pretty small graph of 5000 nodes with low density (mean connectivity < 5), I get the following error, which I never got before upgrading to Neo4j 3.3.0. The graph contains 900 molecules and their scaffold hierarchy, down to 5 levels.
(:Molecule)<-[:substructureOf*1..5]-(:Scaffold)
Neo.TransientError.General.StackOverFlowError
There is not enough stack size to perform the current task. This is generally considered to be a database error, so please contact Neo4j support. You could try increasing the stack size: for example to set the stack size to 2M, add `dbms.jvm.additional=-Xss2M' to in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation just add -Xss2M as command line flag.
The query is actually very simple; I use DISTINCT because several paths may lead to a single scaffold.
match (m:Molecule) <-[:substructureOf*3]- (s:Scaffold) return distinct s limit 20
This query returns the above error message whereas the next query does work.
match (m:Molecule) <-[:substructureOf*3]- (s:Scaffold) return s limit 20
Interestingly, the query works on a much larger database; in this small one the deepest hierarchy happened to be 2, so the result of the last query is "no changes, no records".
How come adding DISTINCT to the query fails with that memory error? Is there a way to avoid it? I cannot guess the depth of the hierarchy, which can be different for each molecule.
I tried the following heap settings, as suggested in other posts:
#dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=512m
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=4096m
dbms.memory.heap.initial_size=4096m
dbms.memory.heap.max_size=4096m
None of these addressed the issue.
Thanks in advance for any help or clues.
Thanks for the additional info. I was able to replicate this on Neo4j 3.3.0 and 3.3.1, and it likely has to do with the behavior of the pruning-var-expand operation (which is meant to help when using variable-length expansions with distinct results) introduced in 3.2.x; it only happens when using an exact number of expansions (not a range). Neo4j engineering will be looking into this.
In the meantime, your requirement is such that we can use a different kind of query to get the results you want while avoiding this operation. Give this one a try:
match (s:Scaffold)
where (s)-[:substructureOf*3]->(:Molecule)
return distinct s limit 20
And if you do need to run queries that may produce this error, you may be able to circumvent it by prepending your query with CYPHER 3.1, which executes it with a plan produced by an older version of Cypher that doesn't use the pruning-var-expand operation.
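For example, the query from the question with the planner directive prepended (everything else unchanged):
CYPHER 3.1
match (m:Molecule) <-[:substructureOf*3]- (s:Scaffold) return distinct s limit 20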

Can't get data to load in neo4j -- "no records, no changes"

The other day I started moving a project I was doing from MySQL to Neo4j. I'm still fairly new at Neo4J, but I got everything imported and working fine. I stopped the server, packed my stuff up, and went home.
When I got home I started trying to fiddle with some of the nodes and edges, but every time I call MATCH (n) RETURN n I get nothing back -- all the browser says is "no records, no changes."
I thought that was a little weird, so I ran MATCH (n) RETURN count(n) just on a whim. It returns "154"
I've searched for this same error but nothing relevant shows up when I do.
Just to make it clear: This is not happening when I am trying to load the data from the CSV. The data is already in the database, it just won't show up for some reason.
Anybody got any ideas?
I read that you get nothing in the graph view. I've never seen that myself.
Do you have results showing in the table or code tabs?
Have you set labels? Try your query with labels. There are presets in the browser that display the name property for Person nodes, for example.
Maybe all nodes are white on white with no id displayed.
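For instance, assuming your nodes carry a label such as Person (used here only because of the browser preset mentioned above; use whatever labels you actually created), a labelled query is a quick sanity check:
MATCH (n:Person) RETURN n LIMIT 25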
It may be worth running the consistency checker to see if it detects anything off with the db.
You can only run this on an offline db, and it's usually best to run this on a copy of the db rather than the original db itself.

Different results of two (synonymous) queries in Neo4j

I have identified that some queries return fewer results than expected. I took one of the missing results and tried to force Neo4j to return it, and I succeeded with the following query:
match (q0),(q1),(q2),(q3),(q4),(q5)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0' and
(q1)-->(q0) and (q0)-->(q3) and (q2)-->(q0) and (q4)-->(q0) and
(q5)-->(q4)
return *
I supposed that the following query is semantically equivalent to the previous one. However, in this case Neo4j returns no results at all.
match (q1)-->(q0), (q0)-->(q3), (q2)-->(q0), (q4)-->(q0), (q5)-->(q4)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0'
return *
I have also manually verified that the required edges among vertices v0, v1, v3, v4 and v5 are present in the database with the right directions.
Am I missing some important difference between these queries or is it just a bug of Neo4j? (I have tested these queries on Neo4j 2.1.6 Community Edition.)
Thank you for any advice
EDIT: Updating to the newest version 2.2.1 did not help.
This might not be a complete answer, but here's what I found out.
These queries aren't synonymous, if I understand correctly.
First of all, use EXPLAIN (or even PROFILE) to look under the hood and compare the execution plans of the two queries. As you can see (even without going deep down), those are different queries in terms of both efficiency and semantics.
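For instance, prefixing the second query with EXPLAIN shows its plan without touching the data (PROFILE would also run it and report row counts):
EXPLAIN
match (q1)-->(q0), (q0)-->(q3), (q2)-->(q0), (q4)-->(q0), (q5)-->(q4)
where
q0.name='v4' and q1.name='v3' and q2.name='v5' and
q3.name='v1' and q4.name='v3' and q5.name='v0'
return *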
Next, what's actually going on here:
the 1st query will look through all (single) nodes, filter them by name, then try to group them according to your pattern, which involves computing a Cartesian product (hence the enormous space complexity), then collect those groups into larger ones, and then evaluate your other conditions.
the 2nd query will first pick a pair of nodes connected by some relationship (which satisfy the condition on the name property), then throw in the third node and filter again, and so on till the end. The number of nodes is expected to decrease after every filter cycle.
By the way, is it possible that you accidentally set the same name twice (for q1 and q4)?

Neo4j Embedded - Auto Index Multiple Properties

I turned on node auto-indexing and it's indexing the properties I need. If I start up the Neo4j server and open the webadmin, I see that there is an index called node_auto_index as per this post. It works perfectly from the webadmin and I can run Cypher queries like this:
START n=node:node_auto_index('__type:user AND __username:admin') RETURN n
The query returns exactly what I expect. However, if I shut down the server and open the DB in embedded mode from a Scala application, this doesn't work. If I try to run the same Cypher query, I get an error that node_auto_index doesn't exist. I checked the GraphDatabaseService properties, and auto indexing is up and running on the right keys, but when getting a list of all of the index names, the list is always empty. And I can't use the AutoIndex API because it only indexes on one property, and I definitely need both.
So from this point, what would be the best way to go about querying the auto-index with multiple properties from my Scala (Java) code?
EDIT: I noticed that the ReadableIndex interface (which is what the auto-index is) can take a query string. I can't find much documentation on it, so I'm going to try a few things, but is there any chance that could take a Cypher query? Or just the single-quoted string in my query above?
Turns out that the query function of the ReadableIndex actually takes a Lucene Query, which I now realize is what I had quoted above. So calling this code:
val nodes = db.index.getNodeAutoIndexer.getAutoIndex.query("__type:user AND __username:admin")
gave me exactly what I wanted.
