Possible indexing bug in Neo4j 1.8.3? - neo4j

Environment: Neo4j Community 1.8.2, Node.js 0.10.22, Debian squeeze, JDK 1.6.x
The problem I'm about to describe is very sporadic, and we are at a loss to figure out what in our code could be causing it. So this is a shot in the dark...
All of our Nodes are assigned a GUID property on creation via a TransactionEventHandler plugin, unless they already have a GUID property. We have auto-indexing enabled for this GUID property, and this seems to work fine. The majority of our queries are GUID-based; that is, we often find Nodes by GUID as all or part of the query. We've noticed that, on rare occasions, an existing Node with guidA is overwritten with the properties of a just-created Node with guidB. Note that in this case the GUIDs were actually generated by a foreign system (we're importing users from one system into another). We can see this happening because we keep a version history for each GUID, and we can see that at the time the problem occurs both guidA and guidB share the same Neo4j node id. It might also be the case that a Node with guidB had been created and then deleted some time in the past; we have to do more experimentation to confirm this.
One hypothesis is that:
The node with guidB was created in the past and had Neo4j id = 1234.
It was then deleted, which allowed id 1234 to be reused at some time in the future. However, the guidB --> 1234 record still existed in the index.
The node with guidA was then created and was given Neo4j id 1234.
The user with guidB was then re-imported into the system and looked up by GUID, and because the original record in the index still remained, the node with id 1234 was found.
The properties of the node with id 1234 were then overwritten with guidB's user properties.
The only reason I came up with this is that I know the Lucene records are not immediately deleted when the associated node is deleted. Again, this happens infrequently, and the key may be the deletion of the node.
Any possibility that this is an indexing bug?

This issue with auto-indexing was fixed at some point.
It only happens when the server is restarted after the deletion and before the new node is created, which is why it is so rare.
What you can do is query the index for the newly deleted GUID; the stale entry will then be removed. As a safeguard, you can also add a check that compares the GUID of the node returned from the index with the GUID searched for.
It is probably a good idea to have a job go over your data and check the index, or re-index the data by re-setting the guid property.
And since it is a GUID, you could probably use the unique node creation features with the GUID to create the nodes in the first place.
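The safeguard suggested above (compare the GUID on the node returned from the index with the GUID that was searched for) can be sketched in a few lines. This is a minimal in-memory model, not the Neo4j API: `ToyGraph`, `find_by_guid`, and the `purge_index` flag are hypothetical names introduced only to reproduce the stale-entry scenario from the question.

```python
class ToyGraph:
    """In-memory stand-in for a node store plus a GUID index (NOT the Neo4j API)."""

    def __init__(self):
        self.nodes = {}     # node id -> property dict
        self.index = {}     # guid -> node id (may contain stale entries)
        self.free_ids = []  # ids released by deleted nodes, available for reuse
        self.next_id = 0

    def create(self, guid, **props):
        # Reuse a freed id if one exists, mimicking id recycling.
        if self.free_ids:
            node_id = self.free_ids.pop()
        else:
            node_id = self.next_id
            self.next_id += 1
        self.nodes[node_id] = dict(props, guid=guid)
        self.index[guid] = node_id
        return node_id

    def delete(self, node_id, purge_index=True):
        guid = self.nodes.pop(node_id)["guid"]
        self.free_ids.append(node_id)
        if purge_index:  # purge_index=False simulates the stale index entry
            self.index.pop(guid, None)


def find_by_guid(graph, guid):
    """Return the node id for guid, or None; never trust the index blindly."""
    node_id = graph.index.get(guid)
    if node_id is None:
        return None
    node = graph.nodes.get(node_id)
    # The check from the answer: the node found must actually carry this GUID.
    if node is None or node.get("guid") != guid:
        del graph.index[guid]  # drop the stale entry so it cannot hit again
        return None
    return node_id


graph = ToyGraph()
old_id = graph.create("guidB")
graph.delete(old_id, purge_index=False)  # index entry for guidB is left behind
new_id = graph.create("guidA")           # receives the recycled id
```

With the check in place, a lookup for guidB no longer resolves to the node that now belongs to guidA, so a re-import would create a fresh node instead of overwriting the existing one.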

Related

Unable to edit data created on primary node in mongodb after the node became secondary

I'm using a MongoDB replica set in my Rails application with 1 primary (node A) and 1 secondary (node B).
It was working perfectly fine until I added one more node (node C) and made node C the primary. Now the primary (node C) has all the content, but from what I observe, content created on the previous primary (node A) can only be read, not edited or destroyed. As I understand it, data can only be written to the primary node, so I guess data from the secondary (node A, the earlier primary) can only be read when accessed.
Is this common behaviour, or am I missing something?
EDIT:
I took a db dump of the replica set from the primary node (node C), then ran db.dropDatabase() and mongorestore again. I found data missing in some collections. Can anyone explain what the issue could be?
In a mongodb replica set you can only write (modify, create, destroy) on the primary node. Writes are then propagated to other (secondary) nodes in the replica set. Note that this propagation may not be immediate.
However, when the primary changes, you should still be able to write to data previously written by another primary.
Note that when you add a node to a replica set, it's preferable to load the latest database backup into that node first. The replication process is based on an oplog shared between the nodes that records creations, deletions, and updates; however, this oplog holds a limited number of entries. So earlier entries may not be considered by your new primary ...
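The point about a bounded oplog can be illustrated with a toy model. This is only a sketch of the reasoning in the answer, not MongoDB's actual replication protocol: `ToyOplog`, `replay`, and the sequence numbers are hypothetical, and the capacity is counted in entries purely for simplicity.

```python
from collections import deque


class ToyOplog:
    """Bounded operation log: once full, the oldest entries are silently dropped."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.next_seq = 0

    def append(self, op, key, value=None):
        self.entries.append((self.next_seq, op, key, value))
        self.next_seq += 1


def replay(oplog, snapshot=None, snapshot_seq=-1):
    """Rebuild a data set from an optional backup plus the surviving log entries."""
    data = dict(snapshot or {})
    for seq, op, key, value in oplog.entries:
        if seq <= snapshot_seq:
            continue  # already contained in the backup
        if op == "set":
            data[key] = value
        elif op == "delete":
            data.pop(key, None)
    return data


oplog = ToyOplog(capacity=3)
for i in range(5):
    oplog.append("set", f"k{i}", i)

# A node that only sees the surviving log entries misses the first two writes.
from_log_only = replay(oplog)

# A node seeded from a backup taken after the second write ends up complete.
backup = {"k0": 0, "k1": 1}
from_backup = replay(oplog, snapshot=backup, snapshot_seq=1)
```

In this model the node that starts from the backup holds all five keys, while the node that relies on the log alone holds only the last three.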

How to create a node with id 0?

I deleted the reference node. So I need to recreate the reference node.
How can I create a node with id 0 using Cypher?
thanks.
The short answer is you can't, and you don't need to. Do you have a specific problem without that node? If so, maybe you can elaborate, chances are there is something else that answers your problem better than trying to recreate a node with a specific id.
The long answer is that you can't assign ids to nodes with Cypher. The id is an index or offset into the node storage on disk, so it makes sense to let Neo4j worry about it and not try to manipulate it or include it in any application logic. See Node identifiers in neo4j and Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?.
You also most likely don't need a reference node. It is created by default in a new database, but its use is deprecated and it won't exist in future releases. See Is concept of reference node in neo4j still used or deprecated?.
If you still want to assign ids to the nodes you create, it is accidentally possible in a roundabout way with the CSV batch importer (1,2) and, I believe, with the Java API batch inserter.
If you still want to recreate or simulate the reference node, you can either delete the database data files and let Neo4j recreate the database, or you can try what this person did: Recreate reference node in a Neo4j database. You can also force Neo4j to recycle the ids of deleted nodes faster, so that new nodes you create receive ids that have been freed up and not yet reassigned.

when using neo4js rest api: deleted properties came back

When I use the Neo4j REST API, there seems to be a bug:
A node was indexed by some index. After I deleted some properties of that node, unindexed it, and then indexed it again, those properties came back.
This happens once in a while, not every time.
I'm sure those properties are deleted, by querying that node in the cypher console after the delete operation.
Also, some posts reported this without a satisfying answer: the number of nodes/relationships/properties reported by neo4j webadmin looks crazy. I have 5 (including id 0) nodes, but it shows 932 nodes, 4213 properties. This happens every time. Some people say it's the highest ID in use. I don't think it makes any sense semantically to show the highest ID on the "nodes" label. In addition, the highest ID for my nodes is 466, not 932.
I assume you're judging the properties off the count, instead of off a query?
Neo4j's web console uses metadata to display information like node count, property count, and relationship count. This metadata is not always up to date, but it's much faster to use this than to scan the entire graph database for this information every time.
Neo4j will adjust these counts every now and then, but it doesn't defragment its information all the time.

after clearing the neo4j database,when creating new node it starts to count from where the increment was before [duplicate]

Is it possible to reset the indices once I have deleted the nodes, just as if I had deleted the whole folder manually?
I am deleting the whole database with node.delete() and relation.delete(), and I just want the indices to start at 1 again and not where I had actually stopped...
I assume you are referring to the node and relationship IDs rather than the indexes?
Quick answer: You cannot explicitly force the counter to reset.
Slightly longer answer: Generally speaking, these IDs should not carry any relevance within your application. There have been a number of discussions about this on the Neo4j mailing list and Stack Overflow, as the ID is an internal artifact and should not be used like a primary key. Its purpose is more akin to an in-memory address, and if you require unique identifiers, you are better off considering something like a UUID.
You can stop your database, delete all the files in the database folder, and start it again.
This way, the ID generation will start back from 1.
This procedure completely wipes your data, so handle with care.
Now you certainly can do this using Python.
see https://stackoverflow.com/a/23310320

Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?

I am building a large graph database that has a significant set of meta data about each node (thousands of properties per node). I am currently going through the process of determining which meta data should be a node within Neo4j, which should become a property of the Node and which should be housed in a separate database.
My thinking is to use the meta data in 3 ways:
1 - If the property is shared between many nodes, to make that property its own node and create an edge to that property.
2 - If the property is important to traversing the graph, but not "highly" shared, to add that as a node property. (Which could also be indexed within Neo4j if needed)
3 - If the meta data is strictly describing that node, to have that stored in a separate NoSQL database, with the Neo4j Node ID becoming the foreign key to the other database.
While it seems like the most efficient use of using the graph database, it seems like a pain to have the different property types and having to determine which type of property it is before using it. (Likely a property lookup key-value store) It would also likely mean that I would need an easy way to promote a property from 3 to 2 to 1 for instances when a property becomes highly shared, or needed for efficient traversal.
Has anyone taken this approach? Any thoughts to share, or things to avoid?
Never store a Neo4j node id in an external system. The node id is basically an offset in the respective store file. If you delete a node, its id might be reused when new nodes are created.
The right approach is to have a "good" identifier (e.g. a uuid) as a node property and put that into Neo4j's index. That uuid is then safe to store in third-party systems.
Some time ago I created an unmanaged extension that adds a uuid to each new node and prevents manual changes to these uuids: https://github.com/sarmbruster/neo4j-uuid.
Update (2013-08-21)
I've blogged about UUIDs with Neo4j.
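As a rough illustration of the approach recommended above, the sketch below gives each node a uuid property and uses that value, not the node id, as the key in an external store. It uses only the Python standard library; `new_node_properties` and `external_store` are hypothetical names, and the plain dict stands in for whatever separate database holds the large property sets.

```python
import uuid


def new_node_properties(**props):
    """Return node properties with a stable 'uuid' added if none is present."""
    props.setdefault("uuid", str(uuid.uuid4()))
    return props


# Stand-in for the separate NoSQL database, keyed by uuid rather than node id.
external_store = {}

node = new_node_properties(name="Alice")
external_store[node["uuid"]] = {"notes": "large, node-specific metadata lives here"}

# An identifier supplied by a foreign system is kept rather than replaced.
imported = new_node_properties(name="Bob", uuid="existing-guid-from-import")
```

Because the key lives in a property that you control, deleting and recreating nodes (and any resulting reuse of node ids) does not invalidate references held by the external system.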

Resources