Is it always good to create an ID for a node? - neo4j

My data doesn't come from a relational database, so it lacks a unique ID; it is parsed from a variety of data sources. Is it good to create a unique sequential ID for each node?
I know Neo4j will automatically create an ID internally, but I am talking about a user-created ID.

IMHO, it’s always useful to have a unique id for nodes. It doesn’t necessarily need to be sequential. I wouldn’t rely on the ID Neo4j generates for every node because it’s reusable, i.e., if you delete a node, the ID of the deleted node can be reused for a node you create later. But all in all, it depends on your requirements.
Here you can find out how to create UUIDs in Neo4j:
https://neo4j.com/docs/labs/apoc/current/graph-updates/uuid/
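If APOC is not installed, another option is to generate the identifier on the client before the node is written. The following is a minimal Python sketch, not an official recipe: the function name `new_node_statement`, the label `Item`, and the property name `uuid` are assumptions made for illustration. It only builds a parameterized Cypher statement and does not talk to a database.

```python
import uuid


def new_node_statement(label, properties):
    """Return (cypher, params) for creating one node with a fresh UUID.

    The label is interpolated into the query text because Cypher does not
    accept labels as parameters, so it is validated first.
    """
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    # Copy the caller's properties and add a random (version 4) UUID.
    props = dict(properties, uuid=str(uuid.uuid4()))
    cypher = f"CREATE (n:`{label}`) SET n += $props RETURN n.uuid AS uuid"
    return cypher, {"props": props}


# Hypothetical usage: "Item" and "name" are made-up example values.
cypher, params = new_node_statement("Item", {"name": "example"})
print(cypher)
print(params)
```

The statement and parameters can then be passed to whichever driver you use. An index or uniqueness constraint on the `uuid` property keeps later lookups fast.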

As you mentioned already, Neo4j always creates an id for you. There is no need for a user defined one.

Related

How to handle cypher query common stanzas

I'm writing a bunch of queries in order to build a tree inside Neo4j, but in order to add different types of new data, I'm writing the same opening stanzas for each of my queries.
Example: I want to be able to add Root(identifier=Root1)->A(identifier=1)->B(identifier=2)... without modifying the trees pointed to by other roots.
All of my queries start off with
MATCH
(root:`Root` {identifier: $identifier})
CREATE
(root)-[:`someRel`]->(a:`A` {identifier: $a_identifier})
Then some time passes and A needs a child:
MATCH
(root:`Root` {identifier: $identifier})
-[:`someRel`]->
(a:`A` {identifier: $a_identifier})
CREATE
(a)-[:`someOtherRel`]->(b:`B` {identifier: $b_identifier})
Then some other time passes and maybe B needs a child, and I have to use the same opening stanza to get to A and then add another one to get the correct B.
Is there some functionality that I'm missing that will allow me to not have to build up those opening stanzas every time I want to get to the correct B, (or C or D) or do I just need to do this using string concatenation?
String concatenation example: (python)
MATCH
{ROOT_LOOKUP_STANZA},
{A_LOOKUP_STANZA},
{B_LOOKUP_STANZA}
CREATE
(b)-[:`c_relationship`]->(c:`C` {...})
Some additional notes:
Root Nodes have to be uniquely identified
The rest of the nodes have to be uniquely identified with their parents. So the following is valid:
Root(root)->A(a)->B(b)
Root(root)->A(a1)->B(b)
In this case B(b) references two different nodes because their parents are different.
So your main problem is that children do not have unique ids; only the root nodes do. Neo4j does not (yet) have any mechanism to carry the final context of one query into the start of another, and makes no guarantee that a node's internal id will be the same between queries. So for your data as is, you must match the whole chain to be sure you match the correct node to append to. There are a few things you can do to make this unnecessary, though.
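As long as the whole chain has to be matched, the repeated opening stanza can at least be generated instead of concatenated by hand. Here is a minimal Python sketch of that idea. The helper name `chain_match` and the `<variable>_identifier` parameter naming are assumptions for illustration; the labels and relationship types are the ones from the question.

```python
def chain_match(steps):
    """Build a MATCH clause that walks from the root down to the last step.

    steps: list of (variable, label, rel_type, identifier) tuples. rel_type is
    the relationship coming from the previous step and is None for the root.
    Labels and relationship types are interpolated into the query text, so
    they must come from trusted code, never from user input.
    """
    pattern = ""
    params = {}
    for variable, label, rel_type, identifier in steps:
        if rel_type is not None:
            pattern += f"-[:`{rel_type}`]->"
        key = f"{variable}_identifier"
        pattern += f"({variable}:`{label}` {{identifier: ${key}}})"
        params[key] = identifier
    return "MATCH " + pattern, params


match, params = chain_match([
    ("root", "Root", None, "Root1"),
    ("a", "A", "someRel", 1),
    ("b", "B", "someOtherRel", 2),
])
query = match + " CREATE (b)-[:`c_relationship`]->(c:`C` {identifier: $c_identifier})"
params["c_identifier"] = 3
print(query)
print(params)
```

Only the identifier values travel as query parameters, so the generated text stays the same for every call with the same chain shape.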
Add a UUID
By adding a universally unique id to each node (and indexing that property), you will be able to match on that id with the guarantee that there are no collisions and that it will be the same across queries. Any time using a node's internal ID would be useful, that is a good sign you could use UUIDs in your data. (It also helps if the data is mirrored to other databases.)
Store the path as a Unique ID
It's possible you don't know the UUID assigned in Neo4j (because it's not in the source data), but in a tree you can create a unique ID in the format of <parent-ID>_<index><sorted-labels><source-id>. The idea here is that the parent is guaranteed to have a unique id, and you combine that id with the info that makes this child unique to that parent. This allows you to generate a deterministic unique ID. (This requires a tree data structure with a unique root id.) In most cases, you can probably leave the index part out (that is for cases of lists/arrays in the source data). In essence, you are storing the path from the root node to this node as the node's unique id. (Again, you will want an index on this id.)
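A rough Python sketch of such a path-derived id follows. The function name `path_id` and the `:` separators between index, labels, and source id are assumptions; the format described above does not specify separators, and they are added here only so that adjacent parts cannot run together.

```python
def path_id(parent_id, labels, source_id, index=None):
    """Derive a deterministic id for a child from its parent's unique id.

    parent_id: the already-unique id of the parent node.
    labels:    the child's labels, sorted so their order does not matter.
    source_id: the identifier the child has in the source data.
    index:     optional position, for children coming from a list or array.
    """
    label_part = ":".join(sorted(labels))
    index_part = "" if index is None else f"{index}:"
    return f"{parent_id}_{index_part}{label_part}:{source_id}"


# The two B(b) nodes from the question get different ids
# because their parents differ.
a_id = path_id("Root1", ["A"], "a")
a1_id = path_id("Root1", ["A"], "a1")
print(path_id(a_id, ["B"], "b"))   # Root1_A:a_B:b
print(path_id(a1_id, ["B"], "b"))  # Root1_A:a1_B:b
```

This sketch assumes the source identifiers themselves contain no `_` or `:` characters; if they can, the parts would need escaping to keep the ids collision-free.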
Batch the job
If this is all part of just one job, another option is to pool the changes you want to make and generate one Cypher query that will do all of them while Neo4j already has everything fetched.

Neo4j unique IDs by tree with root node counter?

Is using a tree with a counter on the root node, to be referenced and incremented when creating new nodes, a viable way of managing unique IDs in Neo4j? In a previous question on performance on this forum (Neo4j merge performance VS create/set), the approach was described, and it occurred to me it may suggest a methodology for unique ID management without having to extend the Neo4j database (and support that extension). However, I noticed this approach has not been mentioned in other discussions on best practice for unique ID management (Best practice for unique IDs in Neo4J and other databases?).
Can anyone help validate or reject this approach?
Thanks!
You can just create a singleton node (I'll give it the label IdCounter in my example) to hold the "next-valid ID counter" value. There is no need for it to be part of any "tree" or for it to have any relationships at all.
When you create the singleton, initialize it with the first id value that you want to use. For example:
CREATE (:IdCounter {nextId: 1});
Here is a simple example of how to use it when creating a new node.
MATCH (c:IdCounter)
CREATE (x {id: c.nextId})
SET c.nextId = c.nextId + 1
RETURN x;
Since all Cypher queries are transactional, if the node creation did not happen for any reason, then the nextId increment would also not be done, so you should never end up with any gaps in assigned id numbers.
However, to avoid re-using the same id number, you would have to write your queries carefully to ensure that the increment always happens whenever you create a new node (using CREATE, CREATE UNIQUE, or MERGE).

Is node reference equality in embedded neo4j guaranteed?

I am using an embedded graph database as part of a java application. Suppose that I carry out some type of cypher query, and return an ExecutionResult which contains a collection of nodes.
These nodes may be assumed to form a connected graph.
Each of these nodes has some relationships, which I can access using node.getRelationships(Direction.OUTGOING). My question is: if the target of one of these relationships already occurs in the ExecutionResult (i.e. the relationship is part of the query template), is it guaranteed that relationship.getEndNode() == node X?
I suppose that what I am really asking is, when a transaction in Neo4j returns a node, does it return just the one object, and different queries will just keep returning references to that one object, or does it keep producing new objects which happen to refer to the same data point? Since Node doesn't override the equals method, I have been assuming the former, but I was hoping someone could tell me.
Nodes are not reference-equals. You'll only get NodeProxy objects which are created on the fly in different operations.
But the equals()-method does id-equality so you should use that.
n1.equals(n2)
or if you keep the node id around use
n1.getId() == n2.getId()
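The difference between reference equality and id equality can be shown with a small analogue outside Java. In this Python sketch the `NodeProxy` class is a hypothetical stand-in for the proxy objects described above, not the Neo4j API: two proxies created separately are different objects but compare equal because they carry the same id.

```python
class NodeProxy:
    """Hypothetical lightweight handle that identifies a node by its id."""

    def __init__(self, node_id):
        self.node_id = node_id

    def __eq__(self, other):
        # Equality is defined by the id, not by object identity.
        return isinstance(other, NodeProxy) and self.node_id == other.node_id

    def __hash__(self):
        # Keep hashing consistent with __eq__ so proxies work in sets and dicts.
        return hash(self.node_id)


n1 = NodeProxy(42)  # e.g. obtained from a query result
n2 = NodeProxy(42)  # e.g. obtained by following a relationship

print(n1 is n2)       # False: two distinct objects (like == on Java references)
print(n1 == n2)       # True: same id (like n1.equals(n2))
print(len({n1, n2}))  # 1: both hash to the same entry
```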
When you create a node, Neo4j internally assigns it a node id. Every relationship you create holds a reference to its start node id and end node id.
To check this:
First, create a node and save its node id by calling node.getId().
Now create a relationship to it from another node, and call relationship.getEndNode().getId().
You will see that the node ids are the same.
It sounds like you're asking: does Neo4j give concurrency control of database entities out of the box, like NHibernate or Entity Framework do for SQL?
The answer is no! You will have to manage it yourself. If you do develop it, though, it could make you a few bob.

How to create a node with id 0?

I deleted the reference node. So I need to recreate the reference node.
Using cypher how to create a node with id 0?
thanks.
The short answer is you can't, and you don't need to. Do you have a specific problem without that node? If so, maybe you can elaborate, chances are there is something else that answers your problem better than trying to recreate a node with a specific id.
The long answer is that you can't assign ids to nodes with Cypher. The id is an index or offset into the node storage on disk, so it makes sense to let Neo4j worry about it and not try to manipulate it or include it in any application logic. See Node identifiers in neo4j and Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?.
You also most likely don't need a reference node. It is created by default in a new database, but its use is deprecated and it won't exist in future releases. See Is concept of reference node in neo4j still used or deprecated?.
If you still want to assign ids to nodes you create, it is accidentally possible in a roundabout way with the CSV batch importer and, I believe, with the Java API batch inserter.
If you still want to recreate or simulate the reference node, you can either delete the database data files and let Neo4j recreate the database, or you can try what this person did: Recreate reference node in a Neo4j database. You can also force Neo4j to recycle the ids of deleted nodes faster, so that new nodes that you create receive those ids that have been freed up and not yet reassigned.

Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?

I am building a large graph database that has a significant set of meta data about each node (thousands of properties per node). I am currently going through the process of determining which meta data should be a node within Neo4j, which should become a property of the Node and which should be housed in a separate database.
My thinking is to use the meta data in 3 ways:
1 - If the property is shared between many nodes, to make that property its own node and create an edge to that property.
2 - If the property is important to traversing the graph, but not "highly" shared, to add that as a node property. (Which could also be indexed within Neo4j if needed)
3 - If the meta data is strictly describing that node, to have that stored in a separate NoSQL database, with the Neo4j node ID becoming the foreign key to the other database.
While it seems like the most efficient use of the graph database, it seems like a pain to have the different property types and to have to determine which type of property it is before using it. (Likely a property lookup key-value store.) It would also likely mean that I would need an easy way to promote a property from 3 to 2 to 1 for instances when a property becomes highly shared, or needed for efficient traversal.
Has anyone taken this approach? Any thoughts to share, or things to avoid?
Never store a Neo4j node id in an external system. The node id is basically an offset in the respective store file. If you delete a node, its id might be reused when new nodes are created.
The right approach is to have a "good" identifier (e.g. a uuid) as a node property and put that into Neo4j's index. That uuid is then safe to store in third-party systems.
Some time ago I created an unmanaged extension that adds a uuid to each new node and prevents manual changes to these uuids: https://github.com/sarmbruster/neo4j-uuid.
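To make the risk concrete, here is a toy Python model of id reuse. `ToyStore` is a hypothetical simplification invented for this illustration and is not how Neo4j's store files are implemented; it only mimics the behaviour described above, where a freed slot can be handed to a later node.

```python
import uuid


class ToyStore:
    """Toy record store that recycles freed ids (illustration only)."""

    def __init__(self):
        self.records = {}
        self.free_ids = []
        self.next_id = 0

    def create(self, name):
        # Prefer a recycled id; otherwise take the next unused one.
        if self.free_ids:
            node_id = self.free_ids.pop()
        else:
            node_id = self.next_id
            self.next_id += 1
        self.records[node_id] = {"name": name, "uuid": str(uuid.uuid4())}
        return node_id

    def delete(self, node_id):
        del self.records[node_id]
        self.free_ids.append(node_id)


store = ToyStore()
alice = store.create("alice")
alice_uuid = store.records[alice]["uuid"]

# An external system keyed once by internal id and once by uuid.
by_internal_id = {alice: "alice's metadata"}
by_uuid = {alice_uuid: "alice's metadata"}

store.delete(alice)
bob = store.create("bob")  # receives alice's old internal id
bob_uuid = store.records[bob]["uuid"]

print(by_internal_id.get(bob))  # alice's metadata, now wrongly attached to bob
print(by_uuid.get(bob_uuid))    # None: the uuid key never collides
```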
Update (2013-08-21)
I've blogged about UUIDs with Neo4j.
