Reliable (auto)incrementing identifiers for all nodes/relationships in Neo4j

I'm looking for a way to generate unique identifiers for all my nodes/relationships in Neo4j, based on an incrementing counter (not big long uuids).
The internal ids maintained by the Neo4j engine are known not to be reliable as outside references.
A solution that comes close is the code proposed in this question, but it doesn't work when a single CREATE clause creates multiple new nodes:
// get unique id
MERGE (id:UniqueId{name:'Person'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create a new node attached to every existing :something node
MATCH (n:something)
CREATE (:somethingRelated {id:uid}) -[:rel]-> (n)
When there are multiple (n:something), every newly created (:somethingRelated) will share the same id. Is there some way around this, using only Cypher?

Try this to allocate a block of ids:
// collect nodes to connect
MATCH (n:Crew) WITH collect(n) AS nodes
MERGE (id:UniqueId { name:'Person' })
// reserve id-range
SET id.count = coalesce(id.count,0)+ size(nodes)
WITH nodes, id.count - size(nodes) AS base
// for each index
UNWIND range(0,size(nodes)-1) AS idx
// get node, compute id
WITH nodes[idx] AS n, base +idx AS id
CREATE (:SomethingRelated { uid:id })-[:rel]->(n)

From my point of view, it's not possible to do that in Cypher.
I suggest writing a Java extension for this, because your Cypher approach would not work in a concurrent environment: you cannot guarantee uniqueness.
Could you please tell us more about your use case and why you don't want to use UUIDs? - https://github.com/graphaware/neo4j-uuid
Based on your comment below, I suggest creating the IDs in your application.
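
If you still want to stay in Cypher, a commonly described mitigation for the concurrency concern is to take a write lock on the counter node before reading it, and to back the ids with uniqueness constraints as a safety net. The following is only a sketch under those assumptions: the _lock property name is made up for illustration, the constraint syntax is the older form used elsewhere on this page, and the labels are taken from the block-allocation answer above. It has not been tested under concurrent load.

```cypher
// Assumption: one counter node per name, enforced by a constraint so that
// concurrent MERGEs cannot create two counters.
CREATE CONSTRAINT ON (c:UniqueId) ASSERT c.name IS UNIQUE;
// Safety net: a duplicate uid makes the transaction fail instead of slipping in.
CREATE CONSTRAINT ON (s:SomethingRelated) ASSERT s.uid IS UNIQUE;

MERGE (id:UniqueId {name: 'Person'})
// Writing any property takes a write lock on the counter node, held until commit.
// _lock is a hypothetical dummy property used only for that purpose.
SET id._lock = true
WITH id
MATCH (n:Crew)
WITH id, collect(n) AS nodes
// reserve the id range while the lock is held
SET id.count = coalesce(id.count, 0) + size(nodes)
WITH nodes, id.count - size(nodes) AS base
UNWIND range(0, size(nodes) - 1) AS idx
WITH nodes[idx] AS n, base + idx AS uid
CREATE (:SomethingRelated {uid: uid})-[:rel]->(n);
```

Every transaction that allocates ids then serializes on the counter node, which is the price for gap-free incrementing numbers.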

Related

I am trying to create the nodes using Cypher with the counter

I am trying to create nodes using Cypher with a counter. How can I do it? I did not find anything helpful in the documentation. Has anyone come across this case?

I have nodes labeled User and I wanted to create a property named "counterId". This is how I generated a counter from 1 to x, where x is the number of all Users.
//collect all users in a list
MATCH (n:User)
WITH collect(n) as n_list
//for each user, set the counterId to this counter i
// range() is inclusive on both ends, so the upper bound is size(n_list)
FOREACH(i IN range(1, size(n_list)) | SET (n_list[i-1]).counterId = i);
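
The same numbering can also be sketched with UNWIND instead of FOREACH, which some find easier to read because each user becomes an ordinary row. This is an illustrative variant, not something taken from the documentation; the numbering follows whatever order MATCH happens to return the nodes in.

```cypher
// collect all users in a list
MATCH (n:User)
WITH collect(n) AS users
// one row per list index (range is inclusive, hence size - 1)
UNWIND range(0, size(users) - 1) AS idx
WITH users[idx] AS u, idx
// counterId runs from 1 to the number of users
SET u.counterId = idx + 1;
```

If a stable order matters, add an ORDER BY on some property before the collect().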

Neo4j cypher query for connected components is too slow

I am looking for help to optimize my following cypher query.
CALL algo.unionFind.stream()
YIELD nodeId,setId
MATCH(n) where ID(n) = nodeId AND NOT (n)-[:IS_CHILD_OF]-()
call apoc.create.uuids(1) YIELD uuid
WITH n as nod, uuid, setId
WHERE nod is not null
MERGE(groupid:GroupId {group_id:'id_'+toString(setId)})
ON CREATE set groupid.group_value = uuid, groupid.updated_at = '1512135348335'
MERGE(nod)-[:IS_CHILD_OF]->(groupid)
RETURN count(nod);
I have already applied a unique constraint and index on group_id, and I am running on a well-provisioned machine (i3-2xl).
The above query takes too long: ~22 minutes for ~500k nodes.
These are the things I want to achieve with the query:
Get all the connected components(sub-graph).
Create a new node for each group(connected components).
Assign uuid as a value of the new group node.
Build the relationship with all the group members with the new group node.
Any suggestions for optimizing the above query are welcome, or please let me know if there is any other way to achieve the requirements listed.
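
One direction that might be worth trying, given the four requirements listed above, is to aggregate the members per setId first, so that the group node is merged once per component rather than once per member. The sketch below is untested against this data set and makes two assumptions: that the installed APOC version provides the apoc.create.uuid() function, and that collecting the members of each component fits in memory.

```cypher
CALL algo.unionFind.stream()
YIELD nodeId, setId
MATCH (n) WHERE id(n) = nodeId AND NOT (n)-[:IS_CHILD_OF]-()
// one row per connected component instead of one row per node
WITH setId, collect(n) AS members
MERGE (g:GroupId {group_id: 'id_' + toString(setId)})
ON CREATE SET g.group_value = apoc.create.uuid(), g.updated_at = '1512135348335'
WITH g, members
// attach every member of the component to its group node
UNWIND members AS m
MERGE (m)-[:IS_CHILD_OF]->(g)
RETURN count(m);
```

Running the query with PROFILE should show whether the time goes into the union-find call itself or into the MERGE steps.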

querying a DB for an unknown element with a given uuid

Many years ago, I discussed with some Neo4j engineers the ability to query an unknown object given its uuid.
At that time, the answer was that there was no general db index in Neo4j.
Now, I have the same problem to solve:
each node I create has a unique id (in the form <ns:u-<uuid>-?v=n>, where ns is the namespace, uuid is a unique uuid and v=n is the version number of the element).
I'd like to have the ability to run the following cypher query:
match (n) where n.about = 'ki:u-SSD-v5.0?v=2' return n;
which actually returns nothing.
The following query
match (n:`mm:ontology`) where n.about = 'ki:u-SSD-v5.0?v=2' return n;
returns what I need, despite the fact that at query time I don't know the element type.
Can anyone help on this?
Paolo

Have you considered adding a schema index on the about attribute that covers every node in the database?
For instance
Add a global label to all nodes in the graph (e.g. Node) that do not already have it. If your graph is overly large and/or heap overly small you will need to batch this operation. Something along the lines of the following...
MATCH (n)
WHERE NOT n:Node
WITH n
LIMIT 100000
SET n:Node
After the label is added, create an index on the about attribute for your new global label (e.g. Node). The two steps can be performed in either order. The uniqueness constraint below also creates the backing index:
CREATE CONSTRAINT ON (node:Node) assert node.about IS UNIQUE
Then querying with something like the following
MATCH (n:Node)
WHERE n.about = 'ki:u-SSD-v5.0?v=2'
RETURN n;
will return the node you are seeking in a performant manner.
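
If about values are not guaranteed to be unique across the whole graph, a plain index on the global label is a possible alternative to the constraint. This is a sketch using the older index syntax that matches the constraint statement above; newer Neo4j versions spell it differently, as noted in the comment.

```cypher
// Older syntax; recent versions use: CREATE INDEX FOR (n:Node) ON (n.about)
CREATE INDEX ON :Node(about);

// PROFILE shows the execution plan, so you can check that an index seek
// is used rather than a scan over all nodes.
PROFILE
MATCH (n:Node)
WHERE n.about = 'ki:u-SSD-v5.0?v=2'
RETURN n;
```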

How to get last node created in neo4j?

I know that when you create nodes, Neo4j assigns each node an ID (a UUID, as I understand it), and that you can access a particular node by that ID. For example:
START n=node(144)
RETURN n;
How would I get the last node that was created? I know I could show all nodes and then run the same command in another query with the corresponding ID, but is there a way to do this quickly? Can I order nodes by id and limit to 1? Is there a simpler way? Either way, I have not figured out how to do so with a simple Cypher query.

A new node is not guaranteed to have a larger id than all previously created nodes,
so a better way is to set a created_at property that stores the current timestamp when the node is created.
You can use the timestamp() function to get the current timestamp.
Then:
Match (n)
Return n
Order by n.created_at desc
Limit 1
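
For completeness, here is a sketch of the write side, plus a variant of the lookup that skips nodes without the property. The Person label and name value are placeholders borrowed from the other answer. The extra filter matters because Cypher sorts null first in descending order, so a node lacking created_at would otherwise be returned.

```cypher
// store the creation time (milliseconds since epoch) on the new node
CREATE (n:Person {name: 'Foo', created_at: timestamp()})
RETURN n;

// latest node among those that actually carry a created_at value
MATCH (n)
WHERE n.created_at IS NOT NULL
RETURN n
ORDER BY n.created_at DESC
LIMIT 1;
```

Two nodes created within the same millisecond get the same timestamp, so ties are possible.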

Please be aware that Neo4j's internal node id is not a UUID. Nor is it guaranteed that a new node always has a larger id than all previously created nodes. The node id is (multiplied with some constant) the offset of the node's location within a store file. Due to space reclaiming a new node might get a lower id number.
BIG FAT WARNING: Do not make any assumptions about node ids.
Depending on your requirements, you could organize all nodes into a linked list. There is one "magic" node with a specific label, e.g. Reference, that always has a relationship to the most recently created node:
CREATE (entryPoint:Reference {to:'latest'}) // create reference node
When a node from your domain is created, you need to take multiple actions:
remove the latest relationship, if it exists
create your node
connect your new node to the previously latest node
create the reference link
MATCH (entryPoint:Reference {to:'latest'})-[r:latest]->(latestNode)
CREATE (domainNode:Person {name:'Foo'}), // create your domain node
(domainNode)-[:previous]->(latestNode), // build up a linked list based on creation timepoint
(entryPoint)-[:latest]->(domainNode) // connect to reference node
DELETE r //delete old reference link
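
With that structure in place, reading the most recently created node becomes a single hop from the reference node. A minimal sketch, assuming the Reference node and the latest and previous relationship types from the statements above:

```cypher
// the node the reference currently points to, and the one created before it
MATCH (:Reference {to:'latest'})-[:latest]->(latestNode)
OPTIONAL MATCH (latestNode)-[:previous]->(previousNode)
RETURN latestNode, previousNode;
```

Note that the create statement above starts with a MATCH on an existing latest relationship, so the very first domain node has to be linked to the reference node separately.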

I finally found the answer. The ID() function will return the neo4j ID for a node:
Match (n)
Return n
Order by ID(n) desc
Limit 1;

Inserting Data into Neo4j through neo4j rest binding Batch REST API becomes slow as more data is inserted

I am currently trying to insert a lot of data into Neo4j. Using the neo4j java-rest-binding library, I am doing batch insertions of 500 Cypher queries at a time. Currently I have at most 200k nodes and 1.4m relationships stored in my graph.
With my current data I am already experiencing request timeouts during insertion. I was wondering whether there are any configuration settings that could make inserting batch requests faster,
or maybe some improvements to the query I am currently using.
Here is a sample query:
MERGE (firstNode {id:'ABC'})
ON CREATE SET firstNode.type="RINGCODE", firstNode.created = 100, firstNode:rbt
ON MATCH SET firstNode.type="RINGCODE", firstNode:rbt
MERGE (secondNode{id:'RBT-TC664'})
WITH firstNode, secondNode OPTIONAL MATCH (firstNode)-[existing:`sku`]-()
DELETE existing
CREATE UNIQUE p = (firstNode)-[r:`sku`]-(secondNode) RETURN p;

Use labels
create an index or unique constraint for the label + property (id)
represent types with labels instead of a type property
Otherwise Neo4j has to scan all nodes to find out whether the node you want to merge is already in the database.
If you don't need the uniqueness check, you can also use CREATE, which just creates the node without checking and so doesn't slow down.
What does rbt stand for?
create constraint on (n:Rbt) assert n.id is unique;
MERGE (firstNode:Rbt:RingCode {id:'ABC'})
ON CREATE SET firstNode.created = 100
MERGE (secondNode:Rbt {id:'RBT-TC664'})
WITH firstNode, secondNode
OPTIONAL MATCH (firstNode)-[existing:`sku`]-()
DELETE existing
MERGE p = (firstNode)-[r:`sku`]-(secondNode)
RETURN p;
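
Another thing that may help with batches is sending one parameterized statement per batch instead of 500 separate statements with literal values, so the query plan can be reused. The sketch below assumes your client can pass parameters. $rows is a hypothetical list parameter introduced for illustration (older Neo4j versions write it as {rows}), and the step that deletes existing sku relationships is left out for brevity.

```cypher
// $rows is assumed to look like: [{first: 'ABC', second: 'RBT-TC664'}, ...]
UNWIND $rows AS row
MERGE (firstNode:Rbt {id: row.first})
ON CREATE SET firstNode.created = 100
MERGE (secondNode:Rbt {id: row.second})
// undirected MERGE: reuses an existing sku relationship in either direction
MERGE (firstNode)-[:`sku`]-(secondNode);
```

Combined with the constraint on :Rbt(id), each MERGE becomes an index lookup rather than a scan.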
