Node identifiers in neo4j - neo4j

I'm new to Neo4j - just started playing with it yesterday evening.
I've notice all nodes are identified by an auto-incremented integer that is generated during node creation - is this always the case?
My dataset has natural string keys so I'd like to avoid having to map between the Neo4j assigned ids and my own. Is it possible to use string identifiers instead?

Think of the node-id as an implementation detail (like the rowid of relational databases, can be used to identify nodes but should not be relied on to be never reused).
You would add your natural keys as properties to the node and then index your nodes with the natural key (or enable auto-indexing for them).
E..g in the Java API:
Index<Node> idIndex = db.index().forNodes("identifiers");
Node n = db.createNode();
n.setProperty("id", "my-natural-key");
idIndex.add(n, "id",n.getProperty("id"));
// later
Node n = idIndex.get("id","my-natural-key").getSingle(); // node or null
With auto-indexer you would enable auto-indexing for your "id" field.
// via configuration
GraphDatabaseService db = new EmbeddedGraphDatabase("path/to/db",
MapUtils.stringMap(
Config.NODE_KEYS_INDEXABLE, "id", Config.NODE_AUTO_INDEXING, "true" ));
// programmatic (not persistent)
db.index().getNodeAutoIndexer().startAutoIndexingProperty( "id" );
// Nodes with property "id" will be automatically indexed at tx-commit
Node n = db.createNode();
n.setProperty("id", "my-natural-key");
// Usage
ReadableIndex<Node> autoIndex = db.index().getNodeAutoIndexer().getAutoIndex();
Node n = autoIndex.get("id","my-natural-key").getSingle();
See: http://docs.neo4j.org/chunked/milestone/auto-indexing.html
And: http://docs.neo4j.org/chunked/milestone/indexing.html

This should help:
Create the index to back automatic indexing during batch import We
know that if auto indexing is enabled in neo4j.properties, each node
that is created will be added to an index named node_auto_index. Now,
here’s the cool bit. If we add the original manual index (at the time
of batch import) and name it as node_auto_index and enable auto
indexing in neo4j, then the batch-inserted nodes will appear as if
auto-indexed. And from there on each time you create a node, the node
will get indexed as well.**
Source : Identifying nodes with Custom Keys

According Neo docs there should be automatic indexes in place
http://neo4j.com/docs/stable/query-schema-index.html
but there's still a lot of limitations

Beyond all answers still neo4j creates its own ids to work faster and serve better. Please make sure internal system does not conflict between ids then it will create nodes with same properties and shows in the system as empty nodes.

the ID's generated are default and cant be modified by users. user can use your string identifiers as a property for that node.

Related

Idempotency check in Neo4J

Does anybody know how to insert into Neo4J based on the existing data? This is part of data ingestion.
For example, there is already a node in neo4j with an attribute updatedAt. Whenever a new pubsub event to ingest data is received, I need to check for the updatedAt attribute of both incoming data and the existing node in neo4j and decide which data needs to be discarded.
In the end, neo4j should have a node with the latest updatedAt value.
Is there any way other than Triggers, APOC?
Please help. I am really new to Neo4j. Thanks
I think that what you are looking for is Cypher MERGE query and more precisely MERGE query with ON CREATE and ON MATCH. For example:
MERGE (n:TYPE_OF_NODE { id: 'ID_OF_NODE_TO_BE_FOUND' })
ON CREATE SET
n.createdAt = INPUT_DATE,
.../* Set other attributes if needed */
ON MATCH SET n.updatedAt =
CASE WHEN n.updatedAt < INPUT_DATE THEN INPUT_DATE ELSE n.updatedAt END,
... /* Set other attributes if needed */
RETURN n
The query above checks if a node of a given type TYPE_OF_NODE and with id equal to ID_OF_NODE_TO_BE_FOUND exists. By INPUT_DATE I mean some date that was received together with an event to ingest data.
If a node does not exist, it will be created and some attributes will be set
(see ON CREATE SET).
If it exists, then it will be updated (see
ON MATCH SET). Specifically, updateAt attribute will be set to
the later value and for that I used CASE statement.

Auto increment id Neo4j to retrieve elements in insert order

Recently, I am experimenting Neo4j. I like the idea but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
While to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned respecting the query insertion order (through the id function).
In fact the id of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto incremental id but it seems a little strange his mechanism. How do I return items respecting the query insertion order? I thought about creating a timestamp attribute for each node but I don't think it's the best choice
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequence numbers you need just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
The id of Neo4j is maintained internally, which your business code should not depend on.
Generally it's auto incrementally, but if there is delete operation, you may reuse the deleted id according to the Reuse Policy of Neo4j Server.

Neo4j SDN 4 emulate sequence object(not UUID)

Is it possible in Neo4j or SDN4 to create/emulate something similar to a PostgreSQL sequence database object?
I need this thread safe functionality in order to be able to ask it for next, unique Long value. I'm going to use this value as a surrogate key for my entities.
UPDATED
I don't want to go with UUID because I have to expose these IDs within my web application url parameters and in case of UUID my urls look awful. I want to go with a plain Long values for IDs like StackOverflow does, for example:
stackoverflow.com/questions/42228501/neo4j-sdn-4-emulate-sequence-objectnot-uuid
This can be done with user procedures and functions. As an example:
package sequence;
import org.neo4j.procedure.*;
import java.util.concurrent.atomic.AtomicInteger;
public class Next {
private static AtomicInteger sequence = new AtomicInteger(0);
#UserFunction
public synchronized Number next() {
return sequence.incrementAndGet();
}
}
The problem of this example is that when the server is restarted the counter will be set to zero.
So it is necessary to keep the last value of the counter. This can be done using these examples:
https://maxdemarzi.com/2015/03/25/triggers-in-neo4j/
https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/master/src/main/java/apoc/trigger/Trigger.java
No. As far as I'm aware there isn't any similar functionality to sequences or auto increment identifiers in Neo4j. This question has also been asked a few times in the past.
The APOC project might be worth checking out for this though. There seems to be a request to add it.
If your main interest is in having a way to generate unique IDs, and you do not care if the unique IDs are strings, then you should consider using the APOC facilities for generating UUIDs.
There is an APOC function that generates a UUID, apoc.create.uuid. In older versions of APOC, this is a procedure that must be invoked using the CALL syntax. For example, to create and return a single Foo node with a new UUID:
CREATE (f:Foo {uuid: apoc.create.uuid()})
RETURN f;
There is also an APOC procedure, apoc.create.uuids(count), that generates a specified number of UUIDs. For example, to create and return 5 Foo nodes with new UUIDs:
CALL apoc.create.uuids(5) YIELD uuid
CREATE (f:Foo {uuid: uuid})
RETURN f;
The most simplest way in Neo4j is to disable ids reuse and use node Graph ID like sequencer.
https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
Table A.83. dbms.ids.reuse.types.override
Description: Specified names of id types (comma separated) that should be reused. Currently only 'node' and 'relationship' types are supported.
Valid values: dbms.ids.reuse.types.override is a list separated by "," where items are one of NODE, RELATIONSHIP
Default value: [RELATIONSHIP, NODE]

Adding index on neo4j node property value

I have imported freebase dump to neo4j. But currently i am facing issue with get queries because of size of db. While import i just created node index and indexed URI property to index for each node. For each node i am adding multiple properties like label_en, type_content_type_en.
props.put(URI_PROPERTY, subject.stringValue());
Long subjectNode = db.createNode(props);
tmpIndex.put(subject.stringValue(), subjectNode);
nodeIndex.add(subjectNode, props);
Now my cypher queries are like this. Which are timing out. I am unable to add index on label_en property. Can anybody help?
match (n)-[r*0..1]->(a) where n.label_en=~'Hibernate.*' return n, a
Update
BatchInserter db = BatchInserters.inserter("ttl.db", config);
BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(db);
BatchInserterIndex index = indexProvider.nodeIndex("ttlIndex", MapUtil.stringMap("type", "exact"));
Question: When i have added node in nodeindex i have added with property URI
props.put(URI_PROPERTY, subject.stringValue());
Long subjectNode = db.createNode(props);
nodeIndex.add(subjectNode, props);
Later in code i have added another property to node(Named as label_en). But I have not added or updated nodeindex. So as per my understanding lucene does not have label_en property indexed. My graph is already built so i am trying to add index on label_en property of my node because my query is on label_en.
Your code sample is missing how you created your index. But I'm pretty sure what you're doing is using a legacy index, which is based on Apache Lucene.
Your Cypher query is using the regex operator =~. That's not how you use a legacy index; this seems to be forcing cypher to ignore the legacy index, and have the java layer run that regex on every possible value of the label_en property.
Instead, with Cypher you should use a START clause and use the legacy indexing query language.
For you, that would look something like this:
START n=node:my_index_name("label_en:Hibernate.*")
MATCH (n)-[r*0..1]->(a)
RETURN n, a;
Notice the string label_en:Hibernate.* - that's a Lucene query string that says to check that property name for that particular string. Cypher/neo4j is not interpreting that; it's passing it through to Lucene.
Your code didn't provide the name of your index. You'll have to change my_index_name above to whatever you named it when you created the legacy index.

Neo4j 2.0 Merge with unique constraints performance bug?

Here's the situation:
I have a node that has a property ContactId which is set as unique and indexed. The node label is :Contact
(node:Contact {ContactId:1})
I have another node similar to that pattern for Address:
(node2:Address {AddressId:1})
I now try to add a new node that (among other properties, includes ContactId (for referencing))
(node3:ContactAddress {AddressId:1,ContactId:1})
When I run a merge command for each, the time for adding a node that contains a property that is set as unique in another node type seems to make the process much slower.
The ContactAddress node only contains relational properties between the Contact and Address nodes. Contact and Address nodes contain up to 10 properties each. Is this a bug, where Neo4j is check the property key -> value -> then node label?
Code and screenshot below:
string strForEach = string.Format("(n in {{{0}}} |
MERGE (c:{1} {{{2} : n.{2}}}) SET c = n)", propKey, label, PK_Field);
var query = client
.Cypher
.ForEach(strForEach)
.WithParam(propKey, entities.ToList());
Constraint checks are more expensive than just inserts. They also take a global lock on the constraint to prevent multiple insertion.
I saw you don't use parameters, but string substitiution, I really recommend to change that and go with parameters.
Also setting the whole node c to n triggers constraint check again.
Your probably want to use the ON CREATE SET clause of MERGE
(n in {nodes} |
MERGE (c:Label {key : n.key}}) ON CREATE SET c.foo = n.foo, c.bar = n.bar )

Resources