Is node reference equality in embedded neo4j guaranteed? - neo4j

I am using an embedded graph database as part of a Java application. Suppose that I run a Cypher query and get back an ExecutionResult that contains a collection of nodes.
These nodes may be assumed to form a connected graph.
Each of these nodes has some relationships, which I can access using node.getRelationships(Direction.OUTGOING). My question is: if the target of one of these relationships already occurs in the ExecutionResult (i.e. the relationship is part of the query template), is it guaranteed that relationship.getEndNode() == nodeX?
I suppose that what I am really asking is: when a transaction in Neo4j returns a node, does it return just the one object, with different queries returning references to that same object, or does it keep producing new objects that happen to refer to the same data point? Since Node doesn't override the equals method, I have been assuming the former, but I was hoping someone could tell me.

No, nodes are not guaranteed to be reference-equal. You only get NodeProxy objects, which are created on the fly by each operation, so two proxies for the same node can be distinct objects.
However, the equals() method compares node ids, so you should use that:
n1.equals(n2)
or, if you keep the node id around, use
n1.getId() == n2.getId()
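To make the difference between reference equality and id equality concrete, here is a minimal sketch in plain Java. The NodeHandle class and the lookup method are hypothetical stand-ins for a proxy object and an API call that returns one; they are not Neo4j's actual classes, and the sketch only assumes that a fresh object is created per lookup, as described above.

```java
// Hypothetical stand-in for a proxy such as NodeProxy; NOT Neo4j's real class.
class NodeEqualityDemo {
    static final class NodeHandle {
        private final long id;

        NodeHandle(long id) { this.id = id; }

        long getId() { return id; }

        // Equality is defined by the node id, not by object identity.
        @Override
        public boolean equals(Object o) {
            return o instanceof NodeHandle && ((NodeHandle) o).id == id;
        }

        @Override
        public int hashCode() { return Long.hashCode(id); }
    }

    // Simulates an operation that builds a new handle object on every call.
    static NodeHandle lookup(long id) { return new NodeHandle(id); }

    public static void main(String[] args) {
        NodeHandle n1 = lookup(42L);
        NodeHandle n2 = lookup(42L);
        System.out.println(n1 == n2);                 // false: two distinct objects
        System.out.println(n1.equals(n2));            // true: same id
        System.out.println(n1.getId() == n2.getId()); // true
    }
}
```

Because hashCode() is also based on the id in this sketch, such handles would behave consistently as keys in a HashSet or HashMap.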

When you create a node, Neo4j internally assigns it a node id. Every relationship you create holds a reference to its start node id and end node id.
To check this:
First, create a node and save its node id by calling node.getId().
Then create a relationship to it from another node, and call relationship.getEndNode().getId().
You will see that the node ids are the same.

It sounds like you're asking whether Neo4j gives you concurrency control of database entities 'out of the box', the way NHibernate or Entity Framework does for SQL.
The answer is no! You will have to manage it yourself. If you do develop it, though, it could make you a few bob.

Related

Neo4j unique IDs by tree with root node counter?

Is using a tree with a counter on the root node, to be referenced and incremented when creating new nodes, a viable way of managing unique IDs in Neo4j? In a previous question on performance on this forum (Neo4j merge performance VS create/set), the approach was described, and it occurred to me it may suggest a methodology for unique ID management without having to extend the Neo4j database (and support that extension). However, I noticed this approach has not been mentioned in other discussions on best practice for unique ID management (Best practice for unique IDs in Neo4J and other databases?).
Can anyone help validate or reject this approach?
Thanks!
You can just create a singleton node (I'll give it the label IdCounter in my example) to hold the "next-valid ID counter" value. There is no need for it to be part of any "tree" or for it to have any relationships at all.
When you create the singleton, initialize it with the first id value that you want to use. For example:
CREATE (:IdCounter {nextId: 1});
Here is a simple example of how to use it when creating a new node.
MATCH (c:IdCounter)
CREATE (x {id: c.nextId})
SET c.nextId = c.nextId + 1
RETURN x;
Since all Cypher queries are transactional, if the node creation did not happen for any reason, then the nextId increment would also not be done, so you should never end up with any gaps in assigned id numbers.
However, to avoid re-using the same id number, you would have to write your queries carefully to ensure that the increment always happens whenever you create a new node (using CREATE, CREATE UNIQUE, or MERGE).
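The "no gaps" argument above can be illustrated with a small in-memory sketch in Java. It is only an analogy: the IdCounterDemo class, its fields, and the simulated failure flag are all hypothetical, and the hand-written rollback stands in for what a database transaction would do automatically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory analogy of the IdCounter singleton; NOT Neo4j API.
class IdCounterDemo {
    private long nextId = 1;                             // plays the role of c.nextId
    private final List<Long> nodes = new ArrayList<>();  // ids of nodes created so far

    // Creates a node with the next id and increments the counter as one unit.
    // If 'fail' is true, both changes are undone, so no id is skipped.
    synchronized long createNode(boolean fail) {
        long savedNextId = nextId;
        int savedSize = nodes.size();
        try {
            long id = nextId;
            nodes.add(id);
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            nextId = id + 1;
            return id;
        } catch (RuntimeException e) {
            // Manual rollback: restore the counter and drop the half-created node.
            nextId = savedNextId;
            while (nodes.size() > savedSize) {
                nodes.remove(nodes.size() - 1);
            }
            throw e;
        }
    }

    List<Long> ids() { return new ArrayList<>(nodes); }

    public static void main(String[] args) {
        IdCounterDemo counter = new IdCounterDemo();
        counter.createNode(false);
        try {
            counter.createNode(true);
        } catch (IllegalStateException expected) {
            // the failed creation leaves the counter untouched
        }
        counter.createNode(false);
        System.out.println(counter.ids()); // [1, 2]
    }
}
```

The failed attempt neither creates a node nor consumes an id, which is the property the answer relies on.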

Cypher: Create relationships between nodes based on a common property key id

I'm brand new to Cypher (and Stack Overflow) and am having trouble creating relationships between nodes based on shared property keys.
I would like to do something like this:
MATCH (a:Person)-->()<--(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
to create a relationship between Country node and Person nodes where they share the same id.
The above creates no errors when run but doesn't create any relationships either and I know that the ids should match.
Many thanks!!
EDIT:
I think I know what is going wrong: I'm asking to match nodes that already have a relationship to each other, but no relationships are set up yet, hence 0 results. I have now tried:
MATCH (a:Person),
(b:Country)
WHERE HAS (a.id) AND HAS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
and the query is running. It's a big data set, so it might take a while...
That worked. I had to reduce the size of my data set (down from 64k nodes) because Neo4j was taking far too long to process it, but once I had a smaller set it worked fine.
One minor addition for future Googlers.
Per the help files as of version 3.4:
The has() function has been superseded by exists() and has been removed.
The new code should read:
MATCH (a:Person),
(b:Country)
WHERE EXISTS (a.id) AND EXISTS (b.id) AND a.id=b.id
CREATE (a)-[:LIVES]->(b);
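As a rough illustration of what the query above computes, here is a plain Java sketch that pairs items whose id is present on both sides and equal. The Item class and the sample names are hypothetical, and this is not how Neo4j executes the query. The sketch groups countries by id first instead of comparing every Person with every Country; comparing all pairs grows with the product of the two set sizes, which may be why the larger data set was slow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins for :Person and :Country nodes; NOT Neo4j API.
class JoinByIdDemo {
    static final class Item {
        final String name;
        final Long id; // null models a node without an id property

        Item(String name, Long id) {
            this.name = name;
            this.id = id;
        }
    }

    // Returns "person->country" for every pair whose ids are both present and equal.
    static List<String> livesPairs(List<Item> persons, List<Item> countries) {
        // Group countries by id once, so each person needs only one map lookup.
        Map<Long, List<Item>> countriesById = new HashMap<>();
        for (Item c : countries) {
            if (c.id != null) {
                countriesById.computeIfAbsent(c.id, k -> new ArrayList<>()).add(c);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (Item p : persons) {
            if (p.id == null) {
                continue; // corresponds to the EXISTS(a.id) check
            }
            for (Item c : countriesById.getOrDefault(p.id, Collections.<Item>emptyList())) {
                pairs.add(p.name + "->" + c.name);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Item> persons = Arrays.asList(
                new Item("Ann", 1L), new Item("Bob", 2L), new Item("Cy", null));
        List<Item> countries = Arrays.asList(
                new Item("Chile", 1L), new Item("Fiji", 3L), new Item("Nowhere", null));
        System.out.println(livesPairs(persons, countries)); // [Ann->Chile]
    }
}
```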

Assumptions regarding Node ID strings in Neo4j - cypher

In my recent question, Modeling conditional relationships in neo4j v.2 (cypher), the answer has led me to another question regarding my data model and the Cypher syntax to represent it. Let's say that in my model there is a node CLT1 that I'll call the Source node. CLT1 has relationships to 286 other Target nodes. This is a model of a target node:
CREATE
(Abnormally_high:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{Prop1:'x',Prop2:'y',Prop3:'z'})
Key point: I am assuming that the string after the CREATE clause (in this case, the phrase "Abnormally_high") is:
(1) the ID of this target node;
(2) significant, because its content has domain-specific meaning and is queryable.
I made this assumption based on the movie database example.
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
The first strings after CREATE definitely have domain-specific meaning!
In my earlier post I discuss Problem 2. I find that Problem 2 arises because, among the 286 target nodes, there are many cases where at least one other Target node shares the identical ID. In this instance, the ID is "Abnormally_high". The other Target nodes may differ in the value of any of Label1 - Label10 or the associated properties.
Apparently, Cypher doesn't like that. In Problem 2, I was discussing ways to deal with the fact that Cypher doesn't like the same node ID being used multiple times, even though the labels or properties are different.
My problem is my assumptions about the Target node ID.
AM I RIGHT?
I am now thinking that I could instead use this....
CREATE (CLT1_target_1:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{name:'Abnormally_high',Prop2:'y',Prop3:'z'})
If indeed the first string after the CREATE clause is an ID, then all I have to do is put a unique target node identifier.... like CLT1_target_1 and increment up to CLT1_target_286. If I do this, then I can have the name as a property and change whatever label or property I want.
Do I have this right?
You are wrong. In Cypher, a node name (like "Abnormally_high") is just a variable name that exists for the lifetime of the query (and sometimes not even that long). The node name used in a Cypher query is never persisted in any way, and can be any arbitrary string.
Also, in neo4j, the term "ID" has a specific meaning. The neo4j DB will automatically assign a (currently) unique integer ID to each new node. You have no control over the ID value assigned to a node. And when a node is deleted, neo4j can reassign its ID to a new node.
You should read the neo4j manual (available at docs.neo4j.org), especially the section on Cypher, to get a better understanding.
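To see why a reassignable internal ID makes a poor application-level key, consider this minimal Java sketch of an id allocator that hands freed ids out again. The IdReuseDemo class and its free-list strategy are assumptions made purely for illustration; they are not a description of how Neo4j actually allocates ids.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical allocator that recycles freed ids; NOT Neo4j's actual implementation.
class IdReuseDemo {
    private long nextUnused = 0;                          // next id never handed out before
    private final Deque<Long> freed = new ArrayDeque<>(); // ids released by deletions

    // Prefers a freed id, otherwise hands out a brand-new one.
    long allocate() {
        return freed.isEmpty() ? nextUnused++ : freed.pop();
    }

    // Called when the entity holding this id is deleted.
    void release(long id) {
        freed.push(id);
    }

    public static void main(String[] args) {
        IdReuseDemo ids = new IdReuseDemo();
        long first = ids.allocate();   // 0
        long second = ids.allocate();  // 1
        ids.release(first);            // the first entity is deleted
        long third = ids.allocate();   // 0 again, now naming a different entity
        System.out.println(first + " " + second + " " + third); // 0 1 0
    }
}
```

Any external record that stored the first id would now silently point at the third entity, which is why a domain-specific property (such as the name property proposed in the question) is the safer thing to query on.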

How to create a node with id 0?

I deleted the reference node, so I need to recreate it.
Using Cypher, how can I create a node with id 0?
Thanks.
The short answer is you can't, and you don't need to. Do you have a specific problem without that node? If so, maybe you can elaborate, chances are there is something else that answers your problem better than trying to recreate a node with a specific id.
The long answer is that you can't assign ids to nodes with Cypher. The id is an index or offset into the node storage on disk, so it makes sense to let Neo4j worry about it and not try to manipulate it or include it in any application logic. See Node identifiers in neo4j and Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?.
You also most likely don't need a reference node. It is created by default in a new database, but its use is deprecated and it won't exist in future releases. See Is concept of reference node in neo4j still used or deprecated?.
If you still want to assign ids to the nodes you create, it is accidentally possible in a roundabout way with the CSV batch importer and, I believe, with the Java API batch inserter.
If you still want to recreate or simulate the reference node, you can either delete the database data files and let Neo4j recreate the database, or you can try what this person did: Recreate reference node in a Neo4j database. You can also force Neo4j to recycle the ids of deleted nodes faster, so that new nodes that you create receive those ids that have been freed up and not yet reassigned.

Neo4j Key-Value List recommended implementation

I've been using Neo4j for a little while now and have an app up and running on it. It's all working really well, and Neo4j has been really good at solving this problem. I now need to extend the app and have been trying to implement a key-value list of data in Neo4j, but I'm not sure of the best way to go about it.
I have a list of around 7 million elements, which is a bit long for just storing the whole list in memory and managing it myself. I tested this and it would consume 3 GB.
My choices are either:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. I'm a little averse to doing this, as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j by creating a node per key-value pair, setting the key and value as properties on the node. There would be no relationships, only an index to group these nodes together, with the key of the key-value pair exposed as the key on the index.
(c) Create a single node and hold all key-values as properties. This feels wrong, because getting the node will load the whole thing into memory, then I'd have to search the properties for the one I'm interested in, and I might as well manage the list myself.
(d) The key is a two-part key that actually points to two nodes, so create a relationship and set the value as a property on the relationship. I started down this path, but doing a lookup for a specific key/value is not simple or fast, so I backed away from this.
Either option (a) or (b) feels like the way to go.
Any advice would be appreciated.
Example scenario
We have Node A and Node B, with a relationship between the two nodes.
The nodes all have a property 'foo', with foo having some value.
In this example, Node A has foo=X and Node B has foo=Y.
We then have this list of K/Vs. One of those K/Vs is Key:X+Y=Value:Z.
So the original idea was to create another relationship between Node A and Node B and store a property on the relationship holding Z, then create an index on 'foo' and a relationship index on the new relationship.
The task: given key X+Y, get the value.
The lookup logic would be: get Node A (from X) and Node B (from Y), then walk through Node A's relationships to Node B, looking for this new relationship type.
While this will work, I do not like the fact that I have to look through all relationships to/from the nodes for a specific type; this is inefficient, especially if there are many relationships of different types.
So the conclusion is either to go with option (a) or (b), or that I'm trying to do something impractical with Neo4j.
Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the IDs of vertices a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If the vertices a and b themselves are known ahead of time, then it's:
a.outE(label).filter{it.inVertex == b}.prop
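For readers who think in Java rather than Gremlin, the last traversal can be sketched with plain collections: take the outgoing edges of a, keep those with the wanted label whose end vertex is b, and read the property. The EdgeLookupDemo class, the Edge fields, and the "KV" and "KNOWS" labels are hypothetical names for illustration; this is not the Neo4j or Gremlin API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory graph; mirrors a.outE(label).filter{it.inVertex == b}.prop
class EdgeLookupDemo {
    static final class Edge {
        final String label;
        final String to;
        final String prop;

        Edge(String label, String to, String prop) {
            this.label = label;
            this.to = to;
            this.prop = prop;
        }
    }

    // Outgoing edges, grouped by start vertex.
    private final Map<String, List<Edge>> outgoing = new HashMap<>();

    void addEdge(String from, String label, String to, String prop) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(label, to, prop));
    }

    // Walks the outgoing edges of 'from' and returns the property of the first
    // edge with the given label that ends at 'to', or null if there is none.
    String findProp(String from, String label, String to) {
        for (Edge e : outgoing.getOrDefault(from, Collections.<Edge>emptyList())) {
            if (e.label.equals(label) && e.to.equals(to)) {
                return e.prop;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        EdgeLookupDemo g = new EdgeLookupDemo();
        g.addEdge("X", "KNOWS", "Y", "unrelated");
        g.addEdge("X", "KV", "Y", "Z");
        System.out.println(g.findProp("X", "KV", "Y")); // Z
    }
}
```

The cost of the lookup in this sketch is proportional to the number of outgoing edges on the start vertex, which is the inefficiency the question worries about when a node has many relationships of different types.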
