neo4j not reusing existing vertex in cypher create unique query

neo4j not reusing existing vertex in cypher create unique query - neo4j

A portion of my neo4j graph represents objects, their values, and the attributes associated with those values. To use common programming syntax, I'm persisting something like the result of:
Object.Attribute = Value
where Object, Attribute and Value are all nodes, and they're linked by VALUE and ATTRIBUTE relationships like this:
Object-[:VALUE]->Value-[:ATTRIBUTE]->Attribute
To describe this with a specific example, the result of this code:
Object.Colour = 'Red'
would be persisted as:
Object-[:VALUE]->(Value { value:'Red' })-[:ATTRIBUTE]->(Attribute { name:'Colour' })
The problem occurs when I want to modify a persisted state like that above, and I wish to re-use existing Attribute (and ideally, Value) nodes - that is, I don't want to have multiple instances of the Attribute { name:'Colour' } node, I want to have a single instance that is related to each Value node instance.
The following Cypher query will go ahead and create new Value and Attribute nodes every time, regardless of whether identical nodes already exist:
start o=node(something)
create unique o-[:VALUE]->(v {value:'Green'})-[:ATTRIBUTE]->(a {name:'Colour'})
return v;
The following will apparently recycle both Value and Attribute nodes, but of course won't work when the required Attribute doesn't exist (i.e. the first time it's used):
start o=node(something), a=node(something)
create unique o-[:VALUE]->(v {value:'Green'})-[:ATTRIBUTE]->a
where a.name = 'Colour'
return v;
The statement in the documentation that "create unique will always make the least change possible to the graph — if it can use parts of the existing graph, it will", doesn't appear to be completely true, and I don't understand why my query doesn't exhibit this behaviour.
How can I get the "recycling" effect of the latter query, combined with the creation on demand of the Attribute (and Value) when required like the former?

The first problem is that you are building a property graph in a property graph. It's possible, but awkward and not always a good choice.
Your example:
http://console.neo4j.org/r/evl77k
If understood you correctly, you want to be able to change the value node and re-use the same attribute node, right?
The problem is that in your second query, you ask Cypher to find value node with a different property than what is already in your graph. Cypher can't find such a node, and so creates one for you. It then tries to find an outgoing ATTRIBUTE relationship from the newly created node, and of course it doesn't find any. So it creates a new relationship and attribute for you.
If you want to keep using the same attribute node, you simply omit the property value from the value node, like so:
START o=node(0)
CREATE UNIQUE o-[:VALUE]->(v)-[:ATTRIBUTE]->(a {name:'Colour'})
SET v.value = 'Green'
RETURN v
This will find you Colour attribute node, and then set the property for you, instead of creating new paths every time.
Makes sense?
Andrés

Related

Cyper query- Property value change propagation

Hi,
In the above graph, we have a scenario where in any one of value property of a node is updating, the effect of that value, to be propagated to the remaining nodes. How should this value change event be propagated thru' the cypher query.
Appreciate your support

Is the requirement that this particular property should always be the same for this group of nodes? If it must be the same, then I would recommend extracting it into a node instead, and create relationships to that node from all nodes that should be using it.
With the value in a single place, it will only require a single property change on that node and everything will be in the right state.
EDIT
Requirements are rather fuzzy, so my answer will be fuzzy as well.
If you're matching based upon relationship types, then you'll want some kind of multiplicity on the relationship and maybe specifying allowed types in the match. Such as:
MATCH (start:RNode)-[:R45|R34|R23|R12*]->(r:RNode)
WHERE start.ID = 123 (or however you're matching on your start node)
That will match on every single node from your startNode up the relationship chain until there are no more of the allowed relationships to continue traversing.
If you need a more complicated expansion, you may want to look at the APOC Procedure library's Path Expander.
After you find the right matching query, then it should just be a matter of doing the recalculation for all matched nodes.

Do labels and properties have an id value in Neo4j?

I know that nodes and relationships have an integral value that identifies them. Is the same true of labels and/or properties? Is it sufficient to identify a node labeling by giving a node id and a string? That is, is it possible to assign the same label to the same node/relationship more than once?

All nodes and relationships have IDs and can be looked up by their IDs. You can do it like this:
MATCH (n) WHERE id(n) = 5 RETURN n;
IDs should be thought of as an internal implementation detail; don't rely on them to have any particular value or to be consistent or ordered, just rely on them to uniquely identify each node.
In general, IMHO it's good practice to assign your own meaningful identifier to nodes, probably indexed, that you can use to find your nodes, in the same way you'd assign a primary key to a relational database record.
It is possible to assign a label to more than one node; labels should be thought of more as classes of nodes, sort of like entities in an ERD. Typically labels would be things like Person, Company, Job, etc. Labels don't have much to do with identifiers though.

Do labels and properties have an id value in Neo4j?
No
Is it sufficient to identify a node labeling by giving a node id and a string?
If by this you mean that you have the node id and a label value in a String variable then yes, it is. It is also sufficient to just use the ID.
MATCH (n) WHERE ID(n) = 1234 RETURN n
Better:
MATCH (n:YourLabel) WHERE ID(n) = 1234 RETURN n
As FrobberOfBits intimated it is also preferable to ignore (withint reason) the internal IDs that Neo attaches to Nodes/Relationships in your external interactions. This is in part to do with Neo recycling it's internal identifiers (So if you create a Node it gets assigned ID 1, delete that Node, the next created Node could be assigned ID 1), and inpart to do with exposing the internals of the system. Instead you should probably attach meaningful identifiers or UUIDS where required.
Using your own identifier:
MATCH (n:YourLabel{uid:1234}) RETURN n
To make lookups fast, index them:
CREATE INDEX ON :YourLabel(uid)
Is it possible to assign the same label to the same node/relationship more than once?
Nodes can have as many labels as you want to assign them (including none), which is handy when you want to maintain a hierarchy of Node "types". Having a label makes them faster to lookup as Neo has a hint of where to start.
Relationships can only have a single type and that type is immutable. i.e If you want to change from type HAS_A to HAD_A you cannot just change the type, you must delete and re-add the relationship.
A node can be related to another node as many times as you want, using the same relationship type or different relationship types. Performancewise it is better to have different relationship types than to use properties on relationships as the lookup is faster.
CREATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_PET{type:"Cat"}]->(m),
(p)-[:HAS_PET{type:"Dog"}]->(d)
Is fine, as is:
REATE (p:Person{name:"Dave"}), (m:Pet), (d:Pet),
(p)-[:HAS_CAT]->(m),
(p)-[:HAS_DOG]->(d)
Which is now faster if you wanted to do a query for matching all dogs. If Dave's dog gets reassigned you cannot do:
MATCH (p:Person{name:"Dave"})-[rel:HAS_DOG]->()
SET rel:HAS_CAT
But you could do:
MATCH (p:Person{name:"Dave"})<-[rel:HAS_PET{type:"Dog"}]-()
SET rel.type = "Cat"
But I've probably missed the point of the multiple-assignment question.

Not sure what you're aiming for:
yes, property-names, rel-types and labels have internal id's they are not stored as strings
you can identify nodes by label + property + value, one if you have a uniqueness constraint otherwise multiple
you can identify nodes by label (many)

Assumptions regarding Node ID strings in Neo4j - cypher

In my recent question, Modeling conditional relationships in neo4j v.2 (cypher), the answer has led me to another question regarding my data model and the cypher syntax to represent it. Lets say in my model, there is a node CLT1 that is what I'll call the Source node. CLT1 has relationships to other 286 Target nodes. This is a model of a target node:
CREATE
(Abnormally_high:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{Pro1:'x',Prop2:'y',Prop3:'z'})
Key point: I am assuming the string after the CREATE clause is
The ID of this target node
The ID is significant because its content has domain-specific meaning
and is query-able.
in this case its the phrase ...."Abnormally_high".
I made this assumption based on the movie database example.
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
The first strings after CREATE definitely have domain-specific meaning!
In my earlier post I discuss Problem 2. I find that problem 2 arises because among the 286 target nodes, there are many instances where there was at least one more Target node who shares the identical ID. In this instance, the ID is "Abnormally_high". The other Target nodes may differ in the value of any of Label1 - Label10 or the associated properties.
Apparently, Cypher doesn't like that. In Problem 2, I was discussing the ways to deal with the fact that cypher doesn't like using the same node ID multiple times even though the labels or properties were different.
My problem are my assumptions about the Target node ID.
AM I RIGHT?
I am now thinking that I could instead use this....
CREATE (CLT1_target_1:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{name:'Abnormally_high',Prop2:'y',Prop3:'z'})
If indeed the first string after the CREATE clause is an ID, then all I have to do is put a unique target node identifier.... like CLT1_target_1 and increment up to CLT1_target_286. If I do this, then I can have the name as a property and change whatever label or property I want.
Do I have this right?

You are wrong. In Cypher, a node name (like "Abnormally_high") is just a variable name that exists for the lifetime of the query (and sometimes not even that long). The node name used in a Cypher query is never persisted in any way, and can be any arbitrary string.
Also, in neo4j, the term "ID" has a specific meaning. The neo4j DB will automatically assign a (currently) unique integer ID to each new node. You have no control over the ID value assigned to a node. And when a node is deleted, neo4j can reassign its ID to a new node.
You should read the neo4j manual (available at docs.neo4j.org), especially the section on Cypher, to get a better understanding.

Is node reference equality in embedded neo4j guaranteed?

I am using an embedded graph database as part of a java application. Suppose that I carry out some type of cypher query, and return an ExecutionResult which contains a collection of nodes.
These nodes may be assumed to form a connected graph.
Each of these nodes has some relationships, which I can access using node.getRelationships(Direction.OUTGOING). My question is, if the target of one of these relationships already occurs in the Execution result (i.e. the relationship is part of the query template), is it guaranteed that Relationship.getEndPoint == Node X.
I suppose that what I am really asking is, when a transaction in Neo4j returns a node, does it return just the one object, and different queries will just keep returning references to that one object, or does it keep producing new objects which happen to refer to the same data point? Since Node doesn't override the equalsTo method, I have been assuming the former, but I was hoping someone could tell me.

Nodes are not reference-equals. You'll only get NodeProxy objects which are created on the fly in different operations.
But the equals()-method does id-equality so you should use that.
n1.equals(n2)
or if you keep the node id around use
n1.getId() == n2.getId()

See when you create a node neo4j internally assigns it a node-id. All the relationships you create will have reference to the start node id and end node id.
For checking do this
First create a node and save its node id by calling method node.getId()
Now create a relationship to it from another node. And call your relationship.getEndNode().getId() .
You will see the node-ids are same.

It sounds like your asking - does Neo 'out of the box' give concurrency control of database entities, like n-hibernate or entity framework does for SQL.
The answer is no! You will have to manage it yourself. If you do delelop it though, could make you a few bob

Combining two neo4j cypher calls into one call

I have the following two cypher calls that I'd like to combine into one;
start r=relationship:link("key:\"foo\" and value:\"bar\"") return r.guid
This returns a relationship that contains a guid that I need based on a key value pair (in this case key:foo and value:bar).
Lets assume r.guid above returns 12345.
I then need all the property relationships for the object in question based on the returned guid and a property type key;
start r=relationship:properties("to:\"12345\" and key:\"baz\"") return r
This returns several relationships which have the values I need, in this case all property types baz that belong to guid 12345.
How do I combine these two calls into one? I'm sure its simple but I'm stumbling..

The answer I've gotten is that there is no way to perform an index lookup in the middle of a Cypher query, or to use a variable you have declared to perform the lookup.
Perhaps in later version of Cypher, as this ability should be standard especially with the dense node issue and the suggested solution of indexing.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart