Create a node in neo4j if not present. - neo4j

Is it possible to create a node only if it not present in the graph.
Example node A is already present, So my query should Check if node A is already present if not create a node. I don't want to use constraint here.
It's needed for load data from mysql without duplicate entries.

Yes, you want the MERGE keyword:
MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a combination of MATCH and CREATE that additionally allows you to specify what happens if the data was matched or created.
For example, you can specify that the graph must contain a node for a user with a certain name. If there isn’t a node with the correct name, a new node will be created and its name property set.
Use whichever columns that make your rows in MySQL unique.
http://neo4j.com/docs/stable/query-merge.html

Related

MERGE the Creation of Nodes from Two geohash Columns in CSV

So I am planning to create a geohash Graph with neo4j.
my CSV contains ,for each row, two informations for geohash one for pickup and another for dropoff as follow :
What I want is:
the node that have the same geohash as another one shouldn't be recreated (so multiple edges are allowed).
one node could be a pickup and a dropoff in the same time
I tried to use MERGE but works by columns:
load csv from "file:///green_data.csv" as line
merge(pick:pickup{geohash:line[20]})merge (drop:dropoff{geohash: line[22]})merge(pick)-[:trip]->(drop)
as you can see , the same geohash dr5rkky node is being created twice one for pickups and another for dropoffs
how to avoid that ?
load csv from "file:///green_data.csv" as line MERGE(p:HashNode {geohash: line[20]}) ON CREATE set p.pickup=True ON MATCH set p.pickup=True MERGE(d:HashNode {geohash: line[22]}) ON CREATE set d.dropoff=True ON MATCH set d.dropoff=True MERGE (p)-[:trip]->(d)
Base on neo4j docs:
MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a combination of MATCH and CREATE that additionally allows you to specify what happens if the data was matched or created.
The last part of MERGE is the ON CREATE and ON MATCH. These allow a query to express additional changes to the properties of a node or relationship, depending on if the element was MATCH -ed in the database or if it was CREATE -ed.

How to handle cypher query common stanzas

I'm writing a bunch of queries in order to build a tree inside Neo4j, but in order to add different types of new data, I'm writing the same opening stanzas for each of my queries.
Example: I want to be able to add Root(identifier=Root1)->A(identifier=1)->B(identifier=2)... without modifying the trees pointed to by other roots.
All of my queries start off with
Match
(root:`Root` {identifier=$identifier})
Create
(root)-[:`someRel`]->(a:`A` {identifier=$a_identifier})
Then some time passes and A needs a child:
Match
(root:`Root` {identifier=$identifier})
-[:`someRel`]->
(a:`A` {identifier=$a_identifier})
Create
(a)-[:`someOtherRel`]->(b:`B` {identifier=$b_identifier})
Then some other time passes and maybe B needs a child, and I have to use the same opening stanza to get to A and then add another one to get the correct B.
Is there some functionality that I'm missing that will allow me to not have to build up those opening stanzas every time I want to get to the correct B, (or C or D) or do I just need to do this using string concatenation?
String concatenation example: (python)
MATCH
{ROOT_LOOKUP_STANZA},
{A_LOOKUP_STANZA},
{B_LOOKUP_STANZA},
CREATE
(b)-[:`c_relationship`]->(c:`C` {...})
Some additional notes:
Root Nodes have to be uniquely identified
The rest of the nodes have to be uniquely identified with their parents. So the following is valid:
Root(root)->A(a)->B(b)
Root(root)->A(a1)->B(b)
In this case B(b) references two different nodes because their parents are different.
So your main problem is that children do not have unique ids, only the root nodes have unique ids. Neo4j does not have have any mechanic (yet) to carry the final context of one query into the start of another, and makes no guarantee that a nodes internal id will be the same between queries. So for your data as is, you must match the whole chain to be sure you match the correct node to append to. There are a few things you can do make this not necessary though.
Add a UUID
By adding a universally unique id to each node (, and indexing that property,) you will be able to match on that id with the guarantee that there are no collisions and it will be the same across queries. Any time using a nodes internal ID would be useful, that is a good sign you could use UUID's in your data. (Also helps if the data is mirrored to other databases)
Store the path as a Unique ID
It's possible you don't know the UUID assigned in Neo4j (because it's not in the source data), but in a tree you can create a unique ID in the format of <parent-ID>_<index><sorted-labels><source-id>. The idea here is that the parent is guaranteed to have a unique id, and you combine that id with the info that makes this child unique to that parent. This allows you do generate a deterministic unique ID. (Requires a tree data structure, with a unique root id) In most cases, you can probably leave the index part out (that is for cases of lists/arrays in the source data). In essence, you are storing the path from the root node to this node as the nodes unique id. (Again, you will want an index on this id)
Batch the job
If this is all part of just one job, another option is pool the changes you want to make, and generate one cypher that will do all of them while Neo4j already has everything fetched.

How to get the Node name in Neo4J

I am new to Neo4j and am referring to this tutorial.
I am not finding any answer on how to fetch the node name using CQL.
For example:
If I create two nodes like so:
CREATE (Dhawan:player{name: "Shikar Dhawan", YOB: 1985, POB: "Delhi"})
CREATE (Ind:Country {name: "India"})
and then build relationship at a later date using:
CREATE (Dhawan)-[r:BATSMAN_OF]->(Ind)
How do we know the node name: Dhawan or Ind?
Using:
MATCH (n) RETURN n
I am getting back the label name but not the node name!
How do I get all the details of an existing graph DB?
The thing you're calling "the node name" is actually a variable, and is only present for the duration of a single query (or less, if you don't include it in a WITH clause and it goes out of scope). It is never saved to the graph db, and is not persisted data.
In your example, you would only be able to use CREATE (Dhawan)-[r:BATSMAN_OF]->(Ind) (and have those variables refer to your previously created nodes) if the create was performed in the same query where those variables were previously bound (and still in scope).
Otherwise, this would create two new nodes, create the :BATSMAN_OF relationship between them, and bind those variables to the new nodes for the duration of their scope.

Assumptions regarding Node ID strings in Neo4j - cypher

In my recent question, Modeling conditional relationships in neo4j v.2 (cypher), the answer has led me to another question regarding my data model and the cypher syntax to represent it. Lets say in my model, there is a node CLT1 that is what I'll call the Source node. CLT1 has relationships to other 286 Target nodes. This is a model of a target node:
CREATE
(Abnormally_high:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{Pro1:'x',Prop2:'y',Prop3:'z'})
Key point: I am assuming the string after the CREATE clause is
The ID of this target node
The ID is significant because its content has domain-specific meaning
and is query-able.
in this case its the phrase ...."Abnormally_high".
I made this assumption based on the movie database example.
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
The first strings after CREATE definitely have domain-specific meaning!
In my earlier post I discuss Problem 2. I find that problem 2 arises because among the 286 target nodes, there are many instances where there was at least one more Target node who shares the identical ID. In this instance, the ID is "Abnormally_high". The other Target nodes may differ in the value of any of Label1 - Label10 or the associated properties.
Apparently, Cypher doesn't like that. In Problem 2, I was discussing the ways to deal with the fact that cypher doesn't like using the same node ID multiple times even though the labels or properties were different.
My problem are my assumptions about the Target node ID.
AM I RIGHT?
I am now thinking that I could instead use this....
CREATE (CLT1_target_1:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{name:'Abnormally_high',Prop2:'y',Prop3:'z'})
If indeed the first string after the CREATE clause is an ID, then all I have to do is put a unique target node identifier.... like CLT1_target_1 and increment up to CLT1_target_286. If I do this, then I can have the name as a property and change whatever label or property I want.
Do I have this right?
You are wrong. In Cypher, a node name (like "Abnormally_high") is just a variable name that exists for the lifetime of the query (and sometimes not even that long). The node name used in a Cypher query is never persisted in any way, and can be any arbitrary string.
Also, in neo4j, the term "ID" has a specific meaning. The neo4j DB will automatically assign a (currently) unique integer ID to each new node. You have no control over the ID value assigned to a node. And when a node is deleted, neo4j can reassign its ID to a new node.
You should read the neo4j manual (available at docs.neo4j.org), especially the section on Cypher, to get a better understanding.

NEO4J Server: Update nodes

I'm using neo4j server versión 1.8. The current requirement of my application is to create and/or update multiple nodes and add them to a specific index (all in one request). We have several indexes, and one node can be in some of them. Those nodes have a property that is a ID (significant for our application). If there is already a node with that ID, we only need to override its properties. What we now do is to first check which of these nodes exist (via GET in the index that contains all the nodes) and stored in a data structure. Then I make the batch request to create or update the nodes. If the ID does not exist in the data structure, we have to create a node. Otherwise, only we have to update its properties. This is slow, it is best to make a batch request only, instead of two. I try with UNIQUE INDEXES but the node can be in several indexes. What do you recommend I do?

Resources