I'm using neo4j server versiĆ³n 1.8. The current requirement of my application is to create and/or update multiple nodes and add them to a specific index (all in one request). We have several indexes, and one node can be in some of them. Those nodes have a property that is a ID (significant for our application). If there is already a node with that ID, we only need to override its properties. What we now do is to first check which of these nodes exist (via GET in the index that contains all the nodes) and stored in a data structure. Then I make the batch request to create or update the nodes. If the ID does not exist in the data structure, we have to create a node. Otherwise, only we have to update its properties. This is slow, it is best to make a batch request only, instead of two. I try with UNIQUE INDEXES but the node can be in several indexes. What do you recommend I do?
Related
New to Neo4j. My goal is to make a database with various csv sources. I have created node labels of "geochemistry" and "geospatial" to be linked with the "LABID" node via a common property. I have loaded in one dataset easily, and made the necessary connections with the ":LOCATED" relationship being defined as between the geospatial and LABID nodes.
However, moving on to the second csv source, I am a little confused. I have tried matching the new geospatial data (which have no current relationships) to another set of Lab IDs. Below is my current code:
MATCH (g:geospatial) WHERE NOT (g)-[:LOCATED]->(:LABID)
MATCH (l:LABID)
WHERE l.labid = g.Sample_ID
MERGE (g)-[r:LOCATED]->(l)
RETURN r
labid is a current property in LABID, and so is Sample_ID to the geospatial nodes.
After completing the above query, the output is "(no changes, no records)"
Thanks for the help in advance!
I'm writing a bunch of queries in order to build a tree inside Neo4j, but in order to add different types of new data, I'm writing the same opening stanzas for each of my queries.
Example: I want to be able to add Root(identifier=Root1)->A(identifier=1)->B(identifier=2)... without modifying the trees pointed to by other roots.
All of my queries start off with
Match
(root:`Root` {identifier=$identifier})
Create
(root)-[:`someRel`]->(a:`A` {identifier=$a_identifier})
Then some time passes and A needs a child:
Match
(root:`Root` {identifier=$identifier})
-[:`someRel`]->
(a:`A` {identifier=$a_identifier})
Create
(a)-[:`someOtherRel`]->(b:`B` {identifier=$b_identifier})
Then some other time passes and maybe B needs a child, and I have to use the same opening stanza to get to A and then add another one to get the correct B.
Is there some functionality that I'm missing that will allow me to not have to build up those opening stanzas every time I want to get to the correct B, (or C or D) or do I just need to do this using string concatenation?
String concatenation example: (python)
MATCH
{ROOT_LOOKUP_STANZA},
{A_LOOKUP_STANZA},
{B_LOOKUP_STANZA},
CREATE
(b)-[:`c_relationship`]->(c:`C` {...})
Some additional notes:
Root Nodes have to be uniquely identified
The rest of the nodes have to be uniquely identified with their parents. So the following is valid:
Root(root)->A(a)->B(b)
Root(root)->A(a1)->B(b)
In this case B(b) references two different nodes because their parents are different.
So your main problem is that children do not have unique ids, only the root nodes have unique ids. Neo4j does not have have any mechanic (yet) to carry the final context of one query into the start of another, and makes no guarantee that a nodes internal id will be the same between queries. So for your data as is, you must match the whole chain to be sure you match the correct node to append to. There are a few things you can do make this not necessary though.
Add a UUID
By adding a universally unique id to each node (, and indexing that property,) you will be able to match on that id with the guarantee that there are no collisions and it will be the same across queries. Any time using a nodes internal ID would be useful, that is a good sign you could use UUID's in your data. (Also helps if the data is mirrored to other databases)
Store the path as a Unique ID
It's possible you don't know the UUID assigned in Neo4j (because it's not in the source data), but in a tree you can create a unique ID in the format of <parent-ID>_<index><sorted-labels><source-id>. The idea here is that the parent is guaranteed to have a unique id, and you combine that id with the info that makes this child unique to that parent. This allows you do generate a deterministic unique ID. (Requires a tree data structure, with a unique root id) In most cases, you can probably leave the index part out (that is for cases of lists/arrays in the source data). In essence, you are storing the path from the root node to this node as the nodes unique id. (Again, you will want an index on this id)
Batch the job
If this is all part of just one job, another option is pool the changes you want to make, and generate one cypher that will do all of them while Neo4j already has everything fetched.
I am storing Hierarchy data in Neo4j. I want to store history of the Node. Consider I have a label called GROUP and the earlier name was "MARKETING" now it has been changed to "MARKET123". So i want to create a new node where the name will be MARKET123 and the create a relationship with other connected node same as for the older node named "MARKETING"...
But all this i want to do dynamically instead of passing the other Nodes name and the relationship value in the cypher query.
Please suggest me how it can be done.
You can add versioning to your graph nodes.
Here is a graph gist about time-based versioning that you may adapt to your needs.
http://www.neo4j.org/graphgist?608bf0701e3306a23e77
I deleted the reference node. So I need to recreate the reference node.
Using cypher how to create a node with id 0?
thanks.
The short answer is you can't, and you don't need to. Do you have a specific problem without that node? If so, maybe you can elaborate, chances are there is something else that answers your problem better than trying to recreate a node with a specific id.
The long answer is you can't assign id:s to nodes with cypher. The id is an index or offset into the node storage on disk, so it makes sense to let Neo4j worry about it and not try to manipulate it or include it in any application logic. See Node identifiers in neo4j and Has anyone used Neo4j node IDs as foreign keys to other databases for large property sets?.
You also most likely don't need a reference node. It is created by default in a new database, but it's use is deprecated and it won't exist in future releases. See Is concept of reference node in neo4j still used or deprecated?.
If you still want to assign id to nodes you create, it is accidentally possible in a roundabout way with with the CSV batch importer (1,2) and, I believe, with the Java API batch inserter.
If you still want to recreate or simulate the reference node you can either delete the database data files and let Neo4j recreate the the database, or you can try what this person did: Recreate reference node in a Neo4j database. You can also force Neo4j to recycle the ids of deleted nodes faster, so that new nodes that you create receive those ids that have been freed up and not yet reassigned.
I am building a large graph database that has a significant set of meta data about each node (thousands of properties per node). I am currently going through the process of determining which meta data should be a node within Neo4j, which should become a property of the Node and which should be housed in a separate database.
My thinking is to use the meta data in 3 ways:
1 - If the property is shared between many nodes, to make that property it's own node and create an edge to that property.
2 - If the property is important to traversing the graph, but not "highly" shared, to add that as a node property. (Which could also be indexed within Neo4j if needed)
3 = If the meta data is strictly describing that node, to have that stored in a separate NoSQL database, with the Neo4J Node ID becoming the foreign key to the other database.
While it seems like the most efficient use of using the graph database, it seems like a pain to have the different property types and having to determine which type of property it is before using it. (Likely a property lookup key-value store) It would also likely mean that I would need an easy way to promote a property from 3 to 2 to 1 for instances when a property becomes highly shared, or needed for efficient traversal.
Has anyone taken this approach? Any thoughts to share, or things to avoid?
Do never ever store a Neo4j node id in an external system. The node id is basically a offset in the respective store file. If you delete a node its id might be reused when new nodes are created.
The right approach is to have a "good" identifier (e.g. uuid) as a node property and put that into Neo4j's index. That uuid is then save to be stored in third party systems.
Some time ago I've created a unmanaged extension that adds a uuid to each new node and prevent manual changes to these uuids: https://github.com/sarmbruster/neo4j-uuid.
Update (2013-08-21)
I've blogged about UUIDs with Neo4j.