I need to write a batch-import utility for my Neo4j database, but I don't want to lose the repository feature of SDN. To achieve this, I want to insert nodes in such a way that they can still be queried using the auto-generated repository methods.
I inserted some nodes into my database and looked at their properties and labels to see how they are set, and I noticed that nodes inserted by SDN have two labels. For example, nodes representing the class SomeClass have the labels ["_SomeClass", "SomeClass"]. My question is: why set two almost identical labels on each node?
Oh, that's actually simple. We have to record somehow that the current node is of exactly type SomeClass, which we do by prepending the "_". Because a label is also added for each super-type, Spring Data Neo4j needs a way to tell which label is the node's actual type.
So you could have: _Developer, Developer, Employee, Person for a class hierarchy from Person down to Developer. And then there could be additional labels for interfaces.
When you now call DeveloperRepository.findAll(), you only want the nodes with _Developer back, not ones of types derived from Developer.
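For the batch importer, the label scheme described above could be reproduced roughly as follows. This is only a sketch: the name property and its value are made up for illustration, and it assumes the label-based type scheme from the answer. It is worth comparing the result against a node that SDN itself created, in case your SDN version expects anything else on the node.

```cypher
// Hypothetical import statement: one label per type in the hierarchy,
// plus the underscore-prefixed label for the node's concrete type.
CREATE (d:_Developer:Developer:Employee:Person { name: 'Alice' })
RETURN labels(d);

// Only nodes whose concrete type is Developer carry the _Developer label.
MATCH (d:_Developer)
RETURN d;

// The plain Developer label also matches nodes of types derived from Developer.
MATCH (d:Developer)
RETURN d;
```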
In early editions of Neo4j, super nodes were typically seen as a bad thing for performance. I have not seen much about that recently with the 2.X and 3.X releases, so I was wondering whether that is still a problem.
The issue I have is that I need to store a finite number of options for a specific node type. For example, Person and favorite colors. I can store an array in the Person node that holds the colors the user likes, or I can create a node for each color and then create a relationship from the Person to the Color node. It seems the super node option would be faster to query, but I am worried because super nodes were bad in the past.
If I am trying to look up people who like a specific color, what's the recommended way to store such data in Neo?
I think the major issue here will be that the Color node will become a very connected node.
Maybe you need an Options subgraph that serves as a template for these options, and then:
copy the template option node and link the copy to the main entity node
or
copy only the chosen option into a property of the main entity node, like your array proposal
or, if your option has no properties,
add a label to your main entity node
I think that, even with the performance improvements for highly connected nodes in newer Neo4j versions, reads and writes involving a densely connected node will always take longer than those involving a node with fewer relationships.
I hope this helps a bit.
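To make the trade-off concrete, here is how the lookup "people who like a specific color" might look under three of the modeling options discussed. The LIKES relationship type, the favoriteColors property, and the LikesBlue label are hypothetical names chosen for this sketch.

```cypher
// Option A: one Color node per color, linked from each Person.
MATCH (p:Person)-[:LIKES]->(:Color { name: 'blue' })
RETURN p;

// Option B: colors stored as an array property on the Person node.
MATCH (p:Person)
WHERE 'blue' IN p.favoriteColors
RETURN p;

// Option C: the option is encoded as a label on the Person node.
MATCH (p:Person:LikesBlue)
RETURN p;
```

Option A starts from the single Color node and walks its relationships, which is where the dense-node concern applies. Option B has no Color node at all, but it likely has to inspect every Person node, since membership in an array property is not something a regular property index helps with. Option C only works when the option carries no properties of its own.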
I'm totally new to Neo4j and I'm testing it these days.
One issue I have with it is how to correctly implement a relationship which involves 3 different nodes using Spring Data. Suppose, for example, that I have 3 @NodeEntity classes: User, Tag and TaggableObject.
As you might guess, a User can add a Tag to a TaggableObject; I model this operation with a @RelationshipEntity TaggingOperation.
However, I can't find a simple way to glue the 3 entities together inside the relationship. I mean, the obvious choice is to set @StartNode User tagger and @EndNode TaggableObject taggedObject; but how can I also add the Tag to the relationship?
This is called a "hyperedge", I believe, and it's not something that Neo4j supports directly. You can create an additional node to support it, though. So you could have a TagEvent node with a schema like so:
(:User)-[:PERFORMED]->(:TagEvent)
(:Tag)<-[:USED]-(:TagEvent)
(:TagObject)<-[:TAGGED]-(:TagEvent)
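As a rough illustration of that schema in use, the following sketch creates one tagging event and then reads it back. The name properties, their values, and the createdAt property are assumptions made for the example; only the labels and relationship types come from the schema above.

```cypher
// Look up the three participants (the name properties are assumptions).
MATCH (u:User { name: 'alice' }),
      (t:Tag { name: 'graphs' }),
      (o:TagObject { name: 'some-article' })
// Create the intermediate event node and connect it to all three.
CREATE (u)-[:PERFORMED]->(e:TagEvent { createdAt: timestamp() }),
       (e)-[:USED]->(t),
       (e)-[:TAGGED]->(o)
RETURN e;

// Find which tags a given user applied to which objects.
MATCH (u:User { name: 'alice' })-[:PERFORMED]->(e:TagEvent)-[:USED]->(t:Tag),
      (e)-[:TAGGED]->(o:TagObject)
RETURN t.name, o.name;
```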
Another alternative is to store a foreign key as a property on a relationship or a node. Obviously that's not very graphy, but if you just need it for reference that might not be a bad solution. Just remember to not use the internal Neo4j ID as in future versions that may not be dependable. You should create your own ID for this purpose.
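A sketch of the foreign-key alternative might look like this. The tagId relationship property and the id property on Tag are hypothetical; the point is that the value is an application-generated identifier rather than Neo4j's internal node ID.

```cypher
// Hypothetical: 'tagId' holds an application-generated ID of the Tag node,
// not Neo4j's internal node ID.
MATCH (u:User { name: 'alice' }), (o:TagObject { name: 'some-article' })
CREATE (u)-[:TAGGED { tagId: 'tag-42' }]->(o);

// Resolve the reference with a second lookup.
MATCH (:User { name: 'alice' })-[r:TAGGED]->(:TagObject)
MATCH (t:Tag { id: r.tagId })
RETURN t;
```

The second query shows the cost of this design: the Tag is not reachable by traversal, so it has to be found by a separate property lookup.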
I'm using Neo4j 2.1.2 Community Edition.
I have nodes with a label called Company. I created these nodes and the label by loading a CSV file with the MERGE and CREATE commands.
In future, if a label name changes, say from Company to Organization, I want to record the CreatedDate, UpdatedDate, NewLabelName, and OldLabelName values somewhere.
To achieve that, I thought of maintaining one master node that holds the label information, i.e., it should have properties like NewLabelName, OldLabelName, CreatedDate, and UpdatedDate. The label name should then come from the master node to the other nodes. Whenever we change a label, the corresponding UpdatedDate property should be updated in the master node, and NewLabelName should propagate from the master node to the other nodes (the nodes that carry that label).
Hope you understand the scenario here.
But how can I achieve this? Is it possible? If yes, how can I define the relationship between the master node and the other nodes?
(Here my other nodes are company names like Google, Yahoo, Samsung, etc., and those will have other child nodes, such as location.)
Please suggest a solution. (I want to achieve this using Cypher, not Java.)
Thanks
Although labels can be changed, you should do that rarely (e.g., to recover from a mistake). Changing a large number of labels is very expensive and should never be done as a part of normal processing.
Also, like a Java class name, a label name is not something you'd normally show to end users. So, there is really no reason to ever change them. Just try to pick reasonable label names to start with, and don't plan to change them.
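For the rare one-off case, such as recovering from a mistake, a relabel in Cypher could look like the sketch below. It uses the Company and Organization labels from the question; the batch size of 10000 is an arbitrary value picked for illustration.

```cypher
// One-off relabel: add the new label and remove the old one.
MATCH (n:Company)
SET n:Organization
REMOVE n:Company
RETURN count(n);

// On a large graph, the same change can be applied in batches:
// run this repeatedly until it returns 0.
MATCH (n:Company)
WITH n LIMIT 10000
SET n:Organization
REMOVE n:Company
RETURN count(n);
```

Either form touches every Company node, which is why this should not be part of normal processing.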
Can one Neo4j database be divided up so that there are multiple starting points in one database so that all queries can be isolated, instead of having multiple databases?
I have thought about this and I think it can work up to a point, but once things like labels are used then the idea will not work, as a label query will always span the whole database.
Anyway I would like to know if anyone has successfully done this and how they did it.
What you are describing sounds like multitenancy. Neo4j 2.0.1 does not at this time support multitenancy as a feature. There are various methods and strategies for implementing a multitenant architecture within your Neo4j database instance.
You can partition sets of your property graph by label. Since nodes can have multiple labels, you can label one partition with a unique identifying label for that partition.
Please refer to documentation on labels here: http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-labels.html
Things to note with this strategy are to ensure that all your Cypher calls contain the partition identifier for the label, to ensure that the two partitions are isolated from one another within the graph. It's important to ensure that relationships from one partition do not span into another partition.
For example, partition 1 could be the label Partition1. Assuming your application context is operating on Partition1:
MERGE (user:User:Partition1 { name: 'Peter' })
RETURN user
Assuming your application context is operating on Partition2:
MERGE (user:User:Partition2 { name: 'Peter' })
RETURN user
When executing these two queries, two separate Peters are created for Partition1 and Partition2.
You'll just need to ensure that your application adds the label of the partition it is operating on to each one of your queries. While this is tedious, it is the suggested way to go about multitenancy at this time.
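Reads follow the same pattern as the MERGE statements above. As a sketch, the first query below stays inside Partition1, and the second is a possible sanity check for relationships that cross between the two partitions:

```cypher
// Read within one partition only.
MATCH (user:User:Partition1 { name: 'Peter' })
RETURN user;

// Sanity check: count relationships that span both partitions.
// With properly isolated partitions this should return 0.
MATCH (a:Partition1)-[r]-(b:Partition2)
RETURN count(r);
```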
I am building a large graph database that has a significant set of meta data about each node (thousands of properties per node). I am currently going through the process of determining which meta data should be a node within Neo4j, which should become a property of the Node and which should be housed in a separate database.
My thinking is to use the meta data in 3 ways:
1 - If the property is shared between many nodes, make that property its own node and create an edge to it.
2 - If the property is important to traversing the graph, but not "highly" shared, add it as a node property (which could also be indexed within Neo4j if needed).
3 - If the meta data strictly describes that node, store it in a separate NoSQL database, with the Neo4j node ID becoming the foreign key to the other database.
While this seems like the most efficient use of the graph database, it seems like a pain to have the different property types and to have to determine which type a property is before using it (likely via a property lookup in a key-value store). It would also likely mean that I would need an easy way to promote a property from 3 to 2 to 1 when a property becomes highly shared or is needed for efficient traversal.
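As an illustration of what promoting a property from type 2 to type 1 might involve, the sketch below turns a node property into a shared node. The Item and Category labels, the category property, and the IN_CATEGORY relationship type are all hypothetical names.

```cypher
// Promote a shared property value to its own node:
// create (or reuse) one Category node per distinct value, link each
// Item to it, then drop the now-redundant property.
MATCH (n:Item)
WHERE n.category IS NOT NULL
MERGE (c:Category { name: n.category })
MERGE (n)-[:IN_CATEGORY]->(c)
REMOVE n.category;
```

Promoting from type 3 to type 2 cannot be done in Cypher alone, since the values would first have to be read from the external store by the application.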
Has anyone taken this approach? Any thoughts to share, or things to avoid?
Never store a Neo4j node id in an external system. The node id is basically an offset in the respective store file. If you delete a node, its id might be reused when new nodes are created.
The right approach is to have a "good" identifier (e.g. a uuid) as a node property and put that into Neo4j's index. That uuid is then safe to store in third-party systems.
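One way to sketch this in Cypher is with a uniqueness constraint, which is a substitute for the index mentioned above rather than necessarily what the answer had in mind. The Item label and the uuid value are placeholders, and the uuid is assumed to be generated by the application.

```cypher
// Constraint syntax of Neo4j 2.x/3.x; newer releases use
//   CREATE CONSTRAINT FOR (n:Item) REQUIRE n.uuid IS UNIQUE
CREATE CONSTRAINT ON (n:Item) ASSERT n.uuid IS UNIQUE;

// Create the node with an application-generated uuid (placeholder value).
MERGE (n:Item { uuid: '3f2b8c1e-5d4a-4e6f-9a7b-2c1d0e8f6a5b' })
RETURN n;

// The external system stores the uuid and later finds the node by it.
MATCH (n:Item { uuid: '3f2b8c1e-5d4a-4e6f-9a7b-2c1d0e8f6a5b' })
RETURN n;
```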
Some time ago I created an unmanaged extension that adds a uuid to each new node and prevents manual changes to these uuids: https://github.com/sarmbruster/neo4j-uuid.
Update (2013-08-21)
I've blogged about UUIDs with Neo4j.