Neo4j Store Data In Array Or User Super Node - neo4j

In early editions of Neo4j, Super nodes were typically seen as a bad thing for performance. I have not seen too much about that recently with the 2.X and 3.X releases so was wondering if that was still a problem.
The issue I have is I need to store a finite number of options for a specific Node type. For example, Person and favorite colors. I can store an array in the Person Node that stores the colors the user likes, or I can create a Node for each color and then create a relationship from the Person to the Color Node. It seems the super node option would be faster to query but am worried as super nodes were bad in the past.
If I am trying to look up people who like a specific color, what's the recommended way to store such data in Neo?

I think the major issue here will be that the Color node will become a very connected node.
Maybe you need an Options subgraph to have a template of these options and then :
copy template node option to link this copy with the main entity node
or
copy only the choosen option into a property of the main entity node, like your array proposition
or, if your option has no properties
add label onto your main entity node
I think, even with the increase in performance of hyperlinked nodes with newer Neo4j versions, the read/write time will be always more than the one who has less.
I hope this help a bit.

Related

Neo4J - Which is better to store element as a property of user or as a node & relationship?

I got a problem when designing a graph model with million users. I need to store information that user is registered or non-register.
As I see we have 2 options:
Store a property "register = true/false" in each user node. So with 1 million user, we have 1 million properties "register".
Store a Registered node then make relationship just for registered user to this node. So we have number of relationship equal exactly with the registered user.
Which option is better in performance searching also about minimum storage?
Thanks in advance,
Modeling your data as a graph is a difficult thing to pin down exactly. Typically, when it comes to NoSQL databases, the most important thing to consider is how you will be using your data, and to model it based on that.
Using the external node might run into performance problems, as Neo4J typically starts to run into issues during traversing as it approaches around 10,000 relationships in a single node. You will be well above that limit with an external "Registered" node; on the other hand as long as you are not anchoring your search to that node, it should be okay.
No matter which route you go, the query you described in the comments will likely anchor on (start with) the user, then traverse to who their friends are, and then for each friend, it will check whether it
A. has the "registered" property set to 'true'
B. has a relationship to the "Registered" node.
Each of these methods appears to have a similar execution time, and indexing on the "registered" property will have negligible impact because it is not being used as an anchor (presumably; you would have to PROFILE your query with both methods to find out for sure). So, like you mentioned, one might consider the space restraints.
Besides that, there is not much difference from a performance analysis perspective between the two methods that I can see.
A third option, mentioned by #InverseFalcon, is to use an additional label, ':Registered' on those nodes that are registered. This might well result in a faster comparison time than keeping it in a property, as labels will be inlined in the node store and can be checked there, whereas properties might have an additional level of indirection to the property store.

Node vs Relationship

So I've just worked through the tutorial and I'm unclear about a few things. The main one, however, is how do you decide when something is a relationship and when it should be a Node?
For example, in the Movies Database,there is a relationship showing who acted in which film. A property of that relationship is the Role. BUT, what if it's a series of films? The role may well be constant between films (say, Jack Ryan in The Hunt for Red October, Patriot Games etc.)
We may also want to have some kind of Character bio, which would obviously remain constant between movies. Worse, the actor may change from one movie to another (Alec Baldwin then Harrison Ford.) There are many others like this (James Bond, for example).
Even if the actor doesn't change (Main roles in Harry Potter) the character is constant. So, at what point would the Role become a node in its own right? When it does, can I have a 3-way relationship (Actor-Role-Movie)? Say I start of with it being a relationship and then, down the line, decide it should've been a node, is there a simple way to go through the database and convert it?
No there is no way to convert your datamodel. When you start your own Database first take time to find a fitting schema. There is no ideal way to create a schema and also there are many different models fitting to the same situation without being totally wrong.
My strategy is to put less information to the relationship itself. I only add properties that directly concern the relationship and store all the other data in the nodes. Also think of properties you could use for traversing the graph. For example you might need some flags or even different labels for relationships even they more or less are the same. The apoc.algo.aStar is only including relationshiptypes you want (you could exclude certain nodes by giving them a special relationshiptype). So keep that in mind that you take a look at procedures that you might use later.
Try to create the schema as simple as possible and find a way to stay consistent in terms of what things are nodes and what deserves a relationship. Dont mix it up.
Choose a design that makes sense for you! (device 1)-[cable]-(device 2) vs (device 1)-[has cable]-(cable)-[has cable]-(device 2) in this case I'd prefer the first because [has cable] wouldn't bring anymore information. Irrespective to what I wrote above I would have a lot of information in this [cable] relationship but it totally makes sense for me because I wouldnt want to search in a device node for cable information.
For your example giving the role a own node is also valid way. For example if you want to espacially query which actors had the same role in common I'll totally go for giving the role a extra node.
Summary:
Think of what you want to do with the data and choose the easiest model.

Tracking the history of nodes in neo4j

I'm using Neo4j Community edition 2.1.4. I have hierarchy of 4 levels and each level names i have treated it as label name for that level.So in my graph i have totally 4 labels. Now for the first time I have loaded csv file into neo4j and using MERGE and CREATEkeywords created the nodes and relationships. In future the requirement is like
scenario 1:
if someone wants to rename the hierarchy level name to some new name, then I have to
change the label name to a new name.
Scenario 2:
if any of the property name of node changes to to new name
In the both the cases i wanted to track the history of the node. How can i do it? So that in future someone wants to see the history details , they can query and get the details.
So how can i track the history details of the nodes in neo4j?
My Approch:
For the first time i will load the csv file and create nodes and relationships. Then if someone wants to change the label name of node A(level name which is standard) which is having a properties like ID, name,start_date,end_date,Status.Then i will replicate the node A with all the properties and change the status to inactive and i will overwrite the old node with the new details. But i'm clueless whether this solution is going to work or not. Also i have more thane 10000 nodes in my db.
So please suggest me a better approach to track the nodes history.
Have a look at the GraphAware ChangeFeed Module. You can track history of changes and also configure which changes you would like to track and which to ignore.
You can use versionning. Examples in this blog post : neo4j.org/graphgist?608bf0701e3306a23e77 that you can adapt for your needs

Deriving the label names from other node in neo4j

I'm using ne04j 2.1.2 community edition.
I have a nodes with a label called Company and I created these nodes and label by loading CSV file along with the MERGE and CREATE commands.
So in future if my label names changes,say Company to Organization, I wanted to maintain the createddate, UpdatedDate, NewLabelName, OldLabelName values somewhere.
So in order to achieve that I thought of maintaining one master node which holds the label information i.e., it should have the properties like NewLabelName, OldLabelName, CreatedDate, UpdatedDate. So the label name should come from the Master Node to other nodes. Whenever we made any changes to label ,then the corresponding UpdatedDate property value should be updated in the master node and NewLabelName should come from the master node to other nodes (nodes for which that label belongs to) .
Hope you understand the scenario here.
But how can i achieve this ? is it possible to achieve ? if yes, then how can i define the relationship between master and other nodes?
(Here my other nodes are Name of the Companies like Google, Yahoo, Samsung etc.. and those will be having some other child nodes like location)
Please suggest the solution. (I wanted to achieve these using cypher not using java)
Thanks
Although labels can be changed, you should do that rarely (e.g., to recover from a mistake). Changing a large number of labels is very expensive and should never be done as a part of normal processing.
Also, like a Java class name, a label name is not something you'd normally show to end users. So, there is really no reason to ever change them. Just try to pick reasonable label names to start with, and don't plan to change them.

Neo4j data modeling for branching/merging graphs

We are working on a system where users can define their own nodes and connections, and can query them with arbitrary queries. A user can create a "branch" much like in SCM systems and later can merge back changes into the main graph.
Is it possible to create an efficient data model for that in Neo4j? What would be the best approach? Of course we don't want to duplicate all the graph data for every branch as we have several million nodes in the DB.
I have read Ian Robinson's excellent article on Time-Based Versioned Graphs and Tom Zeppenfeldt's alternative approach with Network versioning using relationnodes but unfortunately they are solving a different problem.
I Would love to know what you guys think, any thoughts appreciated.
I'm not sure what your experience level is. Any insight into that would be helpful.
It would be my guess that this system would rely heavily on tags on the nodes. maybe come up with 5-20 node types that are very broad, including the names and a few key properties. Then you could allow the users to select from those base categories and create their own spin-offs by adding tags.
Say you had your basic categories of (:Thing{Name:"",Place:""}) and (:Object{Category:"",Count:4})
Your users would have a drop-down or something with "Thing" and "Object". They'd select "Thing" for instance, and type a new label (Say "Cool"), values for "Name" and "Place", and add any custom properties (IsAwesome:True).
So now you've got a new node (:Thing:Cool{Name:"Rock",Place:"Here",IsAwesome:True}) Which allows you to query by broad categories or a users created categories. Hopefully this would keep each broad category to a proportional fraction of your overall node count.
Not sure if this is exactly what you're asking for. Good luck!
Hmm. While this isn't insane, think about the type of system you're replacing first. SQL. In SQL databases you wouldn't use branches because it's data storage. If you're trying to get data from multiple sources into one DB, I'd suggest exporting them all to CSV files and using a MERGE statement in cypher to bring them all into your DB at once.
This could manifest similar to branching by having each person run a script on their own copy of the DB when you merge that takes all the nodes and edges in their copy and puts them all into a CSV. IE
MATCH (n)-[:e]-(n2)
RETURN n,e,n2
Then comparing these CSV's as you pull them into your final DB to see what's already there from the other copies.
IMPORT CSV WITH HEADERS FROM "file:\\YourFile.CSV" AS file
MERGE (N:Node{Property1:file.Property1, Property2:file.Property2})
MERGE (N2:Node{Property1:file.Property1, Property2:file.Property2})
MERGE (N)-[E:Edge]-(N2)
This will work, as long as you're using node types that you already know about and each person isn't creating new data structures that you don't know about until the merge.

Resources