So I've just worked through the tutorial and I'm unclear about a few things. The main one, however, is how do you decide when something is a relationship and when it should be a Node?
For example, in the Movies Database, there is a relationship showing who acted in which film. A property of that relationship is the Role. BUT, what if it's a series of films? The role may well be constant between films (say, Jack Ryan in The Hunt for Red October, Patriot Games etc.)
We may also want to have some kind of Character bio, which would obviously remain constant between movies. Worse, the actor may change from one movie to another (Alec Baldwin then Harrison Ford.) There are many others like this (James Bond, for example).
Even if the actor doesn't change (main roles in Harry Potter), the character is constant. So, at what point would the Role become a node in its own right? When it does, can I have a 3-way relationship (Actor-Role-Movie)? Say I start off with it being a relationship and then, down the line, decide it should've been a node: is there a simple way to go through the database and convert it?
No, there is no simple way to convert your data model. When you start your own database, first take time to find a fitting schema. There is no single ideal schema, and several different models can fit the same situation without any of them being wrong.
My strategy is to put as little information as possible on the relationship itself. I only add properties that directly concern the relationship and store all the other data in the nodes. Also think about properties you could use for traversing the graph. For example, you might need some flags, or even different relationship types for relationships that are more or less the same. apoc.algo.aStar only follows the relationship types you specify (so you could exclude certain nodes by connecting them with a special relationship type). So take a look at the procedures you might use later and keep them in mind.
Try to keep the schema as simple as possible and find a way to stay consistent about which things are nodes and which deserve a relationship. Don't mix it up.
Choose a design that makes sense for you! Take (device 1)-[cable]-(device 2) vs (device 1)-[has cable]-(cable)-[has cable]-(device 2): in this case I'd prefer the first, because [has cable] wouldn't carry any more information. Contrary to what I wrote above, I would have a lot of information in this [cable] relationship, but that totally makes sense to me because I wouldn't want to search a device node for cable information.
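To make the two cable designs concrete, here is a minimal sketch of both. The label names (Device, Cable), the relationship types (CABLE, HAS_CABLE), and the properties (category, lengthMeters) are made up for illustration. The two statements are alternatives, so you would run one or the other, not both.

```cypher
// Option 1: the cable is a relationship that carries its own properties.
CREATE (:Device {name: 'device 1'})
       -[:CABLE {category: 'Cat6', lengthMeters: 2}]->
       (:Device {name: 'device 2'});

// Option 2: the cable is a node; the two HAS_CABLE relationships
// carry no information of their own.
CREATE (:Device {name: 'device 1'})
       -[:HAS_CABLE]->(:Cable {category: 'Cat6', lengthMeters: 2})
       <-[:HAS_CABLE]-(:Device {name: 'device 2'});
```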
For your example, giving the role its own node is also a valid approach. For example, if you especially want to query which actors have played the same role, I'd definitely give the role its own node.
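One possible way to model the 3-way Actor-Role-Movie case is an intermediate node that ties the three together. This is only a sketch: the Performance and Character labels and the PERFORMED, OF_CHARACTER, and IN_MOVIE relationship types are assumptions of mine, not part of the Movies Database from the tutorial.

```cypher
// Hypothetical schema: each Performance node connects exactly one
// actor, one character, and one movie.
CREATE (ryan:Character {name: 'Jack Ryan'})
CREATE (hunt:Movie {title: 'The Hunt for Red October'})
CREATE (games:Movie {title: 'Patriot Games'})
CREATE (baldwin:Person {name: 'Alec Baldwin'})
CREATE (ford:Person {name: 'Harrison Ford'})
CREATE (baldwin)-[:PERFORMED]->(p1:Performance)-[:OF_CHARACTER]->(ryan)
CREATE (p1)-[:IN_MOVIE]->(hunt)
CREATE (ford)-[:PERFORMED]->(p2:Performance)-[:OF_CHARACTER]->(ryan)
CREATE (p2)-[:IN_MOVIE]->(games);

// Which pairs of actors have played the same character?
// The name comparison keeps each pair from being listed twice.
MATCH (a1:Person)-[:PERFORMED]->(:Performance)-[:OF_CHARACTER]->(c:Character)
      <-[:OF_CHARACTER]-(:Performance)<-[:PERFORMED]-(a2:Person)
WHERE a1.name < a2.name
RETURN DISTINCT c.name AS character, a1.name AS actor1, a2.name AS actor2;
```

With this layout a character bio can live on the Character node once, no matter how many movies or actors are involved.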
Summary:
Think of what you want to do with the data and choose the easiest model.
I was looking up how to utilise temporary relationships in Neo4j when I came across this question: Cypher temp relationship
and the comment underneath it made me wonder when they should be used and since no one argued against him, I thought I would bring it up here.
I come from a mainly SQL background and my main reason for using virtual relationships was to eliminate duplicated data and do traversals to get properties of something instead.
For a more specific example, let's say we have a robust cake recipe, which has sugar as an ingredient. The sugar is what makes the cake sweet.
Now imagine a use case where I don't like sweet cakes so I want to get all the ingredients of the recipe that make the cake sweet and possibly remove them or find alternatives.
Then there's another use case where I just want foods that are sweet. I could work backwards from the sweet ingredients to get to the food or just store that a cake is sweet in general, which saves time from traversal and makes a query easier. However, as I mentioned before, this duplicates known data that can be inferred.
Sorry if the example is too strange, I suck at making them. I hope the main question comes across, though.
My feeling is that the only valid scenario for creating redundant "shortcut" relationships is this:
Your use case has a stringent time constraint (e.g., average query time must be less than 200ms), but your neo4j query -- despite optimization -- exceeds that constraint, and you have verified that adding "shortcut" relationships will indeed make the response time acceptable.
You should be aware that adding redundant "shortcut" relationships comes with its own costs:
Queries that modify the DB would need to be more complex (to modify the redundant relationships) and also slower.
You'd always have to add the redundant relationships -- even if actually you never need some (most?) of them.
If you want to make concurrent updates to the DB, the chances that you may lose some updates and introduce inconsistencies into the DB would increase -- meaning that you'd have to work even harder to avoid inconsistencies.
NOTE: For visualization purposes, you can use virtual nodes and relationships, which are temporary and not actually stored in the DB.
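For the cake example, both use cases can be answered by traversal alone, without storing a redundant "sweet" flag or shortcut relationship on the food. A rough sketch, assuming hypothetical Recipe and Ingredient labels, a CONTAINS relationship type, a boolean sweet property on ingredients, and a $recipeName parameter:

```cypher
// Use case 1: which ingredients make a given recipe sweet?
MATCH (r:Recipe {name: $recipeName})-[:CONTAINS]->(i:Ingredient)
WHERE i.sweet = true
RETURN i.name AS sweetIngredient;

// Use case 2: which foods are sweet? Inferred from their ingredients,
// so nothing has to be kept in sync when a recipe changes.
MATCH (r:Recipe)-[:CONTAINS]->(i:Ingredient)
WHERE i.sweet = true
RETURN DISTINCT r.name AS sweetFood;
```

If the second query turns out to be too slow for your time constraint, that is the point at which a stored shortcut becomes worth its maintenance cost.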
I ran into a problem when designing a graph model with a million users. I need to store whether each user is registered or not registered.
As I see it, we have 2 options:
Store a property "register = true/false" on each user node. So with 1 million users, we have 1 million "register" properties.
Store a Registered node, then create a relationship to this node only for registered users. So the number of relationships is exactly equal to the number of registered users.
Which option is better for search performance and for minimum storage?
Thanks in advance,
Modeling your data as a graph is a difficult thing to pin down exactly. Typically, when it comes to NoSQL databases, the most important thing to consider is how you will be using your data, and to model it based on that.
Using the external node might run into performance problems, as Neo4j typically starts to run into issues when traversing as it approaches around 10,000 relationships on a single node. You will be well above that limit with an external "Registered" node; on the other hand, as long as you are not anchoring your search to that node, it should be okay.
No matter which route you go, the query you described in the comments will likely anchor on (start with) the user, then traverse to who their friends are, and then for each friend, it will check whether it
A. has the "registered" property set to 'true'
B. has a relationship to the "Registered" node.
Each of these methods appears to have a similar execution time, and indexing on the "registered" property will have negligible impact because it is not being used as an anchor (presumably; you would have to PROFILE your query with both methods to find out for sure). So, like you mentioned, one might consider the space constraints.
Besides that, there is not much difference from a performance analysis perspective between the two methods that I can see.
A third option, mentioned by @InverseFalcon, is to use an additional label, ':Registered', on those nodes that are registered. This might well result in a faster comparison time than keeping it in a property, as labels will be inlined in the node store and can be checked there, whereas properties might have an additional level of indirection to the property store.
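To compare the three options, you could PROFILE one query per variant against your own data. The sketch below is only a guess at the shape of the query: the FRIEND and HAS_STATUS relationship types, the Status node, the id property, and the $userId parameter are all assumptions. Each statement belongs to a different modeling option, so only one of them would match your actual schema.

```cypher
// Option A: boolean property on each user node.
PROFILE
MATCH (u:User {id: $userId})-[:FRIEND]-(f:User)
WHERE f.registered = true
RETURN f;

// Option B: relationship from each registered user to one shared node.
PROFILE
MATCH (u:User {id: $userId})-[:FRIEND]-(f:User)
WHERE (f)-[:HAS_STATUS]->(:Status {name: 'registered'})
RETURN f;

// Option C: extra :Registered label on registered users.
PROFILE
MATCH (u:User {id: $userId})-[:FRIEND]-(f:User:Registered)
RETURN f;
```

All three anchor on the user and only check the registered state per friend, which is why the dense shared node in option B is never used as the starting point.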
I am new to Cypher and trying to design a graph database to store user behaviour. Thanks in advance!
Case Example:
1. A user visited a web page
2. A user owned a device (id:xxxxx)
In a UML class diagram, the arrow (relationship) points toward the parent class.
But from my point of view, not all relationships in Cypher are parent-child relationships. Does that mean I should not apply this concept in Cypher?
So, the question is: how should I design the direction of a relationship?
(user)-[r:visited]->(webpage {url:xxx})
(user)-[r:owned]->(mobileDevice {uuid:xxx})
-- or --
(user)<-[r:visitedBy]-(webpage {url:xxx})
(user)<-[r:owned]-(mobileDevice {uuid:xxx})
Thank you again
This is a common question. The answer is that it's up to you! Relationship types can be whatever you choose and you should go with what is most comfortable. I would suggest that whatever you do, just try to be consistent.
Personally, between "visited" and "visited by", I would go with "visited" because I think it makes more sense to be talking about the fact that the user visited a page, not that a page was visited by the user. I often recommend that people name their relationships so that the node-relationship-node makes a sentence. Since the user is the primary actor, your sentence would be "(the) user visited (the) webpage". That might come from me being a native English speaker and the way that English sentences are formed, though.
As a side note, relationships in Neo4j are generally UPPER_SNAKE_CASE. Again, Neo4j doesn't restrict you from any one particular style, but that's what I've seen most. This guide gives a pretty good overview of common Cypher conventions:
http://nigelsmall.com/zen
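Whichever direction you pick, a single stored relationship can be traversed from either end, so you don't need a VISITED_BY twin just to query "backwards". A small sketch, where the User and Webpage labels, the id property, and the $url and $userId parameters are assumptions for illustration:

```cypher
// Starting from the page: who visited it?
MATCH (u:User)-[:VISITED]->(w:Webpage {url: $url})
RETURN u;

// Starting from the user: which pages did they visit?
// Same relationship, same direction in the pattern, different anchor.
MATCH (u:User {id: $userId})-[:VISITED]->(w:Webpage)
RETURN w.url AS url;
```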
I'm a Neo4j newbie, just playing around in the browser modelling data for a project at the moment.
Here's my use case: A user can have a bunch of items. Each item is described by a storyline.
(:User)-[:OWNS]->(:Item)<-[:DESCRIBES]-(:Storyline)
No issues so far. However, the storyline needs to contain "cards", basically chapters of the story that need to be in order. So, my first thought was this.
(:Storyline)<-[:FOLLOWS]-(a:Card)<-[:FOLLOWS]-(b:Card)
However, if we start at Card B, we now have to follow the path back to see what storyline/item the card belongs to. Seems inefficient. Would it be better to do this?
(a:Card {order: 0})-[:BELONGS_TO]->(:Storyline)
(b:Card {order: 1})-[:BELONGS_TO]->(:Storyline)
Or, might I even trash the Storyline and just have the following?
(:Card {order:0})-[:DESCRIBES]->(:Item)
Next, a user should be free to create a link to another storyline card belonging to his own or any other user's item.
(storyA:Card)-[:LINKS_TO]->(storyB:Card)
However, the owner of storyB may or may not want to link back to the first guy's story. I know you can ignore the direction of the relationship in a cypher query by doing:
(a)-[r]-(b)
But I read that explicitly creating bi-directional relationships is usually a bad idea. So if storyB wants to link back, how would you best represent this in the data model? Maybe another relationship type, like :LINKS_MUTUALLY or something, or a "mutual" boolean property on the :LINKS_TO relationship?
Regarding your first issue, it's usually better to have relationships rather than properties in this case.
I'd throw in FIRST and LAST relationships, like in this TimeTree, and model it as:
(a:Card)-[:DESCRIBES]->(i:Item)
(b:Card)-[:DESCRIBES]->(i:Item)
(c:Card)-[:DESCRIBES]->(i:Item)
(a:Card)<-[:FOLLOWS]-(b:Card)
(b:Card)<-[:FOLLOWS]-(c:Card)
(a:Card)<-[:FIRST_CARD]-(i:Item) //optional, for easy navigation
(c:Card)<-[:LAST_CARD]-(i:Item) //optional, for easy navigation
As for bidirectional relationships, they are only a bad idea if a relationship in one direction implies the other one. That does not apply in your case, so creating (storyA:Card)-[:LINKS_TO]->(storyB:Card) and (storyA:Card)<-[:LINKS_TO]-(storyB:Card) is perfectly fine, since each relationship is there for a different reason.
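With the FIRST_CARD, LAST_CARD, and FOLLOWS relationships from the model above, reading the cards of one item in order could look roughly like this. The id property on Item and the $itemId parameter are assumptions; the rest uses the relationship types already shown.

```cypher
// Find the first and last card of the item, then walk the FOLLOWS
// chain between them. *0.. also covers an item with a single card,
// where first and last are the same node.
MATCH (i:Item {id: $itemId})-[:FIRST_CARD]->(first:Card),
      (i)-[:LAST_CARD]->(last:Card)
MATCH path = (first)<-[:FOLLOWS*0..]-(last)
RETURN nodes(path) AS cardsInOrder;
```

Because every card also has a direct DESCRIBES relationship to its item, starting at any card and finding its item is a single hop, which addresses the concern about walking the chain back.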
To clarify, let's assume that we have nodes representing people and the following relationships: "BIOLOGICAL_MOTHER" and "BIOLOGICAL_FATHER".
Then, for any person node, said node can only have one "BIOLOGICAL_MOTHER" and one "BIOLOGICAL_FATHER". How can we ensure that this is the case?
No. Neo4j currently only supports uniqueness constraints.
I believe several people are working on different schema constructs for neo4j, that would permit you to constrain graphs in any number of different ways. What it seems you're asking for boils down to a database constraint that if there is a relationship of type BIOLOGICAL_FATHER from one person to another, that the DB may not accept any creation of new relationships of that same type. In other words, relationship cardinality constraints, by relationship type.
At the moment, I think the best you can do is verify in your application code that such a relationship doesn't exist before creating it, but the DB won't do this checking for you.
The particular constraint you're looking for sounds easy enough, hopefully a neo4j dev will jump in here and say, "Oh, no worries, that's planned for release XYZ" - but I'm not sure about that.
More broadly, there are a number of issues with graphs that make constraints very tricky. In my personal graph domain, I'd like to make it impossible to create new relationships such that they would introduce cycles in the graph over a particular relationship type. (E.g. (a)-[:owns]->(b)-[:owns]->(a) is extremely undesirable for me). This would be a very costly constraint to actually enforce in the general case, since verifying whether a new relationship was OK could potentially involve traversing a huge graph.
Over the long run, it seems reasonable that neo4j might implement local constraints, but still shy away from anything that implied non-local constraint checking.
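Since the check has to live outside the database, it may also be worth auditing the data now and then for violations that slipped through. A minimal sketch, assuming the BIOLOGICAL_FATHER relationship points from the father to the child (the question does not state the direction):

```cypher
// List every node that has more than one incoming BIOLOGICAL_FATHER
// relationship. The direction (father -> child) is an assumption.
MATCH (child)<-[r:BIOLOGICAL_FATHER]-()
WITH child, count(r) AS fatherCount
WHERE fatherCount > 1
RETURN child, fatherCount;
```

The same shape works for BIOLOGICAL_MOTHER by swapping the relationship type.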
Steve,
In terms of Cypher, if I am given two names of people - say Sam and Dave, and wish to make Sam the father of Dave, but only if Dave doesn't yet have a father, I could do something like this:
MATCH (f {name : 'Sam'}), (s {name : 'Dave'})
WHERE NOT (s)<-[:FATHER]-()
CREATE (f)-[:FATHER]->(s)
If Dave already has a father the WHERE clause filters Dave out, which means no relationship will be created.
Grace and peace,
Jim