Neo4j/Cypher Query (user action) relationship direction - neo4j

I am new to Cypher and am trying to design a graph database to store user behaviour. Thanks in advance!
Case Example:
1. A user visited a web page
2. A user owned a device (id:xxxxx)
In a UML class diagram, the arrow (relationship) points toward the parent class.
But from my point of view, not all relationships in Cypher are of the parent-child type. Does that mean I should not apply this concept in Cypher?
So, the question is: "how should I design the direction of a relationship"?
(user)-[r:visited]->(webpage {url:xxx})
(user)-[r:owned]->(mobileDevice {uuid:xxx})
-- or --
(user)<-[r:visitedBy]-(webpage {url:xxx})
(user)<-[r:owned]-(mobileDevice {uuid:xxx})
Thank you again

This is a common question. The answer is that it's up to you! Relationship types can be whatever you choose and you should go with what is most comfortable. I would suggest that whatever you do, just try to be consistent.
Personally, between "visited" and "visited by", I would go with "visited", because I think it makes more sense to say that the user visited a page than that a page was visited by the user. I often recommend that people name their relationships so that node-relationship-node makes a sentence. Since the user is the primary actor, your sentence would be "(the) user visited (the) webpage". That might come from me being a native English speaker and the way that English sentences are formed, though.
As a side note, relationship types in Neo4j are generally written in UPPER_SNAKE_CASE. Again, Neo4j doesn't restrict you to any one particular style, but that's what I've seen most. This guide gives a pretty good overview of common Cypher conventions:
http://nigelsmall.com/zen
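To illustrate why a single direction is enough, here is a minimal sketch in Python of a toy in-memory graph. It is not the Neo4j API; the class `Graph`, its methods, and the sample users and pages are all hypothetical names made up for this example. The relationship is stored once in the "user VISITED page" direction, yet it can be read from either end.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory stand-in for a property graph (not the Neo4j API)."""

    def __init__(self):
        self.outgoing = defaultdict(list)  # node -> [(rel_type, target)]
        self.incoming = defaultdict(list)  # node -> [(rel_type, source)]

    def relate(self, source, rel_type, target):
        # Store the relationship once, indexed from both of its ends.
        self.outgoing[source].append((rel_type, target))
        self.incoming[target].append((rel_type, source))

    def targets(self, source, rel_type):
        # Follow the relationship in its stored direction.
        return [t for r, t in self.outgoing[source] if r == rel_type]

    def sources(self, target, rel_type):
        # Follow the same relationship against its stored direction.
        return [s for r, s in self.incoming[target] if r == rel_type]


g = Graph()
g.relate("alice", "VISITED", "/home")
g.relate("bob", "VISITED", "/home")
g.relate("alice", "OWNED", "device-1")

pages_alice_visited = g.targets("alice", "VISITED")
visitors_of_home = g.sources("/home", "VISITED")
```

With this layout a separate "visitedBy" relationship would carry no extra information, which is why picking one readable direction and staying consistent is usually sufficient.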

Related

Node vs Relationship

So I've just worked through the tutorial and I'm unclear about a few things. The main one, however, is how do you decide when something is a relationship and when it should be a Node?
For example, in the Movies database, there is a relationship showing who acted in which film. A property of that relationship is the role. BUT, what if it's a series of films? The role may well be constant between films (say, Jack Ryan in The Hunt for Red October, Patriot Games etc.)
We may also want to have some kind of Character bio, which would obviously remain constant between movies. Worse, the actor may change from one movie to another (Alec Baldwin then Harrison Ford.) There are many others like this (James Bond, for example).
Even if the actor doesn't change (main roles in Harry Potter), the character is constant. So, at what point would the Role become a node in its own right? When it does, can I have a 3-way relationship (Actor-Role-Movie)? Say I start off with it being a relationship and then, down the line, decide it should've been a node: is there a simple way to go through the database and convert it?
No, there is no automatic way to convert your data model. When you start your own database, first take time to find a fitting schema. There is no ideal way to create a schema, and many different models can fit the same situation without being wrong.
My strategy is to put little information on the relationship itself. I only add properties that directly concern the relationship and store all the other data in the nodes. Also think about which properties you could use for traversing the graph. For example, you might need flags, or even different relationship types for relationships that are more or less the same. For instance, apoc.algo.aStar only follows the relationship types you specify, so you can keep certain nodes out of a path by connecting them with a different relationship type. So keep in mind the procedures you might want to use later.
Try to keep the schema as simple as possible and stay consistent about which things are nodes and which deserve a relationship. Don't mix them up.
Choose a design that makes sense for you! Compare
(device 1)-[cable]-(device 2)
with
(device 1)-[has cable]-(cable)-[has cable]-(device 2)
In this case I'd prefer the first, because [has cable] wouldn't add any information. Contrary to what I wrote above, I would store a lot of information on this [cable] relationship, but that makes sense to me because I wouldn't want to search a device node for cable information.
For your example, giving the role its own node is also a valid approach. For example, if you especially want to query which actors played the same role, I would definitely give the role its own node.
Summary:
Think of what you want to do with the data and choose the easiest model.
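As a rough illustration of the "role as a node" option, the following Python sketch starts from relationship-style records with a role property and promotes each distinct role to its own entry, keeping the actor-role-movie link as a three-part record. It is only a model of the idea in plain data structures, not a Neo4j migration; the function names `promote_roles` and `actors_by_character` are hypothetical.

```python
from collections import defaultdict

# Original model: ACTED_IN relationships that carry a "role" property.
acted_in = [
    {"actor": "Alec Baldwin", "movie": "The Hunt for Red October", "role": "Jack Ryan"},
    {"actor": "Harrison Ford", "movie": "Patriot Games", "role": "Jack Ryan"},
]


def promote_roles(relationships):
    """Turn each distinct role into its own node-like entry.

    Returns the set of character names plus (actor, character, movie)
    triples, which play the part of the 3-way Actor-Role-Movie link.
    """
    characters = set()
    performances = []
    for rel in relationships:
        characters.add(rel["role"])
        performances.append((rel["actor"], rel["role"], rel["movie"]))
    return characters, performances


def actors_by_character(performances):
    """Group actors by the character they played."""
    result = defaultdict(set)
    for actor, character, _movie in performances:
        result[character].add(actor)
    return dict(result)


characters, performances = promote_roles(acted_in)
shared = actors_by_character(performances)
```

Once the character exists on its own, a question like "which actors shared a role" becomes a lookup on the character rather than a comparison of property values across relationships.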

Storing data that's related to a user but that's it

In our graph database, I'm looking to store miscellaneous actions that a user performs but that aren't really related to anything else, such as changing a password or updating a username. (We have about 20 other use cases.)
I attached two possibilities below, but I don't know which one is better if I eventually want to run queries such as how many people changed their password yesterday, or who changed their password yesterday.
There are several more options and, as always, it depends. What are your queries, and what is your most likely entry point into the graph? For example:
Those are just three possibilities and the model really depends on how you want to query your data. What else could I have done ?
I could have done away with the Event nodes altogether and put the properties on a relationship between User and EventType. It is hard to use that relationship as an entry point into the graph, though.
I could have added a Date node which could be an entry point into the graph (or maybe an index on eventDate is sufficient).
I could have ...
There is no single right (or wrong) answer. The better choice is often the one that reflects your reality/business the best.
Hope this helps.
Regards,
Tom

Neo4j Relationship design

Revisiting Neo4j after a long absence. I have read a lot of articles but still find I have a few questions to get me going again....
Bidirectional relationships
I have a “connected to”-type scenario where 2 nodes are connected to each other. In fact, the idea is to model a type of flow. However, the flow in both directions is not always the same. I’m uncertain of the best method to use: 1 relationship with 2 properties or 2 distinct relationships?
The former feels like the comfortable choice but then doesn't feel natural in terms of modelling the actual facts. For example: what should the properties be called? FlowIn and FlowOut wouldn't make sense when looked at from each node's perspective. I also wonder about the performance of properties versus relationships in this case, since these values will need to be updated.
Representing Time
Now I want to take a step further and represent the flow between nodes at specific times or, more accurately, between specific times. So between 2pm and 3pm the flow between #1 and #2 will be x.
How should this be done in an optimal way? A relationship per time frame per connection seems... verbose. Could representing a timeframe as a node be of value?
Are there any Maximum Flow samples with Cypher out there?
Particularly interested in push-relabel max flow problem solving.
Thank you for any advice you might have to offer.
While you have definitely given some thought to your problem, the question is a little unclear. This seems to be a question about graph data models: you would like to know how best to organize a model to represent a complex relationship. If you are trying to track the "flow" between two nodes, then assign a weight property to a unidirectional edge.
Bidirectional relationships should be carefully considered. Neo4j can process them just as fast as unidirectional relationships. A quote from GraphAware about using bidirectional relationships:
Relationships in Neo4j can be traversed in both directions with the same speed. Moreover, direction can be completely ignored. Therefore, there is no need to create two different relationships between nodes, if one implies the other.
I believe your problems can be alleviated by gaining a better understanding of graph data models. Looking at a few different models and understanding the why will help more than understanding Cypher syntax at this point. May I suggest reading the survey by two professors at the University of Chile titled "Survey of Graph Database Models." The "Hypernode" model on page 21 may be of particular interest to you, since it sounds like you are trying to model a complex cyclic object. From page 21:
Hypernodes can be used to represent simple (flat) and complex objects (hierarchical, composite, and cyclic) as well as mappings and records. A key feature is its inherent ability to encapsulate information.
Hopefully that information helps you in your efforts to model a complex relationship.
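One possible reading of the weight-property suggestion, sketched in Python under stated assumptions: because the flow from #1 to #2 differs from the flow from #2 to #1, neither direction implies the other, so each direction gets its own directed entry, and each entry carries its time window and flow as properties. The tuple layout, the node names `n1` and `n2`, the sample values, and the helper functions are all hypothetical and only model the idea; this is not a max-flow implementation.

```python
# Each tuple models one directed relationship with its own properties:
# (source, target, window_start_hour, window_end_hour, flow)
flows = [
    ("n1", "n2", 14, 15, 30.0),
    ("n2", "n1", 14, 15, 12.5),
    ("n1", "n2", 15, 16, 28.0),
]


def flow_between(flows, source, target, hour):
    """Sum the flow from source to target over windows that contain `hour`."""
    return sum(
        flow
        for s, t, start, end, flow in flows
        if s == source and t == target and start <= hour < end
    )


def net_flow(flows, a, b, hour):
    """Flow from a to b minus flow from b to a during the given hour."""
    return flow_between(flows, a, b, hour) - flow_between(flows, b, a, hour)


forward = flow_between(flows, "n1", "n2", 14)
backward = flow_between(flows, "n2", "n1", 14)
net = net_flow(flows, "n1", "n2", 14)
```

Keeping the window on the relationship avoids naming problems like FlowIn and FlowOut, since the direction of each entry already says which way the flow goes.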

"Ordered lists" and bi- vs. uni-directional relationships

I'm a Neo4j newbie, just playing around in the browser modelling data for a project at the moment.
Here's my use case: A user can have a bunch of items. Each item is described by a storyline.
(:User)-[:OWNS]->(:Item)<-[:DESCRIBES]-(:Storyline)
No issues so far. However, the storyline needs to contain "cards", basically chapters of the story that need to be in order. So, my first thought was this.
(:Storyline)<-[:FOLLOWS]-(a:Card)<-[:FOLLOWS]-(b:Card)
However, if we start at Card B, we now have to follow the path back to see what storyline/item the card belongs to. Seems inefficient. Would it be better to do this?
(a:Card {order: 0})-[:BELONGS_TO]->(:Storyline)
(b:Card {order: 1})-[:BELONGS_TO]->(:Storyline)
Or, might I even trash the Storyline and just have the following?
(:Card {order:0})-[:DESCRIBES]->(:Item)
Next, a user should be free to create a link to another storyline card belonging to his own or any other user's item.
(storyA:Card)-[:LINKS_TO]->(storyB:Card)
However, the owner of storyB may or may not want to link back to the first guy's story. I know you can ignore the direction of the relationship in a cypher query by doing:
(a)-[r]-(b)
But I read that explicitly creating bi-directional relationships is usually a bad idea. So if storyB wants to link back, how would you best represent this in the data model? Maybe another relationship type, like :LINKS_MUTUALLY or something, or a "mutual" boolean property on the :LINKS_TO relationship?
Regarding your first issue, it's usually better to have relationships rather than properties in this case.
I'd throw in FIRST and LAST relationships, like in this TimeTree, and model it as:
(a:Card)-[:DESCRIBES]->(i:Item)
(b:Card)-[:DESCRIBES]->(i:Item)
(c:Card)-[:DESCRIBES]->(i:Item)
(a:Card)<-[:FOLLOWS]-(b:Card)
(b:Card)<-[:FOLLOWS]-(c:Card)
(a:Card)<-[:FIRST_CARD]-(i:Item) //optional, for easy navigation
(c:Card)<-[:LAST_CARD]-(i:Item) //optional, for easy navigation
As for bidirectional relationships, they are only a bad idea if a relationship in one direction implies the other one. That does not apply here, so creating (storyA:Card)-[:LINKS_TO]->(storyB:Card) and (storyA:Card)<-[:LINKS_TO]-(storyB:Card) is perfectly fine, since each relationship is there for a different reason.
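To show how the FIRST_CARD and FOLLOWS relationships give an ordering without an `order` property, here is a small Python sketch that walks the chain. The dictionaries stand in for the relationships, and the names `followed_by`, `first_card`, `cards_in_order` and `item-1` are hypothetical.

```python
# (a:Card)<-[:FOLLOWS]-(b:Card): maps a card to the card that follows it.
followed_by = {"a": "b", "b": "c"}

# (i:Item)-[:FIRST_CARD]->(a:Card): maps an item to its first card.
first_card = {"item-1": "a"}


def cards_in_order(item):
    """Walk from the item's first card along FOLLOWS until the chain ends."""
    order = []
    seen = set()  # guards against an accidental cycle in the data
    card = first_card.get(item)
    while card is not None and card not in seen:
        order.append(card)
        seen.add(card)
        card = followed_by.get(card)
    return order


ordered = cards_in_order("item-1")
```

Inserting a card in the middle then means rewiring two FOLLOWS links, instead of renumbering the `order` property on every later card.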

Is there any way to ensure that a node is only connected to one instance of a particular relationship type

To clarify, let's assume that we have nodes representing people and the following relationships: "BIOLOGICAL_MOTHER" and "BIOLOGICAL_FATHER".
Then, for any person node, said node can only have one "BIOLOGICAL_MOTHER" and one "BIOLOGICAL_FATHER". How can we ensure that this is the case?
No. Neo4j currently only supports uniqueness constraints.
I believe several people are working on different schema constructs for Neo4j that would permit you to constrain graphs in any number of different ways. What you're asking for boils down to a database constraint saying that if a relationship of type BIOLOGICAL_FATHER already exists from one person to another, the DB must not accept the creation of another relationship of that same type. In other words, relationship cardinality constraints, by relationship type.
At the moment, I think the best you can do is verify in your application code that such a relationship doesn't exist before creating it, but the DB won't do this checking for you.
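A minimal sketch of such an application-side check, written in Python against a toy in-memory store rather than a real Neo4j driver. The class `FamilyGraph` and its method `set_parent` are hypothetical names. In a real database the check and the write would presumably also need to happen inside one transaction, otherwise two concurrent writers could both pass the check.

```python
class FamilyGraph:
    """Toy store that allows at most one parent per (child, relationship type)."""

    def __init__(self):
        self.parents = {}  # (child, rel_type) -> parent

    def set_parent(self, child, rel_type, parent):
        """Create the relationship only if the child has none of that type yet.

        Returns True when the relationship was created, False when an
        existing relationship of the same type blocked it.
        """
        key = (child, rel_type)
        if key in self.parents:
            return False
        self.parents[key] = parent
        return True


family = FamilyGraph()
first = family.set_parent("Dave", "BIOLOGICAL_FATHER", "Sam")
second = family.set_parent("Dave", "BIOLOGICAL_FATHER", "Tom")
mother = family.set_parent("Dave", "BIOLOGICAL_MOTHER", "Ann")
```

The second call is rejected because Dave already has a BIOLOGICAL_FATHER, while the BIOLOGICAL_MOTHER relationship is tracked separately and goes through.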
The particular constraint you're looking for sounds easy enough. Hopefully a Neo4j dev will jump in here and say, "Oh, no worries, that's planned for release XYZ", but I'm not sure about that.
More broadly, there are a number of issues with graphs that make constraints very tricky. In my personal graph domain, I'd like to make it impossible to create new relationships such that they would introduce cycles in the graph over a particular relationship type. (E.g. (a)-[:owns]->(b)-[:owns]->(a) is extremely undesirable for me). This would be a very costly constraint to actually enforce in the general case, since verifying whether a new relationship was OK could potentially involve traversing a huge graph.
Over the long run, it seems reasonable that Neo4j might implement local constraints, but still shy away from anything that implies non-local constraint checking.
Steve,
In terms of Cypher, if I am given two names of people - say Sam and Dave, and wish to make Sam the father of Dave, but only if Dave doesn't yet have a father, I could do something like this:
MATCH (f {name : 'Sam'}), (s {name : 'Dave'})
WHERE NOT (s)<-[:FATHER]-()
CREATE (f)-[:FATHER]->(s)
If Dave already has a father the WHERE clause filters Dave out, which means no relationship will be created.
Grace and peace,
Jim
