Modeling conditional relationships in Neo4j v2 (Cypher)

I have two related problems I need help with.
Problem 1: How do I model a conditional relationship?
I want my data to indicate that when test CLT1's "Result" property = "High", CLT1 has relationship to Disease A. If I take a node-centric approach, I imagine that the code might look something like...
(CLT 1 {Result: "High"}) -[:INDICATES] -> (Disease A)
Further, when CLT1's "Result" property = "Low", CLT1 has a relationship to Disease B
(CLT 1 {Result: "Low"}) -[:INDICATES] -> (Disease B)
Alternatively, if I take a relationship-centric approach, the code might look like this...
(CLT 1) -[:INDICATES {Result: "High"}] -> (Disease A)
(CLT 1) -[:INDICATES {Result: "Low"} ] -> (Disease B)
Problem 2
While modeling my data, I have run into cases where there is one node with a unique name but with different labels or properties. I want these nodes to be distinguishable, but they are not, because they look the same to Cypher.
I can either give them multiple properties, labels or different names. The diversity has to be for each different class... in labels or properties (1+n labels, properties) or in different names.
Problem 2 relates to Problem 1 in that I can't model the conditional relationship or distinguish the same node (CLT1) by its labels or properties. I may have to resolve it by making the query-able "condition" in the relationship.
Do I have this right? Do I have any other options?

For your first question, I'd take the relationship-centric approach as this kind of represents the inference of the information leading from your result-node to the disease.
Should work pretty well in modeling and querying too.
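A minimal sketch of the relationship-centric model, assuming hypothetical Test and Disease labels and name/result properties (none of these names come from the question). Run the two statements separately:

```cypher
// Hypothetical labels (Test, Disease) and property names, for illustration only.
CREATE (t:Test {name: "CLT1"}),
       (a:Disease {name: "Disease A"}),
       (b:Disease {name: "Disease B"}),
       (t)-[:INDICATES {result: "High"}]->(a),
       (t)-[:INDICATES {result: "Low"}]->(b);

// Which disease does a "High" CLT1 result indicate?
MATCH (:Test {name: "CLT1"})-[:INDICATES {result: "High"}]->(d:Disease)
RETURN d.name;
```

The condition lives on the relationship, so CLT1 stays a single node and the result value selects which INDICATES relationship is followed.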
For your second question: that's what node labels are for. They represent different roles a node can play, each with different relevant properties and relationships.
So you could do MATCH (p:Person {name:"Jose"}) and treat it differently from MATCH (d:Developer {name:"Jose"}), i.e. look at other properties and relationships.

Related

NEO4J: caption value in node not display?

I tried to create 2 nodes and relate them, but the caption does not appear on the second node, "water".
You can really only pick one field per label to use as the identifier for the Neo4j Browser, as typically you'd expect there to be only one main human-readable identifier for a given thing in a data model. You've here got two concepts ('Medicine', in your model) that have very different ways of identifying them which isn't wrong but is probably unhelpful.
For example, a Person label might have a name field, a Car label might have a model field and so on - you wouldn't typically have a mix of the two, but the two labels might co-exist in the same graph and you could set the browser up to use name on Person nodes and model on Car nodes.
Your data model's a bit odd to my eyes which might be the cause of the confusion - why are m1 and m2 both Medicines, when m1 seems to really be a symptom of a disease? The node labels in the model ideally would be the nouns or concepts of your domain, and the relationships the verbs that relate them.
For example, I personally would model what you've put as an example as follows:
CREATE (s: Symptom { name: 'Fever' })
CREATE (m: Medicine { name: 'Water', chemicalName: 'Dihydrogen oxide' })
CREATE (m)-[:TREATS]->(s)
RETURN m, s
You could elect to use chemicalName as the caption for your Medicine nodes, or just the plain name field as you see fit - but they would all then render consistently in the graph visualisation.

How can I mitigate having bidirectional relationships in a family tree, in Neo4j?

I am running into this wall regarding bidirectional relationships.
Say I am attempting to create a graph that represents a family tree. The problem here is that:
* Timmy can be Suzie's brother, but
* Suzie can not be Timmy's brother.
Thus, it becomes necessary to model this in 2 directions:
(Sure, technically I could say SIBLING_TO and leave only one edge... but I'm not sure what the vocabulary would be when I try to connect a grandma to a grandson.)
When it's all said and done, I'm pretty sure there's no way around the fact that the direction matters in this example.
I was reading this blog post, regarding common Neo4j mistakes. The author states that this bidirectionality is not the most efficient way to model data in Neo4j and should be avoided.
And I am starting to agree. I set up a mock set of 2 families:
and I found that a lot of queries I was attempting to run were going very, very slow. This is because of the 'all connected to all' nature of the graph, at least within each respective family.
My question is this:
1) Am I correct to say that bidirectionality is not ideal?
2) If so, is my example of a family tree representable in any other way...and what is the 'best practice' in the many situations where my problem may occur?
3) If it is not possible to represent the family tree in another way, is it technically possible to still write queries in some manner that gets around the problem of 1) ?
Thanks for reading this and for your thoughts.
Storing redundant information (your bidirectional relationships) in a DB is never a good idea. Here is a better way to represent a family tree.
To indicate "siblingness", you only need a single relationship type, say SIBLING_OF, and you only need to have a single such relationship between 2 sibling nodes.
To indicate ancestry, you only need a single relationship type, say CHILD_OF, and you only need to have a single such relationship from a child to each of its parents.
You should also have a node label for each person, say Person. And each person should have a unique ID property (say, id), and some sort of property indicating gender (say, a boolean isMale).
With this very simple data model, here are some sample queries:
To find Person 123's sisters (note that the pattern does not specify a relationship direction):
MATCH (p:Person {id: 123})-[:SIBLING_OF]-(sister:Person {isMale: false})
RETURN sister;
To find Person 123's grandfathers (note that this pattern specifies that matching paths must have a depth of 2):
MATCH (p:Person {id: 123})-[:CHILD_OF*2..2]->(gf:Person {isMale: true})
RETURN gf;
To find Person 123's great-grandchildren:
MATCH (p:Person {id: 123})<-[:CHILD_OF*3..3]-(ggc:Person)
RETURN ggc;
To find Person 123's maternal uncles:
MATCH (p:Person {id: 123})-[:CHILD_OF]->(:Person {isMale: false})-[:SIBLING_OF]-(maternalUncle:Person {isMale: true})
RETURN maternalUncle;
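To try the queries above, a small sample family can be created first. This is only a sketch: the ids, names, and the name property are made up for illustration.

```cypher
// Hypothetical sample family; ids and names are invented.
// CREATE requires a direction on every relationship, but the sibling
// queries above ignore it, so one SIBLING_OF per pair is enough.
CREATE (gm:Person {id: 1, name: "Grandma", isMale: false}),
       (mom:Person {id: 2, name: "Mom", isMale: false}),
       (uncle:Person {id: 3, name: "Uncle", isMale: true}),
       (timmy:Person {id: 123, name: "Timmy", isMale: true}),
       (suzie:Person {id: 124, name: "Suzie", isMale: false}),
       (mom)-[:CHILD_OF]->(gm),
       (uncle)-[:CHILD_OF]->(gm),
       (mom)-[:SIBLING_OF]->(uncle),
       (timmy)-[:CHILD_OF]->(mom),
       (suzie)-[:CHILD_OF]->(mom),
       (timmy)-[:SIBLING_OF]->(suzie);
```

With this data, the sisters query for Person 123 should match Suzie and the maternal-uncles query should match Uncle, each through a single stored relationship per pair.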
I'm not sure if you are aware that it's possible to query bidirectionally (that is, to ignore the direction). So you can do:
MATCH (a)-[:SIBLING_OF]-(b)
and since I'm not matching a direction it will match both ways. This is how I would suggest modeling things.
Generally you only want to make multiple relationships if you actually want to store different state. For example, a KNOWS relationship could apply only one way, because person A might know person B, but B might not know A. Similarly, you might have a LIKES relationship with a value property showing how much A likes B, and there might be different strengths of "liking" in the two directions.
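A short sketch of that last case, where two opposite relationships are justified because each stores different state. The node names and the numeric values are hypothetical:

```cypher
// Hypothetical people and "liking" strengths.
CREATE (a:Person {name: "A"}),
       (b:Person {name: "B"}),
       (a)-[:LIKES {value: 9}]->(b),
       (b)-[:LIKES {value: 2}]->(a);

// Direction matters here: how much does A like B?
MATCH (:Person {name: "A"})-[l:LIKES]->(:Person {name: "B"})
RETURN l.value;
```

Here the second relationship is not redundant, since dropping either one would lose information.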

What is the most performant way to create the following MATCH statement and why?

The question:
What is the most performant way to create the following MATCH statement and why?
The detailed problem:
Let's say we have a Place node with a variable number of properties and need to look up nodes from potentially billions of nodes by its category. I'm trying to wrap my head around the performance of each query and it's proving to be quite difficult.
The possible queries:
Match Place node using a property lookup:
MATCH (entity:Place { category: "Food" })
Match Place node with isCategory relationship to Food node:
MATCH (entity:Place)-[:isCategory]->(category:Food)
Match Place node with Food relationship to Category node:
MATCH (entity)-[category:Food]->(:Category)
Match Food node with isCategoryFor relationship to Place node:
MATCH (category:Food)-[:isCategoryFor]->(entity:Place)
And obviously all the variations in between. With relationship directions going the other way as well.
More complexity:
Let's throw in a little more complexity and say we now need to find all Place nodes using multiple categories. For example: Find all Place nodes with category Food or Bar
Would we just tack on another MATCH statement? If not, what is the most performant route to take here?
Extra:
Is there a tool to help me describe the traversal process and tell me the best method to choose?
If I understand your domain correctly, I would recommend making your categories into nodes themselves.
MERGE (:Category {name:"Food"})
MERGE (:Category {name:"Bar"})
MERGE (:Category {name:"Park"})
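And connecting each Place node to the categories it belongs to. MERGE matches or creates a pattern as a whole, so bind the existing Category nodes first and merge only the places and relationships; merging the full pattern in one go would create duplicate Category nodes.
MATCH (park:Category {name:"Park"}), (food:Category {name:"Food"}), (bar:Category {name:"Bar"})
MERGE (central:Place {name:"Central Park"})
MERGE (joes:Place {name:"Joe's Diner"})
MERGE (central)-[:IS_A]->(park)
MERGE (joes)-[:IS_A]->(food)
MERGE (joes)-[:IS_A]->(bar)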
Then, if you want to find Places that belong to a Category, it can be pretty quick. Start by matching the category, then branch out to the places related to the category.
MATCH (c:Category {name:"Bar"}), (c)<-[:IS_A]-(p:Place)
RETURN p
You'll have a relatively limited number of categories, so matching the category will be quick. Then, because of the way Neo4j actually stores data, it will be fast to find all the places related to that category.
More Complexity
Finding places within multiple categories will be easy as well.
MATCH (c:Category)<-[:IS_A]-(p:Place)
WHERE c.name = "Bar" OR c.name = "Food"
RETURN DISTINCT p
Again, you just match the categories first (fast because there aren't many of them), then branch out to the connected places.
Use an Index
If you want fast, you need to use indexes where it makes sense. In this example, I would use an index on the category's name property.
CREATE INDEX ON :Category(name)
Or better yet, use a uniqueness constraint on the category names, which will index them and prevent duplicates.
CREATE CONSTRAINT ON (c:Category) ASSERT c.name IS UNIQUE
Indexes (and uniqueness) make a big difference on the speed of your queries.
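The question also asks for a tool that describes how a query is executed. A hedged sketch, assuming a Neo4j version in which EXPLAIN and PROFILE are available as query prefixes (older releases may only offer profiling through the shell): prefix the candidate queries and compare the plans.

```cypher
// EXPLAIN shows the planned operators without running the query.
EXPLAIN
MATCH (c:Category {name: "Bar"})<-[:IS_A]-(p:Place)
RETURN p;

// PROFILE runs the query and reports rows and database hits per operator.
PROFILE
MATCH (p:Place {category: "Bar"})
RETURN p;
```

Comparing the operators and database hits of the two plans on representative data is a more reliable guide than reasoning about the queries in the abstract.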
Why this is fastest
Neo4j stores nodes and relationships in a very compact, quick-to-access format. Once you have a node or relationship, getting the adjacent relationships or nodes is very fast. However, it stores each node's (and relationship's) properties separately, meaning that looking through properties is relatively slow.
The goal is to get to a starting node as quickly as possible. Once there, traversing related entities is quick. If you only have 1,000 categories, but you have a billion places, it will be faster to pick out an individual Category than an individual Place. Once you have that starting node, getting to related nodes will be very efficient.
The Other Options
Just to reinforce, this is what makes your other options slower or otherwise worse.
In your first example, you are looking through properties on each node to look for the match. Property lookup is slow and you are doing it a billion times. An index can help with this, but it's still a lot of work. Additionally, you are effectively duplicating the category data over each of your billion places, and not taking advantage of Neo4j's strengths.
In all your other examples, your data models seem odd. "Food", "Bar", "Park", etc. are all instances of categories, not separate types. They should each be their own node, but they should all have the Category label, because that's what they are. In addition, categories are things, and thus they should be nodes. A relationship describes the connection between things. It does not make sense to use categories in this way.
I hope this helps!

Modeling arrows/relationships as nodes in Neo4j

Relationships (arrows) in Neo4j cannot have more than one type/label (see here, and here). I have a data model in which edges need labels and (probably) properties. If I decide to use Neo4j (instead of OrientDB, which supports labeled arrows), I think I would then have two options to model an arrow, say f, between two nodes A and B:
1) encode an arrow f as a span, say A<--f-->B, such that f is also a node and --> and <-- are arrows.
or
2) encode an arrow f as A --> f -->B, such that f is a node again and two --> are arrows.
Though this seems to add unnecessary complexity to my data model, there does not seem to be any other option at the moment if I want to use Neo4j. So I am trying to see which of the above encodings fits my queries better (queries are the core of my system). To do so, I need to resort to examples. So I have two questions:
First Question:
part1) I have nodes labeled as Person and father, and there are arrows between them like Person<-[:sr]-father-[:tr]->Person in order to model who is the father of whom (tr is the father of sr). For a given person p1, how can I get all of his ancestors?
part2) If I had a Person-[:sr]->father-[:tr]->Person structure instead for modeling the father relationship, what would the same query look like?
This is answered here when father is considered as a simple relationship (instead of being encoded as a node)
Second Question:
part1) I have nodes labeled as A nodes with the property p1 for each. I want to query A nodes, get those elements that p1<5, then create the following structure: for each a1 in the query result I create qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
part2) What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
This question is answered here when isA is considered as a simple arrow (instead of being modeled as a node).
First, some terminology; relationships don't have labels, they only have types. And yes, one type per relationship.
Second, relative to modeling, I think the direction of the relationship isn't always super important, since with neo4j you can traverse it both ways easily. So the difference between A-->f-->B and A<--f-->B I think should be entirely driven by what makes sense semantically for your domain, nothing else. So your options (1) and (2) at the top seem the same to me in terms of overall complexity, which brings me to point #3:
Your main choice is between making a complex relationship into a node (which I think we're calling f here) or keeping it as a relationship. Making "a relationship into a node" is called reification and I think it's considered a fairly standard practice to accommodate a number of modeling issues. It does add complexity (over a simple relationship) but adds flexibility. That's a pretty standard engineering tradeoff everywhere.
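As a rough illustration of reification, the arrow f can be created as a node of its own. All labels, names, and the weight property below are hypothetical; only sr and tr come from the question.

```cypher
// Hypothetical names throughout. The arrow f becomes a node, so it can
// carry several labels, whereas a relationship has exactly one type.
CREATE (a:Thing {name: "A"}),
       (b:Thing {name: "B"}),
       (f:Arrow:Mapping {name: "f", weight: 0.5}),
       (a)-[:sr]->(f)-[:tr]->(b);

// Follow the reified arrow from A to its target and read its properties.
MATCH (:Thing {name: "A"})-[:sr]->(f:Arrow)-[:tr]->(target)
RETURN f.name, f.weight, target.name;
```

The cost is an extra hop in every traversal, which is the complexity versus flexibility trade-off described above.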
So with all of that said, for your first question I wouldn't recommend an intermediate node at all. :father is a very simple relationship, and I don't see why you'd ever need more than one type on it. So for question one, I would pick "neither of the options you list" and would instead model it as (personA)-[:father]->(personB), which is simpler. You'd query that by saying
MATCH (personA { name: "Bob"})-[:father]->(bobsDad) RETURN bobsDad
Yes, you could model this as (personA)-[:sr]->(fatherhood)-[:tr]->(personB) but I don't see how this gains you much. As for the relationship direction, again it doesn't matter for performance or query, only for semantics of whatever :tr and :sr are supposed to mean.
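The question asked for all ancestors rather than just the father. A minimal sketch, assuming the simple (personA)-[:father]->(personB) model and the same hypothetical name property, uses a variable-length pattern:

```cypher
// Follow one or more :father relationships outward from Bob.
MATCH (bob {name: "Bob"})-[:father*]->(ancestor)
RETURN ancestor;

// Same idea, limited to at most three generations.
MATCH (bob {name: "Bob"})-[:father*1..3]->(ancestor)
RETURN ancestor;
```

On a large graph, an upper bound on the path length is usually worth adding to keep the traversal contained.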
I have nodes labeled as A nodes with the property p1 for each. I want
to query A nodes, get those elements that p1<5, then create the
following structure: for each a1 in the query result I create
qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
That's this:
MATCH (aNode:A)
WHERE aNode.p1 < 5
WITH aNode
MATCH (qa1 { label: "some qa1 node" })
CREATE (qa1)<-[:sr]-(isA)-[:tr]->(aNode);
Note that you'll need to adjust the criteria for qa1 and also specify something meaningful for isA.
What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
It should be trivial to modify that query above, just change the direction of the arrows, same query.

Neo4j node property type

I'm playing around with neo4j, and I was wondering, is it common to have a type property on nodes that specify what type of Node it is? I've tried searching for this practice, and I've seen some people use name for a purpose like this, but I was wondering if it was considered a good practice or if indexes would be the more practical method?
An example would be a "User" node, which would have type: user, this way if the index was bad, I would be able to do an all-node scan and look for types of user.
Labels have been added to neo4j 2.0. They fix this problem.
You can create nodes with labels:
CREATE (me:American {name: "Emil"}) RETURN me;
You can match on labels:
MATCH (n:American)
WHERE n.name = 'Emil'
RETURN n
You can set any number of labels on a node:
MATCH (n)
WHERE n.name='Emil'
SET n :Swedish:Bossman
RETURN n
You can delete any number of labels on a node:
MATCH (n { name: 'Emil' })
REMOVE n:Swedish
Etc...
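Applied to the "User" example from the question, the difference could look like this. The User label and the type property value are assumptions taken from the question's wording, not from an actual schema:

```cypher
// With a label, only nodes carrying :User are considered.
MATCH (u:User)
RETURN u;

// With a type property and no label, every node has to be inspected
// (unless a separate index is consulted first).
MATCH (n)
WHERE n.type = "user"
RETURN n;
```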
True, it does depend on your use case.
If you add a type property and then wish to find all users, then you're in potential trouble, as you've got to examine that property on every node to get to the users. In that case, the index would probably do better, but not in cases where you need to query for all users with conditions and relations not available in the index (unless of course, your index is the source of the "start").
If you have graphs like mine, where a relation type implies two different node types like A-(knows)-(B) and A or B can be a User or a Customer, then it doesn't work.
So your use case is really important- it's easy to model graphs generically, but important to "tune" it as per your usage pattern.
IMHO you shouldn't have to put a type property on the node. Instead, a common way to reference all nodes of a specific "type" is to connect all user nodes to a node called "Users" maybe. That way starting at the Users node, you can very easily find all user nodes. The "Users" node itself can be indexed so you can find it easily, or it can be connected to the reference node.
I think it's really up to you. Some people like indexed type attributes, but I find that they're mostly useful when you have other indexed attributes to narrow down the number of index hits (search for all users over age 21, for example).
That said, as @Luanne points out, most of us try to solve the problem in-graph first. Another way to do that (and the more natural way, in my opinion) is to use the relationship type to infer a practical node type, i.e. "A - (knows) -> B", so A must be a user or some other thing that can "know", and B must be another user, a topic, or some other object that can "be known".
For client APIs, modeling the element type as a property makes it easy to instantiate the right domain object in your client-side code so I always include a type property on each node/vertex.
The "type" var name is commonly used for this, but in some languages like Python, "type" is the name of a built-in, so I use "element_type" in Bulbs ( http://bulbflow.com/quickstart/#models ).
This is not needed for edges/relationships because they already contain a type (the label) -- note that Neo4j also uses the keyword "type" instead of label for relationships.
I'd say it's common practice. As an example, this is exactly how Spring Data Neo4j knows of which entity type a certain node is. Each node has "type" property that contains the qualified class name of the entity. These properties are automatically indexed in the "types" index, thus nodes can be looked up really fast. You could implement your use case exactly like this.
Labels have recently been added to Neo4j 2.0 ( http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-labels.html ). They are still under development at the moment, but they address this exact problem.
