Why doesn't Neo4j allow undirected or bidirectional relationships at creation time? - neo4j

I know that Neo4j requires a relationship direction at creation time but allows that direction to be ignored at query time. This way I can query my graph while ignoring the relationship direction.
I also know that there are some workarounds for cases where the relationships are naturally bidirectional or undirected, as described here.
My question is: why is it implemented that way? Is there a good reason not to allow undirected or bidirectional relationships at creation time? Is it a limitation of the database architecture?
Cypher statements like the ones below are not allowed:
CREATE ()-[:KNOWS]-()
CREATE ()<-[:KNOWS]->()
I searched the web for an answer, but I did not find much. For example, this GitHub issue.
It is strange to have to define a direction for a relationship that doesn't have one. It seems to me that I'm hurting the semantics of my graph.
EDIT 1:
To clarify my standpoint about a "semantic problem" (maybe the term is wrong):
Suppose that I run this simple CREATE statement:
CREATE (a:Person {name:'a'})-[:KNOWS]->(b:Person {name:'b'})
As a result I have this very simple graph:
The :KNOWS relationship has a direction only because Neo4j requires a relationship direction at creation time. In my domain, a knows b and b knows a.
Now, a new team member will query my graph with this Cypher query:
MATCH path = (a:Person {name:'a'})-[:KNOWS]-(b:Person {name:'b'})
return path
This new team member doesn't know that when I created this graph I considered the :KNOWS relationship to be undirected. The result that he will see is the same:
From the result, this new team member may think that only Person a knows Person b. That seems bad to me. Doesn't it to you? Does this make any sense?

Fundamentally, it boils down to the internals of how the data is stored on disk in Neo4j -- note Chapter 6 of the O'Reilly Neo4j e-book.
The relationship record has a "firstNode" and a "secondNode", where each is either the left-hand or the right-hand side of the relationship.
Flagging a relationship as uni- or bidirectional would require an additional bit per node; I would argue it is better to retain the direction in the data store and just ignore it during querying.
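As a rough illustration of that idea, here is a minimal Python sketch of a store that always keeps an ordered pair and lets the query decide whether the order counts. It is not Neo4j's actual record format; the names `Relationship`, `first_node`, `second_node` and `match` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    rel_type: str
    first_node: str   # start of the stored direction
    second_node: str  # end of the stored direction


def match(rels, rel_type, a, b, directed=True):
    """Return relationships of rel_type between a and b.

    With directed=False the stored order is ignored, which mirrors
    writing (a)-[:TYPE]-(b) instead of (a)-[:TYPE]->(b).
    """
    found = []
    for r in rels:
        if r.rel_type != rel_type:
            continue
        forward = (r.first_node, r.second_node) == (a, b)
        backward = (r.first_node, r.second_node) == (b, a)
        if forward or (not directed and backward):
            found.append(r)
    return found


# One stored record, created as (a)-[:KNOWS]->(b)
store = [Relationship("KNOWS", "a", "b")]
```

Under these assumptions, `match(store, "KNOWS", "b", "a")` finds nothing, while `match(store, "KNOWS", "b", "a", directed=False)` finds the single stored record. No extra flag is needed on the record itself.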

In Neo4j relationships are always directed.
But if you don't care about the direction, you can ignore the direction when querying.
MATCH (p1:Person {name:"me"})-[:KNOWS]-(p2)
RETURN p2;
And with MERGE you can also leave off the direction when creating; the relationship is still stored with a direction, Neo4j just picks one for you.
MATCH (p1:Person {name:"me"})
MATCH (p2:Person {name:"you"})
MERGE (p1)-[:KNOWS]-(p2);
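To make the behaviour of an undirected MERGE more concrete, here is a small Python simulation of the idea: look for an existing relationship in either direction and create a single directed one only if none is found. This is a sketch of the semantics as described above, not Neo4j code; `merge_undirected` and the tuple layout are assumptions for illustration.

```python
def merge_undirected(rels, rel_type, a, b):
    """Mimic MERGE (a)-[:TYPE]-(b) on a list of (type, start, end) tuples.

    Reuses a relationship stored in either direction if one exists,
    otherwise stores exactly one directed relationship.
    """
    for r in rels:
        if r[0] == rel_type and {r[1], r[2]} == {a, b}:
            return r, False  # existing relationship, nothing created
    new_rel = (rel_type, a, b)  # direction picked by the store: a -> b
    rels.append(new_rel)
    return new_rel, True


rels = []
first, created_first = merge_undirected(rels, "KNOWS", "me", "you")
# Merging the same pair the other way round must not add a second edge.
second, created_second = merge_undirected(rels, "KNOWS", "you", "me")
```

After both calls the list still holds a single `("KNOWS", "me", "you")` entry, which is why one relationship is enough when both directions mean the same thing.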
You only need 2 relationships if they really convey a different meaning, e.g. :FOLLOWS on Twitter.

It seems to me that I'm hurting the semantics of my graph.
I can't see why a < or > symbol used when creating a relationship hurts the semantics of your graph if you are not going to use that symbol when matching (and are thus treating the relationship as undirected/bidirectional).
Suppose that the syntax you propose were supported. How would you then connect two nodes a and b with an undirected relationship? You would still have two options:
CREATE (a)-[:KNOWS]-(b)
CREATE (b)-[:KNOWS]-(a)
The pair (a, b) is always ordered by appearance even if not by semantics. So even if we remove the < or > symbol from the relationship declaration, the problem with the order of the nodes in it cannot be eliminated. Therefore, simply don't treat it as a problem.

Related

How can I mitigate having bidirectional relationships in a family tree, in Neo4j?

I am running into this wall regarding bidirectional relationships.
Say I am attempting to create a graph that represents a family tree. The problem here is that:
* Timmy can be Suzie's brother, but
* Suzie can not be Timmy's brother.
Thus, it becomes necessary to model this in 2 directions:
(Sure, technically I could say SIBLING_TO and leave only one edge... but I'm not sure what the vocabulary would be when I try to connect a grandma to a grandson.)
When all is said and done, I'm pretty sure there's no way around the fact that direction matters in this example.
I was reading this blog post, regarding common Neo4j mistakes. The author states that this bidirectionality is not the most efficient way to model data in Neo4j and should be avoided.
And I am starting to agree. I set up a mock set of 2 families:
and I found that a lot of the queries I was attempting to run were very, very slow. This is because of the 'all connected to all' nature of the graph, at least within each respective family.
My question is this:
1) Am I correct to say that bidirectionality is not ideal?
2) If so, is my example of a family tree representable in any other way...and what is the 'best practice' in the many situations where my problem may occur?
3) If it is not possible to represent the family tree in another way, is it technically possible to still write queries in some manner that gets around the problem of 1) ?
Thanks for reading this and for your thoughts.
Storing redundant information (your bidirectional relationships) in a DB is never a good idea. Here is a better way to represent a family tree.
To indicate "siblingness", you only need a single relationship type, say SIBLING_OF, and you only need to have a single such relationship between 2 sibling nodes.
To indicate ancestry, you only need a single relationship type, say CHILD_OF, and you only need to have a single such relationship between a child and each of its parents.
You should also have a node label for each person, say Person. And each person should have a unique ID property (say, id), and some sort of property indicating gender (say, a boolean isMale).
With this very simple data model, here are some sample queries:
To find Person 123's sisters (note that the pattern does not specify a relationship direction):
MATCH (p:Person {id: 123})-[:SIBLING_OF]-(sister:Person {isMale: false})
RETURN sister;
To find Person 123's grandfathers (note that this pattern specifies that matching paths must have a depth of 2):
MATCH (p:Person {id: 123})-[:CHILD_OF*2..2]->(gf:Person {isMale: true})
RETURN gf;
To find Person 123's great-grandchildren:
MATCH (p:Person {id: 123})<-[:CHILD_OF*3..3]-(ggc:Person)
RETURN ggc;
To find Person 123's maternal uncles:
MATCH (p:Person {id: 123})-[:CHILD_OF]->(:Person {isMale: false})-[:SIBLING_OF]-(maternalUncle:Person {isMale: true})
RETURN maternalUncle;
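The same data model can be sketched outside Neo4j to show why one edge per fact is sufficient. The Python below uses made-up sample data and hypothetical helper names (`parents`, `siblings`, `grandfathers`, `maternal_uncles`); it only mirrors the logic of the queries above and is not how Neo4j executes them.

```python
# Hypothetical sample data: person id -> is_male
people = {1: True, 2: False, 3: True, 4: False, 5: True, 123: True}

# One CHILD_OF edge per (child, parent) pair; no mirrored PARENT_OF edge.
child_of = [(123, 2), (123, 3), (2, 4), (2, 5), (1, 4), (1, 5)]

# One SIBLING_OF edge per sibling pair, stored in an arbitrary direction.
sibling_of = [(1, 2)]


def parents(pid):
    # Follow CHILD_OF in its stored direction.
    return [p for c, p in child_of if c == pid]


def siblings(pid):
    # Ignore the stored direction, like (p)-[:SIBLING_OF]-(s).
    return [b if a == pid else a for a, b in sibling_of if pid in (a, b)]


def grandfathers(pid):
    # Two CHILD_OF hops, then keep the males.
    return sorted(g for p in parents(pid) for g in parents(p) if people[g])


def maternal_uncles(pid):
    # Mother first, then her male siblings in either stored direction.
    mothers = [p for p in parents(pid) if not people[p]]
    return sorted(s for m in mothers for s in siblings(m) if people[s])
```

With this sample data, person 123 has one grandfather (5) and one maternal uncle (1), even though the only SIBLING_OF edge is stored as pointing from 1 to 2.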
I'm not sure if you are aware that it's possible to query bidirectionally (that is, to ignore the direction). So you can do:
MATCH (a)-[:SIBLING_OF]-(b)
and since I'm not matching a direction it will match both ways. This is how I would suggest modeling things.
Generally you only want to make multiple relationships if you actually want to store different state. For example, a KNOWS relationship could apply only one way because person A might know person B, but B might not know A. Similarly, you might have a LIKES relationship with a value property showing how much A likes B, and there might be different strengths of "liking" in the two directions.

Returning on-the-fly relationship in graph form in Neo4j

I'm pretty new to Neo4j and graph DBs in general, and have been playing around with it for the last few days. I've now hit something I'm stumped on: I'm trying to create a "temporary" relationship between two disjoint nodes just for the sake of a RETURN, then not store this relationship within the DB afterwards.
The dataset I'm using is a graph of Movie and Person nodes provided in one of the basic Neo4j built-in tutorials. My query is currently as follows:
MATCH (p1:Person)-[r1:ACTED_IN]-(m1:Movie)-[r2:ACTED_IN]-(p2:Person)
WHERE p1.name="Kevin Bacon"
RETURN {start:p1,rel:"COSTAR",end:p2}
What I'd ultimately like to see is a central "Kevin Bacon" node with COSTAR relationships to a series of Person nodes around it, without any Movie nodes or ACTED_IN relationships being displayed. The query above does show the COSTAR relationship in the returned rows, but it does not appear on the graph itself; I've attached a few screenshots of what I'm seeing.
The only other idea I have is to use the MERGE keyword to create a COSTAR relationship, but (as I understand it) this actually stores the relationship in the DB which is what I'm trying to avoid.
Any suggestions would be greatly appreciated.
The neo4j Browser only visualizes nodes and relationships that actually exist in the DB. So, there is no way to do what you want without actually creating the COSTAR relationships, visualizing the result in the Browser, and then deleting all the COSTAR relationships.
As a workaround you could simply display the nodes of all of Kevin Bacon's costars, like this:
MATCH (p1:Person)-[:ACTED_IN]-(:Movie)-[:ACTED_IN]-(p2:Person)
WHERE p1.name="Kevin Bacon"
RETURN DISTINCT p2;
So you want the relationships to appear in the graph visualization in the Neo4j browser but not store these relationships in the graph itself? I can't think of a way to make that happen (without hacking it), but would deleting the relationships after you are done generating the visual work?
Query to create COSTAR relationships:
MATCH (p1:Person)-[r1:ACTED_IN]-(m1:Movie)-[r2:ACTED_IN]-(p2:Person)
WHERE p1.name="Kevin Bacon"
// MERGE is used here because CREATE UNIQUE was removed in Neo4j 4.0
MERGE (p1)<-[:COSTAR]-(p2);
Execute your query to populate the graph in Neo4j Browser...
Then to delete the COSTAR relationships:
MATCH (:Person)-[r:COSTAR]-(:Person)
DELETE r;
The best way to achieve this (now... 6 years later) is with the gds.graph.create.* procedures (assuming you load GDS; GDS 2.x renamed them to gds.graph.project.*)
https://neo4j.com/docs/graph-data-science/current/graph-create/
With a graph as simple as this, gds.graph.create(...) would be enough (creating COSTAR for all co-starrings)
Or, if you wanted to do some constraining, gds.graph.create.cypher(...)
The in-memory graph projection feels like what you wanted to achieve - it persists only as long as the DBMS is active, or until you call gds.graph.drop(...)
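The underlying idea of a derived relationship that is never written back can also be sketched client-side. The Python below is only an illustration with a small hand-written sample of ACTED_IN edges; it does not use the GDS API, and `costar_edges` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical sample of stored ACTED_IN edges: (person, movie)
acted_in = [
    ("Kevin Bacon", "Apollo 13"),
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "A Few Good Men"),
    ("Tom Cruise", "A Few Good Men"),
    ("Tom Hanks", "Cast Away"),
]


def costar_edges(center):
    """Derive (center, 'COSTAR', other) triples without storing them."""
    cast = defaultdict(set)
    for person, movie in acted_in:
        cast[movie].add(person)
    others = set()
    for members in cast.values():
        if center in members:
            others |= members - {center}
    return [(center, "COSTAR", other) for other in sorted(others)]


# The stored data is untouched; the COSTAR triples exist only in this list.
edges = costar_edges("Kevin Bacon")
```

The resulting triples could be handed to any drawing tool, and `acted_in` stays exactly as it was, which is the property the question was after.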

neo4j and uni-directional relationship

I'm new to neo4j. I've just read some information on this tool, installed it on Ubuntu and made a bunch of queries. And at this moment, I must confess that I really like it. However, there is something (I think very simple and intuitive) which I do not know how to implement. So, I created three nodes like so:
CREATE (n:Object {id:1}) RETURN n
CREATE (n:Object {id:2}) RETURN n
CREATE (n:Object {id:3}) RETURN n
And I created a hierarchical relationship between them:
MATCH (a:Object {id:1}), (b:Object {id:2}) CREATE (a)-[:PARENT]->(b)
MATCH (a:Object {id:2}), (b:Object {id:3}) CREATE (a)-[:PARENT]->(b)
So, I think this simple hierarchy should look like this:
(id:1)
-> (id:2)
-> (id:3)
What I want now is to get a path from any node. For example, if I want to have a path from node (id:2), I will get (id:2) -> (id:3). And if I want to get a path from node (id:1), I will get (id:1)->(id:2)->(id:3). I tried this query:
MATCH (n:Object {id:2})-[*]-(children) return n, children
which I thought should return the path (id:2)->(id:3), but unexpectedly (at least for me) it returns (id:1)->(id:2)->(id:3). So, what am I doing wrong and what is the right query to use?
All relationships in neo4j are directed. When you say (n)-[:foo]->(m), that relationship goes only one way, from n to m.
Now what's tricky about this is that you can navigate the relationship both ways. This doesn't make the relationship bi-directional, it never is -- it only means that you can look at it in either direction.
When you write this query: (n:Object {id:2})-[*]-(children) you didn't put an arrow head on that relationship, so children could refer to something either downstream or upstream of the node in question.
In other words, saying (n)-[:test]-(m) is the same thing as matching both (n)<-[:test]-(m) and (n)-[:test]->(m).
So children could refer to the ID 1 object or the ID 3 object.
Returning only children
To directly answer your question,
Your query
MATCH (n:Object {id:2})-[*]-(children) return n, children
matches not only relationships FROM (n {id:2}) TO its children, but also relationships TO (n {id:2}) FROM its parents.
You need to additionally specify the direction that you'd like. This returns the results you expect:
MATCH (n:Object {id:2})-[*]->(children) return n, children
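The difference between the two queries can be reproduced with a few lines of Python. This is a simplified sketch (it tracks visited nodes, whereas Cypher's variable-length matching works on paths), and `reachable` is a hypothetical helper, but it shows why dropping the arrow head pulls in the parent as well.

```python
# Stored PARENT edges from the question: 1 -> 2 -> 3
edges = [(1, 2), (2, 3)]


def reachable(start, directed=True):
    """Nodes reachable from start over one or more edges.

    directed=True follows edges only from start node to end node,
    directed=False follows them both ways.
    """
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for a, b in edges:
            neighbours = []
            if a == node:
                neighbours.append(b)
            if not directed and b == node:
                neighbours.append(a)
            for n in neighbours:
                if n != start and n not in seen:
                    seen.add(n)
                    frontier.append(n)
    return sorted(seen)
```

Starting from node 2, the directed version yields only node 3, while the undirected version yields nodes 1 and 3, matching what the original query returned.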
Issues with the example
I'd like to answer your comment about uni-directional and bi-directional relationships, but let's first resolve a couple of issues with the example.
Using correct labels
Let's revisit your example:
(:Object {id:1})-[:PARENT]->(:Object {id:2})-[:PARENT]->(:Object {id:3})
There's no point to using labels like :Object, :Node, :Thing. If you really don't care, don't use a label at all!
In this case, it looks like we're talking about people, although it could just as easily be motherboards and daughterboards, or something else!
Let's use People instead of Objects:
(:Person {id:1})-[:PARENT]->(:Person {id:2})-[:PARENT]->(:Person {id:3})
IDs in Neo4j
Neo4j stores its own ID for every node and relationship. You can retrieve those IDs with id(nodeOrRelationship), and access by ID with a WHERE clause or, in older versions, by specifying them as a start point for your match. START n=node(2) MATCH (n)-[*]-(children) return n, children is equivalent to your original query MATCH (n:Object {id:2})-[*]-(children) return n, children, provided the internal ID of that node is also 2.
Let's, instead of IDs, store something useful about the nodes, like names:
(:Person {name:'Bob'})-[:PARENT]->(:Person {name:'Mary'})-[:PARENT]->(:Person {name:'Tom'})
Relationship ambiguity
Lastly, let's disambiguate the relationships. Does PARENT mean "is the parent of", or "has this parent"? It might be clear to you which one you meant, but someone unfamiliar with your system might have the opposite interpretation.
I think you meant "is the parent of", so let's make that clear:
(:Person {name:'Bob'})-[:PARENT_OF]->(:Person {name:'Mary'})-[:PARENT_OF]->(:Person {name:'Tom'})
More information about uni-directional and bi-directional relationships in Neo4j
Now that we've taken care of a few basic issues with the example, let's address the directionality of relationships in Neo4j and graphs in general.
There are several ways we could have expressed the relationships in this example. Let's look at a few.
Undirected/bidirectional relationship
Let's abstract the parent relationship that we used above, for the purposes of discussion:
(bob)-[:KIN]-(mary)-[:KIN]-(tom)
Here the relationship KIN indicates that they are related but we don't know exactly who is the parent of whom. Is Tom the child of Mary, or vice-versa?
Notice that I didn't use any arrows. In the graph pseudo-code above, the KIN relationship is a bidirectional or undirected relationship.
Relationships in Neo4j, however, are always directional. If the KIN relationship was really how you wanted to track things, then you'd create a directional relationship, but always ignore the direction in your MATCH queries, e.g. MATCH (a)-[:KIN]-(b) and not MATCH (a)-[:KIN]->(b).
But is the KIN relationship really the best way to store this information? We can make it more specific. Let's go back to the PARENT_OF relationship that we were using earlier.
Directed/unidirectional relationship
Back to the example. We know that Bob is the parent of Mary who is the parent of Tom:
(bob)-[:PARENT_OF]->(mary)-[:PARENT_OF]->(tom)
Obviously, the corollary of this is:
(bob)<-[:CHILD_OF]-(mary)<-[:CHILD_OF]-(tom)
Or, equivalently:
(tom)-[:CHILD_OF]->(mary)-[:CHILD_OF]->(bob)
So, should we go ahead and create both the PARENT_OF and the CHILD_OF relationships between our (bob), (mary) and (tom) nodes?
The answer is no. We can pick one of those relationships, whichever best models the idea, and still be able to search both ways.
Using only the :PARENT_OF relationship, we can do
MATCH (mary {name:'Mary'})-[:PARENT_OF]->(children) RETURN children
to find the children, or
MATCH (mary {name:'Mary'})<-[:PARENT_OF]-(parents) RETURN parents
to find the parents, using (mary) as the starting point each time.
For more information, see this fantastic article from GraphAware

Neo4j Bidirectional Relationship

Is there a way to create a bidirectional relationship in Neo4j using Cypher? I would like the relationship to be bidirectional rather than making two unidirectional relationships in both directions. For example:
(A)<-[FRIEND]->(B)
Rather than:
(A)-[FRIEND]->(B)
(A)<-[FRIEND]-(B)
Thanks in advance :)
No, there isn't. All relationships in neo4j have a direction, starting and ending at a given node.
There are a small number of workarounds.
Firstly, as you've suggested, we can either have two relationships, one going from A to B and the other from B to A.
Alternatively, when writing our MATCH query, we can specify to match patterns directionlessly, by using a query such as
MATCH (A)-[:FRIEND]-(B) RETURN A, B
which will not care about whether A is friends with B or vice versa, and allows us to choose a direction arbitrarily when we create the relationship.
According to this article: Modeling Data in Neo4j: Bidirectional Relationships
The strictly better choice is to create a relationship in an arbitrary direction and not specify the direction when querying:
MATCH (neo)-[:PARTNER]-(partner)
The engine is capable of traversing the edge in either direction. Creating the anti-directional edge is unnecessary and only serves to waste space and traversal time.

Why do relationships as a concept exist in neo4j or graph databases in general?

I can't seem to find any discussion on this. I had been imagining a database that was schemaless, node-based and hierarchical, and one day I decided it was too common sense to not exist, so I started searching around and neo4j is about 95% of what I imagined.
What I didn't imagine was the concept of relationships. I don't understand why they are necessary. They seem to add a ton of complexity to all topics centered around graph databases, but I don't quite understand what the benefit is. Relationships seem to be almost exactly like nodes, except more limited.
To explain what I'm thinking, I was imagining starting a company, so I create myself as my first nodes:
create (u:User { name:"mindreader"});
create (c:Company { name:"mindreader Corp"});
One day I get a customer, so I put his company into my db.
create (c:Company { name:"Customer Company"});
create (u:User { name:"Customer Employee1" });
create (u:User { name:"Customer Employee2"});
I decide to link users to their customers
match (u:User) where u.name =~ "Customer.*"
match (c:Company) where c.name =~ "Customer.*"
create (u)-[:Employee]->(c);
match (u:User) where u.name = "mindreader"
match (c:Company) where c.name =~ "mindreader.*"
create (u)-[:Employee]->(c);
Then I hire some people:
match (c:Company) where c.name =~ "mindreader.*"
create (u1:User { name:"Employee1"})-[:Employee]->(c)
create (u2:User { name:"Employee2"})-[:Employee]->(c);
One day hr says they need to know when I hired employees. Okay:
match (c:Company)<-[r:Employee]-(u:User)
where c.name =~ "mindreader.*" and u.name =~ "Employee.*"
set r.hiredate = '2013-01-01';
Then hr comes back and says hey, we need to know which person in the company recruited a new employee so that they can get a cash reward for it.
Well now what I need is for a relationship to point to a user but that isn't allowed (:Hired_By relationship between :Employee relationship and a User). We could have an extra relationship :Hired_By, but if the :Employee relationship is ever deleted, the hired_by will remain unless someone remembers to delete it.
What I could have done in neo4j was just have a
(u:User)-[:hiring_info]->(hire_info:HiringInfo)-[:hired_by]->(recruiter:User)
In which case the relationships only confer minimal information, the name.
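To make that intermediate-node layout concrete, here is a minimal Python sketch of it. The node ids, the `out`, `hiring_details` and `delete_node` helpers, and the sample data are all hypothetical; the point is only that the hire date and the recruiter hang off one HiringInfo node, so removing that node removes both relationships together (much like a DETACH DELETE on it would).

```python
# Hypothetical data: an intermediate HiringInfo node carries the hire date.
nodes = {
    "u1": {"label": "User", "name": "Employee1"},
    "u2": {"label": "User", "name": "mindreader"},
    "h1": {"label": "HiringInfo", "hire_date": "2013-01-01"},
}

# Relationships stored as (start, type, end).
rels = [("u1", "hiring_info", "h1"), ("h1", "hired_by", "u2")]


def out(node_id, rel_type):
    """End nodes of outgoing relationships of the given type."""
    return [e for s, t, e in rels if s == node_id and t == rel_type]


def hiring_details(user_id):
    """Follow user -> HiringInfo -> recruiter."""
    result = []
    for h in out(user_id, "hiring_info"):
        for recruiter in out(h, "hired_by"):
            result.append((nodes[h]["hire_date"], nodes[recruiter]["name"]))
    return result


def delete_node(node_id):
    """Remove a node together with every relationship attached to it."""
    nodes.pop(node_id)
    rels[:] = [r for r in rels if node_id not in (r[0], r[2])]
```

Here `hiring_details("u1")` returns the hire date together with the recruiter's name, and after `delete_node("h1")` no dangling hired_by relationship is left behind.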
What I originally envisioned was that there would be nodes, and then each property of a node could be a datatype or it could be a pointer to another node. In my case, a user record would end up looking like:
User {
name: "Employee1"
hiring_info: {
hire_date: "2013-01-01"
hired_by: u:User # -> would point to a user
}
}
Essentially it is still a graph. Nodes point to each other. The name of the relationship is just a field in the origin node. To query it you would just go
match (u:User) where ... return u.name, u.hiring_info.hire_date, u.hiring_info.hired_by.name
If you needed a one to many relationship of the same type, you would just have a collection of pointers to nodes. If you referenced a collection in return, you'd get essentially a join. If you delete hiring_info, it would delete the pointer. References to other nodes would not have to be a disorganized list at the toplevel of a node. Furthermore when I query each user I will know all of the info about a user without both querying for the user itself and also all of its relationships. I would know his name and the fact that he hired someone in the same query. From the database backend, I'm not sure much would change.
I see quite a few questions from people asking whether they should use nodes or relationships to model this or that, and occasionally people asking for a relationship between relationships. It feels like the XML problem where you are wondering if a piece of information should be its own tag or just a property of its parent tag.
The query engine goes to great pains to handle relationships, so there must be some huge advantage to having them, but I can't quite see it.
Different databases are for different things. You seem to be looking for a noSQL database.
This is an extremely wide topic area that you've reached into, so I'll give you the short of it. There's a spectrum of database schemas, each of which has different use cases.
NoSQL aka Non-relational Databases:
Every object is a single document. You can have references to other documents, but any additional traversal means you're making another query. Use this when your data rarely has relationships and you usually just want to query once and get back a large amount of flexibly stored data as a single document. (Note: These are not "nodes". Node has a very specific definition and implies that there are edges.)
SQL aka Relational Databases:
This is table land, this is where foreign keys and one-to-many relationships come into play. Here you have strict schemas and very fast queries. This is honestly what you should use for your user example. Small amounts of data where the relationships between things are shallow (You don't have to follow a relationship more than 1-2 times to get to the relevant entry) are where these excel.
Graph Database:
Use this when relationships are key to what you're trying to do. The most common example of a graph is something like a social graph where you're connecting different users together and need to follow relationships for many steps. (Figure out if two people are connected within a depth of 4, for instance.)
Relationships exist in graph databases because that is the entire concept of a graph database. It doesn't really fit your application, but to be fair you could just keep more in the node part of your database. In general the whole idea of a database is something that lets you query a LOT of data very quickly. Depending on the intrinsic structure of your data there are different ways that that makes sense. Hence the different kinds of databases.
In strongly connected graphs, Neo4j is 1000x faster on 1000x the data than a SQL database. NoSQL would probably never be able to perform in a strongly connected graph scenario.
Take a look at what we're building right now: http://vimeo.com/81206025
Update: In reaction to mindreader's comment, we added the related properties to the picture:
RDBM systems are tabular and put more information in the tables than the relationships. Graph databases put more information in relationships. In the end, you can accomplish much the same goals.
However, putting more information in relationships can make queries smaller and faster.
Here's an example:
Graph databases are also good at storing human-readable knowledge representations, being edge (relationship) centric. RDF takes it one step further, where all information is stored as edges rather than nodes. This is ideal for working with predicate logic, propositional calculus, and triples.
Maybe the right answer is an object database.
Objectivity/DB, which now supports a full suite of graph database capabilities, allows you to design complex schema with one-to-one, one-to-many, many-to-one, and many-to-many reference attributes. It has the semantics to view objects as graph nodes and edges. An edge can be just the reference attribute from one node to another or an edge can exist as an edge object that sits between two nodes.
An edge object can have any number of attributes and can have references to other objects, as shown in the diagram below.
Being able to "hang" complex objects off of an edge allows Objectivity/DB to support weighted queries where the edge weight can be calculated using a user-defined weight calculator operator. The weight calculator operator can build the weight from a static attribute on the edge or build the weight by digging down through the objects connected to the edge. In the picture above, we could create an edge-weight calculator that computes the sum of the CallDetail lengths connected to the Call edge.

Resources