neo4j and uni-directional relationship - neo4j

I'm new to neo4j. I've just read some information on this tool, installed it on Ubuntu and made a bunch of queries. And at this moment, I must confess, that I realy like it. However, there is something (I think very simple and intuitive), which I do not know how to implement. So, I created three nodes like so:
CREATE (n:Object {id:1}) RETURN n
CREATE (n:Object {id:2}) RETURN n
CREATE (n:Object {id:3}) RETURN n
And I created a hierarchical relationship between them:
MATCH (a:Object {id:1}), (b:Object {id:2}) CREATE (a)-[:PARENT]->(b)
MATCH (a:Object {id:2}), (b:Object {id:3}) CREATE (a)-[:PARENT]->(b)
So, I think this simple hierarchy should look like this:
(id:1)
-> (id:2)
-> (id:3)
What I want now is to get a path from any node. For example, if I want to have a path from node (id:2), I will get (id:2) -> (id:3). And if I want to get a path from node (id:1), I will get (id:1)->(id:2)->(id:3). I tried this query:
MATCH (n:Object {id:2})-[*]-(children) return n, children
which I though should return a path (id:2)->(id:3), but unexpectedly (just for me) it returns (id:1)->(id:2)->(id:3). So, what I'm doing wrong and what is the right query to use?

All relationships in neo4j are directed. When you say (n)-[:foo]->(m), that relationship goes only one way, from n to m.
Now what's tricky about this is that you can navigate the relationship both ways. This doesn't make the relationship bi-directional, it never is -- it only means that you can look at it in either direction.
When you write this query: (n:Object {id:2})-[*]-(children) you didn't put an arrow head on that relationship, so children could refer to something either downstream or upstream of the node in question.
In other words, saying (n)-[:test]-(m) is the same thing as matching both (n)<-[:test]-(m) and (n)-[:test]->(m).
So children could refer to the ID 1 object or ID 2 object.

Returning only children
To directly answer your question,
Your query
MATCH (n:Object {id:2})-[*]-(children) return n, children
matches not only relationships FROM (n {id:2}) TO its children, but also relationships TO (n {id:2}) FROM its parents.
You need to additionally specify the direction that you'd like. This returns the results you expect:
MATCH (n:Object {id:2})-[*]->(children) return n, children
Issues with the example
I'd like to answer your comment about uni-directional and bi-directional relationships, but let's first resolve a couple of issues with the example.
Using correct labels
Let's revisit your example:
(:Object {id:1})-[:PARENT]->(:Object {id:2})-[:PARENT]->(:Object {id:3})
There's no point to using labels like :Object, :Node, :Thing. If you really don't care, don't use a label at all!
In this case, it looks we're talking about people, although it could easily also be motherboards and daughterboards, or something else!
Let's use People instead of Objects:
(:Person {id:1})-[:PARENT]->(:Person {id:2})-[:PARENT]->(:Person {id:3})
IDs in Neo4j
Neo4j stores its own IDs of every node and relationship. You can retrieve those IDs with id(nodeOrRelationship), and access by ID with a WHERE clause or by specifying them as a start point for your match. START n=node(2) MATCH (n)-[*]-(children) return n, children is equivalent to your original query MATCH (n:Object {id:2})-[*]-(children) return n, children.
Let's, instead of IDs, store something useful about the nodes, like names:
(:Person {name:'Bob'})-[:PARENT]->(:Person {name:'Mary'})-[:PARENT]->(:Person {name:'Tom'})
Relationship ambiguity
Lastly, let's disambiguate the relationships. Does PARENT mean "is the parent of", or "has this parent"? It might be clear to you which one you meant, but someone unfamiliar with your system might have the opposite interpretation.
I think you meant "is the parent of", so let's make that clear:
(:Person {name:'Bob'})-[:PARENT_OF]->(:Person {name:'Mary'})-[:PARENT_OF]->(:Person {name:'Tom'})
More information about uni-directional and bi-directional relationships in Neo4j
Now that we've taken care of a few basic issues with the example, let's address the directionality of relationships in Neo4j and graphs in general.
There are several ways we could have expressed the relationships this example. Let's look at a few.
Undirected/bidirectional relationship
Let's abstract the parent relationship that we used above, for the purposes of discussion:
(bob)-[:KIN]-(mary)-[:KIN]-(tom)
Here the relationship KIN indicates that they are related but we don't know exactly who is the parent of whom. Is Tom the child of Mary, or vice-versa?
Notice that I didn't use any arrows. In the graph pseudo-code above, the KIN relationship is a bidirectional or undirected relationship.
Relationships in Neo4j, however, are always directional. If the KIN relationship was really how you wanted to track things, then you'd create a directional relationship, but always ignore the direction in your MATCH queries, e.g. MATCH (a)-[:KIN]-(b) and not MATCH (a)-[:KIN]->(b).
But is the KIN relationship really the best way to store this information? We can make it more specific. Let's go back to the PARENT_OF relationship that we were using earlier.
Directed/unidirectional relationship
Back to the example. We know that Bob is the parent of Mary who is the parent of Tom:
(bob)-[:PARENT_OF]->(mary)-[:PARENT_OF]->(tom)
Obviously, the corollary of this is:
(bob)<-[:CHILD_OF]-(mary)<-[:CHILD_OF]-(tom)
Or, equivalently:
(tom)-[:CHILD_OF]->(mary)-[:CHILD_OF]->(bob)
So, should we go ahead and create both the PARENT_OF and the CHILD_OF relationships between our (bob), (mary) and (tom) nodes?
The answer is no. We can pick one of those relationships, whichever best models the idea, and still be able to search both ways.
Using only the :PARENT_OF relationship, we can do
MATCH (mary {name:'Mary'})-[:PARENT_OF]->(children) RETURN children
to find the children, or
MATCH (mary {name:'Mary'})<-[:PARENT_OF]-(parents) RETURN parents
to find the parents, using (mary) as the starting point each time.
For more information, see this fantastic article from GraphAware

Related

How to move all neo4j relationships with all their labels and properties from one node to another?

Suppose you've got two nodes that represent the same thing, and you want to merge those two nodes. Both nodes can have any number of relations with other nodes.
The basics are fairly easy, and would look something like this:
MATCH (a), (b) WHERE a.id == b.id
MATCH (b)-[r]->()
CREATE (a)-[s]->()
SET s = PROPERTIES(r)
DELETE DETACH b
Only I can't create a relation without a type. And Cypher doesn't support variable labels either. I'd love to be able to do something like
CREATE (a)-[s:{LABELS(r)}]->(o)
but that doesn't work. To create the relation, you need to know the type of the relation, and in this case I really don't.
Is there a way to dynamically assign types to relationships, or am I going to have to query the types of the old relation, and then string concat new queries with the proper types? That's not impossible, but a lot slower and more complex. And this could potentially match a lot of elements and even more relationships, so having to generate a separate query for every instance is going to slow things down quite a lot.
Or is there a way to change the target of the old relationship? That would probably be the fastest, but I'm not aware of any way to do that.
I think you need to take a look at APOC, especially apoc.create.relationship which enable creating relationships with dynamic type.
Adapting your example, you should end up with something along the line of (not tested):
MATCH (a), (b) WHERE a.id == b.id
MATCH (b)-[r]->(n)
CALL apoc.create.relationship(a, type(r), properties(r), n)
DETACH DELETE b
NB
relationships have TYPE and not label
the proper cypher statement to delete relationships attached to a node and the node itself is DETACH DELETE (and not DELETE DETACH)
Related resource: https://markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/
The APOC procedure apoc.refactor.mergeNodes should be very helpful. That procedure is very powerful, and you need to read the documentation to understand how to configure it to do what you want in your specific situation.
Here is a simple example that shows how to use the procedure's default configuration to merge nodes with the same id:
MATCH (node:Foo)
WITH node.id AS id, COLLECT(node) AS nodes
WHERE SIZE(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {}) YIELD node
RETURN node
In this example, I specified an arbitrary Foo label to avoid accidentally merging unwanted nodes. Doing so also helps to speed up the query if you have a lot of nodes with other labels (since they will not need to be scanned for the id property).
The aggregating function COLLECT is used to collect a list of all the nodes with the same id. After checking the size of the list, it is passed to the procedure.

Why neo4j don't allows not directed or bidirectional relationships at creation time?

I know that Neo4j requires a relationship direction at creation time, but allows ignore this direction in query time. By this way I can query my graph ignoring the relationship direction.
I also know that there are some workarounds for cases when the relationships are naturally bidirectional or not directed, like described here.
My question is: Why is it implemented that way? Has a good reason to not allow not directed or bidirectional relationships at creation time? Is it a limitation of the database architecture?
The Cypher statements like below are not allowed:
CREATE ()-[:KNOWS]-()
CREATE ()<-[:KNOWS]->()
I searched the web for an answer, but I did not find much. For example, this github issue.
Is strange to have to define a relationship direction to one that don't have it. It seems to me that i'm hurting the semantic of my graph.
EDIT 1:
To clarify my standpoint about a "semantic problem" (maybe the term is wrong):
Suppose that I run this simple CREATE statement:
CREATE (a:Person {name:'a'})-[:KNOWS]->(b:Person {name:'b'})
As result i have this very simple graph:
The :KNOWS relationship has a direction only because Neo4j requires a relationship direction at creation time. In my domain a knows b and b knows a.
Now, a new team member will query my graph with this Cypher query:
MATCH path = (a:Person {name:'a'})-[:KNOWS]-(b:Person {name:'b'})
return path
This new team member don't know that when I created this graph I considered that :KNOWS relationship is not directed. The result that he will see is the same:
By the result this new team member can think that only Person a consider knows Person b. It seems to me bad. Not for you? This make any sense?
Fundamentally, it boils down to the internals of how the data is stored on disk in Neo4j -- note Chapter 6 of the O'Reilly Neo4j e-book.
In the data structure of a relationship they have a "firstNode" and a "secondNode", where each is either the left or the right hand side of the relationship.
To flag a relationship as uni/bi-directional would require an additional bit per node, where I would argue it is better to retain the direction in the data store and just ignore direction during querying.
In Neo4j relationships are always directed.
But if you don't care about the direction, you can ignore the direction when querying.
MATCH (p1:Person {name:"me"})-[:KNOWS]-(p2)
RETURN p2;
And with MERGE you can also leave off the direction when creating.
MATCH (p1:Person {name:"me"})
MATCH (p2:Person {name:"you"})
MERGE (p1)-[:KNOWS]-(p2);
You only need 2 relationships if they really convey a different meaning, e.g. :FOLLOWS on Twitter.
It seems to me that i'm hurting the semantic of my graph.
I can't see why a < or > symbol used during creation of a relationship hurts the semantics of your graph if you are going to not use that symbol during matching (and thus treating that relationship as undirected/bidirectional).
Suppose that the syntax proposed by you is supported. Now how will you connect with an undirected relationship two nodes a and b? You still have two options:
CREATE (a)-[:KNOWS]-(b)
CREATE (b)-[:KNOWS]-(a)
The pair (a, b) is always ordered by appearance even if not by semantics. So even if we remove the < or > symbol from the relationship declaration, the problem with the order of nodes in it cannot be eliminated. Therefore simply don't treat it is a problem.

follow all relationships but specific ones

How can I tell cypher to NOT follow a certain relationship/edge?
E.g. I have a :NODE that is connected to another :NODE via a :BUDDY relationship. Additionally every :NODE is related to :STUFF in arbitrary depth by arbitrary edges which are NOT of type :BUDDY. I now want to add a shortcut relation from each :NODE to its :STUFF. However, I do not include :STUFF of its :BUDDIES.
(:NODE)-[:BUDDY]->(:NODE)
(:NODE)-[*]->(:STUFF)
My current query looks like this:
MATCH (n:Node)-[*]->(s:STUFF) WHERE NOT (n)-[:BUDDY]->()-[*]->(s) CREATE (n)-[:HAS]->(s)
However I have some issues with this query:
1) If I ever add a :BUDDY relationship not directly between :NODE but children of :NODE the query will use that relationship for matching. This might not be intended as I do not want to include buddies at all.
2) Explain tells me that neo4j does the match (:NODE)-[*]->(:STUFF) and then AntiSemiApply the pattern (n)-[:BUDDY]->(). As a result it matches the whole graph to then unmatch most of the found connections. This seems ineffective and the query runs slower than I like (However subjective this might sound).
One (bad) fix is to restrict the depth of (:NODE)-[*]->(:STUFF) via (:NODE)-[*..XX]->(:STUFF). However, I cannot guarantee that depth unless I use a ridiculous high number for worst case scenarios.
I'd actually just like to tell neo4j to just not follow a certain relationship. E.g. MATCH (n:NODE)-[ALLBUT(:BUDDY)*]->(s:STUFF) CREATE (n)-[:HAS]->(s). How can I achieve this without having to enumerate all allowed connections and connect them with a | (which is really fast - but I have to manually keep track of all possible relations)?
One option for this particular schema is to explicitly traverse past the point where the BUDDY relationship is a concern, and then do all the unbounded traversing you like from there. Then you only have to apply the filter to single-step relationships:
MATCH (n:Node) -[r]-> (leaf)
WHERE NOT type(r) = 'BUDDY'
WITH n, leaf
MATCH (leaf) -[*] -> (s:Stuff)
WITH n, COLLECT(DISTINCT leaf) AS leaves, COLLECT(DISTINCT s) AS stuff
RETURN n, [leaf IN leaves WHERE leaf:Stuff] + stuff AS stuffs
The other option is to install apoc and take a look at the path expander procedure, which allows you to 'blacklist' node labels (such as :Node) from your path query, which may work just as well depending on your graph. See here.
My final solution is generating a string from the set of
{relations}\{relations_id_do_not_want}
and use that for matching. As I am using an API and thus can do this generation automatically it is not as much as an inconvenience as I feared, but still an inconvenience. However, this was the only solution I was able to find since posting this question.
You could use a condition o nrelationship type:
MATCH (:Node)-[r:REL]->(:OtherNode)
WHERE NOT type(r) = 'your_unwanted_rel_type'
I have no clues about perf though

How can I mitigate having bidirectional relationships in a family tree, in Neo4j?

I am running into this wall regarding bidirectional relationships.
Say I am attempting to create a graph that represents a family tree. The problem here is that:
* Timmy can be Suzie's brother, but
* Suzie can not be Timmy's brother.
Thus, it becomes necessary to model this in 2 directions:
(Sure, technically I could say SIBLING_TO and leave only one edge...what I'm not sure what the vocabulary is when I try to connect a grandma to a grandson.)
When it's all said and done, I pretty sure there's no way around the fact that the direction matters in this example.
I was reading this blog post, regarding common Neo4j mistakes. The author states that this bidirectionality is not the most efficient way to model data in Neo4j and should be avoided.
And I am starting to agree. I set up a mock set of 2 families:
and I found that a lot of queries I was attempting to run were going very, very slow. This is because of the 'all connected to all' nature of the graph, at least within each respective family.
My question is this:
1) Am I correct to say that bidirectionality is not ideal?
2) If so, is my example of a family tree representable in any other way...and what is the 'best practice' in the many situations where my problem may occur?
3) If it is not possible to represent the family tree in another way, is it technically possible to still write queries in some manner that gets around the problem of 1) ?
Thanks for reading this and for your thoughts.
Storing redundant information (your bidirectional relationships) in a DB is never a good idea. Here is a better way to represent a family tree.
To indicate "siblingness", you only need a single relationship type, say SIBLING_OF, and you only need to have a single such relationship between 2 sibling nodes.
To indicate ancestry, you only need a single relationship type, say CHILD_OF, and you only need to have a single such relationship between a child to each of its parents.
You should also have a node label for each person, say Person. And each person should have a unique ID property (say, id), and some sort of property indicating gender (say, a boolean isMale).
With this very simple data model, here are some sample queries:
To find Person 123's sisters (note that the pattern does not specify a relationship direction):
MATCH (p:Person {id: 123})-[:SIBLING_OF]-(sister:Person {isMale: false})
RETURN sister;
To find Person 123's grandfathers (note that this pattern specifies that matching paths must have a depth of 2):
MATCH (p:Person {id: 123})-[:CHILD_OF*2..2]->(gf:Person {isMale: true})
RETURN gf;
To find Person 123's great-grandchildren:
MATCH (p:Person {id: 123})<-[:CHILD_OF*3..3]-(ggc:Person)
RETURN ggc;
To find Person 123's maternal uncles:
MATCH (p:Person {id: 123})-[:CHILD_OF]->(:Person {isMale: false})-[:SIBLING_OF]-(maternalUncle:Person {isMale: true})
RETURN maternalUncle;
I'm not sure if you are aware that it's possible to query bidirectionally (that is, to ignore the direction). So you can do:
MATCH (a)-[:SIBLING_OF]-(b)
and since I'm not matching a direction it will match both ways. This is how I would suggest modeling things.
Generally you only want to make multiple relationships if you actually want to store different state. For example a KNOWS relationship could only apply one way because person A might know person B, but B might not know A. Similarly, you might have a LIKES relationship with a value property showing how much A like B, and there might be different strengths of "liking" in the two directions

Kevin path at neo4j

I am trying to discover how many hops I have to do to know a friend with Cipher. I have these relationships.
(Gutierrez)-[:Conhece]->(Felipe),
(Felipe)-[:Conhece]->(Gutierrez),
(Felipe)-[:Conhece]->(Fernando),
(Fernando)-[:Conhece]->(Felipe),
(Fernando)-[:Conhece]->(Pedro),
(Pedro)-[:Conhece]->(Fernando),
(Pedro)-[:Conhece]->(Arthur),
(Arthur)-[:Conhece]->(Pedro),
(Arthur)-[:Conhece]->(Vitor),
(Vitor)-[:Conhece]->(Arthur),
and when I execute my query it shows Fernando. What I want is to show only Vitor, Pedro and Arthur.
MATCH (n:Leitor)-[r:Conhece]-m
WHERE n.nome='Pedro' OR m.nome='Vitor'
RETURN n,r,m
with my Bacon Path query ->
Ok, I'm adding another answer because I think I understand what you want and it's different than my other answer.
If you want to find everybody that both Pedro and Vitor have met (in this case, just Author):
MATCH (pedro:Leitor)-[:Conhece]-(in_common:Leitor)-[:Conhece]-(vitor:Leitor)
WHERE pedro.nome='Pedro' AND vitor.nome='Vitor'
RETURN in_common
That also might look a bit better like this:
MATCH (pedro:Leitor {nome: 'Pedro'})-[:Conhece]-(in_common:Leitor)-[:Conhece]-(vitor:Leitor {name: 'Vitor'})
RETURN in_common
I also notice from your screenshots that every meeting has two relationships. That may very well be what you want, but if your plan was to always have two relationships whenever two people meet then you can get away with just one relationship. When you query bidirectionally (that is, without specifying direction like in the queries above) then you'll get relationships in either direction.
Normally you only want relationships going in both directions if the direction is important. That could be because your just recording that it goes from one node to another, or it could be because you're storing different values on the relationships.
Here you go:
MATCH shortestPath((n:Leitor)-[rels:Conhece*]-m)
WHERE n.nome IN ['Pedro', 'Vitor']
RETURN n,rels,m,length(rels)
In this case rels will be a collection of relationships because the path is variable length. You can also do:
MATCH path=shortestPath((n:Leitor)-[rels:Conhece*]-m)
WHERE n.nome IN ['Pedro', 'Vitor']
RETURN n,rels,m,length(rels),path,length(path)

Resources