Fastest way to remove reciprocal relationships in Neo4j? - neo4j

I imported a thesaurus with duplicate "Related Term" relationships so when A and B are related, my graph contains (A)-[:RT]->(B) as well as (B)-[:RT]->(A)
To clean this up, because Neo4j allows me to traverse the graph in both directions, I did
match (t0)-[r0:RT]->(t1)-[r1:RT]->(t2)
where t0=t2 AND id(r0) > id(r1)
delete r0
Is this the fastest way ? Answer : No, because it can be simplified.
Edited into
match (t0)-[r0:RT]->(t1)-[r1:RT]->(t0)
where id(r0)>id(r1)
delete r0

In Cypher the relationships are unique in each path so unless you break up the query into two separate matches, r1 and r2 will never bind the same relationship.
MATCH (t0)-[r:RT]->(t1)-[:RT]->(t0)
DELETE r
Besides, the relationships declared in your pattern have different direction with regards to (t0), so they can't bind the same relationship for that reason also. You can see this if you break the query up.
MATCH (t0)-[r1:RT]->(t1), (t0)<-[r2:RT]-(t1)
Addendum
As you pointed out in a comment, a pattern like this ends up deleting both relationships after all. This is incidental–each individual match will behave as above and only one relationship is deleted. The reason the query as a whole deletes both relationships is that the pattern is symmetric, i.e. a node will satisfy the pattern in the place designated t0 if and only if it also satisfies the pattern in the place designated t1, or (semi-formally)
(t0)-[:RT]->(t1)-[:RT]->(t0) iff (t1)-[:RT]->(t0)-[:RT]->(t1)
Perhaps I should have said that r1 and r2 can never bind the same relationship at the same time, or in the same match or path. The solution is to break the symmetry. I imagined a local query with a discriminating property
(t0 {name:"t0"})-[r:RT]->(t1)-[:RT]->(t0)
DELETE r
but for a global query, if you want to do it all at once, comparing id on t0 and t1 is great. Which is exactly the answer you had arrived at already.

I use something like this:
match (s:Node)-[r]-(n:Node)
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)
it will remove duplicate and reciprocal relationships regardless of the direction of the edges; However, if you want to specify direction, you can also try:
match (s:Node)-[r]->(n:Node)
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)
or the other way around.
Hope this helps.

I think where clause wont be required.
match (t0)-[r0:RT]->(t1)-[r1:RT]->(t0)
delete r0

I usually use something like this:
match (t0)-[r:RT]-(t1)
with t0,t1, collect(r) as rels
forach (r in tail(rels) | delete r)

Related

Deleting duplicate relationships in neo4j - is this correct?

I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want delete all but one of these relationships but I'm concerned that I have not thought of problematic cases that could result in data deletion.
So, does this query delete all but one of a duplicated relationship?
MATCH (a)-->(b)<--(a) # identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b) # get all duplicated paths themselves
WITH a, b, collect(r)[1..] as rs # remove the first instance from the list
UNWIND rs as r
DELETE r
If I replace the UNWIND rs as r; DELETE r with WITH a, b, count(rs) as cnt RETURN cnt it seems to return the unnecessary relationships.
I'm still relucant to put this somewhere to be used by others, though....
Thanks
First of all, let me (strictly) define the term: "duplicate relationships". Two relationships are duplicates if they:
Connect the same pair of nodes (call them a and b)
Have the same relationship type
Have exactly the same set of properties (both names and values)
Have the same directionality between a and b (iff directionality is significant for use case)
Your query only considers #1 and #4, so it generally could delete non-duplicate relationships as well.
Here is a query that will take all of the above into consideration (assuming #4 should be included):
MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
WITH a, b, apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs as r
DELETE r
Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate redundant DISTINCT a,b test.
The APOC function apoc.coll.union returns the distinct union of its 2 input lists.

Traverse both incoming and outgoing relationship in Cypher

I am new at Neo4j but not to graphs and I have a specific problem I did not manage to solve with Cypher.
With this type of data:
I would like to be able in a single query to follow some incoming and some outgoing flow.
Example:
Starting on "source"
Follow all "A" relationships in the outgoing way
Follow all "B" relationships in the incoming way
My problem is that Cypher only allows one single direction to be specified in the relationship pattern.
So I could do (source)-[:A|:B*]->() or (source)<-[:A|:B*]-().
But I have no possibility to tell Cypher that I want to follow -[:A]-> and <-[:B]-.
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
Thanks in advance for your help :)
Alternatively to #Gabor Szarnyas answer, I think you can achieve your goal using the APOC procedure apoc.path.expand.
Using this sample data set:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
And calling apoc.path.expand:
match (source:Source)
call apoc.path.expand(source,"A>|<B","",0,5) yield path
return path
You will get this path as output:
The apoc.path.expand call starts from the source node following -[:A]-> and <-[:B]- relationships.
Remember to install APOC procedures according to the version of Neo4j you are using. Take a look in the version compatibility matrix.
To express this in a single query would require a regular path query, which has been proposed to and accepted to openCypher, but it is not yet implemented.
I see two possible workarounds. I recreated your example with this command with a Source label for the source node:
CREATE (:Source)-[:A]->()-[:A]->()<-[:B]-()-[:A]->()
(1) Insert additional relationships that have the same direction:
MATCH (s)-[:B]->(t)
CREATE (s)<-[:B2]-(t)
And use this relationship for traversal:
MATCH p=(source)-[:A|:B2*]->()
RETURN p
(2) As you mentioned:
By the way, I know that I could do -[:A|:B]- but this won't solve my problem because I don't want to follow -[:B]-> and <-[:A]-.
You could use this approach to first get potential path candidates and manually check the directions of the relationships afterwards. Of course, this is an expensive operation but you only have to calculate it on the candidates, a possibly small data set.
MATCH p=(source:Source)-[:A|:B*]-()
WITH p, nodes(p) AS nodes, relationships(p) AS rels
WHERE all(i IN range(0, size(rels) - 1) WHERE
CASE type(rels[i])
WHEN 'A' THEN startNode(rels[i]) = nodes[i]
ELSE /* B */ startNode(rels[i]) = nodes[i+1]
END)
RETURN p
Let's break down how this works:
We store path candidates in p and use the nodes and relationships functions to extract the lists of nodes/relationships from it.
We define a range of indexes for the relationships (e.g. from 0, 1, 2 if there are 3 relationships).
To determine the direction of relationships, we use the startNode function. For example, if there is a relationship r between nodes n1 to n2, the paths will like <n1, r, n2>. If r was traversed to in the outgoing direction, the startNode(r) will return n1, if it was traverse in the incoming direction, startNode(r) will return n2. The type of the relationship is checked with the type function and a simple CASE expression is used to differentiate between types.
The WHERE clause uses the all predicate function to check whether all :A and :B relationships had the appropriate directions.

match a branching path of variable length

I have a graph which looks like this:
Here is the link to the graph in the neo4j console:
http://console.neo4j.org/?id=av3001
Basically, you have two branching paths, of variable length. I want to match the two paths between orange node and yellow nodes. I want to return one row of data for each path, including all traversed nodes. I also want to be able to include different WHERE clauses on different intermediate nodes.
At the end, i need to have a table of data, like this:
a - b - c - d
neo - morpheus - null - leo
neo - morpheus - trinity - cypher
How could i do that?
I have tried using OPTIONAL MATCH, but i can't get the two rows separately.
I have tried using variable length path, which returns the two paths but doesn't allow me to access and filter intermediate nodes. Plus it returns a list, and not a table of data.
I've seen this question:
Cypher - matching two different possible paths and return both
It's on the same subject but the example is very complex, a more generic solution to this simpler problem is what i'm looking for.
You can define what your end node by using WHERE statement. So in your case end node has no outgoing relationship. Not sure why you expect a null on return as you said neo - morpheus - null - leo
MATCH p=(n:Person{name:"Neo"})-[*]->(end) where not (end)-->()
RETURN extract(x IN nodes(p) | x.name)
Edit:
may not the the best option as I am not sure how to do this programmatically. If I use UNWIND I get back only one row. So this is a dummy solution
MATCH p=(n{name:"Neo"})-[*]->(end) where not (end)-->()
with nodes(p) as list
return list[0].name,list[1].name,list[2].name,list[3].name
You can use Cypher to match a path like this MATCH p=(:a)-[*]->(:d) RETURN p, and p will be a list of nodes/relationships in the path in the order it was traversed. You can apply WHERE to filter the path just like with node matching, and apply any list functions you need to it.
I will add these examples too
// Where on path
MATCH p=(:a)-[*]-(:d) WHERE NONE(n in NODES(p) WHERE n.name="Trinity") WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Spit path into columns
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Match path, filter on label
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN FILTER(n in p WHERE "a" in LABELS(n)) as a, FILTER(n in p WHERE "b" in LABELS(n)) as b, FILTER(n in p WHERE "c" in LABELS(n)) as c, FILTER(n in p WHERE "d" in LABELS(n)) as d
Unfortunately, you HAVE to explicitly set some logic for each column. You can't make dynamic columns (that I know of). In your table example, what is the rule for which column gets 'null'? In the last example, I set each column to be the set of nodes of a label.
I.m.o. you're asking for extensive post-processing of the results of a simply query (give me all the paths starting from Neo). I say this because :
You state you need to be able to specify specific WHERE clauses for each path (but you don't specify which clauses for which path ... indicating this might be a dynamic thing ?)
You don't know the size of the longest path beforehand ... but you still want the result to be a same-size-for-all-results table. And would any null columns then always be just before the end node ? Why (for that makes no real sense other then convenience) ?
...
Therefore (and again i.m.o.) you need to process the results in a (Java or whatever you prefer) program. There you'll have full control over the resultset and be able to slice and dice as you wish. Cypher (exactly like SQL in fact) can only do so much and it seems that you're going beyond that.
Hope this helps,
Regards,
Tom
P.S. This may seem like an easy opt-out, but look at how simple your query is as compared to the constructs that have to be wrought trying to answer your logic. So ... separate the concerns.

How to eliminate a path where exists a relationship outside of the path, but between nodes in the path?

I have modified the 'vanilla' initial query in this console, and added one relationship type 'LOCKED' between the 'Morpheus' and 'Cypher' nodes.
How can I modify the existing (first-run) query, which is a variable length path so that it no longer reaches the Agent Smith node due to the additional Locked relationship I've added?
First-run query:
MATCH (n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
RETURN n AS Neo,r,m
I have tried this kind of thing:
MATCH p=(n:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE n.name='Neo'
AND none(rel IN rels(p) WHERE EXISTS (StartNode(rel)-[:LOCKED]->EndNode(rel)))
RETURN n AS Neo,r,m
..but it doesn't recognize the pattern inside the none() function.
I'm using Community 2.2.1
Thanks for reading
I'm pretty sure you can't use a function in a MATCHy type clause like that (though it's clever). What about this?
MATCH path=(neo:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
WHERE neo.name='Neo'
AND NOT('LOCKED' IN rels(path))
RETURN neo,r,m
EDIT:
Oops, looks like Dave might have beat me to the punch. Here's the solution I came up with anyway ;)
MATCH p=(neo:Crew)-[r:KNOWS|LOVES*2..4]->m
WHERE neo.name='Neo'
WITH p, neo, m
UNWIND rels(p) AS rel
MATCH (a)-[rel]->(b)
OPTIONAL MATCH a-[locked_rel:LOCKED]->b
WITH neo, m, collect(locked_rel) AS locked_rels
WHERE none(locked_rel IN locked_rels WHERE ()-[locked_rel]->())
RETURN neo, m
Ok, this is a little convoluted but i think it works. The approach is to take all of the paths and find the last known good nodes (ones that have LOCKED relationships leaving them). Then use that node(s) as a new ending point(s) and return the paths.
match p=(n:Crew)-[r:KNOWS|LOVES|LOCKED*2..4]->m
where n.name='Neo'
with n, relationships(p) as rels
unwind rels as r
with n
, case
when type(r) = 'LOCKED' then startNode(r)
else null
end as last_good_node
with n
, (collect( distinct last_good_node)) as last_good_nodes
unwind last_good_nodes as g
match p=n-[r:KNOWS|LOVES*]->g
return p
I think this would be simpler if there was a locked: true property on the KNOWS and LOVES relationships.

In neo4j is there a way to get path between more than 2 random nodes whose direction of relation is not known

I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't be acceptable to make several queries ? In your case you'd automatically generate 6 queries with all the possible combinations (factorial on the number of variables)
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.

Resources