I am trying to write a query to return all related nodes related by "IMMEDIATE_FAMILY_MEMBER"
This is my query so far
MATCH (f:NaturalPerson)-[r:IMMEDIATE_FAMILY_MEMBER*1..6]-(t)
WHERE f.Name="Jacob"
RETURN f AS fromNode, t AS toNode, r AS Metadata
Initially i thought it worked quite well, but as soon as i added a the child Thuthukile (Parents Jacob and Nkosazana) i get "duplicate" results.
At the moment the query will return a pair of related nodes and all the relationships that were traversed to link them together (ie the metadata).
How do i change this query so i return a distinct pair of nodes with the shortest path (all the relationships) between them.
Also as an extra question, is it possible to specify an or for the label of the relation itself. Ie, the same query but also include the :KNOWS relationship
Edit:
cybersams answer was correct, i made one small change to get the result i wanted.
My final query was this
MATCH (f:NaturalPerson)-[r:IMMEDIATE_FAMILY_MEMBER*..6]-(t)
WHERE f.Name="Jacob" AND t.Name<>"Jacob"
WITH f, t, r
ORDER BY SIZE(r)
RETURN f AS fromNode, t AS toNode, COLLECT(r)[0] AS Metadata
I needed to exclude the "from person" as a destination as i was not interested in the shortest path back to the parent
Aside: why is it NaturalPerson? Are there "unnatural" people in the DB as well?
This should work:
MATCH (f:NaturalPerson)-[r:IMMEDIATE_FAMILY_MEMBER*..6]-(t)
WHERE f.Name="Jacob"
WITH f, t, r
ORDER BY SIZE(r)
RETURN f AS fromNode, t AS toNode, COLLECT(r)[0] AS Metadata
The query sorts by the length of the paths found, uses the aggregating function COLLECT to get a list of all the r paths for a given pair of f and t values, and uses the first (i.e., the shortest) path in each list.
Related
When doing a Cypher query to retrieve a specific subgraph with automorphisms, let's say
MATCH (a)-[:X]-(b)-[:X]-(c),
RETURN a, b, c
It seems that the default behaviour is to return every retrieved subgraph and all their automorphisms.
In that exemple, if (u)-[:X]-(v)-[:X]-(w) is a graph matching the pattern, the output will be u,v,w but also w,v,u, which consist in the same graph.
Is there a way to retrieve each subgraph only once ?
EDIT: It would be great if Cypher have a feature to do that in the search, using some kind of symmetry breaking condition as it would reduce the computing time. If that is not the case, how would you post-process to find the desired output ?
In the query you are making, (a)-[r:X]-(b) and (a)-[t:X]-(c) refer to a similar pattern. Since (b) and (c) can be interchanged. What is the need to repeat matching twice? MATCH (a)-[r:X]-(b) RETURN a, r, b returns all the subgraphs you are looking for.
EDIT
You can do something as follows to find the nodes, which are having two relations of type X.
MATCH (a)-[r:X]-(b) WHERE size((a)-[:X]-()) = 2 RETURN a, r, b
For these kind of mirrored patterns, we can add a restriction on the internal graph ids so only one of the two paths is kept:
MATCH (a)-[:X]-(b)-[:X]-(c)
WHERE id(a) < id(c)
RETURN a, b, c
This will also prevent the case where a = c.
I have developed a query which, by trial and error, appears to find all of the duplicated relationships in a Neo4j DB. I want delete all but one of these relationships but I'm concerned that I have not thought of problematic cases that could result in data deletion.
So, does this query delete all but one of a duplicated relationship?
MATCH (a)-->(b)<--(a) # identify where the duplication is present
WITH DISTINCT a, b
MATCH (a)-[r]->(b) # get all duplicated paths themselves
WITH a, b, collect(r)[1..] as rs # remove the first instance from the list
UNWIND rs as r
DELETE r
If I replace the UNWIND rs as r; DELETE r with WITH a, b, count(rs) as cnt RETURN cnt it seems to return the unnecessary relationships.
I'm still relucant to put this somewhere to be used by others, though....
Thanks
First of all, let me (strictly) define the term: "duplicate relationships". Two relationships are duplicates if they:
Connect the same pair of nodes (call them a and b)
Have the same relationship type
Have exactly the same set of properties (both names and values)
Have the same directionality between a and b (iff directionality is significant for use case)
Your query only considers #1 and #4, so it generally could delete non-duplicate relationships as well.
Here is a query that will take all of the above into consideration (assuming #4 should be included):
MATCH (a)-[r1]->(b)<-[r2]-(a)
WHERE TYPE(r1) = TYPE(r2) AND PROPERTIES(r1) = PROPERTIES(r2)
WITH a, b, apoc.coll.union(COLLECT(r1), COLLECT(r2))[1..] AS rs
UNWIND rs as r
DELETE r
Aggregating functions (like COLLECT) use non-aggregated terms as grouping keys, so there is no need for the query to perform a separate redundant DISTINCT a,b test.
The APOC function apoc.coll.union returns the distinct union of its 2 input lists.
I have a graph which looks like this:
Here is the link to the graph in the neo4j console:
http://console.neo4j.org/?id=av3001
Basically, you have two branching paths, of variable length. I want to match the two paths between orange node and yellow nodes. I want to return one row of data for each path, including all traversed nodes. I also want to be able to include different WHERE clauses on different intermediate nodes.
At the end, i need to have a table of data, like this:
a - b - c - d
neo - morpheus - null - leo
neo - morpheus - trinity - cypher
How could i do that?
I have tried using OPTIONAL MATCH, but i can't get the two rows separately.
I have tried using variable length path, which returns the two paths but doesn't allow me to access and filter intermediate nodes. Plus it returns a list, and not a table of data.
I've seen this question:
Cypher - matching two different possible paths and return both
It's on the same subject but the example is very complex, a more generic solution to this simpler problem is what i'm looking for.
You can define what your end node by using WHERE statement. So in your case end node has no outgoing relationship. Not sure why you expect a null on return as you said neo - morpheus - null - leo
MATCH p=(n:Person{name:"Neo"})-[*]->(end) where not (end)-->()
RETURN extract(x IN nodes(p) | x.name)
Edit:
may not the the best option as I am not sure how to do this programmatically. If I use UNWIND I get back only one row. So this is a dummy solution
MATCH p=(n{name:"Neo"})-[*]->(end) where not (end)-->()
with nodes(p) as list
return list[0].name,list[1].name,list[2].name,list[3].name
You can use Cypher to match a path like this MATCH p=(:a)-[*]->(:d) RETURN p, and p will be a list of nodes/relationships in the path in the order it was traversed. You can apply WHERE to filter the path just like with node matching, and apply any list functions you need to it.
I will add these examples too
// Where on path
MATCH p=(:a)-[*]-(:d) WHERE NONE(n in NODES(p) WHERE n.name="Trinity") WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Spit path into columns
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN p[0], p[1], p[2], p[3]
// Match path, filter on label
MATCH p=(:a)-[*]-(:d) WITH NODES(p) as p RETURN FILTER(n in p WHERE "a" in LABELS(n)) as a, FILTER(n in p WHERE "b" in LABELS(n)) as b, FILTER(n in p WHERE "c" in LABELS(n)) as c, FILTER(n in p WHERE "d" in LABELS(n)) as d
Unfortunately, you HAVE to explicitly set some logic for each column. You can't make dynamic columns (that I know of). In your table example, what is the rule for which column gets 'null'? In the last example, I set each column to be the set of nodes of a label.
I.m.o. you're asking for extensive post-processing of the results of a simply query (give me all the paths starting from Neo). I say this because :
You state you need to be able to specify specific WHERE clauses for each path (but you don't specify which clauses for which path ... indicating this might be a dynamic thing ?)
You don't know the size of the longest path beforehand ... but you still want the result to be a same-size-for-all-results table. And would any null columns then always be just before the end node ? Why (for that makes no real sense other then convenience) ?
...
Therefore (and again i.m.o.) you need to process the results in a (Java or whatever you prefer) program. There you'll have full control over the resultset and be able to slice and dice as you wish. Cypher (exactly like SQL in fact) can only do so much and it seems that you're going beyond that.
Hope this helps,
Regards,
Tom
P.S. This may seem like an easy opt-out, but look at how simple your query is as compared to the constructs that have to be wrought trying to answer your logic. So ... separate the concerns.
I have a DAG which for the most part is a tree... but there are a few cycles in it. I mention it in case it matters.
I have to translate the graph into pairs of relations. If:
A -> B
C
D -> 1
2 -> X
Y
Then I would produce ArB, ArC, arD, Dr1, Dr2, 2rX, 2rY, where r is some relationship information (in other words, the query cannot totally ignore it.)
Also, in my graph, node A has many cousins, so I need to 'anchor' my query to A.
My current attempt generates all possible pairs, so I get many unhelpful pairs such as ArY since A can eventually traverse to Y.
What is a query that starts (or ends) with A, that returns a list of pairs? I don't want to query Neo individually for each node - I want to get the list in one shot if possible.
The query would be great, doc pages that explain would be great. Any help is appreciated.
EDIT Here's what I have so far, using Frobber's post as inspiration:
1. MATCH p=(n {id:"some_id"})-[*]->(m)
2. WITH DISTINCT(NODES(p)) as zoot
3. MATCH (x)-[r]->(y)
4. WHERE x IN zoot AND y IN zoot
5. RETURN DISTINCT x, TYPE(r) as r, y
Where in line 1, I make a path that includes all the nodes under the one I care about.
In line 2, I start a new match that is intended to return my pairs
Line 3, I convert the path of nodes to a collection of nodes
Line 4, I accept only x and y nodes that were scooped up the first match. I am not sure why I have to include y in the condition, but it seems to matter.
Line 5, I return the results. I do not know why I need a distinct here. I thought the one on line 3 would do the trick.
So far, this is working for me. I have no insight into its performance in a large graph.
Here's an approach to try - this query is modeled off of the sample matrix data you can find online so you can play with it before adapting it to your schema.
MATCH p=(n:Crew)-[r:KNOWS*]-m
WHERE n.name='Neo'
WITH p, length(nodes(p)) AS nCount, length(relationships(p)) AS rCount
RETURN nodes(p)[nCount-2], relationships(p)[rCount-1], nodes(p)[nCount-1];
ORDER BY length(p) ASC;
A couple of notes about what's going on here:
Consider the "Neo" node (n.name="Neo") to be your "A" here. You're rooting this path traversal in some particular node you pick out.
We're matching paths, not nodes or edges.
We're going through all paths rooted at the A node, ordering by path length. This gets the near nodes before the distant nodes.
For each path we find, we're looking at the nodes and relationships in the path, and then returning the last pair. The second-to-last node (nodes(p)[nCount-2]) and the last relationship in the path (relationships(p)[rCount-1]).
This query basically returns the node, the relationship, and the connected node showing that you can get those items; from there you just customize the query to pull out whatever about those nodes/rels you might need pursuant to your schema.
The basic formula starts with matching p=(someNode {startingPoint: "A"})-[r:*]->(otherStuff); from there it's just processing paths as you go.
I have a scenario where I have more than 2 random nodes.
I need to get all possible paths connecting all three nodes. I do not know the direction of relation and the relationship type.
Example : I have in the graph database with three nodes person->Purchase->Product.
I need to get the path connecting these three nodes. But I do not know the order in which I need to query, for example if I give the query as person-Product-Purchase, it will return no rows as the order is incorrect.
So in this case how should I frame the query?
In a nutshell I need to find the path between more than two nodes where the match clause may be mentioned in what ever order the user knows.
You could list all of the nodes in multiple bound identifiers in the start, and then your match would find the ones that match, in any order. And you could do this for N items, if needed. For example, here is a query for 3 items:
start a=node:node_auto_index('name:(person product purchase)'),
b=node:node_auto_index('name:(person product purchase)'),
c=node:node_auto_index('name:(person product purchase)')
match p=a-->b-->c
return p;
http://console.neo4j.org/r/tbwu2d
I actually just made a blog post about how start works, which might help:
http://wes.skeweredrook.com/cypher-it-all-starts-with-the-start/
Wouldn't be acceptable to make several queries ? In your case you'd automatically generate 6 queries with all the possible combinations (factorial on the number of variables)
A possible solution would be to first get three sets of nodes (s,m,e). These sets may be the same as in the question (or contain partially or completely different nodes). The sets are important, because starting, middle and end node are not fixed.
Here is the code for the Matrix example with added nodes.
match (s) where s.name in ["Oracle", "Neo", "Cypher"]
match (m) where m.name in ["Oracle", "Neo", "Cypher"] and s <> m
match (e) where e.name in ["Oracle", "Neo", "Cypher"] and s <> e and m <> e
match rel=(s)-[r1*1..]-(m)-[r2*1..]-(e)
return s, r1, m, r2, e, rel;
The additional where clause makes sure the same node is not used twice in one result row.
The relations are matched with one or more edges (*1..) or hops between the nodes s and m or m and e respectively and disregarding the directions.
Note that cypher 3 syntax is used here.