Matching multiple independent (or dependent) paths - neo4j

I'm looking at a hierarchy of people and organizations, and trying to find where/if they meet and share management. Let's say "Bob" and "Susan" work for different branches. I want to show their two reporting relationships up through the company if/when they overlap.
This query currently works great, and returns a single path:
MATCH path=(p:Person {name: "Bob"})-[:reports_to*]->(o:Organization {code: "TopOfCompany"})
RETURN path;
This query also works great, and returns a single path:
MATCH path2=(p:Person {name: "Susan"})-[:reports_to*]->(o2:Organization {code: "TopOfCompany"})
RETURN path2;
This query (doing both of them in one operation) returns nothing at all:
MATCH path=(p:Person {name: "Bob"})-[:reports_to*]->(o:Organization {code: "TopOfCompany"}),
path2=(p:Person {name: "Susan"})-[:reports_to*]->(o2:Organization {code: "TopOfCompany"})
RETURN path,path2;
The same is true if I reuse the first o binding in the second path query.
I'm aware that I could reformulate this to find where the two people meet in the middle, like this:
MATCH path=(p1:Person {name: "Bob"})-[:reports_to*]->(o:Organization)<-[:reports_to*]-(p2:Person {name: "Susan"})
RETURN path;
And indeed that query runs fine, but if they don't meet in the middle, it returns nothing, since no o:Organization in the middle exists.
There are probably other equivalent ways I could reformulate to get to the right results - but the heart of my question is, is it not possible to identify two different independent paths in one query? This would be useful in the case where they don't meet, where the targets ("TopOfCompany") I'm matching to are different, or I just wanted to compare a series of paths.
Oh, and I'm on 2.2M04, using the server. The query with two paths succeeds, but the results are empty, as in the JSON version of the results is:
{"columns":["path","path2"],"data":[],"stats":{"contains_updates":false,"nodes_created":0,"nodes_deleted":0,"properties_set":0,"relationships_created":0,"relationship_deleted":0,"labels_added":0,"labels_removed":0,"indexes_added":0,"indexes_removed":0,"constraints_added":0,"constraints_removed":0}}

This query of yours is using the same variable (p) for the Bob AND Susan nodes, which probably explains why it does not work as you expected (a single node cannot have 2 different values for the same property):
MATCH path=(p:Person {name: "Bob"})-[:reports_to*]->(o:Organization {code: "TopOfCompany"}),
path2=(p:Person {name: "Susan"})-[:reports_to*]->(o2:Organization {code: "TopOfCompany"})
RETURN path,path2;
You can either use different variables, or just get rid of the node variables entirely (since you don't use them anywhere) -- like this:
MATCH path=(:Person {name: "Bob"})-[:reports_to*]->(:Organization {code: "TopOfCompany"}),
path2=(:Person {name: "Susan"})-[:reports_to*]->(:Organization {code: "TopOfCompany"})
RETURN path,path2;
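To see why the repeated p returns nothing, here is a minimal Python toy model (not Neo4j's matcher). It assumes nodes are plain property dicts and a pattern is a list of (variable, properties) pairs. Every occurrence of a variable must be satisfied by the same node, while distinct variables may bind to any nodes:

```python
from itertools import product

# Toy model (hypothetical helper, not Neo4j code): every occurrence of a variable
# in the pattern must be satisfied by the SAME node, as with a repeated variable
# in a Cypher MATCH.
def match(nodes, pattern):
    variables = sorted({var for var, _ in pattern})
    rows = []
    # Try every assignment of nodes to variables (different variables may share a node).
    for combo in product(nodes, repeat=len(variables)):
        binding = dict(zip(variables, combo))
        if all(
            all(binding[var].get(key) == value for key, value in props.items())
            for var, props in pattern
        ):
            rows.append({var: node["name"] for var, node in binding.items()})
    return rows

people = [{"name": "Bob"}, {"name": "Susan"}]

# Same variable p for both people: one node would need two different names.
print(match(people, [("p", {"name": "Bob"}), ("p", {"name": "Susan"})]))   # []
# Distinct variables: each constraint is checked against its own node.
print(match(people, [("p1", {"name": "Bob"}), ("p2", {"name": "Susan"})]))  # [{'p1': 'Bob', 'p2': 'Susan'}]
```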

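OPTIONAL MATCH, which could be considered the Cypher equivalent of an outer join in SQL, can also be used with path patterns. The following query requires Bob's path and optionally matches Susan's path and the path that connects both people through TopOfCompany; an optional path that does not exist comes back as null instead of eliminating the row: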
MATCH
path1=(p1:Person {name: "Bob"})-[:reports_to*]->(o1:Organization {code: "TopOfCompany"})
OPTIONAL MATCH
path2=(p2:Person {name: "Susan"})-[:reports_to*]->(o2:Organization {code: "TopOfCompany"})
OPTIONAL MATCH
path3=(p1)-[:reports_to*]->(o:Organization {code: "TopOfCompany"})<-[:reports_to*]-(p2)
RETURN path1, path2, path3;
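The row-level difference between MATCH and OPTIONAL MATCH can be sketched in Python. This is a simplified model, assuming each clause is a function `find` that returns the new bindings for an incoming row; the helper names and the `"bob_path"` value are hypothetical placeholders:

```python
# Toy model of clause semantics (hypothetical helpers, not Neo4j code).
# `find(row)` returns the list of new bindings a pattern produces for that row.
def match_clause(rows, find):
    # Inner-join behaviour: rows with no match are dropped.
    return [{**row, **found} for row in rows for found in find(row)]

def optional_match_clause(rows, find, new_vars):
    # Outer-join behaviour: rows without a match survive, new variables set to None (null).
    out = []
    for row in rows:
        matches = find(row)
        if matches:
            out.extend({**row, **found} for found in matches)
        else:
            out.append({**row, **{var: None for var in new_vars}})
    return out

rows = [{"path1": "bob_path"}]
no_path = lambda row: []  # e.g. Susan never reaches TopOfCompany

print(match_clause(rows, no_path))                      # []
print(optional_match_clause(rows, no_path, ["path2"]))  # [{'path1': 'bob_path', 'path2': None}]
```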

Related

Neo4j: Get all relationships between a set of nodes

Let's say I have nodes A and B. I want to find all the paths that connect these two nodes. How would I do it?
You can use a variable length relationship to return all such paths.
For your example (which has relationships pointing in both directions), this query using an undirected variable length relationship should work:
MATCH p=(:Foo {id: 'A'})-[*]-(:Foo {id: 'B'})
RETURN p
Note, however, that a variable length relationship without a reasonable upper bound can take virtually forever or run out of memory. So, depending on your DB characteristics, you should determine and use a reasonable upper bound. For example:
MATCH p=(:Foo {id: 'A'})-[*..7]-(:Foo {id: 'B'})
RETURN p
To improve performance, it is also generally helpful to specify the possible relationship types along the path, to avoid going down inappropriate paths, as in:
MATCH p=(:Foo {id: 'A'})-[:TYPE_A|TYPE_B|TYPE_C*..7]-(:Foo {id: 'B'})
RETURN p
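How the upper bound and the type list limit the search can be sketched as a bounded depth-first enumeration in Python. This is a simplified model, not Neo4j's traversal; it assumes relationships are given as `(rel_id, start, end, type)` tuples with made-up ids, treats them as undirected, and never reuses a relationship within one path:

```python
# Toy model of an undirected, bounded, type-filtered variable-length match such as
# -[:TYPE_A|TYPE_B|TYPE_C*..7]- (a sketch, not Neo4j's implementation).
def all_paths(rels, source, target, max_hops, allowed_types=None):
    adjacency = {}
    for rel_id, start, end, rel_type in rels:
        if allowed_types is not None and rel_type not in allowed_types:
            continue  # the type filter prunes these relationships up front
        adjacency.setdefault(start, []).append((rel_id, end))
        if end != start:
            adjacency.setdefault(end, []).append((rel_id, start))  # undirected: both ways

    results = []

    def walk(node, rel_path):
        if len(rel_path) == max_hops:
            return  # upper bound reached: stop expanding this branch
        for rel_id, neighbor in adjacency.get(node, []):
            if rel_id in rel_path:
                continue  # relationship already used on this path
            new_path = rel_path + [rel_id]
            if neighbor == target:
                results.append(tuple(new_path))  # record it, but keep exploring
            walk(neighbor, new_path)

    walk(source, [])
    return results

rels = [
    (1, "A", "B", "TYPE_A"),
    (2, "A", "C", "TYPE_B"),
    (3, "C", "B", "TYPE_A"),
    (4, "B", "A", "TYPE_C"),
]
print(all_paths(rels, "A", "B", 1))                            # [(1,), (4,)]
print(all_paths(rels, "A", "B", 2))                            # [(1,), (2, 3), (4,)]
print(all_paths(rels, "A", "B", 7, allowed_types={"TYPE_A"}))  # [(1,)]
```

Each extra hop multiplies the branches to explore, which is why the bound and the type list matter so much on a dense graph.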

Neo4j Cypher: interdependent relationship values in a path

I have a graph dataset loaded in Neo4j with nodes being various persons and relationships being some "real" relationships between them. What makes it complicated is that each relationship has a time period during which it was valid. For example:
(p1:PERSON {name: "Andy"})
-[r1:HAS_RELATIONSHIP {from: "20190201", to: "20190215"}]->
(p2:PERSON {name: "Betty"})
-[r2:HAS_RELATIONSHIP {from: "20190301", to: "20190331"}]->
(p3:PERSON {name: "Cecil"})
I'd like to take one concrete person P and get a list of all persons with whom P was in an indirect relationship through other persons. It must hold that the intersection of dates in any relationship chain is nonempty.
So from the previous example, if we take Andy as P, the result should be Andy, Betty, because the relationship with Cecil was valid in a completely different period of time. But in the following case:
(p1:PERSON {name: "Andy"})
-[r1:HAS_RELATIONSHIP {from: "20190201", to: "20190215"}]->
(p2:PERSON {name: "Betty"})
-[r2:HAS_RELATIONSHIP {from: "20190210", to: "20190301"}]->
(p3:PERSON {name: "Cecil"})
the result should be Andy, Betty, Cecil.
Is there a way to specify this condition in Cypher? I'm looking for an efficient solution which prunes the already found paths.
You basically have a list of intervals from all relationships on a path, and you need to check whether they all overlap, i.e. share at least one common date. That holds exactly when max(from) <= min(to); in Cypher:
MATCH path=(p:PERSON {name:'Andy'})-[*..10]-(other) // Doesn't matter how you get the paths
UNWIND relationships(path) as r
WITH path,max(r.from) AS maxFrom,min(r.to) AS minTo
WHERE maxFrom <= minTo
RETURN [x IN nodes(path) | x.name] // list comprehension; extract() was removed in Neo4j 4.0
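The overlap test, plus a pruning variant closer to what the question asks for, can be sketched in Python. `chain_overlaps` is the answer's max(from) <= min(to) check. `reachable` is a hypothetical alternative that is not in the answer: it carries the running intersection along each path and abandons a branch as soon as the intersection is empty, since adding more intervals can only shrink it. Relationships are assumed to be `(person_a, person_b, from, to)` tuples traversed in both directions, like `-[*..10]-`. The YYYYMMDD strings compare correctly as plain strings because they are fixed-width.

```python
# Answer's check: closed intervals share a point iff max(from) <= min(to).
def chain_overlaps(intervals):
    return max(f for f, _ in intervals) <= min(t for _, t in intervals)

# Hypothetical pruning variant: keep the running intersection [lo, hi] per path
# and stop extending a path once it becomes empty.
def reachable(rels, start, max_hops=10):
    adjacency = {}
    for rel_id, (a, b, frm, to) in enumerate(rels):
        adjacency.setdefault(a, []).append((rel_id, b, frm, to))
        adjacency.setdefault(b, []).append((rel_id, a, frm, to))  # undirected
    found = {start}

    def walk(node, lo, hi, used):
        if len(used) == max_hops:
            return
        for rel_id, nxt, frm, to in adjacency.get(node, []):
            if rel_id in used:
                continue  # no relationship reused within one path
            new_lo, new_hi = max(lo, frm), min(hi, to)
            if new_lo > new_hi:
                continue  # empty intersection: prune this branch
            found.add(nxt)
            walk(nxt, new_lo, new_hi, used | {rel_id})

    walk(start, "00000000", "99999999", frozenset())
    return found

first = [("Andy", "Betty", "20190201", "20190215"), ("Betty", "Cecil", "20190301", "20190331")]
second = [("Andy", "Betty", "20190201", "20190215"), ("Betty", "Cecil", "20190210", "20190301")]
print(sorted(reachable(first, "Andy")))   # ['Andy', 'Betty']
print(sorted(reachable(second, "Andy")))  # ['Andy', 'Betty', 'Cecil']
```

The pruning is per path, not global, so the worst case is still exponential in the path length; it only avoids extending chains that can no longer qualify.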

Enforce order of relations in multipath queries

I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let me give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, besides not seeming to complete at all, this will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.
The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor at the end of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path to return. That requires either using the labelFilter in addition to the relationshipFilter, or using the combined sequence config property to specify the expected alternating labels and relationships, with an end node filter on the :Person node at the point in the pattern you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
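The repeating-sequence idea can be illustrated outside Neo4j with a small Python sketch. This is NOT APOC's algorithm: it is a breadth-first search over (node, position-in-sequence) states, so a node is skipped only when seen again at the same step and can still be revisited at a different step of the sequence. Labels are not modeled; a node reached at the start of a new repetition counts as an end node, playing the role of the `>Person` filter. Unlike APOC's path expansion, these walks may reuse a relationship, and the start node is excluded from the result. The movie data is hypothetical.

```python
from collections import deque

# Hypothetical sketch: BFS over (node, position in sequence) states.
def expand_sequence(rels, start, sequence, max_level):
    out_edges, in_edges = {}, {}
    for src, rel_type, dst in rels:
        out_edges.setdefault((src, rel_type), []).append(dst)
        in_edges.setdefault((dst, rel_type), []).append(src)

    distance = {}                   # end node -> minimum hops (BFS finds the minimum first)
    seen = {(start, 0)}
    queue = deque([(start, 0, 0)])  # (node, position in sequence, hops so far)
    while queue:
        node, pos, hops = queue.popleft()
        if hops == max_level:
            continue                # analogous to maxLevel: stop expanding this branch
        direction, rel_type = sequence[pos]
        edges = out_edges if direction == ">" else in_edges
        for nxt in edges.get((node, rel_type), []):
            nxt_pos = (pos + 1) % len(sequence)
            if nxt_pos == 0 and nxt != start and nxt not in distance:
                distance[nxt] = hops + 1  # finished a full repetition: an end node
            if (nxt, nxt_pos) not in seen:
                seen.add((nxt, nxt_pos))
                queue.append((nxt, nxt_pos, hops + 1))
    return distance

rels = [
    ("Tom", "ACTED_IN", "m1"), ("d1", "DIRECTED", "m1"), ("d1", "DIRECTED", "m2"),
    ("Ann", "ACTED_IN", "m2"), ("Ann", "ACTED_IN", "m3"), ("d2", "DIRECTED", "m3"),
    ("d2", "DIRECTED", "m4"), ("Bo", "ACTED_IN", "m4"),
]
sequence = [(">", "ACTED_IN"), ("<", "DIRECTED"), (">", "DIRECTED"), ("<", "ACTED_IN")]
print(expand_sequence(rels, "Tom", sequence, 12))  # {'Ann': 4, 'Bo': 8}
print(expand_sequence(rels, "Tom", sequence, 4))   # {'Ann': 4}
```

Tracking (node, position) pairs keeps the state space bounded by nodes times sequence length, which is one way around the path explosion that forces the maxLevel cap in the answer, at the cost of the relationship-uniqueness semantics noted above.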
If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that uses the same relationship twice.)
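The set logic of that query can be mirrored in plain Python. This is a sketch over hypothetical in-memory `(person, movie)` pairs, not a Neo4j client: Tom's movies and everyone in them are excluded, then we keep actors from other movies by the same directors.

```python
# Sketch of the query's set logic over in-memory (person, movie) pairs (hypothetical data).
def co_director_actors(acted_in, directed, tom):
    ignored_movies = {m for a, m in acted_in if a == tom}
    # Includes Tom himself, like the two separate MATCH clauses above.
    ignored_actors = {a for a, m in acted_in if m in ignored_movies}
    directors = {d for d, m in directed if m in ignored_movies}
    candidate_movies = {m for d, m in directed if d in directors} - ignored_movies
    return {a for a, m in acted_in if m in candidate_movies} - ignored_actors

acted_in = [("Tom", "m1"), ("Cy", "m1"), ("Ann", "m2"), ("Cy", "m2")]
directed = [("d1", "m1"), ("d1", "m2")]
print(co_director_actors(acted_in, directed, "Tom"))  # {'Ann'}  (Cy is a co-actor, so excluded)
```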

Two simple Cypher queries fail when combined

I'm stumped on this one, and I think the answer will be straightforward, so let me cut right to it.
Given a graph created by a query that looks like this:
CREATE (simpsons:Family {name: "Simpson"})
CREATE (homer:Father {name: "Homer"})
CREATE (lisa:Daughter {name: "Lisa"})
CREATE (snowball:Pet {name: "snowball"})
CREATE (lisa)-[:owns]->(snowball)-[:has]->(:Item {name: "catnip"})
CREATE (homer)-[:has]->(:Item {name: "beer"})
CREATE (lisa)-[:has]->(:Item {name: "saxophone"})
CREATE (lisa)<-[:memberOf]-(simpsons)-[:memberOf]->(homer)
Why would a query that looks like this fail?
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
Taken separately, its two components both find matches.
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"})
RETURN f;
and
MATCH (f:Family),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
But when pieced together there are no matches.
I have tried PROFILEing the query and it seems like Cypher works backwards from Snowball. It can make that first connection between the family and Snowball.
After that it does a VarLengthExpand(All) over snowball, f, lisa with (f)-[ UNNAMED22:*..10]-(lisa), which yields 6 rows. We then drop to 0 rows with this Filter over snowball, f, lisa: lisa: Daughter
I can get the match to work if I declare a connection between the family and a daughter in the first line of the match statement, but for reasons having to do w/ my particular application this is not a useful workaround.
MATCH (f:Family)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(lisa)-[*1..10]-(snowball:Pet {name: "snowball"})-[*1..10]-(:Item {name: "catnip"})
RETURN f;
I think I'm missing something about how Cypher searches for these patterns. Does anyone have insight into what that might be? Thank you for your time!
This isn't a Cypher bug, this is a side-effect of relationship uniqueness within a given MATCH pattern.
From the uniqueness section of the docs:
While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
This type of uniqueness is usually correct, and is great for preventing infinite loops when using variable-length relationships which traverse a cycle.
Relationship uniqueness is preserved for patterns from a MATCH or an OPTIONAL MATCH, even when these include multiple comma separated paths, as in your case.
You have all of the paths within the pattern of a single MATCH, so relationships must be unique; if used in one path, they will not be reused for another path.
The real problem is (f)-[*1..10]-(snowball:Pet): the only path from the family to Snowball runs through the :memberOf relationship between the Simpsons and Lisa, which (f)-[*1..10]-(lisa:Daughter) has already traversed. Since that relationship cannot be reused, one of those two paths cannot be matched, so the entire MATCH fails: no such pattern exists with unique relationships.
Note that when you break up the single MATCH into multiple MATCHes, as in stdob--'s answer, the query succeeds. There is no uniqueness in play here between separate MATCH clauses.
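Relationship uniqueness across the comma-separated paths of one MATCH can be checked with a small Python model of the Simpsons graph. This is a sketch under simplifying assumptions: the relationship ids are made up, endpoints are fixed node names instead of label lookups, and relationships are treated as undirected as in the `-[*1..10]-` patterns.

```python
from itertools import product

# The Simpsons graph from the question; ids are made up for this sketch.
RELS = {
    1: ("simpsons", "lisa"),    # (simpsons)-[:memberOf]->(lisa)
    2: ("simpsons", "homer"),   # (simpsons)-[:memberOf]->(homer)
    3: ("lisa", "snowball"),    # (lisa)-[:owns]->(snowball)
    4: ("snowball", "catnip"),  # (snowball)-[:has]->(catnip item)
    5: ("homer", "beer"),       # (homer)-[:has]->(beer item)
    6: ("lisa", "saxophone"),   # (lisa)-[:has]->(saxophone item)
}

def paths(a, b, max_hops=10):
    """Relationship-id sets of every undirected a..b path of 1..max_hops hops."""
    found = []

    def walk(node, used):
        if len(used) == max_hops:
            return
        for rel_id, (s, e) in RELS.items():
            if rel_id in used or node not in (s, e):
                continue  # already on this path, or not attached to this node
            nxt = e if node == s else s
            if nxt == b:
                found.append(used | {rel_id})
            walk(nxt, used | {rel_id})

    walk(a, frozenset())
    return found

def one_match(*segments):
    """True if each segment has a path and no relationship is shared between segments."""
    for combo in product(*(paths(a, b) for a, b in segments)):
        rel_ids = [r for path in combo for r in path]
        if len(rel_ids) == len(set(rel_ids)):
            return True
    return False

lisa_part = [("simpsons", "lisa"), ("lisa", "saxophone")]
snowball_part = [("simpsons", "snowball"), ("snowball", "catnip")]
print(one_match(*lisa_part, *snowball_part))                # False: both need relationship 1
print(one_match(*lisa_part) and one_match(*snowball_part))  # True: separate MATCH clauses
```

The question's working variant, which reaches Snowball from Lisa rather than from the family, also succeeds here because its paths use relationships 1, 6, 3 and 4, none of them twice.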

Neo4j Passing distinct nodes through WITH in Cypher

I have the following query, where there are 3 MATCHES, connected with WITH, searching through 3 paths.
MATCH (:File {name: 'A'})-[:FILE_OF]->(:Fun {name: 'B'})-->(ent:CFGEntry)-[:Flows*]->()-->(expr:CallExpr {name: 'C'})-->()-[:IS_PARENT]->(Callee {name: 'd'})
WITH expr, ent
MATCH (expr)-->(:Arg {chNum: '1'})-->(id:Id)
WITH id, ent
MATCH (entry)-[:Flows*]->(:IdDecl)-[:Def]->(sym:Sym)
WHERE id.name = sym.name
RETURN id.name
The query returns two distinct id and one distinct entry, and 7 distinct sym.
The problem is that, since after the second MATCH I pass "WITH id, entry" and two distinct id values were found, two instances of entry are passed to the third MATCH instead of one, so the run time of the third MATCH at least doubles unnecessarily.
I am wondering if anyone knows how I should write this query to use just a single instance of entry.
Your best bet will be to aggregate id, but then you'll need to adjust your logic in the third part of your query accordingly:
MATCH (:File {name: 'A'})-[:FILE_OF]->(:Fun {name: 'B'})-->(ent:CFGEntry)-[:Flows*]->()-->(expr:CallExpr {name: 'C'})-->()-[:IS_PARENT]->(Callee {name: 'd'})
WITH expr, ent
MATCH (expr)-->(:Arg {chNum: '1'})-->(id:Id)
WITH collect(id.name) as names, ent
MATCH (ent)-[:Flows*]->(:IdDecl)-[:Def]->(sym:Sym) // reuse the bound ent; a new (entry) variable would match any node
WHERE sym.name in names
RETURN sym.name
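Why collect() removes the duplicate rows: in Cypher, the non-aggregated expressions in a WITH (here ent) act as grouping keys, so all rows sharing an ent collapse into one. A minimal Python model of that grouping, using hypothetical row values:

```python
from collections import defaultdict

# Toy model of `WITH collect(id.name) AS names, ent`: ent is the grouping key,
# so rows sharing an ent collapse into a single row with a list of names.
def with_collect(rows, key, value, alias):
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return [{key: k, alias: v} for k, v in grouped.items()]

rows = [{"ent": "entry1", "id_name": "x"}, {"ent": "entry1", "id_name": "y"}]
print(with_collect(rows, "ent", "id_name", "names"))
# [{'ent': 'entry1', 'names': ['x', 'y']}]
```

With one row per ent, the third MATCH runs once and the `sym.name IN names` test does the work that the per-id rows did before.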
