Query intersection of Paths in Neo4j using Cypher - neo4j

Having this query working in Cypher (Neo4j):
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
RETURN p
which returns all possible paths belonging a specific group (group is just a property to classify nodes), I am struggling to get a query that returns the paths in common between both collection of paths. It would be something like this:
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
MATCH p=(g3:Node)-[:FOLLOWED_BY *2..2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15
RETURN INTERSECTION(path1, path2)
Of course I made that up. The goal is to get all the paths in common between both queries.

The start/end nodes of your 2 MATCHes have different groups, so they can never find common paths.
Therefore, when you ask for "paths in common", I assume you actually want to find the shared middle nodes (between the 2 sets of 3-node paths). If so, this query should work:
MATCH p1=(g:Node)-[:FOLLOWED_BY *2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
WITH COLLECT(DISTINCT NODES(p1)[1]) AS middle1
MATCH p2=(g3:Node)-[:FOLLOWED_BY *2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15 AND NODES(p2)[1] IN middle1
RETURN DISTINCT NODES(p2)[1] AS common_middle_node;

Related

Enforce order of relations in multipath queries

I'm looking into neo4j as a Graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let my give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively.
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, (besides not seeming to complete at all). This will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.
The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, since you want to match on the actor of the pattern rather than the director (or any of the movies in the path), we need to specify which nodes in the path you want to return, which requires either using the labelFilter in addition to the relationshipFilter, or just to use the combined sequence config property to specify the alternating labels/relationships expected, and making sure we use an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that use the same relationship twice.)

Why is a single Neo4j relationship shown twice in Cypher query results?

Let's consider a trivial graph with a directed relationship:
CREATE
(`0` :Car {value:"Ford"})
, (`1` :Car {value:"Subaru"})
, (`0`)-[:`DOCUMENT` {value:"DOC-1"}]->(`1`);
The following query MATCH (n1:Car)-[r:DOCUMENT]-(n2:Car) RETURN * returns:
╒══════════════════╤══════════════════╤═════════════════╕
│"n1" │"n2" │"r" │
╞══════════════════╪══════════════════╪═════════════════╡
│{"value":"Subaru"}│{"value":"Ford"} │{"value":"DOC-1"}│
├──────────────────┼──────────────────┼─────────────────┤
│{"value":"Ford"} │{"value":"Subaru"}│{"value":"DOC-1"}│
└──────────────────┴──────────────────┴─────────────────┘
The graph defines only a Ford->Subaru relationship, why are two relationships?
How to interpret the reversed one (line 1; not specified in the CREATE) statement?
Note: This is a follow-up to Convert multiple relationships between 2 nodes to a single one with weight asked by me earlier. I solved my problem, but I'm not convinced my answer is the best solution.
Your MATCH statement here doesn't specify the direction, therefore there are two possible paths that will match the pattern (remember that the ordering of nodes in the path is important and distinguishes paths from each other), thus your two answers.
If you specify the direction of the relationship instead you'll find there is only one possible path that matches:
MATCH (n1:Car)-[r:DOCUMENT]->(n2:Car)
RETURN *
As for the question of why we get two paths back when we omit the direction, remember that paths are order-sensitive: two paths that have the same elements but with a different order of the elements are different paths.
To help put this into perspective, consider the following two queries:
# Query 1
MATCH (n1:Car)-[r:DOCUMENT]-(n2:Car)
WHERE n1.value = 'Ford'
RETURN *
╒══════════════════╤══════════════════╤═════════════════╕
│"n1" │"n2" │"r" │
╞══════════════════╪══════════════════╪═════════════════╡
│{"value":"Ford"} │{"value":"Subaru"}│{"value":"DOC-1"}│
└──────────────────┴──────────────────┴─────────────────┘
# Query 2
MATCH (n1:Car)-[r:DOCUMENT]-(n2:Car)
WHERE n1.value = 'Subaru'
RETURN *
╒══════════════════╤══════════════════╤═════════════════╕
│"n1" │"n2" │"r" │
╞══════════════════╪══════════════════╪═════════════════╡
│{"value":"Subaru"}│{"value":"Ford"} │{"value":"DOC-1"}│
└──────────────────┴──────────────────┴─────────────────┘
Conceptually (and also used by the planner, in absence of indexes), to get to each of the above results you start off with the results of the full match as in your description, then filter to the only one which meets the given criteria.
The results above would not be consistent with the original directionless match query if that original query only returned a single row instead of two.
Additional information from the OP
It will take a while to wrap my head around it, but it does work this way and here's a piece of documentation to confirm it's by design:
When your pattern contains a bound relationship, and that relationship pattern doesn’t specify direction, Cypher will try to match the relationship in both directions.
MATCH (a)-[r]-(b)
WHERE id(r)= 0
RETURN a,b
This returns the two connected nodes, once as the start node, and once as the end node.

Cypher - traversal of nodes

I have a simple graph with nodes labeled as Organization and two directed relationship types 'Control' and 'Influence'. My objective is -
Step-1
Given a node, find all nodes connected to it with 'Control' relationship (in any direction)
Step-2
For all nodes found by Step-1, find any outbound 'Influence' relationships (of any length) and include those nodes as well
What I have been able to come up thus far:
MATCH (x:Organization {ORGID: "5621"})-[:Control*1..]-(y) WITH y MATCH y-[:Influence*0..]-(z) RETURN y,z;
Questions
1) This query does not include the starting node, how can I get that in the result?
2) Ideally I'd like to get the relationships in the result also, it just returns the nodes
TIA
Here is one way to do what you want:
MATCH (x:Organization { ORGID: "5621" }), p1 = x-[:Influence*0..]->(z)
WHERE NOT z-[:Influence]->()
WITH x, COLLECT(p1) AS c1
OPTIONAL MATCH x-[:Control*]-(y), p2=y-[:Influence*0..]->(z)
WHERE NOT z-[:Influence]->()
RETURN c1 + COLLECT(p2) AS result;
The query returns a collection of matching Influence paths. The WHERE clauses are used to make sure that each path is as long as possible (to avoid a lot of duplicate subpaths from showing up in the results). Every path is a collection of node(s) separated by relationships, and can consist of just a node if the path has no relationships.

query for all paths with a set of specified relationships in neo4j

I am trying to query a graph to return all of the paths with a set of specified relationships.
The graph I have contains the following nodes: Person
The relationships which connect two people are: knows, married
So an example of some data are:
c:Person-[:knows]->b:Person
a:Person-[:married]->c:Person
d:Person-[knows]-> a:Person
In my query I would like to be able to find all paths that contain both relationships 'knows' and 'married'; however, I don't care about the ordering of such relationships in the paths. For example, my query should return the following paths:
1) a:Person-[:married]->c:Person-[:knows]->b:Person
2) d:Person-[:knows]->a:Person-[:married]->c:Person
I tried the following query
MATCH p=(a)-[:KNOWS|MARRIED*1..3]-(b)
RETURN p
However, it returned the paths only having knows relationship or the paths only having married relationship, but not both.
Is there any way of finding the paths I want? Many thanks!
You can try this
MATCH p=(a)-[rel*]-(b)
WHERE type(rel)='KNOWS' OR type(rel)='MARRIED'
RETURN p
It will provide you all paths
MATCH p = (a:Person)-[rels:KNOWS|MARRIED*]->(b:Person)
WITH p, EXTRACT(r IN rels | TYPE(r)) AS types
WHERE 'KNOWS' IN types AND 'MARRIED' IN types
RETURN p

Cypher query to find all paths with same relationship type

I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of cypher by making a series of queries, the first to
retrieve the list of outgoing relationships from the start node, and then a single query (submitted together as a batch) for each relationship. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient is the following single Cypher query:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along the paths enforcing that each relationship in the path matches the first relationship in the path. This, however, is really just a filter operation on what it essentially a combinatorial explosion of all possible paths --- much less efficient than traversing a relationship of a known, given type first.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
Here's a minor optimization, at least non-matching the paths will fail fast:
MATCH n-[r]->()
WITH distinct type(r) AS t
MATCH p = n-[r*]->()
WHERE type(r[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.

Resources