Enforce order of relations in multipath queries - neo4j

I'm looking into Neo4j as a graph database, and variable length path queries will be a very important use case. I now think I've found an example query that Cypher will not support.
The main issue is that I want to treat composed relations as a single relation. Let me give an example: finding co-actors. I've done this using the standard database of movies. The goal is to find all actors that have acted alongside Tom Hanks. This can be found with the query:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:ACTED_IN]-(a:Person) return a
Now, what if we want to find co-actors of co-actors recursively?
We can rewrite the above query to:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*2]-(a:Person) return a
And then it becomes clear we can do this with
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN*]-(a:Person) return a
Notably, all odd-length paths are excluded because they do not end in a Person.
Now, I have found a query that I cannot figure out how to make recursive:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-(a:Person) return DISTINCT a
In words, all actors that have a director in common with Tom Hanks.
In order to make this recursive I tried:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN|DIRECTED*]-(a:Person) return DISTINCT a
However, besides not seeming to complete at all, this will also capture co-actors.
That is, it will match paths of the form
()-[:ACTED_IN]->()<-[:ACTED_IN]-()
So what I am wondering is:
Can we somehow restrict the order in which relations occur in a multi-path query?
Something like:
MATCH (tom {name: "Tom Hanks"}){-[:ACTED_IN]->()<-[:DIRECTED]-()-[:DIRECTED]->()<-[:ACTED_IN]-}*(a:Person) return DISTINCT a
Where the * applies to everything in the curly braces.

The path expander procs from APOC Procedures should help here, as we added the ability to express repeating sequences of labels, relationships, or both.
In this case, you want to match on the actor at the end of the pattern rather than the director (or any of the movies in the path), so we need to specify which nodes in the path should be returned. That requires either using the labelFilter in addition to the relationshipFilter, or just using the combined sequence config property to specify the alternating labels/relationships expected, making sure there is an end node filter on the :Person node at the point in the pattern that you want.
Here's how you would do this after installing APOC:
MATCH (tom:Person {name: "Tom Hanks"})
CALL apoc.path.expandConfig(tom, {sequence:'>Person, ACTED_IN>, *, <DIRECTED, *, DIRECTED>, *, <ACTED_IN', maxLevel:12}) YIELD path
WITH last(nodes(path)) as person, min(length(path)) as distance
RETURN person.name
We would usually use subgraphNodes() for these, since it's efficient at expanding out and pruning paths to nodes we've already seen, but in this case, we want to keep the ability to revisit already visited nodes, as they may occur in further iterations of the sequence, so to get a correct answer we can't use this or any of the procs that use NODE_GLOBAL uniqueness.
Because of this, we need to guard against exploring too many paths, as the permutations of relationships to explore that fit the path will skyrocket, even after we've already found all distinct nodes possible. To avoid this, we'll have to add a maxLevel, so I'm using 12 in this case.
This procedure will also produce multiple paths to the same node, so we're going to get the minimum length of all paths to each node.
The sequence config property lets us specify alternating label and relationship type filterings for each step in the sequence, starting at the starting node. We are using an end node filter symbol, > before the first Person label (>Person) indicating that we only want paths to the Person node at this point in the sequence (as the first element in the sequence it will also be the last element in the sequence as it repeats). We use the wildcard * for the label filter of all other nodes, meaning the nodes are whitelisted and will be traversed no matter what their label is, but we don't want to return any paths to these nodes.
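To make the "repeating sequence" idea concrete, here is a rough Python sketch on a toy in-memory graph. It is not how APOC is implemented; the edge list, the node names and the expand_sequence helper are all made up for illustration. It only mimics the behaviour described above: follow the relationship steps in a fixed order, never reuse a relationship within one path, stop at a maximum depth, and only report nodes reached at the end of a full pass through the sequence, keeping the shortest distance to each.

```python
from collections import deque

# Hypothetical toy data: (start, type, end) directed relationships.
EDGES = [
    ("Tom", "ACTED_IN", "M1"),
    ("DirA", "DIRECTED", "M1"),
    ("DirA", "DIRECTED", "M2"),
    ("Ann", "ACTED_IN", "M2"),
    ("Ann", "ACTED_IN", "M3"),
    ("DirB", "DIRECTED", "M3"),
    ("DirB", "DIRECTED", "M4"),
    ("Bob", "ACTED_IN", "M4"),
    ("Cat", "ACTED_IN", "M1"),  # plain co-actor of Tom, no shared director via another movie
]

# Relationship steps of one pass: ACTED_IN>, <DIRECTED, DIRECTED>, <ACTED_IN
SEQUENCE = [("ACTED_IN", "out"), ("DIRECTED", "in"),
            ("DIRECTED", "out"), ("ACTED_IN", "in")]


def expand_sequence(start, max_level):
    """Return {node: shortest path length} for nodes reached after whole passes."""
    found = {}
    queue = deque([(start, 0, frozenset())])  # (node, depth, used edge indexes)
    while queue:
        node, depth, used = queue.popleft()
        if depth > 0 and depth % len(SEQUENCE) == 0:
            # Breadth-first order means the first visit has the minimum length.
            found.setdefault(node, depth)
        if depth == max_level:
            continue
        rel_type, direction = SEQUENCE[depth % len(SEQUENCE)]
        for i, (src, typ, dst) in enumerate(EDGES):
            if typ != rel_type or i in used:
                continue  # wrong type, or relationship already used in this path
            if direction == "out" and src == node:
                queue.append((dst, depth + 1, used | {i}))
            elif direction == "in" and dst == node:
                queue.append((src, depth + 1, used | {i}))
    return found


print(expand_sequence("Tom", 12))
```

On this toy data Ann is reached after one pass and Bob after two, while Cat (who only shares a movie with Tom) is never reported, because the walk is looking for a DIRECTED step when it stands on M1.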

If you want to see all the actors who acted in movies directed by directors who directed Tom Hanks, but who have never acted with Tom, here is one way:
MATCH (tom {name: "Tom Hanks"})-[:ACTED_IN]->(m)
MATCH (m)<-[:ACTED_IN]-(ignoredActor)
WITH COLLECT(DISTINCT m) AS ignoredMovies, COLLECT(DISTINCT ignoredActor) AS ignoredActors
UNWIND ignoredMovies AS movie
MATCH (movie)<-[:DIRECTED]-()-[:DIRECTED]->(m2)
WHERE NOT m2 IN ignoredMovies
MATCH (m2)<-[:ACTED_IN]-(a:Person)
WHERE NOT a IN ignoredActors
RETURN DISTINCT a
The top 2 MATCH clauses are deliberately not combined into one clause, so that the Tom Hanks node will be captured as an ignoredActor. (A MATCH clause filters out any result that uses the same relationship twice.)

Related

Neo4j: Get all relationships between a set of nodes

Let's say I have nodes A and B. I want to find all the paths that connect these two nodes. How would I do it?
Here's an illustration I made. Sorry, I know it sucks lol
You can use a variable length relationship to return all such paths.
For your example (which has relationships pointing in both directions), this query using an undirected variable length relationship should work:
MATCH p=(:Foo {id: 'A'})-[*]-(:Foo {id: 'B'})
RETURN p
Note, however, that a variable-length relationship without a reasonable upper bound can take virtually forever or run out of memory. So, depending on your DB characteristics, you should determine and use a reasonable upper bound. For example:
MATCH p=(:Foo {id: 'A'})-[*..7]-(:Foo {id: 'B'})
RETURN p
To improve performance, it is also generally helpful to specify the possible relationship types along the path, to avoid going down inappropriate paths, as in:
MATCH p=(:Foo {id: 'A'})-[:TYPE_A|TYPE_B|TYPE_C*..7]-(:Foo {id: 'B'})
RETURN p
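As a rough illustration of what the upper bound does, here is a small Python sketch that enumerates undirected paths between two nodes on a made-up graph, using each relationship at most once per path and stopping at max_hops. The edge list and the all_paths helper are hypothetical and only meant to show the shape of the search, not how Neo4j evaluates the pattern.

```python
# Hypothetical undirected relationships between made-up nodes.
EDGES = [("A", "C"), ("C", "B"), ("A", "D"), ("D", "C"), ("B", "D")]


def all_paths(edges, start, goal, max_hops):
    """All paths from start to goal with 1..max_hops hops, no relationship reused."""
    paths = []

    def walk(node, used, trail):
        if node == goal and len(trail) > 1:
            paths.append(tuple(trail))
        if len(trail) - 1 == max_hops:
            return  # the upper bound cuts the search off here
        for i, (a, b) in enumerate(edges):
            if i in used or node not in (a, b):
                continue
            nxt = b if a == node else a
            walk(nxt, used | {i}, trail + [nxt])

    walk(start, frozenset(), [start])
    return paths


print(all_paths(EDGES, "A", "B", 2))
print(len(all_paths(EDGES, "A", "B", 3)))
```

Even on five relationships, raising the bound from 2 to 3 doubles the number of matches; on a dense graph the growth is far steeper, which is why the bound matters.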

Two simple Cypher queries fail when combined

I'm stumped on this one, and I think the answer will be straightforward, so let me cut right to it.
Given a graph that looks like this:
Created by a query that looks like this:
CREATE (simpsons:Family {name: "Simpson"})
CREATE (homer:Father {name: "Homer"})
CREATE (lisa:Daughter {name: "Lisa"})
CREATE (snowball:Pet {name: "snowball"})
CREATE (lisa)-[:owns]->(snowball)-[:has]->(:Item {name: "catnip"})
CREATE (homer)-[:has]->(:Item {name: "beer"})
CREATE (lisa)-[:has]->(:Item {name: "saxophone"})
CREATE (lisa)<-[:memberOf]-(simpsons)-[:memberOf]->(homer)
Why would a query that looks like this fail?
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
Taken separately, its two components both find matches.
MATCH (f:Family),
(f)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"})
RETURN f;
and
MATCH (f:Family),
(f)-[*1..10]-(snowball:Pet),
(snowball)-[*1..10]-(:Item {name: "catnip"})
RETURN f;
But when pieced together there are no matches.
I have tried PROFILEing the query and it seems like Cypher works backwards from Snowball. It can make that first connection between the family and Snowball.
After that it does a VarLengthExpand(All)
snowball, f, lisa
(f)-[ UNNAMED22:*..10]-(lisa)
Which yields 6 rows. We then drop to 0 rows with this Filter:
snowball, f, lisa
lisa: Daughter
I can get the match to work if I declare a connection between the family and a daughter in the first line of the match statement, but for reasons having to do w/ my particular application this is not a useful workaround.
MATCH (f:Family)-[*1..10]-(lisa:Daughter),
(lisa)-[*1..10]-(:Item {name: "saxophone"}),
(lisa)-[*1..10]-(snowball:Pet {name: "snowball"})-[*1..10]-(:Item {name: "catnip"})
RETURN f;
I think I'm missing something about how Cypher searches for these patterns. Does anyone have insight into what that might be? Thank you for your time!
This isn't a Cypher bug, this is a side-effect of relationship uniqueness within a given MATCH pattern.
From the uniqueness section of the docs:
While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.
This type of uniqueness is usually correct, and is great for preventing infinite loops when using variable-length relationships which traverse a cycle.
Relationship uniqueness is preserved for patterns from a MATCH or an OPTIONAL MATCH, even when these include multiple comma separated paths, as in your case.
You have all of the paths within the pattern of a single MATCH, so relationships must be unique; if used in one path, they will not be reused for another path.
The real problem is here: (f)-[*1..10]-(snowball:Pet). You've already traversed the same relationship (the memberOf between the Simpsons and Lisa) when you matched (f)-[*1..10]-(lisa:Daughter) earlier. Since the relationship cannot be reused, one of those two paths cannot be matched, so the entire MATCH fails: no such pattern exists with unique relationships.
Note that when you break up the single MATCH into multiple MATCHes, as in stdob--'s answer, the query succeeds. There is no uniqueness in play here between separate MATCH clauses.
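If it helps to see the uniqueness rule in action, here is a small Python sketch that brute-forces the example graph from the question. It is only a model of the rule quoted above, not of Neo4j's planner; the trails and match helpers are made up, and the 10-hop bound is ignored because the graph only has six relationships. Each comma separated path becomes a (start, end) segment, and a match only counts if no relationship is shared between segments.

```python
from itertools import product

# The relationships from the CREATE statements, treated as undirected.
EDGES = [
    ("lisa", "snowball"),      # owns
    ("snowball", "catnip"),    # has
    ("homer", "beer"),         # has
    ("lisa", "saxophone"),     # has
    ("simpsons", "lisa"),      # memberOf
    ("simpsons", "homer"),     # memberOf
]


def trails(start, goal):
    """Edge-index sets of all paths from start to goal that reuse no relationship."""
    results = []

    def walk(node, used):
        if node == goal and used:
            results.append(used)
        for i, (a, b) in enumerate(EDGES):
            if i not in used and node in (a, b):
                walk(b if a == node else a, used | {i})

    walk(start, frozenset())
    return results


def match(segments):
    """True if every segment can be matched without sharing a relationship."""
    options = [trails(start, goal) for start, goal in segments]
    for combo in product(*options):
        if len(frozenset().union(*combo)) == sum(len(t) for t in combo):
            return True
    return False


lisa_part = [("simpsons", "lisa"), ("lisa", "saxophone")]
pet_part = [("simpsons", "snowball"), ("snowball", "catnip")]

print(match(lisa_part))             # one MATCH with the first two paths
print(match(pet_part))              # one MATCH with the last two paths
print(match(lisa_part + pet_part))  # everything in a single MATCH
```

The two halves each succeed on their own, which corresponds to putting them in separate MATCH clauses. Combined, both the family-to-Lisa path and the family-to-Snowball path need the same memberOf relationship, so no combination passes the check.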

Query intersection of Paths in Neo4j using Cypher

Having this query working in Cypher (Neo4j):
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
RETURN p
which returns all possible paths belonging to a specific group (group is just a property to classify nodes), I am struggling to get a query that returns the paths in common between both collections of paths. It would be something like this:
MATCH p=(g:Node)-[:FOLLOWED_BY *2..2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
MATCH p=(g3:Node)-[:FOLLOWED_BY *2..2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15
RETURN INTERSECTION(path1, path2)
Of course I made that up. The goal is to get all the paths in common between both queries.
The start/end nodes of your 2 MATCHes have different groups, so they can never find common paths.
Therefore, when you ask for "paths in common", I assume you actually want to find the shared middle nodes (between the 2 sets of 3-node paths). If so, this query should work:
MATCH p1=(g:Node)-[:FOLLOWED_BY *2]->(g2:Node)
WHERE g.group=10 AND g2.group=10
WITH COLLECT(DISTINCT NODES(p1)[1]) AS middle1
MATCH p2=(g3:Node)-[:FOLLOWED_BY *2]->(g4:Node)
WHERE g3.group=15 AND g4.group=15 AND NODES(p2)[1] IN middle1
RETURN DISTINCT NODES(p2)[1] AS common_middle_node;
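The same idea, collecting the middle nodes for one group and then keeping only those that also show up for the other group, can be sketched in Python on made-up data. The edge list, the group assignments and the middle_nodes helper are all hypothetical; the point is just that the "intersection" is taken over middle nodes, not over whole paths.

```python
# Hypothetical FOLLOWED_BY relationships (start, end) and group properties.
EDGES = [("a1", "x"), ("x", "a2"), ("b1", "x"), ("x", "b2"),
         ("a1", "y"), ("y", "a2")]
GROUP_OF = {"a1": 10, "a2": 10, "b1": 15, "b2": 15, "x": 0, "y": 0}


def middle_nodes(edges, group_of, group):
    """Middle nodes of directed 2-hop paths whose two end nodes are both in `group`."""
    middles = set()
    for i, (a, m) in enumerate(edges):
        for j, (m2, c) in enumerate(edges):
            # i != j: the two hops must be different relationships
            if i != j and m == m2 and group_of[a] == group == group_of[c]:
                middles.add(m)
    return middles


common = middle_nodes(EDGES, GROUP_OF, 10) & middle_nodes(EDGES, GROUP_OF, 15)
print(common)
```

Here x sits in the middle of a group 10 path and a group 15 path, while y only appears between group 10 nodes, so only x is reported.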

Combining depth- and breadth-first traversals in a single cypher query

My graph is a tree structure with root and end nodes, and a line of nodes between them with [:NEXT]-> relationships from one to the next. Some nodes along that path also have [:BRANCH]-> relationships to other root nodes, and through them to other lines of nodes.
What Cypher query will return an ordered list of the nodes on the path from beginning to end, with any BRANCH relationships being included with the records for the nodes that have them?
EDIT: It's not a technical diagram, but the basic structure looks like this:
with each node depicted as a black circle. In this case, I would want every node depicted here.
How about
MATCH p=(root)-[:NEXT*0..]->(leaf)
OPTIONAL MATCH (leaf)-[:BRANCH]->(branched)
RETURN leaf, branched, length(p) as l
ORDER BY l ASC
see also this graph-gist: http://gist.neo4j.org/?9042990
This query - a bit slow - should work (I guess):
START n=node(startID), child=node(*)
MATCH (n)-[rels*]-(child)
WHERE all(r in rels WHERE type(r) IN ["NEXT", "BRANCH"])
RETURN *
That is based on Neo4j 2.0.x Cypher syntax.
Technically this query will stop at the end of the tree started from startID: that is because the end in the diagram above belongs to a single path, but not the end of all the branches.
I would also recommend limiting the length of the variable-length relationship, e.g. [rels*1..n], to prevent the query from running away...
You won't be able to control the order in which the nodes are returned (depth-first or breadth-first) unless you have a variable to save the previous element or some kind of recursive call, which I don't think is possible using only Cypher.
What you can do:
MATCH p =(n)-[:NEXT*]->(end)
WITH collect(p) as node_paths
MATCH (n1)-[:NEXT]->(m)-[:BRANCH]->(n2)
WITH collect(m) as branch_nodes , node_paths
RETURN branch_nodes,node_paths
Now node_paths consists of all the paths with the pattern (node)-[:NEXT]->(node)-[:NEXT]->...(node). With the paths and the branch nodes (the starting points of basically all the paths in node_paths except the one emerging from the root node), you can arrange the output order accordingly.

Cypher query to find all paths with same relationship type

I'm struggling to find a single clean, efficient Cypher query that will let me identify all distinct paths emanating from a start node such that every relationship in the path is of the same type when there are many relationship types.
Here's a simple version of the model:
CREATE (a), (b), (c), (d), (e), (f), (g),
(a)-[:X]->(b)-[:X]->(c)-[:X]->(d)-[:X]->(e),
(a)-[:Y]->(c)-[:Y]->(f)-[:Y]->(g)
In this model (a) has two outgoing relationship types, X and Y. I would like to retrieve all the paths that link nodes along relationship X as well as all the paths that link nodes along relationship Y.
I can do this programmatically outside of cypher by making a series of queries, the first to
retrieve the list of outgoing relationships from the start node, and then a single query (submitted together as a batch) for each relationship. That looks like:
START n=node(1)
MATCH n-[r]->()
RETURN COLLECT(DISTINCT TYPE(r)) as rels;
followed by:
START n=node(1)
MATCH p = n-[:`reltype_param`*]->()
RETURN p as path;
The above satisfies my need, but requires at minimum 2 round trips to the server (again, assuming I batch together the second set of queries in one transaction).
A single-query approach that works, but is horribly inefficient is the following single Cypher query:
START n=node(1)
MATCH p = n-[r*]->() WHERE
ALL (x in RELATIONSHIPS(p) WHERE TYPE(x) = TYPE(HEAD(RELATIONSHIPS(p))))
RETURN p as path;
That query uses the ALL predicate to filter the relationships along the paths, enforcing that each relationship in the path matches the first relationship in the path. This, however, is really just a filter operation on what is essentially a combinatorial explosion of all possible paths, which is much less efficient than traversing a relationship of a known, given type first.
I feel like this should be possible with a single Cypher query, but I have not been able to get it right.
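For reference, the traversal I have in mind looks roughly like this Python sketch over the model above (the same_type_paths helper is made up, and this runs on an in-memory edge list, not against Neo4j): pick each distinct outgoing type at the start node, then only ever follow relationships of that type, so mixed-type paths are never generated in the first place.

```python
# The model from the CREATE statement: (start, type, end).
EDGES = [
    ("a", "X", "b"), ("b", "X", "c"), ("c", "X", "d"), ("d", "X", "e"),
    ("a", "Y", "c"), ("c", "Y", "f"), ("f", "Y", "g"),
]


def same_type_paths(edges, start):
    """All outgoing paths from start whose relationships share one type."""
    paths = []

    def walk(node, rel_type, trail, used):
        for i, (src, typ, dst) in enumerate(edges):
            if src != node or typ != rel_type or i in used:
                continue  # only follow unused relationships of the chosen type
            new_trail = trail + [dst]
            paths.append((rel_type, new_trail))
            walk(dst, rel_type, new_trail, used | {i})

    # Step 1: distinct relationship types leaving the start node.
    first_types = sorted({typ for src, typ, _ in edges if src == start})
    # Step 2: one typed traversal per relationship type.
    for rel_type in first_types:
        walk(start, rel_type, [start], frozenset())
    return paths


for rel_type, nodes in same_type_paths(EDGES, "a"):
    print(rel_type, "->".join(nodes))
```

On the model this yields four X paths and three Y paths, and never touches a combination such as a, b, c, f that switches from X to Y at c.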
Here's a minor optimization; at least the non-matching paths will fail fast:
MATCH n-[r]->()
WITH DISTINCT n, type(r) AS t // carry n forward so the next MATCH starts from the same node
MATCH p = n-[r*]->()
WHERE type(r[-1]) = t // last entry matches
RETURN p AS path
This is probably one of those things that should be in the Java API if you want it to be really performant, though.
