Cypher match path based on previous relationship - neo4j

I am trying to impose restrictions on my path match pattern.
I would like to match the next relationship based on the type of the previous used relationship.
Here is an example of a simplified Database:
(A)-1-(B)-2-(C)-1-(E)-2-(F)
| |
3----(D)----3
Query 1:
start n=node(A), m=node(F)
match p=n-[r*]-m
return p
should result both paths
(A)-1-(B)-2-(C)-1-(E)-2-(F)
(A)-1-(B)-3-(D)-3-(E)-2-(F)
However, when running the query starting from node (F):
start m=node(F),n=node(A)
match p=m-[r*]-n
return p
The result should be only:
(F)-2-(E)-1-(C)-2-(B)-1-(A)
Path
(F)-2-(E)-3-(D)-3-(B)-1-(A)
should not be valid, since it violates these constrains:
Coming from a -1- type relationship you can proceed to either a
-2- or -3- relationship.
Coming from a -2- or -3- type relationship you can only proceed to
a -1- relationship.
These paths are valid:
()-1-()-2-()
()-1-()-3-()
()-2-()-1-()
()-3-()-1-()
These path are not valid:
()-3-()-2-()
()-2-()-3-()

First, upvote for the very detailed, specific, and well laid out question.
Unfortunately, I don't think it's possible to do what you want to do with Cypher, I think you need the Traversal API to do this. Cypher is a declarative language; that is, you tell it what you want, and it goes and gets it for you. Using the traversal API is an imperative approach to query; that is, you tell neo4j exactly how to traverse the graph.
Your query here imposes constraints about the order in which relationships get traversed, and what makes a valid path. Nothing wrong with that, but I believe that imposing constraints on the order of traversal implicitly means you're telling cypher which way to traverse, and you just can't do that with cypher because it's declarative. Another common example of the declarative vs. imperative thing is breadth-first vs. depth-first search. If you're looking for certain nodes, you can't tell cypher to traverse breadth-first vs. depth-first; you just tell it which nodes you want, and it goes and gets them.
Now, paths can be treated like collections in cypher via the relationships() function. And you can use the filter and reduce functions to work with individual relationships. But your query is harder in that you need code that says something like "If the first relationship in a path is a 1, then the next must be a 2 or a 3". This is exactly the sort of thing that you can do with Evaluators in the Traversal API. Check the interface and you can see how you could write your own method that would implement exactly the logic you're talking about via Evaluator#evaluate(Path path).
As a general note, because declarative query (cypher) hides traversal details from you, IMHO it's always better to use declarative query for ease, if you can specify what you want declaratively. But there are cases where you have to control the order of traversal, and for that you need traversal. (I have had cases where I need all nodes connected to something else, via breadth-first search only, to a maximum depth of 3, along complex relationship criteria -- I couldn't use cypher for that).
To give you a way forward, check the link I provided on the traversal framework. Perhaps you can describe your query as a TraversalDescription, and then hand it off to neo4j to run.

Related

Cypher return multiple hops through pattern of relationships and nodes

I'm making a proof of concept access control system with neo4j at work, and I need some help with Cypher.
The data model is as follows:
(:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
Now I want to get a path from a User or a Business to all the Business-nodes that you can reach trough the
-[:can]->(:Permission)<-[:allows]-
pattern. I have managed to write a MATCH that gets me halfway there:
MATCH
path =
(:User {userId: 'e96cca53-475c-4534-9fe1-06671909fa93'})-[:can|allows*]-(b:Business)
but this doesn't have any directions, and I can't figure out how to include the directions without reducing the returned matches to only the direct matches (i.e it doesn't continue after the first hit on a :Business node)
So what I'm wondering is:
Is there a way to match multiple of these hops in one query?
Should I model this entirely different?
Am I on the wrong path completely and the query should be completely
rewritten
Currently the syntax of variable-length expansions doesn't allow fine control for separate directions for certain types. There are improvements in the pipeline around this, but for the moment Cypher alone won't get you what you want.
We can use APOC Procedures for this, as fine control of the direction of types in the expansion, and sequences of relationships, are supported in the path expander procs.
First, though, you'll need to figure out how to address your user-or-business match, either by adding a common label to these nodes by which you can MATCH either type by property, or you can use a subquery with two UNIONed queries, one for :Business nodes, the other for :User nodes, that way you can still take advantage of an index on either, and get possible results in a single variable.
Once you've got that, you can use apoc.path.expandConfig(), passing some options to get what you want:
// assume you've matched to your `start` node already
CALL apoc.path.expandConfig(start, {relationshipFilter:'can>|<allows', labelFilter:'>Business'}) YIELD path
RETURN path
This one doesn't use sequences, but it does restrict the direction of expansion per relationship type. We are also setting the labelFilter such that :Business nodes are the end node of the path and not nodes of any other label.
You can specify the path as follows:
MATCH path = (:User {userId: $id})-[:can]->(:Permission)
<-[:allows]-(:Business))
RETURN path
This should return the results you're after.
I see a good solution has been provided via path expanding APOC procedures.
But I'll focus on your point #2: "Should I model this entirely differently?"
Well, not entirely but I think yes.
The really liberating part of working with Neo4j is that you can change the road you are driving over as easily as you can change your driving strategy: model vs query. And since you are at an early stage in your project, you can experiment with different models. There's a good opportunity to make just a semantic change to make an 'end run' around the problem.
The semantics of a relationship in Neo4j are expressed through
the mandatory TYPE you assign to the relationship, combined with
the direction you choose to point the mandatory arrow
The trick you solved with APOC was how to traverse a path of relationships that alternate between pointing forward and backward along the query's path. But before reaching for a power tool, why not just reverse the direction of either of your relationship types. You can change the model for allows from
<-[:allows]-
to
-[:is_allowed_by]->
and that buys you a lot. Now the directions of both relationships are the same and you can combine both relationships into a single relationship in the match pattern. And the path traversal can be expressed like this, short & sweet:
(u:User)-[:can|is_allowed_by*]->(c:Company)
That will literally go to all lengths to find every user-to-company path, branching included.

How to query Neo4j N levels deep with variable length relationship and filters on each level

I'm new(ish) to Neo4j and I'm attempting to build a tool that allows users on a UI to essentially specify a path of nodes they would like to query neo4j for. For each node in the path they can specify specific properties of the node and generally they don't care about the relationship types/properties. The relationships need to be variable in length because the typical use case for them is they have a start node and they want to know if it reaches some end node without caring about (all of) the intermediate nodes between the start and end.
Some restrictions the user has when building the path from the UI is that it can't have cycles, it can't have nodes who has more than one child with children and nodes can't have more than one incoming edge. This is only enforced from their perspective, not in the query itself.
The issue I'm having is being able to specify filtering on each level of the path without getting strange behavior.
I've tried a lot of variations of my Cypher query such as breaking up the path into multiple MATCH statements, tinkering with the relationships and anything else I could think of.
Here is a Gist of a sample Cypher dump
cypher-dump
This query gives me the path that I'm trying to get however it doesn't specify name or type on n_four.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
RETURN path
This query is what I'd like to work however it leaves out the leafs at the third level which I am having trouble understanding.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
What I've noticed is that when I "... RETURN n_four" in my query it is including nodes that are at the third level as well.
This behavior is caused by your (probably inappropriate) use of [*0..] in your MATCH pattern.
FYI:
[*0..] matches 0 or more relationships. For instance, (a)-[*0..]->(b) would succeed even if a and b are the same node (and there is no relationship from that node back to itself).
The default lower bound is 1. So [*] is equivalent to [*..] and [*1..].
Your 2 queries use the same MATCH pattern, ending in ...->(n_three)-[*0..]->(n_four).
Your first query does not specify any WHERE tests for n_four, so the query is free to return paths in which n_three and n_four are the same node. This lack of specificity is why the query is able to return 2 extra nodes.
Your second query specifies WHERE tests for n_four that make it impossible for n_three and n_four to be the same node. The query is now more picky, and so those 2 extra nodes are no longer returned.
You should not use [*0..] unless you are sure you want to optionally match 0 relationships. It can also add unnecessary overhead. And, as you now know, it also makes the query a bit trickier to understand.

How does path search work on Cypher and what types of filtering can be done during expansion?

I'm trying to understand Neo4j's mechanics when dealing with path searches. I studied the query patterns and execution plan operators in the developer manual, but I still have some questions.
Please correct me if I'm wrong, but from the content I read, and from some posts on Neo4j's blog, I understood that Cypher and Java traversals generally perform depth-first searches, more specifically informed searches, and that variable-length queries fit into it. I also read that shortest path planning uses a breadth-first bidirectional search, and a depth-first search as a fallback.
Is there any way to perform breadth-first searches in Neo4j other than that?
I know the APOC procedures library allows this kind of search through path expanders, but I'm limiting my scope to just the Cypher language for now.
Also, does the variable-length pattern run recursively?
And what kinds of filtering are executed during expansion? I read that functions like ALL normally are checked during expansion, but some are executed later.
The reason for these questions is to see to what extent I would be able to manipulate the data and make complex traversals using only Cypher and what already comes with Neo4j, without external libraries and without having to write procedures through the API.
Forgive me if these questions are trivial. Thanks in advance.
Being a declarative query language Cypher doesnot controls how the underlying engine will search the pattern. So by just specifying a pattern we cannot specify which pattern matching algorithm shall be used for finding patterns. So we cannot perform breath first algorithm with just using Cypher.
For running a variable length pattern recursively we can use kleene star over single edge labels for example :KNOWS* or we can use the or operator in cypher for example :KNOWS|FOLLOWS* in cypher. Essentially this means (KNOWS|FOLLOWS)* in Regular Path Query (RPQ) format. Which is equivalent to (a|b)* in regular expression syntax. We can also have more edge labels such as (a|b|c|....)* in cypher.
We can also specify a varibale length over the pattern as ()-[:KNOWS|FOLLOWS*1..6]->() in cypher.
However we cannot do recursive iterations over a few pattern for example a sub-graph which has a node with one incoming edge and atleast two out going edges from the same node. for example a pattern such as this (a)-[r]->(b)-[s]->(c), (b)-[v]->(d). This is an example of a conjunctive pattern.
Filtering can be is normally performed in the WHERE clause based on the property key value pairs. We can set the labels in the pattern expressed in the MATCH clause so that the underlying engine knows the start node. For example as done in the following query
match (a:Person{name:'Keanu Reeves'})-[r]->(b) return * limit 5

Cypher: Ordering Nodes on Same Level by Property on Relationship

I am new to Neo4j and currently playing with this tree structure:
The numbers in the yellow boxes are a property named order on the relationship CHILD_OF.
My goal was
a) to manage the sorting order of nodes at the same level through this property rather than through directed relationships (like e.g. LEFT, RIGHT or IS_NEXT_SIBLING, etc.).
b) being able to use plain integers instead of complete paths for the order property (i.e. not maintaining sth. like 0001.0001.0002).
I can't however find the right hint on how or if it is possible to recursively query the graph so that it keeps returning the nodes depth-first but for the sorting at each level consider the order property on the relationship.
I expect that if it is possible it might include matching the complete path iterating over it with the collection utilities of Cypher, but I am not even close enough to post some good starting point.
Question
What I'd expect from answers to this question is not necessarily a solution, but a hint on whether this is a bad approach that would perform badly anyways. In terms of Cypher I am interested if there is a practical solution to this.
I have a general idea on how I would tackle it as a Neo4j server plugin with the Java traversal or core api (which doesn't mean that it would perform well, but that's another topic), so this question really targets the design and Cypher aspect.
This might work:
match path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, extract(r in rels(path) | r.order) as orders
ORDER BY orders
if it complains about sorting arrays then computing a number where each digit (or two digits) are your order and order by that number
match path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, reduce(a=1, r in rels(path) | a*10+r.order) as orders
ORDER BY orders

Cypher query optimisation - Utilising known properties of nodes

Setup:
Neo4j and Cypher version 2.2.0.
I'm querying Neo4j as an in-memory instance in Eclipse created TestGraphDatabaseFactory().newImpermanentDatabase();.
I'm using this approach as it seems faster than the embedded version and I assume it has the same functionality.
My graph database is randomly generated programmatically with varying numbers of nodes.
Background:
I generate cypher queries automatically. These queries are used to try and identify a single 'target' node. I can limit the possible matches of the queries by using known 'node' properties. I only use a 'name' property in this case. If there is a known name for a node, I can use it to find the node id and use this in the start clause. As well as known names, I also know (for some nodes) if there are names known not to belong to a node. I specify this in the where clause.
The sorts of queries that I am running look like this...
START
nvari = node(5)
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvari:C4)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION),
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION),
WHERE
NOT(nvarj.Name IN ['nf']) AND NOT(nvarm.Name IN ['nb','nj'])
RETURN DISTINCT target
Another way to think about this (if it helps), is that this is an isomorphism testing problem where we have some information about how nodes in a query and target graph correspond to each other based on restrictions on labels.
Question:
With regards to optimisation:
Would it help to include relation variables in the match clause? I took them out because the node variables are sufficient to distinguish between relationships but this might slow it down?
Should I restructure the match clause to have match/where couples including the where clauses from my previous example first? My expectation is that they can limit possible bindings early on. For example...
START
nvari = node(5)
MATCH
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarj.Name IN ['nf'])
MATCH
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarm.Name IN ['nb','nj'])
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION)
RETURN DISTINCT target
On the side:
(Less important but still an interest) If I make each relationship in a match clause an optional match except for relationships containing the target node, would cypher essentially be finding a maximum common sub-graph between the query and the graph data base with the constraint that the MCS contains the target node?
Thanks a lot in advance! I hope I have made my requirements clear but I appreciate that this is not a typical use-case for Neo4j.
I think querying with node properties is almost always preferable to using relationship properties (if you had a choice), as that opens up the possibility that indexing can help speed up the query.
As an aside, I would avoid using the IN operator if the collection of possible values only has a single element. For example, this snippet: NOT(nvarj.Name IN ['nf']), should be (nvarj.Name <> 'nf'). The current versions of Cypher might not use an index for the IN operator.
Restructuring a query to eliminate undesirable bindings earlier is exactly what you should be doing.
First of all, you would need to keep using MATCH for at least the first relationship in your query (which binds target), or else your result would contain a lot of null rows -- not very useful.
But, thinking clearly about this, if all the other relationships were placed in separate OPTIONAl MATCH clauses, you'd be essentially saying that you want a match even if none of the optional matches succeeded. Therefore, the logical equivalent would be:
MATCH (target:C5)-[:IN_LOCATION]->(nvara:LOCATION)
RETURN DISTINCT target
I don't think this is a useful result.

Resources