Neo4j stop searching an undirected path when given node is encountered - neo4j

I have the following test data in Neo4j:
merge (n1:device {name:"n1"})-[:phys {name:"phys"}]->(:interface {name:"n1a"})-[:cable {name:"cable"}]->(:interface {name:"n2a"})-[:phys {name:"phys"}]->(n2:device {name:"n2"})
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1b"})-[:cable {name:"cable"}]->(:interface {name:"n2b"})-[:phys {name:"phys"}]->(n2)
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1c"})-[:cable {name:"cable"}]->(:interface {name:"n2c"})-[:phys {name:"phys"}]->(n2)
merge (n1)-[:phys {name:"phys"}]->(:interface {name:"n1d"})-[:cable {name:"cable"}]->(:interface {name:"n2d"})
Giving:
While this example has exactly 3 relationships and 2 nodes on each of the 4 paths between each of n1 and n2, my real data could have many more, and also many more paths.
This is a undirected graph and in the real dataset, relationships on parts of each path are in either direction.
I know that every path starts at a :device and either just ends at a non :device or ends at a :device, and along the way there could be any number of relationships and other non :device nodes.
So I am looking to do:
match p=(:device {name:"n1"})-[*]-(:device) return (p)
and have it return the same, (I would be happy with double), number of records as:
match p=(:device {name:"n1"})-[*]->(:device) return (p)
So I am looking for a way to stop matching relationships and cease following the path when the first (:device) is encountered in the path.
From my limited understanding, I could easily achieve this by making every relationship bidirectional. However I have avoided that option to date as I have read it is bad practice.
Extra for experts :-)
Additionally, I would like a way to return any full paths that don't end at a :device (eg, the bottom one)
Thanks

This is a use case that is a little hard to do with just Cypher, as we don't have a way to specify "follow a variable-length path and stop when you reach another node of this type".
We can do something like this when we use LIMIT, but that becomes too restrictive when we don't know how many results there will be, or we need to do this for multiple starting nodes.
Because of this, there are some APOC path finder procedures that include more flexible options. One of these is a labelFilter option which lets you describe how to filter nodes with particular labels found during expansion (blacklisting, whitelisting, etc). One of these filters is called a termination filter (uses an / symbol before the appropriate label), which means to include the path to that node as a result, and stop expansion, which is exactly what you're looking for.
After you install APOC, you can use the apoc.path.expandConfig() procedure, starting from your start node, and supply the labelFilter config parameter to get this behavior:
MATCH (start:device {name:"n1"})
CALL apoc.path.expandConfig(start, {labelFilter:'/device'}) YIELD path
RETURN path

Related

Cypher return multiple hops through pattern of relationships and nodes

I'm making a proof of concept access control system with neo4j at work, and I need some help with Cypher.
The data model is as follows:
(:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
Now I want to get a path from a User or a Business to all the Business-nodes that you can reach trough the
-[:can]->(:Permission)<-[:allows]-
pattern. I have managed to write a MATCH that gets me halfway there:
MATCH
path =
(:User {userId: 'e96cca53-475c-4534-9fe1-06671909fa93'})-[:can|allows*]-(b:Business)
but this doesn't have any directions, and I can't figure out how to include the directions without reducing the returned matches to only the direct matches (i.e it doesn't continue after the first hit on a :Business node)
So what I'm wondering is:
Is there a way to match multiple of these hops in one query?
Should I model this entirely different?
Am I on the wrong path completely and the query should be completely
rewritten
Currently the syntax of variable-length expansions doesn't allow fine control for separate directions for certain types. There are improvements in the pipeline around this, but for the moment Cypher alone won't get you what you want.
We can use APOC Procedures for this, as fine control of the direction of types in the expansion, and sequences of relationships, are supported in the path expander procs.
First, though, you'll need to figure out how to address your user-or-business match, either by adding a common label to these nodes by which you can MATCH either type by property, or you can use a subquery with two UNIONed queries, one for :Business nodes, the other for :User nodes, that way you can still take advantage of an index on either, and get possible results in a single variable.
Once you've got that, you can use apoc.path.expandConfig(), passing some options to get what you want:
// assume you've matched to your `start` node already
CALL apoc.path.expandConfig(start, {relationshipFilter:'can>|<allows', labelFilter:'>Business'}) YIELD path
RETURN path
This one doesn't use sequences, but it does restrict the direction of expansion per relationship type. We are also setting the labelFilter such that :Business nodes are the end node of the path and not nodes of any other label.
You can specify the path as follows:
MATCH path = (:User {userId: $id})-[:can]->(:Permission)
<-[:allows]-(:Business))
RETURN path
This should return the results you're after.
I see a good solution has been provided via path expanding APOC procedures.
But I'll focus on your point #2: "Should I model this entirely differently?"
Well, not entirely but I think yes.
The really liberating part of working with Neo4j is that you can change the road you are driving over as easily as you can change your driving strategy: model vs query. And since you are at an early stage in your project, you can experiment with different models. There's a good opportunity to make just a semantic change to make an 'end run' around the problem.
The semantics of a relationship in Neo4j are expressed through
the mandatory TYPE you assign to the relationship, combined with
the direction you choose to point the mandatory arrow
The trick you solved with APOC was how to traverse a path of relationships that alternate between pointing forward and backward along the query's path. But before reaching for a power tool, why not just reverse the direction of either of your relationship types. You can change the model for allows from
<-[:allows]-
to
-[:is_allowed_by]->
and that buys you a lot. Now the directions of both relationships are the same and you can combine both relationships into a single relationship in the match pattern. And the path traversal can be expressed like this, short & sweet:
(u:User)-[:can|is_allowed_by*]->(c:Company)
That will literally go to all lengths to find every user-to-company path, branching included.

How to query Neo4j N levels deep with variable length relationship and filters on each level

I'm new(ish) to Neo4j and I'm attempting to build a tool that allows users on a UI to essentially specify a path of nodes they would like to query neo4j for. For each node in the path they can specify specific properties of the node and generally they don't care about the relationship types/properties. The relationships need to be variable in length because the typical use case for them is they have a start node and they want to know if it reaches some end node without caring about (all of) the intermediate nodes between the start and end.
Some restrictions the user has when building the path from the UI is that it can't have cycles, it can't have nodes who has more than one child with children and nodes can't have more than one incoming edge. This is only enforced from their perspective, not in the query itself.
The issue I'm having is being able to specify filtering on each level of the path without getting strange behavior.
I've tried a lot of variations of my Cypher query such as breaking up the path into multiple MATCH statements, tinkering with the relationships and anything else I could think of.
Here is a Gist of a sample Cypher dump
cypher-dump
This query gives me the path that I'm trying to get however it doesn't specify name or type on n_four.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
RETURN path
This query is what I'd like to work however it leaves out the leafs at the third level which I am having trouble understanding.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
What I've noticed is that when I "... RETURN n_four" in my query it is including nodes that are at the third level as well.
This behavior is caused by your (probably inappropriate) use of [*0..] in your MATCH pattern.
FYI:
[*0..] matches 0 or more relationships. For instance, (a)-[*0..]->(b) would succeed even if a and b are the same node (and there is no relationship from that node back to itself).
The default lower bound is 1. So [*] is equivalent to [*..] and [*1..].
Your 2 queries use the same MATCH pattern, ending in ...->(n_three)-[*0..]->(n_four).
Your first query does not specify any WHERE tests for n_four, so the query is free to return paths in which n_three and n_four are the same node. This lack of specificity is why the query is able to return 2 extra nodes.
Your second query specifies WHERE tests for n_four that make it impossible for n_three and n_four to be the same node. The query is now more picky, and so those 2 extra nodes are no longer returned.
You should not use [*0..] unless you are sure you want to optionally match 0 relationships. It can also add unnecessary overhead. And, as you now know, it also makes the query a bit trickier to understand.

follow all relationships but specific ones

How can I tell cypher to NOT follow a certain relationship/edge?
E.g. I have a :NODE that is connected to another :NODE via a :BUDDY relationship. Additionally every :NODE is related to :STUFF in arbitrary depth by arbitrary edges which are NOT of type :BUDDY. I now want to add a shortcut relation from each :NODE to its :STUFF. However, I do not include :STUFF of its :BUDDIES.
(:NODE)-[:BUDDY]->(:NODE)
(:NODE)-[*]->(:STUFF)
My current query looks like this:
MATCH (n:Node)-[*]->(s:STUFF) WHERE NOT (n)-[:BUDDY]->()-[*]->(s) CREATE (n)-[:HAS]->(s)
However I have some issues with this query:
1) If I ever add a :BUDDY relationship not directly between :NODE but children of :NODE the query will use that relationship for matching. This might not be intended as I do not want to include buddies at all.
2) Explain tells me that neo4j does the match (:NODE)-[*]->(:STUFF) and then AntiSemiApply the pattern (n)-[:BUDDY]->(). As a result it matches the whole graph to then unmatch most of the found connections. This seems ineffective and the query runs slower than I like (However subjective this might sound).
One (bad) fix is to restrict the depth of (:NODE)-[*]->(:STUFF) via (:NODE)-[*..XX]->(:STUFF). However, I cannot guarantee that depth unless I use a ridiculous high number for worst case scenarios.
I'd actually just like to tell neo4j to just not follow a certain relationship. E.g. MATCH (n:NODE)-[ALLBUT(:BUDDY)*]->(s:STUFF) CREATE (n)-[:HAS]->(s). How can I achieve this without having to enumerate all allowed connections and connect them with a | (which is really fast - but I have to manually keep track of all possible relations)?
One option for this particular schema is to explicitly traverse past the point where the BUDDY relationship is a concern, and then do all the unbounded traversing you like from there. Then you only have to apply the filter to single-step relationships:
MATCH (n:Node) -[r]-> (leaf)
WHERE NOT type(r) = 'BUDDY'
WITH n, leaf
MATCH (leaf) -[*] -> (s:Stuff)
WITH n, COLLECT(DISTINCT leaf) AS leaves, COLLECT(DISTINCT s) AS stuff
RETURN n, [leaf IN leaves WHERE leaf:Stuff] + stuff AS stuffs
The other option is to install apoc and take a look at the path expander procedure, which allows you to 'blacklist' node labels (such as :Node) from your path query, which may work just as well depending on your graph. See here.
My final solution is generating a string from the set of
{relations}\{relations_id_do_not_want}
and use that for matching. As I am using an API and thus can do this generation automatically it is not as much as an inconvenience as I feared, but still an inconvenience. However, this was the only solution I was able to find since posting this question.
You could use a condition o nrelationship type:
MATCH (:Node)-[r:REL]->(:OtherNode)
WHERE NOT type(r) = 'your_unwanted_rel_type'
I have no clues about perf though

How do I match from multiple starting nodes to the closest n nodes of a particular type per row?

I'm looking for a way to do breadth-first-search or shortest path from some start node(s) to a kind of node (whether by label or by property), and to stop when I find the first match (or n matches, if I could have that as a parameter).
I'd like to know if a solution exists with Cypher itself, and if not, if there are existing procedures (from APOC or some other source) to do this, and if not, how I might implement this (using the traversal framework maybe?)
The shortestPath() algorithms, both from neo4j itself and the APOC libraries, work best when you know your start and end nodes, or if you want to match based upon all possible start or end nodes.
But when we don't know our end nodes specifically, and we want to simply find the first node matching some predicates or labels (or nodes, if we allow the number we want to find to be parameterized), those procedures don't seem to work well.
For example, let's say I have a social graph of :Persons with [:Knows] relationships between them, and a [:LivesIn] relationship between :Persons and :City. Lastly, :Persons are additionally labeled with their occupation (so a :Person who is a doctor is also labeled with :Doctor)
With this example graph, for a given :City, for all :Persons in that city, I want to find the shortest path of each :Person following [:Knows] relationships to a :Doctor. As output I want to see each :Person, their closest :Doctor, the number of [:Knows] hops to that :Doctor, ordered by the number of hops descending.
My query, if I was using the neo4j shortest path, could look something like this:
MATCH (c:City)<-[:LivesIn]-(p:Person)
WHERE c.name = "San Diego"
WITH p
MATCH path = shortestPath( (p)-[:Knows*]-(d:Doctor) )
...
At this point we have rows pairing every person to every single doctor in the graph, and the shortest paths between all of them. As far as what to do next, I might collect all paths so we are back to one row per :Person, then order all collections by path length ascending, then for each user take the head of the path collection, then output the :Person, their closest :Doctor, and the number of hops between them.
That's not efficient at all. I want the shortest path to the closest :Doctor, and to stop searching once that first :Doctor node is found.
If an easy solution exists, I'd additionally like to know if supports other options (finding based on predicates and properties, not just labels), and if the number of matches I want it to find can be parameterized (if I want to find the closest 2 doctors, for example).
Why not use LIMIT 1 in the return statement?
APOC has matured quite a bit, and now has answers to these kinds of requirements.
Using apoc.path.expandConfig() (and other path expander procedures) it's now possible to expand out to nodes with certain labels, and stop further expansion once a given limit is reached (such as requesting the n closest doctors per person).
Using apoc.cypher.run(), it's now possible to perform a Cypher query with a LIMIT, and that limit will apply to the results per-row instead of limiting all rows. This is good for more complicated cases where node labels and relationship types alone aren't enough to describe the traversal or the nodes needed, such as when property evaluation is needed.
Both approaches are demonstrated in this Neo4j knowledge base entry.

Cypher query optimisation - Utilising known properties of nodes

Setup:
Neo4j and Cypher version 2.2.0.
I'm querying Neo4j as an in-memory instance in Eclipse created TestGraphDatabaseFactory().newImpermanentDatabase();.
I'm using this approach as it seems faster than the embedded version and I assume it has the same functionality.
My graph database is randomly generated programmatically with varying numbers of nodes.
Background:
I generate cypher queries automatically. These queries are used to try and identify a single 'target' node. I can limit the possible matches of the queries by using known 'node' properties. I only use a 'name' property in this case. If there is a known name for a node, I can use it to find the node id and use this in the start clause. As well as known names, I also know (for some nodes) if there are names known not to belong to a node. I specify this in the where clause.
The sorts of queries that I am running look like this...
START
nvari = node(5)
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvari:C4)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION),
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION),
WHERE
NOT(nvarj.Name IN ['nf']) AND NOT(nvarm.Name IN ['nb','nj'])
RETURN DISTINCT target
Another way to think about this (if it helps), is that this is an isomorphism testing problem where we have some information about how nodes in a query and target graph correspond to each other based on restrictions on labels.
Question:
With regards to optimisation:
Would it help to include relation variables in the match clause? I took them out because the node variables are sufficient to distinguish between relationships but this might slow it down?
Should I restructure the match clause to have match/where couples including the where clauses from my previous example first? My expectation is that they can limit possible bindings early on. For example...
START
nvari = node(5)
MATCH
(nvarj:C2)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarj.Name IN ['nf'])
MATCH
(nvarm:C3)-[:IN_LOCATION]->(nvarg:LOCATION)
WHERE NOT(nvarm.Name IN ['nb','nj'])
MATCH
(target:C5)-[:IN_LOCATION]->(nvara:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarb:LOCATION),
(nvara:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvarc:LOCATION),
(nvard:LOCATION)-[:CONNECTED]->(nvare:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarf:LOCATION),
(nvarg:LOCATION)-[:CONNECTED]->(nvarh:LOCATION),
(nvare:LOCATION)-[:CONNECTED]->(nvark:LOCATION)
RETURN DISTINCT target
On the side:
(Less important but still an interest) If I make each relationship in a match clause an optional match except for relationships containing the target node, would cypher essentially be finding a maximum common sub-graph between the query and the graph data base with the constraint that the MCS contains the target node?
Thanks a lot in advance! I hope I have made my requirements clear but I appreciate that this is not a typical use-case for Neo4j.
I think querying with node properties is almost always preferable to using relationship properties (if you had a choice), as that opens up the possibility that indexing can help speed up the query.
As an aside, I would avoid using the IN operator if the collection of possible values only has a single element. For example, this snippet: NOT(nvarj.Name IN ['nf']), should be (nvarj.Name <> 'nf'). The current versions of Cypher might not use an index for the IN operator.
Restructuring a query to eliminate undesirable bindings earlier is exactly what you should be doing.
First of all, you would need to keep using MATCH for at least the first relationship in your query (which binds target), or else your result would contain a lot of null rows -- not very useful.
But, thinking clearly about this, if all the other relationships were placed in separate OPTIONAl MATCH clauses, you'd be essentially saying that you want a match even if none of the optional matches succeeded. Therefore, the logical equivalent would be:
MATCH (target:C5)-[:IN_LOCATION]->(nvara:LOCATION)
RETURN DISTINCT target
I don't think this is a useful result.

Resources