Store a user's traversal path through a nested hierarchy - neo4j

I have a nested tree stored in Neo4j, where each node can have a (n)-[:CHILD]->(c) relationship with other nodes. This lets you query the entire tree from a given node down with MATCH (n)-[:CHILD*]->(m).
What I am having trouble with is figuring out how to store the path that a user takes as they walk through the tree. For instance, a query to return the path would be (user)-[:USER_PATH*]->(node).
However, the path has to stay along the :CHILD relationships; it cannot jump outside of its branch. A user path cannot jump from the leaf of one branch to a leaf of another branch without first retracing its way back up the path it came from until it finds a fork that leads down to the new leaf.
Also, I do not think shortest path will work, because it's not the shortest path I want; I want the user's actual path. But it should disregard relationships that were abandoned as the user backed out of any branches. It should not leave dead paths around.
How would I update the graph after each node is walked to, so that these rules stay intact?
It can be assumed that in drilling down into a new branch, it can only step through to one more set of siblings. However all branches it came through are still open, so their siblings can be selected.
Best I can figure is that it needs to:
"prefer" walking the :USER_PATH relationships as long as it can
until it needs to break that path to get to the new node
at which point it creates any new relationships
then delete any old relationships that are no longer on that path
I have no idea how to accomplish that though.
I have spent a ton of time in trial and error and googling to no avail.
Thanks in advance!
Given the image below:
red node = User
green nodes = A valid node to be a new "target"
blue nodes = invalid target node
So if you were to back out of the leaf node it is in currently, it would delete that final :RATIONAL_PATH relation in the chain.
Also, the path should adjust to whichever green node is selected, while keeping the existing :RATIONAL_PATH intact for as far as possible.

Personally I think removing the existing path and creating a new one with shortestPath() is probably the best way to go. The cost of reusing the existing path and performing cleanup is often going to be higher and more complex than simply starting over.
Otherwise, the approach to take would be to match down to the last node of your path, and then perform a shortestPath() to the new node, and create your path.
And then we'd have to perform cleanup. This would probably involve matching all paths along :RATIONAL_PATH relationships to the end node, resulting in a group of paths. The one with the shortest distance is the one we keep. We'd need to collect its relationships, collect the relationships of the other paths that are no longer valid, do some set subtraction to get the relationships not used in the shortest path, and delete them.
That's quite a bit of work that should probably be avoided.
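To make the trade-off concrete, here is a small pure-Python model of the tree (not Neo4j code, and not the answerer's implementation). It assumes the user's path always starts at the tree root and that every non-root node has exactly one parent; the names `parent`, `path_from_root`, `path_edges` and `move_user` are hypothetical. Because a tree has only one root-to-node path, rebuilding the path from scratch ends up with the same hops as carefully patching the old one, and the difference between old and new is a plain set subtraction.

```python
def path_from_root(parent, target):
    """Return the node list from the tree root down to target.

    parent maps each non-root node to its parent (the reverse of :CHILD).
    """
    path = [target]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    path.reverse()
    return path


def path_edges(path):
    """Turn a node list into the set of (from, to) hops along it."""
    return set(zip(path, path[1:]))


def move_user(parent, old_edges, target):
    """Return (new_edges, to_delete, to_create) for a move to target."""
    new_edges = path_edges(path_from_root(parent, target))
    return new_edges, old_edges - new_edges, new_edges - old_edges


# Hypothetical tree, written as child -> parent.
parent = {"a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}

# The user currently sits on leaf "a1" and selects its sibling "a2".
old_edges = path_edges(path_from_root(parent, "a1"))
new_edges, to_delete, to_create = move_user(parent, old_edges, "a2")
print(sorted(to_delete))  # hops abandoned by the move
print(sorted(to_create))  # hops that did not exist before
```

The hop from "root" to "a" appears in neither set, which is the "keep the existing path for as far as possible" behaviour falling out for free.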

Related

Cypher return multiple hops through pattern of relationships and nodes

I'm making a proof of concept access control system with neo4j at work, and I need some help with Cypher.
The data model is as follows:
(:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
Now I want to get a path from a User or a Business to all the Business nodes that you can reach through the
-[:can]->(:Permission)<-[:allows]-
pattern. I have managed to write a MATCH that gets me halfway there:
MATCH
path =
(:User {userId: 'e96cca53-475c-4534-9fe1-06671909fa93'})-[:can|allows*]-(b:Business)
but this doesn't have any directions, and I can't figure out how to include the directions without reducing the returned matches to only the direct matches (i.e. it doesn't continue after the first hit on a :Business node).
So what I'm wondering is:
Is there a way to match multiple of these hops in one query?
Should I model this entirely different?
Am I on the wrong path completely and the query should be completely rewritten?
Currently the syntax of variable-length expansions doesn't allow fine control for separate directions for certain types. There are improvements in the pipeline around this, but for the moment Cypher alone won't get you what you want.
We can use APOC Procedures for this, as fine control of the direction of types in the expansion, and sequences of relationships, are supported in the path expander procs.
First, though, you'll need to figure out how to address your user-or-business match, either by adding a common label to these nodes by which you can MATCH either type by property, or you can use a subquery with two UNIONed queries, one for :Business nodes, the other for :User nodes, that way you can still take advantage of an index on either, and get possible results in a single variable.
Once you've got that, you can use apoc.path.expandConfig(), passing some options to get what you want:
// assume you've matched to your `start` node already
CALL apoc.path.expandConfig(start, {relationshipFilter:'can>|<allows', labelFilter:'>Business'}) YIELD path
RETURN path
This one doesn't use sequences, but it does restrict the direction of expansion per relationship type. We are also setting the labelFilter such that :Business nodes are the end node of the path and not nodes of any other label.
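As a rough illustration of what a per-type direction filter does, here is a pure-Python model. It is a sketch of the idea only, not APOC's implementation; the sample data and the names `Rel`, `expand` and `labels` are hypothetical. It follows :can relationships forwards and :allows relationships backwards, uses each relationship at most once per path, and reports every path that ends on a Business node while still expanding past it.

```python
from collections import namedtuple

Rel = namedtuple("Rel", "start type end")


def expand(start, rels, labels, end_label="Business"):
    """Follow :can forwards and :allows backwards from start.

    Returns every node path (as a list) that ends on a node labelled
    end_label. A relationship is used at most once per path.
    """
    results = []
    stack = [([start], frozenset())]
    while stack:
        nodes, used = stack.pop()
        here = nodes[-1]
        if len(nodes) > 1 and labels[here] == end_label:
            results.append(nodes)  # report, but keep expanding past it
        for i, rel in enumerate(rels):
            if i in used:
                continue
            if rel.type == "can" and rel.start == here:
                nxt = rel.end      # outgoing :can
            elif rel.type == "allows" and rel.end == here:
                nxt = rel.start    # incoming :allows
            else:
                continue
            stack.append((nodes + [nxt], used | {i}))
    return results


# Hypothetical sample data.
labels = {"u1": "User", "p1": "Permission", "p2": "Permission",
          "b1": "Business", "b2": "Business", "b3": "Business"}
rels = [
    Rel("u1", "can", "p1"),
    Rel("b1", "allows", "p1"),
    Rel("b1", "can", "p2"),
    Rel("b2", "allows", "p2"),
    Rel("b3", "can", "p1"),  # wrong direction from p1, so never followed
]

for path in expand("u1", rels, labels):
    print(" -> ".join(path))
```

In this data "b3" also holds a :can on "p1", but reaching it from "p1" would mean walking :can backwards, so it never shows up in a result.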
You can specify a single hop of the pattern as follows:
MATCH path = (:User {userId: $id})-[:can]->(:Permission)<-[:allows]-(:Business)
RETURN path
This returns only the directly reachable :Business nodes; it does not continue past the first :Business node.
I see a good solution has been provided via path expanding APOC procedures.
But I'll focus on your point #2: "Should I model this entirely differently?"
Well, not entirely but I think yes.
The really liberating part of working with Neo4j is that you can change the road you are driving over as easily as you can change your driving strategy: model vs query. And since you are at an early stage in your project, you can experiment with different models. There's a good opportunity to make just a semantic change to make an 'end run' around the problem.
The semantics of a relationship in Neo4j are expressed through
the mandatory TYPE you assign to the relationship, combined with
the direction you choose to point the mandatory arrow
The trick you solved with APOC was how to traverse a path of relationships that alternate between pointing forward and backward along the query's path. But before reaching for a power tool, why not just reverse the direction of one of your relationship types? You can change the model for allows from
<-[:allows]-
to
-[:is_allowed_by]->
and that buys you a lot. Now the directions of both relationships are the same and you can combine both relationships into a single relationship in the match pattern. And the path traversal can be expressed like this, short & sweet:
(u:User)-[:can|is_allowed_by*]->(b:Business)
That will literally go to all lengths to find every user-to-business path, branching included.

How to query Neo4j N levels deep with variable length relationship and filters on each level

I'm new(ish) to Neo4j and I'm attempting to build a tool that allows users on a UI to essentially specify a path of nodes they would like to query neo4j for. For each node in the path they can specify specific properties of the node and generally they don't care about the relationship types/properties. The relationships need to be variable in length because the typical use case for them is they have a start node and they want to know if it reaches some end node without caring about (all of) the intermediate nodes between the start and end.
Some restrictions the user has when building the path from the UI are that it can't have cycles, it can't have nodes that have more than one child with children, and nodes can't have more than one incoming edge. These are only enforced from their perspective, not in the query itself.
The issue I'm having is being able to specify filtering on each level of the path without getting strange behavior.
I've tried a lot of variations of my Cypher query such as breaking up the path into multiple MATCH statements, tinkering with the relationships and anything else I could think of.
Here is a Gist of a sample Cypher dump
cypher-dump
This query gives me the path that I'm trying to get however it doesn't specify name or type on n_four.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
RETURN path
This query is the one I'd like to work; however, it leaves out the leaves at the third level, which I am having trouble understanding.
MATCH path = (n_one)-[*0..]->(n_two)-[*0..]->(n_three)-[*0..]->(n_four)
WHERE n_one.type IN ["JCL_JOB"]
AND n_two.type IN ["JCL_PROC"]
AND n_three.name IN ["INPA", "OUTA", "PRGA"]
AND n_three.type IN ["RESOURCE_FILE", "COBOL_PROGRAM"]
AND n_four.name IN ["TAB1", "TAB2", "COPYA"]
AND n_four.type IN ["RESOURCE_TABLE", "COBOL_COPYBOOK"]
RETURN path
What I've noticed is that when I "... RETURN n_four" in my query it is including nodes that are at the third level as well.
This behavior is caused by your (probably inappropriate) use of [*0..] in your MATCH pattern.
FYI:
[*0..] matches 0 or more relationships. For instance, (a)-[*0..]->(b) would succeed even if a and b are the same node (and there is no relationship from that node back to itself).
The default lower bound is 1. So [*] is equivalent to [*..] and [*1..].
Your 2 queries use the same MATCH pattern, ending in ...->(n_three)-[*0..]->(n_four).
Your first query does not specify any WHERE tests for n_four, so the query is free to return paths in which n_three and n_four are the same node. This lack of specificity is why the query is able to return 2 extra nodes.
Your second query specifies WHERE tests for n_four that make it impossible for n_three and n_four to be the same node. The query is now more picky, and so those 2 extra nodes are no longer returned.
You should not use [*0..] unless you are sure you want to optionally match 0 relationships. It can also add unnecessary overhead. And, as you now know, it also makes the query a bit trickier to understand.
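The effect of the lower bound can be seen in a small pure-Python model of a variable-length match. This is only a sketch of the semantics, with a hypothetical `reachable` helper and a two-node sample graph; it is not how Neo4j evaluates the pattern.

```python
def reachable(adj, start, min_hops):
    """End nodes of directed paths from start that use at least
    min_hops relationships, each relationship at most once per path."""
    found = set()

    def walk(node, hops, used):
        if hops >= min_hops:
            found.add(node)
        for nxt in adj.get(node, []):
            edge = (node, nxt)
            if edge not in used:
                walk(nxt, hops + 1, used | {edge})

    walk(start, 0, frozenset())
    return found


# Hypothetical graph: a single relationship n_three -> n_four.
adj = {"n_three": ["n_four"]}

print(reachable(adj, "n_three", 0))  # like [*0..]: start node included
print(reachable(adj, "n_three", 1))  # like [*] or [*1..]: start excluded
```

With a lower bound of 0 the start node itself counts as a match for the end of the pattern, which is why n_three nodes showed up as n_four in the first query.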

Neo4j labels and properties, and their differences

Say we have a Neo4j database with several 50,000-node subgraphs. Each subgraph has a root. I want to find all nodes in one subgraph.
One way would be to recursively walk the tree. It works but can be thousands of trips to the database.
One way is to add a subgraph identifier to each node:
MATCH (n {subgraph_id: $my_graph_id}) RETURN n
Another way would be to relate each node in a subgraph to the subgraph's root:
MATCH (n)-[]->(root:ROOT {id: $my_graph_id}) RETURN n
This feels more "graphy" if that matters. Seems expensive.
Or, I could add a label to each node. If $my_graph_id was "BOBS_QA_COPY" then
MATCH(n:BOBS_QA_COPY) return n
would scoop up all the nodes in the subgraph.
My question is when is it appropriate to use a garden-variety property, add relationships, or set a label?
Setting a label to identify a particular subgraph makes me feel weird, like I am abusing the tool. I expect labels to say what something is, not which instance of something it is.
For example, if we were graphing car information, I could see having parts labeled "FORD EXPLORER". But I am less sure that it would make sense to have parts labeled "TONYS FORD EXPLORER". Now, I could see (USER id:"Tony") having a relationship to a FORD EXPLORER graph...
I may be having a bout of "SQL brain"...
Let's work this through, step by step.
If there are N non-root nodes, adding an extra N ROOT relationships makes the least sense. It is very expensive in storage, it will pollute the data model with relationships that don't need to be there and that can unnecessarily complicate queries that want to traverse paths, and it is not the fastest way to find all the nodes in a subgraph.
Adding a subgraph ID property to every node is also expensive in storage (but less so), and would require either: (a) scanning every node to find all the nodes with a specific ID (slow), or (b) using an index, say, :Node(subgraph_id) (faster). Approach (b), which is preferable, would also require that all the nodes have the same Node label.
But wait, if approach (b) already requires all nodes to be labelled, why don't we just use a different label for each subgraph? By doing that, we don't need the subgraph_id property at all, and we don't need an index either! And finding all the nodes with the same label is fast.
Thus, using a per-subgraph label would be the best option.

neo4j concurrency issue - deleting and matching

I have one program that builds graphs. I have a reaper which deletes old graphs.
Sometimes, the set of nodes returned by the queries used when building the graphs overlaps with the set of nodes being reaped. This is giving me a spurious "EntityNotFound: Node with id xxxxxx" error.
I say it is spurious because the reality is that we're not deleting the nodes we're adding - they are on separate graphs.
However, the loader's MATCH has two parts:
MATCH(n: MYNODE {indexed-var:"ddd", version:"xxx"} ...
It is true that some n indexed by "ddd" can be in the graph being deleted, but the specific version of node n I am adding will always have a 'safe' version number. However, EXPLAIN clearly shows I am sucking in multiple MYNODE nodes, and then filtering to the specific node. I am guessing that the delete program is deleting a MYNODE node after the loader has fetched it, but before it is filtered.
The loader and deleter are both running with transactions, so it isn't an immediate thing - the failure happens on commit.
Can I use _LOCK_ to prevent the read and the delete from acting on the same nodes at the same time? Other ideas?
One solution is to:
Add to MYNODE another property (let's call it id) whose value is a string that concatenates index and version, separated by a delimiter character (let's say it is "|").
Create an index on :MYNODE(id).
Change your MATCH clause to:
MATCH(n: MYNODE {id:"ddd|xxx"} ...
The use of an index allows Cypher to immediately get the desired node(s), avoiding the need to iterate through all the MYNODE nodes and filtering out the undesired ones (some of which may no longer exist). This approach has the added benefit of being much faster.
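The difference between the two lookups can be modelled in a few lines of plain Python. This is only an analogy for the index behaviour, with hypothetical names (`composite_id`, `by_id`) and made-up records. It also makes one assumption explicit: the delimiter must never occur inside either part, otherwise two different pairs could produce the same id.

```python
def composite_id(index_value, version, sep="|"):
    """Join the two lookup values into one key.

    The separator must not occur in either part, or distinct pairs
    could collide on the same id.
    """
    if sep in index_value or sep in version:
        raise ValueError("separator occurs in a key part")
    return f"{index_value}{sep}{version}"


# Hypothetical records standing in for MYNODE nodes.
nodes = [
    {"index": "ddd", "version": "old"},
    {"index": "ddd", "version": "xxx"},
]

# Single-key lookup table, playing the role of the index on :MYNODE(id).
by_id = {composite_id(n["index"], n["version"]): n for n in nodes}

# Two-step lookup: fetch every "ddd" candidate, then filter by version.
# The "old" record is touched even though it is not wanted.
candidates = [n for n in nodes if n["index"] == "ddd"]
filtered = [n for n in candidates if n["version"] == "xxx"]

# One-step lookup: only the wanted record is ever touched.
direct = by_id[composite_id("ddd", "xxx")]
print(direct)
```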

neo4j - how to snapshot everything in a label

I'm using neo4j to store information about maps and sensors. Every time the map or sensor layout changes I need to keep a copy. I can imagine querying and manually creating said copy, but I'm wondering if it's possible to build a neo4j query that would do this for me.
So far all I've come up with is a way to replicate the nodes in a given label:
match ( a:some_label { some_params }) with a create ( b:some_label ) set b=a,b.other_id=value;
This would allow me to put version and timestamp info on a given snapshot.
What it doesn't do is copy the edge information. Suggestions? Maybe a second (similar) query?
If I understand you correctly, you are essentially trying to maintain a history of the state of a node and the state of its incoming relationship. One way to do this is to chain the nodes in reverse chronological order.
For example, suppose the nodes in the chain are labeled Some_label and the relationships are of type SOME_TYPE. The head node of the chain is always the current (most recent) node. Unless a Some_label node is chronologically the earliest node in the chain, it will have a SOME_TYPE relationship to the previous version of the node.
Here is how you'd insert a new relationship and node (with some properties) at the head of the chain. (Just to set up this example, I assume that the first node in the chain is linked to by some node labeled HeadRef).
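// Find the current head of the chain.
MATCH (x:HeadRef)-[r1:SOME_TYPE]->(a1:Some_label)
// Create the new head and link it to the previous head.
CREATE (x)-[r2:SOME_TYPE {x: "ghi"}]->(a2:Some_label {a:123, b: true})-[r:SOME_TYPE]->(a1)
// Copy the old head relationship's properties onto the new link, then remove the old relationship.
SET r=r1
WITH r1
DELETE r1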
Note that this approach is also much more performant than maintaining your own other_id property to link nodes together. You should always use relationships instead -- that is the graph DB way.
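The shape of the chain may be easier to see in a tiny in-memory model. This is a sketch under assumptions: `HeadRef`, `Version`, `push_version` and `history` are hypothetical names, and the `previous` attribute plays the role of the SOME_TYPE relationship between consecutive versions (relationship properties are left out).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    props: dict
    previous: Optional["Version"] = None  # link to the older version


@dataclass
class HeadRef:
    current: Optional[Version] = None  # always the most recent version


def push_version(head, props):
    """Insert a new version at the head; the old head becomes its previous."""
    head.current = Version(dict(props), head.current)
    return head.current


def history(head):
    """Property dicts from newest to oldest, following the chain."""
    out = []
    node = head.current
    while node is not None:
        out.append(node.props)
        node = node.previous
    return out


head = HeadRef()
push_version(head, {"a": 1, "b": False})
push_version(head, {"a": 123, "b": True})
print(history(head))
```

Inserting a version only touches the head reference and the new node, so the cost does not grow with the length of the history.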

Resources