Neo4j Traversal Framework: Expander and Ordering

I am trying to understand the Neo4j Java traversal API, but after a thorough reading I am stuck on certain points.
What I seem to know:
The difference between PathExpander and BranchOrderingPolicy. As I understand it, the former tells which relationships are eligible to be explored from a particular position, and the latter specifies the order in which they should be evaluated.
I would like to know the following things:
1. Whether, or to what extent, this understanding is correct, or how it can be corrected.
2. If it is correct, how is PathExpander different from Evaluator?
3. How do PathExpander and BranchOrderingPolicy work? What I intend to ask is: is PathExpander consulted every time a relationship is added to the traversal, and what happens to the iterable of relationships it returns? Similarly for branch ordering.
4. During traversal, how and when do the components Expander, BranchOrdering, Evaluator and Uniqueness come into the picture? Basically I wish to know the template algorithm, where one would say something like: first the expander is asked for a collection of relationships to expand, then the ordering policy is consulted to select one of the eligible ones...
5. If correct, does the ordering policy specified by BranchOrderingPolicy apply only to the eligible relationships (after the expander is done)? Perhaps it must.
6. Please include anything else that might be helpful in understanding the API.

I'll try to describe these parts to the best of my ability.
1. As to the difference between PathExpander and BranchOrderingPolicy: a PathExpander is invoked for each traversal branch the first time the traversal continues from that branch. (A traversal branch is a node together with the path leading up to that node. Note that there may be many paths, i.e. many branches, to the same node, mostly depending on uniqueness.) The result of invoking the PathExpander is an Iterator<Relationship> which lazily provides new relationships off of that traversal branch when needed. That brings us to BranchOrderingPolicy, which looks at all alive traversal branches. By "alive" I mean a branch that still has one or more relationships left to follow, so that more branches can be created from it. Given all alive branches, it picks one of them and follows its next relationship, retrieved from the relationship iterator on that branch (if this is the first call for that branch, the iterator is initialized using the PathExpander, as described above).
2. Difference between PathExpander and Evaluator: that split is very much a matter of convenience and separation of concerns. A PathExpander grows the number of branches and an Evaluator filters, i.e. reduces, the number of branches. An expander creates new branches that are then evaluated by the Evaluator. That said, you can write a PathExpander that does both, and it could be more efficient doing so. But keeping them separate, which also allows multiple Evaluators, is quite useful.
3. See above (1).
4. Some of this is described in (1), but the broader picture is that the BranchOrderingPolicy is the driver of the traversal: out of all alive branches it picks one and follows it one relationship out to a new branch. Only branches that comply with the selected uniqueness will be created. The relationships for a branch are retrieved the first time this happens for that branch, in the form of a lazy relationship iterator obtained from the PathExpander. New branches get evaluated the first time they are selected; one result of the evaluation is whether or not the branch is a dead end, and the other is whether or not to include it in the result returned to the user.
5. I think the above explains that.
Is this sufficient information?
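To make the order of events in (1) and (4) concrete, here is a minimal, self-contained Java model of the loop as described above. It is a toy, not Neo4j's implementation: ToyTraversal, Branch, allOutgoing and traverse are hypothetical names, nodes are plain strings, a depthFirst flag stands in for the BranchOrderingPolicy, and uniqueness is hard-wired to "create a branch to each node at most once". The Evaluation constants mirror the four outcomes of Neo4j's Evaluation enum; the rule that every evaluator must agree before a branch is included or continued is this sketch's own choice.

```java
import java.util.*;
import java.util.function.Function;

// Toy model of the traversal loop described above. NOT the Neo4j API; all names are hypothetical.
final class ToyTraversal {
    enum Evaluation {
        INCLUDE_AND_CONTINUE(true, true), INCLUDE_AND_PRUNE(true, false),
        EXCLUDE_AND_CONTINUE(false, true), EXCLUDE_AND_PRUNE(false, false);
        final boolean includes, continues;
        Evaluation(boolean includes, boolean continues) {
            this.includes = includes;
            this.continues = continues;
        }
        static Evaluation of(boolean includes, boolean continues) {
            if (includes) return continues ? INCLUDE_AND_CONTINUE : INCLUDE_AND_PRUNE;
            return continues ? EXCLUDE_AND_CONTINUE : EXCLUDE_AND_PRUNE;
        }
    }

    /** A branch: the end node plus the path leading up to it. */
    static final class Branch {
        final List<String> path;
        Iterator<String> rels;      // created lazily by the expander, once per branch
        boolean evaluated = false;
        Branch(List<String> path) { this.path = path; }
        String end() { return path.get(path.size() - 1); }
    }

    /** Hypothetical expander: offers every outgoing edge of the branch's end node. */
    static Function<Branch, Iterator<String>> allOutgoing(Map<String, List<String>> graph) {
        return b -> graph.getOrDefault(b.end(), List.of()).iterator();
    }

    static List<List<String>> traverse(String start,
                                       Function<Branch, Iterator<String>> expander,
                                       boolean depthFirst,
                                       List<Function<List<String>, Evaluation>> evaluators) {
        Deque<Branch> alive = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();   // node-global uniqueness
        List<List<String>> result = new ArrayList<>();
        alive.add(new Branch(List.of(start)));
        seen.add(start);
        while (!alive.isEmpty()) {
            // 1. Ordering policy: pick one alive branch (stack top for DFS, queue head for BFS).
            Branch b = depthFirst ? alive.peekLast() : alive.peekFirst();
            // 2. Evaluate a branch the first time it is selected.
            if (!b.evaluated) {
                b.evaluated = true;
                boolean include = true, cont = true;
                for (Function<List<String>, Evaluation> ev : evaluators) {
                    Evaluation e = ev.apply(b.path);
                    include &= e.includes;
                    cont &= e.continues;
                }
                if (include) result.add(b.path);
                if (!cont) { removeSelected(alive, depthFirst); continue; } // pruned: dead end
            }
            // 3. Ask the expander once per branch, then pull relationships lazily.
            if (b.rels == null) b.rels = expander.apply(b);
            if (!b.rels.hasNext()) { removeSelected(alive, depthFirst); continue; }
            String next = b.rels.next();
            // 4. Uniqueness: only create a child branch if its end node is new.
            if (seen.add(next)) {
                List<String> childPath = new ArrayList<>(b.path);
                childPath.add(next);
                alive.addLast(new Branch(childPath));
            }
        }
        return result;
    }

    // The selected branch is still at the end it was taken from, so remove it there.
    private static void removeSelected(Deque<Branch> alive, boolean depthFirst) {
        if (depthFirst) alive.removeLast(); else alive.removeFirst();
    }
}
```

With the edges a->b, a->c, b->d and c->d and no evaluators, depth-first ordering yields the paths [a], [a, b], [a, b, d], [a, c], while breadth-first yields [a], [a, b], [a, c], [a, b, d]; in both cases the second path to d is never created because of the uniqueness check. Adding two evaluators, one that excludes the start node and one that prunes below depth 1, leaves only [a, b] and [a, c].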

Related

When should inferred relationships and nodes be used over explicit ones?

I was looking up how to utilise temporary relationships in Neo4j when I came across this question: Cypher temp relationship.
The comment underneath it made me wonder when they should be used, and since no one argued against him, I thought I would bring it up here.
I come from a mainly SQL background, and my main reason for using virtual relationships was to eliminate duplicated data and derive properties through traversals instead.
For a more specific example, let's say we have a robust cake recipe, which has sugar as an ingredient. The sugar is what makes the cake sweet.
Now imagine a use case where I don't like sweet cakes so I want to get all the ingredients of the recipe that make the cake sweet and possibly remove them or find alternatives.
Then there's another use case where I just want foods that are sweet. I could work backwards from the sweet ingredients to get to the food, or just store that a cake is sweet in general, which saves traversal time and makes the query easier. However, as I mentioned before, this duplicates data that can be inferred.
Sorry if the example is too strange; I suck at making them. I hope the main question comes across, though.
My feeling is that the only valid scenario for creating redundant "shortcut" relationships is this:
Your use case has a stringent time constraint (e.g., average query time must be less than 200ms), but your neo4j query -- despite optimization -- exceeds that constraint, and you have verified that adding "shortcut" relationships will indeed make the response time acceptable.
You should be aware that adding redundant "shortcut" relationships comes with its own costs:
Queries that modify the DB would need to be more complex (to modify the redundant relationships) and also slower.
You'd always have to add the redundant relationships -- even if you never actually need some (most?) of them.
If you want to make concurrent updates to the DB, the chances that you may lose some updates and introduce inconsistencies into the DB would increase -- meaning that you'd have to work even harder to avoid inconsistencies.
NOTE: For visualization purposes, you can use virtual nodes and relationships, which are temporary and not actually stored in the DB.
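To make the write-side cost of a redundant "shortcut" concrete, here is a minimal in-memory Java sketch of the cake example. It is not Neo4j code; SweetFoods, Food, Ingredient and the method names are hypothetical. The inferred check walks the ingredients on every read, while the stored sweetShortcut flag is cheap to read but has to be recomputed by every write that touches a recipe's ingredients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the cake example; not Neo4j code.
final class SweetFoods {
    static final class Ingredient {
        final String name;
        final boolean sweet;
        Ingredient(String name, boolean sweet) { this.name = name; this.sweet = sweet; }
    }

    static final class Food {
        final String name;
        final List<Ingredient> ingredients = new ArrayList<>();
        boolean sweetShortcut; // redundant, stored "shortcut": this food is sweet
        Food(String name) { this.name = name; }
    }

    /** Inferred on every read: walk food -> ingredients; nothing extra to maintain. */
    static boolean isSweetInferred(Food f) {
        return f.ingredients.stream().anyMatch(i -> i.sweet);
    }

    /** The "I don't like sweet cakes" use case: which ingredients make it sweet? */
    static List<String> sweetIngredients(Food f) {
        return f.ingredients.stream().filter(i -> i.sweet).map(i -> i.name).collect(Collectors.toList());
    }

    /** Every write that touches ingredients must also refresh the shortcut. */
    static void addIngredient(Food f, Ingredient i) {
        f.ingredients.add(i);
        f.sweetShortcut = isSweetInferred(f);
    }

    static void removeIngredient(Food f, String name) {
        f.ingredients.removeIf(i -> i.name.equals(name));
        f.sweetShortcut = isSweetInferred(f); // forgetting this leaves a stale shortcut
    }
}
```

Forgetting the recomputation in any write path leaves sweetShortcut stale, which is the kind of inconsistency the answer warns about; with concurrent writers, that extra bookkeeping would also have to happen atomically with the ingredient change.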

Is there a race condition when creating unique paths?

I recently discovered that a race condition exists when executing concurrent MERGE statements. Specifically, duplicate nodes can be created in the scenario where a node is created after the MATCH step but before the CREATE step of a given MERGE.
This can be worked around in some instances using unique constraints on the merged nodes; however, this falls short in scenarios where:
There is no single unique property to enforce (e.g. pairs of properties need to be unique but individual ones don't).
Trying to merge relationships and paths.
Does using CREATE UNIQUE solve this problem (or do the same pitfalls exist)? If so, is it the only option? It feels like the usefulness of MERGE is fairly heavily diminished when it effectively can't guarantee the uniqueness of the path or node being merged...
When MERGE statements are executed concurrently, these situations may occur. Basically, each transaction gets a view of the graph at the first point of reading, and won't see updates made after that point (with some variations). The main exception to this is uniquely constrained nodes, where Neo4j will initialise a fresh reader from the index when reading, regardless of what was previously read in the transaction.
A workaround could be to create a 'dummy' property and a unique constraint on it and one of the node labels. In Neo4j 2.2.5, this should work to get around your problem.
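The check-then-act gap described in the question can be reproduced without a database. The Java sketch below is an analogy, not Neo4j's implementation; MergeRace, mergeUnsafe, mergeConstrained and concurrentMerges are hypothetical names. mergeUnsafe does a separate MATCH-like lookup and CREATE-like insert, so two threads can both miss and both create. mergeConstrained routes the lookup and insert through a single ConcurrentHashMap.computeIfAbsent call, which is atomic per key and plays the role of a unique constraint's index. Keying on the pair of values models the case where only the combination of two properties must be unique.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Analogy only: an in-memory "graph" showing why MATCH-then-CREATE can duplicate nodes.
final class MergeRace {
    private final List<List<String>> nodes = Collections.synchronizedList(new ArrayList<>());
    private final ConcurrentMap<List<String>, List<String>> uniqueIndex = new ConcurrentHashMap<>();

    /** Like MERGE with no constraint: lookup, then insert. The two steps are not atomic. */
    void mergeUnsafe(String a, String b) {
        List<String> key = List.of(a, b);
        if (!nodes.contains(key)) { // "MATCH" finds nothing...
            nodes.add(key);         // ...another thread may have created it in between
        }
    }

    /** Like MERGE backed by a unique constraint: lookup-and-insert is one atomic step per key. */
    void mergeConstrained(String a, String b) {
        uniqueIndex.computeIfAbsent(List.of(a, b), k -> { nodes.add(k); return k; });
    }

    long count(String a, String b) {
        List<String> key = List.of(a, b);
        synchronized (nodes) { // required when streaming over a synchronizedList
            return nodes.stream().filter(n -> n.equals(key)).count();
        }
    }

    /** Starts `threads` threads that all merge the same pair at once; returns the node count. */
    static long concurrentMerges(int threads, boolean constrained) {
        MergeRace g = new MergeRace();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> ts = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try { start.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
                if (constrained) g.mergeConstrained("x", "y"); else g.mergeUnsafe("x", "y");
            });
            ts.add(t);
            t.start();
        }
        start.countDown();
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return g.count("x", "y");
    }
}
```

concurrentMerges(32, true) reports exactly one node, because computeIfAbsent applies its function at most once per key; concurrentMerges(32, false) may report more than one, depending on thread timing.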

Cypher query: Is it possible to "hide" an existing path with a "virtual relationship"?

We are working on a project trying to map a structure like Java code connections with Neo4j 2.1.5. We have succeeded in connecting Applications-Jars-Classes-Methods and can, for example, get a Cypher answer resulting in:
App1-->Jar1-->Class1-->Method1-->Method2-->Method3<--Class22<--Jar2<--App1
Now we would like to get a condensed answer showing which Jars are connected like this, "hiding" the existing path above:
Jar1--Jar2
Is it possible with Cypher to get this result without creating a new Relationship like
Jar1-[:PATH_EXISTS]-Jar2
We can't find anything related to collapsing/hiding paths in the manual or here on Stack Overflow.
Regards
Christofer
There are basically two ways of going about this.
The first is to explicitly create the new relationship, but I won't talk about this much because it seems you've thought of that and rejected it. That method is easy, but more disk intensive (depending on the size of your graph).
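The second is simply to query for the path when needed, with a variable-length path. In the path from the question the arrows change direction at Method3, so one way to express it is as two directed segments that meet at a common node:
MATCH (jar1 {myid: "something"})-[*]->(m)<-[*]-(jar2 {myid: "somethingelse"})
RETURN DISTINCT jar2;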
This will get you what you need, but it requires that this distant path be recomputed every time it's needed. So, it's easy, but it's compute intensive.
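As a rough illustration of the "compute it every time" option, the Java sketch below recomputes the Jar-to-Jar link from the stored edges instead of persisting a PATH_EXISTS relationship. It assumes, as in the question's example path, that two jars count as connected when their outgoing paths meet at a common node; JarLinks, reachable and connectedJars are hypothetical names, and the adjacency map stands in for the graph.

```java
import java.util.*;

// Recomputes "Jar1--Jar2" on demand instead of storing a PATH_EXISTS relationship.
// The graph representation and all names are hypothetical.
final class JarLinks {
    /** Every node reachable from `start` by following outgoing edges (breadth-first). */
    static Set<String> reachable(Map<String, List<String>> out, String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(out.getOrDefault(start, List.of()));
        while (!todo.isEmpty()) {
            String n = todo.poll();
            if (seen.add(n)) todo.addAll(out.getOrDefault(n, List.of()));
        }
        return seen;
    }

    /** Pairs of jars whose outgoing paths meet at some common node (e.g. Method3). */
    static List<String> connectedJars(Map<String, List<String>> out, List<String> jars) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < jars.size(); i++) {
            Set<String> ri = reachable(out, jars.get(i));
            for (int j = i + 1; j < jars.size(); j++) {
                Set<String> rj = reachable(out, jars.get(j));
                if (!Collections.disjoint(ri, rj)) pairs.add(jars.get(i) + "--" + jars.get(j));
            }
        }
        return pairs;
    }
}
```

For the question's graph (App1 -> Jar1 -> Class1 -> Method1 -> Method2 -> Method3 and App1 -> Jar2 -> Class22 -> Method3), plus an unrelated Jar3 -> Class3, this returns only "Jar1--Jar2". Each call repeats the reachability search, which is the compute cost mentioned above; caching or materializing the result trades that for the storage and maintenance cost of the first option.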
Now, more broadly, what it sounds like you want is something like a graph inference engine. In the OWL/RDF world, people create ontologies that describe different types of entities and the relationships between them. One consequence of these definitions is that relationships can be transitive and can carry implications. A classic example is that a person is an entity, and things like motherOf and fatherOf are relationships between persons. So if you have a path of fatherOf relationships between nodes, i.e. (A)-[:fatherOf]->(B)-[:fatherOf]->(C), the inference engine will return the "fact" that (A) and (C) are related by family. This would be a consequence of your ontological definition. That "fact" wouldn't actually be in the RDF store; it would simply be entailed by the facts.
In your case, you'd do something like writing an ontology that specified that all of the individual relationships you have in your graph are a specialization of some relationship type (like "related to"). You'd then ask the reasoner if a "related to" relationship exists between Jar1 and Jar2, and the answer would be yes because of your ontological definitions.
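For readers who have not met entailment before, the Java sketch below is a toy forward-chaining reasoner over (subject, predicate, object) triples, not an RDF or OWL library; TinyReasoner and entail are hypothetical names. It applies two rules until nothing new is derived: a specialized predicate such as fatherOf implies a general one, and the general predicate is transitive. The derived triples exist only in the returned set, just as entailed facts are not stored in the RDF store.

```java
import java.util.*;

// Toy forward-chaining reasoner; NOT an RDF/OWL library. Triples are [subject, predicate, object].
final class TinyReasoner {
    /** Returns the input facts plus everything entailed by two rules, applied to a fixpoint. */
    static Set<List<String>> entail(Set<List<String>> facts, Set<String> subProps, String general) {
        Set<List<String>> kb = new HashSet<>(facts); // the input set is left untouched
        boolean changed = true;
        while (changed) {
            List<List<String>> derived = new ArrayList<>();
            // Rule 1: (x, p, y) where p specializes `general`  =>  (x, general, y)
            for (List<String> t : kb) {
                if (subProps.contains(t.get(1))) derived.add(List.of(t.get(0), general, t.get(2)));
            }
            // Rule 2: `general` is transitive: (x, g, y) and (y, g, z)  =>  (x, g, z)
            for (List<String> t1 : kb) {
                if (!t1.get(1).equals(general)) continue;
                for (List<String> t2 : kb) {
                    if (t2.get(1).equals(general) && t1.get(2).equals(t2.get(0)))
                        derived.add(List.of(t1.get(0), general, t2.get(2)));
                }
            }
            changed = false;
            for (List<String> t : derived) changed |= kb.add(t); // stop once nothing is new
        }
        return kb;
    }
}
```

Given only (A, fatherOf, B) and (B, fatherOf, C), with fatherOf and motherOf declared as specializations of relatedByFamily, the result contains (A, relatedByFamily, C) although that triple was never stored.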
OK, so the bad news is that neo4j isn't RDF and doesn't do this. Also, doing this sort of thing is way harder than I'm making it sound; correct ontology modeling is an art unto itself, not unlike logic programming from the Prolog world of the 1970s. But basically, that kind of inference is what it sounds like you're looking for.
What I think you might be able to hope for in some future release of neo4j is something akin to a database "view", or better schema support, i.e. it ought to be possible to specify that whenever a certain relationship pattern holds, some other relationship ought to be present as well.

How to programmatically add constraints to Neo4J Cypher queries

I am writing a server plugin for Neo4j. The plugin receives a Cypher query and executes it. Currently, my implementation uses a CypherExecutor.
I now need to further constrain the results. (For example, imagine that the results need to be filtered by ACLs.)
One approach is to filter the results after executing the query. I'd rather not do this, for performance reasons as well as other limitations (for example, any aggregate results would be wrong.)
I considered adding the constraints to the query itself. I've looked at the command.AbstractQuery subclasses produced using the CypherParser. That object model is immutable.
I am wondering whether I will need to resort to cloning Neo4J's ExecutionEngine and CypherCompiler, just to extend the ExecutionPlanBuilder... I would like to avoid this option if at all possible.
Any recommendations about how this can be done?
In my case, I am simply trying to simulate multiple isolated graphs. I am OK with how this might be modeled -- whether I add a 'tenantId' to each node, or maintain a tenant node and add (:Tenant)<-[:scopedTo]-(n) relationships to every node.

How can I specify which relationship type to use as a function of the current node at every step of a traversal with neo4j?

I'd like to traverse my graph using the neo4j traversal API, but I need to be able to specify which relationship type to use at every step, and the relationship type to use needs to be a function of the current node. Is there a way to do this?
In the current Traverser API you can't choose the exact relationship to traverse. Instead, you take the more granular approach of calling node.getRelationships(), choosing the one you want and the end node on it, and so on.
The algo gets a bit more verbose than using a Traverser, but gives you more flexibility. For a tinkering approach, Gremlin supports the notion of functions for choosing edges to traverse. This will soon be implemented using Blueprint Pipes for Java-level performance.
HTH
/peter neubauer
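The manual approach described above can be sketched as a loop that, at each step, computes the wanted relationship type from the current node, scans that node's relationships, follows one of the matching type to its end node, and repeats. The Java below is an API-free sketch: Rel, TypedWalk, walk and the typeFor function are hypothetical names, the list scan stands in for node.getRelationships(), and it simply takes the first match. In Neo4j versions with the PathExpander interface discussed in the first answer, a custom expander receives the current path, so the same per-node decision could likely be made there instead.

```java
import java.util.*;
import java.util.function.Function;

// API-free sketch of a step-by-step walk; Rel, TypedWalk and walk() are hypothetical.
final class TypedWalk {
    static final class Rel {
        final String type, start, end;
        Rel(String type, String start, String end) { this.type = type; this.start = start; this.end = end; }
    }

    /** At every step, the relationship type to follow is computed from the current node. */
    static List<String> walk(List<Rel> rels, String start, Function<String, String> typeFor, int maxSteps) {
        List<String> path = new ArrayList<>(List.of(start));
        String current = start;
        for (int step = 0; step < maxSteps; step++) { // maxSteps guards against cycles
            String wanted = typeFor.apply(current);
            Rel chosen = null;
            for (Rel r : rels) {                      // stands in for node.getRelationships()
                if (r.start.equals(current) && r.type.equals(wanted)) { chosen = r; break; }
            }
            if (chosen == null) break;                // no relationship of the wanted type
            current = chosen.end;                     // move to the end node and repeat
            path.add(current);
        }
        return path;
    }
}
```

For example, with alice -KNOWS-> bob, alice -WORKS_AT-> acme, bob -KNOWS-> carol and bob -WORKS_AT-> initech, and a typeFor that maps alice to KNOWS and bob to WORKS_AT, the walk from alice visits alice, bob, initech and then stops because initech has no relationship of the wanted type.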
