Neo4j and Cypher: Reversing Only One Arrow

Let's say I have three nodes in my Neo4j graph, with directed relationships like this: (a)<--(b)-->(c). Furthermore, assume that (b) does NOT have the property visit_type_name, whereas both (a) and (c) do. Now what I would like to do is reverse only one of these arrows. For the moment, it does not matter which one, although being able to specify conditions, involving properties, on which one to reverse would be nice. I tried the following:
MATCH(x)-[r]->(y)
WHERE NOT EXISTS(()-->(x))
AND NOT EXISTS(x.visit_type_name)
DELETE r
MERGE(y)-->(x)
My thought was that after this code reversed, say, the arrow (a)<--(b) to (a)-->(b), then (b) would no longer be parent-less, and the MATCH would not continue on and do the same thing with the (b)-->(c) link. Unfortunately, Cypher does continue and reverse both arrows, which is not what I want. So then I tried this, thinking that I needed to change the granularity of the Cypher match:
MATCH(y)
WITH y
MATCH(x)-[r]->(y)
WHERE NOT EXISTS(()-->(x))
AND NOT EXISTS(x.visit_type_name)
DELETE r
MERGE(y)-->(x)
Unfortunately, this does the same thing as before.
How can I reverse only one arrow in this situation?
Is there a way to finalize the first arrow reversal transaction before moving on?
Many thanks for your time!

I think I have a method of doing this. Each node has a unique numerical id which I can leverage as follows:
MATCH(x)-->(y)
WHERE NOT EXISTS(()-->(x))
AND NOT EXISTS(x.visit_type_name)
WITH MIN(y.id) AS min_y_id, x
MATCH(x)-[r]->(min_y)
WHERE min_y.id = min_y_id
DELETE r
MERGE(min_y)-->(x)
For each matching x, this picks the child with the minimum id and reverses only the arrow to that node.
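The selection logic above can be sketched outside the database. The following is a minimal plain-Python model, not Cypher: the node ids, names and property values are hypothetical, and the graph is just a set of (source, target) pairs. It reads the matches from the original edge set and writes to a copy, then reverses only the edge to the smallest-id child of each parentless node that lacks visit_type_name.

```python
# Toy model of the graph (a)<--(b)-->(c); ids and names are hypothetical.
nodes = {
    1: {"name": "a", "visit_type_name": "x"},
    2: {"name": "b"},                      # b lacks visit_type_name
    3: {"name": "c", "visit_type_name": "y"},
}
edges = {(2, 1), (2, 3)}  # (source, target) pairs: b->a and b->c


def reverse_min_child_edge(nodes, edges):
    """Reverse, for each parentless node without visit_type_name,
    only the edge to its child with the smallest id."""
    targets = {dst for _, dst in edges}
    result = set(edges)
    for x, props in nodes.items():
        if x in targets or "visit_type_name" in props:
            continue  # x has a parent, or has the property: skip it
        children = [dst for src, dst in edges if src == x]
        if not children:
            continue
        min_y = min(children)          # WITH MIN(y.id) AS min_y_id, x
        result.remove((x, min_y))      # DELETE r
        result.add((min_y, x))         # MERGE (min_y)-->(x)
    return result


new_edges = reverse_min_child_edge(nodes, edges)
print(sorted(new_edges))
```

Only b->a is reversed here; b->c is left alone because 3 is not the minimum child id.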

Related

Cypher return multiple hops through pattern of relationships and nodes

I'm making a proof of concept access control system with neo4j at work, and I need some help with Cypher.
The data model is as follows:
(:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
Now I want to get a path from a User or a Business to all the Business nodes that you can reach through the
-[:can]->(:Permission)<-[:allows]-
pattern. I have managed to write a MATCH that gets me halfway there:
MATCH
path =
(:User {userId: 'e96cca53-475c-4534-9fe1-06671909fa93'})-[:can|allows*]-(b:Business)
but this doesn't have any directions, and I can't figure out how to include the directions without reducing the returned matches to only the direct matches (i.e., it doesn't continue after the first hit on a :Business node).
So what I'm wondering is:
Is there a way to match multiple of these hops in one query?
Should I model this entirely differently?
Am I on the wrong path completely, and should the query be completely rewritten?
Currently the syntax of variable-length expansions doesn't allow fine control for separate directions for certain types. There are improvements in the pipeline around this, but for the moment Cypher alone won't get you what you want.
We can use APOC Procedures for this, as fine control of the direction of types in the expansion, and sequences of relationships, are supported in the path expander procs.
First, though, you'll need to figure out how to address your user-or-business match. Either add a common label to these nodes so that you can MATCH either type by property, or use a subquery with two UNIONed queries, one for :Business nodes and the other for :User nodes. That way you can still take advantage of an index on either label and get the possible results in a single variable.
Once you've got that, you can use apoc.path.expandConfig(), passing some options to get what you want:
// assume you've matched to your `start` node already
CALL apoc.path.expandConfig(start, {relationshipFilter:'can>|<allows', labelFilter:'>Business'}) YIELD path
RETURN path
This one doesn't use sequences, but it does restrict the direction of expansion per relationship type. We are also setting the labelFilter such that :Business nodes are the end node of the path and not nodes of any other label.
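To make the effect of the per-type directions concrete, here is a small plain-Python sketch of the same idea. It is not APOC and does not reproduce its uniqueness rules exactly; the node names and relationships are hypothetical. It follows `can` forwards and `allows` backwards, and keeps only paths that end on a Business node.

```python
from collections import deque

# Hypothetical data: (source, type, target) triples following
# (:User|Business)-[:can]->(:Permission)<-[:allows]-(:Business)
rels = [
    ("user1", "can", "perm1"),
    ("biz1", "allows", "perm1"),
    ("biz1", "can", "perm2"),
    ("biz2", "allows", "perm2"),
    ("biz3", "can", "perm1"),   # reachable only by going against `can`
]
labels = {"user1": "User", "biz1": "Business", "biz2": "Business",
          "biz3": "Business", "perm1": "Permission", "perm2": "Permission"}


def expand(start):
    """Breadth-first expansion that follows `can` forwards and `allows`
    backwards, and reports paths that end on a Business node."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        steps = [t for s, typ, t in rels if typ == "can" and s == node]
        steps += [s for s, typ, t in rels if typ == "allows" and t == node]
        for nxt in steps:
            if nxt in path:
                continue  # do not revisit a node on the same path
            new_path = path + [nxt]
            if labels[nxt] == "Business":
                paths.append(new_path)
            queue.append(new_path)
    return paths


business_paths = expand("user1")
for p in business_paths:
    print(" -> ".join(p))
```

The expansion continues past biz1 to biz2, but never reaches biz3, because that would require traversing a `can` relationship against its direction.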
You can specify the path as follows:
MATCH path = (:User {userId: $id})-[:can]->(:Permission)
<-[:allows]-(:Business)
RETURN path
This should return the results you're after.
I see a good solution has been provided via path expanding APOC procedures.
But I'll focus on your point #2: "Should I model this entirely differently?"
Well, not entirely but I think yes.
The really liberating part of working with Neo4j is that you can change the road you are driving over as easily as you can change your driving strategy: model vs query. And since you are at an early stage in your project, you can experiment with different models. There's a good opportunity to make just a semantic change to make an 'end run' around the problem.
The semantics of a relationship in Neo4j are expressed through
the mandatory TYPE you assign to the relationship, combined with
the direction you choose to point the mandatory arrow
The trick you solved with APOC was how to traverse a path of relationships that alternate between pointing forward and backward along the query's path. But before reaching for a power tool, why not just reverse the direction of one of your relationship types? You can change the model for allows from
<-[:allows]-
to
-[:is_allowed_by]->
and that buys you a lot. Now the directions of both relationships are the same and you can combine both relationships into a single relationship in the match pattern. And the path traversal can be expressed like this, short & sweet:
(u:User)-[:can|is_allowed_by*]->(b:Business)
That will literally go to all lengths to find every user-to-business path, branching included.

Fastest way to get all nodes under a specified starting node

I have a query that has been working for a while, but it has slowed down seriously as my graph has grown:
MATCH p1=(n2)-[*0..]->(n3)-[r4]->(n5)
WHERE (id(n2) = 123456 // Fill in starting node ID
AND all(r6 in relationships(p1) WHERE (NOT exists(r6.value1) OR r6.value1 = r6.value2) // Add some constraints on the path
))
RETURN id(n3),n3.constr,r4.constr,type(r4),id(n5),n5.constr,n5.value // Things about n3,r4,n5, n3 may be the starting node
Unfortunately, there are various node labels and relationships under my starting node, and I want to return information about them so I can't constrain my query any further on those pieces. I can quickly get my starting node since I have its ID, but I can't find a quick way to get everything underneath the starting node.
This question asks the same thing, but without any real answer other than to add label constraints which I can't do. Since I know I have a tree structure (and want all nodes under a starting node), is there a faster way to perform this query? Is this something I should write in the Traversal API (if so, what would that look like)?
There is one thing I don't understand in your query: why did you write (n2)-[*0..]->(n3)-[r4]->(n5) and not just (n2)-[*0..]->(n5)?
Moreover, I don't see any constraint on the last node of your path. Normally this node is a leaf, so it's better to express it like this:
MATCH p=(root)-[*]->(leaf)
WHERE NOT (leaf)-->()
RETURN p
With this kind of query, you search only the paths between the root and the leaves, which is much faster than searching every path in your tree.
And to go one level deeper: if you want the best performance, you should use a graph traversal. Take a look at APOC and its apoc.path.expand procedure: https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_expand_paths
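As a rough illustration of the root-to-leaf idea, the following plain-Python sketch walks a small hypothetical tree (the ids and the `children` map are made up) and yields one path per leaf. Every node under the root still appears in at least one of these paths, so nothing is lost by restricting the match to leaves.

```python
# Hypothetical tree: parent id -> list of child ids.
children = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: []}


def root_to_leaf_paths(root):
    """Yield one path per leaf, analogous to
    MATCH p=(root)-[*]->(leaf) WHERE NOT (leaf)-->()."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = children[path[-1]]
        if not kids:
            yield path  # the last node has no outgoing edge: a leaf
        for kid in kids:
            stack.append(path + [kid])


leaf_paths = list(root_to_leaf_paths(1))
nodes_seen = {n for p in leaf_paths for n in p}
print(len(leaf_paths), sorted(nodes_seen))
```

The tree has six nodes but only three leaves, so only three paths are produced while all six nodes are still covered.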

Neo4j Cypher query: How to traverse

I am trying to learn Neo4j, so I took a travel app as a use case, but I am not sure about the optimal way to solve it. Any help will be appreciated.
Thanks in advance.
Consider a use case in which I have to travel by train from one place (PLACE A) to another (PLACE C), but there is no direct connection between the two places, so we have to change trains in PLACE B.
Two places are connected via an IS_CONNECTED relationship (the green nodes in the image).
If there is an IS_CONNECTED relationship between two places, then both nodes also have an outgoing CONNECTED_VIA relationship to a common train, which shows how they are connected (the red nodes in the image).
My question is: how are we supposed to know that we have to change trains at PLACE B?
My understanding is:
We first check whether the two places are connected via IS_CONNECTED relationships:
match (start:place{name:"heidelberg"}), (end:place{name:"frankfurt"})
MATCH path = (start)-[:IS_CONNECTED*..]->(end)
RETURN path
This will show that these two places are connected.
Then we check whether PLACE A and PLACE C are directly connected, using this query:
match (p:place{name:"heidelberg"})-[:CONNECTED_VIA]->(q)<-[:CONNECTED_VIA]-(t:place{name:"frankfurt"})
return q
And this will return nothing, because there is no direct connection.
My brain stopped functioning after this. I have been trying to figure this out for the past 3 days. I am sorry if I look so confused.
Please click here for the image of what I am referring to.
You'll want to use variable-length relationships in your :CONNECTED_VIA match, and then get the :place nodes that are in your path. It's usually a good idea to use an upper bound, whatever makes sense in your graph.
Then we can use a filter on the nodes in your path to keep only the ones that are :place nodes (labels are case-sensitive, so the filter must use the same lowercase label as the MATCH).
match path = (p:place{name:"heidelberg"})-[:CONNECTED_VIA*..4]-(t:place{name:"frankfurt"})
return path, [node in nodes(path)[1..-1] where node:place] as connectionPlaces
And if you're only interested in the shortest paths, you may want to check the shortestPath() or allShortestPaths() functions.
One last thing to note...when determining if two locations are connected, if all you need is a true or false if they're connected, you can use the EXISTS() function to return whether such a pattern exists:
match (start:place{name:"heidelberg"}), (end:place{name:"frankfurt"})
return exists((start)-[:IS_CONNECTED*..5]->(end))
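The way the intermediate place falls out of the path can be sketched in plain Python. This is only a toy model under stated assumptions: the change station "mannheim" and the train names are hypothetical, and the search is a simple breadth-first search rather than anything Neo4j does internally. It finds a shortest path of at most four hops over CONNECTED_VIA, ignoring direction, and then keeps the inner nodes that are places.

```python
from collections import deque

# Hypothetical data: each place has CONNECTED_VIA relationships to trains.
connected_via = [
    ("heidelberg", "train_1"), ("mannheim", "train_1"),
    ("mannheim", "train_2"), ("frankfurt", "train_2"),
]
places = {"heidelberg", "mannheim", "frankfurt"}

# Build an undirected adjacency map, since the pattern ignores direction.
adjacent = {}
for place, train in connected_via:
    adjacent.setdefault(place, []).append(train)
    adjacent.setdefault(train, []).append(place)


def shortest_path(start, end, max_hops=4):
    """Breadth-first search for a shortest path of at most max_hops hops."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        if len(path) - 1 == max_hops:
            continue  # bound reached: do not expand further
        for nxt in adjacent.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("heidelberg", "frankfurt")
# Same idea as [node in nodes(path)[1..-1] where node:place]
connection_places = [n for n in path[1:-1] if n in places]
print(path, connection_places)
```

With this data the only inner place node is the hypothetical change station, which is what the list comprehension in the Cypher query is meant to extract.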

Seeking Neo4J Cypher query for long but (nearly) unique paths

We have a Neo4J database representing an evolutionary process with about 100K nodes and 200K relations. Nodes are individuals in generations, and edges represent parent-child relationships. The primary goal is to be able to take one or more nodes of interest in the final generation, and explore their evolutionary history (roughly, "how did we get here?").
The "obvious" first query to find all their ancestors doesn't work because there are just too many possible ancestors and paths through that space:
match (a)-[:PARENT_OF*]->(c {is_interesting: true})
return distinct a;
So we've pre-processed the data so that some edges are marked as "special" such that almost every node has at most one "special" parent edge, although occasionally both parent edges are marked as "special". My hope, then, was that this query would (efficiently) generate the (nearly) unique path along "special" edges:
match (a)-[r:PARENT_OF* {special: true}]->(c {is_interesting: true})
return distinct a;
This, however, is still unworkably slow.
This is frustrating because "as a human", the logic is simple: Start from the small number of "interesting" nodes (often 1, never more than a few dozen), and chase back along the almost always unique "special" edges. Assuming a very low number of nodes with two "special" parents, this should be something like O(N) where N is the number of generations back in time.
In Neo4J, however, going back 25 steps from a unique "interesting" node, where every step is unique, takes 30 seconds, and once there's a single bifurcation (where both parents are "special") it gets worse much faster as a function of steps. 28 steps (which gets us to the first bifurcation) takes 2 minutes, 30 (where there's still only the one bifurcation) takes 6 minutes, and I haven't even thought to try the full 100 steps to the beginning of the simulation.
Some similar work last year seemed to perform better, but we used a variety of edge labels (e.g., (a)-[:SPECIAL_PARENT_OF*]->(c) as well as (a)-[:PARENT_OF*]->(c)) instead of using data fields on the edges. Is querying on relationship field values just not a good idea? We have quite a few different values attached to a relationship in this model (some boolean, some numeric) and we were hoping/assuming we could use those to efficiently limit searches, but maybe that wasn't really the case.
Suggestions for how to tune our model or queries would be greatly appreciated.
Update: I should have mentioned that this is all with Neo4J 2.1.7. I'm going to give 2.2 a try as per Brian Underwood's suggestion and will report back.
I've had some luck with specifying a limit on the path length. So if you know that it's never more than 30 hops you might try:
MATCH (c {is_interesting: true})
WITH c
MATCH (a)-[:PARENT_OF*1..30]->(c)
RETURN DISTINCT a
Also, is there an index on the is_interesting property? A missing index could also cause slowness, for sure.
What version of Neo4j are you using? If you are using or if you upgrade to 2.2.0, you get to use the new query profiling tools:
http://neo4j.com/docs/2.2.0/how-do-i-profile-a-query.html
Also if you use them in the web console you get a nice graph-ish tree thing (technical term) showing each step.
After exploring things with the profiling tools in Neo4J 2.2 (thanks to Brian Underwood for the tip) it's pretty clear that (at the moment) Neo4J doesn't do any pre-filtering on edge properties, which leads to nasty combinatorial explosions with long paths.
For example the original query:
match (a)-[r:PARENT_OF* {special: true}]->(c {is_interesting: true})
return distinct a;
finds all the paths from a to c and then eliminates the ones that have edges that aren't special. Since there are many millions of paths from a to c, this is totally infeasible.
If I instead add an IS_SPECIAL edge wherever there was a PARENT_OF edge that had {special: true}, then the queries become really fast, allowing me to push back around 100 generations in under a second.
This query creates all the new edges:
match (a)-[r:PARENT_OF {special: true}]->(b)
create (a)-[:IS_SPECIAL]->(b);
and takes under a second to add 91K relationships in our graph.
Then
match (c {is_interesting: true})
with c
match (a)-[:IS_SPECIAL*]->(c)
return distinct a;
takes under a second to find the 112 nodes along the "special" path back from a unique target node c. Matching c first and limiting the set of nodes using with c seems to also be important, as Neo4J doesn't appear to pre-filter on node properties either, and if there are several "interesting" target nodes things get a lot slower.
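The size of the gap between "expand everything, then filter" and "expand only special edges" can be illustrated with a toy count. This is a plain-Python sketch under simplifying assumptions, not a model of the Neo4j planner: it assumes a hypothetical population of two individuals per generation, where every child has both individuals of the previous generation as parents and exactly one of those two edges is special.

```python
GENERATIONS = 20  # hypothetical depth

all_parents = {}      # every PARENT_OF edge: child -> list of parents
special_parents = {}  # only the edges marked special

for g in range(1, GENERATIONS + 1):
    for i in (0, 1):
        child = (g, i)
        all_parents[child] = [(g - 1, 0), (g - 1, 1)]
        special_parents[child] = [(g - 1, 0)]


def count_paths(parents, target):
    """Count all directed ancestor paths ending at `target`
    (one per match of (a)-[*]->(target)) using memoised recursion."""
    memo = {}

    def paths_ending_at(node):
        if node not in memo:
            # Each parent contributes the one-hop path plus every
            # longer path that ends at that parent.
            memo[node] = sum(1 + paths_ending_at(p)
                             for p in parents.get(node, []))
        return memo[node]

    return paths_ending_at(target)


target = (GENERATIONS, 0)
all_count = count_paths(all_parents, target)
special_count = count_paths(special_parents, target)
print(all_count, special_count)
```

In this toy setup there are 2097150 candidate paths over all edges but only 20 over the special edges, which is the kind of difference a dedicated relationship type lets the traversal avoid.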

Getting a "slice" of a linked-list in neo4j

So, I have a structure that resembles a linked list. Each node has a prev field holding the id of the previous node, and I link them together using a chain relationship. There are some cases where a node is not part of this chain, i.e., its "prev" points to another node, but nothing points to it, or only one node points to it.
I want to take a "slice" of this list, including only the nodes that are directly linked, i.e., from node A back to node B, return all nodes in between.
This is what I have so far
match (fb {id: A}) - [:chain] -> (eb {id:B})
return fb
However, it returns no results... I think I need it to go recursive in some way, but I'm not sure how to indicate that. I've tried using :chain*, but this tends to process forever. I think I need a way to limit it..
How do I do this?
What about this?
MATCH (fb {id: A})-[:chain*1..10]->(eb {id:B})
RETURN fb
That should limit it to 10 levels. You can change the bound if you like, but a larger bound affects performance.
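What a bounded slice looks like can be sketched in plain Python. This is a toy model with hypothetical node ids, using the prev field from the question rather than the chain relationship. It follows prev pointers from the start node until the end node is reached and gives up once the hop bound is exhausted; in Cypher, the comparable list of nodes would come from calling nodes() on a named path.

```python
# Hypothetical nodes: id -> id of the previous node (None at the head).
prev = {"A": "n3", "n3": "n2", "n2": "B", "B": "n0", "n0": None,
        "stray": "n2"}  # "stray" points into the chain but is not part of it


def chain_slice(start, end, max_hops=10):
    """Follow prev pointers from start until end is reached, giving up
    after max_hops hops (the analogue of [:chain*1..10])."""
    path = [start]
    while path[-1] != end:
        nxt = prev.get(path[-1])
        if nxt is None or len(path) > max_hops:
            return None  # end not reachable within the bound
        path.append(nxt)
    return path


slice_nodes = chain_slice("A", "B")
print(slice_nodes)
```

The stray node never appears in the slice, because nothing on the walk from A to B points to it.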
EDIT: Was just reading this guide to performance tuning:
http://neo4j.com/developer/guide-performance-tuning/
One bit that caught my eye:
If you’re using queries that will have a relatively large working set
(ie. will be traversing long paths, looking at lots of properties, or
collecting large sets of results in order to do sorting, etc) then
you’ll need a larger working heap. If you have small queries that do
very limited traversals and return small amounts of data, you need
less. Assume 1-2GB to start and tune from there
