How to find all paths from a node - neo4j

I'm new to neo4j. I've read a couple of tutorials, but I'm stuck on finding all paths from a node until the path reaches a node whose status changes, following a different relationship type at each step.
I've made a picture:
Starting from the node at the top, I would like to find all nodes T that have status=1. We move from a node of type O to T along an 'o' relationship, and from T to O along 'i' relationships. If we reach a node T with status = 0, we follow its 'i' relationship and check whether the next T has status = 1, and so on.
I don't know the depth of the graph. I found in the manual that we can use [r*1..], but I'm not sure how to use it here.
I have tried
match (o1:O)-[:o]-(t:T), (t)-[:i]-(o2:O)-[:o]-(t2:T)
return o1, t, o2, t2
for the first depth, but I don't know how to do it with unknown depth and keep going deeper as long as status is not 1.

Your schema looks like so (the question mark means I'm not sure what relationship you wanted there).
(:O)<-[:o]-(:T)<-[:i]-(:O)<-[:o]-(:T)<-[:?]-(:T)
You need to somehow identify the first node from which you start, and I'm not sure exactly which nodes you are trying to get from the schema. Something like this would return all nodes with status 1 that are connected to the first node, which here is identified only by having status 0 (so it might actually match more than one node).
MATCH (firstnode:O {status: 0})<-[:o|:i*..]-(othernodes) WHERE othernodes.status=1 RETURN othernodes
But be warned: an unbounded variable-length pattern such as *.. can take a very long time to run on a large graph.
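To make the stopping rule from the question concrete, here is a minimal Python sketch of the traversal on an in-memory graph. It is not a Cypher query and not the answerer's method: the dictionaries `o_edges`, `i_edges` and `status`, the function name, and the edge direction (O to T via 'o', T to O via 'i', as the question describes it) are all assumptions made for illustration.

```python
from collections import deque

def find_active_t_nodes(start, o_edges, i_edges, status):
    """Alternate O -o-> T and T -i-> O, collecting T nodes with status 1.

    The traversal does not continue past a T node whose status is 1.
    """
    found = []
    seen = {start}
    queue = deque([start])
    while queue:
        o_node = queue.popleft()
        for t_node in o_edges.get(o_node, []):
            if t_node in seen:
                continue
            seen.add(t_node)
            if status.get(t_node) == 1:
                found.append(t_node)  # status 1: record it and stop here
                continue
            # status is not 1: follow the 'i' relationships one level deeper
            for next_o in i_edges.get(t_node, []):
                if next_o not in seen:
                    seen.add(next_o)
                    queue.append(next_o)
    return found

# Hypothetical sample data: T1 has status 1, so O3 and T4 are never visited.
o_edges = {"O1": ["T1", "T2"], "O2": ["T3"], "O3": ["T4"]}
i_edges = {"T1": ["O3"], "T2": ["O2"]}
status = {"T1": 1, "T2": 0, "T3": 1, "T4": 1}
result = find_active_t_nodes("O1", o_edges, i_edges, status)
```

The `seen` set keeps the walk finite even if the graph contains cycles, which is the main cost concern with unbounded patterns.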

Related

Cypher: Get Nodes with ONLY incoming or ONLY outgoing Edges (Start Nodes / End Nodes)

I am searching for the right Cypher query to get the first and last nodes of paths when selecting a node that lies in between. The idea is to compress a large impact graph so that only the sources (only outgoing edges = green nodes), the final consequences (only incoming edges = red nodes), and the selected node are displayed.
Here is an illustrative example graph:
Now, when selecting e.g. node d, I would like to receive node d plus the first and last node of every path that node d is part of, as well as the respective (new) relationships, so that the output is the following graph:
Hence, I am searching for a kind of collapsing from which the start and end nodes are excluded.
Thanks to this answer, I already know that it is possible to create virtual graphs with apoc.create.vRelationship.
But I am struggling with identifying the green start nodes and red end nodes as described above, as well as with creating the desired output.
I am searching for a query where only the node in between (e.g. node d) is a parameter and the output always looks like the second image.
I appreciate every help or inspiration a lot, thank you in advance!
For your illustrated data model (assuming the desired middle node is neither the start nor end node):
MATCH (start)-[:RELATED_TO*]->(middle)-[:RELATED_TO*]->(end)
WHERE
middle.id = 123 AND
NOT EXISTS(()-[:RELATED_TO]->(start)) AND
NOT EXISTS((end)-[:RELATED_TO]->())
RETURN start, middle, end,
apoc.create.vRelationship(start, 'RELATED_TO', {}, middle) as pre_rel,
apoc.create.vRelationship(middle, 'RELATED_TO', {}, end) as post_rel
[UPDATE]
The above query can, unfortunately, create duplicate virtual relationships. This one does not:
MATCH (middle)
WHERE middle.id = 123
MATCH (start)-[:RELATED_TO*]->(middle)
WHERE NOT EXISTS(()-[:RELATED_TO]->(start))
WITH middle, COLLECT(start) AS starts, COLLECT(apoc.create.vRelationship(start, 'RELATED_TO', {}, middle)) AS vr1s
MATCH (middle)-[:RELATED_TO*]->(end)
WHERE NOT EXISTS((end)-[:RELATED_TO]->())
RETURN middle, starts, COLLECT(end) AS ends, vr1s, COLLECT(apoc.create.vRelationship(middle, 'RELATED_TO', {}, end)) AS vr2s
NOTE: You also need to uncheck the "Connect result nodes" option in the Browser Settings (click on the Gear icon in the Browser's left panel), or else some "real" relationships will also be displayed.
This query would return node d (filtering here by a name property just as an example) and all related edge nodes:
MATCH (d {name: "d"})-[:RELATED_TO*]-(n)
WHERE NOT ((n)-[:RELATED_TO]->() AND (n)<-[:RELATED_TO]-())
RETURN d, n
The condition for the edge nodes would be that they don't have :RELATED_TO relationships in both directions.
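The idea shared by the queries above (walk backwards to nodes with no incoming edge, walk forwards to nodes with no outgoing edge, then link both sets directly to the selected node) can be sketched in plain Python. This is only an illustrative model: the edge list and the helper names `reachable` and `collapse_around` are hypothetical, and the tuples in `virtual` merely stand in for the virtual relationships that apoc.create.vRelationship would produce.

```python
def reachable(start, adjacency):
    """Return every node reachable from start via one or more edges."""
    seen = set()
    stack = list(adjacency.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency.get(node, []))
    return seen

def collapse_around(middle, edges):
    """Find the source and sink nodes of all paths through middle."""
    outgoing, incoming = {}, {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)
        incoming.setdefault(dst, []).append(src)
    # Sources: upstream of middle and without any incoming edge.
    starts = {n for n in reachable(middle, incoming) if n not in incoming}
    # Sinks: downstream of middle and without any outgoing edge.
    ends = {n for n in reachable(middle, outgoing) if n not in outgoing}
    # One stand-in "virtual relationship" per source and per sink.
    virtual = [(s, middle) for s in sorted(starts)]
    virtual += [(middle, e) for e in sorted(ends)]
    return starts, ends, virtual

# Hypothetical example graph: a->c, b->c, c->d, d->e, e->f, d->g
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "g")]
starts, ends, virtual = collapse_around("d", edges)
```

Collecting sources and sinks into sets before building the links is what avoids the duplicate relationships mentioned in the update.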

Connecting the end of a set to the beginning

We have a set of nodes that are connected. Each node has a link to the next node in the chain. When the chain runs out, that end node just hangs out there. See the graphic below.
Node path
Each of these nodes has the same level, so as long as they are in the chain, they have the same number. So what I am hoping to do is come up with a Cypher query that builds a link between the max ID and the min ID that share the same line number, basically connecting the end with the beginning. Is there a clever way to do this?
Your question lacks some clarity, but what about thinking along the lines below?
// find all levels in your dataset of nodes in the chains
MATCH (n)
WHERE (n)-[:NEXT]-()
WITH COLLECT(DISTINCT n.level) AS levels
UNWIND levels AS level
// for each level, find the chain
MATCH (start {level:level})-[:NEXT*]->(end {level:level})
WHERE NOT (
({level:level})-[:NEXT]->(start)
OR
(end)-[:NEXT]->({level:level})
)
// connect end to start
MERGE (end)-[:MYRELTYPE]->(start)
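As a rough illustration of what the query computes, the following Python sketch finds, for each level, the chain node with no predecessor and the one with no successor, and pairs them up. It assumes exactly one chain per level and that NEXT only links nodes of the same level; `next_edges`, `level_of` and `closing_links` are made-up names for this example.

```python
def closing_links(next_edges, level_of):
    """Return {level: (end, start)} for each level that has a single chain."""
    has_outgoing = {src for src, _ in next_edges}
    has_incoming = {dst for _, dst in next_edges}
    nodes = has_outgoing | has_incoming
    links = {}
    for level in sorted({level_of[n] for n in nodes}):
        members = [n for n in nodes if level_of[n] == level]
        starts = [n for n in members if n not in has_incoming]
        ends = [n for n in members if n not in has_outgoing]
        # Only close the chain when start and end are unambiguous.
        if len(starts) == 1 and len(ends) == 1:
            links[level] = (ends[0], starts[0])
    return links

# Hypothetical data: chain 1->2->3 on level 1, chain 10->11 on level 2.
next_edges = [(1, 2), (2, 3), (10, 11)]
level_of = {1: 1, 2: 1, 3: 1, 10: 2, 11: 2}
links = closing_links(next_edges, level_of)
```

Each resulting pair corresponds to one relationship that the MERGE step would create from the end of a chain back to its start.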

How to access node objects in a collection of paths? (paths of two or more nodes)

I have a graph where some nodes were created out of an error in the app.
I want to delete those nodes (they represent a log), but I can't figure out how to loop thru the nodes.
I don't know how to access nodes in a collection of paths, and I need to do that in order to compare one node to another.
match (o:Order{id:123})
match (o)-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
with collect((l:Log)-[:STATUS]->(os:OrderStatus)) as logs
I want to access each one of the nodes in the paths to perform a comparison. There are 5 or 6 of (l)-[:STATUS]->(os) normally for each Order.
How can I access the (l) and (os) nodes of each path, to perform the comparisons between their properties?
For example, if I had this collection of paths in one of the Orders:
(log1)-[:STATUS]->(os1)
(log2)-[:STATUS]->(os2)
(log3)-[:STATUS]->(os3)
(log4)-[:STATUS]->(os2) <-- This is the error
(log5)-[:STATUS]->(os4)
So, from the collection of paths above, I'd want to detach delete the (log4), because the (os2) node is lower than the previous one (os3), and should be greater.
And after that, I want to attach the (log3) to the (log5)
NOTE: Each one of the (os) nodes has an id that represents the "status", and go from 1 to 5. Also, the (log) nodes are ordered by the created datetime.
Any idea on how to do this? Thank you in advance guys!
EDIT
I didn't mention some other scenarios I had. This is one of them:
Based on @cybersam's answer, I found out how to work it out.
I had to run 2 separate queries to make it work, but the principle is the same, and is as follows:
Create new relationships:
MATCH(o:Order)-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
WHERE SIZE((o)-[:STATUS_CHANGE*]->()-[:STATUS]->(os)) >= 1
WITH o, os, COLLECT(l)[0] AS keep
WITH o, collect(keep) AS k
FOREACH(i IN range(0,size(k)-1) |
FOREACH(a IN [k[i]] |
FOREACH(b IN [k[i+1]] |
FOREACH(c IN CASE WHEN b IS NOT NULL THEN [1] END | MERGE (a)-[:STATUS_CHANGE]->(b) ))));
Delete excess nodes:
MATCH(o:Order)-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
WHERE (os)<-[:STATUS]-()-[:STATUS_CHANGE*]->(l)-[:STATUS]->(os)
WITH o, os, COLLECT(l) AS exceed
UNWIND exceed AS del
detach delete del;
These queries worked in every scenario.
Assuming all your errors follow the same pattern (the unwanted Log nodes are always referencing an "older" OrderStatus), this may work for you:
MATCH (o:Order{id:123})-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
WHERE SIZE(()-[:STATUS]->(os)) > 1
WITH os, COLLECT(l) AS logs
UNWIND logs[1..] AS unwanted
OPTIONAL MATCH (x)-[:STATUS_CHANGE]->(unwanted)-[:STATUS_CHANGE]->(y)
DETACH DELETE unwanted
FOREACH(ignored IN CASE WHEN x IS NOT NULL THEN [1] END | CREATE (x)-[:STATUS_CHANGE]->(y))
This query:
Finds (in order) all relevant OrderStatus nodes having multiple STATUS relationships.
Uses the aggregating function COLLECT to collect (in order) the Log nodes related to each of those OrderStatus nodes.
Uses UNWIND logs[1..] to get the individual unwanted Log nodes.
Uses OPTIONAL MATCH to get the 2 nodes that may need to be connected together, after the unwanted node is deleted.
Uses DETACH DELETE to delete each unwanted node and its relationships.
Uses FOREACH to connect together the pair of nodes that might have been found by the OPTIONAL MATCH.
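The steps above can be modelled on a plain list to see what they do to the example from the question. This is a simplified sketch, not a translation of the query: it assumes the Log nodes are already in STATUS_CHANGE order and represents each one as a hypothetical `(log_id, status_id)` tuple.

```python
def remove_repeated_status_logs(chain):
    """Keep the first log per status, drop later ones, and relink the rest.

    chain: list of (log_id, status_id) tuples in STATUS_CHANGE order.
    """
    seen_statuses = set()
    kept, removed = [], []
    for log_id, status_id in chain:
        if status_id in seen_statuses:
            removed.append(log_id)  # a later log pointing at an already used status
        else:
            seen_statuses.add(status_id)
            kept.append(log_id)
    # Neighbouring kept logs form the repaired STATUS_CHANGE chain.
    relinked = list(zip(kept, kept[1:]))
    return kept, removed, relinked

# The example from the question: log4 points back at status 2.
chain = [("log1", 1), ("log2", 2), ("log3", 3), ("log4", 2), ("log5", 4)]
kept, removed, relinked = remove_repeated_status_logs(chain)
```

Here log4 ends up in `removed`, and the pair (log3, log5) in `relinked` is the connection the question asks for.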

Leaf nodes and paths in cypher

I have a graph in Neo4J that looks like this:
(a {flag:any})<- (many, 0 or more) <-(b {flag:true})<- (many, 0 or more) <-(c {flag: any})
-OR-
(a {flag:any})<- (many, 0 or more) <-(d)
-OR-
(a {flag:any})
Where a, b, c, and d all have the same type, and the relations are also the same. All the nodes have flag:false except where noted. Of course the real graph is a tree, not a vine.
In short, every path should begin with a and end with the first flag=true node, or should begin with a and get all children down to the leaf of the tree. Per the last example, a doesn't have to have any children - it can be a root and a leaf. Finally, in the first case, we'll never pull in c. b stops the traversal.
How can I write this query?
I have gotten it to work with a path and several unwind/collect statements that are basically horse****, lol. I want a better query, but I am so confused now it is not going to happen.
The following query should return all 3 kinds of paths. I assume that all relevant nodes are labeled Foo, and all relevant relationships have the BAR type.
The first term of the WHERE clause looks for paths (of length 0 or more, because of the variable-length relationship pattern used in the MATCH clause) that end in a node with a true flag with no true flags earlier in the path (except for possibly the starting node). The second term looks for paths (of length 0 or more) ending with a leaf node, where no nodes (except for possibly the starting node) have a true flag.
MATCH p=(a:Foo)<-[:BAR*0..]-(b:Foo)
WHERE
(b.flag AND NONE(x IN NODES(p)[1..-1] WHERE x.flag)) OR
((NOT (b)<-[:BAR]-()) AND NONE(y IN NODES(p)[1..] WHERE y.flag))
RETURN p;
NOTE: Variable-length relationship patterns with no upper bound (like [:BAR*0..]) can be very expensive, and can take a very long time or cause an out of memory error. So, you may need to specify a reasonable upper bound (for example, [:BAR*0..5]).
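For comparison, the rule stated in the question (stop at the first flagged node, otherwise go down to a leaf) can be written as an explicit traversal. The Python sketch below works on a hypothetical `children` mapping and `flag` dictionary, assumes the data really is a tree (no cycles), and ignores the flag of the starting node; it is not an exact port of the WHERE clause, which would also return the zero-length path for a flagged start node.

```python
def collect_paths(root, children, flag):
    """Return paths from root that end at the first flagged node or at a leaf."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        node = path[-1]
        is_start = len(path) == 1
        kids = children.get(node, [])
        # A flagged node (other than the start) or a leaf ends the path.
        if (flag.get(node, False) and not is_start) or not kids:
            paths.append(path)
            continue
        for kid in reversed(kids):
            stack.append(path + [kid])
    return paths

# Hypothetical tree: a has children x and d; x -> b (flagged) -> c.
children = {"a": ["x", "d"], "x": ["b"], "b": ["c"]}
flag = {"b": True}
paths = collect_paths("a", children, flag)
```

Because the walk stops at b, node c is never visited, which matches the requirement that b stops the traversal.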
I would approach this query as the UNION of the two cases:
MATCH shortestPath((a)<-[:REL_TYPE*1..]-(end:Label {flag: true}))
RETURN a, end
UNION
MATCH (a)<-[:REL_TYPE*0..]-(end:Label)
WHERE NOT (end)<-[:REL_TYPE]-()
RETURN a, end
Let's break it down:
To express that we only want to traverse until the first flag is true, we use shortestPath.
To express that we want to traverse down to the leaf, we use the following formalisation: a node is a leaf if it has no relationships that could be continued, captured by a WHERE NOT filter on patterns.
This should give an idea of the basic ideas to use for such queries -- please provide some feedback so that I can refine the answer.

In a mnesia cluster, which node is queried?

Let's say you have a mnesia table replicated on nodes A and B. If on node C, which does not contain a copy of the table, I do mnesia:change_config(extra_db_nodes, [NodeA, NodeB]), and then on node C I do mnesia:dirty_read(user, bob) how does node C choose which node's copy of the table to execute a query on?
According to my own research, the answer to the question is: it will choose the most recently connected node. I would be grateful if you point out any errors you find; mnesia is a really complex system!
As Dan Gudmundsson pointed out on the mailing list, the algorithm for selecting the remote node to query is defined in mnesia_lib:set_remote_where_to_read/2. It is the following:
set_remote_where_to_read(Tab, Ignore) ->
    Active = val({Tab, active_replicas}),
    Valid =
        case mnesia_recover:get_master_nodes(Tab) of
            [] -> Active;
            Masters -> mnesia_lib:intersect(Masters, Active)
        end,
    Available = mnesia_lib:intersect(val({current, db_nodes}), Valid -- Ignore),
    DiscOnlyC = val({Tab, disc_only_copies}),
    Prefered = Available -- DiscOnlyC,
    if
        Prefered /= [] ->
            set({Tab, where_to_read}, hd(Prefered));
        Available /= [] ->
            set({Tab, where_to_read}, hd(Available));
        true ->
            set({Tab, where_to_read}, nowhere)
    end.
So it gets the list of active_replicas (i.e. the list of candidates), optionally shrinks the list to the master nodes for the table, removes the nodes to be ignored (for any reason), shrinks the list to the currently connected nodes, and then selects in the following order:
First non-disc_only_copies
Any available node
The most important part is in fact the list of active_replicas, since it determines the order of nodes in the list of candidates.
The list of active_replicas is formed by remote calls of mnesia_controller:add_active_replica/* from newly connected nodes to old nodes (i.e. the ones that were in the cluster before), which boils down to the function add/1, which adds the item as the head of the list.
Hence the answer to the question is: it will choose the most recently connected node.
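To see how the ordering of active_replicas drives the choice, here is a small Python model of the selection logic quoted above. It is a sketch for reasoning only, not mnesia code: the function name and parameters are hypothetical, and it assumes the intersections keep the order of the candidate list, which is what the conclusion above relies on.

```python
def choose_read_node(active, masters, connected, ignore=(), disc_only=()):
    """Model of the where_to_read choice; 'active' is ordered newest first."""
    # Optionally restrict the candidates to the table's master nodes.
    valid = [n for n in active if n in masters] if masters else list(active)
    # Drop ignored nodes and nodes that are not currently connected.
    available = [n for n in valid if n not in ignore and n in connected]
    # Prefer replicas that are not disc_only_copies.
    preferred = [n for n in available if n not in disc_only]
    if preferred:
        return preferred[0]
    if available:
        return available[0]
    return "nowhere"

# Hypothetical cluster: node b joined after node a, so it is at the head.
active = ["b", "a"]
connected = ["a", "b", "c"]
choice = choose_read_node(active, [], connected)
fallback = choose_read_node(active, [], connected, disc_only=["b"])
```

With both replicas available the head of the list (the most recently added node) wins, and a disc_only_copies replica is only used when nothing else is available.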
Notes:
To check out the list of active replicas on the given node you can use this (dirty hack) code:
[ {T,X} || {{T,active_replicas}, X} <- ets:tab2list(mnesia_gvar) ].
Well, node C would need to contact either node A or node B in order to do a query. Thus node C will have to decide itself which table copy to execute the query on.
If you need something more than this you would either need to have some algorithm which will decide which node to query on, or even replicate the table on node C (this would typically depend on what kind of characteristics you want / need).
If node A and node B form or are part of a database cluster, a good start is probably the round robin algorithm (or random, as you suggest).

Resources