We have a set of nodes that are connected. Each node has a link to the next node in the chain. When the chain runs out, that end node just hangs out there. See the graphic below.
Node path
Each of these nodes has the same level, so as long as they are in the chain, they have the same number. What I am hoping to do is come up with a Cypher query that builds a link between the max ID and the min ID that share the same line number. So basically connecting the end with the beginning. Is there a clever way to do this?
Your question lacks some clarity, but what about thinking along the lines below?
// find all levels in your dataset of nodes in the chains
MATCH (n)
WHERE (n)-[:NEXT]-()
WITH COLLECT(DISTINCT n.level) AS levels
UNWIND levels AS level
// for each level, find the chain
MATCH (start {level:level})-[:NEXT*]->(end {level:level})
WHERE NOT (
({level:level})-[:NEXT]->(start)
OR
(end)-[:NEXT]->({level:level})
)
// connect end to start
MERGE (end)-[:MYRELTYPE]->(start)
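To make the idea behind the query easier to check, here is a minimal in-memory sketch in plain Python (not the Neo4j API). It assumes one chain per level; the function name closing_links and the sample data are hypothetical. A chain start is a node with no incoming same-level NEXT link, and a chain end is a node with no outgoing one.

```python
def closing_links(levels, next_edges):
    """Return one (end, start) pair per level.

    levels: dict mapping node id -> level
    next_edges: iterable of (src, dst) pairs modelling NEXT relationships
    Assumption: each level holds exactly one chain.
    """
    has_incoming, has_outgoing = set(), set()
    for src, dst in next_edges:
        # Only count NEXT links that stay inside one level, as the query does.
        if levels[src] == levels[dst]:
            has_outgoing.add(src)
            has_incoming.add(dst)

    starts, ends = {}, {}
    for node in has_incoming | has_outgoing:
        level = levels[node]
        if node not in has_incoming:
            starts[level] = node  # nothing points at it: chain start
        if node not in has_outgoing:
            ends[level] = node    # it points at nothing: chain end
    return sorted((ends[lvl], starts[lvl]) for lvl in starts if lvl in ends)


# Hypothetical data: two chains, 1->2->3 on level "a" and 4->5 on level "b".
levels = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
next_edges = [(1, 2), (2, 3), (4, 5)]
print(closing_links(levels, next_edges))  # [(3, 1), (5, 4)]
```

Each returned pair corresponds to one relationship the MERGE would create, pointing from the end of a chain back to its start.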
I am searching for the right Cypher query to get the first and last nodes of paths when selecting a node which is in between. The idea is to compress a large impact graph so that only the sources (only outgoing edges = green nodes) and the final consequences (only incoming edges = red nodes) as well as the selected node are displayed.
Here is an illustrative example graph:
Now, when selecting e.g. node d, I would like to receive node d and the first node and last node of every path that node d is part of, as well as the respective (new) relationships, so that the output is the following graph:
Hence, I am searching for a kind of collapsing where the start and end nodes are excluded.
Due to this answer I already know that it is possible to create virtual graphs with apoc.create.vRelationship.
But I am struggling with the identification of the green start nodes and red end nodes as described above, as well as the creation of the desired output.
I am searching for a query where only the node in between (e.g. node d) is a parameter and the output is always like in the second image.
I appreciate every help or inspiration a lot, thank you in advance!
For your illustrated data model (assuming the desired middle node is neither the start nor end node):
MATCH (start)-[:RELATED_TO*]->(middle)-[:RELATED_TO*]->(end)
WHERE
middle.id = 123 AND
NOT EXISTS(()-[:RELATED_TO]->(start)) AND
NOT EXISTS((end)-[:RELATED_TO]->())
RETURN start, middle, end,
apoc.create.vRelationship(start, 'RELATED_TO', {}, middle) as pre_rel,
apoc.create.vRelationship(middle, 'RELATED_TO', {}, end) as post_rel
[UPDATE]
The above query can, unfortunately, create duplicate virtual relationships. This one does not:
MATCH (middle)
WHERE middle.id = 123
MATCH (start)-[:RELATED_TO*]->(middle)
WHERE NOT EXISTS(()-[:RELATED_TO]->(start))
WITH middle, COLLECT(start) AS starts, COLLECT(apoc.create.vRelationship(start, 'RELATED_TO', {}, middle)) AS vr1s
MATCH (middle)-[:RELATED_TO*]->(end)
WHERE NOT EXISTS((end)-[:RELATED_TO]->())
RETURN middle, starts, COLLECT(end) AS ends, vr1s, COLLECT(apoc.create.vRelationship(middle, 'RELATED_TO', {}, end)) AS vr2s
NOTE: You also need to uncheck the "Connect result nodes" option in the Browser Settings (click on the Gear icon in the Browser's left panel), or else some "real" relationships will also be displayed.
This query would return node d (filtering here by a name property just as an example) and all related edge nodes:
MATCH (d {name: "d"})-[:RELATED_TO*]-(n)
WHERE NOT ((n)-[:RELATED_TO]->() AND (n)<-[:RELATED_TO]-())
RETURN d, n
The condition for the edge nodes would be that they don't have :RELATED_TO relationships in both directions.
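The collapsing idea from the answers above can be sketched outside the database as follows. This is a plain Python model under stated assumptions, not APOC or Neo4j code: the graph is a list of directed (src, dst) pairs, and the names reachable, collapse and the sample edges are hypothetical. Sources are upstream nodes with no incoming edge, sinks are downstream nodes with no outgoing edge, and each one gets a single "virtual" edge to or from the middle node.

```python
def reachable(start, adjacency):
    """Return every node reachable from start by following adjacency."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def collapse(middle, edges):
    """Return (sources, sinks, virtual_edges) for the chosen middle node."""
    out_adj, in_adj = {}, {}
    for src, dst in edges:
        out_adj.setdefault(src, []).append(dst)
        in_adj.setdefault(dst, []).append(src)
    # Upstream nodes that nothing points at are the sources.
    sources = sorted(n for n in reachable(middle, in_adj) if n not in in_adj)
    # Downstream nodes that point at nothing are the sinks.
    sinks = sorted(n for n in reachable(middle, out_adj) if n not in out_adj)
    virtual = [(s, middle) for s in sources] + [(middle, t) for t in sinks]
    return sources, sinks, virtual


# Hypothetical impact graph: a and b feed c, c feeds d, d leads to f (via e) and g.
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "g")]
sources, sinks, virtual = collapse("d", edges)
print(sources, sinks, virtual)
```

Because each source and sink is collected once in a set, the sketch yields one virtual edge per endpoint, which mirrors why the second query collects starts and ends separately.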
I'm new to Neo4j. I've read a couple of tutorials, but I am stuck on finding all paths from a node until it reaches another node where the status changes, with a different path each time.
I've made a picture:
Starting from the node at the top, I would like to find all nodes T that have status = 1. We move from a node of type O to T along an 'o' relationship, and from T to O along 'i' relationships. If we reach a node T with status = 0, then we follow the 'i' relationship and check whether the next T has status = 1, and so on.
I don't know the depth of the graph. I've found in the manual that we can use [r*1..], but I am not sure how to use it here.
I have tried
match (o1:O)-[:o]-(t:T), (t)-[:i]-(o2:O)-[:o]-(t2:T)
return o1, t, o2, t2
for the first depth, but I don't know how to do it with unknown depth and make it go deeper as long as the status is not 1.
Your schema looks like so (the question mark means I'm not sure what relationship you wanted there).
(:O)<-[:o]-(:T)<-[:i]-(:O)<-[:o]-(:T)<-[:?]-(:T)
You need to somehow identify the first node from which you start, and I'm not sure exactly which nodes you are trying to get from the schema, but something like this would return all nodes with status 1 that are somehow connected to the first node, which here is just identified by having status 0 (so it might actually be more than one node).
MATCH (firstnode:O {Status: 0})<-[:o|:i*..]-(othernodes) WHERE othernodes.Status=1 RETURN othernodes
But be warned: an unbounded variable-length pattern such as *.. can take a very long time to run on a large or densely connected graph.
I have a graph where some nodes were created out of an error in the app.
I want to delete those nodes (they represent a log), but I can't figure out how to loop through the nodes.
I don't know how to access nodes in a collection of paths, and I need to do that in order to compare one node to another.
match (o:Order{id:123})
match (o)-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
with collect((l:Log)-[:STATUS]->(os:OrderStatus)) as logs
I want to access each one of the nodes in the paths to perform a comparison. There are normally 5 or 6 (l)-[:STATUS]->(os) paths for each Order.
How can I access the (l) and (os) nodes of each path, to compare their properties?
For example, if I had this collection of paths in one of the Orders:
(log1)-[:STATUS]->(os1)
(log2)-[:STATUS]->(os2)
(log3)-[:STATUS]->(os3)
(log4)-[:STATUS]->(os2) <-- This is the error
(log5)-[:STATUS]->(os4)
So, from the collection of paths above, I'd want to detach delete the (log4), because the (os2) node is lower than the previous one (os3), and should be greater.
And after that, I want to attach the (log3) to the (log5)
NOTE: Each one of the (os) nodes has an id that represents the "status", ranging from 1 to 5. Also, the (log) nodes are ordered by their created datetime.
Any idea on how to do this? Thank you in advance guys!
EDIT
I didn't mention some other scenarios I had. This is one of them:
Based on @cybersam's answer, I found out how to work it out.
I had to run 2 separate queries to make it work, but the principle is the same, and is as follows:
Create new relationships:
MATCH(o:Order)-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
WHERE SIZE((o)-[:STATUS_CHANGE*]->()-[:STATUS]->(os)) >= 1
WITH o, os, COLLECT(l)[0] AS keep
WITH o, collect(keep) AS k
FOREACH(i IN range(0,size(k)-1) |
FOREACH(a IN [k[i]] |
FOREACH(b IN [k[i+1]] |
FOREACH(c IN CASE WHEN b IS NOT NULL THEN [1] END | MERGE (a)-[:STATUS_CHANGE]->(b) ))));
Delete the excess nodes:
MATCH(o:Order)-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
WHERE (os)<-[:STATUS]-()-[:STATUS_CHANGE*]->(l)-[:STATUS]->(os)
WITH o, os, COLLECT(l) AS exceed
UNWIND exceed AS del
detach delete del;
These queries worked in every scenario.
Assuming all your errors follow the same pattern (the unwanted Log nodes are always referencing an "older" OrderStatus), this may work for you:
MATCH (o:Order{id:123})-[:STATUS_CHANGE*]->(l:Log)-[:STATUS]->(os:OrderStatus)
WHERE SIZE(()-[:STATUS]->(os)) > 1
WITH os, COLLECT(l) AS logs
UNWIND logs[1..] AS unwanted
OPTIONAL MATCH (x)-[:STATUS_CHANGE]->(unwanted)-[:STATUS_CHANGE]->(y)
DETACH DELETE unwanted
FOREACH(ignored IN CASE WHEN x IS NOT NULL THEN [1] END | CREATE (x)-[:STATUS_CHANGE]->(y))
This query:
Finds (in order) all relevant OrderStatus nodes having multiple STATUS relationships.
Uses the aggregating function COLLECT to collect (in order) the Log nodes related to each of those OrderStatus nodes.
Uses UNWIND logs[1..] to get the individual unwanted Log nodes.
Uses OPTIONAL MATCH to get the 2 nodes that may need to be connected together, after the unwanted node is deleted.
Uses DETACH DELETE to delete each unwanted node and its relationships.
Uses FOREACH to connect together the pair of nodes that might have been found by the OPTIONAL MATCH.
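The steps above can be modelled in a few lines of plain Python, as a rough sketch rather than a replacement for the query. It assumes the Log nodes arrive in creation order as (log_id, status_id) pairs and that every error follows the pattern described (a later Log pointing at an already used OrderStatus); the name clean_log_chain and the sample data are hypothetical.

```python
def clean_log_chain(logs):
    """Split logs into kept and unwanted ones, and rebuild the chain links.

    logs: list of (log_id, status_id) pairs in creation order.
    Returns (kept, unwanted, links), where links are the STATUS_CHANGE
    pairs that should exist between consecutive kept logs.
    """
    seen_statuses = set()
    kept, unwanted = [], []
    for log_id, status_id in logs:
        if status_id in seen_statuses:
            # A later log pointing at an already used status is the error case.
            unwanted.append(log_id)
        else:
            seen_statuses.add(status_id)
            kept.append(log_id)
    # Consecutive kept logs are linked, bridging any deleted node.
    links = list(zip(kept, kept[1:]))
    return kept, unwanted, links


# The example from the question: log4 points back at status 2.
logs = [("log1", 1), ("log2", 2), ("log3", 3), ("log4", 2), ("log5", 4)]
kept, unwanted, links = clean_log_chain(logs)
print(unwanted)  # ['log4']
print(links)     # [('log1', 'log2'), ('log2', 'log3'), ('log3', 'log5')]
```

The last link, (log3, log5), is the reconnection the question asks for once log4 is removed.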
I am creating a simple graph db for transportation between a few cities. My structure is:
Station = physical station
Stop = each station has several stops, depending on time and line ID
Ride = connection between stops
I need to find a route from city A to city C. There is no direct stop connection between them, but they are connected through city B. Please see the picture; as a new user I can't post images in the question.
How can I get a route from City A with STOP 1 connected by RIDE 1 to STOP 2, then
STOP 2 connected through the same City B to STOP3, and finally from STOP3 by RIDE2 to STOP4 (City C)?
Thank you.
UPDATE
The solution from Vince is OK, but I need to set a filter on the STOP nodes for departure time, something like
MATCH p=shortestPath((a:City {name:'A'})-[*{departuretime>xxx}]-(c:City {name:'C'})) RETURN p
Is it possible to do this without iterating over the whole collection of matches? That is too slow.
If you are simply looking for a single route between two nodes, this Cypher query will return the shortest path between two City nodes, A and C.
MATCH p=shortestPath((a:City {name:'A'})-[*]-(c:City {name:'C'})) RETURN p
In general if you have a lot of potential paths in your graph, you should limit the search depth appropriately:
MATCH p=shortestPath((a:City {name:'A'})-[*..4]-(c:City {name:'C'})) RETURN p
If you want to return all possible paths you can omit the shortestPath clause:
MATCH p=(a:City {name:'A'})-[*]-(c:City {name:'C'}) RETURN p
The same caveats apply. See the Neo4j documentation for full details
Update
After your subsequent comment.
I'm not sure what the exact purpose of the time property is here, but it seems as if you actually want to create the shortest weighted path between two nodes, based on some minimum time cost. This is different of course to shortestPath, because that minimises on the number of edges traversed only, not the cost of those edges.
You'd normally model the traversal cost on edges, rather than nodes, but your graph has time only on the STOP nodes (and not for example on the RIDE edges, or the CITY nodes). To make a shortest weighted path query work here, we'd need to also model time as a property on all nodes and edges. If you make this change, and set the value to 0 for all nodes / edges where it isn't relevant then the following Cypher query does what I think you need.
MATCH p=(a:City {name: 'A'})-[*]-(c:City {name:'C'})
RETURN p AS shortestPath,
reduce(time=0, n in nodes(p) | time + n.time) AS m,
reduce(time=0, r in relationships(p) | time + r.time) as n
ORDER BY m + n ASC
LIMIT 1
In your example graph this produces a least cost path between A and C:
(A)->(STOP1)-(STOP2)->(B)->(STOP5)->(STOP6)->(C)
with a minimum time cost of 230.
This path includes two stops you have designated "bad", though I don't really understand why they're bad, because their traversal costs are less than other stops that are not "bad".
Or, use Dijkstra
This simple Cypher will probably not be performant on densely connected graphs. If you find that performance is a problem, you should use the REST API and the path endpoint of your source node, and request a shortest weighted path to the target node using Dijkstra's algorithm. Details here
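For readers unfamiliar with it, here is a minimal sketch of Dijkstra's algorithm in plain Python. It only illustrates the technique and is not the Neo4j REST endpoint mentioned above. It assumes non-negative costs stored on edges, and the function name dijkstra, the node names, and the costs are all hypothetical, not taken from the graph in the question.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if none.

    graph: dict mapping node -> list of (neighbour, cost) pairs,
    with non-negative costs.
    """
    queue = [(0, source, [source])]
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at least as cheaply
        settled[node] = cost
        for neighbour, weight in graph.get(node, ()):
            heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical network: the route with more hops is the cheaper one.
graph = {
    "A": [("S1", 10), ("S2", 50)],
    "S1": [("B", 20)],
    "S2": [("C", 100)],
    "B": [("S3", 5)],
    "S3": [("C", 30)],
}
print(dijkstra(graph, "A", "C"))  # (65, ['A', 'S1', 'B', 'S3', 'C'])
```

Unlike shortestPath, which would prefer the three-node route A, S2, C, the weighted search picks the longer route because its summed cost is lower.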
Ah ok, if the requirement is to find paths through the graph where the departure time at every stop is no earlier than the departure time of the previous stop, this should work:
MATCH p=(:City {name:'A'})-[*]-(:City {name:'C'})
MATCH (a:Stop) where a in nodes(p)
MATCH (b:Stop) where b in nodes(p)
WITH p, a, b order by b.time
WITH p as ps, collect(distinct a) as as, collect(distinct b) as bs
WHERE as = bs
WITH ps, last(as).time - head(as).time as elapsed
RETURN ps, elapsed ORDER BY elapsed ASC
This query works by matching every possible path, and then collecting all the stops on each matched path twice over. One of these collections of stops is ordered by departure time, while the other is not. Only if the two collections are equal (i.e. number and order) is the path admitted to the results. This step evicts invalid routes. Finally, the paths themselves are ordered by least elapsed time between the first and last stop, so the quickest route is first in the list.
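The filtering rule can be illustrated with a small plain-Python sketch. It assumes each candidate path has already been reduced to its stops, in path order, as (name, departure_time) pairs; the name valid_routes and the sample times are hypothetical. A path is kept only when its departure times are already in non-decreasing order, which is the same comparison of an ordered and an unordered collection that the query performs.

```python
def valid_routes(paths):
    """Keep paths whose departure times never go backwards.

    paths: list of paths, each a list of (stop_name, departure_time)
    in path order. Returns (elapsed, stop_names) tuples, quickest first.
    """
    results = []
    for stops in paths:
        times = [time for _, time in stops]
        # Comparing against the sorted copy rejects any out-of-order stop.
        if times and times == sorted(times):
            elapsed = times[-1] - times[0]
            results.append((elapsed, [name for name, _ in stops]))
    return sorted(results)


# Hypothetical candidate paths; the second one departs S5 before S1.
paths = [
    [("S1", 100), ("S2", 130), ("S3", 150), ("S4", 200)],
    [("S1", 100), ("S5", 90), ("S6", 120)],
    [("S1", 100), ("S7", 110), ("S8", 160)],
]
routes = valid_routes(paths)
print(routes)
```

Here the second path is rejected, and the remaining two are returned with the 60-unit route ahead of the 100-unit one.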
Normal warnings about performance, etc. apply :)
My graph looks like this
medium-[:firstChapter]->chapter1-[:nextChapter]->chapter2_to_N
there is only one node connected via :firstChapter and then several nodes may follow, connected via :nextChapter
I tried to match all nodes that are either connected via relationship :firstChapter to medium or connected via :nextChapter from one chapter to another
The query I tried looks like this
start n=node(543) match n-[:firstChapter|nextChapter*]->m return m;
node(543) is the node medium.
Surprisingly, this query returns all nodes in the path, even though the nodes are not connected to n (=medium node)
If I leave out the * sign after nextChapter, only the first node with the :firstChapter relationship is returned (chapter1), which seems to be correct.
start n=node(543) match n-[:firstChapter|nextChapter*]->m return m;
Why does the query above return nodes not connected to n? As far as I understand it, the * sign usually returns nodes that are an unlimited number of relationships away, right?
What is the best way to match all nodes of a path (only once) that are either connected via :firstChapter or :nextChapter to a start node? In this case all chapters
The query above serves that purpose, but I don't think the output is correct...
EDIT:
Added a diagram to clarify.
As you can see, the first chapter may only be reached via :firstChapter,
So it is still unclear, why the query above returns ALL chapter nodes
Try doing match p=n-[:firstChapter|nextChapter*]->m to see what p is. Hopefully that provides some insight about how they are connected.
What you might be looking for in the end is:
start n=node(543)
match n-[:firstChapter|nextChapter*]->m
return collect(distinct m);
To get a collection of distinct chapter nodes that follow n.
update
Here's another one--didn't actually test it but it might get you closer to what you want:
start n=node(543)
match n-[:firstChapter]->f-[:nextChapter*]-m
return f + collect(distinct m);
Using the * operator, the query looks for all relationships along the line for both relationship types, :firstChapter and :nextChapter (not just :nextChapter). Your chapter data for node(543) likely contains some relationships to chapter nodes not in the 543 chain and the query is returning them.
Consider adding an extra relationship using type :nextChapter to connect the start node to the first chapter, and check the relationships that exist on your chapters.
Then run:
start n=node(543)
match n-[:nextChapter*]->m
return m;
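and see if you still get extra results. If so, you could run the following, bumping up the upper bound each time until you find the node that has the extra relationship(s). In the pattern below, the n in *1..n stands for that upper bound and must be replaced with an integer literal; it is not the node variable n. (I'm sure there are other ways, though!)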
start n=node(543)
match n-[:nextChapter*1..n]->m
return m;