I'm new to Neo4j but can see many possibilities for graph databases, in particular for IT data workflows and system impact analysis. However, I'm unsure of the correct design for maximum efficiency.
Consider a system that takes in files, processes them, stores them in a database and makes the data available in various reports. However, depending on the file, the data may appear in one report but not the other.
System Architecture and Reality
An important use case is to be able to report the impact on downstream reports if upstream files are missing or components that process those files fail.
Test Cases
I have come up with 4 designs, 3 of which seem to work, but I'm unsure which is best.
Design 1
Design 2
Design 3
Design 4
Would appreciate any help or advice on this.
Code used:
// ---------------------------------------------------------------------------
// Design Experiments
// ---------------------------------------------------------------------------
// 1. Combination of the workflows with shared nodes where they interact
//    with the same Process or DataStore
// ---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[r*]->(rp:Report) RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[r*]->(rp:Report) RETURN rp
// 2. Same node relationship design as #1, but assign a workflow property
//    to each node and relationship as a property array
// ---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1", workflow: ["workflow1","workflow2"]})
CREATE (p2:Provider {name: "Provider 2", workflow: ["workflow3"]})
CREATE (f1:File {name: "File 1", workflow: ["workflow1"]})
CREATE (f2:File {name: "File 2", workflow: ["workflow2"]})
CREATE (f3:File {name: "File 3", workflow: ["workflow3"]})
CREATE (pp:PreProcess {name: "PreProcess", workflow: ["workflow1"]})
CREATE (p:Process {name: "Process", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (d:DataStore {name: "DataStore", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (rA:Report {name: "Report A", workflow: ["workflow1","workflow3"]})
CREATE (rB:Report {name: "Report B", workflow: ["workflow2"]})
CREATE (p1)-[:PROVIDES{workflow: ["workflow1"]}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: ["workflow2"]}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: ["workflow3"]}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: ["workflow3"]}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: ["workflow1","workflow2","workflow3"]}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow1","workflow3"]}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(rB)
// Show individual workflows (list membership via IN; filter() is not available in newer Neo4j versions)
MATCH (p) WHERE "workflow1" IN p.workflow RETURN p
MATCH (p) WHERE "workflow2" IN p.workflow RETURN p
MATCH (p) WHERE "workflow3" IN p.workflow RETURN p
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"}) WITH a.workflow AS workflows
MATCH (r:Report) WHERE any(x IN r.workflow WHERE x IN workflows)
RETURN r
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"}) WITH a.workflow AS workflows
MATCH (r:Report) WHERE any(x IN r.workflow WHERE x IN workflows)
RETURN r
// 3. Same node relationship design as #1, but create a relationship
//    with a workflow property for each workflow, resulting in multiple
//    relationships between nodes.
// ---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: "workflow1"}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow1"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow2"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow3"}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow3"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow2"}]->(rB)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp
// 4. Distinct set of nodes and relationships for each workflow, but all
//    with the same node type so they can still be matched
// ---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 1"})
CREATE (p3:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp1:PreProcess {name: "PreProcess"})
CREATE (pc1:Process {name: "Process"})
CREATE (pc2:Process {name: "Process"})
CREATE (pc3:Process {name: "Process"})
CREATE (d1:DataStore {name: "DataStore"})
CREATE (d2:DataStore {name: "DataStore"})
CREATE (d3:DataStore {name: "DataStore"})
CREATE (rA1:Report {name: "Report A"})
CREATE (rB2:Report {name: "Report B"})
CREATE (rA3:Report {name: "Report A"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p2)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p3)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp1)
CREATE (pp1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pc1)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(pc2)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(pc3)
CREATE (pc1)-[:DELIVERS_TO{workflow: "workflow1"}]->(d1)
CREATE (pc2)-[:DELIVERS_TO{workflow: "workflow2"}]->(d2)
CREATE (pc3)-[:DELIVERS_TO{workflow: "workflow3"}]->(d3)
CREATE (d1)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA1)
CREATE (d2)-[:DELIVERS_TO{workflow: "workflow2"}]->(rB2)
CREATE (d3)-[:DELIVERS_TO{workflow: "workflow3"}]->(rA3)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j*]->(rp:Report) RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j*]->(rp:Report) RETURN rp
Following recommendation, have expanded Design 1 to include a direct link between File and Report.
Design 1a
// 1a. Combination of the workflows with shared nodes where they interact
//     with the same Process or DataStore.
// ---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
CREATE (f1)-[:USED_BY{}]->(rA)
CREATE (f2)-[:USED_BY{}]->(rB)
CREATE (f3)-[:USED_BY{}]->(rA)
// Show impacted reports (and path) if Provider 1 is down
MATCH path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report
// Show impacted reports (and path) if Provider 2 is down
MATCH path = (:Provider{name:'Provider 2'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report
You've done some thorough exploration here, and you've found designs for which your queries work. There is a cost to each of them, however.
Design 2's queries don't use relationships at all, so the solution doesn't seem very graphy. It also requires you to ensure the workflow lists on the relevant nodes are kept in sync and up to date. That seems to have a higher maintenance cost.
Design 3 has a similar cost, but now the properties are on the relationships, and you also have to provide redundant relationships throughout your model, so the cost is higher.
Design 4 requires redundancy of each used step in the process, where every subgraph is a single path from provider to report. While that is easy to understand and query over, redundant nodes and relationships probably aren't the way to go.
Design 1 is interesting in that it provides the correct answers, but only to certain questions: questions about impacts from processors, preprocessors, and datastores in the path, that is, what happens when these hardware and software components go down.
However it doesn't work for data lineage/dependence. Not yet. You may want to consider altering design 1 so that there are separate paths to consider for data dependence vs what you already have for the pipeline process.
Data dependence can be a different thing. If you're asking questions about this, then you're mostly concerned with the inputs and outputs, files to reports. In that case you might consider creating a :USED_BY relationship between the relevant file and report nodes.
Consider adding this in to design 1's creation script at the end:
match (f:File), (r:Report{name:'Report A'})
where f.name in ['File 1', 'File 3']
create (r)<-[:USED_BY]-(f)
and
match (f:File), (r:Report{name:'Report B'})
where f.name in ['File 2']
create (r)<-[:USED_BY]-(f)
For questions about the data lineages, your queries can use only the relevant relationships, in this case :PROVIDES and :USED_BY.
match path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
return path, r.name as report
Or the inverse, what sources does a report draw upon?
match path = (p:Provider)-[:PROVIDES|USED_BY*]->(r:Report{name:'Report A'})
return path, p.name as provider
And if your model changes so that intermediary reports are modeled (the output of preprocess and process operations), then you can create :USED_BY relationships to those in a chain from the :File to the :Report (instead of directly between the :File and :Report) so you'll see the chain of dependencies during the processing.
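To make the split between pipeline impact and data dependence concrete, here is a minimal sketch of the two kinds of question, assuming the Design 1a data from the question (the :DELIVERS_TO pipeline plus the :USED_BY links from files to reports). The column name impactedReport is only an illustrative alias. The two queries are separate statements.

```cypher
// Data dependence: which reports are affected if File 2 does not arrive?
// Only the :USED_BY links are followed, so the shared Process and
// DataStore nodes do not drag in unrelated reports.
MATCH (f:File {name: 'File 2'})-[:USED_BY*]->(r:Report)
RETURN DISTINCT r.name AS impactedReport;

// Pipeline impact: which reports are affected if the Process component fails?
// Here the :DELIVERS_TO links are the relevant ones.
MATCH (c:Process {name: 'Process'})-[:DELIVERS_TO*]->(r:Report)
RETURN DISTINCT r.name AS impactedReport;
```

On that data I would expect the first query to return only Report B and the second to return both reports, since every report is fed through the single Process node.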
Related
Initial Situation
Large Neo4j 3.4.6 graph with a tree-like structure (10 levels deep, 10 million nodes).
Without exception, all nodes are connected to each other. All nodes are of the same type, and so are all relationships.
Exactly one central root node.
Reduced and simplified example:
Graphic representation
CREATE (Root:CustomType {name: 'Root'})
CREATE (NodeA:CustomType {name: 'NodeA'})
CREATE (NodeB:CustomType {name: 'NodeB'})
CREATE (NodeC:CustomType {name: 'NodeC'})
CREATE (NodeD:CustomType {name: 'NodeD'})
CREATE (NodeE:CustomType {name: 'NodeE'})
CREATE (NodeF:CustomType {name: 'NodeF'})
CREATE (NodeG:CustomType {name: 'NodeG'})
CREATE (NodeH:CustomType {name: 'NodeH'})
CREATE (NodeI:CustomType {name: 'NodeI'})
CREATE (NodeJ:CustomType {name: 'NodeJ'})
CREATE (NodeK:CustomType {name: 'NodeK'})
CREATE (NodeL:CustomType {name: 'NodeL'})
CREATE (NodeM:CustomType {name: 'NodeM'})
CREATE (NodeN:CustomType {name: 'NodeN'})
CREATE (NodeO:CustomType {name: 'NodeO'})
CREATE (NodeP:CustomType {name: 'NodeP'})
CREATE (NodeQ:CustomType {name: 'NodeQ'})
CREATE
(Root)-[:CONTAINS]->(NodeA),
(Root)-[:CONTAINS]->(NodeB),
(Root)-[:CONTAINS]->(NodeC),
(NodeA)-[:CONTAINS]->(NodeD),
(NodeA)-[:CONTAINS]->(NodeE),
(NodeA)-[:CONTAINS]->(NodeF),
(NodeE)-[:CONTAINS]->(NodeG),
(NodeE)-[:CONTAINS]->(NodeH),
(NodeF)-[:CONTAINS]->(NodeI),
(NodeF)-[:CONTAINS]->(NodeJ),
(NodeF)-[:CONTAINS]->(NodeK),
(NodeI)-[:CONTAINS]->(NodeL),
(NodeI)-[:CONTAINS]->(NodeM),
(NodeJ)-[:CONTAINS]->(NodeN),
(NodeK)-[:CONTAINS]->(NodeO),
(NodeK)-[:CONTAINS]->(NodeP),
(NodeM)-[:CONTAINS]->(NodeQ);
Challenge to be solved
By means of a MATCH-WITH-UNWIND Cypher query I’m successfully able to select a subtree and bind it to a path. Let’s say the subtree spans the nodes A, E, F, I and J.
Based on this path I need all leaves of the subtree, not the leaves of the complete tree.
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'}) /* simplified */
WITH
nodes(path) as selectedPath
/* here: necessary magic to identify the leaf nodes of the subtree */
RETURN
leafNode;
Among other things I tried to solve the requirement with a WHERE NOT(node-->()) approach, but realized this works for leaves of the complete tree only. Unfortunately I was not able to convince the WHERE NOT(node-->()) clause to respect the selected subtree boundaries.
So, how can I find all leaves of a selected subgraph with Cypher and Neo4j? Can you please give me some advice on how to solve this challenge? Many thanks in advance for pointing me in the right direction!
You correctly noted that checking for nodes with no children only works for the entire tree. Instead, go through all the relationships in the subtree and find the nodes that appear as the end of a relationship but never as the start of one:
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'})
UNWIND relationships(path) AS r
WITH collect(DISTINCT endNode(r)) AS endNodes,
collect(DISTINCT startNode(r)) AS startNodes
UNWIND endNodes AS leaf
WITH leaf WHERE NOT leaf IN startNodes
RETURN leaf
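If the subtree is available as a set of nodes rather than as paths, the same idea can be written as "a leaf is a subtree node with no child inside the subtree". The following is only a sketch under that assumption: the explicit name list stands in for whatever query actually selects the subtree, and the variable names subtree, candidate and child are illustrative.

```cypher
// Stand-in for the real subtree selection (assumption: nodes A, E, F, I, J)
MATCH (n:CustomType)
WHERE n.name IN ['NodeA', 'NodeE', 'NodeF', 'NodeI', 'NodeJ']
WITH collect(n) AS subtree
UNWIND subtree AS candidate
// Keep a candidate only if none of its children belongs to the subtree
WITH candidate, subtree
WHERE NOT any(child IN subtree WHERE (candidate)-[:CONTAINS]->(child))
RETURN candidate.name AS leaf
```

With the example tree from the question this should yield NodeE, NodeI and NodeJ: their children (G/H, L/M and N) exist in the full tree but lie outside the selected subtree. Unlike the relationship-based version, this also handles a subtree that consists of a single node.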
Relatively new to Neo4j. I realize the way I originally posted this it was too ambiguous. Below is hopefully a better explanation.
//Subgraph 1
Create (p1:Person {name: 'Person1'})
Create (p2:Person {name: 'Person2'})
Create (a1:Address {street: 'Suspicious'})
Create (p1)-[:Resides]->(a1)
Create (p2)-[:Resides]->(a1)
//Subgraph 2
Create (p3:Person {name: 'Person3'})
Create (p4:Person {name: 'Person4'})
Create (a2:Address {street: 'Double'})
Create (p3)-[:Resides]->(a2)
Create (p4)-[:Resides]->(a2)
Create (p3)-[:Knows]->(p4)
//Subgraph 3
Create (p5:Person {name: 'Person5'})
Create (a3:Address {street: 'Single'})
Create (p5)-[:Resides]->(a3)
What I would like to write is a query to detect the following:
- All addresses (and people) that have 2 or more People residing there that do not know each other.
This means that only Subgraph1 should be found.
Subgraph2 would not be found because there are 2 people that reside there but they know each other.
Subgraph3 would not be found because there is only 1 person residing there.
Again, thanks for the help.
This Cypher query should work (relationship types are case-sensitive, so it uses :Resides and :Knows exactly as in your sample data):
MATCH (n1:Person)-[:Resides]->(:Address)<-[:Resides]-(n2:Person)
WHERE NOT (n1)-[:Knows]-(n2)
RETURN n1, n2
Start by matching on nodes that have a Resides relationship to the same node, then filter out the pairs that have a Knows relationship in either direction.
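Since the question also asks for the addresses, here is a hedged variant that returns each address together with the pairs of residents who do not know each other. It uses the :Resides and :Knows types exactly as they appear in the sample data. It assumes that Person names are unique, which is what lets `p1.name < p2.name` drop the mirrored duplicate of each pair; the aliases address and strangerPairs are illustrative.

```cypher
MATCH (p1:Person)-[:Resides]->(a:Address)<-[:Resides]-(p2:Person)
WHERE p1.name < p2.name            // each unordered pair only once
  AND NOT (p1)-[:Knows]-(p2)       // no Knows link in either direction
RETURN a.street AS address,
       collect([p1.name, p2.name]) AS strangerPairs
```

On the sample data this should return a single row for the 'Suspicious' address with the pair Person1/Person2. Note that for three or more residents it reports every pair of strangers, so an address is listed as soon as any two of its residents do not know each other.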
I'm setting up a graph structure with transformers that 'require' and 'produce' 1 or more Kafka topics. I can define the graph structure ok, but I'd like some help with a query.
I'd like to query: what chain of transformers and topics are required to create a certain topic, for instance in the sample below, what transformers are required to produce Topic3. I'd expect
Ingest1->Topic7->T1->Topic1->T2->Topic3
The first answer below isn't quite correct, because it doesn't take into account the alternating directions of requires and produces.
A correct query up to a certain depth would be something like
MATCH (topic:Topic{name:"topic-3"})
<-[:produces]- (tr1) -[:requires]->(tp1)
<-[:produces]- (tr2) -[:requires]->(tp2)
<-[:produces]- (tr3)
return [topic,tr1,tp1,tr2,tp2,tr3] as List
So it seems I'm looking for something that can repeat the paired produces/requires relationships.
Here's some data that I'm playing with.
CREATE (DB1:Database {backbone: true, name:"postgres db 1"})
CREATE (Ingest1:Ingest {backbone: true, name: "ingest-1"})
CREATE (KV1:KV {name: "key-value store 1"})
CREATE (KV2:KV {name: "key-value store 2"})
CREATE (KV1)-[:requires]->(DB1)
CREATE (KV2)-[:requires]->(DB1)
CREATE (Topic1:Topic {name: "topic-1", partitions:100})
CREATE (Topic2:Topic {name: "topic-2", partitions:100})
CREATE (Topic3:Topic {name: "topic-3", partitions:100})
CREATE (Topic4:Topic {name: "topic-4", partitions:100})
CREATE (Topic5:Topic {name: "topic-5", partitions:100})
CREATE (Topic6:Topic {name: "topic-6", partitions:100})
CREATE (Topic7:Topic {name: "topic-7", partitions:100})
CREATE (Topic8:Topic {name: "topic-8", partitions:100})
CREATE (T2:Transformer {name: "T2"})
CREATE (T1:Transformer {name: "T1"})
CREATE (T3:Transformer {name: "T3"})
CREATE (T4:Transformer {name: "T4"})
CREATE (T5:Transformer {name: "T5"})
CREATE (T6:Transformer {name: "T6"})
CREATE (T7:Transformer {name: "T7"})
CREATE (T8:Transformer {name: "T8"})
CREATE (T9:Transformer {name: "T9"})
CREATE (T4)-[:requires]->(Topic3)
CREATE (T5)-[:requires]->(Topic3)
CREATE (T2)-[:produces]->(Topic3)
CREATE (T2)-[:produces]->(Topic4)
CREATE (T2)-[:produces]->(KV1)
CREATE (T2)-[:requires]->(Topic1)
CREATE (T4)-[:produces]->(Topic5)
CREATE (T2)-[:requires]->(Topic2)
CREATE (T1)-[:produces]->(Topic1)
CREATE (T1)-[:requires]->(Topic7)
CREATE (T3)-[:produces]->(Topic2)
CREATE (T3)-[:requires]->(Topic8)
CREATE (Ingest1)-[:produces]->(Topic7)
CREATE (Ingest1)-[:produces]->(Topic8);
How about something like this?
// find the transformer from the selected topic
MATCH (topic3:Topic {name: "topic-3"})<-[:produces]-(transformer:Transformer)
// find the path(s) back from the transformer to the ingest
MATCH p=(transformer)-[:produces|requires*]-(i:Ingest)
// put the names in a collection from topic3 back to ingest
WITH reduce(chain = [topic3.name], n in nodes(p) | chain + n.name) as chain
// return the collection in the desired order
RETURN reverse(chain)
It could be simplified to this as well. Note that the path now starts at topic-3 itself, so the chain starts empty to avoid listing that topic twice:
MATCH p=(topic3:Topic {name: "topic-3"})-[:produces|requires*]-(i:Ingest)
WITH reduce(chain = [], n in nodes(p) | chain + n.name) as chain
RETURN reverse(chain)
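The undirected variable-length match does not enforce the alternating produces/requires directions that the question asks about. One way to add that constraint in plain Cypher is to filter the path afterwards, checking each hop by its position. This is only a sketch: it expands undirected paths first and filters them second, so it may be slow on large graphs, and the alias chain is illustrative.

```cypher
MATCH p = (topic:Topic {name: "topic-3"})-[:produces|requires*]-(i:Ingest)
WHERE all(idx IN range(0, length(p) - 1) WHERE
  CASE idx % 2
    // even hops: the next node must PRODUCE the current node
    WHEN 0 THEN type(relationships(p)[idx]) = 'produces'
            AND endNode(relationships(p)[idx]) = nodes(p)[idx]
    // odd hops: the current node must REQUIRE the next node
    ELSE type(relationships(p)[idx]) = 'requires'
     AND startNode(relationships(p)[idx]) = nodes(p)[idx]
  END)
RETURN [n IN reverse(nodes(p)) | n.name] AS chain
```

On the sample data this should return two chains, one through topic-7, T1 and topic-1 and one through topic-8, T3 and topic-2, both ending in T2 and topic-3, because T2 requires both topic-1 and topic-2.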
I have two graphs built like this :
CREATE (level1a:Bug {name: 'a'})
CREATE (level1b:Bug {name: 'b'})
CREATE (level2c:Bug {name: 'c'})
CREATE (level2d:Bug {name: 'd'})
CREATE (level3e:Bug {name: 'e'})
CREATE (level3f:Bug {name: 'f'})
CREATE (level3g:Bug {name: 'g'})
CREATE (level3h:Bug {name: 'h'})
CREATE (level1a)-[:LINK]->(level2c)
CREATE (level1b)-[:LINK]->(level2d)
CREATE (level2c)-[:LINK]->(level3e)
CREATE (level2c)-[:LINK]->(level3f)
CREATE (level2d)-[:LINK]->(level3g)
CREATE (level2d)-[:LINK]->(level3h)
And also available here : http://console.neo4j.org/?id=duplicate_bug2
When I execute the query :
MATCH (a:Bug {name: 'a'})-[:LINK]->()-[:LINK]->(end) return end
I get the expected two nodes (f and e). But if I do two match queries like this :
MATCH (a:Bug {name: 'a'})-[:LINK]->()-[:LINK]->(end)
MATCH (b:Bug {name: 'b'})-[:LINK]->()-[:LINK]->(end2)
return end, end2
I get duplicate nodes in end and end2. Why is this? The two graphs are not even connected!
BR,
S
Both matches return multiple rows, and there is no correlation between the two MATCH clauses, so Cypher generates a cross product of the two result sets. In this case it is 2x2, so you get four rows, pairing each end node with each end2 node.
I think what you are after is something like this query. It finds all of the ends from the first match, combines them in a collection and then repeats the process for the second match. Then it returns a single row in the result set with all of the ends of a and all of the ends of b regardless of how many there are at the end of each match.
MATCH (a:Bug {name: 'a'})-[:LINK]->()-[:LINK]->(end)
with collect(end) as end
MATCH (b:Bug {name: 'b'})-[:LINK]->()-[:LINK]->(end2)
return end, collect(end2) as end2
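The cross product can be reproduced without any graph data at all. This small sketch uses two literal lists as stand-ins for the two result sets (the letters simply mirror the node names in the question):

```cypher
// Two independent row sources, just like two unrelated MATCH clauses
UNWIND ['e', 'f'] AS end
UNWIND ['g', 'h'] AS end2
RETURN end, end2
// 2 rows x 2 rows = 4 rows: (e,g), (e,h), (f,g), (f,h)
```

Each value of end is paired with each value of end2, which is exactly why every node appears twice in the original result.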
According to the neo4j documentation:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
This sounds great, but CREATE UNIQUE doesn't seem to follow the 'least possible change' rule. e.g., here is some Cypher to create two people:
CREATE (alice:Person {name: 'Alice'})
CREATE (bob:Person {name: 'Bob'})
CREATE INDEX ON :Person(name)
and here's two CREATE UNIQUE statements, to create a relationship between those people. Since both people already exist in the graph, only the relationships should be newly created:
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)-[:knows]->(b:Person {name: 'Bob'})
RETURN a
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)<-[:knows]-(b:Person {name: 'Bob'})
RETURN a
After this, the graph should look like
(Alice)<---KNOWS--->(Bob).
But when you run a MATCH query:
MATCH (a:Person)
RETURN a
it seems that the graph now looks like
(Bob)
(Bob)--KNOWS-->(Alice)--KNOWS-->(Bob);
two extra Bobs have been created.
I looked a bit through the other Cypher commands, but none of them seem intended for this use case: create a link between existing node A and existing node B if B exists, and otherwise create a link between existing node A and a newly created node B. How can this problem best be solved within the Cypher framework?
This query should do what you want (if you always want to end up with a single knows relationship between the 2 nodes):
MATCH (a:Person {name: 'Alice'})
MERGE (b:Person {name: 'Bob'})
MERGE (a)-[:knows]->(b)
RETURN a;
Here is how you can do it with CREATE UNIQUE
MATCH (a:Person {name: 'Alice'}), (b:Person {name:'Bob'})
CREATE UNIQUE (a)-[:knows]->(b), (b)-[:knows]->(a)
You need to match both nodes first; otherwise the CREATE UNIQUE statement always creates the node described in its pattern instead of matching the existing one.
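As far as I know, CREATE UNIQUE was deprecated and is no longer available in newer Neo4j releases (4.0 and later), so a MERGE-based version may be the safer choice. The sketch below covers the case from the question: Alice must already exist, Bob is created only if missing, and one knows relationship is ensured in each direction.

```cypher
MATCH (a:Person {name: 'Alice'})
// Reuse the existing Bob, or create him if no such node exists
MERGE (b:Person {name: 'Bob'})
// Each MERGE creates its relationship only if it is not already there
MERGE (a)-[:knows]->(b)
MERGE (b)-[:knows]->(a)
RETURN a, b
```

Running it repeatedly should leave the graph unchanged after the first run. If several Bob nodes already exist, the relationships are merged for each of them, so cleaning up the duplicates first (or adding a uniqueness constraint on the name) is worth considering.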