Neo4j: Cypher query returns duplicate results

I have two graphs built like this:
CREATE (level1a:Bug {name: 'a'})
CREATE (level1b:Bug {name: 'b'})
CREATE (level2c:Bug {name: 'c'})
CREATE (level2d:Bug {name: 'd'})
CREATE (level3e:Bug {name: 'e'})
CREATE (level3f:Bug {name: 'f'})
CREATE (level3g:Bug {name: 'g'})
CREATE (level3h:Bug {name: 'h'})
CREATE (level1a)-[:LINK]->(level2c)
CREATE (level1b)-[:LINK]->(level2d)
CREATE (level2c)-[:LINK]->(level3e)
CREATE (level2c)-[:LINK]->(level3f)
CREATE (level2d)-[:LINK]->(level3g)
CREATE (level2d)-[:LINK]->(level3h)
It is also available here: http://console.neo4j.org/?id=duplicate_bug2
When I execute the query:
MATCH (a:Bug {name: 'a'})-[:LINK]->()-[:LINK]->(end) return end
I get the two expected nodes (f and e). But if I use two MATCH clauses like this:
MATCH (a:Bug {name: 'a'})-[:LINK]->()-[:LINK]->(end)
MATCH (b:Bug {name: 'b'})-[:LINK]->()-[:LINK]->(end2)
return end, end2
I get duplicate nodes in end and end2. Why is this? The two graphs are not even connected!
BR,
S

Since both MATCH clauses return multiple rows and nothing correlates them (they share no variables), Cypher produces the cross product of the two result sets. Here that is 2x2, so you get four rows: each end node paired with each end2 node.
I think what you are after is something like the following query. It finds all of the ends from the first MATCH and combines them into a collection, then repeats the process for the second MATCH. It returns a single row containing all of the ends of a and all of the ends of b, regardless of how many each MATCH finds.
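To make the row arithmetic concrete, here is a small Python model (not Neo4j code) of what two uncorrelated MATCH clauses do. It assumes the LINK edges from the question as plain tuples and treats each MATCH as a list of rows; the helper names `two_hops` and `cross_join` are hypothetical.

```python
from itertools import product

# LINK relationships from the question, as (from, to) pairs.
LINKS = [("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("d", "g"), ("d", "h")]


def two_hops(start):
    """Nodes reachable by (start)-[:LINK]->()-[:LINK]->(end)."""
    middles = [dst for src, dst in LINKS if src == start]
    return [dst for mid in middles for src, dst in LINKS if src == mid]


def cross_join(left, right):
    """Pair every left row with every right row, as uncorrelated MATCHes do."""
    return [{**left_row, **right_row} for left_row, right_row in product(left, right)]


rows_a = [{"end": n} for n in two_hops("a")]
rows_b = [{"end2": n} for n in two_hops("b")]
rows = cross_join(rows_a, rows_b)

for row in rows:
    print(row["end"], row["end2"])
```

Each end value shows up twice and each end2 value shows up twice, which is the "duplication" seen in the question. Neo4j itself does not guarantee any particular row order.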
MATCH (a:Bug {name: 'a'})-[:LINK]->()-[:LINK]->(end)
with collect(end) as end
MATCH (b:Bug {name: 'b'})-[:LINK]->()-[:LINK]->(end2)
return end, collect(end2) as end2
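Using the same kind of hypothetical Python model, this sketch traces how the row count changes through the answer's query: collect() turns the first MATCH's two rows into one, the second MATCH expands that single row into two, and the final collect() groups them back into one row.

```python
# LINK relationships from the question, as (from, to) pairs.
LINKS = [("a", "c"), ("b", "d"), ("c", "e"), ("c", "f"), ("d", "g"), ("d", "h")]


def two_hops(start):
    """Nodes reachable by (start)-[:LINK]->()-[:LINK]->(end)."""
    middles = [dst for src, dst in LINKS if src == start]
    return [dst for mid in middles for src, dst in LINKS if src == mid]


# WITH collect(end) AS end: the first MATCH's rows collapse into one row holding a list.
rows = [{"end": two_hops("a")}]

# The second MATCH runs once per incoming row, yielding one row per end2.
expanded = [{**row, "end2": n} for row in rows for n in two_hops("b")]

# RETURN end, collect(end2): group by `end` and collect end2 within each group.
groups = {}
for row in expanded:
    groups.setdefault(tuple(row["end"]), []).append(row["end2"])
result = [{"end": list(key), "end2": values} for key, values in groups.items()]

print(result)
```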

Related

How would I query for a relationship between these two nodes?

Suppose I have a graph that looks like this for nodes (p) and (e):
(p:Person)-[r:WorksFor]->(e:Employer)
And I have the following data:
(Person {name: Andrew})-[r:WorksFor]->(Employer {name: Google})
(Person {name: James})-[r:WorksFor]->(Employer {name: Google})
(Person {name: James})-[r:WorksFor]->(Employer {name: Apple})
(Person {name: Evan})-[r:WorksFor]->(Employer {name: Apple})
How can I query from (Person {name: Evan}) through each relationship to (Person {name: Andrew}), returning each employer and person along the way, with an arbitrary number of employers and persons in between?
Ideally, the query would return a chain like:
(Andrew)->(Google)->(James)->(Apple)->(Evan)
Thank you for your help.
(EDIT) Addendum:
The following seems to work, but only if the people are separated by exactly two degrees. Is there a way to make this fully variable-length?
MATCH
(p:Person {name: "Andrew"})-->(e:Employer)<--(p3:Person)-->(e2:Employer)<--(p2:Person {name: "Evan"})
RETURN *
You want variable-length pattern matching.
Depending on your graph, you can specify the relationship types to traverse or leave the type off. You can also omit the direction to say that you don't care which way the traversed relationships point:
MATCH path = (p:Person {name: "Andrew"})-[:WorksFor*]-(p2:Person {name: "Evan"})
RETURN path
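To see what the variable-length, direction-agnostic pattern explores, here is a Python model (an in-memory sketch, not Neo4j code) that enumerates every path of one or more hops between two people over WorksFor relationships in either direction. Like Cypher, it never reuses a relationship within a single path. The names `RELS` and `var_length_paths` are hypothetical.

```python
# WorksFor relationships from the question, as (start, type, end) triples.
RELS = [
    ("Andrew", "WorksFor", "Google"),
    ("James", "WorksFor", "Google"),
    ("James", "WorksFor", "Apple"),
    ("Evan", "WorksFor", "Apple"),
]


def var_length_paths(rels, start, target, rel_type="WorksFor"):
    """All paths of one or more hops from start to target, in either direction.

    As in Cypher, a relationship may appear at most once within a single path.
    """
    results = []

    def walk(node, path, used):
        if node == target and used:
            results.append(list(path))
        for index, (s, t, e) in enumerate(rels):
            if index in used or t != rel_type:
                continue
            if s == node:
                nxt = e  # follow the relationship forwards
            elif e == node:
                nxt = s  # follow it backwards: direction is ignored
            else:
                continue
            used.add(index)
            path.append(nxt)
            walk(nxt, path, used)
            path.pop()
            used.remove(index)

    walk(start, [start], set())
    return results


for path in var_length_paths(RELS, "Andrew", "Evan"):
    print("->".join(path))
```

The printed chain is the (Andrew)->(Google)->(James)->(Apple)->(Evan) sequence the question asks for, i.e. what nodes(path) would list.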
If you want the nodes in the path, you can return nodes(path) to get that list.
If you only want the shortest path between the two, you can match both nodes first and then match using the shortestPath function:
MATCH (p:Person {name: "Andrew"}), (p2:Person {name: "Evan"})
MATCH path = shortestPath((p)-[:WorksFor*]-(p2))
RETURN path
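On an unweighted graph, a breadth-first search is a simple way to model what shortestPath returns (Neo4j's own algorithm may differ internally). This Python sketch ignores relationship direction, like the pattern above. The extra "Andrew also works for Apple" relationship is hypothetical and only there to show a shorter route winning.

```python
from collections import deque

# WorksFor relationships from the question, as (start, type, end) triples.
RELS = [
    ("Andrew", "WorksFor", "Google"),
    ("James", "WorksFor", "Google"),
    ("James", "WorksFor", "Apple"),
    ("Evan", "WorksFor", "Apple"),
]


def shortest_path(rels, start, target):
    """Breadth-first search over relationships, ignoring their direction."""
    neighbours = {}
    for s, _, e in rels:
        neighbours.setdefault(s, []).append(e)
        neighbours.setdefault(e, []).append(s)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if target not in previous:
        return None  # no path at all
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


print(shortest_path(RELS, "Andrew", "Evan"))
# Hypothetical extra fact: Andrew also works for Apple, creating a shorter route.
print(shortest_path(RELS + [("Andrew", "WorksFor", "Apple")], "Andrew", "Evan"))
```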

How to match all paths that ends with nodes with common properties in Neo4j?

I would like to match all paths from one given node.
          -->(c: {name:"*Tom*"})
         /
(a)-->(b)-->(d: {name:"*Tom*"})
         \
          -->(e: {name:"*Tom*"})
These paths have a specific structure:
- the names of all children of the second-to-last node (b) should contain the substring "Tom".
How do I write the correct Cypher?
Let's recreate the dataset:
CREATE
(a:Person {name: 'Start'}),
(b:Person),
(c:Person {name: 'Tommy Lee Jones'}),
(d:Person {name: 'Tom Hanks'}),
(e:Person {name: 'Tom the Cat'}),
(a)-[:FRIEND]->(b),
(b)-[:FRIEND]->(c),
(b)-[:FRIEND]->(d),
(b)-[:FRIEND]->(e)
As you said in the comment, all() requires a list. To get one, use the collect function on the neighbours of b:
MATCH (:Person)-[:FRIEND]->(b:Person)-[:FRIEND]->(bn:Person)
WITH b, collect(bn) AS bns
WHERE all(bn in bns where bn.name =~ '.*Tom.*')
RETURN b, bns
Here b's neighbours are called bn, and collect gathers them into the list bns.
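The grouping-then-filtering logic can be checked with a small Python model of the sample dataset (not Neo4j code; the function name `all_tom_children` is hypothetical). Cypher's `=~` must match the whole string, which is why the pattern is wrapped in `.*` and modelled with `re.fullmatch`.

```python
import re

# Sample dataset from the answer: node id -> name (b has no name).
NAMES = {"a": "Start", "b": None, "c": "Tommy Lee Jones", "d": "Tom Hanks", "e": "Tom the Cat"}
FRIEND = [("a", "b"), ("b", "c"), ("b", "d"), ("b", "e")]


def all_tom_children(names, friend):
    """Group bn by b, then keep only the groups where every name matches."""
    # MATCH (:Person)-[:FRIEND]->(b)-[:FRIEND]->(bn): one row per two-hop chain.
    rows = [(b, bn) for _, b in friend for b2, bn in friend if b2 == b]
    # WITH b, collect(bn) AS bns: one group per distinct b.
    groups = {}
    for b, bn in rows:
        groups.setdefault(b, []).append(bn)
    # WHERE all(bn IN bns WHERE bn.name =~ '.*Tom.*')
    return {
        b: bns
        for b, bns in groups.items()
        if all(names[bn] is not None and re.fullmatch(r".*Tom.*", names[bn]) for bn in bns)
    }


print(all_tom_children(NAMES, FRIEND))
```

Adding a single non-matching child to b (for example a hypothetical friend named "Jerry") makes the whole group fail, which is exactly the "all children" requirement.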

query dependency chain of producers and requirements

I'm setting up a graph structure with transformers that 'require' and 'produce' one or more Kafka topics. I can define the graph structure fine, but I'd like some help with a query.
I'd like to query which chain of transformers and topics is required to create a certain topic. For instance, in the sample below, which transformers are required to produce Topic3? I'd expect:
Ingest1->Topic7->T1->Topic1->T2->Topic3
The first answer below isn't quite correct, because it doesn't take into account the alternating directions of requires and produces.
A correct query up to a fixed depth would be something like:
MATCH (topic:Topic{name:"topic-3"})
<-[:produces]- (tr1) -[:requires]->(tp1)
<-[:produces]- (tr2) -[:requires]->(tp2)
<-[:produces]- (tr3)
return [topic,tr1,tp1,tr2,tp2,tr3] as List
So it seems I'm looking for something that can repeat the paired produces/requires relationships.
Here's some data that I'm playing with.
CREATE (DB1:Database {backbone: true, name:"postgres db 1"})
CREATE (Ingest1:Ingest {backbone: true, name: "ingest-1"})
CREATE (KV1:KV {name: "key-value store 1"})
CREATE (KV2:KV {name: "key-value store 2"})
CREATE (KV1)-[:requires]->(DB1)
CREATE (KV2)-[:requires]->(DB1)
CREATE (Topic1:Topic {name: "topic-1", partitions:100})
CREATE (Topic2:Topic {name: "topic-2", partitions:100})
CREATE (Topic3:Topic {name: "topic-3", partitions:100})
CREATE (Topic4:Topic {name: "topic-4", partitions:100})
CREATE (Topic5:Topic {name: "topic-5", partitions:100})
CREATE (Topic6:Topic {name: "topic-6", partitions:100})
CREATE (Topic7:Topic {name: "topic-7", partitions:100})
CREATE (Topic8:Topic {name: "topic-8", partitions:100})
CREATE (T2:Transformer {name: "T2"})
CREATE (T1:Transformer {name: "T1"})
CREATE (T3:Transformer {name: "T3"})
CREATE (T4:Transformer {name: "T4"})
CREATE (T5:Transformer {name: "T5"})
CREATE (T6:Transformer {name: "T6"})
CREATE (T7:Transformer {name: "T7"})
CREATE (T8:Transformer {name: "T8"})
CREATE (T9:Transformer {name: "T9"})
CREATE (T4)-[:requires]->(Topic3)
CREATE (T5)-[:requires]->(Topic3)
CREATE (T2)-[:produces]->(Topic3)
CREATE (T2)-[:produces]->(Topic4)
CREATE (T2)-[:produces]->(KV1)
CREATE (T2)-[:requires]->(Topic1)
CREATE (T4)-[:produces]->(Topic5)
CREATE (T2)-[:requires]->(Topic2)
CREATE (T1)-[:produces]->(Topic1)
CREATE (T1)-[:requires]->(Topic7)
CREATE (T3)-[:produces]->(Topic2)
CREATE (T3)-[:requires]->(Topic8)
CREATE (Ingest1)-[:produces]->(Topic7)
CREATE (Ingest1)-[:produces]->(Topic8);
How about something like this?
// find the transformer from the selected topic
MATCH (topic3:Topic {name: "topic-3"})<-[:produces]-(transformer:Transformer)
// find the path(s) back from the transformer to the ingest
MATCH p=(transformer)-[:produces|requires*]-(i:Ingest)
// put the names in a collection from topic3 back to ingest
WITH reduce(chain = [topic3.name], n in nodes(p) | chain + n.name) as chain
// return the collection in the desired order
RETURN reverse(chain)
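The reduce/reverse step can be checked in isolation. This Python sketch assumes one particular path p (via topic-1) from the transformer back to the ingest node; the undirected pattern may return other paths as well.

```python
from functools import reduce

# Hypothetical node names along one path p returned by the second MATCH,
# starting at the transformer and ending at the Ingest node.
path_nodes = ["T2", "topic-1", "T1", "topic-7", "ingest-1"]

# reduce(chain = [topic3.name], n IN nodes(p) | chain + n.name)
chain = reduce(lambda acc, name: acc + [name], path_nodes, ["topic-3"])

# RETURN reverse(chain)
print(list(reversed(chain)))
```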
It could also be simplified to this. Here nodes(p) already starts at topic3, so the chain starts empty to avoid listing topic-3 twice:
MATCH p=(topic3:Topic {name: "topic-3"})-[:produces|requires*]-(i:Ingest)
WITH reduce(chain = [], n in nodes(p) | chain + n.name) as chain
RETURN reverse(chain)
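For comparison, here is a Python model of the traversal the question actually describes, where direction matters: from a topic, step backwards along produces to its producer, then forwards along requires to that producer's inputs, and repeat. It is a sketch over the sample data only, not a Cypher query, and `producer_chains` is a hypothetical name. Because T2 requires both topic-1 and topic-2, the full dependency set contains two chains, not only the one expected in the question.

```python
# Relationships from the sample data as (start, type, end), using node names.
RELS = [
    ("key-value store 1", "requires", "postgres db 1"),
    ("key-value store 2", "requires", "postgres db 1"),
    ("T4", "requires", "topic-3"),
    ("T5", "requires", "topic-3"),
    ("T2", "produces", "topic-3"),
    ("T2", "produces", "topic-4"),
    ("T2", "produces", "key-value store 1"),
    ("T2", "requires", "topic-1"),
    ("T4", "produces", "topic-5"),
    ("T2", "requires", "topic-2"),
    ("T1", "produces", "topic-1"),
    ("T1", "requires", "topic-7"),
    ("T3", "produces", "topic-2"),
    ("T3", "requires", "topic-8"),
    ("ingest-1", "produces", "topic-7"),
    ("ingest-1", "produces", "topic-8"),
]


def producer_chains(node, visiting=frozenset()):
    """Chains ending at node: follow produces backwards, then requires forwards, repeatedly."""
    if node in visiting:
        return []  # cycle guard: abandon a chain that revisits a node
    visiting = visiting | {node}
    producers = [s for s, t, e in RELS if t == "produces" and e == node]
    if not producers:
        return [[node]]  # nothing produces this node, so the chain starts here
    chains = []
    for producer in producers:
        inputs = [e for s, t, e in RELS if t == "requires" and s == producer]
        if not inputs:
            chains.append([producer, node])  # e.g. an Ingest that requires nothing
        for topic in inputs:
            for upstream in producer_chains(topic, visiting):
                chains.append(upstream + [producer, node])
    return chains


for chain in producer_chains("topic-3"):
    print("->".join(chain))
```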

Neo4j Passing distinct nodes through WITH in Cypher

I have the following query, with three MATCH clauses connected by WITH, searching through three paths.
MATCH (:File {name: 'A'})-[:FILE_OF]->(:Fun {name: 'B'})-->(ent:CFGEntry)-[:Flows*]->()-->(expr:CallExpr {name: 'C'})-->()-[:IS_PARENT]->(Callee {name: 'd'})
WITH expr, ent
MATCH (expr)-->(:Arg {chNum: '1'})-->(id:Id)
WITH id, ent
MATCH (ent)-[:Flows*]->(:IdDecl)-[:Def]->(sym:Sym)
WHERE id.name = sym.name
RETURN id.name
The query finds two distinct id nodes, one distinct ent node, and 7 distinct sym nodes.
The problem is that after the second MATCH I pass "WITH id, ent", and since two distinct id nodes were found, two rows reach the third MATCH instead of one, each carrying the same ent. As a result, the third MATCH takes at least twice as long as it needs to.
Does anyone know how to write this query so that it uses only a single instance of ent?
Your best bet is to aggregate id. With ent as the only grouping key, the rows collapse to one per distinct ent, but you'll need to adjust the logic in the third part of your query to work with a list of names:
MATCH (:File {name: 'A'})-[:FILE_OF]->(:Fun {name: 'B'})-->(ent:CFGEntry)-[:Flows*]->()-->(expr:CallExpr {name: 'C'})-->()-[:IS_PARENT]->(Callee {name: 'd'})
WITH expr, ent
MATCH (expr)-->(:Arg {chNum: '1'})-->(id:Id)
WITH collect(id.name) as names, ent
MATCH (ent)-[:Flows*]->(:IdDecl)-[:Def]->(sym:Sym)
WHERE sym.name in names
RETURN sym.name
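The effect of aggregating before the third MATCH can be modelled in a few lines of Python. The id values, the ent value `entry-1`, and the reachable sym names are all hypothetical stand-ins; the point is only that collect() with ent as the grouping key turns two rows into one, and that `IN names` replaces the per-row equality test.

```python
# Hypothetical rows after the second MATCH: two id nodes, both paired with the same ent.
rows = [{"id": "x", "ent": "entry-1"}, {"id": "y", "ent": "entry-1"}]

# WITH collect(id.name) AS names, ent: ent is the grouping key, so one row per ent.
groups = {}
for row in rows:
    groups.setdefault(row["ent"], []).append(row["id"])
after_with = [{"ent": ent, "names": names} for ent, names in groups.items()]

# Hypothetical sym.name values the third MATCH would reach from entry-1.
reachable_syms = {"entry-1": ["x", "z", "y"]}

# The third MATCH now runs once per row; WHERE sym.name IN names filters the symbols.
result = [
    sym
    for row in after_with
    for sym in reachable_syms[row["ent"]]
    if sym in row["names"]
]
print(len(rows), len(after_with), result)
```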

CREATE UNIQUE in neo4j produces duplicate nodes

According to the Neo4j documentation:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
This sounds great, but CREATE UNIQUE doesn't seem to follow the 'least possible change' rule. For example, here is some Cypher to create two people:
CREATE (:Person {name: 'Alice'})
CREATE (:Person {name: 'Bob'});
CREATE INDEX ON :Person(name);
And here are two CREATE UNIQUE statements that create relationships between those people. Since both people already exist in the graph, only the relationships should be newly created:
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)-[:knows]->(b:Person {name: 'Bob'})
RETURN a
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)<-[:knows]-(b:Person {name: 'Bob'})
RETURN a
After this, the graph should look like
(Alice)<---KNOWS--->(Bob).
But when you run a MATCH query:
MATCH (a:Person)
RETURN a
it seems that the graph now looks like
(Bob)
(Bob)--KNOWS-->(Alice)--KNOWS-->(Bob);
two extra Bobs have been created.
I looked a bit through the other Cypher commands, but none of them seem intended for this use case: create a link between existing node A and existing node B if B exists, and otherwise create a link between existing node A and a newly created node B. How can this problem best be solved within the Cypher framework?
This query should do what you want (if you always want to end up with a single knows relationship between the 2 nodes):
MATCH (a:Person {name: 'Alice'})
MERGE (b:Person {name: 'Bob'})
MERGE (a)-[:knows]->(b)
RETURN a;
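MERGE behaves like "get or create" for each pattern it is given. This Python sketch models that idea with a hypothetical in-memory `TinyGraph` class (not a Neo4j API): running the answer's query twice leaves exactly one Bob and one knows relationship, and running it against a graph without Bob creates him.

```python
class TinyGraph:
    """Hypothetical in-memory graph used only to illustrate get-or-create behaviour."""

    def __init__(self):
        self.nodes = []    # (label, name) tuples; the list index is the node id
        self.rels = set()  # (start_id, type, end_id)

    def match_node(self, label, name):
        """MATCH: return an existing node id, or None if there is no such node."""
        for node_id, node in enumerate(self.nodes):
            if node == (label, name):
                return node_id
        return None

    def merge_node(self, label, name):
        """MERGE on a node: reuse it if it exists, otherwise create it."""
        node_id = self.match_node(label, name)
        if node_id is None:
            self.nodes.append((label, name))
            node_id = len(self.nodes) - 1
        return node_id

    def merge_rel(self, start, rel_type, end):
        """MERGE on a relationship between bound nodes: create it only if missing."""
        self.rels.add((start, rel_type, end))


def run_answer_query(graph):
    a = graph.match_node("Person", "Alice")  # MATCH (a:Person {name: 'Alice'})
    if a is None:
        return  # MATCH found nothing, so the rest of the query does nothing
    b = graph.merge_node("Person", "Bob")  # MERGE (b:Person {name: 'Bob'})
    graph.merge_rel(a, "knows", b)  # MERGE (a)-[:knows]->(b)


g = TinyGraph()
g.nodes += [("Person", "Alice"), ("Person", "Bob")]
run_answer_query(g)
run_answer_query(g)  # running it again changes nothing
print(g.nodes, g.rels)
```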
Here is how you can do it with CREATE UNIQUE:
MATCH (a:Person {name: 'Alice'}), (b:Person {name:'Bob'})
CREATE UNIQUE (a)-[:knows]->(b), (b)-[:knows]->(a)
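You need to bind both nodes with MATCH first. In the question's queries b is not bound, so CREATE UNIQUE looks for an existing knows relationship (in the given direction) between Alice and a node matching (b:Person {name: 'Bob'}); because there is none, it creates the whole missing part of the pattern, including a new Bob node, instead of reusing the existing one. Note that this version only works if Bob already exists: if the MATCH finds no Bob, the query does nothing. CREATE UNIQUE was removed in Neo4j 4.0, so on current versions use the MERGE approach.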
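The difference between the question's queries and the answer's can be reproduced with a simplified Python model of the behaviour described above (not Neo4j's implementation; `create_unique_knows` is a hypothetical helper). With b unbound, only nodes already connected to Alice by knows are reused, so each statement creates a fresh Bob; with b bound, only the relationship is created, once.

```python
def create_unique_knows(nodes, rels, a, outgoing, b=None, b_props=None):
    """Simplified model of CREATE UNIQUE for one 'knows' hop from bound node a.

    If b is bound, only the relationship is reused or created. If b is unbound,
    only nodes already connected to a by 'knows' in that direction are reused;
    otherwise a brand-new node with b_props is created.
    """
    for start, rel_type, end in rels:
        if rel_type != "knows":
            continue
        if outgoing and start == a:
            far = end
        elif not outgoing and end == a:
            far = start
        else:
            continue
        if (far == b) if b is not None else (nodes[far] == b_props):
            return  # the whole pattern already exists
    if b is None:
        nodes.append(b_props)  # unbound node not found on the pattern: create it
        b = len(nodes) - 1
    rels.append((a, "knows", b) if outgoing else (b, "knows", a))


BOB = ("Person", "Bob")

# The question's queries: b is unbound, so each statement creates a new Bob.
nodes = [("Person", "Alice"), BOB]
rels = []
create_unique_knows(nodes, rels, 0, outgoing=True, b_props=BOB)
create_unique_knows(nodes, rels, 0, outgoing=False, b_props=BOB)
print(nodes.count(BOB))

# The answer's query: both nodes are bound by MATCH first.
nodes2 = [("Person", "Alice"), BOB]
rels2 = []
for _ in range(2):  # run twice to show nothing new is created the second time
    create_unique_knows(nodes2, rels2, 0, outgoing=True, b=1)
    create_unique_knows(nodes2, rels2, 0, outgoing=False, b=1)
print(nodes2.count(BOB), rels2)
```

The first scenario ends with three Bobs (one isolated, two attached to Alice), matching what the question observed.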
