CREATE UNIQUE in neo4j produces duplicate nodes - neo4j

According to the neo4j documentation:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
This sounds great, but CREATE UNIQUE doesn't seem to follow the 'least possible change' rule. e.g., here is some Cypher to create two people:
CREATE (n:Person {name: 'Alice'})
CREATE (n:Person {name: 'Bob'})
CREATE INDEX ON :Person(name)
and here's two CREATE UNIQUE statements, to create a relationship between those people. Since both people already exist in the graph, only the relationships should be newly created:
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)-[:knows]->(b:Person {name: 'Bob'})
RETURN a
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)<-[:knows]-(b:Person {name: 'Bob'})
RETURN a
After this, the graph should look like
(Alice)<---KNOWS--->(Bob).
But when you run a MATCH query:
MATCH (a:Person)
RETURN a
it seems that the graph now looks like
(Bob)
(Bob)--KNOWS-->(Alice)--KNOWS-->(Bob);
two extra Bobs have been created.
I looked a bit through the other Cypher commands, but none of them seem intended for this use case: create a link between existing node A and existing node B if B exists, and otherwise create a link between existing node A and a newly created node B. How can this problem best be solved within the Cypher framework?

This query should do what you want (if you always want to end up with a single knows relationship between the 2 nodes):
MATCH (a:Person {name: 'Alice'})
MERGE (b:Person {name: 'Bob'})
MERGE (a)-[:knows]->(b)
RETURN a;

Here is how you can do it with CREATE UNIQUE
MATCH (a:Person {name: 'Alice'}), (b:Person {name:'Bob'})
CREATE UNIQUE (a)-[:knows]->(b), (b)-[:knows]->(a)
You need 2 match clauses otherwise you are always creating the node in the CREATE UNIQUE statement, not matching existing nodes.

Related

Find all leaves of a selected subgraph with Neo4j/ Cypher

Initial Situation
Large Neo4j 3.4.6 graph with a tree-like structure (10 levels deep, 10 million nodes).
Unexceptional all nodes are connected with each other. The nodes as well as the relationships are in each case of the same type.
Exactly one central root node.
Reduced and simplified example:
Graphic representation
CREATE (Root:CustomType {name: 'Root'})
CREATE (NodeA:CustomType {name: 'NodeA'})
CREATE (NodeB:CustomType {name: 'NodeB'})
CREATE (NodeC:CustomType {name: 'NodeC'})
CREATE (NodeD:CustomType {name: 'NodeD'})
CREATE (NodeE:CustomType {name: 'NodeE'})
CREATE (NodeF:CustomType {name: 'NodeF'})
CREATE (NodeG:CustomType {name: 'NodeG'})
CREATE (NodeH:CustomType {name: 'NodeH'})
CREATE (NodeI:CustomType {name: 'NodeI'})
CREATE (NodeJ:CustomType {name: 'NodeJ'})
CREATE (NodeK:CustomType {name: 'NodeK'})
CREATE (NodeL:CustomType {name: 'NodeL'})
CREATE (NodeM:CustomType {name: 'NodeM'})
CREATE (NodeN:CustomType {name: 'NodeN'})
CREATE (NodeO:CustomType {name: 'NodeO'})
CREATE (NodeP:CustomType {name: 'NodeP'})
CREATE (NodeQ:CustomType {name: 'NodeQ'})
CREATE
(Root)-[:CONTAINS]->(NodeA),
(Root)-[:CONTAINS]->(NodeB),
(Root)-[:CONTAINS]->(NodeC),
(NodeA)-[:CONTAINS]->(NodeD),
(NodeA)-[:CONTAINS]->(NodeE),
(NodeA)-[:CONTAINS]->(NodeF),
(NodeE)-[:CONTAINS]->(NodeG),
(NodeE)-[:CONTAINS]->(NodeH),
(NodeF)-[:CONTAINS]->(NodeI),
(NodeF)-[:CONTAINS]->(NodeJ),
(NodeF)-[:CONTAINS]->(NodeK),
(NodeI)-[:CONTAINS]->(NodeL),
(NodeI)-[:CONTAINS]->(NodeM),
(NodeJ)-[:CONTAINS]->(NodeN),
(NodeK)-[:CONTAINS]->(NodeO),
(NodeK)-[:CONTAINS]->(NodeP),
(NodeM)-[:CONTAINS]->(NodeQ);
To be solved challenge
By means of a MATCH-WITH-UNWIND Cypher query I’m successfully able to select a subtree and bind it to a path. Let’s say the subtree spans over the nodes A,E,F,I and J.
Based on this path I need all leaves of the subtree, not the complete tree now.
.
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'}) /* simplified */
WITH
nodes(path) as selectedPath
/* here: necessary magic to identify the leaf nodes of the subtree */
RETURN
leafNode;
Among other things I tried to solve the requirement with a WHERE NOT(node-->()) approach, but realized this works for leaves of the complete tree only. Unfortunately I was not able to convince the WHERE NOT(node-->()) clause to respect the selected subtree boundaries.
So, how can I find all leaves of a selected subgraph with Cypher and Neo4j? Can you please give me an advice how to solve this challenge? Many thanks in advance for pointing me into the right direction!
You correctly noted that the check node with no children is suitable only for the entire tree. So you need to go through all the relationships in the subtree, and find such a node of the subtree that is as the end of the relationship, but not as the start of the relationship:
MATCH
path = (:CustomType {name:'NodeA'})-[:CONTAINS*]->(:CustomType {name:'NodeJ'})
UNWIND relationShips(path) AS r
WITH collect(DISTINCT endNode(r)) AS endNodes,
collect(DISTINCT startNode(r)) AS startNodes
UNWIND endNodes AS leaf
WITH leaf WHERE NOT leaf IN startNodes
RETURN leaf

How to merge tree in neo4j

Let's say I have a database with named nodes and that the database is either empty or has the following content:
I now need a neo4j statement, that inserts exactly that tree structure, if it does not exists already in the database.
For simple node pair merge, I could use something like
MERGE ({name: 'A'})-[:R1]->({name: 'B'})
But I want the tree structure. How do I add C here?
Firstly, you have to add a label on your tree node (Tree in my above example) and create a unique constraint on the name attribute like this :
CREATE CONSTRAINT ON (n:Tree) ASSERT n.name IS UNIQUE;
Then you can use this script to create the C node and the others is they don't exist :
MERGE (a:Tree {name: 'A'})
MERGE (b:Tree {name: 'B'})
MERGE (c:Tree {name: 'C'})
MERGE (a)-[:R1]->(b)
MERGE (a)-[:R2]->(c);
As you can see you have to use one MERGE per node, and then one MERGE per relationship.

Neo4j's Cypher query language - reducing nodes in a match

Relatively new to Neo4j. I realize the way I originally posted this it was too ambiguous. Below is hopefully a better explanation.
//Subgraph 1
Create (p1:Person {name: 'Person1'})
Create (p2:Person {name: 'Person2'})
Create (a1:Address {street: 'Suspicious'})
Create (p1)-[:Resides]->(a1)
Create (p2)-[:Resides]->(a1)
//Subgraph 2
Create (p3:Person {name: 'Person3'})
Create (p4:Person {name: 'Person4'})
Create (a2:Address {street: 'Double'})
Create (p3)-[:Resides]->(a2)
Create (p4)-[:Resides]->(a2)
Create (p3)-[:Knows]->(p4)
//Subgraph 3
Create (p5:Person {name: 'Person5'})
Create (a3:Address {street: 'Single'})
Create (p5)-[:Resides]->(a3)
What I would like to write is a query to detect the following:
- All addresses (and people) that have 2 or more People residing there that do not know each other.
This means that only Subgraph1 should be found.
Subgraph2 would not be found because there are 2 people that reside there but they know each other.
Subgraph3 would not be found because there is only 1 person residing there.
Again, thanks for the help.
This Cypher query should work:
MATCH (n1)-[:RESIDES_AT]->()<-[:RESIDES_AT]-(n2)
WHERE NOT exists((n1)-[:KNOWS]-(n2))
RETURN n1, n2
start by matching on nodes that have a RESIDES_AT relationship to the same node, then filter out nodes that have a KNOWS relationship.

Multiple relations like join tables (neo4j)

How do I express the following in neo4j?
match or create user bob; bob works at studio; while at studio, he's allowed to doodle; while at studio, he's also allowed to type.
Here's what I have:
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:works_at]->(c)-[:allowed_to]->(p:permission {name:'doodle'})
MERGE (u)-[:works_at]->(c)-[:allowed_to]->(p:permission {name:'type'})
This doesn't work as permission becomes a relation of company.
Also, is it possible to chain relations such that:
MERGE work=(u)-[:works_at]->(c)
CREATE (work)-[:allowed_to]->(p:permission {name:'doodle'})
CREATE (work)-[:allowed_to]->(p:permission {name:'type'})
where you assign a relation to a variable to continue it later on in another query?
How about modelling it so the company grants the permission? Something like this...
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:works_at]->(c)
MERGE (u)-[:allowed_to]->(p1:permission {name:'doodle'})<-[:GRANTS]-(c)
MERGE (u)-[:allowed_to]->(p2:permission {name:'type'})<-[:GRANTS]-(c)
RETURN *
You can't really refer to objects via identifiers/variables you have created previously in other queries. You would have to re-match or merge those previously created objects in your new query.
Part 2 could be modelled something like this..
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:DOES]->(work:Work {start_date: timestamp()} )-[:AT]->(c)
CREATE (work)-[:allowed_to]->(p:permission {name:'doodle'})
CREATE (work)-[:allowed_to]->(p:permission {name:'type'})
As an alternate, if you never need to lookup all users with a certain permission at a company, you could maintain a collection of permissions as relationship properties.
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[r:works_at]->(c)
SET r.permissions = ['doodle', 'type']

Cypher, create unique relationship for one node

I want to add a "created by" relationship on nodes in my database. Any node should be able of having this relationship but there can never be more than one.
Right now my query looks something like this:
MATCH (u:User {email: 'my#mail.com'})
MERGE (n:Node {name: 'Node name'})
ON CREATE SET n.name='Node name', n.attribute='value'
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(u)
RETURN n
As I have understood Cypher there is no way to achieve what I want, the current query will only make sure there are no unique relationships based on TWO nodes, not ONE. So, this will create more CREATED_BY relationships when run for another User and I want to limit the outgoing CREATED_BY relationship to just one for all nodes.
Is there a way to achieve this without running multiple queries involving program logic?
Thanks.
Update
I tried to simplyfy the query by removing implementation details, if it helps here's the updated query based on cybersams response.
MERGE (c:Company {name: 'Test Company'})
ON CREATE SET c.uuid='db764628-5695-40ee-92a7-6b750854ebfa', c.created_at='2015-02-23 23:08:15', c.updated_at='2015-02-23 23:08:15'
WITH c
OPTIONAL MATCH (c)
WHERE NOT (c)-[:CREATED_BY]-()
CREATE (c)-[:CREATED_BY {date: '2015-02-23 23:08:15'}]->(u:User {token: '32ba9d2a2367131cecc53c310cfcdd62413bf18e8048c496ea69257822c0ee53'})
RETURN c
Still not working as expected.
Update #2
I ended up splitting this into two queries.
The problem I found was that there was two possible outcomes as I noticed.
The CREATED_BY relationship was created and (n) was returned using OPTIONAL MATCH, this relationship would always be created if it didn't already exist between (n) and (u), so when changing the email attribute it would re-create the relationship.
The Node (n) was not found (because of not using OPTIONAL MATCH and the WHERE NOT (c)-[:CREATED_BY]-() clause), resulting in no relationship created (yay!) but without getting the (n) back the MERGE query looses all it's meaning I think.
My Solution was the following two queries:
MERGE (n:Node {name: 'Name'})
ON CREATE SET
SET n.attribute='value'
WITH n
OPTIONAL MATCH (n)-[r:CREATED_BY]-()
RETURN c, r
Then I had program logic check the value of r, if there was no relationship I would run the second query.
MATCH (n:Node {name: 'Name'})
MATCH (u:User {email: 'my#email.com'})
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(u)
RETURN n
Unfortunately I couldn't find any real solution to combining this in one single query with Cypher. Sam, thanks! I've selected your answer even though it didn't quite solve my problem, but it was very close.
This should work for you:
MERGE (n:Node {name: 'Node name'})
ON CREATE SET n.attribute='value'
WITH n
OPTIONAL MATCH (n)
WHERE NOT (n)-[:CREATED_BY]->()
CREATE UNIQUE (n)-[:CREATED_BY {date: '2015-02-23'}]->(:User {email: 'my#mail.com'})
RETURN n;
I've removed the starting MATCH clause (because I presume you want to create a CREATED_BY relationship even when that User does not yet exist in the DB), and simplified the ON CREATE to remove the redundant setting of the name property.
I have also added an OPTIONAL MATCH that will only match an n node that does not already have an outgoing CREATED_BY relationship, followed by a CREATE UNIQUE clause that fully specifies the User node.

Resources