Conditional partial merge of pattern into graph - neo4j

I'm trying to create relationships that connect a person to a city -> state -> country chain without recreating the city/state/country nodes and relationships if they already exist, so that, for example, I end up with only one USA node in my graph.
I start with a person
CREATE (p:Person {name:'Omar', Id: 'a'})
RETURN p
Then I'd like to turn the following into an apoc.do.case statement,
or into a single MERGE statement that relies on a unique constraint to create a new node if none is found and otherwise match the existing node:
// first case where the city/state/country all exist
MATCH (locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality)-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// second case where only state/country exist
MATCH (adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// third case where only country exists
MATCH (country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country)
return p
// last case where none of city/state/country exist, so I have to create all nodes + relations
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
return p
The key point is that I only want to end up with one (California)->(USA) path. I don't want those nodes and relationships to get duplicated.

Your queries that use MATCH never specify which Person you want. Variable names like p only exist for the life of a query (and sometimes not even that long). So p is unbound in your MATCH queries, which can result in your MERGE clauses creating empty nodes. You need to add MATCH (p:Person {Id: 'a'}) to the start of those queries (assuming all people have unique Id values).
It should NOT be the responsibility of every single query to ensure that all needed localities exist and are connected correctly; that adds far too much complexity and overhead to every query. Instead, create the appropriate localities and inter-locality relationships separately, before you need them. In fact, each query that creates a locality should also be responsible for creating all the relationships associated with it.
A MERGE creates nothing only if every single element of the pattern (every node and every relationship) already exists; if anything is missing, it creates every unbound node and every relationship in the pattern. So, to avoid duplicates, a MERGE pattern should contain at most one thing that might not already exist. In practice, a MERGE pattern should have at most one relationship, and if it has one, both end nodes should already be bound (by MATCH clauses, for example).
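To make the all-or-nothing rule concrete, here is a small toy model in Python (not Neo4j code; `ToyGraph`, `merge_chain` and `merge_node` are hypothetical helpers written only for this illustration). It starts from an existing (California)->(USA) pair and compares one big chain MERGE with the "bind nodes first, then MERGE one relationship at a time" approach.

```python
from itertools import product


class ToyGraph:
    """Tiny in-memory graph used only to illustrate MERGE's all-or-nothing rule."""

    def __init__(self):
        self.nodes = {}    # node id -> (label, properties)
        self.rels = set()  # (start id, relationship type, end id)

    def create_node(self, label, **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, props)
        return node_id

    def find(self, label, props):
        return [nid for nid, (lbl, p) in self.nodes.items()
                if lbl == label and all(p.get(k) == v for k, v in props.items())]

    def merge_node(self, label, **props):
        """MERGE (x:Label {props}) on its own: reuse a match or create one node."""
        found = self.find(label, props)
        return found[0] if found else self.create_node(label, **props)

    def merge_chain(self, specs, rel_type="SITUATED_IN"):
        """MERGE (a)-[:T]->(b)-[:T]->(c)... where each spec is a bound node id
        or an unbound (label, props) pair. If the complete pattern already
        exists it is reused; otherwise every unbound node and every
        relationship in the pattern is created."""
        candidates = [[s] if isinstance(s, int) else self.find(*s) for s in specs]
        for combo in product(*candidates):
            if all((combo[i], rel_type, combo[i + 1]) in self.rels
                   for i in range(len(combo) - 1)):
                return list(combo)  # full match: nothing is created
        created = [s if isinstance(s, int) else self.create_node(s[0], **s[1])
                   for s in specs]
        for i in range(len(created) - 1):
            self.rels.add((created[i], rel_type, created[i + 1]))
        return created

    def count(self, label):
        return len(self.find(label, {}))


def base_graph():
    g = ToyGraph()
    usa = g.create_node("Country", name="USA")
    ca = g.create_node("AdministrativeArea", name="California")
    g.rels.add((ca, "SITUATED_IN", usa))
    omar = g.create_node("Person", name="Omar", Id="a")
    return g, omar, ca


# One big MERGE: San Diego is missing, so the whole chain is created,
# including a second California and a second USA.
big, omar, _ = base_graph()
big.merge_chain([omar, ("Locality", {"name": "San Diego"}),
                 ("AdministrativeArea", {"name": "California"}),
                 ("Country", {"name": "USA"})])

# One thing at a time: bind the nodes first, then MERGE single relationships.
small, omar2, ca2 = base_graph()
san_diego = small.merge_node("Locality", name="San Diego")
small.merge_chain([san_diego, ca2])
small.merge_chain([omar2, san_diego])
```

In the second approach each MERGE has at most one element that might be missing, so re-running it never duplicates California or USA.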
Once the Locality nodes and the inter-locality relationships exist, you can add a person like this:
MATCH (locality:Locality {name: "San Diego"})
MERGE (p:Person {Id: 'a'}) // create person if needed, specifying a unique identifier
ON CREATE SET p.name = 'Omar' // set other properties as needed (no ';' here, or the statement ends before the next MERGE)
MERGE (p)-[:SITUATED_IN]->(locality) // create relationship if necessary
The above considerations should help you design the code for creating the Locality nodes and the inter-locality relationships.

Finally, the solution I used is much simpler: a series of MERGEs.
match (person:Person {Id: 'a'}) // the person must already exist in the graph ('Omar' has Id 'a')
merge (country:Country {name: 'USA'})
merge (state:State {name: 'California'})-[:SITUATED_IN]->(country)
merge (city:City {name: 'Los Angeles'})-[:SITUATED_IN]->(state)
merge (person)-[:SITUATED_IN]->(city)
return person;
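A toy Python model of the series of MERGEs above (not Neo4j code; the `Graph` class, `place_person`, the second person "Dana" and the Springfield cities are hypothetical, added only for illustration). Because each state is merged relative to its already-bound country, and each city relative to its already-bound state, re-running the query adds nothing, and two cities with the same name in different states stay distinct. The person is created up front here because the Cypher version MATCHes an existing person.

```python
class Graph:
    def __init__(self):
        self.nodes = {}    # node id -> (label, name)
        self.rels = set()  # (start id, relationship type, end id)

    def _create(self, label, name):
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, name)
        return node_id

    def merge_node(self, label, name):
        """MERGE (x:Label {name: ...}) on its own."""
        for node_id, node in self.nodes.items():
            if node == (label, name):
                return node_id
        return self._create(label, name)

    def merge_child(self, label, name, parent, rel_type="SITUATED_IN"):
        """MERGE (x:Label {name: ...})-[:T]->(parent) with only parent bound:
        reuse a same-named node already linked to parent, else create a new one."""
        for node_id, node in self.nodes.items():
            if node == (label, name) and (node_id, rel_type, parent) in self.rels:
                return node_id
        node_id = self._create(label, name)
        self.rels.add((node_id, rel_type, parent))
        return node_id

    def merge_rel(self, start, end, rel_type="SITUATED_IN"):
        """MERGE (start)-[:T]->(end) with both ends bound."""
        self.rels.add((start, rel_type, end))

    def count(self, label):
        return sum(1 for lbl, _ in self.nodes.values() if lbl == label)


def place_person(g, person, city, state, country="USA"):
    # Mirrors the MERGE clauses one by one.
    country_id = g.merge_node("Country", country)
    state_id = g.merge_child("State", state, country_id)
    city_id = g.merge_child("City", city, state_id)
    g.merge_rel(person, city_id)


g = Graph()
omar = g.merge_node("Person", "Omar")
dana = g.merge_node("Person", "Dana")  # hypothetical second person
place_person(g, omar, "Los Angeles", "California")
place_person(g, omar, "Los Angeles", "California")  # re-running changes nothing
place_person(g, dana, "Los Angeles", "California")
place_person(g, dana, "Springfield", "Illinois")
place_person(g, dana, "Springfield", "Massachusetts")
```

This relies on every city and state being created through the same pattern; a State named California created elsewhere without a link to USA would not be reused.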

Related

Match paths of node types where nodes may have cycles

I'm trying to find a match pattern to match paths of certain node types. I don't care about the type of relation. Any relation type may match. I only care about the node types.
Of course the following would work:
MATCH (n)-->(:a)-->(:b)-->(:c) WHERE id(n) = 0
But some of these paths may contain a node linked to another node with the same label, for example :b -> :b, so I'd also like to match:
MATCH (n)-->(:a)-->(:b)-->(:b)-->(:c) WHERE id(n) = 0
And:
MATCH (n)-->(:a)-->(:b)-->(:b)-->(:b)-->(:c) WHERE id(n) = 0
I can do this with relationships easily enough (using variable-length patterns), but I can't figure out how to do it with nodes. I want something like this (not valid Cypher):
MATCH (n)-->(:a)-->(:b*1..)-->(:c) WHERE id(n) = 0
As a practical example, let's say I have a database with people, cars and bikes. The cars and bikes are "owned" by people, and people have relationships like son, daughter, husband, wife, etc. What I'm looking for is a query that, starting from a specific node, gets all nodes of related types. So:
MATCH (n)-->(:person*1..)-->(:car) WHERE Id(n) = 0
I would expect that to get node "n", its parents, grandparents, children and grandchildren, all recursively, and then, for those people, their cars. If I could assume that I know the full list of relationships, and that they only apply to people, I could get this to work as follows:
MATCH
p = (n)-->(:person)-[:son|daughter|husband|wife|etc*0..]->(:person)-->(:car)
WHERE Id(n) = 0
RETURN nodes(p)
What I'm looking for is the same without having to specify the full list of relations; but just the node label.
Edit:
If you want to find the path from one Person node to each Car node, using only the node labels, and assuming nodes may create cycles, you can use apoc.path.expandConfig.
For example:
MERGE (mark:Person {name: "Mark"})
MERGE (lju:Person {name: "Lju"})
MERGE (praveena:Person {name: "Praveena"})
MERGE (zhen:Person {name: "Zhen"})
MERGE (martin:Person {name: "Martin"})
MERGE (joe:Person {name: "Joe"})
MERGE (stefan:Person {name: "Stefan"})
MERGE (alicia:Person {name: "Alicia"})
MERGE (markCar:Car {name: "Mark's car"})
MERGE (ljuCar:Car {name: "Lju's car"})
MERGE (praveenaCar:Car {name: "Praveena's car"})
MERGE (zhenCar:Car {name: "Zhen's car"})
MERGE (zhen)-[:CHILD_OF]-(mark)
MERGE (praveena)-[:CHILD_OF]-(martin)
MERGE (praveena)-[:MARRIED_TO]-(joe)
MERGE (zhen)-[:CHILD_OF]-(joe)
MERGE (alicia)-[:CHILD_OF]-(joe)
MERGE (martin)-[:CHILD_OF]-(mark)
MERGE (stefan)-[:CHILD_OF]-(zhen)
MERGE (lju)-[:CHILD_OF]-(stefan)
MERGE (markCar)-[:OWNED]-(mark)
MERGE (ljuCar)-[:OWNED]-(lju)
MERGE (praveenaCar)-[:OWNED]-(praveena)
MERGE (zhenCar)-[:OWNED]-(zhen)
Running a query:
MATCH (n:Person{name:'Joe'})
CALL apoc.path.expandConfig(n, {labelFilter: "Person|/Car", uniqueness: "NODE_GLOBAL"})
YIELD path
RETURN path
will return four unique paths, from the Joe node to each of the four Car nodes. There are several options for path uniqueness; see the uniqueness section of the APOC path expander documentation.
The / prefix in /Car makes Car a termination label: returned paths end at a Car node, and expansion does not continue beyond it.
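A rough Python sketch of what the expansion above does on the example data (a toy breadth-first search, not APOC's implementation). It assumes that, with no relationshipFilter, relationships are followed in both directions; that "Person" acts as a whitelist label and "/Car" as a termination label; and it models NODE_GLOBAL uniqueness as "each node is visited at most once".

```python
from collections import deque

labels = {name: "Person" for name in
          ["Mark", "Lju", "Praveena", "Zhen", "Martin", "Joe", "Stefan", "Alicia"]}
labels.update({f"{p}'s car": "Car" for p in ["Mark", "Lju", "Praveena", "Zhen"]})

edges = [
    ("Zhen", "Mark"), ("Praveena", "Martin"), ("Praveena", "Joe"),
    ("Zhen", "Joe"), ("Alicia", "Joe"), ("Martin", "Mark"),
    ("Stefan", "Zhen"), ("Lju", "Stefan"),
    ("Mark's car", "Mark"), ("Lju's car", "Lju"),
    ("Praveena's car", "Praveena"), ("Zhen's car", "Zhen"),
]

adjacency = {n: [] for n in labels}
for a, b in edges:  # no relationshipFilter: traverse both directions
    adjacency[a].append(b)
    adjacency[b].append(a)


def expand(start, whitelist, termination):
    visited = {start}          # NODE_GLOBAL: never visit a node twice
    queue = deque([[start]])   # breadth-first expansion
    results = []
    while queue:
        path = queue.popleft()
        for nxt in adjacency[path[-1]]:
            if nxt in visited:
                continue
            if labels[nxt] in termination:
                visited.add(nxt)
                results.append(path + [nxt])  # return the path, stop here
            elif labels[nxt] in whitelist:
                visited.add(nxt)
                queue.append(path + [nxt])    # keep expanding through people
    return results


paths = expand("Joe", whitelist={"Person"}, termination={"Car"})
for path in paths:
    print(" -> ".join(path))
```

Each car is reached exactly once, which is why the real query returns one path per Car node.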

How to do this in a single Cypher Query?

So this is a very basic question. I am trying to write a Cypher query that creates a node and connects it to multiple nodes.
As an example, let's say I have a database with towns and cars. I want to create a query that:
creates people, and
connects them with the town they live in and any cars they may own.
So here goes:
Here's one way I tried this query (I have WHERE clauses that specify which town and which cars, but to simplify):
MATCH (t: Town)
OPTIONAL MATCH (c: Car)
MERGE a = ((c) <-[:OWNS_CAR]- (p:Person {name: "John"}) -[:LIVES_IN]-> (t))
RETURN a
But this returns multiple people named John - one for each car he owns!
In two queries:
MATCH (t:Town)
MERGE a = ((p:Person {name: "John"}) -[:LIVES_IN]-> (t))
MATCH (p:Person {name: "John"})
OPTIONAL MATCH (c:Car)
MERGE a = ((p) -[:OWNS_CAR]-> (c))
This gives me the result I want, but I was wondering if I could do this in 1 query. I don't like the idea that I have to find John again! Any suggestions?
It took me a bit to wrap my head around why MERGE sometimes creates duplicate nodes when I didn't intend that. This article helped me.
The basic insight is that it would be best to merge the Person node first before you match the towns and cars. That way you won't get a new Person node for each relationship pattern.
If Person nodes are uniquely identified by their name properties, a unique constraint would prevent you from creating duplicates even if you run a mistaken query.
If a person can have multiple cars and residences in multiple towns, you also want to avoid a cartesian product of cars and towns in your result set before you do the merge. Try using the table output in Neo4j Browser to see how many rows are getting returned before you do the MERGE to create relationships.
Here's how I would approach your query.
MERGE (p:Person {name:"John"})
WITH p
OPTIONAL MATCH (c:Car)
WHERE c.licensePlate in ["xyz123", "999aaa"]
WITH p, COLLECT(c) as cars
OPTIONAL MATCH (t:Town)
WHERE t.name in ["Lexington", "Concord"]
WITH p, cars, COLLECT(t) as towns
FOREACH(car in cars | MERGE (p)-[:OWNS]->(car))
FOREACH(town in towns | MERGE (p)-[:LIVES_IN]->(town))
RETURN p, towns, cars
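A toy Python model contrasting the question's single MERGE with the approach above (not Neo4j code; `question_query` and `answer_query` are hypothetical names). In the question's version the whole pattern is merged once per (car, town) row with p unbound, so a new John appears whenever no existing John has both of that row's relationships. In the answer's version John is merged once and the collected lists produce each relationship once.

```python
from itertools import product

cars = ["xyz123", "999aaa"]  # licence plates used in the answer


def question_query(cars, towns):
    """MERGE (c)<-[:OWNS_CAR]-(p:Person {name: "John"})-[:LIVES_IN]->(t),
    evaluated once per (car, town) row, with p not bound beforehand."""
    people, rels = [], set()
    for car, town in product(cars, towns):  # cartesian product of the MATCHes
        reusable = [p for p in range(len(people))
                    if (p, "OWNS_CAR", car) in rels and (p, "LIVES_IN", town) in rels]
        if not reusable:  # no complete match, so the whole pattern is created
            p = len(people)
            people.append("John")
            rels |= {(p, "OWNS_CAR", car), (p, "LIVES_IN", town)}
    return people, rels


def answer_query(cars, towns):
    """MERGE John once, COLLECT cars and towns into one row,
    then FOREACH ... MERGE each relationship."""
    people = ["John"]  # MERGE (p:Person {name: "John"})
    rels = set()
    for car in cars:    # FOREACH(car in cars | MERGE (p)-[:OWNS]->(car))
        rels.add((0, "OWNS", car))
    for town in towns:  # FOREACH(town in towns | MERGE (p)-[:LIVES_IN]->(town))
        rels.add((0, "LIVES_IN", town))
    return people, rels


one_town = question_query(cars, ["Lexington"])
two_towns = question_query(cars, ["Lexington", "Concord"])
answer = answer_query(cars, ["Lexington", "Concord"])
print(len(one_town[0]), len(two_towns[0]), len(answer[0]))  # 2 4 1
```

With one town the question's query yields one John per car, as reported; with two towns the cartesian product makes it one John per (car, town) pair.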

Merging duplicate nodes and their relationship

I have a requirement to merge duplicate nodes and keep one copy. The issue I am facing is that when I merge the nodes, duplicate relationships are created. Instead, I want the relationships to be merged as well, without duplicates.
Can you give some suggestions?
CREATE (n:People { name: 'Person1', lastname: 'Person1LastName', email_ID:'Person1#test2.com' })
CREATE (n:People { name: 'Person2', lastname: 'Person2LastName', email_ID:'Person2#test2.com' })
CREATE (n:People { name: 'Person2', lastname: 'Person2LastName', staysin:'California' })
CREATE (n:People { name: 'Person3', lastname: 'Person3LastName', email_ID:'Person3#test2.com' })
Person2 -[r:Has_Met]->(Person1)
(Person3)-[r:FRIENDS_WITH]->(Person2) having email_ID='Person2#test2.com'
Now I want to keep a single Person2 node and keep both relationships with the other nodes.
I tried something like this:
MATCH (p:People{name:"person1"})
WITH p.name as name, collect(p) as nodes, count(*) as cnt
WHERE cnt > 1
WITH head(nodes) as first, tail(nodes) as rest
UNWIND rest AS to_delete
MATCH (to_delete)-[r:HAS_MET]->(e:name)
MERGE (first)-[r1:HAS_MET]->(e)
on create SET r1=r
SET to_delete.isDuplicate=true
RETURN count(*);
This comes from a related question, but it only handles one relationship type (HAS_MET). How do I handle all relationships at once?
Without a description of your model or some sample data, I can unfortunately only answer in general terms, which may still help you.
Have a look at the APOC library and consider the use of the procedures Merge Nodes and Redirect Relationship To. You will find explanatory images and Cypher statements there for each case.
Extension after question update
Initial situation
CREATE
(p1:People {name: 'Person1', lastname: 'Person1LastName', email_ID: 'Person1#test2.com'}),
(p2a:People {name: 'Person2', lastname: 'Person2LastName', email_ID: 'Person2#test2.com'}),
(p2b:People {name: 'Person2', lastname: 'Person2LastName', staysin: 'California'}),
(p3:People {name: 'Person3', lastname: 'Person3LastName', email_ID: 'Person3#test2.com'}),
(p2a)-[:HAS_MET]->(p1),
(p2b)-[:HAS_MET]->(p1),
(p3)-[:FRIENDS_WITH]->(p2a);
Solution
MATCH (oneNode:People {email_ID: 'Person2#test2.com'}), (otherNode:People {staysin: 'California'})
CALL apoc.refactor.mergeNodes([oneNode, otherNode])
YIELD node
MATCH (node)-[relation:HAS_MET]->(met:People)
WITH met, tail(collect(relation)) AS surplusRelations
UNWIND surplusRelations AS surplusRelation
DELETE surplusRelation;
line 1: select both nodes to be combined
line 2: call the appropriate merge-nodes procedure
line 3: define the result variable
line 4: identify all HAS_MET relationships between the combined node and a met person (at least two here)
line 5: for each met person, select all relationships but the first one
line 7: delete all surplus relationships
Result
merged node Person2, containing all attributes from source nodes (note especially email_ID and staysin)
one relationship Person1-Person2
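The same idea, generalized to all relationship types, as a toy Python model (not APOC's implementation; `merge_nodes` and `drop_parallel_duplicates` are hypothetical helpers). It folds one node into the other, redirects the dropped node's relationships, then keeps only the first relationship per (start, type, end). Two assumptions: on a property clash the kept node's value wins, and relationship properties are ignored when detecting duplicates.

```python
def merge_nodes(nodes, rels, keep, drop):
    """Fold node `drop` into node `keep`; return the redirected relationships."""
    for key, value in nodes.pop(drop).items():
        nodes[keep].setdefault(key, value)  # assumption: keep's value wins on a clash
    return [(keep if s == drop else s, t, keep if e == drop else e) for s, t, e in rels]


def drop_parallel_duplicates(rels):
    """Keep the first relationship per (start, type, end); delete the rest."""
    seen, kept = set(), []
    for rel in rels:
        if rel not in seen:
            seen.add(rel)
            kept.append(rel)
    return kept


nodes = {
    "p1": {"name": "Person1", "lastname": "Person1LastName", "email_ID": "Person1#test2.com"},
    "p2a": {"name": "Person2", "lastname": "Person2LastName", "email_ID": "Person2#test2.com"},
    "p2b": {"name": "Person2", "lastname": "Person2LastName", "staysin": "California"},
    "p3": {"name": "Person3", "lastname": "Person3LastName", "email_ID": "Person3#test2.com"},
}
rels = [("p2a", "HAS_MET", "p1"), ("p2b", "HAS_MET", "p1"), ("p3", "FRIENDS_WITH", "p2a")]

rels = merge_nodes(nodes, rels, keep="p2a", drop="p2b")
rels = drop_parallel_duplicates(rels)
```

Grouping by (start, type, end) rather than by one hard-coded type is what lets this handle every relationship type at once.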

neo4j cypher query to delete a middle node and connect all its parent node to child node

Given the graph created by the query below, I want to delete a middle node (say 'and') and connect each of its predecessor nodes (say 'graph') directly to its successor node (say 'db'), using the middle node's outgoing relationship that has the same 'seqid'.
MERGE (n:Person { name: 'graph'});
MERGE (n:Person { name: 'and'});
MERGE (n:Person { name: 'relational' });
MERGE (n:Person { name: 'nosql'});
MERGE (n:Person { name: 'server'});
MERGE (n:Person { name: 'db'});
MERGE (a:Person { name: 'graph'}) MERGE (b:Person { name: 'and' }) MERGE (a)-[:NEXT{seqid:1}]->(b);
MERGE (a:Person { name: 'and' }) MERGE (b:Person { name: 'db'}) MERGE (a)-[:NEXT{seqid:1 , caps: 'true'}]->(b);
MERGE (a:Person { name: 'relational'}) MERGE (b:Person { name: 'db'}) MERGE (a)-[:NEXT{seqid:1}]->(b);
MERGE (a:Person { name: 'nosql'}) MERGE (b:Person { name: 'db' }) MERGE (a)-[:NEXT{seqid:2, caps: 'true'}]->(b);
MERGE (a:Person { name: 'server'}) MERGE (b:Person { name: 'and' }) MERGE (a)-[:NEXT{seqid:1}]->(b);
MERGE (a:Person { name: 'and' }) MERGE (b:Person { name: 'db'}) MERGE (a)-[:NEXT{seqid:1}]->(b);
MERGE (a:Person { name: 'server'}) MERGE (b:Person { name: 'and'}) MERGE (a)-[:CONNECTS{seqid:2}]->(b);
MERGE (a:Person { name: 'and' }) MERGE (b:Person { name: 'db'}) MERGE (a)-[:CONNECTS{seqid:2, caps: 'true'}]->(b);
i.e. after deleting 'and', I want:
(graph)-[:NEXT{seqid:1 , caps: 'true'}]->(db)
(relational)-[:NEXT{seqid:1}]->(db)
(nosql)-[:NEXT{seqid:2, caps: 'true'}]->(db)
(server)-[:NEXT{seqid:1}]->(db)
(server)-[:CONNECTS{seqid:2, caps: 'true'}]->(db)
Please help me solve this.
(I am using Neo4j 2.3.6 Community Edition via the Java API in embedded mode.)
The obstacle here is that plain Cypher cannot create relationship types dynamically: you cannot match incoming relationships without knowing their type and then create a new relationship of that same type.
If you know the types of the relationships you need to process, and can address them explicitly, then you can do this with Cypher. Here's the query for all :NEXT relationships; it copies the properties of the relationship from the middle node to the end node onto the newly created relationship:
MATCH (middle:Person{name:'and'})
WITH middle
MATCH (from:Person)-[rFrom:NEXT]->(middle)
WHERE exists(rFrom.seqid)
WITH middle, rFrom, from
MATCH (middle)-[rTo:NEXT]->(to:Person)
WHERE rTo.seqid = rFrom.seqid
WITH middle, rFrom, from, rTo, to
CREATE (from)-[rNew:NEXT]->(to)
SET rNew += rTo
DELETE rFrom
You'll want to repeat this for every relationship type you're interested in, and when there are no more relationships to or from your middle node, delete the node.
Note that if you upgrade to Neo4j 3, the APOC Procedures library has graph refactoring procedures that easily take care of this.
EDIT
Altered my Cypher above to do CREATE instead of MERGE.
I also removed the deletion of the relationships from the middle node to the next node. You seem to want to take the relationship properties from the relationship connecting the middle node to the next node, and there may be multiple incoming relationships to the middle node with the same type and seqid, but only a single outgoing relationship with that type and seqid.
In other words, the number of incoming relationships with a given type and seqid need not equal the number of outgoing ones, so each outgoing relationship may be reused when creating several new relationships.
Only after you're all done creating the new relationships should you detach and delete the middle node.
Adding another answer that fulfills all the requirements, but requires Neo4j 3.0.x or greater. Specifically, it requires the apoc.create.relationship() procedure from APOC Procedures, which lets us create a relationship with a dynamic type, taken from the matched incoming relationships.
This will take care of all relationships at once (at least those with seqid), so we should be okay to detach and delete the middle node at the end.
MATCH (middle:Person{name:'and'})
WITH middle
MATCH (from:Person)-[rFrom]->(middle)
WHERE EXISTS(rFrom.seqid)
WITH middle, rFrom, from
MATCH (middle)-[rTo]->(to:Person)
WHERE TYPE(rTo) = TYPE(rFrom) AND rTo.seqid = rFrom.seqid
WITH middle, rFrom, from, rTo, to
CALL apoc.create.relationship(from, TYPE(rFrom), PROPERTIES(rTo), to) YIELD rel
WITH DISTINCT middle // collapse the per-relationship rows so the middle node is deleted once
DETACH DELETE middle
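The bypass logic of the APOC query, sketched as a toy Python model (not Neo4j code; `bypass` is a hypothetical helper). It uses a simplified, hypothetical subset of the question's data with only one and->db NEXT seqid:1 relationship. For every incoming relationship with a seqid it finds the middle node's outgoing relationship of the same type and seqid, creates a replacement carrying the outgoing relationship's properties, and finally drops every relationship touching the middle node. Because the type is just a string here, carrying it over is trivial, which is exactly what apoc.create.relationship() adds to Cypher.

```python
rels = [  # (start, type, end, properties)
    ("graph", "NEXT", "and", {"seqid": 1}),
    ("and", "NEXT", "db", {"seqid": 1, "caps": "true"}),
    ("relational", "NEXT", "db", {"seqid": 1}),
    ("nosql", "NEXT", "db", {"seqid": 2, "caps": "true"}),
    ("server", "NEXT", "and", {"seqid": 1}),
    ("server", "CONNECTS", "and", {"seqid": 2}),
    ("and", "CONNECTS", "db", {"seqid": 2, "caps": "true"}),
]


def bypass(rels, middle):
    created = []
    for start, r_type, end, props in rels:
        if end != middle or "seqid" not in props:  # WHERE EXISTS(rFrom.seqid)
            continue
        for m, to_type, to, to_props in rels:
            if m == middle and to_type == r_type and to_props.get("seqid") == props["seqid"]:
                # like apoc.create.relationship(from, TYPE(rFrom), PROPERTIES(rTo), to)
                created.append((start, r_type, to, dict(to_props)))
    # DETACH DELETE middle: every relationship touching it disappears
    kept = [r for r in rels if middle not in (r[0], r[2])]
    return kept + created


result = bypass(rels, "and")
```

One difference: the Cypher version only reaches DETACH DELETE when at least one incoming/outgoing pair matched, while this toy always removes the middle node's relationships.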

CREATE UNIQUE in neo4j produces duplicate nodes

According to the neo4j documentation:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
This sounds great, but CREATE UNIQUE doesn't seem to follow the 'least possible change' rule. For example, here is some Cypher to create two people:
CREATE (n:Person {name: 'Alice'});
CREATE (n:Person {name: 'Bob'});
CREATE INDEX ON :Person(name);
and here are two CREATE UNIQUE statements that create relationships between those people. Since both people already exist in the graph, only the relationships should be newly created:
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)-[:knows]->(b:Person {name: 'Bob'})
RETURN a
MATCH (a:Person {name: 'Alice'})
CREATE UNIQUE (a)<-[:knows]-(b:Person {name: 'Bob'})
RETURN a
After this, the graph should look like
(Alice)<---KNOWS--->(Bob).
But when you run a MATCH query:
MATCH (a:Person)
RETURN a
it seems that the graph now looks like
(Bob)
(Bob)--KNOWS-->(Alice)--KNOWS-->(Bob);
two extra Bobs have been created.
I looked a bit through the other Cypher commands, but none of them seem intended for this use case: create a link between existing node A and existing node B if B exists, and otherwise create a link between existing node A and a newly created node B. How can this problem best be solved within the Cypher framework?
This query should do what you want (if you always want to end up with a single knows relationship between the 2 nodes):
MATCH (a:Person {name: 'Alice'})
MERGE (b:Person {name: 'Bob'})
MERGE (a)-[:knows]->(b)
RETURN a;
Here is how you can do it with CREATE UNIQUE (note that CREATE UNIQUE was removed in Neo4j 4.0, where MERGE is the replacement):
MATCH (a:Person {name: 'Alice'}), (b:Person {name:'Bob'})
CREATE UNIQUE (a)-[:knows]->(b), (b)-[:knows]->(a)
You need to match both nodes first. Otherwise, whenever the relationship does not exist yet, CREATE UNIQUE creates a new node for the unbound part of the pattern instead of reusing the existing one.

Resources