How to prevent neo4j MERGE from creating duplicate relationships? - neo4j

I am attempting to create nodes and relationships if they do not exist. I do not know ahead of time if anything in the DB exists.
This is my initial query:
MERGE (t:type { name: 'aaa'})
MERGE (m:model { name: 'bbb'})
MERGE (r:region {name: 'ccc'})
MERGE (p:param {name: 'ddd'})
MERGE (i:init {value: 123})
MERGE (u:forecast {url: 'http://something.png'})
MERGE (t)-[:HAS]-(m)-[:HAS]-(r)-[:HAS]-(p)-[:HAS]-(i)-[:HAS]-(u)
This correctly produces a graph like this:
Then I run this query again, but this time I change the name of the "model" object to "bbc" (instead of "bbb"):
MERGE (t:type { name: 'aaa'})
MERGE (m:model { name: 'bbc'})
MERGE (r:region {name: 'ccc'})
MERGE (p:param {name: 'ddd'})
MERGE (i:init {value: 123})
MERGE (u:forecast {url: 'http://something.png'})
MERGE (t)-[:HAS]-(m)-[:HAS]-(r)-[:HAS]-(p)-[:HAS]-(i)-[:HAS]-(u)
Now, however, my graph looks like this:
Everything looks correct except for the three duplicated relationships.
I realize that MERGE will create the whole path if it does not exist. There must be some way to avoid creating duplicate relationships, though.
I would appreciate being pointed in the right direction!

MERGE checks whether the pattern as a whole already exists. If even one node in it differs, the whole pattern counts as non-existent, and every relationship in it is created again, including those between nodes that were already connected.
The solution is to split this MERGE into several statements, one per relationship:
MERGE (t)-[:HAS]-(m)-[:HAS]-(r)-[:HAS]-(p)-[:HAS]-(i)-[:HAS]-(u)
becomes
MERGE (t)-[:HAS]-(m)
MERGE (m)-[:HAS]-(r)
MERGE (r)-[:HAS]-(p)
MERGE (p)-[:HAS]-(i)
MERGE (i)-[:HAS]-(u)
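Put together with the node MERGEs from the question, the second run could look like the sketch below. The trailing duplicate check is an addition for illustration and not part of the original answer; it assumes id() is available (recent Neo4j versions deprecate it in favour of elementId()). Run the two statements separately.

```cypher
// Nodes: each MERGE matches an existing node or creates it.
MERGE (t:type {name: 'aaa'})
MERGE (m:model {name: 'bbc'})
MERGE (r:region {name: 'ccc'})
MERGE (p:param {name: 'ddd'})
MERGE (i:init {value: 123})
MERGE (u:forecast {url: 'http://something.png'})
// Relationships: one MERGE per hop, so hops that already exist are reused.
MERGE (t)-[:HAS]-(m)
MERGE (m)-[:HAS]-(r)
MERGE (r)-[:HAS]-(p)
MERGE (p)-[:HAS]-(i)
MERGE (i)-[:HAS]-(u);

// Optional check: node pairs joined by more than one HAS relationship.
// id(a) < id(b) keeps each undirected pair only once.
MATCH (a)-[rel:HAS]-(b)
WHERE id(a) < id(b)
WITH a, b, count(rel) AS rels
WHERE rels > 1
RETURN a, b, rels;
```

If the database was populated only by queries of this split form, the check should come back empty.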

Related

neo4j apoc.path.expandConfig - How to use relationshipFilter with a list of relationships

According to the documentation, the pipe character (|) acts as an OR in the relationshipFilter, while the comma (,) concatenates relationships, creating a list of them.
see for example (look at the explanation with the black background comparing to the query itself):
and:
here, page 14: sequences:
The question is: do commas bind more tightly than pipes?
That is, if I want several alternative sequences of steps, can I specify several strict lists, or must I specify one list in which each step has several options?
I wanted to achieve 4 relationship sequence options:
1.CREATE> or
2.REACT,REPLY or
3.CREATE>,RELATED or
4.REPLY>,CREATE
So I wrote a simple query:
MATCH(u:User{key:1})
CALL apoc.path.expandConfig(u, {maxLevel: 3,
relationshipFilter: 'CREATE>|REACT,REPLY|CREATE>,RELATED|REPLY>,CREATE',
uniqueness:"RELATIONSHIP_GLOBAL"})
YIELD path
RETURN path
Given this sample data:
MERGE (a:User{key: 1})
MERGE (b:Tags{key: 2})
MERGE (c:Post{key: 3})
MERGE (d:Comment{key: 4})
MERGE (e:Comment{key: 5})
MERGE (f:Comment{key: 6})
MERGE (g:User{key: 7})
MERGE (h:User{key: 8})
MERGE (i:Post{key: 9})
MERGE (j:Tags{key: 10})
MERGE (k:Post{key: 11})
MERGE (l:Comment{key: 12})
MERGE (a)-[:CREATE]-(b)
MERGE (a)-[:CREATE]-(c)
MERGE (a)-[:REACT]-(c)
MERGE (a)-[:CREATE]-(d)
MERGE (a)-[:REACT]-(d)
MERGE (b)-[:RELATED]-(c)
MERGE (d)-[:REPLY]-(c)
MERGE (d)-[:REPLY]-(d)
MERGE (h)-[:REACT]-(c)
MERGE (g)-[:REACT]-(c)
MERGE (h)-[:CREATE]-(j)
MERGE (j)-[:RELATED]-(c)
MERGE (g)-[:CREATE]-(i)
MERGE (e)-[:REPLY]-(i)
MERGE (f)-[:REPLY]-(i)
MERGE (a)-[:REPLY]-(i)
MERGE (h)-[:CREATE]-(k)
MERGE (l)-[:REPLY]-(k)
MERGE (a)-[:REACT]-(l)
I was expecting to get an answer including (a:User{key: 1})-[:REPLY]->(i:Post{key: 9})<-[:CREATE]-(g:User{key: 7}), which corresponds with my last part of the relationshipFilter, but did not get it.
Thank you for your time
I believe your relationshipFilter needs to be changed.
You have written: 'CREATE>|REACT,REPLY|CREATE>,RELATED|REPLY>,CREATE'
Which matches:
CREATE> OR REACT
REPLY OR CREATE
RELATED OR REPLY
CREATE (this clause is never checked because of maxLevel:3.)
It appears you intended to use the relationshipFilter: "CREATE>,REACT|REPLY,CREATE>|RELATED,REPLY>|CREATE"
Which matches
CREATE>
REACT OR REPLY
CREATE> OR RELATED
REPLY> OR CREATE
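Plugged into the query from the question, the suggested filter would look like the sketch below. Raising maxLevel to 4 is an assumption made here so that the fourth step of the sequence can be reached; the original query used 3.

```cypher
MATCH (u:User {key: 1})
CALL apoc.path.expandConfig(u, {
  maxLevel: 4,  // assumption: one level per step in the sequence
  relationshipFilter: 'CREATE>,REACT|REPLY,CREATE>|RELATED,REPLY>|CREATE',
  uniqueness: 'RELATIONSHIP_GLOBAL'
})
YIELD path
RETURN path
```

Note that this still describes a single sequence with alternatives at each step, not four independent sequences.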

Excluding nodes using WHERE NOT does not work

I run into a problem when I try to exclude nodes using MATCH and WHERE.
I have the following nodes and relationships:
MERGE (a1:accz {id: 1})
MERGE (a2:accz {id: 2})
MERGE (a3:accz {id: 3})
MERGE (a4:accz {id: 4})
MERGE (a5:accz {id: 5})
MERGE (i1:itemz {id: 1})
MERGE (i2:itemz {id: 2})
MERGE (i3:itemz {id: 3})
MERGE (i4:itemz {id: 4})
MERGE (a1)-[:AUTHOR]->(i1)
MERGE (a2)-[:AUTHOR]->(i2)
MERGE (a3)-[:AUTHOR]->(i1)
MERGE (a3)-[:AUTHOR]->(i3)
MERGE (a4)-[:AUTHOR]->(i4)
MERGE (a4)-[:AUTHOR]->(i5)
MERGE (a4)-[:AUTHOR]->(i5)
MERGE (a5)-[:AUTHOR]->(i2)
MERGE (a5)-[:AUTHOR]->(i5)
When I execute the following (explicitly listing the items that the accz nodes must have a relationship with):
MATCH (a:accz)-[:AUTHOR]->(i:itemz) WHERE ({id: i.id} IN [({id: 3}), ({id: 4})]) RETURN a
I get the accz nodes (3, 4, 5), which is fine. But then I exclude some nodes using WHERE, as in the next query:
MATCH (a:accz)-[:AUTHOR]->(i:itemz) WHERE ({id: i.id} IN [({id: 3}), ({id: 4})]) AND (NOT (a)-[:AUTHOR]->(:itemz {id:5})) RETURN a
but I still get the accz node with id 5. It should be excluded, because accz {id: 5} is AUTHOR of (:itemz {id: 5}).
What am I doing wrong?
The odd behaviors seen in your example would seem like bugs, but can be explained (after some careful thought). One conclusion, after all is said and done, is that you should avoid using unbound nodes in a MERGE clause.
The odd behaviors
Your creation query has no MERGE clause to create the itemz node i5. That is, this clause is missing: MERGE (i5:itemz {id: 5}).
Therefore, it would seem like the 2 MERGE (a4)-[:AUTHOR]->(i5) clauses should result in the creation of a new unlabelled i5 node with no properties -- but no such node is created!
And it would also seem like the MERGE (a5)-[:AUTHOR]->(i5) clause should result in a relationship with that new i5 -- but instead it unexpectedly results in a relationship with i4!
Explanation
This snippet of code causes the odd behavior (I have added comments to clarify):
MERGE (a4)-[:AUTHOR]->(i4) // Makes sure `(a4)-[:AUTHOR]->(i4)` relationship exists
MERGE (a4)-[:AUTHOR]->(i5) // Matches above relationship, so creates `i5` and binds it to `i4`!
MERGE (a4)-[:AUTHOR]->(i5) // Matches same relationship, so nothing is done.
So, after the snippet is executed, i4 and i5 are bound to the same node. This explains the odd behaviors.
Conclusion
To avoid unexpected results, you should avoid using unbound nodes in MERGE clauses.
If your creation query had included a MERGE (i5:itemz {id: 5}) clause before the relationships were created, then your queries would have worked reasonably. The result of the first query would contain accz nodes 3 and 4, and the result of the second query would only contain 3.
By the way, ({id: i.id} IN [({id: 3}), ({id: 4})]) can be greatly simplified to just i.id IN [3, 4].
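With that simplification applied, the exclusion query could be written as in this sketch. It assumes the data was recreated with the missing MERGE (i5:itemz {id: 5}) clause in place, as described above.

```cypher
// Assumes i5 was created explicitly with MERGE (i5:itemz {id: 5})
// before any relationship MERGE referred to it.
MATCH (a:accz)-[:AUTHOR]->(i:itemz)
WHERE i.id IN [3, 4]
  AND NOT (a)-[:AUTHOR]->(:itemz {id: 5})
RETURN a
```

On the corrected data this should return only accz 3, since accz 4 is also an AUTHOR of itemz 5.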

How to merge tree in neo4j

Let's say I have a database with named nodes and that the database is either empty or has the following content:
I now need a neo4j statement that inserts exactly that tree structure if it does not already exist in the database.
For a simple node-pair merge, I could use something like
MERGE ({name: 'A'})-[:R1]->({name: 'B'})
But I want the tree structure. How do I add C here?
First, add a label to your tree nodes (Tree in the example below) and create a unique constraint on the name attribute, like this:
CREATE CONSTRAINT ON (n:Tree) ASSERT n.name IS UNIQUE;
Then you can use this script to create the C node and the others if they don't exist:
MERGE (a:Tree {name: 'A'})
MERGE (b:Tree {name: 'B'})
MERGE (c:Tree {name: 'C'})
MERGE (a)-[:R1]->(b)
MERGE (a)-[:R2]->(c);
As you can see you have to use one MERGE per node, and then one MERGE per relationship.
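The constraint statement above uses the older syntax. On recent Neo4j versions (4.4 and later, as far as I know) the equivalent is written with FOR ... REQUIRE. The constraint name tree_name_unique below is a hypothetical name chosen for this sketch.

```cypher
// Same uniqueness constraint in the newer syntax; the name is illustrative.
CREATE CONSTRAINT tree_name_unique IF NOT EXISTS
FOR (n:Tree) REQUIRE n.name IS UNIQUE;
```

The MERGE script itself stays the same in either version.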

Neo4j - LOAD-CSV not creating all nodes

I am just getting started on Neo4J, and I am trying to load some data into Neo4j 3.1 using LOAD CSV with the following script:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname, title: line.Title,
gender: line.Gender, birthday: line.Birthday, bloodType: line.BloodType, weight: line.Pounds, height: line.FeetInches})
MERGE (contact:Contact {phoneNumber: line.TelephoneNumber, email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact)
MERGE (color:Color {name: line.Color})
MERGE (person)-[:FAVORITE_COLOR]->(Color)
MERGE (address:Address {streetAddress: line.StreetAddress, city: line.City, zipCode: line.ZipCode})
MERGE (person)-[:LIVES_AT]->(address)
MERGE (state:State {abbr: line.State, name: line.StateFull})
MERGE (city)-[:STATE_OF]->(stage)
MERGE (country:Country {name: line.CountryFull, abbr: line.Country, code: line.TelephoneCountryCode})
MERGE (state)-[:IN_COUNTRY]->(country)
MERGE (credentials:Credentials {userName: line.Username, password: line.Password, GUID: line.GUID})
MERGE (person)-[:LOGS_in]->(credentials)
MERGE (browser:Browser {agent: line.BrowserUserAgent})
MERGE (person)-[:BROWSES_WITH]->(browser)
MERGE (creditCard:CreditCard {number: line.CCNumber, cvv2: line.CVV2, expireDate: line.CCExpires})
MERGE (person)-[:USES_CC]->(creditCard)
MERGE (creditCompany:CreditCompany {name: line.CCType})
MERGE (creditCard)-[:MANAGED_BY]->(creditCompany)
MERGE (occupation:Occupation {name: line.Occupation})
MERGE (person)-[:WORKS_AS]->(occupation)
MERGE (company:Company {name: line.Company})
MERGE (person)-[:WORKDS_FOR]->(company)
MERGE (company)-[:EMPLOYES]->(occupation)
MERGE (vehicle:Vehicle {name: line.Vehicle})
MERGE (person)-[:DRIVES]->(vehicle)
The input file has about 50k rows. The process runs for a few hours without finishing, and if I then query the database I see that only nodes of one type (Person) were created. If I run a smaller file with only 3 entries, all the additional nodes and relationships are created.
I have already changed the amount of memory allocated to Neo4j and to the JVM, and still no success. I understand that MERGE takes longer than CREATE to be executed but I am trying to avoid duplication of nodes with the insert.
Any ideas or suggestions on what I should change or how I can improve this ?
Thank you,
--MD.
Try splitting your query into multiple smaller ones. That works better and is easier to manage. Also, when using MERGE you should typically match on a single unique property, such as an email for a person, and then set the remaining properties with ON CREATE SET. That should speed up the query. It looks like this:
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber
In your case, where the person has no single unique property, you can use a combination of several, but be aware that every property you add to the MERGE slows down the query.
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender,
person.birthday = line.Birthday, person.bloodType = line.BloodType,
person.weight = line.Pounds, person.height = line.FeetInches
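As a rough sketch of the "multiple smaller queries" idea, the contact part of the import could be split into passes like this, with each statement run on its own. The index is an assumption added here (written in Neo4j 3.x syntax to match the version in the question), and the sketch assumes every row has a non-empty EmailAddress, since MERGE cannot match on a null property. Note also that in the original query (Color), (city) and (stage) are variables that are never bound (Cypher variables are case-sensitive), so those MERGE clauses do not refer to the nodes merged just above them.

```cypher
// Assumption: an index on the MERGE key so each lookup avoids a label scan.
CREATE INDEX ON :Contact(email);

// Pass 1: contacts only, merged on a single property.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (contact:Contact {email: line.EmailAddress})
ON CREATE SET contact.phoneNumber = line.TelephoneNumber;

// Pass 2: persons, then the relationship to the contact from pass 1.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///Fake59.csv" AS line
MERGE (person:Person {firstName: line.GivenName, middleInitial: line.MiddleInitial, lastName: line.Surname})
ON CREATE SET person.title = line.Title, person.gender = line.Gender
WITH person, line
MATCH (contact:Contact {email: line.EmailAddress})
MERGE (person)-[:CONTACTED_AT]->(contact);
```

The other node types (address, state, credentials and so on) would follow the same pattern, one pass each.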

Multiple relations like join tables (neo4j)

How do I express the following in neo4j?
match or create user bob; bob works at studio; while at studio, he's allowed to doodle; while at studio, he's also allowed to type.
Here's what I have:
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:works_at]->(c)-[:allowed_to]->(p:permission {name:'doodle'})
MERGE (u)-[:works_at]->(c)-[:allowed_to]->(p:permission {name:'type'})
This doesn't work as permission becomes a relation of company.
Also, is it possible to chain relations such that:
MERGE work=(u)-[:works_at]->(c)
CREATE (work)-[:allowed_to]->(p:permission {name:'doodle'})
CREATE (work)-[:allowed_to]->(p:permission {name:'type'})
where you assign a relation to a variable to continue it later on in another query?
How about modelling it so the company grants the permission? Something like this...
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:works_at]->(c)
MERGE (u)-[:allowed_to]->(p1:permission {name:'doodle'})<-[:GRANTS]-(c)
MERGE (u)-[:allowed_to]->(p2:permission {name:'type'})<-[:GRANTS]-(c)
RETURN *
You can't really refer to objects via identifiers/variables you have created previously in other queries. You would have to re-match or merge those previously created objects in your new query.
Part 2 could be modelled something like this..
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[:DOES]->(work:Work {start_date: timestamp()})-[:AT]->(c)
CREATE (work)-[:allowed_to]->(p1:permission {name:'doodle'})
CREATE (work)-[:allowed_to]->(p2:permission {name:'type'})
As an alternative, if you never need to look up all users with a certain permission at a company, you could maintain a collection of permissions as a relationship property.
MERGE (u:user {name:'bob'})
MERGE (c:company {name: 'studio'})
MERGE (u)-[r:works_at]->(c)
SET r.permissions = ['doodle', 'type']
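Reading the permissions back is then a single pattern match. A minimal sketch, assuming the data created by the query above; the canDoodle alias is just an illustrative name:

```cypher
// Check whether bob is allowed to doodle while working at studio.
MATCH (:user {name: 'bob'})-[r:works_at]->(:company {name: 'studio'})
RETURN 'doodle' IN r.permissions AS canDoodle
```

With the permissions list set as shown, this should return true.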
