Redundancy of graph in Neo4j - neo4j

I have created a small graph in Neo4j, and the nodes and relationships were created as expected. If I run the same code again, the nodes and relationships are created again, instead of Neo4j reporting that they already exist, as Oracle would.
MERGE (a:Person1 { name : 'ROGER', title : 'Developer', age :28})
MERGE (b:Person2 { name : 'Britney', title : 'financier',age :32})
MERGE (c:Person3 { name : 'Christian', title : 'tester',age :24})
Create (a)-[:HUSBAND{last_name:'WHITE'}]->(b) RETURN a,b,c;
So I would like to know whether Neo4j allows duplicates, that is, whether the nodes will be created again on every run.
Thanks in advance...

For reference, the MERGE statements are not creating new persons; only the CREATE statement at the end creates anything. See http://console.neo4j.org/r/qrzr6u, which reports the following upon re-execution:
created 1 relationship set 1 property
You probably want to use MERGE for all statements:
MERGE (a:Person1 { name : 'ROGER', title : 'Developer', age :28 })
MERGE (b:Person2 { name : 'Britney', title : 'financier', age :32 })
MERGE (c:Person3 { name : 'Christian', title : 'tester', age :24 })
MERGE (a)-[:HUSBAND { last_name:'WHITE' }]->(b)
RETURN a,b,c;
See http://console.neo4j.org/r/vmfl2v for an example.

MERGE does not re-create data if it already exists. CREATE always creates data, even if it already exists.
The documentation on merge points out that it always matches on the full pattern.
In the case of the cypher snippet you gave us, if you ran it twice you should end up with only one copy of Roger, Britney, and Christian, but I would expect two separate relationships between Roger and Britney, because CREATE always creates.
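One way to check this, as a small sketch that reuses the labels and property values from the question, is to count the HUSBAND relationships between the two nodes after running the snippet more than once:

```cypher
// Count the HUSBAND relationships between Roger and Britney.
// Labels and names are taken from the question's snippet.
MATCH (:Person1 {name: 'ROGER'})-[r:HUSBAND]->(:Person2 {name: 'Britney'})
RETURN count(r) AS husband_relationships;
```

If the explanation above holds, this count should grow by one with each run, while the number of person nodes stays the same.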
Watch out for the gotcha with MERGE, though: it always merges on the full pattern you specify. For example, if you do this:
MERGE (a:Person {fname: "Henry"});
MERGE (a:Person {fname: "Henry", lname: "Banks"});
Then you get two Henrys, one with no lname property and one with it. This is because the second MERGE looks for a Person node with fname:Henry, lname:Banks and fails to find it, so it creates one. It does not add an extra property to an existing node. This is a common trip-up when using MERGE.
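A minimal sketch of one way to avoid the second Henry, assuming fname alone is what identifies the person: merge only on the identifying property, then set the remaining properties separately.

```cypher
// Merge only on the property that identifies the node
// (assumption: fname is the identifying property here).
MERGE (a:Person {fname: "Henry"})
// Then add or update the other properties on whichever node
// was matched or created.
SET a.lname = "Banks";
```

Using ON CREATE SET instead of SET would fill in lname only when the node is newly created and leave an existing node untouched.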
Another common trip-up using MERGE (again due to the "whole pattern match") is this:
MERGE (a:Person {name:"Henry"})-[:knows]->(b:Person {name: "Mary"});
MERGE (a:Person {name:"Henry"})-[:married]->(b:Person {name: "Mary"});
This ends up creating two Henrys and two Marys.
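A sketch of how the same two relationships could be written without duplicating the nodes, assuming name identifies each Person: merge each node on its own, then merge each relationship between the already bound nodes.

```cypher
// Each node is merged on its own, so it is matched if it already exists.
MERGE (a:Person {name: "Henry"})
MERGE (b:Person {name: "Mary"})
// Both end nodes are bound, so each MERGE can only create the relationship.
MERGE (a)-[:knows]->(b)
MERGE (a)-[:married]->(b);
```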

Related

Merging Nodes in Neo4j

I am trying to merge two Neo4j graphs using Cypher. The first one is the example of countries and their capitals. The second one is a sample I created.
WITH "https://gist.githubusercontent.com/jimmycrequer/7aa867900d0cf0b9588d4354f09cb286/raw/countries.json" AS url
CALL apoc.load.json(url) YIELD value AS v
MERGE (c:Country {name: v.name})
SET c.population = v.population, c.area = v.area
CREATE (capital:City {name: v.capital})
CREATE (c)<-[:IS_CAPITAL_OF]-(capital)
FOREACH (n IN v.neighbors |
MERGE (neighbor:Country {name: n})
MERGE (c)-[:IS_NEIGHBOR_OF]-(neighbor)
)
To this, I'm trying to add my graph
//Manufacturers
MERGE (BMW:Manufacturer {name:"BMW" , headquarters :"Germany" , employees :100306,factories:25 ,revenue:95.8 ,production:1668982 ,sales: 1688982 })
MERGE(Germany:Country)-[:MANUFACTURERS]->(BMW)
The Node Germany has the following properties
id:103, area:357022, name:Germany, population:8288000
When I look at the final output, I see that an empty node was created for the [:MANUFACTURERS] relationship, in addition to the BMW node.
Change your second query a bit. Just because you name the node variable Germany, Neo4j doesn't know that you want to match the country whose name property is Germany.
In most cases you should MERGE or MATCH the nodes first, and only then add the relationship between the two:
MERGE (BMW:Manufacturer {name:"BMW" , headquarters :"Germany" , employees :100306,factories:25 ,revenue:95.8 ,production:1668982 ,sales: 1688982 })
MERGE (Germany:Country {name:'Germany'})
MERGE (Germany)-[:MANUFACTURERS]->(BMW)

Conditional partial merge of pattern into graph

I'm trying to create a relationship that connects a person to a city -> state -> country without recreating the city/state/country nodes and relationships if they already exist, so that I end up with only one USA node in my graph, for example.
I start with a person
CREATE (p:Person {name:'Omar', Id: 'a'})
RETURN p
Then I'd like to turn the following into an apoc.do.case statement,
or into a single MERGE statement that, using the unique constraint, creates a new node if none is found and otherwise matches the existing node:
// first case where the city/state/country all exist
MATCH (locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality)-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// second case where only state/country exist
MATCH (adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea)-[:SITUATED_IN]->(country)
return p
// third case where only country exists
MATCH (country:Country { name: 'USA' })
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country)
return p
// last case where none of city/state/country exist, so I have to create all nodes + relations
MERGE (p)-[:SITUATED_IN]->(locality:Locality{name:"San Diego"})-[:SITUATED_IN]->(adminArea:AdministrativeArea { name: 'California' })-[:SITUATED_IN]->(country:Country { name: 'USA' })
return p
The key here is that I only want to end up with one (California)->(USA). I don't want those nodes and relationships to be duplicated.
Your queries that use MATCH never specify which Person you want. Variable names like p only exist for the life of a query (and sometimes not even that long). So p is unbound in your MATCH queries, and can result in your MERGE clauses creating empty nodes. You need to add MATCH (p:Person {Id: 'a'}) to the start of those queries (assuming all people have unique Id values).
It should NOT be the responsibility of every single query to ensure that all needed localities exist and are connected correctly -- that is way too much complexity and overhead for every query. Instead, you should create the appropriate localities and inter-locality relationships separately -- before you need them. In fact, it should be the responsibility of each query that creates a locality to create all the relationships associated with it.
A MERGE skips creation only if every single thing in the pattern already exists; otherwise it creates the whole pattern. So, to avoid duplicates, a MERGE pattern should have at most 1 thing that might not already exist. In practice, a MERGE pattern should have at most 1 relationship, and if it has a relationship then the 2 end nodes should already be bound (by MATCH clauses, for example).
Once the Locality nodes and the inter-locality relationships exist, you can add a person like this:
MATCH (locality:Locality {name: "San Diego"})
MERGE (p:Person {Id: 'a'}) // create person if needed, specifying a unique identifier
ON CREATE SET p.name = 'Omar' // set other properties as needed
MERGE (p)-[:SITUATED_IN]->(locality) // create relationship if necessary
The above considerations should help you design the code for creating the Locality nodes and the inter-locality relationships.
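As a rough sketch of what that setup code could look like, using the labels and names from the question and assuming that name is unique within each label: every node is merged on its own first, and each relationship is then merged between two bound nodes.

```cypher
// Merge each node separately (assumption: name is unique per label).
MERGE (country:Country {name: 'USA'})
MERGE (adminArea:AdministrativeArea {name: 'California'})
MERGE (locality:Locality {name: 'San Diego'})
// Each MERGE below has one relationship and two bound end nodes,
// so it can only create the relationship, never extra nodes.
MERGE (adminArea)-[:SITUATED_IN]->(country)
MERGE (locality)-[:SITUATED_IN]->(adminArea);
```

If the same locality name can occur in several administrative areas, name alone would not be a safe key, and some composite identifier would be needed instead.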
Finally, the solution I used is much simpler: it's a series of merges.
match (person:Person {Id: 'a'}) // that should be present in the graph
merge (country:Country {name: 'USA'})
merge (state:State {name: 'California'})-[:SITUATED_IN]->(country)
merge (city:City {name: 'Los Angeles'})-[:SITUATED_IN]->(state)
merge (person)-[:SITUATED_IN]->(city)
return person;

Cypher 'Node Already Exists' issue with MERGE

I am perplexed as to why I am getting an error with this Cypher statement. I have a unique constraint on the address of the Location node, but I am using a MERGE, which should find the node if it exists and only return the id for the rest of the statement. What am I missing?
Here is my statement:
MERGE(l:Location{location_name:"Starbucks", address:"36350 Van Dyke Ave", city: "Sterling Heights",state: "MI", zip_code:"48312",type:"location",room_number:"",long:-83.028889,lat:42.561152})
CREATE(m:Meetup{meet_date:1455984000,access:"Private",status:"Active",type:"project",did_happen:"",topic:"New features for StudyUup",agenda:"This is a brainstorming session to come with with new ideas for the companion website, StudyUup. Using MatchUup as the base, what should be added, removed, or modified? Bring your thinking caps and ideas!"})
WITH m,l
MATCH (g:Project{title_slug:"studyuup"}) MATCH (p:Person{username:"wkolcz"})
WITH m,l,g,p
MERGE (g)-[:CREATED {rating:0}]->(m)
MERGE (m)-[:MEETUP_AT {rating:0}]->(l)-[:HOSTED_MEETUP]->(m)
MERGE (m)<-[:ATTENDING]-(p)
RETURN id(m) as meeting_id
I am getting:
Node 416 already exists with label Location and property "address"=[36350 Van Dyke Ave]
You've encountered a common misunderstanding of MERGE. MERGE merges on everything you've specified within the single MERGE clause. So the order of operations is:
1. Search for a :Location node with all of the properties you've specified.
2. If found, return the node.
3. If not found, create the node.
Your problem occurs at step 3. Because a node with all of the properties you've specified does not exist, it goes to step 3 and tries to create a node with all of those properties. That's when your uniqueness constraint is violated.
The best practice is to merge on the property that you've constrained to be unique and then use SET to update the other properties. In your case:
MERGE (l:Location {address:"36350 Van Dyke Ave"})
SET l.location_name = "Starbucks",
l.city = "Sterling Heights"
...
The same logic is going to apply for the relationships you're merging later in the query. If the entire pattern doesn't exist, it's going to try to create the entire pattern. That's why you should stick to the best practice of:
MERGE (node1:Label1 {unique_property: "value"})
MERGE (node2:Label2 {unique_property: "value"})
MERGE (node1)-[:REL]-(node2)
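Applied to the query from the question, that practice might look roughly like the sketch below. The Meetup properties are abbreviated for illustration. Moving the MATCH clauses ahead of the CREATE is a choice made here, not part of the original query; it means the meetup is only created when the project and person are found.

```cypher
// Merge on the constrained property only; fill in the rest on creation.
MERGE (l:Location {address: "36350 Van Dyke Ave"})
ON CREATE SET l.location_name = "Starbucks",
              l.city = "Sterling Heights"
WITH l
MATCH (g:Project {title_slug: "studyuup"})
MATCH (p:Person {username: "wkolcz"})
// Abbreviated property list, for illustration only.
CREATE (m:Meetup {topic: "New features for StudyUup", status: "Active"})
// One relationship per MERGE, with both end nodes already bound.
MERGE (g)-[:CREATED {rating: 0}]->(m)
MERGE (m)-[:MEETUP_AT {rating: 0}]->(l)
MERGE (l)-[:HOSTED_MEETUP]->(m)
MERGE (m)<-[:ATTENDING]-(p)
RETURN id(m) AS meeting_id
```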

Making a relation in neo4j

I am not sure what I am doing wrong here, so here is how I create nodes
CREATE (urlnode_1:UrlNode {url:'url1', nodenumber:1})
CREATE (urlnode_2:UrlNode {url:'url2', nodenumber:2})
I create relations as follows
CREATE
(urlnode_1)-[:OutLink {anchor_text:['MY']}]->(urlnode_2)
The two nodes are created successfully first. When I then run the code to create the relation, I would have liked it to connect the two nodes I created, but instead it creates two new nodes, say 3 and 4, and shows a relation between them. What am I doing wrong here?
To guide you as well as I can, let's sum up some Neo4j basics concerning node and relationship creation:
A node can have one or more labels. Labels are meant to group nodes by domain (User, Speaker, Company, etc.; think of a label as a table name, for example). A node can also have properties.
A relationship can have only ONE type. Relationships organize the graph and can also have properties.
To create a node, you can use the CREATE clause:
CREATE (n:Person {firstname: 'John'})
The CREATE statement will not check whether other nodes with the same label and properties already exist; it will just create a new node.
Relationships can also be created with the same clause:
MATCH (n:Person {firstname: 'John'}), (p:Person {firstname: 'Pierre'})
CREATE (n)-[:KNOWS]->(p)
A complete pattern can also be created in one go :
CREATE (n:Person {name:'Chris'})-[:KNOWS]->(p:Person {name:'Oliver'})
REMINDER : CREATE will not check for existing nodes.
--- AND NOW MERGE ---
MERGE will lazily check for existing nodes; think of it as a MATCH OR CREATE clause:
MERGE (n:Person {firstname:'Fred'})
If a node with label Person and firstname Fred does not exist, it will be created; otherwise nothing will happen. This is where the handy ON MATCH and ON CREATE clauses mentioned by @joslinm come in.
If you run this query multiple times after the node has been created, your graph will not change. If you know the HTTP protocol, you can think of MERGE as an idempotent request.
Be aware that MERGE ensures that an entire pattern exists in the database, creating it if it does not already exist. This means that if you MERGE a complete pattern, the whole pattern is checked for existence, not just a single node.
Say a node with label Person and a name property with value 'John' already exists in the db:
MERGE (n:Person {name:'John'})
will not affect the graph
However :
MERGE (n:Person {name:'John'})-[:KNOWS]->(:Person {name:'Nathalia'})
A new John node will be created, because the entire pattern does not exist.
It is recommended to use MERGE incrementally :
MERGE (n:Person {name:'John'})
MERGE (p:Person {name:'Nathalia'})
MERGE (n)-[:KNOWS]->(p)
If you want to know more about the MERGE clause, I can highly recommend you this wonderful article from Luanne on GraphAware : http://graphaware.com/neo4j/2014/07/31/cypher-merge-explained.html
Chris
If you create a relationship, a new one will get created every single time. They are not inherently unique. It sounds like you'd rather be merging the relationship; i.e., if the relationship is there, match it, and if not, create it.
The merge syntax for it is as follows:
MERGE (a:Node)-[:LIKES]->(b:Node)
ON MATCH SET a.msg = 'I matched!'
ON CREATE SET a.msg = 'I created!'
RETURN a
You can try it out here: http://console.neo4j.org/
You'll notice that first the msg will be "I created!" then after it matches, it will be "I matched!"

Match node with multiple properties in neo4j

I'm having troubles with the current data model and was wondering if anyone could suggest a better way to structure my data.
My nodes labelled 'Person' have 5-10 properties each like: Name, Address, Nationality, Phone, Age... And there is no unique property I could use as an Index.
Since I don't want duplicates, every time I create a new person I use MERGE instead of CREATE. But the problem is that doing a MERGE where I'm matching 5-10 properties on a node slows down the queries enormously.
So would taking out the properties in each Person node as separate nodes labeled Address, Nationality, Phone, Age help the performance of my MERGE query? Any other possible solutions?
Thanks in advance!
How about generating a GUID/UUID that's unique for each person and using that as your ID for each Person? Then you could MERGE on that property quickly and use ON CREATE to set the other properties e.g.
MERGE (p:Person {id: 'abcd-efgh-...'})
ON CREATE SET p.name = "mark", p.address = "..."
Or if not then maybe a hash or combination of all those properties as your key e.g.
MERGE (p:Person {id: "name-address-nationality-phone"})
ON CREATE SET p.name = "...", p.address = "..."
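A sketch of how such a combined key could be built inside the query itself. The parameter names ($name, $address, $nationality, $phone) are hypothetical, and all of them are assumed to be non-null strings, since concatenating a null would make the whole key null.

```cypher
// Build one key from the identifying values
// (hypothetical parameters, assumed to be non-null strings).
WITH $name + '|' + $address + '|' + $nationality + '|' + $phone AS key
// Merge on the single key property instead of on 5-10 properties.
MERGE (p:Person {id: key})
ON CREATE SET p.name = $name,
              p.address = $address,
              p.nationality = $nationality,
              p.phone = $phone
RETURN p;
```

The lookup is only quick if the id property on Person is backed by an index or a uniqueness constraint. The syntax for creating one differs between Neo4j versions, so it is left out here.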

Resources