How to add unique data to a Neo4j graph database

I am adding data to my Neo4j database iteratively, but I am stuck on how to overwrite or update existing data and how to check whether the data already exists.
Basically I have a set of movies with their corresponding ids, e.g.:
[
{id: 'gameofthrones', genre: 'fantasy', release: '2017'},
{id: 'inception', genre: 'scifi', release: '2010'},
...
]
I can add the movies as follows:
CREATE
(m1:Movie {id: 'gameofthrones', genre: 'fantasy', release: '2017'}),
(m2:Movie {id: 'inception', genre: 'scifi', release: '2010'})
However, when I run the script twice, it creates 4 nodes instead of keeping 2.
So my question is, how can I make sure that it checks whether the node id is already present, and if so overwrite it instead of creating a new node?
I tried (but only the properties get added)
// data
attributes['id'] = 'gameofthrones';
attributes['genre'] = 'fantasy';
...
// query
MERGE ( s:Movie {attributes}.id)
ON CREATE SET ( s:Movie {attributes} )
which I call in NodeJS as follows:
executeQuery(queryStr, {"attributes": attributes})
// cypher (nodejs)
function executeQuery(queryStr, params) {
  var qq = Q.defer();
  db.cypher({
    query: queryStr,
    params: params,
  }, function (error, results) {
    if (error) {
      qq.reject(error);
    } else {
      if (results.length != 0) {
        qq.resolve(true);
      } else {
        qq.resolve(false);
      }
    }
  });
  return qq.promise;
}

You must change your query to this:
MERGE (s:Movie {id: $attributes.id})
ON CREATE SET s += $attributes
ON MATCH SET s += $attributes // optional
This should work, but you should use apoc.map.clean() so that the id, which the MERGE has already set, is not written a second time:
MERGE (s:Movie {id: $attributes.id})
ON CREATE SET s += apoc.map.clean($attributes, ['id'], [])
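If APOC is not available, a similar effect to apoc.map.clean() can be had on the client by taking id out of the attribute map before it is sent as a parameter. This is a minimal sketch, not part of the answer above: the helper name buildMovieUpsert and the parameter names $id and $props are hypothetical, and it assumes a Neo4j version that accepts the $param parameter syntax. It only builds the query string and parameter map, which could then be handed to an executeQuery function like the one in the question.

```javascript
// Hypothetical helper: split `id` out of the attribute map so that it is used
// only as the MERGE key and is not written again by SET.
function buildMovieUpsert(attributes) {
  const { id, ...props } = attributes; // props holds every key except id
  const query = [
    "MERGE (s:Movie {id: $id})",
    "SET s += $props", // runs on both create and match, overwriting these properties
  ].join("\n");
  return { query, params: { id, props } };
}

// Usage with one of the movies from the question.
const { query, params } = buildMovieUpsert({
  id: "gameofthrones",
  genre: "fantasy",
  release: "2017",
});
console.log(query);
console.log(params);
```

A plain SET is used because the question asks to overwrite existing nodes; use ON CREATE SET instead if existing nodes should be left untouched.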

You can achieve this with the MERGE clause as follows:
MERGE (m1:Movie {id: 'gameofthrones'})
ON CREATE SET m1.genre = 'fantasy', m1.release = '2017'
MERGE (m2:Movie {id: 'inception'})
ON CREATE SET m2.genre = 'scifi', m2.release = '2010'
Ideally you want to create queries with parameters instead of literal strings. You can achieve this if you use apoc.load.json:
with "file:///home/sample.json" as url // can also be http://url/sample.json
CALL apoc.load.json(url) yield value
UNWIND value as item
MERGE (m1:Movie {id: item.id})
ON CREATE SET m1.genre = item.genre, m1.release = item.release
Example with dynamic properties, using APOC functions:
with "file:///home/sample.json" as url // can also be http://url/sample.json
CALL apoc.load.json(url) yield value
UNWIND value as item
MERGE (m1:Movie {id: item.id})
ON CREATE SET m1 += apoc.map.clean(item,['id'],[])
Or, if you do not want to use apoc.map.clean (apoc.load.json itself still requires the APOC plugin):
with "file:///home/sample.json" as url // can also be http://url/sample.json
CALL apoc.load.json(url) yield value
UNWIND value as item
MERGE (m1:Movie {id: item.id})
ON CREATE SET m1 += item
Note that in this last query, id is first set by the MERGE and then written again by ON CREATE SET m1 += item. To avoid writing a single property twice, use the apoc.map.clean version above.
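To see why the CREATE version doubles the nodes on a second run while the MERGE versions keep two, here is a toy in-memory model of the two clauses. It is an illustration under simplifying assumptions, not how Neo4j works internally: nodes are plain objects in an array, MERGE is modelled as "find a node with this label and id, otherwise create it and apply the ON CREATE properties", and the helpers create and merge are hypothetical names.

```javascript
// Toy model of CREATE vs MERGE ... ON CREATE SET, keyed on (label, id).
function create(graph, label, props) {
  graph.push({ label, ...props }); // CREATE always adds a new node
}

function merge(graph, label, id, onCreate) {
  let node = graph.find((n) => n.label === label && n.id === id);
  if (!node) {
    node = { label, id, ...onCreate }; // ON CREATE SET only applies to new nodes
    graph.push(node);
  }
  return node;
}

const movies = [
  { id: "gameofthrones", genre: "fantasy", release: "2017" },
  { id: "inception", genre: "scifi", release: "2010" },
];

const withCreate = [];
const withMerge = [];
for (let run = 1; run <= 2; run++) { // run the "script" twice
  for (const movie of movies) {
    create(withCreate, "Movie", movie);
    const { id, ...rest } = movie; // like apoc.map.clean(item, ['id'], [])
    merge(withMerge, "Movie", id, rest);
  }
}
console.log(withCreate.length, withMerge.length);
```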

Related

Neo4j: How do you handle first seen and last seen properties on import when not in the first import statement?

So I am working with email data. I was asked to add a count property for the number of times a relationship between nodes has been seen, and, as a second requirement, to add first seen and last seen properties to each relationship and to every node except the recipient (the data is external to internal, so the recipient does not need first or last seen).
So I started working with the following imports. This seems to work fine for the NO ATTACHMENT OR LINK case if the sender is in the first import, but if the sender is not in the first import, the first and last seen values are wrong, because they are initially set only in the first import.
// NO ATTACHMENT OR LINK - FIRST IMPORT
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld, last_seen: datetime(row.DateTime) })
SET a.first_seen = coalesce(a.last_seen)
MERGE (b:Recipient { name: row.To, last_seen: datetime(row.DateTime) })
SET b.first_seen = coalesce(a.last_seen)
WITH a,b,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {last_seen: datetime(row.DateTime)}, b, {}) YIELD rel as rel1
SET rel1.first_seen = coalesce(rel1.last_seen)
SET rel1.times_seen = coalesce(rel1.times_seen, 0) + 1
RETURN a,b
// NO ATTACHMENT OR LINK - REST OF IMPORTS
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_2.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b) YIELD rel
SET rel.last_seen = dt
SET rel.times_seen = coalesce(rel.times_seen, 0) + 1
RETURN a, b
Anyway, given the way I am importing this data, is there a better way to do this, so that I don't have to split the data into an initial import and subsequent imports with a different import statement? And how should I handle the first seen and last seen properties if I go about it that way?
This logic should work for both first and non-first passes:
LOAD CSV WITH HEADERS FROM "file:///sessions/new_neo_test_1.csv" AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.first_seen = dt
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.first_seen = dt
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b, {}) YIELD rel
SET rel.first_seen = COALESCE(rel.first_seen, dt)
SET rel.last_seen = dt
SET rel.times_seen = COALESCE(rel.times_seen, 0) + 1
RETURN a, b
You just need to use the appropriate file path, which should probably be passed as a parameter instead of being hardcoded.
After a MERGE clause, the optional ON CREATE clause is only executed if the MERGE created something.
Also, you should never specify mutable properties (like last_seen) in a MERGE pattern, as that would just cause the creation of a new node if the mutable property has a new value.
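The logic above can be checked with a small model that replays rows through the same rules: ON CREATE SET for first_seen, an unconditional SET for last_seen, and COALESCE for the relationship counters. This is a sketch under assumptions: the graph is a set of in-memory Maps, nodes are keyed the same way the MERGE patterns key them, the helper names are hypothetical, and the sample addresses and outcome value are made up. One consequence of the logic is visible here: first_seen is the timestamp of the first row processed and last_seen that of the last, so files and rows need to be imported in chronological order for them to be the earliest and latest dates.

```javascript
// Toy model of the single import query from the answer above.
// Field names mirror the CSV headers used in the question.
function upsertNode(map, key, dt) {
  let node = map.get(key);
  if (!node) {
    node = { first_seen: dt }; // MERGE ... ON CREATE SET x.first_seen = dt
    map.set(key, node);
  }
  node.last_seen = dt; // plain SET runs for created and matched nodes alike
  return node;
}

function importRows(graph, rows) {
  for (const row of rows) {
    const dt = new Date(row.DateTime);
    upsertNode(graph.senders, `${row.From}|${row.Sender_Sub_Fld}`, dt);
    upsertNode(graph.recipients, row.To, dt);
    if (row.Url !== "false" || row.FileHash !== "false") continue; // the WHERE filter
    const key = `${row.From}|${row.Sender_Sub_Fld}|${row.Outcome2}|${row.To}`;
    const rel = graph.rels.get(key) || {};
    graph.rels.set(key, rel);
    rel.first_seen = rel.first_seen ?? dt; // COALESCE(rel.first_seen, dt)
    rel.last_seen = dt;
    rel.times_seen = (rel.times_seen ?? 0) + 1; // COALESCE(rel.times_seen, 0) + 1
  }
}

// Usage: the same sender/recipient pair seen in two separate imports.
const graph = { senders: new Map(), recipients: new Map(), rels: new Map() };
const base = {
  From: "x@ext.example", Sender_Sub_Fld: "ext.example", To: "a@corp.example",
  Outcome2: "DELIVERED", Url: "false", FileHash: "false",
};
importRows(graph, [{ ...base, DateTime: "2021-01-01T10:00:00Z" }]); // first file
importRows(graph, [{ ...base, DateTime: "2021-02-01T10:00:00Z" }]); // later file
const sender = graph.senders.get("x@ext.example|ext.example");
const rel = [...graph.rels.values()][0];
console.log(sender.first_seen.toISOString(), sender.last_seen.toISOString(), rel.times_seen);
```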
In this scenario, it would probably make good sense to split out your data load into separate phases, first creating the nodes and then joining them up, i.e.:
First pass - MERGE the Sender and Recipient nodes from all your CSVs
Second pass - MATCH the Sender and Recipient nodes, and then create the relationships based on the desired logic
This way you know that the Sender and Recipient nodes already exist when you add the relationships.
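The two-pass idea can be sketched with plain Maps standing in for the database. This is a minimal illustration under assumptions: the row fields mirror the CSV headers in the question, the function name twoPassImport is hypothetical, the sample rows are made up, and the Url/FileHash filter and first/last seen properties are left out to keep it short.

```javascript
// Hypothetical two-pass import over several "files" (arrays of CSV rows).
function twoPassImport(files) {
  const senderKey = (row) => `${row.From}|${row.Sender_Sub_Fld}`;
  const relKey = (row) => `${senderKey(row)}|${row.Outcome2}|${row.To}`;

  // Pass 1: create each distinct Sender and Recipient once (like MERGE on nodes).
  const senders = new Map();
  const recipients = new Map();
  for (const rows of files) {
    for (const row of rows) {
      if (!senders.has(senderKey(row))) {
        senders.set(senderKey(row), { name: row.From, domain: row.Sender_Sub_Fld });
      }
      if (!recipients.has(row.To)) recipients.set(row.To, { name: row.To });
    }
  }

  // Pass 2: look both endpoints up (like MATCH) and count each relationship.
  const timesSeen = new Map();
  for (const rows of files) {
    for (const row of rows) {
      if (!senders.has(senderKey(row)) || !recipients.has(row.To)) continue;
      timesSeen.set(relKey(row), (timesSeen.get(relKey(row)) || 0) + 1);
    }
  }
  return { senders, recipients, timesSeen };
}

// Usage with two made-up files that share a sender.
const row = (From, To, Outcome2) => ({ From, Sender_Sub_Fld: "ext.example", To, Outcome2 });
const file1 = [row("x@ext.example", "a@corp.example", "DELIVERED")];
const file2 = [
  row("x@ext.example", "a@corp.example", "DELIVERED"),
  row("y@ext.example", "a@corp.example", "BLOCKED"),
];
const result = twoPassImport([file1, file2]);
console.log(result.senders.size, result.recipients.size, result.timesSeen);
```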

Neo4J - Optimizing 3 merge queries into a single query

I am trying to write a Cypher query that creates 2 nodes and adds a relationship between them.
For each node I check whether it already exists; if it does, I simply set a property.
// Query 1 for creating or updating node 1
MERGE (Kunal:PERSON)
ON CREATE SET
Kunal.name = 'Kunal',
Kunal.type = 'Person',
Kunal.created = timestamp()
ON MATCH SET
Kunal.lastUpdated = timestamp()
RETURN Kunal
// Query 2 for creating or updating node 2
MERGE (Bangalore: LOC)
ON CREATE SET
Bangalore.name = 'Bangalore',
Bangalore.type = 'Location',
Bangalore.created = timestamp()
ON MATCH SET
Bangalore.lastUpdated = timestamp()
RETURN Bangalore
Likewise, I check whether a relationship exists between the nodes created above; if it does not exist I create it, otherwise I update its properties.
// Query 3 for creating relation or updating it.
MERGE (Kunal: PERSON { name: 'Kunal', type: 'Person' })
MERGE (Bangalore: LOC { name: 'Bangalore', type: 'Location' })
MERGE (Kunal)-[r:LIVES_IN]->(Bangalore)
ON CREATE SET
r.duration = 36
ON MATCH SET
r.duration = r.duration + 1
RETURN *
The problem is that these are 3 separate queries, which means 3 database calls when I run them via the Python driver. Is there a way to optimize these queries into a single query?
Of course you can concatenate your three queries into one.
In that case you can omit the first two MERGE clauses of your last query, because the nodes are already bound by the MERGE clauses at the start of the new query. Note that the node MERGE clauses below also key on name: MERGE (Kunal:PERSON) with only a label would match any existing PERSON node.
MERGE (kunal:PERSON {name: 'Kunal'})
ON CREATE SET
kunal.type = 'Person',
kunal.created = timestamp()
ON MATCH SET
kunal.lastUpdated = timestamp()
MERGE (bangalore:LOC {name: 'Bangalore'})
ON CREATE SET
bangalore.type = 'Location',
bangalore.created = timestamp()
ON MATCH SET
bangalore.lastUpdated = timestamp()
MERGE (kunal)-[r:LIVES_IN]->(bangalore)
ON CREATE SET
r.duration = 36
ON MATCH SET
r.duration = r.duration + 1
RETURN *

When I UNWIND an empty array, the parent item doesn't MERGE. How do I get this query to work?

UNWIND { newGames } as gameItem
UNWIND gameItem.release_dates as releaseDateItem
UNWIND gameItem.publishersWithName as publisherItem
UNWIND gameItem.developersWithName as developerItem
MERGE (game:Game {id: gameItem.id})
ON CREATE SET game = {${gameItemTemplate}}
ON MATCH SET game = {${gameItemTemplate}}
MERGE (platform:Platform {name: releaseDateItem.platform})
MERGE (publisher:GameCompany {name: publisherItem.name})
MERGE (developer:GameCompany {name: developerItem.name})
MERGE (game)-[:RELEASED {date: releaseDateItem.date}]->(platform)
MERGE (publisher)-[:PUBLISHED]->(game)
MERGE (developer)-[:DEVELOPED]->(game)
gameItem.publishersWithName and gameItem.developersWithName can potentially be empty. In such cases, the Game doesn't get added.
When I remove all the publisher and developer stuff (or split the queries in 2, but then I have to UNWIND newGames twice...), they are added successfully:
UNWIND { newGames } as gameItem
UNWIND gameItem.release_dates as releaseDateItem
MERGE (game:Game {id: gameItem.id})
ON CREATE SET game = {${gameItemTemplate}}
ON MATCH SET game = {${gameItemTemplate}}
MERGE (platform:Platform {name: releaseDateItem.platform})
I'd like to add the Game even if the gameItem.publishersWithName or gameItem.developersWithName is [].
UNWIND turns an empty array into 0 rows, which is why the query doesn't continue.
There are 2 solutions:
a) use FOREACH instead
b) use a CASE expression:
UNWIND CASE size({yourVar}) WHEN 0 THEN [null] ELSE {yourVar} END
AS it
// continue query
NB: this will be addressed in APOC.
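The row-dropping behaviour and the CASE workaround can be illustrated with a toy model of UNWIND, in which each input row is replaced by one row per list element. The model, the helper names unwind and orNull, and the sample games are assumptions for illustration only. One caveat carries over to the real query: after the CASE trick the surviving row has publisherItem = null, and MERGE (publisher:GameCompany {name: publisherItem.name}) fails on a null property value, so those MERGE clauses still need to be skipped for null items, for example with the FOREACH option.

```javascript
// Toy model of Cypher UNWIND: each input row becomes one row per list element.
function unwind(rows, listOf, as) {
  return rows.flatMap((row) => listOf(row).map((x) => ({ ...row, [as]: x })));
}

// The CASE trick from the answer: turn [] into [null] so the row survives.
const orNull = (list) => (list.length === 0 ? [null] : list);

const newGames = [
  { id: 1, publishersWithName: [{ name: "Pub A" }] },
  { id: 2, publishersWithName: [] }, // hypothetical game without publishers
];

const start = newGames.map((gameItem) => ({ gameItem }));
const plain = unwind(start, (r) => r.gameItem.publishersWithName, "publisherItem");
const guarded = unwind(start, (r) => orNull(r.gameItem.publishersWithName), "publisherItem");

console.log(plain.map((r) => r.gameItem.id));   // game 2 has disappeared
console.log(guarded.map((r) => r.gameItem.id)); // both games survive
```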

Common Tag on Neo4j does not work

I wonder why my query does not work, when previously I got an answer from it:
I have nodes labeled "Person" that are connected via a "HAS" relationship to nodes labeled "Data", and the Data nodes are connected via a "TAGGED" relationship to Tag nodes.
I want to get the tags that two people have in common.
MATCH (o:Person {username: "Mahsa" })-[:HAS]-()-[r1:TAGGED]->(tag)
<- [r2:TAGGED]-()-[:HAS]-(f:Person {username: "Frank"})
return tag.name
my Graph setup:
CREATE (_0 { `name`:"Mahsa" })
CREATE (_1 { `name`:"Frank" })
CREATE (_2 { `name`:"Data1" })
CREATE (_3 { `name`:"Data2" })
CREATE (_4 { `name`:"Tag1" })
CREATE (_5 { `name`:"Tag2" })
CREATE (_6 { `name`:"Tag3" })
CREATE (_7 { `name`:"Tag4" })
CREATE (_0)-[:`HAS`]->(_2)
CREATE (_0)-[:`HAS`]->(_3)
CREATE (_1)-[:`HAS`]->(_2)
CREATE (_1)-[:`HAS`]->(_3)
CREATE (_2)-[:`TAGGED`]->(_4)
CREATE (_2)-[:`TAGGED`]->(_5)
CREATE (_3)-[:`TAGGED`]->(_6)
CREATE (_3)-[:`TAGGED`]->(_7)
and when I test this query on http://console.neo4j.org/ again I get null:
MATCH (me)-[:HAS]->(myFavorites)-[:TAGGED]->(tag)
<-[:TAGGED]-(theirFavorites)<-[:HAS]-(people)
WHERE me.name = 'Mahsa' AND NOT me=people
RETURN people.name AS name, count(*) AS similar_favs
ORDER BY similar_favs DESC
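None of your tags are shared. In your setup both people HAS the same two Data nodes, and each tag is TAGGED by only one Data node, so reaching a tag from both sides would require traversing the same TAGGED relationship twice, which Cypher does not do within a single MATCH pattern.
If you change your setup so that Tag1 and Tag2 are shared, it returns something: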
create
(_0 {`name`:"Mahsa"}),
(_1 {`name`:"Frank"}),
(_2 {`name`:"Data1"}),
(_3 {`name`:"Data2"}),
(_4 {`name`:"Tag1"}),
(_5 {`name`:"Tag2"}),
(_0)-[:HAS]->(_3),
(_0)-[:HAS]->(_2),
(_1)-[:HAS]->(_3),
(_1)-[:HAS]->(_2),
(_2)-[:TAGGED]->(_5),
(_2)-[:TAGGED]->(_4),
(_3)-[:TAGGED]->(_5),
(_3)-[:TAGGED]->(_4)
see: http://console.neo4j.org/r/9a9cto
Your data setup is also wrong for your first query: the nodes have no Person label, and they have a name property rather than username.
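The effect of relationship uniqueness on this pattern can be reproduced with a small model that enumerates the matches of (me)-[:HAS]->()-[:TAGGED]->(tag)<-[:TAGGED]-()<-[:HAS]-(people) by hand and rejects any match that would reuse the same TAGGED relationship. It is a sketch for illustration: node names stand in for node identities (which works only because the names are unique here), and sharedTagRows is a hypothetical helper, not a Neo4j API.

```javascript
// Hypothetical helper: enumerate matches of
//   (me)-[:HAS]->(d1)-[:TAGGED]->(tag)<-[:TAGGED]-(d2)<-[:HAS]-(people)
// with people !== me, never using the same relationship object twice,
// which is how Cypher treats relationships within a single MATCH pattern.
function sharedTagRows(edges, me) {
  const has = edges.filter((e) => e.type === "HAS");
  const tagged = edges.filter((e) => e.type === "TAGGED");
  const rows = [];
  for (const h1 of has.filter((e) => e.from === me)) {
    for (const t1 of tagged.filter((e) => e.from === h1.to)) {
      // a second TAGGED relationship into the same tag, distinct from t1
      for (const t2 of tagged.filter((e) => e.to === t1.to && e !== t1)) {
        for (const h2 of has.filter((e) => e.to === t2.from && e.from !== me)) {
          rows.push({ people: h2.from, tag: t1.to });
        }
      }
    }
  }
  return rows;
}

const edge = (from, type, to) => ({ from, type, to });
const hasEdges = [
  edge("Mahsa", "HAS", "Data1"), edge("Mahsa", "HAS", "Data2"),
  edge("Frank", "HAS", "Data1"), edge("Frank", "HAS", "Data2"),
];
// Question's data: each tag is TAGGED by only one Data node.
const questionData = [...hasEdges,
  edge("Data1", "TAGGED", "Tag1"), edge("Data1", "TAGGED", "Tag2"),
  edge("Data2", "TAGGED", "Tag3"), edge("Data2", "TAGGED", "Tag4"),
];
// Answer's data: Tag1 and Tag2 are TAGGED by both Data nodes.
const answerData = [...hasEdges,
  edge("Data1", "TAGGED", "Tag1"), edge("Data1", "TAGGED", "Tag2"),
  edge("Data2", "TAGGED", "Tag1"), edge("Data2", "TAGGED", "Tag2"),
];
console.log(sharedTagRows(questionData, "Mahsa").length);
console.log(sharedTagRows(answerData, "Mahsa"));
```

With the question's data the model finds no matches; with the answer's data it finds four, all for Frank, which is the row count that count(*) would aggregate into similar_favs.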

Neo4j cypher query: Exclude subpaths from MATCH

I would like to match certain paths in my graph. These good paths should not contain certain subpaths, e.g. avoiding certain nodes.
For example, given the graph
a->b->c->d
a->avoid1->b
c->avoid2->d
NB: There could be many more nodes in between the edges I specified, e.g. a->t0->t1->b or a->avoid1->t2->b.
Now I would like to get all paths from a to d which do not contain certain subpaths, to be precise, those subpaths going from a over avoid1 to b and from c over avoid2 to d.
My current (insufficient) approach is to MATCH the entire path I am looking for and then specifying the node I want to avoid:
MATCH p=(a)-[:CF*]->(b)-[:CF*]->(c)-[:CF*]->(d)
WHERE NOT (avoid1 IN nodes(p))
This is not working out for me because I actually need to "filter out" subpaths and not nodes.
I need something like this:
MATCH p=(a)-[:CF*]->(b)-[:CF*]->(c)-[:CF*]->(d)
WHERE NOT ( (a)-[:CF*]->(avoid1)->[:CF*]->(b) IN p) AND NOT ( (c)-[:CF*]->(avoid2)->[:CF*]->(d) )
I know this does not work, but it may help explain what I need: a way to filter out paths based on whether they contain certain subpaths.
EDIT:
Here are the commands:
MERGE (a:MYTYPE { label:'a' })
MERGE (b:MYTYPE { label:'b' })
MERGE (c:MYTYPE { label:'c' })
MERGE (d:MYTYPE { label:'d' })
MERGE (avoid1:MYTYPE { label:'avoid1' })
MERGE (avoid2:MYTYPE { label:'avoid2' })
CREATE (a)-[:CF]->(b)
CREATE (b)-[:CF]->(c)
CREATE (c)-[:CF]->(d)
CREATE (a)-[:CF]->(avoid1)
CREATE (avoid1)-[:CF]->(b)
CREATE (c)-[:CF]->(avoid2)
CREATE (avoid2)-[:CF]->(d)
and my current attempt (as suggested by Dave's answer):
MATCH (a:MYTYPE { label:'a' })
MATCH (b:MYTYPE { label:'b' })
MATCH (c:MYTYPE { label:'c' })
MATCH (d:MYTYPE { label:'d' })
MATCH (avoid1:MYTYPE { label:'avoid1' })
MATCH (avoid2:MYTYPE { label:'avoid2' })
MATCH p=(a)-[:CF*]->(b)-[:CF*]->(c)-[:CF*]->(d)
WHERE NOT ( (a)-[:CF*]->(avoid1 {label:'avoid1'})-[:CF*]->(b) )
RETURN p
Yet, this gives me "(no rows)".
This query should allow you to filter on paths:
MATCH p=(a)-[:CF*]->(b)-[:CF*]->(c)-[:CF*]->(d)
WHERE NOT ( (a)-[:CF*]->()-[:CF*]->(b))
AND NOT ( (c)-[:CF*]->()-[:CF*]->(d) )
return p;
You could also specify a label/property for the node that you want to filter on:
MATCH p=(a)-[:CF*]->(b)-[:CF*]->(c)-[:CF*]->(d)
WHERE NOT ( (a)-[:CF*]->(:Person {name:'Dave'})-[:CF*]->(b)) AND NOT ( (c)-[:CF*]->()-[:CF*]->(d) )
return p;
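A caveat on the pattern predicates above: a pattern such as (a)-[:CF*]->()-[:CF*]->(b) in a WHERE clause is tested against the whole graph, not against the matched path p, so as soon as such a detour exists anywhere between the bound nodes the predicate rejects the row, which is consistent with the "(no rows)" result in the question. Checking node membership per segment avoids this; in Cypher that could mean matching the a-to-b and c-to-d segments as separate paths and applying the question's own NOT (avoid1 IN nodes(...)) check to each segment. The sketch below shows the per-segment filter on the question's example graph; the adjacency list and helper names are assumptions, and because this toy graph is acyclic, restricting the search to simple paths loses nothing here.

```javascript
// Toy model of the question's graph as an adjacency list of CF edges.
const cf = {
  a: ["b", "avoid1"],
  avoid1: ["b"],
  b: ["c"],
  c: ["d", "avoid2"],
  avoid2: ["d"],
  d: [],
};

// Enumerate simple paths from `from` to `to` (depth-first).
function allPaths(graph, from, to, path = [from]) {
  if (from === to) return [path];
  return (graph[from] || [])
    .filter((next) => !path.includes(next)) // no repeated nodes
    .flatMap((next) => allPaths(graph, next, to, [...path, next]));
}

// True if `node` lies strictly between `start` and `end` on the path.
function segmentContains(path, start, end, node) {
  const i = path.indexOf(start);
  const j = path.indexOf(end);
  return i !== -1 && j > i && path.slice(i + 1, j).includes(node);
}

const all = allPaths(cf, "a", "d");
const good = all.filter(
  (p) =>
    p.includes("b") && p.includes("c") &&
    !segmentContains(p, "a", "b", "avoid1") &&
    !segmentContains(p, "c", "d", "avoid2")
);
console.log(all.length, good.map((p) => p.join("->")));
```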
