Neo4j: second MERGE for relationship not working properly

I am working with email data. The field Outcome2 has two possible values, FAILED_TO and TO. The FAILED_TO branch works fine: when there is a FAILED_TO event, the nodes are created and all properties are added or updated. But the TO branch doesn't work, and no new nodes are created. The TO branch was added later in the statement. This may be a simple fix. Any help would be greatly appreciated, and I would like to avoid APOC if at all possible.
// NO ATTACHMENT OR LINK - FOLLOWING IMPORTS
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_3.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.firstseen = dt
SET a.lastseen = dt
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.firstseen = dt
SET b.lastseen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false" AND row.Outcome2 = "FAILED_TO"
MERGE (a)-[rel1:FAILED_TO]->(b)
ON CREATE SET rel1.firstseen = dt
SET rel1.lastseen = dt
SET rel1.timesseen = coalesce(rel1.timesseen, 0) + 1
WITH a,b,row,dt,rel1
WHERE row.Url = "false" AND row.FileHash = "false" AND row.Outcome2 = "TO"
MERGE (a)-[rel2:TO]->(b)
ON CREATE SET rel2.firstseen = dt
SET rel2.lastseen = dt
SET rel2.timesseen = coalesce(rel2.timesseen, 0) + 1
return a,b

It is because of these two lines
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false" AND row.Outcome2 = "FAILED_TO"
The condition row.Outcome2 = "FAILED_TO" in that WHERE removes every row where row.Outcome2 = "TO", so no such row ever reaches the second MERGE.
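To make the filtering concrete, here is a minimal sketch in Python rather than Cypher. It models rows flowing through the two WITH ... WHERE steps as plain list filters. The two sample rows and their addresses are invented for illustration and are not taken from the CSV in the question.

```python
# Toy model of Cypher's row pipeline: each WHERE keeps only matching rows,
# and every later clause sees only the rows that survived.
# The sample rows below are hypothetical.
rows = [
    {"From": "a@x.test", "To": "b@y.test", "Outcome2": "FAILED_TO"},
    {"From": "c@x.test", "To": "d@y.test", "Outcome2": "TO"},
]

# First WITH ... WHERE row.Outcome2 = "FAILED_TO"
after_first_where = [r for r in rows if r["Outcome2"] == "FAILED_TO"]

# Second WITH ... WHERE row.Outcome2 = "TO" runs on the survivors only
after_second_where = [r for r in after_first_where if r["Outcome2"] == "TO"]

print(len(after_first_where))   # 1
print(len(after_second_where))  # 0
```

Because the second filter only ever sees FAILED_TO rows, it can never let a TO row through, which matches the behaviour described in the question.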
Instead, you can do something like the following. Drop the row.Outcome2 test from the WHERE and build two collections per row: one that is [1] when the outcome is FAILED_TO and one that is [1] when it is TO (each is empty otherwise). Then use each collection in a FOREACH loop, which creates the relationship only if the corresponding collection has a value.
Since row.Outcome2 can only be one value or the other, only one of the sets of statements inside the FOREACH clauses will actually be executed per row.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_3.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.firstseen = dt
SET a.lastseen = dt
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.firstseen = dt
SET b.lastseen = dt
WITH a, b, row, dt
, CASE WHEN row.Outcome2 = 'FAILED_TO' THEN [1] ELSE [] END AS fail
, CASE WHEN row.Outcome2 = 'TO' THEN [1] ELSE [] END AS success
WHERE row.Url = "false" AND row.FileHash = "false"
FOREACH ( x in fail |
MERGE (a)-[rel1:FAILED_TO]->(b)
ON CREATE SET rel1.firstseen = dt
SET rel1.lastseen = dt
SET rel1.timesseen = coalesce(rel1.timesseen, 0) + 1
)
FOREACH ( x in success |
MERGE (a)-[rel2:TO]->(b)
ON CREATE SET rel2.firstseen = dt
SET rel2.lastseen = dt
SET rel2.timesseen = coalesce(rel2.timesseen, 0) + 1
)
RETURN a, b
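The CASE plus FOREACH pattern above can be sketched in Python as follows. This is only a model of the control flow, under the assumption that iterating over a one-element list runs the body once and iterating over an empty list runs it zero times. The function name relationship_types is made up for this sketch.

```python
def relationship_types(outcome):
    """Mimic the two CASE expressions and the two FOREACH loops for one row."""
    fail = [1] if outcome == "FAILED_TO" else []   # CASE ... AS fail
    success = [1] if outcome == "TO" else []       # CASE ... AS success

    merged = []
    for _ in fail:        # FOREACH (x IN fail | MERGE (a)-[:FAILED_TO]->(b) ...)
        merged.append("FAILED_TO")
    for _ in success:     # FOREACH (x IN success | MERGE (a)-[:TO]->(b) ...)
        merged.append("TO")
    return merged


print(relationship_types("FAILED_TO"))  # ['FAILED_TO']
print(relationship_types("TO"))         # ['TO']
print(relationship_types("OTHER"))      # []
```

The row itself is never filtered out by the outcome, so both outcomes are handled in the same pass.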

Related

How to create and update nodes and properties using a plain Cypher query?

How do I create and update nodes and properties using a plain Cypher query?
Below is my query:
MERGE (c:contact {guid : '500010'})
ON CREATE SET
c.data_source = '1',
c.guid = '500010',
c.created = timestamp()
ON MATCH SET
c.lastUpdated = timestamp()
MERGE (s:speciality {specialtygroup_desc : 'cold'})
ON CREATE SET s.data_source = '1',
s.specialtygroup_desc = 'fever',
s.created = timestamp()
ON MATCH SET s.data_source = '1',
s.specialtygroup_desc = 'common cold',
s.lastUpdated = timestamp()
MERGE (c)-[r:is_specialised_in]->(s)
ON CREATE SET
r.duration = 1
ON MATCH SET
r.duration = r.duration + 1
On the first run, node is created as "fever".
On the second run, I have updated the specialty_group to "common cold". But it is creating new node with "fever". I am not able to update the "fever" to "common cold".
What changes should I make to the above query?
The MERGE (s:speciality {specialtygroup_desc : 'cold'}) clause looks for a specialtygroup_desc value of "cold".
During the first execution, that MERGE clause finds no "cold" node -- so it creates one, and the subsequent ON CREATE clause changes it to "fever".
During the second execution, that MERGE again finds no "cold" node (since it is now a "fever" node), so it again creates a "cold" node and the ON CREATE clause yet again changes it to "fever". The ON MATCH clause is never used. This is why you end up with another "fever" node.
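The two executions described above can be traced with a small Python model. It is a sketch only: nodes are plain dictionaries, and merge_speciality is a hypothetical helper that imitates MERGE on specialtygroup_desc followed by an ON CREATE SET that overwrites the same property.

```python
nodes = []  # each node is modelled as a dict of properties


def merge_speciality(desc_to_match, desc_on_create):
    """Imitate MERGE (s:speciality {specialtygroup_desc: ...}) ON CREATE SET ..."""
    for node in nodes:
        if node["specialtygroup_desc"] == desc_to_match:
            return node, "matched"              # ON MATCH would run here
    node = {"specialtygroup_desc": desc_to_match}   # MERGE creates a "cold" node
    node["specialtygroup_desc"] = desc_on_create    # ON CREATE SET overwrites it
    nodes.append(node)
    return node, "created"


first = merge_speciality("cold", "fever")
second = merge_speciality("cold", "fever")

print(first[1], second[1])                        # created created
print([n["specialtygroup_desc"] for n in nodes])  # ['fever', 'fever']
```

Both runs take the create branch because no node still has the value "cold" by the time the next MERGE looks for it.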
Unfortunately, you have not explained your use case in enough detail to offer a recommendation for how to fix your code.
I think you want to rename every "cold" node to "common cold" and, if neither a "cold" nor a "common cold" node exists, create a new "fever" node?
My suggestion:
OPTIONAL MATCH (ss:speciality {specialtygroup_desc : 'cold'})
SET ss.specialtygroup_desc='common cold', ss.lastUpdated = timestamp()
MERGE (c:contact {guid : '500010'})
ON CREATE SET
c.data_source = '1',
c.guid = '500010',
c.created = timestamp()
ON MATCH SET
c.lastUpdated = timestamp()
MERGE (s:speciality {specialtygroup_desc : 'common cold'})
ON CREATE SET s.data_source = '1',
s.specialtygroup_desc = 'fever',
s.created = timestamp()
MERGE (c)-[r:is_specialised_in]->(s)
ON CREATE SET
r.duration = 1
ON MATCH SET
r.duration = r.duration + 1

Neo4j: How do you handle first seen and last seen properties on import when not in the first import statement?

I am working with email data. I was asked to add a times-seen count property to the relationships between nodes, and also to add first-seen and last-seen properties to every relationship and every node except the recipient (the data is external to internal, so the recipient does not require first or last seen).
I started with the following imports. This seems to work fine for the NO ATTACHMENT OR LINK case if the sender is in the first import, but if the sender is not in the first import, the first and last seen values are wrong because the initial SET happens only in the first import.
// NO ATTACHMENT OR LINK - FIRST IMPORT
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld, last_seen: datetime(row.DateTime) })
SET a.first_seen = coalesce(a.last_seen)
MERGE (b:Recipient { name: row.To, last_seen: datetime(row.DateTime) })
SET b.first_seen = coalesce(a.last_seen)
WITH a,b,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {last_seen: datetime(row.DateTime)}, b, {}) YIELD rel as rel1
SET rel1.first_seen = coalesce(rel1.last_seen)
SET rel1.times_seen = coalesce(rel1.times_seen, 0) + 1
RETURN a,b
// NO ATTACHMENT OR LINK - REST OF IMPORTS
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_2.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b) YIELD rel
SET rel.last_seen = dt
SET rel.times_seen = coalesce(rel.times_seen, 0) + 1
RETURN a, b
Given the way I am importing this data, is there a better way to do this, so that I don't have to split the data into an initial import and following imports with a different import statement? And how should I handle the first seen and last seen properties if I do it that way?
This logic should work for both first and non-first passes:
LOAD CSV WITH HEADERS FROM "file:///sessions/new_neo_test_1.csv" AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.first_seen = dt
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.first_seen = dt
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b, {}) YIELD rel
SET rel.first_seen = COALESCE(rel.first_seen, dt)
SET rel.last_seen = dt
SET rel.times_seen = COALESCE(rel.times_seen, 0) + 1
RETURN a, b
You just need to use the appropriate file path, which should probably be passed as a parameter instead of being hardcoded, as shown here.
After a MERGE clause, the optional ON CREATE clause is only executed if the MERGE created something.
Also, you should never specify mutable properties (like last_seen) in a MERGE pattern, as that would just cause the creation of a new node if the mutable property has a new value.
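The point about mutable properties in a MERGE pattern can be illustrated with a small Python model. This is a sketch under simplifying assumptions: the hypothetical merge helper treats a node as matching when it contains every key and value in the pattern, and the name and dates are invented.

```python
def merge(store, pattern):
    """Return the node containing every pattern entry, or create a new one."""
    for node in store:
        if all(node.get(key) == value for key, value in pattern.items()):
            return node
    node = dict(pattern)
    store.append(node)
    return node


# Mutable last_seen inside the pattern: a new value never matches the old node.
with_mutable = []
merge(with_mutable, {"name": "alice", "last_seen": "2020-01-01"})
merge(with_mutable, {"name": "alice", "last_seen": "2020-01-02"})

# Only the stable key in the pattern; timestamps are set afterwards.
stable_only = []
for seen in ("2020-01-01", "2020-01-02"):
    node = merge(stable_only, {"name": "alice"})
    node.setdefault("first_seen", seen)  # stands in for ON CREATE SET first_seen
    node["last_seen"] = seen             # stands in for the unconditional SET

print(len(with_mutable))  # 2
print(stable_only)
```

With the mutable property in the pattern the store ends up with two nodes for the same sender, while matching on the stable key keeps one node whose first_seen stays fixed and whose last_seen moves forward.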
In this scenario, it would probably make sense to split your data load into separate phases: first create the nodes, then join them up, i.e.:
First pass - MERGE the Senders and Recipients from all your CSVs
Second pass - MATCH the Senders and Recipients, and then create the relationships based on the desired logic
This way, you know that the Senders and Recipients already exist when you add the relationships afterwards.

Neo4j - Optimizing 3 merge queries into a single query

I am trying to write a Cypher query that creates 2 nodes and adds a relationship between them.
When adding a node, I check whether it already exists; if it does, I simply set a property.
// Query 1 for creating or updating node 1
MERGE (Kunal:PERSON)
ON CREATE SET
Kunal.name = 'Kunal',
Kunal.type = 'Person',
Kunal.created = timestamp()
ON MATCH SET
Kunal.lastUpdated = timestamp()
RETURN Kunal
// Query 2 for creating or updating node 2
MERGE (Bangalore: LOC)
ON CREATE SET
Bangalore.name = 'Bangalore',
Bangalore.type = 'Location',
Bangalore.created = timestamp()
ON MATCH SET
Bangalore.lastUpdated = timestamp()
RETURN Bangalore
Likewise, I check whether a relationship exists between the nodes created above; if it does not exist I create it, otherwise I update its properties.
// Query 3 for creating relation or updating it.
MERGE (Kunal: PERSON { name: 'Kunal', type: 'Person' })
MERGE (Bangalore: LOC { name: 'Bangalore', type: 'Location' })
MERGE (Kunal)-[r:LIVES_IN]->(Bangalore)
ON CREATE SET
r.duration = 36
ON MATCH SET
r.duration = r.duration + 1
RETURN *
The problem is that these are 3 separate queries, which means 3 database calls when I run them via the Python driver. Is there a way to combine these queries into a single query?
Of course, you can combine your three queries into one.
In that case you can omit the first and second MERGE of your last query, because those nodes are already bound by the earlier MERGE clauses of the combined query.
MERGE (kunal:PERSON {name: 'Kunal'})
ON CREATE SET
kunal.type = 'Person',
kunal.created = timestamp()
ON MATCH SET
kunal.lastUpdated = timestamp()
MERGE (bangalore:LOC {name: 'Bangalore'})
ON CREATE SET
bangalore.type = 'Location',
bangalore.created = timestamp()
ON MATCH SET
bangalore.lastUpdated = timestamp()
MERGE (kunal)-[r:LIVES_IN]->(bangalore)
ON CREATE SET
r.duration = 36
ON MATCH SET
r.duration = r.duration + 1
RETURN *

Node creation using cypher Foreach

I have 2 CSV files and their structure is as follows:
1.csv
id name age
1 aa 23
2 bb 24
2.csv
id product location
1 apple CA
2 samsung PA
1 HTC AR
2 philips CA
3 sony AR
// 1.csv
LOAD CSV WITH HEADERS FROM "file:///G:/1.csv" AS csvLine
CREATE (a:first { id: toInt(csvLine.id), name: csvLine.name, age: csvLine.age})
// 2.csv
LOAD CSV WITH HEADERS FROM "file:///G:/2.csv" AS csvLine
CREATE (b:second { id: toInt(csvLine.id), product: csvLine.product, location: csvLine.location})
Now I want to create another node called "third", using the following Cypher query.
LOAD CSV WITH HEADERS FROM "file:///G:/1.csv" AS csvLine
MATCH c = (a:first), d = (b.second)
FOREACH (n IN nodes(c) |
CREATE (e:third)
SET e.name = label(a) + label(b) + "id"
SET e.origin = label(a)
SET e.destination = label(b)
SET e.param = a.id)
But the above query gives me duplicate entries. I think it runs twice after the load. Please suggest a fix or an alternative way to do this.
CREATE always creates, even if something is already there. So that's why you're getting duplicates. You probably want MERGE which only creates an item if it doesn't already exist.
I wouldn't ever do CREATE (e:third) or MERGE (e:third) because without specifying properties, you'll end up with duplicates anyway. I'd change this:
CREATE (e:third)
SET e.name = label(a) + label(b) + "id"
SET e.origin = label(a)
SET e.destination = label(b)
SET e.param = a.id)
To this:
// labels() returns a list of labels, so take the first one
MERGE (e:third { name: labels(a)[0] + labels(b)[0] + "id",
origin: labels(a)[0],
destination: labels(b)[0],
param: a.id })
This then would create the same node when necessary, but avoid creating duplicates with all the same property values.
Here's the documentation on MERGE
You don't use csvLine at all when matching the :first and :second nodes, so your query doesn't make sense.
This doesn't make sense either:
MATCH c = (a:first), d = (b.second)
FOREACH (n IN nodes(c) |
CREATE (e:third)
c is a path with a single node, i.e. (a), so instead of the FOREACH you would use a directly.
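A rough Python count shows why ignoring csvLine multiplies the results. This is a sketch that assumes the pattern was meant to be (b:second), that the graph holds exactly the nodes loaded from the two sample files in the question, and that one :third node is created per result row.

```python
from itertools import product

csv_rows = [{"id": 1}, {"id": 2}]                    # 1.csv has two data rows
first_nodes = [{"id": 1}, {"id": 2}]                 # :first nodes from 1.csv
second_nodes = [{"id": i} for i in (1, 2, 1, 2, 3)]  # :second nodes from 2.csv

# LOAD CSV followed by a MATCH that ignores csvLine: every CSV row is paired
# with every (a, b) combination, and each result row creates one node.
created_with_load = [
    a["id"] for _row, a, _b in product(csv_rows, first_nodes, second_nodes)
]

# The same MATCH without the LOAD CSV in front of it.
created_without_load = [a["id"] for a, _b in product(first_nodes, second_nodes)]

print(len(created_with_load), len(created_without_load))  # 20 10
```

Under these assumptions the LOAD CSV alone doubles the number of created nodes, which is consistent with the impression that the query "runs twice".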
