I have 2 csv files and their sructure is as follows:
1.csv
id name age
1 aa 23
2 bb 24
2.csv
id product location
1 apple CA
2 samsung PA
1 HTC AR
2 philips CA
3 sony AR
// 1.csv
LOAD CSV WITH HEADERS FROM "file:///G:/1.csv" AS csvLine
CREATE (a:first { id: toInt(csvLine.id), name: csvLine.name, age: csvLine.age})
// 2.csv
LOAD CSV WITH HEADERS FROM "file:///G:/2.csv" AS csvLine
CREATE (b:second { id: toInt(csvLine.id), product: csvLine.product, location: csvLine.location})
Now i want to create another node called "third", using the following cypher query.
LOAD CSV WITH HEADERS FROM "file:///G:/1.csv" AS csvLine
MATCH c = (a:first), d = (b.second)
FOREACH (n IN nodes(c) |
CREATE (e:third)
SET e.name = label(a) + label(b) + "id"
SET e.origin = label(a)
SET e.destination = label(b)
SET e.param = a.id)
But the above query give me duplicate entries. I think here it runs 2 time after the load. Please suggest or any alternative way for this.
CREATE always creates, even if something is already there. So that's why you're getting duplicates. You probably want MERGE which only creates an item if it doesn't already exist.
I wouldn't ever do CREATE (e:third) or MERGE (e:third) because without specifying properties, you'll end up with duplicates anyway. I'd change this:
CREATE (e:third)
SET e.name = label(a) + label(b) + "id"
SET e.origin = label(a)
SET e.destination = label(b)
SET e.param = a.id)
To this:
MERGE (e:third { name: label(a) + label(b) + "id",
origin: label(a),
destination: label(b),
param: a.id })
This then would create the same node when necessary, but avoid creating duplicates with all the same property values.
Here's the documentation on MERGE
You don't use csvLine at all for matching the :first and :second node!
So your query doesn't make sense
This doesn't make sense either:
MATCH c = (a:first), d = (b.second)
FOREACH (n IN nodes(c) |
CREATE (e:third)
c are paths with a single node, i.e. (a)
so instead of the foreach you would use a directly instead
Related
How do I create and update nodes and property using plain cypher query?
Below is my query:
MERGE (c:contact {guid : '500010'})
ON CREATE SET
c.data_source = '1',
c.guid = '500010',
c.created = timestamp()
ON MATCH SET
c.lastUpdated = timestamp()
MERGE (s:speciality {specialtygroup_desc : 'cold'})
ON CREATE SET s.data_source = '1',
s.specialtygroup_desc = 'fever',
s.created = timestamp()
ON MATCH SET s.data_source = '1',
s.specialtygroup_desc = 'comman cold',
s.lastUpdated = timestamp()
MERGE (c)-[r:is_specialised_in]->(s)
ON CREATE SET
r.duration = 1
ON MATCH SET
r.duration = r.duration + 1
On the first run, node is created as "fever".
On the second run, I have updated the specialty_group to "common cold". But it is creating new node with "fever". I am not able to update the "fever" to "common cold".
What changes should I make to the above query?
The MERGE (s:speciality {specialtygroup_desc : 'cold'}) clause looks for a specialtygroup_desc value of "cold".
During the first execution, that MERGE clause finds no "cold" node -- so it creates one, and the subsequent ON CREATE clause changes it to "fever".
During the second execution, that MERGE again finds no "cold" node (since it is now a "fever" node), so it again creates a "cold" node and the ON CREATE clause yet again changes it to "fever". The ON MATCH clause is never used. This is why you end up with another "fever" node.
Unfortunately, you have not explained your use case in enough detail to offer a recommendation for how to fix your code.
I think you want to update all node "cold" to "common cold" and if not exists "cold" or "common cold", create new "fever" ?
My suggestion:
OPTIONAL MATCH (ss:speciality {specialtygroup_desc : 'cold'}
SET ss.specialtygroup_desc='common cold', ss.lastUpdated = timestamp()
MERGE (c:contact {guid : '500010'})
ON CREATE SET
c.data_source = '1',
c.guid = '500010',
c.created = timestamp()
ON MATCH SET
c.lastUpdated = timestamp()
MERGE (s:speciality {specialtygroup_desc : 'common cold'})
ON CREATE SET s.data_source = '1',
s.specialtygroup_desc = 'fever',
s.created = timestamp()
MERGE (c)-[r:is_specialised_in]->(s)
ON CREATE SET
r.duration = 1
ON MATCH SET
r.duration = r.duration + 1
So I am working with email data. I was requested to add a count property of times seen for relationships between nodes, and a second requirement to add a first seen and last seen to each relationship and every node but the recipient(the data is external to internal so the recipient does not require the first or last seen).
So I started working with the following imports. This seems to work fine for the NO ATTACHMENT OR LINK if the sender is in the first import, but if the sender is not in the first import the first and last seen portion is messed up because the initial set is in the first import.
// NO ATTACHMENT OR LINK - FIRST IMPORT
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld, last_seen: datetime(row.DateTime) })
SET a.first_seen = coalesce(a.last_seen)
MERGE (b:Recipient { name: row.To, last_seen: datetime(row.DateTime) })
SET b.first_seen = coalesce(a.last_seen)
WITH a,b,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {last_seen: datetime(row.DateTime)}, b, {}) YIELD rel as rel1
SET rel1.first_seen = coalesce(rel1.last_seen)
SET rel1.times_seen = coalesce(rel1.times_seen, 0) + 1
RETURN a,b
// NO ATTACHMENT OR LINK - REST OF IMPORTS
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_2.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b) YIELD rel
SET rel.last_seen = dt
SET rel.times_seen = coalesce(rel.times_seen, 0) + 1
RETURN a, b
Anyways for the way I am importing this data, is there a better way to do this, so that I dont have to break up the data into an initial import and following imports with a different import statement. And how should I handle the first seen and last seen properties if I go about it this way.
This logic should work for both first and non-first passes:
LOAD CSV WITH HEADERS FROM "file:///sessions/new_neo_test_1.csv" AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.first_seen = dt
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.first_seen = dt
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b, {}) YIELD rel
SET rel.first_seen = COALESCE(rel.first_seen, dt)
SET rel.last_seen = dt
SET rel.times_seen = COALESCE(rel.times_seen, 0) + 1
RETURN a, b
You just need to use the appropriate file path, which should probably be passed as a parameter instead of being hardcoded, as shown here.
After a MERGE clause, the optional ON CREATE clause is only executed if the MERGE created something.
Also, you should never specify mutable properties (like last_seen) in a MERGE pattern, as that would just cause the creation of a new node if the mutable property has a new value.
In this scenario, it would probably make good sense to split out your data load, and separate out the phases of creating the nodes, and then joining them up, i.e.:
First pass - MERGE the Senders and Receivers from all your CSVs
Second pass - MATCH the Senders and Receivers, and then join the relationships based on desired logic
Like this you know that the Senders and Receivers are already there when you're adding relationships after
I have edges like this:
(People)-[:USE]->(Product)
(People)-[:REVIEW]->(Product)
Now I have a new csv of People who are reviewers but they are missing some of the attributes I have already.
I want to do something like:
LOAD CSV WITH HEADERS FROM "file:///abcd.csv" AS row
MERGE (svc:Consumer {name: row.referring_name})
ON CREATE SET
svc.skewNum = toInteger(row.skew_num)
MERGE (p:PrimaryConsumer) WHERE p.name = svc.name
ON MATCH SET
svc.city = p.city,
svc.latitude = toFloat(p.latitude),
svc.longitude = toFloat(p.longitude),
svc.consumerId = toInteger(p.primaryConsumerId)
Which borks:
Neo.ClientError.Statement.SyntaxError: Invalid input 'H': expected 'i/I' (line 10, column 28 (offset: 346))
"MERGE (p:PrimaryConsumer) WHERE p.name = svc.name"
I am 100% assured that the names are unique and will match a unique consumer name in the existing set of nodes (to be seen).
How would I add existing attributes to new data when I have a match on unique node attributes? (I am hoping to get unique id's, but I have to be able to perform an update to the new data on match)
Thank you.
This is the entire cypher script -- modified as per #cypher's input.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///abcde.csv" AS row
MERGE (svc:Consumer {name: row.referring_name})
ON CREATE SET
svc.skeyNum = toInteger(row.skew_num)
MATCH (p:primaryConsumer {name: svc:name})
ON MATCH SET
svc.city = p.city,
svc.latitude = toFloat(p.latitude),
svc.longitude = toFloat(p.longitude),
svc.providerId = toInteger(p.providerId)
MERGE (spec:Product {name: row.svc_prod_name})
ON CREATE SET
spec.name = row.svc_prov_name,
spec.skew = toInteger(row.skew_id),
spec.city = row.svc_prov_city,
spec.totalAllowed = toFloat(row.total_allowed)
MERGE (svc)-[r:CONFIRMED_PURCHASE]->(spec)
ON MATCH SET r.totalAllowed = r.totalAllowed + spec.totalAllowed
ON CREATE SET r.totalAllowed = spec.totalAllowed
;
MERGE does not accept a WHERE clause.
Change this:
MERGE (p:PrimaryConsumer) WHERE p.name = svc.name
to this:
MERGE (p:PrimaryConsumer {name: svc.name})
[EDIT]
You entire query should then look like this:
LOAD CSV WITH HEADERS FROM "file:///abcd.csv" AS row
MERGE (svc:Consumer {name: row.referring_name})
ON CREATE SET
svc.skewNum = toInteger(row.skew_num)
MERGE (p:PrimaryConsumer {name: svc.name})
ON MATCH SET
svc.city = p.city,
svc.latitude = toFloat(p.latitude),
svc.longitude = toFloat(p.longitude),
svc.consumerId = toInteger(p.primaryConsumerId)
I am trying to make a Cypher query which makes 2 nodes and adds a relationship between them.
For adding a node I'm checking if the node is existing or not, if existing then I'm simply going ahead and setting a property.
// Query 1 for creating or updating node 1
MERGE (Kunal:PERSON)
ON CREATE SET
Kunal.name = 'Kunal',
Kunal.type = 'Person',
Kunal.created = timestamp()
ON MATCH SET
Kunal.lastUpdated = timestamp()
RETURN Kunal
// Query 2 for creating or updating node 2
MERGE (Bangalore: LOC)
ON CREATE SET
Bangalore.name = 'Bangalore',
Bangalore.type = 'Location',
Bangalore.created = timestamp()
ON MATCH SET
Bangalore.lastUpdated = timestamp()
RETURN Bangalore
Likewise I am checking if a relationship exists between the above created nodes, if not exists then creating it else updating its properties.
// Query 3 for creating relation or updating it.
MERGE (Kunal: PERSON { name: 'Kunal', type: 'Person' })
MERGE (Bangalore: LOC { name: 'Bangalore', type: 'Location' })
MERGE (Kunal)-[r:LIVES_IN]->(Bangalore)
ON CREATE SET
r.duration = 36
ON MATCH SET
r.duration = r.duration + 1
RETURN *
The problem is these are 3 separate queries which will have 3 database calls when I run it via the Python driver. Is there a way to optimize these queries into a single query.
Of course you can concatenate your three queries to one.
In this case you can omit the first and second MERGE of your last query, because it is assured by the start of new query already.
MERGE (kunal:PERSON {name: ‘Kunal'})
ON CREATE SET
kunal.type = 'Person',
kunal.created = timestamp()
ON MATCH SET
kunal.lastUpdated = timestamp()
MERGE (bangalore:LOC {name: 'Bangalore'})
ON CREATE SET
bangalore.type = 'Location',
bangalore.created = timestamp()
ON MATCH SET
bangalore.lastUpdated = timestamp()
MERGE (kunal)-[r:LIVES_IN]->(bangalore)
ON CREATE SET
r.duration = 36
ON MATCH SET
r.duration = r.duration + 1
RETURN *
I've created a simple graph using the following data:
CREATE (england:Country {Name:"England",Id:1})
CREATE CONSTRAINT ON (c:Country) ASSERT c.ID IS UNIQUE
CREATE (john:King {Name:"John",Id:1})
CREATE (henry3:King {Name:"Henry III",Id:2})
CREATE (edward1:King {Name:"Edward I",Id:3})
CREATE CONSTRAINT ON (k:King) ASSERT k.ID IS UNIQUE
MATCH (country:Country),(king:King) WHERE country.Id = 1 AND king.Id = 1 CREATE (country)<-[r:KING_OF]-(king)
MATCH (country:Country),(king:King) WHERE country.Id = 1 AND king.Id = 2 CREATE (country)<-[r:KING_OF]-(king)
MATCH (country:Country),(king:King) WHERE country.Id = 1 AND king.Id = 3 CREATE (country)<-[r:KING_OF]-(king)
CREATE (cornwall:County {Name:"Cornwall",Id:1})
CREATE (devon:County {Name:"Devon",Id:2})
CREATE (somerset:County {Name:"Somerset",Id:3})
CREATE CONSTRAINT ON (c:County) ASSERT c.ID IS UNIQUE
MATCH (country:Country),(county:County) WHERE country.Id = 1 AND county.Id = 1 CREATE (country)<-[r:COUNTY_IN]-(county)
MATCH (country:Country),(county:County) WHERE country.Id = 1 AND county.Id = 2 CREATE (country)<-[r:COUNTY_IN]-(county)
MATCH (country:Country),(county:County) WHERE country.Id = 1 AND county.Id = 3 CREATE (country)<-[r:COUNTY_IN]-(county)
CREATE (bristol:City {Name:"Bristol",Id:1})
CREATE (london:City {Name:"London",Id:2})
CREATE (york:City {Name:"York",Id:3})
CREATE CONSTRAINT ON (c:City) ASSERT c.ID IS UNIQUE
MATCH (country:Country),(city:City) WHERE country.Id = 1 AND city.Id = 1 CREATE (country)<-[r:CITY_IN]-(city)
MATCH (country:Country),(city:City) WHERE country.Id = 1 AND city.Id = 2 CREATE (country)<-[r:CITY_IN]-(city)
MATCH (country:Country),(city:City) WHERE country.Id = 1 AND city.Id = 3 CREATE (country)<-[r:CITY_IN]-(city)
(Note that I use 'thiscomputer.mydomain.com' as an alias for 'localhost', which SO ridiculously bars me from using)...
If I go to http://thiscomputer.mydomain.com:7474/browser/ and execute the cypher
MATCH cities = (city:City)-[]->(c:Country {Id:1}), counties = (county:County)-[]->(c:Country {Id:1}), kings = (king:King)-[]->(c:Country {Id:1}) RETURN cities, counties, kings
...I get a nice graph centred around the country of England. No nodes are repeated.
If I fire up a REST client and execute the following:
URL: http://thiscomputer.mydomain.com:7474/db/data/transaction/commit
Verb: POST
Body:
{
"statements" : [{
"statement": "MATCH cities = (city:City)-[]->(c:Country {Id:1}), counties = (county:County)-[]->(c:Country {Id:1}), kings = (king:King)-[]->(c:Country {Id:1}) RETURN cities, counties, kings",
"resultDataContents" : [ "graph" ]
}]
}
(Note that the cypher query is identical)
I end up with a set of results that are too large to paste in here.
Basically, I appear to get the cartesian product of the results. Since there are 3 kings, 3 counties and 3 cities, there are 27 results.
This is a fairly simple graph. In the domain in which I work, the graph I want would contain an aggregate root (in this example, England), and many associated nodes of different types. If the REST API returned the cartesian product, I could end up with many thousands of results.
So my question is this: why does the REST API return the cartesian product of all the returned nodes? Is my cypher incorrect, or am I using the REST API incorrectly?
It would be much simpler if I could get results like this (pseudo-JSON):
{
Country:{
Name: "England",
Id: 1,
Cities:[
...etc
],
Counties:[
...etc
],
Kings:[
...etc
]
}
}
Or possibly even this:
{
data:{
nodes:[
{
Name:"England",
Id: 1,
uniqueIdInResults:1
},
...etc
],
relationships:[
uniqueIdInResults1: 1,
uniqueIdInResults2: 2,
type: "KING_OF"
]
}
}
In short, I think denormalising the data in the results like this can quickly result in a response which is too large. Is there any way to structure the cypher or the call to the REST API to give results with less repetition?
The query you are running in the browser is absolutely returning a Cartesian product as well - it just doesn't appear that way because the visual result will only show the nodes once.
Run the following in the browser. You'll see pretty quickly all the overlap you're getting.
MATCH cities = (city:City)-[]->(c:Country {Id:1}),
counties = (county:County)-[]->(c:Country {Id:1}),
kings = (king:King)-[]->(c:Country {Id:1})
RETURN EXTRACT(x IN NODES(cities) | x.Name),
EXTRACT(x IN NODES(counties) |x.Name),
EXTRACT(x IN NODES(kings) | x.Name)
As expected, I get a Cartesian product of 27 rows.