How to add a property and value using LOAD CSV - Neo4j

I have a CSV file with a header like this:
string,alias,source
I was trying to use a query like this:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM file AS line WITH line
Match (p{name:SUBSTRING(line.string ,7)})
Create (p:line.source:line.alias)
but I get an error about the last line. Is it possible to add a new property to an existing node using LOAD CSV?

I think you're looking for the SET clause. CREATE is reserved for creating nodes, relationships, and patterns.
You may want to review the documentation or the Cypher reference.
Also, your MATCH doesn't seem to be using a label (you're only using the variable p). If at all possible, use labels in your graph. Without them you can't take advantage of your indexes or unique constraints, and even without those, labels ensure that you only scan nodes with that label instead of your entire graph.
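A minimal Python sketch of the SET approach, passing the CSV rows to Cypher as a query parameter. It assumes (the question does not say this) that the `source` column holds a property name and `alias` holds its value; `Node` is a hypothetical label standing in for whatever label your nodes actually have. Python's `s[7:]` matches Cypher's `SUBSTRING(line.string, 7)`, since both count from 0.

```python
import csv
import io

# Hypothetical statement: `Node` is a placeholder label, and each row carries
# a one-entry map {source: alias} that SET += merges into the matched node.
QUERY = """
UNWIND $rows AS row
MATCH (p:Node {name: row.name})
SET p += row.props
"""


def build_rows(csv_text):
    """Turn CSV lines with a string,alias,source header into parameter maps."""
    rows = []
    for line in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            # Same as Cypher SUBSTRING(line.string, 7): drop the first 7 characters.
            "name": line["string"][7:],
            # Assumed meaning: `source` is the property key, `alias` its value.
            "props": {line["source"]: line["alias"]},
        })
    return rows
```

With the official Neo4j Python driver, the rows could then be sent with something like `session.run(QUERY, rows=build_rows(text))`.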

Related

Neo4j LOAD CSV taking an infinite time to execute my query

I am loading data into Neo4j using LOAD CSV. I have two types of nodes: Director and Company.
The commands below work fine and execute within 50 ms:
LOAD CSV FROM "file:///Director.csv" AS line
CREATE(:Director {DirectorDIN:line[0]})
Load csv from "file:///Company.csv" AS line
Create(:Company{CompanyCIN:line[0]})
Now I am trying to build the relationships between the two node types, and the query takes an infinite time to execute. Here is the simple query I am trying:
LOAD CSV FROM "file:///CompanyDirector.csv" AS line
match(c:Company{CompanyCIN:toString(line[0])}),(d:Director{DirectorDIN:toString(line[1])}) create (c)-[:Directed_by]->(d)
I have also tried:
LOAD CSV FROM "file:///CompanyDirector.csv" AS line
match(c:Company{CompanyCIN:line[0]}),(d:Director{DirectorDIN:line[1]}) create (c)-[:Directed_by]->(d)
It is taking an infinite time. What could be the issue here?
Information:
The CSV file does not contain more than 20k records.
CompanyCIN is alphanumeric
DirectorDIN is numeric in nature
I think you forgot to create some schema constraints in your database:
CREATE CONSTRAINT on (n:Company) ASSERT n.CompanyCIN IS UNIQUE;
CREATE CONSTRAINT on (n:Director) ASSERT n.DirectorDIN IS UNIQUE;
Without those constraints, the complexity of your query is N*M, where N is the number of Company nodes and M the number of Director nodes.
To see what I mean, you can EXPLAIN your query before and after creating those constraints.
Moreover, you should also use USING PERIODIC COMMIT on your LOAD CSV query, like this:
USING PERIODIC COMMIT 5000
LOAD CSV FROM "file:///CompanyDirector.csv" AS line
MATCH (c:Company{CompanyCIN:line[0]})
MATCH (d:Director{DirectorDIN:line[1]})
CREATE (c)-[:Directed_by]->(d)
The main issue was that you did not have indexes on :Company(CompanyCIN) and :Director(DirectorDIN). Without the indexes, Neo4j is forced to evaluate every possible pair of Company and Director nodes for every line in your CSV file. That takes a lot of time.
CREATE INDEX ON :Company(CompanyCIN);
CREATE INDEX ON :Director(DirectorDIN);
By the way, creating the corresponding uniqueness constraints (as suggested by @logisima) has the side effect of creating these indexes, but the issue was not caused by missing uniqueness constraints.
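To make the cost argument concrete, here is a toy Python model. It only counts comparisons and is not how Neo4j's planner actually executes the query: without an index, every CSV line tests every (Company, Director) pair; with a hash index (modeled here by Python sets), each MATCH becomes a single lookup.

```python
# Toy cost model only: it counts work, it does not reproduce Neo4j's planner.

def pairs_checked_without_index(csv_lines, companies, directors):
    """Count (Company, Director) pairs examined when every pair is tested per line."""
    checked = 0
    matches = 0
    for cin, din in csv_lines:
        for c in companies:
            for d in directors:
                checked += 1
                if c == cin and d == din:
                    matches += 1
    return checked, matches


def lookups_with_index(csv_lines, companies, directors):
    """Count lookups when each key is found through a hash index (a set)."""
    company_index = set(companies)
    director_index = set(directors)
    lookups = 0
    matches = 0
    for cin, din in csv_lines:
        lookups += 2  # one index lookup per MATCH clause
        if cin in company_index and din in director_index:
            matches += 1
    return lookups, matches
```

For example, 2 CSV lines against 3 companies and 2 directors cost 12 pair checks without an index and 4 lookups with one; the gap grows with N*M for every line of the file.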
In addition, you should avoid creating duplicate Directed_by relationships by using MERGE instead of CREATE.
This should work better (you can also use the USING PERIODIC COMMIT option, as suggested by @logisima):
USING PERIODIC COMMIT 5000 LOAD CSV FROM "file:///CompanyDirector.csv" AS line
MATCH (c:Company {CompanyCIN:line[0]})
MATCH (d:Director {DirectorDIN:line[1]})
MERGE (c)-[:Directed_by]->(d)

How to update a specific existing node in the graph DB by loading an updated CSV file with Neo4j APOC

I am having a problem updating nodes by loading a recently updated CSV file into Neo4j. Since it is a large file, I think an APOC procedure needs to be used. I have updated existing nodes by loading an external updated file without APOC, but the problem is that I need to update them in parallel using APOC. Here are my file elements.
Original elements in the file:
ID,SHOPNAME,DIVISION,DISTRICT,THANA
1795,ARAFAT DISTRIBUTION,RAJSHAHI,JOYPURHAT,Panchbibi
1796,CONNECT DISTRIBUTION,DHAKA,GAZIPUR,Gazipur Sadar
1797,HUMAYUN KABIR,DHAKA,DHAKA,Demra
I created nodes from this CSV.
Then I have another, updated file, u.csv. The updated elements are given below:
ID,SHOPNAME,DIVISION,DISTRICT,THANA
1795,ABC,RAJSHAHI,JOYPURHAT,Panchbibi
1796,XYZ,DHAKA,GAZIPUR,Gazipur Sadar
1797,HUMAYUN KABIR,DHAKA,DHAKA,Demra
Without APOC, my query was:
LOAD CSV FROM "file:///u.csv" AS line
MERGE (c:Agent {ID:line[0]})
ON MATCH SET c.SHOPNAME = line[1]
RETURN c
This code updated the desired column, except that I also got a blank node:
{"ID":"ID"}
My first question is: why is a new blank node created, and how can I solve this?
Now I want to do this for a large file, so I used an APOC procedure for batch processing.
With APOC, my query was:
CALL apoc.periodic.iterate('LOAD CSV WITH HEADERS FROM "file:///u.csv" AS line return line','MERGE (p:Agent{ID:TOINTEGER(line.ID)}) ON MATCH SET p.SHOPNAME=TOINTEGER(line.SHOPNAME) ' ,{batchSize:10000, iterateList:true, parallel:true});
but it did not update the specific nodes; rather, it created two new nodes with the related IDs, so I am getting 5 nodes here rather than 3:
{"ID":1795}
{"ID":1796}
I am very new to Neo4j but trying to learn. Kindly help me solve the problem.
I am using Neo4j 3.5.6 and APOC 3.5.0.4.
I see 2-3 possible issues here:
Regarding Duplicate Nodes: You used the TOINTEGER function in one data-load query but not in the other, so the nodes are duplicated: one Agent node has an ID of data type string and the other an ID of data type integer.
Suggestion: Use the TOINTEGER function in both queries or in neither.
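The duplicate-node effect can be reproduced with a small Python model of MERGE. This is a sketch that treats nodes as plain dicts: MERGE reuses a node only when the key property is equal, and, as in Cypher, the string "1795" is not equal to the integer 1795.

```python
def merge_agent(nodes, agent_id):
    """Mimic MERGE (a:Agent {ID: agent_id}) on a list of dict 'nodes'."""
    for node in nodes:
        # "1795" (str) == 1795 (int) is False, so a type mismatch never matches.
        if node["ID"] == agent_id:
            return node
    node = {"ID": agent_id}
    nodes.append(node)
    return node
```

Merging the three IDs first as strings and then as integers leaves six nodes in the list; converting with the same function in both passes leaves three.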
Regarding Blank Nodes:
In your second query, you are setting the node property only if the node is found (i.e. ON MATCH).
But, as in the first case, it is creating a new node every time instead of matching any of the previous nodes, and it does not set the property when creating. So there will be nodes with no SHOPNAME.
Suggestion: Either add ON CREATE to the MERGE query, or remove ON MATCH from it so that the node is updated every time. Adding ON CREATE is the recommended and efficient approach.
Here is the query with ON CREATE:
LOAD CSV FROM "file:///u.csv" AS line
MERGE (c:Agent {ID:line[0]})
ON CREATE SET c.SHOPNAME = line[1]
ON MATCH SET c.SHOPNAME = line[1]
You are also converting SHOPNAME to an integer in your APOC query using TOINTEGER; this will not work, because shop names such as "ABC" are not numeric, so TOINTEGER returns null for them.
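One more likely cause of the {"ID":"ID"} node, not covered above: the first query uses LOAD CSV without WITH HEADERS, so the header line is returned as an ordinary row and MERGE creates an Agent whose ID is the literal string "ID". Python's csv module shows the same distinction; in this sketch csv.reader plays the role of plain LOAD CSV and csv.DictReader the role of LOAD CSV WITH HEADERS.

```python
import csv
import io

# The updated file from the question.
U_CSV = """ID,SHOPNAME,DIVISION,DISTRICT,THANA
1795,ABC,RAJSHAHI,JOYPURHAT,Panchbibi
1796,XYZ,DHAKA,GAZIPUR,Gazipur Sadar
1797,HUMAYUN KABIR,DHAKA,DHAKA,Demra
"""


def ids_without_headers(text):
    """Like LOAD CSV ... AS line: every line, header included, is a list."""
    return [row[0] for row in csv.reader(io.StringIO(text))]


def ids_with_headers(text):
    """Like LOAD CSV WITH HEADERS ... AS line: the first line supplies the keys."""
    return [row["ID"] for row in csv.DictReader(io.StringIO(text))]
```

The first function returns "ID" followed by the three real IDs; the second returns only the three IDs, which is why switching to WITH HEADERS (and line.ID, line.SHOPNAME) avoids the extra node.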

Creating relationships between nodes in neo4j is extremely slow

I'm using a python script to generate and execute queries loaded from data in a CSV file. I've got a substantial amount of data that needs to be imported so speed is very important.
The problem I'm having is that merging a relationship between two nodes takes a very long time: including the Cypher that creates the relationships between the nodes makes a query take around 3 seconds (versus around 100 ms without it).
Here's a small bit of the query I'm trying to execute:
MERGE (s0:Chemical{`name`: "10074-g5"})
SET s0.`name`="10074-g5"
MERGE (y0:Gene{`gene-id`: "4149"})
SET y0.`name`="MAX"
SET y0.`gene-id`="4149"
MERGE (s0)-[:INTERACTS_WITH]->(y0)
MERGE (s1:Chemical{`name`: "10074-g5"})
SET s1.`name`="10074-g5"
MERGE (y1:Gene{`gene-id`: "4149"})
SET y1.`name`="MAX"
SET y1.`gene-id`="4149"
MERGE (s1)-[:INTERACTS_WITH]->(y1)
Any suggestions on why this is running so slowly? I've got indexes set up on Chemical->name and Gene->gene-id, so I honestly don't understand why this runs so slowly.
Most of your SET clauses are just setting properties to the same values they already have (as guaranteed by the preceding MERGE clauses).
The remaining SET clauses probably only need to be executed if the MERGE had created a new node. So, they should probably be preceded by ON CREATE.
You should never generate a long sequence of almost identical Cypher code. Instead, your Cypher code should use parameters, and you should pass your data as parameter(s).
You said you have a :Gene(id) index, whereas your code actually requires a :Gene(gene-id) index.
Below is sample Cypher code that uses the dataList parameter (a list of maps containing the desired property values), which fixes most of the above issues. The UNWIND clause just "unwinds" the list into individual maps.
UNWIND $dataList AS d
MERGE (s:Chemical{name: d.sName})
MERGE (y:Gene{`gene-id`: d.yId})
ON CREATE SET y.name=d.yName
MERGE (s)-[:INTERACTS_WITH]->(y)
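A sketch of the Python side that could feed $dataList, assuming the official `neo4j` Python driver and hypothetical CSV columns `chemical_name`, `gene_id` and `gene_name` (the question does not show the file's header). Rows are sent in batches so each transaction stays a manageable size; the batch size of 1000 is an arbitrary default, not a tuned value.

```python
import csv
import io

# Cypher from the answer above; $dataList is supplied as a query parameter.
QUERY = """
UNWIND $dataList AS d
MERGE (s:Chemical {name: d.sName})
MERGE (y:Gene {`gene-id`: d.yId})
ON CREATE SET y.name = d.yName
MERGE (s)-[:INTERACTS_WITH]->(y)
"""


def build_data_list(csv_text):
    """Map each CSV row to the keys the Cypher statement expects.

    The column names chemical_name, gene_id and gene_name are hypothetical.
    """
    return [
        {"sName": r["chemical_name"], "yId": r["gene_id"], "yName": r["gene_name"]}
        for r in csv.DictReader(io.StringIO(csv_text))
    ]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load(session, data_list, batch_size=1000):
    """Run QUERY once per batch on a neo4j driver Session (or anything with .run)."""
    for batch in batches(data_list, batch_size):
        # Keyword arguments to Session.run become query parameters.
        session.run(QUERY, dataList=batch).consume()
```

With a real database, `load` would be called inside `with driver.session() as session:` after creating the driver with `GraphDatabase.driver(uri, auth=...)`. The Cypher text stays identical across calls; only the parameter changes.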

How to define large set of properties of a node without having to type them all?

I have imported a CSV file into Neo4j. I have been trying to define a large number of properties (all the columns) for each node. How can I do that without having to type in each name?
I have been trying this:
USING PERIODIC COMMIT
load csv WITH headers from "file:///frozen_catalog.csv" AS line
//Creating nodes for each product id with its properties
CREATE (product:product{id : line.`o_prd`,
Gross_Price_Average: TOINT(line.`Gross_Price_Average`),
O_PRD_SPG: TOINT(line.`O_PRD_SPG`)});
You can add properties from a map. For example:
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
MERGE (P:Product {productID: row.productID})
SET P += row
http://neo4j.com/docs/developer-manual/current/cypher/clauses/set/#set-adding-properties-from-maps
LOAD CSV cannot automatically convert certain fields to integers; that must be done explicitly. (You can, however, avoid explicitly mentioning all the other fields by using the map projection feature to transform your line data before setting it, via stdob--'s suggestion.)
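A rough Python analog of that idea, assuming you preprocess rows outside Neo4j and then apply each one with `SET p += row`: every column is copied unchanged except the listed integer columns. `to_integer` only approximates Cypher's `toInteger()` (blank or non-numeric text becomes None, decimal text is truncated); the two column names come from the question's query.

```python
import csv
import io

# Integer columns taken from the question's query; adjust as needed.
INT_COLUMNS = {"Gross_Price_Average", "O_PRD_SPG"}


def to_integer(value):
    """Rough analog of Cypher toInteger(): None for blank or non-numeric text."""
    try:
        return int(value)
    except (TypeError, ValueError):
        pass
    try:
        # Decimal text such as "12.7" is truncated to 12.
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def convert_row(row, int_columns=INT_COLUMNS):
    """Copy every column unchanged except the listed ones, which become ints."""
    return {k: (to_integer(v) if k in int_columns else v) for k, v in row.items()}


def convert_rows(csv_text, int_columns=INT_COLUMNS):
    """Read a CSV with a header line and convert each row."""
    return [convert_row(r, int_columns) for r in csv.DictReader(io.StringIO(csv_text))]
```

Only the typed columns are named in code, so adding more string columns to the CSV needs no change here.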
You may want to take a look at Neo4j's import tool, as it allows you to specify field types in the headers, which should perform the type conversion for you.
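If you go the import-tool route, the header line carries the types. A small sketch that builds such a header, assuming `o_prd` is the node ID column, `category` is a hypothetical extra column, and `int` is the desired type for the two numeric columns; check the exact header syntax against the import tool documentation for your Neo4j version.

```python
def typed_header(columns, id_column, types):
    """Build a typed header line for the Neo4j import tool.

    `types` maps column name -> import type (e.g. "int"); columns not in it
    stay untyped (strings). The ID column gets the :ID suffix.
    """
    parts = []
    for col in columns:
        if col == id_column:
            parts.append(f"{col}:ID")
        elif col in types:
            parts.append(f"{col}:{types[col]}")
        else:
            parts.append(col)
    return ",".join(parts)
```

Called with the columns `o_prd`, `Gross_Price_Average`, `O_PRD_SPG`, `category`, it produces `o_prd:ID,Gross_Price_Average:int,O_PRD_SPG:int,category`, which would replace the first line of the CSV before importing.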
That said, 77 columns is a lot of data to all store on individual nodes. You may want to take another look at your data and figure out if some of those properties would be better modeled as nodes with their own label with relationships to your product nodes. You mentioned some of these were categorical properties. Categories are well suited to be modeled separately as nodes instead of as properties, and maybe some of your other properties would work better as nodes as well.

In Neo4j, is there a way to read the relationship name dynamically using LOAD CSV?

I have created nodes using LOAD CSV in Cypher. The next part is creating relationships between the nodes. For that, I have a CSV in the following format:
fromStopName,from,route,toStopName,to
Swargate,1,route1_1,Swargate Corner,2
Swargate Corner,2,route1_1,Hirabaug,3
Hirabaug,3,route1_1,Maruti,4
Maruti,4,route1_1,Mandai,5
Now I would like to use the "route" name as the relationship type between nodes, so I am using the following LOAD CSV command in Cypher:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:C:\\\\busroutes.csv" AS row
MATCH(f {name:row.fromStopName}),(t {name:row.toStopName}) CREATE f - [:row.route]->t
But it looks like I cannot do that. Instead, if I name the relationship statically and then assign a property from the CSV route field, it works:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:C:\\\\busroutes.csv" AS row
MATCH(f {name:row.fromStopName}),(t {name:row.toStopName}) CREATE f - [:CONNECTS {route: row.route}]->t
I am wondering whether this is disabled to enforce the good practice of having "pure" verb-like relationships and to avoid creating many variants of the same relationship, like "connected by 1_1" and "connected by 1_2".
Or am I just not finding the right link, or not using the correct syntax? I'd appreciate any help!
Right now you can't, as this is structural information.
You can either use the neo4j-import tool for that,
or use one CSV file per type and spell out the relationship type,
or even filter the CSV and do multiple passes, e.g.:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:C:\\\\busroutes.csv" AS row
with row where row.route = "route1_1"
MATCH(f {name:row.fromStopName}),(t {name:row.toStopName})
CREATE (f)-[:route1_1]->(t)
There is also a trick using fake conditionals, but you still have to spell them out.
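The multi-pass approach can be generated rather than written by hand. A Python sketch under these assumptions: relationship types cannot be passed as parameters, so they are written into the statement text, and the generator therefore accepts only plain identifiers (letters, digits, underscore) as a conservative safety check. The file URL is an argument; `file:///busroutes.csv` in the test is just an example value.

```python
import csv
import io
import re

# Conservative check: only plain identifiers are accepted as relationship types,
# so nothing in the generated Cypher needs escaping.
SAFE_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def route_statements(csv_text, csv_url):
    """Return one LOAD CSV statement per distinct value of the route column."""
    routes = sorted({row["route"] for row in csv.DictReader(io.StringIO(csv_text))})
    statements = []
    for route in routes:
        if not SAFE_TYPE.fullmatch(route):
            raise ValueError(f"unsafe relationship type: {route!r}")
        statements.append(
            "USING PERIODIC COMMIT\n"
            f'LOAD CSV WITH HEADERS FROM "{csv_url}" AS row\n'
            f'WITH row WHERE row.route = "{route}"\n'
            "MATCH (f {name: row.fromStopName}), (t {name: row.toStopName})\n"
            f"CREATE (f)-[:{route}]->(t)"
        )
    return statements
```

Each generated statement re-reads the whole file and keeps only its own route, exactly like the hand-written example, so the number of passes equals the number of distinct routes.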
