Cannot successfully import csv to neo4j

I am trying to import a CSV file to neo4j database using the following query:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///I:/Traces.csv" AS row
MERGE (e:Event {SystemCall: coalesce(row.syscall, "No Value"), ReturnValue: coalesce(row.retvalue,"No Value"), ReturnTime: coalesce(row.rettime,"No Value"), CallTime: coalesce(row.calltime,"No Value")})
MERGE (s:Subject {ProcessName: coalesce(row.processname, "No Value"), Pid: coalesce(row.pid, "No Value"), tid: coalesce(row.tid, "No Value")})
MERGE (o:Object {Argument1: coalesce(row.arg1, "No Value"), Argument2: coalesce(row.arg2, "No Value")})
MERGE (e)-[:IS_GENERATED_BY]->(s)
MERGE (e)-[:AFFECTS]->(o)
MERGE (e)-[:AFFECTS] ->(s)
The CSV file is hosted at location: https://drive.google.com/open?id=0B8vCvM9jIcTzRktRTGpxOUZXQjA
The query takes almost 80,000 milliseconds to run but returns no rows. Please help.

There are 2 problems.
You must not have superfluous whitespace around the comma delimiters in your CSV files. (Also, you should eliminate the extra 11 commas at the end of each line).
Every field name in your CSV file header has to match the corresponding property name in your Cypher query. Currently, most of the names are different.
By the way, when talking about a Cypher query, the term "return" almost always refers to the RETURN clause. Your question has nothing to do with a RETURN clause, and is actually about data creation.
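As a quick sanity check (a minimal sketch, assuming the file is still reachable at the same URL), you can return the first few parsed rows to see exactly which keys LOAD CSV produces; every row.<name> reference in the query must match one of those keys exactly:
LOAD CSV WITH HEADERS FROM "file:///I:/Traces.csv" AS row
RETURN row LIMIT 5
If the keys come back with stray spaces or different spellings, that is the header problem described above.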
[UPDATE]
In addition, if you want to import a huge amount of data into a brand new neo4j DB, you should consider using the offline import tool (neo4j-admin import) instead. It would be much faster than LOAD CSV.

Related

Neo4J - unable to create relationships (30,000)

I've got two CSV files, Job (30,000 entries) and Cat (30 entries), imported into neo4j, and am trying to create a relationship between them.
Each Job has a cat_ID, and Cat contains the category name and ID.
After executing the following:
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
it returns (no changes, no records).
I received a prompt recommending that I index the category, so I ran
CREATE INDEX ON :Job(cat_id);
but I still get the same result.
How do I create a relationship between the two?
I am able to get this to work on a smaller dataset
You are probably trying to match on non-existing nodes. Try
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MERGE (job:Job {cat_ID: row.cat_ID})
MERGE (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
Have a look in your logs and see if you are running out of memory.
You could try splitting the data set into smaller chunks with PERIODIC COMMIT and see if that helps:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
MATCH (job:Job {cat_ID: row.cat_ID})
MATCH (cat:category {category: row.category})
CREATE (job)-[r:under]->(cat)
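To confirm whether that is the case, here is a minimal diagnostic sketch (assuming the same file and labels as above) that lists rows for which the Job or category node cannot be found:
LOAD CSV WITH HEADERS FROM 'file:///DimCategory.csv' AS row
OPTIONAL MATCH (job:Job {cat_ID: row.cat_ID})
OPTIONAL MATCH (cat:category {category: row.category})
WITH row, job, cat
WHERE job IS NULL OR cat IS NULL
RETURN row.cat_ID, row.category, job IS NULL AS missingJob, cat IS NULL AS missingCat
LIMIT 20
Also remember that LOAD CSV reads every field as a string, so if cat_ID was stored as a number on the Job nodes you would need toInteger(row.cat_ID) in the MATCH.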

Neo4J CSV into Props

I'm using cypher and the neo4j browser to create nodes from csv input.
I want to read in each row of my csv file with headers and then create a node with that row as properties.
My current code is:
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS ROW
WITH ROW
CREATE (n:node $ROW)
This throws an error saying parameter missing.
Try this
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
CREATE (n:node)
SET n += row
In Cypher, names that start with "$" are parameters, which must be passed to the query from outside. Your Cypher code locally binds values to the ROW variable (it is not passing a parameter), so change $ROW to ROW.
In addition, if you want to make sure that you do not generate duplicate nodes, you should consider using MERGE instead of CREATE. But before you do so, you must carefully read the documentation on MERGE to understand how to use it properly.
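For example, a minimal MERGE sketch, assuming the CSV contains a column that uniquely identifies each row (the id column name here is an assumption):
LOAD CSV WITH HEADERS FROM '<yourFilePath>' AS row
MERGE (n:node {id: row.id})
SET n += row
MERGE matches on the key property only, and SET n += row then copies the remaining columns, so re-running the load updates existing nodes instead of duplicating them.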

Error in importing data into neo4j via csv

I am new to neo4j.
I can't figure out what I am doing wrong. Please help.
LOAD CSV WITH HEADERS FROM "file:///C:\Users\Chandra Harsha\Downloads\neo4j_module_datasets\test.csv" as line
MERGE (n:Node{name:line.Source})
MERGE (m:Node{name:line.Target})
MERGE (n)-[:TO{distance:line.dist}]->(m)
Error message:
Invalid input 's': expected four hexadecimal digits specifying a unicode character (line 1, column 41 (offset: 40))
"LOAD CSV WITH HEADERS FROM "file:///C:\Users\Chandra Harsha\Downloads\neo4j_module_datasets\test.csv" as line"
You've got two problems here.
The immediate error is that the backslashes in the path are being seen as escape characters rather than folder separators - you either have to escape the backslashes (by just doubling them up) or use forward-slashes instead. For example:
LOAD CSV WITH HEADERS FROM "file:///C:\\Users\\Chandra Harsha\\Downloads\\neo4j_module_datasets\\test.csv" as line
MERGE (n:Node{name:line.Source})
MERGE (m:Node{name:line.Target})
MERGE (n)-[:TO{distance:line.dist}]->(m)
However - Neo4j doesn't allow you to load files from arbitrary places on the filesystem anyway as a security precaution. Instead, you should move the file you want to import into the import folder under your database. If you're using Neo4j Desktop, you can find this by selecting your database project and clicking the little down-arrow icon on 'Open Folder', choosing 'Import' from the list.
Drop the CSV in there, then just reference it by filename in the file:/// URL. So for example:
LOAD CSV WITH HEADERS FROM "file:///test.csv" as line
MERGE (n:Node{name:line.Source})
MERGE (m:Node{name:line.Target})
MERGE (n)-[:TO{distance:line.dist}]->(m)
You can still use folders under the import directory - anything starting with file:/// will be resolved relative to that import folder, so things like the following are also fine:
LOAD CSV WITH HEADERS FROM "file:///neo4j_module_datasets/test.csv" as line
MERGE (n:Node{name:line.Source})
MERGE (m:Node{name:line.Target})
MERGE (n)-[:TO{distance:line.dist}]->(m)
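If you are not sure where your installation's import folder actually lives, one way to check on Neo4j 3.x/4.x (assuming you are allowed to call the procedure) is to ask the server for its import-directory setting:
CALL dbms.listConfig('dbms.directories.import')
YIELD name, value
RETURN name, value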

Cypher Import from CSV to Neo4J - How To Improve Performance

I am importing the following to Neo4J:
categories.csv
CategoryName1
CategoryName2
CategoryName3
...
categories_relations.csv
category_parent category_child
CategoryName3 CategoryName10
CategoryName32 CategoryName41
...
Basically, categories_relations.csv shows parent-child relationships between the categories from categories.csv.
I imported the first csv file with the following query which went well and pretty quickly:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories.csv' as line
CREATE (:Category {name:line[0]})
Then I imported the second csv file with:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' as line
MATCH (a:Category),(b:Category)
WHERE a.name = line[0] AND b.name = line[1]
CREATE (a)-[r:ISPARENTOF]->(b)
I have about 2 million nodes.
I tried executing the 2nd query and it is taking quite long. Can I make the query execute more quickly?
Confirm you are matching on the right property. You set only one property, name, on the Category nodes when you created them, so the MATCH in your second query must use that same property.
To make the 2nd query execute faster, add an index on the property you are matching Category nodes on (here name):
CREATE INDEX ON :Category(name)
If it still takes time, you can refer to my answer on LOAD CSV here.
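Putting that together, a sketch of the faster version (assuming the same files and label as in the question, and that name really is the property being matched on); create the index first and let it come online before running the second statement:
CREATE INDEX ON :Category(name);

USING PERIODIC COMMIT
LOAD CSV FROM 'file:///categories_relations.csv' AS line
MATCH (a:Category {name: line[0]})
MATCH (b:Category {name: line[1]})
CREATE (a)-[:ISPARENTOF]->(b)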

Out of memory when creating large number of relationships

I'm new to Neo4J, and I want to try it on some data I've exported from MySQL. I've got the community edition running with neo4j console, and I'm entering commands using the neo4j-shell command line client.
I have 2 CSV files that I use to create 2 types of node, as follows:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/updates.csv" AS row
CREATE (:Update {update_id: row.id, update_type: row.update_type, customer_name: row.customer_name, .... });
CREATE INDEX ON :Update(update_id);
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/facts.csv" AS row
CREATE (:Fact {update_id: row.update_id, status: row.status, ..... });
CREATE INDEX ON :Fact(update_id);
This gives me approx 650,000 Update nodes, and 21,000,000 Fact nodes.
Once the indexes are online, I try to create relationships between the nodes, as follows:
MATCH (a:Update)
WITH a
MATCH (b:Fact{update_id:a.update_id})
CREATE (b)-[:FROM]->(a)
This fails with an OutOfMemoryError. I believe this is because Neo4J does not commit the transaction until it completes, keeping it in memory.
What can I do to prevent this? I have read about USING PERIODIC COMMIT but it appears this is only useful when reading the CSV, as it doesn't work in my case:
neo4j-sh (?)$ USING PERIODIC COMMIT
> MATCH (a:Update)
> WITH a
> MATCH (b:Fact{update_id:a.update_id})
> CREATE (b)-[:FROM]->(a);
QueryExecutionKernelException: Invalid input 'M': expected whitespace, comment, an integer or LoadCSVQuery (line 2, column 1 (offset: 22))
"MATCH (a:Update)"
^
Is it possible to create relationships in this way, between large numbers of existing nodes, or do I need to take a different approach?
The OutOfMemoryError is expected: the query tries to commit everything in a single transaction, and since you didn't say otherwise I assume the Java heap is still at its default size (512m).
You can, however, batch the process with a kind of pagination; I would also prefer MERGE over CREATE in this case:
MATCH (a:Update)
WITH a
SKIP 0
LIMIT 50000
MATCH (b:Fact{update_id:a.update_id})
MERGE (b)-[:FROM]->(a)
Increase SKIP by 50,000 after each batch until you have covered all 650k Update nodes.
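If the APOC library happens to be installed, the same batching can be done without manually adjusting SKIP: apoc.periodic.iterate streams the outer MATCH and commits the inner statement in batches (the batch size below is an arbitrary assumption). Raising the heap (dbms.memory.heap.max_size in neo4j.conf) also gives each batch more headroom.
CALL apoc.periodic.iterate(
  'MATCH (a:Update) RETURN a',
  'MATCH (b:Fact {update_id: a.update_id}) MERGE (b)-[:FROM]->(a)',
  {batchSize: 10000, parallel: false})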
