Create node with specific internal id using LOAD CSV - neo4j

For the initial data import (step 1) into a Neo4j database I'm using the neo4j-admin import tool. There you can specify the internal id of a node with :ID in the header.
I would also like to use the LOAD CSV command to create more nodes (step 2) in an already existing database (with the data from the previous step). I can't find out how to specify the internal id of a node with this command.
Why is this not possible when it is possible at initial import? In the second step I have CSV files similar to those in the first step: a CSV file of nodes whose first column is the node id, and a CSV file of relationships between them with columns such as start_id,end_id,relationshipType.
Thanks, Petr M

You have misunderstood how the IDs contained in the import input files are used. They are not used as native IDs.
The IDs in the input data files are only used to enable the import tool to know that node X in file A is supposed to be the same as node Y in file B. After the import is completed, the IDs from the files are forgotten.
Whenever a node is created, the neo4j server always decides on its own what the actual native ID will be.
Also, it is never recommended to store native IDs in the DB, since it is not a reliable way to identify a specific entity (node or relationship) over time. After an entity is deleted, its native ID can be re-assigned to a new entity.
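To make the distinction concrete, here is a small Python sketch (a toy model, not Neo4j code) of what the answer describes: the IDs from the files only serve to wire relationships to nodes during the import, while the store hands out its own native IDs. The `TinyStore` class, `bulk_import` function, and sample rows are hypothetical, invented purely for illustration.

```python
from itertools import count


class TinyStore:
    """Toy stand-in for a graph store that assigns its own native IDs."""

    def __init__(self):
        self._next_id = count()      # the store, not the caller, picks IDs
        self.nodes = {}              # native id -> properties
        self.relationships = []      # (start native id, end native id, type)

    def create_node(self, properties):
        native_id = next(self._next_id)
        self.nodes[native_id] = dict(properties)
        return native_id


def bulk_import(store, node_rows, rel_rows):
    """Import rows, using file IDs only to resolve relationship endpoints."""
    file_to_native = {}
    for file_id, properties in node_rows:
        file_to_native[file_id] = store.create_node(properties)
    for start_id, end_id, rel_type in rel_rows:
        store.relationships.append(
            (file_to_native[start_id], file_to_native[end_id], rel_type)
        )
    # file_to_native is discarded here: the file IDs are forgotten
    # unless they were also stored as an ordinary property.


store = TinyStore()
store.create_node({"name": "already there"})  # existing data occupies a native id
bulk_import(
    store,
    node_rows=[("100", {"name": "a"}), ("200", {"name": "b"})],
    rel_rows=[("100", "200", "KNOWS")],
)
print(store.relationships)
```

The file IDs "100" and "200" never become native IDs in this model. That is why a stable key is usually kept as a normal property and matched on in later LOAD CSV runs, rather than relying on native IDs.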

Related

Neo4j APOC export/import with primary keys instead of internal ids

I am trying to import multiple CSVs, exported from multiple different Neo4j databases with APOC's export, into one big database. Some of the nodes are shared.
The problem is that relationships in the CSV use Neo4j's internal IDs for _start and _end instead of the nodes' "primary key". (Is the #Index with primary = true, same as #Id, a feature of Neo4j itself or of Neo4j's Java OGM?) This is bad because these multiple exports could (and will) have the same internal IDs for different nodes, and the merged graph will be a mess. The same applies to nodes: I want to merge them based on the primary key during the import instead of creating duplicates.
Is there a way to export a Neo4j database with APOC so that it relies on primary keys instead of internal IDs? I need a CSV or JSON file, no CQL. Or is there another way of exporting a Neo4j database so that I can import multiple exports and have them merge seamlessly? Ideally something other than writing my own CSV exporter and importer, which would be the very last option.

Facing problem to importing csv file with bulk importer in neo4j

I am trying to load nodes and their relationships from CSV files using the neo4j bulk importer. My script looks like this:
neo4j-admin import \
--id-type=string \
--nodes:AGENT="nodes_AGENT_C_20190610.csv" \
--nodes:CUSTOMER="nodes_CUSTOMER_C_20190610.csv" \
--relationships:CASHOUT="relcashoutTest-header.csv,relcashoutTest.csv"
and the header of my relationship CSV file looks like this:
:TYPE,:START_ID(CUSTOMER),:END_ID(AGENT),TXNID:string,TIMESTAMP:datetime,AMOUNT:int,CHANNEL
Here TYPE indicates the column named RELATIONSHIP
and my relationship CSV file looks like this:
CASHOUT,abc,xyz,6C19MX7DXL,2019-03-01T11:02:55,40,charge
CASHOUT,pqr,jkl,6C19MX7E2V,2019-03-01T11:02:57,10,charge
After running my import.sh script I get the error below:
unexpected error: Group 'CUSTOMER' not found. Available groups are: []
I have gone through the documentation but could not figure out my mistake. Any help will be appreciated. The neo4j version is 3.5.8.
The :START_ID and :END_ID fields can take an optional ID space, as in :START_ID(CUSTOMER).
But an ID space is not the same thing as a node label. In order for :START_ID(CUSTOMER) to work, one of your node CSV files (presumably the one for the CUSTOMER label) must specify, in its header, :ID(CUSTOMER) instead of just :ID. Doing so would associate the CUSTOMER ID space with the nodes created by that file, and you should no longer see that specific error.
You may also need to do something similar for the AGENT ID space.
NOTE: If all your nodes have unique values in the :ID field (across CSV files), then you do not need to use ID spaces at all. In that case, your relationship file header can simply use :START_ID and :END_ID without any qualification.
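As a rough illustration of the rule above, the following Python sketch compares the ID spaces referenced by a relationship header with those declared by node headers. It is a simplified check, not part of neo4j-admin: the function names are made up, it assumes comma-separated headers without quoting, and the node headers shown are hypothetical because the question does not include them.

```python
import re

# Matches header fields such as ":ID", "customerId:ID(CUSTOMER)",
# ":START_ID(CUSTOMER)" or ":END_ID"; the ID space name is optional.
ID_FIELD = re.compile(r"^[^:]*:(ID|START_ID|END_ID)(?:\(([^)]*)\))?$")


def id_groups(header_line, kinds):
    """Return the ID spaces used by fields of the given kinds in a header."""
    groups = set()
    for field in header_line.strip().split(","):
        match = ID_FIELD.match(field.strip())
        if match and match.group(1) in kinds:
            groups.add(match.group(2) or "")  # "" stands for the unnamed ID space
    return groups


def missing_groups(node_headers, rel_header):
    """ID spaces referenced by a relationship header but declared by no node header."""
    declared = set()
    for header in node_headers:
        declared |= id_groups(header, {"ID"})
    referenced = id_groups(rel_header, {"START_ID", "END_ID"})
    return referenced - declared


# Relationship header taken from the question.
REL_HEADER = (
    ":TYPE,:START_ID(CUSTOMER),:END_ID(AGENT),"
    "TXNID:string,TIMESTAMP:datetime,AMOUNT:int,CHANNEL"
)

# Hypothetical node headers that use plain :ID, i.e. no named ID space.
plain = ["customerId:ID,name", "agentId:ID,name"]
# The same hypothetical headers with named ID spaces, as the answer suggests.
grouped = ["customerId:ID(CUSTOMER),name", "agentId:ID(AGENT),name"]

print(sorted(missing_groups(plain, REL_HEADER)))
print(sorted(missing_groups(grouped, REL_HEADER)))
```

With the plain headers, both CUSTOMER and AGENT are reported as missing, which corresponds to the "Group 'CUSTOMER' not found" error. With the grouped headers nothing is missing.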

How to import unique data with Talend?

I have 100M datasets in Oracle and am trying to import all of them into Neo4j with Talend. Since the 100M datasets are updated every day, how can I make sure Talend only imports datasets that do not already exist in the Neo4j database? In other words, Talend should import only the updated datasets.
For example, suppose Neo4j contains 38890, 38891, 38892 right now. In Oracle, the updated datasets are 38890, 38891, 38892, 38893. The expected result is that only 38893 is imported.
The data is very large, so it does not seem efficient to import all of it into Neo4j every day and then delete the duplicates. Could anyone help me out, please? Thanks in advance.
You should do two loads: one for the initial full load, just as you do now, and another for the daily incremental loads.
Check your primary keys and find a way to write a SELECT query that returns your new/modified rows. You need another query that shows which rows have been deleted/modified, as you need to remove these rows before adding the new/modified rows to your db.
To run this automatically, right-click on your job and select "export Job". It will build your job into a Java JAR file with .sh and .bat launchers. You can then use the Windows scheduler to execute it daily, or use cron if you happen to have a Linux server.
You certainly have an updated timestamp on your tables in Oracle, so I would use that to select only the data that was updated since the last import, which would be much less data, e.g. 1-5M rows.
For those entries you can have a unique constraint and then use Cypher's MERGE on them, which is a get-or-create.
Make sure to use parameters for updating the data, against the embedded or server APIs:
// {people} is a query parameter holding a list of maps (written $people in Neo4j 4.0 and later)
FOREACH (p IN {people} |
  MERGE (person:Person {name: p.name})
  ON CREATE SET person.age = p.age, ...
)
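The incremental approach described in these answers can be sketched in plain Python. This is only a model of the idea, assuming each source row carries an `id` key and an `updated` timestamp (both hypothetical names), with a dictionary standing in for the Neo4j database. The dates are made up, and the IDs mirror the example in the question.

```python
from datetime import datetime


def changed_since(source_rows, last_import):
    """Keep only rows whose `updated` timestamp is later than the last import."""
    return [row for row in source_rows if row["updated"] > last_import]


def merge_rows(target, rows):
    """Get-or-create by key, like MERGE ... ON CREATE SET: existing entries are left alone."""
    created = []
    for row in rows:
        if row["id"] not in target:      # the "create" half of get-or-create
            target[row["id"]] = dict(row)
            created.append(row["id"])
    return created


old = datetime(2019, 6, 1)    # hypothetical timestamps
new = datetime(2019, 6, 11)
source = [
    {"id": 38890, "updated": old},
    {"id": 38891, "updated": old},
    {"id": 38892, "updated": old},
    {"id": 38893, "updated": new},
]
target = {row["id"]: dict(row) for row in source[:3]}  # what is already imported

delta = changed_since(source, last_import=datetime(2019, 6, 10))
created = merge_rows(target, delta)
print(created)
```

Running the merge a second time with the same delta creates nothing. That idempotence is the property that makes a daily MERGE-based load safe to repeat.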

Neo4j staged batch import

I want to import existing entities and their relationships from a MySQL database into a new Neo4j db. There are several things I still do not quite understand:
Based on the description of the batch importer, it appears as if I need to have both an entity and relationship file. Can I execute an import without one or the other file type?
Can I execute a series of batch imports, using different files for different entities?
Are you using the batch importer from the Neo4j website or the one by jexp/Michael Hunger?
If it's the jexp batch-import you could execute just the entity/nodes file (resulting in a bunch of nodes and no edges) or just the rels file (resulting in an empty graph since there's no nodes to connect). Or you could import the nodes, then import the rels, either in the same import or in a series of imports.

Use CSV to populate Neo4j

I am very new to Neo4j and am just learning this graph database. I need to load a CSV file into a Neo4j database. I have been trying for 2 days and have not been able to find good information on reading a CSV file into Neo4j. Please suggest sample code or blogs on reading a CSV file into Neo4j.
Example:
Suppose I have a CSV file like this; how can I read it into Neo4j?
id name language
1 Victor Richards West Frisian
2 Virginia Shaw Korean
3 Lois Simpson Belarusian
4 Randy Bishop Hiri Motu
5 Lori Mendoza Tok Pisin
You may want to try https://github.com/sroycode/neo4j-import
This populates data directly from a pair of CSV files ( entries must be COMMA separated )
To build: (you need maven)
sh build.sh
The nodes file has a mandatory field id and any other fields you like
NODES.txt
id,name,language
1,Victor Richards,West Frisian
2,Virginia Shaw,Korean
3,Lois Simpson,Belarusian
The relationships file has 3 mandatory fields from,to,type. Assuming you have a field age ( long integer), and info, the relations file will look like
RELNS.txt
from,to,type,age#long,info
1,2,KNOWS,10,known each other from school
1,3,CLUBMATES,5,member of country club
Running:
sh run.sh graph.db NODES.txt RELNS.txt
will create graph.db in the current folder which you can copy to the neo4j data folder.
Note:
If you are using a neo4j version later than 1.6.*, please add this line to conf/neo4j.properties
allow_store_upgrade = true
Have fun.
Please take a look at https://github.com/jexp/batch-import
Can be used as starting point
There is nothing available to generically load CSV data into Neo4j because the source and destination data structures are different: CSV data is tabular whereas Neo4j holds graph data.
In order to achieve such an import, you will need to add a separate step to translate your tabular data into some form of graph (e.g. a tree) before it can be loaded into Neo4j. Taking the tree structure further as an example, the page below shows how XML data can be converted into Cypher which may then be directly executed against a Neo4j instance.
http://geoff.nigelsmall.net/xml2graph/
Please feel free to use this tool if it helps (bear in mind it can only deal with small files) but this will of course require you to convert your CSV to XML first.
Cheers
Nigel
There is probably no known CSV importer for neo4j; you must import it yourself.
I usually do it via gremlin's g.loadGraphML() function.
http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-load-a-sample-graph
I parse my data with an external script into the XML syntax and load the resulting XML file. You can view the syntax here:
https://raw.github.com/tinkerpop/gremlin/master/data/graph-example-1.xml
Parsing a 100 MB file takes a few minutes.
In your case, what you need is a simple bipartite graph with vertices for users and languages, and "speaks" edges. If you know some programming, create user nodes with the parameters id and name, unique language nodes with the parameter name, and relationships connecting each user to the corresponding language. Note that users can be duplicated whereas languages can't.
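The bipartite structure described above can be sketched in Python before any database is involved. This is a minimal illustration that assumes the comma-separated layout shown in an earlier answer. The function name is made up, and the fourth sample row is a hypothetical addition to show a repeated language collapsing into one language node.

```python
import csv
import io

SAMPLE = """id,name,language
1,Victor Richards,West Frisian
2,Virginia Shaw,Korean
3,Lois Simpson,Belarusian
4,Example User,Korean
"""  # row 4 is hypothetical, added to show language de-duplication


def build_bipartite(csv_text):
    """Split tabular rows into user nodes, unique language nodes and 'speaks' edges."""
    users = {}          # user id -> name
    languages = set()   # unique language names
    speaks = []         # (user id, language name) edges
    for row in csv.DictReader(io.StringIO(csv_text)):
        users[row["id"]] = row["name"]
        languages.add(row["language"])
        speaks.append((row["id"], row["language"]))
    return users, languages, speaks


users, languages, speaks = build_bipartite(SAMPLE)
print(len(users), "users,", len(languages), "languages,", len(speaks), "edges")
```

The three collections map directly onto what an importer needs: one node file for users, one for languages, and one relationship file for the edges.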
I believe your question is too generic. What does your CSV file contain? The logical meaning of the contents of a CSV file can vary widely. One example is two columns of IDs that represent entities connected to each other:
3921 584
831 9891
3841 92
...
In this case you could write a BatchInserter code snippet, which would import it faster; see http://docs.neo4j.org/chunked/milestone/batchinsert.html.
Or you could import using the regular GraphDatabaseService with transaction sizes of a couple of thousand inserts for performance. See how to set up and use the graph db at http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded.html.
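The batching idea in the second option is independent of the Neo4j API, so here is a language-neutral sketch in Python rather than Java. The `commit` callback is a hypothetical placeholder for "open a transaction, insert these rows, commit". The ID pairs are the ones from the example above, and the batch size of 2 is only there to keep the demonstration small.

```python
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows, one list per transaction."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_pairs(pairs, commit, batch_size=2000):
    """Call `commit` once per batch and return the number of batches committed."""
    committed = 0
    for chunk in batches(pairs, batch_size):
        commit(chunk)   # hypothetical: one transaction per chunk
        committed += 1
    return committed


pairs = [(3921, 584), (831, 9891), (3841, 92)]
log = []                                   # records what each "transaction" received
n = import_pairs(pairs, commit=log.append, batch_size=2)
print(n, log)
```

Committing every few thousand rows keeps each transaction small while avoiding the overhead of one transaction per row.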
