Migrating MySQL data to a Neo4j database - neo4j

I want to migrate data from MySQL to Neo4j. I'm using the Neo4j 2.1.2 64-bit installer on a 64-bit Windows machine.
I followed the blog post at http://maxdemarzi.com/2012/02/28/batch-importer-part-2/#more-660, where migrating data from PostgreSQL is well explained.
I took the same example and created the same tables in MySQL. After creating the nodes and relationships tables in MySQL, I exported them as CSV files so that I could use them with the batch import command.
All my fields are varchar, and the row_number() field is also varchar.
I used the command below to export MySQL's relationships table into the myrels.csv file (and did the same for the nodes table):
SELECT *
INTO OUTFILE 'D:/Tech_Explorations/BigData_Related/Neo4j/mqytoneo4j/myrels.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
FROM
(
SELECT 'start' AS `start`, 'end' AS `end`,'type' AS `type`,'status' AS `status`
UNION ALL
SELECT `start`, `end`,`type`,`status`
FROM `vouch_rels`
) `sub_query`;
I used the command below to load mynodes.csv and myrels.csv into Neo4j:
java -server -Xms1024M -jar D:/Neo4j/target/batch-import-jar-with-dependencies.jar
neo4j/data/graph.db mynodes.csv myrels.csv
When I executed the above batch import command, it failed with an error:
Exception in thread "main" java.lang.NumberFormatException: For input string: "1
,"1","python,confirmed"
Where "1,"1","python,confirmed" is row in the myrels.csv.
The above error might be because of some datatype or csv file issue but I'm not able to figure it out. Even I tried with changing different csv load options while loading from mysql to csv file. But still getting the same error.

MySQL to Neo4j migration is not a straightforward export-and-load problem. You first need a clear definition of the Neo4j property graph model, and it should be consistent with the MySQL schema. To my knowledge, there is no way to automatically generate a Neo4j property graph from a MySQL schema. Once the two schemas are well defined, you can write your own migration in any programming language.
The Python way to do the migration
py2neo is a Python library that makes it easy to write migrations, as it provides a ton of useful functions, the option to run Cypher queries, transaction support, etc.
I used py2neo in a project to migrate around 100 MB of data from MySQL to Neo4j. Here is some sample code for reference, along with documentation. The data is not provided, but the schemas of both the MySQL database and the Neo4j property graph are given.
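For illustration, here is a minimal sketch of such a migration in Python, assuming a hypothetical people table in MySQL and a Person label in Neo4j (the connection details, table, and label are placeholders, not from the original project):

import mysql.connector          # MySQL driver (mysql-connector-python)
from py2neo import Graph, Node  # Neo4j client

# Placeholder connection details; adjust for your environment
mysql_conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="mydb")
graph = Graph("bolt://localhost:7687", auth=("neo4j", "secret"))

cursor = mysql_conn.cursor(dictionary=True)
cursor.execute("SELECT id, name FROM people")  # hypothetical source table

tx = graph.begin()
for row in cursor:
    # Create one Neo4j node per MySQL row; the Person label is an assumption
    tx.create(Node("Person", id=row["id"], name=row["name"]))
graph.commit(tx)  # recent py2neo; older releases used tx.commit()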
P.S.: I may have digressed from addressing your exact problem, but I've written this answer because it might help readers looking to solve the MySQL to Neo4j migration problem using Python.

I'd suggest looking at the LOAD CSV Cypher option. There are detailed docs on the Neo4j website.
Basically, you can use a Cypher query like the following to import your data.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/path/to/your.csv" AS csvLine
MATCH (person:Person { id: toInt(csvLine.personId)}),(movie:Movie { id: toInt(csvLine.movieId)})
CREATE (person)-[:PLAYED { role: csvLine.role }]->(movie)
If you wish to proceed with the Java batch import tool, then I believe your file needs to be tab-delimited, not comma-delimited.
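If that is the cause, re-exporting with FIELDS TERMINATED BY '\t' or converting the existing file should get past the NumberFormatException. A minimal conversion sketch in Python (input file name taken from the question; the .tsv output name is my own):

import csv

# Rewrite the comma-delimited export as a tab-delimited file for the batch importer
with open("myrels.csv", newline="") as src, \
        open("myrels.tsv", "w", newline="") as dst:
    reader = csv.reader(src)  # comma-delimited, double-quoted fields
    writer = csv.writer(dst, delimiter="\t",
                        quoting=csv.QUOTE_NONE, escapechar="\\")
    for row in reader:
        writer.writerow(row)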

Related

How to export data from neo4j to a MySQL table

I have the data below in my Neo4j database, which I want to insert into a MySQL table using JDBC.
"{""id"":7512,""labels"":[""person1""],""properties"":{""person1"":""Nishant"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7513,""labels"":[""person1""],""properties"":{""person1"":""anish"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7519,""labels"":[""person1""],""properties"":{""person1"":""nishant"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7520,""labels"":[""person1""],""properties"":{""person1"":""xiaoyi"",""group_uuid"":""9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5""}}"
"{""id"":7521,""labels"":[""person1""],""properties"":{""person1"":""pavan"",""group_uuid"":""3ddc954a-16f5-4c59-a94a-b262f9784211""}}"
"{""id"":7522,""labels"":[""person1""],""properties"":{""person1"":""jose"",""group_uuid"":""6b27c9c8-4d5b-4ebc-b8c2-667bb159e029""}}"
"{""id"":7523,""labels"":[""person1""],""properties"":{""person1"":""neil"",""group_uuid"":""9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5""}}"
"{""id"":7524,""labels"":[""person1""],""properties"":{""person1"":""menish"",""group_uuid"":""9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5""}}"
"{""id"":7525,""labels"":[""person1""],""properties"":{""person1"":""ankur"",""group_uuid"":""3ddc954a-16f5-4c59-a94a-b262f9784211""}}"
Desired output in the MySQL database table:
id,name,group_id
7525,ankur,3ddc954a-16f5-4c59-a94a-b262f9784211
7524,menish,9d7d4bf6-6db6-4cf2-8186-d8d0621a58c5
...
Since you did not provide much info in your question, here is a general approach for exporting from Neo4j to MySQL.
Execute a Cypher query using one of the APOC export-to-CSV procedures to write the data intended for the table to a CSV file.
Import from the CSV file into MySQL. (E.g., here is a tutorial.)
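Given the JSON-per-row format shown in the question, the two steps can also be collapsed into one small script. A sketch, assuming mysql-connector-python, an export file named nodes.csv, and a person(id, name, group_id) table (all of these names are placeholders):

import csv
import json
import mysql.connector

# Placeholder connection details; adjust for your environment
conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="mydb")
cur = conn.cursor()

with open("nodes.csv", newline="") as f:   # assumed name of the export file
    for (doc,) in csv.reader(f):           # each row holds one quoted JSON document
        node = json.loads(doc)
        props = node["properties"]
        cur.execute(
            "INSERT INTO person (id, name, group_id) VALUES (%s, %s, %s)",
            (node["id"], props["person1"], props["group_uuid"]))
conn.commit()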

Neo4J Load CSV from Cypher throws "Unknown Error" or DeadlockDetected

I'm evaluating Neo4j Community 2.1.3 to store a list of concepts and the relationships between them. I'm trying to load my sample test data (CSV files) into Neo4j using Cypher from the web interface, as described in the online manual.
My data looks something like this:
concepts.csv
id,concept
1,tree
2,apple
3,grapes
4,fruit salad
5,motor vehicle
6,internal combustion engine
relationships.csv
sourceid,targetid
2,1
4,2
4,3
5,6
6,5
And so on. For my sample, I have ~17K concepts and ~16M relationships. Following the manual, I started the Neo4j server and entered this into Cypher:
LOAD CSV WITH HEADERS FROM "file:///data/concepts.csv" AS csvLine
CREATE (c:Concept { id: csvLine.id, concept: csvLine.concept })
This worked fine and loaded my concepts. Then I tried to load my relationships.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///data/relationships.csv" AS csvLine
MATCH (c1:Concept { id: csvLine.sourceid }),(c2:Concept { id: csvLine.targetid })
CREATE (c1)-[:RELATED_TO]->(c2)
This would run for an hour or so, but always stopped with either:
"Unknown error" (no other info!), or
"Neo.TransientError.Transaction.DeadlockDetected" with a detailed message like
"LockClient[695] can't wait on resource RWLock[RELATIONSHIP(572801), hash=267423386] since => LockClient[695] <-[:HELD_BY]- RWLock[NODE(4145), hash=1224203266] <-[:WAITING_FOR]- LockClient[691] <-[:HELD_BY]- RWLock[RELATIONSHIP(572801), hash=267423386]"
It would stop after loading maybe 200-300K relationships. I've done a "sort | uniq" on the relationships.csv so I'm pretty sure there are no duplicates. I looked at the log files in data/log but found no error message.
Has anyone seen this before? BTW, I don't mind losing a small portion of the relationships, so I'll be happy if I can just turn off ACID transactions. I also want to avoid writing code (using the Java API) at this stage. I just want to load my data to try it out. Is there any way to do this?
My full data set will have millions of concepts and maybe hundreds of millions of relationships. Does anyone know if Neo4J can handle this amount of data?
Thank you.
You're doing it correctly.
Do you use the neo4j-shell or the browser?
Did you do: create index on :Concept(id);?
If you don't have an index, each concept lookup has to scan all nodes with that label for the given id value, so the import gets dramatically slower as the node count grows. You can also check whether an index is used for both matches by prefixing your query with PROFILE.
I've never seen that deadlock before, despite importing millions of relationships.
Can you share the full stack trace? If you use the shell, you might want to do export STACKTRACES=true.
Can you use USING PERIODIC COMMIT 1000?
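Putting those suggestions together, here is a sketch using the official Python driver (Bolt requires Neo4j 3.0+, so on 2.1.3 you would run the same two statements in neo4j-shell or the browser instead; the index syntax and USING PERIODIC COMMIT shown here have been superseded in Neo4j 4.x/5.x):

from neo4j import GraphDatabase  # official driver

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "secret"))  # placeholder credentials

with driver.session() as session:
    # 1. Index :Concept(id) so each MATCH is a lookup, not a label scan
    session.run("CREATE INDEX ON :Concept(id)")
    # 2. Retry the relationship import with a smaller commit batch;
    #    session.run() auto-commits, which USING PERIODIC COMMIT requires
    session.run("""
        USING PERIODIC COMMIT 1000
        LOAD CSV WITH HEADERS FROM "file:///data/relationships.csv" AS csvLine
        MATCH (c1:Concept { id: csvLine.sourceid }), (c2:Concept { id: csvLine.targetid })
        CREATE (c1)-[:RELATED_TO]->(c2)
    """)
driver.close()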

Loading Neo4j database dump (neo4j-shell)

My database was affected by the bug in Neo4j 2.1.1 that tends to corrupt the database in areas where many nodes have been deleted. It turns out most of the affected relationships were already marked for deletion in my database. I dumped the rest of the data using neo4j-shell with a single query. This gives a 1.5 GB Cypher file that I need to import into a fresh database to get my data back into a healthy structure.
I noticed that the dump file contains definitions for (1) the schema, (2) nodes, and (3) relationships. I have already removed the schema definitions from the file, because they can be applied later. The issue now is that, since the dump file uses a single series of identifiers for nodes during node creation (in the format _nodeid) and relationship creation, it seems that all CREATE statements (33,160,527 in my case) need to be run in a single transaction.
My first attempt kept the server busy for 36 hours without results. I had neo4j-shell load the data directly into a new database directory instead of connecting to a server. The data files in the new database directory never showed any sign of receiving data, and the message log showed many messages indicating blocked threads.
What is the best way to get this data back into a database? Should I load a specific config file? Do I need to allocate a large Java heap? What is the trick to loading such a large dump file into a database?
The dump command is not meant for larger-scale exports; there was originally a version that handled those, but it was not included in the product.
If you still have the old database around, you can try a few things:
contact Neo4j support to help you recover your data
use my store-utils to copy it over to a new db (it will skip all broken records)
query the data with Cypher and export the results as CSV
you could use the shell-import-tools for that
then import your data from the CSV using the shell tools again, the LOAD CSV command, or the batch-importer
Here is what I finally did:
First I identified all unaffected nodes and marked them with a specific label (let's say Carriable). This was a pretty easy process in my case because all the affected nodes had the same label, so I just excluded that label. I did not have to identify the affected relationships separately, because all of them were connected to nodes with the affected label.
Then I exported the whole database except the affected nodes and relationships to GraphML using a single query (in neo4j-shell):
export-graphml -o /home/mah/full.gml -t -r match (n:Carriable) optional match (n)-[i]-(:Carriable) return n,i
This took about half an hour and yielded a 4 GB XML file.
Then I imported the entire GraphML file back into a fresh database:
JAVA_OPTS="-Xmx8G" neo4j-shell -c "import-graphml -c -t -b 10000 -i /home/mah/full.gml" -path /db/newneo
This took another half hour to complete.
Note that I allocated more than sufficient Java heap memory (JAVA_OPTS="-Xmx8G"), used a deliberately small batch size (-b 10000), and allowed the use of on-disk caching.
Finally, I removed the temporary "Carriable" label and recreated the constraints.
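The final clean-up amounts to two short statements; a sketch via py2neo (the constraint shown is only a placeholder, since the original schema isn't given):

from py2neo import Graph

graph = Graph("http://localhost:7474")  # adjust for your setup

# Drop the temporary marker label used during the export
graph.run("MATCH (n:Carriable) REMOVE n:Carriable")

# Recreate the original constraints; this one is a made-up example
graph.run("CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE")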

Is there a tool to dump a Neo4j graph as Cypher and re-load it from Cypher?

Everyone familiar with MySQL has likely used the mysqldump command, which can generate a file of SQL statements representing both the schema and the data in a MySQL database.
These SQL text files are commonly used for many purposes: backups, seeding replicas, copying databases between installations (e.g. copying prod DBs to staging environments), and so on.
Is there a similar tool for Neo4j that can dump an entire graph into a text file of Cypher statements which, when executed on an empty database, would reconstruct the original data?
Thanks.
In Neo4j version 2 (e.g. 2.0.0-M3), using neo4j-shell, you can use the command
dump
which will create the Cypher statements (pretty much like mysqldump would do). To read the file back in, you can use
cat dump.cql | neo4j-shell
Cypher is just a query language for Neo4j, as SQL is for MySQL and other relational databases. If you wish to transfer the db, you just need to copy the folder containing the database files. Simple.
For example, my folder simple-graph contains all the db files. Just copy the folder and store it at some other location. You can start using it directly:
GraphDatabaseService graphDb = new EmbeddedGraphDatabase(DB_PATH); // DB_PATH is the path to the new location
You can use the procedure apoc.export.cypher.all() to dump all the data in your database.
For example, you can dump the database into a single file called dump-file.cypher:
neo4j#neo4j> CALL apoc.export.cypher.all('dump-file.cypher');
For details of the procedure, please see the documentation: https://neo4j.com/labs/apoc/4.4/overview/apoc.export/apoc.export.cypher.all/.

Use CSV to populate Neo4j

I am very new to Neo4j and still learning this graph database. I need to load a CSV file into a Neo4j database. I have been trying for 2 days but couldn't find good information on reading a CSV file into Neo4j. Please point me to sample code or blog posts on reading a CSV file into Neo4j.
Example:
Suppose I have a CSV file like this; how can I read it into Neo4j?
id name language
1 Victor Richards West Frisian
2 Virginia Shaw Korean
3 Lois Simpson Belarusian
4 Randy Bishop Hiri Motu
5 Lori Mendoza Tok Pisin
You may want to try https://github.com/sroycode/neo4j-import
This populates data directly from a pair of CSV files (entries must be COMMA separated).
To build it (you need Maven):
sh build.sh
The nodes file has a mandatory field id, plus any other fields you like.
NODES.txt
id,name,language
1,Victor Richards,West Frisian
2,Virginia Shaw,Korean
3,Lois Simpson,Belarusian
The relationships file has 3 mandatory fields: from, to, type. Assuming you also have a field age (long integer) and a field info, the relationships file will look like:
RELNS.txt
from,to,type,age#long,info
1,2,KNOWS,10,known each other from school
1,3,CLUBMATES,5,member of country club
Running:
sh run.sh graph.db NODES.txt RELNS.txt
will create graph.db in the current folder, which you can copy to the Neo4j data folder.
Note:
If you are using a Neo4j version later than 1.6.*, please add this line to conf/neo4j.properties:
allow_store_upgrade = true
Have fun.
Please take a look at https://github.com/jexp/batch-import
It can be used as a starting point.
There is nothing available to generically load CSV data into Neo4j, because the source and destination data structures are different: CSV data is tabular, whereas Neo4j holds graph data.
To achieve such an import, you will need a separate step that translates your tabular data into some form of graph (e.g. a tree) before it can be loaded into Neo4j. Taking the tree structure further as an example, the page below shows how XML data can be converted into Cypher, which may then be executed directly against a Neo4j instance.
http://geoff.nigelsmall.net/xml2graph/
Please feel free to use this tool if it helps (bear in mind it can only deal with small files), but it will of course require you to convert your CSV to XML first.
Cheers
Nigel
There is probably no generic CSV importer for Neo4j; you have to import the data yourself.
I usually do it via Gremlin's g.loadGraphML() function.
http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-load-a-sample-graph
I parse my data with an external script into the XML syntax and load the resulting XML file. You can view the syntax here:
https://raw.github.com/tinkerpop/gremlin/master/data/graph-example-1.xml
Parsing a 100 MB file takes a few minutes.
In your case, what you need is a simple bipartite graph with vertices consisting of users and languages, and "speaks" edges. If you know some programming, create user nodes with properties id and name, unique language nodes with a property name, and relationships connecting each user to the corresponding language. Note that users can be duplicated, whereas languages can't.
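As an alternative to the GraphML route described above, the same bipartite graph can be built directly from the CSV with py2neo; a minimal sketch (file name, labels, and relationship type are my own placeholders, and it assumes a comma-delimited version of the sample data):

import csv
from py2neo import Graph, Node, Relationship

graph = Graph("http://localhost:7474")  # adjust for your setup
languages = {}  # cache keeps language nodes unique; users may repeat

with open("users.csv", newline="") as f:  # columns: id, name, language
    for row in csv.DictReader(f):
        user = Node("User", id=row["id"], name=row["name"])
        graph.create(user)
        lang = languages.get(row["language"])
        if lang is None:
            lang = Node("Language", name=row["language"])
            graph.create(lang)
            languages[row["language"]] = lang
        graph.create(Relationship(user, "SPEAKS", lang))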
I believe your question is too generic. What does your CSV file contain? The logical meaning of the contents of a CSV file can vary a great deal. One example: two columns of IDs, representing entities connected to each other.
3921 584
831 9891
3841 92
...
In this case you could either write a BatchInserter code snippet, which would import faster; see http://docs.neo4j.org/chunked/milestone/batchinsert.html.
Or you could import using the regular GraphDatabaseService with transaction sizes of a couple of thousand inserts for performance. See how to set up and use the embedded graph database at http://docs.neo4j.org/chunked/milestone/tutorials-java-embedded.html.
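The answer describes the batching idea with the embedded Java API; roughly the same pattern in Python with py2neo, committing every couple of thousand inserts, might look like this (file name, label, and relationship type are placeholders):

from py2neo import Graph

graph = Graph("http://localhost:7474")  # adjust for your setup
BATCH = 2000  # commit every couple of thousand inserts, as suggested above

tx = graph.begin()
count = 0
with open("pairs.txt") as f:  # hypothetical two-column ID file, as in the example
    for line in f:
        a, b = line.split()
        tx.run("MERGE (x:Entity {id: $a}) MERGE (y:Entity {id: $b}) "
               "MERGE (x)-[:CONNECTED_TO]->(y)", a=a, b=b)
        count += 1
        if count % BATCH == 0:
            graph.commit(tx)   # recent py2neo; older releases used tx.commit()
            tx = graph.begin()
graph.commit(tx)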
