Neo4j APOC reliable export/import to file format?

APOC procedures support export and import for the following file formats:
CSV
JSON
Cypher script
GraphML
Gephi
From what I've tried, CSV and Cypher script are very slow compared to GraphML, so I haven't followed up on those.
Using GraphML is fast, but some information seems to get lost between export and import: field types such as integers come back as strings, even though useTypes is set to true on export. It is also not feasible to track changes with Git (when exporting/importing schema and structural nodes only, no data), because the cycle export -> wipe database -> import (from the first export) -> export again completely changes the order of the exported data between the first and second export files.
JSON has an option to select between four formats: JSON_LINES (default), ARRAY_JSON, JSON, JSON_ID_AS_KEYS
Is JSON as fast as GraphML, and is it a type-safe format? Are any of the JSON format options objectively better than the others?
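For reference, here is a minimal sketch of the export/import calls being compared, driven from the official Python driver. The connection URI, credentials, file names and the jsonFormat config key are assumptions to check against the APOC docs for your version, and exporting to files requires apoc.export.file.enabled=true:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    # GraphML with embedded type info (the useTypes setting discussed above)
    session.run("CALL apoc.export.graphml.all('all.graphml', {useTypes: true})").consume()
    # JSON export; 'jsonFormat' selects JSON_LINES / ARRAY_JSON / JSON / JSON_ID_AS_KEYS
    session.run("CALL apoc.export.json.all('all.json', {jsonFormat: 'JSON_LINES'})").consume()
    # Matching imports, for timing a round trip (run against an empty test database;
    # apoc.import.json is only available in recent APOC releases)
    session.run("CALL apoc.import.graphml('all.graphml', {readLabels: true})").consume()
    session.run("CALL apoc.import.json('all.json')").consume()
driver.close()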

Related

Using Saxon & XPath 3.1 to parse JSON files

This is for the case of calling Saxon from a Java application. I understand that Saxon can use XPath 3.1 to run queries against JSON files. A couple of questions about this:
Where is there an example of how to do this? I've searched and found lots of answers on the details, but nothing on how to read in the file and perform queries. Is it the same as for XML?
Is it possible to have a schema file for the JSON so returned values are correctly typed? If so, how?
Is XQuery also able to perform queries on JSON?
What version of Saxon supports this? (We are using 9.9.1.1 and want to know whether we need to upgrade.)
Technically, you don't run queries against JSON files; you run them against the data structure that results from parsing a JSON file, which is a structure of maps and arrays. You can parse the JSON file using the parse-json() or json-doc() functions and then query the result using the operators that work on maps and arrays, for example json-doc('orders.json')?customer?name to drill into nested maps with the lookup operator. Some of these operators (and examples of their use) are shown in the spec at
https://www.w3.org/TR/xpath-31/#id-maps-and-arrays
Googling for "query maps arrays JSON XPath 3.1" finds quite a lot of useful material. Or get Priscilla Walmsley's book: http://www.datypic.com/books/xquery/chapter24.html
Data types: the data types of string, number, and boolean that are intrinsic to JSON are automatically recognized by their form. There's no capability to do further typing using a schema.
XQuery is a superset of XPath, but as far as JSON/Maps/Arrays are concerned, I think the facilities in XPath and those in XQuery are exactly the same.
Saxon has added a bit of extra conformance and performance in each successive release. 9.9 is pretty complete in its coverage; 10.0 adds some optimizations (like a new internal data structure for maps whose keys are all strings, such as you get when you parse JSON). Details of changes in successive Saxon releases are described in copious detail at http://www.saxonica.com/documentation/index.html#!changes

Converting ROOT Tree to HDF5

I have a TTree in ROOT with 1000 events and 15 variables associated with each of them. I would like to convert it in its entirety to an HDF5 dataset. How do I organise my data in HDF5 groups so that I can access it both by event number and by variable (for example, all the data for the 'kinetic energy' variable over all events)? Note: I have already tried the root2hdf5 conversion tool, but it does not work for branches with arrays / compound data types.
You can try loading the TTree into a pandas DataFrame with root_pandas, which should work for array branches (I'm not sure about compound data types).
From there, you can index both by event and by variable, and use the regular pandas functionality to save in your favorite format, such as HDF5.
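As an illustration of that suggestion, a minimal sketch assuming a file events.root containing a TTree named events with a branch kinetic_energy (all placeholder names):

from root_pandas import read_root   # pip install root_pandas (requires ROOT)

# one row per event, one column per branch
df = read_root('events.root', key='events')
print(df['kinetic_energy'])                     # one variable across all events
print(df.iloc[42])                              # all variables for one event
df.to_hdf('events.h5', key='events', mode='w')  # HDF5 via pandas/PyTables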

ArangoDB - how to import neo4j database export into ArangoDB

Are there any utilities to import a database from Neo4j into ArangoDB? The arangoimp utility expects the data for edges and vertices to be in a different format than what Neo4j exports.
Thanks!
Note: This is not an answer per se, but a comment wouldn't allow me to structure the information I gathered in a readable way.
Resources online seem to be scarce with respect to the transition from Neo4j to ArangoDB.
One possible way is to combine APOC (https://github.com/neo4j-contrib/neo4j-apoc-procedures) and neo4j-shell-tools (https://github.com/jexp/neo4j-shell-tools)
1. Use APOC to create a Cypher export file for the database (see https://neo4j.com/developer/kb/export-sub-graph-to-cypher-and-import/)
2. Use the neo4j-shell-tools Cypher import with the -o switch; this should generate CSV files
3. Analyse the CSV files, then either
   massage them with csvtool, or
   create JSON data with one of the numerous csv2json converters available (npm, ...) and massage those files with jq (see the sketch after this answer)
4. Feed the files to arangoimp, and repeat step 3 if necessary
There is also a graphml-to-json converter (https://github.com/uskudnik/GraphGL/blob/master/examples/graphml-to-json.py) available, so you could use the aforementioned neo4j-shell-tools to export to GraphML, convert that representation to JSON, and massage the resulting files into the necessary format.
I'm sorry that I can't be of more help, but maybe these thoughts get you started.
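To make steps 3 and 4 concrete, here is a hedged sketch of the CSV-to-JSONL massaging, with hypothetical column names (id, start_id, end_id, type) standing in for whatever your export actually contains; ArangoDB edge documents need _from/_to values of the form "collection/key":

import csv, json

# nodes.csv -> one JSON document per line, using the exported id as ArangoDB _key
with open('nodes.csv', newline='') as src, open('nodes.jsonl', 'w') as dst:
    for row in csv.DictReader(src):
        doc = dict(row)
        doc['_key'] = doc.pop('id')        # assumes an 'id' column in the export
        dst.write(json.dumps(doc) + '\n')

# rels.csv -> edge documents pointing into the 'nodes' collection
with open('rels.csv', newline='') as src, open('edges.jsonl', 'w') as dst:
    for row in csv.DictReader(src):
        dst.write(json.dumps({
            '_from': 'nodes/' + row['start_id'],   # hypothetical column names
            '_to': 'nodes/' + row['end_id'],
            'type': row.get('type', ''),
        }) + '\n')

# then, for example:
#   arangoimp --file nodes.jsonl --collection nodes --type jsonl
#   arangoimp --file edges.jsonl --collection edges --type jsonl --create-collection-type edge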

How do I make Cypher respect character encoding when using LOAD CSV in browser?

My case: a list of Danish-named students (with names including characters such as ü, æ, ø, å). Minimal working example:
CSV file:
Fornavn;Efternavn;Mobil;Adresse
Øjvind;Ørnenæb;87654321;Paradisæblevej 125, 5610 Åkirkeby
Süzette;Ågård;12345678;Ærøvej 123, 2000 Frederiksberg
In-browser neo4j-editor:
$ LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS line FIELDTERMINATOR ";"
CREATE (:Elev {fornavn: line.Fornavn, efternavn: line.Efternavn, mobil: line.Mobil, adresse: line.Adresse})
Resulting in registrations like this:
[Neo4j browser screenshot: the Danish/German characters appear as ?-characters.]
My data come from a Learning Management System into Excel. When exporting as CSV from Excel, I can control the file encoding via the Save As dialogue box. I have tried exporting from Excel as "UTF-8" (which the Neo4j manual says it wants), "ISO-Western European", "Windows-Western European", and "Unicode", each in a separately named file, and adjusted the FROM 'file:///path/to/file.csv' clause accordingly.
Intriguingly, exactly the same misrepresentation results regardless of which (apparent?) file encoding I request from Excel when saving. When copy-pasting the names and addresses directly into the editor, I do not encounter the problem.
Check Michael Hunger's blog post here which contains some tips, namely:
if you use non-ASCII characters (umlauts, accents, etc.), make sure to use the appropriate locale or provide the system property -Dfile.encoding=UTF8
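If the file on disk turns out not to be UTF-8, one workaround is to re-encode it before running LOAD CSV. A minimal sketch, assuming the Excel export is Windows-1252 ("Windows-Western European"); adjust the source encoding to whatever you actually saved and point the FROM clause at the new file:

# read the Excel CSV in its original encoding and rewrite it as UTF-8
with open('file.csv', encoding='cp1252') as src, \
        open('file-utf8.csv', 'w', encoding='utf-8') as dst:
    dst.write(src.read())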

Convert Neo4j DB to XML?

Can I convert Neo4J Database files to XML?
I agree, GraphML is the way to go, if you don't have problems with the verbosity of XML. A simple way to do it is to open the Neo4j graph from Gremlin, where GraphML is the default import/export format, something like
peters: ./gremlin.sh
gremlin> $_g := neo4j:open('/tmp/neo4j')
==>neograph[/tmp/neo4j, vertices:2, edges:1]
gremlin> g:save('graphml-export.xml')
As described here
Does that solve your problem?
With Blueprints, simply do:
Graph graph = new Neo4jGraph("/tmp/mygraph");
GraphMLWriter.outputGraph(graph, new FileOutputStream("mygraph.xml"));
Or, with Gremlin (which does the same thing in the back):
g = new Neo4jGraph('/tmp/mygraph');
g.saveGraphML('mygraph.xml');
Finally, to the constructor for Neo4jGraph, you can also pass in a GraphDatabaseService instance.
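On current Neo4j versions, another route to XML is the APOC GraphML export from the first question above, since GraphML is itself XML. A minimal sketch via the Python driver, with placeholder connection details and file name (requires APOC and apoc.export.file.enabled=true):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    session.run("CALL apoc.export.graphml.all('mygraph.graphml', {useTypes: true})").consume()
driver.close()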
I don't believe anything exists out there for this, at least not as of a few months ago when I was messing with it. From what I saw, there are two main roadblocks:
XML is hierarchical; you can't readily represent graph data in this format.
Lack of explicit IDs for nodes. Even though implicit IDs exist, it'd be like using ROWID in Oracle for import/export: not guaranteed to be the same.
Some people have suggested that GraphML would be the proper format for this, and I'm inclined to agree. If you don't have graph structures and your data would be fine represented in an XML/hierarchical format... well, then that's just bad luck. Since the majority of users who would tackle this sort of enhancement task are using data that wouldn't store that way, I don't see an XML solution coming out; it's more likely we'll see a format supporting all uses first.
Take a look at NoSqlUnit
It has tools for converting GraphML to neo4j and back again.
In particular, there is com.lordofthejars.nosqlunit.graph.parser.GraphMLWriter and com.lordofthejars.nosqlunit.graph.parser.GraphMLReader which read / write XML files to / from a neo4j database.
