Batch insertion in Neo4j

Batch insertion in Neo4j - neo4j

I am new to neo4j. I have created graph database in neo4j i.e "my.graphdb" which also include schema indexes. Now I want to use this database into my java program. For that I used batch insertion. Also I am creating schema indexes in my java program. But when I run the program it gives me following exception.
Exception :
java.lang.UnsupportedOperationException: Schema modification is currently not available
through the BatchDatabase API.
In short, I want to use my existing graph database (my.graphdb) into new java program. Moreover, I want to use my existing data and indexes present in my.graphdb to insert new nodes and relationships.
Please let me know what should I do ?

The BatchGraphDatabase should not be used.
Use the BatchInserter or the transactional EmbeddedGraphDatabase.
The Batchinserter also supports the creation of schema indexes which are populated at shutdown (test with a small dataset first, it can take a while, will be faster in 2.0.1).

Related

What is neo4j closest feature to sql stored functions?

I'm new to neo4j and graph databases in general.
Given a complex Cypher query, that I don't want to store inside the application (or several applications), but keep centralized, what options are left to me?
In a SQL database I would use a stored function. Are UDF function the way to go in neo4j?
From the docs it seems to me that they're more a way to extend the database functionality by being able to access the graph internals, but I've just started studying them.

Take a look at the custom functions and procedures available in the apoc library.
https://neo4j.com/docs/labs/apoc/current/cypher-execution/cypher-based-procedures-functions/
CALL apoc.custom.asProcedure('answer','RETURN 42 as answer')
CALL custom.answer() YIELD row RETURN row.answer

Neo4j web client fails with large Cypher CREATE query. 144000 lines

I'm new to neo4j and currently attempting to migrate existing data into a neo4j database. I have written a small program to convert current data (in bespoke format) into a large CREATE cypher query for initial population of the database. My first iteration has been to somewhat retain the structuring of the existing object model, i.e Objects become nodes, node type is same as object name in current object model, and the members become properties (member name is property name). This is done for all fundamental types (and strings) and any member objects are thus decomposed in the same way as in the original object model.
This has been fine in terms of performance and 13000+ line CREATE cypher queries have been generated which can be executed throuh the web frontend/client. However the model is not ideal for a graph database, I beleive, since there can be many properties, and instead I would like to deomcompose these 'fundamental' nodes (with members which are fundamental types) into their own node, relating to a more 'abstract' node which represents the more higher level object/class. This means each member is a node with a single (at first, it may grow) property say { value:"42" }, or I could set the node type to the data type (i.e integer). If my understanding is correct this would also allow me to create relationships between the 'members' (since they are nodes and not propeties) allowing a greater freedom when expressing relationships between original members of different objects rather than just relating the parent objects to each other.
The problem is this now generates 144000+ line Cypher queries (and this isn't a large dataset in compraison to others) which the neo4j client seems to bulk at. The code highlighting appears to work in the query input box of the client (i.e it highlights correctly, which I assume implies it parsed it correctly and is valid cypher query), but when I come to run the query, I get the usual browser not responding and then a stack overflow (no punn intended) error. Whats more the neo4j client doesn't exit elegantly and always requires me to force end task and the db is in the 2.5-3GB usage from, what is effectively and small amount of data (144000 lines, approx 2/3 are relationships so at most ~48000 nodes). Yet I read I should be able to deal with millions of nodes and relationships in the milliseconds?
Have tried it with firefox and chrome. I am using the neo4j community edition on windows10. The sdk would initially be used with C# and C++. This research is in its initial stages so I haven't used the sdk yet.
Is this a valid approach, i.e to initially populate to database via a CREATE query?
Also is my approach about decomposing the data into fundamental types a good one? or are there issues which are likely to arise from this approach.

That is a very large Cypher query!!!
You would do much better to populate your database using LOAD CSV FROM... and supplying a CSV file containing the data you want to load.
For a detailed explaination, have a look at:
https://neo4j.com/developer/guide-import-csv/
(This page also discusses the batch loader for really large datasets.)
Since you are generating code for the Cypher query I wouldn't imagine you would have too much trouble generating a CSV file.
(As an indication of performance, I have been loading a 1 million record CSV today into Neo4j running on my laptop in under two minutes.)

Embedded automatic full text indexing completely removed from Neo4j as of 3.0.0?

I'm moving from Neo4j 2.2.* to (still prerelease) 3.0.0 and all of a sudden it seems that configuration parameters
node_auto_indexing=true
relationship_auto_indexing=true
node_keys_indexable=some_node_property
relationship_keys_indexable=some_rel_property
had gone and are not available any more. This is sad because I need full-text indexing (namely, fuzzy search queries and range searches), I was happily using it since 2.0.0 and had a naive hope that new Lucene 5.5 will make my life better with 3.0.0.
Is this functionality completely removed? START clause is still here in Cypher, neo4j-shell still has command which allows manipulating "legacy" FT indices so my question is:
how do I populate my FT index without using Java or another external programming language?
case 1: I import some bunch of "static" data into the graph which
will rarely be updated (consider dictionary) and need to arrange FTS
on those once, and manually perform complete reindex on occasional updates of the dataset;
case 2: nodes and relationships with specific properties
automagically get indexed upon creation or upon assignment of a new value to the property with specific name, near-realtime, as it used to be before.
New schema indexes are cool in 3.0.0 and range searches are implemented, but a) they work only on properties of nodes, no relationships, b) they don't allow full-text, fuzzy queries, and AFAIK regular expression matching does not use index.
Thanks for your suggestions!
WBR, Andrii

Andrii,
only the default config parameters have been removed not the functionality.
What is the actual use-case you are using the FTS indexes (on rels) for?
In 3.0 you can still use the start-clause but using stored procedures you can add nodes and relationship explicitly to indexes. And you can use similar procedures to query your indexes even more efficiently, e.g. by passing in start and end-nodes.
See (WIP): https://github.com/jexp/neo4j-apoc-procedures#manual-indexes

Are there Label properties - Is there a Neo4j schema for Labels?

How to ensure that all nodes of a label have some common properties ?
For example, I want to create a property "name" for all nodes of a label "Person", but I can make a mistake in writing of property name (namee ! for example)

There is no such mechanism built in Neo4j today (the current version of Neo4j at the time of writing is 2.1.6). What you are describing is some sort of schema (if you compare e.g. DDL for a RDBMS) and Neo4j is basically schema free. This type of structural integrity is quite often handled in the application layer for NoSQL databases.
The only schema operations that are available today for Neo4j are described here.
Currently they include:
Unique - e.g. CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE
Indexes - create an index on a label e.g. CREATE INDEX ON :Person(name)
A comment on this answer from Michael Hunger who is part of team behind Neo4j indicates that more constraints will be available for Neo4j in the future. Furthermore, Michael points to the following alternatives:
Take a look at Structr, a layer above Neo4j that among other things enforces a stricter schema (check the schema docs here)
SylvaDB, an easy-to-use layer above Neo4j that also has schema support. Seems very
In addition to this, FrobberOfBits pointed to the tool NeoProfiler that contains a number of profilers, most of which run very simple Cypher queries against your database and provide summary statistics. Some profilers will actually discover data in your graph and then spawn other profilers which will run later. For example, if a label called "Person" is discovered in the data, a label profiler will be added to the run queue to inspect the population of nodes with that label.

Importing data from oracle to neo4j using java API

Can u please share any links/sample source code for generating the graph using neo4j from Oracle database tables data .
And my use case is oracle schema table names as Nodes and columns are properties. And also need to genetate graph in tree structure.

Make sure you commit the transaction after creating the nodes with tx.success(), tx.finish().
If you still don't see the nodes, please post your code and/or any exceptions.

Use JDBC to extract your oracle db data. Then use the Java API to build the corresponding nodes :
GraphDatabaseService db;
try(Transaction tx = db.beginTx()){
Node datanode = db.createNode(Labels.TABLENAME);
datanode.setProperty("column name", "column value"); //do this for each column.
tx.success();
}
Also remember to scale your transactions. I tend to use around 1500 creates per transaction and it works fine for me, but you might have to play with it a little bit.
Just do a SELECT * FROM table LIMIT 1000 OFFSET X*1000 with X being the value for how many times you've run the query before. Then keep those 1000 records stored somewhere in a collection or something so you can build your nodes with them. Repeat this until you've handled every record in your database.
Not sure what you mean with "And also need to genetate graph in tree structure.", if you mean you'd like to convert foreign keys into relationships, remember to just index the key and in stead of adding the FK as a property, create a relationship to the original node in stead. You can find it by doing an index lookup. Or you could just create your own little in-memory index with a HashMap. But since you're already storing 1000 sql records in-memory, plus you are building the transaction... you need to be a bit careful with your memory depending on your JVM settings.

You need to code this ETL process yourself. Follow the below
Write your first Neo4j example by following this article.
Understand how to model with graphs.
There are multiple ways of talking to Neo4j using Java. Choose the one that suits your needs.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart