Fail to import into Neo4J with batch-import - neo4j

I'm trying to import a SQLite3 database into Neo4J using batch-import. Being a Neo4J noob, I followed Max De Marzi's post: Batch Importer – Part 2.
I get this error:
# java -server -Xmx2G -jar /opt/batch-import/target/batch-import-jar-with-dependencies.jar /var/lib/neo4j/data/graph.db nodes.csv relations.csv
Usage: Importer data/dir nodes.csv relationships.csv [node_index node-index-name fulltext|exact nodes_index.csv rel_index rel-index-name fulltext|exact rels_index.csv ....]
Using: Importer /var/lib/neo4j/data/graph.db nodes.csv relations.csv
Using Existing Configuration File
..
Importing 271544 Nodes took 2 seconds
Total import time: 4 seconds
Exception in thread "main" org.neo4j.graphdb.NotFoundException: id=271565
at org.neo4j.unsafe.batchinsert.BatchInserterImpl.getNodeRecord(BatchInserterImpl.java:917)
at org.neo4j.unsafe.batchinsert.BatchInserterImpl.createRelationship(BatchInserterImpl.java:471)
at org.neo4j.batchimport.Importer.importRelationships(Importer.java:136)
at org.neo4j.batchimport.Importer.doImport(Importer.java:214)
at org.neo4j.batchimport.Importer.main(Importer.java:78)
But the node exists :
$ grep ^271565 nodes.csv
271565 'la Callas' 'n_term' 0.0
Has anyone else had this issue?
Thanks.

Can you show your file headers?
As you can see, only 271544 nodes were imported, so there cannot be a node with the node id 271565.
The id in the relationship file refers to the row number in the nodes file, not to the value in your own "id" column (the importer has no way of knowing about that column).
The only thing you can do here is to use id:id, a special column type that forces the Neo4j ids to correspond to the ids you provide. In the relationship file, use start:id and end:id.
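If switching to id:id is not an option, another way to act on the point above is to rewrite the relationship file so that start and end hold node row numbers instead of your own ids. Below is a minimal sketch, not part of batch-import itself. It assumes tab-separated files with a header row, a node column literally named id, and relationship columns named start and end. The function name and the first_row_id parameter are made up for illustration. Whether rows are numbered from 0 or from 1 depends on the batch-import version, so check that before relying on it.

```python
import csv
import io


def remap_relationships(nodes_tsv, rels_tsv, id_column="id", first_row_id=1):
    """Rewrite the start/end columns of rels_tsv as node row numbers.

    first_row_id is an assumption: set it to whatever id your
    batch-import version gives to the first data row of nodes.csv.
    """
    node_rows = csv.DictReader(io.StringIO(nodes_tsv), delimiter="\t")
    # Map each user-supplied id to the row number the importer would assign.
    row_of = {row[id_column]: first_row_id + i for i, row in enumerate(node_rows)}

    rels = csv.DictReader(io.StringIO(rels_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rels.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for rel in rels:
        # A KeyError here means the relationship points at an id
        # that is missing from the nodes file.
        rel["start"] = row_of[rel["start"]]
        rel["end"] = row_of[rel["end"]]
        writer.writerow(rel)
    return out.getvalue()
```

A relationship that refers to an id missing from the nodes file fails with a KeyError at remap time, which is easier to track down than a NotFoundException at the end of the import.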

You can try an alternate method to import bulk-data into neo4j.
First convert your database into csv files and import it into Gephi - a graph visualization tool. Then by using the Gephi plugin for neo4j database support, you should be able to export your database (from Gephi) into neo4j format.
Finally just copy the exported file into appropriate neo4j directory.
For importing database into Gephi, you will need two csv files - one with all the nodes and other with all the relationships. Follow this tutorial : http://blog.neo4j.org/2013/01/fun-with-beer-and-graphs.html
Get Gephi from here: https://gephi.org/
Get the Plugin from here : https://marketplace.gephi.org/plugin/neo4j-graph-database-support/
Hope this helps.

Can you supply your input files to test? What branch are you using?
I found a similar error reported here: https://github.com/jexp/batch-import/issues/59

Related

Apache Beam: Wait for AvroIO write step is done before start ImportTransform Dataflow template

I'm using Apache Beam to create a pipeline that reads an input file, converts it to Avro, writes the Avro files to a bucket, and then imports those Avro files into Spanner using a Dataflow template.
The problem I'm facing is that the last step (importing the Avro files into the database) starts before the previous step (writing the Avro files to the bucket) is done.
I tried to add Wait.on, but that only works when the preceding step returns a PCollection, and writing the files with AvroIO returns PDone.
Example of the Code:
// Step 1: Read Files
PCollection<String> lines = pipeline.apply("Reading Input Data exported from Cassandra",TextIO.read().from(options.getInputFile()));
// Step 2: Convert to Avro
lines.apply("Write Item Avro File", AvroIO.writeGenericRecords(spannerItemAvroSchema).to(options.getOutput()).withSuffix(".avro"));
// Step 3: Import to the DataBase
pipeline.apply( new ImportTransform(
spannerConfig,
options.getInputDir(),
options.getWaitForIndexes(),
options.getWaitForForeignKeys(),
options.getEarlyIndexCreateFlag()));
Again, the problem is that Step 3 starts before Step 2 is done.
Any ideas?
This is a flaw in the API, see, e.g. a recent discussion on this on the beam dev list. The only solutions for now are to either fork AvroIO to return a PCollection or run two pipelines sequentially.

Problems using python packages (Neomodel & py2neo) with Neo4j

I am having some issues using the Neomodel and py2neo clients with Neo4j. I have installed Neomodel and py2neo in separate Anaconda virtual environments and tested each individually. Neo4j runs in a Docker container.
Neomodel
The code
from neomodel import (config, StructuredNode, StringProperty, IntegerProperty, UniqueIdProperty, RelationshipTo, RelationshipFrom)

config.DATABASE_URL = 'bolt://neo4j:password@localhost:7687'

class Country(StructuredNode):
    code = StringProperty(unique_index=True, required=True)
    # traverse incoming IS_FROM relation, inflate to Person objects
    inhabitant = RelationshipFrom('Person', 'IS_FROM')

class Person(StructuredNode):
    uid = UniqueIdProperty()
    name = StringProperty(unique_index=True)
    age = IntegerProperty(index=True, default=0)
    # traverse outgoing IS_FROM relations, inflate to Country objects
    country = RelationshipTo(Country, 'IS_FROM')

jim = Person(name='Jim', age=3).save()
jim.age = 4
jim.save()  # validation happens here
jim.delete()
jim.refresh()  # reload properties from neo
jim.id  # neo4j internal id
Neomodel does create the node, which I can see in the Neo4j web app, but the node is Jim with age=3, i.e. it does not seem to have recorded that Jim's age changed from 3 to 4. I also assumed that jim.delete() would delete the node, which it did not. Lastly, it raises the following error (below is a snippet of the last lines of the error).
Error
...
File "/Users/sjamal/.conda/envs/tneo/lib/python3.6/site-packages/neomodel/core.py", line 452, in inflate
if db_property in node.properties:
AttributeError: 'Node' object has no attribute 'properties'
I did find this post, where the user "Jack Daniel" mentioned that neomodel does not support Neo4j 3. So I tried running the Neo4j v2.3 Docker image instead, but then I received the following error (again, only the last few lines are shown).
Error when docking image Neo4j 2.3
File "/Users/sjamal/.conda/envs/tneo/lib/python3.6/ssl.py", line 817, in __init__
self.do_handshake()
File "/Users/sjamal/.conda/envs/tneo/lib/python3.6/ssl.py", line 1077, in do_handshake
self._sslobj.do_handshake()
File "/Users/sjamal/.conda/envs/tneo/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
OSError: [Errno 0] Error
Py2neo
I started looking into py2neo because of the issues I had with Neomodel, but I cannot seem to get my configuration right.
The code
from py2neo import Node, Relationship, Graph
graph = Graph("localhost", user='neo4j', password='password', bolt=None)
alice = Node("Person", name="Alice")
bob = Node("Person", name="Bob")
alice_knows_bob = Relationship(alice, "KNOWS", bob)
graph.create(alice_knows_bob)
Error
File "/Users/sjamal/.conda/envs/py2neo_test/lib/python3.6/site-packages/neo4j/bolt/connection.py", line 459, in acquire
connection = self.connector(address)
File "/Users/sjamal/.conda/envs/py2neo_test/lib/python3.6/site-packages/neo4j/v1/bolt.py", line 46, in <lambda>
pool = ConnectionPool(lambda a: connect(a, security_plan.ssl_context, **config))
File "/Users/sjamal/.conda/envs/py2neo_test/lib/python3.6/site-packages/neo4j/bolt/connection.py", line 601, in connect
raise ProtocolError("Connection to %r closed without handshake response" % (address,))
neo4j.bolt.connection.ProtocolError: Connection to ('localhost', 7687) closed without handshake response
Thanks to anyone looking into this. I would be happy to receive any suggestion or explanation on how to set up Py2neo irrespective if I get Neomodel to work or not.
So I have managed to solve my issue with Py2neo but not the issue I had with Neomodel. If I do find a way to get Neomodel working I will post it and either link to this post or post as a comment in this thread.
Py2neo solution with py2neo v4.0 and neo4j v3.0
I tried various combinations, starting with neo4j 2.3 together with different versions of py2neo such as 3.1.2 and then did the same with neo4j v3.0.
I am posting my script that I used to create the node and the graph connection as I was going mad when trying to figure out if I set up the configuration poorly or there was a bug in the package, driver etc.
Py2neo script
from py2neo import Node, Relationship, Graph
graph = Graph('http://localhost:7474/db/data', user='neo4j', password='password1234')
tx = graph.begin()
a = Node(label='hero',name='Sabri')
tx.create(a)
tx.commit()
Outdated driver py2neo v3.1.2 in tandem with Neo4j v3.4
As discussed in this GitHub issue report, https://github.com/neo4j/neo4j-python-driver/issues/252, the user who reported the issue was using py2neo 3.1.2 together with Neo4j v3.4. The suspicion was that the cause was an outdated driver (v1.1) that came with py2neo 3.1.2. The new distribution of Neo4j v3.4 seems to come with the new driver, v1.6.
Upgrading py2neo to v4.0 and sticking to latest version of Neo4j server i.e. v3.4
When doing this I ran into a different error
File "/Users/sjamal/.conda/envs/py2neo.v4/lib/python3.6/site-packages/py2neo/internal/http.py", line 26, in <module>
from neo4j.addressing import SocketAddress
ModuleNotFoundError: No module named 'neo4j.addressing'
It was discussed in this Stack Overflow thread (ModuleNotFoundError: No module named 'neo4j.addressing' and ModuleNotFoundError: No module named 'neo4j') that the 1.6 driver might have to be installed manually through pip, which I did.
pip install neo4j-driver==1.6.2
I now received a new error where TypeError was caught when calling a map object.
File "/Users/sjamal/.conda/envs/py2neo.v4/lib/python3.6/site-packages/py2neo/internal/http.py", line 74, in fix_parameters
raise TypeError("Parameters of type {} are not supported".format(type(value).__name__))
TypeError: Parameters of type map are not supported
I found this github issue posted by speters-cmri https://github.com/technige/py2neo/issues/688 which contained the following github commit (https://github.com/technige/py2neo/compare/v4...CMRI-ProCan:v4) to resolve the issue by modifying a json.py script in the py2neo package
I ran my script again to add a test node and it ran without any issues.
If you are too lazy or simply too frustrated to go through a long explanation here is a summary
1. Make sure neo4j v3.0+ is installed. I suggest you look into docker to install neo4j using a docker image
2. pip install py2neo==v4.0
3. pip install neo4j-driver==1.6.2
4. Modify json.py file as described here https://github.com/technige/py2neo/compare/v4...CMRI-ProCan:v4
5. Run py2neo script outlined above
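Since most of the errors above came down to mismatched versions, it may be worth checking what is actually installed in the environment before running the script. The following is only a sketch using the standard library (importlib.metadata, Python 3.8+). The helper name find_pin_mismatches is made up for illustration, and the exact version strings reported by your environment may differ from the pins in the summary (for example "4.0.0" rather than "4.0"), so adjust the pins accordingly.

```python
from importlib import metadata


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met.

    pins maps a distribution name to the exact version string expected.
    get_version is injectable so the check can be exercised without
    the packages being installed.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches
```

Calling find_pin_mismatches({"py2neo": "4.0.0", "neo4j-driver": "1.6.2"}) in the py2neo environment should return an empty dict if the pins are met. The pin values in this call are assumptions based on the summary above.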

cypher statement returns (no changes, no rows)

I've watched Nicole White's awesome YouTube video "Using LOAD CSV in the Real World" and decided to re-create the Neo4j data using the same method.
I've cloned her git repo on this subject and have been working through this example on the community version of Neo4j on my Mac.
I'm stepping through the load.cql file one command at a time, pasting each command into the command window.
Things are going pretty well: I've got a bunch of nodes created. To deal with null values for sub_products and sub_issues in the master file, I created two other CSV files, sub_issues.csv and sub_products.csv, as described in the video.
But when I try reading either of these files, I get "(no changes, no rows)".
Somehow I get the impression there is something wrong.
Below is the actual command sequence I used for the incremental read.
// Load.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///Volumes/microSD/neo4j-complaints/sub_issue.csv' AS line
WITH line
WHERE line.`Sub-issue` <> '' AND
line.`Sub-issue` IS NOT NULL
MATCH (complaint:Complaint { id: TOINT(line.`Complaint ID`) })
MATCH (complaint)-[:WITH]->(issue:Issue)
MERGE (subIssue:SubIssue { name: UPPER(line.`Sub-issue`) })
MERGE (subIssue)-[:IN_CATEGORY]->(issue)
CREATE (complaint)-[:WITH]->(subIssue)
;
Strip out some of the later statements and do a "RETURN identifier1, identifier2" etc. to see what the engine is doing.
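In the same spirit, the file side can be checked outside Neo4j first: if the header is not spelled exactly `Sub-issue`, or every value in that column is empty, the WHERE clause filters out every row before any MATCH runs. Here is a rough sketch with the standard csv module. The function name count_usable_rows is made up for illustration, and it only mirrors the WHERE filter, not the MATCH steps that follow it.

```python
import csv
import io


def count_usable_rows(csv_text, column="Sub-issue"):
    """Count rows that would pass `line.<column> <> '' AND IS NOT NULL`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    if column not in headers:
        # A misspelled or differently cased header makes every value NULL in Cypher.
        raise KeyError(f"column {column!r} not found; headers are {headers}")
    return sum(1 for row in reader if row[column] not in ("", None))
```

For a real file, pass in open(path, newline="").read(). If the count is greater than zero, the rows are probably being lost in one of the MATCH clauses instead, which is where adding RETURN after each step helps.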

Neo4j: Java API IndexHits<Node>.size() is 0

I'm trying to use the Java API for Neo4j but I seem to be stuck at IndexHits. If I query the DB with Cypher using
START n=node:types(type="Process") RETURN n;
I get all 2087 nodes of type "Process".
In my application I have the following lines
Index<Node> nodeIndex = db.index().forNodes("types");
IndexHits<Node> hits = nodeIndex.get("type", "Process");
System.out.println("Node index size: " + hits.size());
which leads my console to spit out a value of 0. Here, db is of course an instance of GraphDatabaseService.
I expected an object that included all 2087 nodes. What am I doing wrong?
The .size() question is just the prelude to my iterator
for(Node process : hits) { ... }
but that does not do much when hits.size() == 0. According to http://api.neo4j.org/1.9.2/org/neo4j/graphdb/index/IndexHits.html this should be possible, provided there is something in hits.
Thanks in advance for your help.
I figured it out. Man, I feel so embarrassed...
It so happens that I had set up the DB_PATH to my default data folder, whereas the default storage folder is the default data folder plus graph.db. When I tried to run the code from that corrected DB_PATH I got an error saying that a lock file was in place because the Neo4j server was running. After shutting it down it worked perfectly.
So, if you happen to see the following error, just stop the server and run the code again:
Caused by: org.neo4j.kernel.StoreLockException: Could not create lock file
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:40)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:491)
I found on several forums that you cannot run the Neo4j server and use the embedded Java API on the same store directory at the same time.

How to list all installed neo4j server plugins from web interface or shell?

Right now I'm using the neo4jrestclient python package to list extensions:
from neo4jrestclient.client import GraphDatabase
gdb = GraphDatabase("http://localhost:7474/db/data/")
ext = gdb.extensions
Is there a direct shell command I can use to do this? I also don't see anything on the web interface. I'm using 1.8.
Thanks!
Since this is the top answer in Google and the accepted answer is out of date for v3.0+, here's a new answer.
On this page, they show a number of new procedures. The one that lists all procedures in the database (including plugins) is "dbms.procedures()". I find it most useful to have the signature of the procedure as well as the name; the query for that is:
CALL dbms.procedures() YIELD name, signature RETURN name, signature
Run
curl -v http://localhost:7474/db/data/
from the command line. Extensions aren't available on the web interface.
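The same request can be scripted with only the Python standard library. This is a sketch, assuming the legacy REST service root at /db/data/ returns a JSON object with an "extensions" key (as it does on the 1.x servers discussed here). The function names are made up for illustration.

```python
import json
from urllib.request import Request, urlopen


def extension_names(service_root):
    """Return the sorted extension names from a parsed service-root document."""
    return sorted(service_root.get("extensions", {}))


def fetch_service_root(url="http://localhost:7474/db/data/"):
    """Fetch and parse the REST service root (needs a running server)."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

With a server running, print(extension_names(fetch_service_root())) lists the installed extensions. The parsing is kept separate from the fetch so it can be tried on a saved response as well.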
You have the extensions documentation for neo4j-rest-client:
>>> gdb.extensions
{u'GetAll': <Neo4j ExtensionModule: [u'get_all_nodes', u'getAllRelationships']>}
>>> gdb.extensions.GetAll
<Neo4j ExtensionModule: [u'get_all_nodes', u'getAllRelationships']>
>>> gdb.extensions.GetAll.getAllRelationships()[:]
[<Neo4j Relationship: http://localhost:7474/db/data/relationship/0>,
<Neo4j Relationship: http://localhost:7474/db/data/relationship/1>,
<Neo4j Relationship: http://localhost:7474/db/data/relationship/2>,
<Neo4j Relationship: http://localhost:7474/db/data/relationship/3>]
In newer versions, it's
SHOW PROCEDURES yield name, description, signature

Resources