UnsupportedOperationException when retrieving all nodes with GlobalGraphOperations in neo4j

I'm using Neo4j 1.8.2 and am trying to retrieve all nodes from a graph, but I am getting an UnsupportedOperationException:
GraphDatabaseService db = GraphDatabaseFactory.databaseFor("http://localhost:7474/db/data/");
Iterable<Node> nodes = GlobalGraphOperations.at(db).getAllNodes();
I found this method in the API documentation, so I can't understand what I'm doing wrong.

Where did you find it?
The REST graph database doesn't support this operation (or at least not when called through GlobalGraphOperations).
db.getAllNodes() is implemented as a remote Cypher query, which is what you should do as well:
new RestCypherQueryEngine(restGraphDb.getRestAPI()).query(....)
or
restGraphDB.query()
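For instance, a minimal sketch using the java-rest-binding classes mentioned above (assuming that library is on the classpath; the Cypher uses Neo4j 1.8's START syntax, since MATCH (n) only arrived in 2.0, and the URL is taken from the question):
import java.util.Collections;
import java.util.Map;

import org.neo4j.rest.graphdb.RestGraphDatabase;
import org.neo4j.rest.graphdb.query.RestCypherQueryEngine;
import org.neo4j.rest.graphdb.util.QueryResult;

public class AllNodesViaRest {
    public static void main(String[] args) {
        // Connect over REST instead of embedding the database
        RestGraphDatabase restGraphDb = new RestGraphDatabase("http://localhost:7474/db/data/");
        RestCypherQueryEngine engine = new RestCypherQueryEngine(restGraphDb.getRestAPI());

        // Run "all nodes" as a remote Cypher query
        QueryResult<Map<String, Object>> result =
                engine.query("START n=node(*) RETURN n", Collections.<String, Object>emptyMap());
        for (Map<String, Object> row : result) {
            System.out.println(row.get("n"));
        }
        restGraphDb.shutdown();
    }
}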

It seems you can't do this with a remote database. Check the source; it throws UnsupportedOperationException in many places. Maybe an embedded database is an option for you?
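For comparison, a minimal sketch of the same call against an embedded 1.8-era database, where GlobalGraphOperations is supported (the store path is an assumption):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.tooling.GlobalGraphOperations;

public class AllNodesEmbedded {
    public static void main(String[] args) {
        // Embedded store on local disk; the path is illustrative
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");

        // Works embedded; the REST wrapper throws UnsupportedOperationException here
        for (Node node : GlobalGraphOperations.at(db).getAllNodes()) {
            System.out.println(node);
        }
        db.shutdown();
    }
}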

DSE graph modify vertex properties

It's not obvious from the javadocs how to modify the properties of a Vertex after adding it to the graph.
I tried the TinkerPop way:
GraphTraversalSource g = DseGraph.traversal(dseSession);
g.V().toStream().forEach(vertex -> vertex.property("name", "Santosh"));
But I get an exception:
Exception in thread "main" java.lang.IllegalStateException: Property addition is not supported
at org.apache.tinkerpop.gremlin.structure.Element$Exceptions.propertyAdditionNotSupported(Element.java:133)
at org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex.property(DetachedVertex.java:91)
at com.trimble.tpaas.profilex.random.MainGraphConnectivity.lambda$testSchemaCreation$0(MainGraphConnectivity.java:41)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.forEachRemaining(Traversal.java:250)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at com.trimble.tpaas.profilex.random.MainGraphConnectivity.testSchemaCreation(MainGraphConnectivity.java:41)
at com.trimble.tpaas.profilex.random.MainGraphConnectivity.main(MainGraphConnectivity.java:23)
So the question: where can I look to understand how to modify an existing vertex property, using the DSE Java driver or otherwise?
When you connect to a DSE Graph with the DataStax Java Driver:
g = DseGraph.traversal(dseSession)
or the TinkerPop Driver for that matter:
graph = EmptyGraph.instance()
g = graph.traversal().withRemote('conf/remote-graph.properties')
the results you receive are disconnected from the database. In TinkerPop we call that state "detached". So vertices returned from g.V() are in a "detached" state and you can't directly interact with them as though they are backed by the database for storing their properties.
All database mutations should occur through the Traversal API (i.e. Gremlin). So, if you want to add a property to all vertices in your graph, you might do:
g.V().property('name','Santosh').iterate()
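For completeness, a sketch of the same mutation end to end with the DSE Java driver (the contact point and the has('name') filter are assumptions; adjust to your cluster and schema):
import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;
import com.datastax.dse.graph.api.DseGraph;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class UpdateVertexProperty {
    public static void main(String[] args) {
        // Cluster/session setup is illustrative; configure your own graph options
        DseCluster cluster = DseCluster.builder().addContactPoint("127.0.0.1").build();
        DseSession session = cluster.connect();

        GraphTraversalSource g = DseGraph.traversal(session);

        // Mutate through the traversal, not on the detached vertices it returns:
        // overwrite the "name" property on every vertex that already has one
        g.V().has("name").property("name", "Santosh").iterate();

        cluster.close();
    }
}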

Using Merge in BatchInserter?

I am using the BatchInserter to create some nodes and relationships; however, my nodes are unique, and I want to create multiple relationships between them.
I can do that easily in Cypher, and likewise with the Java Core API:
ResourceIterator<Node> existingNodes = graphDBService.findNodesByLabelAndProperty(
        DynamicLabel.label( "BaseProduct" ), "code",
        source.getBaseProduct().getCode() ).iterator();
if ( !existingNodes.hasNext() )
{
    // TODO: create the node
}
else
{
    // create relationship with the retrieved node
}
and in Cypher I can simply use MERGE.
Is there any way to do the same with the BatchInserter?
No, it is not possible with the BatchInserter, as those APIs are not available there.
That's why I usually keep in-memory maps with the information I need to look up.
See this blog post for a groovy script:
http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
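A minimal sketch of that in-memory-map pattern against the Neo4j 2.x BatchInserter API (the label and property key mirror the question; the store path, codes, and relationship type are made up):
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BatchMergeByCode {
    public static void main(String[] args) {
        BatchInserter inserter = BatchInserters.inserter("target/batch.db");
        // MERGE substitute: remember each node id under its unique "code"
        Map<String, Long> nodesByCode = new HashMap<String, Long>();
        try {
            long product = getOrCreate(inserter, nodesByCode, "p-100");
            long variant = getOrCreate(inserter, nodesByCode, "p-200");
            inserter.createRelationship(product, variant,
                    DynamicRelationshipType.withName("RELATED_TO"), null);
        } finally {
            inserter.shutdown();
        }
    }

    static long getOrCreate(BatchInserter inserter, Map<String, Long> seen, String code) {
        Long id = seen.get(code);
        if (id == null) {
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("code", code);
            id = inserter.createNode(props, DynamicLabel.label("BaseProduct"));
            seen.put(code, id);
        }
        return id;
    }
}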

Why does spring-data require START in a Cypher query?

I have User type in neo4j database with a 'registered' property that stores the timestamp (Long) when user joined the site. I want to find how many users have registered before a given date. I defined a query method on the Spring-data Graph repository interface:
#Query("MATCH user=node:User WHERE user.registered < {0} RETURN count(*)")
def countUsersBefore(registered: java.lang.Long): java.lang.Long
I see in the Neo4j manual a lot of queries that just start with MATCH, but Spring-data doesn't seem to like it and requires a START. In my case I don't have an obvious node from where I can start, since my query is not following any relationships, it's just a plain count-where combination.
How can I fix this query? Do I need an index on the 'registered' property?
If you want to use this syntax, you have to use Spring Data Neo4j 3.0-M01, which works with Neo4j 2.0.0-M06.
You also need that to be able to use labels.
But better to wait for the next milestone of SDN 3.0, which will work with Neo4j 2.0.0 final.
Update:
If you use the SDN types index:
START user=node:__types__(className="org.example.User")
WHERE user.registered < {0}
RETURN count(*)
or in a repository this derived method should work:
public interface UserRepository extends GraphRepository<User> {
    int countByRegisteredLessThan(int value);
}
Instead of MATCH user=node:User..., you want MATCH (user:User)...
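Putting the two answers together, a sketch of the repository with the 2.0 label syntax (assumes SDN 3.x and that User is the question's entity class, carrying the User label; the method name is illustrative):
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;

public interface UserRepository extends GraphRepository<User> {

    // No START clause needed once labels are available
    @Query("MATCH (user:User) WHERE user.registered < {0} RETURN count(*)")
    Long countUsersRegisteredBefore(Long registered);
}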

Neo4j indexes and legacy data

I have a legacy dataset (ENRON data represented as GraphML) that I would like to query. In a comment on a related question, @StefanArmbruster suggests that I use Cypher to query the database. My query use case is simple: given a message id (a property of the Message node), retrieve the node that has that id, and also retrieve the sender and recipient nodes of that message.
It seems that to do this in Cypher, I first have to create an index of the nodes. Is there a way to do this automatically when the data is loaded from the graphML file? (I had used Gremlin to load the data and create the database.)
I also have an external Lucene index of the data (I need it for other purposes). Does it make sense to have two indexes? I could, for example, index the Neo4J node ids into my external index, and then query the graph based on those ids. My concern is about the persistence of these ids. (By analogy, Lucene document ids should not be treated as persistent.)
So, should I:
Index the Neo4j graph internally to query on message ids using Cypher? (If so, what is the best way to do that: regenerate the database with some suitable incantation to get the index built? Build the index on the already-existing db?)
Store Neo4j node ids in my external Lucene index and retrieve nodes via these stored ids?
UPDATE
I have been trying to get auto-indexing to work with Gremlin and an embedded server, but with no luck. In the documentation it says
The underlying database is auto-indexed, see Section 14.12, “Automatic Indexing” so the script can return the imported node by index lookup.
But when I examine the graph after loading a new database, no indexes seem to exist.
The Neo4j documentation on auto indexing says that a fair amount of configuration is required. In addition to setting node_auto_indexing = true, you have to configure which properties get indexed:
To actually auto index something, you have to set which properties should get indexed. You do this by listing the property keys to index on. In the configuration file, use the node_keys_indexable and relationship_keys_indexable configuration keys. When using embedded mode, use the GraphDatabaseSettings.node_keys_indexable and GraphDatabaseSettings.relationship_keys_indexable configuration keys. In all cases, the value should be a comma separated list of property keys to index on.
So is Gremlin supposed to set the GraphDatabaseSettings parameters? I tried passing a map into the Neo4jGraph constructor like this:
Map<String,String> config = [
    'node_auto_indexing'  : 'true',
    'node_keys_indexable' : 'emailID'
]
Neo4jGraph g = new Neo4jGraph(graphDB, config);
g.loadGraphML("../databases/data.graphml");
but that had no apparent effect on index creation.
UPDATE 2
Rather than configuring the database through Gremlin, I used the examples given in the Neo4j documentation so that my database creation was like this (in Groovy):
protected Neo4jGraph getGraph(String graphDBName, String databaseName) {
    boolean populateDB = !new File(graphDBName).exists();
    if (populateDB)
        println "creating database";
    else
        println "opening database";
    GraphDatabaseService graphDB = new GraphDatabaseFactory().
            newEmbeddedDatabaseBuilder( graphDBName ).
            setConfig( GraphDatabaseSettings.node_keys_indexable, "emailID" ).
            setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
            setConfig( GraphDatabaseSettings.dump_configuration, "true" ).
            newGraphDatabase();
    Neo4jGraph g = new Neo4jGraph(graphDB);
    if (populateDB) {
        println "Populating graph"
        g.loadGraphML(databaseName);
    }
    return g;
}
and my retrieval was done like this:
ReadableIndex<Node> autoNodeIndex = graph.rawGraph.index()
        .getNodeAutoIndexer()
        .getAutoIndex();
def node = autoNodeIndex.get( "emailID", "<2614099.1075839927264.JavaMail.evans@thyme>" ).getSingle();
And this seemed to work. Note, however, that the getIndices() call on the Neo4jGraph object still returned an empty list. So the upshot is that I can exercise the Neo4j API correctly, but the Gremlin wrapper seems to be unable to reflect the indexing state. The expression g.idx('node_auto_index') (documented in Gremlin Methods) returns null.
The auto indexes are created lazily. That is, once you have enabled auto-indexing, the actual index is first created when you index your first property. Make sure you insert data before checking for the existence of the index; otherwise it might not show up.
For some auto-indexing code (using programmatic configuration), see e.g. https://github.com/neo4j-contrib/rabbithole/blob/master/src/test/java/org/neo4j/community/console/IndexTest.java (this works with Neo4j 1.8).
/peter
Have you tried the automatic index feature? It's basically the use case you're looking for; unfortunately, it needs to be enabled before you import the data. (Otherwise you have to remove and re-add the properties to reindex them, as sketched below.)
http://docs.neo4j.org/chunked/milestone/auto-indexing.html
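A sketch of that remove/re-add trick for data that was loaded before auto-indexing was enabled (1.8-era embedded API; the store path is an assumption, emailID is the key from the question):
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;
import org.neo4j.tooling.GlobalGraphOperations;

public class ReindexEmailIds {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder("data/enron.db")
                .setConfig(GraphDatabaseSettings.node_auto_indexing, "true")
                .setConfig(GraphDatabaseSettings.node_keys_indexable, "emailID")
                .newGraphDatabase();

        Transaction tx = db.beginTx();
        try {
            // Re-setting the property makes the auto-indexer pick it up
            for (Node node : GlobalGraphOperations.at(db).getAllNodes()) {
                if (node.hasProperty("emailID")) {
                    node.setProperty("emailID", node.getProperty("emailID"));
                }
            }
            tx.success();
        } finally {
            tx.finish();
        }
        db.shutdown();
    }
}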

Why does all data go away when restarting Neo4j?

I guess I don't understand this paradigm?
For a small single server or development environment... I hate having to load hundreds of thousands of records just to analyze them in a graph... am I missing the big picture here?
UPDATE (3/21/2012 10:38a):
My current setup:
Default Install
Default Configs
Server Setup
Creating nodes via REST API
How do you instantiate your database: embedded or server? Are you running ImpermanentGraphDatabase? That's the in-memory test database. If you use the normal EmbeddedGraphDatabase, your graph is persisted transactionally along the way as you insert your data.
Please give a little more information.
If you're using embedded Java, transactions must be completed when saving objects, or the changes might get lost. In earlier versions this was done by calling finally { tx.finish(); }; in later versions (2.1+) it happens automatically when the Transaction is opened inside a try-with-resources block. (You can run into problems if the Transaction tx is instantiated outside the try clause.) You also need to mark the transaction as successful, or it will roll back on close:
GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
try (Transaction tx = graphDb.beginTx()) {
    // create some nodes here
    tx.success(); // mark the transaction for commit; without this it rolls back
}
