Neo4j check create result best practices

I am using Neo4jClient for C#, and I am trying to create some unique nodes.
I've already created an index and a unique constraint in the database, so I am sure duplication is not possible, but I want to detect when the creation of a node fails because of a unique constraint violation.
I am new to Neo4j, but I see that common examples follow the (bad) practice of using ExecuteWithoutResults for this kind of request, so there is no feedback about the execution of the creation, and I also see that no exception is generated if the creation fails.
What is the best practice for getting a result from a node creation command?
Here is a piece of code showing how I create a node:
await client.Cypher
    .Merge("(u:User { UserId: {userId} })")
    .OnCreate()
    .Set("u = {user}")
    .WithParams(new
    {
        userId = user.UserId,
        user
    })
    .ExecuteWithoutResultsAsync();

If the node creation fails, the exception is thrown back to the client.
However, your code uses MERGE, so that wouldn't happen anyway. Using Neo4jClient, there is no way to get the "x nodes created" feedback.
You could change your code to .Return(u => u.As<User>()) and check the results, but that's about the only way I can think of.
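For example, a minimal sketch of that approach, reusing the client and user variables from the question (note that MERGE either matches an existing node or creates a new one, so getting a row back only tells you the node now exists, not that this particular call created it):

var results = await client.Cypher
    .Merge("(u:User { UserId: {userId} })")
    .OnCreate()
    .Set("u = {user}")
    .WithParams(new
    {
        userId = user.UserId,
        user
    })
    .Return(u => u.As<User>())
    .ResultsAsync;

// Exactly one row comes back if the merge succeeded.
var mergedUser = results.Single();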

Related

Using Merge in BatchInserter?

I am using the BatchInserter to create some nodes and relationships; however, my nodes are unique and I want to create multiple relationships between them.
I can easily do that with Cypher, and equally with the Java Core API, like this:
ResourceIterator<Node> existedNodes = graphDBService
        .findNodesByLabelAndProperty( DynamicLabel.label( "BaseProduct" ), "code", source.getBaseProduct().getCode() )
        .iterator();
if ( !existedNodes.hasNext() )
{
    // TODO
}
else {
    // create relationship with the retrieved node
}
In Cypher I can easily use MERGE.
Is there any possible way to do the same with the BatchInserter?
No, it is not possible with the batch inserter, as those APIs are not available there.
That's why I usually keep in-memory maps with the information I need to look up.
See this blog post for a Groovy script:
http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/

Neo4jClient: doubts about CRUD API

My persistency layer essentially uses Neo4jClient to access a Neo4j 1.9.4 database. More specifically, to create nodes I use IGraphClient#Create() in Neo4jClient's CRUD API and to query the graph I use Neo4jClient's Cypher support.
All was well until a friend of mine pointed out that for every query, I essentially did two HTTP requests:
one request to get a node reference from a legacy index by the node's unique ID (not its node ID! but a unique ID generated by SnowMaker)
one Cypher query that started from this node reference that does the actual work.
For read operations, I did the obvious thing and moved the index lookup into my Start() call, i.e.:
GraphClient.Cypher
    .Start(new { user = Node.ByIndexLookup("User", "Id", userId) })
    // ... the rest of the query ...
For create operations, on the other hand, I don't think this is actually possible. What I mean is: the Create() method takes a POCO, a couple of relationship instances and a couple of index entries in order to create a node, its relationships and its index entries in one transaction/HTTP request. The problem is the node references that you pass to the relationship instances: where do they come from? From previous HTTP requests, right?
My questions:
Can I use the CRUD API to look up node A by its ID, create node B from a POCO, create a relationship between A and B and add B's ID to a legacy index in one request?
If not, what is the alternative? Is the CRUD API considered legacy code and should we move towards a Cypher-based Neo4j 2.0 approach?
Does this Cypher-based approach mean that we lose POCO-to-node translation for create operations? That was very convenient.
Also, could Neo4jClient's documentation be updated? It is, frankly, quite poor. I do realize that Readify also offers commercial support, so that might explain things.
Thanks!
I'm the author of Neo4jClient. (The guy who gives his software away for free.)
Q1a:
"Can I use the CRUD API to look up node A by its ID, create node B from a POCO, create a relationship between A and B"
Cypher is the way of not just the future, but also the 'now'.
Start with the Cypher (lots of resources for that):
START user=node:user(Id = 1234)
CREATE user-[:INVITED]->(user2 { Id: 4567, Name: "Jim" })
RETURN user2
Then convert it to C#:
var results = graphClient.Cypher
    .Start(new { user = Node.ByIndexLookup("User", "Id", userId) })
    .Create("user-[:INVITED]->(user2 {newUser})")
    .WithParam("newUser", new User { Id = 4567, Name = "Jim" })
    .Return(user2 => user2.Node<User>())
    .Results;
There are lots more similar examples here: https://github.com/Readify/Neo4jClient/wiki/cypher-examples
Q1b:
" and add B's ID to a legacy index in one request?"
No, legacy indexes are not supported in Cypher. If you really want to keep using them, then you should stick with the CRUD API. That's ok: if you want to use legacy indexes, use the legacy API.
Q2.
"If not, what is the alternative? Is the CRUD API considered legacy code and should we move towards a Cypher-based Neo4j 2.0 approach?"
That's exactly what you want to do. Cypher, with labels and automated indexes:
// One time op to create the index
// Yes, this syntax is a bit clunky in C# for now
graphClient.Cypher
    .Create("INDEX ON :User(Id)")
    .ExecuteWithoutResults();

// Find an existing user, create a new one, relate them,
// and index them, all in a single HTTP call
graphClient.Cypher
    .Match("(user:User)")
    .Where((User user) => user.Id == userId)
    .Create("user-[:INVITED]->(user2 {newUser})")
    .WithParam("newUser", new User { Id = 4567, Name = "Jim" })
    .ExecuteWithoutResults();
More examples here: https://github.com/Readify/Neo4jClient/wiki/cypher-examples
Q3.
"Does this Cypher-based approach mean that we lose POCO-to-node translation for create operations? That was very convenient."
Correct. But that's the direction we collectively want to go in: it's where Neo4j is going, and where Neo4jClient is going too.
Think about SQL for a second (something that I assume you are familiar with). Do you run a query to find the internal identifier of a row, including its file offset on disk, then use this internal identifier in a second query to manipulate it? No. You run a single query that does all of that in one hit.
Now, a common use case for why people like passing around Node<T> or NodeReference instances is to reduce repetition in queries. This is a legitimate concern; however, because the fluent queries in .NET are immutable, we can just construct a base query:
public ICypherFluentQuery FindUserById(long userId)
{
    return graphClient.Cypher
        .Match("(user:User)")
        .Where((User user) => user.Id == userId);
    // Nothing has been executed here: we've just built a query object
}
Then use it like so:
public void DeleteUser(long userId)
{
    FindUserById(userId)
        .Delete("user")
        .ExecuteWithoutResults();
}
Or, add even more Cypher logic to delete all the relationships too:
public void DeleteUser(long userId)
{
    FindUserById(userId)
        .Match("user-[rel?]-()") // optionally match any relationships (Cypher 1.9 syntax)
        .Delete("rel, user")
        .ExecuteWithoutResults();
}
This way, you can effectively reuse references, but without ever having to pull them back across the wire in the first place.
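The same trick works for reads. For instance, a small sketch (not from the original answer) that reuses the FindUserById helper above to fetch the user back:

public User GetUser(long userId)
{
    // Reuse the base query and only add the RETURN clause.
    return FindUserById(userId)
        .Return(user => user.As<User>())
        .Results
        .SingleOrDefault();
}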

Update relationship / payload with the Neo4jClient

I am new to Neo4j and Neo4jClient. I am trying to update an existing relationship. Here is how I created the relationship.
var item2RefAddedBefore = _graphClient.CreateRelationship(
    (NodeReference<Item>)item2Ref,
    new AddedBefore(item1Ref, new Payload() { Frequency = 1 }));
For this particular use case, I would like to update the Payload whenever the Nodes and relationship already exist. I am using Cypher mostly with the Neo4jClient.
Appreciate any help!
Use this IGraphClient signature:
void Update<TRelationshipData>(RelationshipReference<TRelationshipData> relationshipReference, Action<TRelationshipData> updateCallback)
where TRelationshipData : class, new();
Like this:
graphClient.Update(
    (RelationshipReference<Payload>)item2RefAddedBefore,
    p => { p.Frequency = 2; });
Update: The syntax is a little awkward right now, where CreateRelationship only returns a RelationshipReference instead of a RelationshipReference<TData> but Update requires the latter, so you need to explicitly cast it. To be honest, we probably won't fix this any time soon as all of the investment for both Neo4j and Neo4jClient is going towards doing mutations via Cypher instead.
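As a rough sketch of the Cypher route for this same update (the ADDED_BEFORE type name is an assumption here, it depends on what your AddedBefore class maps to; item1Ref and item2Ref are the node references from the question):

graphClient.Cypher
    .Start(new { item1 = item1Ref, item2 = item2Ref })
    .Match("item2-[r:ADDED_BEFORE]->item1") // ADDED_BEFORE is assumed; use your actual relationship type
    .Set("r.Frequency = {frequency}")
    .WithParam("frequency", 2)
    .ExecuteWithoutResults();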

Neo4j indexes and legacy data

I have a legacy dataset (ENRON data represented as GraphML) that I would like to query. In a comment on a related question, @StefanArmbruster suggests that I use Cypher to query the database. My query use case is simple: given a message id (a property of the Message node), retrieve the node that has that id, and also retrieve the sender and recipient nodes of that message.
It seems that to do this in Cypher, I first have to create an index of the nodes. Is there a way to do this automatically when the data is loaded from the graphML file? (I had used Gremlin to load the data and create the database.)
I also have an external Lucene index of the data (I need it for other purposes). Does it make sense to have two indexes? I could, for example, index the Neo4j node ids into my external index, and then query the graph based on those ids. My concern is about the persistence of these ids. (By analogy, Lucene document ids should not be treated as persistent.)
So, should I:
Index the Neo4j graph internally to query on message ids using Cypher? (If so, what is the best way to do that: regenerate the database with some suitable incantation to get the index built? Build the index on the already-existing db?)
Store Neo4j node ids in my external Lucene index and retrieve nodes via these stored ids?
UPDATE
I have been trying to get auto-indexing to work with Gremlin and an embedded server, but with no luck. In the documentation it says
The underlying database is auto-indexed, see Section 14.12, “Automatic Indexing” so the script can return the imported node by index lookup.
But when I examine the graph after loading a new database, no indexes seem to exist.
The Neo4j documentation on auto indexing says that a bunch of configuration is required. In addition to setting node_auto_indexing = true, you have to configure it:
To actually auto index something, you have to set which properties should get indexed. You do this by listing the property keys to index on. In the configuration file, use the node_keys_indexable and relationship_keys_indexable configuration keys. When using embedded mode, use the GraphDatabaseSettings.node_keys_indexable and GraphDatabaseSettings.relationship_keys_indexable configuration keys. In all cases, the value should be a comma separated list of property keys to index on.
So is Gremlin supposed to set the GraphDatabaseSettings parameters? I tried passing in a map into the Neo4jGraph constructor like this:
Map<String,String> config = [
    'node_auto_indexing': 'true',
    'node_keys_indexable': 'emailID'
]
Neo4jGraph g = new Neo4jGraph(graphDB, config);
g.loadGraphML("../databases/data.graphml");
but that had no apparent effect on index creation.
UPDATE 2
Rather than configuring the database through Gremlin, I used the examples given in the Neo4j documentation so that my database creation was like this (in Groovy):
protected Neo4jGraph getGraph(String graphDBName, String databaseName) {
    boolean populateDB = !new File(graphDBName).exists();
    if (populateDB)
        println "creating database";
    else
        println "opening database";
    GraphDatabaseService graphDB = new GraphDatabaseFactory().
        newEmbeddedDatabaseBuilder( graphDBName ).
        setConfig( GraphDatabaseSettings.node_keys_indexable, "emailID" ).
        setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
        setConfig( GraphDatabaseSettings.dump_configuration, "true" ).
        newGraphDatabase();
    Neo4jGraph g = new Neo4jGraph(graphDB);
    if (populateDB) {
        println "Populating graph"
        g.loadGraphML(databaseName);
    }
    return g;
}
and my retrieval was done like this:
ReadableIndex<Node> autoNodeIndex = graph.rawGraph.index()
    .getNodeAutoIndexer()
    .getAutoIndex();
def node = autoNodeIndex.get( "emailID", "<2614099.1075839927264.JavaMail.evans@thyme>" ).getSingle();
And this seemed to work. Note, however, that the getIndices() call on the Neo4jGraph object still returned an empty list. So the upshot is that I can exercise the Neo4j API correctly, but the Gremlin wrapper seems to be unable to reflect the indexing state. The expression g.idx('node_auto_index') (documented in Gremlin Methods) returns null.
The auto indexes are created lazily. That is, once you have enabled auto-indexing, the actual index is first created when you index your first property. Make sure you are inserting data before checking for the existence of the index, otherwise it might not show up.
For some auto-indexing code (using programmatic configuration), see e.g. https://github.com/neo4j-contrib/rabbithole/blob/master/src/test/java/org/neo4j/community/console/IndexTest.java (this works with Neo4j 1.8).
/peter
Have you tried the automatic index feature? It's basically the use case you're looking for; unfortunately, it needs to be enabled before you import the data. (Otherwise you have to remove and re-add the properties to reindex them.)
http://docs.neo4j.org/chunked/milestone/auto-indexing.html
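For reference, when running the Neo4j server rather than an embedded database, the equivalent settings from the documentation quoted above would go into conf/neo4j.properties, roughly like this (emailID is the property key from the question):

node_auto_indexing=true
node_keys_indexable=emailID

As noted above, the data still has to be (re)imported after enabling this for the properties to actually end up in the index.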

troubleshooting a NullPointerException in grails

preface note: I'm just starting to learn Grails, so I'm sure there are many other problems and room for optimization.
I've got two domains, a parent (Collection) and child (Event), in a one-to-many mapping. I'm trying to code an integration test for the deletion of children. Prior to the code in question, I've successfully created a parent and three children. The point where I'm having problems is getting a single child in preparation to delete it. The first line of my sample code is only there because of my rudimentary attempt to troubleshoot.
// lines 95-100 of my EventIntegrationTests.groovy file
// delete a single event
assertEquals("2nd Event", event2.title) // passes
def foundEvent = Event.get(event2.id) // no apparent problems
assertEquals("2nd Event", foundEvent.title) // FAILS (line #98)
foundEvent.delete()
assertFalse Event.exists(foundEvent.id)
The error message I'm getting is:
Cannot get property 'title' on null object
java.lang.NullPointerException: Cannot get property 'title' on null object
at edu.learninggrails.EventIntegrationTests.testEventsDelete(EventIntegrationTests.groovy:98)
What should my next troubleshooting steps be? (Since the first assertEquals passes, event2 is clearly not null, so at this point I have no idea how to troubleshoot the failure of the second assertEquals.)
This is not evident from the code: did you persist event2 by calling save()? Get will try to retrieve it from the persistent storage (the in-memory database for example) and if the event wasn't saved, the retrieved instance will be null.
If you did save it, did the save go through OK? Calling event.save() will return false if there was something wrong while saving the item (like a validation error). Lastly, you might try calling event.save(flush:true) in case the Hibernate session doesn't handle this case as you might expect (I'm not entirely sure about this one, but it can't hurt to try).
Try printing or inspecting event2.id on line 97 and check that you actually have an id; if so, check whether you actually get an Event object back on line 97.
I don't think you saved the parent and its children successfully. After you save, you should make sure in your test that every object that was persisted has a non-null id.
What you are seeing is that you created event2 with a title but didn't save it. It passes the first assertion because you created it. When you do the get, null is returned because your save failed.
In general, for DAO integration tests I do the following:
Setup: create all objects I'll use in the test.
Save: assert that all ids on the saved objects are NOT null.
Clear the Hibernate session. This is important because otherwise objects can still be in the session from the previous operations. In your real-world scenario you will probably start with a find, i.e. an empty session; if not, adjust this step so that the session at the start of the actual testing part matches the session your code will see in the wild.
Load the objects on which you want to operate and do what you need to do.
