Neo4jClient: doubts about CRUD API

My persistence layer essentially uses Neo4jClient to access a Neo4j 1.9.4 database. More specifically, to create nodes I use IGraphClient#Create() in Neo4jClient's CRUD API and to query the graph I use Neo4jClient's Cypher support.
All was well until a friend of mine pointed out that for every query, I essentially did two HTTP requests:
one request to get a node reference from a legacy index by the node's unique ID (not its node ID! but a unique ID generated by SnowMaker)
one Cypher query that started from this node reference that does the actual work.
For read operations, I did the obvious thing and moved the index lookup into my Start() call, i.e.:
GraphClient.Cypher
.Start(new { user = Node.ByIndexLookup("User", "Id", userId) })
// ... the rest of the query ...
For create operations, on the other hand, I don't think this is actually possible. What I mean is: the Create() method takes a POCO, a couple of relationship instances and a couple of index entries in order to create a node, its relationships and its index entries in one transaction/HTTP request. The problem is the node references that you pass to the relationship instances: where do they come from? From previous HTTP requests, right?
My questions:
Can I use the CRUD API to look up node A by its ID, create node B from a POCO, create a relationship between A and B and add B's ID to a legacy index in one request?
If not, what is the alternative? Is the CRUD API considered legacy code and should we move towards a Cypher-based Neo4j 2.0 approach?
Does this Cypher-based approach mean that we lose POCO-to-node translation for create operations? That was very convenient.
Also, can Neo4jClient's documentation be updated? It is, frankly, quite poor. I do realize that Readify also offers commercial support, so that might explain things.
Thanks!

I'm the author of Neo4jClient. (The guy who gives his software away for free.)
Q1a:
"Can I use the CRUD API to look up node A by its ID, create node B from a POCO, create a relationship between A and B"
Cypher is the way of not just the future, but also the 'now'.
Start with the Cypher (lots of resources for that):
START user=node:User(Id = "1234")
CREATE user-[:INVITED]->(user2 { Id: 4567, Name: "Jim" })
RETURN user2
Then convert it to C#:
graphClient.Cypher
.Start(new { user = Node.ByIndexLookup("User", "Id", userId) })
.Create("user-[:INVITED]->(user2 {newUser})")
.WithParam("newUser", new User { Id = 4567, Name = "Jim" })
.Return(user2 => user2.Node<User>())
.Results;
There are lots more similar examples here: https://github.com/Readify/Neo4jClient/wiki/cypher-examples
Q1b:
" and add B's ID to a legacy index in one request?"
No, legacy indexes are not supported in Cypher. If you really want to keep using them, then you should stick with the CRUD API. That's ok: if you want to use legacy indexes, use the legacy API.
Q2.
"If not, what is the alternative? Is the CRUD API considered legacy code and should we move towards a Cypher-based Neo4j 2.0 approach?"
That's exactly what you want to do. Cypher, with labels and automated indexes:
// One time op to create the index
// Yes, this syntax is a bit clunky in C# for now
graphClient.Cypher
.Create("INDEX ON :User(Id)")
.ExecuteWithoutResults();
// Find an existing user, create a new one, relate them,
// and index them, all in a single HTTP call
graphClient.Cypher
.Match("(user:User)")
.Where((User user) => user.Id == userId)
// The :User label is what puts the new node into the :User(Id) index
.Create("user-[:INVITED]->(user2:User {newUser})")
.WithParam("newUser", new User { Id = 4567, Name = "Jim" })
.ExecuteWithoutResults();
More examples here: https://github.com/Readify/Neo4jClient/wiki/cypher-examples
Q3.
"Does this Cypher-based approach mean that we lose POCO-to-node translation for create operations? That was very convenient."
Correct. But that's what we collectively all want to do, where Neo4j is going, and where Neo4jClient is going too.
Think about SQL for a second (something that I assume you are familiar with). Do you run a query to find the internal identifier of a node, including its file offset on disk, then use this internal identifier in a second query to manipulate it? No. You run a single query that does all that in one hit.
Now, a common reason people like passing around Node<T> or NodeReference instances is to reduce repetition in queries. This is a legitimate concern. However, because the fluent queries in .NET are immutable, we can just construct a base query:
public ICypherFluentQuery FindUserById(long userId)
{
    // Nothing has been executed here: we've just built a query object
    return graphClient.Cypher
        .Match("(user:User)")
        .Where((User user) => user.Id == userId);
}
Then use it like so:
public void DeleteUser(long userId)
{
    FindUserById(userId)
        .Delete("user")
        .ExecuteWithoutResults();
}
Or, add even more Cypher logic to delete all the relationships too:
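public void DeleteUser(long userId)
{
    FindUserById(userId)
        // Bind any attached relationships to "rel"; OPTIONAL MATCH keeps
        // users that have no relationships at all
        .OptionalMatch("(user)-[rel]-()")
        .Delete("rel, user")
        .ExecuteWithoutResults();
}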
This way, you can effectively reuse references, but without ever having to pull them back across the wire in the first place.
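To see why immutability makes this kind of reuse safe, here is a toy model of a fluent query. It is only a sketch under stated assumptions: ToyQuery and all of its members are made up for illustration and are not how Neo4jClient is implemented, and the id is inlined into the query text purely to keep the toy short (the real client sends values as parameters).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for an immutable fluent query. Hypothetical type,
// NOT Neo4jClient's real implementation.
public sealed class ToyQuery
{
    private readonly IReadOnlyList<string> clauses;

    private ToyQuery(IReadOnlyList<string> clauses)
    {
        this.clauses = clauses;
    }

    public static ToyQuery Empty { get; } = new ToyQuery(new string[0]);

    // Every call returns a new object; the receiver is never modified.
    private ToyQuery Append(string clause)
    {
        return new ToyQuery(clauses.Concat(new[] { clause }).ToList());
    }

    public ToyQuery Match(string pattern) { return Append("MATCH " + pattern); }
    public ToyQuery Where(string predicate) { return Append("WHERE " + predicate); }
    public ToyQuery Delete(string identities) { return Append("DELETE " + identities); }
    public ToyQuery Return(string identity) { return Append("RETURN " + identity); }

    public string Text
    {
        get { return string.Join(" ", clauses); }
    }
}

public static class Program
{
    // Shared base query, analogous to FindUserById above.
    // Inlining the id is a shortcut for this toy only.
    public static ToyQuery FindUserById(long userId)
    {
        return ToyQuery.Empty.Match("(user:User)").Where("user.Id = " + userId);
    }

    public static void Main()
    {
        ToyQuery baseQuery = FindUserById(1234);
        ToyQuery delete = baseQuery.Delete("user");
        ToyQuery read = baseQuery.Return("user");

        Console.WriteLine(baseQuery.Text); // MATCH (user:User) WHERE user.Id = 1234
        Console.WriteLine(delete.Text);    // MATCH (user:User) WHERE user.Id = 1234 DELETE user
        Console.WriteLine(read.Text);      // MATCH (user:User) WHERE user.Id = 1234 RETURN user
    }
}
```

Because each step returns a new object, the delete and read queries both branch off the same base without affecting each other, and nothing touches the database until a query is executed.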

Related

Neo4j check create result best practices

I am using the Neo4j client for C# (Neo4jClient), and I am trying to create some unique nodes.
I've already created an index and a unique constraint in the database, so I am sure that duplication is not possible, but I want to detect when creating a node fails because of a unique constraint violation.
I am new to Neo4j, but I see that common examples follow the (bad) practice of using ExecuteWithoutResults for this kind of request, so there is no feedback about the creation, and I also see that no exception is generated if the creation fails.
What is the best practice for getting a result from a node creation command?
Here is a piece of code showing how I create a node:
await client.Cypher
.Merge("(u:User { UserId: {userId}})")
.OnCreate()
.Set("u = {user}")
.WithParams(new
{
userId = user.UserId,
user
})
.ExecuteWithoutResultsAsync();
If creating the node fails, the exception is thrown to the client.
However, your code uses MERGE, so that wouldn't happen anyway: MERGE matches an existing node with that UserId rather than creating a second one. With Neo4jClient there is no way to get the "X nodes created" feedback.
You could change your code to .Return(u => u.As<User>()) and check the results, but that's about the only way I can think of.

Cypher 1.9.9, START by both relationship and node index

My Neo4j 1.9.9 entities are stored using Spring Data Neo4j. However, because many of the queries derived from repository methods are wrong, I've been forced to use Cypher directly.
Basically, I have two classes:
@NodeEntity
public class RecommenderMashup {
    @Indexed(indexType = IndexType.SIMPLE, indexName = "recommenderMashupIds")
    private String mashupId;
}
@RelationshipEntity(type = "MASHUP_TO_MASHUP_SIMILARITY")
public class MashupToMashupSimilarity {
    @StartNode
    private RecommenderMashup mashupFrom;
    @EndNode
    private RecommenderMashup mashupTo;
}
In addition to the indexes directly provided, as you know, Spring Data Neo4j adds two other indexes: __types__ for nodes and __rel_types__ for relationship; both of them have className as their key.
So, I've tried the query below to get all the MashupToMashupSimilarity objects related to a specific node
START `mashupFrom`=node:`recommenderMashupIds`(`mashupId`='5367575248633856'),
`mashupTo`=node:__types__(className="package.RecommenderMashup"),
`mashupToMashupSimilarity`=rel:__rel_types__(className="package.MashupToMashupSimilarity")
MATCH `mashupFrom`-[:`mashupToMashupSimilarity`]->`mashupTo`
RETURN `mashupToMashupSimilarity`;
However, I always got empty results. I suspect that this is due to the fact that the START clause contains both nodes and relationships. Is this possible? Otherwise, what could be the problem here?
Additional info
The suspicion comes from the fact that
START `mashupToMashupSimilarity`=rel:__rel_types__(className='package.MashupToMashupSimilarity')
RETURN `mashupToMashupSimilarity`;
and
START `mashup`=node:__types__(className="package.RecommenderMashup")
RETURN `mashup`;
and other similar queries always return the right results.
The only working alternative at this point is
START `mashupFrom`=node:`recommenderMashupIds`(`mashupId`='6006582764634112'),
`mashupTo`=node:__types__(className="package.RecommenderMashup")
MATCH `mashupFrom`-[`similarity`:MASHUP_TO_MASHUP_SIMILARITY]->`mashupTo`
RETURN `similarity`;
but I don't know how it performs (the indexes should be faster). Also, I'm curious what I've been doing wrong.
Did you try to run your queries in the neo4j-browser or shell? Did they work there?
This query is also wrong,
START `mashupFrom`=node:`recommenderMashupIds`(`mashupId`='5367575248633856'),
`mashupTo`=node:__types__(className="package.RecommenderMashup"),
`mashupToMashupSimilarity`=rel:__rel_types__(className="package.MashupToMashupSimilarity")
MATCH `mashupFrom`-[:`mashupToMashupSimilarity`]->`mashupTo`
RETURN `mashupToMashupSimilarity`;
you use mashupToMashupSimilarity as identifier for the relationship,
but then you use it wrongly as relationship-type:
-[:mashupToMashupSimilarity]->
it should be: -[mashupToMashupSimilarity]->
but of course better, skip the rel-index check and use -[similarity:MASHUP_TO_MASHUP_SIMILARITY]->
And you can just leave off the relationship-index lookup, which doesn't make sense at all, as you should already filter with the relationship-type.
Update: Don't use index lookups for type check
START mashupFrom=node:recommenderMashupIds(mashupId='5367575248633856')
MATCH (mashupFrom)-[mashupToMashupSimilarity:MASHUP_TO_MASHUP_SIMILARITY]->(mashupTo)
WHERE mashupTo.__type__ = 'package.RecommenderMashup'
RETURN mashupToMashupSimilarity;
As the relationship-type is already restricting, I think you don't even need the type-check on the target node.

Update relationship / payload with the Neo4jClient

I am new to Neo4j and Neo4jClient. I am trying to update an existing relationship. Here is how I created the relationship.
var item2RefAddedBefore = _graphClient.CreateRelationship((NodeReference<Item>)item2Ref,
new AddedBefore(item1Ref, new Payload() { Frequency = 1 }));
For this particular use case, I would like to update the Payload whenever the Nodes and relationship already exist. I am using Cypher mostly with the Neo4jClient.
Appreciate any help!
Use this IGraphClient signature:
void Update<TRelationshipData>(RelationshipReference<TRelationshipData> relationshipReference, Action<TRelationshipData> updateCallback)
where TRelationshipData : class, new();
Like this:
graphClient.Update(
(RelationshipReference<Payload>)item2RefAddedBefore,
p => { p.Foo = "Bar"; });
Update: The syntax is a little awkward right now, where CreateRelationship only returns a RelationshipReference instead of a RelationshipReference<TData> but Update requires the latter, so you need to explicitly cast it. To be honest, we probably won't fix this any time soon as all of the investment for both Neo4j and Neo4jClient is going towards doing mutations via Cypher instead.

Performance of repository pattern and IQueryable<T>

I have no idea if I'm doing this right, but this is how a Get method in my repository looks:
public IQueryable<User> GetUsers(IEnumerable<Expression<Func<User, object>>> eagerLoading)
{
    IQueryable<User> query = db.Users.AsNoTracking();
    if (eagerLoading != null)
    {
        foreach (var expression in eagerLoading)
        {
            query = query.Include(expression);
        }
    }
    return query;
}
Let's say I also have a GeographyRepository that has a GetCountries method, which is similar to this.
I have 2 separate service layer classes calling these 2 separate repositories, sharing the same DbContext (EF 4.1 code-first).
So in my controller, I'd do:
myViewModel.User = userService.GetUserById(1);
myViewModel.Countries = geoService.GetCountries();
These are two separate calls to the database. If I didn't use these patterns and instead tied the UI directly to the database, I could do this in one call. I guess it's a trade-off between performance and maintainability.
My question is: can this be pushed into one database call? Can we merge queries like this when a view calls multiple repositories?
I'd say that if performance is the real issue then I'd try and avoid going back to the database altogether. I'm assuming the list returned from geoService.GetCountries() is fairly static, so I'd be inclined to cache it in the service after the initial load and remove the database hit altogether. The fact that you have a service there suggests that it would be the perfect place to abstract away such details.
Generally when asking questions about performance, it's rare that all perf related issues can be tarred with the same brush and you need to analyse each situation and work out an appropriate solution for the specific perf issue you're having.
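A minimal sketch of caching the country list in the service, assuming it really is static for the lifetime of the application. CachedGeoService and its loader delegate are hypothetical names; the delegate stands in for whatever repository call actually hits the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical service: loads the country list once and then serves it from memory.
public sealed class CachedGeoService
{
    private readonly Lazy<IReadOnlyList<string>> countries;

    public CachedGeoService(Func<IReadOnlyList<string>> loadCountries)
    {
        // Lazy<T> runs the loader at most once and is thread-safe by default.
        countries = new Lazy<IReadOnlyList<string>>(loadCountries);
    }

    public IReadOnlyList<string> GetCountries()
    {
        return countries.Value;
    }
}

public static class Program
{
    public static void Main()
    {
        int databaseHits = 0;

        // The lambda is a stand-in for the real repository call.
        var service = new CachedGeoService(() =>
        {
            databaseHits++;
            return new[] { "Australia", "Belgium" };
        });

        service.GetCountries();
        service.GetCountries();

        Console.WriteLine(databaseHits); // 1
    }
}
```

This version never refreshes the list, so it only fits data that does not change while the application is running; anything more volatile would need some form of expiry.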

Repository Interface - Available Functions & Filtering Output

I've got a repository using LINQ for modelling the data that has a whole bunch of functions for getting data out. A very common way of getting data out is for things such as drop down lists. These drop down lists can vary. If we're creating something we usually have a drop down list with all entries of a certain type, which means I need a function available which filters by the type of entity. We also have pages to filter data, the drop down lists only contain entries that currently are used, so I need a filter that requires used entries. This means there are six different queries to get the same type of data out.
The problem with defining a function for each of these is that there'd be six functions at least for every type of output, all in one repository. It gets very large, very quick. Here's something like I was planning to do:
public IEnumerable<Supplier> ListSuppliers(bool areInUse, bool includeAllOption, int contractTypeID)
{
if (areInUse && includeAllOption)
{
}
else if (areInUse)
{
}
else if (includeAllOption)
{
}
}
Although "areInUse" doesn't seem very English friendly, I'm not brilliant with naming. As you can see, logic resides in my data access layer (repository) which isn't friendly. I could define separate functions but as I say, it grows quite quick.
Could anyone recommend a good solution?
NOTE: I use LINQ for entities only, I don't use it to query. Please don't ask, it's a constraint on the system not specified by me. If I had the choice, I'd use LINQ, but I don't unfortunately.
Have your method take an Expression<Func<Supplier, bool>> that can be used in the Where clause, so that you can pass in any type of filter you would like to construct. You can use a PredicateBuilder to construct arbitrarily complex predicates based on boolean operations. The commonly used PredicateBuilder works on expression trees, which is why the parameter is an Expression<Func<Supplier, bool>> rather than a plain Func<Supplier, bool>.
public IEnumerable<Supplier> ListSuppliers(Expression<Func<Supplier, bool>> filter)
{
    return this.DataContext.Suppliers.Where(filter);
}
// Build a filter and pass it in
var filter = PredicateBuilder.False<Supplier>();
filter = filter.Or(s => s.IsInUse).Or(s => s.ContractTypeID == 3);
var suppliers = repository.ListSuppliers(filter);
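The composition idea can be sketched in memory with plain delegates. This is not the PredicateBuilder library itself: DelegatePredicates, the Supplier class and the sample data are all hypothetical, and because it uses delegates rather than expression trees nothing here would be translated to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with just the fields the filters need.
public sealed class Supplier
{
    public string Name { get; set; }
    public bool IsInUse { get; set; }
    public int ContractTypeID { get; set; }
}

// Combinators over plain delegates (in-memory only, no SQL translation).
public static class DelegatePredicates
{
    public static Func<T, bool> Or<T>(this Func<T, bool> left, Func<T, bool> right)
    {
        return x => left(x) || right(x);
    }

    public static Func<T, bool> And<T>(this Func<T, bool> left, Func<T, bool> right)
    {
        return x => left(x) && right(x);
    }
}

public static class Program
{
    // One list method instead of one method per filter combination.
    public static IEnumerable<Supplier> ListSuppliers(
        IEnumerable<Supplier> source, Func<Supplier, bool> filter)
    {
        return source.Where(filter);
    }

    public static void Main()
    {
        var all = new List<Supplier>
        {
            new Supplier { Name = "A", IsInUse = true, ContractTypeID = 1 },
            new Supplier { Name = "B", IsInUse = false, ContractTypeID = 3 },
            new Supplier { Name = "C", IsInUse = false, ContractTypeID = 2 }
        };

        Func<Supplier, bool> filter = s => s.IsInUse;
        filter = filter.Or(s => s.ContractTypeID == 3);

        foreach (var supplier in ListSuppliers(all, filter))
        {
            Console.WriteLine(supplier.Name); // prints A, then B
        }
    }
}
```

The caller decides which conditions apply, so the repository keeps a single method and the branching on flags such as areInUse moves out of the data access layer.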
You can implement
IEnumerable<Supplier> GetAllSuppliers() { ... }
and then use LINQ on the returned collection. This will retrieve all suppliers from the database, which are then filtered in memory using LINQ.
Assuming you are using LINQ to SQL you can also implement
IQueryable<Supplier> GetAllSuppliers() { ... }
and then use LINQ on the returned collection. This will only retrieve the necessary suppliers from the database when the collection is enumerated. This is very powerful, although there are some limits to the LINQ you can use. However, the biggest problem is that you are able to drill right through your data-access layer and into the database using LINQ.
A query like
var query = from supplier in repository.GetAllSuppliers()
            where supplier.Name.StartsWith("Foo") select supplier;
will map into SQL similar to this when it is enumerated
SELECT ... WHERE Name LIKE 'Foo%'
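The "only when it is enumerated" part can be observed without a database. The sketch below is an in-memory stand-in, not LINQ to SQL: SupplierNames is a hypothetical source that counts how often it is actually read, and AsQueryable() merely wraps it, so no SQL is generated here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static int Enumerations = 0;

    // Hypothetical stand-in for a table: counts how often it is actually read.
    public static IEnumerable<string> SupplierNames()
    {
        Enumerations++;
        yield return "Foobar Ltd";
        yield return "Acme";
        yield return "Foo & Sons";
    }

    public static void Main()
    {
        IQueryable<string> all = SupplierNames().AsQueryable();

        // Composing the query does not read the source yet.
        IQueryable<string> query = all.Where(name => name.StartsWith("Foo"));
        Console.WriteLine(Enumerations); // 0

        // Enumerating (here via ToList) is what triggers the read.
        List<string> result = query.ToList();
        Console.WriteLine(Enumerations); // 1
        Console.WriteLine(string.Join(", ", result)); // Foobar Ltd, Foo & Sons
    }
}
```

With a real LINQ to SQL IQueryable the same deferral applies, except that enumeration sends the composed query to the database instead of walking an in-memory sequence.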

Resources