Batching Neo4J updates for speed - neo4j

My goal is to have a list of the words in the Oxford dictionary with a relationship between them called IS_ONE_STEP_AWAY_FROM. The two words in each relationship are the same length and differ by only one letter.
I am currently able to batch insert the words themselves, but how can I batch insert these relationships?
class Word
{
public string Value { get; set; }
}
public void SeedDatabase()
{
var words = new Queue<Word>();
EnqueueWords(words);
//Create the words as a batch
GraphClient.Cypher
.Create("(w:Word {words})")
.WithParam("words", words)
.ExecuteWithoutResults();
//Add relationships one word at a time
while (words.Count > 0)
{
var word = words.Dequeue();
var relatedWords = WordGroups[word.Value].Except(Enumerable.Repeat(word.Value, 1)).ToList();
if (relatedWords.Count > 0)
{
foreach (string relatedWord in relatedWords)
{
GraphClient.Cypher
.Match("(w1 :Word { Value : {rootWord} }), (w2 :Word { Value : {relatedWord} })")
.Create("(w1)-[r:IS_ONE_STEP_AWAY_FROM]->(w2)")
.WithParam("rootWord", word.Value)
.WithParam("relatedWord", relatedWord)
.ExecuteWithoutResults();
}
}
}
}
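The WordGroups lookup used in the loop above is not shown in the question. As a rough Python sketch of one way such a lookup could be built, assuming a plain in-memory word list with no underscores in the words: each word is filed under every pattern obtained by replacing one letter with a wildcard, so words sharing a pattern are exactly one letter apart. The name `build_neighbours` and the sample words are hypothetical.

```python
from collections import defaultdict

def build_neighbours(words):
    """Map each word to the set of same-length words differing by exactly one letter."""
    buckets = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            # "cat" is filed under "_at", "c_t" and "ca_"
            buckets[word[:i] + "_" + word[i + 1:]].add(word)

    neighbours = {word: set() for word in words}
    for group in buckets.values():
        for word in group:
            # every other word in the same bucket is one step away
            neighbours[word] |= group - {word}
    return neighbours

neighbours = build_neighbours(["cat", "cot", "cog", "dog", "cats"])
print(sorted(neighbours["cot"]))  # ['cat', 'cog']
```

Words of different lengths never share a pattern, so "cats" ends up with no neighbours here.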

Peter, do you have an index or constraint on :Word(Value)?
I also don't fully understand how this batches:
GraphClient.Cypher
.Create("(w:Word {words})")
.WithParam("words", words)
.ExecuteWithoutResults();
Nor do I see where you define the property name (Value).
Do you currently run this concurrently?
For the relationships, I'd recommend grouping them by start word.
Then you could do something like:
MATCH (w1:Word {Value:{rootWord}})
UNWIND {relatedWords} as relatedWord
MATCH (w2:Word {Value:relatedWord})
CREATE (w1)-[r:IS_ONE_STEP_AWAY_FROM]->(w2);
Neo4jClient also still has to learn to use the new transactional endpoint.
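Grouping by start word amounts to building one parameter row per root word, each carrying the list of related words that the UNWIND then expands. A minimal Python sketch of preparing such rows, assuming the one-step-away words are already available as a dictionary from word to related words (like the WordGroups lookup in the question). The keys `rootWord` and `relatedWords` mirror the query parameters above; the function name and sample data are hypothetical.

```python
def relationship_rows(neighbours):
    """Build one parameter row per start word, skipping words with no related words."""
    return [
        {"rootWord": word, "relatedWords": sorted(related)}
        for word, related in sorted(neighbours.items())
        if related
    ]

rows = relationship_rows({
    "cat": {"cot"},
    "cot": {"cat", "cog"},
    "cats": set(),  # no neighbours, so no row and no query
})
for row in rows:
    print(row)
```

Each row could be sent as the parameters of one query, which cuts the number of round trips from one per relationship to one per start word.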

Related

Processing temporary properties returned by map projections

Consider the following database schema:
type Actor {
actorId: ID!
name: String
movies: [Movie!]! @relationship(type: "ACTED_IN", direction: OUT)
}
type Movie {
movieId: ID!
title: String
description: String
year: Int
actors(limit: Int = 10): [Actor!]! @relationship(type: "ACTED_IN", direction: IN)
}
Now, I want to find the movies with the largest number of actors, along with the counts. The following Cypher works perfectly in Neo4j:
type Query {
getMoviesWithMostActors(limit: Int = 5): [Movie]
@cypher(
statement: """
MATCH (movie:Movie)
MATCH (movie) <-[act:ACTED_IN]- (:Actor)
WITH movie, count(act) AS actorCount
ORDER BY actorCount DESCENDING
LIMIT $limit
RETURN movie {.*, numActors: actorCount}
"""
)
}
However, it fails in GraphQL playground. I tried the following:
query {
this_works: getMoviesWithMostActorsbase(limit: 2) {
movieId
}
this_does_not_work: getMoviesWithMostActorsbase(limit: 2) {
movieId
numActors
}
}
This returned: GRAPHQL_VALIDATION_FAILED.
"GraphQLError: Cannot query field \"numActors\" on type \"Movie\"."
My question is: how do I return temporary properties without modifying the type definitions themselves? This is a dummy example; in fact I need to do this with multiple types of nodes and with different types of scoring (int/float/array), so I want to know how to do it without editing the schema every time I add a new query.
Known Workarounds
1. Extend the schema with multiple nullable properties.
   - The schema needs to be changed with every new query.
2. Return a map and simulate a node object, as shown below.
   - Two more types need to be added for every single type of node.
   - The actual object types are lost.
type MovieData {
identity: Int!
labels: [String]!
properties: Movie!
}
type MovieWithScore {
movieData: MovieData!
movieScore: String!
}
type Query {
getMoviesWithMostActors(limit: Int = 5): [MovieWithScore]
@cypher(
statement: """
MATCH (movie:Movie)
MATCH (movie) <-[act:ACTED_IN]- (:Actor)
WITH movie, count(act) AS actorCount
ORDER BY actorCount DESCENDING
LIMIT $limit
RETURN {
movieData: movie,
movieScore: 'number of actors: ' + toString(actorCount)
}
"""
)
}

Neo4j Adding Multiple Nodes and Edges Efficiently

I have the below example.
What is the best and quickest way to add a list of nodes and edges in a single transaction? I use the standard C# Neo4j .NET packages, but I'm open to Neo4jClient as I've read that it's faster. Anything that supports .NET and 4.5, to be honest.
I have a list of about 60,000 FooA objects that need to be added into Neo4j, and it can take hours!
Firstly, FooB objects hardly change, so I don't have to add them every day. The performance issue is with adding new FooA objects twice a day.
Each FooA object has two lists containing the relationships I need to add: RelA and RelB (see below).
public class FooA
{
public long Id {get;set;} //UniqueConstraint
public string Name {get;set;}
public long Age {get;set;}
public List<RelA> ListA {get;set;}
public List<RelB> ListB {get;set;}
}
public class FooB
{
public long Id {get;set;} //UniqueConstraint
public string Prop {get;set;}
}
public class RelA
{
public string Val1 {get;set;}
public NodeTypeA Node {get;set;}
}
public class RelB
{
public FooB Start {get;set;}
public FooB End {get;set;}
public string ValExample {get;set;}
}
Currently, I check if Node 'A' exists by matching by Id. If it does then I completely skip and move onto the next item. If not, I create Node 'A' with its own properties. I then create the edges with their own unique properties.
That's quite a few transactions per item. Match node by Id -> add nodes -> add edges.
foreach (var ntA in FooAList)
{
//First transaction.
MATCH (n:FooA {Id: ntA.Id})
if not exists
{
//2nd transaction
CREATE (n:FooA {Id: 1234, Name: "Example", Age: toInteger(24)})
//Multiple transactions.
foreach (var a in ntA.ListA)
{
MATCH (n:FooA {Id: ntA.Id}), (n2:FooB {Id: a.Node.Id }) with n,n2 LIMIT 1
CREATE (n)-[:RelA {Prop: a.Val1}]->(n2)
}
foreach (var b in ntA.ListB)
{
MATCH (n:FooB {Id: b.Start.Id}), (n2:FooB {Id: b.End.Id }) with n,n2 LIMIT 1
CREATE (n)-[:RelA {Prop: b.ValExample}]->(n2)
}
}
}
How would one go about adding a list of FooAs using, for example, Neo4jClient and UNWIND, or any other way apart from CSV import?
Hope that makes sense, and thanks!
The biggest problem is the nested lists, which mean you have to do your foreach loops, so you end up executing a minimum of 4 queries per FooA, which for 60,000 - well - that's a lot!
Quick Note RE: Indexing
First and foremost, you need an index on the Id property of your FooA and FooB nodes; this will speed up your queries dramatically.
I've played a bit with this, and have it storing 60,000 FooA entries, and creating 96,000 RelB instances in about 12-15 seconds on my aging computer.
The Solution
I've split it into 2 sections - FooA and RelB:
FooA
I've had to normalise the FooA class into something I can use in Neo4jClient - so let's introduce that:
public class CypherableFooA
{
public CypherableFooA(FooA fooA){
Id = fooA.Id;
Name = fooA.Name;
Age = fooA.Age;
}
public long Id { get; set; }
public string Name { get; set; }
public long Age { get; set; }
public string RelA_Val1 {get;set;}
public long RelA_FooBId {get;set;}
}
I've added the RelA_Val1 and RelA_FooBId properties to be able to access them in the UNWIND. I convert your FooA using a helper method:
public static IList<CypherableFooA> ConvertToCypherable(FooA fooA){
var output = new List<CypherableFooA>();
foreach (var element in fooA.ListA)
{
var cfa = new CypherableFooA(fooA);
cfa.RelA_FooBId = element.Node.Id;
cfa.RelA_Val1 = element.Val1;
output.Add(cfa);
}
return output;
}
This combined with:
var cypherable = fooAList.SelectMany(a => ConvertToCypherable(a)).ToList();
Flattens the FooA instances, so I end up with 1 CypherableFooA for each item in the ListA property of a FooA. e.g. if you had 2 items in ListA on every FooA and you have 5,000 FooA instances - you would end up with cypherable containing 10,000 items.
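The flattening step can be illustrated outside of C# as well. Below is a minimal Python sketch of the same idea, one flat row per (FooA, RelA) pair; the dataclasses are simplified stand-ins for the classes above, and the field names `list_a` and `foo_b_id` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class RelA:
    val1: str
    foo_b_id: int  # stand-in for Node.Id

@dataclass
class FooA:
    id: int
    name: str
    age: int
    list_a: list = field(default_factory=list)

def flatten(foo_as):
    """Emit one flat row per (FooA, RelA) pair, like SelectMany over the helper."""
    return [
        {"Id": a.id, "Name": a.name, "Age": a.age,
         "RelA_Val1": rel.val1, "RelA_FooBId": rel.foo_b_id}
        for a in foo_as
        for rel in a.list_a
    ]

foo_as = [FooA(i, f"name{i}", 30, [RelA("x", 1), RelA("y", 2)]) for i in range(5000)]
rows = flatten(foo_as)
print(len(rows))  # 10000
```

Note that a FooA with an empty list yields no rows in this sketch, which also appears to be true of the ConvertToCypherable helper, since it only loops over ListA.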
Now, with cypherable I call my AddFooAs method:
public static void AddFooAs(IGraphClient gc, IList<CypherableFooA> fooAs, int batchSize = 10000, int startPoint = 0)
{
var batch = fooAs.Skip(startPoint).Take(batchSize).ToList();
Console.WriteLine($"FOOA--> {startPoint} to {batchSize + startPoint} (of {fooAs.Count}) = {batch.Count}");
if (batch.Count == 0)
return;
gc.Cypher
.Unwind(batch, "faItem")
.Merge("(fa:FooA {Id: faItem.Id})")
.OnCreate().Set("fa = faItem")
.Merge("(fb:FooB {Id: faItem.RelA_FooBId})")
.Create("(fa)-[:RelA {Prop: faItem.RelA_Val1}]->(fb)")
.ExecuteWithoutResults();
AddFooAs(gc, fooAs, batchSize, startPoint + batch.Count);
}
This batches the query into batches of 10,000 (by default) - this takes about 5-6 seconds on mine - about the same as if I try all 60,000 in one go.
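The Skip/Take recursion boils down to walking the list in fixed-size slices. A small Python sketch of the same batching done iteratively, assuming each slice would then be passed as the parameter of one UNWIND query; the generator name `batches` is hypothetical.

```python
def batches(items, batch_size=10_000):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sizes = [len(batch) for batch in batches(list(range(25_000)))]
print(sizes)  # [10000, 10000, 5000]
```

An iterative loop avoids growing the call stack with the number of batches, which does not matter for six batches but may for very large inputs.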
RelB
You store RelB in your example with FooA, but the query you're writing doesn't use the FooA at all, so what I've done is extract and flatten all the RelB instances in the ListB property:
var relBs = fooAList.SelectMany(a => a.ListB).ToList();
Then I add them to Neo4j like so:
public static void AddRelBs(IGraphClient gc, IList<RelB> relbs, int batchSize = 10000, int startPoint = 0)
{
var batch = relbs.Select(r => new { StartId = r.Start.Id, EndId = r.End.Id, r.ValExample }).Skip(startPoint).Take(batchSize).ToList();
Console.WriteLine($"RELB--> {startPoint} to {batchSize + startPoint} (of {relbs.Count}) = {batch.Count}");
if(batch.Count == 0)
return;
var query = gc.Cypher
.Unwind(batch, "rbItem")
.Match("(fb1:FooB {Id: rbItem.StartId}),(fb2:FooB {Id: rbItem.EndId})")
.Create("(fb1)-[:RelA {Prop: rbItem.ValExample}]->(fb2)");
query.ExecuteWithoutResults();
AddRelBs(gc, relbs, batchSize, startPoint + batch.Count);
}
Again, batching defaulted to 10,000.
Obviously the time will vary depending on the number of rels in ListB and ListA. My tests have one item in ListA and 2 in ListB.

how to find the relationship between 2nd degree friends with spring data neo4j

My relationship graph is shown in the picture below.
Using MATCH (n:Person {name:'1'})-[]-()-[r]-(m) RETURN m,r returns "4" and "5", but I cannot get the relationship between "4" and "5". In fact "4" follows "5", whereas "r" only represents the relationships to the friends of 4 (4-->2) and 5 (5-->3).
In spring data neo4j
domain.java
@NodeEntity(label = "Person")
public class PersonTest {
@Id
private Long id;
@Property(name = "name")
private String name;
@Relationship(type = "Follow") //direction=Relationship.DIRECTION
private List<PersonTest> friends;
Repository.java
public interface PersonRepository extends Neo4jRepository<PersonTest,Long> {
@Query("MATCH (n:Person {name:{name}})-[]-()-[r]-(m) RETURN m,r")
Collection<PersonTest> graph(@Param("name") String name);
}
Service.java
Collection<PersonTest> persons = personRepository.graph(name);
Iterator<PersonTest> result = persons.iterator();
while (result.hasNext()) {
PersonTest p = result.next(); // advance the iterator; without this the loop never ends
for (PersonTest friend : p.getFriends()) {
//........here will get 2 and 3!
}
}
How can I resolve this problem, i.e. get the relationship between "4" and "5"?
To find related children at a certain level, you can use a two-sided pattern of variable length:
MATCH (n:Person {name:'1'})-[:Follow*2]->(m)-[r:Follow]-()<-[:Follow*2]-(n)
RETURN m,r
http://console.neo4j.org/r/el2x80
Update. I think that this is a more correct query:
MATCH (n:Person {name:'1'})-[:Follow*2]->(m:Person)-[r:Follow]-(k:Person)
WHERE (n)-[:Follow*2]->(k)
RETURN m, r
http://console.neo4j.org/r/bv2u8k
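The logic of the second query can be sketched on an in-memory edge set. This is only an illustration in Python, not Spring Data Neo4j code, and it assumes a hypothetical graph in which every Follow edge points away from "1" (1→2, 1→3, 2→4, 3→5, 4→5), so that the directed two-hop pattern matches.

```python
# Hypothetical Follow edges as (follower, followed) pairs.
follows = {("1", "2"), ("1", "3"), ("2", "4"), ("3", "5"), ("4", "5")}

def two_hops(start, edges):
    """Nodes reachable from start by exactly two directed Follow edges."""
    first = {b for a, b in edges if a == start}
    return {b for a, b in edges if a in first}

def links_between_second_degree(start, edges):
    """Follow edges whose endpoints are both two hops away from start."""
    second = two_hops(start, edges)
    return sorted((a, b) for a, b in edges if a in second and b in second)

print(sorted(two_hops("1", follows)))             # ['4', '5']
print(links_between_second_degree("1", follows))  # [('4', '5')]
```

Unlike the Cypher query, which returns one (m, r) row per matching m, this sketch lists each relationship once.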

Neo4jClient : C# query to fetch collection with multiple columns

How can one fetch data for multiple columns using Neo4jClient?
For example, see the example shown at the link
Cypher query to fetch multiple column collection
The sample shown there passes properties of the event node to the collection instead of the complete event node.
The query I am constructing takes a few properties from the event node and a few properties from the relationship.
For example, the relationship attribute "registerd_on" needs to be added.
So how do I pass multiple properties to the collection?
It's not very nice, but if you look at what is returned when you collect like this, you get an array of arrays. These arrays don't have properties as such, so you can only really parse them as strings.
Using the :play movies dataset as a base:
var query = gc.Cypher
.Match("(p:Person {name:'Tom Hanks'})-->(m:Movie)")
.With("p, collect([m.title, m.released]) as collection")
.Return((p, collection) => new
{
Person = p.As<Person>(),
Collection = Return.As<IEnumerable<IEnumerable<string>>>("collection")
});
where Person is :
public class Person
{
public string name { get; set; }
}
You can then access the data like so:
var results = query.Results;
foreach (var result in results)
{
Console.WriteLine($"Person: {result.Person.name}");
foreach (var collection in result.Collection)
{
foreach (var item in collection)
{
Console.WriteLine($"\t{item}");
}
}
}
which is not nice :/
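Since each inner array arrives as plain strings, one option is to convert the rows into typed records after the query returns. A language-neutral sketch of that post-processing step in Python, assuming each inner array is exactly [title, released]; the `MovieInfo` type, the function name and the sample rows are hypothetical.

```python
from typing import NamedTuple

class MovieInfo(NamedTuple):
    title: str
    released: int

def parse_collection(collection):
    """Convert [[title, released], ...] rows of strings into typed records."""
    return [MovieInfo(title, int(released)) for title, released in collection]

movies = parse_collection([["Cast Away", "2000"], ["Apollo 13", "1995"]])
print(movies[0].title, movies[0].released)  # Cast Away 2000
```

This relies on the position of each value inside the inner array, so the order used in collect([...]) has to match the order expected by the parser.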

Grails named query to select row based on the max value in multiple rows

I have a groovy multi select tag () in which I want to display only the row which has MAX revision from a set of rows. Here is a post which does it in SQL: to select single row based on the max value in multiple rows?
How can I write this in Groovy? Should I create a named query?
Thanks in advance.
There are various ways to achieve this. Those include namedQueries, criteria, detached criteria or even HQL. Here is the namedQuery which would get what is needed:
static namedQueries = {
maxRevision {
eq 'revision', {
projections {
max 'revision'
}
}
//projections if needed
projections {
property 'name'
}
}
}
//controller
AppInfo.maxRevision().list()
With a criteria query, it would look similar:
AppInfo.withCriteria {
eq 'revision', {
projections {
max 'revision'
}
}
projections {
property 'name'
}
}
with HQL:
select ai from AppInfo as ai
where ai.revision = (
select max(revision) from AppInfo
)
taking into consideration this below domain class:
class AppInfo {
String name
Integer revision
}
UPDATE
The above gives the rows with the maximum of all the revisions. If you are looking for the maximum within each group, you will have to use the HQL below:
AppInfo.executeQuery("""
select ai from AppInfo as ai
where ai.revision in (
select max(a.revision) from AppInfo as a
where a.name = ai.name
)
""")
This can also be written as a Criteria.
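To make the difference from the global maximum concrete, here is a minimal Python sketch of what the correlated subquery computes: for every name, only the rows carrying that name's highest revision are kept. The dictionaries are stand-ins for AppInfo instances and the sample data is hypothetical.

```python
def latest_per_name(rows):
    """Keep the rows whose revision equals the maximum revision for their name."""
    max_revision = {}
    for row in rows:
        name, revision = row["name"], row["revision"]
        if name not in max_revision or revision > max_revision[name]:
            max_revision[name] = revision
    return [row for row in rows if row["revision"] == max_revision[row["name"]]]

app_infos = [
    {"name": "alpha", "revision": 1},
    {"name": "alpha", "revision": 3},
    {"name": "beta", "revision": 2},
]
print(latest_per_name(app_infos))
```

With the global maximum only the "alpha" row at revision 3 would come back; grouping by name also keeps "beta" at revision 2.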
