Neo4j Adding Multiple Nodes and Edges Efficiently

I have the below example.
I was wondering what is the best and quickest way to add a list of nodes and edges in a single transaction? I use the standard C# Neo4j .NET packages but am open to Neo4jClient, as I've read that's faster. Anything that supports .NET and 4.5, to be honest.
I have a list of about 60,000 FooA objects that need to be added to Neo4j, and it can take hours!
Firstly, FooB objects hardly change, so I don't have to add them every day. The performance issue is with adding new FooA objects twice a day.
Each FooA object has two lists containing the relationships to FooB objects that I need to add: RelA and RelB (see below).
public class FooA
{
    public long Id { get; set; } // UniqueConstraint
    public string Name { get; set; }
    public long Age { get; set; }
    public List<RelA> ListA { get; set; }
    public List<RelB> ListB { get; set; }
}
public class FooB
{
    public long Id { get; set; } // UniqueConstraint
    public string Prop { get; set; }
}
public class RelA
{
    public string Val1 { get; set; }
    public FooB Node { get; set; }
}
public class RelB
{
    public FooB Start { get; set; }
    public FooB End { get; set; }
    public string ValExample { get; set; }
}
Currently, I check if Node 'A' exists by matching by Id. If it does then I completely skip and move onto the next item. If not, I create Node 'A' with its own properties. I then create the edges with their own unique properties.
That's quite a few transactions per item. Match node by Id -> add nodes -> add edges.
foreach (var ntA in FooAList)
{
    // First transaction.
    MATCH (n:FooA {Id: ntA.Id})
    if not exists
    {
        // 2nd transaction
        CREATE (n:FooA {Id: 1234, Name: "Example", Age: toInteger(24)})
        // Multiple transactions.
        foreach (var a in ntA.ListA)
        {
            MATCH (n:FooA {Id: ntA.Id}), (n2:FooB {Id: a.Node.Id}) WITH n, n2 LIMIT 1
            CREATE (n)-[:RelA {Prop: a.Val1}]->(n2)
        }
        foreach (var b in ntA.ListB)
        {
            MATCH (n:FooB {Id: b.Start.Id}), (n2:FooB {Id: b.End.Id}) WITH n, n2 LIMIT 1
            CREATE (n)-[:RelA {Prop: b.ValExample}]->(n2)
        }
    }
}
How would one go about adding a list of FooAs using, for example, Neo4jClient and UNWIND, or any other way apart from CSV import?
Hope that makes sense, and thanks!

The biggest problem is the nested lists, which mean you have to do your foreach loops, so you end up executing a minimum of 4 queries per FooA, which for 60,000 - well - that's a lot!
Quick Note RE: Indexing
First and foremost: you need an index on the Id property of your FooA and FooB nodes. This will speed up your queries dramatically.
I've played a bit with this, and have it storing 60,000 FooA entries, and creating 96,000 RelB instances in about 12-15 seconds on my aging computer.
The Solution
I've split it into 2 sections - FooA and RelB:
FooA
I've had to normalise the FooA class into something I can use in Neo4jClient - so let's introduce that:
public class CypherableFooA
{
    public CypherableFooA(FooA fooA)
    {
        Id = fooA.Id;
        Name = fooA.Name;
        Age = fooA.Age;
    }
    public long Id { get; set; }
    public string Name { get; set; }
    public long Age { get; set; }
    public string RelA_Val1 { get; set; }
    public long RelA_FooBId { get; set; }
}
I've added the RelA_Val1 and RelA_FooBId properties to be able to access them in the UNWIND. I convert your FooA using a helper method:
public static IList<CypherableFooA> ConvertToCypherable(FooA fooA)
{
    var output = new List<CypherableFooA>();
    foreach (var element in fooA.ListA)
    {
        var cfa = new CypherableFooA(fooA);
        cfa.RelA_FooBId = element.Node.Id;
        cfa.RelA_Val1 = element.Val1;
        output.Add(cfa);
    }
    return output;
}
This combined with:
var cypherable = fooAList.SelectMany(a => ConvertToCypherable(a)).ToList();
Flattens the FooA instances, so I end up with 1 CypherableFooA for each item in the ListA property of a FooA. e.g. if you had 2 items in ListA on every FooA and you have 5,000 FooA instances - you would end up with cypherable containing 10,000 items.
Now, with cypherable I call my AddFooAs method:
public static void AddFooAs(IGraphClient gc, IList<CypherableFooA> fooAs, int batchSize = 10000, int startPoint = 0)
{
    var batch = fooAs.Skip(startPoint).Take(batchSize).ToList();
    Console.WriteLine($"FOOA--> {startPoint} to {batchSize + startPoint} (of {fooAs.Count}) = {batch.Count}");
    if (batch.Count == 0)
        return;
    gc.Cypher
        .Unwind(batch, "faItem")
        .Merge("(fa:FooA {Id: faItem.Id})")
        .OnCreate().Set("fa = faItem")
        .Merge("(fb:FooB {Id: faItem.RelA_FooBId})")
        .Create("(fa)-[:RelA {Prop: faItem.RelA_Val1}]->(fb)")
        .ExecuteWithoutResults();
    AddFooAs(gc, fooAs, batchSize, startPoint + batch.Count);
}
This batches the query into batches of 10,000 (by default) - this takes about 5-6 seconds on mine - about the same as if I try all 60,000 in one go.
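The recursive batching above can also be written as a plain loop. A minimal sketch, assuming a hypothetical helper named Batcher.InBatches (it is not part of Neo4jClient) that slices any list into chunks of at most batchSize items:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batcher
{
    // Hypothetical helper (not a Neo4jClient API): yields consecutive
    // slices of the source list, each holding at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IList<T> source, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < source.Count; start += batchSize)
        {
            yield return source.Skip(start).Take(batchSize).ToList();
        }
    }
}
```

Each slice could then be handed to the same Unwind query, for example with foreach (var batch in Batcher.InBatches(cypherable, 10000)) { ... }, which avoids the recursive call and the final empty batch.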
RelB
You store RelB in your example with FooA, but the query you're writing doesn't use the FooA at all, so what I've done is extract and flatten all the RelB instances in the ListB property:
var relBs = fooAList.SelectMany(a => a.ListB).ToList();
Then I add them to Neo4j like so:
public static void AddRelBs(IGraphClient gc, IList<RelB> relbs, int batchSize = 10000, int startPoint = 0)
{
    var batch = relbs
        .Skip(startPoint)
        .Take(batchSize)
        .Select(r => new { StartId = r.Start.Id, EndId = r.End.Id, r.ValExample })
        .ToList();
    Console.WriteLine($"RELB--> {startPoint} to {batchSize + startPoint} (of {relbs.Count}) = {batch.Count}");
    if (batch.Count == 0)
        return;
    var query = gc.Cypher
        .Unwind(batch, "rbItem")
        .Match("(fb1:FooB {Id: rbItem.StartId}),(fb2:FooB {Id: rbItem.EndId})")
        .Create("(fb1)-[:RelA {Prop: rbItem.ValExample}]->(fb2)");
    query.ExecuteWithoutResults();
    AddRelBs(gc, relbs, batchSize, startPoint + batch.Count);
}
Again, batching defaulted to 10,000.
Obviously, the time will vary depending on the number of rels in ListB and ListA. My tests had one item in ListA and 2 in ListB.

Related

Neo4jClient : C# query to fetch collection with multiple columns

How can one fetch multi-column data using Neo4jClient?
For example, see the example shown at the link "Cypher query to fetch multiple column collection".
The sample shown there passes properties of the event node to the collection instead of the complete event node.
The query I am constructing takes a few properties from the event node and a few properties from the relation.
For example, the relation attribute "registerd_on" needs to be added.
So how do I pass multiple properties into the collection?
It's not very nice, but if you look at what is returned by doing a collection, you get an array of arrays. These arrays don't have properties as such, so you can only really parse them as strings.
Using the :play movies dataset as a base:
var query = gc.Cypher
    .Match("(p:Person {name:'Tom Hanks'})-->(m:Movie)")
    .With("p, collect([m.title, m.released]) as collection")
    .Return((p, collection) => new
    {
        Person = p.As<Person>(),
        Collection = Return.As<IEnumerable<IEnumerable<string>>>("collection")
    });
where Person is:
public class Person
{
    public string name { get; set; }
}
You can then access the data like so:
foreach (var result in query.Results)
{
    Console.WriteLine($"Person: {result.Person.name}");
    foreach (var collection in result.Collection)
    {
        foreach (var item in collection)
        {
            Console.WriteLine($"\t{item}");
        }
    }
}
which is not nice :/

Database First with Multiple Tables with Foreign Keys in a Single View

I have two tables, department and teacher, like this:
Department table (DeptID is the primary key)
DeptID | DeptName
1      | P
2      | C
3      | M
Teacher table (DeptID is a foreign key)
DeptID | TeacherName
1      | ABC
1      | PQR
2      | XYZ
I have used the database-first approach to create a single model out of these two tables. I want to display both details in a single view like this:
TeacherName | DeptName
ABC         | P
PQR         | P
XYZ         | C
I tried to create controllers using scaffolding, but it only provides views and CRUD operations for a single table in the model.
Is there any way to map these two tables together in a single view? Or is it possible (easily achievable) if I use different models for each table in the database?
You have to create a view model.
public class DepartmentTeacher
{
    public int DeptID { get; set; }
    public string DeptName { get; set; }
    public int TeachID { get; set; }
    public string TeachName { get; set; }
}
// Inside the controller action:
using (var db = new SchoolContext())
{
    var model = (from tc in db.Teacher
                 join dp in db.Department on tc.DeptID equals dp.DeptID
                 // add a where clause here if you need to filter
                 select new DepartmentTeacher
                 {
                     DeptID = dp.DeptID,
                     DeptName = dp.DeptName,
                     TeachName = tc.TeacherName
                 }).ToList(); // materialize before the context is disposed
    return View(model);
}
You can use this view model for any operation. However, you have to declare it as the model type on your view page.

Batching Neo4J updates for speed

My goal is to have a list of the words in the Oxford dictionary with a relationship between them called IS_ONE_STEP_AWAY_FROM. The two words in each relationship are the same length and differ by only one letter.
I am currently able to batch insert the words themselves, but how can I batch insert these relationships?
class Word
{
    public string Value { get; set; }
}
public void SeedDatabase()
{
    var words = new Queue<Word>();
    EnqueueWords(words);
    // Create the words as a batch
    GraphClient.Cypher
        .Create("(w:Word {words})")
        .WithParam("words", words)
        .ExecuteWithoutResults();
    // Add relationships one word at a time
    while (words.Count > 0)
    {
        var word = words.Dequeue();
        var relatedWords = WordGroups[word.Value].Except(Enumerable.Repeat(word.Value, 1)).ToList();
        if (relatedWords.Count > 0)
        {
            foreach (string relatedWord in relatedWords)
            {
                GraphClient.Cypher
                    .Match("(w1 :Word { Value : {rootWord} }), (w2 :Word { Value : {relatedWord} })")
                    .Create("(w1)-[r:IS_ONE_STEP_AWAY_FROM]->(w2)")
                    .WithParam("rootWord", word.Value)
                    .WithParam("relatedWord", relatedWord)
                    .ExecuteWithoutResults();
            }
        }
    }
}
Peter, do you have an index or constraint on :Word(Value)?
I also don't fully understand how this batches:
GraphClient.Cypher
    .Create("(w:Word {words})")
    .WithParam("words", words)
    .ExecuteWithoutResults();
And where you define the property name (aka Value).
Do you currently run this concurrently?
For the relationships, I'd recommend grouping them by start word.
Then you could do something like:
MATCH (w1:Word {Value:{rootWord}})
UNWIND {relatedWords} AS relatedWord
MATCH (w2:Word {Value:relatedWord})
CREATE (w1)-[r:IS_ONE_STEP_AWAY_FROM]->(w2);
Neo4jClient also still has to learn to use the new transactional endpoint.
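Grouping by start word can be prepared on the C# side before any query runs. A minimal sketch, assuming the word groups are available as a dictionary from each word to its candidate related words (possibly including the word itself); WordRow and RelationshipRows are hypothetical names, not Neo4jClient types:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: one start word plus the words it links to.
public sealed class WordRow
{
    public string RootWord { get; set; }
    public List<string> RelatedWords { get; set; }
}

public static class RelationshipRows
{
    // Builds one row per start word, dropping the word itself from its
    // related list and skipping words that have no related words left.
    public static List<WordRow> Build(Dictionary<string, List<string>> wordGroups)
    {
        return wordGroups
            .Select(kv => new WordRow
            {
                RootWord = kv.Key,
                RelatedWords = kv.Value.Where(w => w != kv.Key).Distinct().ToList()
            })
            .Where(row => row.RelatedWords.Count > 0)
            .ToList();
    }
}
```

Each row then supplies the rootWord and relatedWords parameters for one execution of the query, so the number of round trips drops from one per relationship to one per start word.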

Default node type for Node<T>

I am starting to investigate the use of Neo4j using the Neo4jClient API.
I have created a basic database, and can query it using the web client. I am now trying to build a sample C# interface. I am having some problems with index lookups. My database consists of nodes with two properties: conceptID and fullySpecifiedName. Auto-indexing is enabled, and both node properties are listed in the node_keys_indexable property of neo4j.properties.
I keep getting IntelliSense errors in my C# when using the Node class. It appears to be defined as Node<T>, but I don't know what to supply as the value of the type. Consider this example from this forum...
var result = _graphClient
    .Cypher
    .Start(new
    {
        n = Node.ByIndexLookup("index_name", "key_name", "Key_value")
    })
    .Return((n) => new
    {
        N = n.Node<Item>()
    })
    .Results
    .Single();
var n = result.N;
Where does the "Item" in Node<Item> come from?
I have deduced that the index name I should use is node_auto_index, but I can't figure out a default node type.
Item is the type of node you have stored in the DB, so if you're storing a class:
public class MyType { public int conceptId { get; set; } public string fullySpecifiedName { get; set; } }
You would be retrieving Node<MyType> back.
Simple flow:
// Store a 'MyType'
_graphClient.Create(new MyType { conceptId = 1, fullySpecifiedName = "Name" });
// Query MyType by index
var query =
    _graphClient.Cypher
        .Start(new { n = Node.ByIndexLookup("node_auto_index", "conceptId", 1) })
        .Return<Node<MyType>>("n");
Node<MyType> result = query.Results.Single();
// Get the MyType instance
MyType myType = result.Data;
You can bypass the result.Data step by doing .Return<MyType>("n") instead of Node<MyType> as you'll just get an instance of MyType in that case.

How to do multiple Group By's in linq to sql?

How can you do multiple "group by's" in LINQ to SQL?
Can you please show me in both LINQ query syntax and LINQ method syntax?
Thanks
Edit.
I am talking about multiple parameters, say grouping by "sex" and "age".
Also, I forgot to mention: how would I, say, add up all the ages before I group them?
If I had this example, how would I do this?
Table Product
ProductId
ProductName
ProductQty
ProductPrice
Now imagine for whatever reason I had tons of rows each with the same ProductName, different ProductQty and ProductPrice.
How would I group them by ProductName and add together ProductQty and ProductPrice?
I know in this example it probably makes no sense why there would row after row with the same product name but in my database it makes sense(it is not products).
To group by multiple properties, you need to create a new object to group by:
var groupedResult = from person in db.People
                    group person by new { person.Sex, person.Age } into personGroup
                    select new
                    {
                        personGroup.Key.Sex,
                        personGroup.Key.Age,
                        NumberInGroup = personGroup.Count()
                    };
Apologies, I didn't see your final edit. I may be misunderstanding, but if you sum the age, you can't group by it. You could group by sex, sum or average the age...but you couldn't group by sex and summed age at the same time in a single statement. It might be possible to use a nested LINQ query to get the summed or average age for any given sex...bit more complex though.
EDIT:
To solve your specific problem, it should be pretty simple and straightforward. You are grouping only by name, so the rest is elementary (example updated with service and concrete dto type):
class ProductInventoryInfo
{
    public string Name { get; set; }
    public decimal Total { get; set; }
}
class ProductService : IProductService
{
    public IList<ProductInventoryInfo> GetProductInventory()
    {
        // ...
        var groupedResult = from product in db.Products
                            group product by product.ProductName into productGroup
                            select new ProductInventoryInfo
                            {
                                Name = productGroup.Key,
                                Total = productGroup.Sum(p => p.ProductPrice * p.ProductQty)
                            };
        return groupedResult.ToList();
    }
}
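The question also asked for method syntax. A minimal sketch using in-memory collections (LINQ to Objects); the Person and Product classes and the result types are assumptions that mirror the names in the question, and the same GroupBy shape is what you would write against a LINQ to SQL table:

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed entity shapes, mirroring the columns named in the question.
public class Person { public string Sex { get; set; } public int Age { get; set; } }
public class Product
{
    public string ProductName { get; set; }
    public int ProductQty { get; set; }
    public decimal ProductPrice { get; set; }
}

// Hypothetical result types for the two groupings.
public class SexAgeCount { public string Sex { get; set; } public int Age { get; set; } public int Count { get; set; } }
public class ProductTotals { public string Name { get; set; } public int TotalQty { get; set; } public decimal TotalPrice { get; set; } }

public static class GroupingExamples
{
    // Group by two properties at once via an anonymous-type key.
    public static List<SexAgeCount> CountBySexAndAge(IEnumerable<Person> people)
    {
        return people
            .GroupBy(p => new { p.Sex, p.Age })
            .Select(g => new SexAgeCount { Sex = g.Key.Sex, Age = g.Key.Age, Count = g.Count() })
            .ToList();
    }

    // Group by name only, summing quantity and price within each group.
    public static List<ProductTotals> TotalsByName(IEnumerable<Product> products)
    {
        return products
            .GroupBy(p => p.ProductName)
            .Select(g => new ProductTotals
            {
                Name = g.Key,
                TotalQty = g.Sum(p => p.ProductQty),
                TotalPrice = g.Sum(p => p.ProductPrice)
            })
            .ToList();
    }
}
```

The anonymous type works as a composite key because the compiler generates value-based Equals and GetHashCode for it, so two rows with the same Sex and Age land in the same group.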
