I am working on a project with Spring and a Neo4j database. I configured Neo4j to be accessed over REST. This is the configuration:
<neo4j:config graphDatabaseService="graphDatabaseService" />
<bean id="graphDatabaseService" class="org.springframework.data.neo4j.rest.SpringRestGraphDatabase">
<constructor-arg index="0" value="http://localhost:7474/db/data" />
</bean>
At the beginning I was using annotations on my domain objects (@NodeEntity, @RelatedTo, etc.) and repositories to save nodes and relationships. My domain objects are User(id, name), Item(id, name, description, list of terms), and Term(content, count). So there are not many properties, but even so, saving an object through a repository, for example a User with a defined id and name, took 25 seconds.
I read that this kind of communication with a Neo4j database is not yet well optimized, so I switched to using the Neo4jTemplate.
This is an example of saving the user (the constants in User are Strings: "id", "name", "USER"):
public Node saveUser(User user) {
    Node node = template.createNode();
    node.setProperty(User.ID, user.getId());
    node.setProperty(User.NAME, user.getName());
    node.setProperty("_type", User.LABEL);
    template.index(INDEX_ID, node, User.ID, user.getId());
    return node;
}
And this is an example of saving the item with relationships to its terms. Each term is a node which is connected to the item:
public Node saveItem(Item item) {
    Node node = template.createNode();
    node.setProperty(Item.ID, item.getId());
    node.setProperty(Item.NAME, item.getName());
    node.setProperty(Item.DESCRIPTION, item.getDescription());
    node.setProperty("_type", Item.LABEL);
    template.index(INDEX_ID, node, Item.ID, item.getId());
    for (String termContent : item.getTerms()) {
        Node term = termRepository.getNodeByContent(termContent);
        if (term == null) {
            term = termRepository.saveTerm(new Term(termContent));
        } else {
            termRepository.addCountToTerm(term);
        }
        int frequency = 1;
        Relationship contains = node.createRelationshipTo(term, RelationshipTypes.CONTAINS);
        contains.setProperty(Term.FREQUENCY, frequency);
    }
    return node;
}
The termRepository object (it does not extend GraphRepository<Term>) has methods similar to the one that saves the user. Fetching a term is done like this:
public Node getNodeByContent(String content) {
    if (!template.getGraphDatabaseService().index().existsForNodes(INDEX_ID))
        return null;
    return template.lookup(INDEX_ID, Term.CONTENT, content).to(Node.class).singleOrNull();
}
And finally, here is my problem. Even now it is still slow: inserting a user (only the id and name properties) and indexing it takes 3 seconds, and inserting an item that is connected to its terms takes 30 seconds (for 4 terms, which is a very small amount compared to the 60-70 I will have in a real situation).
Can you give me a hint or anything else that could help me with this kind of issue?
Thanks in advance.
This is really strange. Where does your server run? It seems to be something with the network setup.
I mean, SDN over REST is not fast, but it is also not that slow.
Can you share your classes too?
You should not do the individual property updates over the wire. Use cypher statements that create all the properties in one go.
There is also neo4jTemplate.createNode(map of properties) which does it as one operation.
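To make the "one go" idea concrete, here is a minimal sketch that collects all of a user's properties into a single map and pairs it with a single Cypher statement. The class name UserCreateStatement and its methods are hypothetical, and the {props} placeholder style is an assumption about the Neo4j version in use (newer versions write $props instead).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spring Data Neo4j: it gathers every
// property of a user into one map so the node can be created in one request
// instead of one request per setProperty call.
class UserCreateStatement {
    // Assumed placeholder syntax; adjust to the Cypher version of the server.
    static final String CYPHER = "CREATE (n {props}) RETURN n";

    // Builds the parameter map for CYPHER: a single "props" entry that holds
    // all node properties, including the "_type" marker used in the question.
    static Map<String, Object> params(String id, String name) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("id", id);
        props.put("name", name);
        props.put("_type", "USER");
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("props", props);
        return params;
    }
}
```

The statement and the map would then be sent together in one call to whatever query method your template exposes, or the inner props map could be passed to the createNode(map of properties) variant mentioned above.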
I am currently trying to unwind a list of objects that I want to merge into the database using Neo4jClient. What I would like to do is unwind the list and create the nodes with a label generated from a property of the items themselves instead of hardcoding a label name. From what I can find, I have to use the APOC merge procedure to do so. However, I am unable to translate this to Neo4jClient. In the Neo4j explanation they yield a node after the apoc.merge.node call and then return the node. However, I cannot simply return the node, nor can I set the node (I got to the point of just messing about, and at one point I got the labels to work, but it overwrote all properties with those of the last item in the list).
I seem to be missing something fundamental, but I'm not quite sure what. Does anyone here know how to do this with Neo4jClient (and, if possible, give a bit of an explanation of what is going on)? I am very new to the development world and I feel I am just missing a crucial piece of understanding here.
This is the code I tried. It turned all properties into the last node's properties, but at least it created the labels as I expected:
public async void CreateBatchItems(List<TToDataBase> itemList)
{
    await Client.Cypher
        .Unwind(itemList, "row")
        .Merge("(n)")
        .With("row, n")
        .Call("apoc.merge.node([n.Name], n)").Yield("node")
        .Set("n += node")
        .ExecuteWithoutResultsAsync();
}
Thank you in advance!
Edit:
Some clarification about the input:
The objects are actually very basic, as (at least for now) they merely contain a name and an object ID (these object IDs are later used to create relations). So it's a very basic class with two properties:
public class Neo4JBaseClass
{
    public Neo4JBaseClass() { }
    public Neo4JBaseClass(string name, string objectId)
    {
        Name = name;
        ObjectId = objectId;
    }
    [JsonProperty(PropertyName = "ObjectId")]
    public string ObjectId { get; set; }
    [JsonProperty(PropertyName = "Name")]
    public string Name { get; set; }
}
I have also tried a slight variation where this class also has the added property
[JsonProperty(PropertyName = "PropertyMap")]
public IProperty PropertyMap { get; set; }
where PropertyMap is another basic object holding the name and objectId. This seemed like a good idea for future proofing anyway, so the propertylist can be easily expanded without having to change the base object.
[EDITED]
The main issue is that Merge("(n)") matches any arbitrary node that already exists.
You have not shown the data structure for each element of itemList, so this answer will assume it looks like this:
{Name: 'SomeLabel', id: 123, Props: {foo: 'xyz', bar: true}}
With the above data structure, this should work:
public async void CreateBatchItems(List<TToDataBase> itemList)
{
    await Client.Cypher
        .Unwind(itemList, "row")
        // apoc.merge.node takes a list of labels and a map of identifying properties
        .Call("apoc.merge.node([row.Name], {id: row.id})").Yield("node")
        .Set("node += row.Props")
        .ExecuteWithoutResultsAsync();
}
[UPDATE]
The data structure you added to your question is very different than what I had imagined. Since neither of the properties in a row is a map, .Set("node += row.Props") would generate an error.
Using your data structure for each row, this might work:
public async void CreateBatchItems(List<TToDataBase> itemList)
{
    await Client.Cypher
        .Unwind(itemList, "row")
        .Merge("(n:Foo {id: row.ObjectId})")
        // row.Name is a string, not a map, so it is assigned to a single property
        .Set("n.Name = row.Name")
        .ExecuteWithoutResultsAsync();
}
This code assigns the node label Foo to all the generated nodes. A node should always have a label, which improves clarity and also tends to improve efficiency -- especially if you also create indexes. For example, an index on :Foo(id) would make the above query more efficient.
This code also assumes that the id property is supposed to contain a unique Foo node identifier.
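If APOC is not available, one alternative (not what the code above does) is to group the rows by the property that should become the label on the client side, and then send one MERGE statement per group with the label written into the query text. The sketch below shows only that grouping and statement-building step. It is written in Java for illustration, the LabelBatches class is hypothetical, the $rows parameter name is an assumption, and the premise that labels cannot be passed as parameters applies to the Cypher versions this thread targets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Neo4jClient or any Neo4j driver.
class LabelBatches {
    // Groups rows by their "Name" value, which is used as the node label.
    static Map<String, List<Map<String, Object>>> groupByName(List<Map<String, Object>> rows) {
        Map<String, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            String label = String.valueOf(row.get("Name"));
            groups.computeIfAbsent(label, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // Builds the statement for one group. The label is backtick-quoted and any
    // backtick inside it is doubled, since it ends up in the query text itself.
    static String mergeStatement(String label) {
        String quoted = "`" + label.replace("`", "``") + "`";
        return "UNWIND $rows AS row MERGE (n:" + quoted
                + " {id: row.ObjectId}) SET n.Name = row.Name";
    }
}
```

Each group's list would be passed as the rows parameter of its own statement, so every node is merged under its own label while the properties still come from the individual row.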
I am trying to implement a plugin for neo4j to add an autoincrement ID using GraphAware library. To this end, I've written the following classes:
public class ModuleBootstrapper implements RuntimeModuleBootstrapper
{
    public RuntimeModule bootstrapModule(String moduleId, Map<String, String> config, GraphDatabaseService database)
    {
        return new MyModule(moduleId, config, database);
    }
}
And:
public class MyModule extends BaseTxDrivenModule<Void>
{
    int counter = 0;
    public Void beforeCommit(ImprovedTransactionData transactionData)
            throws DeliberateTransactionRollbackException
    {
        if (transactionData.mutationsOccurred()) {
            for (Node newNode : transactionData.getAllCreatedNodes()) {
                newNode.setProperty("id", counter++);
            }
        }
        // The Void return type still requires an explicit return value.
        return null;
    }
}
And for the testing I can execute:
CREATE (n);
And then:
MATCH (n) RETURN n;
And I can see the effect of my plugin as some id property added to the node. But when I run:
CREATE (n) RETURN n;
The returned node does not have the mentioned id property, but when I match the node in a separate query, I can see that things have worked out just fine. It's just that in the CREATE query, the returned node data is what it was before my plugin modified it.
My questions are: why is that? Didn't I modify the nodes through my plugin within the transaction? Shouldn't the returned nodes show the modifications I made to them? Is there any way I can make this happen?
While you're still within the transaction, the Cypher result has already been computed and there is no clean way to add additional information to it.
I guess a feature request on the neo4j repository could be cool, but in total honesty this would require a serious change to the neo4j core codebase.
BTW, the incremental ID is already implemented in the graphaware-uuid plugin: https://github.com/graphaware/neo4j-uuid#specifying-the-generator-through-configuration
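The ordering described above can be pictured with a toy model. This is not Neo4j or GraphAware code. ToyTransaction and its methods are made up purely to show why a result captured when the statement runs does not see a property that a commit-time hook adds afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: nodes are plain property maps held in a list.
class ToyTransaction {
    private final List<Map<String, Object>> created = new ArrayList<>();
    private int counter = 0;

    // Models "CREATE (n) RETURN n": the returned row is a copy of the node's
    // properties at the moment the statement runs.
    Map<String, Object> createAndReturn() {
        Map<String, Object> node = new LinkedHashMap<>();
        created.add(node);
        return new LinkedHashMap<>(node);
    }

    // Models the beforeCommit hook: it runs later and mutates the stored nodes.
    void commit() {
        for (Map<String, Object> node : created) {
            node.put("id", counter++);
        }
    }

    // Models a later "MATCH (n) RETURN n", which reads the stored nodes.
    List<Map<String, Object>> matchAll() {
        return created;
    }
}
```

In this model the map returned by createAndReturn() never gains an id entry, while matchAll() called after commit() does show it, which mirrors the behaviour seen with CREATE (n) RETURN n followed by a separate MATCH.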
I know this may sound like nonsense, but I will try my best to explain the problem I am facing with Neo4jClient.
I am developing social task-management software. For the server-side code and data storage I chose Neo4j (via Neo4jClient) and C#.
In our software, imagine a user (mhs) posts a task: "@somebody please help me on #Neo4j #cypher". The task is decorated with some special characters (@ # + /). The above post will be saved in Neo4j as a graph like this:
(post:Post {Text: "@somebody please help me on #Neo4j"})-[:Has_HashTag]-(neo4j:HashTag)
(post:Post)-[:Has_HashTag]-(cypher:HashTag)
(post:Post)-[:Has_Author]-(mhs:User)
(post:Post)-[:Has_MentionedUser]-(somebody:User)
Now imagine user @mhs searches for posts that have the hashtag #Neo4j and mention @somebody.
Here I hand-craft a Cypher query that includes the two search parameters, using some C# code, which results in the Cypher query below:
MATCH (nodes)-[r]-(post:Post),
(post:Post)-[:HAS_MentionedUsers]->(assignee1307989068:User),
(nodes)-[r]-(post:Post)-[:HAS_HashTags]->(Hashtag1482870844:HashTag)
WHERE (assignee1307989068.UserName = "somebody") AND (Hashtag1482870844.Value = "neo4j")
RETURN post AS Post, Collect(nodes) as nodes
ORDER BY post.creationDate
The above Cypher returns the post together with only those related nodes that are not referenced in the WHERE clause.
My question is how to include all the nodes related to the targeted post without listing each of them in the RETURN part of the query, something like RETURN *.
The other problem is how to deserialize the result set into C# without knowing what shape it may have.
The Search method that produces the above Cypher is as follows:
public List<PostNode> Search(string searchterm)
{
    List<string> where = new List<string>();
    var tokenizedstring = searchterm.Split(' ');
    var querystring = new StringBuilder();
    var relatedNodes = new List<string>();
    var q = new CypherFluentQuery(_graphClient) as ICypherFluentQuery;
    foreach (var t in tokenizedstring)
    {
        _commandService.BuildPostQueystring(t, ref querystring, ref where, ref relatedNodes);
    }
    if (querystring[querystring.Length - 1] == ',')
        querystring = querystring.Remove(querystring.Length - 1, 1);
    q = q.Match(querystring.ToString());
    int i = 1;
    if (where.Count > 0)
        q = q.Where(where[0]);
    while (i < where.Count)
    {
        q = q.AndWhere(where[i]);
        i++;
    }
    var rq = q.Return(
        (post, nodes) => new PostNode
        {
            Post = post.As<Node<string>>(),
            Nodes = nodes.CollectAs<string>()
        })
        .OrderBy("post.creationDate");
    var results = rq.Results.ToList();
    //foreach (var result in results)
    //{
    //    //dynamic p = JsonConvert.DeserializeObject<dynamic>(result.Post.Data);
    //    //dynamic d = JsonConvert.DeserializeObject<dynamic>(result.Nodes.Data);
    //}
    return results;
}
}
//Some Helper Class just to cast the result.
public class PostNode
{
    public Node<string> Post { get; set; }
    public IEnumerable<Node<string>> Nodes { get; set; }
}
But as you may have noticed, the result does not contain the nodes that are referenced by the search term via the WHERE clause in the Cypher query.
I have been stuck here for a while, as I cannot come up with any decent solution for this, so your help may save me a lot of time.
Maybe I am heading in a totally wrong direction, so please help in any way you can think of.
It appears that a list of UNKNOWN related nodes is being requested, so it is one of these:
Scenario A
It doesn't matter what EXACTLY those related nodes are, I just want them.
Question: What is the intention of retrieving those unknown but related nodes? Answering this may well allow the query to be improved.
Scenario B
These unknown related nodes are actually known! They are just not fully known at the time of query execution; however, somewhere down the road in the C# code we will have things like this:
if (nodes.Any(_ => _ is HashTag)) {...}
Question:
This type of behaviour requires you to KNOW the type. Even with reflection or C# dynamic, that requirement is still there, because Neo4jClient has no way of correlating a bag of JSON data coming from Neo4j with any local type. When Neo4jClient receives a bulk of data over the wire, it has to know which type YOU would prefer it to be represented as. This is why queries always look like this:
Return((a, p) => new
{
Author = a.As<Author>(), //we expect node content to be represented as Author
Post = p.As<Post>()
})
Neo4jClient does NOT preserve C# types inside your Neo4j database. It would have been nasty to do so. The idea behind it is that you shouldn't find yourself desperate for this, and if you do, I would recommend looking for the problem somewhere else: why would you rely on your client library to describe your domain for you?
According to Neo4j documentation the "reference node concept is obsolete - indexes are the canonical way of getting hold of entry points in the graph.".
However if I use GlobalGraphOperations.getAllNodes() I'm still returned a node with id 0 which I didn't create and which has all the looks of a reference node.
I'm trying to implement a method getNode(String uuid):
public Node getNode(String uuid)
{
    GlobalGraphOperations globalGraphOperations = GlobalGraphOperations.at(graphDb);
    for (Node tmpNode : globalGraphOperations.getAllNodes())
    {
        if (tmpNode.equals(graphDb.getReferenceNode()))
        { continue; }
        String tmpNodeUuid = (String) tmpNode.getProperty("uuid");
        if (tmpNodeUuid.equals(uuid))
        {
            return tmpNode;
        }
    }
    return null;
}
Why does getAllNodes return a reference node?
How can I implement getNode() without using the deprecated function getReferenceNode()?
The reference node concept is indeed deprecated and will be removed with Neo4j version 2.0. In 1.x the concept still exists, and the reference node is created when the database is initially created. If you don't need it, you can just delete the reference node. The method you're writing is going to get slow as the graph grows, because the entire graph is traversed. You should create an index for the UUID property and use that to look up nodes in the graph, which is much faster. It is also the 'canonical way of getting hold of entry points in the graph' :-)
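To illustrate why the index lookup scales better than the scan, here is a small sketch in plain Java. It does not use the Neo4j API. Nodes are stood in for by property maps, a HashMap plays the role of the index, and the UuidLookup class is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in model: each "node" is just a map of properties.
class UuidLookup {
    // Full scan, like iterating getAllNodes(): every lookup touches every node.
    // Nodes without a "uuid" property (such as a reference node) are skipped.
    static Map<String, Object> scan(List<Map<String, Object>> nodes, String uuid) {
        for (Map<String, Object> node : nodes) {
            if (uuid.equals(node.get("uuid"))) {
                return node;
            }
        }
        return null;
    }

    // Built once (or maintained on every write); nodes without a uuid are left out.
    static Map<String, Map<String, Object>> buildIndex(List<Map<String, Object>> nodes) {
        Map<String, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> node : nodes) {
            Object uuid = node.get("uuid");
            if (uuid != null) {
                index.put(uuid.toString(), node);
            }
        }
        return index;
    }
}
```

After buildIndex has run, index.get(uuid) answers each lookup without walking the whole collection. A real Neo4j index serves the same purpose, with the database keeping it up to date as nodes are added to it.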
I have created nodes using Spring with the basic process shown below. I have a POJO for my Concepts, and I use this object and a Neo4jTemplate to create nodes with indexes. I am still unable to discover what the 'KEY' for the created index is. I know the name of the index is 'CID', and I assumed the 'KEY' would be 'conceptId'. However, when I use the queries below, no data is returned. I have confirmed that the index exists, but I cannot find out what the proper 'KEY' for it is so that I can use it to improve query performance. I am able to query a specific node using a WHERE clause that searches for a specific property value. However, when I try to find the node using the index 'CID' with key 'conceptId', no nodes are returned.
// Concept POJO
@NodeEntity
public class Concept {
    @GraphId
    private Long nodeId;
    @Indexed(indexName="CID", fieldName="conceptId")
    private Long conceptId;
    ...
// service where code to create Concept Nodes exists
@Repository
public class ConceptService {
    @Autowired
    private Neo4jTemplate n4jTemplate;
    @Autowired
    private ConceptRepository cr;
    // Call to create node in a 'service'
    public void addConceptNode(Concept concept) {
        concept = n4jTemplate.save(concept);
    }
    ...
//Cypher Queries used to retrieve nodes using index
START a=node:CID( conceptId = "66573009")
RETURN a;
// this returns 0 nodes quickly
START a=node:CID( conceptid = "66573009")
RETURN a;
// this returns 0 nodes quickly
START a=node:CID( CID = "66573009")
RETURN a;
// this returns 0 nodes quickly
START a=node:CID( cid = "66573009")
RETURN a;
// this returns 0 nodes quickly
START a=node:CID( CONCEPTID = "66573009")
RETURN a;
// this returns 0 nodes quickly
// Cypher query not using index to retrieve same node
START a=node(*)
WHERE HAS(a.conceptId) AND a.conceptId = 66573009
RETURN a;
// this returns 1 node in 77365ms
//'quickly' = approx.(43-87ms).
There is more to the code than what is shown, but this gives you the basic gist of how I am creating nodes with indexes in a Neo4j DB. There are more properties and more indexes. When I use Spring to retrieve the nodes, it seems to use the created index automatically (this is my assumption), because it returns the results faster than the Neo4j data browser does.
Any help would be greatly appreciated.
Thanks!