neo4j - finding child nodes with a given property value is slow

I have a graph with approximately 1 million nodes.
The graph represents a catalog tree (spare parts). The maximum depth is about 6.
Each node has a filter property that can have any value, even an empty one. This filter property is used to filter the catalog for the user.
What I want is to ask a question like this when I click a node (any level):
"for each child node, tell me if any of its children (any level) has a filter attribute with a value of ...".
With my query it takes about 12 seconds for each child to get the result. Shouldn't this scenario be an ideal use case for Neo4j? Shouldn't it be way faster?
I can send the nodes and relations as text files if you want the data.
My query is something like this:
start n=node(3)
match n-[:PARENT_ITEM*1..6]->x
where x.filter="something"
return count(x)
I'm running on a Windows Azure Large server (4 cores, 7 GB RAM) and I haven't changed any configuration since installing Neo4j.
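The question only needs a yes/no answer per child ("does *any* descendant match?"), but count(x) has to visit every node within six levels. The Python sketch below shows the difference. It models the tree in memory as a dict of children, and the layout and function names are illustrative assumptions, not Neo4j API. For each child it stops searching at the first matching descendant.

```python
from collections import deque


def has_matching_descendant(children, node, value, filters, max_depth):
    """Breadth-first search below `node`; True at the first descendant whose filter equals `value`."""
    queue = deque((child, 1) for child in children.get(node, []))
    while queue:
        current, depth = queue.popleft()
        if filters.get(current) == value:
            return True  # early exit: no need to count the remaining matches
        if depth < max_depth:
            queue.extend((grandchild, depth + 1) for grandchild in children.get(current, []))
    return False


def children_with_matches(children, node, value, filters, max_depth=6):
    """For each direct child of `node`, report whether any of its descendants has the filter value."""
    return {
        child: has_matching_descendant(children, child, value, filters, max_depth)
        for child in children.get(node, [])
    }
```

Take a tree where root has children a and b, a leads down to a1 and then a2, and only a2 has filter "something". In that case children_with_matches(children, "root", "something", filters) returns {"a": True, "b": False}.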

Related

Why so many db hits in Neo4j?

In my case there is only 1 Category node and 2 Template nodes in total. I put an * in [*] to support further scenarios. But why are there so many db hits in this Cypher query for the current data?
It's probably the * in the relationship part of your query that's doing it.
While you've got only one Category node and two Template nodes, you've asked Neo4j to hop through any number of relationships to get from one to the other and not given it any help to narrow down the search besides specifying the starting node.
For example, if your Category was connected to 100,000 other nodes (of any label, not just Template) you've forced Neo4j to jump through every single one of them looking to see if there's a path to a Template node - and if those nodes have their own connections then they all need to be explored too, because the depth of the traversal isn't constrained.
If you know how Category and Template nodes can be connected in the ways you're interested in (for example, if there's only ever a specific set of relationship types you want to traverse), then specifying them will radically improve the performance of the query. Equally, reducing the maximum length of the path will help.
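Both suggestions can be sketched as a toy Python model. This is not how Neo4j counts db hits. The number of relationships followed stands in as a rough proxy for the work a traversal does, and the node and relationship names are hypothetical.

```python
from collections import deque


def traverse(edges, start, allowed_types=None, max_depth=None):
    """Breadth-first expansion from `start`.

    edges maps a node to a list of (relationship_type, target) pairs.
    Returns the set of reached nodes and the number of relationships followed.
    """
    seen = {start}
    followed = 0
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the path-length bound
        for rel_type, target in edges.get(node, []):
            if allowed_types is not None and rel_type not in allowed_types:
                continue  # relationship type is not part of the pattern
            followed += 1
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen, followed
```

Consider a Category node with 2 HAS_TEMPLATE relationships and 100 unrelated TAGGED relationships, each of which leads one hop further. An unrestricted traversal follows 202 relationships. With max_depth=1 it follows 102. Restricted to HAS_TEMPLATE, it follows only 2.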

Neo4j Cypher query: perform calculation on relationships' properties during traversing process

I am working on an RPG game, specifically on the artifact exchange component. I am using a Neo4j graph database to store all artifacts and the players' exchange orders for these artifacts.
The graph diagram (not reproduced here) looks as follows:
:Exchange relationships represent players' exchange orders. For example, Player B is exchanging 1 Mega Boots artifact for 10 gold. Player C is exchanging 1 Mega Helmet for 2 pairs of Mega Boots.
So now, I am working on a Cypher query that should return different paths. Each path should reveal a sequence of artifact exchange orders such that in the end I have more gold than I had at the beginning.
For example, with an existing gold amount of 100:
Path1: Gold->MegaBoots->MegaHelmet->MegaSword->Gold, gold after all exchanges: 115
Path2: Gold->MegaBoots->MegaHelmet->Gold, gold after all exchanges: 111
Complexity: when moving between 2 adjacent nodes, the query should determine (by calculating on the properties of the relationship that connects these nodes) whether I have enough resources to get to the end node.
For example:
Initially, the gold amount is 10 and the query starts moving from the start node :Artifact({name=gold}) to its adjacent node :Artifact({name=MegaBoots}). The query sees 2 :Exchange relationships and selects only the relationship with id=2, as its baseResourceAmount property is equal to the initial gold amount (the relationship with id=1 is not suitable for us, as its baseResourceAmount value is greater than the initial gold amount).
Now, the query moves from node :Artifact({name=MegaBoots}) to the end node :Artifact({name=MegaHelmet}) using the :Exchange relationship with id=4, because after the 1st exchange our resource amount is 2, which is equal to that relationship's baseResourceAmount property value.
Eventually, the final path will be Gold--:Exchange(id=2)-->MegaBoots--:Exchange(id=4)-->MegaHelmet
So, does anyone know how to tell Cypher to make specific calculations on properties of relationships that bridge 2 adjacent nodes?
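Leaving Cypher aside, the calculation being asked for is a traversal that carries state (the amount currently held) from step to step. It prunes any relationship that this state cannot satisfy. The Python sketch below makes several assumptions:

- An order is usable only when the amount held equals its baseResourceAmount, as in the example above.
- Each artifact is visited at most once per chain.
- The amount received is a hypothetical targetResourceAmount value, stored as the fourth tuple field.

```python
def profitable_cycles(exchanges, start, amount):
    """Enumerate exchange chains that leave `start` and return to it holding more than `amount`.

    exchanges maps an artifact to a list of
    (rel_id, target, base_resource_amount, target_resource_amount) tuples.
    """
    results = []

    def walk(node, held, path, visited):
        for rel_id, target, base, received in exchanges.get(node, []):
            if held != base:
                continue  # this order cannot be filled with what we currently hold
            step_path = path + [(rel_id, target)]
            if target == start:
                if received > amount:  # back at the start artifact with a profit
                    results.append((step_path, received))
            elif target not in visited:
                walk(target, received, step_path, visited | {target})

    walk(start, amount, [], {start})
    return results
```

Start with 10 gold. Order 1 (base 20) is skipped, and the chain then uses relationships 2 and 4, matching the path described in the question. The closing order back to Gold (id 5, paying 12 gold) is made up for the example.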

How does the minimum spanning tree in Neo4j work?

I am playing around with some graph theory algorithms in Neo4j. I am trying to find the minimum spanning tree (MST) of my network. I synthetically created a network of 10,000 people. Each person has 12 relationship types, each one linking them to the other 9,999 people, and each relationship has its own weight.
The problem is that, by definition, the result must be a tree spanning the ENTIRE network. The Neo4j function, however, only returns a very small subgraph (only about 12 nodes) of the entire network.
The code I am using looks like this:
MATCH (a:Name {Name:"Dillon Snow"})
CALL algo.mst(a,"Weight",{stats:true})
YIELD loadMillis, computeMillis, writeMillis, weightSum, weightMin, weightMax, relationshipCount
RETURN loadMillis, computeMillis, writeMillis, weightSum, weightMin, weightMax, relationshipCount
What can I change to get the function to return an MST spanning the entire network?
algo.mst.* has not been adapted to the matured Neo4j-Graph-Algorithms-CoreAPI in its current release (3.2.5.2/3.3.0.0 # Dec 2017), which might lead to unexpected results. But there is a pull request in the pipeline, so you can expect some changes in the next release.
Anyway, the procedure should add a new relationship type (default mst) to your nodes. In a connected graph every node should end up connected, while in a disconnected graph only the nodes of the connected component containing your start node get connected.
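The connected-component point can be illustrated with a textbook Prim's algorithm that grows a tree from a start node. This is a generic Python sketch, not the algo.mst implementation, and the adjacency layout is an assumption. Nodes outside the start node's component are never reached, so they never enter the tree.

```python
import heapq


def prim_mst(adjacency, start):
    """Prim's algorithm from `start`.

    adjacency maps a node to a list of (weight, neighbour) pairs; undirected
    edges are listed in both directions. Returns tree edges as (u, v, weight).
    """
    in_tree = {start}
    tree_edges = []
    frontier = [(weight, start, neighbour) for weight, neighbour in adjacency.get(start, [])]
    heapq.heapify(frontier)
    while frontier:
        weight, u, v = heapq.heappop(frontier)  # lightest edge leaving the tree
        if v in in_tree:
            continue  # would close a cycle
        in_tree.add(v)
        tree_edges.append((u, v, weight))
        for next_weight, x in adjacency.get(v, []):
            if x not in in_tree:
                heapq.heappush(frontier, (next_weight, v, x))
    return tree_edges
```

In a graph with components {A, B, C} and {D, E}, prim_mst(adjacency, "A") returns only the two edges that join A, B and C. D and E never appear in the tree.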
If I understand you correctly, you have multiple relationship types and more than one of them between a pair of nodes? E.g. node A is connected to node B by several relationships, each with a different type and property value. This is a problem. In general the Graph-Algorithms-API does not support multiple relationships: each pair of nodes can only have one connection per direction. Although you can import multiple types, the core API itself has no idea of the underlying type. If multiple relationships between a pair of nodes are imported, usually the last one wins. This has been mentioned in the documentation ;)
To overcome this limitation you could replace your relationship types with some kind of artificial nodes. When traversing the result tree, the occurrence of one of those nodes would indicate the original relationship.
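The "last one wins" behaviour described in the answer can be simulated in a few lines of Python. This models the described behaviour; it is not the library's import code.

A second helper shows an alternative that the answer does not mention. A spanning tree never contains two parallel relationships between the same pair. Replacing a heavier one with a lighter parallel relationship never increases the total weight. So you could keep only the lightest relationship per (source, target) pair, remembering its type, before running the algorithm. Both helpers are hypothetical.

```python
def collapse_last_wins(relationships):
    """Keep one relationship per (source, target) pair; later entries overwrite earlier ones."""
    collapsed = {}
    for source, target, rel_type, weight in relationships:
        collapsed[(source, target)] = (rel_type, weight)
    return collapsed


def collapse_keep_lightest(relationships):
    """Keep the lightest relationship per (source, target) pair, together with its type."""
    collapsed = {}
    for source, target, rel_type, weight in relationships:
        key = (source, target)
        if key not in collapsed or weight < collapsed[key][1]:
            collapsed[key] = (rel_type, weight)
    return collapsed
```

Consider three relationships from A to B with weights 5, 2 and 9, imported in that order. Under "last one wins" only weight 9 survives, which silently drops the lighter options an MST would prefer.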

Single node with properties takes forever to query

I have a 50K node graph with 10 properties per node. All nodes are of the same type but have different values. Each of the properties is indexed, and I have increased the heap and page cache memory sizes for the database. However, using the browser console, creating the nodes takes 6 minutes!
Also, a query for all the properties takes a very long time (~2 minutes) to appear in the browser console, but when the results do appear, the bottom of the browser says that the result of 50K node properties took only 2500 ms.
How do I improve the performance of importing/querying tens of thousands of unique instances of a single node type with 10 properties each and no relationships?
It takes time to update 10 different indexes for each node that you create. Do you really have use cases that require an index for every single property? If not, get rid of the indexes you do not need. Remember, indexes can speed up finding the first node(s) to initiate a query, but they do not help at all when traversing paths through a graph.
If you really need all 10 indexes, then to speed up the importing step, you can: drop all the indexes, import all 50K nodes, and then create each index one at a time (which will take some time for each index). The overall time will be about the same, but the import itself should be much faster.
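The same load-first, index-afterwards ordering can be sketched with Python's built-in sqlite3 module. It is only a stand-in: this is not Neo4j, and the table and column names are hypothetical. All rows go in within one transaction with no secondary indexes, and the 10 indexes are then built one at a time over data that is already loaded.

```python
import sqlite3

COLUMNS = [f"prop{i}" for i in range(10)]  # hypothetical property names


def load_then_index(rows):
    """Bulk-insert rows with no secondary indexes, then build one index per column."""
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"{column} TEXT" for column in COLUMNS)
    conn.execute(f"CREATE TABLE item (id INTEGER PRIMARY KEY, {column_defs})")
    column_list = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    with conn:  # one transaction for the whole import
        conn.executemany(f"INSERT INTO item ({column_list}) VALUES ({placeholders})", rows)
    for column in COLUMNS:
        # each index is built in a single pass over the already-loaded table
        conn.execute(f"CREATE INDEX idx_item_{column} ON item ({column})")
    return conn
```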
It takes the Neo4j browser a very long time to generate and display the visualization for a very large result (e.g., tens of thousands of nodes). The browser is not intended for viewing that much data at one time.
1) Check that you are running a recent version of Neo4j. 3+ has optimised the way that properties are stored and indexed.
2) Check how you're running the query. Maybe your query is not optimised or is problematic in some way. Note in particular that each MATCH generates 'rows'. Multiple MATCH clauses that are not connected to each other will yield the Cartesian product of all matched sets, which could be problematic with large amounts of data.
3) Check that each of these properties needs to be attached to a node. Neo4j is optimised for searching for relationships, not for properties.
Consider turning nodes that look like this:
(:Train {
maxSpeedInKPH: 350,
fuelType: 'Diesel',
numberOfEngines: 3
})
to
(t:Train),
(t)-[:USES_FUEL_TYPE]->(:Fuel {type: 'Diesel'}),
(t)-[:HAS_MAX_SPEED]->(:MaxSpeed {value: 350, unit: 'k/h'}),
(t)-[:HAS_ENGINE]->(:Engine),
(t)-[:HAS_ENGINE]->(:Engine),
(t)-[:HAS_ENGINE]->(:Engine)
There is generally a benefit to spinning properties out into relationships, even when many of the values are distinct. If you have a property with a unique value per node, generally keep it on the node. But if your 50,000 nodes have fewer distinct values in that property, say 25,000, it would probably still be beneficial to spin them out into relationships. This is absolutely the case for integer-type properties, where you can also add "bucket relationships" to provide a form of indexing. In the example above, the max speed was 350. After spinning the property out into a relationship, you could also add a relationship of type [:HAS_MAX_SPEED_ABOVE] to a node representing 300. This would complicate your queries, but should make them faster.
4) If none of the above apply to you, cannot be implemented or do not help, consider switching to a more traditional relational (SQL) database. A relational database would be a perfect candidate for your use case, i.e. 50k separate records (rows) with only 10 different properties (columns) and no relationships (joins).
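The bucket relationships suggested in point 3 act as a coarse index. The Python sketch below models them in memory. BUCKET_WIDTH = 100 is an assumption chosen so that a 350 km/h train lands in the 300 bucket, as in the example, and the helper names are hypothetical.

A query for "faster than X" accepts every bucket whose lower bound is above X without looking at individual values. It checks values only in the single bucket that straddles X.

```python
from collections import defaultdict

BUCKET_WIDTH = 100  # hypothetical bucket size in km/h


def bucket_of(speed):
    """Lower bound of the bucket a speed falls into (350 -> 300)."""
    return (speed // BUCKET_WIDTH) * BUCKET_WIDTH


def build_buckets(trains):
    """trains maps a train name to its max speed; returns bucket lower bound -> set of names."""
    buckets = defaultdict(set)
    for name, speed in trains.items():
        buckets[bucket_of(speed)].add(name)
    return buckets


def faster_than(trains, buckets, threshold):
    """Names of trains whose max speed is strictly greater than `threshold`."""
    result = set()
    for floor, names in buckets.items():
        if floor > threshold:
            result |= names  # every speed in this bucket is >= floor > threshold
        elif floor + BUCKET_WIDTH > threshold:
            result |= {name for name in names if trains[name] > threshold}  # boundary bucket
        # buckets entirely at or below the threshold are skipped without inspection
    return result
```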

Most efficient way to get all connected nodes in neo4j

The answer to this question shows how to get a list of all nodes connected to a particular node via a path of known relationship types.
As a follow up to that question, I'm trying to determine if traversing the graph like this is the most efficient way to get all nodes connected to a particular node via any path.
My scenario: I have a tree of groups (group can have any number of children). This I model with IS_PARENT_OF relationships. Groups can also relate to any other groups via a special relationship called role playing. This I model with PLAYS_ROLE_IN relationships.
The most common question I want to ask is MATCH (n {name: "xxx"})-[*]->(o) RETURN o.name, but this seems to be extremely slow even on a small number of nodes (4000 nodes take 5s to return an answer). Note that the graph may contain cycles ((n)-[:IS_PARENT_OF]->(o), (n)<-[:PLAYS_ROLE_IN]-(o)).
Is connectedness via any path not something that can be indexed?
As a first point, because you are not using a label and an indexed property for your starting node, this query first has to scan ALL the nodes in the graph, opening each node's PropertyContainer to check whether it has a name property with the value "xxx".
Secondly, if you know an approximate maximum depth of the parent hierarchy, you may want to limit the depth of the search.
I would suggest you add a label of your choice to your nodes and index the name property.
Use a label, e.g. :Group, for your starting point and an index on :Group(name).
Then Neo4j can quickly find your starting point without scanning the whole graph.
You can easily see where the time is spent by prefixing your query with PROFILE.
Do you really want all arbitrarily long paths from the starting point? Or just all pairs of connected nodes?
If the latter then this query would be more efficient.
MATCH (n:Group)-[:IS_PARENT_OF|:PLAYS_ROLE_IN]->(m:Group)
RETURN n,m
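The difference between "all arbitrarily long paths" and "all connected nodes" can be made concrete. A variable-length MATCH such as -[*]-> produces one row per path, using each relationship at most once per path. The set of reachable nodes, by contrast, can be collected by visiting each node once.

The Python sketch below is an in-memory model with hypothetical helper names, not Neo4j internals. It counts both on a layered graph where every node links to both nodes of the next layer.

```python
def count_paths(graph, start):
    """Count paths of length >= 1 from `start`, using each relationship at most once per path."""
    def walk(node, used):
        total = 0
        for index, target in enumerate(graph.get(node, [])):
            edge = (node, index)  # identifies one relationship
            if edge in used:
                continue
            total += 1 + walk(target, used | {edge})  # this path plus all its extensions
        return total
    return walk(start, frozenset())


def reachable(graph, start):
    """Distinct nodes reachable from `start`; each node is expanded once, so cycles are harmless."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def layered_graph(layers):
    """Start node 's' plus `layers` layers of two nodes, each linked to both nodes of the next layer."""
    graph = {"s": [("L", 0, 0), ("L", 0, 1)]}
    for i in range(layers - 1):
        for j in (0, 1):
            graph[("L", i, j)] = [("L", i + 1, 0), ("L", i + 1, 1)]
    return graph
```

With 10 layers there are 2046 paths but only 20 reachable nodes. The path count grows exponentially with depth, while the reachable set grows linearly.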
