Neo4J Dictionary Dense Node? - neo4j

I am building a Neo4J graph that needs to contain a controlled vocabulary (namely the Getty AAT Thesaurus). Whenever I add a new term from the thesaurus I have a relationship:
(aat:Thesaurus)-[:LISTS]->(term:Term {term:"Something"})
I have read a bit about the dense node problem in Neo4j and am wondering whether having 100,000 t-[:LISTS]->term relationships on a single node will cause a problem as our database grows. Any ideas?

If there is only a single Thesaurus node, then you can get rid of the Thesaurus node and the LISTS relationships. The Term label already identifies every term in the vocabulary.
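A minimal sketch of what this could look like, assuming the Term label and term property from the question and no Thesaurus hub node. A uniqueness constraint or index on the term property would normally back the lookup, but its syntax depends on the Neo4j version, so it is left out here.

```cypher
// Assumption: every vocabulary entry carries the :Term label from the question,
// so no (aat:Thesaurus) hub node and no LISTS relationships are needed.

// Add a term only if it is not in the graph yet.
MERGE (term:Term {term: "Something"})
RETURN term;

// List the whole vocabulary by scanning the label instead of
// expanding 100,000 relationships from a single dense node.
MATCH (term:Term)
RETURN term.term AS term
ORDER BY term;
```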

Related

Find all nodes with two-way relationships starting from one specific node using cypher in neo4j

[Image: neo4j nodes and relationships]
This is quite a tough job. I'm trying to find all nodes with two-way relationships starting from a specific node. Based on the image above, I would like to find all two-way relationships starting from node 1. Only nodes with two-way relationships match. For example, nodes 1, 3, 4 match and nodes 1, 2, 3 match, as two separate groups. However, if nodes 2 and 4 had a two-way relationship, then nodes 1, 2, 3, 4 would match as one group. The main idea is that all nodes in such a group are linked to each other both ways.
My idea is to find all nodes with two-way relationships starting from node 1 and continue processing from there, but I can't work out how to continue. Can anyone help me with this problem? Thanks a lot. By the way, only the largest 'two-way circle' is needed.
Your problem looks a lot like finding strongly connected components in the graph. As defined in the docs:
"A directed graph is strongly connected if there is a path between all pairs of vertices (nodes). This algorithm treats the graph as directed, so the direction of the relationship is important and a strongly connected component exists only if there are relationships between nodes in both directions."
Check out more in the documentation. You will need neo4j-graph-algorithms.
Example query that writes the component of each node back to the node as a property:
CALL algo.scc('Label','C', {write:true,partitionProperty:'partition'})
YIELD loadMillis, computeMillis, writeMillis, setCount, maxSetSize, minSetSize
You can then find your biggest component with the following query:
MATCH (u:Label)
RETURN distinct(u.partition) as partition,count(*) as size_of_partition
ORDER by size_of_partition DESC LIMIT 1
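As a cross-check on the component result, the two-way condition from the question can also be written as a plain pattern match. A strongly connected component only requires paths in both directions, not direct relationships, so the two results may differ. This is a sketch, assuming the node label Label and relationship type C from the example above, plus a hypothetical id property on the nodes.

```cypher
// Assumptions: label :Label and relationship type :C as in the algo.scc call;
// `id` is a hypothetical node property used to pick the start node.

// Nodes that are linked with node 1 in both directions.
MATCH (start:Label {id: 1})-[:C]->(other:Label)-[:C]->(start)
RETURN DISTINCT other.id AS mutual_neighbour;

// Pairs of those neighbours that are also linked with each other both ways,
// i.e. groups of three such as (1, 3, 4) in the question.
MATCH (start:Label {id: 1})-[:C]->(a:Label)-[:C]->(start),
      (start)-[:C]->(b:Label)-[:C]->(start),
      (a)-[:C]->(b)-[:C]->(a)
WHERE a.id < b.id
RETURN a.id AS first, b.id AS second;
```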

Neo4j, do nodes without relationships affect performance?

How do nodes without relationships affect performance?
The input stream contains duplicate nodes and once I've determined that a node is not of interest I'd like a short-hand way to know that I've already seen this node and want to disregard it.
If I store one instance of the node in the db without any relationships will it impact performance? Potentially the number of relationship-less nodes is very large.
Usually these don't affect performance. They take up space on disk but will not be loaded if you don't access them, and as you don't traverse them it doesn't matter that much.
I would still skip them. With neo4j-import there is a --skip-duplicate-nodes option, and with LOAD CSV or Cypher in general there is the MERGE clause, which only creates a new node if it is not already there.
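A minimal sketch of the MERGE approach, assuming each input record has a unique key. The Item label, the id and seen properties, and the $id parameter are hypothetical names chosen for illustration.

```cypher
// Assumption: :Item, `id`, `seen` and the $id parameter are illustrative names,
// not part of the original question. Older Neo4j versions write parameters as {id}.

// Create the node the first time its key is seen; on a repeat, reuse the
// existing node instead of storing a duplicate.
MERGE (n:Item {id: $id})
ON CREATE SET n.seen = 1
ON MATCH SET n.seen = n.seen + 1
RETURN n.seen AS times_seen;
```

A times_seen value greater than 1 tells the importer that the record has been seen before and can be disregarded.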

What would be better for performance in a Neo4J database, many relationships from nodes or reducing relationships by copying subgraphs?

I am considering using Neo4j to track multiple users' content that is organized in a graph structure. So a user would create a graph "A", but then another user could link their own content in their own graph "B" to a node in graph "A". Eventually I could have X number of users and hence X relationships stemming from a single node in graph "A" into other graphs. So at some point, would it be better to copy the nodes from the "A" graph into a new subgraph that "B" can link off of and then own?
It seems to be a relationship indexing versus node indexing problem.
I also heard that newer versions of Neo4J will improve relationship traversal through hash maps or potentially b-trees, which would speed up relationship lookups.
I would go for the most intuitive representation (no copying). Design what is best for your domain and do the optimizations later if needed. I recommend reading the chapter 'Avoiding Anti-Patterns' in the Graph Databases book.

Modelling alternatives and performance when traversing a tree structure in Neo4J

I modelled a tree structure using the Neo4J graph database. Each node represents a category with a characterising name. I have to traverse the tree very often from the root to a specific node / category. The target node is determined by an input list containing the names of the categories on the path from the root to the target node.
I wonder whether it would be more efficient to store these names as the types of the edges instead of as a name property on the nodes.
My thinking is that Neo4J would then not have to check the name property of every child node at each step down the tree. Instead, it could look up the name in the map that contains the outgoing edges.
What do you think?
Sounds sensible. How many different names do you have? If it is just categories those shouldn't be millions.
Did you load your data into the graph and run a performance comparison between both approaches? Is it a performance critical thing in your graph?
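The two modelling alternatives could be compared with queries along these lines. This is only a sketch: the Category label, the HAS_CHILD relationship type and the category names "root", "books" and "fiction" are assumptions made for illustration.

```cypher
// Assumption: :Category, :HAS_CHILD and the names below are hypothetical.

// Variant 1: the category name is a node property, so each step filters
// the child nodes by their name.
MATCH (root:Category {name: "root"})-[:HAS_CHILD]->(a:Category {name: "books"})
      -[:HAS_CHILD]->(b:Category {name: "fiction"})
RETURN b;

// Variant 2: the category name is the relationship type, so each step
// follows only relationships of the requested type.
MATCH (root:Category {name: "root"})-[:books]->()-[:fiction]->(target)
RETURN target;
```

Prefixing each query with PROFILE shows how much work each variant does on the real data, which helps with the comparison suggested above.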

Neo4j - get all articulation vertices

Using Neo4j, I would like to get all the articulation vertices (vertices/nodes that, when removed, split the graph into more connected components) of my graph.
Is there an easy way to do it (without completely re-implementing DFS)?
Alternatively, is there a possibility to do a traversal with the exclusion of a certain node? (and its relationships) (I have a fairly small number of nodes, using neo4j embedded so optimal O() is not critical)
You could exclude nodes by not continuing past them, e.g. with the Traversal Framework; see http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-traversal.html#_new_traversal_framework. Also, you could implement your own RelationshipExpander that does not expand relationships to the node you want to avoid in a traversal; see http://components.neo4j.org/neo4j/1.5.M01/apidocs/org/neo4j/graphdb/RelationshipExpander.html
HTH
/peter
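The same idea of not continuing past a node can also be sketched in Cypher rather than with the Java traversal API. This is an illustration, not the approach from the answer. It assumes a hypothetical id property on the nodes and hypothetical parameters $excluded, $from and $to, with the excluded node different from both endpoints.

```cypher
// Assumption: `id`, $excluded, $from and $to are illustrative names.
// Look for a path between two nodes that never passes through the excluded node.
MATCH (x {id: $excluded}), (a {id: $from}), (b {id: $to})
MATCH p = shortestPath((a)-[*]-(b))
WHERE NONE(n IN nodes(p) WHERE n = x)
RETURN p;
```

If a and b are connected in the full graph but this query returns no row, removing the excluded node separates them, which makes it a candidate articulation vertex. On a small graph, as described in the question, repeating this check for pairs of a node's neighbours may be affordable.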
