Neo4j - get all articulation vertices - neo4j

Using Neo4j, I would like to get all the articulation vertices (vertices/nodes that, when removed, split the graph into more connected components) from my graph.
Is there an easy way to do it (without completely re-implementing DFS)?
Alternatively, is it possible to do a traversal that excludes a certain node (and its relationships)? (I have a fairly small number of nodes and use Neo4j embedded, so optimal O() is not critical.)

You could exclude nodes by not continuing past them, e.g. with the Traversal Framework; see http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-traversal.html#_new_traversal_framework. Also, you could implement your own RelationshipExpander that does not expand relationships to the node you want to avoid in a traversal; see http://components.neo4j.org/neo4j/1.5.M01/apidocs/org/neo4j/graphdb/RelationshipExpander.html
HTH
/peter
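For a small graph, the "traverse while excluding one node" idea can be turned into a brute-force articulation check: count the connected components, then count them again with each node excluded in turn. The following is only a minimal sketch in plain Python over an in-memory adjacency map, not Neo4j code; the sample graph and the function names are made up for illustration, and in an embedded setup the adjacency map would have to be filled from your own nodes and relationships.

```python
from collections import deque

def count_components(adjacency, excluded=None):
    """Count connected components, treating `excluded` as if it were deleted."""
    seen = set()
    components = 0
    for start in adjacency:
        if start == excluded or start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                # Never continue past the excluded node.
                if neighbour != excluded and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return components

def articulation_vertices(adjacency):
    """Nodes whose removal increases the number of connected components."""
    baseline = count_components(adjacency)
    return {node for node in adjacency
            if count_components(adjacency, excluded=node) > baseline}

# Hypothetical sample: a triangle a-b-c with a chain c-d-e attached.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(sorted(articulation_vertices(graph)))
```

This runs one full traversal per node, so it is quadratic in the worst case, which should be acceptable for a small graph where optimal complexity is not critical.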

Related

Cypher - unlimited path length and large path length queries hang

I am using Neo4j Community 4.0.4.
I have encountered this issue using the official Bolt driver for Python, but it is also completely reproducible in the Neo4j browser (version 4.0.7).
I have a very simple graph for now, consisting of the following node and relationship types:
(:Document)-[:contains]->(:Block)
(:Block)<-[:prev]-(:Block)-[:next]->(:Block)
There are only 75 nodes in my entire test database for now - 1 Document node and 74 Block nodes.
Running the following Cypher statement brings the CPU to 100% and the memory utilization rises indefinitely, after which I have to kill the session:
match (d:Doc{name: 'doc name'})
optional match (d)-[*]-(n)
return d,n
I also got the Java heap size error at some point.
It only starts to work if I set a strict upper bound on the relationship or specify the direction, e.g.:
optional match (d)-[*..5]->(n)
For example, this already does not work (the answer takes forever so I have to kill the session):
optional match (d)-[*..5]-(n)
Considering that (a) I am doing a strictly local graph traversal that graph databases are supposed to be exceptionally good at, (b) clusters associated with different starting nodes are NOT connected and (c) my test data set is tiny, how can this be happening?
From the symptoms it appears that the engine simply does not keep track of which nodes and relationships were already visited when preparing the results ... or am I missing something?
UPDATE:
This was just answered via the Neo4j community forum by a Neo4j staff member:
https://community.neo4j.com/t/getting-paths-of-any-length-or-long-paths-does-not-work/18298
I wrongly assumed that Cypher would just dynamically switch from the path uniqueness traversal to the node uniqueness traversal just because the operation following the match dealt only with nodes and not with relationships.
Poor assumption on my part: not only does Cypher not do this automatically, there is no way AT ALL in core Cypher to drop a path during traversal if all the nodes in the path were already visited.
The APOC-based solution was suggested:
match (d:Doc{name: 'doc name'})
CALL apoc.path.subgraphNodes(d, {}) YIELD node as n
return d, n
In my case I have disconnected sub-graphs that are tens of thousands of nodes each and are relatively dense. This came up when trying to delete a (:Doc) node and everything that's connected to it before re-loading a new version of the sub-graph into Neo4j:
detach delete d, n
I see this task of "removing the old version before re-loading" as a very common operational task for sub-graphs that many people may have in their use cases... Installing and managing additional libraries (like APOC or the Graph Data Science library) seems like an overkill for something this simple... But it's either that or making the deletions more targeted.
A MATCH clause avoids traversing the same relationship twice, so that would not be the issue. However, it can still travel between the same 2 nodes multiple times (as long as different relationships are used).
The main thing to consider is that variable-length relationship patterns have exponential (time and memory) complexity. If the nodes being traversed have an average of R relevant relationships, then the MATCH clause has to traverse about R**P possible paths of length P. The higher that P gets (especially with no upper bound), the worse it gets. But a high R also hurts.
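To make the blow-up concrete, here is a small sketch in plain Python (not Cypher and not Neo4j code) that counts paths which never reuse a relationship, and compares that with the set of nodes that are actually reachable. It assumes an undirected toy graph with at most one relationship per node pair, which is simpler than the prev/next model in the question; the graph and the function names are illustrative only.

```python
def count_paths(adjacency, start, max_length):
    """Count paths of 1..max_length hops from `start` that never reuse a relationship."""
    def walk(node, used, remaining):
        if remaining == 0:
            return 0
        total = 0
        for neighbour in adjacency[node]:
            edge = frozenset((node, neighbour))  # undirected relationship
            if edge in used:
                continue
            # One path ends here, plus every longer path that extends it.
            total += 1 + walk(neighbour, used | {edge}, remaining - 1)
        return total
    return walk(start, frozenset(), max_length)

def reachable(adjacency, start):
    """Nodes reachable from `start`, visiting each node only once."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen - {start}

# Toy graph: 6 nodes, every node connected to every other node.
nodes = range(6)
graph = {n: [m for m in nodes if m != n] for n in nodes}

for hops in range(1, 6):
    print(hops, count_paths(graph, 0, hops))
print("distinct reachable nodes:", len(reachable(graph, 0)))
```

Even on 6 nodes the number of paths grows by roughly a constant factor per extra hop, while the number of distinct nodes reached stays at 5. That gap is what a node-uniqueness expansion such as apoc.path.subgraphNodes avoids paying for.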

Neo4J using properties on relationships for quicker lookup?

I am currently trying to use Neo4j to perform a complex query (similar to a shortest-path search, except that I apply some unusual conditions to the search, such as a minimum path length in terms of the number of nodes traversed).
My dataset contains around 2.5M nodes of a single type and around 1.5 billion edges (also of a single type). Each node has on average 1000 directed relationships to a "next" node.
I currently have a query that retrieves this shortest path under all of my conditions, but the only way I have found to get a decent response time (under one second) is to limit the number of results after each new node is added to the path: filter them, order them, and then proceed to the next node (a kind of greedy algorithm, I suppose).
I'd like to limit them a lot less than I do in order to yield more paths as a result, but the problem is the exponential complexity of this search: going from LIMIT 40 to LIMIT 60 usually costs x10 ~ x100 in processing time.
That said, I am currently evaluating several solutions to speed up the query, but I'm quite unsure of the results they will yield, as I'm not sure how Neo4j really stores my data internally.
The solution I am currently considering is to add a property to my relationships, an integer between 1 and 15, because I will usually only query relationships that have one or at most two different values for this property (for example, only relationships where this property is 8 or 9).
My current guess is that, for each relationship, Neo4j has to fetch the properties of the original node and use them to apply my further filters, which takes a very long time when crossing a path 4 nodes long with 1000 relationships each (I guess O(1000^4)). Am I right?
With relationship properties, will it have direct access to them without further data fetching? Is there any chance this will make my queries faster? How are Neo4j relationship properties stored?
UPDATE
Following @logisima's advice, I've written a procedure directly with the Java traversal API of Neo4j. I then switched to the raw Java procedure API of Neo4j to gain even more power and flexibility, as my use case required it.
The results are really good: the lower-bound complexity is overall a little lower than it was before, but the upper bound is about ten times faster, and when at least some of the nodes used for the traversal are in Neo4j's cache, the performance becomes astonishing (depth 20 in less than a second in one of my tests, when I usually only need depth 4).
But that's not all. Procedures are very easy to customise while keeping performance at its best and optimising every single operation. The result is that I can use far more powerful filters in far less computing time and can easily update my procedure to add new features. Last but not least, procedures plug in very easily to spring-data for Neo4j (which I use to connect Neo4j to my HTTP API), whereas with Cypher I would have had to auto-generate the queries (they were so complex that it took around 30 Java classes to do it properly) and use JDBC for Neo4j while handling a separate connection pool only for this request. I cannot recommend the awesome Neo4j Java API enough.
Thanks again @logisima
If you're trying to do a custom shortest-path algorithm, then you should write a Cypher procedure with the traversal API.
The principle of Cypher is pattern matching, whereas you want to traverse the graph in a specific way to find a good solution.
The response time should be much faster for your use case!

neo4j shortestPath algorithm

I have a question about the shortestPath algorithm in Neo4j.
If I have a graph with 10^6 nodes and each node has 1000 relationships, searching for the shortest path up to 4 levels deep must examine 1000*1000*1000*1000=10^12 nodes, which is more than the total number of nodes. The reason is that some nodes are visited repeatedly during the search. My question is whether, in Neo4j's shortestPath algorithm, this search takes the time of touching 10^6 nodes or 10^12 nodes. In other words, does it mark nodes that have already been searched so that it does not search them again?
Thanks
I don't believe that kind of pruning is used. In Cypher, the default uniqueness for traversals is RELATIONSHIP_PATH: within each path, a relationship must be unique, they can't be reused.
You might try using either the shortestPath proc in the Graph Algorithms project or one of APOC Procedures' path expander procs instead.
With APOC path expanders, you can either set the uniqueness yourself to NODE_GLOBAL (which prevents processing of the same nodes multiple times during all expansions), or use one of the procs that already does this under the hood (subgraphNodes(), subgraphAll(), or spanningTree()).
The gotchas (at the moment) with APOC are that you can't currently supply the end nodes of the expansion (you'll have to expand out to nodes with certain defined labels and filter your results after with a WHERE clause), and expansions only go in one direction (from start node out) instead of bi-directional (such as from cypher's shortestPath()), so you won't realize any efficiency improvements that can happen from expanding from the other direction.
I currently have a PR on APOC to supply known end nodes of the expansion, so that should make it into the next APOC release (within the next week or so).
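As an illustration of what a node-global visited set buys, here is a minimal breadth-first shortest-path sketch in plain Python. It is not how Neo4j's shortestPath() or the APOC expanders are implemented; it only shows that once every node is marked on first contact, the number of nodes touched is bounded by the size of the graph rather than by the number of paths. The sample graph and names are hypothetical.

```python
from collections import deque

def shortest_path(adjacency, source, target):
    """Return (path, nodes_touched); path is None if target is unreachable."""
    parent = {source: None}  # doubles as the global "already visited" set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], len(parent)
        for neighbour in adjacency[node]:
            if neighbour not in parent:  # each node is enqueued at most once
                parent[neighbour] = node
                queue.append(neighbour)
    return None, len(parent)

# Hypothetical dense graph: 20 nodes, each pointing at the next 5 (wrapping around).
graph = {n: [(n + k) % 20 for k in range(1, 6)] for n in range(20)}

path, touched = shortest_path(graph, 0, 13)
print(path, touched)
```

Because each node enters the queue at most once, the work is proportional to nodes plus relationships, regardless of how many distinct paths lead to the same node.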

Neo4J Dictionary Dense Node?

I am building a Neo4J graph that needs to contain a controlled vocabulary (namely the Getty AAT Thesaurus). Whenever I add a new term from the thesaurus I have a relationship:
(aat:Thesaurus)-[:LISTS]->(term:Term {term:"Something"})
I have read a bit about the dense node problem in Neo4j and am wondering whether having 100,000 t-[:LISTS]->term relationships will cause a problem as our database grows. Any ideas?
If there is only a single Thesaurus node, then you can get rid of the Thesaurus node and the LISTS relationships.

using neo4j get spanning tree from graph with loops

I want to find a spanning tree of a graph with loops. I cannot use a regular BFS traversal here, so I checked the allsimplepaths Java API function; it seems to find loops between two nodes. Right now I select a random root, but I don't know the end points. So I just want to get a spanning tree from the graph, even though it may have many loops; it should be converted to a DAG and then give the tree structure. The graph may have more than one spanning tree.
How can I do this? Can allsimplepaths be applied here?
Look at TraversalDescription with an appropriate uniqueness (NODE_GLOBAL) and Path-Expanders that follow the interesting relationships.
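The effect of NODE_GLOBAL uniqueness can be sketched outside Neo4j: if every node may be entered only once, the relationships through which nodes are first reached form a spanning tree, no matter how many loops the graph has. The snippet below is a plain Python illustration over an in-memory adjacency map, not the TraversalDescription API; the sample graph and the function name are made up.

```python
from collections import deque

def spanning_tree(adjacency, root):
    """Return the (parent, child) edges of one spanning tree rooted at `root`."""
    seen = {root}
    edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            # Relationships leading back to an already visited node close a
            # loop, so they are skipped and never become tree edges.
            if neighbour not in seen:
                seen.add(neighbour)
                edges.append((node, neighbour))
                queue.append(neighbour)
    return edges

# Hypothetical graph with several loops (1-2-3, 2-3-4, 1-2-4-3).
graph = {
    1: [2, 3],
    2: [1, 3, 4],
    3: [1, 2, 4],
    4: [2, 3],
}
print(spanning_tree(graph, 1))
```

Which of the possible spanning trees you get depends on the root and on the order in which relationships are expanded; a depth-first order would generally produce a different, equally valid tree.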

Resources