Are there built-in algorithms in Neo4j for community detection? - neo4j

Gephi uses the Louvain algorithm for community detection in graphs. Are there built-in algorithms in Neo4j, like Gephi's Louvain, for community detection? As far as I can find in Neo4j's documentation, there are only shortest-path algorithms.

No, it doesn't.
Besides shortest path there are also Dijkstra and A*, and the traversal framework provides facilities for writing such algorithms yourself (at least the Dijkstra implementation uses it).
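Dijkstra's algorithm computes the cheapest path cost from a source node to every reachable node, provided all edge weights are non-negative. Below is a minimal, generic Python sketch over an adjacency dict. The `graph` layout and node names are hypothetical. This is not Neo4j's implementation and it does not use the traversal framework API.

```python
import heapq


def dijkstra(graph, source):
    """Shortest path costs from source.

    graph: dict mapping node -> list of (neighbour, non-negative weight).
    Returns a dict mapping each reachable node -> its distance from source.
    """
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd  # relax edge u -> v
                heapq.heappush(heap, (nd, v))
    return dist
```

For example, `dijkstra({"a": [("b", 1), ("c", 4)], "b": [("c", 2)]}, "a")` returns `{"a": 0, "b": 1, "c": 3}`. The route through `b` beats the direct edge to `c`.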

Related

Neo4j K-means algorithm

Hello Stack Overflow community,
I really need help with something.
I want to apply a community detection algorithm to a graph that contains distances between people (a social network).
Does the Neo4j k-means community detection algorithm work with this type of graph?
The k-means algorithm in the Graph Data Science (GDS) library is in the alpha tier. That means it is at an early stage of development and may still change significantly.
Here is the documentation: https://neo4j.com/docs/graph-data-science/current/algorithms/alpha/kmeans/
Enjoy reading it!

Neo4j Community detection

Hello Stack Overflow community,
I am working on a school project that uses a Neo4j database. I need help from members who have worked with Neo4j GDS before to find a solution to my problem.
I want to apply a community detection algorithm called "Newman-Girvan", but there is no algorithm with this name in the Neo4j GDS library. I found an algorithm called "Modularity Optimization". Is it the Newman-Girvan algorithm under a different name, or is it a different algorithm?
Thanks in advance.
I haven't used the Newman-Girvan algorithm. However, it is a hierarchical algorithm with a dendrogram output, which suggests you can use comparable GDS algorithms, specifically Louvain or the newer Leiden. Leiden has the advantage of enforcing the generation of intermediate communities. I've used both algorithms with multigraphs; I believe this capability was only introduced with GDS 2.x.
The documentation on the algorithms is at
https://neo4j.com/docs/graph-data-science/current/
https://neo4j.com/docs/graph-data-science/current/algorithms/alpha/leiden/
multigraph:
https://neo4j.com/docs/graph-data-science/current/graph-project-cypher-aggregation/
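On the original question: as far as I know, the Girvan-Newman algorithm repeatedly removes the edge with the highest betweenness and records the resulting splits as a dendrogram. GDS's Modularity Optimization, like Louvain and Leiden, instead searches directly for a partition with high modularity. The two are connected through modularity, the quality score Newman and Girvan proposed for choosing the best level of the dendrogram.

The sketch below computes that score for a given partition of an undirected, unweighted graph in plain Python. The function name and input format are hypothetical, and this is not the GDS API.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community id.
    Uses Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where
    m is the number of edges, L_c the number of edges inside c and d_c the
    total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Consider two triangles joined by a single edge. Splitting them into their two triangles gives Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0. A higher Q means denser-than-expected communities.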

Using Mahout for anomaly detection

Can anyone tell me whether there is a good library for doing anomaly detection with Mahout?
Among other algorithms, Mahout has an OnlineSummarizer, which uses the T-Digest algorithm to compute online descriptive statistics. For an example of using the OnlineSummarizer for anomaly detection, see: Strata 2014-anomaly-detection.
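The quantile-threshold idea behind this approach can be shown without Mahout. Track a high quantile of the values seen so far, then flag any new value above it. The minimal Python sketch below swaps T-Digest for exact quantiles over a sorted list. That is simpler, but memory grows with the stream, which is exactly what T-Digest avoids. The class name and the `quantile`/`warmup` parameters are hypothetical, and this is not Mahout's OnlineSummarizer API.

```python
import bisect


class QuantileAnomalyDetector:
    """Flags values that exceed a running high quantile of the values seen so far.

    Stand-in for a T-Digest-backed summary: every value is kept in a sorted
    list, so quantiles are exact but memory grows with the length of the stream.
    """

    def __init__(self, quantile=0.99, warmup=100):
        self.quantile = quantile      # hypothetical default: flag values above the 99th percentile
        self.warmup = max(warmup, 1)  # number of values to collect before flagging anything
        self.history = []             # all observed values, kept sorted

    def threshold(self):
        # Simple index-based quantile over the sorted history.
        idx = min(len(self.history) - 1, int(self.quantile * len(self.history)))
        return self.history[idx]

    def observe(self, value):
        """Return True if value looks anomalous, then add it to the history."""
        is_anomaly = len(self.history) >= self.warmup and value > self.threshold()
        bisect.insort(self.history, value)
        return is_anomaly
```

For example, after feeding 200 values cycling through 0 to 9, a value of 1000.0 is flagged and a following 5.0 is not.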

Sharding a Neo4j graph, min-cut

I've heard of a max-flow min-cut method for sharding or segmenting a graph database. Does anyone have a sample Cypher query that can do that, say against the MovieLens dataset? Basically, I want to segment users into different shards/clusters based on what they like. Maybe the min cuts can naturally find clusters of users around genres such as Horror or Drama. Or maybe it will create non-intuitive clusters/segments like hipster/romantic and conservative/comedy/horror groups.
My short answer is no; sorry, I don't know how you would express that.
My longer answer is that even if this were possible (which it may well be), I would advise against it.
Multiple algorithms 'do' min-cut/max-flow, and they all have different performance characteristics. Because clustering is computationally expensive, I'd guess you want control over the specific algorithm implementation used.
Cypher is a declarative language: you specify what you're looking for but not how to find it. It will be difficult to express such a complex problem in a way that lets the Cypher engine figure out what you're trying to do. That makes it hard for Cypher (or any declarative language engine) to produce an efficient query plan.
My suggestion is to find the specific algorithm you wish to use and implement it using the Neo4j Java API.
If you're running Neo4j in embedded mode, you're done at that point. If you're running Neo4j server, you'll then just have to run that code as an Unmanaged Server Extension.
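Here is one of those algorithms, Edmonds-Karp (breadth-first augmenting paths), as a minimal plain-Python sketch rather than Neo4j Java API code. It returns the maximum flow value and the source side of a minimum cut, meaning the nodes still reachable from the source in the residual graph. The `edges` tuple format and the function name are hypothetical. An s-t min cut only separates two chosen nodes, so splitting users into several clusters would need additional logic not shown here.

```python
from collections import defaultdict, deque


def max_flow_min_cut(edges, source, sink):
    """Edmonds-Karp max flow on a directed graph.

    edges: list of (u, v, capacity) tuples.
    Returns (flow value, set of nodes on the source side of a minimum cut).
    """
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse residual edge exists
    flow = 0
    while True:
        # BFS for the shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left; parent holds every node reachable from source.
            return flow, set(parent)
        # Bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: reduce forward capacities, increase reverse capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

In a graph where nodes 1 and 2 each have only a capacity-1 edge into the sink, the sketch reports a flow of 2. It places nodes 0, 1 and 2 on the source side of the cut.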
AFAIK, what you're after are 'community detection' algorithms. There are non-overlapping variants (communities do not overlap) and overlapping variants; non-overlapping ones are generally easier to implement and understand. Common algorithms are:
Non-overlapping: Louvain
Overlapping: Label Propagation Algorithm (LPA) (typically non-overlapping, but there are extensions to make it overlapping)
Here are a few C++ code examples for the algorithms: Louvain, OSLOM (overlapping), LPA (non-overlapping), and Infomap.
And if you want something bleeding-edge, the SCD algorithm was recommended to me:
Academic paper: "High Quality, Scalable and Parallel Community Detection for Large Real Graphs"
C++ implementation
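To show how the Label Propagation Algorithm works, here is a minimal Python sketch of the basic, non-overlapping, asynchronous variant. Every node starts with its own label. Nodes are visited in random order and each adopts the label most common among its neighbours, breaking ties at random. The sweeps stop when no node changes. The function name, `max_iter`, and `seed` are hypothetical choices, and this is not any of the C++ implementations linked above.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iter=100, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a dict mapping node -> community label.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts in its own community
    order = list(adjacency)
    for _ in range(max_iter):
        rng.shuffle(order)  # visit nodes in a fresh random order each sweep
        changed = False
        for node in order:
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in neighbours)
            best = max(counts.values())
            candidates = [label for label, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # already holds a most frequent neighbour label; keep it
            labels[node] = rng.choice(candidates)  # break ties at random
            changed = True
        if not changed:
            break  # stable: every node holds a most frequent neighbour label
    return labels
```

Because labels only spread along edges, two disconnected triangles always end up with one label per triangle. On real graphs the random visiting order means different seeds can give different partitions.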

Does MapR have scalable machine learning algorithms like Mahout?

Specifically, I am wondering whether MapR has k-means clustering like Mahout does.
As far as I know, MapR is only a "faster" Hadoop, with no algorithms included.
So your jobs should be compatible.
But why not implement your own? K-means is very simple. See my blog post:
http://codingwiththomas.blogspot.com/2011/05/k-means-clustering-with-mapreduce.html
However, I have also implemented k-means clustering with BSP (Bulk Synchronous Parallel) and Apache Hama. It is almost ten times faster than the Mahout benchmark results in this book: http://www.manning.com/ingersoll/ (linked JIRA: https://issues.apache.org/jira/browse/MAHOUT-588)
Here is the benchmark of k-means with Apache Hama: http://wiki.apache.org/hama/Benchmarks
You can find it here:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/KMeansBSP.java
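One k-means iteration maps naturally onto MapReduce. The map step assigns each point to its nearest centroid, emitting (centroid index, point). The reduce step averages the points per centroid to get the new centroids. The driver repeats this until the centroids stop moving.

Below is a minimal single-machine Python sketch of that structure, with plain functions standing in for the map and reduce phases. It is not the code from the blog post, Mahout, or Apache Hama, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(points, centroids):
    """Emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Average the points assigned to each centroid; empty clusters keep their old centroid."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    new_centroids = []
    for i, c in enumerate(centroids):
        members = groups.get(i)
        if not members:
            new_centroids.append(c)
        else:
            new_centroids.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(len(c)))
            )
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means driven as repeated map/reduce rounds."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` converges to `[(0.0, 0.5), (10.0, 10.5)]`. In a real Hadoop job, each map_phase/reduce_phase pair would be one job run, with the centroids passed between runs.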
