Neo4j Hamiltonian path (TSP)

Being a newbie with graphs, I'm wondering whether it's possible to use Neo4j to calculate an optimal route that passes through all entered waypoints (distances are the weights of the edges).
I'm familiar with using A* and Dijkstra to find shortest/cheapest paths, but haven't found an easy way to do this. Since the number of nodes for each calculation will be relatively small (< 30), I'm primarily hoping for ease of implementation with Neo4j (if possible) compared to coding the solution from scratch in Node.js, since I guess performance won't be a problem at this scale.
Thank you for your time!

For this you should take a look at the Gremlin traversal language, which Neo4j also implements. Pure Cypher on its own won't be of any help.
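
If the waypoints and their pairwise distances can be pulled out of the database into a plain matrix, one option is to solve the route outside Neo4j. Below is a minimal sketch of the Held-Karp dynamic program for the cheapest path that visits every node exactly once, with a free start and end. It is not a Neo4j or Gremlin feature; the function name and the matrix input are assumptions made for illustration. Its memory grows as 2^n * n, so in pure Python it is only realistic for a handful up to very roughly 15 to 20 waypoints; closer to 30 you would likely need a heuristic instead.

```python
def shortest_hamiltonian_path(dist):
    """Held-Karp DP: cheapest path visiting every node exactly once.

    dist is a square matrix; dist[i][j] is the weight of the edge i -> j.
    Returns (total_cost, order_of_nodes). Start and end nodes are free.
    """
    n = len(dist)
    if n == 0:
        return 0, []
    inf = float("inf")
    # best[mask][j]: cheapest path that visits exactly the nodes in `mask`
    # (a bitset) and ends at node j.
    best = [[inf] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for j in range(n):
        best[1 << j][j] = 0  # a path consisting of node j alone

    for mask in range(1 << n):
        for j in range(n):
            cost = best[mask][j]
            if cost == inf:
                continue
            for k in range(n):
                if mask & (1 << k):
                    continue  # k already visited
                nxt = mask | (1 << k)
                cand = cost + dist[j][k]
                if cand < best[nxt][k]:
                    best[nxt][k] = cand
                    parent[nxt][k] = j

    full = (1 << n) - 1
    end = min(range(n), key=lambda j: best[full][j])
    total = best[full][end]

    # Walk the parent pointers back from the end node to recover the order.
    order = []
    mask, j = full, end
    while j != -1:
        order.append(j)
        prev = parent[mask][j]
        mask ^= 1 << j
        j = prev
    order.reverse()
    return total, order
```

For example, with four waypoints on a line at positions 0, 3, 1 and 2, passing `[[abs(a - b) for b in pos] for a in pos]` as `dist` should give a total cost of 3, visiting the points in positional order (or its reverse).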

Related

Neo4j Community detection

Hello Stack Overflow community,
I am working on an academic project using the Neo4j database, and I need help from members who have worked with Neo4j GDS before to find a solution to my problem.
I want to apply a community detection algorithm called "Newman-Girvan", but there is no algorithm with this name in the Neo4j GDS library. I found an algorithm called "Modularity Optimization": is it the Newman-Girvan algorithm under a different name, or is it a different algorithm?
Thanks in advance.

I've not used the Newman-Girvan algorithm, but the fact that it's a hierarchical algorithm with a dendrogram output suggests you can use comparable GDS algorithms, specifically Louvain or the newer Leiden. Leiden has the advantage of enforcing the generation of intermediary communities. I've used both algorithms with multigraphs; I believe this capability was only introduced with GDS v2.x.
The documentation on the algorithms is at
https://neo4j.com/docs/graph-data-science/current/
https://neo4j.com/docs/graph-data-science/current/algorithms/alpha/leiden/
multigraph:
https://neo4j.com/docs/graph-data-science/current/graph-project-cypher-aggregation/
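
Whichever of these algorithms you pick, it can help to compare their outputs with the same score. A common one is modularity, which Louvain and Leiden both try to maximise. The sketch below computes it in plain Python for an undirected graph without self-loops; it does not call GDS, and the function name and input shapes are assumptions made for illustration.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community id.
    Uses Q = sum over communities of [ L_c / m - (d_c / (2m))^2 ], where
    m is the number of edges, L_c the number of edges inside community c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}   # community id -> number of internal edges
    degree = {}  # community id -> total degree of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())
```

For two triangles joined by a single bridge edge, splitting the nodes into the two triangles gives Q = 5/14, while putting all nodes in one community gives Q = 0.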

How do I specify that a subset of nodes needs to be fully connected?

Consider a number of nodes with some connections between them.
My model's task is to color the nodes. One of the conditions is that the black nodes form a fully-connected set.
How do I code that?
NB: in case it matters: the connections between symbols are a precondition.

What have you tried? Stack Overflow works best if you show what you tried and where you got stuck. Depending on how you model your graph, there could be many different ways to do this.
Here’s a hint to get you started: in programming with z3, you usually write the code that “checks” that the nodes are fully connected. Then, through the magic of constraint solving, that causes the solver to provide models that satisfy those criteria. So, start with modeling your graph and how you can check that the same-colored nodes are connected.
Note that hard problems like graph coloring, clique finding, isomorphisms etc remain hard in this realm too. They’re easier to code perhaps, but you shouldn’t expect better performance than exhaustive search for large instances on average; unless your graphs have special structure that the solver can exploit. But in that case, you’re better off using a custom algorithm anyhow, instead of relying on a general purpose SMT solver. Of course, this all depends on what your main goal is. It’s best to try multiple approaches and pick the one that performs the best.
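
To make the "write the check" idea concrete, here is a small sketch in plain Python rather than z3: a predicate that tests whether the black nodes are pairwise connected, plus an exhaustive enumeration that stands in for the solver. The function names and the two-color palette are assumptions made for illustration; with a solver you would state the same pairwise condition as constraints instead of enumerating.

```python
from itertools import combinations, product


def black_nodes_fully_connected(coloring, edge_set):
    """True if every pair of black nodes is joined by an edge.

    coloring: dict node -> color name.
    edge_set: set of frozenset({u, v}) undirected edges.
    """
    black = [node for node, color in coloring.items() if color == "black"]
    return all(frozenset((a, b)) in edge_set
               for a, b in combinations(black, 2))


def valid_colorings(nodes, edges, colors=("black", "white")):
    """Yield every coloring whose black nodes form a fully connected set.

    Brute force over len(colors) ** len(nodes) assignments, so this is
    only meant for tiny graphs.
    """
    edge_set = {frozenset(e) for e in edges}
    for assignment in product(colors, repeat=len(nodes)):
        coloring = dict(zip(nodes, assignment))
        if black_nodes_fully_connected(coloring, edge_set):
            yield coloring
```

On a graph with nodes a, b, c, d and edges a-b, b-c, a-c, c-d, this accepts the coloring where a, b and c are black, and rejects any coloring where both a and d are black, since they share no edge.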

path planning -> ways from goal to initial state?

The problem: is it true that finding a path from the goal to the start point is much more efficient than finding a path from the start to the goal?
If this is true, can someone help me out and explain why?
My opinion:
It shouldn't be different, because finding a way from goal to start is just like renaming goal to start and start to goal.

The answer to your question depends entirely on the pathfinding algorithm you use.
One of the best-known pathfinding algorithms, A-Star (or A*), is commonly used in a reverse sense. It all has to do with the heuristic. Since we usually use proximity as the heuristic for the algorithm, we can get stuck behind obstacles. These obstacles might, however, be easier to deal with from the other direction. A great explanation with examples can be found here. Just for clarity: if there is no certain knowledge of obstacles, then there is no predictable difference between forwards and backwards pathfinding by A*.
Another reason why you might want to reverse the pathfinding is if you have multiple actors trying to reach the same goal. Instead of having to execute A*, or another pathfinding algorithm, for every actor, you can combine them into a single execution of a graph-exploring pathfinding algorithm. For example, a variation on Dijkstra's algorithm could find the shortest distances to all actors in one graph exploration.
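
A minimal sketch of that multi-actor case, assuming a directed graph with non-negative weights stored as an adjacency dict (the function name and data layout are illustrative): run Dijkstra once from the goal over the reversed edges, and the result holds the distance from every node to the goal.

```python
import heapq


def distances_to_goal(graph, goal):
    """Shortest distance from every node TO `goal`, in one exploration.

    graph: dict node -> list of (neighbor, weight) directed edges,
    with non-negative weights.
    Returns a dict node -> distance; unreachable nodes are absent.
    """
    # Reverse every edge so that searching outward from the goal
    # follows the original edges backwards.
    reverse = {}
    for u, neighbors in graph.items():
        reverse.setdefault(u, [])
        for v, w in neighbors:
            reverse.setdefault(v, []).append((u, w))

    dist = {goal: 0}
    heap = [(0, goal)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for prev, w in reverse.get(node, []):
            nd = d + w
            if nd < dist.get(prev, float("inf")):
                dist[prev] = nd
                heapq.heappush(heap, (nd, prev))
    return dist
```

Each actor then looks up its own position in the returned dict instead of running a separate search.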

sharding a neo4j graph, min-cut

I've heard of a max-flow min-cut method for sharding or segmenting a graph database. Does someone have a sample Cypher query that can do that, say against the MovieLens dataset? Basically I want to segment users into different shards/clusters based on what they like, so maybe the min cuts can naturally find clusters of users around genres such as Horror or Drama, or maybe they will create non-intuitive clusters/segments like hipster/romantics and conservative/comedy/horror groups.

My short answer is no, sorry; I don't know how you would express that.
My longer answer is that even if this were possible, which it very well may be, I would advise against it.
Multiple algorithms 'do' min-cut max-flow. These all have different performance characteristics and, because clustering is computationally expensive, I'd guess you want control over the specific algorithm implementation used.
Cypher is a declarative language: you specify what you're looking for but not how to do it, and it will be difficult to specify such a complex problem in a way that lets the Cypher engine figure out what you're trying to do. That will make it hard for Cypher (or any declarative language engine) to produce an efficient query plan.
My suggestion is to find the specific algorithm you wish to use and implement it using the Neo4j Java API.
If you're running Neo4j in embedded mode, you're done at that point. If you're running Neo4j server, you'll then just have to run that code as an Unmanaged Server Extension.
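
To give a feel for what one such algorithm looks like, here is a minimal sketch of Edmonds-Karp (BFS augmenting paths), which computes a maximum flow and, from the final residual graph, the source side of a minimum cut. It is plain Python on an in-memory dict rather than the Neo4j Java API, and the function name and input format are assumptions made for illustration.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max flow; returns (max_flow, source_side_nodes).

    capacity: dict (u, v) -> non-negative capacity of directed edge u -> v.
    The source side is the set of nodes still reachable from `source`
    in the residual graph once no augmenting path remains.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = {}
    adj = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse edge for undoing flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs():
        """Parents of all nodes reachable via edges with spare capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if sink not in parent:
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

With capacities `{("s", "a"): 10, ("a", "b"): 1, ("b", "t"): 10}`, the cut falls on the weak a-b link: the flow is 1 and the source side is `{"s", "a"}`.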
AFAIK you're after 'Community Detection' algorithms. There are non-overlapping (communities do not overlap) and overlapping variants, where non-overlapping is generally easier to implement and understand. Common algorithms are:
Non-overlapping: Louvain
Overlapping: Label Propagation Algorithm (LPA) (typically non-overlapping, but there are extensions to make it overlapping)
Here are a few C++ code examples for the algorithms: Louvain, Oslom (overlapping), LPA (non-overlapping), and Infomap.
And if you want bleeding edge I was recommended the SCD algorithm
Academic paper: "High Quality, Scalable and Parallel Community Detection for Large Real Graphs"
C++ implementation

Neo4J Performance Benchmarking

I have created a basic implementation of a high-level client over Neo4j (https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-neo4j) and want to compare its performance with the native Neo4j driver (and maybe Spring Data too). This way I would be able to determine the overhead my library adds on top of the native driver.
I plan to create an extension of YCSB for Neo4j.
My question is: what should be considered the basic unit of object to be written into Neo4j? Should it be a single node, or a couple of nodes joined by an edge?
What's the current practice in the Neo4j world? How are people who benchmark Neo4j performance doing it?

There's already been some work for benchmarking Neo4J with Gatling: http://maxdemarzi.com/2013/02/14/neo4j-and-gatling-sitting-in-a-tree-performance-t-e-s-t-ing/
You could maybe adapt it.

See graphdb-benchmarks
The project graphdb-benchmarks is a benchmark of popular graph databases. Currently the framework supports Titan, OrientDB, Neo4j and Sparksee. The purpose of this benchmark is to examine the performance of each graph database in terms of execution time. The benchmark is composed of four workloads: Clustering, Massive Insertion, Single Insertion and Query Workload. Every workload has been designed to simulate common operations in graph database systems.
Clustering Workload (CW): CW consists of a well-known community detection algorithm for modularity optimization, the Louvain Method. We adapt the algorithm on top of the benchmarked graph databases and employ cache techniques to take advantage of both graph database capabilities and in-memory execution speed. We measure the time the algorithm needs to converge.
Massive Insertion Workload (MIW): Create the graph database and configure it for massive loading, then we populate it with a particular dataset. We measure the time for the creation of the whole graph.
Single Insertion Workload (SIW): Create the graph database and load it with a particular dataset. Every object insertion (node or edge) is committed directly and the graph is constructed incrementally. We measure the insertion time per block, which consists of one thousand edges and the nodes that appear during the insertion of these edges.
Query Workload (QW): Execute three common queries:
FindNeighbours (FN): finds the neighbours of all nodes.
FindAdjacentNodes (FA): finds the adjacent nodes of all edges.
FindShortestPath (FS): finds the shortest path between the first node and 100 randomly picked nodes.
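
As a rough illustration of the per-block measurement described for the Single Insertion Workload, the sketch below times insertions in blocks of one thousand edges. It is not taken from graphdb-benchmarks; `insert_edge` is a hypothetical callable that you would replace with whatever commits one edge through the client under test.

```python
import time


def time_insertion_blocks(edges, insert_edge, block_size=1000):
    """Insert edges one at a time and record the elapsed time per block.

    edges: iterable of edge descriptions (any shape insert_edge accepts).
    insert_edge: hypothetical callable that writes and commits one edge.
    Returns a list of seconds, one entry per block of `block_size` edges;
    a final partial block gets its own entry.
    """
    timings = []
    count = 0
    block_start = time.perf_counter()
    for edge in edges:
        insert_edge(edge)
        count += 1
        if count % block_size == 0:
            now = time.perf_counter()
            timings.append(now - block_start)
            block_start = now
    if count % block_size:
        timings.append(time.perf_counter() - block_start)
    return timings
```

Running the same edge list through the native driver and through the wrapper library, and comparing the two timing lists block by block, is one way to estimate the overhead the wrapper adds.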

One way to performance-test is to use e.g. http://gatling-tool.org/. There is work underway to create benchmark frameworks at http://ldbc.eu . Otherwise, benchmarking is highly dependent on your domain dataset and the queries you are trying to do. Maybe you could start at https://github.com/neo4j/performance-benchmark and improve on it?
