I have a graph, and I want to use the APOC Dijkstra algorithm on it; so far everything is working. But I want to exclude certain nodes or node properties from the possible path, so that the Dijkstra algorithm doesn't return a path that contains these excluded nodes or properties.
Is it possible, for example, to filter all existing nodes BEFORE calling the apoc.dijkstra procedure?
I know that it is possible to filter the found path AFTER the algorithm, but then a valid path that exists in the graph may be missed, because the filtering of the nodes happened afterwards.
APOC Dijkstra is an old and deprecated implementation of Dijkstra's algorithm. You should check out the Graph Data Science plugin at https://neo4j.com/docs/graph-data-science/current/. It supports the weighted shortest path algorithm, otherwise known as Dijkstra's algorithm: https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/shortest-path/. You can define which nodes and relationships you want to traverse when projecting the graph, for example:
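A minimal sketch using the official neo4j Python driver; the label Place, relationship ROAD, and properties cost and blocked are made-up stand-ins for your schema, and the procedure names follow GDS 2.x (they may differ in the version you run):

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "secret"))

    # Project only the allowed nodes: anything marked blocked is left out
    # of the in-memory graph, so Dijkstra can never route through it.
    PROJECT = """
    CALL gds.graph.project.cypher(
      'filtered',
      'MATCH (n:Place) WHERE coalesce(n.blocked, false) = false
       RETURN id(n) AS id',
      'MATCH (a:Place)-[r:ROAD]-(b:Place)
       WHERE coalesce(a.blocked, false) = false
         AND coalesce(b.blocked, false) = false
       RETURN id(a) AS source, id(b) AS target, r.cost AS cost'
    )
    """

    DIJKSTRA = """
    MATCH (s:Place {name: $src}), (t:Place {name: $dst})
    CALL gds.shortestPath.dijkstra.stream('filtered', {
      sourceNode: id(s),
      targetNode: id(t),
      relationshipWeightProperty: 'cost'
    })
    YIELD totalCost, nodeIds
    RETURN totalCost, nodeIds
    """

    with driver.session() as session:
        session.run(PROJECT).consume()
        record = session.run(DIJKSTRA, src="A", dst="B").single()
        print(record["totalCost"], record["nodeIds"])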
Hope this helps!
Related
I am having a discussion with a friend about whether the following will work:
We recently learned in a lecture about Breadth-First Search. I know that it is a special case of Dijkstra's algorithm where each edge weight is set to one. Now assume we are given a graph whose edges have integer weights greater than one. I would modify this graph by introducing additional vertices and connecting them by edges of weight one: e.g., given an edge of weight 3 connecting the vertices u and v, I would introduce dummy vertices d1, d2, remove the edge connecting u and v, and instead add edges {u, d1}, {d1, d2}, {d2, v} of weight one.
If I modify my whole graph this way and then apply breadth-first search starting from one of the original vertices, wouldn't this work as well?
Thank you very much in advance!
Since BFS is guaranteed to return an optimal path on unweighted graphs, and you've created the unweighted equivalent of your original graph, you'll be guaranteed to get the shortest path.
What you lose by doing this instead of Dijkstra's algorithm is runtime optimality. The runtime of your algorithm now depends on the magnitudes of the edge weights (the expanded graph has roughly as many edges as the total weight), whereas Dijkstra's depends only on the number of vertices and edges, e.g. O((V + E) log V) with a binary heap, not on the weights themselves.
This sort of thought experiment is a great way to understand how Dijkstra's algorithm works (e.g., how would you modify your algorithm so it does not require creating a new graph? Or so it does not take 100 steps for an edge of weight 100?). In fact, this is probably how Dijkstra discovered the algorithm in the first place.
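To make the construction concrete, here is a small self-contained Python sketch (the toy graph and names are made up for illustration):

    from collections import deque

    def expand(graph):
        """Replace each edge of integer weight w with a chain of w unit edges."""
        g, fresh = {}, [0]
        def node(x): g.setdefault(x, [])
        def edge(x, y):
            node(x); node(y)
            g[x].append(y); g[y].append(x)
        for u in graph:
            node(u)
        seen = set()
        for u, nbrs in graph.items():
            for v, w in nbrs:
                if (v, u) in seen:          # undirected: expand each edge once
                    continue
                seen.add((u, v))
                prev = u
                for _ in range(w - 1):      # w-1 dummy vertices d1..d_{w-1}
                    fresh[0] += 1
                    d = ("dummy", fresh[0])
                    edge(prev, d)
                    prev = d
                edge(prev, v)
        return g

    def bfs_dist(g, s, t):
        """Plain BFS; on the expanded graph this equals the weighted distance."""
        dist = {s: 0}
        q = deque([s])
        while q:
            x = q.popleft()
            if x == t:
                return dist[x]
            for y in g[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
        return None

    # u --3-- v, u --1-- w, w --1-- v: the shortest u-v distance is 2 (via w)
    weighted = {"u": [("v", 3), ("w", 1)], "v": [("u", 3), ("w", 1)],
                "w": [("u", 1), ("v", 1)]}
    print(bfs_dist(expand(weighted), "u", "v"))    # -> 2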
I was working on finding the shortest path between two nodes in an undirected acyclic graph using Dijkstra's algorithm. I wanted to find the longest path that is possible with the same algorithm. I also want to avoid a few routes with edge weight 0. How do I do that using Dijkstra's algorithm?
After searching through Stack Overflow, I came across one solution which just states that we need to modify the relaxation step to find the longest path.
Like:
if (distanceValueOfNodeA < EdgeValueofNodeBtoA)
{
    distanceValueOfNodeA = EdgeValueofNodeBtoA;
}
But here we are not adding distanceValueOfNodeB.
However, for shortest paths we calculate:
distanceValueOfNodeA = distanceValueOfNodeB + EdgeValueofNodeBtoA
Should we ignore distanceValueOfNodeB when calculating distanceValueOfNodeA?
I am sorry to disappoint you, but that problem is known as the longest path problem, and there is no efficient algorithm to solve it, so no modification of Dijkstra's algorithm can solve it either. (To answer your concrete question: no, you should not drop distanceValueOfNodeB; the snippet you found merely picks the heaviest incident edge. But even with the full sum, flipping the relaxation from min to max does not give a correct algorithm on general graphs.)
It belongs to a class of problems known as NP-hard: problems for which there is, at the moment, no known algorithm with better than exponential worst-case time complexity.
One caveat: you said your graph is undirected and acyclic, i.e. a tree or forest. For that special case the longest path (the tree's diameter) can be found in linear time; the NP-hardness applies to general graphs.
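For the tree/forest case specifically, here is a minimal Python sketch of the classic double-sweep trick (walk to the farthest node from an arbitrary start, then the farthest node from there is the other end of the longest path); the names are made up for illustration:

    def farthest(g, s):
        """Iterative DFS over a tree; g maps node -> list of (neighbour, weight).
        Returns (farthest node from s, weighted distance to it)."""
        best = (s, 0)
        stack = [(s, 0)]
        seen = {s}
        while stack:
            x, d = stack.pop()
            if d > best[1]:
                best = (x, d)
            for y, w in g[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append((y, d + w))
        return best

    # Path a-b-c-d with weights 2, 3, 4: the longest path has length 9.
    tree = {"a": [("b", 2)], "b": [("a", 2), ("c", 3)],
            "c": [("b", 3), ("d", 4)], "d": [("c", 4)]}
    end1, _ = farthest(tree, "b")        # one end of the diameter
    end2, length = farthest(tree, end1)  # the other end, and its length
    print(end1, end2, length)            # -> d a 9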
We want to present our data as a graph and thought about using one of the graph DBs. During our vendor investigation process, one of the experts suggested that using a graph DB on a dense graph won't be efficient, and that we'd be better off with a columnar DB like Cassandra.
I gave your use case some thought. Given that your graph is very dense (number of relationships ≈ number of nodes squared) and that you seem to need only a few-hop traversals from a particular node along different relationships, I'd actually recommend you also try out a columnar database.
Graph databases tend to work well when you have sparse graphs (num of relationships << num of nodes ^ 2) and with deep traversals - from 4-5 hops to hundreds of hops. If I understood your use-case correctly, a columnar database should generally outperform graphs there.
Our use case will probably end up with nodes connected to 10s of millions of other nodes, with about 30% overlap between different nodes - so in a way, it's probably a dense graph. Overall there will probably be a few billion nodes.
Looking into the Neo4j source code, I found references to an isDense flag on nodes, used to differentiate the processing logic - I am not sure what it does exactly. I also wonder whether it was added as an edge-case patch and won't work well if most of the nodes in the graph are dense.
Does anyone have any experience with graphdbs on dense graphs and should it be considered in such cases?
All opinions are appreciated!
A graph DB usually comes to mind when multiple tables are linked to each other - that is the classic use case for a graph DB.
We are running JanusGraph at a scale of 20B vertices and 15B edges. It is not a hugely dense graph with single vertices connected to 10s of millions of others, but we still observed the super-node case, where a vertex is connected to far more vertices than expected. For our use case, though, when traversing (DFS) we always visit at most N child nodes per node and go to a limited depth, say M, which is absolutely fine considering the number of joins the same query would need in non-graph DBs (columnar, relational, Athena, etc.); see the sketch after this answer.
The only way (I feel) to get all relations of a node is to do a full DFS, or to keep inner-joining datasets until no common data is found.
Excited to know more about other creative solutions.
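For illustration, a hypothetical in-memory Python sketch of the bounded traversal described above (in JanusGraph itself the same idea would be expressed as a Gremlin traversal):

    def bounded_dfs(g, start, max_children, max_depth):
        """Visit at most max_children neighbours per node, at most max_depth hops deep."""
        visited = {start}
        result = []
        def walk(node, depth):
            result.append((node, depth))
            if depth == max_depth:
                return
            for child in g[node][:max_children]:
                if child not in visited:
                    visited.add(child)
                    walk(child, depth + 1)
        walk(start, 0)
        return result

    g = {"root": ["a", "b", "c"], "a": ["d"], "b": [], "c": [], "d": []}
    print(bounded_dfs(g, "root", max_children=2, max_depth=2))
    # -> [('root', 0), ('a', 1), ('d', 2), ('b', 1)]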
I do not have experience with dense graphs in graph databases, but I do not think a dense graph is a problem per se. Since you are going to use graph algorithms, I suppose you would benefit from using a graph database (depending on the algorithms' complexity: the more "hops", the more you benefit from constant-time edge traversal).
A good trade-off could be to use one of the non-native graph databases (like Titan, or its follow-up JanusGraph), which actually use column-based storage engines (Cassandra, Berkeley DB, ...) as their backend.
I have a question about weighted graphs in Neo4j. Is a property (like .setProperty("cost", weight)) the only way of constructing a weighted graph? The problem is that a program which frequently reads these weights via (Double) rel.getProperty("cost") gets too slow, because the cast takes some time.
Well, you actually could encode the weight into the relationship type, which is faster. Something like:
CREATE (a)-[:`KNOWS_0.34`]->(b)
See http://console.neo4j.org/r/2dez98 for an example.
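On the reading side, the weight then comes straight out of the type name, with no property lookup or cast. A tiny hypothetical Python sketch (the KNOWS_ prefix follows the example above):

    def weight_of(rel_type_name):
        """Parse the weight encoded in a relationship type like 'KNOWS_0.34'."""
        return float(rel_type_name.rsplit("_", 1)[1])

    print(weight_of("KNOWS_0.34"))   # -> 0.34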
Can someone explain (or quote a reference) comparing the scoring mechanisms used by Solr and Lucene in simple words?
Is there any difference between them?
I am not that good at Solr/Lucene, but my findings suggested they are different.
P.S.: I just tried a simple query like "+Contents:risk" and didn't use any filters or other stuff.
Lucene uses concepts from the Vector Space Model (VSM) to compute the score of documents. In summary, queries and documents can be seen as vectors, and to compute the score of a document for a particular query, Lucene calculates how near the document's vector is to the query's vector. The closer a document is to the query in the VSM, the higher its score. You can find more details in Lucene's Similarity class and Lucene's scoring documentation.
The actual formula can be found in the Similarity javadocs, together with a summary of the parameters involved and a brief description of what they mean.
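To give an intuition, here is a toy Python sketch of the plain VSM idea (not Lucene's exact practical scoring formula; the tiny corpus is made up):

    import math
    from collections import Counter

    docs = [["risk", "management"], ["risk", "risk", "free"], ["other", "terms"]]
    query = ["risk"]

    def idf(term):
        # inverse document frequency: rare terms weigh more
        df = sum(term in d for d in docs)
        return math.log(len(docs) / df) + 1 if df else 0.0

    def tfidf(tokens):
        # turn a token list into a sparse tf-idf vector
        tf = Counter(tokens)
        return {t: tf[t] * idf(t) for t in tf}

    def cosine(u, v):
        # score = how near the two vectors are in the vector space
        dot = sum(u[t] * v.get(t, 0.0) for t in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = tfidf(query)
    for d in docs:
        print(d, cosine(tfidf(d), q))

Lucene's real formula layers refinements such as length normalization and boosts on top of this tf-idf cosine idea.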
Solr uses Lucene under the hood, and by default Solr uses the default Lucene similarity algorithm.